Microsoft Excel — The BASICS of Data Transformation
Key Concepts Essential to Shaping Data For Analysis
Whenever analysis and reporting is performed in Excel — and practically every company does — it is imperative that data is aggregated correctly and “transformed”.
For analysis to be efficient and effective, proper structure of the data and the development of additional fields (to aid analysis) is very important.
If interested, you can get a download of the Excel file I walk-through below.
The two charts above were developed for a recent training session I conducted.
A part of the session was to focus on working with a data set I compiled (Apple historical sales) and performing basic data “shaping” to enable flexible analysis and reporting.
Below, I highlight the steps taken to do this and touch on three areas of Excel that are typically used for basic data transformation and analysis:
- Excel Data Tables
- VLOOKUP function
- Pivot Tables (future post…) — Part 2
Data Transformation / Shaping Process
Excel Data Tables
Here is a excellent summary from TechRepublic of key reasons you want to use Excel Tables. Key considerations mentioned include:
- Effortless data entry
- Always visible headers
- Formula autofill (my favorite!)
- Dynamic range
Step 1-Convert raw data input into an Excel Table
From this —
To this —
Step 2 — Create additional fields in the data table
This will enable the analysis we are potentially interested in getting. In this session, I have anticipated a need for three additional fields.
- (1) — Calendar year (Apple reports on a fiscal year). Why a calendar year? It is likely that when data analysis begins, information on a calendar year basis might be useful. The YEAR function is used for this.
- (2) — Category Summary. This field will “condense” Apple’s classification (which changes over the years) to a more summary grouping. This will enable higher level insight which would not be available from the raw dataset. The VLOOKUP function, with EXACT match, is used.
- (3) — Stratification of the Amount field. This may be useful to report aggregated data and develop charts regarding frequency of occurrence. The VLOOKUP function, with APPROXIMATE match, is used.
Step 3 — Populate the Calendar Year field
As soon as the formula is “Entered”, the Table populates the entire column immediately. This is known as “formula auto fill” and is a very useful reason to use tables — a lot.
Step 4 — Populate the Category Summary field
We will use a VLOOKUP with EXACT match to do this. Here is the formula syntax for VLOOKUP.
Here is the flow, with the LOOKUP table referenced (which is located on another worksheet).
- The formula is “=VLOOKUP([@Category2],SummaryLookupTable2,2,FALSE)”
- Translation…Look at the Category2 field, go to the SummaryLookupTable2, if you find a match — return the value from column 2, and I want an EXACT match.
Step 5 — Populate the Stratification field
Getting this field populated is more complicated. Here’s why:
- It involves 2 lookup tables. Since the Amount field includes $’s AND Units, separate lookup tables are needed for DOLLARS grouping and UNITS grouping.
- Nesting functions is used to resolve this. The formula performs the following:
— Use the IF Function to check whether the TYPE field is “Dollars”.
— If TYPE = “Dollars”, then use the specified lookup table (result if IF statement is TRUE).
— If TYPE is NOT “Dollars”, then use the Units lookup table (result if IF statement is FALSE).
Our data set is complete…for now.
Next step is to start to build analysis and visualizations of the data. This is the fun part — and where pivot tables make it simple!
I will cover the basics tips I provided in the presentation in a future post…stay tuned!
“On a mission to challenge the status quo to a more productive and effective end…”
Don Tomoff is passionate about helping professionals and organizations keep up and adapt to the changing business world that we operate in.