Wednesday, March 18, 2015

Normalizing Data in Excel with Index/Match

Spreadsheets are powerful/friendly because they don't constrain how we structure our data.

However, this sometimes results in the same information appear over and over in a column because it applies to every row.

For example, if we had a worksheet containing training hours it might look something like this.

Training

Date Class Employee ID Employ Name Department


The problem is the name of the employee and department have nothing to do with training. Nevertheless, it will be repeated for each row.

If you've used Excel's vlookup, you're likely thinking this solves the problem. It does, but there's a better way using Index and Match. It might appear more complex at first, but vlookup can bring its own complexity.

Combining files

What problem am I solving?
A customer provided 76 files with all the same structure. They needed to be combined together into one Excel file. PowerShell can do this easily enough using get-content and out-file. However, the file name in this case needed to be a column in Excel. This is a case where scripting is much less work than importing each file into Excel (and we get a reusable tool out of the effort).

Is this more trouble than it's worth?
In this case, not at all.


How is it run/invoked?
Update the local variable to specify the files you want combined and the name of the file to containing all the aggregated files. You can run this through PowerShell's ISE.

How does it work, and what does it do?
In the spirit of documenting code, the answer is in comments below.


What's the whole script look like?

# Prepare a file to put the results
 $output_file = "lspv.txt"
 remove-item $output_file
# Define the filename pattern for which files need to be concatenated together
 $filename_pattern = "*.lspv"
#Define the delimiter between the filename and the contents of the file
 $delimiter = "`t"
#Get the list of files to be aggregated
 $files = get-childitem $filename_pattern
 #for each file in the directory...
 foreach($file in $files){
   #store the filename for use in the output file
    $filename=$($file.name);
   #get the contents of the file for use in the output file
    $file_content = get-content $filename
   #for each line in the file...
    foreach($row in $file_content){
   #build a string containing the filename + the contents of the row
    $output_string = $filename + $delimiter +$row
   #add the output_string to the output file
    out-file -InputObject $output_string -FilePath $output_file -append  
   } #foreach
 } #foreach