Remove Duplicate Rows From A Text File Using PowerShell

March 21, 2007

apple, [code], commandline, microISV, nimbletext, powershell, tools

If your text file is already sorted... then removing duplicates is very easy.

PS:\> gc $filename | get-unique > $newfileName

(But remember, Get-Unique only compares each line with the line just before it, so it only works on sorted data!)
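
To see why the sorting matters, here is a small sketch that uses an in-memory list instead of a file (the list contents are made up for illustration). Duplicates that are not next to each other survive Get-Unique:

```powershell
# Hypothetical sample data: 'Dog' appears twice in a row, then once more later.
$lines = 'Apple', 'Dog', 'Dog', 'Carrot', 'Dog'

# Get-Unique drops only the adjacent repeat; the final 'Dog' is kept.
$unsortedResult = $lines | Get-Unique

# After sorting, equal lines sit next to each other, so every repeat is dropped.
$sortedResult = $lines | Sort-Object | Get-Unique

"Unsorted: $($unsortedResult -join ', ')"   # Apple, Dog, Carrot, Dog
"Sorted:   $($sortedResult -join ', ')"     # Apple, Carrot, Dog
```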

If the file's content is not sorted, and the final order of the lines is unimportant, then it's also easy...

Sort it -- and then use Get-Unique

gc $filename | sort | get-unique > $newfileName

(You now end up with a file that is sorted, and where every line is unique)
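
As an aside, Sort-Object also has a -Unique switch that sorts and removes duplicates in one step. The two approaches treat casing differently: Get-Unique compares strings case-sensitively, while Sort-Object -Unique by default does not. A minimal sketch with made-up data:

```powershell
# Hypothetical sample data with lines that differ only by case.
$lines = 'b', 'a', 'B', 'a'

# Get-Unique is case-sensitive, so 'b' and 'B' both survive.
$viaGetUnique = @($lines | Sort-Object | Get-Unique)

# Sort-Object -Unique ignores case by default, so only one of 'b'/'B' survives.
$viaSortUnique = @($lines | Sort-Object -Unique)

"sort | get-unique kept $($viaGetUnique.Count) lines"   # 3
"sort -unique kept $($viaSortUnique.Count) lines"       # 2
```

Which behavior you want depends on whether "apple" and "Apple" count as the same line in your file.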

However... the case that one bumps into is always the tricky case...

If the file data is not 'sorted', but the order *is* important... then it's a little trickier. I've got an approach... let's turn it into a solution.

Remove Duplicate Rows From A Text File Using PowerShell... unsorted file, where order is important

I'm going to add each line to a hash table.

But before adding it -- I'll check if the line is already in the hash table.

If it's not there yet -- then I'll send that line into the new file. Here's an example:

PS H:\> $hash = @{}      # define a new empty hash table
PS H:\> gc c:\rawlist.txt |
>> %{if($hash[$_] -eq $null) { $_ }; $hash[$_] = 1} > c:\newlist.txt

I test it out... given this input:

Apple
Dog
Dog
Carrot
Banana
Fun
Dog
Apple
Egg
Carrot
Egg

I get this output...

Apple
Dog
Carrot
Banana
Fun
Egg
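
The same technique can be tried without touching any files. The sketch below feeds the sample list above through the hashtable filter from an array. The variable names $rawList and $newList are mine. It looks lines up with the indexer, so that a line such as "Count" is treated as a key rather than as a property of the hashtable.

```powershell
# The sample input from above, as an array instead of a file.
$rawList = 'Apple', 'Dog', 'Dog', 'Carrot', 'Banana', 'Fun',
           'Dog', 'Apple', 'Egg', 'Carrot', 'Egg'

$hash = @{}    # keys are the lines we have already emitted

$newList = $rawList | ForEach-Object {
    if ($null -eq $hash[$_]) {   # first time we see this line...
        $_                       # ...pass it along the pipeline
    }
    $hash[$_] = 1                # remember it; the value is irrelevant
}

$newList -join ', '    # Apple, Dog, Carrot, Banana, Fun, Egg
```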

Okay... I thought that was going to be really hard. Huh.

One more thing to do... see if I can comment this a little better...

PS H:\> $hash = @{}                 # Define an empty hashtable
PS H:\> gc c:\rawlist.txt |         # Send the content of the file into the pipeline...
>>  % {                             # For each object in the pipeline...
>>                                  # note '%' is an alias of 'foreach-object'
>>     if ($hash[$_] -eq $null) {   # if that line is not a key in our hashtable...
>>                                  # note -eq means 'equals'
>>                                  # note $_ means 'the data we got from the pipe'
>>                                  # note $null means NULL
>>         $_                       # ... send that line further along the pipe
>>     };
>>     $hash[$_] = 1                # Add that line to the hash (so we won't send it again)
>>                                  # note that the value isn't important here,
>>                                  # only the key. ;-)
>>  } > c:\newlist.txt              # finally... redirect the pipe into a new file.
>>
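
A variation on the same idea, offered as a sketch rather than the one right way: a .NET HashSet can stand in for the hashtable, because its Add method returns $true only the first time a value is added. The function name Remove-DuplicateLine is hypothetical. One behavioral difference to be aware of: a hashtable created with @{} ignores case in its keys, while HashSet[string] compares case-sensitively by default.

```powershell
# Hypothetical helper: passes each distinct line through once, in first-seen order.
function Remove-DuplicateLine {
    param(
        [Parameter(ValueFromPipeline = $true)]
        [AllowEmptyString()]
        [string] $Line
    )
    begin {
        # HashSet[string] compares case-sensitively by default.
        $seen = New-Object 'System.Collections.Generic.HashSet[string]'
    }
    process {
        # Add() returns $true only if the value was not already in the set.
        if ($seen.Add($Line)) { $Line }
    }
}

# Usage with the file names from the example above:
# gc c:\rawlist.txt | Remove-DuplicateLine > c:\newlist.txt

$result = 'Apple', 'Dog', 'Dog', 'apple', 'Apple' | Remove-DuplicateLine
$result -join ', '    # Apple, Dog, apple
```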

By the way, my tools NimbleText and NimbleSET make removing duplicates from a list (or a file) even easier.


secretGeek.net
