How to merge and remove duplicates of CSV files using Powershell

Question

I would like to remove duplicates in a CSV file using PowerShell. I know that there are posts about this already but I can't seem to find one that helps.

I'm trying to merge two CSV files that have the same header, remove duplicates from the merged result based on the IDs in the first column, and write the result back to the same CSV file.

The properties of the file are as follows:

And when I try to use the sort and unique method, I get the following (not a table):

Here is my code so far:

####
#MERGE
$getFirstLine = $true
Get-ChildItem "C:\IGHandover\Raw\IG_INC*.csv" | ForEach-Object {
    $filePath = $_
    $lines = Get-Content $filePath
    # Keep the header row only for the first file
    $linesToWrite = switch ($getFirstLine) {
        $true  { $lines }
        $false { $lines | Select-Object -Skip 1 }
    }
    $getFirstLine = $false
    Add-Content "C:\IGHandover\new.csv" $linesToWrite
}

####
#REMOVE DUPLICATES
Import-Csv "C:\IGHandover\new.csv" | Sort inc_number -Unique |
    Set-Content "C:\IGHandover\new.csv"

Please add code rather than images, as images make the problem harder to reproduce.
– Manu, Commented Nov 6, 2017 at 9:06
Hi Manu. Here is what I have so far: #### #MERGE $getFirstLine = $true get-childItem "C:\IGHandover\Raw\IG_INC*.csv"| foreach { $filePath = $_ $lines = $lines = Get-Content $filePath $linesToWrite = switch($getFirstLine) { $true {$lines} $false {$lines | Select -Skip 1} } $getFirstLine = $false Add-Content "C:\IGHandover\new.csv" $linesToWrite } #### #REMOVE DUPLICATES Import-Csv "C:\IGHandover\new.csv" | Sort inc_number -Unique | Set-Content "C:\IGHandover\new.csv"
– Trizia Dimalanta, Commented Nov 6, 2017 at 9:11
Trizia, add the code (code + results) to your question, not in a comment. Replace any confidential information in the question.
– Manu, Commented Nov 6, 2017 at 9:13
If you want Import-Csv "C:\IGHandover\new.csv" | Sort inc_number -Unique to display its data as a table, then Format-Table -AutoSize is what you are looking for, but that only affects how the data is shown in the console. What exactly are you trying to achieve? Do Sort and -Unique not work correctly for you?
– Vivek Kumar Singh, Commented Nov 6, 2017 at 9:30
Your immediate problem is that you are using Set-Content instead of Export-Csv -NoClobber. Solving that takes you to another hurdle: you are writing to a file you are still reading from. That can be solved by wrapping the import in parentheses, but it is much easier to write to a new file.
– Lieven Keersmaekers, Commented Nov 6, 2017 at 9:42
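
A minimal sketch of the parentheses approach mentioned in the last comment, assuming a hypothetical sample file in the temp directory rather than the asker's real data. The column name inc_number comes from the question; the status column and the file name are made up for illustration.

```powershell
# Hypothetical sample file (not the asker's data).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'dedupe-demo.csv'
@'
inc_number,status
INC002,Open
INC001,Closed
INC002,Open
'@ | Set-Content -Path $path

# The parentheses make Import-Csv read the whole file into memory first,
# so the file is closed again before Export-Csv starts writing to it.
(Import-Csv -Path $path) |
    Sort-Object -Property inc_number -Unique |
    Export-Csv -Path $path -NoTypeInformation

# Read the rewritten file back and show it as a table in the console.
$result = Import-Csv -Path $path
$result | Format-Table -AutoSize
```

Writing to a separate output file, as the comment suggests, avoids the issue entirely and is probably the safer choice for real data.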

Vincent K · Accepted Answer · 2017-11-06 10:13:04Z

2

Don't use Get-Content or Set-Content to import or export CSV files. Use Import-Csv and Export-Csv instead:

Import-Csv (Get-ChildItem 'C:\IGHandover\Raw\IG_INC*.csv') |
    Sort-Object -Unique inc_number |
    Export-Csv 'C:\IGHandover\new.csv' -NoClobber -NoTypeInformation
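
To illustrate why the choice of cmdlet matters, here is a small sketch using hypothetical in-memory records (the status column is an assumption, not from the question). It compares the plain string form of each object, which is roughly what a text-oriented cmdlet such as Set-Content ends up writing, with the CSV lines that Export-Csv would produce. ConvertTo-Csv is used here only so the result can be inspected without touching the disk.

```powershell
# Hypothetical records standing in for the output of Import-Csv.
$rows = @(
    [pscustomobject]@{ inc_number = 'INC001'; status = 'Closed' }
    [pscustomobject]@{ inc_number = 'INC002'; status = 'Open' }
)

# String form of each object: a hashtable-like literal, not a CSV line.
$asStrings = $rows | ForEach-Object { "$_" }

# CSV form: one header line followed by one quoted line per record.
$asCsv = $rows | ConvertTo-Csv -NoTypeInformation

$asStrings
$asCsv
```

The first output is likely the "not a table" result described in the question, while the second is what a CSV reader expects.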

answered Nov 6, 2017 at 10:13


iRon · Accepted Answer · 2017-11-06 10:00:06Z

0

I guess you want to update a table (HandoverINC.csv) with records from a new table (New.csv): any record in HandoverINC.csv is replaced by the record in New.csv that has the same primary key (inc_number), and any record that exists only in New.csv is added to HandoverINC.csv (basically what is called a Full Join in SQL).

Using the Join-Object described at: https://stackoverflow.com/a/45483110/1701026

Import-CSV .\HandoverINC.csv | FullJoin (Import-CSV .\New.csv) inc_number {$Right.$_} | Export-CSV .\HandoverINC.csv
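
For readers who would rather not install Join-Object, the same "replace matching records, add new ones" idea can be sketched with an ordered dictionary keyed on inc_number. This is not the Join-Object implementation, just an illustration of the behavior described above; the in-memory records and the status column are hypothetical stand-ins for the two CSV files.

```powershell
# Hypothetical stand-ins for Import-Csv .\HandoverINC.csv and Import-Csv .\New.csv.
$existing = @(
    [pscustomobject]@{ inc_number = 'INC001'; status = 'Open' }
    [pscustomobject]@{ inc_number = 'INC002'; status = 'Open' }
)
$incoming = @(
    [pscustomobject]@{ inc_number = 'INC002'; status = 'Closed' }
    [pscustomobject]@{ inc_number = 'INC003'; status = 'Open' }
)

# Index records by key. Later assignments overwrite earlier ones,
# so records from $incoming replace matching records from $existing.
$byKey = [ordered]@{}
foreach ($row in @($existing) + @($incoming)) {
    $byKey[$row.inc_number] = $row
}

# One record per key, in order of first appearance.
$merged = @($byKey.Values)
$merged | Format-Table -AutoSize
```

With real files, $merged could then be piped to Export-Csv with -NoTypeInformation.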

edited Nov 6, 2017 at 10:00

answered Nov 6, 2017 at 9:35


Trizia Dimalanta · Accepted Answer · 2017-11-06 12:44:59Z

0

As suggested by Lieven Keersmaekers and Vivek Kumar, I've made a few changes in my code:

1. Write the merged contents to a temporary file.
2. Import the CSV file with the merged contents.
3. Sort on the reference column and use the -Unique parameter.
4. Export the results to a new CSV file.

I found that my code was similar to Vincent K's:

#MERGE
$getFirstLine = $true
Get-ChildItem "C:\IGHandover\Raw\IG_INC*.csv" |
ForEach-Object {
    $filePath = $_
    $lines = Get-Content $filePath
    # Keep the header row only for the first file
    $linesToWrite = switch ($getFirstLine) {
        $true  { $lines }
        $false { $lines | Select-Object -Skip 1 }
    }
    $getFirstLine = $false
    Add-Content "C:\IGHandover\HandoverINCtemp.csv" $linesToWrite
}

#REMOVE DUPLICATES
Import-Csv "C:\IGHandover\HandoverINCtemp.csv" | Sort-Object inc_number -Unique |
    Export-Csv "C:\IGHandover\HandoverINC.csv" -NoClobber -NoTypeInformation -Force
Remove-Item "C:\IGHandover\HandoverINCtemp.csv"

To simplify merging and removing duplicates for files with the same header, as suggested by Vincent:

Import-Csv (Get-ChildItem "C:\IGHandover\Raw\IG_INC*.csv") | Sort inc_number -Unique |
    Export-Csv "C:\IGHandover\HandoverINC.csv" -NoClobber -NoTypeInformation -Force
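
A self-contained sketch of the same one-step merge, for anyone who wants to try it without the original data. The scratch folder, file names, and status column are hypothetical; only the inc_number column and the IG_INC*.csv pattern come from the question. It pipes the files into Import-Csv instead of passing them as an argument, which should behave the same way here.

```powershell
# Hypothetical sample data in a scratch folder (stand-ins for the IG_INC*.csv files).
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'csv-merge-demo'
New-Item -ItemType Directory -Path $dir -Force | Out-Null

"inc_number,status", "INC001,Open", "INC002,Open" |
    Set-Content -Path (Join-Path $dir 'IG_INC_a.csv')
"inc_number,status", "INC002,Open", "INC003,Closed" |
    Set-Content -Path (Join-Path $dir 'IG_INC_b.csv')

$outFile = Join-Path $dir 'merged.csv'

# Each file is parsed with its own header row, so no header skipping is needed.
Get-ChildItem -Path (Join-Path $dir 'IG_INC*.csv') |
    Import-Csv |
    Sort-Object -Property inc_number -Unique |
    Export-Csv -Path $outFile -NoTypeInformation

# Read the merged file back to check the result.
$merged = Import-Csv -Path $outFile
$merged | Format-Table -AutoSize
```

The sketch leaves out -NoClobber so that it can be rerun: with -NoClobber, Export-Csv is documented not to overwrite an existing output file.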

I hope this helps anyone who'd like to do the same with their files.

edited Nov 6, 2017 at 12:44

answered Nov 6, 2017 at 10:38


1 Comment

Vincent K Over a year ago

This code will merge your CSV files: Import-Csv (Get-ChildItem 'C:\IGHandover\Raw\IG_INC*.csv'). It works given that all the CSV files have the same header.
