I would like to remove duplicates in a CSV file using PowerShell. I know that there are posts about this already but I can't seem to find one that helps.

I'm trying to merge two CSV files that have the same header, remove the duplicates from the merged result based on the IDs in the first column, and then write the result back to the same CSV file.

The properties of the file are as follows: [image not included]

When I use the sort and unique method, I get the following, which is not a table: [image not included]

Here is my code so far:

####
#MERGE
$getFirstLine = $true
Get-ChildItem "C:\IGHandover\Raw\IG_INC*.csv" | foreach {
    $filePath = $_
    $lines = Get-Content $filePath
    $linesToWrite = switch ($getFirstLine) {
        $true  { $lines }
        $false { $lines | Select -Skip 1 }
    }
    $getFirstLine = $false
    Add-Content "C:\IGHandover\new.csv" $linesToWrite
}

####
#REMOVE DUPLICATES
Import-Csv "C:\IGHandover\new.csv" | Sort inc_number -Unique |
    Set-Content "C:\IGHandover\new.csv"
  • Please add code and not images as it's more difficult to reproduce it. Commented Nov 6, 2017 at 9:06
  • Hi Manu. Here is what I have so far: #### #MERGE $getFirstLine = $true get-childItem "C:\IGHandover\Raw\IG_INC*.csv"| foreach { $filePath = $_ $lines = $lines = Get-Content $filePath $linesToWrite = switch($getFirstLine) { $true {$lines} $false {$lines | Select -Skip 1} } $getFirstLine = $false Add-Content "C:\IGHandover\new.csv" $linesToWrite } #### #REMOVE DUPLICATES Import-Csv "C:\IGHandover\new.csv" | Sort inc_number -Unique | Set-Content "C:\IGHandover\new.csv" Commented Nov 6, 2017 at 9:11
  • Trizia, add the code (code + results) to your question, not to a comment. Replace any confidential information in the question. Commented Nov 6, 2017 at 9:13
  • If you want Import-Csv "C:\IGHandover\new.csv" | Sort inc_number -Unique to display the data in tabular format, then Format-Table -AutoSize is what you are looking for. But that only affects how the data is shown on the shell screen. What exactly are you looking for? Do Sort and -Unique not work correctly for you? Commented Nov 6, 2017 at 9:30
  • Your immediate problem is that you are using Set-Content instead of Export-Csv -NoClobber. Fixing that leads to a second hurdle: you are writing to a file you are still reading from. You can solve that by putting parentheses around the import, but it is much easier to write to a new file. Commented Nov 6, 2017 at 9:42

3 Answers

Don't use Get-Content or Set-Content to import or export CSV files. Use Import-Csv and Export-Csv instead:

Import-Csv (Get-ChildItem 'C:\IGHandover\Raw\IG_INC*.csv') |         
        Sort-Object -Unique inc_number |
            Export-Csv 'C:\IGHandover\new.csv' -NoClobber -NoTypeInformation
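
A minimal sketch of why the Set-Content version in the question does not produce a table. The sample rows, the `status` column, and the scratch folder under the temp directory are assumptions made for illustration; only the `inc_number` column name comes from the question. Set-Content writes each object's string form, while Export-Csv writes a header line and one delimited row per object:

```powershell
# Hypothetical sample data; only the inc_number column name comes from the question.
$rows = @(
    [pscustomobject]@{ inc_number = 'INC002'; status = 'Open' }
    [pscustomobject]@{ inc_number = 'INC001'; status = 'Closed' }
    [pscustomobject]@{ inc_number = 'INC002'; status = 'Open' }
)

# Scratch folder so the sketch does not touch any real files.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
$wrongPath = Join-Path $dir 'wrong.csv'
$rightPath = Join-Path $dir 'right.csv'

# Keep one row per inc_number.
$unique = $rows | Sort-Object inc_number -Unique

# Set-Content converts each object to a string, so the file is not CSV.
$unique | Set-Content $wrongPath

# Export-Csv writes a header line followed by one row per object.
$unique | Export-Csv $rightPath -NoTypeInformation

$wrong = @(Get-Content $wrongPath)
$right = @(Get-Content $rightPath)

# Clean up the scratch folder.
Remove-Item $dir -Recurse -Force
```

Comparing `$wrong` and `$right` afterwards shows the difference: the first holds the objects' string forms, the second a header line plus data rows.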

I guess you want to update a table (HandoverINC.csv) with records from a new table (New.csv): any record in HandoverINC.csv whose primary key (inc_number) also appears in New.csv is replaced by the New.csv record, and any record that exists only in New.csv is added to HandoverINC.csv (basically what SQL calls a full join).

Using the Join-Object described at: https://stackoverflow.com/a/45483110/1701026

(Import-CSV .\HandoverINC.csv) | FullJoin (Import-CSV .\New.csv) inc_number {$Right.$_} | Export-CSV .\HandoverINC.csv -NoTypeInformation
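
If installing Join-Object is not an option, the same update-or-add behaviour can be approximated with an ordered dictionary keyed on inc_number. This is only a sketch, not how Join-Object works internally: the sample rows and the `status` column are hypothetical stand-ins for the two imported files, and it assumes both files share the same columns.

```powershell
# Hypothetical stand-ins for Import-Csv .\HandoverINC.csv and Import-Csv .\New.csv.
$existing = @(
    [pscustomobject]@{ inc_number = 'INC001'; status = 'Open' }
    [pscustomobject]@{ inc_number = 'INC002'; status = 'Open' }
)
$incoming = @(
    [pscustomobject]@{ inc_number = 'INC002'; status = 'Closed' }
    [pscustomobject]@{ inc_number = 'INC003'; status = 'Open' }
)

# Key every row on inc_number (a string, as Import-Csv would return it).
# A later assignment to the same key overwrites the earlier row,
# so listing $incoming second lets the new records win.
$byId = [ordered]@{}
foreach ($row in @($existing) + @($incoming)) {
    $byId[$row.inc_number] = $row
}

# One row per inc_number, in first-seen order.
$merged = @($byId.Values)
```

Piping `$merged` to Export-Csv with -NoTypeInformation would then write the combined table, once both imports have finished reading.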


As suggested by Lieven Keersmaekers and Vivek Kumar, I've made a few changes in my code:

  • Put the merged contents into a temporary file
  • Import the CSV file with the merged contents
  • Sort on the reference column and use the -Unique parameter
  • Export the results to a new CSV file

I found that my code was similar to Vincent K's:

#MERGE
$getFirstLine = $true
Get-ChildItem "C:\IGHandover\Raw\IG_INC*.csv" |
foreach {
    $filePath = $_
    $lines = Get-Content $filePath
    $linesToWrite = switch ($getFirstLine) {
        $true  { $lines }
        $false { $lines | Select -Skip 1 }
    }
    $getFirstLine = $false
    Add-Content "C:\IGHandover\HandoverINCtemp.csv" $linesToWrite
}

#REMOVE DUPLICATES
Import-Csv "C:\IGHandover\HandoverINCtemp.csv" | Sort inc_number -Unique |
    Export-Csv "C:\IGHandover\HandoverINC.csv" -NoClobber -NoTypeInformation -Force
Remove-Item "C:\IGHandover\HandoverINCtemp.csv"

To simplify (merging and removing duplicates with the same header), as suggested by Vincent:

Import-Csv (Get-ChildItem "C:\IGHandover\Raw\IG_INC*.csv") | Sort inc_number -Unique |
    Export-Csv "C:\IGHandover\HandoverINC.csv" -NoClobber -NoTypeInformation -Force
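
For anyone who wants to try the merge and de-duplication without touching real data, here is a self-contained sketch. The two input files, their contents, the `status` column, and the scratch folder are all made up for illustration. It leaves out -NoClobber, which as far as I understand tells Export-Csv not to overwrite an existing file. The last step shows the parentheses trick mentioned in the comments for writing back to a file that is also being read:

```powershell
# Scratch folder so the sketch does not touch any real files.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null

# Two hypothetical input files with the same header and one overlapping ID.
@'
inc_number,status
INC001,Open
INC002,Open
'@ | Set-Content (Join-Path $dir 'IG_INC_a.csv')

@'
inc_number,status
INC002,Open
INC003,Closed
'@ | Set-Content (Join-Path $dir 'IG_INC_b.csv')

$out = Join-Path $dir 'merged.csv'

# Merge every matching file and keep one row per inc_number.
Get-ChildItem (Join-Path $dir 'IG_INC*.csv') |
    ForEach-Object { Import-Csv -LiteralPath $_.FullName } |
    Sort-Object inc_number -Unique |
    Export-Csv $out -NoTypeInformation

# Rewrite the same file in place: the parentheses make Import-Csv read
# everything before Export-Csv opens the file for writing.
(Import-Csv $out) |
    Sort-Object inc_number -Descending |
    Export-Csv $out -NoTypeInformation

$result = @(Import-Csv $out)

# Clean up the scratch folder.
Remove-Item $dir -Recurse -Force
```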

I hope this helps anyone who'd like to do the same with their files.

1 Comment

This code merges your CSV files, provided they all have the same header: Import-Csv (Get-ChildItem 'C:\IGHandover\Raw\IG_INC*.csv')
