
I created a script that gets the Name, CreationTime, and Duration of each .mp4 file in a specific directory and then exports the data to a CSV file using the Export-Csv cmdlet.
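For context, a minimal sketch of such an export script, assuming the Shell.Application COM object is used to read the Duration (the extended-property index for "Length", 27 here, is an assumption and varies between Windows builds and locales), could look like this:

```powershell
# Sketch only: gather Name, CreationTime and Duration for each .mp4 and export to CSV.
# Property index 27 ("Length") is an assumption; it differs across Windows versions.
$dir    = 'C:\Videos'
$shell  = New-Object -ComObject Shell.Application
$folder = $shell.Namespace($dir)

Get-ChildItem -Path $dir -Filter *.mp4 | ForEach-Object {
    $item = $folder.ParseName($_.Name)
    [pscustomobject]@{
        Name         = $_.Name
        CreationTime = $_.CreationTime
        Duration     = $folder.GetDetailsOf($item, 27)
    }
} | Export-Csv -Path '.\Export.csv' -NoTypeInformation -Encoding UTF8
```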

Now I have 3 CSV exports and I want to merge them, so I tried this:

$Data = @()
$CSVsPaths | ForEach-Object {
    $Data += Import-Csv -Path "$_" -Encoding "UTF8"
}
$Data

But for some reason, some of the objects are duplicated, and I'm sure the exports all contain different data. What am I doing wrong?
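As an aside, the append loop above can also be written as a single pipeline; Import-Csv accepts paths from the pipeline, so no intermediate array or += is needed:

```powershell
# Import all the exports in one pipeline; Import-Csv emits one object per CSV row.
$Data = $CSVsPaths | Import-Csv -Encoding UTF8
$Data
```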


Edit:

Here are the CSVs: https://drive.google.com/drive/folders/1MbeUenLxbKdlle6rKFwc3jJNZf85AMtd

  • Show us the input data and the duplicates :) Commented Sep 5, 2022 at 14:33
  • By duplicated, do you mean that all values in the Name column should be different? Also, $data = Get-ChildItem -Filter *.csv | Import-Csv -Encoding UTF8 would be a cleaner approach. Commented Sep 5, 2022 at 14:34
  • @Dstr0 There are exactly 86 duplicates in the final output because there are exactly 86 duplicates between the Export 2 and Export 3 files. Do you want to output only unique rows across all 3 files? Commented Sep 5, 2022 at 16:34
  • If you want to see for yourself, here are the duplicated values per column: $data[0].PSObject.Properties.Name | % { ($data | Group-Object $_ | ? Count -GT 1).Count } gives me 86 for the first column, 107 for the second, and 152 for the third. Commented Sep 5, 2022 at 16:45
  • If you want to get rid of duplicates in one or all columns, you can give this function a try; you would use it like: $data = Get-ChildItem -Filter *.csv | Import-Csv -Encoding UTF8 | Filter-Unique -On * Commented Sep 5, 2022 at 17:47

1 Answer


As mentioned in the comments, your output has duplicate rows because duplicates already exist in your data set.

To locate the duplicated rows, use Get-Content against the input files: the file system provider attaches hidden properties to the output that we can use to identify where each duplicate lives:

# find all non-unique strings in the input files
$nonUniques = Get-Content '.\Export*.csv' |Group-Object |Where-Object Count -gt 1 |ForEach-Object Group 

# use the PSChildName and ReadCount provider properties to identify the files that host the duplicate content,
# then use Format-Table to show output nicely grouped on the non-unique string value
$nonUniques |Select @{Name='Name';Expression='PSChildName'},@{Name='Line';Expression='ReadCount'},@{Name='Duplicate';Expression={$_}} |Format-Table Name,Line -GroupBy Duplicate

Which, given the input data you linked, will produce something like this:

   Duplicate: "QVR_06082021_141022 (PRIMA VOLTA CHE REGISTRO).mp4","06/08/2021 14:10:22","00:00:46"

Name                                      Line
----                                      ----
Export 2 (da 24-06-2022 a 31-07-2022).csv  113
Export 3 (da 06-08-2021 a 31-08-2021).csv    3


   Duplicate: "QVR_06082021_142308.mp4","06/08/2021 14:23:08","00:00:50"

Name                                      Line
----                                      ----
Export 2 (da 24-06-2022 a 31-07-2022).csv  114
Export 3 (da 06-08-2021 a 31-08-2021).csv    4


   Duplicate: "VID_20210806_220220.mp4","06/08/2021 22:02:20","00:00:20"

Name                                      Line
----                                      ----
Export 2 (da 24-06-2022 a 31-07-2022).csv  115
Export 3 (da 06-08-2021 a 31-08-2021).csv    5
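If the goal is to keep only one copy of each row when merging, a sketch using Sort-Object -Unique over all three columns (assuming the files share the same header, as in the linked data) would be:

```powershell
# Merge the exports and drop rows that are identical in every column.
$Data = Get-ChildItem -Filter 'Export*.csv' | Import-Csv -Encoding UTF8 |
    Sort-Object Name, CreationTime, Duration -Unique
$Data | Export-Csv -Path '.\Merged.csv' -NoTypeInformation -Encoding UTF8
```

Note that Sort-Object -Unique compares the listed properties, so two rows count as duplicates only when all three values match.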

1 Comment

I'm sorry for wasting your time, man
