
I created a script that gets the Name, CreationTime, and Duration of each .mp4 file in a specific directory and then exports the data to a CSV file using the Export-Csv cmdlet.
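For context, a minimal sketch of such an export script, assuming the Shell.Application COM object is used to read the Duration (the extended-property index for "Length", 27 here, is an assumption and varies between Windows builds and locales), could look like this:

```powershell
# Sketch only: gather Name, CreationTime and Duration for each .mp4 and export to CSV.
# Property index 27 ("Length") is an assumption; it differs across Windows versions.
$dir    = 'C:\Videos'
$shell  = New-Object -ComObject Shell.Application
$folder = $shell.Namespace($dir)

Get-ChildItem -Path $dir -Filter *.mp4 | ForEach-Object {
    $item = $folder.ParseName($_.Name)
    [pscustomobject]@{
        Name         = $_.Name
        CreationTime = $_.CreationTime
        Duration     = $folder.GetDetailsOf($item, 27)
    }
} | Export-Csv -Path '.\Export.csv' -NoTypeInformation -Encoding UTF8
```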

Now I have 3 CSV exports and I want to merge them, so I tried this:

$Data = @()
$CSVsPaths | ForEach-Object {
    $Data += Import-Csv -Path "$_" -Encoding "UTF8"
}
$Data

But for some reason, some of the objects are duplicated, and I'm sure the exports all contain different data. What am I doing wrong?
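As an aside, the append loop above can also be written as a single pipeline; Import-Csv accepts paths from the pipeline, so no intermediate array or += is needed:

```powershell
# Import all the exports in one pipeline; Import-Csv emits one object per CSV row.
$Data = $CSVsPaths | Import-Csv -Encoding UTF8
$Data
```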


Edit:

Here are the CSVs: https://drive.google.com/drive/folders/1MbeUenLxbKdlle6rKFwc3jJNZf85AMtd

  • Show us the input data and the duplicates :) Commented Sep 5, 2022 at 14:33
  • By duplicated, do you mean that all values in the Name column should be different? Also, $data = Get-ChildItem -Filter *.csv | Import-Csv -Encoding UTF8 would be a cleaner approach. Commented Sep 5, 2022 at 14:34
  • @Dstr0 There are exactly 86 duplicates in the final output because there are exactly 86 duplicates between the Export 2 and Export 3 files. Do you want to output only unique rows across all 3 files? Commented Sep 5, 2022 at 16:34
  • If you want to see for yourself, here are the duplicated values per column: $data[0].PSObject.Properties.Name | % { ($data | Group-Object $_ | ? Count -GT 1).Count } gives me 86 for the first column, 107 for the second, and 152 for the third. Commented Sep 5, 2022 at 16:45
  • If you want to get rid of duplicates in one or all columns, you can give this function a try; you would use it like: $data = Get-ChildItem -Filter *.csv | Import-Csv -Encoding UTF8 | Filter-Unique -On * Commented Sep 5, 2022 at 17:47

1 Answer


As mentioned in the comments, your output has duplicate rows because duplicates already exist in your data set.

To locate the duplicated rows, use Get-Content against the input files: the file system provider attaches hidden properties to the output that we can use to identify where each duplicate lives:

# find all non-unique strings in the input files
$nonUniques = Get-Content '.\Export*.csv' |Group-Object |Where-Object Count -gt 1 |ForEach-Object Group 

# use the PSChildName and ReadCount provider properties to identify the files that host the duplicate content,
# then use Format-Table to show output nicely grouped on the non-unique string value
$nonUniques |Select @{Name='Name';Expression='PSChildName'},@{Name='Line';Expression='ReadCount'},@{Name='Duplicate';Expression={$_}} |Format-Table Name,Line -GroupBy Duplicate

Which, given the input data you linked, will produce something like this:

   Duplicate: "QVR_06082021_141022 (PRIMA VOLTA CHE REGISTRO).mp4","06/08/2021 14:10:22","00:00:46"

Name                                      Line
----                                      ----
Export 2 (da 24-06-2022 a 31-07-2022).csv  113
Export 3 (da 06-08-2021 a 31-08-2021).csv    3


   Duplicate: "QVR_06082021_142308.mp4","06/08/2021 14:23:08","00:00:50"

Name                                      Line
----                                      ----
Export 2 (da 24-06-2022 a 31-07-2022).csv  114
Export 3 (da 06-08-2021 a 31-08-2021).csv    4


   Duplicate: "VID_20210806_220220.mp4","06/08/2021 22:02:20","00:00:20"

Name                                      Line
----                                      ----
Export 2 (da 24-06-2022 a 31-07-2022).csv  115
Export 3 (da 06-08-2021 a 31-08-2021).csv    5
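If the goal is to keep only one copy of each row when merging, a sketch using Sort-Object -Unique over all three columns (assuming the files share the same header, as in the linked data) would be:

```powershell
# Merge the exports and drop rows that are identical in every column.
$Data = Get-ChildItem -Filter 'Export*.csv' | Import-Csv -Encoding UTF8 |
    Sort-Object Name, CreationTime, Duration -Unique
$Data | Export-Csv -Path '.\Merged.csv' -NoTypeInformation -Encoding UTF8
```

Note that Sort-Object -Unique compares the listed properties, so two rows count as duplicates only when all three values match.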

1 Comment

I'm sorry for wasting your time, man
