
I am trying to find the difference between two files; specifically, I would like to know which entries are new in file_2. For example:

If a.txt contains:

a
b
c

And b.txt contains:

c
d
f

I would like to get d and f

I'm using the command: diff --changed-group-format="%>" --unchanged-group-format=''

mymach@dev-machine:~/test$ grep 'C:/Documents and Settings/pandep2/AppData/Local/Google/Chrome/User Data/CrashpadMetrics.pma~RF115cef5.TMP' file_1.log
C:/Documents and Settings/pandep2/AppData/Local/Google/Chrome/User Data/CrashpadMetrics.pma~RF115cef5.TMP

mymach@dev-machine:~/test$ grep 'C:/Documents and Settings/pandep2/AppData/Local/Google/Chrome/User Data/CrashpadMetrics.pma~RF115cef5.TMP' file_2.log
C:/Documents and Settings/pandep2/AppData/Local/Google/Chrome/User Data/CrashpadMetrics.pma~RF115cef5.TMP

mymach@dev-machine:~/test$ diff --changed-group-format="%>" --unchanged-group-format='' file_1.log file_2.log >diff_file.log

mymach@dev-machine:~/test$ grep 'C:/Documents and Settings/pandep2/AppData/Local/Google/Chrome/User Data/CrashpadMetrics.pma~RF115cef5.TMP' diff_file.log
C:/Documents and Settings/pandep2/AppData/Local/Google/Chrome/User Data/CrashpadMetrics.pma~RF115cef5.TMP

Since the same entry exists in both files, why does diff still report it?

  • Perhaps the location in each file is different? Commented Jun 14, 2018 at 11:05
  • $ echo "C:/Documents and Settings/pandep2/AppData/Local/Google/Chrome/User Data/CrashpadMetrics.pma~RF115cef5.TMP" > file_1.log $ cp file_1.log file_2.log $ diff --changed-group-format="%>" --unchanged-group-format='' file_1.log file_2.log $ It should work. Your files obviously aren't matching Commented Jun 14, 2018 at 11:08
  • I never thought that location played an important role; I always assumed diff doesn't need sorted files? Commented Jun 14, 2018 at 11:15
  • Does anyone have a better solution than diff for this scenario? I would really appreciate it. I have two files with more than 1 million entries each, and I would like to find the difference between them. Commented Jun 14, 2018 at 11:31
  • I found a command: bash -c 'diff --changed-group-format="%>" --unchanged-group-format='' <(sort file_1.log) <(sort file_2.log) > diff_file.log' Is this good? Commented Jun 14, 2018 at 11:38
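
The point raised in the comments, that diff is sensitive to line position, can be reproduced with a small sketch. The scratch files and their contents below are made up for illustration (they are not the poster's logs), and the example assumes GNU diff, since it uses the same group-format options as the question. It writes sorted copies to disk rather than using process substitution so that it also runs in a plain POSIX shell.

```shell
# Hypothetical scratch files: same two lines, different order.
tmp=$(mktemp -d)
printf 'x\nshared\n' > "$tmp/file_1.log"
printf 'shared\nx\n' > "$tmp/file_2.log"

# diff compares lines in sequence, so one of the two lines is reported
# as "new" even though both lines exist in both files.
# (diff exits with status 1 when the inputs differ, hence "|| true".)
unsorted=$(diff --changed-group-format='%>' --unchanged-group-format='' \
    "$tmp/file_1.log" "$tmp/file_2.log" || true)

# Sorting both inputs first puts identical lines at identical positions.
sort "$tmp/file_1.log" > "$tmp/sorted_1.log"
sort "$tmp/file_2.log" > "$tmp/sorted_2.log"
sorted=$(diff --changed-group-format='%>' --unchanged-group-format='' \
    "$tmp/sorted_1.log" "$tmp/sorted_2.log" || true)

printf 'unsorted inputs report: [%s]\n' "$unsorted"
printf 'sorted inputs report:   [%s]\n' "$sorted"
rm -rf "$tmp"
```

With unsorted inputs the report is non-empty; with sorted inputs it is empty, which is consistent with the sorted variant suggested in the last comment.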

1 Answer


In scenarios like this, you are better off using the comm command, which compares two sorted files line by line.

For the above scenario, I used:

comm -23 <(sort file_1.txt) <(sort file_2.txt)

This gives the lines that appear only in the first file (file_1.txt).

comm -13 <(sort a.txt) <(sort b.txt)

This gives the lines that appear only in the second file (b.txt here), i.e. the new entries.

comm -12 <(sort a.txt) <(sort b.txt)

This gives the lines common to both files.
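
As a rough end-to-end check, the three comm invocations can be run against the a.txt and b.txt contents from the question. This is only a sketch: the temporary directory and the `.sorted` file names are assumptions made for the example, and sorted copies are written to disk instead of using `<(sort ...)` so the script does not depend on bash.

```shell
# Recreate the example files from the question in a scratch directory.
tmp=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmp/a.txt"
printf 'c\nd\nf\n' > "$tmp/b.txt"

# comm expects both inputs to be sorted in the same collation order.
sort "$tmp/a.txt" > "$tmp/a.sorted"
sort "$tmp/b.txt" > "$tmp/b.sorted"

# -23 suppresses columns 2 and 3: lines only in the first file.
only_in_a=$(comm -23 "$tmp/a.sorted" "$tmp/b.sorted")
# -13 suppresses columns 1 and 3: lines only in the second file.
only_in_b=$(comm -13 "$tmp/a.sorted" "$tmp/b.sorted")
# -12 suppresses columns 1 and 2: lines present in both files.
in_both=$(comm -12 "$tmp/a.sorted" "$tmp/b.sorted")

printf 'new in b.txt:\n%s\n' "$only_in_b"
rm -rf "$tmp"
```

For the example data this prints d and f as the new entries. For very large inputs, running both sort and comm under the same locale (for example with `LC_ALL=C` set for each command) keeps their notion of sort order consistent.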
