Diff 2 files ignoring strings between @ and [

Question

I am comparing two files and want to ignore the alphanumeric characters after @ and before [. A line looks like this:

model.Field@d6b0d6b[fieldName

Can you use sed to remove everything between the @ and [ characters? If so, you can write the output to temporary files, run diff on those, and see where your changes are. It's a bit roundabout, but it works. Alternatively, you could use Perl.
– kmort, Commented Apr 18, 2013 at 15:48
You could write a diff tool in Perl. It wouldn't be too hard, as long as your data is VERY CLOSE to the same. You would just read a line of A and a line of B, use the Perl s/// operator to replace everything between @ and [ with nothing, and then compare the two lines. This becomes very complicated very quickly if your data is not VERY similar (for example, if you had to re-synchronize lines). For a one-shot job it's probably easiest to do what I suggested above or what @suspectus suggested below (they are the same thing).
– kmort, Commented Apr 18, 2013 at 17:25
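For what it's worth, the line-by-line idea described above can be sketched in plain shell instead of Perl. This is only an illustration under the same assumption (the two files line up one-to-one). normalize and compare_lockstep are made-up names, and sed stands in for Perl's s/// operator.

```shell
# normalize LINE: print LINE with the text between each "@" and the next "[" removed.
normalize() {
    printf '%s\n' "$1" | sed 's/@[^[]*\[/@[/g'
}

# compare_lockstep FILE_A FILE_B (hypothetical helper)
# Reads both files one line at a time and reports the line number plus both
# normalized lines wherever they differ. Assumes the files have the same
# number of lines; extra lines at the end of the longer file are not reported.
compare_lockstep() {
    n=0
    while IFS= read -r a <&3 && IFS= read -r b <&4; do
        n=$((n + 1))
        na=$(normalize "$a")
        nb=$(normalize "$b")
        if [ "$na" != "$nb" ]; then
            printf '%d:\n< %s\n> %s\n' "$n" "$na" "$nb"
        fi
    done 3< "$1" 4< "$2"
}
```

It starts one sed process per line, so it is slow on large files, and it cannot re-synchronize after an inserted or deleted line. That is the limitation the comment warns about.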
Please help me debug this error: "sed: -e expression #1, char 10: unterminated `s' command"
– Pooja Upadhyaya, Commented Apr 18, 2013 at 18:00
I suspect sed is interpreting your unescaped square bracket as the start of a character class. Write \[ instead of [. Take a look at gnu.org/software/sed/manual/html_node/Regular-Expressions.html
– kmort, Commented Apr 18, 2013 at 20:05
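A minimal sketch of the escaping fix, assuming a POSIX-style sed. strip_ids and strip_ids_noescape are hypothetical helper names. The second form spells the literal bracket as [[] so no backslash is needed, which may help if the command is built inside a double-quoted string in another language that consumes backslashes (that is a guess about the asker's setup, not something the thread confirms).

```shell
# "\[" matches a literal "[", so sed no longer sees an unclosed bracket expression.
strip_ids() {
    sed 's/@[^[]*\[/@[/g' "$@"
}

# Same substitution without a backslash: "[[]" is a bracket expression
# that contains only the character "[".
strip_ids_noescape() {
    sed 's/@[^[]*[[]/@[/g' "$@"
}

printf '%s\n' 'model.Field@d6b0d6b[fieldName' | strip_ids
# prints: model.Field@[fieldName
```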

glenn jackman · Accepted Answer · 2013-04-18 20:16:46Z

2

I would use process substitutions here:

diff <(sed 's/@[^[]*/@/' old) <(sed 's/@[^[]*/@/' new)

answered Apr 18, 2013 at 20:16

glenn jackman


Hi, this really helped, but only halfway. Those characters occur twice on each line: the first occurrence was removed, but diff still caught the difference at the second. The output is:

Pooja Upadhyaya
– Pooja Upadhyaya

2013-04-22 05:31:26 +00:00
Commented Apr 22, 2013 at 5:31
com.ibm.dataexplorer.bigindex.search.model.Field@[fieldName=com.ibm.dataexplorer.bigindex.search.model.FieldName@4cdc4cdc[fieldName=twitterMsg]
com.ibm.dataexplorer.bigindex.search.model.Field@[fieldName=com.ibm.dataexplorer.bigindex.search.model.FieldName@79ff79ff[fieldName=twitterMsg],fieldValues=[com.ibm.dataexplorer.bigindex.internal.search.model.ModifiableFieldValue@4ac84ac8[fieldValue=Tweet1]]]

Pooja Upadhyaya
– Pooja Upadhyaya

2013-04-22 05:32:41 +00:00
Commented Apr 22, 2013 at 5:32
Please help me remove the second occurrence as well.

Pooja Upadhyaya
– Pooja Upadhyaya

2013-04-22 05:33:13 +00:00
Commented Apr 22, 2013 at 5:33
Do you just need to add the "g" flag to the sed "s///" commands?

glenn jackman
– glenn jackman

2013-04-22 12:38:00 +00:00
Commented Apr 22, 2013 at 12:38
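To illustrate the "g" suggestion, here is a small sketch comparing the substitution with and without the flag on a line that has two @...[ runs. strip_first and strip_all are hypothetical names, and the sample line is shortened from the one in the thread.

```shell
# Without "g": only the first "@...[" run on each line is rewritten.
strip_first() {
    sed 's/@[^[]*/@/' "$@"
}

# With "g": every "@...[" run on each line is rewritten.
strip_all() {
    sed 's/@[^[]*/@/g' "$@"
}

line='a.Field@d6b0d6b[fieldName=a.FieldName@d700d70[fieldName=twitterMsg]'
printf '%s\n' "$line" | strip_first
# prints: a.Field@[fieldName=a.FieldName@d700d70[fieldName=twitterMsg]
printf '%s\n' "$line" | strip_all
# prints: a.Field@[fieldName=a.FieldName@[fieldName=twitterMsg]
```

With that helper, the Bash comparison from the answer would presumably become diff <(strip_all old) <(strip_all new).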


Rany Albeg Wein · Accepted Answer · 2013-04-26 15:43:15Z

I assume you are using Bash.

If v="model.Field@d6b0d6b[fieldName", then you can do the following:

# Extract the right side of "$v"
r="${v#*[}"
# Extract the left side of "$v"
l="${v%@*}"

# Combine
new_v="$l@[$r"; new_v1="$l$r"

You can use "$new_v" or "$new_v1", depending on whether you want to keep the @ and [ or not.

As Mr. Wijsman commented, my answer doesn't answer the question. Correct, I did not pay much attention to the title. Let's fix that by wrapping the code above in the following function, which prints a single file's data as required:

pf()
{
    # IFS= and -r keep leading whitespace and backslashes intact.
    while IFS= read -r line; do
        # Same idea as above: text before the last "@" plus text after the first "[".
        printf '%s\n' "${line%@*}${line#*[}"
    done < "$1"
}

Now, we can diff the two files by using the following command:

diff <(pf file1.txt) <(pf file2.txt)

Here is some sample output:

rany$ cat file1.txt

model.Field1@__A__[fieldName
model.FieldIAMDIFFERENT@__B__[fieldName
model.Field1@__C__[fieldName

rany$ cat file2.txt

model.Field1@__C__[fieldName
model.Field1@__D__[fieldName
model.Field1@__E__[fieldName

rany$ diff <(pf file1.txt) <(pf file2.txt)

2c2
< model.FieldIAMDIFFERENTfieldName
---
> model.Field1fieldName
rany$

As you can see, the fact that the lines are different between @ and [ is being ignored, and the only line which is different between the files is this:

model.FieldIAMDIFFERENTfieldName
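One caveat, sketched here rather than stated as fact about anyone's data: pf assumes exactly one @...[ run per line. The variant below, under the made-up name pf_all, loops over every run using only parameter expansion. Unlike pf it keeps the @ and [ markers in its output and passes lines without markers through unchanged.

```shell
# pf_all FILE (hypothetical variant of pf)
# Removes the text between every "@" and the next "[" on each line.
# Uses plain variables (no "local") so it also runs in a POSIX sh.
pf_all() {
    while IFS= read -r line || [ -n "$line" ]; do
        out=
        while :; do
            case $line in
                *@*\[*) ;;           # an "@" followed somewhere by "[" remains
                *) break ;;
            esac
            out=$out${line%%@*}'@['  # keep the text before the first "@"
            line=${line#*@}          # drop everything through the first "@"
            line=${line#*\[}         # drop everything through the next "["
        done
        printf '%s\n' "$out$line"
    done < "$1"
}
```

It can be used the same way as pf, for example diff <(pf_all file1.txt) <(pf_all file2.txt) in Bash.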

I'm sorry for not paying careful attention to your title as a part of the question.

This doesn't answer the question.

Tamara Wijsman
– Tamara Wijsman

2013-04-18 20:56:25 +00:00
Commented Apr 18, 2013 at 20:56

suspectus · Accepted Answer · 2013-04-18 17:28:30Z

1

Filter the data files, then perform the diff:

sed 's/@[^[]*\[/@[/g' file1 > file1.filt
sed 's/@[^[]*\[/@[/g' file2 > file2.filt
diff file1.filt file2.filt

An alternative is diff's -I option. A hunk is ignored in the comparison when every changed line in it matches the pattern. Choose a pattern that uniquely selects the lines which are not to be compared, e.g.:

diff -I 'dataexplorer.bigindex' file1 file2
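A small sketch of how -I behaves, assuming GNU diff (where the long form is --ignore-matching-lines). The function name, file contents, and the pattern 'Field@' are made up for illustration. It also shows the main drawback: -I works on whole lines, so a real change on a matching line is hidden as well.

```shell
# demo_ignore_matching: hypothetical demo of "diff -I RE" on throwaway files.
demo_ignore_matching() {
    tmp=$(mktemp -d)
    printf '%s\n' 'header' 'model.Field@aaa[fieldName' 'footer' > "$tmp/a"
    printf '%s\n' 'header' 'model.Field@bbb[fieldName' 'footer' > "$tmp/b"
    printf '%s\n' 'header' 'model.Field@bbb[CHANGED'   'footer' > "$tmp/c"

    echo '--- a vs b: only the id differs ---'
    diff -I 'Field@' "$tmp/a" "$tmp/b"
    echo '--- a vs c: a real change on a matching line is hidden too ---'
    diff -I 'Field@' "$tmp/a" "$tmp/c"
    rm -r "$tmp"
}

demo_ignore_matching
```

With GNU diff, both comparisons should print nothing after their banner line. That is why the sed-based filtering is usually the safer choice when the rest of the line still matters.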

edited Apr 18, 2013 at 17:28

answered Apr 18, 2013 at 15:48

suspectus


I have two files containing parts like com.ibm.dataexplorer.bigindex.search.model.Field@d6b0d6b[fieldName=com.ibm.dataexplorer.bigindex.search.model.FieldName@d700d70[fieldName=twitterMsg], and I want to diff them while ignoring the characters between @ and [, because these change on every run and would otherwise make my diff fail.

Pooja Upadhyaya
– Pooja Upadhyaya

2013-04-18 16:33:25 +00:00
Commented Apr 18, 2013 at 16:33
Does diff -I ... help?

suspectus
– suspectus

2013-04-18 17:28:52 +00:00
Commented Apr 18, 2013 at 17:28
Hi, I tried using sed but got this error: sed: -e expression #1, char 10: unterminated `s' command. Please explain what the cause is. My expression was $rc = systemTestSetup::execute("sed 's/\@.*[/@[/' $tmpDir/data/actual_out.tmp > $tmpDir/data/actual_out");

Pooja Upadhyaya
– Pooja Upadhyaya

2013-04-18 17:45:41 +00:00
Commented Apr 18, 2013 at 17:45
diff -I? How could it be used?

Pooja Upadhyaya
– Pooja Upadhyaya

2013-04-18 17:49:17 +00:00
Commented Apr 18, 2013 at 17:49
Please help me in debugging this error "sed: -e expression #1, char 10: unterminated `s' command "

Pooja Upadhyaya
– Pooja Upadhyaya

2013-04-18 17:55:18 +00:00
Commented Apr 18, 2013 at 17:55
