Though late, I thought I'd contribute a small Bash script that processes the filenames with Gawk. It flags near-duplicate names (names that differ by exactly one character inside the parentheses) among the files in a given directory. The processing remains O(n^2), although each pair of files is checked only once.
Run the script in the directory of interest, for instance in a directory that contains:
$ \ls -A -1
2323- (1236).suffix
23232(1234).suffix
23232 (1236).hsj
23232 (1236).suffix
hello(2001.10.29)fgh.ssh
hello(2002.10.29)fgh.ssh
23232(1236).suffix
23232(12 6).suffix
23232 (1286).suffix
23232 (1446).suffix
23232(3236).suffix
dwlkl(1234).sds
Now the script:
$ cat near_match.sh
#!/usr/bin/env bash
gawk '
{i++; a[i] = $0; next} # put the file names to check in array a
END {
    nfiles = i; # number of files to check
    for (i = 1; i <= nfiles-1; i++) {
        # split "before(inside)after" at the parentheses, then split the inside into characters
        lblkarr1 = split(a[i], blkarr1, "[()]");
        lparr1 = split(blkarr1[2], parr1, "");
        for (j = i+1; j <= nfiles; j++) {
            lblkarr2 = split(a[j], blkarr2, "[()]");
            lparr2 = split(blkarr2[2], parr2, "");
            mismatch = 0;
            # the "x" padding forces a string (not numeric) comparison
            if ("x"blkarr1[1]"x" == "x"blkarr2[1]"x" && "x"blkarr1[3]"x" == "x"blkarr2[3]"x" && lparr1 == lparr2) {
                for (k=1; k<=length(blkarr1[2]); k++) { if (parr1[k] != parr2[k]) mismatch++ };
                if (mismatch == 1) printf "dupes: %s <--> %s\n", a[i], a[j];
            }
        }
    }
}' <<< "$(\ls -A -1 -d -- *"("*")"*)"
exit 0
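To make the two split() calls in the script easier to follow, here is a small sketch that decomposes a single name the same way. The helper name show_parts is made up for this illustration, and the fallback to plain awk is my own assumption for systems without gawk (splitting on an empty separator is a gawk extension that other awks are merely assumed to share).

```shell
#!/usr/bin/env bash
# Illustration only. show_parts is a hypothetical helper, not part of near_match.sh.
# Assumption: gawk is preferred; fall back to plain awk if gawk is not installed.
awk_bin=$(command -v gawk || command -v awk)

show_parts() {
    "$awk_bin" -v name="$1" 'BEGIN {
        # blk[1] = text before "(", blk[2] = text inside, blk[3] = text after ")"
        split(name, blk, "[()]")
        # empty separator: one array element per character (a gawk extension)
        nchr = split(blk[2], chr, "")
        printf "before=[%s] inside=[%s] after=[%s] chars=%d\n", blk[1], blk[2], blk[3], nchr
    }'
}

show_parts "23232 (1236).suffix"
```

For the sample name this prints before=[23232 ] inside=[1236] after=[.suffix] chars=4. Two names are only compared character by character when their "before" and "after" parts are identical and their "inside" parts have the same length.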
Make the script executable and run it with no arguments:
$ ./near_match.sh
dupes: 23232(1234).suffix <--> 23232(1236).suffix
dupes: 23232 (1236).suffix <--> 23232 (1286).suffix
dupes: hello(2001.10.29)fgh.ssh <--> hello(2002.10.29)fgh.ssh
dupes: 23232(1236).suffix <--> 23232(12 6).suffix
dupes: 23232(1236).suffix <--> 23232(3236).suffix
- As specified by the OP, "dupes" are defined exclusively as names with a 1-character mismatch inside the parentheses. Adapting the solution so that character mismatches are detected anywhere in the filename is trivial.
- The parentheses can be anywhere in the filename, and the filename can have any suffix or none.
- The proposed solution seems to be robust against special characters inside the filename, except for additional parentheses.
- Changing the 1-character mismatch to a multiple-character mismatch is possible with minimal changes. It would be trivial to vary that number by means of an argument fed to the script. Ask if you must.
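As a rough sketch of that last point, the check could be wrapped in a function that takes the allowed number of mismatches as an argument. The function name near_match, its max parameter, and reading the names from stdin are all my own choices for this illustration, not part of the script above. It compares characters with substr() instead of a character array, and falls back to plain awk if gawk is missing (an assumption for portability).

```shell
#!/usr/bin/env bash
# Sketch only: near_match and its "max" argument are hypothetical names.
# Assumption: gawk is preferred; fall back to plain awk if gawk is not installed.
awk_bin=$(command -v gawk || command -v awk)

# Reads file names on stdin, one per line; $1 = maximum number of mismatches (default 1).
near_match() {
    local max=${1:-1}
    "$awk_bin" -v max="$max" '
    { a[++n] = $0 }                      # collect the names
    END {
        for (i = 1; i < n; i++) {
            split(a[i], b1, "[()]")
            for (j = i + 1; j <= n; j++) {
                split(a[j], b2, "[()]")
                # parts outside the parentheses must be identical (compared as strings)
                if (("x" b1[1]) != ("x" b2[1]) || ("x" b1[3]) != ("x" b2[3])) continue
                # parts inside the parentheses must have the same length
                if (length(b1[2]) != length(b2[2])) continue
                mismatch = 0
                for (k = 1; k <= length(b1[2]); k++)
                    if (substr(b1[2], k, 1) != substr(b2[2], k, 1)) mismatch++
                if (mismatch >= 1 && mismatch <= max)
                    printf "dupes: %s <--> %s\n", a[i], a[j]
            }
        }
    }'
}

# Example: allow up to 2 mismatching characters.
printf '%s\n' "23232(1234).suffix" "23232(1236).suffix" "23232(3236).suffix" | near_match 2
```

With a limit of 2, the example reports all three pairs, including 23232(1234).suffix <--> 23232(3236).suffix, which a limit of 1 would skip. To run it on a real directory, something like printf '%s\n' *"("*")"* | near_match 2 should do.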
HTH