I am trying to unzip a set of archives (3 zip files). These archives contain a lot of random files, duplicate files, etc. To do this, I first extract the list of member names into a text file, clean up that list, and then unzip the files.
Originally, I extracted all the necessary file names into a single text file at once, and then ran them through unzip.
So, for example, I have 3 zip files (file1, file2 and file3), all from the same source. First I extracted the list of their contents and cleaned it up into unzip.txt. Then I ran the following script on unzip.txt:
zipdir="/Volumes/filedir"
i=0
while IFS= read -r line; do
    i=$((i+1))
    # try to extract this member from every archive
    for f in "$zipdir"/*.zip; do
        # echo "$f"
        unzip -qq "$f" "$line"
    done
    echo "$i"
done < unzip.txt
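For reference, a minimal sketch of how a list like unzip.txt can be produced (this is not my exact cleanup step, just the idea; zipinfo -1 prints one member name per line):

zipdir="/Volumes/filedir"
for f in "$zipdir"/*.zip; do
    zipinfo -1 "$f"          # one member name per line
done | sort -u > unzip.txt   # drop names duplicated across the archives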
Not every file named in unzip.txt is present in all 3 zip files. So I received a lot of errors, and I suspect that asking each archive for members it does not contain is wasting a lot of unzipping time. I am concerned about this, as I have a much larger set of archives to run this on.
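One way around that (which I have not tried yet) would be to build a per-archive list first, so each zip is only asked for members it actually contains; grep -Fxf matches whole lines literally:

for f in "$zipdir"/*.zip; do
    # keep only the wanted names that this particular archive really contains
    zipinfo -1 "$f" | grep -Fxf unzip.txt > "wanted_$(basename "$f").txt"
done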
So I came up with a better way of handling the unzipping using ChatGPT, but I am not sure what errors I made:
function unzip1() {
    f="$1"
    echo "$f"
    while IFS= read -r line; do
        printf "%s\0" "$line"
    done < unzip1.txt | xargs -0 -n 1000 unzip "$f"
}
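I call it once per archive, e.g. (assuming unzip1.txt holds the cleaned-up list for file1):

unzip1 "$zipdir/file1.zip"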
To explain this: I now extract only the member names corresponding to file1 (into unzip1.txt), clean up that list, and then unzip those files in a single pass. Let's assume that the extraction of the lines into unzip1.txt is OK, because I can see that the total number of lines came out the same.
The lines contain a lot of unusual characters, so I pass each one through printf "%s\0" "$line" to terminate it with a NUL delimiter before handing it to xargs -0. The reason I have to use xargs is that the number of lines in unzip1.txt is very large, so I can't just put the whole list on the command line (e.g. via a "cat unzip1.txt" substitution) without hitting the argument-length limit.
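As far as I can tell, the read/printf loop above just NUL-terminates each line, so (assuming tr accepts the octal escape, as GNU and BSD tr do) it should be equivalent to:

tr '\n' '\000' < unzip1.txt | xargs -0 -n 1000 unzip "$f"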
At the end of this I did get an unzipped archive. However, when I tested both methods on the first 3 zip files, method 2 produced an extracted tree only half the size of what method 1 produced.
Am I doing anything incorrectly?
The size discrepancy seems to be related to xargs. With plain xargs -0 I get about 30% fewer files, with xargs -0 -n 1000 about 20% fewer, and with xargs -0 -n 10 about 5% fewer. Since I am not sure what to make of this, I switched to the following method of running one line at a time:
function unzip1() {
    f="$1"
    echo "$f"
    while IFS= read -r line; do
        unzip -qq "$f" "$line"
    done < unzip1.txt
}
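A quick way to see which files are missing (a sketch; method1/ and method2/ stand for the two extraction directories):

( cd method1 && find . -type f | sort ) > m1.txt
( cd method2 && find . -type f | sort ) > m2.txt
comm -23 m1.txt m2.txt    # files that method 1 extracted but method 2 did not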
There must be a bug in xargs, or in how it interacts with unzip. Is this something that should be reported to the xargs maintainers, or am I imagining the bug?
Comments:

I have been doing unzip *.zip for years.

That only works that way with unzip *.zip on Microsoft operating systems (where globbing is done, each in their own way, by applications). On other OSes, that wouldn't make sense. unzip having a MS-DOS-like API, unzip '*.zip' would work though in a Unix shell (at unzipping all the .zip files in the current directory, not unzipping the file literally called *.zip only).

The unzip *.zip command works fine on my Ubuntu. The command unrar x *.rar does not work, though. The "rar" files have to be "unrared" by name. And just to be complete, I am not sure what tar would do.

unzip *.zip in a directory that has more than one zip file (and whose names don't start with -) would be expanded to something like unzip file1.zip file2.zip file3.zip, which is a request to extract the file2.zip and file3.zip members of the file1.zip zip file.
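To illustrate the last comment with a hypothetical directory:

ls                 # file1.zip  file2.zip  file3.zip
unzip *.zip        # the shell expands this to: unzip file1.zip file2.zip file3.zip,
                   # i.e. extract members file2.zip and file3.zip from the archive file1.zip
unzip '*.zip'      # the quoted pattern reaches unzip itself, which then extracts every .zip archive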