I have a directory which contains both ISO-8859 and UTF8 encoded files. I want to convert all ISO files to UTF8 encoding, and leave the UTF8 files untouched. So far, I've got this:

for isoFile in `file exports/invoice/* | grep "ISO-8859"`; do iconv -f iso-8859-1 -t utf-8 "$isoFile" -o "$isoFile"; done

The problem is that file exports/invoice/* | grep "ISO-8859" returns a list of files in this format:

exports/invoice/2014.03547.html:                 HTML document, ISO-8859 text, with very long lines, with CRLF, LF line terminators

which of course will not work for iconv. I need to extract the filename from this string and run it through iconv.

2 Answers

This is easy with awk, using the colon as the field separator and printing the first field:

file exports/invoice/* | grep "ISO-8859" | awk -F':' '{print $1}'
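If you also want the conversion loop to survive filenames with spaces, one option is to read the awk output line by line instead of relying on word splitting. This is only a minimal sketch, assuming bash; the function names iso_names and convert_listed are made up for illustration, and it assumes every matching file really is ISO-8859-1. It writes to a temporary file first rather than passing the same path as both input and output. Note that splitting on ':' still breaks for filenames that contain a colon.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read `file` output on stdin and print the name of
# every entry whose line mentions ISO-8859, one name per line.
iso_names() {
    # -F':' splits on the colon that `file` prints after the filename.
    awk -F':' '/ISO-8859/ {print $1}'
}

# Hypothetical helper: read filenames on stdin (one per line) and convert
# each from ISO-8859-1 to UTF-8 via a temporary file.
convert_listed() {
    local name tmp
    while IFS= read -r name; do
        tmp=$(mktemp) || return 1
        if iconv -f iso-8859-1 -t utf-8 "$name" > "$tmp"; then
            # Overwrite the original only after a successful conversion;
            # cat keeps the original file's permissions.
            cat "$tmp" > "$name"
        fi
        rm -f -- "$tmp"
    done
}
```

It would be used as: file exports/invoice/* | iso_names | convert_listed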

1 Comment

Thanks for your answer. For completeness' sake, the whole command to convert the encoding becomes: for file in `file exports/invoice/* | grep "ISO-8859" | awk -F':' '{print $1}'`; do iconv -f iso-8859-1 -t utf-8 "$file" -o "$file"; done
You can extract the filename from this string in two steps:

cut -d' ' -f1   # select the first space-separated column

rev | cut -c 2- | rev   # remove the trailing ':' from the name

So the whole command to extract the filename becomes:

file exports/invoice/* | grep "ISO-8859" | cut -d' ' -f1 | rev | cut -c 2- | rev

For the example above, this returns: exports/invoice/2014.03547.html
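Because cut -d' ' splits on spaces, this breaks for filenames that contain spaces. A different route is to skip parsing altogether and ask file for just the encoding of each file. The following is a rough sketch, assuming bash and a version of file that supports -b and --mime-encoding; the function names needs_conversion and convert_dir are hypothetical, and I am assuming file reports ISO-8859 text as a charset name starting with "iso-8859-".

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed only for charset names of the form iso-8859-*.
needs_conversion() {
    case "$1" in
        iso-8859-*) return 0 ;;
        *)          return 1 ;;
    esac
}

# Hypothetical helper: convert every ISO-8859 file directly inside "$1".
convert_dir() {
    local f enc tmp
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        # -b omits the filename, --mime-encoding prints only the charset.
        enc=$(file -b --mime-encoding "$f")
        needs_conversion "$enc" || continue
        tmp=$(mktemp) || return 1
        # Assumes the source encoding is ISO-8859-1, as in the question.
        if iconv -f iso-8859-1 -t utf-8 "$f" > "$tmp"; then
            cat "$tmp" > "$f"
        fi
        rm -f -- "$tmp"
    done
}
```

It would be called as: convert_dir exports/invoice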

1 Comment

Thanks. I preferred the awk method suggested above because it is shorter and also works with spaces in filenames.
