I'm trying to convert an HTML file on my Linux server to a TXT file. The conversion itself runs fine, but the output still contains the HTML tags. Is there a command or option that strips all HTML tags during the conversion?
libreoffice4.2 --headless --convert-to txt 2000.html 2000.txt
Opening the file in the LibreOffice GUI and saving it as TXT already strips the HTML, so there must be a way to do the same from the command line.
I'm going to try a sed command with a regex to strip the tags from the HTML file instead of using LibreOffice. Will report back if it works.
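For reference, a minimal sketch of what that sed approach could look like. The helper name strip_tags is made up for illustration, and the regex is a rough assumption: it only removes tags that open and close on the same line, does not decode entities such as &amp;amp;, and leaves the contents of script and style blocks in place.

```shell
# Hypothetical helper: delete anything that looks like <...> on a single line.
# Reads the files given as arguments, or stdin when no argument is given.
strip_tags() {
    sed -e 's/<[^>]*>//g' "$@"
}

# Example usage with the file names from the question:
#   strip_tags 2000.html > 2000.txt
```

This is fine for simple, machine-generated pages, but regex stripping is fragile on real-world HTML, so check the result before relying on it.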