I'm trying to replace all HTML character codes in my HTML file using a for loop (I'm not sure this is the easiest approach) without changing the formatting of the original file. When I run the code below, the codes are not replaced. Does anyone know what could be wrong?

import re
tex=open('ALICE.per-txt.txt', 'r')

tex=tex.read()




for i in tex:
  if i =='&#245;':
      i=='õ'
  elif i == '&#231;':
      i=='ç'



with open('Alice1.replaced.txt', "w") as f:
    f.write(tex)
    f.close()
  • With for i in tex you iterate over single characters, but '&#245;' has 6 characters, so the comparison can never be true. You also never change tex: the loop only touches i, and the value of i is overwritten on each iteration. Commented Feb 1, 2021 at 14:58
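
The point made in the comment can be checked with a small sketch. It assumes the file uses numeric references such as `&#245;` and `&#231;`; the sample string and the `replacements` mapping are made up for illustration and are not taken from the original file.

```python
# Hypothetical sample text; the real file content is not shown in the question.
tex = "cora&#231;&#245;es"

# Iterating over a string yields single characters, never a 6-character code.
longest = max(len(ch) for ch in tex)

# Strings are immutable: str.replace returns a new string,
# so the result has to be assigned back to a name.
replacements = {"&#245;": "õ", "&#231;": "ç"}  # assumed mapping, for illustration
fixed = tex
for code, char in replacements.items():
    fixed = fixed.replace(code, char)

print(longest)
print(fixed)
```

A manual mapping like this only covers the codes listed in it, which is why a general-purpose decoder is usually the easier route.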

1 Answer

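You can use html.unescape from the standard library. It converts all named and numeric character references in a string to the corresponding Unicode characters, so no loop is needed.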

>>> import html
>>> html.unescape('&#245;')
'õ'
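
As a rough illustration of what html.unescape covers, the sketch below decodes the same character written as a named, a decimal, and a hexadecimal reference. The sample strings are invented for the example.

```python
import html

# Named, decimal, and hexadecimal references decode to the same character.
samples = ["&otilde;", "&#245;", "&#xF5;"]
decoded = [html.unescape(s) for s in samples]
print(decoded)

# Everything that is not a reference passes through unchanged,
# so line breaks and indentation are preserved.
text = "linha 1\n  cora&ccedil;&otilde;es\n"
unescaped = html.unescape(text)
print(repr(unescaped))

# Markup-significant references are decoded as well.
markup = html.unescape("&lt;b&gt; &amp;")
print(markup)
```

Because `&lt;`, `&gt;`, and `&amp;` are decoded too, this approach fits text that no longer has to be valid HTML afterwards.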

With your code:

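import html

# Read the whole file; the with block closes it automatically.
with open('ALICE.per-txt.txt', 'r') as f:
    html_text = f.read()

# Convert every character reference to the character it stands for.
html_text = html.unescape(html_text)

# Write to a separate file, as in the question, so the original stays untouched.
with open('Alice1.replaced.txt', 'w') as f:
    f.write(html_text)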

Please note that I opened the files with a with statement. This takes care of closing the file after the with block - something you forgot to do when reading the file.
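
The closing behaviour can be observed through the file object's closed attribute. This is only a sketch: the temporary path is hypothetical, and `encoding="utf-8"` is an assumption, since the question does not say how the file is encoded.

```python
import os
import tempfile

# Hypothetical scratch file so the sketch does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w", encoding="utf-8") as f:  # encoding is an assumption
    f.write("õ ç")
    open_inside = not f.closed  # the file is still open inside the block

closed_after = f.closed  # closed automatically on leaving the block

with open(path, "r", encoding="utf-8") as f:
    content = f.read()

print(open_inside, closed_after, content)
```

For the same reason, the explicit f.close() inside the with block in the question is redundant.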
