
I am trying to convert all the .html files under a directory into Markdown. After some searching I found a PyPI package called html2text.

Then I wrote a code block that converts one .html file into .md at a time.

import html2text as ht

text_maker = ht.HTML2Text()

# Read the source HTML as UTF-8.
with open('myHtmlFilePath.html', 'r', encoding='UTF-8') as f:
    htmlpage = f.read()

# Convert the HTML string to Markdown text.
text = text_maker.handle(htmlpage)

# Write the result with the same encoding so non-ASCII characters survive.
with open('myMarkdownFileName.md', 'w', encoding='UTF-8') as f:
    f.write(text)

Is there a way to wrap this code block in a loop so that it converts every file with the .html extension under a given directory into .md?

  • Does this help? Commented Dec 7, 2020 at 11:23
  • As a newbie in Python I need to use my noodle to figure out how to integrate your reference into my code. But thanks anyway, this definitely is useful though I haven't figured out how. Commented Dec 7, 2020 at 11:55

1 Answer

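If you are on Linux, you can use the find command:

Linux

import subprocess

root = "."

# Let find list every path under root, then keep only the .html files.
# Passing the arguments as a list avoids breaking on directory names with spaces.
listing = subprocess.run(["find", root], capture_output=True, text=True, check=True)
for path in listing.stdout.splitlines():
    if path.endswith(".html"):
        print(path)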

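Windows (os.walk is pure Python, so this also works on Linux and macOS)

import os

root = "."

# os.walk yields (dirpath, dirnames, filenames) for every directory under root.
for dirpath, dirnames, filenames in os.walk(root):
    for name in filenames:
        if name.endswith(".html"):
            # os.path.join picks the right separator for the current platform.
            print(os.path.join(dirpath, name))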
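
To tie this back to the question, here is a minimal sketch of the conversion loop, using pathlib rather than os.walk. The names convert_html_tree and convert are my own (hypothetical) choices, not part of html2text; the converter is passed in as a function so the loop itself does not depend on the library. It assumes the files are UTF-8 and that each .md should be written next to its .html source, overwriting any existing .md with the same name.

```python
from pathlib import Path
from typing import Callable, List


def convert_html_tree(root: str, convert: Callable[[str], str]) -> List[Path]:
    """Write a sibling .md file for every .html file under root (recursively)."""
    written = []
    for html_path in sorted(Path(root).rglob("*.html")):
        if not html_path.is_file():
            continue  # skip directories that happen to end in .html
        html = html_path.read_text(encoding="utf-8")
        md_path = html_path.with_suffix(".md")
        md_path.write_text(convert(html), encoding="utf-8")
        written.append(md_path)
    return written


def main() -> None:
    # html2text is the third-party package from the question
    # (assumed to be installed, e.g. via pip install html2text).
    import html2text

    text_maker = html2text.HTML2Text()
    for md_path in convert_html_tree(".", text_maker.handle):
        print(md_path)
```

With html2text installed, calling main() converts everything under the current directory; text_maker.handle is the same call the question already uses. Swap "." for whatever directory you need.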


3 Comments

Thanks so much. I wrote my script on Win10, though. How do I run your code on Win10?
@eyal best to stick to the more portable (cross-platform) solutions, as suggested in the link I posted above.
Ugh, please don't shell out for this. Python is perfectly capable of iterating over files itself. See the link provided by costaparas in the comments above, for starters.
