How to Use a Python Script to Batch-Convert HTML to Markdown [duplicate]

Question

I am trying to convert all the .html files under a directory into Markdown. After some Googling I discovered a PyPI package called html2text.

I then wrote a snippet that converts one .html file into .md at a time.

import html2text as ht

text_maker = ht.HTML2Text()

# Read the source HTML as UTF-8.
with open('myHtmlFilePath.html', 'r', encoding='UTF-8') as f:
    htmlpage = f.read()

# Convert the HTML string to Markdown text.
text = text_maker.handle(htmlpage)

# Write the result as UTF-8 too, so non-ASCII characters survive on any platform.
with open('myMarkdownFileName.md', 'w', encoding='UTF-8') as f:
    f.write(text)

Can I wrap this code in a loop so that it converts every file with the .html extension under a given directory into .md?

As a Python newbie I will need to work out how to integrate your reference into my code. Thanks anyway, this is definitely useful, even though I haven't figured out how to apply it yet.
– ChinaMahjongKing, Commented Dec 7, 2020 at 11:55

eyal · Accepted Answer · 2020-12-07 11:43:19Z

0

If you are on Linux, you can use the find command.

Linux

import os

root = "."

# Shell out to find, which prints every path under root, one per line.
for path in os.popen("find " + root).read().splitlines():
    if path.endswith(".html"):
        print(path)

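Windows (os.walk also works on Linux and macOS)

import os

root = "."

# os.walk yields (dirpath, dirnames, filenames) for every directory under root.
for dirpath, dirnames, filenames in os.walk(root):
    for name in filenames:
        if name.endswith(".html"):
            print(os.path.join(dirpath, name))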
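The snippets above only print the matching paths. As a minimal sketch of the batch conversion the question asks for, the traversal can be combined with html2text. The function name convert_html_tree and its convert parameter are hypothetical names chosen for illustration. The sketch assumes each .md file should be written next to its .html source and that all files are UTF-8. It uses pathlib's rglob in place of os.walk, which is a swapped-in but equivalent way to walk the tree.

```python
from pathlib import Path


def convert_html_tree(root, convert=None):
    """Write a .md file next to every .html file found under root.

    `convert` is a hypothetical hook: any callable that maps an HTML string
    to a Markdown string. It defaults to html2text's module-level function.
    """
    if convert is None:
        # Imported lazily so the function can be tried with another converter.
        import html2text
        convert = html2text.html2text

    written = []
    # rglob searches root and all of its subdirectories.
    for html_path in sorted(Path(root).rglob("*.html")):
        md_path = html_path.with_suffix(".md")
        html = html_path.read_text(encoding="utf-8")
        md_path.write_text(convert(html), encoding="utf-8")
        written.append(md_path)
    return written
```

Calling convert_html_tree(".") would convert everything under the current directory and return the list of .md paths it wrote. Note that an existing .md file with the same name is overwritten.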

edited Dec 7, 2020 at 11:43

answered Dec 7, 2020 at 11:28

eyal


3 Comments

ChinaMahjongKing Over a year ago

Thanks so much. I wrote my script on Windows 10, though. How do I do the same thing there?

costaparas Over a year ago

@eyal It is best to stick to the more portable (cross-platform) solutions, as suggested in the link I posted above.

Chris Over a year ago

Ugh, please don't shell out for this. Python is perfectly capable of iterating over files itself. See the link provided by costaparas in the comments above, for starters.
