I am trying to create a Word document that mixes converted Markdown strings with images. The file generates, but the Markdown elements end up at the very end of the document, after the pictures. The best workaround I have found for converting Markdown is included below (I'm also open to changing it). Here's my code:

import os
import pypandoc
from docx import Document

doc = Document()

def add_md(txt):
    TMP_FILE = 'tmp.docx'
    if os.path.exists(TMP_FILE):
        os.remove(TMP_FILE)

    pypandoc.convert_text(txt, 'docx', 'md', outputfile=TMP_FILE)
    tmp_doc = Document(TMP_FILE)
    os.remove(TMP_FILE)

    for element in tmp_doc.element.body:
        doc.element.body.append(element)

add_md('### first')
add_md('### second')
doc.add_picture('./family.jpg')

doc.save('sample.docx')

I would expect the output to be:

  1. first header
  2. second header
  3. photo

However the order is:

  1. photo
  2. first header
  3. second header

I know one workaround would be to manually create the XML for the image and insert it. However, I'd like to keep using the high-level docx functions and still be able to insert Markdown in this manner. Can I have my cake and eat it too?

  • Not an expert, but after a quick look at the docx code one question comes up. Why are you mixing direct manipulation of the xml (doc.element.body.append) with updates via the docx API? How do you know that going behind docx's back like that won't cause issues, such as messing with docx's idea of where the "end of document" is? Commented Jun 28, 2024 at 23:05
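A minimal sketch of one way to address the concern in that comment. It assumes (not confirmed in this thread) that the document body ends with a `w:sectPr` element and that python-docx places new paragraphs, including the one created by `add_picture`, before it. If so, `body.append(...)` puts the Markdown content after `w:sectPr`, and the picture lands ahead of it. The helper name `insert_before_sectpr` is hypothetical. It uses only `list(...)` and `insert(...)`, which both `xml.etree.ElementTree` and lxml elements support, so the demo below runs on a stdlib stand-in for the body.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
SECT_PR = f"{{{W_NS}}}sectPr"
P = f"{{{W_NS}}}p"


def insert_before_sectpr(body, elements):
    """Insert block elements ahead of the body's w:sectPr (hypothetical helper)."""
    children = list(body)
    # Position of the first w:sectPr child; fall back to the end of the body.
    index = next(
        (i for i, child in enumerate(children) if child.tag == SECT_PR),
        len(children),
    )
    # Snapshot the source first: with lxml, inserting moves the element out
    # of its old parent, which would disturb a live iteration.
    for element in list(elements):
        if element.tag == SECT_PR:
            continue  # do not copy the source document's section properties
        body.insert(index, element)
        index += 1


def make_paragraph(text):
    """Build a stand-in paragraph element carrying a label as its text."""
    p = ET.Element(P)
    p.text = text
    return p


# Stand-in for doc.element.body: an empty body that ends with w:sectPr.
body = ET.Element(f"{{{W_NS}}}body")
ET.SubElement(body, SECT_PR)

# Stand-ins for the two converted Markdown fragments.
insert_before_sectpr(body, [make_paragraph("first")])
insert_before_sectpr(body, [make_paragraph("second")])
# Stand-in for add_picture, assumed to add its paragraph before w:sectPr.
insert_before_sectpr(body, [make_paragraph("photo")])

labels = [child.text for child in body if child.tag != SECT_PR]
print(labels)
```

In the question's `add_md`, the loop would then become `insert_before_sectpr(doc.element.body, tmp_doc.element.body)`. I have not tested this against pandoc output; content that relies on styles, numbering, or relationships defined only in the temporary document may still not carry over.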
