
Pandoc doesn't render HTML tables well when converting to docx. I get the content of a request and render it using a template file. Then I use pypandoc like this:

 response = render(
   request,
   'template.html',
   {
     "field1": f1,
     "field2": f2,
   }
 )

 import pypandoc                                                                                            
 pypandoc.convert(source=response.content, format='html', to='docx', outputfile='output.docx')  

The template.html contains a table. In the docx file I get a table with its content separated out below it. Are there extra parameters to consider to solve this? Or does the pandoc conversion not support tables well yet? Is there a working example? Or is there an easier way to do it?


EDIT 1

Here is a more concise example, as a Python test snippet:

$ cat test-table.py 
#!/usr/bin/env python
test_table = """
 <p>Table with colgroup and col</p>
 <table border="1">
   <colgroup>
     <col style="background-color: #0f0">
     <col span="2">
   </colgroup>
   <tr>
     <th>Lime</th>
     <th>Lemon</th>
     <th>Orange</th>
   </tr>
   <tr>
     <td>Green</td>
     <td>Yellow</td>
     <td>Orange</td>
   </tr>
   <tr>
     <td>Fruit</td>
     <td>Fruit</td>
     <td>Fruit</td>
   </tr>
 </table>

   """
print("[test_table]")
print(test_table)
import pypandoc
pypandoc.convert(source=test_table, format='html', to='docx', outputfile='test-table.docx')  

## Write to html
with open('test-table.html', 'w') as fh:
  fh.write(test_table)

I open the html file:

$ firefox test-table.html 

and get the following html page:

[Screenshot: test-table.html opened in Firefox]

which is good. I also get the following docx document:

$ libreoffice test-table.docx 

[Screenshot: test-table.docx opened in LibreOffice]

Which is not good.

I exported the docx file into a pdf file and got the following output:

$ evince test-table.pdf 

[Screenshot: test-table.pdf opened in Evince]

Note that the images show the whole page; no scrolling is possible. Data from the second column doesn't appear at all. Any ideas?
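One experiment that might narrow this down: the test table declares its columns through `<colgroup>` and `<col span="2">`, so converting the same table without those declarations would show whether they play a role. This is only an assumption to test, not a confirmed cause. A minimal sketch of a helper that strips them from the HTML string (the name `strip_column_declarations` is made up for illustration):

```python
import re

# Matches a whole <colgroup>...</colgroup> block, case-insensitively and across lines.
_COLGROUP_RE = re.compile(
    r"<colgroup\b[^>]*>.*?</colgroup\s*>", re.IGNORECASE | re.DOTALL
)
# Matches stand-alone <col> tags; the \b keeps it from matching <colgroup>.
_COL_RE = re.compile(r"<col\b[^>]*>", re.IGNORECASE)


def strip_column_declarations(html: str) -> str:
    """Return html with <colgroup> blocks and stand-alone <col> tags removed."""
    without_groups = _COLGROUP_RE.sub("", html)
    return _COL_RE.sub("", without_groups)
```

The returned string could then be passed to the same pypandoc call in place of `test_table`. If the docx output changes, the column declarations are involved; if not, they can be ruled out.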


EDIT 2

Pandoc has been installed in a conda environment:

$ type pandoc
pandoc is hashed (/home/kaligne/local/miniconda3/bin/pandoc)

Pandoc version is:

$ pandoc -v
pandoc 2.2.1
Compiled with pandoc-types 1.17.4.2, texmath 0.11, skylighting 0.7.0.2
Default user data directory: /home/kaligne/.pandoc
Copyright (C) 2006-2018 John MacFarlane
Web:  http://pandoc.org
This is free software; see the source for copying conditions.
There is no warranty, not even for merchantability or fitness
for a particular purpose.

EDIT 3

I converted the docx file into txt:

$ docx2txt test-table.docx
$ cat test-table.txt 
Table with colgroup and col
Lime
Lemon
Green
Yellow
Fruit
Fruit

We can see that all the data are present. So I guess this has to do with how the information is being displayed.
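To check whether the text actually sits inside table cells or only next to the table, the docx structure can be read directly, since a .docx file is a zip archive whose main part is `word/document.xml`. A minimal sketch using only the standard library; the function name `table_cell_texts` is made up for illustration, and nested tables are not handled specially:

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml inside a .docx archive.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def table_cell_texts(docx_path):
    """Return, for each table in the document, a list of rows of cell texts."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    tables = []
    for tbl in root.iter(W_NS + "tbl"):
        rows = []
        for tr in tbl.iter(W_NS + "tr"):
            # Join all text runs found in each cell of this row.
            cells = [
                "".join(t.text or "" for t in tc.iter(W_NS + "t"))
                for tc in tr.iter(W_NS + "tc")
            ]
            rows.append(cells)
        tables.append(rows)
    return tables
```

Calling `table_cell_texts('test-table.docx')` and comparing the number of cells per row with the three columns of the HTML table would show whether a column is lost during conversion or only hidden when the document is displayed.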

  • Can you provide us with some of your input? Commented Jun 19, 2018 at 16:17
  • I provided a working Python snippet with outputs. Commented Jun 20, 2018 at 12:30
  • I have the same problem. Commented Oct 9, 2018 at 12:25
  • Facing the same issue: I get colours in the HTML file but don't see any colour in the doc file. My input is a pandas Styler object which I converted into an HTML string. Commented Sep 21, 2020 at 12:38
