
I’m using pypandoc to convert an RTF file to a PDF, but I’m running into an issue where the page structure and formatting are altered during the conversion. It looks like the output PDF is being generated using LaTeX, and this changes the layout compared to the original RTF file.

Here’s the code I’m using:

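import pypandoc


def rtf_to_pdf(input_file, output_file):
    """
    Convert an RTF file to PDF using pypandoc.

    Args:
        input_file (str): Path to the input RTF file.
        output_file (str): Path where the output PDF will be saved.

    Returns:
        str | None: The output path on success, None on failure.
    """
    try:
        # pandoc parses the RTF into its own document model and renders the PDF
        # through a PDF engine (pdflatex by default), so the page layout comes
        # from that engine rather than from the RTF file.
        pypandoc.convert_file(input_file, 'pdf', outputfile=output_file)
        print(f"Conversion successful! PDF saved as {output_file}")
        # The PDF is written to output_file, not returned, so return the path
        return output_file
    except Exception as e:
        print(f"An error occurred: {e}")
        return None


# Example usage
rtf_to_pdf('input_file.rtf', 'output_file.pdf')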

The issue is that the formatting (e.g., margins, alignment, spacing) does not match the original RTF document after conversion. I just want to retain the same format and layout as the RTF file without any changes.
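One partial workaround, sketched under assumptions: pandoc's document model has no notion of page size or margins, so the RTF page setup does not survive the conversion, but the LaTeX page that pandoc renders onto can at least be pinned with template variables. `papersize` and `geometry` are variables of pandoc's default LaTeX template, and `extra_args` is how pypandoc forwards extra command-line options; the helper names `pandoc_layout_args` and `rtf_to_pdf_with_layout` are hypothetical.

```python
def pandoc_layout_args(paper="a4", margin="1in", engine=None):
    """Build pandoc options that fix the LaTeX page size and margins."""
    # -V sets a template variable; geometry:margin=... is passed to LaTeX geometry
    args = ["-V", f"papersize={paper}", "-V", f"geometry:margin={margin}"]
    if engine:
        # e.g. "xelatex"; without this pandoc uses its default engine (pdflatex)
        args.append(f"--pdf-engine={engine}")
    return args


def rtf_to_pdf_with_layout(input_file, output_file, **layout):
    """Convert RTF to PDF with an explicitly chosen page layout (hypothetical helper)."""
    import pypandoc  # imported here so pandoc_layout_args works without pypandoc

    return pypandoc.convert_file(
        input_file,
        "pdf",
        outputfile=output_file,
        extra_args=pandoc_layout_args(**layout),
    )
```

For example, `rtf_to_pdf_with_layout("sample.rtf", "sample.pdf", paper="letter", margin="0.75in")`. The margin values still have to be read off the original document by hand, and alignment and spacing are not addressed by this at all.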

Question:

Is there a way to use pypandoc or another library to ensure the formatting and layout of the original RTF file is preserved in the PDF output? Are there any alternative approaches or libraries I can use for this kind of conversion where the layout stays exactly the same?

Any suggestions or insights would be much appreciated!

Here is a simple example of an RTF file I’m working with (sample.rtf):

It is an example test rtf-file to RTF2XML bean for testing

https://jeroen.github.io/files/sample.rtf

And here is a screenshot of the output: [screenshot]

I'm using Microsoft Word to view the RTF document, and I'm working on macOS with Python 3.11.7 and pandoc 3.4.

  • Can you verify your version by running pandoc -v from the command line? Make sure the correct Python environment is activated... Commented Oct 20, 2024 at 18:46
  • @YannisP. I'm using Python 3.11.7 and pandoc 3.4. Commented Oct 21, 2024 at 8:17
  • Nice trick @KJ! I reckon though the OP would like to see a solution through pypandoc. Commented Oct 21, 2024 at 12:55
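To run the check from the first comment from inside the same Python environment, a small sketch along these lines should do; the function names are hypothetical. Note that pypandoc can end up using a pandoc other than the one on PATH (for example a bundled binary), and `pypandoc.get_pandoc_version()` reports the one pypandoc actually uses.

```python
import re
import shutil
import subprocess


def parse_pandoc_version(text):
    """Extract a version tuple from the first line of `pandoc -v` output."""
    # The first line looks like "pandoc 3.4" ("pandoc.exe ..." on some Windows builds)
    match = re.match(r"pandoc(?:\.exe)?\s+(\d+(?:\.\d+)*)", text.strip())
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def installed_pandoc_version():
    """Return the version of the pandoc found on PATH, or None if there is none."""
    exe = shutil.which("pandoc")
    if exe is None:
        return None
    result = subprocess.run([exe, "-v"], capture_output=True, text=True, check=True)
    return parse_pandoc_version(result.stdout)
```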

1 Answer


Although the sample file has 15 lines of preamble before the document text starts, none of them define a page layout; the layout is only fixed when you print manually or set defaults in the Windows registry.
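To check this for a particular file, a small sketch can scan the RTF for the standard page-layout control words (`\paperw`, `\paperh`, `\margl`, `\margr`, `\margt`, `\margb`, all measured in twips, 1/1440 inch). The function name `page_layout` is hypothetical, and the regular expression is a simplification: it does not handle escaped backslashes and it skips section-level variants such as `\margrsxn`.

```python
import re

TWIPS_PER_INCH = 1440

# Page-size and margin control words from the RTF spec (values in twips)
LAYOUT_WORDS = ("paperw", "paperh", "margl", "margr", "margt", "margb")


def page_layout(rtf_text):
    """Map each page-layout control word found in rtf_text to its value in inches."""
    found = {}
    # A control word: backslash, lowercase letters, optional signed integer
    for word, value in re.findall(r"\\([a-z]+)(-?\d+)?", rtf_text):
        # Keep the first occurrence; ignore words given without a number
        if word in LAYOUT_WORDS and value and word not in found:
            found[word] = int(value) / TWIPS_PER_INCH
    return found
```

Usage: `page_layout(open("sample.rtf", encoding="latin-1").read())`. An empty dict means the file leaves page size and margins to the application or printer, as described above.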

Without such settings, the RTF default is text placed hard against the top-left corner, wrapping at whatever width the printer (its carriage width, etc.) sets.


You can use the native page layout, Letter or A4 depending on your locale, and let the native RTF writer (WordPad) print at that scale. The margins, however, will depend on previously saved settings.
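For reference, here is how those two native page sizes come out in twips (1/1440 inch), the unit RTF uses for `\paperw`/`\paperh` and the margin control words. `to_twips` and `PAPER_SIZES` are illustrative names only.

```python
TWIPS_PER_INCH = 1440
MM_PER_INCH = 25.4


def to_twips(value, unit="in"):
    """Convert inches (unit="in") or millimetres (unit="mm") to whole twips."""
    inches = value / MM_PER_INCH if unit == "mm" else value
    return round(inches * TWIPS_PER_INCH)


# (width, height) in twips for the two common native page sizes
PAPER_SIZES = {
    "letter": (to_twips(8.5), to_twips(11)),  # 8.5 x 11 in
    "a4": (to_twips(210, "mm"), to_twips(297, "mm")),  # 210 x 297 mm
}
```

Letter comes out as exactly 12240 x 15840 twips; A4 is 11906 x 16838 after rounding.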

We can therefore run these three commands (on Windows) and view that default as a PDF:

curl -O https://jeroen.github.io/files/sample.rtf
write /pt sample.rtf "Microsoft Print to PDF" "Microsoft Print to PDF" sample.pdf
sample.pdf
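The same steps can be driven from Python, which is closer to what the question asks for. This is a sketch that assumes Windows with WordPad (`write`) and the "Microsoft Print to PDF" printer available; the helper names are hypothetical, and the argument order simply mirrors the `write /pt` command above.

```python
import subprocess
import sys
import urllib.request

PDF_PRINTER = "Microsoft Print to PDF"


def wordpad_print_command(rtf_path, pdf_path, printer=PDF_PRINTER):
    """Build the `write /pt` command line used above (hypothetical helper)."""
    # Same order as the command above: file, printer name, driver name, output
    return ["write", "/pt", rtf_path, printer, printer, pdf_path]


def print_rtf_to_pdf(url, rtf_path="sample.rtf", pdf_path="sample.pdf"):
    """Download an RTF file and print it to PDF through WordPad (Windows only)."""
    if not sys.platform.startswith("win"):
        raise OSError("WordPad and Microsoft Print to PDF are Windows-only")
    urllib.request.urlretrieve(url, rtf_path)  # equivalent of `curl -O`
    # write.exe may hand off to WordPad and return before the PDF exists
    subprocess.run(wordpad_print_command(rtf_path, pdf_path), check=True)
```

Usage: `print_rtf_to_pdf("https://jeroen.github.io/files/sample.rtf")`. As with the manual commands, the resulting margins depend on whatever WordPad last had configured.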


So, to choose a particular print layout, we need to use Word or WordPad to set the margins, page size, etc.
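As an alternative to setting the layout by hand in Word or WordPad, the same settings could be written into the RTF itself as document-formatting control words, so that an RTF renderer has an explicit page size and margins to work from. This is a swapped-in technique sketched under assumptions: the function name is hypothetical, it inserts the words just before the first `\pard` on the assumption that this starts the body text (true for simple files, not guaranteed in general), and it does not remove any layout words the file already contains. pandoc would still ignore these settings.

```python
import re

TWIPS_PER_INCH = 1440


def set_page_layout(rtf_text, width_in, height_in, margin_in):
    """Return rtf_text with paper size and uniform margins written into it.

    Simplification: the control words are inserted just before the first
    \\pard, on the assumption that it starts the body text.
    """
    margin = round(margin_in * TWIPS_PER_INCH)
    words = f"\\paperw{round(width_in * TWIPS_PER_INCH)}"
    words += f"\\paperh{round(height_in * TWIPS_PER_INCH)}"
    # \margl, \margr, \margt, \margb: left, right, top, bottom margins in twips
    words += "".join(f"\\marg{side}{margin}" for side in "lrtb")
    # (?![a-z]) keeps longer control words such as \pardirnatural from matching
    match = re.search(r"\\pard(?![a-z])", rtf_text)
    if match is None:
        raise ValueError("no \\pard found; not sure where the body starts")
    return rtf_text[: match.start()] + words + rtf_text[match.start():]
```

For example, `set_page_layout(text, 8.5, 11, 1)` requests Letter paper with 1-inch margins on every side.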

