Add text to Existing PDF using Python

Question

I need to add some extra text to an existing PDF using Python. What is the best way to go about this, and what extra modules will I need to install?

Note: Ideally I would like to be able to run this on both Windows and Linux, but at a push Linux only will do.

Edit: pypdf and ReportLab look good but neither one will allow me to edit an existing PDF, are there any other options?

PyPDF2 allows you to copy every page and add a text annotation on top.
– Martin Thoma, commented Dec 20, 2022 at 18:01

Wtower · Accepted Answer · 2023-01-20 11:59:20Z

188

Example for Python 2.7:

from pyPdf import PdfFileWriter, PdfFileReader
import StringIO
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter

packet = StringIO.StringIO()
can = canvas.Canvas(packet, pagesize=letter)
can.drawString(10, 100, "Hello world")
can.save()

#move to the beginning of the StringIO buffer
packet.seek(0)

# create a new PDF with Reportlab
new_pdf = PdfFileReader(packet)
# read your existing PDF
existing_pdf = PdfFileReader(file("original.pdf", "rb"))
output = PdfFileWriter()
# add the "watermark" (which is the new pdf) on the existing page
page = existing_pdf.getPage(0)
page.mergePage(new_pdf.getPage(0))
output.addPage(page)
# finally, write "output" to a real file
outputStream = file("destination.pdf", "wb")
output.write(outputStream)
outputStream.close()

Example for Python 3.x:

from PyPDF2 import PdfWriter, PdfReader
import io
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter

packet = io.BytesIO()
can = canvas.Canvas(packet, pagesize=letter)
can.drawString(10, 100, "Hello world")
can.save()

# move to the beginning of the BytesIO buffer
packet.seek(0)

# create a new PDF with Reportlab
new_pdf = PdfReader(packet)
# read your existing PDF
existing_pdf = PdfReader(open("original.pdf", "rb"))
output = PdfWriter()
# add the "watermark" (which is the new pdf) on the existing page
page = existing_pdf.pages[0]
page.merge_page(new_pdf.pages[0])
# note: only this first page ends up in the output file
output.add_page(page)
# finally, write "output" to a real file
output_stream = open("destination.pdf", "wb")
output.write(output_stream)
output_stream.close()

edited Jan 20, 2023 at 11:59

Wtower

answered Jul 9, 2013 at 0:16

David Dehghan


10 Comments

Noufal Ibrahim Over a year ago

For python3, packet should be io.BytesIO and use PyPDF2 rather than pyPDF (which is unmaintained). Great answer!

mitenka Over a year ago

Thanks for sharing. It works great. One note: I believe it's better to use open instead of file.

alexis Over a year ago

Careful: The new document only includes the first page of the original! It's easy enough to copy the rest of the pages from existing_pdf to output, the sample code just doesn't.

DavidV Over a year ago

@alexis: How would you modify the code to put something on the second page of the pdf? I have a form that uses two pages and I am stuck on the first page. Thanks in advance.

PythonProgrammi Over a year ago

@DavidV substitute 0 with 1


dwelch · Accepted Answer · 2018-12-10 17:43:40Z

107

I know this is an older post, but I spent a long time trying to find a solution. I came across a decent one using only ReportLab and PyPDF so I thought I'd share:

1. Read your PDF using PdfFileReader(); we'll call this input.
2. Create a new PDF containing your text to add using ReportLab, and save this as a string object.
3. Read the string object using PdfFileReader(); we'll call this text.
4. Create a new PDF object using PdfFileWriter(); we'll call this output.
5. Iterate through input and apply .mergePage(text.getPage(0)) for each page you want the text added to, then use output.addPage() to add the modified pages to a new document.

This works well for simple text additions. See PyPDF's sample for watermarking a document.

Here is some code to answer the question below:

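from pyPdf import PdfFileReader
import StringIO
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter

packet = StringIO.StringIO()
can = canvas.Canvas(packet, pagesize=letter)
# draw whatever you need on the canvas, for example:
can.drawString(10, 100, "Hello world")
can.save()
packet.seek(0)
text = PdfFileReader(packet)

From here you can merge the page held in text onto the pages of another document.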

edited Dec 10, 2018 at 17:43

user8554766

answered Feb 1, 2010 at 23:28

dwelch

2 Comments

blaze Over a year ago

I recommend using PyPDF2 since it is more updated, also check their sample code: github.com/mstamy2/PyPDF2/blob/…

Anton Kukoba Over a year ago

This code will create a new pdf file and will skip all metadata. So it's not appending to existing pdf.

Patrick Maupin · Accepted Answer · 2015-10-20 23:13:33Z

19

pdfrw will let you read in pages from an existing PDF and draw them to a reportlab canvas (similar to drawing an image). There are examples for this in the pdfrw examples/rl1 subdirectory on github. Disclaimer: I am the pdfrw author.

edited Oct 20, 2015 at 23:13

answered Jul 11, 2015 at 4:47

Patrick Maupin

1 Comment

Patrick Maupin Over a year ago

FWIW, there are some more reportlab/pdfrw examples if you start following this link. I answered there, based on an answer in the dupe target.

user2243670 · Accepted Answer · 2014-03-05 11:51:36Z

9

cpdf will do the job from the command-line. It isn't python, though (afaik):

cpdf -add-text "Line of text" input.pdf -o output.pdf
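Although cpdf itself is not Python, it can be driven from a Python script. A minimal sketch, assuming the cpdf executable is installed and on the PATH, and using only the two options shown in the command above (the helper names are hypothetical):

```python
import subprocess


def build_cpdf_command(text, src, dst):
    """Build the argument list for the cpdf invocation shown above."""
    return ["cpdf", "-add-text", text, src, "-o", dst]


def add_text_with_cpdf(text, src, dst):
    """Run cpdf; raises CalledProcessError if cpdf reports a failure."""
    subprocess.run(build_cpdf_command(text, src, dst), check=True)
```

Passing the arguments as a list avoids shell quoting problems when the text contains spaces. If cpdf is not installed, `subprocess.run` raises `FileNotFoundError`.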

answered Mar 5, 2014 at 11:51

user2243670

3 Comments

Tim Small Over a year ago

Carefully check the license for cpdf before using - it's not Open Source.

jepe170 Aug 14 at 8:41

I have checked it carefully: the license is free, the code is open source. see: en.wikipedia.org/wiki/GNU_Affero_General_Public_License and see: github.com/coherentgraphics/cpdf-binaries/blob/master/… no offense :)

Tim Small Aug 18 at 16:11

@jepe170 It appears to have been relicensed on 23 July 2024. Prior to that (my comment March 2022) it used a non-OSF approved "source available" type license - the "Coherent Graphics Ltd Non-Commercial Use License Agreement".

Community · Accepted Answer · 2017-05-23 12:02:39Z

7

Leveraging David Dehghan's answer above, the following works in Python 2.7.13:

from PyPDF2 import PdfFileWriter, PdfFileReader

import StringIO

from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter

packet = StringIO.StringIO()
# create a new PDF with Reportlab
can = canvas.Canvas(packet, pagesize=letter)
can.drawString(290, 720, "Hello world")
can.save()

#move to the beginning of the StringIO buffer
packet.seek(0)
new_pdf = PdfFileReader(packet)
# read your existing PDF
existing_pdf = PdfFileReader("original.pdf")
output = PdfFileWriter()
# add the "watermark" (which is the new pdf) on the existing page
page = existing_pdf.getPage(0)
page.mergePage(new_pdf.getPage(0))
output.addPage(page)
# finally, write "output" to a real file
outputStream = open("destination.pdf", "wb")
output.write(outputStream)
outputStream.close()

edited May 23, 2017 at 12:02

CommunityBot

answered Apr 22, 2017 at 21:52

Ross Smith II

2 Comments

West Over a year ago

If the existing PDF has multiple pages, how do you ensure the output has the same number of pages, with the only difference being the edited page? I'm hoping there is a simpler way without making weird loops.

Martin Thoma Over a year ago

PyPDF2 is deprecated. Please use pypdf: pypdf.readthedocs.io/en/stable

TimH · Accepted Answer · 2024-08-02 17:17:29Z

5

2024 update: pypdf is active again, replacing the need for PyPDF2. Text annotations can be made with the pypdf package, as demonstrated by the FreeText annotation example in its documentation (code quoted below). This can do the trick for simple cases where you need to add text.

### From pypdf docs:
### https://pypdf.readthedocs.io/en/stable/user/adding-pdf-annotations.html#free-text

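import os

from pypdf import PdfReader, PdfWriter
from pypdf.annotations import FreeText

# RESOURCE_ROOT is not defined in the quoted snippet; point it at the folder holding your PDF
RESOURCE_ROOT = "."

# Fill the writer with the pages you want
pdf_path = os.path.join(RESOURCE_ROOT, "crazyones.pdf")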
reader = PdfReader(pdf_path)
page = reader.pages[0]
writer = PdfWriter()
writer.add_page(page)

# Create the annotation and add it
annotation = FreeText(
    text="Hello World\nThis is the second line!",
    rect=(50, 550, 200, 650),
    font="Arial",
    bold=True,
    italic=True,
    font_size="20pt",
    font_color="00ff00",
    border_color="0000ff",
    background_color="cdcdcd",
)
writer.add_annotation(page_number=0, annotation=annotation)

# Write the annotated file to disk
with open("annotated-pdf.pdf", "wb") as fp:
    writer.write(fp)

answered Aug 2, 2024 at 17:17

TimH

3 Comments

SaeX Over a year ago

Sadly doesn't render in Chrome at the moment of writing (github.com/py-pdf/pypdf/issues/2372).

robertspierre Jul 19 at 17:10

When passing a tuple of four numbers (x1, y1, x2, y2) as the rect argument, note that the (0, 0) coordinate in the PDF is the lower-left corner of the page; x grows to the right and y grows upward.

robertspierre Jul 19 at 17:14

Also note that, for some reason, if border_color is not None, font_color appears to be ignored and the font color will equal the border color.
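Since many tools measure from the top-left corner instead, a small conversion helper can make the coordinate note above easier to apply. This is a plain-Python sketch; `rect_from_top_left` is a hypothetical helper, and the page height of 792 points assumes a US Letter page.

```python
def rect_from_top_left(left, top, width, height, page_height):
    """Convert a box measured from the top-left corner to a PDF rect.

    Returns (x1, y1, x2, y2) in PDF coordinates, where the origin is the
    lower-left corner of the page and y grows upward.
    """
    x1 = left
    x2 = left + width
    y2 = page_height - top  # top edge of the box, measured from the bottom
    y1 = y2 - height        # bottom edge of the box
    return (x1, y1, x2, y2)


# A 150 x 100 pt box placed 50 pt from the left and 100 pt from the top.
rect = rect_from_top_left(50, 100, 150, 100, page_height=792)
print(rect)  # (50, 592, 200, 692)
```

The resulting tuple can be passed as the rect argument of the FreeText annotation.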

VIGNESH E · Accepted Answer · 2023-03-08 16:02:13Z

As of this writing, PyPDF2 has deprecated PdfFileReader, PdfFileWriter and a few other names in favor of renamed classes and methods (such as PdfReader and PdfWriter), and has replaced methods like getPage() with attributes of PdfReader (such as pages).

Here is a very simple class to add text to an existing PDF file (usage is demonstrated at the end):

from PyPDF2 import PdfWriter, PdfReader, Transformation
import io
from reportlab.pdfgen.canvas import Canvas

class GenerateFromTemplate:
    def __init__(self,template):
        self.template_pdf = PdfReader(open(template, "rb"))
        self.template_page= self.template_pdf.pages[0]

        self.packet = io.BytesIO()
        self.c = Canvas(self.packet,pagesize=(self.template_page.mediabox.width,self.template_page.mediabox.height))

    
    def addText(self,text,point):
        self.c.drawString(point[0],point[1],text)

    def merge(self):
        self.c.save()
        self.packet.seek(0)
        result_pdf = PdfReader(self.packet)
        result = result_pdf.pages[0]

        self.output = PdfWriter()

        op = Transformation().rotate(0).translate(tx=0, ty=0)
        result.add_transformation(op)
        self.template_page.merge_page(result)
        self.output.add_page(self.template_page)
    
    def generate(self,dest):
        outputStream = open(dest,"wb")
        self.output.write(outputStream)
        outputStream.close()

"""
Use as:
gen = GenerateFromTemplate("template.pdf")
gen.addText("Hello!",(100,200))
gen.addText("PDF!",(100,300))
gen.merge()
gen.generate("Output.pdf")
"""

Hope this helps.

ConMan77 · Accepted Answer · 2022-02-27 07:26:02Z

0

Don't use mergePage; it may not work for some PDFs. Use mergeRotatedTranslatedPage instead:

from PyPDF2 import PdfFileWriter, PdfFileReader
import io
from reportlab.pdfgen.canvas import Canvas

page_to_merge = 0 #Refers to the First page of PDF 
xcoor = 250 #To be changed according to your pdf
ycoor = 650 #To be changed according to your pdf

input_pdf = PdfFileReader(open("Source.pdf", "rb"))
page_count = input_pdf.getNumPages()
inputpdf_page_to_be_merged = input_pdf.getPage(page_to_merge)

packet = io.BytesIO()
c = Canvas(packet,pagesize=(inputpdf_page_to_be_merged.mediaBox.getWidth(),inputpdf_page_to_be_merged.mediaBox.getHeight()))
c.drawString(xcoor,ycoor,"Hello World")
c.save()
packet.seek(0)

overlay_pdf = PdfFileReader(packet)
overlay = overlay_pdf.getPage(0)

output = PdfFileWriter()

for PAGE in range(page_count):
    if PAGE == page_to_merge:
        inputpdf_page_to_be_merged.mergeRotatedTranslatedPage(overlay, 
                inputpdf_page_to_be_merged.get('/Rotate') or 0, 
                overlay.mediaBox.getWidth()/2, overlay.mediaBox.getWidth()/2)
        output.addPage(inputpdf_page_to_be_merged)
    
    else:
        Page_in_pdf = input_pdf.getPage(PAGE)
        output.addPage(Page_in_pdf)

outputStream = open("destination.pdf", "wb")
output.write(outputStream)
outputStream.close()

edited Feb 27, 2022 at 7:26

answered Feb 23, 2022 at 10:37

ConMan77

2 Comments

thinker3 Over a year ago

What version the PyPDF2 is in this answer?

ConMan77 Over a year ago

@thinker3 pypdf2 version is 1.26.0

SaeX · Accepted Answer · 2024-11-03 18:58:47Z

The combination of pypdf and fpdf2 can add text as an overlay to an existing PDF.

Ensure both packages are installed: pip install pypdf fpdf2. Tested with fpdf2==2.8.1 and pypdf==5.1.0.

Then adjust the below to your needs:

import io, os

from fpdf import FPDF
from pypdf import PdfReader, PdfWriter

RESOURCE_ROOT = os.path.abspath('c:\\temp\\')
input_path = os.path.join(RESOURCE_ROOT, "inputfile.pdf")
output_path = os.path.join(RESOURCE_ROOT, "outputfile.pdf")

def new_content():
    pdf = FPDF()
    pdf.add_page()
    pdf.set_font("Helvetica", "I", 30)  # Helvetica font, italic, size 30pt
    pdf.text(40, 100, "This is some overlay text")  # X and Y coordinates, text to add
    return io.BytesIO(pdf.output())


reader = PdfReader(input_path)
page_overlay = PdfReader(new_content()).pages[0]
reader.pages[0].merge_page(page2=page_overlay)  # Overlay on page #0 (first page)

writer = PdfWriter()
writer.append_pages_from_reader(reader)
writer.write(output_path)

Community · Accepted Answer · 2014-09-15 13:01:29Z

-4

If you're on Windows, this might work:

PDF Creator Pilot

There's also a whitepaper of a PDF creation and editing framework in Python. It's a little dated, but maybe can give you some useful info:

Using Python as PDF Editing and Processing Framework

edited Sep 15, 2014 at 13:01

CommunityBot

answered Jul 24, 2009 at 21:14

thedz

1 Comment

Frozenskys Over a year ago

The white paper looks good but is a little light on code, and I don't really have the resource to implement a whole PDF framework myself! ;)

aehlke · Accepted Answer · 2009-07-24 21:03:21Z

-5

You may have better luck breaking the problem down into converting PDF into an editable format, writing your changes, then converting it back into PDF. I don't know of a library that lets you directly edit PDF but there are plenty of converters between DOC and PDF for example.

answered Jul 24, 2009 at 21:03

aehlke

4 Comments

Frozenskys Over a year ago

Problem is that I only have the source in PDF (from a 3rd party) and PDF -> DOC -> PDF will lose a lot in the conversion. Also I need this to run on Linux so DOC may not be the best choice.

aehlke Over a year ago

I believe Adobe keeps PDF editing capability pretty closed and proprietary so that they can sell licenses for their better versions of Acrobat. Maybe you can find a way to automate the usage of Acrobat Pro to edit it, using some kind of macro interface.

aehlke Over a year ago

If the parts you want to write to are form fields, there are XML interfaces to editing them - otherwise I can't find anything.

Frozenskys Over a year ago

No I just wanted to add a few lines of text to each page.
