Replace text in PDF with Python [duplicate]

Question

I am trying to replace text strings in a PDF file using the Python code below.

import PyPDF2

# Open the source PDF with the legacy PyPDF2 reader/writer API
reader = PyPDF2.PdfFileReader('document.pdf', strict=True, warndest=None, overwriteWarnings=True)
writer = PyPDF2.PdfFileWriter()
replacements = {'old': 'new'}

for p in range(reader.getNumPages()):
    page = reader.getPage(p)
    contents = page.getContents()
    bdata = contents.getData()
    ddata = bdata.decode('utf-8')  # decoded data (string)
    for key, value in replacements.items():
        ddata = ddata.replace(key, value)

    contents.setData(ddata.encode('utf-8'))  # Error occurs here

    # page.setContents(contents)
    writer.addPage(page)

with open("result.pdf", 'wb') as f:
    writer.write(f)

The problem is that contents.setData raises PdfReadError: Creating EncodedStreamObject is not currently supported.

Can anybody think of a workaround?

P.S. Applying the method described here did create a new PDF file but without replacements.

As an aside: using UTF-8 to decode the content stream is a sure way to damage the stream data in many PDFs.
– mkl, Commented Jan 6, 2021 at 22:02
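
To illustrate the point of the comment, here is a minimal sketch in plain Python, with no PDF library involved. The `raw` bytes and the `roundtrip` helper are made up for the demonstration; they only stand in for stream data that contains bytes which are not valid UTF-8. It shows that a strict UTF-8 decode fails, a lenient one alters the bytes, and a Latin-1 round trip returns them unchanged.

```python
# Illustrative bytes only (an assumption, not taken from a real PDF):
# a few ASCII operators plus bytes >= 0x80 that are not valid UTF-8.
raw = b"BT (caf\xe9) Tj ET \x80\xff"


def roundtrip(data: bytes, encoding: str, errors: str = "strict") -> bytes:
    """Decode and re-encode data, returning the resulting bytes."""
    return data.decode(encoding, errors).encode(encoding, errors)


# Strict UTF-8 decoding fails outright on bytes that are not valid UTF-8.
try:
    roundtrip(raw, "utf-8")
    utf8_strict_ok = True
except UnicodeDecodeError:
    utf8_strict_ok = False

# Lenient UTF-8 decoding succeeds but silently changes the bytes.
utf8_lossy = roundtrip(raw, "utf-8", errors="replace")

# Latin-1 maps every byte 0-255 to one code point, so the round trip is exact.
latin1_exact = roundtrip(raw, "latin-1")

print(utf8_strict_ok, utf8_lossy == raw, latin1_exact == raw)
```

A byte-preserving round trip only avoids corrupting the data. It does not make the search itself reliable, since text in a content stream may be stored in a font-specific encoding and need not appear as the literal string being searched for.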

ishahak · Accepted Answer · 2022-11-06 16:29:43Z

As explained here, this isn't a good idea. You might consider building an HTML version of the page you want and then using wkhtmltopdf to convert it to PDF.
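
A rough sketch of that approach might look like the following. It assumes the page content is available as an HTML template, which the question does not provide, so `TEMPLATE`, `render_html`, `build_command`, and `html_to_pdf` are hypothetical names introduced for illustration. The only external piece relied on is the basic `wkhtmltopdf <input> <output>` command-line form, and the binary must be installed separately.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical HTML source for the page; in practice this would be
# whatever template the PDF was (or could be) generated from.
TEMPLATE = (
    '<html><head><meta charset="utf-8"></head>'
    "<body><p>This is the old text.</p></body></html>"
)


def render_html(template: str, replacements: dict) -> str:
    """Apply plain string replacements to the HTML source."""
    for old, new in replacements.items():
        template = template.replace(old, new)
    return template


def build_command(html_path, pdf_path) -> list:
    """Build the basic 'wkhtmltopdf <input> <output>' command line."""
    return ["wkhtmltopdf", str(html_path), str(pdf_path)]


def html_to_pdf(html: str, pdf_path: str) -> bool:
    """Write the HTML next to pdf_path and convert it.

    Returns False without doing anything if wkhtmltopdf is not on PATH.
    """
    if shutil.which("wkhtmltopdf") is None:
        return False
    html_path = Path(pdf_path).with_suffix(".html")
    html_path.write_text(html, encoding="utf-8")
    subprocess.run(build_command(html_path, pdf_path), check=True)
    return True


page_html = render_html(TEMPLATE, {"old": "new"})
```

Calling `html_to_pdf(page_html, "result.pdf")` would then write the PDF, provided wkhtmltopdf is installed. The replacement happens on the HTML source, so no PDF content stream is edited at all.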
