I am trying to replace text strings in a PDF file using the Python code below.

import PyPDF2
reader = PyPDF2.PdfFileReader('document.pdf', strict=True, warndest=None, overwriteWarnings=True)
writer = PyPDF2.PdfFileWriter()
replacements = {'old' : 'new'}

P = reader.getNumPages()
for p in range(P):
    page = reader.getPage(p)
    contents = page.getContents()
    bdata = contents.getData()
    ddata = bdata.decode('utf-8') #decoded data (string)  
    for old, new in replacements.items():
        ddata = ddata.replace(old, new)
    
    contents.setData(ddata.encode('utf-8')) #Error occurs here
    
    #page.setContents(contents)
    writer.addPage(page)

with open("result.pdf", 'wb') as f:
    writer.write(f)

The problem is that contents.setData raises PdfReadError: Creating EncodedStreamObject is not currently supported.

Can anybody think of a workaround?

P.S. Applying the method described here did create a new PDF file but without replacements.

    As an aside: decoding the content stream as UTF-8 is a sure way to damage the stream data in many PDFs. Commented Jan 6, 2021 at 22:02
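To illustrate the point in the comment above, here is a minimal sketch using a made-up byte string (not taken from any real PDF). It assumes the stream contains a byte that is not valid UTF-8, which is common because content streams are binary data rather than UTF-8 text. Latin-1 is used here only as an example of a byte-preserving codec; it is not something the commenter recommended.

```python
# Hypothetical content-stream fragment; 0xE9 followed by ")" is not valid UTF-8.
raw = b"(caf\xe9) Tj"

# Strict UTF-8 decoding fails outright on such bytes.
try:
    raw.decode("utf-8")
    utf8_strict_ok = True
except UnicodeDecodeError:
    utf8_strict_ok = False

# Decoding with errors="replace" succeeds but silently changes the bytes.
lossy = raw.decode("utf-8", errors="replace").encode("utf-8")

# Latin-1 maps every byte to exactly one code point, so the round trip is lossless.
lossless = raw.decode("latin-1").encode("latin-1")

print(utf8_strict_ok, lossy == raw, lossless == raw)
```

A lossless round trip only keeps the bytes intact; it does not mean the text shown on the page appears literally in the stream, since fonts may use custom encodings.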

1 Answer

As explained here, this isn't a good idea. You might consider building an HTML version of the page you want and then using wkhtmltopdf to convert it to PDF.
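The suggestion above could be sketched roughly as follows. This assumes the page content is available as HTML source and that the wkhtmltopdf executable is installed and on the PATH. The function names and file names are hypothetical and chosen only for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def apply_replacements(text, replacements):
    """Apply each old -> new substitution to the HTML source text."""
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text


def build_command(html_path, pdf_path, executable="wkhtmltopdf"):
    """Basic wkhtmltopdf invocation: executable, input file, output file."""
    return [executable, str(html_path), str(pdf_path)]


def html_to_pdf(html_path, pdf_path):
    """Convert an HTML file to PDF; raises if wkhtmltopdf is not installed."""
    if shutil.which("wkhtmltopdf") is None:
        raise RuntimeError("wkhtmltopdf was not found on PATH")
    subprocess.run(build_command(html_path, pdf_path), check=True)


# Do the replacement on the HTML text, where plain string replacement is safe.
html = apply_replacements("<p>old text</p>", {"old": "new"})

# Hypothetical usage, left commented out because it needs wkhtmltopdf installed:
# Path("page.html").write_text(html, encoding="utf-8")
# html_to_pdf("page.html", "result.pdf")
```

The replacement happens before conversion, so the PDF content stream never has to be edited at all.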
