117

I would like to take a multi-page pdf file and create separate pdf files per page.

I have downloaded reportlab and have browsed the documentation, but it seems aimed at pdf generation. I haven't yet seen anything about processing PDF files themselves.

Is there an easy way to do this in python?

9 Answers

243
from PyPDF2 import PdfWriter, PdfReader

inputpdf = PdfReader(open("document.pdf", "rb"))

for i in range(len(inputpdf.pages)):
    output = PdfWriter()
    output.add_page(inputpdf.pages[i])
    with open("document-page%s.pdf" % i, "wb") as outputStream:
        output.write(outputStream)

etc.

5 Comments

Use with open("document-page%s.pdf" % (i+1), "wb") as outputStream: if you want your files to be numbered starting from 1 instead of 0.
If you get PdfReadError: Multiple definitions in dictionary at byte, you can create the reader with strict=False: pdf = PdfFileReader(open("document.pdf", "rb"), strict=False)
If I want to split a 100-page PDF into files of 2 pages each instead of 1 page each, do I change the for loop to save two pages at a time?
The issue I ran into with this code was that, as I iterated through the document extracting each page, each output also contained all previous pages. For example, splitting a 5-page document "test.pdf" into "test_(page num).pdf" files gave a correct first file (one page), but the next contained pages 1-2, the next pages 1-3, and so on. My fix was to create another PdfFileReader inside the for loop.
I was in a hurry and the script worked great! Before that, though, I had some problems with the PyPDF2 version. To make it run I had to downgrade with: pip install PyPDF2==2.12.1 (I didn't have time to work out a proper fix).
13

Updated solution for PyPDF2 3.0.0 that splits out a range of pages.

from PyPDF2 import PdfReader, PdfWriter

file_name = r'c:\temp\junk.pdf'
pages = (121, 130)

reader = PdfReader(file_name)
writer = PdfWriter()
page_range = range(pages[0], pages[1] + 1)

for page_num, page in enumerate(reader.pages, 1):
    if page_num in page_range:
        writer.add_page(page)

with open(f'{file_name}_page_{pages[0]}-{pages[1]}.pdf', 'wb') as out:
    writer.write(out)

11

I was missing a solution that splits the PDF into two parts which together contain all the pages, so I am adding mine in case somebody is looking for the same thing:

from PyPDF2 import PdfFileWriter, PdfFileReader

def split_pdf_to_two(filename,page_number):
    pdf_reader = PdfFileReader(open(filename, "rb"))
    try:
        assert page_number < pdf_reader.numPages
        pdf_writer1 = PdfFileWriter()
        pdf_writer2 = PdfFileWriter()

        for page in range(page_number):
            pdf_writer1.addPage(pdf_reader.getPage(page))

        for page in range(page_number,pdf_reader.getNumPages()):
            pdf_writer2.addPage(pdf_reader.getPage(page))

        with open("part1.pdf", 'wb') as file1:
            pdf_writer1.write(file1)

        with open("part2.pdf", 'wb') as file2:
            pdf_writer2.write(file2)

    except AssertionError as e:
        print("Error: The PDF you are cutting has fewer pages than you want to cut!")
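
With the renamed API in PyPDF2 3.x and pypdf (PdfReader, PdfWriter, add_page, as used in the other answers), the same two-way split could be sketched roughly as below. This is not the original author's code: the helper name two_part_ranges, the out1/out2 parameters, and the choice to raise ValueError instead of printing a message are assumptions made for illustration, and the sketch assumes pypdf is installed.

```python
def two_part_ranges(total_pages, split_at):
    """Return the 0-based page index ranges for the two parts.

    Hypothetical helper: the first part gets pages [0, split_at),
    the second part gets pages [split_at, total_pages).
    """
    if not 0 < split_at < total_pages:
        raise ValueError("split_at must leave at least one page in each part")
    return range(0, split_at), range(split_at, total_pages)


def split_pdf_to_two(filename, split_at, out1="part1.pdf", out2="part2.pdf"):
    # Imported here so two_part_ranges() can be used without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(filename)
    first, second = two_part_ranges(len(reader.pages), split_at)

    for indices, out_name in ((first, out1), (second, out2)):
        writer = PdfWriter()  # one fresh writer per output file
        for index in indices:
            writer.add_page(reader.pages[index])
        with open(out_name, "wb") as out_file:
            writer.write(out_file)
```

Calling split_pdf_to_two("document.pdf", 3) would put the first three pages in part1.pdf and the remaining pages in part2.pdf.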

10

The PyPDF2 package gives you the ability to split up a single PDF into multiple ones.

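import os
from PyPDF2 import PdfFileReader, PdfFileWriter

# Input file, and the prefix used for the output file names
path = 'pdf_forms/myform.pdf'
fname = os.path.splitext(os.path.basename(path))[0]

pdf = PdfFileReader(path)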
for page in range(pdf.getNumPages()):
    pdf_writer = PdfFileWriter()
    pdf_writer.addPage(pdf.getPage(page))

    output_filename = '{}_page_{}.pdf'.format(fname, page+1)

    with open(output_filename, 'wb') as out:
        pdf_writer.write(out)

    print('Created: {}'.format(output_filename))

Changes for PyPDF2 3.0.0

import os
from PyPDF2 import PdfReader, PdfWriter
path = 'pdf_forms/myform.pdf'
fname = 'fname'

pdf = PdfReader(path)
for page in range(len(pdf.pages)):
    pdf_writer = PdfWriter()
    pdf_writer.add_page(pdf.pages[page])

    output_filename = 'pdf_forms/splitted/{}_page_{}.pdf'.format(fname, page+1)

    with open(output_filename, 'wb') as out:
        pdf_writer.write(out)

    print('Created: {}'.format(output_filename))

Source: https://www.blog.pythonlibrary.org/2018/04/11/splitting-and-merging-pdfs-with-python/

2 Comments

added page number selection and wrapped it in a function: gist.github.com/shanecandoit/b3b90fa4532aeedce6400c0084981933
I want to split a 169-page PDF into 85 files containing 2 pages each. Is it possible to specify 2 pages per file instead of 1?
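
For the pages-per-file question raised in the comments, one possible approach is to compute the page index ranges first and then write one file per range. The following is only a sketch under stated assumptions: the names chunk_ranges and split_in_chunks and the output naming scheme are invented here, and it assumes pypdf is installed (the same calls exist in PyPDF2 3.x).

```python
from pathlib import Path


def chunk_ranges(total_pages, pages_per_file):
    """Return (start, stop) 0-based index pairs; stop is exclusive."""
    if pages_per_file < 1:
        raise ValueError("pages_per_file must be at least 1")
    return [
        (start, min(start + pages_per_file, total_pages))
        for start in range(0, total_pages, pages_per_file)
    ]


def split_in_chunks(input_path, pages_per_file=2):
    # Imported here so chunk_ranges() can be used without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(input_path)
    stem = Path(input_path).stem
    written = []
    for start, stop in chunk_ranges(len(reader.pages), pages_per_file):
        writer = PdfWriter()  # fresh writer so pages do not accumulate
        for index in range(start, stop):
            writer.add_page(reader.pages[index])
        # File names use 1-based page numbers, e.g. myform_pages_1-2.pdf
        out_name = f"{stem}_pages_{start + 1}-{stop}.pdf"
        with open(out_name, "wb") as out_file:
            writer.write(out_file)
        written.append(out_name)
    return written
```

With 169 pages and pages_per_file=2 this yields 85 ranges, the last of which holds the single leftover page.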
5

I know this code is not Python, but I wanted to post this piece of R code, which is simple, flexible, and works well. The pdftools package in R makes splitting and merging PDFs easy.

library(pdftools) #Rpackage
pdf_subset('D:\\file\\20.02.20\\22 GT 2017.pdf',
           pages = 1:51, output = "subset.pdf")

1 Comment

Here the number of pages is hardcoded. Is there any way to determine it automatically?
5
import fitz

src = fitz.open("source.pdf")
for page in src:
    tar = fitz.open()  # output PDF for 1 page
    # copy over current page
    tar.insert_pdf(src, from_page=page.number, to_page=page.number)
    tar.save(f"page-{page.number}.pdf")
    tar.close()

5

The earlier PyPDF2 answers for splitting PDFs no longer work with the latest version. The authors recommend using pypdf instead, and PyPDF2==3.0.1 will be the last PyPDF2 release. The function needs to be modified as follows:

import os
from PyPDF2 import PdfReader, PdfWriter

def split_pdfs(input_file_path):
    inputpdf = PdfReader(open(input_file_path, "rb"))

    out_paths = []
    if not os.path.exists("outputs"):
        os.makedirs("outputs")

    for i, page in enumerate(inputpdf.pages):
        output = PdfWriter()
        output.add_page(page)

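        # Use only the base name so inputs given with a directory still land in outputs/
        base_name = os.path.splitext(os.path.basename(input_file_path))[0]
        out_file_path = f"outputs/{base_name}_{i}.pdf"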
        with open(out_file_path, "wb") as output_stream:
            output.write(output_stream)

        out_paths.append(out_file_path)
    return out_paths

Note: The same function will work with pypdf as well. Import PdfReader and PdfWriter from pypdf rather than PyPDF2.

0
from PyPDF2 import PdfFileReader, PdfFileWriter
import os
import sys
import glob

# Locate the folder containing this script (or the frozen executable)
if getattr(sys, 'frozen', False):
    __location__ = os.path.dirname(os.path.realpath(sys.executable))
else:
    __location__ = os.path.dirname(os.path.abspath(__file__))
os.chdir(__location__)

# Split every PDF found in that folder into one file per page
for file in glob.glob(os.path.join(__location__, "*.pdf")):
    with open(file, 'rb') as pdf_file:
        pdf_reader = PdfFileReader(pdf_file)
        # Prefix outputs with the source name so several PDFs do not overwrite each other
        base_name = os.path.splitext(os.path.basename(file))[0]

        for i in range(pdf_reader.getNumPages()):
            pdf_writer = PdfFileWriter()
            pdf_writer.addPage(pdf_reader.getPage(i))
            with open(base_name + ' Page ' + str(i+1) + '.pdf', 'wb') as split_motive:
                pdf_writer.write(split_motive)

0

In Windows it would be very simple to use a single-line shortcut or "sendto" command with any one of dozens of PDF apps, such as the ultra-fast MuPDF or even Ghostscript.

However, this is a Python question and thus I will show a one-line method for Python.

In Windows, to process a folder we use a for loop that accepts multiple inputs, for example: for %f in (*.pdf) do python .....

So, after making sure a suitable library is installed (here pymupdf, imported in place of calling mutool), the following command instantly splits all the files (just two here as an MWE, one with spaces in its name for assurance).

for %f in (*.pdf) do @python -c "import pymupdf; import os; fn=r'%f'; src=pymupdf.open(fn); [ (lambda p: (d:=pymupdf.open(), d.insert_pdf(src, from_page=p.number, to_page=p.number), d.save(f'{os.path.splitext(fn)[0]}_page_{p.number+1}.pdf'), d.close()))(p) for p in src ]"

If you need multiple lines or Python file handling, consider one of the more complex answers, but this is a basic command-line task.

To save it as a CMD file, convert the %f to %%f and edit the file mask to allow for an argument.
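
For readers who find the one-liner hard to follow, it could be unrolled into a small script along these lines. This is only a sketch: the function names page_output_name and split_file are made up here, and it assumes pymupdf is installed. The calls themselves (open, insert_pdf, save, close) are the same ones the one-liner uses.

```python
import os


def page_output_name(source_name, page_index):
    """Build '<source without extension>_page_<n>.pdf' with n starting at 1."""
    return f"{os.path.splitext(source_name)[0]}_page_{page_index + 1}.pdf"


def split_file(source_name):
    # Imported here so page_output_name() can be used without pymupdf installed.
    import pymupdf

    src = pymupdf.open(source_name)
    for page in src:
        single = pymupdf.open()  # new, empty PDF for one page
        single.insert_pdf(src, from_page=page.number, to_page=page.number)
        single.save(page_output_name(source_name, page.number))
        single.close()
    src.close()
```

Calling split_file("my file.pdf") would then write my file_page_1.pdf, my file_page_2.pdf, and so on next to the source file.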
