split a multi-page pdf file into multiple pdf files with python?

Question

I would like to take a multi-page pdf file and create separate pdf files per page.

I have downloaded reportlab and have browsed the documentation, but it seems aimed at pdf generation. I haven't yet seen anything about processing PDF files themselves.

Is there an easy way to do this in python?

claudiu.brandusa · Accepted Answer · 2023-02-07 15:55:49Z

243

from PyPDF2 import PdfWriter, PdfReader

inputpdf = PdfReader(open("document.pdf", "rb"))

for i in range(len(inputpdf.pages)):
    output = PdfWriter()
    output.add_page(inputpdf.pages[i])
    with open("document-page%s.pdf" % i, "wb") as outputStream:
        output.write(outputStream)

etc.

edited Feb 7, 2023 at 15:55

claudiu.brandusa

242 silver badges6 bronze badges

answered Jan 29, 2009 at 1:38

user26294

5,6624 gold badges25 silver badges18 bronze badges

5 Comments

ePandit Over a year ago

Use with open("document-page%s.pdf" % (i+1), "wb") as outputStream: if you want your files to be named with an index starting from 1 instead of 0.

Haha Over a year ago

If you get "PdfReadError: Multiple definitions in dictionary at byte", you can pass strict=False when opening the input PDF: pdf = PdfFileReader(open("document.pdf", "rb"), strict=False)

youngt17 Over a year ago

If I want to split 100 pages and save two pages per PDF instead of one page per PDF, do I change the for loop to save two pages at a time?

Andres Alvarez Over a year ago

The issue I ran into with this code was as I was "extracting" each page and iterating through the document, the next output would contain all previous pages. For example, a document "test.pdf" containing 5 pages and to be renamed "test_(page num).pdf" would give me a result where the first iteration was fine (one page) but the next iteration would contain pages1-2, then the next would be pages 1-3...and so on. My fix for this was to simply add another PdfFileReader inside the for loop

Gabriel G Over a year ago

I was in a hurry and the script did great! Before that, though, I had some problems with the PyPDF2 version; to make it run I had to downgrade with: pip install PyPDF2==2.12.1 (I didn't have time to work out a proper fix).

Jeremy Whitcher · Accepted Answer · 2023-02-24 00:11:29Z

13

Updated solution for the latest release of PyPDF2 (3.0.0), which also splits out a range of pages.

from PyPDF2 import PdfReader, PdfWriter

file_name = r'c:\temp\junk.pdf'
pages = (121, 130)

reader = PdfReader(file_name)
writer = PdfWriter()
page_range = range(pages[0], pages[1] + 1)

for page_num, page in enumerate(reader.pages, 1):
    if page_num in page_range:
        writer.add_page(page)

with open(f'{file_name}_page_{pages[0]}-{pages[1]}.pdf', 'wb') as out:
    writer.write(out)

edited Feb 24, 2023 at 0:11

answered Feb 24, 2023 at 0:04

Jeremy Whitcher

7418 silver badges12 bronze badges

DovaX · Accepted Answer · 2019-11-04 13:12:38Z

I didn't see a solution here that splits the PDF into two parts which together contain all the pages, so I'm adding mine in case somebody is looking for the same thing:

from PyPDF2 import PdfFileWriter, PdfFileReader

def split_pdf_to_two(filename,page_number):
    pdf_reader = PdfFileReader(open(filename, "rb"))
    try:
        assert page_number < pdf_reader.numPages
        pdf_writer1 = PdfFileWriter()
        pdf_writer2 = PdfFileWriter()

        for page in range(page_number):
            pdf_writer1.addPage(pdf_reader.getPage(page))

        for page in range(page_number,pdf_reader.getNumPages()):
            pdf_writer2.addPage(pdf_reader.getPage(page))

        with open("part1.pdf", 'wb') as file1:
            pdf_writer1.write(file1)

        with open("part2.pdf", 'wb') as file2:
            pdf_writer2.write(file2)

    except AssertionError as e:
        print("Error: The PDF you are cutting has fewer pages than you want to cut!")

Rahul Chauhan · Accepted Answer · 2023-08-04 10:10:40Z

10

The PyPDF2 package gives you the ability to split up a single PDF into multiple ones.

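import os
from PyPDF2 import PdfFileReader, PdfFileWriter

# Example values; the original snippet left these two names undefined.
path = 'pdf_forms/myform.pdf'
fname = 'fname'

pdf = PdfFileReader(path)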
for page in range(pdf.getNumPages()):
    pdf_writer = PdfFileWriter()
    pdf_writer.addPage(pdf.getPage(page))

    output_filename = '{}_page_{}.pdf'.format(fname, page+1)

    with open(output_filename, 'wb') as out:
        pdf_writer.write(out)

    print('Created: {}'.format(output_filename))

Changes for PyPDF2 3.0.0

import os
from PyPDF2 import PdfReader, PdfWriter
path = 'pdf_forms/myform.pdf'
fname = 'fname'

pdf = PdfReader(path)
for page in range(len(pdf.pages)):
    pdf_writer = PdfWriter()
    pdf_writer.add_page(pdf.pages[page])

    output_filename = 'pdf_forms/splitted/{}_page_{}.pdf'.format(fname, page+1)

    with open(output_filename, 'wb') as out:
        pdf_writer.write(out)

    print('Created: {}'.format(output_filename))

Source: https://www.blog.pythonlibrary.org/2018/04/11/splitting-and-merging-pdfs-with-python/

edited Aug 4, 2023 at 10:10

Rahul Chauhan

1351 silver badge8 bronze badges

answered May 11, 2020 at 9:46

Nikita Jain

7699 silver badges13 bronze badges

2 Comments

shanecandoit Over a year ago

added page number selection and wrapped it in a function: gist.github.com/shanecandoit/b3b90fa4532aeedce6400c0084981933

youngt17 Over a year ago

I want to split a 169-page PDF into 85 files containing 2 pages each. Is it possible to say that I want 2 pages per file instead of 1?
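
One way to get several pages per output file, as asked in the comments, is to work out the page groups first and then add each group to its own writer. This is only a minimal sketch, assuming PyPDF2 3.x (per the note in a later answer, the same names can be imported from pypdf). The names page_chunks and split_in_chunks and the output file name pattern are made up for illustration and are not part of the library.

```python
def page_chunks(total_pages, pages_per_file):
    """Return (start, stop) index pairs, stop exclusive, covering all pages."""
    if pages_per_file < 1:
        raise ValueError("pages_per_file must be at least 1")
    return [
        (start, min(start + pages_per_file, total_pages))
        for start in range(0, total_pages, pages_per_file)
    ]


def split_in_chunks(input_path, pages_per_file=2):
    """Write consecutive groups of pages to separate files; return their names."""
    # Imported here so page_chunks() can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(input_path)
    written = []
    for start, stop in page_chunks(len(reader.pages), pages_per_file):
        writer = PdfWriter()  # a fresh writer for every output file
        for index in range(start, stop):
            writer.add_page(reader.pages[index])
        # Hypothetical naming scheme with 1-based page numbers in the file name.
        out_name = f"document-pages{start + 1}-{stop}.pdf"
        with open(out_name, "wb") as out:
            writer.write(out)
        written.append(out_name)
    return written
```

With 169 pages and 2 pages per file, page_chunks gives 85 groups, and the last group holds only the final page. Calling split_in_chunks("document.pdf", 2) would then write one file per group into the current directory.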

sandilya M · Accepted Answer · 2020-03-06 06:40:57Z

5

I know this code is not Python, but I felt like posting this piece of R code, which is simple, flexible and works very well. The pdftools package in R makes splitting and merging PDFs easy.

library(pdftools) #Rpackage
pdf_subset('D:\\file\\20.02.20\\22 GT 2017.pdf',
           pages = 1:51, output = "subset.pdf")

answered Mar 6, 2020 at 6:40

sandilya M

1192 silver badges7 bronze badges

1 Comment

Soumya Boral Over a year ago

Here the number of pages is hardcoded. Is there any way to do it automatically?

Jorj McKie · Accepted Answer · 2023-01-17 11:57:52Z

5

import fitz

src = fitz.open("source.pdf")
for page in src:
    tar = fitz.open()  # output PDF for 1 page
    # copy over current page
    tar.insert_pdf(src, from_page=page.number, to_page=page.number)
    tar.save(f"page-{page.number}.pdf")
    tar.close()

answered Jan 17, 2023 at 11:57

Jorj McKie

3,2831 gold badge17 silver badges24 bronze badges

anomanderrake · Accepted Answer · 2023-01-28 07:35:20Z

The earlier PyPDF2 answers for splitting PDFs no longer work with the latest version. The authors recommend using pypdf instead; PyPDF2==3.0.1 will be the last version of PyPDF2. The function needs to be modified as follows:

import os
from PyPDF2 import PdfReader, PdfWriter

def split_pdfs(input_file_path):
    inputpdf = PdfReader(open(input_file_path, "rb"))

    out_paths = []
    if not os.path.exists("outputs"):
        os.makedirs("outputs")

    for i, page in enumerate(inputpdf.pages):
        output = PdfWriter()
        output.add_page(page)

        out_file_path = f"outputs/{input_file_path[:-4]}_{i}.pdf"
        with open(out_file_path, "wb") as output_stream:
            output.write(output_stream)

        out_paths.append(out_file_path)
    return out_paths

Note: The same function will work with pypdf as well. Import PdfReader and PdfWriter from pypdf rather than PyPDF2.
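
One detail to watch in the function above: slicing input_file_path[:-4] keeps any directory part of the path, so an input such as "docs/report.pdf" would point the output at "outputs/docs/...", a folder that may not exist. Below is a possible variant using pypdf and pathlib. It is a sketch, not the author's code; the helper name page_output_path and the out_dir parameter are my own additions.

```python
from pathlib import Path


def page_output_path(input_file_path, index, out_dir="outputs"):
    """Build <out_dir>/<stem>_<index>.pdf from the input file's base name."""
    stem = Path(input_file_path).stem  # drops directories and the extension
    return Path(out_dir) / f"{stem}_{index}.pdf"


def split_pdfs(input_file_path, out_dir="outputs"):
    """Write every page to its own file and return the output paths."""
    # Imported here so the path helper works without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    reader = PdfReader(input_file_path)
    out_paths = []
    for i, page in enumerate(reader.pages):
        writer = PdfWriter()  # a new writer per page keeps the outputs separate
        writer.add_page(page)
        out_path = page_output_path(input_file_path, i, out_dir)
        with open(out_path, "wb") as output_stream:
            writer.write(output_stream)
        out_paths.append(str(out_path))
    return out_paths
```

As in the original function, the index in the file name starts at 0; use i + 1 when building the name if you prefer 1-based numbering.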

rafcioz · Accepted Answer · 2023-03-06 09:16:44Z

from PyPDF2 import PdfFileReader, PdfFileWriter
import os
import sys
import glob
abspath = os.path.abspath(__file__)
dname = os.path.dirname(abspath)
os.chdir(dname)

if getattr(sys, 'frozen', False):
    __location__ = os.path.dirname(os.path.realpath(sys.executable))
elif __file__:
    __location__ = os.path.realpath(
        os.path.join(os.getcwd(), os.path.dirname(__file__)))

for file in glob.glob(__location__ + "/*.pdf"):
    if file.endswith('.pdf'):
        pdf_file = open(os.path.join(__location__, file), 'rb')
        pdf_reader = PdfFileReader(pdf_file)
        
pageNumbers = pdf_reader.getNumPages()

for i in range (pageNumbers):
    pdf_writer = PdfFileWriter()
    pdf_writer.addPage(pdf_reader.getPage(i))
    split_motive = open('Page ' + str(i+1) + '.pdf', 'wb')
    pdf_writer.write(split_motive)
    split_motive.close()

pdf_file.close()

Link to article

K J · Accepted Answer · 2025-11-16 22:34:06Z

In Windows it would be very simple to use a single-line shortcut or "sendto" command with any one of dozens of PDF apps, such as the ultra-fast MuPDF or even GhostScript.

However, this is a Python question and thus I will show a one-line method for Python.

In Windows, to process a folder we use a for loop that allows for multiple inputs, for example: for %f in (*.pdf) do python .....

So we make sure we have a suitable library to import, replacing mutool with pymupdf. With the following command, all the files (just two here as an MWE, one with spaces in its name for assurance) are split instantly.

for %f in (*.pdf) do @python -c "import pymupdf; import os; fn=r'%f'; src=pymupdf.open(fn); [ (lambda p: (d:=pymupdf.open(), d.insert_pdf(src, from_page=p.number, to_page=p.number), d.save(f'{os.path.splitext(fn)[0]}_page_{p.number+1}.pdf'), d.close()))(p) for p in src ]"

If you need multiple lines or Python file handling, consider one of the more complex answers; this one treats the job as a basic command-line task.

To save it as a CMD file, convert the %f to %%f and edit the file mask to allow for an argument.
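
For anyone who prefers a script file to the one-liner, the same steps might be laid out roughly like this. It is only a sketch assuming a PyMuPDF version that can be imported as pymupdf, as in the command above. The function names page_output_name, split_with_pymupdf and split_folder are hypothetical, and the folder loop stands in for the CMD for loop.

```python
import os


def page_output_name(fn, page_number):
    """Mirror the one-liner's naming: <name>_page_<1-based number>.pdf."""
    return f"{os.path.splitext(fn)[0]}_page_{page_number + 1}.pdf"


def split_with_pymupdf(fn):
    """Save every page of fn as its own single-page PDF next to the source."""
    # Imported here so page_output_name() works without PyMuPDF installed.
    import pymupdf

    src = pymupdf.open(fn)
    for page in src:
        dst = pymupdf.open()  # new empty PDF that will hold one page
        dst.insert_pdf(src, from_page=page.number, to_page=page.number)
        dst.save(page_output_name(fn, page.number))
        dst.close()
    src.close()


def split_folder(folder="."):
    """Split every .pdf file found directly inside folder."""
    for name in sorted(os.listdir(folder)):
        if name.lower().endswith(".pdf"):
            split_with_pymupdf(os.path.join(folder, name))
```

Calling split_folder() processes the current directory. Note that the split pages land in the same folder, so running it a second time would also pick up the files it produced the first time.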
