This article has no prerequisites. We will use the PyPDF2 library, a free and open-source pure-Python PDF library that can split, merge, crop, and transform the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files, and it can retrieve text and metadata from PDFs. See “Working with PDF files in Python” to learn more about PyPDF2.
Run the following command in the command prompt or terminal to install the PyPDF2 library.
pip install PyPDF2
Step 1: Import the PyPDF2 library into the Python program
import PyPDF2
Step 2: Open the PDF file in read-binary mode using file handling
file = open('your pdf file path', 'rb')
Step 3: Read the PDF using the PdfReader class of the PyPDF2 library
pdfReader = PyPDF2.PdfReader(file)
Note: These three steps are the same for all of the methods shown in the examples that follow.
We are going to learn three methods to count the number of pages in a PDF file, which are as follows:
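len(pdfReader.pages) returns the total number of pages in the PDF file: pages is a property of the PdfReader class, and Python’s built-in len() function counts its entries. It replaces the older numPages property, which is deprecated since version 1.28.0.
totalPages1 = len(pdfReader.pages)
For Example: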
Output:
Total Pages: 10
In the above example, we imported the PyPDF2 module and opened the file in read-binary mode using file handling. We then read the file with the PdfReader class, counted its pages with len(pdfReader.pages), stored the result in the variable “totalPages1” for further use, and finally printed it.
getNumPages() is a method of the PdfReader class that takes no arguments and returns the total number of pages as an integer. It is deprecated since version 1.28.0 and no longer works in PyPDF2 3.0.0 and later; its replacement, len(pdfReader.pages), is the next method discussed.
totalPages2 = pdfReader.getNumPages()
Output:
Total Pages: 10
In the above example, we imported the PyPDF2 module and opened the file in read-binary mode using file handling. We then read the file with the PdfReader class, counted its pages with the getNumPages() method, stored the result in the variable “totalPages2” for further use, and finally printed it.
pages is a read-only property that emulates a list of Page objects. Passing it to len(), Python’s built-in function for counting the length of a sequence, gives the total number of pages in the PDF.
totalPages3 = len(pdfReader.pages)
Output:
Total Pages: 10
In the above example, we imported the PyPDF2 module and opened the file in read-binary mode using file handling. We then read the file with the PdfReader class, used the pages property to get the list of all pages in the PDF file, counted them with the len() function, stored the result in the variable “totalPages3” for further use, and finally printed it.