The amount of data produced every day has grown from a few bytes to quintillions of bytes, and data science and data analysis have received growing attention as a result.
We work a lot with files and PDFs, and AI is becoming increasingly important in this domain. We’ve seen a major shift toward using AI to analyze and extract data from files, reducing human workload.
HummusJS is a Node.js module that lets you create, parse, and manipulate PDF files and streams quickly. In this blog, we’ll see how to split an existing PDF file and how to create a new PDF file using HummusJS together with Express.
What is HummusJS?
HummusJS is a Node.js module for high-performance creation, parsing, and splitting of PDF files, and for modifying those files or streams. It performs these operations quickly and flexibly using a one-off writing model. The library is built on top of PDFHummus, a powerful, fast, and free cross-platform C++ PDF library. It is free of cost and licensed under the Apache 2.0 license, so it can safely be used for both commercial and non-commercial purposes.
Why is splitting a PDF file important in various use cases?
As the tech industry handles ever larger volumes of data, much of the attention has shifted toward data science and data analysis, whose use cases are growing rapidly across the IT market. Daily data volumes have grown from a few bytes to a million trillion (a quintillion) bytes. We work with many file types, mostly PDF files, and AI plays a significant role in this area.
AI is increasingly taking over the manual work of analyzing and extracting data from files. In an AI use case, each page of a multi-page PDF file may serve a different purpose. It is therefore important to split the document into individual pages efficiently and without losing data. Each page can then be used to train ML models for tasks such as OCR and text extraction.
Features of HummusJS
The features of HummusJS include:
- Creating new PDF files.
- Modifying the existing PDF files.
- Displaying JPG, JPEG, PNG and TIFF images on a PDF page.
- Showing text in a variety of formats.
- Defining reusable graphics.
- Splitting a PDF file with multiple pages into single-page files.
- Embedding other PDF files into one PDF file with multiple options.
- Flexibility to parse PDFs to get basic details about the PDF file and its pages, read form values, and much more.
- Flexibility to add new pages to an existing PDF file with new contents.
Required installations for the process
HummusJS: A Node.js module for high-performance creation, parsing, and splitting of PDF files, and for modifying those files or streams.
Express: A back-end web application framework for Node.js, designed for building web applications and APIs.
Hands-on
In this hands-on, we’ll start by splitting a sample multi-page PDF file into single-page PDF files. We will see how to use the HummusJS library for this operation. We will also explore some of its other functionality, such as reading a PDF file and getting its page count. Then, we will install Express and see how to use it together with HummusJS to create a PDF file containing some text.
Splitting a PDF file with multiple pages.
Download a sample PDF file, or use any multi-page PDF file that you want to split into single-page PDF files.
Open a code editor and create a new JS file in which we will write our code.
In the command prompt, run `npm install hummus` to install the library we need for these operations.
You will see the folder structure as shown in the image below.
Now, we need to import the HummusJS library that we installed, using the code below.
Once the library is imported, we need to create a reader for the multi-page sample PDF file.
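Here is a minimal sketch of these two steps. It assumes the sample file is saved as sample.pdf next to the script, which is a hypothetical name. The module is loaded through a default parameter, so the function can also be exercised with a stand-in object. In normal use you pass only the file path.

```javascript
// Open a HummusJS reader on a multi-page PDF.
// 'sample.pdf' is a hypothetical file name; point it at your own file.
function openReader(sourcePath = 'sample.pdf', hummus = require('hummus')) {
  // createReader parses the file and throws if it is not a readable PDF.
  const reader = hummus.createReader(sourcePath);
  // Illustrative success message (not the exact text from the article).
  console.log(`Parsed ${sourcePath} successfully`);
  return reader;
}

// Usage: const reader = openReader('sample.pdf');
```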
If the chosen PDF is parsed successfully, you will see the message shown in the image below.
Once the message is displayed, we can get the total page count of the PDF file using the code below.
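Counting pages needs only the reader's getPagesCount method. Below is a small sketch; the file name in the usage comment is hypothetical.

```javascript
// Return the number of pages in a PDF file using HummusJS.
function getPageCount(sourcePath, hummus = require('hummus')) {
  const reader = hummus.createReader(sourcePath);
  return reader.getPagesCount();
}

// Usage: console.log(getPageCount('sample.pdf'));
```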
For the selected PDF file, you will see that the page count is displayed as 2.
Now, let’s create a writer, specifying the name of the new file to be created. In the example below, we’ll create a file named ‘1st Page.pdf’.
Run the file using `node filename.js`. On success, you will see the message shown in the image below.
The code below copies a page from the original PDF file into the new file. The argument 0 passed to appendPDFPageFromPDF selects the first page of the original file, since page indices start at 0.
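In HummusJS, appendPDFPageFromPDF is a method of a copying context, which the writer creates with createPDFCopyingContext. The sketch below combines creating the writer with copying the page. sample.pdf is a hypothetical source name. Calling writer.end() is what finalizes the output file.

```javascript
// Copy one page (0-based index) from sourcePath into a new PDF at outputPath.
function extractPage(sourcePath, pageIndex, outputPath, hummus = require('hummus')) {
  const writer = hummus.createWriter(outputPath);
  // The copying context reads pages from the source PDF into this writer.
  const copyingContext = writer.createPDFCopyingContext(sourcePath);
  copyingContext.appendPDFPageFromPDF(pageIndex); // 0 selects the first page
  writer.end(); // finish writing and close the output file
  return outputPath;
}

// Usage: extractPage('sample.pdf', 0, '1st Page.pdf');
```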
If the file was created successfully and you log the writer variable to the console, you will see the output shown in the image below.
As the image below shows, the new file is generated in the project folder.
When we open the file, we see that it contains a single page.
To split every page, we can loop over the original PDF file, using the page count as the loop bound, and write each page to its own PDF file.
If we now run the above code, you will see the new files created in the project folder.
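Here is a sketch of that loop. It assumes the output names result1.pdf, result2.pdf, and so on, matching the files opened later in this walkthrough. Each iteration creates its own writer and copying context, so every output file is finalized independently.

```javascript
// Split every page of sourcePath into its own file: result1.pdf, result2.pdf, ...
function splitAllPages(sourcePath, hummus = require('hummus')) {
  const pageCount = hummus.createReader(sourcePath).getPagesCount();
  const outputs = [];
  for (let i = 0; i < pageCount; i++) {
    const outputPath = `result${i + 1}.pdf`; // page index 0 -> result1.pdf
    const writer = hummus.createWriter(outputPath);
    writer.createPDFCopyingContext(sourcePath).appendPDFPageFromPDF(i);
    writer.end(); // finalize this single-page file before moving on
    outputs.push(outputPath);
  }
  return outputs;
}

// Usage: console.log(splitAllPages('sample.pdf'));
```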
If you open the result1.pdf file, you will see the first page in it.
If you open the result2.pdf file, you will see the second page in it.
Now, suppose we have a PDF file with 24 pages. Let’s run our code on this larger file.
When we run the code on it, a new file is created for each page in the same folder.
Creating a new PDF file using Express and HummusJS.
Before proceeding, install the Express library by running `npm install express` in the terminal.
We will begin by writing some basic server code using Express. The code below is plain Express code. It creates a server that listens on port 3000 and sets the response content type to application/pdf, so that the browser interprets the response as a PDF document.
Note: Add the following sections of code inside the Express request handler.
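Below is a sketch of such a server skeleton. The route path / and the placeholder comment are assumptions. The PDF-generation code from the next steps would go where the comment sits. Express is passed in as a default parameter so the app can be built without starting a server.

```javascript
// Minimal Express app whose root route responds with a PDF content type.
function createApp(express = require('express')) {
  const app = express();
  app.get('/', (req, res) => {
    // Tell the client that the response body is a PDF document.
    res.writeHead(200, { 'Content-Type': 'application/pdf' });
    // ... PDF-generation code goes here ...
    res.end();
  });
  return app;
}

// Usage: createApp().listen(3000, () => console.log('Listening on http://localhost:3000'));
```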
Now for the PDF code. We start by requiring the hummus library, and then create a PDF writer using the hummus.createWriter method.
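HummusJS can write directly into the HTTP response. You wrap the response in hummus.PDFStreamForResponse and pass that stream to createWriter, so no temporary file is needed. Here is a minimal sketch of this step; the helper name createResponseWriter is hypothetical.

```javascript
// Create a HummusJS writer that streams PDF output into an HTTP response.
function createResponseWriter(res, hummus = require('hummus')) {
  // PDFStreamForResponse adapts the response object to HummusJS's output-stream interface.
  return hummus.createWriter(new hummus.PDFStreamForResponse(res));
}
```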
Then, we will create a page with the required dimensions using the below code.
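createPage takes the page's media box as left, bottom, right, and top coordinates in PDF points (1/72 inch). The sketch below assumes an A4 page of 595 x 842 points. The article's actual dimensions are not shown, so adjust them as needed. The helper name createA4Page is hypothetical.

```javascript
// A4 size in PDF points (1 point = 1/72 inch), used here as an example size.
const A4 = { width: 595, height: 842 };

// Create an A4 page on a HummusJS writer; createPage(left, bottom, right, top).
function createA4Page(writer) {
  return writer.createPage(0, 0, A4.width, A4.height);
}
```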
Using the code below, we add some text to a page of the PDF file. First, a content context is created for the page; we need it in order to write text on the page.
The code below defines the text color, the font, the text size, and similar details. Based on this configuration, the text is added to the page of the PDF file.
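Both steps can be sketched together. startPageContentContext returns the page's content context. Its writeText method takes the text, an x and y position in points measured from the bottom-left corner, and an options object. The font path ./arial.ttf, the position, the size, and the color are example values (assumptions), not the article's configuration. The helper name addTextToPage is hypothetical, and getFontForFile needs a real font file on disk.

```javascript
// Write one line of text on a page. All values below are example settings.
function addTextToPage(writer, page, text = 'Hello World', fontPath = './arial.ttf') {
  const context = writer.startPageContentContext(page);
  context.writeText(text, 0, 400, {        // x = 0, y = 400 points from the bottom-left corner
    font: writer.getFontForFile(fontPath), // load the font from a font file
    size: 50,                              // font size in points
    colorspace: 'gray',                    // HummusJS also accepts 'rgb' and 'cmyk'
    color: 0x00,                           // 0 is black in the gray color space
  });
  return context;
}
```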
The code below indicates that no more operations are needed on the page, so the page can be closed.
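Closing the page is done with writer.writePage(page). After that, writer.end() completes the PDF, and res.end() finishes the HTTP response. The sketch below assembles the Express and HummusJS steps into one route under these assumptions: an A4 page, example text settings, and a font file at ./arial.ttf. The function name createPdfApp is hypothetical.

```javascript
// Assembled sketch: an Express route that streams a one-page PDF to the browser.
// Assumptions: './arial.ttf' exists; page size and text settings are examples.
function createPdfApp(express = require('express'), hummus = require('hummus')) {
  const app = express();
  app.get('/', (req, res) => {
    res.writeHead(200, { 'Content-Type': 'application/pdf' });
    // Write the PDF straight into the HTTP response.
    const writer = hummus.createWriter(new hummus.PDFStreamForResponse(res));
    const page = writer.createPage(0, 0, 595, 842); // A4 in points
    writer.startPageContentContext(page).writeText('Hello World', 0, 400, {
      font: writer.getFontForFile('./arial.ttf'),
      size: 50,
      colorspace: 'gray',
      color: 0x00,
    });
    writer.writePage(page); // no further operations on this page
    writer.end();           // finish the PDF document
    res.end();              // close the HTTP response
  });
  return app;
}

// Usage: createPdfApp().listen(3000);
```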
Now, if you run the code and open http://localhost:3000 in a browser, you will see a PDF file with the required text in it.
Now, let’s change the text written to the PDF file and run the code again.
If you now refresh http://localhost:3000, you will see that the code ran properly and the file contains the updated text.
To get the entire code repository, including the sample PDF files, run git clone with the link below in a folder on your local machine.
https://github.com/workfall/HummusJS.git
Conclusion
In this blog, we saw how to split a sample multi-page PDF file into separate single-page PDF files. We saw how to use the HummusJS library for this operation and explored some of its other functionality, such as reading a PDF file and getting its page count. Then, we installed Express and saw how to use Express and HummusJS together to create a PDF file containing some text. Stay tuned for updates on our upcoming blogs about other technologies.
Meanwhile …
Keep Exploring -> Keep Learning -> Keep Mastering
This blog is part of our effort towards building a knowledgeable and kick-ass tech community. At Workfall, we strive to provide the best tech and pay opportunities to AWS-certified talents. If you’re looking to work with global clients, build kick-ass products while making big bucks doing so, give it a shot at workfall.com/partner today.