This week, one of my customers wanted to use Optical Character Recognition (OCR) to extract text from PDF files using Azure Cognitive Services. However, at the time of writing, the Azure Cognitive Services Computer Vision API only works with images.
Since the Computer Vision API only accepts images, my solution first pre-processes the PDF files into images and then applies OCR to them with Computer Vision. To achieve this, I designed an end-to-end solution with Azure Functions and a Logic App, in which each Function handles one part of the task, from pre-processing through OCR to storing the results.
The key objective of this showcase was to demonstrate Azure’s capabilities by building an equivalent state-of-the-art solution that leverages a microservice architecture and the serverless concept. I came up with an initial solution, then improved it into this simple demo in a couple of hours to showcase the “Cloud + AI” capabilities of Azure with Azure Functions and Logic Apps.
This is the architecture of the solution. I used Azure Functions, Azure Cognitive Services for Computer Vision and a Logic App.
The approach was simple: I used Azure Functions triggers and bindings to handle the communication end-to-end.
- The user uploads the PDF to Azure Blob Storage
- An Azure Function is triggered and enqueues the filename to Azure Queue Storage
- A second Azure Function is triggered by that message and converts the PDF into PNG images
- Each image filename is then enqueued to Azure Queue Storage for OCR processing
- A third Azure Function is triggered by each image filename and POSTs a request to the Logic App
- The Logic App receives the filename and processes the blob file with Computer Vision OCR
- The results are stored in Azure Blob Storage
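The message flow above can be made concrete with a small local simulation in Python. This is only a sketch. The author's Functions are C# script, and here in-memory dictionaries and deques stand in for Blob Storage and the two Storage queues. The function names, the queue objects and the `-pageN.png` naming scheme are all hypothetical.

```python
from collections import deque

# Hypothetical in-memory stand-ins for Blob Storage and the two Storage queues.
blobs = {}             # blob name -> bytes
pdf_queue = deque()    # messages holding PDF filenames
image_queue = deque()  # messages holding PNG filenames
ocr_results = {}       # image name -> recognised text


def on_pdf_uploaded(name: str) -> None:
    """Function 1 (blob trigger): enqueue the uploaded PDF's filename."""
    pdf_queue.append(name)


def on_pdf_queued(name: str, page_count: int) -> None:
    """Function 2 (queue trigger): store one PNG per page and enqueue each image name.
    Rendering is faked; page_count stands in for reading the real PDF."""
    stem = name.rsplit(".", 1)[0]
    for page in range(1, page_count + 1):
        image_name = f"{stem}-page{page}.png"  # hypothetical naming scheme
        blobs[image_name] = b""                 # placeholder for the PNG bytes
        image_queue.append(image_name)


def on_image_queued(name: str) -> None:
    """Function 3 + Logic App: run OCR on the image and store the result (faked)."""
    ocr_results[name] = f"<OCR text of {name}>"


# Simulate one 3-page PDF flowing through the pipeline.
blobs["invoice.pdf"] = b"%PDF-1.4"
on_pdf_uploaded("invoice.pdf")
while pdf_queue:
    on_pdf_queued(pdf_queue.popleft(), page_count=3)
while image_queue:
    on_image_queued(image_queue.popleft())

print(sorted(ocr_results))
# ['invoice-page1.png', 'invoice-page2.png', 'invoice-page3.png']
```

Each stage only consumes one message and emits new ones. That keeps the stages decoupled, which is the point of chaining them through queues.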
Let’s get started! In this tutorial, I’ll walk you through some of the key concepts behind this solution. Since there are multiple parts to it, I’ll briefly cover each touch point.
GhostScript DLL
I downloaded the latest 64-bit version of GhostScript (a 32-bit version works too, provided the Function App platform is also left at 32-bit). Once it’s installed, keep the file location in mind, as we’ll need it later. The file should be located in this path
Let’s start by creating a serverless Function App resource.
Once it’s created, we’ll set up the Functions for our solution. Head over to Application Settings and set the platform to 64-bit, so that the Function App can load the 64-bit GhostScript DLL.
Let’s create our first Function. We’ll add a project.json file to the project that references GhostScript.NET; save it, and the Function will be compiled.
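For reference, a C# script Function in Azure Functions v1 declares its NuGet dependencies in project.json, under a target framework. To keep all examples in one language, the sketch below prints a minimal file from Python that you could paste into project.json. The `net46` framework key is an assumption based on common v1 samples. Version `1.2.1` matches the Kudu package folder used in the next step.

```python
import json

# Minimal project.json for a C# script function in Azure Functions v1.
# Assumption: the "net46" framework key, as in common v1 samples.
# The version matches the ghostscript.net\1.2.1 package folder seen in Kudu.
project = {
    "frameworks": {
        "net46": {
            "dependencies": {
                "Ghostscript.NET": "1.2.1",
            }
        }
    }
}

project_json = json.dumps(project, indent=2)
print(project_json)
```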
Next, go back to Platform Features and select Advanced Tools (Kudu). Here we’ll configure our Function App so that it can process PDFs later on. Remember the GhostScript we installed earlier? We’ll include its DLL in our Function App via Kudu.
In Kudu’s Debug console, navigate to D:\home\data\Functions\packages\nuget\ghostscript.net\1.2.1\lib\net40, then drag and drop the gsdll64.dll file from your GhostScript installation folder into the browser. This uploads the DLL into our Function App.
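If you'd rather script the upload than drag and drop, Kudu also exposes a VFS REST API, where a `PUT` to `/api/vfs/<path>` writes a file relative to `D:\home`. The sketch below only builds the request and does not send it. `myapp` and the deployment credentials are placeholders. Please double-check the endpoint against the Kudu documentation before relying on it.

```python
import base64
import urllib.request
from pathlib import PureWindowsPath

# Kudu's VFS API resolves paths relative to the site's home directory.
KUDU_HOME = PureWindowsPath(r"D:\home")


def build_kudu_upload(app_name, user, password, target_dir, dll_bytes, dll_name="gsdll64.dll"):
    """Build (but do not send) a PUT request that uploads a file via Kudu's VFS API.
    app_name, user and password are placeholders for your own deployment credentials."""
    relative = PureWindowsPath(target_dir).relative_to(KUDU_HOME)
    url_path = "/".join(relative.parts + (dll_name,))
    url = f"https://{app_name}.scm.azurewebsites.net/api/vfs/{url_path}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=dll_bytes,
        method="PUT",
        # Basic auth with deployment credentials; If-Match: * allows overwriting.
        headers={"Authorization": f"Basic {token}", "If-Match": "*"},
    )


req = build_kudu_upload(
    "myapp", "deploy-user", "deploy-password",
    r"D:\home\data\Functions\packages\nuget\ghostscript.net\1.2.1\lib\net40",
    b"...dll bytes...",
)
print(req.get_method(), req.full_url)
```

Sending it would then be a matter of `urllib.request.urlopen(req)`.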
With these steps, our Function App can now convert PDFs to PNG files.
The solution is built with 3 Azure Functions and 1 Logic App. Since the objective is to run OCR on PNG images, we first need to convert our PDFs to PNG files.
Blob Trigger - Queue Output
Users upload the PDF to Blob Storage. We’ll use a Blob Trigger to pick up the filename from Blob Storage (Trigger), then enqueue the filename to Queue Storage (Output) to be picked up by our second Function.
This is a very simple Blob Trigger Function. First, let’s configure the Queue Storage output under Integrate. This step is pretty intuitive: choose Queue Storage as the output binding, then tell the Function App which queue to send messages to and which parameter name to use.
The run.csx code is all we need for it to work: it receives the blob name from the Blob trigger, then sends the name to the Queue Storage output.
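Behind the Integrate tab, the bindings end up in the Function’s function.json. The sketch below shows what that file might look like for this Function. The container `pdfs`, the queue `pdf-queue` and the parameter names are placeholders. It also imitates, in Python, how the `{name}` token in the blob path becomes the queue message. The `bind_blob_path` helper is a rough local approximation of the runtime's behaviour, not the Functions runtime itself.

```python
import re

# Hypothetical function.json for the first Function, as the Integrate tab might write it.
# Container "pdfs", queue "pdf-queue" and the parameter names are placeholders.
FUNCTION_JSON = {
    "bindings": [
        {"name": "myBlob", "type": "blobTrigger", "direction": "in",
         "path": "pdfs/{name}", "connection": "AzureWebJobsStorage"},
        {"name": "outputQueueItem", "type": "queue", "direction": "out",
         "queueName": "pdf-queue", "connection": "AzureWebJobsStorage"},
    ],
    "disabled": False,
}


def bind_blob_path(pattern: str, blob_path: str):
    """Resolve {token} placeholders in a blob trigger path (rough local approximation)."""
    # Escape the pattern, then turn each escaped {token} into a named regex group.
    regex = re.sub(r"\\\{(\w+)\\\}", r"(?P<\1>.+)", re.escape(pattern))
    match = re.fullmatch(regex, blob_path)
    return match.groupdict() if match else None


def run(blob_path: str):
    """Local stand-in for run.csx: the {name} value becomes the queue message."""
    tokens = bind_blob_path(FUNCTION_JSON["bindings"][0]["path"], blob_path)
    return tokens["name"] if tokens else None


print(run("pdfs/report.pdf"))  # report.pdf
```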
Queue Trigger - PDF Processing - Queue Output
The second Function gets the queue message containing the filename and retrieves the PDF file from Blob Storage. It then uses GhostScript to render each page of the PDF into its own PNG file. As each page image is uploaded to Blob Storage, the Function also writes a message containing the image filename to a Queue Storage output.
Remember that a multi-page PDF produces multiple PNG images, one per page. We’ll queue each image filename to Queue Storage (Output) so that each image is processed by OCR later on.
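The post uses GhostScript.NET, which calls gsdll64.dll in-process. As a swapped-in illustration of the same rendering step, the sketch below builds the equivalent Ghostscript command line. In it, `-sOutputFile` with a `%03d` pattern writes one numbered PNG per page. The sketch also lists the filenames that would each be enqueued for OCR. The 300 DPI default and the `report-%03d.png` pattern are assumptions.

```python
def ghostscript_args(pdf_path, output_pattern, dpi=300, exe="gswin64c"):
    """Ghostscript command line that renders every PDF page to its own PNG file."""
    return [
        exe,
        "-dNOPAUSE",                       # don't pause between pages
        "-dBATCH",                         # exit when all pages are done
        "-dSAFER",                         # restrict file operations
        "-sDEVICE=png16m",                 # 24-bit colour PNG output
        f"-r{dpi}",                        # render resolution in DPI (assumed default)
        f"-sOutputFile={output_pattern}",  # %03d is replaced by the page number
        pdf_path,
    ]


def page_image_names(output_pattern, page_count):
    """Filenames produced for a printf-style pattern, one per page (pages start at 1);
    each would become one message on the OCR queue."""
    return [output_pattern % page for page in range(1, page_count + 1)]


print(ghostscript_args("report.pdf", "report-%03d.png"))
print(page_image_names("report-%03d.png", 3))
```

With Ghostscript installed, `subprocess.run(ghostscript_args("report.pdf", "report-%03d.png"), check=True)` would render the pages. On Linux the executable is `gs` rather than `gswin64c`.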
Similar to the earlier configuration, let’s configure the Queue Storage (Output) under Integrate.
Queue Trigger - Logic App
Lastly, the third Function is a Queue trigger. It gets the queue message containing the PNG image filename and submits a POST request to a Logic App, which handles the OCR processing via the Cognitive Services Computer Vision API. The results are then stored in Blob Storage.
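The third Function’s job boils down to one HTTP POST, sketched below in Python. The placeholder URL must be replaced with the callback URL copied from the Logic App’s Request trigger. The `{"filename": ...}` body is a hypothetical schema and has to match whatever the trigger expects.

```python
import json
import urllib.request

# Placeholder: replace with the callback URL copied from the Logic App's Request trigger.
LOGIC_APP_URL = "https://example.invalid/workflows/WORKFLOW_ID/triggers/manual/paths/invoke"


def build_logic_app_request(image_name, url=LOGIC_APP_URL):
    """Build (but do not send) the POST the third Function makes.
    The {"filename": ...} body is a hypothetical schema for the Request trigger."""
    body = json.dumps({"filename": image_name}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


req = build_logic_app_request("report-001.png")
print(req.get_method(), req.data)
```

In the real Function, sending the request (e.g. `urllib.request.urlopen(req)` here) is the only remaining step. The Logic App takes it from there.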
This is the desired outcome for the Logic App.
Alternatively, you could copy this code into the Logic App Code View in your Logic App, replace the variables, and it should work!
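Suppose the Logic App stores the raw response of the Computer Vision OCR endpoint. In that response the recognised text is nested as regions, then lines, then words. The sketch below flattens that JSON into plain text, one output line per detected line. The sample response is made up for illustration and only mirrors that nesting.

```python
def ocr_to_text(ocr_result):
    """Flatten a Computer Vision OCR response (regions -> lines -> words) into plain text,
    one output line per detected line."""
    lines = []
    for region in ocr_result.get("regions", []):
        for line in region.get("lines", []):
            lines.append(" ".join(word["text"] for word in line.get("words", [])))
    return "\n".join(lines)


# Made-up sample that only mirrors the regions/lines/words nesting.
sample = {
    "language": "en",
    "regions": [
        {"lines": [
            {"words": [{"text": "Hello"}, {"text": "Azure"}]},
            {"words": [{"text": "OCR"}]},
        ]},
    ],
}
print(ocr_to_text(sample))
```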
Here’s the Azure Functions code for the solution.