Kenneth H
Visionary Technology Leader

Understanding text with Azure Functions using OCR Processing of PDF files


This week, one of my customers wanted to use Optical Character Recognition (OCR) to extract text from PDF files using Azure Cognitive Services. However, at the time of writing, the Computer Vision API in Azure Cognitive Services only works with images.


Because the Computer Vision API only works with images, my solution first pre-processes the PDF files into images and then applies OCR with Computer Vision. To achieve this, I designed an end-to-end solution with Azure Functions and a Logic App, in which each Function handles one part of the task, from converting the PDF through OCR processing to storing the results.


The key objective of this showcase was to demonstrate Azure’s capabilities by building an equivalent state-of-the-art solution on a microservice architecture using serverless concepts. I came up with an initial solution, then improved it into this simple demo in a couple of hours to showcase the “Cloud + AI” capabilities of Azure with Azure Functions and Logic Apps.


Reference Architecture


This is the architecture of the solution. It uses Azure Functions, Azure Cognitive Services for Computer Vision, and a Logic App.

The approach is simple: Azure Functions triggers and bindings handle the communication between the stages, end to end.

  1. The user uploads the PDF to Azure Blob Storage
  2. An Azure Function is triggered and enqueues the filename to Azure Queue Storage
  3. A second Azure Function is triggered by the queued filename and converts the PDF into PNG images
  4. Each image filename is then enqueued to Azure Queue Storage for OCR processing
  5. A third Azure Function is triggered by the image filename and POSTs a request to the Logic App
  6. The Logic App receives the filename and processes the image blob with Computer Vision OCR
  7. The results are stored in Azure Blob Storage
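To make the flow of messages concrete, here is a minimal Python sketch that simulates the seven steps in memory. It is not the Azure implementation: plain dictionaries stand in for Blob Storage, queue.Queue stands in for Queue Storage, a page count stands in for the PDF contents, and fake_ocr is a hypothetical placeholder for the Logic App's Computer Vision call. All container, queue, and file names are made up for illustration.

```python
import queue

# In-memory stand-ins (hypothetical) for the Azure resources in the architecture.
pdf_container = {}           # blob name -> page count (stands in for the PDF bytes)
pdf_queue = queue.Queue()    # step 2: PDF filenames
image_queue = queue.Queue()  # step 4: page image filenames
results_container = {}       # step 7: OCR results


def upload_pdf(name, pages):
    """Step 1: the user uploads a PDF; step 2: the blob trigger enqueues its name."""
    pdf_container[name] = pages
    pdf_queue.put(name)


def convert_pdf_pages():
    """Steps 3-4: 'render' each page to PNG and enqueue each image filename."""
    while not pdf_queue.empty():
        name = pdf_queue.get()
        stem = name.rsplit(".", 1)[0]
        for page in range(1, pdf_container[name] + 1):
            image_queue.put(f"{stem}-{page}.png")


def fake_ocr(image_name):
    # Hypothetical placeholder for the Computer Vision OCR call in the Logic App.
    return f"text of {image_name}"


def process_images():
    """Steps 5-7: hand each image to the 'Logic App' and store the result."""
    while not image_queue.empty():
        image = image_queue.get()
        results_container[image + ".txt"] = fake_ocr(image)


upload_pdf("invoice.pdf", pages=3)
convert_pdf_pages()
process_images()
print(sorted(results_container))
```

Because each stage only depends on the queue in front of it, a multi-page PDF naturally fans out into one OCR job per page.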


OCR Architecture



Getting Started


Let’s get started! In this tutorial, I’ll walk you through some of the key concepts in building this solution. Since there are multiple parts to it, I’ll briefly cover the main touch points.


Prerequisite

GhostScript DLL: GhostScript


I downloaded the latest 64-bit version of GhostScript (a 32-bit version also works, but it installs gsdll32.dll instead and the Function App would then need to stay on the 32-bit platform). Once it’s installed, note the file location; we’ll need it later. By default, the DLL is located at C:\Program Files\gs\gs9.23\bin\gsdll64.dll, where gs9.23 reflects the installed version.

GhostScript DLL Location


Azure Functions


Let’s start by creating a serverless Function App resource.

Create Serverless Functions App

Once it’s created, we’ll set up the functions for our solution. First, head over to Application Settings and set the platform to 64-bit, so that the Function App can load the 64-bit GhostScript DLL.
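Why 64-bit? A native DLL such as gsdll64.dll can only be loaded into a process of the same bitness. The Python sketch below illustrates that rule by picking the GhostScript DLL name from the current process's pointer size. It is only an illustration: on Azure, the loading happens inside the .NET Functions host via GhostScript.NET, not in Python.

```python
import struct


def ghostscript_dll_name():
    """Pick the GhostScript DLL whose bitness matches the current process."""
    pointer_bits = struct.calcsize("P") * 8  # 64 in a 64-bit process, 32 in a 32-bit one
    return "gsdll64.dll" if pointer_bits == 64 else "gsdll32.dll"


print(ghostscript_dll_name())
```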

Application Settings

Next, let’s create our first function, a C# Blob trigger. We’ll add a project.json file to it that references the GhostScript.NET NuGet package; once the file is saved, the package is restored and the function is compiled.

C# Blob Trigger Add Project.json


Next, go back to Platform Features and select Advanced Tools (Kudu). Remember the GhostScript DLL we installed earlier? Our Function App needs it to process PDFs, so we’ll upload it to the Function App via Kudu.

Advanced Tools (Kudu)

Debug Console (CMD)

Open the Debug Console (CMD), navigate to D:\home\data\Functions\packages\nuget\ghostscript.net\1.2.1\lib\net40, then drag and drop the gsdll64.dll file into the browser. This uploads the DLL into the GhostScript.NET package folder of our Function App.
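If you’d rather script the upload than drag and drop, Kudu also exposes a VFS REST API, where a PUT to /api/vfs/<path> writes a file relative to D:\home. The Python sketch below only builds the request and does not send it; the app name, deployment username, and password are hypothetical placeholders, so check the Kudu documentation before relying on it.

```python
import base64
import urllib.request


def build_kudu_upload_request(app_name, user, password, dll_bytes):
    """Build (but do not send) a Kudu VFS PUT request that uploads gsdll64.dll.

    VFS paths are relative to D:\\home, so this targets
    D:\\home\\data\\Functions\\packages\\nuget\\ghostscript.net\\1.2.1\\lib\\net40.
    """
    vfs_path = "data/Functions/packages/nuget/ghostscript.net/1.2.1/lib/net40/gsdll64.dll"
    url = f"https://{app_name}.scm.azurewebsites.net/api/vfs/{vfs_path}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=dll_bytes,
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",  # deployment credentials
            "If-Match": "*",                    # allow overwriting an existing file
        },
    )


# Hypothetical app name and credentials.
req = build_kudu_upload_request("my-ocr-func", "$my-ocr-func", "secret", b"\x00")
print(req.get_method(), req.full_url)
# To actually upload: urllib.request.urlopen(req)
```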

Debug Console (CMD)

With these steps, our Function App can now convert our PDF files to PNG images.

Solution


The solution is built with 3 Azure Functions and 1 Logic App. Since the objective is to execute OCR processing on PNG images, we’ll need to convert our PDF to PNG files.


Blob Trigger - Queue Output


Users upload the PDF to Blob Storage. A Blob trigger picks up the filename from Blob Storage (Trigger), then enqueues it to Queue Storage (Output) to be picked up by our second Function.


This is a very simple Blob trigger Function. First, let’s configure the Queue Storage output under Integrate. This step is pretty intuitive: choose Queue Storage as the output binding, then tell the Function App which queue to send the message to and the name of the output parameter.

Integrate

Then, in run.csx, this is all we need for it to work: the code receives the blob name from the Blob trigger and sends it to the Queue Storage output.
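The settings chosen under Integrate are saved in the function’s function.json file. As a rough sketch of what that file contains for this Function, the Python snippet below builds and prints it. The container name pdf-input, the queue name pdf-queue, the parameter names, and the AzureWebJobsStorage connection setting are assumptions, so match them to your own Integrate settings.

```python
import json

# Hypothetical names throughout; align them with your own Integrate settings.
function_json = {
    "bindings": [
        {
            "name": "myBlob",               # trigger parameter in run.csx
            "type": "blobTrigger",
            "direction": "in",
            "path": "pdf-input/{name}",     # {name} captures the uploaded blob's name
            "connection": "AzureWebJobsStorage",
        },
        {
            "name": "outputQueueItem",      # output parameter that receives the blob name
            "type": "queue",
            "direction": "out",
            "queueName": "pdf-queue",
            "connection": "AzureWebJobsStorage",
        },
    ],
    "disabled": False,
}

print(json.dumps(function_json, indent=2))
```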



Queue Trigger - PDF Processing - Queue Output


The second Function is triggered by the queue message containing the filename and retrieves the PDF file from Blob Storage. It uses GhostScript to render each page of the PDF into a separate PNG image. As each page image is uploaded to Blob Storage, the Function also writes the image filename to Queue Storage.


Remember, a multi-page PDF produces multiple PNG images, one per page. We’ll queue each image filename to Queue Storage (Output) so that each image is processed by OCR separately later on.
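The solution does this rendering in-process with GhostScript.NET. As an illustration of the same idea, the Python sketch below swaps in the GhostScript command-line tool instead: it builds a command that renders every page to its own PNG file and lists the filenames that would then be queued for OCR, one message per page. The executable name gswin64c, the 300 DPI resolution, and the prefix-N.png naming are assumptions.

```python
def ghostscript_png_command(pdf_path, output_prefix, dpi=300, gs="gswin64c"):
    """Build a GhostScript command that renders every PDF page to its own PNG.

    The %d in -sOutputFile is replaced by the page number, so a 3-page PDF
    produces <prefix>-1.png, <prefix>-2.png and <prefix>-3.png.
    """
    return [
        gs,
        "-dNOPAUSE",          # don't pause between pages
        "-dBATCH",            # exit once the file is processed
        "-dSAFER",            # restrict file operations
        "-sDEVICE=png16m",    # 24-bit colour PNG output
        f"-r{dpi}",           # rendering resolution in DPI
        f"-sOutputFile={output_prefix}-%d.png",
        pdf_path,
    ]


def page_image_names(output_prefix, page_count):
    """Image filenames to enqueue, one queue message per page."""
    return [f"{output_prefix}-{page}.png" for page in range(1, page_count + 1)]


cmd = ghostscript_png_command("invoice.pdf", "invoice")
print(" ".join(cmd))
print(page_image_names("invoice", 3))
# To render for real where GhostScript is installed:
#   import subprocess; subprocess.run(cmd, check=True)
```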


Similar to the earlier configuration, let’s configure the Queue Storage (Output) under Integrate.

Integrate


Queue Trigger - Logic App


Lastly, the third Function is triggered by the queue message containing the PNG image filename and sends a POST request to a Logic App, which handles the OCR processing via the Cognitive Services Computer Vision API. The Logic App then stores the results in Blob Storage.
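The third Function’s job is essentially a single HTTP POST. Here is a minimal Python sketch of the request it would send, assuming the Logic App starts with a “When a HTTP request is received” trigger and expects a JSON body with a filename field. The URL and the field name are placeholders; copy the real URL from your Logic App’s trigger.

```python
import json
import urllib.request

# Placeholder: copy the real callback URL from the Logic App's HTTP request trigger.
LOGIC_APP_URL = "https://example.logic.azure.com/workflows/<id>/triggers/manual/paths/invoke"


def build_logic_app_request(image_name, url=LOGIC_APP_URL):
    """Build (but do not send) the POST that hands one page image to the Logic App."""
    body = json.dumps({"filename": image_name}).encode("utf-8")  # hypothetical field name
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


req = build_logic_app_request("invoice-1.png")
print(req.get_method(), req.data.decode())
# To send it: urllib.request.urlopen(req)
```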


Logic App


This is what the finished Logic App looks like in the designer.

Logic App Designer

Alternatively, you could copy this code into the Code View of your Logic App, replace the variables, and it should work!
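If you want to store plain text rather than the raw OCR JSON, the response can be flattened before it is written to Blob Storage. The Python sketch below assumes the Computer Vision OCR response shape (regions containing lines containing words, each word with a text field); the sample response is made up for illustration.

```python
def ocr_result_to_text(ocr_json):
    """Flatten a Computer Vision OCR response into plain text, one line per OCR line."""
    lines = []
    for region in ocr_json.get("regions", []):
        for line in region.get("lines", []):
            lines.append(" ".join(word["text"] for word in line.get("words", [])))
    return "\n".join(lines)


# Made-up sample in the assumed regions -> lines -> words shape.
sample = {
    "language": "en",
    "regions": [
        {"lines": [
            {"words": [{"text": "Invoice"}, {"text": "#42"}]},
            {"words": [{"text": "Total:"}, {"text": "$10"}]},
        ]}
    ],
}
print(ocr_result_to_text(sample))
```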



Here’s the Azure Functions code for the solution.


Cheers!
