How to Process Handwritten Text Using Python and Cloud Vision

With the size and scope of collected data expanding with each passing day, it is very important to continually analyze available data to drive better-informed business and policy decisions. To a large extent, handwritten data remains unexplored and unanalyzed. If we can analyze handwritten text data, we can minimize the hurdles and save the manpower involved in digitizing handwritten data.

In this blog, we cover how handwritten text data can be processed using Python and Google Cloud Vision. Cloud Vision offers powerful pre-trained ML models, so we do not need to do any training of our own.

Below is an example of converting simple handwritten text into digital words, which can be easily ingested into a CSV file or database.

Using the applications listed in the instructions below, we can convert scanned images or PDF files into digital text. Our approach first converts PDF documents to an image format. Because we are processing images, it is best to resize all of them to the same dimensions. We then need to define the regions of the image from which we extract data, which requires knowing the coordinates of the fields to be extracted.
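
As a minimal sketch of this preprocessing step, the snippet below converts a PDF into page images and resizes them to a common size with OpenCV. The file name and target dimensions are placeholders to adjust for your own documents.

    import cv2
    import numpy as np
    from pdf2image import convert_from_path

    # Convert each page of the PDF into an image (pdf2image requires poppler).
    pages = convert_from_path("scanned_form.pdf")  # placeholder file name

    # Resize every page to the same dimensions so field coordinates stay consistent.
    TARGET_SIZE = (1700, 2200)  # (width, height) in pixels; adjust to your scans
    resized = [cv2.resize(np.array(page), TARGET_SIZE) for page in pages]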

Steps to Convert Handwritten Text into Digital Data

  1. The prerequisites for this exercise are to install Google Cloud Vision, Python 3, Handprint, Keras, NumPy, pandas, pdf2image, pytesseract, and OpenCV (cv2).
  2. Convert the PDF document to an image using the following Python command.
    from pdf2image import convert_from_path
    images = convert_from_path(file)
  3. Save each image to a directory so it can be used for data extraction (the index keeps pages from overwriting one another).
    for i, img in enumerate(images):
       img.save(path + "\\" + fileName + '_' + str(i) + '.jpg', 'JPEG')
  4. Read the image using the following command.
    import cv2
    image = cv2.imread(filename)
  5. Pass the image to pytesseract to convert it into a word-level data dictionary (see the sketch after this list).
  6. To view the dictionary keys, run the keys command shown in the sketch below.
  7. Check the extracted text data.
  8. Extract the coordinates of the required text using the coordinate lookup shown in the sketch below.
  9. Repeat the same process for all the required text fields.
  10. Use the extracted coordinates to crop the image for the required field and save the cropped image to a specific location.
  11. Set up the Cloud Vision account on Google, download the credentials file, and point the GOOGLE_APPLICATION_CREDENTIALS environment variable at it.
  12. Process these cropped images using Google’s Cloud Vision to extract handwritten values.
  13. Extract the required data from the JSON responses returned by Cloud Vision.
  14. Store the extracted information in a Python dictionary, then convert the dictionary to a pandas DataFrame for easy processing.
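
Several of the steps above reference commands that are easiest to show in one place, so here is a minimal end-to-end sketch of steps 5 through 14 using pytesseract, OpenCV, and the google-cloud-vision client library. The field label 'Name', the file names, and the crop offsets are placeholder assumptions; adapt them to your own form layout and credentials.

    import os
    import cv2
    import pandas as pd
    import pytesseract
    from google.cloud import vision

    # Step 5: convert the page image into a word-level data dictionary.
    image = cv2.imread("page_1.jpg")  # placeholder path to a saved page image
    data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)

    # Step 6: view the available keys (text, left, top, width, height, conf, ...).
    print(data.keys())

    # Step 7: check the extracted text data.
    print(data['text'])

    # Step 8: extract the coordinates of a required label; 'Name' is a placeholder.
    idx = data['text'].index('Name')
    x, y, w, h = data['left'][idx], data['top'][idx], data['width'][idx], data['height'][idx]

    # Step 10: crop the region next to the label and save it; the offsets are
    # illustrative and depend on where the handwritten value sits on your form.
    crop = image[max(y - 10, 0):y + h + 10, x + w:x + w + 400]
    crop_path = "name_crop.jpg"
    cv2.imwrite(crop_path, crop)

    # Steps 11-12: point the client at the downloaded credentials file and send
    # the cropped image to Cloud Vision's document text detection.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "credentials.json"  # placeholder
    client = vision.ImageAnnotatorClient()
    with open(crop_path, "rb") as f:
        content = f.read()
    response = client.document_text_detection(image=vision.Image(content=content))

    # Step 13: pull the recognized text out of the response.
    name_value = response.full_text_annotation.text.strip()

    # Step 14: collect the extracted fields in a dictionary and build a DataFrame.
    record = {"Name": name_value}
    df = pd.DataFrame([record])
    print(df)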

As this example deals with only two images, the instructions do not include many validations. For a real-world application, analysts should validate the extracted data based on confidence levels and filter out low-confidence results.
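
As one possible illustration of that validation step, the sketch below walks Cloud Vision's full-text annotation and keeps only words whose confidence clears an assumed threshold. The cropped-image path and the cutoff value are placeholders.

    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open("name_crop.jpg", "rb") as f:  # placeholder cropped-field image
        response = client.document_text_detection(image=vision.Image(content=f.read()))

    # Keep only words whose word-level confidence clears an assumed threshold.
    MIN_CONF = 0.8  # placeholder cutoff; tune against your own documents
    confident_words = []
    for page in response.full_text_annotation.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    text = "".join(symbol.text for symbol in word.symbols)
                    if word.confidence >= MIN_CONF:
                        confident_words.append(text)
    print(confident_words)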

In Conclusion

Handwritten image and text processing is very useful across many industries, including healthcare, insurance, and utilities. Use cases include invoice processing, processing of inspection forms, employee onboarding, analyzing reviews and survey data, and so on. One promising application for utilities and other organizations relying on distributed field services is the digitization of handwritten notes taken by field technicians when completing repairs or regularly scheduled maintenance. By digitizing these notes, and even incorporating process automation to do so, utilities are better positioned to retain critical handwritten information regarding asset health, maintenance history, and more.


Let's get your data streamlined today!