Extract text from an image

I have watched a lot of webcasts about Microsoft Dynamics 365 for Finance and Operations. As a trainer, I’m able to archive some of these webcasts for reference purposes. But how can I find a specific topic in this pile of webcasts? Is there a way to use the “Search through file content” option of Total Commander?

The first step for me is to grab Portable Network Graphics (PNG) images from a webcast. I used FFmpeg to grab stills from a webcast. Then I deleted duplicate images and all images that didn’t show a presentation slide. And last but not least, I created a command line utility to help with the OCR process.
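The still-grabbing step can be scripted. The following is a minimal Python sketch, not the exact command I used: it assumes FFmpeg is on the PATH, and the file names, the 10-second interval, and the `still_%04d.png` naming pattern are hypothetical choices for illustration. It relies on FFmpeg's `fps` video filter to keep one frame per interval.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video, out_dir, interval_seconds=10):
    """Return an FFmpeg argument list that saves one PNG still every interval_seconds."""
    # Hypothetical naming scheme: still_0001.png, still_0002.png, ...
    pattern = str(Path(out_dir) / "still_%04d.png")
    return [
        "ffmpeg",
        "-i", str(video),                    # input webcast recording
        "-vf", f"fps=1/{interval_seconds}",  # keep one frame per interval
        pattern,
    ]


def grab_stills(video, out_dir, interval_seconds=10):
    """Create out_dir if needed and run FFmpeg; requires ffmpeg on the PATH."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_command(video, out_dir, interval_seconds), check=True)
```

Calling `grab_stills("webcast.mp4", "stills")` would fill the `stills` folder with PNG files. A longer interval produces fewer near-duplicate slides to delete afterwards.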

Download and unzip the command line utility

Open a command prompt in the unzipped folder:

  • In Windows Explorer, select the unzipped folder
  • With the left Shift key pressed, right-click the folder and select “Open command window here”

Example 1. Scan one specific folder and create an output file within that folder.

png2text folder="c:\\Users\\botten\\Pictures\\MyTest\\"

Example 2. Scan one folder and define a specific output file.

png2text folder="c:\\Users\\botten\\Pictures\\MyTest\\" outputfile="c:\\Users\\botten\\OCR\\20181212.txt"

The generated result file contains the text found in each image. The text of each image is surrounded by an XML-style tag that is built from the file name. After processing several images in a folder, I generated the output file, and now I can search it for keywords.
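To make the format concrete, here is a small Python sketch of how one entry could be assembled. It is an illustration only, not the code inside png2text: the function name `format_entry` is hypothetical, and it assumes the tag is simply the file name without its extension.

```python
from pathlib import Path


def format_entry(image_name, text):
    """Wrap recognized text in a tag derived from the image file name."""
    tag = Path(image_name).stem  # assumption: "png0.png" becomes the tag "png0"
    return f"<{tag}>\n{text.strip()}\n</{tag}>\n"


print(format_entry("png0.png", "Features of Dynamics 365 for Finance and Operations"))
```

Because every block carries the name of its source image, a keyword hit in the text file leads straight back to the slide it came from.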

Example output

<png0>
Features of Dynamics 365 for Finance and Operations
</png0>

<png1>
Lifecycle Services
Predictable
Automated
Proactive
</png1>
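Besides Total Commander, the output file can also be searched with a few lines of script. The sketch below is one possible approach in Python: the function name `find_images` is hypothetical, and it assumes every entry follows the layout shown above, with the opening tag, the text, and the closing tag each starting on a new line.

```python
import re

# Matches <tag> ... </tag> blocks; the back-reference \1 requires the
# closing tag to carry the same name as the opening tag.
ENTRY = re.compile(r"<(\w+)>\n(.*?)\n</\1>", re.DOTALL)


def find_images(ocr_text, keyword):
    """Return the tags of all entries whose text contains keyword (case-insensitive)."""
    needle = keyword.lower()
    return [tag for tag, body in ENTRY.findall(ocr_text) if needle in body.lower()]


sample = """<png0>
Features of Dynamics 365 for Finance and Operations
</png0>

<png1>
Lifecycle Services
Predictable
Automated
Proactive
</png1>
"""

print(find_images(sample, "lifecycle"))  # ['png1']
```

The returned tags identify the images, and therefore the slides, that mention the keyword.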