
How to optimize results from the OCR API when extracting text from an image?

by Level 7 on 03-12-2015 11:17 - edited on 02-02-2016 09:09 by Community Manager

Question

How to optimize results from the OCR API when extracting text from an image? 

Answer

Please note that HP IDOL OnDemand is now HPE Haven OnDemand and the API endpoints have changed accordingly. Please see the API documentation for more details.

 

---

 

There are a few tricks you can use with the OCR API to improve the results of your request.

 

A clean image with sharp, dark type on a white background greatly improves the OCR engine's ability to identify your text. So make sure your lighting is good and hold the camera steady when you snap a picture with your mobile device.

 

The larger the image, the more detail it contains, and the OCR engine can mistake background distortions for possible text. So instead of making your results clearer, a larger picture with a lot of minuscule detail may actually not give you the best result. On the other hand, when the image is too small, you start to lose the sharpness of your font and the image becomes pixelated. Depending on the quality of your camera, the photos themselves can be too distorted. So it may take some testing to find the settings that give the best results.

 

Note the 'mode' parameter of the OCR API. When you use photos from your mobile camera, try the 'scene_photo' mode instead of the default 'document_photo' mode, which is intended for processing professional images, automated images, screenshots and scans, for instance. The IDOL OnDemand engine treats different types of images differently in order to get the best results; a sketch of such a request is shown below.
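As a rough illustration, here is a minimal Python sketch of sending an image to the OCR API with an explicit mode. The endpoint path, parameter names and response fields are assumptions taken from the Haven OnDemand documentation and should be checked against the current docs; YOUR_API_KEY and receipt.jpg are placeholders.

import requests

# Assumed Haven OnDemand synchronous OCR Document endpoint - verify in the API docs.
OCR_URL = "https://api.havenondemand.com/1/api/sync/ocrdocument/v1"

def ocr_image(image_path, mode="scene_photo", api_key="YOUR_API_KEY"):
    """Send a local image to the OCR API and return the recognised text blocks."""
    with open(image_path, "rb") as image_file:
        response = requests.post(
            OCR_URL,
            files={"file": image_file},
            data={"mode": mode, "apikey": api_key},
        )
    response.raise_for_status()
    # The response is expected to contain a "text_block" array, each entry
    # holding a "text" field with the recognised characters.
    return [block["text"] for block in response.json().get("text_block", [])]

if __name__ == "__main__":
    for text in ocr_image("receipt.jpg", mode="scene_photo"):
        print(text)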

Comments
by TPS99 Level 4 on 04-14-2015 08:51

Just to add to the above - recently we were using this API and, having followed the steps above, we were getting results very close to what we were after but not quite 100% there.

 

OCR by its nature is never perfect, but in many cases you are looking for a match against a set of possible terms or strings, as opposed to trying to interpret any text in any scenario. In our particular case we were looking to match OCR output against 5-10 possible names, a task made slightly trickier because some of the names were quite similar. After a search on the web we came across the “Levenshtein Distance” algorithm on wikibooks.org. This algorithm takes two strings and iterates through each to compare them, returning the number of edits needed to get from one to the other. So, for example, “cat” to “hats” has a score of 2 (see the sketch below).
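For illustration, here is a minimal sketch of the Levenshtein distance in Python, using the classic dynamic-programming formulation; it is not the exact code from the JSFiddle linked below, just the same idea.

def levenshtein(a, b):
    """Return the number of single-character edits needed to turn a into b."""
    # previous_row holds the distances from the empty prefix of a to every prefix of b
    previous_row = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current_row = [i]
        for j, char_b in enumerate(b, start=1):
            insert_cost = current_row[j - 1] + 1
            delete_cost = previous_row[j] + 1
            substitute_cost = previous_row[j - 1] + (char_a != char_b)
            current_row.append(min(insert_cost, delete_cost, substitute_cost))
        previous_row = current_row
    return previous_row[-1]

print(levenshtein("cat", "hats"))  # 2: substitute 'c' -> 'h', append 's'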

 

Combining this with the IDOL OnDemand OCR API we were able to get some excellent results, especially if we tried two different modes in the OCR API (see remkohdev’s advice above).

 

Enclosed is a JSFiddle showing this in action so that you can try it yourself:

 

http://jsfiddle.net/tpbldr99/tq5e6suf/

 

In the code there is a “match list” of five terms - if you enter, for example, “k??ten&” to simulate the output from IDOL OnDemand OCR, you’ll see that “kites” is close with a score of 4 but “kittens” is the best match.

 

Putting this together with the IDOL onDemand OCR API should help you in any use case where you want to match text in an image against a known set of terms.
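As a rough sketch of that approach, the snippet below reuses the levenshtein function above to pick the closest term from a match list. The term list is hypothetical; only “kites” and “kittens” come from the example above.

def best_match(ocr_text, terms):
    """Return (distance, term) for the term closest to the OCR output."""
    scored = [(levenshtein(ocr_text.lower(), term.lower()), term) for term in terms]
    return min(scored)

terms = ["kittens", "kites", "kettle", "mitten", "kitchen"]  # hypothetical match list
print(best_match("k??ten&", terms))  # (3, 'kittens') beats 'kites' at distance 4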
