
OCR for Real Studio and Filemaker


Today we added OCR functions for our Filemaker plugin. The example looks like this:

We use tesseract3 as the engine and provide the functionality for both Real Studio and Filemaker with our latest plugins. Basically, you give it an image and ask for the text. For better results, check the options, such as setting the page segmentation mode or a rectangle of interest.
In first tests by some users, the results were less than optimal, mostly because of wrong options. For a business card, for example, the default page segmentation mode is wrong: Single Block mode does not work well with multi-column text or mixed text like on a business card. Better use Auto mode here.
Next, make sure you have the best possible image. We recommend at least 150 dpi, better 300 dpi, and an RGB or grayscale image if you have one. Black and white is often not so good. Internally, tesseract applies some filters and converts the image to black and white itself, and the result is normally better if you let tesseract do that conversion. An image with poor resolution can cause tesseract to recognize nothing at all.
Finally, you have to choose a language, because the engine has been trained with sample text from a given language. You can of course create your own training file. If you don't know the language, you can simply try all the language packs you want to support and pick the result with the best confidence.
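To make the choice of mode a bit more concrete, here is a small Python sketch. The numeric values follow the page segmentation modes in Tesseract 3's public header as far as we know. The helper `choose_mode` and its layout labels are invented for this illustration and are not part of the plugin, which may name its modes differently.

```python
from enum import IntEnum


class PageSegMode(IntEnum):
    """Subset of Tesseract 3 page segmentation modes (values assumed from its header)."""
    AUTO = 3           # fully automatic page segmentation
    SINGLE_COLUMN = 4  # one column of text
    SINGLE_BLOCK = 6   # one uniform block of text
    SINGLE_LINE = 7    # one text line


def choose_mode(layout: str) -> PageSegMode:
    """Map a hypothetical layout label to a page segmentation mode."""
    # Multi-column or mixed layouts (e.g. business cards) do poorly in
    # Single Block mode, so fall back to automatic segmentation.
    if layout in ("business_card", "multi_column", "mixed"):
        return PageSegMode.AUTO
    if layout == "single_line":
        return PageSegMode.SINGLE_LINE
    return PageSegMode.SINGLE_BLOCK
```

With this, `choose_mode("business_card")` picks Auto, while a plain paragraph keeps Single Block.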
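If you are not sure what resolution an image has, you can estimate it from the pixel size and the physical size of the original. The following Python sketch shows the arithmetic. The function names and the rating labels are our own for this example, and the thresholds are simply the 150 and 300 dpi figures recommended above.

```python
def effective_dpi(pixels: int, inches: float) -> float:
    """Dots per inch along one edge: pixel count divided by physical length."""
    if inches <= 0:
        raise ValueError("physical size must be positive")
    return pixels / inches


def rate_resolution(dpi: float) -> str:
    """Rate a resolution against the 150/300 dpi recommendation."""
    if dpi < 150:
        return "too low"
    if dpi < 300:
        return "acceptable"
    return "recommended"
```

For example, a business card that is 3.5 inches wide and scanned to 1050 pixels across gives 1050 / 3.5 = 300 dpi, while the same card at 350 pixels across is only 100 dpi and likely too low.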
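The idea of trying several language packs and keeping the best result can be sketched like this in Python. The `recognize` callback stands in for whatever OCR call you use. Its signature (image and language in, text and confidence out) is an assumption made for this example, not the plugin's API.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical OCR callback: (image, language) -> (text, confidence)
Recognizer = Callable[[bytes, str], Tuple[str, float]]


def best_by_confidence(
    image: bytes, languages: Iterable[str], recognize: Recognizer
) -> Tuple[str, str, float]:
    """Run OCR once per language and return (language, text, confidence)
    for the run with the highest confidence."""
    best = None
    for lang in languages:
        text, confidence = recognize(image, lang)
        if best is None or confidence > best[2]:
            best = (lang, text, confidence)
    if best is None:
        raise ValueError("no languages given")
    return best
```

Keep in mind that every extra language means another full recognition pass, so limit the list to the packs you really want to support.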

You can find documentation here for Filemaker and here for Real Studio.
We hope you have fun adding OCR functionality to Real Studio and Filemaker solutions. If you have questions, please do not hesitate to contact us.
29 08 12 - 23:35
four comments

This is totally awesome!

My dad needs to do a lot of OCR for an upcoming project, and we’ve been looking for a solution we could tailor to his specific needs. Now we can just code our own, with Real Studio and MBS – really cool Christian, thanks man!
Thomas Boelskifte (URL) - 29 10 12 - 12:36

Excellent function. I am so thankful that you guys came up with it. We can now replace the external execution we used to do OCR.

It would be wonderful to have image comparison for multiple choice checks in the OCR functions.

tks
Guillermo Dewey (URL) - 17 11 12 - 20:05

Just wondering, why does the function OCR.SetRectangle use (x, y) and (width, height)

and not

X1 Y1
X2 Y2

I am asking because most image mappers give you these 4 numbers, and none I can find give you the x and y and then the height and width :S
Guillermo Dewey (URL) - 17 11 12 - 21:09

That is a design decision. Calculating X2/Y2 with width/height is easy. Simply use it and report (by email) if you have problems.
Christian Schmitz (URL) - 17 11 12 - 21:39
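
For readers with the same question, the conversion from two corner points to x, y, width and height can be sketched in Python as follows. The function name is made up for this example. It assumes X2/Y2 mark the far edge of the rectangle, so add 1 to width and height if your image mapper treats the second corner as an inclusive pixel.

```python
def corners_to_rect(x1: int, y1: int, x2: int, y2: int):
    """Convert two opposite corners into (x, y, width, height).

    Works in either corner order by taking the smaller coordinate
    as the origin and the absolute difference as the size.
    """
    x, y = min(x1, x2), min(y1, y2)
    width, height = abs(x2 - x1), abs(y2 - y1)
    return x, y, width, height
```

For corners (10, 20) and (110, 70) this gives x = 10, y = 20, width = 100 and height = 50.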


  