A recent project of mine called for optical character recognition (OCR). After a brief Google search and a personal recommendation, I decided to use tesseract because it is cross-platform, under active development, and has a Python API (pytesseract).
Installing these was surprisingly easy:
tesseract has a Windows installer which comes with the English language data available here.
pytesseract can be installed using pip:
pip install pytesseract
pytesseract states that it requires the Python Imaging Library (PIL). However, that project no longer appears to be active, so I used its maintained fork, pillow. This can be installed using pip:
pip install pillow
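If you want to confirm that both installs worked before going any further, a quick check along these lines should do. This is only a sketch: missing_packages is a helper name I made up for illustration, and it only checks that Python can find the packages, not that tesseract itself is installed. Note that pillow is installed as pillow but imported as PIL.

```python
import importlib.util


def missing_packages(import_names):
    """Return the names from import_names that Python cannot find for import."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


# pillow is installed under the name "pillow" but imported as "PIL".
missing = missing_packages(["pytesseract", "PIL"])
if missing:
    print("Not installed: " + ", ".join(missing))
else:
    print("pytesseract and pillow are both importable")
```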
And that’s it!
You should now be able to do some optical character recognition with Python:
import pytesseract
from PIL import Image

# Open the image with pillow and print the text tesseract finds in it
print(pytesseract.image_to_string(Image.open('test.jpg')))
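If pytesseract complains that it cannot find tesseract, the executable is probably not on your PATH. pytesseract lets you point it at the executable through pytesseract.pytesseract.tesseract_cmd. Below is a minimal sketch of how that might be wrapped up. The helper names find_tesseract and image_to_text are my own, and any fallback locations you pass in are assumptions about where the installer put tesseract on your machine.

```python
import os
import shutil


def find_tesseract(candidates=()):
    """Return a path to the tesseract executable, or None if it is not found.

    candidates is an optional list of fallback locations to try when
    tesseract is not on PATH (hypothetical paths supplied by the caller).
    """
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def image_to_text(image_path, tesseract_path=None):
    """Run OCR on one image file and return the recognised text."""
    # Imported here so find_tesseract can be used without the OCR packages.
    import pytesseract
    from PIL import Image

    if tesseract_path:
        # pytesseract reads the executable location from this attribute.
        pytesseract.pytesseract.tesseract_cmd = tesseract_path
    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image)
```

You would then call something like image_to_text('test.jpg', find_tesseract([...])), filling in the list with wherever tesseract lives on your system.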
As always, if you have any comments or suggestions please feel free to get in touch.