Installing pytesseract – practically painless

Last updated on 2 April 2015

A recent project of mine called for optical character recognition (OCR). After a brief Google search and a personal recommendation, I decided to use tesseract because it is cross-platform, under active development, and has a Python wrapper (pytesseract).

Installing these was surprisingly easy:

tesseract has a Windows installer, available here, which comes with the English language data.
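
Because pytesseract runs the tesseract program rather than bundling it, it is worth confirming that the installer put the binary somewhere Python can find it. The following is a minimal sketch using only the standard library; the helper name `find_tesseract` is my own, and it assumes the executable is simply called `tesseract`:

```python
import shutil


def find_tesseract(executable="tesseract"):
    """Return the full path to the tesseract binary, or None if it is not on PATH."""
    # shutil.which searches the directories listed in the PATH environment variable
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract was not found on PATH; check the installation")
    else:
        print("tesseract found at", path)
```

If this reports that tesseract was not found, either add its install directory to PATH or point pytesseract at it by setting `pytesseract.pytesseract.tesseract_cmd` to the full path of the executable.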

pytesseract can be installed using pip:

pip install pytesseract

pytesseract states that it requires the Python Imaging Library (PIL). However, that project no longer appears to be active, so I used pillow, its maintained fork. This can be installed using pip:

pip install pillow
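
To see which versions pip actually installed, the standard library can report them. This is a small sketch, assuming Python 3.8 or later for `importlib.metadata`; `installed_version` is a hypothetical helper and not part of either package:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string for a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("pytesseract", "pillow"):
        print(name, installed_version(name) or "not installed")
```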

And that’s it!

You should now be able to do some optical character recognition with Python:

import pytesseract
from PIL import Image
print(pytesseract.image_to_string(Image.open('test.jpg')))
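
The Windows installer comes with English language data. To check which languages your tesseract installation can use, one option is to ask the binary itself with `tesseract --list-langs`. The sketch below assumes tesseract is on your PATH and that the output is a header line followed by one language code per line, which may differ between versions; `parse_langs` and `available_languages` are illustrative names of my own:

```python
import subprocess


def parse_langs(output):
    """Extract language codes from `tesseract --list-langs` output."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Skip the header line, e.g. "List of available languages (2):"
    return [line for line in lines if not line.lower().startswith("list of")]


def available_languages(executable="tesseract"):
    """Run tesseract and return the language codes it reports."""
    result = subprocess.run(
        [executable, "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # older versions may print the list to stderr
        universal_newlines=True,
    )
    return parse_langs(result.stdout)


if __name__ == "__main__":
    try:
        print(available_languages())
    except FileNotFoundError:
        print("tesseract was not found on PATH")
```

A code from that list can then be passed to pytesseract through the `lang` argument of `image_to_string`, for example `lang='spa'` for Spanish.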

As always, if you have any comments or suggestions please feel free to get in touch.

10 Comments

Hamza

Nicely and easily explained. Thanks!

# 29 June 2016 Reply
Sesha

Hello, thanks for the guide. Do you know how to convert an image to a Spanish-language string using pytesseract?

# 17 July 2016 Reply
- GrimHacker
  
  Hi Sesha
  I’m sorry for the slow reply, I hope the following can still be of use to you or others:
  I have not tested this, but if you have installed the Spanish language package for tesseract, you can specify the language to use in pytesseract like this:
  
  print(pytesseract.image_to_string(Image.open('test-european.jpg'), lang='spa'))
  
  # 8 September 2016 Reply
Paul

hello grimhacker, i tried the steps accordingly but when i try to run the code it gives me this error below.
WindowsError: [Error 2] The system cannot find the file specified

my actual code is:
image = cv2.imread("test.png",0)
cv2.imshow("text", image)
img = Image.fromarray(image)
print pytesseract.image_to_string(img)
cv2.waitKey(0)
any suggestion?

# 21 November 2016 Reply
- GrimHacker
  
  Hi Paul,
  
  I have seen that error previously when the python library is installed but the tesseract binary is not in your system path.
  
  Make sure you have tesseract installed and that it is working, using the information here: https://github.com/tesseract-ocr/tesseract/wiki
  
  Let me know if that works for you 🙂
  
  # 21 November 2016 Reply
John

Hi, I’m having a problem with pytesseract. I have been through Paul’s error; however, another error has sprung up.
"Traceback (most recent call last):
File "", line 1, in
a()
File "", line 5, in a
p.image_to_string(img)
File "C:\Python34\lib\site-packages\pytesseract\pytesseract.py", line 163, in image_to_string
errors = get_errors(error_string)
File "C:\Python34\lib\site-packages\pytesseract\pytesseract.py", line 110, in get_errors
error_lines = tuple(line for line in lines if line.find('Error') >= 0)
File "C:\Python34\lib\site-packages\pytesseract\pytesseract.py", line 110, in
error_lines = tuple(line for line in lines if line.find('Error') >= 0)
TypeError: 'str' does not support the buffer interface"

i have read in other webpages about bytes, strings and encodings. your reply is very much appreciated.
cheers!

# 22 December 2016 Reply
- GrimHacker
  
  Hi John
  
  I think this is a known issue in pytesseract when the image you are converting contains characters of a different encoding: https://github.com/madmaze/pytesseract/issues/32
  
  I don’t think there is anything I can do to help you solve that one, sorry 🙁
  
  # 31 January 2017 Reply
fgc

A pytesseract installation using pip, in March 2017, did not appear to include updates from the latest merged pull request, number 33. PR 33 addresses potential encoding issues in the output of Tesseract-OCR.

The pytesseract project page (https://pypi.python.org/pypi/pytesseract) appears to reflect an upload date of 2015-03-19. Is that the date of the files installed when using pip? How can pip be “forced” to update to and install the latest “official” version from GitHub?

Pip’s a nice tool for convenient installation, but if packages aren’t installed using the current version, that seems to diminish the value.

# 30 March 2017 Reply
- GrimHacker
  
  Hi
  As far as I am aware PyPI does not generate packages, it is up to the maintainer to upload the latest version of their module.
  You might consider opening an Issue on the project’s GitHub page requesting PyPI be updated with the latest version, or if you have a pressing need to use the latest version, follow the instructions in the readme to install from source:
  $> git clone git@github.com:madmaze/pytesseract.git
  $ (env)> python setup.py install
  
  # 27 April 2017 Reply
Luiz Henrique Bernardes

Hi, I’m trying to use pytesseract but I’m having the same problem on Windows 8 and 10, with Python 3.4 and 3.6.

Python 3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 17:54:52) [MSC v.1900 32 bit (Intel)]
on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from PIL import Image
>>> import pytesseract
>>> img = Image.open('C:\\Users\\User\\Desktop\\Docs\\20170124_184232.jpg')
>>> img

>>> pytesseract.pytesseract.tesseract_cmd = pytesseract.__path__
>>> print(pytesseract.image_to_string(img))
Traceback (most recent call last):
File "", line 1, in
File "C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pytesseract\pytesseract.py", line 109, in image_to_string
config=config)
File "C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pytesseract\pytesseract.py", line 42, in run_tesseract
stderr=subprocess.PIPE)
File "C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\subprocess.py", line 707, in __init__
restore_signals, start_new_session)
File "C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\subprocess.py", line 990, in _execute_child
startupinfo)
PermissionError: [WinError 5] Acesso negado (Access denied)
>>>

I already tried to run as admin and change the directory of the image.

Thanks! 🙂

# 9 June 2017 Reply
