
Installing pytesseract – practically painless

Last updated on 2 April 2015

A recent project of mine called for optical character recognition. After a brief Google search and a personal recommendation, I decided to use tesseract because it is cross-platform, under active development, and has a Python API (pytesseract).

Installing these was surprisingly easy:

tesseract has a Windows installer, available here, which comes with the English language data.

pytesseract can be installed using pip:

pip install pytesseract

pytesseract states that it requires the Python Imaging Library (PIL); however, that project no longer appears to be active, so I used Pillow, its maintained fork. This can also be installed using pip:

pip install pillow

And that’s it!

You should now be able to do some optical character recognition with Python:

import pytesseract
from PIL import Image
print(pytesseract.image_to_string(Image.open('test.jpg')))
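
One thing that can trip this up: pytesseract calls the tesseract executable as a separate program, so if that executable is not on your PATH the call typically fails with a "cannot find the file specified" style error. The following is a minimal sketch of one way around that, not something taken from the pytesseract documentation. The helper names find_tesseract and ocr_image are my own, and the two install locations are assumptions about where the Windows installer puts things, so adjust them to match your machine. The only pytesseract-specific piece is the tesseract_cmd setting, which needs to point at the executable itself rather than at a directory.

```python
import os
import shutil


def find_tesseract(candidates=(
        # Assumed default install locations; adjust to match your machine.
        r"C:\Program Files\Tesseract-OCR\tesseract.exe",
        r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe")):
    """Return the path to the tesseract executable, or None if not found."""
    # Prefer whatever is already on PATH.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise fall back to the candidate locations, in order.
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def ocr_image(image_path):
    """Run OCR on image_path, pointing pytesseract at the executable first."""
    # Imported here so find_tesseract can be used without these packages.
    import pytesseract
    from PIL import Image

    exe = find_tesseract()
    if exe is None:
        raise RuntimeError(
            "tesseract executable not found; install it or add it to PATH")
    # tesseract_cmd must be the executable itself, not a directory.
    pytesseract.pytesseract.tesseract_cmd = exe
    return pytesseract.image_to_string(Image.open(image_path))
```

With that in place, print(ocr_image('test.jpg')) should behave like the three-line example above, with a clearer error message if tesseract cannot be located.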

 


As always, if you have any comments or suggestions please feel free to get in touch.

Published in Installing and Configuring (notes to my future self)

10 Comments

  1. Sesha

    Hello, thanks for the guide. Do you know how to do an image to Spanish_language string using pytesseract?

    • GrimHacker

      Hi Sesha
      I’m sorry for the slow reply, I hope the following can still be of use to you or others:
      I have not tested this, but if you have installed the Spanish language package for tesseract, you can specify the language to use in pytesseract like this:

      print(pytesseract.image_to_string(Image.open('test-european.jpg'), lang='spa'))

  2. Paul

    hello grimhacker, i tried the steps accordingly but when i try to run the code it gives me this error below.
    WindowsError: [Error 2] The system cannot find the file specified

    my actual code is:
    image = cv2.imread("test.png",0)
    cv2.imshow("text", image)
    img = Image.fromarray(image)
    print pytesseract.image_to_string(img)
    cv2.waitKey(0)
    any suggestion?

  3. John

    hi, i’m having a problem with pytesseract. i got past paul’s error, but then another error came up.
    "Traceback (most recent call last):
    File "", line 1, in
    a()
    File "", line 5, in a
    p.image_to_string(img)
    File "C:\Python34\lib\site-packages\pytesseract\pytesseract.py", line 163, in image_to_string
    errors = get_errors(error_string)
    File "C:\Python34\lib\site-packages\pytesseract\pytesseract.py", line 110, in get_errors
    error_lines = tuple(line for line in lines if line.find('Error') >= 0)
    File "C:\Python34\lib\site-packages\pytesseract\pytesseract.py", line 110, in
    error_lines = tuple(line for line in lines if line.find('Error') >= 0)
    TypeError: 'str' does not support the buffer interface"

    i have read in other webpages about bytes, strings and encodings. your reply is very much appreciated.
    cheers!

  4. fgc

    A pytesseract installation using pip, in March 2017, did not appear to include updates from the latest merged pull request, number 33. PR 33 provides for potential encoding issues resulting from output of Tesseract-OCR.

    The pytesseract project page (https://pypi.python.org/pypi/pytesseract) appears to reflect an upload date of 2015-03-19. Is that the date of the files installed when using pip? How can pip be "forced" to update to and install the latest "official" version from GitHub?

    Pip’s a nice tool for convenient installation, but if packages aren’t installed using the current version, that seems to diminish the value.

    • GrimHacker

      Hi
      As far as I am aware PyPI does not generate packages, it is up to the maintainer to upload the latest version of their module.
      You might consider opening an Issue on the project’s GitHub page requesting PyPI be updated with the latest version, or if you have a pressing need to use the latest version, follow the instructions in the readme to install from source:
      $> git clone git@github.com:madmaze/pytesseract.git
      $ (env)> python setup.py install

  5. Luiz Henrique Bernardes

    Hi, I’m trying to use pytesseract but I’m having the same problem on Windows 8 and 10, with Python 3.4 and 3.6.

    Python 3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 17:54:52) [MSC v.1900 32 bit (Intel)]
    on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from PIL import Image
    >>> import pytesseract
    >>> img = Image.open('C:\\Users\\User\\Desktop\\Docs\\20170124_184232.jpg')
    >>> img

    >>> pytesseract.pytesseract.tesseract_cmd = pytesseract.__path__
    >>> print(pytesseract.image_to_string(img))
    Traceback (most recent call last):
    File "", line 1, in
    File "C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pytesseract\pytesseract.py", line 109, in image_to_string
    config=config)
    File "C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pytesseract\pytesseract.py", line 42, in run_tesseract
    stderr=subprocess.PIPE)
    File "C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\subprocess.py", line 707, in __init__
    restore_signals, start_new_session)
    File "C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\subprocess.py", line 990, in _execute_child
    startupinfo)
    PermissionError: [WinError 5] Acesso negado
    >>>

    I have already tried running as admin and moving the image to a different directory.

    Thanks! 🙂
