conda activate
conda install -c conda-forge tesseract
package                    |            build
---------------------------|-----------------
leptonica-1.82.0           |       h950d820_0         2.6 MB  conda-forge
libarchive-3.5.2           |       hccf745f_1         1.6 MB  conda-forge
lzo-2.10                   |    h516909a_1000         314 KB  conda-forge
tesseract-5.0.1            |       h84e3e21_0       171.4 MB  conda-forge
------------------------------------------------------------
                                       Total:       175.9 MB
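After the install, it helps to confirm that the `tesseract` binary from the activated environment is on `PATH` and can see its language files. This is a minimal sketch: `check_tesseract` is a hypothetical helper name, not part of tesseract. The two flags it calls, `--version` and `--list-langs`, are the ones listed in the usage summary.

```shell
# Hypothetical helper: confirm the activated environment provides tesseract
# and show which language packs it can find.
check_tesseract() {
  if ! command -v tesseract >/dev/null 2>&1; then
    echo "tesseract not found on PATH; is the conda env activated?" >&2
    return 1
  fi
  tesseract --version     # first line is the tesseract version
  tesseract --list-langs  # header line, then one language code per line
}

# Usage:
#   check_tesseract
```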
tesseract
Usage:
  tesseract --help | --help-extra | --version
  tesseract --list-langs
  tesseract imagename outputbase [options...] [configfile...]

OCR options:
  -l LANG[+LANG]        Specify language(s) used for OCR.
NOTE: These options must occur before any configfile.

Single options:
  --help                Show this help message.
  --help-extra          Show extra help for advanced users.
  --version             Show version information.
  --list-langs          List available languages for tesseract engine.
example: OCR the sample image from the tessdoc site (default language eng)
wget https://tesseract-ocr.github.io/tessdoc/images/eurotext.png
tesseract eurotext.png eurotext-eng
ls    # tesseract appends .txt to the output base name
eurotext-eng.txt  eurotext.png
cat eurotext-eng.txt
The (quick) [brown] {fox} jumps!
Over the $43,456.78 <lazy> #90 dog
& duck/goose, as 12.5% of E-mail
from aspammer@website.com is spam.
Der ,.schnelle” braune Fuchs springt
tiber den faulen Hund. Le renard brun
«rapide» saute par-dessus le chien
paresseux. La volpe marrone rapida
salta sopra il cane pigro. El zorro
marron rapido salta sobre el perro
perezoso. A raposa marrom rapida
salta sobre 0 cao preguigoso.
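The single-image command above extends to a folder of scans with a plain shell loop. This is a sketch under stated assumptions: `ocr_dir` and `ocr_outbase` are hypothetical helper names, inputs are assumed to be PNG files, and the text is assumed to be English. The extension is stripped from each output base because tesseract appends `.txt` itself.

```shell
# Hypothetical helpers for batch OCR of PNG scans (assumes English text).

# Output base = image path minus its extension; tesseract appends .txt itself.
ocr_outbase() {
  printf '%s\n' "${1%.*}"
}

# OCR every *.png in a directory (default: current one) into <name>.txt.
ocr_dir() {
  local dir=${1:-.} img
  for img in "$dir"/*.png; do
    [ -e "$img" ] || continue   # no PNGs: the glob is left unexpanded
    tesseract "$img" "$(ocr_outbase "$img")" -l eng
  done
}

# Usage:
#   ocr_dir testing
```

For a quick look without writing a file, `stdout` can be given as the output base, e.g. `tesseract eurotext.png stdout`.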
TESSDATA_PREFIX environment variable: tells tesseract which directory holds the *.traineddata language files.
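A sketch of setting it inside a conda environment. The location `$CONDA_PREFIX/share/tessdata` is an assumption about where the conda-forge package keeps its traineddata files; recent tesseract versions (4.x and later) expect the variable to name the tessdata directory itself rather than its parent. Running `tesseract --list-langs` afterwards shows whether the languages are found.

```shell
# Assumed location of the conda-forge traineddata files; adjust if
# `tesseract --list-langs` reports no languages with this setting.
export TESSDATA_PREFIX="${CONDA_PREFIX:-}/share/tessdata"

# One-off alternative that leaves the environment untouched:
#   tesseract eurotext.png stdout --tessdata-dir "$TESSDATA_PREFIX" -l eng
```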
selecting languages with -l (combine several with +):
tesseract testing/eurotext.tif testing/eurotext-eng -l eng
tesseract testing/eurotext.png testing/eurotext-engdeu -l eng+deu
tesseract testing/bilingual.jpg testing/bilingual-enghin -l eng+hin
tesseract testing/bilingual.jpg testing/bilingual-hineng -l hin+eng
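Combined specs such as `eng+hin` only work if every listed language has a traineddata file installed, and a minimal install may not include `hin`, `ara` or `san`. The helper below is a hypothetical sketch that prints the codes missing from a given list; feeding it the real list assumes the first line of `--list-langs` output is a header, as in current releases.

```shell
# Hypothetical helper: print each code in a +-separated -l spec (e.g. eng+hin)
# that does not appear in a newline-separated list of installed languages.
missing_langs() {
  local spec=$1 installed=$2 lang
  local IFS='+'
  for lang in $spec; do
    # -x: whole-line match, -F: treat the code as a fixed string
    printf '%s\n' "$installed" | grep -qxF "$lang" || printf '%s\n' "$lang"
  done
}

# Usage (drops the header line that --list-langs prints first):
#   missing_langs eng+hin "$(tesseract --list-langs | tail -n +2)"
```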
OUTPUTS
searchable pdf output:
tesseract testing/eurotext.png testing/eurotext-eng -l eng pdf
searchable pdf with a text layer (Arabic example, Windows paths):
tesseract c:\temp\test_ara.jpg c:\temp\test_ara -l ara --psm 3 pdf
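Output configs can be stacked: each trailing name enables one renderer, so a single run can write the searchable PDF and the plain text side by side. A sketch reusing the English sample; the guard only skips the command when tesseract or the image is not present.

```shell
# Each trailing config name (pdf, txt, hocr, tsv, ...) enables one renderer;
# this run should write testing/eurotext-eng.pdf and testing/eurotext-eng.txt.
if command -v tesseract >/dev/null 2>&1 && [ -f testing/eurotext.png ]; then
  tesseract testing/eurotext.png testing/eurotext-eng -l eng pdf txt
fi
```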
hOCR output
tesseract testing/eurotext.png testing/eurotext-eng -l eng hocr
cat testing/eurotext-eng.hocr
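hOCR is HTML in which each recognised word is a `span` with class `ocrx_word`, whose `title` attribute carries the bounding box and confidence. A rough word count can be pulled out with grep; `hocr_word_count` is a hypothetical helper, and a real HTML parser is the safer choice for anything beyond a quick check.

```shell
# Hypothetical helper: count ocrx_word spans in an hOCR file.
# Matches single- or double-quoted class attributes.
hocr_word_count() {
  grep -oE "class=['\"]ocrx_word['\"]" "$1" | wc -l | tr -d ' '
}

# Usage:
#   hocr_word_count testing/eurotext-eng.hocr
```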
tsv output
tesseract testing/eurotext.png testing/eurotext-eng -l eng tsv
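The TSV file has a header row followed by one row per page, block, paragraph, line and word, with the columns `level page_num block_num par_num line_num word_num left top width height conf text`. Word rows have `level` 5 and a confidence in `conf`; the other levels show -1 there. A sketch of keeping only confident words with awk; `tsv_words` and the threshold are illustrative, not part of tesseract.

```shell
# Hypothetical helper: print the words from a tesseract TSV file whose
# confidence is at least the given threshold (default 0 = every word).
# Columns: level page_num block_num par_num line_num word_num
#          left top width height conf text
tsv_words() {
  awk -F'\t' -v min="${2:-0}" \
    'NR > 1 && $1 == 5 && $11 + 0 >= min + 0 { print $12 }' "$1"
}

# Usage:
#   tsv_words testing/eurotext-eng.tsv 80
```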
page segmentation mode (--psm)
tesseract testing/san002.png testing/san002-psm6 -l san --psm 6
tesseract testing/san002.png testing/san002-psm3 -l san --psm 3
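`tesseract --help-psm` lists all modes; 3 (fully automatic, the default) and 6 (assume a single uniform block of text) are the two tried above. A hypothetical sketch that runs one image under several modes so the text outputs can be diffed; the usage shown assumes `san.traineddata` is installed.

```shell
# Hypothetical helpers for comparing page segmentation modes on one image.

# testing/san002.png + 6 -> testing/san002-psm6 (tesseract appends .txt)
psm_outbase() {
  printf '%s-psm%s\n' "${1%.*}" "$2"
}

# compare_psm IMAGE LANG PSM...: one OCR run per mode
compare_psm() {
  local img=$1 lang=$2 psm
  shift 2
  for psm in "$@"; do
    tesseract "$img" "$(psm_outbase "$img" "$psm")" -l "$lang" --psm "$psm"
  done
}

# Usage:
#   compare_psm testing/san002.png san 3 6
#   diff testing/san002-psm3.txt testing/san002-psm6.txt
```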
sources:
https://anaconda.org/conda-forge/tesseract
https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage.html
https://github.com/tesseract-ocr/