Replace tesseract with PaddleOCR #5
PaddleOCR is a bit slower than tesseract, but it seems to have a more advanced architecture and much better Chinese models available.
Install PaddleOCR 1.1:
python -m pip install paddleocr==1.1.1
or install PaddleOCR 2.0+ from source (the version from pip seems to be really slow on CPU atm):
clone https://github.com/PaddlePaddle/PaddleOCR
if running Python 3.8+, you may need to modify the requirements.txt to use
opencv-contrib-python>=4.2.0.32
instead of
opencv-contrib-python==4.2.0.32
from the project root directory run
python -m pip install -e .
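If you'd rather not edit requirements.txt by hand, the pin change could be scripted roughly like this. It's just a sketch: the helper names are made up for illustration, and it assumes the file lives in the PaddleOCR project root and contains the exact pin quoted above.

```python
from pathlib import Path

# Exact pin and relaxed replacement, as quoted in the PR description.
PINNED = "opencv-contrib-python==4.2.0.32"
RELAXED = "opencv-contrib-python>=4.2.0.32"


def relax_opencv_pin(requirements_text: str) -> str:
    """Return the requirements text with the exact opencv pin relaxed to a minimum version."""
    return requirements_text.replace(PINNED, RELAXED)


def patch_requirements(path: str = "requirements.txt") -> None:
    """Rewrite the file in place (the default path is an assumption: the project root)."""
    req = Path(path)
    req.write_text(relax_opencv_pin(req.read_text()))
```

Run patch_requirements() from the cloned repo before the pip install step. If the pin isn't present, the file is written back unchanged.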
PaddleOCR also requires installation of PaddlePaddle:
python -m pip install paddlepaddle
or
python -m pip install paddlepaddle-gpu
if you have a CUDA 9 or CUDA 10 GPU.
With the mobile models listed in their git repo, it seems the 1.1 models might perform a bit better than the 2.0 models that are used by default. Gonna test this a bit more.
Update: it turns out the 2.0 mobile models don't seem to perform as well as the 1.1 mobile models. With paddleocr 2.0, we would have to use the slower but more accurate server models. There are also PaddleLite models, but I'm not sure how to run those.
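For reference, switching paddleocr to different models would look roughly like the sketch below. This is not the code in this PR. The helper names and the model paths in the usage comment are hypothetical, and extract_text assumes each result entry has the shape [box, (text, confidence)], which may differ between paddleocr versions.

```python
def build_ocr(det_model_dir=None, rec_model_dir=None):
    """Create a PaddleOCR instance, optionally pointing at local model directories."""
    # Imported lazily so extract_text below can be used without paddleocr installed.
    from paddleocr import PaddleOCR

    kwargs = {"lang": "ch"}
    if det_model_dir:
        kwargs["det_model_dir"] = det_model_dir  # e.g. an unpacked server detection model
    if rec_model_dir:
        kwargs["rec_model_dir"] = rec_model_dir  # e.g. an unpacked server recognition model
    return PaddleOCR(**kwargs)


def extract_text(result, min_conf=0.5):
    """Join recognized lines whose confidence is at least min_conf.

    Assumes every entry looks like [box, (text, confidence)].
    """
    lines = []
    for _box, (text, conf) in result:
        if conf >= min_conf:
            lines.append(text)
    return "\n".join(lines)


# Hypothetical usage (paths and file name are placeholders):
# ocr = build_ocr(det_model_dir="path/to/det_server", rec_model_dir="path/to/rec_server")
# print(extract_text(ocr.ocr("frame.png")))
```

Leaving both directories unset falls back to whatever models the installed paddleocr version uses by default.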
Update 2: Actually, with dilation before thresholding and filtering out non-white pixels, the 2.0 mobile models perform decently.
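To illustrate the idea, here is a dependency-free sketch of that preprocessing on a grayscale frame stored as a list of rows. It is only an approximation under stated assumptions: the function names, the 3x3 neighborhood, and the white_level of 200 are picked for illustration, and real code would more likely use OpenCV (cv2.dilate followed by cv2.threshold or cv2.inRange).

```python
def dilate(gray, radius=1):
    """Grayscale dilation: each pixel becomes the max of its (2r+1)x(2r+1) neighborhood."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = max(
                gray[ny][nx]
                for ny in range(max(0, y - radius), min(h, y + radius + 1))
                for nx in range(max(0, x - radius), min(w, x + radius + 1))
            )
    return out


def keep_white(gray, white_level=200):
    """Binarize: pixels at or above white_level become 255, everything else 0."""
    return [[255 if v >= white_level else 0 for v in row] for row in gray]


def preprocess(gray, radius=1, white_level=200):
    """Dilate first, then drop everything that is not close to white."""
    return keep_white(dilate(gray, radius), white_level)
```

Dilating first thickens the bright subtitle strokes, so the threshold afterwards keeps more of each character while gray background pixels are still zeroed out.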
Pull request closed