Retrain new HanS tesseract model using SimHei font #4

Closed
opened 2021-07-14 15:05:19 +02:00 by Yun · 1 comment
Contributor

https://github.com/tesseract-ocr/langdata/blob/master/Han.xheights

It doesn't appear to be used in the default training data, which may explain why Tesseract has trouble recognizing certain characters even when there is no noise.
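
If the question is whether a given font such as SimHei is listed in that file, a rough check could look like the sketch below. It assumes each non-empty line has the form `<font name> <x-height>` with underscores standing in for spaces, which has not been verified against the real file. The function names and the sample text are hypothetical and only illustrate the idea.

```python
def listed_fonts(xheights_text):
    """Return the normalized font names found in xheights-style text."""
    fonts = set()
    for line in xheights_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Assumed layout: "<font name> <x-height>"; the last field is the number.
        name, _, _value = line.rpartition(" ")
        # Fall back to the whole line if it contains no space at all.
        fonts.add((name or line).replace("_", " ").lower())
    return fonts


def is_font_listed(xheights_text, font_name):
    """Case-insensitive lookup that treats underscores and spaces alike."""
    return font_name.replace("_", " ").lower() in listed_fonts(xheights_text)


# Hypothetical sample text, NOT the real contents of Han.xheights.
sample = "Example_Font_A 48\nExample_Font_B 52\n"
print(is_font_listed(sample, "SimHei"))
```

To run this against the real file, the raw text of Han.xheights would be passed in place of `sample`.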

Author
Contributor

Never mind. PaddleOCR does a much better job than Tesseract.

Yun closed this issue 2021-07-16 19:56:03 +02:00
Reference: pradana.aumars/videocr#4