videocr

Commit Graph

Author	SHA1	Message	Date
Yun	37de9b3e5f	Update image processing to use PaddleOCR instead of tesseract	2021-07-16 17:01:18 +02:00
Yun	9b37319961	Update model to use PaddleOCR results	2021-07-16 16:58:44 +02:00
Yun	b5e6f5a57f	Update image processing procedure: apply threshold after dilation and select only white pixels from the result. Erode afterwards to thin out the text.	2021-07-14 06:22:55 +02:00
Pradana AUMARS	25765b8b6f	Import numpy in video.py	2021-07-13 16:39:03 +02:00
Pradana AUMARS	09f5098e19	Fix missing parenthesis	2021-07-13 16:36:35 +02:00
Yun	aec2b9c95a	fixup	2021-07-13 10:20:47 +02:00
Yun	7f6881749f	Add additional image processing. Ordered process: 1. dilation (thicken white portion of subtitles); 2. resize (temporarily hardcoded to 47%, assuming subtitles are 68 pixels in height); 3. apply HSV color mask (filter out non-gray pixels and pixels that are not bright enough); 4. invert image (make it black text on white background); 5. add border to top and bottom (assuming subtitles are cropped closely)	2021-07-13 09:12:43 +02:00
Pradana AUMARS	edc1bc28a2	Fix indentation on last commit	2021-07-12 23:52:26 +02:00
Pradana AUMARS	5534ae317f	Isolate subtitles as black over white background (kudos to u/Yun on hexbear.net)	2021-07-12 22:20:00 +02:00
Yi Ge	8f8f2d6d79	print muted exception from multiprocessing pool	2019-12-15 22:29:13 +08:00
Yi Ge	f8e99465c7	move util functions to utils.py	2019-12-15 21:38:48 +08:00
Yi Ge	9360ebdd40	add adapter for OpenCV	2019-12-15 21:38:17 +08:00
Yi Ge	720c9d479f	move download_lang_data to utils.py	2019-12-15 20:56:09 +08:00
Yi Ge	da8cd05f08	use lazy map when performing parallel ocr	2019-05-17 16:26:06 +02:00
Yi Ge	04ad4597ff	support combining multiple languages	2019-04-29 22:29:49 +02:00
Yi Ge	efd7223624	make sim_threshold adjustable through api	2019-04-29 03:50:06 +02:00
Yi Ge	77362dce1a	make conf_threshold adjustable through api	2019-04-29 03:05:02 +02:00
Yi Ge	a5e6845a1b	move tessdata dir to ~/tessdata	2019-04-29 03:04:06 +02:00
Yi Ge	fba35f0108	auto download tesseract data file	2019-04-28 17:33:16 +02:00
Yi Ge	bccdcc02fc	define module __init__.py	2019-04-28 17:31:43 +02:00
Yi Ge	bd6f15978b	add api definition	2019-04-28 15:46:24 +02:00
Yi Ge	bc84ee39ff	move video parameters to run_ocr() function	2019-04-27 21:41:19 +02:00
Yi Ge	3f73cb9bca	adjust text similarity metrics	2019-04-27 03:18:59 +02:00
Yi Ge	a3986b3279	support ocr on part of the video	2019-04-27 00:31:32 +02:00
Yi Ge	e55c17c325	export subtitles to srt file	2019-04-27 00:31:32 +02:00
Yi Ge	f5d27a7a46	calculate PredictedSubtitle.text early	2019-04-26 00:32:47 +02:00
Yi Ge	3a73f1f508	merge new sub to the last subs if they are similar	2019-04-26 00:07:25 +02:00
Yi Ge	0d86e14fbc	divide ocr of frames into subtitle paragraphs	2019-04-25 01:40:46 +02:00
Yi Ge	0e932936a1	add PredictedSubtitle model	2019-04-25 01:39:35 +02:00
Yi Ge	63873af476	add Video class	2019-04-24 21:18:31 +02:00
Yi Ge	57d1dc7b9b	add initial models	2019-04-20 23:21:41 +02:00

31 Commits