

A Synthetic Data Generator for Text Recognition

Published: 2025/4/16

https://github.com/Belval/TextRecognitionDataGenerator

A synthetic data generator for text recognition

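Notes:

Its functionality is similar to the text image generator introduced in the previous blog post.

After installing the required dependencies, you can run the demo by following the instructions.

You can generate text from whatever corpus you want, and you can also add the backgrounds you need.

For example, for train ticket information, you can put all possible station names, train numbers, and other fixed fields into the corpus and randomly generate the sample data you need.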

python run.py -l cn --output_dir MY_samples -i texts/city.txt -c 1000 -b 3 -w 5
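The train ticket idea above can be sketched in a few lines of Python. This is only an illustration: the station names, train numbers, and the helper names build_corpus and write_corpus are hypothetical, and it assumes the file passed to -i is read as one entry per line.

```python
import random
from pathlib import Path

# Hypothetical sample values; replace them with the real station and train lists.
STATIONS = ["北京", "上海", "广州", "深圳"]
TRAIN_NUMBERS = ["G101", "D2287", "K528"]


def build_corpus(stations, train_numbers, count, seed=0):
    """Return `count` lines, each a station pair followed by a train number."""
    rng = random.Random(seed)  # seeded so the corpus is reproducible
    lines = []
    for _ in range(count):
        origin, destination = rng.sample(stations, 2)  # two distinct stations
        lines.append(f"{origin} {destination} {rng.choice(train_numbers)}")
    return lines


def write_corpus(path, lines):
    """Write one corpus entry per line, UTF-8 encoded."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Calling write_corpus("texts/city.txt", build_corpus(STATIONS, TRAIN_NUMBERS, count=1000)) would then produce a file that can be passed with -i as in the command above.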

In addition, I am still figuring out how to change the Chinese fonts (HeiTi, SongTi, ...).

[Figure: generated samples]

TextRecognitionDataGenerator

A synthetic data generator for text recognition

What is it for?

Generating text image samples to train OCR software. Now supporting non-Latin text!

What do I need to make it work?

I use Archlinux so I cannot tell if it works on Windows yet.

  • Python 3.X
  • OpenCV 3.2 (it probably works with 2.4)
  • Pillow
  • NumPy
  • Requests
  • BeautifulSoup
  • tqdm

You can also simply use pip install -r requirements.txt.

New

  • Specify text color range using -tc min,max
  • Explicit alignment when using -al with fixed width (0: Left, 1: Center, 2: Right)
  • Fixed width using -wd
  • Generate random strings with letters, numbers and symbols (Thank you @FHainzl)
  • Save the labels in a file instead of in the file name (Thank you @FHainzl)
  • Add support for Simplified and Traditional Chinese
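To make the alignment values concrete, here is a minimal sketch of how a horizontal offset could be derived for text placed in a fixed-width image. The function aligned_x is hypothetical and is not taken from the project; only the meaning of 0, 1 and 2 comes from the list above.

```python
def aligned_x(text_width, image_width, alignment):
    """Return the x offset for text of `text_width` in a fixed-width image.

    alignment: 0 = left, 1 = center, 2 = right (the values listed for -al).
    """
    if text_width > image_width:
        raise ValueError("text is wider than the image")
    if alignment == 0:
        return 0
    if alignment == 1:
        return (image_width - text_width) // 2
    if alignment == 2:
        return image_width - text_width
    raise ValueError(f"unknown alignment: {alignment}")
```

For a 40-pixel-wide text in a 100-pixel-wide image, this gives offsets of 0, 30 and 60 for left, center and right.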

How does it work?

python run.py -w 5 -f 64

You get 1000 randomly generated images with random text on them.

What if you want random skewing? Add -k and -rk (python run.py -w 5 -f 64 -k 5 -rk).

But scanned documents usually aren't that clear, are they? Add -bl and -rbl to get Gaussian blur on the generated image with a user-defined radius (the original examples use 0, 1, 2 and 4).

Maybe you want another background? Add -b to select one of the four available backgrounds: Gaussian noise (0), plain white (1), quasicrystal (2), or picture (3).

When using the picture background (3), a picture from the pictures/ folder is randomly selected and the text is written on it.

Or maybe you are working on OCR for handwritten text? Add -hw! (Experimental)

It uses a TensorFlow model trained using this excellent project by Grzego.

The project does not require TensorFlow to run if you aren't using this feature.

You can also add distortion to the generated text with -d and -do.

The text is chosen at random from a dictionary file (found in the dicts folder) and drawn on a white background made with Gaussian noise. The resulting image is saved as [text]_[index].jpg.
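Because the label is stored in the file name, a training script has to recover it. The following is a minimal sketch, assuming the [text]_[index].jpg pattern described above; the helper name label_from_filename is hypothetical.

```python
from pathlib import Path


def label_from_filename(filename):
    """Split '[text]_[index].jpg' into (text, index).

    The split is done on the last underscore so that underscores inside
    the text itself are preserved.
    """
    stem = Path(filename).stem  # drop the directory and the extension
    text, _, index = stem.rpartition("_")
    if not text or not index.isdigit():
        raise ValueError(f"unexpected file name: {filename!r}")
    return text, int(index)
```

For example, label_from_filename("out/hello world_12.jpg") returns ("hello world", 12).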

There are a lot of parameters that you can tune to get the results you want, so I recommend checking out python run.py -h for more information.

How to create images with Chinese (both simplified and traditional) text

It is simple! Just do python run.py -l cn -c 1000 -w 5!

Unfortunately I do not speak Chinese so you may have to edit texts/cn.txt to include some meaningful words instead of random glyphs.

Here are examples of what I could make with it:

Traditional:

Simplified:

Can I add my own font?

Yes, the script picks a font at random from the fonts directory.

fonts/latin    English, French, Spanish, German
fonts/cn       Chinese

Simply add / remove fonts until you get the desired output.

If you want to add a new non-latin language, the amount of work is minimal.

  • Create a new folder with your language's two-letter code
  • Add a .ttf font in it
  • Edit run.py to add an if statement in load_fonts()
  • Add a text file in dicts with the same two-letter code
  • Run the tool as you normally would but add -l with your two-letter code

It only supports .ttf for now.
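The folder-per-language layout in the steps above can be illustrated with a small loader. This is a hypothetical sketch, not the load_fonts() found in run.py: it assumes fonts live in fonts/<two-letter code> and that only .ttf files should be picked up.

```python
from pathlib import Path


def load_fonts(lang, fonts_root="fonts"):
    """Return the sorted paths of all .ttf files in <fonts_root>/<lang>.

    Hypothetical helper: it only illustrates a directory-per-language
    layout and is not the project's own implementation.
    """
    font_dir = Path(fonts_root) / lang
    if not font_dir.is_dir():
        raise FileNotFoundError(f"no font folder for language {lang!r}: {font_dir}")
    # Keep only TrueType fonts, since other formats are not supported.
    fonts = sorted(str(p) for p in font_dir.iterdir() if p.suffix.lower() == ".ttf")
    if not fonts:
        raise FileNotFoundError(f"no .ttf fonts found in {font_dir}")
    return fonts
```

With a loader written this way, adding a language would only require creating the folder and dropping a font into it, with no extra if statement.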

Benchmarks

  • Intel Core i7-4710HQ @ 2.50GHz + SSD (-c 1000 -w 1)
      • -t 1: 363 img/s
      • -t 2: 694 img/s
      • -t 4: 1300 img/s
      • -t 8: 1500 img/s
  • AMD Ryzen 7 1700 @ 4.0GHz + SSD (-c 1000 -w 1)
      • -t 1: 558 img/s
      • -t 2: 1045 img/s
      • -t 4: 2107 img/s
      • -t 8: 3297 img/s
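To read these figures as scaling numbers, the short script below computes the speedup and per-thread efficiency relative to the single-thread run. It assumes -t is the number of worker threads; the throughput values are copied from the list above and the function name scaling is hypothetical.

```python
# Throughput figures (images per second) copied from the benchmark list.
BENCHMARKS = {
    "Intel Core i7-4710HQ": {1: 363, 2: 694, 4: 1300, 8: 1500},
    "AMD Ryzen 7 1700": {1: 558, 2: 1045, 4: 2107, 8: 3297},
}


def scaling(results):
    """Return {threads: (speedup, efficiency)} relative to the 1-thread run."""
    base = results[1]
    return {
        threads: (rate / base, rate / base / threads)
        for threads, rate in results.items()
    }


for cpu, results in BENCHMARKS.items():
    print(cpu)
    for threads, (speedup, efficiency) in scaling(results).items():
        print(f"  -t {threads}: {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```

On these numbers the i7 reaches roughly 4.1x at eight threads and the Ryzen roughly 5.9x, so going from four to eight threads helps the Ryzen noticeably more.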

Contributing

  • Create an issue describing the feature you'll be working on
  • Code said feature
  • Create a pull request

Feature request & issues

If anything is missing, unclear, or simply not working, open an issue on the repository.

What is left to do?

  • Better background generation
  • Better handwritten text generation
  • More customization parameters (mostly regarding background)
