manga-image-translator is a tool that translates the text inside all kinds of images with one click.
See it in action first:

- Official online demo (maintained by zyddnys): https://cotrans.touhou.ai/
- Mirror site (maintained by Eidenz): https://manga.eidenz.com/
- Browser userscript (maintained by QiroNT): https://greasyfork.org/scripts/437569
Note: if the online demo is unreachable, it means Google GCP has restarted my server again; please wait until I bring the service back up. The online demo runs the latest version of the main branch.

## Usage

```bash
# First, make sure Python 3.8 or later is installed on your machine
$ python --version
Python 3.8.13

# Clone this repository
$ git clone https://github.com/zyddnys/manga-image-translator.git

# Install the dependencies
$ pip install -r requirements.txt
$ pip install git+https://github.com/kodalli/pydensecrf.git
```

Note: pydensecrf is not listed as a dependency, so if it is not already installed on your machine you need to install it manually. On Windows, you can look for a prebuilt package matching your Python version at https://www.lfd.uci.edu/~gohlke/pythonlibs/#_pydensecrf and install it with pip. On other operating systems, try `pip install git+https://github.com/kodalli/pydensecrf.git`.
[Optional when using Google Translate] Apply for a Youdao or DeepL API account and write your APP_KEY and APP_SECRET (or AUTH_KEY) into `translators/key.py`.
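For reference, a minimal `translators/key.py` might look like the sketch below. The variable names follow the translator table below; the exact file layout is an assumption here, so check the project's own `translators/key.py` template first. Leave the services you do not use as empty strings.

```python
# translators/key.py -- credential sketch (layout is an assumption,
# variable names are taken from the translator table in this README).
# Fill in only the services you plan to use.

YOUDAO_APP_KEY = ""      # youdao
YOUDAO_SECRET_KEY = ""   # youdao
BAIDU_APP_ID = ""        # baidu
BAIDU_SECRET_KEY = ""    # baidu
DEEPL_AUTH_KEY = ""      # deepl
CAIYUN_TOKEN = ""        # caiyun
OPENAI_API_KEY = ""      # gpt3 / gpt3.5 / gpt4
SAKURA_API_BASE = ""     # sakura
```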
## Translators

| Name | API key required | Offline capable | Notes |
|------|------------------|-----------------|-------|
| youdao | ✔️ | | Requires YOUDAO_APP_KEY and YOUDAO_SECRET_KEY |
| baidu | ✔️ | | Requires BAIDU_APP_ID and BAIDU_SECRET_KEY |
| deepl | ✔️ | | Requires DEEPL_AUTH_KEY |
| caiyun | ✔️ | | Requires CAIYUN_TOKEN |
| gpt3 | ✔️ | | Implements text-davinci-003. Requires OPENAI_API_KEY |
| gpt3.5 | ✔️ | | Implements gpt-3.5-turbo. Requires OPENAI_API_KEY |
| gpt4 | ✔️ | | Implements gpt-4. Requires OPENAI_API_KEY |
| papago | | | |
| sakura | | | Requires SAKURA_API_BASE |
| offline | | ✔️ | Automatically selects an available offline translator; it is only a selector |
| sugoi | | ✔️ | Can only translate into English |
| m2m100 | | ✔️ | Can translate every language |
| m2m100_big | | ✔️ | The "big" variant is the full-size model; the one without "big" is the lite version |
| none | | ✔️ | Outputs empty text |
| mbart50 | | ✔️ | |
| original | | ✔️ | Keeps the original text |
## Language code list

These codes can be passed to the `--target-lang` option:
```text
CHS: Chinese (Simplified)
CHT: Chinese (Traditional)
CSY: Czech
NLD: Dutch
ENG: English
FRA: French
DEU: German
HUN: Hungarian
ITA: Italian
JPN: Japanese
KOR: Korean
PLK: Polish
PTB: Portuguese (Brazil)
ROM: Romanian
RUS: Russian
ESP: Spanish
TRK: Turkish
VIN: Vietnamese
ARA: Arabic
SRP: Serbian
HRV: Croatian
THA: Thai
IND: Indonesian
```

## Options

```text
-h, --help
    show this help message and exit
-m, --mode {demo,batch,web,web_client,ws,api}
    Run in single image demo mode (demo), batch translation mode (batch), or web service mode (web)
-i, --input INPUT [INPUT ...]
    Path to an image file if using demo mode, or path to an image folder if using batch mode
-o, --dest DEST
    Path to the destination folder for translated images in batch mode
-l, --target-lang {CHS,CHT,CSY,NLD,ENG,FRA,DEU,HUN,ITA,JPN,KOR,PLK,PTB,ROM,RUS,ESP,TRK,UKR,VIN,ARA,CNR,SRP,HRV,THA,IND}
    Destination language
-v, --verbose
    Print debug info and save intermediate images in the result folder
-f, --format {png,webp,jpg,xcf,psd,pdf}
    Output format of the translation
--attempts ATTEMPTS
    Retry attempts on encountered error. -1 means infinite times.
--ignore-errors
    Skip the image on encountered error
--overwrite
    Overwrite already translated images in batch mode
--skip-no-text
    Skip images without text (they will not be saved)
--model-dir MODEL_DIR
    Model directory (by default ./models in the project root)
--use-gpu
    Turn on/off GPU (automatic selection between mps and cuda)
--use-gpu-limited
    Turn on/off GPU (excluding the offline translator)
--detector {default,ctd,craft,none}
    Text detector used for creating a text mask from an image. DO NOT use craft for manga, it is not designed for it.
--ocr {32px,48px,48px_ctc,mocr}
    Optical character recognition (OCR) model to use
--use-mocr-merge
    Use bbox merge when running Manga OCR inference
--inpainter {default,lama_large,lama_mpe,sd,none,original}
    Inpainting model to use
--upscaler {waifu2x,esrgan,4xultrasharp}
    Upscaler to use. --upscale-ratio has to be set for it to take effect.
--upscale-ratio UPSCALE_RATIO
    Image upscale ratio applied before detection. Can improve text detection.
--colorizer {mc2}
    Colorization model to use
--translator {google,youdao,baidu,deepl,papago,caiyun,gpt3,gpt3.5,gpt4,none,original,offline,nllb,nllb_big,sugoi,jparacrawl,jparacrawl_big,m2m100,sakura}
    Language translator to use
--translator-chain TRANSLATOR_CHAIN
    Output of one translator goes into another. Example: --translator-chain "google:JPN;sugoi:ENG".
--selective-translation SELECTIVE_TRANSLATION
    Select a translator based on the language detected in the image. Note that the first
    translation service acts as the default if the language is not defined.
    Example: --translator-chain "google:JPN;sugoi:ENG".
--revert-upscaling
    Downscale the previously upscaled image back to the original size after translation (use with --upscale-ratio)
--detection-size DETECTION_SIZE
    Size of the image used for detection
--det-rotate
    Rotate the image for detection. Might improve detection.
--det-auto-rotate
    Rotate the image for detection to prefer vertical text lines. Might improve detection.
--det-invert
    Invert the image colors for detection. Might improve detection.
--det-gamma-correct
    Apply gamma correction for detection. Might improve detection.
--unclip-ratio UNCLIP_RATIO
    How much to extend the text skeleton to form a bounding box
--box-threshold BOX_THRESHOLD
    Threshold for bbox generation
--text-threshold TEXT_THRESHOLD
    Threshold for text detection
--min-text-length MIN_TEXT_LENGTH
    Minimum text length of a text region
--no-text-lang-skip
    Don't skip text that is seemingly already in the target language
--inpainting-size INPAINTING_SIZE
    Size of the image used for inpainting (too large will result in OOM)
--inpainting-precision {fp32,fp16,bf16}
    Inpainting precision for lama; use bf16 if you can
--colorization-size COLORIZATION_SIZE
    Size of the image used for colorization. Set to -1 to use the full image size.
--denoise-sigma DENOISE_SIGMA
    Used by the colorizer; affects color strength. Ranges from 0 to 255 (default 30); -1 turns it off.
--mask-dilation-offset MASK_DILATION_OFFSET
    By how much to extend the text mask to remove left-over text pixels of the original image
--font-size FONT_SIZE
    Use a fixed font size for rendering
--font-size-offset FONT_SIZE_OFFSET
    Offset the font size by the given amount; positive numbers increase it and vice versa
--font-size-minimum FONT_SIZE_MINIMUM
    Minimum output font size. Default is image_sides_sum/200.
--font-color FONT_COLOR
    Override the text fg/bg color detected by the OCR model. Use a hex string without the "#",
    such as FFFFFF for a white foreground, or FFFFFF:000000 to also draw a black background around the text.
--line-spacing LINE_SPACING
    Line spacing is font_size * this value. Default is 0.01 for horizontal text and 0.2 for vertical.
--force-horizontal
    Force text to be rendered horizontally
--force-vertical
    Force text to be rendered vertically
--align-left
    Align rendered text left
--align-center
    Align rendered text centered
--align-right
    Align rendered text right
--uppercase
    Change text to uppercase
--lowercase
    Change text to lowercase
--no-hyphenation
    Stop the renderer from splitting words with a hyphen character (-)
--manga2eng
    Render English text translated from manga with some additional typesetting. Ignores some other argument options.
--gpt-config GPT_CONFIG
    Path to the GPT config file; more info in the README
--use-mtpe
    Turn on/off machine translation post editing (MTPE) on the command line (works only on Linux right now)
--save-text
    Save the extracted text and translations into a text file
--save-text-file SAVE_TEXT_FILE
    Like --save-text but with a specified file path
--filter-text FILTER_TEXT
    Filter regions by their text with a regex. Example usage: --filter-text ".*badtext.*"
--prep-manual
    Prepare for manual typesetting by outputting blank, inpainted images, plus copies of the original for reference
--font-path FONT_PATH
    Path to a font file
--gimp-font GIMP_FONT
    Font family to use for GIMP rendering
--host HOST
    Host the web module attaches to
--port PORT
    Port the web module attaches to
--nonce NONCE
    Secret used by the web module to secure internal web server communication
--ws-url WS_URL
    Server URL for WebSocket mode
--save-quality SAVE_QUALITY
    Quality of saved JPEG images, from 0 to 100 with 100 being best
--ignore-bubble IGNORE_BUBBLE
    Threshold for ignoring text in non-bubble areas; valid values range from 1 to 50
    (5 to 10 recommended). If it is too low, normal bubble areas may be ignored;
    if it is too large, non-bubble areas may be treated as normal bubbles.
```

## Run from the command line

```bash
# If your machine has a CUDA-capable NVIDIA GPU, you can add the `--use-gpu` flag
# Use `--use-gpu-limited` to run the VRAM-heavy translation step on the CPU instead, reducing VRAM usage
# Use `--translator=<translator name>` to choose a translator
# Use `--target-lang=<language code>` to choose the target language
# Replace <path_to_image_file> with the path to your image
# If the image is small or blurry, an upscaler can enlarge and sharpen it, improving detection and translation
$ python -m manga_translator --verbose --use-gpu --translator=google --target-lang=CHS -i <path_to_image_file>
# Results are saved to the `result` folder
```

## Batch translation from the command line

```bash
# Other flags as above
# Use `--mode batch` to enable batch translation mode
# Replace <path_to_image_folder> with the path to your image folder
$ python -m manga_translator --verbose --mode batch --use-gpu --translator=google --target-lang=CHS -i <path_to_image_folder>
# Results are saved to the `<path_to_image_folder>-translated` folder
```

## Run in the browser (web server)

```bash
# Other flags as above
# Use `--mode web` to start the web server
$ python -m manga_translator --verbose --mode web --use-gpu
# The service listens on http://127.0.0.1:5003
```

The server offers two request modes: synchronous and asynchronous. In synchronous mode your HTTP POST request blocks until the translation finishes. In asynchronous mode your HTTP POST immediately returns a task_id, which you can poll periodically to get the state of the translation.
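As a side note on the `--translator-chain` / `--selective-translation` value format shown in the options above, the documented `"google:JPN;sugoi:ENG"` syntax can be sketched with a tiny parser. This is an illustration of the format only, not the project's actual implementation:

```python
def parse_chain(spec: str) -> list:
    """Parse a chain spec like "google:JPN;sugoi:ENG" into
    [(translator, target_lang), ...] pairs.

    Entries are separated by ";"; each entry is "<translator>:<lang code>".
    """
    pairs = []
    for part in spec.split(";"):
        if not part:
            continue  # tolerate a trailing ";"
        name, lang = part.split(":", 1)
        pairs.append((name, lang))
    return pairs
```

With the example from the options list, `parse_chain("google:JPN;sugoi:ENG")` yields `[("google", "JPN"), ("sugoi", "ENG")]`: each stage names the translator to run and the language its output should be in.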
### Synchronous mode

1. POST a form containing the image in a field named `file` to http://127.0.0.1:5003/run
2. Wait for the response
3. Use the returned task_id to fetch the result from the `result` folder, e.g. by exposing the folder's contents via Nginx

### Asynchronous mode

1. POST a form containing the image in a field named `file` to http://127.0.0.1:5003/submit
2. You will receive a task_id
3. With this task_id, poll periodically by POSTing the JSON `{"taskid": <task_id>}` to http://127.0.0.1:5003/task-state
4. The translation is finished when the returned state is `finished`, `error`, or `error-lang`
5. Fetch the result from the `result` folder, e.g. by exposing the folder's contents via Nginx

### Manual translation

Manual translation lets you fill in the translated text by hand instead of using machine translation.
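The asynchronous polling loop described above can be sketched as a small stdlib-only client. It assumes the server started with `--mode web` is reachable on 127.0.0.1:5003; the name of the state field in the `/task-state` response body (`"state"` here) is an assumption, so adjust it to what your server version actually returns:

```python
# Minimal async-mode polling client sketch (field name "state" is an assumption).
import json
import time
import urllib.request

BASE_URL = "http://127.0.0.1:5003"
# Per the README, these states mean the translation is complete.
TERMINAL_STATES = {"finished", "error", "error-lang"}


def is_done(state: str) -> bool:
    """True once polling can stop."""
    return state in TERMINAL_STATES


def poll_body(task_id: str) -> bytes:
    """JSON body for a /task-state polling request."""
    return json.dumps({"taskid": task_id}).encode("utf-8")


def wait_for_task(task_id: str, interval: float = 2.0) -> str:
    """POST {"taskid": ...} to /task-state until a terminal state is returned."""
    while True:
        req = urllib.request.Request(
            BASE_URL + "/task-state",
            data=poll_body(task_id),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            state = json.load(resp).get("state", "")
        if is_done(state):
            return state
        time.sleep(interval)
```

Once `wait_for_task` returns `"finished"`, the translated image can be picked up from the `result` folder by its task_id, as described above.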
1. POST a form containing the image in a field named `file` to http://127.0.0.1:5003/manual-translate and wait for the response.
2. You will receive a JSON object like:

```json
{
    "task_id": "12c779c9431f954971cae720eb104499",
    "status": "pending",
    "trans_result": [
        {
            "s": "☆上司来ちゃった……",
            "t": ""
        }
    ]
}
```

3. Fill the translated text into the `t` string:

```json
{
    "task_id": "12c779c9431f954971cae720eb104499",
    "status": "pending",
    "trans_result": [
        {
            "s": "☆上司来ちゃった……",
            "t": "☆上司来了..."
        }
    ]
}
```

4. Send that JSON to http://127.0.0.1:5003/post-manual-result and wait for the response. Afterwards you can fetch the result by its task_id from the `result` folder, e.g. by exposing the folder's contents via Nginx.
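Filling in the `t` fields can also be scripted. The sketch below is a hypothetical helper (not part of the project) that applies a source-to-translation mapping to a task object shaped like the example response above:

```python
import json


def fill_translations(task: dict, translations: dict) -> dict:
    """Fill each trans_result entry's `t` field from a source->target mapping.

    Entries with no mapping keep their existing `t` value (rendered blank
    when it is an empty string).
    """
    for entry in task["trans_result"]:
        entry["t"] = translations.get(entry["s"], entry["t"])
    return task


# Shape taken from the example /manual-translate response above.
task = json.loads(
    '{"task_id": "12c779c9431f954971cae720eb104499",'
    ' "status": "pending",'
    ' "trans_result": [{"s": "☆上司来ちゃった……", "t": ""}]}'
)
filled = fill_translations(task, {"☆上司来ちゃった……": "☆上司来了..."})
# json.dumps(filled) is the body to POST to /post-manual-result
```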