macSubtitleOCR

Overview

macSubtitleOCR is a tool written entirely in Swift that converts bitmap subtitles into the SubRip subtitle format (SRT) using Optical Character Recognition (OCR). It currently supports both PGS and VobSub bitmap subtitles. The tool uses the built-in macOS OCR engine, which offers highly accurate text recognition.
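To make the output format concrete, here is a minimal sketch of how recognized text and timing could be rendered as SRT. The `SubtitleCue` type and both function names are hypothetical and are not taken from the macSubtitleOCR sources; only the SRT layout itself (1-based index, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line) is standard.

```swift
/// Hypothetical container for one recognized subtitle (not the project's own type).
struct SubtitleCue {
    let startMs: Int   // display start, in milliseconds
    let endMs: Int     // display end, in milliseconds
    let text: String   // text returned by OCR
}

/// Left-pads a non-negative integer with zeros to the given width.
func zeroPadded(_ value: Int, width: Int) -> String {
    let digits = String(value)
    return String(repeating: "0", count: max(0, width - digits.count)) + digits
}

/// Formats a millisecond offset as an SRT timestamp (HH:MM:SS,mmm).
func srtTimestamp(_ ms: Int) -> String {
    let hours = ms / 3_600_000
    let minutes = (ms / 60_000) % 60
    let seconds = (ms / 1_000) % 60
    let millis = ms % 1_000
    return "\(zeroPadded(hours, width: 2)):\(zeroPadded(minutes, width: 2)):"
        + "\(zeroPadded(seconds, width: 2)),\(zeroPadded(millis, width: 3))"
}

/// Renders cues as SRT entries separated by blank lines.
func renderSRT(_ cues: [SubtitleCue]) -> String {
    cues.enumerated().map { index, cue in
        "\(index + 1)\n\(srtTimestamp(cue.startMs)) --> \(srtTimestamp(cue.endMs))\n\(cue.text)\n"
    }.joined(separator: "\n")
}
```

Calling `renderSRT` with two cues yields two numbered entries, each with its time range on the second line and its text on the third.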

For more details on performance, refer to the Accuracy section below.

Features

Export .png images of subtitles for manual correction of OCR output.
Use the macOS OCR engine's language recognition feature to enhance accuracy by validating character sequences as real words.
Export raw JSON output from the OCR engine for further analysis.
Experimental internal decoder for development (mostly working, though VobSub gives occasional errors).
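As an illustration of the language recognition feature, the sketch below runs text recognition on a single subtitle image with Apple's Vision framework and toggles language correction. It assumes the "built-in macOS OCR engine" is reached through `VNRecognizeTextRequest`; the function name `recognizeText` is hypothetical and this is not the project's actual code path.

```swift
import CoreGraphics
import Vision

/// Recognizes text lines in one subtitle image (macOS only).
/// `languageCorrection` maps to Vision's `usesLanguageCorrection`, which
/// applies language-based correction to the recognized character sequences.
func recognizeText(in image: CGImage, languageCorrection: Bool = true) throws -> [String] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate            // favor accuracy over speed
    request.usesLanguageCorrection = languageCorrection

    // perform(_:) runs synchronously and fills in request.results.
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])

    // Keep the single best candidate string for each detected text region.
    let observations = request.results ?? []
    return observations.compactMap { $0.topCandidates(1).first?.string }
}
```

Running the same image with `languageCorrection` set to `false` is one way to see how much the correction step changes the output for a given subtitle track.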

Supported Formats

PGS (.mkv, .sup)
VobSub (.sub, .idx)
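For readers unfamiliar with PGS, the following sketch parses the 13-byte header that precedes every segment in a raw .sup stream: the magic bytes "PG", a 4-byte presentation timestamp and a 4-byte decoding timestamp in 90 kHz ticks, a 1-byte segment type, and a 2-byte payload size. The type and function names are hypothetical, and this is not the project's decoder.

```swift
/// Hypothetical model of a PGS segment header (not the project's own type).
struct PGSSegmentHeader {
    let ptsMs: Int        // presentation time, converted from 90 kHz ticks to ms
    let type: UInt8       // e.g. 0x16 = presentation composition, 0x80 = end of display set
    let payloadSize: Int  // number of payload bytes following the header
}

/// Parses the header at the start of `bytes`; returns nil if the data is
/// shorter than 13 bytes or does not begin with the "PG" magic.
func parsePGSHeader(_ bytes: [UInt8]) -> PGSSegmentHeader? {
    guard bytes.count >= 13, bytes[0] == 0x50, bytes[1] == 0x47 else { return nil }

    // Bytes 2...5 hold the big-endian presentation timestamp.
    var pts = 0
    for byte in bytes[2..<6] {
        pts = (pts << 8) | Int(byte)
    }

    // Bytes 6...9 hold the decoding timestamp, which this sketch ignores.
    let size = (Int(bytes[11]) << 8) | Int(bytes[12])
    return PGSSegmentHeader(ptsMs: pts / 90, type: bytes[10], payloadSize: size)
}
```

A timestamp of 90000 ticks corresponds to 1000 ms, which is the kind of value that ends up in the SRT time range.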

Building the Project

Important

This project requires Swift 6 to compile and run correctly. It also requires FFmpeg to be installed on your system. Currently only arm64 is supported; PRs adding support for other architectures are welcome.

To build macSubtitleOCR, follow these steps:

brew install ffmpeg
git clone https://github.com/ecdye/macSubtitleOCR
cd macSubtitleOCR
swift build

The compiled build will be available in the .build/debug directory.

Running Tests

The testing process compares OCR output against known correct results. Because slight differences may occur between machines, the tests aim for at least 95% accuracy rather than an exact match.

swift test
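One way to express such a threshold is a similarity ratio based on edit distance. The sketch below is an assumption about how a comparison like this could be scored; the README does not say which metric the test suite actually uses, and both function names are hypothetical.

```swift
/// Levenshtein edit distance between two strings, computed row by row.
func editDistance(_ a: String, _ b: String) -> Int {
    let s = Array(a), t = Array(b)
    if s.isEmpty { return t.count }
    if t.isEmpty { return s.count }

    // previous[j] = distance between the processed prefix of s and t[0..<j]
    var previous = Array(0...t.count)
    for i in 1...s.count {
        var current = [i] + Array(repeating: 0, count: t.count)
        for j in 1...t.count {
            let cost = s[i - 1] == t[j - 1] ? 0 : 1
            current[j] = min(previous[j] + 1,         // deletion
                             current[j - 1] + 1,      // insertion
                             previous[j - 1] + cost)  // substitution or match
        }
        previous = current
    }
    return previous[t.count]
}

/// Similarity in 0...1: 1.0 means identical, lower means more edits needed.
func similarity(_ expected: String, _ actual: String) -> Double {
    let longest = max(expected.count, actual.count)
    guard longest > 0 else { return 1.0 }
    return 1.0 - Double(editDistance(expected, actual)) / Double(longest)
}
```

Under this metric a single wrong character in a short line (for example, "l am here" instead of "I am here") already drops below 0.95, so a threshold like this is more forgiving when applied to a whole subtitle file than to individual lines.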

Accuracy

In tests comparing macSubtitleOCR with the Tesseract OCR engine, the macOS OCR engine often outperforms Tesseract, particularly with challenging cases like the letter 'I'. While methods like binary image comparison, used by tools such as SubtitleEdit, may offer slightly better accuracy in some cases, the macOS OCR engine provides excellent results for most use cases.

Contribution and TODO

For information on how to contribute to the project, please refer to CONTRIBUTING.md.

If you're interested in working on specific features or improvements, check out issues tagged as enhancements.
