Tools for preprocessing and AI-driven analysis of images with Tibetan text.

nih23/Tibetan-NLP

Tibetan Column Detection

Overview

This Python project generates training data for detecting columns or text blocks in Tibetan texts by embedding Tibetan text into images.

[Figure: Validation results]

It includes functions to create lorem ipsum-like Tibetan text, read random Tibetan text files from a directory, and calculate and embed text within specified bounding boxes in images. The project effectively handles Tibetan script, ensuring proper display and formatting within the images.
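The text-generation and layout steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the function names `fake_tibetan_sentence` and `wrap_at_tsheg`, the small syllable list, and the default `measure=len` width function are all assumptions made for the example. It relies only on the fact that Tibetan syllables are separated by a tsheg (་), which is where a line may break.

```python
import random

TSHEG = "\u0f0b"  # intersyllabic mark, the usual line-break opportunity
SHAD = "\u0f0d"   # clause or sentence terminator

# Illustrative syllables only (assumption); the project reads real text files.
SYLLABLES = ["བོད", "ཡིག", "དེབ", "སྐད", "མི", "ཆུ", "རི", "ལམ"]


def fake_tibetan_sentence(rng, n_syllables):
    """Return a filler sentence of n_syllables random syllables ending in a shad."""
    syllables = [rng.choice(SYLLABLES) for _ in range(n_syllables)]
    return TSHEG.join(syllables) + SHAD


def wrap_at_tsheg(text, max_width, measure=len):
    """Greedily wrap text so each line measures at most max_width.

    Breaks are only placed after a tsheg. `measure` maps a string to its
    width; `len` is a stand-in for a real pixel measurement. A single chunk
    wider than max_width is kept on its own line rather than split.
    """
    parts = text.split(TSHEG)
    # Re-attach the tsheg to the syllable it follows.
    chunks = [p + TSHEG for p in parts[:-1]] + [parts[-1]]
    lines, current = [], ""
    for chunk in chunks:
        if not chunk:
            continue
        if current and measure(current + chunk) > max_width:
            lines.append(current)
            current = chunk
        else:
            current += chunk
    if current:
        lines.append(current)
    return lines


sentence = fake_tibetan_sentence(random.Random(0), 10)
for line in wrap_at_tsheg(sentence, max_width=12):
    print(line)
```

To fit text into a pixel-sized bounding box, `measure` could be replaced by a function built on Pillow's `ImageDraw.textlength` with a Tibetan-capable font, and `max_width` by the box width in pixels.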

Features

  • Automated Data Generation: Simplifies the process of generating training data for Tibetan NLP tasks.
  • Customizable Input: Allows users to specify various input parameters like images, labels, directories for backgrounds and corporate images, etc.
  • Image Processing: Utilizes the PIL library for image manipulation.
  • Bounding Box Preparation: Includes a utility function prepare_bbox_string for handling bounding boxes.
  • Multiprocessing Support: Leverages multiprocessing for efficient data processing.
  • Debugging Mode: Includes a debug mode for troubleshooting and ensuring correct data processing.
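For the bounding box step, a YOLO label line has the form `class x_center y_center width height`, with all coordinates normalized by the image size. The sketch below shows that conversion from a pixel box. The function `to_yolo_label` is a hypothetical stand-in written for this example; it is not the project's `prepare_bbox_string`, whose exact signature is not shown here.

```python
def to_yolo_label(class_id, box, image_size):
    """Format one YOLO label line from a pixel-space bounding box.

    box is (x_min, y_min, x_max, y_max) in pixels and image_size is
    (width, height). Both layouts are assumptions of this sketch.
    """
    x_min, y_min, x_max, y_max = box
    img_w, img_h = image_size
    # YOLO stores the box center and size, each divided by the image size.
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A 200x400 pixel text block inside a 1000x800 image.
print(to_yolo_label(0, (100, 200, 300, 600), (1000, 800)))
```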

Getting Started

Prerequisites

  • Python 3.x
  • Pillow (the maintained fork of PIL, the Python Imaging Library)
  • YOLO utilities (for bounding box handling)
  • Additional Python libraries: numpy, tqdm, yaml

Installation

Clone the repository to your local machine:

git clone https://github.com/nih23/Tibetan-NLP.git
cd Tibetan-NLP

Generating training data

Training data is generated by running generate_training_data.py. Before running it, update the background image folders to match your setup.

python generate_training_data.py

Train YOLOv8n

YOLOv8n is trained through the Ultralytics CLI:

yolo detect train data=data/yolo_tibetan/tibetan_text_boxes.yml epochs=1000 imgsz=1024
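The `data=` argument points to a dataset description file. As a rough guide to what such a file contains, the sketch below renders a minimal dataset YAML in the layout Ultralytics expects (`path`, `train`, `val`, `names`). The helper `render_dataset_yaml`, the `train/images` and `val/images` subfolders, and the class name `tibetan_text_block` are assumptions for illustration; the actual contents of tibetan_text_boxes.yml may differ.

```python
def render_dataset_yaml(root, train_dir, val_dir, class_names):
    """Render a minimal Ultralytics-style dataset description as YAML text."""
    lines = [
        f"path: {root}",        # dataset root directory
        f"train: {train_dir}",  # training images, relative to path
        f"val: {val_dir}",      # validation images, relative to path
        "names:",               # class index to class name mapping
    ]
    lines += [f"  {index}: {name}" for index, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


# Hypothetical folder layout and class name, for illustration only.
print(render_dataset_yaml(
    "data/yolo_tibetan", "train/images", "val/images", ["tibetan_text_block"]
))
```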

The trained model is then exported to TorchScript for inference (replace train9 with the name of your own run directory):

yolo detect export model=runs/detect/train9/weights/best.pt 

Inference

We can now use the trained model to detect and classify Tibetan text blocks:

yolo predict task=detect model=runs/detect/train9/weights/best.torchscript imgsz=1024 source=data/my_inference_data/*.jpg

The results are saved to the runs/detect/predict folder.
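If the prediction command is additionally run with `save_txt=True`, Ultralytics also writes one label file per image into a `labels` subfolder, using the normalized `class x_center y_center width height` format (with a trailing confidence column when `save_conf=True` is set). The sketch below converts one such line back into a pixel box, for example to crop the detected text block. The function name `parse_yolo_label` and the return layout are assumptions of this example.

```python
def parse_yolo_label(line, image_size):
    """Convert one normalized YOLO label line into a pixel-space box.

    Returns (class_id, (x_min, y_min, x_max, y_max), confidence), where
    confidence is None if the line has no sixth column.
    """
    fields = line.split()
    class_id = int(fields[0])
    x_center, y_center, width, height = (float(v) for v in fields[1:5])
    confidence = float(fields[5]) if len(fields) > 5 else None
    img_w, img_h = image_size
    # Undo the normalization and convert center/size to corner coordinates.
    x_min = (x_center - width / 2) * img_w
    y_min = (y_center - height / 2) * img_h
    x_max = (x_center + width / 2) * img_w
    y_max = (y_center + height / 2) * img_h
    return class_id, (x_min, y_min, x_max, y_max), confidence


print(parse_yolo_label("0 0.5 0.5 0.5 0.25 0.9", (1000, 800)))
```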

Contributions

Contributions to this project are welcome! Please fork the repository and submit a pull request with your proposed changes.

License

This project is licensed under the MIT License - see the LICENSE file for details.
