This Python project generates training data for detecting columns or text blocks in Tibetan texts by embedding Tibetan text into images.
It includes functions to create lorem ipsum-like Tibetan text, read random Tibetan text files from a directory, and calculate and embed text within specified bounding boxes in images. Tibetan script is rendered and formatted so that it displays correctly within the images.
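The embedding step can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the function names wrap_tibetan and draw_text_in_box are hypothetical, and the sketch assumes that lines may break after a tsheg (the Tibetan syllable delimiter, U+0F0B) and that the caller supplies a Tibetan-capable TrueType font loaded with Pillow.

```python
import re

TSHEG = "\u0f0b"  # Tibetan syllable delimiter; a line may break after it


def wrap_tibetan(text, max_width, measure):
    """Greedily pack tsheg-delimited syllables into lines no wider than max_width.

    `measure` maps a string to its rendered width (for example in pixels).
    """
    # Split into syllables, keeping each tsheg attached to the syllable before it.
    syllables = re.findall(f"[^{TSHEG}]*{TSHEG}|[^{TSHEG}]+", text)
    lines, current = [], ""
    for syllable in syllables:
        if current and measure(current + syllable) > max_width:
            lines.append(current)  # current line is full, start a new one
            current = syllable
        else:
            current += syllable
    if current:
        lines.append(current)
    return lines


def draw_text_in_box(image, text, box, font, line_spacing=4):
    """Draw wrapped text inside box = (x0, y0, x1, y1); return the lines drawn."""
    from PIL import ImageDraw  # Pillow, imported lazily so wrap_tibetan needs no dependencies

    x0, y0, x1, y1 = box
    draw = ImageDraw.Draw(image)
    lines = wrap_tibetan(text, x1 - x0, lambda s: draw.textlength(s, font=font))
    line_height = font.size + line_spacing  # assumes a FreeType font with a .size attribute
    max_lines = int((y1 - y0) // line_height)
    drawn = lines[:max_lines]  # drop lines that would overflow the box
    for i, line in enumerate(drawn):
        draw.text((x0, y0 + i * line_height), line, font=font, fill="black")
    return drawn
```

To try it, load a font with ImageFont.truetype and a font file of your choice, open a background with Image.open, and pass both to draw_text_in_box together with the target box. Stacked Tibetan glyphs may only shape correctly if Pillow was built with Raqm support.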
- Automated Data Generation: Simplifies the process of generating training data for Tibetan NLP tasks.
- Customizable Input: Allows users to specify various input parameters like images, labels, directories for backgrounds and corporate images, etc.
- Image Processing: Utilizes the PIL library for image manipulation.
- Bounding Box Preparation: Includes a utility function, prepare_bbox_string, for handling bounding boxes.
- Multiprocessing Support: Leverages multiprocessing for efficient data processing.
- Debugging Mode: Includes a debug mode for troubleshooting and ensuring correct data processing.
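For orientation, a YOLO label file holds one line per box: the class index followed by the box center, width, and height, all normalized by the image size. The helper below is a hypothetical sketch of that conversion from pixel coordinates. It is not the project's prepare_bbox_string, whose exact signature and output are not shown here.

```python
def to_yolo_label(class_id, box, image_size):
    """Convert a pixel box (x0, y0, x1, y1) into a YOLO label line.

    image_size is (width, height) in pixels. The result has the form
    "class x_center y_center width height" with values normalized to [0, 1].
    """
    x0, y0, x1, y1 = box
    width, height = image_size
    x_center = (x0 + x1) / 2 / width
    y_center = (y0 + y1) / 2 / height
    box_width = (x1 - x0) / width
    box_height = (y1 - y0) / height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {box_width:.6f} {box_height:.6f}"


# A 200x400 px box with its top-left corner at (100, 200) in a 1000x800 px image.
print(to_yolo_label(0, (100, 200, 300, 600), (1000, 800)))
```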
- Python 3.x
- Pillow (the maintained fork of PIL, the Python Imaging Library)
- YOLO utilities (for bounding box handling)
- Additional Python libraries: numpy, tqdm, yaml
Clone the repository to your local machine:
git clone https://github.com/nih23/Tibetan-NLP.git
cd Tibetan-NLP
Training data is generated by running generate_training_data.py. Make sure to update the folders for background images first.
python generate_training_data.py
YOLOv8n is trained with a call to the Ultralytics CLI:
yolo detect train data=data/yolo_tibetan/tibetan_text_boxes.yml epochs=1000 imgsz=1024
The trained model is then exported to TorchScript for inference:
yolo detect export model=runs/detect/train9/weights/best.pt
The exported model can now be used to detect and classify Tibetan text blocks:
yolo predict task=detect model=runs/detect/train9/weights/best.torchscript imgsz=1024 source=data/my_inference_data/*.jpg
The results are saved to the folder runs/detect/predict.
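The same train, export, and predict steps can also be driven from Python through the Ultralytics package. The sketch below is a rough equivalent of the CLI calls above, not part of this repository. The function names are hypothetical, and starting from the pretrained yolov8n.pt checkpoint is an assumption, since the CLI call does not name a model.

```python
def train(data_yaml, epochs=1000, imgsz=1024, weights="yolov8n.pt"):
    """Train a detector; `weights` is an assumed starting checkpoint."""
    from ultralytics import YOLO  # imported lazily; requires the ultralytics package

    model = YOLO(weights)
    return model.train(data=data_yaml, epochs=epochs, imgsz=imgsz)


def export_torchscript(weights_path):
    """Export trained weights to TorchScript and return the exported file path."""
    from ultralytics import YOLO

    return YOLO(weights_path).export(format="torchscript")


def predict_boxes(model_path, source, imgsz=1024):
    """Run detection and return (image path, boxes, confidences) per image."""
    from ultralytics import YOLO

    detector = YOLO(model_path, task="detect")
    # save=True writes annotated images to disk, as the CLI does by default.
    results = detector.predict(source=source, imgsz=imgsz, save=True)
    return [(r.path, r.boxes.xyxy.tolist(), r.boxes.conf.tolist()) for r in results]
```

For example, predict_boxes("runs/detect/train9/weights/best.torchscript", "data/my_inference_data/*.jpg") would return one tuple per image, with boxes given as pixel coordinates in x0, y0, x1, y1 order.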
Contributions to this project are welcome! Please fork the repository and submit a pull request with your proposed changes.
This project is licensed under the MIT License - see the LICENSE file for details.