Solving NTHU AIS captchas with deep learning models, implemented as APIs.
Tip
This project was made for NTHUMods!
Go check it out! It's open source too!
-
[cnn_seg]: Segmentation + CNN
This method involves segmenting the CAPTCHA image into individual characters and then using a Convolutional Neural Network (CNN) to recognize each character.
-
[cnn_ctc]: CNN + CTC Layer
This method involves using a CNN trained with a Connectionist Temporal Classification (CTC) loss function to recognize the entire CAPTCHA image at once.
Depending on the use case, different methods may be more suitable.
To get started with running the model API, you can use the provided run.sh script, which is set up to run the cnn_ctc.py model by default.
Make sure you have all the required dependencies installed:
pip install -r requirements.txt
Or if you are on Windows:
pip install -r requirements-windows.txt
To run the cnn_ctc model as a fetchable API:
./run.sh
Or if you want to run the model manually:
sudo gunicorn cnn_ctc:app --workers 1 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:5000
Or if you are on Windows:
fastapi dev cnn_ctc.py
Or for Windows production:
fastapi run cnn_ctc.py
You may also run the cnn_seg model by changing the script to cnn_seg.py.
Make a GET request to the api, passing in the url of the image:
https://example.api:5000?url=IMAGEURL
Or visit the fastAPI docs at:
https://example.api:5000/docs
The API will return the predicted text of the image in plain text format.
Two formats of the image recognition model is available in the /models folder
These models are for single digit recognition, separation is still required
- Tensorflow [.h5] model format
- TFLite [.tflite] format
Example usage for the TFLite model:
import tflite_runtime.interpreter as tflite
model = tflite.Interpreter(model_path='models/cnn_seg/model.tflite')
# Or
import tensorflow as tf
model = tf.lite.Interpreter(model_path='models/cnn_seg/model.tflite')
input_details = model.get_input_details()
output_details = model.get_output_details()
model.allocate_tensors()
model.set_tensor(input_details[0]['index'], img)
model.invoke()
prediction = model.get_tensor(output_details[0]['index'])
answer = np.argmax(prediction[0])
print(''.join(map(str, answers)))
You may view usage examples in the notebooks, scripts, or cnn_seg.py file
There are a few variations of the CTC model available in the /models folder
Some options containing the CTCLoss layer (if you want to train further)
- Tensorflow [.keras] model format (with CTC variation)
- Tensorflow [.h5] model format (with CTC variation)
- Tensorflow [saved_model] format
- TFLite [.tflite] format (quantized model available)
Usage same as above, but with the respective model format
Note: The output of this model has been labeled and requires decoding. Implementations can be found in the notebooks, scripts, or cnn_ctc.py file
To train your own models or experiment with different architectures:
- Navigate to the notebooks folder.
- Open the respective ipynb file in Google Colab.
- Load the data from the training folder if necessary.
- Have fun experimenting!
Note: The full experimental process and training was done on Google Colab, I am unabled to guarantee the same results on local machines.
Full training processes can be found in the notebooks folder.
All data used for training can be found in the training folder.
Data generated (300.zip, 1000.zip) for cnn_ctc training used was generated by the cnn_seg model, then handpicked for correctness.
You can benchmark the performance of the models using the scripts in the benchmarks folder. The benchmarks will help you evaluate the speed and accuracy of each model when deployed as an API.
Results will be published soon.
Multi digit recognition via
(conv + pool + norm) * n + attention rnn + lstm * 2 + dense + ctc loss
Contributions are always welcomed! If you have any ideas or improvements, feel free to fork the repository, make your changes, and submit a pull request. Also, feel free to open an issue if you have any suggestions or feedback.
Note: I am not a professional machine learning engineer, so any help or advice is greatly appreciated.
- Keras - Automatic Speech Recognition using CTC
- Hugging Face - keras-io/ocr-for-captcha
- Not a reference, but I wrote this for fun for a image processing class while working on cnn_seg
Audrey for helping me brain everything
Chew for helping me debug everything
MIT