The Chinese Visual Embeddings are trained to provide visual representations of Chinese written signs. They are part of a larger project, Graphein, by Qianxun Chen.
- Download 'NotoSansCJKsc-Regular' and 'NotoSansCJKtc-Regular' from Google Noto Fonts and save them to the 'fonts' folder
- Create a new virtual environment with Python 3
virtualenv --python=python3.6 venv
- Activate the environment
source venv/bin/activate
- Install all the packages
pip3 install -r requirements.txt
- Run the preprocessing script with the -img option
python preprocess.py -img
- An earlier version of the embeddings was trained with
python CNN.py
- The latest version of the embeddings can be trained with
python CNN_cuda_multiLabel.py
- Generate tsv files to preview embeddings in Embedding Projector
python embeddings/VC/generateTSV.py
- Generate word2vec format embeddings
python embeddings/VC/tsvs2txt.py
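
The relationship between the two output formats above can be illustrated with a minimal sketch. It does not reproduce generateTSV.py or tsvs2txt.py; the function name `tsvs_to_word2vec` and the sample vectors are hypothetical. It assumes the Embedding Projector input is a tab-separated vectors file plus a single-column metadata file with one label per line and no header, and that the word2vec text format starts with a "count dimension" header followed by one "label v1 v2 ..." line per entry.

```python
def tsvs_to_word2vec(vectors_tsv, metadata_tsv):
    """Combine projector-style TSV text into word2vec text format.

    Hypothetical helper for illustration only; the real scripts may differ.
    """
    # One label per non-empty line (assumes a single-column metadata file, no header).
    labels = [line.strip() for line in metadata_tsv.splitlines() if line.strip()]
    # One vector per non-empty line, components separated by tabs.
    rows = [line.strip().split("\t") for line in vectors_tsv.splitlines() if line.strip()]

    if len(labels) != len(rows):
        raise ValueError("number of labels and vectors differ")

    dim = len(rows[0]) if rows else 0
    if any(len(row) != dim for row in rows):
        raise ValueError("vectors do not all have the same dimension")

    # Header line: "<number of entries> <dimension>".
    lines = ["{} {}".format(len(rows), dim)]
    for label, row in zip(labels, rows):
        lines.append(label + " " + " ".join(row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up 3-dimensional vectors for two characters.
    vectors = "0.1\t0.2\t0.3\n0.4\t0.5\t0.6\n"
    metadata = "人\n木\n"
    print(tsvs_to_word2vec(vectors, metadata), end="")
```

Text in this layout can typically be read by tools that accept word2vec text files, provided the labels contain no spaces.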
dict.csv is derived from the dictionary.txt file of the Make Me a Hanzi (makemeahanzi) project.