Skills Taxonomy

This pipeline builds a skills taxonomy from skills extracted from TextKernel job adverts. It has two steps:

Build the taxonomy (build_taxonomy.py)
Output a user-friendly version of the taxonomy (output_taxonomy.py)

The parameters for both steps are set in config files in the directory skills_taxonomy_v2/config/skills_taxonomy/.

The latest config file is 2022.01.21.yaml.

1. Build the taxonomy

This is run by:

python -i skills_taxonomy_v2/pipeline/skills_taxonomy/build_taxonomy.py --config_path 'skills_taxonomy_v2/config/skills_taxonomy/2022.01.21.yaml'

Outputs:

A dictionary mapping each skill to its position in the hierarchy: outputs/skills_taxonomy/2022.01.21_skills_hierarchy.json
A nested dictionary mapping each skill group to the skill groups it contains: outputs/skills_taxonomy/2022.01.21_hierarchy_structure.json
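
The output file names above share their prefix with the config file name. As a minimal sketch of that naming pattern (an assumption about how the names relate, not a copy of what build_taxonomy.py does), the paths can be derived from the config path. The helper name `step_one_output_paths` and its `output_dir` default are hypothetical.

```python
from pathlib import Path


def step_one_output_paths(config_path, output_dir="outputs/skills_taxonomy"):
    """Derive the step 1 output paths from a config file path.

    Hypothetical helper: it only mirrors the naming pattern shown in this
    README and is not taken from the pipeline code.
    """
    # Path.stem drops only the final suffix: "2022.01.21.yaml" -> "2022.01.21"
    version = Path(config_path).stem
    base = Path(output_dir)
    return {
        "skills_hierarchy": base / f"{version}_skills_hierarchy.json",
        "hierarchy_structure": base / f"{version}_hierarchy_structure.json",
    }


paths = step_one_output_paths(
    "skills_taxonomy_v2/config/skills_taxonomy/2022.01.21.yaml"
)
for name, path in paths.items():
    print(name, path.as_posix())
```

Each file can then be read with `json.load` once the build step has been run.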

2. Output the taxonomy

This step replaces the numerical keys in the hierarchy JSON with the skill group names, which makes the output more user-friendly for interrogating the hierarchy.

Run by:

python -i skills_taxonomy_v2/pipeline/skills_taxonomy/output_taxonomy.py --config_path 'skills_taxonomy_v2/config/skills_taxonomy/2022.01.21.yaml'

Outputs:

outputs/skills_taxonomy/2022.01.21_hierarchy_structure_named.json
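
The key renaming described in this step can be sketched as a recursive walk over a nested dictionary. This is an illustration only: the node layout used here (a "name" field and a "children" dictionary) and the function name `rename_keys` are assumptions, and the real hierarchy_structure.json may be organised differently.

```python
def rename_keys(groups):
    """Return a nested dict keyed by skill group name instead of number.

    Assumes (hypothetically) that every node has the form
    {"name": str, "children": {numeric_key: node, ...}}.
    """
    named = {}
    for key, node in groups.items():
        # Fall back to the numerical key if a group has no name
        name = node.get("name", str(key))
        named[name] = rename_keys(node.get("children", {}))
    return named


# Toy hierarchy with made-up group names, for illustration only
numeric = {
    "0": {
        "name": "Group A",
        "children": {"0": {"name": "Subgroup A1", "children": {}}},
    },
    "1": {"name": "Group B", "children": {}},
}

named = rename_keys(numeric)
print(named)
```

In this sketch, two sibling groups with the same name would overwrite each other, so a real implementation would need names that are unique within each level.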
