Repository for Large Language Models for Knowledge Engineering (LLM4KE).
Original idea:
How much could an LLM co-contribute to the knowledge engineering process, alongside our usual methodology (competency questions, ontology re-use, authoring tests, etc.)?
Set of questions we could investigate:
1. Could an LLM reverse engineer an ontology and determine what good competency questions (CQs) could be derived from it?
2. Could an LLM take as input the CQs and generate parts of the ontology?
3. Could an LLM take as input the CQs and extend an existing ontology?
4. Could an LLM take as input the CQs and generate abstract patterns?
5. Could an LLM write an authoring test (a SPARQL query) given the ontology and the CQs?
6. Given a dataset and an ontology, is an LLM able to generate an adequate set of RML rules for data ingestion?
The content of this code repository accompanies the research project described in the following paper:
```bibtex
@inproceedings{llm4ke-2024,
  title     = {{Can LLMs Generate Competency Questions?}},
  author    = {Rebboud, Youssra and Tailhardat, Lionel and Lisena, Pasquale and Troncy, Rapha\"el},
  booktitle = {Semantic Web -- 21st International Conference (ESWC), LLMs for KE track, Hersonissos, Crete, Greece, May 26--30, 2024},
  year      = {2024}
}
```
See the repository structure below for navigating this repository:
```
llm4ke
├───data            <Reference data models with their related components>
│   └─[DataModelName]
│       ├─dm        <data model implementation>
│       ├─rq        <set of queries>
│       └─...
├───src             <Processing pipeline code>
└───...
```
We will now address research question 1 above: could an LLM reverse engineer an ontology and identify potential competency questions?
The pipeline uses LangChain and, in particular, Ollama for running local models (a minimal sketch of this interaction is shown after the steps below).
- Install Ollama from its website.
- Install the requirements:
  ```sh
  pip install -r requirements.txt
  ```
- Download the desired LLM (see the full list of available LLMs):
  ```sh
  ollama pull llama2
  ```
- Run the pipeline to generate Competency Questions for a given ontology:
  ```sh
  # Canonical form:
  # python src/main.py <task> --name <OntologyName> --input <OntologyFolder> --llm <ModelName>

  # Basic example for the Odeuropa ontology:
  python src/main.py all_classes --name Odeuropa --input ./data/Odeuropa/ --llm llama2
  ```
  Then browse the results in the `out/Odeuropa/` directory. You can get the full list of available parameters with `python src/main.py --help`.
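As a rough illustration of how a LangChain pipeline can query a local Ollama model to draft competency questions, here is a minimal sketch. It is not the repository's actual code: the prompt wording and the placeholder class labels are assumptions made for the example.

```python
# Minimal sketch (not the repository's actual pipeline): query a local Ollama
# model through LangChain to draft competency questions from ontology class labels.
from langchain_core.prompts import PromptTemplate
from langchain_community.llms import Ollama

llm = Ollama(model="llama2")  # any model previously pulled with `ollama pull`

prompt = PromptTemplate.from_template(
    "You are an ontology engineer. Given the following ontology classes:\n"
    "{classes}\n"
    "Write three competency questions that this ontology should be able to answer."
)

# Placeholder class labels; the real pipeline extracts them from the ontology files.
classes = ["Smell", "Smell Source", "Olfactory Experience"]

chain = prompt | llm  # LangChain Expression Language: prompt piped into the LLM
print(chain.invoke({"classes": ", ".join(classes)}))
```

The actual prompting and task handling are implemented in `src/main.py`; the snippet above only illustrates the LangChain/Ollama interaction.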
With the output data from the Generating Competency Questions step above:
- Run the evaluation pipeline to compute similarity scores for all ontologies or for a given ontology (a sketch of one possible similarity metric follows these steps):
  ```sh
  # Canonical form:
  # python src/eval.py <all|OntologyName>

  # Basic example for the Odeuropa ontology with a 0.8 similarity threshold and verbose logging:
  python3 ./src/eval.py Odeuropa -t 0.8 --log 10
  ```
  Then browse the results in the `./results_<all|OntologyName>.json` file.
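For context, one common way to compute a similarity score between a generated competency question and a reference one is cosine similarity between sentence embeddings. The sketch below uses the `sentence-transformers` library with an arbitrary model name and made-up example questions; it is an assumption for illustration, not necessarily the metric implemented in `src/eval.py`.

```python
# Illustrative only: cosine similarity between a generated CQ and a reference CQ
# using sentence embeddings. The model name, example questions, and the 0.8
# threshold (mirroring the command above) are assumptions, not the actual eval code.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

generated = "Which smell sources are mentioned in a given historical text?"
reference = "What are the sources of a smell described in a document?"

# Encode both sentences and compute their cosine similarity.
emb_gen, emb_ref = model.encode([generated, reference], convert_to_tensor=True)
score = util.cos_sim(emb_gen, emb_ref).item()

# A pair counts as a match when the score exceeds the chosen threshold (e.g. 0.8).
print(f"similarity = {score:.3f}, match = {score >= 0.8}")
```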
Copyright (c) 2023, EURECOM. All rights reserved.