Name of project: RosaeNLG
Requested project maturity level: Sandbox
Template-based Natural Language Generation (NLG) automates the production of relatively repetitive texts from structured input data and textual templates, which are run by an NLG engine. It is widely used in production at large corporations, especially in the financial industry.
Typical use cases are:
- describing a product based on its features for SEO purposes
- producing structured reports, such as risk reports or fund performance reports, in the financial industry
- generating well-formed chatbot answers
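The product description use case above can be sketched in a few lines. This is an illustration of the template-based idea in plain Python, not the RosaeNLG API: the `pluralize`, `join_with_and` and `describe_phone` helpers and the phone data are hypothetical, and a real NLG engine relies on linguistic resources instead of the naive plural rule used here.

```python
# Illustrative sketch only: plain Python, not the RosaeNLG API.
# All helper names and the sample data are hypothetical.

def pluralize(noun, count):
    """Naive English plural agreement: add 's' unless count is 1."""
    return noun if count == 1 else noun + "s"


def join_with_and(items):
    """Join items as 'a', 'a and b', or 'a, b and c'."""
    if len(items) <= 1:
        return "".join(items)
    return ", ".join(items[:-1]) + " and " + items[-1]


def describe_phone(phone):
    """Render one product description from structured input data."""
    colors = phone["colors"]
    text = (
        f"The {phone['name']} is available in {len(colors)} "
        f"{pluralize('color', len(colors))}: {join_with_and(colors)}."
    )
    # Optional data only produces text when it is present.
    if phone.get("battery_mah"):
        text += f" Its battery holds {phone['battery_mah']} mAh."
    return text


phone = {"name": "Phone X", "colors": ["black", "red", "white"], "battery_mah": 3300}
print(describe_phone(phone))
```

The same template yields a grammatical sentence whether the product has one color or several, which is the kind of agreement handling an NLG engine automates across languages.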
RosaeNLG is an open source NLG engine. It aims:
- to offer the same NLG features as commercial NLG products
- to be developer and IT friendly for template configuration and integration
- to provide NLG on both the server side and the browser side
NLG is a building block for business services aimed directly at end users. It is widely used in industry.
NLG contributes to the democratization and understandability of AI:
- Non-expert users often struggle with figures and dashboards and prefer textual explanations
- Computer-generated texts can be superior (from the reader’s perspective) to human-written texts
- At the end of an AI pipeline, NLG can automate and convey expertise, explain and summarize situations, and communicate with end users
- NLG helps deliver trusted AI, typically in combination with AI Explainability 360 and AI Fairness 360
The RosaeNLG project will also:
- Increase diversity: first project originating from France
- Foster usage, contributions and diversity in the NLG domain, supporting languages presently not covered by any NLG system at all
RosaeNLG currently runs on Acumos for Orange AI Marketplace.
RosaeNLG can be used at the end of the AI pipeline to explain a decision to non-experts:
- AI Explainability 360: provide a clear, readable, summarized explanation for an end user (e.g. a bank customer) asking for explanations
- AI Fairness 360: generate comprehensive compliance reports on fairness (initial situation, what was done, final situation)
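As a rough sketch of the explanation scenario above, the snippet below turns per-feature contribution scores into a short sentence for an end user. It does not use the AI Explainability 360 or RosaeNLG APIs: the `explain_decision` function, the shape of the `contributions` dictionary and the sample values are assumptions made for illustration only.

```python
# Illustrative sketch only: neither the AI Explainability 360 API nor the RosaeNLG API.
# The input format (feature name -> signed contribution) is an assumption.

def explain_decision(outcome, contributions, top_n=2):
    """Summarize the top_n most influential features in plain English."""
    # Rank features by the absolute size of their contribution.
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    factors = []
    for name, weight in ranked[:top_n]:
        impact = "positive" if weight > 0 else "negative"
        factors.append(f"{name} ({impact} impact)")
    joined = " and ".join(factors)
    return f"Your loan application was {outcome}. The main factors were {joined}."


contributions = {
    "income": 0.12,
    "outstanding debt": -0.45,
    "account age": 0.05,
    "recent late payments": -0.30,
}
print(explain_decision("declined", contributions))
```

Only the two largest contributions are verbalized, which keeps the text short; a production template would also vary wording and handle the single-factor case.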
MLflow, especially its Model Registry, could potentially be used to manage templates.
Technical tooling:
- GitHub Actions for CI
The project has its own GitHub organization: RosaeNLG organization on GitHub
- RosaeNLG meetup group: for example,
- direct communication with the author by email
- Slack channel (not widely used as of today)
RosaeNLG is a fork of the Pug template engine (MIT).
It is composed of 70 submodules. Most of these modules are original parts of RosaeNLG, released under the same Apache 2.0 license, and are not listed below.
Depending on the output language, RosaeNLG loads linguistic resources and uses linguistic libraries to handle agreement and conjugate verbs.
Resources derived from linguistic resources (mainly WordNet, lefff, german-pos-dict, morph-it) remain under their original licence.
| Resource | Usage | Licence |
|---|---|---|
| | random numbers | MIT |
| | date and time formatting | MIT |
| | number formatting | MIT |
| | cardinal numbers in letters: 5 → five etc. (except for German) | MIT |
| | stemming | MIT |
| stopwords-de, stopwords-en, stopwords-es, stopwords-fr, stopwords-it | lists of stop words | MIT |
| | tokenizer | MIT |

| Resource | Usage | Licence |
|---|---|---|
| | English gerunds ( | |
| | title case (for titles) in English | MIT |
| | ordinal numbers in English | MIT |

| Resource | Usage | Licence |
|---|---|---|
| | French words starting with an aspirated h ('h aspiré' vs. 'h muet') | |
| | gender and plural of French words | |
| | pluralization of nouns | MIT |
| | title case (for titles) in French | MIT |

| Resource | Usage | Licence |
|---|---|---|
| | agreement of German adjectives, words and verbs | |

| Resource | Usage | Licence |
|---|---|---|
| | agreement of Italian adjectives, words and verbs | |

| Resource | Usage | Licence |
|---|---|---|
| | ordinal numbers for Spanish | Apache 2.0 |
| | gender of Spanish words | MIT |
| | plural of Spanish words | MIT |
| | Spanish verb conjugation | Apache 2.0 |
- Ludan Stoecklé, [email protected], personal author, 3+ years
- Marco Riva, https://github.com/rivamarco, on Italian in 2020
Yes, see:
- Ludan Stoecklé (> 60 000 lines of code, 100+ commits since first public version in Sept. 2019)
- Marco Riva (https://github.com/rivamarco) on Italian (company: Radicalbit)
- Ongoing work with RedLab Paris to have PhDs as contributors
For the JavaScript version (main), see Publish a new version:
- orchestrated by GitHub Actions
- uses vXX.XX.XX branches
- Sonar quality gate
- GitHub Actions builds, tests, and publishes:
  - Docker images on Docker Hub: RosaeNLG server image and RosaeNLG CLI image
  - documentation on the main documentation website and GitHub Pages
For the Java version, see Publish a new version:
- orchestrated by GitHub Actions
- uses vXX.XX.XX branches
- publishes libraries on Maven Central
- creates the Java Server image on Docker Hub
RosaeNLG code of conduct, which refers to https://lfprojects.org/policies/code-of-conduct/.
Yes for both repos:
Do you have any specific infrastructure requests needed as part of hosting the project in the LF AI?
- GitHub Actions
- documentation is hosted on AWS (S3 + CloudFront)
- Documentation: https://rosaenlg.org
- Documentation site and project on GitHub might be sufficient
- articles on Medium
- LinkedIn Company page (which is not used today)
Support:
- Addventa (company specialized in NLG, based in Paris) provides commercial support on RosaeNLG (support with SLA and Professional Services)
- RosaeNLG is available for commercial usage on Orange AI marketplace
- Ongoing discussions with RedLab Paris to have junior PhDs as contributors
Early adopters:
- Lizeo (tire descriptions)