The Persona Sprint #4

Open
36 of 64 tasks
JarbasAl opened this issue Sep 22, 2023 · 1 comment
Labels
documentation Improvements or additions to documentation

Comments

@JarbasAl

JarbasAl commented Sep 22, 2023

we hit our stretch goal for persona!

this issue will document the progress

Framework

(during ovos-core 0.0.9 dev cycle)

Why

Voice interface

  • base intents to enable/disable a persona (default config)
  • "chat with persona" intent (capture all utterances in converse)
  • "ask {query} to persona" intent (single shot explicit query)

How

code lives here: https://github.com/OpenVoiceOS/ovos-persona

  • solvers service
  • bus api to query a specific persona (persona as a service)
  • persona definitions provided via OPM (usually default personas shipped with solver plugins)
  • loading of user defined personas (.json files)
  • dynamic registering of new persona definitions (eg, from skills)
  • ovos-persona cli entrypoint, a standalone launcher of persona service as a fallback skill (compat with older ovos-core versions)
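
As a rough illustration of "loading of user defined personas (.json files)", here is a minimal sketch using only the standard library. The function name `load_personas`, the folder layout and the optional `"name"` key are assumptions made for this example; the actual ovos-persona loader may work differently.

```python
import json
from pathlib import Path


def load_personas(folder):
    """Collect persona definitions from every .json file in a folder.

    Hypothetical helper, not the ovos-persona API.
    Files that cannot be read or parsed are skipped.
    """
    personas = {}
    for path in sorted(Path(folder).glob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or invalid JSON, ignore this file
        if not isinstance(data, dict):
            continue  # a persona definition is expected to be a JSON object
        # assumption: fall back to the file name when no "name" key is given
        name = data.get("name") or path.stem
        personas[name] = data
    return personas

# usage (the path is an example only):
# personas = load_personas("/path/to/personas")
```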

Session

(during ovos-core 0.0.8 dev cycle)

Why

consider the utterance "chat with {persona_name}" in a multi user setup: with a global setting, this makes the assistant answer all questions using the requested persona for every client, not just the one that asked

what happens when different users are accessing core? whenever someone changes the persona it changes for everyone!

image: a typical hivemind setup, we want each client to interact with a specific persona

How

make the persona configuration come from the message.context so it is per query, not global. this is defined via a Session; in the future this also allows OVOS to become user aware per query

the following properties in OVOSSkill should reflect session data live, so skills need no explicit handling of Sessions at all:

  • self.lang
  • self.location
  • self.timezone
  • self.config
  • ...
  • self.persona
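
A minimal sketch of the idea, assuming the session travels inside the message context as a plain dict. The key names (`"session"`, `"persona"`) and the helper `persona_for` are illustrative only, not the ovos-bus-client API.

```python
# assumption: global default used when a session does not request a persona
DEFAULT_PERSONA = "default"


def persona_for(message_context):
    """Resolve the persona for a single query from its message context."""
    session = message_context.get("session") or {}
    return session.get("persona") or DEFAULT_PERSONA


# two clients talking to the same core at the same time
alice = {"session": {"session_id": "alice", "persona": "pirate"}}
bob = {"session": {"session_id": "bob"}}

print(persona_for(alice))  # persona requested by alice's session
print(persona_for(bob))    # falls back to the global default
```

Because the lookup happens per message, one client changing its persona does not affect any other client.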

session implementation:

footnote: this is the reason ovos-bus-client was created instead of OVOS still using mycroft-bus-client

Pipeline

(during ovos-core 0.0.8 dev cycle)

Why

consider LLMs and how they interact with skills/user commands: how do we know when to use a skill or when to ask an LLM? Factual info usually comes from the web, "chatbot speech" should come from a persona, do we just hardcode this in the utterance handling logic?

each persona handles questions via a selection of solver plugins, which can be directly implemented as a FallbackSkill for example

we want to be able to define where in the intent stage this happens. this also allows the pluginification of the intent systems, and eventually even those can be replaced with an LLM if desired, giving a persona bias even to intent selection

image: intent service (in green) should be configurable, globally and per persona

How
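
One way to picture a configurable pipeline is an ordered list of stage names, where the first stage that produces a match wins. The sketch below is only an illustration under that assumption; the stage names and functions are made up and do not correspond to the real ovos-core intent service.

```python
def skill_intents(utterance):
    """Stand-in for regular skill intent matching."""
    return "weather_skill" if "weather" in utterance else None


def persona_stage(utterance):
    """Stand-in for a persona, which tries to answer anything."""
    return "persona"


def fallback_stage(utterance):
    """Stand-in for the last resort handler."""
    return "fallback"


# hypothetical registry of available stages
STAGES = {
    "skills": skill_intents,
    "persona": persona_stage,
    "fallback": fallback_stage,
}


def handle(utterance, pipeline):
    """Try each configured stage in order, return the first match."""
    for name in pipeline:
        result = STAGES[name](utterance)
        if result is not None:
            return result
    return None


# the order could come from global config or from the active persona
print(handle("what is the weather", ["skills", "persona", "fallback"]))
print(handle("tell me a joke", ["skills", "persona", "fallback"]))
```

Moving "persona" earlier or later in the list changes when it gets a chance to answer, without touching the utterance handling code.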

Solvers

(during ovos-core 0.0.9 dev cycle)

Why

once added to the pipeline, personas need to be able to answer arbitrary questions, they also need to handle input in multiple languages

persona definitions include a list of solver plugins and respective configs, an utterance is sent to the solvers until one can answer the question.

"persona": {
    "solvers": [
        "ovos-solver-plugin-llamacpp",
        "ovos-solver-plugin-personagpt",
        "ovos-solver-failure-plugin"
    ],
    "ovos-solver-plugin-llamacpp": {
        "persona": "helpful, creative, clever, and very friendly"
    },
    "ovos-solver-plugin-personagpt": {
        "facts": [
            "i am a quiet engineer.",
            "i'm single and am looking for love.",
            "sadly, i don't have any relatable hobbies.",
            "luckily, however, i am tall and athletic.",
            "on friday nights, i watch re-runs of the simpsons alone."
        ]
    }
}

How

solver plugins have automatic bidirectional translation so they can understand and answer in any language, even if the implementation is language specific

the persona definition specifies solver configs and the order in which they are tried

  • develop solver plugins
  • provide default personas via OPM
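
The "try solvers in order until one answers" behaviour can be sketched as below. This is not the ovos-persona implementation: the two solver classes, the `REGISTRY` dict and the plugin names `kb-solver` / `failure-solver` are hypothetical stand-ins for real OPM plugins.

```python
class KnowledgeSolver:
    """Hypothetical solver that only knows a few canned facts."""

    def __init__(self, config):
        self.facts = config.get("facts", {})

    def spoken_answer(self, query):
        return self.facts.get(query.lower())  # None when it cannot answer


class FailureSolver:
    """Hypothetical last-resort solver that always answers."""

    def __init__(self, config):
        self.reply = config.get("reply", "I don't know")

    def spoken_answer(self, query):
        return self.reply


# stand-in for plugin discovery
REGISTRY = {"kb-solver": KnowledgeSolver, "failure-solver": FailureSolver}


def load_solvers(persona):
    """Instantiate solvers in the order given, each with its own config."""
    return [REGISTRY[name](persona.get(name, {})) for name in persona["solvers"]]


def ask(persona, query):
    """Return the first non-empty answer, or None if no solver answers."""
    for solver in load_solvers(persona):
        answer = solver.spoken_answer(query)
        if answer:
            return answer
    return None


persona = {
    "solvers": ["kb-solver", "failure-solver"],
    "kb-solver": {"facts": {"what is the answer": "42"}},
}
print(ask(persona, "What is the answer"))
print(ask(persona, "who are you"))
```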

solvers of interest:

knowledge base solvers:

chatbot like solvers:

LLM solvers:

Solver documentation

A plugin can define the language it works in, eg, wolfram alpha only accepts english input at the time of this writing

Bidirectional translation will be handled behind the scenes for other languages
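
To make "behind the scenes" concrete, here is a toy sketch of the wrapping idea: translate the query into the solver's internal language, call the language-specific method, then translate the answer back. `ToyTranslator` and `ShoutSolver` are invented for this example and are not the ovos-plugin-manager classes; a real setup would use a translation plugin instead of a lookup table.

```python
class ToyTranslator:
    """Lookup-table translator, for illustration only."""

    TABLE = {
        ("pt", "en"): {"olá": "hello"},
        ("en", "pt"): {"HELLO": "OLÁ"},
    }

    def translate(self, text, source, target):
        if source == target:
            return text
        return self.TABLE[(source, target)].get(text, text)


class ShoutSolver:
    default_lang = "en"  # the only language the implementation understands

    def __init__(self, translator):
        self.translator = translator

    def get_spoken_answer(self, query, context=None):
        # language-specific logic, always receives default_lang input
        return query.upper()

    def spoken_answer(self, query, context=None, lang=None):
        # user facing wrapper: translate in, solve, translate out
        lang = lang or (context or {}).get("lang") or self.default_lang
        internal = self.translator.translate(query, lang, self.default_lang)
        answer = self.get_spoken_answer(internal, context)
        return self.translator.translate(answer, self.default_lang, lang)


solver = ShoutSolver(ToyTranslator())
print(solver.spoken_answer("olá", context={"lang": "pt"}))
```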

Developing a solver:

Plugins are expected to implement the get_xxx methods and leave the user facing equivalents alone

from ovos_plugin_manager.templates.solvers import QuestionSolver


class MySolver(QuestionSolver):
    enable_tx = False  # if True enables bidirectional translation
    priority = 100

    def __init__(self, config=None):
        config = config or {}
        # set the "internal" language, defined by dev, not user
        # this plugin internally only accepts and outputs english
        config["lang"] = "en"
        super().__init__(config)
        
    # expected solver methods to be implemented
    def get_data(self, query, context=None):
        """
        query assured to be in self.default_lang
        return a dict response
        """
        return {"error": "404 answer not found"}

    def get_image(self, query, context=None):
        """
        query assured to be in self.default_lang
        return path/url to a single image to accompany spoken_answer
        """
        return "http://stock.image.jpg"

    def get_spoken_answer(self, query, context=None):
        """
        query assured to be in self.default_lang
        return a single sentence text response
        """
        return "The full answer is XXX"

    def get_expanded_answer(self, query, context=None):
        """
        query assured to be in self.default_lang
        return a list of ordered steps to expand the answer, eg, "tell me more"

        {
            "title": "optional",
            "summary": "speak this",
            "img": "optional/path/or/url"
        }
        :return:
        """
        steps = [
            {"title": "the question", "summary": "we forgot the question", "img": "404.jpg"},
            {"title": "the answer", "summary": "but the answer is 42", "img": "42.jpg"}
        ]
        return steps

Using a solver:

solvers work with any language as long as you stick to the officially supported wrapper methods

    # user facing methods, user should only be calling these
    def search(self, query, context=None, lang=None):
        """
        cache and auto translate query if needed
        returns translated response from self.get_data
        """
   
    def visual_answer(self, query, context=None, lang=None):
        """
        cache and auto translate query if needed
        returns image that answers query
        """

    def spoken_answer(self, query, context=None, lang=None):
        """
        cache and auto translate query if needed
        returns chunked and translated response from self.get_spoken_answer
        """

    def long_answer(self, query, context=None, lang=None):
        """
        return a list of ordered steps to expand the answer, eg, "tell me more"
        translated response from self.get_expanded_answer
        {
            "title": "optional",
            "summary": "speak this",
            "img": "optional/path/or/url"
        }
        :return:
        """

Example Usage - DuckDuckGo plugin

single answer

from skill_ovos_ddg import DuckDuckGoSolver

d = DuckDuckGoSolver()

query = "who is Isaac Newton"

# full answer
ans = d.spoken_answer(query)
print(ans)
# Sir Isaac Newton was an English mathematician, physicist, astronomer, alchemist, theologian, and author widely recognised as one of the greatest mathematicians and physicists of all time and among the most influential scientists. He was a key figure in the philosophical revolution known as the Enlightenment. His book Philosophiæ Naturalis Principia Mathematica, first published in 1687, established classical mechanics. Newton also made seminal contributions to optics, and shares credit with German mathematician Gottfried Wilhelm Leibniz for developing infinitesimal calculus. In the Principia, Newton formulated the laws of motion and universal gravitation that formed the dominant scientific viewpoint until it was superseded by the theory of relativity.

chunked answer, for conversational dialogs, ie "tell me more"

from skill_ovos_ddg import DuckDuckGoSolver

d = DuckDuckGoSolver()

query = "who is Isaac Newton"

# chunked answer
for sentence in d.long_answer(query):
    print(sentence["title"])
    print(sentence["summary"])
    print(sentence.get("img"))
    
    # who is Isaac Newton
    # Sir Isaac Newton was an English mathematician, physicist, astronomer, alchemist, theologian, and author widely recognised as one of the greatest mathematicians and physicists of all time and among the most influential scientists.
    # https://duckduckgo.com/i/ea7be744.jpg
    
    # who is Isaac Newton
    # He was a key figure in the philosophical revolution known as the Enlightenment.
    # https://duckduckgo.com/i/ea7be744.jpg
    
    # who is Isaac Newton
    # His book Philosophiæ Naturalis Principia Mathematica, first published in 1687, established classical mechanics.
    # https://duckduckgo.com/i/ea7be744.jpg
    
    # who is Isaac Newton
    # Newton also made seminal contributions to optics, and shares credit with German mathematician Gottfried Wilhelm Leibniz for developing infinitesimal calculus.
    # https://duckduckgo.com/i/ea7be744.jpg
    
    # who is Isaac Newton
    # In the Principia, Newton formulated the laws of motion and universal gravitation that formed the dominant scientific viewpoint until it was superseded by the theory of relativity.
    # https://duckduckgo.com/i/ea7be744.jpg

Auto translation, pass user language in context

from skill_ovos_ddg import DuckDuckGoSolver

d = DuckDuckGoSolver()

# bidirectional auto translate by passing lang context
sentence = d.spoken_answer("Quem é Isaac Newton",
                           context={"lang": "pt"})
print(sentence)
# Sir Isaac Newton foi um matemático inglês, físico, astrônomo, alquimista, teólogo e autor amplamente reconhecido como um dos maiores matemáticos e físicos de todos os tempos e entre os cientistas mais influentes. Ele era uma figura chave na revolução filosófica conhecida como o Iluminismo. Seu livro Philosophiæ Naturalis Principia Mathematica, publicado pela primeira vez em 1687, estabeleceu a mecânica clássica. Newton também fez contribuições seminais para a óptica, e compartilha crédito com o matemático alemão Gottfried Wilhelm Leibniz para desenvolver cálculo infinitesimal. No Principia, Newton formulou as leis do movimento e da gravitação universal que formaram o ponto de vista científico dominante até ser superado pela teoria da relatividade

Server

(during ovos-core 0.0.9 dev cycle)

Why

some personas won't be able to run fully on device due to hardware constraints, LLMs in particular

How

Marketplace

(during ovos-core 0.0.9 dev cycle)

Why

users should be able to one click select a persona and have a nice UI

users should have a way to evaluate how useful each persona is before installing it

How

the contest has started, give a score to the following answer
- domain: Natural Language Understanding (NLU)
- question: Can you explain what the term 'cryptocurrency' means?
- answer: it's like money, but uses mathematical magic called cryptography instead of coming from the banks
  • github automation on PRs submitting a persona

Skill Dialogs

(during ovos-core 0.0.9 dev cycle)

Why

Skills with personalities and flexible dialogs!

  • in some languages TTS utterances may depend on the gender of the person listening
    • eg. in portuguese there is no listener_gender neutral way to say "you are beautiful", you say "tu és lindo"/"tu és linda"
    • can be detected in ovos-dinkum-listener via audio transformer plugins
  • in some languages TTS utterances may depend on the gender of the speaker
    • eg. in portuguese there is no speaker_gender neutral way to say "thank you", you say "obrigado"/"obrigada"
    • this will be a setting of the persona
  • personality settings
    • "increase sarcasm by 20%"

How

New file format, .jsonl

jsonl format info: https://jsonlines.org/

{"utterance": "stick the head out of the window and check it yourself", "attitude": "mean", "weight": 0.1}
{"utterance": "current weather is X", "attitude": "helpful", "weight": 0.9}
  • 1 - load .jsonl file if it exists, else old .dialog file
  • 2 - select an attitude based on weights defined in mycroft.conf / current active persona
  • 3 - filter samples per attitudes
  • 4 - select based on weights of .jsonl file
"persona": {
    "gender": "male",
    "attitudes": {
        "normal": 100,
        "funny": 70,
        "sarcastic": 10,
        "irritable": 0
    }
}
  • define and document the new dialog file format
  • add basic support for all official skills
  • dialog_selector plugin class that takes these files as input
    • make an LLM plugin with a dedicated prompt to parse these files and select final dialog
    • default dialog selector should be heuristic

original issue OpenVoiceOS/OVOS-workshop#56

Dialog and TTS Transformers

Why

skills won't cover every personality, the previous technique works to change the dialog content, but as a persona we also want to change the dialog style

a persona should be able to mutate the text before TTS, and also to modify the audio after TTS

utt = "Quantum mechanics is a branch of physics that describes the behavior of particles at the smallest scales. " \
    "It involves principles such as superposition, where particles can exist in multiple states simultaneously, " \
    "and entanglement, where particles become connected and can influence each other's properties."
print(lovecraftify(utt))
# Quantum mechanics unveils the eldritch secrets of the infinitesimal realm,
# where particles, ensnared in the web of superposition, dwell in manifold states.
# Through the dread phenomenon of entanglement, these entities intertwine,
# their very essence entwined, shaping the fabric of reality.
print(dudeify(utt))
# Quantum mechanics is like, the raddest branch of physics, dude.
# It's all about particles at the tiniest scales, doing crazy stuff like
# being in multiple states at once (superposition) and getting all connected and influencing each other (entanglement).
print(eli5(utt))
# Quantum mechanics is like a special kind of science that helps us understand really tiny things.
# It tells us that these tiny things can be in more than one place at the same time,
# and they can also be connected to each other and affect each other's behavior.

How

under persona json

{
	"dialog_transformers": {
		"ovos-dialog-transformer-openai": {
			"key": "xxxxx",
			"api_url": "https://api.openai.com/v1",
			"rewrite_prompt": "Add more 'dude'ness to"
		}
	},
	"tts_transformers": {
		"ovos-tts-transformer-sox": {
			"default_effects": {
				"pitch": {"n_semitones": int}
			}
		}
	}
}
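
Conceptually, the two transformer stages wrap the TTS step: text is rewritten before synthesis and audio is modified afterwards. The sketch below only illustrates that ordering with stand-in functions; `fake_tts`, `louder` and this `dudeify` are placeholders invented for the example, not real plugins.

```python
def apply_chain(value, transformers):
    """Run a value through each transformer in order."""
    for transform in transformers:
        value = transform(value)
    return value


def dudeify(text):
    """Placeholder dialog transformer: rewrites text before TTS."""
    return text.rstrip(".") + ", dude."


def fake_tts(text):
    """Placeholder TTS engine: pretend the encoded text is audio."""
    return text.encode("utf-8")


def louder(audio):
    """Placeholder TTS transformer: modifies the audio after TTS."""
    return audio.upper()


def speak(text, dialog_transformers, tts_transformers):
    text = apply_chain(text, dialog_transformers)  # before TTS
    audio = fake_tts(text)
    return apply_chain(audio, tts_transformers)    # after TTS, before playback


print(speak("hello.", [dudeify], [louder]))
```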
@JarbasAl JarbasAl added the documentation Improvements or additions to documentation label Sep 22, 2023
@JarbasAl JarbasAl pinned this issue Sep 22, 2023
JarbasAl added a commit to OpenVoiceOS/ovos-plugin-manager that referenced this issue Oct 6, 2023
listener has AudioTransformers
core has MetadataTransformers and UtteranceTransformers

this PR extends the concept to ovos-audio for text before TTS stage, and wav file after TTS but before playback

use case demonstration https://gist.github.com/JarbasAl/16788bdcff3fa5bfb3634d6f2116cc04

this ties into OpenVoiceOS/ovos-persona#4
JarbasAl added a commit to OpenVoiceOS/ovos-plugin-manager that referenced this issue Oct 8, 2023
if using a fallback the thread is started twice

fix test

fix test

missing import

Mycroft -> OpenVoiceOS

fix type hints

feat/tts_transformers

.

move PlaybackThread to ovos-audio

.

feat/ovos_dialog_tts_transformers

@mikejgray

It would be very cool to have support for TTS per persona, as well as wakeword per persona.
