-
Notifications
You must be signed in to change notification settings - Fork 523
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[EN] Add more english sentences for lock get state queries #2219
Conversation
WalkthroughThe recent updates to the Changes
Sequence Diagram(s) (Beta)sequenceDiagram
participant User
participant System
User->>System: "Is the Front Door locked?"
System-->>User: "Yes, the Front Door is locked."
User->>System: "Do I have any unlocked doors?"
System-->>User: "No, all doors are locked."
User->>System: "Which doors are currently locked?"
System-->>User: "The Front Door and the Back Door are locked."
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
ad49b52
to
c8dbdcb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All in all good changes, but there are one or two things which could be improved. Thanks!
Please take a look at the requested changes, and use the Ready for review button when you are done, thanks 👍 |
Discussion around how to handle this and possibly addressing it more global is forthstanding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't addressed each individual line, but you have 3 types of issues:
- Removing all optional parts in a set of sentences basically produces the same sentence template. This makes for very inefficient (slow) matching.
- Whitespace should be part of the optional part, i.e.
unlock [the ]{name}
instead ofunlock [the] {name}
. This also prevents inefficiency, as internally, the 2 spaces before and after[the]
would have to match a single space in the transcribed sentence, which is slower than it having to match a single whitespace. - The
securely
adverb should probably be an optional part of the input string of thelock_states
list, rather than included in each sentence. You can't call somethingsecurely unlocked
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please review my comment here #2219 (review)
sentences/en/lock_HassGetState.yaml
Outdated
@@ -3,34 +3,42 @@ intents: | |||
HassGetState: | |||
data: | |||
- sentences: | |||
- "is <name> {lock_states:state} [in <area>]" | |||
- "(is|are) <name> ([<currently>];{lock_states:state} [in <area>])" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't have a completely optional permutation item, as skipping <currently>
would match both the optional part at the beginning and at the end, which is inefficient. Also, watch out for spaces before/after optional parts.
- "(is|are) <name> ([<currently>];{lock_states:state} [in <area>])" | |
- "(is|are) <name> (<currently>;{lock_states:state}[ in <area>])" | |
- "(is|are) <name> {lock_states:state}[ in <area>]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not seeing any significant difference in performance when I run benchmarks over 324,000 sentences.
The performance seems to flip flop a bit between the two versions and, even taking worst case difference (0.5s for a benchmark iteration), that comes out to be 0.5, which is one micro second.
With that in consideration, it feels like optimizing for ease of maintenance is more important to giving users a good experience. My worry is not so much making this one change, but doing this for every intent type and then expanding for other common utterances.
Methodology:
# Number of generated sentences
$ python3 -m hassil.sample ./split_permutations.yaml --areas kitchen --names "front door" | jq -r .text | sort | uniq | wc -l
# Benchmarking with split sentences (your diff)
$ set PERMUTATIONS (python3 -m hassil.sample ./split_permutations.yaml --areas kitchen --names "front door" | jq -r .text) [.venv]
time for i in (seq 1000)
echo $PERMUTATIONS | hassil ./split_permutations.yaml --areas kitchen --names "front door" 2&> /dev/null
end
________________________________________________________
Executed in 83.68 secs fish external
usr time 56.92 secs 353.69 millis 56.56 secs
sys time 14.81 secs 391.86 millis 14.42 secs
# Benchmarking with optional permutations
$ set PERMUTATIONS (python3 -m hassil.sample ./empty_permutations.yaml --areas kitchen --names "front door" | jq -r .text) [.venv]
time for i in (seq 1000)
echo $PERMUTATIONS | hassil ./empty_permutations.yaml --areas kitchen --names "front door" 2&> /dev/null
end
________________________________________________________
Executed in 82.60 secs fish external
usr time 56.22 secs 377.65 millis 55.85 secs
sys time 14.59 secs 398.24 millis 14.20 secs
empty_permutations.yaml
:hassil:Area names: ['kitchen']
language: en
intents:
HassGetState:
data:
- sentences:
- "(is|are) <name> ([<currently>];{lock_states:state} [in <area>])"
response: one_yesno
expansion_rules:
name: "[the] {name}"
area: "[the] {area}"
currently: "(currently|presently|right now|at the moment)"
lists:
lock_states:
values:
- in: "[securely ]locked"
out: "locked"
- in: "unlocked"
out: "unlocked"
split_permutations.yaml
language: en
intents:
HassGetState:
data:
- sentences:
- "(is|are) <name> (<currently>;{lock_states:state} [in <area>])"
- "(is|are) <name> {lock_states:state} [in <area>]"
response: one_yesno
expansion_rules:
name: "[the] {name}"
area: "[the] {area}"
currently: "(currently|presently|right now|at the moment)"
lists:
lock_states:
values:
- in: "[securely ]locked"
out: "locked"
- in: "unlocked"
out: "unlocked"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, that's interesting, i must admit. Simply looking at the code structure/complexity says that the split variant should be faster, but i guess that the complexity of other permutation items might have a role here as well.
Either way, I agree that for such minute performance differences, ease of maintenance should be prioritized.
@synesthesiam please take a look at the performance results above and let's discuss if we should change our approach regarding optimization enforcement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wanted to see if any more thought has been given to this? I'd like to do another PR generalizing adding some more common values, then start to refactor sentences and generate new structures via an LLM.
It's pretty cool that we can now use an LLM in Assist, but I'm hoping to get things to feel more natural and flexible by letting the LLM generate sentences to match and then performing any actual sentence to intent matching offline.
Add some new permutations of lock sentence queries.
Summary by CodeRabbit
New Features
HassGetState
intent to support more flexible and varied sentence structures for querying the state of locks, doors, and windows in specific areas.Improvements