No way to configure SSML parsing to on for speech-dispatcher-espeak-ng #301
Could you specify which version of speech-dispatcher you are using exactly, or give URLs, such as speechd/src/modules/espeak-ng.c line 344 in 0ec6b72?
That is bogus; it does not happen like that on my system, and shouldn't: if I don't pass -m, I'm hearing the tags spoken.
Again, that's the documented, and thus expected, behavior.
Not at all, it's a speech-dispatcher module; see speechd/src/modules/espeak-ng.c.
That looks erroneous to me. As I already said in #1, you can just enable SSML parsing during the connection. Allowing users to configure otherwise would spread confusion all over: applications expecting their speech text to be non-interpreted would see their input interpreted! Really, the only sane way is for clients to just tell whether their input has SSML or not. I don't see why clients cannot just do that; it's in the SSIP protocol, you just need to pass `SET self SSML_MODE on`.
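For instance, with the Python bindings (a minimal sketch; `SSIPClient`, `set_data_mode`, and `DataMode` are taken from the python3-speechd client library discussed later in this thread — verify the names against your installed version):

```python
# Hedged sketch: assumes the python3-speechd bindings mentioned in this
# thread, and that they expose SSIPClient and DataMode (check your version).
import speechd

client = speechd.SSIPClient('ssml-demo')       # opens an SSIP connection
client.set_output_module('espeak-ng')          # route to the espeak-ng module
client.set_data_mode(speechd.DataMode.SSML)    # SSML on, for this connection only
client.speak('<speak>test <emphasis>one</emphasis></speak>')
client.close()
```

Other clients on the same dispatcher are unaffected: the mode is per connection, which is exactly why it avoids the confusion described above.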
If you want to enable SSML by default only for your own experimentation, you can do so in
https://github.com/espeak-ng/espeak-ng/blob/master/src/espeak-ng.c#L344
is the simplest way that have tried, and succeeded with, to enable SSML parsing by default.
Do not post "bogus" content here, in any way, shape or form. All posts on any topic are supported by primary sources. In this case am the primary source.
The concept is to set SSML parsing on by default. That is the only use case. If markup is not expected to be parsed, there is the entire world of speech synthesis engine applications and options to select from - not the version of espeak-ng that am building.
Well,
How to do that? Does not appear to be possible.
The branch am building, or intending to build, has a single purpose: SSML parsing on by default. There is zero confusion. There is a clear, concise, and single purpose for the application. Again, if the user wants speech synthesis without SSML parsing on by default, they have the entire FOSS and proprietary market to select from, as AFAICT no speech synthesis engines are shipped with SSML parsing on by default.
Absolutely agree. The patch graciously provided at https://bugs.chromium.org/p/chromium/issues/detail?id=795371#c18 / https://github.com/brailcom/speechd/files/1677168/ssml.txt/ is sitting unused. Will try again to request that a user with write access to Chromium source code apply the patch.
Ok. Will try to find the file and setting. There is no confusion. There is a specification that does not address key aspects of the application, no algorithms, and yet the specification has not been replaced with language and code that does what is expected right now, in this day and age, given the state of the art - which would probably take the same amount of energy as dealing with the current specification. Have created several other workarounds, including, for example, this proof-of-concept https://gist.github.com/guest271314/59406ad47a622d19b26f8a8c1e1bdfd5. Am not sitting around waiting for someone else to do something regarding this matter.
Re
The specification actually does include the "either" language, so there should be no confusion, per the specification: 4.2.4 SpeechSynthesisUtterance Attributes, text attribute, of type DOMString (emphasis added). The problem is implementers have ignored that language even when the patch to achieve the "either" part of the specification is available.
Ah! You were talking about espeak-ng's source, not speech-dispatcher's source. That is why your espeak-ng command is behaving differently from mine.
I was not saying you were not getting that behavior from espeak-ng. I was saying that espeak-ng having that behavior would be bogus, since it is not what is documented. Now knowing that you patched your espeak-ng, I understand where the behavior difference is coming from: you modified it. And yes, I will still call that behavior of espeak-ng bogus, since it's not what is documented, not what it has been doing in the past, etc., so making such a change would bring more harm than good.
Could you read about XY problems (https://en.wikipedia.org/wiki/XY_problem)? You are asking me for X (SSML by default), which will not happen because it would bring far more harm than good. So get back to the actual goal, Y: what you want is to get SSML interpreted when Chromium/Firefox passes data over to speech-dispatcher. Okay, then make them actually express that it is SSML by toggling the proper bit. Trying to do it another way will just bring confusion over whether data is pure text not to be interpreted, or text to be interpreted as SSML.
I couldn't understand what you meant.
Here I guess you are talking about the speech-dispatcher module source. Yes, SSML is enabled in that module. The idea in the speech-dispatcher structure is that speech-dispatcher modules are always passed SSML. If some synth backend doesn't support SSML, then the module strips the tags. But the idea is also that if a client doesn't enable SSML when it connects to speech-dispatcher, non-SSML mode is assumed, and thus if there are SSML tags they get escaped, so that they properly get spoken out (e.g. a programmer reading some text with SSML tags; in that case we want the tags to be spoken). That last behavior has always been the behavior of speech-dispatcher. We definitely do not want to change that, because that would break a lot of existing software and bring confusion as to what is text and what is SSML. Really, we don't want that; we have seen the kind of mess that kind of approach made with the charset questions.
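To illustrate that escaping (an illustrative Python sketch only, not the actual C implementation inside speech-dispatcher):

```python
# Illustrative only: roughly what happens to text from a client that has NOT
# enabled SSML mode, so that tags are spoken out rather than interpreted.
def escape_for_ssml(plain_text: str) -> str:
    return (plain_text.replace('&', '&amp;')
                      .replace('<', '&lt;')
                      .replace('>', '&gt;'))

# A programmer listening to source code wants '<speak>' spoken literally:
print(escape_for_ssml('the <speak> tag'))   # -> the &lt;speak&gt; tag
```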
No. No, and no. It has never been so, so we do not want to suddenly change this.
"open source" doesn't mean we shouldn't set on sane basics. Making something that used to behave some way suddenly change its behavior is really not something we want, be it open or closed source.
I already said so: it's in the SSIP protocol; just enable SSML. Modify the client that connects to speech-dispatcher. That's the only sane approach.
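At the protocol level, the change amounts to one SSIP command on the client's connection. A sketch of the raw exchange (the socket path here is an assumption and varies by version/distro):

```python
# Sketch of the raw SSIP exchange. Assumption: the dispatcher listens on
# $XDG_RUNTIME_DIR/speech-dispatcher/speechd.sock (path varies by version).
import os
import socket

path = os.path.join(os.environ['XDG_RUNTIME_DIR'], 'speech-dispatcher', 'speechd.sock')
s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
s.connect(path)

def command(cmd: str) -> str:
    s.sendall(cmd.encode() + b'\r\n')
    return s.recv(4096).decode()        # naive single read, fine for a demo

print(command('SET self CLIENT_NAME user:ssml-demo:main'))
print(command('SET self SSML_MODE on'))      # the per-connection SSML bit
print(command('SPEAK'))                      # server answers it is ready for data
s.sendall(b'<speak>test</speak>\r\n.\r\n')   # message body, terminated by a lone "."
print(s.recv(4096).decode())
s.close()
```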
And where will it be deployed? Will it be visible publicly? Who will be using it? Will these users also use the same speech-dispatcher to run their own desktop? How will people know which branch they are running?
IT COMPLETELY DOES. You just have to enable the SSML mode when connecting to speech-dispatcher.
Again, I don't see what you mean here. But what is sure is that you seem to underestimate the amount of energy needed to deal with the confusion brought by enabling SSML by default when it has never been the default before. Really, look at the charset mess that we have in emails: it's been decades, and we still have issues everywhere.
URRRRGL. So it's the API which is screwed. See, here is precisely the confusion: how are we supposed to know whether the text has to be interpreted as plain text or as SSML? A plain text can very well contain `<speak>`-looking tags that are meant to be spoken literally. That speech API needs to be fixed to add an attribute specifying whether the input is SSML or not. There is no other sane way, really. This is just like character sets: you do not want any way other than actually specifying which encoding is used.
Yes, and speech-dispatcher does that if the client enables SSML but the synthesis backend doesn't actually support it. So just make your speech-dispatcher client enable SSML, and you will have everything working appropriately.
The problem is in the spec. Implementers have no way to know when SSML should be enabled or not.
Not an "XY" problem. SSML can be parsed. If there are SSML tags, output SSML, otherwise just parse the plain text. The problem already exists. Am not creating it looking for a solution. Re
the version of espeak-ng with speech-dispatcher that am attempting to build will have SSML parsing turned on by default. That is the only purpose. Plain text can still get parsed. If we could supply our own SSML parsers https://github.com/guest271314/SpeechSynthesisSSMLParser, even better, to avoid the edge cases of other SSML parsers espeak-ng/espeak-ng#737. Consider the command
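For what it's worth, a hypothetical sketch of that detect-then-parse heuristic ("if there are SSML tags, output SSML, otherwise just parse the plain text"); note this is exactly the guessing objected to elsewhere in this thread, since plain text may merely mention a tag:

```python
# Hypothetical heuristic: treat input as SSML only when it looks like a
# complete <speak> document; otherwise treat it as plain text to be spoken.
import re

SSML_DOC = re.compile(r'^\s*<speak[\s>].*</speak>\s*$', re.DOTALL)

def looks_like_ssml(text: str) -> bool:
    return bool(SSML_DOC.match(text))

assert looks_like_ssml('<speak>test</speak>')
assert not looks_like_ssml('the <speak> tag')  # mentions a tag, still plain text
```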
Again, agree. WICG/speech-api#10, web-platform-tests/wpt#8712. However, the changes have not been made. The motion of the ocean does not stop: the requirement has not been met, therefore proceed until achieving the goal, by any means.
Good luck trying to change the Web Speech API specification. When tried to contribute, was advised to join W3C and WICG, which reluctantly did, as do not need a group hug to accomplish tasks, and institutions have issues outside of the technical issue that am focused on. They did not like /guest271314/ as a "name", and when told them they did not define "real name", while having other contributors with their own version of a "real name", thus clear hypocrisy, was banned from WICG for 1,000 years w3c/mst-content-hint#39 (comment), w3c/mediacapture-main#629 (comment).
True. The Web Speech API is dead, though that is all we have at the front-end without trying to use WASM and Emscripten, which is a treat to try on a 32-bit architecture. While you are here, can you kindly reference how to build speech-dispatcher again, so can modify
Full disclosure: TL;DR WICG/speech-api#67; https://lists.w3.org/Archives/Public/public-wicg/2019Oct/0000.html.
Yes, it is. X: "enable SSML by default in speech-dispatcher". Y: "have SSML text coming from speech API interpreted as SSML".
And I'm saying that your solution is just moving the problem, and will create other problems in other places.
Then do what I said in #301 (comment), but again, that's not something that should be released publicly, since it changes behavior that has been stable for decades.
No, because enabling SSML parsing by default would bring confusion. If I'm typing `<speak>` as plain text, I want to hear the tag spoken, not interpreted.
And I did mean this to be the plain text that happens to contain `<speak>` tags.
Sorry, but I can't buy "proceed by any means". Going fast by bringing confusion only means having to spend a lot of effort later on to fix the confusion. See this comment: WICG/speech-api#10 (comment); it's exactly what I said: specify in the API whether the text is SSML or not.
I'm sorry, but it's already difficult for me to find the time to answer you, find the time to fix the pulseaudio bug which I believe was wrongly blamed on speech-dispatcher, find the time to at last release 0.10, etc., while there is [email protected] etc., which is meant for this.
Your feedback is valuable. Have been trying to get SSML input implemented by any means for a couple of years now. As mentioned in the comments above, WICG banned this user, per their "discourse" site, for 1,000 years from GitHub WICG subsidiaries, ostensibly indefinitely, because they did not like the "real name" /guest271314/. Thus, am not able to reply to your comment at WICG/speech-api#10 (comment).
No worries. Will continue to do what can here, again, by any means, as have no restrictions on own conduct relevant to achieving the requirement. YMMV trying to move the Web Speech API specification forward, and therefrom for implementers to actually implement what is already possible, and has been since filed the first issue here. If the specification and browsers are not moving to achieve the requirement, even though the technology is readily available, the conclusion reached here is that those bodies lack the will to do so, and am not bound by their restrictions or lack of will: yes, will continue to try to implement SSML parsing by any means. If, after being achieved, the proof-of-concept can be made sane, will work on sorting out the parts to keep. Kind regards,
Unfortunately, trying to build from the GitHub repository has some issues,
Removed default installation of
autopoint and makeinfo are required dependencies only when you build from git; building from a tarball doesn't need them. People building from git are expected to know to install the additional dependencies.
Is /usr/local/bin in your $PATH? Did you run hash -r to make sure that bash is not still trying to look for it in /usr/bin?
Yes,
Just installed the tarball!
Uninstalled
Will installing
Your system python apparently doesn't find what you installed in /usr/local/. You'd have to see what your system provides to enable using it. Possibly simply
or whatever path it got installed in.
No, the python speechd library is essentially unchanged. The system-provided library will not be able to autospawn /usr/local/bin/speech-dispatcher, but you can spawn that by hand.
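A sketch of both steps (the site-packages path below is an assumption — check where `make install` actually put the module):

```python
# Sketch: prefer the /usr/local copy of the python bindings over the distro
# copy, and start the locally built daemon by hand instead of autospawning.
import subprocess
import sys
import time

sys.path.insert(0, '/usr/local/lib/python3/dist-packages')  # assumed path
import speechd

subprocess.Popen(['/usr/local/bin/speech-dispatcher'])  # spawn by hand
time.sleep(1)                                           # give it time to start

client = speechd.SSIPClient('local-build-test')
client.speak('testing the locally built daemon')
client.close()
```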
Did you run ldconfig? One always has to do so, otherwise the system library cache still points at the system-provided library.
No. Did not. It is not in the build instructions. The reader could perhaps not be expected to know everything seasoned authors of the source run every day. If that information is critical, though not printed in the instructions, it could be printed anyway for double-redundancy; e.g., the tests reference Can
./configure && make && make install && ldconfig is a really normal thing to know when one is building packages by hand.
No, it's really run_test, but you need to run
/etc/ is root-only indeed.
Same error
It's still looking for the system-provided library. I'd say check with your distribution how you are supposed to make libraries in /usr/local/lib actually work. Normally running ldconfig after installing into /usr/local/lib is enough.
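One quick check of whether the dynamic loader actually sees the new library (a diagnostic sketch; on Linux, Python's ctypes.util.find_library consults the same ldconfig cache):

```python
# Diagnostic sketch: does the loader cache know about libspeechd yet?
import ctypes
import ctypes.util

name = ctypes.util.find_library('speechd')
if name is None:
    print('libspeechd not in the loader cache; run ldconfig and retry')
else:
    ctypes.CDLL(name)   # raises OSError if the cached entry is stale
    print('loader resolves', name)
```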
Ok. Would any options passed to
Will try that. Have already asked you, the maintainer. Do not know how to solve the issue right now.
No, by default it installs to /usr/local.
lib is for libraries; binaries are in bin.
Ok. Not working. Apparently |
Actually,
Your comment #301 (comment) shows that make install has properly put it in /usr/local/lib, i.e. where it shall be. Since apparently it doesn't work, it's a distro problem, so see with the distro.
https://ubuntuforums.org/showthread.php?t=2442447. Who knew that enabling SSML support in browsers, where the technology, patch, and infrastructure exist to do so, would not be as simple as applying the existing patch and installing the source code to facilitate said support?
Computer science is almost never as simple as just applying a patch. Some patches really open big cans of worms, and we cannot afford that if we want to keep our software stacks maintainable and debuggable.
Really is the same in any and all fields of human activity. Found that out through hard lessons first hand: litigating to SCOTUS twice by self, and independent research into various invented political classifications, i.e., vetting historical claims, for which the former provided instruction by way of statutory construction. Of course, neither individuals nor institutions necessarily want their mythologies shredded https://plnkr.co/edit/5CwKsW?preview; thus have been banned from several sites and organizations, one for 1,000 years, some for 5 years, some indefinitely, like Twitter, evidently preemptively: "the algorithm did it" https://politics.meta.stackexchange.com/a/3509. Have been accused of being a Russian bot, and a racist for transcribing Gobineau by hand https://english.meta.stackexchange.com/q/12032 when researched and found the origin of the terms "white people" and the later invention "White-women" (1681 Maryland Colony), "white races", and the so-called "black" "race" and "race" itself. Primary sources are a must https://history.stackexchange.com/a/47942; emotional responses and some erroneous idea of "respect" are not. Too many examples of censored questions and answers and bans to list here https://github.com/guest271314/banned. Am well-suited to handle several years on a computer project after contesting an entire State in a nation-state venue.
Essentially, your stance is extendable to all disciplines. Expect that if a result is not reproducible by the scientific method, or a claim is not backed by primary sources, will be exposing that fraud forthwith if in this midst, without exception. E.g., institutions cannot claim to be waiting on WHO to tell them about some "new" virus when an admin. agency of that institution has funded research into that very subject matter since at least 2014 https://taggs.hhs.gov/Detail/AwardDetail?arg_AwardNum=R01AI110964&arg_ProgOfficeCode=104.
Well, yes, sure. Anything complex is complex to achieve, yes, sure.
Progress.
However, now |
See the daemon logs; they are probably in /run/user/your_uid/speech-dispatcher/log
when trying to use
that is both with |
Chromium does not autospawn a connection when
Not certain what to look for?
Looking for espeak-ng: `configure:21029: $PKG_CONFIG --exists --print-errors "espeak-ng"`. Do you really have the package installed that provides the espeak-ng.pc file?
Was trying with
Re-installed using
and Nightly 77, where the listing of the voice
Nightly lists the voice. Success! Thank you kindly for your time, energy, patience, and professional maintenance over the course of this and several related issues at
Related #1
Attempting to set SSML parsing to on by default for Web Speech API at browsers:

1) installed `python3-speechd`, removed `espeak-ng`, `espeak`, cloned `espeak-ng` from GitHub and added `| espeakSSML` to L344 at `espeak-ng.c`, built, installed, and verified the installation and the output of the change to the source file: `$ espeak-ng "<speak>test</speak>"` parses the SSML by default, without passing the `-m` flag; `$ spd-say "<speak>test</speak>"` does not parse SSML without the `-x` flag.

Looking into the source code further, the `speech-dispatcher-espeak-ng` package installs the file `/usr/lib/speech-dispatcher-modules/sd_espeak-ng`, which appears to be a self-contained version of `espeak-ng` unrelated to the version installed from the repository.

Kindly provide the steps necessary to either a) set the speech synthesis engine to the user-selected local speech synthesis engine, instead of the file shipped with `speech-dispatcher-espeak-ng`; or b) set SSML parsing to on during `$ spd-conf` prompts and in `~/.config/speech-dispatcher/modules/espeak-ng.conf` directly.