Text Citations and footnotes using ReadAloud can be disruptive to comprehension #72
Comments
This is directly related to what is proposed in #69 and I wonder if we should just combine these two issues. With "read aloud" are we referring to text-to-speech, screen reader output, media overlays, or a combination of the three? |
Those are three different domains that cannot share a common solution:
|
I believe that for the RS to give the reader the option to avoid unwanted spoken information, the content would need to be marked up. In particular, where the author is referencing a formal citation, if this were marked up, the RS ReadAloud function could give the option to skip it. The reading of footnotes could likewise be made optional: content marked with doc-footnote could be skipped. I believe this skipping could also be implemented by screen readers. In the case of SMIL, if the content was marked up, then this could be identified in the SMIL markup. |
Yes, this issue is directly related to #69, but it is much simpler to implement. If we create a best practice for marking citations, I think there is enough general markup to resolve this issue. RS systems could simply add the option of what to skip in their ReadAloud. For example in ReadAloud skip : These could be toggled. In textbooks, I would want the reading of pages, but in a novel, it would be disruptive. People should be able to choose. |
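To make the toggle idea concrete, here is a minimal sketch, assuming the content has already been reduced to text segments tagged with the DPUB-ARIA role of their marked-up ancestor. The `Segment` and `SkipSettings` shapes, the toggle names, and the sample text are all hypothetical and not taken from any reading system.

```typescript
// Hypothetical model: each chunk of text carries the DPUB-ARIA role (if any)
// found on its nearest marked-up ancestor, e.g. role="doc-footnote".
type Segment = { text: string; role?: string };

// User-facing toggles; the names are illustrative only.
interface SkipSettings {
  citations: boolean;  // covers doc-biblioref
  footnotes: boolean;  // covers doc-footnote and doc-noteref
  pageBreaks: boolean; // covers doc-pagebreak
}

// Map each toggle to the roles it covers.
const ROLES_BY_TOGGLE: Record<keyof SkipSettings, string[]> = {
  citations: ["doc-biblioref"],
  footnotes: ["doc-footnote", "doc-noteref"],
  pageBreaks: ["doc-pagebreak"],
};

// Return only the segments that should be spoken under the given settings.
function segmentsToSpeak(segments: Segment[], settings: SkipSettings): Segment[] {
  const skipped = new Set<string>();
  for (const toggle of Object.keys(ROLES_BY_TOGGLE) as (keyof SkipSettings)[]) {
    if (settings[toggle]) {
      for (const role of ROLES_BY_TOGGLE[toggle]) skipped.add(role);
    }
  }
  // Unmarked text is always spoken; marked text is spoken unless its role is skipped.
  return segments.filter((s) => s.role === undefined || !skipped.has(s.role));
}

// Made-up sample content for illustration.
const page: Segment[] = [
  { text: "The results were consistent" },
  { text: "(Smith 2019)", role: "doc-biblioref" },
  { text: "across all trials." },
  { text: "Page 42", role: "doc-pagebreak" },
];

// A novel reader might skip everything that is marked up:
const spoken = segmentsToSpeak(page, { citations: true, footnotes: true, pageBreaks: true });
```

A textbook reader could set `pageBreaks: false` to keep page announcements while still skipping citations. The sketch only works if the publisher has marked up the content, which is the point made above.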
So the problem I'm having here is that I have never seen TTS referred to as "read aloud". I've actually seen this terminology from publishers in the context of media overlays in places like the description of the book, or in the context of a specific learning style (there is a lot of content out there on "Read Aloud" practices that include media overlays or teach parents how to read aloud). In user discussions, we have also only heard users refer to either TTS or "reader mode", not "read aloud". I want to make sure we're using accurate and precise language, and unify on it, so we avoid confusion on both the publisher side (where I currently see a lot of confusion between the methods), and the user side. |
Publishers that I work with use "Read Aloud" to mean text-to-speech. Perhaps we define our terms in any resulting spec. Media Overlays: Audio or video files embedded in an ebook |
Hi @sueneu in the definitions you provided I have heard folks use |
@clapierre well, that explains some of the confusion! I've been told by developers that "Read Aloud" refers only to synchronized media overlays. So there is some variation within the industry. For that reason, we should be careful to define terms in documentation. Could we define "Read Aloud" as any audio expression of the text no matter what technology (i.e. TTS, media overlays) is used? You could easily make the argument that the user needn't be aware of how the audio is generated. Documentation for publishers, producers, and reading systems could further define the underlying tech. |
You might want to refer to the guide @GeorgeKerscher wrote: https://www.w3.org/publishing/a11y/audio-playback/ It gets into the confusion around the Read Aloud v. Read Now naming. |
But @mattgarrish, the document you're referring to explicitly makes the difference between media overlays "Read Aloud" and TTS as separate features. There's a big gulf between the two features, and in many cases, completely different sub-features between the two. Most SMIL implementations don't allow you to adjust reading speed for instance, and SMIL allows the publisher to customize text highlighting, but TTS implementations do not. Not to mention the different audio, SMIL is most often a human narrator where TTS is computer generated. I think it's really important to be clear about what the user is going to experience. Especially in cases where both options might be available for a title. EDIT: I also think it's important to point out that the two features have completely different origins, one is publisher-driven and provided, the other is reading-system driven. |
It's defining "full audio" publications as those that use media overlays and TTS for the reading system/AT-generated playback, regardless of what names are assigned to those features in different reading systems. When you start using generic names like "read aloud" it means different things to different people. I'm only pointing it out as a means of standardizing the language used to talk about the issue. |
From a reading app perspective, I'm not sure that there's always a need to identify TTS and media overlay as two completely different affordances. Framing this as a User Story: "As a user, I would like to listen to an ebook and have sufficient control over that experience". The following preferences/features can apply to both of them:
* play/pause/stop
* skip to next/previous utterances
* highlight colour
* speed
* continuous playback (this mostly applies to FXL content, where you might want to automatically pause the playback until the reader moves forward to the next page/spread)
* skippability could apply to both as Media Overlay/SMIL also provides semantic information that can be used to skip specific utterances
While Media Overlay can come with their own CSS class for highlighting, this authored preference could prove problematic to some users and it makes sense to always offer the ability to customize things. I would need to double check but as far as I can remember, this is also optional in EPUB, which means that reading systems need a way to handle highlighting if it isn't authored in the file anyway. For reading speed, it's well known by now that many users want the ability to tweak things to their own liking. This goes beyond ebooks/audiobooks, since podcast and video apps often offer this option as well (there are many people watching anime at a higher speed for example). I believe that this eventually comes down to two key differences:
* Media Overlay may provide a higher quality audio experience, if it's recorded by a real human narrator (TTS could also be used to mass produce such files)
* and the way content is broken down into utterances (more control from the reading system with TTS)
As TTS becomes better and better, I believe that the barrier between the two of them will continue to break down. Just earlier this week, I read an article about Storytel providing TTS as an alternative option in a number of audiobooks that they provide: https://www.boktugg.se/2024/02/27/rostbytaren-storytel-lanserare-voice-switcher-pa-svenska/ The key argument being: "A whopping 89% of Storytel's listeners have at some point finished a book, not because the book was bad, but because the voice didn't suit them". |
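As a rough illustration of the user story above, the sketch below treats playback as a queue of utterances that does not care whether the audio comes from TTS or from a Media Overlay. It assumes both sources can be reduced to the same `Utterance` shape; `Utterance`, `PlaybackPrefs`, and `PlaybackQueue` are hypothetical names, not part of any reading system or of the EPUB specifications.

```typescript
// One unit of playback. With TTS the reading system would create these; with
// Media Overlays they would be derived from the SMIL document (assumption).
interface Utterance {
  text: string;
  skippable: boolean; // e.g. a footnote or a page number
}

// Preferences shared by both audio sources. The queue below only consults
// skipSkippable; the other fields would be handed to the audio/highlight layer.
interface PlaybackPrefs {
  rate: number;            // 1.0 = normal speed
  highlightColour: string; // user override for any authored highlight style
  skipSkippable: boolean;  // true = jump over skippable utterances
  continuous: boolean;     // false = pause at the end of each page/spread
}

class PlaybackQueue {
  private items: Utterance[];
  private prefs: PlaybackPrefs;
  private index = -1; // -1 means playback has not started yet

  constructor(items: Utterance[], prefs: PlaybackPrefs) {
    this.items = items;
    this.prefs = prefs;
  }

  // Move forward to the next utterance the user wants to hear.
  next(): Utterance | undefined {
    return this.step(1);
  }

  // Move backward to the previous utterance the user wants to hear.
  previous(): Utterance | undefined {
    return this.step(-1);
  }

  // Walk in one direction, ignoring skippable utterances when asked to.
  // Returns undefined (and leaves the position unchanged) at either end.
  private step(dir: 1 | -1): Utterance | undefined {
    let i = this.index + dir;
    while (i >= 0 && i < this.items.length) {
      const u = this.items[i];
      if (!(this.prefs.skipSkippable && u.skippable)) {
        this.index = i;
        return u;
      }
      i += dir;
    }
    return undefined;
  }
}
```

Keeping the navigation logic separate from the audio source is one way to offer the same controls for both features, while the voice and the way content is split into utterances remain source-specific.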
Hadrien:
Everything mentioned could be accomplished via JS. Modern browsers could accomplish all of the above. Do EPUB reading systems allow JS, perhaps limited to document control? Do they have and require an API? Are the DOM and JS the API?
I wonder. Since EPUB is a website wrapped in a package, would a simpler solution be to allow web browsers the ability to see inside a ZIP archive and read ePUB files, rather than wait for the readers to catch up?
Best Regards,
Dale Rogers, M.Ed., CIW
Designer
eLearning Developer
***@***.***
http://dalerogers.me/
https://www.linkedin.com/in/dalerrogers/
From my iPhone. Pardon my thumbs.
…________________________________
From: Hadrien Gardeur ***@***.***>
Sent: Friday, March 1, 2024 10:26:30 AM
To: w3c/publishingcg ***@***.***>
Cc: Subscribed ***@***.***>
Subject: Re: [w3c/publishingcg] Text Citations and footnotes using ReadAloud can be disruptive to comprehension (Issue #72)
From a reading app perspective, I'm not sure that there's always a need to identify TTS and media overlay as two completely different affordances.
Framing this as a User Story: "As a user, I would like to listen to an ebook and have sufficient control over that experience".
The following preferences/features can apply to both of them:
* play/pause/stop
* skip to next/previous utterances
* highlight colour
* speed
* continuous playback (this mostly applies to FXL content, where you might want to automatically pause the playback until the reader moves forward to the next page/spread)
* skippability could apply to both as Media Overlay/SMIL also provides semantic information that can be used to skip specific utterances
While Media Overlay can come with their own CSS class for highlighting, this authored preference could prove problematic to some users and it makes sense to always offer the ability to customize things. I would need to double check but as far as I can remember, this is also optional in EPUB, which means that reading systems need a way to handle highlighting if it isn't authored in the file anyway.
For reading speed, it's well known by now that many users want the ability to tweak things to their own liking. This goes beyond ebooks/audiobooks, since podcast and video apps often offer this option as well (there are many people watching anime at a higher speed for example).
I believe that this eventually comes down to two key differences:
* Media Overlay may provide a higher quality audio experience, if it's recorded by a real human narrator (TTS could also be used to mass produce such files)
* and the way content is broken down into utterances (more control from the reading system with TTS)
As TTS becomes better and better, I believe that the barrier between the two of them will continue to break down. Just earlier this week, I read an article about Storytel providing TTS as an alternative option in a number of audiobooks that they provide: https://www.boktugg.se/2024/02/27/rostbytaren-storytel-lanserare-voice-switcher-pa-svenska/
The key argument being: "A whopping 89% of Storytel's listeners have at some point finished a book, not because the book was bad, but because the voice didn't suit them".
|
@dalerrogers you're right that this can be entirely done in JS and in fact that's what a number of reading apps do (mostly the ones that are Web Apps since there are better native options available for dividing text into utterances and then reading these utterances using a TTS engine). In such cases, the JS handling all of that is served by the reading app though, not the publication. That's consistent with using Edge's TTS feature, which works on every website that you visit. I think Chrome has something similar in testing as well. The main issue when implementing TTS with Web technologies right now is mostly related to inconsistencies across implementations of lower-level APIs in browsers. |
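For readers unfamiliar with the two steps mentioned above (dividing text into utterances, then handing them to a TTS engine), here is a minimal sketch. The regex splitter is a naive stand-in for real segmentation, and `splitIntoUtterances` and `readAloud` are hypothetical names. The speech backend is injected as a callback so the same loop can run outside a browser.

```typescript
// Split text into sentence-like utterances. This is a naive regex stand-in for
// the segmentation a real reading app would do; abbreviations such as "e.g."
// will be split in the wrong place.
function splitIntoUtterances(text: string): string[] {
  const matches = text.match(/[^.!?]+[.!?]*/g);
  if (matches === null) return [];
  return matches.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Hand each utterance, in order, to an injected speech backend and return how
// many utterances were queued.
// In a browser the backend could be:
//   (u) => speechSynthesis.speak(new SpeechSynthesisUtterance(u))
// since speechSynthesis.speak() adds each utterance to the engine's own queue.
function readAloud(text: string, speak: (utterance: string) => void): number {
  const utterances = splitIntoUtterances(text);
  for (const u of utterances) {
    speak(u);
  }
  return utterances.length;
}

// Usage with a stub backend that just records what would be spoken.
const spokenLog: string[] = [];
const queued = readAloud("Hello world. How are you? Fine!", (u) => {
  spokenLog.push(u);
});
```

Because this code would be served by the reading app rather than the publication, it works on any content document without requiring scripting support for the EPUB itself.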
Hello all:
I sent this request to Avneesh Singh as well. So, I apologize in advance for cross-posting. I’m trying to track down an error as I am publishing my first fixed-layout eBook to Amazon, Kobo, and Ingram Spark.
The short version:
I’m hand-coding the EPUB so I know exactly what is in there. I’m a front-end coder and have taught HTML/CSS for 18 years. I am CIW certified. I understand markup. I’m using VS Code as my editor so there shouldn’t be any odd hidden characters. It’s plain text, coded as UTF-8.
I ran my EPUB package through the latest EPUB checker (version 5.1.0 according to the CHANGELOG.txt file). It validates. I ran it through the Daisy ACE checker. It validates. It opens and displays as designed on my iBooks, and Kindle apps on my MacBook Air, iPad Pro, and iPhone. So far, so good.
During the submission process to Ingram Spark and Kobo Writing Life, I’m getting the error:
Error while parsing file: [attribute "class" not allowed here; expected attribute "dir", "version" or "xml:lang"] in OEBPS/title-page.xhtml, line 2
Line 2 contains the following code…
<html xmlns=http://www.w3.org/1999/xhtml lang="en">
I ran the code and errors through ChatGPT to see if AI could help me isolate the issue. It recommended I confirm the versions of EPUB check being used. Good idea.
According to KOBO documentation (https://github.com/kobolabs/epub-spec/blob/master/README.md#epub-versions-kobo-supports), it uses EPUB checker version 4.2.4. I validated my file with EPUB checker version 5.1.0. Is that what is throwing my Kobo and Ingram Spark errors?
Should I ignore the Ingram Spark and Kobo validator warnings and proceed with confidence that my validators are more current? Has anyone else run into this?
My project is a comic book. The intended audience is sighted readers. Still, I want everyone to enjoy the experience, and the image alt attributes have rich descriptions of all the panels.
Is this an issue to be reported? What is the workaround or guidance to get my book published?
Best Regards,
Dale
Dale R Rogers, M.Ed, CIW
Creator | Designer | Educator
Personal: ***@***.******@***.***>
Web: dalerogers.me<https://dalerogers.me/>
|
Try this:
<html xmlns:epub="http://www.idpf.org/2007/ops" xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
…________________________________
From: Dale Rogers ***@***.***>
Sent: Sunday, September 1, 2024 5:19 PM
To: w3c/publishingcg ***@***.***>; W3C EPUB 3 Community Group ***@***.***>
Subject: EPUB validation and publishing platforms
|
Hi Dale
I suggest you add xml:lang="en" to that line.
This is an example of a typical html tag for a fixed layout EPUB content document.
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" lang="en-GB" xml:lang="en-GB">
Thanks
Ken
Ken Jones
Director
Circular Software Limited
circularsoftware.com<https://www.circularsoftware.com/>
***@***.******@***.***>
linkedin.com/in/kenjones<http://linkedin.com/in/kenjones>
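The suggestions in the two replies can be turned into a quick pre-submission check. The sketch below is a naive string-based check of the opening html tag, assuming the tag is available as a single string; `checkHtmlRootTag` is a hypothetical helper, not part of EPUBCheck or any retailer's validator, and it is not known whether these checks correspond to the cause of the reported error.

```typescript
const XHTML_NS = "http://www.w3.org/1999/xhtml";

// Naive checks on the opening <html> tag of an XHTML content document.
// Returns a list of human-readable problems; an empty list means none found.
function checkHtmlRootTag(tag: string): string[] {
  const problems: string[] = [];

  // XML requires every attribute value to be quoted. (Naive: an "=" inside a
  // quoted value would be reported as a false positive.)
  if (/=\s*[^\s"'>]/.test(tag)) {
    problems.push("unquoted attribute value");
  }

  // Read a quoted attribute value by name; undefined if absent or unquoted.
  const attr = (name: string): string | undefined => {
    const m = tag.match(new RegExp(`\\s${name}\\s*=\\s*["']([^"']*)["']`));
    return m ? m[1] : undefined;
  };

  if (attr("xmlns") !== XHTML_NS) {
    problems.push("xmlns is missing, unquoted, or not the XHTML namespace");
  }

  const lang = attr("lang");
  const xmlLang = attr("xml:lang");
  if (xmlLang === undefined) {
    // Suggested in the replies above.
    problems.push("no xml:lang attribute");
  } else if (lang !== undefined && lang.toLowerCase() !== xmlLang.toLowerCase()) {
    // When both are present they must have the same value.
    problems.push("lang and xml:lang differ");
  }
  return problems;
}
```

Run against the tag as quoted in the original message, this reports three problems; run against either tag suggested in the replies, it reports none. The unquoted xmlns value may simply be an artefact of how the email was rendered, so that result should be read with caution.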
|
Description
When using the ReadAloud function in a Reading System, or when a screen reader is being used, text citations in the text can be disruptive to reading comprehension. The same disruption occurs if a footnote is read where it occurs. The concepts of skippability and escapability have been discussed for SMIL and media overlays, but they have not yet been addressed for ReadAloud or for screen readers.
This feature request originated in the EPUB Reading Systems accessibility testing, but it is not accessibility specific. We are requesting that the Publishing Community Group take up this issue. It relates to best practices for markup and having the feature in Reading Systems and with screen readers.