Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

review of issues raised in "Reading Between the Lines: An Extensive Evaluation of the Security and Privacy Implications of EPUB Reading Systems" #2266

Closed
npdoty opened this issue May 10, 2022 · 11 comments
Labels
EPUB33 Issues addressed in the EPUB 3.3 revision security-tracker Group bringing to attention of security, or tracked by the security Group but not needing response. Spec-EPUB3 The issue affects the core EPUB 3.3 Recommendation

Comments

@npdoty
Copy link

npdoty commented May 10, 2022

We should review this academic paper on the security and privacy implications and evaluate each of the threats and suggestions there for mitigations that might require some spec work. My impression is that the authors recommend stronger normative language in the specification to make protections more consistent.

(Apologies if this has been done already and I just missed it! I saw this raised in #1876.)

@mattgarrish
Copy link
Member

Just taking a look at the conclusions in this document and the primary issue appears to be access to the local file system. They mention:

  • not asking for permission to access the file system on mobile devices, with Apple already sandboxing apps
  • making the recommendations in the specification more binding (which is what Privacy and security changes #2297 will do)
  • blocking scripts from accessing files on the local file system and making it illegal to reference files on the file system
  • obtaining consent of users to access remote resources
  • enumerating the APIs that reading systems can support

Of these, only the latter point strikes me as problematic to do. We've discussed in the past if there should be a support profile for scripting. When we selectively allow APIs, though, we're back to having to forever update the specification to allow new ones. And can we really block APIs when reading systems may run directly in browsers?

Disallowing and blocking access to the local file system seems reasonable to implement. I know there are theoretical use cases where someone might need local resources, but has anyone seen an epub in reality that needs this functionality? We made it a "should not" to use file scheme in the package document earlier this revision, but perhaps this should be bumped to a "must not" and applied anywhere in an EPUB? (As part of also clamping down on RS access.)

Assigning unique origins to publications only solves relative references from leaking outside the container (and symlinks, I'd expect). It doesn't stop scripting APIs from accessing local files, and remote resources are still allowed to reference file URLs with only a warning.

@iherman iherman added the Agenda+ Issues that should be discussed during the next working group call. label May 23, 2022
@iherman
Copy link
Member

iherman commented May 23, 2022

Just for the records, saving some extra clips: the PDF file for the study can be downloaded at https://www.computer.org/csdl/proceedings-article/sp/2021/893400a247/1mbmHAQitna. However, the paper is behind the firewall ☹️.

@npdoty I do not want to come up with a copyright breaking scheme here. I see that @mattgarrish has had access to the study, and has given a summary in #2266 (comment). Does that summary satisfy you? Or do you have additional bullet points to add?

@iherman
Copy link
Member

iherman commented May 23, 2022

My impression is that the authors recommend stronger normative language in the specification to make protections more consistent.

this has been (partially?) addressed in #2297

@iherman
Copy link
Member

iherman commented May 23, 2022

Of these, only the latter point strikes me as problematic to do. We've discussed in the past if there should be a support profile for scripting. When we selectively allow APIs, though, we're back to having to forever update the specification to allow new ones.

Indeed, that sounds fairly problematic. Earlier incarnations of this group did this profiling exercise in the context of, e.g., CSS, and it has proven to be a problem with such a moving landscape as the Web Platform. The EPUB specification became outdated, and a profile became a hurdle and an obstacle. The current specification relies, instead, on the CSS snapshot published by the CSS WG, making the EPUB specification closer to a living document (while keeping the dynamic nature within acceptable bounds for the industry).

I wonder whether there would be an interest, in the PING WG, to define and, primarily, maintain a profile of some sort for privacy preserving API-s that a specification like EPUB could refer to. This may be a better approach; a RS may declare, for example, that it accepts only those API-s. But this is only a very vague idea...

And can we really block APIs when reading systems may run directly in browsers?

... or, maybe even more to the point, when relying on an external WebView?

@OriIdan
Copy link

OriIdan commented May 23, 2022 via email

@iherman
Copy link
Member

iherman commented May 27, 2022

The issue was discussed in a meeting on 2022-05-26

  • no resolutions were taken
View the transcript

1.4. Review of issues in “Reading Between the Lines".

See github issue epub-specs#2266.

Dave Cramer: See (academic) paper on EPUB Security.

Dave Cramer: this is a paywalled academic paper, link is to the author's own website.
… they tested close to 100 epub RS, with some automated tests of security issues (e.g. scripting). Concluded with a few recommendations on how to change the spec.
… I've looked at it briefly.
… we were talking earlier about remote resources, but it seems that we allow use of local resources, which may be a significant security hole.
… it also mentioned that half of RS that supported javascript supported geolocation, etc..
… they say it wasn't mentioned in spec, but it should be covered by our recommendation for soliciting user consent.
… I think several of us need to review issues raised in it.

Wendy Reid: mgarrish gave good summary of points in issue.
… authors make some suggestions around the support of javascript and that the spec should specify no access to local file system.
… but we've always strayed from explicitly telling RS how to build an RS.
… their points are valid, and the RSs featured should take those issues up.
… rather than changes that should be made in the spec itself.

Dan Lazin: should we invite them to give us a presentation and some recommendations?.

Dave Cramer: I like the idea of having a dialog with them. They seem interested and knowledgeable.

Wendy Reid: I also like the idea.

Dave Cramer: good topic for next chairs call.

Brady Duga: +1.
… also re access to file system, does that mean you can't access the epub? Difficult not to be overly broad in prohibitions.

Matt Garrish: this is why I didn't try to write a PR to answer this one.
… their main point of contention is file: URLs right?.
… we can block those in authoring without breaking epub.
… trickier on RS side.
… we talk in theory about how file: URLs could be used, but does anyone even do this in reality?.

Dave Cramer: <xi:include href=file:://usr/passwd>.

Matt Garrish: if there was a way to block that I personally wouldn't take issue, but the language would be tricky from RS perspective.

Wendy Reid: I've found the authors of the paper online.

Dave Cramer: that's a great way forward on this issue.
… we could probably learn from each other.

@iherman
Copy link
Member

iherman commented May 31, 2022

We made it a "should not" to use file scheme in the package document earlier this revision,

Just for the records, this is in § 5.3.2 and it only refers to the href attribute.

but perhaps this should be bumped to a "must not" and applied anywhere in an EPUB? (As part of also clamping down on RS access.)

I would be perfectly fine with a MUST NOT use file url schemes so that epubcheck could simply disallow it. This may be a problem with backward compatibility, but security is really the case that might be considered as a strong enough argument (and I do not believe any publisher produces EPUB files with file URLs).

For a greater emphasis, we may give a better visibility to this restriction to put this into its own subsection, alongside the data URL subsection.

@iherman
Copy link
Member

iherman commented Jun 3, 2022

This issue was discussed in a meeting.

  • No actions or resolutions
View the transcript ### 1. Gertjan Franken presents “Reading Between the Lines: An Extensive Evaluation of the Security and Privacy Implications of EPUB Reading Systems”.
{: #section1}
> *Ivan Herman:* See [The published paper](https://lirias.kuleuven.be/retrieve/616428).
**Dave Cramer:** this is a special meeting of the epub3 wg, we have a guest.
… in the last meeting we talked about the paper written by our guest about security and privacy implications for epub 3 rs, and how that relates to reivew we've had from TAG and security and privacy wg.
… would you like to introduce yourself?.
**Gertjan Franken:** I am a security researcher, we researched security properties and vulnerabilities of epub rs.
… I'm a big reader, and when I discovered that web tech is underlying, that is how this interest started.
… happy to advise on this.
**Dave Cramer:** i know from reading your paper that epub 3.2 was the standard, this group is working on the next revision of the standard.
… this is the first time that epub is going through the full format w3c rec track.
… the earlier revision was subject to a much less involved process.
… so the big change is that we have to have tests for conformance.
… the other thing is the concept of horizontal review by other w3c groups, e.g. for a11y, for i18n, and of particular interest, security and privacy wg.
… also TAG, which also has an interest in security and privacy issues.
… based on that we've made some major changes, especially to that security and privacy section.
… we've written a threat model, our recommendations on security and privacy are now closer to normative.
**Ivan Herman:** we've also tried to be more precise, we have explicit reference to the origin model -- e.g. each epub must have its own origin.
… we also clarified what the root URL of an epub instance is, something which influences the way relative URLs are interpreted.
… any relative URL that is used in a content document can be checked by epubchecker to see whether it would go "out" of the content.
… so where you researched the possibility of epub going into the fs, this is what we are trying to address.
> *Dave Cramer:* See [definition about the root URL](https://w3c.github.io/epub-specs/epub33/core/#sec-container-iri).
**Ivan Herman:** of course, there are many things that a epubchecker cannot catch.
… we also started a discussion about `file:` URLs, which currently should not be used.
… we are discussion whether we should strengthen this to must not be used.
… but per our charter, we are under a strong requirement to be backwards compatibility.
… i.e. any valid epub 3.2 must be a valid 3.3.
… this would be an exception to that, but for many of us security is a stronger priority than backwards compatibility.
… that is why we need to discuss this now.
… as far as we know, there are no real epubs in the wild that use file: URL.
**Gertjan Franken:** can files be local to the user fs?.
**Ivan Herman:** it is local in the sense that files can be in the package, but we try to stop epubs from going outside the package.
**Dave Cramer:** you can use seven layers of ../ to try to climb out of the epub into the fs, but our new definition prevents that from happening.
**Gertjan Franken:** what about symbolic links? Does it address that?.
**Ivan Herman:** do you mean putting a symbolic link in the package that points outside the package?.
… could that work provided that `file:` URLs are disallowed?.
**Gertjan Franken:** you might circumvent it by making a symbolic file that you put into the zip archive, where the symbolic file refers to `../` for example.
… but we didn't find a lot of rs where this could be abused.
**Ivan Herman:** from our point of view, I agree that we should document that as part of the possible threats.
… i don't think specification wise, in the definition of the epub content, we could do that. That's a concept that is not part of epub..
… but I understand, and yes, we should document that point.
**Wendy Reid:** should we start the presentation part of the meeting?.
**Ivan Herman:** can you link us to your slides too?.
**Gertjan Franken:** okay, no problem, I will put it online afterwards.
**Ivan Herman:** great, you can just send me the link and I will include with the minutes.
**Gertjan Franken:** this is the presentation that I gave at a security conference, but the expertise in epub there was lower, so I will skip those parts.
… i also added some of our concerns at the end of the presentation, to highlight what was most important in our findings.
… the epub spec describes how epub should be formatted, but also how to render this format.
… it's a zip archive, with web technologies inside.
… since developers of rs don't want to reinvent the wheel, they rely on existing browser engine.
… how could this be exploited? One of the most interesting areas to us was Remove Resources.
… so epub can refer to local files, but also files online.
… it is recommended that rs notifies the user when trying to do so, but also to limit activity to read only.
… what were our research questions?.
… 1. what is the state of freely available epub RS? (capabilities, security considerations).
… 2. are these capabilities being abused in the wild? (malicious epubs, platforms tracking users?).
… we evaluated 97 epub rs over a variety of OS, including physical devices.
… we used a semi-automated black box evaluation method.
… test epubs are loaded, and results are shown on the screen and logged.
… this is available on github.
… for example, re. js support, are inline scripts executed? Are external scripts executed?.
… we limited our scripts to ES5.
… for remote communication, we refer to repo of HTTP that are known to execute requests. If communication was received, then we say that remote communication was allowed. We also checked whether user was notified.
… for local fs access, first we just tried render the file.
… if that didn't work, then we tried timing attack, to use the time it would take an event to fire to detect whether a file exists on the fs or not.
… File System in Userspace used to check whether file is available.
… for URI schemes, e.g. `mailto:` links. But in some apps URI schemes execute actions when clicked, e.g. `tel:` links and Skype for business on MacOS. Clicking these links could also be automated via js.
… re. Web engine evaluation, we developed an engine fingerprinting script, and then compared the fingerprints of known engines and compared to fingerprints of unknown engines embedded in rs.
… Results. 48% of rs tested support js. Re. remote comm. only 1 rs required user consent..
… In 16 cases we were able to infer existence of local files, and in 8 cases we could even read local files.
… this was a concern mainly on desktop and smartphone.
… 24 rs supported URI handles, but only 3 were relying on an insecure web engine. This latter issue was only a concern for desktop.
… this is what hackers do. They analyze vulnerabilities and write exploit. As a case study, we looked specifically at Apple Books, EPUBReader in browser, and Amazon Kindle.
… Apple Books we could read user info, EPUBReader extension allowed CSP circumvention leading to universal XSS, Amazon Kindle had an input validation issue due to old version of webkit which resulted in leaking of user's library.
… so are these being abused in the wild?.
… to analyze if any malicious epubs are being shared, we looked at about 9000 epubs shared on Pirate Bay.
… we did not find any malicious epubs, fewer than 1% contained js (which was benign).
… we also checked legal channels and found same results.
… but what is the feasibility of distributing a malicious ebook?.
… you would upload it to a file sharing platform with no security.
… you could also try distributing via the self-publishing platform.
… are self-published epubs sufficiently sanitized?.
… so we tested self-publishing standards of 6 vendors, and unfortunately we found it inadequate for some.
… so what are our main takeaways?.
… almost none of the js-supporting rs adhere to security recommendations. Most did not isolate local fs, allowed reading local file contents..
> *Dave Cramer:* See [Collection of the test EPUB publications](https://github.com/DistriNet/evil-epubs).
**Gertjan Franken:** we contacted devs and urged them to fix the problems.
… we found no abuse in the wild yet, but we found it very possible, even via legal channels.
… our evaluation method is open-source, to try to help devs and make users confident in their security.
… our concerns: js execution. Not a lot of use in real epubs, but it greatly increases attack surface. We would argue that maybe you should prohibit js execution..
… of course this would limit use-cases greatly.
… so what about compromise? Have js execution disabled by default, subject to user consent?.
… Remote resources. Also increases attack surface, opens risk of reading user files. We are in favor of limiting ability to do this..
… Online remote resources. Also a huge security impact. But we're not sure of all the use-cases for online remote resources..
… e.g. spec says it could be used if some resources are too big to put inside the container.
… URI schemes. Can be exploited to perform malicious actions. We would disallow use of certain schemes, not sure there is a legit use-case for this..
… the possibility of malicious action probably outweighs any usability impact.
… Embedded web engine configuration. ebooks should not be able to geolocate, or access webcam, but all of these are inherited as part of relying on web engines..
… some of the rs developers overwrite security defaults with flags (e.g. --allow-file-access-from-files). This is not recommended by devs of web engines, it's for testing only..
… this could also be mentioned explicitly in spec.
… we did not find many rs that employed outdated web engine, but you could flag for rs devs that updating embedded web engines is very important.
… hard/strict requirements instead of recommendations. At the time of our evaluation, only a few rs adhered to the recommendations.
… creating awareness among users and developers. What is even more useful than a compliance checker might be having practical developer guidelines about these security issues.
… we didn't have full overview of what use-cases were intended, so we erred on the side of cautions.
… thank you. Are there any questions?.
**Dave Cramer:** thank you, that was extremely informative and helpful.
… obviously js in general has been a thorny issue from the start, and not just because of security issues.
… we've had issues about who gets to intercept clicks when both rs and epub want to use js.
… but ability to use js is especially important in educational publishing.
… it seems that a lot of your research was done using trade books, as opposed to the specialized rs used for high ed, which rely really heavily on rs.
… unfortunately we can't go back in time to when we first wrote epub 3.
**Gertjan Franken:** if I understand correctly these edu epubs have interactive examples, is that right?.
**Dave Cramer:** sometimes it may be tied to online learning, i.e. recording student progress on a server.
… we want to shut down local remote resources, but for online remote resources, we do have use-cases related to file size, and web fonts.
**Gertjan Franken:** what I saw with Apple Books on iOS, there if you load a chapter that needs a remote resources, it prompts users with a modal asking permission to load from a particular domain (and remembers the decision).
**Gertjan Franken:** of course it comes with a certain amount of time in re-designing the UI, but worth looking into.
**Brady Duga:** about js, the epubs are supposed to say whether they support js or not. So we could have recommendation that rs disable js when epub does not say it uses js. And when epub says it does use js, then we can prompt user..
… we may already say something like that in the spec.
**Ivan Herman:** right, the mechanism is there, because we have the "scripted" property.
… we should put duga's suggestion in the security section..
… When I read your paper it made me really surprised to realize that we do allow accessing local fs.
… in paper it seemed that it was intention that this was possible, but in reality it was more just because `file:` was part of the web.
… we simply did not explicitly disallow it.
… and that's why I think disallowing the file scheme should happen.
… so you have these epub files that you used for testing. We are developing a pretty large test suite for epub testing in general, I was wondering whether your test books can either be incorporated into ours, or referenced.
… to re-use those.
… which would force rs vendors who want to self-test to use those.
… how many is that?.
**Gertjan Franken:** at least one per area of experimentation, but might be more.
… agree that it would be useful to incorporate into your testing too (but I haven't look at yours yet).
**Ivan Herman:** it's really a collection of small test epub books, 1 epub for every feature that needs to be tested.
… i propose we have a separate call on how we can re-use your tests.
> *Dave Cramer:* See [Collection of the test EPUB publications](https://github.com/DistriNet/evil-epubs).
**Ivan Herman:** similarly, compared to 3.2, we have added a significant amount of text on security and privacy issues, so it would be immensely helpful if you could review those text in light of your experience.
… i would be really grateful.
**Gertjan Franken:** I have skimmed some of the added security text, I would be happy to spend some time to read it and see if we have any suggestions.
**Ivan Herman:** re. your suggestion that we use strict requirements instead of recommendations, under the w3c process when we have to test for conformance.
… currently we have a PR that changes those recommendations into SHOULD statements, which is at least a step in the direction you propose.
**Gertjan Franken:** so if you say MUST you have to enforce it in the testing tool?.
**Ivan Herman:** exactly.
**Zheng Xu (徐征):** in my platform we are trying to encourage creators to use js to be more interactive with end-use, like a normal website.
… from your perspective, what do you think should be different between an epub and a normal website?.
… to me, a rs is similar to a web browser.
**Gertjan Franken:** it depends on the use-case, of course.
… i'm fine with using epub as a complete website, but browsers are updated frequently, they are more secure.
… so if you want to use rs as browser, they should be treated the same way re. updates and security.
… it's possible to do it in a secure way.
… the rs that were not secure did not adhere to the same standards as web browsers.
**Charles LaPierre:** this was amazing, congrats and well done.
… on slide 23 you uploaded malicious content to a number of different platforms, and a couple of them failed, right?.
… i would have thought that all of them would use epubcheck as part of ingestion pipeline, so I'm wondering if we could add some of these warnings to epub check.
… it would prevent malicious epubs from getting out via legit channels.
**Gertjan Franken:** its difficult to enforce a requirement that online remote resources should be on benign domains, for example.
… but maybe something could be done re. local fs.
… we found some platforms are doing well. Play Books is good. Barns & Noble rs crashed when you tried to load malicious epub, so that was good..
**Dave Cramer:** evaluating arbitrary js for security implications is probably impossible, yeah.
**Tzviya Siegman:** what epubcheck does formally is check against the spec, so if we wanted to add this to epubcheck then we'd need to update the spec.
**Ivan Herman:** but if the file URL is disallowed then epubcheck will flag it.
**Tzviya Siegman:** yes, gotta go, thank you!.
**Dave Cramer:** this was a great presentation.
**Ivan Herman:** how do we move on from here?.
… maybe I will go through minutes and distill some points info github issues.
… and then, Gertjan, if you and your team can review those and let us know?.
**Gertjan Franken:** okay, we can schedule a next meeting to discuss the specifics.
… and thank you for inviting me here.
**Wendy Reid:** we'll keep in touch!.

@npdoty
Copy link
Author

npdoty commented Jun 3, 2022

Authors have provided a version of the paper in an open access institutional repository, so I believe the paper can be read here without any particular sign-in or payment:
https://limo.libis.be/primo-explore/fulldisplay?docid=LIRIAS3284657&context=L&vid=Lirias&search_scope=Lirias&tab=default_tab&lang=en_US

@dauwhe dauwhe removed the Agenda+ Issues that should be discussed during the next working group call. label Jun 8, 2022
@iherman
Copy link
Member

iherman commented Jun 24, 2022

For admin reason: it is a great paper, and we had a great discussion with @GJFR (thanks again). However, I believe that the specific problems have been raised as separate issues, some of those have also been backed up by PRs: #2324, #2323, #2322, #2321, #2320 and are all, by now, closed.

Is it o.k. to close this issue?

cc @npdoty

@iherman iherman added the Status-ProposeClosing The issue is no longer relevant, or has already been fixed, and should be closed. label Jun 24, 2022
@npdoty
Copy link
Author

npdoty commented Jul 1, 2022

sounds good, thank you for the clarification

@iherman iherman closed this as completed Jul 28, 2022
@mattgarrish mattgarrish added EPUB33 Issues addressed in the EPUB 3.3 revision and removed Status-ProposeClosing The issue is no longer relevant, or has already been fixed, and should be closed. labels Aug 4, 2022
@mattgarrish mattgarrish added the Spec-EPUB3 The issue affects the core EPUB 3.3 Recommendation label Sep 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
EPUB33 Issues addressed in the EPUB 3.3 revision security-tracker Group bringing to attention of security, or tracked by the security Group but not needing response. Spec-EPUB3 The issue affects the core EPUB 3.3 Recommendation
Projects
None yet
Development

No branches or pull requests

5 participants