Self-contained Packages #1876

Closed
wareid opened this issue Oct 26, 2021 · 8 comments · Fixed by #2342
Labels
Cat-Privacy: Grouping label for privacy related issues
EPUB33: Issues addressed in the EPUB 3.3 revision
privacy-tracker: Group bringing to attention of Privacy, or tracked by the Privacy Group but not needing response.
Spec-EPUB3: The issue affects the core EPUB 3.3 Recommendation

Comments

@wareid
Contributor

wareid commented Oct 26, 2021

From the PING review:

Self-contained packages have potential huge privacy advantages, but it's not clear that the EPUB spec or current implementations fulfill these opportunities. Is that a goal that the community could work towards?

The current spec anticipates and requires (at least a SHOULD) loading of remote resources from arbitrary origins. This introduces risks of additional data collection about who is reading what book (and from where), and what part of the book is being read at a particular moment (depending on the implementation or requirements on how remote resources are loaded). And remote resource loading should also make explicit that the author/publisher of the book may effectively be collecting data on the reader's habits, in addition to the reading system. Different levels of scripting access are defined, but it's not clear whether any such level would indicate that user reading behavior would not be disclosed.

It would be useful to specify a privacy threat model specific to EPUB, to the extent that it varies from the Web. Can we guarantee that reading habits will not be surveilled, by the publisher, the retailer, the reading system, or other parties? Or if that data is revealed, then we should clarify to whom or under what conditions. Book-like privacy could be achieved, but would require significant changes from the current spec and current popular implementations.

The spec suggests that the manifest is an exhaustive pre-stated list of resources, including remote resources, but it's not clear how that's intended to be handled by reading systems. Should a reading system refuse to fetch any remote resource not included in the manifest? Are remote resources intended to be the same for all readers of a book, or might they be personalized?
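
One way to read the manifest question is as an allowlist: a reading system would fetch a remote resource only if its URL is declared in the package document's manifest. The following is a minimal sketch of that idea, not something the spec mandates. The function names `remote_manifest_urls` and `may_fetch`, the sample package document, and the exact-match policy are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace of the EPUB package document (OPF).
OPF_NS = "{http://www.idpf.org/2007/opf}"


def remote_manifest_urls(opf_xml: str) -> set[str]:
    """Return the absolute http(s) URLs declared as manifest items."""
    root = ET.fromstring(opf_xml)
    urls = set()
    for item in root.iter(f"{OPF_NS}item"):
        href = item.get("href", "")
        if urlsplit(href).scheme in ("http", "https"):
            urls.add(href)
    return urls


def may_fetch(url: str, allowed: set[str]) -> bool:
    """Hypothetical policy: allow a remote fetch only on an exact manifest match."""
    return url in allowed


# Illustrative package document with one local and one remote item.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="c1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="a1" href="https://example.org/audio/intro.mp3" media-type="audio/mpeg"/>
  </manifest>
</package>"""

allowed = remote_manifest_urls(SAMPLE_OPF)
print(may_fetch("https://example.org/audio/intro.mp3", allowed))
print(may_fetch("https://tracker.example/pixel.gif", allowed))
```

An exact-match allowlist like this would not by itself prevent personalization: the server behind a listed URL can still vary its response per reader, which is the second half of the question above.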

@wareid added the privacy-tracker and Cat-Privacy labels Oct 26, 2021
@dauwhe
Contributor

dauwhe commented Oct 28, 2021

The spec suggests that the manifest is an exhaustive pre-stated list of resources, including remote resources, but it's not clear how that's intended to be handled by reading systems. Should a reading system refuse to fetch any remote resource not included in the manifest? Are remote resources intended to be the same for all readers of a book, or might they be personalized?

The spec says:

It SHOULD NOT use non-Publication Resources in the rendering of an EPUB Publication due to the inherent limitations and risks involved (e.g., lack of information about the resource and how to process it, security risks from remotely-hosted sources, lack of fallbacks, etc.).

Perhaps MUST NOT is appropriate here?

@jenstroeger

In this context, I think the paper Reading Between the Lines: An Extensive Evaluation of the Security and Privacy Implications of EPUB Reading Systems contains a good analysis of problems of current reading systems. It may also help to clarify some of the phrasing above.

@iherman
Member

iherman commented Nov 2, 2021

In this context, I think the paper Reading Between the Lines: An Extensive Evaluation of the Security and Privacy Implications of EPUB Reading Systems contains a good analysis of problems of current reading systems. It may also help to clarify some of the phrasing above.

@jenstroeger do you know if there is a version of the paper that is not behind a paywall?

@npdoty

npdoty commented Jan 28, 2022

A copy of the paper from the author's academic site: https://lirias.kuleuven.be/retrieve/616428

This paper is so useful! Thanks for pointing it out -- and I'm sorry I missed it when we had our first discussion of a privacy review. As always, we need to improve outreach to and coordination with academics.

@wareid closed this as completed Apr 8, 2022
@iherman
Member

iherman commented Apr 8, 2022

The issue was discussed in a meeting on 2022-04-08

List of resolutions:

1. Close Privacy & Security Issues.

Dave Cramer: the TAG has reappeared and made a couple of comments. I am making a PR to mention that when using the web APIs with the most dramatic privacy and security implications (geolocation, push notifications), you should get user consent.

See github issue epub-specs#1959.

Dave Cramer: we have several issues where there was never much discussion in the issue (#1959 for example).
… I think the PR I mentioned earlier would serve to close this issue.
… agree/disagree?

Ivan Herman: we had a lot of discussion with PING, good discussions, after which we made extensive additions to answer the issues they raised.
… and we contacted them several times to get their acknowledgement. So at this point we consider these issues closed.
… they have the right to reopen issues if they like.
… Amy from TAG has closed the issue of epub review on the TAG repo, so that is an indication of how they feel.

Gregorio Pellegrino: so is this passed? it is okay?

See github issue epub-specs#1872.

Ivan Herman: yes, it is okay.

Dave Cramer: risk of exposure and fingerprintability.
… this was raised before we clarified the threat model, can we close this now?

See github issue epub-specs#1873.

Dave Cramer: obfuscation, which we've discussed extensively, followed by updates to the spec docs.

See github issue epub-specs#1875.

See github issue epub-specs#1876.

Dave Cramer: interactivity, which we've addressed as best we can given that it's ambiguous.
… self-contained packages, this is a case where it's appropriate to close because EPUB is clear that it is largely self-contained, subject to exceptions enumerated in the spec. Not dramatically impacting privacy.

See github issue epub-specs#1957.

Dave Cramer: we enumerated the threat model, which deals with #1957.

See github issue epub-specs#1958.

Dave Cramer: permission prompts, we're dealing with this, strengthened text.

See github issue epub-specs#1959.

Proposed resolution: Close remaining privacy and security issues. (Wendy Reid)

Dave Cramer: broad user expectations issues, which is covered by the other changes we've made.

Ivan Herman: +1.

Matthew Chan: +1.

Shinya Takami (高見真也): +1.

Bill Kasdorf: +1.

Dave Cramer: +7.

Wendy Reid: +1.

Matt Garrish: +1.

Murata Makoto: +1.

Dan Lazin: +1.

Charles LaPierre: +1.

Ben Schroeter: +1.

Masakazu Kitahara: +1.

Resolution #1: Close remaining privacy and security issues.

Ivan Herman: clap, clap.

Dave Cramer: I think the spec is now much more informative/clear about some of these issues, so thanks everyone.

GeorgeK: +1.

@npdoty

npdoty commented May 10, 2022

non-normative text notes the privacy and security advantages, but doesn't expect that they'll be met.

Could the spec normatively define the properties necessary for an epub to be a self-contained book, so that users, reading systems, archivists, etc. could know/test that it's self-contained and would have those privacy properties?
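
As a rough illustration of what such a test might look like, the sketch below inspects only the package document's manifest: it flags items whose `href` is an absolute URL and items that carry the `remote-resources` property. This is an assumption-laden approximation, not a normative definition of "self-contained". The function names `find_remote_references` and `is_self_contained` are hypothetical, and the check trusts the manifest declarations rather than scanning the content documents themselves.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

CONTAINER_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"
OPF_NS = "{http://www.idpf.org/2007/opf}"


def find_remote_references(epub_file) -> list[str]:
    """List manifest entries that point to, or declare use of, remote resources."""
    with zipfile.ZipFile(epub_file) as zf:
        # container.xml names the package document via rootfile/@full-path.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find(f".//{CONTAINER_NS}rootfile")
        opf = ET.fromstring(zf.read(rootfile.get("full-path")))

    findings = []
    for item in opf.iter(f"{OPF_NS}item"):
        href = item.get("href", "")
        props = item.get("properties", "").split()
        if urlsplit(href).scheme:
            # An href with a URL scheme is not a path inside the container.
            findings.append(f"remote item: {href}")
        if "remote-resources" in props:
            # The item itself is local but declares that it references remote resources.
            findings.append(f"references remote resources: {href}")
    return findings


def is_self_contained(epub_file) -> bool:
    """Hypothetical check: true when the manifest declares nothing remote."""
    return not find_remote_references(epub_file)
```

A stricter checker would also parse each content document and stylesheet for undeclared remote references, since a manifest-only test cannot detect a publication that violates its own declarations.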

@mattgarrish
Member

Could the spec normatively define the properties necessary for an epub to be a self-contained book

This sounds a lot like we'd be walking the path of making an archival format for EPUB.

There is an ISO specification that purports to do this, though it wasn't done through IDPF/W3C channels. I don't have access to that document, but there's a recap of it here: https://www.loc.gov/preservation/digital/formats/fdd/fdd000519.shtml

Would it work to point to that standard as an example for those who are interested? Normatively recommending that all EPUBs be self-contained probably isn't a realistic goal for the core authoring specification.

@iherman
Member

iherman commented Jun 23, 2022

Could the spec normatively define the properties necessary for an epub to be a self-contained book

This sounds a lot like we'd be walking the path of making an archival format for EPUB.

There is an ISO specification that purports to do this, though it wasn't done through IDPF/W3C channels. I don't have access to that document, but there's a recap of it here: https://www.loc.gov/preservation/digital/formats/fdd/fdd000519.shtml

Would it work to point to that standard as an example for those who are interested? Normatively recommending that all EPUBs be self-contained probably isn't a realistic goal for the core authoring specification.

+1 to that, noting that the ISO standard refers to EPUB 3.0.1 (not sure whether the differences in 3.2 would influence the ISO spec, though).

At some point, once EPUB 3.3 is published, the question of fast-tracking EPUB 3.3 through ISO will come up (our W3C/ISO agreement will make that easy). At that point, we should encourage ISO to update the EPUB Preservation spec.

@mattgarrish reopened this Jun 23, 2022
@mattgarrish added the Spec-EPUB3 label Sep 14, 2022

6 participants