Define a data privacy and usage policy #388

Open
5 tasks
choldgraf opened this issue Oct 23, 2020 · 7 comments

Comments

@choldgraf
Member

choldgraf commented Oct 23, 2020

Context

Many communities want a guarantee that we will not abuse our control over their data. In some cases, this may be a legal requirement (for example, when working with communities that must comply with GDPR).

We should define a policy that gives communities confidence that we will not use their data in any way that they do not wish.

Reference policies

Here are a few policies that we could use for inspiration:

Proposed language

2i2c Pilot Hubs user data policy

User data generated by using a 2i2c Hub is controlled by the users, not 2i2c. 2i2c does not retain any ownership or privileges for user data on the hubs that it deploys as a part of this pilot. The infrastructure that 2i2c deploys (e.g., JupyterHub and Kubernetes) does log some information about user behavior, such as sign-on timestamps and aggregated usage over time. 2i2c may use this information in diagnostics to improve hub deployments, or as aggregated statistics to demonstrate usage and interest for purposes such as grant applications. However, 2i2c will not share this data or any derivatives of this data (beyond aggregate statistics or visualizations) with any third parties.

Tasks and updates

  • Figure out the major points that our data policy should make
  • Create a draft data policy
  • Run it by CS&S for their approval
  • Finalize and approve it (via steering council vote)
  • Post this in our documentation somewhere that we can refer back to it
@colliand
Contributor

This is an excellent issue and I am excited to develop this further. Here are a few quick reactions at a high level:

  1. There are standard processes used by universities to evaluate technology. One of these processes is called a privacy impact assessment (PIA). 2i2c should identify a good example of a PIA form, perhaps from Bill Allison, and fill it out. The PIA matrix provides a series of prompts that 2i2c would be forced to consider. As a leading open science organization, 2i2c could perhaps disclose the PIA publicly.
  2. There is a tension between privacy, data ownership and transparency. This tension is modulated differently in Canada (and within Canada) and the USA. For example, the "2i2c way" involves a publicly visible hubs.yaml file that transparently reveals details about some users and administrators of various hubs. I don't believe UBC would allow this type of personally identifiable information (PII) to be shared.

The PIA process will likely allow 2i2c to define an ontology for the various data in scope. That ontology will include things like intellectual property created by the user, raw data from public or private sensor sources, personally identifiable information, and riskier data sets like medical or financial records. My view is that 2i2c should take a leading and opinionated approach here aligned with "open science" best practices.

@choldgraf
Member Author

This is super helpful, thanks for this extra information (I know very little about organizational considerations for data privacy).

I think it will take a while to go through the full exercise that you describe, and in the meantime there are organizations asking us what our policy is right now. Should we just say "we have no policy"? Or perhaps we can agree upon informal language that at least conveys our values and approach, even if it is not a rigorous policy?

@colliand
Contributor

We should ask those organizations for a PIA and to collaborate with us. We want to know from them what they want our data policy to be. For Syzygy, we mention that it is non-profit, hosted on Compute Canada, and retains minimal PII, and we get approved right away. The transparency on some PII as part of the 2i2c plan will likely need to be addressed with "open science" values.

@choldgraf
Member Author

@colliand that's a good idea - @ericvd-ucb do you think one or more of the community colleges would be willing to brainstorm with us what their ideal user privacy agreement would be?

@choldgraf
Member Author

Update: one-off policy being used

@sgibson91 needs a data policy to cover the data collected for an SSI fellowship project she's working on, so I've gotten approval from CS&S to have a one-off use of the policy defined here (adapted from SSI). We should then define a more long-term policy for 2i2c that we can use with the hubs as well.

@choldgraf
Member Author

We now have a privacy policy defined here:

https://docs.2i2c.org/user/topics/policy/privacy/

Can this be closed? @jnywong would this work for your needs right now?

It also feels like this page is not discoverable if you weren't able to find it, so do you have thoughts on a better place to link it?

@jnywong
Member

jnywong commented Feb 29, 2024

Thanks, Chris! I did manage to find this page before but I don't think it quite works for my needs for now, since it refers mainly to data that is held on hub infrastructure rather than the type of data I will be collecting from the training feedback surveys.

I could expand https://docs.2i2c.org/user/topics/policy/privacy/ to incorporate what I need, since I would prefer to link upstream to a single source of truth (SSOT) in the Hub Service Guide.

Status: Needs Shaping / Refinement
3 participants