SLT-17: Periodical drush SQL-Dump using gdpr-dump. #40

Merged: Jancis merged 18 commits into master from feature/SLT-17-gdpr-dump on Nov 1, 2018

Member

@Jancis Jancis commented Oct 23, 2018

What this PR does:

I've set up a cron rule that runs every day at 3:00 AM and invokes a drush sql-dump command provided by GDPR Tools. The SQL dump only runs when the current branch name equals the predefined production_branchname in values.yaml.
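The branch gate described above could be sketched roughly as follows. This is a minimal illustration, not the chart's actual implementation: the function name `is_production_branch` and its two arguments are assumptions made for this example. Only the 3:00 AM schedule and the `production_branchname` value come from the description.

```shell
#!/bin/sh
# Sketch only: the names below are hypothetical, not taken from the chart.
# Assumed cron schedule for "every day at 3:00 AM": 0 3 * * *

# Succeeds (exit status 0) only when the deployed branch matches the
# configured production branch; an empty production branch never matches.
is_production_branch() {
  current_branch="$1"      # assumed: branch name of this deployment
  production_branch="$2"   # assumed: value of production_branchname
  [ -n "$production_branch" ] && [ "$current_branch" = "$production_branch" ]
}
```

A cron script could then run the dump only when this function succeeds, and do nothing on every other branch.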

We currently use a slightly modified version of gdpr-dump because of the following issues/improvements:

This allows site developers to create a /gdpr.json file with Faker formatters, which replace data as it is dumped from the database by mysqldump or the drush sql-dump command.

{
  "users_field_data": {
    "name": {"formatter": "name"},
    "pass": {"formatter": "password"},
    "mail": {"formatter": "email"},
    "init": {"formatter": "clear"}
  }
} 

Available formatters:

name - generates a name
phoneNumber - generates a phone number
username - generates a random user name
password - generates a random password
email - generates a random email address
date - generates a date
longText - generates a sentence
number - generates a number
randomText - generates a sentence
text - generates a paragraph
uri - generates a URI
clear - generates an empty string

Developers can also add extra elements and attributes, like _cookies, _description or _purpose, which could bring this file closer to automatic PD documentation generation. These just have to be marked or prefixed so that they do not interfere with GDPR dump when it looks for table data replacements.

Testing:

It works on my machine, I swear! But you can branch it, change the cron interval to something like */5 * * * * and set production_branchname to the name of your branch (e.g. feature/SLT-17-gdpr-dump). Deploy, and you should see an updated SQL dump in /var/backups/db/ every five minutes. You can search the dump file for the string users_ and see that all user names, emails and similar fields have been changed. It also changes the name of the anonymous user, which is not ideal, but we can't do much about it at this time.
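One way to make the manual check above repeatable is to confirm that a value known to exist in the real database no longer appears in the dump. The helper below is a minimal sketch under that assumption. The name `dump_lacks_value` is hypothetical, and the value to look for has to be chosen by whoever runs the check.

```shell
#!/bin/sh
# Sketch only: dump_lacks_value is a hypothetical helper name.

# Succeeds when the dump file ($1) is readable and does not contain the
# literal string ($2), for example a real e-mail address from production.
dump_lacks_value() {
  [ -r "$1" ] || return 2          # unreadable file: report an error, not success
  ! grep -qF -- "$2" "$1"
}
```

Running it against the latest file in /var/backups/db/ with a known real user name or email gives a quick yes/no answer instead of reading the dump by eye.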

Important note

While working on this, I ran into an issue where the dump couldn't overwrite an existing dump file.

sh: can't create /var/backups/db/feature-SLT-17-gdpr-dump-latest.sql: Invalid argument

This also happened in sites/default/files/, but I found a workaround: remove the file first and then make the database dump. This seems to work perfectly, but there could still be a bigger issue with persistent storage, so I created a bug report for it.
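The remove-then-dump workaround could look roughly like the sketch below. It is an illustration only: `replace_dump` is a hypothetical helper name, and the dump command is passed in as arguments rather than hard-coded, since the exact command used by the cron job is not shown here.

```shell
#!/bin/sh
# Sketch only: replace_dump is a hypothetical helper name.

# Removes any existing file at the target path, then runs the given command
# and writes its standard output to that path.
replace_dump() {
  target="$1"
  shift
  rm -f "$target" || return 1
  "$@" > "$target"
}

# Example call (commented out, because it needs a working Drupal site):
# replace_dump /var/backups/db/feature-SLT-17-gdpr-dump-latest.sql drush sql-dump
```

Removing the file first means the shell creates a fresh file instead of truncating the existing one, which is the step that failed with "Invalid argument".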

@Jancis Jancis force-pushed the feature/SLT-17-gdpr-dump branch 4 times, most recently from b8733b0 to 40b28f4 on October 23, 2018 11:21
@Jancis Jancis requested a review from floretan October 23, 2018 11:26
…rd drush sql-dump when the gdpr.json file can't be located; Documentation;
chart/values.yaml (outdated, resolved)
- run: composer install -n --prefer-dist --ignore-platform-reqs --no-dev
- run: |
    composer install -n --prefer-dist --ignore-platform-reqs --no-dev
    composer config repositories.gdpr-dump-mods git https://github.com/Jancis/gdpr-dump
Contributor

Would it work to add this to composer.json? This way we control when we get updates (otherwise things might break whenever a new version is released).

Member Author

We have a cron job by default that assumes gdpr-dump is installed, so we have to be sure developers include it in their composer.json file. I'm installing it manually just because we can't be sure, but then again, the build is slower because of that.

I can just assume that it's in there or add an additional check in cron execution.
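The additional check mentioned above could be as small as the sketch below. It assumes the tool is exposed as a command on PATH, which may not hold for a Composer package installed under a project's vendor directory; in that case a file test on the installed path would be needed instead. `tool_available` is a hypothetical helper name, and the command name is left as a parameter.

```shell
#!/bin/sh
# Sketch only: tool_available is a hypothetical helper name, and the command
# name to check is left as a parameter because it depends on how gdpr-dump
# is installed in a given project.

# Succeeds when the named command can be found on PATH.
tool_available() {
  command -v "$1" >/dev/null 2>&1
}
```

With a check like this, the cron script could skip the dump or log a message when the tool is missing, and the build would no longer need the extra install step.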

Dockerfile (outdated, resolved)
README.md (outdated, resolved)
chart/values.yaml (outdated, resolved)
@floretan
Contributor

We should provision the volume as well, but only if the current branch is the reference branch.

@Jancis Jancis merged commit c20fbc9 into master Nov 1, 2018
@floretan floretan deleted the feature/SLT-17-gdpr-dump branch November 1, 2018 15:03