
db:dump should dump data #196

Open
fdr opened this issue Aug 4, 2015 · 3 comments

Comments

@fdr
Contributor

fdr commented Aug 4, 2015

https://github.com/interagent/pliny/blob/master/lib/pliny/tasks/db.rake#L89

When one has some seed data added via migrations over time, it interacts poorly with the schema-only dump done here. That's because one is obliged to add those rows to seeds.rb or schema.sql by hand and then write idempotent migrations forever after, or risk losing them the next time the schema is compacted. (One is free to speculate how many errors will creep in from having to update seeds.rb and complicate migrations for every seed row; I think it's unnecessarily many.)

It would be better if a fresh database were prepared, migrated, and then dumped with data. This would also remove the special code here that keeps track of applied migrations separately.
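To make the proposal concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Postgres. `Connection.iterdump()` plays the role of a pg_dump that includes data, and dropping its INSERT lines approximates a schema-only dump (for pg_dump, the `--schema-only` flag). The `plans` table and the migration list are hypothetical; this is not pliny's actual rake task.

```python
import sqlite3

# Hypothetical migrations: one creates a lookup table, later ones add
# and then modify seed rows in it.
MIGRATIONS = [
    "CREATE TABLE plans (name TEXT PRIMARY KEY, price INTEGER NOT NULL)",
    "INSERT INTO plans (name, price) VALUES ('free', 0), ('basic', 10)",
    "UPDATE plans SET price = 12 WHERE name = 'basic'",
]


def migrated_db():
    """Prepare a fresh database and apply every migration in order."""
    conn = sqlite3.connect(":memory:")
    for statement in MIGRATIONS:
        conn.execute(statement)
    conn.commit()
    return conn


def dump(conn, with_data=True):
    """Return SQL that recreates the database.

    iterdump() emits schema and data; dropping the INSERT lines
    approximates a schema-only dump.
    """
    lines = list(conn.iterdump())
    if not with_data:
        lines = [line for line in lines if not line.startswith("INSERT")]
    return "\n".join(lines)


def restore(sql):
    """Load a dump into a new, empty database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(sql)
    return conn


def plan_rows(conn):
    return conn.execute("SELECT name, price FROM plans ORDER BY name").fetchall()
```

Restoring `dump(migrated_db(), with_data=False)` yields an empty `plans` table, while the full dump restores both rows with the migrated price, without any migration history or seeds.rb bookkeeping.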

@pedro

pedro commented Aug 6, 2015

We inherited this model from Rails, and it has worked well in my experience: the basic idea is that migrations should only change structure, not data. I think this is a good approach because data migrations tend to be much slower and are good candidates to run out of band. Not to mention that you may want to migrate data using your models, so that any default attributes and validations apply.

Are seeds not a good fit for the kind of data migration you're running?

@fdr
Contributor Author

fdr commented Aug 6, 2015

Let me state the general problem, putting aside how schema.sql is generated:

The general problem is that one writes data migrations to, say, update a table that's mostly a lookup table with, say, twenty records in it (and then, over the years, you write migrations that modify, add, or remove those records systematically in some way).

When schema.sql is generated, the rows of that lookup table are lost, at the same time as you remove all the migration files.

No big deal, so you put the lookup stuff in seeds. But then the migration, which you need for production, cannot be applied properly in test unless you take pains to make it idempotent.
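The idempotency burden can be illustrated with a small sketch, again using Python's sqlite3 as a stand-in for Postgres. It assumes a hypothetical `plans` table whose seeds.rb equivalent has already been updated by hand to the post-migration rows. A plain INSERT succeeds against production but hits a primary-key conflict against the seeded test database; `INSERT ... ON CONFLICT (name) DO NOTHING` (upsert syntax accepted by both SQLite 3.24+ and PostgreSQL 9.5+) makes the same migration safe in both places.

```python
import sqlite3

SCHEMA = "CREATE TABLE plans (name TEXT PRIMARY KEY, price INTEGER NOT NULL)"

# Hypothetical state of production before the new migration runs.
PRODUCTION_ROWS = [("free", 0), ("basic", 10)]
# Hypothetical seeds, already updated by hand to the post-migration rows.
SEED_ROWS = [("free", 0), ("basic", 12), ("pro", 50)]

NAIVE_MIGRATION = [
    "INSERT INTO plans (name, price) VALUES ('pro', 50)",
    "UPDATE plans SET price = 12 WHERE name = 'basic'",
]

# Same change, written so that applying it on top of the seeds is a no-op.
IDEMPOTENT_MIGRATION = [
    "INSERT INTO plans (name, price) VALUES ('pro', 50) "
    "ON CONFLICT (name) DO NOTHING",
    "UPDATE plans SET price = 12 WHERE name = 'basic'",  # already idempotent
]


def apply(rows, migration):
    """Load `rows` into a fresh database, run `migration`, return the result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO plans (name, price) VALUES (?, ?)", rows)
    for statement in migration:
        conn.execute(statement)
    return conn.execute("SELECT name, price FROM plans ORDER BY name").fetchall()
```

`apply(SEED_ROWS, NAIVE_MIGRATION)` raises `sqlite3.IntegrityError`, whereas `IDEMPOTENT_MIGRATION` converges on the same three rows from either starting point. The cost is that every later migration touching this table has to be written just as defensively.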

The solution I chose in one instance was to make seeds.rb a subset of the records I wanted in production, and then let the migration take care of modifying the result of seeds.rb. The objection is that the next time schema.sql is generated and the migrations truncated, that careful arrangement will be lost ("two sources of truth").

What I see as a non-solution: whenever one writes a migration that modifies a hand-maintained table, one also has to modify seeds.rb, and absorb extra complexity in the migrations forever after, if one does not want to leave an obscure bug behind for the next compaction via schema.sql.

@fdr
Contributor Author

fdr commented Aug 6, 2015

Another reasonable answer is: don't brainlessly dump the schema and delete migration files assuming it will work, particularly if `diff $pg_dump_before $pg_dump_after` shows that the two dumps do not match.
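A sketch of that check, with Python's sqlite3 and `difflib` standing in for pg_dump and diff: build one database by replaying every migration, build another from a schema-only dump plus seeds (roughly what you get after compaction), and diff the two full dumps. Any output means the compaction lost something. The table, migrations, and seeds are hypothetical.

```python
import difflib
import sqlite3

# Hypothetical migration history for a lookup table.
MIGRATIONS = [
    "CREATE TABLE plans (name TEXT PRIMARY KEY, price INTEGER NOT NULL)",
    "INSERT INTO plans (name, price) VALUES ('free', 0), ('basic', 10)",
    "UPDATE plans SET price = 12 WHERE name = 'basic'",
]


def full_dump(conn):
    # Sorted so the comparison does not depend on row order.
    return sorted(conn.iterdump())


def before_compaction():
    """The database as built by replaying every migration."""
    conn = sqlite3.connect(":memory:")
    for statement in MIGRATIONS:
        conn.execute(statement)
    conn.commit()
    return conn


def after_compaction(source, seeds_sql):
    """A database built from a schema-only dump of `source` plus seeds."""
    schema_sql = "\n".join(
        line for line in source.iterdump() if not line.startswith("INSERT")
    )
    conn = sqlite3.connect(":memory:")
    conn.executescript(schema_sql)
    conn.execute(seeds_sql)
    conn.commit()
    return conn


def compaction_diff(seeds_sql):
    """Unified diff of the full dumps before and after compaction."""
    source = before_compaction()
    after = after_compaction(source, seeds_sql)
    return list(difflib.unified_diff(
        full_dump(source), full_dump(after), "before", "after", lineterm=""
    ))
```

With seeds that only contain the `free` row, the diff reports the migrated `basic` row as removed; with seeds that match the migrated state, it is empty and deleting the migration files is safe as far as this table is concerned.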
