Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Big objects #248

Closed
brodock opened this issue Aug 5, 2013 · 6 comments
Closed

Big objects #248

brodock opened this issue Aug 5, 2013 · 6 comments

Comments

@brodock
Copy link

brodock commented Aug 5, 2013

currently migration is like the following:

class CreateVersions < ActiveRecord::Migration
  def self.up
    create_table :versions do |t|
      t.string   :item_type, :null => false
      t.integer  :item_id,   :null => false
      t.string   :event,     :null => false
      t.string   :whodunnit
      t.text     :object
      t.datetime :created_at
    end
    add_index :versions, [:item_type, :item_id]
  end

  def self.down
    remove_index :versions, [:item_type, :item_id]
    drop_table :versions
  end
end

This approach is good enough for objects that serialized is smaller then 64kb (on mysql), as this is the maximum size "TEXT" can hold, according to http://www.electrictoolbox.com/maximum-length-mysql-text-field-types/

I've searched on how to force ActiveRecord migration to create "MEDIUMTEXT" or "LONGTEXT". It seems that all you have to do is define a limit that is bigger enough.

See http://stackoverflow.com/questions/4443477/rails-3-migration-with-longtext for more information.

I don't know what is the best default versions should have. It seems like a good idea to have a bigger size as default to prevent headaches from people who figure out late that they lost data because it wasn't saved (as the object was bigger then expected). Also worth noting that it increases only 2 bytes per entry according to: http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html#id733892

This might be the reason #245 happened.

@batter
Copy link
Collaborator

batter commented Aug 28, 2013

@brodock - Thanks for the report. It seems like the limit option is a reasonable solution for users that are using MySQL. I personally use PostgreSQL and SQLite (which I think either have larger columns by default or are more flexible to overflows in the case of SQLite) more frequently, so it seems like this is probably a database-specific problem. Are you proposing that it might make sense to add a :limit option to the object column on the migration in the gem?

Do you have any idea what happens if you add the limit option to a column and then run the migration against a database like Postgres (which only has a TEXT datatype, no LONGTEXT or TINYTEXT)? If there is no impact then I think it may be a good idea but I'd like to be certain that users of other databases won't start seeing errors before putting that in there.

@jaredbeck
Copy link
Member

Experiment

  1. Control Group: No limit

    create_table :bananas do |t|
      t.text :object
    end
  2. Test Group: limit

    create_table :kiwis do |t|
      t.text :object, limit: 1_073_741_823
    end

Results

MySQL

CREATE TABLE `bananas` (
  `id` ..,
  `object` text COLLATE utf8_unicode_ci,
  ..
) ..;

CREATE TABLE `kiwis` (
  `id` ..,
  `object` longtext COLLATE utf8_unicode_ci,
  ..
) ..;

Postgres

CREATE TABLE bananas (
    id ..,
    object text
);
CREATE TABLE kiwis (
    id ..,
    object text
);

SQLite

CREATE TABLE "bananas" (
  "id" .., 
  "object" text
);
CREATE TABLE "kiwis" (
  "id" .., 
  "object" text(1073741823)
);

Conclusions

  • How limit is used:
    • MySQL uses limit to choose between text, mediumtext, and longtext.
    • Postgres ignores any limit up to 1GB - 1 byte. Greater limits cause an error.
    • SQLite uses the limit literally

Remaining Questions

  • What limit should PaperTrail use in its generators, going forward?
    • In MySQL, is there any downside to e.g. mediumtext over text, e.g. more disk space?
    • In SQLite, is there any downside to a literal limit? Do we care (it's just sqlite, nobody uses it in production)?

@jaredbeck
Copy link
Member

In MySQL, is there any downside to e.g. mediumtext over text, e.g. more disk space?

text uses two bytes to store its length, mediumtext uses three, and longtext uses four. Otherwise I'm not seeing any downside.

I'll take a look the PaperTrail migrations and see if we can add a limit.

jaredbeck added a commit that referenced this issue Jun 22, 2015
[Fixes #248]

- MySQL uses limit to choose between `text`, `mediumtext`, and `longtext`.
- Postgres ignores any limit up to 1GB - 1 byte.  Greater limits cause an error.
- SQLite uses the limit literally
@jaredbeck
Copy link
Member

Closing, conversation can continue at #551

@jaredbeck
Copy link
Member

For posterity, MySQL's behavior re: truncation is documented as follows:

If strict SQL mode is not enabled and you assign a value to a BLOB or TEXT column that exceeds the column's maximum length, the value is truncated to fit and a warning is generated. For truncation of nonspace characters, you can cause an error to occur (rather than a warning) and suppress insertion of the value by using strict SQL mode.
http://dev.mysql.com/doc/refman/5.6/en/blob.html

@batter
Copy link
Collaborator

batter commented Nov 16, 2015

I know this is late but thanks for the detailed research and reporting @jaredbeck, I did read this at the time FWIW

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants