Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

User.avatar= should not immidiately start processing the file #361

Closed
GBH opened this issue Jun 14, 2011 · 27 comments
Closed

User.avatar= should not immidiately start processing the file #361

GBH opened this issue Jun 14, 2011 · 27 comments

Comments

@GBH
Copy link

GBH commented Jun 14, 2011

I started integrating CarrierWave today and I tripped over this. Let's say I have something like this:

class Foo < ActiveRecord::Base
  has_many :bars
end

class Bar < ActiveRecord::Base
  mount_uploader :file, FileUploader
  belongs_to :foo
end

class FileUploader < CarrierWave::Uploader::Base
  # some stuff
  version :thumb do
    process :do_magic
  end

  def do_magic
    resize_to_fill(model.foo.width, model.foo.height)
  end
end

Pretty simple thumb generation with dynamic dimensions based on Foo object. You'd think that something like this would work:

@foo.bars.create!(:file => File.open(whatever))

Nopes. It will blow because at that time model has no idea what foo is. It's a nil at that point. This is because file= immediately triggers thumb generation for the cache before any other assignment had a chance to happen.

You can rewrite it like this:

bar = @foo.bars.new
bar.file = File.open(whatever)

Now it's actually aware of Foo. But this is pretty terrible.

So the point is, in my opinion, that all activity should be triggered only on before_save (or before_validation), never on assignment.

I kinda patched up orm/active_record.rb to do just that and I'm willing to submit a patch. Let me know if it makes sense. Thanks.

@trevorturk
Copy link
Contributor

I'm not terribly bothered by this. Would:

@foo.bars.build(:file => File.open(whatever)) && @foo.bars.save! or something similar work?

@GBH
Copy link
Author

GBH commented Jun 14, 2011

No it wouldn't. Even Bar.new(:foo => @foo, :file => File.open(whatever)) will crash. Basically mass-assignment generally doesn't work. Acting on the assigned file really should happen only when needed... no earlier than the validation.

@trevorturk
Copy link
Contributor

Hmm... I wonder if something else is going on here, because mass assignment does work generally. I'm using it for example here: https://github.com/trevorturk/kzak/blob/master/app/controllers/posts_controller.rb#L21 -- can you upload a simple example application to github so I can have a closer look?

@GBH
Copy link
Author

GBH commented Jun 14, 2011

Here's the example app: https://github.com/GBH/broken_carrierwave_example
Basically, model inside Uploader object doesn't have a chance to get all the attributes assigned because file= runs off and tries to do it's own thing out of turn.

@trevorturk
Copy link
Contributor

I wonder if this is a problem with the new Rails version. Can you try with the latest released gem in the 3.0.x series just to see?

@GBH
Copy link
Author

GBH commented Jun 14, 2011

I don't think it has anything to do with Rails version. Mass assignment goes through attributes= method that just loops through the hash and does the assignments. If the assignment for the upload happens before others: bam!
I'll make a 3.0.8 example later, but I guarantee it's the same thing.

@GBH
Copy link
Author

GBH commented Jun 14, 2011

But that's not what I showed in my example. My issue is that model at the time when uploader tries to generate thumbs is not fully updated with mass assigned attributes. What you linked is an example where you don't do anything inside the uploader object, thus it works fine.

@trevorturk
Copy link
Contributor

Ah I think I see what you mean. Well... any ideas on how to fix it? I can't think of anything right away...

@ghost
Copy link

ghost commented Jun 14, 2011

All I think is needed is an option to cache/process versions after create/save. Something like version :thumb, :after_save => true (or :after_create) would be really useful.

In my opinion, it makes more sense that way, and arguably should be the default. Processing versions before the attached record is validated and saved is a waste of CPU ticks. Say, for example, a user uploads a high resolution PNG that is larger than allowed; that image would get thumbnailed immediately before the record was even validated. It's not going to crash the server, but it's needlessly wasteful nonetheless.

In my opinion, when a file is assigned, CarrierWave should by default sit on that original file in temporary storage until after_create hooks, then process/store all the versions; if the record isn't saved for any reason (be that validation or programmer error) any work done up to that point is for nothing. If anything, users should have to explicitly configure versions to be processed before_save, not the other way around.

I'm willing to work on a patch if other people are on board with this.

@GBH
Copy link
Author

GBH commented Jun 15, 2011

Generating thumbs only after upload validation makes sense. Only not sure how to deal with validates_processing_ofthen.

I actually wanted to ensure that no action of any sort happens during assignment. In mount.rb there's this thing:

def #{column}=(new_file)
  _mounter(:#{column}).cache(new_file)
end

So during assignment it goes ahead and writes things to the cache folder. What it really should do is store the new_file in some sort of temp accessor and then have a method like this: def #{column}_cache_file! that actually caches it. Then in orm/activerecord create after_save filter that will trigger file writing / thumbnailing/etc.

Point is, I don't think attaching :after_save => true to the version is the best way to go... although it might be easier.

@ghost
Copy link

ghost commented Jun 15, 2011

From what I can tell, cache only stores versions in the temporary uploads directory (uploads/tmp/). It's not until you save the associated object that they get copied to your actual store directories.

The problem is that not only does it store the original file but it also goes ahead and processes the versions and holds those in the temporary directory as well, which is problematic if those versions rely on data that isn't available yet. I don't see any reason why anything but the original file should be cached at first; any processing/versioning can wait. I guess if your app "expects" valid uploads then processing immediately isn't a big deal, but if your app is more wary about incoming uploads, IMO it's jumping the gun a little bit.

As far as validates_processing_of, I imagine that the only thing really necessary is to determine that the file can be processed by whatever library is being used. For example, if ImageMagick cannot process the image, then it most likely can't get the dimensions of the image; using the command-line identify on the image should be enough to determine if the image can be processed. Furthermore, I don't think there are many realistic scenarios where an image can be processed for one version but the same image can't be processed for another version; and even if it did I would imagine it would be due to programmer error rather than the user uploading a bad file. Right now I think that the method works a lot differently -- if any processing at any point fails it gets saved to a variable which is later referenced by validates_processing_of, or something along those lines. (It's basically checking to see if processing has already failed rather than checking at that moment.)

@GBH
Copy link
Author

GBH commented Jun 15, 2011

I'd say that even storing things to cache is too premature on the assignment. You have TempFile to validate things like mimetype, size, whatever. Even when you assign file from the url, open() (with open-uri) will give you file handle as well. Moving stuff to cache is needed only when you attempt to save something, so you can redisplay in case validation for the object fails. Actually I don't see why would you want to deal with caching if everything gets successfully saved in the first place.

@ghost
Copy link

ghost commented Jun 15, 2011

My only concern would be that TempFiles might disappear prematurely. They're designed to be automatically deleted by Ruby. I'm not sure of how that works exactly but it's something to look out for.

@GBH
Copy link
Author

GBH commented Jun 15, 2011

I don't think TempFile will disappear. I don't see why we can't generate thumbs directly from the TempFile, cache seems like a redundant step. Cache is only useful as a temp storage during form re-displays.

@tribalvibes
Copy link

@GBH I totally agree that the processing should be deferred and not happen on the assignment and further that most of what this thing does could happen directly from the temp storage that rack provides. A lot of lines of code could be deleted, performance and clarity enhanced, etc.

def #{column}=(new_file)
  _mounter(:#{column}).cache(new_file)
end

This causes other problems as well, because it makes for an accessor pair #{column} and #{column}= which are not symmetrical, i.e. after foo = 'foo.jpg' then it is not the case that foo == 'foo.jpg'.

Just the tip of the iceberg here. See, e.g.:
http://blog.citrusbyte.com/2009/12/15/superfluous-dsls/

@ghost
Copy link

ghost commented Jun 20, 2011

When I got down to hacking CarrierWave for this functionality, I ended up making a mess. I was kind of just mucking around to get a better understanding of how CarrierWave operates, but, from what I saw, major parts of the code have to be rewritten and tests are going to break all over the place.

My problem was unique in that I actually needed to defer processing to after_update. (An upload is attached to a post, and I use a javascript file uploader to allow a user to upload a file before submitting their post, but if the post isn't valid [or completed] the upload will be discarded.) So I wanted to be able to assign a version to any given callback.

Having the option to not automatically process at all, but be able to call model.myuploader.myversion.process elsewhere in your code would go a long way too, particularly if you want some kind of background process to handle version processing for you. This is another problem I ran into; as far as I know there's no way to get CarrierWave to process a single version; it always does them all.

@mstump
Copy link

mstump commented Jun 27, 2011

Issue #380 is a duplicate of this issue.

@trevorturk
Copy link
Contributor

Has anyone come up with a potential solution to this? If not, we may need to relegate this into the "known issues" wiki page and document the problem.

@ghost
Copy link

ghost commented Jul 12, 2011

I've got nothing but words. The amount of patching needed was daunting for me and I returned to focusing on my Ruby app for the time being.

I think maybe @GBH and my concerns are somewhat different; he wants caching delayed until validation and for my purposes I want to delay caching/processing indefinitely; the latter I think would be harder to implement as it is more configurable.

@GBH
Copy link
Author

GBH commented Jul 12, 2011

I'd love to fix it, but it requires pretty extensive rewrite of mount.rb and whatever else that's attached to it. It's not a simple patch. I'll try to find some time to do it, but who knows when that's going to happen.

I think we identified what exactly is wrong, and how it should be done correctly. Now it's just a matter of implementing it.

This ticket should remain open so maybe somebody else volunteers to take a stab at it.

@trevorturk
Copy link
Contributor

You know, maybe you should consider not using mounting. CW is really flexible, so if something funky about AR callbacks etc is getting you down, you can just use it manually. I'm putting together a blog post about this, just as a small example:

https://github.com/trevorturk/sinatra-carrierwave-fog/

So, I'm going to leave this issue alone for a few days, but perhaps this is more of a wiki page than a CW code change...

@trevorturk
Copy link
Contributor

@GBH OK I'm going to close this. I disagree with leaving issues open like this because then they just build up and never get fixed and just confuse people getting started. If you'd like to take a stab at updating this wiki page, that would be great:

https://github.com/jnicklas/carrierwave/wiki/Known-Issues

There's an issue at the top you might put this issue right under that one. It seems pretty similar to me -- it's not exactly a bug, but a possible improvement to CW and definitely something that people might want to know about -- a caveat, I suppose.

@matiasgali
Copy link

I know this is closed but even after so long I find myself facing the same issue and I have a small and pretty clean workaround that I think is worth sharing, specially for those with the same problem that end up reading this issue.

As it's been stated, the problem is that the whole processing runs as soon as the file is assigned (file=), so a possible solution is to perform any previous action before that. On Rails (Active Record) this can be achieved by redefining assign_attributes on the model:

  # app/models/{model}.rb (public method in Active Record)
  def assign_attributes(attributes)
    attributes = Hash(attributes).stringify_keys
    file = attributes.delete("file")     # Put the file aside
    super                                # Assign the rest of the attributes

    # Do or call anything you need...

    self.file = file                     # Finally, assign the file
  end

The new (initialize), create and update methods use assign_attributes to do so.
I use it just like this to ensure that the rest of the attributes are available from the uploader at the time of processing, but you can prepend anything you need.

I agree that the processing should be delayed. Before validations the file's field should contain only the bare original file (or tmp path at least) to allow performing validations (like file size, image dimensions, etc.) and the processing should be carried out only when it's sure to be saved or after saving to avoid wasting resources. Also, it would be great if the processing could be separated from saving (if the original file is valid) to allow handling the process asynchronously (sidekiq, etc.).

@opsidao
Copy link

opsidao commented Jun 1, 2016

We're using 0.10.0 and have hit this problem, is this still happening on 0.11.2?

@ErvalhouS
Copy link

ErvalhouS commented Apr 27, 2017

Using 0a1c6f4 this problem is still present.

@h0jeZvgoxFepBQ2C
Copy link

@trevorturk It's a big vulnerability imo, since you can DOS your server with big files - if the processing starts before the validation already?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants