Preserving external images (issue #399) #447
Conversation
Looks like a deprecation warning from
Option is renamed to
Thx for the PR. My primary concern is the "rejected: Configuring it as an Image Proxy mode" part, because to me it sounds like the right way to do it. The current proxy already does most of the work of mapping picture urls, converting comments and serving back different (proxied) content on the fly. Preserving images on the FS (or other stores) is a natural extension of such a proxy, i.e. some form of cached proxy. Btw, if we treat stored images as a persistent cache, we can limit the size of the cache (number of stored images or total volume on the disk) with LFU or LRU.

Besides, it feels like rest-level functionality and not engine-level. I think with less code it can be done less intrusively with an additional

One potential disadvantage of this simplified logic: we won't be able to utilize the submit+commit technique used in

Another potentially promising solution is to combine
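A minimal sketch of that "cached proxy" idea: serve the image from the local store when present, otherwise fetch it from the origin, save it and serve it. The imageStore interface, route, query parameter and id scheme here are assumptions for illustration, not remark42's actual API.

```go
// Sketch only: a caching image proxy handler over a hypothetical imageStore.
package main

import (
	"crypto/sha1" //nolint:gosec // id generation only, not a security use
	"encoding/hex"
	"io"
	"net/http"
)

type imageStore interface {
	Load(id string) ([]byte, error) // returns an error on cache miss
	Save(id string, data []byte) error
}

func cachedImgHandler(store imageStore) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		src := r.URL.Query().Get("src") // external image URL to proxy
		sum := sha1.Sum([]byte(src))
		id := hex.EncodeToString(sum[:])

		if data, err := store.Load(id); err == nil { // cache hit: serve stored copy
			_, _ = w.Write(data)
			return
		}

		resp, err := http.Get(src) // cache miss: fetch from the origin
		if err != nil {
			http.Error(w, "can't fetch image", http.StatusBadGateway)
			return
		}
		defer resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			http.Error(w, "origin returned "+resp.Status, http.StatusBadGateway)
			return
		}

		data, err := io.ReadAll(resp.Body)
		if err != nil {
			http.Error(w, "can't read image", http.StatusBadGateway)
			return
		}
		_ = store.Save(id, data) // best effort; still serve if saving fails
		_, _ = w.Write(data)
	}
}
```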
It might be a long answer, apologies in advance :) I think there are two key things to clarify:
Besides these two things, a few more comments:
Imho this is a pretty dangerous approach: urls are mutable, and tomorrow the same url may refer to a completely different picture, but remark's storage/cache will continue serving the old one.
In fact it does use it, because I put images into permanent storage and replace urls with
Yeah, I thought about it too. One thing that stopped me: eventually some remark installation will get an actual user called "system_images" :) Who knows what other features remark will have by then; for example, you may add returning all images for a user, and imagine the surprise of this "system_images" user :) Let me know what you think; after all it's your project, and I'm totally fine with changing the implementation in the way you see it.
No worries, mine will be even longer ;) TLDR: up to you with a couple of extras
Sure. I mentioned LRU/LFU-type cleanup as a hypothetical "possible someday" kind of extension. Anything unlimited may run out of space someday, but let's not care about it now, just like I ignored this problem for images uploaded directly to remark42, hoping to always have a big enough disk to keep them forever.
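Purely to illustrate that hypothetical someday-cleanup, a count-bounded LRU index over stored image ids could look roughly like this; all names are invented and nothing here is part of the PR.

```go
// Sketch only: tiny LRU index over stored image ids with an eviction callback
// that could, for example, remove the evicted file from disk.
package main

import "container/list"

type lruIndex struct {
	max     int
	order   *list.List               // front = most recently used
	items   map[string]*list.Element // image id -> element in order
	onEvict func(id string)          // e.g. delete the stored file
}

func newLRUIndex(max int, onEvict func(string)) *lruIndex {
	return &lruIndex{max: max, order: list.New(), items: map[string]*list.Element{}, onEvict: onEvict}
}

// Touch records access to an image id and evicts the least recently used id
// once the index grows beyond max.
func (l *lruIndex) Touch(id string) {
	if el, ok := l.items[id]; ok {
		l.order.MoveToFront(el)
		return
	}
	l.items[id] = l.order.PushFront(id)
	if l.order.Len() > l.max {
		last := l.order.Back()
		l.order.Remove(last)
		evicted := last.Value.(string)
		delete(l.items, evicted)
		l.onEvict(evicted)
	}
}
```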
Both. I don't see the ability to serve the response from a stored file as something that disqualifies the thing from being a proxy. One of the most popular proxies is squid, and this is exactly what it does. Currently, the proxy we have in remark42 is limited to http->https "forwarding", but this forwarding is for the same external assets, and I don't see anything conceptually different between serving the request by calling
Well, yes and no. I mean, this is a typical tradeoff between size and accuracy. We may address it in different ways without killing deduplication completely, for example by checking the size of the remote image after some TTL and including the size in the encoded filename. But I don't think we care about this too much, at least for now, because currently we include images in exactly the same way, by URL. However, I don't have any fundamental problem with saving images under user_id if you think it may help somehow.
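A sketch of that deterministic-id idea, mixing the remote size into the encoded name so a changed image gets a fresh id after the TTL re-check; this is illustrative only and not the actual remark42 id scheme.

```go
// Sketch only: derive a deterministic storage id from the image URL plus its
// remote size, so identical URLs deduplicate to one stored file while a size
// change detected later yields a new id.
package main

import (
	"crypto/sha1" //nolint:gosec // id generation only, not a security use
	"encoding/hex"
	"fmt"
)

func imageID(srcURL string, size int64) string {
	h := sha1.New()
	_, _ = fmt.Fprintf(h, "%s|%d", srcURL, size) // hash.Hash writes never fail
	return hex.EncodeToString(h.Sum(nil))
}
```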
Ok, agree.
No, this is not possible with user_id because ids are formed as
I don't have a major technical problem with your current implementation. It is fine and clear code, decently covered. I didn't like the change of the converter interface initially, but I can live with it. To me, passing user_id felt too artificial, but I guess in some cases such an ability can be handy. Handling imgPreserver as a converter was the most controversial idea here (from my pov), and the main reason why I started this discussion. However, I can accept it and maybe even begin to like it one day. It still feels like the wrong level to me, but it won't wake me up in the middle of the night screaming and crying ;)

The bottom line: I don't want you to spend too much time redoing this. If you have some bright ideas on how to keep most of it but address the "wrong level" issue, that would be nice. If not, I'll gratefully accept the current implementation. Some additional things to consider:
Haha, thank you very much for the very detailed response! I actually think the "proxy" idea is growing on me; I'll see what I can do with it. I'm going to spend 5 hours on a plane this weekend, might as well do something useful :)
so, you had 2 months to think about it ;) any conclusions?
@umputun doh, I'm so sorry for disappearing! Yes, I actually pushed my changes implementing caching in the proxy back in October (see the last commit e4db45a); now I only need to rebase it onto master and add/update tests (and probably add support for the BoltDB storage). I think I'll try to do it during the Christmas holidays, would that work?
sure. somehow I missed that push completely
Yeah, it's probably my fault; I was planning to finish everything quickly, so I didn't add any comment after pushing. Do you want to take a look at the implementation now (I don't expect anything to change while rebasing/adding tests), or would you prefer to review the finished version?
let's do it on the rebased version. I don't even recall what changed in the store/image package and caused the conflict, so it's hard to tell how transparent this merge is going to be
force-pushed from 52e3de4 to 39feb3a
Cleaned up everything, added tests and rebased to master. Tested without the frontend (with both FS and BoltDB storages), so it's all good for review. Will do end-to-end testing with the frontend tomorrow.
linter complains about using
sha1 is fine for our uses and
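Assuming the complaint is a gosec-style weak-crypto warning, a justified nolint directive is the usual way to keep sha1 for non-cryptographic ids; the exact directive placement depends on the linter configuration.

```go
// Sketch only: sha1 used for stable ids, with the linter warning suppressed
// and the reason documented next to the suppression.
package main

import (
	"crypto/sha1" //nolint:gosec // imported for stable ids, not cryptography
	"encoding/hex"
)

func fileID(data []byte) string {
	sum := sha1.Sum(data) //nolint:gosec // stable id only, no security impact
	return hex.EncodeToString(sum[:])
}
```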
force-pushed from d39f788 to 4adf6c6
Tested with the frontend; all related functions seem to be working properly (posting/viewing comments with and without proxy/caching enabled). A few notes:
Looking good, just one missing change in a comment and an idea about ignoring CacheExternal in the proxied handler
This reverts commit 05014a5.
force-pushed from 9d8055d to 8842010
thx
Added preserving of external images into local storage.
Configuration
Added the preserve-external-images parameter (PRESERVE_EXTERNAL_IMAGES envvar), which is false by default.

Implementation details

Images Preserver acts as a CommentConverter and utilizes the default image storage as configured by the Remark admin. Images Preserver extracts all external images (sourced outside of REMARK_URL), downloads them and stores them into local storage, after which the image URL is replaced with a local version. If any problems occur while downloading/storing an image (for example the image url returns 404, the image has an unsupported format, it is too big, etc.), the original image url is kept.
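The code itself is not visible in this thread, so the following is only a rough sketch of what such an image-preserving converter could look like; the Convert signature, fetchAndStore, imgSrcRe and all other names are assumptions for illustration, not remark42's actual API.

```go
// Sketch only: a converter that swaps external image URLs for local copies,
// keeping the original URL whenever download or store fails.
package main

import (
	"regexp"
	"strings"
)

// imgSrcRe is a simplified matcher for image sources inside rendered comments.
var imgSrcRe = regexp.MustCompile(`https?://[^"'\s)]+\.(?:png|jpe?g|gif|webp)`)

type imgPreserver struct {
	remarkURL     string                                           // URLs under this prefix are already local
	fetchAndStore func(srcURL string) (localURL string, err error) // downloads and saves, returns the local URL
}

// Convert replaces external image URLs in the comment text with locally
// stored copies; on any download/store failure the original URL is kept.
func (p imgPreserver) Convert(text string) string {
	return imgSrcRe.ReplaceAllStringFunc(text, func(src string) string {
		if strings.HasPrefix(src, p.remarkURL) {
			return src // already local, nothing to do
		}
		local, err := p.fetchAndStore(src)
		if err != nil {
			return src // keep the original URL if anything goes wrong
		}
		return local
	})
}
```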
Considered and rejected

Configuring it as an Image Proxy mode (IMG_PROXY=[none,proxy,store]). For some reason, even though the implementation is somewhat close to the Image Proxy, it feels that from a user (Remark admin) perspective Image Proxy and Image Preserving are two different things. Plus, after storing an image, Images Preserver no longer acts as a proxy.

Future improvements