Create a small application #107
You could take a look at my MozJPEG UI in the Microsoft Store. It's quite new and not 100% production-ready. It also doesn't do exactly what you were asking for, but it comes reasonably close, I guess. Note that JPEGs recompressed using MozJPEG can be visually indistinguishable from the originals (given a high enough quality setting) but are generally not pixel-by-pixel identical. Happy to hear your feedback if you give it a try.
Hi Georg
I should confess that I have written such a program, albeit in C/C++. I'm using CreateProcess, which turned out to be a little tricky: it turns out you have to pass the path as the first argument. I'm pretty sure mine does produce pixel-by-pixel identical images; it only uses jpegtran, and I checked the commands.
It's not ready for GitHub, but I thought it was a good idea, and it's most likely easier to create a nice UI in .NET.
May I ask one thing: what do you use, if not CreateProcess? I suspect .NET has something similar.
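(For what it's worth, the closest .NET counterpart is System.Diagnostics.Process. A minimal sketch; the tool path and arguments here are placeholders, not anything from either program:

```csharp
// Minimal sketch: launching an external tool from .NET, the rough
// equivalent of Win32 CreateProcess. Paths and arguments are placeholders.
using System.Diagnostics;

var psi = new ProcessStartInfo
{
    FileName = @"C:\tools\jpegtran.exe", // unlike CreateProcess's lpCommandLine,
    Arguments = "-optimize in.jpg",      // the executable path is a separate field
    UseShellExecute = false,
};
using var p = Process.Start(psi);
p?.WaitForExit();
```

Note that ProcessStartInfo keeps the executable path and the argument string separate, so the "path as first argument" quirk of CreateProcess doesn't arise.)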
Best regards,
Fredrik
Hi again
There is something I should mention. When I run my tool on my own pictures downloaded from iCloud, the files always get smaller. However, I noticed that when I ran the tool on images I had already processed with an older version of mozjpeg's jpegtran, some of them actually got larger. It seems as if it fixes problems with the files; maybe there's a missing height/width marker. This is of course more likely with pictures that come from the Internet: you never know how they may have been processed. I have used some other tool before, and I'm not sure what it did. Still, I would like to know why this version managed to shave off a few bytes, or exactly why I got a larger file.
My jpegtran settings should only remove unneeded markers, like comments, and optimise the Huffman table. I think this is safer than creating a lower-quality image.
Best regards,
Fredrik
Which options for jpegtran do you use? As JPEG is a lossy compression mechanism, recompression typically leads to some loss in quality and to images that are not pixel-by-pixel identical. This also explains why your images grow again if you process them twice: compression artefacts are often hard to compress themselves, so if you compress an image over and over again, it might sometimes grow in size. If you think there is any metadata responsible for a larger or smaller file, you could check with exiftool.
Edit: Ah, I see, you're talking about this part of MozJPEG's readme:
This is actually not what my UI (currently) does and also not what MozJpegSharp's
This is straight from my program:

-optimize -progressive -copy none -outfile "C:\Users\Fredrik Wahlgren\Documents\Test Images\@@.pict" "C:\Users\Fredrik Wahlgren\Documents\Test Images\000.jpg"

@@.pict is what I get, and I overwrite the source file with it to minimise fragmentation.
These options will only affect the markers unless baseline (or arithmetic) encoding is used. It removes comments and optimises the Huffman table. The result should be pixel-by-pixel identical to the source file, as it doesn't recompress the image. Some programs do: if you use Microsoft's viewer and then "save as", what you get won't be the same.
I used JPEGsnoop to examine why some pictures got larger. What I found was that only the markers were affected. Some pictures lacked the height/width marker, which I think jpegtran fixed. I think that even if the file gets larger, the original should be overwritten, because jpegtran has evidently fixed problems.
Some other things I want to mention. You may want to use a progress bar and also do an estimated-time-of-arrival (ETA) calculation. This is somewhat tricky, and there are two ways to think about it.
Idea #1: I first iterate over all files so that I know how many files to process; in my case it's every jpg I can find. Then, when I iterate a second time, I keep a counter of how many files I have processed. From that, I calculate how many files per second I have processed, which makes it easy to "estimate" the ETA. It's not good, because if your files happen to be ordered by size, smallest first, it will underestimate the time required: once you hit the large files, the average time per file increases by a large amount.
Idea #2: Calculate the total number of kilobytes, then track how many kilobytes have been processed, along with a tally of microseconds per file. Do some math and you can calculate how many seconds the remaining work should take. One approach underestimates, the other overestimates.
So I had a stellar idea. After my first iteration, I calculate the average file size, and then I pass the average of this value and the actual value. This works well because large files are faster relative to their size than small files. I then use both algorithms to calculate the ETA and show the average of the two values. It works very well; the ETA is almost always slightly below the actual time.
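(A minimal C# sketch of the averaging idea; the class and member names are just illustrative, and it assumes a first pass has already counted the totals:

```csharp
// Sketch: ETA estimate that averages a files-per-second estimate (Idea #1)
// with a bytes-per-second estimate (Idea #2), as described above.
using System;
using System.Diagnostics;

class EtaEstimator
{
    private readonly int totalFiles;
    private readonly long totalBytes;
    private readonly Stopwatch clock = Stopwatch.StartNew();
    private int doneFiles;
    private long doneBytes;

    public EtaEstimator(int totalFiles, long totalBytes)
    {
        this.totalFiles = totalFiles;
        this.totalBytes = totalBytes;
    }

    public void FileProcessed(long fileSize)
    {
        doneFiles++;
        doneBytes += fileSize;
    }

    public TimeSpan? Estimate()
    {
        if (doneFiles == 0 || doneBytes == 0) return null; // nothing measured yet
        double elapsed = clock.Elapsed.TotalSeconds;
        // Idea #1: files/second (underestimates if small files come first).
        double byCount = elapsed / doneFiles * (totalFiles - doneFiles);
        // Idea #2: bytes/second (overestimates, since large files are
        // faster relative to their size).
        double byBytes = elapsed / doneBytes * (totalBytes - doneBytes);
        return TimeSpan.FromSeconds((byCount + byBytes) / 2); // average of both
    }
}
```

Call FileProcessed after each file and Estimate whenever the progress bar redraws.)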
I can think of a very good reason not to recompress images. If you are a hobbyist astronomer, you will have heard of Victor Buso. He got some new gear that he wanted to try out, so he pointed his telescope at a random location and took some pictures. Days later, he found that he had taken a picture of a supernova. Those pictures are unique, every astronomer's dream. You may want smaller files, but you don't want a single pixel to differ from the original. Recompressing is not OK.
If you want to make your program more appealing, you should consider adding the ability to rename files. A photographer once said something wise about organising pictures: you should organise them in such a way that, when you want to find a certain picture, you can do so in the shortest possible time. Organising pictures is a real PITA because of name collisions. They are called IMG000, IMG001... or DSC000, DSC001, and so on.
So there are two approaches I think are useful. First, make sure that (almost) all pictures have unique names. Some pictures may have been given a special name that you don't want to lose, like the last picture of a beloved person who passed away, or a name of the person in the picture. Therefore, rename only the ones that seem to come straight off the camera: the first one would be named 000000.jpg and the last one 045723.jpg, say.
Now you can look at them and move them around as you please. Once done, you may want to rename again using different logic: whenever you come to a folder, you start from zero, so maybe 000 will do. You will have as many 000.jpg files as you have folders. This is good for two reasons. If you find, say, a 500.jpg, you know there are at least 501 files in that folder; maybe you should have a look at those. The second reason is that if you find some pictures that should be moved, you just change 231 to 0231 and they will sort first (or last) by name and can be moved without risking name collisions.
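(A sketch of that first renaming pass; the regex for "looks camera-generated" and the padding width are assumptions for illustration:

```csharp
// Sketch: rename files that look camera-generated (IMG0001, DSC0042, ...)
// to zero-padded sequence numbers, leaving hand-named files untouched.
using System.IO;
using System.Text.RegularExpressions;

static class Renamer
{
    static readonly Regex CameraName =
        new Regex(@"^(IMG|DSC)[ _]?\d+$", RegexOptions.IgnoreCase);

    public static void RenameFolder(string folder)
    {
        int counter = 0;
        // GetFiles materialises the list first, so moving files is safe here.
        foreach (var path in Directory.GetFiles(folder, "*.jpg"))
        {
            var stem = Path.GetFileNameWithoutExtension(path);
            if (!CameraName.IsMatch(stem)) continue; // keep special names
            var target = Path.Combine(folder, $"{counter:D6}.jpg");
            if (!File.Exists(target)) File.Move(path, target); // avoid collisions
            counter++;
        }
    }
}
```

The second pass would be the same loop with a shorter padding, restarting the counter per folder.)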
I hope you will find my suggestions useful. Please let me know if you do. In particular, I really think your program should allow for lossless optimisation, as it is really easy and completely "safe", and not only for hobbyist astronomers.
Best regards,
Fredrik
When I had finished my somewhat lengthy response to your question, it occurred to me that there are some complications that should probably be considered.
While I used JPEGsnoop to confirm that only markers were affected, I should also check with exiftool.
Anyway, there is a popular program called JPEGmini that you should be aware of. It does recompress jpg files and does produce smaller files, sometimes considerably so. It ensures that you can't run JPEGmini twice on the same file by writing a comment; if this comment is found, the file is ignored.
You may want to parse your jpg files to see if they contain such a comment, and skip the file if you do recompression. You can squeeze out an additional percent or so by running lossless jpegtran on them, but then you should preserve the comment. You use C#, right? I guess that makes it easier.
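(One way such a check could look in C#; the exact text JPEGmini writes into the comment is an assumption here, so treat the needle string as a placeholder:

```csharp
// Sketch: walk the JPEG marker segments up to the start of scan data and
// look for a COM (0xFFFE) segment containing a given string.
using System.IO;
using System.Text;

static class JpegComments
{
    public static bool HasComment(string path, string needle)
    {
        using var r = new BinaryReader(File.OpenRead(path));
        if (r.ReadByte() != 0xFF || r.ReadByte() != 0xD8) return false; // no SOI: not a JPEG
        while (r.BaseStream.Position < r.BaseStream.Length - 4)
        {
            if (r.ReadByte() != 0xFF) return false;        // lost marker sync
            int marker = r.ReadByte();
            while (marker == 0xFF) marker = r.ReadByte();  // skip fill bytes
            if (marker == 0xD9 || marker == 0xDA) break;   // EOI / start of scan
            int len = (r.ReadByte() << 8) | r.ReadByte();  // length includes these 2 bytes
            byte[] payload = r.ReadBytes(len - 2);
            if (marker == 0xFE && Encoding.ASCII.GetString(payload).Contains(needle))
                return true;                               // COM segment with our text
        }
        return false;
    }
}
```

Usage would be something like HasComment(file, "JPEGmini"), with the needle adjusted to whatever the tool actually writes.)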
And here's another thing. Some people have tens of thousands of pictures, like my friend, an avid photographer who owns a high-end DSLR, a single-digit Canon. (I'm so jealous...)
It will take a very long time to run jpegtran on every single file, days even. So what happens if you have processed a significant portion of these files and you have to shut down the computer? Not fun on day three.
So here's the solution. If the computer is shut down, make sure to save the name of the first unprocessed file, like x:\my pictures\landscapes\Italy\432.jpg. You pass x:\my pictures to the program, and if this is a substring of the string you saved, you can ignore every file you find until you reach that one, the first unprocessed file. Now your program starts where it left off. Nice, don't you think? JPEGmini does not implement this idea, and that's a pain in many ways. I can explain why if you want me to.
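(A sketch of that resume logic; the state-file name is made up for illustration, and it assumes the enumeration order is stable between runs:

```csharp
// Sketch: checkpoint the current file before processing it; on the next
// run, skip forward until that file is reached, then continue from there.
using System;
using System.IO;

static class ResumableRun
{
    const string StateFile = "resume.txt"; // placeholder location

    public static void ProcessAll(string root, Action<string> process)
    {
        string resumeAt = File.Exists(StateFile) ? File.ReadAllText(StateFile) : null;
        bool skipping = resumeAt != null &&
                        resumeAt.StartsWith(root, StringComparison.OrdinalIgnoreCase);
        foreach (var file in Directory.EnumerateFiles(root, "*.jpg",
                                                      SearchOption.AllDirectories))
        {
            if (skipping && !string.Equals(file, resumeAt, StringComparison.OrdinalIgnoreCase))
                continue;                       // not yet at the first unprocessed file
            skipping = false;
            File.WriteAllText(StateFile, file); // checkpoint: first unprocessed file
            process(file);
        }
        File.Delete(StateFile);                 // finished cleanly, no resume needed
    }
}
```

)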
Very useful, but I still think that lossless optimisation, with no recompression, is the most important thing to implement. Some people really don't like to lose pixel values, even if they are aware that these files are different from RAW files. Of course, if you are a hobbyist astronomer, you should store your important files as RAW, PNG, or TIFF. It may be OK to convert to jpg, but not to recompress.
Best regards,
Fredrik
Thanks for your detailed explanations and ideas! I see how many of your points might be useful to users. However, my top priority is to keep the app as simple as possible, and thus to offer the smallest number of options possible while still being useful. I'll probably focus on features that seem most important to an average user, or that I like in some special way. In general, the goal is to just solve the compression problem and keep it really simple. Summarising the ideas (please correct me if I got something wrong; ordered by occurrence):
1. Progress bar
2. ETA
3. Lossless recompress
4. Renaming
5. Avoid useless/harmful re-recompression
6. Support restarts
1 & 2
I'm right there with you, they should exist. First and foremost the progress bar.
3
I really understand your feelings about keeping images as pixel-perfect copies, as I'm a hobby photographer too, though not an astronomer. My workflow is fully RAW-based and I just export jpgs. Thus, from my perspective I always have a pixel-perfect copy, but my priority when creating jpgs is to have the best quality/size ratio possible. To achieve this I'd export from Lightroom as tiff/... and then compress using MozJPEG. The best quality/size ratio will just be achieved by actually compressing with MozJPEG. I assume most people are more interested in real jpg re-encoding, because that can easily achieve improvement rates of about 50%, while lossless recompression is about 5% for the pictures I tested with. Those who really care about pixel-perfectness are typically using RAW workflows in the first place, I guess. I still like the idea, though, and might add it to the advanced options in a future version.
4
I won't implement any renaming beyond what is required for keeping the originals. I think there are other options available that specialise in renaming and probably do it better than I ever could with a reasonable amount of effort put in. I agree it'd be nice to have an all-in-one solution, but in the end this is still a hobby project with limited available resources, and I think adding a basic, not-thought-through-to-the-end renaming solution would do more harm to this software than provide use.
5
This is already implemented. In the settings you can select a "Skip if size reduction is below" percentage. I think this is the easiest and most straightforward way without adding any proprietary metadata or anything like that. It feels right to me not to add anything proprietary. Also, it judges from the outcome point of view, which I feel is most useful: why care whether this software already touched the image? Just process it if processing has a reasonably positive effect. Currently the steps are 1, 5, 10, 15, 20, 30, 40, 50. If I add the lossless recompress, it might be worth considering more steps in between. However, if it's lossless either way, why not just use the 1% option?
6
I see how this might be useful for large image archives. I think, however, that it is non-trivial to implement correctly. If I had a huge image archive, wouldn't I have the option to keep my PC running for some days, or otherwise, as a workaround, process my images in batches (e.g. put 50k images first, then reboot, then continue, ...)? I still see how this might be beneficial. It could be a feature that doesn't require any visible options before it is actually needed, e.g. ask about continuing on restart, much like Excel asks something like "Do you want to keep these restored files from your last session?". I might consider adding this in the future.
Please don't be frustrated, I really appreciate all your ideas and well-thought-out proposals! The reality is just that I have limited time to invest in this early hobby project, which currently probably has fewer than 5 users. Going forward I'll come back to this thread and consider the above points when I think about what to implement next. I never planned to make anything bigger than a very basic and simple MozJPEG UI, though.
Hi Georg
I'm happy to hear that you find my suggestions useful. I'm certainly not frustrated. There is one tiny thing to consider: jpg files are almost unique in that there are three accepted extensions, jpg, jpe, and jpeg. Most people use the setting where the extension doesn't appear. So you should look for all of these in your code. It's probably a good idea to standardise on jpg and silently change the other extensions to jpg when you find them. If you search for *.jpg files you won't find the other ones.
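(A small sketch of matching all three extensions in one pass; the class name is just illustrative:

```csharp
// Sketch: enumerate all files once and keep anything whose extension is one
// of the three JPEG extensions, since "*.jpg" alone would miss .jpeg/.jpe.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class JpegFiles
{
    static readonly string[] Extensions = { ".jpg", ".jpeg", ".jpe" };

    public static IEnumerable<string> Find(string root) =>
        Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories)
            .Where(f => Extensions.Contains(Path.GetExtension(f),
                                            StringComparer.OrdinalIgnoreCase));
}
```

)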
Sadly, I don't know C# well; all I have done is some maintenance programming. Some things are easier, some are not. And I have a question: do you use CreateProcess? You may have noticed that the first argument should be the path to the exe. I suppose you could pass "monkey", since programs typically don't care where the exe is.
I think these issues are correctly ordered by importance. Once people start using your program, you will get valuable feedback and learn more about how people work and how they think about recompression vs. lossless.
1) This one is important. You really want to know how much work the program has done. ETA turned out to be complicated. If you want to make life really complicated, you could export file size and milliseconds to a CSV file, open it in Excel and, hopefully, do some curve fitting, which could then be used to come up with some kind of correction function. Just kidding; I think you will find that my averaging algorithm works.
3) Actually, even pixel-perfect files can be reduced by quite a lot if you don't copy comments. That's because non-standard markers can be added that can be huge. It seems there exists software that stores what you have done in these markers, maybe Photoshop. Some people have found such markers.
4) Well, this should perhaps be a different project. Also, one could make something like a USB disk-drive scraper, for when you find a long-since-forgotten disk and want to consolidate certain file types onto another drive where you can examine what you find.
5) Nice feature.
6) You are right, it is good practice to process files in a piecemeal fashion: just copy a few gigabytes to a work folder and then move them to a "fixed" folder.
You know, I'm really happy to be able to contribute to your project. I have found that my pet projects often turn out to be more complicated than I had anticipated. It's almost always worthwhile to read the comments when you find that someone has developed something roughly similar.
Best regards,
Fredrik
> If you search for *.jpg files you won't find the other ones.

I don't search for any file extension currently; I just take whatever the user drags and drops. It is supported to drop any non-JPEG image type GDI+ supports, e.g. tiff, bmp, png, ...; if a non-image type is dropped, the conversion of that file is marked as an error and I continue with the next one.

> Do you use CreateProcess?

I don't. MozJPEG UI uses my MozJpegSharp library, which is a managed dotnet wrapper for the unmanaged mozjpeg library. Thus, I don't create any processes but rather consume the library directly (which is what the mozjpeg authors suggest one should do).

> Actually, even pixel perfect files can be reduced by a large number if you don't copy comments.

I see. I currently copy any metadata I can, because I want to keep EXIF data in general, and "what to copy" is probably a question that is non-trivial to answer. I assume other software exists that helps with metadata removal. I might add an option "copy metadata", all or nothing. This is nothing I need for myself right now, though, as the typical photo that drops out of a RAW -> Lightroom -> JPEG workflow does not have huge metadata, in my experience.

Please note that the first version, currently in the Microsoft Store, doesn't really compress using MozJPEG but rather just using libjpeg-turbo, due to a bug (probably in MozJPEG's latest commit itself). So it'd be better to wait for the next one (based on MozJPEG's second-newest commit) before any serious use. (It is not possible to find it using Store search, only if one knows the link, so no concerns there.)
Hi Georg
Oh, I see. My approach is more like this:
https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/file-system/how-to-iterate-through-a-directory-tree
You search for all files, *.*, and then you look at what you get. If, say, a pdf file is found, it is simply ignored; there is no need to give it to GDI+, which will fail. I think this is what you do. I think it's more intuitive to drop a folder, in which case you have to do it like in the example, accepting only known extensions. It seems you have to search all files and then drop them. Am I right? That's a bit of a mess.
It's a good thing there is a wrapper. Should you choose to automate some other command-line tool, I'm afraid you would have to use CreateProcess. Let's say you want to support Guetzli. It would be a bad idea, because it takes hours per file. It doesn't have a wrapper, since it's more of a research tool than something practical.
There is an additional reason why you should process only a few files at a time: the computer will get hot. Low-end computers are not designed to handle such workloads.
I think comments are a pain. It seems that every marker that isn't strictly needed is considered to be a comment. And that's true; what else could it be?
The real problem is that there are comments that you, as a photographer, want to keep, like geodata, lens information, and actual comments. If you have pictures that have been photoshopped, and that software stores some kind of "undo" data or keeps a record of what you have done, then it's a different story, especially if it adds a megabyte per file or so. You may be able to find discussions about this topic where forensic file analysis is discussed.
So, if you do macrophotography, you may want to stack a number of pictures so that you get a final picture where everything is in focus. This is also true for astrophotography. You probably want to get rid of this stuff once you are happy. Ideally, you would extract the wanted markers, get rid of all comments, and then write them back. Heavy stuff; I wouldn't recommend it. Your approach of keeping the metadata is the best option. What I do is parse the files, and if I find an unknown marker, I log it. No practical reason, it was just fun to code.
Well, I think we have covered the most important aspects of your program. Being able to drop a folder and then recursively iterate is a good idea. It's also fun if you have not done this before.
It's a good idea to mention something like dupeGuru in the comments. It's a very good tool for finding duplicate files, since it's so easy to just copy files from one place to another. You should do this before processing anything, and if you make a surprise discovery, you need to process those files before you use dupeGuru.
Feel free to mail me when you have made more progress if you want to discuss anything. My program is something of a platform for all kinds of crazy ideas that I have and want to code, so it's not strange that it does some otherwise weird things.
Best regards,
Fredrik
Using your own code, you could make a very useful application. It should be given a path, and it should iterate over all files in this directory and its subdirectories. It should do the safest thing, i.e. make the jpg smaller in such a way that it is pixel-by-pixel identical. When a jpg is found, let mozjpeg create something like jpg.tmp. If this file is smaller, the original file can be overwritten with the contents of jpg.tmp. Either way, jpg.tmp should be deleted afterwards. This is better than deleting the original file and then renaming jpg.tmp, since that would cause fragmentation. Most jpg files from your smartphone camera can be reduced by 6% or so, and most people have folders with lots of jpeg files.
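(A rough sketch of that flow in C#. It shells out to jpegtran as one possible lossless step; the assumption is that jpegtran is on the PATH, and the options are the ones quoted earlier in this thread:

```csharp
// Sketch: write the optimised output to a temp file, overwrite the original
// only if the result is smaller, and always clean up the temp file.
using System.Diagnostics;
using System.IO;

static class LosslessShrink
{
    public static void Optimize(string jpgPath)
    {
        string tmp = jpgPath + ".tmp";
        var psi = new ProcessStartInfo
        {
            FileName = "jpegtran", // assumed to be on the PATH
            Arguments = $"-optimize -progressive -copy none -outfile \"{tmp}\" \"{jpgPath}\"",
            UseShellExecute = false,
        };
        using (var p = Process.Start(psi)) p?.WaitForExit();

        if (File.Exists(tmp))
        {
            if (new FileInfo(tmp).Length < new FileInfo(jpgPath).Length)
                File.Copy(tmp, jpgPath, overwrite: true); // copy contents rather than
            File.Delete(tmp);                             // rename, to limit fragmentation
        }
    }
}
```

)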
Best regards,
Fredrik