Use hashed folder names instead of filenames for images #6232

Closed
LekoArts opened this issue Jun 29, 2018 · 16 comments
Labels
help wanted Issue with a clear description that the community can help with.

Comments

@LekoArts
Contributor

LekoArts commented Jun 29, 2018

Summary

Since more and more people ask (#3132, prismicio/prismic-gatsby#40, Discord) about the image filenames and how they could remove/change the hash (Reason: SEO) I propose to move the hash out of the filename into a folder name.

Before:
/static/netlify-and-discord-1c11979b664a737c4d748b895d4507a7-b4bbd.webp

After:
/static/1c11979b664a737c4d748b895d4507a7-b4bbd/netlify-and-discord.webp
OR
/static/1c11979b664a737c4d748b895d4507a7/b4bbd/netlify-and-discord.webp

The details of the folder names and structure still need to be defined according to what makes the most sense. The second structure would allow each image (with its different sizes) to have its own folder.
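To make the three layouts concrete, here is a minimal sketch that builds each URL from the same parts. The helper `imageUrls` and its parameter names are hypothetical and only illustrate the proposal; they are not taken from Gatsby.

```javascript
// Hypothetical helper (not part of Gatsby): builds the three URL shapes
// discussed above from the same file name and digests.
function imageUrls({ name, contentDigest, argsDigestShort, ext }) {
  const file = `${name}.${ext}`
  return {
    // Current scheme: both hashes are part of the file name.
    before: `/static/${name}-${contentDigest}-${argsDigestShort}.${ext}`,
    // Proposal 1: both hashes form a single folder name.
    singleFolder: `/static/${contentDigest}-${argsDigestShort}/${file}`,
    // Proposal 2: one folder per image, one sub-folder per set of options.
    nestedFolders: `/static/${contentDigest}/${argsDigestShort}/${file}`,
  }
}

const urls = imageUrls({
  name: "netlify-and-discord",
  contentDigest: "1c11979b664a737c4d748b895d4507a7",
  argsDigestShort: "b4bbd",
  ext: "webp",
})
console.log(urls)
```

In both proposed layouts the last path segment is the plain, human-readable file name.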

Motivation

The idea came up while looking at the URLs that Cloudinary serves. They do the same, and I think it would make people's complaints go away.

Cloudinary: https://cloudinary.com/blog/how_to_dynamically_create_seo_friendly_urls_for_your_site_s_images#dynamic_seo_suffixes

@LekoArts LekoArts added the type: question or discussion Issue discussing or asking a question about Gatsby label Jun 29, 2018
@vinberdon

As the person who just brought it up in the Discord, honestly, I would like some way to control the hashing entirely. Some projects just have no use for it at all and it gets in the way of 1) SEO and 2) my sanity when remoted into a server via SSH. If a way to change the hash already exists, please enlighten me!

@KyleAMathews
Contributor

This sounds cool! I can't think of any reason not to do this. It'd be a pretty simple change to

const imgSrc = `/${file.name}-${
  file.internal.contentDigest
}-${argsDigestShort}.${fileExtension}`
const filePath = path.join(process.cwd(), `public`, `static`, imgSrc)

The only change there would be to move the hash to be its own folder and then use the fs-extra api to ensure the directory is created before writing out the file.

This would also help avoid hitting per-folder file limits on some OSs.
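A rough sketch of what that change could look like, under some assumptions: the helper `writeHashedImage` and its arguments are hypothetical, the nested `contentDigest/argsDigestShort` layout is just one of the options from the summary, and Node's built-in `fs.mkdirSync` with `recursive: true` stands in for fs-extra's `ensureDirSync` so that the example has no dependencies.

```javascript
const fs = require("node:fs")
const os = require("node:os")
const path = require("node:path")

// Hypothetical helper: writes `data` to
// <publicDir>/static/<contentDigest>/<argsDigestShort>/<name>.<ext>
function writeHashedImage(publicDir, image, data) {
  const { name, contentDigest, argsDigestShort, ext } = image
  const dir = path.join(publicDir, "static", contentDigest, argsDigestShort)
  // Create any missing parent folders; this does not fail if they already exist.
  fs.mkdirSync(dir, { recursive: true })
  const filePath = path.join(dir, `${name}.${ext}`)
  fs.writeFileSync(filePath, data)
  return filePath
}

// Usage against a throwaway directory instead of a real `public` folder.
const publicDir = fs.mkdtempSync(path.join(os.tmpdir(), "hashed-folders-"))
const written = writeHashedImage(
  publicDir,
  {
    name: "netlify-and-discord",
    contentDigest: "1c11979b664a737c4d748b895d4507a7",
    argsDigestShort: "b4bbd",
    ext: "webp",
  },
  Buffer.from("placeholder bytes, not a real image")
)
console.log(path.relative(publicDir, written))
```

The extra cost compared with the flat layout is one directory check or creation per write, which is the additional I/O that would need to be measured.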

@KyleAMathews
Contributor

We'd also need to check this doesn't hurt performance any from the increased I/O

@vinberdon

What we discussed in the Discord was that compiling the site would take longer because it would redo all the images? That part I'm not too sure about. Other users seemed to think that having excess data in the filename is not an issue. It is for me, though.

As for moving the hash to the folder... I would hope those are not folders being put into public-html because that would change the entire file path and would screw up SEO even more by making the paths (URLs) to pages/content return a 404 and have the same content moved to a new URL.

@vinberdon

Kyle, it looks like I could just change those lines right there and it would stop the hashing? Is that only for images? Is there any way at all to just drop the hashing entirely?

@KyleAMathews
Contributor

> Is there any way at all to just drop the hashing entirely?

We will never support dropping it; 99% of people want hashing :-) It's critical for ensuring sites are fast. I apologize that it offends your sense of aesthetics. I'm happy to support minor tweaks like this for images assuming it doesn't cause any build or site performance regressions.

@vinberdon

It looks like I can just configure webpack on my own ( https://webpack.js.org/guides/caching/ ) and remove the hash settings that are built into gatsby's version, correct?

The OP here about changing directories is not good for SEO or external (or even hard-coded internal) links at all. Many things will break and SEO will plummet. I understand that this is a niche request, so I'll just have to find my own way to do it.

Is something like this able to be made into a plugin? Like remove-filename-hash-plugin so that I can easily implement it into any future Gatsby build?

@KyleAMathews
Contributor

> Is something like this able to be made into a plugin? Like remove-filename-hash-plugin so that I can easily implement it into any future Gatsby build?

Yup! Plugins can modify our default webpack settings. Stuff like how gatsby-plugin-sharp works isn't modifiable, so you'd have to live with the defaults there.
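For illustration only, such a plugin's `gatsby-node.js` might look roughly like the sketch below. It relies on Gatsby's `onCreateWebpackConfig` hook with `getConfig` and `actions.replaceWebpackConfig`. The plugin itself, the `stripHash` helper, and the assumption that the production output templates look like `[name]-[contenthash].js` are hypothetical and should be checked against the Gatsby version in use.

```javascript
// gatsby-node.js of a hypothetical "remove-filename-hash" plugin (sketch only).
// Assumption: the production JS build uses output templates such as
// "[name]-[contenthash].js"; verify this for your Gatsby version.
const HASH_TOKEN = /-?\[(?:content|chunk)?hash(?::\d+)?\]/g

// Removes hash placeholders from a webpack filename template.
function stripHash(template) {
  // Only string templates are rewritten; anything else is passed through.
  return typeof template === "string" ? template.replace(HASH_TOKEN, "") : template
}

function onCreateWebpackConfig({ stage, getConfig, actions }) {
  // Leave development and HTML-rendering stages untouched.
  if (stage !== "build-javascript") return
  const config = getConfig()
  config.output = {
    ...config.output,
    filename: stripHash(config.output.filename),
    chunkFilename: stripHash(config.output.chunkFilename),
  }
  actions.replaceWebpackConfig(config)
}

module.exports = { onCreateWebpackConfig }
```

This only touches webpack's own bundles. Images written by gatsby-plugin-sharp keep their hashes, and unhashed bundle names give up the long-term caching that the hashes exist for.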

@vinberdon

Also, the hashes don't mean that a site would be faster, it means that the client would load the correct file when a new build is pushed. Content can and will be cached by the client and it will be fast regardless. What makes Gatsby fast is the static pages, not the hashing on the filenames.

@KyleAMathews
Contributor

> Content can and will be cached by the client and it will be fast regardless.

If file names don't change, then you can't do long-term caching, which means the browser still has to make a request for every file. The server can of course respond that the resource hasn't changed, but that is a lot of extra requests, which slows things down, sometimes considerably.
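As a minimal sketch of that trade-off, a server could pick its `Cache-Control` header based on whether the path carries a content hash. The `cacheControlFor` helper and the rule "a 32-character hex segment means hashed" are assumptions made for this example, not something Gatsby ships.

```javascript
// Hypothetical rule: a 32-character hex run, delimited by "/", "-" or ".",
// is treated as a content hash.
const HASHED = /(^|[\/\-.])[0-9a-f]{32}([\/\-.]|$)/

function cacheControlFor(urlPath) {
  return HASHED.test(urlPath)
    // The URL changes whenever the content changes, so the browser may keep
    // the file for a year without ever asking the server again.
    ? "public, max-age=31536000, immutable"
    // The URL is stable, so the browser has to revalidate on every use.
    : "public, max-age=0, must-revalidate"
}

console.log(cacheControlFor("/static/1c11979b664a737c4d748b895d4507a7/b4bbd/netlify-and-discord.webp"))
console.log(cacheControlFor("/static/netlify-and-discord.webp"))
```

With the second header the browser still sends a conditional request for each file on every visit, which is the extra traffic described above.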

@vinberdon

You can do long-term caching without hashes, you just can't force a client to update content when new content is pushed out unless you change the filename in some fashion. The file names changing has nothing to do with the ability to cache, just the ability to ensure new content is always loaded when applicable.

For my case, really the only things that are cacheable long-term and need to be updated when changed would be images, and if the image is changing, the filename is changing anyway (unless it's just "_____logo.png", in which case I could just slightly change the filename to, say, v2 or "__square" or whatever).

I'll look into making a plugin at some point, but right now, I'm just getting my feet wet with Gatsby. So far, it appears to be perfect for me, but the hashing is giving me pause. If I can get rid of it for my own use right now and play around with the system a lot more and enjoy it, then I will definitely just make a plugin for it.

@KyleAMathews
Contributor

We'd rather not have to educate everyone about always changing file names when they change something :-) That's an extremely easy thing to forget to do. Also with gatsby-image, we generate multiple thumbnails per image so we have to change the filename somehow to distinguish between them.
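To illustrate why something beyond the original file name is needed, the sketch below derives a short digest from the transform options, so each thumbnail of the same image gets its own path. The `argsDigestShort` helper, the md5 choice, and the five-character length are assumptions for this example rather than a statement of how gatsby-plugin-sharp does it.

```javascript
const crypto = require("node:crypto")

// Hypothetical helper: short, stable digest of the transform options.
function argsDigestShort(args) {
  // Sort the keys so the same options in a different order hash identically.
  const sorted = Object.keys(args).sort().map(key => [key, args[key]])
  const digest = crypto.createHash("md5").update(JSON.stringify(sorted)).digest("hex")
  // Keep only the last five hex characters, as in the "b4bbd" example.
  return digest.slice(-5)
}

const small = argsDigestShort({ width: 200, quality: 50 })
const large = argsDigestShort({ width: 800, quality: 50 })
const smallReordered = argsDigestShort({ quality: 50, width: 200 })

// Same source image, different options: two distinct output paths.
console.log(`/static/<contentDigest>/${small}/netlify-and-discord.webp`)
console.log(`/static/<contentDigest>/${large}/netlify-and-discord.webp`)
```

Here `<contentDigest>` is a placeholder for the hash of the source file. A five-character digest can in principle collide, so this is a sketch rather than a guarantee of uniqueness.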

@LekoArts LekoArts added the help wanted Issue with a clear description that the community can help with. label Jun 30, 2018
@LekoArts
Contributor Author

LekoArts commented Jun 30, 2018

> We'd also need to check this doesn't hurt performance any from the increased I/O

> I'm happy to support minor tweaks like this for images assuming it doesn't cause any build or site performance regressions.

I hope someone can tackle this and see how it performs :)

@simplesessions

I would just like to add another 👍 for this fantastic solution, as it solves a particular issue with using this with Prismic, where the image filenames that are downloaded already have a hash prepended. I was thinking of proposing a way to provide a name transformation function for certain downloaded filetypes so we could remove the original hash before it moves through the system, but this is better.

@Sekhmet
Contributor

Sekhmet commented Oct 2, 2018

It's time to put my Gatsby shirt on and work on some PRs again.

I will work on that.

@sirichards

Would love a solution to this! We have just built our first Gatsby website for a client who is over the moon; however, a third-party SEO company has come back to us saying our image names are not SEO friendly because of the hash.

@DSchau DSchau closed this as completed in 2a66958 Oct 29, 2018
gpetrioli pushed a commit to gpetrioli/gatsby that referenced this issue Jan 22, 2019
 (gatsbyjs#8808)

**Related issue:** gatsbyjs#6232

@LekoArts mentioned two ways of implementing it: one that concatenates `argsDigest` with `contentDigest`, and another that uses `contentDigest` as one directory and `argsDigest` as a child directory, so that images are grouped together and the number of directories in the `static` directory is limited. This PR implements the latter.

### Performance impact
I measured the performance of the original solution and the one from this PR 5 times each (ignoring the first build after changing the Gatsby version, and cleaning the cache every time) by running `gatsby build` on the example [using-gatsby-image](https://github.com/gatsbyjs/gatsby/tree/master/examples/using-gatsby-image).
Tested on Linux, Ryzen 2700X and 970 EVO.

#### Original (`2.0.18`)
- `14.70`
- `14.08`
- `13.65`
- `14.22`
- `13.80`

**Average:** `14.09`

#### This PR:
- `13.77`
- `13.51`
- `13.61`
- `13.78`
- `13.66`

**Average:** `13.666`

So this results in a roughly 3% improvement in build times (it might be just luck, but at least it doesn't increase build time).
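The arithmetic behind those averages and the 3% figure can be reproduced with a few lines. The numbers are copied from the two lists above; the PR does not state their unit, so none is assumed here.

```javascript
// Build times as reported above (unit not stated in the PR).
const original = [14.70, 14.08, 13.65, 14.22, 13.80]
const thisPr = [13.77, 13.51, 13.61, 13.78, 13.66]

const mean = xs => xs.reduce((sum, x) => sum + x, 0) / xs.length

const meanOriginal = mean(original)
const meanThisPr = mean(thisPr)
// Relative improvement of this PR over the original, in percent.
const improvement = ((meanOriginal - meanThisPr) / meanOriginal) * 100

console.log(meanOriginal.toFixed(2)) // 14.09
console.log(meanThisPr.toFixed(3)) // 13.666
console.log(improvement.toFixed(1)) // 3.0
```

With only five runs per variant and a spread of about one unit in the original runs, a difference of this size is within what run-to-run noise could plausibly produce, which matches the "might be just luck" caveat.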
6 participants