Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

always sort assets by filename (Windows/Linux difference) #2236

Merged
merged 6 commits into from
Jul 10, 2023

Conversation

wold5
Copy link
Contributor

@wold5 wold5 commented Jun 29, 2023

For me the page.assets items are sorted differently on Windows and Linux. Zola on Windows sorts assets alphabetically, for Linux the sorting is kinda arbitrary as shown below:

aaa.txt
ddd.txt
ccc.txt
bbb.txt

It's no big deal, and could also be fixed using page.assets | sort, but this differing between platforms is not what one would expect.

A small patch is attached. The patch modifies find_related_assets to use the walkdir option sort_by_file_name (docs hints that the order isn't guaranteed otherwise). Feel free to accept or reject the patch.

Using

Zola version: 1.7.2
Windows 11 + Debian (PopOS 22.04)

Step to reproduce

Re-create the asset files using the scripts below. They're reversed, checking if Linux sorts by date (it does not).

Windows powershell

New-Item ddd.txt
sleep 2
New-Item ccc.txt
sleep 2
New-Item bbb.txt
sleep 2
New-Item aaa.txt

Linux/BSD

touch ddd.txt
sleep 2
touch ccc.txt
sleep 2
touch bbb.txt
sleep 2
touch aaa.txt

@Keats
Copy link
Collaborator

Keats commented Jun 29, 2023

Looking at the WalkDir function, I don't know if it does case insensitive ordering, I'm guessing it doesn't from the source? Might be needed for Mac, I'll check later tonight.

@wold5
Copy link
Contributor Author

wold5 commented Jun 30, 2023

Looking at the WalkDir function, I don't know if it does case insensitive ordering, I'm guessing it doesn't from the source? Might be needed for Mac, I'll check later tonight.

Good point. It uses case-sensitive sorting (checked both Windows and Linux), i.e.

AAA.txt
BBB.txt
DDD.txt
aaa.txt
bbb.txt

Unpatched Windows builds do case-insensitive sorts.

There's currently no toggle for this in WalkDir . An issue was followed by a PR, that was refused and in limbo since. In summary, sorting Unicode is complex, and the dev doesn't want more dependencies and suggests using sort_by instead, with some supplied function.
The PR provides many references with background info. For Rust there's caseless and icu_collator.

Suggests using Walkdir sort_by, or sorting the assets vector outside WalkDir, or dropping the issue altogether...

@Keats
Copy link
Collaborator

Keats commented Jun 30, 2023

We can just keep the default unsorted walkdir and sort it ourselves after lowercasing the filenames. I think it's what most people expect but I could be wrong!

@wold5
Copy link
Contributor Author

wold5 commented Jun 30, 2023

We can just keep the default unsorted walkdir and sort it ourselves after lowercasing the filenames. I think it's what most people expect but I could be wrong!

KISS then :)

So something like this before returning assets:

// Only ascii lowercase - 
assets.sort_by(|a, b| 
    a.file_name().unwrap().to_ascii_lowercase()
        .cmp(&b.file_name().unwrap().to_ascii_lowercase()
    )
);

// or for all lowercase convert to_str
assets.sort_by(|a, b| 
    a.file_name().unwrap().to_str().unwrap().to_lowercase()
        .cmp(&b.file_name().unwrap().to_str().unwrap().to_lowercase()
    )
);

I'm new to Rust BTW ;)

@Keats
Copy link
Collaborator

Keats commented Jun 30, 2023

KISS is my motto. First one is fine

Uses .to_str() to sort files and subfolders.

The .unwrap() may need work or be replaced by unwrap_or_default(). Given
earlier checks in the function it should work however.
@wold5
Copy link
Contributor Author

wold5 commented Jul 5, 2023

OK. It now uses .to_str() to also take the subdirectories into account.

Note sure about unwrap(). Given the checks in the function it should do (else use unwrap_or_default()). Tried match, but I really need to read up on Rust first, so I'll leave it at this for now.

@Keats
Copy link
Collaborator

Keats commented Jul 5, 2023

Thanks it looks good. Do you know how to write tests in Rust? Otherwise I can do it.

@wold5
Copy link
Contributor Author

wold5 commented Jul 5, 2023

Thanks it looks good. Do you know how to write tests in Rust? Otherwise I can do it.

Great. Tests, no, but I can try. You can always replace it.

This would require extending the test below (at least), creating additional files like aaa.txt bbb.txt CCC.txt ddd.txt and then checking their order within assets, not?

#[test]
fn can_find_related_assets_recursive() {
let tmp_dir = tempdir().expect("create temp dir");
let path = tmp_dir.path();
File::create(path.join("index.md")).unwrap();
File::create(path.join("example.js")).unwrap();
File::create(path.join("graph.jpg")).unwrap();
File::create(path.join("fail.png")).unwrap();
File::create(path.join("extensionless")).unwrap();
create_dir(path.join("subdir")).expect("create subdir temp dir");
File::create(path.join("subdir").join("index.md")).unwrap();
File::create(path.join("subdir").join("example.js")).unwrap();
let assets = find_related_assets(path, &Config::default(), true);
assert_eq!(assets.len(), 4);
assert_eq!(assets.iter().filter(|p| p.extension().unwrap_or_default() != "md").count(), 4);
for asset in ["example.js", "graph.jpg", "fail.png", "subdir/example.js"] {
assert!(assets.iter().any(|p| p.strip_prefix(path).unwrap() == Path::new(asset)))
}
}

@Keats
Copy link
Collaborator

Keats commented Jul 5, 2023

Yep and since they will be sorted you can remove the loop in

for asset in ["example.js", "graph.jpg", "fail.png", "subdir/example.js"] {
and compare by the array directly

@wold5
Copy link
Contributor Author

wold5 commented Jul 6, 2023

Just dawned on me what you meant... at least I hope so :)

Alternatively, see the now removed standalone check using windows() in aeb9668

Apply this to can_find_related_assets_non_recursive as well?
The change is now also applied to the non-recursive test function.

fn can_find_related_assets_non_recursive() {

@Keats Keats merged commit afc0e2c into getzola:next Jul 10, 2023
@Keats
Copy link
Collaborator

Keats commented Jul 10, 2023

Nice, thanks!

peterprototypes pushed a commit to peterprototypes/zola that referenced this pull request Sep 12, 2023
* sort page.assets by filename

Uses .to_str() to sort files and subfolders.

The .unwrap() may need work or be replaced by unwrap_or_default(). Given
earlier checks in the function it should work however.

* add tests for assets sorting

* fix rustfmt

* use existing loop instead of windows

* also check the non-recursive test

* use .zip() and add assert msg
technimad pushed a commit to technimad/zola that referenced this pull request Sep 30, 2023
* sort page.assets by filename

Uses .to_str() to sort files and subfolders.

The .unwrap() may need work or be replaced by unwrap_or_default(). Given
earlier checks in the function it should work however.

* add tests for assets sorting

* fix rustfmt

* use existing loop instead of windows

* also check the non-recursive test

* use .zip() and add assert msg
Keats pushed a commit that referenced this pull request Dec 18, 2023
* sort page.assets by filename

Uses .to_str() to sort files and subfolders.

The .unwrap() may need work or be replaced by unwrap_or_default(). Given
earlier checks in the function it should work however.

* add tests for assets sorting

* fix rustfmt

* use existing loop instead of windows

* also check the non-recursive test

* use .zip() and add assert msg
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants