Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Postgre: Culture is case sensitive #14810

Open
MikeKry opened this issue Dec 2, 2023 · 14 comments
Open

Postgre: Culture is case sensitive #14810

MikeKry opened this issue Dec 2, 2023 · 14 comments
Labels
Milestone

Comments

@MikeKry
Copy link
Contributor

MikeKry commented Dec 2, 2023

After migration from SQLite to PostgreSQL database, Culture became case sensitive.

I have found (so far) 2 issues:
1/ GraphQL now requires lowercased language - "en-gb" instead of "en-GB"
2/ culture filter in content items uses "en-GB" for search, but nothing is found unless you type in "en-gb"

not sure what is correct solution, but maybe culture should be stored in original format, not lowercase

@hishamco
Copy link
Member

hishamco commented Dec 2, 2023

Please steps to reproduce

@MikeKry
Copy link
Contributor Author

MikeKry commented Dec 2, 2023

@hishamco
Culture picker:
0/ run project on postgre
1/ Click on Culture picker in Content Items, select language
2/ after reload, list is empty
image
3/ manually type in filter "cs-cz"
4/ results are visible
image

GraphQL:
1/ same as for culture picker. Difference is only that case sensitivity is inside where condition. e.g. (where: {localization: {culture: "en-GB"}}) vs (where: {localization: {culture: "en-gb"}})

@hishamco
Copy link
Member

hishamco commented Dec 2, 2023

How is this related to GraphQL?

@hishamco hishamco self-assigned this Dec 2, 2023
@MikeKry
Copy link
Contributor Author

MikeKry commented Dec 2, 2023

I think it is not related to graphql at all, most likely it will be a problem either with case sensitivity in postgre or filter implementation for localization part

@hishamco
Copy link
Member

hishamco commented Dec 2, 2023

I will test this today and push a PR for the fix

@MikeKry
Copy link
Contributor Author

MikeKry commented Dec 2, 2023

btw. this seems to apply to all searches - also text/title search is case sensitive

@hishamco
Copy link
Member

hishamco commented Dec 2, 2023

Could try to create a COLLATION to avoid case sensitive checks

https://dba.stackexchange.com/questions/101294/how-to-create-postgres-db-with-case-insensitive-collation

@hishamco hishamco removed their assignment Dec 2, 2023
@MikeKry
Copy link
Contributor Author

MikeKry commented Dec 4, 2023

@hishamco

Okay, I tried following:

CREATE COLLATION public.case_insensitive (provider = icu, locale = 'und-u-ks-level2', deterministic = false);
ALTER TABLE "ContentItemIndex" ALTER COLUMN "DisplayText" SET DATA TYPE character varying(255) COLLATE "case_insensitive";
ALTER TABLE "LocalizedContentItemIndex" ALTER COLUMN "Culture" SET DATA TYPE character varying(16) COLLATE "case_insensitive";

1/ culture picker (LocalizedContentItemIndex) behaves OK.
2/ displayText (ContentItemIndex) filtration breaks - PostgresException: 0A000: nondeterministic collations are not supported for LIKE.

I guess for search, we need to use "ILIKE" instead.

Select * from "ContentItemIndex" where "DisplayText" ILIKE '%footer%'
instead of
Select * from "ContentItemIndex" where "DisplayText" LIKE '%footer%'

There are some other options.. https://aws.amazon.com/blogs/database/manage-case-insensitive-data-in-postgresql/

@sebastienros Is this issue even related to OC, or should I move it to yessql repo?

@MikeKry
Copy link
Contributor Author

MikeKry commented Dec 4, 2023

@hishamco

I tried this CITEXT data type as mentioned in link to amazon above. https://www.postgresql.org/docs/current/citext.html#CITEXT-LIMITATIONS

CREATE EXTENSION IF NOT EXISTS citext;
ALTER TABLE "ContentItemIndex" ALTER COLUMN "DisplayText" SET DATA TYPE CITEXT COLLATE pg_catalog."default";

This seems to work also with LIKE, but you cannot use case sensitive searches when you use this datatype. I think it is not an issue as we use only case insensitive search in other database providers. Another problem might be no length limit and slower execution?

There might be another option using indexes... https://coderwall.com/p/6yhsuq/improve-case-insensitive-queries-in-postgres-using-smarter-indexes

@sebastienros
Copy link
Member

I think we should handle any collation you decide to use, and this property should be stored normalized (lower-case for instance).

But if this is impacting too many features we might expect the collation to be case insensitive.

@MikeKry
Copy link
Contributor Author

MikeKry commented Mar 16, 2024

@sebastienros

So you suggest to normalize values to lowercase in application? I think that will have impact on all indexes - it has to be handled on every place where data is stored into index, so I guess mostly in index handlers + on every user input (search bar etc.) + might have impact on existing custom integrations e.g. via graphql.

I am okay with that, but it seems complicated at this point.

I would probably prefer to update yessql logic for postgre, as there are some other problems anyway. I have found out that also modifying column size breaks migrations in postgre, so it maybe should be revisited completely..

@MikeKry
Copy link
Contributor Author

MikeKry commented Apr 8, 2024

@sebastienros

Can you please provide a little guidance so I could implement it and create a PR?

@sebastienros
Copy link
Member

For the culture code we should store it normalized, so everywhere we set it we should ToLowerInvariant() it. And the queries will have to do so too. We should also do a migration to convert all existing records.

For the DisplayText search it's a collation issue. If we use the field directly then there is nothing we can do in OC, just configure the DB with that in mind. Or use a default case-insensitive collation when creating the DB. Or if we want orchard to behave in a consistent way whatever the collation then we should do it in the ContentItemIndex too and the search queries.

If we need more than that then it's borderline a search engine issue and we should not use SQL queries for these kind of searches. (talking about DisplayText).

@MikeKry
Copy link
Contributor Author

MikeKry commented Apr 15, 2024

Okay, I will try to make PR on this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants