Add PostgreSQL support #255

ormsbee · 2024-11-05T21:56:57Z

This repo currently has a set of collation-related helpers that it uses to help normalize model collation behavior between MySQL and SQLite:

They manifest in the migrations files like this:

openedx-learning/openedx_learning/apps/authoring/publishing/migrations/0001_initial.py

Lines 28 to 29 in 14284d2

    
           ('key', openedx_learning.lib.fields.MultiCollationCharField(db_collations={'mysql': 'utf8mb4_bin', 'sqlite': 'BINARY'}, max_length=500)), 
        
           ('title', openedx_learning.lib.fields.MultiCollationCharField(db_collations={'mysql': 'utf8mb4_unicode_ci', 'sqlite': 'NOCASE'}, max_length=500)),

The MultiCollationCharField is also used in one place in edx-platform

Anyone looking to add PostgreSQL support would need to look for all instances of MultiCollationCharField and MultiCollationTextField and put the postgres-specific mappings there. That includes the convenience methods like case_insensitive_char_field in fields.py, as well as in the individual migration files that have the sqlite/mysql arguments serialized out.

The text was updated successfully, but these errors were encountered:

ormsbee · 2024-11-05T21:58:34Z

@qasimgulzar: These are some of the specifics that have to happen for this repo to support PostgreSQL. Thank you for taking this work on.

qasimgulzar · 2024-11-07T06:17:51Z

Thank you @ormsbee for sharing this with me. I am going to take care of it.

qasimgulzar · 2024-11-13T13:30:32Z

@ormsbee could you please add some details of the background of this field, e.g: why this field is there any why is it required to be MultiCollationCharField. It will be very helpful while implementing and testing the change.

qasimgulzar · 2024-11-13T19:43:00Z

Update.-.1.1.-.min.mov

@ormsbee I have managed to run edx-platform with postgresql. I think it's going to bring some joy here.

ormsbee · 2024-11-14T01:58:49Z

There were two primary motivations for MultiCollationCharField:

1. To force `utf8mb4` encoding, even in databases that were still configured for `utf8mb3`.

Up until Redwood, the Tutor default database connection was utf8, which in MySQL translated into their utf8mb3 encoding, which lacks support for emojis and certain other Unicode code points that require 4 bytes to encode in UTF8. Many people had already upgraded their instances through their own scripts and configuration, and some instances that didn't use Tutor were already connecting via utf8mb4.

In the middle of this weird, transitional period, we were creating new data models to store library (and eventually course) content, and we wanted to make sure that the generated tables could properly store 4-byte characters no matter what the database connection settings were for that particular instance at the time.

2. To normalize case sensitivity behavior for certain fields.

We have fields where case-sensitivity is important, like identifier keys. We've gotten into issues with this sort of mismatch in the past, where CourseOverviews store their course keys in a case insensitive manner, but Modulestore stores them in a case sensitive manner, making collision possible (e.g. two courses in Modulestore that map to the same CourseOverview). We also run many of our tests using in-memory sqlite, which is case sensitive by default–as opposed to MySQL's default case insensitivity.

Given that, it's convenient to have something like MultiCollationCharField in order to force consistent constraint-checking and ordering behavior between the two. That way, models can declare that, "this should be a case-insensitive char field" and get consistent behavior across databases, without the error-prone process of altering collations manually in the migration file.

ormsbee assigned qasimgulzar Nov 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add PostgreSQL support #255

Add PostgreSQL support #255

ormsbee commented Nov 5, 2024

ormsbee commented Nov 5, 2024

qasimgulzar commented Nov 7, 2024

qasimgulzar commented Nov 13, 2024

qasimgulzar commented Nov 13, 2024

ormsbee commented Nov 14, 2024

Add PostgreSQL support #255

Add PostgreSQL support #255

Comments

ormsbee commented Nov 5, 2024

ormsbee commented Nov 5, 2024

qasimgulzar commented Nov 7, 2024

qasimgulzar commented Nov 13, 2024

qasimgulzar commented Nov 13, 2024

ormsbee commented Nov 14, 2024

1. To force utf8mb4 encoding, even in databases that were still configured for utf8mb3.

2. To normalize case sensitivity behavior for certain fields.

1. To force `utf8mb4` encoding, even in databases that were still configured for `utf8mb3`.