csv batch search and replace #10

Closed
brendanheywood opened this issue Sep 18, 2024 · 0 comments

The use case here is replacing a large number of similar URLs where the IDs need to be explicitly remapped, e.g. migrating from service A to service B:

```
search,replace
https://www.youtube.com/watch?v=abcd,https://video.com/watch?123456
https://www.youtube.com/watch?v=bcde,https://video.com/watch?234567
https://www.youtube.com/watch?v=cdef,https://video.com/watch?345678
```

Because most of the URLs will have a similar shape, we can do things like pre-scanning with the shared prefix to filter out records we don't need to touch, which in theory makes this much faster than doing a simple replace_all once for each mapping.
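
As a minimal sketch of that pre-scan, the shared prefix can be computed from the search keys themselves (the helper function and mapping data below are just illustrative, not an existing API):

```php
<?php
// Longest common prefix of all search keys; illustrative helper, not an
// existing API. The result can be used as a single LIKE '%prefix%' filter
// so we only fetch records that could possibly contain a match.
function common_prefix(array $keys): string {
    $prefix = array_shift($keys);
    foreach ($keys as $key) {
        // Trim the candidate prefix until the current key starts with it.
        while ($prefix !== '' && strncmp($key, $prefix, strlen($prefix)) !== 0) {
            $prefix = substr($prefix, 0, -1);
        }
    }
    return $prefix;
}

$mappings = [
    'https://www.youtube.com/watch?v=abcd' => 'https://video.com/watch?123456',
    'https://www.youtube.com/watch?v=bcde' => 'https://video.com/watch?234567',
    'https://www.youtube.com/watch?v=cdef' => 'https://video.com/watch?345678',
];
echo common_prefix(array_keys($mappings));
// https://www.youtube.com/watch?v=
```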

We also need to make sure that in any given chunk of content each key is mapped to exactly one value, and that replacements don't get applied multiple times. In general we wouldn't expect this in most use cases anyway, i.e. in the example we would never have a YouTube URL on the right-hand side which could then be matched again by the left-hand side on a second pass.
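
If the replace ends up happening in PHP, it's worth noting that strtr() with an array of pairs already gives exactly this guarantee: it walks the content once, prefers the longest matching key, and never re-scans text it has just substituted. A small sketch using the example mappings:

```php
<?php
$mappings = [
    'https://www.youtube.com/watch?v=abcd' => 'https://video.com/watch?123456',
    'https://www.youtube.com/watch?v=bcde' => 'https://video.com/watch?234567',
];

$content = 'See https://www.youtube.com/watch?v=abcd and https://www.youtube.com/watch?v=bcde';

// Single pass over $content; replaced text is never matched again, so even
// if a replacement value contained another search key it would be left alone.
echo strtr($content, $mappings);
// See https://video.com/watch?123456 and https://video.com/watch?234567
```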

It is assumed that we can't do this easily in SQL, so we need to do some performance testing of doing this in PHP as a batch replace with all the IO, versus doing it in SQL repeatedly. It may also be a hybrid, i.e. we search in SQL for the records which contain the shared prefix, then use that subset of record ids to do a faster pure SQL replace repeatedly for each mapping. It all comes down to speed testing. It is assumed that in the general case the number of mappings could be huge, like 100,000s, so the gut feel is that doing it in PHP is going to end up better overall.
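
A rough sketch of what the hybrid could look like, assuming Moodle's $DB API and hypothetical $table / $field / $prefix / $mappings variables from the earlier sketches (one possible shape, not a settled design): pre-filter in SQL on the shared prefix, then do the multi-key replace in PHP and write back only the rows that actually changed.

```php
<?php
// Hypothetical hybrid: SQL narrows the candidate set, PHP does the replace.
$like = $DB->sql_like($field, ':needle');
$params = ['needle' => '%' . $DB->sql_like_escape($prefix) . '%'];

$rs = $DB->get_recordset_select($table, $like, $params, '', "id, $field");
foreach ($rs as $record) {
    $new = strtr($record->$field, $mappings);
    // Only write back real changes; re-running after a crash naturally
    // skips rows that were already migrated, which keeps this idempotent.
    if ($new !== $record->$field) {
        $DB->set_field($table, $field, $new, ['id' => $record->id]);
    }
}
$rs->close();
```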

This whole process should be idempotent by design, so if it breaks we just run it again, it picks up from where it left off and keeps going, and it's no big deal.

Also unclear is whether we want to load the data into a temporary db table or just into memory. This probably depends on whether we end up with a pure SQL replace or a PHP replace.
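
For scale: even hundreds of thousands of short URL pairs should fit in memory on a typical server, though PHP array overhead is worth measuring; a temp table mainly makes sense if the pure SQL route wins. A minimal sketch of the in-memory load, assuming a header row as in the example and a hypothetical mappings.csv path:

```php
<?php
// Minimal in-memory load of the mapping CSV; the filename is hypothetical.
$mappings = [];
$fh = fopen('mappings.csv', 'r');
fgetcsv($fh); // Skip the search,replace header row.
while (($row = fgetcsv($fh)) !== false) {
    if (count($row) >= 2) {
        $mappings[$row[0]] = $row[1];
    }
}
fclose($fh);
```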

One thing currently unclear is whether we want to leave behind any audit trail of changes. This would be quite hard in most cases, but if we know we are replacing data in, say, the course table, then we could emit a Moodle event so there is evidence of the edit. The status quo here is no.
