This repository has been archived by the owner on Jul 31, 2024. It is now read-only.

Datamap script: Update logging and switch to low-level S3 client #1480

Merged: 4 commits into mozilla:main from datamap-logging-s3-840 on Jan 14, 2021


jwhitlock
Member

Update the datamap script with changes from a test run in staging (issue #840):

  • Switch from the high-level Bucket resource to the low-level client to access the S3 bucket. Others have reported problems combining multiprocessing with the S3 Resource interfaces, which may accumulate memory due to caches or session accumulation, and have suggested that the low-level client does not share this problem. The Amazon devs are sadly quiet on the issue.
  • Add a --verbose option to force the logging options used in local development.
  • Add more progress messages for steps that were fast in the dev environment but take over a minute on staging. A new helper, watch_jobs, standardizes the job waiting loop.
  • Copy-edit some debug messages.
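
As a rough illustration of the first bullet, here is a minimal sketch of keeping one low-level S3 client per thread instead of sharing a high-level Bucket resource. It is not the code from this PR: `make_s3_client`, `get_s3_client`, `upload_file`, and the `factory` parameter are hypothetical names, and it assumes boto3 is installed and credentials are configured when the default factory is used.

```python
import threading

# Each thread gets its own attribute namespace on this object.
_local = threading.local()


def make_s3_client():
    """Create a low-level S3 client (assumes boto3 is installed)."""
    import boto3  # imported lazily so the module loads without boto3

    return boto3.client("s3")


def get_s3_client(factory=make_s3_client):
    """Return this thread's client, creating it on first use."""
    client = getattr(_local, "s3_client", None)
    if client is None:
        client = factory()
        _local.s3_client = client
    return client


def upload_file(path, bucket_name, key, factory=make_s3_client):
    """Upload one file with the per-thread client.

    The resource-based equivalent would be
    boto3.resource("s3").Bucket(bucket_name).upload_file(path, key).
    """
    get_s3_client(factory).upload_file(path, bucket_name, key)
```

The `factory` argument exists only so the caching logic can be exercised with a stand-in client. In a multiprocessing pool, the same idea applies per worker process.
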

Set logging to DEBUG and human-centered output with --verbose
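
A minimal sketch of how a `--verbose` flag might switch logging to DEBUG with human-readable output. The function names, the format strings, and the non-verbose defaults are assumptions for illustration, not the script's actual configuration.

```python
import argparse
import logging


def parse_args(argv=None):
    """Parse command-line options (only --verbose is shown here)."""
    parser = argparse.ArgumentParser(description="Illustrative datamap options.")
    parser.add_argument(
        "--verbose",
        action="store_true",
        help="log at DEBUG level with human-readable output",
    )
    return parser.parse_args(argv)


def configure_logging(verbose):
    """Set up root logging and return the level that was applied."""
    level = logging.DEBUG if verbose else logging.INFO
    # Assumed formats: timestamped lines when verbose, bare messages otherwise.
    fmt = "%(asctime)s %(levelname)s %(message)s" if verbose else "%(message)s"
    # force=True (Python 3.8+) replaces any handlers configured earlier.
    logging.basicConfig(level=level, format=fmt, force=True)
    return level
```
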
Fix a message that printed "tile0" rather than "tiles"
Create watch_jobs to generalize the logic of waiting for async jobs to
complete. Add progress reports to the CSV export and the quadtree
generation, which are much slower in stage on a 1-processor node.
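
The waiting loop described above could look roughly like the following. This is a sketch under assumptions, not the helper from this PR: the signature, the `label`, `poll_seconds`, and `report_seconds` parameters, and the log wording are all hypothetical. It assumes each job exposes `ready()` and `get()`, as `multiprocessing.pool.AsyncResult` does.

```python
import logging
import time

logger = logging.getLogger(__name__)


def watch_jobs(jobs, label="jobs", poll_seconds=1.0, report_seconds=10.0):
    """Wait for every job to finish and return results in submission order.

    Logs a progress line at most once per report_seconds while waiting.
    """
    total = len(jobs)
    pending = dict(enumerate(jobs))
    results = [None] * total
    last_report = time.monotonic()

    while pending:
        # Collect results from jobs that have finished since the last poll.
        for index in [i for i, job in pending.items() if job.ready()]:
            results[index] = pending.pop(index).get()

        if pending:
            now = time.monotonic()
            if now - last_report >= report_seconds:
                done = total - len(pending)
                logger.info("%s: %d of %d complete", label, done, total)
                last_report = now
            time.sleep(poll_seconds)

    return results
```

Because `get()` re-raises any exception from the worker, a failed job surfaces in the caller rather than being silently dropped.
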
The S3 service resource allocates more memory each time it is used,
possibly due to caching or session creation. Some have solved this by
switching to the low-level client, initialized in each thread.
@jwhitlock jwhitlock merged commit eaccd16 into mozilla:main Jan 14, 2021
@jwhitlock jwhitlock deleted the datamap-logging-s3-840 branch January 14, 2021 02:49