speed of GLM grid generation for ABI CONUS fixed grid #68

Open

jlc248 opened this issue Apr 9, 2020 · 2 comments

jlc248 commented Apr 9, 2020

I'm working on the master branch, creating GLM gridded files for the GOES East ABI CONUS fixed grid (shape=(2500,1500)).

I could be misremembering, but the speed seems a bit slow: processing 1 min of data at a time (three L2 GLM files) takes about 20 sec. make_GLM_grids.py also seems to gobble up all of the available CPU threads on the machine (40 in my case).

Does that cadence seem about right to you? When I created a lot of GLM data for the CONUS fixed grid two years ago, I remember it being much faster, but I could be wrong. If there is a slowdown, is it mainly due to the computation of min_flash_area, or something else? Is there a more optimal way to process a bunch of data at once?

Lastly, is there a way to limit the number of CPU threads that make_GLM_grids.py uses? Or is that not possible/not recommended because it would slow things down?
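
In case it's relevant: one thing I could try, assuming the extra threads come from NumPy's BLAS/OpenMP backend rather than from glmtools itself (just an assumption on my part), is capping the native thread pools before NumPy gets imported. A sketch:

```python
# Hedged sketch: cap native thread pools *before* NumPy (and anything that
# links BLAS/OpenMP) is imported. If the parallelism instead comes from a
# multiprocessing pool inside glmtools, these variables won't help.
import os

for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS",
            "MKL_NUM_THREADS", "NUMEXPR_NUM_THREADS"):
    os.environ[var] = "4"   # or "1" to keep each process single-threaded

import numpy as np  # imported only after the limits are set
```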

I understand it's a lot of processing to unravel all of the parent<->child relationships, so if this is just the way it is, that's totally fine. I'm just curious. And it's plenty fast for real-time processing.

deeplycloudy (Owner) commented

@jlc248 Looking at the creation timestamps on the CONUS grids I create for Unidata's THREDDS, that's pretty close to what I'm getting. There was a performance regression related to .min() in pandas that caused me some problems, too; it is as yet unfixed upstream.

Regarding threads, I thought I had turned off the parallel processing for both polygon clipping and another spot where I had put in a foundation for some tiling. If you search for "pool" in the source, that should turn up all the locations where there could be parallelism, but I do see 600% CPU on the AWS instance I'm running right now. Maybe there's something to fix there…
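
If one of those pool call sites turns out to be the culprit, bounding its worker count would look roughly like this (a sketch only, with a made-up function name, not the actual glmtools code):

```python
# Hypothetical sketch of bounding the worker count where a multiprocessing
# Pool is used; "clip_chunk" is a stand-in name, not a real glmtools function.
import multiprocessing

def clip_chunk(chunk):
    # placeholder for the real per-chunk work (e.g., polygon clipping)
    return chunk

def run_in_parallel(chunks, n_workers=4):
    # processes=n_workers bounds how many CPU cores this stage can use
    with multiprocessing.Pool(processes=n_workers) as pool:
        return pool.map(clip_chunk, chunks)
```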


zxdawn commented May 27, 2021

@deeplycloudy I suppose we could test the speed of dask_groupby(). That should improve things a lot.
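
Roughly the idea, as a minimal sketch using plain dask.dataframe rather than the dask_groupby package itself (the column names here are made up, not the real GLM variable names):

```python
# Illustration of a chunked, parallel groupby-aggregation with dask.
# "flash_parent_id" and "event_energy" are hypothetical stand-ins for the
# GLM parent/child fields.
import numpy as np
import pandas as pd
import dask.dataframe as dd

# Fake event-level data keyed by a parent flash id.
events = pd.DataFrame({
    "flash_parent_id": np.random.randint(0, 1000, size=1_000_000),
    "event_energy": np.random.random(1_000_000),
})

# Partition the table and aggregate per parent flash in parallel.
ddf = dd.from_pandas(events, npartitions=8)
per_flash = ddf.groupby("flash_parent_id")["event_energy"].agg(["sum", "min", "count"])
result = per_flash.compute()  # runs the partitions in parallel
print(result.head())
```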
