Monitor: Throttle EnviroDIY values to 2 weeks #2710
Conversation
@azavea-bot rebuild
Tested!
Is there a way to test/know if this threshold is sufficient to prevent site crashes? That's my main criterion.
When fetching a year of values, it would take forever and freeze the VM locally. After limiting to a month, most variables end up with values within the timeout (although it still takes almost the entirety of the 60-second limit). Those that don't make the limit just take too long; they don't freeze the VM. I don't expect to see any crashes like the ones we saw initially with this limit in place. We can further restrict the values to 1 week or 1 day if necessary, based on the response seen on staging.
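The 60-second limit described above could, as one sketch, be enforced per fetch with a thread-pool timeout. This is a hypothetical helper, not the app's actual implementation; `fn` stands in for the real value-fetching call:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def fetch_with_timeout(fn, timeout=60, **kwargs):
    """Run fn(**kwargs), giving up after `timeout` seconds (hypothetical helper)."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, **kwargs)
    try:
        return future.result(timeout=timeout)
    except FutureTimeout:
        return None  # caller treats None as "request timed out"
    finally:
        # don't block the web request waiting for a still-running worker
        pool.shutdown(wait=False)
```

Note the caveat: the underlying request keeps running in its worker thread after the timeout; this only unblocks the caller, it doesn't free the resources the fetch is consuming.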
@rajadain, thanks for testing how the EnviroDIY data loads. Indeed, the Water One Flow web service, and WaterML delivery, is fundamentally slow, and many EnviroDIY sensor stations are recording every 5 or 10 minutes (relative to USGS, which is typically 10 or 15 min). So it all makes sense. I agree with @ajrobbins that the primary concern is with crashing. Limiting to 1 month seems reasonable, but perhaps 2 weeks might make sense in order to keep things relatively snappy. @emiliom, what do you think? Since yesterday, @emiliom and @horsburgh have been discussing a speedier web service for EnviroDIY, as a potential priority for Monitor. Not for this release, however!
+1 for two weeks, for max performance and min crashing potential! |
Previously we could only specify durations in unit lengths, e.g. 1 week, 1 month, 1 year. This allows the specification of integral lengths, e.g. 2 weeks, 3 months, etc.
Fetching 96K values per variable can overload the system. By limiting EnviroDIY to 2 weeks instead of 1 year, we get a more manageable quantity.
24f8851 to c13e7ec
@arottersman could you verify this again please? Just added a refactor that allows specifying integral (rather than unit) durations, and limited EnviroDIY to 2 weeks.
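A minimal sketch of what such an integral-duration helper might look like. All names here are hypothetical illustrations; the actual refactor may differ, and month/year lengths are approximated in days:

```python
from datetime import datetime, timedelta

# Durations as (count, unit) pairs instead of bare units, so "2 weeks"
# or "3 months" can be expressed, not just "1 week" / "1 month".
UNIT_DAYS = {'day': 1, 'week': 7, 'month': 30, 'year': 365}

def duration_to_dates(count, unit, end=None):
    """Return (from_date, to_date) covering `count` `unit`s ending at `end`."""
    end = end or datetime.now()
    start = end - timedelta(days=count * UNIT_DAYS[unit])
    return start, end

# e.g. EnviroDIY limited to the last 2 weeks:
start, end = duration_to_dates(2, 'week')
```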
+2, tested for NWISUV, NWISDV, and EnviroDIY
Thanks for taking a look!
Great to see this progress. I had noticed the timeouts yesterday. BTW, hi @rajadain! Nice to interact with you again 😃 I'm intrigued by this: "When fetching a year of values, it would take forever and freeze the VM locally." Not that I'm suggesting we insist on fetching a year, but if the VM is freezing, it sounds like other mitigation / error-catching steps are in order, beyond simple service timeouts. 2 weeks seems overly short, but so be it. Longer term, it'd be good to explore whether there are strategies for staging the time series data requests to happen in the background, sequentially or on demand (I'm assuming currently they're happening "in parallel" for all variables, all at once?). Regarding @aufdenkampe's comment:
I suspect this is not just not for this release, but also not for the next month or two either! |
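The staged-request idea above could be sketched with bounded parallelism, so only a couple of time-series requests are in flight at once instead of all variables simultaneously. This is an illustration only; `fetch_variable` is a hypothetical stand-in for the per-variable WOF call:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_staged(fetch_variable, variables, max_concurrent=2):
    """Fetch each variable's time series with at most `max_concurrent`
    requests in flight, rather than firing them all at once."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return dict(zip(variables, pool.map(fetch_variable, variables)))
```

Setting `max_concurrent=1` would make the requests fully sequential.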
Hi @emiliom, we spent some time looking into the performance bottlenecks, and found that the largest resource utilization comes from fetching values via Ulmo:
Sample Script

```python
from ulmo.cuahsi import wof

@profile
def get_values():
    wsdl = 'http://data.envirodiy.org/wofpy/soap/cuahsi_1_1/.wsdl'
    site = 'EnviroDIY:JRains1'
    variable = 'envirodiy:MaxBotix_MB7386_Distance'
    from_date = '03/18/2017'
    to_date = '02/15/2018'
    wof.get_values(wsdl, site, variable, from_date, to_date)

if __name__ == '__main__':
    get_values()
```

Using the `@profile`-decorated script above, we captured the following:

CPU Profile

RAM Profile
As can be seen, fetching data for just one variable taxes the CPU and RAM considerably. In the app, we fetch data for 4-6 variables simultaneously, which can max out the resources, possibly even denying requests by other users.

The current search implementation is designed for a simple request / response cycle, which these kinds of long-running processes are ill-suited for. This design works well for CINERGI and HydroShare, but not as well for CUAHSI, which involves expensive interpretation of search results via Ulmo. It would be great if CUAHSI WDC could develop a paginated, REST-based API in the future, or if Ulmo could be tweaked to be more performant. Accommodating these performance characteristics in MMW would require considerable thought and rearchitecting.
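Absent a paginated API, one client-side approximation of pagination is to split a long date range into small windows and issue one bounded request per window. A hypothetical sketch, not part of the current implementation:

```python
from datetime import timedelta

def date_chunks(start, end, days=14):
    """Split [start, end] into windows of at most `days` days, so each
    WOF request returns a bounded number of values."""
    cur = start
    while cur < end:
        nxt = min(cur + timedelta(days=days), end)
        yield cur, nxt
        cur = nxt
```

The windows could then be fetched sequentially, or lazily as the user pans the chart.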
@rajadain, thanks for providing this information. It is very helpful to see such results.
Thank you @rajadain! That'll be very useful.
Pinging @lsetiawan just to point him to @rajadain's profiling work from yesterday (Mar 8). Don, please take a close look. We'll talk about this and follow up profiling later today. |
Overview
Fetching 96K values per variable can overload the system. By limiting EnviroDIY to 1 month instead of 1 year, we get a more manageable quantity, ~12K values per variable. This was previously done for NWISUV in #2494.
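As a back-of-the-envelope check, assuming the 5-minute recording interval mentioned elsewhere in the thread (actual stations vary between 5 and 10 minutes, so observed counts differ somewhat):

```python
# Rough value counts at a 5-minute recording interval
per_day = 24 * 60 // 5        # 288 readings per day
per_year = 365 * per_day      # 105,120 -- same order as the ~96K observed
per_month = 30 * per_day      # 8,640
per_two_weeks = 14 * per_day  # 4,032
```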
Is this acceptable @ajrobbins @aufdenkampe?
Connects #2709
Demo
Testing Instructions
Check out this branch and `bundle`
Go to :8000/ and select a shape in the Philadelphia area. Proceed to Analyze.
Switch to the Monitor tab and search for EnviroDIY. Switch to the CUAHSI tab.
If you don't see any results for EnviroDIY under CUAHSI, clear the cache and try again:
Open the Detail view of any result. Ensure it fetches values correctly and you can see them in the chart. Ensure the chart has 1 month of values.