
Monitor: Throttle EnviroDIY values to 2 weeks #2710

Merged · 2 commits into develop · Mar 7, 2018

Conversation

@rajadain (Member) commented Mar 7, 2018

Overview

Fetching 96K values per variable can overload the system. By limiting EnviroDIY to 1 month instead of 1 year, we get a more manageable ~12K values per variable. This was previously done for NWISUV in #2494.

Is this acceptable @ajrobbins @aufdenkampe?

Connects #2709

Demo

[screenshots]

Testing Instructions

  • Check out this branch and bundle

  • Go to :8000/ and select a shape in the Philadelphia area. Proceed to Analyze.

  • Switch to the Monitor tab and search for EnviroDIY. Switch to the CUAHSI tab.

    • If you don't see any results for EnviroDIY under CUAHSI, clear the cache and try again:

      vagrant ssh services -c 'redis-cli -n 1 --raw KEYS ":1:bigcz*" | xargs redis-cli -n 1 DEL'
      
  • Open the Detail view of any result. Ensure it fetches values correctly and you can see them in the chart. Ensure the chart has 1 month of values.

@rajadain (Member, Author) commented Mar 7, 2018

@azavea-bot rebuild

@arottersman left a comment

Tested!

@arottersman assigned rajadain and unassigned arottersman on Mar 7, 2018
@ajrobbins

Is there a way to test/know if this threshold is sufficient to prevent site crashes? That's my main criterion.

@rajadain (Member, Author) commented Mar 7, 2018

When fetching a year of values, it would take forever and freeze the VM locally. After limiting to a month, most variables return values within the timeout (although it still takes almost the entirety of the 60 second limit). Those that don't make the limit simply take too long; they don't freeze the VM. I don't expect to see any crashes like the ones we saw initially with this limit in place.

We can further restrict the values to 1 week or 1 day if necessary, based on the response times seen on staging.

@aufdenkampe (Member)

@rajadain, thanks for testing how the EnviroDIY data loads. Indeed, the WaterOneFlow web service, and its WaterML delivery, are fundamentally slow, and many EnviroDIY sensor stations record every 5 or 10 minutes (versus USGS, which is typically 10 or 15 min). So it all makes sense.

I agree with @ajrobbins that the primary concern is with crashing.

Limiting to 1 month seems reasonable, but perhaps 2 weeks might make sense in order to keep things relatively snappy. @emiliom, what do you think?

Since yesterday, @emiliom and @horsburgh have been discussing a speedier web service for EnviroDIY, as a potential priority for Monitor. Not for this release, however!

@ajrobbins

+1 for two weeks, for max performance and min crashing potential!

Commit messages:

  • Previously we could only specify durations in unit lengths, e.g. 1 week, 1 month, 1 year. This allows the specification of integral lengths, e.g. 2 weeks, 3 months, etc.

  • Fetching 96K values per variable can overload the system. By limiting EnviroDIY to 2 weeks instead of 1 year, we get a more manageable quantity.
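
A minimal sketch of what an integral-duration helper might look like (the function name and signature here are hypothetical, not the actual code in the commits above):

    from datetime import datetime
    from dateutil.relativedelta import relativedelta

    # Hypothetical helper: turn an integral duration such as (2, 'weeks')
    # into a (from_date, to_date) pair ending now.
    def date_range_for(count, unit):
        if unit not in ('days', 'weeks', 'months', 'years'):
            raise ValueError('Unsupported unit: {}'.format(unit))
        to_date = datetime.utcnow()
        from_date = to_date - relativedelta(**{unit: count})
        return from_date, to_date

    # Previously only unit lengths were expressible, e.g. (1, 'months');
    # integral lengths like (2, 'weeks') now work too.
    from_date, to_date = date_range_for(2, 'weeks')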
@rajadain force-pushed the tt/monitor-throttle-envirodiy branch from 24f8851 to c13e7ec on March 7, 2018 18:59
@rajadain (Member, Author) commented Mar 7, 2018

@arottersman could you verify this again please? Just added a refactor that allows specifying integral (rather than unit) durations, and limited EnviroDIY to 2 weeks.

@arottersman left a comment

+2, tested for NWISUV, NWISDV, and EnviroDIY

@rajadain changed the title from "Monitor: Throttle EnviroDIY values to 1 month" to "Monitor: Throttle EnviroDIY values to 2 weeks" on Mar 7, 2018
@rajadain merged commit 6484bad into develop on Mar 7, 2018
@rajadain deleted the tt/monitor-throttle-envirodiy branch on March 7, 2018 19:43
@rajadain (Member, Author) commented Mar 7, 2018

Thanks for taking a look!

@emiliom (Contributor) commented Mar 7, 2018

Great to see this progress. I had noticed the timeouts yesterday. BTW, Hi @rajadain! Nice to interact with you again 😃

I'm intrigued by this: "When fetching a year of values, it would take forever and freeze the VM locally."

Not that I'm suggesting we insist on fetching a year, but if the VM is freezing, it sounds like other mitigation / error-catching steps are in order, beyond simple service timeouts.

2 weeks seems overly short, but so be it. Longer term, it'd be good to explore whether there are strategies for staging the time series data requests to happen in the background, sequentially or on demand (I'm assuming they currently happen "in parallel" for all variables, all at once?).
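
One way such staging might look (a minimal sketch; fetch_variable is a hypothetical stand-in for the app's per-variable request):

    from concurrent.futures import ThreadPoolExecutor
    import time

    def fetch_variable(site, variable):
        # Stand-in for the real per-variable WOF request.
        time.sleep(1)
        return (site, variable)

    def fetch_all(site, variables, max_workers=2):
        # A small worker pool staggers the expensive requests
        # instead of issuing them all simultaneously.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(lambda v: fetch_variable(site, v), variables))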

Regarding @aufdenkampe's comment:

Since yesterday, @emiliom and @horsburgh have been discussing a speedier web service for EnviroDIY, as a potential priority for Monitor. Not for this release, however!

I suspect this is not just not for this release, but also not for the next month or two either!

@rajadain (Member, Author) commented Mar 8, 2018

Hi @emiliom,

We spent some time looking into the performance bottlenecks, and found that the largest resource utilization comes from ulmo itself. To demonstrate, consider this simple script, which fetches one year of EnviroDIY data for one variable:

Sample Script
from ulmo.cuahsi import wof

@profile  # decorator injected by kernprof / memory_profiler at runtime
def get_values():
    wsdl = 'http://data.envirodiy.org/wofpy/soap/cuahsi_1_1/.wsdl'
    site = 'EnviroDIY:JRains1'
    variable = 'envirodiy:MaxBotix_MB7386_Distance'
    from_date = '03/18/2017'
    to_date = '02/15/2018'

    wof.get_values(wsdl, site, variable, from_date, to_date)  # ~96K values

if __name__ == '__main__':
    get_values()

Using line_profiler, memory_profiler, and psutil, the CPU and RAM profiles are as follows:

CPU Profile
$ kernprof -l -v get_values.py
Wrote profile results to get_values.py.lprof
Timer unit: 1e-06 s

Total time: 84.4948 s
File: get_values.py
Function: get_values at line 4

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     4                                           @profile
     5                                           def get_values():
     6         1          5.0      5.0      0.0      wsdl = 'http://data.envirodiy.org/wofpy/soap/cuahsi_1_1/.wsdl'
     7         1          1.0      1.0      0.0      site = 'EnviroDIY:JRains1'
     8         1          0.0      0.0      0.0      variable = 'envirodiy:MaxBotix_MB7386_Distance'
     9         1          0.0      0.0      0.0      from_date = '03/18/2017'
    10         1          1.0      1.0      0.0      to_date = '02/15/2018'
    11
    12         1   84494802.0 84494802.0    100.0      wof.get_values(wsdl, site, variable, from_date, to_date)
RAM Profile
$ python -m memory_profiler get_values.py
Line #    Mem usage    Increment   Line Contents
================================================
     4   77.039 MiB   77.039 MiB   @profile
     5                             def get_values():
     6   77.039 MiB    0.000 MiB       wsdl = 'http://data.envirodiy.org/wofpy/soap/cuahsi_1_1/.wsdl'
     7   77.039 MiB    0.000 MiB       site = 'EnviroDIY:JRains1'
     8   77.039 MiB    0.000 MiB       variable = 'envirodiy:MaxBotix_MB7386_Distance'
     9   77.039 MiB    0.000 MiB       from_date = '03/18/2017'
    10   77.039 MiB    0.000 MiB       to_date = '02/15/2018'
    11
    12  241.746 MiB  164.707 MiB       wof.get_values(wsdl, site, variable, from_date, to_date)
Additional Profiling
$ python get_values.py & while sleep 1; do ps -p $! -o pcpu= -o pmem= ; done;

[plots of %CPU and %MEM sampled once per second over the run]

As can be seen, fetching data for just 1 variable taxes the CPU and RAM considerably. In the app, we fetch data for 4-6 variables simultaneously, which can max out these resources, possibly even denying requests from other users.

The current search implementation is designed for a simple request / response cycle, which is ill-suited to these kinds of long-running processes. This design works well for CINERGI and HydroShare, but not as well for CUAHSI, which involves expensive interpretation of search results via ulmo.

It would be great if CUAHSI WDC could develop a paginated, REST-based API in the future, or if ulmo could be tweaked to be more performant. Accommodating these performance characteristics in MMW would require considerable thought and rearchitecting.
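
For illustration only, here is the shape a paginated REST values API could take on the client side (the endpoint and parameters are entirely made up; no such CUAHSI API exists today):

    import requests

    # Hypothetical endpoint, purely to illustrate a paginated values API.
    BASE_URL = 'https://example.org/api/values'

    def fetch_paginated(site, variable, page_size=1000):
        values, page = [], 1
        while True:
            resp = requests.get(BASE_URL, params={
                'site': site,
                'variable': variable,
                'page': page,
                'page_size': page_size,
            })
            resp.raise_for_status()
            batch = resp.json()
            if not batch:
                break
            # Each page is small, keeping per-request CPU and RAM bounded.
            values.extend(batch)
            page += 1
        return values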

@aufdenkampe (Member)

@rajadain, thanks for providing this information. It is very helpful to see such results.

@emiliom (Contributor) commented Mar 8, 2018

Thank you @rajadain ! That'll be very useful.

@emiliom (Contributor) commented Mar 9, 2018

Pinging @lsetiawan just to point him to @rajadain's profiling work from yesterday (Mar 8). Don, please take a close look. We'll talk about this and follow-up profiling later today.
