Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

I0617 rescu sensors eval #626

Merged
merged 38 commits into from
Aug 23, 2024
Merged
Show file tree
Hide file tree
Changes from 21 commits
Commits
Show all changes
38 commits
Select commit Hold shift + click to select a range
f40e716
initial commit for outage sql
gabrielwol May 31, 2023
734c7cd
speed improvements
gabrielwol May 31, 2023
40bc613
use scannon's method for identifying streaks
gabrielwol May 31, 2023
ff977eb
sql to identify sensors to repair
gabrielwol May 31, 2023
b6e962c
add comments
gabrielwol May 31, 2023
3cfcb2d
Add png
gabrielwol May 31, 2023
43c8d64
Rename file
gabrielwol May 31, 2023
52b7b29
Change threshold re:change to network outages sql
gabrielwol May 31, 2023
322c698
fluff
gabrielwol May 31, 2023
f376ffd
Reverted individual sensor outages to include network outages
gabrielwol May 31, 2023
a8a59e9
Small investigation into findings.
gabrielwol May 31, 2023
f7035fb
renamed folder
gabrielwol May 31, 2023
948212e
rename part 2
gabrielwol May 31, 2023
0054e24
fluff
gabrielwol Jun 2, 2023
4142f4b
Additional comments
gabrielwol Jun 12, 2023
2330f40
#617 Add on-going outage to network outages table
gabrielwol Jun 12, 2023
844b0f4
#617 initial commit for rescu_sensors_eval_readme.txt
gabrielwol Jun 12, 2023
3049fc6
reorganized into sql folder
gabrielwol Jun 12, 2023
0b24327
add more detail
gabrielwol Jun 12, 2023
c81d082
#617 change readme to md
gabrielwol Jun 12, 2023
337075a
Merge branch 'master' into i0617_rescu_sensors_eval
radumas Jun 21, 2023
af1bc27
#617 move readme to folder
gabrielwol Jul 31, 2023
f3e8e11
Merge branch 'i0617_rescu_sensors_eval' of github.com:CityofToronto/b…
gabrielwol Jul 31, 2023
ff5a7ef
#617 simplify query - remove CTE
gabrielwol Aug 1, 2023
dfc8d6e
#617 fluff fix
gabrielwol Aug 1, 2023
8d96a45
#617 split up readme as per Raph
gabrielwol Aug 1, 2023
3ba3af5
Merge branch 'master' into i0617_rescu_sensors_eval
radumas Sep 22, 2023
cffabe6
Merge branch 'master' into i0617_rescu_sensors_eval
gabrielwol Apr 16, 2024
a2a20a1
Merge branch 'master' into i0617_rescu_sensors_eval
scann0n Jun 11, 2024
d77062c
#617 adapt sqls to vds schema
gabrielwol Jun 25, 2024
901087b
#617 sqlfluff
gabrielwol Jul 29, 2024
8ed1c97
#617 remove the outdated validation exercise
gabrielwol Jul 30, 2024
cfb654f
#617 fix up the network_outages and individual_outages views only
gabrielwol Jul 30, 2024
4045015
Merge branch 'master' into i0617_rescu_sensors_eval
gabrielwol Aug 8, 2024
2e56282
#617 fix bullet point formatting
gabrielwol Aug 21, 2024
4b97cc3
typofix #617
radumas Aug 21, 2024
8e1d3d6
#617 fix broken links, md tables
gabrielwol Aug 23, 2024
69bbf5a
Merge branch 'master' into i0617_rescu_sensors_eval
radumas Aug 23, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
59 changes: 59 additions & 0 deletions volumes/rescu/rescu_sensors_eval_readme.md
gabrielwol marked this conversation as resolved.
Show resolved Hide resolved
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
This PR accomplishes identification of network wide and individual detector outages.

1) **volumes/rescu/create-view-network-outages.sql** - Creates a table of network wide RESCU outages (no values from any sensor over any duration).
This can be used to find good dates for a data request or to be part of an alert pipeline.
For example, here is some sample code to start with a list of eligible dates for a request and exclude any date with a network outage of any length:

```
gabrielwol marked this conversation as resolved.
Show resolved Hide resolved
--use case: find dates and times when there were no network outages:
--there are 34 dates so far in 2023 with no network wide outages.
WITH list_dates AS (
SELECT generate_series('2023-01-01', '2023-06-12', '1 day'::interval)::timestamp AS date
)

SELECT l.date
FROM list_dates AS l
LEFT JOIN gwolofs.network_outages AS nout ON
nout.date_start <= l.date AND
nout.date_end >= l.date
WHERE nout.date_start IS NULL
```

2) **volumes/rescu/create-mat-view-individual-outages.sql** - identify individual detector outages. Could be useful for future data requests.
Similar to network outages but for each individual detector. Currently network wide and individual outages overlap due to difficulty of separating overlapping datetime ranges.

Here is a sample query which finds a list of dates where a list of sensors are all active with no individual outages.
gabrielwol marked this conversation as resolved.
Show resolved Hide resolved
```
gabrielwol marked this conversation as resolved.
Show resolved Hide resolved
WITH list_dates AS (
SELECT generate_series('2023-01-01', '2023-05-15', '1 day'::interval)::timestamp AS date
),

detectors AS (
SELECT detector_id
FROM rescu.detector_inventory
WHERE det_group = 'FGG' --all gardiner sensors
)

SELECT l.date
FROM list_dates AS l
CROSS JOIN detectors AS d
LEFT JOIN gwolofs.rescu_individual_outages AS iout ON
iout.detector_id = d.detector_id AND
iout.date_start <= l.date AND
iout.date_end >= l.date
WHERE iout.time_start IS NULL
GROUP BY 1
HAVING COUNT(*) = (SELECT COUNT(*) FROM detectors)
```

3) **volumes/rescu/Data validation/identify_sensors_to_repair_i0617.sql** - Create a list of detectors classified as good/bad/inactive to map for geographic distribution.
-Uses table 'network_outages' to elimante dates where all detectors are inactive from the denominator of up-time to get more realistic numbers.
-Create a summary table of detectors with different stats: total_volume, bins_active_percent, bins_active, last_active.
-Includes summary stats over a list of dates for active sensors and 'last active date' for any sensors not active during this period for completeness.
gabrielwol marked this conversation as resolved.
Show resolved Hide resolved

4) **volumes/rescu/Data validation/identify_detectors_for_repair.png** - visual output from above sql used to determine sensors to repair.
gabrielwol marked this conversation as resolved.
Show resolved Hide resolved

5) **volumes/rescu/validation/evaluate_rescu_network.ipynb**
A short ipynb to explore the results of the above queries.


64 changes: 64 additions & 0 deletions volumes/rescu/sql/create-mat-view-individual-outages.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
DROP MATERIALIZED VIEW gwolofs.rescu_individual_outages;

CREATE MATERIALIZED VIEW gwolofs.rescu_individual_outages
AS

--make a table of outages by rescu detector.
--Outages includes network wide outages. Couldn't resolve overlapping individual and network wide outages using this method (no longer cross joining timebins and detectors).
--could be possible using future range operator: anymultirange - anymultirange → anymultirange ("Computes the difference of the multiranges.")
--runs in 2 minutes for all of rescu.volumes15min
--doesn't include outages that end after last data point (by detector)

WITH non_zero_bins AS (
SELECT
detector_id,
datetime_bin,
volume_15min
FROM rescu.volumes_15min
WHERE COALESCE(volume_15min, 0) > 0
gabrielwol marked this conversation as resolved.
Show resolved Hide resolved
),

bin_gaps AS (
SELECT
detector_id,
datetime_bin,
volume_15min,
LAG(
datetime_bin, 1
) OVER (
PARTITION BY detector_id ORDER BY datetime_bin
) + interval '15 MINUTES' AS gap_start,
datetime_bin - interval '15 MINUTES' AS gap_end,
CASE
WHEN
datetime_bin - LAG(
datetime_bin, 1
) OVER (PARTITION BY detector_id ORDER BY datetime_bin) = '00:15:00' THEN 0
ELSE 1
END AS bin_break, --identify non-consecutive bins
datetime_bin - LAG(
datetime_bin, 1
) OVER (PARTITION BY detector_id ORDER BY datetime_bin) AS bin_gap --duration of gap
FROM non_zero_bins
)

SELECT
bt.detector_id,
bt.gap_start AS time_start,
bt.gap_end AS time_end,
(bt.gap_start)::date AS date_start,
(bt.gap_end)::date AS date_end,
tsrange(bt.gap_start, bt.gap_end, '[]') AS time_range,
daterange((bt.gap_start)::date, (bt.gap_end)::date, '[]') AS date_range,
EXTRACT(
EPOCH FROM (bt.gap_end + interval '15 MINUTES') - bt.gap_start
) / 86400 AS duration_days
FROM bin_gaps AS bt
WHERE
bt.bin_break = 1
AND bt.bin_gap IS NOT NULL
--Start and end are inclusive so interval 15 minutes implies a gap of 2 bins or 30 minutes
AND bt.gap_end - bt.gap_start >= interval '15 MINUTES'
ORDER BY
bt.detector_id,
bt.gap_start
55 changes: 55 additions & 0 deletions volumes/rescu/sql/create-view-network-outages.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
--network outages using partition, lag (sarah's method).
--much faster, runs in 1s for all of rescu.
--omits outage that is ongoing right now.
DROP VIEW gwolofs.network_outages;

CREATE VIEW gwolofs.network_outages AS

WITH rescu_summary AS (
SELECT
datetime_bin,
--occasional timeslot where a single sensor reports zero
--and no other data. Need to use sum and not count as criteria.
COALESCE(SUM(volume_15min), 0) AS volume
FROM rescu.volumes_15min
GROUP BY 1
HAVING COALESCE(SUM(volume_15min), 0) > 0

UNION

SELECT --quick way to include ongoing outage
now()::timestamp AS datetime_bin,
1 AS volume
ORDER BY 1
),

bin_gaps AS (
SELECT
datetime_bin,
volume,
datetime_bin - interval '15 MINUTES' AS gap_end,
CASE
WHEN
datetime_bin - LAG(datetime_bin, 1) OVER (ORDER BY datetime_bin) = '00:15:00' THEN 0
ELSE 1
END AS bin_break,
datetime_bin - LAG(datetime_bin, 1) OVER (ORDER BY datetime_bin) AS bin_gap,
LAG(datetime_bin, 1) OVER (ORDER BY datetime_bin) + interval '15 MINUTES' AS gap_start
FROM rescu_summary
)

-- calculate the start and end times of gaps that are longer than 15 minutes
SELECT
gap_start AS time_start,
gap_end AS time_end,
(gap_start)::date AS date_start,
(gap_end)::date AS date_end,
tsrange(gap_start, gap_end, '[]') AS time_range,
daterange((gap_start)::date, (gap_end)::date, '[]') AS date_range,
--no minimum duration for a network wide outage
EXTRACT(EPOCH FROM gap_end - gap_start) / 86400 AS duration_days
FROM bin_gaps
WHERE
bin_break = 1
AND bin_gap IS NOT NULL
ORDER BY datetime_bin;
gabrielwol marked this conversation as resolved.
Show resolved Hide resolved
Loading