Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

I0617 rescu sensors eval #626

Merged
merged 38 commits into from
Aug 23, 2024
Merged
Show file tree
Hide file tree
Changes from 35 commits
Commits
Show all changes
38 commits
Select commit Hold shift + click to select a range
f40e716
initial commit for outage sql
gabrielwol May 31, 2023
734c7cd
speed improvements
gabrielwol May 31, 2023
40bc613
use scannon's method for identifying streaks
gabrielwol May 31, 2023
ff977eb
sql to identify sensors to repair
gabrielwol May 31, 2023
b6e962c
add comments
gabrielwol May 31, 2023
3cfcb2d
Add png
gabrielwol May 31, 2023
43c8d64
Rename file
gabrielwol May 31, 2023
52b7b29
Change threshold re:change to network outages sql
gabrielwol May 31, 2023
322c698
fluff
gabrielwol May 31, 2023
f376ffd
Reverted individual sensor outages to include network outages
gabrielwol May 31, 2023
a8a59e9
Small investigation into findings.
gabrielwol May 31, 2023
f7035fb
renamed folder
gabrielwol May 31, 2023
948212e
rename part 2
gabrielwol May 31, 2023
0054e24
fluff
gabrielwol Jun 2, 2023
4142f4b
Additional comments
gabrielwol Jun 12, 2023
2330f40
#617 Add on-going outage to network outages table
gabrielwol Jun 12, 2023
844b0f4
#617 initial commit for rescu_sensors_eval_readme.txt
gabrielwol Jun 12, 2023
3049fc6
reorganized into sql folder
gabrielwol Jun 12, 2023
0b24327
add more detail
gabrielwol Jun 12, 2023
c81d082
#617 change readme to md
gabrielwol Jun 12, 2023
337075a
Merge branch 'master' into i0617_rescu_sensors_eval
radumas Jun 21, 2023
af1bc27
#617 move readme to folder
gabrielwol Jul 31, 2023
f3e8e11
Merge branch 'i0617_rescu_sensors_eval' of github.com:CityofToronto/b…
gabrielwol Jul 31, 2023
ff5a7ef
#617 simplify query - remove CTE
gabrielwol Aug 1, 2023
dfc8d6e
#617 fluff fix
gabrielwol Aug 1, 2023
8d96a45
#617 split up readme as per Raph
gabrielwol Aug 1, 2023
3ba3af5
Merge branch 'master' into i0617_rescu_sensors_eval
radumas Sep 22, 2023
cffabe6
Merge branch 'master' into i0617_rescu_sensors_eval
gabrielwol Apr 16, 2024
a2a20a1
Merge branch 'master' into i0617_rescu_sensors_eval
scann0n Jun 11, 2024
d77062c
#617 adapt sqls to vds schema
gabrielwol Jun 25, 2024
901087b
#617 sqlfluff
gabrielwol Jul 29, 2024
8ed1c97
#617 remove the outdated validation exercise
gabrielwol Jul 30, 2024
cfb654f
#617 fix up the network_outages and individual_outages views only
gabrielwol Jul 30, 2024
4045015
Merge branch 'master' into i0617_rescu_sensors_eval
gabrielwol Aug 8, 2024
2e56282
#617 fix bullet point formatting
gabrielwol Aug 21, 2024
4b97cc3
typofix #617
radumas Aug 21, 2024
8e1d3d6
#617 fix broken links, md tables
gabrielwol Aug 23, 2024
69bbf5a
Merge branch 'master' into i0617_rescu_sensors_eval
radumas Aug 23, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion volumes/rescu/README.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@

# *The RESCU schema is now deprecated. Please refer to [`vds` schema](../vds/readme.md).*
# *The RESCU schema is now deprecated. Please refer to [`vds` schema](../vds/readme.md).* <!-- omit in toc -->
- Use `vds.counts_15min` instead of `rescu.volumes_15min`
- Identify RESCU sensors in the new schema using `vds.detector_inventory WHERE det_type = 'RESCU Detectors'`

Expand Down
37 changes: 35 additions & 2 deletions volumes/vds/readme.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,8 @@
- [vds.counts\_15min\_bylane](#vdscounts_15min_bylane)
- [vds.veh\_speeds\_15min](#vdsveh_speeds_15min)
- [vds.veh\_length\_15min](#vdsveh_length_15min)
- [vds.individual\_outages](#vdsindividual_outages)
- [vds.network\_outages](#vdsnetwork_outages)
- [Raw Data](#raw-data)
- [vds.raw\_vdsdata](#vdsraw_vdsdata)
- [vds.raw\_vdsvehicledata](#vdsraw_vdsvehicledata)
Expand Down Expand Up @@ -142,8 +144,8 @@ The regular detectors (DET) may have some utility but it is hard to tell with th
- In some cases you may find it easier to select from only the partition of interest. eg. `FROM vds.raw_vdsdata_div2_202308` instead of more verbose ```FROM vds.raw_vdsdata WHERE division_id = 2 and dt >= '2023-08-01 00:00:00'::timestamp....```.
- For RESCU requests, make use of the manually defined fields in `vds.detector_inventory` for easy filtering.
- Data quality checks have not been implemented in this new schema. For examples of past work see:
- @scann0n did some work on identifying good date ranges for RESCU sensors based on volumes from days with all 96 15-minute bins present which is written up [here](https://github.com/CityofToronto/bdit_data-sources/blob/master/volumes/rescu/README.md#6--how-often-are-data-quality-assessment-processes-for-the-data-undertaken).
- @gabrielwol did work to identify periods of network wide or individual sensor outages on the RESCU network which is written up [here](https://github.com/CityofToronto/bdit_data-sources/blob/3ba3af5068e96191caffab524d42ae52fe7be7b2/volumes/rescu/README.md#1--are-there-known-data-gapsincomplete-data)
- @scann0n did some work on identifying good date ranges for RESCU sensors based on volumes from days with all 96 15-minute bins present which is written up [here](https://github.com/CityofToronto/bdit_data-sources/blob/master/volumes/rescu/README.md#6--how-often-are-data-quality-assessment-processes-for-the-data-undertaken).
- @gabrielwol did work to identify periods of network wide or individual sensor outages on the RESCU network which is written up [here](https://github.com/CityofToronto/bdit_data-sources/blob/3ba3af5068e96191caffab524d42ae52fe7be7b2/volumes/rescu/README.md#1--are-there-known-data-gapsincomplete-data)
gabrielwol marked this conversation as resolved.
Show resolved Hide resolved

## Lookup Tables and Views

Expand Down Expand Up @@ -394,6 +396,37 @@ Row count: 4,622,437

</div>

### vds.individual_outages

A view of individual sensor outages of at least 30 minutes. Could be used in looking for days without outages of a certain duration, or in identifying unreliable sensors with multiple smaller outages.

It is recommended to add a where clause using `vdsconfig_uid` to get faster results for a sensor.

| column_name | data_type | sample |
| vdsconfig_uid | integer | 516 |
gabrielwol marked this conversation as resolved.
Show resolved Hide resolved
| entity_location_uid | integer | 10238 |
| division_id | smallint | 2 |
| time_start | timestamp without time zone | "2004-02-26 00:15:00" |
| time_end | timestamp without time zone | "2004-02-27 00:00:00" |
| date_start | date | "2004-02-26" |
| date_end | date | "2004-02-27" |
| time_range | tsrange | "[""2004-02-26 00:15:00"",""2004-02-27 00:00:00""]" |
| date_range | daterange | "[2004-02-26,2004-02-28)" |
| duration_days | numeric | 1.0 |

### vds.network_outages

`network_outages` view can be used to identify/elimante dates from a study where all detectors are inactive. Runs in about 20s for entire network. See an example [here](https://github.com/CityofToronto/bdit_data-sources/blob/901087b032a16fa971ba18c84e84d1625be0fece/volumes/rescu/validation/evaluate_rescu_network.ipynb) where this view was used to identify number of days of network wide outages per year.

| column_name | data_type | sample |
gabrielwol marked this conversation as resolved.
Show resolved Hide resolved
| time_start | timestamp without time zone | "1993-01-11 17:00:00" |
| time_end | timestamp without time zone | "1993-01-14 06:45:00" |
| date_start | date | 1993-01-11 |
| date_end | date | 1993-01-14 |
| time_range | tsrange | ["1993-01-11 17:00:00","1993-01-14 06:45:00"] |
| date_range | daterange | [1993-01-11,1993-01-15) |
| duration_days | numeric | 2.5729166666666667 |

## Raw Data

### vds.raw_vdsdata
Expand Down
64 changes: 64 additions & 0 deletions volumes/vds/sql/views/create-view-individual-outages.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
CREATE OR REPLACE VIEW vds.individual_outages
AS

-- make a table of outages by rescu detector.
-- Outages includes network wide outages. Couldn't resolve overlapping individual and network
-- wide outages using this method (no longer cross joining timebins and detectors).
-- could be possible using future range operator: anymultirange - anymultirange → anymultirange
-- runs in 2 minutes for all of rescu.volumes15min
-- doesn't include outages that end after last data point (by detector)

WITH bin_gaps AS (
SELECT
vdsconfig_uid,
entity_location_uid,
division_id,
datetime_15min,
count_15min,
LAG(datetime_15min, 1) OVER sensor + interval '15 minutes' AS gap_start,
datetime_15min - interval '15 minutes' AS gap_end,
CASE datetime_15min - LAG(datetime_15min, 1) OVER sensor
WHEN '00:15:00' THEN 0
ELSE 1
END AS bin_break, --identify non-consecutive bins
datetime_15min - LAG(datetime_15min, 1) OVER sensor AS bin_gap --duration of gap
FROM vds.counts_15min_div2
WHERE COALESCE(count_15min, 0) > 0
WINDOW sensor AS (
PARTITION BY vdsconfig_uid, entity_location_uid, division_id ORDER BY datetime_15min
)
)

SELECT
bt.vdsconfig_uid,
bt.entity_location_uid,
bt.division_id,
bt.gap_start AS time_start,
bt.gap_end AS time_end,
(bt.gap_start)::date AS date_start,
(bt.gap_end)::date AS date_end,
tsrange(bt.gap_start, bt.gap_end, '[]') AS time_range,
daterange((bt.gap_start)::date, (bt.gap_end)::date, '[]') AS date_range,
EXTRACT(
EPOCH FROM (bt.gap_end + interval '15 minutes') - bt.gap_start
) / 86400 AS duration_days
FROM bin_gaps AS bt
WHERE
bt.bin_break = 1
AND bt.bin_gap IS NOT NULL
--Start and end are inclusive so interval 15 minutes implies a gap of 2 bins or 30 minutes
AND bt.gap_end - bt.gap_start >= interval '15 minutes'
ORDER BY
bt.vdsconfig_uid,
bt.entity_location_uid,
bt.division_id,
bt.gap_start;

ALTER TABLE vds.individual_outages OWNER TO vds_admins;

GRANT SELECT ON TABLE vds.individual_outages TO bdit_humans;

COMMENT ON VIEW vds.individual_outages IS

Check notice on line 61 in volumes/vds/sql/views/create-view-individual-outages.sql

View workflow job for this annotation

GitHub Actions / SQLFluff Lint

SQLFluff

LT01: Unnecessary trailing whitespace.
'''A view which identifies individual sensor outages in the VDS network.
Note: does not indclude ongoing outages. See vds.detector_inventory.last_active for that purpose.
Runs in around 1:30 for entire dataset or include a WHERE filter on vdsconfig_uid for speed.''';
56 changes: 56 additions & 0 deletions volumes/vds/sql/views/create-view-network-outages.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
--network outages using partition, lag (sare's method).
--much faster, runs in 1s for all of vds.
CREATE OR REPLACE VIEW vds.network_outages AS

WITH rescu_summary AS (
SELECT
datetime_15min AS datetime_bin,
SUM(count_15min) AS volume
FROM vds.counts_15min_div2
GROUP BY datetime_15min
HAVING COALESCE(SUM(count_15min), 0) > 0

UNION

SELECT --quick way to include ongoing outage
now()::date AS datetime_bin,
1 AS volume
ORDER BY datetime_bin
),

bin_gaps AS (
SELECT
datetime_bin,
volume,
datetime_bin - interval '15 minutes' AS gap_end,
CASE datetime_bin - LAG(datetime_bin, 1) OVER (ORDER BY datetime_bin)
WHEN '00:15:00' THEN 0
ELSE 1
END AS bin_break,
datetime_bin - LAG(datetime_bin, 1) OVER (ORDER BY datetime_bin) AS bin_gap,
LAG(datetime_bin, 1) OVER (ORDER BY datetime_bin) + interval '15 minutes' AS gap_start
FROM rescu_summary
)

-- calculate the start and end times of gaps that are longer than 15 minutes
SELECT
gap_start AS time_start,
gap_end AS time_end,
gap_start::date AS date_start,
gap_end::date AS date_end,
tsrange(gap_start, gap_end, '[]') AS time_range,
daterange(gap_start::date, gap_end::date, '[]') AS date_range,
--no minimum duration for a network wide outage
EXTRACT(EPOCH FROM gap_end - gap_start) / 86400 AS duration_days
FROM bin_gaps
WHERE
bin_break = 1
AND bin_gap IS NOT NULL
ORDER BY gap_start;

ALTER TABLE vds.network_outages OWNER TO vds_admins;

GRANT SELECT ON TABLE vds.network_outages TO bdit_humans;

COMMENT ON VIEW vds.network_outages
IS 'A view which identifies network wide outages in the VDS network.';