Notes for USBB Process
- Use the geo functions in BigQuery to aggregate the data to geographic areas
- Export that to geojson (or csv)
- Use ogr2ogr + tippecanoe to generate MBTiles of the results, and
- Use Mapbox or OpenMapTiles to serve them to the application. Locally you can use the klokantech/tileserver-gl Docker image to serve your .mbtiles file (the OpenMapTiles container, by contrast, sets up an account and downloads tiles from their service). Stuart shared this command:
docker run --rm -it -v $(pwd):/data -p 8080:80 klokantech/tileserver-gl
Alternatively, use Kitematic/Docker to run the container.
- M-Lab Data (ndt)
- FCC Form 477 Data
- Geo Data
- Census Tracts
- Counties
- State House
- State Senate
- Congress
- Zip
- Dec 2014 = BETWEEN '2014-07-01' AND '2014-12-31'
- Jun 2015 = BETWEEN '2015-01-01' AND '2015-06-30'
- Dec 2015 = BETWEEN '2015-07-01' AND '2015-12-31'
- Jun 2016 = BETWEEN '2016-01-01' AND '2016-06-30'
- Dec 2016 = BETWEEN '2016-07-01' AND '2016-12-31'
- Jun 2017 = BETWEEN '2017-01-01' AND '2017-06-30'
Available in M-Lab, but not in the FCC data:
- Coming soon: Dec 2017 = BETWEEN '2017-07-01' AND '2017-12-31'
- Coming soon: Jun 2018 = BETWEEN '2018-01-01' AND '2018-06-30'
- Coming soon: Dec 2018 = BETWEEN '2018-07-01' AND '2018-12-31'
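The period boundaries above amount to a simple rule: months 01-06 map to the jun_YYYY label and months 07-12 to dec_YYYY. A minimal sketch of that rule as a shell function (the function name is ours, not part of the pipeline):

```shell
# Hypothetical helper: map a partition_date (YYYY-MM-DD) to the
# time_period label used in the date ranges listed above.
time_period() {
  local d=$1
  local year=${d:0:4}
  local month=$((10#${d:5:2}))   # 10# avoids octal parsing of "08", "09"
  if [ "$month" -le 6 ]; then
    echo "jun_${year}"           # Jan 1 - Jun 30
  else
    echo "dec_${year}"           # Jul 1 - Dec 31
  fi
}

time_period 2014-12-31   # -> dec_2014
time_period 2015-03-01   # -> jun_2015
```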
- M-Lab Data (ndt)
- download speed
- upload speed
- asn (not currently included; may be added later)
- lat, long
- rtt
- date
count(test_id) as count_tests
count(distinct connection_spec.client_ip) as count_ips
APPROX_QUANTILES(8 * SAFE_DIVIDE(web100_log_entry.snap.HCThruOctetsAcked,
    (web100_log_entry.snap.SndLimTimeRwin +
     web100_log_entry.snap.SndLimTimeCwnd +
     web100_log_entry.snap.SndLimTimeSnd)), 101)[SAFE_ORDINAL(51)] AS download_Mbps
APPROX_QUANTILES(8 * SAFE_DIVIDE(web100_log_entry.snap.HCThruOctetsReceived,
    web100_log_entry.snap.Duration), 101)[SAFE_ORDINAL(51)] AS upload_Mbps
APPROX_QUANTILES(web100_log_entry.snap.MinRTT, 101)[SAFE_ORDINAL(51)] AS min_rtt
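A sanity check on the units in the expressions above: the web100 octet counters are in bytes and the SndLimTime* fields are in microseconds, so 8 × bytes / microseconds comes out directly in megabits per second. (Taking element 51 of 101 approximate quantiles is how the query picks the median.) A quick awk illustration with made-up counter values:

```shell
# Made-up web100 snapshot values, just to show the unit arithmetic.
awk 'BEGIN {
  HCThruOctetsAcked = 12500000                 # bytes acked during the test
  SndLimTime = 4000000 + 5000000 + 1000000     # usec: Rwin + Cwnd + Snd
  printf "%.1f Mbps\n", 8 * HCThruOctetsAcked / SndLimTime
}'
# -> 10.0 Mbps
```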
#standardSQL
SELECT
'census_tract' AS geo,
count(test_id) as ml_count_tests,
count(distinct connection_spec.client_ip) as ml_count_ips,
APPROX_QUANTILES(8 * SAFE_DIVIDE(web100_log_entry.snap.HCThruOctetsAcked,
(web100_log_entry.snap.SndLimTimeRwin +
web100_log_entry.snap.SndLimTimeCwnd +
web100_log_entry.snap.SndLimTimeSnd)), 101)[SAFE_ORDINAL(51)] AS ml_download_Mbps,
APPROX_QUANTILES(8 * SAFE_DIVIDE(web100_log_entry.snap.HCThruOctetsReceived,web100_log_entry.snap.Duration), 101)[SAFE_ORDINAL(51)] AS ml_upload_Mbps,
APPROX_QUANTILES(web100_log_entry.snap.MinRTT, 101)[SAFE_ORDINAL(51)] AS ml_min_rtt,
GEOID,
AFFGEOID,
CASE
WHEN partition_date BETWEEN '2014-07-01' AND '2014-12-31' THEN 'dec_2014'
WHEN partition_date BETWEEN '2015-01-01' AND '2015-06-30' THEN 'jun_2015'
WHEN partition_date BETWEEN '2015-07-01' AND '2015-12-31' THEN 'dec_2015'
WHEN partition_date BETWEEN '2016-01-01' AND '2016-06-30' THEN 'jun_2016'
WHEN partition_date BETWEEN '2016-07-01' AND '2016-12-31' THEN 'dec_2016'
WHEN partition_date BETWEEN '2017-01-01' AND '2017-06-30' THEN 'jun_2017'
WHEN partition_date BETWEEN '2017-07-01' AND '2017-12-31' THEN 'dec_2017'
WHEN partition_date BETWEEN '2018-01-01' AND '2018-06-30' THEN 'jun_2018'
WHEN partition_date BETWEEN '2018-07-01' AND '2018-12-31' THEN 'dec_2018'
END AS time_period
FROM
`measurement-lab.release.ndt_all`,
`mlab-sandbox.usa_geo.cb_2016_census_tracts`
WHERE
connection_spec.server_geolocation.country_name = "United States"
AND partition_date BETWEEN '2014-07-01' AND '2018-12-31'
AND ST_WITHIN(ST_GeogPoint(connection_spec.client_geolocation.longitude , connection_spec.client_geolocation.latitude ), tract_polygons)
GROUP BY
GEOID,
time_period,
AFFGEOID
- FCC Form 477 Data
- FRN
- Provider_Name
- DBA_Name
- Holding_Company_Number
- Holding_Company_Final
- Census_Block_FIPS_Code
- State
- Technology_Code
- Max_Advertised_Downstream_Speed__mbps_
- Max_Advertised_Upstream_Speed__mbps_
- Business [0,1] - Boolean?
- Consumer [0,1] - Boolean?
- Max_CIR_Downstream_Speed__mbps_
- Max_CIR_Upstream_Speed__mbps_
- time_period
Output fields, per time_period: location, median download, median upload, count_isps, max advertised down, max advertised up
location = Census FIPS code, which rolls up to County, Tract, State House, and State Senate
Census Block Code: 06|067|001101|1085 corresponds to: 06 = State | 067 = County | 001101 = Tract | 1085 = Block
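Given that layout, the parts of a 15-digit block code can be sliced by position; a small shell illustration:

```shell
# Slice a 15-digit census block code into its components
# (2 state + 3 county + 6 tract + 4 block digits).
fips=060670011011085
echo "state=${fips:0:2} county=${fips:2:3} tract=${fips:5:6} block=${fips:11:4}"
# -> state=06 county=067 tract=001101 block=1085
```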
do we think this is accurate? https://i.stack.imgur.com/sF4tS.png from here: https://gis.stackexchange.com/questions/55239/which-census-geography-boundaries-do-congressional-districts-preserve
- Query M-Lab data, case by time period, spatial joined to geometry in BQ
- Save to table
- Export table as CSV
- ogr2ogr to tippecanoe to create mbtiles from geojson
- Repeat for each geometry
- Export csv of FCC data with Census Tract GEOID
- download all of them from GCP
gsutil cp gs://bucket/path/* ./
- tbd.
By default, ogr2ogr treats all CSV columns as text fields. To fix this, you can provide a schema file with the same name as the .csv but with a .csvt extension; the format is documented in the ogr2ogr docs. I used a CSV processing tool called xsv to generate the .csvt semi-automatically.
$ xsv select '!WKT' mlab_county_dec2014_dec2018_429.csv | \
xsv stats | \
xsv select type | \
tail -n +2 | \
sed 's/.*/"&"/' | \
sed 's/Unicode/String/g' | \
sed 's/Float/Real/g' | \
tr '\n' , > mlab_county_dec2014_dec2018_429.csvt
$ echo '"WKT"' >> mlab_county_dec2014_dec2018_429.csvt
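For reference, the resulting .csvt is a single line of quoted type names, one per CSV column, with "WKT" appended last to match the geometry column (this assumes WKT is the final column of the CSV). A hypothetical example for a file with a name column, two numeric columns, and the geometry:

```
"String","Integer","Real","WKT"
```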
By specifying /dev/stdout as the output file for ogr2ogr and /dev/stdin as the input file for tippecanoe, both can run as one Unix pipeline. -oo KEEP_GEOM_COLUMNS=no keeps ogr2ogr from copying the WKT-encoded geometry into the output as an attribute; keeping it would be wasteful since the GeoJSON geometry carries the same information. By default, ogr2ogr looks for WKT geometry in a column literally named WKT.
$ ogr2ogr -f GeoJSON /dev/stdout \
-oo KEEP_GEOM_COLUMNS=no \
mlab_county_dec2014_dec2018_429.csv | \
tippecanoe -o mlab_county_dec2014_dec2018_429.mbtiles \
-l mlab_county_dec2014_dec2018 /dev/stdin -zg
$ xsv join --help
...
Usage:
xsv join [options] <columns1> <input1> <columns2> <input2>
...
$ xsv join --full \
GEOID fcc_477_county_2014_2017_0515.csv \
GEOID mlab_county_dec2014_dec2018_429.csv | \
xsv select '!NAME,GEOID,WKT' > fcc_mlab_county_dec2014_dec2018_429.csv
$ (
echo 'GEOID'; \
xsv select 'GEOID' mlab_census_tracts_2014_2018_0430.csv | \
tail -n +2 | \
xargs printf '%011d\n' \
) | \
xsv cat columns - <(xsv select '!GEOID' mlab_census_tracts_2014_2018_0430.csv) > ~/mlab_adjusted_census_tracts.csv
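The pipeline above exists because tract GEOIDs are 11 digits, and for states with a leading-zero FIPS code the zeros were dropped somewhere along the export (presumably because the column was treated as numeric). The printf step restores them:

```shell
# Left-pad a numeric tract GEOID back to its canonical 11 digits.
printf '%011d\n' 6067001101
# -> 06067001101
```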
$ xsv join --full GEOID fcc477_ct_json.csv GEOID \
~/mlab_adjusted_census_tracts.csv | xsv select '!GEOID,WKT' > fcc_mlab_census_tract.csv