Changelog

All notable changes to FlowKit will be documented in this file.

The format is based on Keep a Changelog.

Unreleased

Added

Changed

Mode is now available for use with categorical metrics when running joined spatial aggregates via api. #2021
Flowmachine now includes the version number in query ids which means cache entries are per-version. #4489
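As a rough illustration of why including the version in the query id makes cache entries per-version, consider the sketch below. The `make_query_id` helper, the definition string, the use of SHA-256 and the "1.31.0" version are all assumptions made for illustration; this is not FlowMachine's actual hashing scheme.

```python
import hashlib


def make_query_id(query_definition: str, version: str) -> str:
    """Hypothetical helper: hash a query definition together with a version string."""
    payload = f"{version}:{query_definition}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


definition = "daily_location(date='2016-01-01')"  # illustrative definition string
id_old = make_query_id(definition, "1.30.0")
id_new = make_query_id(definition, "1.31.0")  # hypothetical later version

# Same query, different versions: the ids differ, so cached results are not shared.
print(id_old != id_new)
```

Because the id changes with the version, results cached by one release are not picked up by another.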

Fixed

Fixed dangling async tasks not being properly cancelled during server shutdown #6833

Removed

1.30.0

Changed

FlowMachine now requires Python >= 3.11

Fixed

Direction enum not being recognised #6787

Removed

Removed Oracle fdw

1.29.0

Added

New flowmachine query CalendarActivity, which retrieves subscribers' patterns of active days
New flowmachine queries PerValueAggregate and RedactedPerValueAggregate, which group by the value column of another query and apply an aggregate to subscribers with that grouping.
New flowapi queries and flowclient functions for calendar_activity and localised_calendar_activity, which return counts of subscribers per sequence of active days, and per sequence of active days additionally grouped by the subscriber's reference location
Added new StringStatistic enum, which enumerates valid statistics for use with postgres string types

Changed

HistogramAggregation has moved to flowmachine.features.nonspatial_aggregates
Statistic moved to flowmachine.core.statistic_types
TotalActivePeriodsSubscriber no longer returns an extra inactive_periods column

1.28.1

Fixed

Fixed 500 error when getting api spec from FlowAPI #6686

1.28.0

Added

Added support for Parquet foreign tables using parquet_fdw

Changed

FlowKit test and synthetic data now uses parquet foreign tables.

Warning

The location of the parquet files in the container is /parquet_data, if you are testing with larger amounts of data you may wish to add an additional bind mount for this location.

FlowDB now uses declarative partitioning
FlowETL now attaches new data as partitions, rather than subtables

Warning

This change is not backwards compatible with earlier releases of FlowDB, and you will need to repopulate your deployment. We recommend combining this change with the new parquet support.

FlowETL is now built on Airflow 2.9.2

1.27.0

Added

Added FlowDB table infrastructure.invalid_cell_info for recording cell information that could not be included in infrastructure.cell_info (including cells with null or duplicate cell IDs). #6626
The file name of the config file that FlowDB automatically generates at init can now be specified by setting the AUTO_CONFIG_FILE_NAME environment variable. By default this is postgresql.configurator.conf.

Changed

FlowDB now triggers an ANALYZE on newly created cache tables to generate statistics rather than waiting for autovacuum
FlowDB now produces JSON formatted logs by default. Set FLOWDB_LOG_DEST=csvlog for the old default behaviour.
The logging destination of FlowDB can now be configured at init by setting the FLOWDB_LOG_DEST environment variable, valid options are stderr, csvlog, and jsonlog.
The location inside the container of FlowDB's automatically generated config file has changed to /flowdb_autoconf/$AUTO_CONFIG_FILE_NAME.

1.26.0

Changed

FlowDB now enables partitionwise aggregation planning by default
FlowDB now uses a default fillfactor of 100 for cache table indexes
EXCLUDE constraint on FlowDB infrastructure.cell_info table requires unique mno_cell_id across all simultaneously-valid cells per cells_table_version, regardless of to_include. #6626

Fixed

Queries that have multiple of the same subquery with different parameters no longer cause duplicate scopes in tokens. #6580
FlowETL QA checks count_imeis, count_imsis, max_msisdns_per_imei and max_msisdns_per_imsi now only count non-null IMEIs/IMSIs. #6619

1.25.0

Fixed

FlowETL get_qa_checks no longer attempts to create duplicate tasks for QA checks defined in the DAG folder. #6494

Removed

Removed flowpyter-task from the FlowETL Docker image. For a Docker image with flowpyter-task included, see flowminder/flowbot (https://hub.docker.com/r/flowminder/flowbot).

1.24.0

Added

Test and synthetic data generators now perform QA checks on the generated data. #6467
Added new /qa endpoint to FlowAPI and FlowClient, which supports getting the results of QA checks run by FlowETL #2704
Added new available_qa_checks property to flowmachine Connection objects #2704
Added new get_qa_checks method to flowmachine Connection objects #2704

Fixed

Test QA check IDs are now of the same format as those produced by FlowETL. #6472
FlowAuth now runs migrations correctly on startup. #6480

1.23.0

Changed

MostFrequentLocation now breaks ties based on the last used location, instead of by arbitrary Postgres sort order. #6268
Users no longer have write access to the public schema in FlowDB following a change introduced in PostgreSQL 15
FlowDB is now built on PostgreSQL 16, debian bullseye

Warning

You may need to update your docker version to use newer releases of FlowDB. You will also need to create a fresh database and reimport data if you are upgrading from a previous FlowDB release.
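The MostFrequentLocation tie-breaking rule listed under Changed for this release can be pictured with a small pure-Python sketch. The function name and the (timestamp, location) input shape are assumptions for illustration; the real query runs as SQL in FlowDB.

```python
from collections import Counter
from datetime import datetime


def most_frequent_location(events):
    """Return the most visited location; ties go to the most recently used one.

    `events` is a list of (timestamp, location) pairs (hypothetical input shape).
    """
    counts = Counter(location for _, location in events)
    last_seen = {}
    for timestamp, location in events:
        if location not in last_seen or timestamp > last_seen[location]:
            last_seen[location] = timestamp
    # Sort key: visit count first, then the time of the last visit.
    return max(counts, key=lambda loc: (counts[loc], last_seen[loc]))


events = [
    (datetime(2016, 1, 1, 9), "A"),
    (datetime(2016, 1, 1, 12), "B"),
    (datetime(2016, 1, 2, 9), "A"),
    (datetime(2016, 1, 2, 18), "B"),
]
# A and B tie on visit count; B was used last, so B is returned.
print(most_frequent_location(events))
```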

1.22.0

Added

FlowETL sensor NRowsPresentSensor which checks for a specified minimum number of rows.

Changed

ForeignStagingTableOperator will now error if the underlying file cannot be read or the command returns an error. #5763
Flowmachine now requires SQLAlchemy >= 2.0.0 #6066

1.21.1

Added

Changed

Upgraded Python dependencies

Fixed

Removed

1.21.0

Added

Added new FlowDB tables infrastructure.cell_info and infrastructure.cells_table_versions to keep track of changes to the cell info over time (note: the new tables have not yet replaced infrastructure.cells as the source of cell information for FlowKit queries). #6184

1.20.0

Changed

Updated flowpyter-task to 1.1.0

Removed

Removed AutoFlow. #6394

1.19.1

Added

Added flowpyter-task to FlowETL container

1.19.0

Added

FlowETL now updates a new table events.location_ids each time a new day of CDR data is ingested, to record the first and last date that each location ID appears in the data. #5376
New FlowETL QA check "count_locatable_events", which counts the number of added rows with location ID corresponding to a cell with a known location. #5289
flowkit_jwt_generator is now published as a wheel via pypi
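The bookkeeping described for events.location_ids (first and last date each location ID appears) can be sketched in plain Python. The dictionary standing in for the table and the `update_location_ids` helper are hypothetical; FlowETL performs this update in the database.

```python
from datetime import date


def update_location_ids(location_ids, day, ids_seen):
    """Update the (first, last) seen dates for each location ID observed on `day`."""
    for location_id in ids_seen:
        first, last = location_ids.get(location_id, (day, day))
        location_ids[location_id] = (min(first, day), max(last, day))
    return location_ids


table = {}  # stand-in for the table: location ID -> (first date, last date)
update_location_ids(table, date(2016, 1, 1), {"cell_a", "cell_b"})
update_location_ids(table, date(2016, 1, 2), {"cell_a"})
print(table["cell_a"])
print(table["cell_b"])
```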

1.18.4

Changed

docker-compose has been replaced with docker compose in the makefile; this might break builds on machines that haven't updated their docker in a while.

Fixed

SQLAlchemy version installed in the FlowMachine docker image is now compatible with the flowmachine library. #6052

1.18.3

Added

Quickstart script now supports arbitrary countries via EXAMPLE_COUNTRY env var. #5796
FlowDB's maximum locks per transaction setting can now be controlled using the MAX_LOCKS_PER_TRANSACTION env var. #5157

Changed

Increased FlowDB's default maximum locks per transaction to 365 * 5 * 4 * (1 + 4). #5157
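For reference, the expression for the new default can be evaluated directly. The variable name below is illustrative only.

```python
# Evaluate the default from the entry above.
max_locks_per_transaction = 365 * 5 * 4 * (1 + 4)
print(max_locks_per_transaction)  # 36500
```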

Fixed

Null values in first column of first row of ingested data no longer cause flowetl to skip ingestion #5090

1.18.2

Fixed

Fixed migrations being missing from the built FlowAuth docker images #5818

1.18.1

Added

Added Alembic support via flask-migrate to Flowauth #5799

1.18.0

Added

Added views etl.ingested_state, etl.available_dates and etl.deduped_post_etl_queries in FlowDB, for convenient extraction of relevant information from the ETL tables. #5641
Added MajorityLocationWithUnlocatable query class and majority_location function. #5720

Changed

Important: tokens issued by previous versions of Flowauth are not compatible with this version. Users will need to regenerate tokens using the updated Flowauth.
Move from groups to roles in flowauth; see here for full details. #5613
Changed AIRFLOW__CORE__SQL_ALCHEMY_CONN env var to AIRFLOW__DATABASE__SQL_ALCHEMY_CONN
RoleScopePicker component redesigned and reimplemented.
Docs now recommend creating a separate bind mount for airflow scheduler logs, and include this in the secrets quickstart. #3622
jwt tokens now use sub instead of identity for JWT_IDENTITY_CLAIM.
A majority_location query with include_unlocatable=True will now include rows for all subscribers in the subscriber_location_weights sub-query, including those for whom all weights are negative (previously subscribers with only negative weights were excluded).

Fixed

Fixed a potential deadlock when using a small connection pool and store-ing queries
AutoFlow can now be run in a docker container with non-default user. #5574
Passing an empty list of events tables when creating a query now raises ValueError: Empty tables list. instead of a MissingDateError. #436
Flowmachine now looks at only the most recent state (per CDR type per CDR date) in etl.etl_records to determine available dates. #5641
It is now possible to run API queries that include multiple different aggregation units (e.g. joined_spatial_aggregate with displacement metric). #4649
Demo roles can now be used in worked_examples. #5735

Removed

Removed the include_unlocatable parameter from MajorityLocation class (the majority_location function should be used instead if include_unlocatable is required). #5720

1.17.1

Added

Added get_aggregation_unit server action, for getting the aggregation unit associated with a query specification. #5141

Changed

nocturnal_events now expects a night_hours parameter with nested sub-fields start_hour and end_hour, instead of two parameters night_start_hour and night_end_hour.
Spatial units with a mapping table now only include cells that appear in the mapping table. #5360
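The nocturnal_events parameter change above could be handled with a small migration helper along these lines. This is a sketch that assumes the query specification is a flat dictionary; `migrate_night_hours` is a hypothetical helper, and the `query_kind` key is included only for illustration.

```python
def migrate_night_hours(spec: dict) -> dict:
    """Hypothetical helper: move flat night hour fields into a nested night_hours field."""
    migrated = {
        key: value
        for key, value in spec.items()
        if key not in ("night_start_hour", "night_end_hour")
    }
    migrated["night_hours"] = {
        "start_hour": spec["night_start_hour"],
        "end_hour": spec["night_end_hour"],
    }
    return migrated


old_spec = {
    "query_kind": "nocturnal_events",  # illustrative key
    "night_start_hour": 20,
    "night_end_hour": 6,
}
new_spec = migrate_night_hours(old_spec)
print(new_spec)
```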

Fixed

Invalid sub-query specs nested within a modal_location spec now raise appropriate validation errors, instead of being masked by internal flowmachine server errors. #4816

1.17.0

Added

inflows and outflows are now exposed via an API endpoint and have been added to flowclient #2029, #4866

Changed

Action needed: Airflow has been updated to version 2.3.3; back up flowetl_db before applying the update. #4940
Tables created under the cache schema in FlowDB will automatically be set to be owned by the flowmachine user. #4714
Query.explain will now explain the query even where it is already stored. #1285
unstored_dependencies_graph no longer blocks until dependencies are in a determinate state. #4949
In and out flows no longer return location columns with to/from suffix.
FlowDB now always creates a role named flowmachine.
Flowmachine will set the state of a query being stored to cancelled if interrupted while the store is running.
Flowmachine now supports sqlalchemy >=1.4 #5140

Fixed

Flowmachine now makes the built in flowmachine role owner of cache tables as a post-action when a query is stored. #4714
TopupBalance now returns the weighted mode when requested instead of weighted median #1412
Fixed in and out flow geojson for multicolumn location types #5132
quick_start.sh should no longer raise a misleading error if ss is not installed. #3151

Removed

use_file_flux_sensor removed entirely. #2812
Model, ModelResult and Louvain have been removed. #5168

1.16.0

Added

Most frequent locations is now available via FlowAPI. #3165
Total active periods is now available via FlowAPI.
Made hour of day slicing available via FlowAPI. #3165
Added visited on most days reference location query. #4267
Added unique value from query list query. #4486
Added mixin for exposing start_date and end_date internally as datetime objects #4497
Added CombineFirst and CoalescedLocation queries. #4524
Added MajorityLocation query. #4522
Added join_type param to Flows class. #4539
Added PerSubscriberAggregate query. #4559
Added FlowETL QA checks 'count_imeis', 'count_imsis', 'count_locatable_location_ids', 'count_null_imeis', 'count_null_imsis', 'count_null_location_ids', 'max_msisdns_per_imei', 'max_msisdns_per_imsi', 'count_added_rows_outgoing', 'count_null_counterparts', 'count_null_durations', 'count_onnet_msisdns_incoming', 'count_onnet_msisdns_outgoing', 'count_onnet_msisdns', 'max_duration' and 'median_duration'. #4552
Added FilteredReferenceLocation query, which returns only rows where a subscriber visited a reference location the required number of times. #4584
Added LabelledSpatialAggregate query and redaction, which sub-aggregates by subscriber labels. #4668
Added MobilityClassification query, to classify subscribers by mobility type based on a sequence of locations. #4666
Exposed CoalescedLocation via FlowAPI, in the specific case where the fallback location is a FilteredReferenceLocation query. #4585
Added LabelledFlows query, which returns flows disaggregated by label #4679
Exposed LabelledSpatialAggregate and LabelledFlows via FlowAPI, with a MobilityClassification query accepted as the 'labels' parameter. #4669
Added RedactedLabelledAggregate and subclasses for redacting labelled data (see ADR 0011). #4671

Changed

Harmonised FlowAPI parameter names for start and end dates. They are now all start_date and end_date
Further improvements to token display in FlowAuth. #1124
Increased the FlowDB quickstart container's timeout to 15 minutes. #782
Union and Query.union now accept a variable number of queries to concatenate. #4565

Fixed

Autoflow's prefect version is now current. #2544
FlowMachine server will now successfully remove cache for queries defined in an interactive flowmachine session during cleanup. #4008

1.15.0

Added

FlowETL flux check can be turned off by setting use_flux_sensor=False in create_dag. #3603

Changed

The use_file_flux_sensor argument to create_dag is deprecated. To use the table-based flux check in a file-based DAG, set use_flux_sensor='table'.
Improvements to token display in FlowAuth. #2812

1.14.6

Added

A list of additional paths to FlowETL QA checks can now be supplied to create_dag and get_qa_checks. #3484
FlowETL docker container now includes the upgrade check script for Airflow 2.0.0.

Fixed

Additional FlowETL QA checks in the dags folder are now picked up. #3484
Quickstart will no longer raise a warning about unset Autoflow related environment variables. #2118

1.14.5

Fixed

FlowETL QA checks with template sections conditional on the cdr_type argument now render correctly. #3479

1.14.4

Fixed

Fixed FlowClient ignoring custom SSL certificates #3344

1.14.3

Fixed

Fixed FlowETL not using the randomly generated secret key to secure sessions with the web interface if one is not explicitly provided using AIRFLOW__WEBSERVER__SECRET_KEY. #3244

1.14.2

Fixed

Reinstated tabs navigation in the docs #3238
Removed $ from code snippets in developer docs #3224
FlowETL now randomly generates a secret key to secure sessions with the web interface if one is not explicitly provided using AIRFLOW__WEBSERVER__SECRET_KEY. #3244

1.14.1

Fixed

Docs displaying None where they shouldn't

1.14.0

Added

Previously run, or currently running queries can now be referenced as a subscriber subset via FlowAPI. #1009
total_network_objects, location_introversion, and unique_subscriber_counts now also accept subscriber subsets.
The validity window for FlowAuth 2factor codes can now be configured using the TWO_FACTOR_VALID_WINDOW env variable. #3203

Changed

get_cached_query_objects_ordered_by_score is now a generator. #3116
Flowclient now uses httpx instead of requests, for improved async performance and http2 support. #1789

Fixed

FlowAPI now correctly logs all query run, poll, and retrieval requests for matching with FlowMachine. #3071
Links in the installation docs are now generated correctly. #3152

1.13.0

Changed

When creating a file-based DAG using create_dag, you can now use the slower, table based method of checking whether the file is being written. #2857

1.12.0

Added

The issuer name can now be set for FlowAuth's 2factor authentication using the FLOWAUTH_TWO_FACTOR_ISSUER environment variable.
FlowAPI's internal port can now be set using the FLOWAPI_PORT environment variable, but continues to default to 9090. #2723

With thanks to JIPS for supporting this work.
FlowETL's default port can now be set using the FLOWETL_PORT environment variable, but continues to default to 8080. #2724

With thanks to JIPS for supporting this work.

Changed

Test and synthetic DFS data now uses the same pool of subscribers as CDR data. #2713

With thanks to JIPS for supporting this work.

1.11.1

Added

FlowDB's SQL synthetic data generator now uses the WorldPop project's 2016 population raster for the country chosen as the basis for generating data.

1.11.0

Added

Queries run through FlowAPI can now be run on only a subset of the available CDR types, by supplying an event_types parameter. #2631
FlowETL now includes QA checks for the earliest and latest timestamps in the ingested data. #2627

Fixed

The FlowETL 'count_duplicates' QA check now correctly counts the number of duplicate rows. #2651

1.10.0

Added

FlowDB's SQL synthetic data generator can now generate events for any country, not just Nepal.

To generate synthetic data for a different country, supply the COUNTRY environment variable when starting the container, along with a valid GADM GID code for the region in which to simulate a disaster.

Changed

FlowMachine's docker container now uses Python 3.8
FlowAPI's docker container now uses Python 3.8
FlowAuth's docker container now uses Python 3.8
AutoFlow's docker container now uses Python 3.8
FlowDB's SQL synthetic data generator now uses GADM 3.6 boundaries.
FlowAuth and FlowAPI now exchange tokens with compressed claims. #2625

Fixed

FlowAuth will no longer fail to start if there are directories with names the same as the SSL certificate secrets.

1.9.4

Changed

JoinToLocation is cacheable only if the joined query is also cacheable.

1.9.3

Changed

SubscriberLocations are no longer cacheable using FlowMachine.

Fixed

Fixed cache shrinking failing when large numbers of tables have been written. #2462
Fixed FlowAuth's MySQL support.

1.9.2

Fixed

Added missing bridge table arguments to several FlowClient methods.

1.9.1

Added

FlowAuth now supports MySQL as a database backend.
FlowKit now allows the use of bridge tables to manually specify linkages between cells and geometries.

Fixed

FlowAuth no longer errors after a period of inactivity due to timed out database connections. #2382

1.9.0

Added

Added new FlowAPI aggregates: unique_visitor_counts, active_at_reference_location_counts, unmoving_counts, unmoving_at_reference_location_counts, trips_od_matrix, and consecutive_trips_od_matrix
Added new Flows type query to FlowAPI unique_locations, which produces the paired regional connectivity COVID-19 indicator
Added FlowClient function unique_locations_spec, which can be used on either side of a flows query
Added FlowClient functions: unique_visitor_counts, active_at_reference_location_counts, unmoving_counts, unmoving_at_reference_location_counts, trips_od_matrix, and consecutive_trips_od_matrix. #2333
FlowClient now has an asyncio API. Use connect_async instead of connect to create an ASyncConnection, and await methods on APIQuery objects. #2199

Fixed

Fixed FlowMachine server becoming deadlocked under load. #2390

1.8.0

Added

Added subscriber metrics: ActiveAtReferenceLocation, Unmoving, UnmovingAtReferenceLocation and UniqueLocations
Added location metrics and their Redacted* equivalents:
- UniqueVisitorCounts
- UnmovingAtReferenceLocationCounts (COVID-19 equivalent)
- ActiveAtReferenceLocationCounts
- UnmovingCount (COVID-19 equivalent)
- TripsODMatrix (COVID-19 equivalent)
- ConsecutiveTripsODMatrix (COVID-19 equivalent)

See https://covid19.flowminder.org for more detail on how Flowminder is supporting the global COVID-19 response.

1.7.0

Changed

FlowETL is now based on the official apache-airflow docker image. As a result, you should now bind mount your host dags directory to /opt/airflow/dags, and your logs directory to /opt/airflow/logs.

1.6.1

Fixed

FlowMachine server will now ignore values for the FLOWMACHINE_SERVER_THREADPOOL_SIZE environment variable which can't be cast to int. #2304
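The behaviour described above amounts to a guarded integer parse of the environment variable. The sketch below shows the general pattern; the `get_threadpool_size` helper and its fallback of `None` are assumptions, not the server's actual code or default.

```python
import os


def get_threadpool_size(default=None):
    """Read FLOWMACHINE_SERVER_THREADPOOL_SIZE, ignoring values that cannot be cast to int."""
    raw = os.environ.get("FLOWMACHINE_SERVER_THREADPOOL_SIZE")
    try:
        return int(raw)
    except (TypeError, ValueError):
        # Unset (None) or non-numeric values fall back to the default.
        return default


os.environ["FLOWMACHINE_SERVER_THREADPOOL_SIZE"] = "not-a-number"
size_invalid = get_threadpool_size()
os.environ["FLOWMACHINE_SERVER_THREADPOOL_SIZE"] = "8"
size_valid = get_threadpool_size()
print(size_invalid, size_valid)
```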

1.6.0

Added

histogram_aggregate added to FlowAPI and FlowClient. Allows the user to obtain a histogram over a per-subscriber metric. #1076

1.5.1

Added

FlowClient now displays a progress bar when waiting for a query to be ready, indicating how many parts of that query still need to be run.

1.5.0

Added

Added a flowclient Query class to represent a FlowKit query #1980.
Added method flowclient.Connection.update_token, to replace the API token for an existing connection.

Changed

The names of flowclient functions for generating query specifications have been renamed to <previous_name>_spec (e.g. flowclient.modal_location is now flowclient.modal_location_spec).
flowclient.get_status now returns "not_running" (instead of raising FileNotFoundError) if a query is not running or completed.
Flowclient functions location_event_counts_spec, meaningful_locations_aggregate_spec, meaningful_locations_between_label_od_matrix_spec, meaningful_locations_between_dates_od_matrix_spec, flows_spec, unique_subscriber_counts_spec, location_introversion_spec, total_network_objects_spec, aggregate_network_objects_spec, spatial_aggregate_spec and joined_spatial_aggregate_spec have moved to the flowclient.aggregates submodule.

1.4.0

Added

FlowAPI can now return results in CSV and GeoJSON format, FlowClient now supports getting GeoJSON formatted results. #2003

1.3.3

Added

FlowAPI now reports the proportion of subqueries cached for a query when polling. #1202
FlowClient now logs info messages with the proportion of subqueries cached for a query when polling. #1202

Fixed

Fixed the display of deeply nested permissions for flows in FlowAuth. #2110

1.3.2

Fixed

Fixed tokens which used the FlowAuth demo data not being accepted by FlowAPI. #2108

1.3.1

Changed

Flowmachine now uses an enum for interaction direction parameters (but will still accept them as strings). #357

Removed

Removed unused aggregates, results and features schemas from FlowDB. #587

1.3.0

Added

Improved UI for API permissions in FlowAuth.

Changed

The expected format of user claims has changed from a dictionary to a string-based format. FlowAPI now expects the claims key of any token to contain a list of scope strings.
Permissions for joined spatial aggregates can now be set at a finer level in FlowAuth, to allow administrators to grant access only to specific combinations of query types at different aggregation units.
FlowAuth no longer requires administrators to manually configure API routes, and will extract them from a FlowAPI server's open api specification.
FlowAuth now uses structlog for log messages.
FlowAPI no longer mandates a top level aggregation_unit field in query specifications.
FlowClient's flows and modal_location functions no longer require an aggregation unit.

Removed

The poll type permission has been removed, and is implicitly granted by both read and get_result rights.
FlowAuth no longer allows administrators to specify the name of a FlowAPI server, and will instead use the name specified in the server's open api specification.

1.2.1

Fixed

Queries which have been removed from Flowmachine's cache, or cancelled, can now be rerun. #1898

1.2.0

Added

FlowMachine can now use multiple FlowDB backends, redis instances or execution pools via the flowmachine.connections or flowmachine.core.context.context context managers. #391
flowmachine.core.connection.Connection now has a conn_id attribute, which is unique per database host. #391

Changed

flowmachine.connect no longer returns a Connection object. The connection should be accessed via flowmachine.core.context.get_db(). #391
connection, redis, and threadpool are no longer available as attributes of Query, and should be accessed via flowmachine.core.context.get_db(), flowmachine.core.context.get_redis() and flowmachine.core.context.get_executor(). #391

Removed

Removed Query.connection, Query.redis, and Query.threadpool. #391

1.1.1

Added

Added a worked example to demonstrate using joined spatial aggregate queries. #1938

1.1.0

Changed

Connection.available_dates is now a property and returns results based on the etl.etl_records table. #1873

Fixed

Fixed the run action blocking the FlowMachine server in some scenarios. #1256

Removed

Removed tables and columns methods from the Connection class in FlowMachine
Removed the inspector attribute from the Connection class in FlowMachine

1.0.0

Added

FlowMachine now periodically prunes the cache to below the permitted cache size. The frequency of this pruning is configurable using the FLOWMACHINE_CACHE_PRUNING_FREQUENCY environment variable to Flowmachine, and queries are excluded from being removed by the automatic shrinker based on the cache_protected_period config key within FlowDB. #1307
FlowDB now includes Paul Ramsey's OGR foreign data wrapper, for easy loading of GIS data. #1512
FlowETL now allows all configuration options to be set using docker secrets. #1515
Added a new component, AutoFlow, to automate running Jupyter notebooks when new data is added to FlowDB. #1570
FLOWETL_INTEGRATION_TESTS_SAVE_AIRFLOW_LOGS environment variable added to allow copying the Airflow logs in FlowETL integration tests into the /mounts/logs directory for debugging. #1019
Added new IterativeMedianFilter query to Flowmachine, which applies an iterative median filter to the output of another query. #1339
FlowDB now includes the TDS foreign data wrapper. #1729
Added contributing and support instructions. #1791
New FlowETL module installable via pip to aid in ETL dag creation.

Changed

FlowDB is now built on PostgreSQL 12 and PostGIS 3. #1396
FlowETL is now built on Airflow 1.10.6.
FlowETL now defaults to disabling Airflow's REST API, and enables RBAC for the webui. #1516
FlowETL now requires that the FLOWETL_AIRFLOW_ADMIN_USERNAME and FLOWETL_AIRFLOW_ADMIN_PASSWORD environment variables be set, which specify the default web ui account. #1516
FlowAPI will no longer return a result for rows in spatial aggregate, joined spatial aggregate, flows, total events, meaningful locations aggregate, meaningful locations od, or unique subscriber count where the aggregate would contain fewer than 16 sims. #1026
FlowETL now requires that AIRFLOW__CORE__SQL_ALCHEMY_CONN be provided as an environment variable or secret. #1702, #1703
FlowAuth now records last used two-factor authentication codes in an expiring cache, which supports either a file-based, or redis backend. #1173
AutoFlow now uses Bundler to manage Ruby dependencies.
The end_date parameter of flowclient.modal_location_from_dates now refers to the day after the final date included in the range, so is now consistent with other queries that have start/end date parameters. #819
Date intervals in AutoFlow date stencils are now interpreted as half-open intervals (i.e. including start date, excluding end date), for consistency with date ranges elsewhere in FlowKit.
flowmachine user now has read access to ETL metadata tables in FlowDB
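The half-open date interval convention mentioned in the entries above (start date included, end date excluded) can be made concrete with a short sketch. The `dates_in_range` helper is hypothetical and only illustrates the convention.

```python
from datetime import date, timedelta


def dates_in_range(start_date: date, end_date: date):
    """Return every date d with start_date <= d < end_date (half-open interval)."""
    n_days = (end_date - start_date).days
    return [start_date + timedelta(days=i) for i in range(n_days)]


days = dates_in_range(date(2016, 1, 1), date(2016, 1, 4))
# The end date itself (2016-01-04) is not part of the range.
print(days[0], days[-1], len(days))
```

Under this convention an end_date of 2016-01-04 covers the three days up to and including 2016-01-03.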

Fixed

Quickstart should no longer fail on systems which do not include the netstat tool. #1472
Fixed an error that prevented FlowAuth admin users from resetting users' passwords using the FlowAuth UI. #1635
The 'Cancel' button on the FlowAuth 'New User' form no longer submits the form. #1636
FlowAuth backend now sends a meaningful 400 response when trying to create a user with an empty password. #1637
Usernames of deleted users can now be re-used as usernames for new users. #1638
RedactedJoinedSpatialAggregate now only redacts rows with too few subscribers. #1747
FlowDB now uses a more conservative default setting for tcp_keepalives_idle of 10 minutes, to avoid connections being killed after 15 minutes when running in a docker swarm. #1771
Aggregation units and api routes can now be added to servers. #1815
Fixed several issues with FlowETL. #1529 #1499 #1498 #1497

Removed

Removed pg_cron.

0.9.1

Added

Added new DistanceSeries query to Flowmachine, which produces per-subscriber time series of distance from a reference point. #1313
Added new ImputedDistanceSeries query to Flowmachine, which produces contiguous per-subscriber time series of distance from a reference point by filling in gaps using the rolling median. #1337

Changed

Fixed

The FlowETL config file is now always validated, avoiding runtime errors if a config setting is wrong or missing. #1375
FlowETL now only creates DAGs for CDR types which are present in the config, leading to a better user experience in the Airflow UI. #1376
The concurrency settings in the FlowETL config are no longer ignored. #1378
The FlowETL deployment example has been updated so that it no longer fails due to a missing foreign data wrapper for the available CDR dates. #1379
Fixed error when editing a user in FlowAuth who did not have two factor enabled. #1374
Fixed not being able to enable a newly added api route on existing servers in FlowAuth. #1373

Removed

The default_args section in the FlowETL config file has been removed. #1377

0.9.0

Added

FlowAuth now makes version information available at /version and displays it in the web ui. #835
FlowETL now comes with a deployment example (in flowetl/deployment_example/). #1126
FlowETL now allows supplementary post-ETL queries to be run. #989
Random sampling is now exposed via the API, for all non-aggregated query kinds. #1007
New aggregate added to FlowMachine - HistogramAggregation, which constructs histograms over the results of other queries. #1075
New IntereventInterval query class - returns stats over the gap between events as a time interval.
Added submodule flowmachine.core.dependency_graph, which contains functions related to creating or using query dependency graphs (previously these were in utils.py).
New config option sql_find_available_dates in FlowETL to provide SQL code to determine the available dates. #1295

Changed

FlowDB is now based on PostgreSQL 11.5 and PostGIS 2.5.3
When running queries through FlowAPI, the query's dependencies will also be cached by default. This behaviour can be switched off by setting FLOWMACHINE_SERVER_DISABLE_DEPENDENCY_CACHING=true. #1152
NewSubscribers now takes a pair of UniqueSubscribers queries instead of the arguments to them
Flowmachine's default random sampling method is now random_ids rather than the non-reproducible system_rows. #1263
IntereventPeriod now returns stats over the gap between events in fractional time units, instead of time intervals. #1265
Attempting to store a query that does not have a standard table name (e.g. EventTableSubset or unseeded random sample) will now raise an UnstorableQueryError instead of ValueError.
In the FlowETL deployment example, the external ingestion database is now set up separately from the FlowKit components and connected to FlowDB via a docker overlay network. #1276
The md5 attribute of the Query class has been renamed to query_id #1288.
DistanceMatrix no longer returns duplicate rows for the lon-lat spatial unit.
Previously, Displacement defaulted to returning NaN for subscribers who have a location in the reference location but were not seen in the time period for the displacement query. These subscribers are no longer returned unless the return_subscribers_not_seen argument is set to True.
PopulationWeightedOpportunities is now available under flowmachine.features.location, instead of flowmachine.models
PopulationWeightedOpportunities no longer supports erroring with incomplete per-location departure rate vectors and will instead omit any locations not included from the results
PopulationWeightedOpportunities no longer requires use of the run() method

Fixed

Quickstart will no longer fail if it has been run previously with a different FlowDB data size and not explicitly shut down. #900

Removed

Flowmachine's subscriber_locations_cluster function has been removed - use HartiganCluster or MeaningfulLocations directly.
FlowAPI no longer supports the non-reproducible random sampling method system_rows. #1263

0.8.0

Added

FlowAPI's 'joined_spatial_aggregate' endpoint now exposes event counts. #992
FlowAPI's 'joined_spatial_aggregate' endpoint now exposes top-up amount. #967
FlowAPI's 'joined_spatial_aggregate' endpoint now exposes nocturnal events. #1025
FlowAPI's 'joined_spatial_aggregate' endpoint now exposes top-up balance. #968
FlowAPI's 'joined_spatial_aggregate' endpoint now exposes displacement. #1010
FlowAPI's 'joined_spatial_aggregate' endpoint now exposes pareto interactions. #1012
FlowETL now supports ingesting from a postgres table in addition to CSV files. #1027
FLOWETL_RUNTIME_CONFIG environment variable added to control which DAG definitions the FlowETL integration tests should use (valid values: "testing", "production").
FLOWETL_INTEGRATION_TESTS_DISABLE_PULLING_DOCKER_IMAGES environment variable added to allow running the FlowETL integration tests against locally built docker images during development.
FlowAPI's 'joined_spatial_aggregate' endpoint now exposes handset. #1011 and #1029
JoinedSpatialAggregate now supports "distr" stats, which output the relative distribution of the passed metrics.
Added SubscriberHandsetCharacteristic to FlowMachine
FlowAuth now supports optional two-factor authentication #121
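
One possible reading of the "distr" stat noted above is a per-location breakdown of how often each value of a categorical metric occurs, expressed as proportions. The sketch below illustrates that idea on plain Python data. The function name `relative_distribution` and the `(location, category)` input shape are assumptions for illustration, not FlowMachine's implementation.

```python
from collections import Counter, defaultdict


def relative_distribution(rows):
    """Map each location to {category: proportion of rows at that location}.

    rows is an iterable of (location, category) pairs (hypothetical input shape).
    """
    counts = defaultdict(Counter)
    for location, category in rows:
        counts[location][category] += 1
    result = {}
    for location, counter in counts.items():
        total = sum(counter.values())
        # Proportions within one location sum to 1.
        result[location] = {cat: n / total for cat, n in counter.items()}
    return result
```

Each location's proportions sum to one, so locations with very different subscriber counts can be compared directly.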

Changed

The flowdb containers for test_data and synthetic_data were split into two separate containers and quick_start.sh downloads the docker-compose files to a new temporary directory on each run. #843
Flowmachine now returns more informative error messages when query parameter validation fails. #1055

Removed

TESTING environment variable was removed (previously used by the FlowETL integration tests).
Removed SubscriberPhoneType from FlowMachine to avoid redundancy.

0.7.0

Added

PRIVATE_JWT_SIGNING_KEY environment variable/secret added to FlowAuth, which should be a PEM encoded RSA private key, optionally base64 encoded if supplied as an environment variable.
PUBLIC_JWT_SIGNING_KEY environment variable/secret added to FlowAPI, which should be a PEM encoded RSA public key, optionally base64 encoded if supplied as an environment variable.
The dev provisioning Ansible playbook now automatically generates an SSH key pair for the flowkit user. #892
Added new classes to represent spatial units in FlowMachine.
Added a Geography query class, to get geography data for a spatial unit.
FlowAPI's 'joined_spatial_aggregate' endpoint now exposes unique location counts. #949
FlowAPI's 'joined_spatial_aggregate' endpoint now exposes subscriber degree. #969
Flowdb now contains an auxiliary table to record outcomes of queries that can be run as part of the regular ETL process #988
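
The PRIVATE_JWT_SIGNING_KEY and PUBLIC_JWT_SIGNING_KEY entries above allow the PEM key to be base64 encoded when it is supplied as an environment variable. A minimal sketch of how a deployment script might normalise such a value follows. The helper name `load_pem_from_env` and the detection heuristic (looking for the `-----BEGIN` marker) are assumptions for illustration, not FlowAuth's or FlowAPI's actual loading logic.

```python
import base64
import os


def load_pem_from_env(var_name: str) -> bytes:
    """Return PEM bytes from an env var holding raw PEM or base64-encoded PEM.

    Hypothetical helper; FlowKit's own handling may differ.
    """
    raw = os.environ[var_name].strip()
    if raw.startswith("-----BEGIN"):
        # Already PEM text, so use it unchanged.
        return raw.encode("ascii")
    # Otherwise assume the PEM text was base64 encoded before being exported.
    decoded = base64.b64decode(raw)
    if not decoded.lstrip().startswith(b"-----BEGIN"):
        raise ValueError(f"{var_name} does not contain a PEM encoded key")
    return decoded
```

Base64 encoding the key avoids problems with multi-line values in environment files, which is presumably why the option exists.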

Changed

The quick-start script now only pulls the docker images for the services that are actually started up. #898
FlowAuth and FlowAPI are now linked using an RSA keypair, instead of per-server shared secrets. #89
Location-related FlowMachine queries now take a spatial_unit parameter instead of level.
The quick-start script now uses the environment variable GIT_REVISION to control the version to be deployed.
Create token page permission and spatial aggregation checkboxes are now hidden by default. #834
The flowetl mounted directories archive, dump, ingest, quarantine were replaced with a single files directory and files are no longer moved. #946
FlowDB's postgresql has been updated to 11.4, which addresses several bugs and one major vulnerability.

Fixed

When creating a new token in FlowAuth, the expiry now always shows the year, seconds till expiry, and timezone. #260
Distances in Displacement are now calculated with longitude and latitude the correct way around. #913
The quick-start script now works correctly with branches. #902
Fixed location_event_counts failing to work when specifying a subset of event types #1015
FlowAPI will now show the correct version in the API spec, flowmachine and flowclient will show the correct versions in the worked examples. #818

Removed

Removed cell_mappings.py, get_columns_for_level and BadLevelError.
JWT_SECRET_KEY has been removed in favour of RSA keys.
The FlowDB tables infrastructure.countries and infrastructure.operators have been removed. #958

0.6.4

Added

Buttons to copy token to clipboard and download token as file added to token list page. #704
Two new worked examples: "Cell Towers Per Region" and "Unique Subscriber Counts". #633, #634

Changed

The FLOWDB_DEBUG environment variable has been renamed to FLOWDB_ENABLE_POSTGRES_DEBUG_MODE.
FlowAuth will now automatically set up the database when started without needing to trigger via the cli.
FlowAuth now requires that at least one administrator account is created by providing env vars or secrets for:
- FLOWAUTH_ADMIN_PASSWORD
- FLOWAUTH_ADMIN_USERNAME

Fixed

The FLOWDB_DEBUG environment variable used to have no effect. This has been fixed. #811
Previously, queries could be stuck in an executing state if writing their cache metadata failed; they will now correctly show as having errored. #833
Fixed an issue where Table objects could be in an inconsistent cache state after resetting cache #832
FlowAuth's docker container can now be used with a Postgres backing database. #825
FlowAPI now starts up successfully when following the "Secrets Quickstart" instructions in the docs. #836
The command to generate an SSL certificate in the "Secrets Quickstart" section in the docs has been fixed and made more robust #837
FlowAuth will no longer try to initialise the database or create demo data multiple times when running under uwsgi with multiple workers #844
Fixed multiple tokens not lining up on the FlowAuth "Tokens" page #849

Removed

The FLOWDB_SERVICES environment variable has been removed from the toplevel Makefile, so that now DOCKER_SERVICES is the only environment variable that controls which services are spun up when running make up. #827

0.6.3

Added

FlowKit's worked examples are now Dockerized, and available as part of the quick setup script #614
Skeleton for Airflow based ETL system added with basic ETL DAG specification and tests.
The docs now contain information about required versions of installation prerequisites #703
FlowAPI now requires the FLOWAPI_IDENTIFIER environment variable to be set, which contains the name used to identify this FlowAPI server when generating tokens in FlowAuth #727
flowmachine.utils.calculate_dependency_graph now includes the Query objects in the query_object field of the graph's nodes dictionary #767
Architectural Decision Records (ADR) have been added and are included in the auto-generated docs #780
Added FlowDB environment variables SHARED_BUFFERS_SIZE and EFFECTIVE_CACHE_SIZE, to allow manually setting the Postgres configuration parameters shared_buffers and effective_cache_size.
The function print_dependency_tree() now takes an optional argument show_stored to display whether dependent queries have been stored #804
A new function plot_dependency_graph() has been added, which makes it convenient to plot and visualise a dependency graph in Jupyter notebooks (this requires IPython and pygraphviz to be installed) #786

Changed

Parameter names in flowmachine.connect() have been renamed as follows to be consistent with the associated environment variables #728:
- db_port -> flowdb_port
- db_user -> flowdb_user
- db_pass -> flowdb_password
- db_host -> flowdb_host
- db_connection_pool_size -> flowdb_connection_pool_size
- db_connection_pool_overflow -> flowdb_connection_pool_overflow
FlowAPI and FlowAuth now expect an audience key to be present in tokens #727
Dependent queries are now only included once in the md5 calculation of a given query (in particular, it changes the query ids compared to previous FlowKit versions).
An error is now displayed in the add user form of FlowAuth if the username already exists. #690
An error is now displayed in the add group form of FlowAuth if the group name already exists. #709
FlowAuth's add new server page now shows helper text for bad inputs. #749
The class SubscriberSubsetterBase in FlowMachine no longer inherits from Query #740 (this changes the query ids compared to previous FlowKit versions).
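
For code written against earlier releases, the flowmachine.connect() renames listed above amount to a key-for-key mapping. The sketch below translates old keyword arguments to the new names. `migrate_connect_kwargs` is a hypothetical helper, not part of FlowMachine, and it covers only the six renamed parameters from the list.

```python
# Old -> new keyword names, taken from the rename list above.
CONNECT_RENAMES = {
    "db_port": "flowdb_port",
    "db_user": "flowdb_user",
    "db_pass": "flowdb_password",
    "db_host": "flowdb_host",
    "db_connection_pool_size": "flowdb_connection_pool_size",
    "db_connection_pool_overflow": "flowdb_connection_pool_overflow",
}


def migrate_connect_kwargs(old_kwargs: dict) -> dict:
    """Return a copy of old_kwargs with renamed keys; other keys pass through."""
    new_kwargs = {}
    for key, value in old_kwargs.items():
        new_key = CONNECT_RENAMES.get(key, key)
        if new_key in new_kwargs:
            # Refuse to guess when both the old and new spelling are given.
            raise ValueError(f"Both old and new names supplied for {new_key!r}")
        new_kwargs[new_key] = value
    return new_kwargs
```

The translated dictionary could then be unpacked into the call, for example `flowmachine.connect(**migrate_connect_kwargs(old_kwargs))`, assuming the remaining arguments are unchanged.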

Fixed

FlowClient docs rendered to website now show the options available for arguments that require a string from some set of possibilities #695.
The Flowmachine loggers are now initialised only once when flowmachine is imported, with a call to connect() only changing the log level #691
The FERNET_KEY environment variable for FlowAuth is now named FLOWAUTH_FERNET_KEY
The quick-start script now correctly aborts if one of the FlowKit services doesn't fully start up #745
The maps in the worked examples docs pages now appear in any browser
Example invocations of generate-jwt are no longer uncopyable due to line wrapping #778
API parameter interval for location_event_counts queries is now correctly passed to the underlying FlowMachine query object #807.

0.6.2

Added

Added a new module, flowkit-jwt-generator, which generates test JWT tokens for use with FlowAPI #564
A new Ansible playbook was added in deployment/provision-dev.yml. In addition to the standard provisioning this installs pyenv, Python 3.7, pipenv and clones the FlowKit repository, which is useful for development purposes.
Added a 'quick start' setup script for trying out a complete FlowKit system #688.

Changed

FlowAPI's available_dates endpoint now always returns available dates for all event types and does not accept JSON
Hints are now displayed in the add user form of FlowAuth if the form is not completed #679
Error messages are now displayed when generating a new token in FlowAuth if the token's name is invalid #799
The Ansible playbooks in deployment/ now allow configuring the username and password for the FlowKit user account.
Default compose file no longer includes build blocks, these have been moved to docker-compose-build.yml.

Fixed

FlowDB synthetic data container no longer silently fails to generate data if data generator is not set #654

0.6.1

Fixed

Fixed TotalNetworkObjects raising an error when run with a lat-long level #108
Radius of gyration no longer incorrectly appears as a top level api query

0.6.0

Added

Added new flowclient API entrypoint, aggregate_network_objects, to access equivalent flowmachine query #601
FlowAPI now exposes the API spec at the spec/openapi.json endpoint, and an interactive version of the spec at the spec/redoc endpoint
Added Makefile target make up-no_build, to spin up all containers without building the images
Added resync_redis_with_cache function to cache utils, to allow administrators to align redis with FlowDB #636
Added new flowclient API entrypoint, radius_of_gyration, to access (with simplified parameters) equivalent flowmachine query RadiusOfGyration #602

Changed

The period argument to TotalNetworkObjects in FlowMachine has been renamed total_by
The period argument to total_network_objects in FlowClient has been renamed total_by
The by argument to AggregateNetworkObjects in FlowMachine has been renamed to aggregate_by
The stop_date argument to the modal_location_from_dates and meaningful_locations_* functions in FlowClient has been renamed end_date #470
get_result_by_query_id now accepts a poll_interval argument, which allows polling frequency to be changed
The start and stop arguments to EventTableSubset are now mandatory.
RadiusOfGyration now returns a value column instead of an rog column
TotalNetworkObjects and AggregateNetworkObjects now return a value column, rather than statistic_name
All environment variables are now in a single development_environment file in the project root, development environment setup has been simplified
Default FlowDB users for FlowMachine and FlowAPI have changed from "analyst" and "reporter" to "flowmachine" and "flowapi", respectively
Docs and integration tests now use top level compose file
The following environment variables have been renamed:
- FLOWMACHINE_SERVER (FlowAPI) -> FLOWMACHINE_HOST
- FM_PASSWORD (FlowDB), FLOWDB_PASS (FlowMachine) -> FLOWMACHINE_FLOWDB_PASSWORD
- API_PASSWORD (FlowDB), FLOWDB_PASS (FlowAPI) -> FLOWAPI_FLOWDB_PASSWORD
- FM_USER (FlowDB), FLOWDB_USER (FlowMachine) -> FLOWMACHINE_FLOWDB_USER
- API_USER (FlowDB), FLOWDB_USER (FlowAPI) -> FLOWAPI_FLOWDB_USER
- LOG_LEVEL (FlowMachine) -> FLOWMACHINE_LOG_LEVEL
- LOG_LEVEL (FlowAPI) -> FLOWAPI_LOG_LEVEL
- DEBUG (FlowDB) -> FLOWDB_DEBUG
- DEBUG (FlowMachine) -> FLOWMACHINE_SERVER_DEBUG_MODE
The following Docker secrets have been renamed:
- FLOWAPI_DB_USER -> FLOWAPI_FLOWDB_USER
- FLOWAPI_DB_PASS -> FLOWAPI_FLOWDB_PASSWORD
- FLOWMACHINE_DB_USER -> FLOWMACHINE_FLOWDB_USER
- FLOWMACHINE_DB_PASS -> FLOWMACHINE_FLOWDB_PASSWORD
- POSTGRES_PASSWORD_FILE -> POSTGRES_PASSWORD
- REDIS_PASSWORD_FILE -> REDIS_PASSWORD
status enum in FlowDB renamed to etl_status
reset_cache now requires a redis client argument
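
The poll_interval argument added to get_result_by_query_id controls how often the client checks whether a query has finished. The general pattern can be sketched as below. `poll_until_ready`, its `check_ready` callback and the `max_polls` limit are illustrative assumptions, not flowclient's implementation.

```python
import time
from typing import Callable


def poll_until_ready(
    check_ready: Callable[[], bool],
    poll_interval: float = 1.0,
    max_polls: int = 60,
) -> int:
    """Call check_ready until it returns True; return the number of calls made.

    Sleeps poll_interval seconds between unsuccessful checks and raises
    TimeoutError after max_polls unsuccessful checks.
    """
    for attempt in range(1, max_polls + 1):
        if check_ready():
            return attempt
        time.sleep(poll_interval)
    raise TimeoutError(f"Not ready after {max_polls} polls")
```

A shorter interval returns results sooner at the cost of more requests to the server, while a longer one suits long-running queries.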

Fixed

Fixed being unable to add new users or servers when running FlowAuth with a Postgres database #622
Resetting the cache using reset_cache will now reset the state of queries in redis as well #650
Fixed mode statistic for AggregateNetworkObjects #651

Removed

Removed docker-compose-dev.yml, and docker-compose files in docs/, flowdb/tests/ and integration_tests/.
Removed Dockerfile-dev Dockerfiles
Removed ENV defaults from the FlowMachine Dockerfile
Removed POSTGRES_DB environment variable from FlowDB Dockerfile, database name is now hardcoded as flowdb

0.5.3

Added

Added new spatial_aggregate API endpoint and FlowClient function #599
Added new flowclient API entrypoint, total_network_objects(), to access (with simplified parameters) equivalent flowmachine query #581
Added new flowclient API entrypoint, location_introversion(), to access (with simplified parameters) equivalent flowmachine query #577
Added new flowclient API entrypoint, unique_subscriber_counts(), to access (with simplified parameters) equivalent flowmachine query #562
New schema aggregates and table aggregates.aggregates have been created for maintaining a record of the process and completion of scheduled aggregates.
New joined_spatial_aggregate API endpoint and FlowClient function #600

Changed

daily_location and modal_location query types are no longer accepted as top-level queries, and must be wrapped using spatial_aggregate
JoinedSpatialAggregate no longer accepts positional arguments
JoinedSpatialAggregate now supports "avg", "max", "min", "median", "mode", "stddev" and "variance" stats

Fixed

total_network_objects no longer returns results from AggregateNetworkObjects #603

0.5.2

Fixed

Fixed #514, which would cause the client to hang after submitting a query that couldn't be created
Fixed #575, so that events at midnight are now considered to be happening on the following day
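
The midnight fix (#575) is consistent with treating each day as a half-open interval, so that a timestamp of exactly 00:00:00 belongs to the day starting at that instant rather than the day ending at it. The sketch below shows that convention. `event_in_day` is an illustrative function, not FlowMachine code.

```python
from datetime import date, datetime, time, timedelta


def event_in_day(event_time: datetime, day: date) -> bool:
    """True if event_time lies in [day 00:00:00, next day 00:00:00)."""
    start = datetime.combine(day, time.min)
    end = start + timedelta(days=1)
    # The end bound is exclusive, so midnight counts towards the following day.
    return start <= event_time < end
```

With this convention every event falls in exactly one day, so daily counts never double count or drop events on the boundary.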

0.5.1

Added

Added HandsetStats to FlowMachine.
Added new ContactReferenceLocationStats query class to FlowMachine.
A new zmq message get_available_dates was added to the flowmachine server, along with the /available_dates endpoint in flowapi and the function get_available_dates() in flowclient. These make it possible to determine the dates that are available in the database for the supported event types.

Changed

FlowMachine's debugging logs are now from a single logger (flowmachine.debug) and include the submodule in the submodule field instead of using it as the logger name
FlowMachine's query run logger now uses the logger name flowmachine.query_run_log
FlowAPI's access, run and debug loggers are now named flowapi.access, flowapi.query and flowapi.debug
FlowAPI's access and run loggers, and FlowMachine's query run logger now log to stdout instead of stderr
Passwords for Redis and FlowDB must now be explicitly provided to flowmachine via argument to connect, env var, or secret

Removed

FlowMachine and FlowAPI no longer support logging to a file

0.5.0

Added

The flowmachine python library is now pip installable (pip install flowmachine)
The flowmachine server now supports additional actions: get_available_queries, get_query_schemas, ping.
Flowdb now contains a new dfs schema and associated tables to process mobile money transactions. In addition, flowdb_testdata contains sample data for DFS transactions.
The docs now include three worked examples of CDR analysis using FlowKit.
Flowmachine now supports calculating the total amount of various DFS metrics (transaction amount, commission, fee, discount) per aggregation unit during a given date range. These metrics are also exposed in FlowAPI via the query kind dfs_metric_total_amount.

Changed

The JSON structure when setting queries running via flowapi or the flowmachine server has changed: query parameters are now "inlined" alongside the query_kind key, rather than nested using a separate params key. Example:
- previously: {"query_kind": "daily_location", "params": {"date": "2016-01-01", "aggregation_unit": "admin3", "method": "last"}},
- now: {"query_kind": "daily_location", "date": "2016-01-01", "aggregation_unit": "admin3", "method": "last"}
The JSON structure of zmq reply messages from the flowmachine server was changed. Replies now have the form: {"status": "[success|error]", "msg": "...", "payload": {...}}.
The flowmachine server action get_sql was renamed to get_sql_for_query_result.
The parameter daily_location_method was renamed to method.
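
The change to the query JSON structure described above (parameters inlined alongside query_kind instead of nested under params) can be expressed as a small transformation. The sketch below converts an old-style message to the new form. `inline_query_params` is a hypothetical helper written for illustration and is not part of FlowKit.

```python
def inline_query_params(old_message: dict) -> dict:
    """Flatten {"query_kind": ..., "params": {...}} into a single-level dict."""
    params = old_message.get("params", {})
    new_message = {k: v for k, v in old_message.items() if k != "params"}
    # A parameter must not silently overwrite a top-level key such as query_kind.
    overlap = set(params) & set(new_message)
    if overlap:
        raise ValueError(f"Parameter names clash with top-level keys: {sorted(overlap)}")
    new_message.update(params)
    return new_message
```

Applied to the "previously" example above, this produces the dictionary shown in the "now" example.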

0.4.3

Added

When running integration tests locally, normally pytest will automatically spin up servers for flowmachine and flowapi as part of the test setup. This can now be disabled by setting the environment variable FLOWKIT_INTEGRATION_TESTS_DISABLE_AUTOSTART_SERVERS=TRUE.
The integration tests now use the environment variables FLOWAPI_HOST, FLOWAPI_PORT to determine how to connect to the flowapi server.
A new data generator has been added to the synthetic data container which supports more data types, simple disaster simulation, and more plausible behaviours as well as increased performance

Changed

FlowAPI now reports queued/running status for queries instead of just accepted
The following environment variables have been renamed:
- DB_USER -> FLOWDB_USER
- DB_HOST -> FLOWDB_HOST
- DB_PASS -> FLOWDB_PASS
- DB_PW -> FLOWDB_PASS
- API_DB_USER -> FLOWAPI_DB_USER
- API_DB_PASS -> FLOWAPI_DB_PASS
- FM_DB_USER -> FLOWMACHINE_DB_USER
- FM_DB_PASS -> FLOWMACHINE_DB_PASS
Added numerator_direction to ProportionEventType to allow for proportion of directed events.

Fixed

Server no longer loses track of queries under heavy load
TopUpBalances no longer always uses entire topups table

Removed

The environment variable DB_NAME has been removed.

0.4.2

Changed

MDSVolume no longer allows specifying the table, and will always use the mds table.
All FlowMachine logs are now in structured json form
FlowAPI now uses structured logs for debugging messages

0.4.1

Added

Added TopUpAmount, TopUpBalance query classes to FlowMachine.
Added PerLocationEventStats, PerContactEventStats to FlowMachine

Removed

Removed TotalSubscriberEvents from FlowMachine as it is superseded by EventCount.

0.4.0

Added

Dockerised development setup, with support for live reload of flowmachine and flowapi after source code changes.
Pre-commit hook for Python formatting with black.
Added new IntereventPeriod, ContactReciprocal, ProportionContactReciprocal, ProportionEventReciprocal, ProportionEventType and MDSVolume query classes to FlowMachine.

Changed

CustomQuery now requires column names to be specified
Query classes are now required to declare the column names they return via the column_names property
FlowAPI now reports whether a query is queued or running when polling
FlowDB test data and synthetic data images are now available from their own Docker repos (Flowminder/flowdb-testdata, Flowminder/flowdb-synthetic-data)
Changed query class name from NocturnalCalls to NocturnalEvents.
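
The requirement that query classes declare their columns through a column_names property can be pictured with an abstract base class. The sketch below uses stand-in classes. `QuerySketch`, `EventCountSketch` and the column names returned are assumptions for illustration and do not reproduce FlowMachine's real Query class.

```python
from abc import ABC, abstractmethod
from typing import List


class QuerySketch(ABC):
    """Stand-in for a query base class; not FlowMachine's real Query."""

    @property
    @abstractmethod
    def column_names(self) -> List[str]:
        """Names of the columns this query returns, in order."""


class EventCountSketch(QuerySketch):
    """Hypothetical subclass that declares its columns up front."""

    @property
    def column_names(self) -> List[str]:
        return ["subscriber", "value"]
```

Because the property is abstract, a subclass that omits it cannot be instantiated, which surfaces a missing declaration immediately instead of when the query is run.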

Fixed

FlowAPI is now an installable python module

Removed

Query objects can no longer be recalculated to cache and must be explicitly removed first
Arbitrary Flow maths
EdgeList query type
Removed the query class ProportionOutgoing, which became redundant with the introduction of ProportionEventType.

0.3.0

Added

API route for retrieving geography data from FlowDB
Aggregated meaningful locations are now available via FlowAPI
Origin-destination matrices between meaningful locations are now available via FlowAPI
Added new MeaningfulLocations, MeaningfulLocationsAggregate and MeaningfulLocationsOD query classes to FlowMachine

Changed

Constructors for HartiganCluster, LabelEventScore, EventScore and CallDays now have different signatures
Restructured and extended documentation; added high-level overview and more targeted information for different types of users

0.2.2

Added

Support for running FlowDB as an arbitrary user via docker's --user flag

Removed

Support for setting the uid and gid of the postgres user when building FlowDB

0.2.1

Fixed

Fixed being unable to build if the port used by git:// is not open

0.2.0

Added

Added utilities for managing and inspecting the query cache

0.1.2

Changed

FlowDB now requires a password to be set for the flowdb superuser

0.1.1

Added

Support for password protected redis

Changed

Changed the default redis image to bitnami's redis (to enable password protection)

0.1.0

Added

Added structured logging of access attempts, query running, and data access
Added CHANGELOG.md
Added support for Postgres JIT in FlowDB
Added total location events metric to FlowAPI and FlowClient
Added ETL bookkeeping schema to FlowDB

Changed

Added changelog update to PR template
Increased default shared memory size for FlowDB containers

Fixed

Fixed being unable to delete groups in FlowAuth
Fixed make up not working with defaults

0.0.5

Added

Added Python 3.6 support for FlowClient
