Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aggregate network objects #607

Merged
merged 46 commits into from
Apr 17, 2019
Merged

Aggregate network objects #607

merged 46 commits into from
Apr 17, 2019

Conversation

OwlHute
Copy link
Contributor

@OwlHute OwlHute commented Apr 10, 2019

Closes #601

I have:

  • Formatted any Python files with black
  • Brought the branch up to date with master
  • Added any relevant Github labels
  • Added tests for any new additions
  • Added or updated any relevant documentation
  • Added an Architectural Decision Record (ADR), if appropriate
  • Added an MPLv2 License Header if appropriate
  • Updated the Changelog

Description

Add API route aggregate_network_objects()

@@ -90,6 +90,10 @@ The API exposes four routes:

At present, the following query types are accessible through FlowAPI:

- `aggregate_network_objects`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -1022,3 +1022,31 @@ def total_network_objects(
"end_date": end_date,
"aggregation_unit": aggregation_unit,
}


def aggregate_network_objects(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been giving the params for this some thought - I actually think what would be most useful is to have it take: network_objects, statistic, by. That can be achieved without touching the underlying Flowmachine class, by constructing a tno object, and then calling aggregate on it with the by and statistic arguments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll have to expand on that a bit, in terms of roughly how the call would look. I gather tno is TotalNetworkObject, and "by" is possibly the date range?! But also, I thought you were trying to do away with calls to aggregate() (#599)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you were, I understand now what you are driving at! (in view of the review comment further down)

Copy link
Member

@greenape greenape Apr 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arguments need to be total_network_objects_query: dict, statistic and by: str (or a more explanatory name than by would also be fine!)

@@ -203,7 +203,7 @@ def aggregate(self, by=None, statistic="avg"):
)


class AggregateNetworkObjects(GeoDataMixin, Query):
class AggregateNetworkObjects(TotalNetworkObjects, Query):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's going on here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This relates to your comment in #601 "Refactor AggregateNetworkObjects to take TotalNetworkObjects as main argument"

Copy link
Member

@greenape greenape Apr 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. I think I've failed to communicate what I meant there - the suggestion was that rather than taking all the arguments necessary to create a TotalNetworkObjects object in the __init__ method, that this class should instead just take a TotalNetworkObjects object instead of those arguments.

This change actually alters the class hierarchy of the class (https://www.digitalocean.com/community/tutorials/understanding-class-inheritance-in-python-3 is a decent overview of class inheritance in python), and I imagine will cause a few issues because TotalNetworkObjects also inherits from Query, which would give you a somewhat odd inheritance hierarchy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I'll amend the code accordingly. (I do understand python class hierarchies, but I'll have a quick skim of that page)

@codecov
Copy link

codecov bot commented Apr 11, 2019

Codecov Report

Merging #607 into master will increase coverage by 0.05%.
The diff coverage is 94.62%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #607      +/-   ##
==========================================
+ Coverage   93.38%   93.44%   +0.05%     
==========================================
  Files         122      123       +1     
  Lines        6199     6253      +54     
  Branches      666      668       +2     
==========================================
+ Hits         5789     5843      +54     
+ Misses        286      285       -1     
- Partials      124      125       +1
Impacted Files Coverage Δ
flowauth/backend/flowauth/models.py 93.29% <ø> (ø) ⬆️
...ine/core/server/query_schemas/flowmachine_query.py 100% <100%> (ø) ⬆️
...core/server/query_schemas/total_network_objects.py 100% <100%> (ø) ⬆️
flowclient/flowclient/client.py 93.9% <100%> (+1.23%) ⬆️
.../server/query_schemas/aggregate_network_objects.py 100% <100%> (ø)
flowapi/flowapi/user_model.py 97.43% <100%> (+0.06%) ⬆️
...machine/flowmachine/core/server/action_handlers.py 94.07% <100%> (+0.2%) ⬆️
...machine/core/server/query_schemas/custom_fields.py 77.77% <76.47%> (+3.58%) ⬆️
...wmachine/features/network/total_network_objects.py 97.1% <94.44%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fa9c6c0...79d179e. Read the comment docs.

subscriber_identifier=subscriber_identifier,
)
def __init__(self, total_objs, statistic="avg", by=None):
self.total_objs = copy.deepcopy(total_objs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need a deepcopy here, just an assign is fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below this, period will now need to refer to the period attribute of the passed in object.



class AggregateNetworkObjectsExposed(BaseExposedQuery):
def __init__(self, *, start_date, end_date, aggregation_unit):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to match the AggregateNetworkObjects form now


class AggregateNetworkObjectsSchema(Schema):

start_date = fields.Date(required=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to have the right fields, one of which is a bit tricky because it should refer to another schema. See FlowsSchema, or ModalLocationSchema for an example of how to do that.

@maxalbert
Copy link
Contributor

@OwlHute FYI, PRs #609 (spatial aggregate) and #611 (joined spatial aggregate) have now been merged, so hopefully that helps with this PR as well (see Jono's inline comment).

Github is indicating a couple of merge conflicts with this branch due to the changes in #609 and #611 so these will need to be resolved (but should be very simple).

@OwlHute
Copy link
Contributor Author

OwlHute commented Apr 12, 2019

Hi Max. I've merged the lastest "master" branch with my aggregate_network_objects private branch & pushed the latter to github for #607

Running the tests on my workspace, there is one failure. But I am not clear on the reason for this & could use a bit of help to diagnose & fix it. (I think the code is 99% OK and expect only a very small fix is needed, but the trick is knowing where and what! )

@maxalbert
Copy link
Contributor

Thanks @OwlHute. Which tests are failing for you locally? When I click on the "Details" link next to the integration checks failure here on Github (see here) it shows two failing tests:

  • tests/flowmachine_server_tests/test_server.py
  • tests/query_tests/test_queries.py

The first one seems to be from a missing update to the API spec. Have you been able to get the approvaltests running locally? If so, it should be just a matter of running this particular test (e.g. via pytest -svx tests/flowmachine_server_tests/test_server.py from within the integration_tests pipenv environment) and it will update the spec for you (via a GUI popup in whichever diff viewer you have configured as explained in the developer docs), so you just need to apply the changes in the diff viewer and commit the result.

For the second failing tests, if you run it via pytest -svx tests/query_tests/test_queries.py (again from within the integration_tests pipenv environment, it will succeed for the first few parameters and then fail on the aggregate_network_objects parameter (also see the CircleCI failure linked above). This is due to some of Jono's recent changes in #609 or #611. If you take a look at how the other JSON parameters have changed now (with a toplevel "spatial_aggregate" key rather than having the query_kind directly as the toplevel key) you should be able to deduce how your addition must be tweaked to make the tests pass.

@maxalbert maxalbert changed the title Initial commit Aggregate network objects Apr 12, 2019
@OwlHute
Copy link
Contributor Author

OwlHute commented Apr 12, 2019

Thanks for the review comment Max. I have fixed the first error. However, I am still slightly puzzled by the second issue.

From the test output, the following is the query being sent to flowapi :

query_kind = 'aggregate_network_objects'
params = {'by': 'day', 'statistic': 'median', 'total_network_objects': {'aggregation_unit': 'admin3', 'end_date': '2016-01-02', 'query_kind': 'total_network_objects', 'start_date': '2016-01-01'}}

and referring to total_network_objects() in flowclient/flowclient/client.py that 'total_network_objects' dictionary is correct:

return {
    "query_kind": "total_network_objects",
    "start_date": start_date,
    "end_date": end_date,
    "aggregation_unit": aggregation_unit,
    "period": period,
}

It also matches the parameters in the AggregateNetworkObjects class in flowmachine/flowmachine/core/server/query_schemas/total_network_objects.py :

def init(self, *, total_network_objects, statistic="avg", by=None):
:::

Finally, in flowmachine/flowmachine/core/server/query_schemas/aggregate_network_objects.py the actual query is run as:

def _flowmachine_query_obj(self):
"""
Return the underlying flowmachine aggregate_network_objects object.

   Returns
   -------
   Query
   """
   tot_network_objs = self.total_network_objects._flowmachine_query_obj

  return AggregateNetworkObjects(
       tot_network_objs, statistic=self.statistic, by=self.by
   )

The point is "spatial_aggregate" is not present in the input query, only in the output results.

I assume the latter call must introduce SpatialAggregate into the mix, but I'm not clear where and what need amending for this. (Once I've fixed this issue, I should be able to handle any similar changes in future.)

It's easy to see what the code is trying to achieve in principle, but all the same the details are damned intricate!

@OwlHute
Copy link
Contributor Author

OwlHute commented Apr 12, 2019

I've got a bit further, and identified the reason for the test failure. (A bug in the test code prevents this being displayed properly!):

       try:
           error = response.json()["msg"]
           # payload_info = f" Payload: {response.json()['payload']}"
           payload_info = str(response.json()) + ' (response.status_code = ' + str(response.status_code) + ')'
       except ValueError:
           # Happens if the response body does not contain valid JSON
           # (see http://docs.python-requests.org/en/master/api/#requests.Response.json)
           error = f"the response did not contain valid JSON"
           payload_info = ""
       raise FlowclientConnectionError(
          f"Something went wrong: {error}. API returned with status code: {response.status_code}.{payload_info}"
       )

E flowclient.client.FlowclientConnectionError: Something went wrong: Aggregation unit must be > specified when running a query.. API returned with status code: 400.{'msg': 'Aggregation unit must be > specified when running a query.'} (response.status_code = 400)

The bug is that the dict returned by response.json() contains no 'payload' key, so the error was being displayed as just "KeyError: 'payload'".

So the next step is to find out where that message "Aggregation unit must be specified when running a query" is generated. (For the new aggregate_network_objects() API route, it is no longer appropriate!), This turns out to be in /flowapi/flowapi/user_model.py function _get_query_kinds_and_aggregation_units() and the JSON it objects to is:

{
'query_kind': 'aggregate_network_objects',
'total_network_objects':
{
'query_kind': 'total_network_objects',
'start_date': '2016-01-01',
'end_date': '2016-01-02',
'aggregation_unit': 'admin3'
},
'statistic': 'median',
'by': 'day'
}

and I can see nothing wrong with that query!

@maxalbert
Copy link
Contributor

Thanks @OwlHute! Yes, you are right that the missing payload key was shadowing the actual error message.

I had a look and the error happens in flowapi/flowapi/user_model.py. The issue is that the code in UserObject._get_query_kinds_and_aggregation_units() assumes that unless the query_kind is spatial_aggregate or joined_spatial_aggregate, it must have an aggregation unit itself.

However, aggregate_network_objects isn’t a spatial aggregate (but rather a statistical aggregate) so it doesn't contain an aggregation_unit itself. The easiest fix that I can think of for the time being is to add another else branch which checks for aggregate_network_objects and extracts the aggregation_unit from the underlying total_network_objects (@greenape: any better ideas?).

This is a bit of a hack for the time being, but unless I'm missing something it would require larger refactorings to have a more elegant solution and I guess this is outside the scope of this particular PR. Let me know if this explanation makes sense and is sufficient to make the required tweaks.

tot_network_objs = self.total_network_objects._flowmachine_query_obj

return AggregateNetworkObjects(
tot_network_objs, statistic=self.statistic, by=self.by
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jono's latest changes require this to be a keyword argument now:

Suggested change
tot_network_objs, statistic=self.statistic, by=self.by
total_network_objects=tot_network_objs, statistic=self.statistic, by=self.by

TotalNetworkObjects query result
statistic : str
Statistic type
aggregate_by : str
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
aggregate_by : str
aggregate_by : {"second", "minute", "hour", "day", "month", "year", "century"

CHANGELOG.md Outdated
@@ -5,6 +5,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## [Unreleased]
### Added
- Added new flowclient API entrypoint, aggregate_network_objects(), to access (with simplified parameters) equivalent flowmachine query [#601](https://github.com/Flowminder/FlowKit/issues/601)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Added new flowclient API entrypoint, aggregate_network_objects(), to access (with simplified parameters) equivalent flowmachine query [#601](https://github.com/Flowminder/FlowKit/issues/601)
- Added new flowclient API entrypoint, `aggregate_network_objects`, to access equivalent flowmachine query [#601](https://github.com/Flowminder/FlowKit/issues/601)

CHANGELOG.md Outdated
@@ -5,6 +5,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## [Unreleased]
### Added
- Added new flowclient API entrypoint, aggregate_network_objects(), to access (with simplified parameters) equivalent flowmachine query [#601](https://github.com/Flowminder/FlowKit/issues/601)

### Changed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a changed entry here as well to reflect that the names of the arguments to TotalNetworkObjects and AggregateNetworkObjects have been altered?

CHANGELOG.md Outdated
@@ -14,10 +15,12 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## [0.5.3]
### Added
- Amended time aggregation parameter names used in existing `total_network_objects` query from "period" in flowclient & "by" in flowmachine to "total_by" in both.
Copy link
Member

@greenape greenape Apr 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to go in the [Unreleased] section, and should mention the underlying TotalNetworkObjects and AggregateNetworkObjects classes

CHANGELOG.md Outdated
@@ -5,6 +5,9 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## [Unreleased]
### Added
- Amended time aggregation parameter names used in existing `total_network_objects` query from "period" in flowclient & "by" in flowmachine to "total_by" in both.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to be pedantic, but can this go in the changed section of [Unreleased], and can we explicitly refer to TotalNetworkObjects and AggregateNetworkObjects?

CHANGELOG.md Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
Copy link
Member

@greenape greenape left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@greenape greenape added the ready-to-merge Label indicating a PR is OK to automerge label Apr 16, 2019
@OwlHute
Copy link
Contributor Author

OwlHute commented Apr 16, 2019

The CHANGELOG Unreleased "Changed" entry now reads:

Changed

  • The period argument to TotalNetworkObjects in FlowMachine has been renamed total_by
  • The period argument to total_network_objects in FlowClient has been renamed total_by
  • The by argument to AggregateNetworkObjects in FlowMachine has been renamed to aggregate_by
  • The by argument to AggregateNetworkObjects in FlowMachine has been renamed to `aggregate_by

So I think in that 4th line AggregateNetworkObjects & "FlowMachine" should read `aggregate_network_objects" & "FlowClient" respectively

CHANGELOG.md Outdated Show resolved Hide resolved
CHANGELOG.md Show resolved Hide resolved
@greenape greenape merged commit b7458de into master Apr 17, 2019
@greenape greenape deleted the aggregate_network_objects branch April 17, 2019 13:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-to-merge Label indicating a PR is OK to automerge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expose AggregateNetworkObjects via api
3 participants