Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: new reports scheduler #11711

Merged
merged 17 commits into from
Nov 25, 2020

Conversation

dpgaspar
Copy link
Member

@dpgaspar dpgaspar commented Nov 16, 2020

SUMMARY

Implements the new alerts and reports processing on the celery workers. PR number 3, previous related PR's #11606 #11550

This is a rework heavily based on the existing alerts implementation, maintains the current logic with the following differences:

  • Celery beat scheduling, was simplified and does not try to schedule possible lost events from the past, the schedule window is framed present to future.
  • Only one beat config is needed for alerts and reports, should be configured with the lowest time grain possible for cron, aka 1 minute.
  • Implements a log prune task, this should be configured on a separate celery beat (e.g. 1 day).
  • Notifications currently support Email and Slack but it's easy to extend to other platforms.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

@dpgaspar dpgaspar changed the title Feat/reports models scheduler feat: new reports scheduler Nov 16, 2020
@codecov-io
Copy link

codecov-io commented Nov 16, 2020

Codecov Report

Merging #11711 (d0e3770) into master (ec8ccd4) will decrease coverage by 2.34%.
The diff coverage is 78.04%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #11711      +/-   ##
==========================================
- Coverage   62.86%   60.51%   -2.35%     
==========================================
  Files         889      866      -23     
  Lines       43054    42309     -745     
  Branches     4016     3725     -291     
==========================================
- Hits        27064    25604    -1460     
- Misses      15811    16705     +894     
+ Partials      179        0     -179     
Flag Coverage Δ
cypress 55.03% <ø> (?)
javascript ?
python 63.38% <78.04%> (+0.49%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/reports/commands/log_prune.py 0.00% <0.00%> (ø)
superset/tasks/celery_app.py 0.00% <0.00%> (ø)
superset/tasks/scheduler.py 0.00% <0.00%> (ø)
superset/reports/dao.py 76.13% <50.00%> (-8.94%) ⬇️
superset/reports/notifications/__init__.py 88.88% <88.88%> (ø)
superset/reports/notifications/slack.py 89.18% <89.18%> (ø)
superset/reports/commands/alert.py 92.00% <92.00%> (ø)
superset/reports/notifications/base.py 95.45% <95.45%> (ø)
superset/reports/commands/execute.py 96.29% <96.29%> (ø)
superset/dao/base.py 96.49% <100.00%> (+0.12%) ⬆️
... and 447 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ec8ccd4...d0e3770. Read the comment docs.

Copy link
Member

@willbarrett willbarrett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good start! A few comments, looking forward to seeing the test suite!!!

if len(rows) > 1:
raise AlertQueryMultipleRowsError()
if len(rows[0]) > 2:
raise AlertQueryMultipleColumnsError()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we only want to raise here if the validator is something other than "NOT NULL" - if a row is returned, then it's not null :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ups! Right you are

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it may be cleaner to have separate validation functions depending on the type of the validator

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

superset/reports/commands/alert.py Outdated Show resolved Hide resolved
self._report_schedule.last_value = self._result

if self._report_schedule.validator_type == ReportScheduleValidatorType.NOT_NULL:
return self._result not in (0, None, np.nan)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is correct if the result is a float, but if it's a row or set of rows this should behave differently.

superset/reports/commands/exceptions.py Outdated Show resolved Hide resolved
superset/reports/commands/exceptions.py Outdated Show resolved Hide resolved
superset/reports/commands/exceptions.py Outdated Show resolved Hide resolved
superset/reports/commands/exceptions.py Show resolved Hide resolved
from superset.reports.notifications.slack import SlackNotification


def create_notification(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the rare instance where I support a non-empty __init__.py. Revel, Daniel, in your success.

content = self._get_content()
to = self._get_to()
send_email_smtp(
to,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting question - should we treat this as a single SMTP send with all of the recipients in the to address field, or split recipients by , and send to each one individually? My preference would be the latter, but open to discussion. Eventually we should provide configuration to allow MTA batching.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now it's somewhat simplistic, each recipients target will trigger an email, if the target contains multiple SMTP addresses then only one email is sent. I would vote for optimising on the next phase

logger = logging.getLogger(__name__)


class AsyncExecuteReportScheduleCommand(BaseCommand):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read this class a few times looking for suggestions on simplification or breaking it up. I didn't come up with much, but I figured I'd surface that I felt the need to make the effort.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a slight refactor, main run is pretty simple now

@dpgaspar dpgaspar marked this pull request as ready for review November 19, 2020 15:07
Copy link
Member

@bkyryliuk bkyryliuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left couple nits, but overall looks really good
definitely needs unit tests, your call if they should be added in this PR or later

Thanks a lot for revamping alerts!

if len(rows) > 1:
raise AlertQueryMultipleRowsError()
if len(rows[0]) > 2:
raise AlertQueryMultipleColumnsError()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it may be cleaner to have separate validation functions depending on the type of the validator

self._result = rows[0][1]
return
# check if query return more then one row
if len(rows) > 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be useful to add result to the exception & surface it to the user in the error msg

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about surfacing the number of returned rows? Would be nice to try to wrap all the rows in a string, but need to be careful and truncate it to a certain point, think about a user mistake with millions of rows?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, number of rows is a good start, we can tune it later, your right that it needs smart truncation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, done that

dashboard_id_or_slug=report_schedule.dashboard_id,
)

def _get_screenshot(self, report_schedule: ReportSchedule) -> ScreenshotData:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it would be nice to pass only the attributes needed to calculate the screenshot vs a whole class
it makes it easier to unit tests the functions when less objects needs to be constructed, and read the functions / code as well as functions definition explains what is passed to it.

this is just a personal preference and optional suggestion

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, yet commands internally are model centric, we assume that validate validates and populates the model by model_id. My take on unit tests is to test public methods, and assume that the screenshot itself is being tested also

self.set_state_and_log(
session, start_dttm, ReportLogState.NOOP, error_message=str(ex)
)
except ReportSchedulePreviousWorkingError as ex:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this one be retried ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think not, since I'm proposing a 1min beat, the beat cycle itself is the retry, yet the cron for the alert can be set to 1h or 10min etc. Undecided..., I propose not to throw too much logic here, unless we find it's something we really should add

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good

if self._model.type == ReportScheduleType.ALERT:
last_success = ReportScheduleDAO.find_last_success_log(session)
if (
last_success
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice to move it to a separate function & have unit test

logger = logging.getLogger(__name__)


class AsyncPruneReportScheduleLogCommand(BaseCommand):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

screenshot: ScreenshotData


class BaseNotification: # pylint: disable=too-few-public-methods
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@dpgaspar
Copy link
Member Author

left couple nits, but overall looks really good
definitely needs unit tests, your call if they should be added in this PR or later

Thanks a lot for revamping alerts!

Thanks a lot for the review! note that this PR already has unit tests

@bkyryliuk
Copy link
Member

left couple nits, but overall looks really good
definitely needs unit tests, your call if they should be added in this PR or later
Thanks a lot for revamping alerts!

Thanks a lot for the review! note that this PR already has unit tests

damn, that's my bad - need to keep attention to the collapsed github files :)

image

@dpgaspar dpgaspar merged commit f27ebc4 into apache:master Nov 25, 2020
@dpgaspar dpgaspar deleted the feat/reports-models-scheduler branch November 25, 2020 08:50
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 1.0.0 labels Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/XXL 🚢 1.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants