fix: prevent exception raised when serviceId is missing from calendar file #1646

Merged
merged 8 commits into from
Feb 2, 2024

Conversation

davidgamez
Member

@davidgamez davidgamez commented Jan 16, 2024

Summary:

This PR fixes the issue raised here.

How to reproduce the bug:

  • Have a feed with a service ID that is present in calendar_dates.txt but missing from calendar.txt (this is a valid use case), and in which several or all service IDs have expired calendars. Example: https://gtfs.pro/files/uran/improved-gtfs-dft-gtfs.zip
  • Execute the CLI or desktop application
  • Behavior: The report is missing the calendar expired notices for every service ID that comes after the row containing the service ID that is missing from calendar.txt. A java.util.NoSuchElementException can be found in the logs.

Expected behavior:
All calendar expired notices are reported.

The root cause of the issue is calling Optional.get without first checking that a value is present.

The fix:

  • Fixed calls to Optional.get, ensuring the value is present before calling the method.
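
A minimal sketch of the guard described above, not the validator's actual code. The class OptionalGuardSketch, the Map-based calendar, and the helper names are hypothetical; only java.util.Optional and its ofNullable, isPresent, and get methods are real API.

```java
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Optional;

public class OptionalGuardSketch {
  // Hypothetical lookup: service ID -> end date, standing in for a calendar.txt table.
  static Optional<String> findEndDate(Map<String, String> calendar, String serviceId) {
    return Optional.ofNullable(calendar.get(serviceId));
  }

  // Unsafe: Optional.get throws NoSuchElementException when the service ID is missing.
  static String endDateUnsafe(Map<String, String> calendar, String serviceId) {
    return findEndDate(calendar, serviceId).get();
  }

  // Safe: check for presence first and fall back to a caller-supplied default.
  static String endDateOrDefault(Map<String, String> calendar, String serviceId, String fallback) {
    Optional<String> endDate = findEndDate(calendar, serviceId);
    return endDate.isPresent() ? endDate.get() : fallback;
  }

  public static void main(String[] args) {
    Map<String, String> calendar = Map.of("WEEKDAY", "20231231");
    System.out.println(endDateOrDefault(calendar, "WEEKDAY", "<not in calendar.txt>"));
    System.out.println(endDateOrDefault(calendar, "HOLIDAY", "<not in calendar.txt>"));
    try {
      endDateUnsafe(calendar, "HOLIDAY");
    } catch (NoSuchElementException e) {
      System.out.println("unguarded get() threw NoSuchElementException");
    }
  }
}
```

In a loop over service IDs, the unguarded variant aborts at the first missing ID, which matches the symptom of notices disappearing for every row after it.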

What is new:

  • For feeds that have only a calendar_dates.txt file, expired notices are reported only when all service IDs are expired.
  • For feeds with service IDs missing from calendar.txt, the service IDs that are not in the calendar file are not reported.
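
The two rules above can be sketched as follows. This is a simplified reading of the description, not the validator's implementation: the class ExpiredCalendarSketch, the method expiredServiceIds, and the plain Set/Map inputs are assumptions made for illustration, and each service is reduced to its sorted set of active dates.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;

public class ExpiredCalendarSketch {
  /**
   * Returns the service IDs that would get an expired notice.
   *
   * @param calendarServiceIds service IDs listed in calendar.txt (empty if the file is absent)
   * @param datesByServiceId active dates per service ID
   * @param validationDate the date the feed is validated against
   */
  static List<String> expiredServiceIds(
      Set<String> calendarServiceIds,
      Map<String, SortedSet<LocalDate>> datesByServiceId,
      LocalDate validationDate) {
    boolean calendarEmpty = calendarServiceIds.isEmpty();
    boolean allExpired = true;
    List<String> expired = new ArrayList<>();
    for (Map.Entry<String, SortedSet<LocalDate>> entry : datesByServiceId.entrySet()) {
      SortedSet<LocalDate> dates = entry.getValue();
      if (dates.isEmpty()) {
        continue; // no active dates to compare
      }
      if (!dates.last().isBefore(validationDate)) {
        allExpired = false; // at least one service is still active
        continue;
      }
      // Rule 2: when calendar.txt exists, skip service IDs that are not listed in it.
      if (calendarEmpty || calendarServiceIds.contains(entry.getKey())) {
        expired.add(entry.getKey());
      }
    }
    // Rule 1: calendar_dates.txt-only feeds report only when every service is expired.
    return (calendarEmpty && !allExpired) ? List.of() : expired;
  }
}
```

With an empty calendarServiceIds set and one active service, the method returns an empty list; with a calendar that lists only some service IDs, expired IDs outside that list are skipped.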

Acceptance Tests Results:
Acceptance tests fail due to the number of expired feeds that were not reported previously. All cases are either foreign key violations on service IDs or feeds whose service IDs appear only in calendar_dates.txt.

Please make sure these boxes are checked before submitting your pull request - thanks!

  • Run the unit tests with gradle test to make sure you didn't break anything
  • Add or update any needed documentation to the repo
  • Format the title like "feat: [new feature short description]." Title must follow the Conventional Commit Specification(https://www.conventionalcommits.org/en/v1.0.0/).
  • Linked all relevant issues
  • Include screenshot(s) showing how this pull request works and fixes the issue(s)

@davidgamez davidgamez changed the title from "prevent exception raised when serviceId is missing from calendar file" to "fix: prevent exception raised when serviceId is missing from calendar file" Jan 16, 2024
Contributor

❌ Invalid acceptance test.
New Errors: 0 out of 1479 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Errors: 0 out of 1479 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
New Warnings: 16 out of 1479 datasets (~1%) are invalid due to code change, which is above the provided threshold of 1%.
Dropped Warnings: 0 out of 1479 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
0 out of 1479 sources (~0 %) are corrupted.
Commit: 07c25a7
Download the full acceptance test report here (report will disappear after 90 days).
❌ Invalid acceptance test.

@davidgamez davidgamez force-pushed the fix/calendar-expired-missing-serviceid branch from 9c29f43 to 35bb737 Compare January 18, 2024 16:50
Contributor

❌ Invalid acceptance test.
New Errors: 0 out of 1479 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Errors: 0 out of 1479 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
New Warnings: 141 out of 1479 datasets (~10%) are invalid due to code change, which is above the provided threshold of 1%.
Dropped Warnings: 0 out of 1479 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
0 out of 1479 sources (~0 %) are corrupted.
Commit: 60dd34d
Download the full acceptance test report here (report will disappear after 90 days).
❌ Invalid acceptance test.

Contributor

❌ Invalid acceptance test.
New Errors: 0 out of 1478 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Errors: 0 out of 1478 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
New Warnings: 141 out of 1478 datasets (~10%) are invalid due to code change, which is above the provided threshold of 1%.
Dropped Warnings: 0 out of 1478 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
1 out of 1479 sources (~0 %) are corrupted.
Corrupted sources:
ca-unknown-via-rail-canada-gtfs-735
Commit: f624168
Download the full acceptance test report here (report will disappear after 90 days).
❌ Invalid acceptance test.

@emmambd
Contributor

emmambd commented Jan 19, 2024

@davidgamez I did an initial round of checks on all 141 datasets to see whether the feed itself was completely expired, i.e. no services currently running. I did this check because a fully expired feed would indicate that the issue is primarily stale data on the Mobility Database (or that the producer hasn't updated the feed).

70 out of 141 feeds were fully expired, which leaves us with 71 that have this warning and are actually active (5% of the overall Mobility Database).

Next steps:

  1. See if there are replacements for the 70 stale feeds on the Mobility Database (this makes my manual cleanup work a lot easier, so thank you!)
  2. Discuss the remaining 5% with the spec team. It's still over the 1% threshold, but seems mainly to do with feeds that only use calendar_dates.txt (16 feeds, or about 1%, had this issue in calendar.txt; 55, or about 4%, have it from calendar_dates.txt).
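
The percentages quoted above can be checked with a short calculation. This is only a sketch: the class WarningShareSketch and its methods are hypothetical, the total of 1479 datasets is taken from the bot reports in this thread, and the strict "greater than" comparison is an assumption about how the threshold is applied.

```java
import java.util.Locale;

public class WarningShareSketch {
  // Percentage of datasets affected, e.g. share(16, 1479).
  static double share(int affected, int total) {
    return 100.0 * affected / total;
  }

  // Assumed rule: a category fails when its unrounded share exceeds the threshold.
  static boolean aboveThreshold(int affected, int total, double thresholdPercent) {
    return share(affected, total) > thresholdPercent;
  }

  public static void main(String[] args) {
    int total = 1479;
    String[] labels = {"all new warnings", "active feeds", "calendar_dates.txt", "calendar.txt"};
    int[] counts = {141, 141 - 70, 55, 16};
    for (int i = 0; i < counts.length; i++) {
      System.out.println(String.format(
          Locale.ROOT, "%-20s %3d feeds, %.2f%%, above 1%%: %b",
          labels[i], counts[i], share(counts[i], total), aboveThreshold(counts[i], total, 1.0)));
    }
  }
}
```

Under that assumption, even the 16 calendar.txt feeds alone sit slightly above the 1% threshold, which is consistent with the first bot report showing "~1%" as a failure.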

@emmambd
Contributor

emmambd commented Jan 23, 2024

Based on internal discussion, next steps are:

  • Update the logic so that, for feeds that only have calendar_dates.txt, this rule triggers only when all calendars are expired rather than when any one of them is

@davidgamez davidgamez force-pushed the fix/calendar-expired-missing-serviceid branch from c55390b to b449bc0 Compare January 25, 2024 15:27
Contributor

❌ Invalid acceptance test.
New Errors: 0 out of 1479 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Errors: 6 out of 1479 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
New Warnings: 0 out of 1479 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Warnings: 117 out of 1479 datasets (~8%) are invalid due to code change, which is above the provided threshold of 1%.
2 out of 1481 sources (~0 %) are corrupted.
Corrupted sources:
us-california-tehama-rural-area-express-trax-susanville-indian-rancheria-public-transportation-program-gtfs-116
us-pennsylvania-port-authority-of-allegheny-county-gtfs-409
Commit: c243f1b
Download the full acceptance test report here (report will disappear after 90 days).
❌ Invalid acceptance test.

@emmambd
Contributor

emmambd commented Jan 25, 2024

@davidgamez I'm a bit surprised by the results of the acceptance tests! I'd still expect around 71 feeds to return this warning because their calendar_dates.date fields are all expired. Here are a few examples:

ca-alberta-lethbridge-transit-gtfs-765: https://storage.googleapis.com/storage/v1/b/mdb-latest/o/ca-alberta-lethbridge-transit-gtfs-765.zip?alt=media

ca-ontario-go-transit-gtfs-727: https://storage.googleapis.com/storage/v1/b/mdb-latest/o/ca-ontario-go-transit-gtfs-727.zip?alt=media

Full original list of expired feeds is here. I didn't check them all to see if the producer actually made changes that would've impacted the re-run of the acceptance tests.

@davidgamez davidgamez force-pushed the fix/calendar-expired-missing-serviceid branch from d6f3f41 to 3605f4d Compare January 26, 2024 16:01
Contributor

❌ Invalid acceptance test.
New Errors: 0 out of 1475 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Errors: 0 out of 1475 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
New Warnings: 78 out of 1475 datasets (~5%) are invalid due to code change, which is above the provided threshold of 1%.
Dropped Warnings: 0 out of 1475 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
6 out of 1481 sources (~0 %) are corrupted.
Corrupted sources:
au-south-australia-adelaide-metro-gtfs-660
ca-alberta-grande-prairie-transit-gtfs-1278
de-bayern-augsburger-verkehrs--und-tarifverbund-avv-gtfs-857
jp-hyogo-kobe-subway-gtfs-872
us-oregon-canby-ferry-gtfs-605
us-washington-wahkiakum-ferry-gtfs-604
Commit: f7d6a80
Download the full acceptance test report here (report will disappear after 90 days).
❌ Invalid acceptance test.

Contributor

❌ Invalid acceptance test.
New Errors: 0 out of 1481 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Errors: 0 out of 1481 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
New Warnings: 78 out of 1481 datasets (~5%) are invalid due to code change, which is above the provided threshold of 1%.
Dropped Warnings: 0 out of 1481 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
0 out of 1481 sources (~0 %) are corrupted.
Commit: 8ccdd2b
Download the full acceptance test report here (report will disappear after 90 days).
❌ Invalid acceptance test.

@emmambd
Contributor

emmambd commented Jan 29, 2024

@davidgamez I took a quick look at the acceptance test results and this logic looks good to me.

When we run these analytics before each new release, this can be my list to clean up/check for replacements for feeds in the Mobility Database.

From a QA perspective, so long as we can run https://gtfs.pro/files/uran/improved-gtfs-dft-gtfs.zip successfully, this change can be passed.

Contributor

❌ Invalid acceptance test.
New Errors: 0 out of 1479 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Errors: 0 out of 1479 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
New Warnings: 80 out of 1479 datasets (~5%) are invalid due to code change, which is above the provided threshold of 1%.
Dropped Warnings: 0 out of 1479 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
2 out of 1481 sources (~0 %) are corrupted.
Corrupted sources:
au-queensland-translink-brisbane-gtfs-1217
ca-british-columbia-translink-vancouver-gtfs-1222
Commit: cf9d130
Download the full acceptance test report here (report will disappear after 90 days).
❌ Invalid acceptance test.

@davidgamez davidgamez marked this pull request as ready for review January 30, 2024 15:09
@jcpitre jcpitre self-requested a review February 1, 2024 18:28
}
if (serviceDates.last().isBefore(dateForValidation.getDate())) {
  if (calendarTable.byServiceId(serviceId).isPresent()) {
Contributor

Nitpicking here. In the case you know the calendarTable is empty (isCalendarTableEmpty == true), why bother doing a lookup on it?

Suggested change
- if (calendarTable.byServiceId(serviceId).isPresent()) {
+ if (!isCalendarTableEmpty && calendarTable.byServiceId(serviceId).isPresent()) {

Member Author

That's true. However, keeping the logic in the if condition short helps readability. Whether the table is empty or not, the byServiceId(serviceId) method behaves the same.
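
A small sketch of why the two conditions in this thread give the same result. It assumes, as the reply states, that a lookup on an empty table simply returns an empty Optional; the class GuardEquivalenceSketch and the Map-backed byServiceId helper are hypothetical stand-ins for the real table.

```java
import java.util.Map;
import java.util.Optional;

public class GuardEquivalenceSketch {
  // Hypothetical lookup mirroring the byServiceId call from the review thread.
  static Optional<String> byServiceId(Map<String, String> calendarTable, String serviceId) {
    return Optional.ofNullable(calendarTable.get(serviceId));
  }

  // Reviewer's suggestion: short-circuit when the table is known to be empty.
  static boolean withEmptyGuard(Map<String, String> calendarTable, String serviceId) {
    boolean isCalendarTableEmpty = calendarTable.isEmpty();
    return !isCalendarTableEmpty && byServiceId(calendarTable, serviceId).isPresent();
  }

  // Author's version: rely on the lookup returning an empty Optional.
  static boolean withoutEmptyGuard(Map<String, String> calendarTable, String serviceId) {
    return byServiceId(calendarTable, serviceId).isPresent();
  }

  public static void main(String[] args) {
    Map<String, String> empty = Map.of();
    Map<String, String> table = Map.of("WEEKDAY", "20231231");
    System.out.println(withEmptyGuard(empty, "WEEKDAY") == withoutEmptyGuard(empty, "WEEKDAY"));
    System.out.println(withEmptyGuard(table, "WEEKDAY") == withoutEmptyGuard(table, "WEEKDAY"));
    System.out.println(withEmptyGuard(table, "HOLIDAY") == withoutEmptyGuard(table, "HOLIDAY"));
  }
}
```

The extra guard only saves a lookup, so the choice between the two is about readability versus a small shortcut rather than correctness.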

Contributor

@jcpitre jcpitre left a comment

LGTM!

Contributor

github-actions bot commented Feb 1, 2024

❌ Invalid acceptance test.
New Errors: 0 out of 1479 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Errors: 0 out of 1479 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
New Warnings: 80 out of 1479 datasets (~5%) are invalid due to code change, which is above the provided threshold of 1%.
Dropped Warnings: 0 out of 1479 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
2 out of 1481 sources (~0 %) are corrupted.
Corrupted sources:
au-queensland-translink-brisbane-gtfs-1217
ca-british-columbia-translink-vancouver-gtfs-1222
Commit: cf9d130
Download the full acceptance test report here (report will disappear after 90 days).
❌ Invalid acceptance test.

Contributor

github-actions bot commented Feb 2, 2024

❌ Invalid acceptance test.
New Errors: 1 out of 1480 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Errors: 3 out of 1480 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
New Warnings: 80 out of 1480 datasets (~5%) are invalid due to code change, which is above the provided threshold of 1%.
Dropped Warnings: 3 out of 1480 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
1 out of 1481 sources (~0 %) are corrupted.
Corrupted sources:
fr-nouvelle-aquitaine-rbus-gtfs-1879
Commit: 442a87e
Download the full acceptance test report here (report will disappear after 90 days).
❌ Invalid acceptance test.

@davidgamez davidgamez merged commit 2881c79 into master Feb 2, 2024
332 of 333 checks passed
@davidgamez davidgamez deleted the fix/calendar-expired-missing-serviceid branch February 2, 2024 14:02