Intermittent tracking due to missing foreground service in Malawi on Huawei phones #677
Response from the user:
|
Digging a bit deeper into the reboots, we see that most of them are very short (~ 2 mins long). The only exception is the one at This does seem to argue for automatic restarts, but I am not sure why these restarts occur.
|
Apparently, some phones do reboot periodically, although the link focuses on the P10. We can in fact see the "last reboot" in the Android settings, so I am asking the user to check that as well. This is typically under Settings -> About device -> Status, or similar. However, if the reboots are of short duration, they are unlikely to affect the sensing in any serious way. So the check is purely to understand whether the reboots are a symptom of a more serious issue, which may lead to increased foreground service killing, etc. |
Next, let's look at the gaps in periodic execution and what they are linked with. I pick gaps of at least 24 hours, so that there is a jump in the day between adjacent execution attempts. This gives us:
|
Trying to combine these gaps with the foreground service killed notification is inconclusive.
After every gap, the foreground service had been killed by the time we restarted.
But there are plenty of instances where the foreground service was killed and there was no gap - e.g.
And plenty of syncs without a killed notification.
Concretely, it looks like on the 3rd, the periodic activity was invoked 26 times and the foreground service was detected as killed 7 times. That is an order of magnitude greater than what I have seen before.
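The per-day comparison above can be sketched as a small counting script. This is only an illustration: the event records, the `periodic` and `fg_service_killed` labels, and the dates are made-up placeholders, not the app's real log format.

```python
from collections import Counter
from datetime import datetime

# Hypothetical event records: (ISO timestamp, event kind).
# The labels and dates are invented for illustration only.
events = [
    ("2000-01-03T01:05:00", "periodic"),
    ("2000-01-03T01:05:02", "fg_service_killed"),
    ("2000-01-03T02:05:00", "periodic"),
    ("2000-01-04T01:05:00", "periodic"),
]


def daily_counts(events):
    """Return {date: Counter({kind: n})} for the given event records."""
    per_day = {}
    for ts, kind in events:
        day = datetime.fromisoformat(ts).date()
        per_day.setdefault(day, Counter())[kind] += 1
    return per_day


def killed_fraction(counter):
    """Fraction of periodic runs that found the foreground service killed."""
    runs = counter["periodic"]
    return counter["fg_service_killed"] / runs if runs else None


for day, counter in sorted(daily_counts(events).items()):
    print(day, dict(counter), killed_fraction(counter))
```

With the numbers reported for the 3rd (26 periodic runs, 7 kills), the fraction would be 7/26, or roughly 0.27.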
|
We also had a second Malawi user upload their logs. They have an android 10 phone from |
This second user had the foreground service killed only rarely
And even when they were killed, the service was only off for an hour or so. The max was 3 hours at around 5am.
Number of reboots is around the same
|
Looking through the periodic calls, we see that they are fairly regular for the second user, much more so than for the first user.
Looking at the outliers in both cases...
Gap of at least 24 hours between runs
Gap of at least 12 hours between runs
Gap of at least 6 hours between runs
So the second user doesn't have any really egregious gaps, but does have a lot more small gaps. Looking at some of the timestamps in detail...
For the second user, the gaps don't seem to be related to either the foreground service killed or the reboots.
Given that the gaps seem to principally occur overnight for the second user (the sync after the gap is consistently between 5:30am and 7:30am), the gaps for the second user seem to be primarily related to doze mode. |
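A minimal sketch of the gap analysis used in this comment, assuming the periodic run times have already been parsed out of the logs. The function names and the sample timestamps are hypothetical; only the 24/12/6 hour thresholds and the 5:30am to 7:30am window come from the discussion above.

```python
from datetime import datetime, time, timedelta


def find_gaps(run_times, min_gap):
    """Return (previous_run, next_run) pairs separated by at least min_gap."""
    ordered = sorted(run_times)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a >= min_gap]


def resumes_in_window(gaps, start, end):
    """Count gaps whose first run after the gap falls in [start, end] local time."""
    return sum(1 for _, resumed in gaps if start <= resumed.time() <= end)


# Made-up timestamps, only to exercise the functions.
runs = [datetime.fromisoformat(s) for s in (
    "2000-01-01T20:00:00",
    "2000-01-01T21:00:00",
    "2000-01-02T06:10:00",  # first run after an overnight gap
    "2000-01-02T07:10:00",
    "2000-01-03T08:00:00",  # first run after a gap longer than a day
)]

for hours in (24, 12, 6):
    gaps = find_gaps(runs, timedelta(hours=hours))
    print(f"gaps of at least {hours}h: {len(gaps)}")

overnight = find_gaps(runs, timedelta(hours=6))
print("resumed between 5:30 and 7:30:",
      resumes_in_window(overnight, time(5, 30), time(7, 30)))
```

If most gaps resume inside the early-morning window, that is consistent with the doze-mode explanation; gaps that resume at arbitrary times would point elsewhere.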
Response from the user: no foreground service kills for three days after settings change. However, app was incorrectly stuck in |
Final investigation on network issues, since we had problems with the upload:
or an IO Error.
For user1, the ratio was 2.56
For user2, the ratio was 0.53
However, in both cases, we recovered from the network errors, and the last logs were from successful attempts. So, at least for these two users, it looks like the network issues are intermittent and typically resolve themselves.
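For reference, a small sketch of how such a ratio and the "recovered" check could be computed. It assumes the ratio is failed sync attempts divided by successful ones, which this thread does not spell out, and it represents each attempt as a plain boolean rather than a real log entry.

```python
def error_ratio(outcomes):
    """Failed attempts divided by successful ones; None if nothing succeeded.

    `outcomes` is a chronological list of booleans (True = sync succeeded).
    Assumption: this is the ratio definition; the thread does not confirm it.
    """
    failures = sum(1 for ok in outcomes if not ok)
    successes = len(outcomes) - failures
    return failures / successes if successes else None


def recovered(outcomes):
    """True if the most recent attempt in the log was successful."""
    return bool(outcomes) and outcomes[-1]


# Made-up sequence: two failures followed by two successes.
sample = [False, False, True, True]
print(error_ratio(sample), recovered(sample))
```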
|
To summarize, there were three potential issues with the end-to-end pipeline:
From the investigation above, it seems pretty clear that the primary problem is (1). It appears as though (1) can largely be mitigated by fixing the background settings so that the foreground service is not killed as often (manual report from user 1 + #677 (comment) + #677 (comment)).
Pending issues/additional follow-ups:
LMK if there is anything else that you can think of. |
Some more data from a third Chinese phone. This time, it is a OnePlus in the US.
Large gap in periodic sync is typically overnight (restarting between 4am and 8am), similar to user 2 and consistent with doze mode.
Most of the killed notifications were also in this range, indicating that the app resources had been "cleaned up" overnight.
And so are the reboots.
Although the user was in the US, they also periodically encountered network errors, but the ratio was very low (~ 2%)
|
This log has now been uploaded. We seem to have only a few periodic calls on all the three recorded days.
|
The first geofence exit in this time range was on the 10th at 7am.
So what was going on for the duration of the 9th? We started off in the ongoing
It detected that we were on an ongoing trip and generated an initialize. Actually, we generated multiple initialize transitions, but I am not quite sure why.
At this point, we start trying to create a geofence (since we have an initialize transition during an ongoing trip).
And we fail consistently due to poor quality location points although all the settings seem to be correct. We only get a good quality location point on the morning of the 10th.
|
The launches were apparently due to the system automatically restarting the service. If we manually launch the service after it is killed, we pass in an intent.
In this case, there is no intent passed in, and the startId keeps increasing. We return
In subsequent calls, it was launched with an intent.
Except for this one.
The ones with the intent seem to be launched because we received an
|
After we got the bad location, we tried to read a new location, but it was consistently null
|
Note also that, although we did turn on the background settings, there were only a few periodic calls. Maybe the location was available more often, but we just didn't check because of the lack of periodic calls. We should explore silent push notifications on Android as well so that we can check more often. |
Response from tester about the differences between Sat/Sun/Mon
|
Confirmed that:
Further, the tester traveled to another location on Saturday, and the location was not available either during the drive or at the new location. That seems to indicate some phone-specific/location services-specific issue on Saturday. Not sure there is an easy resolution for that… |
One interesting note is that on Sun morning, we didn't read a new location. Instead, we read the last location and it was valid.
Sat morning
Sat late morning
Sat afternoon
Sun morning
So I wonder if there's some weirdness with reading locations in the background or something. |
It looks like the geofence location reading code fails more often than it works.
But occasionally, we do get some values, after a long time of getting no values. Again, this indicates issues with the phone/location services.
or
|
This does not appear to happen at the same rate for the three users here. Not sure why it happens more often for the first user, but it again lends credence to the theory that there is something wrong with their environment/phone.
In contrast, the first user in #678 had encountered it exactly once, which was the time that they reported. They have subsequently seen it two more times. But this is two orders of magnitude less than the first user.
|
Digging a bit further into this, it looks like the option to read the location specifically to create the geofence doesn't often work. The number of times we launch the callback and the number of times the new location read fails are pretty much identical.
This could be because:
|
I note that most of the failures seem to be because the |
Let us look at the three potential failure scenarios from Stack Overflow and see if any of them applied.
** About to call
** Called
** In the synchronized block **
Intent service broadcasts null
Create action receives it and generates a
Waiting code is notified and calls
And bails out completely
So we definitely call We use
And it looks like we do have a location update triggered by the listener, although the update is invalid before we call the next request (at least around the 9am call).
Request location updates
Get intent callback
At this point, we unregister the listener.
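The wait/notify flow traced above (request updates, wait in a synchronized block, receive a broadcast that may carry a null location, wake the waiter) can be sketched as follows. This is not the app's Java code: it swaps in Python's `threading.Condition` for Java's `synchronized`/`wait`/`notify`, and the class and method names are hypothetical. It shows how a waiter can tell "no broadcast arrived" apart from "a broadcast arrived but the location was null", and how a delivered flag avoids missing a notification that fires before the wait starts.

```python
import threading


class LocationWaiter:
    """Hypothetical stand-in for the code that waits for a location broadcast."""

    def __init__(self):
        self._cond = threading.Condition()
        self._delivered = False
        self._location = None

    def deliver(self, location):
        """Called by the receiver when a broadcast arrives (location may be None)."""
        with self._cond:
            self._location = location
            self._delivered = True
            self._cond.notify_all()

    def wait_for_location(self, timeout):
        """Block until a broadcast arrives or the timeout (seconds) expires."""
        with self._cond:
            # wait_for re-checks the flag, so a broadcast delivered before we
            # started waiting is not lost.
            self._cond.wait_for(lambda: self._delivered, timeout=timeout)
            if not self._delivered:
                return "timeout", None
            if self._location is None:
                return "null_location", None
            return "ok", self._location


waiter = LocationWaiter()
# Simulate the intent service broadcasting a null location shortly afterwards.
threading.Timer(0.05, waiter.deliver, args=(None,)).start()
print(waiter.wait_for_location(timeout=2.0))
```

Logging which of the three outcomes occurred would make it easier to see whether the failures on the first user's phone are missing broadcasts or broadcasts with null locations.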
|
Edited the code to always set the last location to null so that the
Started and ended trips in the emulator a few times, don't see any issues.
So there doesn't seem to be anything obviously wrong with the code. Will try to set this as a config setting to make it easier to test on actual phones, but at this point, this seems to be a rare, intermittent issue on most phones. |
Some additional manual data. Mostly accurate, except for some areas where the trip end was not correctly detected. Let's look at each of these individually.
First, although the trip end was apparently not detected for multiple hours, the trips were in fact segmented reasonably correctly. A couple of issues:
waiting for trip start error logs
filtered out erroneous trip
|
Foreground service issue resolved after background restrictions were removed. Follow ups:
|
And the phone had been stuck in "ongoing trip" state for two days.
To recap:
Missing data from 24th to 26th; Phone was turned off
I don’t know when it was turned off, but I do get a notification when it is turned on. And it was turned on multiple times in that time period
** Missing data from the 28th to the 30th; the app appeared to have background restrictions **
I basically don’t see any logs between the 27th, when we got a location point, and the 30th, when he launched the app. This means that the app was not launched every hour, it did not check for the foreground service, it didn’t get any locations, etc.
There were definitely stretches of time before this where the periodic sync ran successfully.
So I also checked with the user on whether they changed any app settings.