Closed Bug 932566 Opened 11 years ago Closed 11 years ago

[B2G][Email] - Email Notification Synchronization for checking new messages have inconsistent timing

Categories

(Firefox OS Graveyard :: Gaia::E-Mail, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 933079

People

(Reporter: tnguyen, Unassigned)

References

Details

(Whiteboard: burirun3)

Attachments

(3 files, 1 obsolete file)

Attached file logcat (obsolete) —
On first setup of an email account on device, make sure to set "Check for new messages" to 5 minutes, 10 minutes, 15 minutes, 30 minutes, or 1 hour. When the user receives an email from another source, the notifications will flucuate and be inconsistent with when they've set to check for new messages. 

Repro Steps:
1) Updated Buri to V 1.2 COM RIL BuildID: 20131025004000
2) Navigate into Email App
3) Setup an email account
4) When asked to "Check for new messages" - set value to 15 minutes and complete setup
5) Press Home button to bring device back to homescreen
6) Send an email to the email account just set up on the device
7) Wait 15 minutes and observe:

Actual:
Notification for new message will appear long after 15 minutes

Expected:
Notification for new message will appear at 15 minutes or less

Environmental Variables:
Device: Buri v1.2 COM RIL
BuildID: 20131025004000
Gaia: 606517ceafe0950c2b89822d5f13353743334f2c
Gecko: 5eabd267ef04
Version: 26.0a2
Firmware Version: 01.02.00.019.082

Notes:

Repro frequency: 100%
See attached: logcat
Example test case: 

https://moztrap.mozilla.org/manage/case/9403/
https://moztrap.mozilla.org/manage/case/9404/
https://moztrap.mozilla.org/manage/case/9405/
https://moztrap.mozilla.org/manage/case/9406/
This was not a feature in 1.1.
I think this bug needs to be a bit more specific. For example, what's the definition of long after here? Trying to get a better idea of bug impact here.
Flags: needinfo?(tnguyen)
Comment on attachment 824353 [details]
logcat

Obsoleting the attachment; this logcat only has 20 seconds worth of data.  You need to be logging the logcat output to disk during the entire time period that the e-mail app might wake up, etc.  logcat uses a circular buffer and so unless you've set the buffer to be absurdly large on the device (and the device has the memory for it), just running it after the fact usually will find that much of what you care about is already gone.

Since 15 minutes of logs can get pretty big, as per the instructions at https://wiki.mozilla.org/Gaia/Email/RequiredBugInfo there are ways to reduce the amount of output.

Also, you might want to make sure that the device's wi-fi is either disabled or that the device is on wall power so wi-fi never turns off.  (See bug 907028 for more info on that situation.)  But basically if the e-mail app wakes up and tries to connect and fails to connect, it's just going to go back to sleep and wake up in 15 minutes.  We don't wake up and then when we fail wake up every minute after that or anything like that.
Attachment #824353 - Attachment is obsolete: true
Attached file logcat.txt
I had just reprod this issue again with a device that has been flashed and reset. When setting up the Email App, I had set the value to check for new messages to 5 minutes. Wi-Fi was disabled and the email notification was received 3 minutes past the 5 minute mark. 

Once I recieved the email notification, I copied over the logcat from the terminal after having previously setup my device from Andrew's instructions.
Flags: needinfo?(tnguyen)
Andrew - Can you an impact assessment on this bug? Trying to figure out if we need to block on this or not.
needinfo on comment 4
Flags: needinfo?(bugmail)
Ugh, apologies on this, but the logcat directive I had called out in https://wiki.mozilla.org/Gaia/Email/RequiredBugInfo did not include a timestamp in its formatting, so the log you gathered (thank you!) is unfortunately not fully useful in this case.  We want "-v time" in there too, giving us:
  adb logcat -v time -s GeckoDump:V Gecko:V

I've updated the wiki to include this.  Actually, I've cleaned it up a bit in general and added a suggestion to use the "tee" command to both log to disk and show the output as it logs.  So the full command would be:
  adb logcat -v time -s GeckoDump:V Gecko:V | tee FILENAME-TO-LOG-TO


What the log does show is:
- 2 cronsync runs.
- The first run finds no new messages and allegedly schedules a mozAlarm 5 minutes in the future
- The second run finds a new message.


I'm not sure I entirely understand the exact timeline we're expecting, though.  Unless you completely disable periodic checks, you are not going to be in control of the phase of the 5 minute checks.  If you turn off *all* periodic checks.  Then turn the one you want to 5 minutes, we will set a new time for 5 minutes at the time you set that setting.


Also, based on the other similar bug that was filed where the new message wasn't in the inbox, I think we may also want to know when the e-mail is actually being delivered to the mail server.  We'd probably want that via some mail client that is not us that understands IDLE notifications so that we can see its arrival.  I think the gmail web UI on Firefox does not do proper push notifications based on my own experiences, but I could be wrong; maybe Chrome gets a better one?  Also unsure if Thunderbird handles gmail's IDLE stuff right.

It's conceivable the INTERNALDATE may be largely correct and we can just check the timestamps, but that might be sketchy and there could be some clock skew between adb logcat and google unless we're sure everything's NTP synchronized...

The safest thing is probably to use the "tee" command I mention above so you can see when the periodic sync is firing and once it has fully completed, then you send the new e-mail.  So that way it's very likely the mail will have been received by the testing account and processed before you are done.  Since it's probably annoying to have to wait 5-10 minutes to test this and we do have secret settings you may not know about, I've documented how to set the sync interval to 1 minute on https://wiki.mozilla.org/Gaia/Email/DebuggingTricks .  Please make any enhancements to the wiki page or the one it references if you think they can be improved.


In summary, I think we're doing what we expect to be doing here, and it should show up that way once we're all on the same page.  It's probably worth getting one more log with timestamps and with the new message sent at the ideal time to double check.
Flags: needinfo?(bugmail)
needinfo for the above comment
Flags: needinfo?(tnguyen)
Attached file EmailSyncLogcat.txt
10-30 16:39:17.919 I/Gecko   ( 1738): WLOG: Sync Completed! null days 34 messages synced
10-30 16:39:17.919 I/Gecko   ( 1738): WLOG: folder message count 74 dbCount 66 syncedThrough null oldest known 

1382888032000
10-30 16:39:17.929 I/GeckoDump( 1738): LOG: email oncronsyncstop: 0
10-30 16:39:17.929 I/GeckoDump( 1738): LOG: email: clearing wake locks for "id0"


according to this logcat it detected the email at 16:39, I did not receive an actual notification until 10-30 16:45.
Flags: needinfo?(tnguyen)
(In reply to Peter Bylenga from comment #8)
> according to this logcat it detected the email at 16:39, I did not receive
> an actual notification until 10-30 16:45.

So... we didn't actually detect the message.  Even though we know that there's a folder message count of 74, the query we sent the server did not find the new message.  This is very concerning.  It implies either some type of weird caching on google's end (maybe we need to re-select the folder or use IDLE to get an update?  maybe we regressed when we stopped using idle?), or a bug on google's end, or an unusually serious disagreement about timezone or IMAP spec.  I may spawn another bug for this.

The log entries that you would see if we had detected them would be like from your first log:
===
  new fetched, header processing, INTERNALDATE:  30-Oct-2013 15:54:04 +0000
I/Gecko   (  926): WLOG: onNewHeader: [object Object]
...
I/Gecko   (  926): WLOG: cronsync: Asking for snippets for 1 headers
I/Gecko   (  926): 
I/Gecko   (  926): WLOG: runOp(do: {"type":"downloadBodies","longtermId":"0/7","lifecycle":"do","localStatus":"done","serverStatus":"doing","tryCount":0,"humanOp":"downloadBodies","messages":[{"i)
I/Gecko   (  926): WLOG: cronsync: Notifying for 1 headers
===
Okay, I filed bug 933079 on the phenomenon I mentioned in comment 9.  I realize that I'm assuming the test account in question is using gmail based on the other logs you provided today.

In the future, when filing e-mail bugs, please do try and provide the e-mail domain in use, as per https://wiki.mozilla.org/Gaia/Email/RequiredBugInfo as it can be dangerous when I fill in these blanks sometimes :)


We should probably try and reproduce this bug on an hotmail.com/outlook.com ActiveSync account (non-gmail!) or another non-gmail-hosted IMAP account (so no google apps-for-domains either).  Let's avoid using outlook.com/hotmail.com's IMAP though.

Marking needinfo for an attempt at that.
Flags: needinfo?(tnguyen)
Attaching the logcat captured from an Outlook ActiveSync account. Check for new messages was set to "Every 5 minutes".
Flags: needinfo?(tnguyen)
Email Notifications appear to be behaving differently across Gmail, Outlook, and Yahoo. We've constructed a spreadsheet with the trials that were run per device, outlaying the types of accounts used, service provider, and whether the device was on the lockscreen/homescreen. 

The link to the google doc is located here: https://docs.google.com/a/qanalydocs.com/spreadsheet/ccc?key=0ArrO6yKjSooKdEVOdkJ0eVFfSHl1QS1jTUJBRHlEREE&usp=sharing
Those trial results are very concerning. What it shows is that our timing for checking for new messages is unpredictable enough to indicate that the feature isn't working effectively - out of 31 trials, we failed 9 times. That's a 71% success rate, which fails too low for ship in my opinion. We really need to above 95% at a minimum.
blocking-b2g: --- → koi?
(In reply to Jason Smith [:jsmith] from comment #13)
> Those trial results are very concerning. What it shows is that our timing
> for checking for new messages is unpredictable enough to indicate that the
> feature isn't working effectively - out of 31 trials, we failed 9 times.
> That's a 71% success rate, which fails too low for ship in my opinion. We
> really need to above 95% at a minimum.

Meant to say - which falls to low for ship in my opinion
After discussions with Peter and Tony on IRC, I think there's a few things going on which make the failures potentially misleading:

- Yahoo issues: it seems like Yahoo's internal mail spools/whatever are backed up and the data is not particularly useful:
PBylenga> we found that Yahoo doesn't have an activesync and found in our own tests that email to email while on the web was having a 11 minute lag at times for sending/receiving an email, our results to phone were 25+ minutes to receive the email on a 5 minute sync

- gmail issues: this seems like bug 933079 may be causing us to not find anything until the second poll interval.  I definitely think this is a serious bug, and I definitely think we want to address this one before we ship.

Also, I think the moztrap instructions leave potential phase problems and such.  I propose the following changes to our moztrap cases for this.  (Some of this I mentioned above, but I want to make sure we formalize these steps and it's clear that they were followed.)

1) Nuke the 10 minute, 15 minute, and 30 minute cases.  These are a waste of human manpower.  Literally the only thing that changes is a hard-coded number that is the number of seconds.  No interesting phenomenon will occur with that change.  In the event we ever screw up and it becomes relevant, we can add them back if we don't feel confident we've addressed that with automated tests.  These are:

https://moztrap.mozilla.org/manage/case/9404/
https://moztrap.mozilla.org/manage/case/9405/
https://moztrap.mozilla.org/manage/case/9406/


2) Split the 5-minute test at https://moztrap.mozilla.org/manage/case/9403/ into two different test cases.

- Variant one (implied by current description).  Leave the mail app active.  After creating the account and entering the inbox, either let the screen lock or go back to the home-screen.  Don't use any apps or that might cause the e-mail app to get killed and not test what we want tested.

- Variant two.  Synchronization with the mail app closed.  After creating the account, close the mail app by killing it from the task screen.  I would expect gmail to experience no failures in this case as long as we have corrected phase.


3) Get a fastmail.fm or some other high-quality IMAP server account.  gmail has weird IMAP, Yahoo has horrible IMAP, we should have a real IMAP server in the mix.  fastmail has a web interface too, so it's not like people would need to set up Thunderbird/etc.


4) Modify both of the 5-minute variants to ensure that the e-mail has the full N minutes to arrive at the destination and be processed by managing the phase of the syncs.  This will help eliminate misleading results and also give mail spools a reasonable amount of time to get to where they're going.

I suggest we do this by watching the logcat so that we only send the e-mail just after the last automatic polling interval occurs.  We should use the command that includes using "tee" from https://wiki.mozilla.org/Gaia/Email/RequiredBugInfo so that we also have a full log of all of these.  I really want logs when these fail, because otherwise we have no idea what happened in order to determine what failed.

NOTE! I realized the instructions there previously had not covered the "how to make adb not disconnect when the screen locks/turns off".  I've added minimal instructions for the from-the-UI way.


5) Further modify them to specify that the e-mail should be sent using a non-Firefox OS device (webmail is fine, possibly) from a different mail account (same provider is fine) and that it should be verified that the e-mail is visible in the webmail/UI for the account well before the sync interval is up.  (We don't want to use the same account because it's possible the webmail UI could fast-path things.)  If it doesn't show up in webmail, we don't hold it against the sync.  For example, I would completely discard any test runs where Yahoo takes 11 minute to get to the webmail.  Evidence of an overloaded infrastructure in cloud-hosted implementations means all bets are off; our IMAP server could be talking to a different cluster/machine in a cluster that still hasn't seen the message, etc.
From what I'm reading above, it sounds like we should block on bug 933079, but we shouldn't block on this. The yahoo issue is on the server-side, but the gmail issue sounds like an actual problem captured in bug 933079. I'll nom bug 933079 as a result & remove the nom here.

Is there really anything actionable then in this bug if bug 933079 tracks the actual bug here & the yahoo issue is out of control? I'm just wondering if we should close this out & use bug 933079 to track the actionable bug here.
blocking-b2g: koi? → ---
(In reply to Jason Smith [:jsmith] from comment #16)
> Is there really anything actionable then in this bug if bug 933079 tracks
> the actual bug here & the yahoo issue is out of control? I'm just wondering
> if we should close this out & use bug 933079 to track the actionable bug
> here.

Now that that bug is fixed, I think the only thing to do is ideally amend the moztrap cases as I suggest in comment 15 and then maybe re-run the cases to sanity check the changes.

Tony, do my changes sound reasonable/helpful?
Flags: needinfo?(tnguyen)
Hey Andrew, the changes from comment 15 sound reasonable and very helpful! Thanks.
Flags: needinfo?(tnguyen)
The real fix happening here happened in bug 933079, so I'm duping over there.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: