Opened 6 years ago

Closed 6 years ago

Last modified 2 years ago

#31 closed task (fixed)

[GTA01] Add flow control handling for suspend/resume

Reported by: mickey Owned by: mickey
Priority: major Milestone: milestone5
Component: framework/ogsmd Version:
Keywords: Cc:

Description

Openmoko devices need special flow control setting on suspend/resume

Attachments (3)

frameworkd.log (18.2 KB) - added by mwester 6 years ago.
frameworkd.log
gsm.log (4.1 KB) - added by mwester 6 years ago.
gsm.log
0001-Add-information-about-Calypso-s-flow-control-handlin.patch (1.1 KB) - added by PaulFertser 6 years ago.
Patch to calypso's documentation

Download all attachments as: .zip

Change History (17)

comment:1 Changed 6 years ago by tim.niemeyer@…

mwester worked allready on this!

Is this really a framework/odeviced thing?
This should be handled in the kernel, think!

comment:2 Changed 6 years ago by mickey

  • Milestone changed from milestone2 to milestone3

comment:3 Changed 6 years ago by mwester@…

[I was hoping to have some example code to add, but that's taking longer than expected]

There are several considerations to be taken into account:

a) as soon as the GSM has been prepared for suspend/sleep mode (i.e. unnecessary unsolicited messages disabled, etc), something needs to forcibly flow-control the GSM on the neo and freerunner. This should be done as soon after the application managing the GSM has prepared it for suspend/sleep as is practically possible.

b) as soon as the application managing the GSM is prepared to read data from the GSM after resuming, it should release the flow-control on the GSM.

The timing above is important - if the GSM flow-control is applied too late, or removed too early, data loss can result. This is why user-space needs to apply/release this flow-control, and not the kernel -- there can be a significant window of time (several seconds) from when userspace is frozen to when the kernel driver for the GSM gets suspended, and if flow-control is not asserted during this window, any messages arriving from the GSM will simply be queued up to be processed when the system resumes, rather than signal that the suspend should be aborted and the phone resumed immediately to process the incoming data.

[The deferral of the flow-control release on resume is less-important with recent kernels on the GTA02 as the window for data-loss has been closed; however deferring the release of flow-control until user-space is up and running (implying that all drivers are resumed and functional) is still critical to minimizing the probability of UART overruns on the GTA01. So not only does it make sense to have a user-space task release flow-control if it asserted it in the first place, but it also functions as a 95% workaround for the UART/GSM hardware problems on the GTA01.]

c) Upon resume, the resume reason should be made available to the process managing the GSM. This information may be helpful to that process in scheduling its "wakeup" processing -- if the resume reason is a GSM interrupt, it would be reasonable for that process to defer the restoring of unsolicited messages, etc, for several seconds, and instead give priority to reading the data from the GSM to begin processing of the incoming data (incoming call, or SMS, for example). Actually, this may not be a big concern if the GSM is in mux mode, and if separate threads can simultaneously read the incoming data and handle the conversation to restore unsolicited messages, read signal strength, registration data, or whatever else needs to be done upon resume). Currently this requires an updated u-boot as well as a recent kernel -- that can be changed; it is trivial to expose a sysfs hook that will tell the code if the resume (or aborted suspend) was due to a GSM event.

Finally, we need to consider support for other, non-dbus-aware applications. I propose that suspend/resume implement a mechanism similar to apm's "/etc/apm/suspend.d" and "/etc/apm/resume.d" directories. Scripts/executables in the appropriate directory will be executed in order on suspend or resume. In the case of suspend processing, if a script or executable returns an error code, the suspend request is denied. This approach will allow users to extend the software suite in a very simple manner, permitting (for example) a simple shell script that logs GPS coordinates to participate in power management. [Of course, it would also be desirable if the wiki could offer some code fragments for various common scripting and programming languages that people might use to "do it the right way" instead.]

Note: The specific sysfs node and actions, expressed in shell script:

echo "Asserting flowcontrol (suspending) the GSM."
echo "1" >/sys/devices/platform/neo1973-pm-gsm.0/flowcontrolled

echo "Releasing flowcontrol of the GSM."
echo "0" >/sys/devices/platform/neo1973-pm-gsm.0/flowcontrolled

comment:4 Changed 6 years ago by stefan

  • Milestone changed from milestone3 to milestone4

comment:5 Changed 6 years ago by mickey

  • Component changed from framework/odeviced to framework/ogsmd
  • Milestone changed from milestone4 to milestone5
  • Priority changed from minor to major
  • Type changed from defect to task

I have just revisited this ticket, sorry for ignoring it so long. I promise to implement this for ms5, but I need to do some serious testing to gather the best spot (timing) in frameworkd.

Changing type and priority.

Changed 6 years ago by mwester

frameworkd.log

Changed 6 years ago by mwester

gsm.log

comment:6 Changed 6 years ago by mwester

GTA01 / SHR / latest framework incl Mickey's patch to do force flow control on the calypso on suspend, and release it on resume.

Suspend device. Call device. Observe device wakes, but no ring.

Logs attached.

From the gsm.log (log from the serial device driver):

"S F(" is the point where flow control is forcibly applied to the calypso at suspend. "S *" is the GSM interrupt (incoming call), and "S )" is when the flow control is released.

Note that the calypso sends a CPI message immediately upon being released -- but frameworkd has not read that message yet, thinks the calypso has been asleep, and sends the empty command string to wake it (at point 105.540 in the sequence). The calypso sends an immediate response to that (105.595) because it's wide away (it woke up the CPU, after all). It follows that with a +CRING message -- but frameworkd is already busy turning on all the unsolicited messages (AT+CTZU and all). In the mix of this, the call is lost.

At around 108.600 there is another +CRING and then we see a bunch more - a valiant attempt by the exhausted calypso to get frameworkd's attention to the call... but it fails, and at 130.780, the call is lost.

a) There's obviously a bug somewhere in the parsing or matching that misses that first CPI indicator that would have resulted in a dbus message indicating a call incoming.

b) There's no need to re-establish the unsolicited messages immediately upon resume -- that can happily wait for several seconds, so that we give the rest of the processes a few extra cpu cycles to respond to the time-critical incoming call. In fact, it would be ideal if the re-establishment of the unsolicited messages could wait some period of time (3 - 5 seconds) unconditionally, and be further deferred if there is an incoming SMS or call, until the call was either accepted or dropped. It just seems less likely to cause trouble with the fragile serial drivers and the unknowns of the calypso.

c) This problem hints at another robustness issue -- the calypso was screaming for help until the call perished, but nobody heeded its desperate cries of "+CRING" -- because, apparently, the "CPI" at the start of the sequence got lost. It's rather like ignoring someone shouting "HELP!" because they didn't say "please" first. :-) I think perhaps the CRING should synthesize a dummy CPI as necessary, or just be put on the dbus as is -- let the application software be aware that a call is incoming and let it decide if it wishes to alert the user of this even if it lacks caller id info.

comment:7 Changed 6 years ago by mickey

  • Status changed from new to assigned

Thanks a lot for your testing. This is a very interesting bunch of debugging information which I'm going to analysis asap! Do you think the patch is its current form helps on the GTA01 or makes it worse? (I'm considering to apply it to the stabilization/milestone4 branch).

comment:8 Changed 6 years ago by daniel

  • Summary changed from [OPENMOKO NEO] Add flow control handling for suspend/resume to [GTA01] Add flow control handling for suspend/resume

comment:9 Changed 6 years ago by PaulFertser

Daniel, i believe in fact a proper flow control handling is needed on any device that supports it. Therefore i disagree that it is a GTA01-specific issue. It might be more important for GTA01 but nevertheless it is necessary everywhere.

Second, looking at the recent sources and at the latest mwester's log, i can consider that flow control is working. I see no lost bytes. In fact, his log highlights several other important issues but to me it seems that flow control per se is working as intended.

First issue was already mentioned by mwester: "There's no need to re-establish the unsolicited messages immediately upon resume". The situation was somewhat improved in HEAD but it still fails a real-life test. If you suspend the phone in less than 10 seconds (and the message is misleading, btw) after previous wake-up, and it wakes on incoming call, the commands scheduled for the previous wake-up will still be send and it messes up everything (i can provide logs) (yes, there's a FIXME in the source, i hope it'll be fixed soon). I suppose problems can also come from those "AT+CMGL=2"s. We should probably refrain from sending them (and everything else) before the modem is considered fully resumed.

The second issue is that we ignore not only +CRING: VOICE but also +CLIP. I'm not sure but probably (haven't consulted GSM docs yet) it can also be used as an incoming call indication.

comment:10 Changed 6 years ago by mickey

Unfortunately this bug is now overloaded with more than one issue. Would you mind commenting on the actual state of #31 and adding new bugs for the race conditions / synthesizing call info?

Changed 6 years ago by PaulFertser

Patch to calypso's documentation

comment:11 Changed 6 years ago by PaulFertser

  • Resolution set to fixed
  • Status changed from assigned to closed

Closing this bug as mwester has confirmed that flow control handling is working for him.

Attached is a patch to documentation that clarifies calypso's peculiarity with regard to flow control.

comment:12 Changed 6 years ago by jluebbe

  • Resolution fixed deleted
  • Status changed from closed to reopened

comment:13 Changed 6 years ago by jluebbe

  • Resolution set to fixed
  • Status changed from reopened to closed

Just noticed that this has been committed by mickey already, reclosing

Note: See TracTickets for help on using tickets.