Bug#808953: xhci regression for large transfers (commit e210c422b)

To: Ron <ron@debian.org>
Cc: Mathias Nyman <mathias.nyman@intel.com>, linux-usb@vger.kernel.org, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, 808953@bugs.debian.org
Subject: Bug#808953: xhci regression for large transfers (commit e210c422b)
From: Ron <ron@debian.org>
Date: Fri, 8 Jan 2016 05:17:25 +1030
Message-id: <[🔎] 20160107184724.GZ10797@hex.shelbyville.oz>
Reply-to: Ron <ron@debian.org>, 808953@bugs.debian.org
In-reply-to: <[🔎] 20160107163920.GY10797@hex.shelbyville.oz>
References: <[🔎] 20160102063255.GQ10797@hex.shelbyville.oz> <[🔎] 568E8661.9040202@intel.com> <[🔎] 20160107162228.GX10797@hex.shelbyville.oz> <[🔎] 20160107163920.GY10797@hex.shelbyville.oz>

On Fri, Jan 08, 2016 at 03:09:20AM +1030, Ron wrote:
> On Fri, Jan 08, 2016 at 02:52:28AM +1030, Ron wrote:
> > On Thu, Jan 07, 2016 at 05:38:09PM +0200, Mathias Nyman wrote:
> > > Hi
> > > 
> > > On 02.01.2016 08:32, Ron wrote:
> > > >
> > > >Hi,
> > > >
> > > >It appears the commit e210c422b6fdd2dc123bedc588f399aefd8bf9de
> > > >"xhci: don't finish a TD if we get a short transfer event mid TD"
> > > >is causing transfers larger than 16kB to be unreliable.
> > > >
> > > >If I limit transfers to be no larger than 16kB, then it also works as
> > > >expected in an XHCI port with an unmodified build of Linus' current
> > > >head (v4.4-rc7-76-g9c982e8), but transfers larger than that do not.
> > > >I see an alternating cycle of a successful transfer, followed by two
> > > >that will time out waiting in libusb (with a 5 second timeout set),
> > > >before getting another successful transfer and the cycle repeating.
> > > >
> > > >I can run more tests and dig into this deeper if the reason for it
> > > >isn't immediately obvious in hindsight.
> > > >
> > > 
> > > Thanks for the info,
> > > I can't spot anything obvious, but my brain might still be in vacation mode.
> > > 
> > > Could you reproduce it with the attached patch, it only adds extra debugging?
> > > 
> > > We should either see no output, or the following sequence:
> > > 
> > >  1. "mid bulk/intr SP, wait for last TRB event"
> > >  2. "last trb has length set"
> > >  3. "and last trb is SHORT_TX, OK"
> > 
> > 
> > I guess one out of 3 ain't good ...  all I see logged is:
> > 
> >  [   60.015708] xhci_hcd 0000:04:00.0: mid bulk/intr SP, wait for last TRB event
> >  [   65.017374] xhci_hcd 0000:04:00.0: mid bulk/intr SP, wait for last TRB event
> >  [   70.455451] xhci_hcd 0000:04:00.0: mid bulk/intr SP, wait for last TRB event
> >  [   75.456248] xhci_hcd 0000:04:00.0: mid bulk/intr SP, wait for last TRB event
> > 
> > I'm passing 5 seconds to libusb as the requested timeout.
> 
> And if I limit the maximum transfer size to 16kB (the above was with
> 64kB transfers), then I see nothing logged at all.
> 
> So if that code does indeed look sane, perhaps the issue is really
> in the code that's splitting large transfers doing something funny?

So, tracing this a bit deeper, what I'm seeing is that in the case of
a transfer of 16kB or less, there's only one TRB in the TD, so it's
always the last_trb and the wait code never triggers (ie. basically
the same code path as with this patch reverted).

In the case of a larger transfer, there are N / 16kB TRBs in the TD,
and in the case where it stalls, I get a short transfer on the first
TRB, and then it just hangs waiting until it times out, reporting:

 xhci_hcd ep 0x81 - asked for 65536 bytes, 16382 bytes untransferred

with event_trb == td->first_trb.


What's still not clear to me yet is which assumption is broken.
Clearly we aren't actually getting more TRB events after the short
transfer.

And the other thing which possibly makes this odd is that the FTDI will
always return 2 bytes in a packet, even when there "no data" in it,
since it prefixes each packet with 2 "modem status" bytes.  Which means
the 16382 untransferred above is it returning 2 status bytes and none
of the data that we actually requested from it (which is a normal thing
for it to do sometimes, and if we issue another read the data will
eventually come - but I don't yet know if that's also contributing here).


Sorry for the trickle of small facts, I'm still wrapping my head around
everything I need to know to understand this properly.

  Ron

Reply to:

Follow-Ups:
- Bug#808953: xhci regression for large transfers (commit e210c422b)
  - From: Mathias Nyman <mathias.nyman@intel.com>

References:
- Bug#808953: xhci regression for large transfers (commit e210c422b)
  - From: Ron <ron@debian.org>
- Bug#808953: xhci regression for large transfers (commit e210c422b)
  - From: Mathias Nyman <mathias.nyman@intel.com>
- Bug#808953: xhci regression for large transfers (commit e210c422b)
  - From: Ron <ron@debian.org>
- Bug#808953: xhci regression for large transfers (commit e210c422b)
  - From: Ron <ron@debian.org>

Prev by Date: Processed: Re: Bug#810247: Hyper-V network adapters are mapped to a different interface (ethN) on every boot
Next by Date: Bug#808293: freeradius stopped working after kernel upgrade
Previous by thread: Bug#808953: xhci regression for large transfers (commit e210c422b)
Next by thread: Bug#808953: xhci regression for large transfers (commit e210c422b)
Index(es):
- Date
- Thread