
Bug#714959: linux-image-3.2.0-4-kirkwood mdraid array fails to assemble b/c drives are not yet ready (sata_mv)



> > +#define	QMV_SATA_INIT_DELAY_PHASE	5000 //milliseconds
> > +////////////////////////////////////////////////////////////////
> > +
> >  /*
> >   * module options
> >   */
> > @@ -4329,7 +4333,11 @@
> >  		struct ata_port *ap = host->ports[port];
> >  		void __iomem *port_mmio = mv_port_base(hpriv->base, port);
> >  		unsigned int offset = port_mmio - hpriv->base;
> > -
> > +		// marvell 7042 port 2 port 3 will power on by order every  5 sec
> > +		if( (port==2) || (port == 3) ){
> > +			printk("Wait %d seconds to initialize scsi %d.\n",QMV_SATA_INIT_DELAY_PHASE/1000,port);
> > +			mdelay(QMV_SATA_INIT_DELAY_PHASE);
> > +		}
> >  		ata_port_pbar_desc(ap, MV_PRIMARY_BAR, -1, "mmio");
> >  		ata_port_pbar_desc(ap, MV_PRIMARY_BAR, offset, "port");
> >  	}

As I look at what this code is doing within this loop, I start to
wonder what is really going on here. The code does not initialize any
ATA ports. It is just adding strings to the PCI description.

If you look at the timing in dmesg_after_patch.txt and
dmesg_error_txt for the last drive:

dmesg_error_txt
[   19.284722] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

vs

dmesg_after_patch.txt
[   22.276626] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

The error case is actually 3 seconds faster at detecting all the
drives.

The real clue could be:

// marvell 7042 port 2 port 3 will power on by order every  5 sec

So what I think is happening is that the controller is performing a
staggered start, independent of the software. The 10 second pause
added by this patch (5 seconds each for ports 2 and 3) means all the
discs are spinning by the time the driver goes looking at them, and so
it notices all 4 drives at the same time. Without the pause, three
drives are ready, and the fourth pops up later.

I think the real problem here is: why does it take 230 seconds between
the drive becoming available and the md: bind<sdd1>?

I think you need to be talking to the device mapper/raid people.

  Andrew

