Discussion:
gpmi-mtd ecc regression
Tim Harvey
2013-10-18 17:03:43 UTC
Permalink
Huang,

The patch you made to obtain ECC info from the chip
(http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/mtd/nand/gpmi-nand?id=2febcdf84b75aae627c61f0a5bf531a69299966c)
has caused a regression for an i.MX6 board I'm working with that uses
NAND and ubifs.

I'm using a Micron MT29F8G08
(http://download.micron.com/pdf/datasheets/flash/nand/2_4_8gb_nand_m49a.pdf)
which has:
page size: 2112 bytes (2048 + 64 bytes)
block size: 1056 words (1024 + 32 words)
plane size: 2 planes x 1024 blocks per plane
device size: 2Gb: 2048 blocks
ecc: 4bit

The legacy_set_geometry function comes up with:
gf_len=13
ecc_strength=8
page_size=2112
metadata_size=10
ecc_chunk_size=512
ecc_chunk_count=4
payload_size=2048
auxiliary_size=16
auxiliary_status_offset=12
block_mark_byte_offset=1999

and the new set_geometry_by_ecc_info comes up with:
gf_len=13
ecc_strength=4
page_size=2084
metadata_size=10
ecc_chunk_size=512
ecc_chunk_count=4
payload_size=2048
auxiliary_size=16
auxiliary_status_offset=12
block_mark_byte_offset=2018
block_mark_bit_offset=4

There are two discrepancies above:
a) ecc_strength - this part has 4bit ecc but being detected as 8?
b) page_size - the legacy function includes oob in page size, and the
new one does not

which is correct?

With your patch using the new function to detect ECC from the part,
booting to a ubifs that was flashed with uboot (which incidentally
also set ecc_strength=8) I get endless ECC failures:

[ 3.946205] UBI: attaching mtd2 to ubi0
[ 3.946585] UBI warning: ubi_io_read: error -74 (ECC error) while
reading 64 bytes from PEB 0:0, read only 64 bytes, retry
[ 3.946785] UBI warning: ubi_io_read: error -74 (ECC error) while
reading 64 bytes from PEB 0:0, read only 64 bytes, retry
[ 3.946983] UBI warning: ubi_io_read: error -74 (ECC error) while
reading 64 bytes from PEB 0:0, read only 64 bytes, retry
[ 3.947177] UBI error: ubi_io_read: error -74 (ECC error) while
reading 64 bytes from PEB 0:0, read 64 bytes
[ 3.947186] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.12.0-rc3+ #253
[ 3.947192] Backtrace:
[ 3.947220] [<8001265c>] (dump_backtrace+0x0/0x10c) from
[<800127fc>] (show_stack+0x18/0x1c)
[ 3.947230] r6:00000040 r5:ffffffb6 r4:00000000 r3:00000000
[ 3.947247] [<800127e4>] (show_stack+0x0/0x1c) from [<80661820>]
(dump_stack+0x78/0x94)
[ 3.947269] [<806617a8>] (dump_stack+0x0/0x94) from [<803b4ebc>]
(ubi_io_read+0x12c/0x364)
[ 3.947274] r4:bf09e000 r3:00000000
[ 3.947285] [<803b4d90>] (ubi_io_read+0x0/0x364) from [<803b533c>]
(ubi_io_read_ec_hdr+0x60/0x330)
[ 3.947295] [<803b52dc>] (ubi_io_read_ec_hdr+0x0/0x330) from
[<803bb2dc>] (ubi_attach+0x160/0x15f0)
[ 3.947304] [<803bb17c>] (ubi_attach+0x0/0x15f0) from [<803ae5f0>]
(ubi_attach_mtd_dev+0x794/0xf18)
[ 3.947319] [<803ade5c>] (ubi_attach_mtd_dev+0x0/0xf18) from
[<808d9850>] (ubi_init+0x1f8/0x2bc)
[ 3.947330] [<808d9658>] (ubi_init+0x0/0x2bc) from [<80008880>]
(do_one_initcall+0xdc/0x194)
[ 3.947342] [<800087a4>] (do_one_initcall+0x0/0x194) from
[<808aeb94>] (kernel_init_freeable+0x104/0x1d0)
[ 3.952386] [<808d9658>] (ubi_init+0x0/0x2bc) from [<80008880>]
(do_one_initcall+0xdc/0x194)
[ 3.960828] [<800087a4>] (do_one_initcall+0x0/0x194) from
[<808aeb94>] (kernel_init_freeable+0x104/0x1d0)
[ 3.967418] [<800087a4>] (do_one_initcall+0x0/0x194) from
[<808aeb94>] (kernel_init_freeable+0x104/0x1d0)
[ 3.967426] [<808aea90>] (kernel_init_freeable+0x0/0x1d0) from
[<8065c74c>] (kernel_init+0x10/0xec)

Ideas?

Thanks,

Tim
Tim Harvey
2013-10-18 20:45:33 UTC
Permalink
Post by Tim Harvey
Huang,
The patch you made to obtain ECC info from the chip
(http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/mtd/nand/gpmi-nand?id=2febcdf84b75aae627c61f0a5bf531a69299966c)
has caused a regression for an i.MX6 board I'm working with that uses
NAND and ubifs.
I'm using a Micron MT29F8G08
(http://download.micron.com/pdf/datasheets/flash/nand/2_4_8gb_nand_m49a.pdf)
page size: 2112 bytes (2048 + 64 bytes)
block size: 1056 words (1024 + 32 words)
plane size: 2 planes x 1024 blocks per plane
device size: 2Gb: 2048 blocks
ecc: 4bit
gf_len=13
ecc_strength=8
page_size=2112
metadata_size=10
ecc_chunk_size=512
ecc_chunk_count=4
payload_size=2048
auxiliary_size=16
auxiliary_status_offset=12
block_mark_byte_offset=1999
gf_len=13
ecc_strength=4
digging more into the issue I believe the problem is that the 'chip'
has an internal 4-bit ECC which is now being used, whereas what we
want is the BCH's ECC to be used (which would use 8bit ECC with a NAND
device with a pagesize of 2048 + 64 OOB). If I force ecc_stength such
as:

@@ -143,7 +143,7 @@ static bool set_geometry_by_ecc_info(struct
gpmi_nand_data *this)
return false;
}
geo->ecc_chunk_size = chip->ecc_step_ds;
- geo->ecc_strength = round_up(chip->ecc_strength_ds, 2);
+ geo->ecc_strength = 8;
if (!gpmi_check_ecc(this))
return false;

@@ -208,6 +208,7 @@ static bool set_geometry_by_ecc_info(struct
gpmi_nand_data *this)

then I have no issues.

Does this make sense and if so should we be using the chip's reported
ecc strength at all?

Tim
Huang Shijie
2013-10-21 03:30:34 UTC
Permalink
Post by Tim Harvey
Huang,
The patch you made to obtain ECC info from the chip
(http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/mtd/nand/gpmi-nand?id=2febcdf84b75aae627c61f0a5bf531a69299966c)
has caused a regression for an i.MX6 board I'm working with that uses
NAND and ubifs.
I'm using a Micron MT29F8G08
(http://download.micron.com/pdf/datasheets/flash/nand/2_4_8gb_nand_m49a.pdf)
page size: 2112 bytes (2048 + 64 bytes)
block size: 1056 words (1024 + 32 words)
plane size: 2 planes x 1024 blocks per plane
device size: 2Gb: 2048 blocks
ecc: 4bit
gf_len=13
ecc_strength=8
page_size=2112
metadata_size=10
ecc_chunk_size=512
ecc_chunk_count=4
payload_size=2048
auxiliary_size=16
auxiliary_status_offset=12
block_mark_byte_offset=1999
gf_len=13
ecc_strength=4
page_size=2084
metadata_size=10
ecc_chunk_size=512
ecc_chunk_count=4
payload_size=2048
auxiliary_size=16
auxiliary_status_offset=12
block_mark_byte_offset=2018
block_mark_bit_offset=4
a) ecc_strength - this part has 4bit ecc but being detected as 8?
b) page_size - the legacy function includes oob in page size, and the
new one does not
which is correct?
In theory, both are correct.

I am in an embarrass state now:
[1] we can support the jffs2, when we use the set_geometry_by_ecc_info()
to set the NAND layout.

[2] if we still use the legacy_set_geometry() to set the NAND layout, we
can not support the jffs2 for GPMI.


When i submitted the gpmi to the kernel, the mtd code can not provide me
the ECC info. I had to use all the OOB
with legacy_set_geometry(). After i submitted several patches for mtd,
it can provide us the ECC info now, so i switch
to use the set_geometry_by_ecc_info().

it's easy to fix your problem with a patch like this:
-----------------------------------------------------------------------------------------------------------------

drivers/mtd/nand/gpmi-nand/gpmi-nand.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/gpmi-nand
index 5a1bbc7..8eeead6 100644
--- a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
+++ b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
@@ -352,7 +352,8 @@ static int legacy_set_geometry(struct gpmi_nand_data *this)

int common_nfc_set_geometry(struct gpmi_nand_data *this)
{
- return set_geometry_by_ecc_info(this) ? 0 : legacy_set_geometry(this);
+ return (legacy_set_geometry(this) == 0) ? 0 :
+ set_geometry_by_ecc_info(this);
}

struct dma_chan *get_dma_chan(struct gpmi_nand_data *this)


-----------------------------------------------------------------------------------------------------------------

But after apply this patch, the gpmi may can not support the JFFS2 any more.

thanks
Huang Shijie
Post by Tim Harvey
With your patch using the new function to detect ECC from the part,
booting to a ubifs that was flashed with uboot (which incidentally
[ 3.946205] UBI: attaching mtd2 to ubi0
[ 3.946585] UBI warning: ubi_io_read: error -74 (ECC error) while
reading 64 bytes from PEB 0:0, read only 64 bytes, retry
[ 3.946785] UBI warning: ubi_io_read: error -74 (ECC error) while
reading 64 bytes from PEB 0:0, read only 64 bytes, retry
[ 3.946983] UBI warning: ubi_io_read: error -74 (ECC error) while
reading 64 bytes from PEB 0:0, read only 64 bytes, retry
[ 3.947177] UBI error: ubi_io_read: error -74 (ECC error) while
reading 64 bytes from PEB 0:0, read 64 bytes
[ 3.947186] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.12.0-rc3+ #253
[ 3.947220] [<8001265c>] (dump_backtrace+0x0/0x10c) from
[<800127fc>] (show_stack+0x18/0x1c)
[ 3.947230] r6:00000040 r5:ffffffb6 r4:00000000 r3:00000000
[ 3.947247] [<800127e4>] (show_stack+0x0/0x1c) from [<80661820>]
(dump_stack+0x78/0x94)
[ 3.947269] [<806617a8>] (dump_stack+0x0/0x94) from [<803b4ebc>]
(ubi_io_read+0x12c/0x364)
[ 3.947274] r4:bf09e000 r3:00000000
[ 3.947285] [<803b4d90>] (ubi_io_read+0x0/0x364) from [<803b533c>]
(ubi_io_read_ec_hdr+0x60/0x330)
[ 3.947295] [<803b52dc>] (ubi_io_read_ec_hdr+0x0/0x330) from
[<803bb2dc>] (ubi_attach+0x160/0x15f0)
[ 3.947304] [<803bb17c>] (ubi_attach+0x0/0x15f0) from [<803ae5f0>]
(ubi_attach_mtd_dev+0x794/0xf18)
[ 3.947319] [<803ade5c>] (ubi_attach_mtd_dev+0x0/0xf18) from
[<808d9850>] (ubi_init+0x1f8/0x2bc)
[ 3.947330] [<808d9658>] (ubi_init+0x0/0x2bc) from [<80008880>]
(do_one_initcall+0xdc/0x194)
[ 3.947342] [<800087a4>] (do_one_initcall+0x0/0x194) from
[<808aeb94>] (kernel_init_freeable+0x104/0x1d0)
[ 3.952386] [<808d9658>] (ubi_init+0x0/0x2bc) from [<80008880>]
(do_one_initcall+0xdc/0x194)
[ 3.960828] [<800087a4>] (do_one_initcall+0x0/0x194) from
[<808aeb94>] (kernel_init_freeable+0x104/0x1d0)
[ 3.967418] [<800087a4>] (do_one_initcall+0x0/0x194) from
[<808aeb94>] (kernel_init_freeable+0x104/0x1d0)
[ 3.967426] [<808aea90>] (kernel_init_freeable+0x0/0x1d0) from
[<8065c74c>] (kernel_init+0x10/0xec)
Ideas?
Thanks,
Tim
Brian Norris
2013-10-21 07:21:24 UTC
Permalink
Post by Tim Harvey
The patch you made to obtain ECC info from the chip
(http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/mtd/nand/gpmi-nand?id=2febcdf84b75aae627c61f0a5bf531a69299966c)
has caused a regression for an i.MX6 board I'm working with that uses
NAND and ubifs.
...
[1] we can support the jffs2, when we use the set_geometry_by_ecc_info() to
set the NAND layout.
[2] if we still use the legacy_set_geometry() to set the NAND layout, we can
not support the jffs2 for GPMI.
When i submitted the gpmi to the kernel, the mtd code can not provide me the
ECC info. I had to use all the OOB
with legacy_set_geometry(). After i submitted several patches for mtd, it
can provide us the ECC info now, so i switch
to use the set_geometry_by_ecc_info().
-----------------------------------------------------------------------------------------------------------------
drivers/mtd/nand/gpmi-nand/gpmi-nand.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
b/drivers/mtd/nand/gpmi-nand
index 5a1bbc7..8eeead6 100644
--- a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
+++ b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
@@ -352,7 +352,8 @@ static int legacy_set_geometry(struct gpmi_nand_data *this)
int common_nfc_set_geometry(struct gpmi_nand_data *this)
{
legacy_set_geometry(this);
+ set_geometry_by_ecc_info(this);
}
struct dma_chan *get_dma_chan(struct gpmi_nand_data *this)
-----------------------------------------------------------------------------------------------------------------
So you really have two wholly incompatible layout methods, one which
targeted using the maximum ECC strength that fits and one which
targets the minimum required ECC strength? It seems that you can only
use a new ECC/geometry if the user prompts you to do so (via device
tree, for instance). Doing otherwise is bound to create regressions
like this.
But after apply this patch, the gpmi may can not support the JFFS2 any more.
Well, we do prioritize "no regressions", rather than new features. So
we may have to do this until a proper solution can be formed.

Brian
Huang Shijie
2013-10-21 08:04:07 UTC
Permalink
Post by Brian Norris
Well, we do prioritize "no regressions", rather than new features. So
we may have to do this until a proper solution can be formed.
ok.

I will create a patch to fix this regression.
And it seems i have to add new devicetree property to enable the JFFS2
support.

thanks
Huang Shijie
Brian Norris
2013-10-21 17:41:17 UTC
Permalink
Post by Huang Shijie
Post by Brian Norris
Well, we do prioritize "no regressions", rather than new features. So
we may have to do this until a proper solution can be formed.
ok.
I will create a patch to fix this regression.
Yes. We should either get it into 3.12 or mark it -stable.
Post by Huang Shijie
And it seems i have to add new devicetree property to enable the JFFS2
support.
We cannot use device-tree to enable JFFS2 (a purely software issue),
but you could use it to specify the ECC strength or layout. Be sure to
CC devicetree at vger.kernel.org on any new bindings, since they need to
review bindings and tend to have good input.

Brian
Marek Vasut
2013-10-23 05:33:33 UTC
Permalink
Dear Huang Shijie,
Post by Huang Shijie
Post by Tim Harvey
Huang,
The patch you made to obtain ECC info from the chip
(http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/dr
ivers/mtd/nand/gpmi-nand?id=2febcdf84b75aae627c61f0a5bf531a69299966c) has
caused a regression for an i.MX6 board I'm working with that uses NAND
and ubifs.
I'm using a Micron MT29F8G08
(http://download.micron.com/pdf/datasheets/flash/nand/2_4_8gb_nand_m49a.p
page size: 2112 bytes (2048 + 64 bytes)
block size: 1056 words (1024 + 32 words)
plane size: 2 planes x 1024 blocks per plane
device size: 2Gb: 2048 blocks
ecc: 4bit
gf_len=13
ecc_strength=8
page_size=2112
metadata_size=10
ecc_chunk_size=512
ecc_chunk_count=4
payload_size=2048
auxiliary_size=16
auxiliary_status_offset=12
block_mark_byte_offset=1999
gf_len=13
ecc_strength=4
page_size=2084
metadata_size=10
ecc_chunk_size=512
ecc_chunk_count=4
payload_size=2048
auxiliary_size=16
auxiliary_status_offset=12
block_mark_byte_offset=2018
block_mark_bit_offset=4
a) ecc_strength - this part has 4bit ecc but being detected as 8?
b) page_size - the legacy function includes oob in page size, and the
new one does not
which is correct?
In theory, both are correct.
[1] we can support the jffs2, when we use the set_geometry_by_ecc_info()
to set the NAND layout.
[2] if we still use the legacy_set_geometry() to set the NAND layout, we
can not support the jffs2 for GPMI.
It's exactly the other way around if you're talking about JFFS2 compatibility
with the 2.6.35 FSL kernel ;-)

Loading...