Discussion:
mtd nand erase and bad block
Matteo Facchinetti
2012-05-31 12:12:41 UTC
Hi,

I'm developing an MTD driver for a NAND flash controller and I need help.
I'm near the end of the work, and I have problems and doubts about bad
block handling.

As a test, I manually set the bad block marker byte on a flash block.
When I erase with flash_erase, everything works and the bad block is skipped:
:~# flash_erase /dev/mtd6 0 0
Erasing 1024 Kibyte @ 100000 -- 2 % complete flash_erase: Skipping bad
block at 00200000
Erasing 1024 Kibyte @ 2700000 -- 100 % complete

If I try to erase with the -N parameter, I get the following output:
~# flash_erase -N /dev/mtd6 0 0

Erasing 1024 Kibnand_erase_nand: attempt to erase a bad block at page
0x00001600
yte @ 200000 -- 5 % complete libmtd: error!: MEMERASE64 ioctl failed
for eraseblock 2 (mtd6)
error 5 (Input/output error)
flash_erase: error!: /dev/mtd6: MTD Erase failure
error 5 (Input/output error)
Erasing 1024 Kibyte @ 2700000 -- 100 % complete

I expected that with the -N option, flash_erase would erase the
hand-damaged block, restoring the usability of the block. Is that right?

What's going wrong?

Best Regards,
Matteo
Adrian Hunter
2012-05-31 13:28:12 UTC
Post by Matteo Facchinetti
Hi,
I'm developing an MTD driver for a NAND flash controller and I need help.
I'm near the end of the work, and I have problems and doubts about bad block
handling.
As a test, I manually set the bad block marker byte on a flash block.
:~# flash_erase /dev/mtd6 0 0
block at 00200000
~# flash_erase -N /dev/mtd6 0 0
Erasing 1024 Kibnand_erase_nand: attempt to erase a bad block at page
0x00001600
eraseblock 2 (mtd6)
error 5 (Input/output error)
flash_erase: error!: /dev/mtd6: MTD Erase failure
error 5 (Input/output error)
I expected that with the -N option, flash_erase would erase the hand-damaged
block, restoring the usability of the block. Is that right?
What's going wrong?
The NAND driver has a check that prevents erasure of bad blocks. Refer to
nand_erase_nand() in drivers/mtd/nand/nand_base

    /* Check if we have a bad block, we do not erase bad blocks! */
    if (nand_block_checkbad(mtd, ((loff_t) page) <<
                            chip->page_shift, 0, allowbbt)) {
        pr_warn("%s: attempt to erase a bad block at page 0x%08x\n",
                __func__, page);
        instr->state = MTD_ERASE_FAILED;
        goto erase_exit;
    }
Matteo Facchinetti
2012-05-31 14:28:34 UTC
Post by Adrian Hunter
Post by Matteo Facchinetti
Hi,
I'm developing an MTD driver for a NAND flash controller and I need help.
I'm near the end of the work, and I have problems and doubts about bad block
handling.
As a test, I manually set the bad block marker byte on a flash block.
:~# flash_erase /dev/mtd6 0 0
block at 00200000
~# flash_erase -N /dev/mtd6 0 0
Erasing 1024 Kibnand_erase_nand: attempt to erase a bad block at page
0x00001600
eraseblock 2 (mtd6)
error 5 (Input/output error)
flash_erase: error!: /dev/mtd6: MTD Erase failure
error 5 (Input/output error)
I expected that with the -N option, flash_erase would erase the hand-damaged
block, restoring the usability of the block. Is that right?
What's going wrong?
The NAND driver has a check that prevents erasure of bad blocks. Refer to
nand_erase_nand() in drivers/mtd/nand/nand_base
    /* Check if we have a bad block, we do not erase bad blocks! */
    if (nand_block_checkbad(mtd, ((loff_t) page) <<
                            chip->page_shift, 0, allowbbt)) {
        pr_warn("%s: attempt to erase a bad block at page 0x%08x\n",
                __func__, page);
        instr->state = MTD_ERASE_FAILED;
        goto erase_exit;
    }
This means that the -N parameter bypasses only the BBT check, but the
erase routine contains a low-level check that forbids erasing a bad block.
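
The two layers can be modelled in a few lines of userspace C. This is only a simplified sketch, not the flash_erase or kernel source: tool_erase(), driver_erase() and the bad[] table are hypothetical names, and the geometry (40 eraseblocks of 1024 KiB, so offset 0x200000 is eraseblock 2) is inferred from the output earlier in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NUM_BLOCKS 40           /* assumed: last block starts at 0x2700000 */
#define ERASE_SIZE 0x100000u    /* 1024 KiB per eraseblock */

/* Hypothetical bad-block state: only eraseblock 2 is marked bad. */
static bool bad[NUM_BLOCKS] = { [2] = true };

/* Models the low-level check: the driver refuses to erase a bad block. */
int driver_erase(unsigned block)
{
    assert(block < NUM_BLOCKS);
    if (bad[block])
        return -EIO;            /* reported as "Input/output error" */
    return 0;
}

/*
 * Models the tool. Without -N it skips bad blocks itself and never
 * calls the driver for them; with -N the request is passed through
 * and the driver check decides.
 * Returns 0 if erased, 1 if skipped, or a negative errno on failure.
 */
int tool_erase(unsigned long long offset, bool noskipbad)
{
    unsigned block = (unsigned)(offset / ERASE_SIZE);

    assert(block < NUM_BLOCKS);
    if (!noskipbad && bad[block])
        return 1;
    return driver_erase(block);
}
```

In this model tool_erase(0x200000, false) returns 1 (skipped) and tool_erase(0x200000, true) returns -EIO, which mirrors the two runs shown above.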

What's the way to recheck bad blocks and refresh the BBT from a userspace
application?


Matteo
Shmulik Ladkani
2012-05-31 19:57:12 UTC
Hi Matteo,
Post by Matteo Facchinetti
Post by Adrian Hunter
Post by Matteo Facchinetti
~# flash_erase -N /dev/mtd6 0 0
Erasing 1024 Kibnand_erase_nand: attempt to erase a bad block at page
0x00001600
eraseblock 2 (mtd6)
error 5 (Input/output error)
flash_erase: error!: /dev/mtd6: MTD Erase failure
error 5 (Input/output error)
I expected that with the -N option, flash_erase would erase the hand-damaged
block, restoring the usability of the block. Is that right?
What's going wrong?
The NAND driver has a check that prevents erasure of bad blocks. Refer to
nand_erase_nand() in drivers/mtd/nand/nand_base
    /* Check if we have a bad block, we do not erase bad blocks! */
    if (nand_block_checkbad(mtd, ((loff_t) page) <<
                            chip->page_shift, 0, allowbbt)) {
        pr_warn("%s: attempt to erase a bad block at page 0x%08x\n",
                __func__, page);
        instr->state = MTD_ERASE_FAILED;
        goto erase_exit;
    }
This means that the -N parameter bypasses only the BBT check, but the
erase routine contains a low-level check that forbids erasing a bad block.
What's the way to recheck bad blocks and refresh the BBT from a userspace
application?
Well, it seems there's no way to do so via the Linux MTD APIs.

I know U-Boot allows this through its 'scrub' command.
What it actually does is hack the exact condition quoted above
by Adrian, adding a '!instr->scrub &&' to the condition expression.

I assume such an option could be added to the MEMERASE ioctl.
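
As a rough userspace model of that idea, a per-request scrub flag short-circuits the bad-block check. The struct and function names below are hypothetical stand-ins for illustration, not the U-Boot or kernel definitions, and the model assumes the bad indication lives only in the block's own OOB marker.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NUM_BLOCKS 8

/* Hypothetical stand-in for an erase request that carries a scrub flag. */
struct erase_req {
    unsigned block;
    bool scrub;     /* true: erase even if the block is marked bad */
};

/* Hypothetical marker state: block 2 has been marked bad by hand. */
static bool marked_bad[NUM_BLOCKS] = { [2] = true };

/* Same shape as the quoted condition, with '!req->scrub &&' in front. */
int erase_one(const struct erase_req *req)
{
    assert(req->block < NUM_BLOCKS);
    if (!req->scrub && marked_bad[req->block])
        return -EIO;

    /* In this model the marker is stored in the block, so erasing clears it. */
    marked_bad[req->block] = false;
    return 0;
}
```

A normal request for block 2 fails with -EIO, a request with scrub set succeeds, and after that a normal request succeeds too because the marker is gone.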

Regards,
Shmulik
Adrian Hunter
2012-06-01 06:24:28 UTC
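Post by Matteo Facchinetti
What's the way to recheck bad blocks and refresh the BBT from a userspace
application?
I always just temporarily hack the kernel driver to allow the erase of the
bad block in question.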
Ricard Wanderlof
2012-06-01 06:37:53 UTC
Post by Adrian Hunter
Post by Matteo Facchinetti
What's the way to recheck bad blocks and refresh the BBT from a userspace
application?
I always just temporarily hack the kernel driver to allow the erase of the
bad block in question.
I agree. Having that capability available at all times would be scary.

/Ricard
--
Ricard Wolf Wanderlöf ricardw(at)axis.com
Axis Communications AB, Lund, Sweden www.axis.com
Phone +46 46 272 2016 Fax +46 46 13 61 30
Angus CLARK
2012-06-01 08:29:22 UTC
I have to do this regularly for testing new NAND drivers. After getting fed up
with doing temporary hacks all the time, I ended up adding a
'nand_erasebadblock' entry to debugfs, which overrides the check in
nand_erase_nand():
...
    if (!nand_erasebadblock &&
        nand_block_checkbad(mtd, ((loff_t) page) <<
                            chip->page_shift, 0, allowbbt)) {
...

The sequence in userspace would then be something like:

target% echo 1 > /sys/kernel/debug/nand_erasebadblock
target% flash_erase -N /dev/mtd6 0x00200000 1
target% echo 0 > /sys/kernel/debug/nand_erasebadblock

You need to be careful to only erase marked bad blocks that you know are
actually good, or else you risk losing the factory-programmed bad block markers.

This method is also useful for erasing the BBTs, which will then force the
driver to re-scan for OOB markers on the next boot. Again care needs to be
taken, as you may lose information about blocks that have gone bad through
wear. (The recent patch "mtd: nand: write BBM to OOB even with flash-based BBT"
partly overcomes this issue.)

Typically, debugfs is only enabled in development environments, and even then it
requires explicit user action, so this method of enabling erasing bad blocks is
safe enough for our needs.

Happy to do a patch if others are interested...

Cheers,

Angus
Post by Ricard Wanderlof
Post by Adrian Hunter
Post by Matteo Facchinetti
What's the way to recheck bad blocks and refresh the BBT from a userspace
application?
I always just temporarily hack the kernel driver to allow the erase of the
bad block in question.
I agree. Having that capability available at all times would be scary.
/Ricard
Artem Bityutskiy
2012-06-01 08:42:01 UTC
Post by Angus CLARK
I have to do this regularly for testing new NAND drivers. After getting fed up
with doing temporary hacks all the time, I ended up adding a
'nand_erasebadblock' entry to debugfs, which overrides the check in
...
    if (!nand_erasebadblock &&
        nand_block_checkbad(mtd, ((loff_t) page) <<
                            chip->page_shift, 0, allowbbt)) {
...
target% echo 1 > /sys/kernel/debug/nand_erasebadblock
target% flash_erase -N /dev/mtd6 0x00200000 1
target% echo 0 > /sys/kernel/debug/nand_erasebadblock
You need to be careful to only erase marked bad blocks that you know are
actually good, or else you risk losing the factory-programmed bad block markers.
This method is also useful for erasing the BBTs, which will then force the
driver to re-scan for OOB markers on the next boot. Again care needs to be
taken, as you may lose information about blocks that have gone bad through
wear. (The recent patch "mtd: nand: write BBM to OOB even with flash-based BBT"
partly overcomes this issue.)
Typically, debugfs is only enabled in development environments, and even then it
requires explicit user action, so this method of enabling erasing bad blocks is
safe enough for our needs.
Sounds ok to me, especially if you send the patch together with a piece
of doc for the mtd web-site. I just think it is important to document
this feature. Is this doable?
--
Best Regards,
Artem Bityutskiy
Shmulik Ladkani
2012-06-01 11:04:45 UTC
Hi,
Post by Artem Bityutskiy
Post by Angus CLARK
I have to do this regularly for testing new NAND drivers. After getting fed up
with doing temporary hacks all the time, I ended up adding a
'nand_erasebadblock' entry to debugfs, which overrides the check in
...
    if (!nand_erasebadblock &&
        nand_block_checkbad(mtd, ((loff_t) page) <<
                            chip->page_shift, 0, allowbbt)) {
...
target% echo 1 > /sys/kernel/debug/nand_erasebadblock
target% flash_erase -N /dev/mtd6 0x00200000 1
target% echo 0 > /sys/kernel/debug/nand_erasebadblock
Typically, debugfs is only enabled in development environments, and even then it
requires explicit user action, so this method of enabling erasing bad blocks is
safe enough for our needs.
Sounds ok to me, especially if you send the patch together with a piece
of doc for the mtd web-site. I just think it is important to document
this feature. Is this doable?
I think we should prefer a local "allow erase bad blocks" policy over a
global one.

This is because when the global debugfs flag is on, *every* mtd erase
operation might lead to erasure of bad blocks, not only those
triggered by the user who set the flag before issuing the 'flash_erase'
command.

Meaning, other MTD users (ubi, various ffs) currently working on
other mtd partitions are suddenly allowed to erase bad
blocks, which is probably not what the user intended.

I suggest being more restrictive and making "allow erase bad blocks"
a local policy, that is, per erase request.

And since we'll probably need this only for userspace erase
calls (e.g. flash_erase), I suggest placing it in the MEMERASE ioctl.
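
To make the scoping concern concrete, here is a small userspace model. It is only a sketch: allow_bad_global, erase_global() and erase_local() are hypothetical names, and the "other MTD user" is simply a second call on a different block.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NUM_BLOCKS 8

/* Hypothetical state: blocks 2 and 5 are marked bad. */
static bool marked_bad[NUM_BLOCKS] = { [2] = true, [5] = true };

/* Global policy: one switch changes the outcome for every caller. */
bool allow_bad_global;

int erase_global(unsigned block)
{
    assert(block < NUM_BLOCKS);
    if (!allow_bad_global && marked_bad[block])
        return -EIO;
    return 0;
}

/* Local policy: the caller opts in for this one request only. */
int erase_local(unsigned block, bool allow_bad)
{
    assert(block < NUM_BLOCKS);
    if (!allow_bad && marked_bad[block])
        return -EIO;
    return 0;
}
```

With the global switch set, an erase of block 5 issued by some unrelated caller gets through as well as the intended erase of block 2. With the per-request flag, only the call that opted in is allowed and every other erase of a bad block still fails with -EIO.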

Comments?

Regards,
Shmulik
Angus CLARK
2012-06-01 14:03:19 UTC
Hi Shmulik,
Post by Shmulik Ladkani
Post by Artem Bityutskiy
Post by Angus CLARK
Typically, debugfs is only enabled in development environments, and even then it
requires explicit user action, so this method of enabling erasing bad blocks is
safe enough for our needs.
Sounds ok to me, especially if you send the patch together with a piece
of doc for the mtd web-site. I just think it is important to document
this feature. Is this doable?
I think we should prefer a local "allow erase bad blocks" policy over a
global one.
This is because when the global debugfs flag is on, *every* mtd erase
operation might lead to erasure of bad blocks, not only those
triggered by the user who set the flag before issuing the 'flash_erase'
command.
Meaning, other MTD users (ubi, various ffs) currently working on
other mtd partitions are suddenly allowed to erase bad
blocks, which is probably not what the user intended.
I suggest being more restrictive and making "allow erase bad blocks"
a local policy, that is, per erase request.
And since we'll probably need this only for userspace erase
calls (e.g. flash_erase), I suggest placing it in the MEMERASE ioctl.
Comments?
I would agree to some extent. Enabling the "allow erase bad blocks" option per
erase request would certainly be a safer solution. However, I suspect extending
the existing MEMERASE/MEMERASE64 IOCTLs is not really an option. That leaves
inventing another IOCTL, or perhaps adding another file mode, which would
achieve per-file-descriptor scope, if not per-erase-request.

My approach was largely motivated by the desire not to change the existing ABI,
and/or mtd-utils. One could argue that such an option should only ever be
enabled by someone who knows what they are doing, and that might include things
like un-mounting any filesystems beforehand.

To be honest, I am in two minds. The solution I have at present is very simple
(or perhaps naive!), requires minimal changes to the kernel, no changes to
mtd-utils, and can be disabled completely by not including debugfs (which is
standard practice on production systems). On the other hand, the ability to
enable per-erase-request is a safer and more elegant solution. However, it
would require updates to mtd-utils, and agreement from the MTD community
regarding changes to the ABI...

What do you think?

Cheers,

Angus
Shmulik Ladkani
2012-06-01 14:54:07 UTC
Hi Angus,
Post by Angus CLARK
Hi Shmulik,
Post by Shmulik Ladkani
Post by Artem Bityutskiy
Post by Angus CLARK
Typically, debugfs is only enabled in development environments, and even then it
requires explicit user action, so this method of enabling erasing bad blocks is
safe enough for our needs.
Sounds ok to me, especially if you send the patch together with a piece
of doc for the mtd web-site. I just think it is important to document
this feature. Is this doable?
I think we should prefer a local "allow erase bad blocks" policy over a
global one.
This is because when the global debugfs flag is on, *every* mtd erase
operation might lead to erasure of bad blocks, not only those
triggered by the user who set the flag before issuing the 'flash_erase'
command.
Meaning, other MTD users (ubi, various ffs) currently working on
other mtd partitions are suddenly allowed to erase bad
blocks, which is probably not what the user intended.
I suggest being more restrictive and making "allow erase bad blocks"
a local policy, that is, per erase request.
And since we'll probably need this only for userspace erase
calls (e.g. flash_erase), I suggest placing it in the MEMERASE ioctl.
Comments?
I would agree to some extent. Enabling the "allow erase bad blocks" option per
erase request would certainly be a safer solution. However, I suspect extending
the existing MEMERASE/MEMERASE64 IOCTLs is not really an option. That leaves
inventing another IOCTL, or perhaps adding another file mode, which would
achieve per-file-descriptor scope, if not per-erase-request.
My approach was largely motivated by the desire not to change the existing ABI,
and/or mtd-utils. One could argue that such an option should only ever be
enabled by someone who knows what they are doing, and that might include things
like un-mounting any filesystems beforehand.
To be honest, I am in two minds. The solution I have at present is very simple
(or perhaps naive!), requires minimal changes to the kernel, no changes to
mtd-utils, and can be disabled completely by not including debugfs (which is
standard practice on production systems). On the other hand, the ability to
enable per-erase-request is a safer and more elegant solution. However, it
would require updates to mtd-utils, and agreement from the MTD community
regarding changes to the ABI...
What do you think?
To be honest, I'm in two minds too ;)

I completely understand the reasons and motivation for a global
debugfs option, and it seems like a reasonable compromise.

OTOH, adding a new ioctl makes sense, as we're offering functionality
that didn't exist before.

Anyway, I guess it's up to David or Artem.

My personal preference would be:
1. A new ioctl (MEMSCRUB?)
2. debugfs flag, PER MTD PART (slightly safer than your global flag)
3. global debugfs flag

BTW what do you think about option (2)? Would you consider it, or do you
think it's an overdesign, if we already accept the debugfs way?

Regards,
Shmulik
Angus CLARK
2012-06-01 15:28:43 UTC
Hi Shmulik,
Post by Shmulik Ladkani
Hi Angus,
Post by Angus CLARK
Hi Shmulik,
Post by Shmulik Ladkani
Post by Artem Bityutskiy
Post by Angus CLARK
Typically, debugfs is only enabled in development environments, and even then it
requires explicit user action, so this method of enabling erasing bad blocks is
safe enough for our needs.
Sounds ok to me, especially if you send the patch together with a piece
of doc for the mtd web-site. I just think it is important to document
this feature. Is this doable?
I think we should prefer a local "allow erase bad blocks" policy over a
global one.
This is because when the global debugfs flag is on, *every* mtd erase
operation might lead to erasure of bad blocks, not only those
triggered by the user who set the flag before issuing the 'flash_erase'
command.
Meaning, other MTD users (ubi, various ffs) currently working on
other mtd partitions are suddenly allowed to erase bad
blocks, which is probably not what the user intended.
I suggest being more restrictive and making "allow erase bad blocks"
a local policy, that is, per erase request.
And since we'll probably need this only for userspace erase
calls (e.g. flash_erase), I suggest placing it in the MEMERASE ioctl.
Comments?
I would agree to some extent. Enabling the "allow erase bad blocks" option per
erase request would certainly be a safer solution. However, I suspect extending
the existing MEMERASE/MEMERASE64 IOCTLs is not really an option. That leaves
inventing another IOCTL, or perhaps adding another file mode, which would
achieve per-file-descriptor scope, if not per-erase-request.
My approach was largely motivated by the desire not to change the existing ABI,
and/or mtd-utils. One could argue that such an option should only ever be
enabled by someone who knows what they are doing, and that might include things
like un-mounting any filesystems beforehand.
To be honest, I am in two minds. The solution I have at present is very simple
(or perhaps naive!), requires minimal changes to the kernel, no changes to
mtd-utils, and can be disabled completely by not including debugfs (which is
standard practice on production systems). On the other hand, the ability to
enable per-erase-request is a safer and more elegant solution. However, it
would require updates to mtd-utils, and agreement from the MTD community
regarding changes to the ABI...
What do you think?
To be honest, I'm in two minds too ;)
I completely understand the reasons and motivation for a global
debugfs option, and it seems like a reasonable compromise.
OTOH, adding a new ioctl makes sense, as we're offering functionality
that didn't exist before.
Anyway, I guess it's up to David or Artem.
1. A new ioctl (MEMSCRUB?)
2. debugfs flag, PER MTD PART (slightly safer than your global flag)
3. global debugfs flag
BTW what do you think about option (2)? Would you consider it, or do you
think it's an overdesign, if we already accept the debugfs way?
Yes, option 2 could be a good compromise. It would require a few extra hooks in
mtdpart, to dynamically create/remove debugfs entries when partitions are
added/deleted, and some updates to mtd_erase, to allow the flags to be passed to
nand_base:nand_erase_nand(), but it shouldn't be too bad. Perhaps moving it to
sysfs might be cleaner?

In any case, I will wait for advice from David and Artem before commencing. (I
am away next week, so it will have to wait until I get back anyway.)

Cheers,

Angus
Artem Bityutskiy
2012-06-05 12:17:50 UTC
Post by Shmulik Ladkani
1. A new ioctl (MEMSCRUB?)
2. debugfs flag, PER MTD PART (slightly safer than your global flag)
3. global debugfs flag
Yes, I guess option 1 is the best I think. Option 2 needs too much work.
--
Best Regards,
Artem Bityutskiy
Brian Norris
2012-06-14 17:48:49 UTC
Hi,
Post by Artem Bityutskiy
Post by Shmulik Ladkani
1. A new ioctl (MEMSCRUB?)
2. debugfs flag, PER MTD PART (slightly safer than your global flag)
3. global debugfs flag
Yes, I guess option 1 is the best I think. Option 2 needs too much work.
Just to put my 2 cents in and revive this thread: I'm also interested
in this kind of feature. I personally recompile to disable bad block
tables temporarily whenever I need to reset a flash-based BBT or
unmark a "bad block." But this isn't always so easy for others I deal
with, so I'm all for a feature that can be used by a
relatively-inexperienced user to reset bad block tables, erase bad
block markers, etc.

I like the idea of an ioctl (option 1), since that does not require
recompilation (even in the event that debugfs wasn't enabled) and can
be built into a user-space tool, with appropriate warnings and
prompting for the user, of course. The debugfs ideas seem a little bit
too manual to be useful for anyone but a true driver/kernel developer
and also a little bit too unsafe (a user may want to target a specific
block without disabling bad block checking for all chips or even for
the entire partition).

FWIW, a similar topic was brought up a long time back, with little result:
http://lists.infradead.org/pipermail/linux-mtd/2010-October/032577.html

So Angus, are you going to code this?

Brian
Shmulik Ladkani
2012-06-14 21:31:21 UTC
Hi,
Post by Brian Norris
Post by Artem Bityutskiy
Post by Shmulik Ladkani
1. A new ioctl (MEMSCRUB?)
2. debugfs flag, PER MTD PART (slightly safer than your global flag)
3. global debugfs flag
Yes, I guess option 1 is the best I think. Option 2 needs too much work.
http://lists.infradead.org/pipermail/linux-mtd/2010-October/032577.html
Thanks Brian for spotting this post.
It made me rethink the MEMSCRUB suggestion and notice that it is somewhat
limited.

I guess the main reason for erasing a block marked bad is to erase its
BBM (that was previously set for test purposes), so that the block will
no longer be considered bad.
But that would only work if the BBM is the only bad-block identification
scheme (no on-flash BBT).

Hence the approach in Jon Povey's post is actually better, as it provides a tiny
but useful building block: allow "scrubbing" the bad indication of
an erase-block. For pure-BBM systems, that would be erasure of the
block; for pure flash-BBT systems, that would be updating the BBT; for systems
that update both BBM and BBT, we'll do both.
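
The three cases can be sketched as a userspace model. Everything here is hypothetical and for illustration only: the struct, the enum and the function names are not taken from the kernel or from Jon Povey's patch, and the two boolean arrays merely stand in for the OOB markers and the on-flash table.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_BLOCKS 8

enum bb_scheme {
    BB_OOB_MARKER = 1 << 0,   /* bad-block marker kept in the block's OOB */
    BB_FLASH_BBT  = 1 << 1,   /* bad-block table kept on flash */
};

struct chip {
    unsigned schemes;             /* bitmask of enum bb_scheme */
    bool oob_bad[NUM_BLOCKS];     /* models the OOB markers */
    bool bbt_bad[NUM_BLOCKS];     /* models the BBT entries */
};

/* A block is bad if any scheme the chip uses says so. */
bool block_is_bad(const struct chip *c, unsigned block)
{
    assert(block < NUM_BLOCKS);
    return ((c->schemes & BB_OOB_MARKER) && c->oob_bad[block]) ||
           ((c->schemes & BB_FLASH_BBT) && c->bbt_bad[block]);
}

/* Clear the bad indication wherever this chip keeps it. */
void scrub_bad_indication(struct chip *c, unsigned block)
{
    assert(block < NUM_BLOCKS);
    if (c->schemes & BB_OOB_MARKER)
        c->oob_bad[block] = false;   /* stands in for erasing the block */
    if (c->schemes & BB_FLASH_BBT)
        c->bbt_bad[block] = false;   /* stands in for rewriting the BBT */
}
```

The point of the model is that a plain forced erase corresponds only to the first branch, so on a chip that also uses a flash BBT the block would still be reported bad afterwards.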

What do you think?

Regards,
Shmulik
Angus CLARK
2012-06-15 06:55:30 UTC
Hi Brian,
Post by Brian Norris
Post by Shmulik Ladkani
1. A new ioctl (MEMSCRUB?)
2. debugfs flag, PER MTD PART (slightly safer than your global flag)
3. global debugfs flag
[snip]
Post by Brian Norris
I like the idea of an ioctl (option 1), since that does not require
recompilation (even in the event that debugfs wasn't enabled) and can
be built into a user-space tool, with appropriate warnings and
prompting for the user, of course. The debugfs ideas seem a little bit
too manual to be useful for anyone but a true driver/kernel developer
and also a little bit too unsafe (a user may want to target a specific
block without disabling bad block checking for all chips or even for
the entire partition).
Yes, I agree, option 1 is looking good. The main motivation behind option 3 was
to add support with minimal code changes, but option 1 is a cleaner solution.
Post by Brian Norris
So Angus, are you going to code this?
Yes, I will have a go later today or early next week...

Cheers,

Angus
Tomer Barletz
2012-06-26 22:10:17 UTC
Post by Brian Norris
Hi,
Post by Artem Bityutskiy
Post by Shmulik Ladkani
1. A new ioctl (MEMSCRUB?)
2. debugfs flag, PER MTD PART (slightly safer than your global flag)
3. global debugfs flag
Yes, I guess option 1 is the best I think. Option 2 needs too much work.
Just to put my 2 cents in and revive this thread: I'm also interested
in this kind of feature. I personally recompile to disable bad block
tables temporarily whenever I need to reset a flash-based BBT or
unmark a "bad block." But this isn't always so easy for others I deal
with, so I'm all for a feature that can be used by a
relatively-inexperienced user to reset bad block tables, erase bad
block markers, etc.
I like the idea of an ioctl (option 1), since that does not require
recompilation (even in the event that debugfs wasn't enabled) and can
be built into a user-space tool, with appropriate warnings and
prompting for the user, of course. The debugfs ideas seem a little bit
too manual to be useful for anyone but a true driver/kernel developer
and also a little bit too unsafe (a user may want to target a specific
block without disabling bad block checking for all chips or even for
the entire partition).
http://lists.infradead.org/pipermail/linux-mtd/2010-October/032577.html
So Angus, are you going to code this?
Doesn't Jon's patch match option number 1?

--Tomer
Angus CLARK
2012-06-18 09:34:02 UTC
Hi Artem,
Post by Artem Bityutskiy
Post by Shmulik Ladkani
1. A new ioctl (MEMSCRUB?)
2. debugfs flag, PER MTD PART (slightly safer than your global flag)
3. global debugfs flag
Yes, I guess option 1 is the best I think. Option 2 needs too much work.
Are you ok with the name MEMSCRUB? I know previously you have objected to this
name, since it might get confused with UBI scrubbing
(http://lists.infradead.org/pipermail/linux-mtd/2010-September/032031.html). In
fact, the conclusion of that thread was to add an extended erase IOCTL, with a
'flags' parameter to capture options such as erase bad blocks. Would this be
the preferred method (it didn't seem to go anywhere last time), or is 'MEMSCRUB'
with the existing erase_info_user64 structure acceptable?

Cheers,

Angus
Artem Bityutskiy
2012-06-27 09:54:06 UTC
Post by Angus CLARK
Hi Artem,
Post by Artem Bityutskiy
Post by Shmulik Ladkani
1. A new ioctl (MEMSCRUB?)
2. debugfs flag, PER MTD PART (slightly safer than your global flag)
3. global debugfs flag
Yes, I guess option 1 is the best I think. Option 2 needs too much work.
Are you ok with the name MEMSCRUB? I know previously you have objected to this
name, since it might get confused with UBI scrubbing
(http://lists.infradead.org/pipermail/linux-mtd/2010-September/032031.html). In
fact, the conclusion of that thread was to add an extended erase IOCTL, with a
'flags' parameter to capture options such as erase bad blocks. Would this be
the preferred method (it didn't seem to go anywhere last time), or is 'MEMSCRUB'
with the existing erase_info_user64 structure acceptable?
I think Shmulik had a good point - scrubbing is not only about erasing,
but also about changing the BBT. So a separate ioctl makes more sense.
As for the name, we could name it MEMBBSCRUB, I guess?
--
Best Regards,
Artem Bityutskiy
Angus CLARK
2012-06-27 12:37:07 UTC
Post by Artem Bityutskiy
Post by Angus CLARK
Post by Artem Bityutskiy
Post by Shmulik Ladkani
1. A new ioctl (MEMSCRUB?)
2. debugfs flag, PER MTD PART (slightly safer than your global flag)
3. global debugfs flag
Yes, I guess option 1 is the best I think. Option 2 needs too much work.
Are you ok with the name MEMSCRUB? I know previously you have objected to this
name, since it might get confused with UBI scrubbing
(http://lists.infradead.org/pipermail/linux-mtd/2010-September/032031.html). In
fact, the conclusion of that thread was to add an extended erase IOCTL, with a
'flags' parameter to capture options such as erase bad blocks. Would this be
the preferred method (it didn't seem to go anywhere last time), or is 'MEMSCRUB'
with the existing erase_info_user64 structure acceptable?
I think Shmulik had a good point - scrubbing is not only about erasing,
but also about changing the BBT. So a separate ioctl makes more sense.
As for the name, we could name it MEMBBSCRUB, I guess?
There may be others, but the two use-cases I have for erasing blocks marked as
bad are:

1. Restore the status of a block known to be good, but marked as bad. The block
may have been marked bad deliberately, for test purposes, or it may have been
incorrectly identified as bad due to "alien" data from a previous driver/ECC
scheme clashing with the OOB BBM.

2. Erase the Flash-resident BBTs. These blocks are reported as bad to prevent
accidental write or erase operations. However, one might want to erase the
BBTs, either to test the OOB BBM scanning and BBT rebuild on next reboot, or
prior to changing driver and/or ECC scheme.

For case 1, it makes sense to update the BBTs at the same time. At present, I
have to scrub the block in question, scrub the BBTs, and then reboot to force a
rescan/rebuild of the BBTs. (This is fairly simple to do, but does risk losing
information about blocks that have gone bad through wear, unless the
"mtd: nand: write BBM to OOB even with flash-based BBT" patch is applied.)

However, for case 2, I just want to force the erase operation so I can wipe the
BBTs and return the device as close as possible to its original state. We could
put some logic in the kernel, "if 'MEMBBSCRUB' on BBT blocks, do not
update/rewrite BBTs", but I think this "policy" decision would be better handled
in userspace.

At the risk of repeating the discussion in
http://lists.infradead.org/pipermail/linux-mtd/2010-September/032031.html, how
about adding the MEMSCRUB IOCTL for erasing blocks marked as bad (I have the
kernel and mtd-utils patches available), and then adding 'MEMSETGOODBLOCK' for
updating the BBTs and/or OOB BBM?

Cheers,

Angus
Artem Bityutskiy
2012-06-29 10:31:37 UTC
Post by Angus CLARK
However, for case 2, I just want to force the erase operation so I can wipe the
BBTs and return the device as close as possible to its original state. We could
put some logic in the kernel, "if 'MEMBBSCRUB' on BBT blocks, do not
update/rewrite BBTs", but I think this "policy" decision would be better handled
in userspace.
Sounds like you need 2 separate ioctls:
1. MEMBBSCRUB
2. MEMBBTWIPE
--
Best Regards,
Artem Bityutskiy
Angus CLARK
2012-07-02 07:14:38 UTC
Permalink
Hi Artem,
Post by Artem Bityutskiy
Post by Angus CLARK
However, for case 2, I just want to force the erase operation so I can wipe the
BBTs and return the device as close as possible to its original state. We could
put some logic in the kernel, "if 'MEMBBSCRUB' on BBT blocks, do not
update/rewrite BBTs", but I think this "policy" decision would be better handled
in userspace.
1. MEMBBSCRUB
There are occasions when it is useful to unmark a bad block, but without erasing
first. Therefore, I guess my preference would be to split this further:
'erase-bad-block' and 'mark-block-as-good'.
Post by Artem Bityutskiy
2. MEMBBTWIPE
This could be a useful addition, since it avoids the need for the user to
calculate the offsets for the BBT blocks.

However, I just have a slight concern about adding the now three new ioctls for
what is essentially development and debugging purposes. (Indeed, we would
probably want a way of disabling these features for production systems.)

Support for 'erase-bad-block' requires minimal updates, and this one ioctl can
be used to achieve the functionality of the other two, albeit with a little
expert knowledge! Support for 'mark-block-as-good' and 'wipe-bbts' ioctls would
require more extensive updates (in particular, to nand_bbt.c which has been
updated heavily since the kernel version I have available for testing!).

I would probably edge towards adding just the 'erase-bad-block' option, although
I accept my judgement might be slightly biased here, since I have been using a
variant of this for a year or so. I am happy to implement either approach, but
I would be interested to learn others' views first.

Cheers,

Angus
Artem Bityutskiy
2012-07-03 12:22:50 UTC
Permalink
Post by Angus CLARK
Post by Artem Bityutskiy
Post by Angus CLARK
However, for case 2, I just want to force the erase operation so I can wipe the
BBTs and return the device as close as possible to its original state. We could
put some logic in the kernel, "if 'MEMBBSCRUB' on BBT blocks, do not
update/rewrite BBTs", but I think this "policy" decision would be better handled
in userspace.
1. MEMBBSCRUB
There are occasions when it is useful to unmark a bad block, but without erasing
first. Therefore, I guess my preference would be to split this further:
'erase-bad-block' and 'mark-block-as-good'.
I can smell over-engineering - any good example? Note, you will leave
unused bytes in the ioctl data structure and make it extendable in the
future.
--
Best Regards,
Artem Bityutskiy
Angus CLARK
2012-07-03 15:05:06 UTC
Permalink
Hi Artem,
Post by Artem Bityutskiy
Post by Angus CLARK
Post by Artem Bityutskiy
Post by Angus CLARK
However, for case 2, I just want to force the erase operation so I can wipe the
BBTs and return the device as close as possible to its original state. We could
put some logic in the kernel, "if 'MEMBBSCRUB' on BBT blocks, do not
update/rewrite BBTs", but I think this "policy" decision would be better handled
in userspace.
1. MEMBBSCRUB
There are occasions when it is useful to unmark a bad block, but without erasing
first. Therefore, I guess my preference would be to split this further:
'erase-bad-block' and 'mark-block-as-good'.
I can smell over-engineering
Indeed. The solution I shared at the start of this thread involved a 3-line
patch to add a debugfs entry! This meets our own requirements for
debug/development, although I admit it is a little unsafe when used
inappropriately ;-)
Post by Artem Bityutskiy
- any good example?
One example would be to regain access to blocks that were incorrectly identified
as bad during the creation of the BBTs. This can happen when the NAND device is
pre-programmed with a boot-loader or test-software, but the process used to
perform the pre-programming does not write the BBTs. If the ECC data is such
that it clashes with the OOB BBM, then linux will detect these as bad blocks on
first boot. Of course, the problem lies with the pre-programmer, but it is
helpful to get access to the blocks in linux without erasing the data first.
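
The clash described above can be illustrated with a minimal sketch. It assumes the bad block marker (BBM) is byte 0 of the OOB area and that any value other than 0xFF marks the block bad, which is the common large-page convention; the real scan in nand_bbt.c is pattern-driven and more involved. The function names and the "foreign" ECC layout are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed marker position: byte 0 of the OOB area (large-page convention). */
#define BBM_OFFSET 0

/* Put an OOB buffer into the erased state (all bytes 0xFF). */
static void oob_set_erased(uint8_t *oob, size_t oob_len)
{
	assert(oob != NULL);
	memset(oob, 0xFF, oob_len);
}

/* A block is treated as bad when its marker byte is anything but 0xFF. */
static bool oob_marks_block_bad(const uint8_t *oob, size_t oob_len)
{
	assert(oob != NULL);
	return oob_len > BBM_OFFSET && oob[BBM_OFFSET] != 0xFF;
}

/*
 * Hypothetical pre-programmer layout that stores its ECC bytes starting at
 * OOB offset 0, i.e. on top of the position the scan treats as the BBM.
 */
static void foreign_layout_write_ecc(uint8_t *oob, size_t oob_len,
				     const uint8_t *ecc, size_t ecc_len)
{
	assert(oob != NULL && ecc != NULL);
	for (size_t i = 0; i < ecc_len && i < oob_len; i++)
		oob[i] = ecc[i];
}
```

With this layout, a perfectly good block whose first ECC byte is not 0xFF is reported as bad by the scan, even though the data in it is valid. That is the case where unmarking without erasing is useful.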
Post by Artem Bityutskiy
Note, you will leave
unused bytes in the ioctl data structure and make it extendable in the
future.
So, are you suggesting that I implement a MEMBBSCRUB ioctl, with a flag to
indicate whether or not the BBTs should be updated (and some extra padding for
future-proofing)? Might have to think about what happens when called on blocks
reserved for the BBTs...
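
As a rough sketch of what such a request could look like: the structure, flag, and helper names below are hypothetical and are not an existing kernel ABI. The idea is a 64-bit range, a flags word with one "update BBT" bit, and reserved padding that must be zero so it can be given a meaning later.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag: also clear the block's entry in the BBTs after the erase. */
#define BBSCRUB_FLAG_UPDATE_BBT	(1u << 0)
#define BBSCRUB_KNOWN_FLAGS	BBSCRUB_FLAG_UPDATE_BBT

/* Hypothetical request layout for a MEMBBSCRUB-style ioctl. */
struct bbscrub_req {
	uint64_t start;		/* byte offset of the first eraseblock */
	uint64_t length;	/* number of bytes to scrub */
	uint32_t flags;		/* BBSCRUB_FLAG_* */
	uint32_t padding[3];	/* reserved for future use, must be zero */
};

/* Keep the layout fixed so 32-bit and 64-bit userspace agree on it. */
static_assert(sizeof(struct bbscrub_req) == 32, "unexpected request size");

/* Zero the whole request, including the reserved words. */
static void bbscrub_req_init(struct bbscrub_req *req)
{
	memset(req, 0, sizeof(*req));
}

/* Return 0 if the request is acceptable, -1 otherwise. */
static int bbscrub_req_validate(const struct bbscrub_req *req,
				uint64_t erasesize)
{
	if (erasesize == 0 || req->length == 0)
		return -1;
	/* Only whole, aligned eraseblocks can be scrubbed. */
	if (req->start % erasesize || req->length % erasesize)
		return -1;
	/* Reject unknown flags and non-zero padding so both stay extendable. */
	if (req->flags & ~BBSCRUB_KNOWN_FLAGS)
		return -1;
	for (size_t i = 0; i < sizeof(req->padding) / sizeof(req->padding[0]); i++)
		if (req->padding[i] != 0)
			return -1;
	return 0;
}
```

Rejecting non-zero padding up front is what makes the reserved words usable later: old userspace is then guaranteed to have passed zeros.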

Cheers,

Angus
Artem Bityutskiy
2012-07-16 14:37:12 UTC
Permalink
Post by Angus CLARK
So, are you suggesting that I implement a MEMBBSCRUB ioctl, with a flag to
indicate whether or not the BBTs should be updated (and some extra padding for
future-proofing)? Might have to think about what happens when called on blocks
reserved for the BBTs...
First of all, implement only what you really need and can test. Do not
try to implement too much stuff you do not really need. Yes, the
above sounds good. And the BBT eraseblocks are not (and should not be)
directly accessible for erasure/scrubbing.
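
One way to picture that restriction is a guard that rejects any scrub range containing a BBT eraseblock. This is only a sketch: the function name is hypothetical, and it assumes the caller already knows the eraseblock numbers holding the BBTs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if the eraseblock range [first, first + count) contains any
 * of the eraseblocks listed in bbt_blocks.
 */
static bool range_touches_bbt(uint32_t first, uint32_t count,
			      const uint32_t *bbt_blocks, size_t n_bbt)
{
	assert(bbt_blocks != NULL || n_bbt == 0);
	for (size_t i = 0; i < n_bbt; i++) {
		/* The subtraction cannot wrap because bbt_blocks[i] >= first. */
		if (bbt_blocks[i] >= first && bbt_blocks[i] - first < count)
			return true;
	}
	return false;
}
```

A scrub handler built along these lines would return an error whenever this check is true, so ordinary bad blocks can be scrubbed while the BBT blocks stay protected.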
--
Best Regards,
Artem Bityutskiy