Discussion:
[RFC][patch] NAND partial page read functionality
(too old to reply)
Alexey Korolev
2007-12-13 18:15:12 UTC
Permalink
Hi

Here is a patch providing partial page read functionality for NAND
devices.

In many cases it gives performacne boost. I've added this feature
enabling under chip->options flag.
Setting NAND_PART_READ option in board driver will enable this feature.

This is example how partial page read could affect stat time perfromance.
--------
Case 1: partial page read is OFF

[root at Linux /]#time ls -l /mnt/mtd8/media
-rw-r--r-- 1 root root 8388608 Jan 1 00:01 file1
-rw-r--r-- 1 root root 8388608 Jan 1 00:02 file2
-rw-r--r-- 1 root root 8388608 Jan 1 00:02 file3
-rw-r--r-- 1 root root 8388608 Jan 1 00:02 file4
-rw-r--r-- 1 root root 8388608 Jan 1 00:02 file5
-rw-r--r-- 1 root root 8388608 Jan 1 00:02 file6
real 0m 5.36s
user 0m 0.00s
sys 0m 5.36s
--------
Case 2: partial page read if ON

[root at Linux /]#time ls -l /mnt/mtd8/media
-rw-r--r-- 1 root root 8388608 Jan 1 00:01 file1
-rw-r--r-- 1 root root 8388608 Jan 1 00:02 file2
-rw-r--r-- 1 root root 8388608 Jan 1 00:02 file3
-rw-r--r-- 1 root root 8388608 Jan 1 00:02 file4
-rw-r--r-- 1 root root 8388608 Jan 1 00:02 file5
-rw-r--r-- 1 root root 8388608 Jan 1 00:02 file6
real 0m 3.00s
user 0m 0.01s
sys 0m 2.99s

There are many cases when it makes sense to use it.
Please find patch below. Your comments and suggestions are welcome.
Thanks,
Alexey

Signed-off-by Alexey Korolev <akorolev at infradead.org>
--------------
diff -aur clear_adv/drivers/mtd/nand/nand_base.c linux-2.6.23.8-adv/drivers/mtd/nand/nand_base.c
--- clear_adv/drivers/mtd/nand/nand_base.c 2007-11-29 17:46:12.000000000 +0300
+++ linux-2.6.23.8-adv/drivers/mtd/nand/nand_base.c 2007-12-13 18:24:08.000000000 +0300
@@ -830,6 +830,66 @@
}

/**
+ * nand_read_partial - [REPLACABLE] software ecc based partial page read function
+ * @mtd: mtd info structure
+ * @chip: nand chip info structure
+ * @dataofs offset of requested data within the page
+ * @readlen data length
+ * @buf: buffer to store read data
+ */
+static int nand_read_partial(struct mtd_info * mtd,struct nand_chip *chip, uint32_t data_offs, uint32_t readlen, uint8_t *bufpoi)
+{
+ int start_step, end_step, num_steps;
+ uint32_t *eccpos = chip->ecc.layout->eccpos;
+ uint8_t *p;
+ int data_col_addr, i;
+ int datafrag_len, eccfrag_len;
+
+ /* Column address wihin the page aligned to ECC size (256bytes). */
+ start_step = data_offs / chip->ecc.size;
+ end_step = (data_offs + readlen - 1) / chip->ecc.size;
+ num_steps = end_step - start_step + 1;
+
+ /* Data size aligned to ECC ecc.size*/
+ datafrag_len = num_steps * chip->ecc.size;
+ eccfrag_len = num_steps * chip->ecc.bytes;
+
+ data_col_addr = start_step * chip->ecc.size;
+ /* If we read not a page aligned data */
+ if (data_col_addr != 0) {
+ chip->cmdfunc(mtd, NAND_CMD_RNDOUT, data_col_addr, -1);
+ }
+
+ p = bufpoi + data_col_addr;
+ chip->read_buf(mtd, p, datafrag_len);
+
+ /* Calculate ECC */
+ for (i = 0; i<eccfrag_len ; i += chip->ecc.bytes, p += chip->ecc.size)
+ chip->ecc.calculate(mtd, p, &chip->buffers->ecccalc[i]);
+
+ /* The performance will be improved more if to position offsets
+ * according to ecc.pos. But in this case we may face issues
+ * on chips with gaps in ecc positions. So it is disabled yet*/
+ chip->cmdfunc(mtd, NAND_CMD_RNDOUT, mtd->writesize, -1);
+ chip->read_buf(mtd, chip->oob_poi, mtd->oobsize);
+ for (i = 0; i < eccfrag_len; i++)
+ chip->buffers->ecccode[i] = chip->oob_poi[eccpos[i + start_step * chip->ecc.bytes]];
+
+ p = bufpoi + data_col_addr;
+ for (i = 0; i<eccfrag_len ; i += chip->ecc.bytes, p += chip->ecc.size) {
+ int stat;
+
+ stat = chip->ecc.correct(mtd, p, &chip->buffers->ecccode[i], &chip->buffers->ecccalc[i]);
+ if (stat == -1)
+ mtd->ecc_stats.failed++;
+ else
+ mtd->ecc_stats.corrected += stat;
+ }
+ return 0;
+
+}
+
+/**
* nand_read_page_hwecc - [REPLACABLE] hardware ecc based page read function
* @mtd: mtd info structure
* @chip: nand chip info structure
@@ -1048,14 +1108,17 @@
/* Now read the page into the buffer */
if (unlikely(ops->mode == MTD_OOB_RAW))
ret = chip->ecc.read_page_raw(mtd, chip, bufpoi);
- else
+ else if ( !aligned && NAND_CANPARTREAD(chip) )
+ ret = chip->ecc.read_partial(mtd, chip, col, bytes, bufpoi);
+ else
ret = chip->ecc.read_page(mtd, chip, bufpoi);
if (ret < 0)
break;

/* Transfer not aligned data */
if (!aligned) {
- chip->pagebuf = realpage;
+ if (!NAND_CANPARTREAD(chip))
+ chip->pagebuf = realpage;
memcpy(buf, chip->buffers->databuf + col, bytes);
}

@@ -2563,6 +2626,7 @@
chip->ecc.calculate = nand_calculate_ecc;
chip->ecc.correct = nand_correct_data;
chip->ecc.read_page = nand_read_page_swecc;
+ chip->ecc.read_partial = nand_read_partial;
chip->ecc.write_page = nand_write_page_swecc;
chip->ecc.read_oob = nand_read_oob_std;
chip->ecc.write_oob = nand_write_oob_std;
diff -aur clear_adv/include/linux/mtd/nand.h linux-2.6.23.8-adv/include/linux/mtd/nand.h
--- clear_adv/include/linux/mtd/nand.h 2007-11-29 17:46:12.000000000 +0300
+++ linux-2.6.23.8-adv/include/linux/mtd/nand.h 2007-12-13 18:24:30.000000000 +0300
@@ -170,6 +170,8 @@
#define NAND_NO_READRDY 0x00000100
/* Chip does not allow subpage writes */
#define NAND_NO_SUBPAGE_WRITE 0x00000200
+/* Chip supports partial page read */
+#define NAND_PART_READ 0x00000400

/* Options valid for Samsung large page devices */
#define NAND_SAMSUNG_LP_OPTIONS \
@@ -183,6 +185,8 @@
#define NAND_MUST_PAD(chip) (!(chip->options & NAND_NO_PADDING))
#define NAND_HAS_CACHEPROG(chip) ((chip->options & NAND_CACHEPRG))
#define NAND_HAS_COPYBACK(chip) ((chip->options & NAND_COPYBACK))
+#define NAND_CANPARTREAD(chip) ((chip->options & NAND_PART_READ) &&\
+ (chip->ecc.mode == NAND_ECC_SOFT))

/* Mask to zero out the chip options, which come from the id table */
#define NAND_CHIPOPTIONS_MSK (0x0000ffff & ~NAND_NO_AUTOINCR)
@@ -281,6 +285,10 @@
int (*read_page)(struct mtd_info *mtd,
struct nand_chip *chip,
uint8_t *buf);
+ int (*read_partial)(struct mtd_info *mtd,
+ struct nand_chip *chip,
+ uint32_t offs, uint32_t len,
+ uint8_t *buf);
void (*write_page)(struct mtd_info *mtd,
struct nand_chip *chip,
const uint8_t *buf);
Artem Bityutskiy
2007-12-15 12:13:15 UTC
Permalink
Post by Alexey Korolev
Hi
Here is a patch providing partial page read functionality for NAND
devices.
In many cases it gives performacne boost. I've added this feature
enabling under chip->options flag.
Setting NAND_PART_READ option in board driver will enable this feature.
This is example how partial page read could affect stat time perfromance.
Hi, this stuff looks nice and useful for me. Few suggestions:

1. Please, use -p option when you generate patches.

2. We already have notion of subpage in nand_base.c. We already can
write subpages. Your patch is about reading subpages, so I suggest you
to do something like s/partial/subpage/. E.g., UBI utilizes sub-page
writes.

3. Run nand-tests to make sure you did not brake anything:
git://git.infradead.org/~ahunter/nand-tests.git

Question: what was the flash you tested this on? Judging from the
comments it was a 512-byte page NAND, right?

==========================
Also, an separate idea I'd suggest you too look at, but after you have
your patch in:

Glance in how nand_base.c does sub-page write. It writes whole NAND page
anyway, but fills the subpages which it does not want to write with
0xFFs. I am not sure, but I suspect it could be smarter and write only
the requested subpage. If this is true, and you will do this, then the
notion of subpage could go away.

Indeed, you could declare mtd->writesize = subpage size then, not NAND
page size. Then, for example, for 512-page-byte NANDs you'd have
256-byte mtd->writesize, and JFFS2 would have 256-byte write-buffer, and
you'd speed up jffs2 writes, because it would write less space when it
synchronizes the write-buffer, and it would waste less space, which
means less Garbage-collection.
==========================
Post by Alexey Korolev
/**
+ * nand_read_partial - [REPLACABLE] software ecc based partial page read function
+ */
+static int nand_read_partial(struct mtd_info * mtd,struct nand_chip *chip, uint32_t data_offs, uint32_t readlen, uint8_t *bufpoi)
+{
+ int start_step, end_step, num_steps;
+ uint32_t *eccpos = chip->ecc.layout->eccpos;
+ uint8_t *p;
+ int data_col_addr, i;
+ int datafrag_len, eccfrag_len;
+
+ /* Column address wihin the page aligned to ECC size (256bytes). */
Why this comment says about 256 bytes. AFAIU, this will be 512 bytes in
case of 2048-byte NAND page.
Post by Alexey Korolev
+ start_step = data_offs / chip->ecc.size;
+ end_step = (data_offs + readlen - 1) / chip->ecc.size;
+ num_steps = end_step - start_step + 1;
+
+ /* Data size aligned to ECC ecc.size*/
+ datafrag_len = num_steps * chip->ecc.size;
+ eccfrag_len = num_steps * chip->ecc.bytes;
+
+ data_col_addr = start_step * chip->ecc.size;
+ /* If we read not a page aligned data */
+ if (data_col_addr != 0) {
+ chip->cmdfunc(mtd, NAND_CMD_RNDOUT, data_col_addr, -1);
+ }
Redundant curly brackets
Post by Alexey Korolev
+
+ p = bufpoi + data_col_addr;
+ chip->read_buf(mtd, p, datafrag_len);
+
+ /* Calculate ECC */
+ for (i = 0; i<eccfrag_len ; i += chip->ecc.bytes, p += chip->ecc.size)
+ chip->ecc.calculate(mtd, p, &chip->buffers->ecccalc[i]);
+
+ /* The performance will be improved more if to position offsets
+ * according to ecc.pos. But in this case we may face issues
+ * on chips with gaps in ecc positions. So it is disabled yet*/
+ chip->cmdfunc(mtd, NAND_CMD_RNDOUT, mtd->writesize, -1);
+ chip->read_buf(mtd, chip->oob_poi, mtd->oobsize);
+ for (i = 0; i < eccfrag_len; i++)
+ chip->buffers->ecccode[i] = chip->oob_poi[eccpos[i + start_step * chip->ecc.bytes]];
This double array indexing is rather unreadable.
Post by Alexey Korolev
+
+ p = bufpoi + data_col_addr;
+ for (i = 0; i<eccfrag_len ; i += chip->ecc.bytes, p += chip->ecc.size) {
+ int stat;
...
Post by Alexey Korolev
/* Options valid for Samsung large page devices */
#define NAND_SAMSUNG_LP_OPTIONS \
@@ -183,6 +185,8 @@
#define NAND_MUST_PAD(chip) (!(chip->options & NAND_NO_PADDING))
#define NAND_HAS_CACHEPROG(chip) ((chip->options & NAND_CACHEPRG))
#define NAND_HAS_COPYBACK(chip) ((chip->options & NAND_COPYBACK))
+#define NAND_CANPARTREAD(chip) ((chip->options & NAND_PART_READ) &&\
+ (chip->ecc.mode == NAND_ECC_SOFT))
Why chip->ecc.mode == NAND_ECC_SOFT?
--
Best regards,
Artem Bityutskiy (???????? ?????)
Alexey Korolev
2007-12-17 15:46:22 UTC
Permalink
Hi,
Post by Artem Bityutskiy
1. Please, use -p option when you generate patches.
No problem. The next patch will be with this option.
Post by Artem Bityutskiy
2. We already have notion of subpage in nand_base.c. We already can
write subpages. Your patch is about reading subpages, so I suggest you
to do something like s/partial/subpage/. E.g., UBI utilizes sub-page
writes.
Good point. The question is what name would be better. I used partial
page read because of NAND datasheet (Intel, St, Micron uses term partial
page read or write instead of subpage read or write). May be it will be
better to switch to this name to make search more easy?
Post by Artem Bityutskiy
git://git.infradead.org/~ahunter/nand-tests.git
Thank you for link - we will try it.
(before sending out this code we performed excessive testing - about 500
items were executed, including stress, multithread, power loss tests)
Post by Artem Bityutskiy
Question: what was the flash you tested this on? Judging from the
comments it was a 512-byte page NAND, right?
It was LP NAND with 2048/64 bytes of main/oob area per page.
Post by Artem Bityutskiy
==========================
Also, an separate idea I'd suggest you too look at, but after you have
Glance in how nand_base.c does sub-page write. It writes whole NAND page
anyway, but fills the subpages which it does not want to write with
0xFFs. I am not sure, but I suspect it could be smarter and write only
the requested subpage. If this is true, and you will do this, then the
notion of subpage could go away.
Indeed, you could declare mtd->writesize = subpage size then, not NAND
page size. Then, for example, for 512-page-byte NANDs you'd have
256-byte mtd->writesize, and JFFS2 would have 256-byte write-buffer, and
you'd speed up jffs2 writes, because it would write less space when it
synchronizes the write-buffer, and it would waste less space, which
means less Garbage-collection.
==========================
Yes. I looked at this implementation. Your idea of partial/subpage write implementation
looks good. In fact there will be some drawbacks - if every request will
become small, write performance will become lower - because writing and
positioning are not for free. May be it make sence to add this feature under flag.
It will take some time to implement this feature, but it definetely make
sense to try and play with it.
It is not clear how to handle the situation with oob write requests.
What to do if some one wil ask to write a part of page and oob at the
same time?

I did not understand this part of the code about subpage write:
/*
* Allow subpage writes up to ecc.steps. Not possible for MLC
* FLASH.
*/
if (!(chip->options & NAND_NO_SUBPAGE_WRITE) &&
!(chip->cellinfo & NAND_CI_CELLTYPE_MSK)) {
switch(chip->ecc.steps) {
case 2:
mtd->subpage_sft = 1;
break;
case 4:
case 8:
mtd->subpage_sft = 2;
break;
}
}
Why it sets subpage shift to 2 in case of eight ECC steps? It should be
3. (step size is 1/8 of page)
Post by Artem Bityutskiy
Post by Alexey Korolev
+ /* Column address wihin the page aligned to ECC size (256bytes). */
Why this comment says about 256 bytes. AFAIU, this will be 512 bytes in
case of 2048-byte NAND page.
It is 256 bytes for SOFT_ECC. Arifmetics is quite simple 3 bytes
ECC per 256 bytes of data.
Post by Artem Bityutskiy
Post by Alexey Korolev
+ if (data_col_addr != 0) {
+ chip->cmdfunc(mtd, NAND_CMD_RNDOUT, data_col_addr, -1);
+ }
Redundant curly brackets
Thanks. Will be fixed.
Post by Artem Bityutskiy
Post by Alexey Korolev
#define NAND_HAS_COPYBACK(chip) ((chip->options & NAND_COPYBACK))
+#define NAND_CANPARTREAD(chip) ((chip->options & NAND_PART_READ) &&\
+ (chip->ecc.mode == NAND_ECC_SOFT))
Why chip->ecc.mode == NAND_ECC_SOFT?
I added this limitation because I have only ECC_SOFT chips and afraid to
broke workability non SOFT_ECC chiups. I guess it should work fine for
other devices. Just kind of over protection. It is not a problem to
remove it.

Thank you for your comments. Waiting for you response.

Alexey
Artem Bityutskiy
2007-12-18 08:48:51 UTC
Permalink
Post by Alexey Korolev
Good point. The question is what name would be better. I used partial
page read because of NAND datasheet (Intel, St, Micron uses term partial
page read or write instead of subpage read or write). May be it will be
better to switch to this name to make search more easy?
For me it is not fundamental, but I'd vote for subpage - just because we
already have this. And wendors anyway tend to name things differently.
Post by Alexey Korolev
Post by Artem Bityutskiy
git://git.infradead.org/~ahunter/nand-tests.git
Thank you for link - we will try it.
(before sending out this code we performed excessive testing - about 500
items were executed, including stress, multithread, power loss tests
This is good, but the tests I pointed to work straight with MTD devices
and were designed to test NANDs, so they might be useful anyway. For
example we found quite a few bugs in OneNAND driver with them, although
the FS tests had looked working fine.
Post by Alexey Korolev
Yes. I looked at this implementation. Your idea of partial/subpage write implementation
looks good. In fact there will be some drawbacks - if every request will
become small, write performance will become lower - because writing and
positioning are not for free.
Well, this depends. If an MTD user wants to write 4KiB, and issues 4KiB
write request, then it is of course faster to write 2x2048, then 8x512,
and it is even faster to do some kind of multi-page write (some old
flashes had this AFAIK).

But surely if the driver is not dumb, it will do 2x2048?

I've glanced at jffs2_flash_writev(), and it seems it is also not dumb -
if in needs to write a 4KiB buffer, it first finishes current
write-buffer, flushes it, then it calles mtd->write() for multiple min.
I/O units (note, it does not use wbuf now), and only the rest, which
does not comprise whole min. I/O unit, goes to the wbuf.

Thus, I'd conclude, JFFS2 should benefit.
Post by Alexey Korolev
May be it make sence to add this feature under flag.
No, no flags please. It should either be transparent of not exist I
believe :-)
Post by Alexey Korolev
It is not clear how to handle the situation with oob write requests.
What to do if some one wil ask to write a part of page and oob at the
same time?
Yup, it seems to be a problem.
Post by Alexey Korolev
/*
* Allow subpage writes up to ecc.steps. Not possible for MLC
* FLASH.
*/
if (!(chip->options & NAND_NO_SUBPAGE_WRITE) &&
!(chip->cellinfo & NAND_CI_CELLTYPE_MSK)) {
switch(chip->ecc.steps) {
mtd->subpage_sft = 1;
break;
mtd->subpage_sft = 2;
break;
}
}
Why it sets subpage shift to 2 in case of eight ECC steps? It should be
3. (step size is 1/8 of page)
Frankly, I do not deeply understand nand_base.c, but it looks like you
are right. Thomas Gleixner is the author, you should ask him.
Post by Alexey Korolev
Post by Artem Bityutskiy
Post by Alexey Korolev
#define NAND_HAS_COPYBACK(chip) ((chip->options & NAND_COPYBACK))
+#define NAND_CANPARTREAD(chip) ((chip->options & NAND_PART_READ) &&\
+ (chip->ecc.mode == NAND_ECC_SOFT))
Why chip->ecc.mode == NAND_ECC_SOFT?
I added this limitation because I have only ECC_SOFT chips and afraid to
broke workability non SOFT_ECC chiups. I guess it should work fine for
other devices. Just kind of over protection. It is not a problem to
remove it.
Well, I could try your patch on OLPC which has an ST-Micro NAND and CAFE
controller which, among other things, does HW ECC for NAND.
--
Best regards,
Artem Bityutskiy (???????? ?????)
Jörn Engel
2007-12-18 11:42:33 UTC
Permalink
Post by Artem Bityutskiy
Well, this depends. If an MTD user wants to write 4KiB, and issues 4KiB
write request, then it is of course faster to write 2x2048, then 8x512,
and it is even faster to do some kind of multi-page write (some old
flashes had this AFAIK).
Not necessarily. The alauda chip has a "page program" and a "block
program" command. With a naive implementation the block program is
faster. But when doing asynchronous transfers on the usb bus, page
program becomes just as fast. In this particular case, block program
can only reduce the number of synchronous bus latencies for a
non-optimized implementation.
Post by Artem Bityutskiy
But surely if the driver is not dumb, it will do 2x2048?
I've glanced at jffs2_flash_writev(), and it seems it is also not dumb -
if in needs to write a 4KiB buffer, it first finishes current
write-buffer, flushes it, then it calles mtd->write() for multiple min.
I/O units (note, it does not use wbuf now), and only the rest, which
does not comprise whole min. I/O unit, goes to the wbuf.
Thus, I'd conclude, JFFS2 should benefit.
If writesize is 256 and pagesize is 2048, every wbuf flush will write
exactly 256. Any remaining clean multiple of 256 is written directly,
but this will rarely be an aligned clean multiple of 2048. Every wbuf
flush ensures that either the previous or the following writes is not.

As long as JFFS2 and MTD are synchronous, this could be a significant
performance hit.

J?rn
--
Mac is for working,
Linux is for Networking,
Windows is for Solitaire!
-- stolen from dc
Artem Bityutskiy
2007-12-18 12:57:55 UTC
Permalink
Post by Jörn Engel
Post by Artem Bityutskiy
Well, this depends. If an MTD user wants to write 4KiB, and issues 4KiB
write request, then it is of course faster to write 2x2048, then 8x512,
and it is even faster to do some kind of multi-page write (some old
flashes had this AFAIK).
Not necessarily. The alauda chip has a "page program" and a "block
program" command. With a naive implementation the block program is
faster. But when doing asynchronous transfers on the usb bus, page
program becomes just as fast. In this particular case, block program
can only reduce the number of synchronous bus latencies for a
non-optimized implementation.
Well, in the context of discussion this example is not really relevant,
since aluda have its own write_page, and we are talking about the
nand_base.c's implementation.
Post by Jörn Engel
If writesize is 256 and pagesize is 2048, every wbuf flush will write
exactly 256. Any remaining clean multiple of 256 is written directly,
but this will rarely be an aligned clean multiple of 2048. Every wbuf
flush ensures that either the previous or the following writes is not.
Indeed, you are right.
--
Best regards,
Artem Bityutskiy (???????? ?????)
Jörn Engel
2007-12-18 13:51:03 UTC
Permalink
Post by Artem Bityutskiy
Post by Jörn Engel
Post by Artem Bityutskiy
Well, this depends. If an MTD user wants to write 4KiB, and issues 4KiB
write request, then it is of course faster to write 2x2048, then 8x512,
and it is even faster to do some kind of multi-page write (some old
flashes had this AFAIK).
Not necessarily. The alauda chip has a "page program" and a "block
program" command. With a naive implementation the block program is
faster. But when doing asynchronous transfers on the usb bus, page
program becomes just as fast. In this particular case, block program
can only reduce the number of synchronous bus latencies for a
non-optimized implementation.
Well, in the context of discussion this example is not really relevant,
since aluda have its own write_page, and we are talking about the
nand_base.c's implementation.
Your claim above seemed to imply that larger transfers always improve
speed. That simply isn't true. If I misinterpreted your claim then
sorry about the noise.

J?rn
--
Chance favors only the prepared mind.
-- Louis Pasteur
Artem Bityutskiy
2008-04-24 06:34:36 UTC
Permalink
Post by Alexey Korolev
Hi
Here is a patch providing partial page read functionality for NAND
devices.
In many cases it gives performacne boost. I've added this feature
enabling under chip->options flag.
Setting NAND_PART_READ option in board driver will enable this feature.
Hamish, this stuff should certainly help you. You could give it a try.
Unfortunately Alexey gave up almost straight away and did not try to
push his work harder.The arguments against the patch were weak, and
addressable. This work needs some more efforts and it may get merged.
Moreover, Alexey came with impressive numbers, and it is difficult to
argue against. Would you give it a try? If it gives performance benefits
for you, I think we could raise this again.

Look here for the full story:
http://lists.infradead.org/pipermail/linux-mtd/2007-December/020105.html
--
Best regards,
Artem Bityutskiy (???????? ?????)
Artem Bityutskiy
2008-04-24 07:11:55 UTC
Permalink
Hi David, Thomas,
Post by Artem Bityutskiy
Post by Alexey Korolev
Hi
Here is a patch providing partial page read functionality for NAND
devices.
In many cases it gives performacne boost. I've added this feature
enabling under chip->options flag.
Setting NAND_PART_READ option in board driver will enable this feature.
Hamish, this stuff should certainly help you. You could give it a try.
Unfortunately Alexey gave up almost straight away and did not try to
push his work harder.The arguments against the patch were weak, and
addressable. This work needs some more efforts and it may get merged.
Moreover, Alexey came with impressive numbers, and it is difficult to
argue against. Would you give it a try? If it gives performance benefits
for you, I think we could raise this again.
http://lists.infradead.org/pipermail/linux-mtd/2007-December/020105.html
Could we please take a look at this work:
http://lists.infradead.org/pipermail/linux-mtd/2007-December/020105.html

and identify what should be changed/done to have it merged. Alexey shows
impressive numbers there. Also, Hamish(*) shows that UBI attach time drops
from about 5.6 seconds to about 2.5 seconds which is a very substantial
growth.

Thanks!

*) http://lists.infradead.org/pipermail/linux-mtd/2008-April/021413.html
--
Best regards,
Artem Bityutskiy (???????? ?????)
Hamish Moffatt
2008-04-24 07:45:26 UTC
Permalink
Post by Artem Bityutskiy
Post by Alexey Korolev
Hi
Here is a patch providing partial page read functionality for NAND
devices.
In many cases it gives performacne boost. I've added this feature
enabling under chip->options flag.
Setting NAND_PART_READ option in board driver will enable this feature.
Hamish, this stuff should certainly help you. You could give it a try.
Unfortunately Alexey gave up almost straight away and did not try to
push his work harder.The arguments against the patch were weak, and
addressable. This work needs some more efforts and it may get merged.
Moreover, Alexey came with impressive numbers, and it is difficult to
argue against. Would you give it a try? If it gives performance benefits
for you, I think we could raise this again.
http://lists.infradead.org/pipermail/linux-mtd/2007-December/020105.html
I reviewed the discussion.

Firstly it was suggested to talk about sub-page reads rather than
partial reads. I don't think this is quite correct. Unlike writes, the
chip does not care which part of the page you read - you can skip to any
column address within the page. In nand_base it is in 256-byte
increments because that is the software-ECC step size.

The rest of the discussion was about improved use of sub-page writes.
I'm sure that would be useful but it's not strictly relevant to the
partial read patch.
Post by Artem Bityutskiy
Setting NAND_PART_READ option in board driver will enable this
feature.
which is not true, as nand_base.c masks out chip-options (like this)
inherited from the board driver. It must be set for the chip in
nand_ids.c instead.

Artem, were you able to try the patch on the OLPC with hardware ECC, as
you hinted in the original discussion?

Also I reverted your recent patch to combine the EC + VID header reads
in the UBI scan for this test. I think with partial reads it may make
things slower, because it will force 2K to be read (or 576 bytes if the
ALIGN() is removed, still more than 64 + 64). Because min_io_size = 2K
is not really accurate I guess.

Hamish
--
Hamish Moffatt VK3SB <hamish at debian.org> <hamish at cloud.net.au>
Artem Bityutskiy
2008-04-24 09:53:59 UTC
Permalink
Post by Hamish Moffatt
The rest of the discussion was about improved use of sub-page writes.
I'm sure that would be useful but it's not strictly relevant to the
partial read patch.
Yup.
Post by Hamish Moffatt
Artem, were you able to try the patch on the OLPC with hardware ECC, as
you hinted in the original discussion?
Right, I completely forgot to do this. I've just verified - OLPC works
just fine with this patch. It also does not affect our OneNAND setup.
Post by Hamish Moffatt
Also I reverted your recent patch to combine the EC + VID header reads
in the UBI scan for this test. I think with partial reads it may make
things slower, because it will force 2K to be read (or 576 bytes if the
ALIGN() is removed, still more than 64 + 64). Because min_io_size = 2K
is not really accurate I guess.
I deliberately did that. I though making one call instead of 2 is
better, because MTD anyway reads whole page. And my other motivation was
that flashes like OneNAND which may read 2 consecutive pages faster
(kind of "burst" read). Although for OneNAND this trick does not really
work. Anyway, lets forget about that patch.

I think if Alexey is not any more interested in working on his patch,
you could do this, if you are interested. I am going to be away till the
5 of May, so I won't be able to help.
--
Best regards,
Artem Bityutskiy (???????? ?????)
Alexey Korolev
2008-04-24 10:25:01 UTC
Permalink
Hi,

Oh I'm glad to hear that this patch is needful. In fact I did
not drop it. We did some more investigations about integration subpage
read in background. We tried to remove some drawback caused by
invalidation of prestored page in do_read_ops function. (We have found
that on large read requests nand subpage read causes ~10% of read performance degradation. I
wanted to remove but changes made code ugly and saved only 5%)

Also we investigated possibility to make more universal solution which includes both sub-page
read and partial page write. As it was discussed in December. In fact the partial page write did not
give any benefits at all as file system operates write_size. If we reduce writesize JFFS2 will write data by
small chunks only and it will cause strong performance degradation. (it
brings more cons than pros until serious hacks in JFFS2 will be
implemented)
Post by Hamish Moffatt
Post by Artem Bityutskiy
Post by Alexey Korolev
Hi
Here is a patch providing partial page read functionality for NAND
devices.
In many cases it gives performacne boost. I've added this feature
enabling under chip->options flag.
Setting NAND_PART_READ option in board driver will enable this feature.
Hamish, this stuff should certainly help you. You could give it a try.
Unfortunately Alexey gave up almost straight away and did not try to
push his work harder.The arguments against the patch were weak, and
addressable. This work needs some more efforts and it may get merged.
Moreover, Alexey came with impressive numbers, and it is difficult to
argue against. Would you give it a try? If it gives performance benefits
for you, I think we could raise this again.
http://lists.infradead.org/pipermail/linux-mtd/2007-December/020105.html
I reviewed the discussion.
Firstly it was suggested to talk about sub-page reads rather than
partial reads. I don't think this is quite correct. Unlike writes, the
chip does not care which part of the page you read - you can skip to any
column address within the page. In nand_base it is in 256-byte
increments because that is the software-ECC step size.
Agree about it.
Post by Hamish Moffatt
The rest of the discussion was about improved use of sub-page writes.
I'm sure that would be useful but it's not strictly relevant to the
partial read patch.
Post by Artem Bityutskiy
Setting NAND_PART_READ option in board driver will enable this feature.
which is not true, as nand_base.c masks out chip-options (like this)
inherited from the board driver. It must be set for the chip in
nand_ids.c instead.
It works fine, you need to setup this option after nand_scan.
Using it in NAND_IDS is bad idea because some people may want disable it you can indirectly identify if chip
supports subpage read from NAND capabilities records.y
Post by Hamish Moffatt
Artem, were you able to try the patch on the OLPC with hardware ECC, as
you hinted in the original discussion?
Also I reverted your recent patch to combine the EC + VID header reads
in the UBI scan for this test. I think with partial reads it may make
things slower, because it will force 2K to be read (or 576 bytes if the
ALIGN() is removed, still more than 64 + 64). Because min_io_size = 2K
is not really accurate I guess.
So if there is an interest in subpage read let resume the topic. In next couple days I
will send updated patch with subpage read functionality.

Thanks,
Alexey
Artem Bityutskiy
2008-04-24 10:45:53 UTC
Permalink
Hi,
Post by Alexey Korolev
Oh I'm glad to hear that this patch is needful. In fact I did
not drop it. We did some more investigations about integration subpage
read in background. We tried to remove some drawback caused by
invalidation of prestored page in do_read_ops function. (We have found
that on large read requests nand subpage read causes ~10% of read performance degradation. I
wanted to remove but changes made code ugly and saved only 5%)
Err, are you referring this chip->pagebuf cache? Have you considered
making it more complex and keep track of individual sub-pages within the
pagebuf? You say it introduces a lot of ugliness? May be you could
somehow separate this ugliness out nicely somehow?
Post by Alexey Korolev
Also we investigated possibility to make more universal solution which includes both sub-page
read and partial page write. As it was discussed in December. In fact the partial page write did not
give any benefits at all as file system operates write_size. If we reduce writesize JFFS2 will write data by
small chunks only and it will cause strong performance degradation. (it
brings more cons than pros until serious hacks in JFFS2 will be
implemented)
Well, in any case, if you have something for writes, this should be a
separate story.
--
Best regards,
Artem Bityutskiy (???????? ?????)
Alexey Korolev
2008-04-24 10:57:11 UTC
Permalink
Hi,
Post by Artem Bityutskiy
Post by Alexey Korolev
Oh I'm glad to hear that this patch is needful. In fact I did
not drop it. We did some more investigations about integration subpage
read in background. We tried to remove some drawback caused by
invalidation of prestored page in do_read_ops function. (We have found
that on large read requests nand subpage read causes ~10% of read performance degradation. I
wanted to remove but changes made code ugly and saved only 5%)
Err, are you referring this chip->pagebuf cache? Have you considered
making it more complex and keep track of individual sub-pages within the
pagebuf? You say it introduces a lot of ugliness? May be you could
somehow separate this ugliness out nicely somehow?
Yes. I mean chip->pagebuf cache. From point of performacne the best
would be traking sub_pages within the pagebuf. But it is too complex and
brings ugliness. Sure I will definetely separate it.
Post by Artem Bityutskiy
Post by Alexey Korolev
Also we investigated possibility to make more universal solution which includes both sub-page
read and partial page write. As it was discussed in December. In fact the partial page write did not
give any benefits at all as file system operates write_size. If we reduce writesize JFFS2 will write data by
small chunks only and it will cause strong performance degradation. (it
brings more cons than pros until serious hacks in JFFS2 will be
implemented)
Well, in any case, if you have something for writes, this should be a
separate story.
Hamish Moffatt
2008-04-24 14:04:47 UTC
Permalink
Hi,

Thanks for your patch Alexey.
Post by Alexey Korolev
Oh I'm glad to hear that this patch is needful. In fact I did
not drop it. We did some more investigations about integration subpage
read in background. We tried to remove some drawback caused by
invalidation of prestored page in do_read_ops function. (We have found
that on large read requests nand subpage read causes ~10% of read performance degradation. I
wanted to remove but changes made code ugly and saved only 5%)
Do you know why this is? If you are reading a whole page then the
partial read code is equivalent to the read full page code as far as I
can tell.
Post by Alexey Korolev
Post by Hamish Moffatt
Post by Alexey Korolev
Setting NAND_PART_READ option in board driver will enable this feature.
which is not true, as nand_base.c masks out chip-options (like this)
inherited from the board driver. It must be set for the chip in
nand_ids.c instead.
It works fine, you need to setup this option after nand_scan.
Using it in NAND_IDS is bad idea because some people may want disable it you can indirectly identify if chip
supports subpage read from NAND capabilities records.y
Oh, I see. I think it is unusual to modify the chip->options after
calling nand_scan? I am using plat_nand.c and it doesn't have any way to
do this.

Maybe NAND_PART_READ should be excluded from NAND_CHIPOPTIONS_MSK. But
you still have the case where the chip can't do it even if the board
wants to. Maybe the chip should declare it's capable and the board
should be able to turn it off if required.

Mind you, other parts of nand_base use the RNDOUT function already so I
don't think there is any case where you *wouldn't* enable partial reads.
Post by Alexey Korolev
So if there is an interest in subpage read let resume the topic. In next couple days I
will send updated patch with subpage read functionality.
Thanks. Did you find any actual bugs with the previous patch, or is it
safe for me to use it?

Hamish
--
Hamish Moffatt VK3SB <hamish at debian.org> <hamish at cloud.net.au>
Alexey Korolev
2008-04-24 14:48:23 UTC
Permalink
Hamish,
Post by Hamish Moffatt
Thanks for your patch Alexey.
Post by Alexey Korolev
Oh I'm glad to hear that this patch is needful. In fact I did
not drop it. We did some more investigations about integration subpage
read in background. We tried to remove some drawback caused by
invalidation of prestored page in do_read_ops function. (We have found
that on large read requests nand subpage read causes ~10% of read performance degradation. I
wanted to remove but changes made code ugly and saved only 5%)
Do you know why this is? If you are reading a whole page then the
partial read code is equivalent to the read full page code as far as I
can tell.
Yes. Sometimes it is better to read one page and place it into a cache
than reading part of it several times.
The weak places of original implementation
1. we read whole OOB each time.
2. Many subpage requests occures sequentally we often have to reread 256bytes
fragment.

As result sequential read of large contineous chunks is 10% worse.
I did not found any other drawbacks of this implementation.
Post by Hamish Moffatt
Post by Alexey Korolev
Post by Hamish Moffatt
Post by Alexey Korolev
Setting NAND_PART_READ option in board driver will enable this feature.
which is not true, as nand_base.c masks out chip-options (like this)
inherited from the board driver. It must be set for the chip in
nand_ids.c instead.
It works fine, you need to setup this option after nand_scan.
Using it in NAND_IDS is bad idea because some people may want disable it you can indirectly identify if chip
supports subpage read from NAND capabilities records.y
Oh, I see. I think it is unusual to modify the chip->options after
calling nand_scan? I am using plat_nand.c and it doesn't have any way to
do this.
Maybe NAND_PART_READ should be excluded from NAND_CHIPOPTIONS_MSK. But
you still have the case where the chip can't do it even if the board
wants to. Maybe the chip should declare it's capable and the board
should be able to turn it off if required.
Mind you, other parts of nand_base use the RNDOUT function already so I
don't think there is any case where you *wouldn't* enable partial reads.
Post by Alexey Korolev
So if there is an interest in subpage read let resume the topic. In next couple days I
will send updated patch with subpage read functionality.
Thanks. Did you find any actual bugs with the previous patch, or is it
safe for me to use it?
Yes. It is safe. We passed excessive tests before I sent it
out.

Thanks,
Alexey

Continue reading on narkive:
Loading...