Discussion:
[PATCH RFC] UBIFS: Fix assert failed in ubifs_set_page_dirty
hujianyang
2014-04-26 09:25:42 UTC
Permalink
Hi,

I meet some assert failed as Laurence Withers reported in February.

show like this:

[69440.577932] UBIFS assert failed in ubifs_set_page_dirty at 1421 (pid 2067)
[69440.658567] UBIFS assert failed in ubifs_writepage at 1009 (pid 2069)
[69440.735605] UBIFS assert failed in do_writepage at 936 (pid 2069)
[69440.735881] UBIFS assert failed in ubifs_release_budget at 567 (pid 2069)
[69440.736360] UBIFS assert failed in ubifs_release_budget at 567 (pid 2069)
[69441.581541] UBIFS assert failed in ubifs_budget_space at 464 (pid 2070)
[69441.659118] UBIFS assert failed in ubifs_release_budget at 567 (pid 2072)
[69441.740405] UBIFS assert failed in ubifs_set_page_dirty at 1421 (pid 2070)
[69441.822369] UBIFS assert failed in ubifs_writepage at 1009 (pid 2072)
[69441.899574] UBIFS assert failed in do_writepage at 936 (pid 2072)
[69441.899853] UBIFS assert failed in ubifs_release_budget at 567 (pid 2072)
[69450.677679] UBIFS assert failed in ubifs_release_budget at 567 (pid 6)
[69455.757670] UBIFS assert failed in ubifs_release_budget at 567 (pid 6)
[69458.147075] UBIFS assert failed in ubifs_put_super at 1776 (pid 2077)

After communicating with Laurence, I found this assert failed can
be easily reproduced by running mmap(PROT_WRITE, MAP_SHARED) and
fsync with same file at same time.

I think there is a race in __do_fault and ubifs_writepage.

We do ->page_mkwrite in __do_fault, perform space budget and set
PagePrivate, then execute __set_page_dirty_nobuffers to make the
page dirty. But at the end of ubifs_vm_page_mkwrite, we release
page lock.

At the same time, fsync process may hold the lock and do ->writepage
as page is set to dirty in ->page_mkwrite. We will clear page_dirty,
clear page_private and release budget here. In the end, unlock page.

Mmap then get the lock and perform ->set_page_dirty. We will meet
the first assert failed here because dirty bit is clear by fsync.

"UBIFS assert failed in ubifs_set_page_dirty at 1421"

static int ubifs_set_page_dirty(struct page *page)
{
int ret;

ret = __set_page_dirty_nobuffers(page); /* Here reset dirty bit */
/*
* An attempt to dirty a page without budgeting for it - should not
* happen.
*/
ubifs_assert(ret == 0);
return ret;
}

With this assert failed, ->set_page_dirty will reset the dirty bit
without space budget and SetPagePrivate. When we want to writeback
this page, we will meet PagePrivate assert failed.

"UBIFS assert failed in ubifs_writepage at 1009 (pid 2069)
UBIFS assert failed in do_writepage at 936 (pid 2069)"

Then, release budget without budgeting and meet budget assert failed.

"UBIFS assert failed in ubifs_release_budget at 567
UBIFS assert failed in ubifs_budget_space at 464"

c->bi.dd_growth is less than zero now.

I want to fix this problem but I don't have enough knowledge about
this filesystem. In my fix, I remove __set_page_dirty_nobuffers
in ->page_mkwrite and just set dirty bit in ->set_page_dirty.

I don't know if my fix works and causes no other problems. I can only
test it with linux-3.10 and it seems not bad.

-------------------------------------------------------------------
hujianyang
2014-04-30 06:06:06 UTC
Permalink
Hi, all

Basing on the perious mail, I would like to show a clear figure
about the race I have found.

Thread A (mmap) Thread B (fsync)

->__do_fault ->write_cache_pages
-> ubifs_page_mkwrite
-> budget_space
-> lock_page
-> release/convert_page_budget
-> SetPagePrivate
-> TestSetPageDirty
-> unlock_page
-> lock_page
-> TestClearPageDirty
-> ubifs_writepage
-> do_writepage
-> release_budget
-> ClearPagePrivate
-> unlock_page
-> !(ret & VM_FAULT_LOCKED)
-> lock_page
-> set_page_dirty
-> ubifs_set_page_dirty
-> TestSetPageDirty (set page dirty without budgeting)
-> unlock_page

According to this situation, my v2 fix returns from page_mkwrite
without performing unlock_page. We return VM_FAULT_LOCKED instead
of just return 0. After doing this, the race above will not happen.


Signed-off-by: hujianyang <hujianyang at huawei.com>
---
fs/ubifs/file.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index 4f34dba..f7d48a0 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1525,8 +1525,7 @@ static int ubifs_vm_page_mkwrite(struct vm_area_struct *vma,
}

wait_for_stable_page(page);
- unlock_page(page);
- return 0;
+ return VM_FAULT_LOCKED;

out_unlock:
unlock_page(page);
--
1.8.5.5
Laurence Withers
2014-04-30 12:18:46 UTC
Permalink
Post by hujianyang
According to this situation, my v2 fix returns from page_mkwrite
without performing unlock_page. We return VM_FAULT_LOCKED instead
of just return 0. After doing this, the race above will not happen.
Hi,

I have been testing patch v1 on our hardware. Normally, at higher data rates,
we can hit the assert in under 8 hours. So far I have been running for 48
hours on one system without seeing the assert message.

I will now switch to running v2 and see if I can't find a few more systems to
run in parallel.

Many thanks, and bye for now,
--
Laurence Withers, <lwithers at guralp.com> http://www.guralp.com/
Dolev Raviv
2014-04-30 13:48:51 UTC
Permalink
Hi Hujianyang and Laurence,
I'm hitting an assertion in very similar code during the callback
ubifs_releasepage(). I'm quite new to this code area, and currently diving
in to understanding the race you described (before I attempt to look at
the patch).

I was wondering if this assertion is related?

[ 862.695278] UBIFS assert failed in ubifs_releasepage at 1434 (pid 11088)
[ 862.700952] CPU: 0 PID: 11088 Comm: busybox Tainted: G W
3.10.0+ #1
[ 862.714460] [<c00147f4>] (unwind_backtrace+0x0/0x11c) from [<c0011bc0>]
(show_stack+0x10/0x14)
[ 862.722047] [<c0011bc0>] (show_stack+0x10/0x14) from [<c0187fe8>]
(ubifs_releasepage+0x70/0xb8)
[ 862.746573] [<c0187fe8>] (ubifs_releasepage+0x70/0xb8) from
[<c00c7228>] (try_to_release_page+0x44/0x5c)
[ 862.755865] [<c00c7228>] (try_to_release_page+0x44/0x5c) from
[<c00d3b58>] (invalidate_inode_page+0x60/0x94)
[ 862.764955] vbat_low_handler: vbat low
[ 862.768728] [<c00d3b58>] (invalidate_inode_page+0x60/0x94) from
[<c00d3c00>] (invalidate_mapping_pages+0x74/0x114)
[ 862.779244] [<c00d3c00>] (invalidate_mapping_pages+0x74/0x114) from
[<c0145004>] (drop_pagecache_sb+0x7c/0xbc)
[ 862.789079] [<c0145004>] (drop_pagecache_sb+0x7c/0xbc) from
[<c0104ab4>] (iterate_supers+0x74/0xc8)
[ 862.798092] [<c0104ab4>] (iterate_supers+0x74/0xc8) from [<c0145088>]
(drop_caches_sysctl_handler+0x44/0x90)
[ 862.808907] [<c0145088>] (drop_caches_sysctl_handler+0x44/0x90) from
[<c014f280>] (proc_sys_call_handler+0x84/0xa0)
[ 862.821868] [<c014f280>] (proc_sys_call_handler+0x84/0xa0) from
[<c014f2ac>] (proc_sys_write+0x10/0x14)
[ 862.830349] [<c014f2ac>] (proc_sys_write+0x10/0x14) from [<c01022fc>]
(vfs_write+0xd4/0x16c)
[ 862.838719] [<c01022fc>] (vfs_write+0xd4/0x16c) from [<c010263c>]
(SyS_write+0x3c/0x60)
[ 862.846738] [<c010263c>] (SyS_write+0x3c/0x60) from [<c000e440>]
(ret_fast_syscall+0x0/0x30)
Post by Laurence Withers
Post by hujianyang
According to this situation, my v2 fix returns from page_mkwrite
without performing unlock_page. We return VM_FAULT_LOCKED instead
of just return 0. After doing this, the race above will not happen.
Hi,
I have been testing patch v1 on our hardware. Normally, at higher data rates,
we can hit the assert in under 8 hours. So far I have been running for 48
hours on one system without seeing the assert message.
I will now switch to running v2 and see if I can't find a few more systems to
run in parallel.
Many thanks, and bye for now,
--
Laurence Withers, <lwithers at guralp.com>
http://www.guralp.com/
______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/
--
QUALCOMM ISRAEL, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
hujianyang
2014-05-04 06:38:35 UTC
Permalink
Hi, Dolev
Post by Dolev Raviv
Hi Hujianyang and Laurence,
I'm hitting an assertion in very similar code during the callback
ubifs_releasepage(). I'm quite new to this code area, and currently diving
in to understanding the race you described (before I attempt to look at
the patch).
I was wondering if this assertion is related?
[ 862.695278] UBIFS assert failed in ubifs_releasepage at 1434 (pid 11088)
What's your kernel version? Because in v3.15 and v3.10, line 1434
present different assertion.

I think you probably hit:

"ubifs_assert(0)"

static int ubifs_releasepage(struct page *page, gfp_t unused_gfp_flags)
{
/*
* An attempt to release a dirty page without budgeting for it - should
* not happen.
*/
if (PageWriteback(page))
return 0;
ubifs_assert(PagePrivate(page));
ubifs_assert(0); <- v3.10 line 1434
ClearPagePrivate(page);
ClearPageChecked(page);
return 1;
}


Page_Private bit seems to indicate page is budgeted in UBIFS. Dropping
caches performs with page lock and just deals with pages which are not
dirty. Then, if page_private bit is set, vfs performs ->releasepage at
try_to_release_page.

Maybe some explanation of this assert failed is that there is a private
(budgeted) but not dirty page in your page caches.

I think it is not related to my race because pages can just remain
in 3 states at fsync and mmap:

1) Dirty, Private
2) Not Dirty, Not Private
3) Dirty, Not Private (wrong condition)

Did you get some more assert failed after this? If you umount UBIFS,
ubifs_put_super will check the budget info, and I think you should
hit some assertion there.

Do you have a simply way to reproduce this assert failed?


Hu
Dolev Raviv
2014-05-04 11:54:03 UTC
Permalink
Hi Hu, Thanks allot for your help.
Post by hujianyang
Hi, Dolev
Post by Dolev Raviv
Hi Hujianyang and Laurence,
I'm hitting an assertion in very similar code during the callback
ubifs_releasepage(). I'm quite new to this code area, and currently diving
in to understanding the race you described (before I attempt to look at
the patch).
I was wondering if this assertion is related?
[ 862.695278] UBIFS assert failed in ubifs_releasepage at 1434 (pid 11088)
What's your kernel version? Because in v3.15 and v3.10, line 1434
present different assertion.
You are right the version is v3.10 and I do hit the
"ubifs_assert(0)"
Post by hujianyang
Page_Private bit seems to indicate page is budgeted in UBIFS. Dropping
caches performs with page lock and just deals with pages which are not
dirty. Then, if page_private bit is set, vfs performs ->releasepage at
try_to_release_page.
Maybe some explanation of this assert failed is that there is a private
(budgeted) but not dirty page in your page caches.
Wouldn't then the assertion in the previous line fail?
I dove into the code, and I understood there probably isn't any real
connection between the to.
I do appreciate you took the time to answer me :) .
Post by hujianyang
I think it is not related to my race because pages can just remain
1) Dirty, Private
2) Not Dirty, Not Private
3) Dirty, Not Private (wrong condition)
Did you get some more assert failed after this? If you umount UBIFS,
ubifs_put_super will check the budget info, and I think you should
hit some assertion there.
I did notice, at least once, assertion failure:
[ 4445.102863] UBIFS: un-mount UBI device 1, volume 0
[ 4445.106627] UBIFS assert failed in ubifs_put_super at 1775 (pid 10416)

But, this error happens only significantly later ~2 hours.
Post by hujianyang
Do you have a simply way to reproduce this assert failed?
At the moment I hit this randomly when stressing the system.
I use a stress test that run parallel IOzone and lmdd benchmark, each
100MB (out of ~217MB free)



Wouldn't then the assertion in the previous line fail?
I dove into the code, and I understood there probably isn't any real
connection between the to.
I do appreciate you took the time to answer me . :-)

Thanks,
Dolev
--
QUALCOMM ISRAEL, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
Artem Bityutskiy
2014-05-05 07:07:56 UTC
Permalink
Post by Laurence Withers
Post by hujianyang
According to this situation, my v2 fix returns from page_mkwrite
without performing unlock_page. We return VM_FAULT_LOCKED instead
of just return 0. After doing this, the race above will not happen.
Hi,
I have been testing patch v1 on our hardware. Normally, at higher data rates,
we can hit the assert in under 8 hours. So far I have been running for 48
hours on one system without seeing the assert message.
I will now switch to running v2 and see if I can't find a few more systems to
run in parallel.
Many thanks, and bye for now,
So can I add:

Tested-by: Laurence Withers <lwithers at guralp.com>

?
--
Best Regards,
Artem Bityutskiy
Laurence Withers
2014-05-05 11:50:05 UTC
Permalink
Post by Artem Bityutskiy
Tested-by: Laurence Withers <lwithers at guralp.com>
Dear Artem,

Yes, please do so. My test system is still going strong, 120 hours in.

Many thanks, and bye for now,
--
Laurence Withers, <lwithers at guralp.com> http://www.guralp.com/
Artem Bityutskiy
2014-05-05 07:02:38 UTC
Permalink
Post by hujianyang
Hi, all
Basing on the perious mail, I would like to show a clear figure
about the race I have found.
Looks correct, thanks. This probably needs to go the the stable tree.
--
Best Regards,
Artem Bityutskiy
Artem Bityutskiy
2014-05-05 07:16:17 UTC
Permalink
Post by Artem Bityutskiy
Post by hujianyang
Hi, all
Basing on the perious mail, I would like to show a clear figure
about the race I have found.
Looks correct, thanks. This probably needs to go the the stable tree.
I think the patch is correct and fixes a real problem where we end up
with a dirty page but no budget allocated for this page.

So the subject is not very good - it looks like the patch just silence a
warning, but it actually fixes a real problem.

I'll change the subject.

It looks like this patch should be merged to Linuses tree for 3.15, but
I would like to play safe and postpone it to 3.16, because I did not
test it myself, and only one person's Tested-by: is not enough to make
me confident the patch is bullet-proof.

However, I'll add "Cc: stable at vger.kernel.org", which means that once
the patch hits Linuses tree, it'll also go to older stable kernel trees
too, so all the careful users will end up having it too.

How does this sound?

And thanks for excellent analysis.
--
Best Regards,
Artem Bityutskiy
hujianyang
2014-05-05 07:30:26 UTC
Permalink
Hi Artem,
Post by Artem Bityutskiy
However, I'll add "Cc: stable at vger.kernel.org", which means that once
the patch hits Linuses tree, it'll also go to older stable kernel trees
too, so all the careful users will end up having it too.
I really need this patch goes into stable, it's enough for me. I also
think the commit message is not good and thank you for changing the
subject.

Hu
Loading...