| CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
| In the Linux kernel, the following vulnerability has been resolved:
media: dvb-usb: m920x: Fix a potential memory leak in m920x_i2c_xfer()
'read' is freed when it is known to be NULL, but not when a read error
occurs.
Revert the logic to avoid a small leak, should a m920x_read() call fail. |
| In the Linux kernel, the following vulnerability has been resolved:
mtd: spi-nor: Fix shift-out-of-bounds in spi_nor_set_erase_type
spi_nor_set_erase_type() was used either to set or to mask out an erase
type. When we used it to mask out an erase type a shift-out-of-bounds
was hit:
UBSAN: shift-out-of-bounds in drivers/mtd/spi-nor/core.c:2237:24
shift exponent 4294967295 is too large for 32-bit type 'int'
The setting of the size_{shift, mask} and of the opcode are unnecessary
when the erase size is zero, as throughout the code just the erase size
is considered to determine whether an erase type is supported or not.
Setting the opcode to 0xFF was wrong too as nobody guarantees that 0xFF
is an unused opcode. Thus when masking out an erase type, just set the
erase size to zero. This will fix the shift-out-of-bounds.
[ta: refine changes, new commit message, fix compilation error] |
| In the Linux kernel, the following vulnerability has been resolved:
btrfs: zoned: fix memory leak after finding block group with super blocks
At exclude_super_stripes(), if we happen to find a block group that has
super blocks mapped to it and we are on a zoned filesystem, we error out
as this is not supposed to happen, indicating either a bug or maybe some
memory corruption for example. However we are exiting the function without
freeing the memory allocated for the logical address of the super blocks.
Fix this by freeing the logical address. |
| In the Linux kernel, the following vulnerability has been resolved:
fuse: missing copy_finish in fuse-over-io-uring argument copies
Fix a possible reference count leak of payload pages during
fuse argument copies.
[Joanne: simplified error cleanup] |
| In the Linux kernel, the following vulnerability has been resolved:
net: tls: avoid hanging tasks on the tx_lock
syzbot sent a hung task report and Eric explains that adversarial
receiver may keep RWIN at 0 for a long time, so we are not guaranteed
to make forward progress. Thread which took tx_lock and went to sleep
may not release tx_lock for hours. Use interruptible sleep where
possible and reschedule the work if it can't take the lock.
Testing: existing selftest passes |
| In the Linux kernel, the following vulnerability has been resolved:
udp_tunnel: use netdev_warn() instead of netdev_WARN()
netdev_WARN() uses WARN/WARN_ON to print a backtrace along with
file and line information. In this case, udp_tunnel_nic_register()
returning an error is just a failed operation, not a kernel bug.
udp_tunnel_nic_register() can fail due to a memory allocation
failure (kzalloc() or udp_tunnel_nic_alloc()).
This is a normal runtime error and not a kernel bug.
Replace netdev_WARN() with netdev_warn() accordingly. |
| In the Linux kernel, the following vulnerability has been resolved:
iomap: adjust read range correctly for non-block-aligned positions
iomap_adjust_read_range() assumes that the position and length passed in
are block-aligned. This is not always the case however, as shown in the
syzbot generated case for erofs. This causes too many bytes to be
skipped for uptodate blocks, which results in returning the incorrect
position and length to read in. If all the blocks are uptodate, this
underflows length and returns a position beyond the folio.
Fix the calculation to also take into account the block offset when
calculating how many bytes can be skipped for uptodate blocks. |
| In the Linux kernel, the following vulnerability has been resolved:
ALSA: ymfpci: Create card with device-managed snd_devm_card_new()
snd_card_ymfpci_remove() was removed in commit c6e6bb5eab74 ("ALSA:
ymfpci: Allocate resources with device-managed APIs"), but the call to
snd_card_new() was not replaced with snd_devm_card_new().
Since there was no longer a call to snd_card_free, unloading the module
would eventually result in Oops:
[697561.532887] BUG: unable to handle page fault for address: ffffffffc0924480
[697561.532893] #PF: supervisor read access in kernel mode
[697561.532896] #PF: error_code(0x0000) - not-present page
[697561.532899] PGD ae1e15067 P4D ae1e15067 PUD ae1e17067 PMD 11a8f5067 PTE 0
[697561.532905] Oops: 0000 [#1] PREEMPT SMP NOPTI
[697561.532909] CPU: 21 PID: 5080 Comm: wireplumber Tainted: G W OE 6.2.7 #1
[697561.532914] Hardware name: System manufacturer System Product Name/TUF GAMING X570-PLUS, BIOS 4408 10/28/2022
[697561.532916] RIP: 0010:try_module_get.part.0+0x1a/0xe0
[697561.532924] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 49 89 fc bf 01 00 00 00 e8 56 3c f8 ff <41> 83 3c 24 02 0f 84 96 00 00 00 41 8b 84 24 30 03 00 00 85 c0 0f
[697561.532927] RSP: 0018:ffffbe9b858c3bd8 EFLAGS: 00010246
[697561.532930] RAX: ffff9815d14f1900 RBX: ffff9815c14e6000 RCX: 0000000000000000
[697561.532933] RDX: 0000000000000000 RSI: ffffffffc055092c RDI: ffffffffb3778c1a
[697561.532935] RBP: ffffbe9b858c3be8 R08: 0000000000000040 R09: ffff981a1a741380
[697561.532937] R10: ffffbe9b858c3c80 R11: 00000009d56533a6 R12: ffffffffc0924480
[697561.532939] R13: ffff9823439d8500 R14: 0000000000000025 R15: ffff9815cd109f80
[697561.532942] FS: 00007f13084f1f80(0000) GS:ffff9824aef40000(0000) knlGS:0000000000000000
[697561.532945] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[697561.532947] CR2: ffffffffc0924480 CR3: 0000000145344000 CR4: 0000000000350ee0
[697561.532949] Call Trace:
[697561.532951] <TASK>
[697561.532955] try_module_get+0x13/0x30
[697561.532960] snd_ctl_open+0x61/0x1c0 [snd]
[697561.532976] snd_open+0xb4/0x1e0 [snd]
[697561.532989] chrdev_open+0xc7/0x240
[697561.532995] ? fsnotify_perm.part.0+0x6e/0x160
[697561.533000] ? __pfx_chrdev_open+0x10/0x10
[697561.533005] do_dentry_open+0x169/0x440
[697561.533009] vfs_open+0x2d/0x40
[697561.533012] path_openat+0xa9d/0x10d0
[697561.533017] ? debug_smp_processor_id+0x17/0x20
[697561.533022] ? trigger_load_balance+0x65/0x370
[697561.533026] do_filp_open+0xb2/0x160
[697561.533032] ? _raw_spin_unlock+0x19/0x40
[697561.533036] ? alloc_fd+0xa9/0x190
[697561.533040] do_sys_openat2+0x9f/0x160
[697561.533044] __x64_sys_openat+0x55/0x90
[697561.533048] do_syscall_64+0x3b/0x90
[697561.533052] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[697561.533056] RIP: 0033:0x7f1308a40db4
[697561.533059] Code: 24 20 eb 8f 66 90 44 89 54 24 0c e8 46 68 f8 ff 44 8b 54 24 0c 44 89 e2 48 89 ee 41 89 c0 bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 32 44 89 c7 89 44 24 0c e8 78 68 f8 ff 8b 44
[697561.533062] RSP: 002b:00007ffcce664450 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
[697561.533066] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f1308a40db4
[697561.533068] RDX: 0000000000080000 RSI: 00007ffcce664690 RDI: 00000000ffffff9c
[697561.533070] RBP: 00007ffcce664690 R08: 0000000000000000 R09: 0000000000000012
[697561.533072] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000080000
[697561.533074] R13: 00007f13054b069b R14: 0000565209f83200 R15: 0000000000000000
[697561.533078] </TASK> |
| In the Linux kernel, the following vulnerability has been resolved:
media: iris: Fix memory leak by freeing untracked persist buffer
One internal buffer which is allocated only once per session was not
being freed during session close because it was not being tracked as
part of internal buffer list which resulted in a memory leak.
Add the necessary logic to explicitly free the untracked internal buffer
during session close to ensure all allocated memory is released
properly. |
| In the Linux kernel, the following vulnerability has been resolved:
NFSD: NFSv4 file creation neglects setting ACL
An NFSv4 client that sets an ACL with a named principal during file
creation retrieves the ACL afterwards, and finds that it is only a
default ACL (based on the mode bits) and not the ACL that was
requested during file creation. This violates RFC 8881 section
6.4.1.3: "the ACL attribute is set as given".
The issue occurs in nfsd_create_setattr(), which calls
nfsd_attrs_valid() to determine whether to call nfsd_setattr().
However, nfsd_attrs_valid() checks only for iattr changes and
security labels, but not POSIX ACLs. When only an ACL is present,
the function returns false, nfsd_setattr() is skipped, and the
POSIX ACL is never applied to the inode.
Subsequently, when the client retrieves the ACL, the server finds
no POSIX ACL on the inode and returns one generated from the file's
mode bits rather than returning the originally-specified ACL. |
| In the Linux kernel, the following vulnerability has been resolved:
netlink: do not hard code device address lenth in fdb dumps
syzbot reports that some netdev devices do not have a six bytes
address [1]
Replace ETH_ALEN by dev->addr_len.
[1] (Case of a device where dev->addr_len = 4)
BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:114 [inline]
BUG: KMSAN: kernel-infoleak in copyout+0xb8/0x100 lib/iov_iter.c:169
instrument_copy_to_user include/linux/instrumented.h:114 [inline]
copyout+0xb8/0x100 lib/iov_iter.c:169
_copy_to_iter+0x6d8/0x1d00 lib/iov_iter.c:536
copy_to_iter include/linux/uio.h:206 [inline]
simple_copy_to_iter+0x68/0xa0 net/core/datagram.c:513
__skb_datagram_iter+0x123/0xdc0 net/core/datagram.c:419
skb_copy_datagram_iter+0x5c/0x200 net/core/datagram.c:527
skb_copy_datagram_msg include/linux/skbuff.h:3960 [inline]
netlink_recvmsg+0x4ae/0x15a0 net/netlink/af_netlink.c:1970
sock_recvmsg_nosec net/socket.c:1019 [inline]
sock_recvmsg net/socket.c:1040 [inline]
____sys_recvmsg+0x283/0x7f0 net/socket.c:2722
___sys_recvmsg+0x223/0x840 net/socket.c:2764
do_recvmmsg+0x4f9/0xfd0 net/socket.c:2858
__sys_recvmmsg net/socket.c:2937 [inline]
__do_sys_recvmmsg net/socket.c:2960 [inline]
__se_sys_recvmmsg net/socket.c:2953 [inline]
__x64_sys_recvmmsg+0x397/0x490 net/socket.c:2953
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Uninit was stored to memory at:
__nla_put lib/nlattr.c:1009 [inline]
nla_put+0x1c6/0x230 lib/nlattr.c:1067
nlmsg_populate_fdb_fill+0x2b8/0x600 net/core/rtnetlink.c:4071
nlmsg_populate_fdb net/core/rtnetlink.c:4418 [inline]
ndo_dflt_fdb_dump+0x616/0x840 net/core/rtnetlink.c:4456
rtnl_fdb_dump+0x14ff/0x1fc0 net/core/rtnetlink.c:4629
netlink_dump+0x9d1/0x1310 net/netlink/af_netlink.c:2268
netlink_recvmsg+0xc5c/0x15a0 net/netlink/af_netlink.c:1995
sock_recvmsg_nosec+0x7a/0x120 net/socket.c:1019
____sys_recvmsg+0x664/0x7f0 net/socket.c:2720
___sys_recvmsg+0x223/0x840 net/socket.c:2764
do_recvmmsg+0x4f9/0xfd0 net/socket.c:2858
__sys_recvmmsg net/socket.c:2937 [inline]
__do_sys_recvmmsg net/socket.c:2960 [inline]
__se_sys_recvmmsg net/socket.c:2953 [inline]
__x64_sys_recvmmsg+0x397/0x490 net/socket.c:2953
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Uninit was created at:
slab_post_alloc_hook+0x12d/0xb60 mm/slab.h:716
slab_alloc_node mm/slub.c:3451 [inline]
__kmem_cache_alloc_node+0x4ff/0x8b0 mm/slub.c:3490
kmalloc_trace+0x51/0x200 mm/slab_common.c:1057
kmalloc include/linux/slab.h:559 [inline]
__hw_addr_create net/core/dev_addr_lists.c:60 [inline]
__hw_addr_add_ex+0x2e5/0x9e0 net/core/dev_addr_lists.c:118
__dev_mc_add net/core/dev_addr_lists.c:867 [inline]
dev_mc_add+0x9a/0x130 net/core/dev_addr_lists.c:885
igmp6_group_added+0x267/0xbc0 net/ipv6/mcast.c:680
ipv6_mc_up+0x296/0x3b0 net/ipv6/mcast.c:2754
ipv6_mc_remap+0x1e/0x30 net/ipv6/mcast.c:2708
addrconf_type_change net/ipv6/addrconf.c:3731 [inline]
addrconf_notify+0x4d3/0x1d90 net/ipv6/addrconf.c:3699
notifier_call_chain kernel/notifier.c:93 [inline]
raw_notifier_call_chain+0xe4/0x430 kernel/notifier.c:461
call_netdevice_notifiers_info net/core/dev.c:1935 [inline]
call_netdevice_notifiers_extack net/core/dev.c:1973 [inline]
call_netdevice_notifiers+0x1ee/0x2d0 net/core/dev.c:1987
bond_enslave+0xccd/0x53f0 drivers/net/bonding/bond_main.c:1906
do_set_master net/core/rtnetlink.c:2626 [inline]
rtnl_newlink_create net/core/rtnetlink.c:3460 [inline]
__rtnl_newlink net/core/rtnetlink.c:3660 [inline]
rtnl_newlink+0x378c/0x40e0 net/core/rtnetlink.c:3673
rtnetlink_rcv_msg+0x16a6/0x1840 net/core/rtnetlink.c:6395
netlink_rcv_skb+0x371/0x650 net/netlink/af_netlink.c:2546
rtnetlink_rcv+0x34/0x40 net/core/rtnetlink.c:6413
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0xf28/0x1230 net/netlink/af_
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
iio: imu: st_lsm6dsx: fix iio_chan_spec for sensors without event detection
The st_lsm6dsx_acc_channels array of struct iio_chan_spec has a non-NULL
event_spec field, indicating support for IIO events. However, event
detection is not supported for all sensors, and if userspace tries to
configure accelerometer wakeup events on a sensor device that does not
support them (e.g. LSM6DS0), st_lsm6dsx_write_event() dereferences a NULL
pointer when trying to write to the wakeup register.
Define an additional struct iio_chan_spec array whose members have a NULL
event_spec field, and use this array instead of st_lsm6dsx_acc_channels for
sensors without event detection capability. |
| In the Linux kernel, the following vulnerability has been resolved:
iio: adc: at91-sama5d2_adc: Fix potential use-after-free in sama5d2_adc driver
at91_adc_interrupt can call at91_adc_touch_data_handler function
to start the work by schedule_work(&st->touch_st.workq).
If we remove the module which will call at91_adc_remove to
make cleanup, it will free indio_dev through iio_device_unregister but
quite a bit later. While the work mentioned above will be used. The
sequence of operations that may lead to a UAF bug is as follows:
CPU0 CPU1
| at91_adc_workq_handler
at91_adc_remove |
iio_device_unregister(indio_dev) |
//free indio_dev a bit later |
| iio_push_to_buffers(indio_dev)
| //use indio_dev
Fix it by ensuring that the work is canceled before proceeding with
the cleanup in at91_adc_remove. |
| In the Linux kernel, the following vulnerability has been resolved:
hfs: fix potential use after free in hfs_correct_next_unused_CNID()
This code calls hfs_bnode_put(node) which drops the refcount and then
dreferences "node" on the next line. It's only safe to use "node"
when we're holding a reference so flip these two lines around. |
| In the Linux kernel, the following vulnerability has been resolved:
net: netpoll: initialize work queue before error checks
Prevent a kernel warning when netconsole setup fails on devices with
IFF_DISABLE_NETPOLL flag. The warning (at kernel/workqueue.c:4242 in
__flush_work) occurs because the cleanup path tries to cancel an
uninitialized work queue.
When __netpoll_setup() encounters a device with IFF_DISABLE_NETPOLL,
it fails early and calls skb_pool_flush() for cleanup. This function
calls cancel_work_sync(&np->refill_wq), but refill_wq hasn't been
initialized yet, triggering the warning.
Move INIT_WORK() to the beginning of __netpoll_setup(), ensuring the
work queue is properly initialized before any potential failure points.
This allows the cleanup path to safely cancel the work queue regardless
of where the setup fails. |
| In the Linux kernel, the following vulnerability has been resolved:
irqchip/mchp-eic: Fix error code in mchp_eic_domain_alloc()
If irq_domain_translate_twocell() sets "hwirq" to >= MCHP_EIC_NIRQ (2) then
it results in an out of bounds access.
The code checks for invalid values, but doesn't set the error code. Return
-EINVAL in that case, instead of returning success. |
| In the Linux kernel, the following vulnerability has been resolved:
bpf: Fix reference state management for synchronous callbacks
Currently, verifier verifies callback functions (sync and async) as if
they will be executed once, (i.e. it explores execution state as if the
function was being called once). The next insn to explore is set to
start of subprog and the exit from nested frame is handled using
curframe > 0 and prepare_func_exit. In case of async callback it uses a
customized variant of push_stack simulating a kind of branch to set up
custom state and execution context for the async callback.
While this approach is simple and works when callback really will be
executed only once, it is unsafe for all of our current helpers which
are for_each style, i.e. they execute the callback multiple times.
A callback releasing acquired references of the caller may do so
multiple times, but currently verifier sees it as one call inside the
frame, which then returns to caller. Hence, it thinks it released some
reference that the cb e.g. got access through callback_ctx (register
filled inside cb from spilled typed register on stack).
Similarly, it may see that an acquire call is unpaired inside the
callback, so the caller will copy the reference state of callback and
then will have to release the register with new ref_obj_ids. But again,
the callback may execute multiple times, but the verifier will only
account for acquired references for a single symbolic execution of the
callback, which will cause leaks.
Note that for async callback case, things are different. While currently
we have bpf_timer_set_callback which only executes it once, even for
multiple executions it would be safe, as reference state is NULL and
check_reference_leak would force program to release state before
BPF_EXIT. The state is also unaffected by analysis for the caller frame.
Hence async callback is safe.
Since we want the reference state to be accessible, e.g. for pointers
loaded from stack through callback_ctx's PTR_TO_STACK, we still have to
copy caller's reference_state to callback's bpf_func_state, but we
enforce that whatever references it adds to that reference_state has
been released before it hits BPF_EXIT. This requires introducing a new
callback_ref member in the reference state to distinguish between caller
vs callee references. Hence, check_reference_leak now errors out if it
sees we are in callback_fn and we have not released callback_ref refs.
Since there can be multiple nested callbacks, like frame 0 -> cb1 -> cb2
etc. we need to also distinguish between whether this particular ref
belongs to this callback frame or parent, and only error for our own, so
we store state->frameno (which is always non-zero for callbacks).
In short, callbacks can read parent reference_state, but cannot mutate
it, to be able to use pointers acquired by the caller. They must only
undo their changes (by releasing their own acquired_refs before
BPF_EXIT) on top of caller reference_state before returning (at which
point the caller and callback state will match anyway, so no need to
copy it back to caller). |
| In the Linux kernel, the following vulnerability has been resolved:
spi: fsl-cpm: Check length parity before switching to 16 bit mode
Commit fc96ec826bce ("spi: fsl-cpm: Use 16 bit mode for large transfers
with even size") failed to make sure that the size is really even
before switching to 16 bit mode. Until recently the problem went
unnoticed because kernfs uses a pre-allocated bounce buffer of size
PAGE_SIZE for reading EEPROM.
But commit 8ad6249c51d0 ("eeprom: at25: convert to spi-mem API")
introduced an additional dynamically allocated bounce buffer whose size
is exactly the size of the transfer, leading to a buffer overrun in
the fsl-cpm driver when that size is odd.
Add the missing length parity verification and remain in 8 bit mode
when the length is not even. |
| In the Linux kernel, the following vulnerability has been resolved:
iommu/amd: Fix potential out-of-bounds read in iommu_mmio_show
In iommu_mmio_write(), it validates the user-provided offset with the
check: `iommu->dbg_mmio_offset > iommu->mmio_phys_end - 4`.
This assumes a 4-byte access. However, the corresponding
show handler, iommu_mmio_show(), uses readq() to perform an 8-byte
(64-bit) read.
If a user provides an offset equal to `mmio_phys_end - 4`, the check
passes, and will lead to a 4-byte out-of-bounds read.
Fix this by adjusting the boundary check to use sizeof(u64), which
corresponds to the size of the readq() operation. |
| In the Linux kernel, the following vulnerability has been resolved:
block: Use RCU in blk_mq_[un]quiesce_tagset() instead of set->tag_list_lock
blk_mq_{add,del}_queue_tag_set() functions add and remove queues from
tagset, the functions make sure that tagset and queues are marked as
shared when two or more queues are attached to the same tagset.
Initially a tagset starts as unshared and when the number of added
queues reaches two, blk_mq_add_queue_tag_set() marks it as shared along
with all the queues attached to it. When the number of attached queues
drops to 1 blk_mq_del_queue_tag_set() need to mark both the tagset and
the remaining queues as unshared.
Both functions need to freeze current queues in tagset before setting on
unsetting BLK_MQ_F_TAG_QUEUE_SHARED flag. While doing so, both functions
hold set->tag_list_lock mutex, which makes sense as we do not want
queues to be added or deleted in the process. This used to work fine
until commit 98d81f0df70c ("nvme: use blk_mq_[un]quiesce_tagset")
made the nvme driver quiesce tagset instead of quiscing individual
queues. blk_mq_quiesce_tagset() does the job and quiesce the queues in
set->tag_list while holding set->tag_list_lock also.
This results in deadlock between two threads with these stacktraces:
__schedule+0x47c/0xbb0
? timerqueue_add+0x66/0xb0
schedule+0x1c/0xa0
schedule_preempt_disabled+0xa/0x10
__mutex_lock.constprop.0+0x271/0x600
blk_mq_quiesce_tagset+0x25/0xc0
nvme_dev_disable+0x9c/0x250
nvme_timeout+0x1fc/0x520
blk_mq_handle_expired+0x5c/0x90
bt_iter+0x7e/0x90
blk_mq_queue_tag_busy_iter+0x27e/0x550
? __blk_mq_complete_request_remote+0x10/0x10
? __blk_mq_complete_request_remote+0x10/0x10
? __call_rcu_common.constprop.0+0x1c0/0x210
blk_mq_timeout_work+0x12d/0x170
process_one_work+0x12e/0x2d0
worker_thread+0x288/0x3a0
? rescuer_thread+0x480/0x480
kthread+0xb8/0xe0
? kthread_park+0x80/0x80
ret_from_fork+0x2d/0x50
? kthread_park+0x80/0x80
ret_from_fork_asm+0x11/0x20
__schedule+0x47c/0xbb0
? xas_find+0x161/0x1a0
schedule+0x1c/0xa0
blk_mq_freeze_queue_wait+0x3d/0x70
? destroy_sched_domains_rcu+0x30/0x30
blk_mq_update_tag_set_shared+0x44/0x80
blk_mq_exit_queue+0x141/0x150
del_gendisk+0x25a/0x2d0
nvme_ns_remove+0xc9/0x170
nvme_remove_namespaces+0xc7/0x100
nvme_remove+0x62/0x150
pci_device_remove+0x23/0x60
device_release_driver_internal+0x159/0x200
unbind_store+0x99/0xa0
kernfs_fop_write_iter+0x112/0x1e0
vfs_write+0x2b1/0x3d0
ksys_write+0x4e/0xb0
do_syscall_64+0x5b/0x160
entry_SYSCALL_64_after_hwframe+0x4b/0x53
The top stacktrace is showing nvme_timeout() called to handle nvme
command timeout. timeout handler is trying to disable the controller and
as a first step, it needs to blk_mq_quiesce_tagset() to tell blk-mq not
to call queue callback handlers. The thread is stuck waiting for
set->tag_list_lock as it tries to walk the queues in set->tag_list.
The lock is held by the second thread in the bottom stack which is
waiting for one of queues to be frozen. The queue usage counter will
drop to zero after nvme_timeout() finishes, and this will not happen
because the thread will wait for this mutex forever.
Given that [un]quiescing queue is an operation that does not need to
sleep, update blk_mq_[un]quiesce_tagset() to use RCU instead of taking
set->tag_list_lock, update blk_mq_{add,del}_queue_tag_set() to use RCU
safe list operations. Also, delete INIT_LIST_HEAD(&q->tag_set_list)
in blk_mq_del_queue_tag_set() because we can not re-initialize it while
the list is being traversed under RCU. The deleted queue will not be
added/deleted to/from a tagset and it will be freed in blk_free_queue()
after the end of RCU grace period. |