nmqtt: fix lost wakeup in work() when enqueue occurs during inWork by centurysys · Pull Request #48 · python36/nmqtt

centurysys · 2026-02-24T08:28:18Z

work() could return early when ctx.inWork == true without scheduling a retrigger. If publish() enqueued new work during an active execution, no subsequent work() invocation was guaranteed unless triggered by ping or other events.

This resulted in queue stagnation under load.

Fix by:

introducing an isPending flag to explicitly schedule a retrigger after the current execution finishes
copying workQueue into a snapshot before iteration to avoid mutation during traversal
guarding inWork with try/finally to prevent deadlock on exception

This change ensures single-worker semantics without losing execution triggers under concurrent enqueue.

work() could return early when ctx.inWork == true without scheduling a retrigger. If publish() enqueued new work during an active execution, no subsequent work() invocation was guaranteed unless triggered by ping or other events. This resulted in queue stagnation under load. Fix by: - introducing an isPending flag to explicitly schedule a retrigger after the current execution finishes - copying workQueue into a snapshot before iteration to avoid mutation during traversal - guarding inWork with try/finally to prevent deadlock on exception This change ensures single-worker semantics without losing execution triggers under concurrent enqueue. Signed-off-by: Takeyoshi Kikuchi <kikuchi@centurysys.co.jp>

recvInto() may return fewer bytes than requested on stream sockets. Introduce recvExact() to read the full payload length and use it from recv(). While here, make Pkt a ref type so it can be safely used across async boundaries and simplify packet helper procs (no var parameter needed). Signed-off-by: Takeyoshi Kikuchi <kikuchi@centurysys.co.jp>

Pkt was refactored from object to ref object, allowing recv() to return nil on early exit paths. runRx() assumed a non-nil packet and accessed pkt.typ unconditionally. Add pkt.isNil guard to terminate RX loop safely without allocating a Notype sentinel packet. Signed-off-by: Takeyoshi Kikuchi <kikuchi@centurysys.co.jp> (cherry picked from commit f403139)

python36 · 2026-03-25T12:06:30Z

Hi @centurysys, thanks for the PR! Please take a look at my PR #51 as well, where I've addressed the asynchronous issues. Let me know what you think about it and if you have any feedback or suggestions

centurysys · 2026-03-25T12:45:05Z

Hi @python36 , thanks for the update!

I reviewed PR #51, and the approach looks good to me.
It seems quite similar to what I implemented on my side, where:

if work() is already running, new requests are queued
and a flag ensures the loop runs again afterwards

So I think this direction makes sense.

From my experience running nmqtt on a constrained embedded system (Cortex-A5 ~500MHz), I encountered a couple of other issues that had a bigger impact in practice:

TLS socket partial read leading to connection drops
→ which caused PUBACK waits to accumulate
ping worker duplication (multiple ping loops running unexpectedly)

These issues were more critical in my environment than the async scheduling itself.
This might be specific to low-performance environments, so behavior could differ from typical PC setups.
I’ll try to test your changes as well and see how they behave under my workload.

python36 · 2026-03-25T14:26:14Z

@centurysys, thanks for the feedback!
Regarding the other issues you mentioned — I’ve already seen your PR #49 and PR #50.
I will definitely review and test them thoroughly. If everything looks good, I’ll be happy to merge them. Thanks for pointing these out!

centurysys · 2026-03-25T22:17:47Z

Hi @python36 , thank you for merging the fix!

I’d like to share one additional observation from my use case, just for reference.
In my application, it was important to know when a published message was actually acknowledged by the peer (e.g. via PUBACK), so that I can mark stored data as "sent" only after it has been accepted.
To support this without breaking the existing API, I implemented an additional function like publishWithId(), which allows associating a callback with the message ID.
This might be specific to store-and-forward style applications, but I thought it could be useful to mention.

centurysys added 3 commits February 24, 2026 17:20

python36 mentioned this pull request Mar 27, 2026

Implement reliable packet receiving logic #52

Merged

centurysys closed this Mar 27, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

nmqtt: fix lost wakeup in work() when enqueue occurs during inWork#48

nmqtt: fix lost wakeup in work() when enqueue occurs during inWork#48
centurysys wants to merge 3 commits intopython36:masterfrom
centurysys:fix/nmqtt-lost-wakeup

centurysys commented Feb 24, 2026

Uh oh!

python36 commented Mar 25, 2026

Uh oh!

centurysys commented Mar 25, 2026

Uh oh!

python36 commented Mar 25, 2026

Uh oh!

centurysys commented Mar 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

centurysys commented Feb 24, 2026

Uh oh!

python36 commented Mar 25, 2026

Uh oh!

centurysys commented Mar 25, 2026

Uh oh!

python36 commented Mar 25, 2026

Uh oh!

centurysys commented Mar 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants