nmqtt: fix lost wakeup in work() when enqueue occurs during inWork #48
centurysys wants to merge 3 commits into python36:master
Conversation
work() could return early when ctx.inWork == true without scheduling a retrigger. If publish() enqueued new work during an active execution, no subsequent work() invocation was guaranteed unless triggered by ping or other events. This resulted in queue stagnation under load.

Fix by:
- introducing an isPending flag to explicitly schedule a retrigger after the current execution finishes
- copying workQueue into a snapshot before iteration to avoid mutation during traversal
- guarding inWork with try/finally to prevent deadlock on exception

This change ensures single-worker semantics without losing execution triggers under concurrent enqueue.

Signed-off-by: Takeyoshi Kikuchi <kikuchi@centurysys.co.jp>
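The single-worker pattern this commit describes can be sketched outside of Nim. The following Python sketch is illustrative only (names like `in_work`, `is_pending`, and `work_queue` mirror the `ctx.inWork`/`isPending`/`workQueue` fields mentioned above but are otherwise hypothetical); it shows how the pending flag turns a silently dropped trigger into a guaranteed re-run:

```python
import threading
from collections import deque

class Worker:
    """Single-worker queue with a retrigger flag (illustrative sketch)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.in_work = False      # a work() pass is currently running
        self.is_pending = False   # work arrived during the pass; rerun needed
        self.work_queue = deque()
        self.processed = []

    def publish(self, item):
        with self.lock:
            self.work_queue.append(item)
        self.work()

    def work(self):
        with self.lock:
            if self.in_work:
                # Instead of returning silently (the lost wakeup),
                # record that another pass must run after the current one.
                self.is_pending = True
                return
            self.in_work = True
        try:
            while True:
                with self.lock:
                    # Snapshot the queue so concurrent enqueues don't
                    # mutate the sequence we are iterating over.
                    snapshot = list(self.work_queue)
                    self.work_queue.clear()
                for item in snapshot:
                    self.processed.append(item)
                with self.lock:
                    if not self.is_pending:
                        break
                    self.is_pending = False
        finally:
            # try/finally guarantees in_work is cleared even if processing
            # raises, so later work() calls are never deadlocked.
            with self.lock:
                self.in_work = False
```

If `publish()` fires while a pass is active, `is_pending` is set and the running pass loops once more before clearing `in_work`, so no trigger is lost.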
recvInto() may return fewer bytes than requested on stream sockets. Introduce recvExact() to read the full payload length and use it from recv().

While here, make Pkt a ref type so it can be safely used across async boundaries and simplify packet helper procs (no var parameter needed).

Signed-off-by: Takeyoshi Kikuchi <kikuchi@centurysys.co.jp>
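The read-until-complete idea behind recvExact() is a standard stream-socket pattern; a minimal Python sketch (the helper name `recv_exact` is my own, not nmqtt API) looks like this:

```python
import socket

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket, or raise on early EOF.

    A single recv() on a stream socket may return fewer bytes than
    requested, so we loop and accumulate until the full payload arrives.
    """
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            # Peer closed the connection mid-payload.
            raise ConnectionError(f"socket closed before {n} bytes were read")
        buf.extend(chunk)
    return bytes(buf)
```

Without such a loop, a partially delivered MQTT payload would be parsed as if it were complete, corrupting the packet stream.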
Pkt was refactored from object to ref object, allowing recv() to return nil on early exit paths. runRx() assumed a non-nil packet and accessed pkt.typ unconditionally.

Add a pkt.isNil guard to terminate the RX loop safely without allocating a Notype sentinel packet.

Signed-off-by: Takeyoshi Kikuchi <kikuchi@centurysys.co.jp>
(cherry picked from commit f403139)
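The RX-loop guard can be sketched in Python, with `None` playing the role of nil and `recv_packet` as a hypothetical stand-in for the MQTT recv() proc (this is a pattern sketch, not nmqtt's actual code):

```python
def run_rx(recv_packet):
    """Drain packets until recv_packet() signals shutdown with None.

    Mirrors the runRx() fix: once the packet is a ref type, recv() can
    return nil on early-exit paths (connection closed, parse failure),
    so the loop must check for that before touching packet fields.
    """
    handled = []
    while True:
        pkt = recv_packet()
        if pkt is None:
            # Early-exit path: terminate the RX loop instead of
            # dereferencing a nil packet.
            break
        handled.append(pkt["typ"])
    return handled
```

The guard avoids both the crash and the alternative of allocating a throwaway sentinel packet just to carry a "no type" marker.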
Hi @centurysys, thanks for the PR! Please take a look at my PR #51 as well, where I've addressed the asynchronous issues. Let me know what you think about it and if you have any feedback or suggestions.
Hi @python36 , thanks for the update! I reviewed PR #51, and the approach looks good to me.
So I think this direction makes sense. From my experience running nmqtt on a constrained embedded system (Cortex-A5, ~500 MHz), I encountered a couple of other issues that had a bigger impact in practice, which the commits in this PR address.

These issues were more critical in my environment than the async scheduling itself.
@centurysys, thanks for the feedback!
Hi @python36 , thank you for merging the fix! I'd like to share one additional observation from my use case, just for reference.