Path CAS Concurrency Checklist
Purpose
Validate that VDA path cache updates are atomic under concurrent SendNextSegment and net-action updates.
Scope
Applies to:
- `src/Rcs.Infrastructure/Services/Protocol/Vda5050ProtocolService.cs`
- Redis keys:
  - `rcs:vdaPath:{robotId}:{taskId}:{subTaskId}`
  - `rcs:vdaPath:{robotId}:{taskId}:{subTaskId}:planVersion`
Core Assertions
- `planVersion` must monotonically increase.
- Cache JSON and `:planVersion` key must stay consistent (same version).
- Concurrent writers must not silently overwrite newer data.
- On conflict, fallback retry should recover in most cases.
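
The first two assertions can be checked mechanically against a series of observations. The sketch below is illustrative only: `Snapshot` and `check_invariants` are hypothetical names, and it assumes each observation records the version embedded in the cache JSON alongside the value of the `:planVersion` key.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Snapshot:
    """One observation of both Redis keys (hypothetical shape)."""
    json_version: int  # version embedded in the cache JSON
    key_version: int   # value read from the :planVersion key


def check_invariants(history: List[Snapshot]) -> List[str]:
    """Return human-readable violations; an empty list means healthy."""
    violations = []
    previous = None
    for i, snap in enumerate(history):
        # Cache JSON and :planVersion must carry the same version.
        if snap.json_version != snap.key_version:
            violations.append(
                f"snapshot {i}: JSON version {snap.json_version} "
                f"!= key version {snap.key_version}"
            )
        # planVersion must never go backwards between observations.
        if previous is not None and snap.key_version < previous:
            violations.append(
                f"snapshot {i}: planVersion went backwards "
                f"({previous} -> {snap.key_version})"
            )
        previous = snap.key_version
    return violations
```

Repeated observations of the same version are treated as healthy here, since a poller may sample faster than writers commit.
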
Case A: Dual SendNextSegment race
- Trigger two `SendNextSegmentAsync` requests for the same robot/task/subTask almost simultaneously.
- Expected:
  - one writer wins the direct CAS;
  - the other writer either:
    - fails fast and returns a concurrency-wait response, or
    - succeeds via fallback CAS after reading the latest version.
- Check logs:
  - `VDA5050 - 路径状态并发更新` ("path state concurrent update")
  - `[PathCAS] ... Conflict=... FallbackSuccess=...`
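
The expected outcome of Case A can be rehearsed with a deterministic in-memory model. This is a sketch under stated assumptions, not the service's implementation: `FakeStore`, `write_with_fallback`, the `sent` field, and the IDs in the key are hypothetical, and the check-and-set here is plain Python, whereas against a real Redis it would need to run atomically. The counter names mirror the `[PathCAS]` log fields.

```python
import json
from collections import Counter
from typing import Callable, Tuple


class FakeStore:
    """In-memory stand-in for the cache JSON key and its :planVersion key."""

    def __init__(self) -> None:
        self.data = {}

    def load(self, key: str) -> Tuple[dict, int]:
        state = json.loads(self.data.get(key, "{}"))
        version = int(self.data.get(key + ":planVersion", "0"))
        return state, version

    def cas_write(self, key: str, expected: int, state: dict) -> bool:
        """Write both keys together, but only if the version is unchanged."""
        _, current = self.load(key)
        if current != expected:
            return False
        new_version = expected + 1
        self.data[key] = json.dumps({**state, "planVersion": new_version})
        self.data[key + ":planVersion"] = str(new_version)
        return True


def write_with_fallback(store: FakeStore, key: str, loaded: Tuple[dict, int],
                        update: Callable[[dict], dict], metrics: Counter) -> bool:
    state, version = loaded
    metrics["DirectAttempt"] += 1
    if store.cas_write(key, version, update(state)):
        metrics["DirectSuccess"] += 1
        return True
    # Conflict: re-read the latest state and reapply the change on top of it.
    metrics["Conflict"] += 1
    metrics["FallbackAttempt"] += 1
    state, version = store.load(key)
    if store.cas_write(key, version, update(state)):
        metrics["FallbackSuccess"] += 1
        return True
    metrics["Failure"] += 1
    return False


# Both writers read version 0 before either one writes.
store = FakeStore()
key = "rcs:vdaPath:r1:t1:s1"  # hypothetical IDs
metrics = Counter()
seen_by_a = store.load(key)
seen_by_b = store.load(key)
ok_a = write_with_fallback(
    store, key, seen_by_a,
    lambda s: {**s, "sent": s.get("sent", []) + ["A"]}, metrics)
ok_b = write_with_fallback(
    store, key, seen_by_b,
    lambda s: {**s, "sent": s.get("sent", []) + ["B"]}, metrics)
final_state, final_version = store.load(key)
```

In this model writer A wins the direct CAS and writer B recovers through the fallback path, so neither update is lost. Passing the change as a function rather than a finished payload is what lets the fallback reapply it to the newer state instead of overwriting it.
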
Case B: SendNextSegment + NetAction state race
- While `CheckAndExecuteNetActionsAsync` writes `Executing`/`WaitingAsyncResponse`, concurrently trigger `SendNextSegmentAsync`.
- Expected:
  - no stale overwrite on `NetActionStatus`;
  - final cache contains latest route index and expected net-action state.
Case C: Hold/Patch conflict with queue update
- Force reverse conflict and hold transitions repeatedly.
- During hold, trigger patch decision (trim or detour) in parallel.
- Expected:
  - route mode transitions are serialized by CAS;
  - no missing `HoldNodeCode`/`HoldUntilUtc`;
  - `LastDecisionCode` remains one of latest successful writes.
Case D: PlanVersion key missing recovery
- Manually delete only `:planVersion` key, keep cache JSON.
- Trigger `SendNextSegmentAsync`.
- Expected:
  - `EnsurePathPlanVersionKeyAsync` recreates version key from loaded cache version;
  - subsequent CAS writes proceed.
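
The recovery step in Case D might look roughly like the following. It is a minimal sketch, not the body of `EnsurePathPlanVersionKeyAsync`: the function name `ensure_plan_version_key`, the plain dict standing in for Redis, and the assumption that the cache JSON carries its version in a `planVersion` field are all illustrative.

```python
import json
from typing import Dict, Optional


def ensure_plan_version_key(data: Dict[str, str], cache_key: str) -> Optional[int]:
    """Recreate the :planVersion key from the cache JSON if it is missing.

    Returns the version now stored under the key, or None when there is
    no cache JSON to recover from.
    """
    version_key = cache_key + ":planVersion"
    if version_key in data:
        # Key already present: leave it untouched.
        return int(data[version_key])
    raw = data.get(cache_key)
    if raw is None:
        return None
    # Assumed field name; the real cache schema may differ.
    version = int(json.loads(raw).get("planVersion", 0))
    data[version_key] = str(version)
    return version
```

Against a real Redis, the recreate step would presumably need to be a set-if-absent operation (for example `SET` with the `NX` option) so that it cannot clobber a version key written by a concurrent writer in the meantime.
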
Operational Signals
Track periodic CAS metric log:
[PathCAS] DirectAttempt={...}, DirectSuccess={...}, Conflict={...}, FallbackAttempt={...}, FallbackSuccess={...}, Failure={...}
Also monitor coordinator decision distribution:
[GlobalNav] DecisionMetrics Continue={...}, Hold={...}, Patch={...}, TopStrategies={...}
Healthy pattern:
- `DirectSuccess` dominates.
- `Conflict` may rise under burst traffic.
- `FallbackSuccess` should be non-zero when conflicts occur.
- `Failure` should remain near zero.
- When short-window conflict ratio is high, service should emit `HighConflictBackoff` debug logs and apply 40-140 ms jitter delay before CAS.
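
To watch these signals from log output, a small parser can turn a `[PathCAS]` line into counters and derive the conflict ratio. The sketch assumes the `{...}` placeholders render as plain integers; the function names and the 0.3 threshold are hypothetical, since the checklist does not state what counts as a "high" ratio. Only the 40-140 ms range comes from the text above.

```python
import random
import re
from typing import Dict

# Matches Name=123 pairs such as "Conflict=40".
METRIC = re.compile(r"(\w+)=(\d+)")


def parse_path_cas(line: str) -> Dict[str, int]:
    """Extract integer counters from a [PathCAS] metric log line."""
    return {name: int(value) for name, value in METRIC.findall(line)}


def conflict_ratio(metrics: Dict[str, int]) -> float:
    """Conflicts per direct attempt; 0.0 when nothing was attempted."""
    attempts = metrics.get("DirectAttempt", 0)
    return metrics.get("Conflict", 0) / attempts if attempts else 0.0


def backoff_delay_ms(ratio: float, threshold: float = 0.3, rng=random) -> int:
    """Return a 40-140 ms jitter delay when the ratio reaches the threshold.

    The threshold default is an assumption made for illustration.
    """
    if ratio < threshold:
        return 0
    return rng.randint(40, 140)  # randint includes both endpoints
```

For example, a line reporting `DirectAttempt=100` and `Conflict=40` gives a ratio of 0.4, which under the assumed threshold would trigger a jittered delay before the next CAS attempt.
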
Rollback Criteria
Roll back the CAS changes if any of the following occur repeatedly:
- `planVersion` decreases or oscillates.
- Cache JSON version != `:planVersion` key.
- Persistently high `Failure` with stalled dispatch.
- Robots stuck in hold/queue due to missing state writes.