ADR 007: Weak Network Resilience Analysis

Status

Accepted (2026-02-12)

A comprehensive analysis of the project's weak network resilience was conducted on v3.0.4 and revealed the following critical issues:

Inconsistent network components: The project has four independent network request implementations (`FetchWorker`, `NetworkUtils`, `NetworkWorker`, `ComponentService`), each with its own timeout, retry, and backoff strategy
`NetworkWorker` lacks a timeout: It waits indefinitely in `QEventLoop::exec()`, so a request that never completes under weak network conditions can block its thread permanently
`FetchWorker` does not retry after a timeout: Timeouts are the most common error under weak network conditions, yet `FetchWorker` skips retries when a request times out
Timeouts too short: `FetchWorker` timeouts (8-10s) are too short for weak network environments
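
The `FetchWorker` issue comes down to how failures are classified. As a minimal sketch, assuming a simplified set of outcomes (the `FetchResult` enum, `isRetryable`, and `fetchWithRetry` are hypothetical names, not the project's API), a retry loop might treat timeouts as transient and stop only on success or a permanent error:

```cpp
#include <cassert>
#include <functional>

// Hypothetical outcome categories; the project's real error codes may differ.
enum class FetchResult { Ok, Timeout, ConnectionError, NotFound };

// Transient failures are worth retrying; success and permanent errors are not.
inline bool isRetryable(FetchResult r) {
    return r == FetchResult::Timeout || r == FetchResult::ConnectionError;
}

struct RetryOutcome {
    FetchResult result;  // result of the last attempt
    int attempts;        // number of attempts actually made
};

// Calls `attempt` up to `maxAttempts` times, retrying only transient failures.
inline RetryOutcome fetchWithRetry(const std::function<FetchResult()>& attempt,
                                   int maxAttempts) {
    assert(maxAttempts >= 1);
    RetryOutcome outcome{FetchResult::Timeout, 0};
    while (outcome.attempts < maxAttempts) {
        outcome.result = attempt();
        ++outcome.attempts;
        if (!isRetryable(outcome.result)) {
            break;  // success or permanent failure: stop immediately
        }
        // A real implementation would wait here (with backoff) before retrying.
    }
    return outcome;
}
```

With this classification, a request that times out twice and then succeeds completes on the third attempt, while a `NotFound` response is reported after a single attempt.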

| Severity | Issue | Impact |
|----------|-------|--------|
| Critical | `NetworkWorker` has no timeout and can block permanently | Application unresponsive |
| Critical | `FetchWorker` does not retry after a timeout | All batch exports fail under weak network |
| Critical | Timeout too short (8-10s) | Initial connection likely to time out when RTT > 2s |
| Moderate | Thundering herd effect | Bandwidth contention intensified |
| Moderate | Linear backoff (not exponential) | Slow recovery from rate limiting |
| Moderate | Fixed 500ms retry delay | Too aggressive under weak network |
| Low | QNAM `thread_local` leak | Memory growth during long runs |
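
The three moderate issues (thundering herd, linear backoff, fixed 500ms delay) are commonly addressed together by capped exponential backoff with random jitter. The following is an illustrative sketch only; `BackoffConfig`, its default values, and both function names are assumptions rather than project code:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <random>

using Millis = std::chrono::milliseconds;

// Hypothetical parameters; neither value comes from the project.
struct BackoffConfig {
    Millis base{1000};   // delay ceiling before the first retry
    Millis cap{30000};   // upper bound on any single delay
};

// Ceiling for retry number `attempt` (0-based): min(cap, base * 2^attempt).
inline Millis backoffCeiling(const BackoffConfig& cfg, int attempt) {
    assert(attempt >= 0);
    assert(cfg.base.count() > 0 && cfg.cap >= cfg.base);
    Millis ceiling = cfg.base;
    // Stop doubling once the cap is reached so the value stays small.
    for (int i = 0; i < attempt && ceiling < cfg.cap; ++i) {
        ceiling *= 2;
    }
    return std::min(ceiling, cfg.cap);
}

// "Full jitter": pick a delay uniformly in [0, ceiling] so that requests
// that failed at the same moment do not all retry at the same moment.
template <typename Rng>
Millis backoffWithJitter(const BackoffConfig& cfg, int attempt, Rng& rng) {
    const Millis ceiling = backoffCeiling(cfg, attempt);
    std::uniform_int_distribution<std::int64_t> dist(0, ceiling.count());
    return Millis(dist(rng));
}
```

With the assumed defaults the ceilings grow as 1s, 2s, 4s, 8s, 16s and then stay at 30s. The jitter spreads concurrent retries across that window, which is what reduces bandwidth contention.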

| Component | Timeout | Retry | Backoff Strategy | Weak Network Rating |
|-----------|---------|-------|------------------|---------------------|
| `FetchWorker` | 8-10s | 3x (no retry on timeout) | Linear +1000ms | Poor |
| `NetworkUtils` | 30s | 3x (retries on timeout) | Incremental 3/5/10s | Good |
| `NetworkWorker` | None | None | None | Very Poor |
| `ComponentService` | 15s | 3x | Incremental 1/2/3s | Fair |

Based on the analysis, improvements are recommended at two priority levels:

Clarified the current state of weak network support and improvement directions
Provided clear priority guidance for network resilience improvements in future versions
Standardized network strategy across components

Improvement work requires additional development and testing time
Longer timeouts mean that failed requests may take slightly longer to surface under normal network conditions (mitigated by dynamic timeout strategies)
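
One possible shape for a dynamic timeout strategy is to derive the timeout from recently observed request durations, borrowing the smoothed estimator that TCP uses for its retransmission timer (RFC 6298). This is a sketch under stated assumptions: the `AdaptiveTimeout` class, its bounds, and the choice to start at the upper bound are hypothetical, not a description of the project's implementation:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Tracks a smoothed request duration and its variation, and derives a
// timeout as smoothed + 4 * variation, clamped to [minMs, maxMs].
class AdaptiveTimeout {
public:
    AdaptiveTimeout(double minMs, double maxMs) : minMs_(minMs), maxMs_(maxMs) {
        assert(minMs > 0.0 && maxMs >= minMs);
    }

    // Feed the duration (in milliseconds) of a completed request.
    void addSample(double durationMs) {
        assert(durationMs >= 0.0);
        if (!hasSample_) {
            smoothed_ = durationMs;
            variation_ = durationMs / 2.0;
            hasSample_ = true;
        } else {
            // Gains of 1/4 and 1/8 follow RFC 6298; the variation is
            // updated before the smoothed value, as in the RFC.
            variation_ = 0.75 * variation_ + 0.25 * std::fabs(smoothed_ - durationMs);
            smoothed_ = 0.875 * smoothed_ + 0.125 * durationMs;
        }
    }

    // Timeout to use for the next request.
    double timeoutMs() const {
        if (!hasSample_) {
            return maxMs_;  // assumption: be generous until data exists
        }
        const double estimate = smoothed_ + 4.0 * variation_;
        return std::min(std::max(estimate, minMs_), maxMs_);
    }

private:
    double minMs_;
    double maxMs_;
    double smoothed_ = 0.0;
    double variation_ = 0.0;
    bool hasSample_ = false;
};
```

On a fast, stable network the estimate settles at the lower bound, so normal-case waiting time stays short. On a slow or erratic network the estimate grows toward the upper bound instead of failing early.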

Before changing `NetworkWorker`, confirm that it is still actively used, to avoid unnecessary modifications
Timeout adjustments need to balance weak network resilience with user experience