STANAG-4538 LDL packet retransmissions: an interesting case

i56578-swl.blogspot.com

A few days ago I was casually watching the 4 MHz band when an unusually long STANAG-4538 (3G-HF) transmission on 4561.0 kHz/USB caught my attention, and I decided to record it for later analysis.
Basically, in the usual STANAG-4538 ARQ operation, once the data link connection has been set up, the sending station and the receiving station alternate transmissions: the sending station transmits xDL PDUs (Protocol Data Units) containing payload data packets, and the receiving station transmits acknowledgment/control packets indicating whether or not the data packet in the preceding PDU was received without error (1). In this case the LDL protocol is used (Figure 1).

Fig. 1 - STANAG-4538 LDL transfer

As per STANAG-4538, the "original" datagram to be sent is divided into fixed-length segments which are processed into packets by the chosen LDLn protocol. Indeed, an LDL data packet is defined as a fixed-length sequence consisting of an n-byte data segment (n = 32, 64, 96, ..., 512) followed by a 17-bit Sequence Number plus an 8-bit Control Field (presently unused). During the construction of the LDL BW3 burst, a 32-bit Cyclic Redundancy Check (CRC) value is computed across the data bits of each data packet and then appended. Then, 7 flush bits having the value 0 are added to guarantee that the encoder is in the all-zero state upon encoding the last flush bit. Summarizing, the on-air size of an LDLn BW3 burst in bits is 8n + 64; in this case (LDL512, n = 512) that is 4096 + 64 = 4160 bits, or 520 bytes (Figure 2).
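The size arithmetic above can be checked with a few lines of Python. This is only a sketch of the 8n + 64 rule as described in the text; the function and constant names are my own and do not come from the standard.

```python
# Field sizes as given in the text: 17-bit Sequence Number, 8-bit Control Field,
# 32-bit CRC and 7 flush bits. The names below are illustrative only.
SEQUENCE_BITS = 17
CONTROL_BITS = 8
CRC_BITS = 32
FLUSH_BITS = 7


def ldl_packet_bits(n_bytes: int) -> int:
    """Return the size in bits of an LDLn packet, i.e. 8n + 64."""
    if n_bytes not in range(32, 513, 32):
        raise ValueError("n must be one of 32, 64, 96, ..., 512")
    overhead = SEQUENCE_BITS + CONTROL_BITS + CRC_BITS + FLUSH_BITS  # 64 bits
    return 8 * n_bytes + overhead


if __name__ == "__main__":
    for n in (32, 64, 128, 256, 512):
        bits = ldl_packet_bits(n)
        print(f"LDL{n}: {bits} bits = {bits // 8} bytes")
```

For LDL512 this gives 4160 bits (520 bytes), matching the length of the demodulated bursts.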

Fig. 2 - LDL512 demodulated bitstream

That said, we can inspect the last 64 bits (17-bit Sequence Number + 8-bit Control Field + 32-bit CRC + 7 flush bits) of the recorded BW3 bursts (Figure 3).

Fig. 3 - last 64 bits of the demodulated LDL512 bitstream

In the demodulated stream the 8-bit reserved (Control) field appears after the CRC field rather than after the Sequence Number, where Annex C to STANAG-4538 places it; I don't know whether this is simply the way the decoder works. Moreover, the bits of the Sequence Number field are transmitted starting with the least significant bit (bit 0) rather than the most significant bit (bit 16); most likely this too is the way the decoder works.
In this example you can see that Packet Number #3 is sent 20 times, which explains the unusual length of the transmission (the actual number of retransmissions is greater than 20, since some bursts were not correctly demodulated because of their poor SNR and hence do not appear in the figures). The Packet Byte number field is always 511, which means it's an LDL512 transfer. The EOM and SOM fields are both set to "0", meaning that it's an "interior" packet. The CRC fields all show the same value.

110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
Fields, left to right: Packet Number (3), Packet Byte number (511), EOM and SOM, CRC field, control + flush bits
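The field breakdown of these 64 bits can be sketched in Python as follows. The field order and widths (6 + 9 + 1 + 1 + 32 + 15 bits) are read off the listing above; that the numeric fields are sent least significant bit first, and that EOM precedes SOM, are my assumptions based on the decoder output, not a statement of what the standard prescribes.

```python
def lsb_first_to_int(field: str) -> int:
    """Interpret a bit string whose first character is the least significant bit."""
    return int(field[::-1], 2)


def parse_ldl_trailer(bits: str) -> dict:
    """Split the last 64 bits of a demodulated LDL packet as laid out in the listing."""
    bits = bits.replace(" ", "")
    if len(bits) != 64 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 64 binary digits")
    return {
        "packet_number": lsb_first_to_int(bits[0:6]),
        "packet_byte_number": lsb_first_to_int(bits[6:15]),
        "eom": int(bits[15]),  # assumed order: EOM first, then SOM
        "som": int(bits[16]),
        "crc": bits[17:49],  # kept as a raw bit string
        "control_and_flush": bits[49:64],  # 8 control bits + 7 flush bits
    }


if __name__ == "__main__":
    line = "110000 111111111 00 10110010110011010001111111100111 000000000000000"
    for name, value in parse_ldl_trailer(line).items():
        print(f"{name}: {value}")
```

Applied to any of the 20 lines above, this yields packet number 3, byte number 511 and EOM = SOM = 0.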

I stopped recording after more than 5 minutes (see Figure 4), and most of that time (4 minutes and 4 seconds!) was spent on transmissions of the single packet #3; also note the repeated transmissions of packets #2 (2 times) and #4 (6 times) at the beginning and end of the recording (Figs. 2, 3).

Fig. 4 - duration of the full recording (software "audacity")
So, why so many retransmissions?
STANAG-4538 is designed for High-Frequency (HF) radio communication, which is notorious for its highly unstable and challenging channel environment. The large number of retransmissions may be a direct consequence of the protocol's requirement to guarantee reliable data delivery over this unreliable medium. STANAG-4538 employs an ARQ (Automatic Repeat reQuest) strategy as part of its data link protocols to achieve this reliability. The maximum number of retransmissions in ARQ schemes is specified by the parameter N_tx (often also called N_max or N_retries): if the packet is not acknowledged as successfully received after the last retransmission, it is typically discarded by the data link layer, and the link connection may be closed (2). This limit balances the trade-off between reliability and efficiency/latency (preventing the channel from being permanently blocked by an unrecoverable packet).
A usual or default value for N_tx is not given in summaries of the standard, nor in STANAG-4538 itself, since it is simply a configurable parameter of the HF modem, set by the manufacturer or by the system operator. In commercial and military HF implementations of similar ARQ schemes, the maximum number of retransmissions is frequently set to a value like 4, 8, or 10, but the actual number should be verified against the specific equipment's configuration settings.

One of the causes of retransmissions is poor HF channel conditions: fading, multipath distortion, noise and man-made interference (from electronics or other users). Any of these factors can cause bit errors in the received packet, which the receiver detects via a checksum (CRC) and then requests a retransmission of the corrupted packet.
A second significant cause is Adaptive Code Combining (Hybrid-ARQ). Indeed, STANAG-4538's data link protocols use a form of Hybrid-ARQ with code combining. This mechanism is specifically designed to work in poor channels and directly contributes to retransmissions. Instead of just discarding an unreadable packet, the receiver stores the corrupted data. When a retransmitted copy of that same packet arrives, the receiver combines the information from all received copies (the first and all retransmissions) to try to decode it successfully. The fact that it takes over 20 attempts may mean that the combined information only reaches the decoding threshold after a significant number of transmissions.
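To give a feeling for how code combining can need many copies, here is a deliberately idealised Python sketch. It assumes that every copy arrives with the same SNR and that the linear SNRs of the stored copies simply add up (a Chase-combining style simplification); the SNR and threshold values are invented for illustration and are neither measured from the recording nor taken from STANAG-4538.

```python
import math


def copies_needed(per_copy_snr_db: float, threshold_snr_db: float) -> int:
    """Copies to combine before the accumulated SNR reaches the decoding threshold.

    Idealised model: identical SNR on every copy, linear SNRs add.
    """
    per_copy = 10 ** (per_copy_snr_db / 10)    # dB -> linear
    threshold = 10 ** (threshold_snr_db / 10)  # dB -> linear
    return max(1, math.ceil(threshold / per_copy))


if __name__ == "__main__":
    # Hypothetical figures: copies at -10 dB, decoder needing about 3 dB.
    print(copies_needed(-10.0, 3.0))  # prints 20
```

In this toy model a burst received 13 dB below the decoding threshold has to be combined about 20 times, which is the order of magnitude seen in the recording; real fading channels are of course far less regular.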
In short, the retransmissions aren't necessarily a sign of failure, but rather the core mechanism of STANAG-4538 actively working to provide error-free data transfer despite the inherent unreliability of the HF radio channel.
It should also be considered that the KiwiSDR receiver used [1] is a kind of "man in the middle", i.e. it sees all the traffic on the channel, but this does not necessarily apply to the stations actually involved (not all data/ACK packets may be correctly received by the two peers).

So, it is likely that the N_tx parameter is set to a value greater than 20 and that channel conditions are very poor... but we could also consider another cause besides STANAG-4538 ARQ.
Perhaps we are observing retransmissions "forced" by a higher-layer protocol running over the STANAG-4538 data link (Layer 2), specifically TCP (Transmission Control Protocol, Layer 4). The N_tx limit applies strictly to the LDL/HDL protocol itself. A high retransmission count may arise from the interaction between the two layers, particularly when running IP over HF:
- STANAG-4538 (Layer 2) action
The STANAG-4538 data link protocol tries to deliver a single packet and, if all attempts fail, it discards the packet and moves on to the next one. Result: the packet is considered lost by the Data Link Layer.
- TCP (Layer 4) action
TCP is the reliable transport protocol frequently used for applications like web browsing, email, or file transfer, and it runs over IP. When a packet is lost by the lower layer (Layer 2, STANAG-4538), the TCP sender never receives an acknowledgment for that packet. TCP's retransmission timer eventually expires, and it assumes the network failed to deliver the data (TCP is not HF-aware!). Crucially, TCP retransmits the data independently of the Layer 2 protocol.
- The domino effect
STANAG-4538 transmits a packet several times, fails, and discards the packet. TCP times out and retransmits the data. The new TCP segment is handed to STANAG-4538, which treats it as a new packet to send. STANAG-4538 transmits this "new" packet up to N_tx times again. If it fails again, TCP times out a second time, and the process repeats until the packet is successfully acknowledged.

If the channel conditions are so poor that the STANAG-4538 link fails more than N_tx times in a row before TCP gives up, the total number of retransmissions for a single packet could easily reach a large value, as the higher layer continually resubmits data to the persistently failing lower layer.
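The arithmetic of this domino effect is simple enough to write down. The sketch below is only a counting model of the scenario just described; N_tx = 8 and the number of TCP rounds are hypothetical values, not something observed in the recording.

```python
def total_bursts(n_tx: int, failed_tcp_rounds: int, attempts_in_last_round: int) -> int:
    """Count the on-air transmissions of the same data under the two-layer scenario.

    Each failed round uses all n_tx link-layer attempts before TCP re-injects
    the data; the final round succeeds on attempt number attempts_in_last_round.
    """
    if n_tx < 1 or failed_tcp_rounds < 0:
        raise ValueError("n_tx must be >= 1 and failed_tcp_rounds >= 0")
    if not 1 <= attempts_in_last_round <= n_tx:
        raise ValueError("the last round needs between 1 and n_tx attempts")
    return failed_tcp_rounds * n_tx + attempts_in_last_round


if __name__ == "__main__":
    # Hypothetical: N_tx = 8, two TCP timeouts, success on the 4th attempt.
    print(total_bursts(n_tx=8, failed_tcp_rounds=2, attempts_in_last_round=4))  # prints 20
```

So a modem limited to 8 attempts could still put the same data on the air 20 times if TCP has to re-inject it twice.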
Strictly speaking, though, TCP cannot directly cause the retransmission of an LDL packet; only STANAG-4538 does that. TCP increases the likelihood of retransmissions, but it does not initiate them!

Summarizing:
the N_tx parameter is set to a large value (>20) to prioritize eventual delivery reliability over speed and to support the Hybrid-ARQ mechanism
or
the N_tx parameter is set to a low value (often in the range of 3 to 10, or a similarly small number) and TCP times out and re-injects the data.

In any case, channel conditions were undoubtedly very poor.
Perhaps, in my opinion, it would have been better to use an LDL packet size smaller than 512 bytes (4160 bits). Under poor channel conditions, a smaller packet size is generally better, even though the number of packets increases. Shorter transmissions (reduced airtime) reduce vulnerability to fades, have a lower probability of interference and collisions, enable faster ARQ cycles and improve latency:
large packet → high chance of corruption → many retransmissions
small packet → lower chance of corruption → fewer retransmissions (even though more packets)
Fig. 5 - Packet Error Probability (PEP) vs. packet length (theoretical model)
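A common textbook model for this relationship, which I use here as an assumption (it is not necessarily the model behind Figure 5), treats bit errors as independent with a residual bit error rate BER, so that PEP = 1 - (1 - BER)^L for a packet of L bits. HF fading produces bursty errors, so the numbers are indicative only; the BER value in the example is arbitrary.

```python
def packet_error_probability(packet_bits: int, ber: float) -> float:
    """PEP = 1 - (1 - BER)^L, assuming independent bit errors."""
    if packet_bits < 0 or not 0.0 <= ber <= 1.0:
        raise ValueError("packet_bits must be >= 0 and ber within [0, 1]")
    return 1.0 - (1.0 - ber) ** packet_bits


if __name__ == "__main__":
    ber = 1e-3  # arbitrary illustrative value
    for n in (64, 128, 256, 512):
        bits = 8 * n + 64  # LDLn packet size from the text
        print(f"LDL{n}: {bits} bits, PEP = {packet_error_probability(bits, ber):.3f}")
```

Whatever BER is chosen, the PEP grows monotonically with the packet length, which is the point made above.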

The airtime, or duration, of a STANAG-4538 LDLn BW3 burst is variable and depends on the amount of data being sent, specified by the parameter n. The total burst duration is given by the formula: duration (ms) = 373.33 + n × 13.33, (n = 32, 64, 96, ..., 512).
The total airtimes of a single BW3 burst for four of the BW3 burst sizes (n = 64, 128, 256, 512) are:

n value   Payload duration   Total burst duration
64        853.12 ms          1226.45 ms (≈1.23 seconds)
128       1706.24 ms         2079.57 ms (≈2.08 seconds)
256       3412.48 ms         3785.81 ms (≈3.79 seconds)
512       6824.96 ms         7198.29 ms (≈7.20 seconds)
Longer packets transmit more data per burst, leading to higher throughput, but they are also significantly more susceptible to errors (higher PEP) since they require a much longer airtime (up to several seconds).
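The table values follow directly from the duration formula and can be reproduced with a short Python sketch. The constants are the rounded ones quoted in the text (373.33 ms and 13.33 ms per byte); the names are mine.

```python
FIXED_MS = 373.33     # constant part of the burst duration, from the formula above
PER_BYTE_MS = 13.33   # additional duration per payload byte, from the formula above


def bw3_payload_ms(n_bytes: int) -> float:
    """Duration of the n-dependent part of an LDLn BW3 burst, in milliseconds."""
    return round(n_bytes * PER_BYTE_MS, 2)


def bw3_burst_ms(n_bytes: int) -> float:
    """Total duration of an LDLn BW3 burst, in milliseconds."""
    return round(FIXED_MS + n_bytes * PER_BYTE_MS, 2)


if __name__ == "__main__":
    for n in (64, 128, 256, 512):
        total = bw3_burst_ms(n)
        print(f"n={n}: payload {bw3_payload_ms(n):.2f} ms, "
              f"total {total:.2f} ms ({total / 1000:.2f} s)")
```

With n = 512 each burst occupies about 7.2 seconds of airtime, against roughly 1.2 seconds for n = 64.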

[1] recording thanks to linkz KiwiSDR #3 (French Alps)
https://disk.yandex.com/d/2r0xOdGLWu45UA
(1) STANAG-4538 xDL protocols:
HDL (High-throughput Data Link protocol) waveforms: BW1 for acknowledgement and traffic management, BW2 for traffic data
LDL (Low-latency Data Link protocol) waveforms: BW3 for traffic data, BW4 for acknowledgement and traffic management
HDL+ waveforms: BW6 for acknowledgement and traffic management, BW7 for traffic data

(2) Why the limit is necessary
HF channel volatility: HF radio conditions are highly variable due to ionospheric fading and noise. While the LDL protocol uses code combining (Hybrid-ARQ) to increase the likelihood of decoding a packet with each attempt, a limit is needed to account for extreme or prolonged channel outages.
Preventing link stall: unlimited retransmissions would cause the entire data link to stall, continually trying to send a packet that may be unrecoverable due to a sudden deep fade or interference. By limiting the attempts, the protocol can decide to terminate the link and let the higher layers (or the operator) attempt a new link setup on a different, possibly better, frequency.
