BeClaude
Research | 2026-06-24

Integrated Sensing and Communications for Real-time Avatar Control in XR over 5G

Source: arXiv cs.AI

arXiv:2606.23771v1 Announce Type: cross Abstract: Extended Reality (XR) presents a challenging use case for 5G and 6G networks, requiring high data rates and low-latency communication to deliver a truly immersive experience. Moreover, in order to seamlessly translate physical actions to the virtual...

The XR-Avatar Latency Problem Meets ISAC

This new arXiv paper tackles a fundamental bottleneck in Extended Reality (XR): the gap between physical motion and its virtual representation. It proposes using Integrated Sensing and Communications (ISAC), a technique in which the same waveform and hardware handle both radar-like sensing and data transmission, to control avatars in real time over 5G networks. Instead of relying on a dedicated motion-capture system and a separate communication link, ISAC merges the two into a unified pipeline.

The core idea is elegant: a 5G base station can simultaneously sense a user’s limb movements (via reflected signals) and transmit the resulting positional data to the XR rendering engine, all within the same spectral resource. In principle this removes the need for external trackers or cameras, reducing both hardware complexity and the end-to-end latency that plagues current avatar systems.
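To get a feel for what sensing with reflected signals involves, the sketch below applies two textbook radar relations: round-trip delay 2R/c and range resolution c/(2B). It assumes a monostatic setup (transmitter and receiver co-located), and the bandwidth values are illustrative choices, not figures taken from the paper.

```python
# Speed of light in vacuum, metres per second.
C = 299_792_458.0


def round_trip_delay_s(distance_m: float) -> float:
    """Time for a signal to reach a reflector and return (monostatic case)."""
    return 2.0 * distance_m / C


def range_resolution_m(bandwidth_hz: float) -> float:
    """Textbook range resolution c / (2 * B) for a signal of bandwidth B."""
    if bandwidth_hz <= 0:
        raise ValueError("bandwidth must be positive")
    return C / (2.0 * bandwidth_hz)


if __name__ == "__main__":
    # Illustrative bandwidths only (assumption, not from the paper).
    for label, bandwidth in [("100 MHz", 100e6), ("400 MHz", 400e6)]:
        print(f"{label}: {range_resolution_m(bandwidth):.3f} m")
```

With these assumed bandwidths the resolution comes out at roughly 1.5 m and 0.37 m respectively, which hints at why wide bandwidths matter when the targets are individual limbs.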

Why This Matters for XR and 5G

The XR industry has long struggled with latency in real-time avatars: delays as small as 20 milliseconds can break immersion, causing motion sickness or a sense of disconnection from one's virtual body. Traditional approaches separate sensing (e.g., cameras, IMUs) from communication (Wi-Fi, 5G), so each frame passes through several serialized processing steps whose delays add up. ISAC collapses the sensing and transmission stages into one, which could shave off tens of milliseconds.

For 5G and 6G network operators, this research suggests that millimeter-wave and sub-THz bands can serve dual purposes. Sensing becomes a value-added service, and the network is no longer just a data pipe. This could accelerate investment in dense small-cell deployments, which are necessary for both high-bandwidth XR and precise sensing.

Implications for AI Practitioners

Three specific takeaways emerge for those building AI systems around XR:

First, model architectures must be redesigned for joint sensing-communication pipelines. Current AI models for pose estimation typically assume clean, separate sensor inputs. ISAC introduces noisy, entangled waveforms where the communication data and sensing data share the same signal. Practitioners will need to develop neural networks that can jointly denoise and decode — essentially, a multi-task learning problem where the model must simultaneously estimate joint positions and reconstruct transmitted data.
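One way to picture the multi-task objective described above is a weighted sum of a pose-regression loss and a bit-decoding loss. The sketch below is a minimal illustration in plain Python; the function names, the choice of mean squared error and binary cross-entropy, and the weights `w_pose` and `w_bits` are all assumptions for illustration, not the paper's formulation.

```python
import math


def pose_mse(pred, target):
    """Mean squared error over flattened joint coordinates."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def bit_bce(probs, bits, eps=1e-12):
    """Binary cross-entropy between predicted bit probabilities and sent bits."""
    total = 0.0
    for p, b in zip(probs, bits):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= b * math.log(p) + (1 - b) * math.log(1.0 - p)
    return total / len(bits)


def joint_loss(pose_pred, pose_true, bit_probs, bits, w_pose=1.0, w_bits=1.0):
    """Hypothetical multi-task loss: weighted sum of the sensing and decoding terms."""
    return w_pose * pose_mse(pose_pred, pose_true) + w_bits * bit_bce(bit_probs, bits)


if __name__ == "__main__":
    loss = joint_loss(
        pose_pred=[0.1, 0.2, 0.3], pose_true=[0.0, 0.2, 0.5],
        bit_probs=[0.9, 0.2], bits=[1, 0],
    )
    print(f"joint loss: {loss:.4f}")
```

The two weights let a practitioner trade pose accuracy against bit-error rate; in a real system they would need tuning, and a trained network would supply the predictions.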

Second, real-time inference becomes a network-edge problem. The paper implies that AI inference for avatar control must happen at the base station or a near-edge server, not in the cloud. This shifts the optimization target: practitioners should prioritize low-latency, lightweight models (e.g., quantized transformers or spiking neural networks) that can run on 5G edge hardware with millisecond-level deadlines.
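As a concrete picture of what "quantized" means here, the following sketch applies symmetric per-tensor int8 quantization to a list of weights. It is a generic illustration of the technique in plain Python, not the paper's method or any particular framework's API.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale for the whole tensor; fall back to 1.0 for all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the int8 codes."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.42, -1.0, 0.07, 0.0]  # illustrative values
    codes, scale = quantize_int8(weights)
    restored = dequantize(codes, scale)
    worst = max(abs(w - r) for w, r in zip(weights, restored))
    print(codes, f"max abs error = {worst:.5f}")
```

Each weight is stored in one byte instead of four, and the rounding error per weight is bounded by half the scale, which is the trade-off a practitioner would weigh against a millisecond-level deadline.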

Third, synthetic data generation will be critical. ISAC-based sensing produces radar-like measurements that differ fundamentally from camera-based inputs. Existing pose estimation datasets (e.g., COCO, Human3.6M) are of little direct use here, because they pair poses with images rather than with radio signals. AI teams will need to generate simulated ISAC waveforms paired with ground-truth joint positions, a non-trivial simulation challenge that requires accurate electromagnetic modeling of human bodies in motion.
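A heavily simplified version of such a simulator might look like the sketch below. It treats each joint as an ideal point scatterer seen by a monostatic sensor and computes a baseband frequency response across subcarriers, paired with the ground-truth joints. The function name `make_sample`, the point-scatterer model, the 1/R² amplitude law, and the default subcarrier count and spacing are all assumptions for illustration; realistic electromagnetic modeling of a human body is far more involved.

```python
import cmath
import math
import random

C = 299_792_458.0  # speed of light, m/s


def make_sample(joints, sensor=(0.0, 0.0, 0.0), n_sub=64,
                spacing_hz=120e3, noise_std=0.0, seed=0):
    """Return one (channel response, ground-truth joints) training pair.

    Each joint contributes exp(-j*2*pi*f*tau) / R^2 at baseband offset f,
    where R is its distance to the sensor and tau = 2R/c is the round trip.
    """
    rng = random.Random(seed)
    channel = []
    for k in range(n_sub):
        f = k * spacing_hz  # subcarrier offset from the first subcarrier
        h = 0j
        for joint in joints:
            r = math.dist(joint, sensor)
            tau = 2.0 * r / C
            h += cmath.exp(-2j * math.pi * f * tau) / (r * r)
        # Additive complex Gaussian noise (zero when noise_std is 0).
        h += complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
        channel.append(h)
    return {"channel": channel, "joints": [tuple(j) for j in joints]}


if __name__ == "__main__":
    # Hypothetical wrist and elbow positions in metres.
    sample = make_sample([(2.0, 0.3, 1.1), (2.1, 0.2, 1.4)], noise_std=0.01)
    print(len(sample["channel"]), sample["joints"])
```

Fixing the seed keeps samples reproducible, which helps when comparing models trained on the same synthetic set.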

Key Takeaways

  • ISAC merges sensing and communication into one waveform, potentially solving the latency bottleneck for real-time avatar control in XR over 5G/6G.
  • AI models must be redesigned for joint denoising and decoding of entangled sensing-communication signals, shifting from separate sensor pipelines to unified multi-task architectures.
  • Inference moves to the network edge, requiring lightweight, low-latency models (for example, quantized ones) that can run on 5G base stations or near-edge servers.
  • Practitioners will need to invest in synthetic ISAC waveform datasets, as existing camera-based pose estimation data is incompatible with this new sensing modality.