I. Introduction


% However, current ISAC implementations face challenges due to redundant data transmission, as multimodal sensors (e.g., cameras, LiDAR, RF sensors) operate in isolation despite observing the same environment. This redundancy leads to excessive bandwidth consumption, creating inefficiencies in both communication and sensing performance. Consequently, a unified distributed ISAC framework is necessary to facilitate effective cross-modal coordination and optimize resource allocation.
% Yet, distributed ISAC systems currently face practical problems, such as managing cross-modal interactions, optimizing shared resources, and balancing the inherent trade-offs between sensing accuracy and communication efficiency. Addressing these challenges requires robust, generalizable, and efficient methods tailored explicitly to the emerging ISAC context

% → BG 1: the shifting from 5G to 6G: new requirement like: lower latency, higher speed and spectrum resource (frequency), \\\\
% → leading to new tendency like:
% from centralized data processing to Distributed+local data processing, from traditional raw data gathering to AI-assisted latent-representation (i.e., embeddings) transmission (i.e., semantic comm.)\\\\

% → Integrated sensing and communication (ISAC) has emerged as a promising solution, enabling wireless communication systems to incorporate sensing capabilities. However, modern ISAC systems suffer from redundant sensor data transmission due to isolated multi-modal sensing. A unified, distributed ISAC framework is needed to enable cross-modal coordination and efficient resource use.

% → BG 2: boom of AGI-driven wireless systems\\\\
% → leading to common knowledge extraction, distribution, and sharing\\\\
% → from separate sensing and communication modules towards an integrated mechanism (ISAC) \\\\
% → current problems with Distributed ISAC \\\\

5G → 6G

→ Distributed ISAC

→ self-supervised learning for robustness & generality

→ ISAC trade-off (compressed sensing for enhanced comm.)

Validating robustness and generalizability is critical, especially since many existing works solve similar problems in a supervised manner. Our method, based on SSL-JEPA, aims to generalize by learning world models. Because "robustness" and "generalization" are closely tied to our main contribution, we first define what they mean in the distributed cross-modal setting:

Robustness

We consider a two-level interpretation:

Robustness as generalization to input context/task shift:

We demonstrate that the semantic embedding learned via inpainting remains effective across different input features or downstream tasks.