We should start from Semantic Communications and introduce an observation system as a typical application of semantic communication. We can reduce traffic by transmitting the latent vector instead of the raw image.
Then, mention the issues and challenges of conventional Semantic Communications: reconstruction errors caused by relying on a single modality, and the need to enrich the observation via multimodal integration. However, adding a new sensor modality is costly.
Here, we propose integrating ISAC into Semantic Communications as a solution to the above issues. The wireless communication signal is a freely available sensing modality for the communication system: signals propagating around the observation target necessarily carry information about the target. The open questions are how to integrate ISAC into Semantic Communications, specifically how to make ISAC sensing distributed and how to fuse it with the other modalities. Requiring labeled data is undesirable.
So, we propose the JEPA-Aided Multimodal ISAC Framework. Explain the proposed method briefly and emphasize its benefits.
In recent years, Semantic Communication (SC) has emerged as a transformative approach for data transmission, particularly in scenarios requiring efficiency and scalability. Unlike traditional communication systems that transmit raw data, SC focuses on transmitting latent semantic representations (e.g., embeddings or vectors) that capture the core meaning of the data, thereby significantly reducing the communication load. This ability to condense and communicate semantic information holds great promise for applications such as autonomous driving and environmental monitoring, where distributed multimodal sensor networks are used to gather vast amounts of sensory data.
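The traffic reduction from transmitting latent representations instead of raw data can be illustrated with a toy calculation. The sketch below is illustrative only: the random linear projection stands in for a trained deep encoder, and the image size and latent dimension (224×224×3 and 128) are assumptions, not values from this work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "raw sensor frame": a 224x224 RGB image (uint8).
raw_image = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)

# Stand-in semantic encoder: a fixed random linear projection to a
# 128-dimensional latent vector (a trained DNN encoder in practice).
LATENT_DIM = 128
projection = rng.standard_normal((LATENT_DIM, raw_image.size)).astype(np.float32)
latent = projection @ (raw_image.reshape(-1).astype(np.float32) / 255.0)

raw_bytes = raw_image.nbytes      # bytes if the raw frame were transmitted
latent_bytes = latent.nbytes      # bytes for the float32 latent vector
print(f"raw: {raw_bytes} B, latent: {latent_bytes} B, "
      f"reduction: {raw_bytes / latent_bytes:.0f}x")
```

Even this crude sketch shows a payload reduction of two orders of magnitude, which is the efficiency argument driving SC for bandwidth-constrained distributed sensing.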
Despite its advantages, SC systems face key challenges that limit their effectiveness in distributed sensing applications. One major issue is the reconstruction error introduced by the semantic encoding and decoding process. In SC, raw sensory data, such as images, LiDAR, or audio, is compressed into latent semantic embeddings using deep learning models. These embeddings are then transmitted to a central server or other nodes for reconstruction or further processing. However, encoding data into a latent representation and subsequently decoding it back to its original form introduces a loss of accuracy due to several factors.
First, semantic embeddings are highly compressed representations that do not retain all the detailed information of the original input, resulting in a loss of fidelity during reconstruction, particularly in scenarios requiring high precision. The encoded embeddings capture only the most relevant high-level features, so they may discard fine-grained details crucial for accurate downstream tasks, leading to blurred or incomplete reconstructions. Second, errors introduced during the semantic encoding process can propagate through the system and compound during decoding. These errors might stem from misinterpretations of the data's semantic structure or from the encoder's inability to adequately capture certain contextual information, particularly in complex or highly dynamic environments. Third, since the models used for encoding and decoding are often trained on static, general-purpose datasets, they may not be optimized for the specific conditions or modalities present in a practical, dynamic distributed sensing network, further increasing the likelihood of errors.
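The first factor, fidelity loss from the latent bottleneck, can be made concrete with a minimal linear encoder/decoder. The sketch below uses an SVD-based projection (the optimal linear autoencoder) on synthetic data; the data dimensions and latent sizes are assumptions chosen purely to show that tighter bottlenecks discard more detail.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy dataset: 200 samples of 64-dimensional "sensor" vectors.
X = rng.standard_normal((200, 64))

def reconstruction_error(X, latent_dim):
    """Best linear encoder/decoder pair (via SVD): project onto the top
    `latent_dim` principal directions, reconstruct, and return the MSE."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:latent_dim].T            # encoder/decoder basis
    Xr = (Xc @ V) @ V.T              # encode, then decode
    return float(np.mean((Xc - Xr) ** 2))

# Tighter bottlenecks discard more information, so the error grows.
errs = {d: reconstruction_error(X, d) for d in (8, 16, 32, 64)}
for d, e in errs.items():
    print(f"latent_dim={d:2d}  mse={e:.4f}")
```

A nonlinear DNN encoder shifts the trade-off curve but does not remove it: any bottleneck narrower than the data's intrinsic dimensionality forces some reconstruction error, which is exactly the loss described above.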
These reconstruction challenges directly impact the performance of SC in distributed sensing applications, where maintaining high accuracy in sensing and decision-making tasks is critical. For instance, in city monitoring, even small errors in image or sensor data reconstruction could lead to significant performance drops or system failures. As the scale and complexity of distributed sensing networks grow, the compounded effects of these reconstruction errors become more pronounced, highlighting the need for advanced approaches that can minimize these losses while maintaining communication efficiency.
To address these limitations, we propose incorporating self-enabled Integrated Sensing and Communication (ISAC) as a solution. Most SC approaches treat the sensing data of each modality independently, and are therefore unable to effectively integrate or fuse semantic information across modalities, losing opportunities for improving accuracy. Our self-enabled ISAC approach allows an SC system to use its existing Radio Frequency (RF) signals, such as WiFi Channel State Information (CSI) or Received Signal Strength Indicator (RSSI), as additional sensing modalities. These signals naturally propagate through the observation environment, inherently carrying information about the objects and surroundings. This self-enabled ISAC sensing serves as a free modality that enhances the sensing process without adding any hardware or sensor cost.
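The idea of treating already-available RF measurements as a free extra modality can be sketched as a simple late fusion. Everything below is a hypothetical illustration: the embedding size, CSI shape (64 subcarriers × 3 antennas), handcrafted amplitude/phase features, and normalized concatenation are assumptions standing in for the learned multimodal encoders the framework would actually use.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical per-frame embedding from the deployed SC image encoder.
image_embedding = rng.standard_normal(128).astype(np.float32)

# WiFi CSI for the same interval: complex channel estimates over
# 64 subcarriers x 3 antennas (already measured by the radio anyway).
csi = rng.standard_normal((64, 3)) + 1j * rng.standard_normal((64, 3))

# Cheap handcrafted CSI features: per-subcarrier amplitude and phase,
# flattened (a learned RF encoder would replace this in practice).
csi_features = np.concatenate(
    [np.abs(csi).reshape(-1), np.angle(csi).reshape(-1)]
).astype(np.float32)

def fuse(image_emb, rf_feats):
    """Late fusion by normalized concatenation: each modality is scaled
    to unit norm so neither dominates the joint representation."""
    a = image_emb / np.linalg.norm(image_emb)
    b = rf_feats / np.linalg.norm(rf_feats)
    return np.concatenate([a, b])

joint = fuse(image_embedding, csi_features)
print(joint.shape)   # 128 image dims + 2*64*3 CSI feature dims
```

The key point of the sketch is the cost structure: the CSI tensor exists as a by-product of communication, so the RF branch adds only feature extraction and fusion, no new sensor hardware.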