JEPA (Joint Embedding Predictive Architecture), an emerging technique from Meta, operates in latent space rather than pixel space. It resembles inpainting, but it predicts the missing content as higher-level semantic representations instead of reconstructing pixels.
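To make the "latent space rather than pixel space" point concrete, here is a minimal sketch of where a JEPA-style loss is measured. Everything in it is an assumption for illustration: the encoders and predictor are untrained random linear maps standing in for neural networks, the sizes `PATCH_DIM` and `LATENT_DIM` are arbitrary, and there is no training loop. It is not Meta's implementation.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Random linear map used as a stand-in for a trained network (assumption)."""
    return [[rng.gauss(0.0, 1.0) / cols ** 0.5 for _ in range(cols)]
            for _ in range(rows)]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def latent_loss(pred: Vector, target: Vector) -> float:
    """Mean squared error between embeddings, not between pixels."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


rng = random.Random(0)
PATCH_DIM, LATENT_DIM = 16, 4  # hypothetical sizes

context_encoder = random_matrix(LATENT_DIM, PATCH_DIM, rng)
target_encoder = random_matrix(LATENT_DIM, PATCH_DIM, rng)
predictor = random_matrix(LATENT_DIM, LATENT_DIM, rng)

context_patch = [rng.random() for _ in range(PATCH_DIM)]  # visible region
target_patch = [rng.random() for _ in range(PATCH_DIM)]   # masked region

z_context = matvec(context_encoder, context_patch)  # embed what is visible
z_target = matvec(target_encoder, target_patch)     # embed what is hidden
z_pred = matvec(predictor, z_context)               # predict the hidden embedding

# The loss compares two LATENT_DIM vectors; the masked pixels are never reconstructed.
loss = latent_loss(z_pred, z_target)
print(f"latent-space loss: {loss:.4f}")
```

An inpainting model would instead compare a reconstructed patch of length `PATCH_DIM` against `target_patch` itself.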

Transmitting embeddings instead of raw data makes privacy protection possible.
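A minimal sketch of that idea, assuming each device encodes locally and sends only the embedding. The fixed random projection stands in for a trained encoder, and the names `Message`, `build_message`, and `EMBED_DIM` are hypothetical. The sketch only shows that raw values never leave the device; it does not measure how much information the embedding itself may still reveal.

```python
import random
from dataclasses import dataclass
from typing import List

EMBED_DIM = 8  # hypothetical embedding size


def make_projection(input_dim: int, embed_dim: int, seed: int = 0) -> List[List[float]]:
    """Fixed random projection standing in for a trained encoder (assumption)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) / input_dim ** 0.5 for _ in range(input_dim)]
            for _ in range(embed_dim)]


def encode(raw: List[float], projection: List[List[float]]) -> List[float]:
    """Map raw sensor values to a low-dimensional embedding."""
    return [sum(w * x for w, x in zip(row, raw)) for row in projection]


@dataclass
class Message:
    """What leaves the device: a node ID and an embedding, no raw data."""
    node_id: str
    embedding: List[float]


def build_message(node_id: str, raw: List[float],
                  projection: List[List[float]]) -> Message:
    """Encode on the device and package only the embedding for transmission."""
    return Message(node_id=node_id, embedding=encode(raw, projection))


frame_rng = random.Random(1)
raw_frame = [frame_rng.random() for _ in range(64)]  # toy stand-in for a camera frame
projection = make_projection(len(raw_frame), EMBED_DIM)

msg = build_message("cam-1", raw_frame, projection)
print(len(raw_frame), "raw values ->", len(msg.embedding), "transmitted values")
```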

In addition, designing an FL-related protocol would likely add further novelty.

Many types of information cannot be perceived and collected through direct sensing; examples include predicting traffic congestion and identifying potential security threats in real time. For such ground applications, where straightforward sensing is unavailable, it is difficult to infer and predict the actual information by directly fusing supporting sensor data whose relevance is not obvious. This highlights a key flaw in traditional multimodal research.

Distributed scene understanding among distributed camera agents: cameras need to understand what they see, not just classify it.

Examples for ground applications:

Our overarching goal is to enhance the predictive ability of sensing systems by deriving and integrating meaningful insights from diverse distributed data.

To address the limitations of traditional multimodal research, our solution is to develop a "Distributed Brain" mechanism that integrates diverse data modalities across distributed networks and leverages LLMs for semantic extraction, enabling comprehensive, context-rich, real-time analysis and predictive insights.
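One way such a mechanism could be organized is sketched below, purely as an illustration. The notes do not specify the fusion rule or the LLM interface, so the per-modality averaging, the `NodeReport` fields, and the prompt layout are all assumptions, and the observations are made-up toy values. No model is actually called.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeReport:
    node_id: str
    modality: str           # e.g. "camera", "audio" (illustrative labels)
    embedding: List[float]
    caption: str            # short local description; hypothetical field


def fuse_by_modality(reports: List[NodeReport]) -> Dict[str, List[float]]:
    """Average the embeddings of all nodes that share a modality."""
    grouped: Dict[str, List[List[float]]] = {}
    for r in reports:
        grouped.setdefault(r.modality, []).append(r.embedding)
    return {
        modality: [sum(vals) / len(vals) for vals in zip(*embs)]
        for modality, embs in grouped.items()
    }


def build_prompt(reports: List[NodeReport], question: str) -> str:
    """Assemble text context that could be handed to an LLM (no model is called)."""
    lines = [f"[{r.node_id}/{r.modality}] {r.caption}" for r in reports]
    return "Observations:\n" + "\n".join(lines) + f"\nQuestion: {question}"


# Toy reports from three hypothetical nodes.
reports = [
    NodeReport("cam-1", "camera", [0.0, 1.0], "dense queue of vehicles at junction"),
    NodeReport("cam-2", "camera", [1.0, 0.0], "slow traffic on the approach road"),
    NodeReport("mic-1", "audio", [1.0, 0.0], "frequent horn sounds"),
]

fused = fuse_by_modality(reports)
prompt = build_prompt(reports, "Is congestion likely soon?")
print(fused)
print(prompt)
```

Averaging is the simplest possible fusion rule; a learned fusion model or attention over nodes could replace `fuse_by_modality` without changing the surrounding structure.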