1106: A Survey on One-To-Many Negotiation: A Taxonomy of Interdependency
Authors: Tamara C.P. Florijn, Pınar Yolum, Tim Baarslag
Location: Montreal
| Day: August 20th
| Time: 14:00
| Session: Agent-based and Multi-agent Systems (2/3)
One-to-many negotiations are widely applied across domains, contributing to efficient resource allocation and effective decision making. This breadth of applications, however, also brings a wide variety of implemented protocols, terminologies, and utility functions, which makes it hard to compare strategies or improve on existing solutions.
We introduce a meta-model of negotiations that characterizes almost all one-to-many negotiation research, providing a unified description of such negotiations. This meta-model allows us to identify different classes of interdependency based on utility functions.
We show how existing one-to-many negotiations relate to each other, yielding new insights and identifying knowledge gaps.
We suggest that a general utility function framework and benchmark scenarios for one-to-many negotiations could support future advances in this field.
8296: Harnessing Vision Models for Time Series Analysis: A Survey
Authors: Jingchao Ni, Ziming Zhao, ChengAo Shen, Hanghang Tong, Dongjin Song, Wei Cheng, Dongsheng Luo, Haifeng Chen
Location: Montreal
| Day: August 20th
| Time: 14:00
| Session: ML: time series, sequences and signals
Time series analysis has evolved from traditional autoregressive models to deep learning, Transformers, and Large Language Models (LLMs). While vision models have also been explored along the way, their contributions are less recognized due to the predominance of sequence modeling. However, challenges such as the mismatch between continuous time series and LLMs’ discrete token space, and the difficulty in capturing multivariate correlations, have led to growing interest in Large Vision Models (LVMs) and Vision-Language Models (VLMs). This survey highlights the advantages of vision models over LLMs in time series analysis, offering a comprehensive dual-view taxonomy that answers key research questions like how to encode time series as images and how to model imaged time series. Additionally, we address pre- and post-processing challenges in this framework and outline future directions for advancing the field.
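One common answer to the question of how to encode a time series as an image is the Gramian Angular Field. The sketch below is a minimal illustration, not taken from the survey; the toy sine series and its length are invented for the example.

```python
import numpy as np

def gramian_angular_summation_field(x: np.ndarray) -> np.ndarray:
    """Encode a 1-D time series as a 2-D image (GASF).

    Steps: rescale the series to [-1, 1], map each value to an angle
    phi = arccos(x), then build the matrix cos(phi_i + phi_j).
    """
    x_min, x_max = x.min(), x.max()
    x_scaled = 2 * (x - x_min) / (x_max - x_min) - 1        # values in [-1, 1]
    phi = np.arccos(np.clip(x_scaled, -1.0, 1.0))           # polar-coordinate angles
    return np.cos(phi[:, None] + phi[None, :])               # (T, T) image

# Toy example: a noisy sine wave becomes a T x T "image" a vision model can consume.
t = np.linspace(0, 4 * np.pi, 64)
series = np.sin(t) + 0.1 * np.random.randn(64)
image = gramian_angular_summation_field(series)
print(image.shape)  # (64, 64)
```

Stacking one such matrix per variable yields a multi-channel image that standard vision backbones can process directly.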
8412: Generative AI for Immersive Video: Recent Advances and Future Opportunities
Authors: Kaiyuan Hu, Yili Jin, Hao Zhou, Linfeng Du, Jiangchuan Liu, Xue Liu
Location: Montreal
| Day: August 21st
| Time: 15:00
| Session: CV: videos
Immersive video is a key component of eXtended Reality (XR), which aims to create and interact with simulated virtual or hybrid environments. Such technology allows users to experience immersive sensations that transcend time and space, while continuously providing training data for emerging technologies such as Embodied AI. Thanks to advancements in sensing, computing, and display, recent years have witnessed many excellent works on XR and related hardware and software systems. However, challenges such as high creation cost, lack of immersion, and limited scalability hinder the practical application of immersive video services, while recently emerged generative artificial intelligence (GenAI) offers new insights for tackling these challenges. In this paper, we conduct a comprehensive survey of recent advances and future opportunities in applying GenAI to immersive video services. By introducing a systematic taxonomy, we classify the pertinent techniques and applications into three well-defined categories aligned with the immersive video service pipeline: content creation, network delivery, and client-side display. This categorization enables a structured exploration of the diverse roles GenAI can play in immersive video services, providing a framework for a more comprehensive understanding and evaluation of these technologies. To the best of our knowledge, this work is the first systematic survey of GenAI in XR settings, laying a foundation for future research in this interdisciplinary domain.
8456: Control in Computational Social Choice
Authors: Jiehua Chen, Joanna Kaczmarek, Paul Nüsken, Jörg Rothe, Ildikó Schlotter, Tessa Seeger
Location: Montreal
| Day: August 21st
| Time: 15:00
| Session: GTEP: Computational social choice (2/2)
We survey the notion of control in various areas of computational social choice (COMSOC) such as voting, fair allocation, cooperative game theory, matching under preferences, and group identification. In all these scenarios, control can be exerted, for instance, by adding or deleting agents with the goal of influencing the outcome. We conclude by briefly covering control in some other COMSOC areas including participatory budgeting, judgment aggregation, and opinion diffusion.
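As a minimal illustration of control (not drawn from the paper), consider constructive control by adding voters under plurality: the chair admits extra voters so that a preferred candidate wins. The candidate names and vote counts below are invented for the example.

```python
from collections import Counter

def plurality_winner(votes):
    """Return the plurality winner; ties broken lexicographically."""
    tally = Counter(votes)
    best = max(tally.values())
    return min(c for c, n in tally.items() if n == best)

# Registered votes (each voter's top choice): a=3, b=2, c=1.
votes = ["a", "a", "a", "b", "b", "c"]
print(plurality_winner(votes))                 # 'a' wins

# Constructive control by adding voters: the chair admits two spare voters
# who both rank 'b' first, flipping the outcome in favour of 'b' (a=3, b=4, c=1).
extra_voters = ["b", "b"]
print(plurality_winner(votes + extra_voters))  # 'b' wins
```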
8508: Artificial Intelligence in Spectroscopy: Advancing Chemistry from Prediction To Generation and Beyond
Authors: Kehan Guo, Yili Shen, Gisela Abigail Gonzalez-Montiel, Yue Huang, Yujun Zhou, Mihir Surve, Zhichun Guo, Payel Das, Nitesh V. Chawla, Olaf Wiest, Xiangliang Zhang
Location: Montreal
| Day: August 20th
| Time: 10:00
| Session: Data Mining
The rapid advent of machine learning (ML) and artificial intelligence (AI) has catalyzed major transformations in chemistry, yet the application of these methods to spectroscopic and spectrometric data—termed Spectroscopy Machine Learning (SpectraML)—remains relatively underexplored. Modern spectroscopic techniques (MS, NMR, IR, Raman, UV-Vis) generate an ever-growing volume of high-dimensional data, creating a pressing need for automated and intelligent analysis beyond traditional expert-based workflows. In this survey, we provide a unified review of SpectraML, systematically examining state-of-the-art approaches for both forward tasks (molecule-to-spectrum prediction) and inverse tasks (spectrum-to-molecule inference). We trace the historical evolution of ML in spectroscopy—from early pattern recognition to the latest foundation models capable of advanced reasoning—and offer a taxonomy of representative neural architectures, including graph-based and transformer-based methods. Addressing key challenges such as data quality, multimodal integration, and computational scalability, we highlight emerging directions like synthetic data generation, large-scale pretraining, and few- or zero-shot learning. To foster reproducible research, we release an open-source repository containing curated datasets and code implementations. Our survey serves as a roadmap for researchers, guiding advancements at the intersection of spectroscopy and AI.
8516: Zero-shot Quantization: A Comprehensive Survey
Authors: Minjun Kim, Jaehyeon Choi, Jongkeun Lee, Wonjin Cho, U Kang
Location: Montreal
| Day: August 20th
| Time: 14:00
| Session: Computer vision (2/3)
Network quantization has proven to be a powerful approach to reduce the memory and computational demands of deep learning models for deployment on resource-constrained devices.
However, traditional quantization methods often rely on access to training data, which is impractical in many real-world scenarios due to privacy, security, or regulatory constraints.
Zero-shot Quantization (ZSQ) emerges as a promising solution, achieving quantization without requiring any real data.
In this paper, we provide a comprehensive overview of ZSQ methods and their recent advancements.
First, we provide a formal definition of the ZSQ problem and highlight the key challenges.
Then, we categorize the existing ZSQ methods into classes based on data generation strategies, and analyze their motivations, core ideas, and key takeaways.
Lastly, we suggest future research directions to address the remaining limitations and advance the field of ZSQ.
To the best of our knowledge, this paper is the first in-depth survey on ZSQ.
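A minimal sketch of the two ingredients ZSQ methods combine, assuming a toy PyTorch model; the model, shapes, and Gaussian surrogate inputs are invented for illustration, and real ZSQ methods synthesize far more faithful calibration data (e.g. from batch-normalization statistics).

```python
import torch
import torch.nn as nn

def quantize_int8(t: torch.Tensor):
    """Symmetric per-tensor int8 quantization: returns (q, scale)."""
    scale = t.abs().max() / 127.0
    q = torch.clamp(torch.round(t / scale), -128, 127).to(torch.int8)
    return q, scale

# Toy model standing in for a pretrained network.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10)).eval()

# 1) Weight quantization needs no data at all.
w_q, w_scale = quantize_int8(model[0].weight.data)

# 2) Activation ranges normally require real calibration data; in a zero-shot
#    setting we substitute synthetic inputs (here plain Gaussian noise).
with torch.no_grad():
    synthetic_batch = torch.randn(256, 16)
    activations = model[0](synthetic_batch)
    act_scale = activations.abs().max() / 127.0

print(w_q.dtype, float(w_scale), float(act_scale))
```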
8553: Human-Centric Foundation Models: Perception, Generation and Agentic Modeling
Authors: Shixiang Tang, Yizhou Wang, Lu Chen, Yuan Wang, Sida Peng, Dan Xu, Wanli Ouyang
Location: Montreal
| Day: August 21st
| Time: 10:00
| Session: Humans and AI
Human understanding and generation are critical for modeling digital humans and humanoid embodiments. Recently, Human-centric Foundation Models (HcFMs)—inspired by the success of generalist models such as large language and vision models—have emerged to unify diverse human-centric tasks into a single framework, surpassing traditional task-specific approaches. In this survey, we present a comprehensive overview of HcFMs by proposing a taxonomy that categorizes current approaches into four groups: (1) Human-centric Perception Foundation Models that capture fine-grained features for multi-modal 2D and 3D understanding; (2) Human-centric AIGC Foundation Models that generate high-fidelity, diverse human-related content; (3) Unified Perception and Generation Models that integrate these capabilities to enhance both human understanding and synthesis; and (4) Human-centric Agentic Foundation Models that extend beyond perception and generation to learn human-like intelligence and interactive behaviors for humanoid embodied tasks. We review state-of-the-art techniques and discuss emerging challenges and future research directions. This survey aims to serve as a roadmap for researchers and practitioners working towards more robust, versatile, and intelligent modeling of digital humans and embodiments. Website: https://github.com/HumanCentricModels/Awesome-Human-Centric-Foundation-Models/
8557: RenderBender: A Survey on Adversarial Attacks Using Differentiable Rendering
Authors: Matthew Hull, Haoran Wang, Matthew Lau, Alec Helbling, Mansi Phute, Chao Zhang, Zsolt Kira, Willian Lunardi, Martin Andreoni, Wenke Lee, Duen Horng Chau
Location: Montreal
| Day: August 21st
| Time: 10:00
| Session: CV: attacks
Differentiable rendering techniques like Gaussian Splatting and Neural Radiance Fields have become powerful tools for generating high-fidelity models of 3D objects and scenes. Their ability to produce models of scenes that are both physically plausible and differentiable is a key ingredient for producing physically plausible adversarial attacks on DNNs. However, the adversarial machine learning community has yet to fully explore these capabilities, partly due to differing attack goals (e.g., misclassification, misdetection) and a wide range of possible scene manipulations used to achieve them (e.g., alter texture, mesh). This survey contributes a framework that unifies diverse goals and tasks, facilitating easy comparison of existing work, identifying research gaps, and highlighting future directions—ranging from expanding attack goals and tasks to account for new modalities, state-of-the-art models, tools, and pipelines, to underscoring the importance of studying real-world threats in complex scenes.
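A minimal, self-contained sketch of the optimization loop such attacks share (a toy stand-in, not any surveyed method): a scene parameter, here a texture offset, is optimized by gradient descent through a differentiable "renderer" so that a classifier outputs an attacker-chosen label. The renderer, classifier, shapes, and target label are all invented for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy differentiable "renderer": composites a learnable texture onto a background.
def render(texture, background):
    return torch.clamp(background + torch.tanh(texture), 0.0, 1.0)

# Toy classifier standing in for the victim DNN.
classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).eval()

background = torch.rand(1, 3, 32, 32)
texture = torch.zeros(1, 3, 32, 32, requires_grad=True)   # adversarial scene parameter
target_label = torch.tensor([7])                           # class the attacker wants

optimizer = torch.optim.Adam([texture], lr=0.05)
for step in range(100):
    optimizer.zero_grad()
    image = render(texture, background)                    # gradients flow through rendering
    loss = F.cross_entropy(classifier(image), target_label)
    loss.backward()
    optimizer.step()

print("predicted class:", classifier(render(texture, background)).argmax(dim=1).item())
```

Real attacks replace the toy renderer with a differentiable engine (e.g., for meshes or Gaussian splats) and the toy classifier with a deployed model, but the gradient-through-rendering loop is the same.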
8644: A Survey on Multi-View Knowledge Graph: Generation, Fusion, Applications and Future Directions
Authors: Zihan Yang, Xiaohui Tao, Taotao Cai, Yifu Tang, Haoran Xie, Lin Li, Jianxin Li, Qing Li
Location: Montreal
| Day: August 21st
| Time: 15:00
| Session: DM: Graph Data Mining
Knowledge Graphs (KGs) have revolutionized structured knowledge representation, yet their capacity to model real-world complexity and heterogeneity remains fundamentally constrained. The emerging paradigm of Multi-View Knowledge Graphs (MVKGs) addresses this gap through multi-view learning, but existing research lacks systematic integration. This survey provides the first systematic consolidation of MVKG methodologies, with four pivotal contributions: 1) The first unified taxonomy of view generation paradigms that rigorously categorizes views into four types: structure, semantic, representation, and knowledge & modality; 2) A novel methodological typology for view fusion that systematically classifies techniques by fusion targets (feature, decision, and hybrid); 3) Task-centric application mapping that bridges theoretical MVKG constructs to node-, link-, and graph-level downstream tasks; 4) A forward-looking roadmap identifying underexplored challenges. By unifying fragmented methodologies and formalizing MVKG design principles, this survey serves as a roadmap for advancing KG versatility in complex AI-driven scenarios. In doing so, it paves the way for more efficient knowledge integration, enhanced decision-making, and cross-domain learning in real-world applications.
8711: A Survey on Model Repair in AI Planning
Authors: Pascal Bercher, Sarath Sreedharan, Mauro Vallati
Location: Montreal
| Day: August 19th
| Time: 15:00
| Session: Planning and Scheduling (2/5)
Accurate planning models are a prerequisite for the appropriate functioning of AI planning applications. Creating these models is, however, a tedious and error-prone task — even for planning experts. This makes the provision of automated modeling support essential. In this work, we differentiate between approaches that learn models from scratch (called domain model acquisition) and those that repair flawed or incomplete ones. We survey approaches for the latter, including those that can be used for domain repair but have been developed for other applications, discuss possible optimization metrics (i.e., which repaired model to aim at), and conclude with lines of research we believe deserve more attention.
8740: Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives
Authors: Sara Sarto, Marcella Cornia, Rita Cucchiara
Location: Montreal
| Day: August 22nd
| Time: 11:30
| Session: CV: multimodal LLMs
The evaluation of machine-generated captions is a complex and evolving challenge. With the advent of Multimodal Large Language Models (MLLMs), image captioning has become a core task, increasing the need for robust and reliable evaluation metrics. This survey provides a comprehensive overview of advancements in image captioning evaluation, analyzing the evolution, strengths, and limitations of existing metrics. We assess these metrics across multiple dimensions, including correlation with human judgment, ranking accuracy, and sensitivity to hallucinations. Additionally, we explore the challenges posed by the longer and more detailed captions generated by MLLMs and examine the adaptability of current metrics to these stylistic variations. Our analysis highlights some limitations of standard evaluation approaches and suggests promising directions for future research in image captioning assessment. For a comprehensive overview of captioning evaluation refer to our project page available at https://github.com/aimagelab/awesome-captioning-evaluation.
8748: The Evolving Landscape of LLM- and VLM-Integrated Reinforcement Learning
Authors: Sheila Schoepp, Masoud Jafaripour, Yingyue Cao, Tianpei Yang, Fatemeh Abdollahi, Shadan Golestan, Zahin Sufiyan, Osmar R. Zaiane, Matthew E. Taylor
Location: Montreal
| Day: August 21st
| Time: 15:00
| Session: ML: Reinforcement Learning (2/2)
Reinforcement learning (RL) has shown impressive results in sequential decision-making tasks. Large Language Models (LLMs) and Vision-Language Models (VLMs) have recently emerged, exhibiting impressive capabilities in multimodal understanding and reasoning. These advances have led to a surge of research integrating LLMs and VLMs into RL. This survey reviews representative works in which LLMs and VLMs are used to overcome key challenges in RL, such as lack of prior knowledge, long-horizon planning, and reward design. We present a taxonomy that categorizes these LLM/VLM-assisted RL approaches into three roles: agent, planner, and reward. We conclude by exploring open problems, including grounding, bias mitigation, improved representations, and action advice. By consolidating existing research and identifying future directions, this survey establishes a framework for integrating LLMs and VLMs into RL, advancing approaches that unify natural language and visual understanding with sequential decision-making.
8880: Game Theory Meets Large Language Models: A Systematic Survey
Authors: Haoran Sun, Yusen Wu, Yukun Cheng, Xu Chu
Location: Montreal
| Day: August 20th
| Time: 14:00
| Session: Game Theory
Game theory establishes a fundamental framework for analyzing strategic interactions among rational decision-makers. The rapid advancement of large language models (LLMs) has sparked extensive research exploring the intersection of these two fields. Specifically, game-theoretic methods are being applied to evaluate and enhance LLM capabilities, while LLMs themselves are reshaping classic game models. This paper presents a comprehensive survey of the intersection of these fields, exploring a bidirectional relationship from three perspectives: (1) Establishing standardized game-based benchmarks for evaluating LLM behavior; (2) Leveraging game-theoretic methods to improve LLM performance through algorithmic innovations; (3) Characterizing the societal impacts of LLMs through game modeling. Among these three aspects, we also highlight how the equilibrium analysis for traditional game models is impacted by LLMs’ advanced language understanding, which in turn extends the study of game theory. Finally, we identify key challenges and future research directions, assessing their feasibility based on the current state of the field. By bridging theoretical rigor with emerging AI capabilities, this survey aims to foster interdisciplinary collaboration and drive progress in this evolving research area.
8933: Neuro-Symbolic Artificial Intelligence: A Task-Directed Survey in the Black-Box Models Era
Authors: Giovanni Pio Delvecchio, Lorenzo Molfetta, Gianluca Moro
Location: Montreal
| Day: August 21st
| Time: 11:30
| Session: ML: Neurosymbolic AI
The integration of symbolic computing with neural networks has intrigued researchers since the first theorizations of Artificial Intelligence (AI). The ability of Neuro-Symbolic (NeSy) methods to infer or exploit behavioral schema has been widely considered one of the possible proxies for human-level intelligence. However, their limited semantic generalizability and the challenges of describing complex domains with pre-defined patterns and rules hinder their practical implementation in real-world scenarios. The unprecedented results achieved by connectionist systems since the last AI breakthrough in 2017 have raised questions about the competitiveness of NeSy solutions, with particular emphasis on the Natural Language Processing and Computer Vision fields. This survey examines task-specific advancements in the NeSy domain to explore how incorporating symbolic systems can enhance explainability and reasoning capabilities. Our findings are meant to serve as a resource for researchers exploring explainable NeSy methodologies for real-life tasks and applications. Reproducibility details and in-depth comments on each surveyed research work are made available at https://github.com/disi-unibo-nlp/task-oriented-neuro-symbolic.git.
8943: Grounding Open-Domain Knowledge from LLMs to Real-World Reinforcement Learning Tasks: A Survey
Authors: Haiyan Yin, Hangwei Qian, Yaxin Shi, Ivor Tsang, Yew-Soon Ong
Location: Montreal
| Day: August 22nd
| Time: 11:30
| Session: Natural Language Processing (2/2)
Grounding open-domain knowledge from large language models (LLMs) into real-world reinforcement learning (RL) tasks represents a transformative frontier in developing intelligent agents capable of advanced reasoning, adaptive planning, and robust decision-making in dynamic environments. In this paper, we introduce the LLM-RL Grounding Taxonomy, a systematic framework that categorizes emerging methods for integrating LLMs into RL systems by bridging their open-domain knowledge and reasoning capabilities with the task-specific dynamics, constraints, and objectives inherent to real-world RL environments. This taxonomy encompasses both training-free approaches, which leverage the zero-shot and few-shot generalization capabilities of LLMs without fine-tuning, and fine-tuning paradigms that adapt LLMs to environment-specific tasks for improved performance. We critically analyze these methodologies, highlight practical examples of effective knowledge grounding, and examine the challenges of alignment, generalization, and real-world deployment. Our work not only illustrates the potential of LLM-RL agents for enhanced decision-making, but also offers actionable insights for advancing the design of next-generation RL systems that integrate open-domain knowledge with adaptive learning.
8954: Paradigms of AI Evaluation: Mapping Goals, Methodologies and Culture
Authors: John Burden, Marko Tešić, Lorenzo Pacchiardi, José Hernández-Orallo
Location: Montreal
| Day: August 19th
| Time: 15:00
| Session: Machine Learning (1/4)
Research in AI evaluation has grown increasingly complex and multidisciplinary, attracting researchers with diverse backgrounds and objectives. As a result, divergent evaluation paradigms have emerged, often developing in isolation, adopting conflicting terminologies, and overlooking each other’s contributions. This fragmentation has led to insular research trajectories and communication barriers both among different paradigms and with the general public, contributing to unmet expectations for deployed AI systems. To help bridge this insularity, in this paper we survey recent work in the AI evaluation landscape and identify six main paradigms. We characterise major recent contributions within each paradigm across key dimensions related to their goals, methodologies and research cultures. By clarifying the unique combination of questions and approaches associated with each paradigm, we aim to increase awareness of the breadth of current evaluation approaches and foster cross-pollination between different paradigms. We also identify potential gaps in the field to inspire future research directions.
8965: Integrating Neurosymbolic AI in Advanced Air Mobility: A Comprehensive Survey
Authors: Kamal Acharya, Iman Sharifi, Mehul Lad, Liang Sun, Houbing Song
Location: Montreal
| Day: August 21st
| Time: 11:30
| Session: ML: Neurosymbolic AI
Neurosymbolic AI combines neural network adaptability with symbolic reasoning, offering a promising approach to the complex regulatory, operational, and safety challenges in Advanced Air Mobility (AAM). This survey reviews its applications across key AAM domains such as demand forecasting, aircraft design, and real-time air traffic management. Our analysis reveals a fragmented research landscape where methodologies, including Neurosymbolic Reinforcement Learning, have shown potential for dynamic optimization but still face hurdles in scalability, robustness, and compliance with aviation standards. We classify current advancements, present relevant case studies, and outline future research directions aimed at integrating these approaches into reliable, transparent AAM systems. By linking advanced AI techniques with AAM’s operational demands, this work provides a concise roadmap for researchers and practitioners developing next-generation air mobility solutions.
8993: 40 Years of Research in Possibilistic Logic – a Survey
Authors: Didier Dubois, Henri Prade
Location: Montreal
| Day: August 20th
| Time: 14:00
| Session: KR: Logic
Possibilistic logic is forty years old. It is a logic that handles classical logic formulas associated with weights taking values in a linearly ordered set or, more generally, in a lattice. Over the decades, possibilistic logic has undergone numerous developments at both the theoretical and applied levels. The ambition of this article is to review all these developments while exposing the main ideas behind them.
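For orientation, the basic building blocks of the formalism can be written as follows (a textbook formulation, not quoted from the article): a possibilistic base is a set of pairs $(\varphi, \alpha)$, where $\alpha$ is a lower bound on the necessity degree of the classical formula $\varphi$, and inference propagates the weakest weight via the possibilistic resolution rule.

```latex
\[
  (\varphi,\ \alpha) \quad\text{encodes}\quad N(\varphi) \ge \alpha
\]
\[
  \frac{(\neg p \vee q,\ \alpha) \qquad (p \vee r,\ \beta)}
       {(q \vee r,\ \min(\alpha,\beta))}
  \qquad\text{(possibilistic resolution)}
\]
```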
9009: A Comprehensive Survey on Physical Risk Control in the Era of Foundation Model-enabled Robotics
Authors: Takeshi Kojima, Yaonan Zhu, Yusuke Iwasawa, Toshinori Kitamura, Gang Yan, Shu Morikuni, Ryosuke Takanami, Alfredo Solano, Tatsuya Matsushima, Akiko Murakami, Yutaka Matsuo
Location: Montreal
| Day: August 20th
| Time: 14:00
| Session: Robotics
Recent Foundation Model-enabled robotics (FMRs) display greatly improved general-purpose skills, enabling more adaptable automation than conventional robotics. Their ability to handle diverse tasks thus creates new opportunities to replace human labor. However, unlike general foundation models, FMRs interact with the physical world, where their actions directly affect the safety of humans and surrounding objects, requiring careful deployment and control. Based on this proposition, our survey comprehensively summarizes robot control approaches to mitigate physical risks, covering the entire lifespan of FMRs from the pre-deployment to the post-incident stage. Specifically, we broadly divide the timeline into the following three phases: (1) pre-deployment phase, (2) pre-incident phase, and (3) post-incident phase. Throughout this survey, we find that there is much room to study (i) pre-incident risk mitigation strategies, (ii) research that assumes physical interaction with humans, and (iii) essential issues of foundation models themselves. We hope that this survey will be a milestone in providing a high-resolution analysis of the physical risks of FMRs and their control, contributing to the realization of a good human-robot relationship.
9071: Comprehensive Review of Neural Differential Equations for Time Series Analysis
Authors: YongKyung Oh, Seungsu Kam, Jonghun Lee, Dong-Young Lim, Sungil Kim, Alex A. T. Bui
Location: Montreal
| Day: August 20th
| Time: 14:00
| Session: ML: time series, sequences and signals
Time series modeling and analysis have become critical in various domains. Conventional methods such as RNNs and Transformers, while effective for discrete-time and regularly sampled data, face significant challenges in capturing the continuous dynamics and irregular sampling patterns inherent in real-world scenarios. Neural Differential Equations (NDEs) represent a paradigm shift by combining the flexibility of neural networks with the mathematical rigor of differential equations. This paper presents a comprehensive review of NDE-based methods for time series analysis, including neural ordinary differential equations, neural controlled differential equations, and neural stochastic differential equations. We provide a detailed discussion of their mathematical formulations, numerical methods, and applications, highlighting their ability to model continuous-time dynamics. Furthermore, we address key challenges and future research directions. This survey serves as a foundation for researchers and practitioners seeking to leverage NDEs for advanced time series analysis.
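For concreteness, the standard formulations underlying these three model families can be written as follows (shown for orientation, not quoted from the paper); here $f_\theta$, $\mu_\theta$, and $\sigma_\theta$ denote neural networks, $X$ a control path built from the observations, and $W$ a Brownian motion.

```latex
\[
  \text{Neural ODE:}\quad
  \mathbf{z}(t_1) = \mathbf{z}(t_0) + \int_{t_0}^{t_1} f_\theta\bigl(\mathbf{z}(t), t\bigr)\,\mathrm{d}t
\]
\[
  \text{Neural CDE:}\quad
  \mathbf{z}(t_1) = \mathbf{z}(t_0) + \int_{t_0}^{t_1} f_\theta\bigl(\mathbf{z}(t)\bigr)\,\mathrm{d}X(t)
\]
\[
  \text{Neural SDE:}\quad
  \mathrm{d}\mathbf{z}(t) = \mu_\theta\bigl(\mathbf{z}(t), t\bigr)\,\mathrm{d}t
  + \sigma_\theta\bigl(\mathbf{z}(t), t\bigr)\,\mathrm{d}W(t)
\]
```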
9107: Emerging Advances in Learned Video Compression: Models, Systems and Beyond
Authors: Chuanmin Jia, Feng Ye, Siwei Ma, Wen Gao, Huifang Sun, Leonardo Chiariglione
Location: Montreal
| Day: August 21st
| Time: 15:00
| Session: CV: videos
Video compression is a fundamental topic in visual intelligence, bridging visual signal sensing/capturing and high-level visual analytics. The broad success of artificial intelligence (AI) technology has enriched the horizon of video compression with novel paradigms that leverage end-to-end optimized neural models. In this survey, we first provide a comprehensive and systematic overview of recent literature on end-to-end optimized learned video coding, covering the spectrum of pioneering efforts in both uni-directional and bi-directional prediction based compression model design. We further delve into the optimization techniques employed in learned video compression (LVC), emphasizing their technical innovations and advantages. Some standardization progress is also reported. Furthermore, we investigate the system design and hardware implementation challenges of LVC. Finally, we present extensive simulation results to demonstrate the superior compression performance of LVC models, addressing the question of why learned codecs and AI-based video technology will have a broad impact on future visual intelligence research.
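For readers new to the area, the "end-to-end optimized" objective shared by most learned codecs is a rate-distortion Lagrangian of the following standard form (included for orientation, not quoted from the survey), where $g_a$/$g_s$ are the analysis and synthesis transforms, $p_\phi$ a learned entropy model over the quantized latents $\hat{\mathbf{y}}$, and $d$ a distortion measure such as MSE.

```latex
\[
  \mathcal{L} \;=\; R + \lambda D
  \;=\; \mathbb{E}\bigl[-\log_2 p_\phi(\hat{\mathbf{y}})\bigr]
  \;+\; \lambda\,\mathbb{E}\bigl[d(\mathbf{x}, \hat{\mathbf{x}})\bigr],
  \qquad
  \hat{\mathbf{y}} = \lfloor g_a(\mathbf{x}) \rceil,\quad
  \hat{\mathbf{x}} = g_s(\hat{\mathbf{y}})
\]
```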
9114: Federated Learning at the Forefront of Fairness: A Multifaceted Perspective
Authors: Noorain Mukhtiar, Adnan Mahmood, Yipeng Zhou, Jian Yang, Jing Teng, Quan Z. Sheng
Location: Montreal
| Day: August 19th
| Time: 15:00
| Session: ML: Federated Learning
Fairness in Federated Learning (FL) is emerging as a critical factor driven by heterogeneous clients’ constraints and balanced model performance across various scenarios. In this survey, we delineate a comprehensive classification of the state-of-the-art fairness-aware approaches from a multifaceted perspective, i.e., model performance-oriented and capability-oriented. Moreover, we provide a framework to categorize and address various fairness concerns and associated technical aspects, examining their effectiveness in balancing equity and performance within FL frameworks. We further examine several significant evaluation metrics leveraged to measure fairness quantitatively. Finally, we explore exciting open research directions and propose prospective solutions that could drive future advancements in this important area, laying a solid foundation for researchers working toward fairness in FL.
9136: Words Over Pixels? Rethinking Vision in Multimodal Large Language Models
Authors: Anubhooti Jain, Mayank Vatsa, Richa Singh
Location: Montreal
| Day: August 22nd
| Time: 11:30
| Session: CV: multimodal LLMs
Multimodal Large Language Models (MLLMs) promise seamless integration of vision and language understanding. However, despite their strong performance, recent studies reveal that MLLMs often fail to effectively utilize visual information, frequently relying on textual cues instead. This survey provides a comprehensive analysis of the vision component in MLLMs, covering both application-level and architectural aspects. We investigate critical challenges such as weak spatial reasoning, poor fine-grained visual perception, and suboptimal fusion of visual and textual modalities. Additionally, we explore limitations in current vision encoders, benchmark inconsistencies, and their implications for downstream tasks. By synthesizing recent advancements, we highlight key research opportunities to enhance visual understanding, improve cross-modal alignment, and develop more robust and efficient MLLMs. Our observations emphasize the urgent need to elevate vision to an equal footing with language, paving the path for more reliable and perceptually aware multimodal models.
9152: Survey on Strategic Mining in Blockchain: A Reinforcement Learning Approach
Authors: Jichen Li, Lijia Xie, Hanting Huang, Bo Zhou, Binfeng Song, Wanying Zeng, Xiaotie Deng, Xiao Zhang
Location: Montreal
| Day: August 21st
| Time: 15:00
| Session: Agent-based and Multi-agent Systems (3/3)
Strategic mining attacks, such as selfish mining, exploit blockchain consensus protocols by deviating from honest behavior to maximize rewards. However, Markov Decision Process (MDP) analysis of such strategies faces scalability challenges in modern digital economies, including blockchain. To address these limitations, reinforcement learning (RL) provides a scalable alternative, enabling adaptive strategy optimization in complex dynamic environments.
In this survey, we examine RL’s role in strategic mining analysis, comparing it to MDP-based approaches. We begin by reviewing foundational MDP models and their limitations, before exploring RL frameworks that can learn near-optimal strategies across various protocols. Building on this analysis, we compare RL techniques and their effectiveness in deriving security thresholds, such as the minimum attacker power required for profitable attacks. Expanding the discussion further, we classify consensus protocols and propose open challenges, such as multi-agent dynamics and real-world validation.
This survey highlights the potential of reinforcement learning to address the challenges of selfish mining, including protocol design, threat detection, and security analysis, while offering a strategic roadmap for researchers in decentralized systems and AI-driven analytics.
9158: Evaluation of Medical Large Language Models: Taxonomy, Review, and Directions
Authors: Anisio Lacerda, Gisele Pappa, Adriano César Machado Pereira, Wagner Meira Jr, Alexandre Guimarães de Almeida Barros
Location: Montreal
| Day: August 21st
| Time: 15:00
| Session: ML: Large Language Models
The integration of Large Language Models (LLMs) into medicine presents both great opportunities and significant challenges, particularly in ensuring these models are accurate, reliable, and safe. While LLMs have shown impressive capabilities in understanding and generating human language, their application in the medical domain requires careful evaluation due to the critical nature of medical applications which are inherently linked to patient life and health. Current evaluations of LLMs in medicine are often fragmented and insufficient, with a lack of standardized performance metrics, limited use of real patient data, and insufficient attention to important applications, such as documentation, education, and research. Furthermore, traditional NLP-based evaluations are often inadequate for assessing the text generated by LLMs. Therefore, a robust evaluation is essential to ensure the responsible and effective use of LLMs in medical settings, and to address the inherent challenges associated with their implementation. This paper explores the various dimensions of LLM evaluation in the medical domain, proposes a new taxonomy for categorizing medical applications, and discusses directions for future research in this critical area.
9226: Empowering LLMs with Logical Reasoning: A Comprehensive Survey
Authors: Fengxiang Cheng, Haoxuan Li, Fenrong Liu, Robert van Rooij, Kun Zhang, Zhouchen Lin
Location: Montreal
| Day: August 20th
| Time: 14:00
| Session: KR: Logic
Large language models (LLMs) have achieved remarkable successes on various tasks. However, recent studies have found that there are still significant challenges to the logical reasoning abilities of LLMs, which can be categorized into the following two aspects: (1) Logical question answering: LLMs often fail to generate the correct answer to a complex logical problem that requires sophisticated deductive, inductive or abductive reasoning given a collection of premises and constraints. (2) Logical consistency: LLMs are prone to producing responses contradicting themselves across different questions. For example, Macaw, a state-of-the-art question-answering LLM, answers Yes to both "Is a magpie a bird?" and "Does a bird have wings?" but answers No to "Does a magpie have wings?". To facilitate this research direction, we comprehensively investigate the most cutting-edge methods and propose a detailed taxonomy. Specifically, to accurately answer complex logic questions, previous methods can be categorized based on reliance on external solvers, prompts, and fine-tuning. To avoid logical contradictions, we discuss concepts and solutions of various logical consistencies, including implication, negation, transitivity, factuality consistencies, and their composites. In addition, we review commonly used benchmark datasets and evaluation metrics, and discuss promising research directions, such as extending to modal logic to account for uncertainty and developing efficient algorithms that simultaneously satisfy multiple logical consistencies.
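The magpie example corresponds to an implication-consistency check that can be automated over a model's yes/no answers: if "A is a B" and "a B has property P" are both answered Yes, then answering No to "A has property P" is a contradiction. A minimal sketch follows (illustrative only, not the evaluation protocol of any surveyed work; the answer tuple is a stub standing in for actual model outputs).

```python
def is_implication_consistent(answers):
    """Check implication consistency over three yes/no answers.

    `answers` = (a_is_b, b_has_p, a_has_p). If 'A is a B' and 'B has P'
    are both answered Yes, then 'A has P' must not be answered No.
    """
    a_is_b, b_has_p, a_has_p = answers
    return not (a_is_b and b_has_p and not a_has_p)

# Stub standing in for an LLM's answers to
# "Is a magpie a bird?", "Does a bird have wings?", "Does a magpie have wings?".
macaw_style_answers = (True, True, False)
print(is_implication_consistent(macaw_style_answers))  # False: inconsistent
```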