Demo track accepted papers (Montreal)

DM6: SPARC: An AI-Based Speech Processing and Real-Time Correction System Preprint

Authors: TingRay Chung, Pin-Yu Chen

Location: Montreal | Day: August 20th | Time: 11:30 | Session: DEMOS 5

Poster Board Position: From board n120 to board n123

In the world of audio narration and video production, maintaining clear and accurate dialogue is crucial. However, most correction of dubbing mistakes happens in post-production, which is often not applicable to live broadcasts. This project aims to develop a voice correction system that automatically detects and corrects speech errors in near real-time while integrating the adjusted audio into ongoing conversations without disrupting the natural flow. The system uses AI tools such as the Nous Hermes 2-Mistral 7B DPO large language model to first generate the reference script for Coqui’s XTTS-V2 zero-shot text-to-speech voice cloning model. After the correction is generated, it passes through a series of filters that replace the mistake and integrate the corrected audio seamlessly. The experiment’s user survey demonstrates that the corrected audio is of high quality.

DM8: ASP Chef Chats with Large Language Models Preprint

Authors: Mario Alviano, Pietro Macrì, Luis Angel Rodriguez Reiners

Location: Montreal | Day: August 20th | Time: 11:30 | Session: DEMOS 2

Poster Board Position: From board n109 to board n112

ASP Chef enriches Answer Set Programming (ASP) with the notion of recipe, that is, a sequence of operations on answer sets.
Recipes are designed and executed in modern browsers, and further improve the fast prototyping capabilities of ASP.
This paper introduces new operations designed to integrate Large Language Models (LLMs) into recipes, with the aim of combining the reasoning strength of ASP with the natural language capabilities of LLMs, to enable more interactive and adaptive problem-solving workflows.
In a nutshell, answer sets in input are transformed into prompts for LLMs, whose responses are processed to extract facts for subsequent operations within the recipe.
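
The prompt-and-extract round trip described above can be illustrated with a minimal Python sketch. It is not ASP Chef's implementation: the function names, the prompt wording, the fact-matching pattern, and the canned `fake_llm` reply are all assumptions made for illustration.

```python
import re


def answer_set_to_prompt(atoms, instruction):
    """Render the atoms of one answer set as a plain-text prompt (hypothetical format)."""
    facts = "\n".join(f"{atom}." for atom in sorted(atoms))
    return f"{instruction}\n\nFacts:\n{facts}\n\nReply with one ASP fact per line."


# Matches simple ground facts such as `suggest(pasta).` or `rank(pasta, 1).`
FACT_PATTERN = re.compile(r"^([a-z][A-Za-z0-9_]*)\(([^()]*)\)\.$")


def extract_facts(response):
    """Keep only the response lines that look like ground ASP facts."""
    facts = []
    for line in response.splitlines():
        match = FACT_PATTERN.match(line.strip())
        if match:
            name, args = match.groups()
            args = ",".join(part.strip() for part in args.split(","))
            facts.append(f"{name}({args})")
    return facts


def fake_llm(prompt):
    """Hypothetical stand-in for a real LLM call; returns a canned reply."""
    return (
        "Sure, here are my suggestions:\n"
        "suggest(carbonara).\n"
        "suggest(amatriciana).\n"
        "Hope this helps!"
    )


prompt = answer_set_to_prompt(
    {"ingredient(egg)", "ingredient(guanciale)"}, "Suggest dishes."
)
new_facts = extract_facts(fake_llm(prompt))
```

Filtering the reply line by line means conversational filler from the model is dropped, and only well-formed facts reach the subsequent operations.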

DM104: DAVE: A Framework for Assisted Analysis of Document Collections in Knowledge-Intensive Domains Preprint

Authors: Ruben Agazzi, Renzo Alva Principe, Riccardo Pozzi, Marco Ripamonti, Matteo Palmonari

Location: Montreal | Day: August 20th | Time: 11:30 | Session: DEMOS 6

Poster Board Position: From board n124 to board n126

DAVE is a framework for assisting the analysis of documents in knowledge-intensive domains, based on an entity-centric approach supported by annotations of named entities in the documents. DAVE supports search & filtering, document exploration, question answering, and knowledge refinement. It is released as an open-source project that the community can further develop. DAVE’s distinguishing features are: the integration of a chatbot interface based on recent RAG solutions into well-established entity-powered faceted search; the fusion of search and filtering features provided by entity-level annotations with the capability to ask questions on annotated documents; and human-in-the-loop functions to consolidate knowledge while exploring information, allowing users to improve annotations from NLP algorithms.

DM106: A Multimodal AI Dialogue System for Unified Document, Visual, and Audio Interaction Preprint

Authors: Yujun Feng, Jingyi Huang, Yang Zhang

Location: Montreal | Day: August 20th | Time: 11:30 | Session: DEMOS 5

Poster Board Position: From board n120 to board n123

This paper presents a multimodal intelligent dialogue system that seamlessly integrates document analysis, visual media processing, and audio interaction within a unified web interface. The system ensures secure user identity verification through persistent conversational management, leveraging textual document analysis, dynamic context integration, and cross-media interactions via video, image, and real-time speech processing. Our approach introduces three key innovations: (1) context-aware document analysis through text extraction, (2) a multimodal input pipeline supporting images, videos, and audio, and (3) persistent chat history management for maintaining conversational continuity. The system facilitates seamless transitions between audio and text, enabling natural interactions by processing audio input and converting text responses into speech. Additionally, the platform provides an intuitive interface for document uploads, camera capture, and audio recording, while ensuring conversation context is preserved across sessions. This implementation demonstrates the practical integration of multimodal input in an interactive artificial intelligence (AI) system, showcasing its potential for enhanced user engagement and interaction.

DM13: How to Make Reproducible Research in Machine Unlearning with ERASURE Preprint

Authors: Andrea D’Angelo, Claudio Savelli, Gabriele Tagliente, Flavio Giobergia, Elena Baralis, Giovanni Stilo

Location: Montreal | Day: August 20th | Time: 11:30 | Session: DEMOS 6

Poster Board Position: From board n124 to board n126

Machine unlearning, the process of removing specific data influences from Machine Learning models, is critical for complying with regulations like the GDPR’s right to be forgotten and addressing copyright disputes in large models. Despite its rising importance, the field still lacks standardized tools, hindering reproducibility and evaluation. Here, we present ERASURE in detail: a unified framework enabling reproducibility by implementing common unlearning techniques, evaluation metrics, and dedicated datasets.
ERASURE advances research, ensures solution comparability, and facilitates reproducibility, addressing future legal and ethical challenges in data management.

DM21: Search Swarm: Multiagent Large Language Models Framework for E-commerce Product Search Preprint

Authors: Nagim Isyanbaev, Ilya Makarov

Location: Montreal | Day: August 20th | Time: 11:30 | Session: DEMOS 1

Poster Board Position: From board n105 to board n108

Search engines are vital for online e-commerce but often struggle with long, detailed queries. We introduce Search Swarm, a novel multi-agent system designed to improve search engine navigation on platforms like Amazon by accurately locating relevant products based on user instructions. Search Swarm employs multiple large language model (LLM) agents, each with a specific role: query planner, searcher, critic, and attribute selector. These agents collaborate to generate search queries, evaluate results, and identify the best product options tailored to users’ needs. Our framework outperforms existing methods like ReAct and Reflexion in the WebShop environment, achieving a reward score of 62.64, compared to scores of 54.1, 59.8, 61.5, and 58.2 for other approaches. Furthermore, in a comparison with a basic rule-based method on Amazon, Search Swarm achieved a score 38.71 points higher and a 41% greater success rate, demonstrating its superior ability to provide relevant product matches over traditional search engines.

DM23: RobustX: Robust Counterfactual Explanations Made Easy Preprint

Authors: Junqi Jiang, Luca Marzari, Aaryan Purohit, Francesco Leofante

Location: Montreal | Day: August 20th | Time: 11:30 | Session: DEMOS 4

Poster Board Position: From board n117 to board n119

The increasing use of Machine Learning (ML) models to aid decision-making in high-stakes industries demands explainability to facilitate trust. Counterfactual Explanations (CEs) are ideally suited for this, as they can offer insights into the predictions of an ML model by illustrating how changes in its input data may lead to different outcomes. However, for CEs to realise their explanatory potential, significant challenges remain in ensuring their robustness under slight changes in the scenario being explained. Despite the widespread recognition of CEs’ robustness as a fundamental requirement, a lack of standardised tools and benchmarks hinders a comprehensive and effective comparison of robust CE generation methods. In this paper, we introduce RobustX, an open-source Python library implementing a collection of CE generation and evaluation methods, with a focus on the robustness property. RobustX provides interfaces to several existing methods from the literature, enabling streamlined access to state-of-the-art techniques. The library is also easily extensible, allowing fast prototyping of novel robust CE generation and evaluation methods.
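
To make the robustness concern concrete, here is a minimal Python sketch for a linear classifier that predicts the positive class when `w·x + b >= 0`. It does not use RobustX or its API; the function names and the numbers are assumptions. The counterfactual is the closest point (in Euclidean distance) whose score equals a chosen `margin`, and a positive margin keeps the counterfactual valid when the model's bias shifts slightly.

```python
def score(x, weights, bias):
    """Linear model score; the positive class is predicted when score >= 0."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def counterfactual(x, weights, bias, margin):
    """Move x along the weight vector until its score equals `margin`."""
    norm_sq = sum(w * w for w in weights)
    step = (margin - score(x, weights, bias)) / norm_sq
    return [xi + step * w for xi, w in zip(x, weights)]


# Illustrative values only: an input the model currently rejects (score -3).
x = [1.0, 1.0]
weights, bias = [1.0, 2.0], -6.0

plain = counterfactual(x, weights, bias, margin=0.0)   # lands on the boundary
robust = counterfactual(x, weights, bias, margin=0.5)  # lands inside the positive side

# A slightly changed model, e.g. after retraining (assumed shift of -0.3).
shifted_bias = bias - 0.3
plain_still_valid = score(plain, weights, shifted_bias) >= 0
robust_still_valid = score(robust, weights, shifted_bias) >= 0
```

In this toy setting the boundary counterfactual stops being valid after the shift, while the one generated with a margin survives it, at the cost of a larger change to the input.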

DM24: Combining Code Generating Large Language Models and Self-Play to Iteratively Refine Strategies in Games Preprint

Authors: Yoram Bachrach, Edan Toledo, Karen Hambardzumyan, Despoina Magka, Martin Josifoski, Minqi Jiang, Jakob Foerster, Roberta Raileanu, Tatiana Shavrina, Nicola Cancedda, Avraham Ruderman, Katie Millican, Andrei Lupu, Rishi Hazra

Location: Montreal | Day: August 20th | Time: 11:30 | Session: DEMOS 4

Poster Board Position: From board n117 to board n119

We propose a self-play approach to generating strategies for playing in multi-player games, where strategies are represented as computer code. We use large language models (LLMs) to generate pieces of code to play in the game, which we refer to as generated bots. We engage the LLM-generated bots in competitions designed to generate increasingly stronger strategies. We follow game-theoretic principles in organizing these tournaments, and use a Policy Space Response Oracle (PSRO) approach. We start with an initial set of LLM-generated bots and proceed in rounds that add new bots to the population. Each round adds a bot to the population by asking the LLM to produce code for playing against a bot representing the Nash equilibrium mixture over the current population. Our analysis shows that even a few rounds are sufficient to produce strong bots for playing the game. Our demo shows the process for the game of Checkers. We allow users to select initial bots in the population, run the process, inspect how the bots evolve over time, and play against the generated bots.
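
The round structure of a PSRO-style loop can be sketched on a toy game. The Python example below is not the authors' system: rock-paper-scissors stands in for Checkers, a brute-force best-response search over three fixed strategies stands in for the LLM code generator, and fictitious play is an assumed way of approximating the Nash equilibrium mixture.

```python
CANDIDATES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(a, b):
    """Payoff to strategy a against strategy b in this zero-sum toy game."""
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1


def nash_mixture(population, iterations=2000):
    """Approximate the symmetric Nash mixture over the population by fictitious play."""
    counts = [1] * len(population)
    for _ in range(iterations):
        # Integer payoffs against the empirical play counts (the scale does not affect argmax).
        values = [
            sum(payoff(bot, opp) * c for opp, c in zip(population, counts))
            for bot in population
        ]
        counts[values.index(max(values))] += 1
    total = sum(counts)
    return [c / total for c in counts]


def best_response(population, mixture):
    """Oracle stand-in: pick the candidate that does best against the mixture."""
    def value(candidate):
        return sum(p * payoff(candidate, opp) for opp, p in zip(population, mixture))
    return max(CANDIDATES, key=value)


def psro(initial, max_rounds=5):
    """Grow the population until the oracle proposes nothing new."""
    population = list(initial)
    for _ in range(max_rounds):
        mixture = nash_mixture(population)
        new_bot = best_response(population, mixture)
        if new_bot in population:
            break
        population.append(new_bot)
    return population, nash_mixture(population)


population, mixture = psro(["rock"])
```

Starting from a single bot, each round adds the strategy that exploits the current equilibrium mixture, and the loop stops once no candidate outside the population is a best response.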

DM28: Aerial Coverage Path Planning in Nuclear Emergencies Preprint

Authors: Johann Blake, Matthias Schubert

Location: Montreal | Day: August 20th | Time: 11:30 | Session: DEMOS 1

Poster Board Position: From board n105 to board n108

We formulate a Coverage Path Planning (CPP) problem for a helicopter or a UAV tasked with mapping ground-level radiation while avoiding radiation that is too strong. We introduce a simulation environment that incorporates digital elevation models, altitude-dependent measurement footprints and realistic flight constraints, as well as state-of-the-art radiation scenario simulations, such as nuclear explosions, provided by the German Federal Office for Radiation Protection. We highlight the complexity of radiological survey missions and demonstrate the necessity for new CPP approaches that address these unique challenges. The code for our simulation environment is available at https://github.com/JohannBlake/Aerial-Coverage-Path-Planning-in-Nuclear-Emergencies.
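
As a baseline for intuition, a plain lawnmower (boustrophedon) sweep over a grid that skips cells above a dose threshold can be written in a few lines of Python. This is a deliberately naive sketch, not the authors' environment or method: the grid values, the threshold, and the function name are assumptions, and the sketch ignores elevation, measurement footprints, and flight constraints.

```python
def lawnmower_path(dose_grid, max_dose):
    """Visit cells row by row, alternating direction, skipping cells above max_dose.

    The result is only a visiting order: it does not plan the detour needed
    to fly around a skipped cell.
    """
    path = []
    for row_index, row in enumerate(dose_grid):
        columns = range(len(row))
        if row_index % 2 == 1:
            columns = reversed(columns)  # sweep back on odd rows
        for col in columns:
            if row[col] <= max_dose:
                path.append((row_index, col))
    return path


# Illustrative dose values; the hot spot in the centre exceeds the threshold.
dose_grid = [
    [0.1, 0.2, 0.1],
    [0.3, 9.0, 0.2],
    [0.1, 0.1, 0.4],
]
path = lawnmower_path(dose_grid, max_dose=1.0)
coverage = len(path) / sum(len(row) for row in dose_grid)
```

The skipped hot spot leaves a gap in the map and a jump in the path, which hints at why a fixed sweep pattern is a poor fit for this kind of mission.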

DM41: TRIKOP: Exploring Visual Prompting Paradigms for Multi-Grade Knee Osteoarthritis Classification on MRI Images Preprint

Authors: Hieu Phan, Hung Pham, Dat Nguyen, Khoa Le, Tuan Nguyen, Triet Tran, Tho Quan

Location: Montreal | Day: August 20th | Time: 11:30 | Session: DEMOS 3

Poster Board Position: From board n113 to board n116

Knee osteoarthritis (KOA) is a degenerative joint disease that significantly impacts quality of life. While transfer learning shows promise in medical imaging, its application to KOA diagnosis remains challenging due to medical data’s unique characteristics. To address this, we propose TRIKOP, a framework leveraging Visual Prompting for KOA diagnosis on MRI. Our approach explores three prompt-generating strategies that extract clinically relevant information from input images. Each prompt type is encoded using a tailored method to integrate effectively into the Vision Transformer for optimal representation. Among them, the contrastive embedding prompting strategy achieves 63.04% accuracy on the OAI dataset, surpassing prior studies. Moreover, TRIKOP produces attention maps highlighting diagnostically significant regions, improving model interpretability. This work highlights TRIKOP’s potential to improve AI-driven KOA diagnosis and clinical support.

DM46: Fairness-Aware Interactive Target Variable Definition Preprint

Authors: Dalia Gala, Milo Phillips-Brown, Naman Goel, Carina Prunkl, Laura Alvarez Jubete, Medb Corcoran, Ray Eitel-Porter

Location: Montreal | Day: August 20th | Time: 11:30 | Session: DEMOS 5

Poster Board Position: From board n120 to board n123

Machine learning requires defining one’s target variable for predictions or decisions, a process that can have profound implications for fairness, since biases are often encoded in the target variable definition itself, before any data collection or training. The downstream impacts of target variable definitions must be taken into account in order to responsibly develop, deploy, and use algorithmic systems. We propose FairTargetSim (FTS), an interactive, simulation-based approach for this. We demonstrate FTS using the example of algorithmic hiring, grounded in real-world data and user-defined target variables. FTS is open-source; it can be used by algorithm developers, non-technical stakeholders, researchers, and educators in a number of ways. FTS is available at: http://tinyurl.com/ftsinterface. The video accompanying this paper is here: http://tinyurl.com/ijcaifts.

DM58: SAFE: Structured Argumentation for Fact-checking with Explanations Preprint

Authors: Xiaoou Wang, Elena Cabrio, Serena Villata

Location: Montreal | Day: August 20th | Time: 11:30 | Session: DEMOS 2

Poster Board Position: From board n109 to board n112

Explainable fact-checking plays a vital role in the fight against disinformation in today’s digital landscape. With the increasing volume of unverified content online, providing justifications for fact-checking has become essential to help users make informed decisions. While recent studies provide user-friendly explanations through abstractive or extractive summarization, they often assume the availability of human-written fact-checking articles, which is not always the case. This demo introduces SAFE, an argument-based framework designed to enhance both fact-checking and its justification. Specifically, SAFE offers three key features: i) producing argument-structured summaries of human-written fact-checking articles, ii) in the absence of human-written articles, generating structured summaries based on evidence retrieved from a corpus through a jointly trained summarization and evidence retrieval system, and iii) assessing the truthfulness of a claim by analyzing the structured summary.

DM59: Aletheia: Detect, Discuss, and Stay Informed on Fake News Preprint

Authors: Dorsaf Sallami, Esma Aïmeur

Location: Montreal | Day: August 20th | Time: 11:30 | Session: DEMOS 2

Poster Board Position: From board n109 to board n112

In today’s digital era, the rapid spread of fake news undermines both social unity and democratic institutions, demanding effective countermeasures. Current browser extensions to counter fake news have significant limitations, such as opaque models, dependency on traditional Machine Learning (ML) techniques, lack of explanatory features, and limited focus on detection without user engagement support. This paper introduces Aletheia, a novel browser extension that addresses these shortcomings by leveraging Retrieval Augmented Generation (RAG) and Large Language Models (LLMs) to enhance fake news detection and provide evidence-based explanations. Additionally, Aletheia incorporates two key components: a Discussion Hub, enabling users to discuss instances of fake news, and a Stay Informed feature, which displays the latest fact-checks. Aletheia surpasses state-of-the-art methods according to experimental results.

DM69: SandboxSocial: A Sandbox for Social Media Using Multimodal AI Agents Preprint

Authors: Maximilian Puelma Touzel, Sneheel Sarangi, Gayatri Krishnakumar, Busra Tugce Gurbuz, Austin Welch, Zachary Yang, Andreea Musulan, Hao Yu, Ethan Kosak-Hine, Tom Gibbs, Camille Thibault, Reihaneh Rabbany, Jean-François Godbout, Dan Zhao, Kellin Pelrine

Location: Montreal | Day: August 20th | Time: 11:30 | Session: DEMOS 1

Poster Board Position: From board n105 to board n108

The online information ecosystem enables influence campaigns of unprecedented scale and impact. We urgently need empirically grounded approaches to counter the growing threat of malicious campaigns, now amplified by generative AI. But developing defenses in real-world settings is impractical. Social system simulations with agents modelled using Large Language Models (LLMs) are a promising alternative approach and a growing area of research. However, existing simulators lack features needed to capture the complex information-sharing dynamics of platform-based social networks. To bridge this gap, we present SandboxSocial, a new simulator that includes several key innovations, mainly: (1) a virtual social media platform (modelled as Mastodon and mirrored in an actual Mastodon server) that enables a realistic setting in which agents interact; (2) an adapter that uses real-world user data to create more grounded agents and social media content; and (3) multi-modal capabilities that enable our agents to interact using both text and images, just as humans do on social media. We make the simulator more useful to researchers by providing measurement and analysis tools that track simulation dynamics and compute evaluation metrics to compare experimental results.

DM72: Veracity: An Open-Source AI Fact-Checking System Preprint

Authors: Taylor Lynn Curtis, Maximilian Puelma Touzel, William Garneau, Manon Gruaz, Mike Pinder, Li Wei Wang, Sukanya Krishna, Luda Cohen, Jean-François Godbout, Reihaneh Rabbany, Kellin Pelrine

Location: Montreal | Day: August 20th | Time: 11:30 | Session: DEMOS 6

Poster Board Position: From board n124 to board n126

The proliferation of misinformation poses a significant threat to society, exacerbated by the capabilities of generative AI.
This demo paper introduces Veracity, an open-source AI system designed to empower individuals to combat misinformation through transparent and accessible fact-checking. Veracity leverages the synergy between Large Language Models (LLMs) and web retrieval agents to analyze user-submitted claims and provide grounded veracity assessments with intuitive explanations. Key features include multilingual support, numerical scoring of claim veracity, and an interactive interface inspired by familiar messaging applications. This paper will showcase Veracity’s ability to not only detect misinformation but also explain its reasoning, fostering media literacy and promoting a more informed society.

DM76: VRD-IU: Lessons from Visually Rich Document Intelligence and Understanding Preprint

Authors: Yihao Ding, Soyeon Caren Han, Yan Li, Josiah Poon

Location: Montreal | Day: August 20th | Time: 11:30 | Session: DEMOS 3

Poster Board Position: From board n113 to board n116

Visually Rich Document Understanding (VRDU) has emerged as a critical field in document intelligence, enabling automated extraction of key information from complex documents across domains such as medical, financial, and educational applications. However, form-like documents pose unique challenges due to their complex layouts, multi-stakeholder involvement, and high structural variability. To address these issues, the VRD-IU Competition was introduced, focusing on extracting and localizing key information from multi-format forms within the Form-NLU dataset, which includes digital, printed, and handwritten documents.
This paper presents insights from the competition, which featured two tracks: Track A, emphasizing entity-based key information retrieval, and Track B, targeting end-to-end key information localization from raw document images. With over 20 participating teams, the competition showcased various state-of-the-art methodologies, including hierarchical decomposition, transformer-based retrieval, multimodal feature fusion, and advanced object detection techniques. The top-performing models set new benchmarks in VRDU, providing valuable insights into document intelligence.

DM79: MoleculeMiner: Extracting and Linking Molecule Figures with Tabular Metadata Preprint

Authors: Abhisek Dey, Nathaniel H. Stanley

Location: Montreal | Day: August 20th | Time: 11:30 | Session: DEMOS 3

Poster Board Position: From board n113 to board n116

Despite an ongoing shift in automated chemical literature search methods, many remain fairly limited in their ability to find specific, relevant information about a drawn molecule and its associated property data. We aim to tackle the challenge of converting drawn molecules to a machine-readable representation and co-referencing any associated molecule data. MoleculeMiner is a system where a user can feed in their own patent or paper to obtain each drawn molecule along with any specific metadata (chemical name, chemical reactivity, yield, purity, etc.) provided anywhere in the PDF in a tabular format, using an interactive, user-friendly environment. We also present MolScribeV2, a molecular image parser that improves upon the original MolScribe by introducing a pixel-based self-attention positional embedding technique. Along with other changes, MolScribeV2 is robust to the varied styles of compound drawings commonly found in patents and papers, whether scanned or born-digital. Our extraction and interactive user system can be found at https://github.com/insitro/MoleculeMiner.

DM80: MatchXplain: Analyzing Preferences, Explaining Outcomes, and Simplifying Decisions Preprint

Authors: Hadi Hosseini, Yubo Jing, Ronak Singh

Location: Montreal | Day: August 20th | Time: 11:30 | Session: DEMOS 5

Poster Board Position: From board n120 to board n123

Matching markets, where agents are assigned to one another based on preferences and constraints, are fundamental in various AI-driven applications such as school choice, content matching, and recommender systems. A key challenge in these markets is understanding preference data, as the interpretability of algorithmic solutions hinges on accurately capturing and explaining preferences. We introduce MatchXplain, a platform that integrates preference explanation with a robust matching engine. MatchXplain offers a layered approach for explaining preferences, computing diverse matching solutions, and providing interactive visualizations to enhance user understanding. By bridging algorithmic decision-making with explainability, MatchXplain improves transparency and trust in algorithmic matching markets.
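
For readers unfamiliar with matching markets, the classic deferred acceptance (Gale-Shapley) procedure is one well-known way to compute a stable matching from preference lists. The abstract does not say which algorithms MatchXplain's engine uses, so the Python sketch below is only a generic illustration; the student and school names are made up, and complete preference lists on both sides are assumed.

```python
def deferred_acceptance(proposer_prefs, receiver_prefs):
    """One-to-one deferred acceptance; returns a dict mapping proposer -> receiver."""
    # rank[r][p] is the position of proposer p in receiver r's list (lower is better).
    rank = {
        r: {p: i for i, p in enumerate(prefs)}
        for r, prefs in receiver_prefs.items()
    }
    next_choice = {p: 0 for p in proposer_prefs}
    held = {}  # receiver -> proposer currently held
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        if next_choice[p] >= len(proposer_prefs[p]):
            continue  # p has been rejected everywhere and stays unmatched
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = held.get(r)
        if current is None:
            held[r] = p
        elif rank[r][p] < rank[r][current]:
            held[r] = p          # r trades up and releases its previous proposer
            free.append(current)
        else:
            free.append(p)       # r rejects p, who will try the next choice
    return {p: r for r, p in held.items()}


students = {"a": ["x", "y", "z"], "b": ["x", "z", "y"], "c": ["y", "x", "z"]}
schools = {"x": ["b", "a", "c"], "y": ["a", "c", "b"], "z": ["a", "b", "c"]}
matching = deferred_acceptance(students, schools)
```

Even in this three-by-three example, student `c` ends up with a last choice for reasons that are not obvious from the outcome alone, which is the kind of result an explanation layer would need to account for.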

DM84: TimelyMed: AI-Driven Clinical Course Attribution and Temporal Mapping for Psychiatric Medical Records Preprint

Authors: Chien-Hung Chen, Chi-Shin Wu, Chu-Hsien Su, Hsin-Hsi Chen

Location: Montreal | Day: August 20th | Time: 11:30 | Session: DEMOS 2

Poster Board Position: From board n109 to board n112

Timely understanding of a patient’s clinical course is crucial for effective treatment. Extracting course-related information, such as temporal and medical events, from unstructured medical records is both challenging and time-consuming, especially when relying on manual identification by physicians. We introduce TimelyMed, a system powered by a locally deployed large language model (LLM) that ensures data security while efficiently organizing key psychiatric events and their corresponding temporal information. Additionally, our system is attributed, allowing clinicians to not only categorize events but also trace them back to their original textual descriptions, ensuring transparency and interpretability in clinical decision-making. By organizing temporal and medical event information into timelines, our system enables physicians to quickly grasp a patient’s medical history while effectively reducing their cognitive burden.

DM90: Using Planning for Automated Testing of Video Games Preprint

Authors: Tomáš Balyo, Roman Barták, Lukáš Chrpa, Michal Červenka, Filip Dvořák, Stephan Gocht, Lukáš Lipčák, Viktor Macek, Dominik Roháček, Josef Ryzí, Martin Suda, Dominik Šafránek, Slavomír Švancar, G. Michael Youngblood

Location: Montreal | Day: August 20th | Time: 11:30 | Session: DEMOS 3

Poster Board Position: From board n113 to board n116

In this demonstration, we present a system that automates regression testing for video games using automated planning techniques. Traditional test scripts are a common method for testing both video games and software in general. While effective, they require manual creation and frequent updates throughout development, making the process labor-intensive. Our system eliminates this burden by automatically generating and maintaining test scripts. The test engineer only needs to define the game’s rules using the Planning Domain Definition Language (PDDL) and specify initial states and goals for individual test cases. This significantly reduces human effort while ensuring test scripts remain up to date. Additionally, our system integrates with game engine editors, supporting both Unity and Unreal, to execute and evaluate test cases directly within the game. It collects detailed logs, telemetry data, and video recordings, allowing users to review test results efficiently.
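
The idea of deriving a test script from rules, an initial state, and a goal can be sketched with a tiny STRIPS-style planner in Python. This is not the authors' system and does not parse PDDL: the breadth-first search, the `Action` structure, and the key-and-door game rules are all assumptions chosen to keep the example short.

```python
from collections import deque
from typing import NamedTuple


class Action(NamedTuple):
    name: str
    preconditions: frozenset
    add_effects: frozenset
    delete_effects: frozenset


def plan(initial, goal, actions):
    """Breadth-first search for a shortest action sequence reaching the goal."""
    start = frozenset(initial)
    goal = frozenset(goal)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if goal <= state:
            return steps
        for action in actions:
            if action.preconditions <= state:
                successor = (state - action.delete_effects) | action.add_effects
                if successor not in seen:
                    seen.add(successor)
                    queue.append((successor, steps + [action.name]))
    return None  # the goal is unreachable under these rules


# Hypothetical game rules: pick up a key, open the door, walk to the exit.
ACTIONS = [
    Action("pick_up_key", frozenset({"at_start"}), frozenset({"has_key"}), frozenset()),
    Action("open_door", frozenset({"at_start", "has_key"}), frozenset({"door_open"}), frozenset()),
    Action("walk_to_exit", frozenset({"at_start", "door_open"}), frozenset({"at_exit"}), frozenset({"at_start"})),
]

test_script = plan({"at_start"}, {"at_exit"}, ACTIONS)
```

If a rule changes during development, for example the door no longer needs a key, rerunning the planner yields an updated script without anyone editing it by hand.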

DM93: NatSTV: Towards Verification of Natural Strategic Ability Preprint

Authors: Mateusz Kamiński, Damian Kurpiewski, Wojciech Jamroga

Location: Montreal | Day: August 20th | Time: 11:30 | Session: DEMOS 1

Poster Board Position: From board n105 to board n108

We present NatSTV, a tool for approximate verification of natural strategic ability in multi-agent systems. The tool builds on our model checker STV (STrategic Verifier), and implements heuristic synthesis of natural strategies for asynchronous agents with imperfect information and recall. All of this is available through a web interface, with no need for the user to install or configure the software.

DM99: Machine Learning Driven Optimization of Fe-Based TMCs for Photodynamic Therapy Preprint

Authors: Vladimir Manuilov, Antonio Francés-Monerris, Abdelazim M.A. Abdelgawwad, Daniel Escudero, Ilya Makarov

Location: Montreal | Day: August 20th | Time: 11:30 | Session: DEMOS 4

Poster Board Position: From board n117 to board n119

Noble metal-based photoactive complexes have applications in photodynamic therapy (PDT), but their toxicity and high cost drive interest in sustainable and cheaper alternatives like iron-based compounds. In this paper, quantum chemistry and classical molecular dynamics were employed to characterize the photophysical properties and non-covalent interactions with DNA of two Fe(III) complexes. We explained the absorption at IR wavelengths by bright ligand-to-metal transitions and showed that the complexes exhibit persistent, albeit modest, interaction with DNA. Building on these traditional simulation methods, we propose a conceptual ML-driven optimization module designed to refine the structure of iron complexes and enhance their photophysical features. While the framework is not yet implemented, we demonstrate that key properties relevant for PDT can be computationally evaluated, providing a foundation for future iterative optimization. The ML module integrates 3D molecular structures, simulation results, and quantum chemical insights to suggest modifications aimed at shifting the absorption spectrum more favorably into the visible range, improving their suitability for phototherapies.