DM4: GE-Chat: A Graph Enhanced RAG Framework for Evidential Response Generation of LLMs
Authors: Longchao Da, Parth Mitesh Shah, Kuan-Ru Liou, Jiaxing Zhang, Hua Wei
Location: Guangzhou
| Day: August 29th
| Time: 15:45
| Session: DEMOS1.1
Large Language Models (LLMs) have become integral to human decision-making processes. However, their outputs are not always reliable, often requiring users to manually assess the accuracy of the information provided. This issue is exacerbated by hallucinated responses, which are frequently presented with convincing but incorrect explanations, leading to trust concerns among users. To address this challenge, we propose GE-Chat, a knowledge Graph-enhanced retrieval-augmented generation framework designed to deliver Evidence-based responses. Specifically, when users upload a document, GE-Chat constructs a knowledge graph to support a retrieval-augmented agent, enriching the agent's responses with external knowledge beyond its training data. We further incorporate Chain-of-Thought (CoT) reasoning, n-hop subgraph searching, and entailment-based sentence generation to ensure accurate evidence retrieval. Experimental results demonstrate that our approach improves the ability of existing models to identify precise evidence in free-form contexts, offering a reliable mechanism for verifying LLM-generated conclusions and enhancing trustworthiness.
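The n-hop subgraph search mentioned above can be pictured with a short sketch: starting from entities matched in the user's query, collect every triple within n hops and hand that subgraph to the agent as candidate evidence. This is a minimal illustration under assumed data structures, not GE-Chat's actual implementation; the function and toy triples are hypothetical.

```python
from collections import defaultdict

def n_hop_subgraph(triples, seed_entities, n):
    """Collect all (head, relation, tail) triples within n hops of the seeds."""
    neighbors = defaultdict(list)  # entity -> incident triples
    for head, rel, tail in triples:
        neighbors[head].append((head, rel, tail))
        neighbors[tail].append((head, rel, tail))
    frontier, visited, subgraph = set(seed_entities), set(seed_entities), set()
    for _ in range(n):
        next_frontier = set()
        for entity in frontier:
            for head, rel, tail in neighbors[entity]:
                subgraph.add((head, rel, tail))
                for node in (head, tail):
                    if node not in visited:
                        visited.add(node)
                        next_frontier.add(node)
        frontier = next_frontier
    return subgraph

# Hypothetical toy knowledge graph extracted from an uploaded document.
triples = [("aspirin", "treats", "headache"),
           ("aspirin", "interacts_with", "warfarin"),
           ("warfarin", "treats", "thrombosis")]
print(n_hop_subgraph(triples, {"aspirin"}, n=2))  # reaches thrombosis in 2 hops
```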
DM25: OpenIAI-SNIO: A Systematic AR-Based Assembly Guidance System for Small-Scale, High-Density Industrial Components
Authors: Yuntao Wang, Yu Cheng, Junhao Geng
Location: Guangzhou
| Day: August 29th
| Time: 15:45
| Session: DEMOS1.1
This paper develops OpenIAI-SNIO, an AR-based assembly guidance system for small-scale, high-density industrial components (SHIC). It addresses the inability of existing AR technology to provide complete, accurate, and stable visual cognition and assembly operation guidance for SHIC. OpenIAI-SNIO combines artificial intelligence methods such as computer vision and deep learning with rule-based reasoning and augmented reality to achieve adaptive, whole-process, and precise guidance of SHIC assembly even when visual information is insufficient. An application case shows that OpenIAI-SNIO can effectively improve the efficiency and quality of SHIC assembly and reduce operator workload, realizing a systematic and practical application of AR technology in SHIC assembly.
DM27: Tsururu: A Python-based Time Series Forecasting Strategies Library
Authors: Alina Kostromina, Kseniia Kuvshinova, Aleksandr Yugay, Andrey Savchenko, Dmitry Simakov
Location: Guangzhou
| Day: August 29th
| Time: 15:45
| Session: DEMOS1.2
While current time series research focuses on developing new models, the crucial question of selecting an optimal approach for training such models remains underexplored. Tsururu, the Python library introduced in this paper, bridges state-of-the-art research and industry by enabling flexible combinations of global and multivariate approaches with multi-step-ahead forecasting strategies. It also enables seamless integration with various forecasting models. The library is available at https://github.com/sb-ai-lab/tsururu.
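One of the multi-step-ahead strategies such a library can expose is the classic recursive scheme: fit a one-step model on lag features, then feed each prediction back as input for the next step. The sketch below is a generic illustration of that strategy, not Tsururu's actual API; all names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def recursive_forecast(series, lags, horizon):
    """Fit a one-step-ahead model on lag features, then roll it forward."""
    X = np.array([series[i - lags:i] for i in range(lags, len(series))])
    y = series[lags:]
    model = LinearRegression().fit(X, y)
    history = list(series[-lags:])
    predictions = []
    for _ in range(horizon):
        step = model.predict([history[-lags:]])[0]  # one step ahead
        predictions.append(step)
        history.append(step)                        # feed prediction back in
    return predictions

series = np.arange(50, dtype=float)  # toy linear trend
print(recursive_forecast(series, lags=5, horizon=3))  # ~[50.0, 51.0, 52.0]
```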
DM43: PCToolkit: A Unified Plug-and-Play Prompt Compression Toolkit of Large Language Models
Authors: Zheng Zhang, Jinyi Li, Yihuai Lan, Xiang Wang, Hao Wang
Location: Guangzhou
| Day: August 29th
| Time: 15:45
| Session: DEMOS1.2
Prompt engineering enables Large Language Models (LLMs) to perform a variety of tasks. However, lengthy prompts significantly increase computational complexity and economic costs. To address this issue, prompt compression reduces prompt length while maintaining LLM response quality. To support rapid implementation and standardization, we present the Prompt Compression Toolkit (PCToolkit), a unified plug-and-play framework for LLM prompt compression. PCToolkit integrates state-of-the-art compression algorithms, benchmark datasets, and evaluation metrics, enabling systematic performance analysis. Its modular architecture simplifies customization, offering portable interfaces for seamless incorporation of new datasets, metrics, and compression methods. Our code is available at https://github.com/3DAgentWorld/Toolkit-for-Prompt-Compression. Our demo is at https://huggingface.co/spaces/CjangCjengh/Prompt-Compression-Toolbox.
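The plug-and-play design described above suggests a small common interface that every compressor implements, so benchmarks can swap methods freely. The sketch below illustrates that pattern with a naive truncation baseline; it is not PCToolkit's real class hierarchy, and all names are hypothetical.

```python
from abc import ABC, abstractmethod

class PromptCompressor(ABC):
    """Common interface a plug-and-play toolkit can standardize on."""
    @abstractmethod
    def compress(self, prompt: str, ratio: float) -> str: ...

class TruncationCompressor(PromptCompressor):
    """Naive baseline: keep the first `ratio` fraction of whitespace tokens."""
    def compress(self, prompt: str, ratio: float) -> str:
        tokens = prompt.split()
        return " ".join(tokens[: max(1, int(len(tokens) * ratio))])

def benchmark(compressor: PromptCompressor, prompts, ratio=0.5):
    # A real harness would also query an LLM and score response quality;
    # here we only report token counts before and after compression.
    return [(len(p.split()), len(compressor.compress(p, ratio).split()))
            for p in prompts]

print(benchmark(TruncationCompressor(), ["please summarize this long report"]))
```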
DM54: SkyRover: A Modular Simulator for Cross-Domain Pathfinding
Authors: Wenhui Ma, Wenhao Li, Bo Jin, Changhong Lu, Xiangfeng Wang
Location: Guangzhou
| Day: August 29th
| Time: 15:45
| Session: DEMOS1.3
Unmanned Aerial Vehicles (UAVs) and Automated Guided Vehicles (AGVs) increasingly collaborate in logistics, surveillance, and inspection tasks.
However, existing simulators often focus on a single domain, limiting cross-domain study.
This paper presents SkyRover, a modular simulator for UAV-AGV multi-agent pathfinding (MAPF).
SkyRover supports realistic agent dynamics, configurable 3D environments, and convenient APIs for external solvers and learning methods.
By unifying ground and aerial operations, it facilitates cross-domain algorithm design, testing, and benchmarking.
Experiments highlight SkyRover’s capacity for efficient pathfinding and high-fidelity simulations in UAV-AGV coordination.
We believe SkyRover fills a key gap in MAPF research.
The project is available at https://sites.google.com/view/mapf3d/home.
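A simulator like this typically exposes a 3D occupancy grid to external solvers. As a minimal, hypothetical illustration (not SkyRover's API), the sketch below runs breadth-first search on a small 3D grid, where AGVs would be confined to the ground plane while UAVs may climb over obstacles.

```python
from collections import deque

def bfs_3d(start, goal, blocked, size):
    """Shortest path on a size**3 occupancy grid with 6-connected moves."""
    moves = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    queue, parent = deque([start]), {start: None}
    while queue:
        cell = queue.popleft()
        if cell == goal:  # reconstruct the path back to the start
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        for dx, dy, dz in moves:
            nxt = (cell[0] + dx, cell[1] + dy, cell[2] + dz)
            if (all(0 <= c < size for c in nxt)
                    and nxt not in blocked and nxt not in parent):
                parent[nxt] = cell
                queue.append(nxt)
    return None  # no path exists

# A UAV climbs to z=1 to clear ground obstacles an AGV could not pass.
print(bfs_3d((0, 0, 0), (2, 2, 1), blocked={(1, 0, 0), (0, 1, 0)}, size=3))
```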
DM57: What If LLMs Can Smell: A Prototype
Authors: Xueyi Zhou, Qi Lu, Dong-Kyu Chae
Location: Guangzhou
| Day: August 30th
| Time: 16:30
| Session: DEMOS2.1
Olfaction is hardly mentioned in studies of multi-modal Large Language Models (LLMs). This demo presents a prototypical framework for equipping prevalent LLMs with smelling ability through a plug-and-play olfactory signal processing service. To this end, we collect a dataset on Korean beers using self-developed electronic noses (e-noses), complemented by an open-source dataset. An olfaction-related question-answering corpus is also generated to fine-tune LLMs. A gas classification model is applied to the e-nose data to identify the liquor being smelled. We then adopt and fine-tune LLMs on the generated datasets. The results show that LLMs under this framework can interact with the environment through their "nose" and provide olfaction-related answers augmented by our dataset. To the best of our knowledge, this is the first work on embodying LLMs with artificial olfaction. We additionally deployed the gas classification model and the trained LLM in a simple web-based system to show the feasibility of our prototype. Our demo video can be found at: https://bit.ly/4j8x6ZY.
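The two-stage pipeline described above, classifying an e-nose reading and then injecting the predicted smell into the LLM prompt, can be sketched as follows. The sensor vectors, class labels, and classifier choice are all invented for illustration; the authors' trained models are not shown.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical e-nose readings: one vector of gas-sensor channels per sample.
readings = [[0.9, 0.1, 0.3], [0.8, 0.2, 0.4], [0.1, 0.9, 0.7], [0.2, 0.8, 0.6]]
labels = ["lager", "lager", "stout", "stout"]
clf = KNeighborsClassifier(n_neighbors=1).fit(readings, labels)

def olfactory_prompt(sensor_vector, question):
    """Augment a user question with the classified smell before querying an LLM."""
    smell = clf.predict([sensor_vector])[0]
    return (f"My electronic nose classifies the current smell as '{smell}'. "
            f"{question}")

print(olfactory_prompt([0.85, 0.15, 0.35], "What beer style am I smelling?"))
```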
DM63: MedDiT: A Knowledge-Controlled Diffusion Transformer Framework for Dynamic Medical Image Generation in Virtual Simulated Patient
Authors: Yanzeng Li, Cheng Zeng, Jinchao Zhang, Jie Zhou, Lei Zou
Location: Guangzhou
| Day: August 30th
| Time: 16:30
| Session: DEMOS2.1
Medical education relies heavily on Simulated Patients (SPs) to provide a safe environment for students to practice clinical skills, including medical image analysis. However, the high cost of recruiting qualified SPs and the lack of diverse medical imaging datasets have presented significant challenges. To address these issues, this paper introduces MedDiT, a novel knowledge-controlled conversational framework that can dynamically generate plausible medical images aligned with simulated patient symptoms, enabling diverse diagnostic skill training. Specifically, MedDiT integrates various patient Knowledge Graphs (KGs), which describe the attributes and symptoms of patients, to dynamically prompt Large Language Models’ (LLMs) behavior and control the patient characteristics, mitigating hallucination during medical conversation. Additionally, a well-tuned Diffusion Transformer (DiT) model is incorporated to generate medical images according to the specified patient attributes in the KG. In this paper, we present the capabilities of MedDiT through a practical demonstration, showcasing its ability to act in diverse simulated patient cases and generate the corresponding medical images. This can provide an abundant and interactive learning experience for students, advancing medical education by offering an immersive simulation platform for future healthcare professionals. The work sheds light on the feasibility of incorporating advanced technologies like LLM, KG, and DiT in education applications, highlighting their potential to address the challenges faced in simulated patient-based medical education.
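The knowledge-controlled prompting described above can be sketched as two templates driven by the same patient KG: one pins the LLM to the recorded facts, the other conditions the image generator. The KG schema and templates below are hypothetical simplifications; MedDiT's LLM and DiT models are not invoked here.

```python
# Hypothetical patient knowledge graph entry.
patient_kg = {
    "age": 67,
    "symptoms": ["persistent cough", "chest pain"],
    "imaging_finding": "right lower lobe opacity on chest X-ray",
}

def persona_prompt(kg):
    """System prompt that pins the simulated patient to KG facts,
    mitigating hallucination during the medical conversation."""
    return ("You are a simulated patient. Only report the facts below.\n"
            f"Age: {kg['age']}. Symptoms: {', '.join(kg['symptoms'])}.")

def image_prompt(kg):
    """Conditioning text a diffusion image generator would receive."""
    return f"chest X-ray, {kg['imaging_finding']}, radiology style"

print(persona_prompt(patient_kg))
print(image_prompt(patient_kg))
```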
DM68: PyTorch-Lifestream: Learning Embeddings on Discrete Event Sequences
Authors: Artem Sakhno, Ivan Kireev, Dmitrii Babaev, Maxim Savchenko, Gleb Gusev, Andrey Savchenko
Location: Guangzhou
| Day: August 30th
| Time: 16:30
| Session: DEMOS2.1
The domain of event sequences is widely applied in various industrial tasks in banking, healthcare, etc., where temporal tabular data processing is required. This paper introduces PyTorch-Lifestream, the first open-source library specially designed to handle event sequences. It supports scenarios with multimodal data and offers a variety of techniques for learning embeddings of event sequences and end-to-end model training. Furthermore, PyTorch-Lifestream efficiently implements state-of-the-art methods for event sequence analysis and adapts approaches from similar domains, thus enhancing the versatility and performance of sequence-based models for a wide range of applications, including financial risk scoring, campaigning, user ID matching, churn prediction, fraud detection, medical diagnostics, and recommender systems.
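The core task the library addresses, turning a discrete event sequence into a fixed-size embedding, can be illustrated with a tiny PyTorch encoder. This is a generic sketch, not PyTorch-Lifestream's own classes; its actual API and training objectives are not shown.

```python
import torch
import torch.nn as nn

class EventSeqEncoder(nn.Module):
    """Embed event-type IDs, then summarize the sequence with a GRU."""
    def __init__(self, n_event_types, emb_dim=16, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(n_event_types, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden, batch_first=True)

    def forward(self, event_ids):           # (batch, seq_len) int64 tensor
        _, h = self.rnn(self.embed(event_ids))
        return h.squeeze(0)                 # (batch, hidden) embedding

# Toy batch: two transaction-type sequences of length 5.
batch = torch.randint(0, 10, (2, 5))
print(EventSeqEncoder(n_event_types=10)(batch).shape)  # torch.Size([2, 32])
```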
DM77: Taking STEPS Forward: Enhancing Online Peer-Counseling with Schema Therapy via Socratic Questioning
Authors: Beng Heng Ang, Sujatha Das Gollapalli, See-Kiong Ng
Location: Guangzhou
| Day: August 30th
| Time: 16:30
| Session: DEMOS2.2
Peer-counseling is essential in online mental health communities to provide relatable support to those seeking help, but peer-counselors often lack the professional training in therapeutic counseling needed to produce the desired cognitive changes. In this paper, we present STEPS, an AI-powered assistive dialog tool for peer-counseling. Unlike other existing tools, STEPS assists peer-counselors in facilitating cognitive change in online counseling settings. Towards this goal, we emulate two key phases of a Schema Therapy-based in-person counseling session: (1) Schema Assessment, to uncover the deep-seated irrational beliefs underlying an individual's mental health problems, and (2) Cognitive Change, to reframe these beliefs into healthier alternatives. In both phases, we employ Socratic questioning techniques to effectively elicit critical introspection and guide cognitive change. We describe STEPS and present expert evaluation studies of its counseling conversations on real-world mental health forum posts. Our results indicate that STEPS significantly outperforms competitive baselines on all key metrics related to schema assessment, cognitive change strategies, and critical thinking, achieving an impressive average rating of 5 out of 6, highlighting its strong potential as a transformative tool for online peer-counseling.
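The two-phase flow can be pictured as a simple controller that draws Socratic prompts from a phase-specific question bank and switches phases once a schema has been identified. This is a deliberately crude sketch of the idea; STEPS's actual question generation and phase-transition logic are not shown, and the question banks are invented.

```python
ASSESSMENT_QS = ["What belief about yourself does this situation bring up?",
                 "When did you first notice this belief?"]
CHANGE_QS = ["What evidence might not fit that belief?",
             "How would you advise a friend who held the same belief?"]

def next_question(turn_index, schema_identified):
    """Pick a Socratic prompt; switch banks once a schema is uncovered."""
    bank = CHANGE_QS if schema_identified else ASSESSMENT_QS
    return bank[turn_index % len(bank)]

print(next_question(0, schema_identified=False))  # assessment phase
print(next_question(0, schema_identified=True))   # cognitive-change phase
```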
DM85: Automated Decision-Making on Networks with LLMs through Knowledge-Guided Evolution
Authors: Xiaohan Zheng, Lanning Wei, Yong Li, Quanming Yao
Location: Guangzhou
| Day: August 29th
| Time: 15:45
| Session: DEMOS1.1
Effective decision-making on networks often relies on learning from graph-structured data, where Graph Neural Networks (GNNs) play a central role, yet they take considerable effort to configure and tune. In this demo, we propose LLMNet, showing how to automate GNN design through Large Language Models. Our system develops a set of agents that construct graph-related knowledge bases and then leverages Retrieval-Augmented Generation (RAG) to support automated configuration and refinement of GNN models through a knowledge-guided evolution process. These agents, equipped with specialized knowledge bases, extract insights into tasks and graph structures by interacting with the knowledge bases. Empirical results show LLMNet excels on twelve datasets across three graph learning tasks, validating its effectiveness in GNN model design.
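The knowledge-guided evolution loop can be sketched as hill-climbing over a GNN configuration space, where retrieved knowledge narrows the mutation choices. The search space, scoring proxy, and "knowledge" rules below are purely illustrative stand-ins for LLMNet's RAG-backed agents.

```python
import random

SEARCH_SPACE = {"layer": ["GCN", "GAT", "SAGE"], "depth": [2, 3, 4]}

def mutate(config, knowledge):
    """Mutate one field, preferring values the retrieved knowledge recommends."""
    key = random.choice(list(SEARCH_SPACE))
    choices = knowledge.get(key, SEARCH_SPACE[key])
    return {**config, key: random.choice(choices)}

def evolve(score, knowledge, generations=20):
    best = {"layer": "GCN", "depth": 2}
    for _ in range(generations):
        candidate = mutate(best, knowledge)
        if score(candidate) > score(best):
            best = candidate
    return best

# Toy proxy score; the "knowledge" says attention layers suit this task.
score = lambda cfg: (cfg["layer"] == "GAT") + 0.1 * cfg["depth"]
print(evolve(score, knowledge={"layer": ["GAT", "SAGE"]}))
```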
DM87: A Smart Multimodal Healthcare Copilot with Powerful LLM Reasoning
Authors: Xuejiao Zhao, Siyan Liu, Su-Yin Yang, Chunyan Miao
Location: Guangzhou
| Day: August 29th
| Time: 15:45
| Session: DEMOS1.3
Misdiagnosis causes significant harm to healthcare systems worldwide, leading to increased costs and patient risks. MedRAG is a smart multimodal healthcare copilot equipped with powerful large language model (LLM) reasoning, designed to enhance medical decision-making. It supports multiple input modalities, including non-intrusive voice monitoring, general medical queries, and electronic health records. MedRAG provides recommendations on diagnosis, treatment, medication, and follow-up questioning. Leveraging retrieval-augmented generation enhanced by knowledge graph-elicited reasoning, MedRAG retrieves and integrates critical diagnostic insights, reducing the risk of misdiagnosis. It has been evaluated on both public and private datasets, outperforming existing models and offering more specific and accurate healthcare assistance. A demonstration video of MedRAG is available at: https://www.youtube.com/watch?v=PNIBDMYRfDM. The source code is available at: https://github.com/SNOWTEAM2023/MedRAG.
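One way to picture the knowledge graph-elicited reasoning described above is a retrieval step that ranks candidate diagnoses by how well the observed symptoms match each disease's KG neighborhood, before handing the top candidates to the LLM. The toy KG and scoring rule below are invented for illustration and are not MedRAG's actual retriever.

```python
# Hypothetical disease-to-symptom knowledge graph.
DISEASE_KG = {
    "migraine": {"headache", "nausea", "light sensitivity"},
    "tension headache": {"headache", "neck stiffness"},
    "flu": {"fever", "headache", "fatigue"},
}

def rank_diagnoses(observed, kg):
    """Score each candidate by the fraction of its KG symptoms observed."""
    scores = {d: len(observed & symptoms) / len(symptoms)
              for d, symptoms in kg.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])

observed = {"headache", "nausea"}
ranked = rank_diagnoses(observed, DISEASE_KG)
print(f"Observed: {sorted(observed)}. Top KG candidates: {ranked[:2]}.")
```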
DM88: Conversational Exploration of Literature Landscape with LitChat
Authors: Mingyu Huang, Shasha Zhou, Yuxuan Chen, Ke Li
Location: Guangzhou
| Day: August 30th
| Time: 16:30
| Session: DEMOS2.2
We are living in an era of "big literature", where the volume of digital scientific publications is growing exponentially. While offering new opportunities, this also poses challenges for understanding literature landscapes, as traditional manual reviewing is no longer feasible. Recent large language models (LLMs) have shown strong capabilities for literature comprehension, yet they are incapable of offering the "comprehensive, objective, open and transparent" views desired by systematic reviews due to their limited context windows and trust issues like hallucinations. Here we present LitChat, an end-to-end, interactive and conversational literature agent that augments LLM agents with data-driven discovery tools to facilitate literature exploration. LitChat automatically interprets user queries, retrieves relevant sources, constructs knowledge graphs, and employs diverse data-mining techniques to generate evidence-based insights addressing user needs. We illustrate the effectiveness of LitChat via a case study on AI4Health, highlighting its capacity to quickly navigate users through a large-scale literature landscape with data-based evidence that would be infeasible to obtain by traditional means.
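One of the data-mining tools such an agent can call is a keyword co-occurrence analysis over retrieved papers, surfacing which topics cluster together in a literature landscape. The sketch below illustrates that tool on invented placeholder abstracts; it is not LitChat's actual pipeline.

```python
from collections import Counter
from itertools import combinations

# Hypothetical keyword sets extracted from retrieved abstracts.
abstracts = [
    {"keywords": {"LLM", "diagnosis", "radiology"}},
    {"keywords": {"LLM", "diagnosis", "triage"}},
    {"keywords": {"radiology", "segmentation"}},
]

def cooccurrence_edges(docs):
    """Count how often each keyword pair appears in the same abstract."""
    edges = Counter()
    for doc in docs:
        for pair in combinations(sorted(doc["keywords"]), 2):
            edges[pair] += 1
    return edges.most_common()

print(cooccurrence_edges(abstracts))  # ('LLM', 'diagnosis') ranks first
```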
DM97: HealthLens: A Natural Language Querying System for Interactive Visualization of Electronic Health Records
Authors: Haodi Zhang, Siqi Ning, Qiyong Zheng, Yuanfeng Song, Liang-Jie Zhang
Location: Guangzhou
| Day: August 30th
| Time: 16:30
| Session: DEMOS2.2
Electronic medical records (EMRs) are an essential part of modern healthcare systems, yet extracting valuable insights from them remains challenging due to the complexity of structured and unstructured data. Data visualization is essential for transforming complex data into comprehensible visuals that enable professionals to identify patterns and trends. This process involves selecting data attributes, transforming the data, choosing appropriate visual encoding methods, and rendering graphical representations using declarative visualization languages (DVLs). However, achieving proficiency in DVLs requires a deep understanding of domain-specific data and expertise in these languages, which poses a significant barrier for beginners and non-technical users. To address these challenges, we present HealthLens, the first user-friendly visualization tool in the EMR domain that eliminates the need for prior knowledge of DVLs. Built on our MedCodeT5 model and leveraging a large language model with a bilevel optimization approach, HealthLens enables the generation of EMR visualizations from natural language queries. This demonstrates the feasibility of creating sophisticated visualizations with minimal technical expertise, advancing accessibility in the EMR field.
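The natural-language-to-visualization step amounts to translating a query into a DVL specification such as Vega-Lite. The keyword rules below are a hypothetical stand-in for HealthLens's MedCodeT5 model and only illustrate what an emitted spec can look like.

```python
import json

def nl_to_vegalite(query, table, x_field, y_field):
    """Emit a Vega-Lite spec; 'trend' queries get a line mark over time."""
    mark = "line" if ("trend" in query or "over time" in query) else "bar"
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"name": table},
        "mark": mark,
        "encoding": {
            "x": {"field": x_field,
                  "type": "temporal" if mark == "line" else "nominal"},
            "y": {"field": y_field, "aggregate": "mean",
                  "type": "quantitative"},
        },
    }

spec = nl_to_vegalite("show the trend of blood pressure over time",
                      "vitals", "date", "systolic_bp")
print(json.dumps(spec, indent=2))
```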