Tutorials (Guangzhou)

T2
LLM-based Role-Playing from the Perspective of Hallucinations
Yan Wang, Jiaan Wang, Leyang Cui, Xinting Huang, Hongqiu Wu, Nuo Chen, Deng Cai, Shuming Shi
https://ijcai-roleplay.github.io/
Role-playing tasks in NLP require deliberate hallucinations to create immersive fictional scenarios, unlike conventional tasks that prioritize factual accuracy. LLMs must generate narratives adhering to explicit boundaries (e.g., personas) and implicit constraints (e.g., temporal knowledge), balancing creative storytelling with strict limitations. Such "controlled hallucination" must avoid both under-hallucination (rigid interactions) and excessive deviation (factual inconsistencies). This tutorial connects theory and practice, exploring how to achieve controlled hallucination without sacrificing reliability. It examines methods across the LLM lifecycle, including pre-training reinforcement, fine-tuning for coherence, preference alignment for boundary adherence, and decoding adjustments for rule-bound creativity. By emphasizing real-world applications, it shows how to develop role-playing agents that blend imaginative storytelling with user-defined safeguards.
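As a concrete illustration of the decoding-time end of this lifecycle, the sketch below masks tokens that would violate an implicit constraint (here, a character's era) before sampling. The vocabulary, logits, and forbidden-token list are toy values invented for illustration, not material from the tutorial.

```python
# Minimal sketch of decoding-time control for role-playing: tokens that
# would break an implicit constraint (e.g. knowledge postdating the
# character's era) are masked out before sampling. Toy vocabulary and
# logits; not a method prescribed by the tutorial.
import numpy as np

vocab = ["sword", "carriage", "smartphone", "castle", "internet"]
logits = np.array([2.0, 1.5, 3.0, 1.0, 2.5])   # raw model scores (toy)
anachronistic = {"smartphone", "internet"}      # violates the persona's era

masked = logits.copy()
for i, tok in enumerate(vocab):
    if tok in anachronistic:
        masked[i] = -np.inf                     # forbid the token outright

# Softmax over the remaining tokens only.
probs = np.exp(masked - masked[~np.isinf(masked)].max())
probs[np.isinf(masked)] = 0.0
probs /= probs.sum()

rng = np.random.default_rng(0)
print(vocab[rng.choice(len(vocab), p=probs)])   # samples only in-character tokens
```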

Schedule: August 29th PM – Full afternoon
T4
Empowering LLMs with Logical Reasoning: Challenges, Solutions, and Opportunities
Haoxuan Li, Fengxiang Cheng, Fenrong Liu, Robert van Rooij, Kun Zhang, Zhouchen Lin
https://sites.google.com/view/ijcai25-tutorial-logicllm
Large language models (LLMs) have achieved remarkable successes on various tasks. However, recent studies have found that there are still significant challenges to the logical reasoning abilities of LLMs, which can be categorized into the following two aspects: (1) Logical question answering: LLMs often fail to generate the correct answer to a complex logical problem that requires sophisticated deductive, inductive, or abductive reasoning over a collection of premises and constraints. (2) Logical consistency: LLMs are prone to producing responses that contradict themselves across different questions. For example, the state-of-the-art question-answering LLM Macaw answers “Yes” to both “Is a magpie a bird?” and “Does a bird have wings?” but answers “No” to “Does a magpie have wings?”. In this tutorial, we comprehensively introduce the most cutting-edge methods under a newly proposed taxonomy. Specifically, for accurately answering complex logic questions, previous methods can be categorized by their reliance on external solvers, prompts, and fine-tuning. To avoid logical contradictions, we discuss concepts and solutions for various forms of logical consistency, including implication, negation, transitivity, and factuality consistency, as well as their composites. In addition, we review commonly used benchmark datasets and evaluation metrics, and discuss promising research directions, such as extending to modal logic to account for uncertainty and developing efficient algorithms that simultaneously satisfy multiple logical consistency requirements.
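To make the consistency notion concrete, the following minimal sketch replays the Macaw example and checks implication consistency: if a model affirms every premise, it should also affirm the entailed conclusion. The `ask` function is a hypothetical stand-in for querying an LLM with a yes/no question and simply replays the answers reported above.

```python
# Minimal sketch of an implication-consistency check, illustrating the
# Macaw example. `ask` is a hypothetical stand-in for querying an LLM;
# here it just replays the reported answers.

def ask(question: str) -> bool:
    reported = {
        "Is a magpie a bird?": True,
        "Does a bird have wings?": True,
        "Does a magpie have wings?": False,
    }
    return reported[question]

def implication_consistent(premise_qs, conclusion_q) -> bool:
    # If the model affirms every premise, implication consistency
    # requires that it also affirm the conclusion.
    if all(ask(q) for q in premise_qs):
        return ask(conclusion_q)
    return True  # premises not all affirmed: no constraint to violate

print(implication_consistent(
    ["Is a magpie a bird?", "Does a bird have wings?"],
    "Does a magpie have wings?",
))  # False -> the reported answers are logically inconsistent
```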

Schedule: August 29th PM – Full afternoon
T5
Large Language Models for Recommendation
Zhi Zheng, Likang Wu, Xian Wu, Hui Xiong
https://zhengzhi-1997.github.io/LLM4Rec-Tutorial/
In this tutorial, we aim to provide a comprehensive review and discussion of the intersection between LLMs and recommender systems. By offering a clear overview of recent advances and research directions, the tutorial will equip both academic researchers and industry practitioners with the knowledge necessary to leverage LLMs in creating more effective and trustworthy recommender systems.

Schedule: August 29th PM – Before coffee break
T9
Multimodal Large Language Model for Visually Rich Document Understanding
Yihao Ding, Feiqi Cao, Siqu Long, Siwen Luo, Yifan Peng
https://github.com/yihaoding/mllm_vrdiu
The rising demand for Visually Rich Document (VRD) understanding spans various domains such as business, law, and medicine, aiming to enhance the efficiency of document-related tasks. Multimodal Large Language Models (MLLMs) are particularly suited for VRD understanding (VRDU) tasks, as they possess extensive real-world knowledge, enabling zero-shot or few-shot learning. This tutorial will systematically overview LLM/MLLM-driven VRDU frameworks, analyzing current technical trends. It will begin by introducing common document understanding tasks, along with associated benchmarks and techniques. Following this, a summary of existing LLM/MLLM-based VRDU frameworks will cover training strategies, inference methods, and data preparation. To solidify understanding, two hands-on labs will be provided, ensuring participants grasp not only theoretical concepts but also practical applications, using state-of-the-art techniques to address real-world challenges.

Schedule: August 29th PM – Full afternoon
T11
Gradient-Based Multi-Objective Deep Learning
Weiyu Chen, Xiaoyuan Zhang, Baijiong Lin, Xi Lin, Han Zhao
https://gradnexus.github.io/IJCAI25_tutorial/
The evaluation of deep learning models often involves navigating trade-offs among multiple criteria. This tutorial provides a structured overview of gradient-based multi-objective optimization (MOO) for deep learning models. We begin with the foundational theory, systematically exploring three core solution strategies: identifying a single balanced solution, finding a discrete set of Pareto optimal solutions, and learning a continuous Pareto set. We will cover their algorithmic details, convergence, and generalization. The second half of the tutorial focuses on applying MOO to Large Language Models (LLMs). We will demonstrate how MOO offers a principled framework for fine-tuning and aligning LLMs, effectively navigating trade-offs between multiple objectives. Through practical demonstrations of state-of-the-art methods, participants will gain hands-on insight into applying these techniques. The session will conclude by discussing emerging challenges and future research directions, equipping attendees to tackle multi-objective problems in their own work.
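As a deliberately simple instance of the second strategy, the sketch below sweeps preference weights over two toy quadratic objectives and runs plain gradient descent on each weighted sum, tracing a discrete set of Pareto-optimal trade-offs. This is textbook linear scalarization, not necessarily one of the state-of-the-art methods demonstrated in the tutorial.

```python
# Minimal sketch of gradient-based MOO via linear scalarization: sweep
# weights over two objectives and run gradient descent on each weighted
# sum to trace a set of Pareto solutions. Quadratic objectives are toy.
import numpy as np

a, b = np.array([0.0, 0.0]), np.array([1.0, 1.0])
f1 = lambda x: np.sum((x - a) ** 2)      # objective 1
f2 = lambda x: np.sum((x - b) ** 2)      # objective 2
g1 = lambda x: 2 * (x - a)               # gradient of objective 1
g2 = lambda x: 2 * (x - b)               # gradient of objective 2

pareto_points = []
for w in np.linspace(0.0, 1.0, 5):       # preference weights
    x = np.zeros(2)
    for _ in range(200):                 # plain gradient descent
        x -= 0.1 * (w * g1(x) + (1 - w) * g2(x))
    pareto_points.append((f1(x), f2(x)))

for p in pareto_points:
    print(f"f1={p[0]:.3f}  f2={p[1]:.3f}")  # points along the trade-off curve
```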

Schedule: August 29th PM – Full afternoon
T13
GUI Agents with Foundation Models: Data Resource, Framework and Application
Shuai Wang, Kaiwen Zhou, Rui Shao, Gongwei Chen, Yuqi Zhou
https://yuqi-zhou.github.io/GUI-Agent-with-Foundation-Models.github.io/
This tutorial aims to provide a structured overview of the latest work in the field of GUI agents. It deconstructs GUI agent ecosystems into three critical pillars: multimodal data resources, algorithmic frameworks, and applications. Data resources, such as user instructions, user interface (UI) screenshots, and behavior traces, form the cornerstone of GUI agent architecture design; frameworks orchestrate foundation models, knowledge bases, and tools to enable intelligent and reliable decision-making; and applications represent the concrete setups for domain-optimized implementations. The current state of these aspects reflects the maturity of the field and highlights future research priorities. To this end, we organize this tutorial as a captivating review and lecture around three key areas: Data Resources, Frameworks, and Applications.
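For orientation, the sketch below shows the observe-plan-act loop that such a framework typically orchestrates, with the behavior trace accumulated as a data resource. Every class and function here (`DummyEnv`, `query_model`, and so on) is a hypothetical placeholder rather than an API from any framework covered in the tutorial.

```python
# Minimal sketch of the observe-plan-act loop a GUI-agent framework
# orchestrates around a foundation model. All helpers are hypothetical
# placeholders, not APIs from any specific framework.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str       # e.g. "click", "type", "done"
    argument: str   # e.g. a UI element id or text to type

class DummyEnv:
    """Stand-in for a GUI environment (screenshots + action execution)."""
    def capture_screenshot(self) -> bytes:
        return b""          # would return pixels of the current UI

    def execute(self, action: Action) -> None:
        print(f"executing {action.kind}({action.argument})")

def query_model(instruction: str, screenshot: bytes, history: list) -> Action:
    # Placeholder for a multimodal model call that maps the instruction,
    # the current UI observation, and the behavior trace to the next action.
    return Action("done", "")

def run_agent(instruction: str, env: DummyEnv, max_steps: int = 20) -> None:
    history: list[Action] = []
    for _ in range(max_steps):
        screenshot = env.capture_screenshot()                   # observe
        action = query_model(instruction, screenshot, history)  # plan
        if action.kind == "done":
            return
        env.execute(action)                                     # act
        history.append(action)                                  # behavior trace

run_agent("open settings and enable dark mode", DummyEnv())
```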

Schedule: August 29th PM – Full afternoon
T15
Multi-Modal Generative AI in Dynamic and Open Environment
Xin Wang, Hong Chen, Yuwei Zhou, Wenwu Zhu
https://mn.cs.tsinghua.edu.cn/ijcai25-aigc/
This tutorial explores recent advancements in multi-modal generative AI, including multi-modal large language models (MLLMs) and diffusion models, and highlights the emerging challenges that arise when applying them to dynamic and open environments. Generalizable post-training techniques and unified frameworks for multi-modal understanding and generation will also be discussed.

Schedule: August 29th PM – Before coffee break
T16
A Tutorial on Bandit Learning in Matching Markets
Shuai Li, Zilong Wang
https://sites.google.com/view/matchingtutorial
Matching markets are fundamental in economics and game theory, where the goal is to compute matchings that achieve a desired objective. Stable matching is an important problem in this field, characterizing the equilibrium state among agents. Because agents usually have uncertain preferences, bandit learning has recently attracted substantial research attention for this problem. This line of work mainly focuses on the algorithms' stable regret, which characterizes the utility of agents, and incentive compatibility, which characterizes the robustness of the system. This tutorial comprehensively introduces the latest advancements and pioneering results on this problem, with a particular emphasis on these two metrics. To investigate the application of bandit algorithms in other areas of mechanism design, the tutorial also broadens its scope by introducing the bandit learning problem in auctions.
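As a simplified illustration of how bandit learning and stable matching interact, the sketch below follows an explore-then-commit pattern: players first sample every arm to estimate their uncertain preferences, then commit to the matching produced by player-proposing deferred acceptance on those estimates. The market size, noise model, and exploration budget are arbitrary toy choices, and this is not one of the specific algorithms surveyed in the tutorial.

```python
# Minimal explore-then-commit sketch for bandit learning in a matching
# market: estimate preferences by sampling, then run player-proposing
# deferred acceptance (Gale-Shapley) on the estimates. Toy setup only.
import numpy as np

rng = np.random.default_rng(0)
n_players, n_arms = 3, 3
true_means = rng.uniform(size=(n_players, n_arms))                     # players' unknown preferences
arm_prefs = [list(rng.permutation(n_players)) for _ in range(n_arms)]  # arms' known rankings

# Exploration phase: each player samples each arm several times.
T_explore = 50
est = np.zeros((n_players, n_arms))
for p in range(n_players):
    for a in range(n_arms):
        est[p, a] = rng.normal(true_means[p, a], 0.1, size=T_explore).mean()

# Commit phase: player-proposing deferred acceptance on estimated preferences.
def deferred_acceptance(est, arm_prefs):
    prefs = [list(np.argsort(-est[p])) for p in range(n_players)]  # best arm first
    next_choice = [0] * n_players
    match_of_arm = [None] * n_arms
    free = list(range(n_players))
    while free:
        p = free.pop(0)
        a = prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = match_of_arm[a]
        if current is None:
            match_of_arm[a] = p
        elif arm_prefs[a].index(p) < arm_prefs[a].index(current):
            match_of_arm[a] = p        # arm prefers the new proposer
            free.append(current)
        else:
            free.append(p)             # proposal rejected
    return match_of_arm

print(deferred_acceptance(est, arm_prefs))  # arm index -> matched player
```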

Schedule: August 29th PM – After coffee break
T18
Towards Low-Distortion Graph Representation Learning
Ziwei Zhang, Qingyun Sun, Xingcheng Fu, Jianxin Li
https://magic-group-buaa.github.io/IJCAI25_tutorial/
Low-distortion graph representation learning aims to address the challenge of preserving complex structures and topological properties of graphs in low-dimensional representations. Graph distortion manifests in three primary ways: noisy and perturbed structure, missing and tampered topological properties, and attribution fallacies in structural distribution. In this tutorial, we will examine the development of low-distortion graph representation learning through three key perspectives: information theory, geometry, and causality. Specifically, we cover three major aspects, i.e., information-theoretic graph representation learning, geometry-guided graph representation learning, and invariance-guided graph representation learning. In addition, we will discuss future directions, such as advanced theories, low-distortion approaches in graph large language models and graph foundation models, and novel applications in AI for Science.
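To ground the notion of distortion, the sketch below measures the worst-case multiplicative distortion of a hand-picked one-dimensional embedding of a 4-cycle by comparing shortest-path distances against embedded Euclidean distances. It only illustrates the metric; the tutorial's information-theoretic, geometric, and causal methods go well beyond this.

```python
# Minimal sketch of measuring embedding distortion: compare shortest-path
# distances in a small graph against distances between (hand-picked) node
# coordinates. Illustrative only.
from collections import deque

# A 4-cycle: 0-1-2-3-0
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

def shortest_paths(adj, source):
    """BFS shortest-path distances from `source` (unit edge lengths)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# A 1-D embedding of the cycle; cycles cannot be embedded into a line
# without distortion, which this measurement makes visible.
coords = {0: 0.0, 1: 1.0, 2: 2.0, 3: 3.0}

ratios = []
for u in adj:
    d_graph = shortest_paths(adj, u)
    for v in adj:
        if v > u:
            d_emb = abs(coords[u] - coords[v])
            ratios.append(d_emb / d_graph[v])

# Worst-case multiplicative distortion: largest expansion divided by
# smallest ratio (i.e. expansion times contraction).
print(max(ratios) / min(ratios))   # 3.0 for this embedding
```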

Schedule: August 29th PM – Full afternoon
T19
Beyond Graph Distribution Shifts: LLMs, Adaptation, and Generalization
Xin Wang, Haoyang Li, Zeyang Zhang, Wenwu Zhu
https://ood-generalization.com/ijcai2025Tutorial.htm
Graph machine learning has witnessed rapid progress across both academia and industry. However, most existing methods are developed under the in-distribution (I.D.) hypothesis, which assumes that training and testing graph data are drawn from the same distribution. In real-world applications—ranging from dynamic knowledge graphs to evolving biomedical networks—this assumption is frequently violated, resulting in severe performance degradation under distribution shifts. Addressing this challenge has become a key focus in recent years, leading to the development of novel paradigms that move beyond the I.D. setting. This tutorial presents a comprehensive overview of three emerging and synergistic directions for tackling distribution shifts in graph learning. First, we highlight Graph LLMs, which combine the representational power of large language models with graph structures to enable flexible, in-context, and few-shot learning on graphs. Second, we introduce adaptation techniques for both GNNs and Graph LLMs, including graph neural architecture search and continual learning strategies for evolving data. Third, we cover generalization methods that incorporate causality and invariance principles to build robust graph models under unseen distributions. These advances are reshaping the future of graph machine learning and are of central interest to the IJCAI community across machine learning, data mining, and real-world AI deployment.

Schedule: August 29th PM – After coffee break