Special Track on AI and Social Good Papers (Guangzhou)

583: Deconfounding Multi-Cause Latent Confounders: A Factor-Model Approach to Climate Model Bias Correction
Authors: Wentao Gao, Jiuyong Li, Debo Cheng, Lin Liu, Jixue Liu, Thuc Le, Xiaojing Du, Xiongren Chen, Yun Chen, Yanchang Zhao
Location: Guangzhou | Day: TBD
Global Climate Models (GCMs) are crucial for predicting future climate change by simulating the Earth system. However, GCM outputs exhibit systematic biases due to model uncertainties, parameterization simplifications, and inadequate representation of complex climate phenomena. Traditional bias correction methods, which rely on historical observation data and statistical techniques, often neglect unobserved confounders, leading to biased results. This paper proposes a novel bias correction approach that utilizes both GCM and observational data to learn a factor model capturing multi-cause latent confounders. Inspired by recent advances in causality-based time series deconfounding, our method first constructs a factor model to learn latent confounders from historical data and then applies them to enhance the bias correction process using advanced time series forecasting models. The experimental results demonstrate significant improvements in the accuracy of precipitation outputs. By addressing unobserved confounders, our approach offers a robust and theoretically grounded solution for climate model bias correction.
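As a rough illustration of the factor-model step described above (not the authors' implementation), the sketch below recovers latent confounders from multi-cause GCM covariates with a plain PCA factorization and feeds them as extra regressors to a simple bias-correction model; all data, dimensions, and the choice of PCA and ridge regression are assumptions.

```python
# Illustrative sketch only: recover latent confounders with a low-rank
# factor model (here plain PCA) and use them as extra features for a
# downstream bias-correction regressor. Data and dimensions are made up.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
T, n_causes, n_latent = 1000, 12, 3          # time steps, observed causes, latent factors

gcm_causes = rng.normal(size=(T, n_causes))  # multi-cause GCM covariates (e.g., drivers of precipitation)
observed = gcm_causes @ rng.normal(size=n_causes) + rng.normal(size=T)  # observed target series

# Step 1: factor model over the multi-cause covariates; the fitted factors
# act as substitutes for the unobserved confounders.
factor_model = PCA(n_components=n_latent).fit(gcm_causes)
latent_confounders = factor_model.transform(gcm_causes)

# Step 2: bias-correct toward observations with the confounders as extra inputs.
features = np.hstack([gcm_causes, latent_confounders])
corrector = Ridge(alpha=1.0).fit(features[:800], observed[:800])
print("held-out R^2:", corrector.score(features[800:], observed[800:]))
```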
1005: Beyond Patterns: Harnessing Causal Logic for Autonomous Driving Trajectory Prediction
Authors: Bonan Wang, Haicheng Liao, Chengyue Wang, Bin Rao, Yanchen Guan, Guyang Yu, Jiaxun Zhang, Songning Lai, Chengzhong Xu, Zhenning Li
Location: Guangzhou | Day: TBD
Accurate trajectory prediction has long been a major challenge for autonomous driving (AD). Traditional data-driven models predominantly rely on statistical correlations, often overlooking the causal relationships that govern traffic behavior. In this paper, we introduce a novel trajectory prediction framework that leverages causal inference to enhance predictive robustness, generalization, and accuracy. By decomposing the environment into spatial and temporal components, our approach identifies and mitigates spurious correlations, uncovering genuine causal relationships. We also employ a progressive fusion strategy to integrate multimodal information, simulating human-like reasoning processes and enabling real-time inference. Evaluations on five real-world datasets—ApolloScape, nuScenes, NGSIM, HighD, and MoCAD—demonstrate our model’s superiority over existing state-of-the-art (SOTA) methods, with improvements in key metrics such as RMSE and FDE. Our findings highlight the potential of causal reasoning to transform trajectory prediction, paving the way for robust AD systems.
1058: Weather Foundation Model Enhanced Decentralized Photovoltaic Power Forecasting Through Spatio-temporal Knowledge Distillation
Authors: Fang He, Jiaqi Fan, Yang Deng, Xiaoyang Zhang, Ka Tai Lau, Dan Wang
Location: Guangzhou | Day: TBD
Solar photovoltaic power forecasting (SPPF) is vital for downstream power estimation in PV systems. Existing approaches for recently deployed decentralized PV systems require a customized model for each PV installation, which is labor-intensive and not scalable. Therefore, developing a general SPPF model for decentralized PV systems is essential. The primary challenge in developing such a model is accounting for regional weather variations. Recent advancements in weather foundation models (WFMs) offer a promising opportunity, providing accurate forecasts with reduced computational demands. However, integrating WFMs into SPPF models remains challenging due to the complexity of WFMs. This paper introduces a novel approach, spatio-temporal knowledge distillation (STKD), to efficiently adapt WFMs for SPPF. The proposed STKD-PV models leverage regional weather and PV power data to forecast power generation from six hours to a day ahead. Evaluated globally across six datasets, STKD-PV models demonstrate superior performance compared to state-of-the-art (SOTA) time-series models and fine-tuned WFMs, achieving significant improvements in forecasting accuracy. This study marks the first application of knowledge distillation from WFMs to SPPF, offering a scalable and cost-effective solution for decentralized PV systems.
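A minimal sketch of the distillation idea, assuming a frozen teacher encoder stands in for the weather foundation model: the compact PV forecaster is trained against both the ground-truth power and the teacher's intermediate features. All shapes, modules, and the loss weighting are illustrative, not the STKD-PV architecture.

```python
# Toy distillation step: a small PV forecaster learns from ground truth and
# from a frozen "teacher" weather model's features. Shapes/teacher are assumed.
import torch
import torch.nn as nn

teacher = nn.GRU(input_size=8, hidden_size=64, batch_first=True)   # stand-in for a frozen WFM encoder
student = nn.GRU(input_size=8, hidden_size=32, batch_first=True)
project = nn.Linear(32, 64)                                        # align student features to teacher width
head = nn.Linear(32, 24)                                           # 24-step-ahead PV power forecast

weather = torch.randn(16, 48, 8)      # batch of 48-hour regional weather sequences
pv_power = torch.rand(16, 24)         # ground-truth day-ahead PV generation

with torch.no_grad():
    _, teacher_feat = teacher(weather)            # final hidden state, shape (1, 16, 64)

_, student_feat = student(weather)
forecast = head(student_feat[-1])

# Combined objective: fit the measured power and imitate the teacher's features.
loss = nn.functional.mse_loss(forecast, pv_power) \
     + 0.5 * nn.functional.mse_loss(project(student_feat[-1]), teacher_feat[-1])
loss.backward()                                   # an optimizer step would follow
print(float(loss))
```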
1652: MutationGuard: A Graph and Temporal-Spatial Neural Method for Detecting Mutation Telecommunication Fraud
Authors: Haitao Bai, Pinghui Wang, Ruofei Zhang, Ziyang Zhou, Juxiang Zeng, Yulou Su, Li Xing, Zhou Su, Chen Zhang, Lizhen Cui, Jun Hao, Wei Wang
Location: Guangzhou | Day: TBD
Telecommunication fraud refers to deceptive activities in the field of communication services. This research focuses on a category of fraud identified as "mutation telecommunication fraud". There is currently a lack of research on mutation telecommunication fraud detection, allowing this type of fraud to persist uncaught. We identify that detecting mutation fraud requires capturing multi-source patterns, including user communication graphs and temporal-spatial Voice of Call (VOC) features. Specifically, we introduce MutationGuard, which leverages Graph Neural Networks (GNN) to capture changes in user communication graphs. For VOC records, we map call start times onto a 3D cylindrical surface, thereby representing each VOC record in spatial coordinates, and utilize the proposed LFFE and TCFE modules to capture local fraud behaviors and temporal behavior changes. The proposed neural modeling approach, which facilitates multi-source information fusion, constitutes a significant advancement in detecting mutation fraud.
Experimental results reveal a significant improvement in the AUC score by 1.52% and the F1 score by 1.36% on the proposed telecommunication fraud dataset. In particular, our method shows a significant improvement of 13.93% in accuracy on mutation fraud data. We also validate the effectiveness of our method on the publicly available Sichuan Telecommunication Fraud dataset.
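The cylindrical time mapping mentioned above can be pictured with a few lines of code; the exact coordinates used by MutationGuard are not specified here, so the formulation below (angle for time of day, height for day index) is an assumption.

```python
# Illustrative mapping of VOC call start times onto a 3D cylindrical surface:
# the angle encodes time of day (so 23:59 and 00:01 stay close) and the
# height encodes the day index. The paper's exact formulation may differ.
import numpy as np

def cylindrical_coords(timestamps_hours, radius=1.0):
    """timestamps_hours: array of call start times in hours since observation start."""
    t = np.asarray(timestamps_hours, dtype=float)
    theta = 2.0 * np.pi * (t % 24.0) / 24.0      # angular position = time of day
    x = radius * np.cos(theta)
    y = radius * np.sin(theta)
    z = t // 24.0                                 # height = day index
    return np.stack([x, y, z], axis=-1)

calls = [0.2, 23.9, 24.1, 50.5]                   # hours since the start of observation
print(cylindrical_coords(calls))
```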
2373: OpenCarbon: A Contrastive Learning-based Cross-Modality Neural Approach for High-Resolution Carbon Emission Prediction Using Open Data
Authors: Jinwei Zeng, Yu Liu, Guozhen Zhang, Jingtao Ding, Yuming Lin, Jian Yuan, Yong Li
Location: Guangzhou | Day: TBD
Accurately estimating high-resolution carbon emissions is crucial for effective emission governance and mitigation planning. While conventional methods for precise carbon accounting are hindered by substantial data collection efforts, the rise of open data and advanced learning techniques offers a promising solution. Once an open data-based prediction model is developed and trained, it can easily infer emissions for new areas based on available open data. Motivated by this, we incorporate two modalities of open data, satellite images and point-of-interest (POI) data, to predict high-resolution urban carbon emissions, with satellite images providing macroscopic, static information and POI data offering fine-grained, relatively dynamic functionality information. However, estimating high-resolution carbon emissions presents two significant challenges: the intertwined and implicit effects of various functionalities on carbon emissions, and the complex spatial contiguity correlations that give rise to the agglomeration effect. Our model, OpenCarbon, features two major designs that target these challenges: a cross-modality information extraction and fusion module to extract complementary functionality information from the two modalities and model their interactions, and a neighborhood-informed aggregation module to capture the spatial contiguity correlations. Extensive experiments demonstrate our model's superiority, with a significant performance gain of 26.6% on R2. Further generalizability tests and case studies also show OpenCarbon's capacity to capture the intrinsic relation between urban functionalities and carbon emissions, validating its potential to empower efficient carbon governance and targeted carbon mitigation planning. Codes and data are available: https://github.com/JinweiZzz/OpenCarbon.
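A toy sketch of the cross-modality fusion step, assuming a single cross-attention layer in which the POI embedding attends over satellite patch embeddings; dimensions, module choices, and the regression head are illustrative rather than OpenCarbon's actual design.

```python
# Toy cross-attention fusion: POI features attend over satellite patch
# embeddings; the fused vector feeds a simple carbon-emission regressor.
import torch
import torch.nn as nn

sat_patches = torch.randn(32, 49, 128)   # 32 grid cells, 7x7 satellite patch embeddings
poi_vec = torch.randn(32, 1, 128)        # one aggregated POI embedding per cell

cross_attn = nn.MultiheadAttention(embed_dim=128, num_heads=4, batch_first=True)
fused, attn_weights = cross_attn(query=poi_vec, key=sat_patches, value=sat_patches)

regressor = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
emission_pred = regressor(fused.squeeze(1))       # (32, 1) per-cell emission estimate
print(emission_pred.shape)
```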
4538: LLM-based Collaborative Agents with Pedagogy-guided Interaction Modeling for Timely Instructive Feedback Generation in Task-oriented Group Discussions
Authors: Qihao Yang, Yu Yang, Sixu An, Tianyong Hao, Guandong Xu
Location: Guangzhou | Day: TBD
Large language models (LLMs) fundamentally reshape learning and teaching models, shifting tutoring systems from supporting individual learning to facilitating collaborative learning (CL) such as task-oriented group discussions. However, existing AI tutors struggle to guide CL, as they seldom model the interactions between AI tutors and students. Therefore, they cannot scaffold students to complete tasks collaboratively, which impairs learning outcomes and pedagogical adaptability. Additionally, existing AI tutors fail to make use of CL theories to generate instructive feedback, which leads to undesirable interactions such as over-instruction and limits students' autonomy. In this paper, we propose an LLM-based collaborative agent that innovatively leverages pedagogical strategies to sense discussion stages, detect learning issues, identify the timing of intervention, and generate instructive feedback. To develop the agent, we first design a prompting strategy based on a CL theory, the Community of Inquiry, to train the agent to understand the discussion status. Second, a multi-agent interaction framework is proposed to simulate the collaborative learning behavior between AI tutors and students. Meanwhile, a synthetic task-oriented group discussion dataset, namely CLTeach, is generated, which consists of 27k manually-verified multi-party dialogues with fine-grained annotations of instructive feedback and explanations. Lastly, we use CLTeach to fine-tune the LLM agent, ultimately enabling it to generate instructive feedback at the right time to support students in CL. Extensive experiments demonstrate that our agent achieves state-of-the-art performance in feedback generation and has the potential to mimic human teachers effectively.
5645: Reinforcement Learning for Hybrid Charging Stations Planning and Operation Considering Fixed and Mobile Chargers
Authors: Yanchen Zhu, Honghui Zou, Chufan Liu, Yuyu Luo, Yuankai Wu, Yuxuan Liang
Location: Guangzhou | Day: TBD
The rapid growth of electric vehicle adoption calls for efficient and adaptable charging infrastructure. Fixed-location charging stations often suffer from underutilization or congestion due to fluctuating demand, while mobile chargers offer flexibility by relocating as needed. This paper studies the optimal planning and operation of hybrid charging infrastructures that combine both fixed and mobile chargers within urban road networks. We formulate the Hybrid Charging Station Planning and Operation (HCSPO) problem, jointly optimizing the placement of fixed stations and the scheduling of mobile chargers. A charging demand prediction model based on Model Predictive Control (MPC) supports dynamic decision-making. To solve the HCSPO problem, we propose a deep reinforcement learning approach enhanced with heuristic scheduling. Experiments on real-world urban scenarios show that our method improves infrastructure availability, achieving up to a 244.4% increase in coverage, and reduces user inconvenience, with up to 79.8% shorter waiting times, compared to existing solutions.
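To make the heuristic-scheduling ingredient concrete, here is a deliberately simple greedy dispatch rule under assumed demand and capacity numbers; the actual HCSPO policy is learned with reinforcement learning on MPC-based demand forecasts, so this is only a sketch of the scheduling sub-problem.

```python
# Toy greedy dispatch: send each mobile charger to the station with the
# largest remaining predicted unmet demand. All numbers are fabricated.
import numpy as np

predicted_demand = np.array([12.0, 3.0, 8.0, 15.0])   # kWh expected at each fixed station
fixed_capacity = np.array([10.0, 5.0, 5.0, 10.0])     # kWh the fixed chargers can serve
mobile_capacity = 4.0                                  # kWh each mobile charger can deliver
n_mobile = 3

unmet = np.clip(predicted_demand - fixed_capacity, 0.0, None)
assignment = []
for charger in range(n_mobile):
    target = int(np.argmax(unmet))                     # most under-served station
    assignment.append(target)
    unmet[target] = max(unmet[target] - mobile_capacity, 0.0)

print("mobile charger -> station:", assignment)        # [3, 2, 0] for these numbers
```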
7716: Generative Agents for Multimodal Controversy Detection
Authors: Tianjiao Xu, Jinfei Gao, Keyi Kong, Jianhua Yin, Tian Gan, Liqiang Nie
Location: Guangzhou | Day: TBD
Multimodal controversy detection, which involves determining whether a given video and its associated comments are controversial, plays a pivotal role in risk management on social video platforms. Existing methods typically provide only classification results, failing to identify what aspects are controversial and why, thereby lacking detailed explanations. To address this limitation, we propose a novel Agent-based Multimodal Controversy Detection architecture, termed AgentMCD. This architecture leverages Large Language Models (LLMs) as generative agents to simulate human behavior and improve explainability. AgentMCD employs a multi-aspect reasoning process, where multiple judges conduct evaluations from diverse perspectives to derive a final decision. Furthermore, a multi-agent simulation process is incorporated, wherein agents act as audiences, offering opinions and engaging in free discussions after watching videos. This hybrid framework enables comprehensive controversy evaluation and significantly enhances explainability. Experiments conducted on the MMCD dataset demonstrate that our proposed architecture outperforms existing LLM-based baselines in both high-resource and low-resource comment scenarios, while maintaining superior explainability.
8413: ContextAware: A Multi-Agent Framework for Detecting Harmful Image-Based Comments on Social Media
Authors: Zheng Wei, Mingchen Li, Pu Zhang, Xinyu Liu, Huamin Qu, Pan Hui
Location: Guangzhou | Day: TBD
Detecting hidden stigmatization in social media poses significant challenges due to semantic misalignments between textual and visual modalities, as well as the subtlety of implicit stigmatization. Traditional approaches often fail to capture these complexities in real-world, multimodal content. To address this gap, we introduce ContextAware, an agent-based framework that leverages specialized modules to collaboratively process and analyze images, textual context, and social interactions. Our approach begins by clustering image embeddings to identify recurring content, activating high-likes agents for deeper analysis of images receiving substantial user engagement, while comprehensive agents handle lower-engagement images. By integrating case-based learning, textual sentiment, and vision-language models (VLMs), ContextAware refines its detection of harmful content. We evaluate ContextAware on a self-collected Douyin dataset focused on interracial relationships, comprising 871 short videos and 885,502 comments—of which a notable portion are image-based. Experimental results show that ContextAware not only outperforms state-of-the-art methods in accuracy and F1 score but also effectively detects implicit stigmatization within the highly contextual environment of social media. Our findings underscore the importance of agent-based architectures and multimodal alignment in capturing nuanced, culturally specific forms of harmful content.
8435: HARMONY: A Privacy-preserving and Sensor-agnostic Tele-monitoring system
Authors: Qipeng Xie, Hao Guo, Weizheng Wang, Yongzhi Huang, Linshan Jiang, Jiafei Wu, Shuxin Zhong, Lu Wang, Kaishun Wu
Location: Guangzhou | Day: TBD
Global aging necessitates tele-monitoring systems that provide real-time tracking and timely assistance for older adults living independently. While pervasive wireless sensing modalities (e.g., CSI, IMU, UWB) enable cost-effective, non-intrusive monitoring, existing systems lack flexibility, limiting their adaptability to different environments. In this work, we posit that the motion dynamics of human movement are invariant across sensing modalities, inspiring the design of HARMONY, a privacy-preserving, sensor-agnostic system that supports multi-modal inputs and diverse tele-monitoring tasks. HARMONY incorporates Modality-agnostic Data Processing to uniformly encrypt multi-modal signals and Task-specific Activity Recognition for seamless task adaptation. A novel Encrypted-processing Engine then significantly accelerates computations on encrypted data by optimizing matrix and convolution operations. Evaluations across five different sensing modalities show that HARMONY consistently achieves high accuracy while delivering 3.5× to 130× speedups over state-of-the-art baselines. Our results demonstrate that HARMONY is a practical, scalable, and privacy-centric prototype for next-generation remote healthcare.
8601: Resolving Conflicting Evidence in Automated Fact-Checking: A Study on Retrieval-Augmented LLMs
Authors: Ziyu Ge, Yuhao Wu, Daniel Wai Kit Chin, Roy Ka-Wei Lee, Rui Cao
Location: Guangzhou | Day: TBD
Large Language Models (LLMs) augmented with retrieval mechanisms have demonstrated significant potential in fact-checking tasks by integrating external knowledge. However, their reliability decreases when confronted with conflicting evidence from sources of varying credibility. This paper presents the first systematic evaluation of Retrieval-Augmented Generation (RAG) models for fact-checking in the presence of conflicting evidence. To support this study, we introduce CONFACT (Conflicting Evidence for Fact-Checking), a novel dataset comprising questions paired with conflicting information from various sources. Extensive experiments reveal critical vulnerabilities in state-of-the-art RAG methods, particularly in resolving conflicts stemming from differences in media source credibility. To address these challenges, we investigate strategies to integrate media background information into both the retrieval and generation stages. Our results show that effectively incorporating source credibility significantly enhances the ability of RAG models to resolve conflicting evidence and improve fact-checking performance.
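One strategy of the kind described above can be sketched as credibility-weighted re-ranking of retrieved evidence before generation; the credibility priors and scoring rule below are illustrative assumptions, not the CONFACT setup.

```python
# Toy credibility-aware re-ranking for a RAG fact-checker: retrieval scores
# are down-weighted for low-credibility outlets before selecting evidence.
from dataclasses import dataclass

@dataclass
class Evidence:
    text: str
    source: str
    retrieval_score: float        # similarity to the claim, higher is better

CREDIBILITY = {"news_wire": 0.9, "gov_report": 0.95, "anonymous_blog": 0.3}  # assumed priors

def rerank(evidence, top_k=2):
    scored = [(e.retrieval_score * CREDIBILITY.get(e.source, 0.5), e) for e in evidence]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [e for _, e in scored[:top_k]]

pool = [
    Evidence("Claim is false according to official statistics.", "gov_report", 0.71),
    Evidence("Claim confirmed by insider!", "anonymous_blog", 0.88),
    Evidence("Agency data contradicts the claim.", "news_wire", 0.69),
]
for e in rerank(pool):
    print(e.source, "|", e.text)
```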
8659: Mat-Instructions: A Large-Scale Inorganic Material Instruction Dataset for Large Language Models
Authors: Ke Liu, Shangde Gao, Yichao Fu, Xiaoliang Wu, Shuo Tong, Ajitha Rajan, Hao Xu
Location: Guangzhou | Day: TBD
Recent advancements in large language models (LLMs) have revolutionized research discovery across various scientific disciplines, including materials science. The discovery of novel materials, particularly crystal materials, is essential for achieving sustainable development goals (SDGs), as they drive breakthroughs in climate change mitigation, clean and affordable energy, and the promotion of industrial innovation. However, unlocking the full potential of LLMs in materials research remains challenging due to the lack of high-quality, diverse, and instruction-based datasets. Such datasets are crucial for guiding these models in understanding and predicting the structure, properties, and functions of materials across various tasks. To address this limitation, we introduce Mat-Instructions, a large-scale inorganic material instruction dataset specifically designed to unlock the potential of LLMs in materials science. Extensive experiments on fine-tuning LLaMA with our Mat-Instructions dataset demonstrate its effectiveness in advancing materials science research. The code and dataset are available at https://github.com/zjuKeLiu/Mat-Instructions.
8675: Uncertainty-aware Predict-Then-Optimize Framework for Equitable Post-Disaster Power Restoration
Authors: Lin Jiang, Dahai Yu, Rongchao Xu, Tian Tang, Guang Wang
Location: Guangzhou | Day: TBD
The increasing frequency of extreme weather events, such as hurricanes, highlights the urgent need for efficient and equitable power system restoration. Many electricity providers make restoration decisions primarily based on the volume of power restoration requests from each region. However, our data-driven analysis reveals significant disparities in request submission volume, as disadvantaged communities tend to submit fewer restoration requests. This disparity makes the current restoration solution inequitable, leaving these communities vulnerable to extended power outages. To address this, we aim to propose an equity-aware power restoration strategy that balances both restoration efficiency and equity across communities. However, achieving this goal is challenging for two reasons: the difficulty of predicting repair durations under dataset heteroscedasticity, and the tendency of reinforcement learning agents to favor low-uncertainty actions, which potentially undermine equity. To overcome these challenges, we design a predict-then-optimize framework called EPOPR with two key components: (1) Equity-Conformalized Quantile Regression for uncertainty-aware repair duration prediction, and (2) Spatial-Temporal Attentional RL that adapts to varying uncertainty levels across regions for equitable decision-making. Experimental results show that our EPOPR effectively reduces the average power outage duration by 3.60% and decreases inequity between different communities by 14.19% compared to state-of-the-art baselines.
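A hedged sketch of the uncertainty-aware prediction component, using standard split-conformal calibration of quantile regressors on synthetic repair-duration data; the equity-conformalized variant and the real features used by EPOPR are not reproduced.

```python
# Toy conformalized quantile regression (CQR) for repair-duration intervals:
# fit lower/upper quantile models, then widen them with a conformal offset
# computed on a calibration split. Synthetic data; not the EPOPR model.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(1200, 3))                       # e.g., wind speed, crew count, damage level
y = 5 + 10 * X[:, 0] + rng.gamma(2.0, 1 + 3 * X[:, 1])      # heteroscedastic repair hours

X_train, y_train = X[:600], y[:600]
X_cal, y_cal = X[600:900], y[600:900]
X_test = X[900:]

alpha = 0.1
lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X_train, y_train)
hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X_train, y_train)

# Conformity scores on the calibration set, then the (1 - alpha) quantile offset.
scores = np.maximum(lo.predict(X_cal) - y_cal, y_cal - hi.predict(X_cal))
q = np.quantile(scores, np.ceil((1 - alpha) * (len(y_cal) + 1)) / len(y_cal))

lower = lo.predict(X_test) - q
upper = hi.predict(X_test) + q
print("mean interval width (hours):", float(np.mean(upper - lower)))
```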
8690: BGM: Demand Prediction for Expanding Bike-Sharing Systems with Dynamic Graph Modeling
Authors: Yixuan Zhao, Hongkai Wen, Xingchen Zhang, Man Luo
Location: Guangzhou | Day: TBD
Accurate demand prediction is crucial for the equitable and sustainable expansion of bike-sharing systems, which help reduce urban congestion, promote low-carbon mobility, and improve transportation access in underserved areas. However, expanding these systems presents societal challenges, particularly in ensuring fair resource distribution and operational efficiency. A major hurdle is the difficulty of demand prediction at new stations, which lack historical usage data and are heavily influenced by the existing network. Additionally, new stations dynamically reshape demand patterns across time and space, complicating efforts to balance supply and accessibility in evolving urban environments. Existing methods model relationships between new and existing stations but often assume static patterns, overlooking how new stations reshape demand dynamics over time and space. To tackle these challenges, we propose a novel demand prediction framework for expanding bike-sharing systems, namely BGM, which leverages dynamic graph modeling to capture the evolving inter-station correlations while accounting for spatial and temporal heterogeneity. Specifically, we develop a knowledge transfer approach that learns the embedding transformation between existing and new stations through a learnable orthogonal mapping matrix. We further design a gated selecting-vector-based feature fusion mechanism to integrate the transferred embeddings and the intrinsic features of stations for precise predictions. Experiments on real-world bike-sharing data demonstrate that BGM outperforms existing methods.
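The two ingredients named above, an orthogonal mapping for knowledge transfer and a gated selecting vector for fusion, can be sketched as follows; dimensions and module choices are assumptions rather than BGM's implementation.

```python
# Toy version of the two ingredients: a learnable orthogonal matrix maps
# existing-station embeddings into the new-station space, and a gating vector
# blends the transferred embedding with the station's own intrinsic features.
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

dim = 32
mapping = orthogonal(nn.Linear(dim, dim, bias=False))   # weight kept orthogonal during training
gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

existing_emb = torch.randn(8, dim)       # embeddings transferred from correlated existing stations
intrinsic_feat = torch.randn(8, dim)     # intrinsic features of the new stations (location, POIs, ...)

transferred = mapping(existing_emb)
g = gate(torch.cat([transferred, intrinsic_feat], dim=-1))
fused = g * transferred + (1 - g) * intrinsic_feat       # gated selecting-vector fusion
print(fused.shape)                                        # torch.Size([8, 32])
```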
8743: SMILE: A Scale-aware Multiple Instance Learning Method for Multicenter STAS Lung Cancer Histopathology Diagnosis
Authors: Liangrui Pan, Xiaoyu Li, Yutao Dou, Qiya Song, Jiadi Luo, Qingchun Liang, Shaoliang Peng
Location: Guangzhou | Day: TBD
Spread through air spaces (STAS) represents a newly identified aggressive pattern in lung cancer, which is known to be associated with adverse prognostic factors and complex pathological features. Pathologists currently rely on time-consuming manual assessments, which are highly subjective and prone to variation. This highlights the urgent need for automated and precise diagnostic solutions. We collected 2,970 lung cancer tissue slides from multiple centers, re-diagnosed them, and constructed and publicly released three lung cancer STAS datasets: STAS-CSU (hospital), STAS-TCGA, and STAS-CPTAC. All STAS datasets provide corresponding pathological feature diagnoses and related clinical data. To address the biased, sparse, and heterogeneous nature of STAS, we propose a scale-aware multiple instance learning (SMILE) method for STAS diagnosis of lung cancer. By introducing a scale-adaptive attention mechanism, SMILE can adaptively adjust high-attention instances, reducing over-reliance on local regions and promoting consistent detection of STAS lesions. Extensive experiments show that SMILE achieved competitive diagnostic results on STAS-CSU, diagnosing 251 and 319 STAS samples in CPTAC and TCGA, respectively, surpassing the clinical average AUC. The 11 open baseline results are the first to be established for STAS research, laying the foundation for the future expansion, interpretability, and clinical integration of computational pathology technologies. The datasets and code are available at https://github.com/panliangrui/IJCAI25.
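For orientation, a generic attention-based multiple instance learning head over a bag of patch embeddings is sketched below; SMILE's scale-adaptive attention is not reproduced, so treat this as a baseline-style illustration with assumed dimensions.

```python
# Generic attention-based MIL pooling for a whole-slide bag of patch features;
# the scale-adaptive mechanism of SMILE is not reproduced here.
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    def __init__(self, feat_dim=512, hidden=128, n_classes=2):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(feat_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, bag):                              # bag: (n_instances, feat_dim)
        weights = torch.softmax(self.attn(bag), dim=0)   # attention over instances
        slide_repr = (weights * bag).sum(dim=0)          # weighted pooling of patch features
        return self.classifier(slide_repr), weights

model = AttentionMIL()
patches = torch.randn(500, 512)                   # pre-extracted patch embeddings for one slide
logits, attn = model(patches)
print(logits.shape, attn.shape)                   # torch.Size([2]) torch.Size([500, 1])
```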
8747: ECG2TOK: ECG Pre-Training with Self-Distillation Semantic Tokenizers
Authors: Xiaoyan Yuan, Wei Wang, Han Liu, Jian Chen, Xiping Hu
Location: Guangzhou | Day: TBD
Self-supervised learning (SSL) has garnered increasing attention in electrocardiogram (ECG) analysis for its effectiveness in resource-limited settings. Existing state-of-the-art SSL methods rely on time-frequency detail reconstruction, but due to the inherent redundancy of ECG signals and individual variability, these approaches often yield suboptimal performance. In contrast, discrete label prediction is a superior pre-training objective because it encourages models to efficiently abstract high-level ECG semantics. However, the continuity and significant variability of ECG signals pose a challenge in generating semantically discrete labels. To address this issue, we propose an ECG pre-training framework with a self-distillation semantic tokenizer (ECG2TOK), which maps continuous ECG signals into discrete labels for self-supervised training. Specifically, the tokenizer extracts semantically aware embeddings of ECG by self-distillation and performs online clustering to generate semantically rich discrete labels. Subsequently, the SSL model is trained in conjunction with masking strategies and discrete label prediction to facilitate the abstraction of high-level semantic representations. We evaluate ECG2TOK on six downstream tasks, demonstrating that ECG2TOK efficiently achieves state-of-the-art performance and up to a 30.73% AUC increase in low-resource scenarios. Moreover, visualization experiments demonstrate that the discrete labels generated by ECG2TOK exhibit consistent semantics closely associated with clinical features. Our code is available at https://github.com/YXYanova/ECG2TOK.
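A simplified sketch of the tokenization idea only: embed fixed-length ECG windows, cluster the embeddings, and use the cluster indices as discrete labels. ECG2TOK's self-distilled encoder and online clustering are replaced here by a random projection and MiniBatchKMeans, so everything below is an assumption.

```python
# Simplified tokenizer sketch: embed fixed-length ECG windows, cluster the
# embeddings, and treat cluster ids as discrete labels for masked prediction.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
ecg = rng.normal(size=(64, 5000))                 # 64 ten-second single-lead records at 500 Hz (synthetic)

win = 250                                         # 0.5 s windows
windows = ecg.reshape(64, -1, win)                # (64, 20, 250)
flat = windows.reshape(-1, win)

encoder = rng.normal(size=(win, 32))              # random projection standing in for a learned encoder
emb = flat @ encoder

tokenizer = MiniBatchKMeans(n_clusters=128, random_state=0).fit(emb)
tokens = tokenizer.predict(emb).reshape(64, -1)   # discrete labels per window, used as SSL targets
print(tokens.shape, tokens[0, :5])
```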
8825: City-Level Foreign Direct Investment Prediction with Tabular Learning on Judicial Data
Authors: Tianxing Wu, Lizhe Cao, Shuang Wang, Jiming Wang, Shutong Zhu, Yerong Wu, Yuqing Feng
Location: Guangzhou | Day: TBD
To advance the United Nations Sustainable Development Goal on promoting sustained, inclusive, and sustainable economic growth, foreign direct investment (FDI) plays a crucial role in catalyzing economic expansion and fostering innovation. Precise city-level FDI prediction is important for local governments and is commonly studied based on economic data (e.g., GDP). However, such economic data can be prone to manipulation, making predictions less reliable. To address this issue, we leverage large-scale judicial data, which reflects the judicial performance influencing local investment security and returns, for city-level FDI prediction. Based on this, we first build an index system for the evaluation of judicial performance over twelve million publicly available adjudication documents, according to which a tabular dataset is constructed. We then propose a new Tabular Learning method on Judicial Data (TLJD) for city-level FDI prediction. TLJD integrates row data and column data in our tabular dataset for judicial performance indicator encoding, and utilizes a mixture-of-experts model to adjust the weights of different indicators considering regional variations. To validate the effectiveness of TLJD, we design cross-city and cross-time tasks for city-level FDI prediction. Extensive experiments on both tasks demonstrate the superiority of TLJD (reaching an R2 of at least 0.92) over ten state-of-the-art baselines across different evaluation metrics.
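A toy mixture-of-experts sketch for region-aware weighting of tabular indicators is given below; the gating on regional features and the linear experts are assumptions, and TLJD's row/column encoders are not reproduced.

```python
# Toy mixture of experts for tabular FDI regression: a gate conditioned on
# regional features mixes several linear experts over judicial indicators.
import torch
import torch.nn as nn

n_indicators, n_region_feats, n_experts = 20, 4, 3

experts = nn.ModuleList([nn.Linear(n_indicators, 1) for _ in range(n_experts)])
gate = nn.Sequential(nn.Linear(n_region_feats, n_experts), nn.Softmax(dim=-1))

indicators = torch.randn(32, n_indicators)   # judicial-performance indicators per city
region = torch.randn(32, n_region_feats)     # coarse regional descriptors

weights = gate(region)                                               # (32, 3) expert weights per city
expert_out = torch.stack([e(indicators) for e in experts], dim=-1)   # (32, 1, 3)
fdi_pred = (expert_out.squeeze(1) * weights).sum(dim=-1)             # (32,) weighted city-level FDI estimate
print(fdi_pred.shape)
```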
9079: What is Beneath Misogyny: Misogynous Memes Classification and Explanation
Authors: Kushal Kanwar, Dushyant Singh Chauhan, Gopendra Vikram Singh, Asif Ekbal
Location: Guangzhou | Day: TBD
Memes are popular in the modern world and are distributed primarily for entertainment. However, harmful ideologies such as misogyny can be propagated through innocent-looking memes. Detecting and understanding why a meme is misogynous is a research challenge due to its multimodal nature (image and text) and its nuanced manifestations across different societal contexts. We introduce a novel multimodal approach, namely MM-Misogyny, to detect, categorize, and explain misogynistic content in memes. MM-Misogyny processes text and image modalities separately and unifies them into a multimodal context through a cross-attention mechanism. The resulting multimodal context is then easily processed for labeling, categorization, and explanation via a classifier and a Large Language Model (LLM). The evaluation of the proposed model is performed on a newly curated dataset, What's Beneath Misogynous Stereotyping (WBMS), created by collecting misogynous memes from cyberspace and categorizing them into four categories, namely Kitchen, Leadership, Working, and Shopping. The model not only detects and classifies misogyny, but also provides a granular understanding of how misogyny operates in different domains of life. The results demonstrate the superiority of our approach compared to existing methods. The code and dataset are available at https://github.com/Misogyny.
9121: Denoised Attention and Question-Augmented Representations for Knowledge Tracing
Authors: Jiwei Deng, Youheng Bai, Mingliang Hou, Teng Guo, Zitao Liu, Weiqi Luo
Location: Guangzhou | Day: TBD
Knowledge tracing (KT) is an essential task in online education systems. It aims to predict the future performance of students based on their historical learning interaction data. Despite significant advancements in attention-based KT models, they still face some limitations: inaccurate input representation and excessive modeling of student forgetting. These limitations often lead to the attention noise problem: the model assigns non-negligible attention weight to information that is cognitively irrelevant in nature, thereby generating interference signals. To address this problem, we propose a novel KT model, DenoiseKT. DenoiseKT effectively models question difficulty and utilizes a graph neural network to capture the complex relationships between questions, thereby refining the representations of input features. Additionally, the denoised attention mechanism introduces a weight factor to reduce the model's attention weight distribution on irrelevant information. We extensively compare DenoiseKT with 22 state-of-the-art KT models on 4 widely-used public datasets. Experimental results show that DenoiseKT can effectively solve the attention noise problem and outperform other models. The source code of DenoiseKT is available at https://pykt.org.
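The weight-factor idea can be pictured with a toy attention step in which a learned per-key relevance score rescales the softmax weights; the formulation below is an assumption, not DenoiseKT's exact mechanism.

```python
# Toy denoised attention: a learned per-key relevance factor multiplies the
# softmax attention weights, shrinking attention paid to irrelevant history.
import torch
import torch.nn as nn

d, seq = 32, 10
q = torch.randn(1, 1, d)          # current question representation
k = v = torch.randn(1, seq, d)    # past interaction representations

relevance = nn.Sequential(nn.Linear(d, 1), nn.Sigmoid())   # weight factor in [0, 1] per key

scores = (q @ k.transpose(1, 2)) / d ** 0.5                 # (1, 1, seq)
attn = torch.softmax(scores, dim=-1) * relevance(k).transpose(1, 2)
attn = attn / attn.sum(dim=-1, keepdim=True)                # renormalize after denoising
context = attn @ v                                           # (1, 1, d) denoised summary of history
print(context.shape)
```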