Special Track on Human-Centred AI Papers

8460: LivePoem: Improving the Learning Experience of Classical Chinese Poetry with AI-Generated Musical Storyboards

Authors: Qihao Liang, Xichu Ma, Torin Hopkins, Ye Wang
Location: Montreal | Day: August 21st | Time: 15:00 | Session: Human Centred AI (2/2)
Poster Board Position: From board n131 to board n136
Open-science materials: https://github.com/lqhac/LivePoem
Textbook reading has long dominated classical poetry education in Chinese-speaking communities. However, research has shown that extensive text-based learning can lead to learner disengagement and a less pleasant experience. This paper aims to improve the experience of classical Chinese poetry learning by introducing LivePoem—a system that generates musical storyboards (storyboards with background music) as audiovisual aids to support poetry comprehension. We employ a pre-trained diffusion model for storyboard generation and train a prosody-based poem-to-melody generator using a Transformer model, both validated by standard objective metrics to ensure generation quality. Through a within-subjects study involving 25 non-native Chinese learners, we compared learning outcomes from textbook reading and musical storyboard viewing through standardised reading comprehension tests. Additionally, the learning experience was assessed by the Self-Assessment Manikin (SAM) and an inductive thematic analysis of learners’ open-ended feedback. Experimental results show that musical storyboards matched the learning outcomes of textbook reading while more effectively engaging learners and providing a more pleasant learning experience.
8556: Explainability Through Human-Centric Design for XAI in Lung Cancer Detection

Authors: Amy Rafferty, Rishi Ramaesh, Ajitha Rajan
Location: Montreal | Day: August 21st | Time: 15:00 | Session: Human Centred AI (2/2)
Poster Board Position: From board n131 to board n136
Deep learning models have shown promise in lung pathology detection from chest X-rays, but widespread clinical adoption remains limited due to opaque model decision-making. In prior work, we introduced ClinicXAI, a human-centric, expert-guided concept bottleneck model (CBM) designed for interpretable lung cancer diagnosis. We now extend that approach and present XpertXAI, a generalizable expert-driven model that preserves human-interpretable clinical concepts while scaling to detect multiple lung pathologies. Using a high-performing InceptionV3-based classifier and a public dataset of chest X-rays with radiology reports, we compare XpertXAI against leading post-hoc explainability methods and an unsupervised CBM, XCBs. We assess explanations through comparison with expert radiologist annotations and medical ground truth. Although XpertXAI is trained for multiple pathologies, our expert validation focuses on lung cancer. We find that existing techniques frequently fail to produce clinically meaningful explanations, omitting key diagnostic features and disagreeing with radiologist judgments. XpertXAI not only outperforms these baselines in predictive accuracy but also delivers concept-level explanations that better align with expert reasoning. While our focus remains on explainability in lung cancer detection, this work illustrates how human-centric model design can be effectively extended to broader diagnostic contexts — offering a scalable path toward clinically meaningful explainable AI in medical diagnostics.
8558: Explainable Automatic Fact-Checking for Journalists Augmentation in the Wild

Authors: Filipe Altoe, Sérgio Miguel Gonçalves Pinto, H Sofia Pinto
Location: Montreal | Day: August 21st | Time: 15:00 | Session: Human Centred AI (2/2)
Poster Board Position: From board n131 to board n136
Journalistic manual fact-checking is the usual way to address fake news; however, this labor-intensive task often cannot keep pace with the scale of the problem. The literature has introduced automated fact-checking (AFC) as a potential solution, but the AFC pipeline still lacks key functionality, research benchmarking data remain scarce, and system design is disconnected from the human factors crucial for adoption. We present a fully explainable AFC framework designed to augment professional journalists in the wild. A novel human-annotation-free approach surpasses the state of the art in multi-label classification by 12%. It is the first to demonstrate strong generalization across different claim subjects without retraining and to generate complete verdict-explanation articles and their summaries. A focused user study of 103 professional journalists, 93% of whom have dedicated fact-checking experience, validates the framework’s explainability, transparency, and the quality of its generated fact-checking artifacts. The study also underscored the importance of clear source-selection and bias-evaluation criteria, reinforcing that AFC systems should augment, not replace, human fact-checkers.
8757: Toward Informed AV Decision-Making: Computational Model of Well-being and Trust in Mobility

Authors: Zahra Zahedi, Shashank Mehrotra, Teruhisa Misu, Kumar Akash
Location: Montreal | Day: August 21st | Time: 15:00 | Session: Human Centred AI (2/2)
Poster Board Position: From board n131 to board n136
For future human-autonomous vehicle (AV) interactions to be effective and smooth, human-aware systems that analyze and align human needs with automation decisions are essential. Achieving this requires systems that account for human cognitive states. We present a novel computational model in the form of a Dynamic Bayesian Network (DBN) that infers the cognitive states of both AV users and other road users, integrating this information into the AV’s decision-making process. Specifically, our model captures the “well-being” of both an AV user and an interacting road user as cognitive states alongside trust. Our DBN infers beliefs over the AV user’s evolving well-being, trust, and intention states, as well as the possible well-being of other road users, based on observed interaction experiences. Using data collected from an interaction study, we refine the model parameters and empirically assess its performance. Finally, we extend our model into a causal inference model (CIM) framework for AV decision-making, enabling the AV to enhance user well-being and trust while balancing these factors with its own operational costs and the well-being of interacting road users. Our evaluation demonstrates the model’s effectiveness in accurately predicting users’ states and guiding informed, human-centered AV decisions.
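The belief-update machinery behind any such DBN can be illustrated with a minimal discrete Bayesian filter. This is a generic sketch, not the paper's model: the two-state "trust" variable, the transition probabilities, and the override/comply observation model are all hypothetical placeholders chosen for illustration.

```python
# Minimal discrete Bayesian filter: the predict-then-correct belief update at
# the core of a Dynamic Bayesian Network. All parameters are illustrative.

STATES = ("low_trust", "high_trust")

# P(state_t | state_{t-1}): trust tends to persist between time steps.
TRANSITION = {
    "low_trust":  {"low_trust": 0.8, "high_trust": 0.2},
    "high_trust": {"low_trust": 0.1, "high_trust": 0.9},
}

# P(observation | state): e.g. whether the user overrides the AV's decision.
OBSERVATION = {
    "low_trust":  {"override": 0.7, "comply": 0.3},
    "high_trust": {"override": 0.1, "comply": 0.9},
}

def update_belief(belief, observation):
    """One filtering step: propagate through the transition model, then
    reweight by the observation likelihood and renormalise."""
    predicted = {
        s: sum(belief[p] * TRANSITION[p][s] for p in STATES) for s in STATES
    }
    unnorm = {s: predicted[s] * OBSERVATION[s][observation] for s in STATES}
    z = sum(unnorm.values())
    return {s: v / z for s, v in unnorm.items()}

belief = {"low_trust": 0.5, "high_trust": 0.5}
for obs in ["comply", "comply", "override"]:
    belief = update_belief(belief, obs)
print(belief)  # trust belief shifts down again after the final override
```

An AV planner could feed such a filtered belief into its decision-making, trading off inferred trust and well-being against operational cost, as the abstract describes at a high level.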
8796: Reflective Verbal Reward Design for Pluralistic Alignment

Authors: Carter Blair, Kate Larson, Edith Law
Location: Montreal | Day: August 21st | Time: 15:00 | Session: Human Centred AI (2/2)
Poster Board Position: From board n131 to board n136
Open-science materials: https://osf.io/8yxf2/
AI agents are commonly aligned with "human values" through reinforcement learning from human feedback (RLHF), where a single reward model is learned from aggregated human feedback and used to align an agent’s behavior. However, human values are not homogeneous: different people hold distinct and sometimes conflicting values. Aggregating feedback into a single reward model risks disproportionately suppressing minority preferences. To address this, we present a novel reward modeling approach for learning individualized reward models. Our approach uses a language model to guide users through reflective dialogues where they critique agent behavior and construct their preferences. This personalized dialogue history, containing the user’s reflections and critiqued examples, is then used as context for another language model that serves as an individualized reward function (what we call a "verbal reward model") for evaluating new trajectories. In studies with 30 participants, our method achieved a 9-12% improvement in accuracy over non-reflective verbal reward models while being more sample efficient than traditional supervised learning methods.
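The core mechanism, conditioning a language model on a user's reflective dialogue history to score new trajectories, can be sketched in a few lines. This is an illustrative assumption of how such a prompt might be assembled, not the paper's actual prompt; `call_llm` is a hypothetical stand-in for any chat-completion API and is stubbed here so the sketch runs offline.

```python
# Sketch of a "verbal reward model": the user's reflective dialogue history is
# prepended as context, and a language model rates a new trajectory.

def build_reward_prompt(dialogue_history, trajectory):
    """Assemble the individualized scoring prompt (assumed format)."""
    context = "\n".join(f"{role}: {text}" for role, text in dialogue_history)
    return (
        "You are a reward model for one specific user.\n"
        "Their reflections on past agent behavior:\n"
        f"{context}\n\n"
        f"New trajectory:\n{trajectory}\n"
        "Rate how well this matches the user's preferences (0-10). "
        "Reply with a single number."
    )

def call_llm(prompt):
    # Placeholder: a real system would query a chat model here.
    return "7"

def verbal_reward(dialogue_history, trajectory):
    """Score a trajectory with the LM acting as the reward function."""
    return float(call_llm(build_reward_prompt(dialogue_history, trajectory)))

history = [
    ("user", "I disliked when the agent took risky shortcuts."),
    ("assistant", "Noted: you value cautious behavior over speed."),
]
print(verbal_reward(history, "Agent takes the longer but safer route."))
```

Because the reward function is just context plus a query, personalizing it to a new user means swapping in that user's dialogue history rather than retraining a model.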
8844: The Delta of Thought: Channeling Rivers of Commonsense Knowledge in the Sea of Metaphorical Interpretations

Authors: Antonio Lieto, Gian Luca Pozzato, Stefano Zoia
Location: Montreal | Day: August 21st | Time: 15:00 | Session: Human Centred AI (2/2)
Poster Board Position: From board n131 to board n136
Open-science materials: https://github.com/StefanoZoia/METCL
We propose a system called METCL (Metaphor Elaboration in Typicality-Based Compositional Logic), able to generate and identify metaphors using the TCL reasoning framework, which is specialized in human-like commonsense concept combination. We show that METCL improves on both state-of-the-art Large Language Models (e.g., DeepSeek-R1, GPT-4o, Qwen2.5-Max) and symbolic systems in the task of metaphor identification. Additionally, we show that the metaphors generated by METCL are generally well accepted by human subjects. These encouraging results pave the way for research in automatic metaphor generation and comprehension based on the assumption that metaphor interpretation can be partially regarded as a categorization problem relying on generative commonsense concept combination.
8947: Shaping Shared Languages: Human and Large Language Models’ Inductive Biases in Emergent Communication

Authors: Tom Kouwenhoven, Max Peeperkorn, Roy de Kleijn, Tessa Verhoef
Location: Montreal | Day: August 21st | Time: 11:30 | Session: Human Centred AI (1/2)
Poster Board Position: From board n137 to board n138
Languages are shaped by the inductive biases of their users. Using a classical referential game, we investigate how artificial languages evolve when optimised for inductive biases in humans and large language models (LLMs) via Human-Human, LLM-LLM and Human-LLM experiments. We show that referentially grounded vocabularies emerge that enable reliable communication in all conditions, even when humans and LLMs collaborate. Comparisons between conditions reveal that languages optimised for LLMs subtly differ from those optimised for humans. Interestingly, interactions between humans and LLMs alleviate these differences and result in vocabularies more human-like than LLM-like. These findings advance our understanding of the role inductive biases in LLMs play in the dynamic nature of human language and contribute to maintaining alignment in human and machine communication. In particular, our work underscores the need for new LLM training methods that incorporate human interaction, and shows that using communicative success as a reward signal can be a fruitful, novel direction.
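The classical referential game the abstract builds on has a simple structure: a speaker names a target with a signal from its lexicon, and a listener must pick that target out of a candidate set, with success usable as a reward signal. A minimal sketch follows; the objects, signals, and lexica are toy placeholders, not the paper's stimuli.

```python
# One round of a referential game between a speaker and a listener.

import random

OBJECTS = ["red_circle", "blue_circle", "red_square"]

def play_round(speaker_lexicon, listener_lexicon, rng):
    """Speaker signals a random target; listener guesses which object it was.
    Returns True on communicative success."""
    target = rng.choice(OBJECTS)
    signal = speaker_lexicon[target]
    # Listener picks among objects whose lexicon entry matches the signal,
    # guessing at random if the signal is ambiguous or unknown.
    matches = [o for o in OBJECTS if listener_lexicon.get(o) == signal]
    guess = rng.choice(matches) if matches else rng.choice(OBJECTS)
    return guess == target

rng = random.Random(0)
shared = {"red_circle": "za", "blue_circle": "mo", "red_square": "ki"}
successes = sum(play_round(shared, shared, rng) for _ in range(100))
print(successes)  # with identical, unambiguous lexica, every round succeeds
```

Vocabularies "optimised for" a particular agent then correspond to whichever lexica that agent's inductive biases make easiest to learn and use under this success signal.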
9184: Enhancing Automated Grading in Science Education through LLM-Driven Causal Reasoning and Multimodal Analysis

Authors: Haohao Zhu, Tingting Li, Peng He, Jiayu Zhou
Location: Montreal | Day: August 21st | Time: 11:30 | Session: Human Centred AI (1/2)
Poster Board Position: From board n137 to board n138
Automated assessment of open responses in K–12 science education poses significant challenges due to the multimodal nature of student work, which often integrates textual explanations, drawings, and handwritten elements. Traditional evaluation methods that focus solely on textual analysis fail to capture the full breadth of student reasoning and are susceptible to biases such as handwriting neatness or answer length. In this paper, we propose a novel LLM-augmented multimodal evaluation framework that addresses these limitations through a comprehensive, bias-corrected grading system. Our approach leverages LLMs to generate causal knowledge graphs that encapsulate the essential conceptual relationships in student responses, comparing these graphs with those derived automatically from the rubrics and submissions. Experimental results demonstrate that our framework improves grading accuracy and consistency over deep supervised learning and few-shot LLM baselines.
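The comparison step, matching a causal knowledge graph derived from a student response against one derived from the rubric, can be sketched as set overlap over directed (cause, effect) edges. This is a simplified stand-in for the paper's comparison: the LLM-based edge extraction is omitted, the toy graphs are hand-written placeholders, and Jaccard similarity is one assumed scoring choice among many.

```python
# Grading sketch: score a student's causal concept graph against the rubric's
# by edge-level Jaccard similarity over directed (cause, effect) pairs.

def graph_overlap(rubric_edges, student_edges):
    """Jaccard similarity between two edge sets; 1.0 for two empty graphs."""
    rubric, student = set(rubric_edges), set(student_edges)
    if not rubric and not student:
        return 1.0
    return len(rubric & student) / len(rubric | student)

rubric = [("sunlight", "photosynthesis"), ("photosynthesis", "oxygen")]
student = [("sunlight", "photosynthesis"), ("photosynthesis", "glucose")]
print(graph_overlap(rubric, student))  # 1 shared edge of 3 distinct -> 0.333...
```

Scoring conceptual relationships rather than surface text is what makes such a scheme insensitive to handwriting neatness or answer length, the biases the abstract highlights.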