Special Track on AI and Social Good Papers

947: Deep Reinforcement Learning for Efficient and Fair Allocation of Healthcare Resources Preprint

Authors: Yikuan Li, Chengsheng Mao, Kaixuan Huang, Hanyin Wang, Zheng Yu, Mengdi Wang, Yuan Luo

Location: Montreal | Day: August 21st | Time: 10:00 | Session: AI for Social Good (4/8)

Poster Board Position: From board n118 to board n120

The scarcity of health care resources, such as ventilators, often leads to the unavoidable consequence of rationing, particularly during public health emergencies such as pandemics or in resource-constrained settings. The absence of a universally accepted standard for resource allocation protocols results in governments relying on varying criteria and heuristic-based approaches, often yielding suboptimal and inequitable outcomes. This study addresses the societal challenge of fair and effective critical care resource allocation by leveraging deep reinforcement learning to optimize policy decisions. We propose a transformer-based deep Q-network that integrates individual patient disease progression and interaction effects among patients to enhance allocation decisions. Our method aims to improve both fairness and overall patient outcomes. Experiments using metrics such as normalized survival rates and interracial allocation rate differences demonstrate that our approach significantly reduces excess deaths and achieves more equitable resource allocation compared to severity- and comorbidity-based protocols currently in use. Our findings highlight the potential of deep reinforcement learning to address critical health care challenges.

1853: Faster Annotation for Elevation-Guided Flood Extent Mapping by Consistency-Enhanced Active Learning Preprint

Authors: Saugat Adhikari, Da Yan, Tianyang Wang, Landon Dyken, Sidharth Kumar, Lyuheng Yuan, Akhlaque Ahmad, Jiao Han, Yang Zhou, Steve Petruzza

Location: Montreal | Day: August 20th | Time: 14:00 | Session: AI for Social Good (3/8)

Poster Board Position: From board n98 to board n104

Flood extent mapping is crucial for disaster response and damage assessment. While Earth imagery and terrain data (in the form of DEM) are now readily available, there is little flood annotation data for training machine learning models, which hinders the automated mapping of flooded areas. We propose ALFA, an interactive active-learning-based approach to minimize the annotators’ effort when preparing the ground-truth flood map in a satellite image. ALFA calibrates the prediction consistency of a segmentation model (1) across training cycles and (2) for various data augmentations. The two consistencies are integrated into the design of both the acquisition function and the loss function to enhance the robustness of active learning with limited annotation inputs. ALFA recommends those superpixels that the underlying model is most uncertain about, and users can annotate their pixels with minimal clicks with the help of elevation guidance. Extensive experiments on various regions hit by flooding show that we can reduce the annotation time from hours to around 20 minutes. ALFA is open sourced at https://github.com/saugatadhikari/alfa.

2260: Automating Intervention Discovery from Scientific Literature: A Progressive Ontology Prompting and Dual-LLM Framework Preprint

Authors: Yuting Hu, Dancheng Liu, Qingyun Wang, Charles Yu, Chenhui Xu, Qingxiao Zheng, Heng Ji, Jinjun Xiong

Location: Montreal | Day: August 19th | Time: 15:00 | Session: AI for Social Good (2/8)

Poster Board Position: From board n84 to board n89

Identifying effective interventions from the scientific literature is challenging due to the high volume of publications, specialized terminology, and inconsistent reporting formats, making manual curation laborious and prone to oversight. To address this challenge, this paper proposes a novel framework leveraging large language models (LLMs), which integrates a progressive ontology prompting (POP) algorithm with a dual-agent system, named LLM-Duo. On the one hand, the POP algorithm conducts a prioritized breadth-first search (BFS) across a predefined ontology, generating structured prompt templates and action sequences to guide the automatic annotation process. On the other hand, the LLM-Duo system features two specialized LLM agents, an explorer and an evaluator, working collaboratively and adversarially to continuously refine annotation quality. We showcase the real-world applicability of our framework through a case study focused on speech-language intervention discovery. Experimental results show that our approach surpasses advanced baselines, achieving more accurate and comprehensive annotations through a fully automated process. Our approach successfully identified 2,421 interventions from a corpus of 64,177 research articles in the speech-language pathology domain, culminating in the creation of a publicly accessible intervention knowledge base with great potential to benefit the speech-language pathology community.
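The prioritized breadth-first search mentioned in the abstract can be illustrated with a minimal sketch. It is not the POP algorithm itself: the toy ontology, the priority values, and the function name `prioritized_bfs` are all hypothetical, and the sketch only shows one way to visit concepts level by level while ordering siblings by priority.

```python
import heapq

def prioritized_bfs(ontology, priority, root):
    """Visit concepts level by level; within a level, lower priority value first."""
    order, seen = [], {root}
    frontier = [(0, priority.get(root, 0), root)]  # (depth, priority, concept)
    while frontier:
        depth, _, node = heapq.heappop(frontier)
        order.append(node)
        for child in ontology.get(node, []):
            if child not in seen:
                seen.add(child)
                heapq.heappush(frontier, (depth + 1, priority.get(child, 0), child))
    return order

# Hypothetical ontology (parent -> children) and priorities, for illustration only.
ontology = {
    "Intervention": ["Target", "Setting", "Outcome"],
    "Target": ["AgeGroup"],
    "Outcome": ["Measure"],
}
priority = {"Outcome": 0, "Target": 1, "Setting": 2, "Measure": 0, "AgeGroup": 1}

visit_order = prioritized_bfs(ontology, priority, "Intervention")
print(visit_order)
```

In a prompting pipeline, each visited concept would presumably map to one prompt template; that mapping is left out here because the abstract does not specify it.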

8400: Detection and Geographic Localization of Natural Objects in the Wild: A Case Study on Palms Preprint

Authors: Kangning Cui, Rongkun Zhu, Manqi Wang, Wei Tang, Gregory D. Larsen, Victor P. Pauca, Sarra Alqahtani, Fan Yang, David Segurado, David A. Lutz, Jean-Michel Morel, Miles R. Silman

Location: Montreal | Day: August 20th | Time: 14:00 | Session: AI for Social Good (3/8)

Poster Board Position: From board n98 to board n104

Palms are ecological and economic indicators of tropical forest health, biodiversity, and human impact that support local economies and global forest product supply chains. While palm detection in plantations is well-studied, efforts to map naturally occurring palms in dense forests remain limited by overlapping crowns, uneven shading, and heterogeneous landscapes. We develop PRISM (Processing, Inference, Segmentation, and Mapping), a flexible pipeline for detecting and localizing palms in dense tropical forests using large orthomosaic images. Orthomosaics are created from thousands of aerial images and span several to hundreds of gigabytes. Our contributions are threefold. First, we construct a large UAV-derived orthomosaic dataset collected across 21 ecologically diverse sites in western Ecuador, annotated with 8,830 bounding boxes and 5,026 palm center points. Second, we evaluate multiple state-of-the-art object detectors based on efficiency and performance, integrating zero-shot SAM 2 as the segmentation backbone, and refining the results for precise geographic mapping. Third, we apply calibration methods to align confidence scores with IoU and explore saliency maps for feature explainability. Though optimized for palms, PRISM is adaptable for identifying other natural objects, such as eastern white pines. Future work will explore transfer learning for lower-resolution datasets (0.5–1 m). Data and code can be found at github.com/Zippppo/PRISM.

8411: DeepShade: Enable Shade Simulation by Text-conditioned Image Generation Preprint

Authors: Longchao Da, Xiangrui Liu, Mithun Shivakoti, Thirulogasankar Pranav Kutralingam, Yezhou Yang, Hua Wei

Location: Montreal | Day: August 22nd | Time: 11:30 | Session: AI for Social Good (8/8)

Poster Board Position: From board n49 to board n52

Heatwaves pose a significant threat to public health, especially as global warming intensifies. However, current routing systems (e.g., online maps) fail to incorporate shade information due to the difficulty of estimating shade directly from noisy satellite imagery and the limited availability of training data for generative models. In this paper, we address these challenges through two main contributions. First, we build an extensive dataset covering diverse longitude-latitude regions, varying levels of building density, and different urban layouts. Leveraging Blender-based 3D simulations alongside building outlines, we capture building shadows under various solar zenith angles throughout the year and at different times of day. These simulated shadows are aligned with satellite images, providing a rich resource for learning shade patterns. Second, we propose DeepShade, a diffusion-based model designed to learn and synthesize shade variations over time. It emphasizes the nuance of edge features by jointly considering RGB with the Canny edge layer, and incorporates contrastive learning to capture the temporal change rules of shade. Then, by conditioning on textual descriptions of known conditions (e.g., time of day, solar angles), our framework provides improved performance in generating shade images. We demonstrate the utility of our approach by using our shade predictions to calculate shade ratios for real-world route planning in Tempe, Arizona. We believe this work will benefit society by providing a reference for urban planning under extreme heat and for potential practical applications in the environment.

8422: Exploring Multimodal Foundation AI and Expert-in-the-Loop for Sustainable Management of Wild Salmon Fisheries in Indigenous Rivers Preprint

Authors: Chi Xu, Yili Jin, Sami Ma, Rongsheng Qian, Hao Fang, Jiangchuan Liu, Xue Liu, Edith C.H. Ngai, William I. Atlas, Katrina M. Connors, Mark A. Spoljaric

Location: Montreal | Day: August 20th | Time: 14:00 | Session: AI for Social Good (3/8)

Poster Board Position: From board n98 to board n104

Wild salmon are essential to the ecological, economic, and cultural sustainability of the North Pacific Rim. Yet climate variability, habitat loss, and data limitations in remote ecosystems that lack basic infrastructure support pose significant challenges to effective fisheries management. This project explores the integration of multimodal foundation AI and expert-in-the-loop frameworks to enhance wild salmon monitoring and sustainable fisheries management in Indigenous rivers across the Pacific Northwest. By leveraging video and sonar-based monitoring, we develop AI-powered tools for automated species identification, counting, and length measurement, reducing manual effort, expediting delivery of results, and improving decision-making accuracy. Expert validation and active learning frameworks ensure ecological relevance while reducing annotation burdens. To address unique technical and societal challenges, we bring together a cross-domain, interdisciplinary team of university researchers, fisheries biologists, Indigenous stewardship practitioners, government agencies, and conservation organizations. Through these collaborations, our research fosters ethical AI co-development, open data sharing, and culturally informed fisheries management.

8459: Classifying and Tracking International Aid Contribution Towards SDGs Preprint

Authors: Sungwon Park, Dongjoon Lee, Kyeongjin Ahn, Yubin Choi, Junho Lee, Meeyoung Cha, Kyung Ryul Park

Location: Montreal | Day: August 21st | Time: 15:00 | Session: AI for Social Good (6/8)

Poster Board Position: From board n112 to board n117

International aid is a critical mechanism for promoting economic growth and well-being in developing nations, supporting progress toward the Sustainable Development Goals (SDGs). However, tracking aid contributions remains challenging due to labor-intensive data management, incomplete records, and the heterogeneous nature of aid data. Recognizing the urgency of this challenge, we partnered with government agencies to develop an AI model that complements manual classification and mitigates human bias in subjective interpretation. By integrating SDG-specific semantics and leveraging prior knowledge from language models, our approach enhances classification accuracy and accommodates the diversity of aid projects. When applied to a comprehensive dataset spanning multiple years, our model can reveal hidden trends in the temporal evolution of international development cooperation. Expert interviews further suggest how these insights can empower policymakers with data-driven decision-making tools, ultimately improving aid effectiveness and supporting progress toward SDGs.

8559: AI-Assisted Triage and Decision Support of Head and Neck Cancer Screening and Diagnosis in Low-Resourced Settings Preprint

Authors: Min Hun Lee, Sean Shao Wei Lam, Shaun Xin Hong Liew, Michael Dorosan, Nicholas Graves, Jonas Karlström, Hiang Khoon Tan, Walter Tsong Lee

Location: Montreal | Day: August 20th | Time: 14:00 | Session: AI for Social Good (3/8)

Poster Board Position: From board n98 to board n104

The mortality burden of head and neck cancer (HNC) is increasing globally and disproportionately affects people in low- and middle-income countries with limited medical workforce. To address this issue, artificial intelligence (AI) algorithms are increasingly being explored to process medical imaging data, demonstrating competitive performance. However, the clinical adoption of AI remains challenging as clinicians struggle to understand how complex AI works and to trust it enough to use it in practice. In addition, AI may not perform well on endoscopy videos of varying quality collected from multiple sites for HNC screening and diagnosis.

In this project, our international and interdisciplinary team will collaborate with clinicians from multiple sites (e.g., Singapore, the U.S., and Bangladesh) to collect a diverse, multi-site dataset. In addition, we aim to design and develop computational techniques and practices to improve collaborations between clinicians and AI for the triage and diagnosis of HNC. Specifically, these techniques include a YOLOv5-based glottis detector, a classifier of patient status using clinical endoscopy videos, uncertainty quantification techniques, and interactive Vision Language Model-based AI explanations, which will enable clinicians to understand AI outputs and provide their inputs to improve AI. After developing our system, we will evaluate the effectiveness of these computational techniques in enabling AI-assisted point-of-care triage and decision-support for HNC, particularly in resource-limited settings.

8662: Towards the 30 by 30 Kunming-Montreal Global Biodiversity Framework Target: Optimising Graph Connectivity in Constraint-Based Spatial Planning Preprint

Authors: Sulian Le Bozec-Chiffoleau, Dimitri Justeau-Allaire, Xavier Lorca, Charles Prud’homme, Gilles Simonin, Philippe Vismara, Philippe Birnbaum, Nicolas Rinck, Nicolas Beldiceanu

Location: Montreal | Day: August 21st | Time: 11:30 | Session: AI for Social Good (5/8)

Poster Board Position: From board n121 to board n123

The Kunming-Montreal Global Biodiversity Framework aims to protect 30% of terrestrial, inland water, marine, and coastal ecosystems worldwide, and to ensure that at least 30% of these areas are under effective restoration by 2030.
Maintaining and restoring ecological connectivity between natural habitats and protected areas is a key feature of this target.
Achieving it will require effective and inclusive spatial planning supported by appropriate decision-support tools.
Most spatial planning models address budget as an objective and connectivity as a constraint, formulating problems with Steiner trees.
In many real-world cases, such as landscape-scale restoration planning, this formulation is inappropriate when environmental managers seek to optimise connectivity under a budget constraint.
This problem was previously addressed with Constraint Programming (CP) and graph variables, but the current approach is severely limited in terms of spatial resolution.
In this article, we formalise this problem as the budget-constrained graph connectivity optimisation problem. Based on a real case study, the restoration of forest connectivity in New Caledonia, we illustrate why “naive” CP approaches are inefficient.
In response, we provide a preprocessing method based on Hanan grids which preserves the existence of at least one optimal solution.
Finally, we assess the efficiency of our approach in the New Caledonian case study.

8707: QBR – A Question-Bank-Based Approach to Fine-Grained Legal Knowledge Retrieval for the General Public Preprint

Authors: Mingruo Yuan, Ben Kao, Tien-Hsuan Wu

Location: Montreal | Day: August 19th | Time: 15:00 | Session: AI for Social Good (2/8)

Poster Board Position: From board n84 to board n89

Retrieval of legal knowledge by the general public is a challenging problem due to the technicality of the professional knowledge and laypersons’ lack of fundamental understanding of the subject. Traditional information retrieval techniques assume that users are capable of formulating succinct and precise queries for effective document retrieval. In practice, however, the wide gap between the highly technical contents and untrained users makes legal knowledge retrieval very difficult. We propose a methodology, called QBR, which employs a Question Bank (QB) as an effective medium for bridging the knowledge gap. We show how the QB is used to derive training samples to enhance the embedding of knowledge units within documents, which leads to effective fine-grained knowledge retrieval. We discuss and evaluate through experiments various advantages of QBR over traditional methods. These include more accurate, efficient, and explainable document retrieval, better comprehension of retrieval results, and highly effective fine-grained knowledge retrieval. We also present some case studies and show that QBR achieves social impact by assisting citizens to resolve everyday legal concerns.

8714: Direct Estimation of Attenuation Information from Sinograms for Positron Emission Tomography Reconstruction Preprint

Authors: Prabath Hetti Mudiyanselage, Ruwan Tennakoon, John Thangarajah, Robert Ware, Jason Callahan

Location: Montreal | Day: August 20th | Time: 14:00 | Session: AI for Social Good (3/8)

Poster Board Position: From board n98 to board n104

Positron Emission Tomography (PET) is a powerful imaging modality for assessing biochemical processes within the body. However, accurate image reconstruction is challenged by photon attenuation, particularly in dense structures such as bones, leading to quantification errors and reduced diagnostic confidence. Computed Tomography (CT) based attenuation correction is the standard approach but introduces additional radiation exposure, longer imaging times, and patient inconvenience, as well as potential registration errors, motion artifacts, and energy scaling inaccuracies.
In this study, we propose a 3D U-Net based deep learning framework that directly estimates attenuation information from PET sinograms, eliminating the need for additional imaging modalities. Our approach integrates PET physics and employs custom skip connections to enhance cross-domain learning. We evaluate our model on a simulated brain dataset derived from real patient templates, achieving a Dice coefficient of 0.650 and an accuracy of 0.486 for bone structures. The clinical applicability of our method is further assessed by reconstructing PET images with the generated attenuation maps, yielding an MSE of 0.007 and an SSIM of 0.956, demonstrating strong structural consistency with CT-based attenuation correction. These results highlight the feasibility of performing PET image attenuation correction using PET sinograms alone, offering a promising alternative that reduces imaging time, radiation exposure, and patient burden while enabling faster and more efficient PET reconstruction.
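As a reading aid for the reported metrics, here is a minimal sketch of how a Dice coefficient and a mean squared error are commonly computed. It assumes flattened binary masks and flattened image intensities; the function names and toy values are illustrative and are not taken from the paper. SSIM is omitted because its windowed computation does not fit a short sketch.

```python
def dice_coefficient(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two flattened binary masks."""
    intersection = sum(p and t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Convention assumed here: two empty masks count as a perfect match.
    return 2 * intersection / total if total else 1.0

def mse(a, b):
    """Mean squared error between two equally sized flattened images."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Toy values for illustration only.
pred_mask = [1, 1, 0, 0, 1, 0]
true_mask = [1, 0, 0, 0, 1, 1]
dice = dice_coefficient(pred_mask, true_mask)
error = mse([0.0, 1.0], [0.0, 0.5])
print(round(dice, 3), error)
```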

8718: Towards the Terminator Economy: Assessing Job Exposure to AI Through LLMs Preprint

Authors: Emilio Colombo, Fabio Mercorio, Mario Mezzanzanica, Antonio Serino

Location: Montreal | Day: August 21st | Time: 15:00 | Session: AI for Social Good (6/8)

Poster Board Position: From board n112 to board n117

AI and related technologies are reshaping jobs and tasks, either by automating or augmenting human skills in the workplace. Many researchers have been working on estimating if and to what extent jobs and tasks are exposed to the risk of being automated by AI-related technologies. Our work tackles this issue through a data-driven approach by:
(i) developing a reproducible framework that uses cutting-edge open-source large language models to assess the current capabilities of AI and robotics in performing job-related tasks;
(ii) formalizing and computing a measure of AI exposure by occupation, the Task Exposure to AI (TEAI) index, and a measure of task replacement, the Task Replacement by AI (TRAI) index, both validated through a human user evaluation and compared with the state-of-the-art.

Our results show that the TEAI index is positively correlated with cognitive, problem-solving, and management skills, while it is negatively correlated with social skills. Results also suggest that about one-third of U.S. employment is highly exposed to AI, primarily in high-skill jobs requiring a graduate or postgraduate level of education. We also find that AI exposure is positively associated with employment and wage growth from 2003 to 2023, suggesting that AI has had an overall positive effect on productivity.

Considering specifically the TRAI index, we find that even in high-skill occupations, AI exhibits high variability in task substitution, suggesting that AI and humans complement each other within the same occupation, while the allocation of tasks within occupations is likely to change.

All results, models, and code are freely available online to allow the community to reproduce our results, compare outcomes, and use our work as a benchmark to monitor AI’s progress over time.

8760: Exploring Equity of Climate Policies Using Multi-Agent Multi-Objective Reinforcement Learning Preprint

Authors: Palok Biswas, Zuzanna Osika, Isidoro Tamassia, Adit Whorra, Jazmin Zatarain-Salazar, Jan Kwakkel, Frans A. Oliehoek, Pradeep K. Murukannaiah

Location: Montreal | Day: August 19th | Time: 11:30 | Session: AI for Social Good (1/8)

Poster Board Position: From board n90 to board n93

Addressing climate change requires coordinated policy efforts of nations worldwide. These efforts are informed by scientific reports, which rely in part on Integrated Assessment Models (IAMs), prominent tools used to assess the economic impacts of climate policies. However, traditional IAMs optimize policies based on a single objective, limiting their ability to capture the trade-offs among economic growth, temperature goals, and climate justice. As a result, policy recommendations have been criticized for perpetuating inequalities, fueling disagreements during policy negotiations. We introduce JUSTICE, the first framework integrating IAM with Multi-Objective Multi-Agent Reinforcement Learning (MOMARL). By incorporating multiple objectives, JUSTICE generates policy recommendations that shed light on equity while balancing climate and economic goals. Further, using multiple agents can provide a realistic representation of the interactions among the diverse policy actors. We identify equitable Pareto-optimal policies using our framework, which facilitates deliberative decision-making by presenting policymakers with the inherent trade-offs in climate and economic policy.
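To make the notion of Pareto-optimal policies concrete, the following sketch filters a set of candidates down to those not dominated on any objective. It is not the JUSTICE framework: the policy names, the three objective scores, and the assumption that every objective is maximized are all hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep only candidates that no other candidate dominates."""
    return {
        name: scores
        for name, scores in candidates.items()
        if not any(
            dominates(other_scores, scores)
            for other, other_scores in candidates.items()
            if other != name
        )
    }

# Hypothetical scores: (economic welfare, temperature goal attainment, equity).
policies = {
    "A": (0.9, 0.2, 0.3),
    "B": (0.6, 0.7, 0.6),
    "C": (0.5, 0.6, 0.5),  # dominated by B on all three objectives
    "D": (0.3, 0.9, 0.8),
}
front = pareto_front(policies)
print(sorted(front))
```

The remaining policies each represent a different trade-off, which is the kind of set a decision-maker would then deliberate over.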

8772: Hazard Function Guided Agent-Based Models: A Case Study of Return Migration from Poland to Ukraine Preprint

Authors: Zakaria Mehrab, S.S. Ravi, Logan Stundal, Samarth Swarup, Srini Venkatramanan, Bryan Lewis, Henning Mortveit, David Leblang, Madhav V. Marathe

Location: Montreal | Day: August 19th | Time: 11:30 | Session: AI for Social Good (1/8)

Poster Board Position: From board n90 to board n93

The Russian invasion of Ukraine in February 2022 has led to the largest forced migration crisis in Europe since World War II, with millions displaced both internally and internationally. Among the displaced, approximately 4.2 million individuals have returned, highlighting the significance of return migration as a critical phase in the migration continuum. Existing studies on return migration are limited in scope, relying on survey-based approaches that suffer from demographic bias, lack of validation against ground truth, and inability to account for uncertainty. We propose a novel computational framework for modeling the return of conflict-induced migrants, using agent-based models (ABMs) and their surrogates. These models are grounded in hazard functions and account for sociopolitical contexts. Our proposed ABMs outperform baseline methods in estimating return migration from Poland to Ukraine by at least 42% and by as much as 57% in terms of normalized root mean squared error (NRMSE). Further, to illustrate the utility of such models for policymakers, we conduct two case studies that estimate the duration of displacement and characterize the demographic breakdown among the returnees.
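For readers unfamiliar with the metric, the sketch below shows one common way to compute a normalized root mean squared error. Normalization conventions vary (range, mean, or standard deviation of the observations); dividing by the observed range is an assumption made here, and the abstract does not state which convention the authors use. The numbers are toy values.

```python
import math

def nrmse(observed, predicted):
    """RMSE divided by the range of the observed values (one of several conventions)."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(squared_errors) / len(observed))
    return rmse / (max(observed) - min(observed))

# Toy return counts for illustration only.
observed = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 320.0]
score = nrmse(observed, predicted)
print(round(score, 4))
```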

8912: Agent-based Modeling Meets the Capability Approach for Human Development: Simulating Homelessness Policy-making Preprint

Authors: Alba Aguilera, Nardine Osman, Georgina Curto

Location: Montreal | Day: August 19th | Time: 11:30 | Session: AI for Social Good (1/8)

Poster Board Position: From board n90 to board n93

The global rise in homelessness calls for urgent and alternative policy solutions. Non-profits and governmental organizations warn about the many challenges faced by people experiencing homelessness (PEH), which include not only the lack of shelter but also the lack of opportunities for personal development. In this context, the capability approach (CA), which underpins the United Nations Sustainable Development Goals (SDGs), provides a comprehensive framework to assess inequity in terms of real opportunities. This paper explores how the CA can be combined with agent-based modelling and reinforcement learning. The goals are: (1) implementing the CA as a Markov Decision Process (MDP), (2) building on this MDP to develop a rich decision-making model that accounts for more complex motivators of behaviour, such as values and needs, and (3) developing an agent-based simulation framework that makes it possible to assess alternative policies that aim to expand or restore people’s capabilities. The framework is developed in a real case study of health inequity and homelessness, working in collaboration with stakeholders, non-profits and domain experts. The ultimate goal of the project is to develop a novel agent-based simulation framework, rooted in the CA, which can be replicated across diverse social challenges to assess policies in a non-invasive way.

8917: Enhancing Online Climate Discourse: A Two-Stage Framework for Climate Content Categorization and Moderation Preprint

Authors: Apoorva Upadhyaya, Wolfgang Nejdl, Marco Fisichella

Location: Montreal | Day: August 22nd | Time: 10:00 | Session: AI for Social Good (7/8)

Poster Board Position: From board n46 to board n48

Climate change is one of the most pressing global challenges that requires urgent adaptation and resilience efforts, highlighting the need for both scientific solutions and effective communication. In the digital age, online content plays a key role in shaping climate narratives. Previous research has accordingly focused mainly on public perception or on categorizing content by topics such as impacts, mitigation, and policy. Beyond these efforts, however, identifying discussions that address climate change adaptation is crucial for monitoring resilience and assessing public sentiment, while recognizing denial narratives helps combat misinformation. Moreover, the public’s exposure to online climate content can either lead to or hinder climate action, emphasizing the need for climate content moderation. To address these issues, we propose a novel two-stage framework where stage 1 categorizes climate-related content into adaptation, resilience, and denial while stage 2 moderates content by enhancing or intervening based on its alignment with climate goals. We present a novel dataset by manually annotating publicly available tweets and news articles into different climate categories with the help of a taxonomy developed by domain experts. Extensive experiments with benchmark climate and other domain datasets validate the efficacy of our prediction stage, while human and external evaluations confirm the relevance of our moderation stage.

8923: Expanding Connected Components from Alternative Terminals: Global Optimization for Freshwater Fishes Under the UN’s 30×30 Conservation Goal Preprint

Authors: Yue Mao, Zhongdi Qu, Imanol Miqueleiz, Aaron Ferber, Sami Wolf, Marc Grimson, Sebastian Heilpern, Felipe S. Pacheco, Alexander S. Flecker, Peter B. McIntyre, Carla P. Gomes

Location: Montreal | Day: August 21st | Time: 11:30 | Session: AI for Social Good (5/8)

Poster Board Position: From board n121 to board n123

Climate change and biodiversity loss are among humanity’s most pressing challenges. In 2022, under the auspices of the United Nations, over 190 countries reached a historic agreement to address the alarming loss of biodiversity and restore natural ecosystems. Target 3, often referred to as “30×30”, seeks to effectively protect and manage 30% of the world’s terrestrial, inland water, coastal, and marine areas by 2030. In this work, we address the UN 30×30 target in the context of global freshwater fish conservation. Freshwater ecosystems are disproportionately unprotected, and their biota are declining at an alarming rate. Our goal is to select new protected areas that protect freshwater fish species as much as possible without exceeding a total coverage of 30% of land area. To support this goal, we introduce the Expansion of Connected Components from Alternative Terminals Problem, a graph-based optimization problem that captures ecological priorities and connectivity constraints. We analyze its computational complexity, propose novel integer programming formulations, and develop scalable solution methods. We further evaluate its typical-case complexity under diverse settings and demonstrate that our approach scales to a global real-world scope, encompassing approximately 200,000 freshwater basins and 13,000 species, paving the way for implementing the 30×30 target on a worldwide scale.

8927: Early Detection of Patient Deterioration from Real-Time Wearable Monitoring System Preprint

Authors: Lo Pang-Yun Ting, Hong-Pei Chen, An-Shan Liu, Chun-Yin Yeh, Po-Lin Chen, Kun-Ta Chuang

Location: Montreal | Day: August 22nd | Time: 11:30 | Session: AI for Social Good (8/8)

Poster Board Position: From board n49 to board n52

Early detection of patient deterioration is crucial for reducing mortality rates. Heart rate data has shown promise in assessing patient health, and wearable devices offer a cost-effective solution for real-time monitoring. However, extracting meaningful insights from diverse heart rate data and handling missing values in wearable device data remain key challenges. To address these challenges, we propose TARL, an innovative approach that models the structural relationships of representative subsequences, known as shapelets, in heart rate time series. TARL creates a shapelet-transition knowledge graph to model shapelet dynamics in heart rate time series, indicating illness progression and potential future changes. We further introduce a transition-aware knowledge embedding to reinforce relationships among shapelets and quantify the impact of missing values, enabling the formulation of comprehensive heart rate representations. These representations capture explanatory structures and predict future heart rate trends, aiding early illness detection. We collaborate with physicians and nurses to gather ICU patient heart rate data from wearables and diagnostic metrics to assess illness severity and evaluate deterioration. Experiments on real-world ICU data demonstrate that TARL achieves both high reliability and early detection. A case study further showcases TARL’s explainable detection process, highlighting its potential as an AI-driven tool to assist clinicians in recognizing early signs of patient deterioration.

8930: Bidirectional Human–AI Collaboration for Equitable Student Performance Prediction via Deep Uncertainty Learning Preprint

Authors: Ruohan Zong, Yang Zhang, Lanyu Shang, Frank Stinar, Nigel Bosch, Dong Wang

Location: Montreal | Day: August 21st | Time: 15:00 | Session: AI for Social Good (6/8)

Poster Board Position: From board n112 to board n117

This paper studies a bidirectional human-AI collaborative student performance prediction problem to enhance equitable online education, aligning with the United Nations’ Sustainable Development Goal (SDG) of ensuring inclusive and equitable quality education for all. The goal is to leverage collaborative intelligence to generate accurate and fair student outcome predictions from behavioral data, ensuring equitable estimation for underrepresented populations. Current fair AI solutions often fail to mitigate demographic bias in the absence of student demographic data, while human-AI collaborative approaches frequently overlook human cognitive biases, leading to inaccurate predictions. We develop CollabDebias, a novel bidirectional human-AI collaborative framework that utilizes the complementary strengths of AI and humans to mitigate both AI demographic bias and human cognitive bias. To address AI demographic bias, we propose an uncertainty learning-based bias identification method and a reliability-aware human-AI integration approach. To reduce human cognitive bias, we design an uncertainty-aware visualization of the AI decision area and attention mechanism. Experimental results on an online course demonstrate CollabDebias’s effectiveness in improving student performance prediction accuracy and fairness.

8944: Recommender Systems for Democracy: Toward Adversarial Robustness in Voting Advice Applications Preprint

Authors: Frédéric Berdoz, Dustin Brunner, Yann Vonlanthen, Roger Wattenhofer

Location: Montreal | Day: August 21st | Time: 15:00 | Session: AI for Social Good (6/8)

Poster Board Position: From board n112 to board n117

Voting advice applications (VAAs) help millions of voters understand which political parties or candidates best align with their views. This paper explores the potential risks these applications pose to the democratic process when targeted by adversarial entities. In particular, we expose 11 manipulation strategies and measure their impact using data from Switzerland’s primary VAA, Smartvote, collected during the last two national elections. We find that altering application parameters, such as the matching method, can shift a party’s recommendation frequency by up to 105%. Cherry-picking questionnaire items can increase party recommendation frequency by over 261%, while subtle changes to parties’ or candidates’ responses can lead to a 248% increase. To address these vulnerabilities, we propose adversarial robustness properties VAAs should satisfy, introduce empirical metrics for assessing the resilience of various matching methods, and suggest possible avenues for research toward mitigating the effect of manipulation. Our framework is key to ensuring secure and reliable AI-based VAAs poised to emerge in the near future.
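
The sensitivity to the matching method can be illustrated with a toy example. The sketch below assumes a VAA that recommends the party whose answer vector is closest to the voter's, and compares two hypothetical distance choices (L1 and L2). The answer vectors, party names, and function names are invented for illustration and are not taken from Smartvote or the paper.

```python
import math

def l1_distance(a, b):
    """City-block distance between two answer vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def l2_distance(a, b):
    """Euclidean distance between two answer vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recommend(voter, parties, distance):
    """Return the party whose answers are closest to the voter's under `distance`."""
    return min(parties, key=lambda name: distance(voter, parties[name]))

# Hypothetical answers on a 0..1 agreement scale for four questionnaire items.
voter = [0.0, 0.0, 0.0, 0.0]
parties = {
    "A": [0.5, 0.5, 0.5, 0.5],  # mild disagreement on every item
    "B": [1.0, 0.8, 0.0, 0.0],  # strong disagreement on two items only
}

print(recommend(voter, parties, l1_distance))  # B
print(recommend(voter, parties, l2_distance))  # A
```

Here the same voter and the same party answers yield different recommendations depending only on the distance function, which is the kind of parameter an adversarial operator could tune.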

8946: Perspectives in Play: A Multi-Perspective Approach for More Inclusive NLP Systems Preprint

Authors: Benedetta Muscato, Lucia Passaro, Gizem Gezici, Fosca Giannotti

Location: Montreal | Day: August 22nd | Time: 10:00 | Session: AI for Social Good (7/8)

Poster Board Position: From board n46 to board n48

In the realm of Natural Language Processing (NLP), common approaches for handling human disagreement consist of aggregating annotators’ viewpoints to establish a single ground truth. However, prior studies show that disregarding individual opinions can lead to the side effect of under-representing minority perspectives, especially in subjective tasks, where annotators may systematically disagree because of their preferences. Recognizing that labels reflect the diverse backgrounds, life experiences, and values of individuals, this study proposes a new multi-perspective approach using soft labels to encourage the development of the next generation of perspective-aware models that are more inclusive and pluralistic. We conduct an extensive analysis across diverse subjective text classification tasks, including hate speech, irony, abusive language, and stance detection, to highlight the importance of capturing human disagreements, often overlooked by traditional aggregation methods. Results show that the multi-perspective approach not only better approximates human label distributions, as measured by Jensen-Shannon Divergence (JSD), but also achieves superior classification performance (higher F1-scores), outperforming traditional approaches. However, our approach exhibits lower confidence in tasks like irony and stance detection, likely due to the inherent subjectivity present in the texts. Lastly, leveraging Explainable AI (XAI), we explore model uncertainty and uncover meaningful insights into model predictions. All implementation details are available at our GitHub repository.
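
To make the soft-label idea concrete, the following sketch builds a label distribution from individual annotator votes and compares it, via Jensen-Shannon divergence, with a majority-vote label and with a soft prediction. The votes, class names, and the model output are hypothetical, and the helper names are assumptions for illustration rather than the authors' code.

```python
import math
from collections import Counter

def soft_label(votes, classes):
    """Turn a list of annotator votes into a probability distribution over classes."""
    counts = Counter(votes)
    return [counts[c] / len(votes) for c in classes]

def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded between 0 and 1."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

classes = ["hateful", "not_hateful"]
votes = ["hateful", "not_hateful", "not_hateful", "hateful", "not_hateful"]

human = soft_label(votes, classes)   # 2 of 5 annotators chose "hateful"
majority = [0.0, 1.0]                # aggregated single ground truth
model_soft = [0.35, 0.65]            # hypothetical perspective-aware prediction

print(js_divergence(human, majority))
print(js_divergence(human, model_soft))
```

A lower divergence means the prediction is closer to the distribution of human judgments; the majority-vote label discards the two dissenting votes and therefore sits further from it.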

8956: LogiDebrief: A Signal-Temporal Logic Based Automated Debriefing Approach with Large Language Models Integration Preprint

Authors: Zirong Chen, Ziyan An, Jennifer Reynolds, Kristin Mullen, Stephen Maritini, Meiyi Ma

Location: Montreal | Day: August 19th | Time: 15:00 | Session: AI for Social Good (2/8)

Poster Board Position: From board n84 to board n89

Emergency response services are critical to public safety, with 9-1-1 call-takers playing a key role in ensuring timely and effective emergency operations. To ensure consistent call-taking performance, quality assurance is implemented to evaluate and refine call-takers’ skillsets. However, traditional human-led evaluations struggle with high call volumes, leading to low coverage and delayed assessments. We introduce LogiDebrief, an AI-driven framework that automates traditional 9-1-1 call debriefing by integrating Signal-Temporal Logic (STL) with Large Language Models (LLMs) for rigorous performance evaluation with full call coverage. LogiDebrief formalizes call-taking requirements as logical specifications, enabling systematic assessment of 9-1-1 calls against procedural guidelines. It employs a three-step verification process: (1) contextual understanding to identify responder types, incident classifications, and critical conditions; (2) STL-based runtime checking with LLM integration to ensure compliance; and (3) automated aggregation of results into quality assurance reports. Beyond its technical contributions, LogiDebrief has demonstrated real-world impact. Successfully deployed at Metro Nashville Department of Emergency Communications, it has assisted in debriefing 1,701 real-world calls, saving 311.85 hours of active engagement. Empirical evaluation with real-world data confirms its accuracy, while a case study and extensive user study highlight its effectiveness in enhancing call-taking performance.
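
The idea of checking a call against temporal requirements can be sketched with two bounded STL-style operators over a sampled Boolean signal. This is a minimal illustration under stated assumptions: the two requirements, the signal values, and the function names are hypothetical, and the predicates are hand-written here, whereas the abstract describes LLM integration for that step.

```python
def eventually(signal, start, end):
    """F_[start,end]: the predicate holds at some sampled time within the window."""
    return any(value for t, value in signal if start <= t <= end)

def always(signal, start, end):
    """G_[start,end]: the predicate holds at every sampled time within the window."""
    return all(value for t, value in signal if start <= t <= end)

# Hypothetical signals: (seconds since call start, predicate value).
address_confirmed = [(0, False), (10, False), (25, True), (40, True)]
caller_on_line = [(0, True), (10, True), (25, True), (40, True)]

# Hypothetical requirements aggregated into a small quality assurance report.
report = {
    "address_confirmed_within_30s": eventually(address_confirmed, 0, 30),
    "caller_kept_on_line": always(caller_on_line, 0, 40),
}
print(report)
```

Note that `always` over an empty window is vacuously true, so a real checker would also need to verify that the window contains samples.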

9008: Sustainable Wearables for Health Applications and Beyond via Uncertainty-Aware Energy Management Preprint

Authors: Dina Hussein, Chibuike E. Ugwu, Ganapati Bhat, Janardhan Rao Doppa

Location: Montreal | Day: August 21st | Time: 10:00 | Session: AI for Social Good (4/8)

Poster Board Position: From board n118 to board n120

Achieving good health and well-being through lower mortality rates from non-communicable diseases and early warning of health risks are key goals of the United Nations (UN). Wearable Internet of Things (IoT) devices are among the most promising technologies for achieving these goals through their ubiquitous monitoring of key health indicators and in-situ data processing. However, the small form factor of wearable devices constrains battery capacity, requiring frequent recharging or battery replacement, which lowers their adoption rate and benefits. Augmenting battery energy by scavenging ambient sources, such as light, is a promising way to improve the operating lifetime of IoT devices. However, ambient energy sources are highly uncertain, making energy management (EM) challenging. To handle these challenges, this paper presents a novel uncertainty-aware EM approach. First, we develop a conformal prediction-based method for future energy harvest (EH) that provides small uncertainty regions with provable coverage guarantees (i.e., the true output vector is within the region). The EH uncertainty regions are then leveraged in an EM algorithm that uses overhead-aware sampling to evaluate the quality of multiple decisions with varying EH before making a decision using a lightweight machine learning model. Experiments on two diverse real-world datasets with 10 users show that conformal prediction achieves more than 90% coverage with tight prediction intervals, and the EM algorithm produces decisions that are, on average, within 2 Joules of an optimal Oracle.
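
The coverage guarantee mentioned above can be illustrated with split conformal prediction for a single scalar forecast. This is a simplified sketch, not the paper's method, which produces regions for output vectors: the calibration data, the point forecast, and the helper names are hypothetical, and the guarantee assumes calibration and test points are exchangeable.

```python
import math

def conformal_radius(calib_true, calib_pred, alpha=0.1):
    """Radius q so that [pred - q, pred + q] covers a new point with
    probability at least 1 - alpha (split conformal, absolute-error score)."""
    scores = sorted(abs(y - p) for y, p in zip(calib_true, calib_pred))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        return math.inf  # too few calibration points for this coverage level
    return scores[k - 1]

def predict_interval(pred, radius):
    """Interval around a point forecast; harvested energy cannot be negative."""
    return (max(0.0, pred - radius), pred + radius)

# Hypothetical calibration set: measured vs. forecast energy harvest (Joules).
measured = [1.0, 2.0, 3.0, 4.0]
forecast = [1.5, 2.1, 2.0, 4.3]

radius = conformal_radius(measured, forecast, alpha=0.5)
print(radius)
print(predict_interval(0.2, radius))
```

An energy manager could then evaluate candidate decisions against several harvest values drawn from the interval rather than against the point forecast alone.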

9039: An Interactive Game-based Multi-Agent AI System for Children’s Social and Emotional Development Preprint

Authors: Shreya Banerjee, Soheil Saneei, Lisa Pham, Elliott Alexander Beaton, Henry Fordjour Ansah, Ben Samuel, Jenny Spicer, Huda Hammad, Amanda Stage

Location: Montreal | Day: August 19th | Time: 11:30 | Session: AI for Social Good (1/8)

Poster Board Position: From board n90 to board n93

The earliest years of life, from birth to elementary school, are the most critical time for children’s social and emotional development. Recently, schools and workplaces have become increasingly concerned with cultivating social and emotional skills, especially with the decline of face-to-face interaction and the pervasive influence of modern communications technology. This paper proposes a game in development that navigates this challenge and aims to facilitate skill development in children ages 3-10 by having them identify, understand, and feel their emotions and regulate their actions. Using an established framework for social and emotional learning, it employs multi-agent artificial intelligence-based interactive gaming that generates dynamic scenarios and adjusts learning experiences based on an individual child’s or group’s needs over time. We discuss the various modes of this game and its target frameworks, along with ways to evaluate effectiveness in facilitating social and emotional skill development.

9040: CoDiCast: Conditional Diffusion Model for Global Weather Forecasting with Uncertainty Quantification Preprint

Authors: Jimeng Shi, Bowen Jin, Jiawei Han, Sundararaman Gopalakrishnan, Giri Narasimhan

Location: Montreal | Day: August 21st | Time: 11:30 | Session: AI for Social Good (5/8)

Poster Board Position: From board n121 to board n123

Accurate weather forecasting is critical for science and society. However, existing methods have not achieved the combination of high accuracy, low uncertainty, and high computational efficiency simultaneously. On one hand, traditional numerical weather prediction (NWP) models are computationally intensive because of their complexity. On the other hand, most machine learning-based weather prediction (MLWP) approaches offer efficiency and accuracy but remain deterministic, lacking the ability to capture forecast uncertainty. To tackle these challenges, we propose a conditional diffusion model, CoDiCast, to generate global weather prediction, integrating accuracy and uncertainty quantification at a modest computational cost. The key idea behind the prediction task is to generate realistic weather scenarios at a future time point, conditioned on observations from the recent past. Due to the probabilistic nature of diffusion models, they can be properly applied to capture the uncertainty of weather predictions. Therefore, we accomplish uncertainty quantifications by repeatedly sampling from stochastic Gaussian noise for each initial weather state and running the denoising process multiple times. Experimental results demonstrate that CoDiCast outperforms several existing MLWP methods in accuracy, and is faster than NWP models in inference speed. Our model can generate 6-day global weather forecasts, at 6-hour steps and 5.625-degree latitude-longitude resolutions, for over 5 variables, in about 12 minutes on a commodity A100 GPU machine with 80GB memory. The source code is available at https://github.com/JimengShi/CoDiCast.
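
The repeated-sampling idea can be sketched independently of any particular diffusion model: draw several forecasts for the same initial state and summarize them by an ensemble mean and spread. In the sketch below, `toy_sampler` is a hypothetical stand-in for one full denoising run (it simply perturbs the condition with Gaussian noise), so only the ensemble bookkeeping and the grid arithmetic are meaningful.

```python
import random
import statistics

def toy_sampler(condition, rng):
    """Hypothetical stand-in for one reverse-diffusion run conditioned on recent
    observations. A real model would denoise from Gaussian noise instead."""
    return [c + rng.gauss(0.0, 0.5) for c in condition]

def ensemble_forecast(condition, n_members, seed=0):
    """Sample several forecasts and return the per-cell mean and standard deviation."""
    rng = random.Random(seed)
    members = [toy_sampler(condition, rng) for _ in range(n_members)]
    mean = [statistics.fmean(cell) for cell in zip(*members)]
    spread = [statistics.stdev(cell) for cell in zip(*members)]
    return mean, spread

# Forecast dimensions implied by the numbers quoted in the abstract.
steps = (6 * 24) // 6      # 6 days at 6-hour steps -> 24 steps
n_lat = int(180 / 5.625)   # 32 latitude cells
n_lon = int(360 / 5.625)   # 64 longitude cells

mean, spread = ensemble_forecast([280.0, 285.0, 290.0], n_members=20)
print(steps, n_lat, n_lon)
print(mean, spread)
```

The per-cell spread serves as the uncertainty estimate: cells where ensemble members disagree more are the ones where the forecast is less certain.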

9041: What is Behind Homelessness Bias? Using LLMs and NLP to Mitigate Homelessness by Acting on Social Stigma Preprint

Authors: Jonathan A. Karr Jr., Emory Smith, Matthew Hauenstein, Georgina Curto, Nitesh V. Chawla

Location: Montreal | Day: August 19th | Time: 15:00 | Session: AI for Social Good (2/8)

Poster Board Position: From board n84 to board n89

Bias towards people experiencing homelessness (PEH) is prevalent in online spaces. This project will leverage natural language processing (NLP) and large language models (LLMs) to identify, classify, and measure bias using geolocalized data collected from X (formerly Twitter), Reddit, meeting minutes, and news media across the United States. While public opinion often refers to addictions, criminality, and high levels of welfare spending to justify bias against PEH, we will conduct a comparative study to determine whether racial fractionalization is associated with homelessness bias. The results of the study aim to provide a new path to alleviate homelessness by unveiling the intersectional bias that affects PEH and minority racial groups. During the course of the project, we will deliver a lexicon, compile an annotated database for homelessness and homelessness-racism intersectional (HRI) bias, evaluate LLMs as classifiers of homelessness and HRI bias, develop homelessness and HRI bias metrics, and audit existing LLMs on HRI. In collaboration with non-profits and the city council of South Bend, Indiana, USA, our ultimate goal is to contribute to homelessness alleviation by counteracting social stigma, restoring the dignity and well-being of the persons affected.

9048: Towards a Bipartisan Understanding of Peace and Vicarious Interactions Preprint

Authors: Arka Dutta, Syed Mohammad Sualeh Ali, Usman Naseem, Ashiqur R. KhudaBukhsh

Location: Montreal | Day: August 19th | Time: 15:00 | Session: AI for Social Good (2/8)

Poster Board Position: From board n84 to board n89

Human input plays a critical role in modern AI systems. As machines take on increasingly nuanced tasks, it becomes essential for the community to embrace subjectivity and diverse perspectives. However, research on sensitive topics often fails to incorporate diverse and balanced perspectives. This paper makes a key contribution to participatory AI design in the context of conflicts between nuclear adversaries (India and Pakistan), where disagreement between stakeholders is anticipated. The paper explores the notion of hope speech detection (detecting de-escalating content in the context of nuclear adversaries on the brink of war) through the lens of participatory AI design and vicarious interactions. We release a dataset of 10,081 social web posts annotated by raters from India and Pakistan and examine the bipartisan nature of the language of de-escalation. Our study reveals that vicarious perspectives can be useful for modeling out-group preferences.

9049: IGraSS: Learning to Identify Infrastructure Networks from Satellite Imagery by Iterative Graph-constrained Semantic Segmentation Preprint

Authors: Oishee Bintey Hoque, Abhijin Adiga, Aniruddha Adiga, Siddharth Chaudhary, Madhav V. Marathe, S.S. Ravi, Kirti Rajagopalan, Amanda Wilson, Samarth Swarup

Location: Montreal | Day: August 20th | Time: 14:00 | Session: AI for Social Good (3/8)

Poster Board Position: From board n98 to board n104

Accurate canal network mapping is essential for water management, including irrigation planning and infrastructure maintenance. State-of-the-art semantic segmentation models for infrastructure mapping, such as roads, rely on large, well-annotated remote sensing datasets. However, incomplete or inadequate ground truth can hinder these learning approaches. Many infrastructure networks have graph-level properties, such as reachability to a source (canals) or connectivity (roads), that can be leveraged to improve existing ground truth. This paper develops a novel iterative framework, IGraSS, combining a semantic segmentation module, which incorporates RGB and additional modalities (NDWI, DEM), with a graph-based ground-truth refinement module. The segmentation module processes satellite imagery patches, while the refinement module operates on the entire dataset, viewing the infrastructure network as a graph. Experiments show that IGraSS reduces unreachable canal segments from ~18% to ~3%, and training with refined ground truth significantly improves canal identification. IGraSS serves as a robust framework for both refining noisy ground truth and mapping canal networks from remote sensing imagery. We also demonstrate the effectiveness and generalizability of IGraSS using road networks as an example, applying a different graph-theoretic constraint to complete road networks.
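
The reachability constraint can be made concrete with a breadth-first search from the water sources: any canal node that the search never visits is "unreachable" and marks a likely gap in the ground truth. The sketch below is an illustration under stated assumptions; the tiny graph, node names, and the function `unreachable_fraction` are hypothetical and do not reproduce the refinement module itself.

```python
from collections import deque

def unreachable_fraction(nodes, edges, sources):
    """Fraction of nodes that cannot be reached from any source via undirected edges."""
    adjacency = {node: set() for node in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set(sources)
    queue = deque(sources)
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return (len(nodes) - len(seen)) / len(nodes)

# Hypothetical canal network: the annotation misses the link between c and d.
nodes = ["source", "a", "b", "c", "d", "e"]
edges = [("source", "a"), ("a", "b"), ("b", "c"), ("d", "e")]

print(unreachable_fraction(nodes, edges, ["source"]))                 # 2 of 6 nodes
print(unreachable_fraction(nodes, edges + [("c", "d")], ["source"]))  # 0.0
```

Adding the single missing edge reconnects the stranded segment, which is the effect a graph-based refinement step aims for when it proposes corrections to the ground truth.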

9097: SAHAY: Multimodal, Privacy-Preserving AI for Suicide Risk Detection and Intervention in India Preprint

Authors: Salam Michael Singh, Manik Inder Singh Sethi, Suresh Bada Math, Tanmoy Chakraborty

Location: Montreal | Day: August 22nd | Time: 10:00 | Session: AI for Social Good (7/8)

Poster Board Position: From board n46 to board n48

Suicide is one of the leading causes of death in India, with 164,033 deaths reported in 2021. Despite increased awareness, the gap between the need for consistent treatment and actual accessibility remains a challenge due to limited mental health infrastructure, the stigma surrounding mental illness in society, and the lack of real-time detection mechanisms. Traditional suicide risk assessments, which rely heavily on clinical evaluations and self-reporting, often miss early signs of distress. Although AI-based monitoring seems promising, currently available models focus only on risk prediction without intervention and treatment, leaving a critical gap in crisis management. In this proposal, we strive to design SAHAY, a first-of-its-kind AI-based suicide prevention framework that seamlessly couples prediction with prevention and treatment access. Leveraging multimodal data, including social media text, Electronic Health Records (EHR), and Ecological Momentary Assessments (EMA) such as wearable physiological data, SAHAY aims to assess suicide risk dynamically. Unlike existing models, SAHAY is culturally adaptive and multilingual, and it integrates seamlessly with India’s TeleMANAS mental health support system to provide structured AI-human collaboration for long-term care and crisis interventions. It will be an adaptable, scalable, modular, and plug-and-play solution based on the Digital Public Infrastructure principle. Additionally, we intend to incorporate AI-driven geo-spatial crisis mapping to identify suicide hotspots in underserved regions. By combining real-time multimodal risk detection, professional mental health intervention, and geo-spatial outreach, SAHAY represents a scalable, adaptable, and end-to-end suicide prevention system. The design of SAHAY aligns with UN Sustainable Development Goals (SDGs) 3, 4, 5, 10, and 17, promoting inclusive, accessible, and data-driven mental healthcare.

9119: MCloudNet: An Ultra-Short-Term Photovoltaic Power Forecasting Framework With Multi-Layer Cloud Coverage Preprint

Authors: Meng Wan, Tiantian Liu, Yuxuan Bi, Jue Wang, Hui Cui, Rongqiang Cao, Jiaxiang Wang, Peng Shi, Ningming Nie, Yangang Wang

Location: Montreal | Day: August 22nd | Time: 11:30 | Session: AI for Social Good (8/8)

Poster Board Position: From board n49 to board n52

Over 4.15 million low-income households across nearly 60,000 villages in China benefit from photovoltaic (PV) poverty alleviation power stations. However, weak infrastructure and limited capabilities make these systems vulnerable to fluctuations. One of the United Nations’ Sustainable Development Goals (SDG 7) seeks to ensure access to affordable and reliable energy for all, especially in underdeveloped regions. This paper proposes MCloudNet, a multi-modal framework designed to improve ultra-short-term PV prediction in data-scarce, cloud-dynamic environments. MCloudNet explicitly models multi-layer cloud structures from satellite imagery and fuses them with time-series meteorological data to enhance prediction accuracy and interpretability. A province-level dispatch system with MCloudNet has been deployed in Hebei, supporting scheduling across rural PV stations. Experiments conducted in counties such as Shexian and Luxi highlight the framework’s effectiveness for use in underdeveloped micro-grids. Operational results show that the system has reduced solar curtailment by over 60 million kWh and generated 24 million CNY in economic value, benefiting approximately 50,000 rural households. By minimizing power fluctuations and improving rural energy scheduling, MCloudNet supports essential services such as lighting, medical facilities, and communications. The source code is available at: https://github.com/AI4SClab/MCloudNet.

9133: SHIELD: A Self-supervised, Silicosis-focused Hierarchical Imaging Framework for Occupational Lung Disease Diagnosis Preprint

Authors: Yasmeena Akhter, Rishabh Ranjan, Richa Singh, Mayank Vatsa

Location: Montreal | Day: August 20th | Time: 14:00 | Session: AI for Social Good (3/8)

Poster Board Position: From board n98 to board n104

Silicosis is an irreversible lung disease caused by silica dust exposure in industrial settings. Early detection is crucial, but automatic diagnostic methods are hindered by limited data availability. We propose SHIELD, a self-supervised, Silicosis-focused Hierarchical Imaging framework for early occupational Lung disease Diagnosis. Our method leverages a multi-resolution jigsaw puzzle pretext task on chest X-ray (CXR) images to extract and preserve features for lung region analysis. By employing a pyramidal strategy to generate pretrained models at various resolutions, followed by fine-tuning and a two-level ensembling across diverse deep learning architectures, SHIELD achieves enhanced diagnostic accuracy. We validate our approach on a publicly collected CXR dataset of 3044 samples from public health centers in India. SHIELD achieves 72% accuracy, demonstrating up to 20% improvement over baseline approaches. This work advances medical image analysis and supports UN Sustainable Development Goal 3 by providing cost-effective early screening in resource-limited settings.
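
A jigsaw pretext task turns unlabeled images into a supervised problem: the image is cut into tiles, the tiles are shuffled by a known permutation, and a network is trained to predict which permutation was applied. The sketch below only generates such training examples on a toy 4x4 "image"; the helper names and the use of all permutations are assumptions for illustration, and the multi-resolution and ensembling stages described above are not shown.

```python
import itertools
import random

def split_tiles(image, grid):
    """Cut a 2D list into grid x grid equally sized tiles, in row-major order."""
    tile_h = len(image) // grid
    tile_w = len(image[0]) // grid
    tiles = []
    for r in range(grid):
        for c in range(grid):
            rows = image[r * tile_h:(r + 1) * tile_h]
            tiles.append([row[c * tile_w:(c + 1) * tile_w] for row in rows])
    return tiles

def make_jigsaw_example(image, grid, permutations, rng):
    """Return shuffled tiles and the index of the permutation used as the label."""
    tiles = split_tiles(image, grid)
    label = rng.randrange(len(permutations))
    shuffled = [tiles[i] for i in permutations[label]]
    return shuffled, label

image = [[4 * r + c for c in range(4)] for r in range(4)]  # toy stand-in for an X-ray
permutations = list(itertools.permutations(range(4)))      # 24 orderings of 4 tiles
shuffled, label = make_jigsaw_example(image, 2, permutations, random.Random(0))
print(label, shuffled)
```

Calling `make_jigsaw_example` with different `grid` values on resized copies of the same image would give examples at several resolutions, which is one plausible reading of the multi-resolution setup.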

9172: Knowledge-Informed Deep Learning for Irrigation Type Mapping from Remote Sensing Preprint

Authors: Oishee Bintey Hoque, Nibir Chandra Mandal, Abhijin Adiga, Samarth Swarup, Sayjro Kossi Nouwakpo, Amanda Wilson, Madhav Marathe

Location: Montreal | Day: August 22nd | Time: 11:30 | Session: AI for Social Good (8/8)

Poster Board Position: From board n49 to board n52

Accurate mapping of irrigation methods is crucial for sustainable agricultural practices and food systems. However, existing models that rely solely on spectral features from satellite imagery are ineffective due to the complexity of agricultural landscapes and limited training data, making this a challenging problem. We present Knowledge-Informed Irrigation Mapping (KIIM), a novel Swin Transformer-based approach that uses (i) a specialized projection matrix to encode crop-to-irrigation probabilities, (ii) a spatial attention map to distinguish agricultural from non-agricultural lands, (iii) bi-directional cross-attention to focus on complementary information from different modalities, and (iv) a weighted ensemble for combining predictions from images and crop information. Our experiments on five states in the US show up to 22.9% (IoU) improvement over the baseline, with a 71.4% (IoU) improvement for hard-to-classify drip irrigation. In addition, we propose a two-phase transfer learning approach to enhance cross-state irrigation mapping, achieving a 51% IoU boost in a state with limited labeled data. The ability to achieve baseline performance with only 40% of the training data highlights its efficiency, reducing the dependency on extensive manual labeling efforts and making large-scale, automated irrigation mapping more feasible and cost-effective. Code: https://github.com/Nibir088/KIIM

9173: Leveraging Artificial Intelligence to Bridge Gaps in Pediatric Oncology Care for Marginalized Spanish-Speaking Communities Preprint

Authors: Grigorii Khvatskii, Angelica Garcia Martinez, Deng Pan, Matthew Belcher, Gerónimo Medrano Loera, Dayana Pineda Pérez, Juan Emmanuel Ferrari Muñoz-Ledo, Horacio Márquez-González, Nuno Moniz, Nitesh V. Chawla

Location: Montreal | Day: August 19th | Time: 15:00 | Session: AI for Social Good (2/8)

Poster Board Position: From board n84 to board n89

In low- and middle-income countries (LMICs), pediatric cancer patients and their caregivers often suffer from the effects of underfunded, fragmented, and outdated healthcare systems. One of these effects is a breakdown of communication between hospital staff and caregivers, which is felt more strongly among vulnerable populations. Our proposed solution integrates Large Language Model (LLM) and Automatic Speech Recognition (ASR) technologies to enhance communication between caregivers and healthcare providers while integrating community feedback. We combine cutting-edge technology with existing hospital infrastructure to allow for easy deployment and testing. The system will improve access to health, nutrition, and parental care programs, prioritizing caregiver engagement and real-time interaction. Ultimately, our system will pave the way to more equitable access to medical care and address structural barriers affecting marginalized communities.

9177: Detecting Illicit Massage Businesses by Leveraging Graph Machine Learning Preprint

Authors: Vasuki Garg, Osman Y. Özaltın, Maria E. Mayorga, Sherrie Bosisto

Location: Montreal | Day: August 21st | Time: 15:00 | Session: AI for Social Good (6/8)

Poster Board Position: From board n112 to board n117

Thousands of Illicit Massage Businesses (IMBs) are estimated to be operating in the United States by disguising themselves as legitimate establishments while exploiting trafficked workers, harming both the victims and the massage industry. The increasing digital presence of these illicit businesses presents an opportunity for detection, a crucial task for law enforcement and social service agencies aiming to disrupt their operations. Our research leverages user-generated business reviews from Yelp.com, enriched with data from multiple sources, including RubMaps.ch, U.S. Census records, GIS data, and licensing information. We present a feasibility study of developing a graph convolutional network (GCN) for a novel application and exploring its benefits and drawbacks in identifying IMBs. The novelty of our approach lies in its ability to link and analyze businesses, reviews, and reviewers within a heterogeneous network and employ a relational GCN to capture their complex relationships.

9198: DECASTE: Unveiling Caste Stereotypes in Large Language Models Through Multi-Dimensional Bias Analysis Preprint

Authors: Prashanth Vijayaraghavan, Soroush Vosoughi, Lamogha Chiazor, Raya Horesh, Rogerio Abreu de Paula, Ehsan Degan, Vandana Mukherjee

Location: Montreal | Day: August 21st | Time: 10:00 | Session: AI for Social Good (4/8)

Poster Board Position: From board n118 to board n120

Recent advancements in large language models (LLMs) have revolutionized natural language processing (NLP) and expanded their applications across diverse domains. However, despite their impressive capabilities, LLMs have been shown to reflect and perpetuate harmful societal biases, including those based on ethnicity, gender, and religion. A critical and underexplored issue is the reinforcement of caste-based biases, particularly towards India’s marginalized caste groups such as Dalits and Shudras. In this paper, we address this gap by proposing DECASTE, a novel, multi-dimensional framework designed to detect and assess both implicit and explicit caste biases in LLMs. Our approach evaluates caste fairness across four dimensions: socio-cultural, economic, educational, and political, using a range of customized prompting strategies. By benchmarking several state-of-the-art LLMs, we reveal that these models systematically reinforce caste biases, with significant disparities observed in the treatment of oppressed versus dominant caste groups. For example, bias scores are notably elevated when comparing Dalits and Shudras with dominant caste groups, reflecting societal prejudices that persist in model outputs. These results expose the subtle yet pervasive caste biases in LLMs and emphasize the need for more comprehensive and inclusive bias evaluation methodologies that assess the potential risks of deploying such models in real-world contexts.

9225: An Ethical Dataset from Real-World Interactions Between Users and Large Language Models Preprint

Authors: Masahiro Kaneko, Danushka Bollegala, Timothy Baldwin

Location: Montreal | Day: August 21st | Time: 15:00 | Session: AI for Social Good (6/8)

Poster Board Position: From board n112 to board n117

Recent studies have demonstrated that Large Language Models (LLMs) have ethics-related problems such as social biases, lack of moral reasoning, and generation of offensive content. The existing evaluation metrics and methods to address these ethical challenges use datasets intentionally created by instructing humans to create instances that include ethical problems. Therefore, the data does not sufficiently cover the prompts that users actually provide when using LLM services in everyday contexts or the outputs that LLMs generate. There may be different tendencies between unethical instances intentionally created by humans and actual user interactions with LLM services, which could result in a lack of comprehensive evaluation. To investigate the difference, we create the Eagle datasets, extracted from actual interactions between ChatGPT and users that exhibit social biases, opinion biases, toxicity, and immoral problems. Our experiments show that Eagle captures complementary aspects not covered by existing datasets proposed for evaluation and mitigation. We argue that using both existing and proposed datasets leads to a more comprehensive assessment of LLM ethics.