9342:
Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors (Extended Abstract) Preprint
Authors: Ido Amos, Jonathan Berant, Ankit Gupta
Location: Montreal
| Day: August 21st
| Time: 15:00
| Session: ML: Large Language Models
Poster Board Position: From board n15 to board n21
This paper is an extended abstract of our ICLR 2024 Outstanding Paper Award work. Modeling long-range dependencies across sequences is a longstanding goal in machine learning. While state space models reportedly outperform Transformers on benchmarks like Long Range Arena, we show that comparing models trained from random initialization significantly overestimates architectural differences. Pretraining with standard denoising objectives on the downstream task data leads to dramatic gains across architectures and minimal performance gaps between Transformers and state space models (SSMs). We demonstrate that properly pretrained vanilla Transformers match S4 performance on Long Range Arena and improve SSM results on PathX-256 by 20 absolute points. Our analysis shows that previously proposed structured parameterizations for SSMs become largely redundant with pretraining. When evaluating architectures on supervised tasks, incorporating data-driven priors via pretraining is essential for reliable performance estimation.
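A "standard denoising objective" of the kind the abstract refers to can be sketched as masked-token reconstruction on the task's own data. The masking rate, mask token, and helper name below are illustrative assumptions, not the paper's exact recipe:

```python
import random

def make_denoising_batch(seq, mask_id, mask_prob=0.15, seed=0):
    """Build one masked-denoising training example.

    Each token is independently replaced by mask_id with probability mask_prob;
    the model is then trained to reconstruct the original sequence (the loss is
    usually taken only on the masked positions).
    """
    rng = random.Random(seed)
    inputs = [mask_id if rng.random() < mask_prob else t for t in seq]
    targets = list(seq)  # uncorrupted sequence is the prediction target
    return inputs, targets

# Corrupt a toy token sequence; roughly 30% of positions become the mask token.
inp, tgt = make_denoising_batch(list(range(10)), mask_id=-1, mask_prob=0.3)
```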
9345:
Learning Accurate and Interpretable Decision Trees (Extended Abstract) Preprint
Authors: Maria-Florina Balcan, Dravyansh Sharma
Location: Montreal
| Day: August 21st
| Time: 11:30
| Session: Humans and AI: Interpretable Models
Poster Board Position: From board n102 to board n106
Decision trees are a popular tool in machine learning and yield easy-to-understand models. Several techniques have been proposed in the literature for learning a decision tree classifier, with different techniques working well for data from different domains. In this work, we develop a data-driven approach to designing decision tree learning algorithms given repeated access to data from the same domain. We study multiple formulations covering different aspects and popular techniques for learning decision trees. We propose novel parameterized classes of node splitting criteria for top-down algorithms, which interpolate between the widely used entropy- and Gini-impurity-based criteria, and provide theoretical bounds on the number of samples needed to learn the splitting function appropriate for the data at hand. We also study the sample complexity of tuning prior parameters in Bayesian decision tree learning, and extend our results to decision tree regression. We further consider the problem of tuning hyperparameters for pruning the decision tree with classical pruning algorithms, including min-cost complexity pruning. We also study the interpretability of the learned decision trees and introduce a data-driven approach for optimizing the explainability versus accuracy trade-off using decision trees. Finally, we demonstrate the significance of our approach on real-world datasets by learning data-specific decision trees which are simultaneously more accurate and interpretable.
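One standard way to interpolate between entropy and Gini impurity, in the spirit of the parameterized splitting criteria the abstract describes, is a Tsallis-style impurity family; the exact parameterization in the paper may differ, so treat this as an illustrative sketch:

```python
import math

def tsallis_impurity(p, alpha):
    """Parameterized impurity over class proportions p (summing to 1).

    The limit alpha -> 1 recovers Shannon entropy; alpha = 2 recovers
    Gini impurity, so varying alpha interpolates between the two criteria.
    """
    if abs(alpha - 1.0) < 1e-9:
        # alpha -> 1 limit: Shannon entropy  -sum p_i log p_i
        return -sum(pi * math.log(pi) for pi in p if pi > 0)
    # General Tsallis form: (1 - sum p_i^alpha) / (alpha - 1)
    return (1.0 - sum(pi ** alpha for pi in p)) / (alpha - 1.0)

# For a balanced binary split, alpha = 2 gives the Gini impurity 0.5,
# while alpha -> 1 gives the entropy log 2.
print(tsallis_impurity([0.5, 0.5], 2.0))  # 0.5
```

A top-down learner would pick, at each node, the split minimizing the children's weighted impurity; the data-driven question is which alpha to tune for the domain at hand.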
9348:
CAM-Based Methods Can See through Walls (Extended Abstract) Preprint
Authors: Magamed Taimeskhanov, Ronan Sicre, Damien Garreau
Location: Montreal
| Day: August 21st
| Time: 11:30
| Session: Humans and AI: Interpretable Models
Poster Board Position: From board n102 to board n106
CAM-based methods are widely used post-hoc interpretability methods that produce a saliency map to explain the decision of an image classification model. The saliency map highlights the areas of the image relevant to the prediction. In this paper, we show that most of these methods can incorrectly attribute an importance score to parts of the image that the model cannot see. We demonstrate this phenomenon both theoretically and experimentally. On the theory side, we analyze the behavior of GradCAM on a simple masked CNN model at initialization. Experimentally, we train a VGG-like model constrained not to use the lower part of the image and nevertheless observe positive scores in the unseen part of the image. This behavior is evaluated quantitatively on two new datasets. We believe this is problematic, as it can lead to misinterpretation of the model's behavior.
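The Grad-CAM computation that the analysis targets can be sketched in a few lines; this is the standard formula applied to toy arrays, not the paper's experimental setup:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM saliency map from one conv layer.

    activations, gradients: arrays of shape (K, H, W) -- the layer's K feature
    maps and the gradients of the class score with respect to them.
    """
    weights = gradients.mean(axis=(1, 2))         # alpha_k: global average pooling
    cam = np.tensordot(weights, activations, 1)   # sum_k alpha_k * A_k  -> (H, W)
    return np.maximum(cam, 0.0)                   # ReLU keeps positive contributions

# Note that nothing in the formula ties a spatial location of the map to the
# receptive field the model actually used -- which is what lets positive scores
# appear in regions the model cannot see.
acts = np.ones((2, 4, 4))
grads = np.stack([np.ones((4, 4)), -0.5 * np.ones((4, 4))])
cam = grad_cam(acts, grads)
print(cam.shape)  # (4, 4)
```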
9350:
Contractions Based on Optimal Repairs (Extended Abstract) Preprint
Authors: Franz Baader, Renata Wassermann
Location: Montreal
| Day: August 19th
| Time: 11:30
| Session: Knowledge Representation and Reasoning (1/4)
Poster Board Position: From board n25 to board n29
Removing unwanted consequences from a knowledge base has been investigated in belief change under the name contraction and is called repair in ontology engineering. Simple repair and contraction approaches based on removing statements from the knowledge base (respectively called belief base contractions and classical repairs) have the disadvantage that they are syntax-dependent and may remove more consequences than necessary. Belief set contractions do not have these problems, but may result in belief sets that have no finite representation. Similarly, optimal repairs, which are syntax-independent and maximize the retained consequences, may not exist. Our KR 2024 paper leverages advances in characterizing and computing optimal repairs of ontologies based on the description logic EL to obtain contraction operations that combine the advantages of belief set and belief base contractions. It introduces this new approach in a very general setting and proves a characterization theorem that relates the obtained contractions to well-known rationality postulates. It then describes a variety of interesting instances, not only in the standard repair/contraction setting, where one wants to get rid of a consequence, but also in other settings, such as variants of forgetting in propositional and description logic.
9351:
Shapley Value Computation in Ontology-Mediated Query Answering (Extended Abstract) Preprint
Authors: Meghyn Bienvenu, Diego Figueira, Pierre Lafourcade
Location: Montreal
| Day: August 22nd
| Time: 10:00
| Session: KR: ontologies
Poster Board Position: From board n20 to board n23
In this work, we explore the use of the Shapley value in ontology-mediated query answering (OMQA) and provide a detailed complexity analysis of Shapley value computation (SVC) in the OMQA setting. In particular, we establish an FP/#P-hard dichotomy for SVC for ontology-mediated queries (T,q) composed of an ontology T formulated in the description logic ELHI-bot and a connected constant-free homomorphism-closed query q. We further strengthen the #P-hardness side of the dichotomy to cover possibly disconnected queries with constants. Our results exploit recently discovered connections between SVC and probabilistic query evaluation and allow us to generalize existing results on probabilistic OMQA.
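The underlying computational problem can be made concrete with the textbook Shapley formula, computed exactly by coalition enumeration; the toy "query" value function below is an illustrative assumption, and the exponential enumeration is precisely why the complexity analysis matters:

```python
from itertools import combinations
from math import factorial

def shapley(players, value):
    """Exact Shapley values by enumerating all coalitions (only viable for tiny games).

    value: a function from a set of players to a real-valued "wealth".
    """
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                # Probability that p joins exactly after coalition S in a random order
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[p] += weight * (value(set(S) | {p}) - value(set(S)))
    return phi

# Toy OMQA-style game: the query holds iff facts "a" and "b" are both present;
# fact "c" is irrelevant and accordingly receives Shapley value 0.
facts = ["a", "b", "c"]
holds = lambda S: 1.0 if {"a", "b"} <= S else 0.0
print(shapley(facts, holds))  # a and b each get 0.5, c gets 0.0
```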
9352:
Decoupled Search for the Masses: A Novel Task Transformation for Classical Planning (Extended Abstract) Preprint
Authors: David Speck, Daniel Gnad
Location: Montreal
| Day: August 20th
| Time: 10:00
| Session: Planning and Scheduling (3/5)
Poster Board Position: From board n50 to board n54
Classical planning provides a framework for solving sequential decision-making problems, i.e., finding a sequence of actions that transforms the current state of the world into a state that satisfies a desired goal condition. Planning tasks are modeled in a logic that describes the environment and its dynamics. It is well known that the specific problem formulation can significantly affect the performance of planning systems solving problems like the Rubik's Cube or finding algorithms for matrix multiplication. In this work, we propose a domain-general problem reformulation that embodies decoupled search, a search-reduction technique from classical planning and model checking. Decoupled search decomposes a given problem to exploit its structure, achieving exponential reductions over other search techniques. We show that decoupled search can be captured exactly as a task reformulation and that, on many benchmark domains, it performs as well as, and sometimes even better than, a native decoupled-search implementation.
9354:
FairCognizer: A Model for Accurate Predictions with Inherent Fairness Evaluation (Extended Abstract) Preprint
Authors: Adda Akram Bendoukha, Nesrine Kaaniche, Aymen Boudguiga, Renaud Sirdey
Location: Montreal
| Day: August 22nd
| Time: 11:30
| Session: AI Ethics, Trust, Fairness (3/3)
Poster Board Position: From board n39 to board n41
Algorithmic fairness is a critical challenge in building trustworthy Machine Learning (ML) models. ML classifiers strive to make predictions that closely match real-world observations (ground truth). However, if the ground truth data itself reflects biases against certain sub-populations, a dilemma arises: prioritize fairness and potentially reduce accuracy, or emphasize accuracy at the expense of fairness.
This work proposes a novel training framework that goes beyond achieving high accuracy. Our framework trains a classifier to not only deliver optimal predictions but also to identify potential fairness risks associated with each prediction.
To do so, we specify a dual-labeling strategy where the second label contains a per-prediction fairness evaluation, referred to as an unfairness risk evaluation. In addition, we identify a subset of samples as highly vulnerable to group-unfair classifiers.
Our experiments demonstrate that our classifiers attain optimal accuracy levels on both the Adult-Census-Income and Compas-Recidivism datasets. Moreover, they identify unfair predictions with nearly 75% accuracy at the cost of expanding the size of the classifier by 45%.
9355:
Meaning Holism and Indeterminacy of Reference in Ontologies (Extended Abstract) Preprint
Authors: Adrien Barton, Paul Fabry, Jean-François Ethier
Location: Montreal
| Day: August 22nd
| Time: 10:00
| Session: KR: ontologies
Poster Board Position: From board n20 to board n23
According to meaning holism, the meanings of all the words in a language are interdependent. If this were true, the very practice of building largely interconnected sets of ontologies would be threatened. We examine here how severe meaning holism is for ontology engineering, based on a definition of the meaning of a class term in an ontology, with regard to the classical analytic/synthetic distinction. We show that meaning holism is not as pervasive in ontologies as traditionally assumed in philosophy of language when the meaning of a class term is interpreted as a collection of statements expressing necessary conditions on this term. Still, meaning holism presents substantial challenges for ontology engineering and requires mitigation strategies. We also investigate the related phenomenon of indeterminacy of reference and show how anchoring formal ontologies in natural language can mitigate this problem, even if it does not fully eliminate it.
9356:
Explanatory Capabilities of Large Language Models in Prescriptive Process Monitoring (Extended Abstract) Preprint
Authors: Kateryna Kubrak, Lana Botchorishvili, Fredrik Milani, Alexander Nolte, Marlon Dumas
Location: Montreal
| Day: August 22nd
| Time: 10:00
| Session: LLM applications
Poster Board Position: From board n11 to board n14
Prescriptive Process Monitoring (PrPM) systems recommend interventions in ongoing business process cases to improve performance. However, performance gains only materialize if users follow the recommendations. Prior research has shown that users are more likely to follow recommendations when they understand them. In this paper, we explore the use of Large Language Models (LLMs) to generate explanations for PrPM recommendations. We developed a prompting method based on typical user questions and integrated it into an existing PrPM system. Our evaluation indicates that LLMs can help users of PrPM systems better understand the recommendations, producing explanations that have sufficient detail and fulfill users' expectations. However, the explanations fall short in addressing the underlying "why" and do not always support users in assessing the trustworthiness of the recommendations.
9357:
How to Teach Programming in the AI Era? Using LLMs as a Teachable Agent for Debugging (Extended Abstract) Preprint
Authors: Qianou Ma, Hua Shen, Ken Koedinger, Tongshuang Wu
Location: Montreal
| Day: August 21st
| Time: 10:00
| Session: Humans and AI
Poster Board Position: From board n97 to board n101
Large Language Models (LLMs) excel at generating content at remarkable speed. However, they are imperfect and still make various mistakes. In Computer Science education, as LLMs are widely recognized as "AI pair programmers," it becomes increasingly important to train students to evaluate and debug LLM-generated code. In this work, we introduce HypoCompass, a novel system that facilitates deliberate practice on debugging, where human novices play the role of Teaching Assistants and help LLM-powered teachable agents debug code.
We enable effective task delegation between students and LLMs in this learning-by-teaching environment: students focus on hypothesizing the cause of code errors, while adjacent skills like code completion are offloaded to LLM-agents. Our evaluations demonstrate that HypoCompass generates high-quality training materials (e.g., bugs and fixes), outperforming human counterparts fourfold in efficiency, and significantly improves student performance on debugging by 12% in the pre-to-post test.
9358:
Interpreting Pretrained Language Models via Concept Bottlenecks (Extended Abstract) Preprint
Authors: Zhen Tan, Lu Cheng, Song Wang, Yuan Bo, Jundong Li, Huan Liu
Location: Montreal
| Day: August 21st
| Time: 11:30
| Session: Humans and AI: Interpretable Models
Poster Board Position: From board n102 to board n106
Pretrained language models (PLMs) achieve state-of-the-art results but often function as "black boxes", hindering interpretability and responsible deployment. While methods like attention analysis exist, they often lack clarity and intuitiveness. We propose interpreting PLMs through high-level, human-understandable concepts using Concept Bottleneck Models (CBMs). This extended abstract introduces C3M (ChatGPT-guided Concept augmentation with Concept-level Mixup), a novel framework for training Concept-Bottleneck-Enabled PLMs (CBE-PLMs). C3M leverages Large Language Models (LLMs) like ChatGPT to augment concept sets and generate noisy concept labels, combined with a concept-level MixUp mechanism to enhance robustness and effectively learn from both human-annotated and machine-generated concepts. Empirical results show our approach provides intuitive explanations, aids model diagnosis via test-time intervention, and improves the interpretability-utility trade-off, even with limited or noisy concept annotations. This is a concise version of [Tan et al., 2024b], recipient of the Best Paper Award at PAKDD 2024. Code and data are released at https://github.com/Zhen-Tan-dmml/CBM_NLP.git.
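A concept-level MixUp of the kind the abstract mentions follows the usual MixUp recipe of convexly combining pairs of examples, applied here to concept vectors and task labels; the shapes, names, and mixing-coefficient choice below are illustrative assumptions rather than C3M's exact design:

```python
import numpy as np

def concept_mixup(c1, y1, c2, y2, lam):
    """Convexly combine two examples at the concept level.

    c1, c2: concept activation vectors (one entry per human-understandable concept);
    y1, y2: one-hot task labels; lam in [0, 1], typically sampled from a Beta
    distribution during training.
    """
    c_mix = lam * c1 + (1.0 - lam) * c2
    y_mix = lam * y1 + (1.0 - lam) * y2
    return c_mix, y_mix

# Mixing a clean human-annotated example with a noisy machine-labeled one
# softens the influence of any single noisy concept label.
c_mix, y_mix = concept_mixup(np.array([1.0, 0.0]), np.array([1.0, 0.0]),
                             np.array([0.0, 1.0]), np.array([0.0, 1.0]), lam=0.7)
print(c_mix)  # [0.7 0.3]
```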
9360:
Data Void Exploits: Tracking & Mitigation Strategies (Extended Abstract) Preprint
Authors: Miro Mannino, Junior Garcia, Reem Hazim, Azza Abouzied, Paolo Papotti
Location: Montreal
| Day: August 21st
| Time: 11:30
| Session: Knowledge Representation and Reasoning (3/4)
Poster Board Position: From board n78 to board n82
In the evolving landscape of online information, disinformation is a growing concern. A concept central to this challenge is the "data void", a situation where there is a lack of relevant information online regarding certain search terms. This creates an opportunity for misleading or false narratives to fill the gap, often influencing public perception. In this work, we present methods to track and mitigate data voids in Web search settings.