9342: Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors (Extended Abstract)
Authors: Ido Amos, Jonathan Berant, Ankit Gupta
Location: Montreal
| Day: August 21st
| Time: 15:00
| Session: ML: Large Language Models
This paper is an extended abstract of our ICLR 2024 Outstanding Paper Award work. Modeling long-range dependencies across sequences is a longstanding goal in machine learning. While state space models reportedly outperform Transformers on benchmarks like Long Range Arena, we show that evaluating models trained from random initialization significantly overestimates architectural differences. Pretraining with standard denoising objectives on the downstream task data leads to dramatic gains across architectures and minimal performance gaps between Transformers and state space models (SSMs). We demonstrate that properly pretrained vanilla Transformers match S4 performance on Long Range Arena and improve SSM results on PathX-256 by 20 absolute points. Our analysis shows that previously proposed structured parameterizations for SSMs become largely redundant with pretraining. When evaluating architectures on supervised tasks, incorporating data-driven priors via pretraining is essential for reliable performance estimation.
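For intuition, the recipe can be as simple as a masked-denoising pass over the downstream task's own sequences before supervised training. Below is a minimal sketch of such an objective; the encoder configuration, masking ratio, and vocabulary size are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of denoising pretraining on downstream-task sequences.
# Assumptions (not from the paper): a generic encoder mapping token ids to
# per-position hidden states, a 15% masking ratio, and a small vocabulary.
import torch
import torch.nn as nn

VOCAB_SIZE, MASK_ID, HIDDEN = 256, 0, 128

class Denoiser(nn.Module):
    def __init__(self, encoder: nn.Module):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
        self.encoder = encoder                      # Transformer or SSM stack
        self.head = nn.Linear(HIDDEN, VOCAB_SIZE)   # reconstruct token ids

    def forward(self, tokens, mask_ratio=0.15):
        # Randomly mask a fraction of positions and predict the originals.
        mask = torch.rand_like(tokens, dtype=torch.float) < mask_ratio
        corrupted = tokens.masked_fill(mask, MASK_ID)
        hidden = self.encoder(self.embed(corrupted))
        logits = self.head(hidden)
        # Loss is computed only on the masked positions.
        return nn.functional.cross_entropy(logits[mask], tokens[mask])

# Usage: pretrain on the *downstream* inputs, then fine-tune with labels.
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=HIDDEN, nhead=4, batch_first=True),
    num_layers=2)
model = Denoiser(encoder)
tokens = torch.randint(1, VOCAB_SIZE, (8, 512))     # a batch of task sequences
loss = model(tokens)
loss.backward()
```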
9345: Learning Accurate and Interpretable Decision Trees (Extended Abstract)
Authors: Maria-Florina Balcan, Dravyansh Sharma
Location: Montreal
| Day: August 21st
| Time: 11:30
| Session: Humans and AI: Interpretable Models
Decision trees are a popular tool in machine learning and yield easy-to-understand models. Several techniques have been proposed in the literature for learning a decision tree classifier, with different techniques working well for data from different domains. In this work, we develop a data-driven approach to designing decision tree learning algorithms, given repeated access to data from the same domain. We study multiple formulations covering different aspects and popular techniques for learning decision trees. We propose novel parameterized classes of node splitting criteria for top-down algorithms, which interpolate between the commonly used entropy- and Gini-impurity-based criteria, and provide theoretical bounds on the number of samples needed to learn the splitting function appropriate for the data at hand. We also study the sample complexity of tuning prior parameters in Bayesian decision tree learning, and extend our results to decision tree regression. We further consider the problem of tuning hyperparameters of classical decision tree pruning algorithms, including min-cost complexity pruning. We also study the interpretability of the learned decision trees and introduce a data-driven approach for optimizing the explainability-versus-accuracy trade-off using decision trees. Finally, we demonstrate the significance of our approach on real-world datasets by learning data-specific decision trees which are simultaneously more accurate and interpretable.
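For intuition, one well-known one-parameter family that interpolates between Shannon entropy and Gini impurity is the Tsallis entropy; the sketch below uses it purely as an illustration of such an interpolating splitting criterion and is not necessarily the exact parameterization studied in the paper.

```python
# Illustrative sketch: a one-parameter family of node-impurity functions that
# interpolates between Shannon entropy and Gini impurity (Tsallis entropy).
# An example of an interpolating family, not necessarily the paper's exact one.
import numpy as np

def tsallis_impurity(class_probs: np.ndarray, alpha: float) -> float:
    """Tsallis impurity of a node with class distribution `class_probs`.

    alpha -> 1 recovers Shannon entropy (in nats); alpha = 2 gives Gini.
    """
    p = class_probs[class_probs > 0]
    if abs(alpha - 1.0) < 1e-8:
        return float(-(p * np.log(p)).sum())        # Shannon entropy limit
    return float((1.0 - (p ** alpha).sum()) / (alpha - 1.0))

p = np.array([0.7, 0.2, 0.1])
print(tsallis_impurity(p, 1.0))   # ~0.802 (entropy in nats)
print(tsallis_impurity(p, 2.0))   # 0.46  (Gini impurity: 1 - sum p_i^2)
# Data-driven tuning would select alpha (e.g., by validation over repeated
# tasks from the same domain) rather than fixing entropy or Gini a priori.
```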
9348: CAM-Based Methods Can See through Walls (Extended Abstract)
Authors: Magamed Taimeskhanov, Ronan Sicre, Damien Garreau
Location: Montreal
| Day: August 21st
| Time: 11:30
| Session: Humans and AI: Interpretable Models
CAM-based methods are widely used post-hoc interpretability methods that produce a saliency map to explain the decision of an image classification model. The saliency map highlights the areas of the image relevant to the prediction. In this paper, we show that most of these methods can incorrectly attribute an importance score to parts of the image that the model cannot see, and we demonstrate this phenomenon both theoretically and experimentally. On the theory side, we analyze the behavior of GradCAM on a simple masked CNN model at initialization. Experimentally, we train a VGG-like model constrained not to use the lower part of the image and nevertheless observe positive scores in the unseen part of the image. This behavior is evaluated quantitatively on two new datasets. We believe that this is problematic, potentially leading to misinterpretation of the model's behavior.
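The mechanism is easy to reproduce in a few lines: Grad-CAM's channel weights are spatial averages of gradients, so a channel with a positive weight spreads importance over its entire feature map, including regions the decision provably ignores. The sketch below illustrates this on a toy masked CNN; the architecture and sizes are illustrative, not the VGG-like setup used in the paper.

```python
# Toy sketch of the failure mode: a model whose decision provably ignores
# the lower half of the image, yet Grad-CAM still assigns positive importance
# there. Architecture and sizes are illustrative, not the paper's VGG setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedCNN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)
        self.fc = nn.Linear(8, n_classes)

    def forward(self, x):
        a = F.relu(self.conv(x))             # feature maps A^k, shape (B,8,H,W)
        self.features = a                    # keep for Grad-CAM
        a.retain_grad()
        h = a.shape[2]
        masked = a.clone()
        masked[:, :, h // 2:, :] = 0         # decision uses only the TOP half
        pooled = masked.mean(dim=(2, 3))
        return self.fc(pooled)

model = MaskedCNN()
x = torch.rand(1, 3, 32, 32)
score = model(x)[0, 1]                       # class-1 logit
score.backward()

# Grad-CAM: channel weights = spatially averaged gradients, then a weighted
# sum of the feature maps followed by ReLU.
grads = model.features.grad                  # zero everywhere in the bottom half
weights = grads.mean(dim=(2, 3), keepdim=True)
cam = F.relu((weights * model.features).sum(dim=1))

bottom = cam[0, cam.shape[1] // 2:, :]
print("max Grad-CAM score in the unseen bottom half:", bottom.max().item())
# Because channel weights are global, any channel with a positive weight
# spreads importance to the bottom half even though gradients there are zero.
```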
9350: Contractions Based on Optimal Repairs (Extended Abstract)
Authors: Franz Baader, Renata Wassermann
Location: Montreal
| Day: August 19th
| Time: 11:30
| Session: Knowledge Representation and Reasoning (1/4)
Removing unwanted consequences from a knowledge base has been investigated in belief change under the name contraction and is called repair in ontology engineering. Simple repair and contraction approaches based on removing statements from the knowledge base (called belief base contractions and classical repairs, respectively) have the disadvantage that they are syntax-dependent and may remove more consequences than necessary. Belief set contractions do not have these problems, but may result in belief sets that have no finite representation. Similarly, optimal repairs, which are syntax-independent and maximize the retained consequences, may not exist. Our KR 2024 paper leverages advances in characterizing and computing optimal repairs of ontologies based on the description logic EL to obtain contraction operations that combine the advantages of belief set and belief base contractions. It introduces this new approach in a very general setting and proves a characterization theorem that relates the obtained contractions to well-known rationality postulates. It then describes a variety of interesting instances, not only in the standard repair/contraction setting where one wants to get rid of a consequence, but also in other settings such as variants of forgetting in propositional and description logic.
9351: Shapley Value Computation in Ontology-Mediated Query Answering (Extended Abstract)
Authors: Meghyn Bienvenu, Diego Figueira, Pierre Lafourcade
Location: Montreal
| Day: August 22nd
| Time: 10:00
| Session: KR: ontologies
In this work, we explore the use of the Shapley value in ontology-mediated query answering (OMQA) and provide a detailed complexity analysis of Shapley value computation (SVC) in the OMQA setting. In particular, we establish an FP/#P-hard dichotomy for SVC for ontology-mediated queries (T, q) composed of an ontology T formulated in the description logic ELHI-bot and a connected, constant-free, homomorphism-closed query q. We further strengthen the #P-hardness side of the dichotomy to cover possibly disconnected queries with constants. Our results exploit recently discovered connections between SVC and probabilistic query evaluation and allow us to generalize existing results on probabilistic OMQA.
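For readers unfamiliar with the notion, the Shapley value of a fact is its average marginal contribution to making the query hold, taken over all orders in which the endogenous facts can be added. The brute-force sketch below is exponential and purely for intuition; the paper's contribution concerns the complexity of this computation in the OMQA setting, not this naive algorithm.

```python
# Brute-force sketch of the Shapley value of endogenous facts for a Boolean
# query: each fact's value is its weighted average marginal contribution over
# all subsets of the remaining endogenous facts. Exponential-time and purely
# illustrative of the quantity whose complexity the paper analyzes.
from itertools import combinations
from math import factorial

def shapley(endogenous, holds):
    """`holds(facts)` returns True iff the query is entailed by the KB
    restricted to the exogenous facts plus the given endogenous `facts`."""
    n = len(endogenous)
    values = {}
    for f in endogenous:
        others = [g for g in endogenous if g != f]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                marginal = holds(set(S) | {f}) - holds(set(S))
                total += weight * marginal
        values[f] = total
    return values

# Toy example (hypothetical facts): the query holds iff fact "a" is present
# together with at least one of "b" or "c".
holds = lambda facts: "a" in facts and bool({"b", "c"} & facts)
print(shapley(["a", "b", "c"], holds))
# -> "a" gets the largest share (2/3); "b" and "c" split the rest (1/6 each).
```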
9352: Decoupled Search for the Masses: A Novel Task Transformation for Classical Planning (Extended Abstract)
Authors: David Speck, Daniel Gnad
Location: Montreal
| Day: August 20th
| Time: 10:00
| Session: Planning and Scheduling (3/5)
Classical planning provides a framework for solving sequential decision-making problems, i.e., finding a sequence of actions that transforms the current state of the world into a state that satisfies a desired goal condition. Planning tasks are modeled in a logic that describes the environment and its dynamics. It is well known that the specific problem formulation can significantly affect the performance of planning systems on problems like the Rubik's Cube or finding algorithms for matrix multiplication. In this work, we propose a domain-general problem reformulation that embodies decoupled search, a search-reduction technique from classical planning and model checking. Decoupled search decomposes a given problem to exploit its structure, achieving exponential reductions over other search techniques. We show that decoupled search can be captured exactly as a task reformulation and that, on many benchmark domains, it performs as well as, and sometimes even better than, a native decoupled-search implementation.
9355: Meaning Holism and Indeterminacy of Reference in Ontologies (Extended Abstract)
Authors: Adrien Barton, Paul Fabry, Jean-François Ethier
Location: Montreal
| Day: August 22nd
| Time: 10:00
| Session: KR: ontologies
According to meaning holism, the meanings of all the words in a language are interdependent. If this were true, the very practice of building largely interconnected sets of ontologies would be threatened. We examine here how severe a problem meaning holism poses for ontology engineering, based on a definition of the meaning of a class term in an ontology that draws on the classical analytic/synthetic distinction. We show that meaning holism is not as pervasive in ontologies as traditionally assumed in philosophy of language when the meaning of a class term is interpreted as a collection of statements expressing necessary conditions on this term. Still, meaning holism presents substantial challenges for ontology engineering and requires mitigation strategies. We also investigate the related phenomenon of indeterminacy of reference and show how anchoring formal ontologies in natural language can mitigate this problem, even if it cannot fully eliminate it.
9356: Explanatory Capabilities of Large Language Models in Prescriptive Process Monitoring (Extended Abstract)
Authors: Kateryna Kubark, Lana Botchorishvili, Fredrik Milani, Alexander Nolte, Marlon Dumas
Location: Montreal
| Day: August 22nd
| Time: 10:00
| Session: LLM applications
Prescriptive Process Monitoring (PrPM) systems recommend interventions in ongoing business process cases to improve performance. However, performance gains only materialize if users follow the recommendations. Prior research has shown that users are more likely to follow recommendations when they understand them. In this paper, we explore the use of Large Language Models (LLMs) to generate explanations for PrPM recommendations. We developed a prompting method based on typical user questions and integrated it into an existing PrPM system. Our evaluation indicates that LLMs can help users of PrPM systems better understand the recommendations, and that the generated explanations are sufficiently detailed and fulfill user expectations. However, the explanations fall short in addressing the underlying "why" and do not always support users in assessing the trustworthiness of the recommendations.
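As a rough illustration of what a question-driven prompting method can look like, the sketch below assembles a prompt from case attributes, the system's recommendation, and a catalogue of typical user questions; the question catalogue, field names, and wording are hypothetical and not the paper's actual templates.

```python
# Hypothetical sketch of a question-driven prompt for explaining a PrPM
# recommendation; the question catalogue, field names, and wording are
# illustrative assumptions, not the paper's prompt templates.
TYPICAL_QUESTIONS = {
    "what": "What intervention is recommended for this case?",
    "why": "Why is this intervention expected to improve the outcome?",
    "when": "At which point in the case should the intervention be applied?",
}

def build_explanation_prompt(case_features: dict, recommendation: str,
                             question_key: str) -> str:
    facts = "\n".join(f"- {k}: {v}" for k, v in case_features.items())
    return (
        "You are assisting a process worker using a prescriptive process "
        "monitoring system.\n"
        f"Ongoing case attributes:\n{facts}\n"
        f"System recommendation: {recommendation}\n"
        f"User question: {TYPICAL_QUESTIONS[question_key]}\n"
        "Answer concisely and refer only to the attributes above."
    )

print(build_explanation_prompt(
    {"activity": "Validate application", "elapsed_days": 12},
    "Contact the customer to request missing documents",
    "why"))
```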
9357: How to Teach Programming in the AI Era? Using LLMs as a Teachable Agent for Debugging (Extended Abstract)
Authors: Qianou Ma, Hua Shen, Ken Koedinger, Tongshuang Wu
Location: Montreal
| Day: August 21st
| Time: 10:00
| Session: Humans and AI
Large Language Models (LLMs) excel at generating content at remarkable speed. However, they are imperfect and still make various mistakes. In Computer Science education, as LLMs are widely recognized as "AI pair programmers," it becomes increasingly important to train students to evaluate and debug LLM-generated code. In this work, we introduce HypoCompass, a novel system to facilitate deliberate practice on debugging, where human novices play the role of Teaching Assistants and help LLM-powered teachable agents debug code.
We enable effective task delegation between students and LLMs in this learning-by-teaching environment: students focus on hypothesizing the cause of code errors, while adjacent skills like code completion are offloaded to LLM-agents. Our evaluations demonstrate that HypoCompass generates high-quality training materials (e.g., bugs and fixes), outperforming human counterparts fourfold in efficiency, and significantly improves student performance on debugging by 12% in the pre-to-post test.
9358: Interpreting Pretrained Language Models via Concept Bottlenecks (Extended Abstract)
Authors: Zhen Tan, Lu Cheng, Song Wang, Yuan Bo, Jundong Li, Huan Liu
Location: Montreal
| Day: August 21st
| Time: 11:30
| Session: Humans and AI: Interpretable Models
Pretrained language models (PLMs) achieve state-of-the-art results but often function as “black boxes”, hindering interpretability and responsible deployment. While methods like attention analysis exist, they often lack clarity and intuitiveness. We propose interpreting PLMs through high-level, human-understandable concepts using Concept Bottleneck Models (CBMs). This extended abstract introduces C3M (ChatGPT-guided Concept augmentation with Concept-level Mixup), a novel framework for training Concept-Bottleneck-Enabled PLMs (CBE-PLMs). C3M leverages Large Language Models (LLMs) such as ChatGPT to augment concept sets and generate noisy concept labels, combined with a concept-level MixUp mechanism to enhance robustness and effectively learn from both human-annotated and machine-generated concepts. Empirical results show our approach provides intuitive explanations, aids model diagnosis via test-time intervention, and improves the interpretability-utility trade-off, even with limited or noisy concept annotations. This is a concise version of [Tan et al., 2024b], recipient of the Best Paper Award at PAKDD 2024. Code and data are released at https://github.com/Zhen-Tan-dmml/CBM_NLP.git.
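As a rough illustration of the concept-level MixUp idea, the sketch below interpolates pairs of examples in both their text embeddings and their concept-label vectors before passing them through a concept bottleneck; the two-head layout, loss weighting, and Beta parameter are illustrative assumptions, not the exact C3M recipe.

```python
# Minimal sketch of a concept-level MixUp step for a concept-bottleneck PLM:
# pairs of examples are interpolated in both their text embeddings and their
# concept-label vectors. Loss weights, the Beta parameter, and the two-head
# layout are illustrative assumptions, not the exact C3M training recipe.
import torch
import torch.nn as nn

torch.manual_seed(0)
N_CONCEPTS, N_CLASSES, HIDDEN = 8, 3, 64

concept_head = nn.Linear(HIDDEN, N_CONCEPTS)     # bottleneck: predict concepts
label_head = nn.Linear(N_CONCEPTS, N_CLASSES)    # classify from concepts only

def mixup_step(embeddings, concept_labels, class_labels, alpha=0.4):
    """One training step with MixUp applied at the concept level."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(embeddings.size(0))
    x = lam * embeddings + (1 - lam) * embeddings[perm]
    c = lam * concept_labels + (1 - lam) * concept_labels[perm]
    y = lam * class_labels + (1 - lam) * class_labels[perm]

    c_logits = concept_head(x)
    y_logits = label_head(torch.sigmoid(c_logits))
    concept_loss = nn.functional.binary_cross_entropy_with_logits(c_logits, c)
    label_loss = -(y * nn.functional.log_softmax(y_logits, dim=-1)).sum(-1).mean()
    return concept_loss + label_loss

# Fake batch standing in for PLM [CLS] embeddings and (possibly noisy,
# ChatGPT-generated) concept annotations.
emb = torch.randn(16, HIDDEN)
concepts = torch.randint(0, 2, (16, N_CONCEPTS)).float()
labels = nn.functional.one_hot(torch.randint(0, N_CLASSES, (16,)), N_CLASSES).float()
loss = mixup_step(emb, concepts, labels)
loss.backward()
```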