Long Chen
The Hong Kong University of Science and Technology
Towards Efficient Multimodal Reasoning Models

Today's pretrained foundation models have demonstrated astonishing abilities across a wide range of applications, and hundreds of foundation models have been proposed over the past few years. Although significant progress has been achieved, several challenges remain in designing stronger yet more efficient foundation models. In this talk, I will share some recent work on building efficient multimodal reasoning models.

Bio: Dr. Long Chen is an assistant professor in the Computer Science and Engineering (CSE) department at the Hong Kong University of Science and Technology (HKUST), where he leads the LONG Group (https://long-group.cse.ust.hk/). Before joining HKUST, he was a postdoctoral research scientist at Columbia University. He obtained his Ph.D. degree from Zhejiang University and was also a visiting student at NTU and NUS. His primary research interests are Computer Vision, Machine Learning, and Multimedia. Specifically, he aims to build efficient multimodal AI systems that realize "human-like" multimodal understanding and generation. By "human-like", we mean that vision systems should be equipped with three types of abilities: 1) Explainable: the model should rely on the right explicit evidence when making decisions, i.e., be right for the right reasons. 2) Robust: the model should remain robust in situations with only low-quality training data (e.g., training samples that are biased, noisy, or limited). 3) Universal: the model design should be relatively universal, i.e., effective across a variety of tasks.


Yali Du
King's College London
Towards Cooperative AI Agents

From collaborative industrial robots to personal AI assistants, the integration of AI into our daily lives highlights the critical need for effective and reliable coordination among agents, as well as between agents and humans. This challenge centers on creating agents that not only align with user intentions but are also flexible enough to adapt to evolving circumstances, such as the introduction of novel agents. The pursuit of multi-agent cooperation extends beyond individual interactions to encompass broader societal considerations. In this talk, I will discuss the challenges of cooperative AI and our contributions to multi-agent cooperation, human-AI coordination, and cooperative alignment.


Josiah Hanna
University of Wisconsin – Madison
Deploying Reinforcement Learning with Confidence via Active and Offline Policy Evaluation

Recent years have seen a surge of interest in reinforcement learning (RL) as a powerful method for enabling AI agents to learn how to act so as to achieve the goals set by their designers. In practice, a crucial question in RL applications is how to decide when a learned policy is performant enough for deployment and, just as importantly, when a learned policy should not be deployed.

In this talk, I will describe recent work from my group on methods that aim to enable RL practitioners to answer this question and thus enable the use of RL in domains where extensive testing of learned policies is difficult or impossible. I will first discuss a line of work on offline policy evaluation (OPE), i.e., predicting the performance of an untested policy using data from previously used policies. The key novelty in these works is to leverage state abstraction and representation learning to scale OPE methods to more complex domains such as robot control. I will then discuss a line of work on active data collection for data-efficient evaluation of RL policies, in which we have shown how to adaptively collect data in order to evaluate an RL policy with as few real-world interactions as possible. Taken together, these lines of work are an important step toward instilling confidence in decision-making systems trained with RL.
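For readers unfamiliar with offline policy evaluation, the sketch below illustrates the simplest baseline the field builds on, ordinary trajectory-wise importance sampling: logged returns are reweighted by how likely the evaluation policy would have been to take the actions that the behavior policy actually took. This is only an illustrative sketch with hypothetical function and variable names; the abstraction- and representation-learning-based methods described in the talk go well beyond this baseline.

```python
import numpy as np

def importance_sampling_ope(trajectories, pi_e, pi_b, gamma=0.99):
    """Ordinary (trajectory-wise) importance-sampling estimate of the
    expected discounted return of an evaluation policy pi_e, using
    trajectories collected by a behavior policy pi_b.

    Each trajectory is a list of (state, action, reward) tuples;
    pi_e(a, s) and pi_b(a, s) return action probabilities.
    """
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            # Reweight by how much more (or less) likely pi_e is to take
            # the logged action than the policy that collected the data.
            weight *= pi_e(a, s) / pi_b(a, s)
            ret += (gamma ** t) * r
        estimates.append(weight * ret)
    # Averaging the weighted returns gives an unbiased estimate of pi_e's value.
    return float(np.mean(estimates))
```

The variance of this estimator grows quickly with trajectory length, which is one reason scaling OPE to complex domains such as robot control requires more sophisticated techniques.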
Zuozhu Liu


Roberto Martin-Martin


Reuth Mirsky
Department of Computer Science, Tufts University
Agents, Autonomy and Disobedience

Human-AI and human-robot interaction often frame artificial agents as obedient assistants that are designed to follow instructions and meet expectations. But what if this paradigm is limiting the true potential of collaborative AI?

In this talk, I challenge the assumption that autonomy should always be constrained by compliance. I present a scale of autonomy for AI agents and use it to argue that intelligent disobedience can be not only beneficial but essential to cooperation. I will use guide dogs as inspiration to discuss several exciting manifestations of agency and intelligent disobedience in AI and robotics: reasoning about other agents, initiating an interaction, teaching teammates, and more.

Bio: Reuth Mirsky is an Assistant Professor in the Computer Science department at Tufts University. Before her current position, she was a Senior Lecturer at Bar Ilan University and a postdoc in the Computer Science Department at the University of Texas at Austin. She received her Ph.D. on plan recognition in real-world environments from the Department of Software and Information Systems Engineering at Ben Gurion University. In her research, she seeks algorithms, behaviors, and frameworks that challenge current assumptions made for AI agents. Reuth is an engaged member of the AI and HRI communities, serving in leadership, technical, and organizational capacities.


Marynel Vázquez
Assistant Professor, Computer Science Department, Yale University
The Quest for Generalizable Robot Autonomy in Situated Human-Robot Interactions

Robots hold significant promise for contributing to social good across various domains. For example, robots may help us learn new skills, assist us in completing tasks, or provide emotional support. To be successful in these endeavors in real-world human environments, robots need to be robust to changes in their social contexts, such as changes in individual users, group interactions, and the physical environment.
In this talk, I will describe two lines of research critical to achieving more generalizable robot autonomy in situated human-robot interactions. First, I will describe a unified perspective for reasoning about social contexts in HRI that exploits the underlying relational structure of the data. This perspective is motivated by the need to computationally model various aspects of social contexts in HRI and ultimately aims to enable more generalizable social robot behavior policies. Second, I will describe our efforts to leverage nonverbal human behavior as implicit feedback for robot behavior evaluation. This work is motivated by the need to scale the feedback that end users provide to robots during interactions for policy evaluation and improvement. Taken together, these lines of research bring us closer to a future where robots will be better equipped to deal with complex social encounters.


Quanming Yau
Department of Electronic Engineering, Tsinghua University
Structure-Aware Learning: Evolving Topological Learning Techniques for Vertical Domains

The inherent structure within data across various vertical domains, from molecular biology to knowledge graphs, offers a powerful scaffold for machine learning. This talk will explore the evolution of topological learning techniques, spanning from classical graph-based models to the forefront of multi-agent systems. We will begin with an introduction to Graph Neural Networks (GNNs), a widely used architecture for modeling complex topological structure in tasks such as molecular property prediction and knowledge graph learning. Next, we will examine the integration of Large Language Models (LLMs) into topological learning, a paradigm that unifies structured and textual data and, by leveraging the capabilities of LLMs, enables interpretable inference over complex knowledge graphs. Finally, we will explore the latest advances in which multi-agent systems with optimizable topological structures are designed to solve complex tasks. Overall, this presentation outlines the recent progression of topological learning, from GNNs to LLMs and agents, showcasing a powerful paradigm for building sophisticated and adaptable AI solutions for science and industry.


Chuxu Zhang
Associate Professor of Computer Science and Engineering, University of Connecticut
Graph Machine Learning: Effectiveness, Efficiency, and Safety

Graph data is ubiquitous in real-world applications, and graph machine learning has emerged as a transformative force in advancing AI over the past decade. In this talk, I will present my research in graph machine learning, centered around three key dimensions: effectiveness, efficiency, and safety. I will discuss the development of models and algorithms that not only deliver strong predictive performance but also promote scalability and trustworthiness. I will also showcase how these methods are applied across diverse domains, including healthcare, social media, recommender systems, and natural language processing, to address pressing societal challenges through principled and impactful model design.