The Hong Kong University of Science and Technology
Towards Efficient Multimodal Reasoning Models
Today’s pretrained foundation models have demonstrated astonishing abilities across a range of applications. Hundreds of foundation models have been proposed during the past few years. Although significant progress has been achieved, there are still several challenges in designing stronger but more efficient foundation models. In this talk, I am going to share some recent work on building efficient multimodal reasoning models.
he was a postdoctoral research scientist at Columbia University. He obtained his Ph.D. degree from Zhejiang University, and he was also a visiting student at NTU & NUS. His primary research interests are Computer Vision, Machine Learning
King’s College London
Towards Cooperative AI Agents
From collaborative industrial robots to personal AI assistants, the integration of AI into our daily lives highlights the critical need for effective and reliable coordination among agents, as well as between agents and humans. This challenge centers on creating agents that not only align with user intentions but also possess the flexibility to adapt to evolving circumstances, such as the introduction of novel agents. The pursuit of multi-agent cooperation extends beyond individual interactions to encompass broader societal considerations. In this talk, I will discuss the challenges of cooperative AI, and our contributions to multi-agent cooperation, human-AI coordination and cooperative alignment.
University of Wisconsin–Madison
Deploying Reinforcement Learning with Confidence via Active and Offline Policy Evaluation
Recent years have seen a surge of interest in reinforcement learning (RL) as a powerful method for enabling AI agents to learn how to act so as to achieve the goals set by their designers. In practice, a crucial question in RL applications is how to decide when a learned policy is performant enough for deployment and, just as importantly, when a learned policy should not be deployed. In this talk, I will describe recent work from my group on methods that aim to enable RL practitioners to answer this question and thus to enable the use of RL in domains where extensive testing of learned policies is difficult or impossible. I will first talk about a line of work in my group on offline policy evaluation (OPE), or predicting the performance of an untested policy using data from previously used policies. The key novelty in these works is to leverage state abstraction and representation learning to scale OPE methods to more complex domains such as robot control. Then I will discuss a line of work on active data collection for data-efficient evaluation of RL policies. In this line of work, we have shown how to adaptively collect data in order to effectively evaluate an RL policy with as few real-world interactions as possible. Taken together, these lines of work are an important step toward instilling confidence in decision-making trained with RL.
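As a rough illustration of the OPE idea described above, the Python sketch below estimates the value of an untested policy from logged data using ordinary importance sampling. This is a textbook baseline chosen for illustration, not the state-abstraction or representation-learning methods from the talk; the function name, the data layout, and the toy two-armed bandit are all hypothetical.

```python
def importance_sampling_estimate(trajectories, target_prob, behavior_prob, gamma=1.0):
    """Ordinary importance-sampling estimate of a target policy's expected return.

    trajectories: list of trajectories, each a list of (state, action, reward)
        tuples logged while running the behavior policy.
    target_prob(state, action): action probability under the untested policy.
    behavior_prob(state, action): action probability under the logging policy
        (assumed to be nonzero for every logged action).
    """
    total = 0.0
    for trajectory in trajectories:
        weight = 1.0    # cumulative likelihood ratio for this trajectory
        ret = 0.0       # discounted return of this trajectory
        discount = 1.0
        for state, action, reward in trajectory:
            weight *= target_prob(state, action) / behavior_prob(state, action)
            ret += discount * reward
            discount *= gamma
        total += weight * ret
    return total / len(trajectories)


# Hypothetical toy problem: a two-armed bandit with a single state 0.
# Arm 0 always pays 0.0 and arm 1 always pays 1.0.
logged = [
    [(0, 0, 0.0)],
    [(0, 1, 1.0)],
    [(0, 1, 1.0)],
    [(0, 0, 0.0)],
]

def behavior(state, action):
    # The logging policy picked each arm with probability 0.5.
    return 0.5

def target(state, action):
    # The untested policy always picks arm 1.
    return 1.0 if action == 1 else 0.0

estimate = importance_sampling_estimate(logged, target, behavior)
print(estimate)
```

In this toy setting the estimate equals the target policy's true expected reward of 1.0. With longer trajectories the product of likelihood ratios can vary widely, which is one reason scaling OPE to complex domains is difficult.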
Department of Computer Science, Tufts University
Agents, Autonomy and Disobedience
Human-AI and human-robot interaction often frames artificial agents as obedient assistants that are designed to follow instructions and meet expectations. But what if this paradigm is limiting the true potential of collaborative AI?
initiating an interaction, teaching teammates, and more.
Assistant Prof., Computer Science Department, Yale University
The Quest for Generalizable Robot Autonomy in Situated Human-Robot Interactions
Robots hold significant promise for contributing to social good across various domains. For example, robots may help us learn new skills, assist us in completing tasks or provide emotional support. To be successful in these endeavors in real-world human environments, robots need to be robust to changes in their social contexts, such as changes in individual users, group interactions, the physical environment, etc.
I will describe a unified perspective for reasoning about social contexts in HRI that exploits the underlying relational structure of the data. This perspective is motivated by a need to computationally model various aspects of social contexts in HRI and, ultimately, aims to enable more generalizable social robot behavior policies. Second
Department of Electronic Engineering, Tsinghua University
Structure-Aware Learning: Evolving Topological Learning Techniques for Vertical Domains
The inherent structure within data across various vertical domains, from molecular biology to knowledge graphs, offers a powerful scaffold for machine learning. This talk will explore the evolution of topological learning techniques, spanning from classical graph-based models to the forefront of multi-agent systems. We will begin with an introduction to Graph Neural Networks (GNNs), a widely used architecture for modeling complex topological structure in tasks like molecular property prediction and knowledge graph learning. Next, we examine the integration of Large Language Models (LLMs) into topological learning, a paradigm that unifies structured and textual data. By leveraging the capabilities of LLMs, this paradigm enables interpretable inference over complex knowledge graphs. Finally, we will explore the latest advancements, in which multi-agent systems with optimizable topological structures are designed to solve complex tasks. Overall, this presentation outlines the recent progression of topological learning, from GNNs to LLMs and agents, showcasing a powerful paradigm for building sophisticated and adaptable AI solutions for science and industry.
Associate Professor of Computer Science and Engineering, University of Connecticut
Graph Machine Learning: Effectiveness, Efficiency, and Safety
Graph data is ubiquitous in real-world applications, and graph machine learning has emerged as a transformative force in advancing AI over the past decade. In this talk, I will present my research in graph machine learning, centered around three key dimensions: effectiveness, efficiency, and safety. I will discuss the development of models and algorithms that not only deliver strong predictive performance but also promote scalability and trustworthiness. I will also showcase how these methods are applied across diverse domains—including healthcare, social media, recommender systems, and natural language processing—to address pressing societal challenges through principled and impactful model design.