Visual Object-oriented Learning meets Interaction: Discovery, Representations, and Applications (ECCV22 VOLI Workshop)

VOLI 2022


Computer Graphics · Computer Vision & Pattern Recognition



Objects, as the most basic and composable units in visual data, appear in specific visual appearances and geometric forms, carrying rich semantic, functional, dynamic, and relational information. One may discover objects by passively watching videos or by actively interacting with the world to find them. Once objects are detected, how to interact with them to extract and represent such object-oriented semantics remains an open research problem. Furthermore, these representations need to be designed to be readily useful for various downstream perception and robotic interaction tasks. How to define, discover, and represent objects in visual data from and for interaction, and how to use them in various downstream applications, is therefore a crucial research domain.
In this workshop, we will focus on learning formulations, approaches, and methodologies that (i) define or discover "objects" (e.g., objects, parts, concepts) from or for interaction (e.g., objects interacting in the world, agents interacting with objects) in unsupervised, weakly supervised, and/or self-supervised manners; (ii) design or learn task-agnostic or task-aware visual representations for various object-oriented properties (e.g., dynamics, functionality, affordance); and/or (iii) explore ways to apply the developed object-oriented methods to downstream applications in different fields (e.g., machine learning, computer graphics, computer vision, robotics, cognition).
Some concrete examples are listed below:
- How to define/discover visual "objects" from/for interaction? (e.g., How to define the concept of objects from/for downstream tasks? How do we achieve unsupervised, weakly supervised, or self-supervised interactive or embodied learning for object discovery? What are the different approaches to discovering objects?)
- What are good object-oriented representations, and how can they be learned from/for interaction? (e.g., What properties of objects need to be learned, and which can be learned from/for interactions? How to design suitable representations for different desired object attributes? Is there a unified way to represent desired object properties for various downstream tasks? Should we learn task-agnostic or task-aware representations? How to extract, learn, represent, and use the learned task-aware information?)
- How to learn useful object-oriented representations for different downstream applications? (e.g., What object-oriented representations do tasks from different fields need? How similar or different are they? Do we need to learn task-specific representations, or can we learn universal object representations for all tasks? How do we design learning approaches that allow objects or agents to interact with the "objects" in the environment to learn better representations for downstream tasks?)
We accept both archival and non-archival paper submissions. Accepted archival papers will be included in the ECCV22 conference proceedings, while non-archival papers will only be presented at the workshop. All papers will be peer-reviewed by three experts in the field in a double-blind manner. We also welcome authors of papers accepted to the ECCV main conference or other previous conferences to present their work in the non-archival track. Such previously accepted papers do not need peer review, so please indicate this clearly in the submission form. Every accepted paper will have the opportunity to give a 5-minute spotlight presentation and to host two 30-minute poster sessions (12 hours apart).
Submission Site: https://cmt3.research.microsoft.com/VOLI2022
Submission Instructions:
Please use the official ECCV template (under "Submission Guidelines") and make sure to anonymize your submission.