We use theoretical and empirical approaches to build machine vision systems that see and understand the world like humans.
In the past few years, deep neural networks have surpassed human performance on a range of complex cognitive tasks. Unlike humans, however, these models can be derailed by almost imperceptible perturbations, often fail to generalise beyond their training data, and need large amounts of data to learn novel tasks. The core reason for this behaviour is shortcut learning: the tendency of neural networks to pick up statistical signatures that happen to be sufficient to solve a given task instead of learning the underlying causal structures and mechanisms in the data. Our research ties together adversarial machine learning, disentanglement, interpretability, self-supervised learning, and theoretical frameworks such as nonlinear Independent Component Analysis. The goal is to develop theoretically grounded yet empirically successful visual representation learning techniques that uncover the underlying structure of our visual world and close the gap between human and machine vision.
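As a minimal illustration of the "almost imperceptible perturbations" mentioned above, the sketch below implements the Fast Gradient Sign Method (FGSM, Goodfellow et al., 2015) on a toy randomly initialised classifier. The model, input, and perturbation budget are placeholder assumptions for demonstration only, not our actual models or methods; with a trained vision model the same one-line perturbation routinely flips predictions while changing each pixel by only a few intensity levels.

```python
# Hypothetical FGSM sketch: perturb an input slightly in the direction that
# increases the classifier's loss and check whether the prediction changes.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy classifier standing in for a trained vision model (assumption).
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 10),
)
model.eval()

x = torch.rand(1, 3, 32, 32)        # a "clean" image with pixels in [0, 1]
y = model(x).argmax(dim=1)          # the model's original prediction

# Gradient of the loss with respect to the input pixels.
x_adv = x.clone().requires_grad_(True)
loss = nn.functional.cross_entropy(model(x_adv), y)
loss.backward()

# One FGSM step: nudge every pixel by at most eps in the loss-increasing
# direction, then clamp back to the valid image range.
eps = 8 / 255
x_adv = (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

print("clean prediction:      ", y.item())
print("perturbed prediction:  ", model(x_adv).argmax(dim=1).item())
print("max per-pixel change:  ", (x_adv - x).abs().max().item())
```

With an untrained toy network the prediction may or may not flip; the point of the sketch is only to show how small the perturbation is (at most 8/255 per pixel) relative to the change in behaviour it can cause in models that rely on shortcut features.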