About Me

I am a doctoral student at the Computer Vision Lab at ETH Zurich supervised by Prof. Luc Van Gool and Dr. Dengxin Dai. My focus lies on deep learning and computer vision with a particular interest in label-efficient scene understanding. My current research directions include semi-supervised, domain-adaptive, self-supervised, multi-task, and active learning for dense prediction tasks such as semantic segmentation and monocular depth estimation. In the past, I also worked on explainable AI, multi-camera fusion for semantic maps, and autonomous mobile robots.


Jul 2021 - Present
ETH Zurich, Switzerland
Doctoral Student in Computer Vision
Sep 2019 - Jun 2021
ETH Zurich, Switzerland
M. Sc. in Robotics, Systems and Control
ETH Medal for an outstanding Master's thesis
Sep 2017 - Dec 2017
University of British Columbia Vancouver, Canada
Semester Abroad
Oct 2015 - Jan 2019
Otto von Guericke University Magdeburg, Germany
B. Sc. in Computer Systems in Engineering
Best Graduate at the Computer Science Department 2018/19


Feb 2019 - Jul 2019
Bosch Center for Artificial Intelligence Renningen, Germany
PreMaster Program, Explainable Artificial Intelligence
Oct 2018 - Dec 2018
Bosch Center for Artificial Intelligence Renningen, Germany
Research Internship, Environment Representations for Deep Learning


Feb 2021 - Present
ETH Zurich, Deep Learning for Autonomous Driving, Teaching Assistant
Supervision of the course project on multi-task learning


HRDA: Context-Aware High-Resolution Domain-Adaptive Semantic Segmentation
Lukas Hoyer, Dengxin Dai, Luc Van Gool
arXiv:2204.13132, 2022
Unsupervised domain adaptation (UDA) aims to adapt a model trained on synthetic data to real-world data without requiring expensive annotations of real-world images. As UDA methods for semantic segmentation are usually GPU memory intensive, most previous methods operate only on downscaled images. We question this design as low-resolution predictions often fail to preserve fine details. The alternative of training with random crops of high-resolution images alleviates this problem but falls short in capturing long-range, domain-robust context information. Therefore, we propose HRDA, a multi-resolution training approach for UDA that uses a learned scale attention to combine the strengths of small high-resolution crops, which preserve fine segmentation details, and large low-resolution crops, which capture long-range context dependencies, while maintaining a manageable GPU memory footprint. HRDA enables adapting small objects and preserving fine segmentation details. It significantly improves the state-of-the-art performance by 5.5 mIoU for GTA→Cityscapes and by 4.9 mIoU for Synthia→Cityscapes, resulting in an unprecedented performance of 73.8 and 65.8 mIoU, respectively.
DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation
Lukas Hoyer, Dengxin Dai, Luc Van Gool
Conference on Computer Vision and Pattern Recognition (CVPR), 2022
As acquiring pixel-wise annotations of real-world images for semantic segmentation is a costly process, a model can instead be trained with more accessible synthetic data and adapted to real images without requiring their annotations. This process is studied in Unsupervised Domain Adaptation (UDA). In this work, we particularly study the influence of the network architecture on UDA performance and propose DAFormer, a Transformer network architecture tailored for UDA. It is enabled by three simple but crucial training strategies to stabilize the training and to avoid overfitting the source domain. DAFormer significantly improves the state-of-the-art performance by 10.8 mIoU for GTA→Cityscapes and by 5.4 mIoU for Synthia→Cityscapes.
Improving Semi-Supervised and Domain-Adaptive Semantic Segmentation with Self-Supervised Depth Estimation
Lukas Hoyer, Dengxin Dai, Qin Wang, Yuhua Chen, Luc Van Gool
arXiv:2108.12545, 2021
We extend our CVPR21 paper "Three Ways to Improve Semantic Segmentation with Self-Supervised Depth Estimation" (see below) to semi-supervised domain adaptation featuring Cross-Domain DepthMix and Matching Geometry Sampling to align synthetic and real data.
Domain Adaptive Semantic Segmentation with Self-Supervised Depth Estimation
Qin Wang, Dengxin Dai, Lukas Hoyer, Olga Fink, Luc Van Gool
International Conference on Computer Vision (ICCV), 2021
Domain adaptation for semantic segmentation aims to improve the model performance in the presence of a distribution shift between source and target domain. In this work, we leverage the guidance from self-supervised depth estimation, available on both domains, to bridge the domain gap. On the one hand, we propose to explicitly learn the task feature correlation to strengthen the target semantic predictions with the help of target depth estimation. On the other hand, we use the depth prediction discrepancy from source and target depth decoders to approximate the pixel-wise adaptation difficulty. The adaptation difficulty, inferred from depth, is then used to refine the target semantic segmentation pseudo-labels.
Three Ways to Improve Semantic Segmentation with Self-Supervised Depth Estimation
Lukas Hoyer, Dengxin Dai, Yuhua Chen, Adrian Köring, Suman Saha, Luc Van Gool
Conference on Computer Vision and Pattern Recognition (CVPR), 2021
We developed a method for improving semantic segmentation based on knowledge learned by self-supervised monocular depth estimation from unlabelled image sequences. In particular, (1) we transferred knowledge from features learned during self-supervised depth estimation to semantic segmentation, (2) we implemented a strong data augmentation by blending images and labels using the structure of the scene, and (3) we utilized the depth feature diversity as well as the level of difficulty of learning depth in a student-teacher framework to select the most useful samples to be annotated for semantic segmentation.
Grid Saliency for Context Explanations of Semantic Segmentation
Lukas Hoyer, Mauricio Munoz, Prateek Katiyar, Anna Khoreva, Volker Fischer
Advances in Neural Information Processing Systems (NeurIPS), 2019 (Poster Presentation)
We extended saliency maps from classification to dense predictions to allow visual inspection of semantic segmentation convolutional neural networks. We investigated the effectiveness of the proposed Grid Saliency on a synthetic dataset with an artificially induced bias between objects and their context as well as on real-world datasets. Our results show that Grid Saliency can be successfully used to provide easily interpretable context explanations and, moreover, can be employed for detecting and localizing contextual biases present in the data.
Short-Term Prediction and Multi-Camera Fusion on Semantic Grids
Lukas Hoyer, Patrick Kesper, Anna Khoreva, Volker Fischer
International Conference on Computer Vision (ICCV) Workshop CVRSUAD, 2019 (Poster Presentation)
We developed a self-supervised temporal prediction and multi-camera fusion system based on agent-centric semantic maps. Semantic information from multiple cameras is integrated over multiple frames into a unified semantic bird’s-eye-view environment representation.
A Robot Localization Framework Using CNNs for Object Detection and Pose Estimation
Lukas Hoyer, Christoph Steup, Sanaz Mostaghim
IEEE Symposium Series on Computational Intelligence (SSCI), 2018 (Oral Presentation)
We designed and evaluated an external, camera-based localization and identification system for swarm robots using convolutional neural networks. For a convenient system setup, we developed a low-effort training data acquisition and synthetization process.


RoboCup Major @work
As a member of the RoboCup @work team "Robotto", I helped build an autonomous mobile robot for transporting work items in factories. From 2015 until 2018, I was responsible for the development of the state machine, world model, task planner, and object recognition. In 2017, we achieved 2nd place at the World Cup.
RoboCup Junior Rescue-B
For the RoboCup Junior league Rescue-B (now Rescue Maze), we built and programmed a robot from scratch to autonomously search a maze for heat sources, which simulate victims in a building in danger of collapsing. In 2013 and 2014, we won 1st place at the World Cup competitions.
For the "Jugend forscht" German high school research competition, we developed a cost-effective optical spectrometer for schools, which is more than 90% cheaper than regular devices. At the federal stage, we received the award for non-destructive testing. After the competition, I scaled the prototype to small batch production and equipped about 30 schools with the spectrometer.


Jun 2022
ETH Medal for Outstanding Master's Theses
Awarded for the best 2.5% Master's theses at ETH Zurich
Sep 2019 - Feb 2021
ETH Excellence Scholarship and Opportunity Program
Full, merit-based scholarship awarded to 0.5% of all Master's students at ETH Zurich
Jan 2016 - Jun 2021
Scholarship of the German Academic Scholarship Foundation
Merit-based scholarship awarded to 0.5% of all German students
Sep 2019 - Sep 2020
German Academic Exchange Scholarship (DAAD)
Merit-based scholarship awarded to German graduate students
Nov 2019
Graduation Awards of the University of Magdeburg
Best Graduate at the Computer Science Department 2018/19
Student Research Award for the Bachelor Thesis
Jul 2017
RoboCup, Major League @Work
2nd Place at the World Cup
May 2015
"Jugend forscht" German High School Research Competition
Award for Non-Destructive Testing
Jul 2014 & Jun 2013
RoboCup, Junior League Rescue-B
1st Place at the World Cup