No More Discrimination:
Cross City Adaptation of Road Scene Segmenters

Yi-Hsin Chen     Wei-Yu Chen     Yu-Ting Chen     Bo-Cheng Tsai     Yu-Chiang Frank Wang     Min Sun    



Recent developments in computer vision, deep learning, and artificial intelligence more broadly have led to a race to build advanced driver assistance systems (ADAS). Among the related techniques, road scene segmentation is one of the key components of a successful ADAS. However, even a state-of-the-art semantic segmenter suffers a severe performance penalty when applied to an unseen city, due to dataset (domain) bias. The figures below show how strongly a state-of-the-art semantic segmenter pretrained on the Cityscapes dataset (cities in Germany, e.g., Frankfurt) is affected by dataset bias when applied to other, unseen cities (Rome, Rio, Tokyo, and Taipei).

This suggests the urgent necessity of a dataset for the adaptation of road scene segmenter, as well as an effective adaptation method.
[Interactive map: per-city segmentation results for Rome, Rio, Tokyo, and Taipei]

Our Dataset

We introduce a new dataset for the adaptation of road scene semantic segmenters, with two unique properties:

Diverse Locations and Appearances: Our dataset consists of high-quality road scene images from four cities across continents: Rome, Rio, Tokyo, and Taipei. Because of their diverse locations, these cities exhibit significant differences in appearance. This property makes our dataset well suited to adaptation tasks.

Temporal Information: For each city, we collect 1,600 unlabeled image pairs taken at the same location but at different times. Valuable temporal information is embedded in these image pairs, which facilitates our unsupervised adaptation method.
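To illustrate how such time-separated pairs can be exploited, here is a minimal numpy sketch (not the authors' implementation) of extracting a static-object prior: pixels predicted as the same static class in both frames of a pair. The class indices and the set of "static" classes below are hypothetical choices for the example.

```python
import numpy as np

def static_object_prior(seg_t0, seg_t1, static_classes):
    """Mask of pixels predicted as the same static class (e.g. road,
    building) in both frames of a time-separated image pair."""
    agree = seg_t0 == seg_t1                       # prediction unchanged over time
    is_static = np.isin(seg_t0, list(static_classes))
    return agree & is_static

# Toy 2x3 label maps: 0 = road (static class), 5 = car (dynamic class)
t0 = np.array([[0, 0, 5],
               [0, 5, 0]])
t1 = np.array([[0, 0, 0],
               [0, 5, 0]])
mask = static_object_prior(t0, t1, static_classes={0})
# The car pixel at (1, 1) is excluded even though it agrees across frames,
# because it belongs to a dynamic class.
```

In practice the label maps would come from the source-trained segmenter's predictions on the unlabeled target-city pairs, giving reliable supervision on static regions without any manual annotation.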

Moreover, for evaluation purposes, we select 100 images from each city and annotate them with Cityscapes-compatible labels.

We summarize the dataset statistics in the table below.

Our Method

To adapt a road scene segmenter trained on source-domain cities to unseen target-domain cities, we propose a unified framework based on domain adversarial learning, which performs joint global and class-wise alignment by leveraging soft labels from source- and target-domain data. In addition, by exploiting the temporal information in our dataset, we uniquely identify and introduce static-object priors, which are retrieved from images via the natural synchronization of static objects over time. Averaged over the four target cities (Rio, Rome, Tokyo, Taipei), our method improves the mIoU of the segmenter by 4.1%. For more details, please refer to our paper.
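The adversarial alignment above can be sketched with the standard two-player objective: a domain classifier is trained to tell source features from target features, while the segmenter's feature extractor is trained to fool it. Below is a minimal numpy sketch of the two losses on a toy batch; the probabilities are made-up values, and this is an illustration of generic domain adversarial training, not the paper's exact formulation.

```python
import numpy as np

def bce(p, y):
    """Binary cross-entropy; p = predicted probability of 'target domain'."""
    eps = 1e-7
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Hypothetical domain-classifier outputs for a mixed batch:
# first two samples come from the source domain, last two from the target.
p = np.array([0.1, 0.2, 0.8, 0.9])
y = np.array([0.0, 0.0, 1.0, 1.0])   # true domain labels

d_loss = bce(p, y)        # discriminator: classify the domain correctly
adv_loss = bce(p, 1 - y)  # feature extractor: fool the discriminator
```

Minimizing `adv_loss` with respect to the features (e.g. via a gradient reversal layer) pushes source and target feature distributions together; the paper applies this alignment both globally and per class.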

 Download Paper

Contact: Yi-Hsin Chen

Last update: April 1, 2017