Overview

To submit, please read our submission guidelines.

A bold algorithm or model name indicates an official implementation submitted by an author of the original paper.

This overall leaderboard shows out-of-distribution test performance across all datasets.

| Algorithm | Contact | FMoW (Worst-Region Acc) | PovertyMap (Rural Pearson r) | iWildCam (Macro F1) | Camelyon17 (Avg Acc) | OGB-MolPCBA (Avg Precision) | Amazon (10% Acc) | CivilComments (Worst-Group Acc) | Py150 (Method/Class Acc) |
|---|---|---|---|---|---|---|---|---|---|
| ERM | WILDS | 32.8 (0.45) | 0.46 (0.07) | 31.0 (1.3) | 70.3 (6.4) | 27.2 (0.3) | 53.8 (0.8) | 56.0 (3.6) | 67.9 (0.1) |
| CORAL | WILDS | 31.0 (0.35) | 0.44 (0.07) | 32.8 (0.1) | 59.5 (7.7) | 17.9 (0.5) | 52.9 (0.8) | 65.6 (1.3) | 65.9 (0.1) |
| IRM | WILDS | 33.5 (1.35) | 0.48 (0.04) | 15.1 (4.9) | 64.2 (8.1) | 15.6 (0.3) | 52.4 (0.8) | 66.3 (2.1) | 64.3 (0.2) |
| Group DRO | WILDS | 31.4 (2.1) | 0.40 (0.08) | 23.9 (2.1) | 68.4 (7.3) | 22.4 (0.6) | 53.3 (0.0) | 70.0 (2.0) | 65.9 (0.1) |
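
Several columns above report a worst-case metric (worst-region or worst-group accuracy) rather than an average. The sketch below illustrates how such a number could be computed and formatted. It is not the WILDS evaluation code: the function names are hypothetical, and it assumes that each leaderboard entry is a mean over several training runs with the sample standard deviation in parentheses, which this page does not state.

```python
from collections import defaultdict
from statistics import mean, stdev


def worst_group_accuracy(y_true, y_pred, groups):
    """Return the lowest per-group accuracy (hypothetical helper)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    # Accuracy is computed separately for each group; the minimum is reported.
    return min(correct[g] / total[g] for g in total)


def summarize(scores):
    """Format per-run scores in [0, 1] as 'mean (std)' in percent.

    Assumption: the spread is the sample standard deviation across runs.
    """
    return f"{100 * mean(scores):.1f} ({100 * stdev(scores):.1f})"


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 0, 1, 1]
    groups = ["a", "a", "a", "a", "b", "b"]
    print(worst_group_accuracy(y_true, y_pred, groups))
    print(summarize([0.70, 0.72, 0.68]))
```

A model can have high average accuracy and still score poorly here, because one badly served group determines the result.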

Below, we list individual leaderboards with more details on each submission.

iWildCam

| Rank | Algorithm | Model | Test ID Macro F1 | Test ID Avg Acc | Test OOD Macro F1 | Test OOD Avg Acc | Contact | References | Date |
|---|---|---|---|---|---|---|---|---|---|
| 1 | CORAL | ResNet50 | 43.5 (3.5) | 73.7 (0.4) | 32.8 (0.1) | 73.3 (4.3) | WILDS | Paper / Code | March 9, 2021 |
| 2 | ERM | ResNet50 | 47.0 (1.4) | 75.7 (0.3) | 31.0 (1.3) | 71.6 (2.5) | WILDS | Paper / Code | March 9, 2021 |
| 3 | Group DRO | ResNet50 | 37.5 (1.7) | 71.6 (2.7) | 23.9 (2.1) | 72.7 (2.0) | WILDS | Paper / Code | March 9, 2021 |
| 4 | IRM | ResNet50 | 22.4 (7.7) | 59.9 (8.1) | 15.1 (4.9) | 59.8 (3.7) | WILDS | Paper / Code | March 9, 2021 |
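
The iWildCam table ranks by macro F1, which can order methods differently from average accuracy because every class counts equally regardless of how often it appears. The following is a minimal sketch of the metric, not the benchmark's own evaluation code. The function name is hypothetical, and it assumes the average is taken over every class that appears in either the labels or the predictions.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    # Assumption: average over classes seen in the labels or the predictions.
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        # F1 = 2TP / (2TP + FP + FN); the denominator is non-zero because
        # each class in `labels` occurs at least once in y_true or y_pred.
        f1_scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    y_true = ["deer", "deer", "deer", "fox"]
    y_pred = ["deer", "deer", "fox", "fox"]
    print(macro_f1(y_true, y_pred))
```

In this toy example the accuracy is 3/4, while the macro F1 is lower because the rare class has an F1 of only 2/3.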

Camelyon17

| Rank | Algorithm | Model | Val Acc | Test Acc | Contact | References | Date |
|---|---|---|---|---|---|---|---|
| 1 | ERM | DenseNet121 | 84.9 (3.1) | 70.3 (6.4) | WILDS | Paper / Code | March 9, 2021 |
| 2 | Group DRO | DenseNet121 | 85.5 (2.2) | 68.4 (7.3) | WILDS | Paper / Code | March 9, 2021 |
| 3 | IRM | DenseNet121 | 86.2 (1.4) | 64.2 (8.1) | WILDS | Paper / Code | March 9, 2021 |
| 4 | CORAL | DenseNet121 | 86.2 (1.4) | 59.5 (7.7) | WILDS | Paper / Code | March 9, 2021 |

OGB-MolPCBA

| Rank | Algorithm | Model | Val Avg Precision | Test Avg Precision | Contact | References | Date |
|---|---|---|---|---|---|---|---|
| 1 | ERM | GIN-virtual | 27.8 (0.1) | 27.2 (0.3) | WILDS | Paper / Code | March 9, 2021 |
| 2 | Group DRO | GIN-virtual | 23.1 (0.6) | 22.4 (0.6) | WILDS | Paper / Code | March 9, 2021 |
| 3 | CORAL | GIN-virtual | 18.4 (0.2) | 17.9 (0.5) | WILDS | Paper / Code | March 9, 2021 |
| 4 | IRM | GIN-virtual | 15.8 (0.2) | 15.6 (0.3) | WILDS | Paper / Code | March 9, 2021 |

CivilComments

| Rank | Algorithm | Model | Val Avg Acc | Val Worst-Group Acc | Test Avg Acc | Test Worst-Group Acc | Contact | References | Date |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Group DRO (label×Black) | DistilBERT-base-uncased | 90.1 (0.4) | 67.7 (1.8) | 89.9 (0.5) | 70.0 (2.0) | WILDS | Paper / Code | March 9, 2021 |
| 2 | Reweighted (label) | DistilBERT-base-uncased | 90.1 (0.4) | 65.9 (1.8) | 89.8 (0.4) | 69.2 (0.9) | WILDS | Paper / Code | March 9, 2021 |
| 3 | Group DRO (label) | DistilBERT-base-uncased | 90.4 (0.4) | 65.0 (3.8) | 90.2 (0.3) | 69.1 (1.8) | WILDS | Paper / Code | March 9, 2021 |
| 4 | IRM (label×Black) | DistilBERT-base-uncased | 89.0 (0.7) | 65.9 (2.8) | 88.8 (0.7) | 66.3 (2.1) | WILDS | Paper / Code | March 9, 2021 |
| 5 | Reweighted (label×Black) | DistilBERT-base-uncased | 89.5 (0.6) | 66.6 (1.5) | 89.2 (0.6) | 66.2 (1.2) | WILDS | Paper / Code | March 9, 2021 |
| 6 | CORAL (label×Black) | DistilBERT-base-uncased | 88.9 (0.6) | 64.7 (1.4) | 88.7 (0.5) | 65.6 (1.3) | WILDS | Paper / Code | March 9, 2021 |
| 7 | ERM | DistilBERT-base-uncased | 92.3 (0.2) | 50.5 (1.9) | 92.2 (0.1) | 56.0 (3.6) | WILDS | Paper / Code | March 9, 2021 |

FMoW

| Rank | Algorithm | Model | Val Avg Acc | Test Avg Acc | Val Worst-Region Acc | Test Worst-Region Acc | Contact | References | Date |
|---|---|---|---|---|---|---|---|---|---|
| 1 | IRM | DenseNet121 | 56.0 (0.24) | 50.1 (0.29) | 50.8 (1.06) | 33.5 (1.35) | WILDS | Paper / Code | March 9, 2021 |
| 2 | ERM | DenseNet121 | 59.0 (0.69) | 52.5 (0.53) | 50.9 (0.19) | 32.8 (0.45) | WILDS | Paper / Code | March 9, 2021 |
| 3 | Group DRO | DenseNet121 | 57.9 (0.19) | 51.2 (0.37) | 49.3 (0.31) | 31.4 (2.1) | WILDS | Paper / Code | March 9, 2021 |
| 4 | CORAL | DenseNet121 | 55.7 (0.84) | 49.4 (0.48) | 48.0 (1.13) | 31.0 (0.35) | WILDS | Paper / Code | March 9, 2021 |

PovertyMap

| Rank | Algorithm | Model | Val Pearson r | Test Pearson r | Rural Val Pearson r | Rural Test Pearson r | Contact | References | Date |
|---|---|---|---|---|---|---|---|---|---|
| 1 | IRM | ResNet18-MS | 0.81 (0.03) | 0.78 (0.03) | 0.54 (0.06) | 0.48 (0.04) | WILDS | Paper / Code | March 9, 2021 |
| 2 | ERM | ResNet18-MS | 0.80 (0.03) | 0.79 (0.05) | 0.52 (0.05) | 0.46 (0.07) | WILDS | Paper / Code | March 9, 2021 |
| 3 | CORAL | ResNet18-MS | 0.80 (0.04) | 0.78 (0.05) | 0.51 (0.06) | 0.44 (0.07) | WILDS | Paper / Code | March 9, 2021 |
| 4 | Group DRO | ResNet18-MS | 0.78 (0.04) | 0.76 (0.05) | 0.48 (0.04) | 0.40 (0.08) | WILDS | Paper / Code | March 9, 2021 |

Amazon

| Rank | Algorithm | Model | Val Avg Acc | Test Avg Acc | Val 10% Acc | Test 10% Acc | Contact | References | Date |
|---|---|---|---|---|---|---|---|---|---|
| 1 | ERM | DistilBERT-base-uncased | 72.7 (0.1) | 71.1 (0.3) | 55.2 (0.7) | 53.8 (0.8) | WILDS | Paper / Code | March 9, 2021 |
| 2 | Group DRO | DistilBERT-base-uncased | 70.7 (0.6) | 70.0 (0.6) | 54.7 (0.0) | 53.3 (0.0) | WILDS | Paper / Code | March 9, 2021 |
| 3 | CORAL | DistilBERT-base-uncased | 72.0 (0.3) | 71.1 (0.3) | 54.7 (0.0) | 52.9 (0.8) | WILDS | Paper / Code | March 9, 2021 |
| 4 | IRM | DistilBERT-base-uncased | 71.5 (0.3) | 70.5 (0.3) | 54.2 (0.8) | 52.4 (0.8) | WILDS | Paper / Code | March 9, 2021 |
| 5 | Reweighted (label) | DistilBERT-base-uncased | 69.1 (0.5) | 68.6 (0.6) | 52.1 (0.2) | 52.0 (0.0) | WILDS | Paper / Code | March 9, 2021 |

Py150

| Rank | Algorithm | Model | Test ID Method/Class Acc | Test ID All Acc | Test OOD Method/Class Acc | Test OOD All Acc | Contact | References | Date |
|---|---|---|---|---|---|---|---|---|---|
| 1 | ERM | CodeGPT | 75.4 (0.4) | 74.5 (0.4) | 67.9 (0.1) | 69.6 (0.1) | WILDS | Paper / Code | March 9, 2021 |
| 2 | Group DRO | CodeGPT | 70.8 (0.0) | 71.0 (0.0) | 65.9 (0.1) | 67.9 (0.0) | WILDS | Paper / Code | March 9, 2021 |
| 3 | CORAL | CodeGPT | 70.6 (0.0) | 70.8 (0.1) | 65.9 (0.1) | 67.9 (0.0) | WILDS | Paper / Code | March 9, 2021 |
| 4 | IRM | CodeGPT | 67.3 (1.1) | 68.3 (0.7) | 64.3 (0.2) | 66.4 (0.1) | WILDS | Paper / Code | March 9, 2021 |