Overview

To submit, please read our submission guidelines.

Higher numbers are better for all metrics. In parentheses, we show corrected sample standard deviations across random replicates.
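
As a minimal sketch of how an entry of the form "mean (std)" could be produced from replicate scores, the snippet below uses the corrected sample standard deviation (dividing by n - 1). The function names and the replicate scores are hypothetical, and returning 0.0 for a single replicate is a choice made for this sketch, not something the leaderboard specifies.

```python
import statistics


def summarize_replicates(scores):
    """Return (mean, corrected sample std) of one metric across replicates."""
    mean = statistics.fmean(scores)
    # statistics.stdev divides by n - 1 (Bessel's correction).
    # It needs at least two values, so a single replicate falls back to 0.0
    # (an assumption of this sketch).
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, std


def format_entry(scores, digits=1):
    """Format replicate scores in the 'mean (std)' style used in the tables."""
    mean, std = summarize_replicates(scores)
    return f"{mean:.{digits}f} ({std:.{digits}f})"


# Hypothetical test accuracies from three random replicates.
replicate_scores = [70.0, 72.0, 74.0]
print(format_entry(replicate_scores))  # 72.0 (2.0)
```

With only a handful of replicates the n - 1 divisor matters: the uncorrected (population) formula would report about 1.6 instead of 2.0 for the scores above.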

A bold algorithm or model name indicates an official implementation submitted by an author of the original paper.

An asterisk next to a value indicates that the entry deviates from the official submission guidelines, for example because it uses a non-default model or additional pre-training data. The deviations are described in the notes in the dataset-specific leaderboards.

This overall leaderboard shows out-of-distribution test performance across all datasets. For each dataset, we highlight in green the best-performing algorithm that conforms to the official submission guidelines.

Without unlabeled data
Algorithm Amazon Camelyon17 CivilComments FMoW GlobalWheat iWildCam OGB-MolPCBA PovertyMap Py150 RxRx1 Contact References
10% Acc Avg Acc Worst-Group Acc Worst-Region Acc Avg domain acc Macro F1 Avg Precision Worst-U/R Pearson r Method/Class Acc Avg Acc
LISA 54.7 (0.0) 77.1 (6.9) 72.9 (1.0) 35.5 (0.81) - - - - - 31.9 (1.0) Yu Wang
Paper / Code
MBDG - 93.3 (1.0) - - - - - - - - Alex Robey
Paper / Code
IID representation learning - - - - - - - - - 39.2 (0.2) Jiqing Wu
Paper / Code
Fish 53.3 (0.0) 74.7 (7.1) 75.3 (0.6) 34.6 (0.18) - 22.0 (1.8) - 0.3 (0.01) - - Yuge Shi
Paper / Code
CORAL 52.9 (0.8) 59.5 (7.7) 65.6 (1.3) 31.7 (1.24) - 32.8 (0.1) 17.9 (0.5) 0.44 (0.06) 65.9 (0.1) 28.4 (0.3) WILDS
Paper / Code
CGD - 69.4 (7.9) 69.1 (1.9) 32.0 (2.26) - - - 0.43 (0.04) - - Vihari Piratla
Paper / Code
Group DRO 53.3 (0.0) 68.4 (7.3) 70.0 (2.0) 30.8 (0.81) 47.9 (2.0) 23.9 (2.1) 22.4 (0.6) 0.39 (0.06) 65.9 (0.1) 23.0 (0.3) WILDS
Paper / Code
ARM-BN - - - 24.4 (0.54) - 23.3 (2.8) - - - 31.2 (0.1) Marvin Zhang
Paper
Test-time BN adaptation - - - 30.0 (0.23) - 13.8 (0.6) - - - 20.1 (0.2) Marvin Zhang
Paper
ERM w/ data aug - 82.0 (7.4) * - 34.8 (1.48) - 32.2 (1.2) - 0.49 (0.06) - - WILDS
Paper / Code
ERM (CutMix) - - - - - - - - - 38.4 (0.2) Jiqing Wu
Paper / Code
ERM - - - 34.8 (1.9) - 32.0 (1.5) - - - - Kazuki Irie
Paper / Code
ERM (grid search) 53.8 (0.8) 70.3 (6.4) 56.0 (3.6) 32.3 (1.25) 51.2 (1.8) 31.0 (1.3) 27.2 (0.3) 0.45 (0.06) 67.9 (0.1) 29.9 (0.4) WILDS
Paper / Code
ERM (rand search) 54.2 (0.8) 70.8 (7.2) - 33.7 (1.49) 51.0 (0.7) 30.6 (1.1) 28.3 (0.1) 0.5 (0.07) - - WILDS
Paper / Code
IRM 52.4 (0.8) 64.2 (8.1) 66.3 (2.1) 30.0 (1.37) - 15.1 (4.9) 15.6 (0.3) 0.43 (0.07) 64.3 (0.2) 8.2 (1.1) WILDS
Paper / Code
With unlabeled data
Algorithm Amazon Camelyon17 CivilComments FMoW GlobalWheat iWildCam OGB-MolPCBA PovertyMap Contact References
10% Acc Avg Acc Worst-Group Acc Worst-Region Acc Avg domain acc Macro F1 Avg Precision Worst-U/R Pearson r
CORAL 53.3 (0.0) 77.9 (6.6) - 34.1 (0.62) - 27.9 (0.4) 26.6 (0.2) 0.36 (0.08) WILDS
Paper / Code
DANN 53.3 (0.0) 68.4 (9.2) - 34.6 (1.71) - 31.9 (1.4) 20.4 (0.8) 0.33 (0.1) WILDS
Paper / Code
AFN 54.2 (0.8) 83.2 (6.2) - 38.3 (1.01) - 30.8 (0.5) 14.9 (1.3) 0.39 (0.08) WILDS
Paper / Code
Pseudo-Label 52.3 (1.1) 67.7 (8.2) 66.9 (2.6) 33.7 (0.24) 42.9 (2.3) 30.3 (0.4) 19.7 (0.1) - WILDS
Paper / Code
FixMatch - 71.0 (4.9) - 32.1 (2.05) - 31.0 (1.3) - 0.3 (0.11) WILDS
Paper / Code
NoisyStudent - 86.7 (1.7) - 37.8 (0.62) 46.8 (1.2) 32.1 (0.7) 27.5 (0.1) 0.42 (0.11) WILDS
Paper / Code
SwAV - 91.4 (2.0) - 36.3 (1.01) - 29.0 (2.0) - 0.45 (0.05) WILDS
Paper / Code
Masked LM 53.9 (0.7) - 65.7 (2.3) - - - - - WILDS
Paper / Code

Below, we list individual leaderboards with more details on each submission.

iWildCam

Without unlabeled data
Rank Algorithm Model Test ID Macro F1 Test ID Avg Acc Test OOD Macro F1 ▼ Test OOD Avg Acc Contact References Date Notes
1 Model Soups (CLIP ViT-L) ViT-L 57.6 (1.9) 79.1 (0.4) 43.3 (1.0) 79.3 (0.3) Mitchell Wortsman
Paper / Code March 12, 2022 Model soups on top of a random hyperparameter search over LR, iterations, data augmentation, label smoothing.
2 ERM PNASNet-5-Large 52.8 (1.4) 77.3 (0.7) 38.5 (0.6) 78.3 (1.4) John Miller
Paper / Code July 20, 2021
3 CORAL ResNet50 43.5 (3.5) 73.7 (0.4) 32.8 (0.1) 73.3 (4.3) WILDS
Paper / Code July 15, 2021
4 ERM w/ data aug ResNet50 47.0 (1.4) 76.9 (0.6) 32.2 (1.2) 73.0 (0.4) WILDS
Paper / Code December 9, 2021 Implements RandAugment. Hyperparameters tuned via random search, consistent with the WILDS unlabeled data baselines in the table below.
5 ERM ResNet50 47.9 (2.6) 76.2 (0.1) 32.0 (1.5) 69.0 (0.4) Kazuki Irie
Paper / Code February 02, 2022 We used the default hyper-parameters from the official code base, but conducted cross validation every 1000 training steps.
6 ERM ResNet50 47.0 (1.4) 75.7 (0.3) 31.0 (1.3) 71.6 (2.5) WILDS
Paper / Code July 15, 2021 Hyperparameters tuned via grid search, consistent with other WILDS baselines that do not use unlabeled data.
7 ERM ResNet50 46.7 (0.6) 74.9 (1.2) 30.6 (1.1) 72.5 (3.2) WILDS
Paper / Code December 9, 2021 Hyperparameters tuned via random search, consistent with the WILDS unlabeled data baselines in the table below.
8 Group DRO ResNet50 37.5 (1.7) 71.6 (2.7) 23.9 (2.1) 72.7 (2.0) WILDS
Paper / Code July 15, 2021
9 ARM-BN ResNet50 27.5 (5.4) 62.0 (4.0) 23.3 (2.8) 70.2 (2.4) Marvin Zhang
Paper April 23, 2022 Requires test data to be batched by groups
10 Fish ResNet50 40.3 (0.6) 73.8 (0.1) 22.0 (1.8) 64.7 (2.6) Yuge Shi
Paper / Code July 15, 2021
11 IRM ResNet50 22.4 (7.7) 59.9 (8.1) 15.1 (4.9) 59.8 (3.7) WILDS
Paper / Code July 15, 2021
12 Test-time BN adaptation ResNet50 12.0 (0.3) 37.2 (0.7) 13.8 (0.6) 46.6 (0.9) Marvin Zhang
Paper April 19, 2022 Requires test data to be batched by groups
With unlabeled data

Unlabeled data is available from the extra domain.

Rank Algorithm Model Test ID Macro F1 Test ID Avg Acc Test OOD Macro F1 ▼ Test OOD Avg Acc Contact References Date Notes
1 NoisyStudent ResNet50 48.3 (1.6) 77.6 (1.4) 32.1 (0.7) 71.0 (3.1) WILDS
Paper / Code December 9, 2021
2 DANN ResNet50 48.5 (2.8) 77.2 (0.8) 31.9 (1.4) 70.7 (2.6) WILDS
Paper / Code December 9, 2021
3 FixMatch ResNet50 46.3 (0.5) 78.1 (0.5) 31.0 (1.3) 71.9 (3.1) WILDS
Paper / Code December 9, 2021
4 AFN ResNet50 46.8 (0.8) 77.2 (0.5) 30.8 (0.5) 74.4 (1.6) WILDS
Paper / Code December 9, 2021
5 Pseudo-Label ResNet50 47.3 (0.4) 77.5 (0.1) 30.3 (0.4) 68.7 (3.4) WILDS
Paper / Code December 9, 2021
6 SwAV ResNet50 47.3 (1.4) 74.8 (1.3) 29.0 (2.0) 63.4 (1.5) WILDS
Paper / Code December 9, 2021
7 CORAL ResNet50 40.5 (1.4) 77.0 (0.3) 27.9 (0.4) 69.7 (0.8) WILDS
Paper / Code December 9, 2021
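
The iWildCam tables above are sorted by Macro F1 rather than average accuracy. A minimal sketch of why the two can diverge is shown below: macro F1 averages per-class F1 scores with equal weight, so rare classes count as much as common ones. The function name and toy labels are hypothetical, and averaging over every class that appears in either the labels or the predictions is an assumption of this sketch; the benchmark's own evaluation code may define the class set differently.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    # Assumption: average over classes seen in labels or predictions.
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy example: a classifier that always predicts the common class 0.
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 0, 0]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                  # 0.75
print(macro_f1(y_true, y_pred))  # 3/7, about 0.43
```

In this toy case accuracy stays at 0.75 while macro F1 drops to about 0.43 because the rare class is never predicted, which is the kind of gap visible between the Avg Acc and Macro F1 columns.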

Camelyon17

Without unlabeled data
Rank Algorithm Model Val Acc Test Acc ▼ Contact References Date Notes
1 MBDG DenseNet121 88.1 (1.8) 93.3 (1.0) Alex Robey
Paper / Code March 17, 2022 lr: [1e-4*, 1e-3, 1e-2], wd: [0*, 1e-3, 1e-3], gamma (mbdg margin): [0.01, 0.1*, 0.5], eta_d (mbdg dual step size): [5e-3, 5e-2*, 5e-1]
2 ERM w/ H&E jitter SE-ResNeXt101-32x4d 88.0 (4.2) * 91.6 (1.9) * Rohan Taori
Paper / Code July 20, 2021 Implements specialized H&E staining color jitter described here. Does not use the default model.
3 ERM w/ data aug DenseNet121 90.6 (1.2) * 82.0 (7.4) * WILDS
Paper / Code December 9, 2021 Implements RandAugment. Uses color augmentation as part of RandAugment. Hyperparameters tuned via random search, consistent with the WILDS unlabeled data baselines in the table below.
4 LISA DenseNet121 81.8 (1.4) 77.1 (6.9) Yu Wang
Paper / Code January 18, 2022 We have used all the default hyperparameters, except for the mix_alpha, which is tuned in [0.5, 2], and 2 is the final choice.
5 Fish DenseNet121 83.9 (1.2) 74.7 (7.1) Yuge Shi
Paper / Code July 15, 2021
6 ERM DenseNet121 85.8 (1.9) 70.8 (7.2) WILDS
Paper / Code December 9, 2021 Hyperparameters tuned via random search, consistent with the WILDS unlabeled data baselines in the table below.
7 ERM DenseNet121 84.9 (3.1) 70.3 (6.4) WILDS
Paper / Code July 15, 2021 Hyperparameters tuned via grid search, consistent with other WILDS baselines that do not use unlabeled data.
8 CGD DenseNet121 86.8 (1.4) 69.4 (7.9) Vihari Piratla
Paper / Code April 16, 2022 No hyperparameter tuning. CG step size: 0.05. LR, optimizer, decay rate etc. all set to default.
9 Group DRO DenseNet121 85.5 (2.2) 68.4 (7.3) WILDS
Paper / Code July 15, 2021
10 IRM DenseNet121 86.2 (1.4) 64.2 (8.1) WILDS
Paper / Code July 15, 2021
11 CORAL DenseNet121 86.2 (1.4) 59.5 (7.7) WILDS
Paper / Code July 15, 2021
With unlabeled data

Unlabeled data is available from the source, validation and target domains.

Rank Algorithm Model Val Acc Test Acc ▼ Contact References Date Notes
1 SwAV DenseNet121 92.3 (0.4) 91.4 (2.0) WILDS
Paper / Code December 9, 2021 With unlabeled data from the target domain. Uses color augmentation as part of RandAugment.
2 NoisyStudent DenseNet121 93.2 (0.5) 86.7 (1.7) WILDS
Paper / Code December 9, 2021 With unlabeled data from the target domain. Uses color augmentation as part of RandAugment.
3 AFN DenseNet121 91.1 (0.9) 83.2 (6.2) WILDS
Paper / Code December 9, 2021 With unlabeled data from the target domain. Uses color augmentation as part of RandAugment.
4 CORAL DenseNet121 90.4 (0.9) 77.9 (6.6) WILDS
Paper / Code December 9, 2021 With unlabeled data from the target domain. Uses color augmentation as part of RandAugment.
5 FixMatch DenseNet121 91.3 (1.1) 71.0 (4.9) WILDS
Paper / Code December 9, 2021 With unlabeled data from the target domain. Uses color augmentation as part of RandAugment.
6 DANN DenseNet121 86.9 (2.2) 68.4 (9.2) WILDS
Paper / Code December 9, 2021 With unlabeled data from the target domain. Uses color augmentation as part of RandAugment.
7 Pseudo-Label DenseNet121 91.3 (1.3) 67.7 (8.2) WILDS
Paper / Code December 9, 2021 With unlabeled data from the target domain. Uses color augmentation as part of RandAugment.

RxRx1

Rank Algorithm Model Val Acc Test ID Acc Test Acc ▼ Contact References Date Notes
1 IID representation learning ResNet50 23.9 (0.3) 49.9 (0.5) 39.2 (0.2) Jiqing Wu
Paper / Code January 18, 2022 Uses CutMix regularizer.
2 ERM (CutMix) ResNet50 23.6 (0.3) 47.4 (1.0) 38.4 (0.2) Jiqing Wu
Paper / Code January 18, 2022 Uses CutMix regularizer.
3 LISA ResNet50 20.1 (0.4) 41.1 (1.3) 31.9 (1.0) Yu Wang
Paper / Code January 18, 2022 We have used all the default hyperparameters, except for the mix_alpha, which is tuned in [0.5, 2], and 2 is the final choice.
4 ARM-BN ResNet50 20.9 (0.2) 34.9 (0.2) 31.2 (0.1) Marvin Zhang
Paper April 23, 2022 Requires test data to be batched by groups
5 ERM ResNet50 19.4 (0.2) 35.9 (0.4) 29.9 (0.4) WILDS
Paper / Code July 15, 2021
6 CORAL ResNet50 18.5 (0.4) 34.0 (0.3) 28.4 (0.3) WILDS
Paper / Code July 15, 2021
7 Group DRO ResNet50 15.2 (0.1) 28.1 (0.3) 23.0 (0.3) WILDS
Paper / Code July 15, 2021
8 Test-time BN adaptation ResNet50 12.3 (0.2) 21.5 (0.2) 20.1 (0.2) Marvin Zhang
Paper April 19, 2022 Requires test data to be batched by groups
9 IRM ResNet50 5.6 (0.4) 9.9 (1.4) 8.2 (1.1) WILDS
Paper / Code July 15, 2021

OGB-MolPCBA

Without unlabeled data
Rank Algorithm Model Val Avg Precision Test Avg Precision ▼ Contact References Date Notes
1 ERM GIN-virtual 29.3 (0.3) 28.3 (0.1) WILDS
Paper / Code December 9, 2021 Hyperparameters tuned via random search, consistent with the WILDS unlabeled data baselines in the table below.
2 ERM GIN-virtual 27.8 (0.1) 27.2 (0.3) WILDS
Paper / Code July 15, 2021 Hyperparameters tuned via grid search, consistent with other WILDS baselines that do not use unlabeled data.
3 Group DRO GIN-virtual 23.1 (0.6) 22.4 (0.6) WILDS
Paper / Code July 15, 2021
4 CORAL GIN-virtual 18.4 (0.2) 17.9 (0.5) WILDS
Paper / Code July 15, 2021
5 IRM GIN-virtual 15.8 (0.2) 15.6 (0.3) WILDS
Paper / Code July 15, 2021
With unlabeled data

Unlabeled data is available from the source, validation and target domains.

Rank Algorithm Model Val Avg Precision Test Avg Precision ▼ Contact References Date Notes
1 NoisyStudent GIN-virtual 28.9 (0.1) 27.5 (0.1) WILDS
Paper / Code December 9, 2021 With unlabeled data from the target domain.
2 CORAL GIN-virtual 27.0 (0.4) 26.6 (0.2) WILDS
Paper / Code December 9, 2021 With unlabeled data from the target domain.
3 DANN GIN-virtual 20.7 (0.8) 20.4 (0.8) WILDS
Paper / Code December 9, 2021 With unlabeled data from the target domain.
4 Pseudo-Label GIN-virtual 21.9 (0.6) 19.7 (0.1) WILDS
Paper / Code December 9, 2021 With unlabeled data from the target domain.
5 AFN GIN-virtual 15.1 (1.3) 14.9 (1.3) WILDS
Paper / Code December 9, 2021 With unlabeled data from the target domain.

GlobalWheat

Without unlabeled data
Rank Algorithm Model Val ID Acc Val Acc Test ID Acc Test Acc ▼ Contact References Date Notes
1 ERM Faster-RCNN 77.4 (1.1) 68.6 (0.4) 77.1 (0.5) 51.2 (1.8) WILDS
Paper / Code July 15, 2021 Hyperparameters tuned via grid search, consistent with other WILDS baselines that do not use unlabeled data.
2 ERM Faster-RCNN 78.2 (0.4) 69.0 (0.3) 77.8 (0.2) 51.0 (0.7) WILDS
Paper / Code December 9, 2021 Hyperparameters tuned via random search, consistent with the WILDS unlabeled data baselines in the table below.
3 Group DRO Faster-RCNN 76.1 (1.0) 66.2 (0.4) 76.2 (0.8) 47.9 (2.0) WILDS
Paper / Code July 15, 2021
With unlabeled data

Unlabeled data is available from the source, validation, target and extra domains.

Rank Algorithm Model Val ID Acc Val Acc Test ID Acc Test Acc ▼ Contact References Date Notes
1 NoisyStudent Faster-RCNN 78.7 (0.1) 67.6 (0.5) 78.1 (0.3) 46.8 (1.2) WILDS
Paper / Code December 9, 2021 With unlabeled data from the target domain.
2 Pseudo-Label Faster-RCNN 73.0 (1.1) 65.0 (0.7) 73.3 (0.9) 42.9 (2.3) WILDS
Paper / Code December 9, 2021 With unlabeled data from the target domain.

CivilComments

Without unlabeled data
Rank Algorithm Model Val Avg Acc Val Worst-Group Acc Test Avg Acc Test Worst-Group Acc ▼ Contact References Date Notes
1 Fish DistilBERT-base-uncased 88.9 (0.6) 70.5 (1.1) 89.3 (0.3) 75.3 (0.6) Yuge Shi
Paper / Code December 14, 2021
2 LISA DistilBERT-base-uncased 90.3 (0.3) 71.2 (0.9) 90.1 (0.3) 72.9 (1.0) Yu Wang
Paper / Code January 18, 2022 We have used all the default hyperparameters, except for the mix_alpha, which is tuned in [0.5, 2], and 2 is the final choice.
3 Group DRO (label×Black) DistilBERT-base-uncased 90.1 (0.4) 67.7 (1.8) 89.9 (0.5) 70.0 (2.0) WILDS
Paper / Code July 15, 2021
4 Reweighted (label) DistilBERT-base-uncased 90.1 (0.4) 65.9 (1.8) 89.8 (0.4) 69.2 (0.9) WILDS
Paper / Code July 15, 2021
5 CGD DistilBERT-base-uncased 89.6 (0.5) 68.3 (1.6) 89.6 (0.4) 69.1 (1.9) Vihari Piratla
Paper / Code April 16, 2022 cg step size: [0.005, 0.01, 0.05, 0.1]; 0.05 performed the best. LR, optimizer, decay rate etc. all set to default
6 Group DRO (label) DistilBERT-base-uncased 90.4 (0.4) 65.0 (3.8) 90.2 (0.3) 69.1 (1.8) WILDS
Paper / Code July 15, 2021
7 Reweighted (label) DistilBERT-base-uncased 90.0 (0.7) 63.7 (2.7) 89.8 (0.8) 66.6 (1.6) WILDS
Paper / Code December 9, 2021 Unlabeled data is available from the same distribution as the labeled data.
8 IRM (label×Black) DistilBERT-base-uncased 89.0 (0.7) 65.9 (2.8) 88.8 (0.7) 66.3 (2.1) WILDS
Paper / Code July 15, 2021
9 Reweighted (label×Black) DistilBERT-base-uncased 89.5 (0.6) 66.6 (1.5) 89.2 (0.6) 66.2 (1.2) WILDS
Paper / Code July 15, 2021
10 CORAL (label×Black) DistilBERT-base-uncased 88.9 (0.6) 64.7 (1.4) 88.7 (0.5) 65.6 (1.3) WILDS
Paper / Code July 15, 2021
11 ERM DistilBERT-base-uncased 92.3 (0.2) 50.5 (1.9) 92.2 (0.1) 56.0 (3.6) WILDS
Paper / Code July 15, 2021 Hyperparameters tuned via grid search, consistent with other WILDS baselines that do not use unlabeled data.
With unlabeled data

Unlabeled data is available from the extra domain.

Rank Algorithm Model Val Avg Acc Val Worst-Group Acc Test Avg Acc Test Worst-Group Acc ▼ Contact References Date Notes
1 Pseudo-Label DistilBERT-base-uncased 90.5 (0.6) 63.9 (1.7) 90.3 (0.5) 66.9 (2.6) WILDS
Paper / Code December 9, 2021
2 Masked LM DistilBERT-base-uncased 89.7 (1.1) 64.5 (2.5) 89.4 (1.2) 65.7 (2.3) WILDS
Paper / Code December 9, 2021
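
The CivilComments tables above rank entries by worst-group accuracy, which is why ERM can have the highest average accuracy and still place last. A minimal sketch of the metric follows. The function names and toy data are hypothetical, and the sketch assumes each example belongs to exactly one group; in the actual benchmark the groups are defined by the dataset and may overlap.

```python
from collections import defaultdict


def group_accuracies(y_true, y_pred, groups):
    """Accuracy computed separately for each group label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def worst_group_accuracy(y_true, y_pred, groups):
    """Minimum of the per-group accuracies."""
    return min(group_accuracies(y_true, y_pred, groups).values())


# Toy example: group "a" is classified perfectly, group "b" only half the time.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b"]
print(group_accuracies(y_true, y_pred, groups))      # {'a': 1.0, 'b': 0.5}
print(worst_group_accuracy(y_true, y_pred, groups))  # 0.5
```

Here the average accuracy is 5/6 while the worst-group accuracy is 0.5, mirroring the gap between the Avg Acc and Worst-Group Acc columns.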

FMoW

Without unlabeled data
Rank Algorithm Model Val Avg Acc Test Avg Acc Val Worst-region Acc Test Worst-region Acc ▼ Contact References Date Notes
1 Model Soups (CLIP ViT-L) ViT-L 75.7 (0.07) 69.5 (0.08) 59.8 (0.43) 47.6 (0.33) Mitchell Wortsman
Paper / Code March 12, 2022 Model soups on top of a random hyperparameter search over LR, iterations, data augmentation, label smoothing.
2 LISA DenseNet121 58.7 (1.12) 52.8 (1.15) 48.7 (0.92) 35.5 (0.81) Yu Wang
Paper / Code January 18, 2022 We have used all the default hyperparameters, except for the mix_alpha, which is tuned in [0.5, 2], and 2 is the final choice.
3 ERM SE-ResNeXt101-32x4d 62.1 (0.24) * 55.5 (0.14) * 51.3 (2.93) * 35.0 (0.78) * John Miller
Paper / Code July 15, 2021 Does not use the default model.
4 ERM DenseNet121 62.0 (0.06) 55.6 (0.23) 52.5 (1.25) 34.8 (1.9) Kazuki Irie
Paper / Code February 02, 2022 batch_size: [20*, 32, 64] and for the case batch_size=20, lr: [1e-4, 3e-4*]. We conducted cross validation every 200 training steps.
5 ERM w/ data aug DenseNet121 62.2 (0.1) 55.4 (0.52) 53.2 (0.61) 34.8 (1.48) WILDS
Paper / Code December 9, 2021 Implements RandAugment. Hyperparameters tuned via random search, consistent with the WILDS unlabeled data baselines in the table below.
6 Fish DenseNet121 57.8 (0.15) 51.8 (0.32) 49.5 (2.34) 34.6 (0.18) Yuge Shi
Paper / Code July 15, 2021
7 ERM DenseNet121 60.7 (0.54) 54.0 (0.39) 52.6 (0.25) 33.7 (1.49) WILDS
Paper / Code December 9, 2021 Hyperparameters tuned via random search, consistent with the WILDS unlabeled data baselines in the table below.
8 ERM DenseNet121 59.5 (0.37) 53.0 (0.55) 48.9 (0.62) 32.3 (1.25) WILDS
Paper / Code July 15, 2021 Hyperparameters tuned via grid search, consistent with other WILDS baselines that do not use unlabeled data.
9 CGD DenseNet121 57.0 (1.03) 50.6 (1.39) 49.8 (1.04) 32.0 (2.26) Vihari Piratla
Paper / Code April 16, 2022 cg step size: [0.05, 0.01, 0.2]; 0.2 performed the best. LR, optimizer, decay rate etc. all set to default
10 CORAL DenseNet121 56.9 (0.25) 50.5 (0.36) 47.1 (0.43) 31.7 (1.24) WILDS
Paper / Code July 15, 2021
11 Group DRO DenseNet121 58.8 (0.19) 52.1 (0.5) 46.5 (0.25) 30.8 (0.81) WILDS
Paper / Code July 15, 2021
12 Test-time BN adaptation DenseNet121 57.9 (0.36) 51.5 (0.25) 47.8 (0.52) 30.0 (0.23) Marvin Zhang
Paper April 19, 2022 Requires test data to be batched by groups
13 IRM DenseNet121 57.4 (0.37) 50.8 (0.13) 47.5 (1.57) 30.0 (1.37) WILDS
Paper / Code July 15, 2021
14 ARM-BN DenseNet121 48.0 (0.65) 42.1 (0.26) 38.9 (2.17) 24.4 (0.54) Marvin Zhang
Paper April 23, 2022 Requires test data to be batched by groups
15 ERM CLIP (ResNet50) 41.6 (0) * 36.8 (0) * 31.5 (0) * 24.3 (0) * Rohan Taori
Paper / Code July 20, 2021 Linear probe finetuning on CLIP. Does not use the default model.
16 ERM CLIP (ViT-B/32) 41.6 (0) * 36.3 (0) * 33.0 (0) * 22.9 (0) * Rohan Taori
Paper / Code July 20, 2021 Linear probe finetuning on CLIP. Does not use the default model.
With unlabeled data

Unlabeled data is available from the source, validation and target domains.

Rank Algorithm Model Val Avg Acc Test Avg Acc Val Worst-region Acc Test Worst-region Acc ▼ Contact References Date Notes
1 AFN DenseNet121 62.0 (0.5) 55.8 (0.61) 53.4 (0.78) 38.3 (1.01) WILDS
Paper / Code December 9, 2021 With unlabeled data from the target domain.
2 NoisyStudent DenseNet121 64.0 (0.37) 58.4 (0.4) 55.4 (0.47) 37.8 (0.62) WILDS
Paper / Code December 9, 2021 With unlabeled data from the target domain.
3 SwAV DenseNet121 63.1 (0.38) 56.3 (0.67) 51.6 (0.57) 36.3 (1.01) WILDS
Paper / Code December 9, 2021 With unlabeled data from the target domain.
4 DANN DenseNet121 59.5 (0.45) 53.0 (0.58) 50.8 (2.18) 34.6 (1.71) WILDS
Paper / Code December 9, 2021 With unlabeled data from the target domain.
5 CORAL DenseNet121 60.5 (0.81) 53.7 (0.47) 51.7 (1.23) 34.1 (0.62) WILDS
Paper / Code December 9, 2021 With unlabeled data from the target domain.
6 Pseudo-Label DenseNet121 62.5 (0.08) 55.6 (0.2) 51.5 (0.52) 33.7 (0.24) WILDS
Paper / Code December 9, 2021 With unlabeled data from the target domain.
7 FixMatch DenseNet121 60.6 (1.94) 54.0 (2.03) 50.8 (1.11) 32.1 (2.05) WILDS
Paper / Code December 9, 2021 With unlabeled data from the target domain.

PovertyMap

Without unlabeled data
Rank Algorithm Model Val Pearson r Test Pearson r Val Worst-U/R Pearson r Test Worst-U/R Pearson r ▼ Contact References Date Notes
1 ERM ResNet18-MS 0.81 (0.03) 0.8 (0.04) 0.53 (0.06) 0.5 (0.07) WILDS
Paper / Code December 9, 2021 Hyperparameters tuned via random search, consistent with the WILDS unlabeled data baselines in the table below.
2 ERM w/ data aug ResNet18-MS 0.81 (0.03) 0.79 (0.04) 0.54 (0.06) 0.49 (0.06) WILDS
Paper / Code December 9, 2021 Implements composition of random horizontal flip, random affine transformation, color jitter on the RGB channels, and Cutout on all channels. Hyperparameters tuned via random search, consistent with the WILDS unlabeled data baselines in the table below.
3 ERM ResNet18-MS 0.8 (0.04) 0.78 (0.04) 0.51 (0.06) 0.45 (0.06) WILDS
Paper / Code July 15, 2021 Hyperparameters tuned via grid search, consistent with other WILDS baselines that do not use unlabeled data.
4 CORAL ResNet18-MS 0.8 (0.04) 0.78 (0.05) 0.52 (0.06) 0.44 (0.06) WILDS
Paper / Code July 15, 2021
5 CGD ResNet18-MS 0.81 (0.03) 0.77 (0.04) 0.51 (0.05) 0.43 (0.04) Vihari Piratla
Paper / Code April 16, 2022 No hparam search; cg step size: 0.05, LR, optimizer, decay rate etc. all set to default
6 IRM ResNet18-MS 0.81 (0.03) 0.77 (0.05) 0.53 (0.05) 0.43 (0.07) WILDS
Paper / Code July 15, 2021
7 Group DRO ResNet18-MS 0.78 (0.05) 0.75 (0.07) 0.46 (0.04) 0.39 (0.06) WILDS
Paper / Code July 15, 2021
8 Fish ResNet18-MS 0.82 (0.0) 0.8 (0.02) 0.47 (0.01) 0.3 (0.01) Yuge Shi
Paper / Code July 15, 2021
With unlabeled data

Unlabeled data is available from the source, validation and target domains.

Rank Algorithm Model Val Pearson r Test Pearson r Val Worst-U/R Pearson r Test Worst-U/R Pearson r ▼ Contact References Date Notes
1 SwAV ResNet18-MS 0.81 (0.05) 0.78 (0.06) 0.54 (0.07) 0.45 (0.05) WILDS
Paper / Code December 9, 2021 With unlabeled data from the target domain.
2 NoisyStudent ResNet18-MS 0.8 (0.05) 0.76 (0.08) 0.52 (0.08) 0.42 (0.11) WILDS
Paper / Code December 9, 2021 With unlabeled data from the target domain.
3 AFN ResNet18-MS 0.76 (0.05) 0.75 (0.08) 0.44 (0.07) 0.39 (0.08) WILDS
Paper / Code December 9, 2021 With unlabeled data from the target domain.
4 CORAL ResNet18-MS 0.79 (0.04) 0.74 (0.05) 0.5 (0.09) 0.36 (0.08) WILDS
Paper / Code December 9, 2021 With unlabeled data from the target domain.
5 DANN ResNet18-MS 0.77 (0.04) 0.69 (0.04) 0.44 (0.11) 0.33 (0.1) WILDS
Paper / Code December 9, 2021 With unlabeled data from the target domain.
6 FixMatch ResNet18-MS 0.76 (0.07) 0.64 (0.11) 0.48 (0.05) 0.3 (0.11) WILDS
Paper / Code December 9, 2021 With unlabeled data from the target domain.
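
The PovertyMap tables above are sorted by "Worst-U/R Pearson r". A minimal sketch of that kind of metric is given below, reading U/R as the urban and rural subpopulations: Pearson's r is computed on each subset separately and the smaller value is reported. The function names, the boolean `is_urban` flag, and the toy values are hypothetical, and the sketch assumes both subsets are non-empty and non-constant.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def worst_urban_rural_r(y_true, y_pred, is_urban):
    """Minimum of Pearson r over the urban and the rural subset."""
    urban = [(t, p) for t, p, u in zip(y_true, y_pred, is_urban) if u]
    rural = [(t, p) for t, p, u in zip(y_true, y_pred, is_urban) if not u]
    r_urban = pearson_r([t for t, _ in urban], [p for _, p in urban])
    r_rural = pearson_r([t for t, _ in rural], [p for _, p in rural])
    return min(r_urban, r_rural)


# Toy example: predictions are perfect on urban points, noisier on rural ones.
y_true = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 3.0, 1.0, 3.0, 2.0]
is_urban = [True, True, True, False, False, False]
print(worst_urban_rural_r(y_true, y_pred, is_urban))  # 0.5
```

The overall correlation can stay high even when one subset is predicted poorly, which is why the worst-subset column is consistently lower than the overall Pearson r column.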

Amazon

Without unlabeled data
Rank Algorithm Model Val Avg Acc Test Avg Acc Val 10% Acc Test 10% Acc ▼ Contact References Date Notes
1 LISA DistilBERT-base-uncased 71.4 (0.4) 70.7 (0.3) 54.8 (0.2) 54.7 (0.0) Yu Wang
Paper / Code January 18, 2022 We have used all the default hyperparameters, except for the mix_alpha, which is tuned in [0.5, 2], and 2 is the final choice.
2 ERM DistilBERT-base-uncased 72.8 (0.1) 72.0 (0.1) 56.0 (0.0) 54.2 (0.8) WILDS
Paper / Code December 9, 2021 Hyperparameters tuned via random search, consistent with the WILDS unlabeled data baselines in the table below.
3 ERM DistilBERT-base-uncased 72.7 (0.1) 71.9 (0.1) 55.2 (0.7) 53.8 (0.8) WILDS
Paper / Code July 15, 2021 Hyperparameters tuned via grid search, consistent with other WILDS baselines that do not use unlabeled data.
4 Fish DistilBERT-base-uncased 72.5 (0.0) 71.7 (0.1) 54.2 (0.8) 53.3 (0.0) Yuge Shi
Paper / Code December 14, 2021
5 Group DRO DistilBERT-base-uncased 70.7 (0.6) 70.0 (0.6) 54.7 (0.0) 53.3 (0.0) WILDS
Paper / Code July 15, 2021
6 CORAL DistilBERT-base-uncased 72.0 (0.3) 71.1 (0.3) 54.7 (0.0) 52.9 (0.8) WILDS
Paper / Code July 15, 2021
7 IRM DistilBERT-base-uncased 71.5 (0.3) 70.5 (0.3) 54.2 (0.8) 52.4 (0.8) WILDS
Paper / Code July 15, 2021
8 Reweighted (label) DistilBERT-base-uncased 69.1 (0.5) 68.6 (0.6) 52.1 (0.2) 52.0 (0.0) WILDS
Paper / Code July 15, 2021
With unlabeled data

Unlabeled data is available from the validation, target and extra domains.

Rank Algorithm Model Val Avg Acc Test Avg Acc Val 10% Acc Test 10% Acc ▼ Contact References Date Notes
1 AFN DistilBERT-base-uncased 73.0 (0.4) 72.1 (0.3) 56.0 (0.0) 54.2 (0.8) WILDS
Paper / Code December 9, 2021 With unlabeled data from the target domain.
2 Masked LM DistilBERT-base-uncased 72.7 (0.4) 71.9 (0.4) 55.1 (0.8) 53.9 (0.7) WILDS
Paper / Code December 9, 2021 With unlabeled data from the target domain.
3 DANN DistilBERT-base-uncased 72.6 (0.1) 71.7 (0.1) 54.7 (0.0) 53.3 (0.0) WILDS
Paper / Code December 9, 2021 With unlabeled data from the target domain.
4 CORAL DistilBERT-base-uncased 72.5 (0.1) 71.7 (0.1) 54.2 (0.8) 53.3 (0.0) WILDS
Paper / Code December 9, 2021 With unlabeled data from the target domain.
5 Pseudo-Label DistilBERT-base-uncased 72.5 (0.1) 71.6 (0.1) 54.2 (0.8) 52.3 (1.1) WILDS
Paper / Code December 9, 2021 With unlabeled data from the target domain.
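
The Amazon tables above are sorted by "10% Acc". A minimal sketch of one way to compute such a metric is shown below, reading it as the 10th percentile of per-reviewer accuracy: accuracy is computed for each reviewer separately, and the value at the 10th percentile of that distribution is reported. The function names and toy data are hypothetical, and the linear-interpolation percentile rule is an assumption of this sketch rather than the benchmark's documented choice.

```python
import math
from collections import defaultdict


def per_user_accuracy(y_true, y_pred, users):
    """Accuracy computed separately for each user (reviewer)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, u in zip(y_true, y_pred, users):
        total[u] += 1
        correct[u] += int(t == p)
    return {u: correct[u] / total[u] for u in total}


def percentile(values, q):
    """Percentile with linear interpolation; q is a fraction in [0, 1]."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def tenth_percentile_accuracy(y_true, y_pred, users):
    """10th percentile of the per-user accuracy distribution."""
    accs = per_user_accuracy(y_true, y_pred, users).values()
    return percentile(accs, 0.10)


# Toy example: two reviewers are predicted perfectly, one only half the time.
y_true = [5, 4, 3, 1, 2, 2]
y_pred = [5, 4, 3, 2, 2, 2]
users = ["u1", "u1", "u2", "u2", "u3", "u3"]
print(per_user_accuracy(y_true, y_pred, users))  # {'u1': 1.0, 'u2': 0.5, 'u3': 1.0}
print(tenth_percentile_accuracy(y_true, y_pred, users))  # about 0.6
```

A tail statistic like this rewards models that do reasonably well for nearly every reviewer, so it can rank entries differently from the Avg Acc columns.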

Py150

Rank Algorithm Model Test ID Method/Class Acc Test ID All Acc Test OOD Method/Class Acc ▼ Test OOD All Acc Contact References Date Notes
1 ERM CodeGPT 75.4 (0.4) 74.5 (0.4) 67.9 (0.1) 69.6 (0.1) WILDS
Paper / Code July 15, 2021
2 Group DRO CodeGPT 70.8 (0.0) 71.0 (0.0) 65.9 (0.1) 67.9 (0.0) WILDS
Paper / Code July 15, 2021
3 CORAL CodeGPT 70.6 (0.0) 70.8 (0.1) 65.9 (0.1) 67.9 (0.0) WILDS
Paper / Code July 15, 2021
4 IRM CodeGPT 67.3 (1.1) 68.3 (0.7) 64.3 (0.2) 66.4 (0.1) WILDS
Paper / Code July 15, 2021