Overview
To submit, please read our submission guidelines.
Higher numbers are better for all metrics. In parentheses, we show corrected sample standard deviations across random replicates.
A bold algorithm or model name indicates an official implementation submitted by an author of the original paper.
An asterisk next to a value indicates that the entry deviates from the official submission guidelines, for example because it uses a non-default model or additional pre-training data. The deviations are described in the notes in the dataset-specific leaderboards.
This overall leaderboard shows out-of-distribution test performance across all datasets.
For each dataset, we highlight in green the best-performing algorithm that conforms to official submission guidelines.
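To make the reporting convention concrete, here is a minimal sketch (with made-up replicate scores, not taken from any submission) of how a cell value such as "70.3 (6.8)" is assembled: the mean across random replicates, followed by the corrected (ddof = 1) sample standard deviation in parentheses.

```python
import numpy as np

# Hypothetical per-replicate OOD test scores for one (algorithm, dataset) cell,
# e.g. three random seeds. The leaderboard reports the mean and the corrected
# (Bessel-corrected, ddof=1) sample standard deviation across such replicates.
replicate_scores = np.array([70.3, 77.1, 63.5])

mean = replicate_scores.mean()
std = replicate_scores.std(ddof=1)  # corrected sample standard deviation

print(f"{mean:.1f} ({std:.1f})")  # formatted like a leaderboard cell, here "70.3 (6.8)"
```

The scores above are illustrative only; the actual leaderboard numbers come from the submitted runs.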
Without unlabeled data

| Algorithm | Amazon (10% Acc) | Camelyon17 (Avg Acc) | CivilComments (Worst-Group Acc) | FMoW (Worst-Region Acc) | GlobalWheat (Avg Domain Acc) | iWildCam (Macro F1) | OGB-MolPCBA (Avg Precision) | PovertyMap (Worst-U/R Pearson r) | Py150 (Method/Class Acc) | RxRx1 (Avg Acc) | Contact | References |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LISA | 54.7 (0.0) | 77.1 (6.9) | 72.9 (1.0) | 35.5 (0.81) | - | - | - | - | - | 31.9 (1.0) | Yu Wang | Paper / Code |
| Fish | 53.3 (0.0) | 74.7 (7.1) | 75.3 (0.6) | 34.6 (0.18) | - | 22.0 (1.8) | - | - | - | - | Yuge Shi | Paper / Code |
| ContriMix | - | 94.6 (1.2) | - | - | - | - | - | - | - | - | Dinkar Juyal | Paper / Code |
| ERM w/ targeted aug | - | 92.1 (3.1) | - | - | - | 36.5 (0.9)* | - | - | - | - | Irena Gao | Paper / Code |
| IRMX (PAIR opt) | - | 74.0 (7.2) | 74.2 (1.4) | 35.4 (1.3) | - | 27.9 (0.9) | - | 0.47 (0.09) | - | 28.8 (0.0) | Yongqiang Chen | Paper / Code |
| IRMX | - | 65.5 (8.3) | 73.4 (1.4) | 33.7 (0.95) | - | 26.7 (1.1) | - | 0.45 (0.05) | - | 28.7 (0.2) | Yongqiang Chen | Paper / Code |
| DFR | - | - | 72.5 (0.9) | 42.8 (0.42)* | - | - | - | - | - | - | Pavel Izmailov | Paper / Code |
| CORAL | 52.9 (0.8) | 59.5 (7.7) | 65.6 (1.3) | 32.8 (0.66) | - | 32.7 (0.2) | 17.9 (0.5) | 0.44 (0.07) | 65.9 (0.1) | 28.4 (0.3) | WILDS | Paper / Code |
| CGD | - | 69.4 (7.9) | 69.1 (1.9) | 32.0 (2.26) | - | - | - | 0.43 (0.04) | - | - | Vihari Piratla | Paper / Code |
| Group DRO | 53.3 (0.0) | 68.4 (7.3) | 70.0 (2.0) | 31.1 (1.66) | 47.9 (2.0) | 23.8 (2.0) | 22.4 (0.6) | 0.39 (0.06) | 66.0 (0.1) | 22.5 (0.3) | WILDS | Paper / Code |
| C-Mixup | - | - | - | - | - | - | - | 0.53 (0.07) | - | - | Yiping Wang | Paper / Code |
| IID representation learning | - | - | - | - | - | - | - | - | - | 39.2 (0.2) | Jiqing Wu | Paper / Code |
| ABSGD | - | - | - | - | - | 33.0 (0.6) | - | - | - | - | Qi Qi | Paper / Code |
| ERM w/ data aug | - | 82.0 (7.4) | - | 35.7 (0.26) | - | 32.2 (1.2) | - | 0.49 (0.06) | - | - | WILDS | Paper / Code |
| ERM (CutMix) | - | - | - | - | - | - | - | - | - | 38.4 (0.2) | Jiqing Wu | Paper / Code |
| ERM (more checkpoints) | - | - | - | 34.8 (1.9) | - | 32.0 (1.5) | - | - | - | - | Kazuki Irie | Paper / Code |
| ERM (grid search) | 53.8 (0.8) | 70.3 (6.4) | 56.0 (3.6) | 31.3 (0.17) | 51.2 (1.8) | 30.8 (1.3) | 27.2 (0.3) | 0.45 (0.06) | 67.9 (0.1) | 29.9 (0.4) | WILDS | Paper / Code |
| ERM (rand search) | 54.2 (0.8) | 70.8 (7.2) | - | 34.1 (1.42) | 50.5 (1.7) | 30.6 (1.1) | 28.3 (0.1) | 0.5 (0.07) | - | - | WILDS | Paper / Code |
| IRM | 52.4 (0.8) | 64.2 (8.1) | 66.3 (2.1) | 32.8 (2.09) | - | 15.1 (4.9) | 15.6 (0.3) | 0.43 (0.07) | 64.3 (0.2) | 8.2 (1.1) | WILDS | Paper / Code |
| ARM-BN | - | - | - | 24.4 (0.54) | - | 23.3 (2.8) | - | - | - | 31.2 (0.1) | Marvin Zhang | Paper / Code |
| Test-time BN adaptation | - | - | - | 30.0 (0.23) | - | 13.8 (0.6) | - | - | - | 20.1 (0.2) | Marvin Zhang | Paper / Code |
| AutoFT | - | - | - | 51.8 (0.41)* | - | 52.0 (0.4)* | - | - | - | - | Caroline Choi | Paper / Code |
| SGD (Freeze-Embed) | - | 96.5 (0.4)* | - | 50.3 (1.1)* | - | - | - | - | - | - | Ananya Kumar | Paper / Code |
| FLYP | - | - | - | - | - | 46.0 (1.3)* | - | - | - | - | Sankalp Garg | Paper / Code |
| MBDG | - | 93.3 (1.0)* | - | - | - | - | - | - | - | - | Alex Robey | Paper / Code |
| MixUp | - | 63.5 (0.9)* | - | - | - | 13.8 (0.8) | - | - | - | - | Olivia Wiles | Paper / Code |
| JTT | - | 63.8 (1.4)* | - | - | - | 11.0 (2.5) | - | - | - | - | Olivia Wiles | Paper / Code |
With unlabeled data

| Algorithm | Amazon (10% Acc) | Camelyon17 (Avg Acc) | CivilComments (Worst-Group Acc) | FMoW (Worst-Region Acc) | GlobalWheat (Avg Domain Acc) | iWildCam (Macro F1) | OGB-MolPCBA (Avg Precision) | PovertyMap (Worst-U/R Pearson r) | Contact | References |
|---|---|---|---|---|---|---|---|---|---|---|
| ICON | 54.7 (0.0) | 93.8 (0.3) | 68.8 (1.3) | 39.9 (1.12) | 52.3 (0.2) | 34.5 (1.4) | 28.3 (0.0) | 0.49 (0.04) | Nick Y. | Paper / Code |
| Connect Later | - | 95.0 (0.9) | - | - | - | 36.8 (1.6) | - | - | Helen Qu | Paper / Code |
| ERM+SpAR | - | - | - | - | - | - | - | 0.51 (0.1) | Ben Eyre | Paper / Code |
| C-Mixup+SpAR | - | - | - | - | - | - | - | 0.52 (0.09) | Ben Eyre | Paper / Code |
| Noisy Student | - | 86.7 (1.7) | - | 37.8 (0.62) | 49.3 (3.7) | 32.1 (0.7) | 27.5 (0.1) | 0.42 (0.11) | WILDS | Paper / Code |
| SwAV | - | 91.4 (2.0) | - | 36.3 (1.01) | - | 29.0 (2.0) | - | 0.45 (0.05) | WILDS | Paper / Code |
| AFN | 54.2 (0.8) | 83.2 (6.2) | - | 38.3 (0.95) | - | 30.8 (0.5) | 14.9 (1.3) | 0.39 (0.08) | WILDS | Paper / Code |
| DANN | 53.3 (0.0) | 68.4 (9.2) | - | 34.6 (1.71) | - | 31.9 (1.4) | 20.4 (0.8) | 0.33 (0.1) | WILDS | Paper / Code |
| Pseudo-Label | 52.3 (1.1) | 67.7 (8.2) | 66.9 (2.6) | 33.7 (0.24) | 42.9 (2.3) | 30.3 (0.4) | 19.7 (0.1) | - | WILDS | Paper / Code |
| FixMatch | - | 71.0 (4.9) | - | 32.6 (2.05) | - | 31.0 (1.3) | - | 0.3 (0.11) | WILDS | Paper / Code |
| CORAL | 53.3 (0.0) | 77.9 (6.6) | - | 33.7 (0.23) | - | 27.9 (0.4) | 26.6 (0.2) | 0.36 (0.08) | WILDS | Paper / Code |
| Masked LM | 53.5 (0.2) | - | 65.7 (2.3) | - | - | - | - | - | WILDS | Paper / Code |
Below, we list individual leaderboards with more details on each submission.
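Several of the metrics reported above and in the per-dataset tables (Worst-Group Acc on CivilComments, Worst-Region Acc on FMoW, Worst-U/R Pearson r on PovertyMap) are worst-case aggregates: the metric is computed separately for each group (region, or urban/rural split) and the minimum over groups is reported. As a point of reference, a minimal sketch of that aggregation on made-up data (not taken from any submission):

```python
import numpy as np

# Illustrative per-example predictions, labels, and group ids (toy data only).
preds  = np.array([1, 0, 1, 1, 0, 1, 0, 0])
labels = np.array([1, 0, 0, 1, 0, 1, 1, 0])
groups = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Accuracy within each group, then the worst (minimum) over groups.
group_accs = {
    int(g): float((preds[groups == g] == labels[groups == g]).mean())
    for g in np.unique(groups)
}
worst_group_acc = min(group_accs.values())

print(group_accs, worst_group_acc)  # {0: 0.75, 1: 0.75} 0.75
```

The official numbers on this page come from the WILDS evaluation code; the sketch above only illustrates the worst-case aggregation itself.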
iWildCam
Without unlabeled data
| Rank | Algorithm | Model | Test ID Macro F1 | Test ID Avg Acc | Test OOD Macro F1 ▼ | Test OOD Avg Acc | Contact | References | Date | Notes |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | AutoFT | CLIP ViT-L/14 @ 336px | 63.5 (0.5)* | 82.9 (0.1)* | 52.0 (0.4)* | 83.1 (0.0)* | Caroline Choi | Paper / Code | January 28, 2024 | lr: log-scale [1e-7, 1e-3], wd: log-scale [0, 1.0], seed: [0, 100], loss weights in log-scale [1e-4, 10.0] |
| 2 | FLYP | CLIP ViT-L/14@336px | 59.9 (0.7)* | 76.2 (0.3)* | 46.0 (1.3)* | 76.2 (0.4)* | Sankalp Garg | Paper / Code | December 05, 2022 | lr: [0.0001, 0.00001*, 0.000001], wd: [0.1, 0.1, 0.2*]. The hyperparameter search was done on a smaller model, CLIP ViT-B/16; the best hyperparameters were also used for ViT-L/14@336px. |
| 3 | Model Soups (CLIP ViT-L) | ViT-L | 57.6 (1.9)* | 79.1 (0.4)* | 43.3 (1.0)* | 79.3 (0.3)* | Mitchell Wortsman | Paper / Code | March 12, 2022 | Model soups on top of a random hyperparameter search over LR, iterations, data augmentation, label smoothing. |
| 4 | ERM (CLIP ViT-L) | ViT-L | 55.8 (1.9)* | 77.0 (0.7)* | 41.4 (0.5)* | 78.3 (1.1)* | Mitchell Wortsman | Paper / Code | July 28, 2022 | Random hyperparameter search over LR, iterations, data augmentation, label smoothing. |
| 5 | ERM | PNASNet-5-Large | 52.8 (1.4)* | 77.3 (0.7)* | 38.5 (0.6)* | 78.3 (1.4)* | John Miller | Paper / Code | July 20, 2021 | Does not use the default model. |
| 6 | ERM w/ targeted aug | ResNet50 | 50.2 (1.6)* | 77.8 (1.0)* | 36.5 (0.9)* | 74.4 (2.0)* | Irena Gao | Paper / Code | November 30, 2022 | Uses copy-paste data augmentations, which rely on MegaDetector. |
| 7 | ABSGD | ResNet50 | 47.5 (1.6) | 74.8 (0.5) | 33.0 (0.6) | 72.7 (1.8) | Qi Qi | Paper / Code | October 13, 2022 | The learning rate is tuned in {3e-05, 4e-05*}. The ABSGD hyperparameters are tuned over {1.1, 1.5, 2} and gamma is 0.9. Trained for 36 epochs with a batch size of 16; the learning rate is decayed at the 18th epoch by a factor of 2. |
| 8 | CORAL | ResNet50 | 43.6 (3.3) | 73.8 (0.3) | 32.7 (0.2) | 73.3 (4.3) | WILDS | Paper / Code | July 15, 2021 | |
| 9 | ERM w/ data aug | ResNet50 | 47.0 (1.4) | 76.9 (0.6) | 32.2 (1.2) | 73.0 (0.4) | WILDS | Paper / Code | December 09, 2021 | Implements RandAugment. Hyperparameters tuned via random search, consistent with the WILDS unlabeled data baselines. |
| 10 | ERM (more checkpoints) | ResNet50 | 47.9 (2.6) | 76.2 (0.1) | 32.0 (1.5) | 69.0 (0.4) | Kazuki Irie | Paper / Code | February 10, 2022 | We used the default hyperparameters from the official code base, but conducted cross-validation every 1000 training steps. |
| 11 | ERM (grid search) | ResNet50 | 47.1 (1.5) | 75.7 (0.4) | 30.8 (1.3) | 71.5 (2.6) | WILDS | Paper / Code | July 15, 2021 | Hyperparameters tuned via grid search, consistent with other WILDS baselines that do not use unlabeled data. |
| 12 | ERM (rand search) | ResNet50 | 46.7 (0.6) | 74.9 (1.2) | 30.6 (1.1) | 72.5 (3.2) | WILDS | Paper / Code | December 09, 2021 | Hyperparameters tuned via random search, consistent with the WILDS unlabeled data baselines. |
| 13 | IRMX (PAIR opt) | ResNet50 | 43.9 (2.0) | 74.9 (1.1) | 27.9 (0.9) | 67.6 (2.5) | Yongqiang Chen | Paper / Code | March 05, 2023 | preference: [(1,1e8,1e12), (1,1e10,1e12)*, (1,1e12,1e12)], neg_irmv1_adj_rate: [1e-4, 1e-2*, 1] |
| 14 | IRMX | ResNet50 | 43.6 (1.2) | 74.3 (0.7) | 26.7 (1.1) | 66.7 (1.5) | Yongqiang Chen | Paper / Code | March 04, 2023 | penalty_weight: [0.01, 0.1, 1.0*, 10.0, 100.0] |
| 15 | Group DRO | ResNet50 | 37.5 (1.9) | 71.6 (2.7) | 23.8 (2.0) | 72.7 (2.0) | WILDS | Paper / Code | July 15, 2021 | |
| 16 | ARM-BN | ResNet50 | 27.5 (5.4) | 62.0 (4.0) | 23.3 (2.8) | 70.2 (2.4) | Marvin Zhang | Paper / Code | April 19, 2022 | Requires test data to be batched by groups |
| 17 | Fish | ResNet50 | 40.3 (0.6) | 73.8 (0.1) | 22.0 (1.8) | 64.7 (2.6) | Yuge Shi | Paper / Code | December 14, 2021 | |
| 18 | IRM | ResNet50 | 22.4 (7.7) | 59.8 (8.2) | 15.1 (4.9) | 59.7 (3.8) | WILDS | Paper / Code | July 15, 2021 | |
| 19 | MixUp | ResNet50 | 31.2 (3.1) | 66.1 (1.8) | 13.8 (0.8) | 48.6 (1.1) | Olivia Wiles | Paper / Code | June 16, 2022 | lr: [0.01, 0.001, 0.0001]; alpha: [0.2, 0.5, 1.0] |
| 20 | Test-time BN adaptation | ResNet50 | 12.0 (0.3) | 37.2 (0.7) | 13.8 (0.6) | 46.6 (0.9) | Marvin Zhang | Paper / Code | April 19, 2022 | Requires test data to be batched by groups |
| 21 | JTT | ResNet50 | 32.6 (4.4) | 64.9 (2.8) | 11.0 (2.5) | 47.4 (2.2) | Olivia Wiles | Paper / Code | June 16, 2022 | lr: [0.01, 0.001, 0.0001]; lambda: [0.2, 2, 20, 200] |
With unlabeled data
Unlabeled data is available from the extra domain.
| Rank | Algorithm | Model | Test ID Macro F1 | Test ID Avg Acc | Test OOD Macro F1 ▼ | Test OOD Avg Acc | Contact | References | Date | Notes |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Connect Later | ResNet50 | 51.5 (2.0) | 78.9 (0.8) | 36.8 (1.6) | 76.5 (1.7) | Helen Qu | Paper / Code | March 01, 2024 | Linear probing LR=0.0018804142350329204. Fine-tuning LR=0.0002504172429234668. TRANSFORM_P=0.5473471004493135. These values were chosen from 10 trials of a random search in lp_lr 10^Uni(-3, -2), ft_lr 10^Uni(-5, -2), transform_p Uni(0.5, 0.9). |
| 2 | ICON | ResNet50 | 50.6 (1.3) | 77.5 (0.2) | 34.5 (1.4) | 72.0 (0.2) | Nick Y. | Paper / Code | November 03, 2022 | Uses unlabeled data from extra domains. lr: 0.0002253717686699905*, dropout: 0.5*; follows NoisyStudent. |
| 3 | Noisy Student | ResNet50 | 48.3 (1.6) | 77.6 (1.4) | 32.1 (0.7) | 71.0 (3.1) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from extra domains. |
| 4 | DANN | ResNet50 | 48.5 (2.8) | 77.2 (0.8) | 31.9 (1.4) | 70.7 (2.6) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from extra domains. |
| 5 | FixMatch | ResNet50 | 46.3 (0.5) | 78.1 (0.5) | 31.0 (1.3) | 71.9 (3.1) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from extra domains. |
| 6 | AFN | ResNet50 | 46.8 (0.8) | 77.2 (0.5) | 30.8 (0.5) | 74.4 (1.6) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from extra domains. |
| 7 | Pseudo-Label | ResNet50 | 47.3 (0.4) | 77.5 (0.1) | 30.3 (0.4) | 68.7 (3.4) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from extra domains. |
| 8 | SwAV | ResNet50 | 47.3 (1.4) | 74.8 (1.3) | 29.0 (2.0) | 63.4 (1.5) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from extra domains. |
| 9 | CORAL | ResNet50 | 40.5 (1.4) | 77.0 (0.3) | 27.9 (0.4) | 69.7 (0.8) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from extra domains. |
Camelyon17
Without unlabeled data
| Rank | Algorithm | Model | Val Acc | Test Acc ▼ | Contact | References | Date | Notes |
|---|---|---|---|---|---|---|---|---|
| 1 | SGD (Freeze-Embed) | CLIP ViT-L | 95.2 (0.3)* | 96.5 (0.4)* | Ananya Kumar | Paper / Code | October 18, 2022 | Uses a pretrained model. lr: [3e-5, 1e-4, 3e-4*, 1e-3, 3e-3, 1e-2] |
| 2 | ContriMix | DenseNet121 | 91.9 (0.7) | 94.6 (1.2) | Dinkar Juyal | Paper / Code | June 22, 2023 | Adam, lr [0.001, 0.0001], 5 attributes, 4 mixes, attr_cons_weight 0.1, self_recon_weight 0.1, cont_cons_weight 0.3, entropy_weight 0.5 |
| 3 | MBDG | DenseNet121 | 88.1 (1.8)* | 93.3 (1.0)* | Alex Robey | Paper / Code | March 17, 2022 | Uses a pretrained model. lr: [1e-4*, 1e-3, 1e-2], wd: [0*, 1e-3, 1e-3], gamma (mbdg margin): [0.01, 0.1*, 0.5], eta_d (mbdg dual step size): [5e-3, 5e-2*, 5e-1] |
| 4 | ERM w/ targeted aug | DenseNet121 | 92.7 (0.7) | 92.1 (3.1) | Irena Gao | Paper / Code | November 30, 2022 | lr: ~0.03, batch size 168. Uses specialized H&E staining color jitter. |
| 5 | ERM w/ H&E jitter | se_resnext101_32x4d | 88.0 (4.2)* | 91.6 (1.9)* | Rohan Taori | Paper / Code | July 19, 2021 | Uses specialized H&E staining color jitter. Does not use the default model. |
| 6 | ERM w/ data aug | DenseNet121 | 90.6 (1.2) | 82.0 (7.4) | WILDS | Paper / Code | December 09, 2021 | Implements RandAugment. Uses color augmentation as part of RandAugment. Hyperparameters tuned via random search, consistent with the WILDS unlabeled data baselines. |
| 7 | LISA | DenseNet121 | 81.8 (1.4) | 77.1 (6.9) | Yu Wang | Paper / Code | March 13, 2022 | We have used all the default hyperparameters, except for the mix_alpha, which is tuned in [0.5, 2], and 2 is the final choice. |
| 8 | Fish | DenseNet121 | 83.9 (1.2) | 74.7 (7.1) | Yuge Shi | Paper / Code | December 14, 2021 | |
| 9 | IRMX (PAIR opt) | DenseNet121 | 84.3 (1.6) | 74.0 (7.2) | Yongqiang Chen | Paper / Code | March 05, 2023 | preference: [(1,1e8,1e12), (1,1e10,1e12)*, (1,1e12,1e12)], neg_irmv1_adj_rate: [1e-4, 1e-2, 1*] |
| 10 | ERM (rand search) | DenseNet121 | 85.8 (1.9) | 70.8 (7.2) | WILDS | Paper / Code | December 09, 2021 | Hyperparameters tuned via random search, consistent with the WILDS unlabeled data baselines. |
| 11 | ERM (grid search) | DenseNet121 | 84.9 (3.1) | 70.3 (6.4) | WILDS | Paper / Code | July 15, 2021 | Hyperparameters tuned via grid search, consistent with other WILDS baselines that do not use unlabeled data. |
| 12 | CGD | DenseNet121 | 86.8 (1.4) | 69.4 (7.9) | Vihari Piratla | Paper / Code | April 10, 2022 | No hyperparameter tuning. CG step size: 0.05. LR, optimizer, decay rate etc. all set to default. |
| 13 | Group DRO | DenseNet121 | 85.5 (2.2) | 68.4 (7.3) | WILDS | Paper / Code | July 15, 2021 | |
| 14 | IRMX | DenseNet121 | 84.7 (2.0) | 65.5 (8.3) | Yongqiang Chen | Paper / Code | March 04, 2023 | penalty_weight: [0.01, 0.1, 1.0*, 10.0, 100.0] |
| 15 | IRM | DenseNet121 | 86.2 (1.4) | 64.2 (8.1) | WILDS | Paper / Code | July 15, 2021 | |
| 16 | JTT | ResNet50 | 66.9 (1.5)* | 63.8 (1.4)* | Olivia Wiles | Paper / Code | June 16, 2022 | lr: [0.01, 0.001, 0.0001]; lambda: [0.2, 2, 20, 200] |
| 17 | MixUp | ResNet50 | 65.5 (1.4)* | 63.5 (0.9)* | Olivia Wiles | Paper / Code | June 16, 2022 | lr: [0.01, 0.001, 0.0001]; alpha: [0.2, 0.5, 1.0] |
| 18 | CORAL | DenseNet121 | 86.2 (1.4) | 59.5 (7.7) | WILDS | Paper / Code | July 15, 2021 | |
With unlabeled data
Unlabeled data is available from the source, validation and target domains.
| Rank | Algorithm | Model | Val Acc | Test Acc ▼ | Contact | References | Date | Notes |
|---|---|---|---|---|---|---|---|---|
| 1 | Connect Later | DenseNet121 | 93.9 (0.4) | 95.0 (0.9) | Helen Qu | Paper / Code | March 01, 2024 | Linear probing LR=0.006222466404167087. Fine-tuning LR=0.003324366874654924. TRANSFORM_P=0.8260829829811762. AUG_STRENGTH=0.0995477425665205. These values were chosen from 10 trials of a random search in lp_lr 10^Uni(-3, -2), ft_lr 10^Uni(-5, -2), transform_p Uni(0.5, 0.9), aug strength Uni(0.05, 0.1). |
| 2 | ICON | DenseNet121 | 90.1 (0.4) | 93.8 (0.3) | Nick Y. | Paper / Code | November 03, 2022 | Uses unlabeled data from the target domain. lr: 0.0027267495451878732*, dropout: [0.0*, 0.5]. |
| 3 | SwAV | DenseNet121 | 92.3 (0.4) | 91.4 (2.0) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the target domain. Uses color augmentation as part of RandAugment. |
| 4 | Noisy Student | DenseNet121 | 93.2 (0.5) | 86.7 (1.7) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the target domain. Uses color augmentation as part of RandAugment. |
| 5 | AFN | DenseNet121 | 91.1 (0.9) | 83.2 (6.2) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the target domain. Uses color augmentation as part of RandAugment. |
| 6 | CORAL | DenseNet121 | 90.4 (0.9) | 77.9 (6.6) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the target domain. Uses color augmentation as part of RandAugment. |
| 7 | FixMatch | DenseNet121 | 91.3 (1.1) | 71.0 (4.9) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the target domain. Uses color augmentation as part of RandAugment. |
| 8 | DANN | DenseNet121 | 86.9 (2.2) | 68.4 (9.2) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the target domain. Uses color augmentation as part of RandAugment. |
| 9 | Pseudo-Label | DenseNet121 | 91.3 (1.3) | 67.7 (8.2) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the target domain. Uses color augmentation as part of RandAugment. |
RxRx1
| Rank | Algorithm | Model | Val Acc | Test ID Acc | Test Acc ▼ | Contact | References | Date | Notes |
|---|---|---|---|---|---|---|---|---|---|
| 1 | IID representation learning | ResNet50 | 23.9 (0.3) | 49.9 (0.5) | 39.2 (0.2) | Jiqing Wu | Paper / Code | February 20, 2022 | Uses CutMix regularizer. |
| 2 | ERM (CutMix) | ResNet50 | 23.6 (0.3) | 47.4 (1.0) | 38.4 (0.2) | Jiqing Wu | Paper / Code | February 20, 2022 | Uses CutMix regularizer. |
| 3 | LISA | ResNet50 | 20.1 (0.4) | 41.1 (1.3) | 31.9 (1.0) | Yu Wang | Paper / Code | March 13, 2022 | We have used all the default hyperparameters, except for the mix_alpha, which is tuned in [0.5, 2], and 2 is the final choice. |
| 4 | ARM-BN | ResNet50 | 20.9 (0.2) | 34.9 (0.2) | 31.2 (0.1) | Marvin Zhang | Paper / Code | April 19, 2022 | Requires test data to be batched by groups |
| 5 | ERM (grid search) | ResNet50 | 19.4 (0.2) | 35.9 (0.4) | 29.9 (0.4) | WILDS | Paper / Code | July 15, 2021 | Hyperparameters tuned via grid search, consistent with other WILDS baselines that do not use unlabeled data. |
| 6 | IRMX (PAIR opt) | ResNet50 | 18.8 (0.2) | 34.7 (0.3) | 28.8 (0.0) | Yongqiang Chen | Paper / Code | March 05, 2023 | preference: [(1,1e8,1e12)*, (1,1e10,1e12)*, (1,1e12,1e12)], neg_irmv1_adj_rate: [1e-4, 1e-2, 1*] |
| 7 | IRMX | ResNet50 | 18.9 (0.2) | 34.7 (0.2) | 28.7 (0.2) | Yongqiang Chen | Paper / Code | March 04, 2023 | penalty_weight: [0.01, 0.1, 1.0*, 10.0, 100.0] |
| 8 | CORAL | ResNet50 | 18.5 (0.4) | 34.0 (0.3) | 28.4 (0.3) | WILDS | Paper / Code | July 15, 2021 | |
| 9 | Group DRO | ResNet50 | 15.2 (0.1) | 28.1 (0.3) | 22.5 (0.3) | WILDS | Paper / Code | July 15, 2021 | |
| 10 | Test-time BN adaptation | ResNet50 | 12.3 (0.2) | 21.5 (0.2) | 20.1 (0.2) | Marvin Zhang | Paper / Code | April 19, 2022 | Requires test data to be batched by groups |
| 11 | IRM | ResNet50 | 5.6 (0.4) | 9.9 (1.4) | 8.2 (1.1) | WILDS | Paper / Code | July 15, 2021 | |
OGB-MolPCBA
Without unlabeled data
| Rank | Algorithm | Model | Val Avg Precision | Test Avg Precision ▼ | Contact | References | Date | Notes |
|---|---|---|---|---|---|---|---|---|
| 1 | ERM (rand search) | GIN-virtual | 29.3 (0.3) | 28.3 (0.1) | WILDS | Paper / Code | December 09, 2021 | Hyperparameters tuned via random search, consistent with the WILDS unlabeled data baselines. |
| 2 | ERM (grid search) | GIN-virtual | 27.8 (0.1) | 27.2 (0.3) | WILDS | Paper / Code | July 15, 2021 | Hyperparameters tuned via grid search, consistent with other WILDS baselines that do not use unlabeled data. |
| 3 | Group DRO | GIN-virtual | 23.1 (0.6) | 22.4 (0.6) | WILDS | Paper / Code | July 15, 2021 | |
| 4 | CORAL | GIN-virtual | 18.4 (0.2) | 17.9 (0.5) | WILDS | Paper / Code | July 15, 2021 | |
| 5 | IRM | GIN-virtual | 15.8 (0.2) | 15.6 (0.3) | WILDS | Paper / Code | July 15, 2021 | |
With unlabeled data
Unlabeled data is available from the source, validation and target domains.
| Rank | Algorithm | Model | Val Avg Precision | Test Avg Precision ▼ | Contact | References | Date | Notes |
|---|---|---|---|---|---|---|---|---|
| 1 | ICON | GIN-virtual | 29.6 (0.0) | 28.3 (0.0) | Nick Y. | Paper / Code | November 03, 2022 | Uses unlabeled data from the target domain. lr: 0.000398452164375177*, dropout: 0.0*. |
| 2 | Noisy Student | GIN-virtual | 28.9 (0.1) | 27.5 (0.1) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the target domain. |
| 3 | CORAL | GIN-virtual | 27.0 (0.4) | 26.6 (0.2) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the target domain. |
| 4 | DANN | GIN-virtual | 20.7 (0.8) | 20.4 (0.8) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the target domain. |
| 5 | Pseudo-Label | GIN-virtual | 21.9 (0.6) | 19.7 (0.1) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the target domain. |
| 6 | AFN | GIN-virtual | 15.1 (1.3) | 14.9 (1.3) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the target domain. |
GlobalWheat
Without unlabeled data
| Rank | Algorithm | Model | Val ID Acc | Val Acc | Test ID Acc | Test Acc ▼ | Contact | References | Date | Notes |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ERM (grid search) | Faster R-CNN | | 68.6 (0.4) | | 51.2 (1.8) | WILDS | Paper / Code | July 15, 2021 | Hyperparameters tuned via grid search, consistent with other WILDS baselines that do not use unlabeled data. |
| 2 | ERM (rand search) | Faster R-CNN | | 68.7 (0.9) | | 50.5 (1.7) | WILDS | Paper / Code | December 09, 2021 | Hyperparameters tuned via random search, consistent with the WILDS unlabeled data baselines. |
| 3 | Group DRO | Faster R-CNN | | 66.2 (0.4) | | 47.9 (2.0) | WILDS | Paper / Code | July 15, 2021 | |
With unlabeled data
Unlabeled data is available from the source, validation, target and extra domains.
| Rank | Algorithm | Model | Val ID Acc | Val Acc | Test ID Acc | Test Acc ▼ | Contact | References | Date | Notes |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ICON | Faster R-CNN | | 68.9 (0.3) | | 52.3 (0.2) | Nick Y. | Paper / Code | November 03, 2022 | Uses unlabeled data from the target domain. lr: 0.000001*. |
| 2 | Noisy Student | Faster R-CNN | | 68.9 (0.4) | | 49.3 (3.7) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the target domain. |
| 3 | Pseudo-Label | Faster R-CNN | | 65.0 (0.7) | | 42.9 (2.3) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the target domain. |
CivilComments
Without unlabeled data

| Rank | Algorithm | Model | Val Avg Acc | Val Worst-Group Acc | Test Avg Acc | Test Worst-Group Acc ▼ | Contact | References | Date | Notes |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Fish | DistillBERT-base-uncased | 88.9 (0.6) | 70.5 (1.1) | 89.3 (0.3) | 75.3 (0.6) | Yuge Shi | Paper / Code | December 14, 2021 | |
| 2 | IRMX (PAIR opt) | DistillBERT-base-uncased | 88.2 (1.8) | 73.2 (0.7) | 88.0 (1.8) | 74.2 (1.4) | Yongqiang Chen | Paper / Code | March 05, 2023 | preference: [(1,1e8,1e12)*, (1,1e10,1e12), (1,1e12,1e12)], neg_irmv1_adj_rate: [1e-4*, 1e-2, 1], lr: [1e-6, 2e-6*, 1e-5, 2e-5] |
| 3 | IRMX | DistillBERT-base-uncased | 87.2 (2.3) | 71.9 (1.7) | 87.0 (2.3) | 73.4 (1.4) | Yongqiang Chen | Paper / Code | March 04, 2023 | penalty_weight: [0.01, 0.1, 1.0*, 10.0, 100.0], lr: [1e-6, 2e-6*, 1e-5, 2e-5] |
| 4 | LISA | DistillBERT-base-uncased | 90.3 (0.3) | 71.2 (0.9) | 90.1 (0.3) | 72.9 (1.0) | Yu Wang | Paper / Code | March 13, 2022 | We have used all the default hyperparameters, except for the mix_alpha, which is tuned in [0.5, 2], and 2 is the final choice. |
| 5 | DFR | DistillBERT-base-uncased | 87.5 (0.7) | 71.7 (0.9) | 87.3 (0.7) | 72.5 (0.9) | Pavel Izmailov | Paper / Code | September 26, 2022 | Uses default WILDS training script for the base models, dropping 20% of the data from the training set for last layer retraining. We then retrain the last layer on the left-out data, with two hyper-parameters: regularization coefficient C: [1., 0.3*, 0.1, 0.07, 0.03, 0.01, 0.003] and bias vector correction t: [-0.15, -0.1*, -0.05, 0., 0.05, 0.1, 0.15] according to the best WGA on the validation set. We use all the group identities for retraining the last layer. We assign a group [0-8] to each example according to its attributes; if multiple attributes are present, we use the least common attribute (with the smallest number of occurrences) as the group for that example. |
| 6 | Group DRO | DistillBERT-base-uncased | 90.1 (0.4) | 67.7 (1.8) | 89.9 (0.5) | 70.0 (2.0) | WILDS | Paper / Code | July 15, 2021 | Groups data according to (label, Black). |
| 7 | Reweighted (Label) | DistillBERT-base-uncased | 90.1 (0.4) | 65.9 (1.8) | 89.8 (0.4) | 69.2 (0.9) | WILDS | Paper / Code | July 15, 2021 | Groups data according to labels. |
| 8 | Group DRO (Label) | DistillBERT-base-uncased | 90.4 (0.4) | 65.0 (3.8) | 90.2 (0.3) | 69.1 (1.8) | WILDS | Paper / Code | July 15, 2021 | Groups data according to labels. |
| 9 | CGD | DistillBERT-base-uncased | 89.6 (0.5) | 68.3 (1.6) | 89.6 (0.4) | 69.1 (1.9) | Vihari Piratla | Paper / Code | April 10, 2022 | CG step size: [0.005, 0.01, 0.05, 0.1]; 0.05 performed the best. LR, optimizer, decay rate etc. all set to default. |
| 10 | DFR (label, Black) | DistillBERT-base-uncased | 88.1 (1.1) | 69.9 (1.0) | 87.9 (1.2) | 68.2 (2.3) | Pavel Izmailov | Paper / Code | September 26, 2022 | Uses default WILDS training script for the base models, dropping 20% of the data from the training set for last layer retraining. We then retrain the last layer on the left-out data, with two hyper-parameters: regularization coefficient C: [1., 0.3*, 0.1, 0.07, 0.03, 0.01, 0.003] and bias vector correction t: [-0.15, -0.1*, -0.05, 0., 0.05, 0.1, 0.15] according to the best WGA on the validation set. We only use the (label, Black) attributes to form the groups for last layer retraining. |
| 11 | IRM | DistillBERT-base-uncased | 89.0 (0.7) | 65.9 (2.8) | 88.8 (0.7) | 66.3 (2.1) | WILDS | Paper / Code | July 15, 2021 | Groups data according to (label, Black). |
| 12 | Reweighted (Label x Black) | DistillBERT-base-uncased | 89.6 (0.6) | 66.6 (1.5) | 89.2 (0.6) | 66.2 (1.2) | WILDS | Paper / Code | July 15, 2021 | Groups data according to (label, Black). |
| 13 | CORAL | DistillBERT-base-uncased | 88.9 (0.6) | 64.7 (1.4) | 88.7 (0.5) | 65.6 (1.3) | WILDS | Paper / Code | July 15, 2021 | Groups data according to (label, Black). |
| 14 | ERM (grid search) | DistillBERT-base-uncased | 92.3 (0.2) | 50.5 (1.9) | 92.2 (0.1) | 56.0 (3.6) | WILDS | Paper / Code | July 15, 2021 | Hyperparameters tuned via grid search, consistent with other WILDS baselines that do not use unlabeled data. |
With unlabeled data
Unlabeled data is available from the extra domain.
| Rank | Algorithm | Model | Val Avg Acc | Val Worst-Group Acc | Test Avg Acc | Test Worst-Group Acc ▼ | Contact | References | Date | Notes |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ICON | DistillBERT-base-uncased | 89.9 (0.1) | 66.4 (0.7) | 89.7 (0.1) | 68.8 (1.3) | Nick Y. | Paper / Code | November 03, 2022 | Uses unlabeled data from the same distribution. lr: 7.324042204632364e-05*, dropout: 0.0*. |
| 2 | Pseudo-Label | DistillBERT-base-uncased | 90.5 (0.6) | 63.9 (1.7) | 90.3 (0.5) | 66.9 (2.6) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the same distribution. |
| 3 | Masked LM | DistillBERT-base-uncased | 89.7 (1.1) | 64.5 (2.5) | 89.4 (1.2) | 65.7 (2.3) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the same distribution. |
FMoW
Without unlabeled data
| Rank | Algorithm | Model | Val Avg Acc | Test Avg Acc | Val Worst-Region Acc | Test Worst-Region Acc ▼ | Contact | References | Date | Notes |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | AutoFT | CLIP ViT-L/14 @ 336px | 73.5 (0.11)* | 68.2 (0.05)* | 66.0 (0.25)* | 51.8 (0.41)* | Caroline Choi | Paper / Code | January 28, 2024 | lr: log-scale [1e-7, 1e-3], wd: log-scale [0, 1.0], seed: [0, 100], loss weights in log-scale [1e-4, 10.0] |
| 2 | SGD (Freeze-Embed) | CLIP-ViT-L/14 @ 338px | 73.7 (0.34)* | 68.3 (0.42)* | 60.0 (0.07)* | 50.3 (1.1)* | Ananya Kumar | Paper / Code | October 18, 2022 | lr: [3e-5, 1e-4, 3e-4, 1e-3, 3e-3, 1e-2] |
| 3 | Model Soups (CLIP ViT-L) | ViT-L | 75.7 (0.07)* | 69.5 (0.08)* | 59.8 (0.43)* | 47.6 (0.33)* | Mitchell Wortsman | Paper / Code | March 12, 2022 | Model soups on top of a random hyperparameter search over LR, iterations, data augmentation, label smoothing. |
| 4 | ERM (CLIP ViT-L) | ViT-L | 73.6 (0.23)* | 66.9 (0.17)* | 59.5 (1.31)* | 46.1 (0.59)* | Mitchell Wortsman | Paper / Code | July 28, 2022 | Random hyperparameter search over LR, iterations, data augmentation, label smoothing. |
| 5 | DFR | DenseNet121 | 68.4 (1.32)* | 53.4 (0.44)* | 64.3 (1.39)* | 42.8 (0.42)* | Pavel Izmailov | Paper / Code | September 26, 2022 | Uses the OOD validation set to retrain the last layer of the model. Trains the base model with standard ERM training scripts from the WILDS repo, and only tunes the regularization strength parameter C: [1.*, 0.3, 0.1, 0.07, 0.03, 0.01, 0.003] for last layer retraining. |
| 6 | ERM w/ data aug | DenseNet121 | 62.1 (0.23) | 55.5 (0.42) | 53.2 (0.61) | 35.7 (0.26) | WILDS | Paper / Code | December 09, 2021 | Implements RandAugment. Hyperparameters tuned via random search, consistent with the WILDS unlabeled data baselines. |
| 7 | LISA | DenseNet121 | 58.7 (1.12) | 52.8 (1.15) | 48.7 (0.92) | 35.5 (0.81) | Yu Wang | Paper / Code | March 13, 2022 | We have used all the default hyperparameters, except for the mix_alpha, which is tuned in [0.5, 2], and 2 is the final choice. |
| 8 | IRMX (PAIR opt) | DenseNet121 | 58.6 (0.46) | 52.7 (0.57) | 52.3 (1.21) | 35.4 (1.3) | Yongqiang Chen | Paper / Code | March 05, 2023 | preference: [(1,1e8,1e12), (1,1e10,1e12), (1,1e12,1e12)*], neg_irmv1_adj_rate: [1e-4, 1e-2, 1*] |
| 9 | ERM | se_resnext101_32x4d | 62.1 (0.24)* | 55.5 (0.14)* | 51.3 (2.93)* | 35.0 (0.78)* | John Miller | Paper / Code | July 15, 2021 | Does not use the default model. |
| 10 | ERM (more checkpoints) | DenseNet121 | 62.0 (0.06) | 55.6 (0.23) | 52.5 (1.25) | 34.8 (1.9) | Kazuki Irie | Paper / Code | February 10, 2022 | batch_size: [20*, 32, 64], lr: [1e-4, 3e-4*]. We conducted cross-validation every 200 training steps. |
| 11 | Fish | DenseNet121 | 57.8 (0.15) | 51.8 (0.32) | 49.5 (2.34) | 34.6 (0.18) | Yuge Shi | Paper / Code | December 14, 2021 | |
| 12 | ERM (rand search) | DenseNet121 | 60.6 (0.57) | 54.0 (0.4) | 52.6 (0.25) | 34.1 (1.42) | WILDS | Paper / Code | December 09, 2021 | Hyperparameters tuned via random search, consistent with the WILDS unlabeled data baselines. |
| 13 | IRMX | DenseNet121 | 58.7 (1.04) | 52.5 (0.47) | 52.4 (1.06) | 33.7 (0.95) | Yongqiang Chen | Paper / Code | March 04, 2023 | penalty_weight: [0.01, 0.1, 1.0*, 10.0, 100.0] |
| 14 | IRM | DenseNet121 | 56.1 (0.61) | 50.4 (0.75) | 49.7 (0.97) | 32.8 (2.09) | WILDS | Paper / Code | July 15, 2021 | |
| 15 | CORAL | DenseNet121 | 56.5 (0.15) | 50.1 (0.07) | 48.9 (1.31) | 32.8 (0.66) | WILDS | Paper / Code | July 15, 2021 | |
| 16 | CGD | DenseNet121 | 57.0 (1.03) | 50.6 (1.39) | 49.8 (1.04) | 32.0 (2.26) | Vihari Piratla | Paper / Code | April 10, 2022 | CG step size: [0.05, 0.01, 0.2]; 0.2 performed the best. LR, optimizer, decay rate etc. all set to default. |
| 17 | ERM (grid search) | DenseNet121 | 59.2 (0.07) | 52.7 (0.23) | 49.8 (0.36) | 31.3 (0.17) | WILDS | Paper / Code | July 15, 2021 | Hyperparameters tuned via grid search, consistent with other WILDS baselines that do not use unlabeled data. |
| 18 | Group DRO | DenseNet121 | 57.6 (0.7) | 51.2 (0.38) | 49.4 (0.45) | 31.1 (1.66) | WILDS | Paper / Code | July 15, 2021 | |
| 19 | Test-time BN adaptation | DenseNet121 | 57.9 (0.36) | 51.5 (0.25) | 47.8 (0.52) | 30.0 (0.23) | Marvin Zhang | Paper / Code | April 19, 2022 | Requires test data to be batched by groups |
| 20 | ARM-BN | DenseNet121 | 48.0 (0.65) | 42.1 (0.26) | 38.9 (2.17) | 24.4 (0.54) | Marvin Zhang | Paper / Code | April 19, 2022 | Requires test data to be batched by groups |
With unlabeled data
Unlabeled data is available from the source, validation and target domains.
| Rank | Algorithm | Model | Val Avg Acc | Test Avg Acc | Val Worst-Region Acc | Test Worst-Region Acc ▼ | Contact | References | Date | Notes |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ICON | DenseNet121 | 64.4 (0.18) | 58.5 (0.16) | 55.6 (0.44) | 39.9 (1.12) | Nick Y. | Paper / Code | November 03, 2022 | Uses unlabeled data from the target domain. lr: 0.00013443151989778619*, dropout: 0.5*. |
| 2 | AFN | DenseNet121 | 61.7 (0.49) | 55.6 (0.23) | 53.4 (0.78) | 38.3 (0.95) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the target domain. |
| 3 | Noisy Student | DenseNet121 | 64.0 (0.37) | 58.4 (0.4) | 55.4 (0.47) | 37.8 (0.62) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the target domain. |
| 4 | SwAV | DenseNet121 | 63.1 (0.38) | 56.3 (0.67) | 51.6 (0.57) | 36.3 (1.01) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the target domain. |
| 5 | DANN | DenseNet121 | 59.5 (0.45) | 53.0 (0.58) | 50.8 (2.18) | 34.6 (1.71) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the target domain. |
| 6 | CORAL | DenseNet121 | 60.1 (0.56) | 53.3 (0.61) | 51.7 (1.23) | 33.7 (0.23) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the target domain. |
| 7 | Pseudo-Label | DenseNet121 | 62.5 (0.08) | 55.6 (0.2) | 51.5 (0.52) | 33.7 (0.24) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the target domain. |
| 8 | FixMatch | DenseNet121 | 58.9 (2.07) | 52.5 (1.86) | 50.8 (1.11) | 32.6 (2.05) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the target domain. |
PovertyMap
Without unlabeled data
| Rank | Algorithm | Model | Val Pearson r | Test Pearson r | Val Worst-U/R Pearson r | Test Worst-U/R Pearson r ▼ | Contact | References | Date | Notes |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | C-Mixup | ResNet18-MS | 0.82 (0.04) | 0.8 (0.03) | 0.55 (0.07) | 0.53 (0.07) | Yiping Wang | Paper / Code | December 01, 2022 | kde_bandwidth: [0.2, 0.3, 0.4*, 0.5*, 0.7, 1.0*, 2.0, 3.0], alpha: [0.5*, 1.0, 2.0*] |
| 2 | ERM (rand search) | ResNet18-MS | 0.81 (0.03) | 0.8 (0.04) | 0.53 (0.06) | 0.5 (0.07) | WILDS | Paper / Code | December 09, 2021 | Hyperparameters tuned via random search, consistent with the WILDS unlabeled data baselines. |
| 3 | ERM w/ data aug | ResNet18-MS | 0.81 (0.03) | 0.79 (0.04) | 0.54 (0.06) | 0.49 (0.06) | WILDS | Paper / Code | December 09, 2021 | Implements composition of random horizontal flip, random affine transformation, color jitter on the RGB channels, and Cutout on all channels. Hyperparameters tuned via random search, consistent with the WILDS unlabeled data baselines. |
| 4 | IRMX (PAIR opt) | ResNet18-MS | 0.79 (0.04) | 0.79 (0.05) | 0.49 (0.07) | 0.47 (0.09) | Yongqiang Chen | Paper / Code | March 05, 2023 | preference: [(1,1e8,1e12)*, (1,1e10,1e12), (1,1e12,1e12)], neg_irmv1_adj_rate: [1e-4, 1e-2*, 1] |
| 5 | ERM (grid search) | ResNet18-MS | 0.8 (0.04) | 0.78 (0.04) | 0.51 (0.06) | 0.45 (0.06) | WILDS | Paper / Code | July 15, 2021 | Hyperparameters tuned via grid search, consistent with other WILDS baselines that do not use unlabeled data. |
| 6 | IRMX | ResNet18-MS | 0.78 (0.04) | 0.77 (0.04) | 0.48 (0.07) | 0.45 (0.05) | Yongqiang Chen | Paper / Code | March 04, 2023 | penalty_weight: [0.01, 0.1, 1.0*, 10.0, 100.0] |
| 7 | CORAL | ResNet18-MS | 0.8 (0.04) | 0.78 (0.05) | 0.51 (0.06) | 0.44 (0.07) | WILDS | Paper / Code | July 15, 2021 | |
| 8 | IRM | ResNet18-MS | 0.81 (0.03) | 0.77 (0.05) | 0.53 (0.05) | 0.43 (0.07) | WILDS | Paper / Code | July 15, 2021 | |
| 9 | CGD | ResNet18-MS | 0.81 (0.03) | 0.77 (0.04) | 0.51 (0.05) | 0.43 (0.04) | Vihari Piratla | Paper / Code | April 10, 2022 | No hyperparameter search; CG step size: 0.05, LR, optimizer, decay rate etc. all set to default. |
| 10 | Group DRO | ResNet18-MS | 0.78 (0.05) | 0.75 (0.07) | 0.46 (0.04) | 0.39 (0.06) | WILDS | Paper / Code | July 15, 2021 | |
With unlabeled data
Unlabeled data is available from the source, validation and target domains.
| Rank | Algorithm | Model | Val Pearson r | Test Pearson r | Val Worst-U/R Pearson r | Test Worst-U/R Pearson r ▼ | Contact | References | Date | Notes |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | C-Mixup+SpAR | ResNet18-MS | 0.81 (0.02) | 0.79 (0.05) | 0.53 (0.08) | 0.52 (0.09) | Ben Eyre | Paper / Code | February 19, 2024 | lr=1e-3*, kde_bandwidth=0.5*, spar_alpha=0.999* |
| 2 | ERM+SpAR | ResNet18-MS | 0.8 (0.04) | 0.79 (0.05) | 0.52 (0.08) | 0.51 (0.1) | Ben Eyre | Paper / Code | February 19, 2024 | lr=1e-3*, spar_alpha=0.999* |
| 3 | ICON | ResNet18-MS | 0.8 (0.04) | 0.77 (0.04) | 0.52 (0.08) | 0.49 (0.04) | Nick Y. | Paper / Code | November 03, 2022 | Uses unlabeled data from the target domain. lr: 0.0009738391232813829*, dropout: 0.5*. |
| 4 | SwAV | ResNet18-MS | 0.81 (0.05) | 0.78 (0.06) | 0.54 (0.07) | 0.45 (0.05) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the target domain. |
| 5 | Noisy Student | ResNet18-MS | 0.8 (0.05) | 0.76 (0.08) | 0.52 (0.08) | 0.42 (0.11) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the target domain. |
| 6 | AFN | ResNet18-MS | 0.76 (0.05) | 0.75 (0.08) | 0.44 (0.07) | 0.39 (0.08) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the target domain. |
| 7 | CORAL | ResNet18-MS | 0.79 (0.04) | 0.74 (0.05) | 0.5 (0.09) | 0.36 (0.08) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the target domain. |
| 8 | DANN | ResNet18-MS | 0.77 (0.04) | 0.69 (0.04) | 0.44 (0.11) | 0.33 (0.1) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the target domain. |
| 9 | FixMatch | ResNet18-MS | 0.76 (0.07) | 0.64 (0.11) | 0.48 (0.05) | 0.3 (0.11) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the target domain. |
Amazon
Without unlabeled data
| Rank | Algorithm | Model | Val Avg Acc | Test Avg Acc | Val 10% Acc | Test 10% Acc ▼ | Contact | References | Date | Notes |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | LISA | DistillBERT-base-uncased | 71.4 (0.4) | 70.7 (0.3) | 54.8 (0.2) | 54.7 (0.0) | Yu Wang | Paper / Code | March 13, 2022 | We have used all the default hyperparameters, except for the mix_alpha, which is tuned in [0.5, 2], and 2 is the final choice. |
| 2 | ERM (rand search) | DistillBERT-base-uncased | 72.8 (0.1) | 72.0 (0.1) | 56.0 (0.0) | 54.2 (0.8) | WILDS | Paper / Code | December 09, 2021 | Hyperparameters tuned via random search, consistent with the WILDS unlabeled data baselines. |
| 3 | ERM (grid search) | DistillBERT-base-uncased | 72.7 (0.1) | 71.9 (0.1) | 55.2 (0.7) | 53.8 (0.8) | WILDS | Paper / Code | July 15, 2021 | Hyperparameters tuned via grid search, consistent with other WILDS baselines that do not use unlabeled data. |
| 4 | Group DRO | DistillBERT-base-uncased | 70.7 (0.6) | 70.0 (0.5) | 54.7 (0.0) | 53.3 (0.0) | WILDS | Paper / Code | July 15, 2021 | |
| 5 | Fish | DistillBERT-base-uncased | 72.5 (0.0) | 71.7 (0.1) | 54.2 (0.8) | 53.3 (0.0) | Yuge Shi | Paper / Code | December 14, 2021 | |
| 6 | CORAL | DistillBERT-base-uncased | 72.0 (0.3) | 71.1 (0.3) | 54.7 (0.0) | 52.9 (0.8) | WILDS | Paper / Code | July 15, 2021 | |
| 7 | IRM | DistillBERT-base-uncased | 71.3 (0.5) | 70.3 (0.6) | 54.2 (0.8) | 52.4 (0.8) | WILDS | Paper / Code | July 15, 2021 | |
| 8 | Reweighted (Label) | DistillBERT-base-uncased | 68.9 (0.9) | 68.3 (0.9) | 52.1 (0.2) | 51.6 (0.8) | WILDS | Paper / Code | July 15, 2021 | |
With unlabeled data
Unlabeled data is available from the validation, target and extra domains.
| Rank | Algorithm | Model | Val Avg Acc | Test Avg Acc | Val 10% Acc | Test 10% Acc ▼ | Contact | References | Date | Notes |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ICON | DistillBERT-base-uncased | 72.7 (0.2) | 71.9 (0.1) | 55.2 (0.7) | 54.7 (0.0) | Nick Y. | Paper / Code | November 03, 2022 | Uses unlabeled data from the target domain. lr: 1.5581425972502133e-05*, self_training_lambda: 1*, self_training_threshold: 0.7681640736450283*. |
| 2 | AFN | DistillBERT-base-uncased | 73.0 (0.4) | 72.1 (0.3) | 56.0 (0.0) | 54.2 (0.8) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the target domain. |
| 3 | Masked LM | DistillBERT-base-uncased | 72.6 (0.5) | 71.7 (0.4) | 55.1 (0.8) | 53.5 (0.2) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the target domain. |
| 4 | CORAL | DistillBERT-base-uncased | 72.5 (0.1) | 71.7 (0.1) | 54.2 (0.8) | 53.3 (0.0) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the target domain. |
| 5 | DANN | DistillBERT-base-uncased | 72.6 (0.1) | 71.7 (0.1) | 54.7 (0.0) | 53.3 (0.0) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the target domain. |
| 6 | Pseudo-Label | DistillBERT-base-uncased | 72.5 (0.1) | 71.6 (0.1) | 54.2 (0.8) | 52.3 (1.1) | WILDS | Paper / Code | December 09, 2021 | Uses unlabeled data from the target domain. |
Py150
| Rank | Algorithm | Model | Test ID Method/Class Acc | Test ID All Acc | Test OOD Method/Class Acc ▼ | Test OOD All Acc | Contact | References | Date | Notes |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ERM (grid search) | CodeGPT | 75.4 (0.4) | 74.5 (0.4) | 67.9 (0.1) | 69.6 (0.1) | WILDS | Paper / Code | July 15, 2021 | Hyperparameters tuned via grid search, consistent with other WILDS baselines that do not use unlabeled data. |
| 2 | Group DRO | CodeGPT | 70.8 (0.0) | 71.0 (0.0) | 66.0 (0.1) | 67.9 (0.0) | WILDS | Paper / Code | July 15, 2021 | |
| 3 | CORAL | CodeGPT | 70.6 (0.0) | 70.8 (0.1) | 65.9 (0.1) | 67.9 (0.0) | WILDS | Paper / Code | July 15, 2021 | |
| 4 | IRM | CodeGPT | 67.3 (1.1) | 68.3 (0.7) | 64.3 (0.2) | 66.4 (0.1) | WILDS | Paper / Code | July 15, 2021 | |