Leaderboard

Overview

To submit, please read our submission guidelines.

Higher numbers are better for all metrics. In parentheses, we show corrected sample standard deviations across random replicates.
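
As a minimal sketch of how an entry of the form "mean (std)" could be produced from replicate scores, the snippet below computes the mean and the corrected sample standard deviation, i.e. with the n - 1 denominator. The function name `summarize` and the replicate values are hypothetical and are not taken from any submission.

```python
import statistics

def summarize(replicate_scores):
    """Return (mean, corrected sample std) for a list of per-replicate scores."""
    mean = statistics.mean(replicate_scores)
    # statistics.stdev divides by n - 1 (Bessel's correction).
    std = statistics.stdev(replicate_scores)
    return mean, std

# Hypothetical accuracies from three random replicates (illustrative only).
scores = [70.0, 72.0, 74.0]
mean, std = summarize(scores)
print(f"{mean:.1f} ({std:.1f})")
```

With NumPy, the equivalent call would be `numpy.std(scores, ddof=1)`; the default `ddof=0` gives the uncorrected value.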

A bold algorithm or model name indicates an official implementation submitted by an author of the original paper.

An asterisk next to a value indicates that the entry deviates from the official submission guidelines, for example because it uses a non-default model or additional pre-training data. The deviations are described in the notes in the dataset-specific leaderboards.

This overall leaderboard shows out-of-distribution test performance across all datasets. For each dataset, we highlight in green the best-performing algorithm that conforms to the official submission guidelines.
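
The selection rule described above can be sketched as follows, assuming cells are formatted as in the tables of this page: "value (std)", optionally followed by an asterisk, or "-" for no entry. The helper names `parse_cell` and `best_conforming` are hypothetical, and only four iWildCam cells are used, so the result applies to this subset only.

```python
def parse_cell(cell):
    """Parse a cell like '36.5 (0.9) *' into (value, std, deviates); '-' gives None."""
    cell = cell.strip()
    if cell == "-":
        return None
    deviates = cell.endswith("*")
    value_text, std_text = cell.rstrip("* ").split("(")
    return float(value_text), float(std_text.rstrip(") ")), deviates

def best_conforming(column):
    """Return the algorithm with the highest value among entries without an asterisk."""
    candidates = {}
    for name, cell in column.items():
        parsed = parse_cell(cell)
        if parsed is not None and not parsed[2]:
            candidates[name] = parsed[0]
    return max(candidates, key=candidates.get)

# A subset of iWildCam cells from the table without unlabeled data.
iwildcam = {
    "AutoFT": "52.0 (0.4) *",
    "ABSGD": "33.0 (0.6)",
    "Fish": "22.0 (1.8)",
    "C-Mixup": "-",
}
print(best_conforming(iwildcam))
```

AutoFT has the highest value in this subset but carries an asterisk, so it is skipped when choosing the highlighted entry.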

Without unlabeled data

Algorithm	Amazon (10% Acc)	Camelyon17 (Avg Acc)	CivilComments (Worst-Group Acc)	FMoW (Worst-Reg Acc)	GlobalWheat (Avg Domain Acc)	iWildCam (Macro F1)	OGB-MolPCBA (Avg Precision)	PovertyMap (Worst-U/R r)	Py150 (Mtd/Cls Acc)	RxRx1 (Avg Acc)	Contact	References
LISA	54.7 (0.0)	77.1 (6.9)	72.9 (1.0)	35.5 (0.81)	-	-	-	-	-	31.9 (1.0)	Yu Wang	Paper / Code
Fish	53.3 (0.0)	74.7 (7.1)	75.3 (0.6)	34.6 (0.18)	-	22.0 (1.8)	-	-	-	-	Yuge Shi	Paper / Code
ContriMix	-	94.6 (1.2)	-	-	-	-	-	-	-	-	Dinkar Juyal	Paper / Code
ERM w/ targeted aug	-	92.1 (3.1)	-	-	-	36.5 (0.9) *	-	-	-	-	Irena Gao	Paper / Code
IRMX (PAIR opt)	-	74.0 (7.2)	74.2 (1.4)	35.4 (1.3)	-	27.9 (0.9)	-	0.47 (0.09)	-	28.8 (0.0)	Yongqiang Chen	Paper / Code
IRMX	-	65.5 (8.3)	73.4 (1.4)	33.7 (0.95)	-	26.7 (1.1)	-	0.45 (0.05)	-	28.7 (0.2)	Yongqiang Chen	Paper / Code
DFR	-	-	72.5 (0.9)	42.8 (0.42) *	-	-	-	-	-	-	Pavel Izmailov	Paper / Code
CORAL	52.9 (0.8)	59.5 (7.7)	65.6 (1.3)	32.8 (0.66)	-	32.7 (0.2)	17.9 (0.5)	0.44 (0.07)	65.9 (0.1)	28.4 (0.3)	WILDS	Paper / Code
CGD	-	69.4 (7.9)	69.1 (1.9)	32.0 (2.26)	-	-	-	0.43 (0.04)	-	-	Vihari Piratla	Paper / Code
Group DRO	53.3 (0.0)	68.4 (7.3)	70.0 (2.0)	31.1 (1.66)	47.9 (2.0)	23.8 (2.0)	22.4 (0.6)	0.39 (0.06)	66.0 (0.1)	22.5 (0.3)	WILDS	Paper / Code
C-Mixup	-	-	-	-	-	-	-	0.53 (0.07)	-	-	Yiping Wang	Paper / Code
IID representation learning	-	-	-	-	-	-	-	-	-	39.2 (0.2)	Jiqing Wu	Paper / Code
ABSGD	-	-	-	-	-	33.0 (0.6)	-	-	-	-	Qi Qi	Paper / Code
ERM w/ data aug	-	82.0 (7.4)	-	35.7 (0.26)	-	32.2 (1.2)	-	0.49 (0.06)	-	-	WILDS	Paper / Code
ERM (CutMix)	-	-	-	-	-	-	-	-	-	38.4 (0.2)	Jiqing Wu	Paper / Code
ERM (more checkpoints)	-	-	-	34.8 (1.9)	-	32.0 (1.5)	-	-	-	-	Kazuki Irie	Paper / Code
ERM (grid search)	53.8 (0.8)	70.3 (6.4)	56.0 (3.6)	31.3 (0.17)	51.2 (1.8)	30.8 (1.3)	27.2 (0.3)	0.45 (0.06)	67.9 (0.1)	29.9 (0.4)	WILDS	Paper / Code
ERM (rand search)	54.2 (0.8)	70.8 (7.2)	-	34.1 (1.42)	50.5 (1.7)	30.6 (1.1)	28.3 (0.1)	0.5 (0.07)	-	-	WILDS	Paper / Code
IRM	52.4 (0.8)	64.2 (8.1)	66.3 (2.1)	32.8 (2.09)	-	15.1 (4.9)	15.6 (0.3)	0.43 (0.07)	64.3 (0.2)	8.2 (1.1)	WILDS	Paper / Code
ARM-BN	-	-	-	24.4 (0.54)	-	23.3 (2.8)	-	-	-	31.2 (0.1)	Marvin Zhang	Paper / Code
Test-time BN adaptation	-	-	-	30.0 (0.23)	-	13.8 (0.6)	-	-	-	20.1 (0.2)	Marvin Zhang	Paper / Code
AutoFT	-	-	-	51.8 (0.41) *	-	52.0 (0.4) *	-	-	-	-	Caroline Choi	Paper / Code
SGD (Freeze-Embed)	-	96.5 (0.4) *	-	50.3 (1.1) *	-	-	-	-	-	-	Ananya Kumar	Paper / Code
FLYP	-	-	-	-	-	46.0 (1.3) *	-	-	-	-	Sankalp Garg	Paper / Code
MBDG	-	93.3 (1.0) *	-	-	-	-	-	-	-	-	Alex Robey	Paper / Code
MixUp	-	63.5 (0.9) *	-	-	-	13.8 (0.8)	-	-	-	-	Olivia Wiles	Paper / Code
JTT	-	63.8 (1.4) *	-	-	-	11.0 (2.5)	-	-	-	-	Olivia Wiles	Paper / Code

With unlabeled data

Algorithm	Amazon (10% Acc)	Camelyon17 (Avg Acc)	CivilComments (Worst-Group Acc)	FMoW (Worst-Reg Acc)	GlobalWheat (Avg Domain Acc)	iWildCam (Macro F1)	OGB-MolPCBA (Avg Precision)	PovertyMap (Worst-U/R r)	Contact	References
ICON	54.7 (0.0)	93.8 (0.3)	68.8 (1.3)	39.9 (1.12)	52.3 (0.2)	34.5 (1.4)	28.3 (0.0)	0.49 (0.04)	Nick Y.	Paper / Code
Connect Later	-	95.0 (0.9)	-	-	-	36.8 (1.6)	-	-	Helen Qu	Paper / Code
ERM+SpAR	-	-	-	-	-	-	-	0.51 (0.1)	Ben Eyre	Paper / Code
C-Mixup+SpAR	-	-	-	-	-	-	-	0.52 (0.09)	Ben Eyre	Paper / Code
Noisy Student	-	86.7 (1.7)	-	37.8 (0.62)	49.3 (3.7)	32.1 (0.7)	27.5 (0.1)	0.42 (0.11)	WILDS	Paper / Code
SwAV	-	91.4 (2.0)	-	36.3 (1.01)	-	29.0 (2.0)	-	0.45 (0.05)	WILDS	Paper / Code
AFN	54.2 (0.8)	83.2 (6.2)	-	38.3 (0.95)	-	30.8 (0.5)	14.9 (1.3)	0.39 (0.08)	WILDS	Paper / Code
DANN	53.3 (0.0)	68.4 (9.2)	-	34.6 (1.71)	-	31.9 (1.4)	20.4 (0.8)	0.33 (0.1)	WILDS	Paper / Code
Pseudo-Label	52.3 (1.1)	67.7 (8.2)	66.9 (2.6)	33.7 (0.24)	42.9 (2.3)	30.3 (0.4)	19.7 (0.1)	-	WILDS	Paper / Code
FixMatch	-	71.0 (4.9)	-	32.6 (2.05)	-	31.0 (1.3)	-	0.3 (0.11)	WILDS	Paper / Code
CORAL	53.3 (0.0)	77.9 (6.6)	-	33.7 (0.23)	-	27.9 (0.4)	26.6 (0.2)	0.36 (0.08)	WILDS	Paper / Code
Masked LM	53.5 (0.2)	-	65.7 (2.3)	-	-	-	-	-	WILDS	Paper / Code

Below, we list individual leaderboards with more details on each submission.

iWildCam

Without unlabeled data

Rank	Algorithm	Model	Test ID Macro F1	Test ID Avg Acc	Test OOD Macro F1 ▼	Test OOD Avg Acc	Contact	References	Date	Notes
1	AutoFT	CLIP ViT-L/14 @ 336px	63.5 (0.5) *	82.9 (0.1) *	52.0 (0.4) *	83.1 (0.0) *	Caroline Choi	Paper / Code	January 28, 2024	lr: log-scale [1e-7, 1e-3], wd: log-scale [0, 1.0], seed: [0, 100], loss weights in log-scale [1e-4, 10.0]
2	FLYP	CLIP ViT-L/14 @ 336px	59.9 (0.7) *	76.2 (0.3) *	46.0 (1.3) *	76.2 (0.4) *	Sankalp Garg	Paper / Code	December 05, 2022	lr: [0.0001, 0.00001, 0.000001], wd: [0.1, 0.1, 0.2]. The hyperparameter search was done on a smaller model, CLIP ViT-B/16; the best hyperparameters were also used for ViT-L/14 @ 336px.
3	Model Soups (CLIP ViT-L)	ViT-L	57.6 (1.9) *	79.1 (0.4) *	43.3 (1.0) *	79.3 (0.3) *	Mitchell Wortsman	Paper / Code	March 12, 2022	Model soups on top of a random hyperparameter search over LR, iterations, data augmentation, label smoothing.
4	ERM (CLIP ViT-L)	ViT-L	55.8 (1.9) *	77.0 (0.7) *	41.4 (0.5) *	78.3 (1.1) *	Mitchell Wortsman	Paper / Code	July 28, 2022	Random hyperparameter search over LR, iterations, data augmentation, label smoothing.
5	ERM	PNASNet-5-Large	52.8 (1.4) *	77.3 (0.7) *	38.5 (0.6) *	78.3 (1.4) *	John Miller	Paper / Code	July 20, 2021	Does not use the default model.
6	ERM w/ targeted aug	ResNet50	50.2 (1.6) *	77.8 (1.0) *	36.5 (0.9) *	74.4 (2.0) *	Irena Gao	Paper / Code	November 30, 2022	Uses copy-paste data augmentations, which rely on MegaDetector.
7	ABSGD	ResNet50	47.5 (1.6)	74.8 (0.5)	33.0 (0.6)	72.7 (1.8)	Qi Qi	Paper / Code	October 13, 2022	The learning rates are tuned in {3e-05, *4e-05}. The hyperparameters for ABSGD are tuned between {1.1, 1.5, 2} and gamma is 0.9. Trained for 36 epochs with a batch size of 16. The learning rate is decayed at the 18th epoch by a factor of 2
8	CORAL	ResNet50	43.6 (3.3)	73.8 (0.3)	32.7 (0.2)	73.3 (4.3)	WILDS	Paper / Code	July 15, 2021
9	COSMO (Classification Of Species using Multimodal cOntext)	ResNet50 + KG link prediction model	46.4 (1.5) *	75.7 (0.2) *	32.3 (0.9) *	74.5 (3.6) *	Vardaan Pahuja	Paper / Code	February 28, 2025	This model uses additional data such as taxonomy, GPS, and time coordinates to train a multimodal KG link prediction model. At inference, only the image is used to make the prediction. We use the original best hyperparameters for the ERM model as published in the paper without any further tuning.
10	ERM w/ data aug	ResNet50	47.0 (1.4)	76.9 (0.6)	32.2 (1.2)	73.0 (0.4)	WILDS	Paper / Code	December 09, 2021	Implements RandAugment. Hyperparameters tuned via random search, consistent with the WILDS unlabeled data baselines.
11	ERM (more checkpoints)	ResNet50	47.9 (2.6)	76.2 (0.1)	32.0 (1.5)	69.0 (0.4)	Kazuki Irie	Paper / Code	February 10, 2022	We used the default hyper-parameters from the official code base, but conducted cross validation every 1000 training steps.
12	ERM (grid search)	ResNet50	47.1 (1.5)	75.7 (0.4)	30.8 (1.3)	71.5 (2.6)	WILDS	Paper / Code	July 15, 2021	Hyperparameters tuned via grid search, consistent with other WILDS baselines that do not use unlabeled data.
13	ERM (rand search)	ResNet50	46.7 (0.6)	74.9 (1.2)	30.6 (1.1)	72.5 (3.2)	WILDS	Paper / Code	December 09, 2021	Hyperparameters tuned via random search, consistent with the WILDS unlabeled data baselines.
14	IRMX (PAIR opt)	ResNet50	43.9 (2.0)	74.9 (1.1)	27.9 (0.9)	67.6 (2.5)	Yongqiang Chen	Paper / Code	March 05, 2023	preference: [(1,1e8,1e12), (1,1e10,1e12), (1,1e12,1e12)], neg_irmv1_adj_rate: [1e-4, 1e-2, 1]
15	IRMX	ResNet50	43.6 (1.2)	74.3 (0.7)	26.7 (1.1)	66.7 (1.5)	Yongqiang Chen	Paper / Code	March 04, 2023	penalty_weight: [0.01, 0.1, 1.0*, 10.0, 100.0]
16	Group DRO	ResNet50	37.5 (1.9)	71.6 (2.7)	23.8 (2.0)	72.7 (2.0)	WILDS	Paper / Code	July 15, 2021
17	ARM-BN	ResNet50	27.5 (5.4)	62.0 (4.0)	23.3 (2.8)	70.2 (2.4)	Marvin Zhang	Paper / Code	April 19, 2022	Requires test data to be batched by groups
18	Fish	ResNet50	40.3 (0.6)	73.8 (0.1)	22.0 (1.8)	64.7 (2.6)	Yuge Shi	Paper / Code	December 14, 2021
19	IRM	ResNet50	22.4 (7.7)	59.8 (8.2)	15.1 (4.9)	59.7 (3.8)	WILDS	Paper / Code	July 15, 2021
20	MixUp	ResNet50	31.2 (3.1)	66.1 (1.8)	13.8 (0.8)	48.6 (1.1)	Olivia Wiles	Paper / Code	June 16, 2022	lr: [0.01, 0.001, 0.0001]; alpha: [0.2, 0.5, 1.0]
21	Test-time BN adaptation	ResNet50	12.0 (0.3)	37.2 (0.7)	13.8 (0.6)	46.6 (0.9)	Marvin Zhang	Paper / Code	April 19, 2022	Requires test data to be batched by groups
22	JTT	ResNet50	32.6 (4.4)	64.9 (2.8)	11.0 (2.5)	47.4 (2.2)	Olivia Wiles	Paper / Code	June 16, 2022	lr: [0.01, 0.001, 0.0001]; lambda: [0.2, 2, 20, 200]

With unlabeled data

Unlabeled data is available from the extra domain.

Rank	Algorithm	Model	Test ID Macro F1	Test ID Avg Acc	Test OOD Macro F1 ▼	Test OOD Avg Acc	Contact	References	Date	Notes
1	Connect Later	ResNet50	51.5 (2.0)	78.9 (0.8)	36.8 (1.6)	76.5 (1.7)	Helen Qu	Paper / Code	March 01, 2024	Linear probing LR=0.0018804142350329204. Fine-tuning LR=0.0002504172429234668. TRANSFORM_P=0.5473471004493135. These values were chosen from 10 trials of a random search in lp_lr 10^Uni(-3, -2), ft_lr 10^Uni(-5, -2), transform_p Uni(0.5, 0.9).
2	ICON	ResNet50	50.6 (1.3)	77.5 (0.2)	34.5 (1.4)	72.0 (0.2)	Nick Y.	Paper / Code	November 03, 2022	Uses unlabeled data from extra domains. lr: 0.0002253717686699905, dropout: 0.5; follows NoisyStudent.
3	Noisy Student	ResNet50	48.3 (1.6)	77.6 (1.4)	32.1 (0.7)	71.0 (3.1)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from extra domains.
4	DANN	ResNet50	48.5 (2.8)	77.2 (0.8)	31.9 (1.4)	70.7 (2.6)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from extra domains.
5	FixMatch	ResNet50	46.3 (0.5)	78.1 (0.5)	31.0 (1.3)	71.9 (3.1)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from extra domains.
6	AFN	ResNet50	46.8 (0.8)	77.2 (0.5)	30.8 (0.5)	74.4 (1.6)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from extra domains.
7	Pseudo-Label	ResNet50	47.3 (0.4)	77.5 (0.1)	30.3 (0.4)	68.7 (3.4)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from extra domains.
8	SwAV	ResNet50	47.3 (1.4)	74.8 (1.3)	29.0 (2.0)	63.4 (1.5)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from extra domains.
9	CORAL	ResNet50	40.5 (1.4)	77.0 (0.3)	27.9 (0.4)	69.7 (0.8)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from extra domains.
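
The Connect Later notes above describe a 10-trial random search with lp_lr drawn from 10^Uni(-3, -2), ft_lr from 10^Uni(-5, -2), and transform_p from Uni(0.5, 0.9). A minimal sketch of sampling from that search space is shown below; the function name `sample_config` and the fixed seed are assumptions made for illustration, and this is not the authors' code.

```python
import random

def sample_config(rng):
    """Draw one configuration from the search space quoted in the Connect Later notes."""
    return {
        "lp_lr": 10 ** rng.uniform(-3, -2),    # linear-probing LR, log-uniform
        "ft_lr": 10 ** rng.uniform(-5, -2),    # fine-tuning LR, log-uniform
        "transform_p": rng.uniform(0.5, 0.9),  # TRANSFORM_P in the notes, uniform
    }

rng = random.Random(0)  # fixed seed is an assumption, used only for reproducibility
trials = [sample_config(rng) for _ in range(10)]
for config in trials:
    print(config)
```

Sampling the exponent uniformly, rather than the learning rate itself, spreads the trials evenly across orders of magnitude.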

Camelyon17

Without unlabeled data

Rank	Algorithm	Model	Val Acc	Test Acc ▼	Contact	References	Date	Notes
1	SGD (Freeze-Embed)	CLIP ViT-L	95.2 (0.3) *	96.5 (0.4) *	Ananya Kumar	Paper / Code	October 18, 2022	Uses a pretrained model. lr: [3e-5, 1e-4, 3e-4*, 1e-3, 3e-3, 1e-2]
2	ContriMix	DenseNet121	91.9 (0.7)	94.6 (1.2)	Dinkar Juyal	Paper / Code	June 22, 2023	Adam, lr [0.001, 0.0001], 5 attributes, 4 mixes, attr_cons_weight 0.1, self_recon_weight 0.1, cont_cons_weight 0.3, entropy_weight 0.5
3	MBDG	DenseNet121	88.1 (1.8) *	93.3 (1.0) *	Alex Robey	Paper / Code	March 17, 2022	Uses a pretrained model. lr: [1e-4, 1e-3, 1e-2], wd: [0, 1e-3, 1e-3], gamma (mbdg margin): [0.01, 0.1, 0.5], eta_d (mbdg dual step size): [5e-3, 5e-2, 5e-1]
4	ERM w/ targeted aug	DenseNet121	92.7 (0.7)	92.1 (3.1)	Irena Gao	Paper / Code	November 30, 2022	lr: ~0.03, batch size 168. Uses specialized H&E staining color jitter.
5	ERM w/ H&E jitter	se_resnext101_32x4d	88.0 (4.2) *	91.6 (1.9) *	Rohan Taori	Paper / Code	July 19, 2021	Uses specialized H&E staining color jitter. Does not use the default model.
6	ERM w/ data aug	DenseNet121	90.6 (1.2)	82.0 (7.4)	WILDS	Paper / Code	December 09, 2021	Implements RandAugment. Uses color augmentation as part of RandAugment. Hyperparameters tuned via random search, consistent with the WILDS unlabeled data baselines.
7	LISA	DenseNet121	81.8 (1.4)	77.1 (6.9)	Yu Wang	Paper / Code	March 13, 2022	We have used all the default hyperparameters, except for the mix_alpha, which is tuned in [0.5, 2], and 2 is the final choice.
8	Fish	DenseNet121	83.9 (1.2)	74.7 (7.1)	Yuge Shi	Paper / Code	December 14, 2021
9	IRMX (PAIR opt)	DenseNet121	84.3 (1.6)	74.0 (7.2)	Yongqiang Chen	Paper / Code	March 05, 2023	preference: [(1,1e8,1e12), (1,1e10,1e12), (1,1e12,1e12)], neg_irmv1_adj_rate: [1e-4, 1e-2, 1]
10	ERM (rand search)	DenseNet121	85.8 (1.9)	70.8 (7.2)	WILDS	Paper / Code	December 09, 2021	Hyperparameters tuned via random search, consistent with the WILDS unlabeled data baselines.
11	ERM (grid search)	DenseNet121	84.9 (3.1)	70.3 (6.4)	WILDS	Paper / Code	July 15, 2021	Hyperparameters tuned via grid search, consistent with other WILDS baselines that do not use unlabeled data.
12	CGD	DenseNet121	86.8 (1.4)	69.4 (7.9)	Vihari Piratla	Paper / Code	April 10, 2022	No hyperparameter tuning. CG step size: 0.05. LR, optimizer, decay rate etc. all set to default.
13	Group DRO	DenseNet121	85.5 (2.2)	68.4 (7.3)	WILDS	Paper / Code	July 15, 2021
14	IRMX	DenseNet121	84.7 (2.0)	65.5 (8.3)	Yongqiang Chen	Paper / Code	March 04, 2023	penalty_weight: [0.01, 0.1, 1.0*, 10.0, 100.0]
15	IRM	DenseNet121	86.2 (1.4)	64.2 (8.1)	WILDS	Paper / Code	July 15, 2021
16	JTT	ResNet50	66.9 (1.5) *	63.8 (1.4) *	Olivia Wiles	Paper / Code	June 16, 2022	lr: [0.01, 0.001, 0.0001]; lambda: [0.2, 2, 20, 200]
17	MixUp	ResNet50	65.5 (1.4) *	63.5 (0.9) *	Olivia Wiles	Paper / Code	June 16, 2022	lr: [0.01, 0.001, 0.0001]; alpha: [0.2, 0.5, 1.0]
18	CORAL	DenseNet121	86.2 (1.4)	59.5 (7.7)	WILDS	Paper / Code	July 15, 2021

With unlabeled data

Unlabeled data is available from the source, validation and target domains.

Rank	Algorithm	Model	Val Acc	Test Acc ▼	Contact	References	Date	Notes
1	Connect Later	DenseNet121	93.9 (0.4)	95.0 (0.9)	Helen Qu	Paper / Code	March 01, 2024	Linear probing LR=0.006222466404167087. Fine-tuning LR=0.003324366874654924. TRANSFORM_P=0.8260829829811762. AUG_STRENGTH=0.0995477425665205. These values were chosen from 10 trials of a random search in lp_lr 10^Uni(-3, -2), ft_lr 10^Uni(-5, -2), transform_p Uni(0.5, 0.9), aug strength Uni(0.05, 0.1).
2	ICON	DenseNet121	90.1 (0.4)	93.8 (0.3)	Nick Y.	Paper / Code	November 03, 2022	Uses unlabeled data from the target domain. lr: 0.0027267495451878732, dropout: [0.0, 0.5].
3	SwAV	DenseNet121	92.3 (0.4)	91.4 (2.0)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the target domain. Uses color augmentation as part of RandAugment.
4	Noisy Student	DenseNet121	93.2 (0.5)	86.7 (1.7)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the target domain. Uses color augmentation as part of RandAugment.
5	AFN	DenseNet121	91.1 (0.9)	83.2 (6.2)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the target domain. Uses color augmentation as part of RandAugment.
6	CORAL	DenseNet121	90.4 (0.9)	77.9 (6.6)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the target domain. Uses color augmentation as part of RandAugment.
7	FixMatch	DenseNet121	91.3 (1.1)	71.0 (4.9)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the target domain. Uses color augmentation as part of RandAugment.
8	DANN	DenseNet121	86.9 (2.2)	68.4 (9.2)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the target domain. Uses color augmentation as part of RandAugment.
9	Pseudo-Label	DenseNet121	91.3 (1.3)	67.7 (8.2)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the target domain. Uses color augmentation as part of RandAugment.

RxRx1

Rank	Algorithm	Model	Val Acc	Test ID Acc	Test Acc ▼	Contact	References	Date	Notes
1	IID representation learning	ResNet50	23.9 (0.3)	49.9 (0.5)	39.2 (0.2)	Jiqing Wu	Paper / Code	February 20, 2022	Uses CutMix regularizer.
2	ERM (CutMix)	ResNet50	23.6 (0.3)	47.4 (1.0)	38.4 (0.2)	Jiqing Wu	Paper / Code	February 20, 2022	Uses CutMix regularizer.
3	LISA	ResNet50	20.1 (0.4)	41.1 (1.3)	31.9 (1.0)	Yu Wang	Paper / Code	March 13, 2022	We have used all the default hyperparameters, except for the mix_alpha, which is tuned in [0.5, 2], and 2 is the final choice.
4	ARM-BN	ResNet50	20.9 (0.2)	34.9 (0.2)	31.2 (0.1)	Marvin Zhang	Paper / Code	April 19, 2022	Requires test data to be batched by groups
5	ERM (grid search)	ResNet50	19.4 (0.2)	35.9 (0.4)	29.9 (0.4)	WILDS	Paper / Code	July 15, 2021	Hyperparameters tuned via grid search, consistent with other WILDS baselines that do not use unlabeled data.
6	IRMX (PAIR opt)	ResNet50	18.8 (0.2)	34.7 (0.3)	28.8 (0.0)	Yongqiang Chen	Paper / Code	March 05, 2023	preference: [(1,1e8,1e12), (1,1e10,1e12), (1,1e12,1e12)], neg_irmv1_adj_rate: [1e-4, 1e-2, 1*]
7	IRMX	ResNet50	18.9 (0.2)	34.7 (0.2)	28.7 (0.2)	Yongqiang Chen	Paper / Code	March 04, 2023	penalty_weight: [0.01, 0.1, 1.0*, 10.0, 100.0]
8	CORAL	ResNet50	18.5 (0.4)	34.0 (0.3)	28.4 (0.3)	WILDS	Paper / Code	July 15, 2021
9	Group DRO	ResNet50	15.2 (0.1)	28.1 (0.3)	22.5 (0.3)	WILDS	Paper / Code	July 15, 2021
10	Test-time BN adaptation	ResNet50	12.3 (0.2)	21.5 (0.2)	20.1 (0.2)	Marvin Zhang	Paper / Code	April 19, 2022	Requires test data to be batched by groups
11	IRM	ResNet50	5.6 (0.4)	9.9 (1.4)	8.2 (1.1)	WILDS	Paper / Code	July 15, 2021

OGB-MolPCBA

Without unlabeled data

Rank	Algorithm	Model	Val Avg Precision	Test Avg Precision ▼	Contact	References	Date	Notes
1	ERM (rand search)	GIN-virtual	29.3 (0.3)	28.3 (0.1)	WILDS	Paper / Code	December 09, 2021	Hyperparameters tuned via random search, consistent with the WILDS unlabeled data baselines.
2	ERM (grid search)	GIN-virtual	27.8 (0.1)	27.2 (0.3)	WILDS	Paper / Code	July 15, 2021	Hyperparameters tuned via grid search, consistent with other WILDS baselines that do not use unlabeled data.
3	Group DRO	GIN-virtual	23.1 (0.6)	22.4 (0.6)	WILDS	Paper / Code	July 15, 2021
4	CORAL	GIN-virtual	18.4 (0.2)	17.9 (0.5)	WILDS	Paper / Code	July 15, 2021
5	IRM	GIN-virtual	15.8 (0.2)	15.6 (0.3)	WILDS	Paper / Code	July 15, 2021

With unlabeled data

Unlabeled data is available from the source, validation and target domains.

Rank	Algorithm	Model	Val Avg Precision	Test Avg Precision ▼	Contact	References	Date	Notes
1	ICON	GIN-virtual	29.6 (0.0)	28.3 (0.0)	Nick Y.	Paper / Code	November 03, 2022	Uses unlabeled data from the target domain. lr: 0.000398452164375177, dropout: 0.0.
2	Noisy Student	GIN-virtual	28.9 (0.1)	27.5 (0.1)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the target domain.
3	CORAL	GIN-virtual	27.0 (0.4)	26.6 (0.2)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the target domain.
4	DANN	GIN-virtual	20.7 (0.8)	20.4 (0.8)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the target domain.
5	Pseudo-Label	GIN-virtual	21.9 (0.6)	19.7 (0.1)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the target domain.
6	AFN	GIN-virtual	15.1 (1.3)	14.9 (1.3)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the target domain.

GlobalWheat

Without unlabeled data

Rank	Algorithm	Model	Val Acc	Test Acc ▼	Contact	References	Date	Notes
1	ERM (grid search)	Faster R-CNN	68.6 (0.4)	51.2 (1.8)	WILDS	Paper / Code	July 15, 2021	Hyperparameters tuned via grid search, consistent with other WILDS baselines that do not use unlabeled data.
2	ERM (rand search)	Faster R-CNN	68.7 (0.9)	50.5 (1.7)	WILDS	Paper / Code	December 09, 2021	Hyperparameters tuned via random search, consistent with the WILDS unlabeled data baselines.
3	Group DRO	Faster R-CNN	66.2 (0.4)	47.9 (2.0)	WILDS	Paper / Code	July 15, 2021

With unlabeled data

Unlabeled data is available from the source, validation, target and extra domains.

Rank	Algorithm	Model	Val Acc	Test Acc ▼	Contact	References	Date	Notes
1	ICON	Faster R-CNN	68.9 (0.3)	52.3 (0.2)	Nick Y.	Paper / Code	November 03, 2022	Uses unlabeled data from the target domain. lr: 0.000001*.
2	Noisy Student	Faster R-CNN	68.9 (0.4)	49.3 (3.7)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the target domain.
3	Pseudo-Label	Faster R-CNN	65.0 (0.7)	42.9 (2.3)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the target domain.

CivilComments

Without unlabeled data

Rank	Algorithm	Model	Val Avg Acc	Val Worst-Group Acc	Test Avg Acc	Test Worst-Group Acc ▼	Contact	References	Date	Notes
1	Fish	DistilBERT-base-uncased	88.9 (0.6)	70.5 (1.1)	89.3 (0.3)	75.3 (0.6)	Yuge Shi	Paper / Code	December 14, 2021
2	IRMX (PAIR opt)	DistilBERT-base-uncased	88.2 (1.8)	73.2 (0.7)	88.0 (1.8)	74.2 (1.4)	Yongqiang Chen	Paper / Code	March 05, 2023	preference: [(1,1e8,1e12), (1,1e10,1e12), (1,1e12,1e12)], neg_irmv1_adj_rate: [1e-4, 1e-2, 1], lr: [1e-6, 2e-6*, 1e-5, 2e-5]
3	IRMX	DistilBERT-base-uncased	87.2 (2.3)	71.9 (1.7)	87.0 (2.3)	73.4 (1.4)	Yongqiang Chen	Paper / Code	March 04, 2023	penalty_weight: [0.01, 0.1, 1.0, 10.0, 100.0], lr: [1e-6, 2e-6, 1e-5, 2e-5]
4	LISA	DistilBERT-base-uncased	90.3 (0.3)	71.2 (0.9)	90.1 (0.3)	72.9 (1.0)	Yu Wang	Paper / Code	March 13, 2022	We have used all the default hyperparameters, except for the mix_alpha, which is tuned in [0.5, 2], and 2 is the final choice.
5	DFR	DistilBERT-base-uncased	87.5 (0.7)	71.7 (0.9)	87.3 (0.7)	72.5 (0.9)	Pavel Izmailov	Paper / Code	September 26, 2022	Uses default WILDS training script for the base models, dropping 20% of the data from the training set for last layer retraining. We then retrain the last layer on the left-out data, with two hyper-parameters: regularization coefficient C: [1., 0.3, 0.1, 0.07, 0.03, 0.01, 0.003] and bias vector correction t: [-0.15, -0.1, -0.05, 0., 0.05, 0.1, 0.15] according to the best WGA on the validation set. We use all the group identities for retraining the last layer. We assign a group [0-8] to each example according to its attributes; if multiple attributes are present, we use the least common attribute (with the smallest number of occurrences) as the group for that example.
6	Group DRO	DistilBERT-base-uncased	90.1 (0.4)	67.7 (1.8)	89.9 (0.5)	70.0 (2.0)	WILDS	Paper / Code	July 15, 2021	Groups data according to (label, Black).
7	Reweighted (Label)	DistilBERT-base-uncased	90.1 (0.4)	65.9 (1.8)	89.8 (0.4)	69.2 (0.9)	WILDS	Paper / Code	July 15, 2021	Groups data according to labels.
8	Group DRO (Label)	DistilBERT-base-uncased	90.4 (0.4)	65.0 (3.8)	90.2 (0.3)	69.1 (1.8)	WILDS	Paper / Code	July 15, 2021	Groups data according to labels.
9	CGD	DistilBERT-base-uncased	89.6 (0.5)	68.3 (1.6)	89.6 (0.4)	69.1 (1.9)	Vihari Piratla	Paper / Code	April 10, 2022	CG step size: [0.005, 0.01, 0.05, 0.1]; 0.05 performed the best. LR, optimizer, decay rate etc. all set to default.
10	DFR (label, Black)	DistilBERT-base-uncased	88.1 (1.1)	69.9 (1.0)	87.9 (1.2)	68.2 (2.3)	Pavel Izmailov	Paper / Code	September 26, 2022	Uses default WILDS training script for the base models, dropping 20% of the data from the training set for last layer retraining. We then retrain the last layer on the left-out data, with two hyper-parameters: regularization coefficient C: [1., 0.3, 0.1, 0.07, 0.03, 0.01, 0.003] and bias vector correction t: [-0.15, -0.1, -0.05, 0., 0.05, 0.1, 0.15] according to the best WGA on the validation set. We only use the (label, Black) attributes to form the groups for last layer retraining.
11	IRM	DistilBERT-base-uncased	89.0 (0.7)	65.9 (2.8)	88.8 (0.7)	66.3 (2.1)	WILDS	Paper / Code	July 15, 2021	Groups data according to (label, Black).
12	Reweighted (Label x Black)	DistilBERT-base-uncased	89.6 (0.6)	66.6 (1.5)	89.2 (0.6)	66.2 (1.2)	WILDS	Paper / Code	July 15, 2021	Groups data according to (label, Black).
13	CORAL	DistilBERT-base-uncased	88.9 (0.6)	64.7 (1.4)	88.7 (0.5)	65.6 (1.3)	WILDS	Paper / Code	July 15, 2021	Groups data according to (label, Black).
14	ERM (grid search)	DistilBERT-base-uncased	92.3 (0.2)	50.5 (1.9)	92.2 (0.1)	56.0 (3.6)	WILDS	Paper / Code	July 15, 2021	Hyperparameters tuned via grid search, consistent with other WILDS baselines that do not use unlabeled data.
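
The group assignment described in the DFR notes above (use the least common attribute when several are present) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `assign_groups`, the attribute names, the tie-breaking rule, and the choice to place examples with no attribute in a separate final group are all hypothetical.

```python
from collections import Counter

def assign_groups(examples, attributes):
    """Assign each example one group index using its least common present attribute.

    examples: list of sets of attribute names present on each example.
    Examples with no attribute get index len(attributes) (an assumption).
    """
    # Count how often each attribute occurs across the whole dataset.
    counts = Counter(attr for present in examples for attr in present)
    index = {attr: i for i, attr in enumerate(attributes)}
    groups = []
    for present in examples:
        if not present:
            groups.append(len(attributes))
        else:
            # Ties are broken by attribute order (also an assumption).
            rarest = min(present, key=lambda a: (counts[a], index[a]))
            groups.append(index[rarest])
    return groups

attributes = ["a", "b", "c"]  # hypothetical attribute names
examples = [{"a"}, {"a", "b"}, {"a", "c"}, {"b"}, set()]
print(assign_groups(examples, attributes))
```

In this toy data, "a" occurs three times, "b" twice, and "c" once, so an example carrying both "a" and "b" is assigned to the group of "b".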

With unlabeled data

Unlabeled data is available from the extra domain.

Rank	Algorithm	Model	Val Avg Acc	Val Worst-Group Acc	Test Avg Acc	Test Worst-Group Acc ▼	Contact	References	Date	Notes
1	ICON	DistilBERT-base-uncased	89.9 (0.1)	66.4 (0.7)	89.7 (0.1)	68.8 (1.3)	Nick Y.	Paper / Code	November 03, 2022	Uses unlabeled data from the same distribution. lr: 7.324042204632364e-05, dropout: 0.0.
2	Pseudo-Label	DistilBERT-base-uncased	90.5 (0.6)	63.9 (1.7)	90.3 (0.5)	66.9 (2.6)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the same distribution.
3	Masked LM	DistilBERT-base-uncased	89.7 (1.1)	64.5 (2.5)	89.4 (1.2)	65.7 (2.3)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the same distribution.

FMoW

Without unlabeled data

Rank	Algorithm	Model	Val Avg Acc	Test Avg Acc	Val Worst-region Acc	Test Worst-region Acc ▼	Contact	References	Date	Notes
1	AutoFT	CLIP ViT-L/14 @ 336px	73.5 (0.11) *	68.2 (0.05) *	66.0 (0.25) *	51.8 (0.41) *	Caroline Choi	Paper / Code	January 28, 2024	lr: log-scale [1e-7, 1e-3], wd: log-scale[0, 1.0], seed: [0, 100], loss weights in log-scale[1e-4, 10.0]
2	SGD (Freeze-Embed)	CLIP ViT-L/14 @ 336px	73.7 (0.34) *	68.3 (0.42) *	60.0 (0.07) *	50.3 (1.1) *	Ananya Kumar	Paper / Code	October 18, 2022	lr: [3e-5, 1e-4, 3e-4, 1e-3, 3e-3, 1e-2]
3	Model Soups (CLIP ViT-L)	ViT-L	75.7 (0.07) *	69.5 (0.08) *	59.8 (0.43) *	47.6 (0.33) *	Mitchell Wortsman	Paper / Code	March 12, 2022	Model soups on top of a random hyperparameter search over LR, iterations, data augmentation, label smoothing.
4	ERM (CLIP ViT-L)	ViT-L	73.6 (0.23) *	66.9 (0.17) *	59.5 (1.31) *	46.1 (0.59) *	Mitchell Wortsman	Paper / Code	July 28, 2022	Random hyperparameter search over LR, iterations, data augmentation, label smoothing.
5	DFR	DenseNet121	68.4 (1.32) *	53.4 (0.44) *	64.3 (1.39) *	42.8 (0.42) *	Pavel Izmailov	Paper / Code	September 26, 2022	Uses the OOD validation set to retrain the last layer of the model. Trains the base model with standard ERM training scripts from the WILDS repo, and only tunes the regularization strength parameter C: [1.*, 0.3, 0.1, 0.07, 0.03, 0.01, 0.003] for last layer retraining.
6	ERM w/ data aug	DenseNet121	62.1 (0.23)	55.5 (0.42)	53.2 (0.61)	35.7 (0.26)	WILDS	Paper / Code	December 09, 2021	Implements RandAugment. Hyperparameters tuned via random search, consistent with the WILDS unlabeled data baselines.
7	LISA	DenseNet121	58.7 (1.12)	52.8 (1.15)	48.7 (0.92)	35.5 (0.81)	Yu Wang	Paper / Code	March 13, 2022	We have used all the default hyperparameters, except for the mix_alpha, which is tuned in [0.5, 2], and 2 is the final choice.
8	IRMX (PAIR opt)	DenseNet121	58.6 (0.46)	52.7 (0.57)	52.3 (1.21)	35.4 (1.3)	Yongqiang Chen	Paper / Code	March 05, 2023	preference: [(1,1e8,1e12), (1,1e10,1e12), (1,1e12,1e12)], neg_irmv1_adj_rate: [1e-4, 1e-2, 1]
9	ERM	se_resnext101_32x4d	62.1 (0.24) *	55.5 (0.14) *	51.3 (2.93) *	35.0 (0.78) *	John Miller	Paper / Code	July 15, 2021	Does not use the default model.
10	ERM (more checkpoints)	DenseNet121	62.0 (0.06)	55.6 (0.23)	52.5 (1.25)	34.8 (1.9)	Kazuki Irie	Paper / Code	February 10, 2022	batch_size: [20, 32, 64], lr: [1e-4, 3e-4]. We conducted cross validation every 200 training steps.
11	Fish	DenseNet121	57.8 (0.15)	51.8 (0.32)	49.5 (2.34)	34.6 (0.18)	Yuge Shi	Paper / Code	December 14, 2021
12	ERM (rand search)	DenseNet121	60.6 (0.57)	54.0 (0.4)	52.6 (0.25)	34.1 (1.42)	WILDS	Paper / Code	December 09, 2021	Hyperparameters tuned via random search, consistent with the WILDS unlabeled data baselines.
13	IRMX	DenseNet121	58.7 (1.04)	52.5 (0.47)	52.4 (1.06)	33.7 (0.95)	Yongqiang Chen	Paper / Code	March 04, 2023	penalty_weight: [0.01, 0.1, 1.0*, 10.0, 100.0]
14	IRM	DenseNet121	56.1 (0.61)	50.4 (0.75)	49.7 (0.97)	32.8 (2.09)	WILDS	Paper / Code	July 15, 2021
15	CORAL	DenseNet121	56.5 (0.15)	50.1 (0.07)	48.9 (1.31)	32.8 (0.66)	WILDS	Paper / Code	July 15, 2021
16	CGD	DenseNet121	57.0 (1.03)	50.6 (1.39)	49.8 (1.04)	32.0 (2.26)	Vihari Piratla	Paper / Code	April 10, 2022	CG step size: [0.05, 0.01, 0.2]; 0.2 performed the best. LR, optimizer, decay rate etc. all set to default.
17	ERM (grid search)	DenseNet121	59.2 (0.07)	52.7 (0.23)	49.8 (0.36)	31.3 (0.17)	WILDS	Paper / Code	July 15, 2021	Hyperparameters tuned via grid search, consistent with other WILDS baselines that do not use unlabeled data.
18	Group DRO	DenseNet121	57.6 (0.7)	51.2 (0.38)	49.4 (0.45)	31.1 (1.66)	WILDS	Paper / Code	July 15, 2021
19	Test-time BN adaptation	DenseNet121	57.9 (0.36)	51.5 (0.25)	47.8 (0.52)	30.0 (0.23)	Marvin Zhang	Paper / Code	April 19, 2022	Requires test data to be batched by groups
20	ARM-BN	DenseNet121	48.0 (0.65)	42.1 (0.26)	38.9 (2.17)	24.4 (0.54)	Marvin Zhang	Paper / Code	April 19, 2022	Requires test data to be batched by groups

With unlabeled data

Unlabeled data is available from the source, validation and target domains.

Rank	Algorithm	Model	Val Avg Acc	Test Avg Acc	Val Worst-region Acc	Test Worst-region Acc ▼	Contact	References	Date	Notes
1	ICON	DenseNet121	64.4 (0.18)	58.5 (0.16)	55.6 (0.44)	39.9 (1.12)	Nick Y.	Paper / Code	November 03, 2022	Uses unlabeled data from the target domain. lr: 0.00013443151989778619, dropout: 0.5.
2	AFN	DenseNet121	61.7 (0.49)	55.6 (0.23)	53.4 (0.78)	38.3 (0.95)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the target domain.
3	Noisy Student	DenseNet121	64.0 (0.37)	58.4 (0.4)	55.4 (0.47)	37.8 (0.62)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the target domain.
4	SwAV	DenseNet121	63.1 (0.38)	56.3 (0.67)	51.6 (0.57)	36.3 (1.01)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the target domain.
5	DANN	DenseNet121	59.5 (0.45)	53.0 (0.58)	50.8 (2.18)	34.6 (1.71)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the target domain.
6	CORAL	DenseNet121	60.1 (0.56)	53.3 (0.61)	51.7 (1.23)	33.7 (0.23)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the target domain.
7	Pseudo-Label	DenseNet121	62.5 (0.08)	55.6 (0.2)	51.5 (0.52)	33.7 (0.24)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the target domain.
8	FixMatch	DenseNet121	58.9 (2.07)	52.5 (1.86)	50.8 (1.11)	32.6 (2.05)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the target domain.

PovertyMap

Without unlabeled data

Rank	Algorithm	Model	Val Pearson r	Test Pearson r	Val Worst-U/R Pearson r	Test Worst-U/R Pearson r ▼	Contact	References	Date	Notes
1	C-Mixup	ResNet18-MS	0.82 (0.04)	0.8 (0.03)	0.55 (0.07)	0.53 (0.07)	Yiping Wang	Paper / Code	December 01, 2022	kde_bandwidth: [0.2, 0.3, 0.4, 0.5, 0.7, 1.0, 2.0, 3.0], alpha: [0.5, 1.0, 2.0*]
2	ERM (rand search)	ResNet18-MS	0.81 (0.03)	0.8 (0.04)	0.53 (0.06)	0.5 (0.07)	WILDS	Paper / Code	December 09, 2021	Hyperparameters tuned via random search, consistent with the WILDS unlabeled data baselines.
3	ERM w/ data aug	ResNet18-MS	0.81 (0.03)	0.79 (0.04)	0.54 (0.06)	0.49 (0.06)	WILDS	Paper / Code	December 09, 2021	Implements composition of random horizontal flip, random affine transformation, color jitter on the RGB channels, and Cutout on all channels. Hyperparameters tuned via random search, consistent with the WILDS unlabeled data baselines.
4	IRMX (PAIR opt)	ResNet18-MS	0.79 (0.04)	0.79 (0.05)	0.49 (0.07)	0.47 (0.09)	Yongqiang Chen	Paper / Code	March 05, 2023	preference: [(1,1e8,1e12), (1,1e10,1e12), (1,1e12,1e12)], neg_irmv1_adj_rate: [1e-4, 1e-2, 1]
5	ERM (grid search)	ResNet18-MS	0.8 (0.04)	0.78 (0.04)	0.51 (0.06)	0.45 (0.06)	WILDS	Paper / Code	July 15, 2021	Hyperparameters tuned via grid search, consistent with other WILDS baselines that do not use unlabeled data.
6	IRMX	ResNet18-MS	0.78 (0.04)	0.77 (0.04)	0.48 (0.07)	0.45 (0.05)	Yongqiang Chen	Paper / Code	March 04, 2023	penalty_weight: [0.01, 0.1, 1.0*, 10.0, 100.0]
7	CORAL	ResNet18-MS	0.8 (0.04)	0.78 (0.05)	0.51 (0.06)	0.44 (0.07)	WILDS	Paper / Code	July 15, 2021
8	IRM	ResNet18-MS	0.81 (0.03)	0.77 (0.05)	0.53 (0.05)	0.43 (0.07)	WILDS	Paper / Code	July 15, 2021
9	CGD	ResNet18-MS	0.81 (0.03)	0.77 (0.04)	0.51 (0.05)	0.43 (0.04)	Vihari Piratla	Paper / Code	April 10, 2022	No hyperparameter search. CG step size: 0.05; LR, optimizer, decay rate, etc. all set to default.
10	Group DRO	ResNet18-MS	0.78 (0.05)	0.75 (0.07)	0.46 (0.04)	0.39 (0.06)	WILDS	Paper / Code	July 15, 2021
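
As a rough illustration of how the ranking column above can be read: the Worst-U/R Pearson r is the lower of the Pearson correlations computed separately on urban and rural examples, and each table entry reports a mean with a corrected sample standard deviation across replicates. The sketch below is not the WILDS implementation; the function names (`pearson_r`, `worst_urban_rural_r`, `summarize`) and the boolean `is_urban` flag are assumptions made for this example.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def worst_urban_rural_r(preds, targets, is_urban):
    """Lower of the urban-only and rural-only Pearson correlations.

    `is_urban` is a hypothetical per-example boolean flag.
    """
    urban = [(p, t) for p, t, u in zip(preds, targets, is_urban) if u]
    rural = [(p, t) for p, t, u in zip(preds, targets, is_urban) if not u]
    r_urban = pearson_r([p for p, _ in urban], [t for _, t in urban])
    r_rural = pearson_r([p for p, _ in rural], [t for _, t in rural])
    return min(r_urban, r_rural)


def summarize(values):
    """Mean and corrected (n - 1) sample standard deviation across replicates."""
    return statistics.mean(values), statistics.stdev(values)
```

Calling `summarize` on the per-replicate values of `worst_urban_rural_r` gives a pair in the same "mean (std)" form used in the table.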

With unlabeled data

Unlabeled data is available from the source, validation and target domains.

Rank	Algorithm	Model	Val Pearson r	Test Pearson r	Val Worst-U/R Pearson r	Test Worst-U/R Pearson r ▼	Contact	References	Date	Notes
1	C-Mixup+SpAR	ResNet18-MS	0.81 (0.02)	0.79 (0.05)	0.53 (0.08)	0.52 (0.09)	Ben Eyre	Paper / Code	February 19, 2024	lr=1e-3, kde_bandwidth=0.5, spar_alpha=0.999*
2	ERM+SpAR	ResNet18-MS	0.8 (0.04)	0.79 (0.05)	0.52 (0.08)	0.51 (0.1)	Ben Eyre	Paper / Code	February 19, 2024	lr=1e-3, spar_alpha=0.999
3	ICON	ResNet18-MS	0.8 (0.04)	0.77 (0.04)	0.52 (0.08)	0.49 (0.04)	Nick Y.	Paper / Code	November 03, 2022	Uses unlabeled data from the target domain. lr: 0.0009738391232813829, dropout: 0.5.
4	SwAV	ResNet18-MS	0.81 (0.05)	0.78 (0.06)	0.54 (0.07)	0.45 (0.05)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the target domain.
5	Noisy Student	ResNet18-MS	0.8 (0.05)	0.76 (0.08)	0.52 (0.08)	0.42 (0.11)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the target domain.
6	AFN	ResNet18-MS	0.76 (0.05)	0.75 (0.08)	0.44 (0.07)	0.39 (0.08)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the target domain.
7	CORAL	ResNet18-MS	0.79 (0.04)	0.74 (0.05)	0.5 (0.09)	0.36 (0.08)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the target domain.
8	DANN	ResNet18-MS	0.77 (0.04)	0.69 (0.04)	0.44 (0.11)	0.33 (0.1)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the target domain.
9	FixMatch	ResNet18-MS	0.76 (0.07)	0.64 (0.11)	0.48 (0.05)	0.3 (0.11)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the target domain.

Amazon

Without unlabeled data

Rank	Algorithm	Model	Val Avg Acc	Test Avg Acc	Val 10% Acc	Test 10% Acc ▼	Contact	References	Date	Notes
1	LISA	DistilBERT-base-uncased	71.4 (0.4)	70.7 (0.3)	54.8 (0.2)	54.7 (0.0)	Yu Wang	Paper / Code	March 13, 2022	Default hyperparameters, except mix_alpha, which was tuned over [0.5, 2]; the final value is 2.
2	ERM (rand search)	DistilBERT-base-uncased	72.8 (0.1)	72.0 (0.1)	56.0 (0.0)	54.2 (0.8)	WILDS	Paper / Code	December 09, 2021	Hyperparameters tuned via random search, consistent with the WILDS unlabeled data baselines.
3	ERM (grid search)	DistilBERT-base-uncased	72.7 (0.1)	71.9 (0.1)	55.2 (0.7)	53.8 (0.8)	WILDS	Paper / Code	July 15, 2021	Hyperparameters tuned via grid search, consistent with other WILDS baselines that do not use unlabeled data.
4	Group DRO	DistilBERT-base-uncased	70.7 (0.6)	70.0 (0.5)	54.7 (0.0)	53.3 (0.0)	WILDS	Paper / Code	July 15, 2021
5	Fish	DistilBERT-base-uncased	72.5 (0.0)	71.7 (0.1)	54.2 (0.8)	53.3 (0.0)	Yuge Shi	Paper / Code	December 14, 2021
6	CORAL	DistilBERT-base-uncased	72.0 (0.3)	71.1 (0.3)	54.7 (0.0)	52.9 (0.8)	WILDS	Paper / Code	July 15, 2021
7	IRM	DistilBERT-base-uncased	71.3 (0.5)	70.3 (0.6)	54.2 (0.8)	52.4 (0.8)	WILDS	Paper / Code	July 15, 2021
8	Reweighted (Label)	DistilBERT-base-uncased	68.9 (0.9)	68.3 (0.9)	52.1 (0.2)	51.6 (0.8)	WILDS	Paper / Code	July 15, 2021
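
For readers unfamiliar with the ranking column above, the 10% Acc is the 10th percentile of per-reviewer accuracies, so it reflects performance on the reviewers the model serves worst. The following is a minimal sketch under stated assumptions, not the official evaluation code: the function name `tenth_percentile_accuracy`, the `reviewer_ids` argument, and the choice to round the percentile index down (rather than interpolate) are all assumptions made for this example.

```python
from collections import defaultdict


def tenth_percentile_accuracy(preds, labels, reviewer_ids):
    """10th percentile of per-reviewer accuracy.

    Assumes the percentile is taken by rounding the index down to an
    existing value instead of interpolating between neighbors.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, label, reviewer in zip(preds, labels, reviewer_ids):
        total[reviewer] += 1
        correct[reviewer] += int(pred == label)

    # One accuracy per reviewer, sorted from worst to best.
    accuracies = sorted(correct[r] / total[r] for r in total)

    # Integer arithmetic avoids floating-point surprises in the index.
    index = (10 * (len(accuracies) - 1)) // 100
    return accuracies[index]
```

Because the result is always one of the observed per-reviewer accuracies, the metric can only take a small set of discrete values.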

With unlabeled data

Unlabeled data is available from the validation, target and extra domains.

Rank	Algorithm	Model	Val Avg Acc	Test Avg Acc	Val 10% Acc	Test 10% Acc ▼	Contact	References	Date	Notes
1	ICON	DistilBERT-base-uncased	72.7 (0.2)	71.9 (0.1)	55.2 (0.7)	54.7 (0.0)	Nick Y.	Paper / Code	November 03, 2022	Uses unlabeled data from the target domain. lr: 1.5581425972502133e-05, self_training_lambda: 1, self_training_threshold: 0.7681640736450283*.
2	AFN	DistilBERT-base-uncased	73.0 (0.4)	72.1 (0.3)	56.0 (0.0)	54.2 (0.8)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the target domain.
3	Masked LM	DistilBERT-base-uncased	72.6 (0.5)	71.7 (0.4)	55.1 (0.8)	53.5 (0.2)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the target domain.
4	CORAL	DistilBERT-base-uncased	72.5 (0.1)	71.7 (0.1)	54.2 (0.8)	53.3 (0.0)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the target domain.
5	DANN	DistilBERT-base-uncased	72.6 (0.1)	71.7 (0.1)	54.7 (0.0)	53.3 (0.0)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the target domain.
6	Pseudo-Label	DistilBERT-base-uncased	72.5 (0.1)	71.6 (0.1)	54.2 (0.8)	52.3 (1.1)	WILDS	Paper / Code	December 09, 2021	Uses unlabeled data from the target domain.

Py150

Rank	Algorithm	Model	Test ID Method/Class Acc	Test ID All Acc	Test OOD Method/Class Acc ▼	Test OOD All Acc	Contact	References	Date	Notes
1	ERM (grid search)	CodeGPT	75.4 (0.4)	74.5 (0.4)	67.9 (0.1)	69.6 (0.1)	WILDS	Paper / Code	July 15, 2021	Hyperparameters tuned via grid search, consistent with other WILDS baselines that do not use unlabeled data.
2	Group DRO	CodeGPT	70.8 (0.0)	71.0 (0.0)	66.0 (0.1)	67.9 (0.0)	WILDS	Paper / Code	July 15, 2021
3	CORAL	CodeGPT	70.6 (0.0)	70.8 (0.1)	65.9 (0.1)	67.9 (0.0)	WILDS	Paper / Code	July 15, 2021
4	IRM	CodeGPT	67.3 (1.1)	68.3 (0.7)	64.3 (0.2)	66.4 (0.1)	WILDS	Paper / Code	July 15, 2021