
To accompany OpenPhenom, Recursion is releasing the RxRx3-core dataset, a challenge dataset in phenomics optimized for the research community. RxRx3-core includes labeled images of 735 genetic knockouts and 1,674 small-molecule perturbations drawn from the RxRx3 dataset; image embeddings computed with OpenPhenom, MAE-L/8, and MAE-G/8; and associations between the included small molecules and genes. The dataset contains 6-channel Cell Painting images and associated embeddings from 222,601 wells, yet takes up less than 18 GB, which keeps it practical for the research community to download and work with.

Mapping the mechanisms by which drugs exert their actions is an important challenge in advancing the use of high-dimensional biological data like phenomics. We are excited to release the first dataset of this scale probing concentration-response along with a benchmark and model to enable the research community to rapidly advance this space.

Paper: "RxRx3-core: Benchmarking drug-target interactions in High-Content Microscopy", published at the LMRL Workshop at ICLR 2025.
Benchmarking code for this dataset is provided in the EFAAR benchmarking repo and Polaris.


Loading the RxRx3-core image dataset

from datasets import load_dataset
rxrx3_core = load_dataset("recursionpharma/rxrx3-core")
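
Image keys in this dataset look like `compound-001/Plate1/AA15_s1_1`. The sketch below splits such a key into its parts. It assumes, and the card does not state this, that the layout is `<experiment>/Plate<plate>/<address>_s<site>_<channel>`, with the final number indexing one of the six Cell Painting channels. `ImageKey` and `parse_image_key` are hypothetical helper names, not part of any library.

```python
import re
from typing import NamedTuple


class ImageKey(NamedTuple):
    experiment_name: str
    plate: int
    address: str
    site: int
    channel: int


# Assumed layout: <experiment>/Plate<plate>/<address>_s<site>_<channel>
_KEY_RE = re.compile(
    r"^(?P<experiment>[^/]+)/Plate(?P<plate>\d+)/"
    r"(?P<address>[A-Z]{1,2}\d{2})_s(?P<site>\d+)_(?P<channel>\d+)$"
)


def parse_image_key(key: str) -> ImageKey:
    """Split an image key into experiment, plate, address, site and channel."""
    match = _KEY_RE.match(key)
    if match is None:
        raise ValueError(f"unrecognized image key: {key!r}")
    return ImageKey(
        experiment_name=match["experiment"],
        plate=int(match["plate"]),
        address=match["address"],
        site=int(match["site"]),
        channel=int(match["channel"]),
    )
```

Under these assumptions, the six keys that share a prefix such as `compound-001/Plate1/AA15_s1_` would be the six channels of one imaging site, so grouping by `(experiment_name, plate, address, site)` would collect one multi-channel image.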

Loading OpenPhenom embeddings and metadata for RxRx3-core

from huggingface_hub import hf_hub_download
import pandas as pd

# Download the metadata CSV and the OpenPhenom embeddings from the dataset repo;
# hf_hub_download returns the local path of each cached file.
file_path_metadata = hf_hub_download("recursionpharma/rxrx3-core", filename="metadata_rxrx3_core.csv", repo_type="dataset")
file_path_embs = hf_hub_download("recursionpharma/rxrx3-core", filename="OpenPhenom_rxrx3_core_embeddings.parquet", repo_type="dataset")

# Load both files into DataFrames.
open_phenom_embeddings = pd.read_parquet(file_path_embs)
rxrx3_core_metadata = pd.read_csv(file_path_metadata)
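
Once both tables are loaded, a common next step is to attach the perturbation labels to the embeddings. The sketch below assumes that the embeddings table carries a `well_id` column matching the metadata's `well_id`. That is an assumption, so check `open_phenom_embeddings.columns` first. It uses small stand-in frames so it runs without downloading anything, and `attach_metadata` is a hypothetical helper name.

```python
import pandas as pd


def attach_metadata(embeddings: pd.DataFrame, metadata: pd.DataFrame,
                    key: str = "well_id") -> pd.DataFrame:
    """Left-join metadata onto embeddings, expecting one metadata row per well."""
    if key not in embeddings.columns:
        raise KeyError(f"embeddings table has no {key!r} column")
    # validate="many_to_one" makes pandas raise if a key appears more than
    # once in the metadata, which would silently duplicate embedding rows.
    return embeddings.merge(metadata, on=key, how="left", validate="many_to_one")


# Stand-in data; the real frames come from pd.read_parquet / pd.read_csv.
toy_embeddings = pd.DataFrame({
    "well_id": ["compound-001_10_AA12", "gene-077_3_L32"],
    "feature_0": [0.1, -0.3],  # hypothetical embedding column name
})
toy_metadata = pd.DataFrame({
    "well_id": ["gene-077_3_L32", "compound-001_10_AA12"],
    "perturbation_type": ["CRISPR", "COMPOUND"],
})
joined = attach_metadata(toy_embeddings, toy_metadata)
```

A left join keeps every embedding row in its original order, so any well missing from the metadata shows up with empty label columns instead of being dropped.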

Metadata

The metadata can be found in metadata_rxrx3_core.csv in this repository. The schema of the metadata is as follows:

| Attribute | Description |
| --- | --- |
| well_id | Experiment Name - Plate - Well (compound-004_1_AA04 or gene-088_9_C15) |
| experiment_name | Experiment Name: Experiment number (compound-004 or gene-088) |
| plate | Plate number in the experiment (1-48) |
| address | Well location on the plate - "A01" to "AF48" |
| gene | Unblinded or anonymized gene name, or a control |
| treatment | Compound synonym or gene-name - guide-number (Narlaprevir or _guide_1) |
| SMILES | Canonical SMILES or blank for non-compounds |
| concentration | Compound concentration tested (in µM) |
| perturbation_type | CRISPR or COMPOUND |
| cell_type | HUVEC |
| well_type_label | Indicates experimental control information |
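
The `well_id` and `address` formats described above can be checked mechanically. The sketch below assumes that `well_id` joins its three parts with underscores (as in `compound-004_1_AA04`) and that rows run A to Z and then AA to AF, which gives a 32 x 48 grid for addresses `A01` to `AF48`. Both function names are hypothetical.

```python
import re

_ADDRESS_RE = re.compile(r"^([A-Z]{1,2})(\d{2})$")


def parse_well_id(well_id: str) -> tuple[str, int, str]:
    """Split 'compound-004_1_AA04' into ('compound-004', 1, 'AA04')."""
    # Split from the right so the experiment name is taken whole.
    experiment_name, plate, address = well_id.rsplit("_", 2)
    return experiment_name, int(plate), address


def address_to_row_col(address: str) -> tuple[int, int]:
    """Convert an address such as 'AA12' to 1-based (row, column) indices."""
    match = _ADDRESS_RE.match(address)
    if match is None:
        raise ValueError(f"malformed address: {address!r}")
    letters, digits = match.groups()
    row = 0
    for ch in letters:  # A=1 ... Z=26, AA=27 ... AF=32
        row = row * 26 + (ord(ch) - ord("A") + 1)
    col = int(digits)
    if not (1 <= row <= 32 and 1 <= col <= 48):
        raise ValueError(f"address outside A01..AF48: {address!r}")
    return row, col
```

For example, `parse_well_id("gene-088_9_C15")` yields the experiment name `gene-088`, plate `9`, and address `C15`, which `address_to_row_col` maps to row 3, column 15.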

The well_type_label column includes the following values:

| well_type_label | Description |
| --- | --- |
| Query guides | CRISPR guides that target a query gene |
| Exon controls | Exon-targeting CRISPR guides that are used as controls |
| Intron controls | Intron-targeting CRISPR guides that are used as controls |
| Query Compounds + Intron control | Query compounds on an intron-targeting CRISPR background |
| CRISPR Gene Positive Controls | Exon-targeting CRISPR guides for control genes, used as controls; there are five such genes, each with 6 guides that target the exon region of the gene |
| Control Compounds + Intron control | Control compound on an intron-targeting CRISPR background |
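
When filtering wells, it can help to hold the six labels above in a small lookup. The sketch below treats the two labels that begin with "Query" as query wells and everything else as controls. That split is one reading of the table, not something the card states explicitly, and all names here are hypothetical.

```python
from collections import Counter
from typing import Iterable

WELL_TYPE_LABELS = frozenset({
    "Query guides",
    "Exon controls",
    "Intron controls",
    "Query Compounds + Intron control",
    "CRISPR Gene Positive Controls",
    "Control Compounds + Intron control",
})


def is_query(label: str) -> bool:
    """Return True for query wells; raise on labels not listed in the card."""
    if label not in WELL_TYPE_LABELS:
        raise ValueError(f"unknown well_type_label: {label!r}")
    # Assumption: labels starting with "Query" are the non-control wells.
    return label.startswith("Query")


def count_by_role(labels: Iterable[str]) -> Counter:
    """Count how many wells are queries versus controls."""
    return Counter("query" if is_query(label) else "control" for label in labels)
```

Raising on unknown labels is a deliberate choice here: a typo or a newly added label fails loudly instead of being silently counted as a control.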

To help understand the metadata, we have included some sample rows to enable parser testing and validation:

well_id,experiment_name,plate,address,gene,treatment,SMILES,concentration,perturbation_type,cell_type,well_type_label
compound-001_10_AA12,compound-001,10,AA12,,Esomeprazole,"COC1=CC2=C([N-]C(=N2)[S@@](=O)CC2=C(C)C(OC)=C(C)C=N2)C=C1 |r,c:7,13,21,24,t:2,4,18|",2.5,COMPOUND,HUVEC,Control Compounds + Intron control
gene-077_3_L32,gene-077,3,L32,CENPC,CENPC_guide_3,,,CRISPR,HUVEC,Query guides
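
As a sketch of how these sample rows might be used for parser testing, the snippet below reads them with Python's standard `csv` module. The SMILES field contains commas inside double quotes, so a naive `split(",")` would break it, whereas `csv.DictReader` handles the quoting. Mapping empty fields to `None` and the name `read_metadata` are illustrative choices, not part of the dataset.

```python
import csv
import io

SAMPLE_CSV = '''well_id,experiment_name,plate,address,gene,treatment,SMILES,concentration,perturbation_type,cell_type,well_type_label
compound-001_10_AA12,compound-001,10,AA12,,Esomeprazole,"COC1=CC2=C([N-]C(=N2)[S@@](=O)CC2=C(C)C(OC)=C(C)C=N2)C=C1 |r,c:7,13,21,24,t:2,4,18|",2.5,COMPOUND,HUVEC,Control Compounds + Intron control
gene-077_3_L32,gene-077,3,L32,CENPC,CENPC_guide_3,,,CRISPR,HUVEC,Query guides
'''


def read_metadata(text: str) -> list[dict]:
    """Parse metadata CSV text into dicts with typed plate and concentration."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Empty fields (e.g. SMILES and concentration on CRISPR wells) become None.
        cleaned = {key: (value if value != "" else None) for key, value in row.items()}
        cleaned["plate"] = int(cleaned["plate"])
        if cleaned["concentration"] is not None:
            cleaned["concentration"] = float(cleaned["concentration"])
        rows.append(cleaned)
    return rows


rows = read_metadata(SAMPLE_CSV)
```

The compound row keeps its full SMILES string, including the trailing `|r,c:7,13,21,24,t:2,4,18|` block, as a single field, and the CRISPR row has no SMILES or concentration.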