Datasets:

Project-AgML
/

AgroMind

AgroMind is an agricultural remote sensing benchmark for evaluating large multimodal models on agricultural scene understanding. It contains 28,482 question-answer pairs paired with 20,850 images drawn from nine public datasets plus proprietary global parcel data, spanning 13 task types across 4 evaluation dimensions: spatial perception (localization, relationship determination, boundary detection), object understanding (classification, pest/disease diagnostics, growth status assessment), scene understanding (comparison, counting, area statistics), and scene reasoning (visual prompt reasoning, anomaly detection, climate classification, yield prediction).

This dataset has been standardized to the HF image_text_to_text format: one conversational messages schema, imagefolder-native image configs, and (if present) a text_only parquet config.

Subsets: Agriculture, CropHarvest, Fruit, Leaf_diseases, Oil_palm_trees, Pest, Rural, Trees, corn, crop.

This dataset is indexed on https://project-agml.github.io/ as part of the AgML python library.

Usage

from datasets import load_dataset, concatenate_datasets

# Everything (default config)
ds = load_dataset("Project-AgML/AgroMind", "all")

# One subset, standalone (no config needed)
ds = load_dataset("imagefolder", data_files="images/Agriculture.zip")

# Download + concatenate ONLY specific subsets (only those zips are fetched)
ds = load_dataset("Project-AgML/AgroMind", data_files=["images/Agriculture.zip", "images/CropHarvest.zip"], split="train")

# ...or load configs separately and concatenate explicitly
a = load_dataset("Project-AgML/AgroMind", "Agriculture", split="train")
b = load_dataset("Project-AgML/AgroMind", "CropHarvest", split="train")
merged = concatenate_datasets([a, b])

# All image zips via wildcard (no loading script needed)
ds = load_dataset("Project-AgML/AgroMind", data_files=["images/*.zip"])

# Stream without downloading
ds = load_dataset("Project-AgML/AgroMind", "all", streaming=True)

Each record has file_names (1..N images), a messages conversation, and the original fields preserved verbatim as JSON-string columns. Multi-image rows return images as a list aligned to the {"type": "image"} placeholders in messages.

Citation

@misc{li2025largemultimodalmodelsunderstand,
      title={Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind},
      author={Li, Qingmei and Zhang, Yang and Mai, Zurong and Chen, Yuhang and Lou, Shuohong and Huang, Henglian and Zhang, Jiarui and Zhang, Zhiwei and Wen, Yibin and Li, Weijia and Fu, Haohuan and Huang, Jianxi and Zheng, Juepeng},
      year={2025},
      eprint={2505.12207},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.12207}
}

Li, Qingmei; Zhang, Yang; Mai, Zurong; Chen, Yuhang; Lou, Shuohong; Huang, Henglian; Zhang, Jiarui; Zhang, Zhiwei; Wen, Yibin; Li, Weijia; Fu, Haohuan; Huang, Jianxi; Zheng, Juepeng (2025), "Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind", arXiv:2505.12207

Downloads last month: 20

Paper for Project-AgML/AgroMind

Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind

Paper • 2505.12207 • Published May 18, 2025