If nothing else, Apple is participating in the artificial intelligence community.
On Wednesday, the company released Pico-Banana-400K, a highly curated research dataset of 400,000 images built using Google’s Gemini-2.5 models.
The research team, which published a study entitled “Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing,” released the full 400,000-image dataset under a non-commercial research license. This allows anyone to use it and explore it, provided it’s for academic work or AI research purposes and not for commercial use.
The work follows up on a Google project from a few months ago, in which Google released the Gemini-2.5-Flash-Image model, also known as Nano-Banana, arguably the state of the art among image editing models.
Apple’s researchers described the work as follows:
“Despite these advances, open research remains limited by the lack of large-scale, high-quality, and fully shareable editing datasets. Existing datasets often rely on synthetic generations from proprietary models or limited human-curated subsets. Furthermore, these datasets frequently exhibit domain shifts, unbalanced edit type distributions, and inconsistent quality control, hindering the development of robust editing models.”
From Apple’s end, the company pulled an unspecified number of real images from the OpenImages dataset, “selected to ensure coverage of humans, objects, and textual scenes.” The team then came up with a list of 35 different types of changes a user could ask the model to make, grouped into eight categories, which include the following:
- Pixel & Photometric: Add film grain or vintage filter
- Human-Centric: Funko-Pop–style toy figure of the person
- Scene Composition & Multi-Subject: Change weather conditions (sunny/rainy/snowy)
- Object-Level Semantic: Relocate an object (change its position/spatial relation)
- Scale: Zoom in
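
To make that taxonomy concrete, here is a minimal sketch, in Python, of how the category-to-edit-type mapping might be represented and sampled. The structure and the helper function are illustrative assumptions, not code from Apple’s paper, and only the five categories named above are filled in.

```python
import random

# Illustrative only: category names come from the article; the example prompts
# stand in for the full set of 35 edit types described in the paper.
EDIT_TAXONOMY = {
    "Pixel & Photometric": ["Add film grain or vintage filter"],
    "Human-Centric": ["Funko-Pop-style toy figure of the person"],
    "Scene Composition & Multi-Subject": ["Change weather conditions (sunny/rainy/snowy)"],
    "Object-Level Semantic": ["Relocate an object (change its position/spatial relation)"],
    "Scale": ["Zoom in"],
    # ...the three remaining categories appear in the paper
}

def sample_edit_instruction() -> str:
    """Pick a random category, then a random edit type within it."""
    category = random.choice(list(EDIT_TAXONOMY))
    return random.choice(EDIT_TAXONOMY[category])
```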
The researchers would then upload an image to Nano-Banana, alongside one of these prompts. Once Nano-Banana was done generating the edited image, the researchers would then have Gemini-2.5-Pro analyze the result, either approving it or rejecting it, based on instruction compliance and visual quality.
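For illustration, here is a minimal sketch of that generate-then-judge loop, assuming the google-genai Python SDK. The model IDs, the judging rubric, and the response-parsing details are assumptions made for this sketch, not specifics from Apple’s paper.

```python
from google import genai
from google.genai import types

client = genai.Client()  # expects GEMINI_API_KEY in the environment

def edit_image(image_bytes: bytes, instruction: str) -> bytes:
    """Ask the image-editing model (Nano-Banana) to apply one edit prompt."""
    response = client.models.generate_content(
        model="gemini-2.5-flash-image",  # assumed API name for Nano-Banana
        contents=[
            types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
            instruction,
        ],
    )
    # Return the first inline image part of the response, if any.
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            return part.inline_data.data
    raise RuntimeError("model returned no image")

def judge_edit(original: bytes, edited: bytes, instruction: str) -> bool:
    """Have Gemini-2.5-Pro approve or reject the edit; the rubric is illustrative."""
    verdict = client.models.generate_content(
        model="gemini-2.5-pro",
        contents=[
            types.Part.from_bytes(data=original, mime_type="image/png"),
            types.Part.from_bytes(data=edited, mime_type="image/png"),
            f"Instruction: {instruction}\n"
            "Did the second image follow the instruction while preserving "
            "visual quality? Answer APPROVE or REJECT.",
        ],
    )
    return "APPROVE" in verdict.text.upper()
```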
The resulting dataset, Pico-Banana-400K, “includes images produced through single-turn edits (a single prompt), multi-turn edit sequences (multiple iterative prompts), and preference pairs comparing successful and failed results (so models can also learn what undesirable outcomes look like).”
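For a sense of what those three record types could look like on disk, here is a hypothetical sketch. Every field name and path below is invented for illustration; the authoritative schema is whatever ships in Apple’s release.

```python
# Hypothetical record layouts for the three data types the article describes.
single_turn = {
    "source_image": "openimages/abc123.jpg",
    "instruction": "Add film grain or vintage filter",
    "edited_image": "edits/abc123_grain.png",
}

multi_turn = {
    "source_image": "openimages/def456.jpg",
    "turns": [
        {"instruction": "Zoom in", "edited_image": "edits/def456_t1.png"},
        {"instruction": "Change weather to snowy", "edited_image": "edits/def456_t2.png"},
    ],
}

preference_pair = {
    "source_image": "openimages/ghi789.jpg",
    "instruction": "Relocate the dog to the left of the bench",
    "chosen": "edits/ghi789_pass.png",    # approved by the judge model
    "rejected": "edits/ghi789_fail.png",  # rejected, so models learn failure modes
}
```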
The researchers acknowledge limitations in fine-grained spatial editing, layout extrapolation, and typography. Still, they say they hope Pico-Banana-400K will serve as “a robust foundation for training and benchmarking the next generation of text-guided image editing models.”
The study can be found on arXiv, while the dataset itself is freely available for download and use on GitHub.
Stay tuned for additional details as they become available.