A new dataset with more than 100M high-quality, curated images, with captions and metadata! [P]

r/MachineLearning

Hello everyone. The new dataset is named MONET; it is Apache 2.0-licensed and available on HF.

MONET is an open, Apache 2.0-licensed image-text dataset. It was built from 2.9B images and refined to 104.9M high-quality samples.

We are also publishing a paper that explains how the dataset was created, if you are curious, and 3 companion projects:

- A UMAP to visualize the distribution
- A retrieval tool to do text or image search
- A codebase to train a T2I model based on MONET

Hope this will be useful!

submitted by /u/dh7net