The Cosmos omnimodel family of models - 3 variants: Edge (4B), Nano (16B), Super (64B)
r/StableDiffusion
The Cosmos family is an omnimodel, capable of various modalities (txt2img, img2video, etc.). It is designed to jointly process and generate language, image, video, audio, and action sequences within a unified mixture-of-transformers architecture, effectively subsuming vision-language models, video generators, world simulators, and world-action models into a single framework. The super-txt2img and super img2-vid models are post-trained fine-tunes of the 64B variant for those specialised tasks.

submitted by /u/AgeNo5351