The Cosmos omnimodel family of models: 3 variants, Edge (4B), Nano (16B), and Super (64B)

The Cosmos family is an omnimodel family, capable of various modalities (txt2img, img2video, etc.). It is designed to jointly process and generate language, image, video, audio, and action sequences within a unified mixture-of-transformers architecture, effectively subsuming vision-language models, video generators, world simulators, and world-action models into a single framework. The super-txt2img and super-img2vid models are the 64B variant post-trained (fine-tuned) on those specialized tasks.

Submitted by /u/AgeNo5351