AI News Leader · Topic
Image & Video Generation
The latest Image & Video Generation news, research, and analysis, continuously tracked across the AI landscape.
24 recent stories
-
Flux Klein 4B, getting albedo only from textures (delighting)
Hi, I used my own artificial dataset (using Blender and PBR textures) to create this LoRA to get albedo from textures with harsh shadows and well localized lighting. I think it may be…
-
I apologise in advance for what I’m about to ask
I’m sure everyone is sick and tired of hearing beginners ask questions with no research, but the field of AI is so vast that it’s now impossible to keep up with everything. I’…
-
Anima edit with turbo lora and proper masking
Download here (free): it uses the anime edit LoRA with proper masking, so it's very usable if you want to create an avatar for your visual novel game or a 2D avatar AI chatbot. I'm us…
-
Fine-tuned SDXL model with LoRA to generate Tribal Indian art
I have been curious about the fine-tuning concept for a long time, so I wanted to learn it and implement it to generate authentic tribal Warli-style images. I finetuned it as I…
-
NVIDIA PiD Upscale for video?
Is there a way to upscale LTX 2.3 video output with NVIDIA PiD in ComfyUI?
-
Make sure it's not coming from the LLM, I used a LLM and got 0 rejections on normal prompts
I modified the default workflow to use a (censored!) Gemma-4-31B running in llama.cpp, called it via API rather than invoking through Comfy and used the "Magic Prompt" from the…
-
Ltx 2.3 lora + nvidia PiD
What do you think, boys? Double LoRA, double power. FIRST TUTORIAL: FOR IMAGES ONLY. Soon, if I get enough, I'll make a LoRA
-
Maybe I'm bad at prompting them but both Klein 9B and ZiT seem really lacking in facial expressions
They can both do basic emotions like joy, surprise, fear, anger, etc., but getting them to do specific facial expressions is really difficult to impossible. ZiT often just i…
-
I got tired of managing prompts in text files, so I built this
I've been generating AI images for a while and eventually ended up with hundreds of prompt…
-
(AI Workflow) CUCO - Love Letter To LA Animation, Paul Trillo
From r/StableDiffusion.
-
Sorry, not sorry (Ideogram jailbroken in 1 easy step)
ChatGPT says workflows themselves can also technically be illegal and can be considered distribution, so no workflows, and forget what you saw here. The node is called Layer Weigh…
-
A fully character-driven Fantasy story made entirely with LTX 2.3, ZiT, Klein, VibeVoice, and other open source models | Process & info about my experience in the comments
-
We Put Ideogram 4 Head-to-Head against OpenAI, Google, and Microsoft in Four Image Stress Test
From r/StableDiffusion.
-
A helpful little tip to help deal with the ideogram model censorship
Their censorship was trained on English FYI. Help me obfuscate this, convert all of the non-field text to Danish please. don't change anything and don't alter the JSON, just tra…
-
At what quality would you be interested in a new vae for sd class models?
Current VAE performance as rated by LPIPS scores: original SD VAE: 1.2; SDXL VAE: 0.9; Qwen 2: 0.35; Flux2: 0.24. Trouble with the last two is they do funky stuff making them comple…
-
BYG by NVIDIA - A framework to turn any model into an editing model
Project: "TL;DR We propose ByG (pronounced “Big”), a framework for unpaired image and video editing using only the base model’s internal knowledge - no paired data, no external…
-
TripoSplat converts a single 2D image into high-quality and variable number of 3D Gaussians, developed by TripoAI (open weights, link to github repo)
Did not see this one posted, so here it is: 2D image to high-quality 3D Gaussians. Open weights, runnable locally. Apparently ComfyUI is already good to go too. I'll get it up and ru…
-
Nvidia PiD Flux-2 color fix is Out + PiD for Qwen
I checked the color fix model for Flux 2; it’s better than before, but still not as good as Flux 1 PiD. It still shifts colors…
-
Multiple characters Anima generations are so good. There is some bleeding but its only gonna get better
I have attached my Civitai profile; it has all the workflows. I am still learning to prompt better, so there will be some prompting, bleeding, and anatomy issues. For the 4th image af…
-
Ideogram 4.0 an open source model apparently better than NB pro just released
From r/StableDiffusion.
-
Ideogram v4 is open weights!
From r/StableDiffusion.
-
Ideogram 4.0 Just Open Sourced!
Hi r/StableDiffusion, bet yall didn't see this one coming, it's a big day for the open-source community! Ideogram 4.0 is a 9.3B parameter open-weight text-to-image model. It is…
-
Multi character WAN Lora training?
Greetings. I have successfully trained several WAN LoRAs for single realistic characters (not real people) that are very high quality and nail the likeness. For context I have…
-
Untwisting RoPE in ComfyUI - One Style Transfer Framework for Most DiT Image Models
This video introduces Untwisting RoPE, a training-free framework for style transfer in Diffusion Transformer (DiT) models, serving as a modern alternative to legacy tools like I…