MultiAct: Text-to-Motion Generation from Composite Text via Tailored Attention Guidance

arXiv:2605.30925v1 Announce Type: new

Text-to-motion generation has progressed rapidly in recent years, offering an expressive interface for animation and human-computer interaction. However, current models remain brittle when handling prompts that describe multiple actions occurring at the same time. Rather than realizing all components of a composite description, models frequently prioritize a single dominant action and neglect the rest, leading to incomplete or ambiguous motion.