Optimizing and accelerating the Lance model for RTX 2080 Ti 22GB (tested on single- and dual-GPU setups)

[Video: Lance-generated video]

Hi r/LocalLLaMA,

Affiliation disclosure: I am the creator of this open-source project.

Like many independent researchers and homelab builders here, I rely heavily on modded RTX 2080 Ti 22GB cards because of their high VRAM-to-cost ratio. However, running modern models like Lance on the older Turing architecture often suffers from suboptimal kernel execution paths and multi-GPU scaling bottlenecks.

To help the community get more out of these budget 22GB cards, I spent some time on the infrastructure side and built a dedicated optimization and acceleration port: Lance-2080ti.