SAGE: Segment-Aware Gloss-Free Encoding for Token-Efficient Sign Language Translation

arXiv:2507.09266v2

Gloss-free Sign Language Translation (SLT) has advanced rapidly, achieving strong performance without relying on gloss annotations. However, these gains have often come with increased model complexity and high computational demands, raising concerns about scalability, especially as large-scale sign language datasets become more common. We propose a segment-aware visual tokenization framework that leverages sign segmentation to convert continuous video into discrete, sign-informed visual tokens.
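To make the token-efficiency idea concrete, the sketch below shows one plausible way segment boundaries can shorten a visual sequence: frame-level features are mean-pooled within each sign segment, giving one token per segment instead of one per frame. This is an illustrative assumption, not the SAGE encoder itself. The helper `segment_pool`, the mean-pooling choice, and the half-open `(start, end)` segment format are all hypothetical, and the segments are assumed to come from a separate sign segmentation model that is not shown.

```python
from typing import List, Sequence, Tuple

Frame = List[float]


def segment_pool(frames: Sequence[Frame], segments: Sequence[Tuple[int, int]]) -> List[Frame]:
    """Hypothetical helper: mean-pool frame features within each sign segment.

    Assumption: segments are half-open [start, end) frame ranges produced by
    an external sign segmentation step. Returns one vector per segment.
    """
    tokens: List[Frame] = []
    for start, end in segments:
        if not 0 <= start < end <= len(frames):
            raise ValueError(f"invalid segment ({start}, {end})")
        span = frames[start:end]
        dim = len(span[0])
        # Average each feature dimension over the frames in this segment.
        tokens.append([sum(f[d] for f in span) / len(span) for d in range(dim)])
    return tokens


# Toy example: 6 frames of 2-D features, split into 2 sign segments.
frames = [[1.0, 0.0], [3.0, 0.0], [2.0, 2.0], [0.0, 4.0], [0.0, 2.0], [0.0, 6.0]]
segments = [(0, 2), (2, 6)]
tokens = segment_pool(frames, segments)
print(tokens)  # [[2.0, 0.0], [0.5, 3.5]]
print(f"{len(frames)} frames -> {len(tokens)} tokens")
```

The translation model downstream then attends over 2 tokens rather than 6 frames; since signs typically span many frames, this kind of segment-level compression is one way the sequence length, and hence attention cost, could shrink.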