AI RESEARCH
LDARNet: DNA Adaptive Representation Network with Learnable Tokenization for Genomic Modeling
arXiv CS.CL
•
ArXi:2606.04552v1 Announce Type: new Genomic foundation models increasingly adopt large language model architectures, yet almost universally rely on fixed tokenization schemes such as $k$-mers, BPE, or single nucleotides, which impose arbitrary sequence boundaries that may obscure biologically relevant structure.