AI RESEARCH
Effective Biological Representation Learning by Masking Gene Expression
arXiv CS.LG
•
ArXi:2605.31562v1 Announce Type: new RNA sequencing produces rich and diverse datasets of gene expression, offering compelling insights into cellular state and function that have many applications in drug discovery. Modeling such data is challenging due to inherent technical noise and experimental batch effects, as evidenced by many existing transcriptomic foundation models (FMs) underperforming relative to linear baselines. Such results raise the question of whether deep representation learning provides a distinct advantage over the direct use of raw transcript counts.