Memory-Efficient Partitioned DNN Inference on Resource-Constrained Android Crowds

arXiv:2605.20723v1

Deploying large deep neural networks on memory-constrained mobile devices is a central challenge in edge ML. Although pruning, quantization, and other compression techniques reduce per-parameter cost, transformer-based models remain too large for the 3.3-7.4 GB RAM envelope of commodity Android handsets.