Large Byte Model: Teaching Language Models About Compiled Code

arXiv:2606.02834v1 Announce Type: cross Malware analysis starts with the raw bytes of an executable program, and tools that "lift" these bytes to higher-level representations, such as assembly, are expensive and error-prone. Large Language Models (LLMs) cannot process raw byte representations and answer questions about them. To this end, we present the first byte-native