Build It, Then Use It: How I wrote 435 AI engineering lessons from scratch
Dev.to Machine Learning
About This Tutorial
The first time I wrote a tokenizer, I did it with a for loop. I counted byte pairs by hand, merged the most common ones, and waited about forty seconds for it to chew through a small corpus. The output was slow. The output was ugly. The output was correct.

Then I ran the same input through tiktoken and watched it finish in forty milliseconds. That was the moment tiktoken stopped being magic. It was the same thing I had written the night before, in Rust, with the loop unrolled and the cache warm. It was not a library anymore. It was my code, faster.
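For readers who have not seen the slow version, here is a minimal sketch of that for-loop approach: count adjacent byte pairs, merge the most frequent one into a new token id, repeat. It is not the original code from that night and it is not how tiktoken is implemented; the names `train_bpe` and `decode` are made up for illustration, and the stopping rule (quit when no pair occurs twice) is an assumption.

```python
from collections import Counter


def train_bpe(text: str, num_merges: int):
    """Naive byte-pair encoding: returns (token ids, learned merges)."""
    ids = list(text.encode("utf-8"))  # start from raw bytes, ids 0..255
    merges = {}                       # (left_id, right_id) -> new_id
    next_id = 256                     # first id above the byte range

    for _ in range(num_merges):
        # Count every adjacent pair in the current sequence.
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # assumption: stop once nothing repeats

        # Rewrite the sequence left to right, replacing the chosen pair.
        merged = []
        i = 0
        while i < len(ids):
            if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
                merged.append(next_id)
                i += 2
            else:
                merged.append(ids[i])
                i += 1

        ids = merged
        merges[pair] = next_id
        next_id += 1

    return ids, merges


def decode(ids, merges) -> str:
    """Invert the merges to recover the original text."""
    vocab = {i: bytes([i]) for i in range(256)}
    # Merges were inserted in training order, so both parts already exist.
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8")


if __name__ == "__main__":
    ids, merges = train_bpe("aaabdaaabac", num_merges=3)
    print(ids, merges)
    print(decode(ids, merges))
```

Every merge rescans the whole sequence, which is a large part of why a version like this crawls on anything bigger than a toy corpus. The round trip through `decode` is the quick check that slow and ugly is still correct.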