Me train LLM on 8GB from Scratch. Me happy

I made a post yesterday. Here is what I programmed today.

Highlights:
- Train TinyStories from scratch with 8GB VRAM. YAY
- mHC no good (model too small)
- BitNet too slow (no memory gain while