Posts: 44158
Joined: Thu Oct 06, 2005 05:47 PM
Re: Training nanochat on an NVIDIA RTX 3060!
Posted: Sun Oct 26, 2025 10:10 AM
**1. Clone Repo**
```bash
cd ~
git clone https://github.com/karpathy/nanochat
cd nanochat
```
Purpose: Clone the standard nanochat repository from Karpathy.
Result: Creates the ~/nanochat directory with all source files. ✅
**2. Create venv**
```bash
python3.10 -m venv .venv
source .venv/bin/activate
```
Purpose: Set up a Python 3.10 virtual environment.
Result: Activates .venv; use uv for faster package management. ✅
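If `uv` isn't on the machine yet (step 3 depends on it), Astral's standalone installer is a one-liner; installing it into the already-activated venv with pip works too:

```bash
# Official standalone installer for uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# Or install it into the already-activated venv instead:
pip install uv
```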
**3. Install Dependencies**
```bash
uv sync
```
Purpose: Install PyTorch, CUDA libs, maturin, and other deps (~3-5 min).
Result: Resolved 91 packages; CUDA: True, GPU: NVIDIA GeForce RTX 3060. ✅
**4. Verify Torch/CUDA**
```bash
uv run python -c "import torch; print(f'CUDA: {torch.cuda.is_available()}, GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else None}')"
```
Purpose: Confirm CUDA and GPU detection.
Result: `CUDA: True, GPU: NVIDIA GeForce RTX 3060`. ✅
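If the one-liner isn't enough, a slightly longer probe reports the torch build, the CUDA version it was compiled against, and the card's VRAM; a sketch using only standard torch attributes:

```bash
uv run python - <<'EOF'
import torch

# Torch build and the CUDA toolkit it was compiled against
print(f"torch {torch.__version__}, CUDA build {torch.version.cuda}")

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    # total_memory is in bytes; an RTX 3060 should report roughly 12 GB
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM, "
          f"compute capability {props.major}.{props.minor}")
EOF
```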
**5. Install nanochat Editable**
```bash
uv pip install -e .
```
Purpose: Editable install (symlinks to source).
Result: nanochat==0.1.0 installed; enables module imports. ✅
**6. Verify nanochat Imports**
```bash
uv run python -c "from nanochat.gpt import GPT, GPTConfig; print('nanochat installed OK')"
```
Purpose: Test the core model imports.
Result: `nanochat installed OK`. ✅
**7. Compile Rust Tokenizer**
```bash
uv run maturin develop --release --manifest-path rustbpe/Cargo.toml
```
Purpose: Build the RustBPE tokenizer (~2-5 min).
Result: 📦 Built wheel; RustBPETokenizer available. ✅
**8. Verify Tokenizer**
```bash
uv run python -c "from nanochat.tokenizer import RustBPETokenizer; print('Tokenizer compiled OK')"
```
Purpose: Confirm the tokenizer module imports correctly.
Result: `Tokenizer compiled OK`. ✅
**9. Set PYTHONPATH**
```bash
export PYTHONPATH="$(pwd):$PYTHONPATH"
```
Purpose: Add the repo root to the Python path for multiprocessing workers.
Result: Required for torchrun; add it to ~/.bashrc to make it permanent (snippet below). ✅
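To persist the export across shells, append it to ~/.bashrc; the path below assumes the repo lives at ~/nanochat as in step 1:

```bash
# Persist PYTHONPATH for future shells (path assumes the step-1 clone location)
echo 'export PYTHONPATH="$HOME/nanochat:$PYTHONPATH"' >> ~/.bashrc
source ~/.bashrc  # reload so the current shell picks it up
```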
**10. Download Dataset**
```bash
PYTHONPATH="$(pwd):$PYTHONPATH" uv run python -m nanochat.dataset -n 40
```
Purpose: Download 40 FineWeb-EDU shards (~4 GB, 5-15 min).
Result: `Downloading 8 shards using 4 workers...`; target directory is ~/.cache/nanochat/base_data/. ✅
**11. Verify Dataset**
```bash
ls ~/.cache/nanochat/base_data/ | wc -l
du -sh ~/.cache/nanochat/base_data/
```
Purpose: Check the downloaded shards.
Result: 40 files, ~4 GB total (shard_00000.parquet through shard_00039.parquet).
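To look inside a shard rather than just counting files, pandas can read the parquet directly; a minimal sketch, assuming the shards keep FineWeb-EDU's published `text` column and that pandas/pyarrow were pulled in by `uv sync`:

```bash
uv run python - <<'EOF'
from pathlib import Path
import pandas as pd

shard = Path.home() / ".cache/nanochat/base_data/shard_00000.parquet"
df = pd.read_parquet(shard)
print(f"{len(df)} rows, columns: {list(df.columns)}")
# "text" is an assumption based on FineWeb-EDU's published schema
print(df["text"].iloc[0][:200])
EOF
```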
**12. Train Tokenizer (Optional)**
```bash
uv run python -m scripts.tok_train --max_chars=2000000000
```
Purpose: Train a custom BPE tokenizer (~10-20 min on CPU).
Result: Generates tokenizer.json (~65K vocab); edit vocab_size=32768 for a smaller model. ✅
**13. Eval Tokenizer (Optional)**
```bash
uv run python -m scripts.tok_eval
```
Purpose: Test tokenizer compression (~1 min).
Result: Compression ratio ~4.8 chars/token (vs. the GPT-4 tokenizer).
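To reproduce the chars/token figure by hand, tokenize a sample and divide. A hypothetical sketch: the constructor and `encode()` below are assumptions, so check nanochat/tokenizer.py for the real API:

```bash
uv run python - <<'EOF'
# Hypothetical sketch: the RustBPETokenizer API names below are assumptions.
from nanochat.tokenizer import RustBPETokenizer

sample = open("README.md").read()
tok = RustBPETokenizer()   # assumed default constructor
ids = tok.encode(sample)   # assumed encode() -> list of token ids
print(f"{len(sample) / len(ids):.2f} chars/token over {len(ids)} tokens")
EOF
```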
**14. Pretrain Model**
```bash
PYTHONPATH="$(pwd):$PYTHONPATH" uv run torchrun --standalone --nproc_per_node=1 \
  scripts/base_train.py --depth=10 --device_batch_size=4 --max_seq_len=1024 \
  --compile --num_iterations=800
```
Purpose: Base training at depth=10 (~100M params; ~1-2 hours on the RTX 3060).
Result: `Overriding: depth=10...`; loss falls from ~11.09 to ~2.0; checkpoints land in checkpoints/d10/. Monitor with `watch -n 1 nvidia-smi` (VRAM ~7 GB).
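If training OOMs on the 3060's 12 GB, the flags already used in step 14 are the levers; halving device_batch_size and/or max_seq_len is the usual first move. Same invocation, smaller values:

```bash
# Lower-memory variant of the step-14 run (same flags, smaller values)
PYTHONPATH="$(pwd):$PYTHONPATH" uv run torchrun --standalone --nproc_per_node=1 \
  scripts/base_train.py --depth=10 --device_batch_size=2 --max_seq_len=512 \
  --compile --num_iterations=800
```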
**15. Midtrain (Post-Pretrain)**
```bash
PYTHONPATH="$(pwd):$PYTHONPATH" uv run python scripts/mid_train.py --model_path checkpoints/d10
```
Purpose: Add reasoning datasets (SmolTalk/MMLU/GSM8K; ~10 min, 4 GB VRAM).
Result: Improves chat coherence; output lands in checkpoints/d10/midtrain.pt. ✅
**16. Supervised Fine-Tuning**
```bash
uv run python scripts/sft.py --batch_size=2
```
Purpose: Align the model to ChatGPT-style responses (~5 min, 2 GB VRAM).
Result: Enables conversational capabilities. ✅
**17. Evaluation**
```bash
uv run python scripts/eval.py
```
Purpose: Run the benchmarks (~5 min).
Result: MMLU ~25-35%, GSM8K ~5-10%, HumanEval ~8%, ARC-Easy ~30%. ✅
**18. Launch Web UI**
```bash
uv run python -m nanochat.webui
```
Purpose: Start the ChatGPT-like interface.
Result: Serves on localhost:8000; test with the trained d10 model. ✅
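Before opening a browser, a quick curl confirms the server answers on port 8000 (the root path is an assumption; the UI is meant for the browser anyway):

```bash
# Prints an HTTP status code (e.g. 200) if the UI is up, instead of
# failing with "connection refused"
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8000/
```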
**19. Full Pipeline (Alternative)**
```bash
bash speedrun.sh
```
Purpose: Automated script (edit depth=10 into base_train.py for the RTX 3060).
Result: Handles tokenizer → pretrain → midtrain → SFT → eval → UI; generates report.md. ✅