Posts: 44158
Joined: Thu Oct 06, 2005 05:47 PM
Re: Training nanochat on an NVIDIA RTX 3060!
Posted: Sun Oct 26, 2025 10:10 AM
**1. Clone Repo**
```bash
cd ~
git clone https://github.com/karpathy/nanochat
cd nanochat
```
Purpose: Clone the standard nanochat repository from Karpathy.
Result: Creates the ~/nanochat directory with all source files. ✅
**2. Create venv**
```bash
python3.10 -m venv .venv
source .venv/bin/activate
```
Purpose: Set up a Python 3.10 virtual environment.
Result: Activates .venv; use uv for faster package management. ✅
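If `uv` isn't on the machine yet (step 3 depends on it), Astral's standalone installer is a one-liner; installing it into the already-activated venv with pip works too:

```bash
# Official standalone installer for uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# Or install it into the already-activated venv instead:
pip install uv
```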
**3. Install Dependencies**
```bash
uv sync
```
Purpose: Install PyTorch, CUDA libs, maturin, and other deps (~3-5 min).
Result: Resolved 91 packages; CUDA: True, GPU: NVIDIA GeForce RTX 3060. ✅
**4. Verify Torch/CUDA**
```bash
uv run python -c "import torch; print(f'CUDA: {torch.cuda.is_available()}, GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else None}')"
```
Purpose: Confirm CUDA and GPU detection.
Result: `CUDA: True, GPU: NVIDIA GeForce RTX 3060`. ✅
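If the one-liner isn't enough, a slightly longer probe reports the torch build, the CUDA version it was compiled against, and the card's VRAM; a sketch using only standard torch attributes:

```bash
uv run python - <<'EOF'
import torch

# Torch build and the CUDA toolkit it was compiled against
print(f"torch {torch.__version__}, CUDA build {torch.version.cuda}")

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    # total_memory is in bytes; an RTX 3060 should report roughly 12 GB
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM, "
          f"compute capability {props.major}.{props.minor}")
EOF
```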
**5. Install nanochat Editable**
```bash
uv pip install -e .
```
Purpose: Editable install (symlinks to source).
Result: nanochat==0.1.0 installed; enables module imports. ✅
**6. Verify nanochat Imports**
```bash
uv run python -c "from nanochat.gpt import GPT, GPTConfig; print('nanochat installed OK')"
```
Purpose: Test the core model imports.
Result: `nanochat installed OK`. ✅
**7. Compile Rust Tokenizer**
```bash
uv run maturin develop --release --manifest-path rustbpe/Cargo.toml
```
Purpose: Build the RustBPE tokenizer (~2-5 min).
Result: 📦 Built wheel; RustBPETokenizer available. ✅
**8. Verify Tokenizer**
```bash
uv run python -c "from nanochat.tokenizer import RustBPETokenizer; print('Tokenizer compiled OK')"
```
Purpose: Confirm the tokenizer module imports correctly.
Result: `Tokenizer compiled OK`. ✅
**9. Set PYTHONPATH**
```bash
export PYTHONPATH="$(pwd):$PYTHONPATH"
```
Purpose: Add the repo root to the Python path for multiprocessing workers.
Result: Required for torchrun; add it to ~/.bashrc to make it permanent (snippet below). ✅
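To persist the export across shells, append it to ~/.bashrc; the path below assumes the repo lives at ~/nanochat as in step 1:

```bash
# Persist PYTHONPATH for future shells (path assumes the step-1 clone location)
echo 'export PYTHONPATH="$HOME/nanochat:$PYTHONPATH"' >> ~/.bashrc
source ~/.bashrc  # reload so the current shell picks it up
```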
**10. Download Dataset**
```bash
PYTHONPATH="$(pwd):$PYTHONPATH" uv run python -m nanochat.dataset -n 40
```
Purpose: Download 40 FineWeb-EDU shards (~4 GB, 5-15 min).
Result: `Downloading 8 shards using 4 workers...`; target directory is ~/.cache/nanochat/base_data/. ✅
**11. Verify Dataset**
```bash
ls ~/.cache/nanochat/base_data/ | wc -l
du -sh ~/.cache/nanochat/base_data/
```
Purpose: Check the downloaded shards.
Result: 40 files, ~4 GB total (shard_00000.parquet through shard_00039.parquet).
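To look inside a shard rather than just counting files, pandas can read the parquet directly; a minimal sketch, assuming the shards keep FineWeb-EDU's published `text` column and that pandas/pyarrow were pulled in by `uv sync`:

```bash
uv run python - <<'EOF'
from pathlib import Path
import pandas as pd

shard = Path.home() / ".cache/nanochat/base_data/shard_00000.parquet"
df = pd.read_parquet(shard)
print(f"{len(df)} rows, columns: {list(df.columns)}")
# "text" is an assumption based on FineWeb-EDU's published schema
print(df["text"].iloc[0][:200])
EOF
```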
**12. Train Tokenizer (Optional)**
```bash
uv run python -m scripts.tok_train --max_chars=2000000000
```
Purpose: Train a custom BPE tokenizer (~10-20 min on CPU).
Result: Generates tokenizer.json (~65K vocab); edit vocab_size=32768 for a smaller model. ✅
**13. Eval Tokenizer (Optional)**
```bash
uv run python -m scripts.tok_eval
```
Purpose: Test tokenizer compression (~1 min).
Result: Compression ratio ~4.8 chars/token (vs. the GPT-4 tokenizer).
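To reproduce the chars/token figure by hand, tokenize a sample and divide. A hypothetical sketch: the constructor and `encode()` below are assumptions, so check nanochat/tokenizer.py for the real API:

```bash
uv run python - <<'EOF'
# Hypothetical sketch: the RustBPETokenizer API names below are assumptions.
from nanochat.tokenizer import RustBPETokenizer

sample = open("README.md").read()
tok = RustBPETokenizer()   # assumed default constructor
ids = tok.encode(sample)   # assumed encode() -> list of token ids
print(f"{len(sample) / len(ids):.2f} chars/token over {len(ids)} tokens")
EOF
```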
**14. Pretrain Model**
```bash
PYTHONPATH="$(pwd):$PYTHONPATH" uv run torchrun --standalone --nproc_per_node=1 \
  scripts/base_train.py --depth=10 --device_batch_size=4 --max_seq_len=1024 \
  --compile --num_iterations=800
```
Purpose: Base training at depth=10 (~100M params; ~1-2 hours on the RTX 3060).
Result: `Overriding: depth=10...`; loss falls from ~11.09 to ~2.0; checkpoints land in checkpoints/d10/. Monitor with `watch -n 1 nvidia-smi` (VRAM ~7 GB).
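If training OOMs on the 3060's 12 GB, the flags already used in step 14 are the levers; halving device_batch_size and/or max_seq_len is the usual first move. Same invocation, smaller values:

```bash
# Lower-memory variant of the step-14 run (same flags, smaller values)
PYTHONPATH="$(pwd):$PYTHONPATH" uv run torchrun --standalone --nproc_per_node=1 \
  scripts/base_train.py --depth=10 --device_batch_size=2 --max_seq_len=512 \
  --compile --num_iterations=800
```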
**15. Midtrain (Post-Pretrain)**
```bash
PYTHONPATH="$(pwd):$PYTHONPATH" uv run python scripts/mid_train.py --model_path checkpoints/d10
```
Purpose: Add reasoning datasets (SmolTalk/MMLU/GSM8K; ~10 min, 4 GB VRAM).
Result: Improves chat coherence; output lands in checkpoints/d10/midtrain.pt. ✅
**16. Supervised Fine-Tuning**
```bash
uv run python scripts/sft.py --batch_size=2
```
Purpose: Align the model to ChatGPT-style responses (~5 min, 2 GB VRAM).
Result: Enables conversational capabilities. ✅
**17. Evaluation**
```bash
uv run python scripts/eval.py
```
Purpose: Run the benchmarks (~5 min).
Result: MMLU ~25-35%, GSM8K ~5-10%, HumanEval ~8%, ARC-Easy ~30%. ✅
**18. Launch Web UI**
```bash
uv run python -m nanochat.webui
```
Purpose: Start the ChatGPT-like interface.
Result: Serves on localhost:8000; test with the trained d10 model. ✅
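Before opening a browser, a quick curl confirms the server answers on port 8000 (the root path is an assumption; the UI is meant for the browser anyway):

```bash
# Prints an HTTP status code (e.g. 200) if the UI is up, instead of
# failing with "connection refused"
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8000/
```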
**19. Full Pipeline (Alternative)**
```bash
bash speedrun.sh
```
Purpose: Automated script (edit depth=10 into base_train.py for the RTX 3060).
Result: Handles tokenizer → pretrain → midtrain → SFT → eval → UI; generates report.md. ✅