Introducing 'nanochat,' an open source project that lets you build a conversational AI like ChatGPT from scratch for about 15,000 yen and 4 hours



Andrej Karpathy, an AI engineer and founding member of OpenAI, has released nanochat, an open source project for building AI chatbots like ChatGPT from scratch. With nanochat, it is possible to build a ChatGPT-style chatbot in just a few hours on a budget of about $100 (approximately 15,000 yen), starting from training a basic large language model (LLM).



GitHub - karpathy/nanochat: The best ChatGPT that $100 can buy.
https://github.com/karpathy/nanochat

nanochat
https://simonwillison.net/2025/Oct/13/nanochat/

nanochat provides all the elements necessary for LLM development in a single codebase, from the design of the neural network at the heart of the model, to the tokenization that converts text into units the model can process, pre-training to acquire knowledge, fine-tuning to refine conversational ability, and a web interface for chatting with the finished model. The whole codebase is relatively compact at about 8,000 lines, written primarily in Python (PyTorch), with Rust used for training the tokenizer, which requires high-speed processing.

The biggest features of nanochat are its ease of use and transparency. Developing a high-performance LLM typically requires an investment of hundreds of millions of yen, but nanochat dramatically reduces this cost by renting a machine equipped with eight NVIDIA H100 GPUs on an hourly basis.

For example, renting such a machine for about $24 (about 3,600 yen) per hour, you can complete the entire training process in roughly four hours simply by running the included script 'speedrun.sh.' The resulting model has about 560 million parameters and can handle basic conversation.
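
The arithmetic behind the roughly $100 figure is straightforward: the hourly GPU rental multiplied by the length of the run. A quick sanity check in Python, using only the numbers quoted above:

```python
# Cost of the "speedrun": an 8x NVIDIA H100 machine rented by the hour.
hourly_rate_usd = 24   # approximate rental price quoted in the article
training_hours = 4     # approximate wall-clock time for speedrun.sh
total_usd = hourly_rate_usd * training_hours
print(f"~${total_usd}")  # ~$96, in line with the ~$100 budget
```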



The training process can be broadly divided into four stages.

The first stage, 'pre-training,' is the most time-consuming, taking about three hours. Here, the model is fed a massive amount of text data (approximately 24GB) from FineWeb-EDU, a dataset collected from educational web pages. This is how the model acquires a broad grasp of the structure of language and knowledge about the world.
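
Pre-training optimizes a simple next-token-prediction objective. The sketch below is a generic PyTorch training step, not nanochat's actual code; `model` is assumed to be any causal language model that maps token IDs to logits.

```python
import torch.nn.functional as F

def pretrain_step(model, optimizer, tokens):
    # tokens: (batch, seq_len) integer IDs sampled from the training corpus
    inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict the next token
    logits = model(inputs)                           # (batch, seq_len-1, vocab)
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),         # flatten for cross-entropy
        targets.reshape(-1),
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Repeated over roughly 24GB of text, this single objective is what gives the model its knowledge of language and the world.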

Next comes a phase called 'mid-training.' In this process, the model is trained on a general conversation dataset (SmolTalk), multiple-choice questions (MMLU), and math word problems (GSM8K). This teaches the model not only facts, but also how to interact with users and how to answer questions posed in specific formats.
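
For this stage the raw examples have to be rendered into token streams the model can consume. The sketch below shows one plausible way to do that; the tags and template are illustrative assumptions, not nanochat's actual chat format.

```python
def render_chat(messages):
    # messages: list of {"role": "user" | "assistant", "content": str}
    # The <|role|> / <|end|> tags are hypothetical special tokens.
    return "".join(f"<|{m['role']}|>{m['content']}<|end|>" for m in messages)

def render_multiple_choice(question, choices, answer_index):
    # Turn an MMLU-style item into a conversation ending in a letter answer.
    letters = "ABCD"
    options = "\n".join(f"{letters[i]}. {c}" for i, c in enumerate(choices))
    return render_chat([
        {"role": "user", "content": f"{question}\n{options}"},
        {"role": "assistant", "content": letters[answer_index]},
    ])
```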

This is followed by a supervised fine-tuning (SFT) process lasting approximately seven minutes, in which a particularly high-quality selection of conversational data is used to further refine the model's responses.
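
A common detail of SFT, shown here as a hedged sketch rather than nanochat's confirmed implementation, is to compute the loss only on the assistant's tokens, so the model learns to produce answers rather than to reproduce user prompts.

```python
import torch.nn.functional as F

def sft_loss(logits, targets, assistant_mask):
    # logits: (batch, seq, vocab); targets: (batch, seq) token IDs
    # assistant_mask: (batch, seq) bool, True where the assistant is speaking
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    ).reshape(targets.shape)
    mask = assistant_mask.float()
    return (per_token * mask).sum() / mask.sum().clamp(min=1.0)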

The final stage, 'reinforcement learning (RL),' is optional and not performed by default. It allows the model to improve through repeated trial and error on its own, raising its accuracy on tasks with a clearly verifiable correct answer, such as arithmetic problems.
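
The reason arithmetic problems suit RL is that the reward can be checked automatically. One simple scheme, sketched here as an assumption about how such a reward might look rather than nanochat's exact code, is to extract the final number from a sampled answer and reward exact matches:

```python
import re

def math_reward(model_output: str, correct_answer: str) -> float:
    # Pull out every number, treating commas as thousands separators.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", model_output.replace(",", ""))
    if not numbers:
        return 0.0  # no numeric answer produced
    return 1.0 if numbers[-1] == correct_answer else 0.0
```

RL then up-weights the sampled responses that earned a reward, nudging the model toward answers that check out.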

According to Karpathy, a model trained with nanochat on the $100 budget scored 0.22 on the 'CORE metric,' which measures a model's overall language ability, slightly higher than the GPT-2 large model's 0.21. The model knew facts like 'The capital of France is Paris' and 'The chemical symbol for gold is Au,' but struggled with simple calculations. After fine-tuning, however, it was able to explain why the sky is blue by citing Rayleigh scattering and even write poetry on the same theme.



Once training is complete, users can interact with their hand-built LLM through a web browser. nanochat not only provides a low-cost LLM, but is also intentionally written as simple, highly readable code to make its internal structure easy to understand. Performance can be improved simply by increasing the number of layers in the model: with a budget of approximately $300 (approximately 45,000 yen) and about 12 hours of training, the result is expected to outperform the standard GPT-2 model.
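
Why the layer count is the scaling knob can be seen from a rough transformer parameter count. The formula below (about 12 × d_model² weights per layer, plus embedding tables) is a standard approximation, and the example dimensions are assumptions rather than values taken from the nanochat repository:

```python
def approx_params(n_layers: int, d_model: int, vocab_size: int) -> int:
    per_layer = 12 * d_model ** 2          # attention (4d^2) + MLP (8d^2)
    embeddings = 2 * vocab_size * d_model  # input embeddings + output head
    return n_layers * per_layer + embeddings

# With illustrative values of 20 layers, width 1,280, and a 65,536-token
# vocabulary, this lands at about 561 million parameters, in line with
# the reported size of the speedrun model.
print(approx_params(20, 1280, 65536))  # 560988160
```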

A 561-million-parameter language model built with nanochat has been published on Hugging Face.

sdobson/nanochat · Hugging Face
https://huggingface.co/sdobson/nanochat

Simon Willison, who published a script to run this model on a macOS CPU, noted that at approximately 560 million parameters the model is small enough that it should be able to run on inexpensive devices like a Raspberry Pi.
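
A back-of-the-envelope memory estimate supports that claim. The per-parameter byte counts below are standard sizes for common weight formats, not measurements of this particular checkpoint:

```python
# Approximate weight memory for a ~561M-parameter model in common formats.
PARAMS = 561_000_000
for fmt, bytes_per_param in [("float32", 4), ("float16", 2), ("int8", 1)]:
    gib = PARAMS * bytes_per_param / 2**30
    print(f"{fmt}: ~{gib:.2f} GiB")
# float32: ~2.09 GiB, float16: ~1.05 GiB, int8: ~0.52 GiB
```

The half-precision and quantized footprints fit comfortably within the RAM of a recent Raspberry Pi.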

When Willison actually tried the prompt 'Tell me about your dog,' the model replied, 'I'm excited to share my passion for dogs with you all. As a veterinarian, I've had the opportunity to help so many owners care for their precious dogs. Training, being a part of their lives, and seeing their faces light up when they see their favorite treats and toys is truly special. I've had the opportunity to work with over 1,000 dogs, and I have to say it's a rewarding experience. The bond between owner and pet.' The response was cut off when the model ran out of tokens.

in Software, Posted by log1i_yk