9/21/25
Andrej Karpathy has stated, “Most of my ‘coding’ now is in English,” meaning he spends more time writing natural-language instructions for the AI about what to build rather than writing low-level code himself. I think this is nice, but what does this actually mean?
How does it change the way that I do something? How do I actually produce software differently?
Here are some points I've picked up from practice and from reading.
Generating code
Andrej Karpathy writes Python to orchestrate the final solution but lets the AI fill in boilerplate and suggest implementations. John Carmack uses it for refactoring, design, and cleaning up old codebases.
- provide context: what we're building here and what the specific code section is trying to do
- specify the task clearly: "I'm trying to rewrite this function so that it does X" (example prompt below)
- format queries as if you were asking a coworker you admire and respect for help; you don't want to waste their time
- hit enter
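For example, a generation prompt might read something like this (the project and function details here are made up):
"Context: this repo trains a small protein-embedding model in PyTorch. The function below loads FASTA files into batches for training.
Task: I'm trying to rewrite this loader so it streams records instead of reading the whole file into memory. Input: a path to a FASTA file and a batch size. Output: an iterator of (id, sequence) batches. Keep the function signature the same.
Here's the current function: [paste code]"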
Debugging code
Instead of asking something lazy like "hey, can you help me figure out why my code isn't working?", you ask: "Here is the code section that is causing the error, and here is the error. It should do X but it's doing Y. It should output Z. Why is it failing?"*
Just put some effort into asking the question and the result will be better. I know, crazy: it's almost like everything in all of history is just about putting some work in and focusing.
*I guess this applies when you already know what the program should output, so a question this specific might actually stifle the model from generating something new, which is what a lot of deep learning models have shown they can do: produce novelty.
"Fix the context, not the code." - Bret Taylor
- provide context: what we're trying to do in the code we're about to build
- state the goal of the code being added: its inputs and outputs
- show the error: paste the environment details and the terminal output into the chat (full example below)
- hit enter
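Put together, a debugging prompt might read something like this (everything in brackets is a placeholder):
"Context: this script fine-tunes a small classifier; the function below builds the training batches.
Goal: it takes [inputs] and should return [outputs].
Error: running Python [version] on [OS], here's the terminal output: [paste traceback].
It should do X but it's doing Y. Why is it failing?"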
Use models against each other: I've also found that copying and pasting outputs from one model into another model's chat thread is really helpful.
Scaffolding a repo
- create a guidelines.txt, which, yes, does actually produce better results if you write it and improve on it over time. It should cover:
- the goal of the project (this changes every time; hopefully the rest stays relatively similar from build to build) and why it's being built (to provide context on purpose)
- core technologies you want to use (JAX, SBX, Python 3, Triton, PyTorch, transformers, and/or tinygrad, depending on the project)
- versions: understand when the training-data cutoff was for the model you're using, since it hasn't indexed newer changes in the packages you use (for instance, an older version of Node was compromised, but the model doesn't know that)
- coding principles: tell the model to use principles like DRY, encapsulation, and modularity
- tests: [I'm not sure on this one yet]
- logging: what you want logged and why. For building DL models, Cursor likes to output everything; I don't care about everything. Give me input/output shapes passing between functions when first getting it to run, performance and accuracy stats from training epochs, and results expressed visually through graphs when done (there's a small logging sketch after the example file below)
# guidelines.txt
## 1. project goal
- Objective: [Fill in project-specific goal]
- Why: [Explain scientific/engineering purpose]
## 2. core technologies
- Python 3.10+
- JAX / SBX / Triton / PyTorch / Tinygrad
- Transformers
- Bio libs: Biopython, RDKit, Rosetta, ColabFold
## 3. versions
- track model training cutoff vs. package updates
- Pin versions:
python==3.11
jax==0.4.35
torch==2.2.0
triton==2.2.0
transformers==4.44.2
biopython==1.83
rdkit==2024.03.3
## 4. coding principles
- DRY, encapsulation, modularity
- Separate data, model, train, eval, utils
## 5. tests
- unit tests for core functions + hot paths
- smoke tests: FASTA → embedding shape, docking → score range
- regression test with fixed seed
## 6. logging
- always: tensor shapes, training loss/accuracy, GPU/mem usage
- optional: graphs of metrics, embeddings
## 7. project structure
project/
├── guidelines.txt
├── data/
├── notebooks/
├── src/
│ ├── dataloader.py
│ ├── model.py
│ ├── train.py
│ ├── eval.py
│ └── utils.py
├── tests/
├── scripts/
├── requirements.txt
└── README.md
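On the logging point: here's a minimal sketch of the kind of shape logging I mean, assuming PyTorch. The decorator and names are just illustrative, not from any particular library or tool.

import functools
import torch

def log_shapes(fn):
    # print input/output tensor shapes every time fn is called
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        in_shapes = [tuple(a.shape) for a in args if isinstance(a, torch.Tensor)]
        out = fn(*args, **kwargs)
        outs = out if isinstance(out, tuple) else (out,)
        out_shapes = [tuple(o.shape) for o in outs if isinstance(o, torch.Tensor)]
        print(f"{fn.__name__}: in={in_shapes} out={out_shapes}")
        return out
    return wrapper

@log_shapes
def embed(batch):
    # stand-in for a real forward pass
    return torch.nn.Linear(128, 64)(batch)

embed(torch.randn(32, 128))  # prints: embed: in=[(32, 128)] out=[(32, 64)]

Asking for something like this in guidelines.txt keeps the model from dumping every intermediate value while still surfacing the shapes I actually care about.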
Setting evals
TBD: I'm still getting better at this; it's more art than science. But someone who's good at writing evals would also write really good environment/state/action definitions in RL. There's an article below that I haven't read yet but that looks good.
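As a very rough sketch of where I'd start (just fixed cases plus a check function; nothing here is from the article below, and the names are made up):

# a minimal eval: fixed cases plus a pass/fail check
cases = [
    {"prompt": "Reverse the string 'abc'", "expect": "cba"},
    {"prompt": "What is 2 + 3?", "expect": "5"},
]

def run_eval(generate, cases):
    # generate: any callable mapping a prompt string to an answer string
    passed = sum(1 for c in cases if c["expect"] in generate(c["prompt"]))
    return passed / len(cases)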
Read later
add trial 1 later: Michelangelo van Dam, "Personal experience with JetBrains Junie: day 1" (in2it)
post about evals: https://hamel.dev/blog/posts/evals/
go through this guy's repos: https://github.com/r2d4
and his blog post on working with LLMs: https://blog.matt-rickard.com/p/literate-programming-with-llms