Terminal LLM Trainer

Finetune small LLMs without an ML background

I noticed friends building agents kept hitting the same wall: small open-source LLMs are great until they fumble tool calls. The fix is finetuning, but the ML pipeline is hostile to anyone who hasn’t lived in it for years.

Autotrainer wraps the whole workflow in a TUI: generate synthetic data, finetune with VB-LoRA (a parameter-efficient method that runs on consumer GPUs), evaluate with BERTScore/CodeBERTScore, then progressively align with ORPO, treating the ground truth as the preferred answer and the model’s own prior generations as the rejected one.
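To make the alignment step concrete, here is a minimal sketch of how ground truth could be paired with the model’s earlier outputs. It is an illustration under assumptions, not Autotrainer’s actual code: the `PreferencePair` class, the `build_pairs` helper, and the JSON tool-call strings are all hypothetical, and the prompt/chosen/rejected field names simply follow a common convention for preference data.

```python
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # ground-truth completion
    rejected: str  # the model's own earlier generation


def build_pairs(examples, prior_generations):
    """Pair each ground-truth answer with the model's earlier attempt.

    examples: list of (prompt, ground_truth) tuples
    prior_generations: dict mapping prompt -> earlier model output
    """
    pairs = []
    for prompt, truth in examples:
        previous = prior_generations.get(prompt)
        # Skip prompts with no earlier output, and cases where the model
        # already matched the ground truth (there is no contrast to learn from).
        if previous is None or previous.strip() == truth.strip():
            continue
        pairs.append(PreferencePair(prompt, truth, previous))
    return pairs


# Hypothetical tool-call data, for illustration only.
examples = [
    ("Find usages of parse_config", '{"tool": "code_search", "query": "parse_config"}'),
    ("Run the test suite", '{"tool": "run_tests"}'),
]
prior = {
    "Find usages of parse_config": '{"tool": "open_file", "path": "parse_config"}',
    "Run the test suite": '{"tool": "run_tests"}',
}
pairs = build_pairs(examples, prior)
```

Only the first prompt yields a pair here, because the model’s earlier answer to the second one already matches the ground truth.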

Highlights

Full TUI: synthetic data → SFT (VB-LoRA) → eval → ORPO alignment.
No separate RLHF reward model: ORPO trains directly on chosen/rejected pairs, with ground truth as chosen and the model’s prior generations as rejected.
Demo: finetuned SmolLM2-135M for MCP tool-calling (code_search, open_file, run_tests).
Runs on consumer hardware.
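
Why ORPO needs no reward model can be seen from its objective: the usual SFT loss on the chosen answer plus a penalty based on the odds ratio between chosen and rejected answers, both scored by the model being trained. The sketch below follows the published ORPO formulation on plain floats. The function names and the `lam=0.1` weight are illustrative assumptions, not values taken from Autotrainer.

```python
import math


def log_odds(avg_logprob):
    """Return log(p / (1 - p)), where p = exp(avg_logprob).

    avg_logprob is the mean per-token log-probability of a sequence
    and must be negative so that p < 1.
    """
    p = math.exp(avg_logprob)
    return avg_logprob - math.log1p(-p)


def orpo_loss(avg_logprob_chosen, avg_logprob_rejected, lam=0.1):
    """SFT loss on the chosen answer plus the weighted odds-ratio penalty."""
    nll_chosen = -avg_logprob_chosen  # standard SFT term
    log_odds_ratio = log_odds(avg_logprob_chosen) - log_odds(avg_logprob_rejected)
    # -log(sigmoid(x)) written as log(1 + exp(-x))
    penalty = math.log1p(math.exp(-log_odds_ratio))
    return nll_chosen + lam * penalty


# The penalty shrinks as the rejected answer becomes less likely
# relative to the chosen one.
close_call = orpo_loss(-0.5, -0.6)
clear_win = orpo_loss(-0.5, -3.0)
```

Both terms come from the policy’s own log-probabilities, so a single model and a single training pass are enough.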
