The Tuesday I Watched Qwen Run Locally on My Mac Mini With No Internet
It was a Tuesday night around 11pm and I was staring at my Mac mini, about to do something I had been putting off for weeks. I clicked the WiFi icon in the menu bar and turned it off. The little fan symbol went dark. And then I asked Qwen 7B, running through Ollama, to write me a Python function for parsing a messy CSV.
It just answered. Tokens streamed in like nothing had changed. No spinner of doom, no “network error,” no rate limit warning. The model was running on my own metal in my own apartment, and the rest of the internet might as well not have existed.
Why I finally tried this
I have been using cloud models for almost everything. Claude through Claude Code, GPT through the API, occasionally something via OpenRouter when I want to test a new release. They all work great until they don’t — until a model gets deprecated, a key gets revoked, a region gets restricted, or my proxy decides to take a nap.
I had read the local-LLM crowd talking about this for months. After losing four hours to r/LocalLLaMA one night, I finally felt like I understood enough to try it myself instead of just lurking.
The setup was almost embarrassingly easy
I installed Ollama with a single curl command. Then I ran ollama pull qwen2.5:7b, and the model came down in a few minutes. The Mac mini’s M-series chip and 16GB of unified memory ate it without complaining. No CUDA drivers, no Python environment hell, no Hugging Face token dance.
This was the part that surprised me most. After my Hugging Face afternoon where I drowned in 800,000 model variants, Ollama felt like someone had quietly done the curation work for me.
Eben’s note: I expected the first local model to feel like a downgrade. It felt more like getting my own key to a locked room I had been renting.
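For anyone who would rather script a prompt than type it into the terminal, here is a minimal sketch of talking to the local model from Python. It assumes Ollama is serving on its default address (localhost, port 11434) and that qwen2.5:7b has already been pulled. The names build_request and ask_local are made up for this example; only the /api/generate endpoint and its model, prompt, and stream fields come from Ollama itself.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="qwen2.5:7b"):
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local(prompt, model="qwen2.5:7b", timeout=300):
    """Send the prompt to the local model and return the text of its reply."""
    request = build_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With "stream": False the whole answer arrives in one "response" field.
    return body["response"]
```

Calling ask_local("Say hello in one short sentence.") should return the model's reply as a string. Because the URL points at localhost, nothing in this sketch needs an outside connection.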
The moment the WiFi went dark
I wanted to be sure it was actually local. Not a sneaky background API call, not some hidden phone-home. So I turned off WiFi and unplugged the ethernet cable for good measure. Then I typed: “Write me a Python function that reads a CSV with inconsistent column counts and returns a list of dicts using the first row as headers.”
Qwen thought for maybe three seconds and started generating. The code was not the prettiest — it had a small bug with empty lines — but it was real, working code. With zero connection to anything outside my apartment.
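To make the task concrete, here is one way such a function might look. This is not Qwen's output, just a minimal sketch under a few assumptions of mine: the input arrives as a string, short rows are padded with None, extra cells are dropped, and blank lines are skipped. The name read_ragged_csv is hypothetical.

```python
import csv
import io


def read_ragged_csv(text):
    """Parse CSV text whose rows may have too few or too many columns.

    The first non-blank row supplies the headers. Short rows are padded
    with None, extra cells are dropped, and rows with no content
    (including empty lines) are skipped.
    """
    rows = [
        row
        for row in csv.reader(io.StringIO(text))
        if any(cell.strip() for cell in row)  # drops [] and all-empty rows
    ]
    if not rows:
        return []

    headers = [h.strip() for h in rows[0]]
    records = []
    for row in rows[1:]:
        # Trim rows that are too long, then pad rows that are too short.
        trimmed = row[: len(headers)]
        padded = trimmed + [None] * (len(headers) - len(trimmed))
        records.append(dict(zip(headers, padded)))
    return records
```

Padding with None rather than an empty string keeps "this cell was missing" distinct from "this cell was empty", which is a design choice and easy to change.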
I sat there for a minute just watching the tokens come out. It was not faster than Claude. It was not smarter than Claude. But it was mine. Nobody could rate-limit it. Nobody could deprecate it. Nobody could change the system prompt under me overnight.
What it is not good at
Let me be honest. Qwen 7B is not going to replace Claude Code for serious work. The reasoning is shallower. Long-context tasks make it wobble. When I asked it to refactor a 400-line file, it hallucinated a function that did not exist anywhere in the input.
It is also slower than I expected on prompts longer than a few thousand tokens. The Mac mini fan, which I have basically never heard before, made a polite hum during a longer generation. That is the M-chip telling me it is actually working for once.
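"Slower than I expected" can be turned into a number. When Ollama's generate endpoint is called with streaming off, the JSON it returns includes eval_count (tokens generated) and eval_duration (time spent generating, in nanoseconds). A minimal sketch, assuming you already have that response parsed into a dict; the function name tokens_per_second is mine:

```python
def tokens_per_second(response):
    """Compute generation speed from a parsed Ollama generate response.

    Expects the "eval_count" (tokens generated) and "eval_duration"
    (nanoseconds spent generating) fields that Ollama reports.
    """
    duration_ns = response["eval_duration"]
    if duration_ns <= 0:
        return 0.0  # avoid dividing by zero on an empty generation
    # Convert nanoseconds to seconds by scaling the token count by 1e9.
    return response["eval_count"] * 1e9 / duration_ns
```

Running the same prompt at a few different lengths and comparing the results is a quick way to see where a given machine starts to struggle.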
For comparison, I still feel the way I felt after watching DeepSeek V3 write Python — the frontier cloud models are in a different league. But that is not the point of running local.
The strange new feeling of ownership
I have been paying for AI tools all year. Claude Pro, OpenRouter credits, the occasional API top-up. Every one of those is a subscription to someone else’s machine. The bill never stops, and the service can change shape any time.
On Tuesday night, for the first time, I had an AI that was just sitting on a hard drive in my home. Even if Anthropic disappeared tomorrow, even if OpenAI pulled every API key, even if my proxy died, my Mac mini would still answer me. That is a different category of thing.
What I am taking from this
I am not switching everything to local. Claude Code is still the daily driver and frontier models are worth the money for the hard problems. But I am going to keep at least one solid local model installed at all times now, like a generator in the basement.
If you have a Mac with 16GB or more, the practical takeaway is short: install Ollama, pull qwen2.5:7b or llama3.1:8b, and turn off your WiFi for one prompt. Watch the tokens come out anyway. The feeling is worth more than the answer.
Related Posts
- The Day I Found r/LocalLLaMA and Lost Four Hours to Reading Strangers
- The Afternoon I Tried Hugging Face and Got Lost in 800,000 Models
- The Morning I Realized DeepSeek V3 Writes Python Cleaner Than I Do
Tags: #AIagents #ClaudeCode #OpenClaw #MacMini #OpenRouter #buildinginpublic #Eben