LOCAL MODEL COMPATIBILITY // OPERATOR NOTES

This page is a practical operator note for keeping OpenZero stable: which local model lane to use, how quantization should be interpreted in practice, how Ollama storage behaves, and what to do when a node needs a compatibility fallback.

IMPORTANT DISTINCTION

Google quantum progress, Gemma releases, GGUF quantization methods, and Hugging Face model packaging are all public, external work. OpenZero's job is to integrate them properly: choose sane defaults, explain the storage paths honestly, and make installation easier for operators.

01 DEFAULT_MODEL_PATH

DEFAULT EDGE TRACK

Fresh nodes should start on the Gemma 4 edge path. Use gemma4:e4b as the normal default, drop to gemma4:e2b on weaker hardware, and only climb to gemma4:26b or gemma4:31b when the box can actually support it.
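The selection rule above can be sketched as a small lookup. This is only an illustration: the RAM cut-offs and the `allow_large` opt-in are assumptions made for the example, not published OpenZero or Gemma requirements, so tune them against real hardware.

```python
# Hypothetical RAM cut-offs in GiB (assumptions for illustration only).
ASSUMED_MIN_RAM_GIB = [
    (48, "gemma4:31b"),
    (32, "gemma4:26b"),
    (12, "gemma4:e4b"),
]
LARGE_TAGS = {"gemma4:26b", "gemma4:31b"}
FALLBACK_TAG = "gemma4:e2b"  # weaker hardware lands here


def pick_gemma_tag(ram_gib: float, allow_large: bool = False) -> str:
    """Pick a model tag for a node with `ram_gib` of usable memory.

    Large variants are opt-in, so a fresh node stays on the edge track
    unless the operator explicitly allows the climb.
    """
    for min_ram, tag in ASSUMED_MIN_RAM_GIB:
        if tag in LARGE_TAGS and not allow_large:
            continue
        if ram_gib >= min_ram:
            return tag
    return FALLBACK_TAG
```

With these assumed cut-offs, a 16 GiB node gets gemma4:e4b, an 8 GiB node gets gemma4:e2b, and a 64 GiB node still gets gemma4:e4b unless `allow_large=True` is passed.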

WHY THIS DEFAULT

The smaller Gemma 4 edge variants are the right “it actually runs” path for most operators. That is better than setting a glamorous default that fails on normal machines and makes OpenZero look broken.

CLOUD FALLBACK

When the task is heavier than the local node can comfortably handle, use Groq, GPT-OSS, or another cloud lane. Local Gemma should be the stable private baseline, not the only lane.
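One way to picture "local first, cloud when needed" is an ordered list of lanes that are tried in turn. The sketch below is hypothetical: `run_with_fallback` and the lane callables are invented for illustration and do not describe how OpenZero actually routes requests.

```python
from typing import Callable, Sequence, Tuple

# A lane is any callable that takes a prompt and returns an answer.
Lane = Callable[[str], str]


def run_with_fallback(prompt: str, lanes: Sequence[Tuple[str, Lane]]) -> Tuple[str, str]:
    """Try each (name, lane) in order; return (name, answer) from the first success."""
    errors = []
    for name, lane in lanes:
        try:
            return name, lane(prompt)
        except Exception as exc:  # e.g. out of memory or a timeout on the local node
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all lanes failed: " + "; ".join(errors))
```

Listing the local Gemma lane first keeps it as the private baseline, and the cloud lane is only reached when the local call raises.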

02 MODEL_STORAGE_REALITY

OLLAMA STORE

Native Ollama pulls do not appear in OpenZero's ./models folder. Ollama keeps its own model store, which is why users can feel like “the model is missing” even when the pull actually worked.

LOCAL ./MODELS FOLDER

The local ./models directory is for custom GGUF files that are downloaded and injected manually. That is the correct place for direct Hugging Face GGUF workflows, not native Ollama library pulls.

OPERATOR FIX

The interface should explain that split clearly: “native Ollama models live in the Ollama store; ./models is for manual GGUF injection.” That removes a lot of false “it failed” confusion.
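The split can be made concrete with a small helper that says where a given model reference is expected to live. It is a sketch under stated assumptions: `expected_location` is a hypothetical name, the default Ollama store is taken to be `~/.ollama/models` unless the `OLLAMA_MODELS` environment variable overrides it, and service-style installs may keep the store under a different user's home directory.

```python
import os
from pathlib import Path


def ollama_store_dir() -> Path:
    """Ollama's own model store: OLLAMA_MODELS if set, else ~/.ollama/models."""
    override = os.environ.get("OLLAMA_MODELS")
    return Path(override) if override else Path.home() / ".ollama" / "models"


def expected_location(model_ref: str, models_dir: Path = Path("./models")) -> Path:
    """Return where a model should be found.

    A reference ending in .gguf is treated as a manually injected file in
    ./models; anything else (for example "gemma4:e4b") is treated as a
    native Ollama pull that lives in the Ollama store.
    """
    if model_ref.lower().endswith(".gguf"):
        return models_dir / Path(model_ref).name
    return ollama_store_dir()
```

A status screen could print this location next to each model so that an empty ./models folder is no longer mistaken for a failed pull.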

03 QUANTIZATION_CHOICES

Quantization | Use It When | OpenZero Guidance
Q4_K_M | You want the best practical balance of speed, RAM use, and answer quality on normal hardware. | Best default for most custom GGUF installs.
Q6_K | You have a stronger machine and want a little more fidelity without jumping all the way up. | Good middle lane when the node has more headroom.
Q8_0 | You have real RAM to spare and want minimal compression. | Only use this when the hardware actually justifies it.

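
To judge whether the hardware "actually justifies" a quantization level, a rough size estimate helps. The sketch below uses approximate bits-per-weight figures for the llama.cpp types (Q6_K is the 6-bit lane from the table; the Q4_K_M figure is an average for a mixed scheme and varies by model). The 1.3x headroom factor for context and runtime overhead is an assumption, not a measured value.

```python
from typing import Optional

# Approximate bits per weight. Q8_0 and Q6_K follow from their block
# layouts; Q4_K_M mixes block types, so 4.85 is only a typical average.
APPROX_BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q6_K": 6.5625, "Q8_0": 8.5}


def estimate_gguf_gib(params_billion: float, quant: str) -> float:
    """Rough weight size in GiB for a model with `params_billion` parameters."""
    total_bits = params_billion * 1e9 * APPROX_BITS_PER_WEIGHT[quant]
    return total_bits / 8 / 2**30


def largest_quant_that_fits(
    params_billion: float, free_ram_gib: float, headroom: float = 1.3
) -> Optional[str]:
    """Return the highest-fidelity quantization whose estimate fits, or None."""
    for quant in ("Q8_0", "Q6_K", "Q4_K_M"):  # highest fidelity first
        if estimate_gguf_gib(params_billion, quant) * headroom <= free_ram_gib:
            return quant
    return None
```

For a 7-billion-parameter model, this estimate puts Q4_K_M near 4 GiB and Q8_0 near 7 GiB of weights, which is why Q4_K_M is the safer default on ordinary machines.
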
04 HUGGING_FACE_INJECTION

DIRECT GGUF LINKS ONLY

Use direct GGUF download links (on Hugging Face, the file's /resolve/ URL ending in .gguf), not the model card page. If the node is told to pull a page instead of a file, the workflow will look broken even though the only problem is the URL.
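A quick pre-flight check can catch page URLs before the node tries to download them. This is a minimal sketch: the helper names are hypothetical, and it relies only on the Hugging Face convention that `/blob/` URLs show a file page while `/resolve/` URLs serve the file itself.

```python
from urllib.parse import urlparse


def is_direct_gguf_url(url: str) -> bool:
    """True if the URL points at a .gguf file rather than an HTML page."""
    parsed = urlparse(url)
    path = parsed.path
    return (
        parsed.scheme in ("http", "https")
        and path.lower().endswith(".gguf")
        and "/blob/" not in path  # /blob/ is the file viewer page, not the file
    )


def to_direct_hf_url(url: str) -> str:
    """Rewrite a Hugging Face /blob/ file page into its /resolve/ download URL."""
    parsed = urlparse(url)
    if parsed.netloc == "huggingface.co" and "/blob/" in parsed.path:
        fixed_path = parsed.path.replace("/blob/", "/resolve/", 1)
        return parsed._replace(path=fixed_path).geturl()
    return url
```

A model card URL with no file name fails the check outright, while a `/blob/` link to a .gguf file can be repaired with `to_direct_hf_url` before injection.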

USE BUILT-IN GEMMA FIRST

If you want the official Google local path, use the built-in Gemma 4 install buttons first. Hugging Face injection is for custom aliases, custom quantizations, and alternative GGUF builds.

IMPLEMENTATION RULE

OpenZero should treat public model releases as deployable options: explain them clearly, route them to the right storage path, and avoid overclaiming them as proprietary internal breakthroughs.