How To Use Ollama
What Ollama actually is
Ollama is an open-source runtime for running large language models on your own machine. It downloads and manages model weights, runs quantized models via llama.cpp under the hood, and exposes whatever is loaded through both a CLI and a local HTTP API.
Installing Ollama
On macOS or Linux, a single curl command gets you the binary:
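The official install script is a one-liner (on Linux it needs sudo to register the service; inspect the script first if you prefer):

```shell
# Official install script for macOS/Linux from ollama.com
curl -fsSL https://ollama.com/install.sh | sh
```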
On Windows, grab the installer from ollama.com. On Linux servers, the install script also registers a systemd unit so the daemon survives reboots. Verify the install:
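A quick sanity check after installing (on Linux the installer names the systemd unit `ollama`):

```shell
# Print the installed version; confirms the binary is on PATH
ollama --version

# On a Linux server, confirm the daemon's systemd unit is running
systemctl is-active ollama
```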
Pulling and running your first model
Pick a model based on your RAM. For a 16GB laptop, Llama 3.1 8B quantized to Q4 is the sweet spot. For 8GB machines, drop to Phi-3 Mini or Qwen 2.5 3B. For 32GB+, Mistral Small or Llama 3.1 70B (heavily quantized) become viable.
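For the 16GB/8B case above, the pull-then-run flow looks like this (tags follow the library's model:size convention):

```shell
# Download the weights once; the transfer resumes if interrupted
ollama pull llama3.1:8b

# Drop into an interactive chat session; type /bye to exit
ollama run llama3.1:8b

# See what's on disk and how big each model is
ollama list
```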
The first run streams tokens to your terminal. Subsequent runs reuse the loaded model from memory until it idles out (five minutes by default).
Picking the right quantization
Model tags encode the quantization level. Q4_K_M is the usual default and the best quality-per-gigabyte trade-off for most users; Q8_0 is close to lossless but roughly doubles the memory footprint of a Q4 build; Q2 and Q3 variants shrink further at a noticeable quality cost. As a rule of thumb, you need roughly the model's file size in free RAM, plus headroom for the context window.
Using the HTTP API
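The daemon serves a REST API on localhost:11434. A minimal non-streaming chat request with curl (assuming the llama3.1:8b model pulled earlier):

```shell
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1:8b",
  "messages": [{"role": "user", "content": "Explain quantization in one sentence."}],
  "stream": false
}'
```

With `"stream": false` the response arrives as a single JSON object instead of a stream of chunks, which is easier to pipe into tools like jq.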
With the OpenAI SDK, just point the base URL at Ollama's OpenAI-compatible endpoint and use any string for the API key:
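A minimal sketch with the official openai Python package, assuming the daemon is running locally and the same 8B model is pulled:

```python
from openai import OpenAI

# Point the client at Ollama's OpenAI-compatible endpoint.
# The api_key is required by the SDK but ignored by Ollama.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="llama3.1:8b",  # any model you've pulled locally
    messages=[{"role": "user", "content": "Say hello in five words."}],
)
print(resp.choices[0].message.content)
```

Because only the base URL changes, existing OpenAI-based code and frameworks built on the SDK work against local models with no other modification.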