Question 1

Why are embeddings so cheap compared to LLM calls?

Accepted Answer

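Embedding models are much smaller than generative LLMs and run a single forward pass per text, with no token-by-token generation. At list prices at the time of writing, OpenAI's text-embedding-3-small ($0.02 per 1M tokens) is about 7.5x cheaper than GPT-4o mini input ($0.15 per 1M tokens), and the gap is far wider against larger models. Embed everything once, then query cheaply with vectors.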
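To make the comparison concrete, here is a minimal sketch of the arithmetic. The per-million-token prices are assumptions taken from public list prices and will drift, so substitute current values; `cost_usd` is a hypothetical helper, not part of any SDK.

```python
# Prices in USD per 1M tokens. These are ASSUMED list prices; check the
# provider's pricing page before relying on them.
EMBED_PRICE_PER_M = 0.02      # text-embedding-3-small (assumed)
LLM_INPUT_PRICE_PER_M = 0.15  # GPT-4o mini input tokens (assumed)


def cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost of processing `tokens` tokens at a per-million-token price."""
    return tokens / 1_000_000 * price_per_million


corpus_tokens = 500_000
embed_cost = cost_usd(corpus_tokens, EMBED_PRICE_PER_M)
llm_cost = cost_usd(corpus_tokens, LLM_INPUT_PRICE_PER_M)

print(f"embed once:      ${embed_cost:.4f}")
print(f"LLM input pass:  ${llm_cost:.4f}")
print(f"ratio:           {llm_cost / embed_cost:.1f}x")
```

The ratio depends entirely on the prices plugged in, and it ignores LLM output tokens, which are billed separately and usually at a higher rate.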

Question 2

Which embedding model is best?

Accepted Answer

For English text: OpenAI text-embedding-3-large is a reliable default. For quality: Voyage AI voyage-3 often benchmarks higher. For local/self-hosted: the BGE-M3 and E5 families are strong open-source choices. For domain-specific: consider fine-tuned embeddings (Voyage offers law, code, and finance variants).

Question 3

How do I know how many embeddings I need?

Accepted Answer

Count documents × chunks per document. Typical chunking: 500-1000 tokens per chunk. A 1000-page corpus (~500k tokens) makes ~500-1000 chunks, somewhat more if chunks overlap. When content updates, re-embed only the chunks that changed instead of the whole corpus; this saves cost long-term. Use content hashing to detect changes.
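A rough sketch of both ideas: estimating chunk count from a token total, and using content hashes to pick out only the chunks that need re-embedding. The function names and the `stored_hashes` mapping (chunk id to previously stored hash) are hypothetical; how you persist hashes is up to your pipeline.

```python
import hashlib
import math


def estimate_chunks(total_tokens: int, chunk_tokens: int) -> int:
    """Number of non-overlapping chunks needed to cover total_tokens."""
    return math.ceil(total_tokens / chunk_tokens)


def content_hash(text: str) -> str:
    """Stable fingerprint of a chunk's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def chunks_to_embed(chunks: dict[str, str], stored_hashes: dict[str, str]) -> list[str]:
    """Return ids of chunks that are new or whose text has changed."""
    return [
        chunk_id
        for chunk_id, text in chunks.items()
        if stored_hashes.get(chunk_id) != content_hash(text)
    ]


# ~500k tokens at 500-1000 tokens per chunk
low = estimate_chunks(500_000, 1000)
high = estimate_chunks(500_000, 500)

# Only "b" (edited) and "c" (new) need embedding; "a" is unchanged.
stored = {"a": content_hash("intro text"), "b": content_hash("old pricing")}
current = {"a": "intro text", "b": "new pricing", "c": "added section"}
stale = chunks_to_embed(current, stored)
```

Hashing the exact chunk text means any whitespace change triggers a re-embed; normalizing text before hashing is a reasonable refinement if that turns out to be noisy.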

Question 4

What's a good embedding dimension?

Accepted Answer

768-1536 is standard. Smaller (384) is faster and cheaper but slightly less accurate. Larger (3072+) gives diminishing returns. Most production systems use 1024-1536. Storage cost matters at scale: 1M embeddings at 1536 dims stored as float32 (4 bytes per value) take ~6GB of raw vector data in a vector DB, before index overhead.
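The storage figure is simple arithmetic. A small sketch, assuming float32 vectors (4 bytes per value) and counting raw vector data only; `raw_vector_bytes` is a hypothetical helper, and real vector databases add index and metadata overhead on top.

```python
def raw_vector_bytes(n_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw storage for n_vectors embeddings of `dims` dimensions.

    bytes_per_value defaults to 4 (float32, an assumption); use 2 for
    float16 or 1 for int8-quantized vectors.
    """
    return n_vectors * dims * bytes_per_value


for dims in (384, 768, 1536, 3072):
    gb = raw_vector_bytes(1_000_000, dims) / 1e9
    print(f"{dims:>5} dims: {gb:.2f} GB per 1M vectors")
```

Halving the dimension or the precision halves the raw footprint, which is why smaller dimensions and quantization both show up in cost discussions at scale.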

Embedding Cost Estimator
