AI APIs
Calling LLM APIs from code with keys, requests, and streaming
Overview
Most applications integrate AI through HTTP APIs (OpenAI, Anthropic, Google, Azure). You send a JSON request (for chat, a list of messages) and the provider returns generated text, embeddings, or structured data, depending on the endpoint. Store API keys in environment variables and proxy calls through your backend.
Syntax / Usage
Standard chat completion request (OpenAI-compatible shape used by many providers):
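const response = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    // Read the key from the environment; run this on the server only.
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "You are a concise coding assistant." },
      { role: "user", content: "Explain async/await in one paragraph." },
    ],
    temperature: 0.2, // lower values give more deterministic output
    max_tokens: 500, // cap on generated tokens
  }),
});
// Surface HTTP errors (401, 429, 5xx) instead of failing later on a missing `choices` field.
if (!response.ok) throw new Error(`API error ${response.status}`);
const data = await response.json();
const text = data.choices[0].message.content;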
Environment setup:
# .env.local (never commit)
OPENAI_API_KEY=sk-...
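A missing key usually shows up later as a confusing 401, so it can help to fail at startup instead. The sketch below assumes a hypothetical helper named `requireEnv` (it is not part of any SDK) that takes the environment as a plain object so it is easy to test:

```typescript
// Hypothetical helper: return a required variable or throw a clear error.
function requireEnv(
  name: string,
  env: Record<string, string | undefined>,
): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// In a Node.js server you would typically pass process.env:
//   const apiKey = requireEnv("OPENAI_API_KEY", process.env);
```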
Streaming: set stream: true in the request body, read the server-sent event (SSE) chunks as they arrive, and forward tokens to the client for lower perceived latency. Proxy all calls through a server route so keys never reach the browser.
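Reading the stream could look roughly like the sketch below. It assumes the OpenAI-style SSE format, where each event is one `data: <json>` line carrying `choices[0].delta.content` and the stream ends with `data: [DONE]`; other providers may frame events differently. The names `createSseParser` and `streamChat` are illustrative, not library functions.

```typescript
// Incremental parser: network chunks can split an event anywhere,
// so incomplete lines are buffered until the rest arrives.
function createSseParser() {
  let buffer = "";
  let finished = false;
  return {
    // Returns the text deltas completed by this chunk.
    push(chunk: string): string[] {
      buffer += chunk;
      const lines = buffer.split("\n");
      // The last element may be a partial line; keep it for the next chunk.
      buffer = lines.pop() ?? "";
      const deltas: string[] = [];
      for (const raw of lines) {
        const line = raw.trim();
        if (finished || !line.startsWith("data:")) continue;
        const payload = line.slice("data:".length).trim();
        if (payload === "[DONE]") {
          finished = true;
          continue;
        }
        const event = JSON.parse(payload);
        const delta = event.choices?.[0]?.delta?.content;
        if (typeof delta === "string") deltas.push(delta);
      }
      return deltas;
    },
  };
}

// Feed a streaming fetch Response through the parser, token by token.
async function streamChat(
  res: Response,
  onToken: (token: string) => void,
): Promise<void> {
  if (!res.body) throw new Error("Response has no body");
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  const parser = createSseParser();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters intact across chunks.
    const text = decoder.decode(value, { stream: true });
    for (const token of parser.push(text)) onToken(token);
  }
}
```

On a server route, `onToken` would typically write each token to the client response; keeping the parser separate from the network code makes it testable without a live API call.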
Examples
Basic error handling and retries:
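type Message = { role: "system" | "user" | "assistant"; content: string };

const url = "https://api.openai.com/v1/chat/completions";

async function chat(messages: Message[], retries = 2): Promise<string> {
  for (let attempt = 0; attempt <= retries; attempt++) {
    const res = await fetch(url, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      },
      // Model name assumed to match the request example above.
      body: JSON.stringify({ model: "gpt-4o-mini", messages }),
    });
    // Retry rate limits (429) and server errors (5xx) with a linearly growing delay.
    if ((res.status === 429 || res.status >= 500) && attempt < retries) {
      await new Promise((r) => setTimeout(r, 1000 * (attempt + 1)));
      continue;
    }
    if (!res.ok) throw new Error(`API error ${res.status}`);
    const data = await res.json();
    return data.choices[0].message.content;
  }
  throw new Error("Max retries exceeded");
}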
Cost control: set max_tokens, cache identical requests, use smaller models for drafts, and log token usage per user/feature.
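Caching and usage logging can be combined in a thin wrapper around the function that performs the real API call. The following is a minimal in-memory sketch: `createCachedChat`, `CallModel`, and the `feature` label are hypothetical names, and the `usage` fields are assumed to follow the OpenAI response shape (`prompt_tokens`, `completion_tokens`). Caching only makes sense when a repeated answer is acceptable, for example at low temperature.

```typescript
type Message = { role: "system" | "user" | "assistant"; content: string };
type ChatRequest = {
  model: string;
  messages: Message[];
  temperature?: number;
  max_tokens?: number;
};
type Usage = { prompt_tokens: number; completion_tokens: number };
type ChatResult = { text: string; usage: Usage };
// Hypothetical signature for whatever function performs the real API call.
type CallModel = (req: ChatRequest) => Promise<ChatResult>;

function createCachedChat(callModel: CallModel) {
  const cache = new Map<string, Promise<ChatResult>>();
  const usageByFeature = new Map<string, Usage>();

  async function chat(feature: string, req: ChatRequest): Promise<string> {
    // Serialize fields in a fixed order so equal requests share one key.
    const key = JSON.stringify([
      req.model,
      req.messages.map((m) => [m.role, m.content]),
      req.temperature ?? null,
      req.max_tokens ?? null,
    ]);
    let pending = cache.get(key);
    if (!pending) {
      // Only real upstream calls spend tokens, so usage is recorded here.
      pending = callModel(req).then((result) => {
        const total = usageByFeature.get(feature) ?? {
          prompt_tokens: 0,
          completion_tokens: 0,
        };
        total.prompt_tokens += result.usage.prompt_tokens;
        total.completion_tokens += result.usage.completion_tokens;
        usageByFeature.set(feature, total);
        return result;
      });
      // Drop failed calls so a transient error is not served from the cache.
      pending.catch(() => cache.delete(key));
      cache.set(key, pending);
    }
    return (await pending).text;
  }

  return { chat, usageByFeature };
}
```

Storing the promise rather than the resolved value means concurrent identical requests share a single upstream call. A production version would likely add expiry and a size limit, and persist usage somewhere more durable than a `Map`.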
Common Mistakes
- Exposing API keys in client-side JavaScript or mobile apps
- No rate limiting on your own endpoints, so users can drain your budget
- Not retrying 429/5xx responses or setting timeouts for long completions
- Parsing free-form text when structured output or tool calls would be safer
- Logging full prompts/responses containing passwords, tokens, or personal data
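For the rate-limiting item in the list above, a per-user fixed-window limiter is one simple option. This is a sketch only: `createRateLimiter` is a hypothetical name, state lives in process memory (so it does not work across multiple server instances), and the clock is injectable purely to make the behavior testable.

```typescript
// Allow at most `limit` requests per user within each `windowMs` window.
function createRateLimiter(
  limit: number,
  windowMs: number,
  now: () => number = Date.now,
) {
  const windows = new Map<string, { start: number; count: number }>();

  return function allow(userId: string): boolean {
    const t = now();
    let w = windows.get(userId);
    // Start a fresh window on first use or once the old one has expired.
    if (!w || t - w.start >= windowMs) {
      w = { start: t, count: 0 };
      windows.set(userId, w);
    }
    if (w.count >= limit) return false;
    w.count += 1;
    return true;
  };
}

// Example: 20 chat requests per user per minute.
const allowChat = createRateLimiter(20, 60_000);
```

A server route would call `allowChat(userId)` before forwarding to the provider and respond with 429 when it returns `false`.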
See Also
prompt-engineering large-language-models ai-agents