stackademic

AI APIs

Calling LLM APIs from code with keys, requests, and streaming

Overview

Most applications integrate AI through HTTP APIs (OpenAI, Anthropic, Google, Azure). You send a request, usually a list of chat messages (or raw text for embeddings), and the provider returns generated text, embeddings, or structured data. Store API keys in environment variables and proxy calls through your backend.

Syntax / Usage

Standard chat completion request (OpenAI-compatible shape used by many providers):

const response = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "You are a concise coding assistant." },
      { role: "user", content: "Explain async/await in one paragraph." },
    ],
    temperature: 0.2,
    max_tokens: 500,
  }),
});

if (!response.ok) {
  throw new Error(`API error ${response.status}: ${await response.text()}`);
}
const data = await response.json();
const text = data.choices[0].message.content;
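Reading data.choices[0].message.content directly assumes the reply always contains text. Below is a slightly more defensive sketch. It assumes the OpenAI chat completion response shape (choices[].message.content, choices[].finish_reason, usage). It fails loudly when no content is present, which happens for example when the model returns tool calls instead of text. It also flags output that was cut off by max_tokens. The names ChatCompletion and extractReply are hypothetical.

```typescript
// Assumed subset of the OpenAI chat completion response; other providers may differ.
type ChatCompletion = {
  choices: { message: { content: string | null }; finish_reason: string }[];
  usage?: { prompt_tokens: number; completion_tokens: number; total_tokens: number };
};

// Hypothetical helper: returns the reply text instead of silently yielding undefined/null.
function extractReply(data: ChatCompletion): { text: string; truncated: boolean } {
  const choice = data.choices?.[0];
  if (!choice || choice.message.content == null) {
    throw new Error("Response contained no message content");
  }
  // finish_reason "length" means generation stopped because max_tokens was reached
  return { text: choice.message.content, truncated: choice.finish_reason === "length" };
}
```

A truncated reply is often a sign that max_tokens is set too low for the task. It may be better to log or surface it than to pass it on as a complete answer.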

Environment setup:

# .env.local (never commit)
OPENAI_API_KEY=sk-...
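A missing key otherwise surfaces later as a confusing 401 from the provider. A common pattern is to check required variables once at server startup. The sketch below takes the environment as a plain object so it is easy to test. requireEnv is a hypothetical name; on a server you would call it as requireEnv(process.env, "OPENAI_API_KEY").

```typescript
// Hypothetical helper: read a required variable and fail fast if it is missing or blank.
function requireEnv(env: Record<string, string | undefined>, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable ${name}`);
  }
  return value;
}
```

Frameworks such as Next.js load .env.local automatically. Plain Node does not, so a loader such as dotenv, or the --env-file flag in recent Node versions, is needed.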

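Streaming: set stream: true in the request body. The response then arrives as server-sent events (SSE). With OpenAI's API, each event is a data: line carrying a JSON chunk, and the stream ends with data: [DONE]. Forward each token to the client as it arrives to lower perceived latency. Proxy all calls, streamed or not, through a server route so keys never reach the browser.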
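The SSE reading step can be sketched as an async generator over the response body. This assumes OpenAI's chat completion streaming format, where each chunk carries text in choices[0].delta.content. Other providers use different event shapes. streamDeltas is a hypothetical name. Network reads can split an event anywhere, so the parser buffers incomplete lines until the rest arrives.

```typescript
// Hypothetical parser: yields text deltas from an OpenAI-style SSE response body.
async function* streamDeltas(body: ReadableStream<Uint8Array>): AsyncGenerator<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  try {
    while (true) {
      const { value, done } = await reader.read();
      if (done) break;
      buffer += decoder.decode(value, { stream: true });
      // Complete lines are processed; a trailing partial line stays in the buffer
      const lines = buffer.split("\n");
      buffer = lines.pop() ?? "";
      for (const line of lines) {
        const trimmed = line.trim(); // also strips "\r" from CRLF line endings
        if (!trimmed.startsWith("data:")) continue; // skip blank separators and comments
        const payload = trimmed.slice(5).trim();
        if (payload === "[DONE]") return;
        const chunk = JSON.parse(payload);
        const delta = chunk.choices?.[0]?.delta?.content;
        // The first chunk often carries only the role, with empty or missing content
        if (typeof delta === "string" && delta.length > 0) yield delta;
      }
    }
  } finally {
    reader.releaseLock();
  }
}
```

In a server route, after a fetch with stream: true, you would iterate streamDeltas(response.body) and write each delta to your own response to the client.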

Examples

Basic error handling and retries:

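type Message = { role: "system" | "user" | "assistant"; content: string };

const url = "https://api.openai.com/v1/chat/completions";
const headers = {
  "Content-Type": "application/json",
  Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
};

async function chat(messages: Message[], retries = 2): Promise<string> {
  const body = JSON.stringify({ model: "gpt-4o-mini", messages });
  for (let attempt = 0; attempt <= retries; attempt++) {
    const res = await fetch(url, { method: "POST", headers, body });
    // Retry rate limits (429) and transient server errors (5xx) with linear backoff
    if ((res.status === 429 || res.status >= 500) && attempt < retries) {
      await new Promise((r) => setTimeout(r, 1000 * (attempt + 1)));
      continue;
    }
    if (!res.ok) throw new Error(`API error ${res.status}`);
    const data = await res.json();
    return data.choices[0].message.content;
  }
  // Unreachable in practice; satisfies the Promise<string> return type
  throw new Error("Max retries exceeded");
}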
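The retry loop above has no timeout, so a connection that hangs on a long completion would block it indefinitely. One way to bound each attempt is an AbortController whose signal is passed to fetch; fetch rejects once the signal aborts. Below is a minimal sketch with a hypothetical withTimeout helper. It takes any task that accepts a signal, so it is not tied to one client.

```typescript
// Hypothetical helper: abort the task's signal if it has not settled within `ms`.
async function withTimeout<T>(task: (signal: AbortSignal) => Promise<T>, ms: number): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await task(controller.signal);
  } finally {
    // Always clear the timer so a fast task does not leave it pending
    clearTimeout(timer);
  }
}
```

Inside the loop, the fetch call would become withTimeout((signal) => fetch(url, { method: "POST", headers, body, signal }), 30_000). The 30-second value is an arbitrary example. Whether to retry after a timeout is a separate choice.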

Cost control: set max_tokens, cache identical requests, use smaller models for drafts, and log token usage per user/feature.
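Two of these controls, caching identical requests and logging token usage per feature, can be combined in a small wrapper. This is a sketch under stated assumptions. The send function is a hypothetical stand-in for your actual API call. The usage field names follow OpenAI's prompt_tokens and completion_tokens. The cache is an unbounded in-memory Map keyed on the serialized request.

```typescript
// Hypothetical shapes for a request and a reply with OpenAI-style usage counts.
type ChatRequest = { model: string; messages: { role: string; content: string }[]; max_tokens?: number };
type Usage = { prompt_tokens: number; completion_tokens: number };
type ChatReply = { text: string; usage: Usage };
type Send = (req: ChatRequest) => Promise<ChatReply>;

// Wraps a send function with response caching and per-feature token accounting.
function withCostControls(send: Send) {
  const cache = new Map<string, ChatReply>();
  const tokensByFeature = new Map<string, number>();

  async function chat(feature: string, req: ChatRequest): Promise<string> {
    // Identical request bodies are served from the cache and cost no tokens
    const key = JSON.stringify(req);
    const hit = cache.get(key);
    if (hit) return hit.text;

    const reply = await send(req);
    cache.set(key, reply);
    const used = reply.usage.prompt_tokens + reply.usage.completion_tokens;
    tokensByFeature.set(feature, (tokensByFeature.get(feature) ?? 0) + used);
    return reply.text;
  }

  return { chat, tokensByFeature };
}
```

Caching fits best with low-temperature requests, where repeating a prompt should give a near-identical answer. A production version would likely add eviction (LRU or TTL) and a shared store, and would account for usage per user as well as per feature.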

Common Mistakes

  • Exposing API keys in client-side JavaScript or mobile apps
  • No rate limiting on your own endpoints—users can drain your budget
  • Ignoring 429/5xx retries and timeouts on long completions
  • Parsing free-form text when structured output or tool calls would be safer
  • Logging full prompts/responses containing passwords, tokens, or personal data
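For the rate-limiting point, a per-user token bucket is one common approach. The sketch below is an in-memory version with a hypothetical RateLimiter class. Each user can burst up to capacity requests, and tokens refill at refillPerSec. The clock is injectable so the logic can be tested without waiting.

```typescript
type Bucket = { tokens: number; lastMs: number };

// Hypothetical per-user token bucket limiter (in-memory, single process).
class RateLimiter {
  private readonly buckets = new Map<string, Bucket>();
  private readonly capacity: number;
  private readonly refillPerSec: number;
  private readonly now: () => number;

  constructor(capacity: number, refillPerSec: number, now: () => number = Date.now) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.now = now;
  }

  // Returns true and spends one token if this user may make a request now.
  allow(userId: string): boolean {
    const t = this.now();
    const bucket = this.buckets.get(userId) ?? { tokens: this.capacity, lastMs: t };
    // Refill in proportion to elapsed time, never above capacity
    const elapsedSec = (t - bucket.lastMs) / 1000;
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSec * this.refillPerSec);
    bucket.lastMs = t;
    this.buckets.set(userId, bucket);
    if (bucket.tokens < 1) return false;
    bucket.tokens -= 1;
    return true;
  }
}
```

A route handler would check limiter.allow(userId) before calling the provider and respond with 429 when it returns false. With several server instances, the buckets would need a shared store instead of a local Map.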

See Also

prompt-engineering large-language-models ai-agents