Adding Responses API to an Agent Framework
An open-source agent framework we use relies on LiteLLM's Chat Completions API for all model calls. That design worked well until newer models arrived that only support the Responses API—a different endpoint with a different message format, tool schema shape, and response structure. Running those models failed silently or threw opaque errors. The framework had no way to know which API a given model required; it assumed a single contract. Adding Responses API support without breaking the existing path required careful translation at every boundary: input, tools, and output. This post walks through the engineering decisions that made it work.
The Problem: Two APIs, One Framework
The framework was built around the Chat Completions contract. Messages have role and content. System prompts live in a dedicated system message. Tool definitions use a nested wrapper: {"type": "function", "function": {"name": "...", "description": "...", "parameters": {...}}}. The response is a single message with optional tool_calls. The rest of the platform—orchestration, tool execution, conversation history—expects that shape.
The Responses API changes all of it. System content becomes a developer role. Content blocks have a different structure. Tool schemas are flat: {"name": "...", "description": "...", "parameters": {...}} without the function wrapper. The response is a polymorphic stream of items—message items, function_call items—that must be reassembled into the format the framework expects. Sending Chat Completions–shaped payloads to the Responses endpoint produced 400s or malformed responses. Sending Responses-shaped payloads to Chat Completions broke everything. We needed a routing layer that chose the right API and translated both ways.
Auto-Detection and Routing
Models that require the Responses API are identified by name. We added auto-detection: any model whose name contains codex routes to the Responses API. We also exposed a use_responses_api flag in the YAML config so users can override when needed—for example, if a future model uses a different naming convention.
def should_use_responses_api(model: str, config: AgentConfig) -> bool:
    """Route to Responses API for codex-family models or explicit config."""
    if config.use_responses_api is not None:
        return config.use_responses_api
    return "codex" in model.lower()
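In the YAML config, the override is a single flag. The flag name comes from the framework; the surrounding keys and the model name here are illustrative:

```yaml
# use_responses_api is optional; omit it to rely on name-based auto-detection.
model: my-codex-model     # would route to the Responses API via the "codex" substring
use_responses_api: true   # explicit override, e.g. for unconventional model names
```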
The routing happens at the call site. Before invoking LiteLLM, we check should_use_responses_api. If true, we convert the input, call the Responses endpoint, and normalize the response. If false, we use the existing Chat Completions path unchanged. Zero regression for non-codex models.
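That branch can be sketched as a small dispatcher. The two backends are injected as callables standing in for the real LiteLLM calls, and a simplified `override` parameter plays the role of the config object; none of these names are from the framework itself:

```python
from typing import Callable, Optional

def route_call(
    model: str,
    messages: list[dict],
    chat_backend: Callable[[list[dict]], dict],
    responses_backend: Callable[[list[dict]], dict],
    override: Optional[bool] = None,
) -> dict:
    """Branch on model name (or explicit override) and dispatch to one backend."""
    use_responses = override if override is not None else "codex" in model.lower()
    if use_responses:
        # Responses path: the backend is expected to convert the input,
        # call the endpoint, and normalize the output.
        return responses_backend(messages)
    # Existing Chat Completions path, unchanged.
    return chat_backend(messages)
```

Because the branch lives at a single call site, removing it later (if one API subsumes the other) is a one-line change.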
Input Conversion: Chat Completions → Responses
The hardest part was converting messages losslessly. The agent's conversation history, tool calls, and tool results all flow through this pipeline. A mistranslation would corrupt context or drop tool invocations.
The system role maps to developer. Content must be wrapped in Responses-format blocks: {"type": "input_text", "text": "..."} for simple text, or the appropriate block type for multimodal content. Assistant text stays a message, but each entry in tool_calls becomes its own top-level function_call item. Tool-role messages become top-level function_call_output items. The converter walks the message list and emits the correct shape for each role and content type.
def convert_messages_to_responses(messages: list[dict]) -> list[dict]:
    """Translate Chat Completions messages to Responses API input items."""
    out = []
    for msg in messages:
        role = msg["role"]
        if role == "system":
            out.append({
                "role": "developer",
                "content": [{"type": "input_text", "text": msg["content"]}]
            })
        elif role == "user":
            content = msg.get("content") or ""
            if isinstance(content, str):
                blocks = [{"type": "input_text", "text": content}]
            else:
                blocks = [
                    {"type": "input_text", "text": b.get("text", "")}
                    for b in content if b.get("type") == "text"
                ]
            out.append({"role": "user", "content": blocks})
        elif role == "assistant":
            # Text becomes a message item; each tool call becomes its own
            # top-level function_call item, preserving order.
            if msg.get("content"):
                out.append({
                    "role": "assistant",
                    "content": [{"type": "output_text", "text": msg["content"]}]
                })
            for tc in msg.get("tool_calls") or []:
                out.append({
                    "type": "function_call",
                    "call_id": tc["id"],
                    "name": tc["function"]["name"],
                    "arguments": tc["function"].get("arguments", "{}")
                })
        elif role == "tool":
            out.append({
                "type": "function_call_output",
                "call_id": msg["tool_call_id"],
                "output": msg["content"]
            })
    return out
Edge cases matter: empty content, a missing or null tool_calls list, arguments that arrive as an empty string. The converter handles each and preserves the order the API expects. We also handle assistant messages that are only tool calls with no text, a common pattern in multi-turn tool use. One subtlety: the Responses API represents tool traffic as top-level items rather than messages. Each assistant function_call item must be followed, in order, by the function_call_output item carrying the matching call_id; the Chat Completions format instead uses a separate tool role. Mapping each tool message to a function_call_output item with the right call_id preserves the turn structure the model expects. Getting that ordering wrong (for example, batching all tool results at the end) breaks the conversation flow.
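For concreteness, here is one tool-use turn in both shapes. The tool name and values are made up; only the structure matters:

```python
# One Chat Completions tool-use turn: an assistant tool call, then its result.
chat_history = [
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "18C, clear"},
]

# The shape the converter emits: top-level items, call before output,
# matched by call_id rather than by message role.
responses_items = [
    {"type": "function_call", "call_id": "call_1",
     "name": "get_weather", "arguments": '{"city": "Paris"}'},
    {"type": "function_call_output", "call_id": "call_1", "output": "18C, clear"},
]
```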
Tool Schema Conversion
Function tools in Chat Completions use a nested format. The Responses API expects a flat format. The converter strips the function wrapper and passes through name, description, and parameters. Descriptions can be missing; we default to an empty string. Nested objects in parameters are preserved as-is—the Responses API accepts the same JSON Schema structure.
def flatten_tool_schema(tools: list[dict]) -> list[dict]:
    """Convert Chat Completions function tools to Responses API format."""
    out = []
    for t in tools:
        if t.get("type") != "function":
            continue
        fn = t.get("function", {})
        name = fn.get("name")
        if not name:
            continue  # a tool without a name can never be called; skip it
        out.append({
            "type": "function",
            "name": name,
            "description": fn.get("description") or "",
            "parameters": fn.get("parameters", {})
        })
    return out
We validate that name is present before adding to the output. Tools without a function key are skipped. The rest of the framework continues to define tools in the Chat Completions shape; the conversion is transparent at the call site. Nested objects in parameters—for example, a properties object with required arrays—pass through unchanged. The Responses API accepts standard JSON Schema, so we don't need to transform the structure, only unwrap the outer function layer.
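A before-and-after of the same tool makes the unwrapping concrete. The tool itself is hypothetical:

```python
# Chat Completions shape: the schema lives under a "function" wrapper.
nested = {
    "type": "function",
    "function": {
        "name": "search_docs",
        "description": "Search the documentation.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}

# Responses API shape: same fields, one level up; the JSON Schema in
# "parameters" passes through byte-for-byte.
flat = {
    "type": "function",
    "name": "search_docs",
    "description": "Search the documentation.",
    "parameters": nested["function"]["parameters"],
}
```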
Response Normalization
The Responses API returns a stream or list of items. Each item can be a message (with content blocks) or a function call. We need to extract the final text and all tool calls into the single-message format the framework expects: {"role": "assistant", "content": "...", "tool_calls": [...]}.
The normalizer iterates over the response items. For message items, it accumulates text from output_text blocks. For function_call items, it appends to a list of tool calls with the correct ID, name, and arguments. The result is a single assistant message that the agent loop can process like any other.
def normalize_responses_output(items: list[dict]) -> dict:
    """Normalize Responses API output items to Chat Completions shape."""
    content_parts = []
    tool_calls = []
    for item in items:
        if item.get("type") == "message":
            for block in item.get("content", []):
                if block.get("type") == "output_text":
                    content_parts.append(block.get("text", ""))
        elif item.get("type") == "function_call":
            tool_calls.append({
                "id": item.get("call_id", ""),
                "type": "function",
                "function": {
                    "name": item.get("name", ""),
                    "arguments": item.get("arguments", "{}")
                }
            })
    return {
        "role": "assistant",
        "content": "\n".join(content_parts) if content_parts else "",
        "tool_calls": tool_calls
    }
Empty content and empty tool_calls are valid. The agent loop handles both. The key is that the output shape is identical to what Chat Completions returns, so no downstream code needs to change.
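A concrete input/output pair, with made-up values, shows the collapse from mixed items to one message:

```python
# Mixed output items of the kind the Responses API returns.
items = [
    {"type": "message", "content": [
        {"type": "output_text", "text": "Let me check the weather."}]},
    {"type": "function_call", "call_id": "call_1",
     "name": "get_weather", "arguments": '{"city": "Paris"}'},
]

# The single assistant message the normalizer produces from them,
# in the exact shape the agent loop already consumes.
normalized = {
    "role": "assistant",
    "content": "Let me check the weather.",
    "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}}
    ],
}
```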
Zero Regression
The existing Chat Completions path is completely untouched. For models that don't use the Responses API, the request flows through the original code path. No conversion, no normalization. We added a thin wrapper at the call site that branches on should_use_responses_api and only runs the conversion pipeline when needed.
We covered the new path with 64 unit tests. Config validation tests ensure the flag is parsed correctly and auto-detection works for codex and non-codex model names. Content conversion tests verify system→developer, user→user, assistant→assistant with tool calls, and tool→function_call_output. Tool schema tests cover the flattening logic, including edge cases like missing descriptions and empty parameters. Response normalization tests use mocked API responses with mixed message and function_call items. End-to-end tests run a minimal agent loop with a mocked Responses API to confirm the full pipeline works.
Adding a new API surface to an existing framework is always risky. By keeping the conversion at the boundary and preserving the internal contract, we avoided touching the orchestration logic. The agent framework still thinks in Chat Completions. The Responses API is an implementation detail behind the same interface.
The takeaway: when a provider introduces a new API that overlaps in purpose but differs in shape, the right approach is to build a translation layer at the call boundary rather than refactoring the entire stack. Auto-detection keeps the default behavior correct for new models. Explicit config overrides give users control when naming conventions change. And comprehensive tests—config, conversion, normalization, and end-to-end—ensure that the translation is correct and that the old path remains untouched. The framework now supports both APIs without the rest of the platform needing to know which one is in use.
Tech stack: Python, LiteLLM, YAML configs, pytest for unit and integration tests.