When a request to the OpenAI API could use more than one tool, the model doesn’t have to ask for them one at a time, waiting for each result before requesting the next. It can request several tool calls in a single response.

Let’s see what that looks like in practice.

Imagine you have a weather tool and a calculator tool.

from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    }
                },
                "required": ["location"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "calculate_expression",
            "description": "Calculate the value of a mathematical expression",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "The mathematical expression to evaluate, e.g. '2 + 2'",
                    }
                },
                "required": ["expression"],
            },
        },
    },
]

# The user's request
user_message = "What's the weather in New York, NY and what is 5 * 7?"

# First call to the OpenAI API
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": user_message}
    ],
    tools=tools,
    tool_choice="auto",
)

# The assistant's response will contain tool calls for both functions
tool_calls = response.choices[0].message.tool_calls

print(tool_calls)

Serialized as JSON, the tool calls returned by this first API call look something like this:

[
  {
    "id": "call_...",
    "type": "function",
    "function": {
      "name": "get_weather",
      "arguments": "{\"location\": \"New York, NY\"}"
    }
  },
  {
    "id": "call_...",
    "type": "function",
    "function": {
      "name": "calculate_expression",
      "arguments": "{\"expression\": \"5 * 7\"}"
    }
  }
]

Notice how both get_weather and calculate_expression are present in the same tool_calls list. The system doesn’t wait for one to finish before deciding on the other. It analyzes the entire user prompt and identifies all the distinct, parallelizable tasks it can delegate to available tools.

This is the core of parallel function calling. The model, based on its understanding of your tools and the user’s request, can simultaneously identify multiple independent operations. It doesn’t execute these calls itself; it simply outputs the instructions (the tool_calls) for you to execute them.
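One detail worth making concrete: each call's arguments field is a JSON-encoded string, not a dict, so it has to be decoded before you can pass it to a function. The following is a minimal sketch that uses plain dicts shaped like the output above. The call_1/call_2 IDs and the parse_tool_calls helper are made up for illustration and are not part of the OpenAI SDK.

```python
import json

# Hand-written stand-in for response.choices[0].message.tool_calls,
# expressed as plain dicts. The IDs are placeholders.
raw_tool_calls = [
    {
        "id": "call_1",
        "type": "function",
        "function": {
            "name": "get_weather",
            "arguments": "{\"location\": \"New York, NY\"}",
        },
    },
    {
        "id": "call_2",
        "type": "function",
        "function": {
            "name": "calculate_expression",
            "arguments": "{\"expression\": \"5 * 7\"}",
        },
    },
]


def parse_tool_calls(tool_calls):
    """Return (call_id, function_name, decoded_arguments) for each call."""
    parsed = []
    for call in tool_calls:
        fn = call["function"]
        # 'arguments' arrives as a JSON string and must be decoded.
        parsed.append((call["id"], fn["name"], json.loads(fn["arguments"])))
    return parsed


parsed_calls = parse_tool_calls(raw_tool_calls)
for call_id, name, args in parsed_calls:
    print(call_id, name, args)
```

With the objects the SDK returns, the same fields are read through attributes (tool_call.function.name, tool_call.function.arguments) rather than dict keys.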

The actual execution happens in your application code. You receive the tool_calls and then, in parallel if you wish, you can invoke your local get_weather and calculate_expression functions.

import asyncio

async def execute_tool_calls(tool_calls):
    results = {}
    tasks = []

    # Placeholder functions for demonstration
    async def get_weather_mock(location):
        print(f"Fetching weather for {location}...")
        await asyncio.sleep(1) # Simulate network latency
        return {"temperature": "70", "unit": "Fahrenheit", "condition": "Sunny"}

    async def calculate_expression_mock(expression):
        print(f"Calculating {expression}...")
        await asyncio.sleep(0.5) # Simulate computation
        try:
            # eval() is for demonstration only; never run it on untrusted input
            return eval(expression)
        except Exception:
            return "Error: Invalid expression"

    for tool_call in tool_calls:
        function_name = tool_call.function.name
        function_args = json.loads(tool_call.function.arguments)

        if function_name == "get_weather":
            task = asyncio.create_task(get_weather_mock(**function_args))
            tasks.append((tool_call.id, task))
        elif function_name == "calculate_expression":
            task = asyncio.create_task(calculate_expression_mock(**function_args))
            tasks.append((tool_call.id, task))

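    # The tasks were started by create_task() and are already running
    # concurrently; awaiting them one by one only collects their results
    for call_id, task in tasks:
        results[call_id] = await task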

    return results

import json  # used by execute_tool_calls above and by json.dumps below

# In a real app, 'tool_calls' would come straight from the API response.
# To keep this example self-contained, we re-create the list by hand:
tool_calls_data = [
  {
    "id": "call_abc123",
    "type": "function",
    "function": {
      "name": "get_weather",
      "arguments": "{\"location\": \"New York, NY\"}"
    }
  },
  {
    "id": "call_def456",
    "type": "function",
    "function": {
      "name": "calculate_expression",
      "arguments": "{\"expression\": \"5 * 7\"}"
    }
  }
]

# Mocking the structure of the API response for the loop
class MockToolCall:
    def __init__(self, id, function_name, function_arguments):
        self.id = id
        self.function = self.MockFunction(function_name, function_arguments)

    class MockFunction:
        def __init__(self, name, arguments):
            self.name = name
            self.arguments = arguments

tool_calls_objects = [
    MockToolCall(tc["id"], tc["function"]["name"], tc["function"]["arguments"])
    for tc in tool_calls_data
]


async def main():
    tool_results = await execute_tool_calls(tool_calls_objects)
    print("Tool Results:", tool_results)

    # Constructing the messages for the next API call
    messages = [
        {"role": "user", "content": user_message},
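        # Add the assistant's tool calls. Plain dicts are used here because the
        # mock objects are not JSON-serializable; in a real app you can append
        # response.choices[0].message instead.
        {"role": "assistant", "content": None, "tool_calls": tool_calls_data}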
    ]

    # Add the results of the tool calls
    for tool_call_id, result in tool_results.items():
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call_id,
            "content": json.dumps(result) # Content should be a string
        })

    # Second call to the OpenAI API to get the final answer
    final_response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
    )

    print("\nFinal Assistant Response:")
    print(final_response.choices[0].message.content)

asyncio.run(main())

Running this would output something like:

Fetching weather for New York, NY...
Calculating 5 * 7...
Tool Results: {'call_abc123': {'temperature': '70', 'unit': 'Fahrenheit', 'condition': 'Sunny'}, 'call_def456': 35}

Final Assistant Response:
The weather in New York, NY is Sunny with a temperature of 70°F. The calculation of 5 * 7 is 35.

The key takeaway is that tool_choice="auto", the default whenever tools are provided, lets the model decide which tools to call, including several in one turn. Whether multiple calls may be emitted at once is governed by a separate request parameter, parallel_tool_calls, which defaults to true. When you explicitly set tool_choice to a specific tool (e.g., {"type": "function", "function": {"name": "get_weather"}}), you force the model to use that single tool for the current turn. This is useful for disambiguation or when you know only one tool is relevant, but it rules out the mix of different tool calls shown above.
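To make the request options concrete, here is a small sketch that only assembles the keyword arguments and sends nothing. build_request is a hypothetical helper, not part of the OpenAI SDK. The parallel_tool_calls flag is the Chat Completions parameter that, as far as I know, switches multi-call turns on or off independently of tool_choice.

```python
def build_request(messages, tools, force_tool=None, allow_parallel=True):
    """Assemble keyword arguments for client.chat.completions.create().

    Hypothetical helper for illustration; it performs no API call.
    """
    kwargs = {
        "model": "gpt-4o",
        "messages": messages,
        "tools": tools,
        "tool_choice": "auto",  # let the model pick tools, possibly several
    }
    if force_tool is not None:
        # Restrict the model to one named tool for this turn.
        kwargs["tool_choice"] = {
            "type": "function",
            "function": {"name": force_tool},
        }
    if not allow_parallel:
        # Ask for at most one tool call per turn.
        kwargs["parallel_tool_calls"] = False
    return kwargs


messages = [{"role": "user", "content": "What's the weather in New York, NY?"}]
tools = []  # placeholder: the tool schemas defined earlier would go here

auto_request = build_request(messages, tools)
forced_request = build_request(messages, tools, force_tool="get_weather")
serial_request = build_request(messages, tools, allow_parallel=False)

print(auto_request["tool_choice"])
print(forced_request["tool_choice"])
print(serial_request["parallel_tool_calls"])
```

Any of these dicts could then be passed along as client.chat.completions.create(**forced_request), once tools holds real schemas.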

The model doesn’t actually run your Python functions. It generates structured JSON representing the function calls it wants you to make. Your code then takes this JSON, executes the corresponding functions (potentially in parallel using asyncio or threads), and then feeds the results back to the model in a subsequent API call. The model then synthesizes these results into a natural language response.
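The asyncio version above suits tools written as coroutines. For ordinary blocking functions, a thread pool is one alternative. The sketch below assumes two stand-in tools (get_weather and multiply) and a TOOL_REGISTRY lookup table, all invented for this example. multiply replaces the expression evaluator so that no eval() is needed.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def get_weather(location):
    """Stand-in for a blocking weather lookup."""
    time.sleep(0.2)  # simulate network latency
    return {"location": location, "condition": "Sunny"}


def multiply(a, b):
    """Stand-in for a blocking calculation."""
    time.sleep(0.2)  # simulate work
    return a * b


# Hypothetical registry mapping tool names to local functions.
TOOL_REGISTRY = {"get_weather": get_weather, "multiply": multiply}

# (call_id, function_name, decoded_arguments), as parsed from tool_calls.
calls = [
    ("call_1", "get_weather", {"location": "New York, NY"}),
    ("call_2", "multiply", {"a": 5, "b": 7}),
]


def run_in_threads(calls):
    """Run every call in its own worker thread and return {call_id: result}."""
    with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
        futures = {
            call_id: pool.submit(TOOL_REGISTRY[name], **args)
            for call_id, name, args in calls
        }
        # result() blocks until that particular call has finished.
        return {call_id: future.result() for call_id, future in futures.items()}


results = run_in_threads(calls)
print(results)
```

A registry dict also avoids the growing if/elif chain once more tools are added.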

The order of entries in the tool_calls array does not imply an execution order. Your application decides how to run the calls and must handle any dependencies between them itself, although calls emitted together in one turn are generally meant to be independent. What matters when you report results is that each tool message carries the tool_call_id of the call it answers.
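A short sketch of that point: the two hypothetical tools below finish in the opposite order from the one in which they were requested, yet each result is still paired with the right tool_call_id. It relies on asyncio.gather returning results in the order the awaitables were passed in, not the order they completed. The tool names and call IDs are invented for the example.

```python
import asyncio

finished = []  # records the order in which the tools actually complete


async def slow_tool():
    await asyncio.sleep(0.2)
    finished.append("slow")
    return "slow result"


async def fast_tool():
    await asyncio.sleep(0.05)
    finished.append("fast")
    return "fast result"


async def run_calls(calls):
    """calls is a list of (call_id, coroutine_function) pairs.

    Returns one tool message per call, in the order the calls were given.
    """
    outputs = await asyncio.gather(*(fn() for _, fn in calls))
    return [
        {"role": "tool", "tool_call_id": call_id, "content": str(output)}
        for (call_id, _), output in zip(calls, outputs)
    ]


tool_messages = asyncio.run(
    run_calls([("call_a", slow_tool), ("call_b", fast_tool)])
)
print(finished)
print(tool_messages)
```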

Ultimately, the model’s ability to identify multiple tools in a single turn is a function of its understanding of the task and the tool descriptions. If the user’s prompt clearly expresses two distinct, actionable requests that map to different tools, and those tools are provided in the tools parameter, the model will likely generate parallel tool calls.

The next step after successfully handling parallel tool calls is often managing complex tool dependencies or implementing robust error handling for individual tool executions.
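As a starting point for the error-handling side, here is one possible pattern, not the only one: run the calls with asyncio.gather(return_exceptions=True) so a failure in one tool does not discard the others, then report the failure back as that call's tool message. The {"error": ...} payload shape and both tool functions are assumptions made for this sketch.

```python
import asyncio
import json


async def ok_tool():
    return {"value": 35}


async def failing_tool():
    raise ValueError("service unavailable")


async def run_with_error_capture(calls):
    """calls is a list of (call_id, coroutine_function) pairs."""
    # return_exceptions=True hands back raised exceptions as ordinary results
    # instead of propagating the first one.
    outcomes = await asyncio.gather(
        *(fn() for _, fn in calls), return_exceptions=True
    )
    messages = []
    for (call_id, _), outcome in zip(calls, outcomes):
        if isinstance(outcome, Exception):
            # Assumed convention: describe the failure to the model as JSON.
            content = json.dumps({"error": f"{type(outcome).__name__}: {outcome}"})
        else:
            content = json.dumps(outcome)
        messages.append(
            {"role": "tool", "tool_call_id": call_id, "content": content}
        )
    return messages


tool_messages = asyncio.run(
    run_with_error_capture([("call_ok", ok_tool), ("call_bad", failing_tool)])
)
for message in tool_messages:
    print(message)
```

Every tool call still gets a matching tool message this way, so the follow-up request stays well-formed and the model can explain or work around the failure.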
