The most surprising thing about prompt engineering for function calling is that the LLM doesn’t actually understand the functions you give it in the way a human programmer does; it’s a sophisticated pattern-matching and prediction engine that simulates understanding based on the input it receives.

Let’s see this in action. Imagine we have a simple weather API wrapper:

def get_current_weather(location: str, unit: str = "celsius") -> dict:
    """Get the current weather in a given location.

    Args:
        location: The city and state, or city and country.
        unit: The temperature unit to use, either "celsius" or "fahrenheit".
    """
    # In a real scenario, this would call a weather API.
    # For demonstration, we'll return mock data.
    print(f"Simulating API call for {location} with unit {unit}")
    if "tokyo" in location.lower():
        return {"location": location, "temperature": "15", "unit": unit}
    elif "san francisco" in location.lower():
        return {"location": location, "temperature": "60", "unit": unit}
    else:
        return {"location": location, "temperature": "20", "unit": unit}

Now, if we tell an LLM about this function and ask a question:

User: What’s the weather like in San Francisco?

LLM (simulated output):

{
    "tool_code": "print(get_current_weather(location='San Francisco'))"
}

This JSON output isn’t just a string; it’s a structured representation of a Python function call. The LLM has analyzed the user’s request, identified the get_current_weather function as relevant, and correctly inferred the location argument. Since unit has a default, it wasn’t explicitly provided in the prompt, and the LLM omitted it, relying on the function’s default.

Here’s how it works under the hood, and the levers you control:

  1. Function Definitions (The Schema): The LLM needs to know what tools are available. This is typically done by providing a structured list of functions, often in JSON schema format. Each function needs a name, a description, and a schema for its arguments. The descriptions are critical. They are the primary way the LLM understands when to use a function and what it does.

    {
      "name": "get_current_weather",
      "description": "Get the current weather in a given location.",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state, or city and country."
          },
          "unit": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"],
            "description": "The temperature unit to use, either 'celsius' or 'fahrenheit'."
          }
        },
        "required": ["location"]
      }
    }
    

    The description fields are where you imbue the LLM with the "intent" of the function. Be precise. For location, "The city and state, or city and country" is better than just "location name." For unit, specifying the enum ("celsius", "fahrenheit") guides the LLM to only pick valid options.

  2. User Query (The Intent): This is the natural language prompt from the user. The LLM’s job is to map this intent to the available tools.

    User: What’s the temperature in Paris?

    LLM (simulated output):

    {
        "tool_code": "print(get_current_weather(location='Paris'))"
    }
    

    Notice how "temperature" maps to the get_current_weather function because its description implies it provides weather information, which includes temperature.

  3. Tool Selection and Argument Extraction: The LLM uses its training on vast amounts of text and code to identify which function(s) best match the user’s query and then extracts the necessary arguments. This is where the prompt engineering comes in. The quality of the function descriptions, the clarity of the user query, and the LLM’s underlying capabilities all play a role.

    Consider a more complex query:

    User: Tell me the weather in Tokyo in Fahrenheit.

    LLM (simulated output):

    {
        "tool_code": "print(get_current_weather(location='Tokyo', unit='fahrenheit'))"
    }
    

    Here, the LLM correctly identified both the location and the desired unit, even though unit was specified as "Fahrenheit" and not one of the exact enum values. The LLM is capable of this mapping.

  4. Chaining and Multi-turn Conversations: Function calling isn’t limited to a single turn. You can execute a function, get its result, and then feed that result back into the LLM to generate a natural language response or to trigger another function.

    User: What’s the weather like in San Francisco? LLM: (Outputs {"tool_code": "print(get_current_weather(location='San Francisco'))"}) (Your code executes this, gets {"location": "San Francisco", "temperature": "60", "unit": "celsius"}) User: And in Tokyo? LLM: (Outputs {"tool_code": "print(get_current_weather(location='Tokyo'))"}) (Your code executes this, gets {"location": "Tokyo", "temperature": "15", "unit": "celsius"})

    You can then feed both results back:

    LLM (after receiving both results): The weather in San Francisco is 60 degrees Celsius, and in Tokyo, it’s 15 degrees Celsius.

    This chaining is powerful. The LLM doesn’t remember past tool calls inherently; you must provide the context (the results of previous calls) in subsequent prompts for it to maintain continuity.

The most subtle aspect of function calling is how the LLM handles ambiguity and the "required" versus "optional" parameters. If a function has a required parameter and the user doesn’t provide enough information, the LLM will often ask a clarifying question instead of making a guess or failing. For example, if you only had get_current_weather(location: str) and the user said "What’s the weather?", the LLM might respond: "I can tell you the weather, but I need to know the location. Where are you interested in?" This is a crucial behavior to leverage for robust applications.

The next logical step after mastering single tool use is understanding how to handle situations where multiple tools could be called, or when a single user request requires a sequence of tool invocations.

Want structured learning?

Take the full Prompt-engineering course →