Getting an LLM to write code for you is the easy part. The bigger payoff comes from getting it to debug code, and the two are far more powerful when you combine them.

Here’s a Python script that uses a hypothetical LLM API (we’ll call it llm_api) to generate a function, then debug it based on an error message.

import llm_api

def generate_and_debug_code():
    prompt_generate = """
    Write a Python function called `calculate_average` that takes a list of numbers and returns their average.
    Handle the case where the input list is empty by returning 0.
    """
    generated_code = llm_api.generate(prompt_generate)
    print("--- Generated Code ---")
    print(generated_code)

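    # Simulate a runtime error. The message is hard-coded here; a typical
    # sum(numbers) / len(numbers) implementation fails on the addition
    # inside sum(), before it ever reaches the division.
    error_message = "TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'"
    bad_input_example = [1, 2, None, 4]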

    prompt_debug = f"""
    The following Python code has a bug.
    Error: {error_message}
    Input that caused the error: {bad_input_example}

    Here is the code:
    ```python
    {generated_code}
    ```

    Please fix the code to handle this error and provide the corrected version.
    """
    corrected_code = llm_api.debug(prompt_debug)
    print("\n--- Corrected Code ---")
    print(corrected_code)

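    # Verify the fix. exec() inside a function does not create local names,
    # so run the code in an explicit namespace and look the function up there.
    # This assumes corrected_code is plain Python source (no Markdown fences or
    # prose); in a real scenario, run untrusted model output in a sandbox.
    namespace = {}
    try:
        exec(corrected_code, namespace)
        result = namespace["calculate_average"](bad_input_example)
        print(f"\nResult with bad input after fix: {result}")
    except Exception as e:
        print(f"\nError after fix: {e}")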

if __name__ == "__main__":
    generate_and_debug_code()

This isn’t just about asking an LLM to "write code." It’s about a two-stage process: initial generation, then a specific debugging loop. The llm_api.generate call gets us a starting point, but the real value comes from the llm_api.debug call, where we provide context: the error message, the problematic input, and the code itself. The LLM then acts as a sophisticated debugger, not just a code writer.

The core problem this solves is the disconnect between a human’s understanding of a bug and the LLM’s ability to synthesize code. By feeding the LLM the precise error and context, we guide it to the exact point of failure. It’s like giving a junior developer a stack trace and the failing test case.
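One way to get that stack trace without typing it by hand is to run the generated code against the failing input and capture the exception. This is a minimal sketch, assuming the generated code is plain Python source that defines calculate_average. The helper name capture_failure and the sample implementation are made up for illustration.

```python
import traceback

def capture_failure(source, func_name, bad_input):
    """Run func_name from source on bad_input.

    Returns the traceback text if the call raises, or None if it succeeds.
    """
    namespace = {}
    try:
        exec(source, namespace)  # only do this with trusted code or in a sandbox
        namespace[func_name](bad_input)
    except Exception:
        return traceback.format_exc()
    return None

# Stand-in for what llm_api.generate might return (an assumption, not real output).
sample_code = '''
def calculate_average(numbers):
    if not numbers:
        return 0
    return sum(numbers) / len(numbers)
'''

trace = capture_failure(sample_code, "calculate_average", [1, 2, None, 4])
print(trace)
```

The returned text ends with the exception type and message, so it can be dropped straight into the debug prompt alongside the input that caused it.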

Internally, the llm_api.debug function would likely involve a more complex prompt than what’s shown. It might instruct the LLM to:

  1. Analyze the error message: Understand the type of error (TypeError, ValueError, IndexError, etc.) and the operands involved.
  2. Correlate with input: Examine the bad_input_example to see how it triggers the error. In our case, the None value is what triggers the TypeError, because Python cannot do arithmetic on None.
  3. Locate the faulty logic: Pinpoint the line(s) in the generated_code that are responsible.
  4. Propose a fix: Suggest modifications to the code to prevent the error. This could involve adding type checks, handling None values, or adjusting the algorithm.
  5. Generate corrected code: Output the modified code block.
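The five steps above could be written into the prompt itself. Since llm_api is hypothetical, this is only a guess at what such a prompt might look like; build_debug_prompt, DEBUG_STEPS, and the exact wording are illustrative assumptions.

```python
# Illustrative wording for the steps; adjust to taste.
DEBUG_STEPS = [
    "Analyze the error message: identify the error type and the operands involved.",
    "Correlate it with the input: explain how the input triggers the error.",
    "Locate the faulty logic: name the line(s) responsible.",
    "Propose a fix that prevents the error.",
    "Output only the corrected code, with no commentary.",
]

def build_debug_prompt(code, error_message, bad_input):
    """Assemble a debugging prompt that walks the model through the steps."""
    numbered = "\n".join(
        f"{i}. {step}" for i, step in enumerate(DEBUG_STEPS, start=1)
    )
    return (
        "The following Python code has a bug.\n"
        f"Error: {error_message}\n"
        f"Input that caused the error: {bad_input!r}\n\n"
        "Code:\n"
        f"{code.strip()}\n\n"
        "Work through these steps:\n"
        f"{numbered}\n"
    )

example_prompt = build_debug_prompt(
    "def calculate_average(numbers):\n    return sum(numbers) / len(numbers)",
    "TypeError",
    [1, 2, None, 4],
)
print(example_prompt)
```

Asking for only the corrected code in the last step makes the response easier to run directly, though a real model may still add commentary that you would need to strip.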

The levers you control are mostly in prompt_debug. The more specific you are about the error message and the input that caused it, the better the LLM can target the fix. You can also add instructions like "ensure the function still handles empty lists correctly" or "prioritize readability" to guide the correction.
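Extra instructions like these can be kept as a list and appended to whatever debug prompt you already have. A small sketch follows; add_constraints is a made-up helper, not part of any API.

```python
def add_constraints(prompt, constraints):
    """Append a bulleted list of extra requirements to an existing prompt."""
    lines = [prompt.rstrip(), "", "Constraints for the fix:"]
    lines.extend(f"- {c}" for c in constraints)
    return "\n".join(lines)

base_prompt = "The following Python code has a bug. Please fix it."
constrained_prompt = add_constraints(base_prompt, [
    "Ensure the function still handles empty lists correctly.",
    "Prioritize readability.",
])
print(constrained_prompt)
```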

The most surprising thing is how well LLMs can infer the intent of the original code when debugging, even if the original generation was flawed. They don’t just patch the line that crashed; they often "understand" what the code should be doing and correct it accordingly, especially when given a clear error and input example. For instance, when faced with None in a list of numbers for averaging, the LLM doesn’t just swallow the exception or raise a different one; it might infer that None should be skipped or treated as zero, depending on the broader context implied by the initial prompt and the error.
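Skipping None and treating it as zero give different averages, so it is worth checking which reading the model chose. Here is a quick illustration with two hand-written variants. These are assumptions about what a model might return, not actual model output.

```python
def average_skipping_none(numbers):
    """Ignore None entries entirely; they do not count toward the length."""
    values = [n for n in numbers if n is not None]
    return sum(values) / len(values) if values else 0

def average_none_as_zero(numbers):
    """Count None entries as 0; they still count toward the length."""
    if not numbers:
        return 0
    return sum(0 if n is None else n for n in numbers) / len(numbers)

data = [1, 2, None, 4]
print(average_skipping_none(data))  # 7 divided by 3 values
print(average_none_as_zero(data))   # 7 divided by 4 values
```

If only one of these matches what you want, say so in the debug prompt rather than leaving the model to guess.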

The next step is to consider how to automate this debugging loop, potentially feeding the corrected code back into a test suite for further validation.
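Below is a rough sketch of what that loop could look like. Everything in it is an assumption for illustration: debug_fn stands in for a call like llm_api.debug, the test cases are hand-picked (and assume None should be skipped), and fake_debug returns a canned fix so the example runs without a model.

```python
import traceback

def run_tests(source, func_name, cases):
    """Return None if every (arg, expected) case passes, else a failure description."""
    namespace = {}
    try:
        exec(source, namespace)  # sandbox this when the source comes from a model
        func = namespace[func_name]
        for arg, expected in cases:
            actual = func(arg)
            if actual != expected:
                return f"For input {arg!r} expected {expected!r} but got {actual!r}"
    except Exception:
        return traceback.format_exc()
    return None

def debug_loop(source, func_name, cases, debug_fn, max_rounds=3):
    """Ask debug_fn for a fix until the tests pass or max_rounds fixes have been tried."""
    for _ in range(max_rounds):
        failure = run_tests(source, func_name, cases)
        if failure is None:
            return source
        source = debug_fn(source, failure)
    if run_tests(source, func_name, cases) is None:
        return source
    raise RuntimeError(f"Tests still failing after {max_rounds} rounds")

BUGGY = '''
def calculate_average(numbers):
    if not numbers:
        return 0
    return sum(numbers) / len(numbers)
'''

FIXED = '''
def calculate_average(numbers):
    values = [n for n in numbers if n is not None]
    return sum(values) / len(values) if values else 0
'''

def fake_debug(source, failure):
    # Stand-in for the model: ignores its inputs and returns a canned fix.
    return FIXED

CASES = [([], 0), ([2, 4], 3), ([1, None, 3], 2)]
final_source = debug_loop(BUGGY, "calculate_average", CASES, fake_debug)
print(final_source)
```

Capping the number of rounds matters: a model can keep returning code that fails in new ways, and the loop should give up rather than run forever.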
