A .pyc file is not Python source code. It holds the compiled bytecode of a Python module, which the interpreter can load directly instead of parsing and compiling the original source again.
Let’s see this in action. Suppose you have a simple Python file named hello.py:
def greet(name):
    message = f"Hello, {name}!"
    print(message)
greet("World")
When you first import this module, Python creates a __pycache__ directory next to it (if it doesn’t exist) and writes a corresponding .pyc file inside it, named something like hello.cpython-39.pyc (cpython-39 identifies the implementation and version, here CPython 3.9). Running the file directly with python hello.py does not write a .pyc for it; only imported modules are cached. If you open this .pyc file, you won’t see the familiar Python syntax. Instead, you’ll see binary data.
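To find the cache file without hunting through directories, the standard library can compute its path and compile the module on demand. The following is a minimal sketch, assuming CPython 3 with default cache settings; it writes a throwaway copy of hello.py into a temporary directory, and the helper name compile_to_cache is made up for this illustration.

```python
import importlib.util
import py_compile
import tempfile
from pathlib import Path

SOURCE = '''\
def greet(name):
    message = f"Hello, {name}!"
    print(message)

greet("World")
'''


def compile_to_cache(directory):
    """Write hello.py into `directory`, compile it, and return both .pyc paths."""
    source_path = Path(directory) / "hello.py"
    source_path.write_text(SOURCE)
    # cache_from_source() computes the path the import system would use.
    expected = importlib.util.cache_from_source(str(source_path))
    # py_compile.compile() writes the .pyc without executing the module.
    written = py_compile.compile(str(source_path), doraise=True)
    return Path(expected), Path(written)


with tempfile.TemporaryDirectory() as tmp:
    expected_path, written_path = compile_to_cache(tmp)
    pyc_exists = written_path.exists()
    print(written_path.parent.name)  # the cache directory
    print(written_path.name)         # file name includes the interpreter tag
```

Because py_compile only compiles, greet("World") is not executed here, which makes this a convenient way to produce a .pyc for inspection.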
To peek at this bytecode, we can use Python’s built-in dis module.
import dis
import hello
dis.dis(hello)
Running this first prints Hello, World! (importing hello executes the module) and then, on CPython 3.9, outputs something like the following. Opcode names and offsets differ between Python versions:
Disassembly of greet:
  2           0 LOAD_CONST               1 ('Hello, ')
              2 LOAD_FAST                0 (name)
              4 FORMAT_VALUE             0
              6 LOAD_CONST               2 ('!')
              8 BUILD_STRING             3
             10 STORE_FAST               1 (message)
  3          12 LOAD_GLOBAL              0 (print)
             14 LOAD_FAST                1 (message)
             16 CALL_FUNCTION            1
             18 POP_TOP
             20 LOAD_CONST               0 (None)
             22 RETURN_VALUE
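This output shows the disassembled bytecode of greet. Each line is one instruction that the Python Virtual Machine (PVM) executes: the first column, where present, is the source line number, followed by the byte offset, the instruction name, its argument, and (in parentheses) what that argument refers to.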
LOAD_CONST: Pushes a constant (such as a string or number) from the code object onto the stack.
LOAD_FAST: Pushes the value of a local variable onto the stack.
BINARY_ADD: Pops the top two items from the stack, adds them, and pushes the result back onto the stack; this is what the + operator compiles to.
STORE_FAST: Pops the top item from the stack and stores it in a local variable.
LOAD_GLOBAL: Pushes a global or built-in name (such as the function print) onto the stack.
FORMAT_VALUE: Formats a value inside an f-string.
BUILD_STRING: Concatenates the given number of strings from the stack into one string.
CALL_FUNCTION: Calls a function with the given number of positional arguments taken from the stack.
POP_TOP: Removes the top item from the stack.
RETURN_VALUE: Returns the value at the top of the stack to the caller.
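The same instructions can also be inspected programmatically rather than read from printed output. This is a small sketch using dis.get_instructions on a copy of greet defined inline; the exact opcode names it prints depend on the Python version running it.

```python
import dis


def greet(name):
    message = f"Hello, {name}!"
    print(message)


# get_instructions() yields one Instruction object per bytecode instruction.
instructions = list(dis.get_instructions(greet))
for ins in instructions:
    print(f"{ins.offset:>4} {ins.opname:<20} {ins.argrepr}")

# Collect names and resolved arguments for further processing.
opnames = [ins.opname for ins in instructions]
argvals = [ins.argval for ins in instructions]
```

Each Instruction carries the resolved argument in argval, so the constants and names the function touches can be read without decoding raw bytes.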
The PVM is a stack-based machine: instructions manipulate data by pushing values onto an operand stack and popping them off again. Compiling to bytecode is a required step in Python’s execution process, and caching the result means Python does not have to re-parse and recompile a module’s source every time it is imported, which shortens startup. The .pyc file is essentially a cache of this compiled bytecode.
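To make the push and pop behaviour concrete, here is a toy interpreter for a handful of the instructions described above. It is only an illustration of the stack discipline, not how CPython is implemented, and the program it runs is written by hand rather than produced by the compiler.

```python
def run(program):
    """Execute a list of (opcode, argument) pairs on a toy operand stack."""
    stack = []
    local_vars = {}
    for opcode, arg in program:
        if opcode == "LOAD_CONST":
            stack.append(arg)                 # push a constant
        elif opcode == "LOAD_FAST":
            stack.append(local_vars[arg])     # push a local variable's value
        elif opcode == "STORE_FAST":
            local_vars[arg] = stack.pop()     # pop into a local variable
        elif opcode == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)        # push the sum
        elif opcode == "RETURN_VALUE":
            return stack.pop()                # hand the top of stack back
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    raise RuntimeError("program ended without RETURN_VALUE")


# A hand-written program equivalent to: total = 2 + 3; return total
program = [
    ("LOAD_CONST", 2),
    ("LOAD_CONST", 3),
    ("BINARY_ADD", None),
    ("STORE_FAST", "total"),
    ("LOAD_FAST", "total"),
    ("RETURN_VALUE", None),
]
result = run(program)
print(result)  # 5
```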
The exact structure of a .pyc file is a bit more involved than just the raw bytecode. In CPython 3.7 and later it starts with a 16-byte header: a 4-byte magic number (identifying the bytecode version), a 4-byte flags field, and then either the source file’s modification time and size (4 bytes each) or, for hash-based .pyc files, an 8-byte hash of the source. The marshalled Python code object follows the header. Marshalling serializes the code object into a byte stream that can be written to a file.
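The layout can be checked by reading a .pyc back by hand. The sketch below assumes the 16-byte header used by CPython 3.7 and later and a timestamp-based .pyc (flags equal to 0); read_pyc is a helper name invented for this example, and the temporary hello.py is a shortened stand-in for the file above.

```python
import importlib.util
import marshal
import py_compile
import struct
import tempfile
from pathlib import Path


def read_pyc(pyc_path):
    """Split a .pyc file (CPython 3.7+ layout) into header fields and code object."""
    data = Path(pyc_path).read_bytes()
    magic = data[:4]
    # Three little-endian 32-bit fields follow the magic number.
    flags, mtime, source_size = struct.unpack("<III", data[4:16])
    # Everything after the header is the marshalled code object.
    code = marshal.loads(data[16:])
    return magic, flags, mtime, source_size, code


with tempfile.TemporaryDirectory() as tmp:
    source_path = Path(tmp) / "hello.py"
    source_path.write_text('def greet(name):\n    print(f"Hello, {name}!")\n')
    # Force a timestamp-based .pyc so the header holds mtime and size.
    pyc_path = py_compile.compile(
        str(source_path),
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    magic, flags, mtime, source_size, code = read_pyc(pyc_path)
    source_stat = source_path.stat()

print(magic, flags, mtime, source_size, code.co_name)
```

Only unmarshal files you created yourself: the marshal documentation warns that it is not safe against malformed or malicious data.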
The primary purpose of .pyc files is faster startup. When you import a module, Python checks whether a valid .pyc file exists and is up to date with the .py source file; by default it compares the source’s modification time and size with the values recorded in the .pyc file. If they match, Python loads the bytecode directly from the .pyc file, skipping the compilation step. If the .pyc file is missing, stale, or corrupted, Python recompiles the .py file and writes a new .pyc file.
The magic number at the beginning of a .pyc file is four bytes long: a 16-bit number that changes whenever the bytecode format changes (in practice, with each Python minor version), followed by the bytes \r\n. It lets Python tell whether a .pyc file was generated by a compatible interpreter. If the magic number doesn’t match, Python ignores the .pyc file and recompiles the source.
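The running interpreter exposes its own magic number, which is handy when checking whether a given .pyc matches it. A short sketch, assuming CPython 3; the variable names are chosen for this example.

```python
import importlib.util

# The magic number the running interpreter writes into every .pyc it creates.
magic = importlib.util.MAGIC_NUMBER

# The first two bytes are a little-endian version word; the rest is a trailer.
version_word = int.from_bytes(magic[:2], "little")
trailer = magic[2:]

print(magic, version_word, trailer)
```

Comparing the first four bytes of a .pyc file against importlib.util.MAGIC_NUMBER is enough to tell whether the current interpreter would accept it.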
When you’re debugging, understanding that .pyc files are just a compiled form of your .py code is key. Errors reported from .pyc files will often point to line numbers that correspond to the original .py source, but the underlying execution is happening at the bytecode level. Sometimes, stale .pyc files can lead to confusing behavior if the source code has changed but the .pyc hasn’t been regenerated. In such cases, deleting the __pycache__ directory and letting Python rebuild it is a common troubleshooting step.
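The cleanup step can be scripted. This is one possible sketch, with remove_pycache as a hypothetical helper; it is exercised here on a temporary directory tree so that nothing real is deleted.

```python
import shutil
import tempfile
from pathlib import Path


def remove_pycache(root):
    """Delete every __pycache__ directory under `root`; return the removed paths."""
    removed = []
    # Build the full list first so deleting does not disturb the directory walk.
    for cache_dir in list(Path(root).rglob("__pycache__")):
        if cache_dir.is_dir():
            shutil.rmtree(cache_dir)
            removed.append(cache_dir)
    return removed


with tempfile.TemporaryDirectory() as tmp:
    # Build a fake package with a cache directory to clean up.
    cache = Path(tmp) / "pkg" / "__pycache__"
    cache.mkdir(parents=True)
    (cache / "hello.cpython-39.pyc").write_bytes(b"")
    removed = remove_pycache(tmp)
    cache_still_exists = cache.exists()

print(len(removed), cache_still_exists)
```

To stop Python from writing .pyc files at all while investigating, run it with the -B option or set the PYTHONDONTWRITEBYTECODE environment variable.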
The marshal module is what Python uses to serialize and de-serialize these code objects into the .pyc file format. While you generally don’t need to interact with marshal directly for normal Python development, it’s the underlying mechanism that makes .pyc caching possible.
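A round trip through marshal shows the mechanism in isolation, without any file or header involved. This is a minimal sketch; the source string and the "<example>" filename are placeholders made up for the demonstration.

```python
import marshal

source = 'def greet(name):\n    return f"Hello, {name}!"\n'

# compile() produces the same kind of code object that ends up in a .pyc file.
code = compile(source, "<example>", "exec")

payload = marshal.dumps(code)      # bytes, like those stored after the .pyc header
restored = marshal.loads(payload)  # back to a code object

namespace = {}
exec(restored, namespace)          # run the restored module-level code
greeting = namespace["greet"]("World")
print(greeting)  # Hello, World!
```

The marshal format is specific to the Python version that wrote it, which is one reason .pyc files are tied to the interpreter that produced them.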
The next step in understanding Python’s execution is often exploring how the PVM handles exceptions.