Redis returns a BUSY error when a Lua script has been running longer than the configured script time limit (`lua-time-limit`, 5 seconds by default; called `busy-reply-threshold` since Redis 7) and other clients try to run commands in the meantime.
This typically happens when a Lua script runs for an unexpectedly long time because of inefficient logic, an infinite loop, or a massive amount of data processed within the script. Because Redis executes commands on a single thread, the script blocks all other operations until it finishes or is stopped with `SCRIPT KILL`.
Common Causes and Fixes
- Infinite Loop in Lua Script:
  - Diagnosis: Review your Lua script for conditions that can loop forever. Look for `while` or `repeat-until` loops without a reachable exit condition, and for recursive calls without a base case. `DEBUG OBJECT <key>` only reports how a key is stored, so it will not pinpoint the loop itself; inspecting the script code is the more direct approach.
  - Fix: Add or correct the exit condition in your loop. For example, if you have a loop like `while redis.call('LLEN', 'my_list') > 0 do ... end`, ensure that the loop body actually removes an element from `my_list` (e.g., `redis.call('LPOP', 'my_list')`).
  - Why it works: A correct exit condition ensures the loop terminates, allowing Redis to process other commands.
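The loop fix above can be sketched as follows. This is a minimal illustration, not the original script: the Lua body is kept as a Python string, and `DRAIN_SCRIPT`, `drain_list`, and the `limit` argument are hypothetical names. It assumes a redis-py style client whose `eval(script, numkeys, *keys_and_args)` method runs the script.

```python
# Lua source kept as a Python string. KEYS[1] is the list, ARGV[1] a hard cap
# on iterations (the cap is an added safeguard, not part of the original loop).
DRAIN_SCRIPT = """
local limit = tonumber(ARGV[1])
local processed = 0
while processed < limit do
  local item = redis.call('LPOP', KEYS[1])
  if not item then
    break  -- LPOP returned nil: the list is empty, so the loop exits
  end
  -- per-item work goes here (placeholder)
  processed = processed + 1
end
return processed
"""


def drain_list(client, key, limit=1000):
    """Run the bounded loop once and return how many items were popped."""
    return int(client.eval(DRAIN_SCRIPT, 1, key, limit))
```

Every pass either removes an element or exits, and the cap bounds the work of a single call even when the list is very long. With redis-py this might be called as `drain_list(redis.Redis(), "my_list")`.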
- Processing an Enormous Data Set within a Script:
  - Diagnosis: The script might be iterating over a very large Redis data structure (e.g., a set with millions of members, a sorted set with many elements, or a hash with thousands of fields) and performing an operation on each element. Check the size of the keys your script touches with commands like `SCARD`, `ZCARD`, `HLEN`, or `LLEN` outside of the script, and examine the script's logic for loops over whole collections.
  - Fix: Break the operation into smaller, manageable chunks. Instead of processing all elements at once, process a subset and use a Redis key to track progress. For example, if you're processing a sorted set, process 100 elements, store the last element processed, and then call the script again with that marker to continue.
  - Why it works: By limiting the amount of work done per script execution, you ensure that each invocation finishes quickly and never blocks the server for long.
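A rough sketch of the chunking approach, under some simplifying assumptions: the marker is an index into the sorted set rather than the last member, it is held by the caller rather than in a Redis key, and the sorted set is not modified between calls. `CHUNK_SCRIPT` and `process_in_chunks` are hypothetical names, and `client` is assumed to be a redis-py style object with an `eval(script, numkeys, *keys_and_args)` method.

```python
# Lua source kept as a Python string. KEYS[1] is the sorted set,
# ARGV[1] the start index, ARGV[2] the batch size.
CHUNK_SCRIPT = """
local start = tonumber(ARGV[1])
local size = tonumber(ARGV[2])
local members = redis.call('ZRANGE', KEYS[1], start, start + size - 1)
for _, member in ipairs(members) do
  -- per-member work goes here (placeholder)
end
if #members < size then
  return -1            -- short batch: the end of the sorted set was reached
end
return start + size    -- index where the next call should resume
"""


def process_in_chunks(client, key, batch_size=100):
    """Call the script repeatedly until it reports completion.

    Returns the number of script invocations that were needed.
    """
    calls = 0
    cursor = 0
    while cursor != -1:
        cursor = int(client.eval(CHUNK_SCRIPT, 1, key, cursor, batch_size))
        calls += 1
    return calls
```

Between two invocations Redis is free to serve other clients, which is the point of the split. If the set can change while it is being processed, a score-based or member-based marker (as described above) is the safer choice.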
- Inefficient Redis Commands within the Lua Script:
  - Diagnosis: The Lua script might be using Redis commands that are slow when executed many times within a loop. For example, repeatedly calling `SMEMBERS` on a large set inside a loop is highly inefficient, because every call returns the whole set. Use `redis-cli --latency` to check overall latency, and then analyze the script's command usage.
  - Fix: Replace inefficient commands with more optimized ones or restructure the script. Lua has no built-in set operations, so instead of fetching all members of a set and combining them in Lua, consider Redis's own set commands (such as `SINTER` or `SUNION`), or use `SSCAN` to iterate in batches. If you need all members, fetch them once rather than on every iteration, or fetch them outside the script and pass the data in if feasible.
  - Why it works: Using commands that are designed for bulk operations or are more efficient within Redis's execution model reduces the overall time spent per script run.
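One way to iterate with `SSCAN` is to drive the cursor from the application, so no single command walks the whole set. The sketch below assumes a redis-py style client whose `sscan(name, cursor=..., count=...)` returns a `(next_cursor, members)` pair; `iter_set_batches` is a hypothetical helper name.

```python
def iter_set_batches(client, key, count=500):
    """Yield the members of a set in batches using SSCAN.

    COUNT is only a hint to Redis, so batch sizes vary, and SSCAN may
    return the same member more than once across batches.
    """
    cursor = 0
    while True:
        cursor, members = client.sscan(key, cursor=cursor, count=count)
        if members:
            yield members
        if cursor == 0:  # a cursor of 0 means the iteration is complete
            break
```

Each batch can then be handled in the application or passed to a short script as arguments, which keeps every individual call brief.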
- Excessive Inter-Script Communication (EVALSHA calls):
  - Diagnosis: If your application frequently calls `EVALSHA` with different script hashes, so that Redis has to load many scripts into memory, or if calls often miss the script cache, the overhead adds up. `INFO` does not report a script cache hit rate directly; check `number_of_cached_scripts` in `INFO memory` and track how often `EVALSHA` fails with `NOSCRIPT` in your client. Frequent `NOSCRIPT` replies indicate repeated loading.
  - Fix: Ensure all your Lua scripts are loaded once using `SCRIPT LOAD` and then consistently called using `EVALSHA` with the returned SHA1 digest. Avoid using `EVAL` for scripts that will be called repeatedly.
  - Why it works: `EVALSHA` sends only the SHA1 digest, so Redis can directly execute the script it already has in its cache without receiving and hashing the full source on every call, as `EVAL` requires.
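The load-once pattern might look like the sketch below. `LoadedScript` is a hypothetical wrapper, and `client` is assumed to be a redis-py style object providing `script_load(source)` and `evalsha(sha, numkeys, *keys_and_args)`.

```python
class LoadedScript:
    """Upload a Lua script once, then invoke it by digest."""

    def __init__(self, client, source):
        self.client = client
        # SCRIPT LOAD caches the script server-side and returns its SHA1 digest.
        self.sha = client.script_load(source)

    def run(self, keys=(), args=()):
        # Only the digest, the keys, and the arguments travel over the wire.
        return self.client.evalsha(self.sha, len(keys), *keys, *args)
```

This sketch does not handle the case where the server's script cache is emptied (for example by `SCRIPT FLUSH` or a restart), after which `EVALSHA` fails with `NOSCRIPT` until the script is loaded again. redis-py's `register_script` helper is one existing option that covers this by reloading the script when needed.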
- Resource Contention (CPU/Memory):
  - Diagnosis: A single script rarely exhausts CPU across the whole system unless it is truly massive, but an extremely memory-intensive script can indirectly hurt performance by causing memory swapping or garbage collection pressure on the host machine. Monitor your server's CPU and memory usage.
  - Fix: Optimize the script for memory usage. Avoid creating large temporary data structures within Lua. If the script needs to process large amounts of data, consider moving that processing logic out of Redis and into a dedicated application layer that can handle memory more flexibly.
  - Why it works: Reducing the script's memory footprint prevents the Redis process from becoming a bottleneck due to host system resource exhaustion.
- External Factors Affecting Redis Performance:
  - Diagnosis: Sometimes, the Redis `BUSY` error is a symptom of a broader system issue. High network latency, other processes on the Redis server consuming CPU, or slow disk I/O (Redis is primarily in-memory, but persistence can be a factor) can make Redis appear unresponsive. Use tools like `top`, `htop`, `iostat`, and `netstat` on the Redis server.
  - Fix: Address the external resource contention. This might involve optimizing other applications on the server, improving network connectivity, or ensuring Redis is running on a dedicated, well-resourced machine.
  - Why it works: By ensuring the underlying infrastructure is healthy, Redis can perform its operations without being throttled by external dependencies.
After addressing the runaway Lua script, you might encounter a WRONGTYPE error if the script previously modified a key to a different data type, and subsequent commands expect the original type.