The following code deletes login tokens and their corresponding data when there are more than 10 million records:
    import time

    QUIT = False
    LIMIT = 10000000

    def clean_session(conn):
        while not QUIT:
            size = conn.zcard('recent:')
            if size <= LIMIT:
                # nothing to clean yet, check again in a second
                time.sleep(1)
                continue

            # find out the range in `recent:` ZSET (at most 100 tokens)
            end_index = min(size - LIMIT, 100)
            tokens = conn.zrange('recent:', 0, end_index - 1)

            # delete corresponding data
            session_keys = []
            for token in tokens:
                session_keys.append('viewed:' + token)

            conn.delete(*session_keys)
            conn.hdel('login:', *tokens)
            conn.zrem('recent:', *tokens)
My questions are:
Why delete at most 100 records at a time?
Why not just delete size - LIMIT records at once?
Is there some performance consideration?
I guess there are multiple reasons for that choice.
Redis executes commands in a single-threaded event loop. This means a large command (for instance a large zrange, or a large del, hdel or zrem) will be processed faster than several small commands doing the same work, but at the cost of latency for every other client. If a large command takes one second to execute, all the clients accessing Redis will be blocked for one second as well.
A first reason is therefore to minimize the impact of these cleaning operations on the other client processes. By segmenting the activity in several small commands, it gives a chance to other clients to execute their commands as well.
A second reason is the size of the communication buffers in the Redis server. A large command (or a large reply) may take a lot of memory. If millions of items are to be cleaned out, the reply of the zrange command or the input of the del, hdel and zrem commands can represent megabytes of data. Past a certain limit, Redis will close the connection to protect itself. So it is better to avoid dealing with very large commands or very large replies.
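To get a rough feel for the sizes involved, here is a small sketch that counts the bytes a command occupies in the Redis wire protocol (an array header followed by one length-prefixed bulk string per argument). The helper name `resp_command_size` and the 36-character token length are my own assumptions for illustration; the real tokens may be shorter or longer.

```python
def resp_command_size(args):
    """Return the number of bytes a command takes in the Redis protocol."""
    # Array header: *<number of arguments>\r\n
    size = len("*%d\r\n" % len(args))
    for arg in args:
        data = arg.encode("utf-8")
        # Each argument is a bulk string: $<length>\r\n<bytes>\r\n
        size += len("$%d\r\n" % len(data)) + len(data) + 2
    return size


# Assumption: tokens are 36-character strings (e.g. UUIDs).
token = "0" * 36
small = resp_command_size(["ZREM", "recent:"] + [token] * 100)
large = resp_command_size(["ZREM", "recent:"] + [token] * 1000000)
print(small, large)
```

Under these assumptions a batch of 100 tokens is a few kilobytes, while a single zrem for a million tokens is tens of megabytes, and the same payload is sent again for hdel and (with a longer prefix) for del.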
A third reason is the memory of the Python client. If millions of items have to be cleaned out, Python will have to maintain very large list objects (tokens and session_keys). They may or may not fit in memory.
The proposed solution is incremental: whatever the number of items to delete, it avoids consuming a lot of memory on both the client and the Redis side. It also avoids hitting the communication buffer limit (which would cause the connection to be closed), and it limits the impact on the performance of the other processes accessing Redis.
Note that the value 100 is arbitrary. A smaller value will allow for better latencies at the price of a lower session cleaning throughput. A larger value will increase the throughput of the cleaning algorithm at the price of higher latencies.
It is actually a classical trade-off between the throughput of the cleaning algorithm, and the latency of other operations.
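If you want to experiment with this trade-off, one option is to turn the hard-coded 100 into a parameter. The following is only a sketch: `clean_batch`, `batches_needed` and the `batch_size` parameter are hypothetical names, and I assume `conn` is a redis-py client that returns tokens as `str` (for example one created with `decode_responses=True`).

```python
def clean_batch(conn, limit=10000000, batch_size=100):
    """Remove at most batch_size of the oldest tokens; return how many were removed."""
    size = conn.zcard('recent:')
    if size <= limit:
        return 0

    # Never remove more than batch_size tokens in one pass.
    end_index = min(size - limit, batch_size)
    tokens = conn.zrange('recent:', 0, end_index - 1)

    session_keys = ['viewed:' + token for token in tokens]
    conn.delete(*session_keys)
    conn.hdel('login:', *tokens)
    conn.zrem('recent:', *tokens)
    return len(tokens)


def batches_needed(excess, batch_size):
    """Number of passes required to remove `excess` tokens (ceiling division)."""
    return (excess + batch_size - 1) // batch_size
```

The caller would loop on `clean_batch` and sleep when it returns 0. With a backlog of one million tokens, `batch_size=100` needs 10,000 passes of small commands, while `batch_size=10000` needs only 100 passes but each command blocks the other clients for longer.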