Analysis of High CPU Usage Caused by Mass Key Expiration in Redis
This article explains the mechanism behind high CPU usage caused by a large number of Redis keys expiring at the same time.
(1) The Problem
At 00:00 on 2023.08.16 we suddenly received an alert: the Redis cluster's CPU usage was above 75%, and the alert kept firing for several minutes.
Monitoring showed that business traffic and request QPS were unchanged compared with the previous day;
the Redis command-distribution dashboard, however, showed a large number of UNLINK commands, and their timing lined up exactly with the CPU usage spike.
(2) The Cause
A large number of keys in the Redis cluster expired at 00:00 on 2023.08.16. Redis processed a large number of UNLINK commands as a result, which drove CPU usage up.
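Such a synchronized expiry usually comes from keys being written with a TTL aligned to a fixed wall-clock time, e.g. midnight. The sketch below is a minimal, hypothetical reproduction of that write pattern using the hiredis client; the key names, the 100000 key count, and the seconds_until_midnight() helper are illustrative assumptions, not details from the actual incident.
// Minimal sketch (assumption: hiredis client, local Redis on 127.0.0.1:6379).
// Every key gets the same absolute expiry (the next midnight), so they all
// become eligible for deletion at exactly the same moment.
#include <hiredis/hiredis.h>
#include <stdio.h>
#include <time.h>

// Hypothetical helper: seconds from now until the next local midnight.
static long seconds_until_midnight(void) {
    time_t now = time(NULL);
    struct tm tm_midnight = *localtime(&now);
    tm_midnight.tm_hour = 0;
    tm_midnight.tm_min = 0;
    tm_midnight.tm_sec = 0;
    tm_midnight.tm_mday += 1;     /* roll over to the next day */
    tm_midnight.tm_isdst = -1;    /* let mktime resolve DST */
    return (long)(mktime(&tm_midnight) - now);
}

int main(void) {
    redisContext *c = redisConnect("127.0.0.1", 6379);
    if (c == NULL || c->err) return 1;

    long ttl = seconds_until_midnight();
    for (int i = 0; i < 100000; i++) {
        // All 100k keys share the same TTL, hence the same expiry instant.
        redisReply *r = redisCommand(c, "SET cache:item:%d value EX %ld", i, ttl);
        if (r) freeReplyObject(r);
    }
    redisFree(c);
    return 0;
}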
(3) Why Cache Expiration Causes High CPU Usage
When a large batch of keys carries the same expiry timestamp, they all become deletable at the same instant. Both deletion paths described in section (4) then kick in at once: keys are evicted in bulk, and for every evicted key the main thread also propagates a DEL/UNLINK to the replicas and the AOF (see propagateExpire() inside expireIfNeeded() below). Since the server runs with lazyfree-lazy-expire yes, that propagation takes the form of UNLINK commands, which is exactly the flood of UNLINKs, and the accompanying CPU spike, observed in the monitoring.
(4) How Redis Key Expiration Works
Redis deletes expired data using a combination of lazy deletion and periodic deletion.
(4.1) Lazy Deletion
The Redis server does not delete a key the moment it expires, because proactively deleting everything on a timer would cost too much CPU. Instead, the next time the key is looked up, Redis notices that it still exists but has expired, and only then calls dbAsyncDelete or dbSyncDelete to remove it.
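To make the idea concrete, here is a minimal, self-contained sketch of the check-on-access pattern. It is an illustration only, not Redis code: the table, lookup() and the eviction step are hypothetical stand-ins for Redis' dict, lookupKeyRead() and expireIfNeeded().
// Conceptual sketch of lazy deletion (illustration only, not Redis source):
// the expiry check happens when the key is read, never on a timer.
#include <stdio.h>
#include <string.h>
#include <time.h>

struct entry {
    const char *key;
    const char *value;
    time_t expire_at;          /* 0 means "no expiry set" */
};

// Hypothetical lookup over a tiny table; in Redis the equivalent work is
// done by lookupKeyRead() -> expireIfNeeded() on the real hash table.
static const char *lookup(struct entry *table, size_t n, const char *key) {
    for (size_t i = 0; i < n; i++) {
        if (table[i].key && strcmp(table[i].key, key) == 0) {
            if (table[i].expire_at != 0 && table[i].expire_at <= time(NULL)) {
                table[i].key = NULL;   /* lazy delete: evict on access */
                return NULL;           /* behave as if the key were gone */
            }
            return table[i].value;
        }
    }
    return NULL;
}

int main(void) {
    struct entry db[] = {
        { "fresh",   "v1", time(NULL) + 60 },  /* expires in one minute   */
        { "expired", "v2", time(NULL) - 1  },  /* TTL already in the past */
    };
    const char *a = lookup(db, 2, "fresh");
    const char *b = lookup(db, 2, "expired");  /* triggers the lazy delete */
    printf("fresh   -> %s\n", a ? a : "(nil)");
    printf("expired -> %s\n", b ? b : "(nil)");
    return 0;
}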
(4.1.1) GET Command Call Chain
Every time the Redis server executes a GET command, the call chain is as follows:
// Redis 6.0
// file: most of the relevant code is in src/db.c
(t_string.c) getCommand
  -> (t_string.c) getGenericCommand
    -> (db.c) lookupKeyReadOrReply
      -> (db.c) lookupKeyRead
        -> (db.c) lookupKeyReadWithFlags
          -> (db.c) expireIfNeeded
               -> keyIsExpired
               -> propagateExpire
               -> notifyKeyspaceEvent
               -> dbAsyncDelete or dbSyncDelete (depending on lazyfree-lazy-expire)
               -> signalModifiedKey
/* This function is called when we are going to perform some operation
 * in a given key, but such key may be already logically expired even if
 * it still exists in the database. The main way this function is called
 * is via lookupKey*() family of functions.
 *
 * The behavior of the function depends on the replication role of the
 * instance, because slave instances do not expire keys, they wait
 * for DELs from the master for consistency matters. However even
 * slaves will try to have a coherent return value for the function,
 * so that read commands executed in the slave side will be able to
 * behave like if the key is expired even if still present (because the
 * master has yet to propagate the DEL).
 *
 * In masters as a side effect of finding a key which is expired, such
 * key will be evicted from the database. Also this may trigger the
 * propagation of a DEL/UNLINK command in AOF / replication stream.
 *
 * The return value of the function is 0 if the key is still valid,
 * otherwise the function returns 1 if the key is expired. */
int expireIfNeeded(redisDb *db, robj *key) {
    if (!keyIsExpired(db,key)) return 0;

    /* If we are running in the context of a slave, instead of
     * evicting the expired key from the database, we return ASAP:
     * the slave key expiration is controlled by the master that will
     * send us synthesized DEL operations for expired keys.
     *
     * Still we try to return the right information to the caller,
     * that is, 0 if we think the key should be still valid, 1 if
     * we think the key is expired at this time. */
    if (server.masterhost != NULL) return 1;

    server.stat_expiredkeys++;
    propagateExpire(db,key,server.lazyfree_lazy_expire);
    notifyKeyspaceEvent(NOTIFY_EXPIRED,
        "expired",key,db->id);
    // Delete the key; the config decides whether deletion is asynchronous or synchronous.
    int retval = server.lazyfree_lazy_expire ? dbAsyncDelete(db,key) :
                                               dbSyncDelete(db,key);
    if (retval) signalModifiedKey(NULL,db,key);
    return retval;
}
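The first step above, keyIsExpired(), essentially looks up the key's absolute expiry time (a Unix timestamp in milliseconds stored in the per-database db->expires dict) and compares it with the current time. Below is a simplified, self-contained sketch of that comparison; it is not the real Redis function (which also special-cases RDB/AOF loading and replication), and key_is_expired() / current_time_ms() are hypothetical names.
// Simplified sketch of the expiry check (assumption: not Redis source).
#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>

typedef int64_t mstime_t;

// Stand-in for Redis' mstime(): current wall-clock time in milliseconds.
static mstime_t current_time_ms(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (mstime_t)tv.tv_sec * 1000 + tv.tv_usec / 1000;
}

// 'when' plays the role of the value stored in db->expires; -1 means "no TTL".
// A key is expired once the stored absolute timestamp lies in the past.
static int key_is_expired(mstime_t when) {
    if (when < 0) return 0;            /* no expiry set */
    return current_time_ms() > when;   /* expired once now > when */
}

int main(void) {
    mstime_t soon = current_time_ms() + 5000;          /* expires in 5 seconds */
    printf("not yet expired: %d\n", key_is_expired(soon)); /* 0 */
    printf("no TTL:          %d\n", key_is_expired(-1));   /* 0 */
    printf("in the past:     %d\n", key_is_expired(0));    /* 1 */
    return 0;
}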
Redis server configuration:
// file: redis.conf
############################# LAZY FREEING ####################################
lazyfree-lazy-eviction yes
lazyfree-lazy-expire yes
lazyfree-lazy-server-del yes
replica-lazy-flush no
Whether deletion is synchronous or asynchronous depends on the lazyfree-lazy-expire setting in redis.conf.
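lazyfree-lazy-expire can also be inspected and changed at runtime with CONFIG GET / CONFIG SET, so no restart is needed to switch between synchronous and asynchronous expiration. A minimal sketch using the hiredis client (the host and port are assumptions):
// Check (and optionally enable) lazy expiration at runtime.
// Assumes a local Redis on 127.0.0.1:6379 and the hiredis client.
#include <hiredis/hiredis.h>
#include <stdio.h>

int main(void) {
    redisContext *c = redisConnect("127.0.0.1", 6379);
    if (c == NULL || c->err) return 1;

    // CONFIG GET returns a [name, value] array.
    redisReply *r = redisCommand(c, "CONFIG GET lazyfree-lazy-expire");
    if (r && r->type == REDIS_REPLY_ARRAY && r->elements == 2)
        printf("lazyfree-lazy-expire = %s\n", r->element[1]->str);
    if (r) freeReplyObject(r);

    // Turn asynchronous (UNLINK-style) expiration on without a restart.
    r = redisCommand(c, "CONFIG SET lazyfree-lazy-expire yes");
    if (r) freeReplyObject(r);

    redisFree(c);
    return 0;
}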
(4.1.2) Synchronous Deletion: dbSyncDelete()
// Redis 6.0
// file: src/db.c
/* Delete a key, value, and associated expiration entry if any, from the DB. */
int dbSyncDelete(redisDb *db, robj *key) {
    /* Remove the key from the expires dict.
     * Note: deleting an entry from the expires dict does not free the key's
     * SDS string, because it is shared with the main dict. */
    if (dictSize(db->expires) > 0) dictDelete(db->expires,key->ptr);
    /* Delete the entry from the main dict. This is what actually removes the
     * key and value and returns their memory to the allocator. */
    if (dictDelete(db->dict,key->ptr) == DICT_OK) {
        /* If Redis Cluster is enabled, remove the key from its slot mapping. */
        if (server.cluster_enabled) slotToKeyDel(key->ptr);
        return 1;
    } else {
        return 0;
    }
}
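One design point worth noting: dbSyncDelete() frees the value in the calling (main) thread. For a short string that is cheap, but for a hash, set, sorted set or list with many elements the free itself can consume a noticeable amount of main-thread CPU time; that is the cost the asynchronous path in the next section tries to push off to a background thread.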
(4.1.3) Asynchronous Deletion: dbAsyncDelete()
/* Delete a key, value, and associated expiration entry if any, from the DB.
 * If there are enough allocations to free the value object may be put into
 * a lazy free list instead of being freed synchronously. The lazy free list
 * will be reclaimed in a different bio.c thread. */
#define LAZYFREE_THRESHOLD 64
int dbAsyncDelete(redisDb *db, robj *key) {
    /* Remove the key from the expires dict.
     * Note: deleting an entry from the expires dict will not free the key's
     * SDS string, because it is shared with the main dict. */
    if (dictSize(db->expires) > 0) dictDelete(db->expires,key->ptr);

    /* If the value is composed of a few allocations, to free in a lazy way
     * is actually just slower... So under a certain limit we just free
     * the object synchronously. */
    dictEntry *de = dictUnlink(db->dict,key->ptr);
    if (de) {
        robj *val = dictGetVal(de);
        size_t free_effort = lazyfreeGetFreeEffort(val);

        /* If releasing the object is too much work, do it in the background
         * by adding the object to the lazy free list.
         * Note that if the object is shared, to reclaim it now it is not
         * possible. This rarely happens, however sometimes the implementation
         * of parts of the Redis core may call incrRefCount() to protect
         * objects, and then call dbDelete(). In this case we'll fall
         * through and reach the dictFreeUnlinkedEntry() call, that will be
         * equivalent to just calling decrRefCount(). */
        if (free_effort > LAZYFREE_THRESHOLD && val->refcount == 1) {
            atomicIncr(lazyfree_objects,1);
            bioCreateBackgroundJob(BIO_LAZY_FREE,val,NULL,NULL);
            dictSetVal(db->dict,de,NULL);
        }
    }

    /* Release the key-val pair, or just the key if we set the val
     * field to NULL in order to lazy free it later. */
    if (de) {
        dictFreeUnlinkedEntry(db->dict,de);
        if (server.cluster_enabled) slotToKeyDel(key->ptr);
        return 1;
    } else {
        return 0;
    }
}
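Two details in dbAsyncDelete() matter for the CPU spike described at the beginning. First, "asynchronous" deletion is only partly asynchronous: dictUnlink(), the expires-dict removal, the cluster slot bookkeeping, and the propagation done earlier in expireIfNeeded() all still run on the main thread; only freeing the value object is handed to a bio.c background thread, and only when its free effort exceeds LAZYFREE_THRESHOLD (64) and the object is not shared. Values below the threshold are still freed synchronously. Second, when a very large batch of keys becomes deletable at the same instant, all of that per-key main-thread work is squeezed into a short window, which is exactly what showed up as the UNLINK burst and the CPU usage spike in the monitoring.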
(4.2) Periodic Deletion