How mass key expiration in Redis drives CPU usage up: an analysis

This article explains why a large batch of Redis keys expiring at the same time causes high CPU usage.

(1) The problem

At 2023.08.16 00:00 we suddenly received an alert: CPU usage of the Redis cluster was above 75%, and the alert kept firing for several minutes.

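 According to the monitoring, business traffic and request QPS were unchanged compared with the previous day;
 the Redis command-distribution monitoring showed a large number of unlink commands, and their timing matched the CPU spike exactly.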

(2) Cause

 A large amount of data in the Redis cluster expired at 2023.08.16 00:00, and Redis had to process a large number of unlink commands, which made CPU usage spike.

(3) Why cache expiration drives CPU usage up

todo

(4) How Redis expires keys

 Redis removes expired data with a combination of two strategies: lazy deletion and periodic deletion.

(4.1) Lazy deletion

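Redis Server does not delete a key the moment it expires, because actively deleting every key on a timer would cost too much CPU. Instead, when the key is next accessed and is found to still exist but be expired, Redis calls dbAsyncDelete or dbSyncDelete to delete it.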
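To make the lazy strategy concrete, here is a minimal C sketch of a toy key store that checks expiration only when a key is read. The store, its fixed-size array, and the names `entry`, `find_entry`, `set_with_ttl`, `lazy_get` and `expired_count` are hypothetical illustrations, not Redis code. The `now > expire_at` comparison mirrors what keyIsExpired() does, and clearing the slot stands in for dbSyncDelete()/dbAsyncDelete().

```c
#include <stdio.h>
#include <string.h>

#define MAX_KEYS 16

/* Hypothetical toy entry; Redis really keeps a main dict plus an expires dict. */
typedef struct {
    char key[32];
    char val[32];
    long long expire_at_ms;  /* absolute expire time in ms, -1 = no TTL */
    int used;
} entry;

static entry store[MAX_KEYS];
static long long expired_count = 0;  /* plays the role of server.stat_expiredkeys */

static entry *find_entry(const char *key) {
    for (int i = 0; i < MAX_KEYS; i++)
        if (store[i].used && strcmp(store[i].key, key) == 0) return &store[i];
    return NULL;
}

/* Store a value with an absolute expire time; no deletion is scheduled. */
static void set_with_ttl(const char *key, const char *val, long long expire_at_ms) {
    entry *e = find_entry(key);
    for (int i = 0; !e && i < MAX_KEYS; i++)
        if (!store[i].used) e = &store[i];
    if (!e) return;  /* store full: ignored in this toy */
    snprintf(e->key, sizeof e->key, "%s", key);
    snprintf(e->val, sizeof e->val, "%s", val);
    e->expire_at_ms = expire_at_ms;
    e->used = 1;
}

/* Lazy expiration: expiry is checked only here, when the key is read. */
static const char *lazy_get(const char *key, long long now_ms) {
    entry *e = find_entry(key);
    if (!e) return NULL;
    if (e->expire_at_ms >= 0 && now_ms > e->expire_at_ms) {  /* same test as keyIsExpired() */
        e->used = 0;      /* the delete step (dbSyncDelete()/dbAsyncDelete() in Redis) */
        expired_count++;
        return NULL;      /* the caller sees a miss, as GET returns nil */
    }
    return e->val;
}
```

For example, after `set_with_ttl("a", "x", 1000)`, `lazy_get("a", 500)` returns `"x"`, while `lazy_get("a", 1500)` deletes the entry and returns `NULL`. A key that expires but is never read again would stay in memory under this strategy alone, which is why it is paired with periodic deletion.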

(4.1.1) GET command call chain

Each time Redis Server executes a GET command, the call chain is as follows:

// Redis 6.0
// file: most of the code is in src/db.c

(t_string.c) getCommand
    -> (t_string.c) getGenericCommand
        -> (db.c) lookupKeyReadOrReply()
            -> (db.c) lookupKeyRead()
                -> (db.c) lookupKeyReadWithFlags()
                    -> (db.c) expireIfNeeded()
                        -> (db.c) keyIsExpired()
                        -> (db.c) propagateExpire()
                        -> (notify.c) notifyKeyspaceEvent()
                        -> (lazyfree.c) dbAsyncDelete()   // if lazyfree-lazy-expire is yes
                        -> (db.c) dbSyncDelete()          // otherwise
                        -> (db.c) signalModifiedKey()
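
/* This function is called when we are going to perform some operation
 * on a given key, but such key may be already logically expired even if
 * it still exists in the database. The main way this function is called
 * is via the lookupKey*() family of functions.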
 *
 * The behavior of the function depends on the replication role of the
 * instance, because slave instances do not expire keys, they wait
 * for DELs from the master for consistency matters. However even
 * slaves will try to have a coherent return value for the function,
 * so that read commands executed in the slave side will be able to
 * behave like if the key is expired even if still present (because the
 * master has yet to propagate the DEL).
 *
 * In masters as a side effect of finding a key which is expired, such
 * key will be evicted from the database. Also this may trigger the
 * propagation of a DEL/UNLINK command in AOF / replication stream.
 *
 * The return value of the function is 0 if the key is still valid,
 * otherwise the function returns 1 if the key is expired. */
int expireIfNeeded(redisDb *db, robj *key) {
    if (!keyIsExpired(db,key)) return 0;

    /* If we are running in the context of a slave, instead of
     * evicting the expired key from the database, we return ASAP:
     * the slave key expiration is controlled by the master that will
     * send us synthesized DEL operations for expired keys.
     *
     * Still we try to return the right information to the caller,
     * that is, 0 if we think the key should be still valid, 1 if
     * we think the key is expired at this time. */
    if (server.masterhost != NULL) return 1;

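    server.stat_expiredkeys++;
    /* Propagate the deletion to replicas and the AOF: as UNLINK when
     * lazyfree-lazy-expire is yes, as DEL otherwise. */
    propagateExpire(db,key,server.lazyfree_lazy_expire);
    notifyKeyspaceEvent(NOTIFY_EXPIRED,
        "expired",key,db->id);
    // Delete the key, asynchronously or synchronously depending on lazyfree-lazy-expire
    int retval = server.lazyfree_lazy_expire ? dbAsyncDelete(db,key) :
                                               dbSyncDelete(db,key);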
    if (retval) signalModifiedKey(NULL,db,key);
    return retval;
}

Redis Server configuration

// file: redis.conf

############################# LAZY FREEING ####################################
lazyfree-lazy-eviction yes
lazyfree-lazy-expire yes
lazyfree-lazy-server-del yes
replica-lazy-flush no

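 Whether expired keys are deleted synchronously or asynchronously depends on the lazyfree-lazy-expire setting in redis.conf. Its default value is no (synchronous deletion); the configuration above sets it to yes.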

(4.1.2) Synchronous deletion - dbSyncDelete()

// Redis 6.0
// file: src/db.c

/* Delete a key, value, and associated expiration entry if any, from the DB */
int dbSyncDelete(redisDb *db, robj *key) {
    /* Deleting an entry from the expires dict will not free the sds of
     * the key, because it is shared with the main dictionary. */
    if (dictSize(db->expires) > 0) dictDelete(db->expires,key->ptr);
    // Delete the entry from the main dict: this actually frees the key and
    // the value and returns the memory to the allocator.
    if (dictDelete(db->dict,key->ptr) == DICT_OK) {
        // In cluster mode, also remove the key from its hash slot's key index.
        if (server.cluster_enabled) slotToKeyDel(key->ptr);
        return 1;
    } else {
        return 0;
    }
}

(4.1.3) Asynchronous deletion - dbAsyncDelete()

// Redis 6.0
// file: src/lazyfree.c

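/* Delete a key, value, and associated expiration entry if any, from the DB.
 * If freeing the value object would take enough allocations, the value may
 * be put into a lazy free list instead of being freed synchronously. The
 * lazy free list is reclaimed in a separate bio.c thread. */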
#define LAZYFREE_THRESHOLD 64
int dbAsyncDelete(redisDb *db, robj *key) {
    /* Deleting an entry from the expires dict will not free the sds of
     * the key, because it is shared with the main dictionary. */
    if (dictSize(db->expires) > 0) dictDelete(db->expires,key->ptr);

    /* If the value is composed of a few allocations, to free in a lazy way
     * is actually just slower... So under a certain limit we just free
     * the object synchronously. */
    dictEntry *de = dictUnlink(db->dict,key->ptr);
    if (de) {
        robj *val = dictGetVal(de);
        size_t free_effort = lazyfreeGetFreeEffort(val);

        /* If releasing the object is too much work, do it in the background
         * by adding the object to the lazy free list.
         * Note that if the object is shared, to reclaim it now it is not
         * possible. This rarely happens, however sometimes the implementation
         * of parts of the Redis core may call incrRefCount() to protect
         * objects, and then call dbDelete(). In this case we'll fall
         * through and reach the dictFreeUnlinkedEntry() call, that will be
         * equivalent to just calling decrRefCount(). */
        if (free_effort > LAZYFREE_THRESHOLD && val->refcount == 1) {
            atomicIncr(lazyfree_objects,1);
            bioCreateBackgroundJob(BIO_LAZY_FREE,val,NULL,NULL);
            dictSetVal(db->dict,de,NULL);
        }
    }

    /* Release the key-val pair, or just the key if we set the val
     * field to NULL in order to lazy free it later. */
    if (de) {
        dictFreeUnlinkedEntry(db->dict,de);
        if (server.cluster_enabled) slotToKeyDel(key->ptr);
        return 1;
    } else {
        return 0;
    }
}
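The sync-versus-background decision in dbAsyncDelete() can be isolated into a small C sketch. It approximates, in simplified form, what lazyfreeGetFreeEffort() estimates in Redis 6.0: one unit for a string or a compactly encoded (ziplist/intset) value, the node count for a quicklist-backed list, and the element count for hashtable or skiplist encodings. The `toy_val` struct and the helper names `free_effort` and `should_free_in_background` are hypothetical, not Redis types or functions.

```c
#include <stddef.h>

#define LAZYFREE_THRESHOLD 64  /* same value as in dbAsyncDelete() above */

/* Hypothetical, simplified value model (not Redis's robj). */
typedef enum { T_STRING, T_LIST, T_SET, T_ZSET, T_HASH } val_type;

typedef struct {
    val_type type;
    int compact;     /* 1 if stored in a single-blob encoding (ziplist/intset) */
    size_t count;    /* list: quicklist nodes; set/zset/hash: elements */
    int refcount;
} toy_val;

/* Approximates lazyfreeGetFreeEffort(): roughly how many allocations freeing costs. */
static size_t free_effort(const toy_val *v) {
    if (v->type == T_STRING) return 1;       /* a single allocation */
    if (v->type == T_LIST) return v->count;  /* one unit per quicklist node */
    if (v->compact) return 1;                /* ziplist/intset: one blob */
    return v->count;                         /* hashtable/skiplist: one per element */
}

/* 1 = hand the value to the background lazy-free thread, 0 = free it synchronously. */
static int should_free_in_background(const toy_val *v) {
    return free_effort(v) > LAZYFREE_THRESHOLD && v->refcount == 1;
}
```

One consequence, under these assumptions: a typical small string cache value has an effort of 1, so even with lazyfree-lazy-expire yes it is freed on the main thread, and only large collections are handed off to the background thread. The key itself and the dict entry are always released on the main thread by dictFreeUnlinkedEntry().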

(4.2) Periodic deletion
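In Redis, periodic deletion is performed by activeExpireCycle(), which is invoked from the server cron and before the event loop sleeps. Roughly, it samples keys that have a TTL, deletes the expired ones, and keeps going while a sizable fraction of the sample was expired, bounded by a time budget. The C sketch below models only that loop. The sample size of 20 and the 10% stale threshold are assumptions based on Redis 6.0 defaults (they vary with active-expire-effort), the loop cap stands in for the time budget, and `toy_expires`/`active_expire_cycle` are hypothetical names.

```c
#include <stddef.h>

/* Assumptions based on Redis 6.0 defaults; not Redis's real macros. */
#define KEYS_PER_LOOP 20         /* keys sampled per round */
#define ACCEPTABLE_STALE_PCT 10  /* keep looping while more than 10% were expired */

/* Hypothetical stand-in for a db's expires dict: one expire time per slot. */
typedef struct {
    long long *expire_at;  /* expire time in ms, or -1 for an empty slot */
    size_t size;
    size_t cursor;         /* the next sampling round resumes here */
} toy_expires;

/* Runs one simplified active-expire cycle and returns how many keys it deleted.
 * max_loops stands in for the time budget Redis enforces. */
static size_t active_expire_cycle(toy_expires *t, long long now_ms, int max_loops) {
    size_t total_expired = 0;
    if (t->size == 0) return 0;

    for (int loop = 0; loop < max_loops; loop++) {
        size_t sampled = 0, expired = 0;

        /* Walk slots from the cursor until KEYS_PER_LOOP keys are sampled
         * or every slot has been visited once. */
        for (size_t visited = 0; visited < t->size && sampled < KEYS_PER_LOOP; visited++) {
            long long *slot = &t->expire_at[t->cursor];
            t->cursor = (t->cursor + 1) % t->size;
            if (*slot < 0) continue;          /* empty slot */
            sampled++;
            if (now_ms > *slot) {             /* expired: delete it */
                *slot = -1;
                expired++;
            }
        }

        total_expired += expired;
        if (sampled == 0) break;              /* no keys with a TTL left */
        if (expired * 100 <= sampled * ACCEPTABLE_STALE_PCT) break;  /* mostly fresh: stop */
    }
    return total_expired;
}
```

With 100 keys that all expire at the same instant, a call with `max_loops = 2` deletes 40 keys and stops only because the budget ran out; the remaining 60 wait for the next cycle. In a mass-expiration scenario like the one above, nearly every sample is fully expired, so each cycle is likely to run until its budget is exhausted, on top of the lazy deletions triggered by reads.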

