Redis key 惰性删除

无论是 LRU 算法还是 LFU 算法,它们在删除淘汰数据时,实际上都会根据 Redis server 的 lazyfree-lazy-eviction 配置项,来决定是否使用 Lazy Free,也就是惰性删除。

(1) 惰性删除是什么

惰性删除是 Redis 4.0 版本后提供的功能,它会使用后台线程来执行删除数据的任务

(2) 为什么要用惰性删除

可以避免了删除操作对主线程的阻塞。

https://github.com/redis/redis/blob/6.0/redis.conf#L941

Redis有2个命令/原语删除keys
一种是DEL阻塞删除对象,它意味着Redis Server停止处理新命令,以便同步回收与对象关联的所有内存。
如果删除的这个key关联了一个小对象,为了执行删除命令需要的时间是非常小的,
并且与 Redis 中的大多数其它 O(1) 或 O(log_N) 命令相当。


If the key deleted is associated with a small object,
 the time needed
 in order to execute the DEL command is very small 
 and comparable to most other  O(1) or O(log_N) commands in Redis. 
 However if the key is associated with an
# aggregated value containing millions of elements, the server can block for
# a long time (even seconds) in order to complete the operation.
#
# For the above reasons Redis also offers non blocking deletion primitives
# such as UNLINK (non blocking DEL) and the ASYNC option of FLUSHALL and
# FLUSHDB commands, in order to reclaim memory in background. Those commands
# are executed in constant time. Another thread will incrementally free the
# object in the background as fast as possible.
#
# DEL, UNLINK and ASYNC option of FLUSHALL and FLUSHDB are user-controlled.
# It's up to the design of the application to understand when it is a good
# idea to use one or the other. However the Redis server sometimes has to
# delete keys or flush the whole database as a side effect of other operations.
# Specifically Redis deletes objects independently of a user call in the
# following scenarios:
#
# 1) On eviction, because of the maxmemory and maxmemory policy configurations,
#    in order to make room for new data, without going over the specified
#    memory limit.
# 2) Because of expire: when a key with an associated time to live (see the
#    EXPIRE command) must be deleted from memory.
# 3) Because of a side effect of a command that stores data on a key that may
#    already exist. For example the RENAME command may delete the old key
#    content when it is replaced with another one. Similarly SUNIONSTORE
#    or SORT with STORE option may delete existing keys. The SET command
#    itself removes any old content of the specified key in order to replace
#    it with the specified string.
# 4) During replication, when a replica performs a full resynchronization with
#    its master, the content of the whole database is removed in order to
#    load the RDB file just transferred.
#
# In all the above cases the default is to delete objects in a blocking way,
# like if DEL was called. However you can configure each case specifically
# in order to instead release memory in a non-blocking way like if UNLINK
# was called, using the following configuration directives.

lazyfree-lazy-eviction no
lazyfree-lazy-expire no
lazyfree-lazy-server-del no
replica-lazy-flush no

# It is also possible, for the case when to replace the user code DEL calls
# with UNLINK calls is not easy, to modify the default behavior of the DEL
# command to act exactly like UNLINK, using the following configuration
# directive:

lazyfree-lazy-user-del no

(3) 惰性删除怎么用

(3.1) 惰性删除的配置

当 Redis server 需要启动惰性删除时,需要在redis.conf配置文件中设置和惰性删除相关的配置项。
其中包括了四个配置项,分别对应了如下的四种场景。
lazyfree-lazy-eviction:对应缓存淘汰时的数据删除场景。
lazyfree-lazy-expire:对应过期 key 的删除场景。
lazyfree-lazy-server-del:对应会隐式进行删除操作的 server 命令执行场景。
replica-lazy-flush:对应从节点完成全量同步后,删除原有旧数据的场景。

这四个配置项的默认值都是 no。所以,如果要在缓存淘汰时启用,就需要将

lazyfree-lazy-eviction 设置为 yes。

(4) 惰性删除原理

(4.1) 被淘汰数据的删除过程

freeMemoryIfNeeded 函数(在evict.c文件中)会负责执行数据淘汰的流程。
该函数在筛选出被淘汰的键值对后,就要开始删除被淘汰的数据,这个删除过程主要分成两步。

第一步,freeMemoryIfNeeded 函数会为被淘汰的 key 创建一个 SDS 对象,然后调用 propagateExpire 函数
第二步,freeMemoryIfNeeded 函数会根据 server 是否启用了惰性删除,分别执行

// file: src/evict.c

/* This function is periodically called to see if there is memory to free
 * according to the current "maxmemory" settings. In case we are over the
 * memory limit, the function will try to free some memory to return back
 * under the limit.
 *
 * The function returns C_OK if we are under the memory limit or if we
 * were over the limit, but the attempt to free memory was successful.
 * Otherwise if we are over the memory limit, but not enough memory
 * was freed to return back under the limit, the function returns C_ERR. */
int freeMemoryIfNeeded(void) {
    int keys_freed = 0;

    // 省略部分代码 ... 

    // 已经释放的内存大小 < 计划要释放的内存大小
    while (mem_freed < mem_tofree) {
        int j, k, i;

        // sds
        sds bestkey = NULL;

        // 省略部分代码 

        // 最终移除选择要淘汰的key
        if (bestkey) {
            // 选择对应的db
            db = server.db+bestdbid;
            // 创建redisObject
            robj *keyobj = createStringObject(bestkey,sdslen(bestkey));
            // 删除
            propagateExpire(db,keyobj,server.lazyfree_lazy_eviction);
            /* We compute the amount of memory freed by db*Delete() alone.
             * It is possible that actually the memory needed to propagate
             * the DEL in AOF and replication link is greater than the one
             * we are freeing removing the key, but we can't account for
             * that otherwise we would never exit the loop.
             *
             * Same for CSC invalidation messages generated by signalModifiedKey.
             *
             * AOF and Output buffer memory will be freed eventually so
             * we only care about memory used by the key space. */
            delta = (long long) zmalloc_used_memory();
            latencyStartMonitor(eviction_latency);

            // 是否惰性删除
            if (server.lazyfree_lazy_eviction)
                dbAsyncDelete(db,keyobj);  // 异步删除
            else
                dbSyncDelete(db,keyobj);  // 同步删除

            latencyEndMonitor(eviction_latency);
            latencyAddSampleIfNeeded("eviction-del",eviction_latency);
            delta -= (long long) zmalloc_used_memory();
            mem_freed += delta;
            server.stat_evictedkeys++;
            signalModifiedKey(NULL,db,keyobj);
            notifyKeyspaceEvent(NOTIFY_EVICTED, "evicted",
                keyobj, db->id);
            decrRefCount(keyobj);
            keys_freed++;

            /* When the memory to free starts to be big enough, we may
             * start spending so much time here that is impossible to
             * deliver data to the slaves fast enough, so we force the
             * transmission here inside the loop. */
            if (slaves) flushSlavesOutputBuffers();

            /* Normally our stop condition is the ability to release
             * a fixed, pre-computed amount of memory. However when we
             * are deleting objects in another thread, it's better to
             * check, from time to time, if we already reached our target
             * memory, since the "mem_freed" amount is computed only
             * across the dbAsyncDelete() call, while the thread can
             * release the memory all the time. */
            if (server.lazyfree_lazy_eviction && !(keys_freed % 16)) {
                if (getMaxmemoryState(NULL,NULL,NULL,NULL) == C_OK) {
                    /* Let's satisfy our stop condition. */
                    mem_freed = mem_tofree;
                }
            }
        } else {
            goto cant_free; /* nothing to free... */
        }
    }
    result = C_OK;

cant_free:
    /* We are here if we are not able to reclaim memory. There is only one
     * last thing we can try: check if the lazyfree thread has jobs in queue
     * and wait... */
    if (result != C_OK) {
        latencyStartMonitor(lazyfree_latency);
        while(bioPendingJobsOfType(BIO_LAZY_FREE)) {
            if (getMaxmemoryState(NULL,NULL,NULL,NULL) == C_OK) {
                result = C_OK;
                break;
            }
            usleep(1000);
        }
        latencyEndMonitor(lazyfree_latency);
        latencyAddSampleIfNeeded("eviction-lazyfree",lazyfree_latency);
    }
    latencyEndMonitor(latency);
    latencyAddSampleIfNeeded("eviction-cycle",latency);
    return result;
}

(4.1.1) 传播过期key-propagateExpire

// file: src/evict.c

/* 
 * 传播 过期keys 到 从节点 和 AOF 文件。
 * 当主节点中的key过期时,如果启用(),则会将对此key的DEL操作发送到 所有从节点 和 AOF 文件。
 *
 * 这样key过期集中在一个地方,并且由于 AOF 和 主->从 链接保证操作顺序,
 * 即使我们允许对过期key进行写操作,一切也会保持一致。
 *
 * @param *db redisDb
 * @param *key  过期key对象(redisObject格式)
 * @param lazy  过期策略
 */
void propagateExpire(redisDb *db, robj *key, int lazy) {
    robj *argv[2];

    argv[0] = lazy ? shared.unlink : shared.del;
    argv[1] = key;
    // 引用计数 +1 
    incrRefCount(argv[0]);
    // 引用计数 +1 
    incrRefCount(argv[1]);

    // AOF未关闭
    if (server.aof_state != AOF_OFF)
        feedAppendOnlyFile(server.delCommand,db->id,argv,2);  // 把删除命令追加到AOF缓存 

    // 将删除操作同步给从节点
    replicationFeedSlaves(server.slaves,db->id,argv,2);

    // 引用计数 -1 
    decrRefCount(argv[0]);
    // 引用计数 -1 
    decrRefCount(argv[1]);
}

(4.1.2) 惰性删除-dbAsyncDelete

// file: src/evict.c

/*
*/
int freeMemoryIfNeeded(void) {

    // 省略部分代码

            // 是否惰性删除
            if (server.lazyfree_lazy_eviction)
                dbAsyncDelete(db,keyobj);  // 异步删除
            else
                dbSyncDelete(db,keyobj);  // 同步删除

    // 省略部分代码

}

(4.2) 数据异步删除-dbAsyncDelete

// file: src/lazyfree.c

// 从数据库中删除key、value 和 关联的过期entry(如果有)。
// 如果有足够的内存分配来释放值对象,则可以将其放入惰性释放列表而不是同步释放。 
// 惰性空闲列表将在不同的 bio.c 线程中回收。

#define LAZYFREE_THRESHOLD 64

/*
 * 异步删除
 * 
 * @param *db 
 * @param *key 
 */
int dbAsyncDelete(redisDb *db, robj *key) {

    // 从 expires 字典中删除entry(dictEntry)不会释放key的sds,因为它与主字典共享。 
    // 需要删2次,第一次删entry(dictEntry),第二次删key
    if (dictSize(db->expires) > 0) dictDelete(db->expires,key->ptr);

    // 如果该值由一些allocations组成,以惰性方式释放实际上会更慢...所以在一定限制下我们只是同步释放对象。 

    // 从字典里删除key
    dictEntry *de = dictUnlink(db->dict,key->ptr);
    // 如果节点不为空
    if (de) {
        // 获取节点的值
        robj *val = dictGetVal(de);
        // 
        size_t free_effort = lazyfreeGetFreeEffort(val);

        /* If releasing the object is too much work, do it in the background
         * by adding the object to the lazy free list.
         * Note that if the object is shared, to reclaim it now it is not
         * possible. This rarely happens, however sometimes the implementation
         * of parts of the Redis core may call incrRefCount() to protect
         * objects, and then call dbDelete(). In this case we'll fall
         * through and reach the dictFreeUnlinkedEntry() call, that will be
         * equivalent to just calling decrRefCount(). */
        if (free_effort > LAZYFREE_THRESHOLD && val->refcount == 1) {
            atomicIncr(lazyfree_objects,1);
            bioCreateBackgroundJob(BIO_LAZY_FREE,val,NULL,NULL);
            dictSetVal(db->dict,de,NULL);
        }
    }

    /* Release the key-val pair, or just the key if we set the val
     * field to NULL in order to lazy free it later. */
    if (de) {
        dictFreeUnlinkedEntry(db->dict,de);
        if (server.cluster_enabled) slotToKeyDel(key->ptr);
        return 1;
    } else {
        return 0;
    }
}

/* Return the amount of work needed in order to free an object.
 * The return value is not always the actual number of allocations the
 * object is composed of, but a number proportional to it.
 *
 * For strings the function always returns 1.
 *
 * For aggregated objects represented by hash tables or other data structures
 * the function just returns the number of elements the object is composed of.
 *
 * Objects composed of single allocations are always reported as having a
 * single item even if they are actually logical composed of multiple
 * elements.
 *
 * For lists the function returns the number of elements in the quicklist
 * representing the list. */
size_t lazyfreeGetFreeEffort(robj *obj) {
    if (obj->type == OBJ_LIST) {
        quicklist *ql = obj->ptr;
        return ql->len;
    } else if (obj->type == OBJ_SET && obj->encoding == OBJ_ENCODING_HT) {
        dict *ht = obj->ptr;
        return dictSize(ht);
    } else if (obj->type == OBJ_ZSET && obj->encoding == OBJ_ENCODING_SKIPLIST){
        zset *zs = obj->ptr;
        return zs->zsl->length;
    } else if (obj->type == OBJ_HASH && obj->encoding == OBJ_ENCODING_HT) {
        dict *ht = obj->ptr;
        return dictSize(ht);
    } else if (obj->type == OBJ_STREAM) {
        size_t effort = 0;
        stream *s = obj->ptr;

        /* Make a best effort estimate to maintain constant runtime. Every macro
         * node in the Stream is one allocation. */
        effort += s->rax->numnodes;

        /* Every consumer group is an allocation and so are the entries in its
         * PEL. We use size of the first group's PEL as an estimate for all
         * others. */
        if (s->cgroups && raxSize(s->cgroups)) {
            raxIterator ri;
            streamCG *cg;
            raxStart(&ri,s->cgroups);
            raxSeek(&ri,"^",NULL,0);
            /* There must be at least one group so the following should always
             * work. */
            serverAssert(raxNext(&ri));
            cg = ri.data;
            effort += raxSize(s->cgroups)*(1+raxSize(cg->pel));
            raxStop(&ri);
        }
        return effort;
    } else {
        return 1; /* Everything else is a single allocation. */
    }
}

参考资料

[1] Redis源码剖析与实战 - 17|Lazy Free会影响缓存替换吗?