Linux network packet receive path

Posted 2022-09-04 · Updated 2022-11-27 · Category: basic

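When a service you own starts timing out on the network, the usual first step is to check the monitoring platform, metrics, traces, and error logs to see what state the system is in and narrow down where the problem lies.

This post looks at a narrower question: if the problem is in how the service receives packets, how do you locate it?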


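Go performance profiling and troubleshooting production issues

Posted 2022-08-20 · Updated 2023-04-09 · Category: go

A doctor's thermometer, stethoscope, and blood pressure monitor each measure a different health indicator of the patient.

If engineers are the doctors, pprof is the equivalent toolkit: it can be used to analyze CPU usage, goroutine stack traces, memory allocation and usage, mutex contention and its cost, goroutine blocking, and more.

(Figure: flame graph)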

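(1) What pprof is

Profiling is done by instrumenting a program with a tool called a profiler; in Go, that tool is pprof.

pprof collects runtime data from a process so that its performance and health can be examined. The following profiles are available:

CPU: shows where the application spends its time
Goroutine: reports the stack traces of all current goroutines
Heap: reports heap allocations, to monitor current memory usage and check for possible memory leaks
Mutex: reports lock contention, to analyze how mutexes are used and whether the application spends too much time in locking calls
Block: shows where goroutines block waiting on synchronization primitives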

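(2) How to use pprof

pprof has to be imported into the program before it can be used.
This differs from Java, where jps, jstat, jinfo, jstack, and jmap are standalone tools.

pprof can be imported from either of two packages:

import "net/http/pprof"

import "runtime/pprof"

net/http/pprof wraps runtime/pprof and exposes the profiles over an HTTP port.
runtime/pprof can write a profile to a file, which is then analyzed with go tool pprof.

With net/http/pprof you can inspect the current state of a running web service directly, including CPU and memory usage.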

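(2.1) Using pprof from the browser

package main

import (
	"log"
	"net/http"

	// The blank import registers the /debug/pprof/ handlers on http.DefaultServeMux.
	_ "net/http/pprof"
)

func main() {
	// Start an HTTP server for pprof in the background.
	go func() {
		// ListenAndServe only returns on failure, so log the error and exit.
		if err := http.ListenAndServe(":6060", nil); err != nil {
			log.Fatal(err)
		}
	}()

	// Business code omitted.
}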

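Once net/http/pprof is imported, pprof is reachable at http://127.0.0.1:6060/debug/pprof/.
That page makes it easy to browse each profile in the browser.

The overhead is low enough that pprof is generally considered safe to leave enabled in production, provided the endpoint is not exposed to untrusted networks.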

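Purpose	Keyword	URL
A sampling of all past memory allocations	allocs	`http://127.0.0.1:6060/debug/pprof/allocs?debug=1`
Stack traces that led to blocking on synchronization primitives	block	`http://127.0.0.1:6060/debug/pprof/block?debug=1`
The command line invocation of the current program	cmdline	`http://127.0.0.1:6060/debug/pprof/cmdline`
Stack traces of all current goroutines	goroutine	`http://127.0.0.1:6060/debug/pprof/goroutine?debug=1`
A sampling of memory allocations of live objects	heap	`http://127.0.0.1:6060/debug/pprof/heap?debug=1`
Stack traces of holders of contended mutexes	mutex	`http://127.0.0.1:6060/debug/pprof/mutex?debug=1`
CPU profile (binary download, 30 seconds by default)	profile	`http://127.0.0.1:6060/debug/pprof/profile`
Stack traces that led to the creation of new OS threads	threadcreate	`http://127.0.0.1:6060/debug/pprof/threadcreate?debug=1`
Full stack dump of all goroutines	goroutine (debug=2)	`http://127.0.0.1:6060/debug/pprof/goroutine?debug=2`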

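(2.2) Analyzing a downloaded pprof file

(2.2.1) Download and analyze in one step

Running go tool pprof http://localhost:6060/debug/pprof/profile downloads the profile to the local machine and opens it for analysis.
By default the file is saved under the current user's home directory, for example ~/pprof/pprof.samples.cpu.001.pb.gz.

In other words, go tool pprof http://localhost:6060/debug/pprof/profile is equivalent to downloading the profile to a file and then running go tool pprof on that file.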

[weikeqin@bogon thrift-tutorial-go-demo (master)]$ go tool pprof http://localhost:6060/debug/pprof/profile 
Fetching profile over HTTP from http://localhost:6060/debug/pprof/profile
Saved profile in /Users/weikeqin/pprof/pprof.samples.cpu.001.pb.gz
Type: cpu
Time: Aug 20, 2022 at 8:37pm (CST)
Duration: 30s, Total samples = 2.37s ( 7.90%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)

(2.2.2) Download the data and analyze it offline

Download the file

[weikeqin@computer ~]$ curl -o /Users/weikeqin/pprof/go_profile.out  http://localhost:6060/debug/pprof/profile
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   107  100   107    0     0      3      0  0:00:35  0:00:30  0:00:05    22
[weikeqin@computer ~]$

Running curl -o /Users/weikeqin/pprof/go_profile.out http://localhost:6060/debug/pprof/profile saves the raw profile data to the file /Users/weikeqin/pprof/go_profile.out.

Analyze the file

[weikeqin@computer pprof]$ go tool pprof /Users/weikeqin/pprof/go_profile.out
Type: cpu
Time: Aug 20, 2022 at 3:39pm (CST)
Duration: 30.01s, Total samples = 0
No samples were found with the default sample value type.
Try "sample_index" command to analyze different sample values.
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)

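(3) Analyzing CPU usage

The CPU profile can be downloaded from http://127.0.0.1:6060/debug/pprof/profile?debug=1 and saved as a file such as go_pprof_profile.out.

(3.1) How CPU profiling works

The CPU profiler relies on the operating system and on signals.

When it is activated, the application asks the OS to interrupt it every 10 milliseconds by default, via the SIGPROF signal. When the application receives SIGPROF, it suspends the current activity and hands execution over to the profiler.

The profiler collects data such as the current goroutine activity and aggregates execution statistics that can be retrieved later; it then stops, and execution continues until the next SIGPROF.

Requesting the /debug/pprof/profile route activates CPU profiling. By default, a request to this route runs a 30-second CPU profile, during which the application is interrupted every 10 milliseconds.

Both defaults can be changed: the seconds parameter tells the route how long to profile (for example /debug/pprof/profile?seconds=15), and the interrupt rate can also be changed (even to less than 10 milliseconds).

In most cases 10 milliseconds is enough. When lowering the value (which raises the sampling frequency), take care not to hurt performance. Once the 30 seconds have elapsed, the CPU profile can be downloaded.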

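Note:
The CPU profiler can also be enabled with the -cpuprofile flag, for example when running benchmarks.

For example, a command such as go test -bench=. -cpuprofile profile.out writes the same kind of profile file that can be downloaded from /debug/pprof/profile.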

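(3.2) Analyzing CPU figures from the command line

If the program calls http.ListenAndServe(":6060", nil), the profile file can be fetched directly from http://127.0.0.1:6060/debug/pprof/profile?debug=1.

With the profile file in hand, run go tool pprof go_profile.out to analyze CPU usage.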

(pprof) 
(pprof) help 
  Commands:
    callgrind        Outputs a graph in callgrind format
    comments         Output all profile comments
    disasm           Output assembly listings annotated with samples
    dot              Outputs a graph in DOT format
    eog              Visualize graph through eog
    evince           Visualize graph through evince
    gif              Outputs a graph image in GIF format
    gv               Visualize graph through gv
    kcachegrind      Visualize report in KCachegrind
    list             Output annotated source for functions matching regexp
    pdf              Outputs a graph in PDF format
    peek             Output callers/callees of functions matching regexp
    png              Outputs a graph image in PNG format
    proto            Outputs the profile in compressed protobuf format
    ps               Outputs a graph in PS format
    raw              Outputs a text representation of the raw profile
    svg              Outputs a graph in SVG format
    tags             Outputs all tags in the profile
    text             Outputs top entries in text form
    top              Outputs top entries in text form
    topproto         Outputs top entries in compressed protobuf format
    traces           Outputs all profile samples in text form
    tree             Outputs a text rendering of call graph
    web              Visualize graph through web browser
    weblist          Display annotated source in a web browser
    o/options        List options and their current values
    q/quit/exit/^D   Exit pprof

  Options:
    call_tree        Create a context-sensitive call tree
    compact_labels   Show minimal headers
    divide_by        Ratio to divide all samples before visualization
    drop_negative    Ignore negative differences
    edgefraction     Hide edges below <f>*total
    focus            Restricts to samples going through a node matching regexp
    hide             Skips nodes matching regexp
    ignore           Skips paths going through any nodes matching regexp
    intel_syntax     Show assembly in Intel syntax
    mean             Average sample value over first value (count)
    nodecount        Max number of nodes to show
    nodefraction     Hide nodes below <f>*total
    noinlines        Ignore inlines.
    normalize        Scales profile based on the base profile.
    output           Output filename for file-based outputs
    prune_from       Drops any functions below the matched frame.
    relative_percentages Show percentages relative to focused subgraph
    sample_index     Sample value to report (0-based index or name)
    show             Only show nodes matching regexp
    show_from        Drops functions above the highest matched frame.
    source_path      Search path for source files
    tagfocus         Restricts to samples with tags in range or matched by regexp
    taghide          Skip tags matching this regexp
    tagignore        Discard samples with tags in range or matched by regexp
    tagleaf          Adds pseudo stack frames for labels key/value pairs at the callstack leaf.
    tagroot          Adds pseudo stack frames for labels key/value pairs at the callstack root.
    tagshow          Only consider tags matching this regexp
    trim             Honor nodefraction/edgefraction/nodecount defaults
    trim_path        Path to trim from source paths before search
    unit             Measurement units to display

  Option groups (only set one per group):
    granularity      
      functions        Aggregate at the function level.
      filefunctions    Aggregate at the function level.
      files            Aggregate at the file level.
      lines            Aggregate at the source code line level.
      addresses        Aggregate at the address level.
    sort             
      cum              Sort entries based on cumulative weight
      flat             Sort entries based on own weight
  :   Clear focus/ignore/hide/tagfocus/tagignore

  type "help <cmd|option>" for more information
(pprof) 
(pprof)

(3.2.1) Generating the call graph

(pprof)  web
(pprof)

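This generates an SVG image, saves it to file:///private/var/folders/ry/j30lp1sn19q7rbtq6020f6880000gn/T/pprof001.svg, and opens it automatically.

(Figure: call graph)

Each box is a function, and the size of a box is proportional to its execution time. Arrows show call relationships, and the time on an arrow is the execution time of the called function.

(3.2.2) Viewing the top consumers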

(pprof) top
Showing nodes accounting for 5.75s, 94.73% of 6.07s total
Dropped 118 nodes (cum <= 0.03s)
Showing top 10 nodes out of 94
      flat  flat%   sum%        cum   cum%
     3.46s 57.00% 57.00%      3.47s 57.17%  syscall.syscall6
     1.34s 22.08% 79.08%      1.34s 22.08%  runtime.kevent
     0.30s  4.94% 84.02%      0.30s  4.94%  syscall.syscall
     0.16s  2.64% 86.66%      0.16s  2.64%  runtime.siftdownTimer
     0.12s  1.98% 88.63%      0.13s  2.14%  runtime.chansend
     0.12s  1.98% 90.61%      0.14s  2.31%  runtime.walltime
     0.10s  1.65% 92.26%      0.10s  1.65%  runtime.madvise
     0.08s  1.32% 93.57%      0.08s  1.32%  runtime.memclrNoHeapPointers
     0.06s  0.99% 94.56%      0.06s  0.99%  runtime.nanotime1
     0.01s  0.16% 94.73%      0.16s  2.64%  runtime.mallocgc
(pprof)

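(3.3) Visualizing the CPU profile in a web page

If the numbers printed by go tool pprof go_profile.out are hard to read, the same profile can be viewed in a web UI.

This simply starts an HTTP server that renders go_profile.out visually.

For example, suppose go_profile.out is at /Users/weikeqin/WorkSpaces/golang/go_profile.out: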

[weikeqin@computer ~]$ go tool pprof -http=:8080 /Users/weikeqin/WorkSpaces/golang/go_profile.out 
Serving web UI on http://localhost:8080

(3.3.1) Call graph (graph)

(3.3.2) Flame graph (flamegraph)

(3.3.3) top

(Figure: top CPU consumers)

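(3.4) What the analysis can tell us

Too many calls to runtime.mallocgc suggest trying to reduce the number of small heap allocations.
Too much time spent in channel operations or mutex locks may indicate that excessive contention is hurting the application's performance.
Too much time spent in syscall.Read or syscall.Write means the application spends a lot of time in kernel mode; buffering I/O may be a way to improve this.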

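(4) Analyzing memory: heap usage

Heap usage can be viewed at http://127.0.0.1:6060/debug/pprof/heap?debug=1.

(4.1) How heap profiling works

Like CPU profiling, heap profiling is sample-based.

(4.2) Reading the heap figures

http://127.0.0.1:6060/debug/pprof/heap?debug=1 returns the detailed heap figures as text.

(4.3) Analyzing with the tool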

[weikeqin@computer pprof]$ go tool pprof /Users/weikeqin/pprof/go_pprof_heap.out
Type: inuse_space
Time: Aug 20, 2022 at 4:29pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)

(pprof)  svg
Generating report in profile001.svg
(pprof)

This generates an SVG that shows heap usage.

(4.4) Visualizing in a web page

Download the file from http://127.0.0.1:6060/debug/pprof/heap?debug=0; the downloaded file can be renamed.

[weikeqin@bogon ~]$ go tool pprof -http=:8082 /Users/weikeqin/pprof/go_pprof_heap.out
Serving web UI on http://localhost:8082

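alloc_objects: total number of objects allocated
alloc_space: total amount of memory allocated
inuse_objects: number of objects allocated and not yet released
inuse_space: amount of memory allocated and not yet released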

(5) Problems encountered

(5.1) Could not execute dot; may need to install graphviz.

[weikeqin@bogon thrift-tutorial-go-demo (master)]$ go tool pprof -http=:8080 /Users/weikeqin/WorkSpaces/golang/go_profile.out 
Serving web UI on http://localhost:8080
Failed to execute dot. Is Graphviz installed?
exec: "dot": executable file not found in $PATH
Failed to execute dot. Is Graphviz installed?
exec: "dot": executable file not found in $PATH

Download and install Graphviz from https://www.graphviz.org/download/.

RocksDB index model

Posted 2022-07-30 · Updated 2023-04-10 · Category: storage

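(1) What RocksDB is

RocksDB is a high-performance persistent key-value store open-sourced by Facebook.

RocksDB is a key-value storage engine built on the LSM tree, designed for high performance, scalability, and reliability.
Its core idea is to divide data into multiple levels: recent writes are held in an in-memory structure, and older data is stored in sorted files on disk.

A growing number of newer databases have chosen RocksDB as their storage engine.
CockroachDB, for example, originally used RocksDB as its storage engine (it has since switched to Pebble);
the MyRocks project uses RocksDB as a storage engine for MySQL, with the goal of replacing the existing InnoDB storage engine.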

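(2) Why use RocksDB

In short, there are two main reasons: (1) persistence, and (2) disk storage is cheaper when the data set is large.

Compare the two key-value stores, RocksDB and Redis.

Redis is an in-memory database; according to the officially published benchmark figures, its random read/write performance is roughly 500,000 operations per second.
RocksDB stores its data on disk, and its random read/write performance is roughly 200,000 operations per second.

That is slower than Redis, but within the same order of magnitude.

Redis is an in-memory database and is not a reliable store: a write counts as successful once it reaches memory, with no guarantee that it has been safely saved to disk.
RocksDB is a persistent key-value store, and it has to make sure every record is safely written to disk.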

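(3) The RocksDB index model

RocksDB uses a fairly complex storage structure that mixes memory and disk: disk provides durable storage, and faster memory is used to improve read and write performance. This structure is the LSM-Tree.

(3.1) Data structures used by RocksDB

LSM-Tree stands for Log-Structured Merge-Tree. It is a composite data structure made up of a WAL (Write Ahead Log), a skip list (SkipList), and a set of leveled sorted tables (SSTable, Sorted String Table).

A paper on an LSM-Tree based key-value store, An Efficient Design and Implementation of LSM-Tree based Key-Value Store on Open-Channel SSD: https://ranger.uta.edu/~sjiang/pubs/papers/wang14-LSM-SDF.pdf

WAL: turns writes into sequential disk writes
Skip list: O(log N) lookups
Leveled sorted tables: keep the data on disk sorted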

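The LSM-Tree (Log-Structured Merge Tree) is a disk-based data structure optimized for write-heavy workloads.
It consists of multiple levels, each made up of sorted runs of key-value pairs.
When a write arrives, the new key-value pair is added to an in-memory buffer. Once the buffer reaches a certain size, it is flushed to disk as a new sorted run.
When the number of sorted runs in a level exceeds a threshold, they are merged into a new sorted run in the next level.

This merge process is where the LSM-Tree gets its efficiency. By merging whole sorted runs rather than individual key-value pairs, it keeps disk writes sequential and limits the number of runs, and therefore disk seeks, that a read of a particular key has to touch.
In addition, keeping the in-memory buffer relatively small limits the memory needed to maintain the structure.

In an implementation, a SkipList can serve as the in-memory buffer: its Insert method adds new entries, and FindGreaterOrEqual searches efficiently for a particular key. Merging is typically done by walking the sorted runs with iterators and combining them in key order.

Overall, the LSM-Tree is well suited to write-heavy workloads. By combining a SkipList with sorted runs, it offers efficient writes and reads while limiting the memory needed to maintain the structure.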

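(3.2) Inserting data

Suppose the LSM-Tree receives a write request such as PUT foo bar, which sets the value of key foo to bar.
1. The operation is appended to the WAL on disk (the Log on the right of the original figure).
2. The data is written to the in-memory MemTable, and the write is reported as successful.
3. When the MemTable fills up (write_buffer_size, 64 MB by default in RocksDB), it is turned into an Immutable MemTable, and a new empty MemTable is created to accept further writes.
4. Background threads keep flushing Immutable MemTables to files on disk and then free the memory. (At this point the keys inside each file are sorted, but the files themselves are not ordered relative to one another.)
5. Leveled compaction makes the files ordered. (The exception is level 0, the level that holds the files dumped directly from MemTables.)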
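The first four steps can be modeled in a few lines of Go. This is a toy sketch under loud assumptions: toyLSM and all of its fields are hypothetical, everything lives in memory, a plain map stands in for the skip list, and the flush threshold counts keys instead of bytes. It is not how RocksDB itself is implemented.

```go
package main

import (
	"fmt"
	"sort"
)

type entry struct{ key, value string }

// toyLSM is a hypothetical in-memory model of the write path.
type toyLSM struct {
	wal      []string          // stands in for the on-disk write-ahead log
	mem      map[string]string // stands in for the skip-list MemTable
	memLimit int               // flush threshold, in number of keys
	level0   [][]entry         // each flushed MemTable becomes one sorted run
}

func newToyLSM(limit int) *toyLSM {
	return &toyLSM{mem: map[string]string{}, memLimit: limit}
}

// Put logs the operation first, then writes the MemTable, then flushes when full.
func (t *toyLSM) Put(key, value string) {
	t.wal = append(t.wal, "PUT "+key+" "+value) // step 1: append to the WAL
	t.mem[key] = value                          // step 2: write the MemTable
	if len(t.mem) >= t.memLimit {               // step 3: the MemTable is full
		t.flush()
	}
}

// flush turns the full MemTable into one sorted run in level 0 (step 4).
func (t *toyLSM) flush() {
	run := make([]entry, 0, len(t.mem))
	for k, v := range t.mem {
		run = append(run, entry{k, v})
	}
	sort.Slice(run, func(i, j int) bool { return run[i].key < run[j].key })
	t.level0 = append(t.level0, run)
	t.mem = map[string]string{} // start a fresh, empty MemTable
	t.wal = nil                 // flushed data no longer needs the log
}

func main() {
	db := newToyLSM(2)
	db.Put("foo", "bar")
	db.Put("abc", "1") // the second key reaches the limit and triggers a flush
	db.Put("zzz", "2")
	fmt.Println("level-0 runs:", len(db.level0), "memtable keys:", len(db.mem))
}
```

Note that Put never checks whether the key already exists; newer entries simply shadow older ones until compaction merges them.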

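(3.2.1) What the WAL is for

Writing the WAL is a sequential disk write, which performs well.
The WAL is used for crash recovery: if the system goes down, data that was in memory but had not yet been written to disk can be recovered from the log.
In this way the WAL addresses data durability.

(3.2.2) What the MemTable is for

The MemTable is a skip list (SkipList) ordered by key, with O(log N) lookups.
Writing to the MemTable is an in-memory operation, so it is also very fast.

A skip list has lookup performance similar to a balanced tree but is simpler to implement.

One point to note: when handling a write, the LSM-Tree writes straight into the MemTable without checking whether the key already exists.

Flushing a MemTable to an SSTable writes a whole block of memory out to a whole file, so it is also a sequential write.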

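(3.2.3) Leveled compaction

SSTables are organized into many levels: the higher the level, the fewer the files; the lower the level, the more files.
Each level has a fixed capacity limit; typically each level is 10 times the size of the one above it.
When a level fills up, a background thread merges its data into the next level; once merged, the SSTable files in the current level can be deleted.
Merging is also sorting: except for Level 0, the files within each level are ordered relative to one another, and the key-value pairs inside each file are sorted.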
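The core of a merge step is an ordinary merge of two sorted sequences in which the newer entry wins on a duplicate key. The sketch below shows only that idea; the kv type and mergeRuns function are hypothetical, and real compaction also deals with files, tombstones, and more than two inputs.

```go
package main

import "fmt"

type kv struct{ key, value string }

// mergeRuns merges two key-sorted runs into one sorted run.
// When both runs contain the same key, the entry from newer wins.
func mergeRuns(newer, older []kv) []kv {
	out := make([]kv, 0, len(newer)+len(older))
	i, j := 0, 0
	for i < len(newer) && j < len(older) {
		switch {
		case newer[i].key < older[j].key:
			out = append(out, newer[i])
			i++
		case newer[i].key > older[j].key:
			out = append(out, older[j])
			j++
		default: // same key: keep the newer value and drop the older one
			out = append(out, newer[i])
			i++
			j++
		}
	}
	out = append(out, newer[i:]...)
	out = append(out, older[j:]...)
	return out
}

func main() {
	newer := []kv{{"a", "2"}, {"c", "9"}}
	older := []kv{{"a", "1"}, {"b", "5"}}
	fmt.Println(mergeRuns(newer, older)) // prints [{a 2} {b 5} {c 9}]
}
```

Because both inputs are already sorted, the merge reads each run once from front to back, which is what keeps compaction I/O sequential.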

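(3.3) Updating data

Updates take the same path as inserts.

(3.4) Deleting data

To delete data, the LSM-Tree attaches a deletion flag to the record. Queries check the flag to decide whether the record has been deleted, and compaction merges data according to it. The cost is some wasted space until compaction runs.

This is mark-as-deleted, using the notion of a tombstone. If the deleted entry is in the MemTable, it can be marked deleted with a tombstone directly; if it is not, a new deletion record is inserted. In both cases the deletion actually takes effect during level compaction, and the WAL records the delete as one more log entry.

The WAL is essentially a stream of operations, the same concept as the log in Raft, which is why Raft pairs so naturally with LevelDB/RocksDB.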

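(3.5) Querying data

A query checks the tables one by one in order, memory first and then disk.
Level 0 is unordered, so it is usually kept to just a few files. Within Level 0, files are searched in reverse order of creation, from newest to oldest.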
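The newest-to-oldest lookup order is also what makes tombstones work: the first table that knows about a key decides the answer. Below is a toy sketch with hypothetical names (record, table, get); plain maps stand in for the MemTable and SSTables.

```go
package main

import "fmt"

// record is either a value or a tombstone (deleted == true).
type record struct {
	value   string
	deleted bool
}

// table stands in for a MemTable or an SSTable; a map keeps the sketch short.
type table map[string]record

// get searches tables from newest to oldest and stops at the first hit.
// A tombstone counts as a hit, so older values stay hidden.
func get(tables []table, key string) (string, bool) {
	for _, t := range tables {
		if r, ok := t[key]; ok {
			if r.deleted {
				return "", false
			}
			return r.value, true
		}
	}
	return "", false
}

func main() {
	memtable := table{"foo": {deleted: true}}                 // newest: foo was deleted
	level0 := table{"foo": {value: "bar"}, "k": {value: "v"}} // older data on disk
	tables := []table{memtable, level0}                       // ordered newest first

	_, found := get(tables, "foo")
	fmt.Println("foo found:", found)
	v, _ := get(tables, "k")
	fmt.Println("k =", v)
}
```

The old value of foo still occupies space in the older table until compaction drops it, which is the space cost of tombstone-based deletion.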

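(3.6) Other notes

On write, data first goes into the in-memory MemTable; once size and count limits are reached, the MemTable is converted into an SSTable file and written to disk.
On read, RocksDB first looks in the in-memory MemTable and, if the key is not found there, in the SSTable files on disk.

To speed up lookups, RocksDB also uses data structures such as Bloom filters and skip lists.

RocksDB also supports several compression algorithms to reduce disk usage.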

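(4) Reading the RocksDB index model source

(4.1) Source overview

Source: https://github.com/facebook/rocksdb/tree/v8.0.0/memtable

The main source files for the RocksDB index model cover:
The Memtable structure
The SkipList data structure
The leveled sorted tables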

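(4.2) Skip list (SkipList) source walkthrough

SkipList is a class template with two template parameters, Key and Comparator. Key is the type of the keys stored in the list, and Comparator is the comparison function used to order them.

The class provides methods to insert, search, and iterate over its entries.
Insert adds a new key to the list;
Contains checks whether a given key is present;
the Iterator class iterates over the entries in sorted order.

On thread safety: writes require external synchronization, typically a mutex.
Reads, on the other hand, need no internal locking or synchronization, but the caller must guarantee that the SkipList is not destroyed while a read is in progress.

Overall, SkipList is a high-performance structure that supports concurrent readers alongside a single synchronized writer, and RocksDB uses it to store and look up entries.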

(4.2.1) Skip list structure

https://github.com/facebook/rocksdb/blob/v8.0.0/memtable/skiplist.h#L170

// filepath: memtable/skiplist.h

// SkipList class template, parameterized by Key and Comparator.
template <typename Key, class Comparator>
class SkipList {
 private:
  // Node type, defined separately.
  struct Node;

 private:
  // Maximum height of the list.
  const uint16_t kMaxHeight_;
  // Branching factor.
  const uint16_t kBranching_;
  const uint32_t kScaledInverseBranching_;

  // Immutable after construction.
  Comparator const compare_;
  // Allocator used for node allocations.
  Allocator* const allocator_;

  // Head node.
  Node* const head_;

  // Height of the entire list.
  // Modified only by Insert(). Readers may see a stale value, which is acceptable.
  std::atomic<int> max_height_;

  // Used for optimizing sequential insert patterns. Tricky.
  // prev_[i] for i up to max_height_ is the predecessor of prev_[0],
  // and prev_height_ is the height of prev_[0].
  // prev_[0] can only be equal to head_ before insertion,
  // in which case max_height_ and prev_height_ are 1.
  Node** prev_;
  int32_t prev_height_;

 public:
  // Takes a comparator and an allocator, plus defaults for the
  // maximum height and the branching factor.
  explicit SkipList(Comparator cmp, Allocator* allocator,
                    int32_t max_height = 12, int32_t branching_factor = 4);
};

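(4.2.2) Skip list node

// filepath: memtable/skiplist.h

// Skip list node.
template <typename Key, class Comparator>
struct SkipList<Key, Comparator>::Node {
  explicit Node(const Key& k) : key(k) {}

  // Key stored in this node.
  Key const key;

 private:
  // Successor pointers: next_[i] is the next node at level i, so each level
  // can point to a different node. Like a linked list's next pointer, but
  // with one pointer per level.
  // std::atomic makes access to each pointer atomic for thread safety,
  // similar to the atomic types in Java.
  std::atomic<Node*> next_[1];
};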

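(4.3) Memtable structure

The MemTable is an in-memory data structure that holds recently written key-value pairs. When it reaches a certain size, it is converted into an SSTable and written to disk. An SSTable is a static, read-only data structure that RocksDB uses as its persistent storage.

RocksDB supports several data structures for the Memtable; the default is the SkipList, that is, a skip list.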


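thrift TProcessor

Posted 2022-07-16 · Updated 2022-09-18 · Category: rpc

Have you ever wondered how, when a caller invokes a method, the server finds the matching method and handles it?

Many RPC frameworks do this with reflection, but thrift does it with polymorphism.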
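The idea can be sketched in Go as a map from method name to a handler that implements a common interface. Everything in this example is hypothetical and simplified (processorFunction, addProcessor, and the int-slice arguments are made up for illustration); real thrift handlers read their arguments from a protocol object instead.

```go
package main

import (
	"errors"
	"fmt"
)

// processorFunction is a hypothetical stand-in for the per-method handler
// type; each RPC method gets its own implementation.
type processorFunction interface {
	Process(args []int) (int, error)
}

// addProcessor handles the "add" method.
type addProcessor struct{}

func (addProcessor) Process(args []int) (int, error) {
	if len(args) != 2 {
		return 0, errors.New("add expects 2 arguments")
	}
	return args[0] + args[1], nil
}

// processor looks up the handler by method name and calls it through the
// interface, so no reflection is involved.
type processor struct {
	handlers map[string]processorFunction
}

func (p *processor) Process(method string, args []int) (int, error) {
	h, ok := p.handlers[method]
	if !ok {
		return 0, fmt.Errorf("unknown method %q", method)
	}
	return h.Process(args)
}

func main() {
	p := &processor{handlers: map[string]processorFunction{"add": addProcessor{}}}
	sum, err := p.Process("add", []int{1, 2})
	fmt.Println(sum, err) // prints 3 <nil>
}
```

Dispatch is a map lookup followed by an interface call, so adding a method means registering one more handler rather than inspecting types at run time.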


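thrift Server

Posted 2022-07-10 · Updated 2022-11-27 · Category: rpc

(1) What the thrift Server does

The Server ties all of thrift's pieces together:

1. Create a Transport;
2. Create the I/O Protocol used by the Transport;
3. Create a Processor for the I/O Protocol;
4. Start the service and wait for client connections.

The server models available differ between thrift's language implementations.

The Java version of thrift offers several server models: TSimpleServer, TThreadPoolServer, TNonblockingServer, THsHaServer, and TThreadedSelectorServer.

The Go version of thrift offers TSimpleServer.

I/O model	Java	Go	Characteristics
Blocking I/O	TSimpleServer	-	A single worker thread loops, listening for new requests and handling them. It can accept and process only one socket connection at a time, so it is fairly inefficient.
Blocking I/O	TThreadPoolServer	TSimpleServer
I/O multiplexing	TNonblockingServer	-	TNonblockingServer works on a single thread and uses NIO; all sockets are registered with a selector.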


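thrift client

Posted 2022-07-10 · Updated 2022-11-27 · Category: rpc

(1) Design

TClient is the client interface.
TStandardClient and WrappedTClient implement the Call method of the TClient interface.

(2) demo

https://github.com/weikeqin/thrift-tutorial-go-demo

Take a client call to the add method as an example.

Code that calls the downstream Add() method: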

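func main() {
	// Create the thrift client (getThriftClient is not shown in this excerpt).
	thriftClient := getThriftClient()
	// Create calculatorClientProxy, a static proxy for every method defined in the IDL.
	// The tutorial package is generated from the IDL.
	calculatorClientProxy := tutorial.NewCalculatorClient(thriftClient)
	// Call Add and check the error instead of discarding it.
	sum, err := calculatorClientProxy.Add(defaultCtx, 1, 2)
	if err != nil {
		log.Fatalf("Add failed: %v", err)
	}
	fmt.Printf("1+2=%v\n", sum)
}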


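thrift Transport

Posted 2022-07-02 · Updated 2022-07-17 · Category: rpc

(1) What the Transport does

The Transport is the entry point through which the RPC framework receives messages. It provides the low-level implementations such as socket creation, reads and writes, and accepting connections.
It also implements the various wrapping transport layers, including http, framed, buffered, and compressed transport.

(1.1) Supported transports

thrift supports many transports.

Transport	Characteristics
TSocket	Blocking socket, used on the client side; reads and writes data with the read and write system calls.
TServerSocket	Non-blocking socket, used on the server side; every socket it accepts is a TSocket (that is, a blocking socket).
TBufferedTransport
TFramedTransport
TMemoryBuffer
TFileTransport
TFDTransport
TSimpleFileTransport
TZlibTransport
TSSLSocket
TSSLServerSocket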


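thrift architecture

Posted 2022-06-26 · Updated 2022-07-30 · Category: rpc

Thrift is a lightweight, multi-language, extensible, high-performance remote service invocation framework.
It provides clean abstractions for data transport, serialization, and application-level processing.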

The Apache Thrift software framework, for scalable cross-language services development, combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml and Delphi and other languages.

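(1) thrift architecture layers

(Figure: thrift-layers)

Layer	Responsibility
Service invocation layer (client/server)	Client and server calls.
Protocol layer	Message parsing.
Transport wrapper layer	Adds functionality; implements the wrapping transports, including http, framed, buffered, and compressed transport.
Low-level transport layer	Sits close to the network layer and is the entry point through which the RPC framework receives messages; provides the low-level implementations such as socket creation, reads and writes, and accepting connections.
Language layer	thrift uses an interface description language to define and create services, supporting extensible cross-language service development.
Operating system layer	The programming language provides support for the various operating systems.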