
Notes on using Telegraf to fix a high disk IO problem in InfluxDB v2

David跨境日记
2025-10-21
Overview: when an InfluxDB v2 bucket runs into high series cardinality and high disk IO, the fix is to optimize the application code and add Telegraf as a middleware layer. This article records a problem we hit at my company and how we resolved it.

I. The problem


A developer reported that queries against the production InfluxDB server through the web UI had become extremely slow.

The Alibaba Cloud disk IO monitoring chart (below) showed that disk read IO was constantly overloaded, peaking at an average of about 150 MB/s.

This heavy read IO affected the production bucket named jt: writes were delayed and some data was lost, which seriously hurt data completeness and availability.


II. Analysis


1. Application logs

The bucket with delayed and lost writes is named jt, and the writing client is a program written in Go. Its logs (below) were full of write errors.

After stopping the Go program, disk IO load immediately returned to normal,

which confirmed that the problem lay either in the program or in the jt bucket itself.

{"level":"error","timestamp":"2025-10-08T00:12:55.659+0800","caller":"jtdata/jtdata.go:79","msg":"internal error: unexpected error writing points to database: timeout"}{"level":"error","timestamp":"2025-10-08T00:13:10.292+0800","caller":"jtdata/jtdata.go:79","msg":"internal error: unexpected error writing points to database: timeout"}{"level":"error","timestamp":"2025-10-08T00:13:25.992+0800","caller":"jtdata/jtdata.go:79","msg":"internal error: unexpected error writing points to database: timeout"}{"level":"error","timestamp":"2025-10-08T00:13:42.502+0800","caller":"jtdata/jtdata.go:79","msg":"internal error: unexpected error writing points to database: timeout"}{"level":"error","timestamp":"2025-10-08T00:14:00.116+0800","caller":"jtdata/jtdata.go:79","msg":"internal error: unexpected error writing points to database: timeout"}{"level":"error","timestamp":"2025-10-08T00:14:15.766+0800","caller":"jtdata/jtdata.go:79","msg":"internal error: unexpected error writing points to database: timeout"}{"level":"error","timestamp":"2025-10-08T00:14:27.656+0800","caller":"jtdata/jtdata.go:79","msg":"internal error: unexpected error writing points to database: timeout"}{"level":"error","timestamp":"2025-10-08T00:14:39.615+0800","caller":"jtdata/jtdata.go:79","msg":"internal error: unexpected error writing points to database: timeout"}{"level":"error","timestamp":"2025-10-08T00:14:50.310+0800","caller":"jtdata/jtdata.go:79","msg":"internal error: unexpected error writing points to database: timeout"}{"level":"error","timestamp":"2025-10-08T16:25:26.512+0800","caller":"jtdata/jtdata.go:79","msg":"internal error: unexpected error writing points to database: timeout"}{"level":"error","timestamp":"2025-10-08T16:31:58.866+0800","caller":"jtdata/jtdata.go:79","msg":"internal error: unexpected error writing points to database: timeout"}{"level":"error","timestamp":"2025-10-08T16:32:09.139+0800","caller":"jtdata/jtdata.go:79","msg":"internal error: unexpected error writing points to database: timeout"}{"level":"error","timestamp":"2025-10-08T16:35:59.374+0800","caller":"jtdata/jtdata.go:79","msg":"internal error: unexpected error writing points to database: timeout"}{"level":"error","timestamp":"2025-10-08T16:36:09.923+0800","caller":"jtdata/jtdata.go:79","msg":"internal error: unexpected error writing points to database: timeout"}{"level":"error","timestamp":"2025-10-08T16:36:20.206+0800","caller":"jtdata/jtdata.go:79","msg":"internal error: unexpected error writing points to database: timeout"}{"level":"error","timestamp":"2025-10-08T16:36:30.885+0800","caller":"jtdata/jtdata.go:79","msg":"internal error: unexpected error writing points to database: timeout"}{"level":"error","timestamp":"2025-10-09T00:10:16.687+0800","caller":"jtdata/jtdata.go:79","msg":"internal error: unexpected error writing points to database: timeout"}{"level":"error","timestamp":"2025-10-09T00:10:28.356+0800","caller":"jtdata/jtdata.go:79","msg":"internal error: unexpected error writing points to database: timeout"}{"level":"error","timestamp":"2025-10-09T00:10:44.493+0800","caller":"jtdata/jtdata.go:79","msg":"internal error: unexpected error writing points to database: timeout"}{"level":"error","timestamp":"2025-10-09T00:10:58.603+0800","caller":"jtdata/jtdata.go:79","msg":"internal error: unexpected error writing points to database: timeout"}{"level":"error","timestamp":"2025-10-09T00:11:12.604+0800","caller":"jtdata/jtdata.go:79","msg":"internal error: unexpected error writing points to database: 
timeout"}{"level":"error","timestamp":"2025-10-09T00:11:29.668+0800","caller":"jtdata/jtdata.go:79","msg":"internal error: unexpected error writing points to database: timeout"}{"level":"error","timestamp":"2025-10-09T00:11:50.172+0800","caller":"jtdata/jtdata.go:79","msg":"Post \"http://localhost:8086/api/v2/write?bucket=jt&org=GTEX&precision=ns\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}

2. Series cardinality and the harm of high cardinality

2.1 What is series cardinality?

First, it helps to understand exactly what a series is.

In InfluxDB, a series is uniquely identified by the combination of a measurement, a tag set (the full set of tag key-value pairs), and a field key.

Series cardinality is simply the number of unique series in the database.

For example, suppose you have a measurement named weather_data that stores weather information.

Tags: city, sensor_id

Fields: temperature, humidity

If you have:

3 cities: beijing, shanghai, guangzhou

2 sensors per city: sensor_1, sensor_2

2 fields: temperature, humidity

Then the total series count works out as follows:

A series is determined by measurement + tag set + field.

For the temperature field there are 3 cities * 2 sensors = 6 possible tag combinations.

For the humidity field there are likewise 3 cities * 2 sensors = 6.

Total series cardinality = 6 (temperature) + 6 (humidity) = 12, which is a very healthy number.
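To make the counting concrete, here is a minimal Go sketch (my own illustration, not code from the article) that enumerates every measurement + tag set + field combination for this example and prints the resulting series cardinality:

```go
package main

import "fmt"

func main() {
	cities := []string{"beijing", "shanghai", "guangzhou"}
	sensors := []string{"sensor_1", "sensor_2"}
	fields := []string{"temperature", "humidity"}

	// A series is the unique combination of measurement, tag set, and field.
	count := 0
	for _, city := range cities {
		for _, sensor := range sensors {
			for _, field := range fields {
				fmt.Printf("weather_data,city=%s,sensor_id=%s field=%s\n", city, sensor, field)
				count++
			}
		}
	}
	fmt.Println("series cardinality:", count) // prints 12
}
```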

2.2 How high cardinality arises and why it is harmful

How does high cardinality arise? It usually comes from turning a dimension with a huge number of unique values into a tag.

Continuing the example above, if you mistakenly make something like request_id (different on every request) or user_id (a huge number of users) a tag, cardinality explodes.

Suppose 100,000 distinct users each report temperature data:

Series count = 100,000 users * 1 field = 100,000

With a more complex schema, cardinality easily reaches millions or even tens of millions.
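As an illustration of where the mistake happens (a sketch of my own, assuming the official influxdb-client-go v2 library, not code from the incident), the difference is simply which map the high-cardinality value ends up in:

```go
package main

import (
	"time"

	influxdb2 "github.com/influxdata/influxdb-client-go/v2"
)

func main() {
	userID := "u-102938" // hypothetical value

	// Anti-pattern: user_id as a tag creates one new series per user.
	bad := influxdb2.NewPoint("weather_data",
		map[string]string{"city": "beijing", "user_id": userID}, // tag set grows with every user
		map[string]interface{}{"temperature": 21.5},
		time.Now())

	// Better: keep user_id as a field, so the series key stays bounded by city.
	good := influxdb2.NewPoint("weather_data",
		map[string]string{"city": "beijing"},
		map[string]interface{}{"temperature": 21.5, "user_id": userID},
		time.Now())

	_ = bad
	_ = good
}
```

The trade-off is that fields are not indexed, so filtering by user_id becomes a scan; that is usually still preferable to an exploding series index.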

Why high cardinality is harmful: high cardinality is lethal for InfluxDB because it directly stresses the core storage component, the TSM (Time-Structured Merge Tree) engine.

Huge memory consumption: every unique series has an index entry in memory. The more series, the more memory is used, eventually causing OOM (out-of-memory) errors and process crashes.

Sharply degraded query performance: queries must scan a very large series index and become extremely slow, especially with GROUP BY or WHERE conditions on tags.

Increased disk I/O: data is spread across many more small files, so compression and compaction become less efficient and take longer, causing sustained high I/O pressure.

Slower writes: every write has to update the huge in-memory index, which drives up write latency.

Investigation showed that the biggest contributor to the jt bucket's high cardinality is its schema:

the application defines 3 tags: DeviceID, SensorID, Manufactor,

and 8 fields: acc_x, acc_z, pressure, rssi, source, status, temperature, voltage.

DeviceID has roughly 5,000 distinct values, SensorID roughly 600, and Manufactor roughly 3.

Estimate: 5,000 × 600 × 3 tag combinations × 8 fields = 72,000,000 series (an upper bound, assuming every combination occurs),

which is on the same order as the roughly 63 million series shown in the bucket cardinality report below.



3. Bucket cardinality report

The report below shows that the cardinality of the jt bucket has already reached more than 63 million,

far beyond the official recommendation of keeping cardinality under 100,000.

influxd inspect report-db --db-path /var/lib/influxdb/.influxdbv2/engine/data/d8390b10d2ccb963

bucket               retention policy   measurement   series
------               ----------------   -----------   ------
"d8390b10d2ccb963"   "autogen"          "Data"        63488581
"d8390b10d2ccb963"   "autogen"          "GNSS"        32893
"d8390b10d2ccb963"   "autogen"          "STATE"       43830
"d8390b10d2ccb963"   "autogen"                        63519728
"d8390b10d2ccb963"                                    63519728
Total (est.)                                          63519728

4. Conclusion

After reading a good deal of material and the InfluxDB v2 official documentation, and combining that with the analysis above, the conclusion is that the jt bucket's cardinality is far too high, which causes too many TSM files to be generated.

When TSM files are produced too quickly, InfluxDB's background compaction kicks in; during compaction it reads a large number of TSM files and writes the merged data into temporary TSM files, so both read and write IO rise.

Because there is so much data, the cache also consumes a lot of memory; on a server with little or limited RAM, data is constantly swapped between memory and disk, which adds even more read pressure until disk read IO is saturated.

Meanwhile data keeps streaming in and the bucket's cardinality keeps growing, so the whole thing turns into a vicious cycle.

III. Solutions


1. Application level

Optimize the Go program's source code to drop the following tag and fields:

1 tag: Manufactor

4 fields: acc_x, acc_z, source, status

This shrinks the cardinality product (roughly 5,000 × 600 × 4 fields = 12,000,000 in the worst case, down from 72,000,000); a sketch of the reduced write is shown below.
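The original client code is not shown in the article, so this is a hedged illustration only (assuming the official influxdb-client-go v2 library and hypothetical DeviceID/SensorID values) of what a point with the reduced schema might look like:

```go
package main

import (
	"fmt"
	"time"

	influxdb2 "github.com/influxdata/influxdb-client-go/v2"
)

func main() {
	// Reduced schema: the Manufactor tag and the acc_x/acc_z/source/status fields are
	// dropped, so series cardinality is now bounded by DeviceID x SensorID x 4 fields.
	p := influxdb2.NewPoint("Data",
		map[string]string{"DeviceID": "dev-0001", "SensorID": "sen-001"}, // tags kept
		map[string]interface{}{ // fields kept
			"pressure":    101.3,
			"rssi":        -67,
			"temperature": 23.8,
			"voltage":     3.71,
		},
		time.Now())

	// The point is written exactly as before (e.g. via WriteAPIBlocking);
	// only the tag and field sets are smaller.
	fmt.Println("reduced point prepared for measurement:", p.Name())
}
```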

2. Architecture level

2.1 Introduce Telegraf as middleware to smooth out write peaks: incoming data is buffered and then committed in batches, which results in fewer, larger TSM files; fewer TSM files mean fewer compactions, and therefore less IO.

2.2 Use Telegraf for a dual-write setup, writing to a primary and a backup instance at the same time. The primary's jt bucket keeps only the most recent 7 days of data, while the backup keeps data permanently. When business users need to query InfluxDB they use the backup instance's web UI, so queries do not affect the primary's IO or the stability of the write path.

2.3 Setting up Telegraf

Install on Ubuntu:

curl --silent --location -O https://repos.influxdata.com/influxdata-archive.key
gpg --show-keys --with-fingerprint --with-colons ./influxdata-archive.key 2>&1 \
| grep -q '^fpr:\+24C975CBA61A024EE1B631787C3D57159FC2F927:$' \
&& cat influxdata-archive.key \
| gpg --dearmor \
| sudo tee /etc/apt/keyrings/influxdata-archive.gpg > /dev/null \
&& echo 'deb [signed-by=/etc/apt/keyrings/influxdata-archive.gpg] https://repos.influxdata.com/debian stable main' \
| sudo tee /etc/apt/sources.list.d/influxdata.list
sudo apt-get update && sudo apt-get install telegraf

Install on RedHat / CentOS / Rocky:

cat <<EOF | sudo tee /etc/yum.repos.d/influxdata.repo
[influxdata]
name = InfluxData Repository - Stable
baseurl = https://repos.influxdata.com/stable/\$basearch/main
enabled = 1
gpgcheck = 1
gpgkey = file:///etc/pki/rpm-gpg/RPM-GPG-KEY-influxdata
EOF
sudo yum install telegraf

Edit the configuration file to set up buffering, batched submission, and dual write:

cd /etc/telegraf
vim double_wirte_test.conf

# Configuration for telegraf agent
[agent]
  ## Default data collection interval for all inputs
  interval = "10s"
  ## Rounds collection interval to 'interval'
  ## ie, if interval="10s" then always collect on :00, :10, :20, etc.
  round_interval = true
  ## Telegraf will send metrics to outputs in batches of at most
  ## metric_batch_size metrics.
  ## This controls the size of writes that Telegraf sends to output plugins.
  metric_batch_size = 1000
  ## Maximum number of unwritten metrics per output.  Increasing this value
  ## allows for longer periods of output downtime without dropping metrics at the
  ## cost of higher maximum memory usage.
  metric_buffer_limit = 10000
  ## Collection jitter is used to jitter the collection by a random amount.
  ## Each plugin will sleep for a random time within jitter before collecting.
  ## This can be used to avoid many plugins querying things like sysfs at the
  ## same time, which can have a measurable effect on the system.
  collection_jitter = "0s"
  ## Default flushing interval for all outputs. Maximum flush_interval will be
  ## flush_interval + flush_jitter
  flush_interval = "10s"
  ## Jitter the flush interval by a random amount. This is primarily to avoid
  ## large write spikes for users running a large number of telegraf instances.
  ## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
  flush_jitter = "0s"
  ## By default or when set to "0s", precision will be set to the same
  ## timestamp order as the collection interval, with the maximum being 1s.
  ##   ie, when interval = "10s", precision will be "1s"
  ##       when interval = "250ms", precision will be "1ms"
  ## Precision will NOT be used for service inputs. It is up to each individual
  ## service input to set the timestamp at the appropriate precision.
  ## Valid time units are "ns", "us" (or "µs"), "ms", "s".
  precision = ""
  ## Log at debug level.
  #debug = true
  ## Log only error level messages.
  # quiet = false
  ## Log target controls the destination for logs and can be one of "file",
  ## "stderr" or, on Windows, "eventlog".  When set to "file", the output file
  ## is determined by the "logfile" setting.
  # logtarget = "file"
  ## Name of the file to be logged to when using the "file" logtarget.  If set to
  ## the empty string then logs are written to stderr.
  # logfile = ""
  ## The logfile will be rotated after the time interval specified.  When set
  ## to 0 no time based rotation is performed.  Logs are rotated only when
  ## written to, if there is no log activity rotation may be delayed.
  # logfile_rotation_interval = "0d"
  ## The logfile will be rotated when it becomes larger than the specified
  ## size.  When set to 0 no size based rotation is performed.
  # logfile_rotation_max_size = "0MB"
  ## Maximum number of rotated archives to keep, any older logs are deleted.
  ## If set to -1, no archives are removed.
  # logfile_rotation_max_archives = 5
  ## Pick a timezone to use when logging or type 'local' for local time.
  ## Example: America/Chicago
  # log_with_timezone = ""
  ## Override default hostname, if empty use os.Hostname()
  hostname = ""
  ## If set to true, do not set the "host" tag in the telegraf agent.
  omit_hostname = true

[[inputs.influxdb_v2_listener]]
  ## Address and port to host InfluxDB listener on
  ## (Double check the port. Could be 9999 if using OSS Beta)
  service_address = ":8087"
  ## Maximum allowed HTTP request body size in bytes.
  ## 0 means to use the default of 32MiB.
  # max_body_size = "32MiB"
  ## Optional tag to determine the bucket.
  ## If the write has a bucket in the query string then it will be kept in this tag name.
  ## This tag can be used in downstream outputs.
  ## The default value of nothing means it will be off and the database will not be recorded.
  # bucket_tag = ""
  ## Set one or more allowed client CA certificate file names to
  ## enable mutually authenticated TLS connections
  # tls_allowed_cacerts = ["/etc/telegraf/clientca.pem"]
  ## Add service certificate and key
  # tls_cert = "/etc/telegraf/cert.pem"
  # tls_key = "/etc/telegraf/key.pem"
  ## Optional token to accept for HTTP authentication.
  ## You probably want to make sure you have TLS configured above for this.
  #token = "bwRR7x-Xw2gZEKuU7U2Xs8siwEr6AQYOZXyrmgPOeAQZFnH2kGZ53hCrROpZq_xUIv8j_nomYgnDn4R3egkhMQ=="
  token = "To23iYgAsdALSlYRseYZijj3LR7IQatDQRbL-VoQXVGiNwfMcWqc2GEcmbGFhQQ9Cl8n8nM-LKjpbr3bN0jWxw=="
  ## Influx line protocol parser
  ## 'internal' is the default. 'upstream' is a newer parser that is faster
  ## and more memory efficient.
  parser_type = "upstream"

[[outputs.influxdb_v2]]
  # Primary InfluxDB v2 instance
  urls = ["http://<primary-influxdb-ip>:8086"] # replace with your primary instance address
  token = "pDbRtCHdCuleCSzgyQ_kxJc2KxGcOcnbkemaq1Cq3AHwueOhQ6QxbMbeJNH11vbg5DSY5CBjRgb54ab9Mn2vFg=="
  organization = "GTEX"
  bucket = "jt"
  metric_batch_size = 10000     # larger batch size
  metric_buffer_limit = 1000000 # larger buffer
  flush_interval = "15s"        # flush interval for this output
  content_encoding = "gzip"     # enable gzip compression
  concurrent_writes = 4         # enable concurrent writes
  influx_uint_support = true    # enable unsigned integer field support
  [outputs.influxdb_v2.tags]
    influxdb_instance = "primary"

[[outputs.influxdb_v2]]
  # Backup (secondary) InfluxDB v2 instance
  urls = ["http://<backup-influxdb-ip>:28086"] # replace with your backup instance address
  token = "OlXD8YHa4ZJgk5KgGoqkBPaMKXIMB_SPppgSirgfNaB9ciU6kfot-NTk2NtDjkxxqBf1YNyKtNenv9HD12kCbQ=="
  organization = "GTEX"
  bucket = "jt"
  metric_batch_size = 10000     # larger batch size
  metric_buffer_limit = 1000000 # larger buffer
  flush_interval = "15s"        # flush interval for this output
  content_encoding = "gzip"     # enable gzip compression
  concurrent_writes = 4         # enable concurrent writes
  influx_uint_support = true    # enable unsigned integer field support
  [outputs.influxdb_v2.tags]
    influxdb_instance = "secondary"

Start Telegraf:

cd /etc/telegraf
nohup telegraf --config /etc/telegraf/double_wirte_test.conf &

After startup, Telegraf listens on port 8087. Client programs can point their writes directly at http://<influxdb-server-ip>:8087; Telegraf buffers the incoming line-protocol data and submits it in batches to the InfluxDB instances declared in the outputs.influxdb_v2 sections.
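On the client side, the only change needed is the URL the program writes to. This is a hedged sketch (assuming the influxdb-client-go v2 library; the real program's code is not shown in the article): the write goes to Telegraf's influxdb_v2_listener on port 8087 instead of InfluxDB on 8086, and the token must match the one configured in the listener.

```go
package main

import (
	"context"
	"time"

	influxdb2 "github.com/influxdata/influxdb-client-go/v2"
)

func main() {
	// Point the client at Telegraf (:8087) rather than InfluxDB (:8086).
	// Replace host and token with your own values.
	client := influxdb2.NewClient("http://localhost:8087", "TELEGRAF_LISTENER_TOKEN")
	defer client.Close()

	writeAPI := client.WriteAPIBlocking("GTEX", "jt")
	p := influxdb2.NewPoint("Data",
		map[string]string{"DeviceID": "dev-0001", "SensorID": "sen-001"},
		map[string]interface{}{"temperature": 23.8, "voltage": 3.71},
		time.Now())

	// Telegraf acknowledges quickly, buffers the point, and flushes it in
	// batches to both outputs.influxdb_v2 instances (primary and backup).
	if err := writeAPI.WritePoint(context.Background(), p); err != nil {
		panic(err)
	}
}
```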


3. Choosing a different time-series database

If the measures above still cannot cope with an extremely large bucket cardinality, it is time to consider switching to a different time-series database. Some recommendations follow.

InfluxDB 3.0 (recommended: heavily optimized for high-cardinality workloads)

InfluxDB 3.0 is a rewrite built on the open-source InfluxDB IOx project. It is designed specifically for high-cardinality, high-write workloads and is currently one of the most mature options for handling very large cardinality.

Key techniques:

Columnar + partitioned storage:

A columnar storage engine (based on Apache Arrow) stores different tags and metrics as separate columns. For high-cardinality tags, only the unique values are stored and they are referenced through dictionary encoding, drastically reducing duplicated storage (for example, 1 million device IDs are stored once as unique values and referenced by integer IDs); a toy sketch of this idea appears after this list.

Data is also partitioned by time range (e.g. per hour), so queries only scan the target time window and avoid full-table scans.

No inverted index + predicate pushdown:

It abandons the inverted index used by traditional time-series databases (avoiding index bloat). Instead, predicate pushdown filters out non-matching tag values while the data is being scanned; combined with the efficient batch processing of columnar storage, queries stay fast even at high cardinality.

Automatic cardinality-reduction optimizations:

Frequently repeated tag combinations are aggregated automatically to reduce storage and compute pressure, and the encoding can be adjusted dynamically (e.g. bitmap indexes for low-cardinality tags, hash maps for high-cardinality ones).
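The dictionary-encoding idea mentioned above is easy to picture with a toy Go sketch (my own illustration, not how InfluxDB 3.0 is actually implemented): each distinct device ID string is stored once, and the column itself holds only small integer codes.

```go
package main

import "fmt"

// A toy dictionary encoder: every distinct string is stored once in the
// dictionary, and the encoded column only stores compact integer codes.
type dictEncoder struct {
	dict  []string       // unique values, stored once
	codes map[string]int // value -> integer code
}

func newDictEncoder() *dictEncoder {
	return &dictEncoder{codes: make(map[string]int)}
}

func (e *dictEncoder) encode(v string) int {
	if code, ok := e.codes[v]; ok {
		return code
	}
	code := len(e.dict)
	e.dict = append(e.dict, v)
	e.codes[v] = code
	return code
}

func main() {
	enc := newDictEncoder()
	var column []int
	// A million rows may repeat the same few device IDs over and over;
	// each string is kept once, the column stores only integers.
	for _, deviceID := range []string{"dev-0001", "dev-0002", "dev-0001", "dev-0003", "dev-0002"} {
		column = append(column, enc.encode(deviceID))
	}
	fmt.Println("dictionary:", enc.dict) // [dev-0001 dev-0002 dev-0003]
	fmt.Println("column:    ", column)   // [0 1 0 2 1]
}
```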


TDengine (recommended: purpose-built for IoT, with tiered tag optimization)

TDengine is a time-series database designed specifically for IoT and naturally handles device-style high-cardinality workloads (such as millions of sensors or device IDs).

Key techniques:

Super table + sub-table architecture:

It introduces the concept of a super table (STable): devices of the same type (e.g. temperature sensors) are modeled as one super table, and each individual device becomes a sub-table. Tags are split into static tags (e.g. device model and manufacturer, stored in the super table metadata) and dynamic tags (e.g. live status); static tags are stored only once, avoiding duplication and greatly easing cardinality pressure.

Pre-encoded tag values:

Frequently occurring tag values (such as device IDs) are pre-encoded as integer mappings, so writes and queries operate on the encoded values, cutting string-comparison overhead and speeding up processing.

Time-series aggregation engine:

A built-in aggregation engine can automatically aggregate data along tag dimensions (e.g. region or model), so queries on high-cardinality tags can return aggregated results directly without scanning every sub-table.


A special note:

If you have been using InfluxDB v2, prefer replacing it with InfluxDB 3, because InfluxDB 3 is fully compatible with the v2 client API, meaning the client program code does not need to change; this greatly reduces the migration workload.









          


VSP

WeChat: shao5621404

Blog | http://vincent.shaopengtrusit.top


