Deploying node_exporter in a Kubernetes Cluster
About node_exporter
Purpose
node_exporter is one of the most commonly used Prometheus exporters. It collects the hardware- and OS-level metrics exposed by the kernel, such as CPU, memory, and disk usage. Through its textfile collector it can also pick up custom metrics, for example cron-job metrics that no existing open-source exporter covers.
Limitations
node_exporter is designed to monitor the host system: some hardware metrics can only be read from host-level directories such as /proc and /sys. For that reason, the project does not recommend running it as a container:
"The node_exporter is designed to monitor the host system. It's not recommended to deploy it as a Docker container because it requires access to the host system."
Still, because of production constraints and historical reasons, there are always scenarios where an external Prometheus has to monitor the nodes of a container cluster. In that case node_exporter is deployed as a DaemonSet on every node of the cluster, independent of the external Prometheus cluster, and it requires extra privileges and host directory mounts.
This article describes a deployment scheme for exactly that scenario.
Versions
- node_exporter: v1.3.1
- helm: v3.0.0
Deployment plan
Selecting the metrics
As a hardware exporter, node_exporter exposes a very large number of metrics by default; a full scrape easily runs past a thousand lines, which causes a couple of problems:
- Not all of those metrics are actually needed.
- Collecting far more metrics than necessary wastes system resources.
We therefore want to expose only the minimal set of metrics our monitoring actually requires.
Some commonly used metrics:
# node uptime
node_time_seconds
node_boot_time_seconds
up
# disk I/O
node_disk_read_bytes_total
node_disk_written_bytes_total
# filesystem
node_filesystem_size_bytes
node_filesystem_free_bytes
node_filesystem_files
node_filesystem_files_free
# CPU and load
node_cpu_seconds_total
node_load1
node_load5
# network I/O
node_netstat_Tcp_CurrEstab
node_network_receive_bytes_total
node_network_transmit_bytes_total
node_public_network_receive_bandwidth
node_public_network_transmit_bandwidth
# memory
node_memory_MemTotal_bytes
node_memory_MemFree_bytes
node_memory_Buffers_bytes
node_memory_Cached_bytes
# text collector
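As an illustration of how these metrics end up being used, the rule group below turns two of them into alerts. The group name, alert names, and thresholds are purely illustrative assumptions, not part of the deployment described here:

```yaml
groups:
  - name: node-exporter-basic            # hypothetical rule group
    rules:
      - alert: NodeFilesystemAlmostFull
        # less than 10% free space on a filesystem
        expr: node_filesystem_free_bytes / node_filesystem_size_bytes < 0.10
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Filesystem {{ $labels.mountpoint }} on {{ $labels.instance }} is almost full"
      - alert: NodeHighCpuUsage
        # average non-idle CPU above 90% over the last 5 minutes
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.9
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
```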
Collectors
According to the official documentation, the collectors enabled by default include arp, bonding, cpu, and so on, but there is no detailed documentation of which metrics each collector exposes. So how do we find out which metrics belong to which collector?
First, node_exporter has a flag that disables every collector that is enabled by default:
- --collector.disable-defaults
Then, the collector list in the official documentation gives each collector's name, a short description, and the operating systems it applies to, which already narrows the candidates down:
collector name | description | OS |
---|---|---|
cpu | Exposes CPU statistics | Darwin, Dragonfly, FreeBSD, Linux, Solaris, OpenBSD |
filesystem | Exposes filesystem statistics, such as disk space used. | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD |
netdev | Exposes network interface statistics such as bytes transferred. | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD |
textfile | Exposes statistics read from local disk. The --collector.textfile.directory flag must be set. | any |
time | Exposes the current system time. | any |
stat | Exposes various statistics from /proc/stat. This includes boot time, forks and interrupts. | Linux |
Finally, combine this with the --collector.disable-defaults flag and check the collectors one by one to build a fairly complete collector-to-metrics mapping.
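One way to do that probing inside the cluster is a throwaway Pod that enables a single collector at a time; port-forward to it and fetch /metrics to see exactly which series that collector emits. The manifest below is only a sketch (Pod name and image path are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: node-exporter-probe            # throwaway name, delete after use
  namespace: monitor
spec:
  restartPolicy: Never
  containers:
    - name: node-exporter
      image: prom/node-exporter:v1.3.1   # or your private mirror
      args:
        - --collector.disable-defaults   # turn everything off ...
        - --collector.cpu                # ... then enable one collector per run
      ports:
        - containerPort: 9100
```

Repeating this for each collector (cpu, meminfo, filesystem, ...) and diffing the resulting metric names yields the mapping.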
Configuration
node_exporter
By mapping the required metrics back to their collectors we get an initial, trimmed-down configuration that collects only what we need:
- --collector.disable-defaults
- --collector.stat
- --collector.diskstats
- --collector.filefd
- --collector.filesystem
- --collector.loadavg
- --collector.cpu
- --collector.netdev
- --collector.netstat
- --collector.netclass
- --collector.meminfo
In addition, a few basic path flags are needed to point at the proc, sys, and root mount points:
- --path.procfs="/proc"
- --path.sysfs="/sys"
- --path.rootfs="/"
If you need custom metrics, also enable the textfile collector:
- --collector.textfile
- --collector.textfile.directory="$TEXTFILE_DIR"
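Note that $TEXTFILE_DIR is left as a placeholder throughout this article. For the textfile collector to see anything, that directory also has to be mounted into the container; if the *.prom files are written by cron jobs on the host, a hostPath mount along the following lines would be needed. The path here is an assumption for illustration only:

```yaml
# DaemonSet excerpt, assuming host cron jobs write *.prom files
# into /var/lib/node_exporter/textfile (a hypothetical path)
containers:
  - name: node-exporter
    args:
      - --collector.textfile
      - --collector.textfile.directory=/var/lib/node_exporter/textfile
    volumeMounts:
      - name: textfile
        mountPath: /var/lib/node_exporter/textfile
        readOnly: true
volumes:
  - name: textfile
    hostPath:
      path: /var/lib/node_exporter/textfile
```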
node_exporter's own internal metrics can be dropped entirely as well:
- --web.disable-exporter-metrics
Because of the way containerization works, the host ends up with a large number of extra mount points and virtual network interfaces. Filter flags let us exclude these meaningless devices and keep only the host's real hardware:
- --collector.filesystem.fs-types-exclude="^(autofs|binfmt_misc|bpf|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs|udev|systemd-1|lxcfs|sunrpc|tmpfs|shm)$"
- --collector.filesystem.mount-points-exclude="^/(sys|dev|etc|proc|run/containerd/.+)($|/)"
- --collector.netdev.device-include="^(eth)\\d+$"
Putting all of the above together gives the complete flag list.
kubernetes
As mentioned above, node_exporter running in a container needs several host directories mounted in order to read hardware information, so a PodSecurityPolicy has to be configured and a ServiceAccount created and bound to the corresponding ClusterRole.
If you use a private image registry, remember to add the pull secret to the ServiceAccount.
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: node-exporter-prometheus-node-exporter
  namespace: monitor
  labels:
    app: prometheus-node-exporter
    release: node-exporter
spec:
  privileged: false
  # Allow core volume types.
  volumes:
    - 'configMap'
    - 'emptyDir'
    - 'projected'
    - 'secret'
    - 'downwardAPI'
    - 'persistentVolumeClaim'
    - 'hostPath'
  hostNetwork: true
  hostIPC: false
  hostPID: true
  hostPorts:
    - min: 0
      max: 65535
  runAsUser:
    # Permits the container to run with root privileges as well.
    rule: 'RunAsAny'
  seLinux:
    # This policy assumes the nodes are using AppArmor rather than SELinux.
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'MustRunAs'
    ranges:
      # Allow adding the root group.
      - min: 0
        max: 65535
  fsGroup:
    rule: 'MustRunAs'
    ranges:
      # Allow adding the root group.
      - min: 0
        max: 65535
  readOnlyRootFilesystem: false
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: node-exporter-prometheus-node-exporter
  namespace: monitor
  labels:
    app: prometheus-node-exporter
    release: "node-exporter"
  annotations: {}
imagePullSecrets:
  - name: yourregistrykey
---
# Source: prometheus-node-exporter/templates/psp-clusterrole.yaml
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: psp-node-exporter-prometheus-node-exporter
  labels:
    app: prometheus-node-exporter
    release: node-exporter
rules:
  - apiGroups: ['extensions']
    resources: ['podsecuritypolicies']
    verbs: ['use']
    resourceNames:
      - node-exporter-prometheus-node-exporter
---
# Source: prometheus-node-exporter/templates/psp-clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: psp-node-exporter-prometheus-node-exporter
  labels:
    app: prometheus-node-exporter
    release: node-exporter
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: psp-node-exporter-prometheus-node-exporter
subjects:
  - kind: ServiceAccount
    name: node-exporter-prometheus-node-exporter
    namespace: monitor
Then create a Service to expose node_exporter and deploy it to every node with a DaemonSet, remembering to mount the relevant hostPath volumes into the container.
---
apiVersion: v1
kind: Service
metadata:
  name: node-exporter-prometheus-node-exporter
  namespace: monitor
  annotations:
    prometheus.io/scrape: "true"
  labels:
    app: prometheus-node-exporter
    release: node-exporter
spec:
  type: ClusterIP
  ports:
    - port: 9100
      targetPort: 9100
      protocol: TCP
      name: metrics
  selector:
    app: prometheus-node-exporter
    release: node-exporter
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter-prometheus-node-exporter
  namespace: monitor
  labels:
    app: prometheus-node-exporter
    release: node-exporter
spec:
  selector:
    matchLabels:
      app: prometheus-node-exporter
      release: node-exporter
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: prometheus-node-exporter
        release: node-exporter
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
    spec:
      automountServiceAccountToken: false
      serviceAccountName: node-exporter-prometheus-node-exporter
      securityContext:
        fsGroup: 65534
        runAsGroup: 65534
        runAsNonRoot: true
        runAsUser: 65534
      containers:
        - name: node-exporter
          image: $PRIVATE_IMAGE_REPO/node-exporter:v1.3.1
          imagePullPolicy: IfNotPresent
          args:
            - --path.procfs=/host/proc
            - --path.sysfs=/host/sys
            - --path.rootfs=/host/root
            - --web.listen-address=$(HOST_IP):9100
            - --web.disable-exporter-metrics
            - --collector.disable-defaults
            - --collector.stat
            - --collector.time
            - --collector.diskstats
            - --collector.filefd
            - --collector.netdev
            - --collector.filesystem
            - --collector.loadavg
            - --collector.cpu
            - --collector.meminfo
            - --collector.tcpstat
            - --collector.textfile
            - --collector.textfile.directory=$TEXTFILE_DIR
            - --collector.filesystem.fs-types-exclude="^(autofs|binfmt_misc|bpf|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs|udev|systemd-1|lxcfs|sunrpc|tmpfs|shm)$"
            - --collector.filesystem.mount-points-exclude="^/(sys|dev|etc|proc|run/containerd/.+)($|/)"
            - --collector.netdev.device-include="^(eth)\\d+$"
          env:
            - name: HOST_IP
              value: 0.0.0.0
          ports:
            - name: metrics
              containerPort: 9100
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /
              port: 9100
          readinessProbe:
            httpGet:
              path: /
              port: 9100
          resources: {}
          volumeMounts:
            - name: proc
              mountPath: /host/proc
              readOnly: true
            - name: sys
              mountPath: /host/sys
              readOnly: true
            - name: root
              mountPath: /host/root
              mountPropagation: HostToContainer
              readOnly: true
      hostNetwork: true
      hostPID: true
      tolerations:
        - effect: NoSchedule
          operator: Exists
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
        - name: root
          hostPath:
            path: /
Deploying with Helm
Helm greatly simplifies the deployment process; we can use the open-source chart from the prometheus-community project.
Custom parameters are set by editing values.yaml according to the configuration described above, so they are not repeated in full here; a sketch follows the command below.
helm install node-exporter --namespace monitor ./prometheus-node-exporter
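A values.yaml roughly like the one below carries the flags worked out earlier into the chart. The key names follow the prometheus-node-exporter chart (image, extraArgs), but chart versions differ, so treat this as a sketch and verify it against the chart you actually install:

```yaml
# values.yaml (sketch only; check the keys against your chart version)
image:
  repository: $PRIVATE_IMAGE_REPO/node-exporter
  tag: v1.3.1
extraArgs:
  - --collector.disable-defaults
  - --collector.cpu
  - --collector.meminfo
  - --collector.filesystem
  - --collector.netdev
  - --collector.loadavg
  - --web.disable-exporter-metrics
```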
Prometheus configuration
A Prometheus running inside the cluster is comparatively simple: the scrape config only needs to reference the CA certificate and token mounted into the Pod.
- job_name: "kubernetes-nodes"
  # Default to scraping over https. If required, just disable this or change to
  # `http`.
  scheme: https
  # This TLS & authorization config is used to connect to the actual scrape
  # endpoints for cluster components. This is separate to discovery auth
  # configuration because discovery & scraping are two separate concerns in
  # Prometheus. The discovery auth config is automatic if Prometheus runs inside
  # the cluster. Otherwise, more config options have to be provided within the
  # <kubernetes_sd_config>.
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    # If your node certificates are self-signed or use a different CA to the
    # master CA, then disable certificate verification below. Note that
    # certificate verification is an integral part of a secure infrastructure
    # so this should only be disabled in a controlled environment. You can
    # disable certificate verification by uncommenting the line below.
    #
    # insecure_skip_verify: true
  authorization:
    credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
    - role: node
  relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
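One caveat: with role: node the discovered __address__ points at the kubelet port, not node_exporter's 9100. If this job is meant to scrape node_exporter directly, extra relabel rules mirroring the external-Prometheus job further below would be needed; whether that matches the intended setup here is an assumption:

```yaml
  relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    # added: point the scrape at node_exporter on the host network
    - source_labels: [__meta_kubernetes_node_address_InternalIP]
      target_label: __address__
      replacement: ${1}:9100
      action: replace
    - target_label: __scheme__
      replacement: http
      action: replace
```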
For a Prometheus cluster outside Kubernetes, additional configuration is needed to authenticate against the Kubernetes API.
First, create a ServiceAccount and bind it to a ClusterRole that can query nodes and related resources, so that automatic discovery works.
# prom.rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitor
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - services
      - endpoints
      - pods
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources:
      - configmaps
    verbs: ["get"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: monitor
Because this Prometheus sits outside the Kubernetes cluster, it cannot pick up the CA certificate and token automatically. So in addition to the token configured at the scrape_config level (used for scraping), the same token also has to be configured under kubernetes_sd_config (used for discovery). The token is the one mounted for the ServiceAccount named prometheus created above, and can be extracted on the command line:
kubectl get secrets -n monitor prometheus-token-l6jt7 -o jsonpath={.data.token} |base64 -d > token
Then reference the token in the Prometheus configuration:
- job_name: 'kubernetes-web'
  scheme: https
  kubernetes_sd_configs:
    - role: node
      api_server: https://$YOUR_API_SERVER
      # token used for service discovery
      bearer_token_file: ssl/token
      tls_config:
        # skip certificate verification during discovery
        insecure_skip_verify: true
  # token used for scraping
  bearer_token_file: ssl/token
  tls_config:
    # skip certificate verification during scraping
    insecure_skip_verify: true
  relabel_configs:
    - source_labels: [__meta_kubernetes_node_address_InternalIP]
      target_label: __address__
      replacement: ${1}:9100
      action: replace
    - target_label: __scheme__
      replacement: http
      action: replace
Known issues
- Most of the collector filters are ineffective: the include/exclude regexes above do not actually filter anything out. A likely (though unverified) cause is that the double quotes wrapped around the regex values in the YAML args are passed to node_exporter as part of the flag value, so the patterns never match; see the sketch below.
- The exporter's own metrics are not removed completely: --web.disable-exporter-metrics drops the go_*, process_*, and promhttp_* series, but per-collector meta-metrics such as node_scrape_collector_success and node_scrape_collector_duration_seconds are still exposed.
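If the quoting really is the cause of the first issue, writing the flag values as plain YAML scalars without the surrounding double quotes (and with a single backslash) is worth trying; this is an assumption to test, not something verified in this deployment:

```yaml
# hypothetical fix: no quotes around the regex values
args:
  - --collector.netdev.device-include=^(eth)\d+$
  - --collector.filesystem.mount-points-exclude=^/(sys|dev|etc|proc|run/containerd/.+)($|/)
```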
References
- https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config
- https://github.com/prometheus-community/helm-charts
- https://github.com/prometheus/node_exporter
- https://cloud.tencent.com/developer/article/1659552
- https://github.com/prometheus/prometheus/blob/release-2.33/documentation/examples/prometheus-kubernetes.yml