A running GitLab cluster suddenly started acting up. The symptoms are described below.

Environment: GitLab is deployed inside a K8s cluster using the official Helm chart; the GitLab version is 14.10.x.

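Symptoms: project-related operations fail intermittently

  • Create: creating a project intermittently fails with an error
  • View project: inside a project, the code tree intermittently fails to load, with the error message An error occurred while fetching folder content.
  • Delete project: the project cannot be deleted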

The Gitaly service logs show that Praefect's calls to Gitaly's health check endpoint are being rejected. The key error is PermissionDenied.

{"correlation_id":"01G7BX3DR59H35EYHPMKHNVBC0","error":"rpc error: code = PermissionDenied desc = permission denied","grpc.code":"PermissionDenied","grpc.meta.auth_version":"v2","grpc.meta.deadline_type":"unknown","grpc.meta.method_type":"unary","grpc.method":"Check","grpc.request.deadline":"2022-07-07T08:39:54.207","grpc.request.fullMethod":"/grpc.health.v1.Health/Check","grpc.request.payload_bytes":0,"grpc.response.payload_bytes":0,"grpc.service":"grpc.health.v1.Health","grpc.start_time":"2022-07-07T08:39:53.208","grpc.time_ms":0.285,"level":"warning","msg":"finished unary call with code PermissionDenied","peer.address":"10.42.1.134:51266","pid":12,"span.kind":"server","system":"grpc","time":"2022-07-07T08:39:53.208Z"}
{"correlation_id":"01G7BX3EQJZJFT996MD2KF2HXW","error":"rpc error: code = PermissionDenied desc = permission denied","grpc.code":"PermissionDenied","grpc.meta.auth_version":"v2","grpc.meta.deadline_type":"unknown","grpc.meta.method_type":"unary","grpc.method":"Check","grpc.request.deadline":"2022-07-07T08:39:55.212","grpc.request.fullMethod":"/grpc.health.v1.Health/Check","grpc.request.payload_bytes":0,"grpc.response.payload_bytes":0,"grpc.service":"grpc.health.v1.Health","grpc.start_time":"2022-07-07T08:39:54.213","grpc.time_ms":0.201,"level":"warning","msg":"finished unary call with code PermissionDenied","peer.address":"10.42.1.134:51266","pid":12,"span.kind":"server","system":"grpc","time":"2022-07-07T08:39:54.213Z"}

*** /var/log/gitaly/gitaly_ruby_json.log ***
{"type":"gitaly-ruby","grpc.start_time":"2022-07-07T08:39:53Z","grpc.time_ms":0.286,"grpc.code":"OK","grpc.method":"Check","grpc.service":"grpc.health.v1.Health","pid":35,"correlation_id":"c486ca8b2bbc3eb6736d533c38cf6017","time":"2022-07-07T08:39:53.642Z"}
{"type":"gitaly-ruby","grpc.start_time":"2022-07-07T08:39:53Z","grpc.time_ms":17.012,"grpc.code":"OK","grpc.method":"Check","grpc.service":"grpc.health.v1.Health","pid":36,"correlation_id":"71bff8c80424d633989f5daeb29111c3","time":"2022-07-07T08:39:53.658Z"}

*** /var/log/gitaly/gitaly.log ***
{"correlation_id":"01G7BX3FPZ8BTMCA0RY5754JGJ","error":"rpc error: code = PermissionDenied desc = permission denied","grpc.code":"PermissionDenied","grpc.meta.auth_version":"v2","grpc.meta.deadline_type":"unknown","grpc.meta.method_type":"unary","grpc.method":"Check","grpc.request.deadline":"2022-07-07T08:39:56.217","grpc.request.fullMethod":"/grpc.health.v1.Health/Check","grpc.request.payload_bytes":0,"grpc.response.payload_bytes":0,"grpc.service":"grpc.health.v1.Health","grpc.start_time":"2022-07-07T08:39:55.218","grpc.time_ms":0.135,"level":"warning","msg":"finished unary call with code PermissionDenied","peer.address":"10.42.1.134:51266","pid":12,"span.kind":"server","system":"grpc","time":"2022-07-07T08:39:55.218Z"}
{"correlation_id":"01G7BX3GPCRD1RPMN7F3WN6X14","error":"rpc error: code = PermissionDenied desc = permission denied","grpc.code":"PermissionDenied","grpc.meta.auth_version":"v2","grpc.meta.deadline_type":"unknown","grpc.meta.method_type":"unary","grpc.method":"Check","grpc.request.deadline":"2022-07-07T08:39:57.222","grpc.request.fullMethod":"/grpc.health.v1.Health/Check","grpc.request.payload_bytes":0,"grpc.response.payload_bytes":0,"grpc.service":"grpc.health.v1.Health","grpc.start_time":"2022-07-07T08:39:56.223","grpc.time_ms":0.189,"level":"warning","msg":"finished unary call with code PermissionDenied","peer.address":"10.42.1.134:51266","pid":12,"span.kind":"server","system":"grpc","time":"2022-07-07T08:39:56.223Z"}
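
With a longer log, tallying entries by service, method, and status code makes this kind of pattern easier to spot. The following is a minimal sketch, assuming one JSON object per line as in the excerpt above. The function name summarize_grpc_codes is my own and is not part of Gitaly.

```python
import json
from collections import Counter


def summarize_grpc_codes(lines):
    """Count log entries per (grpc.service, grpc.method, grpc.code)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            # Skip blank lines and headers such as "*** /var/log/gitaly/gitaly.log ***".
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            # Skip lines that were truncated or wrapped mid-object.
            continue
        key = (
            entry.get("grpc.service"),
            entry.get("grpc.method"),
            entry.get("grpc.code"),
        )
        counts[key] += 1
    return counts


if __name__ == "__main__":
    # The path is taken from the log header above; adjust it for your pod.
    with open("/var/log/gitaly/gitaly.log") as fh:
        for key, count in summarize_grpc_codes(fh).most_common():
            print(count, key)
```

If every Health/Check call from Praefect shows up as PermissionDenied, the problem is most likely in authentication between the two services, not in a particular repository.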

Fix: synchronize the clocks across the servers, and the errors go away.

Official issue: Permission denied between Gitlab and Praefect
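
It may not be obvious why clock synchronization fixes a PermissionDenied error. The log shows auth_version v2, and as far as I understand it, that scheme signs a timestamp with the shared secret and only accepts tokens whose timestamp is close to the server's own clock. The sketch below illustrates the idea only. It is not Gitaly's actual code, and the token layout, the 30-second window, and the names issue_token and check_token are assumptions made for illustration.

```python
import hashlib
import hmac

# Assumed acceptance window, for illustration only.
MAX_SKEW_SECONDS = 30


def issue_token(secret: bytes, now: float) -> str:
    """Client side: sign the current timestamp with the shared secret."""
    timestamp = str(int(now))
    signature = hmac.new(secret, timestamp.encode(), hashlib.sha256).hexdigest()
    return "v2.%s.%s" % (signature, timestamp)


def check_token(secret: bytes, token: str, now: float) -> bool:
    """Server side: verify the signature, then reject stale or future timestamps."""
    parts = token.split(".")
    if len(parts) != 3 or parts[0] != "v2":
        return False
    _, signature, timestamp = parts
    expected = hmac.new(secret, timestamp.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return False
    if not timestamp.isdigit():
        return False
    # A correct secret is not enough: the two clocks must also agree.
    return abs(now - int(timestamp)) <= MAX_SKEW_SECONDS


if __name__ == "__main__":
    secret = b"shared-secret"
    token = issue_token(secret, now=1000)
    print(check_token(secret, token, now=1010))  # clocks 10 s apart
    print(check_token(secret, token, now=1100))  # clocks 100 s apart
```

Under these assumptions the first check passes and the second fails even though the secret is identical, which matches a deployment where the secret is correct but node clocks have drifted apart.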

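1 Environment overview

Tools such as kubeadm, kops, kubespray, rke, and kubesphere can deploy a K8s cluster quickly, yet many people still prefer to deploy K8s from binaries.

Binary deployment deepens your understanding of each K8s component and lets you place components on different machines to suit your own requirements. You can also generate self-signed certificates with a very long lifetime, for example 99 years, which avoids production incidents caused by forgetting to renew an expiring certificate.

This article is based on K8s 1.23.1, the latest version at the time of writing (2021-12-31). Overall, it differs little from the deployment guides for 1.20, 1.22, and similar versions found online, and it mainly follows Han Xianchao's (韩先超) binary deployment tutorial for K8s 1.20.

My environment consists of Ubuntu 20.04 LTS virtual machines running on an M1 MacBook, so this K8s setup targets the arm64 architecture.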

1.0 Conventions

  • Command-line input is prefixed with the ➜ symbol
  • Comments are marked with # or //
  • Command output is set off by a blank line

1.1 Planning

Role         Hostname              IP           Components
master node  ubuntu-k8s-master-01  10.211.55.4  etcd, kube-apiserver, kube-controller-manager, kube-scheduler, kube-proxy, kubelet
worker node  ubuntu-k8s-worker-01  10.211.55.5  kubelet, kube-proxy

1.2 Environment configuration

  • Set the hostnames

    # on host 10.211.55.4
    ➜ sudo hostnamectl set-hostname ubuntu-k8s-master-01
    # on host 10.211.55.5
    ➜ sudo hostnamectl set-hostname ubuntu-k8s-worker-01


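Note: the code blocks in this article contain many \- sequences. Each one is really a plain -, written this way only because the site's markdown rendering is broken.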

Question 1

Task weight: 1%

You have access to multiple clusters from your main terminal through kubectl contexts. Write all context names into /opt/course/1/contexts, one per line.

From the kubeconfig extract the certificate of user restricted@infra-prod and write it decoded to /opt/course/1/cert.

Analysis:

  • Topics tested

    • kubectl
  • Solution

    ➜ kubectl config get-contexts -o name > /opt/course/1/contexts
    
    # find this entry in the .kube/config file
    \- name: restricted@infra-prod
      user:
        client-certificate-data: LS0tLS1CRUdJ...
    ➜ echo LS0tLS1CRUdJ... | base64 -d > /opt/course/1/cert
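
The manual lookup and base64 -d step above can also be scripted. This is a minimal sketch using only the standard library, assuming the usual kubeconfig layout in which each user entry starts with "- name:" and is followed by its client-certificate-data line. The function name extract_client_cert is hypothetical, and a real YAML parser would be more robust.

```python
import base64


def extract_client_cert(kubeconfig_text: str, user_name: str) -> bytes:
    """Return the base64-decoded client certificate for the given user entry."""
    in_user = False
    for line in kubeconfig_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("- name:"):
            # Entering a new list entry; remember whether it is the user we want.
            in_user = stripped.split(":", 1)[1].strip() == user_name
        elif in_user and stripped.startswith("client-certificate-data:"):
            encoded = stripped.split(":", 1)[1].strip()
            return base64.b64decode(encoded)
    raise KeyError("no client-certificate-data found for user %r" % user_name)


if __name__ == "__main__":
    import os

    with open(os.path.expanduser("~/.kube/config")) as fh:
        cert = extract_client_cert(fh.read(), "restricted@infra-prod")
    with open("/opt/course/1/cert", "wb") as out:
        out.write(cert)
```

In the exam itself the two shell commands are quicker. The script is mainly useful when the same extraction has to be repeated for many users.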


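Note: the code blocks in this article contain many \- sequences. Each one is really a plain -, written this way only because the site's markdown rendering is broken.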

Question 1

Task weight: 1%

You have access to multiple clusters from your main terminal through kubectl contexts. Write all those context names into /opt/course/1/contexts.

Next write a command to display the current context into /opt/course/1/context_default_kubectl.sh, the command should use kubectl.

Finally write a second command doing the same thing into /opt/course/1/context_default_no_kubectl.sh, but without the use of kubectl.

Analysis:

  • Topics tested

  • Solution

    • Per the task: Write all those context names into /opt/course/1/contexts

      ➜ kubectl config get-contexts -o name > /opt/course/1/contexts
    • Per the task: Next write a command to display the current context into /opt/course/1/context_default_kubectl.sh, the command should use kubectl

      ➜ vim /opt/course/1/context_default_kubectl.sh
      kubectl config current-context
    • Per the task: Finally write a second command doing the same thing into /opt/course/1/context_default_no_kubectl.sh, but without the use of kubectl

      ➜ vim /opt/course/1/context_default_no_kubectl.sh
      grep "current-context: " ~/.kube/config | awk '{print $2}'
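
The grep/awk pipeline works because current-context is a top-level key on a line of its own in the kubeconfig. The same lookup can be sketched in Python as follows, assuming an unquoted context name. The helper name current_context is my own.

```python
def current_context(kubeconfig_text):
    """Return the value of the top-level current-context key, or None if absent."""
    for line in kubeconfig_text.splitlines():
        # Top-level keys start in column 0, so nested keys are not matched.
        if line.startswith("current-context:"):
            return line.split(":", 1)[1].strip()
    return None


if __name__ == "__main__":
    import os

    with open(os.path.expanduser("~/.kube/config")) as fh:
        print(current_context(fh.read()))
```

The exam task asks for a shell command, so this is only an illustration of what the pipeline does, not a replacement for the answer above.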


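# coding: utf8

# Usage:
# 1. Start with default parameters
#    python loggenerator.py
# 2. Start with custom parameters
#    KEY1=VALUE1 KEY2=VALUE2 python loggenerator.py
#    Supported parameters:
#       TYPE: log format, default json, alternatively txt
#       MAX: total number of log lines, default 10000000
#       SPEED: log lines generated per second, default 15000
#       OUTPUT: log file path, default logs/sum.log
#       MAXSIZE: log file size in bytes (default 50 * 1024 * 1024); the file is rotated once it reaches this size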
import os
import gzip
import shutil
from random import random
import time
import json


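def logCompressor(filepath):
    """Coroutine: after each written line, rotate and gzip the log once it reaches MAXSIZE bytes."""
    offset = 0
    # Environment variables are strings, so convert before comparing with a file size.
    maxsize = int(os.getenv('MAXSIZE', 50 * 1024 * 1024))
    r = ''
    while True:
        n = yield r
        if not n:
            return
        size = os.path.getsize(filepath)
        if size >= maxsize:
            tmpfile = "%s%d_tmp" % (filepath, int(random() * 1e17))
            shutil.move(filepath, tmpfile)
            archive = "%s-%s.%d.log.gz" % (
                os.path.splitext(filepath)[0], time.strftime("%Y.%m.%d", time.localtime()), offset)
            # Stream the rotated file into the archive and close both handles promptly.
            with open(tmpfile, 'rb') as src, gzip.open(archive, 'wb', compresslevel=9) as dst:
                shutil.copyfileobj(src, dst)
            os.remove(tmpfile)
            offset += 1
            r = '200'  # tell the writer that the file was rotated
        else:
            r = '0'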


def logGenerator(c, maxline, speed, filepath, logtype):
    logdir = os.path.dirname(filepath)
    if logdir:
        # Create the log directory, including missing parents, if it does not exist.
        os.makedirs(logdir, exist_ok=True)
    fb = open(filepath, 'a+')
    c.send(None)  # prime the compressor coroutine
    n = 0
    while n < maxline:
        start = time.time()
        s = 0  # lines written in the current one-second window (rate control)
        while s < speed and n < maxline:
            if logtype == 'json':
                m = {
                    "level": "INFO",
                    "date": time.strftime("%Y.%m.%d %H:%M:%S", time.localtime()),
                    "message": "time:%s, nothing to do!" % time.time(),
                    "business": "logGenerator:19",
                    "service": "loggenerator",
                    "hostname": "fluentd1"
                }
                m = json.dumps(m)
            else:
                m = '%s [INFO] [logGenerator:19] - time:%s, nothing to do!' % (time.strftime("%Y.%m.%d %H:%M:%S", time.localtime()), time.time())
            fb.write(m + "\n")
            # Flush so the compressor sees the real file size and no buffered lines are lost on rotation.
            fb.flush()
            n += 1
            s += 1
            r = c.send(n)
            if r == '200':
                # The log was rotated: reopen a fresh file at the same path.
                fb.close()
                fb = open(filepath, 'w+')
        end = time.time()
        if end - start < 1:
            # Writing took less than one second; sleep for the remainder to cap the rate.
            time.sleep(1 - (end - start))
    fb.close()
    c.close()


if __name__ == "__main__":
    # os.getenv returns a string when the variable is set, so convert numeric settings explicitly.
    maxline = int(os.getenv('MAX', 10000000))
    speed = int(os.getenv('SPEED', 15000))
    logfile = os.getenv('OUTPUT', 'logs/sum.log')
    logtype = os.getenv('TYPE', 'json')
    c = logCompressor(logfile)
    logGenerator(c, maxline, speed, logfile, logtype)
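

# The script above couples the writer and the compressor through a generator used as a
# coroutine: the writer primes it with send(None), then sends a progress value after each
# line and reads back a reply. The stripped-down sketch below shows only that handshake,
# without any file I/O. The names size_watcher and drive are hypothetical and exist purely
# for illustration.

```python
def size_watcher(limit):
    """Coroutine: receives the running byte count and replies 'rotate' or 'ok'."""
    reply = None
    while True:
        written = yield reply
        if written is None:
            return
        reply = "rotate" if written >= limit else "ok"


def drive(limit, chunks):
    """Feed chunk sizes to the watcher and collect its replies."""
    watcher = size_watcher(limit)
    watcher.send(None)  # prime the coroutine: run it up to the first yield
    total = 0
    replies = []
    for chunk in chunks:
        total += len(chunk)
        reply = watcher.send(total)
        replies.append(reply)
        if reply == "rotate":
            total = 0  # start counting again, as if a fresh file had been opened
    watcher.close()
    return replies


if __name__ == "__main__":
    print(drive(10, ["aaaa", "bbbb", "cccc", "dd"]))
```

# Each send() resumes the coroutine exactly where it yielded, so the watcher can keep its
# own state (here the limit, in the script the archive offset) without globals or threads.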