Kubernetes Intro

Kubernetes

Container Fundamentals

The core of container technology is to create a "boundary" for a process by constraining and modifying its dynamic behavior
A container is essentially just a special single-process model
All containers on the same machine share the host operating system's kernel

# Cgroups are the technology used for limiting (resources)
# Namespaces are the technology used for isolation
- The Mount Namespace works slightly differently from the other Namespaces: its changes to the container process's view only take effect together with a mount operation
- chroot (change root file system): changes a process's root directory to a specified location
# rootfs: the root file system (does not include the OS kernel)
> rootfs packages not just the application but the files and directories of an entire operating system (for an application, the OS itself is the most complete "dependency library" it needs in order to run)
- pivot_root
- chroot
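The interplay above can be reproduced by hand. A minimal sketch (my own illustration, not from the original notes) using the util-linux unshare tool to give a shell its own PID and mount namespaces, roughly what a runtime does before adding cgroups and a rootfs:

# Start a shell in new PID + mount namespaces with its own /proc
$ sudo unshare --pid --mount --fork --mount-proc /bin/bash
# Inside, the process tree is isolated: bash sees itself as PID 1
$ ps aux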
  • Cgroups

    # Linux Cgroups are an important Linux kernel feature for setting resource limits on processes
    # Cgroups stands for Linux Control Groups; its main job is to cap the resources a process group may use: CPU, memory, disk I/O, network bandwidth

    ➜ mount -t cgroup
    cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
    cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
    cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
    cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
    cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
    cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
    cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio) -- sets I/O limits for block devices, typically disks
    cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory) -- sets memory usage limits for processes
    cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
    cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset) -- assigns dedicated CPU cores and memory nodes to processes
    cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
    cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)

    ➜ ls /sys/fs/cgroup/cpu
    cgroup.clone_children cpuacct.usage cpuacct.usage_percpu_user cpu.cfs_quota_us init.scope system.slice
    cgroup.procs cpuacct.usage_all cpuacct.usage_sys cpu.shares kubepods.slice tasks
    cgroup.sane_behavior cpuacct.usage_percpu cpuacct.usage_user cpu.stat notify_on_release user.slice
    cpuacct.stat cpuacct.usage_percpu_sys cpu.cfs_period_us docker release_agent
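
    A minimal sketch (assuming a cgroup v1 hierarchy mounted at /sys/fs/cgroup, as above) of how these files enforce a CPU cap: allow 20ms of CPU time per 100ms period, i.e. 20% of one core, for any task that joins the group:

    $ sudo mkdir /sys/fs/cgroup/cpu/demo
    $ echo 100000 | sudo tee /sys/fs/cgroup/cpu/demo/cpu.cfs_period_us
    $ echo 20000 | sudo tee /sys/fs/cgroup/cpu/demo/cpu.cfs_quota_us
    $ echo $$ | sudo tee /sys/fs/cgroup/cpu/demo/tasks   # move the current shell into the group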
  • Docker vs Hypervisor

    1. An application process running in a container is managed by the host operating system just like every other process on the host; the isolated processes merely carry additionally configured Namespace parameters. Docker's role here is more of a bypass-style assistant doing auxiliary and management work.
    2. With a Hypervisor as the application sandbox, the Hypervisor must create a virtual machine that really exists and runs a complete Guest OS before the user's application processes can execute.
  • Containerization

    > What dockerd actually does when creating the container process is specify the set of Namespace parameters the process starts with; the container can then only "see" the resources, files, devices, state, and configuration delimited by its current Namespaces, and is completely blind to the host and any unrelated programs
    > "Agility" and "high performance" are containers' biggest advantages over virtual machines, and the key reason they thrive on fine-grained resource-management platforms such as PaaS.

    - A container shares its application's lifecycle
  • Docker Images

    > Docker's image design introduced layers: each step a user takes while building an image produces a layer, i.e. an incremental rootfs

    # Union File System
    > union-mounts multiple directories from different locations onto a single directory

    # overlay2
    > overlay2, which has potential performance advantages over the aufs storage driver.
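
    A minimal sketch (my own illustration) of the union mount that overlay2 builds on: a read-only lower layer unified with a writable upper layer:

    $ mkdir -p lower upper work merged
    $ echo "from lower" > lower/a.txt
    $ echo "from upper" > upper/b.txt
    $ sudo mount -t overlay overlay -o lowerdir=lower,upperdir=upper,workdir=work merged
    $ ls merged   # a.txt and b.txt appear side by side; writes land in upper/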
  • Containers

    A process isolation environment built from three technologies: Linux Namespaces, Linux Cgroups, and rootfs

  • Linux Containers

    # Container image - the static view
    > a set of union-mounted rootfs layers

    # Container runtime - the dynamic view
    > an isolation environment made of Namespaces + Cgroups

    # Container orchestration tools:
    - Docker: Compose + Swarm
    - Google + RedHat: Kubernetes
  • Kubernetes Global Architecture

    - Master (control node):
    > handles how user-submitted jobs are orchestrated, managed, and scheduled
    - kube-controller-manager: Controller Manager
    - kube-apiserver: API Server (the cluster's persistent data is processed by kube-apiserver and stored in etcd)
    - kube-scheduler: Scheduler

    - Node (compute node):
    - kubelet: mainly responsible for talking to the container runtime (e.g. Docker)
    - CNI: kubelet + networking - network plugins configure networking for containers
    - CRI: kubelet + Container Runtime Interface (the interface defining a container runtime's core operations)
    - CSI: kubelet + volume plugins - storage plugins provide persistent storage for containers
    - OCI: Open Container Initiative (the container runtime specification for talking to the underlying Linux OS, i.e. translating CRI requests into Linux system operations on Namespaces + Cgroups)
    - gRPC: kubelet + Device Plugin (the main component through which Kubernetes manages the host's physical devices)

    > The problem Kubernetes cares about: "the many tasks running in a large-scale cluster have all kinds of relationships with one another, and handling these relationships is the hardest part of a job orchestration and management system"
  • Kubernetes Core Features

    > The essence of Kubernetes is to give users a universally applicable container orchestration tool

    # Pod:
    > containers in a Pod share the same Network Namespace and the same set of volumes, so they can exchange information efficiently
    > the Pod is the "application" of the Kubernetes world, and one application may consist of multiple containers

    # Service:
    > a Service acts as a proxy entrance (portal) for a set of Pods, exposing a fixed network address on their behalf

    # Deployment:
    > a multi-instance manager for Pods

    # Secret:
    > a Secret object is key-value data stored in etcd

    # Job:
    > describes a Pod that runs once

    # CronJob:
    > describes a scheduled task

    # DaemonSet:
    > describes a daemon service of which exactly one replica must run on every host


    # Orchestration objects - describe the applications being managed
    - Pod
    - Job
    - CronJob

    # Service objects - responsible for platform-level features
    - Service
    - Secret
    - Horizontal Pod Autoscaler

    # Declarative API
    > the "orchestration objects" and "service objects" this API works with are all Kubernetes API Objects (API Object), as the sketch below illustrates
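
    A minimal declarative sketch (names and image tag are illustrative): a Deployment managing two nginx Pods, plus a Service exposing them at a fixed address:

$ kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
EOF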

Kubernetes Concepts

  • Overview
    • What is Kubernetes
      The evolution of container deployment
      > Kubernetes is a portable, extensible open-source platform for managing containerized workloads and services that facilitates declarative configuration and automation.
      > The name Kubernetes comes from Greek, meaning "helmsman" or "pilot"
      > Kubernetes builds on more than a decade of Google's experience running production workloads at scale, combined with the best ideas and practices from the community.

      # The traditional deployment era:
      - resource allocation problems

      # The virtualized deployment era:
      - virtualization allows multiple virtual machines (VMs) to run on a single physical server's CPU; it isolates applications between VMs and provides a degree of security, because one application's information cannot be freely accessed by another
      - virtualization makes better use of a physical server's resources; applications can be added or updated easily, enabling better scalability and lowering hardware costs
      - each VM is a complete machine running all components, including its own operating system, on top of virtualized hardware

      # The container deployment era:
      - containers are similar to VMs but have relaxed isolation properties, sharing the operating system (OS) across applications
      - like VMs, containers have their own filesystem, CPU share, memory, and process space
      - benefits of containers:
      - agile application creation and deployment: container images are easier and more efficient to create than VM images
      - continuous development, integration, and deployment: reliable and frequent container image builds and deployments, with quick and easy rollbacks (thanks to image immutability)
      - separation of development and operations: application container images are created at build/release time rather than at deployment time, decoupling applications from infrastructure
      - observability: surfaces OS-level information and metrics as well as application health and other signals
      - environmental consistency across development, testing, and production: runs the same on a laptop as in the cloud
      - portability across clouds and OS distributions: runs on Ubuntu, RHEL, CoreOS, on-premises
      - application-centric management: raises the level of abstraction from running an OS on virtual hardware to running an application on an OS using logical resources
      - loosely coupled, distributed, elastic, liberated microservices: applications are broken into smaller, independent pieces that can be deployed and managed dynamically, rather than running monolithically on one large machine
      - resource isolation: predictable application performance
      - resource utilization: high efficiency and density

      # What Kubernetes provides:
      - service discovery and load balancing
      - storage orchestration
      - automated rollouts and rollbacks
      - automatic bin packing:
      > Kubernetes lets you specify how much CPU and memory (RAM) each container needs; given these resource requests, it can make better decisions about managing container resources
      - self-healing
      - secret and configuration management:
      > Kubernetes stores and manages sensitive information

      # What Kubernetes is not:
      - it does not limit the types of applications supported: Kubernetes supports extremely diverse workloads, including stateless, stateful, and data-processing workloads; if an application can run in a container, it should run well on Kubernetes
      - it does not deploy source code or build applications: continuous integration, delivery, and deployment (CI/CD) workflows are left to you
      - it does not provide application-level services as built-in services
      - it does not dictate logging, monitoring, or alerting solutions: it offers some integrations as proofs of concept, plus mechanisms to collect and export metrics
      - it does not provide or adopt any comprehensive machine configuration, maintenance, management, or self-healing system

      > Kubernetes is not a mere orchestration system; in fact, it eliminates the need for orchestration. The technical definition of orchestration is execution of a defined workflow: first do A, then B, then C. Kubernetes instead comprises a set of independent, composable control processes that continuously drive the current state toward the provided desired state. How you get from A to C does not matter, and no centralized control is needed; this makes the system easier to use and more powerful, robust, resilient, and extensible.
    • Kubernetes Components
      Related Kubernetes components
      # Control Plane Components
      > control plane components make global decisions about the cluster (e.g. scheduling), and detect and respond to cluster events
      > control plane components can run on any node in the cluster

      - kube-apiserver:
      > exposes the Kubernetes API; the API server is the front end of the Kubernetes control plane
      > you can run multiple kube-apiserver instances and balance traffic between them

      - etcd:
      > etcd is a consistent and highly available key-value store used as the backing store for all Kubernetes cluster data
    • PV, PVC, StorageClass
      # Persistent container storage in Kubernetes
      - PV: the implementation of persistent storage
      > a Persistent Volume defines a piece of persistent storage, e.g. a directory on a host

      - PVC: the interface to persistent storage
      > describes the attributes of the persistent storage being requested (volume size, read/write permissions)
      > a PVC must be bound to a PV that satisfies its conditions:
      - 1. in the spec fields of the PV and PVC, the PV's storage size must satisfy the PVC's request
      - 2. the storageClassName fields of the PV and PVC must match

      - Volume Controller: the persistent storage controller
      - PersistentVolumeController:
      > continuously checks whether each PVC is in the Bound state; if not, it iterates over all available PVs and tries to bind one of them to the PVC

      # Persistent volumes
      - remote file storage
      - NFS
      - GlusterFS
      - remote block storage
      - remote disks offered by public clouds

      # Preparing the "persistent" host directory
      - Phase one (Attach) -- nodeName
      > by default, the directory kubelet creates for a volume is /var/lib/kubelet/pods/<Pod ID>/volumes/kubernetes.io~<volume type>/<volume name>
      > AttachDetachController (runs on the Master node)

      - Phase two (Mount) -- dir
      > format the disk device, then mount it at the designated mount point on the host
      > VolumeManagerReconciler (runs on the Node, as a goroutine independent of kubelet's main loop)

      # StorageClass:
      > Kubernetes provides a mechanism for creating PVs automatically: Dynamic Provisioning
      > creating PVs by hand is called Static Provisioning
      > a StorageClass is effectively a template for creating PVs:
      - 1. the PV's attributes (storage type, volume size)
      - 2. the storage plugin needed to create the PV (Ceph, NFS)
      > given a user-submitted PVC, Kubernetes finds the matching StorageClass and invokes the storage plugin that StorageClass declares to create the required PV
      The StorageClass workflow
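
      A minimal binding sketch (names and sizes are illustrative; the NFS details are the ones from the setup notes below): a PV and a PVC that bind because the storage size satisfies the request and the storageClassName fields match:

$ kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteMany
  storageClassName: manual
  nfs:
    server: 172.30.1.14
    path: /export/K8sData
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: manual
  resources:
    requests:
      storage: 5Gi   # satisfied by the 10Gi PV above
EOF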
      • Setting up StorageClass + NFS
        1. Create a working NFS server
        IP: 172.30.1.14
        Export PATH: /export/K8sData/

        $ sudo apt-get install nfs-common cifs-utils

        2. Create a Service Account that governs the permissions the NFS provisioner has while running in the K8s cluster

        3. Create a StorageClass, responsible for serving PVCs, calling the NFS provisioner to do the prescribed work, and associating PVs with PVCs (a sketch follows below)
        4. Create the NFS provisioner, which creates mount points (volumes) under the NFS share, creates PVs, and associates each PV with its NFS mount point
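
        A minimal sketch for step 3 (the provisioner value must match whatever your NFS provisioner deployment announces; the one below is used by the nfs-subdir-external-provisioner project and serves only as an example):

$ kubectl apply -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-nfs-storage
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner
parameters:
  archiveOnDelete: "false"   # discard data when the PVC is deleted
EOF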

Deploying Kubernetes

  • kubeadm

    # Create a Master node
    $ kubeadm init

    # Join a Node to the current cluster
    $ kubeadm join <Master IP:port>

    # How kubeadm works
    - kubelet is the core component Kubernetes uses to drive container runtimes such as Docker; besides talking to the runtime, kubelet must operate directly on the host to configure container networking and manage container volumes.
    > kubeadm therefore chooses a compromise: run kubelet directly on the host, and deploy the other Kubernetes components as containers.

    # The kubeadm init workflow
    - 1. Preflight checks: confirm the machine can be used to deploy Kubernetes
    - the Linux kernel must be >= 3.10
    - the Linux Cgroups modules are available
    - the machine's hostname is valid
    - the installed kubeadm and kubelet versions match
    - whether Kubernetes binaries are already installed on the machine
    - whether the Kubernetes working ports 10250/10251/10252 are already in use
    - the ip and mount Linux commands exist
    - Docker is installed
    - 2. Generate the certificates and directories Kubernetes needs to serve requests
    - unless "insecure mode" is explicitly enabled, access to kube-apiserver must go through HTTPS, so the cluster needs certificate files configured
    - /etc/kubernetes/pki (the certificate files kubeadm generates for Kubernetes)
    - 3. kubeadm generates the config files the other components need to access kube-apiserver
    - /etc/kubernetes/xxx.conf
    - 4. kubeadm generates Pod manifests for the Master components
    - kube-apiserver
    - kube-controller-manager
    - kube-scheduler
    - etcd
    > Kubernetes has a special way of starting containers, the "Static Pod": place the YAML files of the Pods to deploy in a designated directory, and kubelet automatically scans this directory on startup and loads all the Pod YAML files
    - 5. kubeadm polls localhost:6443/healthz, waiting for the Master components to come fully up
    - 6. kubeadm generates a bootstrap token for the cluster
    - the remaining Nodes can join the cluster with this token
    - 7. Install the kube-proxy and DNS add-ons
    - kube-proxy: service discovery for the cluster
    - DNS:

    # The kubeadm join workflow
    - bootstrap token
    > kubeadm must make at least one "insecure mode" request to kube-apiserver to fetch the cluster-info stored in a ConfigMap (which holds the API server's authorization info); the bootstrap token plays the security-check role in this process

    # kubeadm deployment parameter file (kubeadm.yaml, sketched below)
    $ kubeadm init --config kubeadm.yaml
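
    A minimal kubeadm.yaml sketch (values are illustrative; the apiVersion varies across kubeadm releases -- v1beta3 matches the v1.22-era output shown further down, and the pod subnet is the common flannel default):

$ cat <<EOF > kubeadm.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.22.2
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
networking:
  podSubnet: 10.244.0.0/16
EOF
$ sudo kubeadm init --config kubeadm.yaml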
  • installing kubeadm, kubelet and kubectl

    - kubeadm: the command to bootstrap the cluster
    - kubelet: the component that runs on all of the machines in your cluster and does things like starting pods and containers.
    - kubectl: the command line util to talk to your cluster
  • container runtimes

    # common container runtimes:
    - containerd
    - CRI-O
    - Docker

    # Cgroup (control group) drivers:
    > used to constrain resources that are allocated to processes.
    > Changing the settings such that your container runtime and kubelet use systemd as the cgroup driver stabilized the system.

    # Cgroup V2
    > is the next version of the cgroup Linux API.
    - cleaner and easier to use API
    - safe sub-tree delegation to containers
    - newer features like Pressure Stall Information

    # Migrating to the systemd driver in kubeadm managed clusters
    - Docker:
    sudo mkdir /etc/docker
    cat <<EOF | sudo tee /etc/docker/daemon.json
    {
      "exec-opts": ["native.cgroupdriver=systemd"],
      "log-driver": "json-file",
      "log-opts": {
        "max-size": "100m"
      },
      "storage-driver": "overlay2"
    }
    EOF

    - Restart Docker and enable on boot:
    sudo systemctl enable docker
    sudo systemctl daemon-reload
    sudo systemctl restart docker
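
    On the kubelet side, a sketch of the matching setting (kubeadm v1.22+ already defaults to systemd; for older releases the driver can be declared in a KubeletConfiguration appended to kubeadm.yaml):

    cat <<EOF >> kubeadm.yaml
    ---
    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    cgroupDriver: systemd
    EOF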
  • Deploying with kubeadm

    kubeadm still falls short of deploying highly available Kubernetes clusters, in which etcd and the Master components should each run as multi-node clusters

# Create a Master node (--image-repository points the container image registry at the Aliyun mirror)
➜ sudo kubeadm init [--image-repository='registry.cn-hangzhou.aliyuncs.com/google_containers']
[init] Using Kubernetes version: v1.22.2
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [chyiyaqing-poweredge-r720 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.50.57]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [chyiyaqing-poweredge-r720 localhost] and IPs [192.168.50.57 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [chyiyaqing-poweredge-r720 localhost] and IPs [192.168.50.57 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 9.004478 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.22" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node chyiyaqing-poweredge-r720 as control-plane by adding the labels: [node-role.kubernetes.io/master(deprecated) node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node chyiyaqing-poweredge-r720 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: z7bgdd.d4lm4cueg5vo9krh
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.50.57:6443 --token z7bgdd.d4lm4cueg5vo9krh \
--discovery-token-ca-cert-hash sha256:f56091fb52dddc01e552ad110b3479015f4bcdaba5fadec6d76eadab1b3ee48b

# Join a Node to the current cluster
$ kubeadm join <Master IP:port>

# List the tokens kubeadm generated
➜ kubeadm token list
TOKEN TTL EXPIRES USAGES DESCRIPTION EXTRA GROUPS
z1ktsy.k2dzz0m17d42nsem 23h 2021-10-15T05:26:51Z authentication,signing The default bootstrap token generated by 'kubeadm init'. system:bootstrappers:kubeadm:default-node-token

# List the nodes
➜ kubectl get nodes
NAME STATUS ROLES AGE VERSION
chyiyaqing-poweredge-r720 Ready control-plane,master 4m49s v1.22.2

# Check how the Pods are running
➜ kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-7d89d9b6b8-cch8p 0/1 ContainerCreating 0 5m10s
kube-system coredns-7d89d9b6b8-vkzw7 0/1 ContainerCreating 0 5m10s
kube-system etcd-chyiyaqing-poweredge-r720 1/1 Running 0 5m15s
kube-system kube-apiserver-chyiyaqing-poweredge-r720 1/1 Running 0 5m15s
kube-system kube-controller-manager-chyiyaqing-poweredge-r720 1/1 Running 0 5m17s
kube-system kube-proxy-smnf2 1/1 Running 0 5m10s
kube-system kube-scheduler-chyiyaqing-poweredge-r720 1/1 Running 0 5m14s

# kubectl describe shows an object's details, status, and Events
➜ kubectl describe pod coredns-7d89d9b6b8-cch8p -n kube-system
Name: coredns-7d89d9b6b8-cch8p
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: chyiyaqing-poweredge-r720/192.168.50.57
Start Time: Thu, 14 Oct 2021 13:27:02 +0800
Labels: k8s-app=kube-dns
pod-template-hash=7d89d9b6b8
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/coredns-7d89d9b6b8
Containers:
coredns:
Container ID:
Image: registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:v1.8.4
Image ID:
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qdx6w (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
kube-api-access-qdx6w:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly op=Exists
node-role.kubernetes.io/control-plane:NoSchedule
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 6m12s default-scheduler 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
Normal Scheduled 6m7s default-scheduler Successfully assigned kube-system/coredns-7d89d9b6b8-cch8p to chyiyaqing-poweredge-r720
Warning FailedCreatePodSandBox 6m6s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "d165e9f1f627d4a9c1f0eac4492ae7d1db6a7a1fe2a92dd435380cac57b35011" network for pod "coredns-7d89d9b6b8-cch8p": networkPlugin cni failed to set up pod "coredns-7d89d9b6b8-cch8p_kube-system" network: unable to allocate IP address: Post "http://127.0.0.1:6784/ip/d165e9f1f627d4a9c1f0eac4492ae7d1db6a7a1fe2a92dd435380cac57b35011": dial tcp 127.0.0.1:6784: connect: connection refused, failed to clean up sandbox container "d165e9f1f627d4a9c1f0eac4492ae7d1db6a7a1fe2a92dd435380cac57b35011" network for pod "coredns-7d89d9b6b8-cch8p": networkPlugin cni failed to teardown pod "coredns-7d89d9b6b8-cch8p_kube-system" network: Delete "http://127.0.0.1:6784/ip/d165e9f1f627d4a9c1f0eac4492ae7d1db6a7a1fe2a92dd435380cac57b35011": dial tcp 127.0.0.1:6784: connect: connection refused]
Normal SandboxChanged 63s (x25 over 6m5s) kubelet Pod sandbox changed, it will be killed and re-created.

# Adjust the Master's Pod-execution policy via Taint/Toleration (this local kubeadm setup has only one Master node)
> by default the Master node does not run user Pods; Kubernetes relies on the Taint/Toleration mechanism for this
- taint a node (a toleration sketch follows the commands below):
> $ kubectl taint nodes <node-1> foo=bar:NoSchedule
- remove a Taint:
➜ kubectl taint nodes --all node-role.kubernetes.io/master-
node/chyiyaqing-poweredge-r720 untainted
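
The other direction, as a sketch (illustrative Pod): instead of removing the taint, a single Pod can declare a matching toleration so that it alone may land on the Master:

$ kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: toleration-demo
spec:
  containers:
  - name: nginx
    image: nginx
  tolerations:
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule
EOF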
# Install the network plugin
➜ kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"

serviceaccount/weave-net created
clusterrole.rbac.authorization.k8s.io/weave-net created
clusterrolebinding.rbac.authorization.k8s.io/weave-net created
role.rbac.authorization.k8s.io/weave-net created
rolebinding.rbac.authorization.k8s.io/weave-net created
daemonset.apps/weave-net created
  • Kubernetes CNI (Container Network Interface)
    CNI plugin architecture

    > Kubernetes supports CNI plugins for the communication between pods.
    > kubeadm does not support kubenet; you should use a CNI plugin instead

    # Kubernetes imposes the following rules for network communication:
    - All containers can communicate with all other containers without NAT
    - All nodes can communicate with all containers (and vice-versa) without NAT
    - The IP that a container sees itself as is the same IP that others see it as.

    CNI plugins generally use kube-proxy or iptables directly for routing. However, Cilium is based on BPF and XDP to provide a faster and more scalable option.

    # Plugins:
    - Flannel:
    - provides a VXLAN tunneling solution
    - configuration and management are very simple
    - does not support Network Policies
    - Calico:
    - default choice of most Kubernetes platforms (kubespray, Docker Enterprise)
    - uses BGP and BIRD; a daemon called Felix configures routes on BIRD
    - supports IP-IP encapsulation if BGP cannot be used.
    - supports Network Policies
    - uses iptables for routing but it can be configured to use kube-proxy's IPVS mode.
    - Weave:
    - Provides VXLAN tunneling solution
    - all of the nodes are connected in a mesh, which allows it to run on partially connected networks
    - stores configuration on pods instead of Kubernetes CRDs or etcd
    - has an encryption library
    - supports Network Policies
    - Cilium
    - Linux kernel must be at least 4.9
    - kube-router:
    • Terminology
      • kube-proxy
        • kube-proxy's three modes:
          • userspace:
            kube-proxy Userspace mode
            > It adds rules to iptables so that all communication is redirected through a proxy server. It is no longer used, since it is much slower than the other modes.
          • Iptables:
            kube-proxy iptables mode
            > This mode adds rules to iptables so that iptables redirects straight to pods without using a proxy server. It is the default mode of kube-proxy.
          • IPVS:
            IPVS (IP Virtual Server) is a layer-4 load balancer inside the Linux kernel. It is built on top of netfilter, like iptables, but uses a hash table instead of chains as in iptables.
      • Network policy
        A Kubernetes specification that can be used to control traffic between pods.
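
        A minimal NetworkPolicy sketch (labels are illustrative): only pods labeled role=frontend may reach pods labeled app=db, and only on TCP 5432:

$ kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-allow-frontend
spec:
  podSelector:
    matchLabels:
      app: db
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: frontend
    ports:
    - protocol: TCP
      port: 5432
EOF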
      • Overlay Networks
        An overlay network abstracts a physical (underlay) network to create a virtual network. It provides a simpler interface by hiding the complexities of the underlay.
      • VXLAN
        VXLAN architecture
        VXLAN is a network tunneling protocol in the Linux kernel. Network tunneling means hiding one protocol (VXLAN) inside another protocol (TCP/IP). VXLAN tunnels layer-2 frames inside layer-4 UDP datagrams.

        # Linux network structure in VXLAN:
        - veth: virtual ethernet pair; it connects network namespaces
        - bridge: used to connect ethernet pairs in Linux
        - vtep: VXLAN tunnel endpoint; it is the entry/exit point for VXLAN tunnels
      • BGP (Border Gateway Protocol)
      • BPF (Berkeley Packet Filter)
      • XDP (eXpress Data Path)
        XDP is a data path recently added to Linux kernel. It relies on eBPF to perform fast packet processing.
  • Installing the Metrics Server

    > The Metrics Server is an aggregator of cluster-wide resource usage data. It collects resource metrics from kubelets and exposes them in the Kubernetes apiserver through the Metrics API, for use by the Horizontal Pod Autoscaler and the Vertical Pod Autoscaler.
    > The Metrics Server is not meant for non-autoscaling purposes: do not use it to forward metrics to monitoring solutions, or as a source of monitoring metrics; in such cases, collect metrics from the kubelet's /metrics/resource endpoint instead

    # Installation
    - kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

    # Fix: (cannot validate certificate, doesn't contain any IP SANs)
    - $ kubectl edit deployment metrics-server -n kube-system
    > modify the metrics-server deployment template, adding the argument --kubelet-insecure-tls to the container args
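
    Once the metrics-server Pod is Ready, the Metrics API backs kubectl top:

    $ kubectl top nodes      # per-node CPU/memory usage
    $ kubectl top pods -A    # per-pod usage across all namespaces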
  • Installing the Dashboard

    # Dashboard (web UI) visualization add-on
    ➜ kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.2.0/aio/deploy/recommended.yaml
    namespace/kubernetes-dashboard created
    serviceaccount/kubernetes-dashboard created
    service/kubernetes-dashboard created
    secret/kubernetes-dashboard-certs created
    secret/kubernetes-dashboard-csrf created
    secret/kubernetes-dashboard-key-holder created
    configmap/kubernetes-dashboard-settings created
    role.rbac.authorization.k8s.io/kubernetes-dashboard created
    clusterrole.rbac.authorization.k8s.io/kubernetes-dashboard created
    rolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
    clusterrolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
    deployment.apps/kubernetes-dashboard created
    service/dashboard-metrics-scraper created
    Warning: spec.template.metadata.annotations[seccomp.security.alpha.kubernetes.io/pod]: deprecated since v1.19; use the "seccompProfile" field instead
    deployment.apps/dashboard-metrics-scraper created

    # Check the Dashboard Pods
    ➜ kubectl get pods -n kubernetes-dashboard
    NAME READY STATUS RESTARTS AGE
    dashboard-metrics-scraper-856586f554-w4vvw 1/1 Running 0 5m19s
    kubernetes-dashboard-78c79f97b4-l8nmb 1/1 Running 0 5m19s

    # Access the Dashboard through the kubectl command-line proxy
    $ kubectl proxy --address='0.0.0.0' --port=8001 --accept-hosts='.*'

    # Accessing the Dashboard from outside the cluster requires an Ingress; a login-token sketch follows below
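
    A sketch of one common login flow (cluster-admin rights: fine for a lab, far too broad for production): create a ServiceAccount, bind it to cluster-admin, then mint a token to paste into the Dashboard login page:

    $ kubectl create serviceaccount dashboard-admin -n kubernetes-dashboard
    $ kubectl create clusterrolebinding dashboard-admin \
        --clusterrole=cluster-admin \
        --serviceaccount=kubernetes-dashboard:dashboard-admin
    $ kubectl -n kubernetes-dashboard create token dashboard-admin   # kubectl >= 1.24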
  • Deploying a storage plugin

    > Persistent container storage preserves a container's storage state: the storage plugin mounts a remote volume (network-based or otherwise) inside the container, so files created in the container are actually saved on a remote storage server

    # Kubernetes storage plugin - Rook
    > Rook is a Ceph-based Kubernetes storage plugin. Far from a thin wrapper around Ceph, Rook adds a wealth of enterprise-grade features such as horizontal scaling, migration, disaster recovery, and monitoring.
    - deploy the Rook-based persistent storage cluster as containers
    $ git clone --single-branch --branch release-1.7 https://github.com/rook/rook.git
    cd rook/cluster/examples/kubernetes/ceph
    kubectl create -f crds.yaml -f common.yaml -f operator.yaml
    kubectl create -f cluster.yaml

    - check the namespace Rook deploys into
    ➜ k get pods -n rook-ceph
    NAME READY STATUS RESTARTS AGE
    csi-cephfsplugin-7l9xb 3/3 Running 0 15m
    csi-cephfsplugin-provisioner-689686b44-qpt5q 6/6 Running 0 15m
    csi-rbdplugin-provisioner-5775fb866b-25224 6/6 Running 0 15m
    csi-rbdplugin-vvq94 3/3 Running 0 15m
    rook-ceph-operator-7bdb744878-zz2fm 1/1 Running 0 24m

    > Every Pod created by Kubernetes can then mount Ceph-provided volumes via Persistent Volumes (PV) and Persistent Volume Claims (PVC)

    - Storage
    - Block: Create block storage to be consumed by a pod (RWO)
    > Before Rook can provision storage, a **StorageClass** and **CephBlockPool** need to be created.
    - Shared FileSystem: Create a filesystem to be shared across multiple pods (RWX)
    - Object: Create an object store that is accessible inside or outside the Kubernetes cluster
  • HELM

    > Helm is the package manager for Kubernetes

    # Helm's three key concepts
    - Chart: a Helm package
    > contains all the resource definitions needed to run an application, tool, or service inside a Kubernetes cluster
    - Repository:
    > where charts are stored and shared
    - Release:
    > an instance of a chart running in a Kubernetes cluster

    # helm search - find charts (a workflow sketch follows below)
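
    A sketch of the basic chart workflow (the repo and chart names are examples):

    $ helm repo add bitnami https://charts.bitnami.com/bitnami
    $ helm search repo nginx                # search charts in configured repos
    $ helm install my-nginx bitnami/nginx   # a Release is created
    $ helm list                             # list releases in the current namespace
    $ helm uninstall my-nginx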

Kubernetes IN Docker (Kind)

local clusters for testing Kubernetes

# Installation
$ go install sigs.k8s.io/kind@v0.11.1

# Creating a Cluster
$ kind create cluster # Default cluster context name is `kind`
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.21.1) 🖼
✓ Preparing nodes 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Not sure what to do next? 😅 Check out https://kind.sigs.k8s.io/docs/user/quick-start/

❯ kubectl cluster-info --context kind-kind # interact with a specific cluster
Kubernetes control plane is running at https://127.0.0.1:45967
CoreDNS is running at https://127.0.0.1:45967/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

❯ kind get clusters # list kind clusters
kind

Containerizing Applications

  • Defining Kubernetes API objects
    > Kubernetes discourages running containers directly from the command line (kubectl run) and prefers the YAML route (kubectl create -f <file>.yaml)
    > Using one kind of API object (Deployment) to manage another kind of API object (Pod) is called the "controller pattern"
    > A Pod is the "application" of the Kubernetes world, and one application may consist of multiple containers.

    - metadata: the API object's metadata
    - spec: describes the functionality the object is to express
    - labels: a set of key-value labels
    - spec.selector.matchLabels: the Label Selector
    - annotations: a set of key-value internal metadata (of interest to Kubernetes components themselves; attached to API objects automatically while Kubernetes runs)

    ➜ k get pods -l app=nginx
    NAME READY STATUS RESTARTS AGE
    nginx-deployment-5d59d67564-8qtq5 1/1 Running 0 11m
    nginx-deployment-5d59d67564-xx29g 1/1 Running 0 11m

    # Use kubectl apply for unified creation and updating of Kubernetes objects
    $ k apply -f nginx-deployment.yaml
    $ k exec -it nginx-deployment-748c6fff66-plhwq -- /bin/bash
    $ k delete -f nginx-deployment.yaml

    # emptyDir
    > a volume that does not explicitly declare a host directory: Kubernetes creates a temporary directory on the host and binds it to the volume directory the container declares (the emptyDir type simply hands that Kubernetes-created temporary directory to Docker as the volume's host directory); a sketch follows below

    # hostPath
    > an explicitly declared volume
    ...
    volumes:
    - name: nginx-vol
      hostPath:
        path: /var/data
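
    A minimal emptyDir sketch (names are illustrative): two containers in one Pod share a scratch volume whose lifetime equals the Pod's:

$ kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: emptydir-demo
spec:
  containers:
  - name: writer
    image: busybox
    command: ["sh", "-c", "echo hello > /data/msg && sleep 3600"]
    volumeMounts:
    - name: scratch
      mountPath: /data
  - name: reader
    image: busybox
    command: ["sh", "-c", "sleep 5; cat /data/msg; sleep 3600"]
    volumeMounts:
    - name: scratch
      mountPath: /data
  volumes:
  - name: scratch
    emptyDir: {}
EOF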
  • How Pods are implemented
    > The Pod is Kubernetes' atomic scheduling unit; the Kubernetes scheduler computes resource requirements uniformly per Pod, not per container
    > The essence of a Docker container: "Namespace isolation, Cgroups limits, a rootfs file system"

    # Display the system's running processes as a tree
    $ pstree - display a tree of processes

    > The container "single-process model" does not mean a container can only run "one" process; it means a container has no ability to manage multiple processes
    > Within Kubernetes, the Pod embodies a "container design pattern"
    > A Pod is really a group of containers sharing certain resources; all containers in a Pod share one Network Namespace and can declare shared Volumes.
    > A Pod plays the role of the "virtual machine" of traditional infrastructure, with containers as the user programs running inside that VM
    • The Pod Infra container
      Pod Infra
      > Kubernetes implements the Pod through an intermediate container called the Infra container; within a Pod, the Infra container is always the first to be created
      > The Infra container uses the k8s.gcr.io/pause image, written in assembly: a container that stays permanently "paused", only 100~200 KB when uncompressed
      > The Pod's lifecycle is tied only to the Infra container and is independent of containers A and B
    • Container design patterns
      # sidecar
      > we can start an auxiliary container in a Pod to handle work that is independent of the main container (a sketch follows below)

      # Istio - a microservice governance project
      > uses sidecar containers to implement microservice governance
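
      A minimal sidecar sketch (illustrative): the main container writes a log file into a shared emptyDir while the sidecar streams it, entirely outside the application's own code:

$ kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: sidecar-demo
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "while true; do date >> /var/log/app.log; sleep 2; done"]
    volumeMounts:
    - name: logs
      mountPath: /var/log
  - name: log-tailer        # the sidecar
    image: busybox
    command: ["sh", "-c", "tail -F /var/log/app.log"]
    volumeMounts:
    - name: logs
      mountPath: /var/log
  volumes:
  - name: logs
    emptyDir: {}
EOF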
  • Deep dive into the Pod object (1): basic concepts
    > The Pod is Kubernetes' smallest orchestration unit
    > A Pod plays the role of the "virtual machine" of a traditional deployment environment

    # Pod attributes
    - scheduling, networking, storage, security
    - NodeSelector (a field that binds Pods to Nodes)
    - NodeName: (the result of scheduling)
    - anything about a Pod's containers sharing the host's Namespaces must be defined at the Pod level

    # Container attributes
    - ImagePullPolicy: the image pull policy

    # A Pod object's lifecycle in Kubernetes
    - phase:
    - Pending: the Pod's YAML has been submitted to Kubernetes; the API object has been created and saved in etcd
    - Running: the Pod has been scheduled successfully and bound to a specific node
    - Succeeded: all containers in the Pod ran to completion and have exited
    - Failed: at least one container in the Pod exited abnormally
    - Unknown: the Pod's status can no longer be reported by kubelet to kube-apiserver
    - status.conditions
    - PodScheduled:
    - Initialized
    - Ready
    - ContainersReady
    # Volume
    - Projected Volume
    - Secret: encrypted data stored in etcd; Pods access a Secret by mounting it as a volume; the data a Secret object stores is base64-encoded
    - ConfigMap: holds configuration the application needs that requires no encryption
    - Downward API: exposes information about the Pod API object itself (a sketch follows below)
    - spec.nodeName: the host's name
    - status.hostIP: the host's IP
    - metadata.name: the Pod's name
    - metadata.namespace: the Pod's Namespace
    - status.podIP: the Pod's IP
    - spec.serviceAccountName:
    - metadata.uid
    - metadata.labels
    - metadata.annotations
    - ServiceAccountToken
    > a Service Account object is a built-in "service account" in Kubernetes, the object through which Kubernetes assigns permissions

    Information obtained through environment variables does not update automatically; prefer obtaining it through volume files
    Running the Kubernetes client as a container inside the cluster and letting the default Service Account authorize it automatically is called "InClusterConfig", the recommended authorization method for programming against the Kubernetes API
    # Container health checks and recovery
    - Probe

    > Kubernetes has no equivalent of Docker's Stop semantics: although the recovery is called a "restart", it actually recreates the container
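
    A minimal liveness-probe sketch (following the pattern in the Kubernetes docs): the probe starts failing once /tmp/healthy disappears, and kubelet then recreates the container according to the Pod's restartPolicy:

$ kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: liveness-demo
spec:
  containers:
  - name: liveness
    image: busybox
    args: ["sh", "-c", "touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600"]
    livenessProbe:
      exec:
        command: ["cat", "/tmp/healthy"]
      initialDelaySeconds: 5
      periodSeconds: 5
EOF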

Kubernetes Skill Map


Awesome Tools

WarmUp

  • Once you pursue universality for a project, you must get the design right from the top down

Terminology

  • PaaS (Platform as a Service)
  • BaaS (Backend as a Service)
  • GA (General Availability)

References