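I. K8S Architecture and Components
1. K8S Architecture
The overall k8s architecture follows the classic master/slave pattern, with master nodes and worker nodes. A node can be either a virtual machine or a physical machine.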
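K8S Components
Master node components
Kube-apiserver
Exposes the Kubernetes API. Every resource request or operation goes through the interface provided by kube-apiserver. It serves an HTTP RESTful API: all create, delete, update, query, and watch operations on resource objects are handled by the API Server, which then persists them to etcd.
The API Server can be thought of as the request entry point of K8S. It receives all K8S requests (from the UI or from CLI tools) and, based on the specific request, notifies the other components to do the work. In that sense the API Server is the brain of the K8S cluster architecture.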
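Kube-controller-manager
Runs the management controllers. It hosts the background threads that handle routine tasks in a K8S cluster and acts as the automation control center for all resource objects in the cluster. In a K8S cluster each resource type has a corresponding controller, and the Controller Manager is responsible for managing these controllers.
It consists of a series of controllers that monitor the state of the whole cluster through the API Server and keep the cluster in its expected working state. For example, when a Node goes down unexpectedly, the Controller Manager detects this promptly and runs an automated repair procedure so that the cluster returns to the expected state.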
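The "keep the cluster in its expected state" behavior can be pictured as a reconcile loop. The following is only a minimal sketch under assumptions: `ClusterState` and `reconcile` are hypothetical names invented for illustration, and an in-memory set stands in for what a real controller would read and write through the API Server.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class ClusterState:
    """Hypothetical stand-in for the state a controller observes via the API Server."""
    desired_replicas: int
    running_pods: set = field(default_factory=set)
    _ids: itertools.count = field(default_factory=itertools.count)


def reconcile(state: ClusterState) -> list:
    """Compare desired and actual state and return the corrective actions taken."""
    actions = []
    # Too few Pods (for example after a Node went down): create replacements.
    while len(state.running_pods) < state.desired_replicas:
        name = f"pod-{next(state._ids)}"
        state.running_pods.add(name)
        actions.append(f"create {name}")
    # Too many Pods (for example after scaling down): remove the surplus.
    while len(state.running_pods) > state.desired_replicas:
        name = sorted(state.running_pods)[-1]
        state.running_pods.remove(name)
        actions.append(f"delete {name}")
    return actions


if __name__ == "__main__":
    state = ClusterState(desired_replicas=3)
    print(reconcile(state))              # initial creation
    state.running_pods.discard("pod-1")  # simulate a Pod lost with its Node
    print(reconcile(state))              # the loop repairs the difference
```

Calling `reconcile` again when nothing has changed returns an empty list, which is the property that lets such a loop run continuously.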
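Kube-scheduler
The process responsible for resource scheduling: it uses a scheduling algorithm to select a suitable Node for each newly created Pod. It can be thought of as the scheduler for all Nodes in K8S. When a user deploys a service, the Scheduler picks the most suitable Node for the Pod in two phases: predicates (filter out Nodes that cannot run the Pod) and priorities (score the remaining Nodes and choose the best one).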
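The two phases can be sketched in a few lines. This is a toy illustration, not the real scheduler: the `Node` and `PodRequest` types, the field names, and the scoring rule (prefer the Node with the most CPU left, then the most memory) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    free_cpu_millicores: int
    free_memory_mib: int


@dataclass
class PodRequest:
    cpu_millicores: int
    memory_mib: int


def filter_nodes(pod: PodRequest, nodes: list) -> list:
    """Predicate phase: drop Nodes that cannot satisfy the Pod's requests."""
    return [
        n for n in nodes
        if n.free_cpu_millicores >= pod.cpu_millicores
        and n.free_memory_mib >= pod.memory_mib
    ]


def score(pod: PodRequest, node: Node) -> tuple:
    """Priority phase (toy rule): more CPU left is better, memory breaks ties."""
    return (
        node.free_cpu_millicores - pod.cpu_millicores,
        node.free_memory_mib - pod.memory_mib,
    )


def schedule(pod: PodRequest, nodes: list) -> Optional[str]:
    """Return the name of the chosen Node, or None if no Node fits."""
    feasible = filter_nodes(pod, nodes)
    if not feasible:
        return None
    return max(feasible, key=lambda n: score(pod, n)).name


if __name__ == "__main__":
    nodes = [
        Node("node-a", 500, 1024),
        Node("node-b", 2000, 4096),
        Node("node-c", 4000, 256),
    ]
    print(schedule(PodRequest(1000, 512), nodes))
```

In this example only `node-b` passes the predicate phase, because `node-a` lacks CPU and `node-c` lacks memory.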
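etcd
The storage service of K8S. etcd is a distributed key-value store that holds the critical configuration of K8S as well as user configuration. Within K8S only the API Server has read and write access to it; every other component must go through the API Server's interface to read or write data.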
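Node components
In a Kubernetes cluster, every Node (also called a Worker Node) runs a kubelet service process. This process handles the tasks the Master sends to the node and manages Pods and the containers in them. Each kubelet registers its node's information with the API Server, periodically reports the node's resource usage to the Master, and monitors container and node resources through cAdvisor.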
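Kubelet
The monitor of a Node and its communicator with the Master node. Kubelet is the "informant" that the Master plants on each Node: it periodically reports the state of the services running on its Node to the API Server and takes corrective action on instructions from the Master. It obtains from the Master the desired state of the Pods on its node (which containers to run, how many replicas, how networking and storage are configured, and so on) and interacts directly with the container engine to manage the container lifecycle. If the state of a Pod on its node differs from the desired state, it calls the corresponding container platform interface (for example, docker's interface) to reach that state. It also cleans up images and containers, so that images do not fill the disk and exited containers do not hold too many resources.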
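Kube-Proxy
Implements the Pod network proxy on every Node. It is the carrier of Kubernetes Service resources and is responsible for maintaining network rules and layer-4 load balancing. It writes rules into iptables or ipvs to map Service access to the backing Pods. Kube-Proxy itself does not provide the Pod network; Pod networking is set up through the kubelet and the network plugin, and what Kube-Proxy actually maintains is the virtual Service network of the cluster. Kube-Proxy watches kube-apiserver for changes to Kubernetes Services and their endpoints and updates its rules accordingly. Load balancing for microservices inside a K8S cluster is implemented by Kube-proxy: it is the cluster's internal load balancer, a distributed proxy server with one Kube-proxy instance running on every node.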
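As a rough mental model of the Service-to-Pod mapping, the sketch below keeps a table from a Service name to its endpoints and hands out backends in round-robin order. It is an assumption-laden toy: `ServiceProxy`, its methods, and the addresses are hypothetical, and the real Kube-proxy programs iptables or ipvs rules in the kernel rather than routing requests in Python.

```python
import itertools


class ServiceProxy:
    """Toy stand-in for the per-node rules that map a Service to its Pods."""

    def __init__(self):
        self._backends = {}  # service name -> list of "ip:port" endpoints
        self._cursors = {}   # service name -> round-robin iterator

    def update_endpoints(self, service: str, endpoints: list) -> None:
        """Called when a watch on the API Server reports changed endpoints."""
        self._backends[service] = list(endpoints)
        self._cursors[service] = itertools.cycle(self._backends[service])

    def route(self, service: str) -> str:
        """Pick the next backend Pod for a connection to the Service."""
        if not self._backends.get(service):
            raise LookupError(f"no endpoints for service {service!r}")
        return next(self._cursors[service])


if __name__ == "__main__":
    proxy = ServiceProxy()
    proxy.update_endpoints("web", ["10.244.1.5:80", "10.244.2.7:80"])
    for _ in range(3):
        print(proxy.route("web"))
```

Replacing the endpoint list on every update mirrors the idea that each node only needs the latest set of Pods behind a Service, not a history of changes.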
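Container Runtime
The container engine, such as docker or containerd. It runs containers and is responsible for creating and managing containers on the local machine. When kubernetes schedules a pod onto a node, the kubelet on that node instructs docker to start the specified containers. The kubelet then continuously collects container information through docker and reports it to the master node. docker pulls images and starts or stops containers as usual; the only difference is that an automated system drives it instead of an administrator working by hand on each node.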
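Pod
A concept specific to k8s. A Pod can be understood as a wrapper around containers and is the basic scheduling unit of k8s. A Pod represents a process running in the cluster, and the actual containers run inside Pods. Think of a Pod as a pea pod and each container in it as a pea. A node can run one or more Pods. In production, a Pod usually consists of a single container or of several tightly coupled, complementary containers.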
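II. Pod Creation Process
Step 1: The client submits a request to create a Pod, either by calling the API Server's REST API or through the kubectl command-line tool, for example kubectl apply -f filename.yaml (the resource manifest file).
Step 2: After receiving the request, apiserver writes the attribute information from the YAML (the metadata) into etcd.
Step 3: Through the watch mechanism, the scheduler learns from apiserver that a new Pod is waiting for a Node. The scheduler applies a set of rules to filter out hosts that do not meet the requirements; for example, if the Pod specifies the resources it needs, hosts with fewer available resources than requested are filtered out. The scheduler then uses its scheduling algorithm to choose a node and sends the node information to apiserver, which writes the binding into etcd.
Step 4: Again through the watch mechanism, the kubelet on the chosen node receives the Pod information from apiserver and calls the Docker API to create and start the containers in the Pod.
Step 5: Once the containers have been created on the worker, the result is reported back to that node's kubelet. The kubelet sends the Pod's status to apiserver, and apiserver writes the status into etcd.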
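The five steps can be replayed with a small in-memory simulation. Everything here is an illustrative assumption: `ApiServer`, `make_scheduler`, and `make_kubelet` are hypothetical names, a dict stands in for etcd, plain callbacks stand in for the watch mechanism, and the "scheduler" simply takes the first Node instead of filtering and scoring.

```python
class ApiServer:
    """Toy API Server: the only component that writes to the etcd stand-in."""

    def __init__(self):
        self.store = {}     # etcd stand-in: pod name -> pod object
        self.watchers = []  # callbacks registered by scheduler and kubelets
        self.log = []       # human-readable trace of every write

    def watch(self, callback):
        self.watchers.append(callback)

    def write(self, pod, who):
        # Persist first, then notify every watcher with its own copy.
        self.store[pod["name"]] = dict(pod)
        self.log.append(
            f"{who}: {pod['name']} node={pod['node']} phase={pod['phase']}"
        )
        for callback in list(self.watchers):
            callback(dict(pod))


def make_scheduler(api, nodes):
    def on_event(pod):
        if pod["node"] is None:     # only Pods without a binding
            pod["node"] = nodes[0]  # a real scheduler filters and scores
            api.write(pod, "scheduler")
    return on_event


def make_kubelet(api, node_name):
    def on_event(pod):
        if pod["node"] == node_name and pod["phase"] == "Pending":
            pod["phase"] = "Running"  # stands in for starting the containers
            api.write(pod, f"kubelet@{node_name}")
    return on_event


if __name__ == "__main__":
    api = ApiServer()
    api.watch(make_scheduler(api, ["node-1"]))
    api.watch(make_kubelet(api, "node-1"))
    api.write({"name": "web", "node": None, "phase": "Pending"}, "kubectl")
    print("\n".join(api.log))
```

The point of the sketch is the direction of communication: the scheduler and the kubelet never talk to each other or to the store directly, they only react to what they observe through the API Server and write their results back to it.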
III. Pod Resource Manifest in Detail
1. Introduction to the Pod resource manifest
# List the fields an object contains
[root@k8s-master ~]# kubectl explain pod.spec.containers
KIND:       Pod
VERSION:    v1

FIELD: containers <[]Container>

DESCRIPTION:
    List of containers belonging to the pod. Containers cannot currently be
    added or removed. There must be at least one container in a Pod. Cannot be
    updated.
    A single application container that you want to run within a pod.

FIELDS:
  args <[]string>
    Arguments to the entrypoint. The container image's CMD is used if this is
    not provided. Variable references $(VAR_NAME) are expanded using the
    container's environment. If a variable cannot be resolved, the reference in
    the input string will be unchanged. Double $$ are reduced to a single $,
    which allows for escaping the $(VAR_NAME) syntax: i.e. "$$(VAR_NAME)" will
    produce the string literal "$(VAR_NAME)". Escaped references will never be
    expanded, regardless of whether the variable exists or not. Cannot be
    updated. More info:
    https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/#running-a-command-in-a-shell

  command <[]string>
    Entrypoint array. Not executed within a shell. The container image's
    ENTRYPOINT is used if this is not provided. Variable references $(VAR_NAME)
    are expanded using the container's environment. If a variable cannot be
    resolved, the reference in the input string will be unchanged. Double $$ are
    reduced to a single $, which allows for escaping the $(VAR_NAME) syntax:
    i.e. "$$(VAR_NAME)" will produce the string literal "$(VAR_NAME)". Escaped
    references will never be expanded, regardless of whether the variable exists
    or not. Cannot be updated. More info:
    https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/#running-a-command-in-a-shell

  env <[]EnvVar>
    List of environment variables to set in the container. Cannot be updated.

  envFrom <[]EnvFromSource>
    List of sources to populate environment variables in the container. The keys
    defined within a source must be a C_IDENTIFIER. All invalid keys will be
    reported as an event when the container is starting. When a key exists in
    multiple sources, the value associated with the last source will take
    precedence. Values defined by an Env with a duplicate key will take
    precedence. Cannot be updated.

  image <string>
    Container image name. More info:
    https://kubernetes.io/docs/concepts/containers/images This field is optional
    to allow higher level config management to default or override container
    images in workload controllers like Deployments and StatefulSets.

  imagePullPolicy <string>
    Image pull policy. One of Always, Never, IfNotPresent. Defaults to Always if
    :latest tag is specified, or IfNotPresent otherwise. Cannot be updated. More
    info: https://kubernetes.io/docs/concepts/containers/images#updating-images

    Possible enum values:
     - `"Always"` means that kubelet always attempts to pull the latest image.
       Container will fail If the pull fails.
     - `"IfNotPresent"` means that kubelet pulls if the image isn't present on
       disk. Container will fail if the image isn't present and the pull fails.
     - `"Never"` means that kubelet never pulls an image, but only uses a local
       image. Container will fail if the image isn't present

  lifecycle <Lifecycle>
    Actions that the management system should take in response to container
    lifecycle events. Cannot be updated.

  livenessProbe <Probe>
    Periodic probe of container liveness. Container will be restarted if the
    probe fails. Cannot be updated. More info:
    https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

  name <string> -required-
    Name of the container specified as a DNS_LABEL. Each container in a pod must
    have a unique name (DNS_LABEL). Cannot be updated.

  ports <[]ContainerPort>
    List of ports to expose from the container. Not specifying a port here DOES
    NOT prevent that port from being exposed. Any port which is listening on the
    default "0.0.0.0" address inside a container will be accessible from the
    network. Modifying this array with strategic merge patch may corrupt the
    data. For more information See
    https://github.com/kubernetes/kubernetes/issues/108255. Cannot be updated.

  readinessProbe <Probe>
    Periodic probe of container service readiness. Container will be removed
    from service endpoints if the probe fails. Cannot be updated. More info:
    https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

  resizePolicy <[]ContainerResizePolicy>
    Resources resize policy for the container.

  resources <ResourceRequirements>
    Compute Resources required by this container. Cannot be updated. More info:
    https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/

  restartPolicy <string>
    RestartPolicy defines the restart behavior of individual containers in a
    pod. This field may only be set for init containers, and the only allowed
    value is "Always". For non-init containers or when this field is not
    specified, the restart behavior is defined by the Pod's restart policy and
    the container type. Setting the RestartPolicy as "Always" for the init
    container will have the following effect: this init container will be
    continually restarted on exit until all regular containers have terminated.
    Once all regular containers have completed, all init containers with
    restartPolicy "Always" will be shut down. This lifecycle differs from normal
    init containers and is often referred to as a "sidecar" container. Although
    this init container still starts in the init container sequence, it does not
    wait for the container to complete before proceeding to the next init
    container. Instead, the next init container starts immediately after this
    init container is started, or after any startupProbe has successfully
    completed.

  securityContext <SecurityContext>
    SecurityContext defines the security options the container should be run
    with. If set, the fields of SecurityContext override the equivalent fields
    of PodSecurityContext. More info:
    https://kubernetes.io/docs/tasks/configure-pod-container/security-context/

  startupProbe <Probe>
    StartupProbe indicates that the Pod has successfully initialized. If
    specified, no other probes are executed until this completes successfully.
    If this probe fails, the Pod will be restarted, just as if the livenessProbe
    failed. This can be used to provide different probe parameters at the
    beginning of a Pod's lifecycle, when it might take a long time to load data
    or warm a cache, than during steady-state operation. This cannot be updated.
    More info:
    https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

  stdin <boolean>
    Whether this container should allocate a buffer for stdin in the container
    runtime. If this is not set, reads from stdin in the container will always
    result in EOF. Default is false.

  stdinOnce <boolean>
    Whether the container runtime should close the stdin channel after it has
    been opened by a single attach. When stdin is true the stdin stream will
    remain open across multiple attach sessions. If stdinOnce is set to true,
    stdin is opened on container start, is empty until the first client attaches
    to stdin, and then remains open and accepts data until the client
    disconnects, at which time stdin is closed and remains closed until the
    container is restarted. If this flag is false, a container processes that
    reads from stdin will never receive an EOF. Default is false

  terminationMessagePath <string>
    Optional: Path at which the file to which the container's termination
    message will be written is mounted into the container's filesystem. Message
    written is intended to be brief final status, such as an assertion failure
    message. Will be truncated by the node if greater than 4096 bytes. The total
    message length across all containers will be limited to 12kb. Defaults to
    /dev/termination-log. Cannot be updated.

  terminationMessagePolicy <string>
    Indicate how the termination message should be populated. File will use the
    contents of terminationMessagePath to populate the container status message
    on both success and failure. FallbackToLogsOnError will use the last chunk
    of container log output if the termination message file is empty and the
    container exited with an error. The log output is limited to 2048 bytes or
    80 lines, whichever is smaller. Defaults to File. Cannot be updated.

    Possible enum values:
     - `"FallbackToLogsOnError"` will read the most recent contents of the
       container logs for the container status message when the container exits
       with an error and the terminationMessagePath has no contents.
     - `"File"` is the default behavior and will set the container status
       message to the contents of the container's terminationMessagePath when the
       container exits.

  tty <boolean>
    Whether this container should allocate a TTY for itself, also requires
    'stdin' to be true. Default is false.

  volumeDevices <[]VolumeDevice>
    volumeDevices is the list of block devices to be used by the container.

  volumeMounts <[]VolumeMount>
    Pod volumes to mount into the container's filesystem. Cannot be updated.

  workingDir <string>
    Container's working directory. If not specified, the container runtime's
    default will be used, which might be configured in the container image.
    Cannot be updated.
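2. Pod YAML file example
# Complete contents of a Pod definition file in YAML format:
apiVersion: v1                 # Required. API version, e.g. v1
kind: Pod                      # Required. Pod
metadata:                      # Required. Metadata
  name: string                 # Required. Pod name
  namespace: string            # Namespace the Pod belongs to; "default" is used if omitted
  labels:                      # Custom labels
    name: string               # Custom label name
  annotations:                 # Custom annotations (key/value pairs)
    name: string
spec:                          # Required. Detailed definition of the containers in the Pod
  containers:                  # Required. List of containers in the Pod
  - name: string               # Required. Container name
    image: string              # Required. Container image name
    imagePullPolicy: [Always | Never | IfNotPresent]  # Image pull policy: Always pulls the image every time, IfNotPresent prefers the local image and pulls only when it is missing, Never uses only the local image
    command: [string]          # Startup command list; if omitted, the command baked into the image is used
    args: [string]             # Arguments for the startup command
    workingDir: string         # Working directory of the container
    volumeMounts:              # Volumes mounted inside the container
    - name: string             # Name of a shared volume defined by the Pod; must match a volume name in volumes[]
      mountPath: string        # Absolute mount path inside the container; should be shorter than 512 characters
      readOnly: boolean        # Whether to mount read-only
    ports:                     # List of ports to expose
    - name: string             # Port name
      containerPort: int       # Port the container listens on
      hostPort: int            # Port to listen on on the host running the container; optional, and must match containerPort when hostNetwork is true
      protocol: string         # Port protocol: TCP, UDP or SCTP; defaults to TCP
    env:                       # Environment variables to set before the container runs
    - name: string             # Environment variable name
      value: string            # Environment variable value
    resources:                 # Resource limits and requests
      limits:                  # Resource limits
        cpu: string            # CPU limit in cores; enforced as a CPU quota (docker run --cpu-quota)
        memory: string         # Memory limit, in units such as Mi or Gi; passed to docker run --memory
      requests:                # Resource requests
        cpu: string            # CPU request: the amount reserved for the container at scheduling time (docker run --cpu-shares)
        memory: string         # Memory request: the amount reserved for the container at scheduling time
    livenessProbe:             # Health check for each container in the Pod; after several failed probes the container is restarted automatically. Available methods are exec, httpGet and tcpSocket; set only one of them per container
      exec:                    # Use the exec method
        command: [string]      # Command or script to run for the exec method
      httpGet:                 # Use the httpGet method; path and port must be specified
        path: string
        port: number
        host: string
        scheme: string
        httpHeaders:
        - name: string
          value: string
      tcpSocket:               # Use the tcpSocket method
        port: number
      initialDelaySeconds: 0   # Delay before the first probe after the container has started, in seconds
      timeoutSeconds: 0        # Timeout when waiting for a probe response, in seconds; defaults to 1
      periodSeconds: 0         # Interval between probes, in seconds; defaults to 10
      successThreshold: 0
      failureThreshold: 0
    securityContext:
      privileged: false
  restartPolicy: [Always | Never | OnFailure]  # Pod restart policy: Always means kubelet restarts the container however it terminated, OnFailure restarts it only when it exits with a non-zero exit code, Never means it is not restarted
  nodeSelector: object         # Schedules the Pod onto nodes carrying these labels, given as key: value pairs
  imagePullSecrets:            # Names of the secrets used when pulling images
  - name: string
  hostNetwork: false           # Whether to use host network mode; defaults to false, true means the Pod uses the host's network
  volumes:                     # Shared volumes defined on the Pod
  - name: string               # Shared volume name (there are many volume types)
    emptyDir: {}               # emptyDir volume: a temporary directory with the same lifetime as the Pod; the value is empty
    hostPath:                  # hostPath volume: mounts a directory of the host the Pod runs on
      path: string             # Directory on the host, which is mounted into the container
    secret:                    # secret volume: mounts a predefined secret object of the cluster into the container
      secretName: string
      items:
      - key: string
        path: string
    configMap:                 # configMap volume: mounts a predefined configMap object into the container
      name: string
      items:
      - key: string
        path: string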
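Most of the fields above are optional. As a hedged illustration of how small a usable manifest can be, the sketch below builds one as a Python dict and prints it as JSON, which kubectl apply -f also accepts. The helper `minimal_pod`, the Pod name `nginx-demo`, the label `app: nginx`, and the image tag are assumptions chosen for the example, not values from this article.

```python
import json


def minimal_pod(name, image, labels=None, namespace="default", container_port=80):
    """Build a small Pod manifest containing only commonly needed fields."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "labels": dict(labels or {}),
        },
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "imagePullPolicy": "IfNotPresent",
                    "ports": [{"containerPort": container_port}],
                }
            ],
            "restartPolicy": "Always",
        },
    }


if __name__ == "__main__":
    # Hypothetical values; redirect the output to a file and pass it to
    # kubectl apply -f to try it on a test cluster.
    pod = minimal_pod("nginx-demo", "nginx:1.25", {"app": "nginx"})
    print(json.dumps(pod, indent=2))
```

Generating the manifest from code is only one option; writing the same structure by hand in YAML gives an equivalent result.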
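IV. Labels
1. What is a label?
A label is simply a key/value pair attached to an object such as a Pod. Labels are meant to express identifying characteristics of an object, so that you can tell at a glance what a Pod is for. They can be used to group objects (for example by version or service type). Labels can be defined when an object is created and changed at any time afterwards. An object can carry multiple labels, but each key must be unique on that object. Once labels exist, resources can be managed in groups: after labeling pods you can list or delete exactly the pods that carry a given label (use deletion by label with caution!). In k8s, most resources can be labeled.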
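Selecting objects by label comes down to comparing key/value pairs. The sketch below shows equality-based matching only; `matches`, `select`, and the sample Pod names are hypothetical, and it does not model the set-based selector syntax that Kubernetes also supports.

```python
def matches(labels: dict, selector: dict) -> bool:
    """True if the object carries every key/value pair named in the selector."""
    return all(labels.get(key) == value for key, value in selector.items())


def select(pods: dict, selector: dict) -> list:
    """Return the names of the Pods whose labels satisfy the selector."""
    return sorted(name for name, labels in pods.items() if matches(labels, selector))


if __name__ == "__main__":
    pods = {
        "web-1": {"app": "web", "version": "v1"},
        "web-2": {"app": "web", "version": "v2"},
        "db-1": {"app": "db"},
    }
    print(select(pods, {"app": "web"}))
    print(select(pods, {"app": "web", "version": "v2"}))
```

In this toy an empty selector matches every Pod, which is one reason bulk operations driven by labels deserve the caution mentioned above.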
2. Labeling Pod resources
Modify via a YAML file
Modify via the command line