2018-04-13

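A look at kubectl describe node on Azure AKS

The article "Kubernetesで実際のメモリを超えるコンテナアプリを動かすと、どうなるか？" ("What happens when you run container apps that exceed the actual memory on Kubernetes?") on the blog あさのひとりごと explains resource scheduling. It seemed a bit of a waste that it does not include the raw kubectl describe node output, which is the most important piece, so I am posting it here.

The whole flow, from preparing AKS to deleting it, looks like this.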

az login
az provider register -n Microsoft.ContainerService
az provider register -n Microsoft.Network 
az provider register -n Microsoft.Compute
az provider register -n Microsoft.Storage

az group list
az group create -n testaks -l eastus
az aks get-versions -l eastus
az aks create -g testaks -n testaks --node-count 2 --kubernetes-version 1.9.6
az aks install-cli
az aks get-credentials -g testaks -n testaks

kubectl get node
kubectl describe node

az aks delete -g testaks -n testaks

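The kubectl describe node output is below. Allocatable memory is 3399192Ki (about 3319Mi), and the Memory Requests under Allocated resources are 290Mi on the node ending in 0 and 294Mi on the node ending in 1, so in this example each node has roughly 3000Mi left to allocate. If you try to schedule several pods with requests.memory of 1.5Gi (1536Mi) in this state, one pod fits on each node and the third should end up as FailedScheduling. With 1500Mi, two pods fit on each node, so four pods can be scheduled. Where the kube-system pods deployed at cluster creation land is not deterministic, so they can end up skewed toward one node, which would give the situation in the original article where three pods are scheduled and the fourth fails.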

Name:               aks-nodepool1-16184948-0
Roles:              agent
Labels:             agentpool=nodepool1
                    beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=Standard_DS1_v2
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=eastus
                    failure-domain.beta.kubernetes.io/zone=0
                    kubernetes.azure.com/cluster=MC_testaks_testaks_eastus
                    kubernetes.io/hostname=aks-nodepool1-16184948-0
                    kubernetes.io/role=agent
                    storageprofile=managed
                    storagetier=Premium_LRS
Annotations:        node.alpha.kubernetes.io/ttl=0
                    volumes.kubernetes.io/controller-managed-attach-detach=true
CreationTimestamp:  Fri, 13 Apr 2018 05:10:06 +0000
Taints:             <none>
Unschedulable:      false
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Fri, 13 Apr 2018 05:11:13 +0000   Fri, 13 Apr 2018 05:11:13 +0000   RouteCreated                 RouteController created a route
  OutOfDisk            False   Fri, 13 Apr 2018 05:23:02 +0000   Fri, 13 Apr 2018 05:10:06 +0000   KubeletHasSufficientDisk     kubelet has sufficient disk space available
  MemoryPressure       False   Fri, 13 Apr 2018 05:23:02 +0000   Fri, 13 Apr 2018 05:10:06 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Fri, 13 Apr 2018 05:23:02 +0000   Fri, 13 Apr 2018 05:10:06 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  Ready                True    Fri, 13 Apr 2018 05:23:02 +0000   Fri, 13 Apr 2018 05:11:08 +0000   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  10.240.0.4
  Hostname:    aks-nodepool1-16184948-0
Capacity:
 alpha.kubernetes.io/nvidia-gpu:  0
 cpu:                             1
 memory:                          3501592Ki
 pods:                            110
Allocatable:
 alpha.kubernetes.io/nvidia-gpu:  0
 cpu:                             1
 memory:                          3399192Ki
 pods:                            110
System Info:
 Machine ID:                 2c3d39f8fac841cb9df23cf4453420a9
 System UUID:                707CF566-AC50-D649-9A23-F6A03C86DE52
 Boot ID:                    ec9a15b2-af2e-4f84-92a5-5525d78c20f8
 Kernel Version:             4.13.0-1011-azure
 OS Image:                   Debian GNU/Linux 9 (stretch)
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://1.13.1
 Kubelet Version:            v1.9.6
 Kube-Proxy Version:         v1.9.6
PodCIDR:                     10.244.0.0/24
ExternalID:                  /subscriptions/31c0faff-6b3e-4b51-86e2-6c9595e05454/resourceGroups/MC_testaks_testaks_eastus/providers/Microsoft.Compute/virtualMachines/aks-nodepool1-16184948-0
ProviderID:                  azure:///subscriptions/31c0faff-6b3e-4b51-86e2-6c9595e05454/resourceGroups/MC_testaks_testaks_eastus/providers/Microsoft.Compute/virtualMachines/aks-nodepool1-16184948-0
Non-terminated Pods:         (6 in total)
  Namespace                  Name                                     CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------                  ----                                     ------------  ----------  ---------------  -------------
  kube-system                kube-dns-v20-7c556f89c5-9grpn            110m (11%)    0 (0%)      120Mi (3%)       220Mi (6%)
  kube-system                kube-dns-v20-7c556f89c5-s8x25            110m (11%)    0 (0%)      120Mi (3%)       220Mi (6%)
  kube-system                kube-proxy-4n2xd                         100m (10%)    0 (0%)      0 (0%)           0 (0%)
  kube-system                kube-svc-redirect-qwrtf                  0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system                kubernetes-dashboard-546f987686-khjvt    100m (10%)    100m (10%)  50Mi (1%)        50Mi (1%)
  kube-system                tunnelfront-6f9ff58869-jxcfn             0 (0%)        0 (0%)      0 (0%)           0 (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ------------  ----------  ---------------  -------------
  420m (42%)    100m (10%)  290Mi (8%)       490Mi (14%)
Events:
  Type    Reason                   Age                From                                  Message
  ----    ------                   ----               ----                                  -------
  Normal  Starting                 15m                kubelet, aks-nodepool1-16184948-0     Starting kubelet.
  Normal  NodeAllocatableEnforced  15m                kubelet, aks-nodepool1-16184948-0     Updated Node Allocatable limit across pods
  Normal  NodeHasNoDiskPressure    14m (x7 over 15m)  kubelet, aks-nodepool1-16184948-0     Node aks-nodepool1-16184948-0 status is now: NodeHasNoDiskPressure
  Normal  NodeHasSufficientDisk    13m (x8 over 15m)  kubelet, aks-nodepool1-16184948-0     Node aks-nodepool1-16184948-0 status is now: NodeHasSufficientDisk
  Normal  NodeHasSufficientMemory  13m (x8 over 15m)  kubelet, aks-nodepool1-16184948-0     Node aks-nodepool1-16184948-0 status is now: NodeHasSufficientMemory
  Normal  Starting                 12m                kube-proxy, aks-nodepool1-16184948-0  Starting kube-proxy.


Name:               aks-nodepool1-16184948-1
Roles:              agent
Labels:             agentpool=nodepool1
                    beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=Standard_DS1_v2
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=eastus
                    failure-domain.beta.kubernetes.io/zone=1
                    kubernetes.azure.com/cluster=MC_testaks_testaks_eastus
                    kubernetes.io/hostname=aks-nodepool1-16184948-1
                    kubernetes.io/role=agent
                    storageprofile=managed
                    storagetier=Premium_LRS
Annotations:        node.alpha.kubernetes.io/ttl=0
                    volumes.kubernetes.io/controller-managed-attach-detach=true
CreationTimestamp:  Fri, 13 Apr 2018 05:10:11 +0000
Taints:             <none>
Unschedulable:      false
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Fri, 13 Apr 2018 05:11:13 +0000   Fri, 13 Apr 2018 05:11:13 +0000   RouteCreated                 RouteController created a route
  OutOfDisk            False   Fri, 13 Apr 2018 05:23:03 +0000   Fri, 13 Apr 2018 05:10:11 +0000   KubeletHasSufficientDisk     kubelet has sufficient disk space available
  MemoryPressure       False   Fri, 13 Apr 2018 05:23:03 +0000   Fri, 13 Apr 2018 05:10:11 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Fri, 13 Apr 2018 05:23:03 +0000   Fri, 13 Apr 2018 05:10:11 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  Ready                True    Fri, 13 Apr 2018 05:23:03 +0000   Fri, 13 Apr 2018 05:11:11 +0000   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  10.240.0.5
  Hostname:    aks-nodepool1-16184948-1
Capacity:
 alpha.kubernetes.io/nvidia-gpu:  0
 cpu:                             1
 memory:                          3501592Ki
 pods:                            110
Allocatable:
 alpha.kubernetes.io/nvidia-gpu:  0
 cpu:                             1
 memory:                          3399192Ki
 pods:                            110
System Info:
 Machine ID:                 5b4fb70dfc744759821a92694f9d5993
 System UUID:                45C3EA49-8E12-0146-964D-5EFB5A4E3E8A
 Boot ID:                    e7160774-de86-4a16-9bd3-9f43af1cdd57
 Kernel Version:             4.13.0-1011-azure
 OS Image:                   Debian GNU/Linux 9 (stretch)
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://1.13.1
 Kubelet Version:            v1.9.6
 Kube-Proxy Version:         v1.9.6
PodCIDR:                     10.244.1.0/24
ExternalID:                  /subscriptions/31c0faff-6b3e-4b51-86e2-6c9595e05454/resourceGroups/MC_testaks_testaks_eastus/providers/Microsoft.Compute/virtualMachines/aks-nodepool1-16184948-1
ProviderID:                  azure:///subscriptions/31c0faff-6b3e-4b51-86e2-6c9595e05454/resourceGroups/MC_testaks_testaks_eastus/providers/Microsoft.Compute/virtualMachines/aks-nodepool1-16184948-1
Non-terminated Pods:         (3 in total)
  Namespace                  Name                         CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------                  ----                         ------------  ----------  ---------------  -------------
  kube-system                heapster-6599f48877-mhr5s    138m (13%)    138m (13%)  294Mi (8%)       294Mi (8%)
  kube-system                kube-proxy-2lv22             100m (10%)    0 (0%)      0 (0%)           0 (0%)
  kube-system                kube-svc-redirect-q54pd      0 (0%)        0 (0%)      0 (0%)           0 (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ------------  ----------  ---------------  -------------
  238m (23%)    138m (13%)  294Mi (8%)       294Mi (8%)
Events:
  Type    Reason                   Age                From                                  Message
  ----    ------                   ----               ----                                  -------
  Normal  Starting                 15m                kubelet, aks-nodepool1-16184948-1     Starting kubelet.
  Normal  NodeAllocatableEnforced  15m                kubelet, aks-nodepool1-16184948-1     Updated Node Allocatable limit across pods
  Normal  NodeHasNoDiskPressure    14m (x7 over 15m)  kubelet, aks-nodepool1-16184948-1     Node aks-nodepool1-16184948-1 status is now: NodeHasNoDiskPressure
  Normal  NodeHasSufficientDisk    14m (x8 over 15m)  kubelet, aks-nodepool1-16184948-1     Node aks-nodepool1-16184948-1 status is now: NodeHasSufficientDisk
  Normal  NodeHasSufficientMemory  14m (x8 over 15m)  kubelet, aks-nodepool1-16184948-1     Node aks-nodepool1-16184948-1 status is now: NodeHasSufficientMemory
  Normal  Starting                 12m                kube-proxy, aks-nodepool1-16184948-1  Starting kube-proxy.
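
The scheduling arithmetic described before the output can be sketched in shell. This is only a rough illustration: the function name pods_that_fit is made up here, it looks at memory requests alone, and it ignores CPU requests and every other scheduler predicate. The numbers are the Allocatable and Memory Requests values from the output above.

```shell
# pods_that_fit ALLOCATABLE_KI REQUESTED_MI POD_REQUEST_MI
# Prints how many more pods with the given memory request fit on one node.
# Hypothetical helper: memory requests only, no CPU or other predicates.
pods_that_fit() {
  free_ki=$(( $1 - $2 * 1024 ))
  echo $(( free_ki / ($3 * 1024) ))
}

# Node ending in 0: 3399192Ki allocatable, 290Mi already requested
pods_that_fit 3399192 290 1536
pods_that_fit 3399192 290 1500
# Node ending in 1: 294Mi already requested
pods_that_fit 3399192 294 1536
pods_that_fit 3399192 294 1500
```

With these values it prints 1, 2, 1, 2: one 1536Mi pod or two 1500Mi pods per node, which matches the two-pod and four-pod totals in the text.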

For the basics of resource requests and limits, see the article "OpenShiftのResource requestとlimit" (resource requests and limits in OpenShift).

2018-03-30

Enabling gluster-block on OpenShift CNS

In OpenShift, defining a glusterfs group in the Ansible inventory makes the installer set up CNS (Container-Native Storage).

[glusterfs]
node[01:03].example.com glusterfs_devices='[ "/dev/sda" ]'

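However, this only sets up the ordinary glusterfs filesystem mount. For gluster-block, which provides block devices, apparently only the provisioner gets set up. The node hosts are not configured, so that has to be done separately.

Let's set it up following the CNS documentation.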

$ ansible nodes -b -a "yum install iscsi-initiator-utils device-mapper-multipath rpcbind -y"
$ cat << EOF > multipath.conf 
devices {
        device {
                vendor "LIO-ORG"
                user_friendly_names "yes" # names like mpatha
                path_grouping_policy "failover" # one path per group
                path_selector "round-robin 0"
                failback immediate
                path_checker "tur"
                prio "const"
                no_path_retry 120
                rr_weight "uniform"
        }
}
EOF
$ ansible nodes -b -m copy -a "src=./multipath.conf dest=/etc/multipath.conf"
$ ansible nodes -b -a "mpathconf --enable"
$ ansible nodes -b -a "systemctl restart multipathd rpcbind"
$ ansible nodes -b -a "systemctl enable multipathd rpcbind"
$ oc project glusterfs
$ oc delete pod --all

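If everything is configured properly, gluster-blockd starts inside the glusterfs pods of the DaemonSet. To check, oc rsh into a pod and run systemctl status gluster-blockd.

Define the storageclass and secret for gluster-block. For the secret, I reused the value of heketi-storage-admin-secret, which is already set up for glusterfs.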

$ oc project glusterfs
$ oc export secret heketi-storage-admin-secret > heketi-storage-admin-secret.yaml
$ cp -a heketi-storage-admin-secret.yaml gluster-block-secret.yaml 
$ vi gluster-block-secret.yaml  # change name and type
$ diff heketi-storage-admin-secret.yaml gluster-block-secret.yaml 
7,8c7,8
<   name: heketi-storage-admin-secret
< type: kubernetes.io/glusterfs
---
>   name: gluster-block-secret
> type: gluster.org/glusterblock 

$ cat << EOF > gluster-block.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gluster-block
parameters:
  resturl: http://heketi-storage-glusterfs.apps.s.nekop.io
  restuser: admin
  restsecretname: gluster-block-secret
  restsecretnamespace: glusterfs
provisioner: gluster.org/glusterblock
reclaimPolicy: Delete
EOF
$ oc create -f gluster-block-secret.yaml
$ oc create -f gluster-block.yaml

If creating a PVC produces a PV, it works. If not, check the logs of the glusterblock-storage-provisioner-dc pod in the glusterfs project.

oc run sleep --image=registry.access.redhat.com/rhel7 -- tail -f /dev/null
oc volume dc/sleep --add -t pvc --name=rwo-block --claim-name=rwo-block --mount-path=/rwo-block --claim-size=1Gi --claim-mode=ReadWriteOnce --claim-class=gluster-block
oc get pvc -w
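
oc volume --add with --claim-size creates the claim implicitly. For reference, a roughly equivalent explicit PVC could be generated as below. This is a minimal sketch: the function name pvc_manifest is hypothetical, and the name, size, and storage class are simply the ones used in the commands above.

```shell
# pvc_manifest NAME SIZE STORAGE_CLASS
# Prints a PersistentVolumeClaim manifest (hypothetical helper).
pvc_manifest() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $1
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: $2
  storageClassName: $3
EOF
}

pvc_manifest rwo-block 1Gi gluster-block
```

On a cluster, the output could be piped into oc create -f - and then watched with oc get pvc -w as above.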

2018-03-29

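Creating a build in OpenShift that pushes to an external container registry

This is from Kubernetes / OpenShift もくもく会 (a heads-down hacking meetup) No. 2.

OpenShift Container Platform 3.9 has been released, so I am upgrading my main cluster at work from 3.7 to 3.9. In parallel, I kicked off provisioning of test environments for 3.4, 3.5, 3.6, and 3.7. Once that is set in motion it is basically just monitoring and waiting, so I am using the spare time to write up a different topic here.

OpenShift ships with an integrated container registry, and built images are stored in it by default. Besides integrating with OpenShift RBAC, this registry manages image metadata as ImageStream, an OpenShift custom object, which enables extra features such as history retention and update triggers.

That said, there are cases where you want to use an external registry, so here is how to create an application in that case.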

oc new-build https://github.com/nekop/hello-sinatra --to=registry.example.com:5000/test-exregistry/hello-sinatra:latest --to-docker
oc new-app --docker-image=registry.example.com:5000/test-exregistry/hello-sinatra:latest --insecure-registry
oc tag registry.example.com:5000/test-exregistry/hello-sinatra:latest hello-sinatra:latest --scheduled=true

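Specifying the --to option of new-build like this changes the push destination. new-build alone does not deploy anything, so the second command creates the deployment configuration. The third command sets up polling for image updates. With the OpenShift registry, image updates are detected automatically, but with an external registry you need to enable this polling if you want automatic redeployment when the image is updated.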

2018-02-06

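Building a plugin that dumps OpenShift application information with the Kubernetes CLI plugin mechanism

This was Kubernetes / OpenShift もくもく会 (a heads-down hacking meetup) No. 1.

I had thrown together the script linked here for dumping a specific OpenShift project (namespace) and was using it myself. Other people started using it too, and it reached the point where it needs to be generalized, possibly made part of the OpenShift product, and so on. So I started merging it into the Kubernetes CLI plugin that my colleague Robert had been working on. The repository is below.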

https://github.com/nekop/openshift-sos-plugin

I updated it so that it can take at least a minimal dump, and sent a pull request so that Robert can look it over first.

https://github.com/bostrt/openshift-sos-plugin/pull/1

Running it as follows generates a dump file. It is an archive containing dumps of most of the objects needed for troubleshooting, the event log, and all pod logs.

$ oc plugin sos -n logging
Data capture complete and archived in /tmp/oc-sos-logging-20180206-210447.tar.xz
$ tar tf /tmp/oc-sos-logging-20180206-210447.tar.xz
logging/
logging/pods-logging-kibana-3-bwlw2_kibana-proxy.previous.log
logging/pods-logging-kibana-3-bwlw2_kibana-proxy.log
logging/pods-logging-kibana-3-bwlw2_kibana.previous.log
logging/pods-logging-kibana-3-bwlw2_kibana.log
logging/pods-logging-fluentd-rwgjv_fluentd-elasticsearch.previous.log
logging/pods-logging-fluentd-rwgjv_fluentd-elasticsearch.log
logging/pods-logging-fluentd-6fbm9_fluentd-elasticsearch.previous.log
logging/pods-logging-fluentd-6fbm9_fluentd-elasticsearch.log
logging/pods-logging-fluentd-4b5n5_fluentd-elasticsearch.previous.log
logging/pods-logging-fluentd-4b5n5_fluentd-elasticsearch.log
logging/pods-logging-fluentd-2fxhc_fluentd-elasticsearch.previous.log
logging/pods-logging-fluentd-2fxhc_fluentd-elasticsearch.log
logging/pods-logging-es-data-master-ye3u4nvd-3-q6md6_elasticsearch.previous.log
logging/pods-logging-es-data-master-ye3u4nvd-3-q6md6_elasticsearch.log
logging/pods-logging-es-data-master-ye3u4nvd-3-q6md6_proxy.previous.log
logging/pods-logging-es-data-master-ye3u4nvd-3-q6md6_proxy.log
logging/oc-get-all.txt
logging/oc-get-all.yaml
logging/oc-get-project.yaml
logging/oc-get-event.txt
logging/oc-status.txt
logging/oc-version.txt

The Kubernetes CLI plugin mechanism is described here.

https://kubernetes.io/docs/tasks/extend-kubectl/kubectl-plugins/

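One thing I noticed: if you make a mistake in plugin.yaml, the plugin silently stops being recognized. When you try to run it, all you get is a message that there is no such plugin, with no other error reporting, so you have to track the mistake down yourself.

Also, it seems that flags can only define options that require a value. I wanted to define a -v / --debug option, but this restriction made it awkward, so I put it off.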
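
Since a broken plugin.yaml fails silently, a small pre-flight check can save some time. The sketch below assumes the descriptor format of the alpha kubectl plugin mechanism of that era, in which name, shortDesc, and command are the required top-level fields. The function name check_plugin_descriptor is made up, and it only greps for the keys rather than parsing YAML.

```shell
# check_plugin_descriptor PLUGIN_YAML
# Prints "ok" when the three required top-level keys are present, otherwise
# lists the missing ones and returns 1. Hypothetical helper, grep-based only.
check_plugin_descriptor() {
  missing=""
  for key in name shortDesc command; do
    grep -q "^${key}:" "$1" || missing="$missing $key"
  done
  if [ -n "$missing" ]; then
    echo "missing required field(s):$missing"
    return 1
  fi
  echo "ok"
}
```

It would be called with the path to the descriptor, for example check_plugin_descriptor plugin.yaml. It cannot catch indentation or quoting mistakes, only missing keys.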

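The TODO list from here looks like this.

- Make sos.sh usable on its own by accepting both the plugin-specific environment variables and direct command parameters as input, probably
- Add an option for cluster-admins that collects information for node-level trouble, such as oc get node,hostsubnet, rather than application information
- Add a debug option
- Add a flag to toggle collection of secrets and configmaps, since there are times when you do not want them collected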
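
The first TODO item could look roughly like the following. This is a sketch under assumptions, not the actual sos.sh code: the function name resolve_namespace and the fallback to "default" are made up, and KUBECTL_PLUGINS_CURRENT_NAMESPACE is the variable that, as far as I know, the alpha plugin mechanism exports to the plugin process.

```shell
# resolve_namespace [ARGS...]
# Hypothetical helper. Picks the namespace from, in order:
#   1. an explicit "-n NAMESPACE" or "--namespace NAMESPACE" argument
#   2. KUBECTL_PLUGINS_CURRENT_NAMESPACE, set when running as a plugin
#   3. the literal fallback "default" (an assumption made for this sketch)
resolve_namespace() {
  ns=""
  while [ $# -gt 0 ]; do
    case "$1" in
      -n|--namespace)
        if [ $# -ge 2 ]; then ns="$2"; fi
        ;;
    esac
    shift
  done
  echo "${ns:-${KUBECTL_PLUGINS_CURRENT_NAMESPACE:-default}}"
}
```

When run standalone, resolve_namespace "$@" picks up -n logging from the command line. When run as a plugin, the -n flag is consumed by the CLI and the namespace arrives through the environment variable instead.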

2017-12-26

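Installing Fedora 27 on a ThinkPad T470s

I got a ThinkPad T470s from the company as a Christmas present, so I went into the office the next day to pick it up, installed Fedora 27, and switched my main machine over. Specs: Intel Core i7-7600U 2.8GHz (2 cores / 4 threads), 16GB memory, 256GB NVMe SSD.

The previous record is the Fedora 24 install.

As usual I booted from USB using the CLI livecd-tools described in How to create and use Live USB, but for some reason grub referenced /images/pxeboot/vmlinuz, a path that does not exist, and it would not boot. It was a good opportunity to stop using my mixed-purpose USB stick, so I took a new USB stick, created a dedicated install stick with the GUI mediawriter, and installed normally.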

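The setup flow has not changed much. There are a few changes in this round.

- Chrome instead of Firefox as the main browser (experiment)
  - Skipped Firefox setup, spice-xpi installation, and so on
- Installed semi and wanderlust for Emacs with package-install
- Bluejeans became a Chrome extension, so the rpm package is no longer needed
- The Emacs font turned into a Mincho typeface for some reason, so I added the following setting for now. I may change the font later.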

;; set-frame-font replaces the obsolete set-default-font
(cond
 (window-system (set-frame-font "DejaVu Sans Mono-8")
                (set-fontset-font (frame-parameter nil 'font)
                                  'japanese-jisx0208
                                  '("VL ゴシック" . "unicode-bmp"))))

In Chrome, I set On Startup -> Continue where you left off at chrome://settings/ so that sessions are restored. Then I ran cp /usr/share/applications/google-chrome.desktop ~/.local/share/applications/ and added --auth-server-whitelist to Exec to enable Kerberos authentication. Copying the whole .desktop file feels a bit clumsy. I wonder whether there is a smarter way that lets me write only the difference.
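
The copy-and-edit step can at least be scripted. A minimal sketch, assuming the Exec lines in the .desktop file start with Exec= followed by the binary path. The function name add_exec_flag and the *.example.com whitelist value are placeholders, not the values actually used.

```shell
# add_exec_flag DESKTOP_FILE FLAG
# Prints the .desktop file with FLAG inserted right after the binary on every
# Exec= line. FLAG must not contain "|", "&" or "\" because it is passed to sed.
add_exec_flag() {
  sed "s|^\(Exec=[^ ]*\)|\1 $2|" "$1"
}

# Usage, with the paths from the text above:
# add_exec_flag /usr/share/applications/google-chrome.desktop \
#   '--auth-server-whitelist=*.example.com' \
#   > ~/.local/share/applications/google-chrome.desktop
```

This still copies the whole file, so it does not answer the question of writing only the difference. It just makes the step repeatable after a package update.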

2017-12-25

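Setting up OpenShift Container Platform 3.7 as a single node on AWS

OpenShift 全部俺 Advent Calendar 2017

There are various reference architectures, CloudFormation templates, and deployment scripts for setting up OpenShift on AWS, but they target production and are heavyweight. Sometimes you want a single-node setup for testing on AWS, so here are the steps.

This assumes the VPC, Internet Gateway, Subnet, and so on have been created beforehand. I used the following EC2 instance.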

AMI: RHEL-7.4_HVM_GA-20170724-x86_64-1-Access2-GP2
Instance Type: m4.xlarge (4 vCPU, 16G mem)
Disk: 80GB
Tag: kubernetes.io/cluster/foo=owned
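Elastic IP: assigned

The raw Elastic IP address and the very long hostname are awkward to handle, so in Route 53 I point the master hostname and the wildcard application domain at the Elastic IP above, using CNAME or A records.

When using Kubernetes on AWS, the kubernetes.io/cluster/<clusterid> tag is required. Without it, multiple Kubernetes clusters in the same account conflict over resources, fighting over EBS volumes for example, and interesting things happen.

Configure the Security Group appropriately. With an allow-all rule combined with AllowAllPasswordIdentityProvider, someone will of course mine Bitcoin on it.

Once the instance is up, prepare a script like the following, as in the previous post, and run it with sudo.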

RHSM_USERNAME=
RHSM_PASSWORD=
RHSM_POOLID=
OPENSHIFT_VERSION=3.7

subscription-manager register --username="$RHSM_USERNAME" --password="$RHSM_PASSWORD"
subscription-manager attach --pool "$RHSM_POOLID"
subscription-manager repos --disable="*"
subscription-manager repos \
     --enable=rhel-7-server-rpms \
     --enable=rhel-7-server-extras-rpms \
     --enable=rhel-7-server-optional-rpms \
     --enable=rhel-7-server-rh-common-rpms \
     --enable=rhel-7-fast-datapath-rpms \
     --enable=rhel-7-server-ose-$OPENSHIFT_VERSION-rpms

yum install chrony wget git net-tools bind-utils iptables-services bridge-utils nfs-utils sos sysstat bash-completion lsof tcpdump yum-utils yum-cron docker atomic-openshift-utils -y
sed -i 's/DOCKER_STORAGE_OPTIONS=/DOCKER_STORAGE_OPTIONS="--storage-driver overlay2"/' /etc/sysconfig/docker-storage
systemctl enable chronyd docker
systemctl start chronyd docker
reboot

After reconnecting, prepare Ansible.

$ ssh-keygen
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ sudo mkdir -p /etc/aws
$ sudo tee /etc/aws/aws.conf <<EOF
[Global]
Zone = ap-northeast-1c
EOF
$ sudo vi /etc/ansible/hosts

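The Ansible hosts file looks like this. In cloud environments such as AWS or OpenStack, you must not define the openshift_hostname parameter in the host definitions. The Kubernetes cloud integration requires the internal hostname, so overriding it causes various errors.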

[OSEv3:children]
masters
etcd
nodes

[OSEv3:vars]
ansible_ssh_user=ec2-user
ansible_become=true
deployment_type=openshift-enterprise
openshift_master_identity_providers=[{'name': 'htpasswd', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]
os_sdn_network_plugin_name=redhat/openshift-ovs-multitenant
openshift_node_kubelet_args={'kube-reserved': ['cpu=100m,memory=100Mi'], 'system-reserved':['cpu=100m,memory=100Mi'], 'eviction-hard': [ 'memory.available<4%', 'nodefs.available<4%', 'nodefs.inodesFree<4%', 'imagefs.available<4%', 'imagefs.inodesFree<4%' ], 'eviction-soft': [ 'memory.available<8%', 'nodefs.available<8%', 'nodefs.inodesFree<8%', 'imagefs.available<8%', 'imagefs.inodesFree<8%' ], 'eviction-soft-grace-period': [ 'memory.available=1m30s', 'nodefs.available=1m30s', 'nodefs.inodesFree=1m30s', 'imagefs.available=1m30s', 'imagefs.inodesFree=1m30s' ]}
openshift_disable_check=memory_availability,disk_availability

openshift_master_cluster_public_hostname=master.example.com
openshift_master_default_subdomain=apps.example.com

openshift_cloudprovider_kind=aws
openshift_cloudprovider_aws_access_key="{{ lookup('env','AWS_ACCESS_KEY_ID') }}"
openshift_cloudprovider_aws_secret_key="{{ lookup('env','AWS_SECRET_ACCESS_KEY') }}"
openshift_clusterid=foo

[masters]
master.example.com

[etcd]
master.example.com

[nodes]
master.example.com openshift_node_labels="{'region': 'infra'}" openshift_schedulable=true

Once the installation has run, dynamic provisioning of EBS PVs should also be set up.

export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
ansible-playbook -vvvv /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml | tee -a openshift-install-$(date +%Y%m%d%H%M%S).log
reboot

2017-12-25

Trying memory-based autoscaling on OpenShift

OpenShift 全部俺 Advent Calendar 2017

It is still an alpha feature, but autoscaling can be configured on memory usage instead of CPU.

https://docs.openshift.org/3.11/dev_guide/pod_autoscaling.html#pod-autoscaling-memory

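To state the important point first: this memory usage is memory.usage_in_bytes from cgroups, which includes buff/cache. Disk cache that could actually be freed is also reported as in use, so as long as the cache is being used, memory usage stays high and autoscaling adds pods to little purpose. For that reason I think memory-based autoscaling is not very usable at present.

Addendum: the description above is outdated. Newer versions such as OpenShift 3.11 and later report the working set (memory.stat.rss + memory.stat.cache - memory.stat.inactive_file) instead of memory.usage_in_bytes. That value better deserves to be called usage in the general sense, so memory autoscaling can be used normally.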
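
The difference between the two values can be checked by hand from a container's cgroup v1 memory.stat. Below is a minimal sketch of the formula quoted above. The function name working_set_bytes is made up, and the sample numbers are illustrative, not measurements.

```shell
# working_set_bytes: reads cgroup v1 memory.stat content on stdin and prints
# rss + cache - inactive_file, the formula quoted above.
working_set_bytes() {
  awk '$1 == "rss"           { rss = $2 }
       $1 == "cache"         { cache = $2 }
       $1 == "inactive_file" { inactive = $2 }
       END { printf "%.0f\n", rss + cache - inactive }'
}

# Illustrative input: 16MiB of rss plus 32MiB of file cache that sits
# entirely on the inactive list.
working_set_bytes <<EOF
cache 33554432
rss 16777216
inactive_file 33554432
active_file 0
EOF
```

For this input it prints 16777216, that is 16MiB, whereas a usage figure that counts the cache would be around 48MiB.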

First, as the documentation says, apply the following setting to master-config.yaml and restart the master.

apiServerArguments:
  runtime-config:
  - apis/autoscaling/v2alpha1=true

Create the usual Ruby sample and set requests.memory to 64Mi.

$ oc new-project test-memhpa
$ oc new-app https://github.com/nekop/hello-sinatra
$ oc set resources dc/hello-sinatra --requests=memory=64Mi
$ cat <<EOF | oc create -f -
apiVersion: autoscaling/v2alpha1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-memory 
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: DeploymentConfig
    name: hello-sinatra
  minReplicas: 1 
  maxReplicas: 10 
  metrics:
  - type: Resource
    resource:
      name: memory
      targetAverageUtilization: 50 
EOF

Fetching the created HPA again shows that, because this is an alpha feature, most of its contents are stored in annotations, as below.

$ oc get hpa hpa-memory -o yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  annotations:
    autoscaling.alpha.kubernetes.io/conditions: '[{"type":"AbleToScale","status":"True","lastTransitionTime":"2017-12-25T02:36:07Z","reason":"ReadyForNewScale","message":"the
      last scale time was sufficiently old as to warrant a new scale"},{"type":"ScalingActive","status":"True","lastTransitionTime":"2017-12-25T02:36:07Z","reason":"ValidMetricFound","message":"the
      HPA was able to succesfully calculate a replica count from memory resource utilization
      (percentage of request)"},{"type":"ScalingLimited","status":"False","lastTransitionTime":"2017-12-25T02:36:07Z","reason":"DesiredWithinRange","message":"the
      desired replica count is within the acceptible range"}]'
    autoscaling.alpha.kubernetes.io/current-metrics: '[{"type":"Resource","resource":{"name":"memory","currentAverageUtilization":24,"currentAverageValue":"16576512"}}]'
    autoscaling.alpha.kubernetes.io/metrics: '[{"type":"Resource","resource":{"name":"memory","targetAverageUtilization":50}}]'
  creationTimestamp: 2017-12-25T02:35:37Z
  name: hpa-memory
  namespace: test-memhpa
  resourceVersion: "4241124"
  selfLink: /apis/autoscaling/v1/namespaces/test-memhpa/horizontalpodautoscalers/hpa-memory
  uid: 4a2e8f4e-e91c-11e7-b3cf-001a4a40dc83
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: DeploymentConfig
    name: hello-sinatra
status:
  currentReplicas: 1
  desiredReplicas: 1

In the initial state the pod uses 16Mi of memory.

$ oc adm top pod
NAME                    CPU(cores)   MEMORY(bytes)   
hello-sinatra-2-d9ft8   0m           16Mi

Create a 32MB file and cat it to load it into the disk cache.

$ oc rsh dc/hello-sinatra sh -c "dd if=/dev/zero of=/tmp/32mb bs=32M count=1; cat /tmp/32mb"

After a little while, it showed up in the metrics.

$ oc adm top pod
NAME                    CPU(cores)   MEMORY(bytes)   
hello-sinatra-2-d9ft8   5m           48Mi

A scale-out by the HPA was triggered, and the number of pods increased to two.

$ oc get pod
NAME                    READY     STATUS    RESTARTS   AGE
hello-sinatra-2-b9hxt   1/1       Running   0          16s
hello-sinatra-2-d9ft8   1/1       Running   0          5m
$ oc adm top pod
NAME                    CPU(cores)   MEMORY(bytes)   
hello-sinatra-2-d9ft8   0m           48Mi            
hello-sinatra-2-b9hxt   0m           15Mi
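
The scale-out is consistent with the basic HPA calculation, desired replicas = ceil(current replicas × current utilization / target utilization), with utilization measured against requests.memory. Below is a rough sketch in shell using the whole Mi values from the outputs above. The function names are made up, and the real controller also applies a tolerance and other rules that are ignored here.

```shell
# utilization_percent USAGE_MI REQUEST_MI
# Usage as a percentage of the request (integer division).
utilization_percent() {
  echo $(( $1 * 100 / $2 ))
}

# desired_replicas CURRENT_REPLICAS CURRENT_UTILIZATION TARGET_UTILIZATION
# ceil(current * utilization / target) using integer arithmetic.
desired_replicas() {
  echo $(( ($1 * $2 + $3 - 1) / $3 ))
}

# Before: 16Mi used of a 64Mi request, target 50%
desired_replicas 1 "$(utilization_percent 16 64)" 50
# After caching the 32MB file: 48Mi used
desired_replicas 1 "$(utilization_percent 48 64)" 50
```

It prints 1 and then 2: 48Mi of a 64Mi request is 75%, which is above the 50% target, so one replica becomes two.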

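What I want is a scale-out when the memory needed to run is about to run short. For now, memory that is not essential and would be freed is also counted and triggers autoscaling like this, so the use cases look quite limited.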

nekop's blog

One of the people behind OpenShift / JBoss / WildFly / Infinispan http://twitter.com/nekop