flwang1 | given yours is a new env, pls go for fedora coreos | 00:01 |
---|---|---|
cosmicsound | flwang1 , i tried those images in this new setup somehow they all failed to boot | 00:04 |
cosmicsound | maybe my cli for coreos template is not great | 00:04 |
flwang1 | cosmicsound: i think it's because you're using old version heat | 00:04 |
flwang1 | what's your heat version? | 00:04 |
cosmicsound | i use kolla-ansible to deploy from source on ubuntu | 00:05 |
cosmicsound | so i should not be using old version | 00:05 |
flwang1 | cosmicsound: what's your heat version? | 00:06 |
*** k_mouza has joined #openstack-containers | 00:08 | |
*** k_mouza has quit IRC | 00:10 | |
cosmicsound | not sure how to see it | 00:12 |
cosmicsound | cli shows me the version of the cli client | 00:12 |
cosmicsound | heat --version | 00:13 |
cosmicsound | 2.0.0 | 00:13 |
flwang1 | hmm.. no, not heat cli version | 00:31 |
flwang1 | your heat service version | 00:31 |
cosmicsound | not sure how to get that yet | 00:42 |
cosmicsound | must be the train stable version | 00:42 |
openstackgerrit | Merged openstack/magnum master: Support calico v3.3.6 https://review.opendev.org/717116 | 01:23 |
cosmicsound | if so, it seems i found the issue | 02:05 |
cosmicsound | or not | 02:08 |
*** k_mouza has joined #openstack-containers | 02:10 | |
*** k_mouza has quit IRC | 02:15 | |
cosmicsound | manifest for kolla/ubuntu-source-magnum-api:9.3.0 not found: manifest unknown: manifest unknown | 02:32 |
openstackgerrit | Lingxian Kong proposed openstack/magnum master: [K8S] Delete all related load balancers before deleting cluster https://review.opendev.org/716930 | 03:48 |
*** ykarel|away is now known as ykarel | 04:20 | |
cosmicsound | Apr 07 04:02:01 d-pa2hu2ehhcwn-master-0 podman[2332]: Authorization failed: SSL exception connecting to https://cloud.uhlhost.net:5000/v3/auth/tokens: HTTPSConnectionPool(host='cloud.uhlhost.net', port=5000): Max retries exceeded with url: /v3/auth/tokens (Caused by SSLError(SSLError(136, '[X509] no certificate or crl found (_ssl.c:4232)'))) | 04:23 |
cosmicsound | this is only happening in coreos | 04:24 |
*** AJaeger has left #openstack-containers | 05:04 | |
*** ricolin has joined #openstack-containers | 05:30 | |
*** udesale has joined #openstack-containers | 05:41 | |
openstackgerrit | Feilong Wang proposed openstack/magnum master: [WIP] Support multi AZ for k8s multi masters https://review.opendev.org/714347 | 05:44 |
brtknr | cosmicsound: are you using os_distro=fedora-coreos or coreos? | 05:57 |
cosmicsound | fedora-coreos | 06:03 |
cosmicsound | tried with your scripts | 06:03 |
cosmicsound | it worked for atomic | 06:04 |
cosmicsound | not for coreos | 06:04 |
cosmicsound | I get this in logs | 06:04 |
cosmicsound | Apr 07 05:45:48 k-necofnexy5va-master-0 podman[2327]: Source [heat_local] Unavailable. | 06:04 |
cosmicsound | Apr 07 05:45:50 k-necofnexy5va-master-0 podman[2327]: Source [request] Unavailable. | 06:04 |
cosmicsound | Apr 07 05:45:50 k-necofnexy5va-master-0 podman[2327]: /var/lib/os-collect-config/local-data not found. Skipping | 06:04 |
cosmicsound | Apr 07 05:45:50 k-necofnexy5va-master-0 podman[2327]: No auth_url configured. | 06:04 |
cosmicsound | I updated to latest image from coreos | 06:04 |
*** udesale has quit IRC | 06:09 | |
*** udesale has joined #openstack-containers | 06:13 | |
*** xinliang has joined #openstack-containers | 06:26 | |
brtknr | cosmicsound: wait so coreos is booting up but not making the cluster? | 06:30 |
brtknr | If you are using kolla ansible, can you try using the master branch for magnum? | 06:31 |
brtknr | I have seen that before for tls endpoints | 06:31 |
brtknr | eg if your keystone is https | 06:32 |
brtknr | it works when it’s http | 06:32 |
brtknr | please file a bug report | 06:32 |
brtknr | And we shall look at it | 06:32 |
brtknr | But try the master tag for magnum first | 06:33 |
cosmicsound | yes | 06:36 |
cosmicsound | I will try | 06:36 |
cosmicsound | Now i am on train | 06:36 |
openstackgerrit | Merged openstack/magnum master: Cleanup py27 support https://review.opendev.org/717549 | 06:41 |
*** ttsiouts has joined #openstack-containers | 06:45 | |
cosmicsound | brtknr , with master branch and your script for atomic it gives me: Create_Failed: Resource CREATE failed: BadRequest: resources.kube_masters.resources[0].resources.docker_volume: Invalid input for field/attribute availability_zone. Value: . '' is too short (HTTP 400) (Request-ID: req-f3846f3b-bd55-4d64-bb1a-33779b5a43ba) | 07:16 |
*** xinliang has quit IRC | 07:29 | |
cosmicsound | brtknr , Create Failed | 07:29 |
cosmicsound | Resource Create Failed: Badrequest: Resources.Kube Masters.Resources[0].Resources.Kube Node Volume: Invalid Input For Field/Attribute Availability Zone. Value: . '' Is Too Short (Http 400) (Request-Id: Req-E909dacb-8c96-467b-A859-1ae5ddb40e4a) | 07:29 |
cosmicsound | i added availability_zone=nova and then i got the second error | 07:29 |
cosmicsound | and weirdest is that on the new version i cannot even remake the cluster that worked before: Resource Create Failed: Error: Resources.Kube Masters.Resources[0].Resources.Master Config Deployment: Deployment To Server Failed: Deploy Status Code: Deployment Exited With Non-Zero Status Code: 1 | 07:59 |
cosmicsound | http://paste.openstack.org/show/791719/ | 08:00 |
*** born2bake has joined #openstack-containers | 08:02 | |
*** k_mouza has joined #openstack-containers | 08:11 | |
*** k_mouza has quit IRC | 08:15 | |
*** ykarel is now known as ykarel|lunch | 08:48 | |
brtknr | cosmicsound: are you using cinder volume? | 08:51 |
cosmicsound | brtknr , yes | 08:51 |
cosmicsound | cinder backed by ceph | 08:51 |
brtknr | can you disable it and try | 08:51 |
cosmicsound | not sure how you mean that | 08:52 |
cosmicsound | labels | {'heat_container_agent_tag': '689704', 'kube_tag': 'v1.14.8'} | 08:52 |
brtknr | also you need the latest heat also | 08:52 |
brtknr | do you have volume_driver=cinder? | 08:52 |
cosmicsound | let me check | 08:52 |
cosmicsound | latest heat with latest magnum? | 08:53 |
brtknr | yes | 08:54 |
cosmicsound | will give it a try | 08:54 |
cosmicsound | anyhow | 08:54 |
cosmicsound | what i find most odd is | 08:54 |
cosmicsound | that the same template that made yesterday a healthy cluster | 08:55 |
cosmicsound | today it fails | 08:55 |
cosmicsound | there is no volume_driver setup | 08:56 |
*** k_mouza has joined #openstack-containers | 09:37 | |
*** ykarel|lunch is now known as ykarel | 09:41 | |
*** k_mouza has quit IRC | 09:49 | |
*** k_mouza has joined #openstack-containers | 09:57 | |
*** k_mouza has quit IRC | 09:57 | |
*** k_mouza has joined #openstack-containers | 09:57 | |
*** ykarel is now known as ykarel|meeting | 10:02 | |
cosmicsound | brtknr , it did not help | 10:04 |
cosmicsound | somehow it fails continually | 10:04 |
brtknr | cosmicsound: why are you using heat_container_agent_tag 689704? | 10:12 |
brtknr | ussuri-dev is recommended | 10:12 |
cosmicsound | thats what you had stored in the script | 10:13 |
cosmicsound | and that was the first working cluster | 10:13 |
cosmicsound | i will update them all to ussuri-dev and try again | 10:13 |
*** ricolin has quit IRC | 10:16 | |
cosmicsound | heat_container_agent_tag=ussuri-dev,kube_tag=v1.16.0 | 10:27 |
cosmicsound | trying this | 10:27 |
cosmicsound | dockerd-current[1343]: time="2020-04-07T10:30:28.670451562Z" level=error msg="Handler for GET /v1.26/containers/etcd/json returned error: No such container: etcd" | 10:30 |
cosmicsound | Apr 07 10:30:14 k1-rkjdrcythq3m-master-0.novalocal runc[2397]: [2020-04-07 10:30:14,291] (heat-config) [DEBUG] Running /var/lib/heat-config/hooks/script < /var/lib/heat-config/deployed/56955806-b478-4789-b1c6-e5e747713f43.json | 10:31 |
cosmicsound | it will stay here 2-3 mins, then will time out with os-profile not found | 10:32 |
cosmicsound | brtknr , it goes here | 10:37 |
cosmicsound | Apr 07 10:36:02 k1-rkjdrcythq3m-master-0.novalocal runc[2397]: [2020-04-07 10:36:02,657] (os-refresh-config) [INFO] Completed phase migration | 10:37 |
cosmicsound | Apr 07 10:36:02 k1-rkjdrcythq3m-master-0.novalocal runc[2397]: INFO:os-refresh-config:Completed phase migration | 10:37 |
cosmicsound | Apr 07 10:36:04 k1-rkjdrcythq3m-master-0.novalocal runc[2397]: /var/lib/os-collect-config/local-data not found. Skipping | 10:37 |
cosmicsound | and it dies | 10:37 |
cosmicsound | no matter what labels i use | 10:37 |
brtknr | cosmicsound: please check inside /var/log/heat-config as i mentioned before | 10:37 |
*** ykarel|meeting is now known as ykarel | 10:37 | |
*** vishalmanchanda has joined #openstack-containers | 10:40 | |
cosmicsound | i did | 10:41 |
cosmicsound | this is from there | 10:41 |
*** k_mouza has quit IRC | 10:42 | |
*** k_mouza has joined #openstack-containers | 10:46 | |
cosmicsound | Apr 07 08:38:12 magnum-test-cluster-v1-15-11-7lzcw4g3fcpk-master-0.novalocal runc[2364]: ++ ssh -F /srv/magnum/.ssh/config root@localhost ls /dev/disk/by-id | 10:48 |
cosmicsound | Apr 07 08:38:12 magnum-test-cluster-v1-15-11-7lzcw4g3fcpk-master-0.novalocal runc[2364]: ++ grep 'd3212ba7-394d-45f1-9$' | 10:48 |
cosmicsound | Apr 07 08:38:12 magnum-test-cluster-v1-15-11-7lzcw4g3fcpk-master-0.novalocal runc[2364]: + device_name= | 10:48 |
cosmicsound | Here comes the trouble | 10:48 |
cosmicsound | https://github.com/openstack/magnum/blob/master/magnum/drivers/common/templates/kubernetes/fragments/configure-etcd.sh#L25 | 10:50 |
cosmicsound | this one cannot be retrieved by heat-agent | 10:50 |
cosmicsound | and will fail | 10:50 |
*** ttsiouts has quit IRC | 10:59 | |
cosmicsound | brtknr , with or without hw_scsi_model=virtio-scsi | 11:03 |
cosmicsound | it seems scsi can also cause issues with the above bug we got here | 11:03 |
cosmicsound | i'll confirm soon | 11:03 |
brtknr | please show me the full log | 11:04 |
brtknr | cosmicsound: well, have you specified etcd_volume_size? | 11:05 |
brtknr | try disabling it | 11:05 |
brtknr | if what you say is true then that is only ever executed if this condition is met: if [ -n "$ETCD_VOLUME_SIZE" ] && [ "$ETCD_VOLUME_SIZE" -gt 0 ]; then | 11:06 |
cosmicsound | brtknr , i had no etcd specified this time | 11:07 |
cosmicsound | also i was on scsi, now on virtio | 11:07 |
*** ttsiouts has joined #openstack-containers | 11:09 | |
cosmicsound | http://paste.openstack.org/show/791719/ here is full log from heat | 11:11 |
cosmicsound | coreos seems to be stuck at: | 11:35 |
cosmicsound | + echo 'Trying to label master node with node-role.kubernetes.io/master=""' | 11:35 |
cosmicsound | + sleep 5s | 11:35 |
cosmicsound | ++ curl --silent http://127.0.0.1:8080/healthz | 11:35 |
cosmicsound | + '[' ok = '' ']' | 11:35 |
ttsiouts | strigazi, brtknr: are you guys around? | 12:07 |
strigazi | o/ | 12:08 |
ttsiouts | o/ | 12:08 |
ttsiouts | I wanted to talk about the spec | 12:08 |
ttsiouts | I kind of like brtknr's idea. | 12:09 |
ttsiouts | I could start rewriting the spec based on that | 12:09 |
guilhermesp | thanks for sharing the conformance results flwang1 ! Not sure but for both v1.17.4 and v1.18 i'm getting the same tests failing http://paste.openstack.org/show/791730/ | 12:26 |
guilhermesp | which is mostly dns tests | 12:26 |
*** ttsiouts has quit IRC | 12:34 | |
born2bake | guys is there any up-to-date guide how to use magnum and deploy up-to-date k8s with magnum? | 12:41 |
born2bake | using up-to-date fedora-coreos images | 12:42 |
born2bake | I ve tried already so many different setups :) the only one that works for me: flannel, fedora-coreos, 1 master. (autoscaler, autohealer, cloud manager are crashing at scaling) | 12:43 |
*** ttsiouts has joined #openstack-containers | 12:43 | |
born2bake | calico, multi-master neither of them are working for me | 12:43 |
born2bake | openstack: train, kolla | 12:43 |
guilhermesp | born2bake: do you have octavia on your env? | 12:47 |
born2bake | yes | 12:47 |
guilhermesp | no logs? | 12:47 |
born2bake | deleted everything, will try again later on. I managed to have multi-master cluster with fedora-atomic 29....but I cant use fedora-atomic (it takes around 20 min to boot up just one image), worker-nodes were not able to connect either | 12:48 |
guilhermesp | born2bake: https://review.opendev.org/#/c/685875/1 are you aware of? | 12:49 |
born2bake | the main problem, I have no idea how to troubleshoot using heat-config logs...I can see it stopped and failed but I do not know why | 12:49 |
born2bake | guilhermesp nope. even though I have no idea how to update my magnum setup lol | 12:50 |
born2bake | but as far as I know, in kolla ansible train magnum version is 9.2.0 | 12:50 |
guilhermesp | and the heat version? | 12:51 |
brtknr | ttsiouts: hello im here | 12:52 |
born2bake | docker exec -it kolla/ubuntu-source-heat-engine:train heat --version - 1.18.0 | 12:53 |
brtknr | born2bake: can you upload your logs? | 12:56 |
brtknr | born2bake: ssh core@172.24.4.253 sudo cat /var/log/heat-config/heat-config-script/* | nc seashells.io 1337 | 12:56 |
born2bake | surely I will do | 12:57 |
brtknr | cosmicsound: same for you^ | 12:57 |
strigazi | ping ttsiouts | 12:58 |
ttsiouts | I'm here | 13:01 |
ttsiouts | brtknr: I really like your idea | 13:02 |
ttsiouts | :) | 13:02 |
ttsiouts | I just wanted to discuss with both of you a bit more | 13:02 |
brtknr | ttsiouts: great to hear =) | 13:02 |
*** udesale_ has joined #openstack-containers | 13:03 | |
brtknr | I think someone else has suggested this before but on the client side | 13:03 |
brtknr | e.g. by reading the cluster template labels and applying merge/override based on a flag that the server never sees | 13:04 |
ttsiouts | this solution though would not allow proper tracking of the labels that were provided at creation time. | 13:05 |
*** udesale has quit IRC | 13:05 | |
brtknr | ttsiouts: yes i agree | 13:06 |
ttsiouts | so having this option server side is what makes it work for this use case too. | 13:06 |
strigazi | I think someone else has suggested this before but on the client side: HARD NO | 13:06 |
brtknr | I thought it was this but looks like its a different implementation: https://review.opendev.org/#/c/657410 | 13:07 |
strigazi | https://review.opendev.org/#/c/657435/ This is the patch | 13:07 |
brtknr | that looks similar | 13:08 |
strigazi | They are duplicate | 13:08 |
brtknr | either way, the merge takes place on the client side | 13:08 |
strigazi | Haven't we rejected this? | 13:09 |
brtknr | yes, i was basically trying to point out that my suggestion is not 100% original :) | 13:10 |
ttsiouts | ok we agree that the merge should be done server side in order to allow proper tracking of client input | 13:10 |
brtknr | +1 | 13:11 |
ttsiouts | do you also agree that the labels field (in a cluster or nodegroup) should contain only the labels provided at creation? | 13:13 |
ttsiouts | which means that we have to persist the flag too. | 13:14 |
brtknr | at cluster/nodegroup creation? | 13:14 |
ttsiouts | brtknr: yes | 13:14 |
brtknr | agree | 13:14 |
ttsiouts | strigazi ? | 13:14 |
strigazi | yes | 13:15 |
strigazi | agree | 13:15 |
brtknr | ttsiouts: e.g. after the cluster is created, --merge-label flag cannot be modified you mean right? | 13:16 |
brtknr | via the API | 13:16 |
ttsiouts | brtknr: yes | 13:16 |
ttsiouts | cool | 13:17 |
ttsiouts | sorry for going step by step but I want this to go forward as soon as possible | 13:18 |
ttsiouts | :) | 13:18 |
strigazi | we are picky, so this ^^ is the only way | 13:18 |
ttsiouts | :) | 13:19 |
brtknr | ttsiouts: any more resolutions to pass ? :) | 13:20 |
ttsiouts | should we also agree on the flag and the field name? | 13:20 |
ttsiouts | --merge-labels and a boolean field in DB called merge_labels? | 13:21 |
brtknr | i am happy with merge-labels but open to other suggestions | 13:22 |
brtknr | other ideas: --combine-labels, --smash-labels, --update-labels, --inherit-labels, --override-labels | 13:23 |
strigazi | override is probably what we want | 13:23 |
strigazi | in OOP you override a method | 13:23 |
brtknr | second thing to agree on is whether the current behaviour is override-labels=True or False | 13:25 |
strigazi | animal.get_features() and dog.get_features() | 13:25 |
ttsiouts | override though means not using what's inherited right? | 13:25 |
ttsiouts | brtknr: yes | 13:25 |
ttsiouts | if we go with override then false should mean merge right? | 13:26 |
brtknr | I think the current behaviour is --override-labels=True | 13:26 |
ttsiouts | brtknr: exactly | 13:26 |
brtknr | Since nothing is inherited | 13:27 |
strigazi | actually, the current behaviour is both. Because: | 13:27 |
cosmicsound | brtknr , will upload logs | 13:28 |
brtknr | It would be good not to have to specify --override-labels=False as an opt-in flag | 13:28 |
strigazi | in the API we do https://github.com/openstack/magnum/blob/master/magnum/api/controllers/v1/cluster.py#L475 | 13:28 |
brtknr | would prefer to supply "--opt-in-flag" only if True | 13:28 |
strigazi | as brtknr wants | 13:29 |
brtknr | strigazi: I see your point | 13:30 |
ttsiouts | strigazi: indeed | 13:30 |
cosmicsound | brtknr , https://seashells.io/v/VzqW9UYW | 13:31 |
brtknr | --override_labels = False if cluster.label == wtypes.Unset else True | 13:31 |
strigazi | this boolean is strange because, in the new API version we want override always True and in the old API always False. | 13:32 |
cosmicsound | Will update the session as it changes while i test newer versions | 13:32 |
brtknr | hmm this is more of a rabbit hole than I realised :) | 13:33 |
strigazi | Another option (a bad one) is the default API is not the actual latest which means (use always only the cluster labels), and the new API uses always both (override = True) | 13:35 |
strigazi | The problem is that in the cli we always ask the latest API version. | 13:35 |
strigazi | override=true == (get CT labels and C labels) && (get CT labels and C labels and NG labels) | 13:38 |
strigazi | override=False == (get C labels) && (get NG labels) | 13:38 |
strigazi | for POST cluster and POST NG respectively | 13:38 |
strigazi | default logic override=true | 13:39 |
ttsiouts | IMHO the True boolean option should reflect the new functionality. | 13:39 |
strigazi | +10 ^^ | 13:40 |
strigazi | The issue to address is: The old client will send Unset for override and latest for the API microversion. | 13:41 |
strigazi | To solve this: The API can have default logic override=false | 13:42 |
strigazi | and the new client send true by default | 13:42 |
strigazi | for UX experience | 13:42 |
brtknr | or override=True if labels is defined else False, if override is Unset? | 13:43 |
strigazi | (UX includes experience) :) | 13:43 |
strigazi | -1 to that | 13:43 |
strigazi | The new API should not check if Unset | 13:44 |
strigazi | the old API microversion will do what you just mentioned | 13:44 |
strigazi | because it is not supposed to know about override | 13:44 |
brtknr | strigazi: ok IDK the fine details of how API microversion works atm | 13:45 |
cosmicsound | cloud_provider_tag=v1.15.0 should work for k8s v1.17.4 ? | 13:45 |
strigazi | yes ^^ | 13:45 |
brtknr | cosmicsound: yes thats what I use | 13:45 |
guilhermesp | it is the default for v1.17, right? | 13:46 |
guilhermesp | cloud_provider_tag=v1.15.0. | 13:46 |
strigazi | brtknr: ttsiouts: let's break backwards compatibility? | 13:47 |
strigazi | brtknr: ttsiouts: let's break the API | 13:47 |
brtknr | strigazi: hmm? | 13:47 |
strigazi | we document and users open many tickets, at CERN they open many anyway :) | 13:47 |
brtknr | strigazi: not sure if you are being serious :) | 13:49 |
strigazi | actually desperate | 13:49 |
strigazi | :) | 13:49 |
cosmicsound | Error: Unable to update cluster. when trying to resize cluster | 13:49 |
cosmicsound | isnt this supposed to work? | 13:49 |
cosmicsound | going from 1 node to 2 or 3 nodes | 13:50 |
ttsiouts | strigazi, brtknr: let's think about this. this property is immutable. meaning that it is false for all the existing clusters | 13:50 |
ttsiouts | we need to describe this with one word. | 13:51 |
brtknr | the way I see it, it only makes sense to evaluate this flag if labels is not empty at cluster scope or nodegroup scope | 13:52 |
ttsiouts | brtknr: +1 | 13:52 |
brtknr | if labels is empty and this flag is True, the API should return an error | 13:52 |
brtknr | this should be backward compatible too | 13:52 |
ttsiouts | I agree | 13:52 |
brtknr | strigazi: ^ | 13:53 |
strigazi | thinking | 13:55 |
brtknr | -\|/-\|/- | 13:55 |
strigazi | And the default is True? | 13:56 |
strigazi | both API and cli | 13:57 |
strigazi | ? | 13:57 |
brtknr | i think the "opt-in" flag should signify the merge action | 13:57 |
brtknr | I am starting to actually prefer the sound of combine | 13:58 |
ttsiouts | the default shouldn't be true | 13:58 |
brtknr | --combine-labels | 13:58 |
brtknr | so it should be False | 13:58 |
brtknr | ^ | 13:58 |
*** dave-mccowan has joined #openstack-containers | 13:59 | |
ttsiouts | In the code it will be a dict.update | 14:01 |
strigazi | brtknr: So the new improved logic won't be available by default to users, correct? | 14:01 |
ttsiouts | should we go with update? | 14:01 |
ttsiouts | update-labels | 14:01 |
brtknr | strigazi: not by default | 14:02 |
brtknr | users will need to work for it | 14:02 |
brtknr | earn their keep | 14:02 |
strigazi | Is this what we want? | 14:02 |
brtknr | strigazi: doesnt make sense to break current default behaviour suddenly does it? | 14:03 |
strigazi | I think that what currently is described in the spec is simpler (one param less and more verbosity) for user and more code for us, thoughts? | 14:03 |
strigazi | "doesnt make sense to break current default behaviour suddenly does it?": it doesn't | 14:04 |
brtknr | i disagree that it is one less param, since we are adding a new field | 14:04 |
brtknr | its the same number of params | 14:05 |
brtknr | ttsiouts: i am also fine with update-labels but it is less obvious | 14:07 |
strigazi | brtknr: true, same number of params. | 14:07 |
brtknr | in fact i would argue that the purpose of dict.update is not entirely clear to a new user | 14:07 |
strigazi | So, to have the same functionality as the SPEC (and not change default behavior): override-labels false by default. And we are covered, correct? | 14:10 |
brtknr | the purpose of this flag is to avoid mutually exclusive population of labels/override_labels(from the current spec) | 14:10 |
*** dave-mccowan has quit IRC | 14:10 | |
brtknr | if override-labels==combine-labels, correct :) | 14:11 |
strigazi | exactly | 14:11 |
brtknr | if override-labels==combine-labels==update-labels, correct :) | 14:11 |
strigazi | *-labels | 14:11 |
strigazi | doesn't matter | 14:12 |
strigazi | the new name | 14:12 |
strigazi | whatever it is | 14:12 |
brtknr | we could do a survey on ML? | 14:14 |
brtknr | or send a link to a survey | 14:14 |
strigazi | let's conclude in the rest and leave the name out | 14:14 |
brtknr | ok | 14:14 |
strigazi | So it will be a new bool | 14:15 |
ttsiouts | yes | 14:16 |
brtknr | +1 | 14:16 |
strigazi | it will be false by default | 14:16 |
ttsiouts | yes | 14:17 |
brtknr | false by default if labels is defined | 14:17 |
ttsiouts | we need a default value for the flag in the DB | 14:17 |
ttsiouts | and it should be false no matter what for existing cluster | 14:18 |
ttsiouts | the default should be false and it will be evaluated only if labels are provided. | 14:19 |
ttsiouts | does it make sense? | 14:19 |
brtknr | ttsiouts: yes that makes sense to me | 14:19 |
strigazi | brtknr: "false by default if labels is defined": what about when labels == unset? | 14:19 |
strigazi | brtknr: false as well, no? | 14:20 |
brtknr | what ttsiouts said | 14:20 |
brtknr | strigazi: yes | 14:20 |
brtknr | lets keep it simple and leave it as false | 14:20 |
strigazi | solved? | 14:22 |
brtknr | ttsiouts: ? | 14:22 |
brtknr | strigazi: do you guys use https for keystone at CERN? | 14:23 |
strigazi | yes | 14:23 |
brtknr | how did you make this work for fedora coreos? | 14:23 |
strigazi | for everything user facing | 14:23 |
ttsiouts | I'm just thinking about the migration for the existing clusters. we should have something simple and not checking if things are set or not | 14:23 |
strigazi | I think false for all existing clusters is fine. I don't see why we need to distinguish it | 14:24 |
brtknr | strigazi: a user reported this yesterday: Apr 07 04:02:01 d-pa2hu2ehhcwn-master-0 podman[2332]: Authorization failed: SSL exception connecting to https://cloud.uhlhost.net:5000/v3/auth/tokens: HTTPSConnectionPool(host='cloud.uhlhost.net', port=5000): Max retries exceeded with url: /v3/auth/tokens (Caused by SSLError(SSLError(136, '[X509] no certificate or crl found (_ssl.c:4232)'))) | 14:24 |
strigazi | ttsiouts: I think false for all existing clusters is fine. I don't see why we need to distinguish it | 14:25 |
strigazi | brtknr: openstack_ca_file | 14:25 |
strigazi | brtknr: in magnum.conf | 14:25 |
ttsiouts | strigazi: cool for me | 14:25 |
brtknr | strigazi: ok cool thanks | 14:26 |
brtknr | ttsiouts: look forward to the updated spec | 14:26 |
brtknr | cosmicsound: ^^ | 14:26 |
strigazi | ttsiouts: brtknr: I hope flwang1 doesn't have another idea | 14:26 |
brtknr | strigazi: me too :P | 14:27 |
ttsiouts | haha | 14:27 |
ttsiouts | strigazi, brtknr: thanks guys! I'll update the spec | 14:27 |
brtknr | .X, | 14:27 |
brtknr | ^that is a crossed finger | 14:27 |
strigazi | brtknr: ttsiouts: name: | 14:29 |
strigazi | brtknr: ttsiouts: https://helm.sh/docs/helm/helm_install/ For example, if both myvalues.yaml and override.yaml contained a key called ‘Test’, the value set in override.yaml would take precedence: | 14:29 |
ttsiouts | we go with override? It's ok for me | 14:30 |
strigazi | brtknr: ^^ | 14:30 |
brtknr | strigazi: im okay with override | 14:30 |
brtknr | yankcrime: ^^ please read the bit about openstack_ca_file | 14:31 |
ttsiouts | it's the name of the spec and my jira ticket :P | 14:31 |
brtknr | i am getting used to it | 14:32 |
strigazi | ttsiouts: override? | 14:32 |
ttsiouts | strigazi: yes | 14:32 |
strigazi | brtknr: yankcrime: https://docs.openstack.org/magnum/latest/configuration/sample-config.html [drivers] openstack_ca_file Path to the OpenStack CA-bundle file to pass and install in all cluster nodes. | 14:33 |
*** ricolin has joined #openstack-containers | 14:39 | |
*** ttsiouts has quit IRC | 14:42 | |
yankcrime | brtknr: 👀 | 14:45 |
yankcrime | oh is this because fedora coreos doesn't ship a cacert bundle for public / commercial CAs? | 14:45 |
*** ttsiouts has joined #openstack-containers | 14:46 | |
strigazi | yankcrime: no, because python | 14:46 |
yankcrime | :( | 14:47 |
strigazi | yankcrime: it should work, you have an ok cert | 14:48 |
strigazi | from Sectigo | 14:48 |
*** ttsiouts_ has joined #openstack-containers | 14:49 | |
*** ttsiouts has quit IRC | 14:49 | |
yankcrime | strigazi: it's a letsencrypt issued cert and we still see that error that brtknr described | 14:50 |
strigazi | yankcrime: not sure why you see it: podman run -it --rm --entrypoint /usr/bin/python docker.io/openstackmagnum/heat-container-agent:train-stable-1 -c "import requests ; print(requests.get('https://cloud.uhlhost.net:5000/v3'))" | 14:53 |
strigazi | <Response [200]> | 14:53 |
strigazi | seems to work | 14:53 |
strigazi | yankcrime: this is how the agent runs: https://github.com/openstack/magnum/blob/master/magnum/drivers/k8s_fedora_coreos_v1/templates/fcct-config.yaml#L194 | 14:59 |
brtknr | strigazi: yankcrime has 9.2.0 release | 14:59 |
brtknr | strigazi: is this patch relevant: https://review.opendev.org/#/c/709777/ | 15:00 |
brtknr | this patch is only available in 9.3.0 release | 15:01 |
strigazi | maybe yes, if /etc/pki/ca-trust/source/anchors/openstack-ca.pem has something bad inside | 15:01 |
*** ttsiouts_ has quit IRC | 15:02 | |
strigazi | brtknr: if this file doesn't exist the error is: OSError: Could not find a suitable TLS CA certificate bundle, invalid path: /etc/pki/ca-trust/source/anchors/openstack-ca.pem | 15:03 |
strigazi | test with: podman run -it --rm --entrypoint /usr/bin/python3 --env REQUESTS_CA_BUNDLE=/etc/pki/ca-trust/source/anchors/openstack-ca.pem docker.io/openstackmagnum/heat-container-agent:train-stable-1 -c "import requests ; print(requests.get('https://cloud.uhlhost.net:5000/v3'))" | 15:03 |
strigazi | yankcrime: brtknr: ^^ | 15:04 |
brtknr | strigazi: you're right | 15:06 |
born2bake | brtknr http://paste.openstack.org/show/791740/ - calico, coreos ; same on flannel. its with having loadbalancers added. | 15:08 |
born2bake | so it failed but load balancers show online http://prntscr.com/rut6b5 and they can ping machines | 15:09 |
brtknr | strigazi: not sure what other explanation there is | 15:11 |
brtknr | i will try running these on yankcrime's compute.sausage.cloud | 15:12 |
*** rcernin has quit IRC | 15:13 | |
brtknr | born2bake: please run 9.3.0, there is a patch for TimeoutRestartSec | 15:17 |
born2bake | do I need to add label tag or something when I do 9.3.0? | 15:18 |
brtknr | No label required | 15:19 |
brtknr | TimeoutRestartSec default value is 90 seconds, we have increased this to 600 | 15:19 |
*** ttsiouts has joined #openstack-containers | 15:27 | |
brtknr | born2bake: in 9.3.0 release | 15:30 |
born2bake | it would take some time cause I ve no idea how to create custom magnum containers in kolla so I can have the latest version :) | 15:31 |
brtknr | born2bake: you dont need to build it, the image should be usable as train tag: https://hub.docker.com/r/kolla/centos-binary-magnum-conductor/tags | 15:34 |
brtknr | although i think the CI is broken | 15:35 |
born2bake | a632c4d94216 kolla/ubuntu-source-magnum-conductor:train "dumb-init --single-…" 3 days ago Up 3 days magnum_conductor | 15:35 |
born2bake | 1a200061c45b kolla/ubuntu-source-magnum-api:train "dumb-init --single-…" 3 days ago Up 3 days magnum_api | 15:35 |
born2bake | i have ubuntu-source-train | 15:35 |
brtknr | no wait it finally merged: https://review.opendev.org/#/c/716339/ | 15:36 |
born2bake | and then just run reconfigure? | 15:36 |
brtknr | you might have to wait till tomorrow because i think they build the image every 24 hours | 15:36 |
brtknr | strigazi: if i run that command you shared as sudo, with --privileged flag, i can reproduce the problem | 15:56 |
brtknr | strigazi: e.g sudo podman run -it --name heat-container-agent-dupe --privileged --volume /etc/:/etc/ --env REQUESTS_CA_BUNDLE=/etc/pki/ca-trust/source/anchors/openstack-ca.pem --net=host --rm docker.io/openstackmagnum/heat-container-agent:ussuri-dev python3 -c "import requests ; print(requests.get('https://compute.sausage.cloud:5000/v3'))" | 15:59 |
brtknr | but with the REQUESTS_CA_BUNDLE patch, no issues | 15:59 |
brtknr | yankcrime: you need this patch in conclusion https://review.opendev.org/#/c/704739/2/magnum/drivers/k8s_fedora_coreos_v1/templates/user_data.json | 16:00 |
brtknr | born2bake: yes reconfigure but as i mentioned in the openstack-kolla channel, i dont think the images have been built yet, according to dockerhub the last train image was built 13 days ago | 16:04 |
brtknr | ah sorry you are using ubuntu-source | 16:05 |
brtknr | its possible 9.3.0 is available in there then | 16:06 |
brtknr | one caveat is that we forgot to merge zincati auto-update disable patch | 16:06 |
brtknr | you might therefore be better off using master tag for magnum | 16:07 |
brtknr | the side-effect of zincati is that for fedora coreos, heat-container-agent restarts | 16:07 |
*** udesale_ has quit IRC | 16:08 | |
*** ykarel is now known as ykarel|away | 16:10 | |
born2bake | brtknr ok I will try binary containers then. Also, as I mentioned previously, I just created a flannel cluster with 1 master and 1 node and ran kubectl scale deployment test-autoscale --replicas=100 - http://paste.openstack.org/show/791752/ (autoscaler, autohealer and cloud manager are crashing; the node is created in the stack but not added) | 16:12 |
brtknr | born2bake: not sure why, they work for me | 16:14 |
brtknr | born2bake: ubuntu-source may have the correct version | 16:15 |
brtknr | as it was built 9 hours ago | 16:15 |
born2bake | how do I check magnum version in container? | 16:15 |
born2bake | [magnum@sova magnum-base-source]$ ls | 16:19 |
born2bake | magnum-9.2.0 | 16:19 |
born2bake | [heat@sova heat-base-source]$ ls openstack-heat-13.0.0 | 16:19 |
born2bake | I will try to use binary master | 16:22 |
born2bake | brtknr which one would you suggest? centos/ubuntu-binary/source-master? | 16:22 |
yankcrime | brtknr: ok will get it applied | 16:27 |
yankcrime | tomorrow at this rate | 16:27 |
*** ttsiouts has quit IRC | 16:40 | |
brtknr | born2bake: ubuntu-source master should also be fine | 16:48 |
cosmicsound | born2bake , use virtio instead of scsi if case | 16:50 |
cosmicsound | it helped me on my failed scripts | 16:51 |
cosmicsound | use heat_tag: master magnum_tag: master and reconfigure | 16:51 |
born2bake | as I said, when I use virtio, the image doesn't have enough entropy in /dev/random and can't generate ssh keys fast, so it takes around 20 minutes for the machine to boot :) | 16:51 |
cosmicsound | make sure disks are on virtio | 16:51 |
cosmicsound | hmm | 16:51 |
cosmicsound | did you add the other one i mentioned? | 16:52 |
born2bake | therefore, I am sticking to fedora-coreos images because they are fine and newer | 16:52 |
born2bake | yes, I ve tried all :) | 16:52 |
cosmicsound | i too work now on coreos | 16:52 |
cosmicsound | and works good for me | 16:52 |
born2bake | have you tried autoscaler? | 16:52 |
born2bake | its crashing for me for some reason | 16:53 |
*** ttsiouts has joined #openstack-containers | 17:13 | |
*** ttsiouts has quit IRC | 17:18 | |
cosmicsound | born2bake , i tried it | 17:22 |
cosmicsound | it made me scared when it lowered my servers | 17:22 |
cosmicsound | :D | 17:22 |
cosmicsound | I did not try upscaling yet, it was downscaling by itself | 17:22 |
born2bake | none-k8s servers? :) | 17:22 |
cosmicsound | all :D | 17:30 |
cosmicsound | born2bake , used sonobuoy? | 17:31 |
cosmicsound | anyone know how i start it? | 17:31 |
*** k_mouza has quit IRC | 17:31 | |
*** ttsiouts has joined #openstack-containers | 17:31 | |
born2bake | cosmicsound what version do you have? kolla/ubuntu-source-magnum-conductor:master - [magnum@sova magnum-base-source]$ ls - magnum-9.1.0.dev212 | 17:36 |
born2bake | I set master, and its even lower than I had | 17:37 |
*** ttsiouts has quit IRC | 17:46 | |
*** vishalmanchanda has quit IRC | 17:47 | |
cosmicsound | yes born2bake | 17:57 |
cosmicsound | the one with 9.1.0 was working | 17:57 |
cosmicsound | also need the heat master | 17:57 |
cosmicsound | i do not tag versions only master or train . | 17:57 |
cosmicsound | numerical tags do not work | 17:57 |
cosmicsound | If I want to edit just an extra label | 17:58 |
cosmicsound | I need to recreate the cluster? | 17:58 |
born2bake | both flannel and calico failed for me with master tag :/ | 18:01 |
born2bake | http://paste.openstack.org/show/791756/ | 18:01 |
*** ricolin has quit IRC | 18:02 | |
*** k_mouza has joined #openstack-containers | 18:13 | |
*** k_mouza has quit IRC | 18:14 | |
*** ttsiouts has joined #openstack-containers | 18:25 | |
*** ttsiouts has quit IRC | 18:30 | |
born2bake | http://paste.openstack.org/show/791757/ - flannel, with lb, 2 masters | 18:31 |
born2bake | calico failing | 18:31 |
brtknr | Use etcd_tag=v3.4.6 | 18:33 |
brtknr | With coreos or atomic? | 18:33 |
brtknr | born2bake: | 18:33 |
born2bake | coreos | 18:34 |
brtknr | are you using the terraform script? | 18:34 |
born2bake | http://paste.openstack.org/show/791756/ - calico | 18:34 |
born2bake | yes terraform from github | 18:34 |
brtknr | That is a partial log that doesn’t tell me a lot | 19:01 |
brtknr | born2bake: | 19:02 |
brtknr | born2bake: it doesn’t say why it failed | 19:02 |
brtknr | born2bake: at the end of the log, it says etcd server request timed out | 19:04 |
brtknr | check that etcd is running | 19:04 |
brtknr | born2bake: When copying the logs, use the seashells method I described earlier | 19:05 |
brtknr | it will capture the full log | 19:06 |
born2bake | ssh core@172.24.4.253 sudo cat /var/log/heat-config/heat-config-script/* | nc seashells.io 1337 ? | 19:06 |
born2bake | Okay I will | 19:06 |
brtknr | born2bake: Yes | 19:06 |
born2bake | I will change etcd tag and do clusters again | 19:06 |
brtknr | but it looks like you are using an incompatible etcd version | 19:06 |
brtknr | what is the current version on the terraform script? | 19:07 |
born2bake | flannel http://paste.openstack.org/show/791757/ looks like smth with octavia | 19:07 |
brtknr | in master, you now need a v before the tag | 19:07 |
born2bake | branch is up-to-date with 'origin/master' | 19:07 |
born2bake | in vars.tf right? | 19:08 |
brtknr | I saw those, it’s not much use because the log is incomplete | 19:08 |
brtknr | but I saw etcd timing out at the end | 19:09 |
*** ttsiouts has joined #openstack-containers | 19:14 | |
born2bake | brtknr flannel, coreos - masters finished successfully https://seashells.io/v/QJcExtc8 ; let me see worker | 19:21 |
born2bake | on the worker node there are not even heat-config logs | 19:23 |
born2bake | now there is. worker node - https://seashells.io/v/nQvQRffK | 19:23 |
born2bake | kubectl get node doesnt work on master either | 19:26 |
brtknr | Can you use tail instead of cat | 19:28 |
brtknr | looks like the container agent is still running on the master | 19:29 |
cosmicsound | born2bake , what hw labels you have on the image? | 19:34 |
cosmicsound | for libvirt | 19:34 |
born2bake | tail on master - https://seashells.io/v/uASF984Q | 19:37 |
born2bake | cosmicsound all ceph rbd related...cant find it now :) | 19:37 |
brtknr | born2bake: can you try with your lb disabled? | 19:47 |
*** ttsiouts has quit IRC | 19:48 | |
brtknr | does that work? | 19:48 |
brtknr | looks like your lb for octavia is not reachable | 19:48 |
born2bake | brtknr flannel: masters tail - https://seashells.io/v/uASF984Q ; workers - https://seashells.io/v/nQvQRffK ; calico: master - https://seashells.io/v/xN4gyXTD | 19:48 |
brtknr | for etcd | 19:48 |
brtknr | are you sure octavia is configured correctly? | 19:49 |
born2bake | as I said, flannel with 1 master and 1 node it works | 19:49 |
born2bake | cant be sure :/ | 19:50 |
born2bake | https://ssup2.github.io/record/OpenStack_Stein_%EC%84%A4%EC%B9%98_Kolla-Ansible_Ubuntu_18.04_ODROID-H2_Cluster/ followed that guide for octavia | 19:50 |
brtknr | you can test octavia ingress controller | 19:50 |
born2bake | created certs, added route to docker hosts "route add -net 20.0.0.0/24 gw 192.168.0.225", then when I create lb's they are fine | 19:51 |
brtknr | born2bake: Setting up octavia is complicated, if it works with single master, sounds like problem with your octavia config | 19:54 |
born2bake | calico without lb still didnt work but I think I will focus on flannel just now...and see what's wrong with octavia | 19:54 |
born2bake | even though autoscaler/cloud manager are still crashing for me :( | 19:55 |
brtknr | you can try curling etcd port on the load balancer | 19:55 |
born2bake | curl --insecure https://10.0.15.169:2379 curl: (35) error:14094412:SSL routines:ssl3_read_bytes:sslv3 alert bad certificate | 19:57 |
born2bake | I noticed my load balancer does not have floating ip assigned | 19:57 |
born2bake | Status: TCP 2379 Online Active Yes | 19:57 |
born2bake | however, I do have master_lb_floating_ip_enabled = "true" enabled | 19:58 |
born2bake | I think the case might be that my octavia doesnt support tls/ssl | 19:59 |
brtknr | What if you use http instead of https | 20:03 |
born2bake | curl: (52) Empty reply from server | 20:03 |
born2bake | thing is it does not create floating ip - lb for etcd. only for 6443 api | 20:04 |
brtknr | Can you curl the k8s api? | 20:09 |
brtknr | born2bake: Anyway have fun investigating, I’m going to bed, I strongly suspect your lb config | 20:10 |
born2bake | okay, yeah I would need to do some testing on octavia | 20:11 |
born2bake | thanks a lot! | 20:11 |
flwang1 | brtknr: ping, are you there? | 20:28 |
*** born2bake has quit IRC | 21:36 | |
*** ttsiouts has joined #openstack-containers | 21:44 | |
*** rcernin has joined #openstack-containers | 22:11 | |
*** ttsiouts has quit IRC | 22:18 |
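
Editor's note: the version check discussed in the log (listing the source directory inside the kolla container and reading names such as magnum-9.2.0 or magnum-9.1.0.dev212) can be sketched roughly as below. This is only an illustration, not anything the participants ran. The helper names `release_tuple` and `is_at_least` are hypothetical, the comparison ignores suffixes such as `.dev212`, and 9.3.0 is used as the target only because it is the release mentioned in the discussion.

```python
import re


def release_tuple(dirname: str) -> tuple[int, ...]:
    """Extract the numeric release from a name like 'magnum-9.1.0.dev212'.

    Hypothetical helper: only the leading dotted digits after a '-' are kept,
    so a '.dev212' suffix is ignored (a deliberate simplification).
    """
    match = re.search(r"-(\d+(?:\.\d+)*)", dirname)
    if match is None:
        raise ValueError(f"no version found in {dirname!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def is_at_least(dirname: str, required: str) -> bool:
    """Return True if the release in dirname is >= the required dotted release."""
    required_tuple = tuple(int(part) for part in required.split("."))
    return release_tuple(dirname) >= required_tuple


if __name__ == "__main__":
    # Directory names quoted in the log above.
    for name in ("magnum-9.2.0", "magnum-9.1.0.dev212"):
        print(name, release_tuple(name), is_at_least(name, "9.3.0"))
```

Under these assumptions both directory names from the log compare lower than 9.3.0, which matches the observation that neither the train nor the master image carried that release at the time.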