*** flwang has joined #openstack-containers | 00:21 | |
flwang | brtknr: around? | 00:21 |
---|---|---|
*** ramishra has joined #openstack-containers | 01:18 | |
*** ramishra has quit IRC | 01:26 | |
openstackgerrit | Xinliang Liu proposed openstack/magnum master: Prevent scripts from exiting when there is no error https://review.opendev.org/700485 | 01:32 |
*** ramishra has joined #openstack-containers | 01:48 | |
openstackgerrit | Merged openstack/magnum stable/train: [bug] Fix regression when use_podman=false https://review.opendev.org/709782 | 02:26 |
openstackgerrit | Merged openstack/magnum stable/train: [k8s] Make metrics-server work without DNS https://review.opendev.org/709781 | 02:29 |
openstackgerrit | Merged openstack/magnum stable/train: Fix api-cert-manager=true blocking cluster creation https://review.opendev.org/709778 | 02:29 |
openstackgerrit | Merged openstack/magnum stable/train: k8s_coreos Set REQUESTS_CA for heat-agent https://review.opendev.org/709777 | 02:29 |
openstackgerrit | Merged openstack/magnum stable/train: Fix Field `health_status_reason[api]' cannot be None` https://review.opendev.org/709776 | 02:29 |
openstackgerrit | Merged openstack/magnum stable/train: Fix the load balancer description regex pattern for deleting cluster https://review.opendev.org/709775 | 02:29 |
*** iokiwi has quit IRC | 03:59 | |
*** iokiwi has joined #openstack-containers | 03:59 | |
*** udesale has joined #openstack-containers | 04:35 | |
*** ykarel|away is now known as ykarel | 05:08 | |
*** rcernin has quit IRC | 05:33 | |
*** rcernin has joined #openstack-containers | 05:33 | |
brtknr | flwang: hi | 05:53 |
*** rcernin has quit IRC | 06:24 | |
*** AJaeger has left #openstack-containers | 07:24 | |
*** ykarel is now known as ykarel|lunch | 07:35 | |
*** pcaruana has joined #openstack-containers | 07:42 | |
*** udesale has quit IRC | 08:31 | |
*** udesale has joined #openstack-containers | 08:32 | |
*** ykarel|lunch is now known as ykarel | 09:30 | |
cosmicsound | good day | 10:35 |
*** pcaruana has quit IRC | 10:41 | |
*** pcaruana has joined #openstack-containers | 10:55 | |
brtknr | cosmicsound: hi | 11:10 |
brtknr | strigazi: i don't understand the api versioning | 11:10 |
cosmicsound | hey brtknr | 11:11 |
brtknr | cosmicsound did you get anywhere? | 11:12 |
cosmicsound | brtknr , i got here: https://mdb.uhlhost.net/uploads/32e6d3c9ef5de88f/image.png | 11:19 |
cosmicsound | this is the last step, which eventually fails | 11:19 |
cosmicsound | grepped all kolla logs and searched for magnum | 11:19 |
cosmicsound | got this | 11:19 |
cosmicsound | http://paste.openstack.org/show/790049/ | 11:19 |
cosmicsound | this was on yesterday's deployment; today i start from scratch again | 11:20 |
cosmicsound | kube_deploy_cluster is the last step that fails, ending with a timeout; again not much info to debug on it | 11:20 |
cosmicsound | will get heat-agent logs from this new deployment | 11:23 |
cosmicsound | pity this is always empty: openstack software deployment output show --all --long | 11:24 |
cosmicsound | latest http://paste.openstack.org/show/790064/ cloud-init.log | 11:36 |
cosmicsound | latest http://paste.openstack.org/show/790065/ cloud-init-output.log | 11:37 |
cosmicsound | the network errors are weird, because the machines have local and public IPs | 11:41 |
*** ivve has joined #openstack-containers | 11:53 | |
*** udesale_ has joined #openstack-containers | 12:19 | |
*** udesale_ has quit IRC | 12:21 | |
*** udesale_ has joined #openstack-containers | 12:21 | |
*** udesale has quit IRC | 12:22 | |
*** mgariepy has joined #openstack-containers | 12:32 | |
*** mgariepy has quit IRC | 12:38 | |
*** alti_17 has joined #openstack-containers | 12:44 | |
*** iokiwi has quit IRC | 12:47 | |
*** iokiwi has joined #openstack-containers | 12:48 | |
alti_17 | Hello, FYI, for those who are using fedora coreos 31 stable (in our case 31.20200127): yesterday the Stable stream was updated to 31.20200223 https://getfedora.org/en/coreos/download/ . If you start cluster provisioning using the older image, it often fails with errors related to the SoftwareDeployment stage; looks like it's caused by coreos zincati | 12:52 |
alti_17 | https://docs.fedoraproject.org/en-US/fedora-coreos/auto-updates/ auto-updates feature. It starts an update and restart of the OS during heat script execution and interrupts it | 12:52 |
*** mgariepy has joined #openstack-containers | 12:52 | |
*** alti_17 has quit IRC | 12:53 | |
*** alti_17 has joined #openstack-containers | 12:54 | |
*** udesale_ has quit IRC | 13:05 | |
*** udesale_ has joined #openstack-containers | 13:05 | |
*** markguz_ has joined #openstack-containers | 13:06 | |
markguz_ | Hi, I'm trying to deploy a 1 master, 3 node cluster on fedora coreos using the latest version of magnum (git master) | 13:10 |
markguz_ | however consistently 1 out of 3 of the minions fails to be configured by heat. I see this error in the logs "Command failed, will not cache new data. Command 'os-refresh-config' died with <Signals.SIGTERM: 15>" | 13:11 |
markguz_ | and then the heat-container-agent dies | 13:11 |
markguz_ | i can't find any more detailed logging inside the vm.. anyone got any hints as to where to look? | 13:12 |
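For context, on the fedora coreos driver the heat agent runs as a podman container managed by systemd, so its output usually lands in the journal rather than a plain log file. A sketch of where one might look inside the vm (the `heat-container-agent` unit name is assumed from the fedora-coreos magnum driver and may differ per release):

```shell
# Agent output goes to the journal (unit name is an assumption, check with
# `systemctl list-units | grep heat` if it differs on your image).
sudo journalctl -u heat-container-agent --no-pager

# Per-script output, when the agent gets far enough to run anything, is
# typically written under /var/log/heat-config/.
sudo ls /var/log/heat-config/ 2>/dev/null
```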
brtknr | markguz_: try inside /var/log/heat-config/ | 13:12 |
markguz_ | brtknr: no such folder | 13:13 |
brtknr | markguz_: so heat container is failing for a different reason | 13:13 |
brtknr | markguz_: so when you have 1 master 1 worker its successful? | 13:13 |
brtknr | markguz_: which image version are you using? | 13:14 |
markguz_ | brtknr: i checked on the nodes that did configure and they don't have that folder either | 13:14 |
markguz_ | fedora-coreos-31.20200127.3.0 | 13:14 |
brtknr | markguz_: are you using heat container agent ussuri-dev | 13:14 |
markguz_ | brtknr: yes | 13:14 |
brtknr | markguz_: so when you have 1 master 1 worker its successful? | 13:15 |
markguz_ | brtknr: i will try that | 13:15 |
alti_17 | This is no longer the latest version | 13:16 |
alti_17 | fedora-coreos-31.20200127.3.0 | 13:16 |
markguz_ | alti_17: it was yesterday? | 13:16 |
markguz_ | that's when i downloaded it | 13:16 |
markguz_ | there was no newer version available | 13:16 |
markguz_ | did they release another? | 13:17 |
alti_17 | Yes, 1 Day ago new one was released | 13:17 |
alti_17 | 31.20200210.3.0 stable | 13:17 |
markguz_ | do y'all have a verified version that you know works? | 13:17 |
alti_17 | And it caused issues for me, because of https://docs.fedoraproject.org/en-US/fedora-coreos/auto-updates/ zincati auto updates.. it updates and restarts the vm during heat script execution, which causes errors for me. | 13:18 |
alti_17 | I'm testing right now; looks like the new version works, but I haven't verified it yet | 13:18 |
markguz_ | alti_17: YES! i saw those zincati things | 13:19 |
alti_17 | I think the magnum driver doesn't expect that the vm will be rebooted by zincati. It also causes issues for already provisioned clusters; we can't scale them now | 13:20 |
markguz_ | alti_17: does that mean coreos is broken for the moment? | 13:23 |
markguz_ | maybe zincati can be disabled in the user data before config starts.. | 13:25 |
alti_17 | I have just created 2 clusters in a row using the very latest version of coreos. I will start looking at how to disable/control coreos/zincati update behavior, but meanwhile maybe someone from the Magnum maintainers will share some thoughts about it; maybe there are some solutions which we missed | 13:29 |
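For the record, the fedora coreos auto-updates docs linked above describe a config drop-in that turns zincati off entirely; a minimal sketch of a file that could be shipped via ignition/user data before the heat scripts run (the drop-in filename is arbitrary, only the directory and keys matter):

```toml
# /etc/zincati/config.d/90-disable-auto-updates.toml
# Disables zincati-driven auto updates, and with them the mid-deploy reboots
# described in this discussion (per the fedora coreos auto-updates docs).
[updates]
enabled = false
```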
markguz_ | i'm just about to try it. was the zincati feature recently added to coreos? | 13:30 |
markguz_ | or perhaps using atomic 29 | 13:33 |
alti_17 | Just for history, this is the journalctl log from a node vm when you are not using the latest coreos. Reboot triggered. Master nodes are not rebooted. No nodes are rebooted if you use the latest coreos | 13:36 |
alti_17 | Feb 27 11:03:56 alti17-2xw75kfskmyd-node-0 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-tmpfiles-se> | 13:36 |
alti_17 | Persistent Storage... | 13:36 |
alti_17 | BOOT_IMAGE=(hd0,gpt1)/ostree/fedora-coreos-19190477fad0e60d605a623b86e06bb92aa318b6b79f78696b06f68f262ad5d6/vmlinuz-5.4.17-> | 13:36 |
*** lpetrut has joined #openstack-containers | 13:36 | |
markguz_ | hmm. ok i got the image uploaded just about to try spinning a cluster | 13:36 |
markguz_ | i hope they disabled auto rebooting.. that seems like a dumb thing to do with something like coreos. | 13:38 |
markguz_ | it's a dumb thing to do on anything that is designed to run services... | 13:39 |
alti_17 | This is, by the way, the coreos "killer feature" https://getfedora.org/en/coreos/ "Fedora CoreOS is an automatically-updating"; maybe we just don't know how to operate it properly | 13:41 |
*** waverider has joined #openstack-containers | 13:44 | |
markguz_ | auto updating... maybe.. auto rebooting.. definitely not | 13:49 |
*** alti_17 has quit IRC | 13:50 | |
*** alti_17 has joined #openstack-containers | 13:57 | |
*** pcaruana has quit IRC | 13:59 | |
*** alti_17 has quit IRC | 14:07 | |
*** alti_17 has joined #openstack-containers | 14:07 | |
*** ykarel is now known as ykarel|away | 14:08 | |
*** mgariepy has quit IRC | 14:14 | |
*** mgariepy has joined #openstack-containers | 14:55 | |
*** udesale_ has quit IRC | 14:57 | |
*** pcaruana has joined #openstack-containers | 15:01 | |
*** alti_17 has quit IRC | 16:08 | |
*** markguz_ has quit IRC | 16:31 | |
*** waverider has quit IRC | 16:40 | |
*** lpetrut has quit IRC | 16:40 | |
*** jmlowe has quit IRC | 16:54 | |
cosmicsound | what's the cli you guys use to upload the latest fedora-coreos image? | 16:58 |
cosmicsound | whenever i build it, it is not visible in magnum at template creation | 16:59 |
*** alti_17 has joined #openstack-containers | 17:04 | |
alti_17 | cosmicsound: something like this: openstack image create \ fedora-coreos-31.20200127.3.0 | 17:05 |
alti_17 | You might have missed os_distro or the public key | 17:06 |
cosmicsound | what is the right os_distro? | 17:06 |
cosmicsound | fedora-coreos? | 17:07 |
cosmicsound | do i also need to specify a default username, like on fedora-atomic? | 17:07 |
cosmicsound | i used this: openstack image create "fedora-coreos-31.20200127.3.0" --file fedora-coreos-latest.qcow2 --disk-format qcow2 --container-format=bare --min-disk 10 --min-ram 4096 --public --protected --property hw_scsi_model=virtio-scsi --property hw_disk_bus=scsi --property hw_qemu_guest_agent=yes --property os_distro=fedora-coreos --property os_admin_user=fedora --property os_version="31.20200127.3.0" | 17:08 |
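As background for this exchange: magnum picks its driver by matching the image's `os_distro` property, so that property has to be exactly `fedora-coreos` for the fedora-coreos driver. A minimal upload sketch with only the property magnum needs (the filename and image name are placeholders, and running it needs working cloud credentials):

```shell
# Minimal glance upload for a magnum fedora-coreos image; extra hw_* properties
# from the command above are optional tuning, not required for visibility.
openstack image create "fedora-coreos-31.20200223.3.0" \
  --file fedora-coreos-latest.qcow2 \
  --disk-format qcow2 \
  --container-format bare \
  --public \
  --property os_distro=fedora-coreos
```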
cosmicsound | https://mdb.uhlhost.net/uploads/330b518a122d9356/image.png in the end is only the fedora atomic visible | 17:09 |
alti_17 | Which version of openstack/magnum do you have? | 17:11 |
cosmicsound | train/2.5.0 | 17:11 |
cosmicsound | sorry | 17:13 |
cosmicsound | 2.17.0 | 17:13 |
cosmicsound | I upgraded from 2.15 | 17:13 |
cosmicsound | openstack 5.0.0 | 17:14 |
cosmicsound | Train release | 17:14 |
alti_17 | Sorry, a bit confused; magnum train has a 9.* version https://docs.openstack.org/releasenotes/magnum/train.html | 17:17 |
cosmicsound | that's the cli version, my bad | 17:17 |
alti_17 | starting from the 9.1.0 tag, fedora-coreos is supported | 17:18 |
cosmicsound | hmm, i am also confused about how to find the real magnum version | 17:18 |
cosmicsound | :D not the cli version | 17:18 |
*** alti_17 has quit IRC | 17:44 | |
cosmicsound | magnum-conductor --version | 17:51 |
cosmicsound | 9.2.0 | 17:51 |
cosmicsound | this is the version i run | 17:51 |
cosmicsound | still my image is not visible | 17:51 |
cosmicsound | i'll try to go to the latest image version | 17:52 |
*** jmlowe has joined #openstack-containers | 18:01 | |
*** jmlowe has quit IRC | 18:05 | |
*** jmlowe has joined #openstack-containers | 18:05 | |
*** jmlowe has quit IRC | 18:27 | |
*** jmlowe has joined #openstack-containers | 18:30 | |
*** jmlowe has quit IRC | 18:34 | |
*** jmlowe has joined #openstack-containers | 18:35 | |
*** jmlowe has quit IRC | 18:36 | |
*** jmlowe has joined #openstack-containers | 18:56 | |
*** jmlowe has quit IRC | 18:59 | |
-openstackstatus- NOTICE: Memory pressure on zuul.opendev.org is causing connection timeouts resulting in POST_FAILURE and RETRY_LIMIT results for some jobs since around 06:00 UTC today; we will be restarting the scheduler shortly to relieve the problem, and will follow up with another notice once running changes are reenqueued. | 19:10 | |
*** jmlowe has joined #openstack-containers | 19:31 | |
*** pcaruana has quit IRC | 19:38 | |
-openstackstatus- NOTICE: The scheduler for zuul.opendev.org has been restarted; any changes which were in queues at the time of the restart have been reenqueued automatically, but any changes whose jobs failed with a RETRY_LIMIT, POST_FAILURE or NODE_FAILURE build result in the past 14 hours should be manually rechecked for fresh results | 19:44 | |
*** jmlowe has quit IRC | 20:07 | |
*** jmlowe has joined #openstack-containers | 20:30 | |
*** rcernin has joined #openstack-containers | 21:44 | |
*** ivve has quit IRC | 22:57 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!