*** zigo_ is now known as zigo | 09:43 |
jrosser | morning | 11:53 |
noonedeadpunk | o/ | 12:28 |
opendevreview | Andrew Bonney proposed openstack/openstack-ansible master: WIP: [doc] Update distribution upgrades document for 2023.1/jammy https://review.opendev.org/c/openstack/openstack-ansible/+/906832 | 13:03 |
noonedeadpunk | andrewbonney: I've commented our solution to one of your TODO items fwiw | 13:13 |
andrewbonney | Ta. I've got a copy of your script, just haven't got to using it during this process yet | 13:13 |
noonedeadpunk | ah, ok, was not sure if I've ever shared this | 13:13 |
noonedeadpunk | also likely it should be polished a bit... but well | 13:14 |
jrosser | noonedeadpunk: do you know how nested virt is supposed to work in CI jobs? | 13:14 |
jrosser | i see that there are nested virt enabled flavors here https://opendev.org/openstack/project-config/src/branch/master/nodepool/nl03.opendev.org.yaml#L147-L188 | 13:14 |
noonedeadpunk | i somehow thought that nested virt was a requirement from the infra side before joining | 13:15 |
jrosser | but we always make `nova_virt_type: qemu` in user_variables for any zuul job | 13:15 |
jrosser | and this is terribly slow for the capi job, like 10x slower | 13:15 |
noonedeadpunk | qemu is very slow, yes | 13:16 |
noonedeadpunk | as generally tempest just spawns VMs with cirros to check connectivity and drops them | 13:16 |
jrosser | so i was wondering if we are somehow doing it wrong, and the nodeset choice is supposed to decide if it's nested virt or not | 13:16 |
noonedeadpunk | or well, maybe cases with octavia where LBs are spawned, but all cases have quite a small workload | 13:17 |
jrosser | we can detect this at runtime with `cat /sys/module/kvm_amd/parameters/nested` | 13:18 |
noonedeadpunk | so you mean that with nested virt we should be able to use kvm? | 13:18 |
jrosser | (or kvm_intel as needed) | 13:18 |
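A minimal shell sketch of the runtime check jrosser describes, assuming the standard kvm_intel/kvm_amd module parameter paths (newer kernels report 1/0, older Intel hosts Y/N):

```bash
# nested virt is enabled when the loaded kvm module reports it; only the
# file matching the host CPU vendor's module will exist
for mod in kvm_intel kvm_amd; do
    f="/sys/module/${mod}/parameters/nested"
    [ -r "$f" ] && echo "${mod}: $(cat "$f")"
done
```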
jrosser | yeah so in my cloud here i am running AIO for capi with nested virt enabled | 13:18 |
jrosser | and because it's not a zuul job it actually is using kvm accel | 13:18 |
noonedeadpunk | and the holds you saw all had it enabled? | 13:19 |
jrosser | and the magnum cluster creates in 4 mins | 13:19 |
jrosser | and in the CI hold i have, the nodes are nested virt enabled, but we set it to qemu in zuul user_variables | 13:19 |
noonedeadpunk | yeah, I see | 13:19 |
jrosser | and then it takes like 40mins+ to create the capi cluster | 13:19 |
noonedeadpunk | makes sense to use kvm when we can, sure | 13:20 |
jrosser | and the CPUs are just pegged at 100% on qemu processes | 13:20 |
noonedeadpunk | maybe this will get some boost for other jobs as well | 13:20 |
jrosser | yeah, and i guess if i can make it dynamic detection then if the provider supports it, it will use it | 13:21 |
noonedeadpunk | I feel like it should be >90% of cases... | 13:22 |
noonedeadpunk | Like testing kata would be impossible otherwise... | 13:23 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible master: Enable nested virtualisation in the AIO when it is available https://review.opendev.org/c/openstack/openstack-ansible/+/907327 | 14:11 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-os_magnum master: Add job to test Vexxhost cluster API driver https://review.opendev.org/c/openstack/openstack-ansible-os_magnum/+/905199 | 14:12 |
* noonedeadpunk wonders if execution time will drop significantly | 14:23 |
jrosser | i guess in a normal job this will only affect booting cirros, so perhaps not | 14:23 |
jrosser | but the capi job makes an amphora and two ubuntus... | 14:23 |
noonedeadpunk | yeah, true... | 14:23 |
noonedeadpunk | tempest run still takes 8mins... | 14:25 |
noonedeadpunk | So if it drops to 3-5 mins, that's noticeable... | 14:25 |
jrosser | that would be good | 14:25 |
noonedeadpunk | jrosser: seems it's not working out: https://zuul.opendev.org/t/openstack/build/bad24e6c50114083a56c7ae392f99dfe/log/logs/openstack/aio1-utility/tempest_run.log.txt | 15:17 |
noonedeadpunk | I know it's a distro job, but it fails very much like it would with such a change | 15:17 |
noonedeadpunk | with the VM crashing on startup | 15:18 |
jrosser | hmm maybe it needs to be opt-in for the job | 15:26 |
noonedeadpunk | and you have KVM not QEMU locally? | 15:27 |
noonedeadpunk | Like, you sure about that?:) | 15:27 |
noonedeadpunk | jrosser: as we are already trying to guess kvm/qemu here: https://opendev.org/openstack/openstack-ansible-os_nova/src/branch/master/tasks/nova_virt_detect.yml | 15:29 |
noonedeadpunk | so actually, we could just leave it undefined and rely on the nova role... | 15:30 |
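Roughly, the detection noonedeadpunk points at boils down to preferring kvm when the host exposes usable hardware acceleration; a loose shell rendering of that idea (the real logic lives in the nova_virt_detect.yml linked above, this is only a sketch):

```bash
# prefer hardware-accelerated kvm when /dev/kvm is present,
# otherwise fall back to plain qemu emulation
if [ -e /dev/kvm ]; then
    nova_virt_type=kvm
else
    nova_virt_type=qemu
fi
echo "detected nova_virt_type=${nova_virt_type}"
```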
jrosser | argh i see | 15:31 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible master: Allow nova role to detect virtalisation type in CI jobs https://review.opendev.org/c/openstack/openstack-ansible/+/907327 | 15:36 |
jrosser | noonedeadpunk: in my local AIO vm i see `-accel kvm` on the /usr/bin/qemu-system-x86_64 process | 15:37 |
noonedeadpunk | mhm, I see | 15:37 |
noonedeadpunk | jrosser: fwiw, it's failing quite dramatically as well | 17:03 |
noonedeadpunk | but not for everything.... | 17:04 |
jrosser | ok so there is probably very good reason to have nested virt specific nodesets | 17:04 |
noonedeadpunk | I wonder what's wrong though, and why it's not caught by the nova role then | 17:05 |
jrosser | as it may just be br0k in certain providers | 17:05 |
noonedeadpunk | as the conditions there looked about right | 17:05 |
jrosser | well if you have unfortunate kernel troubles between host/guest doesn't this all go a bit bad? | 17:05 |
jrosser | afaik it's very sensitive to host side things | 17:05 |
jrosser | if you codesearch for the virt_type stuff it's just tons of "use qemu in CI" comments all over | 17:07 |
noonedeadpunk | yeah, that's probably the reason why it's there | 17:09 |
jrosser | anyway - the node type for the capi job is one where this is actually supposed to work | 17:12 |
jrosser | so we still need a way to opt in | 17:12 |
jrosser | OMG it works https://review.opendev.org/c/openstack/openstack-ansible-os_magnum/+/905199?tab=change-view-tab-header-zuul-results-summary | 17:44 |
jrosser | \o/ | 17:44 |
noonedeadpunk | wow | 17:56 |
noonedeadpunk | that is veeeery promising :) | 17:56 |
noonedeadpunk | only 2h:) | 17:57 |
noonedeadpunk | even faster than the upgrade job :D | 17:57 |
jrosser | they are fast nodes | 18:47 |
TheCompWiz | jrosser: I finally broke down and tried the AIO install, and I'm hitting yet another brick wall. The bootstrap_aio script dies when trying to "Download EPEL gpg keys" | 19:32 |
TheCompWiz | wait... why on earth would it be trying to install centos EPEL keys on ubuntu? | 19:38 |
noonedeadpunk | TheCompWiz: that is the correct question :D | 19:39 |
TheCompWiz | ansible facts say distribution is ubuntu... | 19:50 |
jrosser | can you share the output? | 19:50 |
TheCompWiz | https://paste.openstack.org/show/bPrmCUBAQKp3mRy0LpGb/ | 19:51 |
TheCompWiz | the ansible facts show this: "ansible_os_family": "Debian" | 19:51 |
TheCompWiz | and everything I see says the "Install EPEL" block shouldn't even be run... | 19:52 |
TheCompWiz | brb 1 sec... switching PCs. | 19:54 |
*** TheCompWiz is now known as TCW | 19:55 | |
TheCompWiz | back | 19:56 |
jrosser | it's not to do with debian or not | 19:58 |
jrosser | the error says `error while evaluating conditional ('s3fs' in systemd_mount_types)` | 19:58 |
jrosser | seeing the output of tasks to do with setting up the AIO storage is going to be the only way to understand this | 20:11 |
noonedeadpunk | TheCompWiz: to be more specific `'_bootstrap_host_data_disk_device' is undefined.` | 20:36 |
TheCompWiz | which is odd... because I set export BOOTSTRAP_OPTS="bootstrap_host_data_disk_device=sdb bootstrap_host_data_disk_fs_type=xfs bootstrap_host_public_interface=ens34" | 20:37 |
TheCompWiz | before running bootstrap. | 20:37 |
TheCompWiz | and yes, sdb does exist. | 20:37 |
TheCompWiz | and was partitioned & formatted. | 20:37 |
noonedeadpunk | aha, ok, that's important input | 20:39 |
dmsimard[m] | noonedeadpunk: are you going to fosdem after all? :P | 20:39 |
noonedeadpunk | dmsimard[m]: nah :( Like I even got time and funding, but task workload is just /o\ | 20:39 |
noonedeadpunk | so next time I guess | 20:40 |
dmsimard[m] | I know the feeling, there'll be a next time, no stress | 20:40 |
noonedeadpunk | TheCompWiz: that actually should have worked.... | 20:47 |
noonedeadpunk | I'd need to try to reproduce that, but only tomorrow :( | 20:48 |
noonedeadpunk | though then it should have failed waaaay earlier I guess | 20:49 |
noonedeadpunk | like here: https://opendev.org/openstack/openstack-ansible/src/branch/master/tests/roles/bootstrap-host/tasks/check-requirements.yml#L135-L144 | 20:50 |
noonedeadpunk | TheCompWiz: oh.... what if you try to drop the partition table? | 20:53 |
noonedeadpunk | that looks like a bug | 20:53 |
noonedeadpunk | so we define _bootstrap_host_data_disk_device only when no partition table exists | 20:54 |
noonedeadpunk | https://opendev.org/openstack/openstack-ansible/src/branch/master/tests/roles/bootstrap-host/tasks/prepare_data_disk.yml#L36-L38 | 20:54 |
noonedeadpunk | But the next task assumes it's defined and uses another condition | 20:55 |
noonedeadpunk | which I think is exactly what you see | 20:55 |
noonedeadpunk | so the solution - add the disk, set everything the same, but don't bother yourself with partition creation on the drive :D | 20:55 |
noonedeadpunk | TheCompWiz: ^ | 20:55 |
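A sketch of the workaround noonedeadpunk suggests, using the device name from TheCompWiz's BOOTSTRAP_OPTS above (destructive: it erases all signatures on /dev/sdb, and both commands need root):

```bash
# show whether a partition table is present ("dos" or "gpt" when it is)
blkid -o value -s PTTYPE /dev/sdb
# wipe all filesystem/partition-table signatures so the bootstrap role
# sees a blank disk and defines _bootstrap_host_data_disk_device itself
wipefs --all /dev/sdb
```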
TheCompWiz | noonedeadpunk: wiped partitions, and re-running bootstrap-aio. I'll paste the results. | 21:22 |
TheCompWiz | hmmm... that seems to have worked. ... not sure how I ended up in that situation. | 21:24 |
spatel | How to force detach volume? | 21:54 |
spatel | I have one VM that has a single volume attached twice | 21:54 |
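spatel's question goes unanswered in the log; one commonly used approach (an assumption here, not advice from the channel) is to detach through the normal API and, if the attachment is stuck, reset the volume's attach status via the cinder admin CLI:

```bash
# normal detach first; for a volume showing two attachments this may
# need to be repeated per attachment
openstack server remove volume <server-uuid> <volume-uuid>
# if the attachment is stuck, reset the state in the cinder DB
# (admin-only; this does not clean up the hypervisor side)
cinder reset-state --attach-status detached <volume-uuid>
```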