*** mfo is now known as Guest1042 | 00:07 | |
*** mfo_ is now known as mfo | 00:07 | |
gibi | o/ | 07:05 |
---|---|---|
bauzas | \o | 07:27 |
*** bhagyashris|rover is now known as bhagyashris | 07:37 | |
opendevreview | Balazs Gibizer proposed openstack/nova master: Unparent PciDeviceSpec from PciAddressSpec https://review.opendev.org/c/openstack/nova/+/844491 | 08:02 |
gibi | elodilles, melwitt: what is the stable view on squashes? https://review.opendev.org/c/openstack/nova/+/843680/6#message-f9be680631a00541158243c55609a5804cc7777d | 08:33 |
gibi | stephenfin: do you have some context on https://review.opendev.org/c/openstack/hacking/+/816676 ? the hacking repo is green now with flake 4 but in nova we have a bunch of new findings. Was there any previous trial moving nova to the newer hacking? | 08:39 |
songwenping | bauzas:we donnot delete vgpu mdev when delete vm? | 08:42 |
sean-k-mooney | **cough** black **cough** | 08:42 |
sean-k-mooney | songwenping: ya our down stream qe may have also found that yesterday | 08:43 |
sean-k-mooney | songwenping: we are still confirming | 08:43 |
sean-k-mooney | songwenping: its unclear if something has changed recently i.e. if this only happens if you have mig mode enabled | 08:44 |
sean-k-mooney | or if this was always an oversight | 08:44 |
songwenping | sean-k-mooney: from this line https://github.com/openstack/nova/blob/8260979b71b29ce2666d37b3adc7c256482aa16d/nova/virt/libvirt/driver.py#L6485 we can reuse the mdev | 08:44 |
sean-k-mooney | songwenping: we should not be reusing the mdev | 08:44 |
kashyap | sean-k-mooney: I think you have Ubuntu box handy, can you please tell the QEMU binary path on it? (Not the /usr/bin/qemu-kvm, but the actual binary) | 08:45 |
sean-k-mooney | sure that my home server. its not in libexec like its on rhel related distros | 08:45 |
sean-k-mooney | the fact its different is waht prevent live migration form happening one sec while i check | 08:45 |
songwenping | so we should delete the mdev with vm. | 08:45 |
kashyap | sean-k-mooney: I know it's not in /usr/libexec - that's a RHEL/CentOS thing | 08:45 |
sean-k-mooney | kashyap: yep just sayign they used to be the same | 08:46 |
kashyap | sean-k-mooney: The thing is on Fedora the binary is this: /usr/bin/qemu-system-x86_64 | 08:46 |
sean-k-mooney | kashyap: but when centos change to that it broke live migration between centos and ubunutu | 08:46 |
bauzas | songwenping: no, we leave them | 08:47 |
kashyap | (And IIRC, Ubuntu uses a different binary name and a different path) | 08:47 |
sean-k-mooney | /usr/bin/qemu-system-x86_64 | 08:47 |
kashyap | Ah, same as Fedora then. Thanks | 08:47 |
sean-k-mooney | its qemu im not seeing qemu-kvm on 22.04 | 08:47 |
sean-k-mooney | let me double check | 08:47 |
sean-k-mooney | sean@cloud:~$ qemu-kvm | 08:47 |
bauzas | songwenping: you can even precreate the mdevs before starting the nova-compute service, then the n-cpu service will use them instead of creating them | 08:47 |
sean-k-mooney | qemu-kvm: command not found | 08:47 |
sean-k-mooney | so its not on my path at least | 08:47 |
kashyap | sean-k-mooney: I don't want that; it's an alias to the QEMU binary on Fedora | 08:48 |
kashyap | sean-k-mooney: So, "/usr/bin/qemu-system-x86_64" exists on Ubuntu, right? | 08:48 |
sean-k-mooney | yes | 08:48 |
songwenping | bauzas:then there will has race problem, https://bugs.launchpad.net/nova/+bug/1836204 | 08:48 |
sean-k-mooney | sean@cloud:~$ file /usr/bin/qemu-system-x86_64 | 08:48 |
sean-k-mooney | /usr/bin/qemu-system-x86_64: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=f28deae95b4b0f19a8c8be87c45d1be213cf8b6b, for GNU/Linux 3.2.0, stripped | 08:49 |
kashyap | sean-k-mooney: Thanks! | 08:49 |
sean-k-mooney | kashyap: it exiss and is not a symlink its the binary | 08:49 |
kashyap | Okay, I see that I could've checked it here: https://packages.ubuntu.com/bionic/i386/qemu-system-x86/filelist | 08:49 |
kashyap | sean-k-mooney: Yeah, I know; if that's the name, then it's the real binary | 08:49 |
kashyap | 'qemu-kvm' is the symlink on Fedora/CentOS | 08:49 |
sean-k-mooney | kashyap: not quite | 08:50 |
sean-k-mooney | well it might be a symlink but it was a script on ubuntu | 08:50 |
kashyap | $> file /usr/bin/qemu-kvm | 08:50 |
kashyap | /usr/bin/qemu-kvm: symbolic link to qemu-system-x86_64 | 08:50 |
songwenping | bauzas:if we create new mdev when alloc, there will be no this problem. | 08:50 |
kashyap | sean-k-mooney: Yeah, it used to be a wrapper script | 08:50 |
sean-k-mooney | yep | 08:50 |
kashyap | (Which just pointed to the real binary. So they removed that) | 08:50 |
sean-k-mooney | it was a wrapper on ubuntu in 20.04 | 08:50 |
sean-k-mooney | in 22.04 they droped it | 08:51 |
bauzas | songwenping: I don't see how it could help | 08:51 |
sean-k-mooney | kashyap: looks like ti snow a virtual package/metapackage https://packages.ubuntu.com/jammy/qemu-kvm | 08:51 |
sean-k-mooney | likely for upgrade reasons | 08:52 |
kashyap | Yeah, same on Fedora | 08:53 |
kashyap | $> rpm -ql qemu-kvm | 08:53 |
kashyap | (contains no files) | 08:53 |
sean-k-mooney | kashyap: if you want to poke around i can give you access to the server just let me know | 08:53 |
elodilles | gibi: for the sake of easier reviewing i'd say squash is better to avoid. some rare cases though we can have squash: e.g. in case otherwise we could not unblock the gate. or in case we can avoid some introduced bug that the follow-up fixes. | 08:53 |
kashyap | sean-k-mooney: Thank you, it's not needed at this time. I just wanted to be sure of the path - so that we could supply it to `coredumpctl` :) | 08:54 |
sean-k-mooney | hum never had to use that | 08:54 |
sean-k-mooney | does that parse them or cause them :) | 08:54 |
songwenping | bauzas: in this function https://github.com/openstack/nova/blob/8260979b71b29ce2666d37b3adc7c256482aa16d/nova/virt/libvirt/driver.py#L6416, we use _create_new_mediated_device for every vm instead of select a mdev from mdevs_available. | 08:55 |
sean-k-mooney | songwenping: we inteded to not reuse mdevs for new vms | 08:57 |
sean-k-mooney | the other path for intedd for agent restart if i rememebr correctly | 08:57 |
sean-k-mooney | songwenping: so we shoudl be delete the mdevs if we delete the vm | 08:58 |
songwenping | sean-k-mooney: cool | 08:58 |
sean-k-mooney | songwenping: this become more important for cards that can support more then one mdev at a time with generic mdev support | 08:58 |
sean-k-mooney | i dont think nvidia do with any of there gpus btu other vendors might in the future | 08:59 |
sean-k-mooney | bauzas: precrating does not work and i dont think it was part of the sepc so if that was added its a bug | 09:00 |
songwenping | sean-k-mooney: ack, agree to delete the mdevs if we delete the vm | 09:00 |
bauzas | sean-k-mooney: if we make nova creating the mdevs directly, this could create other problems | 09:00 |
bauzas | for reboot, for example | 09:00 |
sean-k-mooney | bauzas: we dont support using mdevctl and the current case faild our qe | 09:00 |
bauzas | remember the difference between SR-IOV devices and mdevs | 09:01 |
sean-k-mooney | bauzas: nova is not able to create the mdevs because it did not clean up | 09:01 |
sean-k-mooney | bauzas: yes i know | 09:01 |
sean-k-mooney | we only support nova creating the mdevs today | 09:01 |
sean-k-mooney | including in the mig case | 09:01 |
bauzas | in the mig case, nothing changes from nova pov | 09:02 |
bauzas | this is just that we precreate the pci devices | 09:02 |
bauzas | not the mdevs | 09:02 |
sean-k-mooney | yep | 09:02 |
songwenping | when manage vgpus by cyborg, we precreate the mdevs when discover | 09:02 |
sean-k-mooney | pre creat the pci device and list the vfs instead of the pf | 09:02 |
bauzas | songwenping: sean-k-mooney: anyway, I'm not against fixing this old bug | 09:02 |
bauzas | songwenping: if you have time for fixing it, would be appreciated | 09:03 |
bauzas | none of this requires a spec | 09:03 |
songwenping | bauzas:i'll try. | 09:06 |
kashyap | dansmith: To answer to your question in the scrollback: yeah, I'm guessing it's different - based on the trigger here (backup) vs. the older one (disk-detach) | 09:29 |
frickler | gibi: hacking-integration-nova has been busted for ages | 09:32 |
frickler | likely before we moved to zull v3 even https://zuul.opendev.org/t/openstack/builds?job_name=hacking-integration-nova&project=openstack%2Fhacking&result=SUCCESS&skip=0 | 09:32 |
frickler | *zuul | 09:33 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Fix PciAddressSpec descendants to call super.__init__ https://review.opendev.org/c/openstack/nova/+/844565 | 09:34 |
gibi | frickler: I think it shows that nova is not compatible with latest hacking | 09:34 |
gibi | frickler: am I mistaken? | 09:34 |
frickler | gibi: this is true. afaict nova also hasn't been compatible with at least 3 years of previous releases of hacking, so not much change | 09:35 |
gibi | ahh OK | 09:36 |
gibi | frickler: so you think we can land the flake version bump after I drop py35 testing or we need to fix nova first? | 09:36 |
sean-k-mooney | frickler: gibi if we use black hacking become less importnat. still nice for extra non style related checks like dont improt prev sep stuff with from | 09:37 |
sean-k-mooney | or any other rules we want to enforece but the systle related ones are less requried | 09:38 |
gibi | sean-k-mooney: the current failures are thinks that are not covered by black / flake | 09:38 |
frickler | gibi: nova caps to 3.1.0, there are 3.2.0, 4.0.0 and 4.1.0 already. so adapting nova to recent hacking should not block updating hacking IMO | 09:38 |
sean-k-mooney | ack i was just hoping it would reduce the number of things we have to port | 09:38 |
gibi | frickler: thanks | 09:38 |
gibi | I will update the hacking patch to drop py35 then | 09:39 |
sean-k-mooney | frickler: yep it should not | 09:39 |
frickler | also I must reduce my "3 years" claim to 2 years, 3.1.0 was released in May 2020. no idea how far back our history of zuul builds actually goes | 09:40 |
sean-k-mooney | frickler: i think there was a change in hackign that broke some fo our checks which is why we pinned | 09:54 |
sean-k-mooney | frickler: but then stephenfin who was lookign at fixing it moved to not work on nova as part of there day job | 09:55 |
sean-k-mooney | and we had others move too | 09:55 |
sean-k-mooney | so we just never got aroudn to adressint the issues | 09:55 |
sean-k-mooney | we might still have patches form stephenfin for some of the issues | 09:55 |
sean-k-mooney | not that i can see on this topic | 09:57 |
sean-k-mooney | frickler: we did recently move to 3.1 | 09:58 |
sean-k-mooney | https://review.opendev.org/c/openstack/nova/+/836639/1/.pre-commit-config.yaml | 09:58 |
sean-k-mooney | actully i guess the bump was done in victoria https://github.com/openstack/nova/commit/61b99a1295c5208deef806b69ba74a7d031ad851 | 10:03 |
frickler | sean-k-mooney: yes, that was right after the release of 3.1.0, but it seems after that, nova lost track | 10:04 |
sean-k-mooney | yep it was not broken so i guess we just never updated it | 10:05 |
sean-k-mooney | proably also just resouce constrianed/pandmic related | 10:05 |
* gibi hates the pci whitelist parsing code | 12:18 | |
sean-k-mooney | :) | 12:18 |
sean-k-mooney | dont you love how we only support bash globs if the adress is a sting and only support regexs if its a dictionary | 12:19 |
gibi | nah, the self.is_physical_function handling is worst in my eyes | 12:20 |
gibi | we write that flag twice based on two different utility function reading the same sysfs location | 12:21 |
sean-k-mooney | heh of course we do | 12:21 |
gibi | but yes, that string / dict duality is close second | 12:21 |
sean-k-mooney | we did it because of the conflict between * in glob and regex meendnin ins * is .* in regex land | 12:22 |
sean-k-mooney | i just wish we used regex form from the start | 12:22 |
gibi | yepp I figured that regex was added later and blow up the picture | 12:22 |
gibi | and of course the whole devname special case is a pain | 12:22 |
sean-k-mooney | i would not mind devname if it worked for all devices | 12:23 |
sean-k-mooney | but the fact that its unreliable and only works for nic | 12:23 |
sean-k-mooney | sometimes | 12:23 |
sean-k-mooney | is why i hate it | 12:23 |
sean-k-mooney | once you are done with the placment work i do still think its worth revisigint our config format in a differnt discussion | 12:24 |
gibi | I agree | 12:25 |
gibi | I will try to untangle as much of the current parsing code as possible before I add the new things to it | 12:25 |
sean-k-mooney | like in general can oslo config supprot yaml or can we add a resouces.yaml for all the complext host level resouces | 12:25 |
sean-k-mooney | ok cool | 12:26 |
sean-k-mooney | the other thing to be aware of is how we parse physical_network | 12:26 |
gibi | I'm not there yet :) | 12:26 |
sean-k-mooney | speicifcliy we are relying on physical_network=null being converted to python None | 12:27 |
sean-k-mooney | so we are using the json parsing to differnceat between not set, physical_network=null and pyhsical_network="null" or physical_network="None" | 12:28 |
gibi | nice | 12:28 |
sean-k-mooney | all 4 of those have differnt meanings | 12:28 |
sean-k-mooney | the last 2 are just strings that are the name of a phsyical network in neutron | 12:28 |
sean-k-mooney | pyhical_network=null without quotes is converted to python None and that is used for hardware offloaded ovs with tunneld networks | 12:29 |
sean-k-mooney | and unset means this is not a nic | 12:29 |
gibi | I would like to give prizes to deployments naming there physnets None and null :D | 12:29 |
sean-k-mooney | hehe ya | 12:30 |
sean-k-mooney | but since the null vaule was not planned to be supported | 12:30 |
sean-k-mooney | and was a bug that was abused for hardware offloed ovs | 12:30 |
sean-k-mooney | its posible | 12:30 |
sean-k-mooney | https://bugs.launchpad.net/nova/+bug/1915282 | 12:32 |
dansmith | kashyap: so it seems like this heisenbug has receded into the background, even though only one package has changed in fedora since last week (that we install) | 13:35 |
dansmith | kashyap: so I'm going to formalize my devstack patch so we'll just always capture qemu coredmps so we're ready for next time | 13:35 |
kashyap | dansmith: Huh, the bugs from hell. Same story w/ that other detach thing. Although we sussed out a latent libvirt bug from it | 13:36 |
kashyap | dansmith: Yeah, thank you; I also saw that the same path works for Ubuntu as well - I checked, and by extension, Debian too | 13:36 |
dansmith | scary if that means users can hit them, but yeah :/ | 13:36 |
dansmith | kashyap: ah cool | 13:36 |
kashyap | dansmith: I commented w/ a link to evidence your DNM patch | 13:36 |
dansmith | cool thanks | 13:37 |
opendevreview | Artom Lifshitz proposed openstack/nova stable/wallaby: fake: Ensure need_legacy_block_device_info returns False https://review.opendev.org/c/openstack/nova/+/843678 | 14:03 |
opendevreview | Artom Lifshitz proposed openstack/nova stable/wallaby: Add a regression test for bug 1939545 https://review.opendev.org/c/openstack/nova/+/843702 | 14:03 |
opendevreview | Artom Lifshitz proposed openstack/nova stable/wallaby: compute: Ensure updates to bdms during pre_live_migration are saved https://review.opendev.org/c/openstack/nova/+/843680 | 14:03 |
opendevreview | Artom Lifshitz proposed openstack/nova stable/wallaby: fup: Make connection_info returned by CinderFixture unique per attachment https://review.opendev.org/c/openstack/nova/+/844594 | 14:03 |
opendevreview | Artom Lifshitz proposed openstack/nova stable/wallaby: fup: Assert state of connection_info during LM rollback in func tests https://review.opendev.org/c/openstack/nova/+/844595 | 14:03 |
opendevreview | Artom Lifshitz proposed openstack/nova stable/victoria: compute: Ensure updates to bdms during pre_live_migration are saved https://review.opendev.org/c/openstack/nova/+/843949 | 14:15 |
opendevreview | Artom Lifshitz proposed openstack/nova stable/victoria: fup: Make connection_info returned by CinderFixture unique per attachment https://review.opendev.org/c/openstack/nova/+/844598 | 14:15 |
opendevreview | Artom Lifshitz proposed openstack/nova stable/victoria: fup: Assert state of connection_info during LM rollback in func tests https://review.opendev.org/c/openstack/nova/+/844599 | 14:15 |
opendevreview | Artom Lifshitz proposed openstack/nova stable/ussuri: fake: Ensure need_legacy_block_device_info returns False https://review.opendev.org/c/openstack/nova/+/843950 | 15:19 |
opendevreview | Artom Lifshitz proposed openstack/nova stable/ussuri: Add a regression test for bug 1939545 https://review.opendev.org/c/openstack/nova/+/843951 | 15:19 |
opendevreview | Artom Lifshitz proposed openstack/nova stable/ussuri: compute: Ensure updates to bdms during pre_live_migration are saved https://review.opendev.org/c/openstack/nova/+/843952 | 15:19 |
opendevreview | Artom Lifshitz proposed openstack/nova stable/ussuri: fup: Make connection_info returned by CinderFixture unique per attachment https://review.opendev.org/c/openstack/nova/+/844606 | 15:19 |
opendevreview | Artom Lifshitz proposed openstack/nova stable/ussuri: fup: Assert state of connection_info during LM rollback in func tests https://review.opendev.org/c/openstack/nova/+/844607 | 15:19 |
gibi | sean-k-mooney: I just realized that there is antoher edge case in the PCI parsing, the remove managed feature allows PF address and VF product ID in the same device spec test_remote_managed_pf_raises | 15:53 |
gibi | https://github.com/openstack/nova/blob/ffb810e2ba2fdec9b2a881a88fa6d65cd32f8fa3/nova/tests/unit/pci/test_devspec.py#L501-L513 | 15:53 |
*** xek__ is now known as xek | 15:56 | |
sean-k-mooney | gibi: that was a prexisting feature | 15:57 |
sean-k-mooney | they did not add it | 15:57 |
sean-k-mooney | that is how you whitelisted the VFs of a pf before we added teh glob and regex support | 15:57 |
sean-k-mooney | so thats been there since like icehouse | 15:57 |
gibi | it is pretty remote managed specific https://github.com/openstack/nova/blob/d86916360858daa06164ebc0d012b78d19ae6497/nova/pci/devspec.py#L315-L334 | 15:58 |
sean-k-mooney | no this work for any sriov device | 15:58 |
sean-k-mooney | they just added a test case for it | 15:58 |
gibi | I think what you are referring to is the ability to list PF and match VF, but the remote managed extends this to list PF with VF's product_id | 15:58 |
sean-k-mooney | so you used to be able to use the pf address and vf product id | 15:59 |
gibi | it has an if self._remote_managed: on top so this does not run for the rest of the caseas | 15:59 |
sean-k-mooney | and that would allow all the VFs of that pf to be used | 15:59 |
sean-k-mooney | but not the pf itself | 15:59 |
sean-k-mooney | just looking at the code you linked now | 15:59 |
gibi | in the generic case I only see PF address matching for VF devs, but no such logic for vendor / product matching | 16:00 |
sean-k-mooney | i think its this https://github.com/openstack/nova/blob/d86916360858daa06164ebc0d012b78d19ae6497/nova/pci/devspec.py#L254-L265= | 16:02 |
sean-k-mooney | but i would have tro look at this carfully | 16:02 |
sean-k-mooney | we do supprot suign adress=<pf addres> product_id=<vf id> | 16:02 |
gibi | nope, that is only matching addresses | 16:03 |
gibi | the PhysicalPciAddress has no info about the vendor / product | 16:03 |
sean-k-mooney | its match the adress object | 16:03 |
gibi | yepp | 16:03 |
gibi | only address matching happens there | 16:03 |
gibi | vendor / product happens independently https://github.com/openstack/nova/blob/d86916360858daa06164ebc0d012b78d19ae6497/nova/pci/devspec.py#L377-L378 | 16:04 |
sean-k-mooney | right but that is combiend with the vendro id and prodcut id filter elsewhere | 16:04 |
sean-k-mooney | yes it does so the way it works is we mach on vendor and product id first | 16:06 |
sean-k-mooney | and then we match on the adrees or the partent adress | 16:06 |
sean-k-mooney | https://github.com/openstack/nova/blob/d86916360858daa06164ebc0d012b78d19ae6497/nova/pci/devspec.py#L379-L380= | 16:06 |
melwitt | sean-k-mooney: I was wondering if you might be able to help see what's wrong with this failure, I think it might be related to ansible. I have written about it on L57 https://etherpad.opendev.org/p/nova-stable-branch-ci tl;dr is we're passing executable=/bin/bash with the ansible command args but it's running with /bin/sh on the remote side. I don't understand it | 16:07 |
melwitt | have you seen something like that before? | 16:07 |
sean-k-mooney | melwitt: i was talking to fungi about that yesterday on #opentack-infra | 16:07 |
gibi | sean-k-mooney: hm, OK, I see it now. But they why on earth we needed to re-implement the same (?) logic specifically for remote_managed at https://github.com/openstack/nova/blob/d86916360858daa06164ebc0d012b78d19ae6497/nova/pci/devspec.py#L322-L334 ? | 16:08 |
melwitt | sean-k-mooney: oh cool. I'll go check the channel log | 16:08 |
sean-k-mooney | gibi: remote managed does not otherwise allow the PF to be listed | 16:09 |
sean-k-mooney | if i recall correctly | 16:09 |
sean-k-mooney | melwitt: they did not come to a conclution either but were speculating it was related to hoave devstrack get invoke it and the #!/bin/bash lines in the hook script | 16:10 |
sean-k-mooney | but those point to bash too | 16:10 |
sean-k-mooney | melwitt: i think the fix is to stop usinging the raw module and actully use shell | 16:10 |
sean-k-mooney | i think the raw moduel is perhaps using a shle script to run your comamdn via the executabel you specify | 16:11 |
melwitt | sean-k-mooney: I wondered that exact same thing. the raw module seemed the only potentially suspicious thing to me | 16:11 |
sean-k-mooney | so raw is not ment to be used in normal code | 16:11 |
gibi | sean-k-mooney: thanks. I think I have not enough brain power for this right now | 16:11 |
sean-k-mooney | raw is intended to install the deps that are required for command and shell so that you can bootstrap an ansible target when its mising the basic required pacakages for ansibel to work | 16:12 |
melwitt | sean-k-mooney: yeah. supposedly saying executable=/bin/bash will make it run under bash and it must have used to but for some reason recently the behavior has changed | 16:12 |
sean-k-mooney | yes so looking at the ansible docs the args you would normally pass in ansibel you set as varibles in the string when you are doing an adhock command | 16:13 |
melwitt | I had been thinking about trying using the shell module instead to see if it helps | 16:13 |
melwitt | right | 16:13 |
sean-k-mooney | i kid of hate that we are invokeing ansibel form bash form ansible by the way | 16:13 |
melwitt | haha yeah ... | 16:14 |
sean-k-mooney | it feels like it woudl better just to use ssh and be done with it or a script that we copy and be doen with it | 16:14 |
sean-k-mooney | but its been working this way for years | 16:14 |
sean-k-mooney | and now just sudenlly stopped | 16:15 |
sean-k-mooney | gibi: one thing that is true is that we dont have gret test for all the valid way that you can list thing for the pci module | 16:15 |
melwitt | yeah. puzzling | 16:15 |
sean-k-mooney | like not clear human readble ones with comments at anyrate | 16:16 |
sean-k-mooney | so a lot fo the reasoning why x works is buried in my brain next to memroy of swaring at a terminal | 16:17 |
* sean-k-mooney notes the pci whitelist code predates me working on nova :) | 16:18 | |
* gibi is swearing too | 16:19 | |
* sean-k-mooney i just work here dont look behind the curtain of how it works if you want to be happy on a firday | 16:19 | |
sean-k-mooney | gibi: all set for the sumit? | 16:19 |
gibi | yepp, I'm well set. | 16:20 |
sean-k-mooney | melwitt: are you goign to push a wip patch by the way to try using shell? | 16:20 |
sean-k-mooney | gibi: do you have a presination ? | 16:20 |
gibi | yepp | 16:21 |
gibi | about the qos work | 16:21 |
sean-k-mooney | ah nice | 16:21 |
melwitt | sean-k-mooney: I will unless you wanted to. I have pretty much never used ansible and learning about it now | 16:21 |
sean-k-mooney | you had one on the bandwith work before with migule right so this is the evolution i guess | 16:21 |
gibi | sean-k-mooney: yes, it was in berlin too :) | 16:22 |
gibi | so a nice cycle | 16:22 |
sean-k-mooney | melwitt: i never use it for adhoc comands its kind of like python that way. sure you can pass a string of python code to python and it will execute it but i only ever use it with a script so im really not familar iwth the adhock syntax | 16:23 |
gibi | at that time we only had WIP code for booting with bandwidth qos ports. No we have a lot more | 16:23 |
melwitt | sean-k-mooney: yeah I was just reading the docs yesterday. thought it might be fun to try | 16:24 |
sean-k-mooney | this one https://docs.ansible.com/ansible/latest/user_guide/intro_adhoc.html | 16:24 |
chateaulav | gibi: im excited to see it | 16:25 |
sean-k-mooney | gibi: yep its amazing what can hapne over the course of a pandemic... | 16:25 |
chateaulav | its my first time to berlin | 16:25 |
melwitt | sean-k-mooney: thanks | 16:25 |
gibi | sean-k-mooney: I honestly want to fix the pci spec parsing code before I start adding to it, but it is soo full of edge cases that I start feeling the I cannot clean it up without potentially breaking couple of cases that works somehow today | 16:26 |
gibi | chateaulav: o/ | 16:26 |
sean-k-mooney | gibi: ya am if i get the vdpa stuff landed perhaps i can help add some more testcsaes to prove out some of those edgecases | 16:27 |
sean-k-mooney | liek we can test the parsign with the libvirt fuctional tests | 16:27 |
sean-k-mooney | we shoudl not need that level of testing to parse stings but that would be one way to validate teh edge cacses | 16:28 |
sean-k-mooney | gibi: in your case you are not really chanign the filtering | 16:29 |
sean-k-mooney | gibi: you are just addign new tags for resocue class and traits | 16:29 |
sean-k-mooney | so if you avoid refactoring and just focus on that it shoud eb pretty safe | 16:30 |
sean-k-mooney | the new tags you are adding will not filter in/out any devices | 16:30 |
sean-k-mooney | so you can mostly just operate on the objects that are constructed after the parsing is done | 16:31 |
sean-k-mooney | melwitt: by the way on later branches there are zuul rules to do this | 16:33 |
melwitt | sean-k-mooney: oh nice | 16:33 |
sean-k-mooney | https://github.com/SeanMooney/ansible_role_devstack/blob/master/ansible/deploy_multinode_devstack.yaml#L133-L136= | 16:33 |
sean-k-mooney | sync-controller-ceph-conf-and-keys, | 16:33 |
sean-k-mooney | https://github.com/openstack/devstack/tree/master/roles/sync-controller-ceph-conf-and-keys | 16:34 |
sean-k-mooney | actully its in devstack | 16:35 |
sean-k-mooney | im kind of confused why we are donign this in hte gate hook | 16:35 |
gibi | sean-k-mooney: I promised to reject the devname case in the new codepath, for that I need to spearate out the devname handling codepath from the current parsing | 16:36 |
sean-k-mooney | am not really you jus need to check if the whitelist containds devname | 16:36 |
sean-k-mooney | gibi: if you realy wanted too you could also punt on that for this cycle | 16:37 |
gibi | yepp that does not allows removing that devname codepath from the complexity picture | 16:37 |
sean-k-mooney | well devname does not realy change anything for the rest of the spec or code | 16:37 |
gibi | I just add the guard condition, but the code below will still depend on devname | 16:37 |
gibi | so the code below will as today, complex, uggly, :/ | 16:38 |
sean-k-mooney | yep but that wont affect tracking the dvices in placment | 16:38 |
gibi | and I just pile on that with the guard | 16:38 |
gibi | I don't like piling on hard to comprehend code | 16:38 |
sean-k-mooney | what im saying is if you dont feel like you will have time to adress the devname part | 16:38 |
sean-k-mooney | we could put that out os scope fo the spec an come back to it | 16:38 |
sean-k-mooney | after | 16:38 |
sean-k-mooney | i.e. we can deprecate and remove it later | 16:39 |
gibi | yeah I got you. But I feel that this is a good change to clean it up. I don't know when will be the next time we come back to this | 16:40 |
sean-k-mooney | yep it certenly is | 16:41 |
sean-k-mooney | next time would proably be either when we make this on by default or add support for neutron prots | 16:42 |
sean-k-mooney | you are right that we likely wont just go clean tthis up if its not in aide of something else | 16:42 |
gibi | yepp | 16:43 |
sean-k-mooney | melwitt: by the way im really confused why the hook works the way it does | 16:46 |
sean-k-mooney | like we are runnign devstack on all the noes anyway so im not sure why se need the gate hook to reach out to all the other subnodes and execute command on them to deploy ceph | 16:47 |
sean-k-mooney | you know instead of just runign the ceph plugin as part of the devstack run on thsoe nodes | 16:47 |
sean-k-mooney | melwitt: we are using the shell module elsewhere in the file by the way | 16:48 |
sean-k-mooney | https://github.com/openstack/nova/blob/stable/train/gate/live_migration/hooks/ceph.sh#L62 | 16:48 |
sean-k-mooney | melwitt: just in case your wondering the fist thing after $ansible is the inventory group | 16:49 |
sean-k-mooney | $ANSIBLE subnodes --become -f 5 -i "$WORKSPACE/inventory" -m raw -a "executable=/bin/bash | 16:49 |
sean-k-mooney | so this is runnign the scrip on all the hosts in the subnodes group wich is jut the compute node | 16:49 |
sean-k-mooney | so ANSIBLE subnodes --become -f 5 -i "$WORKSPACE/inventory" -m shell -a "... would be the same | 16:50 |
sean-k-mooney | because this is using "" not '' $BASE is evaulatedin the context fo prepare_ceph too | 16:51 |
sean-k-mooney | https://github.com/openstack/nova/blob/stable/train/gate/live_migration/hooks/ceph.sh#L10-L17= | 16:52 |
sean-k-mooney | which i dont know if that was inteded or not | 16:52 |
sean-k-mooney | i guess so as it woud not otherwise be defiend | 16:52 |
sean-k-mooney | anyway i think im going to call it a day if you want me to take a look at this on monday then i can | 16:53 |
melwitt | sean-k-mooney: thank you for all the hints | 16:54 |
opendevreview | melanie witt proposed openstack/nova stable/train: DNM Testing for ceph setup gate fail https://review.opendev.org/c/openstack/nova/+/844530 | 17:21 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Unparent PciDeviceSpec from PciAddressSpec https://review.opendev.org/c/openstack/nova/+/844491 | 17:33 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Fix PciAddressSpec descendants to call super.__init__ https://review.opendev.org/c/openstack/nova/+/844565 | 17:33 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Add more test coverage for devname base dev spec https://review.opendev.org/c/openstack/nova/+/844625 | 17:33 |
opendevreview | Balazs Gibizer proposed openstack/nova master: test remote managed dev spec with wildcard address https://review.opendev.org/c/openstack/nova/+/844626 | 17:33 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Poison /sys access in test https://review.opendev.org/c/openstack/nova/+/844627 | 17:33 |
opendevreview | Balazs Gibizer proposed openstack/nova master: More comment in the code https://review.opendev.org/c/openstack/nova/+/844628 | 17:33 |
opendevreview | Merged openstack/nova stable/yoga: Add missing condition https://review.opendev.org/c/openstack/nova/+/843820 | 19:14 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!