opendevreview | Balazs Gibizer proposed openstack/nova master: Add force kwarg to delete_allocation_for_instance https://review.opendev.org/c/openstack/nova/+/688802 | 06:11 |
gibi | melwitt: ^^ removed the co-authored line as you requested | 06:11 |
gibi | lyarwood, stephenfin, bauzas: we are still pretty much blocking the openstack gate without ^^ | 06:34 |
lyarwood | I'm out today, the last public holiday of the year in the UK, but I'll review from my phone now. | 06:45 |
lyarwood | Okay done, LGTM. | 06:49 |
gibi | lyarwood: thanks, enjoy your day off | 06:57 |
abhishekk | gibi, py38 post failure for https://review.opendev.org/c/openstack/nova/+/688802, could you please do a recheck? | 08:39 |
elodilles | gibi: could you please have a quick look at this placement release patch for stable/ussuri? (it's a generated patch to avoid the release rush around the EM transition): https://review.opendev.org/c/openstack/releases/+/802110 | 10:01 |
gibi | elodilles: ack, I will check | 10:07 |
gibi | abhishekk: feel free to recheck next time | 10:09 |
gibi | elodilles: done and thanks | 10:11 |
elodilles | gibi: thanks \o/ | 10:12 |
gibi | the force kwarg patch https://review.opendev.org/c/openstack/nova/+/688802 bounced from the gate due to bug 1912310; I've requeued it | 12:21 |
sean-k-mooney | what causes https://bugs.launchpad.net/nova/+bug/1912310 ? | 12:24 |
gibi | I saw libvirt internal errors like | 12:25 |
gibi | 2021-07-30 08:56:25.528+0000: 57632: error : virProcessRunInFork:1159 : internal error: child reported (status=125): unable to open /dev/sda: No such device or address | 12:25 |
sean-k-mooney | ok, so it looks like it's actually libvirt that is having issues, not nova connecting to it | 12:26 |
gibi | yepp | 12:26 |
gibi | as far as I understand | 12:26 |
gibi | there are also occasions with | 12:27 |
gibi | virKeepAliveTimerInternal:137 : internal error: connection closed due to keepalive timeout | 12:27 |
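The two libvirt signatures quoted above can be recognised mechanically when triaging further occurrences of bug 1912310. A minimal sketch, assuming the libvirtd log lines have exactly the shape of the excerpts in this log; `classify_libvirt_error` and the signature labels are hypothetical names for illustration, not part of nova, libvirt or any existing gate tooling.

```python
import re

# Signatures taken from the libvirt errors quoted in this log for bug 1912310.
# The dictionary keys are hypothetical labels chosen for this sketch.
SIGNATURES = {
    "device_open_failure": re.compile(
        r"virProcessRunInFork:\d+ : internal error: child reported "
        r"\(status=\d+\): unable to open (?P<device>/dev/[^\s:]+)"
    ),
    "keepalive_timeout": re.compile(
        r"virKeepAliveTimerInternal:\d+ : internal error: "
        r"connection closed due to keepalive timeout"
    ),
}


def classify_libvirt_error(line):
    """Return (label, device) for a known signature, or None if nothing matches."""
    for label, pattern in SIGNATURES.items():
        match = pattern.search(line)
        if match:
            # Only the device-open signature captures a device path;
            # groupdict() is empty for the keepalive one, so .get() gives None.
            return label, match.groupdict().get("device")
    return None
```

Running it over a whole libvirtd log and counting the labels would show whether a failed job hit the device-open error, the keepalive timeout, or neither.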
sean-k-mooney | we are not seeing any OOM events or anything else strange on the node at the time, are we? | 12:27 |
gibi | I just checked like two occurrences and found no such thing | 12:27 |
gibi | the nova-live-migration job was set to non-voting due to this | 12:28 |
gibi | but it seems we can hit the same in nova-next too | 12:28 |
gibi | but a lot less frequently | 12:28 |
sean-k-mooney | yep, it's failing in nova-next in this case | 12:29 |
sean-k-mooney | https://zuul.opendev.org/t/openstack/build/f888b58ca23f49fc8f9046e9c2ad18a0/log/controller/logs/screen-dstat.txt | 12:29 |
gibi | yes | 12:29 |
gibi | that is basically the first time I've seen it in nova-next | 12:29 |
sean-k-mooney | we got down to 120MB a few times, but I don't see any real evidence of memory issues, so it's likely not the kernel randomly killing things | 12:29 |
gibi | around the time of the failure we were floating around 300MB free | 12:32 |
sean-k-mooney | ya, it's unlikely to be the cause, but we have seen OOM issues break libvirt and other processes in weird ways before. | 12:33 |
gibi | true, OOM can cause weird things | 12:34 |
sean-k-mooney | I'll quickly check the cloud archive | 12:35 |
sean-k-mooney | perhaps there is a newer libvirt available we could use instead | 12:35 |
gibi | didn't we use the max available? | 12:36 |
sean-k-mooney | well, I'm not sure we are using the Xena cloud archive currently | 12:37 |
sean-k-mooney | but looking at it, they are not shipping libvirt/qemu in the cloud archive currently | 12:37 |
sean-k-mooney | we typically don't use the most recent cloud archive version | 12:37 |
sean-k-mooney | so ya, it looks like we are using 6.x on Ubuntu: "libvirt0:amd64 6.0.0-0ubuntu8.12" | 12:39 |
sean-k-mooney | on CentOS Stream with the advanced virt module we would be using 7.x.y | 12:40 |
sean-k-mooney | it's a long shot, but we could enable this PPA as a test to see if that would resolve it. it's the one I use when I need a newer libvirt on Ubuntu but don't want to build from source | 12:43 |
sean-k-mooney | https://launchpad.net/~jacob/+archive/ubuntu/virtualisation | 12:43 |
sean-k-mooney | although that still only provides 6.6.0-1ubuntu2~ppa0 | 12:43 |
sean-k-mooney | not 7.x | 12:44 |
gibi | I'm not sure how we can enable this in infra, but feel free to go ahead. We can use the nova-live-migration job as a canary, as it is now non-voting but still runs for almost all of our patches | 12:45 |
sean-k-mooney | ya, I might propose a DNM patch just to see if that works. if it does, it means we need to talk to Canonical about a missing backport | 12:45 |
sean-k-mooney | problem is I have no idea what is missing | 12:46 |
gibi | cool, good idea | 12:46 |
sean-k-mooney | the other alternative would be to move from Ubuntu 20.04 to 21.04 or to CentOS 8 on the affected jobs | 12:46 |
sean-k-mooney | well, there is another alternative too, which is to compile libvirt/qemu from source. I have a devstack plugin to do that, but I would prefer to avoid it, mainly due to the extra job time. it's not hard to do, but if we can just use distro packages in this case it's nicer | 12:49 |
gibi | as we declare our supported distros ahead of the release, I would go with trying to fix Ubuntu 20.04 https://governance.openstack.org/tc/reference/runtimes/xena.html | 12:49 |
sean-k-mooney | yes, although CentOS 8 Stream is valid too. but ya, I'll see if I can look into this a little later today. I'll propose a couple of different patches for different options. | 12:50 |
sean-k-mooney | enabling "sudo add-apt-repository ppa:jacob/virtualisation" in a pre playbook is simple, as is changing the base OS to CentOS 8 Stream | 12:51 |
sean-k-mooney | the other options are more complicated but doable | 12:51 |
opendevreview | Merged openstack/nova master: Functional tests removed direct post call https://review.opendev.org/c/openstack/nova/+/766068 | 13:06 |
sean-k-mooney | gibi: by the way, we maintain a tempest plugin called whitebox that looks at some of the internals of how nova works and asserts that it does the right thing. would you have any objection to me enabling that for a subset of nova changes, at least in a non-voting capacity initially? | 13:22 |
sean-k-mooney | gibi: I was thinking of making it run on changes to the libvirt driver and hardware.py | 13:22 |
gibi | sean-k-mooney: I have no problem with it if it is actively maintained and won't take up much of the CI resources | 13:23 |
sean-k-mooney | yes, it's maintained and runs downstream; we also maintain the devstack support upstream | 13:25 |
sean-k-mooney | gibi: upstream, many of the tests are disabled because we don't have the hardware https://opendev.org/openstack/whitebox-tempest-plugin/src/branch/master/whitebox_tempest_plugin/api/compute | 13:25 |
sean-k-mooney | i.e. we can't run the pmem, SR-IOV or vGPU tests in the gate | 13:26 |
sean-k-mooney | I know we were looking to see if we could use this for third-party CI, but we are still having problems finding hardware internally to run it | 13:26 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Add two new hacking rules https://review.opendev.org/c/openstack/nova/+/805668 | 13:27 |
gibi | just based on the test file names, even without special hardware this plugin has useful coverage | 13:28 |
sean-k-mooney | yep, it has all the tests that were originally done by the Intel third-party NFV CI in it, but updated | 13:28 |
sean-k-mooney | and some other test coverage | 13:28 |
gibi | then let's enable it | 13:31 |
sean-k-mooney | gibi: when the QE member of the compute team downstream writes test automation that is not suitable for upstream tempest because it depends on a specific configuration of the services, this is where we try to add the test coverage. | 13:32 |
sean-k-mooney | like testing adding CPU flags, which we can do in the CI like this https://opendev.org/openstack/whitebox-tempest-plugin/src/branch/master/.zuul.yaml#L53-L55 | 13:33 |
gibi | I agree to have that coverage in our upstream gate | 13:34 |
sean-k-mooney | thanks, I'll let artom know and see if he wants me to wait for the jobs to be split first or not. I'll start on the WIP patch in any case | 13:34 |
artom | Huh, happy coincidence, I was pondering proposing a periodic whitebox job for Nova | 13:43 |
artom | So, I think it's not yet stable enough for that, actually | 13:43 |
artom | We think we know the issue, and we're working on it, but until then I'm not sure it's ready yet | 13:43 |
artom | Every so often, depending on which order tests end up being executed, what we think happens is we attempt to reshape from cpu_dedicated_set to vcpu_pin_set, and that's not allowed, so there's a cascading failure. There are also issues around how we use admin clients and clean up after ourselves that can also cause cascading failures | 13:45 |
gibi | artom: nothing is urgent from upstream perspective. If upstream feedback helps then I'm OK to enable a non voting job | 13:45 |
artom | gibi, I think even that's premature, as the solution to ^^^ is to change whitebox's own job a bit, so until that's done, let's not add it to nova | 13:46 |
sean-k-mooney | artom: ok, the reason I was bringing this up was that we did at one point plan to enable whitebox for Wallaby | 13:46 |
sean-k-mooney | then we did not have time to actually get it stable in time | 13:47 |
sean-k-mooney | so I was hoping we could do that before the end of Xena | 13:47 |
sean-k-mooney | if you think it's not ready, however, we can hold off | 13:47 |
artom | Ah, probably not before the end of Xena | 13:47 |
artom | ... well, does end == FF? | 13:47 |
artom | Or release? | 13:47 |
sean-k-mooney | well, I guess I was thinking before RC1, when the stable branch is created | 13:48 |
sean-k-mooney | although if we were OK with backporting the job to enable the testing on the stable branch, the "end" could be anytime before EOL, I guess | 13:48 |
sean-k-mooney | if we don't think it's ready, however, there's no need to rush | 13:49 |
sean-k-mooney | I would just like to keep making progress on getting this test coverage enabled, either first-party or third-party | 13:50 |
artom | Third party I still haven't solved the hardware problem :) | 13:51 |
artom | Err :( | 13:51 |
sean-k-mooney | artom: is bauzas back today or is he returning tomorrow? | 13:51 |
sean-k-mooney | artom: yep i know :) | 13:51 |
artom | Still on PTO today, according to Workday | 13:52 |
gibi | I'm personally OK with enabling new jobs on stable, but I guess elodilles or lyarwood have more authority on that :) | 13:52 |
gibi | as for landing it on master, this is not a feature, so RC1 is the cut-off date due to branching | 13:53 |
artom | I can try to hurry it up, especially as jparker seems to have more time for this right about now, too | 13:53 |
sean-k-mooney | gibi: before I recheck, are there any gate blockers I should hold off for? | 13:54 |
sean-k-mooney | I was just looking at the failures in bauzas' mdev series, which don't seem related | 13:55 |
gibi | sean-k-mooney: the "Add force kwarg to delete_allocation_for_instance" patch has not landed yet, and without it at least 1/4 of the tempest jobs fail all around the gate | 13:56 |
sean-k-mooney | ah right | 13:56 |
gibi | I don't know about any full blocker | 13:56 |
sean-k-mooney | ok, I was seeing the nova-ceph-multistore job fail in several patches but have not dug in to see if it's the same issue | 13:56 |
sean-k-mooney | oh "'Failed to delete allocations for consumer 2064788c-9fa0-474e-a66c-72cf97b45922. .." | 13:57 |
sean-k-mooney | ya, so it's just that | 13:57 |
gibi | yes | 13:58 |
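The log does not spell out what the force kwarg changes. As a hedged illustration of the general problem a force flag on a delete often sidesteps: if deletion is a read-then-write that presents a consumer generation (optimistic concurrency), a concurrent update between the read and the write makes it fail with a conflict, whereas an unconditional delete cannot conflict. The sketch below is a toy in-memory model under that assumption; `ToyAllocationStore`, `delete_allocations` and `GenerationConflict` are hypothetical and are not nova's report client or the placement API.

```python
class GenerationConflict(Exception):
    """Raised when a write presents a stale consumer generation (akin to HTTP 409)."""


class ToyAllocationStore:
    """Hypothetical in-memory stand-in for a placement-like allocation store."""

    def __init__(self):
        # consumer uuid -> (generation, allocations dict)
        self._allocs = {}

    def get(self, consumer):
        """Return (generation, allocations); (None, {}) for unknown consumers."""
        return self._allocs.get(consumer, (None, {}))

    def put(self, consumer, allocations, generation=None):
        """Replace allocations, checking the caller's generation first."""
        current_gen, _ = self.get(consumer)
        if generation != current_gen:
            raise GenerationConflict(consumer)
        if allocations:
            new_gen = 0 if current_gen is None else current_gen + 1
            self._allocs[consumer] = (new_gen, dict(allocations))
        else:
            # Writing empty allocations removes the consumer entirely.
            self._allocs.pop(consumer, None)

    def force_delete(self, consumer):
        """Unconditional delete: no generation check, so no conflict is possible."""
        self._allocs.pop(consumer, None)


def delete_allocations(store, consumer, force=False, race=None):
    """Delete a consumer's allocations.

    `race` is an optional callable simulating another writer that updates
    the consumer between the read and the write of the non-forced path.
    """
    if force:
        store.force_delete(consumer)
        return
    generation, _ = store.get(consumer)
    if race is not None:
        race()
    store.put(consumer, {}, generation=generation)
```

In this model the forced path always succeeds but discards whatever the concurrent writer just stored, which is only acceptable when the consumer is being torn down anyway.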
sean-k-mooney | any idea why it would hit the multistore job more often? | 13:59 |
sean-k-mooney | it looks like that is mostly the failure, so I'll hold off until the force patch lands | 13:59 |
sean-k-mooney | they don't have +W anyway, so they can wait | 14:00 |
elodilles | well, I didn't exactly follow which job you were talking about, but I am less concerned about enabling new CI jobs on stable than disabling one o:) | 14:00 |
gibi | I've no idea about the increased frequency of multistore failures | 14:00 |
gibi | elodilles: it would be a job running https://opendev.org/openstack/whitebox-tempest-plugin/src/branch/master/whitebox_tempest_plugin/api/compute | 14:01 |
sean-k-mooney | elodilles: which is defined here https://opendev.org/openstack/whitebox-tempest-plugin/src/branch/master/.zuul.yaml#L31-L81 | 14:01 |
sean-k-mooney | gibi: artom is actually currently splitting it into two jobs: one that uses the old CPU pinning config, while the main one will only use the new way | 14:02 |
gibi | ack | 14:02 |
sean-k-mooney | gibi: right now, if the reshape happens at the wrong time, the job breaks | 14:02 |
sean-k-mooney | so we are just going to split it to avoid that | 14:02 |
gibi | sure, makes sense | 14:02 |
elodilles | gibi, sean-k-mooney: I guess these would land on master and then be backported to the most recent stable branch, am I right? | 14:03 |
elodilles | (hmmm, it looks quite heavy, according to its parent: tempest-multinode-full-py3) | 14:05 |
gibi | personally I would take it on master first | 14:07 |
sean-k-mooney | elodilles: well, it needs 2 nodes, but it does not run all the tempest tests because we use the regex to limit it | 14:16 |
sean-k-mooney | tox_envlist: all | 14:16 |
sean-k-mooney | tempest_concurrency: 1 | 14:17 |
sean-k-mooney | tempest_test_regex: ^whitebox_tempest_plugin\. | 14:17 |
sean-k-mooney | so I just use that job to set up a 2-node devstack with tempest, then we just run the tests from the plugin | 14:17 |
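The job variables quoted above are what keep this job lighter than its tempest-multinode-full-py3 parent: tempest_concurrency: 1 runs the tests serially, and tempest_test_regex limits the run to test IDs under the plugin's package. A minimal Python sketch of that selection, assuming the runner keeps a test when the regex matches its dotted test ID; `select_tests` and the sample IDs are hypothetical and are not real whitebox or tempest tests.

```python
import re

# Value of tempest_test_regex from the job variables quoted in this log.
TEMPEST_TEST_REGEX = r"^whitebox_tempest_plugin\."


def select_tests(test_ids, pattern=TEMPEST_TEST_REGEX):
    """Return the test IDs that the filter regex selects, preserving order."""
    compiled = re.compile(pattern)
    # The ^ anchor plus the escaped dot means only IDs inside the
    # whitebox_tempest_plugin package match; a lookalike prefix does not.
    return [test_id for test_id in test_ids if compiled.search(test_id)]


if __name__ == "__main__":
    # Hypothetical test IDs, for illustration only.
    sample = [
        "tempest.api.compute.servers.test_example.ExampleTest.test_example",
        "whitebox_tempest_plugin.api.compute.test_example.ExampleTest.test_example",
        "whitebox_tempest_plugin_extra.test_example",
    ]
    for test_id in select_tests(sample):
        print(test_id)
```

Only the second sample ID survives the filter: the first lives in the core tempest package, and the third fails on the escaped dot.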
elodilles | oh, i see, i missed that | 14:20 |
sean-k-mooney | we probably could inherit from something better to make that more obvious | 14:21 |
elodilles | well, when someone reviews it thoroughly I think it'll be obvious o:) but it's true that at first glance the tempest-multinode-*full*-py3 parent suggests a time- and resource-heavy test job o:) | 14:26 |
sean-k-mooney | we probably can just use devstack-tempest. I'll see if there is a better job we can use in the future. that is a simple fix | 14:26 |
*** akekane_ is now known as abhishekk | 14:41 | |
ganso | elodilles, lyarwood: Hi! If you have a spare minute, could you please take a quick look at the backport for victoria now? It is clean and the same as the one for wallaby from last week. Thanks in advance! https://review.opendev.org/c/openstack/nova/+/806004 | 15:20 |
opendevreview | Ghanshyam proposed openstack/nova master: Convert features not supported error to HTTPBadRequest https://review.opendev.org/c/openstack/nova/+/806294 | 15:22 |
elodilles | ganso: +2'd. Thanks for the backport! (fyi, lyarwood is on holiday today) | 15:46 |
ganso | elodilles: thanks! I will ping him tomorrow =) | 15:47 |
elodilles | no problem :) | 15:50 |
opendevreview | Merged openstack/nova master: tests: Validate AZ values https://review.opendev.org/c/openstack/nova/+/801523 | 16:06 |
opendevreview | Merged openstack/nova master: Add force kwarg to delete_allocation_for_instance https://review.opendev.org/c/openstack/nova/+/688802 | 17:05 |
sean-k-mooney | :) | 17:05 |
opendevreview | Merged openstack/nova master: Prevent deletion of a compute node belonging to another host https://review.opendev.org/c/openstack/nova/+/694802 | 17:15 |
opendevreview | Merged openstack/nova master: Fix inactive session error in compute node creation https://review.opendev.org/c/openstack/nova/+/695189 | 17:15 |
opendevreview | Merged openstack/nova master: Reduce mocking in test_reject_open_redirect for compat https://review.opendev.org/c/openstack/nova/+/803091 | 17:16 |
opendevreview | Merged openstack/nova master: extend_volume of libvirt/volume/iscsi should not use device_path https://review.opendev.org/c/openstack/nova/+/801003 | 17:16 |
opendevreview | sean mooney proposed openstack/nova stable/victoria: address open redirect with 3 forward slashes https://review.opendev.org/c/openstack/nova/+/806626 | 17:38 |
sean-k-mooney | gibi, elodilles: by the way, are we going to backport https://review.opendev.org/c/openstack/nova/+/688802 ? | 17:39 |
opendevreview | sean mooney proposed openstack/nova stable/ussuri: address open redirect with 3 forward slashes https://review.opendev.org/c/openstack/nova/+/806628 | 17:56 |
opendevreview | sean mooney proposed openstack/nova stable/train: address open redirect with 3 forward slashes https://review.opendev.org/c/openstack/nova/+/806629 | 18:03 |
gibi | sean-k-mooney: I don't know. The consumer types feature is only on master, so we have a smaller issue on stable. And that smaller issue has been there since Stein, if I remember correctly. | 19:51 |
gmann | gibi: added release notes in this https://review.opendev.org/c/openstack/nova/+/806294 | 21:56 |