*** hemna9 is now known as hemna | 01:27 | |
opendevreview | Merged openstack/nova master: Ignore plug_vifs on the ironic driver https://review.opendev.org/c/openstack/nova/+/813263 | 04:36 |
---|---|---|
gibi | lyarwood: I saw elodilles explained the setuptools pin question. thanks elodilles | 08:02 |
bauzas | hola folks | 08:10 |
bauzas | gibi: I'm asked to present some PTG updates in a company session today at the same time of the upstream meeting | 08:10 |
gibi | bauzas: o/ | 08:10 |
bauzas | gibi: it would be a 2 min presentation about Nova | 08:10 |
bauzas | gibi: could you help me by chairing the meeting when I'm asked to discuss ? | 08:11 |
gibi | bauzas: sure | 08:11 |
bauzas | I could run the meeting, then passing it to you for 5 mins | 08:11 |
bauzas | and then, either you continue or me :) | 08:11 |
gibi | ok | 08:12 |
gibi | I will handle it when you need to switch | 08:13 |
bauzas | gibi: thanks | 08:18 |
bauzas | appreciated | 08:18 |
gibi | no worries | 08:22 |
gibi | lyarwood, elodilles: I'm seeing multiple guest kernel panics in stable/victoria volume related tests | 08:54 |
* gibi gather links | 08:54 | |
gibi | 1) https://zuul.opendev.org/t/openstack/build/67c89daf17e3475cb1d632f87beeb60d/log/controller/logs/tempest_log.txt#5950 | 08:55 |
lyarwood | Just jumping on a call but I wonder if we were still using cirros 0.4.0 back then? | 08:56 |
lyarwood | /opt/stack/devstack/files/cirros-0.5.1-x86_64-disk.img | 08:57 |
lyarwood | maybe not | 08:57 |
gibi | 2) https://1a59031cf12ee85b5b8a-5c947c8d22eb7769ff9d2de46bec4cc9.ssl.cf5.rackcdn.com/810915/2/gate/nova-grenade-multinode/ebc944c/testr_results.html | 08:57 |
lyarwood | however I also see image.http_image = http://download.cirros-cloud.net/0.3.1/cirros-0.3.1-x86_64-uec.tar.gz | 08:57 |
lyarwood | anyway I'll take a look after this call | 08:58 |
gibi | ack, I see both 0.3.1 and 0.5.1 in the logs | 08:58 |
gibi | hm the issue in the grenade job uses a different cirros as it has kernel 4.4.0 while the failed test case in the live migration job has kernel 5.3.0 | 09:00 |
gibi | both kernel stack trace shows page fault but in different processes | 09:01 |
kashyap | gibi: Got a link to the traceback? | 09:11 |
gibi | kashyap: https://zuul.opendev.org/t/openstack/build/67c89daf17e3475cb1d632f87beeb60d/log/controller/logs/tempest_log.txt#5950 | 09:11 |
gibi | that is one | 09:11 |
kashyap | Yep, finally it loaded; thanks | 09:12 |
kashyap | So, the above trace is with kernel 4.4.0? (i.e. CirrOS 0.5.1?) | 09:12 |
gibi | this one is kernel 5.3 | 09:13 |
gibi | [ 15.489062] CPU: 0 PID: 284 Comm: ip Not tainted 5.3.0-26-generic #28~18.04.1-Ubuntu | 09:13 |
kashyap | Yes, just saw it. Silly me | 09:14 |
gibi | sorry wrong buffer | 09:14 |
gibi | ... | 09:14 |
gibi | [ 15.302770] CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 5.3.0-26-generic #28~18.04.1-Ubuntu | 09:14 |
gibi | this one is from the stack trace you are looking at | 09:14 |
gibi | and that is matching with cirros 0.5.1 | 09:16 |
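The kernel version and the faulting process can be pulled out of panic lines like the two quoted above mechanically. A minimal sketch, assuming the `CPU: … PID: … Comm: … Not tainted <version>` layout seen in those lines; the helper name `parse_panic_line` is made up for illustration and is not part of any tool mentioned here:

```python
import re

# Layout assumed from the two console lines quoted in the log.
# Only the "Not tainted" form is handled; tainted kernels print other flags.
PANIC_RE = re.compile(
    r"CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) Comm: (?P<comm>\S+) "
    r"Not tainted (?P<kernel>\S+)"
)


def parse_panic_line(line):
    """Return a dict with cpu, pid, comm and kernel, or None if no match."""
    match = PANIC_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["cpu"] = int(info["cpu"])
    info["pid"] = int(info["pid"])
    return info


if __name__ == "__main__":
    line = ("[ 15.302770] CPU: 0 PID: 9 Comm: ksoftirqd/0 "
            "Not tainted 5.3.0-26-generic #28~18.04.1-Ubuntu")
    print(parse_panic_line(line))
```

Comparing the `kernel` and `comm` fields across failed runs is one way to tell whether two panics come from the same guest image and the same process.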
kashyap | Yeah, figured as much. The trace seems to go into kernel RCU (read-copy update) code in the kernel ... which I was told can be used to "frighten small children and adults alike" | 09:17 |
kashyap | s/in the kernel// | 09:17 |
gibi | :) | 09:17 |
kashyap | Hm, I wonder what changed suddenly in stable/victoria for us to hit these | 09:18 |
gibi | I can try to check how frequently we hit kernel panics in stable/victoria and when we get the increase | 09:18 |
gibi | we don't have much logs going backward in time for nova-live-migration as it was turned off for a while on stable | 09:26 |
gibi | https://zuul.opendev.org/t/openstack/builds?job_name=nova-live-migration&branch=stable%2Fvictoria | 09:27 |
gibi | based on this it started failing yesterday | 09:27 |
gibi | but it is small sample | 09:28 |
gibi | it seems other branches (wallaby, xena, master) are not affected but only master has good amount of runs to be sure | 09:32 |
gibi | but master uses cirros 0.5.2 | 09:33 |
gibi | ohh both wallaby and xena also uses 0.5.2 | 09:35 |
gibi | maybe it is the cirros version | 09:35 |
gibi | I'm wondering where we define the cirros version | 09:36 |
gibi | ok, that is devstack | 09:37 |
gibi | the 0.5.2. bump was this patch https://review.opendev.org/c/openstack/devstack/+/779179 | 09:37 |
kashyap | Hmm | 09:37 |
kashyap | Okay, so failing since yesterday; and only affects stable/victoria | 09:37 |
kashyap | (The bump was only this year - it shouldn't affect stable/victoria?) | 09:39 |
gibi | stable/ussuri uses cirros 0.4.0 and it seems that is also not affected (still small sample) | 09:39 |
gibi | kashyap: the bump to 0.5.2 does not affect victoria, that uses 0.5.1 still as devstack has stable branches too | 09:39 |
kashyap | Aaah, right | 09:40 |
* kashyap back in a bit | 09:40 | |
opendevreview | Balazs Gibizer proposed openstack/nova stable/victoria: [stable-only]Bump cirros to 0.5.2 for live migration https://review.opendev.org/c/openstack/nova/+/817173 | 09:47 |
gibi | lyarwood, kashyap: that is my guess ^^ lets see what happens | 09:47 |
gibi | lyarwood: a totally different failure from stable/victoria https://zuul.opendev.org/t/openstack/build/3f48404a55904986b6f5bcd2ce7d1908/log/job-output.txt#2517 | 09:47 |
gibi | die 276 'Support for rhel8 is incomplete: no support for installing packages' | 09:48 |
lyarwood | hmm that only adds in the ahci module so I dunno maybe | 09:48 |
gibi | it is from tempest-integrated-compute-centos-8-stream | 09:48 |
lyarwood | oh I think that has never worked but I've added it in on master | 09:49 |
lyarwood | and because tempest is branchless | 09:49 |
lyarwood | fun times | 09:49 |
gibi | :) | 09:49 |
* lyarwood checks | 09:49 | |
lyarwood | https://review.opendev.org/c/openstack/tempest/+/797614 was the change | 09:49 |
lyarwood | landed overnight | 09:49 |
lyarwood | so I need to add a branch conditional in there I guess | 09:50 |
gibi | for which branch? | 09:51 |
gibi | ahh I see it passed on master | 09:52 |
gibi | but now it fails on master too https://zuul.opendev.org/t/openstack/build/a637bf6e68c545e59c8d091393a2307e/log/job-output.txt but with a totally different issue | 09:52 |
gibi | with botocore version conflict | 09:53 |
lyarwood | hmmm https://review.opendev.org/c/openstack/devstack/+/688614 is in stable/victoria | 09:56 |
lyarwood | oh it's CentOSStream | 09:57 |
* lyarwood facepalm | 09:57 | |
lyarwood | https://review.opendev.org/c/openstack/devstack/+/803023 was rejected so I need the branch conditional in master gah | 09:59 |
gibi | I've opened a gate bug for the tempest-integrated-compute-centos-8-stream job failing on master with version conflict as it seems to be 100% hit | 10:00 |
gibi | tempest-integrated-compute-centos-8-stream | 10:00 |
gibi | https://bugs.launchpad.net/nova/+bug/1950291 | 10:01 |
lyarwood | weird | 10:03 |
lyarwood | `Cannot install cinder because these package versions have conflicting dependencies.` FWIW | 10:04 |
gibi | interesting, nothing recent in cinder bumped version | 10:10 |
gibi | and nothing in the requirements repo since the 6th | 10:10 |
gibi | this was on the 6th bumping boto https://review.opendev.org/c/openstack/requirements/+/816611/2/upper-constraints.txt#314 | 10:12 |
gibi | sorry 4th | 10:12 |
lyarwood | gibi: did you have a bug for the stable/victoria issue? | 10:15 |
* lyarwood will raise one if not | 10:15 | |
gibi | nope | 10:15 |
lyarwood | ack | 10:15 |
gibi | please raise one | 10:15 |
kashyap | gibi: lyarwood: Catching up ... is it because of this not merging yet? https://review.opendev.org/c/openstack/devstack/+/803023 (fix is_fedora for centos 8 stream) | 10:18 |
lyarwood | yeah but as we didn't support centos8stream at that point we shouldn't backport this anyway | 10:20 |
lyarwood | I'll just land some regex shortly to fix this | 10:20 |
opendevreview | Jun Chen proposed openstack/nova master: Catch an exception in power off procedure https://review.opendev.org/c/openstack/nova/+/817176 | 10:21 |
kashyap | lyarwood: Ah, noted. Sorry, regex based on what? To selectively check if 8stream is available, if not fallback to vanilla CentoS? | 10:23 |
lyarwood | kashyap: regex to stop the centos8stream job from running on branches older than wallaby | 10:24 |
lyarwood | https://review.opendev.org/c/openstack/tempest/+/817179 | 10:24 |
kashyap | Aah, like that; thx! | 10:25 |
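The idea of a branch regex that keeps a job off older branches can be sketched with a negative lookahead. This is only an illustration of the technique, not the content of the tempest change linked above; the branch list and the names `OLD_BRANCHES` and `job_runs_on` are assumptions:

```python
import re

# Hypothetical list of stable branches that predate wallaby.
OLD_BRANCHES = ["train", "ussuri", "victoria"]

# Negative lookahead: match every branch name except the listed stable ones.
BRANCH_RE = re.compile(r"^(?!stable/(%s)$).*$" % "|".join(OLD_BRANCHES))


def job_runs_on(branch):
    """Return True if the job would be selected for this branch."""
    return BRANCH_RE.match(branch) is not None


if __name__ == "__main__":
    for name in ("master", "stable/xena", "stable/wallaby", "stable/victoria"):
        print(name, job_runs_on(name))
```

Listing the excluded branches explicitly means new branches pick up the job by default, at the cost of having to extend the list if an older branch was forgotten.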
opendevreview | Lee Yarwood proposed openstack/nova stable/victoria: DNM - Test integrated-gate-compute fix for centos8stream https://review.opendev.org/c/openstack/nova/+/817180 | 10:25 |
lyarwood | ^ testing here | 10:25 |
gibi | I cannot reproduce the version conflict locally seen in https://bugs.launchpad.net/nova/+bug/1950291 | 10:28 |
gibi | but I don't have python3.6 :/ | 10:30 |
* gibi installing py3.6 | 10:32 | |
frickler | gibi: that looks like a failure in the index from pypi CDN, there is no conflict if you look at the version numbers. pip just fails to generate a proper error when it can't find that specific version in the index | 10:35 |
gibi | frickler: ohh, good point, then I guess the error will go away after the recheck | 10:35 |
bauzas | gibi: should we mark the bug Critical as it holds the gate ? | 10:35 |
bauzas | https://bugs.launchpad.net/nova/+bug/1950291 | 10:35 |
gibi | bauzas: wait a bit, frickler has an explanation that might mean it was a transient only | 10:36 |
bauzas | ok | 10:36 |
gibi | I have recheck running now | 10:36 |
bauzas | that's what I see | 10:36 |
* gibi stops installing py3.6 locally :D | 10:36 | |
frickler | we have some way of telling the CDN to refresh its cache, I can look that up in a bit | 10:36 |
gibi | still failing with boto conflict after recheck https://zuul.opendev.org/t/openstack/build/0ba1dd59972d48a98fe29b47dbb82e1e/log/job-output.txt | 10:41 |
bauzas | gibi: marking it Critical until we figure out a better vision | 10:42 |
bauzas | gibi: just to be clear, this is an unrelated issue from the stable/victoria gate, right? | 10:43 |
bauzas | here, we have centos-stream on wallaby and later | 10:43 |
gibi | bauzas: right | 10:43 |
bauzas | and I see lyarwood providing a tempest fix for the stable branches that are impacted | 10:44 |
frickler | bauzas: gibi: I did "curl -XPURGE https://pypi.org/simple/botocore" and the same without the "/botocore". please try another recheck | 10:44 |
gibi | frickler: ack I will | 10:45 |
gibi | and thanks | 10:45 |
* bauzas needs to go off | 10:45 | |
bauzas | but I'll scroll when I'm back | 10:45 |
frickler | if it is still failing for jobs starting now, please ping infra-root in #opendev, I'll be afk for a bit | 10:46 |
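The curl command frickler quotes can also be expressed in Python. A minimal sketch using only the standard library; the function names are made up, and whether the CDN honours the request is up to PyPI, not this code:

```python
import urllib.request

SIMPLE_INDEX = "https://pypi.org/simple"


def build_purge_request(project=""):
    """Build (but do not send) a PURGE request for a simple index page."""
    url = SIMPLE_INDEX + ("/" + project if project else "")
    return urllib.request.Request(url, method="PURGE")


def purge(project=""):
    """Send the PURGE request and return the HTTP status. Needs network."""
    with urllib.request.urlopen(build_purge_request(project)) as resp:
        return resp.status


if __name__ == "__main__":
    # Mirrors the two calls from the log: the project page, then the index.
    for name in ("botocore", ""):
        req = build_purge_request(name)
        print(req.get_method(), req.full_url)
```

Building the request separately from sending it keeps the URL logic checkable without touching the network.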
kevko | sean-k-mooney: hi, here ? :) | 10:46 |
gibi | frickler: ack thanks | 11:03 |
opendevreview | Lee Yarwood proposed openstack/nova master: libvirt: Create qcow2 disks with the correct size without extending https://review.opendev.org/c/openstack/nova/+/779275 | 11:03 |
gibi | lyarwood: fyi cirros 0.5.2 is not a solution for the kernel panic as https://review.opendev.org/c/openstack/nova/+/817173 still triggers it | 11:10 |
lyarwood | gibi: Yeah I didn't think it would tbh | 11:24 |
gibi | so back to square one | 11:24 |
gibi | lyarwood: should I file a bug for the kernel panic problem on stable/victoria? | 12:00 |
gibi | or you already did? | 12:00 |
lyarwood | I haven't so go ahead | 12:00 |
gibi | ok | 12:01 |
gibi | I will do | 12:01 |
gibi | lyarwood: https://bugs.launchpad.net/nova/+bug/1950310 | 12:09 |
gibi | lyarwood: could this be the appearance of our old volume detach bug ^^ where the fix was the redesigned detach code in https://review.opendev.org/q/topic:bug/1882521 | 12:21 |
gibi | that was backported only to wallaby | 12:21 |
gibi | and I do see in the nova log that the _do_wait_and_retry_detach function goes through the 7 iteration | 12:22 |
opendevreview | Merged openstack/nova master: Remove SESSION_CONFIGURED global from DB fixture https://review.opendev.org/c/openstack/nova/+/815689 | 13:21 |
gibi | frickler: seems your PURGE command helped later runs does not hit the boto version conflict | 13:26 |
frickler | gibi: great, thanks for confirming | 13:27 |
gibi | lyarwood: I backported the libvirt event based detach series to stable/victoria let's see if that helps with the kernel panic | 13:51 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/victoria: libvirt: Define and emit DeviceRemovedEvent and DeviceRemovalFailedEvent https://review.opendev.org/c/openstack/nova/+/817209 | 13:51 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/victoria: libvirt: add AsyncDeviceEventsHandler https://review.opendev.org/c/openstack/nova/+/817210 | 13:51 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/victoria: libvirt: allow querying devices from the persistent domain https://review.opendev.org/c/openstack/nova/+/817211 | 13:53 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/victoria: libvirt: parse alias out from device config https://review.opendev.org/c/openstack/nova/+/817212 | 13:56 |
kevko | dansmith: commented on https://review.opendev.org/c/openstack/nova/+/817030 | 13:57 |
opendevreview | Merged openstack/nova master: Refactor Database fixture https://review.opendev.org/c/openstack/nova/+/815690 | 13:58 |
opendevreview | Merged openstack/nova master: Use ReplaceEngineFacade fixture https://review.opendev.org/c/openstack/nova/+/816820 | 13:58 |
opendevreview | Merged openstack/nova master: Fix interference in db unit test https://review.opendev.org/c/openstack/nova/+/814735 | 13:59 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/victoria: Replace blind retry with libvirt event waiting in detach https://review.opendev.org/c/openstack/nova/+/817214 | 14:00 |
gibi | elodilles: this probably interest you too ^^ | 14:01 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/victoria: Move the guest.get_disk test to test_guest https://review.opendev.org/c/openstack/nova/+/817215 | 14:03 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/victoria: libvirt: Remove dead error handling code https://review.opendev.org/c/openstack/nova/+/817216 | 14:03 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/victoria: Move instance power state check to _detach_with_retry https://review.opendev.org/c/openstack/nova/+/817217 | 14:03 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/victoria: Consolidate device detach error handling https://review.opendev.org/c/openstack/nova/+/817218 | 14:03 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/victoria: Parse alias from domain hostdev https://review.opendev.org/c/openstack/nova/+/816486 | 14:32 |
elodilles | gibi: wow, 10 patches for a single bug fix? :-o | 14:40 |
gibi | elodilles: you know that, it is the libvirt event based device detach series | 14:43 |
gibi | it was backported to wallaby and now I backported it to victoria | 14:43 |
elodilles | gibi: oh, that's soooo... May... o:D | 14:48 |
gibi | :D | 14:49 |
stephenfin | gibi: Okay with me backporting those DB test changes? | 14:55 |
gibi | stephenfin: which one? | 14:56 |
stephenfin | https://review.opendev.org/c/openstack/nova/+/814735 and company | 14:56 |
gibi | I'm already on it | 14:56 |
stephenfin | oh, great :D | 14:56 |
bauzas | reminder: nova team meeting in 1 hour | 15:00 |
bauzas | ... here | 15:00 |
bauzas | (sorry, forgot to tell) | 15:00 |
opendevreview | sean mooney proposed openstack/nova master: This change replaces all hardcoded tox enve with generative envs https://review.opendev.org/c/openstack/nova/+/804292 | 15:02 |
sean-k-mooney | stephenfin: that ^ still has you -2 on it can you remove it so we can proceed with the review | 15:03 |
stephenfin | oh yeah, sure | 15:03 |
sean-k-mooney | i probably have typos and other issues in it but for the most part i think it's ready to review | 15:04 |
opendevreview | Alexey Stupnikov proposed openstack/nova master: Test aborting queued live migration https://review.opendev.org/c/openstack/nova/+/776250 | 15:07 |
clarkb | frickler: bauzas gibi it happens when new deps are released and then we pin them in constraints because pypi has a fallback for its CDN lookups that tends to run out of date by a couple of weeks it seems | 15:10 |
clarkb | openstack notices because our requirements system is really good at bumping and constraining new deps | 15:10 |
gibi | clarkb: I see. Is it easy to detect when this happen? | 15:10 |
gibi | just by looking at the conflict I did not figure out | 15:11 |
bauzas | gibi: fwiw, this bug is still Critical, so we'll discuss it at the meeting | 15:11 |
bauzas | elodilles: man, you updated the stable section in https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting, right? | 15:12 |
gibi | bauzas: we can close the bug, frickler's purge solved the issue | 15:12 |
elodilles | bauzas: yes | 15:12 |
bauzas | elodilles: can you please look at the agenda, I got a merge conflict | 15:12 |
bauzas | gibi: ack, please do | 15:12 |
clarkb | gibi: no one of the bugs that someone could file is against pip to output a better error message. Maybe ideally have it print out the versions it did find | 15:12 |
gibi | bauzas: on it | 15:13 |
bauzas | gibi: thanks | 15:13 |
gibi | clarkb: I see | 15:13 |
elodilles | bauzas: sorry :S | 15:14 |
elodilles | bauzas: is there anything I should do now regarding the wiki page? :S | 15:15 |
bauzas | elodilles: just looking at what you provided | 15:16 |
bauzas | elodilles: when I merged, I could not have seen some modification you provided | 15:16 |
elodilles | bauzas: if it helps to you just delete my change and I'll add it again | 15:19 |
bauzas | elodilles: nah, should be ok | 15:19 |
elodilles | ack | 15:19 |
elodilles | I'll try to remember to sync with you next week to avoid another merge conflict o:) | 15:20 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/xena: Remove SESSION_CONFIGURED global from DB fixture https://review.opendev.org/c/openstack/nova/+/817236 | 15:32 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/xena: Refactor Database fixture https://review.opendev.org/c/openstack/nova/+/817237 | 15:33 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/xena: Use ReplaceEngineFacade fixture https://review.opendev.org/c/openstack/nova/+/817239 | 15:35 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/xena: Fix interference in db unit test https://review.opendev.org/c/openstack/nova/+/817240 | 15:35 |
gibi | stephenfin: here are the backports | 15:35 |
bauzas | nova meeting in 3 mins | 15:57 |
bauzas | #startmeeting nova | 16:01 |
opendevmeet | Meeting started Tue Nov 9 16:01:09 2021 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot. | 16:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 16:01 |
opendevmeet | The meeting name has been set to 'nova' | 16:01 |
gibi | o/ | 16:01 |
bauzas | I'll pass the baton to gibi for a few mins | 16:02 |
bauzas | #chair gibi | 16:02 |
opendevmeet | Current chairs: bauzas gibi | 16:02 |
elodilles | o/ | 16:02 |
bauzas | #link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting | 16:03 |
gibi | bauzas: if everybody for RH is on the meeting where you will present then we might not need this meeting :) | 16:03 |
* bauzas facepalms | 16:03 | |
bauzas | I dunno | 16:03 |
artom | It's mostly a listening meeting, so we can lurk in both | 16:03 |
gibi | ahh I see | 16:04 |
artom | But yeah, active participation will be... patchy | 16:04 |
bauzas | let's start and we'll see | 16:04 |
gibi | ok | 16:04 |
bauzas | #topic Bugs (stuck/critical) | 16:04 |
bauzas | #info No Critical bug | 16:04 |
bauzas | thanks gibi for triaging the one | 16:04 |
bauzas | #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 25 new untriaged bugs (+3 since the last meeting) | 16:05 |
bauzas | #help Nova bug triage help is appreciated https://wiki.openstack.org/wiki/Nova/BugTriage | 16:05 |
bauzas | #link https://storyboard.openstack.org/#!/project/openstack/placement 32 open stories (+0 since the last meeting) in Storyboard for Placement | 16:05 |
bauzas | anything to discuss about bugs ? | 16:05 |
bauzas | ok, let's move on to the gate status | 16:06 |
bauzas | #topic Gate status | 16:06 |
bauzas | #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs | 16:06 |
gibi | we had an intermittent failure this morning on master | 16:07 |
gibi | but it is resolved now | 16:07 |
bauzas | yeah | 16:07 |
bauzas | we have a few other new bugs | 16:07 |
bauzas | like https://bugs.launchpad.net/nova/+bug/1950310 | 16:07 |
bauzas | (easy one to triage, btw.) | 16:07 |
bauzas | #link https://zuul.openstack.org/builds?project=openstack%2Fplacement&pipeline=periodic-weekly Placement periodic job status | 16:08 |
bauzas | we again had an issue with placement-nova-tox-functional-py38 | 16:08 |
bauzas | #link https://zuul.openstack.org/build/0c6c18cff1d74f99a6f1a19913f35818 issue with placement-nova-tox-functional-py38 last run | 16:08 |
gibi | that is the mysterious | 16:09 |
gibi | /bin/sh: 1: Syntax error: "(" unexpected | 16:09 |
gibi | hm | 16:09 |
bauzas | https://zuul.openstack.org/build/0c6c18cff1d74f99a6f1a19913f35818/log/job-output.txt#802 | 16:09 |
gibi | it is again the tox showconfig role | 16:10 |
bauzas | gibi: I need to pass you the baton now for 5-ish mins | 16:10 |
gibi | ack | 16:10 |
gibi | anyhow I will look into that placement failure I feel we handled this before | 16:10 |
gibi | anything else on the gate status? | 16:10 |
gibi | #topic Release Planning | 16:12 |
gibi | Yoga-1 is due Nov 18th #link https://releases.openstack.org/yoga/schedule.html#y-1 | 16:12 |
gibi | which means | 16:12 |
gibi | #info Spec review day on Tuesday Nov 16th | 16:12 |
gibi | which is next Tuesday | 16:12 |
gibi | anything else about release planning? | 16:12 |
gibi | #topic Review priorities | 16:14 |
gibi | https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement)+label:Review-Priority%252B1 | 16:14 |
gibi | not a huge list | 16:14 |
gibi | and most of them has feedback already | 16:15 |
gibi | #link https://review.opendev.org/c/openstack/nova/+/816861 bauzas proposing a documentation change for helping contributors to ask for reviews | 16:15 |
gibi | I will definitely review that ^^ but probably not today | 16:15 |
gibi | any comment / question about review priorities? | 16:16 |
gmann | I will also check today | 16:16 |
* bauzas is back | 16:18 | |
* gibi hands back the baton | 16:18 | |
bauzas | yeah, so I wrote this one | 16:18 |
bauzas | I know we had concerns during the PTG but I'd love to see comments in https://review.opendev.org/c/openstack/nova/+/816861 | 16:19 |
bauzas | ok, let's move | 16:20 |
bauzas | #topic Stable Branches | 16:20 |
bauzas | elodilles: your time | 16:20 |
elodilles | victoria and ussuri are blocked until tempest fix lands: https://review.opendev.org/c/openstack/tempest/+/817179 | 16:20 |
elodilles | no news yet regarding the investigation of the 'volume detach' failures that requires many rechecks on multiple stable branches | 16:20 |
elodilles | Ussuri Extended Maintenance transition is scheduled this week (Nov 12) | 16:21 |
elodilles | final release patch proposed: https://review.opendev.org/c/openstack/releases/+/817226 | 16:21 |
gmann | will check tempest one | 16:22 |
elodilles | still, the list of open and unreleased patches if someone is interested: https://etherpad.opendev.org/p/nova-stable-ussuri-em | 16:22 |
bauzas | ++ | 16:22 |
elodilles | and patches that need one +2 on ussuri: https://review.opendev.org/q/project:openstack/nova+branch:stable/ussuri+is:open+label:Code-Review%253E%253D%252B2 | 16:22 |
bauzas | I need to do homework | 16:22 |
elodilles | gmann: thanks in advance! | 16:22 |
bauzas | we have a large list of ussuri changes | 16:23 |
elodilles | let me know if something needs to be fit into the final release and I'll hold the release patch until | 16:23 |
bauzas | but I'll try to review a few of them I think are important | 16:23 |
bauzas | I could ask other company folks if they're interesting | 16:23 |
bauzas | interested* | 16:23 |
bauzas | that said, just saying out loud, my own company isn't getting fully interested in ussuri backports for obvious reasons | 16:24 |
gibi | looking at the list a lot of them are not merged to victoria yet | 16:24 |
gmann | +A on tempest fix | 16:24 |
gibi | gmann: thanks for that | 16:24 |
gibi | ! | 16:24 |
elodilles | yes. only some could be reasonably quickly merged | 16:24 |
elodilles | gmann: \o/ | 16:24 |
gibi | elodilles: do you have some links for "no news yet regarding the investigation of the 'volume detach' failures that requires many rechecks on multiple stable branches" | 16:25 |
elodilles | and we are close to deadline | 16:25 |
bauzas | yup | 16:25 |
gibi | elodilles: is it related to the recently seen kernel panics on stable/victoria ? | 16:25 |
bauzas | I can look at that tomorrow | 16:25 |
elodilles | gibi: not really, as there were not so much activity on stable nowadays | 16:25 |
elodilles | gibi: but yes, it could be related to the kernel panic issue as well | 16:26 |
gibi | as for that I have a huge packport to see if helps | 16:26 |
bauzas | a packport ? nice | 16:26 |
gibi | backport :D | 16:26 |
elodilles | pun intended :D | 16:27 |
gibi | bauzas: https://review.opendev.org/q/topic:bug/1882521 if you are interested :D | 16:27 |
bauzas | always interested in eating reviews | 16:27 |
gibi | (and yes it is -1 all over as we need the tempest fix gmann just approved) | 16:27 |
bauzas | yeah | 16:27 |
bauzas | this doesn't help btw. | 16:27 |
bauzas | anyway | 16:28 |
bauzas | moving on ? | 16:28 |
bauzas | #topic Sub/related team Highlights | 16:28 |
bauzas | Libvirt (lyarwood) | 16:28 |
bauzas | I guess he's not there | 16:29 |
bauzas | no worries, we can punt this to next week | 16:29 |
bauzas | #topic Open discussion | 16:29 |
bauzas | Off-path Network Backends spec re-review https://review.opendev.org/c/openstack/nova-specs/+/787458 after addressing PTG comments (dmitriis) | 16:29 |
bauzas | dmitriis: around ? | 16:30 |
dmitriis | yep | 16:30 |
dmitriis | One of the asks during the PTG was that the Neutron cores review the Neutron spec: https://review.opendev.org/c/openstack/neutron-specs/+/788821/ | 16:31 |
dmitriis | There is some progress on that, I am waiting for a second +2 (hopefully there will be some more feedback today). | 16:31 |
dmitriis | I updated the spec with some of the points that were discussed during the PTG as well | 16:31 |
bauzas | \o/ | 16:32 |
bauzas | so I guess it's our turn ? | 16:32 |
dmitriis | That would be much appreciated :^) | 16:32 |
bauzas | ok, so just a ping for reviews ? :) | 16:32 |
bauzas | nothing you wanna discuss with the team by now ? | 16:32 |
bauzas | some open left question, maybe ? | 16:33 |
dmitriis | yes, just a ping for now | 16:33 |
dmitriis | trying to get some eyes on it early since we are getting closer to the spec freeze and holidays | 16:33 |
bauzas | dmitriis: I guess you saw we plan a spec review day ? | 16:34 |
bauzas | dmitriis: this doesn't mean we won't review your spec *before* | 16:34 |
bauzas | but we would appreciate if you could be around on this particular day | 16:35 |
dmitriis | bauzas: yes, I plan to be around for that and ready to address feedback | 16:35 |
bauzas | dmitriis: excellent, thanks | 16:35 |
bauzas | given the size of the spec, first runs of reviews will be needed before the spec review day | 16:35 |
bauzas | but this helps to know you'll be around | 16:36 |
dmitriis | bauzas: it had some rounds of reviews around April/May 2021 already | 16:36 |
dmitriis | but, yes, I think early views would be preferred | 16:36 |
bauzas | :) | 16:37 |
bauzas | ok, I guess we consumed the whole agenda | 16:37 |
dmitriis | there were some external dependencies in Libvirt and OVN that got merged recently (so this is out of the way). During the PTG we agreed that the Neutron spec needs to be reviewed first and that I need to address some additional points | 16:37 |
dmitriis | ack | 16:37 |
bauzas | dmitriis: yup, indeed | 16:37 |
bauzas | dmitriis: but yeah, I get the fact the dependencies are now solved | 16:38 |
bauzas | so it's our turn | 16:38 |
bauzas | anyone wanting to raise anything before we shutdown the meeting ? | 16:39 |
gibi | - | 16:40 |
bauzas | #endmeeting | 16:40 |
opendevmeet | Meeting ended Tue Nov 9 16:40:20 2021 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 16:40 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/nova/2021/nova.2021-11-09-16.01.html | 16:40 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/nova/2021/nova.2021-11-09-16.01.txt | 16:40 |
opendevmeet | Log: https://meetings.opendev.org/meetings/nova/2021/nova.2021-11-09-16.01.log.html | 16:40 |
dmitriis | o/ | 16:40 |
elodilles | o/ | 16:40 |
bauzas | hah, fancy https://meetings.opendev.org/meetings/nova/2021/nova.2021-11-09-16.01.html | 16:40 |
bauzas | I fixed the use of the #link command and the topics | 16:41 |
bauzas | I guess I need to make sure we provide an #info command per topic | 16:42 |
* bauzas tries to make our minutes more readable | 16:42 | |
opendevreview | Balazs Gibizer proposed openstack/nova master: Apply common irrelevant_files for centos 8 job https://review.opendev.org/c/openstack/nova/+/817278 | 16:48 |
gibi | lyarwood: ^^ on tweak for the new job | 16:49 |
gibi | *one | 16:49 |
lyarwood | gibi: it's part of tempest-integrated-compute so is this really needed? | 16:51 |
lyarwood | ah wait the template is called something different my bad | 16:51 |
gibi | yeah, I noticed that it was run on https://review.opendev.org/c/openstack/nova/+/814735 but no other tempest job run there | 16:51 |
opendevreview | Balazs Gibizer proposed openstack/placement stable/xena: Use 'functional-without-sample-db-tests' tox env for placement nova job https://review.opendev.org/c/openstack/placement/+/817255 | 17:02 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/xena: Define new functional test tox env for placement gate to run https://review.opendev.org/c/openstack/nova/+/817256 | 17:02 |
gibi | bauzas, gmann: ^^ these backports are needed to make the periodic placement test run green on stable/xena | 17:02 |
gibi | as per https://zuul.openstack.org/builds?project=openstack%2Fplacement&pipeline=periodic-weekly | 17:02 |
opendevreview | Balazs Gibizer proposed openstack/placement stable/xena: Use 'functional-without-sample-db-tests' tox env for placement nova job https://review.opendev.org/c/openstack/placement/+/817255 | 17:04 |
opendevreview | Stephen Finucane proposed openstack/nova master: Use unittest.mock instead of third party mock https://review.opendev.org/c/openstack/nova/+/714676 | 17:05 |
gibi | stephenfin: on the backport of the Database fixture fix, I think we need to backport https://review.opendev.org/c/openstack/nova/+/810291 as well | 17:15 |
gibi | or at least I see that as a difference between master and xena and my backport on xena now fails mysteriously https://zuul.opendev.org/t/openstack/build/d7c064c8981b40618e3d24fc221c1832/log/job-output.txt | 17:16 |
kevko | anyone to help me investigate nova/neutron problem :/ | 17:18 |
gibi | anyhow I gave up for today | 17:18 |
sean-k-mooney | gibi: i have not reviewed that but skimming it quickly it seems like a small enough change | 17:18 |
gibi | sean-k-mooney: me neither, I probably need to pull it in apply it to xena and see if it resolves the test failure with the xena backport | 17:19 |
kevko | sean-k-mooney: hi, i patched nova code to see how much time spent to get event about vif plugged from neutron | 17:19 |
kevko | on my test environment it is about 10 - 40 sec ..sometimes it is higher sometimes it is lower ..what is strange that sometimes when I run heat stack ..it is quite fast and I can see debug log message about vif event ..sometimes it is long time .. :( | 17:21 |
sean-k-mooney | kevko: it sounds like when there are a lot of vms starting it presumably gets longer | 17:24 |
sean-k-mooney | are you seeing them get close to the 300 second timeout or are they still generally below that | 17:25 |
kevko | sean-k-mooney: nope, it is really low | 17:26 |
kevko | sean-k-mooney: https://paste.opendev.org/show/810887/ | 17:26 |
kevko | stack is always same .. 6 small cirros instances | 17:26 |
kevko | openstack is clean testing env ..so no other processes running ...just my stack is building .. | 17:27 |
sean-k-mooney | kevko: that points to this not being a general performance problem then, so increasing the timeout won't help | 17:28 |
kevko | sean-k-mooney: yeah, something is somewhere buggy :D probably in neutron .. | 17:28 |
sean-k-mooney | you will have to start correlating the nova and neutron logs to see if/when the ovs ports are created and what happens | 17:28 |
kevko | on neutron-server side i can see this -> 2021-11-09 16:40:42.749 8 ERROR neutron.agent.dhcp.agent [-] Unexpected number of DHCP interfaces for metadata proxy, expected 1, got 2 | 17:29 |
sean-k-mooney | ya it's either a problem with libvirt creating the tap and adding it to ovs or a problem in the neutron l2 agent | 17:29 |
kevko | hmm, If you have a time ..I can give you access to that LAB env | 17:30 |
kevko | sean-k-mooney: or give logs ? | 17:30 |
kevko | sean-k-mooney: because I don't know if I am able to debug it :/ ..trying whole day | 17:31 |
sean-k-mooney | if you can share logs from 5-10min before/after the vm failed for the neutron l2 agent and nova-compute agent that should be enough | 17:32 |
sean-k-mooney | i can try and take a look but unfortunately i probably won't be able to fully debug this for you | 17:32 |
kevko | ok, give me minute | 17:33 |
sean-k-mooney | really the way to approach this is to look for the point at which nova/libvirt creates the domain, which will in turn create the port, and get the timestamp | 17:33 |
sean-k-mooney | then you need to look at the l2 agent log and see if it starts processing the port in the treat_ports function | 17:33 |
sean-k-mooney | kevko: this is the code that should configure the port after it's added https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L1925 | 17:34 |
kevko | sean-k-mooney: https://debian.kevko.ultimum.cloud/neutron-openvswitch-agent.log | 17:37 |
sean-k-mooney | do you know the uuid of the port/tap name or mac | 17:38 |
kevko | nova-compute 17:16:53.188 line | 17:39 |
kevko | sean-k-mooney: probably this ? | 17:40 |
kevko | 2021-11-09 17:11:24.974 7 DEBUG neutron.agent.resource_cache [req-72e6674d-a4b6-4040-b62f-e7983c5c74f3 f21b4913a25d411fa774338091bd105a 5bd5561af79540c38df13222dce135f6 - - -] Resource Port 8d3373f0-6329-4252-8114-fc981873e0fb updated (revision_number 21->22). Old fields: {'dns': PortDNS(current_dns_domain='',current_dns_name='',dns_domain='',dns_name='',port_id=8d3373f0-6329-4252-8114-fc981873e0fb,previous_dns_domain='',previo | 17:40 |
kevko | us_dns_name=''), 'device_id': '', 'bindings': [PortBinding(host='',port_id=8d3373f0-6329-4252-8114-fc981873e0fb,profile={},status='ACTIVE',vif_details=None,vif_type='unbound',vnic_type='normal')], 'device_owner': ''} New fields: {'dns': PortDNS(current_dns_domain='',current_dns_name='',dns_domain='',dns_name='prod-p0000000001-s0000000001-uan',port_id=8d3373f0-6329-4252-8114-fc981873e0fb,previous_dns_domain='',previous_dns_name= | 17:40 |
kevko | ''), 'device_id': 'f01680bd-ba12-4029-b11f-b2d5ae848818', 'bindings': [PortBinding(host='compute0',port_id=8d3373f0-6329-4252-8114-fc981873e0fb,profile={},status='ACTIVE',vif_details=None,vif_type='unbound',vnic_type='normal')], 'device_owner': 'compute:nova'} record_resource_update /usr/lib/python3/dist-packages/neutron/agent/resource_cache.py:185 | 17:40 |
kevko | found via instance id | 17:40 |
sean-k-mooney | ok so that is in the log | 17:42 |
sean-k-mooney | 021-11-09 17:05:32.150 7 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-70183bba-0380-45d0-afef-7834c5644b2a - - - - -] Starting to process devices in:{'current': {'8d3373f0-6329-4252-8114-fc981873e0fb', 'f2b15696-b359-4216-a22a-804ebf285332', 'cc1609e0-d7ae-45a1-9405-1c95bb8dabf1', '05295e05-4fc2-4c00-ba77-8e4ff57b2ae3'}, 'added': set(), 'removed': | 17:42 |
sean-k-mooney | set(), 'updated': {'8d3373f0-6329-4252-8114-fc981873e0fb', 'f2b15696-b359-4216-a22a-804ebf285332', '05295e05-4fc2-4c00-ba77-8e4ff57b2ae3'}, 're_added': set()} rpc_loop /usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:2662 | 17:42 |
sean-k-mooney | and the status is set to UP at 17:05:41 | 17:43 |
sean-k-mooney | 2021-11-09 17:05:41.766 7 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-70183bba-0380-45d0-afef-7834c5644b2a - - - - -] Setting status for 8d3373f0-6329-4252-8114-fc981873e0fb to UP _bind_devices /usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:1202 | 17:43 |
sean-k-mooney | so the device configuration completes in the agent at 17:05:44 | 17:45 |
sean-k-mooney | a side effect of setting the status up should be calling the provisioning blocks code which will eventually send the event to nova | 17:46 |
sean-k-mooney | kevko: that's interesting, i'm seeing it repeat later in the log too | 17:50 |
kevko | well, i think if something is broken ..it is trying to spawn instance again no ? | 17:51 |
sean-k-mooney | not on the same host | 17:52 |
sean-k-mooney | the current host appears to be compute0 | 17:53 |
sean-k-mooney | the revision_number 23->24 update is definitely going from bound to compute0 with status down to compute0 with status up | 17:55 |
sean-k-mooney | which correlates with the 21->22 detail above | 17:56 |
sean-k-mooney | it looks like the issue is elsewhere, perhaps in the dhcp agent or neutron server | 17:56 |
sean-k-mooney | kevko: for the event to be sent both the l2 agent and dhcp agent need to notify the neutron server that the provisioning is complete | 17:57 |
sean-k-mooney | since the l2 agent seems to be working correctly the next most likely candidate is the dhcp agent being slow when many vms are created | 17:58 |
kevko | 6 vms ? :/ | 17:58 |
sean-k-mooney | it's likely that there is a bug in the configuration that is causing the agent to block/hang for some reason if this is the issue | 18:01 |
sean-k-mooney | it's not really a performance issue | 18:01 |
sean-k-mooney | we have had bugs in the interaction with dnsmasq in the past | 18:01 |
sean-k-mooney | kevko: in any case when the l2 agent sets the port status as active it executes this code which marks it complete for the l2 agent | 18:02 |
sean-k-mooney | https://github.com/openstack/neutron/blob/9241c76b04e6745cc648ee42037cfe6ddad3600a/neutron/plugins/ml2/rpc.py#L312-L331 | 18:02 |
sean-k-mooney | if both sides had completed the provisioning the event would have been sent | 18:02 |
kevko | bug in configuration ? | 18:03 |
kevko | yeah, i saw some fixed bugs on launchpad | 18:04 |
sean-k-mooney | in the neutron server you should see one of these two logs noting that the l2 agent has completed its provisioning https://github.com/openstack/neutron/blob/9241c76b04e6745cc648ee42037cfe6ddad3600a/neutron/db/provisioning_blocks.py#L133-L140 | 18:05 |
kevko | sean-k-mooney: nothing, i have wallaby btw | 18:10 |
sean-k-mooney | i don't think this has changed much from wallaby to master | 18:11 |
sean-k-mooney | it might be best to take this to the neutron channel but it would seem for whatever reason that the port status change is not propagating to the neutron server then | 18:12 |
kevko | do you want ssh key to that lab ? | 18:12 |
sean-k-mooney | unfortunately i have some other work i need to get done so i'm not sure i can really support debugging this much beyond what i have already done | 18:13 |
kevko | sean-k-mooney: ok, no problem, thank you very much ... | 18:14 |
kevko | btw, I have neutron server set to Debug = False ..so that's the reason why I am not seeing those debug messages .. | 18:14 |
sean-k-mooney | ah ya these are debug only since its a bit verbose | 18:15 |
kevko | ok, have to go ...thank you very much | 18:16 |
opendevreview | Artom Lifshitz proposed openstack/nova master: DNM: Run OVS job with hybrid plug https://review.opendev.org/c/openstack/nova/+/817303 | 20:16 |
opendevreview | Artom Lifshitz proposed openstack/nova master: DNM: Run OVS job with hybrid plug https://review.opendev.org/c/openstack/nova/+/817303 | 20:18 |
opendevreview | Dan Smith proposed openstack/nova master: WIP: Revert project-specific APIs for servers https://review.opendev.org/c/openstack/nova/+/816206 | 20:26 |
dansmith | gmann: lbragstad: my brain is fried from ^ so use extra caution while reviewing | 20:26 |
dansmith | however, I do think that's much easier to read than what was there before, and hopefully makes the iteration from current..scope..nolegacy more clear | 20:27 |
gmann | dansmith: thanks, ack | 20:35 |
lbragstad | dansmith sweet - thanks | 20:45 |
hyang[m] | Hi there, can someone help to review https://review.opendev.org/c/openstack/nova/+/811521? It can help to close both https://bugs.launchpad.net/nova/+bug/1943969 and https://bugs.launchpad.net/neutron/+bug/1942615 | 21:06 |
artom | Hah, so revert resize is broken with ovs + hybrid plug | 22:14 |
artom | https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_f60/817303/2/check/nova-ovs-hybrid-plug/f60d54c/testr_results.html | 22:15 |
artom | We first noticed this downstream, and now that ^^ tested it upstream, same result | 22:15 |
gmann | dansmith: lbragstad johnthetubaguy[m] I created this wikitable to audit all the nova API policy - https://wiki.openstack.org/wiki/Nova/rbac | 22:19 |
gmann | few I have kept as ? mainly multi-policy one. for example showing host_status policy in GET /servers please review those. | 22:20 |
gmann | dansmith: lbragstad johnthetubaguy[m] I have updated those as per my understanding and with new direction we agreed on Wed. My eyes are paining now after listing/auditing these ~225 policies . will catch up on this tomorrow. | 22:24 |
dansmith | gmann: wow, I thought you were going to do it in a google sheet or something | 22:32 |
dansmith | I'm sure your eyes are literally bleeding now :/ | 22:32 |
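
Note on the log correlation sean-k-mooney describes at 17:33: one way to line up the nova-compute and neutron agent logs is to pull out every line that mentions the port UUID and sort by timestamp. The sketch below is only an illustration, not something used in the discussion. It assumes the oslo.log style timestamp prefix seen in the pasted lines; the two sample lines are abbreviated from the pastes above (the bracketed request context is elided as `[...]`), and the function names and the "l2-agent" label are made up for the example.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the pasted agent log lines,
# e.g. "2021-11-09 17:05:41.766 7 DEBUG ...".
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) ")


def events_for(lines, needle):
    """Return (timestamp, line) pairs for lines that mention `needle`."""
    found = []
    for line in lines:
        if needle not in line:
            continue
        match = TS_RE.match(line)
        if match:  # skip continuation lines that carry no timestamp
            ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            found.append((ts, line.rstrip()))
    return found


def merged_timeline(sources, needle):
    """Merge matches from several logs into one list sorted by time.

    `sources` maps a label (hypothetical, e.g. "l2-agent") to an
    iterable of log lines, such as an open file.
    """
    timeline = []
    for name, lines in sources.items():
        for ts, line in events_for(lines, needle):
            timeline.append((ts, name, line))
    timeline.sort(key=lambda entry: entry[0])
    return timeline


PORT_ID = "8d3373f0-6329-4252-8114-fc981873e0fb"

# Abbreviated from the lines pasted in the channel, deliberately out of order.
agent_log = [
    "2021-11-09 17:11:24.974 7 DEBUG neutron.agent.resource_cache [...] "
    "Resource Port 8d3373f0-6329-4252-8114-fc981873e0fb updated (revision_number 21->22).",
    "2021-11-09 17:05:41.766 7 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent."
    "ovs_neutron_agent [...] Setting status for 8d3373f0-6329-4252-8114-fc981873e0fb to UP",
]

for ts, name, line in merged_timeline({"l2-agent": agent_log}, PORT_ID):
    print(ts.time(), name, line[:100])
```

In practice the `sources` dict would hold open file objects for the nova-compute and openvswitch agent logs, and the same call works with the instance UUID or tap name as the search string.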
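
Note on the provisioning logic described at 17:57 and 18:02: the vif plugged event goes to nova only after both the l2 agent and the dhcp agent have reported that provisioning is complete. The toy model below sketches that "wait for every party" idea. It is not neutron's code, and the class, method names, and entity labels are hypothetical.

```python
class ProvisioningTracker:
    """Toy model: fire a callback once every blocking entity is done."""

    def __init__(self, on_complete):
        self._pending = {}  # port_id -> set of entities still blocking
        self._on_complete = on_complete

    def add_block(self, port_id, entity):
        """Register `entity` as something the port must wait for."""
        self._pending.setdefault(port_id, set()).add(entity)

    def complete(self, port_id, entity):
        """Mark `entity` done; return True if this released the port."""
        blockers = self._pending.get(port_id)
        if blockers is None or entity not in blockers:
            return False  # unknown port or entity, nothing to release
        blockers.discard(entity)
        if blockers:
            return False  # someone else is still blocking
        del self._pending[port_id]
        self._on_complete(port_id)  # stands in for sending the event to nova
        return True


notified = []
tracker = ProvisioningTracker(notified.append)
tracker.add_block("port-1", "l2-agent")
tracker.add_block("port-1", "dhcp-agent")

tracker.complete("port-1", "l2-agent")    # l2 agent done, dhcp still pending
print(notified)
tracker.complete("port-1", "dhcp-agent")  # last blocker gone, callback fires
print(notified)
```

This matches the symptom in the discussion: if one party never reports completion, the callback never fires, and nova keeps waiting until its timeout no matter how quickly the l2 agent finished.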