16:00:10 #startmeeting nova
16:00:10 Meeting started Tue Nov 16 16:00:10 2021 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:10 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:10 The meeting name has been set to 'nova'
16:00:18 o/
16:00:21 o/
16:00:28 #link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting
16:00:48 good 'day, everyone ;)
16:00:52 Hi
16:00:56 Merged openstack/nova-specs master: Integration With Off-path Network Backends https://review.opendev.org/c/openstack/nova-specs/+/787458
16:01:11 o/
16:01:38 I'll have to hard-stop working in 45-ish mins, sooo
16:01:42 #chair gibi
16:01:42 Current chairs: bauzas gibi
16:01:44 sorry again
16:01:53 so I will take the rest
16:01:57 * bauzas is a taxi
16:02:13 anyway, let's start
16:02:20 #topic Bugs (stuck/critical)
16:02:26 #info No Critical bug
16:02:33 #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 28 new untriaged bugs (+3 since the last meeting)
16:02:38 #help Nova bug triage help is appreciated https://wiki.openstack.org/wiki/Nova/BugTriage
16:02:43 I'm really a sad panda
16:03:07 in general, I triage bugs on Tuesdays, but I forgot about today's spec review day :)
16:03:17 so I'll look at the bugs tomorrow
16:03:38 in case people want to help us, <3
16:03:52 any bug to discuss ?
16:04:26 #link https://storyboard.openstack.org/#!/project/openstack/placement 33 open stories (+1 since the last meeting) in Storyboard for Placement
16:04:30 about this...
16:04:49 I tried to find which story was new :)
16:05:15 but the last story was already the one I knew
16:05:20 so, in case people know...
16:05:49 o/
16:06:12 bauzas: if at some point I have time I can try to dig, but I'm pretty full at the moment
16:06:12 also, Storyboard is a bit...
slow, I'd say
16:06:46 it takes at least 5 secs every time I look at a story
16:07:24 I mean, for stories, maybe we should use Facebook then ? :p
16:07:35 (heh, :p )
16:07:53 * bauzas was joking in case people didn't know
16:08:18 OK, this looks like a bad joke
16:08:20 moving on :p
16:08:30 #topic Gate status
16:08:36 #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs
16:08:47 nothing new
16:08:58 #link https://zuul.openstack.org/builds?project=openstack%2Fplacement&pipeline=periodic-weekly Placement periodic job status
16:09:24 now the placement-nova-tox-functional-py38 job works again :)
16:09:26 thanks !
16:09:55 #topic Release Planning
16:10:04 #info Yoga-1 is due Nov 18th #link https://releases.openstack.org/yoga/schedule.html#y-1
16:10:17 which is in 2 days
16:10:27 nothing really to say about it
16:10:32 #info Spec review day is today
16:10:52 I think I reviewed all the specs but one (but I see this one was merged ;) )
16:11:14 thanks to all who already reviewed specs
16:11:28 yeah, I think we pushed forward all the open specs
16:12:01 Sorry if I'm interrupting but I had one doubt regarding my spec
16:12:04 we merged 3 specs today
16:12:41 whoami-rajat: no worries, we can discuss this spec if you want during the open discussion topic
16:12:52 ack thanks bauzas
16:12:55 whoami-rajat: but what is your concern ?
16:13:07 a tl;dr if you prefer
16:14:03 for other specs, I'll mark the related blueprints accepted in Launchpad by tomorrow
16:14:21 bauzas, so I'm working on the reimage spec for volume-backed instances and we decided to send connector details with the reimage API call and cinder will do the attachment update (this was during PTG); Lee pointed out that we should follow our current mechanism of nova doing the attachment update like we do for other operations
16:15:00 ok, if this is a technical question, let's discuss this during the open discussion topic as I said
16:15:26 sure, np
16:15:42 ok, next topic then
16:15:46 #topic Review priorities
16:15:52 #link https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement)+label:Review-Priority%252B1
16:16:06 #info https://review.opendev.org/c/openstack/nova/+/816861 bauzas proposing a documentation change for helping contributors to ask for reviews
16:16:16 gibi already provided some comments on it
16:16:58 I guess the concern is how to help contributors ask for review priorities like we did with the etherpad
16:17:29 but if we have a consensus saying that it is not an issue, I'll stop
16:18:02 but my only concern is that I think asking people to come on IRC and ping folks is difficult, so we could use gerrit
16:19:09 what is more difficult? Finding the reason of a fault in nova code and fixing it, or joining IRC to ask for review help?
16:19:46 well, one you "might" be able to do offline/async
16:20:01 the other involves talking to people, albeit by text
16:20:25 unfortunately those are sometimes non-overlapping skill sets
16:20:32 gibi: I'm just thinking of on-and-off contributors that just provide bugfixes
16:20:33 doing code review is talking to people via text :)
16:21:12 but let's continue discussing this in the proposal, I don't want to take up everyone's attention for now
16:21:19 bauzas: for one-off patches I think the expectation should still be on us to watch the patches come in and help them
16:21:28 rather than assuming they will use any tools we provide
16:21:49 sean-k-mooney: yeah, but then how to discover them ?
16:22:08 either way, let's discuss this via Gerrit :p
16:22:10 well, if it's a similar time zone I watch for the irc bot commenting on the patches
16:22:24 if I don't recognise it or the name, I open it
16:22:49 and then one of us can request the review priority in gerrit or publicise the patch to others
16:23:15 that's one direction
16:23:46 if there is something in gerrit I can set, I'm happy to do that on patches when I think they are ready; otherwise I'll just ping them to ye as I do now
16:23:55 either way, we have a large number of items for the open discussion topic, so let's move on
16:24:01 ack
16:24:09 #topic Stable Branches
16:24:23 elodilles: fancy copy/pasting or do you want me to do so ?
16:24:35 either way is OK :)
16:25:42 I can do it
16:25:44 #info stable gates' status looks OK, no blocked branch
16:25:50 #info final ussuri nova package release was published (21.2.4)
16:25:55 #info ussuri-em tagging patch is waiting for final python-novaclient release patch to merge
16:26:00 #link https://review.opendev.org/c/openstack/releases/+/817930
16:26:05 #link https://review.opendev.org/c/openstack/releases/+/817606
16:26:09 #info intermittent volume detach issue: afaik Lee has an idea and started to work on how it can be fixed:
16:26:14 #link https://review.opendev.org/c/openstack/tempest/+/817772/
16:26:19 any question ?
16:26:25 thanks :)
16:27:40 looks like none
16:27:47 #topic Sub/related team Highlights
16:27:47 the volume detach issue feels more and more like it's not related to detach
16:27:51 #undo
16:27:51 Removing item from minutes: #topic Sub/related team Highlights
16:28:05 the kernel panic happens before we issue the detach
16:28:07 gibi: true
16:28:21 it is either related to the attach or the live migration itself
16:28:50 I have trials placing sleeps in different places to see where we are too fast https://review.opendev.org/c/openstack/nova/+/817564
16:28:52 which stable branches are impacted ?
16:28:57 stable/victoria
16:29:01 ubuntu focal-ish I guess ?
16:29:20 ack thanks
16:29:21 (and other branches as well, but there might be different root causes)
16:29:45 I only see kernel panics in stable/victoria (a lot) and one single failure in stable/wallaby
16:30:08 so if there are detach issues in older stables, they are either not causing a kernel panic, or we don't see the panic in the logs
16:30:33 I guess kernel versions are different between branches
16:30:41 right?
16:31:06 could we imagine somehow verifying another kernel version for stable/victoria ?
16:31:06 we tested with guest cirros 0.5.1 (victoria default) and 0.5.2 (master default); it is reproducible with both
16:31:23 ack, so unrelated
16:31:31 there is a summary here https://bugs.launchpad.net/nova/+bug/1950310/comments/8
16:32:19 #link https://bugs.launchpad.net/nova/+bug/1950310/comments/8 explaining the guest kernel panic related to the stable/victoria branch
16:32:22 ya, the few cases I looked at with you last week were all happening before the detach
16:32:38 so it's either the attach or the live migration
16:32:44 sean-k-mooney: I have more logs in the runs of https://review.opendev.org/c/openstack/nova/+/817564 if you are interested
16:32:52 I looked downstream at our qemu bugs but didn't see anything relevant
16:33:09 gibi: sure, I'll try and take a look, probably tomorrow
16:33:15 but I'll open it in a tab
16:33:57 sean-k-mooney: thanks, I will retrigger that patch a couple of times to see if the current sleep before the live migration helps
16:34:24 a good sleep always helps
16:34:39 :)
16:34:45 :]
16:34:46 when sleep does not work we can also try a trusty print statement
16:35:09 the sleep is not there as a solution but as troubleshooting, to see at which step we are too fast :D
16:35:24 * sean-k-mooney is dismayed by how many race conditions __don't__ appear when you use print for debugging
16:35:33 and I do have a lot of print(server.console)-like statements in the tempest :D
16:36:10 I think we can move on, but it's good you were able to confirm we were attaching before the kernel finished booting
16:36:22 at least in some cases
16:36:57 that at least lends weight to the idea that we are racing
16:37:05 ok, let's move on
16:37:12 ack
16:37:12 again, large agenda today
16:37:17 #topic Sub/related team Highlights
16:37:23 damn
16:37:24 #topic Sub/related team Highlights
16:37:40 Libvirt: lyarwood ?
16:38:15 I guess nothing to tell
16:38:19 moving on to the last topic
16:38:31 #topic Open discussion
16:39:02 whoami-rajat: please queue
16:39:09 thanks!
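[editor's note] The race-debugging approach gibi describes earlier (blind sleeps and print(server.console) statements to find where the test outpaces the guest) can also be done by polling the console for a boot-complete marker before attaching or live migrating. This is a hedged sketch, not tempest code: `wait_for_console_pattern`, `FakeGuest`, and the `"login:"` marker are illustrative stand-ins.

```python
import time


def wait_for_console_pattern(get_console, pattern, timeout=60.0, interval=1.0):
    """Poll a console-output callable until `pattern` appears.

    Returns the console text once the pattern shows up; raises
    TimeoutError otherwise. The idea is to wait until the guest kernel
    has finished booting before attaching a volume or live migrating,
    instead of sprinkling blind time.sleep() calls around.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        console = get_console()
        if pattern in console:
            return console
        time.sleep(interval)
    raise TimeoutError(f"pattern {pattern!r} not seen in console within {timeout}s")


# Illustrative usage with a fake guest whose console shows a login
# prompt only after a few polls, mimicking a slow-booting VM:
class FakeGuest:
    def __init__(self):
        self._polls = 0

    def console(self):
        self._polls += 1
        return "login:" if self._polls >= 3 else "booting..."
```

In a real tempest run, `get_console` would be whatever fetches the server's console log; the point is that the test then only races intentionally, with a bounded wait, rather than depending on a fixed sleep being long enough.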
16:39:19 (kashyapc) Blueprint for review: "Switch to 'virtio' as the default display device" -- https://blueprints.launchpad.net/nova/+spec/virtio-as-default-display-device
16:39:27 this is a specless bp ask
16:39:37 kashyap said "The full rationale is in the blueprint; in short: the 'cirrus' display device has many limitations and is 'considered harmful'[1] by QEMU graphics maintainers since 2014."
16:40:00 do we need a spec for this bp or are we OK with approving it now ?
16:40:19 so lyarwood had a concern with my reimage spec; we agreed to pass the connector info to the reimage API (cinder) and cinder will do the attachment update and return the connection info with the events payload
16:40:22 I think we don't need a spec, this is pretty self-contained in the libvirt driver
16:40:23 kashyap was unable to attend the meeting today
16:40:27 (in PTG)
16:40:38 I think we are ok with approving it; the main thing to call out is we will be changing it for existing instances too
16:40:39 whoami-rajat: please hold, sorry
16:40:43 oh ok
16:40:45 the only open question we had with sean-k-mooney is how to change the default
16:41:05 but kashyap tested that changing the default during hard reboot does not cause any trouble to guests
16:41:16 as the new video dev has a fallback vga mode
16:41:17 gibi: I'm thinking hard of any potential upgrade implication
16:41:32 right, so when we discussed this before we decided to change it only for new instances to avoid upgrade issues
16:41:50 correct
16:41:57 our downstream QE tested this with windows guests and linux guests and both seemed to be ok with the change
16:42:01 I'm in favor of not touching the running instances
16:42:11 or asking to rebuild them
16:42:18 we are not touching the running instances, we only touch hard-rebooted instances
16:42:22 so kashyap has implemented this for all instances
16:42:49 gibi: which happens when you stop/start, right?
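[editor's note] The change being discussed only affects the `<video>` element of the libvirt domain XML that nova generates, which is why it can only take effect when that XML is regenerated (stop/start, hard reboot, non-live move), never on a running guest. A minimal sketch of the XML difference, assuming nothing about nova's actual driver code (`video_device_xml` is an illustrative helper, not a nova function):

```python
import xml.etree.ElementTree as ET


def video_device_xml(video_model="virtio"):
    """Build a libvirt <video> element for a guest domain.

    The blueprint switches the default model from "cirrus" to "virtio";
    since this string is only consulted when the domain XML is next
    generated, running guests keep their existing device until they are
    stopped/started or hard rebooted.
    """
    video = ET.Element("video")
    ET.SubElement(video, "model", {"type": video_model})
    return ET.tostring(video, encoding="unicode")
```

So `video_device_xml()` yields a `<video><model type="virtio" .../></video>` fragment, while guests defined before the change keep `type="cirrus"` in their persisted XML until the next regeneration, which matches the "no upgrade check needed" conclusion below.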
16:42:54 right
16:42:54 bauzas: yes, as gibi says it will only take effect when the xml is next regenerated
16:42:59 it happens while the guest is not running
16:43:21 it is not an unplug/plug for a running guest
16:43:29 do we want admins to opt in instances ?
16:43:41 or do we agree it would be done automatically?
16:43:42 it will happen on stop/start, hard reboot, or non-live move operations
16:44:03 bauzas: I trust kashyap that it is safe to change this device
16:44:14 do we also want to have a nova-status upgrade check for yoga about this ?
16:44:23 no
16:44:24 gibi: me too
16:44:29 why would we need to
16:44:35 we are not removing support for cirrus
16:44:36 we don't remove cirrus
16:44:41 just not the default
16:44:46 yepp
16:45:00 gibi: context is that downstream it is being removed from rhel 9
16:45:04 sean-k-mooney: sure, that just means that long-living instances could continue running cirrus
16:45:14 so we need to care about it for our product
16:45:24 actually cirrus is not being removed in rhel 9
16:45:35 but like in rhel 10
16:46:06 bauzas: yep, which I think is ok
16:46:27 we could have a nova-status check but it would have to run on the compute nodes
16:46:35 which is kind of not nice
16:46:40 since it would have to check the xmls
16:46:51 I know
16:46:53 so I would not add it personally
16:47:19 I'm just saying that we are entering a time that could last long
16:47:20 I agree, we don't need an upgrade check
16:48:16 shall we continue this in the patch review
16:48:17 but agreed on the fact this is not a problem until cirrus support is removed, and this is not an upstream question
16:48:31 sean-k-mooney: you're right, nothing needing a spec
16:49:06 #agreed https://blueprints.launchpad.net/nova/+spec/virtio-as-default-display-device is accepted as a specless BP for the Yoga release timeframe
16:49:10 moving on
16:49:12 \o/
16:49:17 next item
16:49:30 (kashyapc) Blueprint for review: "Add ability to control the memory used by fully emulated QEMU guests" --
https://blueprints.launchpad.net/nova/+spec/control-qemu-tb-cache
16:49:39 again, a specless bp ask
16:49:54 he said "This blueprint allows us to configure how much memory a plain-emulated (TCG) VM uses, which is what OpenStack CI uses. Recently, QEMU changed the default memory used by TCG VMs to be much higher, thus reducing the no. of TCG VMs you could run per host. Note: the libvirt patch required for this will be in libvirt-v7.10.0 (December 2021)."
16:49:59 "See this issue for more details: https://gitlab.com/qemu-project/qemu/-/issues/693 (Qemu increased memory usage with TCG)"
16:50:21 I'm a little torn on this
16:50:41 I'm not sure I like this being a per-host config option
16:50:51 but it's also breaking existing deployments
16:51:04 so we can't really address that with flavor extra specs or image properties
16:51:17 since it would be a pain for operators to use
16:51:17 but that requires a rebuild of existing instances
16:51:21 yep
16:51:32 so with that in mind the config option probably is the way to go
16:51:51 just need to bear in mind it might change after a hard reboot if you live migrate
16:51:53 yeah, config as a first step; if later more fine-grained control is needed we can add an extra spec
16:52:24 there are libvirt dependencies
16:52:28 if we capture the "this should really be the same on all hosts in a region" piece in the docs I'm ok with this
16:52:37 bauzas: and qemu deps
16:52:37 you need a recent libvirt in order to be able to use it
16:52:43 right
16:52:45 it's only supported on qemu 5.0+
16:52:45 sean-k-mooney: yeah, that makes sense to document
16:53:05 so we will need a libvirt version and qemu version check in the code
16:53:12 which is fine, we know how to do that
16:53:15 so, if this is a configurable, this has to explain which versions you need
16:53:23 yep
16:53:33 we would expose something unusable for most
16:53:50 the only tricky bit will be live migration
16:54:02 if the dest is not new enough but the host is
16:54:04 correct, the checks ?
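[editor's note] The version gate discussed here (libvirt >= 7.10.0 for the tb-cache knob, QEMU >= 5.0) can be sketched as a simple tuple comparison; the constant and function names are illustrative, not nova's actual code, and the same check would have to run on both source and destination hosts for the live migration case raised next.

```python
# Minimum versions mentioned in the discussion: the libvirt patch lands
# in libvirt v7.10.0 (December 2021), and the knob is only supported on
# QEMU 5.0+. Names here are assumptions of this sketch.
MIN_LIBVIRT_TB_CACHE = (7, 10, 0)
MIN_QEMU_TB_CACHE = (5, 0, 0)


def tb_cache_supported(libvirt_version, qemu_version):
    """Return True if both hypervisor components are new enough.

    Versions are (major, minor, micro) tuples, which compare
    lexicographically in Python, so (7, 9, 0) < (7, 10, 0).
    """
    return (libvirt_version >= MIN_LIBVIRT_TB_CACHE
            and qemu_version >= MIN_QEMU_TB_CACHE)
```

For live migration, the tricky bit called out above is that this predicate would also need to hold on the destination host before the migration is allowed, otherwise a guest configured with a tb-cache size could land on a host that cannot honour it.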
16:54:16 we will need to make sure we validate that
16:54:23 right
16:54:36 but this looks to me like an implementation detail
16:54:50 all of this seems not to need a spec, right?
16:54:59 upgrade concerns are N/A
16:55:10 as you explicitly need a recent qemu
16:55:18 hmm, the live migration check will be a little complex but other than that I don't see a need for a spec
16:55:41 I'm a little concerned about the live migration check, which is what makes me hesitate to say no spec
16:55:45 we can revisit this decision if the patch goes hairy
16:55:53 yes
16:55:55 that works for me
16:56:25 works for me too
16:56:36 I think we have the hypervisor version available in the conductor so I think we can do it without an rpc/object change
16:56:36 #agreed https://blueprints.launchpad.net/nova/+spec/control-qemu-tb-cache can be a specless BP but we need to know more about the live migration checks before we approve
16:56:51 gibi: sean-k-mooney: does what I wrote work for you ?
16:57:04 +1
16:57:08 ok,
16:57:13 next topic is ganso
16:57:20 and eventually, whoami-rajat
16:57:27 hi!
16:57:35 ganso: you have one min :)
16:57:50 so my question is about adding the hw_vif_multiqueue_enabled setting to flavors
16:57:56 it was removed from the original spec
16:57:57 https://review.opendev.org/c/openstack/nova-specs/+/128825/comment/7ad32947_73515762/#90
16:58:06 today it can only be used in image properties
16:58:26 does it make sense at all semantically, or is this something that only makes sense as an image property?
16:58:32 ya, this came up semi-recently
16:58:38 i think we can just add this in the flavor
16:58:49 the other way would be a concern to me
16:58:59 ok. Would this require a spec?
16:59:00 as users could use a new property
16:59:20 well, image properties are for exposing things that affect the virtualised hardware
16:59:24 but given we already accept this for images, I don't see a problem with accepting it as a flavor extra spec
16:59:30 so in general you want that to be user-settable
17:00:00 great
17:00:07 sean-k-mooney: right, I was just explaining that image > flavor seems not debatable while flavor > image seems to be discussed
17:00:16 to me it sounds simple enough to not require a spec, do you agree?
17:00:37 good question
17:00:45 but we're overtime
17:00:49 https://blueprints.launchpad.net/nova/+spec/multiqueue-flavor-extra-spec
17:01:04 this is the implementation https://review.opendev.org/q/topic:bp/multiqueue-flavor-extra-spec
17:01:04 ganso: whoami-rajat: let's continue discussing your concerns after the meeting
17:01:10 #endmeeting
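[editor's note] The flavor-vs-image question at the end of the meeting boils down to how the effective setting is resolved when both sources could carry it. A hedged sketch, assuming a flavor extra spec named `hw:vif_multiqueue_enabled` (from the blueprint topic above; today only the image property `hw_vif_multiqueue_enabled` exists) and assuming conflicts are rejected, which is a choice of this sketch rather than settled nova behaviour:

```python
def multiqueue_enabled(flavor_extra_specs, image_properties):
    """Resolve the effective vif-multiqueue setting.

    flavor_extra_specs / image_properties are plain dicts. If both the
    flavor extra spec and the image property are set and disagree, this
    sketch raises, mirroring the concern that a flavor silently
    overriding an image (or vice versa) would surprise users.
    """
    flavor = flavor_extra_specs.get("hw:vif_multiqueue_enabled")
    image = image_properties.get("hw_vif_multiqueue_enabled")
    if flavor is not None and image is not None and flavor != image:
        raise ValueError("conflicting multiqueue settings in flavor and image")
    if flavor is not None:
        return flavor
    if image is not None:
        return image
    return False  # multiqueue stays off when neither source sets it
```

This mirrors how nova generally combines `hw:` flavor extra specs with `hw_` image properties; the exact conflict semantics were left to the blueprint review.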