16:00:06 #startmeeting nova
16:00:06 Meeting started Tue Jun 29 16:00:06 2021 UTC and is due to finish in 60 minutes. The chair is gibi. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:06 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:06 The meeting name has been set to 'nova'
16:00:14 o/
16:00:21 * kashyap waves
16:00:32 kashyap: OK, then I will do the normal agenda and your topic will be part of the Open Discussion
16:00:38 Yes, that's fine.
16:01:06 \o
16:01:10 o/
16:01:24 bauzas asked for a quick meeting so let's start
16:01:41 #topic Bugs (stuck/critical)
16:01:44 no critical bug
16:01:55 #link 22 new untriaged bugs (+1 since the last meeting): #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New
16:02:14 any bug we need to talk about today/
16:02:15 ?
16:02:47 lyarwood spotted a bug earlier this morning. We think it's basically fixed now. Just waiting for the neutron patch to merge (thanks slaweq)
16:03:13 until that merges though, the gate is stuck (the live-migration job will fail)
16:03:45 (https://review.opendev.org/c/openstack/neutron/+/798634/ is the fix, btw)
16:04:23 stephenfin: thanks
16:04:49 any other bug to mention?
16:05:09 stephenfin: that job should not fail
16:05:27 we should fall back to the old way without multiple port bindings
16:05:37 it might fail on stable branches if we have not backported the fix
16:05:45 maybe not, but it does and I haven't had time to figure out why 🤷
16:06:00 well we should, since that means contrail is broken
16:06:08 one test failing is this one, which is the port status issue gibi mentioned before the meeting https://be2e92e10ead782aa651-35e07a4cf42cfaed2fcffa4bf0b16f1b.ssl.cf1.rackcdn.com/794757/9/check/tempest-multinode-full-py3/94d92ea/testr_results.html
16:06:10 they do not support it.
16:06:20 is that the same?
16:06:36 yes, I think so
16:06:52 ok
16:07:25 so we probably want to hold the neutron patch till we figure this out, or propose a revert so we can debug
16:07:43 well the neutron patch is correct
16:07:57 if the neutron fix is needed anyhow then I vote for merging it and troubleshooting on a revert if needed
16:07:59 so probably the latter: propose a revert so we can figure out why it failed
16:08:25 yeah, what gibi said
16:08:54 as we anyhow moved to the gate status let's move there by topic as wel
16:08:55 l
16:08:56 #topic Gate status
16:09:00 Nova gate bugs #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure
16:09:02 a regression of this is effectively a critical nova bug, just an fyi
16:09:53 please tag bugs with gate-failure so that we can follow them there
16:10:15 placement weekly jobs are green #link https://zuul.openstack.org/builds?project=openstack%2Fplacement&pipeline=periodic-weekly
16:10:33 anything else on the gate status?
16:11:27 #topic Release Planning
16:11:31 Milestone 2 is in 3 weeks (15th of July), which is spec freeze
16:11:36 Spec review day is 6th of July #link http://lists.openstack.org/pipermail/openstack-discuss/2021-June/023083.html
16:11:58 and as you saw on the ML the next PTG time is set to October 18-22
16:12:05 and it will be virtual still
16:12:39 any other news on the incoming release?
16:12:58 :( I don't think so
16:13:15 sean-k-mooney: what is the sadface? virtual ptg?
16:13:20 yes
16:13:21 yup
16:13:24 :( :( :( even
16:13:31 heh, okay
16:14:11 we should have a therapeutic session in the PTG
16:14:15 anyhow moving on
16:14:17 #topic Stable Branches
16:14:22 stable gates should be OK (though 'wait_for_volume_resource_status' intermittently fails)
16:14:26 EOM (from elodilles)
16:14:42 any other stable news?
16:15:04 nothing from me for now
16:15:17 same failure on master a bunch too I think
16:15:37 the cinder peeps were working on it a couple weeks ago
16:15:51 yeah I think lyarwood is still trying to see what happens in the guest that prevents detaching a volume
16:16:18 well, the cinder peeps were thinking it was an lvm segv or something
16:16:29 (on the host)
16:16:46 have we checked that the flavors have at least 2 cores? it's really just a workaround, but I think that helped downstream at one point with the guest not responding to the detach
16:17:34 dansmith: could be multiple independent failures; I only hit the detach one last week, but I have not looked at CI results recently
16:17:37 I mean 1 should really be enough, but sometimes if the guest has 2 cores it will still be able to respond if it's hung on other things
16:18:05 gibi: yeah, they already fixed one thing that manifested in the same way I think, which was specifically timeout related IIRC
16:18:16 but the latest was lvm crashing I think
16:18:16 anyway
16:18:27 dansmith: thanks, that is good info
16:18:30 sean-k-mooney: good idea
16:18:36 moving on
16:18:38 #topic Sub/related team Highlights
16:18:43 bauzas: are you still with us?
16:18:55 yup
16:19:07 nothing to report, sir.
16:19:09 then
16:19:10 Libvirt (bauzas)
16:19:12 ack
16:19:13 thanks
16:19:13 this^
16:19:15 :)
16:19:23 moving on
16:19:23 #topic Open discussion
16:19:27 (kashyap) seeking approval for the specless bp https://blueprints.launchpad.net/nova/+spec/virtio-as-default-display-device
16:19:34 gibi: So on that:
16:19:43 I think this was discussed today on the channel
16:19:47 I was reminded that we can't do the switch in the current devel cycle
16:20:03 But we need some preparatory work for the Y release
16:20:22 E.g. recording the video model in system_metadata. Get the tests sorted, and then do the switch.
16:20:42 OK, so then for X you only aim for the recording and testing, then the switch in Y
16:20:58 why do we need to record the video model
16:20:59 ?
16:21:17 to prevent it changing for existing VMs after upgrade and hard reboot
16:21:18 I think it is to avoid changing the guest ABI during hard reboot
16:21:28 dansmith: Upthread, sean-k-mooney was saying it might
16:21:29 yes that ^
16:21:32 But:
16:21:40 is that because we're changing our default?
16:21:44 oh
16:21:45 There won't be any _visible_ breakage here:
16:21:49 dansmith: Yep
16:21:54 I thought the spec was adding it as an option, this is for changing the default, I see
16:21:56 Sorry, I should've given a summary here.
16:22:10 dansmith: The first sentence of the BP says: "Change Nova's default video display from 'cirrus' to 'virtio'." :-)
16:22:11 dansmith: ya we added virtio I think in train
16:22:47 so in this specific case it actually might be safe to just make the change
16:23:00 because of vga fallback mode
16:23:07 Yeah
16:23:08 well, I was going to say, I think we've made such changes in the past after some interval
16:23:09 but in general for device model default changes
16:23:14 we would have to record, then change
16:23:15 dansmith: To summarize the above:
16:23:44 If your guest has the kernel driver, then the "virtio" display device will make use of it; otherwise it'll gracefully fall back to VGA
16:23:49 dansmith: the only one I can think of was enabling the RNG by default
16:23:55 So that's the recommended option from the QEMU graphics maints
16:24:10 sean-k-mooney: Yep
16:24:14 kashyap: fall back to cirrus?
16:24:33 dansmith: No, no; fall back to "VGA compatibility mode", which is still better than "cirrus"
16:24:34 dansmith: no, the virtio-gpu device supports a vga hardware interface
16:24:55 okay
16:24:56 dansmith: you just won't get all the features, but it should function similarly to cirrus in the guest
16:24:58 I.e. standard VGA.
16:25:02 Yes
16:25:10 would operators opt into it?
16:25:14 yeah, but that might freak out a windows machine if your display adapter suddenly changes I guess
16:25:39 or would we need to change the default automatically?
16:25:40 bauzas: No; they can opt out of it here.
16:25:40 dansmith: on reboot it should be ok, but that was the upgrade concern that would prompt "record, then change next cycle"
16:25:58 I understand dansmith's concern about freaking out if done automatically
16:25:59 bauzas: Yes, we should do the right thing here by changing the default.
16:26:06 kashyap: not for existing instances; you need to use hw_video_model in the image
16:26:10 bauzas: dansmith's good point is for Windows
16:26:31 sean-k-mooney: Right; obvious the default implies only for the new ones.
16:26:35 so tl;dr record in X, change in Y?
16:26:37 s/obvious/obviously,/
16:26:53 so the problem is for people who don't have hw_video_model in their image meta right?
16:26:54 sean-k-mooney: But _do_ we need to record at all? As there's no breakage here
16:27:03 dansmith: correct
16:27:08 can we just create all new instances with that set to the default if they don't have it in their image?
16:27:17 then we're good for next time too when we switch to whizbang32 video
16:27:28 can't wait for it
16:27:53 compute assumes cirrus if unset forever, otherwise honors what it's set to, and then we can make the switch now for any new instances
16:27:57 dansmith: not really, but we can store our default in the instance_system_metadata
16:28:16 which is what we are now doing for machine_type, as if it was set in the image
16:28:20 sean-k-mooney: not really? we mirror image meta in sysmeta already right? so we'd just be using that instead of a bespoke key?
16:28:44 dansmith: ya so we can set it in our copy, which is what kashyap was going to do
16:28:50 mirror *some* of image_meta I mean
16:29:09 we just can't set it in glance, unless we document using the glance import plugin to set it on all uploaded images
16:29:13 ack, okay, then we don't need a warning cycle to switch the default if we do it that way
16:29:21 sean-k-mooney: right I'm talking about our local copy (of course)
16:29:41 dansmith: so unless we backport the recording of the current value we would still need one cycle
16:29:53 why?
16:30:05 to populate the instance_metadata_table for existing instances
16:30:08 sean-k-mooney: Yeah, why? I still don't see it.
16:30:17 no, we just assume cirrus forever if unset
16:30:30 oh
16:30:43 that just means don't change the default
16:30:51 past the virtio default, it'll always be set to something, so if set, honor that, else cirrus (but just on the compute).. new instances always get virtio set explicitly by default on create
16:30:54 dansmith: So we can even directly change w/o even recording it in system_metadata, as we did for virtio-rng (I'll get the commit later for you to read)
16:31:31 dansmith: that will complicate the initial spawn logic and possibly hard reboot
16:31:31 kashyap: okay not sure how, but happy to look
16:31:49 it might be doable but we reuse spawn in hard reboot
16:31:54 * bauzas needs to disappear
16:31:58 sean-k-mooney: just spawn, AFAIK, which seems fine as we record other such things IIRC, but whatever
16:32:03 dansmith: https://opendev.org/openstack/nova/commit/de512f2c025
16:32:04 so we will need to tell the difference between first boot and subsequent ones
16:32:08 just trying to avoid needing a cycle to change *and* annotate all existing instances
16:32:14 (It's slow to load)
16:32:41 dansmith: I guess we could try and implement that and see what it looks like
16:32:54 we can talk outside the meeting about it
16:32:57 Yeah
16:33:06 Thanks for the design discussion so far!
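The default-handling approach dansmith describes in the discussion above (stamp an explicit video model on every new instance at create time; on the compute, honor a recorded value and otherwise assume cirrus) can be sketched in Python. This is a minimal illustration under stated assumptions, not Nova's implementation: system metadata is modeled as a plain dict, and the key name `image_hw_video_model` and both helper functions are hypothetical.

```python
# Hypothetical sketch, not Nova code: sysmeta is modeled as a plain dict and
# SYSMETA_KEY is an assumed key name for the mirrored hw_video_model property.
LEGACY_DEFAULT = "cirrus"  # model that pre-existing guests were booted with
NEW_DEFAULT = "virtio"     # proposed default for newly created instances
SYSMETA_KEY = "image_hw_video_model"  # hypothetical sysmeta key


def stamp_video_model_on_create(image_props: dict, sysmeta: dict) -> dict:
    """At create time, always record an explicit video model.

    An hw_video_model set on the image wins; otherwise the current default
    is pinned, so a later default change cannot alter this guest's ABI.
    """
    sysmeta[SYSMETA_KEY] = image_props.get("hw_video_model", NEW_DEFAULT)
    return sysmeta


def resolve_video_model(sysmeta: dict) -> str:
    """On the compute (spawn and hard reboot): honor the recorded value,
    else assume cirrus, since only instances created before the change
    can lack a recorded model."""
    return sysmeta.get(SYSMETA_KEY, LEGACY_DEFAULT)


# A new instance from an image without hw_video_model gets virtio pinned.
print(resolve_video_model(stamp_video_model_on_create({}, {})))  # prints: virtio
# An old instance that was never stamped keeps cirrus across hard reboots.
print(resolve_video_model({}))  # prints: cirrus
```

In this sketch the compute never needs to tell a first boot from a hard reboot, because the value is fixed at create time rather than in spawn; whether that holds in Nova's actual spawn path was left for discussion outside the meeting.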
16:33:17 gibi: Any other topics? We can hash it out outside of the meeting
16:33:17 OK, then I'll hold off on approving the bp until you agree on the way forward
16:33:38 sean-k-mooney has one more heads-up I think
16:33:44 so moving on to that
16:33:45 sean-k-mooney:
16:33:56 yes so ovn migration...
16:34:19 um, tl;dr is architecturally there is always a race when doing live migration with ovn
16:34:44 effectively ovn can only start installing rules when the tap is created on the dest
16:35:00 and at that point we have called libvirt to do the migration and it's in control
16:35:16 so to avoid that and create the port in pre-live migration I'm proposing an os-vif change
16:35:51 basically reintroduce hybrid-plug but with ovs bridges and patch ports instead of linux bridges and veth pairs
16:36:07 that will not have any performance impact on the vm
16:36:19 but will allow ovn to install the rules in pre-live migrate
16:36:35 I was wondering how people felt about that
16:37:13 honestly it is too deep networking for me. I assume the impact is mostly in os-vif. Does nova need to be adapted?
16:37:13 so previously, we had
16:37:22 (ovs bridge) veth | <---> | veth (linux bridge) tap | <---> | VM
16:37:32 and now we'll have
16:37:39 (ovs bridge) patch | <---> | patch (ovs bridge) tap | <---> | VM
16:37:56 so everything stays in OVS but there's an additional (on top of br-int) bridge?
16:38:07 more like (ovs bridge) tap | <---> | VM originally, to (ovs bridge) patch | <---> | patch (ovs bridge) tap | <---> | VM
16:38:18 yes
16:38:28 this is the poc but it has a bug (ovs bridge) patch | <---> | patch (ovs bridge) tap | <---> | VM
16:38:34 https://review.opendev.org/c/openstack/os-vif/+/798055
16:39:00 currently it's configurable and defaulting to true for development
16:39:08 do we need to worry about flows getting added for the patch <-> tap in the second (new) bridge?
16:39:13 or does that happen automatically?
16:39:24 stephenfin: just the normal action
16:39:33 so no rules required
16:39:50 on the neutron side, if we wanted to proceed, there would need to be some qos changes for ovn
16:39:57 so that will be covered by a spec
16:40:24 if we are ok with this on the nova side I would like to track the capability as a bug against os-vif
16:40:48 so we can backport the ability to opt in to this behavior but not use it by default for stable branches
16:40:59 excellent, so we'll pre-populate a flow in the br-int for the new patch port, and then the comms from the other side of the patch port to the VM don't need anything explicit bar the normal action
16:41:07 that wfm, personally
16:41:26 certainly seems better than re-adding hybrid plug with the OVS -> linux bridge -> VM dance
16:41:47 I guess my main question is bug, blueprint, or spec for this
16:43:01 I would like to see some high level docs on this _somewhere_
16:43:02 hm, if this requires a neutron spec, then why do you need to backport the os-vif change to stable?
16:43:12 personally I would prefer to let this bake for a cycle and enable it by default next cycle
16:43:23 gibi: the neutron spec is to fix QOS support
16:43:29 it could be a blueprint but docs in the neutron tree might be better
16:43:31 * gibi is slow
16:43:37 sean-k-mooney: ahh OK I see
16:43:40 it would be useful for those that don't need qos without that
16:43:48 yepp now I got it
16:44:07 this is a bugfix for os-vif to support live migration with OVN
16:44:23 or more precisely, to fix a race in live migration
16:44:28 I can live with this as a bugfix
16:44:37 yes basically
16:45:13 and that's also why we would default this to off initially and then enable it by default in the future
16:45:30 any objection?
16:45:39 operators can opt in early if they want, but it won't change any behavior by default
16:46:16 I'm good. Can't speak for others tho
16:46:27 I don't see any hands raised :)
16:46:33 we can defer if people want to think about it more
16:46:38 I'm still working on the poc
16:47:09 my main concern is m2 and spec freeze
16:47:14 it is accepted as a bug now, here. If somebody later has an objection then we can rediscuss, but until then this is a bug
16:47:34 Is there any other topic for today?
16:48:31 not from me
16:49:00 then let's close this
16:49:04 thanks for joining
16:49:07 #endmeeting
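The per-port bridge layout sean-k-mooney describes for OVN live migration (br-int patched to a dedicated OVS bridge that holds the VM tap) can be sketched as the ovs-vsctl commands a plugger might run during pre-live-migration, so the port exists in br-int before libvirt takes over. This is a hypothetical illustration, not the code in the os-vif review: the bridge and patch-port name prefixes, the 11-character truncation of the port ID, and the helper name are all assumptions. The function only builds argument lists; nothing is executed.

```python
from typing import List


# Hypothetical sketch, not the os-vif implementation: the "pbr"/"pi"/"pb"
# name prefixes, the port-id truncation and the helper name are assumptions.
def plan_patch_port_plug(port_id: str,
                         integration_bridge: str = "br-int") -> List[List[str]]:
    """Return ovs-vsctl argument lists that pre-create the per-port bridge
    and the patch pair, so the port already exists in br-int before
    libvirt creates the tap on the destination host."""
    short = port_id[:11]        # keeps the bridge name within 15 characters
    bridge = "pbr" + short      # per-port OVS bridge that will hold the tap
    int_side = "pi" + short     # patch port on the integration bridge
    br_side = "pb" + short      # patch port on the per-port bridge
    return [
        # No controller is set on this bridge, so it runs in standalone mode
        # and forwards with the NORMAL action; no explicit flows are needed.
        ["ovs-vsctl", "--may-exist", "add-br", bridge],
        ["ovs-vsctl", "--may-exist", "add-port", integration_bridge, int_side,
         "--", "set", "Interface", int_side, "type=patch",
         "options:peer=" + br_side],
        ["ovs-vsctl", "--may-exist", "add-port", bridge, br_side,
         "--", "set", "Interface", br_side, "type=patch",
         "options:peer=" + int_side],
    ]


for cmd in plan_patch_port_plug("3f2a1c9e-8b7d-4e6f-a1b2-c3d4e5f6a7b8"):
    print(" ".join(cmd))
```

The standalone/NORMAL behavior of a controller-less bridge is what makes "just the normal action" sufficient on the new bridge; only br-int, which OVN programs, needs flows for the patch port. On a host running Open vSwitch, each list could be handed to `subprocess.run(cmd, check=True)`.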