16:00:19 #startmeeting nova
16:00:19 Meeting started Tue Jan 17 16:00:19 2023 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:19 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:19 The meeting name has been set to 'nova'
16:00:24 gdi, just in time
16:00:32 #link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting
16:00:36 hi everyone
16:00:40 o/
16:01:11 o/
16:01:59 o/
16:02:00 okay let's start
16:02:06 (I'm a bit distracted)
16:02:15 #topic Bugs (stuck/critical)
16:02:30 #info One critical bug
16:02:38 o/
16:03:08 #link https://bugs.launchpad.net/nova/+bug/2002951
16:03:20 gibi: I marked this one as critical for the sake of the discussion
16:03:22 o/
16:03:29 but we can put it back to High
16:03:49 in general, I tend to triage CI bugs as Critical until we agree they are not holding the gate
16:04:04 do we want to discuss it now or not?
16:04:38 sure
16:04:57 I updated the bug
16:05:01 ok, so, gibi (mostly) and I looked at this one today
16:05:05 I think it is the tempest.api.compute.admin.test_volume.AttachSCSIVolumeTestJSON.test_attach_scsi_disk_with_config_drive test case that triggers the OOM
16:05:11 yeah
16:05:17 and like I said, I tried to find where
16:06:17 but I wasn't able to see
16:06:34 context: https://github.com/openstack/tempest/blob/7c8b49becef78a257e2515970a552c84982f59cd/tempest/api/compute/admin/test_volume.py#L84-L120
16:06:46 we try to create an image
16:06:52 then we create an instance
16:07:06 and then a volume which we attach to the instance
16:07:44 I haven't had time to look into the actual tc yet
16:07:45 o/
16:08:08 also it would be nice to see how the python interpreter's RSS size grows during the test execution
16:08:11 yeah, it surely seems like a benign test case
16:08:38 we unfortunately don't have the memory tracker stuff from devstack
16:08:50 but it would be nice if we could get that, and also dmesg, in the tox-based tests
16:08:52 I tried to grep the test name in n-api
16:09:01 but I wasn't finding it
16:09:07 so, either we no longer use it
16:09:17 or we were not yet calling the nova-api
16:09:34 which means we have the kill before creating the instance
16:09:55 but I could be wrong
16:10:53 anyway, are folks ok if we move the bug report to High?
16:11:32 tomorrow I will continue looking, but we can also tentatively try to disable this single test to see if that removes the OOM problem
16:12:18 bauzas: I'm not against having this as High
16:12:26 ok
16:12:35 then let's look again tomorrow and we'll see what to do
16:13:08 this time I'm just afraid to remove this test because we don't know why we have an OOM kill
16:13:18 this could then move to another test
16:13:40 yep, that would be my goal of disabling it temporarily: to see if the OOM just moves to another test case
16:13:44 and to see which test case
16:13:48 to find a pattern
16:13:48 (I also verified that nothing changed on the tempest side in the last year for this test)
16:14:05 gibi: I think you could also rename it and change the sort ordering
16:14:10 yeah
16:14:11 afaik, we run tests sorted per worker
16:14:27 I was wondering, maybe this was a problem due to another test
16:14:28 dansmith: good idea
16:14:45 dansmith: I think we can ask stestr to modify the sort
16:14:55 oh?
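A minimal sketch of the per-test RSS logging suggested above at 16:08:08, assuming psutil is installed; the class and log file names are illustrative, not anything tempest actually ships:

```python
# Hypothetical helper: record this worker's RSS before and after every
# test, so a sudden jump points at the test (or test sequence) that
# pushes the worker towards the OOM killer.
import os
import unittest

import psutil


class RssLoggingTestCase(unittest.TestCase):
    """Base test case logging per-test RSS deltas of the current worker."""

    def setUp(self):
        super().setUp()
        self._rss_before = psutil.Process(os.getpid()).memory_info().rss

    def tearDown(self):
        rss_after = psutil.Process(os.getpid()).memory_info().rss
        with open('rss-%d.log' % os.getpid(), 'a') as f:
            f.write('%s: %d -> %d (+%d bytes)\n'
                    % (self.id(), self._rss_before, rss_after,
                       rss_after - self._rss_before))
        super().tearDown()
```

Mixing such a class into the suspect tests would show whether the RSS climbs steadily across the run or spikes inside test_attach_scsi_disk_with_config_drive itself.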
16:14:59 but I need to remember how to do it
16:15:05 bauzas: on that: I extracted all the test cases from the killed worker from multiple runs, and the only test case overlap was this tc
16:15:35 so if another test is causing the issue, then it is not a single test but a set of tests
16:15:49 otherwise I would see an overlap
16:15:51 gibi: well, yeah, but maybe that means that the previous tests were adding more memory before, so it's only at this test that the OOM killer wants to kill
16:16:14 as you see, this is a very simple test
16:16:22 that is my point above, if a single test adds the extra memory usage then that would show up as an overlap between runs
16:16:30 but it doesn't
16:16:45 gibi: that's why I'll try to see how to ask stestr to modify the sort
16:17:12 yeah, moving this tc to the end can help to see if there is a set of tests that triggers this behavior
16:18:07 anyhow, I think we can move on
16:19:03 cool
16:19:19 #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 27 new untriaged bugs (+0 since the last meeting)
16:19:45 I triaged a few bugs today
16:19:57 #link https://etherpad.opendev.org/p/nova-bug-triage-20230110
16:20:16 nothing to report here for now
16:20:21 #info Add yourself to the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster
16:20:28 gibi: want to take the bug baton this week?
16:21:23 bauzas: sure, I can
16:21:29 thanks a lot
16:21:47 #info bug baton is being passed to gibi
16:21:53 #topic Gate status
16:21:58 #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs
16:22:14 we already discussed the main one, does anyone want to discuss other CI bugs?
16:22:38 just a short summary
16:22:38 looks like not
16:22:41 ah
16:22:50 we're listening to you
16:22:53 I see failures in our functional tests
16:23:13 one is about missing db tables, so it is probably interference between test cases
16:23:21 we saw that before
16:23:30 we fixed it, but it turned out not to be a 100% fix
16:23:39 :/
16:24:08 and there is a failure with a db cursor needing a reset
16:24:14 it might be related to the above
16:24:17 not sure yet
16:24:47 lovely
16:24:55 these two I wanted to mention
16:25:08 but there are other open bugs that appear in the gate from time to time
16:25:09 flipping stestr worker runs would help to trigger the races
16:25:20 so it is fairly hard to land things overall
16:25:29 I could try to reproduce those functests locally
16:25:45 this would exhaust my laptop, but it's worth trying
16:26:18 gibi: let's then discuss this tomorrow as well
16:26:23 sure
16:26:48 I mean, I have my power mgmt series to work on, but if we can't land things, nothing will merge either way.
16:27:03 the gate is not totally blocked
16:27:12 but it's flaky enough that it's hard
16:27:15 yeah, but rechecking is not a great option
16:27:16 yepp
16:27:22 ya, it's not
16:27:29 agreed, I'm not sending the signal that our gate is busted
16:27:35 but we know this is hard
16:27:43 one thing I have noticed is the py3.10 functional job seems more stable than py38
16:27:47 and let me go to the next topic and you'll understand why
16:28:01 for the db issues
16:28:09 but that could be just the ones I happened to look at
16:28:14 ok
16:28:22 sean-k-mooney: 3.10 introduced a much more deterministic thread scheduler. Also, it's quite a bit quicker in some projects, which helps generally
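The overlap analysis gibi describes at 16:15:05 boils down to a set intersection; a small sketch with illustrative file names, assuming one file per CI run containing the test ids executed by the OOM-killed stestr worker:

```python
# Hypothetical script: print the test ids common to every killed-worker
# test list. A single culprit test shows up in the output; an empty
# result suggests a set of tests acting together, as discussed above.
import sys


def killed_worker_overlap(paths):
    test_sets = []
    for path in paths:
        with open(path) as f:
            # one test id per line
            test_sets.append({line.strip() for line in f if line.strip()})
    return sorted(set.intersection(*test_sets))


if __name__ == '__main__':
    # e.g. python overlap.py run1-worker.txt run2-worker.txt run3-worker.txt
    for test_id in killed_worker_overlap(sys.argv[1:]):
        print(test_id)
```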
16:28:32 ah, gdk
16:28:54 we probably have tests not correctly cleaning up data
16:28:58 so we need to bisect them
16:29:01 ya, so I'm wondering, if we are blocked, whether we might want to make the 3.8 one non-voting while we try to fix this
16:29:14 but there are other issues, so I don't think that will help much
16:29:19 just something to keep in mind
16:29:20 sean-k-mooney: before going down that road, lemme try to bisect the faulty tests
16:29:25 I've seen it both ways
16:29:26 yep
16:29:32 3.10 passing with 3.8 failing, and the other way around
16:29:39 so I don't think disabling one gets us much
16:29:41 ok, then it's just flaky
16:29:50 lovely
16:29:53 moving on
16:29:59 we have some agenda today
16:30:09 #link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&pipeline=periodic-weekly Nova&Placement periodic jobs status
16:30:14 that's fun
16:30:34 although https://review.opendev.org/c/openstack/tempest/+/866049 was merged, we still have the centos9-fips job timing out
16:30:40 so I looked at the job def
16:30:57 and it looks to me like it no longer depends on the job I added the extra timeout to :)
16:31:22 so the patch that took 2 months to land is basically useless for our pipeline
16:31:26 funny, as I said
16:31:41 I think we made progress on running fips testing on ubuntu, but I need to check if we have the job ready. that can replace the c9-fips jobs
16:31:48 so I'll just add the extra timeout to our local job definition
16:32:00 Dan Smith proposed openstack/nova master: WIP: Detect host renames and abort startup https://review.opendev.org/c/openstack/nova/+/863920
16:32:10 gmann: that's good to hear
16:32:29 not merged yet #link https://review.opendev.org/c/openstack/project-config/+/867112
16:32:33 gmann: we could put fips in the check pipeline then
16:32:47 yeah, that is the plan once we have the ubuntu-based job
16:32:55 gmann: as a reminder, given c9s, fips is on the periodic pipeline
16:33:04 yeah
16:33:08 anyway, this time it should be quicker
16:33:17 I'll just update our .zuul.yaml
16:33:36 oh wait
16:34:10 https://zuul.openstack.org/job/tempest-integrated-compute-centos-9-stream is actually defined in tempest
16:34:30 so I don't get why we don't benefit from the extra timeout
16:34:39 yeah, we will prepare the tempest job and then add it in the project-side gate
16:34:44 the job definition is, yes
16:35:14 anyway, I don't want us to spend too much time on it
16:35:18 not sure about the timeout. c9s has always been flaky for fips
16:35:20 let's move on
16:35:26 yes
16:35:30 #info Please look at the gate failures and file a bug report with the gate-failure tag.
16:35:35 #info STOP DOING BLIND RECHECKS aka. 'recheck' https://docs.openstack.org/project-team-guide/testing.html#how-to-handle-test-failures
16:35:39 #topic Release Planning
16:35:44 #link https://releases.openstack.org/antelope/schedule.html
16:35:49 #info Antelope-3 is in 4 weeks
16:35:51 tick tock
16:35:57 #info 17 Accepted blueprints for 2023.1 Antelope
16:36:04 which is the same number as Yoga
16:36:16 this is a large number given our team
16:36:42 given this, I'll create an etherpad for tracking each of them
16:36:44 there are 3 I expect to complete this week, possibly more
16:36:51 depending on review bandwidth
16:36:56 sean-k-mooney: me too, but that still requires some effort from us
16:37:24 I am a little worried about some of them, but hopeful we will land the majority of them
16:37:32 I mean, I know myself, I'll need to direct my review energy the right way, and an etherpad will help me do that productively
16:37:45 I doubt it will be much over half
16:37:48 #link https://blueprints.launchpad.net/nova/antelope
16:38:02 you can find the list of those blueprints there ^
16:38:28 #topic Review priorities
16:38:34 #link https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement+OR+project:openstack/os-traits+OR+project:openstack/os-resource-classes+OR+project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/osc-placement)+(label:Review-Priority%252B1+OR+label:Review-Priority%252B2)
16:38:40 #info As a reminder, cores eager to review changes can +1 to indicate their interest, +2 for committing to the review
16:39:18 nothing to mention here
16:39:23 #topic Stable Branches
16:39:27 elodilles: the floor is yours
16:39:38 #info stable branches don't seem to be blocked, but patches mostly need rechecks
16:39:47 #info stable branch status / gate failures tracking etherpad: https://etherpad.opendev.org/p/nova-stable-branch-ci
16:39:54 and last but not least: Xena will transition to Extended Maintenance after the release of 2023.1 Antelope
16:40:00 so to prepare for that:
16:40:08 #info release patches were generated for *stable/xena*: https://review.opendev.org/q/topic:xena-stable+reviewer:sbauza%2540redhat.com
16:40:13 the release team proposed doing a release of several repos for xena. do we want to wait for the tox pin to be merged?
16:40:34 sean-k-mooney: which one do you mean?
16:40:45 the ones you were linking
16:40:46 (and that was all from me about stable branches)
16:40:59 the tox pin is merged for stable branches. it is done in a central place, in the openstack-zuul-jobs repo
16:41:02 that's fun, stable branches are more stable than master :)
16:41:03 so we don't have the pin to tox<4 on xena yet
16:41:13 gmann: oh ok
16:41:22 I thought we needed to do it in the tox.ini too
16:41:28 so that it works if you run tox locally
16:41:31 let me check if osc-placement and the python client are merged or not
16:41:40 no, the workaround was merged last week, as gmann says
16:41:51 will that work outside CI?
16:42:17 I'm not sure how you can fix it centrally unless we did it in upper-constraints?
16:42:21 yeah, the tox one is merged, but this placement functional test fix is not yet #link https://review.opendev.org/q/I4e3e5732411639054baaa9211a29e2e2c8210ac0
16:42:32 bauzas: sean-k-mooney elodilles ^^
16:42:41 oh, I missed that somehow
16:42:45 will review ASAP
16:42:49 ack
16:42:51 sorry for that
16:42:52 tab open
16:43:07 I'll do my homework after the meeting
16:43:15 so my question is still not really answered
16:43:17 (the stable ones o:))
16:43:19 thanks
16:43:42 where is tox pinned in https://github.com/openstack/nova/blob/stable/xena/tox.ini
16:43:44 sean-k-mooney: in that case we can wait until the xena one merges :)
16:43:47 sean-k-mooney: how to cap tox under 3 ?
16:43:58 sean-k-mooney: only for CI. you mean to pin it in tox.ini itself?
16:44:01 under 4, I mean
16:44:24 yes, so that developers can also run tox locally to test backports
16:44:45 sean-k-mooney: if we want to fix it for local runs, to make sure we do not run it with tox4, then yes, we need to pin it in tox.ini also, but that can be done if we really need it
16:44:47 I was asking whether we should do that before doing the final release for extended maintenance
16:45:33 hmmm. good question.
16:45:44 that sounds doable to me
16:45:45 for local runs I think both ways are ok: either make sure we have tox<4 in our env, or pin it in tox.ini
16:46:04 I replicated the pin in CI downstream
16:46:32 we did it for python-novaclient https://review.opendev.org/c/openstack/python-novaclient/+/869598/2/tox.ini#4
16:46:35 #link https://review.opendev.org/c/openstack/python-novaclient/+/869598/2/tox.ini#4
16:47:27 yes
16:47:38 so do we want to do it for all the other nova deliverables?
16:47:46 then I'm OK to do the same and release after that merges
16:47:50 if so, we should do it before the EM transition
16:48:35 yes, I'm OK with that, I don't see any reason now not to do it before the transition
16:49:56 (the generated xena release patches don't have deadlines, but best not to postpone them for weeks)
16:50:28 ok, sounds like an agreement, we just need an owner
16:50:53 I can do it for os-vif and maybe some of the others
16:51:00 ack
16:51:02 it's really just one line, plus ensuring it works locally
16:51:10 I know
16:51:12 sean-k-mooney: ping me if I forget the reviews o:)
16:51:52 anyway I guess we're done with this topic, and we have a specless blueprint ask in a sec
16:51:59 so, moving on
16:52:17 #topic Open discussion
16:52:30 (sean-k-mooney) https://blueprints.launchpad.net/nova/+spec/default-ephemeral-format-unformated
16:53:03 ya, so the tl;dr is that we currently use libguestfs in two places in nova
16:53:12 file injection, which has been deprecated for a long time
16:53:25 and formatting the filesystem of the additional ephemeral disks
16:53:41 true
16:53:45 I would like to have a way to allow the ephemeral disk to be unformatted
16:53:53 making libguestfs optional
16:54:14 so the proposal is to either add 'unformatted' to https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.default_ephemeral_format
16:54:28 or, slightly cleaner, add a bool opt to turn off the formatting
16:54:35 what does the default value, which is None, do?
16:54:55 and I want to know if there is a preference, and if we think this needs a spec or can be specless
16:55:02 either is okay with me, I guess format=unformatted seems better to me because it's just another option for an existing knob
16:55:31 I need to check the default of None, but I believe it makes it OS dependent
16:55:36 sean-k-mooney: I see None as the default value, what's the behaviour then?
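For clarity, the semantics being proposed amount to a short-circuit around the mkfs step; a hypothetical sketch (not nova's actual code), with the 'ext4' fallback purely illustrative since the real behaviour of None is OS dependent and, per the discussion, still to be confirmed and documented:

```python
# Hypothetical sketch of the proposed 'unformatted' value for
# [DEFAULT]default_ephemeral_format.
def prepare_ephemeral_disk(disk_path, ephemeral_format, mkfs):
    if ephemeral_format == 'unformatted':
        # Proposed opt-in behaviour: hand the raw block device to the
        # guest without creating a filesystem on it, so libguestfs is
        # not needed at all for extra ephemeral disks.
        return
    # Existing behaviour: format with the configured filesystem, or an
    # OS-dependent default when the option is left at None ('ext4' here
    # is only an illustrative placeholder for that default).
    mkfs(ephemeral_format or 'ext4', disk_path)
```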
16:55:38 ok
16:55:43 I need to dig into this a little more
16:56:04 but basically I wanted to know if people think this is ok to do this cycle
16:56:13 or should we discuss it at the PTG and do it next cycle
16:56:14 I think this is a very small feature
16:56:19 self-contained
16:56:26 yeah, no need for lots of discussion, IMHO
16:56:30 particularly if we go with adding a new value
16:56:56 ok, so: 1) I need to document what None does; 2) determine if it can disable the formatting today already
16:57:02 true
16:57:10 and 3) if not, add 'unformatted' as an option to explicitly do that
16:57:20 sounds like a simple plan to me
16:57:34 so at a minimum I'll add a docs change to say what None does
16:57:45 and we can then evaluate in the gerrit review if we need 'unformatted'
16:58:02 sounds good to me
16:58:06 anyone objecting to this smallish effort for this cycle?
16:58:29 if this ends up not being small I will punt it to next cycle
16:58:45 I don't expect any behavioural change
16:59:03 so I'm fine with approving it as a specless blueprint based on that assumption
16:59:35 and you're free to close this one as deferred if we consider this is only a doc patch
16:59:44 correct, the default would be what we have today and the unformatted behavior would be opt-in
16:59:45 any objections?
16:59:51 no objection from me
16:59:55 cool
17:00:11 #agreed https://blueprints.launchpad.net/nova/+spec/default-ephemeral-format-unformated accepted as specless blueprint for the 2023.1 cycle
17:00:17 that's it for me
17:00:21 nothing else on the agenda
17:00:23 thanks all
17:00:26 #endmeeting