16:00:19 <bauzas> #startmeeting nova
16:00:19 <opendevmeet> Meeting started Tue Jan 17 16:00:19 2023 UTC and is due to finish in 60 minutes.  The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:19 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:19 <opendevmeet> The meeting name has been set to 'nova'
16:00:24 <bauzas> gdi, just in time
16:00:32 <bauzas> #link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting
16:00:36 <bauzas> hi everyone
16:00:40 <dansmith> o/
16:01:11 <elodilles> o/
16:01:59 <gibi> o/
16:02:00 <bauzas> okay let's start
16:02:06 <gibi> (I'm a bit distracted)
16:02:15 <bauzas> #topic Bugs (stuck/critical)
16:02:30 <bauzas> #info  One critical bug
16:02:38 <Uggla> o/
16:03:08 <bauzas> #link https://bugs.launchpad.net/nova/+bug/2002951
16:03:20 <bauzas> gibi: I marked this one as critical for the sake of the discussion
16:03:22 <gmann> o/
16:03:29 <bauzas> but we can put it back to High
16:03:49 <bauzas> in general, I tend to triage CI bugs as Critical until we agree they're not holding the gate
16:04:04 <bauzas> do we want to discuss it now or not?
16:04:38 <gibi> sure
16:04:57 <gibi> I updated the bug
16:05:01 <bauzas> ok, so, gibi (mostly) and I looked at this one today
16:05:05 <gibi> I think it is tempest.api.compute.admin.test_volume.AttachSCSIVolumeTestJSON.test_attach_scsi_disk_with_config_drive test case that triggers the OOM
16:05:11 <bauzas> yeah
16:05:17 <bauzas> and like I said, I tried to find where
16:06:17 <bauzas> but I wasn't able to see it
16:06:34 <bauzas> context : https://github.com/openstack/tempest/blob/7c8b49becef78a257e2515970a552c84982f59cd/tempest/api/compute/admin/test_volume.py#L84-L120
16:06:46 <bauzas> we try to create an image
16:06:52 <bauzas> then we create an instance
16:07:06 <bauzas> and then a volume which we attach to the instance
16:07:44 <gibi> I haven't had time to look into the actual tc yet
16:07:45 <sean-k-mooney> o/
16:08:08 <gibi> also it would be nice to see how the python interpreter rss size grows during the test execution
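    For context, a rough way to watch the interpreter's RSS during a run would be a tiny helper like the sketch below (an assumption-laden illustration only; it assumes psutil is available in the test venv and that there is a convenient hook or fixture to call it from):

        # sketch: log the current process RSS at interesting points in a run
        import os
        import psutil

        def log_rss(tag):
            rss_mib = psutil.Process(os.getpid()).memory_info().rss / (1024 * 1024)
            print(f"{tag}: rss={rss_mib:.1f} MiB")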
16:08:11 <dansmith> yeah surely seems like a benign test case
16:08:38 <sean-k-mooney> we unfortunately don't have the memory_tracker stuff from devstack
16:08:50 <sean-k-mooney> but it would be nice if we could get that and also dmesg in the tox-based tests
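    For reference, the OOM kill itself should be visible in the node's kernel log; commands along these lines (assumed, not taken from the job logs) would surface it:

        # look for the OOM killer in the kernel ring buffer or the journal
        sudo dmesg -T | grep -iE 'out of memory|oom-kill|killed process'
        sudo journalctl -k | grep -i oom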
16:08:52 <bauzas> I tried to grep for the test name in the n-api logs
16:09:01 <bauzas> but I couldn't find it
16:09:07 <bauzas> so, either we no longer use it
16:09:17 <bauzas> or we were not yet calling the nova-api
16:09:34 <bauzas> which would mean the kill happens before we even create the instance
16:09:55 <bauzas> but I could be wrong
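    As an aside, tempest normally embeds the test class name in the names of the resources it creates, so a grep along these lines (log path assumed from a typical devstack job layout) shows whether the requests ever reached the API:

        grep -i 'AttachSCSIVolumeTestJSON' controller/logs/screen-n-api.txt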
16:10:53 <bauzas> anyway, are folks OK if we move the bug back to High?
16:10:59 <bauzas> bug report*
16:11:32 <gibi> tomorrow I will continue looking but we can also tentatively try to disable this single test to see if that removes the OOM problem
16:12:18 <gibi> bauzas: I'm not against having this as High
16:12:26 <bauzas> ok
16:12:35 <bauzas> then let's look again tomorrow and we'll see what to do
16:13:08 <bauzas> at this point I'm just reluctant to remove this test because we don't know why we get an OOM kill
16:13:18 <bauzas> the OOM could just move to another test then
16:13:40 <gibi> yep, that would be my goal in disabling it temporarily: to see if the OOM just moves to another test case
16:13:44 <gibi> and to see which test case
16:13:48 <gibi> to find a pattern
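    Temporarily excluding the single test in a tempest-based job would look roughly like this (the job name is a placeholder; tempest_exclude_regex is the variable exposed by the devstack-tempest job family):

        - job:
            name: nova-next          # placeholder: whichever job is hitting the OOM
            vars:
              tempest_exclude_regex: 'test_attach_scsi_disk_with_config_drive'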
16:13:48 <bauzas> (I also verified that nothing has changed on the tempest side for this test in the last year)
16:14:05 <dansmith> gibi: you could also rename it I think and change the sort ordering
16:14:10 <bauzas> yeah
16:14:11 <dansmith> afaik, we run tests sorted per worker
16:14:27 <bauzas> I was wondering, maybe this was a problem due to another test
16:14:28 <gibi> dansmith: good idea
16:14:45 <bauzas> dansmith: I think we can ask stestr to modify the sort
16:14:55 <dansmith> oh?
16:14:59 <bauzas> but I need to remember how to do it
16:15:05 <gibi> bauzas: on that, I extracted all the test cases from the killed worker from multiple runs, and the only test case overlap was this tc
16:15:35 <gibi> so if another test is causing the issue then it is not a single test but a set of tests
16:15:49 <gibi> otherwise I would see an overlap
16:15:51 <bauzas> gibi: well, yeah, but that may just mean the previous tests were adding memory before, so it's only at this test that the OOM killer steps in
16:16:14 <bauzas> as you see, this is a very simple test
16:16:22 <gibi> that is my point above: if a single test adds the extra memory usage then that would show up as an overlap between runs
16:16:30 <gibi> but it doesn't
16:16:45 <bauzas> gibi: that's why I'll try to see how to ask stestr to modify the sort
16:17:12 <gibi> yeah, moving this tc to the end can help to see if there is a set of tests that trigger this behavior
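    If the aim is just to perturb the ordering rather than rename the test, stestr can shuffle the per-worker order itself; a minimal sketch, assuming the job can pass the flag through to stestr:

        # randomize the test order within each worker after partitioning
        stestr run --random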
16:18:07 <gibi> anyhow I think we can move on
16:19:03 <bauzas> cool
16:19:19 <bauzas> #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 27 new untriaged bugs (+0 since the last meeting)
16:19:45 <bauzas> I triaged a few bugs today
16:19:57 <bauzas> #link https://etherpad.opendev.org/p/nova-bug-triage-20230110
16:20:16 <bauzas> nothing to report here by now
16:20:21 <bauzas> #info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster
16:20:28 <bauzas> gibi: want to take the bug baton this week?
16:21:23 <gibi> bauzas: sure I can
16:21:29 <bauzas> thanks a lot
16:21:47 <bauzas> #info bug baton is being passed to gibi
16:21:53 <bauzas> #topic Gate status
16:21:58 <bauzas> #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs
16:22:14 <bauzas> we already discussed the main one; does anyone want to discuss other CI bugs?
16:22:38 <gibi> just a short summary
16:22:38 <bauzas> looks like not
16:22:41 <bauzas> ah
16:22:50 <bauzas> we're listening to you
16:22:53 <gibi> I see failures in our functional tests
16:23:13 <gibi> one is about missing db tables so it is probably interference between test cases
16:23:21 <gibi> we saw that before
16:23:30 <gibi> we fixed it, but it seems the fix wasn't 100%
16:23:39 <bauzas> :/
16:24:08 <gibi> and there is a failure where the db cursor needs a reset
16:24:14 <gibi> it might be related to the above
16:24:17 <gibi> not sure yet
16:24:47 <bauzas> lovely
16:24:55 <gibi> these two I wanted to mention
16:25:08 <gibi> but there are other open bugs that appear in the gate time to time
16:25:09 <bauzas> flipping the stestr worker ordering would help to trigger the races
16:25:20 <gibi> so it is fairly hard to land things overall
16:25:29 <bauzas> I could try to reproduce those functests locally
16:25:45 <bauzas> this would exhaust my laptop, but worth trying
16:26:18 <bauzas> gibi: let's then discuss this tomorrow as well
16:26:23 <gibi> sure
16:26:48 <bauzas> I mean, I have my power mgmt series to work on, but if we can't land things, nothing will merge either way.
16:27:03 <sean-k-mooney> the gate is not totally blocked
16:27:12 <sean-k-mooney> but it's flaky enough that it's hard
16:27:15 <bauzas> yeah, but rechecking is not a great option
16:27:16 <gibi> yepp
16:27:22 <sean-k-mooney> ya its not
16:27:29 <bauzas> agreed, I'm not trying to send the signal that our gate is busted
16:27:35 <bauzas> but we know this is hard
16:27:43 <sean-k-mooney> one thing I have noticed is the py3.10 functional job seems more stable than py38
16:27:47 <bauzas> and let me go to the next topic and you'll understand why
16:28:01 <sean-k-mooney> for the db issues
16:28:09 <sean-k-mooney> but that could just be the ones I happened to look at
16:28:14 <bauzas> ok
16:28:22 <clarkb> sean-k-mooney: 3.10 introduced a much more deterministic thread scheduler. Also it's quite a bit quicker in some projects which helps generally
16:28:32 <bauzas> ah, gdk
16:28:54 <bauzas> we probably have tests not correctly cleaning up data
16:28:58 <bauzas> so we need to bisect them
16:29:01 <sean-k-mooney> ya so I'm wondering, if we are blocked, whether we want to make the 3.8 one non-voting while we try to fix this
16:29:14 <sean-k-mooney> but there are other issues so I don't think that will help much
16:29:19 <sean-k-mooney> just something to keep in mind
16:29:20 <bauzas> sean-k-mooney: before going that road, lemme try to bisect the faulty tests
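    For the functional interference, stestr's isolation analysis can do the bisecting automatically; roughly, assuming the functional tox env passes posargs straight through to stestr:

        # re-run the failing tests against subsets of the run to find the interfering test
        tox -e functional-py310 -- --analyze-isolation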
16:29:25 <dansmith> I've seen it both ways
16:29:26 <sean-k-mooney> yep
16:29:32 <dansmith> 3.10 passing with 3.8 failing and the other way
16:29:39 <dansmith> so I don't think disabling one gets us much
16:29:41 <sean-k-mooney> ok then it's just flaky
16:29:50 <bauzas> lovely
16:29:53 <bauzas> moving on
16:29:59 <bauzas> we have some agenda today
16:30:09 <bauzas> #link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&pipeline=periodic-weekly Nova&Placement periodic jobs status
16:30:14 <bauzas> that's fun
16:30:34 <bauzas> even though https://review.opendev.org/c/openstack/tempest/+/866049 merged, we still have the centos9-fips job timing out
16:30:40 <bauzas> so I looked at the job def
16:30:57 <bauzas> and it looks to me like it no longer depends on the job I added the extra timeout to :)
16:31:22 <bauzas> so the patch that took 2 months to land is basically useless for our pipeline
16:31:26 <bauzas> funny, as I said
16:31:41 <gmann> I think we had progress on running fips testing on ubuntu but I need to check if we have the job ready; that can replace the c9-fips jobs
16:31:48 <bauzas> so I'll just add the extra timeout on our local job definition
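    Bumping the timeout from nova's own .zuul.yaml would look roughly like this (job name and timeout value are placeholders for illustration):

        - project:
            periodic-weekly:
              jobs:
                - tempest-centos9-stream-fips:   # placeholder name for the fips job
                    timeout: 10800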
16:32:00 <opendevreview> Dan Smith proposed openstack/nova master: WIP: Detect host renames and abort startup  https://review.opendev.org/c/openstack/nova/+/863920
16:32:10 <bauzas> gmann: that's good to hear
16:32:29 <gmann> not merged yet #link https://review.opendev.org/c/openstack/project-config/+/867112
16:32:33 <bauzas> gmann: we could put fips in the check pipeline then
16:32:47 <gmann> yeah that is plan once we have ubuntu based job
16:32:55 <bauzas> gmann: as a reminder, given the c9s flakiness, fips is in the periodic pipeline
16:33:04 <gmann> yeah
16:33:08 <bauzas> anyway, this time it should be quicker
16:33:17 <bauzas> I'll just update our .zuul.yaml
16:33:36 <bauzas> oh wait
16:34:10 <bauzas> https://zuul.openstack.org/job/tempest-integrated-compute-centos-9-stream is actually defined in tempest
16:34:30 <bauzas> so I don't get why we don't benefit from the extra timeout
16:34:39 <gmann> yeah, we will prepare the tempest job and then add it in the project-side gate
16:34:44 <sean-k-mooney> the job definition is, yes
16:35:14 <bauzas> anyway, I don't want us to spend too much time on it
16:35:18 <gmann> not sure on the timeout; c9s has always been flaky for fips
16:35:20 <bauzas> let's move on
16:35:26 <gmann> yes
16:35:30 <bauzas> #info Please look at the gate failures and file a bug report with the gate-failure tag.
16:35:35 <bauzas> #info STOP DOING BLIND RECHECKS aka. 'recheck' https://docs.openstack.org/project-team-guide/testing.html#how-to-handle-test-failures
16:35:39 <bauzas> #topic Release Planning
16:35:44 <bauzas> #link https://releases.openstack.org/antelope/schedule.html
16:35:49 <bauzas> #info Antelope-3 is in 4 weeks
16:35:51 <bauzas> tick tack
16:35:57 <bauzas> #info 17 Accepted blueprints for 2023.1 Antelope
16:36:04 <bauzas> which is the same amount as Yoga
16:36:16 <bauzas> this is a large number given our team
16:36:42 <bauzas> given this, I'll create an etherpad for tracking each of them
16:36:44 <sean-k-mooney> there are 3 I expect to complete this week, possibly more
16:36:51 <sean-k-mooney> depending on review bandwidth
16:36:56 <bauzas> sean-k-mooney: me too, but that will still require some effort from us
16:37:24 <sean-k-mooney> I am a little worried about some of them but hopeful we will land the majority of them
16:37:32 <bauzas> I mean, I know myself: I'll need to direct my review energy the right way, and an etherpad will help me direct it productively
16:37:45 <sean-k-mooney> I doubt it will be much over half
16:37:48 <bauzas> #link https://blueprints.launchpad.net/nova/antelope
16:38:02 <bauzas> you can find the list of those blueprints there ^
16:38:28 <bauzas> #topic Review priorities
16:38:34 <bauzas> #link https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement+OR+project:openstack/os-traits+OR+project:openstack/os-resource-classes+OR+project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/osc-placement)+(label:Review-Priority%252B1+OR+label:Review-Priority%252B2)
16:38:40 <bauzas> #info As a reminder, cores eager to review changes can +1 to indicate their interest, +2 for committing to the review
16:39:18 <bauzas> nothing to mention here
16:39:23 <bauzas> #topic Stable Branches
16:39:27 <bauzas> elodilles: floor is yours
16:39:38 <elodilles> #info stable branches don't seem to be blocked, but patches mostly need rechecks
16:39:47 <elodilles> #info stable branch status / gate failures tracking etherpad: https://etherpad.opendev.org/p/nova-stable-branch-ci
16:39:54 <elodilles> and last but not least: Xena will transition to Extended Maintenance after the release of 2023.1 Antelope
16:40:00 <elodilles> so to prepare for that:
16:40:08 <elodilles> #info release patches were generated for *stable/xena* : https://review.opendev.org/q/topic:xena-stable+reviewer:sbauza%2540redhat.com
16:40:13 <sean-k-mooney> the release team proposed doing a release of several repos for xena. do we want to wait for the tox pin to be merged?
16:40:34 <elodilles> sean-k-mooney: which one do you mean?
16:40:45 <sean-k-mooney> the ones you were linking
16:40:46 <elodilles> (and that was all from me about stable branches)
16:40:59 <gmann> the tox pin is merged for stable branches. it is done in a central place in the openstack-zuul-jobs repo
16:41:02 <bauzas> that's fun, stable branches are more stable than master :)
16:41:03 <sean-k-mooney> so we don't have the pin to tox<4 on xena yet
16:41:13 <sean-k-mooney> gmann: oh ok
16:41:22 <sean-k-mooney> I thought we needed to do it in the tox.ini too
16:41:28 <sean-k-mooney> so that it works if you run tox locally
16:41:31 <gmann> let me check if osc-placement and python client is merged or not
16:41:40 <elodilles> no, the workaround was merged last week, as gmann says
16:41:51 <sean-k-mooney> will that work outside ci
16:42:17 <sean-k-mooney> I'm not sure how you can fix it centrally unless we did it in upper-constraints?
16:42:21 <gmann> yeah the tox one is merged but this placement functional test one is not yet #link https://review.opendev.org/q/I4e3e5732411639054baaa9211a29e2e2c8210ac0
16:42:32 <gmann> bauzas: sean-k-mooney elodilles ^^
16:42:41 <elodilles> oh, i missed that somehow
16:42:45 <elodilles> will review ASAP
16:42:49 <bauzas> ack
16:42:51 <elodilles> sorry for that
16:42:52 <bauzas> tab open
16:43:07 <bauzas> I'll do my homework after the meeting
16:43:15 <sean-k-mooney> so my question is still not really answered
16:43:17 <elodilles> (the stable ones o:))
16:43:19 <gmann> thanks
16:43:42 <sean-k-mooney> where is tox pinned in https://github.com/openstack/nova/blob/stable/xena/tox.ini
16:43:44 <elodilles> sean-k-mooney: in that case we can wait until the xena one merges :)
16:43:47 <bauzas> sean-k-mooney: how to cap tox under 3 ?
16:43:58 <gmann> sean-k-mooney: only for CI. you mean to pin it in tox.ini itself ?
16:44:01 <bauzas> under 4, I mean
16:44:24 <sean-k-mooney> yes so that developers can also run tox locally to test backports
16:44:45 <gmann> sean-k-mooney: if we want to fix it for local runs to make sure we do not run it with tox4 then yes we need to pin in tox.ini also, but that can be done if we really need it
16:44:47 <sean-k-mooney> I was asking whether we should do that before doing the final release for extended maintenance
16:45:33 <elodilles> hmmm. good question.
16:45:44 <bauzas> that sounds doable to me
16:45:45 <gmann> for local runs I think either way is OK: either make sure we have tox<4 in our env or pin it in tox.ini
16:46:04 <sean-k-mooney> I replicated the pin in CI downstream
16:46:32 <gmann> we did for python-novaclient https://review.opendev.org/c/openstack/python-novaclient/+/869598/2/tox.ini#4
16:46:35 <gmann> #link https://review.opendev.org/c/openstack/python-novaclient/+/869598/2/tox.ini#4
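    Replicating that kind of pin in the other deliverables' tox.ini would be along these lines (whether the requires mechanism matches the exact approach in the linked novaclient change is an assumption; locally, pip install 'tox<4' achieves the same effect):

        [tox]
        # keep local runs on tox 3 until the envs are made tox4-compatible
        requires =
          tox<4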
16:47:27 <sean-k-mooney> yes
16:47:38 <sean-k-mooney> so do we want to do it for all the other nova deliverables?
16:47:46 <elodilles> then i'm OK to do the same and release after that merged
16:47:50 <sean-k-mooney> if so we should do it before the EM transition
16:48:35 <elodilles> yes, I'm OK with that, I don't see now any reason not to do it before the transition
16:49:56 <elodilles> (the generated xena release patches don't have deadlines, but best not to postpone them for weeks)
16:50:28 <bauzas> ok, sounds an agreement, we just need an owner
16:50:53 <sean-k-mooney> i can do it for os-vif maybe some of the others
16:51:00 <bauzas> ack
16:51:02 <sean-k-mooney> it's really just one line and ensuring it works locally
16:51:10 <bauzas> I know
16:51:12 <elodilles> sean-k-mooney: ping me if I forget the reviews o:)
16:51:52 <bauzas> anyway I guess we're done with this topic and we have a specless blueprint ask in a sec
16:51:59 <bauzas> so, moving on
16:52:17 <bauzas> #topic Open discussion
16:52:30 <bauzas> (sean-k-mooney) https://blueprints.launchpad.net/nova/+spec/default-ephemeral-format-unformated
16:53:03 <sean-k-mooney> ya so tl;dr is currently we use libguestfs in two places in nova
16:53:12 <sean-k-mooney> file injection, which has been deprecated for a long time
16:53:25 <sean-k-mooney> and formatting the filesystem of the additional ephemeral disks
16:53:41 <bauzas> true
16:53:45 <sean-k-mooney> I would like to have a way to allow the ephemeral disk to be unformatted
16:53:53 <sean-k-mooney> making libguestfs optional
16:54:14 <sean-k-mooney> so the proposal is to either add unformatted to https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.default_ephemeral_format
16:54:28 <sean-k-mooney> or, slightly cleaner, add a bool opt to turn off the formatting
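    In nova.conf the first variant would read like this ('unformatted' is the value being proposed in this discussion, not an existing choice of the option):

        [DEFAULT]
        # proposed: leave extra ephemeral disks blank instead of running mkfs on them
        default_ephemeral_format = unformatted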
16:54:35 <bauzas> what does the default value, which is None, do?
16:54:55 <sean-k-mooney> and I want to know if there is a preference and if we think this could be a spec or specless
16:55:02 <dansmith> either is okay with me, I guess format=unformatted seems better to me because it's just another option for an existing knob
16:55:31 <sean-k-mooney> I need to check the default of None but I believe it makes it OS dependent
16:55:36 <bauzas> sean-k-mooney: I see None is the default value, what's the behaviour then?
16:55:38 <bauzas> ok
16:55:43 <sean-k-mooney> i need to dig into this a little more
16:56:04 <sean-k-mooney> but basically I wanted to know if people think this is OK to do this cycle
16:56:13 <sean-k-mooney> or should we discuss it at the PTG and do it next cycle
16:56:14 <bauzas> I think this is a very small feature
16:56:19 <bauzas> self-contained
16:56:26 <dansmith> yeah no need for lots of discussion, IMHO
16:56:30 <bauzas> particularly if we go with adding a new value
16:56:56 <sean-k-mooney> ok so 1) I need to document what None does, 2) determine if it can already disable the formatting today
16:57:02 <bauzas> true
16:57:10 <sean-k-mooney> and 3) if not, add unformatted as an option to explicitly do that
16:57:20 <bauzas> sounds like a simple plan to me
16:57:34 <sean-k-mooney> so at a minimum I'll add a docs change to say what None does
16:57:45 <sean-k-mooney> and we can then evaluate in the gerrit review whether we need unformatted
16:58:02 <gibi> sounds good to me
16:58:06 <bauzas> anyone objecting about this smallish effort for this cycle ?
16:58:29 <sean-k-mooney> if this ends up not being small I will punt it to next cycle
16:58:45 <bauzas> I don't expect any behavioural change
16:59:03 <bauzas> so I'm fine with approving it as a specless blueprint based on that assumption
16:59:35 <bauzas> and you're free to close this one as deferred if we consider this is only a doc patch
16:59:44 <sean-k-mooney> correct, the default would be what we have today and the unformatted behavior would be opt-in
16:59:45 <bauzas> any objections ?
16:59:51 <dansmith> no objection from me
16:59:55 <bauzas> cool
17:00:11 <bauzas> #agreed https://blueprints.launchpad.net/nova/+spec/default-ephemeral-format-unformated accepted as specless blueprint for the 2023.1 cycle
17:00:17 <bauzas> that's it for me
17:00:21 <bauzas> nothing else on the agenda
17:00:23 <bauzas> thanks all
17:00:26 <bauzas> #endmeeting