opendevreview | Merged openstack/nova master: Update Availability zone doc page https://review.opendev.org/c/openstack/nova/+/846463 | 00:06 |
---|---|---|
bauzas | gibi: re: https://bugs.launchpad.net/nova/+bug/2002951 OOM | 09:17 |
bauzas | gibi: based on the example you gave, those are the tests that were run for the failing worker https://paste.opendev.org/show/bUSshY14qpkpDQ5jraEt/ | 09:17 |
gibi | nothing really jumps out from that list | 09:18 |
bauzas | me too | 09:18 |
* bauzas looks at syslog | 09:19 | |
opendevreview | Aaron S proposed openstack/nova master: Add further workaround features for qemu_monitor_announce_self https://review.opendev.org/c/openstack/nova/+/867324 | 09:29 |
bauzas | gibi: looks like the test was downloading the image when it stacktraced | 09:30 |
bauzas | wait, no | 09:32 |
bauzas | timings don't match | 09:33 |
gibi | I don't think OOM kill will cause a stack trace, the process will simply disappear | 09:34 |
bauzas | my bad | 09:34 |
bauzas | I meant when it was killed | 09:34 |
gibi | also as we discussed the point where the OOM hit might not be close to the point where the killed process used up the excessive memory | 09:35 |
bauzas | I'm trying to find where the test was when the worker got killed | 09:35 |
gibi | from this we can rule out that it is on a specific provider https://paste.opendev.org/show/b1CpIgnpVmLh4YCUOIar/ I see failures on ovh, rax, inmotion | 09:41 |
bauzas | gibi: TIL how to ask subunit from a CI log : | 09:42 |
bauzas | (venv) [sbauza@sbauza zuul-logs.9HEwdg]$ cat testrepository.subunit | subunit-filter -s --xfail --with-tag=worker-0 | subunit-ls | 09:42 |
bauzas | a grep does the same, just not in the same manner :D | 09:42 |
bauzas | gibi: do you have any idea why I'm seeing a tempest call 30 mins before the run is run ? | 09:43 |
bauzas | before the *test is run ? | 09:43 |
gibi | TZ difference in log? | 09:43 |
bauzas | gibi: https://paste.opendev.org/show/bI0yvTNy52PzFSQUsGze/ | 09:44 |
gibi | maybe job-output.txt rendered after the job failed | 09:46 |
gibi | hm | 09:46 |
gibi | I would believe the tempest_log over the job-output.txt about the time steps | 09:47 |
bauzas | me too | 09:47 |
bauzas | but look, the image eventually was downloaded | 09:47 |
bauzas | we can see the log | 09:47 |
bauzas | which means the HTTP call was done | 09:47 |
gibi | the OOM hit at 22:31:13 based on syslog | 09:47 |
gibi | that matches the tempest_log timestamp | 09:48 |
bauzas | good point then | 09:48 |
bauzas | gibi: I briefly looked at glance logs | 09:48 |
bauzas | as I said, the image was apparently fully downloaded in 7-ish secs | 09:48 |
bauzas | oh wait | 09:51 |
bauzas | gibi: https://paste.opendev.org/show/bLFZGO2MZTYjdRRV3DCM/ | 09:52 |
bauzas | looks like we were caching the image | 09:55 |
bauzas | as we got the new path, and then nothing | 09:55 |
bauzas | and the timings match this time | 09:56 |
gibi | hm this is interesting, in all the 16 nova-ceph-multistore jobs that failed in the last 10 days the same test case got killed https://paste.opendev.org/show/bmEzF6rFgucUibd4CqTX/ | 10:16 |
bauzas | gibi: and I guess we'll see the same, which is we want to get the image | 10:20 |
bauzas | gibi: I'm curious btw., I've seen you using a logsearch tool | 10:32 |
bauzas | is that a CLI about https://opensearch.logs.openstack.org/ ? | 10:33 |
gibi | nope, it is https://github.com/gibizer/zuul-log-search | 10:33 |
gibi | my homebrew tool for grepping zuul logs | 10:33 |
* bauzas straight git cloning | 10:34 | |
gibi | I took 6 recent runs and generated the list of test cases run in the killed worker | 10:50 |
gibi | then I checked for intersection of the set of test cases | 10:50 |
gibi | and it is only tempest.api.compute.admin.test_volume.AttachSCSIVolumeTestJSON.test_attach_scsi_disk_with_config_drive | 10:50 |
gibi | the one that got killed | 10:50 |
gibi | so this points to that single test case as a cause | 10:51 |
bauzas | gibi: I'm just grabbing another change logs for looking whether the test was also killed while downloading the image | 10:56 |
gibi | ack | 10:57 |
bauzas | ah, your tool only downloads a specific file if I use --file | 10:57 |
bauzas | gibi: can I get all zuul logs from a specific change ? | 10:58 |
gibi | no, you need to use --file to get a log downloaded. I did it opt-in as I mostly use it to wide search and I wanted to limit the disk and bandwidth usage | 10:58 |
bauzas | k | 10:59 |
gibi | feel free to open an issue in the repo to add such option | 10:59 |
bauzas | I can workaround it for a sec | 10:59 |
opendevreview | Kashyap Chamarthy proposed openstack/nova master: libvirt: At start-up skip compareCPU() with a workaround https://review.opendev.org/c/openstack/nova/+/870794 | 11:19 |
kashyap | gibi: When you get a minute, can you have a quick look at the unit test? I know I messed it up slightly, but I'm unclear how :/ | 11:20 |
bauzas | gibi: I tried to look at a lot of failing jobs and all of them are indeed failing with the same test | 11:36 |
bauzas | I tried to find where in https://github.com/openstack/tempest/blob/master/tempest/api/compute/admin/test_volume.py#L76 we have the oomkiller | 11:36 |
bauzas | but as you said, maybe it's killed after a few seconds | 11:36 |
sean-k-mooney | bauzas: im going to add a specless blueprint to the meeting agenda and try to implement it before then. if we decide to defer it that's ok, but if we agree it's trivial enough i would like to include it in A | 13:01 |
bauzas | sean-k-mooney: ack | 13:03 |
zigo | Is there some docs somewhere explaining how to implement an OpenStack wsgi API with keystone auth? | 13:43 |
* zigo is starting a new project from scratch | 13:43 | |
zigo | FYI, I already got the db migration with Alembic done ... | 13:43 |
zigo | (plus oslo_config setup...) | 13:43 |
zigo | User docs are sometimes lacking info, dev docs are almost nonexistent ... :( | 13:45 |
bauzas | zigo: you are deliberately left free to choose what you want | 13:55 |
bauzas | you just need to use keystonemiddleware lib | 13:55 |
bauzas | https://pypi.org/project/keystonemiddleware/ | 13:55 |
bauzas | https://docs.openstack.org/keystonemiddleware/latest/middlewarearchitecture.html describes the strategies you can choose for Auth'ing | 13:56 |
bauzas | a recommendation is to use paste for pipelining the WSGI middlewares | 13:57 |
zigo | Thanks. But no code example is shown in the keystonemiddleware doc. | 13:58 |
zigo | As with many things, I'm stuck with a "look at other projects, attempt cut/paste, then see what it does" strategy... | 13:59 |
bauzas | hah | 13:59 |
bauzas | that | 13:59 |
zigo | :) | 14:00 |
bauzas | yeah, in general the overall workflow is prescribed, like in https://docs.openstack.org/project-team-guide/index.html | 14:00 |
bauzas | but beyond this, it is the project team's responsibility to decide how to implement what they want | 14:00 |
bauzas | like, the WSGI framework they prefer | 14:00 |
bauzas | or even the WSGI server they'd run with devstack | 14:01 |
bauzas | zigo: but honestly, the keystonemiddleware plugin isn't that hard to use | 14:03 |
zigo | I don't think that's the hardest part indeed. I just don't know where to start! :) | 14:03 |
bauzas | I suppose you just want the regular 'do the auth thing' by keystonemiddleware like in https://docs.openstack.org/keystonemiddleware/latest/middlewarearchitecture.html#authentication-component | 14:03 |
bauzas | zigo: we have a couple of openstack cookiecutters, if those still exist and are updated | 14:04 |
bauzas | but yeah, before incepting any code, I'd recommend to formalize your repo structure the openstack way | 14:04 |
zigo | Yeah, I used it. But it doesn't do: | 14:05 |
zigo | - alembic migrations | 14:05 |
zigo | - oslo.config | 14:05 |
zigo | - api | 14:05 |
zigo | ... | 14:05 |
bauzas | :) | 14:05 |
zigo | Yeah, I'm navigating through many projects to see how they are organized, and I'm trying to pick the best ones. | 14:05 |
bauzas | if you're asking for a 'Project inception 101 class', I'll make you sad, it doesn't exist :) | 14:06 |
bauzas | but you can surely bug us if you want guidance | 14:06 |
bauzas | I guess you know the project team guide ? | 14:06 |
zigo | Thanks ! :) | 14:06 |
bauzas | https://docs.openstack.org/project-team-guide/index.html | 14:06 |
*** dasm|off is now known as dasm | 14:07 | |
zigo | Well, I know how the community works, gerrit, release management, branches, etc. | 14:07 |
zigo | I don't think I even need to read this ! :) | 14:07 |
bauzas | yup, but there is a small but interesting section in that guide https://docs.openstack.org/project-team-guide/technical-guides/index.html | 14:07 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Use new get_rpc_client API from oslo.messaging https://review.opendev.org/c/openstack/nova/+/869900 | 14:08 |
bauzas | you also have the API guidelines https://specs.openstack.org/openstack/api-wg/#guidelines | 14:08 |
bauzas | and then you're left with reading each of the oslo libs docs | 14:09 |
bauzas | assuming you want RPC | 14:09 |
zigo | Thanks for all of the links. | 14:15 |
zigo | I don't think I'll need RPC, but maybe along the way... | 14:15 |
bauzas | artom: sean-k-mooney: https://review.opendev.org/c/openstack/nova/+/869812 got a weak -1 because I think we need to add an upgrade section in reno | 14:29 |
bauzas | tl;dr: starting with 2023.1, users could request instance.example.com hostname for their instance, and it would fail | 14:29 |
bauzas | because of dhcp_domain | 14:30 |
sean-k-mooney | it won't fail but it will be modified as is currently done | 14:31 |
sean-k-mooney | but sure lets add that | 14:31 |
artom | bauzas, sure, OK | 14:32 |
bauzas | sean-k-mooney: yeah agreed "fail" is too broad | 14:33 |
bauzas | sean-k-mooney: I mean their instances won't get the hostname they expect | 14:33 |
bauzas | from the user pov | 14:33 |
kashyap | gibi: I think for my unit test question in the scroll, it's probably because I accidentally removed a mock. /me tries... | 14:36 |
gibi | kashyap: sorry, I haven't got back to that yet | 14:36 |
kashyap | Don't worry, I don't count on instant responses :-) | 14:36 |
kashyap | I know you're context-switching on several things | 14:36 |
sean-k-mooney | bauzas: yep exactly so im fine with calling that out in the release note | 14:39 |
bauzas | sean-k-mooney: i wonder if operators will scream about it | 14:41 |
bauzas | of course we can't provide different input validation based on a config option | 14:41 |
bauzas | but still, they'll have to change something probably | 14:42 |
sahid | stephenfin: o/ | 14:42 |
sahid | I can see that you are involved in osprofiler | 14:43 |
sahid | I have question for you :-) | 14:43 |
sean-k-mooney | bauzas: we are not changing the existing behavior | 14:43 |
sean-k-mooney | so if they were not doing this before we blocked FQDNs they will get the same behavior | 14:44 |
sahid | it's related to the Jaeger driver, we would like to add an option | 14:44 |
sahid | https://github.com/openstack/osprofiler/blob/master/osprofiler/drivers/jaeger.py#L56 | 14:44 |
sean-k-mooney | bauzas: that is why we originally did not mention it in the spec | 14:44 |
sean-k-mooney | we did discuss this option in the past | 14:44 |
sahid | basically the point will be to have a prefix for the service_name, so we can make a difference between for example services that are running on different region | 14:45 |
sahid | does it make sense to you if I add an option like service_prefix | 14:45 |
sean-k-mooney | bauzas: so i don't think operators will be upset that we are maintaining the behavior they expect | 14:45 |
sahid | from my understanding that one will only be useful for jaeger, so I'm considering adding a section [jaeger] | 14:46 |
bauzas | sean-k-mooney: technically I agree | 14:46 |
bauzas | we never supported FQDNs | 14:46 |
sean-k-mooney | and those that used a display name with an fqdn had it modified by the config option | 14:47 |
bauzas | so when passing a hostname, cloud-init was getting a FQDN based on the hostname + the default domain name from the option | 14:47 |
sean-k-mooney | so if they wanted it to not be modified they already had to set the config option to the empty string | 14:47 |
bauzas | which was consistent | 14:47 |
artom | bauzas, fixed | 14:47 |
kashyap | Unrelated: Is the "nova-tox-functional-py38" job passing reliably for everyone? It's still failing with "TypeError: getresponse() got an unexpected keyword argument 'buffering'" | 14:47 |
sean-k-mooney | bauzas: when stephen added --hostname we also added the dhcp_domain to the dns name in neutron | 14:48 |
sean-k-mooney | we did not modify what we put in the metadata | 14:48 |
sean-k-mooney | and we are not going to with artoms code because he is not changing that | 14:48 |
sean-k-mooney | it was only the value in the neutron port that was changed | 14:48 |
bauzas | you know what ? I'll play the ostrich about any kind of FQDN questions | 14:49 |
bauzas | once artom uploads his change, I'll review it and I'm done | 14:50 |
sean-k-mooney | bauzas: dont feel like you cant ask them | 14:50 |
sean-k-mooney | im just saying we are intentionally not changing the behavior to not break anyone | 14:50 |
sean-k-mooney | and because we don't want config-driven api behavior | 14:50 |
bauzas | the ship has sailed | 14:50 |
sean-k-mooney | so without removing the option entirely, which would impact everyone | 14:50 |
sean-k-mooney | we cant really do much else | 14:50 |
bauzas | but honestly, I liked the fact that we were saying domain names was something unrelated to nova :) | 14:51 |
sean-k-mooney | yep so did i | 14:51 |
sean-k-mooney | this is the compromise so that we don't have to care about them again once done | 14:51 |
sean-k-mooney | like all good compromises it does not make anyone happy but we can all live with it | 14:52 |
bauzas | the ostrich theory applied to me. | 14:52 |
sean-k-mooney | i prefer magpie psychology. distract people with other shiny things that matter | 14:53 |
bauzas | I need to learn the three-card monte | 14:55 |
bauzas | 'follow your card' | 14:55 |
kashyap | bauzas: sean-k-mooney: There's a thing called "Belgian compromise", which loosely means: | 14:56 |
sean-k-mooney | bauzas: added https://blueprints.launchpad.net/nova/+spec/default-ephemeral-format-unformated to meeting agenda | 14:56 |
kashyap | (quote) | 14:57 |
kashyap | complex issues are settled by conceding something to every party concerned, through an agreement that is usually so complicated that nobody completely understands all its implications. | 14:57 |
kashyap | (/quote) | 14:57 |
sean-k-mooney | hehe ya sounds like a typical eu treaty | 14:57 |
opendevreview | Kashyap Chamarthy proposed openstack/nova master: libvirt: At start-up skip compareCPU() with a workaround https://review.opendev.org/c/openstack/nova/+/870794 | 14:58 |
sahid | stephenfin: https://bugs.launchpad.net/osprofiler/+bug/2003092 | 14:59 |
artom | bauzas, so I should just remove the ( ) about i18n? | 15:24 |
opendevreview | Alexey Stupnikov proposed openstack/nova stable/victoria: Test aborting queued live migration https://review.opendev.org/c/openstack/nova/+/845748 | 15:32 |
opendevreview | Kashyap Chamarthy proposed openstack/nova master: libvirt: At start-up allow skiping compareCPU() with a workaround https://review.opendev.org/c/openstack/nova/+/870794 | 15:38 |
bauzas | artom: yup IMHO | 15:49 |
bauzas | as a reminder nova meeting here in 10 mins | 15:50 |
opendevreview | Artom Lifshitz proposed openstack/nova master: Microversion 2.94: FQDN in hostname https://review.opendev.org/c/openstack/nova/+/869812 | 15:59 |
bauzas | #startmeeting nova | 16:00 |
opendevmeet | Meeting started Tue Jan 17 16:00:19 2023 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot. | 16:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 16:00 |
opendevmeet | The meeting name has been set to 'nova' | 16:00 |
bauzas | gdi, just in time | 16:00 |
bauzas | #link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting | 16:00 |
bauzas | hi everyone | 16:00 |
dansmith | o/ | 16:00 |
elodilles | o/ | 16:01 |
gibi | o/ | 16:01 |
bauzas | okay let's start | 16:02 |
gibi | (I'm a bit distracted) | 16:02 |
bauzas | #topic Bugs (stuck/critical) | 16:02 |
bauzas | #info One critical bug | 16:02 |
Uggla | o/ | 16:02 |
bauzas | #link https://bugs.launchpad.net/nova/+bug/2002951 | 16:03 |
bauzas | gibi: I marked this one as critical for the sake of the discussion | 16:03 |
gmann | o/ | 16:03 |
bauzas | but we can put it back to High | 16:03 |
bauzas | in general, I tend to triage CI bugs to Critical until we agree this is not holding the gate | 16:03 |
bauzas | do we want to discuss about it now or no ? | 16:04 |
gibi | sure | 16:04 |
gibi | I updated the bug | 16:04 |
bauzas | ok, so, gibi (mostly) and I looked at this one today | 16:05 |
gibi | I think it is the tempest.api.compute.admin.test_volume.AttachSCSIVolumeTestJSON.test_attach_scsi_disk_with_config_drive test case that triggers the OOM | 16:05 |
bauzas | yeah | 16:05 |
bauzas | and like I said, I tried to find where | 16:05 |
bauzas | but I wasn't able to see | 16:06 |
bauzas | context : https://github.com/openstack/tempest/blob/7c8b49becef78a257e2515970a552c84982f59cd/tempest/api/compute/admin/test_volume.py#L84-L120 | 16:06 |
bauzas | we try to create an image | 16:06 |
bauzas | then we create an instance | 16:06 |
bauzas | and then a volume which we attach to the instance | 16:07 |
gibi | I haven't had time to look into the actual tc yet | 16:07 |
sean-k-mooney | p/ | 16:07 |
gibi | also it would be nice to see how the python interpreter rss size grows during the test execution | 16:08 |
dansmith | yeah surely seems like a benign test case | 16:08 |
sean-k-mooney | we unfortunately don't have the memory tracker stuff from devstack | 16:08 |
sean-k-mooney | but it would be nice if we could get that and also dmesg in the tox based tests | 16:08 |
bauzas | I tried to grep the testname in n-api | 16:08 |
bauzas | but I wasn't finding it | 16:09 |
bauzas | so, either we no longer use it | 16:09 |
bauzas | or we were not yet calling the nova-api | 16:09 |
bauzas | which means we have the kill before creating the instance | 16:09 |
bauzas | but I could be wrong | 16:09 |
bauzas | anyway, folks are ok if we modify the bug to High ? | 16:10 |
bauzas | bug report* | 16:10 |
gibi | tomorrow I will continue looking but we can also tentatively try to disable this single test to see if that removes the OOM problem | 16:11 |
gibi | bauzas: I'm not against having this as High | 16:12 |
bauzas | ok | 16:12 |
bauzas | then let's look again tomorrow and we'll see what to do | 16:12 |
bauzas | this time I'm just afraid to remove this test because we don't know why we have a OOMkill | 16:13 |
bauzas | this could arrive to another test then | 16:13 |
gibi | yep, that would be my goal of disabling it temporary to see if the OOM just moves to another test case | 16:13 |
gibi | and to see which test case | 16:13 |
gibi | to find a pattern | 16:13 |
bauzas | (I also verified that nothing changed on the tempest side since 1 year for this test) | 16:13 |
dansmith | gibi: you could also rename it I think and change the sort ordering | 16:14 |
bauzas | yeah | 16:14 |
dansmith | afaik, we run tests sorted per worker | 16:14 |
bauzas | I was wondering, maybe this was a problem due to another test | 16:14 |
gibi | dansmith: good idea | 16:14 |
bauzas | dansmith: I think we can ask stestr to modify the sort | 16:14 |
dansmith | oh? | 16:14 |
bauzas | but I need to remember how to do it | 16:14 |
gibi | bauzas: on that, I extracted all the test cases from the killed worker from multiple runs and the only test case overlap was this tc | 16:15 |
gibi | so if other test causing the issue then it is not a single test but a set of tests | 16:15 |
gibi | otherwise I would see an overlap | 16:15 |
bauzas | gibi: well, yeah, but that maybe means that the previous tests were adding more memory before so that's only with this test that OOMkiller wants to kill | 16:15 |
bauzas | as you see, this is a very simple test | 16:16 |
gibi | that is my point above, if a single test adds the extra memory usage then that would show up as an overlap between runs | 16:16 |
gibi | but it doesn't | 16:16 |
bauzas | gibi: that's why I'll try to see how to ask stestr to modify the sort | 16:16 |
gibi | yeah, moving this tc to the end can help to see if there is a set of tests that trigger this behavior | 16:17 |
gibi | anyhow I think we can move on | 16:18 |
bauzas | cool | 16:19 |
bauzas | #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 27 new untriaged bugs (+0 since the last meeting) | 16:19 |
bauzas | I triaged a few bugs today | 16:19 |
bauzas | #link https://etherpad.opendev.org/p/nova-bug-triage-20230110 | 16:19 |
bauzas | nothing to report here by now | 16:20 |
bauzas | #info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster | 16:20 |
bauzas | gibi: wants to get the bug baton this week ? | 16:20 |
gibi | bauzas: sure I can | 16:21 |
bauzas | thanks a lot | 16:21 |
bauzas | #info bug baton is being passed to gibi | 16:21 |
bauzas | #topic Gate status | 16:21 |
bauzas | #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs | 16:21 |
bauzas | we already discussed about the main one, wanting to discuss other CI bugs ? | 16:22 |
gibi | just a short summary | 16:22 |
bauzas | looks not | 16:22 |
bauzas | ah | 16:22 |
bauzas | we're listening to you | 16:22 |
gibi | I see failures in our functional tests | 16:22 |
gibi | one is about missing db tables so it is probably interference between test cases | 16:23 |
gibi | we saw that before | 16:23 |
gibi | fixed it, but it seems we had a non-100% fix | 16:23 |
bauzas | :/ | 16:23 |
gibi | and there is a failure with db cursor need a reset | 16:24 |
gibi | it might be related to the above | 16:24 |
gibi | not sure yet | 16:24 |
bauzas | lovely | 16:24 |
gibi | these two I wanted to mention | 16:24 |
gibi | but there are other open bugs that appear in the gate time to time | 16:25 |
bauzas | flipping stestr worker runs would help to trigger the races | 16:25 |
gibi | so it is fairly hard to land things overall | 16:25 |
bauzas | I could try to reproduce those functests locally | 16:25 |
bauzas | this would exhaust my laptop, but worth trying | 16:25 |
bauzas | gibi: let's then discuss this tomorrow as well | 16:26 |
gibi | sure | 16:26 |
bauzas | I mean, I have my power mgmt series to work on, but if we can't land things, nothing will merge either way. | 16:26 |
sean-k-mooney | the gate is not totally blocked | 16:27 |
sean-k-mooney | but its flaky enough that its hard | 16:27 |
bauzas | yeah, but rechecking is not a great option | 16:27 |
gibi | yepp | 16:27 |
sean-k-mooney | ya its not | 16:27 |
bauzas | agreed, I'm not sending the signal our gate is busted | 16:27 |
bauzas | but we know this is hard | 16:27 |
sean-k-mooney | one thing i have noticed is the py3.10 functional job seems more stable than py38 | 16:27 |
bauzas | and let me go to the next topic and you'll understand why | 16:27 |
sean-k-mooney | for the db issues | 16:28 |
sean-k-mooney | but that could be just the ones i happend to look at | 16:28 |
bauzas | ok | 16:28 |
clarkb | sean-k-mooney: 3.10 introduced a much more deterministic thread scheduler. Also its quite a bit quicker in some projects which helps generally | 16:28 |
bauzas | ah, gdk | 16:28 |
bauzas | we probably have tests not correctly cleaning up data | 16:28 |
bauzas | so we need to bisect them | 16:28 |
sean-k-mooney | ya so im wondering, if we are blocked we might want to make the 3.8 one non voting while we try to fix this | 16:29 |
sean-k-mooney | but there are other issues so i don't think that will help much | 16:29 |
sean-k-mooney | just something to keep in mind | 16:29 |
bauzas | sean-k-mooney: before going that road, lemme try to bisect the faulty tests | 16:29 |
dansmith | I've seen it both ways | 16:29 |
sean-k-mooney | yep | 16:29 |
dansmith | 3.10 passing with 3.8 failing and the other way | 16:29 |
dansmith | so I don't think disabling one gets us much | 16:29 |
sean-k-mooney | ok then it's just flaky | 16:29 |
bauzas | lovely | 16:29 |
bauzas | moving on | 16:29 |
bauzas | we have some agenda today | 16:29 |
bauzas | #link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&pipeline=periodic-weekly Nova&Placement periodic jobs status | 16:30 |
bauzas | that's fun | 16:30 |
bauzas | although https://review.opendev.org/c/openstack/tempest/+/866049 was merged, we still have the centos9-fips job timing out | 16:30 |
bauzas | so I looked at the job def | 16:30 |
bauzas | and it looks to me like it no longer depends on the job I added the extra timeout to :) | 16:30 |
bauzas | so the patch that took 2 months to land is basically useless for our pipeline | 16:31 |
bauzas | funny, as I said | 16:31 |
gmann | I think we had progress on running fips testing on ubuntu but need to check if we have job ready. that can replace c9-fips jobs | 16:31 |
bauzas | so I'll just add the extra timeout on our local job definition | 16:31 |
opendevreview | Dan Smith proposed openstack/nova master: WIP: Detect host renames and abort startup https://review.opendev.org/c/openstack/nova/+/863920 | 16:32 |
bauzas | gmann: that's good to hear | 16:32 |
gmann | not merged yet #link https://review.opendev.org/c/openstack/project-config/+/867112 | 16:32 |
bauzas | gmann: we could put fips in check pipeline then | 16:32 |
gmann | yeah that is plan once we have ubuntu based job | 16:32 |
bauzas | gmann: as a reminder, given centos9s, fips is on periodic pipeline | 16:32 |
gmann | yeah | 16:33 |
bauzas | anyway, this time it should be quicker | 16:33 |
bauzas | I'll just update our .zuul.yaml | 16:33 |
bauzas | oh wait | 16:33 |
bauzas | https://zuul.openstack.org/job/tempest-integrated-compute-centos-9-stream is actually defined in tempest | 16:34 |
bauzas | so I don't get why we don't benefit from the extra timeout | 16:34 |
gmann | yeah, we will prepare the tempest job and then add in project side gate | 16:34 |
sean-k-mooney | the job definition is yes | 16:34 |
bauzas | anyway, I don't want us to spend too much time on it | 16:35 |
gmann | not sure on timeout. c9s has always been flaky for fips | 16:35 |
bauzas | let's move on | 16:35 |
gmann | yes | 16:35 |
bauzas | #info Please look at the gate failures and file a bug report with the gate-failure tag. | 16:35 |
bauzas | #info STOP DOING BLIND RECHECKS aka. 'recheck' https://docs.openstack.org/project-team-guide/testing.html#how-to-handle-test-failures | 16:35 |
bauzas | #topic Release Planning | 16:35 |
bauzas | #link https://releases.openstack.org/antelope/schedule.html | 16:35 |
bauzas | #info Antelope-3 is in 4 weeks | 16:35 |
bauzas | tick tack | 16:35 |
bauzas | #info 17 Accepted blueprints for 2023.1 Antelope | 16:35 |
bauzas | which is the same number as in Yoga | 16:36 |
bauzas | this is a large number given our team | 16:36 |
bauzas | given this, I'll create an etherpad for tracking each of them | 16:36 |
sean-k-mooney | there are 3 i expect to complete this week, possibly more | 16:36 |
sean-k-mooney | depending on review bandwidth | 16:36 |
bauzas | sean-k-mooney: me too, but that still requires us some effort | 16:36 |
sean-k-mooney | i am a little worried for some of them but hopeful we will land the majority of them | 16:37 |
bauzas | I mean, I know myself, I'll need to point my review energy in the right direction and an etherpad will help me direct it productively | 16:37 |
sean-k-mooney | i doubt it will be much over half | 16:37 |
bauzas | #link https://blueprints.launchpad.net/nova/antelope | 16:37 |
bauzas | you can find the list of those blueprints there ^ | 16:38 |
bauzas | #topic Review priorities | 16:38 |
bauzas | #link https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement+OR+project:openstack/os-traits+OR+project:openstack/os-resource-classes+OR+project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/osc-placement)+(label:Review-Priority%252B1+OR+label:Review-Priority%252B2) | 16:38 |
bauzas | #info As a reminder, cores eager to review changes can +1 to indicate their interest, +2 for committing to the review | 16:38 |
bauzas | nothing to mention here | 16:39 |
bauzas | #topic Stable Branches | 16:39 |
bauzas | elodilles: floor is yours | 16:39 |
elodilles | #info stable branches don't seem to be blocked, but patches mostly need rechecks | 16:39 |
elodilles | #info stable branch status / gate failures tracking etherpad: https://etherpad.opendev.org/p/nova-stable-branch-ci | 16:39 |
elodilles | and last but not least: Xena will transition to Extended Maintenance after the release of 2023.1 Antelope | 16:39 |
elodilles | so to prepare for that: | 16:40 |
elodilles | #info release patches were generated for *stable/xena* : https://review.opendev.org/q/topic:xena-stable+reviewer:sbauza%2540redhat.com | 16:40 |
sean-k-mooney | the release team proposed doing a release of several repos for xena. do we want to wait for the tox pin to be merged | 16:40 |
elodilles | sean-k-mooney: which one do you mean? | 16:40 |
sean-k-mooney | the ones you were linking | 16:40 |
elodilles | (and that was all from me about stable branches) | 16:40 |
gmann | tox pin is merged for stable branches. it is done in a central place, in the openstack-zuul-jobs repo | 16:40 |
bauzas | that's fun, stable branches are more stable than master :) | 16:41 |
sean-k-mooney | so we dont have the pin to tox<4 on xena yet | 16:41 |
sean-k-mooney | gmann: oh ok | 16:41 |
sean-k-mooney | i thought we needed to do it in the tox.ini too | 16:41 |
sean-k-mooney | so that it worked if you run tox locally | 16:41 |
gmann | let me check if osc-placement and python client is merged or not | 16:41 |
elodilles | no, the workaround was merged last week, as gmann says | 16:41 |
sean-k-mooney | will that work outside ci | 16:41 |
sean-k-mooney | im not sure how you can fix it centrally unless we did it in upper-constraints? | 16:42 |
gmann | yeah, the tox one is merged, but this placement functional test one is not yet #link https://review.opendev.org/q/I4e3e5732411639054baaa9211a29e2e2c8210ac0 | 16:42 |
gmann | bauzas: sean-k-mooney elodilles ^^ | 16:42 |
elodilles | oh, i missed that somehow | 16:42 |
elodilles | will review ASAP | 16:42 |
bauzas | ack | 16:42 |
elodilles | sorry for that | 16:42 |
bauzas | tab open | 16:42 |
bauzas | I'll do my homework after the meeting | 16:43 |
sean-k-mooney | so my question still is not really answered | 16:43 |
elodilles | (the stable ones o:)) | 16:43 |
gmann | thanks | 16:43 |
sean-k-mooney | where is tox pinned in https://github.com/openstack/nova/blob/stable/xena/tox.ini | 16:43 |
elodilles | sean-k-mooney: in that case we can wait until the xena one merges :) | 16:43 |
bauzas | sean-k-mooney: how to cap tox under 3 ? | 16:43 |
gmann | sean-k-mooney: only for CI. you mean to pin it in tox.ini itself ? | 16:43 |
bauzas | under 4, I mean | 16:44 |
sean-k-mooney | yes so that developers can also run tox locally to test backports | 16:44 |
gmann | sean-k-mooney: if we want to fix it for local run to make sure we do not run it with tox4 then yes we need to pin in tox.ini also but that can be done if we really need | 16:44 |
sean-k-mooney | i was asking should we do that before doing the final release for extended maintenance | 16:44 |
elodilles | hmmm. good question. | 16:45 |
bauzas | that sounds doable to me | 16:45 |
gmann | for local run I think both way ok either make sure we have tox<4 in our env or pin it in tox.ini | 16:45 |
sean-k-mooney | i replicated the pin in ci downstream | 16:46 |
gmann | we did for python-novaclient https://review.opendev.org/c/openstack/python-novaclient/+/869598/2/tox.ini#4 | 16:46 |
gmann | #link https://review.opendev.org/c/openstack/python-novaclient/+/869598/2/tox.ini#4 | 16:46 |
sean-k-mooney | yes | 16:47 |
sean-k-mooney | so do we want to do it for all the other nova deliverables | 16:47 |
elodilles | then i'm OK to do the same and release after that is merged | 16:47 |
sean-k-mooney | if so we should do it before the em transition | 16:47 |
elodilles | yes, I'm OK with that, I don't see now any reason not to do it before the transition | 16:48 |
elodilles | (the generated xena release patches don't have deadlines, but best not to postpone them for weeks) | 16:49 |
bauzas | ok, sounds like an agreement, we just need an owner | 16:50 |
sean-k-mooney | i can do it for os-vif maybe some of the others | 16:50 |
bauzas | ack | 16:51 |
sean-k-mooney | it's really just one line and ensuring it works locally | 16:51 |
bauzas | I know | 16:51 |
elodilles | sean-k-mooney: ping me if i forgot the reviews o:) | 16:51 |
bauzas | anyway I guess we're done with this topic and we have a specless blueprint ask in a sec | 16:51 |
bauzas | so, moving on | 16:51 |
bauzas | #topic Open discussion | 16:52 |
bauzas | (sean-k-mooney) https://blueprints.launchpad.net/nova/+spec/default-ephemeral-format-unformated | 16:52 |
sean-k-mooney | ya so tl;dr is currently we use libguestfs in two places in nova | 16:53 |
sean-k-mooney | file injection which is deprecated for a long time | 16:53 |
sean-k-mooney | and formatting the filesystem of the additional ephemeral disks | 16:53 |
bauzas | true | 16:53 |
sean-k-mooney | i would like to have a way to allow the ephemeral disk to be unformatted | 16:53 |
sean-k-mooney | making libguestfs optional | 16:53 |
sean-k-mooney | so the proposal is either add unformatted to https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.default_ephemeral_format | 16:54 |
sean-k-mooney | or, slightly cleaner, add a bool opt to turn off the formatting | 16:54 |
bauzas | what does the default value, which is None, do ? | 16:54 |
sean-k-mooney | and i want to know if there is a preference and if we think this could be a spec or specless | 16:54 |
dansmith | either is okay with me, I guess format=unformatted seems better to me because it's just another option for an existing knob | 16:55 |
sean-k-mooney | i need to check the default of None but i believe it makes it os dependent | 16:55 |
bauzas | sean-k-mooney: I see None as the default value, what's then the behaviour ? | 16:55 |
bauzas | ok | 16:55 |
sean-k-mooney | i need to dig into this a little more | 16:55 |
sean-k-mooney | but basically i wanted to know if people think this is ok to do this cycle | 16:56 |
sean-k-mooney | or should we discuss in the ptg and do it next cycle | 16:56 |
bauzas | I think this is a very small feature | 16:56 |
bauzas | self-contained | 16:56 |
dansmith | yeah no need for lots of discussion, IMHO | 16:56 |
bauzas | particularly if we go with adding a new value | 16:56 |
sean-k-mooney | ok so 1 i need to document what None does. 2 determine if it can disable the formatting today already | 16:56 |
bauzas | true | 16:57 |
sean-k-mooney | and 3 if not add unformatted as an option to explicitly do that | 16:57 |
bauzas | sounds a simple plan to me | 16:57 |
sean-k-mooney | so at a minimum i'll add a docs change to say what None does | 16:57 |
sean-k-mooney | and we can then evaluate in the gerrit review if we need unformatted | 16:57 |
gibi | sounds good to me | 16:58 |
bauzas | anyone objecting about this smallish effort for this cycle ? | 16:58 |
sean-k-mooney | if this ends up not being small i will punt to next cycle | 16:58 |
bauzas | I don't expect any behavioural change | 16:58 |
bauzas | so I'm fine with approving it as a specless blueprint based on such assumption | 16:59 |
bauzas | and you're free to close this one as deferred if we consider this is only a doc patch | 16:59 |
sean-k-mooney | correct, the default would be what we have today and the unformatted behavior would be opt in | 16:59 |
bauzas | any objections ? | 16:59 |
dansmith | no objection from me | 16:59 |
bauzas | cool | 16:59 |
bauzas | #agreed https://blueprints.launchpad.net/nova/+spec/default-ephemeral-format-unformated accepted as specless blueprint for the 2023.1 cycle | 17:00 |
bauzas | that's it for me | 17:00 |
bauzas | nothing else on the agenda | 17:00 |
bauzas | thanks all | 17:00 |
bauzas | #endmeeting | 17:00 |
opendevmeet | Meeting ended Tue Jan 17 17:00:26 2023 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 17:00 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/nova/2023/nova.2023-01-17-16.00.html | 17:00 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/nova/2023/nova.2023-01-17-16.00.txt | 17:00 |
opendevmeet | Log: https://meetings.opendev.org/meetings/nova/2023/nova.2023-01-17-16.00.log.html | 17:00 |
elodilles | thanks o/ | 17:01 |
gibi | o/ | 17:01 |
bauzas | sean-k-mooney: https://github.com/openstack/nova/blob/9e2ca01988b8889738eba3c9af336ad82d214e1b/nova/virt/libvirt/utils.py#L226 | 17:03 |
sean-k-mooney | so None is virt driver dependent and for libvirt it's ext4 | 17:05 |
sean-k-mooney | if the glance image does not have an OS | 17:06 |
sean-k-mooney | *os_type set | 17:06 |
sean-k-mooney | also why are we importing the constant from the privsep module nova.privsep.fs.FS_FORMAT_EXT4 | 17:06 |
sean-k-mooney | that just feels lazy | 17:06 |
sean-k-mooney | bauzas: actually that's in create_ploop_image | 17:07 |
bauzas | correct | 17:07 |
sean-k-mooney | so that is only used for openvz | 17:07 |
sean-k-mooney | that's not what we do for qemu/kvm | 17:07 |
bauzas | indeed | 17:08 |
bauzas | https://github.com/openstack/nova/blob/b8a5961161da4a33c4d9c80e3025d9ff6eaf5326/nova/privsep/fs.py#L299-L302 | 17:08 |
sean-k-mooney | https://github.com/openstack/nova/blob/9e2ca01988b8889738eba3c9af336ad82d214e1b/nova/privsep/fs.py#L257-L259 | 17:08 |
bauzas | yup | 17:08 |
sean-k-mooney | so for qemu/kvm we default to vfat | 17:09 |
bauzas | anyway, the behaviour of None seems consistent | 17:09 |
sean-k-mooney | its virt driver and virt_type dependent | 17:09 |
sean-k-mooney | so ya a new option is what we want | 17:09 |
sean-k-mooney | well value | 17:09 |
bauzas | this is just saying "let the virt driver decide for me or the os type" | 17:09 |
sean-k-mooney | of unformatted | 17:09 |
sean-k-mooney | yep | 17:09 |
sean-k-mooney | ok well i'll update the docs text to call that out | 17:10 |
bauzas | correct, we need an extra explicit value | 17:10 |
bauzas | if we just want an unformatted partition | 17:10 |
sean-k-mooney | i briefly looked at this this morning but didn't have time to fully get to the bottom of it | 17:10 |
sean-k-mooney | not even a partition, a blank file | 17:10 |
sean-k-mooney | so if we set unformatted we will just get the empty disk | 17:11 |
sean-k-mooney | and it's up to the user to partition and format it as they see fit | 17:11 |
sean-k-mooney | just like a blank cinder volume | 17:11 |
bauzas | I see | 17:11 |
opendevreview | Merged openstack/osc-placement master: Use pypi released version of placement in functional tests https://review.opendev.org/c/openstack/osc-placement/+/869755 | 17:50 |
sean-k-mooney | sigh... https://review.opendev.org/c/openstack/nova/+/869900 will never merge | 17:53 |
sean-k-mooney | bauzas: summarised the use case here https://blueprints.launchpad.net/nova/+spec/default-ephemeral-format-unformated | 18:14 |
sean-k-mooney | in the whiteboard | 18:14 |
bauzas | sean-k-mooney: all good thanks | 18:14 |
dansmith | yeah looking at the latest ceph-multistore oom, one of tempest's workers is using >900MiB of ram, where the others are <10 | 18:43 |
dansmith | so something is clearly going haywire | 18:43 |
dansmith | gmann: you're aware of this right? | 18:43 |
dansmith | I assume if you knew of any recent tempest changes that could be responsible, you'd have spoken up by now :) | 18:43 |
gmann | dansmith: no, I cannot recall any relevant change happened in tempest | 18:57 |
dansmith | yeah I looked too, and nothing much lately | 18:57 |
gmann | and no stestr version change recently | 18:59 |
dansmith | I'm trying to stack for the first time this year so I can repro and I'm getting this failure to install pbr.build when it tries to install os-testr | 19:01 |
dansmith | ModuleNotFoundError: No module named 'pbr.build' | 19:01 |
dansmith | is this known? | 19:01 |
dansmith | gmann: ^ | 19:01 |
gmann | no, i did not see this before | 19:03 |
sean-k-mooney | stephenfin fixed a pbr issue recently, not sure if it's released | 19:04 |
sean-k-mooney | this https://review.opendev.org/q/topic:pep-517 but that looks more tox 4 related | 19:05 |
dansmith | blargh | 19:06 |
dansmith | I can install os-testr without constraints, but it fails like this otherwise | 19:06 |
sean-k-mooney | wait os-testr | 19:07 |
gmann | yeah I think those were tox4 related | 19:07 |
sean-k-mooney | what is using that | 19:07 |
dansmith | sean-k-mooney: +./stack.sh:main:803 pip_install -U os-testr | 19:07 |
sean-k-mooney | we should not be using os-testr anywhere anymore | 19:07 |
sean-k-mooney | everything should be using stestr | 19:07 |
gmann | that is installed successfully in gate I think that is using constraint? | 19:07 |
gmann | https://zuul.opendev.org/t/openstack/build/0fc9dc8ecbe748498c941c6f21cbf057/log/job-output.txt#4367 | 19:07 |
dansmith | yeah I dunno why I can't install it | 19:08 |
sean-k-mooney | its in uc ya https://opendev.org/openstack/requirements/src/branch/master/openstack_requirements/tests/files/upper-constraints.txt#L359 | 19:08 |
dansmith | unless it's a mirror sync thing? | 19:08 |
sean-k-mooney | https://opendev.org/openstack/devstack/src/branch/master/stack.sh#L803 | 19:08 |
sean-k-mooney | comment out that line | 19:09 |
gmann | not this week but last week i stack successfully | 19:09 |
clarkb | note constraints don't pick what is installed. Only what version to install if something is to be installed | 19:09 |
dansmith | sean-k-mooney: heh, yeah, I could but.. | 19:09 |
sean-k-mooney | clarkb: yep i know just pointing out that it would be constrained if it was installed for other projects | 19:10 |
dansmith | I purged pbr and os-testr locally, which might have gotten past it | 19:10 |
sean-k-mooney | pip_install -U os-testr | 19:10 |
sean-k-mooney | so that is unconstrained in devstack | 19:11 |
sean-k-mooney | unless pip_install in devstack adds uc by default | 19:11 |
* sean-k-mooney checks | 19:11 | |
gmann | https://zuul.opendev.org/t/openstack/build/0fc9dc8ecbe748498c941c6f21cbf057/log/job-output.txt#4367 | 19:12 |
dansmith | sean-k-mooney: it does | 19:12 |
dansmith | I could install it myself without uc | 19:12 |
dansmith | but just purging those two packages locally seems to have worked | 19:12 |
dansmith | so maybe some not-so-correct version deps | 19:13 |
sean-k-mooney | what os are you using out of interest | 19:13 |
gmann | maybe | 19:13 |
dansmith | sean-k-mooney: focal | 19:13 |
sean-k-mooney | you said you had not stacked this year yet is it 20.04 | 19:13 |
sean-k-mooney | ok that should also work | 19:14 |
sean-k-mooney | well for the next 4-6 weeks | 19:14 |
gmann | focal should all work for latest master/constraints | 19:14 |
sean-k-mooney | yep | 19:15 |
sean-k-mooney | i just meant that after RC1, once we reopen master for bobcat | 19:15 |
sean-k-mooney | then it won't be in the testing runtime | 19:15 |
sean-k-mooney | nor will python 3.8 | 19:15 |
sean-k-mooney | so it will probably be fine but it could start breaking | 19:15 |
opendevreview | Sofia Enriquez proposed openstack/nova master: Check NFS protocol https://review.opendev.org/c/openstack/nova/+/854030 | 20:02 |
opendevreview | Dan Smith proposed openstack/nova master: WIP: Detect host renames and abort startup https://review.opendev.org/c/openstack/nova/+/863920 | 20:05 |
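
The `default_ephemeral_format` discussion earlier in this log (None meaning "let the virt driver or the os_type decide", plus a proposed opt-in `unformatted` value) can be sketched roughly as below. This is only an illustration, not nova's code: `resolve_ephemeral_format`, `UNFORMATTED`, `FALLBACK_FS` and `DEFAULT_FS_BY_OS_TYPE` are hypothetical names, the ext4/ntfs mapping is an assumption, vfat as the fallback is taken from what was said in the channel for qemu/kvm, and `unformatted` is the blueprint's proposal rather than an existing option.

```python
from typing import Optional

# Assumed per-os_type defaults; illustrative only, not copied from nova.
DEFAULT_FS_BY_OS_TYPE = {"linux": "ext4", "windows": "ntfs"}
# Fallback mentioned in the discussion for qemu/kvm when no os_type is set.
FALLBACK_FS = "vfat"
# Proposed opt-in value from the blueprint; hypothetical, not an existing option.
UNFORMATTED = "unformatted"


def resolve_ephemeral_format(
    configured: Optional[str], os_type: Optional[str]
) -> Optional[str]:
    """Return the filesystem to create, or None to leave the disk blank."""
    if configured == UNFORMATTED:
        # Explicit opt-in: hand the guest an empty disk, like a blank volume.
        return None
    if configured is not None:
        # An explicit filesystem in the config always wins.
        return configured
    # configured is None: defer to the image's os_type, else the fallback.
    return DEFAULT_FS_BY_OS_TYPE.get(os_type, FALLBACK_FS)
```

With this shape the default (None) keeps today's behaviour, and only an operator who sets the new value gets a blank disk, which matches the "opt in" point made in the meeting.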