*** han-guangyu is now known as Guest3661 | 01:48 | |
*** han-guangyu_ is now known as han-guangyu | 01:51 | |
opendevreview | HanGuangyu proposed openstack/nova master: Unify ServersController._flavor_id_from_req_data param to server_dict https://review.opendev.org/c/openstack/nova/+/898315 | 02:14 |
opendevreview | HanGuangyu proposed openstack/nova master: Unify ServersController._flavor_id_from_req_data param to server_dict https://review.opendev.org/c/openstack/nova/+/898315 | 05:19 |
opendevreview | HanGuangyu proposed openstack/nova master: Unify ServersController._flavor_id_from_req_data param to server_dict https://review.opendev.org/c/openstack/nova/+/898315 | 05:40 |
opendevreview | HanGuangyu proposed openstack/nova master: Unify ServersController._flavor_id_from_req_data param to server_dict https://review.opendev.org/c/openstack/nova/+/898315 | 08:34 |
opendevreview | Pavlo Shchelokovskyy proposed openstack/nova master: Forbid use_cow_images together with flat images_type https://review.opendev.org/c/openstack/nova/+/898229 | 08:44 |
opendevreview | Alexey Stupnikov proposed openstack/nova master: Translate VF network capabilities to port binding https://review.opendev.org/c/openstack/nova/+/884439 | 13:28 |
opendevreview | Alexey Stupnikov proposed openstack/nova master: Translate VF network capabilities to port binding https://review.opendev.org/c/openstack/nova/+/884439 | 13:30 |
opendevreview | Alexey Stupnikov proposed openstack/nova master: Translate VF network capabilities to port binding https://review.opendev.org/c/openstack/nova/+/884439 | 13:31 |
opendevreview | Tobias Urdin proposed openstack/nova master: [WIP] Handle scaling of cputune.shares https://review.opendev.org/c/openstack/nova/+/898326 | 13:40 |
sean-k-mooney | tobias-urdin: this is not something we can do in nova | 13:41 |
sean-k-mooney | tobias-urdin: we can discuss it again but we considered that option and rejected it before | 13:41 |
sean-k-mooney | tobias-urdin: it's not just the cpu_shares that would need to be adjusted | 13:42 |
opendevreview | Tobias Urdin proposed openstack/nova master: [WIP] Handle scaling of cputune.shares https://review.opendev.org/c/openstack/nova/+/898326 | 14:00 |
tobias-urdin | sean-k-mooney: are you thinking about quota:cpu_shares flavor extra spec as well? | 14:01 |
sean-k-mooney | tobias-urdin: yes | 14:02 |
tobias-urdin | i don't understand the reasoning for "libvirt broke us let's remove the default completely" | 14:02 |
sean-k-mooney | so on master (since zed) we no longer generate cpu_shares implicitly | 14:02 |
sean-k-mooney | tobias-urdin: it was not libvirt | 14:02 |
sean-k-mooney | this was caused by your kernel being compiled with cgroups_v2 | 14:03 |
sean-k-mooney | it was broken by the kernel team changing the allowed ranges between api versions | 14:03 |
sean-k-mooney | tobias-urdin: neither nova nor libvirt provided any normalisation of values, meaning the admin is responsible for selecting values that are allowed by their kernel cgroup version | 14:05 |
tobias-urdin | while that is true and libvirt never normalized the values, i don't agree, libvirt is an abstraction and should've handled it | 14:09 |
sean-k-mooney | tobias-urdin: we made that argument to the libvirt maintainer and they disagreed | 14:09 |
sean-k-mooney | tobias-urdin: https://bugs.launchpad.net/nova/+bug/1960840 is related | 14:12 |
sean-k-mooney | we considered extending the flavor validation https://review.opendev.org/c/openstack/nova/+/829064 | 14:12 |
sean-k-mooney | however part of the problem is the ranges depend on the virt driver in use, and on the cgroup version in the libvirt case | 14:13 |
sean-k-mooney | in that case actually the limit is a tc one rather than a cgroup one, but we considered deprecating and removing these as a result | 14:14 |
sean-k-mooney | that's simpler to do with the vif quotas since they generally no longer function | 14:15 |
sean-k-mooney | and have a neutron replacement that does | 14:15 |
tobias-urdin | as an operator i basically have two options: 1) apply patch https://review.opendev.org/c/openstack/nova/+/824048 and just live with the behaviour change (larger instances no longer get favored for oversubscription), manually updating cpu_shares myself, or backport that patch to yoga to move to zed with live migration, or 2) scale the value and | 14:17 |
tobias-urdin | set it correctly on all current instances, live migrate them over, and make sure new instances also get a scaled value | 14:17 |
tobias-urdin | for an operator where touching the code is an issue this might be very messy | 14:17 |
tobias-urdin | i opted for adding it as a workaround and upon nova-compute startup fix the values, exactly because of the upgrade issue | 14:18 |
sean-k-mooney | so i'm not against backporting the disabling of the implicit cpu_shares request, for what it's worth | 14:18 |
sean-k-mooney | tobias-urdin: we have done this back to wallaby downstream because we considered it a release blocker for our product | 14:18 |
sean-k-mooney | tobias-urdin: we don't have any other code that modifies a guest on startup like that and i'm not sure that is a pattern we should follow in general | 14:19 |
sean-k-mooney | tobias-urdin: if we were to backport https://review.opendev.org/c/openstack/nova/+/824048 to yoga would that solve your issue | 14:21 |
tobias-urdin | it feels like kind of a limbo; if the libvirt maintainers indeed said this will never change then the only way forward is to fix applications or drop it. i'm just surprised we went for dropping a default value that has been there for years(?) | 14:22 |
tobias-urdin | personally i would like it backported but i don't know if the stable backport policy covers performance impact as well? | 14:23 |
opendevreview | Amit Uniyal proposed openstack/nova-specs master: WIP: Enforce console session timeout https://review.opendev.org/c/openstack/nova-specs/+/898553 | 14:27 |
opendevreview | Tobias Urdin proposed openstack/nova stable/yoga: libvirt: remove default cputune shares value https://review.opendev.org/c/openstack/nova/+/898554 | 14:29 |
tobias-urdin | i guess ^ and then hope that operators take the proper action to update cpu_shares to default value or live migrate to get rid of it, annoying that nothing with this is optimal | 14:31 |
tobias-urdin | a shame that libvirt didn't even try by introducing a cputune.cpu_weight, deprecating and scaling the cputune.cpu_shares value if required, like systemd did, taking backward compatibility more seriously | 14:33 |
tobias-urdin | / end of rant :p | 14:33 |
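To make the scaling option in the discussion above concrete: below is a minimal sketch of rescaling a cgroup v1 `cpu.shares` value (valid range 2..262144) into the cgroup v2 `cpu.weight` range (1..10000), using the linear mapping systemd adopted for the same migration. This is only an illustration of the conversion being debated; nova and libvirt do no such normalisation themselves, as the log notes.

```python
# Sketch: linear rescale of a cgroup v1 cpu.shares value into the cgroup v2
# cpu.weight range, the mapping systemd uses. Illustrative only -- neither
# nova nor libvirt performs this conversion.
def shares_to_weight(shares: int) -> int:
    shares = min(max(shares, 2), 262144)        # clamp to the valid v1 range
    return 1 + ((shares - 2) * 9999) // 262142  # maps 2 -> 1 and 262144 -> 10000

for shares in (2, 1024, 262144):
    print(shares, "->", shares_to_weight(shares))
```

Note that the v1 default of 1024 lands at 39 under this mapping, well below the v2 default weight of 100, which is one reason naively carrying values across cgroup versions changes relative scheduling behaviour.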
bauzas | reminder : nova meeting in 1 hour and 10 mins here | 14:50 |
opendevreview | Takashi Kajinami proposed openstack/nova master: Fix python shebang https://review.opendev.org/c/openstack/nova/+/898594 | 15:43 |
opendevreview | Merged openstack/nova stable/wallaby: Accept both 1 and Y as AMD SEV KVM kernel param value https://review.opendev.org/c/openstack/nova/+/843939 | 15:46 |
bauzas | #startmeeting nova | 16:01 |
opendevmeet | Meeting started Tue Oct 17 16:01:18 2023 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot. | 16:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 16:01 |
opendevmeet | The meeting name has been set to 'nova' | 16:01 |
bauzas | hey folks | 16:01 |
dansmith | o/ | 16:01 |
elodilles | o/ | 16:01 |
bauzas | #link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting | 16:01 |
* bauzas is currently hit by a big bus, so I'm sorry to not be around like I should | 16:02 | |
auniyal | 0/\ | 16:02 |
auniyal | o/ | 16:02 |
bauzas | okay, let's start | 16:02 |
bauzas | #topic Bugs (stuck/critical) | 16:03 |
bauzas | #info No Critical bug | 16:03 |
bauzas | #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 47 new untriaged bugs (+1 since the last meeting) | 16:03 |
bauzas | #info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster | 16:03 |
Uggla | o/ | 16:03 |
bauzas | artom: you're next in the roster list, fancy trying to look at some upstream bugs ? | 16:03 |
bauzas | artom seems to be offline, let's move on and we'll see | 16:04 |
bauzas | #info bug baton is artom | 16:04 |
bauzas | #undo | 16:04 |
opendevmeet | Removing item from minutes: #info bug baton is artom | 16:04 |
artom | Eh? No I'm here | 16:04 |
bauzas | #info bug baton is tentatively artom | 16:04 |
bauzas | artom: I was just asking you whether you were happy to go looking at Launchpad | 16:05 |
artom | (to my own surprise, I should say - I've been trying a new wayland-native IRC client, and it got lost somewhere on my 9 workspaces) | 16:05 |
bauzas | trust me, this is a happy place compared to some other bug reporting tools I know | 16:05 |
artom | Yep, I'll launch all the triage pads | 16:05 |
artom | And pad all the triage launches | 16:06 |
bauzas | I give you my pad, bro | 16:06 |
bauzas | anyway, moving on | 16:06 |
artom | Pad accepted, brah | 16:06 |
bauzas | #topic Gate status | 16:06 |
bauzas | #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs | 16:06 |
bauzas | this was a funny week | 16:07 |
sean-k-mooney | dansmith: have you seen that grenade failure more than once | 16:07 |
bauzas | but afaict, the gate postfailure is now fixed ? | 16:07 |
dansmith | sean-k-mooney: yeah | 16:07 |
dansmith | looks like failure to ssh to the test instance though | 16:07 |
dansmith | I haven't dug in deep yet | 16:07 |
sean-k-mooney | ok so that might be another gate blocker | 16:08 |
bauzas | shit | 16:08 |
sean-k-mooney | ya it looks like an ssh issue but i'm not sure if it's consistent or intermittent | 16:08 |
dansmith | idk, but it's blocking the fix for another blocker :) | 16:08 |
bauzas | I don't know about you folks, but I have the impression that the universe is after me | 16:08 |
sean-k-mooney | for context https://github.com/openstack/grenade/blob/master/projects/70_cinder/resources.sh#L240 is failing | 16:09 |
sean-k-mooney | but have not looked at it properly either | 16:09 |
bauzas | ack | 16:09 |
bauzas | I guess the cinder team is fully aware of the situation ? | 16:10 |
dansmith | I doubt it's a cinder problem | 16:10 |
bauzas | but since we ssh to the guest, this is our mud, right ? | 16:10 |
sean-k-mooney | if they run grenade maybe but i just saw it this morning | 16:10 |
sean-k-mooney | i'm logging in to opensearch to see how common it is now | 16:11 |
bauzas | and no guest console saying anything ? | 16:11 |
sean-k-mooney | again i haven't debugged it so i'm not sure | 16:11 |
dansmith | no guest console dump in grenade | 16:11 |
dansmith | that's a tempest thing | 16:11 |
dansmith | let's not debug here | 16:11 |
bauzas | I quite agree with the fact that we shouldn't debug now | 16:12 |
bauzas | moving on so | 16:12 |
bauzas | #link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&pipeline=periodic-weekly Nova&Placement periodic jobs status | 16:12 |
sean-k-mooney | looks like 100 failures in the last 30 days | 16:12 |
sean-k-mooney | but ya lets move on | 16:12 |
bauzas | some reds, but the gate hasn't been happy these days, we'll see next week | 16:13 |
bauzas | #info Please look at the gate failures and file a bug report with the gate-failure tag. | 16:13 |
bauzas | #topic Release Planning | 16:13 |
bauzas | #link https://releases.openstack.org/caracal/schedule.html | 16:14 |
bauzas | #info Nova deadlines are not yet defined and will be once the PTG happens | 16:14 |
bauzas | #info Caracal-1 milestone in 4 weeks | 16:14 |
bauzas | we'll discuss about spec review days next week at the PTG | 16:14 |
bauzas | but this is a good idea to propose your specs this week or the week after, since we'll be around at the same time | 16:14 |
bauzas | message is sent, moving on | 16:15 |
bauzas | #topic Caracal vPTG planning | 16:15 |
bauzas | #info Sessions will be held virtually October 23-27 | 16:15 |
bauzas | which is next week, basically | 16:15 |
bauzas | let's be honest, I'm late in terms of preparing this PTG | 16:15 |
opendevreview | Takashi Kajinami proposed openstack/nova master: Drop remaining deprecated upgrade_levels option for nova-cert https://review.opendev.org/c/openstack/nova/+/898613 | 16:15 |
bauzas | but we will already have a nova-cinder x-p session | 16:16 |
bauzas | FYI | 16:16 |
bauzas | feel free to add any cinder-related topics in the nova ptg etherpad, I'll move them to the right place in order for them to be discussed | 16:16 |
bauzas | no other PTLs have come to me so far | 16:16 |
bauzas | but I guess this will come soon | 16:17 |
bauzas | I also haven't seen yet any cross-project-ish topic in the nova ptg etherpad, but if so, I'll do the liaison | 16:17 |
bauzas | #info Register yourselves on https://ptg2023.openinfra.dev/ even if the event is free | 16:17 |
bauzas | this is free, tbc. | 16:17 |
bauzas | #link https://etherpad.opendev.org/p/nova-caracal-ptg PTG etherpad | 16:18 |
bauzas | that's the etherpad I was referring to, seconds ago | 16:18 |
sean-k-mooney | we can probably just join their room if they have one or two topics | 16:18 |
bauzas | and yet the reminder | 16:18 |
bauzas | #info add your own topics into the above etherpad if you want them to be discussed at the PTG | 16:18 |
bauzas | sean-k-mooney: yeah that's the plan, the nova-cinder x-p session will be in their room | 16:18 |
bauzas | the exact timing of the nova-cinder session is already written in the etherpad (wed 5pm IIRC) | 16:19 |
bauzas | or maybe thur, my brain is playing with me | 16:19 |
bauzas | that reminds me, we shall cancel next team meeting | 16:20 |
bauzas | anybody disagrees ? | 16:20 |
bauzas | I take your silence as no | 16:20 |
* sean-k-mooney nods | 16:21 | |
dansmith | obvious :) | 16:21 |
bauzas | #agreed next Nova weekly meeting on Oct 24 is CANCELLED, go join the PTG instead, you'll have fun | 16:21 |
* bauzas doesn't know what to do in order to incentivize operators to join | 16:21 |
bauzas | I should try to juggle balls, maybe | 16:22 |
bauzas | oh, last point | 16:22 |
bauzas | shall we run again this marvelous and successful experience that is the operator-hour ? | 16:22 |
bauzas | I mean, I can just unbook some nova slot and officially pretend this is an operator hour | 16:23 |
dansmith | I think we should try yeah | 16:23 |
bauzas | and if nobody steps up, which would be sad and unfortunately expected, we could just consume the regular nova etherpad | 16:23 |
bauzas | okay, then I'll do the flip for tuesday 4pm | 16:24 |
bauzas | 4pm UTC allows us to get EU and US east-coast ops | 16:24 |
bauzas | and we could continue to discuss at 5pm if we really have audience and hot topics | 16:25 |
bauzas | #action bauzas to set up some operator hour, preferably Tuesday around 4pm UTC | 16:25 |
bauzas | #topic Review priorities | 16:26 |
bauzas | #link https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement+OR+project:openstack/os-traits+OR+project:openstack/os-resource-classes+OR+project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/osc-placement)+(label:Review-Priority%252B1+OR+label:Review-Priority%252B2) | 16:26 |
bauzas | #info As a reminder, people eager to review changes can +1 to indicate their interest, +2 for asking cores to also review | 16:26 |
bauzas | yet again, taking the action to propose a Gerrit dash, once I have 5 mins of my time for doing this easy peasy | 16:26 |
bauzas | #topic Stable Branches | 16:26 |
bauzas | elodilles: ? | 16:26 |
elodilles | yepp | 16:27 |
elodilles | i'm not aware of any stable gate issues | 16:27 |
bauzas | the universe is smiling at me then | 16:27 |
elodilles | though nova-ceph-multistore is suspicious on stable/victoria | 16:27 |
elodilles | but need more check | 16:27 |
elodilles | otherwise gates should be OK | 16:28 |
elodilles | also, some bug fixes landed already on stable/2023.1 (antelope), so i'll propose a release patch, if that's OK for people | 16:28 |
bauzas | elodilles: thanks | 16:28 |
bauzas | elodilles: and yeah, sounds cool to me | 16:29 |
elodilles | ++ | 16:29 |
elodilles | and the usual: | 16:29 |
elodilles | #info stable branch status / gate failures tracking etherpad: https://etherpad.opendev.org/p/nova-stable-branch-ci | 16:29 |
elodilles | add stable gate issues there ^^^ | 16:29 |
bauzas | fwiw, we'll discuss the State of Wallaby and Ussuri | 16:29 |
elodilles | if you encounter any | 16:29 |
bauzas | at the PTG* | 16:29 |
elodilles | bauzas: ACK | 16:29 |
elodilles | (wallaby, victoria, ussuri, you mean?) | 16:30 |
elodilles | anyway, we'll discuss at PTG :) | 16:30 |
* bauzas doesn't know why but I always skip victoria | 16:30 | |
bauzas | this is like the thanos blip | 16:30 |
bauzas | I pretend victoria never existed | 16:31 |
bauzas | elodilles: heard any other projects besides Cinder pursuing the idea to drop those releases ? | 16:31 |
elodilles | bauzas: well, some projects have eol'd their xena already | 16:32 |
elodilles | bauzas: let me check quickly | 16:32 |
bauzas | see, the ship has sailed then | 16:32 |
bauzas | should be a very quick discussion at the PTG then | 16:32 |
elodilles | kolla & magnum, otherwise i think projects are quiet about it yet | 16:32 |
elodilles | so no other projects yet | 16:33 |
*** ralonsoh is now known as ralonsoh_ooo | 16:33 | |
bauzas | okay, we'll see at the PTG | 16:33 |
elodilles | +1 | 16:34 |
bauzas | thanks | 16:34 |
elodilles | ++ | 16:34 |
bauzas | #topic Open discussion | 16:34 |
bauzas | I have nothing in the wikipage | 16:34 |
bauzas | anything anyone ? | 16:34 |
bauzas | looks not | 16:35 |
bauzas | have a good day everyone and thanks all | 16:35 |
bauzas | #endmeeting | 16:35 |
opendevmeet | Meeting ended Tue Oct 17 16:35:22 2023 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 16:35 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/nova/2023/nova.2023-10-17-16.01.html | 16:35 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/nova/2023/nova.2023-10-17-16.01.txt | 16:35 |
opendevmeet | Log: https://meetings.opendev.org/meetings/nova/2023/nova.2023-10-17-16.01.log.html | 16:35 |
elodilles | thanks bauzas o/ | 16:35 |
sean-k-mooney | bauzas: dansmith here is a short url to the grenade failures https://tinyurl.com/2a63sy7e | 16:38 |
dansmith | sean-k-mooney: the fix patch passed grenade (or that phase of grenade) this time, so clearly not a hard fail | 17:08 |
sean-k-mooney | ack | 17:21 |
sean-k-mooney | it looks like it started around october 5th | 17:22 |
auniyal | from this log https://3594ebcd65d47df3e70b-6ec9504d1ecc47a9ef6950d383ea355d.ssl.cf1.rackcdn.com/898435/1/check/nova-grenade-multinode/a3b1d83/controller/logs/grenade.sh_log.txt | 17:23 |
auniyal | I think it's this https://github.com/openstack/grenade/blob/master/projects/70_cinder/resources.sh#L245, i.e. connecting to test_vm via ssh to verify that verify.txt has the test string, because the create somehow failed at `2023-10-17 13:59:13.930` | 17:23 |
noonedeadpunk | Hey folks. I'm not really getting how to satisfy this check: https://opendev.org/openstack/nova/commit/27f384b7ac4f19ffaf884d77484814a220b2d51d | 18:02 |
noonedeadpunk | As eventually, you have a query for _compute_node_select where filter is `{"service_id": None}` | 18:03 |
noonedeadpunk | (as it's always being passed None as service_id here https://opendev.org/openstack/nova/src/branch/master/nova/db/main/api.py#L674) | 18:04 |
noonedeadpunk | And that eventually does not work at all for upgrades in our CI, and it's quite reproducible https://paste.openstack.org/show/bWODASkcA0PkKHUHf3xG/ | 18:05 |
noonedeadpunk | And that's on current HEAD of 2023.2 | 18:07 |
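For anyone trying to reproduce the failing condition noonedeadpunk describes: the check boils down to "are there compute_nodes rows whose service_id is NULL". A toy sqlite reproduction of that query (schema trimmed to illustrative columns, not nova's real model):

```python
import sqlite3

# Toy model of the query behind the check: compute_nodes rows with a NULL
# service_id are the "unmigrated" records that trip the upgrade check until
# every nova-compute has restarted on the new code.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE compute_nodes (id INTEGER, hypervisor_hostname TEXT, service_id INTEGER)"
)
conn.executemany(
    "INSERT INTO compute_nodes VALUES (?, ?, ?)",
    [(1, "node1", 10), (2, "node2", None), (3, "node3", None)],
)
# The {"service_id": None} filter must become "IS NULL" in SQL --
# "service_id = NULL" would match nothing.
unmigrated = conn.execute(
    "SELECT hypervisor_hostname FROM compute_nodes WHERE service_id IS NULL"
).fetchall()
print(unmigrated)
```

SQLAlchemy's `filter_by(service_id=None)` (which the nova DB API layer uses) emits exactly this `IS NULL` comparison, so passing `None` as the service id is a deliberate "find unmigrated records" query, not an accident.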
noonedeadpunk | dansmith: I know you're on a tc meeting now, but maybe you have some thoughts on that | 18:07 |
dansmith | noonedeadpunk: have you done the upgrade before you run that check? | 18:08 |
* noonedeadpunk need to check logs | 18:09 | |
dansmith | noonedeadpunk: perhaps you've done the upgrade but haven't started all the computes yet so they haven't fixed their records? | 18:10 |
dansmith | (as noted in the error message) | 18:10 |
noonedeadpunk | Yes, online data migrations was run on N-1 as well according to the logs | 18:11 |
noonedeadpunk | and that' the first thing I've checked - that all is running https://paste.openstack.org/show/b8ZgQRT6N2qmGRRkdEZz/ | 18:11 |
dansmith | the computes do their own migrations of this, so running online_data_migrations won't change it | 18:12 |
dansmith | hang on | 18:12 |
noonedeadpunk | well, talking about that - computes were not restarted yet after the upgrade of API | 18:13 |
noonedeadpunk | (I guess) | 18:13 |
dansmith | well, that'd be why then | 18:14 |
noonedeadpunk | but um... I guess then I'm confused at what point nova-status upgrade check should run? | 18:15 |
noonedeadpunk | As that is to ensure that things are ready for upgrade? and like - cancel upgrade if they're not? | 18:16 |
noonedeadpunk | So, in case of upgrade, you should now upgrade computes first and then only api/conductor/scheduler? | 18:16 |
noonedeadpunk | https://docs.openstack.org/nova/latest/cli/nova-status.html#upgrade `Performs a release-specific readiness check before restarting services with new code` | 18:18 |
noonedeadpunk | So exactly like I did - run check, it failed, services were not restarted with the new code... | 18:18 |
dansmith | noonedeadpunk: okay maybe that should have been a warning I guess, idk | 18:18 |
sean-k-mooney | noonedeadpunk: nova-status is meant to be run before you do any upgrades | 18:18 |
noonedeadpunk | ahaa..... that's what can be wrong | 18:19 |
noonedeadpunk | as I'm running it with 2023.1 code before restart | 18:19 |
noonedeadpunk | sorry | 18:19 |
noonedeadpunk | 2023.2 code | 18:19 |
dansmith | sean-k-mooney: running the new nova-status before you upgrade right? | 18:20 |
noonedeadpunk | so 1. upgrade code. 2. run check 3. restart services if it passes rollback if it's not | 18:20 |
dansmith | so we shouldn't have made that an error because without having run computes, you can't have had the uuids added | 18:20 |
noonedeadpunk | (it's current flow) | 18:20 |
sean-k-mooney | so to upgrade to 2023.2 from 2023.1 you would run the 2023.2 version of nova-status before doing the upgrade | 18:20 |
noonedeadpunk | yeah, I think that what I do actually | 18:21 |
noonedeadpunk | Like code is upgraded but services still running old code | 18:21 |
sean-k-mooney | dansmith: am i right about running the 2023.2 nova-status when upgrading to 2023.2 or should it be the 2023.1 version | 18:22 |
sean-k-mooney | i feel like i'm wrong about that | 18:22 |
sean-k-mooney | we are meant to deprecate things at least one release early, so we normally put the status check in with the deprecation as a warning | 18:23 |
dansmith | honestly, I think it's supposed to be "am I done with my upgrade yet" | 18:23 |
noonedeadpunk | but how with old code you can be aware if it's safe to upgrade to the new one? | 18:23 |
dansmith | which before you upgrade will be "no" | 18:23 |
dansmith | but maybe error should only be raised if you have to do something before upgrading, idk | 18:23 |
dansmith | I wrote that at your request (IIRC) and don't recall any review comments about it :) | 18:24 |
sean-k-mooney | :) | 18:24 |
sean-k-mooney | so the current check would only pass after all computes are fully upgraded, but it also should only be an error if the min compute service version is above the version where we started doing that | 18:25 |
sean-k-mooney | noonedeadpunk: is this currently blocking you from upgrading, by the way | 18:26 |
sean-k-mooney | or were you asking how to make the check happy | 18:26 |
sean-k-mooney | if the later then you need to start the computes with the bobcat code | 18:26 |
sean-k-mooney | and it should pass after they have all started | 18:26 |
noonedeadpunk | Well. It makes our upgrade code fail in osa | 18:26 |
noonedeadpunk | And I'm looking how to make our upgrade jobs happy as well as users that would perform upgrade | 18:27 |
sean-k-mooney | i would need to check our docs for which version of nova-status we expect to be run, and where | 18:27 |
noonedeadpunk | It says nothing after Xena.... | 18:28 |
noonedeadpunk | if you're about https://docs.openstack.org/nova/latest/cli/nova-status.html#upgrade | 18:28 |
sean-k-mooney | nova-status on n-1 tells you about the things that are deprecated that we know might break you in the future | 18:28 |
sean-k-mooney | so before upgrading you should fix those with the n-1 version | 18:28 |
sean-k-mooney | then do the upgrade. | 18:29 |
noonedeadpunk | mhm. Ok, I see, so it should run not on the upgraded code | 18:29 |
sean-k-mooney | ok so reading the doc text | 18:29 |
sean-k-mooney | it says "Performs a release-specific readiness check before restarting services with new code." | 18:30 |
sean-k-mooney | that would imply that the new check should be a warning in 2023.2 and an error in 2024.1 i think | 18:30 |
noonedeadpunk | Well, I read it in a way - upgrade code, run test, restart services | 18:30 |
dansmith | sean-k-mooney: yep | 18:30 |
sean-k-mooney | although with slurp it should be a warning in 2024.1 as well? | 18:31 |
noonedeadpunk | I'm not sure it should be a warning even.... | 18:31 |
noonedeadpunk | and yeah, error only in 2024.2 | 18:31 |
noonedeadpunk | As it is really expected that you won't have that at this stage, so what is there to warn about... | 18:31 |
sean-k-mooney | well it may mean you have old non upgraded compute service records | 18:32 |
sean-k-mooney | that is pretty common where people scale in or remove say ironic | 18:32 |
noonedeadpunk | but you 100% have them until you restart the computes, which will happen only afterwards? | 18:32 |
sean-k-mooney | and forget to remove the compute service records | 18:32 |
dansmith | noonedeadpunk: it needs to go from some state to "all good" after you're done with the upgade | 18:32 |
sean-k-mooney | that should be caught by the "Check: Older than N-1 computes" check | 18:32 |
dansmith | I don't think slurp has anything to do with this | 18:33 |
noonedeadpunk | well, it does, as otherwise you can't jump 2023.1 -> 2024.1? | 18:33 |
noonedeadpunk | as you have to do 2023.2 regardless ? | 18:33 |
noonedeadpunk | Or I'm totally lost why slurp even a thing... | 18:34 |
sean-k-mooney | so i think given where this is run in the upgrade it would only be valid to be an error if the min compute service version was above that in 2023.2 | 18:34 |
dansmith | slurp has nothing to do with it because it shouldn't be erroring out before you've done whatever upgrade will result in the new ids | 18:34 |
dansmith | sean-k-mooney: I think we could move it to an error and version check before we start relying on these, but no need to make it an error right now I think | 18:35 |
noonedeadpunk | well, docs for the command say, that it should be run BEFORE service restart. So you can't be done with upgrade, it's pre-upgrade check basically | 18:35 |
sean-k-mooney | dansmith: i agree | 18:35 |
sean-k-mooney | noonedeadpunk: yes it is | 18:36 |
noonedeadpunk | but tying it to the compute version is indeed a good idea | 18:36 |
sean-k-mooney | the command was intended to tell you before you upgrade that "x will break because you forgot to do something" | 18:36 |
noonedeadpunk | as if it's n-2 for 2024.1 -> warning, n-1 - error | 18:36 |
sean-k-mooney | noonedeadpunk: so 2024.1 will be the first release to fully support n-2 | 18:37 |
noonedeadpunk | yeah, but well, if the feature has appeared afterwards - you really can't do that before? | 18:37 |
dansmith | noonedeadpunk: can you file a bug so we can backport? | 18:37 |
noonedeadpunk | ++ | 18:37 |
sean-k-mooney | dansmith: we just need to change this to a warning, correct https://github.com/openstack/nova/blob/master/nova/cmd/status.py#L291-L302 | 18:38 |
dansmith | doing it now | 18:38 |
noonedeadpunk | just to state this one more time - the warning will fire in 100% of cases, right? | 18:38 |
sean-k-mooney | hmm yes, until the upgrade is complete | 18:39 |
dansmith | yes | 18:39 |
noonedeadpunk | And what the docs say is `At least one check encountered an issue and requires further investigation.` So everyone who does upgrades will spend some time finding out that they should just proceed, when that's not implied by the command's nature? | 18:39 |
dansmith | it wouldn't make much sense to say "you pass this test with OK" and then after you restart it goes to warning or error | 18:39 |
dansmith | if we had a way to skip a test because it's irrelevant, then we'd do that, otherwise I think warning makes sense | 18:39 |
noonedeadpunk | it does on 2024.1 | 18:39 |
noonedeadpunk | if you upgrade from 2023.2 | 18:40 |
sean-k-mooney | dansmith: if we modify the check to do a min compute version check it might, but then perhaps i was wrong to ask for this as it's not as useful as i thought | 18:40 |
dansmith | does what? make sense? I disagree | 18:40 |
sean-k-mooney | noonedeadpunk: basically i asked for this to catch the case where you are upgrading again and had not started the computes, i.e. to 2024.2 i guess | 18:41 |
noonedeadpunk | but then I'm really not getting what are expectations on me as operator to perform upgrade from 2023.1 to 2023.2 and what I should do when I see that warning | 18:41 |
dansmith | saying we checked a thing and everything is good, when we really checked and saw that you're not, but we're ignoring the done-ness of it because you're in the middle of an upgrade, doesn't seem to fit very well to me | 18:42 |
dansmith | like I say if we had "skipped" or "not due yet" then that'd make sense | 18:42 |
noonedeadpunk | yeah, I kind of already got what it does and why exist :) just trying to say that raise warning for expected behaviour is also... meh? | 18:42 |
sean-k-mooney | i wonder if it would be better to remove this check. i know i asked for it originally but i'm wondering if it's adding benefit. i'm also wondering if there is another bug | 18:45 |
sean-k-mooney | we are doing "cn_no_service = main_db_api.compute_nodes_get_by_service_id(ctx, None)" | 18:46 |
sean-k-mooney | so cn_no_service is the list of all compute nodes that have service id set to None | 18:46 |
dansmith | sean-k-mooney: it's really more applicable to the release where we start requiring it | 18:46 |
sean-k-mooney | dansmith: ya it is. i'm wondering what happens for ironic in this case | 18:47 |
dansmith | idk. I have a big "if ironic: return" in my head over all that stuff :P | 18:47 |
sean-k-mooney | i think this will always fail if you have ironic | 18:48 |
sean-k-mooney | but i am also pretty sure we have no test coverage there | 18:48 |
noonedeadpunk | https://bugs.launchpad.net/nova/+bug/2039597 | 18:49 |
sean-k-mooney | as in, in the compute manager i think you're correct, we have "if ironic: return" so i don't think we set the service id for ironic computes | 18:50 |
dansmith | sean-k-mooney: no I meant I have that conditional in my _brain_ :) | 18:50 |
sean-k-mooney | oh hehe | 18:50 |
dansmith | okay well, I have the FAIL->WARN thing queued, I can either submit that or change to a revert | 18:51 |
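The FAIL->WARN change dansmith queues up here follows the usual upgrade-check pattern: the same probe, with severity downgraded while the condition is still expected mid-upgrade. A self-contained sketch of that pattern; the enum and function names are illustrative stand-ins, not nova's actual oslo.upgradecheck API:

```python
import enum

class Code(enum.IntEnum):
    # Severity levels an upgrade check can report; illustrative stand-in
    # for the real oslo.upgradecheck result codes.
    SUCCESS = 0
    WARNING = 1
    FAILURE = 2

def check_compute_node_service_ids(unmigrated):
    """Report WARNING rather than FAILURE for unmigrated compute node
    records: until all nova-compute services restart on the new code such
    records are expected, so failing hard blocks a normal upgrade flow
    (the issue noonedeadpunk hit)."""
    if not unmigrated:
        return Code.SUCCESS, "all compute node records are tied to a service"
    return Code.WARNING, (
        f"{len(unmigrated)} compute node record(s) are not tied to a "
        "service; expected until all computes restart on the new code"
    )

code, detail = check_compute_node_service_ids(["node2", "node3"])
print(code.name, "-", detail)
```

As discussed later in the log, a stricter variant could keep FAILURE but only once the minimum compute service version proves the upgrade is past the release that introduced the migration.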
sean-k-mooney | given this conversation i'm inclined to revert | 18:53 |
* noonedeadpunk still thinks that WARN is not perfect | 18:53 | |
sean-k-mooney | and then when we actually start depending on this we can decide if we should add a status check | 18:53 |
sean-k-mooney | as you suggested before | 18:53 |
noonedeadpunk | I mean, I can ignore WARNs and just make them pass.... But what's the value of warnings then... | 18:53 |
noonedeadpunk | (or make it depend on the source version of the upgrade for 2024.1) | 18:54 |
sean-k-mooney | noonedeadpunk: no, warnings are meant to be "this thing is deprecated and you should fix it before upgrading again" | 18:54 |
dansmith | sean-k-mooney: well, maybe comment on that in the bug and we can get bauzas to opine tomorrow. ISTR he was pro check as well, but maybe only because you said it and it "seems right" :) | 18:55 |
noonedeadpunk | sean-k-mooney: then it's worth fixing docs to state that | 18:55 |
sean-k-mooney | if we keep the check we need to filter out ironic compute nodes too and/or add the min compute version check | 18:55 |
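The filtering sean-k-mooney describes can be sketched roughly as below. This is an illustrative stand-in, not nova's real schema or the actual `compute_nodes_get_by_service_id` implementation; the `ComputeNode` fields and helper name are invented for the example. The point is that ironic-managed nodes never get a `service_id`, so a check treating `service_id=None` as an error must exclude them:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for a nova ComputeNode DB record; the field
# names here are assumptions made for the sketch.
@dataclass
class ComputeNode:
    host: str
    hypervisor_type: str
    service_id: Optional[int]

def nodes_missing_service_id(nodes, skip_ironic=True):
    """Return compute nodes whose service_id was never set.

    Mirrors the idea behind compute_nodes_get_by_service_id(ctx, None):
    since ironic computes don't set a service id, an upgrade check that
    flags service_id=None must filter ironic nodes out to avoid always
    failing on ironic deployments.
    """
    return [
        n for n in nodes
        if n.service_id is None
        and not (skip_ironic and n.hypervisor_type == "ironic")
    ]

nodes = [
    ComputeNode("cn1", "QEMU", 7),
    ComputeNode("cn2", "QEMU", None),    # genuinely unmapped: report it
    ComputeNode("ir1", "ironic", None),  # expected for ironic: skip it
]
print([n.host for n in nodes_missing_service_id(nodes)])  # ['cn2']
```

Without the ironic filter, `ir1` would be reported too, which is exactly why the check as merged would always fail on an ironic deployment.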
sean-k-mooney | but sure, I'll comment on the bug | 18:56 |
sean-k-mooney | noonedeadpunk: to be honest we try to avoid requiring the operator to do things to be able to upgrade | 18:56 |
noonedeadpunk | As right now, reading the doc, it sounds a bit different: it's fine now, but will not be next time | 18:56 |
sean-k-mooney | so we very rarely add nova-status checks, because we try to make sure not to design something to require them | 18:57 |
noonedeadpunk | and you're doing a great job of that, it has to be said | 18:58 |
sean-k-mooney | if you look at the first couple, they are things like you must have placement, or cells v2 | 18:59 |
noonedeadpunk | And I'm not insisting that we're doing things right - it's just that we're doing them how it's written, more or less... And if you say we should do things differently - I'm really fine with that, and can propose an update to the docs as well | 18:59 |
sean-k-mooney | and more recently the service user token for the cve | 18:59 |
noonedeadpunk | Yeah, I guess one "but" for the service user token is that it's not since 24.0.0: it was in fact added during Zed and then backported... But anyway - it's a small detail | 19:00 |
noonedeadpunk | I'm also wondering how nobody has raised that thing until now... As we've seen a failure in CI right after the coordinated release, but nobody had time to look into the root cause... | 19:03 |
sean-k-mooney | right, so the upgrade check is meant to help catch this before you deploy to production | 19:05 |
sean-k-mooney | what this is also telling me is we don't actually fail on error in our grenade jobs. | 19:06 |
sean-k-mooney | that would have caught this before it merged, I think | 19:06 |
dansmith | well, tbh I think we're running it after upgrade to see if you've got everything done | 19:10 |
dansmith | and thus it's passing | 19:10 |
dansmith | it's supposed to be used for that as well.. basically "am I done with my homework" | 19:10 |
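The WARN/FAIL distinction being debated above can be sketched with a minimal, oslo.upgradecheck-style result aggregator. This is an invented simplification, not oslo or nova code; the assumption (consistent with the discussion) is that the overall exit status is the worst individual check result, so whether CI "fails on error" comes down to which exit codes the gate treats as fatal:

```python
from enum import IntEnum

# Result codes in the spirit of upgrade-check frameworks: higher is worse.
class Code(IntEnum):
    SUCCESS = 0
    WARNING = 1
    FAILURE = 2

def run_checks(checks):
    """Run every check and return the worst (highest) result code."""
    return max((check() for check in checks), default=Code.SUCCESS)

# One passing check, one warning: the overall result is WARNING.
overall = run_checks([lambda: Code.SUCCESS, lambda: Code.WARNING])
print(int(overall))  # 1

# A gate that only fails on FAILURE lets warnings through, which is
# noonedeadpunk's point: operators can ignore WARNs and still pass.
gate_fails = overall >= Code.FAILURE
print(gate_fails)  # False
```

Under this model, downgrading the check from FAIL to WARN makes upgrades stop breaking, but only because the gate ignores exit code 1.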
sean-k-mooney | ya that makes sense I guess, although this would still fail in the multinode job in that case | 19:11 |
sean-k-mooney | since we only upgrade the controller node and not the extra compute | 19:12 |
noonedeadpunk | Sorry, I think I have one more question. Today I got a quite stupid idea, but wanna confirm how stupid it is :) | 19:26 |
noonedeadpunk | How bad is it to run `nova-manage cell_v2 discover_hosts --by-service` if you don't have ironic? | 19:26 |
noonedeadpunk | I see there's a performance penalty of doing that in the docs... | 19:27 |
noonedeadpunk | But that would simplify some logic quite a lot on the other hand... | 19:27 |
noonedeadpunk | So wondering how bad that trade-off might be from the nova perspective | 19:30 |
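The trade-off noonedeadpunk is asking about can be illustrated with a toy simulation of the two discovery strategies. This is not nova's real implementation or schema: the data shapes and function names below are invented. The assumption, consistent with the nova-manage docs, is that the default mode scans unmapped compute node records, while `--by-service` walks nova-compute service records instead, which also catches ironic hosts (whose service may exist before any compute node record does) at the cost of extra lookups:

```python
def discover_by_compute_node(compute_nodes, mapped_hosts):
    """Default mode: find hosts from compute node records not yet mapped."""
    return sorted({cn["host"] for cn in compute_nodes} - mapped_hosts)

def discover_by_service(services, mapped_hosts):
    """--by-service mode: walk nova-compute service records instead.

    Also finds hosts with no compute node record yet (the ironic case),
    at the cost of iterating every service record.
    """
    return sorted(
        {s["host"] for s in services if s["binary"] == "nova-compute"}
        - mapped_hosts
    )

services = [
    {"host": "hv1", "binary": "nova-compute"},
    {"host": "ironic-svc", "binary": "nova-compute"},  # no CN records yet
    {"host": "ctl1", "binary": "nova-conductor"},      # not a compute
]
compute_nodes = [{"host": "hv1"}]

print(discover_by_compute_node(compute_nodes, set()))  # ['hv1']
print(discover_by_service(services, set()))            # ['hv1', 'ironic-svc']
```

So without ironic the two modes should discover the same hosts, and the difference reduces to the extra per-service work the docs flag as a performance penalty.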
opendevreview | Alexey Stupnikov proposed openstack/nova master: Add functional tests to reproduce bug #1994983 https://review.opendev.org/c/openstack/nova/+/863416 | 19:57 |
opendevreview | Alexey Stupnikov proposed openstack/nova master: Add functional tests to reproduce bug #1994983 https://review.opendev.org/c/openstack/nova/+/863416 | 19:59 |
opendevreview | Alexey Stupnikov proposed openstack/nova master: Log some InstanceNotFound exceptions from libvirt https://review.opendev.org/c/openstack/nova/+/863665 | 19:59 |
colby__ | Hey guys. We are in the process of upgrading our hypervisors to Yoga. We are on CentOS 8 Stream. Once I update the openstack and qemu/libvirt packages, instances can no longer be live migrated to the hypervisor. I'm seeing the following: | 20:18 |
colby__ | qemu-kvm: Missing section footer for 0000:00:01.3/piix4_pm#0122023-10-17T19:07:20.896349Z qemu-kvm: load of migration failed: Invalid argument | 20:18 |
colby__ | Did the commands to migrate change at all with the Yoga release? Where can I see that in the code? I'm wondering if this is a qemu update issue | 20:19 |
opendevreview | Merged openstack/nova master: Install lxml before we need it in post-run https://review.opendev.org/c/openstack/nova/+/898435 | 20:50 |
*** haleyb is now known as haleyb|out | 22:30 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!