*** ralonsoh_ooo is now known as ralonsoh | 07:32 | |
opendevreview | Jorge San Emeterio proposed openstack/nova master: WIP: Look for cpu controller on cgroups v2 https://review.opendev.org/c/openstack/nova/+/873127 | 08:49 |
bauzas | gibi: so a bit of a heads-up | 09:26 |
bauzas | gibi: first, you may be interested in knowing what the logs tell us about the functests https://paste.opendev.org/show/bfvZX0XeKsELzY54EGb8/ | 09:27 |
bauzas | gibi: secondly, I created a docs patch and a follow-up for the cpu mgmt series https://review.opendev.org/c/openstack/nova/+/874514 and https://review.opendev.org/c/openstack/nova/+/874515/ | 09:28 |
bauzas | finally, I'll talk about the RC1 etherpad in the meeting https://etherpad.opendev.org/p/nova-antelope-rc-potential | 09:29 |
bauzas | we now have an LP tag for the antelope RC | 09:29 |
opendevreview | Jorge San Emeterio proposed openstack/nova stable/train: WIP: Fixing python-devel package for RHEL 8 https://review.opendev.org/c/openstack/nova/+/874547 | 10:13 |
opendevreview | Jorge San Emeterio proposed openstack/nova stable/train: Changing "python-devel" to "python3-devel" on bindep test requirements for RPM based distros. https://review.opendev.org/c/openstack/nova/+/874547 | 11:10 |
opendevreview | Alexey Stupnikov proposed openstack/nova stable/victoria: reenable greendns in nova. https://review.opendev.org/c/openstack/nova/+/833436 | 12:03 |
opendevreview | Rajesh Tailor proposed openstack/nova master: Handle InstanceExists exception for duplicate instance https://review.opendev.org/c/openstack/nova/+/860938 | 12:39 |
*** ralonsoh is now known as ralonsoh_lunch | 12:51 | |
*** ralonsoh_lunch is now known as ralonsoh | 13:31 | |
opendevreview | Jorge San Emeterio proposed openstack/nova stable/train: Indicate dependency on "python3-devel" for py3 based RPM distros. https://review.opendev.org/c/openstack/nova/+/874547 | 14:10 |
*** dasm|off is now known as dasm | 14:12 | |
opendevreview | Jorge San Emeterio proposed openstack/nova stable/train: Add binary test dependency "python3-devel" for py3 based RPM distros. https://review.opendev.org/c/openstack/nova/+/874547 | 14:12 |
opendevreview | Jorge San Emeterio proposed openstack/nova stable/train: [stable-only] Add binary test dependency "python3-devel" for py3 based RPM distros. https://review.opendev.org/c/openstack/nova/+/874547 | 14:13 |
gibi | bauzas: I will have to drop around 17:30 during the nova weekly meeting | 14:40 |
bauzas | gibi: ack, np | 14:40 |
elodilles | bauzas: are you editing the meeting page? let me know when i can update stable section | 14:43 |
bauzas | elodilles: do it now | 14:44 |
bauzas | elodilles: I'll add all the Bobcat plans and RC1 later | 14:44 |
elodilles | bauzas: done | 14:45 |
bauzas | all cool | 14:45 |
opendevreview | Jorge San Emeterio proposed openstack/nova master: WIP: Look for cpu controller on cgroups v2 https://review.opendev.org/c/openstack/nova/+/873127 | 14:45 |
elodilles | bauzas: btw, have you seen this? https://review.opendev.org/c/openstack/releases/+/874450 | 14:51 |
elodilles | (i know that you are busy with everything o:)) | 14:51 |
bauzas | elodilles: yup, it's now in the RC1 etherpad | 14:51 |
bauzas | we'll discuss it in the meeting | 14:52 |
elodilles | bauzas: ++ | 14:52 |
opendevreview | Merged openstack/nova-specs master: Create specs directory for 2023.2 Bobcat https://review.opendev.org/c/openstack/nova-specs/+/872068 | 15:39 |
bauzas | #startmeeting nova | 16:00 |
opendevmeet | Meeting started Tue Feb 21 16:00:38 2023 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot. | 16:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 16:00 |
opendevmeet | The meeting name has been set to 'nova' | 16:00 |
Uggla | o/ | 16:00 |
bauzas | hey folks, hola everyone | 16:00 |
dansmith | o/ | 16:01 |
bauzas | #link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting | 16:01 |
*** artom_ is now known as artom | 16:01 | |
elodilles | o/ | 16:01 |
bauzas | let's start, some people have to leave early | 16:01 |
bauzas | #topic Bugs (stuck/critical) | 16:01 |
bauzas | #info No Critical bug | 16:02 |
bauzas | #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 16 new untriaged bugs (-1 since the last meeting) | 16:02 |
gibi | o/ | 16:02 |
bauzas | auniyal helped me with triage | 16:02 |
bauzas | I created an etherpad | 16:02 |
bauzas | and I have a bug I'd like to discuss with you folks | 16:02 |
bauzas | #link https://etherpad.opendev.org/p/nova-bug-triage-20230214 | 16:02 |
bauzas | the bug in question : | 16:03 |
bauzas | #link https://bugs.launchpad.net/nova/+bug/2006770 | 16:03 |
bauzas | as you can see, I set it to Opinion | 16:03 |
bauzas | tl;dr: this is about our ip query param for instances list | 16:03 |
bauzas | we directly call Neutron to get the ports | 16:03 |
bauzas | it basically works, but the reporter had some concerns | 16:04 |
bauzas | do people want to discuss this bug now or later? | 16:05 |
bauzas | (we can discuss it in the open disc topic if we have time) | 16:05 |
bauzas | let's say later then :) | 16:06 |
bauzas | (people can lookup the bug if they want meanwhile) | 16:06 |
dansmith | opinion seems right to me :) | 16:06 |
bauzas | let's discuss this then later in the open discussion topic | 16:06 |
bauzas | so people will have time | 16:06 |
bauzas | moving on | 16:06 |
bauzas | #info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster | 16:07 |
bauzas | Uggla: does it work for you to take the baton this week? | 16:07 |
Uggla | bauzas, ok | 16:07 |
bauzas | ack | 16:07 |
bauzas | #info bug baton is being passed to Uggla | 16:07 |
bauzas | #topic Gate status | 16:07 |
bauzas | #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs | 16:07 |
bauzas | but the best is to track the etherpad | 16:07 |
bauzas | #link https://etherpad.opendev.org/p/nova-ci-failures | 16:08 |
bauzas | it was a dodgy week | 16:08 |
dansmith | so, | 16:08 |
dansmith | this got merged: https://review.opendev.org/c/openstack/devstack/+/873646 | 16:08 |
dansmith | which seems to allow halving the memory used by mysqld | 16:08 |
bauzas | haha, gtk | 16:09 |
dansmith | which may help with the OOM issues we see, especially in the fat jobs like ceph-multistore | 16:09 |
dansmith | we could enable that in our nova-ceph-multistore job if we want to be on the leading edge and try to make sure that it's actually helping | 16:09 |
dansmith | (it's opt-in right now) | 16:09 |
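A minimal sketch of what opting in could look like in nova's .zuul.yaml, assuming the devstack change above exposes the behaviour as a `MYSQL_REDUCE_MEMORY` devstack variable (the name is taken from the linked review; treat it as illustrative and verify against the merged change):

```yaml
# Hypothetical job variant: opt the ceph job into the devstack
# mysql memory-reduction flag via the usual devstack_localrc vars.
- job:
    name: nova-ceph-multistore
    vars:
      devstack_localrc:
        MYSQL_REDUCE_MEMORY: true
```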
bauzas | indeed, I'll double check later if we continue to have some OOM issues | 16:09 |
bauzas | ah my bad | 16:09 |
dansmith | we could remove it if it causes other problems, but.. might be good to try it | 16:09 |
bauzas | surely | 16:10 |
bauzas | dansmith: thanks for working on it | 16:10 |
bauzas | dansmith: I can write a zuul patch for nova | 16:10 |
dansmith | I can do it too, just wanted to socialize | 16:10 |
bauzas | dansmith: ack cool then, ping me for reviews | 16:10 |
dansmith | ack | 16:11 |
bauzas | ++ again | 16:11 |
bauzas | dansmith: I also need to look at all the Gerrit recheck comments I wrote last week | 16:12 |
bauzas | I maybe found some other races | 16:12 |
bauzas | but we'll see | 16:12 |
bauzas | we also have the OOM logger patch that was telling us a few things | 16:13 |
bauzas | https://paste.opendev.org/show/bfvZX0XeKsELzY54EGb8/ | 16:13 |
bauzas | but let's discuss this off-meeting | 16:13 |
* gibi had no time to look at the extra logs from the functional race | 16:14 | |
bauzas | gibi: basically, each of the 6 failures with logs was in a different functest | 16:15 |
gibi | cool, that can serve as a basis for a local repro | 16:15 |
bauzas | anyway, moving on | 16:15 |
sean-k-mooney1 | bauzas: they are all in the libvirt test suite | 16:15 |
sean-k-mooney1 | so likely all have the same common issue | 16:16 |
sean-k-mooney1 | but ya, let's move on | 16:16 |
bauzas | maybe, I haven't had time yet to look at the code | 16:16 |
bauzas | #link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&pipeline=periodic-weekly Nova&Placement periodic jobs status | 16:16 |
bauzas | all of them are green ^ | 16:16 |
bauzas | #info Please look at the gate failures and file a bug report with the gate-failure tag. | 16:16 |
bauzas | #info STOP DOING BLIND RECHECKS aka. 'recheck' https://docs.openstack.org/project-team-guide/testing.html#how-to-handle-test-failures | 16:16 |
bauzas | that's it | 16:16 |
bauzas | #topic Release Planning | 16:16 |
bauzas | #link https://releases.openstack.org/antelope/schedule.html | 16:16 |
bauzas | so we're now in Feature Freeze | 16:17 |
bauzas | #link https://etherpad.opendev.org/p/nova-antelope-blueprint-status Blueprint status for 2023.1 | 16:17 |
bauzas | you can see what we merged | 16:17 |
bauzas | I also created two changes for my own series that were requested | 16:17 |
bauzas | but I'll ping folks tomorrow about them | 16:17 |
bauzas | #info Antelope-rc1 is in 1.5 weeks | 16:17 |
bauzas | now, we need to prepare for our RC1 where we branch master | 16:18 |
bauzas | #link https://etherpad.opendev.org/p/nova-antelope-rc-potential | 16:18 |
bauzas | as you see ^ I created an etherpad | 16:18 |
bauzas | thanks again btw, takashi, for creating some changes that are needed | 16:18 |
bauzas | as a reminder, if people find some bugs, they can use a specific tag : | 16:19 |
bauzas | https://bugs.launchpad.net/nova/+bugs?field.tag=antelope-rc-potential | 16:19 |
bauzas | before RC1, any bug report can use this tag, but we prefer to make sure they are regressions | 16:19 |
bauzas | after RC1, only regressions should use this tag | 16:20 |
bauzas | I created a cycle highlights change too : | 16:20 |
bauzas | #link https://review.opendev.org/c/openstack/releases/+/874483 Cycle highlights for Nova Antelope | 16:20 |
bauzas | please review it | 16:20 |
bauzas | at least gibi, dansmith, artom and other folks who had changes merged | 16:21 |
gibi | ack | 16:21 |
bauzas | I'll +1 on Thursday | 16:21 |
bauzas | we need to merge this before this Thursday for the Foundation marketing folks | 16:22 |
bauzas | we also have https://review.opendev.org/c/openstack/releases/+/874450 to +1 | 16:22 |
bauzas | I guess we're done with our clients | 16:22 |
bauzas | so I'll branch os-vif, osc-placement and novaclient unless people have concerns | 16:23 |
bauzas | as you see in the commit msg, it will be merged eventually on Friday | 16:23 |
* bauzas will just verify the SHA1 | 16:24 | |
elodilles | or earlier if a release liaison +1s it | 16:24 |
bauzas | elodilles: yup, but I'm asking people if they have concerns | 16:24 |
sean-k-mooney1 | speaking of which, I'm not sure if I will have time to continue doing that | 16:24 |
bauzas | looks not, so I'll just verify the SHA1s before +1ing | 16:24 |
sean-k-mooney1 | I may leave myself in for this cycle if no one else wants to take on that role | 16:24 |
bauzas | sean-k-mooney1: yup, I know and I was planning to ask you | 16:25 |
sean-k-mooney1 | but I am not sure of my availability to keep an eye on it this cycle | 16:25 |
*** sean-k-mooney1 is now known as sean-k-mooney | 16:25 | |
bauzas | ok, so maybe it's not time yet to ask if someone else wants to be a release liaison | 16:25 |
bauzas | but I'll officially ask it next week | 16:25 |
sean-k-mooney | ok | 16:25 |
bauzas | we can have more than one release liaison btw. | 16:26 |
bauzas | no need to remove you before someone arrives or something like that | 16:26 |
sean-k-mooney | ack; the primary role is to reduce the bus factor and ensure that releases are done correctly and in a timely fashion, so it does not all fall on the PTL | 16:26 |
bauzas | and we can even have *two* liaisons if we really find *two* people wanting to be :) | 16:26 |
bauzas | no need to battle :p | 16:26 |
bauzas | I'll explain next week what a release liaison is and what they do | 16:27 |
bauzas | but if people want, they can DM me | 16:27 |
bauzas | before next meeting | 16:27 |
bauzas | #info If someone wants to run as a Nova release liaison next cycle, please ping bauzas | 16:28 |
bauzas | I think that's it for the RC1 agenda | 16:28 |
bauzas | oh | 16:28 |
bauzas | one last thing | 16:28 |
bauzas | thanks to takashi, https://review.opendev.org/c/openstack/nova-specs/+/872068 is merged | 16:29 |
bauzas | you can now add your specs for Bocat | 16:29 |
bauzas | Bobcat even | 16:29 |
bauzas | like, people who had accepted specs for Antelope can just repropose them for Bobcat and I'll quickly +2/+W directly if nothing changes between both spec files | 16:30 |
* bauzas tries not to eye folks | 16:30 |
bauzas | I'll do the Launchpad Bobcat magic later next week (I guess) | 16:31 |
bauzas | that's it this time | 16:31 |
bauzas | #topic vPTG Planning | 16:31 |
bauzas | as a weekly reminder : | 16:31 |
bauzas | #link https://www.eventbrite.com/e/project-teams-gathering-march-2023-tickets-483971570997 Register your free ticket | 16:31 |
* gibi needs to drop, will read back tomorrow | 16:31 | |
bauzas | maybe you haven't seen but we are officially a PTG team | 16:31 |
bauzas | I don't know yet how long we could run the vPTG sessions | 16:32 |
bauzas | but like every cycle, I'll ask your opinions about the timing | 16:32 |
bauzas | not today, but once I'm asked | 16:32 |
bauzas | good time for saying | 16:32 |
bauzas | #link https://etherpad.opendev.org/p/nova-bobcat-ptg Draft PTG etherpad | 16:33 |
bauzas | I feel alone with this etherpad ^ | 16:33 |
bauzas | and I'm sure people have topics they want to discuss | 16:33 |
bauzas | anyway, moving on | 16:34 |
bauzas | (just hoping people read our meeting notes) | 16:34 |
bauzas | #topic Review priorities | 16:34 |
bauzas | #link https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement+OR+project:openstack/os-traits+OR+project:openstack/os-resource-classes+OR+project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/osc-placement)+(label:Review-Priority%252B1+OR+label:Review-Priority%252B2) | 16:34 |
bauzas | #info As a reminder, cores eager to review changes can +1 to indicate their interest, +2 for committing to the review | 16:34 |
bauzas | #topic Stable Branches | 16:35 |
bauzas | elodilles: your turn | 16:35 |
elodilles | #info stable gates seem to be OK (victoria gate workaround has landed and it is now unblocked) | 16:35 |
elodilles | well, unblocked | 16:35 |
elodilles | though it's not easy to merge patches everywhere | 16:35 |
elodilles | #info stable branch status / gate failures tracking etherpad: https://etherpad.opendev.org/p/nova-stable-branch-ci | 16:36 |
bauzas | indeed | 16:36 |
elodilles | that's the short summary | 16:36 |
bauzas | I still have the ussuri CVE VMDK fix to be merged | 16:36 |
bauzas | I rechecked it a few times | 16:36 |
bauzas | elodilles: thanks for the notes | 16:37 |
elodilles | np | 16:37 |
bauzas | #topic Open discussion | 16:37 |
bauzas | so, nothing on the agenda | 16:37 |
bauzas | we can discuss https://bugs.launchpad.net/nova/+bug/2006770 if people want or close the meeting | 16:37 |
bauzas | the fact is, I wrote Opinion | 16:37 |
bauzas | unless people have concerns with what I wrote, I'm done. | 16:38 |
bauzas | looks not | 16:39 |
bauzas | then I assume we're done. | 16:39 |
dansmith | ++ | 16:40 |
bauzas | thanks all | 16:40 |
bauzas | #endmeeting | 16:40 |
opendevmeet | Meeting ended Tue Feb 21 16:40:23 2023 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 16:40 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/nova/2023/nova.2023-02-21-16.00.html | 16:40 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/nova/2023/nova.2023-02-21-16.00.txt | 16:40 |
opendevmeet | Log: https://meetings.opendev.org/meetings/nova/2023/nova.2023-02-21-16.00.log.html | 16:40 |
elodilles | thanks o/ | 16:40 |
dansmith | bauzas: so, gmann and I were running that memory usage patch in periodic on tempest jobs for a few days to make sure it didn't substantially worsen things | 16:40 |
dansmith | and my survey at the moment indicates that it looks good | 16:41 |
dansmith | so I'll propose to make it enabled for ceph-multistore (which will also impact glance) and we'll see if gmann is cool with that when he's around | 16:41 |
bauzas | nice to hear | 16:41 |
bauzas | ack, do it and I'll vote | 16:41 |
opendevreview | Dan Smith proposed openstack/nova master: Use mysql memory reduction flags for ceph job https://review.opendev.org/c/openstack/nova/+/874664 | 16:45 |
dansmith | bauzas: ^ | 16:45 |
bauzas | dansmith: I doubt that cells_v2 map_instances could work with https://bugs.launchpad.net/nova/+bug/2007922 (even though I asked for it) | 17:43 |
bauzas | dansmith: tl;dr: the instance mapping exists but the cell value is None | 17:43 |
bauzas | and we know the instance is in cell0 DB | 17:43 |
dansmith | yeah, as I said, I initially missed that the person said they had reference in the mappings table | 17:43 |
bauzas | dansmith: I guess the simplest thing is to hack the DB to add the cell0 uuid in the InstanceMapping record, no? | 17:43 |
dansmith | probably | 17:44 |
bauzas | or do we have a better nova-manage command ? | 17:44 |
bauzas | looking at the docs, nope | 17:44 |
dansmith | not that I know of | 17:44 |
bauzas | this instance is somehow sitting in the middle | 17:44 |
bauzas | not fully migrated, but in between | 17:44 |
dansmith | not fully ... mapped? | 17:44 |
bauzas | sorry, yeah mapped | 17:45 |
bauzas | I'll propose the ALTER to the reporter | 17:45 |
dansmith | don't we have a mapped flag on the instance (or something else)? | 17:47 |
bauzas | in the instances table you mean ? | 17:48 |
dansmith | I thought it was.. that's how we survey instances that need to be mapped right? | 17:48 |
* bauzas just checks the map_instances code | 17:49 | |
dansmith | just wondering if that flag matches or not | 17:49 |
bauzas | so | 17:52 |
bauzas | https://github.com/openstack/nova/blob/master/nova/cmd/manage.py#L874 | 17:52 |
bauzas | we just iterate with a limit and a marker over the instances table of the given cell | 17:53 |
dansmith | ah right | 17:54 |
bauzas | and I think I understand how the cell ID was set to None | 17:54 |
bauzas | https://github.com/openstack/nova/blob/439c67254859485011e7fd2859051464e570d78b/nova/objects/instance_mapping.py#L73 | 17:54 |
dansmith | it only does that if it's not none though | 17:55 |
bauzas | anyway, map_instances *could* work with cell0 | 17:55 |
bauzas | if the reporter runs map_instances with cell0 attribute, it will loop over the contents of cell0's instances table and will create an instancemapping object | 17:56 |
bauzas | oh wait, fuck no | 17:56 |
bauzas | https://github.com/openstack/nova/blob/master/nova/cmd/manage.py#L791-L792 | 17:56 |
bauzas | so, definitely the easiest is to alter the db | 17:56 |
dansmith | again, I only thought it was useful to run map if the mapping didn't exist | 18:00 |
bauzas | yup | 18:03 |
bauzas | or the reporter could then delete the instance mapping | 18:03 |
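For reference, a hedged sketch of the object-level equivalent of the DB hack discussed above: repointing the orphaned InstanceMapping at cell0 through nova's own object model instead of a raw UPDATE. It assumes a node whose nova.conf has API database access, and `instance_uuid` stands in for the affected instance:

```python
from nova import config, context, objects

config.parse_args([])  # load nova.conf ([api_database] access assumed)
objects.register_all()
ctxt = context.get_admin_context()

# cell0 has a well-known uuid, exposed as CellMapping.CELL0_UUID
cell0 = objects.CellMapping.get_by_uuid(
    ctxt, objects.CellMapping.CELL0_UUID)

# The broken record: the mapping exists but its cell_mapping is None
im = objects.InstanceMapping.get_by_instance_uuid(ctxt, instance_uuid)
im.cell_mapping = cell0
im.save()
```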
sean-k-mooney | bauzas: elodilles: can we prioritise review of this if possible https://review.opendev.org/c/openstack/nova/+/874547 | 18:10 |
opendevreview | Takashi Natsume proposed openstack/placement master: Move implemented specs for Xena and Yoga release https://review.opendev.org/c/openstack/placement/+/853730 | 18:10 |
sean-k-mooney | this will help us fix our downstream ci | 18:11 |
bauzas | done, but I'll leave the +W to you as I don't have a lot of context | 18:13 |
sean-k-mooney | tl;dr we use bindep in our downstream jobs to install deps before running tox, but rhel 8 no longer has python-devel | 18:15 |
gmann | dansmith: +W on 'mysql memory reduction flags for ceph job' | 18:15 |
sean-k-mooney | i wanted to check with elodilles to make sure they were ok with the stable-only change | 18:15 |
dansmith | gmann: cool | 18:16 |
dansmith | thanks | 18:16 |
dansmith | gmann: oh jeez, I didn't realize the mysql periodic thing hadn't landed yet | 18:24 |
dansmith | gmann: do you think we should wait for that to soak for a bit? | 18:24 |
dansmith | I know the devstack one did, and I guess I misread that the tempest one hadn't yet | 18:24 |
gmann | dansmith: I also did not realize it when I checked that patch. but I think it is ok to enable it in ceph job and see. | 18:26 |
gmann | we can always revert it if it fails and starts delaying things during release time | 18:26 |
dansmith | okay, that's my preference too | 18:26 |
mnaser | i got a fun one. it looks like by default nova saves the az of the vm in the cell db, but it doesn't update the request_spec; when we do migrations, we pass the request_spec to the scheduler (which contains az=null), which can then move you from one az to another during the migration | 18:44 |
mnaser | since .. https://github.com/openstack/nova/blob/90e2a5e50fbf08e62a1aedd5e176845ee22d96c9/nova/scheduler/request_filter.py#L138-L166 checks for request_spec az | 18:45 |
sean-k-mooney | mnaser: this was changed recently | 18:45 |
mnaser | this is in a scenario where an operator wants to make vms stick to their az if a user doesn't specify one | 18:45 |
sean-k-mooney | right, so we spent a lot of time trying to decide what the semantics should be | 18:46 |
sean-k-mooney | I'm trying to find the spec | 18:46 |
sean-k-mooney | https://specs.openstack.org/openstack/nova-specs/specs/zed/implemented/unshelve-to-host.html | 18:47 |
sean-k-mooney | i guess this was for unshelve | 18:47 |
sean-k-mooney | mnaser: we expect that the request spec would not have the az, by the way, if the user did not request one | 18:48 |
mnaser | makes sense cause that's their request | 18:48 |
mnaser | i understand it might not be everyone that wants this, but maybe for live migration use case it can cause issues if nova ends up doing cross-az migrations | 18:49 |
sean-k-mooney | for move operations where we support specifying an AZ, it would be ok in some cases to set it in the request spec | 18:49 |
sean-k-mooney | mnaser: but we would want to have the same behavior as in the unshelve spec | 18:50 |
sean-k-mooney | i don't recall if we fixed the other move operations to be consistent with that when we did this | 18:50 |
sean-k-mooney | Uggla: do you recall | 18:50 |
sean-k-mooney | mnaser: with unshelve to a specific az, if you set it in the unshelve request and it was not set in the original request spec, it will be set after | 18:51 |
sean-k-mooney | mnaser: live migration does not currently support an az | 18:52 |
mnaser | sean-k-mooney: essentially I'm thinking this is where this can be changed https://github.com/openstack/nova/blob/f01a90ccb85ab254236f84009cd432d03ce12ebb/nova/compute/api.py#L5499-L5500 | 18:52 |
mnaser | cause live migrating from one az to another could pretty much fail, and we can just have it as an option i guess if we don't want to change default behaviour | 18:53 |
sean-k-mooney | nor does migrate | 18:53 |
mnaser | in most worlds migrate or live migrate will fail across az's | 18:53 |
mnaser | esp if you're using different storage backends for example | 18:53 |
sean-k-mooney | mnaser: this would be an api change and need a spec | 18:53 |
sean-k-mooney | in general, live migration between AZs will either work or not work depending on your deployment; I would expect it to work in most cases | 18:54 |
sean-k-mooney | it just comes down to whether you have exchanged ssh keys such that the hypervisors can communicate, and whether you are using AZs with cinder or not | 18:55 |
sean-k-mooney | and cross_az_attach | 18:55 |
mnaser | Maybe we can make it so that if cross_az_attach = false then it would update the request spec to match? | 18:56 |
sean-k-mooney | no | 18:56 |
sean-k-mooney | no config-driven api behavior | 18:56 |
sean-k-mooney | this is not a bug | 18:56 |
sean-k-mooney | if we want to support move operations targeting an AZ, or change the request spec, this is an api change | 18:56 |
mnaser | No it’s not to target an AZ | 18:56 |
sean-k-mooney | i know you'd prefer to keep affinity | 18:57 |
mnaser | it's so that it stays in the same AZ, as otherwise the live migration will fail | 18:57 |
sean-k-mooney | like a weigher or filter | 18:57 |
sean-k-mooney | however that is not what the end user asked for | 18:57 |
mnaser | if nova allows you to live migrate from one az to another for a vm with cross_az_attach set to false is that a bug ? | 18:58 |
sean-k-mooney | not a scheduler bug | 18:58 |
sean-k-mooney | it will fail in pre-live-migrate | 18:59 |
sean-k-mooney | and the vm will stay in active on the host | 18:59 |
sean-k-mooney | (source host) | 18:59 |
mnaser | now if you’re using rbd for images_type and you have 2 clusters with each az using different cluster | 18:59 |
mnaser | And you do a live migrate and end up with vm running on the other side and but using it’s original storage | 19:00 |
sean-k-mooney | then you need to configure your filters to ensure that you target the vsm to spcific cluster using a flavor or simialr | 19:00 |
mnaser | And then on resize ops it blows up horribly because it’s trying to use the destination cluster id | 19:00 |
sean-k-mooney | yep that operator error if they did not configure things properly to prevent this | 19:01 |
sean-k-mooney | adressing theses usecase is somethign that could be done but it would be a feature not a bug | 19:01 |
mnaser | How? So if you have 3 azs you create 3 flavors? | 19:01 |
sean-k-mooney | yep | 19:01 |
mnaser | Do you think that’s user friendly at all | 19:01 |
sean-k-mooney | nope, but it's how it's currently designed | 19:02 |
sean-k-mooney | and fixing it would not be a bug fix | 19:02 |
mnaser | So really what you’re saying is nova will do live migrations that will break your vm | 19:02 |
mnaser | And that’s not a bug | 19:02 |
sean-k-mooney | nope | 19:02 |
sean-k-mooney | nova checks if it can attach the volumes to the selected host before it live migrates | 19:03 |
sean-k-mooney | so it will pass the scheduler but fail in pre-live-migrate | 19:03 |
mnaser | ok, let's put that aside and talk about the users who use images_type=rbd | 19:03 |
mnaser | with different az's | 19:03 |
mnaser | it will break those vms | 19:03 |
sean-k-mooney | also live migrate is an admin-only api, and we allow you as an admin to select the host | 19:03 |
mnaser | ok when we're deploying openstack for customers they don't expect to sit and decide which host they are going to move things into at scale | 19:04 |
sean-k-mooney | mnaser: not if you use cross_az_attach=false | 19:04 |
mnaser | if i tell them 'sorry, openstack is kinda silly, it picks the wrong hosts, you just pick the right host yourself instead' | 19:04 |
mnaser | non-bfv, images_type=rbd, 2 az's with a ceph cluster each will result in broken live migrations | 19:04 |
sean-k-mooney | if you want to propose a new feature for this, I'm open to reviewing that | 19:04 |
sean-k-mooney | what i do not think would be correct is considering this a bug, when we previously declared it out of scope, and backporting this | 19:05 |
sean-k-mooney | mnaser: it would break if the ceph cluster was inaccessible, yes | 19:05 |
sean-k-mooney | although i believe | 19:06 |
sean-k-mooney | the vm would stay running on the source host in active | 19:06 |
sean-k-mooney | with the migration in error | 19:06 |
mnaser | and any reasonable operator would make a sane assumption that the cloud would not live migrate across az's | 19:06 |
sean-k-mooney | libvirt will detect the qemu instance was not able to connect | 19:06 |
mnaser | the vm does migrate if the cluster is accessible, and then all further operations like resize/migrate/etc are broken | 19:06 |
sean-k-mooney | and it should abort the migration | 19:06 |
mnaser | so it goes into a user-facing broken state | 19:06 |
sean-k-mooney | azs are not fault domains | 19:07 |
sean-k-mooney | or isolated segments | 19:07 |
sean-k-mooney | mnaser: i do not believe you will get into a user-facing broken state for live migration | 19:07 |
mnaser | you will.. if both ceph clusters are accessible, then the further operations will try to use the fsid of the target vm | 19:07 |
mnaser | i can ask to get tracebacks and logs from the customer | 19:08 |
mnaser | but it makes sense, since now it's trying to use the _new_ cluster fsid but doesn't find the volume, since it's attached from the old cluster fsid | 19:08 |
sean-k-mooney | if both are accessible and you only have ceph creds for one of them on the compute host, then qemu will not be able to connect | 19:08 |
sean-k-mooney | mnaser: that sounds like they are trying to use the same user/keyring between both clusters | 19:09 |
mnaser | ok, assume one cluster with different pools when you're using ceph then | 19:09 |
sean-k-mooney | which is incorrect | 19:09 |
mnaser | i haven't dug that deep into their stuff | 19:09 |
mnaser | now when nova tries to do things it'll do it on the new pool but can't find that _disk image | 19:10 |
sean-k-mooney | which will fail when we try to create the qemu instance on the dest | 19:10 |
sean-k-mooney | but the migration should abort then | 19:10 |
mnaser | isnt the old xml get transferred | 19:11 |
sean-k-mooney | and the vm should stay running on the source node in active | 19:11 |
mnaser | so it successfully completes? | 19:11 |
mnaser | s/isnt/doesnt/ | 19:11 |
sean-k-mooney | no, the vm gets created really really early on the dest | 19:11 |
mnaser | i don't think we rebuild xml from scratch on target but rather rely on shipping the xml from the old libvirt to the new one? | 19:11 |
sean-k-mooney | we have to create the vm on the dest so that the ram can be copied | 19:11 |
sean-k-mooney | mnaser: we generate a new xml on the source for the dest | 19:11 |
mnaser | ok something is not adding up then | 19:12 |
sean-k-mooney | so my expectation is that it should use the old cluster | 19:12 |
sean-k-mooney | so you would have cross-az traffic | 19:12 |
mnaser | oh ok right yes, it would add up nevermind | 19:12 |
mnaser | if we generate xml on source for the dest it'll have the old | 19:12 |
sean-k-mooney | what might break is a hard reboot after that | 19:12 |
mnaser | yes exactly, or resize, etc | 19:12 |
sean-k-mooney | right, but that's a completely different issue | 19:13 |
sean-k-mooney | we do not support move operations across different storage backends at all | 19:13 |
sean-k-mooney | and preventing that is left to the operator today; it has always been that way in nova | 19:13 |
mnaser | so as someone who's trying to get people to use openstack, this is giving them a big gun to shoot themselves in the foot | 19:13 |
mnaser | and then when they do, it doesn't seem very trivial and obvious that what they did is wrong | 19:14 |
sean-k-mooney | mnaser: the simpler approach is to use cells | 19:14 |
mnaser | when they went ahead, created azs, aggregates, etc | 19:14 |
sean-k-mooney | we do not allow cross-cell live migration | 19:14 |
mnaser | that's a really good point | 19:14 |
mnaser | so ensure same storage backend inside a cell | 19:14 |
mnaser | seems like pretty sane advice | 19:15 |
sean-k-mooney | yes | 19:15 |
sean-k-mooney | with all that said, we could work on a feature to address this | 19:15 |
sean-k-mooney | but it would be a new feature, and it would still have to allow use cases where cross-az move operations make sense | 19:15 |
sean-k-mooney | mnaser: for example we recently added a similar feature for neutron routed networks | 19:16 |
sean-k-mooney | https://specs.openstack.org/openstack/nova-specs/specs/wallaby/implemented/routed-networks-scheduling.html | 19:16 |
mnaser | sometimes i really feel letting users create az's was a massive mistake lol | 19:16 |
sean-k-mooney | well, users can't | 19:17 |
mnaser | it was always so loose and there's so many people who get shot in the foot with it | 19:17 |
mnaser | nah i mean from an operator perspective | 19:17 |
sean-k-mooney | it's admin-only unless you change the policy | 19:17 |
mnaser | people build out something and then it almost never gives them what they want | 19:17 |
sean-k-mooney | oh well, the issue is people confuse nova azs with aws azs | 19:17 |
sean-k-mooney | and they are nothing like each other | 19:17 |
mnaser | yeah | 19:17 |
sean-k-mooney | so before wallaby there was no scheduler support for routed l3 networks | 19:18 |
mnaser | aws has a strong presence so its natural to think of it that way | 19:18 |
sean-k-mooney | i.e. there was nothing preventing you from cold/live migrating to a host where that ip could not be routed | 19:18 |
sean-k-mooney | https://specs.openstack.org/openstack/nova-specs/specs/wallaby/implemented/routed-networks-scheduling.html added support for this | 19:18 |
sean-k-mooney | it would not be unreasonable to have a similar feature for nova storage | 19:19 |
sean-k-mooney | for example, if we used the ceph fsid to create a placement aggregate containing all hosts that were configured to use that ceph cluster | 19:19 |
sean-k-mooney | and then recorded that in the instance_system_metadata and scheduled based on that if set | 19:19 |
sean-k-mooney | we would just need to do member_of=<fsid> in the placement query | 19:20 |
mnaser | yeah, that seems like a handy simple way to track that for ceph | 19:20 |
sean-k-mooney | if rbd_fsid was in instance_system_metadata | 19:21 |
mnaser | i guess we would technically toss that into block device mapping data | 19:21 |
mnaser | i can't remember if nova uses that for its own storage | 19:21 |
sean-k-mooney | ish | 19:21 |
sean-k-mooney | we do in weird ways | 19:22 |
mnaser | maybe we should add a warning to the doc https://docs.openstack.org/nova/latest/admin/availability-zones.html about looking into using cells if you want to have full isolation and not allow migrations from one az to another | 19:22 |
sean-k-mooney | but this would be for the root disk really, although you could map cinder volumes to placement aggregates in a similar way | 19:22 |
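As a rough illustration of the member_of idea: if each ceph cluster's fsid were modelled as a placement aggregate with the matching compute nodes as members, the scheduler-side query could look roughly like this (a sketch; the credentials, `AUTH_URL`/`PASSWORD`, and the fsid-to-aggregate mapping `agg_uuid` are all assumed placeholders):

```python
from keystoneauth1 import loading, session

# Hypothetical admin credentials against keystone.
auth = loading.get_plugin_loader('password').load_from_options(
    auth_url=AUTH_URL, username='admin', password=PASSWORD,
    project_name='admin', user_domain_id='default',
    project_domain_id='default')
sess = session.Session(auth=auth)

# member_of restricts candidates to one aggregate (placement >= 1.21);
# agg_uuid is the hypothetical aggregate standing in for one ceph fsid.
resp = sess.get(
    '/allocation_candidates',
    params={'resources': 'VCPU:1,MEMORY_MB:512,DISK_GB:10',
            'member_of': agg_uuid},
    endpoint_filter={'service_type': 'placement'},
    headers={'OpenStack-API-Version': 'placement 1.21'})
candidates = resp.json()['allocation_requests']
```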
sean-k-mooney | bauzas: when you have time, reading back over ^ would be good | 19:23 |
mnaser | I'll push a PR to add some details about migrations and bring up cells | 19:23 |
sean-k-mooney | mnaser: cells are still not full isolation, but ya. | 19:24 |
mnaser | i have to be honest about my ability to provide help; spec + new feature discussion + all that is a bit too far of a reach for this | 19:24 |
sean-k-mooney | mnaser: the other approach would be to have a weigher | 19:24 |
sean-k-mooney | so an az affinity weigher | 19:24 |
mnaser | hmm | 19:24 |
mnaser | i could do that out of tree i guess | 19:24 |
mnaser | as i don't think nova would particularly want to carry that | 19:24 |
sean-k-mooney | we would need to pass the instance's current az to the scheduler, and then the weigher could prefer to stay in the same az | 19:25 |
sean-k-mooney | em, I would not be against having it | 19:25 |
sean-k-mooney | we would need to modify the destination object and add a preferred az field or something | 19:25 |
mnaser | i guess it can be a filter too but it would be very ugly | 19:26 |
mnaser | cause it would have to check if this is a reschedule (aka instance exists and we can find it) or first time (ignore) | 19:27 |
sean-k-mooney | well, it should not be a filter, because cross-az move operations are valid | 19:27 |
mnaser | ah yes also addressing that | 19:28 |
sean-k-mooney | mnaser: basically we could add a "current_az" field here https://github.com/openstack/nova/blob/master/nova/objects/request_spec.py#L1092-L1122 | 19:28 |
mnaser | this starts to enter the domain of requiring more resources/time than i have, so trying to see how i can be the most useful with the little resource i can spend on this 😅 | 19:29 |
sean-k-mooney | that's used in a few places, but we basically would just need to get the instance.az and pass it on | 19:29 |
sean-k-mooney | well, the simple solution is a docs patch + ptg topic | 19:29 |
sean-k-mooney | and i can raise it as an "operator pain point" internally and see if there is interest in addressing it | 19:30 |
sean-k-mooney | although i think we likely won't have time in the next cycle to work on this | 19:30 |
sean-k-mooney | there are potentially 2 features here: an az affinity weigher, and reporting ceph cluster reachability to placement | 19:31 |
sean-k-mooney | both help usability in different ways | 19:32 |
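A hedged sketch of what that out-of-tree az affinity weigher could look like. It assumes something upstream of the scheduler has recorded the instance's current AZ on the request spec, which, per the discussion above, nova does not do today for move operations:

```python
# Illustrative only: prefer hosts whose aggregates carry the same
# availability_zone as the one recorded on the request spec.
from nova.scheduler import weights


class AZAffinityWeigher(weights.BaseWeigher):
    """Prefer hosts in the AZ the instance currently lives in."""

    def _weigh_object(self, host_state, request_spec):
        current_az = request_spec.availability_zone
        if not current_az:
            return 0.0
        host_azs = {agg.metadata.get('availability_zone')
                    for agg in host_state.aggregates}
        # Reward same-AZ hosts but never exclude others: cross-az
        # moves stay valid, which is why this is a weigher, not a filter.
        return 1.0 if current_az in host_azs else 0.0
```

An out-of-tree class like this would be enabled via the [filter_scheduler]weight_classes option, alongside the default weighers.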
*** dasm is now known as dasm|off | 19:56 | |
simondodsley | In Train when I try to volume migrate a boot volume of a shutdown instance I get the message `Cannot 'swap_volume' instance xyx while it is in vm_state stopped` | 21:36 |
simondodsley | Is there any way to do this? | 21:36 |
simondodsley | Or was this something added after Train | 21:36 |