gibi | bauzas, dansmith: nice findings. I'm OK with the privsep fix and you can ping me with the governor check fix when it is available | 07:57 |
---|---|---|
gibi | regarding asserting the use of the decorator during testing we can build a list of filesystem.write(path, data) calls that we know require the privsep decorator and then check in the test that when those calls happen the func has _ENTRYPOINT_ATTR set. | 08:00 |
gibi | *filesystem.write_sys | 08:04 |
songwenping | sean-k-mooney:hi, does live migration pass filters if assigned the host? | 08:09 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Fix failed count for anti-affinity check https://review.opendev.org/c/openstack/nova/+/873216 | 08:16 |
gibi | sean-k-mooney: This ^^ was already approved but needed a rebase and a unit test fix due to the base changes. Could you check it please? | 08:17 |
gibi | I did not wait for the author with the rebase and went ahead and fixed up the patch | 08:17 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/2023.1: Fix failed count for anti-affinity check https://review.opendev.org/c/openstack/nova/+/885343 | 08:22 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/zed: Fix failed count for anti-affinity check https://review.opendev.org/c/openstack/nova/+/885344 | 08:30 |
bauzas | gibi: sorry for the late reply, but thanks | 08:33 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/yoga: Fix failed count for anti-affinity check https://review.opendev.org/c/openstack/nova/+/885345 | 08:36 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/xena: Fix failed count for anti-affinity check https://review.opendev.org/c/openstack/nova/+/885347 | 08:45 |
songwenping | gibi:morning, does live migration pass filters if assigned the host? | 08:46 |
gibi | songwenping: it depends. See the doc of the host and the force option in https://docs.openstack.org/api-ref/compute/?expanded=live-migrate-server-os-migratelive-action-detail#id131 | 08:47 |
songwenping | we use the Rocky version, and nova-conductor does not find a new destination. | 08:53 |
songwenping | and there is a problem: if the vm has affinity, it can be migrated to another host. | 08:53 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/wallaby: Fix failed count for anti-affinity check https://review.opendev.org/c/openstack/nova/+/885348 | 08:57 |
songwenping | then if we use the same affinity strategy to create vms, these vms are scheduled to different hosts. gibi, is this reasonable? | 08:57 |
gibi | if you use the affinity strategy then you cannot move the VM. Except if you disable the scheduler via the force flag and an old enough microversion. But if you disable the scheduler then the affinity will not be honored. | 08:59 |
gibi | If you need both affinity and move operations then you should use soft-affinity | 08:59 |
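Per the compute API reference linked above, a live migration is requested with the `os-migrateLive` server action. A sketch of the request body (the host name is illustrative; `host` may be `null` to let the scheduler choose, and the `force` field only exists in microversions 2.30 to 2.67):

```python
# Sketch of an os-migrateLive request body per the compute API reference.
# "target-host" is illustrative; pass None as host to let the scheduler pick.
body = {
    "os-migrateLive": {
        "host": "target-host",
        "block_migration": "auto",  # requires microversion >= 2.25
    }
}
```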
opendevreview | Balazs Gibizer proposed openstack/nova stable/victoria: Fix failed count for anti-affinity check https://review.opendev.org/c/openstack/nova/+/885349 | 09:08 |
songwenping | gibi, got it, thanks^^ | 09:09 |
opendevreview | Sylvain Bauza proposed openstack/nova master: cpu: make governors to be optional https://review.opendev.org/c/openstack/nova/+/885352 | 09:57 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/ussuri: Fix failed count for anti-affinity check https://review.opendev.org/c/openstack/nova/+/885353 | 10:04 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/train: Fix failed count for anti-affinity check https://review.opendev.org/c/openstack/nova/+/885355 | 10:19 |
sean-k-mooney | gibi: it's working on https://github.com/openstack-k8s-operators/nova-operator/pull/400 by the way | 10:22 |
sean-k-mooney | which ran at 2023-06-05 14:50:20 | 10:23 |
sean-k-mooney | so it passed yesterday | 10:23 |
gibi | ack | 10:23 |
gibi | you oddly switched from downstream slack to upstream irc to write that though :) | 10:24 |
sean-k-mooney | oh right, downstream is normally on the bottom half of my screen and upstream is the top | 10:24 |
sean-k-mooney | this window is in the wrong place | 10:25 |
sean-k-mooney | fixed :) | 10:25 |
gibi | :) | 10:25 |
opendevreview | Gorka Eguileor proposed openstack/nova master: Libvirt: remove old discard with virtio log https://review.opendev.org/c/openstack/nova/+/885356 | 11:07 |
*** EugenMayer44 is now known as EugenMayer4 | 11:21 | |
dvo-plv | gibi,bauzas: Hello, Could you please review nova patch: https://review.opendev.org/c/openstack/nova/+/876075 | 11:41 |
bauzas | dvo-plv: sure, I already promised but unfortunately I needed to work on my presentation for the OpenInfra Summit :( | 11:41 |
dvo-plv | Sure, thank you, no rush; review according to your plan. I just wanted to remind you in case the request was lost | 11:44 |
bauzas | I'm really sorry folks but I forgot to tell that today is the spec review day | 12:52 |
bauzas | !!! | 12:52 |
opendevmeet | bauzas: Error: "!!" is not a valid command. | 12:52 |
opendevreview | Amit Uniyal proposed openstack/nova master: Reproducer for dangling bdms https://review.opendev.org/c/openstack/nova/+/881457 | 14:12 |
opendevreview | Amit Uniyal proposed openstack/nova master: Delete dangling bdms https://review.opendev.org/c/openstack/nova/+/882284 | 14:12 |
opendevreview | Sylvain Bauza proposed openstack/nova master: cpu: make governors to be optional https://review.opendev.org/c/openstack/nova/+/885352 | 14:23 |
dansmith | bauzas: are you going to update the reno for the first patch? | 14:27 |
mnederlof | hi, i've created this bp https://blueprints.launchpad.net/nova/+spec/rbd-allow-glance-image-deletion and the code change required, can someone help with the next steps for review? https://review.opendev.org/c/openstack/nova/+/884595 | 14:32 |
bauzas | dansmith: yeah, I'm just fixing the series | 14:33 |
opendevreview | Sylvain Bauza proposed openstack/nova master: cpu: fix the privsep issue when offlining the cpu https://review.opendev.org/c/openstack/nova/+/885293 | 14:37 |
opendevreview | Sylvain Bauza proposed openstack/nova master: cpu: make governors to be optional https://review.opendev.org/c/openstack/nova/+/885352 | 14:37 |
bauzas | dansmith: gibi: just updated the cpu fixes ^ | 14:38 |
bauzas | elodilles: can you help me ? wanted to propose the train-eol patch but looked at the docs and saw https://docs.openstack.org/project-team-guide/stable-branches.html#end-of-life | 14:53 |
bauzas | "point #2 : Remove any related zuul jobs that are defined in other repositories and not needed anymore." | 14:53 |
bauzas | wdym by that ? | 14:53 |
bauzas | like in tempest ? | 14:54 |
elodilles | bauzas: any job that nova uses in its .zuul.yaml, but defined outside of nova repository | 14:59 |
bauzas | I don't see any of them | 14:59 |
elodilles | for example if there is let's say nova-special-grenade-train defined in, for example, openstack/grenade repository | 15:00 |
elodilles | bauzas: if there is none, then you're done with that step ;) | 15:01 |
bauzas | I'll doublecheck with Gerrit | 15:01 |
bauzas | gibi: btw. I have an appointment around 20 mins after the start of the meeting, can you chair it ? | 15:02 |
gibi | bauzas: I can try but I'm probably not the best person today as I will be on a flaky connection at that time | 15:10 |
bauzas | ok, I can ask someone else, I just wonder who | 15:10 |
bauzas | elodilles: want to lead it ? | 15:10 |
elodilles | bauzas: i'm not feeling quite well, so i'd rather pass this time :/ | 15:15 |
bauzas | okok | 15:16 |
bauzas | so, we'll try to have a quick meeting then | 15:16 |
elodilles | +1 | 15:16 |
gibi | bauzas: then I will jump in after you need to leave but I don't promise I will not get disconnected at some point :) | 15:18 |
dansmith | gibi: #chair me and I can recover it if you drop | 15:23 |
gibi | ack | 15:26 |
bauzas | dansmith: cool thanks for the offer | 15:38 |
bauzas | shit, my appointment just arrived | 15:50 |
bauzas | gibi: can you please lead it ? | 15:50 |
bauzas | the agenda is done https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting | 15:50 |
gibi | sure | 15:51 |
gibi | I will do | 15:51 |
gibi | #startmeeting nova | 16:00 |
opendevmeet | Meeting started Tue Jun 6 16:00:04 2023 UTC and is due to finish in 60 minutes. The chair is gibi. Information about MeetBot at http://wiki.debian.org/MeetBot. | 16:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 16:00 |
opendevmeet | The meeting name has been set to 'nova' | 16:00 |
gibi | #chair bauzas | 16:00 |
opendevmeet | Current chairs: bauzas gibi | 16:00 |
gibi | #chair dansmith | 16:00 |
opendevmeet | Current chairs: bauzas dansmith gibi | 16:00 |
dansmith | o/ | 16:00 |
auniyal | o/ | 16:00 |
elodilles | o/ | 16:00 |
gibi | bauzas has an appointment so I try to chair this | 16:01 |
gibi | #link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting | 16:01 |
gibi | #topic Bugs (stuck/critical) | 16:01 |
gibi | lets see | 16:01 |
gibi | #info No Critical bug | 16:01 |
gibi | #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 18 new untriaged bugs (+3 since the last meeting) | 16:02 |
gibi | #info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster | 16:02 |
gibi | last week the baton was at bauzas | 16:02 |
gibi | so I'm not sure if we have any news from him now | 16:03 |
gibi | the next on the roster is me | 16:03 |
bauzas | I'll take it next week | 16:04 |
gibi | but I will be mostly away next week | 16:04 |
Uggla_ | o/ | 16:04 |
bauzas | I didn't have time to look at them this week | 16:04 |
bauzas | Ditto due to the summit | 16:04 |
gibi | so moving down the list the next on it is melwitt | 16:04 |
bauzas | But I can try to look at them | 16:04 |
bauzas | (sorry on my phone) | 16:04 |
gibi | melwitt: could you take the baton? | 16:05 |
auniyal | gibi, the week before last I looked into this bug: https://bugs.launchpad.net/nova/+bug/2018719, I could not reproduce it, so I added a comment to ask for more info | 16:05 |
gibi | auniyal: ack | 16:06 |
gibi | I guess logging in to the rescue image depends on the actual image so you are right | 16:06 |
gibi | I will ping melwitt later about the bug baton | 16:07 |
gibi | any other bugs we need to discuss? | 16:07 |
auniyal | nothing from my side, thanks | 16:08 |
gibi | #topic Gate status | 16:09 |
gibi | #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs | 16:09 |
gibi | #link https://etherpad.opendev.org/p/nova-ci-failures | 16:10 |
dansmith | lots of little test failures lately which is making it challenging to get a clean result | 16:10 |
dansmith | but nothing outstanding as a super common thing to go tackle that I've seen | 16:10 |
gibi | I saw two different guest failures, in one case a disk IO error | 16:10 |
gibi | the other was probably some metadata error | 16:10 |
gibi | but I agree I did not see a pattern yet | 16:11 |
dansmith | I have seen some IO errors related to volumes yeah, but I don't know what that's coming from | 16:11 |
gibi | I don't see any new bug reported tagged with gate-failure. If I see a pattern in tomorrow's reject then I will file some | 16:12 |
gibi | s/reject/recheck/ | 16:13 |
bauzas | haven't seen any gate failure | 16:13 |
bauzas | (still otp) | 16:13 |
gibi | #link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&pipeline=periodic-weekly Nova&Placement | 16:14 |
gibi | periodics look good | 16:14 |
gibi | any other gate issues to raise? | 16:15 |
gibi | #info Please look at the gate failures and file a bug report with the gate-failure tag. | 16:15 |
dansmith | nothing from me | 16:15 |
bauzas | I'm back | 16:16 |
gibi | then the usual announcement | 16:16 |
gibi | #info STOP DOING BLIND RECHECKS aka. 'recheck' https://docs.openstack.org/project-team-guide/testing.html#how-to-handle-test-failures | 16:16 |
dansmith | fwiw, | 16:16 |
dansmith | I think we're doing quite well on the blind recheck thing.. not that we shouldn't remind people, but we could probably un-ALL-CAPS-ify that now :D | 16:17 |
bauzas | gibi: want me to take the chair seat again? | 16:17 |
gibi | dansmith: cool. I'm OK to uncap it :) | 16:17 |
dansmith | it's been tracked in the TC meeting and | 16:17 |
gibi | bauzas: the chair is yours :) | 16:17 |
dansmith | we seem to be settling around pretty good behavior | 16:17 |
bauzas | dansmith: lol, I'll change it :) | 16:17 |
bauzas | #topic Release Planning | 16:19 |
bauzas | #link https://releases.openstack.org/bobcat/schedule.html | 16:19 |
bauzas | #info Nova deadlines are set in the above schedule | 16:19 |
bauzas | #info Nova spec review day today | 16:19 |
bauzas | as a reminder ^ | 16:19 |
gibi | I completely missed that :/ | 16:19 |
bauzas | tbh, I wasn't able to do my duty but I'll do this later tonight | 16:19 |
bauzas | (some internal discussion ate my whole afternoon) | 16:20 |
bauzas | so, yeah, would be nice | 16:20 |
gibi | I saw that there is a spec proposal for continuing the PCI in placement work | 16:20 |
bauzas | nothing to tell apart from this | 16:20 |
bauzas | gibi: indeed, someone proposed | 16:20 |
gibi | I need to review that | 16:20 |
bauzas | cool | 16:20 |
gibi | but others can chime in there too :) | 16:20 |
bauzas | as a reminder, if folks don't have time to review specs today, that's fine (c) | 16:20 |
bauzas | but please try to look at them this week | 16:21 |
gibi | there is always tomorrow :) | 16:21 |
bauzas | at least before the Summit in case people discuss there | 16:21 |
bauzas | anyway, good related point, | 16:21 |
bauzas | #topic pPTG Planning | 16:21 |
bauzas | #info please add your topics and names to the etherpad https://etherpad.opendev.org/p/vancouver-june2023-nova | 16:21 |
bauzas | crickets in there ^ | 16:21 |
bauzas | so I'll write an -discuss ML thread for this | 16:22 |
gibi | nah, I added one thing now :D | 16:22 |
bauzas | in case ops or devs want to discuss with us | 16:22 |
bauzas | hehe | 16:22 |
bauzas | + I'll tell ops during our forum meet&greet about our PTG | 16:23 |
bauzas | #info The table #24 is booked for the whole two days. See the Nova community there! | 16:23 |
bauzas | that's it | 16:23 |
bauzas | moving on | 16:23 |
bauzas | #topic Review priorities | 16:23 |
bauzas | #link https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement+OR+project:openstack/os-traits+OR+project:openstack/os-resource-classes+OR+project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/osc-placement)+(label:Review-Priority%252B1+OR+label:Review-Priority%252B2) | 16:23 |
bauzas | #info As a reminder, cores eager to review changes can +1 to indicate their interest, +2 for committing to the review | 16:24 |
bauzas | #topic Stable Branches | 16:24 |
bauzas | elodilles is maybe afk | 16:24 |
bauzas | so lemme add his points | 16:24 |
bauzas | #info stable gates should be OK (from stable/2023.1 to stable/train) | 16:24 |
bauzas | #info stable branch status / gate failures tracking etherpad: https://etherpad.opendev.org/p/nova-stable-branch-ci | 16:24 |
bauzas | huzzah for this | 16:24 |
bauzas | and my point | 16:24 |
bauzas | #info train-eol patch proposed https://review.opendev.org/c/openstack/releases/+/885365 | 16:24 |
bauzas | I'd appreciate it if nova-cores could comment on it ^ | 16:25 |
dansmith | I will | 16:25 |
* gibi just proposed a train backport today | 16:25 | |
dansmith | I noticed the cinder people are taking a more aggressive approach | 16:25 |
bauzas | gibi: okay, then -1 my patch and I'll modify it to wait for your backport to merge | 16:25 |
bauzas | dansmith: yup, saw it too | 16:25 |
gibi | bauzas: I think I'm OK to drop that backport | 16:25 |
gibi | I did that to see if the patch works | 16:26 |
bauzas | gibi: as you want, just tell me your insights in the train-eol patch | 16:26 |
gibi | ack | 16:26 |
bauzas | dansmith: fwiw I'm afraid of EOLing the whole EM branches | 16:26 |
dansmith | shrug.. my argument for train applies to all the EM ones too | 16:27 |
dansmith | maybe we wait and see how it goes for cinder ;) | 16:27 |
bauzas | but since we haven't backported the os-brick CVE fix in Ussuri and Victoria, I could understand | 16:27 |
bauzas | but yeah, let's see what happens for cinder :D | 16:28 |
* bauzas takes his popcorn :) | 16:28 | |
bauzas | anyway, moving on | 16:28 |
bauzas | #topic Open discussion | 16:28 |
bauzas | none in the agenda | 16:28 |
bauzas | anything that someone wants to tell ? | 16:28 |
gibi | one thing | 16:30 |
bauzas | shoot | 16:30 |
gibi | there was a request for opinion about openstack on k8s | 16:30 |
gibi | let me find the link | 16:30 |
sean-k-mooney | its a thing that people do | 16:30 |
bauzas | okidoki, let's wait | 16:31 |
gibi | 13:01 < mdbooth> I'll be running a forum session on Kubernetes on OpenStack in Vancouver next week. It's for users and developers of all related projects to talk to each other. Etherpad is here if there's anything you'd like to discuss: https://etherpad.opendev.org/p/openinfra-2023-kubernetes-on-openstack | 16:32 |
sean-k-mooney | I'm wondering why mdbooth is running that session but ok | 16:32 |
dansmith | that's k8s on openstack not what you said right? | 16:32 |
gibi | sorry I mixed up | 16:32 |
sean-k-mooney | that's k8s on openstack | 16:32 |
sean-k-mooney | ya | 16:33 |
bauzas | ok, so lemme add the link | 16:33 |
sean-k-mooney | that makes more sense why mdbooth is involved | 16:33 |
bauzas | #link https://etherpad.opendev.org/p/openinfra-2023-kubernetes-on-openstack OpenInfra Forum session for discussing about k8s on openstack | 16:33 |
bauzas | gibi: the other way btw. :) | 16:33 |
gibi | there is some nova related question in the etherpad | 16:33 |
gibi | about getting notified if anything changed with servers | 16:34 |
gibi | I offered the nova notification inteface | 16:34 |
bauzas | yeah and it's a public API :) | 16:34 |
gibi | but apparently they want something public | 16:34 |
sean-k-mooney | the notifications are the only interface we have currently | 16:34 |
gibi | the API is documented but message bus access tends to be non-public | 16:34 |
bauzas | since someone worked on notifications objects like 6 years ago (guess who :p ) | 16:35 |
gibi | bauzas: hah :D | 16:35 |
bauzas | gibi: are you sure that the message bus can't be public ? | 16:35 |
dansmith | it shouldn't be | 16:35 |
gibi | yeah | 16:35 |
dansmith | we have instance events.. that's what they want I think | 16:35 |
sean-k-mooney | the notification bus is semi-privileged | 16:35 |
gibi | and our notifications tend to contain infra information | 16:35 |
dansmith | I think they just need an async way to get those | 16:35 |
sean-k-mooney | the notifications can have private info in them | 16:35 |
sean-k-mooney | depending on what you configure | 16:36 |
sean-k-mooney | like the bdms | 16:36 |
gibi | yeah, bottom line: the notification API is designed to be consumed by admins or other openstack services, not end users | 16:36 |
bauzas | ah they want it to be consumable by endusers ? | 16:36 |
dansmith | right, it would leak bad things between tenants for sure | 16:36 |
dansmith | not just infra things | 16:36 |
gibi | yeah | 16:36 |
bauzas | urgh | 16:37 |
sean-k-mooney | you could have a multi-tenant service that converts the notifications into a webhook callback or similar | 16:37 |
dansmith | instance events/actions is the right thing I think, it's just only polling currently | 16:37 |
sean-k-mooney | but that's still not great | 16:37 |
sean-k-mooney | dansmith: ya the event stream would work but I'm not sure that | 16:37 |
gibi | yeah, a websocket around instance actions would be nice | 16:37 |
sean-k-mooney | even if it was event based they would want to listen per instance | 16:37 |
sean-k-mooney | more like open a websocket and get all instance events for a project? | 16:37 |
dansmith | they probably want to be able to register a handler with a scope (one instance, all my instances) that lives for a period of time that we call when there's a new event | 16:38 |
sean-k-mooney | or that you are allowed to see based on the scope of the keystone token | 16:38 |
bauzas | I think everytime someone asks us to monitor some instance action, we tell them 'lookup the notifications' | 16:38 |
bauzas | but this is for admin usage | 16:38 |
dansmith | websocket will require a lot of standby resources that I think would be hard for us to manage | 16:38 |
gibi | true | 16:38 |
dansmith | anyway, | 16:38 |
bauzas | so, they want some enduser public subscription mechanism for asynchronously being notified on my instance state changes ? | 16:38 |
dansmith | not sure how many people will be there to make any sort of headway on that topic, since I think those people are likely here :) | 16:39 |
dansmith | bauzas: yeah | 16:39 |
bauzas | sounds like a client thing to me | 16:39 |
gibi | dansmith: I will be there hence collecting ideas here now :) | 16:39 |
sean-k-mooney | let's see if they can at least expand on the use cases | 16:39 |
dansmith | bauzas: a client can poll (or long poll) but that's much less efficient | 16:39 |
gibi | yeah I will pull out some specific use case and try to limit the scope to something very simple on our side | 16:39 |
dansmith | especially when instance actions could be days apart | 16:40 |
bauzas | but yeah, someone could provide some tool that would listen to the notification bus and scrub out all the admin-only data | 16:40 |
gibi | bauzas: exactly | 16:40 |
bauzas | sorry, by client I meant something unrelated to nova | 16:40 |
sean-k-mooney | so the way to do this in the past was ceilometer put the relevant events in AODH | 16:40 |
dansmith | they could, but that's basically re-constructing the tenant isolation that nova already has, so it's a big new surface to secure and new services to run | 16:40 |
sean-k-mooney | and then you would set up alarms on the events you cared about | 16:40 |
dansmith | sean-k-mooney: that's all intended to be operator-focused, not for users to get status/events on their instances right? | 16:41 |
sean-k-mooney | no | 16:41 |
gibi | if they only need a trigger to re-read the instance action API then most of the data can be hidden from our notifications | 16:41 |
sean-k-mooney | aodh and ceilometer provided user facing events/metrics | 16:41 |
dansmith | okay I didn't realize | 16:41 |
bauzas | yeah | 16:41 |
sean-k-mooney | they didn't actually expose the full notification | 16:41 |
bauzas | ceilometer was the fit | 16:42 |
sean-k-mooney | just instance boot started and instance boot finished events | 16:42 |
bauzas | and that's why we never had this in nova | 16:42 |
dansmith | honestly, I feel like this is probably something nova can/should be doing | 16:42 |
dansmith | nowadays this is how stuff plugs together | 16:42 |
sean-k-mooney | ya I think it's something we could do | 16:42 |
sean-k-mooney | but we need to think about how | 16:42 |
dansmith | making an external tool reconstruct what we already know is kinda :/ | 16:42 |
gibi | yeah | 16:43 |
dansmith | it could be a service like console that you run if you want, and scale separately to handle the amount of load you want to tolerate | 16:43 |
bauzas | dansmith: I'm still struggling to find how we would ensure the tenancy isolation by the message bus, but I'm open to ideas | 16:43 |
bauzas | unless we create a bus per tenant | 16:43 |
sean-k-mooney | if its in nova | 16:43 |
dansmith | bauzas: we wouldn't? | 16:43 |
sean-k-mooney | we can just filter | 16:43 |
bauzas | I'm maybe misunderstanding the proposal, but I thought we were saying that we may emit project-related notifications | 16:44 |
dansmith | not at the rabbit level | 16:44 |
dansmith | let's let gibi collect some data, | 16:44 |
dansmith | and then we probably need a high-bandwidth conversation about options | 16:44 |
gibi | bauzas: we def need to understand their use case better | 16:45 |
bauzas | dansmith: ah ok | 16:45 |
bauzas | dansmith: then we need to construct some HTTP/2 layer with keystone auth | 16:45 |
bauzas | or something like that | 16:45 |
dansmith | bauzas: not necessarily | 16:45 |
dansmith | it just depends.. but it should be HTTP-something, either an event stream or callbacks | 16:45 |
sean-k-mooney | you would do something like "openstack project event subscribe (instance.action.)*" which would return a websocket url that would only stream the relevant events for the current project based on the keystone token | 16:46 |
dansmith | yep, could be something like that | 16:46 |
bauzas | sounds very console-ish | 16:46 |
sean-k-mooney | so like the console, you would first create it and then get a handle for where to collect the data | 16:46 |
bauzas | but okay | 16:46 |
gibi | I can simplify it down to: give me a server uuid and I stream you data about notifications affecting the server, but only with very limited data provided | 16:46 |
dansmith | bauzas: exactly.. it's the same sort of arrangement, and the same target audience | 16:46 |
bauzas | ok, then it sounds we have an agreement on the direction, let's not overpaper the technical details | 16:47 |
sean-k-mooney | and if it's a separate binary, nova-event-proxy | 16:47 |
dansmith | gibi: yeah I just don't think you should need 100 websockets you have to read from if you have 100 instances in your wordpress deployment | 16:47 |
dansmith | sean-k-mooney: right | 16:47 |
sean-k-mooney | then its scalability and whether it's deployed is up to the operator | 16:48 |
dansmith | yup | 16:48 |
gibi | dansmith: ahh true, we can do it per project then | 16:48 |
bauzas | yeah, devil is in the details of the productization | 16:48 |
dansmith | gibi: or even server group | 16:48 |
sean-k-mooney | anyway let's see what they actually bring up | 16:48 |
bauzas | gibi: honestly, the granularity sounds per project to me | 16:48 |
sean-k-mooney | and see if this type of solution would work for them or not | 16:48 |
dansmith | well, nfv people are all one project in some cases, so that probably won't work for them | 16:48 |
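A toy sketch of the scoped filtering discussed above. This is purely hypothetical: the event fields and the idea of trimming payloads down to a bare "go re-read the instance-action API" trigger follow the discussion, not any existing nova interface:

```python
# Hypothetical sketch only: filter a notification stream down to one project's
# events, keeping just enough to act as a trigger to re-read the
# instance-action API. Field names here are illustrative, not a nova schema.
def filter_events(events, project_id):
    """Yield project-scoped events with sensitive payload fields stripped."""
    for event in events:
        if event.get("project_id") == project_id:
            yield {"server_uuid": event["server_uuid"],
                   "event_type": event["event_type"]}

events = [
    {"project_id": "p1", "server_uuid": "uuid-1",
     "event_type": "instance.update", "payload": {"bdms": "admin-only"}},
    {"project_id": "p2", "server_uuid": "uuid-2",
     "event_type": "instance.update", "payload": {}},
]
visible = list(filter_events(events, "p1"))
```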
bauzas | gibi: are you done with this topic now that we drafted a solution for you ? :D | 16:49 |
gibi | I'm done | 16:49 |
bauzas | cool | 16:49 |
gibi | thanks for the discussion | 16:49 |
gibi | I will link this to the etherpad | 16:49 |
gibi | and I will report back from the summit | 16:49 |
bauzas | cool | 16:49 |
bauzas | I'll be back watching you at the Summit anyway | 16:49 |
bauzas | so if you promise too many things, I could yell :p | 16:49 |
gibi | bauzas: please do so | 16:49 |
gibi | :D | 16:49 |
gibi | I don't need another 3 years of "notification" work | 16:50 |
dansmith | heh | 16:50 |
dansmith | still got scars eh/ | 16:50 |
bauzas | ok, I was toying with the idea of doing the paperwork for the scaphandre and manila series, but I'm exhausted today | 16:50 |
bauzas | so, let's skip it and pretend it will be discussed in two weeks from now | 16:50 |
gibi | dansmith: time makes all these memories nicer and nicer actually | 16:51 |
dansmith | heh | 16:51 |
gibi | so bauzas has a good point watching me :) | 16:51 |
dansmith | gibi: https://www.youtube.com/watch?v=dLjNzwEULG8 | 16:51 |
bauzas | gibi: you're fortunate, canadians don't open carry | 16:51 |
bauzas | sorry, was a terrible joke :) | 16:52 |
gibi | dansmith: I need to check this out after the meeting :) | 16:52 |
bauzas | anyway, I think we're done for today | 16:52 |
gibi | indeed | 16:52 |
bauzas | thanks all | 16:52 |
bauzas | and thanks gibi for the chair | 16:52 |
bauzas | #endmeeting | 16:52 |
opendevmeet | Meeting ended Tue Jun 6 16:52:50 2023 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 16:52 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/nova/2023/nova.2023-06-06-16.00.html | 16:52 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/nova/2023/nova.2023-06-06-16.00.txt | 16:52 |
opendevmeet | Log: https://meetings.opendev.org/meetings/nova/2023/nova.2023-06-06-16.00.log.html | 16:52 |
bauzas | gibi: there you go, you have your logs :p | 16:53 |
gibi | thanks | 16:54 |
gibi | and it was the moment when my connection dropped first | 16:54 |
gibi | so it was not that flaky after all | 16:54 |
dansmith | are we canceling the nova meeting next week? | 16:55 |
bauzas | dansmith: damn shit, forgot to tell it | 16:55 |
bauzas | unless people wanna run it, which I'm cool with | 16:55 |
geguileo | dansmith: sean-k-mooney last week I mentioned the os-brick idempotency and we talked about reconstructing the disk XML on some operations. I've just opened a bug (#2023078) where Nova is not rebuilding the disk XML after block migration. Don't know nova enough to know if this would be fixed with the changes to the os-brick idempotency thingy. | 16:56 |
sean-k-mooney | the xml for live migration is built on the source node based on info passed back from the destination host | 16:57 |
sean-k-mooney | for cold migration it's built on the dest host | 16:57 |
sean-k-mooney | I assume you are referring to live-migration with local block devices? | 16:57 |
geguileo | sean-k-mooney: the XML after live migration is wrong | 16:57 |
sean-k-mooney | that xml must be generated on the source host | 16:58 |
geguileo | sean-k-mooney: it doesn't have the right discard=unmap value even when the destination is saying that it is supported | 16:58 |
sean-k-mooney | is it enabled on the source? | 16:58 |
geguileo | sean-k-mooney: but if the source didn't support it and now it does? Then it cannot be supported until reboot? | 16:59 |
geguileo | Because then I think Cinder should start reporting that everything supports discard... | 16:59 |
sean-k-mooney | geguileo: correct | 16:59 |
sean-k-mooney | so this is not something that should be changing during a live migrate | 17:00 |
geguileo | ok, so then why do we even report the support for this thing? | 17:00 |
geguileo | We should always set discard=unmap in the XML | 17:00 |
sean-k-mooney | well we can't because our min libvirt/qemu did not support it | 17:01 |
geguileo | if it works, nice, if it doesn't it will not prevent it from working after a live migration | 17:01 |
sean-k-mooney | that may have changed recently but it was a limitation in the past | 17:01 |
opendevreview | sean mooney proposed openstack/nova master: Allow discard with virtio-blk https://review.opendev.org/c/openstack/nova/+/878795 | 17:01 |
geguileo | sean-k-mooney: but if it does now then maybe we have to give it another go at this whole thing | 17:01 |
sean-k-mooney | geguileo: without ^ we also don't support discard for all disk buses | 17:01 |
sean-k-mooney | geguileo: sure but we have to be careful to ensure that the move ops work properly | 17:02 |
geguileo | sean-k-mooney: that patch should be abandoned... I added a comment to the LP bug | 17:02 |
sean-k-mooney | that is not your patch | 17:02 |
sean-k-mooney | its my patch to fix the bug | 17:02 |
sean-k-mooney | that for some reason was not in launchpad | 17:02 |
geguileo | sean-k-mooney: how is it different to mine? | 17:02 |
geguileo | it's literally the same | 17:03 |
geguileo | and it doesn't work | 17:03 |
sean-k-mooney | it makes discard work for all disk buses | 17:03 |
geguileo | it just removes a debug log message | 17:03 |
geguileo | and the log is correct | 17:03 |
geguileo | discard doesn't work with virtio | 17:03 |
sean-k-mooney | yes it does | 17:03 |
geguileo | don't know why | 17:03 |
sean-k-mooney | it requires a min version of qemu and libvirt | 17:03 |
geguileo | I could only make it work with IDE, SCSI, and SATA | 17:04 |
sean-k-mooney | i can try and find the downstream bz for virtio-blk again one sec | 17:04 |
geguileo | sean-k-mooney: I have the downstream BZ | 17:04 |
geguileo | sean-k-mooney: I'm just telling you I can't make it work | 17:04 |
sean-k-mooney | for qemu support of virtio-blk | 17:04 |
geguileo | (maybe I'm dumb) | 17:04 |
sean-k-mooney | *trim with virtio-blk | 17:04 |
geguileo | sean-k-mooney: sure, it says it supports it... I can't make it work without using IDE, SCSI or SATA | 17:05 |
sean-k-mooney | well it worked in our ci | 17:05 |
sean-k-mooney | https://review.opendev.org/c/openstack/nova/+/879077/1 | 17:05 |
geguileo | sean-k-mooney: did you actually check that the size was reduced? | 17:05 |
sean-k-mooney | no, but if it's not, that's a qemu bug | 17:05 |
geguileo | sean-k-mooney: but then the log should remain | 17:05 |
sean-k-mooney | why would it not be correct | 17:06 |
geguileo | just because fstrim says it has freed space within the guest OS it doesn't mean that it has actually happened | 17:06 |
sean-k-mooney | sure there are several layers at play here | 17:07 |
geguileo | sean-k-mooney: oh, I'm sure there are many, but they are beyond my expertise | 17:07 |
geguileo | I'm just reporting as a storage guy saying, don't know why I can't make it work | 17:07 |
geguileo | lol | 17:07 |
sean-k-mooney | so you were using local storage | 17:07 |
sean-k-mooney | no cinder | 17:07 |
sean-k-mooney | booted a vm | 17:07 |
sean-k-mooney | allocated space | 17:08 |
geguileo | I was booting a VM from RBD, iSCSI, NFS | 17:08 |
sean-k-mooney | and then deleted it and did a trim | 17:08 |
geguileo | Then I did a live volume migration, which made the disk lose sparseness (became thick) | 17:08 |
geguileo | then I issued the "fstrim -v --all" | 17:08 |
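Whether a trim actually reclaimed backing space (the question raised here about fstrim's output) can be checked for file-backed disks by comparing apparent size with allocated blocks. A small self-contained sketch with an illustrative file, not the actual cinder volumes from the discussion:

```shell
# A sparse 100 MiB file: the apparent (logical) size and the
# allocated (on-disk) size differ, which is what "thin" means.
truncate -s 100M disk.img
apparent=$(stat -c %s disk.img)               # logical size in bytes
allocated=$(( $(stat -c %b disk.img) * 512 )) # 512-byte blocks on disk
echo "apparent=$apparent allocated=$allocated"
```

On a real volume you would run this (or `qemu-img info`) on the backing file before and after `fstrim` to see whether the allocated size actually dropped.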
sean-k-mooney | well i filed https://bugs.launchpad.net/nova/+bug/2013123 for local storage | 17:08 |
geguileo | sean-k-mooney: yeah, I replied in that LP bug | 17:09 |
geguileo | sean-k-mooney: I'm working on a sparseness document, because this is a CF | 17:09 |
geguileo | (including the cinder side) | 17:09 |
sean-k-mooney | well discard is not just about sparseness | 17:10 |
sean-k-mooney | but for what it's worth we do not make any statement about sparseness at the api level | 17:10 |
dansmith | AFAIK, with various file formats you can only expect the used size to decrease if you fully discard a block that covers an extent | 17:11 |
geguileo | sean-k-mooney: yeah, it's also about SSDs optimization, power consumption, etc | 17:11 |
dansmith | vmware, TMK, only actually reclaims space when the guest is shutdown | 17:11 |
geguileo | dansmith: true, but even then I can clearly see the reduction | 17:11 |
geguileo | I mean, the size goes down from 1GB to 100MB or so... | 17:12 |
sean-k-mooney | so for example i don't know if qcow or raw files will actually reduce space | 17:12 |
geguileo | sean-k-mooney: qcow2 does | 17:12 |
sean-k-mooney | for qcow i would expect it to be reduced, for raw probably not | 17:12 |
geguileo | if things are set correctly | 17:12 |
geguileo | (aka all the stars align) | 17:12 |
geguileo | and I can even make NFS/qcow2 and RBD preserve sparseness on live migration | 17:13 |
geguileo | changing the nova code | 17:13 |
sean-k-mooney | how did you change the nova code | 17:13 |
geguileo | to use the detect_zeroes feature | 17:13 |
geguileo | I created this LP bug for that one https://bugs.launchpad.net/nova/+bug/2023079 | 17:13 |
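The paste linked above is not reproduced in this log, but libvirt's `detect_zeroes` attribute on the disk `<driver>` element is what is being described. An illustrative fragment (file path invented; note libvirt requires `discard='unmap'` alongside `detect_zeroes='unmap'`):

```xml
<disk type='file' device='disk'>
  <!-- detect_zeroes='unmap' makes QEMU turn all-zero writes into
       discards, which preserves sparseness during a block copy -->
  <driver name='qemu' type='qcow2' detect_zeroes='unmap' discard='unmap'/>
  <source file='/var/lib/nova/instances/INSTANCE_UUID/disk'/>
  <target dev='vda' bus='virtio'/>
</disk>
```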
sean-k-mooney | ok then we can add that as a new feature if you can explain what it is | 17:13 |
sean-k-mooney | that is not a bug it would be a new feature | 17:14 |
sean-k-mooney | a small one but its still a feature | 17:14 |
geguileo | sean-k-mooney: https://paste.openstack.org/show/brFgX6MgBlxjgCrE3rbg/ | 17:14 |
sean-k-mooney | geguileo: what do you mean by volume in https://bugs.launchpad.net/nova/+bug/2023079 | 17:14 |
geguileo | sean-k-mooney: but that's not the right code | 17:14 |
geguileo | it's what I used to test | 17:14 |
sean-k-mooney | oh "When doing a live volume migration " | 17:15 |
geguileo | but that has CPU implications when running, so it's best only to change it when the block migration is going to happen | 17:15 |
sean-k-mooney | geguileo: we have a different concept called block-migration in the context of live migrating a vm | 17:15 |
dansmith | ...yeah | 17:15 |
geguileo | sean-k-mooney: yes, I mean block live migration, I've updated the bug name, thanks | 17:16 |
dansmith | I was going to say, we should never be block migrating a volume | 17:16 |
geguileo | sean-k-mooney: ooooh, then soooooooorry for mixing terms (/me facepalms) | 17:16 |
sean-k-mooney | geguileo: block live migration in nova means live migrating a vm with local raw/qcow storage | 17:16 |
dansmith | it means we literally move all the data | 17:16 |
dansmith | (all the disk data) | 17:16 |
geguileo | my bad, I've updated the LP bug | 17:17 |
geguileo | dansmith: we are currently moving ALL the data | 17:17 |
sean-k-mooney | geguileo: ya so now i know what you're trying to fix | 17:17 |
geguileo | that's why the detect_zeroes would be good for volume live migration | 17:17 |
sean-k-mooney | so we expect this to be done by cinder using the driver-assisted migration feature | 17:17 |
geguileo | sean-k-mooney: thanks for your patience in understanding my ramblings :-) | 17:17 |
sean-k-mooney | however you want nova to be intelligent enough | 17:17 |
sean-k-mooney | so that when we fall back to nova doing the volume migration | 17:18 |
sean-k-mooney | that we also preserve the sparseness | 17:18 |
geguileo | sean-k-mooney: problem is that driver-assisted migration cannot work between different backends (and afaik it doesn't work for any driver even between volumes of the same array) | 17:18 |
geguileo | sean-k-mooney: afaik today all online volume migrations are done by nova | 17:18 |
dansmith | wait what? | 17:18 |
geguileo | I don't think any cinder driver supports it | 17:18 |
opendevreview | Amit Uniyal proposed openstack/nova master: Reproducer for dangling bdms https://review.opendev.org/c/openstack/nova/+/881457 | 17:19 |
opendevreview | Amit Uniyal proposed openstack/nova master: Delete dangling bdms https://review.opendev.org/c/openstack/nova/+/882284 | 17:19 |
sean-k-mooney | geguileo: i was pretty sure you fixed at least one vendor driver last year | 17:19 |
dansmith | why would we ever want to do that? unless you're crossing AZs or something | 17:19 |
geguileo | dansmith: all volumes that are attached to nova are migrated by nova | 17:19 |
sean-k-mooney | geguileo: if you change backend right | 17:19 |
sean-k-mooney | not just retype | 17:19 |
sean-k-mooney | within the same backend | 17:19 |
dansmith | retype is a different thing right? | 17:19 |
sean-k-mooney | yes | 17:20 |
dansmith | I'm talking about you live migrate from one server to the next one in the rack, we should not be moving all the volume data to a *new* volume... | 17:20 |
geguileo | dansmith: retype is a different thing, but many times it triggers a migration | 17:20 |
geguileo | dansmith: oh, yeah, not a nova live migration | 17:20 |
geguileo | dansmith: it's a volume live migration | 17:20 |
sean-k-mooney | retypes can be within the same backend (just different qos policy) or to a different backend | 17:20 |
geguileo | basically when you mirror the data from one volume to another | 17:20 |
dansmith | okay, I guess we're confusing too many things | 17:21 |
geguileo | sean-k-mooney: correct! | 17:21 |
sean-k-mooney | so i remember a customer issue like 4-6 months ago where i thought we fixed scaleio or one of the other drivers to explicitly preserve sparseness when doing a driver-assisted volume migration | 17:21 |
geguileo | dansmith: what's the right term for moving data from one attached volume to another while the instance is running? | 17:21 |
dansmith | swap volume I think? | 17:22 |
geguileo | sean-k-mooney: I fixed it for offline and to report the value to Nova | 17:22 |
dansmith | that's the action we see, AFAIK | 17:22 |
sean-k-mooney | on the nova side its the swap volume is what is called yes | 17:22 |
geguileo | dansmith: ok, I'll try to talk about swap volume | 17:22 |
sean-k-mooney | well no, volume migration is fine | 17:22 |
dansmith | geguileo: not trying to make you use our language, I just need to know that a bunch of terms have been re-used :) | 17:23 |
sean-k-mooney | but you are asserting that the driver asseited path never works for an online volume migration | 17:23 |
geguileo | dansmith: oh, I prefer to use the right language | 17:23 |
geguileo | sean-k-mooney: I don't think we have any cinder driver capable of doing it... | 17:23 |
sean-k-mooney | i see... | 17:24 |
sean-k-mooney | that kind of sucks since you have to transit all the data via the compute node then | 17:24 |
dansmith | I just don't understand where that happens | 17:24 |
geguileo | maybe RBD can... | 17:24 |
dansmith | if the instance is running but nova is transferring the data between volumes.. where in nova is that happening? | 17:24 |
geguileo | sean-k-mooney: agreed, it sucks | 17:24 |
geguileo | dansmith: libvirt/QEMU supports that | 17:25 |
geguileo | by adding a mirror to the disk | 17:25 |
geguileo | and once the volumes are mirrored | 17:25 |
geguileo | nova removes the old volume | 17:25 |
dansmith | ah, okay and is that what we're poking via swap? | 17:25 |
geguileo | I think so | 17:25 |
geguileo | I believe that's the swap volume | 17:26 |
dansmith | okay I thought our swap was just "pause, change the connection, unpause" | 17:26 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#LL2241C16-L2241C16 | 17:26 |
sean-k-mooney | there i think | 17:26 |
dansmith | oh okay so it really is us blocking on that action, interesting | 17:27 |
geguileo | sean-k-mooney: sounds about right | 17:27 |
sean-k-mooney | i dug into this a few months ago but i have mostly purged that info | 17:27 |
dansmith | man, that sucks :) | 17:27 |
sean-k-mooney | its actully libvirt doing this | 17:27 |
sean-k-mooney | but there are a few flags we can pass | 17:27 |
geguileo | yeah, Nova just waits for the job to complete | 17:27 |
sean-k-mooney | to have it sparsify the zeros | 17:27 |
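The "nova just waits for the job to complete" flow around the driver.py code linked above can be sketched as follows. This is an illustrative simplification, not nova's actual helper: `dom` only mimics the shape of a libvirt-python domain, and the method names/signatures are deliberately reduced (real libvirt uses flag constants rather than a `pivot=` keyword):

```python
import time

def swap_volume(dom, disk, new_source_xml, poll_interval=0.01):
    """Sketch of the mirror-then-pivot flow driven during swap volume."""
    # Start mirroring writes from the old volume to the new one.
    dom.blockCopy(disk, new_source_xml)
    # Block until the copy job reports the volumes are in sync.
    while True:
        info = dom.blockJobInfo(disk)  # e.g. {'cur': ..., 'end': ...}
        if info and info['cur'] >= info['end']:
            break
        time.sleep(poll_interval)
    # Pivot the guest onto the new volume; the old one is then dropped.
    dom.blockJobAbort(disk, pivot=True)
```

The key point from the discussion is the blocking poll loop: the guest keeps running while QEMU mirrors the data, and only the final pivot switches the disk source.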
geguileo | yeah, it's the detect_zeroes option from https://paste.openstack.org/show/brFgX6MgBlxjgCrE3rbg/ | 17:28 |
geguileo | well, that's the brute force approach for my tests | 17:28 |
geguileo | because that enables them ALL the time | 17:28 |
geguileo | which sucks | 17:28 |
geguileo | but I was able to confirm that it works for NFS/qcow2 and RBD | 17:28 |
geguileo | doesn't work for SCSI devices though | 17:28 |
sean-k-mooney | well it depends on whether there is any performance overhead to it normally | 17:28 |
geguileo | sean-k-mooney: there is performance overhead | 17:29 |
dansmith | definitely | 17:29 |
geguileo | so I think it only makes sense when you are going to be reading the whole thing | 17:29 |
geguileo | and then you save on network + writes | 17:29 |
geguileo | so only enable it during the volume swap operation | 17:29 |
sean-k-mooney | well we can't really change this in response to a volume migration api request | 17:29 |
geguileo | then disable it | 17:29 |
dansmith | you're trading cpu for disk | 17:30 |
sean-k-mooney | even if we could i'm not sure if we should | 17:30 |
geguileo | dansmith: in volume swap, we are trading cpu for network + disk + time | 17:30 |
dansmith | yeah I meant if it's enabled all the time | 17:30 |
dansmith | and yeah, network too | 17:31 |
sean-k-mooney | well it would only have an effect on live migration and on the initial disk creation | 17:31 |
geguileo | dansmith: I don't think we should enable it all the time, because if the storage supports discard, then the space will be recovered by periodically calling fstrim like some OSes do | 17:31 |
dansmith | yeah | 17:31 |
geguileo | but I think this can greatly improve some volume swap cases | 17:31 |
sean-k-mooney | well discard is off by default | 17:31 |
sean-k-mooney | and currently only works if you use virtio-scsi which is not our default | 17:32 |
geguileo | sean-k-mooney: yeah, but that's something we have to improve on the cinder side | 17:32 |
sean-k-mooney | no i mean in the nova side | 17:32 |
geguileo | so we properly report the value | 17:32 |
sean-k-mooney | we have a config option to opt into allowing discard | 17:32 |
sean-k-mooney | and by default we dont | 17:32 |
geguileo | yeah, but if cinder reports it is supported then nova does the right thing | 17:32 |
geguileo | sean-k-mooney: really? | 17:32 |
geguileo | sean-k-mooney: which one? because I don't recall touching nova conf to enable that one | 17:33 |
geguileo | (maybe devstack does automatically) | 17:33 |
sean-k-mooney | https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.hw_disk_discard | 17:33 |
sean-k-mooney | so that controls if discard works for local disk at least | 17:34 |
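For reference, the option linked above lives in the `[libvirt]` section of nova.conf; a minimal fragment (it is unset by default, and the docs list `ignore` and `unmap` as the valid values):

```ini
[libvirt]
# Opt in to passing guest discard/unmap requests through for
# nova-managed local disks; whether it takes effect also depends
# on the disk bus (virtio-scsi, or sufficiently new virtio-blk).
hw_disk_discard = unmap
```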
geguileo | sean-k-mooney: oh, but for local, not cinder | 17:34 |
geguileo | good to know about that one | 17:34 |
sean-k-mooney | i was under the impression it had an effect for cinder too but im not sure | 17:34 |
sean-k-mooney | geguileo: so the reason the discard behavior came to my attention | 17:34 |
geguileo | sean-k-mooney: I don't think it does | 17:34 |
sean-k-mooney | was i wanted to make discard the default for nova | 17:34 |
sean-k-mooney | and found it broke virtio-blk | 17:34 |
geguileo | sean-k-mooney: that would be awesome!!! | 17:34 |
sean-k-mooney | so i fixed that | 17:34 |
geguileo | what did you fix? | 17:35 |
sean-k-mooney | i removed the check based on the qemu min version | 17:35 |
sean-k-mooney | and i could boot vms | 17:35 |
sean-k-mooney | so what i need to do is reproduce this locally again | 17:35 |
sean-k-mooney | and do some manual testing | 17:35 |
sean-k-mooney | to confirm if discard with qcow actually works | 17:36 |
geguileo | sean-k-mooney: LVM with LIO doesn't currently support trimming | 17:36 |
geguileo | it's one of the bugs I have a local fix for | 17:36 |
sean-k-mooney | ok but that won't affect things right | 17:36 |
sean-k-mooney | since that driver won't report discard support | 17:36 |
geguileo | so you may see fstrim telling you it has recovered space, but it hasn't really | 17:36 |
geguileo | sean-k-mooney: we can ask cinder drivers to report discard support | 17:36 |
sean-k-mooney | well again i don't care about the cinder case, i'm trying to fix discard support for non-cinder storage | 17:37 |
geguileo | using the report_discard_supported backend option | 17:37 |
geguileo | sean-k-mooney: I was digging into the discard case for cinder lol | 17:37 |
geguileo | and forgot to look into the ephemeral case | 17:37 |
sean-k-mooney | yep i know :) | 17:37 |
sean-k-mooney | ephemeral means something else in nova | 17:38 |
geguileo | sean-k-mooney: I'm doing a writeup on my findings, so I'll send you the link later so you can add yours as well | 17:38 |
sean-k-mooney | ack | 17:38 |
geguileo | sean-k-mooney: dansmith thank you both for your time :-) | 17:41 |
dansmith | same :) | 17:41 |
carloss | o/ bauzas - what are your thoughts on having a follow-up cross-project session between Nova and Manila in the PTG next week? we can use it to chat about gouthamr's specs | 18:08 |
opendevreview | Merged openstack/nova master: Add debug logging when Instance raises OrphanedObjectError https://review.opendev.org/c/openstack/nova/+/883325 | 20:07 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!