Nick | Message | Time |
---|---|---|
opendevreview | Slawek Kaplonski proposed openstack/nova stable/ussuri: [neutron] Get only ID and name of the SGs from Neutron https://review.opendev.org/c/openstack/nova/+/787253 | 05:44 |
opendevreview | Yongli He proposed openstack/nova master: Smartnic support - cyborg drive https://review.opendev.org/c/openstack/nova/+/771362 | 06:06 |
opendevreview | Yongli He proposed openstack/nova master: smartnic support - new vnic type https://review.opendev.org/c/openstack/nova/+/771363 | 06:06 |
opendevreview | Yongli He proposed openstack/nova master: smartnic support - create arqs https://review.opendev.org/c/openstack/nova/+/758944 | 06:06 |
opendevreview | Yongli He proposed openstack/nova master: smartnic support - cleanup arqs https://review.opendev.org/c/openstack/nova/+/798054 | 06:06 |
opendevreview | Yongli He proposed openstack/nova master: smartnic support - reject server move and suspend https://review.opendev.org/c/openstack/nova/+/779913 | 06:06 |
opendevreview | Yongli He proposed openstack/nova master: smartnic support - functional tests https://review.opendev.org/c/openstack/nova/+/780147 | 06:06 |
opendevreview | Yongli He proposed openstack/nova master: smartnic support - build instance with smartnic arqs https://review.opendev.org/c/openstack/nova/+/798249 | 06:06 |
gibi | sean-k-mooney[m]: do you still hold your -1 on https://review.opendev.org/c/openstack/nova/+/797142 ? the follow up is green | 07:39 |
gibi | lyarwood: I have a comment in https://review.opendev.org/c/openstack/nova/+/779275 about the assumption that size is always provided to create_image | 07:49 |
MrClayPole | Morning all, We currently have an OpenStack ansible rocky deployment running on Ubuntu 18.04. We've been having failures during live migrations. We are seeing the following error in the journal logs but are not having much luck trying to trace it "error : qemuDomainObjBeginJobInternal:4945 : Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainFSFreeze)" & "error : | 08:26 |
MrClayPole | qemuDomainObjBeginJobInternal:4945 : Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainMigratePrepareTunnel3Params)" | 08:26 |
lyarwood | gibi: ack, I think this is because ramdisk and kernel files are always RAW but let me grep around again and confirm | 09:01 |
lyarwood | was anyone working on the VIR_CONNECT_LIST_NODE_DEVICES_CAP_VDPA libvirt regression btw? | 09:02 |
lyarwood | https://zuul.opendev.org/t/openstack/build/68e59744ef7444a5ae108118983c9353/log/controller/logs/screen-n-cpu.txt#1525 - the new centos job is hitting it | 09:02 |
* lyarwood is sure it came up somewhere either up or downstream last week | 09:02 | |
lyarwood | https://bugs.launchpad.net/nova/+bug/1933096 ah ha | 09:03 |
stephenfin | In the docs on AZs, we have this sentence "A host can be part of multiple aggregates but it can only be in one availability zone". Anyone know off the top of their heads what enforces this? | 09:06 |
stephenfin | from https://docs.openstack.org/nova/latest/admin/availability-zones.html | 09:06 |
stephenfin | I wrote that, but I think I copy-pasted it from elsewhere | 09:06 |
stephenfin | ah, found it. 'is_safe_to_update_az' in nova/compute/api.py | 09:20 |
gibi | lyarwood: Sean looked in https://bugs.launchpad.net/nova/+bug/1933096 before | 09:24 |
lyarwood | yup and closed it invalid as it was third party CI, this time it's our own upstream CI | 09:24 |
lyarwood | was going to ask them how we can proceed here, no idea how we cache packages on CI nodes tbh | 09:24 |
gibi | lyarwood: I guess we also have a bad cache somewhere then | 09:25 |
lyarwood | yeah likely | 09:25 |
opendevreview | Jorhson Deng proposed openstack/nova master: recheck the attachment_id after the reschedule successful https://review.opendev.org/c/openstack/nova/+/796209 | 09:31 |
*** bhagyashris_ is now known as bhagyashris | 09:40 | |
sean-k-mooney[m] | lyarwood: so I'm still not convinced that that is a valid bug | 11:37 |
sean-k-mooney[m] | or rather we could address it but only by no longer relying on any libvirt version checks in our code | 11:37 |
sean-k-mooney[m] | libvirt-python is not really intended to be installed as a wheel | 11:38 |
sean-k-mooney[m] | it's intended to generate bindings when it's installed for your current libvirt version, which it won't do if you have prebuilt it as a wheel | 11:38 |
sean-k-mooney[m] | stephenfin: yes we enforce that a host can only be in one az | 11:39 |
sean-k-mooney[m] | lyarwood: i can add an extra guard condition for this specific case but it would just be a whack-a-mole problem for any other case where we use code that is generated on install | 11:41 |
opendevreview | sean mooney proposed openstack/nova master: fix sr-iov support on Cavium ThunderX hosts. https://review.opendev.org/c/openstack/nova/+/777679 | 12:48 |
bauzas | stephenfin: when you're around, we can discuss on https://review.opendev.org/c/openstack/nova/+/798145 if you wish | 13:00 |
bauzas | tl;dr: problem is that we don't verify the AZs if you don't use the AZfilter | 13:00 |
bauzas | so we can't just look at them by the API service unless we know that the AZFilter is used | 13:01 |
sean-k-mooney | our downstream customer could avoid the issue they had if they just enabled the placement prefilter | 13:01 |
sean-k-mooney | that would enforce the AZ existence check | 13:02 |
sean-k-mooney | but they could still select the host using the hack | 13:02 |
sean-k-mooney | bauzas: i do agree though that we should remove that in a new microversion now that we have the new way to do it | 13:03 |
bauzas | sean-k-mooney: my thought is that we should just not use the az hack after a new microversion | 13:03 |
sean-k-mooney | yep | 13:03 |
sean-k-mooney | i was expecting that to have been done in the one that added --host | 13:03 |
bauzas | for sure, it wouldn't fix the issue of a bad requested AZ, but... | 13:03 |
sean-k-mooney | i also agree with our assessment that the az in the request spec and instance are not always intended to match | 13:04 |
sean-k-mooney | classic example being request spec is none but instance has a value set | 13:04 |
sean-k-mooney | in principle i think that is the only legitimate case where they should disagree | 13:04 |
sean-k-mooney | if the request spec is non-None then they should agree | 13:05 |
sean-k-mooney | if they don't you forced a live migration | 13:05 |
stephenfin | bauzas: we don't currently, but I'm adding that | 13:05 |
stephenfin | and the AZFilter is no use to us if we're bypassing the scheduler by forcing a host | 13:05 |
sean-k-mooney | stephenfin: right but I'm not convinced you should | 13:05 |
sean-k-mooney | stephenfin: that is not how that works | 13:05 |
sean-k-mooney | we check that the az exists | 13:06 |
bauzas | stephenfin: what sean-k-mooney said | 13:06 |
stephenfin | requesting zone:host makes no sense if $host is not in $zone | 13:06 |
sean-k-mooney | and only proceed if it does when you use the az hack | 13:06 |
bauzas | stephenfin: it's a hack, we should just remove it | 13:06 |
stephenfin | we can't remove it for the older APIs | 13:06 |
bauzas | surely | 13:06 |
sean-k-mooney | stephenfin: that is something we could check potentially but I'm not sure the api is the right place | 13:06 |
stephenfin | so people will keep hitting this | 13:06 |
bauzas | stephenfin: it's a hack, right? | 13:07 |
sean-k-mooney | well it was a supported feature | 13:07 |
bauzas | and you need to be an operator | 13:07 |
bauzas | soooo | 13:07 |
sean-k-mooney | but yes | 13:07 |
bauzas | the az hack can't be used by an end user | 13:07 |
sean-k-mooney | yes it can | 13:07 |
bauzas | not by default | 13:07 |
sean-k-mooney | they just need to use an older microversion | 13:07 |
bauzas | the default policy is admin | 13:08 |
sean-k-mooney | bauzas: is it? | 13:08 |
bauzas | for the az hack ? yes | 13:08 |
sean-k-mooney | i thought we did not have a separate policy for it | 13:08 |
stephenfin | I must admit I don't understand the issue | 13:08 |
bauzas | (and fortunately) | 13:08 |
sean-k-mooney | just the az one | 13:08 |
stephenfin | why wouldn't a simple "does this host belong to this AZ" check make sense? | 13:08 |
stephenfin | it's not too expensive fwict | 13:09 |
gibi | bauzas: do you suggest to keep allowing calling --availability_zone my-az:host-not-in-my-az and succeed in old microversions? | 13:09 |
bauzas | sean-k-mooney: I'm 100% sure about the different policy | 13:09 |
stephenfin | a simple lookup in the API DB | 13:09 |
bauzas | gibi: we *could* fix this for old versions only, but then I have another concern | 13:09 |
* stephenfin dashes to shop to get food for lunch, brb | 13:10 | |
sean-k-mooney | stephenfin: my main issue is that you are doing it in a different location to the other az check | 13:10 |
bauzas | gibi: my other concern is that I know some environments that don't use the AZfilter | 13:10 |
sean-k-mooney | stephenfin: which is done in the scheduler i believe | 13:10 |
bauzas | gibi: and previously, you were able to use the az hack without the AZFilter | 13:11 |
gibi | bauzas: yes, but that hack resulted in an inconsistent system | 13:11 |
gibi | as described in the bug | 13:11 |
bauzas | gibi: not if you don't use the filter | 13:11 |
bauzas | see the problem ? | 13:11 |
sean-k-mooney | gibi: well its basically forcing the host | 13:12 |
sean-k-mooney | same as a forced migration | 13:12 |
gibi | so if you don't use the AZFilter then no AZ is recorded in the instance or in the request_spec? | 13:12 |
bauzas | gibi: and again, I remember ourselves saying A LOT "well, if you force a host, then meh" | 13:12 |
sean-k-mooney | gibi: the az should be recorded in the instance regardless of the filter | 13:12 |
bauzas | gibi: now, we become super picky about forcing hosts and we want to verify them | 13:13 |
gibi | I think the base issue is that if you use the hack then you end up having inconsistent az recorded in the instance and in the request_spec | 13:13 |
bauzas | but again, you *SHOULDN'T* force a destination | 13:13 |
gibi | we cannot remove the hack from old microversions | 13:13 |
bauzas | that's why we added 2.74 version | 13:13 |
bauzas | to have a way to propose a target without forcing it | 13:14 |
sean-k-mooney | gibi: you wont always | 13:14 |
bauzas | and we said as a consensus that we should stop supporting forced moves | 13:14 |
bauzas | so, if operators wanna move (because again, you need to be ADMIN in order to use the AZ hack), then your dog | 13:15 |
sean-k-mooney | bauzas: this is the policy yes https://github.com/openstack/nova/blob/master/nova/policies/servers.py#L204-L225 | 13:15 |
bauzas | again, I was 100% sure about it | 13:15 |
sean-k-mooney | oh no, that's the new one | 13:15 |
gibi | bauzas: even if we fix the bug in the hack the admin can still move to any host, they just need to specify the proper az name of the host | 13:15 |
gibi | bauzas: so no functionality is lost | 13:15 |
sean-k-mooney | this is the old one https://github.com/openstack/nova/blob/master/nova/policies/servers.py#L177-L196 | 13:15 |
bauzas | gibi: admins can move to bad targets anyway | 13:16 |
sean-k-mooney | gibi: what would the fix be | 13:16 |
bauzas | gibi: admins can force migrate to hosts without verifying other attributes of the host | 13:16 |
gibi | bauzas: it is not a bad target, the host is valid, nova just records a wrong az name during the move as it trusts admin input | 13:16 |
sean-k-mooney | gibi: just include the AZ and not the host in the request spec? | 13:16 |
gibi | sean-k-mooney: the fix is to make sure the admin provides an az name that is valid for the host, then nova will record a valid az name | 13:17 |
bauzas | again, we're breaking existing behaviours if we change things | 13:17 |
gibi | bauzas: we are fixing a bug | 13:17 |
bauzas | because again, some operators opt-out the AZfilter | 13:17 |
gibi | and as such we can break old buggy behavior | 13:17 |
bauzas | gibi: it's not a bug, it's a 40x | 13:17 |
gibi | bauzas: the db inconsistency is the bug | 13:17 |
bauzas | no | 13:17 |
sean-k-mooney | gibi: stephenfin if we add this check we should also move the az exists check to the api | 13:18 |
bauzas | you asked for a target you can't succeed | 13:18 |
sean-k-mooney | stephenfin: you added that to your check but did you remove the check later | 13:18 |
bauzas | gibi: you can create an instance on AZ1, then force migrate to AZ2 | 13:18 |
gibi | bauzas: I still think that if nova creates an inconsistent db record then we should fix that | 13:18 |
bauzas | gibi: and then, good luck with resizing the instance | 13:18 |
sean-k-mooney | bauzas: well a resize in that case will resize back to AZ1 | 13:19 |
gibi | so I can accept any fix that results in consistent db data. | 13:19 |
bauzas | again, this is a forced operation and we made a clear statement on the fact broken migrations are not nova's fault | 13:19 |
bauzas | gibi: if we really want to fix this thing | 13:20 |
bauzas | gibi: I'd then suggest two things | 13:20 |
bauzas | gibi: 1/ remove the call by a new microversion | 13:20 |
bauzas | 2/ change the az value to None or to the host AZ in the az hack method | 13:21 |
bauzas | the az value is meaningless when you use the force hosts | 13:21 |
bauzas | but I wouldn't hardstop on the call | 13:21 |
bauzas | eg. | 13:22 |
bauzas | nova boot --az az1:host_in_az2 would result in getting the tuple (None, host, node) | 13:22 |
gibi | 1/ is totally OK to me. So remove the hack in future version. | 13:22 |
sean-k-mooney | setting it to none would be consistent with using --host | 13:22 |
bauzas | or actually (schedule_default_az, host, node) | 13:23 |
bauzas | I mean, setting the returned az to be the default AZ from the option | 13:23 |
bauzas | (which defaults to None) | 13:23 |
gibi | 2/ if we can simulate --host when --az was given with a bad az name, and log a warning, then I can accept that as well | 13:23 |
gibi | so keep the existing bad (but used) behavior but avoid inconsistent db data | 13:24 |
gibi | in old microversions | 13:24 |
bauzas | gibi: the crucial distinction between --host and the az hack is the fact we call out the scheduler on the former, not on the latter | 13:24 |
bauzas | gibi: honestly, again, ops are using the az hack not for the az, but for providing a target | 13:25 |
bauzas | gibi: so agreed, we should log a warning (after all, this is an op who did this) and just propose the default AZ as a returned AZ | 13:25 |
bauzas | if people really want to both force to a target *AND* stick on this AZ, then they can use --host and --az (without the az hack) | 13:26 |
gibi | yepp | 13:27 |
bauzas | I'll log my thoughts in the review | 13:27 |
gibi | bauzas: thanks | 13:28 |
gibi | let's see how stephenfin feels about it after his lunch | 13:29 |
bauzas | sure | 13:33 |
sean-k-mooney | gibi: here is a patch to update the neutron doc by the way https://review.opendev.org/c/openstack/neutron/+/798302 | 13:34 |
sean-k-mooney | got distracted by the previous conversation | 13:34 |
bauzas | gosh, eavesdrop is soooo slow to update the | 13:35 |
sean-k-mooney | bauzas: i think its a cron job or similar | 13:35 |
bauzas | last updated bits are from more than 20 mins | 13:35 |
sean-k-mooney | it's often pretty quick but sometimes it's delayed | 13:35 |
sean-k-mooney | ya that sometimes happens | 13:36 |
sean-k-mooney | usually it's only a minute or so behind at most | 13:36 |
bauzas | still lagging | 13:37 |
*** abhishekk is now known as akekane|home | 13:39 | |
*** akekane|home is now known as abhishekk | 13:39 | |
ganso | lyarwood: hi! could you please take one quick look at https://review.opendev.org/c/openstack/nova/+/795432 ? The other reviewers said they are waiting for your feedback. Thanks in advance! | 13:39 |
gibi | sean-k-mooney: ups, I also pushed a doc patch https://review.opendev.org/c/openstack/neutron/+/798294 | 13:40 |
sean-k-mooney | oh ok lol | 13:41 |
bauzas | can someone confirm it's not PEBKAC if https://meetings.opendev.org/irclogs/%23openstack-nova/%23openstack-nova.2021-06-28.log.html is lagging ? | 13:42 |
sean-k-mooney | gibi: i don't mind going with yours, we took slightly different approaches but the message is similar | 13:42 |
artom | bauzas, https://meetings.opendev.org/irclogs/%23openstack-nova/latest.log.html | 13:42 |
sean-k-mooney | artom: its the same page | 13:43 |
artom | sean-k-mooney, I know | 13:43 |
artom | 13:15 is the latest timestamp there | 13:43 |
artom | That's in UTC I imagine | 13:43 |
artom | So about 30 minutes behind... | 13:43 |
sean-k-mooney | bauzas: i assume you just want to link to the point where the conversation started | 13:43 |
artom | Yeah, feels longer than normal | 13:43 |
bauzas | sean-k-mooney: I'd rather point to the written agreement | 13:44 |
sean-k-mooney | bauzas: it will have updated by the time anyone actually reads it | 13:44 |
sean-k-mooney | ah ok | 13:44 |
sean-k-mooney | ill quickly check with infra | 13:44 |
artom | Also, "then your dog" | 13:45 |
bauzas | artom: did you like it ? did I use it correctly ? | 13:50 |
artom | bauzas, I actually have no idea what you were trying to say :P | 13:50 |
bauzas | artom: "then it's your problem" | 13:50 |
gibi | sean-k-mooney: ack, let me know if you think some part of your message should be incorporated to mine | 13:51 |
artom | Never heard it used like that... or at all, in fact | 13:51 |
bauzas | "mommy, I don't wanna take the dog out it's raining", "darling, you wanted it, so YOUR DOG" | 13:51 |
artom | bauzas, I think you just invented an expression. You're this century's Shakespeare | 13:53 |
gibi | :D | 13:53 |
bauzas | artom: Voltaire, please | 13:53 |
bauzas | so, I left my comment but i need to get my kids from school, ttyl | 14:18 |
opendevreview | sean mooney proposed openstack/os-vif master: [WIP] add configurable per port bridges https://review.opendev.org/c/openstack/os-vif/+/798055 | 14:33 |
gibi | bauzas, sean-k-mooney: you were +2 on the Placement RP re-parenting spec, could you also look at the implementation https://review.opendev.org/c/openstack/placement/+/784020 ? | 14:57 |
bauzas | surely | 14:57 |
sean-k-mooney | yes i can do that | 14:57 |
bauzas | I tho have a meeting in 2 mins... so tomorrow morning, your dog :p | 14:57 |
* bauzas tries to make a catchphrase | 14:57 | |
gibi | bauzas: thanks :) | 14:58 |
gibi | sean-k-mooney: thanks | 14:58 |
gibi | also on placement side there is a quick small patch to add pps resources to os-resource-classes https://review.opendev.org/c/openstack/os-resource-classes/+/796591 | 14:58 |
bauzas | +2d on the last one | 15:29 |
gibi | bauzas: thanks | 15:33 |
opendevreview | Elod Illes proposed openstack/nova stable/pike: Update pci stat pools based on PCI device changes https://review.opendev.org/c/openstack/nova/+/798345 | 15:36 |
*** abhishekk is now known as abhishekk|dinner | 15:55 | |
*** abhishekk|dinner is now known as abhishekk|home | 16:28 | |
*** abhishekk|home is now known as abhishekk | 16:28 | |
opendevreview | Stephen Finucane proposed openstack/nova master: WIP: api: Validate host belongs to availability zone https://review.opendev.org/c/openstack/nova/+/798145 | 16:32 |
opendevreview | Merged openstack/nova stable/ussuri: [CI] Fix gate by using zuulv3 live migration and grenade jobs https://review.opendev.org/c/openstack/nova/+/795432 | 19:19 |
opendevreview | sean mooney proposed openstack/os-vif master: [WIP] add configurable per port bridges https://review.opendev.org/c/openstack/os-vif/+/798055 | 19:23 |
ganso | gibi, bauzas: if you have a spare minute to please look at this backport that is ready for an extra +2 and +W: https://review.opendev.org/c/openstack/nova/+/796719 | 19:30 |
tosky | melwitt, elodilles: yay, now time to update the train backport of the legacy cleanup and the CI should be (almost) fine | 20:36 |
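
The compromise discussed between 13:20 and 13:27 (keep accepting `az:host[:node]` in old microversions, log a warning, and fall back to the default AZ when the forced host is not in the requested AZ) could be sketched roughly as below. This is only an illustration, not nova's implementation: `parse_az_hack`, the `host_azs` mapping and the `default_az` argument are hypothetical names, and treating a host that is missing from the mapping as a mismatch is an assumption made here for brevity.

```python
import logging
from typing import Dict, Optional, Tuple

LOG = logging.getLogger(__name__)


def parse_az_hack(
    value: str,
    host_azs: Dict[str, str],
    default_az: Optional[str] = None,
) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Split an 'az[:host[:node]]' string into (az, host, node).

    host_azs is a hypothetical host -> AZ lookup standing in for whatever
    the API would actually query. If a host is forced and it is not in the
    requested AZ, the request is still accepted, but the returned AZ is
    replaced by default_az so that no wrong AZ name gets recorded.
    """
    parts = value.split(":")
    if len(parts) > 3:
        raise ValueError("expected at most az:host:node, got %r" % value)

    # Pad to three fields and normalise empty strings to None.
    az, host, node = [(p or None) for p in (parts + [None, None])[:3]]

    if host is not None and host_azs.get(host) != az:
        LOG.warning(
            "Host %s is not in availability zone %s; "
            "recording default AZ %s instead", host, az, default_az)
        az = default_az

    return az, host, node
```

With this sketch, `parse_az_hack("az1:host_in_az2", {"host_in_az2": "az2"})` yields `(None, "host_in_az2", None)`, which matches the `(None, host, node)` tuple mentioned in the discussion, while a plain `az1` with no host passes through untouched.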