opendevreview | Merged openstack/nova master: tests: Walk database migrations in correct order https://review.opendev.org/c/openstack/nova/+/810291 | 00:30 |
---|---|---|
opendevreview | Merged openstack/nova master: tests: Address some nits with database migration series https://review.opendev.org/c/openstack/nova/+/810856 | 00:31 |
opendevreview | Merged openstack/nova master: tests: Silence noise from database tests https://review.opendev.org/c/openstack/nova/+/810857 | 00:31 |
opendevreview | melanie witt proposed openstack/nova master: DNM Run against unmerged oslo.limit changes https://review.opendev.org/c/openstack/nova/+/812236 | 00:51 |
opendevreview | melanie witt proposed openstack/nova master: Add logic to enforce local api and db limits https://review.opendev.org/c/openstack/nova/+/712139 | 00:51 |
opendevreview | melanie witt proposed openstack/nova master: Enforce api and db limits https://review.opendev.org/c/openstack/nova/+/712142 | 00:51 |
opendevreview | melanie witt proposed openstack/nova master: Update quota_class APIs for db and api limits https://review.opendev.org/c/openstack/nova/+/712143 | 00:51 |
opendevreview | melanie witt proposed openstack/nova master: Update limit APIs https://review.opendev.org/c/openstack/nova/+/712707 | 00:51 |
opendevreview | melanie witt proposed openstack/nova master: Update quota sets APIs https://review.opendev.org/c/openstack/nova/+/712749 | 00:51 |
opendevreview | melanie witt proposed openstack/nova master: Tell oslo.limit how to count nova resources https://review.opendev.org/c/openstack/nova/+/713301 | 00:51 |
opendevreview | melanie witt proposed openstack/nova master: Enforce resource limits using oslo.limit https://review.opendev.org/c/openstack/nova/+/615180 | 00:51 |
opendevreview | melanie witt proposed openstack/nova master: Add legacy limits and usage to placement unified limits https://review.opendev.org/c/openstack/nova/+/713498 | 00:51 |
opendevreview | melanie witt proposed openstack/nova master: Update quota apis with keystone limits and usage https://review.opendev.org/c/openstack/nova/+/713499 | 00:51 |
opendevreview | melanie witt proposed openstack/nova master: Add reno for unified limits https://review.opendev.org/c/openstack/nova/+/715271 | 00:51 |
opendevreview | melanie witt proposed openstack/nova master: WIP Enable unified limits in the nova-next job https://review.opendev.org/c/openstack/nova/+/789963 | 00:51 |
gmann | melwitt: its green now https://review.opendev.org/c/openstack/devstack/+/812092/4 | 01:22 |
melwitt | gmann: \o/ | 01:22 |
opendevreview | Merged openstack/nova master: Reproduce bug 1945310 https://review.opendev.org/c/openstack/nova/+/811394 | 01:25 |
opendevreview | Merged openstack/nova master: Update min supported service version for Yoga https://review.opendev.org/c/openstack/nova/+/809932 | 02:36 |
opendevreview | pengzhang proposed openstack/nova master: the vif_name error leads virsh domifstat cant work https://review.opendev.org/c/openstack/nova/+/813119 | 03:25 |
opendevreview | pengzhang proposed openstack/nova master: the vif_name error leads virsh domifstat cant work https://review.opendev.org/c/openstack/nova/+/813119 | 05:46 |
opendevreview | norman shen proposed openstack/nova master: Recreate mdev devices according to placement https://review.opendev.org/c/openstack/nova/+/810220 | 07:33 |
bauzas | good (late) morning Nova | 08:12 |
lyarwood | \o | 08:15 |
gibi | o/ | 08:17 |
opendevreview | Dmitriy Rabotyagov proposed openstack/nova master: Ensure MAC addresses characters are in the same case https://review.opendev.org/c/openstack/nova/+/811947 | 09:16 |
opendevreview | Elod Illes proposed openstack/nova stable/train: Reject open redirection in the console proxy https://review.opendev.org/c/openstack/nova/+/791807 | 09:38 |
opendevreview | Elod Illes proposed openstack/nova stable/train: address open redirect with 3 forward slashes https://review.opendev.org/c/openstack/nova/+/806629 | 09:38 |
gibi | sean-k-mooney: what is the reason of not waiting for network-vif-plugged events during hard reboot? I hard reboot does unplug / plug but does not wait for plugged event. Compared that to boot where we do plug and then wait for plugged event before we let the VM to run | 10:20 |
gibi | in normal ovs case I see that during hard reboot neutron eventually sends the unplugged and plugged events to us we just not waiting for it | 10:20 |
sean-k-mooney | you are correct it will in most cases | 10:24 |
sean-k-mooney | technically linux bridge might miss the removal and not send it | 10:24 |
sean-k-mooney | for ovn and odl it wont | 10:24 |
sean-k-mooney | starnard ml2/ovs it will | 10:24 |
sean-k-mooney | so we dont wait because we need to know which ml2 driver bound the port and the behaivor of that driver | 10:25 |
sean-k-mooney | gibi: i likely will be looking at reqorkign how we do this in yoga its one of the things i want to talk about at the ptg | 10:25 |
sean-k-mooney | in train neutron started teilling us what driver bound the port | 10:26 |
gibi | OK I see, thanks | 10:26 |
sean-k-mooney | so we can actully no start using that info | 10:26 |
sean-k-mooney | we just need to add it to our vif objects and use it where it makes sense | 10:26 |
sean-k-mooney | e.g. wait for event on hard rebot if its ml2/ovs for example | 10:26 |
sean-k-mooney | but not if its odl | 10:26 |
gibi | I see | 10:26 |
sean-k-mooney | gibi: the short answer why we dont wait is we dont rebind the ports on hard reboot so we only get event for backend that send plug time event and since beforfe train we could not tell if ovs port were plug time or bind time we could not wait safly | 10:28 |
sean-k-mooney | gibi: are you seeing a issue wiht not waiting currently? | 10:28 |
sean-k-mooney | or was this just courisity | 10:28 |
gibi | I have a downstream case (on pike) where the nova unplug / replug behavior at openstack server start causes problems in ODL | 10:29 |
gibi | but you said that odl won't even send us plugtime events | 10:29 |
gibi | so waiting won't be the solution | 10:30 |
gibi | wondering if for backends not supporting plug time events we need to rebind | 10:30 |
sean-k-mooney | ah ml2/odl used to send events at bind time yes not at plug time but i think later version might have adress that | 10:30 |
gibi | ohh, good info, then I have to figure out the odl version too | 10:30 |
sean-k-mooney | give me to second and ill see if i can find that in the odl neutron driver | 10:31 |
sean-k-mooney | basically to support bind time event they had to open a websock that odl could call back to to send event to neutron which neuton could then send to nova | 10:32 |
sean-k-mooney | that was not part of the orginal ml2 driver | 10:32 |
gibi | you mean for plug time event | 10:32 |
sean-k-mooney | so in the legacy driver they just hardcoded the port as active as soon as the port was bound https://opendev.org/openstack/networking-odl/src/branch/master/networking_odl/ml2/legacy_port_binding.py#L58-L61 | 10:33 |
sean-k-mooney | which resulted in network-vif-plugged beign sent at bind time only | 10:33 |
sean-k-mooney | then latere they added a way to get port status updates form odl https://opendev.org/openstack/networking-odl/commit/3432e0f875f8243a2c1febf2012bd0f58863c56b | 10:34 |
sean-k-mooney | https://opendev.org/openstack/networking-odl/src/branch/master/networking_odl/ml2/port_status_update.py#L91-L95 | 10:34 |
sean-k-mooney | and then they would correctly send plug time events | 10:34 |
sean-k-mooney | gibi: but pike might predate that | 10:35 |
gibi | cool, I can check the dates then | 10:35 |
gibi | sean-k-mooney: thanks a lot | 10:35 |
gibi | so the solution is fresh enough networking-odl to have plug time event, and the in yoga think about waiting for plug time event at reboot as well | 10:36 |
sean-k-mooney | no worries assuming we can confirm this works today we could add odl also to the list of "send plug events" or whatever and then start waiting | 10:36 |
sean-k-mooney | yep | 10:36 |
gibi | does neutron have odl jobs in the upstream ci? | 10:36 |
sean-k-mooney | so we dont really need to wait for the ptg to discuss this i just have not had time to work on a spec | 10:36 |
sean-k-mooney | good question im not sure | 10:37 |
gibi | ok I will check that too | 10:37 |
sean-k-mooney | i suspect not | 10:37 |
sean-k-mooney | networking-odl should have jobs however | 10:37 |
gibi | ahh good point that is a separate repo with it own jobs | 10:37 |
sean-k-mooney | https://opendev.org/openstack/networking-odl/src/branch/master/.zuul.d/project.yaml#L51-L52 | 10:38 |
sean-k-mooney | it also has multi node versions | 10:38 |
gibi | cool then we can have test coverage upstream about the change | 10:38 |
sean-k-mooney | am on thing i have been talking to artom about is possible adding some other network toplogies to nova's gate as a periodic | 10:39 |
sean-k-mooney | so right now we have ovs with iptables in nova-next, we have ovn for most things and linux bridge for some file changes | 10:39 |
sean-k-mooney | it might be nice to add a weekly multi node odl and ml2/ovs job | 10:40 |
sean-k-mooney | weekly multi-node job that is to tesst migration and resize ectra | 10:40 |
gibi | I would expect that some of this coverage is already running on the neutron side | 10:41 |
gibi | but if not then I agree we can add it | 10:41 |
sean-k-mooney | that will tell them if we break things but only if the project is activly beign developed | 10:41 |
sean-k-mooney | it looks like networking-odl for example only gets comiits every month or so | 10:42 |
gibi | yeah, that is a good point too | 10:42 |
gibi | then we need the periodic to trigger testing even if networking-odl is not changed | 10:43 |
sean-k-mooney | yep just to make sure we did not break something. but really weekly is more then enought i think | 10:43 |
sean-k-mooney | we can review it like the placment jobs | 10:43 |
gibi | yepp, agree | 10:44 |
sean-k-mooney | gibi: i would also like to add dpdk at some point but i have not been maintaining networking-ovs-dpdk so i will need to add devstack support for dpdk or fix it before thats even posible at this poing | 10:45 |
sean-k-mooney | *point | 10:45 |
gibi | I support that idea too. Downstream we use dpdk a lot, so would be nice to get coverage for it upstream. | 10:46 |
sean-k-mooney | fortunetly on the dpdk front its support by distros now so its now just a matter of turning it on and seting a few config values but still need to sit down and do that | 10:46 |
gibi | let me know if I can help | 10:46 |
sean-k-mooney | sure will do. when i get around to workign on the devstack patches ill let you know | 10:47 |
opendevreview | sean mooney proposed openstack/nova master: Ensure MAC addresses characters are in the same case https://review.opendev.org/c/openstack/nova/+/811947 | 11:09 |
sean-k-mooney | noonedeadpunk: hopefully you dont mind me updateing ^ | 11:14 |
sean-k-mooney | gibi: can you take a look at ^ when you have time | 11:15 |
gibi | ack I will try | 11:16 |
* gibi gathered a big backlog of task this week, craxy | 11:16 | |
noonedeadpunk | sure I'm not! | 11:19 |
opendevreview | Lee Yarwood proposed openstack/nova-specs master: WIP libvirt: Allow Manila shares to be directly attached to instances https://review.opendev.org/c/openstack/nova-specs/+/813180 | 12:20 |
*** lbragstad_ is now known as lbragstad | 13:51 | |
lyarwood | stupid Friday question but can anyone think of an API where we've changed the return code in a microversion? I want to fix the POST os-volume_attachments method to return 202 but I can't think of the best way to do this | 14:37 |
lyarwood | ah nvm I see just duplicate the actual method and decorate it | 14:39 |
bauzas | lyarwood: correct, I was about to point it to you | 15:00 |
bauzas | (sorry for the late ping, kids taxi) | 15:00 |
bauzas | lyarwood: do you want to discuss on your spec ? | 15:00 |
stephenfin | lyarwood: There are quite a few APIs that suffer from that issue. Are you just going to do the one or multiple? | 15:00 |
bauzas | lyarwood: fwiw, I'm leaving in 25 mins for the weekend | 15:04 |
lyarwood | bauzas: sorry was just getting a drink, happy to chat now if you had any questions etc | 15:06 |
bauzas | I briefly looked at your spec | 15:06 |
lyarwood | stephenfin: just os-volume_attachments for now but I'll likely start to address others once I've worked this out | 15:07 |
lyarwood | I'm not sure if people would prefer a single microversion for everything etc | 15:07 |
dansmith | so, I'm pretty sure that we originally said we wouldn't do microversions for things like this | 15:08 |
bauzas | if 500, yes | 15:08 |
lyarwood | that would make my life easier | 15:08 |
dansmith | because nobody is going to opt-in to a new return code, or opt out (back?) to the old one | 15:08 |
sean-k-mooney | dont we requrie a microversion if you are chaning the reponcoe code to one that is not currently used | 15:08 |
bauzas | I mean, 500 to something clearer, not a microversion | 15:08 |
dansmith | if you're changing other things about the api, then doing the return code as part of it makes sense | 15:08 |
dansmith | but not just a "2.98: change a bunch of 200s to 202s" | 15:08 |
bauzas | sec, giving you the API rules | 15:08 |
sean-k-mooney | well 200 to 202 is blocking to async | 15:09 |
lyarwood | I wonder why we have TODOs everywhere for this then | 15:09 |
dansmith | sean-k-mooney: yeah, I'm not saying do it without a microversion, I'm saying don't do it unless there's something else changing | 15:09 |
lyarwood | oh sorry I see | 15:09 |
sean-k-mooney | https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/202 | 15:09 |
lyarwood | well tbh it's never going to get done in that case | 15:09 |
sean-k-mooney | lyarwood: you want to move this to be an async api yes? | 15:10 |
dansmith | if we | 15:10 |
bauzas | https://specs.openstack.org/openstack/api-wg/guidelines/microversion_specification.html for the API general rules | 15:10 |
lyarwood | sean-k-mooney: it already is | 15:10 |
sean-k-mooney | oh then 200 is wrong ya | 15:10 |
bauzas | https://docs.openstack.org/nova/latest/contributor/microversions.html#when-do-i-need-a-new-microversion for nova specifics | 15:10 |
lyarwood | yes | 15:10 |
dansmith | are moving it from sync to async, then *of course* it makes sense to change, but just a microversion for changing the return code but not the behavior is kinda silly | 15:11 |
sean-k-mooney | right | 15:11 |
sean-k-mooney | im more ok with not haveing a microversion | 15:11 |
sean-k-mooney | if we are not changing behavior | 15:11 |
sean-k-mooney | just the return code | 15:11 |
dansmith | I'm not saying we should do that | 15:11 |
bauzas | technically, we agreed on asking for a microversion https://docs.openstack.org/nova/latest/_images/graphviz-a4c9682faf139450eed50fdcc7dd06df5677c197.png | 15:11 |
bauzas | unless it was changing from HTTP500 | 15:12 |
dansmith | I'm saying returning 200 for an async api that has always been async is unfortunate, but not that big of a deal | 15:12 |
sean-k-mooney | dansmith: you are say until we modify tese again dont change it | 15:12 |
dansmith | sean-k-mooney: yes | 15:12 |
dansmith | nobody is going to opt-in to "no new behavior but a more accurate return value" | 15:12 |
sean-k-mooney | i think that is also reasonable becasue of ^ | 15:12 |
bauzas | but we mention something : | 15:12 |
bauzas | " | 15:12 |
bauzas | The exception to not needing a microversion when returning a previously unspecified error code is the 400, 403, 404 and 415 cases. This is considered OK to return even if previously unspecified in the code since it’s implied given keystone authentication can fail with a 403 and API validation can fail with a 400 for invalid json request body. Request to url/resource that does not exist always fails with 404. Invalid content typ | 15:12 |
bauzas | es are handled before API methods are called which results in a 415. | 15:12 |
bauzas | " | 15:12 |
lyarwood | right but at least it's fixed later when they opt-in to some other change | 15:12 |
dansmith | bauzas: that has nothing to do with this case | 15:13 |
bauzas | nothing tells us to not ask for a microversion if changing from 200 | 15:13 |
bauzas | dansmith: sure, I was just looking at what we wrote :) | 15:13 |
bauzas | to see whether we said "cool if 200 -> 202 and async" | 15:13 |
sean-k-mooney | bauzas: under that policy we sould have to do a microversion but there is no untily in changing the repssonce doe and making it opt in | 15:13 |
dansmith | lyarwood: right but all of those kinds of death-by-1000-cuts things impact people, which is why we didn't do a massive cleanup early on | 15:13 |
dansmith | lyarwood: right now if you know that api returns 200 and you check for 200, then returning 202 could break client code even though absolutely nothing has changed | 15:14 |
dansmith | it'd be bad client code, granted, but.. it's pain for pretty much no gain | 15:14 |
bauzas | that's the point ^ | 15:14 |
lyarwood | dansmith: and how does that change anything by including it with other changes later on? | 15:14 |
lyarwood | dansmith: bad clients will still break | 15:14 |
sean-k-mooney | lyarwood: in the ohter api change later the clinet needed to be updated anyway | 15:15 |
bauzas | if clients expect a 200 on a call, we can't leave them wondering now why they get something else | 15:15 |
sean-k-mooney | e.g. there was already work for them to do | 15:15 |
dansmith | lyarwood: well, right, I'm saying it's not worth changing like ever, unless some other behavior changes that would require a behavioral change anyway | 15:15 |
dansmith | it's a developer purity thing, like many of our TODOs everywhere, but unless it impacts proper client behavior, it's more likely to break things than fix them, IMHO | 15:16 |
lyarwood | at least this way we could change all of them in one go and fix the same issue in the clients in one go | 15:16 |
lyarwood | instead you're suggesting different people fix them one by one at different times | 15:16 |
dansmith | when we started out do the v3 API, it was all this purity stuff, and we literally unrolled it all when we realized we were just making pain for people | 15:16 |
dansmith | lyarwood: again, I'm not | 15:16 |
lyarwood | okay it's Friday I'm likely missing something but given people don't want this I'm going to stop and move on | 15:18 |
sean-k-mooney | lyarwood: when we have done this clean up in the past we have paired it with other changes with the logic being a client tha want to use that new feature via a microversion need to be updated anyway so they can adapt to both change at the same time. | 15:18 |
dansmith | let me ask this way: what client behavior is going to be wrong if we return 200 that will be right if we return 202? | 15:18 |
dansmith | if there's something there, then it's worth changing | 15:18 |
dansmith | sean-k-mooney: right | 15:18 |
lyarwood | tbh with 200 to 202 it's not our clients that I'm worried about | 15:19 |
lyarwood | it's other callers outside of them and their understanding of the API | 15:19 |
dansmith | lyarwood: right, but ... what's the behavior or understanding that will be wrong? | 15:20 |
lyarwood | the CSI plugin in k8s for OpenStack for example has been misusing it for a while | 15:20 |
sean-k-mooney | your assumeing they are reading the reponce code but not the docs that say this is an async api | 15:20 |
lyarwood | dansmith: assuming it's sync and not polling | 15:20 |
dansmith | meaning, generally 202 means "if you query this thing right after the PUT/POST then you will not see it updated so don't expect it to be" .. is it that? or is it "this operation takes a while?" | 15:20 |
dansmith | lyarwood: but is the actual API operations sync and just not the attachment? | 15:21 |
lyarwood | we return after we cast to the compute to do the actual attachment | 15:22 |
sean-k-mooney | dansmith: i think k8s has been assuming that the attachment was completed when it got the 200 | 15:22 |
lyarwood | so it takes a while, but the CSI driver would start polling Cinder instead of Nova and that has caused various issues | 15:22 |
lyarwood | because they assumed the attachment in Nova was done at that point | 15:22 |
dansmith | I'm not sure that | 15:23 |
dansmith | is reasonable behavior to assume from either 200 or 202 though | 15:23 |
dansmith | you're giving it attachment information, which it is storing, and that triggers some behavior right? | 15:23 |
dansmith | like, IIRC, 200 is for "I've stored your object, here's the ID you can use to get it again" and 202 is for "I will store that later, but I can't necessarily give you its identifier yet" right? | 15:24 |
sean-k-mooney | for a 202 they cannot assuem its complete and should expect to poll nova for the compleation | 15:24 |
lyarwood | dansmith: yeah but my understanding with 200 was that processing was complete with the return | 15:25 |
sean-k-mooney | for 200 generally you would assume it blocks until its completed but its valid to retrun before that | 15:25 |
lyarwood | dansmith: and as sean-k-mooney said, 202 requires additional processing | 15:25 |
dansmith | I think that's true of the REST object and not necessarily the actions implied from the thing you're storing | 15:25 |
sean-k-mooney | lyarwood: has the k8s csi plugin be fixed to pool nova by the way | 15:26 |
dansmith | but again, as sean-k-mooney said, this is really a result of them not reading the docs that say it's async and implying which part of the operation is synchronous from the 200 | 15:26 |
sean-k-mooney | regardeless of if we cahnge the api | 15:26 |
lyarwood | sean-k-mooney: https://github.com/kubernetes/cloud-provider-openstack/issues/1645 not yet | 15:26 |
sean-k-mooney | ok but they are aware of it and it will get eventually fixed | 15:27 |
lyarwood | yup sure, just outlining my motivation to fix this | 15:27 |
* lyarwood looks at the Nova bug backlog and laughs | 15:27 | |
lyarwood | ;) | 15:27 |
sean-k-mooney | :) | 15:27 |
sean-k-mooney | i assume we have a customer hitting this so that normally seed up fixing things | 15:28 |
dansmith | to be clear, making it 202 won't fix this, we're just assuming the human that wrote the code would have assumed they should do something different if the code was 202 right? | 15:28 |
lyarwood | yup that's the assumption, it wouldn't fix anything in CSI | 15:28 |
lyarwood | I got triggered by the TODO today and wanted to take a swing a killing it | 15:29 |
sean-k-mooney | if we wnated to fix CSI issue in nova we would have to block | 15:29 |
dansmith | so, I mean.. we're like changing something that could easily break other clients that did read the docs, based on the assumption that someone who didn't would have done the right thing if it was different | 15:29 |
sean-k-mooney | wich i dont think we want to do | 15:29 |
dansmith | pretty meh :) | 15:29 |
bauzas | ok, sorry guys, I need to bail out from now | 15:30 |
lyarwood | bauzas: \o if you want to chat on Monday ping me an email, I should have time to chat once I'm on London | 15:31 |
lyarwood | in* | 15:31 |
bauzas | lyarwood: don't worry, I'll use the power of community brainstorm to move on with your spec ;) | 15:32 |
dansmith | I think we've all put on a few pandemic pounds, but being "on london" would require a substantial increase in waist size, I think | 15:32 |
* lyarwood looks at 16 month old child | 15:32 | |
lyarwood | dansmith: it has been a rough 16 months, it wouldn't take much more | 15:33 |
dansmith | heh | 15:33 |
opendevreview | Julia Kreger proposed openstack/nova master: Ignore plug_vifs on the ironic driver https://review.opendev.org/c/openstack/nova/+/813263 | 22:23 |
*** tbachman is now known as Guest2264 | 23:35 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!