| Nick | Message | Time |
|---|---|---|
| cardoe | I wandered off. I'll grab logs tomorrow though | 01:52 |
| TheJulia | okay, cool. I have another idea which may be worthwhile to consider, but logs first | 02:32 |
| rpittau | good morning ironic! o/ | 08:01 |
| *** sfinucan is now known as stephenfin | 10:16 | |
| rpittau | CI is fubar! \o/ | 10:57 |
| rpittau | new oslo.process version does not support no_fork in ServiceLauncher? | 10:57 |
| rpittau | oh it never supported it oO | 11:05 |
| rpittau | I think we should just switch to process launcher | 11:06 |
| *** BertrandLanson[m] is now known as blanson[m] | 11:06 | |
| rpittau | btw I'm talking about this http://a31ada860fc20a35932c-5da8dd525c228407ee4661a46790293d.ssl.cf5.rackcdn.com/openstack/999131c0941e4f1cae35ed71f9ab8b22/logs/ironic.log | 11:06 |
| rpittau | s/oslo.process/oslo.service | 11:08 |
| rpittau | ooook ServiceLauncher silently ignored unknown kwargs :/ | 11:09 |
| rpittau | until now | 11:09 |
| rpittau | great! | 11:09 |
| rpittau | writing a fix | 11:14 |
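Editor's note: a minimal sketch (hypothetical class names, not the actual oslo.service code) of the failure mode rpittau describes above. A launcher whose `__init__` swallows `**kwargs` silently ignores unsupported options like `no_fork`, while a strict signature turns the same call into a TypeError at startup:

```python
class LenientLauncher:
    def __init__(self, conf, restart_method='reload', **kwargs):
        # Unknown options such as no_fork=True land in kwargs and are
        # silently dropped: the service starts, the option does nothing.
        self.conf = conf


class StrictLauncher:
    def __init__(self, conf, restart_method='reload'):
        # Strict signature: unknown keyword arguments now fail loudly.
        self.conf = conf


LenientLauncher(conf={}, no_fork=True)        # accepted, quietly ignored

try:
    StrictLauncher(conf={}, no_fork=True)     # the oslo.service 4.4+ behavior
except TypeError as exc:
    print(exc)  # unexpected keyword argument 'no_fork'
```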
| opendevreview | Riccardo Pittau proposed openstack/ironic master: Fix singleprocess launcher compatibility with oslo.service 4.4+ https://review.opendev.org/c/openstack/ironic/+/967821 | 11:18 |
| *** mdfr8 is now known as mdfr | 12:37 | |
| rpittau | if any core is around, this is passing CI ^ | 13:21 |
| opendevreview | Riccardo Pittau proposed openstack/bifrost master: [WIP] Remove tinyipa support and switch to debian IPA https://review.opendev.org/c/openstack/bifrost/+/964404 | 13:24 |
| opendevreview | nidhi proposed openstack/ironic master: Add Redfish LLDP data collection support https://review.opendev.org/c/openstack/ironic/+/967841 | 13:35 |
| opendevreview | nidhi proposed openstack/ironic master: Add Redfish LLDP data collection support https://review.opendev.org/c/openstack/ironic/+/967841 | 13:38 |
| opendevreview | nidhi proposed openstack/ironic master: Add Redfish LLDP data collection support to the Redfish inspection interface. https://review.opendev.org/c/openstack/ironic/+/967841 | 13:43 |
| opendevreview | nidhi proposed openstack/ironic master: Add PCIe function fields to redfish inspection https://review.opendev.org/c/openstack/ironic/+/963179 | 14:24 |
| opendevreview | Dmitry Tantsur proposed openstack/bifrost master: WIP add an OCI artifact registry https://review.opendev.org/c/openstack/bifrost/+/961388 | 14:27 |
| opendevreview | nidhi proposed openstack/ironic master: Add PCIe function fields to redfish inspection https://review.opendev.org/c/openstack/ironic/+/963179 | 14:28 |
| dtantsur | rpittau: I'd prefer to wait for TheJulia to check that changing ServiceLauncher to ProcessLauncher is fine | 14:29 |
| TheJulia | oh, hmmmmm | 14:31 |
| TheJulia | you can't process launch the vnc code | 14:32 |
| TheJulia | it goes kaboom internally and won't work | 14:32 |
| TheJulia | Only real option is to remove the no_fork option, I guess | 14:33 |
| TheJulia | Then again, I could likely stage it up here in a little bit and give it a spin to see if the vnc stuff works, or not | 14:33 |
| rpittau | TheJulia: ack | 14:34 |
| rpittau | removing the no_fork option should work, it was ignored so far | 14:34 |
| * TheJulia tears down and prepares to restack | 14:41 | |
| opendevreview | nidhi proposed openstack/ironic master: Add PCIe function fields to redfish inspection https://review.opendev.org/c/openstack/ironic/+/963179 | 14:48 |
| TheJulia | Okay, should be pulling everything in fresh | 14:55 |
| rpittau | TheJulia: btw the CI is passing with that in bifrost https://review.opendev.org/c/openstack/bifrost/+/964404 | 15:08 |
| rpittau | but I agree we should probably just remove the no_fork option | 15:08 |
| rpittau | just let me know if you want me to update the patch | 15:08 |
| dtantsur | I don't think that Bifrost is testing the VNC proxy | 15:10 |
| rpittau | yeah, but at least ironic starts now :D | 15:10 |
| TheJulia | rpittau: if you wouldn't mind just removing the no_fork option, that would be good. If I can get devstack to behave I can at least spin it up and test if the proxy service operates or not in that case | 15:15 |
| rpittau | TheJulia: sure, no problem! updating the patch now | 15:15 |
| TheJulia | Looks like I'm finally re-stacking now | 15:20 |
| opendevreview | Riccardo Pittau proposed openstack/ironic master: Fix singleprocess launcher compatibility with oslo.service 4.4+ https://review.opendev.org/c/openstack/ironic/+/967821 | 15:21 |
| clif | cardoe TheJulia do y'all have any perspective on this if statement in NeutronVIFPortIDMixin.vif_attach? It says neutron cannot have a host/instance connected to more than one physical_network at a time and enforces that requirement by raising an exception: | 15:48 |
| clif | https://opendev.org/openstack/ironic/src/commit/e75c8a4483b437eb98f5cb8089c8809bedb526bf/ironic/drivers/modules/network/common.py#L623 | 15:49 |
| clif | does this still hold true in neutron? It would be a large hurdle for intended trait based networking operation otherwise | 15:49 |
| TheJulia | So, I think it might be for individual vif creation and mapping, but not across all vifs because you could have a hypervisor (or baremetal node for that matter) which bridges physical networks | 15:54 |
| clif | reading the logic more carefully it seems like it looks at if the vif being considered for attachment has more than one physical_network in common with existing node physical_networks | 15:59 |
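Editor's note: a self-contained, toy restatement of the logic clif describes; the real guard in ironic/drivers/modules/network/common.py operates on task and client objects via `network.get_physnets_by_port_uuid()` and `network.get_physnets_for_node()`, so treat this as a sketch of the condition, not the actual code:

```python
def vif_allowed(vif_physnets: set, node_physnets: set) -> bool:
    # The VIF is rejected only when it spans multiple physical networks
    # AND the node's own ports intersect more than one of them.
    if len(vif_physnets) > 1 and len(node_physnets & vif_physnets) > 1:
        return False
    return True


assert vif_allowed({'floor1'}, {'floor1', 'floor2'})                # single segment: ok
assert vif_allowed({'floor1', 'floor2'}, {'floor1'})                # node touches one: ok
assert not vif_allowed({'floor1', 'floor2'}, {'floor1', 'floor2'})  # rejected
```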
| clif | I guess I should probably take this into consideration when planning the network operations and emit an error or warning at plan time if possible | 16:00 |
| TheJulia | okay, no_fork doesn't work | 16:01 |
| clif | it would stink to get half way through the actions and then blow up because what was planned is not possible from the outset | 16:01 |
| JayF | iurygregory: Do you know how far the idrac 10 fixes got backported and/or perhaps have a handy-dandy list of patches that need backporting (if not upstream then downstream :D) | 16:02 |
| JayF | iurygregory: my folks are gonna have DRAC10 in a lab soon and will be trying to do vmedia on caracal, which is not gonna be a good time so I'm trying to help smooth it over :) | 16:02 |
| clif | but that NOTE by mgoddard makes the physical_network restriction more dire than what is directly implied by the code itself | 16:02 |
| TheJulia | rpittau: your first change was good, second revision with no_fork was bad. | 16:03 |
| TheJulia | I've not tried to fire up the proxy, but the code does internally execute past where it would have failed so I feel pretty good about the first revision you had. | 16:04 |
| JayF | clif: given that comment came in with the original physical_network mapping implementation, my hunch is that it's more about "if more than one match; I don't know how to configure it" than anything else | 16:04 |
| JayF | clif: which I think would not apply if TBN is doing scheduling instead of physnet matching | 16:04 |
| TheJulia | JayF: ++ | 16:04 |
| clif | "Neutron cannot currently handle hosts ..." is concerning | 16:04 |
| rpittau | TheJulia: I'll revert the revert! | 16:04 |
| JayF | Neutron will never know :D | 16:05 |
| clif | lol ok | 16:05 |
| JayF | Don't tell them shhhhh | 16:05 |
| JayF | ;) | 16:05 |
| JayF | lol | 16:05 |
| clif | fair | 16:05 |
| TheJulia | so internally, we *do* this backfill of physical network data into neutron and I'm pretty sure it is not blowing up, but then again maybe nobody has actually tried different physical networks | 16:05 |
| JayF | I think of it like a real restriction; you can't have a given interface on multiple networks | 16:05 |
| TheJulia | hey hjensas! | 16:05 |
| TheJulia | this discussion might interest you! | 16:05 |
| JayF | and nothing TBN should be trying to put a single port onto >1 network | 16:06 |
| clif | yea that tracks | 16:06 |
| clif | I will proceed as if everything is fine and then worry if something blows up | 16:06 |
| JayF | I wonder if it goes the other way too | 16:06 |
| JayF | where if I have physical_network=foo on multiple vif/portgroups | 16:07 |
| JayF | because in the real world that presents routing pain | 16:07 |
| TheJulia | ports in neutron can end up on any physical network as long as the base network supports it; the physical network helps guide it to the supported network if applicable | 16:07 |
| JayF | I don't think we should guard against it, per se, but there are several ways someone could misconfigure themselves in TBN | 16:07 |
| JayF | and to some effect, that's on them :) | 16:07 |
| TheJulia | in cases where overlays exist (like geneve, vxlan), then there is no concept of a physical network | 16:07 |
| TheJulia | (which... STINKS) | 16:07 |
| TheJulia | ((but, I get why its modeled that way)) | 16:08 |
| TheJulia | physical networks are more a provider network concept, fwiw | 16:09 |
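Editor's note: a hedged openstacksdk illustration of the distinction TheJulia is drawing; the cloud name, network names, and VLAN ID are invented, and creating provider networks normally requires admin credentials:

```python
import openstack

conn = openstack.connect(cloud='example')  # hypothetical clouds.yaml entry

# A provider (VLAN) network is explicitly pinned to a physical network...
vlan_net = conn.network.create_network(
    name='prov-vlan',
    provider_network_type='vlan',
    provider_physical_network='physnet1',
    provider_segmentation_id=100)

# ...while an overlay (geneve/vxlan) network has no physical_network at all.
overlay_net = conn.network.create_network(
    name='tenant-overlay',
    provider_network_type='geneve')

print(vlan_net.provider_physical_network)     # 'physnet1'
print(overlay_net.provider_physical_network)  # None
```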
| JayF | (((ergh, ok))) | 16:09 |
| JayF | (I joked to clif yesterday that "ergh, ok" was the official frustrated exclamation of OpenStack) | 16:09 |
| clif | because I made it :) | 16:09 |
| TheJulia | ((((We're likely some of the few people to be crazy enough to want to bridge overlays to physical networks....)))) | 16:09 |
| JayF | we're only [checks survey] 25% of all nova users | 16:10 |
| JayF | so not many /s | 16:10 |
| opendevreview | Riccardo Pittau proposed openstack/ironic master: Fix singleprocess launcher compatibility with oslo.service 4.4+ https://review.opendev.org/c/openstack/ironic/+/967821 | 16:10 |
| opendevreview | Riccardo Pittau proposed openstack/ironic master: Fix singleprocess launcher compatibility with oslo.service 4.4+ https://review.opendev.org/c/openstack/ironic/+/967821 | 16:16 |
| TheJulia | Although, it doesn't seem super happy about control-c :\ | 16:30 |
| TheJulia | The odds of vnc in that single process case are a bit slim, it looks like the interrupt calls get overridden because of the way it's launched and it never records to the main process | 16:31 |
| rpittau | what's weird is that it was working before, when it was actually mapped to ProcessLauncher, so it should just work? | 16:34 |
| cardoe | clif: but a machine shouldn't be on more than 1 physical_network at a time? | 16:34 |
| clif | I probably don't understand the semantics of physical_network. Why not? Can't a machine have multiple physical ports that are connected to different switches/networks? | 16:36 |
| TheJulia | rpittau: yeah, ProcessLauncher should have launched its own subprocess; now it's an all-in-one binary with threads running concurrently | 16:37 |
| TheJulia | tl;dr single-process + vnc is not an expected case by default and only works locally because I have it forced on. | 16:38 |
| rpittau | ah well | 16:38 |
| TheJulia | clif: as it relates to overlays or physical networks? | 16:38 |
| clif | either one? | 16:41 |
| TheJulia | overlays because physical network concept doesn't exist in that | 16:43 |
| TheJulia | each hypervisor tunnels to each other | 16:43 |
| TheJulia | and magic happens | 16:43 |
| TheJulia | With physical networks, those are provider networks, and a network once created and mapped to a physical network can only be distributed via a singular physical network, even if there is an overlap between the physical networks. | 16:44 |
| TheJulia | it's an address-mapping/logical-mapping constraint in neutron. | 16:44 |
| TheJulia | When we bind, we know the lower-level details as configured, and the intermediate-level details, beyond the vlan provider network being attached to a network fabric, are all sort of handwavey in neutron because it relies upon site/operator-specific configuration. | 16:45 |
| TheJulia | rpittau: oh, you know what might not work, systemd might not detect it as running. In eventlet and entirely single-threaded process models, it needs that call, which they had no plans on implementing, but you're leaning the service hard into pure single-process with no manager of sorts | 16:52 |
| cardoe | clif: so what you linked before is routed networks or L3VNIs and that statement is absolutely correct. A host cannot be a part of multiple physical_networks at once in that case. | 17:53 |
| cardoe | clif: https://cardoe.com/neutron/evpn-vxlan-network/admin/data-center-networks.html#physical-layout here's a poor man's picture | 17:53 |
| JayF | HOWEVER those are not the only ways to get networks to Ironic today, right? | 17:53 |
| cardoe | The code in the case of what you're talking about would be a server connected to both leaf switches in that picture at the same time | 17:54 |
| cardoe | And that cannot happen | 17:54 |
| cardoe | In the wise words of Trey Parker and Matt Stone, that would be french frying when you should have pizzaed. | 17:54 |
| JayF | cardoe: the part that's not clear to me is if that is /a/ possible network architecture vs /the only/ possible network architecture | 17:56 |
| cardoe | And I actually take back my statement about L3VNIs and I'll amend it to using Neutron's terminology... L2 segmented networks and L3VNIs | 17:57 |
| cardoe | JayF: There's 3 network architectures possible. | 17:57 |
| JayF | cardoe: a little worried we're going to get tunnel visioned on a use case when the scheduling bits should (maybe?) be more flexible | 17:57 |
| cardoe | Hard stop | 17:57 |
| cardoe | Unless we wanna talk custom vendor things. | 17:58 |
| JayF | My question is more if that spot in ironic is the place for us to say "don't do that" | 18:00 |
| JayF | that's what feels weird to me | 18:00 |
| cardoe | So from a physical port binding I think Ironic should just care about EVPN type 2. Hard stop. | 18:01 |
| JayF | You're not exactly answering my question though; at vif attach time, when TBN code activates, is that the right place for that check to be? | 18:01 |
| cardoe | But Neutron won't allow that so we'll have to take into account EVPN type 5. | 18:01 |
| cardoe | Yes that's the right place for that check. | 18:02 |
| cardoe | That check is 100% wrong. | 18:02 |
| JayF | [blink] | 18:02 |
| cardoe | https://bugs.launchpad.net/networking-generic-switch/+bug/2114451 | 18:02 |
| JayF | I feel like 90% of my questions get answered with a link to or description of that bug | 18:03 |
| cardoe | I wish it wasn't the case. | 18:03 |
| JayF | and for purposes of this question I'm trying to think about what *Ironic* should check | 18:03 |
| JayF | and that all centers around neutron logic and modeling | 18:03 |
| JayF | (right?) | 18:03 |
| cardoe | Yeah. Help me get them to fix stuff. | 18:04 |
| cardoe | So easiest description... | 18:04 |
| cardoe | You have an office with a lot of computers. So let's say you decide to use 192.168.10.0/24 for floor 1 and 192.168.20.0/24 for floor 2. | 18:05 |
| cardoe | In Neutron terms, it's one office network so it's 1 Neutron Network. Traffic can get between all machines. | 18:06 |
| cardoe | It's 2 segments on that network. Hence that code that clif linked walking the segments. | 18:06 |
| cardoe | Ports of computers that are on floor 1 would have a physical_network = floor 1 and floor 2 would be physical_network = floor 2. | 18:07 |
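Editor's note: as a sketch, cardoe's floors example maps onto Neutron's routed-network (segments) API roughly like this in openstacksdk. Names and VLAN IDs are invented, and in a real deployment the first segment usually comes from the network's own provider attributes:

```python
import openstack

conn = openstack.connect(cloud='example')  # hypothetical clouds.yaml entry

# One Neutron network for the whole office...
office = conn.network.create_network(name='office')

# ...with one segment per floor, each pinned to its own physical_network.
floor1 = conn.network.create_segment(
    network_id=office.id, network_type='vlan',
    physical_network='floor1', segmentation_id=101)
floor2 = conn.network.create_segment(
    network_id=office.id, network_type='vlan',
    physical_network='floor2', segmentation_id=102)

# Each floor's subnet is tied to its segment.
conn.network.create_subnet(network_id=office.id, segment_id=floor1.id,
                           ip_version=4, cidr='192.168.10.0/24')
conn.network.create_subnet(network_id=office.id, segment_id=floor2.id,
                           ip_version=4, cidr='192.168.20.0/24')
```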
| JayF | that check is in context of a single vif though | 18:07 |
| JayF | that code only runs with a single vif coming in | 18:07 |
| cardoe | Yep. | 18:07 |
| JayF | so if it shouldn't have >N physical_network, isn't it neutron's job to enforce that? | 18:07 |
| JayF | That's why I'm struggling here, it feels like if there's a check to happen here it shouldn't be happening in Ironic | 18:08 |
| JayF | because that code is 100% just about mapping neutron ports to ironic port-likes | 18:08 |
| cardoe | oh. sure you can toss that check. | 18:08 |
| JayF | and there's no problem if >1 physical network is returned by neutron, because we should still be able to whittle it down to one using TBN | 16:09 |
| JayF | or else TBN itself blows it up | 18:09 |
| JayF | All I'm trying to get to the heart of is what our responsibilities are at vif_attach time | 18:09 |
| cardoe | If we fix https://bugs.launchpad.net/networking-generic-switch/+bug/2114451 then that check is unnecessary. | 18:09 |
| JayF | and it seems to me that that check is WELL OUTSIDE them | 18:09 |
| JayF | I don't think that bug applies in all cases, but the check does | 18:10 |
| cardoe | Well lemme continue. | 18:10 |
| cardoe | So simple setup of the floors correct? | 18:10 |
| * JayF is trying hard to think about this as inputs/outputs to vif_attach without modeling the world :( | 18:11 | |
| JayF | yeah, N subnets in a single (v(x))LAN | 18:11 |
| cardoe | But that's too simplified a version of the world. | 18:11 |
| cardoe | That's not how the real world is. | 18:11 |
| cardoe | You've really got 10 switches on floor 1 serving up that 192.168.10.0/24 block. | 18:12 |
| JayF | I've literally run a lan configured like this, btw | 18:12 |
| JayF | except for data/voice subnets | 18:12 |
| cardoe | I mean I would hope so | 18:12 |
| cardoe | So in the older aggr model of the world you'd just stretch a VLAN across floor 1 | 18:13 |
| cardoe | But your overall traffic suffers | 18:13 |
| JayF | yes, but that does still represent a physical network design that exists and is used in openstack cloud contexts | 18:13 |
| cardoe | So you actually put each of your 10 switches on different VLANs which are serving up 192.168.10.0/24 | 18:14 |
| cardoe | In terms of Neutron, each of those 10 switches are their own segment still | 18:14 |
| JayF | this is where you lose me in terms of real world, this sounds bananas to me | 18:14 |
| cardoe | That's how spine leaf works | 18:14 |
| cardoe | The MAC addresses are registered as being part of 1 leaf so the spine traffic just gets directed where it needs to go ensuring your overall throughput. | 18:15 |
| TheJulia | oh gawd lots of talking | 18:15 |
| cardoe | Your VLAN stretched across the entirety of floor 1 is really divided up. | 18:16 |
| TheJulia | I just opened some wine for a beef stew I was starting, should I be pouring myself a glass? | 18:16 |
| TheJulia | (and how do y'all get that far into discussion on networking while I started a stew!) | 18:17 |
| * cardoe shrugs. | 18:17 | |
| cardoe | This is why I'm struggling with this whole issue. | 18:17 |
| cardoe | Cause I can never describe it concisely and everyone tells me I'm crazy. | 18:18 |
| TheJulia | crazy on what aspect? | 18:18 |
| JayF | I think you're describing *a* possible design, not *the only* possible design | 18:18 |
| JayF | and I'm really trying to keep a super generic hat on to help clif get past his specific TBN question | 18:18 |
| TheJulia | we do a lot of what JayF notes | 18:18 |
| TheJulia | ++ | 18:18 |
| * TheJulia summons the glasses so she can actually read | 18:18 | |
| cardoe | JayF: I'm describing how VXLAN works. | 18:19 |
| JayF | which at this point sounds like the answer is: that check is not valid in all cases, and even when it's valid it's not in the correct place, so we should remove it and the broken cases (which are seemingly already broken in some ways) stay broken until 2114451 works | 18:19 |
| JayF | cardoe: I don't use vxlan. | 18:19 |
| cardoe | $5 says you do. | 18:19 |
| JayF | I don't have root on any machine with vxlan connectivity, $50000 says that's true ;) | 18:20 |
| TheJulia | OR he *does* and his part of the network doesn't matter because they are doing the handoff | 18:20 |
| JayF | yes | 18:20 |
| JayF | yes yes yes | 18:20 |
| JayF | it's not ironic's problem | 18:20 |
| TheJulia | Bottom line, if one wants dynamic vxlan stuffs, that's a lot | 18:20 |
| JayF | the vlans are pre-curated by network team | 18:20 |
| JayF | Ironic's job is to decide how many ports to bond together and what vlan to put them on | 18:20 |
| TheJulia | if everyone is willing to pre-curate the awfulness of vlans, it's okay | 18:20 |
| * TheJulia has glasses, and pins her hair back to begin reading from the beginning | 18:21 | |
| cardoe | I mean this issue that I'm describing right here is what causes Zuul jobs to go squirrelly | 18:21 |
| cardoe | Cause Zuul uses vxlan to setup the network that the nodesets are part of. | 18:21 |
| TheJulia | as an overlay to serve as an underlay for the devstack jobs | 18:22 |
| cardoe | Yep | 18:22 |
| cardoe | But this same "use the IP to find the segment" bug exists there. | 18:22 |
| cardoe | At least in all the OpenStack Helm jobs. | 18:22 |
| cardoe | What nodeset segment am I on? lemme use my IP address to look this up.... | 18:23 |
| JayF | Yeah, I think part of our comms disconnect is how far into the stack I have openstack worrying about | 18:23 |
| * TheJulia sees we went way off the rails putting topics into a blender | 18:23 | |
| JayF | like I said, network team owns everything above the instance ports here, so any dynamic craziness is preconfigured as essentially business logic into ngs configs | 18:23 |
| cardoe | Maybe. I suspect you'll have push back. | 18:25 |
| cardoe | Cause that's not efficient. | 18:25 |
| cardoe | unless you're dealing with a very small number of networks | 18:25 |
| JayF | We are. | 18:25 |
| JayF | Small number of networks at extremely high throughput and low latency | 18:25 |
| JayF | with eye-watering amounts of transit to each individual server | 18:25 |
| clif | so if that check in vif_attach is wrong I can safely ignore it and carry on? and maybe rip it out as part of ongoing tbn work? | 18:26 |
| JayF | clif: I'd suggest ripping it out for now and expecting that to be a hot topic in code review :) | 18:26 |
| clif | lol ok | 18:26 |
| clif | when it comes to network stuff I know enough to be dangerous but I'm not an expert at all | 18:27 |
| cardoe | I've had to learn more than I ever cared to know. | 18:27 |
| TheJulia | Can we have a meeting that is called "The Topic Frappe" ? | 18:27 |
| JayF | I know a TON about low level networking on linux in 2005. | 18:27 |
| JayF | Too bad time passes and everyone overlays 5000 things on top now | 18:28 |
| cardoe | JayF: I promise you if you're dealing with high throughput stuff this segment stuff is gonna matter. | 18:28 |
| JayF | I don't care if it matters; I care about answering the specific question about that block of code for TBN | 18:29 |
| JayF | I am not hungry enough to eat the whole (proverbial) elephant today, I just want a couple of bites :) | 18:29 |
| TheJulia | At the level which you are/will/can integrate today. | 18:30 |
| TheJulia | Which is separate from a desirable future state | 18:30 |
| JayF | yeah, I can never fully understand the picture in enough detail to reason about it, so I try to zoom in and solve it a bite at a time | 18:31 |
| JayF | TBN is one of those bites | 18:31 |
| cardoe | So the issue really is that the implementation of get_physnets_by_port_uuid() is wrong | 18:32 |
| cardoe | If https://bugs.launchpad.net/networking-generic-switch/+bug/2114451 was fixed | 18:32 |
| cardoe | That function could only ever return 1 | 18:33 |
| cardoe | I've already submitted a patch to remove the check. | 18:33 |
| cardoe | Well how about this... +W https://review.opendev.org/c/openstack/ironic/+/964570 | 18:34 |
| cardoe | And that gets me one step closer to fixing it | 18:34 |
| cardoe | https://review.opendev.org/c/openstack/ironic/+/952168 that's where I attempted to get rid of the check as much as possible | 18:35 |
| TheJulia | excellent! I'll try to review soon. I'm having to chase down a what if question for downstream stuff right now | 18:37 |
| cardoe | Well that last one is a -1 from me | 18:38 |
| TheJulia | on your own change? | 18:38 |
| cardoe | Yes. | 18:38 |
| TheJulia | There was a reason, fair enough | 18:39 |
| cardoe | I don't have that env setup anymore but that was the pure NGS environment. | 18:39 |
| cardoe | Neutron would plug the port in correctly to the server on a VLAN network (I wasn't doing VXLAN in that env... I mean I was, it was like what JayF's describing: the network folk set it up for us and we just used the VLANs) | 18:40 |
| cardoe | But then when you tore the server down, Neutron and Ironic get out their dartboard and pick a random switch to tell NGS to disconnect from. And things go sideways. | 18:40 |
| TheJulia | because switches are modeled in that as well? (which is valid, but I guess there was overlap in the switch selections?) | 18:43 |
| cardoe | Because it calls unplug_port_from_segment | 18:48 |
| cardoe | Which reads the VLAN from the segment and the switch to talk to from the port | 18:49 |
| cardoe | And it walks the segments to clean up and it has no clue what the relationship is between segments and switches. | 18:49 |
| cardoe | Wait sorry other way around | 18:50 |
| cardoe | it walks the switches and does list_of_segments[0] | 18:50 |
| cardoe | So the check prevents you from attaching to a vif that would have len(segments) > 1 | 18:51 |
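Editor's note: a hypothetical, self-contained sketch of the teardown hazard cardoe describes (not the actual networking-generic-switch code; `FakeSwitch` and the dict shapes are invented for illustration):

```python
class FakeSwitch:
    """Stand-in for a managed switch; real NGS drives vendor CLIs/APIs."""
    def __init__(self, name):
        self.name = name

    def delete_port(self, link, vlan_id):
        print(f'{self.name}: remove {link} from VLAN {vlan_id}')


def unplug_port_from_segment(port, segments):
    # Blind pick: with len(segments) > 1, the VLAN removed here may not be
    # the one the original attach used -- hence the guard in vif_attach.
    segment = segments[0]
    port['switch'].delete_port(port['link'], segment['vlan_id'])


port = {'switch': FakeSwitch('leaf1'), 'link': 'Ethernet1/1'}
unplug_port_from_segment(port, [{'vlan_id': 101}, {'vlan_id': 202}])
```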
| TheJulia | that makes sense then | 18:55 |
| TheJulia | (since you shouldn't... really.) | 19:58 |
| cardoe | JayF: let's just put it another way... the point of that check is that at detachment time you won't know the physical_network that the attach happened with... that's why it's preventing that... https://review.opendev.org/c/openstack/ironic/+/964570 puts the physical_network into the port's binding_profile at attachment time so then you know it at detachment time | 20:46 |
| cardoe | So even if neutron hands us back a list, we can find the right one. | 20:47 |
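Editor's note: a rough openstacksdk sketch of the idea in the patch cardoe links; the binding-profile key and helper names are illustrative, not the exact patch, and a real implementation would merge with any existing profile rather than overwrite it:

```python
import openstack

conn = openstack.connect(cloud='example')  # hypothetical clouds.yaml entry


def record_physnet_on_attach(port_id, chosen_physnet):
    # Stash the physical_network chosen at attach time in binding:profile.
    conn.network.update_port(
        port_id, binding_profile={'physical_network': chosen_physnet})


def physnet_for_detach(port_id):
    # Detach no longer has to guess among segments: read it back.
    port = conn.network.get_port(port_id)
    return (port.binding_profile or {}).get('physical_network')
```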
| JayF | TheJulia: how would you feel about: if node.provision_state in _UNPROVISION_STATES or node.provision_state not in ironic_states.PROVISION_STATE_LIST (https://github.com/openstack/nova/blob/master/nova/virt/ironic/ironic_states.py#L178 ): do the undeploy | 21:30 |
| JayF | TheJulia: that might be the backportable fix that would also solve us going forward | 21:30 |
| JayF | without moving fully to a try/fail model | 21:30 |
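Editor's note: a self-contained restatement of the condition JayF proposes, with stand-in state sets; the real lists live in the linked nova/virt/ironic/ironic_states.py:

```python
# Stand-in values; the real sets come from nova/virt/ironic/ironic_states.py.
PROVISION_STATE_LIST = {'active', 'deploying', 'deleting', 'error',
                        'available', 'rescue'}
_UNPROVISION_STATES = {'active', 'deploying', 'error', 'rescue'}


def should_undeploy(provision_state: str) -> bool:
    # Undeploy when the state is known to need teardown, OR when it is a
    # state this (possibly older) nova driver doesn't recognize at all --
    # e.g. a provision state added in a newer Ironic release.
    return (provision_state in _UNPROVISION_STATES
            or provision_state not in PROVISION_STATE_LIST)


assert should_undeploy('active')            # known teardown state
assert should_undeploy('service wait')      # unknown to nova -> still undeploy
assert not should_undeploy('available')     # known, nothing to tear down
```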
| TheJulia | .... maybe.... I need to click the link in a minute | 21:48 |
| TheJulia | I *think* that would cover it | 21:53 |
| TheJulia | because it would still trigger on the last | 21:53 |
| JayF | I'm going to propose it | 22:09 |
| TheJulia | I was thinking about what cardoe was indicating yesterday, no logs (hint hint), I think a thing we might want to consider is actually blocking for some updates and just letting a thread wait until it can have a lock | 22:57 |
| TheJulia | which is bad, but... maybe it is needed? :\ | 22:57 |
| TheJulia | Alternatively "deferred tasks" | 22:57 |
| cardoe | I pasted, no? | 23:09 |
| cardoe | I'll grab them again | 23:09 |
| cardoe | uwsgi and ironic question... | 23:09 |
| cardoe | OpenStack Helm sets the number of uWSGI processes equal to api_workers. Is that correct? | 23:09 |
| JayF | I have an email from a DMTF member requesting feedback on Redfish specification. I suspect I'm not the best person to give full feedback to them. Is someone else interested in having a chat? | 23:10 |
| cardoe | janders is probably your best person. | 23:11 |
| cardoe | TheJulia: you wanted nova-compute-ironic? | 23:13 |
| TheJulia | JayF: I might be willing, but do sort of concur janders might be a good candidate. | 23:15 |
| TheJulia | cardoe: huh?! | 23:15 |
| cardoe | the logs | 23:15 |
| JayF | I'm mainly seeing if they want a chat or if they wanta document, I'll rope TheJulia/janders in when it gets to rubber-hit-road | 23:15 |
| TheJulia | OH, ironic-conductor, ironic-api. I'm 90% sure nova-compute did the right thing but I'd happily look at the logs too. | 23:16 |
| TheJulia | JayF: I wouldn't want to overload them, but truthfully maybe some solidified community feedback from more than one person might be most impactful, or just go "hey, there are several people you might gain insight from..." | 23:17 |
| JayF | My preferred model would be an etherpad, probably with split experiences labelled by person, with a sync chat with one or two of us (different companies ideally) to go over it | 23:17 |
| TheJulia | since nova-compute was in the driver's seat, likely best to frame it that way | 23:17 |
| TheJulia | ++ | 23:17 |
| TheJulia | Yeah, that is likely for the best | 23:17 |
| TheJulia | It's not mraineri, is it? | 23:18 |
| JayF | No. | 23:20 |
| JayF | RFR for ironic driver fix: https://review.opendev.org/c/openstack/nova/+/967941 | 23:23 |
| cardoe | whelp... I clearly don't know how to copy and paste out of grafana... it looks like turds... https://gist.github.com/cardoe/b0aefe21b1fc7b81c38bed8dad8e14b2 | 23:28 |
| JayF | cardoe: we used to have a rule in the cluster @ Yahoo and the cluster @ RAX Cloud Monitoring which could detect tracebacks and assemble them together into one actual logline for the dash | 23:28 |
| JayF | cardoe: it was very nice | 23:28 |
| JayF | cardoe: (note: those places used splunk and greylog respectively, so I have no tech help to offer lol) | 23:29 |
| cardoe | yeah that's something we need to do. | 23:29 |
| cardoe | In the prior setups it did that. | 23:30 |
| cardoe | So any thoughts on uWSGI and api_workers? | 23:34 |
| * JayF has zero information on that for ya | 23:34 | |
| cardoe | I already found a place they diverge with what Neutron wants. | 23:34 |
| TheJulia | There is tons of disagreement out there regarding workers | 23:34 |
| TheJulia | ... Personally, you want enough workers to serve the requests. Technically, I think each worker should scale but some others think it should be single-threaded off the worker. There is a launch cost to the worker but most don't want to pay that over and over | 23:36 |
| JayF | fwiw as a follow-up to that ironic-driver fix, I might split the constants defining our states from the actual state-machine code into a separate file, and rework the nova side driver to use a copied-over version of that file | 23:41 |
| cardoe | then maybe as a follow on put it into a separate package to make it versioned... maybe call it ironic-lib? | 23:51 |
| * cardoe sees himself out. | 23:51 | |