| janders | TheJulia ACK. So I remove the diagram from my doco change? | 00:06 |
|---|---|---|
| janders | (new diagram looks good) | 00:06 |
| TheJulia | either, I mean it's all programmatically generated | 00:08 |
| TheJulia | so we should ideally do it off the master branch state periodically, but we don't. | 00:08 |
| opendevreview | Jacob Anders proposed openstack/ironic master: WIP: update documentation to include servicing abort. https://review.opendev.org/c/openstack/ironic/+/957825 | 00:15 |
| opendevreview | Jacob Anders proposed openstack/ironic master: WIP: update documentation to include servicing abort. https://review.opendev.org/c/openstack/ironic/+/957825 | 00:19 |
| janders | ^^ change to the svg file removed from my WIP doco update | 00:20 |
| opendevreview | OpenStack Proposal Bot proposed openstack/ironic-ui master: Imported Translations from Zanata https://review.opendev.org/c/openstack/ironic-ui/+/957829 | 02:33 |
| rpittau | good morning ironic! o/ | 06:59 |
| janders | hey rpittau o/ | 07:01 |
| rpittau | hey janders :) | 07:08 |
| queensly[m] | Good morning o/ | 07:53 |
| abongale | good morning | 08:16 |
| opendevreview | luocanhui proposed openstack/ironic master: Add UT for metrics modules https://review.opendev.org/c/openstack/ironic/+/958028 | 08:27 |
| kubajj | Good morning Ironic! o/ | 09:33 |
| kubajj | I am very confused about https://opendev.org/openstack/ironic-python-agent/src/branch/master/ironic_python_agent/hardware.py#L2041-L2043 | 09:33 |
| kubajj | janders: if you are around, any clue why you checked if the function exists? | 09:34 |
| janders | kubajj not off the top of my head (been a while) but I will try to check for you after dinner | 09:41 |
| janders | kubajj this looks like a straight up typo bug with brackets missing | 10:43 |
| janders | I don't think there is any deeper explanation than that | 10:44 |
| janders | part of this change was moving the logic in question to a helper function; the typo must have been introduced as part of this, and since it's not too disruptive in most circumstances (the change still works well on nodes with NVMe drivers, doesn't break CI) it went unnoticed until now | 10:46 |
| janders | IIRC you're working on swraid related features, correct? I see how it can be disruptive in this scenario | 10:48 |
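The bug pattern janders identifies above (a method referenced but never called, so the condition is always truthy) is easy to reproduce. A minimal illustration with hypothetical names, not the actual IPA `hardware.py` code:

```python
class HardwareManager:
    """Hypothetical stand-in for the IPA hardware manager discussed above."""

    def _is_nvme_supported(self):
        # Imagine this probes the node and finds no NVMe devices.
        return False

    def erase_buggy(self):
        # The typo pattern: the bound method itself is referenced, never
        # called. A bound method object is always truthy, so the NVMe
        # branch runs even on nodes without NVMe support.
        if self._is_nvme_supported:
            return "nvme-erase"
        return "generic-erase"

    def erase_fixed(self):
        # With the parentheses restored, the check actually executes.
        if self._is_nvme_supported():
            return "nvme-erase"
        return "generic-erase"
```

As the discussion notes, the wrong branch still works on nodes that do have NVMe, which is why it can slip past CI.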
| opendevreview | Jacob Anders proposed openstack/ironic master: Fix servicing abort to respect abortable flag https://review.opendev.org/c/openstack/ironic/+/957189 | 11:12 |
| dtantsur | TheJulia: hey, how is indirection API going to work in the hash ring code, given that the hash ring is needed to get the RPC peers? | 11:20 |
| opendevreview | Takashi Kajinami proposed openstack/ironic-inspector master: Add WSGI module entrypoint https://review.opendev.org/c/openstack/ironic-inspector/+/958068 | 12:36 |
| opendevreview | Takashi Kajinami proposed openstack/ironic-inspector master: add pyproject.toml to support pip 23.1 https://review.opendev.org/c/openstack/ironic-inspector/+/958069 | 12:36 |
| tkajinam | quick question: Do we expect 2025.2 release of ironic-inspector deliverables ? | 12:36 |
| kubajj | janders: yes, indeed, we are trying to debug some swraid issues | 12:36 |
| dtantsur | tkajinam: I assume we still release it for 2025.2, but I don't remember the final agreement tbh | 12:37 |
| tkajinam | dtantsur, ok | 12:37 |
| tkajinam | I'm updating all puppet modules to replace wsgi script and am wondering if we can add wsgi module to ironic-inspector, then want to know if it'll be part of the next release | 12:38 |
| tkajinam | if not then I can just drop all ironic-inspector support code | 12:38 |
| opendevreview | Dmitry Tantsur proposed openstack/ironic master: JSON-RPC: disable server-side logging with rpc_transport=none https://review.opendev.org/c/openstack/ironic/+/958072 | 12:43 |
| opendevreview | Takashi Kajinami proposed openstack/ironic-inspector master: Add WSGI module entrypoint https://review.opendev.org/c/openstack/ironic-inspector/+/958068 | 12:51 |
| opendevreview | Takashi Kajinami proposed openstack/ironic-inspector master: add pyproject.toml to support pip 23.1 https://review.opendev.org/c/openstack/ironic-inspector/+/958069 | 12:51 |
| opendevreview | Baptiste Jonglez proposed openstack/networking-generic-switch master: Cast boolean Netmiko kwargs to native types https://review.opendev.org/c/openstack/networking-generic-switch/+/958079 | 13:03 |
| opendevreview | Baptiste Jonglez proposed openstack/networking-generic-switch master: Document advanced Netmiko parameters https://review.opendev.org/c/openstack/networking-generic-switch/+/958080 | 13:07 |
| TheJulia | I *think* the idea was we would release it for 2025.2 and that would be it | 13:16 |
| dtantsur | TheJulia: good morning. Now to the bad news: Ironic seems pretty broken in Metal3, and the eventlet migration may be to blame. | 13:20 |
| TheJulia | dtantsur: you've reminded me I need to set one of the classic jobs to use it. I think it ends up working by routing to ironic.conductor_manager directly | 13:20 |
| TheJulia | dtantsur: logs? | 13:20 |
| TheJulia | metal3-integration really needs to be fixed to use an entire stack even though we did the squash test | 13:21 |
| TheJulia | ... which makes me question its job overall, but hey | 13:21 |
| opendevreview | John Garbutt proposed openstack/ironic master: WIP: Choose prefered vmedia boot device https://review.opendev.org/c/openstack/ironic/+/958082 | 13:21 |
| dtantsur | TheJulia: metal3-integration is, unfortunately, the least useful thing we could run on Ironic.. we have consistent failures in the ironic-standalone-operator job. Which, among other things, creates and reads 100 nodes in a quick sequence on each test. | 13:22 |
| TheJulia | Well, had no way to know about that, and I can only work with logs | 13:23 |
| TheJulia | so... logs? | 13:23 |
| dtantsur | Yeah, before I bother you with that, I need to understand if it fails entirely or is just slow | 13:23 |
| dtantsur | so | 13:23 |
| dtantsur | https://github.com/metal3-io/ironic-image/actions/runs/17098403587/job/48488270259 is the entry point | 13:23 |
| dtantsur | if you click "Summary", you can find the ZIP file with logs to download. Each directory is one test, so a separate Ironic installation. | 13:24 |
| TheJulia | ack, caffeinating now() | 13:24 |
| dtantsur | (Yeah, github UX for you... I wish they spent time on that rather than AI..) | 13:24 |
| dtantsur | I'll try to raise all the timeouts to see if it's slowness or something gets stuck | 13:24 |
| TheJulia | it wouldn't super surprise me if something got stuck, the basic testing I was running was pretty sporty performance wise, but... as with all things, it is also a massive change | 13:25 |
| dtantsur | https://github.com/metal3-io/ironic-standalone-operator/pull/349 will hopefully tell | 13:26 |
| opendevreview | John Garbutt proposed openstack/ironic master: WIP: Choose prefered vmedia boot device https://review.opendev.org/c/openstack/ironic/+/958082 | 13:29 |
| TheJulia | curious | 13:43 |
| TheJulia | not a single error, looks like it is generating responses | 13:44 |
| TheJulia | at least, it thinks it is | 13:44 |
| TheJulia | dtantsur: is the config intentionally all over the place, some jobs with json-rpc, some appearing to be sqlite, but then others with a configuration for an ironic-database.mariadb.svc ? | 13:56 |
| TheJulia | I guess so, an ha job | 13:56 |
| dtantsur | TheJulia: yep, it's testing different configurations. | 13:57 |
| * | dtantsur updated the wrong timeout, retrying.. | 13:58 |
| TheJulia | doh | 13:58 |
| TheJulia | Interesting | 14:08 |
| stephenfin | TheJulia: JayF: Did you come to a conclusion yet on whether to make the redfish-vmedia-4k job n-v? https://review.opendev.org/c/openstack/ironic/+/957962 | 14:13 |
| TheJulia | my hope was mirrors would heal since the central mirrors are supposed to be good now aiui, but maybe my understanding was limited by my EOD | 14:14 |
| stephenfin | ack | 14:15 |
| TheJulia | dtantsur: crazy question, could ironic be ooming? | 14:19 |
| TheJulia | dtantsur: or to be more precise, could the runtime environment be killing it? | 14:20 |
| dtantsur | That's possible but we should see signs of it | 14:20 |
| TheJulia | dtantsur: I do think you're on the right path with the timeout, inconsistent stop points while just seemingly chugging right along | 14:20 |
| dtantsur | If that's the timeout, on one hand it's a relief (we're not broken in the end), but it's also a significant slow-down. | 14:21 |
| TheJulia | or the hosts are swapping and get inconsistent performance | 14:22 |
| TheJulia | which would greatly reduce overall performance | 14:22 |
| dtantsur | TheJulia: it's passing, but taking 4 minutes instead of < 2 minutes for now | 14:22 |
| TheJulia | okay, a start point! | 14:22 |
| TheJulia | We need to get request logging back somehow | 14:23 |
| dtantsur | definitely | 14:26 |
| TheJulia | do you have a link for your successful run logs? I want to see if I can somehow extract a pattern | 14:29 |
| TheJulia | or at least, some level of understaning | 14:29 |
| TheJulia | understanding | 14:29 |
| TheJulia | Also, how much ram do those nodes have? | 14:30 |
| dtantsur | That's something we ourselves are trying to figure out | 14:30 |
| dtantsur | it's minikube | 14:30 |
| TheJulia | at least, we know it's not a 5 alarm fire, more like a 2.5 alarm fire :) | 14:32 |
| dtantsur | (side conversation: how crazy would be to mount the RPC application under the rest of the API?) | 14:32 |
| dtantsur | hehe, true | 14:32 |
| TheJulia | could you elaborate on your idea and what you're trying to achieve? | 14:33 |
| dtantsur | Don't worry too much about it. But I really want to stop inventing new port numbers (8089 conflicts in OpenShift) | 14:33 |
| TheJulia | oh, he | 14:33 |
| TheJulia | heh | 14:33 |
| TheJulia | Regarding logging, it feels like we need to either see if we can toggle any additional logging in cheroot/cotyledon, or we need to instrument our REST API code | 14:34 |
| dtantsur | Looks like it's not implemented in cheroot, so I've filed a bug for us | 14:37 |
| JayF | dtantsur: link me the bug? | 14:38 |
| JayF | I'll at least get it on the GR-OSS backlog | 14:38 |
| JayF | if nobody gets it by the time they can schedule it, woo a freebie :D | 14:38 |
| TheJulia | Looking at a single delete action towards when one of the jobs stopped, at *least* when it started, the rpc call got generated at 13:05:20.167 and the client logged the response at 13:05:20.257 which is not bad. Basically 90ms for the entire conductor side | 14:40 |
| dtantsur | JayF: https://bugs.launchpad.net/ironic/+bug/2121058 | 14:42 |
| JayF | aha so not something we can do in cheroot | 14:42 |
| JayF | that can't be that hard | 14:42 |
| JayF | I'm going to point claude at it and see if I can bang it out today | 14:42 |
| dtantsur | JayF: maybe we can convince them to accept a PR? It's not clear if they don't want it at all or just don't want to work on it. | 14:43 |
| JayF | I am... skeptical given how that reads to me, and also, given the option between "Ironic solves our own problem in a day-ish" vs "solve problem in cheroot -> release -> requirements -> update -> test" I'll take the first one :) | 14:44 |
| TheJulia | I think in our own code path makes sense, since we also need to tie things with a request-id | 14:45 |
| dtantsur | Good point | 14:45 |
| TheJulia | the logs dmitry has provided *are* nice and sequential | 14:46 |
| dtantsur | Only because the test is sequential ;) | 14:46 |
| TheJulia | there is definitely a feeling that I'm getting looking at them that we do seem to be dealing with some sort of unknown issue on the API side sort of outside our RPC code path | 14:46 |
| TheJulia | which means, minimally, we need to instrument that in middleware most likely | 14:47 |
| TheJulia | at least, we need to see once we consider ourselves entirely done | 14:47 |
| TheJulia | and log that | 14:47 |
| TheJulia | and see how long it is before the next request comes in | 14:47 |
| TheJulia | (and of course with all multiprocess logging, some of the entries are a little out of sequential order since we have two processes operating here) | 14:48 |
| TheJulia | so, we actually need to sort of work both but I don't think we have any baseline on the conductor/rpc side aside from "that feels reasonable" | 14:48 |
| TheJulia | I almost want thread count numbers too | 14:49 |
| dtantsur | Meanwhile, the tests passed with an increased timeout | 14:49 |
| TheJulia | so, 2.5 alarms ;) | 14:49 |
| TheJulia | it's not the end of the world, but it's not great | 14:49 |
| TheJulia | What is super interesting to me is that in opendev it has felt like a wash timewise | 14:50 |
| opendevreview | Takashi Kajinami proposed openstack/ironic-inspector master: Add WSGI module entrypoint https://review.opendev.org/c/openstack/ironic-inspector/+/958068 | 14:50 |
| dtantsur | TheJulia: here is the data point for you. As far as I can tell, the normal RPC case (where we have 3 Ironics and a MariaDB) has not regressed. Still around 1 minute (even less) for creating/reading/deleting nodes. | 14:52 |
| TheJulia | Interesting! | 14:53 |
| TheJulia | The log I'm looking at looks like it is doing that and it looks like it got to node 95 before the job stopped, so not entirely there before your timeout job but close | 14:54 |
| TheJulia | at least, on deletions | 14:55 |
| TheJulia | I suspect we're sort of in a death by a thousand cuts situation, one thing is for sure though, the amount of logging we do when we're doing indirection is likely just way too much text and a bit more overhead | 14:55 |
| TheJulia | I guess the first step might also be to put a knob on that as well | 14:56 |
| dtantsur | I wonder if we can get rid of the API/conductor split entirely for the rpc_transport==none case | 14:57 |
| dtantsur | Like, run the entire API in a conductor thread or something | 14:57 |
| dtantsur | (pondering if indirection is the cause of slowness) | 14:57 |
| TheJulia | Well, we could single process it, but we would still need a hardish delineation | 14:57 |
| TheJulia | without the systemd signaling stuffs systemd on bifrost will think the process never actually started | 14:58 |
| dtantsur | I'm thinking: we're already able to start the RPC WSGI server as part of the conductor. What is preventing doing the same with the API? | 14:59 |
| * | dtantsur -> 1:1 | 14:59 |
| TheJulia | I dunno, but let's try to make this a somewhat data-driven approach | 14:59 |
| dtantsur | fair | 14:59 |
| TheJulia | so like obvious thing on my end is the logging which results when indirection is in use | 15:00 |
| TheJulia | we can dial that back big time | 15:00 |
| TheJulia | we also need to instrument | 15:00 |
| dtantsur | https://review.opendev.org/c/openstack/ironic/+/958072?usp=dashboard related | 15:00 |
| TheJulia | good enough for me | 15:01 |
| TheJulia | I was going to make it configurable | 15:01 |
| opendevreview | John Garbutt proposed openstack/ironic master: WIP: Choose prefered vmedia boot device https://review.opendev.org/c/openstack/ironic/+/958082 | 15:39 |
| iurygregory | <eyes> ^ | 16:38 |
| JayF | iurygregory: if you have questions about that or need context I have some of it (the vmedia boot device change) | 16:41 |
| iurygregory | I added a comment to the patch | 16:42 |
| iurygregory | just some thought I had while looking at it | 16:42 |
| JayF | I'm assuming we're all onboard to add request logging to cheroot in IPA as well? | 16:56 |
| JayF | The Ironic patch is getting close, and it occurred to me I should port it to IPA | 16:57 |
| iurygregory | ++ | 16:58 |
| opendevreview | Jay Faulkner proposed openstack/ironic master: Add request logging middleware for API requests https://review.opendev.org/c/openstack/ironic/+/958103 | 17:00 |
| TheJulia | the regex worries me, I suspect we should always have a request-id by that point though, but I don't remember if it's in the context that gets created or if it is the middleware establishing the context | 17:07 |
| TheJulia | (doesn't help I'm on a long running meeting) | 17:07 |
| TheJulia | I'm looking at changing the jsonrpc logging further, to make it *way* more lean on the client side | 17:10 |
| TheJulia | if we can get to "we sent request_id to rpc and got it back" then that should be a lot and also make the logs a bit cleaner to follow-overall | 17:15 |
| JayF | TheJulia: that is exactly how nova does it fwiw | 17:16 |
| JayF | TheJulia: they validate the request-id via regexp every time, that's why I did it | 17:16 |
| JayF | seemed like a good sanity check and protection from people injecting weird crap into logs via http header | 17:16 |
| JayF | I have a meeting very shortly, am going to check CI output or test manually after if it's not done | 17:17 |
| TheJulia | okay then | 17:18 |
| TheJulia | I just approved the 4k job change, since it seems like it's still happening, likely just slowly syncing mirrors :( | 17:20 |
| clif | apparently there's been an open issue for at least a week: https://issues.redhat.com/browse/CS-2983 | 17:21 |
| JayF | given the datestamp on that and a fix is attached and declared working without comments, I wonder if there's a newer ticket and/or if they realize that one isn't solved | 17:25 |
| TheJulia | clif: yeah, although I think I glanced at that yesterday and it was resolved. | 17:32 |
| * | TheJulia goes and uses her employee access to see what the change log says | 17:32 |
| TheJulia | still being worked, fwiw | 17:34 |
| TheJulia | I must have been thinking it was resolved by looking at a different item in jira, that one has never changed state to done/resolved | 17:35 |
| opendevreview | Verification of a change to openstack/ironic master failed: Make ironic-tempest-uefi-redfish-vmedia-4k non-voting https://review.opendev.org/c/openstack/ironic/+/957962 | 18:30 |
| opendevreview | Julia Kreger proposed openstack/ironic master: Minimize json-rpc client logging https://review.opendev.org/c/openstack/ironic/+/958119 | 18:44 |
| opendevreview | Julia Kreger proposed openstack/ironic master: Make ironic-tempest-uefi-redfish-vmedia-4k non-voting https://review.opendev.org/c/openstack/ironic/+/957962 | 18:46 |
| TheJulia | clif: ^^ you had to disable the gate job as well :) | 18:46 |
| TheJulia | the wall clocking of the request handling seems to point to the indirection interaction being the huge sink. I think we should get patches around dialing back logging and see what we come out at | 19:36 |
| TheJulia | If not, we might want to dial out indirection for metal3 and just see if we're lock chaos free, or lock ridden | 19:37 |
| TheJulia | Looks like the post to the conductor is chewing about 300 ms | 20:42 |
| JayF | is there any part of that interaction we could speed up? Thinking about the moving parts that seems a little long but probably not more than double-digit ms to save | 20:48 |
| TheJulia | I'm not sure yet | 20:50 |
| TheJulia | https://github.com/cherrypy/cheroot/blob/main/cheroot/wsgi.py#L148 <-- say hello to our 300ms | 21:02 |
| JayF | should that be obvious to me? | 21:04 |
| TheJulia | nah | 21:07 |
| TheJulia | nope | 21:07 |
| TheJulia | so, what's interesting here is that wsgi_app is the wsgiservice _application call for json_rpc (ironic/common/json_rpc/server.py) | 21:08 |
| TheJulia | which makes no sense because if I measure it internally, it's super fast | 21:08 |
| JayF | so the difference is probably imports? | 21:14 |
| JayF | yeah? | 21:14 |
| JayF | if you're measuring inside ironic and it's fast, but from inside cheroot it's slow, that makes me wonder if there's something inherently slow about calling into there | 21:14 |
| TheJulia | not terribly. So, the reason the local service metal3 stuff is slower is that it's basically death by a thousand cuts | 21:14 |
| JayF | imports is unlikely the actual answer, but something along those lines | 21:14 |
| TheJulia | but... each jsonrpc pass through indirection gets 300+ ms on my local laptop through the cheroot wsgi app handler back to our own app | 21:15 |
| JayF | yeah but if there's something weird here, we could speed up the whole service if we could improve it | 21:15 |
| TheJulia | I don't understand why, yet | 21:15 |
| JayF | because I'd assume we're paying those 300ms, just less often, in all cases with json-rpc, yeah? | 21:15 |
| TheJulia | oh yeah | 21:15 |
| TheJulia | we just now pay them on the object calls in local mode | 21:16 |
| TheJulia | which slows down the reads/writes | 21:16 |
| JayF | </topic><topic> it's weird to me that ironic uses https://github.com/openstack/ironic/blob/master/ironic/api/hooks.py#L102 whereas other stuff uses https://github.com/openstack/oslo.middleware/blob/master/oslo_middleware/request_id.py#L40 ... I'm really perplexed as to how our request_id handling works | 21:20 |
| JayF | I thought at first it was just a "we did it an old way, there's a new way", but the shape seems different | 21:21 |
| JayF | I'm going to table this and try to get back on the yellow log road | 21:26 |
| JayF | holy cow I'm pretty sure I just unraveled this and figured out a thing | 21:31 |
| JayF | I'm 99% sure Ironic is a terminus in a case where we're given a global_request_id | 21:31 |
| TheJulia | found our 300ms | 21:32 |
| JayF | nope just kidding, this code is just misdirected as hell | 21:32 |
| JayF | TheJulia: forgot to call sys.go_fast()? | 21:32 |
| JayF | :D | 21:32 |
| TheJulia | lol no | 21:32 |
| TheJulia | getting a link | 21:32 |
| JayF | no, I am right | 21:33 |
| JayF | https://github.com/openstack/ironic/blob/master/ironic/api/hooks.py#L128 | 21:33 |
| JayF | we never pass global-request-id back through the response | 21:33 |
| JayF | only our newly-created request-id | 21:33 |
| JayF | so it's in the context but I can't find where we ever take it back outta the context on the way back out | 21:34 |
| JayF | and readd it to a header | 21:34 |
| TheJulia | https://github.com/openstack/ironic/blob/master/ironic/common/auth_basic.py#L53 | 21:35 |
| TheJulia | that line takes 300ms | 21:35 |
| JayF | we don't cache there?! | 21:35 |
| JayF | we're opening that file 9000 times | 21:35 |
| TheJulia | heh | 21:35 |
| JayF | even without caching there has to be a more efficient implementation | 21:35 |
| TheJulia | yeah, its authenticate() | 21:37 |
| JayF | I already asked claude to see if it has ideas on perf improvements with no tradeoffs | 21:37 |
| JayF | oh, it's bcrypt | 21:38 |
| JayF | that's intentional | 21:38 |
| JayF | bcrypt is intentionally slow to prevent brute force attacks, right | 21:38 |
| JayF | I bet bcrypt.checkpw() is 99% of that time sink | 21:38 |
| TheJulia | likely | 21:39 |
| JayF | the file would be in ram all day long anyway | 21:39 |
| TheJulia | I *suspect* the best thing to do is to cache known good in ram | 21:39 |
| TheJulia | and do a quick direct match | 21:39 |
| JayF | that is exactly what my first thought was | 21:39 |
| JayF | I'm trying to think if there's any way I could use that to do Bad Things(tm) | 21:39 |
| JayF | we likely would just need some kinda awareness of when the file changes to clear the cache, and/or document that you must restart ironic services if you change the auth_file contents | 21:40 |
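A minimal sketch of the "cache known good in RAM" idea. To keep the example self-contained, bcrypt is swapped for the stdlib's deliberately slow `hashlib.pbkdf2_hmac`; the `CachingAuthenticator` class and its interface are assumptions for illustration, not Ironic's actual `auth_basic` code:

```python
import hashlib


def slow_check(password: bytes, salt: bytes, expected: bytes) -> bool:
    # Stand-in for bcrypt.checkpw(): an intentionally expensive KDF.
    # bcrypt's cost is by design (brute-force resistance), which is why
    # paying it on every request hurts so much.
    return hashlib.pbkdf2_hmac("sha256", password, salt, 100_000) == expected


class CachingAuthenticator:
    """Cache known-good credentials so the slow KDF runs once per credential.

    After the first successful slow check, a plain set lookup replaces the
    ~300 ms verification on subsequent requests. A real implementation would
    also clear the cache when the auth_file changes (or document that the
    service must be restarted), per the discussion above.
    """

    def __init__(self):
        self._known_good = set()

    def authenticate(self, user: str, password: bytes,
                     salt: bytes, expected: bytes) -> bool:
        # Key on a fast digest of the password, never the plaintext itself.
        key = (user, hashlib.sha256(password).digest())
        if key in self._known_good:
            return True  # fast path: previously verified
        if slow_check(password, salt, expected):
            self._known_good.add(key)
            return True
        return False
```

Note that failed attempts still pay the full KDF cost, so the brute-force protection is preserved; only repeated *correct* logins take the fast path.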
| TheJulia | ... I bet we're hitting this issue twice atually | 21:41 |
| TheJulia | actually | 21:41 |
| TheJulia | both on the rest api | 21:41 |
| * | JayF wonders if this performance bug is so severe we can backport it | 21:41 |
| TheJulia | and the internal json-rpc endpoint | 21:41 |
| JayF | yes exactly | 21:41 |
| JayF | yes | 21:41 |
| TheJulia | that would be basically half to 2/3rds of all the wall clock time for the requests metal3 is hitting | 21:42 |
| TheJulia | if not 3/4ths of the time | 21:42 |
| JayF | great job | 21:43 |
| JayF | not fixed yet but this is a super good find | 21:43 |
| TheJulia | yeah | 21:43 |
| TheJulia | (I quite literally started adding prints to wall clock everything in the wsgi app invocation) | 21:43 |
| JayF | cid: ^ this is relevant to our 1:1 chat, fwiw | 21:44 |
| JayF | we were talking about the sorta things you could find performance testing, and this is a great story of how to do it \o/ | 21:44 |
| TheJulia | (prints are not great in general, since python does buffered IO, but when I printed out "in basic auth -- called authenticate %s" % (time.time()-start_time)) | 21:45 |
| TheJulia | actually, in that one I did logging instead of print; in cheroot I had to use print | 21:46 |
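The print-based wall-clock instrumentation described above generalizes to a small decorator. A sketch, not the actual debugging code; it flushes explicitly so Python's buffered IO doesn't hide the output:

```python
import functools
import sys
import time


def wallclock(label):
    """Wrap any callable and report its elapsed wall-clock time on stderr.

    Pointing this at suspect call sites (e.g. the WSGI app invocation or
    authenticate()) is how a ~300 ms sink can be narrowed down quickly.
    """
    def deco(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed_ms = (time.monotonic() - start) * 1000
                # flush=True avoids losing timings to buffered IO.
                print(f"{label}: {elapsed_ms:.1f} ms",
                      file=sys.stderr, flush=True)
        return wrapper
    return deco
```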
| TheJulia | Anyway, I'm about to go vanish for the day and try to get some exercise in | 21:47 |
| JayF | if you can spare 5 minutes | 21:49 |
| JayF | can you sanity check my global-request-id discovery above | 21:49 |
| * | TheJulia presses brain rewind | 21:49 |
| JayF | one of those I'm pretty sure I'm right but I wonder if there's some code somewhere I'm missing | 21:49 |
| TheJulia | hmmmmm | 21:51 |
| TheJulia | I suspect you're right because we have a single place where we capture context.global_id and send it | 21:52 |
| TheJulia | which is in the nova callback stuffs | 21:52 |
| JayF | OK. I'm going to write the request logging stuff assuming that bug doesn't exist, then go fix that bug too. | 21:52 |
| JayF | Probably by actually using oslo_middleware to handle global_request_id, unless I find some reason not to | 21:53 |
| TheJulia | did you check into oslo.context invocations at all? | 21:53 |
| JayF | .from_environ() loads it into the context | 21:53 |
| JayF | we have no code that takes it from the context and puts it on the outgoing response | 21:53 |
| TheJulia | correct | 21:54 |
| TheJulia | we also have a wrapper around it, but our wrapper just adds a custom field or two | 21:54 |
| JayF | yep, I will keep api compat in mind | 21:54 |
| JayF | but I'm correct that we should have that header on response | 21:54 |
| JayF | *and* on any outgoing requests? | 21:55 |
| TheJulia | I don't remember if that was a requirement or not | 21:55 |
| JayF | I'm going to use nova as a model for the behavior | 21:55 |
| TheJulia | its beeeeen soooooooooooooo long | 21:55 |
| JayF | I'll file a bug about it | 21:55 |
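Assuming JayF's diagnosis is right (the inbound global request id reaches the context but is never echoed back), one possible fix is a small piece of middleware on the response path. The class and header name here are illustrative assumptions; as noted above, nova's actual behavior (or oslo.middleware) would be the model:

```python
class GlobalRequestIdEcho:
    """Hypothetical WSGI middleware: if the client supplied a global
    request id, reflect it back on the response, instead of returning
    only the service's own newly generated request-id."""

    HEADER = "X-Openstack-Global-Request-Id"

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        global_id = environ.get("HTTP_X_OPENSTACK_GLOBAL_REQUEST_ID")

        def sr(status, headers, exc_info=None):
            # Append the echoed header only when the client sent one.
            if global_id:
                headers = list(headers) + [(self.HEADER, global_id)]
            return start_response(status, headers, exc_info)

        return self.app(environ, sr)
```

A production version would also want the request-id validation discussed earlier before echoing anything back.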
| JayF | TheJulia: thinking about the caching idea, it exposes us to zero additional risk, given if you can read the ram of ironic you can read auth_file | 21:56 |
| JayF | TheJulia: so I'm +1 to that being a solution | 21:56 |
| * | TheJulia may not be an El-Aurian from Star Trek and think "it's been five hundred years!" | 21:56 |
| TheJulia | err, thinks | 21:56 |
| JayF | I can't acknowledge our centuries-old friendship until midway through the run of TNG, sorry | 21:57 |
| TheJulia | silly time travel | 21:57 |
| TheJulia | I've prodded steve since he wrote a bunch of that code around basic auth, to go ahead and chime in if he sees any issues with us caching "known good" in ram. | 22:00 |
| JayF | good stuff | 22:00 |
| JayF | really, really nice find | 22:00 |
| JayF | I am 100% serious we should try to make that backportable | 22:00 |
| JayF | that's a huge perf increase, even if it doesn't hit the majority of our (stable) userbase | 22:01 |
| TheJulia | I'm 95% sure we can do that easily | 22:01 |
| stevebaker[m] | I don't see an issue with memoization of the bcrypt response | 22:05 |
| stevebaker[m] | actually memoizing the auth_entry function would mean the file is always read but bcrypt calls are cached | 22:08 |
| JayF | it's not ridiculous to say "restart the service" if they update auth | 22:13 |
| JayF | in fact, given most auth_basic users are metal3 (right?) wouldn't that already be what's done by default in a k8s world | 22:14 |
| stevebaker[m] | yeah, it would be a regression in behaviour though. Reading the file every time was an intentional design choice, and likely not the cause of this poor performance | 22:17 |
| stevebaker[m] | TheJulia, JayF : Shall I whip up a change? | 22:30 |
| TheJulia | stevebaker[m]: if you have time and capacity, sure | 22:30 |
| TheJulia | if not, I'll be taking a look tomorrow | 22:31 |
| JayF | that means we are free to review it | 22:31 |
| TheJulia | (I also have.... ?4? hours of meetings tomorrow... soooo yeah.) | 22:31 |
| stevebaker[m] | yep I can take a look | 22:32 |
| opendevreview | Jay Faulkner proposed openstack/ironic master: Add request logging middleware for API requests https://review.opendev.org/c/openstack/ironic/+/958103 | 23:10 |
| stevebaker[m] | I have filed this bug https://bugs.launchpad.net/ironic/+bug/2121105 | 23:31 |
Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!