clarkb | fungi: huh apparently https://github.com/python/cpython/blob/main/Modules/_io/_iomodule.c#L197 is what gdb says I'm in which is a C implementation of that python function | 00:12 |
---|---|---|
clarkb | fungi: I wonder are they code gening from python to C? | 00:12 |
clarkb | https://github.com/python/cpython/blob/v3.10.11/Modules/_io/bufferedio.c#L753 is what ultimately calls the seek since tell depends on seek | 00:16 |
clarkb | sidenote https://debuginfod.opensuse.org/ is super duper cool and making my life easier | 00:18 |
clarkb | it writes it to your local user cache dir as it needs contents | 00:18 |
clarkb | where I've ended up is that the tell call has been there for 14 years (and maybe longer; I think the function got renamed at that point but was there prior too) | 00:20 |
clarkb | so the issue really is in openafs I guess not handling a seek on that file gracefully | 00:20 |
clarkb | and now we know who to file a bug report with | 00:20 |
clarkb | but I need to eat dinner so maybe tomorrow I'll figure out how to do that | 00:20 |
fungi | clarkb: there are parallel c and native python implementations of a lot of the stdlib, where the latter tend to be used for testing and extreme portability situations | 02:03 |
fungi | during cpython compilation it will pick which one to include | 02:03 |
*** amoralej is now known as amoralej|lunch | 12:10 | |
*** amoralej|lunch is now known as amoralej | 13:11 | |
clarkb | fungi: I think what confused me is that the python version is a 1:1 copy except for this lseek | 14:37 |
clarkb | the seek only exists in the C version | 14:37 |
clarkb | openafs' rt server appears to be non responsive. But they have a libera irc channel? Maybe I'll start there | 14:48 |
fungi | can also try reaching out to auristor or rra maybe | 14:50 |
fungi | or i think there's an ml | 14:50 |
clarkb | fungi: ya the bug reporting mechanism is via email to rt so I can just send it there and hope the instance is processing email and only http is sad | 14:50 |
clarkb | but I figure if they have an irc channel I can ask if I found the right links in the first place | 14:51 |
clarkb | (I wanted to check if this was a known issue before reporting it) | 14:51 |
*** amoralej is now known as amoralej|off | 15:37 | |
frickler | meh, seems changing topics for merged patches isn't allowed? my plan was to update this stack once I'm done with all of them, but now I'm getting an error from gerrit | 16:34 |
frickler | Ramereth: https://review.opendev.org/885336 doesn't look much better yet, you updated it when I was about to merge it. do you still plan to fix those CI issues for rocky? I'm only interested in getting rid of the zuul config errors | 16:39 |
Ramereth | frickler: sorry for pushing that right at the same time. I was trying to see if I could fix it, but I assumed it would just error out | 17:24 |
Ramereth | I'm not going to worry about it and will just approve both of those | 17:27 |
frickler | Ramereth: thx, merged | 17:58 |
clarkb | I apparently failed to send a meeting agenda yesterday after I got nerd sniped with python, openafs, and lseek() | 18:44 |
clarkb | I'm going to send one nowish. Sorry about that | 18:44 |
fungi | thanks! | 18:44 |
clarkb | and my browser is broken | 18:45 |
corvus | It's a sign | 18:55 |
clarkb | kill took care of it, but it was weird: when changing workspaces to it, it refused to render | 18:56 |
fungi | took it out behind the toolshed and gave it a talking-to? | 18:56 |
ianw | ok here's my confusion | 19:31 |
ianw | the change was proposed on Mar 2, 2022 and merged in January 2023 | 19:32 |
ianw | so it hasn't made it into any release! | 19:32 |
ianw | but it looks old | 19:32 |
clarkb | aha | 19:32 |
fungi | yeah, i saw a ton of that in the git history for openafs | 19:32 |
clarkb | if they haven't released it yet I'm guessing it isn't a high priority for making things work generally | 19:33 |
clarkb | and it is more of a corner case | 19:33 |
fungi | seemed like they had changes which took years to merge in some cases | 19:33 |
fungi | possibly related to the loose interface between openafs and the mainline kernel codebase too | 19:33 |
ianw | although the corner case is "python opens a file on afs" isn't it? | 19:38 |
clarkb | ianw: no only the ioctl I think | 19:39 |
clarkb | which is a special file/device, not a general file | 19:39 |
clarkb | but we can test that | 19:39 |
clarkb | ianw: `python3 -c 'open("/proc/fs/openafs/afs_ioctl", mode="rb", buffering=4096)'` this crashes. `python3 -c 'open("/afs/openstack.org/project/opendev.org/docs/opendev/gear/latest/index.html", mode="rb", buffering=4096)'` does not | 19:55 |
clarkb | using the .openstack.org path does not crash either | 19:55 |
clarkb | so ya it's related to that specific device because it isn't a seekable device. I expect that they kept seekable things working like regular files but missed this one | 19:56 |
ianw | ++ i can file a bug for an ubuntu backport. and if we have an urgent need we know what to do now :) | 19:57 |
ianw | ^ https://bugs.launchpad.net/ubuntu/+source/openafs/+bug/2023107 | 20:06 |
fungi | $ host 2604:e100:1:0:f816:3eff:feff:bd1c | 20:30 |
fungi | c.1.d.b.f.f.e.f.f.f.e.3.6.1.8.f.0.0.0.0.1.0.0.0.0.0.1.e.4.0.6.2.ip6.arpa domain name pointer ns04.opendev.org. | 20:30 |
fungi | yay! | 20:30 |
fungi | thanks guilhermesp_____ ! | 20:30 |
guilhermesp_____ | nice! | 20:43 |
clarkb | corvus: on nl02 (and I suspect the other launchers) we appear to create lock nodes and delete lock nodes frequently against the same instances. This creates a lot of cache watcher log spam. I think this may be due to trying to grab locks to see if things are locked? Maybe we should look at reducing the log level of those events? (though they are already debug) | 21:54 |
corvus | clarkb: a few thoughts on that | 22:01 |
corvus | clarkb: 1) those entries are specifically in a logger dedicated to verbose cache log entries, so if we or anyone else ever wants to silence them, it's super easy. i would be okay with a change to make that the default too. but i don't think i'd want to remove them. | 22:02 |
corvus | clarkb: 2) i'm not 100% confident in that code yet and i anticipate a non-zero probability of needing to use those in opendev in the near future, so i don't think we should do that on opendev now | 22:03 |
corvus | clarkb: 3) those specific entries may represent opportunities to further optimize -- a lot of the cache work is actually reducing lock attempts | 22:04 |
clarkb | corvus: ah ok. Ya it seems like we get an ever incrementing counter in the lock path for a lock on the same set of instances. Doing an instance list shows the instance is locked so I think it must be something trying to get the lock and failing? | 22:04 |
corvus | clarkb: got a good number and host i should look at? | 22:06 |
clarkb | corvus: nl02 node 0034237848 was the one I spot checked to make sure that behavior wasn't a problem | 22:07 |
clarkb | (I'm fairly certain it is fine just verbose) | 22:07 |
corvus | understood; just figured since this is an ongoing area of work looking at the behavior you're seeing (with specific nodes) would be good | 22:07 |
clarkb | ack | 22:08 |
corvus | (just a casual reference to node number 34 million) | 22:08 |
clarkb | side note looks like infra-prod-service-zuul has been failing. Is that due to the executor situation? | 22:17 |
corvus | yes but that's not expected | 22:19 |
corvus | it looks like it's talking to the new hosts, and also it's not treating them as jammy nodes | 22:21 |
clarkb | did we recache the old facts at some point? | 22:22 |
corvus | oh wait, no it is talking to the correct (old) hosts | 22:23 |
corvus | and it has focal cached... | 22:23 |
corvus | but for some reason it's trying to install the xenial openafs package | 22:23 |
clarkb | corvus: fungi noticed that we had two apt sources.lists files for openafs on the new nodes | 22:24 |
clarkb | maybe it is related to that? | 22:24 |
corvus | oooh | 22:24 |
corvus | yep | 22:25 |
corvus | on the old hosts too, we now have 2 files; one is focal and one is xenial | 22:25 |
corvus | openafs.list is wrong, and i don't find that in a grep in system-config | 22:26 |
clarkb | I think fungi did debug it a bit but not sure where it ended up | 22:26 |
corvus | i think the openafs file is from the zuul-executor role | 22:28 |
corvus | we must have run the role on the old hosts with cached facts from the new hosts during a time where we didn't have the focal config in place | 22:29 |
corvus | i'm sort of inclined to delete both ppa files on every ze host, delete the cache from every ze host, and see if it recovers | 22:30 |
corvus | clarkb: sound good ^? | 22:30 |
clarkb | corvus: delete the cache on bridge you mean for the hosts? I think that works | 22:31 |
corvus | yep | 22:31 |
clarkb | I think the only risk there is if we somehow install openafs from the distro because we don't have the ppa config in place | 22:32 |
clarkb | but that doesn't seem likely? | 22:32 |
corvus | i hope the ppa is installed before package installation; i suspect we're failing at an earlier apt-get update | 22:33 |
clarkb | ah | 22:33 |
corvus | that's done, so let's see what the next run does | 22:34 |
corvus | those locks seem to happen every 5 seconds... i'm not sure what's doing that | 22:35 |
corvus | delete or stats | 22:37 |
corvus | clarkb: i suspect you may have found a bug; i'm pretty sure that's the deleted node worker, and it's supposed to know that that node is locked and not try it. i think i have enough clues to track it down. | 22:44 |
clarkb | cool | 22:46 |
clarkb | anyone have a quick moment to check that meetpad is happy via https://meetpad.opendev.org/isitbroken ? | 22:48 |
clarkb | I just rebooted the two associated servers. It seems up but without audio/video between multiple users hard to say for sure. | 22:49 |
clarkb | I guess I can test via my phone too | 22:49 |
clarkb | using my phone as a second device worked well and all seems happy | 22:50 |
tonyb | I get the expected welcome screen | 22:52 |
clarkb | yup I think it is happy | 22:52 |
fungi | yes, sorry, i forgot to put the extra lists file back to its original name on ze01 last night, i got spacey | 22:56 |
fungi | it's apparently low-impact because we don't need that ppa on jammy anyway, but want to look into why those servers have two roles that apply duplicate entries for that ppa (one for afs-client and one for zuul-executor i think it was?) | 22:58 |
fungi | seems like maybe one should imply the other or something, and then we can remove the duplication | 22:59 |
clarkb | fungi: the old focal servers have it too and it has broken the infra-prod-service-zuul job | 22:59 |
fungi | looking at my comments from yesterday, it was because ansible is applying both the openafs-server-config and zuul-executor roles, and they add the /etc/apt/sources.list.d/openafs.list and /etc/apt/sources.list.d/ppa_openstack_ci_core_openafs_jammy.list files with identical ppa entries | 23:07 |
fungi | maybe if they both agreed on the filename it would be a non-problem? that could be a simple solution, though maybe somewhat dirty | 23:08 |
clarkb | fungi: or just drop the special zuul-executor content? | 23:09 |
clarkb | since we have a general role for it already | 23:09 |
fungi | sure, i can push that up real quick if nobody else is already writing it | 23:09 |
Clark[m] | I'm not. Transitioning to figuring out dinner | 23:12 |
fungi | on it | 23:15 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Stop adding duplicate OpenAFS PPA on executors https://review.opendev.org/c/opendev/system-config/+/885419 | 23:20 |
fungi | not sure if we actually need to add cleanup code to remove that... opinions? | 23:21 |
Clark[m] | I didn't approve it since we should make sure corvus is ok with it since he debugged it. But I don't think we need to encode cleanup since it's only 12 nodes, and we can pretty easily rm a file off of them manually or with an ansible command | 23:33 |
corvus | Clark: fungi it seems wrong to have openafs-server-config on there... that looks like it has stuff for ... like afs file/pts/db servers? | 23:35 |
corvus | hrm, also /etc/openafs/server is not currently present on the executors... is that the right role? | 23:36 |
opendevreview | Ian Wienand proposed opendev/system-config master: install-docker: replace deprecated include: calls https://review.opendev.org/c/opendev/system-config/+/885420 | 23:39 |
fungi | the other possibility is that it's coming from roles/openafs-client/tasks/openafs-client/Debian.yaml | 23:43 |
fungi | maybe that's something the launch script uses? | 23:44 |
Clark[m] | https://opendev.org/opendev/system-config/src/branch/master/playbooks/service-zuul.yaml#L27 is where it comes from which is openafs-client | 23:44 |
fungi | yeah, okay, so it's that it includes the openafs-client role. are we good with the change so long as i adjust the commit message? | 23:45 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Stop adding duplicate OpenAFS PPA on executors https://review.opendev.org/c/opendev/system-config/+/885419 | 23:46 |
opendevreview | Neil Hanlon proposed openstack/diskimage-builder master: Add support to build 64k-page-table images for Rocky 9 https://review.opendev.org/c/openstack/diskimage-builder/+/884452 | 23:46 |
fungi | edited the commit message | 23:46 |
corvus | fungi: yes, thanks, that was very confusing; i agree it does look like it's openafs-client and i think this fix makes sense. | 23:46 |
fungi | for some reason i thought we weren't pulling in things from the top-level roles directory any longer, but i guess we mix that and the playbooks/roles directory | 23:47 |
corvus | and that role doesn't need the special per-release files (focal jammy) that are in the executor role? | 23:48 |
corvus | i'm not sure why that role did something different | 23:50 |
Clark[m] | The only weird thing is the xenial hwe override/difference. But we haven't done xenial in forever so I think we can leave that behind | 23:52 |
fungi | if i had the desire to go excavating git history we'd probably find that at one time we included more variation between ubuntu versions | 23:52 |
corvus | k | 23:54 |
fungi | since they were identical anyway, i didn't question it | 23:56 |
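
The duplicate PPA entries discussed from 22:24 onward (two files under /etc/apt/sources.list.d carrying the same entry) could be spotted with something like the sketch below. This is only an illustration, not anything from system-config: the function name `find_duplicate_sources` is made up here, it only reads one-line-style `*.list` files (not deb822 `.sources` files), and it treats two entries as identical only when they match after whitespace normalisation.

```python
from collections import defaultdict
from pathlib import Path


def find_duplicate_sources(sources_dir):
    """Map each apt source entry to the files declaring it, keeping only repeats.

    Hypothetical helper for illustration; not part of any OpenDev role.
    """
    seen = defaultdict(list)
    for path in sorted(Path(sources_dir).glob("*.list")):
        for line in path.read_text().splitlines():
            # Collapse runs of whitespace so trivially different spacing still matches.
            entry = " ".join(line.split())
            # Skip blank lines and comments.
            if not entry or entry.startswith("#"):
                continue
            seen[entry].append(path.name)
    # An entry is a duplicate if it was recorded more than once
    # (across files, or twice within the same file).
    return {entry: files for entry, files in seen.items() if len(files) > 1}


if __name__ == "__main__":
    # Prints nothing when there are no duplicates or the directory is absent.
    for entry, files in find_duplicate_sources("/etc/apt/sources.list.d").items():
        print(f"{entry}: {', '.join(files)}")
```

Note that this would not have flagged the focal/xenial mismatch corvus found on the old hosts, since those two files had different entries; it only catches the identical-entry case fungi described.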
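
On the 19:55 reproduction and the "isn't a seekable device" conclusion: a rough way to check whether a file descriptor accepts `lseek()` at all, without going through Python's buffered I/O layer, is sketched below. The helper name `fd_is_seekable` is invented for this example, and the demo uses a pipe as a stand-in for a non-seekable file; it does not touch /proc/fs/openafs/afs_ioctl and does not reproduce the OpenAFS crash, it only shows the normal ESPIPE behaviour a non-seekable descriptor is expected to give.

```python
import errno
import os


def fd_is_seekable(fd):
    """Return True if lseek() succeeds on fd, False if the kernel reports ESPIPE.

    Hypothetical helper for illustration only.
    """
    try:
        # Ask for the current offset without moving it.
        os.lseek(fd, 0, os.SEEK_CUR)
    except OSError as exc:
        if exc.errno == errno.ESPIPE:
            return False
        # Anything else is unexpected, so let the caller see it.
        raise
    return True


if __name__ == "__main__":
    read_end, write_end = os.pipe()
    try:
        print("pipe seekable:", fd_is_seekable(read_end))
    finally:
        os.close(read_end)
        os.close(write_end)
```

A well-behaved non-seekable file should fail the seek cleanly like the pipe does, which is the graceful handling clarkb was expecting from the ioctl file.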