opendevreview | Merged opendev/system-config master: Use versioned get-pip.py URL for Ubuntu Bionic https://review.opendev.org/c/opendev/system-config/+/826968 | 00:01 |
fungi | excellent | 00:03 |
opendevreview | Merged opendev/system-config master: Add openstack-skyline channel in statusbot/meetbot/logging https://review.opendev.org/c/opendev/system-config/+/825882 | 00:57 |
fungi | finally! i started out my day approving that change ;) | 00:57 |
ianw | haha yak shaving of the highest order | 01:06 |
opendevreview | Merged opendev/system-config master: Use grafyaml container image https://review.opendev.org/c/opendev/system-config/+/780128 | 01:18 |
opendevreview | Merged opendev/grafyaml master: Generate and use UID for acessing dashboards https://review.opendev.org/c/opendev/grafyaml/+/825990 | 02:56 |
ianw | since ^ has merged, I might as well push the change that updates to use the latest upstream grafana container. if it doesn't work, we can revert (if too hard to fix expediently) | 03:46 |
ianw | (merged and deployed; the current page has been synced with the new grafyaml) | 03:49 |
opendevreview | Merged openstack/diskimage-builder master: Fix openSUSE images and bump them to 15.3 https://review.opendev.org/c/openstack/diskimage-builder/+/825347 | 04:08 |
fungi | awesome, thanks! | 04:08 |
opendevreview | Merged opendev/system-config master: grafana: update to oss latest release https://review.opendev.org/c/opendev/system-config/+/825410 | 04:13 |
*** ysandeep|out is now known as ysandeep | 05:58 | |
*** marios is now known as marios|ruck | 06:02 | |
opendevreview | Ian Wienand proposed opendev/system-config master: infra-prod-grafana: drop system-config-promote-image-grafana https://review.opendev.org/c/opendev/system-config/+/826990 | 06:15 |
ianw | ^ that is why it didn't deploy | 06:15 |
ianw | ok, did it manually and lgtm. all the dashboards are there, it's running 8.3.4 and the fonts look more ... fonty | 06:19 |
ianw | i'm not 100% sure the urls have stayed the same, i forgot to check | 06:24 |
ianw | however, from now on, they shouldn't ever change, as we make a UID by hashing the title and explicitly set that | 06:25 |
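For illustration, a stable UID can be derived from a dashboard title roughly like this; grafyaml's exact hash and truncation may differ from this sketch, and the title used here is just a placeholder:

```shell
# Hash the dashboard title and keep a short prefix as the UID, so the same
# title always yields the same dashboard URL (the approach described above;
# the specific digest and length are assumptions of this sketch).
title="Example Dashboard"
echo -n "$title" | sha256sum | cut -c1-12
```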
ianw | the centos volume has run out of quota | 06:27 |
ianw | i think we were discussing this before | 06:28 |
ianw | afs01 has ~500gb free. i think we should bump it by 100gb for now; this growth might just be the steady increase of 9-stream | 06:29 |
ianw | hrm, although 9-stream is a separate volume, actually | 06:30 |
ianw | i'll bump it to get it re-syncing for now, and then on my todo list is to investigate further | 06:31 |
ianw | I'm going to manually run the mirror update with a timeout | 06:32 |
ianw | i should be able to parse the recent history logs to see what made it grow when i have time | 06:34 |
*** ysandeep is now known as ysandeep|brb | 06:42 | |
*** amoralej|off is now known as amoralej | 07:02 | |
*** ysandeep|brb is now known as ysandeep | 07:22 | |
frickler | ianw: when I looked last week it seemed to be steady almost linear growth for the centos volume, but I didn't check what part of it might have been the cause | 07:40 |
frickler | also clarkb wanted to drop the non-stream centos8 things, but I'm not sure; we might want a bit more time for devstack consumers to clean up all the stable things | 07:41 |
ianw | Released volume mirror.centos successfully | 07:41 |
ianw | #status log bumped centos volume quota to 450gb, did a manual run to get it back in sync | 07:42 |
opendevstatus | ianw: finished logging | 07:42 |
ianw | i've dropped the locks | 07:42 |
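Roughly what the quota bump and re-sync above look like in AFS terms; the mirror path is an assumption, quotas are given in 1K blocks (450gb ≈ 471859200 blocks), and vos release needs admin credentials:

```shell
# Check current usage against quota, raise the quota, then release the
# read-write volume to its read-only replicas so the mirror serves the
# updated content (standard OpenAFS commands; the path is assumed).
fs listquota /afs/openstack.org/mirror/centos
fs setquota -path /afs/openstack.org/mirror/centos -max 471859200
vos release mirror.centos
```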
dpawlik | ianw: hey, did you change something on afs mirror? | 08:18 |
dpawlik | ah, I see there was some mention about that | 08:18 |
* dpawlik reading | 08:18 | |
*** odyssey4me is now known as Guest1217 | 08:32 | |
*** jpena|off is now known as jpena | 08:38 | |
ianw | dpawlik: centos had stopped syncing, so it is now back in sync | 08:44 |
dpawlik | ianw: ok, thanks | 08:47 |
*** ysandeep is now known as ysandeep|coffee | 10:04 | |
opendevreview | Alfredo Moralejo proposed zuul/zuul-jobs master: Install OVS from RDO Train Testing repository for CS8 https://review.opendev.org/c/zuul/zuul-jobs/+/827032 | 10:12 |
*** ysandeep|coffee is now known as ysandeep | 10:51 | |
*** rlandy is now known as rlandy|ruck | 11:14 | |
dpawlik | tristanC, ianw, fungi: hi, could you check https://review.opendev.org/c/zuul/zuul-jobs/+/827032 please | 12:40 |
opendevreview | daniel.pawlik proposed zuul/zuul-jobs master: Change RDO train repository for Centos 8 stream https://review.opendev.org/c/zuul/zuul-jobs/+/827067 | 12:50 |
*** amoralej is now known as amoralej|lunch | 13:10 | |
opendevreview | Merged zuul/zuul-jobs master: Install OVS from RDO Train Testing repository for CS8 https://review.opendev.org/c/zuul/zuul-jobs/+/827032 | 13:18 |
fungi | dpawlik: ^ | 13:19 |
dpawlik | thanks fungi! | 13:28 |
*** amoralej|lunch is now known as amoralej | 13:54 | |
opendevreview | Tristan Cacqueray proposed zuul/zuul-jobs master: Add cargo-test job https://review.opendev.org/c/zuul/zuul-jobs/+/827113 | 14:43 |
opendevreview | Tristan Cacqueray proposed zuul/zuul-jobs master: Add cargo-test job https://review.opendev.org/c/zuul/zuul-jobs/+/827113 | 15:37 |
opendevreview | Tristan Cacqueray proposed zuul/zuul-jobs master: Add packages support to ensure-cargo https://review.opendev.org/c/zuul/zuul-jobs/+/827130 | 15:37 |
opendevreview | Tristan Cacqueray proposed zuul/zuul-jobs master: Update cargo-test job to support system packages https://review.opendev.org/c/zuul/zuul-jobs/+/827131 | 15:37 |
*** dviroel is now known as dviroel|lunch | 15:39 | |
*** ysandeep is now known as ysandeep|dinner | 15:46 | |
clarkb | frickler: ya end of january was mostly to give people a concrete target. If people are still actively working to clean up centos 8 usage I don't mind waiting a bit longer | 15:51 |
*** ysandeep|dinner is now known as ysandeep | 16:06 | |
clarkb | once I've settled into the day a bit I'll merge the upstream fix for signed tag acls (assuming nothing pops up that distracts me) | 16:14 |
clarkb | Then we can rebuild our image, deploy that, and update the acl on a single project and double check it works as expected in production (the test node seemed to show it worked for me there though) | 16:14 |
*** amoralej is now known as amoralej|off | 16:30 | |
fungi | awesome, thanks! | 16:34 |
clarkb | ok I submitted that change | 16:36 |
clarkb | I'll get a change up to rebuild gerrit images | 16:36 |
*** dviroel|lunch is now known as dviroel | 16:38 | |
opendevreview | Neil Hanlon proposed openstack/diskimage-builder master: Add new container element - Rocky Linux https://review.opendev.org/c/openstack/diskimage-builder/+/825957 | 16:39 |
opendevreview | Clark Boylan proposed opendev/system-config master: Rebuild gerrit images for signed tag acl fix https://review.opendev.org/c/opendev/system-config/+/827153 | 16:40 |
*** marios|ruck is now known as marios|out | 16:42 | |
clarkb | Looks like Zuul is quite busy this morning. I don't see anything in grafana that looks like we've broken anything; just big demand for node requests, and then all executors stop accepting new jobs as they get nodes and start running ansible en masse | 17:16 |
clarkb | hrm, though I am noticing that if I change tenant status pages, I'm seeing changes that should be quickly removed from various tenants still hanging out there for a while, implying that maybe the schedulers aren't keeping up for some reason? | 17:25 |
clarkb | corvus: ^ not sure if that is known | 17:25 |
opendevreview | Michal Nasiadka proposed openstack/diskimage-builder master: Add new container element - Rocky Linux https://review.opendev.org/c/openstack/diskimage-builder/+/825957 | 17:26 |
corvus | clarkb: the merger queue looks weird | 17:30 |
*** ysandeep is now known as ysandeep|out | 17:30 | |
fungi | cacti graphs for both schedulers definitely show they're active, but neither looks remotely at risk of running out of cpu/memory | 17:31 |
corvus | zm01 is processing jobs... | 17:32 |
*** jpena is now known as jpena|off | 17:32 | |
corvus | so is zm02. i have confirmed there are a lot of merge job requests in zk | 17:33 |
fungi | zk servers don't look overloaded either | 17:33 |
corvus | did something happen around 15:48? bunch of new branches or tags or anything like that? | 17:34 |
clarkb | Not that I am aware of, but I also hadn't thought ot look for merger activity | 17:35 |
fungi | 14:44 <opendevreview> Merged openstack/releases master: Release manila for stable/xena https://review.opendev.org/c/openstack/releases/+/826762 | 17:35 |
fungi | er, that was an hour earlier | 17:36 |
clarkb | a nova stack made it into check around then | 17:36 |
corvus | the mergers are running, but very slowly; is gerrit slow? | 17:37 |
fungi | https://review.opendev.org/c/openstack/releases/+/825789 for keystone stable/victoria was 15:03z, and that was the most recent thing to merge for openstack releases | 17:37 |
fungi | gerrit cpu utilization in cacti looks typical for a weekday | 17:38 |
clarkb | system load is currently acceptable on gerrit for the last 15 minutes. I didn't notice issues with the UI. It could be network related between vexxhost and rax though? | 17:38 |
corvus | what is accounts-daemon and why does it use 0.5 cpu? | 17:40 |
clarkb | internet says it is a dbus service | 17:40 |
fungi | accountsservice: /usr/lib/accountsservice/accounts-daemon | 17:41 |
fungi | Description: query and manipulate user account information | 17:41 |
clarkb | https://bugs.launchpad.net/ubuntu/+source/accountsservice/+bug/1316830 | 17:42 |
fungi | it's in Section: gnome | 17:42 |
clarkb | apparently high cpu usage from that daemon is a known thing :/ | 17:42 |
fungi | seems like a great choice for a headless server | 17:42 |
fungi | i agree, the merger queue graph in grafana is definitely a concerning sign | 17:44 |
fungi | the mergers definitely look like they're running flat out, just not keeping up | 17:45 |
clarkb | if I clone nova from review to home I'm getting an initially quite slow download speed but then it picks up to about 1MiB/s | 17:45 |
clarkb | hrm seems to have fallen back off again to about 500 KiB/s | 17:45 |
clarkb | basically not great but also not completely off of where I would expect it (I think 2MiB/s is common) | 17:45 |
clarkb | and considering we're not completely overloaded on the review server (historically system load would shoot up when IO was slow) I suspect something networky | 17:46 |
fungi | i don't suppose the mergers relied at all on gerrit's mergeable metadata we're no longer providing since the upgrade? | 17:47 |
clarkb | I don't believe so | 17:48 |
corvus | nah, this should just be git ops | 17:48 |
clarkb | they always need to calculate their own merges so relied on determining that locally | 17:48 |
fungi | yeah, that's what i thought, just making sure | 17:49 |
clarkb | once this clone to home is done I'll do a clone on review02 and compare if removing the network produces a different result | 17:50 |
clarkb | I guess I could do them concurrently because it isn't like zuul stopped operating either. Initially I thought I should avoid that to avoid extra noise, but that's meaningless I think | 17:50 |
corvus | inside the container this has a huge startup delay: GIT_SSH_COMMAND='ssh -i /var/lib/zuul/ssh/id_rsa' git clone git+ssh://zuul@review.opendev.org:29418/zuul/zuul-jobs | 17:50 |
corvus | outside the container on zm02 has the same behavior | 17:51 |
clarkb | doing the clone locally on review02 started extremely quickly | 17:52 |
corvus | it's about 25 seconds before any response | 17:52 |
clarkb | then around 80% of the way through the object download it started to get slower | 17:52 |
fungi | i'm seeing signs of memory pressure, but it looks like it's a desire for more cache memory | 17:52 |
fungi | there's little ticks in paging activity | 17:52 |
clarkb | it's doing about 5MiB/s locally | 17:52 |
fungi | (on the mergers) | 17:52 |
opendevreview | Tristan Cacqueray proposed zuul/zuul-jobs master: Add cargo-test job https://review.opendev.org/c/zuul/zuul-jobs/+/827113 | 17:53 |
opendevreview | Tristan Cacqueray proposed zuul/zuul-jobs master: Add packages support to ensure-cargo https://review.opendev.org/c/zuul/zuul-jobs/+/827130 | 17:53 |
opendevreview | Tristan Cacqueray proposed zuul/zuul-jobs master: Update cargo-test job to support system packages https://review.opendev.org/c/zuul/zuul-jobs/+/827131 | 17:53 |
corvus | there are a lot of zuul login/logout logs on gerrit ssh log. do we have a cap for user ssh sessions that zuul is hitting? | 17:54 |
clarkb | corvus: there is a cap of 96 concurrent | 17:54 |
fungi | 96 per account, 100 per ip address | 17:54 |
fungi | (the former is enforced by gerrit, the latter by iptables/conntrack) | 17:55 |
clarkb | Receiving objects: 100% (602947/602947), 1.04 GiB | 6.08 MiB/s, done. <- that is the gerrit local clone of nova | 17:55 |
clarkb | which isn't terrible but I would've expected it to be closer to the 25MiB/s that it showed when starting up | 17:55 |
corvus | i think the startup delay is key here; i'm seeing the issue with small repos, not large ones | 17:55 |
clarkb | corvus: fungi gerrit show-queue shows zuul waiting on pulls | 17:56 |
clarkb | which might be the startup delay that you see | 17:56 |
clarkb | it also shows a several day old clone for a third party ci system I'm inclined to kill to free up an extra gerrit thread | 17:57 |
corvus | watching the sshd logs, i see the connection show up immediately, so it does not seem like the delay is before the ssh connection handshake; it's after | 17:57 |
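One way to bracket that startup delay is git's built-in tracing, which timestamps each step and shows the gap between the ssh transport connecting and upload-pack advertising refs; the key path and repo URL are taken from the command quoted earlier, and the clone destination is arbitrary:

```shell
# Timestamped trace of the transport and pack protocol; compare the time the
# ssh process starts against the first ref advertisement from the server.
GIT_TRACE=1 GIT_TRACE_PACKET=1 \
  GIT_SSH_COMMAND='ssh -i /var/lib/zuul/ssh/id_rsa' \
  git clone git+ssh://zuul@review.opendev.org:29418/zuul/zuul-jobs /tmp/zuul-jobs-probe
```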
clarkb | I'm going to kill that pull from a few days ago for that third party ci system | 17:58 |
clarkb | 7f22c7e1 Jan-28 15:08 git-upload-pack /openstack/nova | 17:59 |
clarkb | (I removed the user name from the end of that entry) | 17:59 |
fungi | thanks, please do | 17:59 |
fungi | though i doubt that alone will fix it, i suppose it could have been holding onto resources which gerrit wants to free or something | 17:59 |
corvus | weird, "show-queue -w" doesn't seem to be giving me user names? | 18:00 |
clarkb | corvus: they are in parentheses at the end of the tasks | 18:00 |
clarkb | for example (zuul) for zuul tasks | 18:00 |
corvus | ssh review gerrit show-queue -w | 18:00 |
clarkb | corvus: oh you may need to be admin | 18:01 |
corvus | ah right :) | 18:01 |
fungi | confirmed, being admin gets me those, normal user doesn't see them | 18:01 |
clarkb | what I'm noticing is there are only like 5 pull tasks running at a time | 18:01 |
clarkb | and we allow significantly more threads | 18:02 |
fungi | maybe config options changed in 3.4? | 18:02 |
corvus | maybe there's a new limit? that would jibe with the behavior we're seeing | 18:02 |
clarkb | sshd.commandStartThreads maybe? Though it isn't clear if that affects git ops too | 18:04 |
clarkb | also we didn't have problems with this after the 3.4 upgrade until now? Maybe it is related to Friday's update instead of the 3.4 update? | 18:05 |
clarkb | or maybe we just missed it during the week last week | 18:05 |
fungi | last week might have been quieter. also the spike earlier may have been anomalous | 18:05 |
clarkb | ok I see 4 pulls now | 18:05 |
clarkb | sshd.threads says "By default, 2x the number of CPUs available to the JVM (but at least 4 threads)." | 18:06 |
clarkb | we set it to 100 | 18:06 |
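Since gerrit.config uses git-config syntax, the sshd limits being discussed can be read back directly; the path to the site's etc directory is an assumption here:

```shell
# Inspect the thread-related sshd settings; no output means the key is unset
# and Gerrit's documented default applies.
git config -f /var/gerrit/etc/gerrit.config --get sshd.threads
git config -f /var/gerrit/etc/gerrit.config --get sshd.batchThreads
git config -f /var/gerrit/etc/gerrit.config --get sshd.commandStartThreads
```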
corvus | do we have melody? | 18:07 |
fungi | not any longer, it was axed over log4j concerns | 18:07 |
fungi | we could probably safely readd it at this point | 18:07 |
clarkb | yes I think we can safely readd it. After we removed it we were reassured that it was fine since gerrit forces its deps | 18:08 |
clarkb | I'm still semi skeptical because I'm that way, but they seemed confident of it | 18:09 |
corvus | i'm wondering if there's a way (other than melody) to confirm the sshd.threads setting is working as expected; or if batchthreads is being used | 18:09 |
fungi | https://review.opendev.org/c/openstack/nova/+/812111/ was the change triggered at the time of the spike, along with several of its peers. it does seem to have rather a lot of explicit and implicit deps, so could account for a significant number of merge requests | 18:10 |
corvus | zuul is not a service user, right? | 18:10 |
clarkb | corvus: oh that is an interesting possibility. It says if set to 0 then you share the pool with interactive users but by default it is 1 or 2 depending on core count | 18:10 |
clarkb | corvus: I think it may be now because of attention sets | 18:10 |
corvus | how did attention sets make it a service user? | 18:11 |
clarkb | corvus: if you don't put CI systems in the Service Users group then the attention set system treats them as a human and expects them to take action and stuff. It breaks the workflow | 18:11 |
corvus | so you're saying someone made it a service user? | 18:12 |
clarkb | maybe. I'm trying to do a listing now | 18:12 |
clarkb | looks like zuul is not in the service users group so maybe something to consider for the future but unlikely to cause this issue | 18:12 |
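If zuul were later added to that group, it would be a one-liner over Gerrit's ssh admin interface; this is only a sketch, run as a Gerrit administrator, with the extra quoting needed because ssh re-joins the remote arguments:

```shell
# List current members of the group, then add the zuul account to it.
ssh -p 29418 review.opendev.org gerrit ls-members \"Service Users\"
ssh -p 29418 review.opendev.org gerrit set-members --add zuul \"Service Users\"
```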
corvus | i just ran jstack on gerrit to get a thread dump | 18:14 |
clarkb | cool I'm working on a change to get melody back should we decide to go that route | 18:15 |
corvus | most of our "SSH-Interactive-Worker" threads are idle | 18:16 |
corvus | btw, can we install less in our images? :) | 18:17 |
clarkb | If it doesn't make the images twice the size I'm not opposed :) vim-nox should be pretty small and I think that may provide a less? | 18:18 |
clarkb | corvus: ya so I wonder if sshd.commandStartThreads could be the bottleneck | 18:18 |
corvus | it looks like there are distinct SSH git-upload-pack threads, and there are 4 of them | 18:19 |
clarkb | oh are we using ssh or http? | 18:19 |
clarkb | maybe the difference is on the http size | 18:19 |
clarkb | *side | 18:19 |
corvus | pretty sure we're using ssh | 18:20 |
opendevreview | Clark Boylan proposed opendev/system-config master: Revert "Remove melody" https://review.opendev.org/c/opendev/system-config/+/827161 | 18:21 |
corvus | fwiw, jstack is easy; i don't think i need anything more right now | 18:21 |
clarkb | btw for my local clone the final result was Receiving objects: 100% (602947/602947), 1.04 GiB | 1.01 MiB/s, done. but I think now we suspect less parallelism and not individual throughput | 18:22 |
clarkb | corvus: noted, maybe we can add that info to our gerrit sysadmin docs when things settle down. I'll WIP that change to prevent accidental merging | 18:22 |
corvus | oh it looks like the worker threads turn into ssh threads... | 18:23 |
clarkb | according to grafana it looks like we may have caught up | 18:24 |
clarkb | and I'm seeing 9 pulls processed at once now | 18:24 |
fungi | oh, wow it burned down very quickly just out of nowhere | 18:24 |
corvus | i observe a 7 second delay starting a clone | 18:24 |
clarkb | and now up to 18 | 18:24 |
clarkb | fungi: ya it seems we're making use of a lot more threads now | 18:25 |
clarkb | whatever it was seems to have corrected itself at least temporarily | 18:26 |
corvus | in the thread dump, both command start threads are idle | 18:26 |
clarkb | I wonder if the thread dump unstuck something | 18:26 |
corvus | i think the problem is still observable. | 18:26 |
corvus | doubt it | 18:26 |
corvus | it's just fast enough now that it's able to service the queue, but it's still right on the edge | 18:27 |
clarkb | corvus: I guess the impact is lower but the 7 seconds to start cloning is still quite a delay | 18:27 |
corvus | exactly | 18:27 |
corvus | should be 0 | 18:27 |
fungi | and that seems to be the client waiting on jgit to start talking? | 18:28 |
corvus | fungi: hard to say; all i know is the ssh connection is immediate, but the output from git starts X seconds later. don't know what happens during that time. | 18:28 |
corvus | the thread dump is in /tmp/dump inside the gerrit container if anyone wants a look | 18:29 |
corvus | down to about 5 seconds delay now | 18:30 |
corvus | i'm out of ideas | 18:31 |
fungi | this does, at least, seem unlikely to be a result of changes on zuul's side, given the behavior | 18:32 |
clarkb | corvus: fungi: maybe the thing to do is try to capture as much about this as possible in a gerrit issue and see if upstream has ideas | 18:32 |
clarkb | They have been super helpful recently when we bring stuff like this to them. Considering Zuul has caught up we can probably live with waiting on their input for a bit? | 18:32 |
fungi | just to make sure i understand, we did see the number of parallel active threads for ssh increase once that episode passed? | 18:33 |
clarkb | fungi: yes I observed one show-queue -w output with 18 pulls running and not waiting. | 18:33 |
fungi | okay, so it seems like gerrit temporarily reduced the number of tasks it was willing to perform | 18:34 |
clarkb | all but 2 were for zuul | 18:34 |
clarkb | ya its like the requests go into a waiting state according to show-queue -w | 18:35 |
clarkb | and for a while we were only processing ~4-5 of them at a time | 18:35 |
clarkb | which led to a backlog | 18:35 |
clarkb | we still see a backlog but it is much shorter | 18:35 |
fungi | i wonder if it's something like an unintentional throttle which we brought on with the spike from the nova/placement stack at ~15:55z, and from that point on requests were being processed in a diminished capacity until gerrit managed to recover itself | 18:38 |
fungi | or whether the threads count had been constricted prior to that, and it was simply the event which generated enough requests to push the mergers over the edge into backlog | 18:39 |
clarkb | ya I think what we can do is write an issue explaining what we observed and ask for any guidance on settings/tunings or behaviors we might not be aware of that could explain this. Then continue to monitor and try to determine what is going on if it happens more regularly | 18:40 |
clarkb | it could also potentially be related to that pull I killed if it holds some lock that everyone has to contend with? | 18:40 |
clarkb | it was for nova which is a popular repo and maybe all pulls for the same repo all have to synchronize with each other? | 18:41 |
*** tkajinam is now known as Guest1307 | 18:43 | |
clarkb | corvus: is the process to make the thread dump just `jstack $PID > file` ? | 18:43 |
corvus | yep | 18:45 |
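For the containerized gerrit, the capture can also be driven from the host; the container name is assumed to be "gerrit", pgrep is assumed to be present in the image, and jstack generally has to run as the same user as the java process (add -u to docker exec if needed):

```shell
# Dump all java threads from the gerrit container to a file on the host
# (unlike the /tmp/dump above, which was written inside the container).
docker exec gerrit sh -c 'jstack "$(pgrep -o java)"' > gerrit-threads.txt
```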
clarkb | cool I can work on a docs update for that today. I'm probably not the best person to file the issue as I'm not sure what my availability will end up being this week due to local stuff. I can help with an issue draft though | 18:46 |
clarkb | (basically thinking it would be good for someone who can be responsive to upstream to file that) | 18:46 |
opendevreview | Tristan Cacqueray proposed zuul/zuul-jobs master: Add packages support to ensure-cargo https://review.opendev.org/c/zuul/zuul-jobs/+/827130 | 18:56 |
opendevreview | Tristan Cacqueray proposed zuul/zuul-jobs master: Update cargo-test job to support system packages https://review.opendev.org/c/zuul/zuul-jobs/+/827131 | 18:56 |
clarkb | fungi: corvus https://etherpad.opendev.org/p/fyIZ7J4DPpxcJSEyfeyz I copied the template from gerrit's issue tracker for a bug report if we want to draft something up there together | 18:57 |
fungi | thanks! | 19:02 |
clarkb | fungi: corvus something like that maybe? feel free to edit, but I need to step away for a bit. If you would prefer I file that I can do that just with the caveat that my attentiveness to that bug may be lacking | 19:12 |
corvus | clarkb: lgtm! | 19:21 |
fungi | clarkb: i've suggested a few edits. feel free to take them or leave them | 19:42 |
clarkb | fungi: I s/jgit/gerrit/ since there is more than just jgit involved with things like mina | 19:48 |
clarkb | I don't want to indicate this is a jgit issue as I think we may be hitting this before jgit is involved (but maybe not) | 19:48 |
clarkb | anyway should I file that or do one of yall wish to do that? | 19:48 |
fungi | yeah, just trying to make it clear we're not pulling refs from gitiles or some other backend | 19:49 |
clarkb | ++ | 19:49 |
fungi | i'm not sure how many ways gerrit exposes things, but we're specifically cloning via ssh to gerrit's dedicated service port it shares with its ssh cli | 19:50 |
fungi | maybe that? | 19:51 |
clarkb | I think that + the port indication should make it clear | 19:51 |
fungi | oh, you added the port number, that's probably enough | 19:51 |
fungi | if you have a moment to file it, feel free. otherwise i can probably get to it in a bit but trying to juggle a few things | 19:54 |
clarkb | ok. I'm working on the jstack thing first. Then I need to step out again but if you haven't beaten me to it by then I can file it | 19:55 |
opendevreview | Steve Baker proposed openstack/diskimage-builder master: Allow the bootloader element with the default block device https://review.opendev.org/c/openstack/diskimage-builder/+/826976 | 19:59 |
opendevreview | Clark Boylan proposed opendev/system-config master: Add info on running jstack against gerrit to docs https://review.opendev.org/c/opendev/system-config/+/827172 | 20:01 |
clarkb | corvus: ^ fyi thatis the jstack documentation change | 20:01 |
corvus | clarkb: text lgtm but needs more cowbell er dash | 20:04 |
opendevreview | Clark Boylan proposed opendev/system-config master: Add info on running jstack against gerrit to docs https://review.opendev.org/c/opendev/system-config/+/827172 | 20:06 |
clarkb | thanks fixed | 20:06 |
*** dviroel is now known as dviroel|brb | 20:48 | |
ianw | so from the merger graph i see it suddenly dropped ~ 18:00 but am i correct that we didn't really find a smoking gun for what was happening, just that gerrit decided to go slow for a while? | 20:52 |
fungi | from outward appearances, yes | 20:56 |
ianw | infra-root: if anyone could take a quick look at https://review.opendev.org/c/openstack/diskimage-builder/+/826244 just to confirm i haven't totally mixed things up. it modifies the .repo files we use after i realised yum's "$releasever" doesn't include -stream on -stream distros | 21:11 |
ianw | i'd like to get that testing stack there in before a release, which i'd like to do before cleaning up our fedora gate images, etc. | 21:12 |
opendevreview | Gage Hugo proposed openstack/project-config master: Retire security-specs repo - Step 3 https://review.opendev.org/c/openstack/project-config/+/827178 | 21:20 |
clarkb | ianw: ya that looks sane. Hardcoding should be fine | 21:26 |
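Illustrative only: the kind of hard-coding under review, replacing yum's $releasever (which lacks the -stream suffix on stream distros) with an explicit path component; the file name and URL layout here are hypothetical:

```shell
# Pin the stream release in the repo definition instead of relying on
# $releasever expansion.
sed -i 's|/centos/$releasever/|/centos/8-stream/|g' /etc/yum.repos.d/centos-stream.repo
```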
opendevreview | Clark Boylan proposed opendev/base-jobs master: Remove centos-8 as it is EOL https://review.opendev.org/c/opendev/base-jobs/+/827181 | 21:28 |
clarkb | infra-root ^ frickler mentioned holding off on that for now since people are still actively working to remove it from their jobs. I'm just proposing changes now so that we have them ready when we are ready to go for it | 21:28 |
opendevreview | Gage Hugo proposed openstack/project-config master: Retire security-specs repo - Step 3 https://review.opendev.org/c/openstack/project-config/+/827178 | 21:33 |
opendevreview | Clark Boylan proposed openstack/project-config master: Set centos-8 min-ready to 0 https://review.opendev.org/c/openstack/project-config/+/827183 | 21:38 |
opendevreview | Clark Boylan proposed openstack/project-config master: Remove centos-8 https://review.opendev.org/c/openstack/project-config/+/827184 | 21:38 |
clarkb | fungi: your gerrit gitea stack needed monitoring to land the first few changes iirc? Then that would allow us to depends-on test the fix I pushed upstream | 21:38 |
opendevreview | Clark Boylan proposed opendev/system-config master: Stop mirroring centos-8 https://review.opendev.org/c/opendev/system-config/+/827186 | 21:44 |
fungi | clarkb: oh, i thought we already merged those | 21:58 |
clarkb | fungi: we merged the first one. But I didn't recheck the second since it affected gerrit configs and I wasn't able to babysit | 21:59 |
clarkb | but I think if we recheck the whole stack in order we should be able to test the depends-on upstream gerrit properly | 21:59 |
clarkb | I've got the meeting agenda edited. Any last minute additions before I send that out? | 21:59 |
fungi | oh, yes i only saw the approvals, didn't notice the lack of merging | 22:00 |
clarkb | oh I'll add a thing about the gerrit issues | 22:04 |
clarkb | fungi: did you manage to file an issue for the ssh pull thing yet? If not I'll do that nowish and add it to the infra agenda | 22:06 |
fungi | i have not yet, i've just wrapped up other chores and am starting to cook dinner, but could do it after probably if you're busy | 22:07 |
clarkb | No worries. I've got it as part of the agenda prep | 22:08 |
clarkb | https://bugs.chromium.org/p/gerrit/issues/detail?id=15649 | 22:09 |
clarkb | if you star that issue you'll get email notification which can help you keep on top of it if I am not around though | 22:09 |
ianw | clarkb: did you want to merge the centos-8 retirement stuff now? | 22:12 |
clarkb | ianw: no I was going to check in with people tomorrow to see if we are ready yet. Frickler mentioned devstack may still need cleanup | 22:12 |
clarkb | ianw: that said https://review.opendev.org/c/openstack/project-config/+/827183 should be safe to land now. Just not the others | 22:12 |
ianw | yeah, i'll do a double check, but given that s/8/8-stream/ should fix most everything i think a quick deprecation is reasonable | 22:15 |
jrosser | we are having a fairly torrid time over in openstack-ansible dropping our centos-8 jobs | 22:16 |
jrosser | because of a succession of external factors happening one after the other, the mirrors were out of sync, now we are stuck for a couple of days because erlang-solutions repos are broken again | 22:17 |
clarkb | ya I mostly wanted to make sure we were ready on our side since ~now was when I said it would happen. But if people are actively working on it and struggling I don't mind waiting/helping as necessary to do the removal cleanly | 22:18 |
ianw | ++; if people are actively working on things that's good | 22:19 |
ianw | clarkb/fungi: do we need to revert anything wrt https://review.opendev.org/c/opendev/system-config/+/827153 (signed acl fix)? should i deploy it this afternoon? | 22:34 |
clarkb | ianw: we need to revert the acl updates that allowed annotated tags in openstack/project-config for all the things. fungi has a change to do that, but it might be prudent to revert for a single project first and test that it works then revert for all | 22:35 |
clarkb | but that happens after the update of gerrit | 22:36 |
clarkb | alright last call for the meeting agenda. I'll get that sent out in about 5 minutes | 22:39 |
fungi | ianw: yeah, we need a gerrit restart first on the new image, at a minimum | 22:46 |
fungi | once it's restarted, we could make sure 826335 is still passing tests and then merge it, or we could take a more measured approach and update one repo first as clarkb suggests | 22:48 |
fungi | i'll un-wip 826335 now regardless so that's not blocking it | 22:48 |
clarkb | ya I guess reverting the revert if we got something wrong is easy too | 22:48 |
clarkb | its just a lot of diff to look at :) | 22:49 |
fungi | i'm inclined to trust our testing and just make sure #openstack-release knows to test one tag first, not approve an entire batch, as they're likely to be the first to find problems | 22:52 |
fungi | but if others are more comfortable with a conservative, incremental reversal, that's cool with me too | 22:53 |
clarkb | that plan wfm. I did test it. Happy for others to test it too. The held node is still up and running if they want to do that | 22:54 |
ianw | hrm, centos-8 job just started failing and i think things have moved to vault | 23:21 |
ianw | Error: Failed to download metadata for repo 'centos-base': Cannot prepare internal mirrorlist: No URLs in mirrorlist | 23:22 |
fungi | i guess someone has removed centos 8 for us! | 23:22 |
opendevreview | Merged opendev/system-config master: Rebuild gerrit images for signed tag acl fix https://review.opendev.org/c/opendev/system-config/+/827153 | 23:23 |
opendevreview | Merged opendev/system-config master: Add info on running jstack against gerrit to docs https://review.opendev.org/c/opendev/system-config/+/827172 | 23:23 |
opendevreview | Merged opendev/system-config master: infra-prod-grafana: drop system-config-promote-image-grafana https://review.opendev.org/c/opendev/system-config/+/826990 | 23:23 |
ianw | i think our CI will be somewhat isolated while we run in our little mirror ecosystem, but the dib job that pulls from upstream fails | 23:23 |
fungi | once it propagates to afs this might hit jobs harder? | 23:26 |
fungi | i have a sinking suspicion rsync will just delete the files, call that success, and we'll release the empty result | 23:27 |
ianw | still investigating, http://mirror.centos.org/centos/ still seems to be there | 23:27 |
ianw | i'm wondering if just the "metalink" mirror choosing thing is failing to return | 23:28 |
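A quick probe of the mirrorlist service that error points at; the query parameters mirror a typical CentOS 8 BaseOS repo definition, and a reply with no mirror URLs would match the "No URLs in mirrorlist" failure above:

```shell
# Ask the mirrorlist service which mirrors it would hand out for CentOS 8.
curl -s 'http://mirrorlist.centos.org/?release=8&arch=x86_64&repo=BaseOS'
```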
fungi | unless it happened since our last sync | 23:29 |
fungi | oh, sorry, i think i see what you mean now | 23:29 |
fungi | i'm not familiar with what role metalink plays | 23:30 |
fungi | i guess we use that as a convenience redirector in the dip element? | 23:30 |
*** ysandeep|out is now known as ysandeep | 23:30 | |
fungi | dib element | 23:31 |
fungi | though now i want crisps and dip | 23:31 |
ianw | ", we intend to bend the usual EOL process by keeping this content available until January 31st." | 23:31 |
* fungi shakes fist at nighttime snacking tendencies | 23:31 | |
ianw | that sounds suspiciously like today | 23:31 |
ianw | https://www.centos.org/centos-linux-eol/ | 23:31 |
fungi | if the people in charge of taking it down are in cest, then it's already been feb 1 there for half an hour | 23:32 |
ianw | it does explain why it chose this moment to break | 23:33 |
fungi | no crisps, but i found a tin of savory biscuits. in the states we call them "crackers" but i assume that's an american colloquialism which is not shared by any other english-speaking culture (and what we call "biscuits" is more akin to shortcakes) | 23:41 |
fungi | i'm not going to defend the ways my ancestors have destroyed this language | 23:41 |
opendevreview | Ian Wienand proposed openstack/project-config master: Stop building CentOS 8 https://review.opendev.org/c/openstack/project-config/+/827195 | 23:42 |
ianw | crackers would be understood in australian english. biscuits is definitely a sweet thing here, scone ~= biscuit, although they have a little sugar and are eaten with jam and cream | 23:43 |
ianw | i don't know why we generally don't have a non-sweet biscuit/scone equivalent. sausage, white gravy and biscuits is good stuff and i still make it for weekend breakfast sometimes | 23:44 |
fungi | ahh, maybe that was a 16th century working class difference brought to our respective homes by the unpaid labor the english shipped here | 23:44 |
fungi | (17th, 18th century... the practice did rather linger more in some places than others) | 23:46 |
fungi | ocracoke, two islands to the south of me, has only ever been accessible by boat, and the locals there have retained some very colorful variations on the language | 23:50 |