Clark[m] | fungi: TheJulia: we're oversubscribed in that cloud and have the ability to control that to an extent | 00:45 |
Clark[m] | I think the idea is we don't know what the right balance is so we have to fiddle with it | 00:46 |
fungi | yep | 00:47 |
yuriys | We did scale down a bit as of 10 hours ago. But yeah, performance over availability there. | 00:47 |
yuriys | How can I get runtime data for the 809895 run? | 00:48 |
Clark[m] | yuriys I can get links in a bit. Finishing dinner first | 00:50 |
ianw | yuriys: i'd suspect it was https://zuul.opendev.org/t/openstack/build/626c1caaf4e34e91b4a1b961e3a2a21d/ | 00:51 |
fungi | yeah, i think that's likely it | 00:53 |
fungi | it's the only voting job to be reported on change 809895 with a failure state in the most recent buildset | 00:53 |
clarkb | ok just sent out tomorrow's meeting agenda. Sorry that was late | 01:24 |
yuriys | We can probably scale down a bit more, maybe 32-36 limit. I still saw some ooms after we went down to 40. We had 1 launch error in the last 10ish hours, but I'm assuming that test runtime is outside of that. | 01:36 |
yuriys | If test duration is a good show of performance, I'm interested in that as well, longterm. | 01:37 |
clarkb | there are likely tests that are a decent indicator of that, but I'm not sure what those might be. | 01:38 |
clarkb | and ya, the test runtime should be orthogonal to launch failures. This would be: after we have a VM, how quickly can it run the job content | 01:38 |
fungi | but also, as we've observed, performance on one vm in isolation is often far better than performance on a vm when the same hosts are also running a bunch of other test instances flat out all at the same time | 01:40 |
fungi | the "noisy neighbor" effect | 01:41 |
clarkb | ya and we see that across clouds | 01:42 |
yuriys | I want to say this 'expansion' created a lot of internal tickets for us, from how we approach overcommitting resources, to memory optimizations, to even ceph optimizations. Like one of the OSD hosts is caching at an insane rate: we're at 38GB for the 3 OSDs, which is hilarious since we set osd_memory_target to 4G... | 01:42 |
yuriys | Also, in all brutal honesty, for your use case ceph is actually bad. We'd want to provision LVM on these NVMes, which we're testing in-house now. | 01:43 |
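(For context on the cache-vs-target mismatch described above, a hedged sketch of how it could be checked on the Ceph side; these are stock Ceph commands run with admin access on a cluster node, and osd.0 is just a placeholder.)

```shell
# Effective memory target the OSDs are configured for:
ceph config get osd osd_memory_target
# Per-OSD breakdown of where memory is actually going (bluestore caches, buffers, etc.):
ceph daemon osd.0 dump_mempools
```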
fungi | yeah, donnyd had observed that local storage on the compute nodes performed far better | 01:46 |
fungi | that's what he ended up doing in the fortnebula cloud | 01:46 |
clarkb | he might have input on over subscription ratios too | 01:46 |
yuriys | I'm calling it a night. If you guys see performance issues, feel free to scale down (maybe to 32). We'll eventually find the right tenant maximum that doesn't impact any testing negatively, and that will help us later when making correct calculations for scaling hardware. I'm not concerned about getting big numbers here, just finding the right numbers, and getting some experience with what bad numbers end up doing to infra. | 02:00 |
yuriys | Interestingly enough, I ended up watching Deploy Friday: E50 during this chat, which is heavily Zuul focused, with many shoutouts to what you guys do there. | 02:01 |
clarkb | https://www.youtube.com/watch?v=2c3qJ851QVI neat | 02:04 |
fungi | woah | 02:06 |
TheJulia | Is it bad I didn't remember which video that was until I saw myself and the background? | 03:46 |
*** ysandeep|away is now known as ysandeep | 05:38 | |
*** rpittau|afk is now known as rpittau | 07:24 | |
*** jpena|off is now known as jpena | 07:28 | |
*** ykarel is now known as ykarel|away | 07:34 | |
opendevreview | daniel.pawlik proposed opendev/puppet-log_processor master: Add capability with python3; add log request cert verify https://review.opendev.org/c/opendev/puppet-log_processor/+/809424 | 07:38 |
*** ysandeep is now known as ysandeep|lunch | 07:48 | |
opendevreview | Balazs Gibizer proposed opendev/irc-meetings master: Add Sylvain as nova meeting chair https://review.opendev.org/c/opendev/irc-meetings/+/810165 | 08:23 |
opendevreview | Balazs Gibizer proposed opendev/irc-meetings master: Add Sylvain as nova meeting chair https://review.opendev.org/c/opendev/irc-meetings/+/810165 | 08:26 |
*** ysandeep|lunch is now known as ysandeep | 09:09 | |
*** ykarel|away is now known as ykarel | 10:26 | |
*** frenzy_friday is now known as anbanerj|ruck | 10:36 | |
*** jpena is now known as jpena|lunch | 11:24 | |
*** dviroel|out is now known as dviroel | 11:31 | |
*** ysandeep is now known as ysandeep|afk | 11:47 | |
opendevreview | Yuriy Shyyan proposed openstack/project-config master: Scaling down InMotion nodepool resource. https://review.opendev.org/c/openstack/project-config/+/810213 | 12:04 |
*** jpena|lunch is now known as jpena | 12:22 | |
*** ysandeep|afk is now known as ysandeep | 12:39 | |
opendevreview | Merged opendev/irc-meetings master: Add Sylvain as nova meeting chair https://review.opendev.org/c/opendev/irc-meetings/+/810165 | 12:53 |
*** tristanC_ is now known as tristanC | 13:16 | |
fungi | TheJulia: you did a good job in it, i watched it all the way through last night | 13:21 |
TheJulia | \o/ | 13:22 |
yuriys | Nothing like waking up to a fire. fungi can you approve|c/r the scale down please. | 13:22 |
fungi | yuriys: just did, sorry didn't spot it until moments before you asked | 13:23 |
yuriys | Awesome ty. I already adjusted overcommit stuff on the cloud itself. | 13:23 |
fungi | yuriys: also nodepool will pay attention to quotas, so if the cloud side scales down the ram, cpu or disk quota it will adjust its expectations accordingly | 13:24 |
yuriys | Very cool, did not know. Will probably tackle via quotas as well then. | 13:26 |
fungi | yeah, we have some public cloud providers who adjust our capacity by simply changing the ram quota on their side, so when they know they have extra capacity they ramp it up temporarily, and then when they expect to be under additional load for other reasons they scale it way down, maybe even to 0 | 13:31 |
fungi | a more dynamic way to make adjustments, and faster than going through configuration management | 13:32 |
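(A hedged sketch of the provider-side knob being described, using the standard openstack CLI; the project name is a placeholder and the values are illustrative.)

```shell
# Inspect the quota the nodepool tenant is currently working against:
openstack quota show nodepool-tenant
# Temporarily raise the RAM quota when spare capacity exists:
openstack quota set --ram 262144 nodepool-tenant
# Scale it way down (even to 0) to effectively pause new launches:
openstack quota set --ram 0 nodepool-tenant
```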
opendevreview | Merged openstack/project-config master: Scaling down InMotion nodepool resource. https://review.opendev.org/c/openstack/project-config/+/810213 | 13:36 |
*** artom_ is now known as artom | 13:41 | |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Use Apache to serve a local OpenDev logo on paste https://review.opendev.org/c/opendev/system-config/+/810253 | 14:21 |
opendevreview | Danni Shi proposed openstack/diskimage-builder master: Update keylime-agent and tpm-emulator elements Story: #2002713 Task: #41304 https://review.opendev.org/c/openstack/diskimage-builder/+/810254 | 14:23 |
opendevreview | Danni Shi proposed openstack/diskimage-builder master: Update keylime-agent and tpm-emulator elements https://review.opendev.org/c/openstack/diskimage-builder/+/810254 | 14:25 |
opendevreview | Merged opendev/system-config master: lodgeit: use logo from system-config assets https://review.opendev.org/c/opendev/system-config/+/809510 | 14:28 |
opendevreview | Merged opendev/system-config master: gerrit: copy theme plugin from plugins/ https://review.opendev.org/c/opendev/system-config/+/809511 | 15:13 |
clarkb | digging into the replication leaks a bit more: there is a current task to replicate tobiko to gitea02 from just over an hour ago. On gitea02 there is no receive pack process but there are processes for a couple of other replications that are happening | 15:14 |
clarkb | Looking at netstat -np | grep 222 I see three ssh connections that correspond to the three receive packs that are present | 15:15 |
clarkb | All that to say it really does seem like we aren't properly connecting to the remote end when this happens | 15:16 |
yuriys | I noticed a couple instances are in Shut Down state. Is that normal? Is that the 'Available' state? | 15:17 |
clarkb | yuriys: it is possible for test jobs to request a reboot. But typically I'm not sure that is normal | 15:19 |
fungi | reboots also don't generally enter shutdown state, as they're just performed soft from within the guest and not via the nova api | 15:21 |
clarkb | I think libvirt may detect that through the acpi stuff though | 15:22 |
fungi | ahh, in any case they shouldn't remain in that state for more than a few seconds if so | 15:22 |
clarkb | infra-root does anyone else want to look at gitea02 before I kill and restart the tobiko replication task? | 15:22 |
*** dviroel is now known as dviroel|lunch | 15:23 | |
fungi | when you say "Looking at netstat..." do you mean on gitea02 or review? | 15:23 |
*** ysandeep is now known as ysandeep|dinner | 15:23 | |
clarkb | gitea02 | 15:24 |
fungi | i guess the gitea side since netstat isn't installed on review | 15:24 |
fungi | i went ahead and installed net-tools on review | 15:26 |
clarkb | fungi: I think you are expected to use ss on newer systems like focal | 15:26 |
clarkb | but net-tools shouldn't hurt either | 15:27 |
fungi | never heard of ss, thanks | 15:27 |
clarkb | fungi: ss is to netstat what ip is to ifconfig aiui | 15:27 |
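(A quick sketch of the equivalence mentioned above; the ss filter matches connections involving port 222, the Gerrit replication SSH port in use here.)

```shell
# net-tools style (what was run on gitea02):
netstat -np | grep 222
# iproute2 equivalents on newer systems like focal:
ss -tnp '( sport = :222 or dport = :222 )'
ip addr show    # replaces ifconfig
```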
fungi | interestingly, there is an ssh socket to gitea02 which exists only on the review side and has no corresponding socket tracked on gitea02 | 15:29 |
fungi | 199.204.45.33:32798 -> 38.108.68.23:222 | 15:29 |
opendevreview | Clark Boylan proposed opendev/system-config master: GC/pack gitea repos every other day https://review.opendev.org/c/opendev/system-config/+/810284 | 15:38 |
clarkb | I'm less confident ^ will help but it also shouldn't hurt | 15:38 |
fungi | i have a feeling it's something network related, like a pmtud blackhole impacting one random router which only gets a small subset of the flow distribution or some stateful middlebox dropping a small percentage of tracked states at random | 15:39 |
clarkb | fungi: ya the network timeout that we need to restart gerrit for is likely the best bet | 15:40 |
clarkb | fungi: any objection to me stopping and restarting that tobiko replication task now? | 15:42 |
fungi | clarkb: nah, go for it. i'm curious to see whether these connections clear up, and on which ends | 15:45 |
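(A hedged sketch of what "stopping and restarting" a replication task typically looks like via Gerrit's SSH admin interface; the account, task id, and URL pattern below are placeholders, not the exact commands run here.)

```shell
# List queued/running tasks, including replication pushes, in wide format:
ssh -p 29418 admin@review.opendev.org gerrit show-queue -w
# Kill the leaked task by its id (placeholder):
ssh -p 29418 admin@review.opendev.org gerrit kill <task-id>
# Ask the replication plugin to push the project to that destination again:
ssh -p 29418 admin@review.opendev.org replication start openstack/tobiko --url gitea02
```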
*** ykarel is now known as ykarel|away | 15:49 | |
fungi | huh, a `git remote update` in zuul/zuul-jobs is taking forever on my workstation at "Fetching origin" (which should be one of the gitea servers) | 15:49 |
fungi | fatal: unable to access 'https://opendev.org/zuul/zuul-jobs/': GnuTLS recv error (-110): The TLS connection was non-properly terminated. | 15:49 |
clarkb | that would be to the haproxy not the backends ? | 15:49 |
clarkb | but maybe something is busy (gitea02 was fairly quiet with a system load of 1) | 15:50 |
fungi | maybe? we terminate ssl on the backends | 15:50 |
fungi | haproxy is just a plain layer 4 proxy | 15:50 |
clarkb | http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=66611&rra_id=all | 15:50 |
fungi | at least in our config | 15:50 |
clarkb | someone decided to update their openstack ansible install? | 15:50 |
clarkb | :P | 15:50 |
fungi | looks like it | 15:50 |
fungi | supposedly osad now sets a unique user agent string when it pulls from git servers | 15:51 |
clarkb | looks like 02 did suffer some of that | 15:51 |
clarkb | small bumps on some other servers but it was largely 02 | 15:52 |
fungi | i'm currently being sent to CN = gitea06.opendev.org | 15:52 |
clarkb | seems like it may be recovering now? possibly because haproxy told things to go away | 15:52 |
clarkb | looks like it did OOM though | 15:53 |
*** ysandeep|dinner is now known as ysandeep | 15:56 | |
fungi | looks like if it's openstack-ansible it'll have a ua of something along the lines of "git/unknown (osa/X.Y.Z/component)" | 15:57 |
*** marios is now known as marios|out | 15:59 | |
clarkb | I don't see 'osa' in any of the UA strings during the period of time it appears to have been busy | 16:00 |
fungi | [21/Sep/2021:15:58:03 +0000] "POST /openstack/zun-tempest-plugin/git-upload-pack HTTP/1.1" 200 224778 "-" "git/2.27.0 (osa/23.1.0.dev43/aio)" | 16:00 |
fungi | as a sample | 16:00 |
jrosser | that's a test / CI run because it has 'aio' in the string | 16:01 |
fungi | thanks, i haven't tried to put any numbers together yet, just confirming whether i can find those | 16:02 |
clarkb | jrosser: is there a reason the CI runs don't use the local git repos on the node? | 16:02 |
jrosser | they try to as far as possible | 16:02 |
clarkb | I think gitea06 may have become collateral damage in whatever this is. I can't reach it | 16:03 |
fungi | so at least looking for osa ua strings, there's not a substantial spike in those on gitea02's access log around the time things started to go sideways | 16:05 |
clarkb | 06 does ping, but maybe the OOM killer hit it in an unrecoverable manner? I guess we can wait and see for a bit | 16:06 |
fungi | filtering out any with /aio as the component, i can see some definite bursts but no clear correlation to spikes on the haproxy established connections graph | 16:07 |
fungi | clarkb: unfortunately 06 is not so broken that haproxy has taken it out of the pool | 16:08 |
fungi | i'm still getting my v6 connections balanced to 06 | 16:08 |
clarkb | fungi: are you sure? the haproxy log shows it as down | 16:09 |
fungi | `echo | openssl s_client -connect opendev.org:https | openssl x509 -text | grep CN` | 16:09 |
clarkb | oh then it flipped it back UP again weird | 16:09 |
fungi | Subject: CN = gitea06.opendev.org | 16:09 |
clarkb | it flip-flopped between 05 and 06 | 16:10 |
opendevreview | Elod Illes proposed openstack/project-config master: Add stable-only tag to generated stable patches https://review.opendev.org/c/openstack/project-config/+/810287 | 16:10 |
clarkb | yesterday it properly detected the update to the images, which triggers a rolling restart of the services (kind of cool to see that happening properly in the log) | 16:10 |
fungi | it's definitely hosed to the point where cacti can't poll snmpd though | 16:10 |
clarkb | yup and no ssh either | 16:10 |
fungi | but i guess apache is still semi-responsive, enough to complete tls handshakes | 16:11 |
fungi | looks like gitea05 has also been knocked offline, yeah | 16:12 |
fungi | i was able to ssh into 05 but it took a while to do the login | 16:13 |
clarkb | same here | 16:14 |
fungi | system load average is around 90 | 16:14 |
fungi | it's heavy into swap thrash | 16:14 |
fungi | out of swap altogether in fact | 16:14 |
clarkb | every time this happens I seriously wonder if we shouldn't go back to cgit | 16:14 |
clarkb | (it held up far better, allowing connection-count-based load balancing rather than source-based) | 16:15 |
fungi | or use apache to handle the git interactions and just rely on gitea as a browser | 16:15 |
fungi | yeah, there's a gitea process using almost 12gb of virtual memory | 16:16 |
clarkb | for 05 and 06 should we ask the cloud to reboot them? and/or manually remove them from the haproxy pool? | 16:16 |
fungi | well, at the moment i expect they're acting as a tarpit for whatever's generating all this load | 16:16 |
fungi | if we take them out of the pool, those connections will get balanced to another backend and knock it offline quickly as well | 16:17 |
fungi | there was a sizeable spike in non-aio osa ua strings in requests to gitea05 at 14:33 and again at 14:44 | 16:19 |
clarkb | yup; without access to the web server logs on those hosts it is hard to figure out what is causing this, but I'm looking at haproxy to see if there are any clues | 16:19 |
fungi | i'll see if i can isolate those and possibly map them back to a client | 16:19 |
clarkb | fungi: those happen well before the spike we see in cacti fwiw | 16:20 |
clarkb | we are looking at a start of ~15:30 according to cacti | 16:20 |
fungi | oh, yep, you're right. i'm looking at the wrong hour | 16:20 |
fungi | /openstack/nova/info/refs?service=git-upload-pack is a clone, right? | 16:22 |
fungi | or could be a fetch too | 16:22 |
fungi | POST /openstack/nova/git-upload-pack | 16:23 |
fungi | i think that's what i'm looking for actually | 16:23 |
clarkb | ya that sounds right | 16:24 |
fungi | yeah, a spike of 42 of those within the 15:35 minute on gitea05 | 16:24 |
clarkb | fwiw on the load balancer: `grep 'Sep 21 15:[345].* ' syslog | grep gitea06 | cut -d' ' -f 6 | sed -e 's/\(.*\):[0-9]\+/\1/' | sort | uniq -c | sort` shows a couple of interesting things | 16:24 |
fungi | normally we see around 1-5 of them in a minute during the surrounding timeframe | 16:25 |
clarkb | if I add 05 to that list then there is strong correlation between two IPs | 16:25 |
fungi | i'm going to try to identify the client addresses associated with those nova clones at 15:35 | 16:25 |
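(A hedged sketch of the cross-log correlation being described; access.log and syslog stand in for wherever the gitea apache logs and the haproxy syslog actually live, and the field positions assume a standard combined log format.)

```shell
# On the gitea backend: nova clone POSTs per minute from the apache access log.
grep 'POST /openstack/nova/git-upload-pack' access.log \
  | awk '{print substr($4,2,17)}' | sort | uniq -c | sort -n
# On the load balancer: top client IPs hitting gitea05/06 during the window.
grep 'Sep 21 15:[345]' syslog | grep -E 'gitea0[56]' \
  | cut -d' ' -f6 | sed 's/:[0-9]*$//' | sort | uniq -c | sort -n | tail
```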
*** jpena is now known as jpena|off | 16:25 | |
*** rpittau is now known as rpittau|afk | 16:25 | |
clarkb | fungi: I just PM'd you the IP I expect it to be based on the haproxy data | 16:25 |
fungi | yep, thanks, i'll see if they like nova a lot | 16:26 |
*** dviroel|lunch is now known as dviroel | 16:28 | |
fungi | these were the source ports on the haproxy side of the connection for those nova clones at 15:35: | 16:29 |
fungi | 60478 60452 60702 60652 60588 60692 60660 60442 60466 60610 60464 60566 60460 60650 60616 60958 60916 60400 60730 32862 60992 60802 60838 32808 60910 32818 60882 60834 60954 60852 60932 60972 60746 60694 60978 32848 60918 60962 60876 32776 60844 33314 | 16:29 |
fungi | to gitea05 | 16:29 |
clarkb | 60478 maps to the IP I shared with you | 16:30 |
clarkb | 60852 does as well | 16:31 |
clarkb | the correlation is starting to get stronger :) | 16:31 |
clarkb | fungi: though it looks like that IP did end up stopping about half an hour ago | 16:33 |
clarkb | fungi: maybe if we restart things we'll be ok? | 16:33 |
clarkb | based on that correlation and the lack of that IP showing up for the last bit that is my suggestion | 16:34 |
clarkb | I suspect what happened, according to the log, is that 6 went down and they were sent to 5. Then 6 came up and 5 went down, and that happened back and forth | 16:35 |
clarkb | and UP here isn't a very strong metric apparently :) | 16:36 |
fungi | so 100% of those 42 nova clone operations logged during the 15:35 minute on gitea05 came from the same ip address you noted as having a surge in connections through the proxy | 16:38 |
fungi | though they showed up as being during the 15:37 minute on the haproxy log | 16:38 |
clarkb | fungi: I think that is because it takes a few minutes for haproxy to close the connections | 16:39 |
clarkb | fungi: note they all have status of cD or sD in the log lines which is an exceptional state from haproxy aiui | 16:39 |
clarkb | -- is normal | 16:39 |
fungi | cacti is starting to be able to get through to gitea05 again | 16:39 |
fungi | and gitea06 seems like it wants to finish logging me in... it did print the motd just no shell prompt yet | 16:40 |
clarkb | progress! | 16:40 |
fungi | spike in nova clones on gitea06 was logged at 15:32-15:33 but it was hit much harder too, and stopped really doing anything according to its logs after 15:34 | 16:43 |
clarkb | fungi: ya then I think it flipped over to 05 when 06 was noted as down | 16:44 |
fungi | 115 nova clones in that 120 second timeframe | 16:44 |
clarkb | Sep 21 15:35:27 gitea-lb01 docker-haproxy[786]: [WARNING] (9) : Server balance_git_https/gitea06.opendev.org is DOWN, reason: Layer4 timeout | 16:44 |
clarkb | Sep 21 15:40:14 gitea-lb01 docker-haproxy[786]: [WARNING] (9) : Server balance_git_https/gitea05.opendev.org is DOWN, reason: Layer4 timeout | 16:44 |
clarkb | basically it went from 06 to 05 | 16:44 |
fungi | running the cross-log analysis with haproxy now | 16:45 |
clarkb | and for some reason didn't continue on to 01 02 03 04 etc | 16:45 |
clarkb | possibly because 06 went back up Sep 21 15:37:47 gitea-lb01 docker-haproxy[786]: [WARNING] (9) : Server balance_git_https/gitea06.opendev.org is UP, reason: Layer4 check passed and so it stuck to only 05 and 06 | 16:45 |
clarkb | load average on 05 doesn't seem to be getting better | 16:49 |
clarkb | fungi: are you good with attempting to reboot 05 and 06 now? | 16:50 |
fungi | yeah, let's | 16:50 |
clarkb | I can't get to 06, if you are still on it did you want to try sudo rebooting both of them? | 16:50 |
clarkb | then if that doesn't work we can ask the cloud to do it for us | 16:50 |
fungi | i'm logged into both so yep, can do | 16:54 |
fungi | and done | 16:54 |
fungi | we'll see if they manage to shut themselves down cleanly and reboot | 16:54 |
fungi | 05 is back up again | 16:56 |
clarkb | gitea05 closed my connection at least | 16:56 |
fungi | 06 is still booting i think | 16:56 |
fungi | or might still be shutting down, but it closed my connection at least | 16:56 |
clarkb | load is a bit high on 05 | 16:56 |
clarkb | but not as high as before | 16:56 |
clarkb | is there a secondary dos? | 16:56 |
fungi | jrosser: user agent on these nova clones we observed was just "git/1.8.3.1" so i have a feeling it still could be an osa site | 16:57 |
jrosser | it could easily be | 16:58 |
fungi | in a 3-minute span we saw >150 clones of nova from a single ip address, so likely behind a nat | 16:58 |
jrosser | i think we backported that user agent stuff all the way back to T | 16:58 |
jrosser | but it does require them to have moved to a new tag | 16:58 |
clarkb | looks like load is falling back down again on 05 I guess it just had to catch up | 16:59 |
clarkb | also gerrit replication shows retries enqueued for pushing to 05 (we want to see that so good to confirm it happens) | 16:59 |
fungi | git 1.8.3.1 is fairly old... is that the default git on centos 7 maybe? | 17:00 |
clarkb | fungi: I think it is | 17:00 |
clarkb | 05 looks normal now | 17:01 |
clarkb | plenty of memory, reasonable system load etc | 17:01 |
fungi | https://centos.pkgs.org/7/centos-x86_64/git-svn-1.8.3.1-23.el7_8.x86_64.rpm.html | 17:01 |
fungi | yeah, centos 7 seems likely | 17:01 |
fungi | i'm still getting a "no route to host" for gitea06 | 17:02 |
clarkb | I'm going to go find some breakfast since we are just waiting on 06 to restart and 05 showed a restart seems to make things happier and replication handles it properly | 17:03 |
clarkb | I was hoping to do more zuul reviews this morning :) maybe I can do those this afternoon and the gerrit account emails can happen tomorrow | 17:03 |
fungi | i was trying to start on another zuul-jobs change which is what caused me to notice things weren't working right | 17:03 |
clarkb | ya I noticed because I was trying to look at gitea06 like I had looked at gitea02 to debug the replication slowness | 17:04 |
clarkb | oh and at some point we really should restart gerrit to pickup the timeout change | 17:05 |
*** ysandeep is now known as ysandeep|away | 17:13 | |
clarkb | fungi: looks like 06 has been up for about 5 minutes | 17:13 |
fungi | yeah, i'm able to ssh into it now | 17:13 |
fungi | git sidetracked trying to write docs | 17:14 |
fungi | system load average is nice and low | 17:14 |
clarkb | there are three older replication tasks to 06, one each for cinder, ironic, and neutron. If they don't complete soon they may need to be restarted as well | 17:14 |
clarkb | There are many retry tasks for replication to 06 though so generally seems to have detected it needs to try again | 17:15 |
clarkb | I only see a push for keystone and not the other three (implying they are in a similar state to the previous tobiko replication task. gitea knows nothing about them) | 17:16 |
clarkb | I'll give them 5 more minutes then manually intervene | 17:16 |
*** sshnaidm is now known as sshnaidm|off | 17:22 | |
clarkb | and done. Will check queue statuses in a bit, but I expect we'll be recovering to a more normal situation soon | 17:23 |
clarkb | fungi: thinking out loud here, maybe we want to run iperf3 tests between rax dfw and gitea0X and compare them to a similar run between review02 and gitea0X | 17:26 |
clarkb | the giteas were never in the same cloud region but it seems that replication might be a fair bit slower now? | 17:26 |
clarkb | mnaser: ^ fyi if that is something you might already know about | 17:27 |
clarkb | don't have hard data on it, but possibly it also started after we migrated the review server to the new dc? | 17:27 |
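(A sketch of the iperf3 runs being suggested; the port choice is arbitrary and the firewall would need to allow it for the duration of the test.)

```shell
# On the gitea backend, run a temporary server:
iperf3 -s -p 5201
# From review02 (and again from a rax dfw host, for comparison):
iperf3 -c gitea05.opendev.org -p 5201 -t 30      # review -> gitea direction
iperf3 -c gitea05.opendev.org -p 5201 -t 30 -R   # reverse direction
```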
fungi | also can't rule out that these hangs are related to activity spikes and oom events on the gitea side | 17:28 |
mnaser | clarkb: that's strange, the hardware should be way quicker and the ceph systems should be faster. i wonder if it has something to do with the kernel version in your vm vs the host (as that is significantly newer) | 17:28 |
clarkb | fungi: well I have checked some of the hosts and some don't have recent OOMs | 17:29 |
clarkb | fungi: gitea03 for example is quite clean but also has had issues | 17:29 |
clarkb | the other odd thing is it seems to happen between 14:00UTC and 18:00UTC | 17:31 |
fungi | okay, so yes that does seem like occasional problems crossing the internet | 17:31 |
clarkb | our cacti data shows the last few days during this period of time has been clean except for today | 17:31 |
clarkb | note it is possible that is observer bias, as I tend to check in the morning. It could be happening at other times but some timeout is finally occurring and cleaning them up, so we don't see them queued the next morning | 17:33 |
opendevreview | Jeremy Stanley proposed zuul/zuul-jobs master: Deprecate EOL Python releases and OS versions https://review.opendev.org/c/zuul/zuul-jobs/+/810299 | 17:33 |
clarkb | 236edacc 17:05:46.304 [43642e93] push ssh://git@gitea03.opendev.org:222/openstack/releases.git | 17:34 |
clarkb | That one appears to have "leaked" | 17:34 |
clarkb | gitea03 has not OOM'd | 17:34 |
fungi | could they be getting cleaned up after 2 hours? 3? what's the oldest you observed? | 17:34 |
clarkb | fungi: I've observed ones from ~14:00ish still present at ~18:00 | 17:34 |
fungi | okay, so at least 4 hours i guess | 17:35 |
clarkb | do we want to see what happens with 236edacc ? | 17:35 |
fungi | sure, if someone complains about an old ref there we can always abort the experiment and kill that task so it catches up | 17:35 |
clarkb | ok | 17:36 |
clarkb | f54021d3 and 1c9c079d appear to have leaked on 05 | 17:42 |
clarkb | those are both post reboot tasks so no OOM there either | 17:42 |
*** ysandeep|away is now known as ysandeep | 17:44 | |
clarkb | two things I'll note. the giteas don't appear to have AAAA records but do have configured ipv6 addresses. This means gerrit is going to talk to them over ipv4 only | 17:44 |
clarkb | pinging from gitea05 to review02 over ipv6 results in no route to host | 17:44 |
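(A small sketch of how to confirm both halves of that observation from a gitea backend, using standard dig/iproute2/ping tooling.)

```shell
# No AAAA record is published, so Gerrit resolves only the v4 address:
dig +short AAAA gitea05.opendev.org
# ...even though a global v6 address is configured locally:
ip -6 addr show scope global
# And the v6 path toward review02 currently fails:
ping -6 -c 5 review02.opendev.org
```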
fungi | yeah, i want to say the original kubernetes deployment design limited us to only using ipv4 addresses, but since the lb is proxying to them anyway it was irrelevant for end users | 17:48 |
fungi | since we ended up not sticking with kubernetes there, we've got ipv6 addresses, just never added any aaaa records | 17:49 |
clarkb | in this case it is a good thing because it seems the ipv6 cannot route | 17:57 |
clarkb | I did a ping -c 100 from gitea05 to review02 and vice versa and both had a 2% loss | 17:57 |
opendevreview | Danni Shi proposed openstack/diskimage-builder master: Update keylime-agent and tpm-emulator elements https://review.opendev.org/c/openstack/diskimage-builder/+/810254 | 18:06 |
clarkb | rerunning the ping -c 100 test to see how consistent that is | 18:07 |
opendevreview | Merged openstack/project-config master: Update neutron-lib grafana dasboard https://review.opendev.org/c/openstack/project-config/+/806138 | 18:07 |
fungi | clarkb: ianw: not urgent, but related to recently approved changes and i'm looking for a suggestion as to the best way to tackle it: https://review.opendev.org/810253 | 18:15 |
clarkb | yuriys: we're noticing some connectivity issues to https://registry.yarnpkg.com/@patternfly/react-tokens/-/react-tokens-4.12.15.tgz from 173.231.255.74 and 173.231.255.246 in the inmotion cloud. Currently I can fetch that url with wget from the hosts that have those IPs assigned to them. | 18:16 |
clarkb | yuriys: I guess I'm wondering if there are potential routing issues with those IPs or maybe the neutron routers/NAT might be struggling? | 18:16 |
clarkb | oh there wouldn't be NAT | 18:16 |
clarkb | just the neutron router I think | 18:16 |
clarkb | no packet loss on second pass of ping -c between gitea05 and review02 | 18:17 |
clarkb | fungi: left a response to your question on that change. I'm not 100% sure of that but maybe 90% sure | 18:19 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Switch IPv4 rejects from host-prohibit to admin https://review.opendev.org/c/opendev/system-config/+/810013 | 18:19 |
fungi | thanks! | 18:19 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Use Apache to serve a local OpenDev logo on paste https://review.opendev.org/c/opendev/system-config/+/810253 | 18:24 |
fungi | also the screenshot for that in the run job was a huge help, made it quite obvious my naive first attempt was worthless | 18:25 |
opendevreview | Clark Boylan proposed opendev/system-config master: Upgrade gitea to 1.15.3 https://review.opendev.org/c/opendev/system-config/+/803231 | 18:29 |
opendevreview | Clark Boylan proposed opendev/system-config master: DNM force gitea failure for interaction https://review.opendev.org/c/opendev/system-config/+/800516 | 18:29 |
opendevreview | Clark Boylan proposed opendev/system-config master: Upgrade gitea to 1.14.7 https://review.opendev.org/c/opendev/system-config/+/810303 | 18:29 |
clarkb | infra-root ^ I put a hold on the last change in that stack to verify 1.15.3. I think we should consider going ahead and landing the 1.14.7 upgrade to keep up to date there. Then for the 1.15.3 update I'd like to do that after the gerrit theme logo stuff that ianw has pushed is done | 18:29 |
clarkb | and I'm beginning to think maybe we do a combo restart of gerrit for the theme update and the replication timeout config change | 18:30 |
fungi | a reasonable choice | 18:30 |
clarkb | Then after all that we can do the buster -> bullseye updates for those images (I have changes up for those as well) | 18:31 |
*** ysandeep is now known as ysandeep|out | 18:32 | |
clarkb | I have +2'd the two gerrit changes at the end of the logo stack but didn't approve them as they are a bit more involved than the previous changes. I figure we can double check the above plan with ianw then proceed from there with approving those updates? | 18:39 |
fungi | wfm | 18:40 |
fungi | and hopefully 810253 will work as intended now | 18:42 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Use Apache to serve a local OpenDev logo on paste https://review.opendev.org/c/opendev/system-config/+/810253 | 19:05 |
clarkb | infra-root https://review.opendev.org/c/opendev/system-config/+/810303 has been +1'd by zuul. I'm around all afternoon if we want to proceed with that. I do have to pick up kids from school though for a shortish gap in keyboard time | 19:50 |
clarkb | that is the gitea 1.14.7 update | 19:50 |
ianw | fungi: the paste not showing the logo in the screenshot is weird | 19:52 |
ianw | especially when it seems like the wget returned it correctly | 19:52 |
fungi | ianw: well, my test also failed and so there should be a held node for it now | 19:53 |
fungi | the get returned a 5xx error | 19:53 |
clarkb | fungi: have you ever seen anything like the error in https://zuul.opendev.org/t/openstack/build/e665dbc7368e44caa398e8c130c4151a ? seems apt had problems? | 19:54 |
clarkb | maybe we fetched an incomplete file? but hash verification should catch that first? | 19:54 |
ianw | fungi: oh indeed https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_d26/810253/3/check/system-config-run-paste/d266360/bridge.openstack.org/test-results.html | 19:55 |
fungi | clarkb: cannot copy extracted data for './usr/bin/dockerd' to '/usr/bin/dockerd.dpkg-new': unexpected end of file or stream | 19:56 |
fungi | my guess is it was truncated, yeah | 19:56 |
clarkb | I'll recheck that change once it reports I guess | 19:57 |
ianw | "[pid: 14|app: -1|req: -1/2] 127.0.0.1 () {32 vars in 435 bytes} [Tue Sep 21 19:44:07 2021] GET /assets/opendev.svg" <- so the request made it to lodgeit, which it shouldn't have you'd think | 19:57 |
fungi | clarkb: i'm betting the working run will show a larger file size for docker-ce than 21.2 MB | 19:58 |
ianw | but also, it looks like mysql wasn't ready -> https://zuul.opendev.org/t/openstack/build/d266360944434e288db1880729d809dc/log/paste01.opendev.org/containers/docker-lodgeit.log#144 | 19:58 |
fungi | ianw: possible my location section in the vhost config isn't right. i can fiddle with it on the held node when i get a moment | 19:58 |
fungi | according to the apache 2.4 docs, location /assets/ should cover /assets/opendev.svg | 19:59 |
fungi | and therefore be excluded from the proxy | 20:00 |
ianw | https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/lodgeit/templates/docker-compose.yaml.j2#L28 -> we sleep for 30 seconds for mariadb to be up | 20:00 |
ianw | Sep 21 19:42:03 -> Sep 21 19:42:39 paste01 docker-mariadb[10998]: 2021-09-21 19:42:39 0 [Note] mysqld: ready for connections. | 20:01 |
ianw | that's ... 36 seconds from start to ready? | 20:01 |
fungi | that provider may still not be running at consistent performance levels to our others | 20:07 |
ianw | yeah it was inmotion | 20:09 |
ianw | it should use a proper polling wait; the sleep was just expedience but we could include the wait-for-it script | 20:10 |
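(A hedged sketch of what such a polling wait could look like in place of the fixed 30 second sleep; it assumes the mariadb service is reachable by that name from the lodgeit container and that a mysql client or nc is available in the image, which may not be the case, hence the wait-for-it suggestion.)

```shell
# Poll until mariadb answers, up to a 120 second budget, instead of sleeping blindly.
budget=120
until mysqladmin ping -h mariadb --silent || [ "$budget" -le 0 ]; do
  sleep 2
  budget=$((budget - 2))
done
# Alternative with no mysql client dependency: until nc -z mariadb 3306; do sleep 2; done
```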
fungi | huh, fun, my ssh to that held node seems to have just hung | 20:15 |
fungi | nevermind, it resumed | 20:16 |
fungi | bizarre, i tried treating opendev.svg exactly like robots.txt in the apache config on the held node, and it's still getting proxied to lodgeit | 20:20 |
fungi | oh, duh | 20:25 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Use Apache to serve a local OpenDev logo on paste https://review.opendev.org/c/opendev/system-config/+/810253 | 20:28 |
fungi | ianw: ^ turns out the mistake was a surprisingly simple one | 20:28 |
fungi | i only added the logo to the http vhost, not the https one | 20:29 |
* fungi sighs audibly | 20:29 | |
ianw | oh, doh. that's right we allowed the http for the old config file | 20:32 |
clarkb | it's been just over 3 hours on those leaked replication tasks and they are still present | 20:38 |
fungi | well, we guessed the timeout is at least 4 hours there | 20:42 |
clarkb | ya or at least 4 hours | 20:43 |
clarkb | just calling out that the data doesn't contradict this yet | 20:43 |
clarkb | following up on the nodepool zk data backups: that appears to be working as expected | 20:46 |
clarkb | gitea 1.15.3 continues to look good https://158.69.73.109:3081/opendev/system-config | 20:56 |
mnaser | clarkb: i think you caught network in an odd time where we were flipping some bits, let me know if you continue to see some instability-ish | 21:00 |
clarkb | mnaser: will do | 21:00 |
clarkb | mnaser: fwiw the ipv6 ping from gitea05 to review02 still says Destination unreachable: No route | 21:04 |
mnaser | clarkb: yes, that's still a 'working on fixing it' :( | 21:04 |
clarkb | got it | 21:04 |
*** dviroel is now known as dviroel|out | 21:07 | |
yuriys | Just caught up on chat. Okay, looks like we need to scale down once more to improve the CI performance, 36 sec to start MySQL is awful lol. | 21:35 |
yuriys | Quick question on workload distribution, when a test is queued does it pull a 'worker' at random from a list of available instances, or does it pull an instance out of a serial list of available instances for testing? | 21:38 |
yuriys | The reason I ask is that one of the nodes in the inmotion cloud was heavily underused while the other exploded, which is where I'm guessing ianw's test ran. It looks like there may have been multiple instances used at the same time, which is fine, but I'm looking to optimize for better/more responsive load distribution. | 21:43 |
ianw | yuriys: umm, the nodes are up and in a "ready" state before they are assigned to run tests | 21:47 |
yuriys | Yeah, what I've seen so far is: an instance is created ("Launch Attempt"), then they are shut off and go to an "Available" state, then if they are selected they go to an "In Use" state. This is the stuff that gets pushed to grafana, so I'm going from that. | 21:49 |
ianw | if you were just looking at the cloud side, you'd see VMs come up that are lightly used and sit for an indeterminate amount of time (not very long when we're under load) before being assigned as workers, at which point they start doing stuff | 21:49 |
ianw | i need to look at why https://grafana.opendev.org/d/4sdNjeXGk/nodepool-inmotion?orgId=1 is not showing openstackapi stats | 21:50 |
yuriys | No other providers have that info. | 21:51 |
yuriys | I thought it was just 'permabroken' lol. | 21:51 |
Guest490 | ianw: i haven't looked into it but https://zuul-ci.org/docs/nodepool/releasenotes.html#relnotes-3-6-0-upgrade-notes may be relevant to missing stats | 21:54 |
*** Guest490 is now known as corvus | 21:55 | |
*** corvus is now known as _corvus | 21:56 | |
*** _corvus is now known as corvus | 21:56 | |
ianw | i think it might be related to https://review.opendev.org/c/zuul/nodepool/+/786862 | 21:57 |
ianw | we're graphing : stats.timers.nodepool.task.$region.ComputePostServers.mean | 22:07 |
ianw | i think that should be compute.POST.servers now | 22:09 |
opendevreview | Yuriy Shyyan proposed openstack/project-config master: Improve CI performance and reduce infra load. https://review.opendev.org/c/openstack/project-config/+/810326 | 22:11 |
opendevreview | Ian Wienand proposed openstack/project-config master: grafana: fix openstack API stats for providers https://review.opendev.org/c/openstack/project-config/+/810329 | 22:25 |
fungi | okay, back now. dinner ended up slightly more involved than i anticipated | 22:27 |
corvus | i think everything we were interested in getting into zuul has landed, so i'd like to start working on a restart now | 22:32 |
fungi | i'm happy to help | 22:32 |
corvus | fungi: you want to establish if now is okay time wrt openstack? | 22:33 |
corvus | i'll run pull meanwhile | 22:33 |
fungi | yeah, i'm checking in with the release team | 22:33 |
fungi | our gerrit logo changes haven't been approved yet, so we can just do gerrit restart separately later | 22:33 |
corvus | most recent promote succeeded, and i've pulled images, so we're up to date now | 22:34 |
clarkb | fungi: I think that is fine. The two restarts are sufficiently quick that we don't need to try and squash them together, I don't think | 22:34 |
opendevreview | Merged openstack/project-config master: Improve CI performance and reduce infra load. https://review.opendev.org/c/openstack/project-config/+/810326 | 22:34 |
fungi | i've let the openstack release team know we're restarting zuul, and there are no changes in any of their release-oriented zuul pipelines right now so should be non-impacting there | 22:35 |
fungi | should be all clear to start | 22:36 |
corvus | thanks, i'll save qs and run the restart playbook | 22:36 |
corvus | starting up now | 22:37 |
TheJulia | I was just about to ask.... | 22:37 |
fungi | see, no need to ask! ;) | 22:38 |
TheJulia | lol | 22:38 |
fungi | we promise to try to avoid unnecessary restarts next week when we expect things to get more frantic for openstack ;) | 22:39 |
TheJulia | Well, I actually had ironic's last change before releasing in the check queue.... :) | 22:39 |
fungi | (not that this restart is unnecessary, it fixes at least one somewhat nasty bug, and we don't like to release new versions of zuul without making sure opendev's happy on them) | 22:40 |
TheJulia | I *also* found a find against /opt in our devstack plugin which I'm very promptly ripping out because that makes us bad opendev citizens | 22:40 |
fungi | oof | 22:40 |
TheJulia | fungi: that was my reaction when I spotted it | 22:40 |
fungi | thank you for helping take out the trash ;) | 22:41 |
corvus | our current zuul effort is in making it so no one notices downtime again... so every restart now is an infinite number of restarts avoided in the future :) | 22:42 |
corvus | you can't argue with that math. ;) | 22:42 |
fungi | that too! | 22:42 |
clarkb | ya we've been doing a lot of incremental improvements to get closer to removing the spof | 22:42 |
fungi | restartless zuul | 22:42 |
fungi | it's nearing everpresence | 22:42 |
clarkb | This is one of the things that has motivated me to do all this code review :) | 22:43 |
corvus | much appreciated :) | 22:43 |
clarkb | looks like it is done reloading configs? | 22:49 |
corvus | i think it's reloading something (still? or again?) | 22:50 |
corvus | re-enqueing now | 22:51 |
clarkb | changes and jobs are showing up as queued | 22:51 |
fungi | lgtm | 22:52 |
yuriys | hmmm how do you guys identify which provider gets selected for a particular task? | 22:53 |
clarkb | yuriys: every job records an inventory and in that inventory are hostnames that indicate the cloud provider | 22:54 |
clarkb | yuriys: the beginning of the job-output.txt also records a summary of that info (so you can see it in the live stream console) | 22:54 |
yuriys | thank you, found localhost | Provider: xxxx in one of the logs | 22:55 |
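(A hedged sketch of pulling that out of a build's uploaded logs; the log root URL is a placeholder and the exact keys recorded in the inventory file may differ.)

```shell
LOG_URL='https://<build-log-root>'   # placeholder for a specific build's log location
# Provider line from the job header near the top of the console log:
curl -s "$LOG_URL/job-output.txt" | grep -m1 'Provider:'
# The recorded inventory also carries nodepool provider/region details per node:
curl -s "$LOG_URL/zuul-info/inventory.yaml" | grep -E 'provider|region|cloud'
```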
clarkb | yuriys: a single job will always have all of its nodes provided by the same provider too | 22:56 |
yuriys | when change is successfully merged by zuul, what triggers a build? | 22:56 |
clarkb | *a single build of a job | 22:56 |
clarkb | yuriys: zuul's gerrit driver will see the merge event sent by gerrit then the pipeline configs in zuul can match that and then trigger their jobs | 22:57 |
yuriys | > a single job will always have all of its nodes provided by the same provider | 22:57 |
yuriys | This explains the explosions!!! | 22:57 |
fungi | unless you're talking about speculative merges, rather than merging changes which have passed gating | 22:57 |
clarkb | yuriys: basically zuul has an event stream open to gerrit and for every event that gerrit emits it evaluates against its pipelines | 22:57 |
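(The same event stream can be watched directly, which is handy for checking whether Gerrit emitted an event at all; this assumes an account with the Stream Events capability.)

```shell
# Events such as patchset-created, comment-added, and change-merged arrive as JSON lines:
ssh -p 29418 user@review.opendev.org gerrit stream-events
```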
yuriys | so if you guys just restarted things | 22:58 |
fungi | "merge" is used in multiple contexts, so it's good to be clear which scenario you're asking about | 22:58 |
yuriys | what are the odds that stream got cut | 22:58 |
yuriys | https://review.opendev.org/c/openstack/project-config/+/810326 | 22:58 |
yuriys | no build | 22:58 |
clarkb | yuriys: the deploy job for that is currently running (zuul got restarted so we had to restore queues) | 22:59 |
fungi | check zuul's status page for the openstack tenant, all the builds for that change got re-added to pipelines | 22:59 |
clarkb | https://zuul.opendev.org/t/openstack/status and look for 810326 | 22:59 |
yuriys | Ah I saw stuff under [check] but deploy was empty for a bit | 22:59 |
yuriys | i see it now | 22:59 |
corvus | re-enqueue complete | 23:00 |
clarkb | ya it isn't instantaneous as each one of those enqueue actions after a restart has to requery git repos | 23:00 |
fungi | yeah, the re-enqueuing was scripted so it doesn't all show back up at once | 23:00 |
corvus | #status log restarted all of zuul on commit 0c26b1570bdd3a4d4479fb8c88a8dca0e9e38b7f | 23:00 |
opendevstatus | corvus: finished logging | 23:00 |
fungi | thanks corvus! | 23:00 |
clarkb | fungi: it's been almost 6 hours on those leaked replications. I guess maybe we wait ~8 hours and then manually clean them up, or do we want to leave them until tomorrow? | 23:01 |
clarkb | the mass of failures on some check changes seem to be legit (pip dep resolution problems) | 23:03 |
fungi | clarkb: i think it's safe to assume the queue times you were observing for those replication tasks weren't particularly biased by the time you were checking them, so i'd be fine just cleaning them up at this point | 23:04 |
clarkb | ya I'm beginning to suspect there is something interesting about the time period they show up in. Network instability during those periods of time, for example | 23:05 |
clarkb | rather than it being a side effect of some sort of long timeout | 23:05 |
clarkb | I'll give them a bit longer. I don't have to make dinner for a bit | 23:06 |
yuriys | build failed : ( | 23:06 |
clarkb | neat let me go see why | 23:06 |
yuriys | is it waiting on logger? | 23:07 |
yuriys | https://zuul.opendev.org/t/openstack/build/a9c7f49c293f4659befe7ae1e3353ca5/log/job-output.txt | 23:07 |
clarkb | no that is the bit I was telling you about where we don't let zuul stream those logs out. We keep the logs on the bastion to avoid unexpected leakages of sensitive info | 23:07 |
clarkb | looking at the log on the bastion it failed because nb01 and nb02 had some issue. I think your change is only needed on nl02 and so we should be good from the scale down perspective | 23:08 |
clarkb | ya https://grafana.opendev.org/d/4sdNjeXGk/nodepool-inmotion?orgId=1 reflects the change | 23:08 |
fungi | that bit of log redaction is specific to our continuous deployment jobs, not typical of test jobs | 23:08 |
yuriys | kk | 23:08 |
fungi | we just want to make sure that the ansible which pushes production configs doesn't inadvertently log things like credentials if it breaks | 23:09 |
clarkb | nb01 and nb02 failed to update project-config which goes in /opt because /opt is full | 23:09 |
yuriys | yeah i got that part, hard to track what failed though lol | 23:09 |
clarkb | I'll stop builders on them now and then work on cleaning them up | 23:09 |
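(A rough sketch of that cleanup; container names and paths here are assumptions rather than the exact opendev layout.)

```shell
df -h /opt                                            # confirm which filesystem filled up
docker ps --format '{{.Names}}' | grep -i nodepool    # find the builder container
docker stop <builder-container>                       # stop builds before removing working dirs
rm -rf /opt/dib_tmp/*                                 # stale diskimage-builder temp dirs mentioned above
```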
yuriys | Cool, well, hopefully this is the last one, might have to fiddle with placement distribution limits, our weakness here is just the quantity of nodes. | 23:11 |
ianw | ... we just had an earthquake! | 23:17 |
yuriys | woah | 23:18 |
fungi | everything okay there? | 23:19 |
ianw | yep, well the internet is still working! :) but wow, that got the heartrate up | 23:20 |
yuriys | easy calorie burn | 23:20 |
ianw | i felt a few in the bay area when i lived there, but this had bigger bumps than them | 23:21 |
clarkb | wow | 23:21 |
artom | "easy calorie burn" is it though? Feels like a lot of trouble for some cardio ;) | 23:22 |
ianw | it wasn't knock-things-off-shelves level. still, why not add something else to worry about in 2021 :) | 23:25 |
yuriys | don't worry, 2021 not over yet | 23:26 |
yuriys | sorry, correction, worry, 2021 not over yet | 23:26 |
clarkb | any idea why we seem to have a ton of fedora-34 images? | 23:27 |
clarkb | that seems to be at least part of the reason that nb01 and nb02 have filled their disks | 23:27 |
clarkb | I have cleaned out their /opt/dib_tmp as well as stale intermediate vhd images and that helped a bit | 23:27 |
opendevreview | Merged opendev/system-config master: Use Apache to serve a local OpenDev logo on paste https://review.opendev.org/c/opendev/system-config/+/810253 | 23:28 |
ianw | we should just have the normal amount (2). but it is the only thing using containerfile to build so might be a bug in there | 23:28 |
clarkb | ianw: hrm I cross checked against focal as a sanity check and it has 2 for x86 and 2 ready + 1 deleting for arm64 | 23:28 |
clarkb | but fedora-34 has many many | 23:28 |
clarkb | oh you know what | 23:28 |
clarkb | one sec | 23:28 |
clarkb | 2021-09-21 23:29:03,556 ERROR nodepool.zk.ZooKeeper: Error loading json data from image build /nodepool/images/fedora-34/builds/0000007388 | 23:29 |
clarkb | I suspect that issue in the zk db is preventing nodepool from cleaning up the older images | 23:29 |
clarkb | corvus: ^ is that something you think you'd like to look at or should we just rm the node or? | 23:29 |
clarkb | fwiw I think I cleaned enough disk that we can look at this tomorrow | 23:29 |
clarkb | but probably won't want to wait much longer than that | 23:30 |
opendevreview | Merged openstack/project-config master: grafana: fix openstack API stats for providers https://review.opendev.org/c/openstack/project-config/+/810329 | 23:31 |
clarkb | /nodepool/images/fedora-34/builds/0000007388 is empty and is the oldest build | 23:35 |
clarkb | I don't see the string 7388 in /opt/nodepool_dib on either builder | 23:36 |
clarkb | I suspect some sort of half completed cleaning of the zk db and we should go ahead and rm that znode | 23:36 |
clarkb | However, I'll let corvus confirm there isn't further debugging that wants to happen first | 23:37 |
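(A hedged sketch of inspecting and removing the broken znode with the stock ZooKeeper CLI; the server address is a placeholder and opendev may well use different tooling or TLS options.)

```shell
zkCli.sh -server zk01.opendev.org:2181 ls /nodepool/images/fedora-34/builds
zkCli.sh -server zk01.opendev.org:2181 get /nodepool/images/fedora-34/builds/0000007388
# Recursive delete of the empty/corrupt build record and any subnodes
# (deleteall needs a 3.5+ CLI; older versions used rmr):
zkCli.sh -server zk01.opendev.org:2181 deleteall /nodepool/images/fedora-34/builds/0000007388
```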
clarkb | I cleaned up the replication queue as it is getting close to dinner | 23:39 |
clarkb | the replication queue is now empty even after I reenqueued the tasks | 23:39 |
fungi | nice, thanks | 23:39 |
corvus | clarkb: i don't feel a compelling need to debug that right now, so if you want to manually clean up that's great | 23:41 |
fungi | if it's an actual persistent or intermittent issue, i'm sure we'll have more samples soon enough | 23:41 |
clarkb | ok I will rm that single entry then I expect nodepool will clean up after itself from there | 23:42 |
clarkb | oh wait it won't let me rm it because it has subnodes and shows ovh-gra1 has the image? | 23:43 |
clarkb | let me check on what ovh-gra1 sees | 23:43 |
clarkb | nodepool image list didn't show it, and its subnode for images/ (where I think it records that) was empty, so I went ahead and cleaned up everything below 7388 as well as 7388 | 23:46 |
clarkb | the exception listing dib images is gone | 23:47 |
ianw | https://earthquakes.ga.gov.au/event/ga2021sqogij | 23:47 |
ianw | clarkb: i'll let you poke at it, take me longer to context switch in i imagine | 23:48 |
clarkb | ianw: ya, I think this may be all that was necessary; then the next time nodepool's cleanup routines run it will clean up the 460-something extra records | 23:48 |
clarkb | heh now /nodepool/images/fedora-34/builds/0000007392 is sad but we went from 467 to 460 :) | 23:50 |
clarkb | that one is in the same situation so I'll give it the same treatment | 23:51 |
corvus | wow M6 is not nothing | 23:52 |
clarkb | /nodepool/images/fedora-34/builds/0000007405 now | 23:56 |