*** tosky has quit IRC | 00:03 | |
ianw | ok, that's all cleaned up on ORD | 00:09 |
*** brinzhang has joined #opendev | 00:17 | |
ianw | i'm feeling like we must have deployed /etc/openafs/server/rxkad.keytab manually because i'm not seeing it deployed in config mgmt | 00:18 |
ianw | i've put afs01.ord.openstack.org in emergency and am going to try the manual 1.8 update on it | 00:24 |
*** caiqilong has quit IRC | 00:31 | |
ianw | i think the debian scripts must create KeyFileExt automatically | 00:46 |
*** andrii_ostapenko has quit IRC | 00:47 | |
ianw | # vos status -localauth -server afs01.ord.openstack.org | 01:14 |
ianw | vos: host 'afs01.ord.openstack.org' not found in host table | 01:14 |
ianw | is the only weird thing so far: it can't seem to query itself | 01:14 |
*** caiqilong has joined #opendev | 01:24 | |
*** caiqilong has left #opendev | 01:28 | |
*** zoharm1 has joined #opendev | 01:29 | |
*** zoharm has quit IRC | 01:33 | |
ianw | ok, it appears to be filtering out all local addresses, meaning it can't talk to itself | 02:12 |
*** otherwiseguy has quit IRC | 02:19 | |
*** sshnaidm has quit IRC | 02:19 | |
*** mhu has quit IRC | 02:23 | |
*** otherwiseguy has joined #opendev | 02:27 | |
*** mhu has joined #opendev | 02:28 | |
*** sshnaidm has joined #opendev | 02:29 | |
ianw | https://lists.openafs.org/pipermail/openafs-info/2021-January/043037.html | 03:00 |
ianw | so it seems 100% intentional that the 1.8 tools filter out loopback addresses; although the error message is a little unclear (it says it can't find the host, rather than "i found the host, but it's all loopback so i chose to ignore it") | 03:00 |
ianw | the upshot is probably that you can't run the commands on the servers themselves. which is probably fine, i'm just used to being able to do that with 1.6 | 03:01 |
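[editor's note: the filtering described above can be sketched in a few lines of Python. This is an illustrative approximation of the 1.8 tools' behaviour, not the actual implementation (which is C, in src/volser/vos.c):]

```python
import ipaddress

def usable_addrs(addrs):
    # Mimic (roughly) the OpenAFS 1.8 tools: drop loopback addresses
    # so they can never be pushed into the VLDB.
    return [a for a in addrs if not ipaddress.ip_address(a).is_loopback]

# A server querying itself may resolve only to loopback, leaving
# nothing usable -- which 1.8 then reports as "host not found":
print(usable_addrs(["127.0.1.1", "104.130.136.20"]))  # ['104.130.136.20']
print(usable_addrs(["127.0.0.1"]))                    # []
```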
*** zoharm has joined #opendev | 03:10 | |
*** zoharm1 has quit IRC | 03:12 | |
*** stevebaker has quit IRC | 03:22 | |
*** stevebaker has joined #opendev | 03:48 | |
*** ykarel has joined #opendev | 04:24 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: openafs-server : add ansible roles for OpenAFS servers https://review.opendev.org/c/opendev/system-config/+/771159 | 04:37 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: openafs-server : add ansible roles for OpenAFS servers https://review.opendev.org/c/opendev/system-config/+/771159 | 04:42 |
ianw | clarkb / fungi: update dump related to ^ | 04:54 |
ianw | I cleaned up a bunch of old volumes that appeared to be abandoned on afs01.ord.openstack.org (readonly pypi and npm mirrors, etc.) | 04:55 |
ianw | after that i put it in emergency and manually installed from our ppa openafs 1.8.6-5 packages, and restarted everything | 04:55 |
ianw | it seems to be working | 04:56 |
ianw | i have started moving this to ansible with 771159 | 04:56 |
ianw | if you could review that, i think what I would like to do is run that manually to ensure it is deploying ok on afs01.ord | 04:57 |
ianw | then we can merge and do afs01/02.dfw | 04:57 |
ianw | fungi it looks like the tarballs transaction finished, but i note all those other transactions marked deleted are still running | 04:59 |
ianw | maybe they clear out, maybe not | 04:59 |
ianw | it might be worth considering https://review.opendev.org/c/opendev/system-config/+/770705/1 to run vos release sequentially | 05:01 |
prometheanfire | ianw: for the removed at end of image build, I guess not, I modeled it after the repo stuff | 05:14 |
prometheanfire | not sure what removes it for them either | 05:14 |
auristor | ianw: I believe you referenced the wrong commit in your e-mail to openafs-info. The loopback filtering was introduced by dc2a4fe4e949c250ca25708aa5a6dd575878fd7e | 05:17 |
ianw | auristor: ahh yeah, that was the one i was looking at, i guess i pasted the wrong thing :) | 05:19 |
ianw | anyway, i get that "don't use the tools on the server" might be the most practical answer | 05:19 |
auristor | the intent of that code is to find a non-local address when localhost or 127.0.0.1 is specified | 05:20 |
ianw | sure -- and we (and by that I mean "I") have screwed up before by getting 127.0.1.1 into the volume db | 05:22 |
auristor | vos partinfo localhost -localauth is a reasonable command to execute | 05:22 |
auristor | Looks like I asked the relevant question back in 2013 https://gerrit.openafs.org/#/c/10585/4/src/volser/vos.c but the conclusion in 2014 was that preventing loopback addresses from getting pushed into the VLDB was more important than resolving edge cases. | 05:34 |
ianw | probably the right decision if it took 8 years for anyone to notice :) | 05:41 |
ianw | although separating the error for "host not found" from "host filtered" would probably have tipped me off without having to go to the source :) | 05:43 |
ianw | fungi: i approved a couple of other dib things. i think it's fine for a 3.6.0 release if you like, or i can do it tomorrow when things merge | 05:48 |
*** ykarel has quit IRC | 05:58 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: openafs-server : add ansible roles for OpenAFS servers https://review.opendev.org/c/opendev/system-config/+/771159 | 06:02 |
*** ykarel has joined #opendev | 06:05 | |
*** marios has joined #opendev | 06:08 | |
openstackgerrit | Merged openstack/diskimage-builder master: Fix centos 8.3 partition image building error with element iscsi-boot https://review.opendev.org/c/openstack/diskimage-builder/+/770701 | 06:43 |
openstackgerrit | Merged openstack/diskimage-builder master: Fix building error with element dracut-regenerate https://review.opendev.org/c/openstack/diskimage-builder/+/770241 | 06:45 |
*** lpetrut has joined #opendev | 07:14 | |
*** AJaeger has joined #opendev | 07:27 | |
AJaeger | infra-root, https://docs.openstack.org/ gives a 403 - forbidden. Is that a known issue? | 07:27 |
*** ralonsoh has joined #opendev | 07:28 | |
ykarel | i see the same on tarballs.openstack.org ^ and asked in #openstack-infra | 07:29 |
frickler | frickler@static01:~$ ls -l /afs/openstack.org/ | 07:30 |
frickler | ls: cannot access '/afs/openstack.org/docs': Connection timed out | 07:30 |
frickler | seems afs01.dfw.openstack.org. is down | 07:32 |
AJaeger | ;( | 07:34 |
frickler | at least some sites are still working, so I won't touch anything at this point. ianw and fungi did some work on it last night. | 07:39 |
frickler | maybe we should discuss whether afs is still the right tool to use for this | 07:42 |
*** ykarel_ has joined #opendev | 07:45 | |
*** lpetrut_ has joined #opendev | 07:45 | |
*** ykarel has quit IRC | 07:45 | |
*** lpetrut_ has quit IRC | 07:45 | |
*** ykarel_ is now known as ykarel | 07:45 | |
ykarel | also seeing in some of our jobs:- Status code: 403 for http://mirror.regionone.vexxhost-nodepool-sf.rdoproject.org/centos/8/AppStream/x86_64/os/repodata/repomd.xml (IP: 38.102.83.237) | 07:46 |
*** lpetrut_ has joined #opendev | 07:46 | |
AJaeger | frickler: did anybody send a #status alert? | 07:46 |
*** lpetrut has quit IRC | 07:48 | |
ykarel | jpena|off, danpawlik5 fyi ^ | 07:48 |
danpawlik5 | AJaeger on thursday or friday | 07:49 |
ykarel | seems the issue is cleared, sites are visible now | 07:49 |
*** danpawlik5 has quit IRC | 07:51 | |
*** dpawlik7 has joined #opendev | 07:53 | |
*** sboyron has joined #opendev | 07:53 | |
frickler | afs01 is back, uptime 12min. not sure what happened there. | 07:56 |
*** ykarel_ has joined #opendev | 07:58 | |
*** rpittau|afk is now known as rpittau | 07:59 | |
*** hemanth_n has joined #opendev | 07:59 | |
*** AJaeger has quit IRC | 07:59 | |
*** ykarel has quit IRC | 08:01 | |
mrunge | good morning and happy new week | 08:02 |
*** ykarel_ is now known as ykarel | 08:03 | |
mrunge | frickler, it seems setting the swap size worked in an unintended way (disabled swap?) | 08:03 |
mrunge | https://review.opendev.org/c/openstack/telemetry-tempest-plugin/+/770955/1/.zuul.yaml | 08:03 |
mrunge | that's the change | 08:03 |
mrunge | https://zuul.opendev.org/t/openstack/build/451d87a60f1d42f0b69a19644d90c902/log/zuul-info/host-info.controller.yaml | 08:03 |
mrunge | is the log, swap is 0 . hmmm | 08:04 |
*** slaweq has joined #opendev | 08:04 | |
mrunge | https://zuul.opendev.org/t/openstack/build/451d87a60f1d42f0b69a19644d90c902/log/zuul-info/host-info.controller.yaml#391 | 08:04 |
ykarel | mrunge, seems you are adding it in the wrong place | 08:14 |
ykarel | also need to add to https://review.opendev.org/c/openstack/telemetry-tempest-plugin/+/770955/1/.zuul.yaml#75 | 08:14 |
ykarel | so it's used in master branches | 08:15 |
*** fressi has joined #opendev | 08:16 | |
*** andrewbonney has joined #opendev | 08:16 | |
*** eolivare has joined #opendev | 08:24 | |
*** yourname is now known as avass | 08:30 | |
frickler | mrunge: IIUC host-info is collected before any task runs, so no swap to be seen there. check the log output for the configure-swap task and then what ykarel says. | 08:31 |
*** dtantsur|afk is now known as dtantsur | 08:32 | |
mrunge | ykarel, what do you mean? | 08:32 |
mrunge | ah! | 08:32 |
ykarel | as there are job variants for different branches, you need to add vars to all variants as needed | 08:34 |
mrunge | I was hoping all this would be inherited | 08:34 |
mrunge | but yes. | 08:35 |
mrunge | thanks, let's see if the gate is more happy now | 08:35 |
mrunge | reducing memory in xz job did not have any effect it seems | 08:35 |
ykarel | no, it doesn't get inherited like this; if all the vars are common you can use yaml anchors | 08:36 |
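[editor's note: the yaml-anchor approach ykarel mentions for sharing vars across branch variants of a job could look roughly like this; the job name and var are made up for illustration, not taken from the actual .zuul.yaml:]

```yaml
- job:
    name: telemetry-tempest-plugin-job   # hypothetical job name
    vars: &telemetry_common_vars
      configure_swap_size: 8192

- job:
    name: telemetry-tempest-plugin-job
    branches: stable/victoria
    vars: *telemetry_common_vars         # reuse the same vars on the variant
```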
*** tosky has joined #opendev | 08:44 | |
frickler | infra-root: / on bridge is at 91%, seems to have been growing pretty linearly over the last year, though more stable recently, so not sure whether we need to take any action http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=65007&rra_id=all | 08:45 |
*** hashar has joined #opendev | 08:52 | |
*** jpena|off is now known as jpena | 08:56 | |
*** akahat|rover has joined #opendev | 09:17 | |
akahat|rover | o/ | 09:18 |
akahat|rover | we are facing an issue with mirrors. | 09:18 |
akahat|rover | https://bugs.launchpad.net/tripleo/+bug/1912177 | 09:18 |
openstack | Launchpad bug 1912177 in tripleo "Issue with upstream mirrors, Jobs failing with Error: Failed to download metadata for repo 'quickstart-centos-base': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried"." [Critical,Triaged] | 09:18 |
bhagyashris | akahat|rover, ykarel raised an issue like this earlier, not sure if it's the same or not | 09:25 |
bhagyashris | <ykarel> also seeing in some of our jobs:- Status code: 403 for http://mirror.regionone.vexxhost-nodepool-sf.rdoproject.org/centos/8/AppStream/x86_64/os/repodata/repomd.xml (IP: 38.102.83.237) | 09:25 |
bhagyashris | <ykarel> seems the issue is cleared, sites are visible now | 09:25 |
ysandeep | bhagyashris, can you try opening https://mirror-int.dfw.rax.opendev.org/ubuntu/dists/bionic/main/binary-amd64/Packages locally, its not working for me. | 09:26 |
akahat|rover | bhagyashris, ok.. but not sure this will get resolved with time. :| | 09:27 |
bhagyashris | ysandeep, not working at my end | 09:28 |
fbo | Hi our monitoring is reporting a sync date older than 5 days from the opendev mirror. Is there any ongoing issue with opendev mirrors ? thanks in advance. | 09:39 |
*** ysandeep is now known as ysandeep|afk | 09:53 | |
*** jcapitao has joined #opendev | 09:55 | |
*** fressi has quit IRC | 10:08 | |
*** DSpider has joined #opendev | 10:13 | |
*** Jeffrey4l has joined #opendev | 10:46 | |
*** jcapitao has left #opendev | 10:46 | |
lourot | ^ seeing issue with that regionone mirror again as well, see https://review.opendev.org/c/openstack/charm-mysql-innodb-cluster/+/771029 | 10:50 |
lourot | frickler, are you around maybe? | 10:51 |
*** icey has joined #opendev | 10:59 | |
*** icey has quit IRC | 10:59 | |
*** icey has joined #opendev | 11:00 | |
*** dpawlik7 has quit IRC | 11:06 | |
*** dpawlik7 has joined #opendev | 11:09 | |
*** calcmandan_ is now known as calcmandan | 11:18 | |
frickler | afs01 now has an uptime of 6min, not sure whether it is crashing or maybe rackspace is having some issues | 11:24 |
*** ysandeep|afk is now known as ysandeep | 11:55 | |
*** lpetrut__ has joined #opendev | 12:22 | |
*** lpetrut_ has quit IRC | 12:25 | |
*** jpena is now known as jpena|lunch | 12:28 | |
openstackgerrit | Martin Kopec proposed opendev/system-config master: WIP Deploy refstack with ansible docker https://review.opendev.org/c/opendev/system-config/+/705258 | 12:44 |
*** zoharm1 has joined #opendev | 12:45 | |
*** brinzhang_ has joined #opendev | 12:46 | |
*** bhagyashris has quit IRC | 12:47 | |
*** bhagyashris has joined #opendev | 12:47 | |
*** zoharm has quit IRC | 12:48 | |
*** brinzhang_ has quit IRC | 12:48 | |
*** brinzhang_ has joined #opendev | 12:48 | |
*** brinzhang_ has quit IRC | 12:50 | |
*** brinzhang_ has joined #opendev | 12:51 | |
*** sshnaidm is now known as sshnaidm|ruck | 12:52 | |
*** hashar has quit IRC | 12:56 | |
*** otherwiseguy has quit IRC | 12:56 | |
*** brinzhang has quit IRC | 12:56 | |
*** jrosser has quit IRC | 12:56 | |
*** fbo has quit IRC | 12:56 | |
*** cgoncalves has quit IRC | 12:56 | |
*** ysandeep has quit IRC | 12:56 | |
*** logan- has quit IRC | 12:56 | |
fungi | frickler: i'll check the root inbox, maybe rackspace opened a ticket | 12:57 |
fungi | woah, 24 messages from rackspace support | 12:58 |
fungi | just since i went to sleep | 12:58 |
*** artom has joined #opendev | 12:58 | |
fungi | "This message is to inform you that the host your cloud server, afs01.dfw.openstack.org, 'adfea02f-6229-464a-99e5-7617fb00caef' resides on alerted our monitoring systems at 2021-01-18T06:46:50.457633." | 12:58 |
*** hashar has joined #opendev | 12:59 | |
*** otherwiseguy has joined #opendev | 12:59 | |
*** jrosser has joined #opendev | 12:59 | |
*** fbo has joined #opendev | 12:59 | |
*** cgoncalves has joined #opendev | 12:59 | |
*** ysandeep has joined #opendev | 12:59 | |
*** logan- has joined #opendev | 12:59 | |
*** logan- has quit IRC | 12:59 | |
fungi | yeah, so hypervisor host outage which happened to impact afs01.dfw while we were still trying to recover the replicas on afs02.dfw | 12:59 |
*** logan- has joined #opendev | 13:00 | |
fungi | the firehose01.openstack.org instance was also impacted | 13:01 |
fungi | per subsequent messages there was a hardware failure on the host, and then they had to initiate an offline migration of the server instances to a new host, hence the extended downtime | 13:02 |
*** mordred has quit IRC | 13:05 | |
*** Eighth_Doctor has quit IRC | 13:05 | |
fungi | checking various static sites backed by afs, they seem to be returning content now | 13:19 |
fungi | a vos release of the docs volume is underway since 11:30 utc | 13:25 |
*** jpena|lunch is now known as jpena | 13:28 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul-jobs master: Fail mirror-workspace-git-repos if checkout failed https://review.opendev.org/c/zuul/zuul-jobs/+/771220 | 13:31 |
fungi | aha, the docs volume is served from afs01.dfw and afs01.ord, not afs02.dfw | 13:33 |
fungi | so the vos release in progress for it is (re?)populating the replica on afs01.ord | 13:34 |
fungi | ianw: ^ expected, i suppose? | 13:34 |
fungi | that may take a while to complete | 13:34 |
fungi | cacti says it's doing roughly 8Mbps inbound on eth0, and the volume has 23GiB of content, so should expect it to run for roughly 7 hours, maybe finishing around 18:30 utc | 13:49 |
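[editor's note: fungi's estimate checks out arithmetically; a quick sketch of the back-of-the-envelope calculation (GiB of data over a Mbit/s link):]

```python
GIB_BYTES = 1024 ** 3

def transfer_hours(size_gib, rate_mbps):
    # Total bits to move, divided by the link rate in bits per second,
    # converted to hours.
    seconds = size_gib * GIB_BYTES * 8 / (rate_mbps * 1_000_000)
    return seconds / 3600

# 23 GiB of volume data at ~8 Mbps inbound:
print(round(transfer_hours(23, 8), 1))  # about 6.9 hours, i.e. "roughly 7 hours"
```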
*** mordred has joined #opendev | 13:55 | |
*** Eighth_Doctor has joined #opendev | 13:57 | |
openstackgerrit | Merged zuul/zuul-jobs master: Fail mirror-workspace-git-repos if checkout failed https://review.opendev.org/c/zuul/zuul-jobs/+/771220 | 13:57 |
*** zul has joined #opendev | 14:01 | |
*** lbragstad has joined #opendev | 14:03 | |
*** lbragstad has quit IRC | 14:16 | |
*** d34dh0r53 has quit IRC | 14:17 | |
*** lbragstad has joined #opendev | 14:19 | |
*** ysandeep is now known as ysandeep|afk | 14:19 | |
*** lbragstad has quit IRC | 14:20 | |
*** lbragstad has joined #opendev | 14:20 | |
*** zoharm has joined #opendev | 14:21 | |
*** zoharm1 has quit IRC | 14:23 | |
*** whoami-rajat__ has joined #opendev | 14:25 | |
*** sboyron has quit IRC | 14:27 | |
*** d34dh0r53 has joined #opendev | 14:29 | |
openstackgerrit | Merged zuul/zuul-jobs master: Pass environment variables to 'tox envlist config' task https://review.opendev.org/c/zuul/zuul-jobs/+/770819 | 14:31 |
fungi | ugh... 13:48:21 "At this time we are still in the process of migrating your cloud migration. At this time we are still in the process of migrating your cloud server, afs01.dfw.openstack.org, to a new host. Your server is currently online but will experience some intermittent downtime during the migration process. We will notify you once the migration is complete and we have verified that your cloud server is | 14:34 |
fungi | online. Please do not attempt to access or modify [the server] during this process." | 14:34 |
fungi | so i guess we should expect it to reboot and abort the in-progress vos release for the docs volume | 14:35 |
*** zbr2 has joined #opendev | 14:35 | |
*** zbr has quit IRC | 14:37 | |
*** zbr2 is now known as zbr | 14:37 | |
*** sgw has joined #opendev | 14:39 | |
*** hemanth_n has quit IRC | 14:39 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-jobs master: DNM: negative test https://review.opendev.org/c/zuul/zuul-jobs/+/522438 | 14:40 |
*** hemanth_n has joined #opendev | 14:42 | |
*** ysandeep|afk is now known as ysandeep | 14:45 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-jobs master: DNM: negative test https://review.opendev.org/c/zuul/zuul-jobs/+/522438 | 14:49 |
*** dpawlik7 has quit IRC | 14:52 | |
*** hemanth_n has quit IRC | 14:53 | |
*** ykarel has quit IRC | 14:57 | |
*** dpawlik7 has joined #opendev | 15:17 | |
*** dpawlik7 has quit IRC | 15:26 | |
*** dpawlik9 has joined #opendev | 15:27 | |
*** lpetrut__ has quit IRC | 15:27 | |
*** dpawlik9 is now known as dpawlik | 15:27 | |
*** ysandeep is now known as ysandeep|away | 15:32 | |
openstackgerrit | Cole Walker proposed openstack/project-config master: Add PTP Notification app to StarlingX https://review.opendev.org/c/openstack/project-config/+/771235 | 15:33 |
clarkb | fungi: is afs01.dfw running 1.8? if not is 1.6 functioning? | 15:43 |
fungi | it's running 1.6 and seems to be functioning | 15:46 |
clarkb | interesting | 15:46 |
*** sboyron has joined #opendev | 15:50 | |
fungi | afs01.ord is 1.8 i believe | 15:54 |
fungi | yeah, 1.8.6-5ubuntu1~xenial2 on afs01.ord, 1.6.15-1ubuntu1 on afs01.dfw and afs02.dfw still | 15:55 |
openstackgerrit | Cole Walker proposed openstack/project-config master: Add PTP Notification app to StarlingX https://review.opendev.org/c/openstack/project-config/+/771235 | 15:56 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: run-selenium: run selenium on a node https://review.opendev.org/c/opendev/system-config/+/767078 | 15:59 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: gerrit: Initalize in testing https://review.opendev.org/c/opendev/system-config/+/765224 | 15:59 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: gerrit: move plugins to common code https://review.opendev.org/c/opendev/system-config/+/767269 | 15:59 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: bazelisk-build: specify targets as list https://review.opendev.org/c/opendev/system-config/+/767272 | 15:59 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: gerrit: get files from bazel build dir https://review.opendev.org/c/opendev/system-config/+/767433 | 15:59 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: gerrit: Install zuul-summary-results plugin https://review.opendev.org/c/opendev/system-config/+/767079 | 15:59 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Fix review01's fqdn in infratesting https://review.opendev.org/c/opendev/system-config/+/771242 | 15:59 |
smcginnis | clarkb, fungi: Semi-random question, but how difficult would it be to stand up a wiki.openinfra.dev site? | 15:59 |
*** sboyron has quit IRC | 16:01 | |
clarkb | smcginnis: we can barely run the current wiki we've got, it needs to be upgraded. Meanwhile dansmith is complaining that gerrit is slow and openafs decided last week that january 14-31 were going to be an inoperable period for it | 16:02 |
smcginnis | OK, so difficulty == high. ;) | 16:02 |
clarkb | I think at this point for new stuff we really need new help to go along with it | 16:05 |
smcginnis | Yeah... not even just the new stuff though. | 16:05 |
clarkb | well that too | 16:06 |
fungi | smcginnis: ttx found a great hosted wiki option which is open source though, i forget the name just now, something maintained by ow2 i think? | 16:09 |
fungi | we've talked about possibly convincing projects to use that so we can retire the wiki service we're currently running | 16:10 |
smcginnis | I was also thinking hackmd.io may be an option. Paid service, but free for communities and the code itself appears to be open source. | 16:10 |
fungi | i thought it was more like an etherpad, but i've rarely used it | 16:10 |
smcginnis | Sync with GitHub. I wonder how difficult it would be to reverse mirror into our own. | 16:10 |
smcginnis | HackMD is very similar to etherpad, but with a lot more features. And I think maybe a little more appealing to those that aren't used to plain text based collaboration (thinking board members here). | 16:11 |
*** sboyron has joined #opendev | 16:11 | |
clarkb | but also etherpad isn't one of the things we tend to struggle with | 16:13 |
clarkb | it's a straightforward piece of software to install with a simple upgrade process | 16:13 |
clarkb | migrating off of etherpad to a hosted $othertool isn't going to save us | 16:13 |
fungi | yeah for something which runs as a nodejs process, it's really not half terrible to operate | 16:14 |
smcginnis | Definitely not proposing that. I'm just brainstorming options for moving Open Infra BOD stuff off of https://wiki.openstack.org/wiki/Governance/Foundation | 16:15 |
ttx | fungi: xwiki | 16:15 |
smcginnis | Not like it's critical, but just thinking the perception switch from "OpenStack + a few others" to "set of projects, including OpenStack" | 16:15 |
openstackgerrit | Alfredo Moralejo proposed zuul/zuul-jobs master: Set zuul-jobs-test-base-roles-gentoo-17-0-systemd non-voting https://review.opendev.org/c/zuul/zuul-jobs/+/771248 | 16:19 |
openstackgerrit | Alfredo Moralejo proposed zuul/zuul-jobs master: Rename config repos file config for CentOS Stream https://review.opendev.org/c/zuul/zuul-jobs/+/770815 | 16:20 |
openstackgerrit | Alfredo Moralejo proposed zuul/zuul-jobs master: Rename config repos file config for CentOS Stream https://review.opendev.org/c/zuul/zuul-jobs/+/770815 | 16:24 |
clarkb | fungi: did zuul get restarted? | 16:28 |
clarkb | fungi: frickler bridge disk use is likely to be due to ansible logging | 16:35 |
clarkb | I don't know if we got something in place to clean up old aged out logs | 16:35 |
clarkb | (we rotate them directly in the jobs and that may not clean up older sets?) | 16:36 |
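[editor's note: a cleanup along the lines clarkb is describing could be as simple as an age-based prune of rotated logs; the glob pattern and retention period below are assumptions for illustration, not what bridge actually runs:]

```python
import time
from pathlib import Path

def prune_old_logs(log_dir, max_age_days=30):
    # Delete rotated log files older than the cutoff. "*.log.*"
    # matches rotations (foo.log.1, foo.log.2, ...) but leaves the
    # current foo.log alone; both pattern and retention are
    # hypothetical.
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for path in sorted(Path(log_dir).glob("*.log.*")):
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed
```

Run from cron or a periodic ansible task, something like this would keep only recent rotations and bound disk growth.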
fungi | clarkb: yes, zuul scheduler was restarted late sunday my time (i status logged it so can probably get a more exact time from there) | 16:38 |
fungi | er, sorry, late saturday/early sunday i mean | 16:38 |
clarkb | awesome, I was looking at the zuul queues take off and wondered if we got that out of the way already | 16:39 |
*** eolivare_ has joined #opendev | 16:42 | |
*** zbr1 has joined #opendev | 16:43 | |
*** marios is now known as marios|out | 16:43 | |
*** slaweq has quit IRC | 16:44 | |
*** slaweq has joined #opendev | 16:45 | |
*** zbr has quit IRC | 16:45 | |
*** eolivare has quit IRC | 16:45 | |
*** stevebaker has quit IRC | 16:45 | |
*** zbr1 is now known as zbr | 16:45 | |
*** hashar has quit IRC | 16:48 | |
*** rpittau is now known as rpittau|afk | 16:49 | |
fungi | i ended up having to restart zuul-web too because the api socket became unresponsive after the scheduler restart and remained unusable for a good five minutes after all the cat jobs returned, so i gave up waiting for it to possibly come back on its own | 16:51 |
*** marios|out has quit IRC | 16:51 | |
clarkb | ya I think that is a known issue (though seems like it doesn't fail 100% of the time?) | 16:52 |
*** zul has quit IRC | 17:05 | |
*** ykarel has joined #opendev | 17:07 | |
clarkb | fungi: ianw: I think the arm64 afs stuff isn't working properly as the arm64 zuul job integration test things are failing with afs not being readable | 17:09 |
clarkb | it does look like we successfully built 1.8.6-5 arm64 packages in our ppa so we shouldn't be using the old unpatched version | 17:10 |
fungi | we probably haven't upgraded the packages on the arm64 mirror | 17:10 |
fungi | oh, maybe | 17:10 |
* fungi checks it | 17:10 | |
clarkb | I think the arm64 mirror is fine | 17:11 |
clarkb | but possibly because it hasn't restarted with the new version? https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_65a/767272/4/check/system-config-zuul-role-integration-debian-stable-arm64/65a715d/job-output.txt is a job log showing the mirror seems fine, then it tries to access afs locally and breaks | 17:12 |
fungi | `ls /afs/openstack.org/` is taking a while to return for me from there | 17:13 |
fungi | still trying | 17:13 |
fungi | 1.8.6-5ubuntu1~focal1 is what's installed | 17:14 |
fungi | rebooted 3 days ago | 17:14 |
clarkb | hrm | 17:14 |
clarkb | I suppose it is possible that it is a networking issue rather than afs itself | 17:14 |
*** ykarel has quit IRC | 17:15 | |
fungi | `vos status -server afs01.dfw.openstack.org` works from it | 17:15 |
fungi | okay, after initially not returning, i now get content in /afs/openstack.org/ there | 17:15 |
clarkb | huh | 17:15 |
clarkb | I guess we can recheck those changes that failed and see if it persists | 17:16 |
fungi | [Mon Jan 18 17:10:49 2021] afs: Lost contact with volume location server 23.253.200.228 in cell openstack.org (code -1) | 17:16 |
fungi | [Mon Jan 18 17:11:17 2021] afs: volume location server 23.253.200.228 in cell openstack.org is back up (code 0) | 17:17 |
clarkb | the job failure was ~16:58-17:01 ish | 17:17 |
fungi | and again a few seconds later, and then same for 104.130.136.20 | 17:17 |
fungi | so yeah, could be network related | 17:17 |
clarkb | in that case I'll wait for things to finish testing then can recheck and see if it persists | 17:18 |
clarkb | fungi: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_493/767433/7/check/system-config-zuul-role-integration-debian-stable/4930877/job-output.txt it hit an amd64 job | 17:35 |
clarkb | I wonder is msg connection timed out there really ansible saying it can't connect to the remote host anymore? | 17:35 |
clarkb | oh! | 17:36 |
clarkb | debian and centos != ubuntu | 17:36 |
clarkb | so ya I think this is the old packages not working anymore | 17:36 |
fungi | indeed, so they're still testing with broken packages, i expect | 17:36 |
clarkb | that clears things up | 17:36 |
clarkb | ya | 17:36 |
fungi | looks like there's a 1.8.6-5 in sid and bullseye now, but buster still has an older 1.8.2-1 which is probably not patched | 17:37 |
clarkb | https://tarballs.opendev.org/openstack/openstack-zuul-jobs/openafs/centosstuff is where we get the centos package too | 17:37 |
fungi | https://packages.debian.org/openafs-modules-dkms | 17:38 |
clarkb | and I bet we haven't updated our centos builds of the openafs package | 17:38 |
clarkb | I think I see a quick fix for centos | 17:38 |
fungi | yeah, debian's 1.8.6-5 was pulling in the bitmask patches: https://metadata.ftp-master.debian.org/changelogs//main/o/openafs/openafs_1.8.6-5_changelog | 17:41 |
clarkb | remote: https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/771257 Build openafs 1.8.7 on centos | 17:41 |
clarkb | the fix for centos may be that simple? | 17:41 |
*** artom has quit IRC | 17:42 | |
clarkb | fungi: yup we just repackaged that version for ubuntu | 17:42 |
fungi | lgtm | 17:42 |
*** mrunge_ has joined #opendev | 17:43 | |
*** mrunge has quit IRC | 17:45 | |
*** artom has joined #opendev | 17:52 | |
*** ralonsoh has quit IRC | 17:58 | |
*** mrunge_ is now known as mrunge | 18:05 | |
fungi | i've followed up to https://bugs.debian.org/980115 asking if anyone's working on backporting the fixes to buster. i'll push up a merge request in salsa for it if not | 18:06 |
openstack | Debian bug 980115 in openafs-client "connection failure when rx initialized after 08:25:36 GMT 14 Jan 2021" [Grave,Fixed] | 18:06 |
*** jpena is now known as jpena|off | 18:07 | |
clarkb | fungi: thanks | 18:11 |
fungi | oh, hey! the vos release for docs finished at 18:10:29 so roughly in line with what i projected | 18:17 |
fungi | that means our static site volumes should all be caught up now, and i can start releasing the locks i've been holding for mirror volume updates, one at a time | 18:19 |
fungi | the full restore of the docs volume ran from 11:30:01 until 17:36:35, then a subsequent vos release caught it up to current | 18:22 |
fungi | i'll release the yum-puppetlabs lock first and test the waters | 18:23 |
fungi | i've got a mirror update running for it in a root screen session on mirror-update.opendev.org now | 18:25 |
fungi | Starting ForwardMulti from 536871036 to 536871036 on afs02.dfw.openstack.org (full release). | 18:25 |
fungi | hopefully won't take too long as this volume isn't huge | 18:25 |
fungi | not exactly tiny (~200GiB of data according to our grafana dashboard) but it's the smallest one we're graphing (other than epel, which i fear shouldn't be updated until centos/fedora have been) | 18:28 |
fungi | also i'm worried that the live migration of afs01.dfw is still in progress, so we may see rackspace reboot it at any time | 18:30 |
*** eolivare_ has quit IRC | 18:30 | |
fungi | oh, i guess not. yay! 16:18:24 "This message is to inform you that your cloud server, afs01.dfw.openstack.org, is online." | 18:33 |
*** whoami-rajat__ has quit IRC | 18:55 | |
*** zoharm has quit IRC | 19:05 | |
clarkb | fungi: looks like 1.8.7 is present at https://tarballs.opendev.org/openstack/openstack-zuul-jobs/openafs/centos8/RPMS/x86_64/ now | 19:13 |
clarkb | That should just leave debian before we can recheck those | 19:13 |
clarkb | fungi: is that something that debian might backport quickly? | 19:14 |
clarkb | not sure how debian prioritizes those against risk | 19:14 |
fungi | they did set the urgency to "emergency" on the sid upload so it migrated to bullseye within a day, but not sure if fixing it in buster is on their radar which is why i pinged the bug about it | 19:23 |
fungi | the bug is already tagged as affecting buster, so hopefully they're working on it? but i wouldn't count on it being solved straight away | 19:23 |
fungi | we could probably set our tests to use our ubuntu ppa | 19:24 |
*** andrewbonney has quit IRC | 19:25 | |
fungi | the focal build would probably work, but we can try the bionic one if not | 19:25 |
fungi | problem is bionic released with 1.8.0 and focal with 1.8.4, so neither is particularly in sync with buster | 19:25 |
fungi | but odds are our package builds for either would run on it just fine | 19:26 |
*** artom has quit IRC | 19:28 | |
*** artom has joined #opendev | 19:30 | |
*** elod has quit IRC | 19:33 | |
*** elod has joined #opendev | 19:35 | |
clarkb | our packages are built from the debian package too | 19:43 |
clarkb | fungi: to do that we just enable the ppa on debian? then apt-get update and install? | 19:44 |
clarkb | also, reading our role that installs the openafs client, it seems we may not install those packages on xenial | 19:49 |
clarkb | which would affect the zuul executors? we may need to double check those have updated? | 19:49 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Use our ubuntu openafs ppa on debian https://review.opendev.org/c/opendev/system-config/+/771268 | 19:52 |
clarkb | Maybe something like ^ will work | 19:52 |
fungi | clarkb: we checked the executors and ansible updated them within a few hours of publishing to the ppa | 19:54 |
fungi | we even tested rebooting ze01 with it just to make sure | 19:55 |
clarkb | hrm I wonder if that logic there is wrong or did we already update the executors? | 19:55 |
fungi | ii openafs-modules-dkms 1.8.6-5ubuntu1~xenial2 | 19:55 |
clarkb | I wonder if I'm missing something with how that module is used or maybe the condition on it is buggy | 19:56 |
clarkb | I think we want xenial to update fwiw | 19:56 |
clarkb | but the code says not ( ansible_distribution_version is version('16.04', '==') and ansible_architecture == 'x86_64' ) | 19:56 |
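For context, a hypothetical sketch of how a guard like the one clarkb quotes might sit in a role task (the task and package names are illustrative, not the actual system-config code; note that on Linux `ansible_architecture` reports `x86_64`, not `amd64`):

```yaml
# Illustrative only -- not the real system-config role.
- name: Install openafs-modules-dkms from the PPA
  apt:
    name: openafs-modules-dkms
    state: latest
  when: >-
    not (ansible_distribution_version is version('16.04', '==')
    and ansible_architecture == 'x86_64')
```

As written, this guard *skips* the install on xenial/x86_64, which is why clarkb suspects the condition is inverted if xenial is supposed to update.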
fungi | oof, we have a ton of cruft in /etc/apt/sources.list.d on ze01 (and probably elsewhere) | 19:57 |
fungi | looks like we add http://ppa.launchpad.net/openstack-ci-core/openafs/ubuntu in openafs.list, openstack-ci-core-ubuntu-openafs-amd64-hwe-xenial.list, and ppa_openstack_ci_core_openafs_xenial.list | 19:59 |
fungi | i expect only one of those is what we call the file these days and we've renamed it and not cleaned up the old filenames (at least) twice | 19:59 |
clarkb | we use the ansible apt_repository module without an explicit filename so not sure which is the most current | 20:00 |
clarkb | but also I wonder if maybe all of those are stale from puppet and the current ansible isn't trying to add it in | 20:00 |
fungi | yuck. i guess then the module has changed how it works multiple times and doesn't clean up after itself | 20:00 |
fungi | oh, or could be that too, yep | 20:00 |
fungi | comparing to ze12 we have the same files there too | 20:01 |
fungi | but i suppose they all pre-date the ansibification | 20:01 |
clarkb | I think they do | 20:01 |
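Passing an explicit `filename` to `apt_repository` would avoid the drifting `sources.list.d` names fungi found; a sketch (the repo URL is from the log above, everything else is illustrative):

```yaml
# Sketch: pin the sources.list.d filename so module renames
# across ansible versions can't leave stale .list files behind.
- name: Add the openafs PPA with a stable list filename
  apt_repository:
    repo: "deb http://ppa.launchpad.net/openstack-ci-core/openafs/ubuntu {{ ansible_distribution_release }} main"
    filename: openafs     # writes /etc/apt/sources.list.d/openafs.list
    state: present        # update_cache defaults to yes when a repo is added
```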
clarkb | 771268 should give us more insight as it runs on xenial in addition to debian and centos | 20:02 |
clarkb | fungi: thinking out loud would it be better to pull in from sid on buster for openafs instead of our ppa? | 20:02 |
clarkb | I guess it probably doesn't matter much as long as the kernel is happy with our ppa too | 20:02 |
fungi | it's likely the sid/bullseye build will run just fine on buster too, but that requires doing some apt pinning if we don't want to accidentally upgrade buster to bullseye once it releases (the default pin priorities will still cause buster packages to be preferred while bullseye is still the testing suite at least) | 20:04 |
clarkb | gotcha | 20:05 |
clarkb | with the ppa it's only the packages in the ppa we have to worry about, so we don't need to pin. That makes sense | 20:06 |
fungi | the thing our ppa has going for it is that only the openafs packages are present, so if we add that ppa we won't accidentally install any other packages from outside buster | 20:06 |
fungi | yeah, that | 20:06 |
fungi | we could probably also add a buster build to the ppa, i don't recall whether the system there is restricted to only holding ubuntu-targeted builds | 20:07 |
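For the record, the pinning fungi describes might look roughly like this (a sketch only, never deployed; the exact package list is illustrative):

```
# /etc/apt/preferences.d/openafs -- sketch, assuming a sid source
# has been added alongside buster. Only the openafs packages may
# come from sid; everything else stays pinned below buster's
# default priority of 500.
Package: openafs-* libafsauthent2 libafsrpc2
Pin: release n=sid
Pin-Priority: 990

Package: *
Pin: release n=sid
Pin-Priority: 100
```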
clarkb | fungi: if you've got a second can you take a quick look at the meeting agenda for tomorrow? want to make sure I didn't miss anything important with everything that went on last week | 20:12 |
clarkb | fungi: the centos jobs appear to work now after the 1.8.7 update for their package repo. Debian is failing though. "E: Unable to correct problems, you have held broken packages." I guess it didn't like the ppa? | 20:15 |
clarkb | https://5c08ec073f658d3d10ed-6109476be9a7a65d8252c7a651ade8fd.ssl.cf2.rackcdn.com/771268/1/check/system-config-zuul-role-integration-debian-stable/6afb2ae/job-output.txt | 20:15 |
clarkb | that isn't very verbose about what the broken package is | 20:17 |
clarkb | are we missing an apt-get update? | 20:17 |
clarkb | the timing for the ppa addition implies it did more than just write a file | 20:19 |
*** sboyron has quit IRC | 20:21 | |
clarkb | ya ansible docs say it should do an update by default after adding a new repo | 20:24 |
fungi | lookin' | 20:24 |
*** slaweq has quit IRC | 20:24 | |
clarkb | the zuul render from the js may include better debugging info | 20:25 |
clarkb | er, from the json | 20:25 |
clarkb | digging in that json by hand isn't much fun I've found | 20:25 |
fungi | yeah, looks like we don't have the output from apt-get | 20:26 |
fungi | if it's in the json, that may help | 20:26 |
fungi | openafs-client : Depends: libcrypt1 (>= 1:4.1.0) but it is not installable | 20:27 |
clarkb | yup its in the json | 20:27 |
clarkb | you found it too | 20:27 |
fungi | indeed, flamel is indispensable | 20:27 |
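Digging the failure out of the zuul `job-output.json` by hand can also be scripted; here is a sketch that walks the nested structure (the schema keys are assumed from the usual zuul output layout, so treat them as illustrative):

```python
def failed_tasks(playbooks):
    """Yield (task_name, host, stdout) for each failed host result.

    Assumes the job-output.json layout: a list of playbooks, each
    with plays -> tasks -> hosts, where a host result may be marked
    "failed" and carry the command's stdout.
    """
    for playbook in playbooks:
        for play in playbook.get("plays", []):
            for task in play.get("tasks", []):
                name = task.get("task", {}).get("name", "?")
                for host, result in task.get("hosts", {}).items():
                    if result.get("failed"):
                        yield name, host, result.get("stdout", "")

# Toy data in the assumed shape, mimicking the failure found above:
sample = [{"plays": [{"tasks": [{
    "task": {"name": "Install openafs-client"},
    "hosts": {"debian-stable": {
        "failed": True,
        "stdout": "openafs-client : Depends: libcrypt1 (>= 1:4.1.0) "
                  "but it is not installable",
    }},
}]}]}]

for name, host, out in failed_tasks(sample):
    print(f"{name} on {host}: {out}")
```

In practice you would `json.load()` the downloaded `job-output.json` and feed it to `failed_tasks` instead of the toy data.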
fungi | and that was using the focal build instead of the bionic one? | 20:28 |
clarkb | correct | 20:28 |
*** stevebaker has joined #opendev | 20:28 | |
clarkb | https://packages.debian.org/search?keywords=libcrypt1&searchon=names&suite=buster&section=all says that package doesn't exist at all? | 20:28 |
fungi | yeah, introduced in bullseye | 20:28 |
clarkb | but ubuntu is happy with it? | 20:29 |
clarkb | hrm bionic doesn't have it either so maybe we edited the dep list for bionic and xenial? | 20:30 |
clarkb | I'll try to use the bionic ppa to see if that fixes it | 20:30 |
fungi | well, deps may be automatically inferred from what libraries the linker pulls in | 20:30 |
clarkb | oh interesting | 20:30 |
fungi | so assuming autotools, it probably detected libcrypt was present and used it | 20:31 |
clarkb | I didn't realize debian packaging was doing that | 20:31 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Use our ubuntu openafs ppa on debian https://review.opendev.org/c/opendev/system-config/+/771268 | 20:31 |
fungi | though libcrypt1 replaced libxcrypt | 20:31 |
clarkb | there is bionic | 20:31 |
fungi | yeah, debhelper allows you to embed placeholders in the dependencies which get auto-filled at package build time | 20:32 |
clarkb | TIL | 20:32 |
clarkb | I'll send out the meeting agenda in about half an hour. If you get a chance to look that over first that would be great | 20:32 |
fungi | https://salsa.debian.org/debian/openafs/-/blob/master/debian/control#L19 | 20:33 |
fungi | Depends: ${shlibs:Depends}, ${misc:Depends}, lsb-base (>= 3.0-6) | 20:34 |
fungi | that's what it looks like in the packaging config | 20:34 |
fungi | so shlibs:Depends gets filled in by debhelper's dh_shlibdeps | 20:35 |
fungi | and `man dh_shlibdeps` explains every painful detail | 20:35 |
fungi | so to conclude, the bionic build will likely depend on libxcrypt instead of libcrypt1, and work fine on buster (assuming no other dependencies had different breaking transitions between when bionic was snapshotted from sid and when buster froze) | 20:37 |
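The substitution fungi points at works roughly like this: `dh_shlibdeps` scans the built binaries for their linked shared libraries and fills `${shlibs:Depends}` in at build time, so the same `debian/control` template yields different dependency lists depending on what the build environment provided. Schematically (the expanded lists are illustrative, not real build output):

```
# debian/control template (from the packaging, quoted above):
Depends: ${shlibs:Depends}, ${misc:Depends}, lsb-base (>= 3.0-6)

# After dh_shlibdeps on a focal/bullseye-era build (illustrative):
Depends: libc6 (>= 2.28), libcrypt1 (>= 1:4.1.0), ..., lsb-base (>= 3.0-6)

# After dh_shlibdeps on a bionic build (illustrative):
Depends: libc6 (>= 2.27), ..., lsb-base (>= 3.0-6)
```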
fungi | looking over the agenda now | 20:37 |
fungi | mnaser did get his agenda item added, good | 20:40 |
fungi | we should probably talk about project renames sometime soon, if not this week. have we done any since the upgrade? the past months are a blur. https://review.opendev.org/765787 is going to need a rename once they decide on a namespace, not sure if there are others waiting who don't know to add themselves to the agenda | 20:41 |
fungi | clarkb: i don't see anything missing from the current agenda wiki article, no | 20:43 |
clarkb | thanks for checking I'll get that sent out shortly | 20:43 |
fungi | thanks! | 20:43 |
clarkb | fungi: looks like the debian jobs are passing now when switched to bionic from focal | 20:45 |
clarkb | thank you for looking at that with me. I learned a bit more about packaging there ;) | 20:45 |
clarkb | I think that role is only used on debian for wheel builds? is that right? | 20:46 |
clarkb | if so the impact to making that change should be fairly small, but maybe a good idea to have ianw double check it too? | 20:46 |
fungi | sounds right | 20:47 |
fungi | it's getting to be the time where he often appears, so happy to wait | 20:47 |
ianw | o/ | 20:52 |
ianw | sorry catching up | 20:52 |
clarkb | ianw: no rush, basically I picked up the gerrit thing to help out on that side and discovered it runs the openafs-client testing from zuul-jobs which needs working openafs on debian and centos. We fixed centos by updating the job that builds the package to build 1.8.7 instead of 1.8.6 | 20:54 |
clarkb | the change above proposes we install our ppa package on buster as the fix for debian | 20:54 |
ianw | ahh yes i meant to do that | 20:55 |
clarkb | ianw: left a few notes on https://review.opendev.org/c/opendev/system-config/+/771159 one of which needs fixing (so I -1'd) | 20:55 |
fungi | ianw: probably the most exciting thing to note is that afs01.dfw suffered from a hypervisor host outage in rackspace, and was rebooted after many hours offline | 20:56 |
*** slaweq has joined #opendev | 20:56 | |
ianw | fungi: yeah, so basically we need to get that to 1.8 now? | 20:56 |
fungi | also i discovered that the docs volume is replicated to afs01.ord, so was replacing it there at a crawl | 20:57 |
fungi | but it completed, i've since started removing locks i was holding for mirror servers, but so far just the smallest one (mirror.yum-puppetlabs which is now done) | 20:58 |
fungi | i can hold off allowing other mirrors to update if we want to avoid any lengthy vos releases in progress | 20:58 |
fungi | mirror.yum-puppetlabs finished roughly an hour ago | 20:59 |
ianw | fungi: yeah, basically docs is the only thing on ORD | 20:59 |
ianw | i think we got rid of everything mirroring to it because it was so slow? | 20:59 |
fungi | unfortunately afs01.dfw went offline while docs wasn't done being rebuilt on afs01.ord so the site ended up offline | 20:59 |
fungi | but now it should be okay | 21:01 |
fungi | yeah, restoration of the docs volume from dfw to ord took ~7.5 hours at 8Mbps to replicate all 23GiB of content | 21:02 |
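As a sanity check, those numbers hang together; a quick back-of-the-envelope helper:

```python
def transfer_hours(size_gib: float, rate_mbps: float) -> float:
    """Hours to move size_gib (GiB) at rate_mbps (decimal megabits/s)."""
    bits = size_gib * 1024**3 * 8
    return bits / (rate_mbps * 1_000_000) / 3600

# 23 GiB at 8 Mbps is ~6.9 hours of raw transfer, so ~7.5 wall-clock
# hours with protocol overhead is plausible. The 200 GiB sync at
# ~300 Mbps discussed further down works out to ~1.6 hours, matching
# the "1.5 hours give or take" estimate.
print(round(transfer_hours(23, 8), 1))     # -> 6.9
print(round(transfer_hours(200, 300), 1))  # -> 1.6
```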
ianw | but basically afs01.dfw is now running 1.6 with the bad timestamp issue, since it got restarted, right? | 21:02 |
fungi | we suspect so, yes | 21:02 |
fungi | though i haven't really seen any issues stemming from that so far today | 21:02 |
ianw | ok, well ORD seemed to go OK with its update | 21:04 |
ianw | i'll just look at 771159; my initial thought was to run that against ORD and make sure it's idempotent | 21:04 |
fungi | is it just me or is the executor not running on ze01? | 21:08 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: openafs-server : add ansible roles for OpenAFS servers https://review.opendev.org/c/opendev/system-config/+/771159 | 21:08 |
fungi | yeah i think it maybe didn't start when we rebooted the server a few days ago for the openafs update? | 21:09 |
fungi | starting it now | 21:09 |
ianw | i agree, doesn't look like it crashed | 21:09 |
fungi | graphite was reporting only 11 executors too, so i expect the others are still running | 21:09 |
fungi | what's the easiest way to tell if the console stream interface is still running on an executor? whether or not there's a process listening on 7900/tcp? | 21:12 |
ianw | yep i think so. oom messages in dmesg is usually a smoking gun | 21:12 |
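For checking the console-stream port, a quick socket probe does the trick (a generic sketch, nothing zuul-specific about it):

```python
import socket

def port_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if something is accepting TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. port_listening("ze01.opendev.org", 7900) for the finger gateway
```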
ianw | using bionic afs on debian should be self-testing with the integration tests right? | 21:13 |
fungi | i think so? | 21:13 |
fungi | trying the focal builds was at least | 21:13 |
fungi | which is why it got switched to bionic | 21:14 |
ianw | oh, in those 5 seconds zuul reported | 21:14 |
ianw | yeah, that tests afs client install and access, so must be good | 21:14 |
fungi | so should i hold off letting any other mirrors resync to afs02.dfw just yet? i guess we don't want to be in the middle of what could be a multi-day replication while we're upgrading openafs packages/rebooting things? | 21:21 |
ianw | fungi: sorry, eating toast, back now | 21:33 |
ianw | umm, i think yes maybe we should upgrade and just get it done | 21:33 |
ianw | the other thing to consider is https://review.opendev.org/c/opendev/system-config/+/770705 to release sequentially | 21:34 |
fungi | the other thing to consider about those mirror volumes, since they're not current on afs02.dfw, they won't be used, so if afs01.dfw is offline then any requests to our mirrors will result in a 403 | 21:43 |
fungi | (we saw that happen during the afs01.dfw outage earlier today as well) | 21:43 |
fungi | i suppose we could snapshot zuul's queues, restart the scheduler, and then restore them to avoid a bazillion builds failing due to our mirrors all going offline | 21:44 |
ianw | hrm, i guess we should upgrade 02, sync volumes then do 01 | 21:45 |
fungi | syncing the outstanding mirror volumes is going to require the better part of a week, just keep that in mind | 21:46 |
fungi | it's certainly a conundrum | 21:47 |
ianw | fungi: i'm thinking through a plan v2 on https://etherpad.opendev.org/p/infra-openafs-1.8 | 21:49 |
fungi | it looks like we were able to sync the mirror.yum-puppetlabs volume to afs02.dfw at around 300Mbps, | 21:51 |
ianw | we've got the db servers too | 21:51 |
fungi | so we copied 200GiB in 1.5 hours give or take | 21:52 |
fungi | maybe the other volumes won't take *too* terribly long | 21:52 |
openstackgerrit | Merged opendev/system-config master: Use our ubuntu openafs ppa on debian https://review.opendev.org/c/opendev/system-config/+/771268 | 21:58 |
ianw | i'm running 771159 manually now and watching it | 21:58 |
ianw | oh! i need to put the base64 keys in | 22:00 |
fungi | into /etc/openafs/server/KeyFileExt? | 22:06 |
ianw | into bridge heira so it gets deployed | 22:10 |
fungi | aha | 22:14 |
*** slaweq has quit IRC | 22:20 | |
*** slaweq has joined #opendev | 22:24 | |
ianw | alright, i confirmed with a manual run that it doesn't do anything crazy to afs01.ord | 22:30 |
ianw | and is also restricted to that host | 22:30 |
ianw | fungi: you ok to merge with that? | 22:30 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: openafs-server: ensure vos_release keys installed on new servers https://review.opendev.org/c/opendev/system-config/+/771284 | 22:33 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Move afs02.dfw.openstack.org to afs-1.8 group https://review.opendev.org/c/opendev/system-config/+/771285 | 22:35 |
fungi | ianw: yep, approving now, thanks for giving it a trial run! | 22:44 |
ianw | ok, when that is in, we can do a manual upgrade of afs02.dfw and switch that to ansible control. i think we should try some small releases and then decide what to do next | 22:48 |
ianw | i think i'll start work on the db side now | 22:48 |
fungi | yeah, looking over the numbers now | 23:07 |
fungi | centos may be the next one to unleash | 23:07 |
fungi | or opensuse is about the same size | 23:08 |
fungi | we're not graphing the ceph mirror volumes though, they may actually be much smaller | 23:08 |
* fungi checks | 23:08 | |
fungi | oh, yeah those will go quickly... ceph-deb-nautilus has 2.9GiB of data and ceph-deb-octopus 7.7GiB | 23:09 |
ianw | hrm, it may be that i reorganised ceph and didn't update the graphs? can't quite remember | 23:09 |
ianw | i thought i consolidated those to one volume, but maybe it was just that, a thought :) | 23:10 |
ianw | i think maybe i just cleaned up some old ones | 23:10 |
fungi | we still update and vos release them separately in the cronjobs, at least | 23:10 |
ianw | it looks like for the afsdb servers we basically just extra-install openafs-dbserver | 23:11 |
fungi | anyway i can drop the lock and manually run the reprepro script for mirror.deb-nautilus which in theory will only take a few minutes to complete | 23:12 |
ianw | i guess the "bos" setup for fileserver vs db server isn't recorded in config mgmt | 23:12 |
ianw | fungi: ++ | 23:12 |
fungi | once you're ready for testing afs02.dfw | 23:12 |
fungi | i'll be around for at least a few more hours | 23:12 |
openstackgerrit | Merged opendev/system-config master: openafs-server : add ansible roles for OpenAFS servers https://review.opendev.org/c/opendev/system-config/+/771159 | 23:14 |
*** slaweq has quit IRC | 23:14 | |
ianw | fungi: ^ let's let that deploy, then we can emergency afs02.dfw and manual upgrade; do a release test, and when good merge 771285 to manage it with ansible | 23:15 |
ianw | given that a) a lot of things aren't 100% captured by config mgmt, b) afs doesn't like ip addresses changing and c) there aren't that many complex dependencies, i'm thinking that in-place upgrades of these hosts from xenial->focal is the best idea | 23:16 |
*** brinzhang_ has quit IRC | 23:20 | |
ianw | infra-prod-install-ansible (3. attempt) | 23:21 |
ianw | i wonder why that's on its 3rd try | 23:21 |
ianw | sigh, looks like the streamer is dead https://zuul.opendev.org/t/openstack/stream/abbdc56b277646409e8410ad90d68229?logfile=console.log | 23:21 |
ianw | https://zuul.opendev.org/t/openstack/build/abbdc56b277646409e8410ad90d68229/console | 23:24 |
ianw | warning: failed to remove launch/__pycache__/sshclient.cpython-36.pyc: Permission denied | 23:24 |
ianw | i guess that must be related | 23:24 |
corvus | mordred, ianw: we have a 'yamlgroup' ansible inventory plugin in system-config; is that published anywhere else? if not, any plans to do so? | 23:27 |
ianw | corvus: i feel like maybe that was written in rush mode after we were having issues with the openstack inventory plugin that was querying all the clouds? | 23:29 |
ianw | iirc git blame might at some point show me adding regex support to it | 23:29 |
ianw | that feels like all i know :) | 23:29 |
corvus | ianw: yes, there are 2 commits, one from you and one from mordred :) | 23:30 |
corvus | mordred's commit explains why we created it; and i think it's a good generally useful plugin | 23:30 |
corvus | i'm working on an install where i'd actually like to use it in concert with the openstack inventory plugin | 23:30 |
*** brinzhang has joined #opendev | 23:31 | |
ianw | yeah, so clearly mordred was the driver; i imagine there was no specific reason to not publish it more widely | 23:31 |
corvus | okay, maybe next time mordred is around we can see if there's a good home for it (maybe publish it as a collection or something?) | 23:32 |
ianw | corvus: ++ | 23:34 |
ianw | infra-prod-ansible stopped working @ https://zuul.opendev.org/t/openstack/build/ddfee3c7d60e476a99dd266f071c49ba on 2021-01-18 15:01:54 | 23:34 |
ianw | it appears to be this bit | 23:34 |
ianw | # Clean is needed because we pushed to a non-bare repo | 23:34 |
ianw | git clean -xdf | 23:34 |
ianw | i wonder if that is new | 23:34 |
ianw | hrm ... https://opendev.org/zuul/zuul-jobs/commit/3cc366c9c6fb3424dbc7647b23e26f637feb8451 | 23:36 |
ianw | Fail mirror-workspace-git-repos if checkout failed | 23:36 |
ianw | that change looks correct | 23:37 |
ianw | i think we must have run the launch script out of /home/zuul/src/.../system-config/launch as root which created these pyc files | 23:38 |
ianw | i can remove them, but i wonder if we can stop that tree from creating them | 23:38 |
ianw | i don't think you can, easily, without setting environment variables | 23:41 |
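ianw is right that it takes an environment variable or interpreter flag; for reference, the standard knobs for suppressing `.pyc` files:

```python
import subprocess
import sys

# Three equivalent ways to stop CPython writing bytecode caches:
#   1. the -B interpreter flag
#   2. the PYTHONDONTWRITEBYTECODE=1 environment variable
#   3. setting sys.dont_write_bytecode = True before any imports run
# All three surface as sys.dont_write_bytecode in the interpreter:
out = subprocess.run(
    [sys.executable, "-B", "-c", "import sys; print(sys.dont_write_bytecode)"],
    capture_output=True, text=True,
)
print(out.stdout.strip())  # -> True
```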
openstackgerrit | Ian Wienand proposed opendev/system-config master: launch-node.py : don't write bytecode https://review.opendev.org/c/opendev/system-config/+/771289 | 23:50 |
fungi | that seems like a reasonable precaution | 23:50 |
ianw | ok, i'm going to run the clean as root, which should get things working, and ^ should help minimise harm in the future | 23:50 |
fungi | yeah, i think that's the right choice | 23:51 |
ianw | there was also playbooks/filter_plugins/__pycache__/ | 23:52 |
ianw | we'll have to see if that comes back, not sure how that's getting generated :/ | 23:52 |
fungi | timestamps on the files might yield some indication of age | 23:53 |
fungi | and thus possible frequency | 23:53 |
fungi | like if they're years old, then maybe not done by automation | 23:54 |
ianw | the launch ones were old; unfortunately i didn't realise filter_plugins would go till i ran git clean -dxf and saw it | 23:54 |
fungi | ahh, yeah | 23:54 |
ianw | i think at this point, i think a manual 1.8 upgrade of afs02.dfw and then merge https://review.opendev.org/c/opendev/system-config/+/771285 to get it into ansible, and i'll watch the deployment which hopefully can complete | 23:54 |
ianw | i'll put it in emergency and start the module builds | 23:55 |
fungi | thanks! | 23:58 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!