clarkb | there are 31 neutron changes in the queue too | 00:05 |
clarkb | which is about 20% of the queue size but likely significantly more when measured in node time | 00:07 |
clarkb | does anyone know why the tripleo-ci-centos-8-standalone jobs in the tripleo gate queue all seem to be waiting? I wonder if they are waiting on resources by a parent job? | 00:09 |
clarkb | that seems to be a big reason for why that queue isn't flushing its contents quickly | 00:09 |
clarkb | well that and the general slowness of the jobs that are running but that is normal | 00:09 |
corvus | infra-root: it looks like perhaps afs release jobs are stuck or slow; is this known? | 00:15 |
ianw | corvus: i believe due to a prior failure of afs02 everything is doing a full release | 00:16 |
corvus | neat | 00:16 |
corvus | it came to my attention because docs volumes aren't updating; from the process list, i'm guessing they have completed their release, but due to the way the cron works, they won't get another release until everything else is done | 00:17 |
clarkb | I think the tripleo jobs that aren't running depend on tripleo-ci-centos-8-content-provider which zuul says has a > 3 hour estimated runtime | 00:17 |
clarkb | so that is likely another place that can be optimized | 00:17 |
clarkb | oh I wonder if that job pauses though so the long runtime is related to waiting for the other jobs to run | 00:18 |
corvus | let me revise that: it looks like the docs release script is waiting only on the release of project.tarballs | 00:20 |
fungi | corvus: yep, i mentioned in #zuul earlier, still keeping an eye on it | 00:20 |
clarkb | https://opendev.org/openstack/tripleo-ci/src/branch/master/zuul.d/standalone-jobs.yaml#L1092 that seems to be a job that builds container images. I wonder if every tripleo content-provider job rebuilds all tripleo images which isn't quick, then other jobs sit around waiting for that to complete | 00:21 |
corvus | fungi: oh, sorry missed that | 00:21 |
fungi | all the mirror volumes are trying to do full releases to afs02.dfw which has immensely slowed the tarballs volume release, and the zuul site release is in a serialized script behind the tarballs full rerelease | 00:21 |
fungi | recovery has been underway since roughly 17:20 utc | 00:22 |
clarkb | the neutron changes also run that tripleo content-provider job as a dep for the tripleo jobs run against neutron changes | 00:23 |
clarkb | I'm running out of steam to do a zuul throughput debug but we may need to talk to the openstack TC tomorrow if this persists or gets worse | 00:24 |
clarkb | it does seem like we've got at least a couple of setups that are making it bad | 00:24 |
clarkb | I expect the build the images step is a response to docker hub doing rate limits | 00:25 |
corvus | fungi, ianw: are we sure afs01.dfw is okay? | 00:25 |
clarkb | quay.io says they won't do image download rate limits (though if you make too many lookups they rate limit those) ? I wonder if we need to work with tripleo to look at quay again | 00:25 |
corvus | i'm seeing 'vos examine' hang on volumes that are on that server | 00:26 |
clarkb | corvus: I think it stopped complaining about its cinder volume after rebooting, but I'm not sure if further verification of health was made | 00:26 |
fungi | corvus: not entirely sure since the server itself hung last week and had to be rebooted. it looked like things had recovered but possible i missed something | 00:26 |
fungi | clarkb: afs01 not 02 | 00:26 |
clarkb | fungi: oh got it | 00:27 |
fungi | for afs01 the server hung last week and had to be hard rebooted to recover (likely something related to live migration in rax) | 00:27 |
fungi | for afs02 one of its cinder volumes was on a host which became unresponsive, so the /vicepa partition remained in a read-only state until i rebooted the server | 00:28 |
fungi | corvus: where did you see hung vos examine processes? possible those were somewhere i missed after the reboot last week | 00:29 |
ianw | afs01 seems up and no bad messages as step 1 | 00:29 |
corvus | fungi: in my terminal | 00:29 |
fungi | ahh | 00:29 |
corvus | vos examine test.fedora | 00:29 |
fungi | yeah, it takes a while for me to get a response on that as well (maybe forever, hasn't returned yet) | 00:30 |
fungi | corvus: looks like there's a hung `vos examine project.tarballs` from 10 minutes ago on mirror-update.opendev.org too, is that you as root? | 00:31 |
ianw | nothing bad in the afs logs, but agree it's not looking healthy | 00:32 |
corvus | yep | 00:32 |
fungi | could it be that having so many volumes doing full releases in parallel has overwhelmed it? | 00:32 |
corvus | could be? maybe the volserver has a limited queue for answering requests and it's full of release txns? | 00:34 |
fungi | project.tarballs, mirror.yum-puppetlabs, mirror.debian, mirror.opensuse, mirror.debian-security, mirror.ubuntu-ports, mirror.epel, mirror.centos, mirror.fedora, mirror.deb-octopus, mirror.deb-docker, mirror.ubuntu-cloud, mirror.apt-puppetlabs, mirror.deb-nautilus, mirror.ubuntu | 00:34 |
fungi | those are all running simultaneously | 00:34 |
corvus | since we're not seeing any errors, probably makes sense to just leave it be until the backlog clears a bit | 00:34 |
fungi | basically the static volume releases kicked off and those went fine until it got up to project.tarballs which needed a full release and is something like 175GB of data | 00:35 |
fungi | roughly an hour into that, mirror volumes started getting vos release runs and i expect many if not most of them decided they needed a full release too | 00:35 |
fungi | and they piled up before i noticed | 00:35 |
corvus | would be cool if we could shard that by date :) | 00:36 |
corvus | next time we design a tarballs archive.... | 00:36 |
fungi | indeed | 00:36 |
*** tosky has quit IRC | 00:36 | |
fungi | in theory at the date of data transfer i'm seeing the project.tarballs full release would have only needed two hours to complete | 00:37 |
fungi | except about halfway into that it started having to compete with full releases of mirror volumes | 00:37 |
fungi | er, rate of data transfer i'm seeing | 00:37 |
fungi | we're clocking ~50Mbps into eth0 on afs02.dfw | 00:38 |
ianw | iotop on afs01 shows a lot of reads, and a lot of writes on afs02 ... it does appear to be doing *something*, although i agree none of the status commands return | 00:39 |
fungi | er, actually bad math on my part. more like 9 hours to complete a full release of 175GiB at 45Mbps (closer to what the graph indicates) | 00:41 |
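fungi's revised estimate can be sanity-checked with a quick back-of-the-envelope calculation using the figures from the discussion above (a 175 GiB volume at a sustained ~45 Mbps into afs02.dfw):

```python
# Back-of-the-envelope: how long does a full release of the
# project.tarballs volume take at the observed transfer rate?
volume_bytes = 175 * 2**30        # ~175 GiB of data, per fungi
rate_bps = 45 * 10**6             # ~45 Mbps sustained into afs02.dfw

seconds = volume_bytes * 8 / rate_bps
print(f"{seconds / 3600:.1f} hours")  # -> 9.3 hours
```

which agrees with the "more like 9 hours" figure above, before the competing mirror volume releases are factored in.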
fungi | i'm increasingly tempted to kill the mirror volume releases and hold locks for them, then remove the locks in an orderly fashion once static site volumes are done | 00:42 |
fungi | when i thought the tarballs volume could finish at any time i was less concerned about it, but at this point i'm quite sure the mirror volumes are going to need faaar longer and drag out the tarballs site finishing by a lot more than i originally estimated | 00:43 |
ianw | fungi: i once had a change out to put a global stop to mirror releases ... | 00:44 |
ianw | https://review.opendev.org/c/opendev/system-config/+/680586 | 00:45 |
fungi | thinking through this, the main risk is if i kill an in progress vos release, the volume it was releasing might need a full release instead of an incremental one. the problem volumes are likely ones doing full releases anyway and i bet they're not very far along... so maybe if i check the logs and only kill the mirror volumes doing a full release, then not much progress is lost? | 00:45 |
fungi | any of the ones where their last loglines are currently something like "Starting ForwardMulti from .* to .* on afs02.dfw.openstack.org (full release)." | 00:47 |
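A sketch of how those in-progress full releases could be tallied from the per-volume logs. Only the ForwardMulti log line comes from the discussion above; the log directory layout, filenames, and function name are illustrative assumptions:

```shell
# count_full_releases DIR
#   Print each mirror volume log in DIR whose *last* line indicates an
#   in-progress full release to afs02.dfw. Only the final line matters:
#   once a full release finishes, later log lines follow the
#   ForwardMulti one. (Directory layout is an assumption.)
count_full_releases() {
    dir=$1
    for f in "$dir"/mirror.*.log; do
        [ -e "$f" ] || continue
        tail -n 1 "$f" | \
            grep -q 'ForwardMulti from .* on afs02\.dfw\.openstack\.org (full release)' \
            && echo "$f"
    done
}
```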
auristor | fungi: the most likely reason that "vos status afs01.dfw.openstack.org" is failing to return is that there are no available threads to process the request. | 00:48 |
fungi | auristor: sounds likely, thanks! | 00:49 |
fungi | yeah, currently 14 mirror volumes all in the midst of a full release to afs02.dfw according to their respective logs | 00:49 |
auristor | although rxdebug reports that there are currently 10 calls waiting for a thread and two threads are idle | 00:49 |
fungi | plus the project.tarballs volume | 00:49 |
*** d34dh0r53 has quit IRC | 00:50 | |
ianw | the other thing is, we'd be in a much better position to put in a sequential lock for the mirror volume releases now that they're all done from the same server | 00:50 |
fungi | great point | 00:51 |
fungi | well, except for the wheel volumes i think? | 00:52 |
fungi | but those aren't a huge problem | 00:52 |
*** d34dh0r53 has joined #opendev | 00:54 | |
*** auristor has quit IRC | 00:56 | |
openstackgerrit | melanie witt proposed opendev/elastic-recheck master: Add query for bug 1911574 https://review.opendev.org/c/opendev/elastic-recheck/+/770688 | 00:59 |
openstack | bug 1911574 in OpenStack-Gate "SSH to guest sometimes fails publickey authentication: AuthenticationException: Authentication failed." [Undecided,New] https://launchpad.net/bugs/1911574 | 00:59 |
fungi | actually a more careful analysis of the mirror update logs indicates there are only 8 with a full release in progress so i'll stop those first and put locks in place, then see which others queue up and whether they want full releases too | 01:00 |
*** auristor has joined #opendev | 01:07 | |
*** auristor has quit IRC | 01:19 | |
*** auristor has joined #opendev | 01:20 | |
fungi | okay, i've stopped releases underway for the following and held locks in a root screen session on mirror-update.o.o: centos, epel, fedora, opensuse, yum-puppetlabs, debian-security, debian, ubuntu-ports | 01:21 |
auristor | what do you mean by "stopped releases"? do you mean you killed the vos processes? that isn't going to cancel the transfers between afs01.dfw and afs02.dfw. it will result in the transferred data being discarded after the transfer completes. | 01:25 |
fungi | yeah, i still need to do something about the transactions, presumably | 01:25 |
fungi | just trying to make sure our scripts don't restart them once i have | 01:25 |
auristor | until there are available threads to process incoming rpcs there is nothing that can be done. | 01:26 |
fungi | ahh, even the vos endtrans calls will queue up i guess | 01:26 |
fungi | so need to wait for at least one of them to complete | 01:26 |
auristor | one of the design flaws in openafs that was addressed by auristorfs is that you cannot terminate the transfer between the volservers | 01:27 |
auristor | also, I strongly recommend that no more than 5 volume operations be permitted in flight at a time. | 01:27 |
fungi | thanks, sounds like our plan to start setting up a semaphore of some sort with these is a good one in that case | 01:28 |
fungi | usually it's not a problem, but if something happens to the server and most of the volumes have to get full releases they'll pile up | 01:29 |
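auristor's "no more than 5 volume operations in flight" advice could be enforced with something as simple as a pool of flock(1) slots around each vos invocation. A minimal sketch; the function name, lock directory, and retry interval are illustrative assumptions:

```shell
# Run a vos operation while holding one of N lock slots, so at most
# N volume operations are ever in flight (N=5 per the advice above).
# Lock directory and slot count are illustrative.
with_vos_slot() {
    slots=${VOS_SLOTS:-5}
    lockdir=${VOS_LOCKDIR:-/var/run/vos-slots}
    mkdir -p "$lockdir"
    while :; do
        for i in $(seq 1 "$slots"); do
            # -n: fail fast if this slot is held; -E 200 distinguishes
            # "slot busy" from the wrapped command's own exit status.
            flock -n -E 200 "$lockdir/slot$i.lock" "$@"
            rc=$?
            [ "$rc" -ne 200 ] && return "$rc"
        done
        sleep 10   # every slot busy; wait and retry
    done
}

# e.g.: with_vos_slot vos release mirror.centos -localauth
```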
*** cloudnull has quit IRC | 01:29 | |
openstackgerrit | xinliang proposed openstack/diskimage-builder master: Fix building error with element dracut-regenerate https://review.opendev.org/c/openstack/diskimage-builder/+/770241 | 01:53 |
*** chateaulav has joined #opendev | 02:29 | |
*** chateaulav has quit IRC | 02:33 | |
*** mlavalle has quit IRC | 02:34 | |
openstackgerrit | xinliang proposed openstack/diskimage-builder master: Fix centos 8.3 partition image building error with element iscsi-boot https://review.opendev.org/c/openstack/diskimage-builder/+/770701 | 03:00 |
openstackgerrit | Jeremy Stanley proposed opendev/engagement master: Initial commit https://review.opendev.org/c/opendev/engagement/+/729293 | 03:01 |
openstackgerrit | Jeremy Stanley proposed opendev/engagement master: Initial commit https://review.opendev.org/c/opendev/engagement/+/729293 | 03:04 |
openstackgerrit | xinliang proposed openstack/diskimage-builder master: Add rhel support for iscsi-boot https://review.opendev.org/c/openstack/diskimage-builder/+/770702 | 03:25 |
*** brinzhang0 has joined #opendev | 03:27 | |
*** brinzhang0 has quit IRC | 03:29 | |
*** brinzhang0 has joined #opendev | 03:29 | |
*** brinzhang_ has quit IRC | 03:30 | |
*** brinzhang0 has quit IRC | 03:30 | |
*** brinzhang0 has joined #opendev | 03:31 | |
openstackgerrit | xinliang proposed openstack/diskimage-builder master: Add aarch64 support for rhel https://review.opendev.org/c/openstack/diskimage-builder/+/770703 | 03:37 |
openstackgerrit | Merged opendev/system-config master: Publish structured data listing our ML archives https://review.opendev.org/c/opendev/system-config/+/751125 | 03:38 |
*** whoami-rajat__ has joined #opendev | 04:22 | |
*** artom has quit IRC | 04:24 | |
*** amotoki has quit IRC | 04:44 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: vos-release: implement sequential release lock https://review.opendev.org/c/opendev/system-config/+/770705 | 04:44 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: mirror-update: create common timeout set https://review.opendev.org/c/opendev/system-config/+/770706 | 04:44 |
*** amotoki has joined #opendev | 04:44 | |
ianw | fungi: ^ | 04:45 |
ianw | mnaser: so we have two vexxhost backups; the old bup one @ backup01.sjc-1.vexxhost.opendev.org and the new borg one @ backup02.sjc-1.vexxhost.opendev.org | 04:53 |
ianw | mnaser: the old one has 3tb attached, 2.4 taken, so i can't take any drives out of that. the new one only has 1tb attached and is at 98% | 04:54 |
ianw | mnaser: i think ideally we'd like to keep the old bup 3tb of backups for a while, just as ... well ... a backup. but we're out of quota to add more disk to the new server | 04:55 |
ianw | mnaser: let me know what you think. if adding extra is an issue, we could probably drop the old bup backups on vexxhost and just leave one copy in RAX; i.e. effectively free up the 3tb of the old server and allocate to new | 04:56 |
*** ykarel has joined #opendev | 05:22 | |
*** ykarel has quit IRC | 05:39 | |
*** ykarel has joined #opendev | 05:41 | |
*** ysandeep|out is now known as ysandeep|afk | 05:43 | |
*** ykarel_ has joined #opendev | 05:54 | |
*** ykarel has quit IRC | 05:57 | |
ianw | fungi: ooohhhh ... hrm 770705 & 770706 actually have a little dependency on production in testing. it does a no-op test of the vos-release script in the testinfra | 06:03 |
ianw | fungi: the problem being that it does a vos examine (or whatever) on the volumes to determine if they need to have release run on them. that is timing out due to the prior discussion, failing the test | 06:04 |
ianw | i dunno what to do about that. probably just leaving it till everything settles down is the best course of action at this point | 06:04 |
*** ykarel__ has joined #opendev | 06:05 | |
*** marios has joined #opendev | 06:07 | |
*** ykarel__ is now known as ykarel | 06:07 | |
*** ykarel_ has quit IRC | 06:08 | |
*** brinzhang_ has joined #opendev | 06:25 | |
*** brinzhang0 has quit IRC | 06:28 | |
*** auristor has quit IRC | 06:43 | |
*** ysandeep|afk is now known as ysandeep | 07:21 | |
*** eolivare has joined #opendev | 07:42 | |
*** jaicaa has quit IRC | 07:44 | |
*** jaicaa has joined #opendev | 07:45 | |
*** rpittau|afk is now known as rpittau | 07:47 | |
*** ralonsoh has joined #opendev | 07:51 | |
*** jpena|off is now known as jpena | 07:52 | |
*** ralonsoh_ has joined #opendev | 07:56 | |
*** ralonsoh has quit IRC | 07:59 | |
*** slaweq has joined #opendev | 08:03 | |
*** fressi has joined #opendev | 08:05 | |
danpawlik | ianw, fungi: Hey, do you have some issue with AFS mirror? | 08:05 |
danpawlik | seems that the release dates have not been updated for a day for all distros | 08:05 |
*** sgw has quit IRC | 08:11 | |
*** zoharm has joined #opendev | 08:14 | |
*** andrewbonney has joined #opendev | 08:17 | |
*** hashar has joined #opendev | 08:22 | |
*** ralonsoh has joined #opendev | 08:28 | |
*** ralonsoh_ has quit IRC | 08:28 | |
zoharm | Hi all, would like to ask here for some pointers regarding setting up Cinder volume backend driver 3rd party CI. | 08:31 |
zoharm | We currently have devstack streamlined to run with our storage backend and are able to launch successful tempest runs. My question/request is, what are some useful resources documenting the integration points needed for Gerrit, initiating assigned CI runs, and publishing results? | 08:31 |
zoharm | And any recommendations for the setup architecture would be greatly appreciated! Thank you! | 08:32 |
*** fp4 has joined #opendev | 08:35 | |
*** fp4 has quit IRC | 08:39 | |
*** fp4 has joined #opendev | 08:41 | |
*** brinzhang0 has joined #opendev | 08:45 | |
*** brinzhang_ has quit IRC | 08:48 | |
*** tosky has joined #opendev | 08:49 | |
openstackgerrit | Merged zuul/zuul-jobs master: upload-artifactory: no_log upload task https://review.opendev.org/c/zuul/zuul-jobs/+/768111 | 08:51 |
*** sgw has joined #opendev | 08:51 | |
*** slaweq has quit IRC | 08:55 | |
frickler | danpawlik: yes, one afs node crashed yesterday and we are still trying to get things back into sync | 09:00 |
*** slaweq has joined #opendev | 09:00 | |
zbr | Am i the only one who quite often finds gerrit stuck forever with "Loading..."? | 09:08 |
zbr | i have to reload the page again to fully load it, as when it happens it almost never finishes. | 09:09 |
zbr | it did happen in the past but during the last week it has become very common | 09:09 |
*** ykarel_ has joined #opendev | 09:13 | |
zbr | frickler: low hanging, https://review.opendev.org/c/opendev/git-review/+/770556 thanks. | 09:14 |
*** ykarel has quit IRC | 09:16 | |
openstackgerrit | Martin Kopec proposed opendev/system-config master: WIP Deploy refstack with ansible docker https://review.opendev.org/c/opendev/system-config/+/705258 | 09:29 |
*** fressi has quit IRC | 09:39 | |
*** fressi has joined #opendev | 09:40 | |
*** ykarel_ is now known as ykarel | 10:06 | |
openstackgerrit | Andy Ladjadj proposed zuul/zuul-jobs master: [ensure-python] install python version only if not present https://review.opendev.org/c/zuul/zuul-jobs/+/770656 | 10:14 |
*** sgw1 has joined #opendev | 10:15 | |
*** sgw has quit IRC | 10:16 | |
*** lpetrut has joined #opendev | 10:33 | |
*** sshnaidm|afk is now known as sshnaidm|ruck | 10:40 | |
openstackgerrit | Andy Ladjadj proposed zuul/zuul-jobs master: [ensure-python] install python version only if not present https://review.opendev.org/c/zuul/zuul-jobs/+/770656 | 10:56 |
*** dtantsur|afk is now known as dtantsur | 10:56 | |
*** brinzhang has joined #opendev | 10:57 | |
*** brinzhang has quit IRC | 10:58 | |
*** brinzhang has joined #opendev | 10:58 | |
*** brinzhang0 has quit IRC | 10:59 | |
mnaser | ianw: how long would you keep it up? | 11:03 |
openstackgerrit | Andy Ladjadj proposed zuul/zuul-jobs master: [ensure-python] install python version only if not present https://review.opendev.org/c/zuul/zuul-jobs/+/770656 | 11:06 |
openstackgerrit | Andy Ladjadj proposed zuul/zuul-jobs master: [ensure-python] install python version only if not present https://review.opendev.org/c/zuul/zuul-jobs/+/770656 | 11:07 |
*** ysandeep is now known as ysandeep|afk | 11:17 | |
*** DSpider has joined #opendev | 11:19 | |
*** brinzhang has quit IRC | 11:25 | |
*** brinzhang has joined #opendev | 11:25 | |
*** brinzhang_ has joined #opendev | 11:26 | |
*** brinzhang_ has quit IRC | 11:28 | |
*** brinzhang_ has joined #opendev | 11:29 | |
*** brinzhang has quit IRC | 11:30 | |
*** fressi has quit IRC | 11:32 | |
*** fressi has joined #opendev | 11:39 | |
*** hashar is now known as hasharLunch | 11:54 | |
*** fressi has quit IRC | 11:54 | |
*** brinzhang0 has joined #opendev | 12:03 | |
*** brinzhang_ has quit IRC | 12:05 | |
openstackgerrit | Andy Ladjadj proposed zuul/zuul-jobs master: [ensure-python] install python version only if not present https://review.opendev.org/c/zuul/zuul-jobs/+/770656 | 12:21 |
*** hasharLunch is now known as hashar | 12:22 | |
*** jpena is now known as jpena|lunch | 12:33 | |
*** auristor has joined #opendev | 12:39 | |
*** ysandeep|afk is now known as ysandeep | 12:52 | |
*** michael-mcaleer has joined #opendev | 12:58 | |
michael-mcaleer | Hi OpenDev team, I have a question around editing watchlists in review.opendev.org. The docs say to go through Settings > Watched Projects but since the recent changes to gerrit I seem to have lost the ability to add/remove watched tags or projects. I am trying to unsubscribe from tags from cinder after moving teams | 13:01 |
michael-mcaleer | Can you help point me in the right direction here? Thanks! | 13:01 |
frickler | michael-mcaleer: I think https://review.opendev.org/settings/#Notifications should be what you are looking for | 13:08 |
openstackgerrit | Martin Kopec proposed opendev/system-config master: WIP Deploy refstack with ansible docker https://review.opendev.org/c/opendev/system-config/+/705258 | 13:10 |
michael-mcaleer | Thanks frickler, I was able to find what I was looking for | 13:13 |
michael-mcaleer | I also needed to remove it from launchpad, that was where I was going wrong | 13:13 |
*** brinzhang0 has quit IRC | 13:13 | |
*** brinzhang0 has joined #opendev | 13:14 | |
*** chateaulav has joined #opendev | 13:16 | |
*** sboyron has joined #opendev | 13:26 | |
*** jpena|lunch is now known as jpena | 13:29 | |
*** artom has joined #opendev | 13:30 | |
*** zul has quit IRC | 13:49 | |
*** ykarel has quit IRC | 14:02 | |
slaweq | hi fungi and other infra-root guys, can You take a look at | 14:04 |
slaweq | https://review.opendev.org/c/zuul/zuul-jobs/+/762650? | 14:04 |
slaweq | thx in advance | 14:04 |
*** ysandeep is now known as ysandeep|cinder_ | 14:14 | |
*** ysandeep|cinder_ is now known as ysandeep|session | 14:14 | |
openstackgerrit | Alfredo Moralejo proposed zuul/zuul-jobs master: Rename config repos file config for CentOS Stream https://review.opendev.org/c/zuul/zuul-jobs/+/770815 | 14:16 |
*** fressi has joined #opendev | 14:16 | |
openstackgerrit | Benedikt Löffler proposed zuul/zuul-jobs master: Pass environment variables to 'tox envlist config' task https://review.opendev.org/c/zuul/zuul-jobs/+/770819 | 14:25 |
*** fressi has quit IRC | 14:32 | |
*** fbo has quit IRC | 14:45 | |
*** fbo has joined #opendev | 14:47 | |
fungi | danpawlik: yes, there was a catastrophic storage failure for afs02.dfw due to a cinder outage in the provider, it's being rewritten very slowly but volume releases are delayed due to limited bandwidth to complete that | 14:50 |
guillaumec | zuul-promote-docs doesn't seem to update Zuul Documentation https://zuul-ci.org/docs/zuul/index.html, recent https://review.opendev.org/644927 and https://review.opendev.org/732066 doc update aren't online | 14:50 |
fungi | oh, i see frickler replied further down in my scrollback | 14:50 |
danpawlik | fungi, frickler: thanks for the information | 14:51 |
fungi | guillaumec: yes, we're roughly a day backlogged on replicating afs volumes due to a catastrophic cinder failure in rackspace | 14:51 |
guillaumec | fungi, ok | 14:51 |
fungi | and that site is served out of a read-only afs volume replica | 14:51 |
fungi | it will update once all the replication catches back up | 14:52 |
frickler | slaweq: can you consider ianw's remark? would it be possible to just rerun this task when you need it instead of using the new script? | 14:54 |
*** cloudnull has joined #opendev | 14:55 | |
slaweq | frickler: but can I run this ansible task from devstack directly? | 15:00 |
slaweq | currently the problem is that: | 15:00 |
slaweq | 1. zuul installs ovs and configures bridges for infra connectivity | 15:01 |
*** sgw1 has left #opendev | 15:01 | |
slaweq | 2. devstack runs and if ovn module is used there, it removes ovs installed previously from packages and installs everything from source | 15:01 |
*** ykarel has joined #opendev | 15:02 | |
slaweq | 3. and then all those settings made by zuul and this role are gone | 15:02 |
slaweq | frickler: that's why I wanted to have a simple script which I can call from the devstack plugin | 15:02 |
openstackgerrit | Martin Kopec proposed opendev/system-config master: WIP Deploy refstack with ansible docker https://review.opendev.org/c/opendev/system-config/+/705258 | 15:07 |
*** michael-mcaleer has quit IRC | 15:17 | |
*** sboyron has quit IRC | 15:31 | |
*** hashar has quit IRC | 15:33 | |
*** lpetrut has quit IRC | 15:33 | |
*** mtreinish has joined #opendev | 15:41 | |
*** JayF has joined #opendev | 15:50 | |
openstackgerrit | Matthew Thode proposed openstack/project-config master: update gentoo from python 3.6 to python 3.8 https://review.opendev.org/c/openstack/project-config/+/770828 | 15:55 |
*** ysandeep|session is now known as ysandeep | 15:58 | |
*** cloudnull has quit IRC | 15:59 | |
clarkb | slaweq: are ovn and ovs not able to coexist because they share the same kernel configuration? | 16:01 |
clarkb | or maybe the same module in the kernel | 16:01 |
slaweq | clarkb: it's not that they can't coexist, ovn requires ovs to be running | 16:02 |
slaweq | clarkb: but the problem in our case is that ovs installed from packages is using a different ovsdb file, different sockets, etc. | 16:02 |
clarkb | slaweq: right I undersatnd that, I think I'm trying to understand why you must flush our existing config | 16:02 |
clarkb | we've intentionally tried to set it up such that it uses very high vxlan numbers and is otherwise out of the way | 16:02 |
openstackgerrit | Matthew Thode proposed openstack/project-config master: update gentoo from python 3.6 to python 3.8 https://review.opendev.org/c/openstack/project-config/+/770828 | 16:03 |
slaweq | so when we are installing new ovs from source, it doesn't see anything that was created earlier when "packaged ovs" was running | 16:03 |
clarkb | I see it completely ignores the preexisting config | 16:03 |
slaweq | clarkb: config is flushed by reinstallation of ovs from source | 16:03 |
clarkb | my concern with adding scripts there is that they won't be tested 99% of the time. I think if we are going to go that route then the role should run those scripts as its setup too | 16:03 |
clarkb | I know others will object to replacing ansible with bash (though I'm personally less concerned about that) | 16:04 |
slaweq | clarkb: I can do that of course but as You said, will others be happy with that? | 16:04 |
*** cloudnull has joined #opendev | 16:05 | |
slaweq | I just need a simple way to "reconfigure" that br-infra bridge again, after ovs is installed from source by the ovn devstack plugin | 16:05 |
openstackgerrit | Martin Kopec proposed opendev/system-config master: WIP Deploy refstack with ansible docker https://review.opendev.org/c/opendev/system-config/+/705258 | 16:05 |
clarkb | slaweq: I think if we are going to supply a script to do this but don't run it as part of the role it is very likely to regress the next time we make a change to the role | 16:05 |
clarkb | which is why I think we should run the scripts in the role too. Maybe ask in #zuul and see if others have strong opinions about replacing ansible with bash | 16:06 |
slaweq | clarkb: ok, I will ask | 16:06 |
slaweq | but not today as I'm almost done for today | 16:06 |
fungi | right, it's really a zuul project, not an opendev project | 16:06 |
clarkb | slaweq: yes enjoy your evening | 16:06 |
slaweq | thx for checking that | 16:06 |
fungi | (zuul-jobs is the zuul standard library in essence) | 16:06 |
clarkb | fungi: did the backlog of changes in zuul come up in the tc meeting? | 16:07 |
clarkb | looks like it has grown significantly since I last looked :/ | 16:08 |
clarkb | looks like pylint caused neutron to reset the gate recently too | 16:08 |
clarkb | I didn't realize any openstack projects used pylint (for this very reason) | 16:09 |
fungi | it did not come up, no | 16:09 |
fungi | wasn't on the agenda i don't think | 16:09 |
*** ykarel is now known as ykarel|away | 16:24 | |
clarkb | fungi: for the afs release is the list of volumes to release shrinking ? | 16:28 |
*** mlavalle has joined #opendev | 16:29 | |
*** hashar has joined #opendev | 16:31 | |
*** zbr3 has joined #opendev | 16:38 | |
*** zbr3 has quit IRC | 16:39 | |
*** zbr9 has joined #opendev | 16:40 | |
*** zbr has quit IRC | 16:40 | |
*** zbr9 is now known as zbr | 16:40 | |
*** ykarel|away has quit IRC | 16:46 | |
openstackgerrit | Martin Kopec proposed opendev/system-config master: WIP Deploy refstack with ansible docker https://review.opendev.org/c/opendev/system-config/+/705258 | 16:49 |
fungi | clarkb: not really, no, because they're all volumes which would require multiple days to do a full release and all got triggered on top of each other | 16:49 |
clarkb | got it | 16:50 |
fungi | and as was subsequently pointed out, killing the vos release processes doesn't terminate the data transfer transactions, just makes them get discarded when they complete, so there's still no open rpc slots to actually tell afs to do anything else | 16:50 |
clarkb | oh TIL | 16:51 |
*** marios has quit IRC | 17:03 | |
*** diablo_rojo has joined #opendev | 17:04 | |
JayF | https://lists.openafs.org/pipermail/openafs-info/2021-January/043013.html in case you were not aware; IDK what version of OpenAFS you all run, but this is particularly nasty. | 17:06 |
*** jpena is now known as jpena|off | 17:07 | |
*** eolivare has quit IRC | 17:07 | |
clarkb | fungi: ^ I think our restarts were prior to that date. But I assume subsequent restarts will completely break us | 17:09 |
clarkb | thats a joy | 17:09 |
fungi | looks like we're using 1.8.6 on focal | 17:09 |
clarkb | fungi: our servers likely pose a bigger issue they are still 1.6 iirc | 17:09 |
JayF | folks were talking about it in #lopsa -- apparently almost any change, client or server side, can trigger it | 17:10 |
fungi | also using 1.8.6 on bionic | 17:10 |
clarkb | oh our servers are 1.8 ? I guess that is "good" | 17:11 |
fungi | at least some of our mirrors are bionic and focal, not sure which ones might be xenial still | 17:11 |
clarkb | I didn't think we had upgraded the servers | 17:11 |
clarkb | fungi: the afs servers themselves | 17:11 |
fungi | ahh | 17:11 |
fungi | yeah, afs01.dfw at least is still using 1.6.15 | 17:12 |
clarkb | I see that email says <1.8 might not be affected in the same way "further research needed" | 17:13 |
fungi | all three fileservers are 1.6.15 | 17:13 |
clarkb | also specifically calls out unauthenticated there | 17:13 |
clarkb | but I thnik our servers talk to each other authenticated? | 17:13 |
fungi | zuul executors are also using 1.8.6 (but on xenial) | 17:14 |
clarkb | fungi: I think the 1.8 packages we install are out of our own ppa | 17:15 |
clarkb | so in theory we can apply those patches and rebuild in the ppa | 17:15 |
fungi | yeah | 17:15 |
fungi | that's a great point | 17:15 |
clarkb | JayF: fungi ya reading the earlier messages it sounds like any action like a vos * starts a new rx stack and can hit this | 17:16 |
clarkb | so it isn't just if we restart services or reload the kernel module | 17:16 |
*** chateaulav has quit IRC | 17:16 | |
clarkb | looking at the change in gerrit it says authenticated calls will fail because it detects the mismatch between the id and the auth state | 17:18 |
*** rjcv has joined #opendev | 17:22 | |
*** rjcv has quit IRC | 17:25 | |
fungi | unrelated to the bug fix, but... | 17:28 |
fungi | infra-root: probably the biggest impact to users while we're still waiting to get vos releases back on track is the static site content volumes... what if we temporarily patched the vhost configs on static.o.o to hit the read-write volume path instead of the read-only path? is that a terrible idea? keep in mind we're probably looking at some time next week before everything is back to normal | 17:28 |
clarkb | I think we have done it in the past | 17:29 |
clarkb | fungi: I don't know if doing that will trip over that bug though | 17:29 |
clarkb | it's possible that changing the target will cause a new rx stack to be created and then we'll break there too? | 17:29 |
clarkb | fungi: I'm not opposed to the idea, I think the biggest risk with it is if it trips on the afs bug. Maybe we can attempt to test things somehow with different hosts? | 17:33 |
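For context, the switch fungi is proposing amounts to a one-line vhost change per site. A minimal sketch, assuming the usual AFS convention that a dot prefix on the cell name selects the read-write path, and using an illustrative site directory:

```apache
# Normal operation: serve from the read-only replica
# DocumentRoot /afs/openstack.org/project/static.example.org
# Temporary workaround: the dot-prefixed mount is the read-write volume
DocumentRoot /afs/.openstack.org/project/static.example.org
```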
clarkb | though this may not be a problem until there is >1 new connection since they won't share the ids until there is >1 | 17:34 |
* JayF feels like he played the role of messenger of pain this morning | 17:34 | |
fungi | JayF: nah, appreciate the heads up | 17:35 |
*** ysandeep is now known as ysandeep|out | 17:35 | |
clarkb | https://launchpad.net/~openstack-ci-core/+archive/ubuntu/openafs is what we use to install 1.8 on the clients | 17:37 |
clarkb | so I guess we can put those two patches into that then do a forced update on openafs on the openafs clients? | 17:38 |
clarkb | then figure out what it means for the servers? | 17:38 |
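The forced update clarkb mentions would be the standard apt upgrade-in-place once the PPA publishes; a sketch, assuming the usual OpenAFS client package names:

```console
# apt-get update
# apt-get install --only-upgrade openafs-client openafs-krb5 openafs-modules-dkms
```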
fungi | well, once it builds | 17:38 |
clarkb | JayF: you haven't come across any more concrete info for 1.6 have you? | 17:39 |
clarkb | I'm kind of reading it like it doesn't have the specific issue but lacks randomness so can hit this in other ways | 17:39 |
JayF | The only reason I even know about this is a couple of folks (billings and ENOMAD) were talking about the problem in #lopsa | 17:39 |
clarkb | JayF: thanks | 17:39 |
JayF | Might not hurt to drop in there and ask billings? but I wouldn't have high expectations he has an answer | 17:39 |
clarkb | ya I think we can work off of what is published now then go digging when we get to that point | 17:40 |
clarkb | (to avoid bugging people who are likely triaging their stuff) | 17:40 |
JayF | it does sound to me like having 1.6 server doesn't solve it though; 1.8 clients can trigger it just on their own | 17:40 |
JayF | but IMBW, and I haven't worked with AFS in ~10 years | 17:40 |
fungi | right, i think we need to push the fixes into our lp forks for now and then trigger ppa rebuilds | 17:41 |
clarkb | fungi: ya I think that is step 0 | 17:41 |
fungi | might be worth considering updating our servers to 1.8.6 from the ppa as well i guess | 17:41 |
clarkb | reading the comments on the fixes the first fix only makes this work 50% of the time. Need the second fix to get it much better than that | 17:41 |
clarkb | fungi: the reason we haven't done that is the 1.6 -> 1.8 migration requires an outage aiui | 17:42 |
clarkb | but maybe if we're gonna have an outage anyway this is our opportunity | 17:42 |
clarkb | but ya ++ to updating the ppa | 17:42 |
*** rpittau is now known as rpittau|afk | 17:42 | |
clarkb | fungi: the first fix has merged but the second has not yet | 17:43 |
clarkb | but we likely need both (not sure if we want to wait for them to merge the second) | 17:43 |
*** dtantsur is now known as dtantsur|afk | 18:10 | |
*** andrewbonney has quit IRC | 18:17 | |
*** hashar is now known as hasharAway | 18:29 | |
*** diablo_rojo has quit IRC | 18:34 | |
*** hasharAway has quit IRC | 18:34 | |
*** slaweq has quit IRC | 18:38 | |
*** ralonsoh has quit IRC | 18:41 | |
*** diablo_rojo has joined #opendev | 18:43 | |
fungi | okay, lemme hydrate real quick and then i can look at ppa updates for realz | 18:53 |
fungi | also ianw might have suggestions once he's around | 18:53 |
fungi | he seems to have done the most recent uploads for it | 18:54 |
clarkb | fwiw it looks like the second change to fix things has merged so should be good to grab that version | 18:54 |
fungi | oh, awesome, that was fast | 18:54 |
clarkb | I've also been poking at figuring out build stuff in a container but it is slow going as I haven't done this in like 15 years | 18:55 |
clarkb | learning a lot about various things like quilt :) right now I'm just trying to do a local build of the existing package source though | 18:55 |
clarkb | then will try to figure out quilt for applying the patches then rebuild. I'm not in a good spot to test this before uploading though because my container is running on suse with different kernels and all that | 18:55 |
clarkb | all that to say fungi please don't stop looking at it too :) | 18:55 |
fungi | i'll get some pbuilder chroots set up for xenial, bionic and focal on my workstation but that'll need a few minutes | 18:56 |
fungi | creating ubuntu chroots is its own rabbit hole | 19:07 |
clarkb | I used docker which may or may not present problems too :) | 19:09 |
clarkb | I'm sure if you do debian packaging this all makes immediate sense via looking at the debian/ dir in the source but what is the best way to say I want this git diff to be a quilt patch? | 19:10 |
clarkb | my local build succeeds but tests fail. I'm now going to fiddle with patches and see if I can make it do a thing | 19:12 |
fungi | if you do debian packaging regularly you already have clean chroots on hand with the dev tools installed into them and your gpg keys mapped in | 19:18 |
fungi | mmdebstrap was throwing weird gpg errors trying to create ubuntu chroots (and it's not that i'm missing the ubuntu archive keys, those are definitely already installed) so i'm updating the packages on my workstation to rule out something being stale/behind | 19:19 |
*** whoami-rajat__ has quit IRC | 19:20 | |
fungi | yeesh, our node request backlog is up over 4k now | 19:27 |
clarkb | ya nova neutron glance etc are backed up >24 hours now | 19:29 |
clarkb | I brought it up with the tc and called out a couple of the things I noticed and sounds like people are looking at it now | 19:29 |
fungi | clarkb: probably easiest way to get those patches now: https://git.openafs.org/?p=openafs.git;a=patch;h=a3bc7ff1501d51ceb3b39d9caed62c530a804473 https://git.openafs.org/?p=openafs.git;a=patch;h=2c0a3901cbfcb231b7b67eb0899a3133516f33c8 | 19:30 |
clarkb | fungi: thanks, I did manage to get them from gerrit directly eventually | 19:30 |
clarkb | dumping them into the patches dir and updating the series file by hand seems to work when I run quilt push -a | 19:31 |
clarkb | now i'm looking to verify the diff between that state and the old state (but I didn't record the old state first so now learning how to revert with quilt) | 19:31 |
fungi | right, that's easier than learning to use quilt push/pop | 19:31 |
fungi | and apply | 19:31 |
fungi | clarkb: the debdiff tool can compare two source packages | 19:32 |
fungi | oh, though that's only for the filenames themselves | 19:32 |
clarkb | quilt pop worked for reverting | 19:33 |
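The by-hand quilt workflow clarkb describes boils down to a few commands; the patch filenames here are illustrative:

```console
$ cp 14491.patch 14492.patch debian/patches/
$ printf '14491.patch\n14492.patch\n' >> debian/patches/series
$ quilt push -a    # apply every patch in series order
$ quilt pop -a     # unwind them again to compare against the pristine tree
```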
fungi | debdiff manpage recommends diffoscope tool for deeper comparison of packages | 19:33 |
clarkb | and the diff looks good. I guess now I rerun the build and see if tests pass (if they don't I'm not sure what I can do next as I suspect it may be cranky about my kernel?) | 19:33 |
fungi | yeah, can't hurt, but you also might just try building the source package and pushing to lp and letting the ppa builder complain if it doesn't really build | 19:34 |
clarkb | ya but if it builds we'll start pulling it automatically in like 10 hours? | 19:35 |
clarkb | I'm just paranoid I will accidentally push an unhappy package (also I have to figure out signing still) | 19:36 |
fungi | ahh | 19:36 |
clarkb | do you know if lp will run those same tests on build? | 19:36 |
clarkb | its rebuilding with the patches applied now so should find out if the test failures are related to that soon I guess | 19:37 |
fungi | i believe it will run any autotests defined in the package build | 19:38 |
clarkb | cool. It also ran them automatically when I ran debuild | 19:39 |
clarkb | seems likely it would run them in lp too | 19:39 |
mordred | yah- the ppa builds should run the in-package tests | 19:40 |
clarkb | oh I bet I need to bump the package version too otherwise it won't upgrade? | 19:41 |
fungi | yeah, you can use the dch command or just add an entry to the debian/changelog file | 19:42 |
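A dch entry for this would look something like the following; the version string, uploader identity, and date are illustrative:

```
openafs (1.8.6-1ubuntu2~focal1) focal; urgency=medium

  * Apply upstream fixes for the January 2021 rx connection ID bug.

 -- Example Uploader <uploader@example.org>  Fri, 15 Jan 2021 20:00:00 +0000
```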
clarkb | also I'm told lunch will be ready shortly so I need to pop out for that (please don't rely on me to figure this out quickest, really going through it as a learning exercise and expecting people like fungi that understand debs better to get to the end first) | 19:42 |
fungi | you may still beat me to it, my chroot creation rabbit hole has turned up what i think may be a corrupt gnupg trustdb for apt-key | 19:43 |
fungi | can't create any ubuntu chroots until i solve why i'm getting "gpg: [don't know]: invalid packet (ctb=2d)" | 19:44 |
fungi | which is a marvellously clear error message, lemme tell you | 19:44 |
clarkb | alright that builds successfully now and creates a ton of .debs | 19:48 |
clarkb | the versioning stuff is really confusing to me. The changelogs for the ppa package are different for xenial and focal | 19:52 |
clarkb | but only in their versions | 19:52 |
clarkb | does this mean ianw uploaded version specific source packages for each one or is lp doing something smart? | 19:52 |
fungi | not sure if you can tell a ppa to backport a source package to another release, searching now | 19:54 |
mordred | clarkb: usually version specific source packages | 19:54 |
mordred | fungi: you can also do that in the LP user interface but there are times when it doesn't work awesomely | 19:55 |
clarkb | got it | 19:55 |
mordred | when I've done this before I've always created release-specific source packages - usually only different in the version | 19:55 |
clarkb | 1.8.6-1ubuntu2 is the version number I should use ya? | 19:55 |
clarkb | actually no the current version is 1.8.6-1ubuntu1~focal1 so it would be 1.8.6-1ubuntu1~focal2 ? | 19:55 |
fungi | clarkb: yeah, or you could make it 1.8.6-1ubuntu2~focal1 | 19:56 |
fungi | since you're adding patches which are not focal-specific that's probably technically more correct, but either should work | 19:56 |
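As a quick sanity check on that ordering, GNU `sort -V` treats `~` the same way dpkg does (it sorts before anything, including end of string), so it can approximate which candidate version apt would consider newest:

```shell
# '~' sorts before everything, so a ~focalN backport always sorts below
# the version it was derived from; bumping either component upgrades cleanly.
printf '%s\n' \
  '1.8.6-1ubuntu1~focal1' \
  '1.8.6-1ubuntu2~focal1' \
  '1.8.6-1ubuntu1~focal2' | sort -V
# -> 1.8.6-1ubuntu1~focal1, then 1.8.6-1ubuntu1~focal2, then 1.8.6-1ubuntu2~focal1
```

For the authoritative answer `dpkg --compare-versions` can be used on a Debian-family host.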
ianw | o/ | 19:56 |
clarkb | fungi: thanks | 19:57 |
clarkb | ianw: ohai https://lists.openafs.org/pipermail/openafs-info/2021-January/043013.html is what we are poking at | 19:57 |
fungi | there's also the fact that these are built from 1.8.6-1 in debian but now debian has a 1.8.6-4 which ubuntu has imported into its archives, may be worth taking a look at the changelog for that too | 19:57 |
clarkb | ianw: I'm learning me a debian right now the hard way but also need to pop out to lunch | 19:57 |
ianw | so we need new afs packages post haste? | 19:57 |
fungi | ianw: with upstream patches applied | 19:57 |
clarkb | ianw: yes with two upstream gerrit patches applied. Then we need to figure out how/if 1.6 on our openafs fileservers is affected | 19:57 |
fungi | not-yet-released | 19:57 |
ianw | hrm; clients not too hard as we pull from launchpad. the servers don't | 19:58 |
clarkb | ianw: ya and 1.6 doesn't have any patches yet I think. In that email I linked auristor says that investigation is necessary. One (potentially crazy) idea I threw out was the reason we haven't gone to 1.8 on the fileservers yet is that it requires a downtime iirc | 19:59 |
clarkb | but if this ends up forcing downtime maybe we do that with the patched version | 19:59 |
clarkb | ianw: also totally feel free to sort this out more quickly than me. I'm doing this as a good exercise to learn but doubt I'll be quickest | 19:59 |
ianw | :/ | 19:59 |
fungi | unrelated, but also weighing the possibility of switching static.o.o vhost configs to serve from the read-write path as we're unlikely to get the read-only replicas updating regularly again before some time next week | 20:00 |
clarkb | ianw: fwiw it almost sounds like unauthenticated client connections on 1.6 are expected to be affected but authenticated may not be? it is possible that our servers will be fine because they talk to each other auth'd? | 20:00 |
ianw | fungi: that seems very sane, iirc we've done that before in recovery situations | 20:01 |
clarkb | and its only our leaf nodes like the mirrors etc that are unauth'd | 20:01 |
fungi | executors use auth too as they're primarily write not read | 20:01 |
fungi | so static.o.o and the mirror servers are the main unauthed systems i guess | 20:01 |
clarkb | looks like there may be a third patch https://gerrit.openafs.org/14495 | 20:02 |
clarkb | which has been abandoned in favor of https://gerrit.openafs.org/14496 | 20:02 |
ianw | fungi: that one looks abandoned | 20:03 |
ianw | oh, what clarkb said :) | 20:03 |
clarkb | also people on the mailing list are reporting that 1.8.6 clients talking to 1.6 servers are failing even after being patched | 20:03 |
ianw | there's also centos to consider, but that's only for wheels | 20:03 |
clarkb | but they indicate 1.6 clients seem to be ok? | 20:03 |
clarkb | it sort of does seem like 1.6 might be less affected | 20:04 |
ianw | although we're all 1.8 clients -> 1.6 servers | 20:04 |
clarkb | correct | 20:04 |
clarkb | but the 1.6 servers also talk to each other aiui | 20:04 |
clarkb | just pointing out that the comms between servers may end up being ok (though I have no concrete assertion for that) | 20:04 |
ianw | ahh, although i guess that's kind of moot if no client can talk correctly to the servers :) | 20:05 |
clarkb | https://lists.openafs.org/pipermail/openafs-info/2021-January/043015.html | 20:06 |
fungi | turn back all the clocks to 2020? ;) | 20:06 |
fungi | i guess that won't work unless you can globally turn back the world | 20:06 |
fungi | and i don't think the world wants another 2020 | 20:06 |
ianw | maybe i should turn the clock back an hour and go back to bed :) | 20:06 |
clarkb | can I do that but for 5 hours? | 20:07 |
fungi | ianw: or forward 36 hours and start your weekend? | 20:07 |
clarkb | I do think we should try and be careful about testing this if we can. Like maybe we upload to another ppa or just copy a deb around and install it? | 20:07 |
clarkb | before we push to the normal ppa that servers will automatically update from | 20:07 |
clarkb | since that may restart services and such and then actually break us if the fix is not sufficient? | 20:08 |
clarkb | of course if things properly break then it will be moot at that point anyway | 20:08 |
clarkb | I have been told food is waiting for me. back in a bit | 20:08 |
auristor | there will not be patches for openafs 1.6 | 20:08 |
ianw | clarkb: should we spin up a common ubuntu/debian vm? | 20:08 |
clarkb | auristor: I didn't necessarily expect them, but it wasn't clear reading the mailing list stuff if it is suffering from the same problems | 20:08 |
clarkb | auristor: as at least one person indicates a 1.6 client can talk to 1.6 servers just fine? | 20:09 |
clarkb | ianw: ya maybe that is a good next step | 20:09 |
ianw | we want to be a bit careful with launchpad as it likes to only keep one version of the package | 20:09 |
auristor | its a related but different problem with all pre-1.8 openafs and non-AuriStorFS and non-Linux rxrpc clients | 20:09 |
clarkb | auristor: and does that related but different problem affect us if only the server side of afs is 1.6 ? | 20:10 |
auristor | The 1.6 issue is not as simple as "if you restart today it won't work" | 20:10 |
ianw | mnaser: i'd have to run that by infra-root, not sure we have a strict policy on keeping the old backups. maybe 6 months or so i guess? | 20:10 |
*** lpetrut has joined #opendev | 20:10 | |
auristor | the problems are always in the rx initiator. so client connections to fileserver / vlserver; fileserver to vlserver and cache managers; volserver to volserver, etc | 20:11 |
clarkb | and volserver to volserver etc is where our 1.6 side of things would be using the buggy rx initiator | 20:12 |
*** zoharm has quit IRC | 20:12 | |
clarkb | auristor: is https://gerrit.openafs.org/#/c/14496 expected to fix https://lists.openafs.org/pipermail/openafs-info/2021-January/043015.html ? | 20:14 |
clarkb | if so I guess we continue to focus on getting patched packages built, then limp along with 1.6 with its not 100% failure while we update all our 1.8 clients. Then sort out a 1.6 -> 1.8 upgrade | 20:14 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Temporarily serve static sites from AFS R+W vols https://review.opendev.org/c/opendev/system-config/+/770856 | 20:15 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Revert "Temporarily serve static sites from AFS R+W vols" https://review.opendev.org/c/opendev/system-config/+/770857 | 20:15 |
ianw | i'm just spinning up a common server to work on now | 20:17 |
clarkb | thanks. I need to pop out before my lunch gets cold | 20:17 |
clarkb | trying to start a new local build with that third patch just to see if it passes testing | 20:18 |
fungi | the good news is the fixes stack on the existing quilt patches in the source package without conflict and pass the autopkgtests | 20:18 |
ianw | np, do that :) | 20:18 |
fungi | at least the first two merged fixes do anyway | 20:18 |
clarkb | yup the third applies cleanly too | 20:19 |
*** slaweq has joined #opendev | 20:19 | |
clarkb | I just started a build with the third and will go eat now | 20:20 |
clarkb | (that will tell us if the tests pass) | 20:20 |
auristor | pretty much try not to restart any openafs clients or servers (except for patching 1.8) from now to the end of the month | 20:21 |
ianw | root@10.209.39.226 | 20:21 |
ianw | we don't tend to restart the afs servers ... but when it does happen it's usually not our choice and something wrong on the cloud provider side | 20:22 |
fungi | yup, so far it's been last week the provider decided they needed to live-migrate one server and it hung the kernel such that a hard reboot was the only option, then earlier this week they had a problem with a system serving one of the iscsi volumes which make up the lvm underlying our /vicepa on another afs server, and i had to restart it to fsck the fs and get it mounted writeable again | 20:26 |
fungi | it's not been a good couple of weeks stability-wise for that segment of our services | 20:27 |
*** slaweq has quit IRC | 20:27 | |
fungi | infra-root: anybody else feel strongly in favor or against switching to serving static site content from the rw path temporarily? https://review.opendev.org/770856 | 20:29 |
clarkb | I guess not | 20:34 |
*** sgw has joined #opendev | 20:35 | |
clarkb | ianw: thats a 10/8 address :) | 20:36 |
ianw | ok, how about 104.239.144.149 :) | 20:37 |
fungi | now that i can reach | 20:37 |
clarkb | I can hit it and need to reload my keys. Will do that once lunch is more properly finished | 20:38 |
ianw | i'm just putting 14491/2 in the patches; so far this is nothing unique over what anyone else has done right? | 20:38 |
ianw | i.e. we know that works | 20:38 |
clarkb | ianw: correct. Looks like my 14496 build succeeded locally | 20:39 |
clarkb | running the build without these patches failed for me locally | 20:39 |
clarkb | my rough setup was pull the source and deps, build clean, that failed. Apply patches, rebuild that succeeded | 20:40 |
fungi | ianw: in debian/patches and list them in debian/patches/series | 20:45 |
fungi | to make sure quilt will apply them | 20:45 |
ianw | yep, i'm running a build on that host in a screen too, just for sanity | 20:45 |
fungi | and then dch to add a new build version to the debian/changelog | 20:45 |
ianw | dpkg-source: info: applying 0011-14491.patch | 20:45 |
ianw | dpkg-source: info: applying 0012-14492.patch | 20:45 |
ianw | agree they apply clean | 20:45 |
ianw | fungi: there we differ, i tend to use the emacs mode :) | 20:46 |
fungi | ianw: sure, whatever works | 20:47 |
fungi | though if you set VISUAL or EDITOR to emacs then dch will respect that too | 20:48 |
ianw | so yeah, the only trick is the source needs to be signed by a key accepted by launchpad, and you do need to upload for each release | 20:48 |
ianw | there's no clicky button to build for different releases | 20:48 |
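The per-release upload ianw describes is a signed source-only build followed by one dput per target series; a sketch using dput's `ppa:` shorthand, with an illustrative key id and changes filename:

```console
$ debuild -S -sa -k0xDEADBEEF    # build and sign the source package
$ dput ppa:openstack-ci-core/openafs ../openafs_1.8.6-1ubuntu2~focal1_source.changes
```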
*** hashar has joined #opendev | 20:54 | |
ianw | looks like kaduk has pushed 1.8.6-5 to debian | 20:56 |
ianw | and there's talk of making 1.8.7 release | 20:56 |
clarkb | ianw: fungi before we push to lp we can install our built package on that server and then check it works? | 20:58 |
clarkb | then if it works figure out signing and push to lp? | 20:58 |
ianw | and jsbillings has already done rpm's @ https://copr.fedorainfracloud.org/coprs/jsbillings/openafs/package/openafs/ which we can import | 20:59 |
clarkb | ianw: we should make sure that they include the third fix too if we start using the upstream packaging stuff | 21:00 |
clarkb | at least I had completely missed the third change until very recently | 21:00 |
ianw | that being 14496? | 21:02 |
clarkb | yes | 21:02 |
ianw | https://gerrit.openafs.org/#/q/status:merged+project:openafs+branch:openafs-stable-1_8_x now has the backports of those 3 too | 21:02 |
ianw | i didn't have 14496 on the server, rebuilding now | 21:05 |
ianw | (the server being 104.239.144.149) | 21:06 |
*** lpetrut has quit IRC | 21:07 | |
fungi | clarkb: if you built binary packages from your patched source package, then yes you should be able to `apt install ./somepackage.deb ./otherpackage.deb` or use dpkg -i to do it for that matter | 21:09 |
clarkb | fungi: yes, except I'm in a ubuntu container on suse and my kernel is too new I think | 21:09 |
fungi | ahh | 21:09 |
clarkb | but on ianw's server we should be able to do that | 21:09 |
fungi | yeah, that will also exercise the dkms bits | 21:10 |
*** Alex_Gaynor has left #opendev | 21:10 | |
ianw | really need to add --parallel to the rules :/ | 21:10 |
clarkb | ianw: slight nit, I think fungi said the more correct package version was 1.8.6-1ubuntu2~focal1 not 1.8.6-1ubuntu1~focal2 | 21:10 |
clarkb | but both should work for now | 21:11 |
ianw | yeah i'm fairly tempted to take the 1.8.6-5 packages | 21:13 |
fungi | agreed, we could in theory just nab them from debian/sid | 21:14 |
fungi | once they land in the archive anyway | 21:14 |
ianw | we've only kept our own ppa because a) for a while there packages were way behind and we had much more bespoke work and b) iirc for whatever reason they weren't building for arm64 | 21:14 |
fungi | 1.8.6-5 is unlikely to end up in xenial at all, and may see a lengthy delay getting to bionic | 21:15 |
clarkb | when you say take the 1.8.6-5 you mean backport them to xenial and bionic from whatever is at 1.8.6-5? | 21:15 |
clarkb | fungi: ok that answers some of my question | 21:15 |
fungi | but we can grab the source packages and stuff them into our ppa, right | 21:15 |
ianw | clarkb: yeah, basically stuff those packages into our ppa | 21:15 |
fungi | even landing in focal proper will take some time because ubuntu is trickling those packages in from debian and probably not with significant urgency | 21:16 |
ianw | i imagine it just has a few more of those patches from the 1.8 stable branch | 21:16 |
fungi | yes, it will be just more stuff in the debian/patches dir | 21:16 |
ianw | ... which also all might be moot if there's a 1.8.7 release with all them too | 21:17 |
clarkb | https://salsa.debian.org/debian/openafs/-/blob/master/debian/patches/0014-Remove-overflow-check-from-update_nextCid.patch seems to show that debian's package has all three fixes in 1.8.6-5. The 0013 and 0012 patches are the other two | 21:17 |
clarkb | so ya that would probably work too assuming it also compiles for xenial and bionic and focal | 21:17 |
clarkb | (and I guess arm64) | 21:18 |
ianw | it *should* if no new build-deps, since we've been ok with all the prior versions | 21:18 |
ianw | ok, the .debs on 104.239.144.149 have all three patches. we can install them and try if we like | 21:19 |
ianw | what i'm still not clear on is if our 1.6 servers are just now broken and can't be fixed | 21:19 |
clarkb | ianw: should we install openafs-client on 104.239.144.149 first and see if we can navigate /afs/ ? | 21:19 |
fungi | since they're keeping the packaging in git on salsa, you can probably just look at the history there to see if anything about the package itself has changed other than the quilt patches | 21:20 |
clarkb | ianw: it sounds like 1.6 servers are broken in a similar way but not exactly the same and that means they aren't 100% broken like 1.8 is | 21:20 |
clarkb | ianw: and if we don't restart 1.6 servers we'll maybe be ok? | 21:20 |
clarkb | fungi: https://salsa.debian.org/debian/openafs/-/commit/be72605900a4820ce613a3c3b2bce372a203d2c6 no, that's it | 21:20 |
clarkb | at least in the last 2 months | 21:21 |
fungi | testing against our servers at the moment may be intractable because they're overrun with vos release activity in progress from yesterday and not accepting new rpc calls | 21:21 |
clarkb | there are other updates in the debain package we don't have in our ppa | 21:21 |
clarkb | fungi: well they should be able to do reads right? or are those slots also full up? | 21:21 |
fungi | you should be able to test that you can reach files, right | 21:21 |
fungi | i assumed you meant testing vos commands | 21:22 |
ianw | building modules now | 21:22 |
clarkb | ya I guess I don't actually know enough about afs to know how to sufficiently test this | 21:22 |
*** hamalq has joined #opendev | 21:22 | |
clarkb | the example on the mailing list seem to be primarily vos * commands | 21:23 |
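The vos-level checks being discussed on the list look roughly like the following; volume and server names are illustrative, and each of these opens a fresh connection, which is exactly the path that could trip the bug:

```console
$ vos status afs01.dfw.openstack.org -localauth   # transactions in progress on a server
$ vos examine mirror.ubuntu -localauth            # per-volume release state
$ vos release docs -localauth                     # the kind of call that has been failing
```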
clarkb | ianw: oh also at least for 1.8 it seems to be based on server start time | 21:24 |
clarkb | ianw: so 1.6 may not actually exhibit any problems until we restart | 21:24 |
clarkb | and then we'd be completely hosed potentially | 21:24 |
ianw | i think that's fine, but yeah, the restart may not be our choice and recent stability hasn't been reassuring in that regard | 21:25 |
fungi | looks like the other major changes in the newer package version are support for more recent linux kernels (5.8, 5.9) | 21:26 |
*** hashar has quit IRC | 21:26 | |
fungi | so yeah, honestly i would just clone https://salsa.debian.org/debian/openafs/ and debuild from that | 21:29 |
*** jrosser has quit IRC | 21:29 | |
*** ildikov has quit IRC | 21:30 | |
clarkb | lsmod shows openafs on ianw's server now | 21:30 |
ianw | # ls /afs/openstack.org/ | 21:30 |
ianw | developer-docs docs docs-old mirror project service user | 21:30 |
ianw | is promising | 21:30 |
*** ildikov has joined #opendev | 21:31 | |
*** jrosser has joined #opendev | 21:32 | |
ianw | a "find . -type f -exec md5sum {} \;" stress test is seeming good | 21:32 |
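That stress test can be reproduced against any tree; a self-contained version using a scratch directory as a stand-in for /afs/openstack.org:

```shell
# Hash every file under a tree, as in the /afs read stress test above.
tmp=$(mktemp -d)
printf 'hello\n' > "$tmp/index.html"
find "$tmp" -type f -exec md5sum {} \;   # one md5 line per file
rm -rf "$tmp"
```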
clarkb | do we try to do some more admin type commands next? | 21:32 |
clarkb | though as noted before those might hang because of full up slots for all those releases? | 21:32 |
ianw | also i haven't setup the kerberosy things | 21:33 |
clarkb | I think we document how to do that via command line switches without setting up default domains and all that | 21:33 |
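Getting an authenticated token purely from the command line, without a configured default realm, is roughly the kinit-then-aklog pair below; the principal, cell, and realm names are illustrative:

```console
$ kinit someuser.admin@OPENSTACK.ORG       # krb5 ticket, realm given explicitly
$ aklog -c openstack.org -k OPENSTACK.ORG  # exchange it for an AFS token
$ tokens                                   # verify the token was issued
```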
clarkb | but ya it seems like at least for reads this is working (which is going to be what the mirrors and static are all doing, zuul will do writes too) | 21:34 |
*** fp4 has quit IRC | 21:37 | |
ianw | ok, i have 1.8.6-5 focal/bionic/xenial packages i can upload to the ppa to build if we like | 21:40 |
clarkb | ianw: do we want to build the focal one on your test server first? or treat the testing of the three patches we've already done as good enough? | 21:41 |
clarkb | but I agree using 1.8.6-5 seems like a good option rather than maintaining an older fork | 21:41 |
ianw | the only two extra patches that has over what we just tested are for 5.8 & 5.9 kernels | 21:42 |
clarkb | probably pretty safe to push as is then | 21:42 |
ianw | sorry there is also one other for afsmonitor, a tool i don't think we use | 21:43 |
ianw | if nobody has any other opinions, i'm happy to dput these and get the ppa building them | 21:47 |
clarkb | I think that sounds like a reasonable next step. fungi ^ | 21:47 |
*** diablo_rojo has quit IRC | 21:48 | |
clarkb | looks like jproulx was hit by this too | 22:00 |
clarkb | maybe jproulx has hints on the 1.6 -> 1.8 upgrade | 22:00 |
*** fp4 has joined #opendev | 22:01 | |
mnaser | does anyone know if there's some sort of setting i'm missing, our ci is not reporting to zuul-jobs, ijust generated a new set of http credentials and im getting a 403 when its reporting | 22:21 |
mnaser | https://www.irccloud.com/pastebin/n3ZCIyP5/ | 22:22 |
mnaser | ssh streaming works because it does get enqueued | 22:22 |
clarkb | mnaser: basic auth was the only thing we changed for opendev zuul after the gerrit upgrade | 22:23 |
mnaser | yeah thats why i changed auth_type=basic | 22:23 |
clarkb | mnaser: however we just dropped the digest setting and let it default to basic | 22:23 |
clarkb | maybe explicitly setting it to basic doesn't work? | 22:23 |
mnaser | ok, dropping the basic and trying again | 22:24 |
fungi | sorry, dinner pulled me away, catching back up | 22:33 |
fungi | ianw: clarkb: dput them, yes please | 22:33 |
clarkb | fungi: I think ianw just did that | 22:34 |
ianw | yep, hitting some backport package errors with debhelper-compat packages | 22:37 |
clarkb | ianw: did you only push xenial? I wonder if the other two have the right compat level | 22:37 |
ianw | yeah, i've pushed bionic and we'll see if that works | 22:38 |
ianw | sbuild-build-depends-openafs-dummy : Depends: debhelper-compat (= 12) | 22:39 |
ianw | nope on bionic. i feel like i just dropped this previously and they built ok anyway | 22:39 |
clarkb | ianw: ya looking at the ppa debian/control files there doesn't seem to be a debhelper-compat listed but the debian/control file in the upstream salsa repo has it | 22:41 |
clarkb | (so I assume that means you dropped it in those other ones) | 22:41 |
*** DSpider has quit IRC | 22:42 | |
clarkb | https://manpages.debian.org/testing/debhelper/debhelper.7.en.html#Supported_compatibility_levels has more info | 22:42 |
fungi | good point, dh-compat needs to list a dh version available in the target distro release | 22:43 |
ianw | i think i hacked it back to 9 and it "just worked" | 22:43 |
clarkb | v9 seems to be what xenial has if you want to just switch it to 9 | 22:43 |
ianw | i also think i very stupidly didn't note that in the changelog | 22:43 |
fungi | it probably will if the package isn't relying on newer dh features | 22:43 |
clarkb | ianw: debhelper (>= 9.20160114~) is what I see in the xenial and focal control files from our ppa | 22:44 |
ianw | yep, that is what i put in | 22:44 |
clarkb | fungi: it must because it worked before and sounds like the delta between our current package and the upstream package is all in the patching side? | 22:45 |
clarkb | or maybe people were only looking at the patch dir? | 22:45 |
fungi | clarkb: yes, i agree with that logic | 22:45 |
ianw | ok, focal looks like it's building ok | 22:46 |
*** fp4 has quit IRC | 22:46 | |
fungi | git diff --stat debian/1.8.6-1..HEAD | 22:47 |
fungi | Standards-Version was increased from 4.1.3 to 4.5.0 in debian/control | 22:48 |
fungi | some make var assignment methods were tweaked in debian/rules | 22:49 |
fungi | the debian/watch file was updated to track upstream source via https instead of http | 22:50 |
fungi | otherwise, just more quilt patches | 22:50 |
fungi | i don't see any changes which would have impacted debhelper use | 22:50 |
clarkb | cool | 22:51 |
ianw | i just have to wait a bit for the old failed ones to delete | 22:51 |
ianw | well that's brilliant | 22:54 |
ianw | deleting the failed builds appears to have also removed the prior good xenial and bionic builds | 22:54 |
ianw | no sorry, i see. the old builds have been superseded by the failed builds | 22:55 |
fungi | good news! i got a response from a vos status call finally | 22:56 |
clarkb | ianw: I don't see xenial and bionic at https://launchpad.net/~openstack-ci-core/+archive/ubuntu/openafs/+packages anymore fwiw | 22:56 |
clarkb | they are still at http://ppa.launchpad.net/openstack-ci-core/openafs/ubuntu/pool/main/o/openafs/ though | 22:56 |
clarkb | so maybe a weird UI thing? | 22:56 |
ianw | clarkb: yeah, they're not listed as their status is "Superseded" | 22:57 |
ianw | i should probably upload xenial/bionic as "2" with the debhelper revert explicitly | 22:58 |
clarkb | wfm | 22:58 |
*** redrobot1 has joined #opendev | 22:58 | |
*** redrobot has quit IRC | 23:02 | |
*** redrobot1 is now known as redrobot | 23:02 | |
clarkb | ianw: looks like bionic builds just started | 23:05 |
ianw | yep, so that 2 package just has the debhelper manually reverted to 9, no other changes | 23:06 |
fungi | sounds fine to me | 23:09 |
ianw | gosh i will be happy to have xenial gone | 23:12 |
ianw | btw we have 96gb of ram available in the rax ci tenant | 23:12 |
fungi | good to know | 23:12 |
fungi | sounds like enough for a gerrit replacement in that case | 23:13 |
ianw | yeah, enough for a 60g replacement for a little | 23:13 |
ianw | we've got ticks on the bionic x86-64/i386 packages; that's good | 23:25 |
ianw | fungi: is it reliably returning? | 23:26 |
ianw | fungi: thinking about upgrades for the server -- i think we're in enough of a pickle to manually get to 1.8.6, and then prioritise actual ansible etc. for replacement non-xenial servers | 23:27 |
fungi | ianw: no, i got a call in between some vos release completing and another kicking off, i think | 23:27 |
ianw | my thought was the best thing to do is to probably vos dump the important volumes before attempting such a thing | 23:27 |
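The vos dump approach ianw suggests can be sketched as below. This is untested here; the volume name and output path are illustrative, and `-time 0` requests a full (rather than incremental) dump:

```shell
# Sketch (not run here): full dump of an important volume before the
# in-place upgrade. Volume name and paths are illustrative.
vos dump -id project.tarballs -time 0 \
    -file /opt/backups/project.tarballs.dump -localauth

# If the upgrade goes badly, the dump can be restored, possibly onto a
# different server/partition:
# vos restore -server afs01.dfw -partition vicepa \
#     -name project.tarballs -file /opt/backups/project.tarballs.dump -localauth
```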
openstackgerrit | Martin Kopec proposed opendev/system-config master: WIP Deploy refstack with ansible docker https://review.opendev.org/c/opendev/system-config/+/705258 | 23:28 |
fungi | so i went and held locks for all the remaining mirror updates now to hopefully prevent any additional vos release calls which are waiting from actually getting serviced, not sure if it will help | 23:28 |
ianw | the other options seem to be an openstack-side snapshot of the volumes attached to the server, or posix-level rsync-type copies of vicepa somewhere | 23:28 |
fungi | we could add another cinder volume as a pv in that vg and then make an lvm snapshot | 23:29 |
fungi | we probably don't have sufficient available extents on the current pvs to comfortably snapshot the volume | 23:30 |
fungi | since lvm snapshots are essentially cow we shouldn't need enough room for a full extra copy that way | 23:30 |
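fungi's plan (new cinder volume as a PV, then a copy-on-write snapshot) might look like the following untested sketch; the device name, VG/LV names, and snapshot size are all illustrative:

```shell
# Sketch (not run here): grow the VG with a freshly attached cinder
# volume, then take a COW snapshot of the vicepa LV.
pvcreate /dev/xvdc                # new cinder volume becomes a PV
vgextend main /dev/xvdc           # add it to the existing VG

# The snapshot only needs room for blocks that change while it exists
# (COW), not a full copy of the origin LV.
lvcreate --snapshot --size 50G --name vicepa-snap main/vicepa

# After a failed upgrade, the origin can be rolled back:
# lvconvert --merge main/vicepa-snap
```

Since only changed extents consume snapshot space, the 50G here is a guess at how much churn the volume sees during the upgrade window, not the size of vicepa itself.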
ianw | yeah, i think we waste a lot of time though backing up the mirrors | 23:30 |
clarkb | ya the mirrors are like 95% of the disk use | 23:31 |
fungi | lvm snapshot is instantaneous | 23:31 |
fungi | (effectively) | 23:31 |
clarkb | ah til | 23:31 |
ianw | oh good point, yeah cow | 23:31 |
ianw | i mentioned this in #openafs | 23:32 |
clarkb | those servers appear to still be puppeted fwiw | 23:32 |
ianw | i've been pointed at akeyconvert | 23:32 |
fungi | see manpage for lvcreate if you're interested in details | 23:32 |
ianw | yes, we'll need to ansiblise this. it feels like we should do that and recreate them as focal nodes | 23:32 |
clarkb | and puppet == xenial for us | 23:32 |
clarkb | ianw: note I believe we did the last server upgrades in place via do-release-upgrade (or whatever the ubuntu command is) because afs doesn't like ip addresses changing | 23:33 |
clarkb | that said I agree getting all the way up to focal for a lot of things sounds like a great idea :) | 23:33 |
fungi | our kerberos servers also probably fall into the category of related infrastructure we'll want to upgrade around the same time | 23:34 |
ianw | http://manpages.ubuntu.com/manpages/bionic/man8/akeyconvert.8.html | 23:34 |
ianw | The akeyconvert command is used when upgrading an AFS cell from the 1.6.x release series | 23:34 |
ianw | to the 1.8.x release series. | 23:34 |
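Per that manpage, the key migration step on each upgraded server would be roughly the following untested sketch (exact flags should be checked against the installed version):

```shell
# Sketch (not run here): after upgrading a server from 1.6.x to 1.8.x,
# akeyconvert copies the cell keys from the old KeyFile/rxkad.keytab
# into the new KeyFileExt format.
akeyconvert

# Then confirm the keys are visible to the 1.8 tools:
asetkey list
```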
clarkb | one thing to keep in mind is that bionic is supposedly getting 10 years of support. I don't know if that is true for focal too | 23:37 |
clarkb | but if not it may make sense to consider whether sitting in place on bionic for the remaining 7 years or whatever it is is a better option | 23:38 |
openstackgerrit | Martin Kopec proposed opendev/system-config master: WIP Deploy refstack with ansible docker https://review.opendev.org/c/opendev/system-config/+/705258 | 23:39 |
ianw | yeah, if we're going to replace i'd say focal -- i mean already the packages don't build on bionic | 23:39 |
clarkb | that's a good point | 23:39 |
clarkb | looks like all LTS are 10 years now | 23:41 |
clarkb | sorry not all | 23:41 |
clarkb | bionic and newer | 23:41 |
clarkb | ianw: fungi I think the packages are all done now except for arm64 | 23:44 |
clarkb | do we want to install the new package on a node and see if it works? | 23:44 |
ianw | we can upgrade a mirror node, yep | 23:45 |
clarkb | note the ml seems to indicate that some sort of forceful restart of things may be required (lots of people were just doing reboots) | 23:46 |
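Trial-upgrading one mirror node might look like this untested sketch; the package set is an assumption (the usual Ubuntu openafs client packages), and the forced restart reflects the mailing-list reports that a plain upgrade isn't always enough:

```shell
# Sketch (not run here): pull the new 1.8.6 packages from the PPA onto
# one mirror node and cleanly restart the client.
apt-get update
apt-get install --only-upgrade \
    openafs-client openafs-krb5 openafs-modules-dkms

# Per the ML, a full restart of the client (or a reboot) may be needed
# for the new kernel module to take effect.
systemctl restart openafs-client || reboot
```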
ianw | starting to track things in https://etherpad.opendev.org/p/infra-openafs-1.8 | 23:47 |
*** sshnaidm|ruck is now known as sshnaidm|afk | 23:49 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!