Thursday, 2021-06-17

ianw	and i need to fix the base deployment issues	00:00
Ramereth	ianw: I've been trying to catch those instances and clean them up. Do we have some right now?	00:01
ianw	i'm seeing 5 in ERROR status from our client	00:01
ianw	fault | {'message': 'libvirtError', 'code': 500, 'created': '2021-06-16T23:56:47Z'}	00:02
ianw	looks like they've all got libvirt error	00:03
ianw	fungi: in openstackzuul-linaro-us i only see active servers?	00:10
Ramereth	checking	00:14
Ramereth	yup, more qemu-kvm processes in a defunct state :/	00:16
Ramereth	I wish I knew why that keeps happening. I don't see this on our ppc or x86 clusters	00:16
Ramereth	i'll have to work on cleaning those up tomorrow as I'm about to head out for the day	00:21
opendevreview	Merged opendev/system-config master: openafs-client: add service timeout override https://review.opendev.org/c/opendev/system-config/+/796578	00:22
ianw	Ramereth: no problems. i'll likely re-enable osuosl once ^ deploys	00:22
fungi	ianw: nodepool list when run on nl03 at least shows 15 nodes in a deleting state for linaro-us for me	00:22
ianw	ahhh, so ZK is out of sync with the cloud i guess	00:23
ianw	i was looking via openstack client	00:23
fungi	could be, yeah	00:23
fungi	possible these are cluttering up zk and so nodepool thinks they're occupying quota?	00:24
ianw	that could be true. i can delete the nodes by hand	00:24
ianw	(the ZK nodes ... too much overloading of node :)	00:24
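The mismatch described above (ZooKeeper still holding "deleting" node records for servers the cloud no longer reports) comes down to a set difference. The following is a minimal sketch with hypothetical record shapes and made-up IDs; it does not call nodepool, ZooKeeper, or the OpenStack client, and the field names are assumptions for illustration only.

```python
# Hypothetical data shapes: this function does not talk to nodepool or a cloud API.
def stale_zk_nodes(zk_nodes, cloud_server_ids):
    """Return IDs of 'deleting' ZK node records whose server is gone from the cloud."""
    cloud = set(cloud_server_ids)
    return sorted(
        node["id"]
        for node in zk_nodes
        if node["state"] == "deleting" and node["external_id"] not in cloud
    )


# Made-up example records; "external_id" stands in for the cloud server ID.
zk_nodes = [
    {"id": "0000000001", "state": "deleting", "external_id": "srv-a"},
    {"id": "0000000002", "state": "ready", "external_id": "srv-b"},
    {"id": "0000000003", "state": "deleting", "external_id": "srv-c"},
]

# Only node 0000000001 is both "deleting" and absent from the cloud listing.
print(stale_zk_nodes(zk_nodes, ["srv-b", "srv-c"]))
```

Records found this way are candidates for the kind of manual ZK cleanup mentioned in the conversation, after a human has confirmed them.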
fungi	we've got quite the backlog of queued arm64 builds and linaro-us now has only deleting and two ready nodes which are over a day old	00:25
Ramereth	ianw: how long until that deploys? Maybe I should go ahead and do it now	00:26
fungi	not sure why we'd have a ready ubuntu-focal-arm64 and ubuntu-bionic-arm64 node in linaro-us for more than a day	00:26
fungi	would have expected those to get used sooner, but maybe all the arm64 testing is centos/fedora	00:26
ianw	Ramereth: probably an hour, but then i also have to merge the change to re-enable. i think it can wait	00:26
ianw	hrm, system-config at a minimum should slurp those up you'd think	00:27
ianw	but most of the testing is tox-ish things as well, that i think would all be focal	00:27
ianw	ok, i'm into the zk shell just correlating now	00:31
ianw	fungi: hrm, i don't see lots of deleted nodes. i wonder if we're getting launch failures	00:33
ianw	no i tell a lie, sorry, they don't have images attached so weren't matching arm64 grep	00:34
ianw	ok, i've cleared the ZK nodes, nodepool list is no longer showing them	00:39
ianw	2021-06-17 00:39:51,268 DEBUG nodepool.PoolWorker.linaro-us-regionone-main: Active requests: []	00:40
ianw	it doesn't feel like it thinks it has anything to do	00:40
fungi	mmm	00:40
ianw	i might have to restart nl03, i feel like i've seen this before	00:41
ianw	2021-06-17 00:43:31,423 DEBUG nodepool.PoolWorker.linaro-us-regionone-main: Active requests: ['300-0014437271', '300-0014440634', '300-0014440637', '300-0014441811', '300-0014455060', '300-0014455555', '300-0014437862', '300-0014437863', '300-0014454786', '300-0014455047', '300-0014462863', '300-0014462864', '300-0014454788', '300-0014454789', '300-0014455322', '300-0014455325', '300-0014463009', '300-0014463010', '300-0014454915', '300-0014409073',	00:43
ianw	'300-0014422609', '300-0014433950', '300-0014434049']	00:43
ianw	that's more like it	00:43
fungi	ahh, excellent, thanks	00:43
ianw	we now have 25 servers building in linaro	00:44
fungi	yes, that sounds like what we should normally see	00:45
ianw	i think the fact that nodepool didn't notice those VMs were gone and the fact that it thought there were no requests despite the long queue are related. i'm not sure how though	00:50
Ramereth	ianw: alright, I cleared those out. Heading out for real now	00:50
ianw	Ramereth: thank you!	00:51
ianw	i've enabled the openafs service and am rebooting the osuosl mirror. the timeout has applied "A start job is running for OpenAFS client (1min 2s / 8min 8s)"	01:07
ianw	2min 54s / 8min 8s and it's up	01:09
opendevreview	Merged openstack/project-config master: Revert "Disable the osuosl arm64 cloud" https://review.opendev.org/c/openstack/project-config/+/796585	01:19
ianw	i think the hosts missing the EFI mounts are limited	01:42
ianw	mirror01.ord.rax.opendev.org and bridge.openstack.org. since we're not starting bionic nodes, i think i'll just fix this by hand	01:43
melwitt	https://review.opendev.org/c/opendev/jeepyb/+/795912 converts blueprint integration in jeepyb to the gerrit API if anyone interested. here's the system-config change to re-enable blueprint integration for patchset-created https://review.opendev.org/c/opendev/system-config/+/795914	02:06
*** ykarel|away is now known as ykarel		04:14
*** zbr is now known as Guest2488		05:03
opendevreview	chandan kumar proposed openstack/project-config master: Added publish-openstack-python-tarball job https://review.opendev.org/c/openstack/project-config/+/791745	05:28
opendevreview	Dr. Jens Harbott proposed opendev/git-review master: Fix nodeset selections for zuul jobs https://review.opendev.org/c/opendev/git-review/+/796754	05:29
*** marios is now known as marios|ruck		05:44
*** raukadah is now known as chandankumar		05:46
opendevreview	Merged opendev/system-config master: bridge: upgrade to Ansible 4.0.0 https://review.opendev.org/c/opendev/system-config/+/792866	06:20
opendevreview	Florian Haas proposed opendev/git-review master: Support the Git "core.hooksPath" option when dealing with hook scripts https://review.opendev.org/c/opendev/git-review/+/796727	06:52
*** rpittau|afk is now known as rpittau		07:22
*** jpena|off is now known as jpena		07:31
*** raukadah is now known as chandankumar		08:28
hjensas	Hi, the job at the top of the queue for TripleO, it's been sitting there with that on tripleo-ci-centos-8-undercloud-upgrade-victoria "queued" for 12ish hours. See: https://zuul.opendev.org/t/openstack/status#tripleo	08:35
hjensas	Almost all the jobs behind it in the queue have finished, but with the job at the top of the queue stuck the whole queue seems stuck? Any chance of poking so that the job in "queued" is started?	08:36
*** ykarel is now known as ykarel|lunch		08:38
hjensas	#opendev anyone around who can take a look at TripleO's stuck gate queue? https://zuul.opendev.org/t/openstack/status#tripleo	09:38
*** ykarel|lunch is now known as ykarel		09:45
frickler	hjensas: for a fast workaround, you could likely drop that patch from the queue. not sure we'd get to any further debugging before the event next week, but for that you'd likely have to wait a couple of hours for corvus to show up	09:46
hjensas	frickler: yeah, we may just have to abandon that patch and restore it. Any idea when corvus usually shows up?	10:02
hjensas	fyi, TripleO abandoned the blocking change, and restored it to get the queue unstuck.	10:09
*** jpena is now known as jpena|lunch		11:34
*** amoralej is now known as amoralej|lunch		12:00
*** whayutin is now known as weshay|ruck		12:10
opendevreview	Ananya Banerjee proposed opendev/elastic-recheck master: Run elastic-recheck container https://review.opendev.org/c/opendev/elastic-recheck/+/729623	12:14
opendevreview	Ananya Banerjee proposed opendev/elastic-recheck master: Run elastic-recheck container https://review.opendev.org/c/opendev/elastic-recheck/+/729623	12:20
*** jpena|lunch is now known as jpena		12:32
fungi	hjensas: frickler: it does look more generally like there may be some stuck node requests across the board though... for example an openstack/ovsdbapp has been sitting in check for 137 hours waiting for nodes for some of its unit and static analysis jobs	12:33
fungi	trying to track those down now and probably restart launchers to free any of the locks they're holding on those node requests	12:34
hjensas	fungi: in case you didn't see, we abandoned the stuck change in the tripleo queue. So it is resolved there, but thanks for investigating! :)	12:47
fungi	hjensas: yep, thanks, i've got plenty of other candidates to serve as examples this time	12:48
fungi	here's the node request i'm hunting down:	12:48
fungi	2021-06-11 19:39:33,602 DEBUG zuul.Pipeline.openstack.check: [e: 08ab26431732453d87d673e2aaaf138e] Adding node request <NodeRequest 300-0014404756 <NodeSet ubuntu-focal [<Node None ('ubuntu-focal',):ubuntu-focal>]>> for job openstack-tox-lower-constraints to item <QueueItem af28c45f00f441238e6bbf3099a4a98c for <Change 0x7fe02f02eeb0 openstack/ovsdbapp 795892,1> in check>	12:48
fungi	looks like it was taken by nl04:	12:51
fungi	2021-06-11 19:40:43,357 INFO nodepool.NodeLauncher: [e: 08ab26431732453d87d673e2aaaf138e] [node_request: 300-0014404756] [node: 0025081525] Node is ready	12:52
fungi	but it never unlocked the request	12:52
fungi	so same symptoms we've been finding in other cases	12:52
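Tracing a request like this means following one request ID through both the scheduler and launcher logs. The sketch below assumes only the ID shape visible in the two log lines quoted above (three digits, a hyphen, ten digits); the regex and the `request_ids` helper are illustrative and not part of zuul or nodepool.

```python
import re

# Assumed ID shape, taken from the quoted log lines: e.g. 300-0014404756.
REQUEST_ID = re.compile(r"\b(\d{3}-\d{10})\b")


def request_ids(line):
    """Return every node request ID found in a single log line."""
    return REQUEST_ID.findall(line)


launcher_line = (
    "2021-06-11 19:40:43,357 INFO nodepool.NodeLauncher: "
    "[e: 08ab26431732453d87d673e2aaaf138e] [node_request: 300-0014404756] "
    "[node: 0025081525] Node is ready"
)

# The timestamp, event ID, and node ID do not match the assumed request ID shape.
print(request_ids(launcher_line))
```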
fungi	i wonder if a thread dump will prove useful, i'll trigger one	12:53
*** amoralej|lunch is now known as amoralej		13:02
fungi	okay, i've sent 12 (sigusr2) to the child launcher process twice now roughly a minute apart and the dumps are in /var/log/nodepool/launcher-debug.log at 2021-06-17 13:11:43,516-13:12:55,304	13:13
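For readers unfamiliar with the technique: a Python daemon can install a signal handler that writes out every thread's current stack when it receives SIGUSR2. The sketch below is a generic standard-library illustration, not nodepool's actual handler; `format_thread_dump` and `handle_sigusr2` are hypothetical names.

```python
import signal
import sys
import threading
import traceback


def format_thread_dump():
    """Return a text dump of the current stack of every live thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    parts = []
    for ident, frame in sys._current_frames().items():
        parts.append(f"Thread: {ident} {names.get(ident, 'unknown')}")
        parts.append("".join(traceback.format_stack(frame)))
    return "\n".join(parts)


def handle_sigusr2(signum, frame):
    # A real daemon would write this to its debug log rather than stderr.
    sys.stderr.write(format_thread_dump())


# SIGUSR2 only exists on POSIX platforms (it is signal 12 on Linux x86 and ARM),
# and handlers can only be installed from the main thread.
if hasattr(signal, "SIGUSR2") and threading.current_thread() is threading.main_thread():
    signal.signal(signal.SIGUSR2, handle_sigusr2)
```

Taking two dumps about a minute apart, as done here, makes it easier to tell a thread that is genuinely stuck from one that merely happened to be busy at the moment of the first dump.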
fungi	i'm going to restart the container now to release the locks on those node requests	13:14
fungi	#status log Restarted the nodepool-launcher container on nl04.opendev.org to release stale node request locks	13:15
opendevstatus	fungi: finished logging	13:15
fungi	and now i'm seeing many of the stuck builds transition from queued to running	13:17
fungi	corvus: probably not particularly urgent as it's not that debilitating and we've been seeing it for months off and on so no idea when it started, but do you have any good suggestions for how to try to track this down?	13:34
fungi	most of the time it seems to happen amid flurries of node launch failures, though that perception could also be selection bias on my part	13:35
fungi	sometimes it's a decline after three launch failures, sometimes it's a successful node launch, but the commonality is that it either doesn't get communicated back to the scheduler or the scheduler loses the event somehow	13:37
fungi	not entirely sure which	13:37
fungi	also the trimmed up thread dumps are now in nl04.opendev.org:~fungi/2021-06-17_threads.log	13:38
corvus	fungi: which nl has that debug log ^?	13:38
corvus	heh :)	13:38
fungi	04	13:38
fungi	if catching a launcher actively in this state would help, i can avoid restarting next time it comes up	13:41
fungi	but generally when it happens it's blocking more than just a few changes	13:41
*** marios|ruck is now known as marios|call		14:02
corvus	fungi: there was a zookeeper connection suspension between when the server finished booting and when nodepool should have marked the request fulfilled and unlocked the nodes. it was only a suspension, which means that it should have been able to pick up without losing anything. additionally, that node request and several others all disappeared from the list of current requests without a log entry; that's not supposed to be possible.	14:18
corvus	i don't think keeping the launcher in that state would provide more info. i think that's enough clues to figure out what debug info we're missing	14:19
fungi	thanks, and i don't recall seeing it prior to maybe february, but i suppose it could have been lurking there for as long as we've been handling node requests via zk... when we're regularly restarting the launchers it tends to just solve itself	14:21
*** marios|call is now known as marios|ruck		14:44
*** ykarel is now known as ykarel|away		15:10
*** ysandeep is now known as ysandeep|out		15:33
clarkb	looks like the osuosl mirror got sorted out, that is great news	15:34
clarkb	fungi: re gerrit yes I should page that back in then ask for review on data that I'm back up to speed with. I'll see if I can get to that today or tomorrow	15:34
*** rpittau is now known as rpittau|afk		16:09
*** sshnaidm is now known as sshnaidm|afk		16:16
*** marios|ruck is now known as marios|out		16:28
clarkb	I'm looking into why the LE certs didn't refresh after 60 days as expected for the names we got emails about	16:34
clarkb	it appears that nb03.opendev.org failed for some reason in the last couple of LE playbook runs which meant the certs haven't updated anywhere.	16:34
clarkb	looks like a full /opt on nb03 is causing that.	16:34
*** amoralej is now known as amoralej|off		16:35
*** jpena is now known as jpena|off		16:36
clarkb	/opt/nodepool_dib (where we store the images that get uploaded) is only about 1/3 of the disk use. This implies we've leaked a bunch in the dib_tmp dir. I'll stop the daemon, clear out dib_tmp, then restart things	16:38
opendevreview	Clark Boylan proposed opendev/system-config master: Add note about afs01's mirror-update vos releases to docs https://review.opendev.org/c/opendev/system-config/+/796893	16:53
clarkb	infra-root ^ that is an update to the openafs docs based on what I discovered the hard way earlier this week :)	16:54
frickler	clarkb: fungi: I tried to fix the nodeset selection for git-review jobs, which I think succeeded, but now there are some issues with the test instance of gerrit not starting, see https://review.opendev.org/c/opendev/git-review/+/796754	17:07
frickler	seems to be some kind of race condition or similar, as it hits only one out of 5 jobs	17:08
clarkb	agreed and seems to have only hit 8 tests in that job	17:10
clarkb	we don't seem to collect the gerrit startup logs in that case to see what went wrong though	17:11
clarkb	re next step it is probably modifying the test suite to grab the gerrit error log file to see why it breaks	17:12
fungi	i thought we did collect the gerrit logs	17:13
clarkb	fungi: I don't see them on that job or in the job-output.txt	17:14
rosmaita	when someone has a few minutes, i'm trying to make sure a job has python 3.6 available, but the change i made to .zuul.yaml on this patch isn't working, it's still reporting "InterpreterNotFound: python3.6": https://review.opendev.org/c/openstack/python-brick-cinderclient-ext/+/796835	17:31
rosmaita	i'm using "python_version" in the job vars, do i need something else?	17:31
clarkb	rosmaita: you have to run the job on a platform that has the python version you want	17:31
fungi	make sure it's ubuntu-bionic	17:31
clarkb	I expect that job is running on focal which does not have python3.6. Bionic is probably what you want	17:32
clarkb	ya	17:32
fungi	if you don't specify a nodeset, right now you get ubuntu-focal which is too new	17:32
rosmaita	oh, ok	17:32
rosmaita	and i guess python_version is still a good idea?	17:32
fungi	our abstract py36 jobs set the nodeset so that child jobs will inherit that, but if you're creating a job which doesn't inherit then you need a nodeset which has the version of python you need or you need to specify an alternative means of installing it	17:33
rosmaita	ok, thanks, will revise my patch	17:34
fungi	setting python_version is fine for jobs which switch on that, though it's more useful for selecting a non-default interpreter on whatever node label you're running against	17:34
fungi	e.g. ubuntu-focal defaults to python 3.8 but has 3.9 packages available, so you might want to use python_version to tell the job to use 3.9 on focal	17:35
rosmaita	that's helpful, i understand this better now	17:36
clarkb	corvus: reading the matrix spec and one thing that stands out to me is that the homeserver is responsible for maintaining scrollback (until the end of time I guess?). While this makes sense I wonder what sort of storage needs are required? it's basically the entire channel log with a pointer for each user's client indicating where they were last caught up?	17:43
clarkb	corvus: any idea how that scales over time? do we need a potentially ever growing data store for this information (not that channel logs tend to be large but trying to understand what is involved there)	17:44
fungi	clarkb: if it helps, the channel logs for most of the channels we'd be talking about, as recorded by the meetbot, presently occupy 17gb on disk and this includes the htmlified copies as well	17:48
fungi	technically that's ever-growing but we've never really come close to running out of space for it	17:49
clarkb	fungi: that is good info. Probably a decent estimate for matrix disk needs (though I expect matrix adds significantly more metadata, the order of magnitude is probably similar)	17:49
corvus	clarkb: that's my understanding -- at least for the users on that homeserver that are in that room. of course, we will almost certainly have at least one "user" in each room on our homeserver in the form of a bot.	17:51
fungi	though i agree, having someone with a homeserver which has channel history provide an estimate for how much room logging needs by comparison to our meetbot logging might be good	17:51
fungi	like x% the size of a plain text log for the equivalent timeframe	17:51
clarkb	another thing that comes to mind is how upgrades are done. Can those be done without taking chat down?	17:52
clarkb	I suspect we'd be required to run multiple homeservers to accomplish that?	17:52
corvus	clarkb: it's possible there are ways to reduce or truncate the data since our bots wouldn't need history. however, it seems likely that we might want the opendev.org homeserver to provide history as a service for new folks connecting. so i think we should expect to keep indefinite history, not as a technical requirement, but as a good service.	17:52
clarkb	yup	17:53
clarkb	scrollback is one of the key features people always talk about so keeping that around makes sense to me	17:53
corvus	clarkb: synapse is a python program with a database; i'd expect minimal downtime between upgrades. remember that users won't see that downtime, except maybe in just a little bit of extra lag, as their homeserver queues updates.	17:53
corvus	clarkb: i have stopped and restarted my homeserver several times during conversations with you :)	17:54
JayF	fungi: holy @#$%, I've been using `skipdist` in about every tox file I've created since the beginning of time. I was wondering why it always seemed so ineffective	17:54
JayF	thank you for that email :D	17:54
corvus	clarkb: i have no idea how accurate this is: https://matrix.org/docs/projects/other/hdd-space-calc-for-synapse	17:54
clarkb	corvus: ya I guess the impact would be due to prolonged outages, short outages for updates should go unnoticed	17:54
corvus	clarkb: but it is certainly a thing you can put numbers in and get other numbers out of :)	17:54
fungi	JayF: yw	17:55
corvus	clarkb: i'd have to check the actual Matrix Specification for this, but it may be the case that only bots and new users would notice homeserver downtime.	17:56
corvus	(given that we're talking of only using our homeserver for bots)	17:56
clarkb	new users because they wouldn't be able to join the channels in that moment?	17:57
corvus	right	17:57
corvus	clarkb: the high-level thing under "how does it work" at https://matrix.org/ leads me to believe that's the case	17:59
corvus	oh we should steal that "next" animation idea for zuul	17:59
corvus	essentially, our bots would get an eventually-consistent view of the room after the homeserver came back up	18:01
clarkb	I've brought the subject up over at the foundation. Pointed out the oddity in potential pricing for element hosted homeserver for us and asked if there was any reason to not reach out to them now and start a conversation to understand what element can do better. I also mentioned corvus and mordred are willing to meet up and talk about it in more depth	18:02
corvus	(also, fwiw, i think new users could technically join the room if any other homeserver involved chose to publish it at an alternate address)	18:02
clarkb	I'm also trying to build a model in my head for what hosting a homeserver looks like if we do it ourselves. Seems like we should expect some disk and network bw.	18:03
clarkb	And for upgrades maybe we can have docker compose simply update to the latest image constantly?	18:03
mordred	++	18:04
corvus	<clarkb "I'm also trying to build a model"> yep; minimal for #zuul, potentially significant for other communities	18:04
corvus	<clarkb "And for upgrades maybe we can ha"> I haven't tried an upgrade yet; that sounds reasonable, but it's definitely a wildcard in my mind.	18:04
*** dviroel is now known as dviroel|away		18:05
clarkb	having a plan for those as well as understanding what server upgrades look like is probably important before going down the run our own path	18:05
corvus	yup	18:05
clarkb	in particular for the host server I wonder if changing IPs will be a problem or if we can spin up a new one alongside and do a failover of sorts, etc	18:05
clarkb	a sync + failover seems like the sort of thing matrix may just do out of the box given how it does federation	18:06
fungi	yeah, i asked similar questions about server redundancy in the zuul spec	18:06
*** slittle1 is now known as Guest2552		18:06
fungi	seems like something we need a little more research around	18:06
corvus	clarkb: also potential interesting info for the foundation is that we have contacts in ansible, fedora, gnome, and mozilla orgs all of whom are in various stages of the same process (most/all of whom are choosing to use EMS hosting) and they seem happy to help with the process	18:07
corvus	changing ips will not be a problem	18:07
clarkb	corvus: oh ya it would be interesting to get info about EMS experiences from them I bet	18:08
clarkb	On the subject of history collection I wonder if we can simply expose those logs somehow as the canonical historical record without needing an account and authentication (just so we don't end up with a copy of them all on the homeserver and another copy where the bot lives)	18:11
fungi	that might get tricky to filter if the homeserver has history from any private channels for some reason	18:13
fungi	though if we can be sure the only history it has is safe to publish, then perhaps easier	18:14
clarkb	fungi: good point. Point to point private comms are e2e encrypted so it would just be private channels that have this issue I think	18:15
clarkb	but definitely something to check on if we explore that option further	18:15
corvus	clarkb: i don't think they're natively stored in a way that's usable for that purpose (you know, directed graph structure and all). doing that with a web app is probably computationally expensive. my guess is for purposes of search engine indexing, etc, just having flat files is best. but maybe there's a way to export a room history to flat file to avoid needing the bot.	18:15
clarkb	corvus: gotcha	18:15
fungi	or a way to make a plugin which does that in more lightweight ways than a typical "bot"	18:16
corvus	fungi: yeah, an app service may be able to do stuff like that, but i'm not very familiar	18:17
clarkb	corvus: mordred: do you know if https://element.io/contact-sales is the best way to reach out to EMS?	18:19
clarkb	or rather, do you have a better contact? if not then ^ is probably easiest	18:19
corvus	clarkb: i think so	18:28
corvus	er, i think that's the best way to get started; i don't have a better contact :)	18:29
opendevreview	melanie witt proposed opendev/jeepyb master: Convert update_blueprint to use the Gerrit REST API https://review.opendev.org/c/opendev/jeepyb/+/795912	18:29
*** dviroel|away is now known as dviroel		18:53
clarkb	melwitt: I left a couple of comments on https://review.opendev.org/c/opendev/jeepyb/+/795912 The first one would probably be a good followup and the other is more of a "is this even possible question"	19:41
fungi	oh, thanks for the reminder, i was going to review that today	19:44
clarkb	ok cleanup on nb03 has completed and I have restarted the builder there	19:48
clarkb	hopefully the periodic LE job tonight runs successfully and we don't get warnings about expiring certs tomorrow	19:48
fungi	unrelated, gitea01 has been reporting backup problems. i saw the e-mails for the past ~week but haven't found time to look into it yet	19:49
clarkb	fungi: those backups are primarily there to keep db updates around and db updates are primarily important for project renames. If we don't do a project rename until that is fixed it may not be urgent. Of course understanding why it broke would be nice	19:52
fungi	yeah, more or less what i was thinking, i just wanted to at least mention it so i know i'm not the only one aware there's a problem	19:54
clarkb	++	19:55
melwitt	clarkb: ok thanks, will look	20:09
y2kenny	With OpenDev's Zuul, is it possible for organizations to add additional nodepool and/or executors? If so, what is the process?	20:38
y2kenny	(or perhaps "attach" additional nodepool/executor is better wording...)	20:40
fungi	y2kenny: we haven't worked out a process for adding provider-specific builders/launchers/executors, so far we've only got central services connecting to publicly accessible cloud apis with public addressing (in some cases ipv6-only) to reach the nodes: https://docs.opendev.org/opendev/system-config/latest/contribute-cloud.html	20:45
y2kenny	fungi: Is it something that's in the plan or is that something that will be tricky?	20:46
y2kenny	fungi: This is still very much on the drawing board right now, but the kind of testing resources I am thinking of is not generally available in the cloud (baremetal HW with GPUs.)	20:47
fungi	y2kenny: we've talked about adding a zoned executor for one prospective donor who has no ipv6 connectivity to their environments and extremely limited ipv4 capacity, but we haven't talked about dedicated builders or launchers at all	20:48
fungi	and even an environment with the dedicated executor would be a slight reduction in functionality since we'd need to attach floating ips to give developers remote access to held nodes	20:49
y2kenny	fungi: ok... so is this something that you guys would be interested in starting a conversation on or is it too complex to do any time soon?	20:50
fungi	y2kenny: i guess we'd need to talk about what the architecture would look like and what amount and sort of capacity we're talking about, to determine whether the engineering work needed to support it would be sufficiently offset, since there's just not that many people helping design and maintain our control plane these days	21:16
y2kenny	fungi: ok, I see.	21:17
y2kenny	fungi: what is the right place to continue the discussion? (service-discuss@lists.opendev.org ?)	21:22
fungi	y2kenny: sure, here or there, though the mailing list is better for longer-term asynchronous discussion	21:33
fungi	the handful of sysadmins who are active on opendev are scattered around the globe, so not all awake/around right now	21:33
y2kenny	fungi: understood.	21:39
opendevreview	melanie witt proposed opendev/jeepyb master: Convert update_blueprint to use the Gerrit REST API https://review.opendev.org/c/opendev/jeepyb/+/795912	21:43
ianw	clarkb: ahh, thanks for finding the letsencrypt job failure	21:53
ianw	also, i just manually added fstab entries for EFI on our rax bionic hosts	21:54
ianw	yesterday on base i was seeing almost random "-13" errors; the only thing i could find was an ancient devstack-gate launchpad bug from yourself that mentioned iptables randomly returning -13 (no permissions)	21:54
ianw	looks like last night https://review.opendev.org/c/opendev/system-config/+/792866 failed to install updated ansible on bridge; looking	21:56
ianw	### ERROR ###	21:57
ianw	Upgrading directly from ansible-2.9 or less to ansible-2.10 or greater with pip is	21:57
ianw	known to cause problems. Please uninstall the old version found at:	21:57
ianw	i guess this is not worth coding for. i'll manually uninstall and re-install ansible 4.0.0 once to get around this. we don't see this in the gate with fresh installs	22:00
ianw	"Ansible will require Python 3.8 or newer on the controller starting with Ansible 2.12." ... we should think about bridge upgrade process too	22:04
ianw	#status log manually performed uninstall/reinstall for bridge ansible upgrade from https://review.opendev.org/c/opendev/system-config/+/792866	22:05
opendevstatus	ianw: finished logging	22:05
clarkb	ianw: that is an interesting move by ansible since they have historically kept compat with really ancient python. I guess they decided that isn't sustainable (good for them)	22:46
clarkb	fungi: y2kenny: also note that there are specs up to zuul to make that sort of allocation easier on a per tenant basis	22:47
opendevreview	Merged opendev/system-config master: review02 : switch reviewdb to mariadb_container type https://review.opendev.org/c/opendev/system-config/+/795192	22:57
ianw	clarkb: it is only on the controller side though	23:03
clarkb	ianw: ah	23:14
fungi	yeah, what version of python interpreter ansible can be installed with, not what version it can orchestrate	23:18
opendevreview	Merged openstack/project-config master: Add gmann to IRC accessbot https://review.opendev.org/c/openstack/project-config/+/795986	23:31
opendevreview	Merged openstack/project-config master: Added publish-openstack-python-tarball job https://review.opendev.org/c/openstack/project-config/+/791745	23:31
corvus	clarkb: i just upgraded my homeserver to matrixdotorg/synapse:latest; it automatically applied 1 db schema update	23:42
corvus	that was just a pull followed by docker-compose up -d	23:43
clarkb	nice	23:43
fungi	so if i want to run a homeserver should i work out the docker orchestration, or do you expect the matrix-synapse 1.36.0 package in debian would suffice?	23:47
opendevreview	Ghanshyam proposed openstack/project-config master: End project gating for retiring arch-wg repo https://review.opendev.org/c/openstack/project-config/+/796962	23:51
corvus	fungi: i think that would be fine. it's basically a single python daemon, at least unless you want to start running app services like a slack bridge; so if the debian packages have worked out all the python deps already, i don't see a big advantage. it uses sqlite by default for ease of testing, but they recommend postgres for production use.	23:53
corvus	i'm about to migrate from sqlite to postgres; that's going to take a couple of minutes. so i'll go silent for a bit and... hopefully... let you know how it goes :)	23:53
corvus	fungi, clarkb apparently it worked fine :)	23:57
