opendevreview | Merged opendev/irc-meetings master: We have decided to adjust meeting time to 0700 during summer time. https://review.opendev.org/c/opendev/irc-meetings/+/914940 | 09:02 |
---|---|---|
SvenKieske | I'm currently trying to decide which parent job to utilize for my new linting job, and looking at https://opendev.org/openstack/openstack-zuul-jobs/src/branch/master/zuul.d/jobs.yaml#L72 I have questions. Are the python27 jobs really still running? the build history for those is not loading for me in zuul. I guess I could grep all the zuul jobs.yamls if it is enabled at all | 09:30 |
SvenKieske | in general it seems - for me anyway - that there are some jobs in there which could probably be updated? referencing EOL branches etc? But I'm not sure if these are still in use for some fips testing or other stuff? | 09:31 |
SvenKieske | I don't see any jobs there explicitly running on newer releases (2023.X et al). newest branches there in most parent jobs e.g. "openstack-tox" is zed release. even openstack-tox-py311's parent is openstack-tox with no newer branches declared? I must be missing something? | 09:50 |
SvenKieske | ah, those are defined here: https://opendev.org/openstack/openstack-zuul-jobs/src/branch/master/zuul.d/project-templates.yaml#L645 weird structure.. | 09:54 |
SvenKieske | so it seems, previously gate testing jobs and their branches were defined in jobs.yaml https://opendev.org/openstack/openstack-zuul-jobs/commit/6d85fd8399ed6b9f2358412945cd6683989662cd | 09:59 |
SvenKieske | but nowadays this is done in project-templates.yaml https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/913710 | 10:00 |
SvenKieske | and nobody cleaned that up? afaik it's not necessary to split the branch definition for jobs being run over two files here? | 10:00 |
SvenKieske | maybe I'm still not seeing the whole picture here, but it does seem to make some kind of sense at least. although it seems a little brittle and error prone to have no single source of truth which branch is being used for which job at which point in time. | 10:03 |
*** sfinucan is now known as stephenfin | 12:25 | |
fungi | SvenKieske: you might find https://zuul.opendev.org/t/openstack/jobs an easier way to browse the zuul jobs defined in the openstack tenant, and then cross-reference them to git from there | 12:28 |
fungi | to answer your question about whether some projects still maintain compatibility with and test on python 2.7, yes of course. there are still distributions that support it even if it's not supported upstream by the python community, and openstack project branches that (at least very recently) supported being installed with python 2.7 | 12:30 |
fungi | and even master branches of non-branching tools and libraries that need to continue to support those older versions of the software (e.g. pbr). i think we only dropped python 2.7 support from bindep a few weeks ago | 12:33 |
fungi | https://zuul.opendev.org/t/openstack/builds?job_name=openstack-tox-py27&project=openstack/pbr | 12:36 |
fungi | https://zuul.opendev.org/t/opendev/builds?job_name=tox-py27&project=opendev/bindep though that's in a different zuul tenant these days | 12:38 |
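The same build queries can also be made against Zuul's REST API instead of the web UI; a minimal sketch mirroring the builds pages linked above (the `limit` parameter and the `json.tool` pretty-printing are just illustrative choices):

```bash
# Fetch recent runs of openstack-tox-py27 on openstack/pbr from the Zuul API;
# this is the same data backing the /t/openstack/builds page linked above.
curl -s 'https://zuul.opendev.org/api/tenant/openstack/builds?job_name=openstack-tox-py27&project=openstack/pbr&limit=10' \
  | python3 -m json.tool
```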
SvenKieske | ah, that's why I didn't find any jobs, thanks! | 12:38 |
SvenKieske | does pbr still use py27? | 12:39 |
fungi | yes, it still supports python 2.7 because other projects supporting python 2.7 need to be installable with the latest versions of pbr | 12:39 |
fungi | a prime example though is https://zuul.opendev.org/t/openstack/builds?project=openstack%252Fswift | 12:40 |
fungi | you'll see it runs several different "py27" jobs even on master branch changes | 12:41 |
SvenKieske | interesting, I was under the impression that the transition to python3 was complete years ago. at least that was some marketing speak around it. so swift still supports python2? | 12:43 |
SvenKieske | is that to support some old redhat cruft? I assume swift does support python3. I wouldn't even know how to currently install python2 on most distros, maybe software heritage has old packages. | 12:45 |
tkajinam | we globally removed python 2 support at ussuri afair and that was because mainly SwiftStack required it for a bit longer (to run new swift in older ubuntu). Idk if that requirement still stands | 12:48 |
tkajinam | s/that was/keeping py2 support in swift/ I mean | 12:49 |
SvenKieske | i quickly grepped for python2|3 in the swift repo, at least there seem to be some remnants of python2 support left. as the only tests being run on python2 seem to be linting|tox stuff I doubt that it works. | 12:52 |
SvenKieske | just recently found out swift support with keystone backend for auth was broken since zed release in k-a because we did not test that. | 12:53 |
SvenKieske | so my basic assumption since roughly 6 years is: untested code is broken code. | 12:54 |
tkajinam | SvenKieske, that's interesting. do you have a bug for it ? | 12:54 |
tkajinam | we run some tempest tests to validate deployment with swift + keystone in puppet jobs but I've never seen any problems so far (though our test coverage is quite limited) | 12:55 |
tkajinam | I think swift + keystone is covered by usual dsvm jobs run in multiple projects | 12:55 |
SvenKieske | tkajinam: https://bugs.launchpad.net/kolla-ansible/+bug/2060121 this is kolla-ansible specific bug, see the attached patch | 12:55 |
SvenKieske | but it reinforces my view. another contributor is working on implementing tempest tests in kolla now. we need more integration tests in kolla and I think tempest is the best we can add. | 12:56 |
tkajinam | https://github.com/openstack/devstack/blob/master/lib/swift#L435 | 12:57 |
SvenKieske | we have custom bash integration tests and they mostly work, but they really only test a fraction, even of our default install. | 12:57 |
tkajinam | I'd say that's not a bug in swift but one in deployment tools. though I feel like the requirement of /v3 path is redundant and something we may want to improve | 12:57 |
SvenKieske | yeah sure, I was talking about kolla-ansible, that's my main interaction point with openstack :) it's a bug in this deployment tool | 12:58 |
tkajinam | a bit tricky point with this discussion is that you may need to test s3 api instead of native swift api and tempest does not cover it for now | 12:58 |
SvenKieske | it was just an example where no tests lead to silently broken code, for many releases even. | 12:59 |
tkajinam | yeah | 12:59 |
SvenKieske | so if swift does not run integration tests in python2 I doubt it works in python2, until proven otherwise :) | 13:00 |
tkajinam | hm https://github.com/openstack/swift/tree/master/test/functional/s3api | 13:01 |
fungi | SvenKieske: to be clear, our "marketing speak" was that we fully supported python 3. you can fully support python 3 and 2.7 if you're careful about how you write your software | 13:16 |
SvenKieske | sure, but I also recall reading somewhere that python2 support was supposed to be removed, and afaik there are projects without python2 support | 13:17 |
fungi | and yes, it was in service of people on older (but still supported by their vendor) gnu/linux distributions being able to upgrade to the latest releases of swift | 13:17 |
SvenKieske | I'm pretty sure I myself added the usage of a standardlib that's not available in python2 | 13:17 |
fungi | sometimes people don't want to upgrade the distribution they're running, and as long as that distro still provides necessary things like security fixes i don't see that as a concern | 13:18 |
SvenKieske | no | 13:18 |
SvenKieske | but https://docs.openstack.org/tempest/latest/supported_version.html does not list any python 2 version as supported | 13:18 |
SvenKieske | so imho it's good to wonder why we burn CI cycles on python2 tests? | 13:19 |
SvenKieske | ah damn, that's only tempest | 13:19 |
fungi | SvenKieske: maybe the missing piece here is that swift doesn't require keystone. it can be installed as a stand-alone service | 13:19 |
fungi | and from what i understand there are quite a few large deployments of stand-alone swift without other openstack services alongside it | 13:20 |
fungi | and that's what the swift team has been trying to make sure kept working on python 2.7 | 13:21 |
SvenKieske | but python2 is also not listed on the supported runtimes for zed: https://governance.openstack.org/tc/reference/runtimes/zed.html | 13:21 |
fungi | SvenKieske: right, in the zed release projects were not *required* to support python 2.7, but that doesn't mean they couldn't still choose to do so | 13:22 |
SvenKieske | okay, makes sense :) | 13:22 |
fungi | openstack-wide support guarantees are the minimum required by projects included in openstack, not the extent of what some of them might support in isolation | 13:23 |
SvenKieske | imho it would still make sense to at least think about a roadmap when to officially demand removal of this though. | 13:23 |
fungi | i think i heard that the swift team was sunsetting python 2.7 support, i don't recall what that timeline was, perhaps that's something they'll be talking about in ptg sessions next week | 13:24 |
SvenKieske | okay, thanks for the insights. I wasn't really aware there are actually still openstack projects with python2 support. guess I'm one of the lucky 10000 ( https://xkcd.com/1053/ ) | 13:25 |
fungi | openstack is pretty vast, it's hard to know everything that goes on in it | 13:31 |
corvus1 | my guesstimate of the db import time was way off, it took 9 hours. | 14:14 |
frickler | wow, that's a lot | 14:18 |
corvus1 | yeah, i think it's all the indexes... i made the estimate based on the rate of the artifact table, but it has few indexes. i think the builds/buildsets/refs tables, which have lots of indexes, slowed down a lot | 14:19 |
corvus1 | during the recent migration, i turned off indexes then recreated them at the end. i think we should see if it's possible to do that with mysqldump | 14:20 |
corvus1 | s/turned off/deleted/ | 14:20 |
frickler | unrelated, didn't we have like 500+ config errors? now I see only 335 and I wonder what happened | 14:20 |
corvus1 | i think if we do that, we may approach the runtime of the migration (which i think was 40m?) | 14:20 |
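For reference, the drop-and-recreate approach being discussed would look roughly like this; the database, table, and index names here are placeholders, not the actual Zuul schema or the commands that were run:

```bash
# Sketch only: drop a secondary index before the bulk import, recreate it afterwards.
# "zuul", "zuul_build" and "example_idx" are placeholder names.
mysql zuul -e 'ALTER TABLE zuul_build DROP INDEX example_idx;'
mysql zuul < dump.sql                                            # the long-running import
mysql zuul -e 'ALTER TABLE zuul_build ADD INDEX example_idx (uuid);'
```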
corvus1 | heh, i wonder if archive.org crawled our config error page? and if we're about to block it? :) | 14:21 |
frickler | interesting idea. seems there's only one single copy from 2022 though :( | 14:26 |
corvus1 | frickler: i think the scheduler logs the error count every time it reconfigures a tenant; so you could at least double check the numbers, but not the actual error contents from the logs | 14:28 |
frickler | I'm just checking against a copy I made 4 weeks ago. seems my memory of the count is mostly correct and we have 250 fewer errors for openstack/trove combined with some general increase due to 2024.1 branching | 14:32 |
fungi | hopefully that means someone actually fixed trove's job configs | 14:49 |
clarkb | fungi: I do wonder if we should make a pbr2 package and then have that drop python<3.8 support. Probably a lot of effort for minimal gain | 14:50 |
clarkb | particularly since we may be able to drop python2 in the near future which is probably the biggest tripping hazard | 14:50 |
fungi | we'll probably be able to drop python2 support from pbr soon enough that it's not worth the extra dance | 14:50 |
fungi | yeah, that | 14:50 |
clarkb | the one place where it could get tricky is if people see that as an invitation to start adding newer python3 stuff but then we won't necessarily work with newer python3 when installing in old locations either. Though I suspect that pip's fake pyproject.toml stuff may help a bit there | 14:51 |
clarkb | I'll be joining the gerrit community meeting in about 9 minutes. Going to ask them about this reindexing bug on gerrit 3.9 (my current biggest concern with an upgrade) | 14:51 |
fungi | testing with sufficiently old python3 is probably sufficient to control that | 14:52 |
clarkb | good point | 14:52 |
corvus1 | hrm, the mysqldump script already disables keys during the import, so i'm not sure that manually deleting them and adding them would be faster. | 15:04 |
fungi | any idea what the bottleneck is? network bandwidth? cpu on the trove instance? | 15:05 |
corvus1 | (or string parsing overhead of a text dump file)? | 15:05 |
corvus1 | i don't know, and it's a bit hard to tell without access to the db host... | 15:06 |
fungi | maybe this is where that "break the warranty sticker" root login comes in handy | 15:07 |
fungi | or we can accelerate our plans to launch a dedicated mariadb server instance | 15:07 |
corvus1 | well, even that's just root db user | 15:07 |
fungi | oh, root in the db not a root command shell in the os | 15:07 |
fungi | yeah, that's maybe some help still (i think mysql has performance details for some stuff?) but not the whole picture | 15:08 |
corvus1 | i'm sure we can make this faster, but is it worth it? do we sink a lot of time into improving it, or is 9 hours of missing build records on a saturday acceptable? | 15:08 |
fungi | i think it's acceptable, but also wonder if it's much less work than an ansible playbook that installs the mariadb container we use elsewhere and launching a server | 15:09 |
frickler | I think it is fine, too. bonus if you make sure to start after the periodic-weekly pipeline is done | 15:10 |
corvus1 | yeah, if we can fold a migration to self-hosted in, that would be ideal; just not sure how fast we can cobble together that change | 15:10 |
fungi | i'm starting to look at it because i can't help myself, but really shouldn't be since i'm up against several other deadlines | 15:13 |
fungi | i guess the main things we need are a service-zuul-db playbook that includes the mariadb role and our other standard roles, custom firewall rules allowing query access from the zuul schedulers, some rudimentary testinfra test(s)... what else? | 15:15 |
corvus1 | yes -- except there is no mariadb role because we don't have any standalone mariadbs | 15:15 |
corvus1 | so that has to start as a copy of, say, the gerrit role with a bunch of stuff removed | 15:15 |
fungi | oh, yes i totally missed that and assumed we had already made a shared mariadb role, but i see we haven't | 15:16 |
fungi | instead we just embed mariadb container configuration in every service that needs one | 15:16 |
corvus1 | yep | 15:16 |
* fungi had started from a copy of the etherpad role but gerrit would have also worked yes | 15:17 | |
corvus1 | if you're doing that, i can launch the server | 15:17 |
fungi | but yeah, maybe we just do the migration to another trove on saturday. i don't think i can commit to writing and debugging this before the weekend | 15:18 |
corvus1 | ok i'll stand down then :) | 15:18 |
fungi | the scope isn't substantial, but it's more than i have time for before next week at the earliest | 15:18 |
clarkb | https://gerrit-review.googlesource.com/c/gerrit/+/417857 progress | 15:18 |
fungi | yay! | 15:19 |
fungi | and with that, i'm disappearing for lunch but shouldn't be more than an hour tops | 15:19 |
clarkb | I also understand the issue much better now. Basically there was a new feature added that allowed a full offline reindex to start from a checkpoint state (possibly precreated before you do an upgrade); this keeps deltas small and speeds up your "full" offline reindex. However, there was a bug and in some cases (I think when you did not create the checkpoint state, which is non-default) | 15:28 |
clarkb | it would completely delete the changes index | 15:28 |
clarkb | then you start gerrit and panic. It turns out that if you rerun a full reindex from that state it works because it's starting from 0. That means the workaround is to simply rerun the reindex | 15:28 |
clarkb | but this was under-documented and obtuse, and when you're in a "gerrit is basically completely unusable" state you're not likely to find that path forward | 15:28 |
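For anyone hitting the same state, the workaround amounts to re-running the stock offline reindex; a sketch, with the site path shown only as an example:

```bash
# Re-run a full offline reindex after the buggy 3.9 run emptied the changes index.
# /var/gerrit is an example site path; adjust to the actual Gerrit site directory.
java -jar /var/gerrit/bin/gerrit.war reindex -d /var/gerrit
```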
clarkb | anyway that revert pulls out the functionality and then 3.10 (current master) has reimplemented it in a different more robust way | 15:29 |
clarkb | The other thing that was called out is that SAP apparently has hit what they think may be a race between C git and Jgit during repacking of large repos. It sounds like packed-refs ends up getting truncated and then the repo is unusable. They were able to restore from backups though | 16:00 |
clarkb | Apparently one installation has for many years (like a decade) done a hard link to packed-refs before running gc then only removes the hard link if things check out cleanly. We may want to investigate doing this in our system. (it was nasser saying they do that so we can followup if it seems like a good idea) | 16:01 |
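A minimal sketch of that safeguard, assuming a bare repository layout; the path and the exact check are illustrative, not what that installation actually runs:

```bash
# Keep a hard link to packed-refs across gc; only drop it once things check out.
cd /var/gerrit/git/example/project.git      # illustrative path to a bare repo
ln packed-refs packed-refs.pre-gc           # hard link keeps a handle on the pre-gc file
git gc
git fsck && rm packed-refs.pre-gc           # leave the safety copy in place if fsck fails
```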
clarkb | SAP seemed to think it is extremely unlikely to happen though (and they don't even know that it is a race between c git and jgit that is just a theory). | 16:02 |
clarkb | sounds like their gerrit install has many more changes than ours and much larger repos involved | 16:02 |
opendevreview | James E. Blair proposed opendev/system-config master: Add a standalone zuul db server https://review.opendev.org/c/opendev/system-config/+/915079 | 16:08 |
corvus1 | fungi: clarkb ^ i dropped and added a single index on the new server on the artifacts table and it took over an hour, which is kind of spooking me about using this trove db. so i went ahead and tried to throw together a self-hosted change. | 16:10 |
clarkb | corvus1: you did that on the old server or the new test one? (or maybe both?) | 16:11 |
corvus1 | new one; i don't have a comparable time for the old server, so i can't say whether it's slower or not | 16:14 |
clarkb | corvus1: posted some quick thoughts on that change. But lgtm | 16:17 |
clarkb | well other than addressing those minor things I mean | 16:18 |
opendevreview | James E. Blair proposed opendev/system-config master: Add a standalone zuul db server https://review.opendev.org/c/opendev/system-config/+/915079 | 16:24 |
corvus1 | clarkb: thanks! | 16:24 |
clarkb | corvus1: looks like you've got the secrets file open so gpg is telling me to go away (I presume for the trove stuff) | 16:26 |
clarkb | if you don't need it anymore there is an edit I'd like to make this morning | 16:26 |
corvus1 | yep, i'll exit now | 16:26 |
clarkb | the email for the password change infra-root just got was me | 16:34 |
corvus1 | i'm launching an 8gb performance flavor in dfw for zuul-db01 | 16:36 |
corvus1 | on jammy | 16:36 |
fungi | okay, back now | 16:37 |
fungi | and reviewing the db server change, awesome! | 16:40 |
clarkb | oh maybe the passwd change email only went to me. Maybe we should update that contact email. One thing at a time :) | 16:41 |
fungi | yeah, we talked about that when i was doing the mfa stuff for it too | 16:42 |
opendevreview | James E. Blair proposed opendev/zone-opendev.org master: Add zuul-db01 https://review.opendev.org/c/opendev/zone-opendev.org/+/915082 | 16:45 |
clarkb | corvus1: on the db role change I think maybe we need to add the groups file for testing to the big list that gets copied by ansible to set up the test bridge? | 16:46 |
opendevreview | James E. Blair proposed opendev/system-config master: Add zuul-db01 https://review.opendev.org/c/opendev/system-config/+/915083 | 16:46 |
clarkb | I'm trying to dig that up and will get a link | 16:47 |
clarkb | corvus1: https://opendev.org/opendev/system-config/src/branch/master/playbooks/zuul/run-base.yaml#L115 needs an entry there. I'll leave a gerrit comment for historical reasons too | 16:48 |
corvus1 | clarkb: found it | 16:48 |
corvus1 | yep | 16:48 |
opendevreview | James E. Blair proposed opendev/system-config master: Add a standalone zuul db server https://review.opendev.org/c/opendev/system-config/+/915079 | 16:49 |
opendevreview | James E. Blair proposed opendev/system-config master: Add zuul-db01 https://review.opendev.org/c/opendev/system-config/+/915083 | 16:49 |
corvus1 | clarkb: fungi i just ran the same drop/add on my local mysql8 db with a slightly older copy of the opendev db and it finished in 19min. so i think it's safe to say that the trove mysql8 is not optimal, but it won't be clear if self-hosting will be an improvement until we test there. | 16:52 |
clarkb | ack | 16:52 |
fungi | it definitely sounds like a promising data point though | 16:52 |
corvus1 | i need to run some errands -- if ya'll can continue pushing on the mariadb thing, that would be much appreciated | 16:53 |
fungi | absolutely, thanks!!! | 16:55 |
opendevreview | Clark Boylan proposed opendev/system-config master: Rebuild the etherpad container image https://review.opendev.org/c/opendev/system-config/+/915084 | 16:55 |
opendevreview | James E. Blair proposed opendev/system-config master: Set standalone mariadb innodb buffer pool to 4G https://review.opendev.org/c/opendev/system-config/+/915085 | 17:01 |
corvus1 | okay really leaving now :) | 17:01 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Add a standalone zuul db server https://review.opendev.org/c/opendev/system-config/+/915079 | 17:51 |
clarkb | fungi: I think you need to rebase the other two changes on that too? | 17:54 |
clarkb | or maybe you want to see check pass first? that's fine I guess | 17:54 |
fungi | yeah, i wasn't eager to rebase those until i see it passing | 17:54 |
fungi | just in case there are other surprises lingering | 17:55 |
fungi | but will do, for sure, once it's all good | 17:55 |
Clark[m] | Thanks. I'm working on lunch now. It got cold again so I'm making a quick dashi to do some ramen | 18:19 |
fungi | yum! what base are you using? shiitaki? kombu? bonito? niboshi? some combination of those? | 18:21 |
Clark[m] | Kombu and katsuobushi (bonito) | 18:22 |
fungi | soooo gooooooooood! | 18:22 |
fungi | oiishi | 18:23 |
fungi | er, oishii i meant | 18:23 |
Clark[m] | Nothing fancy just putting something together with what I've got laying around. Noodles were bought fresh but had one serving left hiding in thebfreezert | 18:25 |
Clark[m] | *the freezer | 18:25 |
fungi | we end up with a lot of shiitake dashi from rehydrating dried mushrooms for other dishes, excellent for reuse | 18:26 |
fungi | less so recently since we've been growing our own shiitake though | 18:28 |
fungi | given my irc nick you'd think i would have at least attempted mushroom farming earlier in life, but i've discovered it's surprisingly easy | 18:30 |
Clark[m] | I've debated buying one of the kits. I feel like I would end up killing them like I do the plants in my yard | 18:31 |
fungi | kits are mostly for educational purposes and not a sustainable way to farm | 18:36 |
fungi | longer term you can just grow shiitake on billets of oak in a dark place like your basement, root cellar or crawlspace | 18:36 |
fungi | the wood needs to stay damp but not soaked, and you just harvest the mushroom growth from them periodically | 18:37 |
fungi | https://zuul.opendev.org/t/openstack/build/60d3f05987e34125b88c1cdbe8a85ad9 | 19:16 |
fungi | what am i missing? | 19:16 |
fungi | the change has: | 19:16 |
fungi | assert mariadb_log_file.contains('mariadb: ready for connections') | 19:16 |
fungi | oh, wait, that's a buildset for the old patchset | 19:17 |
fungi | check hasn't reported on the new patchset | 19:17 |
fungi | guess i found an outdated notification in my inbox | 19:17 |
fungi | though oddly, it was sent 5 minutes ago | 19:18 |
fungi | oh, it's for a child change, not the one i updated | 19:19 |
fungi | okay, less confused now | 19:19 |
fungi | though the new patchset is failing in a new way | 19:20 |
fungi | "Apr 4 18:24:24 zuul-db99 docker-mariadb[15314]: 2024-04-04 18:24:24 0 [Note] mariadbd: ready for connections." https://zuul.opendev.org/t/openstack/build/e9eb0810ce1c45c995bcea46b6655405/log/zuul-db99.opendev.org/containers/docker-mariadb.log#36 | 19:23 |
fungi | https://zuul.opendev.org/t/openstack/build/e9eb0810ce1c45c995bcea46b6655405/log/job-output.txt#54619-54622 | 19:24 |
corvus1 | ha i see it | 19:24 |
corvus1 | i'll fix | 19:24 |
* fungi sighs | 19:24 | |
fungi | what did i miss? | 19:24 |
opendevreview | James E. Blair proposed opendev/system-config master: Add a standalone zuul db server https://review.opendev.org/c/opendev/system-config/+/915079 | 19:25 |
corvus1 | fungi: you're gonna love it | 19:25 |
fungi | zomg | 19:26 |
fungi | how did i not notice that extra d? did i not cut and paste? i guess not! | 19:26 |
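The one-character mismatch, for reference: the testinfra assertion looked for "mariadb: ready for connections" while the daemon names itself mariadbd in its log output. A quick check against the collected log (filename as gathered by the job) shows it:

```bash
# The container log contains the daemon's own name, with the extra "d":
grep 'ready for connections' docker-mariadb.log
# 2024-04-04 18:24:24 0 [Note] mariadbd: ready for connections.
```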
corvus1 | dbdbdbdbd | 19:26 |
fungi | feels like it should be a friday | 19:27 |
corvus1 | i want to start a db project called pqdb | 19:27 |
fungi | and name the process pqdbdbd | 19:27 |
corvus1 | and the query client pqdbdbdpq | 19:28 |
fungi | i'd use it as often as i was able to type it | 19:29 |
opendevreview | James E. Blair proposed opendev/system-config master: Add zuul-db01 https://review.opendev.org/c/opendev/system-config/+/915083 | 19:29 |
opendevreview | James E. Blair proposed opendev/system-config master: Set standalone mariadb innodb buffer pool to 4G https://review.opendev.org/c/opendev/system-config/+/915085 | 19:29 |
fungi | :: | 19:32 |
fungi | my window manager is unusually squirrelly today | 19:32 |
fungi | guess i'll take the opportunity for a package upgrade. running out of ways to procrastinate on paperwork | 19:33 |
clarkb | its really maria db d ? | 19:55 |
clarkb | thats like the equivalent of a typing tongue twister | 19:55 |
corvus1 | yup for realz | 20:04 |
fungi | really for reals | 20:24 |
corvus1 | first change is looking good, but third change failed with this failure which seems spurious: https://zuul.opendev.org/t/openstack/build/7504595c5a474e7a81bcbf57f62c9f26 | 20:32 |
corvus1 | (and related to the zk host) | 20:33 |
corvus1 | i'm going to recheck that but we should keep that in mind if it shows up again | 20:33 |
clarkb | ya looks like the test node lost networking? | 20:34 |
clarkb | ++ to a recheck | 20:34 |
corvus1 | clarkb: fungi https://review.opendev.org/915082 can you +3 that? | 20:34 |
fungi | that's my interpretation as well | 20:34 |
fungi | corvus1: done | 20:35 |
corvus1 | i have created secret creds for it on bridge, so i think all pieces are in place, just awaiting merge | 20:37 |
opendevreview | Merged opendev/zone-opendev.org master: Add zuul-db01 https://review.opendev.org/c/opendev/zone-opendev.org/+/915082 | 20:38 |
opendevreview | Merged opendev/system-config master: Add a standalone zuul db server https://review.opendev.org/c/opendev/system-config/+/915079 | 21:21 |
opendevreview | Merged opendev/system-config master: Add zuul-db01 https://review.opendev.org/c/opendev/system-config/+/915083 | 21:21 |
corvus1 | that's enough to get started; once that deploys i'll manually make the config change, start the db, and start an import | 21:25 |
fungi | infra-prod-base deploy failed for 915079, checking into the logs to see why | 21:34 |
fungi | zuul-db01.opendev.org : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0 | 21:36 |
fungi | not sure what that's all about, i can sudo ssh into it from bridge | 21:37 |
fungi | "Data could not be sent to remote host "104.239.240.24". Make sure this host can be reached over ssh: Host key verification failed." | 21:38 |
fungi | `sudo ssh -4 zuul-db01.opendev.org` is also working with no interaction, so it's not a host key mismatch on the v4 addy or anything like that | 21:39 |
fungi | should we just re-enqueue the change into deploy? | 21:40 |
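Re-enqueueing a merged change into the deploy pipeline is typically a zuul-client call; a hedged sketch, with the patchset number left as a placeholder and the exact options depending on the installation:

```bash
# Sketch: re-enqueue change 915079 into the deploy pipeline.
# "<patchset>" is a placeholder; tenant/pipeline/project names as discussed above.
zuul-client enqueue --tenant openstack --pipeline deploy \
  --project opendev/system-config --change 915079,<patchset>
```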
corvus1 | fungi: yeah, i agree with all that. any chance you have a buildset link? | 21:45 |
corvus1 | fungi: actually there is a giant deploy happening now, i guess we can just let it run? | 21:46 |
corvus1 | fungi: oh weird, actually i would not have expected 079 to try to contact the host because it wasn't added until 915083, which is the change after it, and the one that's running now | 21:49 |
corvus1 | i'm not sure why it thought it had a zuul-db01 in inventory since it hadn't been added to the inventory yet.... :? | 21:49 |
fungi | corvus1: oh, we've seen this before | 21:49 |
corvus1 | but at any rate, i think we can expect 083 to work since at least everything should definitely be in place then | 21:49 |
fungi | approve changes together and the deploy works off the state of everything that merged before it started | 21:50 |
fungi | which fails for the penultimate change because it's operating off state that isn't relevant yet | 21:50 |
fungi | happened to me when i added lists01.opendev.org now that i think about it | 21:51 |
fungi | if we'd waited to approve the inventory addition until after the prior change deploy had finished, it would have been fine | 21:51 |
fungi | but the inventory addition showed up early during the parent change's deploy | 21:51 |
corvus1 | oh because deploy is a change pipeline, not ref-updated | 21:52 |
fungi | zactly | 21:52 |
fungi | so not a show-stopper, but worth thinking about whether there's a way to solve that short-term race i suppose | 21:53 |
fungi | other than just delaying later approvals, that is | 21:54 |
fungi | because we'll never remember to do that | 21:55 |
clarkb | but shouldn't ssh still work? | 21:55 |
fungi | it's possible we didn't add the host key as known at that point | 21:56 |
clarkb | I can understand there may be an order thing going on but if the inventory has the node and we have ssh set up (launch node does this) it should still be able to connect right? | 21:56 |
clarkb | oh right the host key comes out of the inventory and we may have needed an earlier step to apply that | 21:56 |
fungi | my guess is it's a known_hosts challenge, right | 21:56 |
clarkb | I wonder if bootstrap bridge does that | 21:57 |
fungi | hard to say for sure because the logs are a bit opaque | 21:57 |
clarkb | and it can run at the same time as other playbooks? that might be the source of the bug | 21:57 |
fungi | possible we're skipping a necessary job with changed file filters, yep | 21:57 |
fungi | or at least not running it first | 21:57 |
clarkb | ya I think that is it (either of those two scenarios or both) | 21:58 |
fungi | which we might also be able to solve another way if we could make openssh ignore missing known_hosts entries as long as there's a matching sshfp result | 21:59 |
fungi | that was the original intent with sshfp records, but openssh upstream backpedaled after dnssec failed to gain traction | 22:00 |
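What the SSHFP approach amounts to, sketched with real OpenSSH tooling: publish fingerprint records in DNS and let the client consult them, keeping in mind that OpenSSH only skips the interactive prompt when the records come back DNSSEC-validated:

```bash
# On the host: emit SSHFP resource records for its keys, to be added to the zone.
ssh-keygen -r zuul-db01.opendev.org
# On the client: check SSHFP records in DNS when verifying the host key.
ssh -o VerifyHostKeyDNS=yes zuul-db01.opendev.org
```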
corvus1 | the letsencrypt job failed (don't know why yet) but that's a stroke of luck in that it skipped 21 jobs and the zuul-db deploy is next since it doesn't depend on it. | 22:25 |
clarkb | 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'letsencrypt_certcheck_domains' | 22:35 |
clarkb | however in the task just two-ish tasks prior it records the value of that variable so I don't know why that failed | 22:35 |
clarkb | some sort of bug in fact recording? | 22:35 |
opendevreview | Merged opendev/system-config master: Set standalone mariadb innodb buffer pool to 4G https://review.opendev.org/c/opendev/system-config/+/915085 | 22:45 |
corvus1 | clarkb: but nb02 is not in that list... | 22:45 |
corvus1 | okay that's some serious ansible wizardry to build the list of domains and it's failing at the nb02 step | 22:47 |
corvus1 | there's a comment in that task file that makes me think maybe this happens occasionally | 22:50 |
corvus1 | iptables on zuul-db01 looks reasonable | 22:51 |
opendevreview | James E. Blair proposed opendev/system-config master: Mariadb: listen on all IP addresses https://review.opendev.org/c/opendev/system-config/+/915096 | 22:57 |
corvus1 | clarkb: fungi ^ one more thing; i have that in place manually and i think that's it. | 22:57 |
fungi | oh, good catch. for the mailman3 role i configured it to only work on the loopback | 22:59 |
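A quick way to confirm which address the published database port ends up bound to on the host, loopback-only versus all interfaces (with iptables then limiting who can actually reach it):

```bash
# 127.0.0.1:3306 = loopback-only (the mailman3-style setup); 0.0.0.0:3306 = the change above.
ss -tlnp | grep 3306
```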
corvus1 | fungi: clarkb we need to put /var/mariadb/db on /opt -- how should we do that? | 23:16 |
fungi | oh, for more disk space? hmm.. | 23:17 |
corvus1 | should i just change the docker-compose to use /opt instead of /var/mariadb? or do a bind mount... or...? | 23:17 |
corvus1 | ya | 23:17 |
corvus1 | seems like maybe just moving the docker-compose volume mounts might be easiest/best? | 23:17 |
fungi | yeah, i think for other things we've done something like /opt/mariadb and then fiddled with fstab and cinder volumes if we deploy in another provider | 23:18 |
fungi | clarkb: ^ does that sound right? | 23:18 |
opendevreview | James E. Blair proposed opendev/system-config master: Move standalone mariadb to /opt https://review.opendev.org/c/opendev/system-config/+/915098 | 23:20 |
corvus1 | since it's easy, i've made that change locally on the server. i'm going to put it into emergency for now so it won't be reverted | 23:33 |
fungi | sounds good. i would have approved it, but would appreciate clarkb's input once he's back | 23:34 |
corvus1 | yeah, it's super easy to undo if we want something else | 23:34 |
Clark[m] | Usually we bind mount the drive to something in /var | 23:35 |
Clark[m] | But the end result is the same other than the path. Etherpad is an example of this iirc | 23:35 |
corvus1 | is that in ansible, or was that just done manually? | 23:36 |
Clark[m] | I think it's done with launch node flags telling it what to do with the ephemeral drive? | 23:37 |
Clark[m] | Or with a volume via launch node | 23:37 |
corvus1 | /dev/main/main-etherpad02 /var/etherpad/db ext4 errors=remount-ro,barrier=0 0 2 | 23:37 |
corvus1 | that's looking like lvm | 23:37 |
fungi | oh, so we mount the ephemeral disk to /var/something in those cases? | 23:38 |
corvus1 | well, at least in etherpad's case, we got an extra volume and lvm'd it and mounted that; it's not actually using /opt | 23:38 |
fungi | yeah, on etherpad02 we have /dev/xvdb1 as a pv and then make a logical volume on it | 23:38 |
Clark[m] | That's a volume; xvde is ephemeral | 23:39 |
fungi | right, we're using a cinder volume in that case, mainly so that we have some insurance beyond backups | 23:39 |
corvus1 | want i should make a volume for zuul-db02 and mount it at /var/mariadb ? | 23:40 |
corvus1 | mimic etherpad? | 23:40 |
corvus1 | might be nice to have opt for scratch space for giant sql files anyway :) | 23:40 |
fungi | i suppose it's a question of whether we 1. need more space than the ephemeral disk provides, 2. might consider detaching and attaching to a different server in the future | 23:41 |
fungi | or yeah, 3. want to be able to use the ephemeral disk for something else entirely | 23:41 |
corvus1 | 1: not today, but we'll use ~half of it i think; 30 out of 60g maybe more | 23:42 |
corvus1 | 2. probability significantly higher than 0 | 23:42 |
corvus1 | and 3 -- yeah, if we share it, we will be nearly out of space if we make a single mysqldump. so yeah, cinder has some things going for it. :) | 23:42 |
fungi | sounds good to me. also we could probably use some of our ssd quota for that rather than the default sata, for added performance | 23:45 |
corvus1 | yep | 23:45 |
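A rough sketch of what mimicking the etherpad layout would look like: attach a cinder volume (SSD type per the suggestion above), put LVM on it, and mount it at /var/mariadb. The volume name, size, and device path here are assumptions, not what was actually run:

```bash
# Create and attach a cinder volume (name, size, and type are placeholders).
openstack volume create --size 100 --type SSD zuul-db01-main01
openstack server add volume zuul-db01.opendev.org zuul-db01-main01

# On the server: LVM on the new device (device path depends on the provider),
# mirroring the etherpad fstab entry quoted above.
pvcreate /dev/xvdb
vgcreate main /dev/xvdb
lvcreate -l 100%FREE -n zuul-db main
mkfs.ext4 /dev/main/zuul-db
echo '/dev/main/zuul-db /var/mariadb ext4 errors=remount-ro,barrier=0 0 2' >> /etc/fstab
mkdir -p /var/mariadb && mount /var/mariadb
```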
corvus1 | VolumeManager.create() got an unexpected keyword argument 'backup_id' | 23:45 |
corvus1 | i got that when i ran volume create... :( | 23:45 |
corvus1 | do i need to use a certain venv? | 23:45 |
fungi | we don't need it to be huge. probably the 100gb minimum rackspace requires would suffice | 23:45 |
corvus1 | the one in launcher-venv does not work | 23:47 |
tonyb | corvus1: yes we do. I think it's /home/fungi/xyzy | 23:47 |
tonyb | something like that. history should help | 23:48 |
fungi | heh, um... | 23:48 |
corvus1 | tonyb: thanks! different error: The plugin rackspace_apikey could not be found | 23:48 |
corvus1 | so i think fungi's secret venv has bitrotted since the mfa stuff and we need a new one! | 23:48 |
tonyb | ahhh that be new due to the MFA stuff | 23:48 |
fungi | we can probably fix that by installing the rackspace client plugin into that venv | 23:48 |
corvus1 | fungi: if you want -- i won't touch it since it's in your homedir :) | 23:49 |
fungi | i just installed rackspaceauth into that xyzzy venv | 23:50 |
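For the record, the fix was just adding the plugin fungi mentions to the existing client venv; the venv path below is illustrative:

```bash
# Install the rackspaceauth plugin into the venv that holds the working
# openstack client (path is illustrative, adjust to the actual venv).
~/xyzzy/bin/pip install rackspaceauth
```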
fungi | try again? | 23:50 |
corvus1 | fungi: success, thanks! | 23:51 |
fungi | but really, we should figure out why /usr/launcher-venv doesn't work for that | 23:51 |
corvus1 | other openstack volume commands work in the global env, only the create fails with the backup_id thing | 23:52 |
fungi | yeah, i had previously only tested things like volume list | 23:54 |
fungi | so didn't realize the main venv wasn't able to create new volumes | 23:55 |
corvus1 | okay all done, and change abandoned | 23:57 |
corvus1 | removed host from emergency | 23:58 |