clarkb | Meeting time in a minute or two | 18:59 |
---|---|---|
clarkb | It's been a while too :) | 18:59 |
ianw | o/ | 19:00 |
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue Nov 30 19:01:02 2021 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link http://lists.opendev.org/pipermail/service-discuss/2021-November/000303.html Our Agenda | 19:01 |
clarkb | We have an agenda. | 19:01 |
clarkb | #topic Announcements | 19:01 |
clarkb | Gerrit User Summit is happening Thursday and Friday this week from 8am-11am pacific time virtually | 19:01 |
clarkb | If you are interested in joining, registration is free. I think they will have recordings too if you prefer to catch up out of band | 19:02 |
fungi | also there was a new git-review release last week | 19:02 |
clarkb | I intend to join as there is a talk on gerrit updates that I think will be useful for us to hear | 19:02 |
clarkb | yup, please update your git-review installation to help ensure it is working properly. I've updated already, as my git version updated locally and forced me to | 19:02 |
clarkb | I haven't had any issues with new git review yet | 19:03 |
fungi | git-review 2.2.0 | 19:03 |
fungi | i sort of rushed it through because an increasing number of people were upgrading to newer git, which the previous release was broken with | 19:03 |
clarkb | the delta to the previous release was small too so probably the right move | 19:03 |
fungi | but yeah, follow up on the service-discuss ml or in #opendev if you run into anything unexpected with it | 19:04 |
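
For anyone upgrading, a typical way to pull in the new release and confirm the installed version (assuming a pip-based install) would be:

```shell
# upgrade to the latest git-review release
pip install --upgrade git-review

# confirm which version is now on PATH
git review --version
```
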
clarkb | #topic Actions from last meeting | 19:05 |
clarkb | #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-11-16-19.01.txt minutes from last meeting | 19:05 |
clarkb | I don't see any recorded actions | 19:05 |
clarkb | We'll dive right into the fun stuff then | 19:05 |
clarkb | #topic Topics | 19:05 |
clarkb | #topic Improving CD Throughput | 19:05 |
clarkb | sorry small network hiccup | 19:06 |
clarkb | A number of changes have landed to make this better while keeping our serialized one job after another setup | 19:07 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/807808 Update system-config once per buildset. | 19:07 |
clarkb | #link https://review.opendev.org/c/opendev/base-jobs/+/818297/ Reduce actions needed to be taken in base-jobs. | 19:07 |
ianw | yep, those are the last two | 19:08 |
clarkb | These are the last two updates to keep the status quo but prepare for parallel ops | 19:08 |
clarkb | Once those go in we can start thinking about adding/updating semaphores to allow jobs to run in parallel. Very exciting. Thank you ianw for pushing this along | 19:08 |
ianw | yep i'll get to that change soon and we can discuss | 19:08 |
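
A minimal sketch of the kind of Zuul semaphore configuration this would involve (the semaphore and job names here are illustrative, not the actual change):

```yaml
# define a semaphore limiting how many deploy jobs may run at once
- semaphore:
    name: infra-prod-deploy
    max: 1

# any job carrying this semaphore will wait for the others to finish
- job:
    name: infra-prod-service-example
    semaphore: infra-prod-deploy
```
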
clarkb | #topic Zuul multi scheduler setup | 19:09 |
clarkb | Just a note that a number of bug fixes have landed to zuul since we last restarted | 19:09 |
clarkb | I expect that we'll be doing a restart at some point soon to check everything is happy before zuul cuts a new release | 19:10 |
clarkb | I'm not sure if that will require a full restart and clearing of the zk state; corvus would know. Basically it is possible that this won't be a graceful restart | 19:10 |
fungi | after our next restart, it would probably be helpful to comb the scheduler/web logs for any new exceptions getting raised | 19:10 |
clarkb | s/graceful/no downtime/ | 19:10 |
corvus | yes we do need a full clear/restart | 19:10 |
clarkb | corvus: thank you for confirming | 19:11 |
fungi | i saw you indicated similar in matrix as well | 19:11 |
fungi | (for future 4.11 anyway) | 19:11 |
clarkb | and ya, generally be on the lookout for odd behaviors; our input has been really helpful to the development process here and we should keep providing that feedback | 19:11 |
corvus | i'd like to do that soon, but maybe after a few more changes land | 19:11 |
corvus | we should probably talk about multi web | 19:12 |
corvus | it is, amusingly, now our spof :) | 19:12 |
clarkb | corvus: are we thinking run a zuul-web on zuul01 as well then dns round robin? | 19:13 |
corvus | (amusing since it hasn't ever actually been a spof except that opendev only ever needed to run 1) | 19:13 |
corvus | that's an option, or a LB | 19:13 |
clarkb | if we add an haproxy that might work better for outages and balancing but it would still be a spof for us | 19:13 |
corvus | we might want to think about the LB so we can have more frequent restarts without outages | 19:13 |
clarkb | I guess the idea is haproxy will need to restart less often than zuul-web and in many cases haproxy is able to keep connections open until they complete | 19:14 |
fungi | dns round-robin is only useful for (coarse) load distribution, not failover | 19:14 |
frickler | do we have octavia available? is that in vexxhost? | 19:14 |
corvus | i figure if it's good enough for gitea it's good enough for zuul; we know that we'll want to restart zuul-web frequently, and there's a pretty long window when a zuul-web is not fully initialized, so a lb setup could make a big difference. | 19:14 |
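
A rough sketch of what a gitea-style layer-4 balancer in front of two web servers could look like (hostnames and ports are assumptions, and TLS would still terminate on the backends as it does today):

```
# haproxy sketch: TCP passthrough to two zuul-web hosts, like the gitea-lb setup
frontend zuul-web-https
    bind :::443 v4v6
    mode tcp
    default_backend zuul-web

backend zuul-web
    mode tcp
    option tcp-check
    server zuul01 zuul01.opendev.org:443 check
    server zuul02 zuul02.opendev.org:443 check
```
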
clarkb | frickler: I think it is available in vexxhost, but we don't host these services in vexxhost currently so that would add a large (~40ms?) rtt between the lb frontend and backend | 19:15 |
clarkb | corvus: good point re gitea | 19:15 |
fungi | on the other hand, if we need to take the lb down for an extended period, which is far less often, we can change dns to point directly to a single zuul-web while we work on the lb | 19:15 |
ianw | it's a bit old now, but https://review.opendev.org/c/opendev/system-config/+/677903 does the work to make haproxy a bit more generic for situations such as this | 19:16 |
fungi | or just build a new lb and switch dns to it, then tear down the old one | 19:16 |
ianw | (haproxy roles, not haproxy itself) | 19:16 |
clarkb | ianw: oh ya we'll want something like that if we go the haproxy route and don't aaS it | 19:17 |
corvus | ianw: is that for making a second haproxy server, or for using an existing one for more services? | 19:17 |
corvus | (i think it's option #1 from the commit msg) | 19:17 |
clarkb | corvus: I read the commit message as #1 as well | 19:17 |
ianw | corvus: iirc that was when we were considering a second haproxy server | 19:17 |
fungi | yeah, make it easier for us to reuse the system configuration, not the individual load balancer instances | 19:17 |
corvus | that approach seems good to me (but i don't feel strongly; if there's an aas we'd like to use that should be fine too) | 19:18 |
fungi | so that we don't end up with multiple almost identical copies of the same files in system-config for different load balancers | 19:18 |
clarkb | corvus: I think I have a slight preference for using our existing tooling for consistency | 19:19 |
clarkb | and separately if someone wants to investigate octavia we can do that and switch wholesale later (I'd be most concerned about using it across geographically distributed systems with disparate front and back ends) | 19:19 |
fungi | though for that we'd probably be better off with some form of dns-based global load balancing | 19:20 |
fungi | granted it can be a bit hard on the nameservers | 19:20 |
fungi | (availability checks driving additions and removals to a dedicated dns record/zone) | 19:21 |
fungi | requires very short ttls, which some caching resolvers don't play nicely with | 19:21 |
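
For illustration, such a health-check-driven setup is just round-robin address records with a very short TTL that the monitoring adds and removes (addresses below are placeholders from the documentation range):

```
; hypothetical zone snippet: records are added/removed as backends pass/fail checks
zuul.opendev.org. 60 IN A 203.0.113.10
zuul.opendev.org. 60 IN A 203.0.113.11
```
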
corvus | ok, i +2d ianw's change; seems like we can base a zuul-lb role on that | 19:22 |
clarkb | sounds good, anything else zuul related to go over? | 19:22 |
corvus | i'll put that on my list, but it's #2 on my opendev task list, so if someone wants to grab it first feel free :) | 19:23 |
corvus | (and that's all from me) | 19:24 |
clarkb | #topic User management on our systems | 19:24 |
clarkb | The update to irc gerritbot here went really well. The update to matrix-gerritbot did not. | 19:24 |
clarkb | It turns out that matrix-gerritbot needs a cache dir in $HOME/.cache to store its dhall intermediate artifacts | 19:24 |
clarkb | and that didn't play nicely with the idea of running the container as a different user, as it couldn't write to $HOME/.cache. I had thought I had bind mounted everything it needed and that it was all read only, but that wasn't the case. To make things a bit worse, the dhall error log messages couldn't be written because the image lacked a utf8 locale and the error messages had utf8 characters | 19:25 |
clarkb | tristanC has updated the matrix-gerritbot image to address these things so we can try again this week. I need to catch back up on that. | 19:25 |
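
A sketch of the kind of compose entry the retry would need, with the dhall cache bind mounted writable while the config stays read only (image reference, uid, and paths are assumptions for illustration):

```yaml
services:
  matrix-gerritbot:
    image: quay.io/example/matrix-gerritbot   # placeholder image reference
    user: "11000:11000"                       # run as the dedicated service uid
    environment:
      HOME: /home/gerritbot
    volumes:
      - /etc/matrix-gerritbot:/config:ro                        # config stays read only
      - /var/lib/matrix-gerritbot/cache:/home/gerritbot/.cache  # writable dhall cache
```
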
clarkb | One thing I wanted to ask about is whether or not we'd like to build our own matrix-gerritbot images using docker instead of nix so that we can have a bit more fully featured image as well as understand the process | 19:26 |
clarkb | I found the nix stuff to be quite obtuse myself and basically punted on it as a result | 19:26 |
clarkb | (the image is really interesting: it sets a bash prompt but no bash is installed, there is no /tmp (I tried to override $HOME to /tmp to fix the issue and that didn't work), etc) | 19:27 |
clarkb | I don't need an answer to that in this meeting but wanted to call it out. Let me know if you think that is a good or terrible idea once you have had a chance to ponder it | 19:28 |
fungi | i agree, it's nice to have images which can be minimally troubleshot at least | 19:28 |
ianw | it wouldn't quite fit our usual python-builder base images, though, either? | 19:29 |
clarkb | ianw: correct, it would be doing very similar things but with haskell and cabal instead of python and pip | 19:29 |
clarkb | ianw: we'd do a build in a throwaway image/layer and then copy the resulting binary into a more minimal haskell image | 19:29 |
clarkb | s/haskell/ghc/ I guess | 19:29 |
clarkb | https://hub.docker.com/_/haskell is the image we'd probably use | 19:30 |
clarkb | I don't think we would need to maintain the base images, we could just FROM that image a couple of times and copy the resulting binary over | 19:30 |
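
A rough multi-stage sketch of that approach; the executable name, haskell tag, and build command are assumptions, and a slimmer runtime base could shrink it further:

```dockerfile
# build stage: compile the bot with cabal in the full haskell image
FROM haskell:9.0 AS builder
WORKDIR /src
COPY . .
RUN cabal update && cabal install exe:matrix-gerritbot --installdir=/output

# runtime stage: reuse the haskell image (or a smaller base) and set a utf8
# locale so error messages with utf8 characters can actually be written
FROM haskell:9.0
ENV LANG=C.UTF-8
COPY --from=builder /output/matrix-gerritbot /usr/local/bin/matrix-gerritbot
ENTRYPOINT ["/usr/local/bin/matrix-gerritbot"]
```
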
clarkb | We can move on. I wanted to call this out and get people thinking about it so that we can make a decision later. It isn't urgent to decide now as it isn't an operational issue at the moment | 19:31 |
clarkb | #topic UbuntuOne two factor auth | 19:31 |
clarkb | #link http://lists.opendev.org/pipermail/service-discuss/2021-November/000298.html Using 2fa with ubuntu one | 19:31 |
fungi | at the beginning of last week i started that ml thread | 19:31 |
fungi | i wanted to bring it up again today since i know a lot of people were afk last week | 19:32 |
fungi | so far there have been no objections to proceeding, and two new volunteers to test | 19:32 |
clarkb | I have no objections, if users are comfortable with the warning in the group description I think we should enroll those who are interested | 19:32 |
fungi | even though we haven't really made a call for volunteers yet | 19:32 |
ianw | (i think i approved one already, sorry, after not reading the email) | 19:32 |
fungi | no harm done ;) | 19:33 |
clarkb | ya, it was hrw; I think hrw was aware of the concerns after working at canonical previously | 19:33 |
clarkb | an excellent volunteer :) | 19:33 |
fungi | i just didn't want to go approving more volunteers or asking for volunteers until we seemed to have some consensus that we're ready | 19:33 |
clarkb | I think so; it's been about a year and I have yet to have a problem in that time | 19:33 |
fungi | i'll give people until this time tomorrow to follow up on the ml as well before i more generally declare that we're seeking volunteers to help try it out | 19:34 |
clarkb | sounds like a plan, thanks | 19:34 |
frickler | I guess I can't be admin for that group without being member? | 19:35 |
fungi | frickler: correct | 19:35 |
clarkb | frickler: I think that is correct due to how lp works | 19:35 |
fungi | i'm also happy to add more admins for the group | 19:36 |
frickler | o.k., not a blocker I'd think, but I'm not going to join at least for now | 19:36 |
clarkb | One thing we might need to clarify with canonical/lp/ubuntu is what happens if someone is removed from the group | 19:36 |
clarkb | and until then don't remove anyone? | 19:36 |
fungi | i'll make sure to mention that in the follow-up | 19:37 |
fungi | maybe hrw knows, even | 19:37 |
ianw | it does seem like from what it says it's a one-way ticket, i was treating it as such | 19:37 |
ianw | but good to confirm | 19:37 |
clarkb | ianw: yup, that is why I asked because if we add more admins they need to be aware of that and not remove people potentially | 19:37 |
clarkb | it may also be the case that the enrollment happens on the backend once and then never changes regardless of group membership | 19:38 |
clarkb | We have a couple more topics so lets continue on | 19:38 |
clarkb | #topic Adding a lists.openinfra.dev mailman site | 19:38 |
clarkb | #link https://review.opendev.org/818826 add lists.openinfra.dev | 19:38 |
clarkb | fungi: I guess you've decided it is safe to add the new site based on current resource usage on lists.o.o? | 19:39 |
clarkb | One thing I'll note is that I don't think we've added a new site since we converted to ansible. Just be on the lookout for anything odd due to that. We do test site creation in the test jobs though | 19:39 |
fungi | yeah, i've been monitoring the memory usage there and it's actually under less pressure after the ubuntu/python/mailman upgrade | 19:39 |
clarkb | you'll also need to update DNS over in the DNS-as-a-service, but that is out of band and it is safe to land this before that happens | 19:40 |
fungi | for some summary background, as part of the renaming of the openstack foundation to the open infrastructure foundation, there's a desire to move the foundation-specific mailing lists off the openstack.org domain | 19:40 |
fungi | i'm planning to duplicate the list configs and subscribers, but leave the old archives in place | 19:40 |
clarkb | fungi: is there any concern for impact on the mm3 upgrade from this? I guess it is just another site to migrate but we'll be doing a bunch of those either way | 19:41 |
fungi | and forward from the old list addresses to the new ones of course | 19:41 |
fungi | yeah, one of the reasons i wanted to knock this out was to reduce the amount of list configuration churn we need to deal with shortly after a move to mm3 when we're still not completely familiar with it | 19:42 |
clarkb | makes sense. I think you've got the reviews you need, so approve when ready I guess :) | 19:42 |
fungi | so the more changes we can make before we migrate, the more breathing room we'll have after to finish coming up to speed | 19:42 |
clarkb | Anything else on this topic? | 19:42 |
fungi | nope, thanks. i mainly wanted to make sure everyone was aware this was going on so there were few surprises | 19:43 |
clarkb | thank you for the heads up | 19:43 |
clarkb | #topic Proxying and caching Ansible Galaxy in our providers | 19:43 |
clarkb | #link https://review.opendev.org/818787 proxy caching ansible galaxy | 19:43 |
clarkb | This came up in the context of tripleo jobs needing to use ansible collections and having less reliable downloads | 19:44 |
fungi | right | 19:44 |
clarkb | I think we set them up with zuul github projects they can require on their jobs | 19:44 |
fungi | yes, we added some of the collections they're using, i think | 19:44 |
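
For reference, consuming a collection that way is just a required-projects entry pointing at the github connection (the job name below is made up; ansible.posix is one example collection):

```yaml
- job:
    name: example-tripleo-collections-job   # illustrative job name
    required-projects:
      # zuul checks the repo out onto the node, so the job can install the
      # collection from the prepared source instead of downloading from galaxy
      - github.com/ansible-collections/ansible.posix
```
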
clarkb | Is the proxy cache something we think we should move those ansible users to? or should we continue adding github projects? | 19:44 |
clarkb | or do we need some combo of both? | 19:44 |
fungi | that's my main question | 19:45 |
fungi | one is good for integration testing, the other good for deployment testing | 19:45 |
fungi | if you're writing software which pulls things from galaxy, you may want to exercise that part of it | 19:45 |
clarkb | corvus: from a zuul perspective I know we've struggled with the github api throttling during zuul restarts. Is that something you think we should try to optimize by reducing the number of github projects in our zuul config? | 19:45 |
clarkb | fungi: I think you still point galaxy at a local file dir url. And I'm not sure you gain much testing galaxy's ability to parse file:/// vs https:/// | 19:46 |
corvus | clarkb: i don't know if that's necessary at this point; i think it's worth forgetting what we knew and starting a fresh analysis (if we think it's worthwhile or is/could-be a problem) | 19:46 |
corvus | much has changed | 19:46 |
clarkb | corvus: got it | 19:46 |
clarkb | At the end of the day adding the proxy cache is pretty low effort on our end. But the zuul required projects should be far more reliable for jobs. And since we are already doing that I sort of lean that direction | 19:47 |
clarkb | But considering the low effort to run the caching proxy I'm good with doing both and letting users decide which tradeoff is best for them | 19:48 |
fungi | yeah, the latter means we need to review every new addition, even if the project doesn't actually need to consume that dependency from arbitrary git states | 19:48 |
fungi | with the caching proxy, if they add a collection or role from galaxy they get the benefit of the proxy right away | 19:49 |
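
Jobs would opt in by pointing the galaxy client at the region-local mirror, along these lines (the URL path is an assumption based on how the other proxy caches are laid out):

```ini
# ansible.cfg on the test node, using the region-local proxy cache
[galaxy]
server = https://mirror.dfw.rax.opendev.org/galaxy/
```
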
clarkb | good point. I'll add this to my review list for after lunch and we can roll forward with both while we sort out github connections in zuul | 19:49 |
clarkb | Anything else on this subject? | 19:49 |
fungi | but i agree that if the role or collection is heavily used then having it in the tenant config is going to be superior for stability | 19:49 |
fungi | i didn't have anything else on that one | 19:50 |
clarkb | #topic Open Discussion | 19:50 |
clarkb | We've got 10 minutes for any other items to discuss. | 19:50 |
fungi | you had account cleanups on the agenda too | 19:50 |
clarkb | ya but there isn't anything to say about them. I've been out and no time to discuss them | 19:50 |
fungi | for anyone reviewing storyboard, i have a couple of webclient fixes up | 19:51 |
clarkb | It's a bit aspirational at this point :/ I need to block off a solid day or three and just dive into it | 19:51 |
fungi | #link https://review.opendev.org/814053 Bindep cleanup and JavaScript updates | 19:51 |
fungi | that solves bitrot in the tests | 19:51 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/819733 upgrade Gerrit to 3.3.8 | 19:51 |
fungi | and makes it deployable again | 19:51 |
clarkb | Gerrit made new releases and ^ updates our image so that we can upgrade | 19:51 |
clarkb | Might want to do that during a zuul restart? | 19:51 |
fungi | yeah, since we need to clear zk anyway that probably makes sense | 19:52 |
fungi | #link https://review.opendev.org/814041 Update default contact in error message template | 19:52 |
fungi | that fixes the sb error message to point users to oftc now instead of freenode | 19:52 |
fungi | can't merge until the tests work again (the previous change i mentioned) | 19:52 |
ianw | oh i still have the 3.4 checklist to work through. hopefully can discuss next week | 19:53 |
clarkb | ianw: 819733 does update the 3.4 image to 3.4.2 as well. We may want to refresh the test system on that once the above change lands | 19:53 |
clarkb | The big updates in these new versions are to reindexing, so that's something that might actually impact the upgrade | 19:53 |
clarkb | sounds like they added a bunch of performance improvements | 19:54 |
ianw | iceweasel ... there's a name i haven't heard in a while | 19:54 |
fungi | especially since it essentially no longer exists | 19:54 |
ianw | clarkb: ++ | 19:54 |
clarkb | Last call, then we can all go eat $meal | 19:56 |
ianw | kids these days wouldn't even remember the trademark wars of ... 2007-ish? | 19:56 |
fungi | i had to trademark uphill both ways in the snow | 19:57 |
clarkb | ianw: every browser is Chrome now too | 19:57 |
clarkb | #endmeeting | 19:57 |
opendevmeet | Meeting ended Tue Nov 30 19:57:59 2021 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:57 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2021/infra.2021-11-30-19.01.html | 19:57 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2021/infra.2021-11-30-19.01.txt | 19:57 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2021/infra.2021-11-30-19.01.log.html | 19:57 |
clarkb | Thank you everyone | 19:58 |
clarkb | We'll see you here next week | 19:58 |
fungi | thanks clarkb! | 19:58 |
corvus | and app too (thanks electron!) | 19:58 |
fungi | same bat time, same bat channel! | 19:58 |
ianw | https://en.wikipedia.org/wiki/Mozilla_software_rebranded_by_Debian - 2006 - i'll take 2007 as a pretty good guess off the top of my head :) | 19:59 |
ianw | i remember installing it on an Itanium desktop, so that constrained the timeline a bit | 19:59 |
clarkb | wow itanium | 20:00 |
ianw | gosh that was a long time ago! | 20:00 |
clarkb | we had a couple racks of itanium servers when I was at Intel. I think they were largely idle because by that point in time everyone knew the arch wasn't going anywhere | 20:01 |
ianw | oh those were the days. this was pre amd64 (which was pre x86-64!) so just about everything had weird issues related to 64-bit pointers | 20:01 |
ianw | there was ~ nobody running gnome, etc. on 64-bit in those days | 20:02 |
clarkb | ianw: not even on sparc? | 20:02 |
ianw | maybe enthusiasts would play with things on a sparc, or an alpha multia etc. but generally you'd run x11 and something more basic like fvwm | 20:03 |
clarkb | I guess solaris had CDE, so probably not many gnome users. But by the time opensolaris happened gnome was the default iirc | 20:04 |
clarkb | we had a lot of sparc stuff at the university | 20:04 |
ianw | yeah, sparc was fun and coveted hardware if you could find it. i feel like people fiddling with the alternative archs were also more bsd-ish. a lot of netbsd going around for alpha and sparc at the time | 20:07 |
ianw | i don't know why i coveted 200lb boxes full of jet engine fans, but it was a different time :) | 20:09 |
fungi | i did have a 64-bit sparc (sunstation) and 64-bit mips (sgi indy) | 20:14 |
fungi | s/sunstation/sparcstation/ | 20:14 |
fungi | (the sunstations also existed but i didn't have one) | 20:15 |
fungi | er, no, the sgi indy was 32-bit mips, but i did have a dec alpha as well | 20:16 |
fungi | i eventually swapped out the sparcstation for sun t1-105 rackmount servers because they were more compact and drew less power | 20:17 |