*** hamalq has quit IRC | 02:44 | |
*** sboyron has joined #opendev-meeting | 07:11 | |
*** hashar has joined #opendev-meeting | 07:50 | |
*** SotK has quit IRC | 09:00 | |
*** SotK has joined #opendev-meeting | 09:01 | |
*** hashar has quit IRC | 09:22 | |
*** hashar has joined #opendev-meeting | 09:28 | |
*** hashar is now known as hasharLunch | 10:18 | |
*** hasharLunch is now known as hashar | 12:44 | |
*** hamalq has joined #opendev-meeting | 17:10 | |
*** hamalq has quit IRC | 17:10 | |
*** hamalq has joined #opendev-meeting | 17:11 | |
*** hashar has quit IRC | 17:59 | |
clarkb | anyone else here for the infra meeting? | 19:01 |
clarkb | I'm trying to juggle a quick set of updates for one of the topics but we'll get things going | 19:01 |
clarkb | #startmeeting infra | 19:01 |
openstack | Meeting started Tue Nov 10 19:01:39 2020 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
*** openstack changes topic to " (Meeting topic: infra)" | 19:01 | |
openstack | The meeting name has been set to 'infra' | 19:01 |
corvus | o/ | 19:01 |
fungi | ohai | 19:02 |
clarkb | #link http://lists.opendev.org/pipermail/service-discuss/2020-November/000134.html Our Agenda | 19:02 |
ianw | o/ | 19:02 |
clarkb | #topic Announcements | 19:03 |
*** openstack changes topic to "Announcements (Meeting topic: infra)" | 19:03 | |
*** diablo_rojo__ has joined #opendev-meeting | 19:03 | |
diablo_rojo__ | o/ | 19:03 |
clarkb | Wallaby cycle signing key has been activated https://review.opendev.org/760364 | 19:03 |
clarkb | Please sign if you haven't yet https://docs.opendev.org/opendev/system-config/latest/signing.html | 19:03 |
diablo_rojo__ | o/ | 19:03 |
clarkb | I should find time to do that | 19:03 |
fungi | as long as we have at least a few folks attesting to it, that should be fine. the previous key has also published a signature for it anyway | 19:04 |
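For anyone catching up on the attestation step discussed above, it is roughly the following; the key ID below is a stand-in, and the signing doc linked earlier is authoritative:

```shell
# Sketch only -- 0xDEADBEEF stands in for the real Wallaby cycle key ID.
gpg --recv-keys 0xDEADBEEF     # fetch the cycle signing key
gpg --fingerprint 0xDEADBEEF   # verify the fingerprint out of band first
gpg --sign-key 0xDEADBEEF      # attest to it with your own key
gpg --send-keys 0xDEADBEEF     # publish your signature
```

This needs a keyserver and your own keypair, so it is illustrative rather than copy-paste ready.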
clarkb | #topic Actions from last meeting | 19:05 |
*** openstack changes topic to "Actions from last meeting (Meeting topic: infra)" | 19:05 | |
clarkb | #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-11-03-19.01.txt minutes from last meeting | 19:05 |
clarkb | There were no recorded actions | 19:05 |
clarkb | #topic Priority Efforts | 19:05 |
*** openstack changes topic to "Priority Efforts (Meeting topic: infra)" | 19:05 | |
clarkb | #topic Update Config Management | 19:05 |
*** openstack changes topic to "Update Config Management (Meeting topic: infra)" | 19:05 | |
clarkb | I believe we have an update on mirror-update.opendev.org from ianw and fungi? The reprepro stuff has been converted to ansible and the old puppeted server is no more? | 19:05 |
fungi | that sounds right to me | 19:06 |
ianw | yes, all done now, i've removed the old server so it's all opendev.org, all the time :) | 19:06 |
clarkb | excellent, thank you for working on that. | 19:06 |
clarkb | Has the change to do vos release via ssh landed? | 19:06 |
ianw | yes, i haven't double checked all the runs yet this morning, but the ones i saw last night looked good | 19:07 |
fungi | 758695 merged and was deployed by 05:12:16 | 19:07 |
clarkb | cool. Are there any other puppet conversions to call out? | 19:07 |
fungi | so in theory any mirror pulses starting after that time should have used it | 19:07 |
ianw | umm you saw the thing about the afs puppet jobs | 19:09 |
ianw | i think they have just been broken for ... a long time? | 19:09 |
clarkb | ianw: yup I've pushed up a few changes/patchsets to try and fix the testing on that change | 19:09 |
clarkb | and yes I expect that has always been broken | 19:09 |
clarkb | just more noticeable now due to the symlink thing | 19:09 |
clarkb | ianw: if my patches don't work then maybe we should ignore e208 for now in order to get the puppetry happy | 19:10 |
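Assuming "e208" here refers to ansible-lint rule E208 (risky file permissions), temporarily ignoring it is a one-flag change; playbook path below is hypothetical:

```shell
# One-off: skip rule 208 for a single lint run (rule ID assumed to be
# ansible-lint's E208, "file permissions unset or incorrect").
ansible-lint -x 208 playbooks/service-afs.yaml

# Or persistently, via a .ansible-lint file at the repo root:
# skip_list:
#   - '208'
```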
ianw | ok, i think afs is my next challenge to get updated | 19:10 |
fungi | grafana indicates ~current state (all <4hr old) for our package mirrors | 19:10 |
clarkb | #topic OpenDev | 19:11 |
*** openstack changes topic to "OpenDev (Meeting topic: infra)" | 19:11 | |
clarkb | Preparations for a gerrit 3.2 upgrade are ramping up again | 19:11 |
clarkb | #link http://lists.opendev.org/pipermail/service-announce/2020-October/000012.html Our announcement for the November 20 - 22 upgrade window | 19:11 |
clarkb | fungi and I have got review-test upgraded from a ~november 5 prod state | 19:12 |
clarkb | The server should be up and usable for testing and other interactions | 19:12 |
fungi | yep, fully upgraded to 3.2 | 19:12 |
clarkb | ianw: we are hoping that will help with your jeepyb testing | 19:12 |
fungi | also usable for demonstrating the ui | 19:12 |
ianw | ahh yes, i can play with the api and see if we can replicate the jeepyb things | 19:13 |
fungi | and it sounds like the 3.3 release is coming right about the time we planned to upgrade to 3.2, so we should probably plan a separate 3.3 upgrade soon after? | 19:13 |
clarkb | fungi: I think once we've settled then ya 3.3 should happen quickly after | 19:13 |
clarkb | I think we've basically decided that the surrogate gerrit idea is neat, but introduces a bit of complexity in knowing what needs to be synced back and forth to end up with a valid upgrade path that way. | 19:13 |
fungi | i suppose we can keep review-test around to also test the 3.3 upgrade if we want | 19:13 |
clarkb | fungi and I did discover that giving the notedb conversion more threads sped up that process. Still not short but noticeably quicker | 19:13 |
clarkb | we gave it 50% more threads and it ran 40% quicker | 19:14 |
clarkb | I think we plan to double the default thread count when we do the production upgrade | 19:14 |
fungi | we might be able to speed it up a bit more still too, though i don't expect much below 4 hours to complete the notedb migration step | 19:14 |
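For reference, the thread count being discussed is a flag on Gerrit's offline migration command; a sketch, with the site path hypothetical and the flag per Gerrit's migrate-to-note-db documentation:

```shell
# Offline NoteDb migration with an explicit thread count (site path hypothetical).
java -jar gerrit.war migrate-to-note-db --threads 12 -d /home/gerrit2/review_site
```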
clarkb | There are a couple of things that have popped up that I wanted to bring up for wider discussion. | 19:14 |
clarkb | The first is that we have confirmed that gerrit does not like updating accounts if they don't have an email set | 19:15 |
clarkb | #link https://bugs.chromium.org/p/gerrit/issues/detail?id=13654 | 19:15 |
fungi | if we budget ~4 hours on the upgrade plan, i guess we can see where that would leave us in possible timelines | 19:15 |
fungi | oh, yeah, that email-less account behavior strikes me as a bug | 19:15 |
clarkb | I've filed that upstream bug due to that weird account management behavior. You can create an internal account just fine without an email address, but you cannot then update that account's ssh keys | 19:15 |
clarkb | you also can't use a duplicate email address across accounts | 19:16 |
fungi | you're allowed to create accounts with no e-mail address, but adding an ssh key to one after the fact throws a weird backtrace into the logs and responds "unavailable" | 19:16 |
fungi | so probably just a regression | 19:16 |
clarkb | this means that if we need to update our admin accounts we may need to set a unique email address on them :/ | 19:16 |
corvus | we can set "infra-root+foobar" as the email for our admin accounts | 19:16 |
fungi | yeah, that seems like a reasonable workaround | 19:16 |
clarkb | ah cool | 19:16 |
ianw | ++ | 19:17 |
clarkb | fungi: ^ we should probably test that gerrit treats those as unique? | 19:17 |
corvus | or i guess probably our own email addresses depending on hosting provider | 19:17 |
fungi | since rackspace's e-mail system we're using does support + addresses as automatic aliases | 19:17 |
corvus | gmail supports it iirc | 19:17 |
clarkb | and hopefully newer gerrit will just fix the problem | 19:17 |
corvus | (and my exim/cyrus does) | 19:17 |
fungi | i'm happy to test that gerrit sees those addresses as unique, but i can pretty well guarantee it will | 19:17 |
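A quick way to sanity-check the plus-addressing idea: Gerrit compares the full address string when enforcing uniqueness, so `infra-root+<name>` variants (hypothetical addresses below) stay distinct even though they deliver to one mailbox:

```shell
# Sketch with hypothetical addresses: each plus-addressed variant is a
# distinct string, which is all the uniqueness check compares.
printf '%s\n' \
  'infra-root+clarkb@example.org' \
  'infra-root+fungi@example.org' \
  'infra-root+corvus@example.org' > /tmp/admin-addrs.txt
sort -u /tmp/admin-addrs.txt | wc -l   # -> 3 (no collisions)
```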
clarkb | fungi: thanks | 19:18 |
clarkb | that seems like a quick and easy fix so we probably don't need to get into it much further | 19:18 |
clarkb | The other thing I wanted to bring up was the set of changes that I've prepped in relation to the upgrade | 19:18 |
clarkb | #link https://review.opendev.org/#/q/status:open+topic:gerrit-upgrade-prep Changes for before and after the upgrade | 19:18 |
clarkb | A number of those should be safe to land today and they do not have a WIP | 19:18 |
fungi | the main reason i would want to avoid having to put e-mail addresses on those accounts is it's just one more thing which can tab complete to them in the gerrit ui and confuse users | 19:18 |
clarkb | Another chunk reflect state after the upgrade and are WIP because we shouldn't land them yet | 19:19 |
clarkb | It would be great if we could get reviews on the lot of them to sanity check things as well as land as much as we can today | 19:19 |
clarkb | (or $day before the upgrade) | 19:19 |
clarkb | One specific concern I've got is there are ~4 system-config changes that sort of all need to land together because they reflect post upgrade system state, but zuul will run them in sequence | 19:20 |
clarkb | so I'm wondering how should we manipulate zuul during/after the upgrade to safely run those updates against the updated state | 19:20 |
corvus | fungi: good point, we should probably avoid "corvus+admin"; infra-root+corvus is better due to tab-complete | 19:20 |
clarkb | https://review.opendev.org/#/c/757155/ https://review.opendev.org/#/c/757625/ https://review.opendev.org/#/c/757156/ https://review.opendev.org/#/c/757176/ are the 4 changes I've identified in this situation | 19:21 |
corvus | clarkb: bracket with disable/enable jobs change? | 19:21 |
clarkb | corvus: ya so I think our options are: disable then enable the jobs entirely, force merge them all before zuul starts, squash them and set it up so that a single job running is fine | 19:22 |
clarkb | one concern with disabling the jobs then enabling them is I worry I won't manage to sufficiently disable the job since we trigger them in a number of places. But that concern may just be mitigated with sufficient grepping | 19:22 |
corvus | i agree and force-merge or squashing means less time spinning wheels | 19:23 |
clarkb | just before the meeting I discovered that jeepyb wasn't running the gerrit 3.1 and 3.2 image builds as an example of where we've missed things like that previously | 19:23 |
fungi | i'm good with squashing, those changes aren't massive | 19:24 |
fungi | and they're all for the same repo | 19:24 |
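Squashing a short same-repo series like the one under discussion can be done locally and pushed back as one change; a throwaway-repo sketch of the mechanics (paths and messages hypothetical):

```shell
# Throwaway repo demonstrating a soft-reset squash: three commits collapse
# into one, with the final tree preserved.
git init -q /tmp/squash-demo
cd /tmp/squash-demo
git config user.email demo@example.org
git config user.name demo
echo one   > f; git add f; git commit -qm 'change 1'
echo two   > f; git commit -aqm 'change 2'
echo three > f; git commit -aqm 'change 3'
# Move HEAD back to the root commit, keeping the change-3 tree staged:
git reset --soft "$(git rev-list --max-parents=0 HEAD)"
git commit --amend -qm 'post-upgrade system state (squashed)'
git rev-list --count HEAD   # -> 1
```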
clarkb | the changes in system-config that trail the ones I've listed above should all be safe to land as after the fact cleanups | 19:24 |
clarkb | Another concern I had was I expect gitea replication to take a day and a half or so based on testing, I don't think we rely on gitea state for our zuul jobs that run ansible, but if we do anywhere can you call that out? | 19:24 |
clarkb | because that is another syncing of the world step that may impact our automated deployments | 19:25 |
clarkb | but ya if people can review those changes and think about them from a perspective of how do we land them safely post upgrade that would be great. I'm open to feedback and ideas | 19:25 |
clarkb | I'm hoping to write up a concrete upgrade plan doc soon (starting tomorrow likely) and we can start to fill in those details | 19:26 |
clarkb | at this point I think my biggest concern with the upgrade revolves around how do we turn zuul back on safely :) | 19:26 |
corvus | the gitea replication lag will probably confuse folks cloning or pulling changes (or using gertty) | 19:26 |
*** hashar has joined #opendev-meeting | 19:27 | |
corvus | but it's happened before, so i think if we include that in the announcement folks can deal | 19:27 |
fungi | this is also why even if we can get stuff done on saturday we need to say the maintenance is through sunday | 19:27 |
clarkb | fungi: yup and we have done that | 19:28 |
fungi | (or early monday as we've been communicating so far) | 19:28 |
clarkb | another thought that occurred to me when writing https://review.opendev.org/#/c/762191/1 earlier today is that it feels like we're effectively abandoning review-dev | 19:28 |
clarkb | Should we try to upgrade review-dev or decide it doesn't work well for us anymore and we need something like review-test going forward? | 19:28 |
clarkb | I'm hopeful that zuul jobs can fit in there too | 19:29 |
fungi | i had assumed, perhaps incorrectly, that we wouldn't really need review-dev going forward | 19:29 |
clarkb | fungi: fwiw I don't think that is incorrect, mostly just me realizing today "Oh ya we still have review dev and these changes will make it sad" | 19:29 |
clarkb | I think that is ok if one of the todo items here is retire review-dev | 19:29 |
clarkb | we can put it in the emergency file in the interim | 19:29 |
clarkb | review-test with prod like data has been way more valuable imo | 19:30 |
fungi | our proliferation of -dev servers predates our increased efficiency at standing up test servers on demand, or even as part of ci | 19:30 |
fungi | and at some point they become more of a maintenance burden than a benefit | 19:30 |
corvus | clarkb: ++ | 19:32 |
clarkb | ok /me adds put review-dev in stasis to the list | 19:32 |
clarkb | The last thing on my talk about gerrit list is that storyboard is still an unknown | 19:33 |
clarkb | its-storyboard may or may not work is the more specific way of saying that | 19:33 |
clarkb | fungi: how terrible would it be to set up credentials for review-test against storyboard-dev now and test that integration? | 19:33 |
fungi | we're building it into the images, adding credentials for it would be fairly trivial | 19:33 |
fungi | i can give that a go later this week and test it | 19:34 |
clarkb | that would be great, thank you | 19:34 |
clarkb | anyone else have questions or concerns to bring up around the upgrade? | 19:34 |
fungi | i think where it's likely to fall apart is around commentlinks mapping to the its actions | 19:35 |
fungi | (talking about its-storyboard plugin integration that is) | 19:37 |
clarkb | #topic General topics | 19:38 |
*** openstack changes topic to "General topics (Meeting topic: infra)" | 19:38 | |
clarkb | #topic PTG Followups | 19:38 |
*** openstack changes topic to "PTG Followups (Meeting topic: infra)" | 19:38 | |
clarkb | Just a note that I haven't forgotten these, but the time pressure for the gerrit upgrade has me focusing on that (the downside to having all the things happen in a short period of time) | 19:38 |
clarkb | I'm hoping tomorrow will be a "writing" day and I'll get an upgrade plan doc written as well as some of these ptg things and not look at failing jobs or code for a bit | 19:39 |
clarkb | #topic Meetpad not usable from some locations | 19:39 |
*** openstack changes topic to "Meetpad not usable from some locations (Meeting topic: infra)" | 19:39 | |
clarkb | I brought this up with Horace and he was willing to help us test it, then I completely spaced on it because last week had a very distracting event going on. | 19:40 |
clarkb | I'll try pinging horace this evening (my time) to see if there is a good time to test again | 19:40 |
clarkb | then hopefully we can narrow this down to corporate firewalls or the great firewall etc | 19:40 |
clarkb | #topic Bup and Borg Backups | 19:41 |
*** openstack changes topic to "Bup and Borg Backups (Meeting topic: infra)" | 19:41 | |
clarkb | Wanted to bring this up since there have been recent updates | 19:41 |
clarkb | In particular I think we declared bup bankruptcy on etherpad since /root/.bup was using significant disk | 19:41 |
clarkb | and out of that ianw has landed changes to start running borg on all the hosts we back up | 19:42 |
clarkb | ianw: were you happy with the results of those changes? | 19:42 |
ianw | i was last night on etherpad | 19:42 |
ianw | i haven't yet gone through all the other hosts but will today | 19:42 |
clarkb | sounds good | 19:43 |
ianw | note per our discussion bup is now off on etherpad, because it was filling up the disk | 19:43 |
clarkb | I think the biggest change from what we were doing with bup is that borg requires a bit more opt in to what is backed up rather than backing up all of / with exclusions | 19:43 |
clarkb | (we could set borg to backup / then do exclusions too I suppose) | 19:44 |
clarkb | want to call that out as I tripped over it a few times when reasoning about exclusion list updates and the like | 19:44 |
ianw | another thing is that the vexxhost backup server has 1tb attached, the rax one 3tb | 19:45 |
fungi | i think if we set a good policy about where we expect important data/state to reside on our systems and then back up those paths, it's fine | 19:45 |
clarkb | ianw: also have we set the borg settings to do append only backups? | 19:45 |
clarkb | we had called that out as a desirable feature and now I can't recall if we're setting that or not | 19:45 |
ianw | yes, we run the remote side with --append-only | 19:46 |
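For the record, the opt-in backup model and the append-only remote being described look roughly like this; all paths, hosts, and key material below are hypothetical, and borg's own docs are authoritative:

```shell
# Client side: back up an explicit list of paths rather than / with exclusions.
borg create --compression lz4 \
    borg@backup01.example.org:/opt/backups/etherpad::etherpad-{now} \
    /etc /var/etherpad /home/backup-staging

# Server side (~borg/.ssh/authorized_keys): the forced command pins the client
# to append-only access, so a compromised host cannot prune old archives.
# command="borg serve --append-only --restrict-to-path /opt/backups/etherpad",restrict ssh-ed25519 AAAA... etherpad-backup
```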
clarkb | great, thank you for working on this. Hopefully we end up freeing a lot of local disk that was consumed by /root/.bup as well as handle the python2 less world | 19:47 |
clarkb | I had a couple other topics (openstackid.org and splitting the puppet stuff up) but I don't think anything has happened on those subjects | 19:48 |
clarkb | #topic Open Discussion | 19:48 |
*** openstack changes topic to "Open Discussion (Meeting topic: infra)" | 19:48 | |
clarkb | tomorrow is a holiday in many parts of the world which is why I'm hoping I can get away with writing documents :) | 19:49 |
clarkb | if you've got the day off enjoy | 19:49 |
corvus | ianw: there was some discussion in #zuul this morning related to your pypa zuul work; did you see that? is a tl;dr worthwhile? | 19:49 |
ianw | corvus: sure, pypa have shown interest in zuul and i've been working to get a proof-of-concept up | 19:49 |
corvus | oh sorry, i meant do you want me to summarize the #zuul chat? :) | 19:50 |
ianw | the pull request doing some tox testing is @ https://github.com/pypa/pip/pull/9107 | 19:50 |
ianw | oh, haha, sure | 19:50 |
corvus | it was suggested that if we pull more stuff out of the run playbook and put it into pre (eg, ensure-tox etc) it would make the console tab more accessible to folks. i think that's relevant in your pr since that job is being defined there. i think avass was going to leave a comment. | 19:51 |
corvus | building on that, we thought we might look into having zuul default to the console tab rather than the summary tab. (this item is less immediately relevant) | 19:52 |
ianw | oh right, yeah i pushed a change to do that in that pr | 19:53 |
clarkb | oh inmotionhosting has reached out to me about possibly providing cloud resources to opendev. I've got an introductory call with them tomorrow to start that conversation | 19:53 |
corvus | the overall theme is if we focus on simplifying the run playbook and present the console tab to users, we can immediately present important information to users, increase the signal/noise ratio, and the output may start to seem a little more familiar to folks using other ci tools. | 19:53 |
corvus | ianw: cool, then you're probably ahead of me on this, i had to duck out right after that convo. :) | 19:54 |
ianw | corvus: this is true, as with travis or github, i forget, you get basically your yaml file shown to you in a "console" format | 19:54 |
ianw | like you click to open up each step and see the logs | 19:54 |
corvus | ianw: yeah, and we do too, it's just our yaml file is way bigger :) | 19:55 |
corvus | (and the console tab hides pre/post playbooks by default, so putting "boring" stuff in those is a win for ux [assuming it's appropriate to put them there]) | 19:55 |
corvus | clarkb: neatoh | 19:56 |
ianw | i'm pretty aware that just using zuul to run tox as 3rd party CI for github isn't a big goal for us ... but i do feel like there's some opportunity to bring pip a little further along here | 19:56 |
fungi | the tasks like "Run tox testing" which are just role inclusion statements could also be considered noise, i suppose | 19:57 |
corvus | fungi: yeah, that might be worth a ui re-think | 19:57 |
corvus | (maybe we can ignore those?) | 19:57 |
corvus | clarkb: are they a private cloud provider? | 19:58 |
fungi | or maybe "expandable" task results could be more prominent in the ui somehow | 19:58 |
clarkb | corvus: yup, the brief intro I got was that they could run an openstack private cloud that we would use | 19:58 |
fungi | besides just the leading > marker | 19:58 |
clarkb | We are just about to our hour time limit. Thank you everyone! | 19:59 |
fungi | thanks clarkb! | 19:59 |
corvus | clarkb: thx! | 19:59 |
clarkb | We'll see you here next week. Probably with another focus on gerrit as that'll be a few days before the planned upgrade | 20:00 |
clarkb | probably do a sanity check go no go then too | 20:00 |
clarkb | #endmeeting | 20:00 |
*** openstack changes topic to "Incident management and meetings for the OpenDev sysadmins; normal discussions are in #opendev" | 20:00 | |
openstack | Meeting ended Tue Nov 10 20:00:10 2020 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 20:00 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-11-10-19.01.html | 20:00 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-11-10-19.01.txt | 20:00 |
openstack | Log: http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-11-10-19.01.log.html | 20:00 |
corvus | inmotion is in el segundo... i left my wallet in el segundo. | 20:00 |
fungi | maybe they can help you find it! | 20:07 |
*** hashar has quit IRC | 20:55 | |
*** sboyron has quit IRC | 23:36 |