*** hamalq has quit IRC | 02:44 | |
*** sboyron has joined #opendev-meeting | 07:11 | |
*** hashar has joined #opendev-meeting | 07:50 | |
*** SotK has quit IRC | 09:00 | |
*** SotK has joined #opendev-meeting | 09:01 | |
*** hashar has quit IRC | 09:22 | |
*** hashar has joined #opendev-meeting | 09:28 | |
*** hashar is now known as hasharLunch | 10:18 | |
*** hasharLunch is now known as hashar | 12:44 | |
*** hamalq has joined #opendev-meeting | 17:10 | |
*** hamalq has quit IRC | 17:10 | |
*** hamalq has joined #opendev-meeting | 17:11 | |
*** hashar has quit IRC | 17:59 | |
clarkb | anyone else here for the infra meeting? | 19:01 |
clarkb | I'm trying to juggle a quick set of updates for one of the topics but we'll get things going | 19:01 |
clarkb | #startmeeting infra | 19:01 |
openstack | Meeting started Tue Nov 10 19:01:39 2020 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
*** openstack changes topic to " (Meeting topic: infra)" | 19:01 | |
openstack | The meeting name has been set to 'infra' | 19:01 |
corvus | o/ | 19:01 |
fungi | ohai | 19:02 |
clarkb | #link http://lists.opendev.org/pipermail/service-discuss/2020-November/000134.html Our Agenda | 19:02 |
ianw | o/ | 19:02 |
clarkb | #topic Announcements | 19:03 |
*** openstack changes topic to "Announcements (Meeting topic: infra)" | 19:03 | |
*** diablo_rojo__ has joined #opendev-meeting | 19:03 | |
diablo_rojo__ | o/ | 19:03 |
clarkb | Wallaby cycle signing key has been activated https://review.opendev.org/760364 | 19:03 |
clarkb | Please sign if you haven't yet https://docs.opendev.org/opendev/system-config/latest/signing.html | 19:03 |
diablo_rojo__ | o/ | 19:03 |
clarkb | I should find time to do that | 19:03 |
fungi | as long as we have at least a few folks attesting to it, that should be fine. the previous key has also published a signature for it anyway | 19:04 |
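For anyone catching up on the attestation step discussed above, it is roughly the following; the key ID below is a stand-in, and the signing doc linked earlier is authoritative:

```shell
# Sketch only -- 0xDEADBEEF stands in for the real Wallaby cycle key ID.
gpg --recv-keys 0xDEADBEEF     # fetch the cycle signing key
gpg --fingerprint 0xDEADBEEF   # verify the fingerprint out of band first
gpg --sign-key 0xDEADBEEF      # attest to it with your own key
gpg --send-keys 0xDEADBEEF     # publish your signature
```

This needs a keyserver and your own keypair, so it is illustrative rather than copy-paste ready.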
clarkb | #topic Actions from last meeting | 19:05 |
*** openstack changes topic to "Actions from last meeting (Meeting topic: infra)" | 19:05 | |
clarkb | #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-11-03-19.01.txt minutes from last meeting | 19:05 |
clarkb | There were no recorded actions | 19:05 |
clarkb | #topic Priority Efforts | 19:05 |
*** openstack changes topic to "Priority Efforts (Meeting topic: infra)" | 19:05 | |
clarkb | #topic Update Config Management | 19:05 |
*** openstack changes topic to "Update Config Management (Meeting topic: infra)" | 19:05 | |
clarkb | I believe we have an update on mirror-update.opendev.org from ianw and fungi? The reprepro stuff has been converted to ansible and the old puppeted server is no more? | 19:05 |
fungi | that sounds right to me | 19:06 |
ianw | yes, all done now, i've removed the old server so it's all opendev.org, all the time :) | 19:06 |
clarkb | excellent, thank you for working on that. | 19:06 |
clarkb | Has the change to do vos release via ssh landed? | 19:06 |
ianw | yes, i haven't double checked all the runs yet this morning, but the ones i saw last night looked good | 19:07 |
fungi | 758695 merged and was deployed by 05:12:16 | 19:07 |
clarkb | cool. Are there any other puppet conversions to call out? | 19:07 |
fungi | so in theory any mirror pulses starting after that time should have used it | 19:07 |
ianw | umm you saw the thing about the afs puppet jobs | 19:09 |
ianw | i think they have just been broken for ... a long time? | 19:09 |
clarkb | ianw: yup I've pushed up a few changes/patchsets to try and fix the testing on that change | 19:09 |
clarkb | and yes I expect that has always been broken | 19:09 |
clarkb | just more noticeable now due to the symlink thing | 19:09 |
clarkb | ianw: if my patches don't work then maybe we should ignore e208 for now in order to get the puppetry happy | 19:10 |
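Assuming "e208" here refers to ansible-lint rule E208 (risky file permissions), temporarily ignoring it is a one-flag change; playbook path below is hypothetical:

```shell
# One-off: skip rule 208 for a single lint run (rule ID assumed to be
# ansible-lint's E208, "file permissions unset or incorrect").
ansible-lint -x 208 playbooks/service-afs.yaml

# Or persistently, via a .ansible-lint file at the repo root:
# skip_list:
#   - '208'
```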
ianw | ok, i think afs is my next challenge to get updated | 19:10 |
fungi | grafana indicates ~current state (all <4hr old) for our package mirrors | 19:10 |
clarkb | #topic OpenDev | 19:11 |
*** openstack changes topic to "OpenDev (Meeting topic: infra)" | 19:11 | |
clarkb | Preparations for a gerrit 3.2 upgrade are ramping up again | 19:11 |
clarkb | #link http://lists.opendev.org/pipermail/service-announce/2020-October/000012.html Our announcement for the November 20 - 22 upgrade window | 19:11 |
clarkb | fungi and I have got review-test upgraded from a ~november 5 prod state | 19:12 |
clarkb | The server should be up and usable for testing and other interactions | 19:12 |
fungi | yep, fully upgraded to 3.2 | 19:12 |
clarkb | ianw: we are hoping that will help with your jeepyb testing | 19:12 |
fungi | also usable for demonstrating the ui | 19:12 |
ianw | ahh yes, i can play with the api and see if we can replicate the jeepyb things | 19:13 |
fungi | and it sounds like the 3.3 release is coming right about the time we planned to upgrade to 3.2, so we should probably plan a separate 3.3 upgrade soon after? | 19:13 |
clarkb | fungi: I think once we've settled then ya 3.3 should happen quickly after | 19:13 |
clarkb | I think we've basically decided that the surrogate gerrit idea is neat, but introduces a bit of complexity in knowing what needs to be synced back and forth to end up with a valid upgrade path that way. | 19:13 |
fungi | i suppose we can keep review-test around to also test the 3.3 upgrade if we want | 19:13 |
clarkb | fungi and I did discover that giving the notedb conversion more threads sped up that process. Still not short but noticeably quicker | 19:13 |
clarkb | we gave it 50% more threads and it ran 40% quicker | 19:14 |
clarkb | I think we plan to double the default thread count when we do the production upgrade | 19:14 |
fungi | we might be able to speed it up a bit more still too, though i don't expect much below 4 hours to complete the notedb migration step | 19:14 |
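For reference, the thread count being discussed is a flag on Gerrit's offline migration command; a sketch, with the site path hypothetical and the flag per Gerrit's migrate-to-note-db documentation:

```shell
# Offline NoteDb migration with an explicit thread count (site path hypothetical).
java -jar gerrit.war migrate-to-note-db --threads 12 -d /home/gerrit2/review_site
```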
clarkb | There are a couple of things that have popped up that I wanted to bring up for wider discussion. | 19:14 |
clarkb | The first is that we have confirmed that gerrit does not like updating accounts if they don't have an email set | 19:15 |
clarkb | #link https://bugs.chromium.org/p/gerrit/issues/detail?id=13654 | 19:15 |
fungi | if we budget ~4 hours on the upgrade plan, i guess we can see where that would leave us in possible timelines | 19:15 |
fungi | oh, yeah, that email-less account behavior strikes me as a bug | 19:15 |
clarkb | I've filed that upstream bug due to that weird account management behavior. You can create an internal account just fine without an email address, but you cannot then update that account's ssh keys | 19:15 |
clarkb | you also can't use a duplicate email address across accounts | 19:16 |
fungi | you're allowed to create accounts with no e-mail address, but adding an ssh key to one after the fact throws a weird backtrace into the logs and responds "unavailable" | 19:16 |
fungi | so probably just a regression | 19:16 |
clarkb | this means that if we need to update our admin accounts we may need to set a unique email address on them :/ | 19:16 |
corvus | we can set "infra-root+foobar" as the email for our admin accounts | 19:16 |
fungi | yeah, that seems like a reasonable workaround | 19:16 |
clarkb | ah cool | 19:16 |
ianw | ++ | 19:17 |
clarkb | fungi: ^ we should probably test that gerrit treats those as unique? | 19:17 |
corvus | or i guess probably our own email addresses depending on hosting provider | 19:17 |
fungi | since rackspace's e-mail system we're using does support + addresses as automatic aliases | 19:17 |
corvus | gmail supports it iirc | 19:17 |
clarkb | and hopefully newer gerrit will just fix the problem | 19:17 |
corvus | (and my exim/cyrus does) | 19:17 |
fungi | i'm happy to test that gerrit sees those addresses as unique, but i can pretty well guarantee it will | 19:17 |
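A quick way to sanity-check the plus-addressing idea: Gerrit compares the full address string when enforcing uniqueness, so `infra-root+<name>` variants (hypothetical addresses below) stay distinct even though they deliver to one mailbox:

```shell
# Sketch with hypothetical addresses: each plus-addressed variant is a
# distinct string, which is all the uniqueness check compares.
printf '%s\n' \
  'infra-root+clarkb@example.org' \
  'infra-root+fungi@example.org' \
  'infra-root+corvus@example.org' > /tmp/admin-addrs.txt
sort -u /tmp/admin-addrs.txt | wc -l   # -> 3 (no collisions)
```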
clarkb | fungi: thanks | 19:18 |
clarkb | that seems like a quick and easy fix so we probably don't need to get into it much further | 19:18 |
clarkb | The other thing I wanted to bring up was the set of changes that I've prepped in relation to the upgrade | 19:18 |
clarkb | #link https://review.opendev.org/#/q/status:open+topic:gerrit-upgrade-prep Changes for before and after the upgrade | 19:18 |
clarkb | A number of those should be safe to land today and they do not have a WIP | 19:18 |
fungi | the main reason i would want to avoid having to put e-mail addresses on those accounts is it's just one more thing which can tab complete to them in the gerrit ui and confuse users | 19:18 |
clarkb | Another chunk reflect state after the upgrade and are WIP because we shouldn't land them yet | 19:19 |
clarkb | It would be great if we could get reviews on the lot of them to sanity check things as well as land as much as we can today | 19:19 |
clarkb | (or $day before the upgrade) | 19:19 |
clarkb | One specific concern I've got is there are ~4 system-config changes that sort of all need to land together because they reflect post upgrade system state, but zuul will run them in sequence | 19:20 |
clarkb | so I'm wondering how should we manipulate zuul during/after the upgrade to safely run those updates against the updated state | 19:20 |
corvus | fungi: good point, we should probably avoid "corvus+admin"; infra-root+corvus is better due to tab-complete | 19:20 |
clarkb | https://review.opendev.org/#/c/757155/ https://review.opendev.org/#/c/757625/ https://review.opendev.org/#/c/757156/ https://review.opendev.org/#/c/757176/ are the 4 changes I've identified in this situation | 19:21 |
corvus | clarkb: bracket with disable/enable jobs change? | 19:21 |
clarkb | corvus: ya so I think our options are: disable then enable the jobs entirely, force merge them all before zuul starts, squash them and set it up so that a single job running is fine | 19:22 |
clarkb | one concern with disabling the jobs then enabling them is I worry I won't manage to sufficiently disable the job since we trigger them in a number of places. But that concern may just be mitigated with sufficient grepping | 19:22 |
corvus | i agree and force-merge or squashing means less time spinning wheels | 19:23 |
clarkb | just before the meeting I discovered that jeepyb wasn't running the gerrit 3.1 and 3.2 image builds as an example of where we've missed things like that previously | 19:23 |
fungi | i'm good with squashing, those changes aren't massive | 19:24 |
fungi | and they're all for the same repo | 19:24 |
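Squashing a short same-repo series like the one under discussion can be done locally and pushed back as one change; a throwaway-repo sketch of the mechanics (paths and messages hypothetical):

```shell
# Throwaway repo demonstrating a soft-reset squash: three commits collapse
# into one, with the final tree preserved.
git init -q /tmp/squash-demo
cd /tmp/squash-demo
git config user.email demo@example.org
git config user.name demo
echo one   > f; git add f; git commit -qm 'change 1'
echo two   > f; git commit -aqm 'change 2'
echo three > f; git commit -aqm 'change 3'
# Move HEAD back to the root commit, keeping the change-3 tree staged:
git reset --soft "$(git rev-list --max-parents=0 HEAD)"
git commit --amend -qm 'post-upgrade system state (squashed)'
git rev-list --count HEAD   # -> 1
```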
clarkb | the changes in system-config that trail the ones I've listed above should all be safe to land as after the fact cleanups | 19:24 |
clarkb | Another concern I had was I expect gitea replication to take a day and a half or so based on testing, I don't think we rely on gitea state for our zuul jobs that run ansible, but if we do anywhere can you call that out? | 19:24 |
clarkb | because that is another syncing of the world step that may impact our automated deployments | 19:25 |
clarkb | but ya if people can review those changes and think about them from a perspective of how do we land them safely post upgrade that would be great. I'm open to feedback and ideas | 19:25 |
clarkb | I'm hoping to write up a concrete upgrade plan doc soon (starting tomorrow likely) and we can start to fill in those details | 19:26 |
clarkb | at this point I think my biggest concern with the upgrade revolves around how do we turn zuul back on safely :) | 19:26 |
corvus | the gitea replication lag will probably confuse folks cloning or pulling changes (or using gertty) | 19:26 |
*** hashar has joined #opendev-meeting | 19:27 | |
corvus | but it's happened before, so i think if we include that in the announcement folks can deal | 19:27 |
fungi | this is also why even if we can get stuff done on saturday we need to say the maintenance is through sunday | 19:27 |
clarkb | fungi: yup and we have done that | 19:28 |
fungi | (or early monday as we've been communicating so far) | 19:28 |
clarkb | another thought that occurred to me when writing https://review.opendev.org/#/c/762191/1 earlier today is that it feels like we're effectively abandoning review-dev | 19:28 |
clarkb | Should we try to upgrade review-dev or decide it doesn't work well for us anymore and we need something like review-test going forward? | 19:28 |
clarkb | I'm hopeful that zuul jobs can fit in there too | 19:29 |
fungi | i had assumed, perhaps incorrectly, that we wouldn't really need review-dev going forward | 19:29 |
clarkb | fungi: fwiw I don't think that is incorrect, mostly just me realizing today "Oh ya we still have review dev and these changes will make it sad" | 19:29 |
clarkb | I think that is ok if one of the todo items here is retire review-dev | 19:29 |
clarkb | we can put it in the emergency file in the interim | 19:29 |
clarkb | review-test with prod like data has been way more valuable imo | 19:30 |
fungi | our proliferation of -dev servers predates our increased efficiency at standing up test servers on demand, or even as part of ci | 19:30 |
fungi | and at some point they become more of a maintenance burden than a benefit | 19:30 |
corvus | clarkb: ++ | 19:32 |
clarkb | ok /me adds put review-dev in stasis to the list | 19:32 |
clarkb | The last thing on my talk about gerrit list is that storyboard is still an unknown | 19:33 |
clarkb | its-storyboard may or may not work is the more specific way of saying that | 19:33 |
clarkb | fungi: how terrible would it be to set up credentials for review-test against storyboard-dev now and test that integration? | 19:33 |
fungi | we're building it into the images, adding credentials for it would be fairly trivial | 19:33 |
fungi | i can give that a go later this week and test it | 19:34 |
clarkb | that would be great, thank you | 19:34 |
clarkb | anyone else have questions or concerns to bring up around the upgrade? | 19:34 |
fungi | i think where it's likely to fall apart is around commentlinks mapping to the its actions | 19:35 |
fungi | (talking about its-storyboard plugin integration that is) | 19:37 |
clarkb | #topic General topics | 19:38 |
*** openstack changes topic to "General topics (Meeting topic: infra)" | 19:38 | |
clarkb | #topic PTG Followups | 19:38 |
*** openstack changes topic to "PTG Followups (Meeting topic: infra)" | 19:38 | |
clarkb | Just a note that I haven't forgotten these, but the time pressure for the gerrit upgrade has me focusing on that (the downside to having all the things happen in a short period of time) | 19:38 |
clarkb | I'm hoping tomorrow will be a "writing" day and I'll get an upgrade plan doc written as well as some of these ptg things and not look at failing jobs or code for a bit | 19:39 |
clarkb | #topic Meetpad not usable from some locations | 19:39 |
*** openstack changes topic to "Meetpad not usable from some locations (Meeting topic: infra)" | 19:39 | |
clarkb | I brought this up with Horace and he was willing to help us test it, then I completely spaced on it because last week had a very distracting event going on. | 19:40 |
clarkb | I'll try pinging horace this evening (my time) to see if there is a good time to test again | 19:40 |
clarkb | then hopefully we can narrow this down to corporate firewalls or the great firewall etc | 19:40 |
clarkb | #topic Bup and Borg Backups | 19:41 |
*** openstack changes topic to "Bup and Borg Backups (Meeting topic: infra)" | 19:41 | |
clarkb | Wanted to bring this up since there have been recent updates | 19:41 |
clarkb | In particular I think we declared bup bankruptcy on etherpad since /root/.bup was using significant disk | 19:41 |
clarkb | and out of that ianw has landed changes to start running borg on all the hosts we back up | 19:42 |
clarkb | ianw: were you happy with the results of those changes? | 19:42 |
ianw | i was last night on etherpad | 19:42 |
ianw | i haven't yet gone through all the other hosts but will today | 19:42 |
clarkb | sounds good | 19:43 |
ianw | note per our discussion bup is now off on etherpad, because it was filling up the disk | 19:43 |
clarkb | I think the biggest change from what we were doing with bup is that borg requires a bit more opt in to what is backed up rather than backing up all of / with exclusions | 19:43 |
clarkb | (we could set borg to backup / then do exclusions too I suppose) | 19:44 |
clarkb | want to call that out as I tripped over it a few times when reasoning about exclusion list updates and the like | 19:44 |
ianw | another thing is that the vexxhost backup server has 1tb attached, the rax one 3tb | 19:45 |
fungi | i think if we set a good policy about where we expect important data/state to reside on our systems and then back up those paths, it's fine | 19:45 |
clarkb | ianw: also have we set the borg settings to do append only backups? | 19:45 |
clarkb | we had called that out as a desirable feature and now I can't recall if we're setting that or not | 19:45 |
ianw | yes, we run the remote side with --append-only | 19:46 |
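For the record, the opt-in backup model and the append-only remote being described look roughly like this; all paths, hosts, and key material below are hypothetical, and borg's own docs are authoritative:

```shell
# Client side: back up an explicit list of paths rather than / with exclusions.
borg create --compression lz4 \
    borg@backup01.example.org:/opt/backups/etherpad::etherpad-{now} \
    /etc /var/etherpad /home/backup-staging

# Server side (~borg/.ssh/authorized_keys): the forced command pins the client
# to append-only access, so a compromised host cannot prune old archives.
# command="borg serve --append-only --restrict-to-path /opt/backups/etherpad",restrict ssh-ed25519 AAAA... etherpad-backup
```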
clarkb | great, thank you for working on this. Hopefully we end up freeing a lot of local disk that was consumed by /root/.bup as well as handle the python2 less world | 19:47 |
clarkb | I had a couple other topics (openstackid.org and splitting the puppet stuff up) but I don't think anything has happened on those subjects | 19:48 |
clarkb | #topic Open Discussion | 19:48 |
*** openstack changes topic to "Open Discussion (Meeting topic: infra)" | 19:48 | |
clarkb | tomorrow is a holiday in many parts of the world which is why I'm hoping I can get away with writing documents :) | 19:49 |
clarkb | if you've got the day off enjoy | 19:49 |
corvus | ianw: there was some discussion in #zuul this morning related to your pypa zuul work; did you see that? is a tl;dr worthwhile? | 19:49 |
ianw | corvus: sure, pypa have shown interest in zuul and i've been working to get a proof-of-concept up | 19:49 |
corvus | oh sorry, i meant do you want me to summarize the #zuul chat? :) | 19:50 |
ianw | the pull request doing some tox testing is @ https://github.com/pypa/pip/pull/9107 | 19:50 |
ianw | oh, haha, sure | 19:50 |
corvus | it was suggested that if we pull more stuff out of the run playbook and put it into pre (eg, ensure-tox etc) it would make the console tab more accessible to folks. i think that's relevant in your pr since that job is being defined there. i think avass was going to leave a comment. | 19:51 |
corvus | building on that, we thought we might look into having zuul default to the console tab rather than the summary tab. (this item is less immediately relevant) | 19:52 |
ianw | oh right, yeah i pushed a change to do that in that pr | 19:53 |
clarkb | oh inmotionhosting has reached out to me about possibly providing cloud resources to opendev. I've got an introductory call with them tomorrow to start that conversation | 19:53 |
corvus | the overall theme is if we focus on simplifying the run playbook and present the console tab to users, we can immediately present important information to users, increase the signal/noise ratio, and the output may start to seem a little more familiar to folks using other ci tools. | 19:53 |
corvus | ianw: cool, then you're probably ahead of me on this, i had to duck out right after that convo. :) | 19:54 |
ianw | corvus: this is true, as with travis or github, i forget, you get basically your yaml file shown to you in a "console" format | 19:54 |
ianw | like you click to open up each step and see the logs | 19:54 |
corvus | ianw: yeah, and we do too, it's just our yaml file is way bigger :) | 19:55 |
corvus | (and the console tab hides pre/post playbooks by default, so putting "boring" stuff in those is a win for ux [assuming it's appropriate to put them there]) | 19:55 |
corvus | clarkb: neatoh | 19:56 |
ianw | i'm pretty aware that just using zuul to run tox as 3rd party CI for github isn't a big goal for us ... but i do feel like there's some opportunity to bring pip a little further along here | 19:56 |
fungi | the tasks like "Run tox testing" which are just role inclusion statements could also be considered noise, i suppose | 19:57 |
corvus | fungi: yeah, that might be worth a ui re-think | 19:57 |
corvus | (maybe we can ignore those?) | 19:57 |
corvus | clarkb: are they a private cloud provider? | 19:58 |
fungi | or maybe "expandable" task results could be more prominent in the ui somehow | 19:58 |
clarkb | corvus: yup, the brief intro I got was that they could run an openstack private cloud that we would use | 19:58 |
fungi | besides just the leading > marker | 19:58 |
clarkb | We are just about to our hour time limit. Thank you everyone! | 19:59 |
fungi | thanks clarkb! | 19:59 |
corvus | clarkb: thx! | 19:59 |
clarkb | We'll see you here next week. Probably with another focus on gerrit as that'll be a few days before the planned upgrade | 20:00 |
clarkb | probably do a sanity check go no go then too | 20:00 |
clarkb | #endmeeting | 20:00 |
*** openstack changes topic to "Incident management and meetings for the OpenDev sysadmins; normal discussions are in #opendev" | 20:00 | |
openstack | Meeting ended Tue Nov 10 20:00:10 2020 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 20:00 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-11-10-19.01.html | 20:00 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-11-10-19.01.txt | 20:00 |
openstack | Log: http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-11-10-19.01.log.html | 20:00 |
corvus | inmotion is in el segundo... i left my wallet in el segundo. | 20:00 |
fungi | maybe they can help you find it! | 20:07 |
*** hashar has quit IRC | 20:55 | |
*** sboyron has quit IRC | 23:36 |