19:01:18 <clarkb> #startmeeting infra
19:01:19 <openstack> Meeting started Tue Aug 18 19:01:18 2020 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:20 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:21 <ianw> o/
19:01:22 <openstack> The meeting name has been set to 'infra'
19:01:32 <clarkb> #link http://lists.opendev.org/pipermail/service-discuss/2020-August/000077.html Our Agenda
19:01:39 <clarkb> #topic Announcements
19:01:50 <clarkb> I had no announcements.
19:02:35 <clarkb> #topic Actions from last meeting
19:02:45 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-08-11-19.01.txt minutes from last meeting
19:02:48 <clarkb> There were no actions
19:02:57 <clarkb> #topic Specs approval
19:03:02 <clarkb> #link https://review.opendev.org/#/c/731838/ Authentication broker service
19:03:38 <clarkb> This got a new patchset after some feedback from corvus. I expect it is still ready for approval
19:03:51 <clarkb> Do we want to put it up for approval this week and get rereviews by say friday?
19:04:24 <fungi> seems like a plan to me
19:04:32 <corvus> ++
19:04:39 <fungi> and yeah, today's update there was just clarification
19:04:51 <fungi> and only touched one paragraph
19:04:59 <clarkb> great.
I'll be sure to rereview and plan to approve it friday if there are no objections between now and then
19:05:12 <clarkb> #topic Priority Efforts
19:05:19 <clarkb> #topic Update Config Management
19:05:38 <clarkb> I'm still pushing on the gerrit(bot) things here
19:05:40 <clarkb> #link https://review.opendev.org/746181 Final followup for gerritbot containerization
19:05:53 <clarkb> this change should finish up the remaining todo items for gerritbot's containerization
19:06:05 <clarkb> that ensures we're updating the container image on eavesdrop
19:07:00 <clarkb> #link https://review.opendev.org/#/c/746335/ add missing files to config management
19:07:14 <clarkb> This is a fixup for the ansibleification of gerrit that I noticed when testing gerrit upgrades locally
19:07:33 <clarkb> we stopped managing the logo svg and the jquery js files that hideci uses
19:07:45 <clarkb> #link https://review.opendev.org/746784 More image cleanups
19:08:05 <clarkb> and this last one is where I'm currently at on the gerrit upgrade testing. That should make gerrit startup cleaner. But I want to test it
19:08:19 <clarkb> I've also discovered there is some sort of problem with the gerrit plugin manager on gerrit 3.0 that I haven't figured out yet
19:08:36 <clarkb> thats a bit lower priority as 3.0 happens after 2.16, but I'll still try to sort it out if I can
19:08:48 <clarkb> Any other config management updates to bring up?
19:09:01 <clarkb> fungi: I know you mentioned you wanted to pick up the mirror update reprepro in ansible changes again
19:09:20 <fungi> yeah, i haven't gotten to it yet though
19:09:43 <fungi> if anyone else is excited to work on it though, i don't mind anyone chipping away at the conversion
19:10:00 <fungi> it's just dozens of thankless erb to j2 template conversions
19:11:31 <clarkb> #topic OpenDev
19:12:08 <clarkb> For the Gerrit upgrade process I sent some questions to Luca which resulted in some good info.
TL;DR is that we should be able to upgrade to 2.16 without notedb then do the notedb conversion separately
19:12:29 <clarkb> I like this because it breaks the fairly large upgrade into two more manageable pieces
19:13:43 <clarkb> I'm still iterating on our images (as noted above). So far all the upgrades I've been doing have been in succession with online reindexing. Once I've got the image into a happy spot I'm going to start testing skip level type upgrades and see if we can stop gerrit, run 2.14 init, 2.15 init, 2.16 init, reindex, then start as I expect that will end up being the quickest way for us to upgrade if it works
19:14:19 <corvus> this all sounds good to me
19:14:24 <clarkb> but all of that is still an unknown. My goal is to be able to write up an upgrade process to 2.16 using our images that we can then apply to our actual data.
19:15:16 <clarkb> and I'm sure I'll have more questions for luca. Related to that luca offered to do a conference call if we wanted. Are others interested in being included in that? If so let me know and I'll include you for scheduling if/when that happens
19:15:40 <clarkb> I'm also thinking syncing up with luca that way once I've got an upgrade process written down may be good as we can talk about our plan and see if he has any concerns with it
19:16:22 <clarkb> #link https://review.opendev.org/741277 Needed in Gerritlib first as well as a Gerritlib release with this change.
19:16:28 <clarkb> #link https://review.opendev.org/741279 Can land once Gerritlib release is made with above change.
19:16:38 <clarkb> two other opendev related changes that would be good to review if you get a chance
19:16:47 <clarkb> Anyone else have opendev topics they want to bring up?
19:17:11 <fungi> rackspace volume maintenance maybe
19:17:33 <fungi> just got word today that there will be outages in october for all our current cinder volumes in dfw
19:17:42 <clarkb> did they give specific dates or just that month?
19:17:54 <fungi> no specific dates yet
19:18:06 <fungi> i've converted the uuid list to volume names and broken them down by what we ought to do
19:18:21 <fungi> #link https://etherpad.opendev.org/p/2020-10-rax-dfw-volume-maint October Volume Maintenance
19:19:03 <fungi> the ones in the "migrate" list we want to avoid outages for, and could attach new volumes and pvmove (all new volumes created aren't impacted by the maintenance, only existing volumes)
19:19:39 <fungi> the outage list is those where we could fix them after with modest impact, or maybe turn them off for migration
19:20:05 <fungi> the delete list is there because i noticed we have three which aren't attached ("available" according to cinder) so suggesting we just clean those up
19:20:14 <clarkb> fungi: review.o.o's is in two separate lists
19:20:17 <clarkb> not sure if that was intentional
19:20:48 <corvus> we can migrate it, and then take an outage for extra fun
19:20:56 <clarkb> thank you for putting that together, other than the review.o.o double listing I think that looks good
19:21:02 <fungi> ahh, yeah checking now to see if that matched twice somewhere
19:21:12 <fungi> (i've already cleaned it up on the pad)
19:21:48 <fungi> yeah, that was just a drag-n-drop turning into a cut-n-paste looks like
19:21:58 <fungi> there aren't two volumes with that name
19:22:24 <fungi> anyway, the breakdown there was just my first stab. if folks think we should shuffle anything between migrate and outage feel free
19:23:07 <fungi> and of course we *could* migrate more of them if there's time, but i'd want to prioritize the ones we know would otherwise be painful
19:23:15 <fungi> anyway, that's all i have on that
19:23:20 <clarkb> thanks!
19:23:28 <clarkb> #topic General Topics
19:23:37 <clarkb> #topic Bup and Borg Backups
19:23:49 <clarkb> Doesn't look like the change has merged yet.
ianw has been busy with other things
19:23:58 <clarkb> #topic github 3rd party ci
19:24:11 <clarkb> ianw does report that we've hit a speedbump on the arm64 wheel generation problem
19:24:37 <ianw> yeah, i did a bunch of stuff to get us manylinux2014_aarch64 wheels
19:24:37 <fungi> oh, yeah, this is a "fun" (in an unfortunate way) issue
19:24:42 <clarkb> in particular the goal with working with cryptography was to produce manylinux wheels that could be hosted on pypi and help everyone, but they've discovered that ubuntu and centos use different page sizes on arm64
19:25:20 <fungi> with no way to differentiate those for pypi/pip apparently
19:25:21 <clarkb> linux allows for 4k and 64k page sizes on arm64. ubuntu and centos choose differently. It sounds like we may be able to force 64k for everyone as 4k would still be 64k aligned (but not vice versa)
19:25:38 <corvus> are the pyca folks aware of this now? (ie, did we at least help them discover/understand this problem?)
19:25:45 <clarkb> but I think that is all work that needs to be done to the upstream python manylinux builder images and from there we can pull it in
19:25:50 <clarkb> corvus: that is my understanding ya
19:26:07 <clarkb> there is commentary through their github issue tracker /me looks for a link
19:26:26 <fungi> manylinux2014 is centos-based right?
19:26:51 <fungi> so basically the idea would be to tweak the manylinux2014 reference to force 64k page size
19:27:15 <clarkb> ya I'm not finding a link, maybe it hasn't gone upstream yet?
19:27:23 <clarkb> fungi: yes its a centos7 in this case iirc
19:27:57 <ianw> there's a few related discussions, the problem was more in libffi as they released a wheel and our ci found it
19:28:24 <clarkb> oh they pushed a wrongly aligned wheel to pypi and then we tried to use it with a different page size?
neat
19:28:53 <ianw> yep, it was our wider distro testing that flagged it
19:29:14 <clarkb> anyway Just wanted to call out that progress continues to be made here, and sounds like its ending up as good feedback more globally for arm64 python wheels
19:29:21 <clarkb> ianw: is there anything else you want to call out on this topic?
19:29:33 <fungi> so it sounds like we're uncovering problems which haven't gotten much attention yet, i guess that's a good thing in the long run
19:29:48 <corvus> yeah, this is a short-term disappointment in the middle of a long-term benefit
19:29:57 <ianw> the other thing is they just (like a few hours ago) switched to travis-ci.com ... which apparently gives them access to whatever aws hardware arm64 thing is
19:30:26 <corvus> so they don't need us anymore?
19:30:31 <ianw> e.g. https://github.com/pyca/cryptography/pull/5416
19:31:18 <ianw> well ... maybe. it's not 100% clear to me what runs on hardware or not
19:31:42 <clarkb> its also possible that hardware/distro diversity is a good thing here to uncover problems like the page alignment issue
19:32:01 <clarkb> at least until the arm64 python ecosystem works out those gotchas
19:32:54 <ianw> the other thing was the rust support they're adding
19:33:13 <ianw> #link https://review.opendev.org/746423
19:33:38 <ianw> that adds an ensure-rust role, which worked for upstream jobs (after i figured out where to depends-on for github issues :)
19:34:05 <corvus> who's adding rust support?
19:34:21 <clarkb> corvus: cryptography wants to link to rust as well as C
19:34:28 <corvus> gotcha
19:34:38 <ianw> #link https://github.com/pyca/cryptography/pull/5410
19:34:41 <ianw> corvus: ^
19:35:36 <corvus> btw, do we want to continue leaving comments in the PRs?
19:35:52 <corvus> (we can turn that off now that checks are there; some people like them, some people see them as spammy)
19:35:59 <ianw> oh, that was another thing, there was some discussions about that
19:36:08 <ianw> yeah, we can turn that off
19:36:24 <corvus> i'm guessing if there were discussions, then there's at least some "these are spammy" sentiment :)
19:37:06 <corvus> should just be a matter of dropping the message stanza from the pipeline
19:37:08 <ianw> #link https://foss.heptapod.net/pypy/cffi/-/issues/468
19:37:17 <ianw> that was the discovery of the page size issues fyi
19:37:21 <ianw> yeah, i can do that
19:37:32 <ianw> the other thing they wanted was a "re-run" button
19:37:41 <ianw> apparently some ci's do that
19:37:46 <clarkb> different than "recheck" comments?
19:37:48 <fungi> rather than leaving a recheck comment
19:37:52 <corvus> i think we can with github checks
19:37:58 <ianw> https://imgur.com/a/ok7WNqs
19:38:26 <ianw> github definitely issues a rerun hook, and we handle it
19:38:46 <clarkb> do we need to change anything then?
19:38:50 <ianw> #link https://developer.github.com/webhooks/event-payloads/#webhook-payload-object
19:39:03 <clarkb> I guess update the trigger config to fire on the rerun call?
19:39:07 <ianw> #link https://developer.github.com/v3/checks/runs/#check-runs-and-requested-actions
19:39:25 <ianw> i'm not sure if *maybe* we need to define the custom button?
19:39:39 <ianw> To create a button that can request additional actions from your app, use the actions object when you Create a check run. For example, the actions object below displays a button in a pull request with the label "Fix this." The button appears after the check run completes.
19:40:21 <clarkb> "use the actions object" <- I guess zuul may need to learn about github actions?
19:40:35 <fungi> oh, hah, so we *can* do it for pyca/cryptography, but projects who want to gate with zuul can't because of the whole apps can't have control over a repo with actions problem?
19:40:56 <fungi> or has that been solved in recent months?
19:41:02 <clarkb> pabelanger would probably know
19:41:17 <ianw> i guess i should probably write a story
19:41:40 <ianw> and then the *other* thing that was brought up was re-running a single job
19:41:57 <corvus> there is existing support in zuul for re-running checks
19:42:10 <ianw> i know we've had that discussion over and over in various ways. i couldn't find something canonical to point to
19:42:38 <clarkb> ianw: that came up elsewhere recently. I think its the wrong thing for openstack/opendev but can see that being something zuul grows for other use cases
19:43:06 <clarkb> but I also expect that requires significantly more updates to zuul to support
19:43:06 <corvus> ianw: this is the canonical thing to point to: https://zuul-ci.org/docs/zuul/discussion/github-checks-api.html
19:43:45 <corvus> that page also talks about re-run
19:44:10 <corvus> i'd like clarification on clarkb's question -- do we need to change anything?
19:44:16 <corvus> (ie, is re-run not working as expected?)
19:44:59 <clarkb> corvus: looking at that doc maybe our pipeline config to handle the rerun requests?
19:45:08 <clarkb> but it seems like github automatically sets up the desired buttons
19:45:10 <ianw> comment recheck does, but specifically i think they wanted that "re-run" button to appear to be consistent with other ci
19:45:16 <corvus> sure
19:45:30 <corvus> i'm waiting on a clear statement of "the comment button does not appear as expected"
19:45:31 <clarkb> ianw: "Github provides a set of default actions for check suites and check runs. Those actions are available as buttons in the Github UI. Clicking on those buttons will emit webhook events which will be handled by Zuul."
is what the zuul doc says
19:45:43 <corvus> er the 'rerun' button
19:45:50 <corvus> because right now, i expect it to appear
19:46:03 <corvus> so i need to understand if there even is a problem
19:46:29 <ianw> well, maybe you want to try and catch reaperhulk into #crytography-dev -- i don't think it appears for non-admin users
19:46:54 <corvus> i think if it appears for admin users, then this is not our problem :)
19:46:56 <fungi> presumably github only shows the re-run widget to users who have permission to trigger it (via whatever acls github enforces on those)?
19:47:20 <ianw> corvus: no i mean he's admin and not seeing it, and i'm not so i think i can't see it in any case
19:47:50 <corvus> note that the docs say it only appears for failing runs
19:47:59 <corvus> (which is, imho, a bad choice on github's part)
19:48:12 <fungi> wow, really?
19:48:19 <fungi> that's an odd decision indeed
19:48:33 <ianw> i think this was in the context of the failing runs from the ffi fallout, but i may be wrong
19:48:39 <corvus> (we recheck successful runs all the time, in fact, i'd argue that's the more legitimate case for rechecking but i'd be arguing with the wind)
19:48:40 <fungi> that really just reinforces the whole "recheck until it passes" mindset too
19:48:46 <corvus> fungi: that
19:49:44 <clarkb> as a time check we have 2 more items to talk about. Maybe we can continue this conversation in #zuul?
19:49:48 <corvus> i don't think i can commit to working with the pyca folks to improve their github experience
19:50:03 <corvus> but atm, i don't think there's anything lacking from zuul in order for it to do what they want
19:50:18 <fungi> but possibly some of the github users in #zuul know what the misconfiguration might be there
19:50:25 <corvus> if there even is one
19:50:41 <corvus> let's start with a clear problem statement :)
19:51:35 <clarkb> #topic Making ask.openstack.org read only
19:51:41 <clarkb> #link https://review.opendev.org/#/c/746497/ set ask.openstack.org to read only
19:51:53 <clarkb> we've talked about sunsetting this service for a long time and ttx has written a change to start that process
19:52:09 <clarkb> There is also an openstack-discuss thread on the subject
19:52:23 <clarkb> I don't expect this will get any objections from this group, but wanted to call it out in case there were any concerns
19:52:47 <clarkb> what that change should do is make the running service read only and give people a message about it and alternative locations for questions
19:53:09 <clarkb> ianw: ^ you did the last ask deployment so may be able to offer some of the flavor text behind this if people ask
19:53:22 <fungi> it's like the author designed a sunsetting feature right in
19:54:19 <clarkb> #topic PTG Planning
19:54:27 <ianw> clarkb: ^ sure
19:54:38 <fungi> the other concern worth raising is that we likely won't/can't leave it up indefinitely even in a read-only state, as it's complex and unmaintained software and the distro release we're able to deploy it on now is reaching eol in a few months
19:54:44 <clarkb> #undo
19:54:45 <openstack> Removing item from minutes: #topic PTG Planning
19:55:03 <clarkb> fungi: ya maybe we should make that clearer on the thread
19:55:13 <clarkb> basically call out that this is the first step in eventually turning it off completely
19:55:22 <fungi> sgtm
19:55:31 <fungi> i can reply on that thread
19:55:32 <clarkb> #topic PTG
Planning
19:55:55 <clarkb> There will be a virtual PTG at the end of October. I think our three blocks of 2 hours across timezone boundaries seemed to work well last time
19:56:08 <corvus> maybe we can point the internet archive crawler at ask after we make it read-only to make sure it gets a complete copy
19:56:15 <clarkb> corvus: ++
19:56:27 <fungi> good idea
19:56:37 <clarkb> #link https://etherpad.opendev.org/opendev-ptg-planning-oct-2020 October PTG planning starts here
19:56:48 <clarkb> I've yet to populate that etherpad with ideas, but will do so there when I get some time
19:56:57 <clarkb> feel free to add your own items too
19:57:06 <fungi> clarkb: yeah, i think the vptg worked well, same schedule this time is fine by me
19:57:12 <clarkb> #link https://www.openstack.org/ptg/ Registration is open too
19:57:31 <clarkb> ya I think my biggest question right now is if people think we want more (or less) time?
19:57:33 <fungi> i keep meaning to do that, thanks for the reminder
19:57:55 <clarkb> I'll assume three blocks of 2 hours unless I hear otherwise. I personally think that worked well for us
19:58:40 <clarkb> #topic Open Discussion
19:58:48 <clarkb> we have about a minute and a half for anything else you'd like to bring up
20:00:06 <clarkb> I guess that was it. Thank you everyone!
20:00:08 <clarkb> #endmeeting