19:01:09 #startmeeting infra
19:01:09 Meeting started Tue Jun 14 19:01:09 2022 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:09 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:09 The meeting name has been set to 'infra'
19:01:31 #link https://lists.opendev.org/pipermail/service-discuss/2022-June/000339.html Our Agenda
19:01:43 #topic Announcements
19:01:48 I had none
19:02:28 There were no actions last meeting either so we can dive right into the agenda
19:02:32 #topic Topics
19:02:40 #topic Improving CD throughput
19:03:08 We worked through the issues with the zuul cluster upgrade and reboot playbook and managed to run it to completion without error
19:03:45 The next step there is to run it automatically. It took about 18 hours to complete so I figure a daily cron with some sort of locking mechanism is appropriate. Any concern with getting that set up?
19:04:22 I'd think maybe once a week would be often enough?
19:04:35 running this 3/4 of the time seems a bit much to me
19:04:38 ya we don't have to do it as often as possible either
19:05:20 In that case maybe a weekend cron to do it when zuul is under the least load. I can work on that
19:05:53 probably one more manual run is in order before we turn on a cronjob too
19:05:57 ++
19:06:21 i'm happy to run it, e.g., tomorrow
19:06:27 thanks
19:06:58 this week is pretty quiet, so it may go faster and there's also probably less impact if something does go wrong
19:07:45 sounds like a plan. Anything else on this topic?
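(For illustration, the weekend cron discussed above could look roughly like the entry below. The playbook path, log file, and schedule are assumptions rather than the change that eventually merged; flock(1) provides the locking, so a run that starts while the previous one is still going simply exits.)

    # /etc/cron.d/zuul-reboot on the bastion host -- a sketch, illustrative paths only
    # 02:00 UTC Saturday, when zuul is under the least load; flock -n skips this run
    # if a previous run still holds the lock
    0 2 * * 6 root flock -n /var/run/zuul_reboot.lock ansible-playbook /home/zuul/src/opendev.org/opendev/system-config/playbooks/zuul_reboot.yaml >> /var/log/ansible/zuul_reboot.log 2>&1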
19:09:19 #topic Gerrit 3.5 upgrade planning
19:09:41 ianw: are we still on track for doing this your monday (sunday utc)?
19:10:12 yes i think so
19:10:39 I ended up pushing a change for the collision checking config, but in the process realized the default is to enable it, so that bit is less urgent than I thought it was
19:10:39 couple of config todos but i'll get that done soon
19:11:13 ++, sorry haven't checked review queues just yet but sounds good
19:11:57 I guess let us know if we need to review anything or go over the process. I was planning to look at the etherpad more closely again, but this upgrade very closely resembles the 3.4 upgrade iirc
19:12:07 the next one to 3.6 is a bit more involved but we aren't going that far
19:12:32 #link https://etherpad.opendev.org/p/gerrit-upgrade-3.5
19:13:00 anything else to call out before the weekend upgrade?
19:13:18 nope, as you say this one doesn't seem too involved
19:14:12 #topic Changing our default ansible version in Zuul
19:14:36 I meant to send email about this but then summit travel and prep ended up being too distracting.
19:15:02 I also forgot about kolla testing with all the other brokenness
19:15:04 Do we think two weeks' notice, if I send an email this week, is sufficient for flipping to ansible v5 by default at the end of June, or should we do it in July?
19:15:56 seems reasonable. set it much longer and openstack's release cycle will be too far along
19:16:01 I think it is o.k., I don't expect many people to act before it happens
19:16:10 agreed
19:16:11 and that, too
19:16:21 well, the two are directly related ;)
19:16:27 ok I'll plan to send notice of that changing June 30 then (it's a Thursday so that gives people time before the weekend to look at brokenness)
19:16:35 thanks!
19:17:12 #topic Enable webapp on nodepool launchers
19:17:26 frickler: I think you added this one. I did want to point out we do run a webserver on the builders
19:17:34 yes, I came across that while looking at how to check a stuck image build
19:17:38 But I think you're looking for access to the newer launcher api stuff
19:17:54 the webserver only serves logs and images right now iiuc
19:18:12 we could add the couple of special URLs that the api serves to it
19:18:28 and then have a data source to check image builds quite easily
19:18:30 ya I think adding that is fine and a good idea
19:18:52 do we need a spec? otherwise I could just hack up a patch I think
19:19:16 I don't think we need a spec. We already have a webserver in place and there isn't any privileged info
19:19:32 just a matter of adding the webserver to the launchers and wiring it up to the api bits
19:19:54 (no new servers, no new security concerns, no new dns records, etc; pretty straightforward)
19:20:13 my theory with this was that we should be able to see from a dashboard like ...
19:20:38 the zuul dashboard does expose nodes and labels but not the images
19:20:38 https://grafana.opendev.org/d/f3089338b3/nodepool-dib-status?orgId=1
19:21:20 i have to admit i haven't looked at that in a while, and now it has a big *green* FAILED
19:21:41 oh, I didn't know that page
19:21:57 oh ya I don't recall knowing that existed
19:22:14 grafana has ways to alert us of issues, but we've never quite managed to get consensus on actually turning that on
19:22:22 maybe if we manage to make "failed" red, that's already all we need
19:22:45 just for comparison, this is an example of what the api result looks like https://paste.opendev.org/show/bwHPkLhxzyARMsOryUyV/
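(For context, nodepool's launcher already ships a small status webapp -- 8005 is its default port -- whose list endpoints return a plain-text table or JSON depending on the Accept header. The queries below are a sketch with a placeholder hostname; whether the port gets exposed directly or the URLs are proxied through the existing Apache is exactly the wiring question being discussed.)

    # illustrative queries against a launcher's built-in webapp (placeholder host)
    curl -H 'Accept: application/json' http://nl01.example.org:8005/image-list
    curl -H 'Accept: application/json' http://nl01.example.org:8005/dib-image-list
    # without the Accept header the same endpoints return a human-readable table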
19:23:42 but this is also maybe something to talk about briefly
19:23:53 arm64 builds are broken, haven't checked yet why
19:24:05 I don't think it hurts to have the information available directly via the api too if we still want to add that
19:24:14 yeah i saw that note, thanks, sorry i've been out a few days but will look into it
19:24:18 and centos9 waits for a dib release which is difficult because there is a nasty workaround merged
19:24:19 but I agree the dashboard is likely more generally a better way to consume it
19:24:37 I'm on my laptop keyboard and my typing is extra bad
19:24:53 I'll try to get the API working anyway, yes
19:24:59 yeah i'm hoping the centos 9 packages have been fixed in the last few days
19:25:54 and the other thing is wheels haven't been published for 14 days, I think also due to centos9
19:26:23 ya the afs packaging is sensitive to booting on current kernels so when the images get delayed wheels get delayed
19:26:33 I wonder if we need to only publish if all jobs pass though
19:26:39 and instead just publish whatever we've built
19:27:14 yeah, that's been a constant issue; not sure if we have a "finally" type zuul dependency?
19:27:23 or make arch specific publishing?
19:27:24 i suppose that's safe, it shouldn't create a wheel if building that wheel fails, so we're probably not going to be more likely to publish broken wheels that way at least
19:27:59 fungi: yup exactly. If we write a wheel it should be fine to publish
19:28:21 that said, we're more likely to not notice it's broken if we do that
19:28:35 https://opendev.org/openstack/project-config/src/branch/master/zuul.d/projects.yaml#L4811 is where it is released
19:28:55 er, more likely to not notice we've started failing to build some wheels i mean
19:28:57 fungi: ya I think that is the balance. Is it better to hold everything up and probably notice, or do best effort and maybe not notice as quickly
19:29:44 (also, grafana monitors this, and i would also be happy for it to push me notifications when it was broken)
19:30:07 ianw: in the past we've said making notifications like that opt-in would be fine. I think I'm also ok with sending them to an infra-root@ folder
19:30:16 I would probably consume them ^ that way
19:30:52 (we just want to avoid people getting middle of the night pages and feeling obligated to do something, but an alert that can be checked in the morning is something I would find helpful)
19:31:10 yes, my position on it is that notifications of what's broken are fine, as long as we don't set expectations that someone is necessarily going to address whatever we're being notified about, and as long as the false failure rate isn't significant
19:31:49 we already do it for cronjobs, expiring ssl certs, et cetera
19:32:59 https://review.opendev.org/c/opendev/system-config/+/573183/ was in this area
19:34:05 I think I would avoid irc (at least to start) and do email if we can
19:34:17 simply because it is easier to "subscribe" with email
19:34:25 (though most irc clients will let you filter stuff out too)
19:34:51 but ya I think if we can make grafana send us an email to infra-root@ and elsewhere that would work
19:36:27 https://meetings.opendev.org/irclogs/%23openstack-infra/%23openstack-infra.2018-06-07.log.html#t2018-06-07T23:43:25 was some discussion on it
19:37:19 at the time i accidentally left a test server alerting to #openstack-infra, which probably had people starting from a base of "already annoyed" :)
19:37:48 hah
19:38:08 we might use a dedicated channel then. but I'm also not against mail
19:38:43 ya a dedicated channel would be the other method. Then I just won't join that channel on my phone :)
19:39:00 also, this might get into another point of contention as well, which is i'm not sure exactly how to set it up, but i feel like grafyaml may not support it
19:40:45 if these are things we can add to specific graphs it may work with grafyaml as is
19:42:24 anyway we have one more agenda item to get to. We don't need to design this here. It may be worth a specific agenda item or a spec/email thread for future discussion though
19:42:37 #topic Running a URL shortener
19:42:49 frickler pointed out that people use services like bit.ly
19:43:01 another thing I came up with, yes
19:43:11 #link https://opensource.com/article/18/7/apache-url-shortener an open source alternative we could host
19:43:22 and seeing that apache2 has everything one needs was new to me
19:44:17 I'm not opposed and this seems like the sort of thing that would fit in well on static.o.o
19:44:18 i guess my concern is that it seems to be a target for abuse, isn't that why github killed "git.io"?
19:44:37 ianw: in this case I think you'd have to modify a file via gerrit, it wouldn't be self service
19:44:39 well we would still have reviews in front of the data
19:45:14 I would do it within project-config for simplicity, but we could also use a dedicated repo if you prefer
19:45:18 yeah, the main concern i have is that this is something we'd probably have to commit to maintain ~forever or else break people's external links
19:45:28 however, it does seem like a pretty lightweight thing
19:45:42 oh, so basically just a vhost with a list of 301 redirects?
19:45:47 ianw: ya
19:46:09 it is simple enough that fungi's concern doesn't seem to be a major thing. If we had to run a proper wsgi service or similar I'd think differently
19:46:09 RewriteMap shortlinks txt:/data/web/shortlink/links.txt RewriteRule ^/(.+)$ ${shortlinks:$1} [R=temp,L]
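(Expanded slightly for readability, the whole shortener could be as small as the sketch below. The vhost name, file paths, and example entry are placeholders and TLS directives are omitted; this is an illustration of the approach, not a proposed change.)

    <VirtualHost *:443>
        ServerName l.opendev.org
        RewriteEngine on
        # links.txt is the reviewed list of "shortname target-url" pairs, one per line
        RewriteMap shortlinks "txt:/var/www/shortlinks/links.txt"
        # unknown keys fall back to the default after the | instead of redirecting nowhere;
        # R=temp issues 302s as in the pasted snippet, R=permanent would make them 301s
        RewriteRule "^/(.+)$" "${shortlinks:$1|https://opendev.org/}" [R=temp,L]
    </VirtualHost>

    # /var/www/shortlinks/links.txt might contain entries like:
    #   gerrit   https://review.opendev.org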
19:46:16 that's what a large part of static.o.o is anyway :)
19:46:22 agreed
19:46:36 i certainly don't have an issue if it's just an easy-to-update config file that goes through review
19:47:10 for the sites we already host, we do similar things, e.g. zuul-ci.org/start
19:47:27 then another question would be whether e.g. l.opendev.org is short enough or we want to grab a shorter domain
19:48:03 I reserved od42.de just in case, but not sure if everyone would be fine using a .de domain
19:48:48 using another domain typically adds another level of management with the registrar service
19:49:04 i always find it weird that these things use what i generally don't consider stable countries as a top-level domain
19:49:06 it's not impossible but avoiding that if possible is likely a good idea
19:49:21 ianw: .io is a pet peeve of mine, yeah
19:49:30 something in .dev maybe, but i imagine anything short is unavailable
19:50:19 note that .dev is controlled by google too
19:50:59 and they have a history of forcing a number of "experimental" features for domains in that tld as a result
19:51:07 my vote is something like l.opendev.org as it is one less thing to manage and I feel that is short enough to work on conference slides for example
19:51:27 (where experimental means anything they're considering for tie-ins with chrome)
19:51:31 we don't have to decide now, I can start preparing a patch with that anyway
19:51:48 yup we could expand to another domain later if we decide it is necessary
19:51:49 ++ i can't imagine we can get any shorter without spending ridiculous amounts of $ anyway
19:52:24 the foundation already spent a semi-large amount of money to buy opendev.org off a scalper as it was
19:53:05 and reusing a subdomain of opendev.org is also a bit of useful advertising for the collaboratory too
19:53:23 "oh opendev, what's that?"
19:53:56 let's open it up to anything else before we run out of time
19:54:00 #topic Open Discussion
19:54:02 anything else?
19:54:03 do we want to restrict targets to being opendev related?
19:54:28 frickler: ya I wouldn't use it for arbitrary stuff to avoid that abuse concern ianw brought up
19:54:41 anyway, can discuss that once I have a patch
19:54:45 #link https://review.opendev.org/c/opendev/system-config/+/845066
19:55:03 that's a doc update for duplicate accounts
19:55:20 ah I'll have to take a look at that one
19:55:22 and cleans up some other things
19:56:10 I also have a zuul patch if someone gets bored ;)
19:56:14 #link https://review.opendev.org/c/zuul/zuul/+/834671
19:58:22 interesting ... do people take anonymous patches?
19:58:54 ianw: zuul can only see public data, not everyone publishes that
19:59:43 in particular for the email
20:00:01 And we are at time. Thanks everyone. We'll be back here next week
20:00:05 thanks clarkb!
20:00:09 #endmeeting