19:01:09 <clarkb> #startmeeting infra
19:01:09 <opendevmeet> Meeting started Tue Jun 14 19:01:09 2022 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:09 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:09 <opendevmeet> The meeting name has been set to 'infra'
19:01:31 <clarkb> #link https://lists.opendev.org/pipermail/service-discuss/2022-June/000339.html Our Agenda
19:01:43 <clarkb> #topic Announcements
19:01:48 <clarkb> I had none
19:02:28 <clarkb> There were no actions last meeting either so we can dive right into the agenda
19:02:32 <clarkb> #topic Topics
19:02:40 <clarkb> #topic Improving CD throughput
19:03:08 <clarkb> We worked through the issues with the zuul cluster upgrade and reboot playbook and managed to run it to completion without error
19:03:45 <clarkb> The next step there is to run it automatically. It took about 18 hours to complete so I figure a daily cron with some sort of locking mechanism is appropriate. Any concern with getting that set up?
19:04:22 <frickler> I'd think maybe once a week would be often enough?
19:04:35 <frickler> running this 3/4 of the time seems a bit much to me
19:04:38 <clarkb> ya we don't have to do it as often as possible either
19:05:20 <clarkb> In that case maybe a weekend cron to do it when zuul is under the least load. I can work on that
19:05:53 <fungi> probably one more manual run is in order before we turn on a cronjob too
19:05:57 <clarkb> ++
19:06:21 <fungi> i'm happy to run it, e.g., tomorrow
19:06:27 <clarkb> thanks
19:06:58 <fungi> this week is pretty quiet, so may go faster and also probably less impact if something does go wrong
19:07:45 <clarkb> sounds like a plan. Anything else on this topic?
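
The cron job with "some sort of locking mechanism" discussed above could be guarded roughly as follows. This is a minimal sketch, not the wrapper opendev actually runs: the lock path and the `exclusive_lock` and `run_once` helpers are hypothetical names, and it assumes a Linux host where `fcntl.flock` is available. Because a run took about 18 hours, the lock is taken without blocking, so an overlapping invocation skips instead of queueing behind the running one.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def exclusive_lock(path):
    """Yield True if this process got the lock, False if another run holds it."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            # LOCK_NB makes flock fail immediately instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            yield False
            return
        try:
            yield True
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


def run_once(lock_path, job):
    """Run job() only if no other invocation currently holds lock_path."""
    with exclusive_lock(lock_path) as held:
        if not held:
            return "skipped"
        job()
        return "ran"
```

A cron entry would then call a small script that passes the real work (for example, launching the playbook) as `job`; what that command looks like is left open here.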
19:09:19 <clarkb> #topic Gerrit 3.5 upgrade planning
19:09:41 <clarkb> ianw: are we still on track for doing this your monday (sunday utc)?
19:10:12 <ianw> yes i think so
19:10:39 <clarkb> I ended up pushing a change for the collision checking config, but in the process realized the default is to enable it so that bit is less urgent than I thought it was
19:10:39 <ianw> couple of config todo's but i'll get that done soon
19:11:13 <ianw> ++, sorry haven't checked review queues just yet but sounds good
19:11:57 <clarkb> I guess let us know if we need to review anything or go over the process. I was planning to look at the etherpad more closely again, but this upgrade very closely resembles the 3.4 upgrade iirc
19:12:07 <clarkb> the next one to 3.6 is a bit more involved but we aren't going that far
19:12:32 <clarkb> #link https://etherpad.opendev.org/p/gerrit-upgrade-3.5
19:13:00 <clarkb> anything else to call out before the weekend upgrade?
19:13:18 <ianw> nope, as you say this one doesn't seem too involved
19:14:12 <clarkb> #topic Changing our default ansible version in Zuul
19:14:36 <clarkb> I meant to send email about this but then summit travel and prep ended up being too distracting.
19:15:02 <frickler> I also forgot about kolla testing with all the other brokenness
19:15:04 <clarkb> Do we think two weeks notice if I send an email this week is sufficient for flipping to ansible v5 by default at the end of june or should we do it in july?
19:15:56 <fungi> seems reasonable. set it much longer and openstack's release cycle will be too far along
19:16:01 <frickler> I think it is o.k., I don't expect many people to act before it happens
19:16:10 <fungi> agreed
19:16:11 <frickler> and that, too
19:16:21 <fungi> well, the two are directly related ;)
19:16:27 <clarkb> ok I'll plan to send notice of that changing June 30 then (it's a thursday so that gives people time before the weekend to look at brokenness)
19:16:35 <fungi> thanks!
19:17:12 <clarkb> #topic Enable webapp on nodepool launchers
19:17:26 <clarkb> frickler: I think you added this one. I did want to point out we do run a webserver on the builders
19:17:34 <frickler> yes, I came across that while looking at how to check on a stuck image build
19:17:38 <clarkb> But I think you're looking for access to the newer launcher api stuff
19:17:54 <frickler> the webserver only serves log and images right now iiuc
19:18:12 <frickler> we could add the couple of special URLs that the api serves to it
19:18:28 <frickler> and then have a data source to check image builds quite easily
19:18:30 <clarkb> ya I think adding that is fine and a good idea
19:18:52 <frickler> do we need a spec? otherwise I could just hack up a patch I think
19:19:16 <clarkb> I don't think we need a spec. We already have a webserver in place and there isn't any privileged info
19:19:32 <clarkb> just a matter of adding the webserver to the launchers and wiring it up to the api bits
19:19:54 <clarkb> (no new servers, no new security concerns, no new dns records, etc. pretty straightforward)
19:20:13 <ianw> my theory with this was that we should be able to see from a dashboard like ...
19:20:38 <clarkb> the zuul dashboard does expose nodes and labels but not the images
19:20:38 <ianw> https://grafana.opendev.org/d/f3089338b3/nodepool-dib-status?orgId=1
19:21:20 <ianw> i have to admit i haven't looked at that in a while, and now it has a big *green* FAILED
19:21:41 <frickler> oh, I didn't know that page
19:21:57 <clarkb> oh ya I don't recall knowing that existed
19:22:14 <ianw> grafana has ways to alert us of issues, but we've never quite managed to get consensus on actually turning that on
19:22:22 <frickler> maybe if we manage to make "failed" red, that's already all we need
19:22:45 <frickler> just for comparison, this is an example of what the api result looks like https://paste.opendev.org/show/bwHPkLhxzyARMsOryUyV/
19:23:42 <frickler> but this is also maybe something to shortly talk about
19:23:53 <frickler> arm64 builds are broken, haven't checked yet why
19:24:05 <clarkb> I don't think it hurts to have the information available directly via the api too if we still want to add that
19:24:14 <ianw> yeah i saw that note, thanks, sorry i've been out a few days but will look into it
19:24:18 <frickler> and centos9 waits for a dib release which is difficult because there is a nasty workaround merged
19:24:19 <clarkb> but I agree the dashboard is likely more generally a better way to consume it
19:24:37 <clarkb> I'm on my laptop keyboard and my typing is extra bad
19:24:53 <frickler> I'll try to get the API working anyway, yes
19:24:59 <ianw> yeah i'm hoping the centos 9 packages have been fixed in the last few days
19:25:54 <frickler> and the other thing is wheels haven't been published for 14 days, I think also due to centos9
19:26:23 <clarkb> ya the afs packaging is sensitive to booting on current kernels so when the images get delayed wheels get delayed
19:26:33 <clarkb> I wonder if we need to only publish if all jobs pass though
19:26:39 <clarkb> and instead just publish whatever we've built
19:27:14 <ianw> yeah, that's been a constant issue; not sure if we have a "finally" type zuul dependency?
19:27:23 <frickler> or make arch specific publishing?
19:27:24 <fungi> i suppose that's safe, it shouldn't create a wheel if building that wheel fails, so we're probably not going to be more likely to publish broken wheels that way at least
19:27:59 <clarkb> fungi: yup exactly. If we write a wheel it should be fine to publish
19:28:21 <fungi> that said, we're more likely to not notice it's broken if we do that
19:28:35 <ianw> https://opendev.org/openstack/project-config/src/branch/master/zuul.d/projects.yaml#L4811 is where it is released
19:28:55 <fungi> er, more likely to not notice we've started failing to build some wheels i mean
19:28:57 <clarkb> fungi: ya I think that is the balance. Is it better to hold everything up and probably notice or do best effort and maybe not notice as quickly
19:29:44 <ianw> (also, grafana monitors this, and i would also be happy for it to push me notifications it was broken)
19:30:07 <clarkb> ianw: in the past we've said making notifications like that opt in would be fine. I think I'm also ok with sending them to an infra-root@ folder
19:30:16 <clarkb> I would probably consume them ^ that way
19:30:52 <clarkb> (we just want to avoid people getting middle of the night pages and feeling obligated to do something, but an alert that can be checked in the morning is something I would find helpful)
19:31:10 <fungi> yes, my position on it is that notifications of what's broken is fine, as long as we don't set expectations that someone is necessarily going to address whatever we're being notified about, and as long as the false failure rate isn't significant
19:31:49 <fungi> we already do it for cronjobs, expiring ssl certs, et cetera
19:32:59 <ianw> https://review.opendev.org/c/opendev/system-config/+/573183/ was in this area
19:34:05 <clarkb> I think I would avoid irc (at least to start) and do email if we can
19:34:17 <clarkb> simply because it is easier to "subscribe" with email
19:34:25 <clarkb> (though most irc clients will let you filter stuff out too)
19:34:51 <clarkb> but ya I think if we can make grafana send us an email to infra-root@ and elsewhere that would work
19:36:27 <ianw> https://meetings.opendev.org/irclogs/%23openstack-infra/%23openstack-infra.2018-06-07.log.html#t2018-06-07T23:43:25 was some discussion on it
19:37:19 <ianw> at the time i accidentally left a test server alerting to #openstack-infra, which probably had people starting from a base of "already annoyed" :)
19:37:48 <fungi> hah
19:38:08 <frickler> we might use a dedicated channel then. but I'm also not against mail
19:38:43 <clarkb> ya a dedicated channel would be the other method. Then I just want join that channel on my phone :)
19:39:00 <ianw> also, this might go into another point of contention on this as well, which is i'm not sure exactly how to set it up, but i feel like grafyaml may not support it
19:40:45 <clarkb> if these are things we can add to specific graphs it may work with grafyaml as is
19:42:24 <clarkb> anyway we have one more agenda item to get to. We don't need to design this here. It may be worth a specific agenda item or a spec/email thread for future discussion though
19:42:37 <clarkb> #topic Running a URL shortener
19:42:49 <clarkb> frickler pointed out that people use services like bit.ly
19:43:01 <frickler> another thing I came up with, yes
19:43:11 <clarkb> #topic https://opensource.com/article/18/7/apache-url-shortener an open source alternative we could host
19:43:22 <frickler> and seeing that apache2 has everything one needs was new to me
19:44:17 <clarkb> I'm not opposed and this seems like the sort of thing that would fit in well on static.o.o
19:44:18 <ianw> i guess my concern is that it seems to be a target for abuse, isn't that why github killed "git.io"?
19:44:37 <clarkb> ianw: in this case I think you'd have to modify a file via gerrit, it wouldn't be self service
19:44:39 <frickler> well we would still have reviews in front of the data
19:45:14 <frickler> I would do it within project-config for simplicity, but we could also use a dedicated repo if you prefer
19:45:18 <fungi> yeah, the main concern i have is that this is something we'd probably have to commit to maintain ~forever or else break people's external links
19:45:28 <fungi> however, it does seem like a pretty lightweight thing
19:45:42 <ianw> oh, so basically just a vhost with a list of 301 redirects?
19:45:47 <clarkb> ianw: ya
19:46:09 <clarkb> it is simple enough that fungi's concern doesn't seem to be a major thing. If we had to run a proper wsgi service or similar I'd think differently
19:46:09 <frickler> RewriteMap shortlinks txt:/data/web/shortlink/links.txt RewriteRule ^/(.+)$ ${shortlinks:$1} [R=temp,L]
19:46:16 <ianw> that's what a large part of static.o.o is anyway :)
19:46:22 <fungi> agreed
19:46:36 <ianw> i certainly don't have an issue if it's just an easy-to-update config file that goes through review
19:47:10 <fungi> for the sites we already host, we do similar things, e.g. zuul-ci.org/start
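
The `RewriteMap`/`RewriteRule` pair frickler pasted can be pictured with a small Python model of the lookup. This is only a sketch of the idea, not Apache's implementation: `parse_link_map`, `resolve`, and the `gerrit35` short name in the sample map are hypothetical, and the sketch simply returns `None` for unknown names rather than reproducing what Apache does on a miss. A `txt:` map is a plain file of whitespace-separated key/value lines, and `R=temp` corresponds to an HTTP 302 redirect.

```python
import re

# Hypothetical contents of links.txt: "<short name> <target URL>" per line.
SAMPLE_MAP = """\
# short name   target
gerrit35       https://etherpad.opendev.org/p/gerrit-upgrade-3.5
"""

# Same pattern as the pasted RewriteRule: everything after the leading slash.
_RULE = re.compile(r"^/(.+)$")


def parse_link_map(text):
    """Parse key/value lines, skipping blank lines and '#' comments."""
    links = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            links[parts[0]] = parts[1]
    return links


def resolve(path, links):
    """Return (302, target) for a known short name, else None."""
    match = _RULE.match(path)
    if match is None:
        return None
    target = links.get(match.group(1))
    if target is None:
        return None
    return 302, target
```

With the sample map, `resolve("/gerrit35", parse_link_map(SAMPLE_MAP))` yields the etherpad URL with a 302 status, which is why adding a link only requires a reviewed one-line change to the map file.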
19:47:27 <frickler> then another question would be whether e.g. l.opendev.org is short enough or we want to grab a shorter domain
19:48:03 <frickler> I reserved od42.de just in case, but not sure if everyone would be fine using a .de domain
19:48:48 <clarkb> using another domain typically adds another level of management with the registrar service
19:49:04 <ianw> i always find it weird that these things use what i generally don't consider stable countries as a top-level domain
19:49:06 <clarkb> it's not impossible but avoiding that if possible is likely a good idea
19:49:21 <fungi> ianw: .io is a pet peeve of mine, yeah
19:49:30 <ianw> something in .dev maybe, but i imagine anything short is unavailable
19:50:19 <fungi> note that .dev is controlled by google too
19:50:59 <fungi> and they have a history of forcing a number of "experimental" features for domains in that tld as a result
19:51:07 <clarkb> my vote is something like l.opendev.org as it is one less thing to manage and I feel that is short enough to work on conference slides for example
19:51:27 <fungi> (where experimental means anything they're considering for tie-ins with chrome)
19:51:31 <frickler> we don't have to decide now, I can start preparing a patch with that anyway
19:51:48 <clarkb> yup we could expand to another domain later if we decide it is necessary
19:51:49 <ianw> ++ i can't imagine we can get any shorter without spending ridiculous amounts of $ anyway
19:52:24 <fungi> the foundation already spent a semi-large amount of money to buy opendev.org off a scalper as it was
19:53:05 <fungi> and reusing a subdomain of opendev.org is also a bit of useful advertising for the collaboratory too
19:53:23 <fungi> "oh opendev, what's that?"
19:53:56 <clarkb> let's open it up to anything else before we run out of time
19:54:00 <clarkb> #topic Open Discussion
19:54:02 <clarkb> anything else?
19:54:03 <frickler> do we want to restrict targets to being opendev related?
19:54:28 <clarkb> frickler: ya I wouldn't use it for arbitrary stuff to avoid that abuse concern ianw brought up
19:54:41 <frickler> anyway, can discuss that once I have a patch
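
If short link targets are restricted to opendev-related sites, as suggested above, a review-time check over the link map could look something like this. It is a sketch under stated assumptions: the `ALLOWED_DOMAINS` list, the https-only requirement, and the helper names are all hypothetical, since no policy was decided in the meeting.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list; the real set of acceptable domains was not decided.
ALLOWED_DOMAINS = ("opendev.org", "zuul-ci.org")


def target_allowed(url, allowed=ALLOWED_DOMAINS):
    """True if url is https and its host is an allowed domain or a subdomain of one."""
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.hostname is None:
        return False
    host = parts.hostname
    # Require an exact match or a dot-separated suffix so that a host like
    # "evilopendev.org" does not pass as "opendev.org".
    return any(host == d or host.endswith("." + d) for d in allowed)


def rejected_targets(links):
    """Return the short names whose targets fall outside the allow-list."""
    return sorted(name for name, url in links.items() if not target_allowed(url))
```

A check job could fail the change whenever `rejected_targets` returns a non-empty list, leaving reviewers to judge only the links that pass.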
19:54:45 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/845066
19:55:03 <ianw> that's a doc update for duplicate accounts
19:55:20 <clarkb> ah I'll have to take a look at that one
19:55:22 <ianw> and cleans up some other things
19:56:10 <frickler> I also have a zuul patch if someone gets bored ;)
19:56:14 <frickler> #link https://review.opendev.org/c/zuul/zuul/+/834671
19:58:22 <ianw> interesting ... do people take anonymous patches?
19:58:54 <frickler> ianw: zuul can only see public data, not everyone publishes that
19:59:43 <frickler> in particular for the email
20:00:01 <clarkb> And we are at time. Thanks everyone. We'll be back here next week
20:00:05 <fungi> thanks clarkb!
20:00:09 <clarkb> #endmeeting