Tuesday, 2022-06-14

clarkbJust about meeting time. We'll get started shortly18:59
ianwo/19:00
clarkb#startmeeting infra19:01
opendevmeetMeeting started Tue Jun 14 19:01:09 2022 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
opendevmeetThe meeting name has been set to 'infra'19:01
clarkb#link https://lists.opendev.org/pipermail/service-discuss/2022-June/000339.html Our Agenda19:01
clarkb#topic Announcements19:01
clarkbI had none19:01
clarkbThere were no actions last meeting either so we can dive right into the agenda19:02
clarkb#topic Topics19:02
clarkb#topic Improving CD throughput19:02
clarkbWe worked through the issues with the zuul cluster upgrade and reboot playbook and managed to run it to completion without error19:03
clarkbThe next step there is to run it automatically. It took about 18 hours to complete so I figure a daily cron with some sort of locking mechanism is appropriate. Any concern with getting that set up?19:03
fricklerI'd think maybe once a week would be often enough?19:04
fricklerrunning this 3/4 of the time seems a bit much to me19:04
clarkbya we don't have to do it as often as possible either19:04
clarkbIn that case maybe a weekend cron to do it when zuul is under the least load. I can work on that19:05
fungiprobably one more manual run is in order before we turn on a cronjob too19:05
clarkb++19:05
fungii'm happy to run it, e.g., tomorrow19:06
clarkbthanks19:06
fungithis week is pretty quiet, so may go faster and also probably less impact if something does go wrong19:06
clarkbsounds like a plan. Anything else on this topic?19:07
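The "cron with some sort of locking mechanism" idea from this topic could be sketched roughly as below. This is only an illustration, not the playbook wrapper that was actually deployed: the names `exclusive_lock` and `run_once` and the lock file path are hypothetical, and it assumes a Linux host where `fcntl.flock` is available. The point is that a run taking about 18 hours cannot overlap with the next scheduled start, because the second invocation fails to get the lock and exits instead of waiting.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def exclusive_lock(path):
    """Yield True if this process holds the lock, False if another run has it."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            # LOCK_NB makes the call fail immediately instead of blocking.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            yield False
            return
        try:
            yield True
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


def run_once(lock_path, job):
    """Run job() unless a previous invocation is still holding the lock."""
    with exclusive_lock(lock_path) as held:
        if not held:
            return "skipped"
        job()
        return "ran"


if __name__ == "__main__":
    # Hypothetical lock path; the real job would start the reboot playbook.
    print(run_once("/tmp/zuul-reboot.lock", lambda: None))
```

A weekly cron entry would call a script like this; if the previous run is still going, the new one reports "skipped" and exits.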
clarkb#topic Gerrit 3.5 upgrade planning19:09
clarkbianw: are we still on track for doing this your monday (sunday utc)?19:09
ianwyes i think so19:10
clarkbI ended up pushing a change for the collision checking config, but in the process realized the default is to enable it so that bit is less urgent than I thought it was19:10
ianwcouple of config todo's but i'll get that done soon19:10
ianw++, sorry haven't checked review queues just yet but sounds good19:11
clarkbI guess let us know if we need to review anything or go over the process. I was planning to look at the etherpad more closely again, but this upgrade very closely resembles the 3.4 upgrade iirc19:11
clarkbthe next one to 3.6 is a bit more involved but we aren't going that far19:12
clarkb#link https://etherpad.opendev.org/p/gerrit-upgrade-3.519:12
clarkbanything else to call out before the weekend upgrade?19:13
ianwnope, as you say this one doesn't seem too involved19:13
clarkb#topic Changing our default ansible version in Zuul19:14
clarkbI meant to send email about this but then summit travel and prep ended up being too distracting.19:14
fricklerI also forgot about kolla testing with all the other brokenness19:15
clarkbDo we think two weeks notice if I send an email this week is sufficient for flipping to ansible v5 by default at the end of june or should we do it in july?19:15
fungiseems reasonable. set it much longer and openstack's release cycle will be too far along19:15
fricklerI think it is o.k., I don't expect many people to act before it happens19:16
fungiagreed19:16
fricklerand that, too19:16
fungiwell, the two are directly related ;)19:16
clarkbok I'll plan to send notice of that changing June 30 then (it's a thursday so that gives people time before the weekend to look at brokenness)19:16
fungithanks!19:16
clarkb#topic Enable webapp on nodepool launchers19:17
clarkbfrickler: I think you added this one. I did want to point out we do run a webserver on the builders19:17
frickleryes, I came across that while looking at how to check for a stuck image build19:17
clarkbBut I think you're looking for access to the newer launcher api stuff19:17
fricklerthe webserver only serves log and images right now iiuc19:17
fricklerwe could add the couple of special URLs that the api serves to it19:18
fricklerand then have a data source to check image builds quite easily19:18
clarkbya I think adding that is fine and a good idea19:18
fricklerdo we need a spec? otherwise I could just hack up a patch I think19:18
clarkbI don't think we need a spec. We already have a webserver in place and there isn't any privileged info19:19
clarkbjust a matter of adding the webserver to the launchers and wiring it up to the api bits19:19
clarkb(no new servers, no new security concerns, no new dns records, etc pretty straightforward)19:19
ianwmy theory with this was that we should be able to see from a dashboard like ...19:20
clarkbthe zuul dashboard does expose nodes and labels but not the images19:20
ianwhttps://grafana.opendev.org/d/f3089338b3/nodepool-dib-status?orgId=119:20
ianwi have to admit i haven't looked at that in a while, and now it has a big *green* FAILED19:21
frickleroh, I didn't know that page19:21
clarkboh ya I don't recall knowing that existed19:21
ianwgrafana has ways to alert us of issues, but we've never quite managed to get consensus on actually turning that on19:22
fricklermaybe if we manage to make "failed" red, that's already all we need19:22
fricklerjust for comparison, this is an example of what the api result looks like https://paste.opendev.org/show/bwHPkLhxzyARMsOryUyV/19:22
fricklerbut this is also maybe something to shortly talk about19:23
fricklerarm64 builds are broken, haven't checked yet why19:23
clarkbI don't think it hurts to have the information available directly via the api too if we still want to add that19:24
ianwyeah i saw that note, thanks, sorry i've been out a few days but will look into it19:24
fricklerand centos9 waits for a dib release which is difficult because there is a nasty workaround merged19:24
clarkbbut I agree the dashboard is likely more generally a better way to consume it19:24
clarkbI'm on my laptop keyboard and my typing is extra bad19:24
fricklerI'll try to get the API working anyway, yes19:24
ianwyeah i'm hoping the centos 9 packages have been fixed in the last few days19:24
fricklerand the other thing is wheels haven't been published for 14 days, I think also due to centos919:25
clarkbya the afs packaging is sensitive to booting on current kernels so when the images get delayed wheels get delayed19:26
clarkbI wonder if we need to only publish if all jobs pass though19:26
clarkband instead just publish whatever we've built19:26
ianwyeah, that's been a constant issue; not sure if we have a "finally" type zuul dependency?19:27
frickleror make arch specific publishing?19:27
fungii suppose that's safe, it shouldn't create a wheel if building that wheel fails, so we're probably not going to be more likely to publish broken wheels that way at least19:27
clarkbfungi: yup exactly. If we write a wheel it should be fine to publish19:27
fungithat said, we're more likely to not notice it's broken if we do that19:28
ianwhttps://opendev.org/openstack/project-config/src/branch/master/zuul.d/projects.yaml#L4811 is where it is released19:28
fungier, more likely to not notice we've started failing to build some wheels i mean19:28
clarkbfungi: ya I think that is the balance. Is it better to hold everything up and probably notice or do best effort and maybe not notice as quickly19:28
ianw(also, grafana monitors this, and i would also be happy for it to push me notifications it was broken)19:29
clarkbianw: in the past we've said making notifications like that opt in would be fine. I think I'm also ok with sending them to an infra-root@ folder19:30
clarkbI would probably consume them ^ that way19:30
clarkb(we just want to avoid people getting middle of the night pages and feeling obligated to do something, but an alert that can be checked in the morning is something I would find helpful)19:30
fungiyes, my position on it is that notifications of what's broken are fine, as long as we don't set expectations that someone is necessarily going to address whatever we're being notified about, and as long as the false failure rate isn't significant19:31
fungiwe already do it for cronjobs, expiring ssl certs, et cetera19:31
ianwhttps://review.opendev.org/c/opendev/system-config/+/573183/ was in this area19:32
clarkbI think I would avoid irc (at least to start) and do email if we can19:34
clarkbsimply because it is easier to "subscribe" with email19:34
clarkb(though most irc clients will let you filter stuff out too)19:34
clarkbbut ya I think if we can make grafana send us an email to infra-root@ and elsewhere that would work19:34
ianwhttps://meetings.opendev.org/irclogs/%23openstack-infra/%23openstack-infra.2018-06-07.log.html#t2018-06-07T23:43:25 was some discussion on it19:36
ianwat the time i accidentally left a test server alerting to #openstack-infra, which probably had people starting from a base of "already annoyed" :)19:37
fungihah19:37
fricklerwe might use a dedicated channel then. but I'm also not against mail19:38
clarkbya a dedicated channel would be the other method. Then I just want join that channel on my phone :)19:38
ianwalso, this might go into another point of contention on this as well, which is i'm not sure exactly how to set it up, but i feel like grafyaml may not support it19:39
clarkbif these are things we can add to specific graphs it may work with grafyaml as is19:40
clarkbanyway we have one more agenda item to get to. We don't need to design this here. It may be worth a specific agenda item or a spec/email thread for future discussion though19:42
clarkb#topic Running a URL shortener19:42
clarkbfrickler pointed out that people use services like bit.ly19:42
frickleranother thing I came up with, yes19:43
clarkb#topic https://opensource.com/article/18/7/apache-url-shortener an open source alternative we could host19:43
fricklerand seeing that apache2 has everything one needs was new to me19:43
clarkbI'm not opposed and this seems like the sort of thing that would fit in well on static.o.o19:44
ianwi guess my concern is that it seems to be a target for abuse, isn't that why github killed "git.io"?19:44
clarkbianw: in this case I think you'd have to modify a file via gerrit, it wouldn't be self service19:44
fricklerwell we would still have reviews in front of the data19:44
fricklerI would do it within project-config for simplicity, but we could also use a dedicated repo if you prefer19:45
fungiyeah, the main concern i have is that this is something we'd probably have to commit to maintain ~forever or else break people's external links19:45
fungihowever, it does seem like a pretty lightweight thing19:45
ianwoh, so basically just a vhost with a list of 301 redirects?19:45
clarkbianw: ya19:45
clarkbit is simple enough that fungi's concern doesn't seem to be a major thing. If we had to run a proper wsgi service or similar I'd think differently19:46
fricklerRewriteMap shortlinks txt:/data/web/shortlink/links.txt RewriteRule ^/(.+)$ ${shortlinks:$1} [R=temp,L]19:46
ianwthat's what a large part of static.o.o is anyway :)19:46
fungiagreed19:46
ianwi certainly don't have an issue if it's just an easy-to-update config file that goes through review19:46
fungifor the sites we already host, we do similar things, e.g. zuul-ci.org/start19:47
fricklerthen another question would be whether e.g. l.opendev.org is short enough or we want to grab a shorter domain19:47
fricklerI reserved od42.de just in case, but not sure if everyone would be fine using a .de domain19:48
clarkbusing another domain typically adds another level of management with the registrar service19:48
ianwi always find it weird that these things use what i generally don't consider stable countries as a top-level domain19:49
clarkbits not impossible but avoiding that if possible is likely a good idea19:49
fungiianw: .io is a pet peeve of mine, yeah19:49
ianwsomething in .dev maybe, but i imagine anything short is unavailable19:49
funginote that .dev is controlled by google too19:50
fungiand they have a history of forcing a number of "experimental" features for domains in that tld as a result19:50
clarkbmy vote is something like l.opendev.org as it is one less thing to manage and I feel that is short enough to work on conference slides for example19:51
fungi(where experimental means anything they're considering for tie-ins with chrome)19:51
fricklerwe don't have to decide now, I can start preparing a patch with that anyway19:51
clarkbyup we could expand to another domain later if we decide it is necessary19:51
ianw++ i can't imagine we can get any shorter without spending ridiculous amounts of $ anyway19:51
fungithe foundation already spent a semi-large amount of money to buy opendev.org off a scalper as it was19:52
fungiand reusing a subdomain of opendev.org is also a bit of useful advertising for the collaboratory too19:53
fungi"oh opendev, what's that?"19:53
clarkblet's open it up to anything else before we run out of time19:53
clarkb#topic Open Discussion19:54
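To make the shortener mechanics discussed in this topic concrete, here is a small sketch of what a text-file map plus a `^/(.+)$` rewrite rule amounts to: strip the leading slash from the request path and look the remainder up in a reviewed list of key/target pairs. It is a simulation for illustration only, not Apache's implementation. The function names `parse_map` and `resolve` and the sample entry are hypothetical, and duplicate keys are not treated specially here.

```python
import re

# Mirrors the pattern in the RewriteRule quoted in the discussion.
RULE = re.compile(r"^/(.+)$")


def parse_map(text):
    """Parse 'key target' lines, skipping blank lines and # comments."""
    links = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            links[parts[0]] = parts[1]
    return links


def resolve(path, links):
    """Return the redirect target for a request path, or None if unknown."""
    match = RULE.match(path)
    if match is None:
        return None
    return links.get(match.group(1))


if __name__ == "__main__":
    # Hypothetical map content; the real file would live in a reviewed repo.
    sample = "# short name -> target\ndocs https://docs.opendev.org/\n"
    print(resolve("/docs", parse_map(sample)))
```

Because the whole service is a lookup in a flat file, adding a link is just a one-line change that goes through Gerrit review.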
clarkbanything else?19:54
fricklerdo we want to restrict targets to being opendev related?19:54
clarkbfrickler: ya I wouldn't use it for arbitrary stuff to avoid that abuse concern ianw brought up19:54
frickleranyway, can discuss that once I have a patch19:54
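If targets do end up being restricted to opendev-related sites, a check along these lines could run against proposed map entries at review time. This is a sketch under assumptions: the allowed domain list, the https-only requirement, and the name `is_allowed` are all hypothetical and were not decided in the meeting.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; the real set of acceptable domains is undecided.
ALLOWED_DOMAINS = ("opendev.org",)


def is_allowed(url, allowed=ALLOWED_DOMAINS):
    """Accept only https URLs whose host is an allowed domain or a subdomain."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower()
    # Match the domain itself or a dot-separated subdomain, so that
    # lookalikes such as opendev.org.example are rejected.
    return any(host == d or host.endswith("." + d) for d in allowed)


if __name__ == "__main__":
    for candidate in ("https://review.opendev.org/", "https://bit.ly/abc"):
        print(candidate, is_allowed(candidate))
```

Comparing on the parsed hostname rather than a substring of the URL keeps a target like `https://opendev.org.evil.example/` from slipping through.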
ianw#link https://review.opendev.org/c/opendev/system-config/+/84506619:54
ianwthat's a doc update for duplicate accounts19:55
clarkbah I'll have to take a look at that one19:55
ianwand cleans up some other things19:55
fricklerI also have a zuul patch if someone gets bored ;)19:56
frickler#link https://review.opendev.org/c/zuul/zuul/+/83467119:56
ianwinteresting ... do people take anonymous patches?19:58
fricklerianw: zuul can only see public data, not everyone publishes that19:58
fricklerin particular for the email19:59
clarkbAnd we are at time. Thanks everyone. We'll be back here next week20:00
fungithanks clarkb!20:00
clarkb#endmeeting20:00
opendevmeetMeeting ended Tue Jun 14 20:00:09 2022 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)20:00
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2022/infra.2022-06-14-19.01.html20:00
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2022/infra.2022-06-14-19.01.txt20:00
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2022/infra.2022-06-14-19.01.log.html20:00
fricklerthx20:00
