*** diablo_rojo__ is now known as diablo_rojo | 09:02 | |
*** corvus is now known as Guest3154 | 11:18 | |
*** Guest3154 is now known as notcorvus | 14:39 | |
*** notcorvus is now known as corvus | 14:40 | |
*** corvus is now known as Guest3197 | 14:40 | |
*** Guest3197 is now known as corvus | 14:42 | |
*** corvus is now known as notcorvus | 14:42 | |
*** notcorvus is now known as corvus | 14:42 | |
*** corvus is now known as Guest3200 | 14:43 | |
*** Guest3200 is now known as corvus | 15:17 | |
*** corvus is now known as notcorvus | 15:17 | |
*** notcorvus is now known as corvus | 15:17 | |
clarkb | Anyone else here for the opendev infra team meeting? | 19:00 |
---|---|---|
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue Aug 3 19:01:19 2021 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link http://lists.opendev.org/pipermail/service-discuss/2021-August/000272.html Our Agenda | 19:01 |
ianw | o/ | 19:01 |
fungi | ohai | 19:01 |
clarkb | Hello | 19:02 |
clarkb | #topic Announcements | 19:02 |
clarkb | I had no announcements | 19:02 |
clarkb | #topic Actions from last meeting | 19:02 |
clarkb | #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-07-27-19.01.txt minutes from last meeting | 19:02 |
clarkb | #action someone write spec to replace Cacti with Prometheus | 19:02 |
clarkb | tristanC has some prometheus checks in the matrix gerritbot. I'm still hopeful I may have time this week to start a draft of this spec | 19:03 |
clarkb | considering that people want to add this to new software and some of our existing software has prometheus integration (gitea), getting this moving forward seems like a good idea | 19:03 |
corvus | Zuul has it too | 19:04 |
clarkb | ah neat | 19:04 |
corvus | (For health, not job data) | 19:05 |
clarkb | #topic Topics | 19:06 |
fungi | the topics topic! | 19:06 |
clarkb | #topic Service Coordinator Election | 19:06 |
clarkb | fungi: I kept it because you enjoy it so much :) | 19:06 |
* fungi blushes | 19:06 | |
clarkb | As mentioned in the email sent out last week, nominations will run for another week. I'm really encouraging someone else to take this on now so I can not do it :) | 19:06 |
clarkb | frickler ianw fungi corvus ^ I know we're all busy but if you're interested please feel free to volunteer :) | 19:07 |
clarkb | #topic Review Upgrade | 19:08 |
clarkb | I expect that we are close to being able to drop this topic from the meeting agenda, but wanted to keep it for at least this week | 19:08 |
clarkb | We just merged a change to bump the review.opendev.org dns record ttl up to one hour from 5 minutes | 19:08 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/803372 Stack of gerrit fixes | 19:09 |
clarkb | I've also got this stack of changes which should pull in the mariadb and openid fixes that we landed upstream as well as switch us over to using the mariadb connector and then removing the mysql connector from our images | 19:09 |
clarkb | They are split out this way because I suspect that we'll want to land the first two. Do a gerrit restart, then after we have happy gerrit for a bit we can land the third | 19:10 |
clarkb | ianw: The other major followup is the old server cleanups? | 19:10 |
clarkb | I think frickler ack'd the file preservation question. Do we want to set a date for cleaning those up? | 19:10 |
ianw | yep, i think i can probably just go ahead with that at this point | 19:11 |
fungi | agreed | 19:11 |
clarkb | cool and thank you for doing a ton of work to make this happen | 19:11 |
clarkb | #topic Gerrit User Cleanups | 19:13 |
clarkb | I put together another set of proposed user cleanups. There are about 103 remaining conflicts. I think I've got 73 proposed for cleanup. That will take us to 30 remaining | 19:13 |
clarkb | My goal here is that we'll do the 73 then the 30 will be manageable as a single commit pushed back to All-Users and we won't have to do the more hacky retire and delete external ids thing we have been doing | 19:14 |
clarkb | I plan to reach out to those ~30 when we get there as well | 19:14 |
clarkb | fungi has reviewed this list. If anyone has time to take a look that is appreciated. I basically put together an audit file with a bunch of user info and then separately indicate which of those we'll be retiring and cleaning up | 19:15 |
clarkb | But I plan to try and run the cleanup scripts against that list tomorrow | 19:15 |
clarkb | #topic Project Renames | 19:16 |
clarkb | I'll pull this topic off the agenda for next week but wanted to follow up on how the renames went last week | 19:16 |
clarkb | They went really well. Having the testing is great | 19:16 |
fungi | smoothly, i thought | 19:16 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/802922 Need to manage ssh known_hosts for review02 users | 19:16 |
fungi | yeah, the testing was a huge help | 19:17 |
clarkb | I think ^ was the major thing that we worked around by hand | 19:17 |
fungi | and we predicted that ahead of time so it didn't disrupt the maintenance | 19:17 |
clarkb | if others can review that and indicate the change looks good I'll add the host var data to prod host vars and then I can approve that | 19:17 |
fungi | the only surprise was that it choked on storyboard-dev being unreachable | 19:17 |
clarkb | yup | 19:17 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/803266 Further testing improvements | 19:17 |
clarkb | is another related change which aims to make the testing we've added even more useful | 19:17 |
fungi | that will hopefully merge in the next hour or so | 19:18 |
ianw | ahh sorry yep i looked at that yesterday | 19:18 |
ianw | that doesn't add the host key for giteas? | 19:18 |
ianw | was that intentional? | 19:18 |
clarkb | ianw: yes. The idea is we'll do it via private vars. | 19:18 |
clarkb | ianw: the reason for this is the previous patchset tried to add the giteas and the localhost known_hosts but failed because that overrode the testing group vars | 19:19 |
clarkb | ianw: this meant the known hosts entry we had for the testing review was actually the prod review and host key verification failed | 19:19 |
clarkb | ianw: to work around this the plan is to just set those values in the private host vars which I'll do as soon as we think that change is mergeable. | 19:19 |
clarkb | The content I'll add is at https://review.opendev.org/c/opendev/system-config/+/802922/6/inventory/service/host_vars/review02.opendev.org.yaml | 19:20 |
clarkb | ianw: if you think a different approach makes sense feel free to leave that in review. There are a few options there but fungi and corvus felt that keeping the pubkey and private key stuff close together in private vars made sense | 19:21 |
ianw | ok, i feel like we could grab the keys from gitea servers directly but i'll rethink on it | 19:21 |
clarkb | #topic Matrix eavesdrop and gerritbot bots | 19:22 |
clarkb | corvus and tristanC have been working to get these bots deployed. Testing has shown a couple of issues. Specifically creating log dirs properly and not using fedora as the base image so that ssh works for gerritbot | 19:23 |
clarkb | I think fixes for both of those items are on their way in, which means I expect we can interact with those bots in the opendev test room | 19:23 |
clarkb | corvus: ^ is there anything else to be aware of as the matrix bots get deployed? | 19:23 |
corvus | Nope just waiting to check back | 19:24 |
clarkb | #topic gitea01 backups | 19:24 |
clarkb | gitea01 backups are still sad. This is more important now that we have done a project rename. We do have backups to the other host working. If you have to restore gitea database from backup we need to be careful to use the up to date backup | 19:25 |
clarkb | backups are listed by date so this should be pretty apparent but calling it out here so that others see it | 19:25 |
clarkb | One option available to us is to drop the AAAA record for the backup server in vexxhost. Now that we actually want backups to update I think we should consider this but I know others feel more strongly about ipv6 than I do (I don't have local ipv6 yet :( ) | 19:25 |
ianw | note we do have daily copies going to rax, so we're not too bad | 19:26 |
clarkb | yup | 19:26 |
ianw | i dunno, perhaps email is the next step. i haven't heard any more on the underlying issue | 19:27 |
clarkb | that seems reasonable to me | 19:27 |
ianw | i'm open to ideas but i feel like i've more or less exhausted my options for fiddling things client side | 19:27 |
clarkb | yup I think we need to rely on the cloud to correct it. The only thing we can do now is workaround it by dropping the aaaa record | 19:28 |
clarkb | I'm happy if we start with email | 19:29 |
ianw | i can do that | 19:29 |
clarkb | thanks | 19:29 |
clarkb | #topic Gitea 1.15.0 upgrade prep | 19:29 |
clarkb | When I did the gitea 1.14 upgrade recently I noticed there was a release candidate for gitea 1.15. | 19:29 |
clarkb | This turned into a thing yesterday where 1.15.0 didn't work for a number of reasons | 19:30 |
clarkb | The first is nodejs 16 and gitea are no longer compatible despite their release notes saying this is the default nodejs to build with. We have since (just a few minutes ago) landed a change to use nodejs 14 to work around that as 14 is happy with old and new gitea | 19:31 |
clarkb | Next we've discovered that some of our old web ui based project management for things like project settings updates and renames don't necessarily work with new gitea (specifically the renames) | 19:31 |
clarkb | I took this as a good indication we should rewrite to use the proper rest api as the api does support these actions now | 19:32 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/803366/ Update project renaming and its child | 19:32 |
clarkb | Again that should be safe with older and newer gitea | 19:32 |
fungi | i expect we didn't use the api for those in the past because there weren't methods/objects to accomplish that? | 19:32 |
ianw | oh nice, and gate tested with the new playbooks? | 19:32 |
clarkb | fungi: yup that is what corvus thought | 19:32 |
clarkb | ianw: yup | 19:32 |
clarkb | then at the end of the stack I've placed a WIP change for gitea v1.15.0-rc2 that does a bunch of v1.15.0 specific things around templates and file paths and some config | 19:33 |
clarkb | I don't want to land the last change until we can do so with a real release. But we should be in good shape to deploy that release when it is ready | 19:33 |
clarkb | I'm happy I caught this early because there are a number of things that all needed updating to make this work right. Also when we get a real release we can hold nodes and double check it all looks good (I did that with the rc recently and it was mostly ok other than the rename not working) | 19:34 |
clarkb | #topic Mailman Ansible and Server Upgrades | 19:35 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/803263 fix newlist command | 19:36 |
clarkb | corvus: ianw ^ if one of you can review that it would be great. It seems small on the surface but I think it gets deep into how ansible handles commands and how newlist reads prompts | 19:36 |
fungi | that was a fun subtle "we're not testing *exactly* like production" bug | 19:36 |
clarkb | otherwise I'd go ahead and approve it | 19:36 |
clarkb | fungi did test redirecting a file with only a newline in it to newlist on a held test node and that worked | 19:36 |
clarkb | so we expect it will work but we got this wrong once already so eyeballs are a good thing :) | 19:37 |
fungi | i guess the way we could have caught that was if we'd not used --quiet in the newlist test and instead blackholed delivery in exim | 19:37 |
clarkb | ya I suppose that is still an option but someone that groks exim better than me may want to take a shot at it | 19:37 |
clarkb | I've also got a todo I need to get back to which is "upgrade the list serve servers" | 19:38 |
clarkb | It occurred to me that the lists.katacontainers.io server is a good guinea pig. In part because I expect it to be easier to snapshot and therefore easier to iterate on for testing. But also because if it goes really sideways we can emergency migrate those lists onto lists.openstack.org (long planned anyway) and then figure out why it was so broken | 19:39 |
clarkb | fungi: ^ I'll probably bug you for a second set of eyeballs as I try to bootstrap that testing | 19:39 |
ianw | hrm, i wonder if running it under nohup might work too? | 19:39 |
fungi | oh, on a related note, a user approached me privately today to find out why they'd stopped receiving posts from one of the lists, and in researching i discovered it was because the ipv6 address for lists.o.o ended up on the spamhaus xbl. as of ~16:30 deliveries were working to the problem destinations again | 19:39 |
clarkb | ianw: ya that might work too. We went with the ansible version to avoid needing to use the shell module, but nohup should do that as well | 19:40 |
clarkb | if we feel strongly about it I expect fungi can test newlist on the test server again using nohup | 19:41 |
fungi | rough timeframe on the xbl listing is it likely got added around july 21 | 19:41 |
clarkb | fungi: fun. I guess I solve that problem by not having aaaa records on my mailservers | 19:42 |
clarkb | #topic Open Discussion | 19:43 |
clarkb | That was what I had on the agenda. | 19:43 |
clarkb | Apologies for the ton of various docker image and gitea and gerrit and so on changes | 19:44 |
ianw | i'm hopeful i can remove debian-stretch soon | 19:44 |
clarkb | I just started looking at this as I did gitea things and the thread was very long. I think it does get us to a better spot overall once we get through them | 19:44 |
clarkb | ianw: jayf noticed the fedora 32 mirror removal. dib functests stopped working | 19:44 |
corvus | we can configure exim to not use ipv6 for outgoing if we want | 19:44 |
ianw | yeah, mea culpa for not updating that | 19:44 |
ianw | #link https://review.opendev.org/c/zuul/zuul-jobs/+/802981 | 19:44 |
clarkb | ianw: https://review.opendev.org/c/openstack/diskimage-builder/+/799341 that apparently fixes it which I'll review after the meeting | 19:45 |
fungi | i have working ipv6 on my mta and spamassassin was logging that the RCVD_IN_XBL test was matching on messages from the lists, but it wasn't subtracting enough from the score to send them to my spam inbox | 19:45 |
fungi | so i didn't spot it | 19:45 |
ianw | removes centos-8-stream from using centos-8 wheels | 19:45 |
ianw | i'm working on getting centos-8-wheels built; it basically works | 19:45 |
ianw | #link https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/802988 | 19:45 |
ianw | i just need to get the publish jobs working | 19:45 |
ianw | i have added some volumes | 19:46 |
clarkb | ianw: will that change fix it? I think we still want to use the pypi mirrors but not the wheel | 19:46 |
ianw | fungi: if you could confirm on https://review.opendev.org/c/opendev/base-jobs/+/802639 removal is the plan that would be great | 19:46 |
clarkb | ianw: that role seems to only do the distro mirrors though | 19:47 |
ianw | clarkb: my intention was to turn off just the wheel bits but i may have missed the mark | 19:47 |
clarkb | ianw: ya the tasks/mirror.yaml top level tasks lists does the pypi mirror config | 19:48 |
fungi | can do | 19:48 |
clarkb | ianw: ianw left a note | 19:48 |
ianw | ok will cycle back | 19:48 |
clarkb | tab completion fail | 19:49 |
ianw | oh and clarkb you mentioned backup pruning too? | 19:49 |
clarkb | ianw: oh ya I mentioned it yesterday since we're getting the emails warning us of disk space | 19:49 |
clarkb | ianw: did that get run? | 19:49 |
clarkb | seems we still got the warning email overnight | 19:50 |
ianw | no i didn't yet. how about i clean up review-test etc. and then i'll do that | 19:50 |
clarkb | sounds good thanks | 19:50 |
clarkb | sounds like that may be it. As always feel free to bring up discussion on IRC or on the mailing list | 19:51 |
clarkb | we aren't limited to this hour block | 19:51 |
clarkb | Thanks everyone! | 19:52 |
clarkb | #endmeeting | 19:52 |
opendevmeet | Meeting ended Tue Aug 3 19:52:03 2021 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:52 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2021/infra.2021-08-03-19.01.html | 19:52 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2021/infra.2021-08-03-19.01.txt | 19:52 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2021/infra.2021-08-03-19.01.log.html | 19:52 |
fungi | thanks clarkb! | 19:53 |
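
In the Gitea 1.15.0 topic above, clarkb mentions rewriting the project renames to go through the REST API instead of the web UI. As a rough illustration only, and not the code in change 803366, a rename through the API can be expressed as a PATCH on the repository resource with a new `name`. The host `gitea.example.org`, the token value, and the helper name `build_rename_request` are all assumptions made up for this sketch. The request is built but never sent.

```python
import json
import urllib.request


def build_rename_request(base_url, owner, repo, new_name, token):
    """Build (but do not send) a request that renames a Gitea repository.

    Hypothetical helper: the PATCH /api/v1/repos/{owner}/{repo} route and
    the "name" field are taken from Gitea's public API; everything else
    here (host, token, function name) is illustrative.
    """
    url = f"{base_url.rstrip('/')}/api/v1/repos/{owner}/{repo}"
    body = json.dumps({"name": new_name}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            # Gitea accepts API tokens in this header form.
            "Authorization": f"token {token}",
        },
    )


if __name__ == "__main__":
    req = build_rename_request(
        "https://gitea.example.org", "opendev", "old-name", "new-name", "PLACEHOLDER"
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Moving a repository to a different organization is a separate transfer operation in the API, so a rename that also changes the owner would likely need two steps.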
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!
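
On the Spamhaus XBL listing fungi describes in the log: DNS blocklists are conventionally queried by reversing the address and appending the list's zone, using octets for IPv4 and individual hex nibbles for IPv6. The sketch below only builds that query name and does no DNS lookup. The zone `zen.spamhaus.org` and the helper name `dnsbl_query_name` are assumptions for illustration, and the addresses are reserved test and documentation values, not the real lists.o.o address.

```python
import ipaddress


def dnsbl_query_name(address, zone="zen.spamhaus.org"):
    """Return the DNS name a DNSBL client would look up for an address.

    Hypothetical helper: it reuses ipaddress's reverse_pointer, which
    already yields reversed octets (IPv4) or reversed nibbles (IPv6),
    and swaps the in-addr.arpa / ip6.arpa suffix for the DNSBL zone.
    """
    ip = ipaddress.ip_address(address)
    # reverse_pointer ends in "in-addr.arpa" or "ip6.arpa": drop those two labels.
    reversed_labels = ip.reverse_pointer.rsplit(".", 2)[0]
    return f"{reversed_labels}.{zone}"


if __name__ == "__main__":
    print(dnsbl_query_name("127.0.0.2"))
    print(dnsbl_query_name("2001:db8::1"))
```

A listed address resolves to an A record under that name, and an unlisted one returns NXDOMAIN, which is the kind of check a spam filter rule such as RCVD_IN_XBL relies on.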