Tuesday, 2022-05-31

clarkbMeeting time19:00
clarkbwe'll get started shortly19:00
fungiahoy!19:01
clarkb#startmeeting infra19:01
opendevmeetMeeting started Tue May 31 19:01:14 2022 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
opendevmeetThe meeting name has been set to 'infra'19:01
clarkb#link https://lists.opendev.org/pipermail/service-discuss/2022-May/000338.html Our Agenda19:01
clarkbwe have an agenda but I forgot to add a couple of items before I sent it out (too many distractions) so we'll audible those in as we go19:01
clarkb#topic Announcements19:01
clarkbThe open infra summit is happening next week19:02
ianwo/19:02
clarkbkeep that in mind when making changes (particularly to etherpad which will be used by the colocated forum)19:02
clarkbbut also several of us will be distracted and in different timezones from normal19:02
fungiyeah, i'll be living on cest all week19:03
clarkb#topic Actions from last meeting19:03
clarkb#link http://eavesdrop.openstack.org/meetings/infra/2022/infra.2022-05-24-19.01.txt minutes from last meeting19:03
clarkbThere were no actions19:03
clarkb#topic Topics19:03
clarkb#topic Improving CD throughput19:03
clarkb#link https://opendev.org/opendev/system-config/src/branch/master/playbooks/zuul_reboot.yaml Automated graceful Zuul upgrades and server reboots19:04
clarkbZuul management is becoming related to this topic and we landed that playbook and ran it last week19:04
clarkbIt mostly worked as expected. Zuul mergers didn't gracefully stop (they stopped doing work but then never exited). That bug has since been fixed19:04
fungiwent very smoothly, only a few bumps along the way19:04
clarkbThe other issue we hit was zuul updated its model api version from 7 to 8 through this upgrade. And there was a bug in that transition19:05
clarkbwe managed to work around that by dequeuing the affected buildsets and re-enqueuing them19:05
clarkbas the model differences were basically deleted and then recreated on the expected content version when reenqueued19:05
clarkbZuul has also addressed this problem upstream (though it shouldn't be a problem for us now that we've updated to api version 8 anyway)19:05
clarkbOne change I made after the initial run was to add more package updating to the playbook so there is a small difference to pay attention to when next we run it19:06
clarkbOne thing I realized about ^ is that I'm not sure what happens if we try to do that when the auto updater is running. We might hit dpkg lock conflicts and fail? but we can keep learning as we go19:06
clarkbThat was all I had on this topic. Anything else?19:07
fungithe upgrade brought us new openstacksdk, but we can cover that in another topic19:07
clarkbyup thats next19:07
clarkb#topic Job log uploads to swift now with broken CORS headers19:09
fungi(only on rackspace, as far as we've observed)19:09
clarkbWe noticed ~Friday that job logs uploaded to swift didn't have CORS headers but only with rax (ovh is fine)19:09
clarkbThe current suspicion is that the openstacksdk 0.99.0 release which we picked up by restarting and upgrading executors with the above playbook may be to blame19:10
fungiyeah, and anything which was uploaded to swift on/before thursday seems to still work fine19:10
clarkbhttps://review.opendev.org/844090 will downgrade the version of openstacksdk on the executors and is in the zuul gate19:10
fungiwhich lends weight to it being related to the update19:10
clarkbwe should be able to confirm/deny this theory by upgrading our executors to that newer image once it lands19:10
fungior at least significantly increase our confidence level that it's related19:10
clarkbif downgrading sdk doesn't help we'll need to debug further as we're currently only uploading logs to ovh which is less than ideal19:11
fungitechnically we still only have time-based correlation to back it up, even if the problem goes away at the next restart19:11
clarkbyup that and we now know the 0.99.0 openstacksdk release is an expected-to-break-things-in-general release19:11
clarkb(we expected that with 1.0.0 but got it early)19:12
fungioh, also on ~saturday we merged a change to temporarily stop uploading logs to rackspace until we confirm via base-test that things have recovered19:12
corvusi think we can batch a lot of these changes in the next zuul restart19:13
fungiso this really was only an impact window of 24-48 hours worth of log uploads19:13
corvusi'll keep an eye out for that and let folks know when i think the time is ripe19:13
clarkbcorvus: thanks19:13
clarkband we'll pick this up further if downgrading sdk doesn't help19:13
clarkbbut for now be aware of that and the current process being taken to try and debug/fix the issue19:13
ianwok, happy to help with verifying things once we're in a place to upgrade (downgrade?) executors19:13
fungias for people with job results impacted during that window, the workaround is to look at the raw logs linked from the results page19:14
clarkbianw: we need to upgrade executors which will downgrade openstacksdk in the images19:14
clarkbAnything else on this subject?19:15
funginada19:15
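A minimal sketch of how the symptom discussed above (log objects served without CORS headers by one provider but not another) could be spotted from response headers. This is not the tooling used in the meeting; the function names and sample URLs are hypothetical, and it assumes the header of interest is Access-Control-Allow-Origin.

```python
def has_cors_header(headers):
    """Return True if the headers include Access-Control-Allow-Origin."""
    # HTTP header names are case-insensitive, so compare in lower case.
    return any(name.lower() == "access-control-allow-origin" for name in headers)


def missing_cors(responses):
    """Given {url: headers}, return the sorted URLs whose headers lack CORS."""
    return sorted(
        url for url, headers in responses.items() if not has_cors_header(headers)
    )


# Hypothetical sample data: one object served with the header, one without.
sample = {
    "https://ovh.example/log.txt": {"access-control-allow-origin": "*"},
    "https://rax.example/log.txt": {"Content-Type": "text/plain"},
}
print(missing_cors(sample))
```

In practice the header dictionaries would come from real HTTP responses for uploaded log objects; the check itself stays the same.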
clarkb#topic Glean doesn't handle static ipv6 configuration on rpm distros19:16
clarkbWe discovered recently that glean isn't configuring ipv6 statically on rpm/red hat distros19:16
clarkbthis particular combo happens on OVH because they are now publishing the ipv6 network info in their config drives but don't do RAs19:16
fungistatically meaning addressing supplied via instance metadata19:16
clarkball of the other clouds seem to also do RAs in addition to the config drive info so we don't have to statically configure ipv6 to get working ipv619:16
clarkbfungi: right and with the expectation that the host statically configure it rather than relying on RAs19:17
clarkbin the past this was a non issue because OVH didn't include that info in the metadata via config drive19:17
clarkb#link https://review.opendev.org/q/topic:rh-ipv6 Changes to add support to glean19:17
clarkbianw has picked this up and has a stack of changes ^ that I'm now noticing I need to rereview19:17
clarkbthank you for picking that up19:17
fungithis will be awesome, because it means we can finally do ipv6 in ovh generally, but also we can consider running servers there now as well19:18
clarkb++19:18
fungithe lack of automatable v6 (because it wasn't provided via metadata) was a blocker for us in the past19:18
clarkbianw: any gotchas or things to be aware as we review those changes?19:18
ianwright, so rax/ovh both provide "ipv6" in their metadata, vexxhost provides "ipv6_slaac"19:19
clarkbbut only ovh doesn't also advertise RAs19:20
clarkbI guess the rax metadata is maybe slightly wrong if it isn't ipv6_slaac? did you test these changes on rax to ensure we don't regress there by statically configuring things?19:20
clarkbI don't expect it to be a problem the kernel might just configure ipv6 first or NM will do it later depending on when RAs are received19:21
ianwi haven't touched the !rh path at all, so that should not change19:21
clarkbianw: I mean for rh on rax19:21
clarkbsince we'll be going from ignoring it entirely and using kernel RA default handling to static NM configuration of ipv619:21
fungistill probably fine so long as the metadata and neutron are in agreement19:21
ianwnm doesn't configure ipv6 there, for rh platforms19:22
ianwbut the kernel doesn't accept ra's either19:22
ianwso basically we're moving from no ipv6 -> statically configured ipv6 on rax19:22
clarkbno rax has ipv6 today19:22
clarkbI believe it is accepting RAs by default and glean is just ignoring it all19:23
clarkbthat is the difference between rax and ovh. Rax sends RAs so we got away with glean not having support, but ovh does not. Now that ovh adds the info to the config drive we notice19:23
ianwi don't think so, e.g. 172.99.69.200 is running 9-stream now on rax and doesn't have ipv619:24
fungioh, it may only be working with debuntu nodes19:24
ianwthe interface doesn't have IPV6INIT=yes set for it19:24
ianwhttps://paste.opendev.org/show/b3ya7eC9zN7oyzrEgQpk/ is a 9-stream with the change on rax i tested earlier19:24
clarkbhuh how did we not notice this issue in the past then? The ovh change is relatively new but I doubt rax has changed much19:25
ianwyeah, the debuntu case is different -- i called that out in https://review.opendev.org/c/opendev/glean/+/84399619:25
clarkbanyway sounds like we'll get ipv6 in rax on rh too then19:25
clarkban improvement all around19:26
fungihttps://zuul.opendev.org/t/openstack/build/429f8a274074476c9de7792aa71f5258/log/zuul-info/zuul-info.ubuntu-focal.txt#3019:26
fungithat ran in rax-dfw on focal and got ipv6 addressing19:26
ianwi feel like probably the debuntu path should be setting something in its interface files to say "use autoconfig on this interface"19:26
clarkbianw: well if the type is "ipv6" you are not supposed to auto config19:26
clarkb"ipv6_slaac" you do19:26
clarkbiirc19:27
corvus[we noticed in zuul because of the particular way openshift 3 tries to start up (it relies on an ipv6 /etc/hosts record and does not fall back to ipv4) -- perhaps nothing else behaves like that]19:27
clarkbcorvus: ah that could be19:27
clarkbanyway sounds like there are a stack of changes that will fix this properly and they just need review. I've put reviewing those on my todo list for this afternoon19:28
fungiyeah, that sounds like a somewhat naive implementation19:28
clarkbfungi: its specifically how neutron defines it19:28
fungibut a great canary19:28
ianwlike "iface eth<X> inet6 auto"19:28
fungii mean openshift's decision to base ipv6 detection on /etc/hosts content seems naive19:29
clarkbah19:29
ianwi feel like the issue is maybe that the network management tools and the kernel start to get in each other's way if the network management tools don't know ipv6 is autoconfigured19:29
ianwbut, maybe the way we operate it just never becomes an issue19:29
clarkbThank you for digging into that. Anything else before we move on?19:30
ianwanyway, there is basically no cross-over between the debian path and the rh path in any of the changes.  so nothing should change on the !rh path19:30
fungisounds very promising19:31
fungido we have a consensus on the revert in that stack?19:31
fungii saw it got reshuffled19:31
clarkbI think it is at the bottom of the stack still19:32
clarkb(at least gerrit appears to be telling me it is)19:32
ianwoh, yeah i just missed a file on that one19:32
clarkbI'm ok with that, I suspect we were the only users of the feature. We just need to communicate when we release that the feature has been removed and tag appropriately19:32
clarkband if necessary can add it back in again if anyone complains19:33
ianwyeah, the way it tests "OVH" but is actually testing "ignore everything" was quite confusing19:33
fungiyeah, i'm okay with no deprecation period as long as everyone else is in agreement we just communicate thoroughly19:33
fungiif anyone's going to be impacted it's probably ironic, and i doubt they used it either19:34
fungi(i haven't checked codesearch)19:34
clarkbya I think ironic makes a lot of use of dhcp all interfaces19:34
ianwi did check codesearch and nothing i could see was setting it19:34
fungiwfm. thanks!19:34
clarkbwhich is essentially what glean fell back to there and they get it from other tools19:34
ianwalso, up to https://review.opendev.org/c/opendev/glean/+/843979/2 is all no-op refactoring19:35
ianwi was thinking of 2 releases; the first with that to make sure there's no regressions, then a second one a day later that actually turns on ipv619:36
clarkbthat seems reasonable to me19:36
ianwi'll probably announce pending ipv6 enablement to the service-discuss list as a heads up19:36
clarkb++19:36
ianwbefore we tag any release that does that, and the next build picks it up19:36
fungiyeah, even just getting working v6 on ovh is likely to be disruptive somehow19:37
clarkbAlright let's continue as we have a few more items to get to19:37
clarkbthis all sounds good to me though and I'll try to review shortly19:38
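A minimal sketch of the distinction recalled in the discussion above, where a network entry of type "ipv6" is meant to be configured statically from metadata and "ipv6_slaac" is meant to be autoconfigured. This is not glean's actual code; the function name and the dictionary shape are hypothetical, and the mapping follows the "iirc" recollection in the log rather than a checked specification.

```python
def ipv6_config_mode(network):
    """Map a metadata network entry's "type" to an IPv6 configuration mode.

    Returns "static", "auto", or None when the entry is not an IPv6 network.
    """
    net_type = network.get("type")
    if net_type == "ipv6":
        # Address is supplied in the metadata; do not rely on RAs.
        return "static"
    if net_type == "ipv6_slaac":
        # Address is expected to come from router advertisements.
        return "auto"
    return None


# Hypothetical entries resembling the cases mentioned in the meeting.
for entry in ({"type": "ipv6"}, {"type": "ipv6_slaac"}, {"type": "ipv4"}):
    print(entry["type"], "->", ipv6_config_mode(entry))
```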
clarkb#topic Container Maintenance19:38
clarkbianw: on this topic I was wondering if you had time to look at the mariadb container upgrade path via the env var19:38
clarkbif not thats fine (a lot has been going on)19:38
ianwsorry not yet, but top of todo list is to recheck that and the new gerrit release (the ipv6 was way more work than i thought it would be :)19:39
clarkb#topic Gerrit 3.5 upgrade19:39
clarkb#link https://review.opendev.org/c/opendev/system-config/+/843298 new gerrit minor releases and our image updates for them19:40
clarkbGetting that landed and production updated is probably a good idea before we upgrade19:40
clarkbianw: are there any other upgrade related changes that need review or items that need assistance?19:40
ianwi can look at a quick restart for the minor releases during my afternoon19:41
clarkbthanks19:41
ianwno updates, but track19:42
clarkbI do feel like I'm getting sucked into travel and summit prep and want to not be tied to too many ops obligations19:42
ianw#link https://etherpad.opendev.org/p/gerrit-upgrade-3.519:42
ianwif you're interested in keeping up, anything will be logged there19:42
clarkbnoted thanks19:42
clarkb#topic Manually triggering periodic jobs in zuul19:42
clarkbI saw some talk about this at one point so I kept it on the agenda this week19:42
clarkbDoes anyone know if we managed to trigger a periodic job manually? If not that's fine we can continue but wanted to make sure we called it out if someone managed it and has notes to share19:43
fricklerI didn't succeed so far19:43
fricklerstill planning to do local testing19:44
clarkbthanks for the update19:44
ianw++ on getting instructions19:44
clarkb#topic Zuul changing default versions of Ansible to v519:44
clarkbThere is still no hard date for when zuul will make this change but I think we should continue to aim for end of June in opendev's deployment19:45
clarkbnote the version needs to be a string in yaml (this was something else we hit when we upgraded zuul last week)19:45
clarkbOur image updates to include the acl package appear to have made devstack happy and I think the zuul-jobs fixups have all landed19:45
clarkbfrickler: you mentioned wanting to test kolla jobs with newer ansible. Have you managed to do that yet? any thing to report if so?19:46
fricklernope, still another todo19:46
clarkbI think I'll send email about this change at the end of next week assuming we don't hit any major blockers by then19:46
clarkbI'll send that to service-announce19:46
clarkband plan for June 30 change (its a thursday)19:47
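A minimal sketch of the "version needs to be a string in yaml" note above. An unquoted YAML scalar such as 5 is loaded as an integer, while a quoted "5" stays a string, so a guard on the loaded value can catch the mistake early. The function name is hypothetical and this is not Zuul's own validation.

```python
def require_string_version(value):
    """Return value unchanged if it is a str, otherwise raise TypeError."""
    if not isinstance(value, str):
        raise TypeError(
            f"version must be a quoted string, got {type(value).__name__}: {value!r}"
        )
    return value


# A quoted version passes through; an integer (what unquoted 5 loads as) fails.
print(require_string_version("5"))
try:
    require_string_version(5)
except TypeError as exc:
    print(exc)
```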
clarkb#topic Removing Ethercalc19:48
clarkbI have shutdown and snapshotted this server19:48
clarkbI figure there isn't much harm in waiting a day or two before actually deleting it just in case someone starts screaming about problems due to this19:49
clarkbany objections to leaving the turned off server around for a couple of days before I delete it and its dns records?19:49
clarkbalso if you have a moment to double check the snapshot exists and looks sane please do.19:49
ianwnope, we have backups too19:50
fungilgtm19:50
ianwi guess they will prune but otherwise just stay around19:50
clarkbgood point19:50
clarkbalright ~thursday I'll delete the server and its dns records19:50
clarkb#topic Do we want to have a meeting next week19:51
clarkblast item on the agenda. Do we want to have a meeting next week? I think fungi corvus and myself will be unable to make it19:51
corvuscorrect; i vote no19:52
fungiyeah, i expect to be at a dinner at that time, i think19:52
fungiat any rate, i won't be around19:52
fricklerI guess we can skip19:52
clarkbI won't plan to send an agenda but if others do want a meeting and can run it feel free to send one and drive it19:53
clarkbbut ya I expect it is a good day for ianw to sleep in and otherwise not worry about it :)19:53
ianwheh, yeah, let's see if anything comes up during the week that needs an eye kept on it19:53
clarkb#topic Open Discussion19:54
clarkbAnything else?19:54
ianwopenstacksdk 1.0.0 release seems to have potential :)  but hopefully not!19:54
clarkbianw: we'll probably have everything pinned to 0.61.0 by then :)19:54
fungii ran across an explanation of how browsers detect whether a page is trying to "auto" play audio, apparently they check to see if the user interacts by clicking or entering things in the page first, so guidance from jitsi-meet is to turn on a landing page where users preselect their options before proceeding to the room, which is generally sufficient to keep browsers from thinking the19:56
fungipage is just spontaneously generating sound19:56
fungiseems like this may address a lot of the "my sound doesn't work" complaints we've been getting about meetpad19:56
clarkbfungi: are there changes up to add the landing page thing to our jitsi ?19:57
clarkbI know you were working on it then I got totally distracted19:57
fungii've also noticed that the upstream config files and examples have diverged quite a bit from the copies we forked, and a number of the options we set have become defaults now, so i'm trying to rectify all that first19:57
fungithe current upstream configs turn on that landing page by default since some time19:57
clarkbah so if we sync with upstream then we'll get it for free19:58
fungiso if i realign our configs with the upstream defaults as much as possible, that should just come for free, yes19:58
fungibut in an effort not to destabilize anything people might be using for remote participation in forum sessions, i think we probably shouldn't merge any major config updates until after next week19:58
clarkbwfm19:59
fungii'll try to push up something which minimizes our divergence from the upstream configs though, for further discussion19:59
clarkbthanks20:00
clarkband we are at time20:00
clarkbThank you everyone for joining us. We likely won't be here next week but should be back in two weeks20:00
clarkb#endmeeting20:00
opendevmeetMeeting ended Tue May 31 20:00:20 2022 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)20:00
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2022/infra.2022-05-31-19.01.html20:00
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2022/infra.2022-05-31-19.01.txt20:00
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2022/infra.2022-05-31-19.01.log.html20:00
fungithanks clarkb!20:01
