clarkb | Meeting time | 19:00 |
---|---|---|
clarkb | we'll get started shortly | 19:00 |
fungi | ahoy! | 19:01 |
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue May 31 19:01:14 2022 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link https://lists.opendev.org/pipermail/service-discuss/2022-May/000338.html Our Agenda | 19:01 |
clarkb | we have an agenda but I forgot to add a couple of items before I sent it out (too many distractions) so we'll audible those in as we go | 19:01 |
clarkb | #topic Announcements | 19:01 |
clarkb | The open infra summit is happening next week | 19:02 |
ianw | o/ | 19:02 |
clarkb | keep that in mind when making changes (particularly to etherpad which will be used by the colocated forum) | 19:02 |
clarkb | but also several of us will be distracted and on different timezone from normal | 19:02 |
fungi | yeah, i'll be living on cest all week | 19:03 |
clarkb | #topic Actions from last meeting | 19:03 |
clarkb | #link http://eavesdrop.openstack.org/meetings/infra/2022/infra.2022-05-24-19.01.txt minutes from last meeting | 19:03 |
clarkb | There were no actions | 19:03 |
clarkb | #topic Topics | 19:03 |
clarkb | #topic Improving CD throughput | 19:03 |
clarkb | #link https://opendev.org/opendev/system-config/src/branch/master/playbooks/zuul_reboot.yaml Automated graceful Zuul upgrades and server reboots | 19:04 |
clarkb | Zuul management is becoming related to this topic and we landed that playbook and ran it last week | 19:04 |
clarkb | It mostly worked as expected. Zuul mergers didn't gracefully stop (they stopped doing work but then never exited). That bug has since been fixed | 19:04 |
fungi | went very smoothly, only a few bumps along the way | 19:04 |
clarkb | The other issue we hit was zuul updated its model api version from 7 to 8 through this upgrade. And there was a bug in that transition | 19:05 |
clarkb | we managed to work around that by dequeuing the affected buildsets and re-enqueuing them | 19:05 |
clarkb | as the model differences were basically deleted and then recreated on the expected content version when re-enqueued | 19:05 |
clarkb | Zuul has also addressed this problem upstream (though it shouldn't be a problem for us now that we've updated to api version 8 anyway) | 19:05 |
clarkb | One change I made after the initial run was to add more package updating to the playbook so there is a small difference to pay attention to when next we run it | 19:06 |
clarkb | One thing I realized about ^ is that I'm not sure what happens if we try to do that when the auto updater is running. We might hit dpkg lock conflicts and fail? but we can keep learning as we go | 19:06 |
clarkb | That was all I had on this topic. Anything else? | 19:07 |
fungi | the upgrade brought us new openstacksdk, but we can cover that in another topic | 19:07 |
clarkb | yup that's next | 19:07 |
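(For context, the rolling-restart pattern discussed above looks roughly like the sketch below. This is a minimal illustration only, not the actual playbooks/zuul_reboot.yaml: the host group, container name, and commands are assumptions.)

```yaml
# Illustrative sketch of a serialized graceful restart; all names here
# are assumed, not copied from the real playbook.
- hosts: zuul-executor
  serial: 1                      # take one executor out of rotation at a time
  tasks:
    - name: Ask the executor to finish its running builds and exit
      ansible.builtin.command: docker exec zuul-executor zuul-executor graceful

    - name: Wait for the executor container to actually stop
      ansible.builtin.command: docker wait zuul-executor

    - name: Update packages while the node is already out of service
      ansible.builtin.apt:
        upgrade: dist
        update_cache: true

    - name: Reboot onto the new kernel and wait for the host to return
      ansible.builtin.reboot:
        reboot_timeout: 1800
```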
clarkb | #topic Job log uploads to swift now with broken CORS headers | 19:09 |
fungi | (only on rackspace, as far as we've observed) | 19:09 |
clarkb | We noticed ~Friday that job logs uploaded to swift didn't have CORS headers but only with rax (ovh is fine) | 19:09 |
clarkb | The current suspicion is that the openstacksdk 0.99.0 release, which we picked up by restarting and upgrading executors with the above playbook, may be to blame | 19:10 |
fungi | yeah, and anything which was uploaded to swift on/before thursday seems to still work fine | 19:10 |
clarkb | https://review.opendev.org/844090 will downgrade the version of openstacksdk on the executors and is in the zuul gate | 19:10 |
fungi | which lends weight to it being related to the update | 19:10 |
clarkb | we should be able to confirm/deny this theory by upgrading our executors to that newer image once it lands | 19:10 |
fungi | or at least significantly increase our confidence level that it's related | 19:10 |
clarkb | if downgrading sdk doesn't help we'll need to debug further as we're currently only uploading logs to ovh which is less than ideal | 19:11 |
fungi | technically we still only have time-based correlation to back it up, even if the problem goes away at the next restart | 19:11 |
clarkb | yup that and we now know the 0.99.0 openstacksdk release is one that's expected to break things in general | 19:11 |
clarkb | (we expected that with 1.0.0 but got it early) | 19:12 |
fungi | oh, also on ~saturday we merged a change to temporarily stop uploading logs to rackspace until we confirm via base-test that things have recovered | 19:12 |
corvus | i think we can batch a lot of these changes in the next zuul restart | 19:13 |
fungi | so this really was only an impact window of 24-48 hours worth of log uploads | 19:13 |
corvus | i'll keep an eye out for that and let folks know when i think the time is ripe | 19:13 |
clarkb | corvus: thanks | 19:13 |
clarkb | and we'll pick this up further if downgrading sdk doesn't help | 19:13 |
clarkb | but for now be aware of that and the current process being taken to try and debug/fix the issue | 19:13 |
ianw | ok, happy to help with verifying things once we're in a place to upgrade (downgrade?) executors | 19:13 |
fungi | as for people with job results impacted during that window, the workaround is to look at the raw logs linked from the results page | 19:14 |
clarkb | ianw: we need to upgrade executors which will downgrade openstacksdk in the images | 19:14 |
clarkb | Anything else on this subject? | 19:15 |
fungi | nada | 19:15 |
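(As a side note, the symptom is easy to probe directly against an uploaded log object. A hedged sketch using Ansible's uri module follows; the URL and Origin are placeholders, and the flattened header key relies on the module's documented header-lowercasing convention.)

```yaml
# Sketch: check whether a swift-hosted log object returns CORS headers.
# The URL below is a placeholder, not a real build artifact.
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Fetch a log object the way the zuul dashboard would
      ansible.builtin.uri:
        url: https://storage.example.org/v1/AUTH_abc/logs_123/job-output.json
        headers:
          Origin: https://zuul.opendev.org
        return_content: false
      register: probe

    - name: Fail if swift did not send back a CORS allow-origin header
      ansible.builtin.assert:
        that:
          # uri exposes response headers as lowercased, underscored keys
          - probe.access_control_allow_origin is defined
        fail_msg: Access-Control-Allow-Origin header missing; CORS is broken
```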
clarkb | #topic Glean doesn't handle static ipv6 configuration on rpm distros | 19:16 |
clarkb | We discovered recently that glean isn't configuring ipv6 statically on rpm/red hat distros | 19:16 |
clarkb | this particular combo happens on OVH because they are now publishing the ipv6 network info in their config drives but don't do RAs | 19:16 |
fungi | statically meaning addressing supplied via instance metadata | 19:16 |
clarkb | all of the other clouds seem to also do RAs in addition to the config drive info so we don't have to statically configure ipv6 to get working ipv6 | 19:16 |
clarkb | fungi: right and with the expectation that the host statically configure it rather than relying on RAs | 19:17 |
clarkb | in the past this was a non-issue because OVH didn't include that info in the metadata via config drive | 19:17 |
clarkb | #link https://review.opendev.org/q/topic:rh-ipv6 Changes to add support to glean | 19:17 |
clarkb | ianw has picked this up and has a stack of changes ^ that I'm now noticing I need to rereview | 19:17 |
clarkb | thank you for picking that up | 19:17 |
fungi | this will be awesome, because it means we can finally do ipv6 in ovh generally, but also we can consider running servers there now as well | 19:18 |
clarkb | ++ | 19:18 |
fungi | the lack of automatable v6 (because it wasn't provided via metadata) was a blocker for us in the past | 19:18 |
clarkb | ianw: any gotchas or things to be aware of as we review those changes? | 19:18 |
ianw | right, so rax/ovh both provide "ipv6" in their metadata, vexxhost provides "ipv6_slaac" | 19:19 |
clarkb | but only ovh doesn't also advertise RAs | 19:20 |
clarkb | I guess the rax metadata is maybe slightly wrong if it isn't ipv6_slaac? did you test these changes on rax to ensure we don't regress there by statically configuring things? | 19:20 |
clarkb | I don't expect it to be a problem; the kernel might just configure ipv6 first, or NM will do it later, depending on when RAs are received | 19:21 |
ianw | i haven't touched the !rh path at all, so that should not change | 19:21 |
clarkb | ianw: I mean for rh on rax | 19:21 |
clarkb | since we'll be going from ignoring it entirely and using kernel RA default handling to static NM configuration of ipv6 | 19:21 |
fungi | still probably fine so long as the metadata and neutron are in agreement | 19:21 |
ianw | nm doesn't configure ipv6 there, for rh platforms | 19:22 |
ianw | but the kernel doesn't accept ra's either | 19:22 |
ianw | so basically we're moving from no ipv6 -> statically configured ipv6 on rax | 19:22 |
clarkb | no, rax has ipv6 today | 19:22 |
clarkb | I believe it is accepting RAs by default and glean is just ignoring it all | 19:23 |
clarkb | that is the difference between rax and ovh. Rax sends RAs so we got away with glean not having support, but ovh does not. Now that ovh adds the info to the config drive we notice | 19:23 |
ianw | i don't think so, e.g. 172.99.69.200 is running 9-stream now on rax and doesn't have ipv6 | 19:24 |
fungi | oh, it may only be working with debuntu nodes | 19:24 |
ianw | the interface doesn't have IPV6INIT=yes set for it | 19:24 |
ianw | https://paste.opendev.org/show/b3ya7eC9zN7oyzrEgQpk/ is a 9-stream with the change on rax i tested earlier | 19:24 |
clarkb | huh how did we not notice this issue in the past then? The ovh change is relatively new but I doubt rax has changed much | 19:25 |
ianw | yeah, the debuntu case is different -- i called that out in https://review.opendev.org/c/opendev/glean/+/843996 | 19:25 |
clarkb | anyway sounds like we'll get ipv6 in rax on rh too then | 19:25 |
clarkb | an improvement all around | 19:26 |
fungi | https://zuul.opendev.org/t/openstack/build/429f8a274074476c9de7792aa71f5258/log/zuul-info/zuul-info.ubuntu-focal.txt#30 | 19:26 |
fungi | that ran in rax-dfw on focal and got ipv6 addressing | 19:26 |
ianw | i feel like probably the debuntu path should be setting something in its interface files to say "use autoconfig on this interface" | 19:26 |
clarkb | ianw: well if the type is "ipv6" you are not supposed to auto config | 19:26 |
clarkb | "ipv6_slaac" you do | 19:26 |
clarkb | iirc | 19:27 |
corvus | [we noticed in zuul because of the particular way openshift 3 tries to start up (it relies on an ipv6 /etc/hosts record and does not fall back to ipv4) -- perhaps nothing else behaves like that] | 19:27 |
clarkb | corvus: ah that could be | 19:27 |
clarkb | anyway sounds like there is a stack of changes that will fix this properly and they just need review. I've put reviewing those on my todo list for this afternoon | 19:28 |
fungi | yeah, that sounds like a somewhat naive implementation | 19:28 |
clarkb | fungi: it's specifically how neutron defines it | 19:28 |
fungi | but a great canary | 19:28 |
ianw | like "iface eth<X> inet6 auto" | 19:28 |
fungi | i mean openshift's decision to base ipv6 detection on /etc/hosts content seems naive | 19:29 |
clarkb | ah | 19:29 |
ianw | i feel like the issue is maybe that the network management tools and the kernel start to get in each other's way if the network management tools don't know ipv6 is autoconfigured | 19:29 |
ianw | but, maybe the way we operate it just never becomes an issue | 19:29 |
clarkb | Thank you for digging into that. Anything else before we move on? | 19:30 |
ianw | anyway, there is basically no cross-over between the debian path and the rh path in any of the changes. so nothing should change on the !rh path | 19:30 |
fungi | sounds very promising | 19:31 |
fungi | do we have a consensus on the revert in that stack? | 19:31 |
fungi | i saw it got reshuffled | 19:31 |
clarkb | I think it is at the bottom of the stack still | 19:32 |
clarkb | (at least gerrit appears to be telling me it is) | 19:32 |
ianw | oh, yeah i just missed a file on that one | 19:32 |
clarkb | I'm ok with that, I suspect we were the only users of the feature. We just need to communicate when we release that the feature has been removed and tag appropriately | 19:32 |
clarkb | and if necessary can add it back in again if anyone complains | 19:33 |
ianw | yeah, the way it tests "OVH" but is actually testing "ignore everything" was quite confusing | 19:33 |
fungi | yeah, i'm okay with no deprecation period as long as everyone else is in agreement and we just communicate thoroughly | 19:33 |
fungi | if anyone's going to be impacted it's probably ironic, and i doubt they used it either | 19:34 |
fungi | (i haven't checked codesearch) | 19:34 |
clarkb | ya I think ironic makes a lot of use of dhcp all interfaces | 19:34 |
ianw | i did check codesearch and nothing i could see was setting it | 19:34 |
fungi | wfm. thanks! | 19:34 |
clarkb | which is essentially what glean fell back to there and they get it from other tools | 19:34 |
ianw | also, up to https://review.opendev.org/c/opendev/glean/+/843979/2 is all no-op refactoring | 19:35 |
ianw | i was thinking of 2 releases; the first with that to make sure there's no regressions, then a second one a day later that actually turns on ipv6 | 19:36 |
clarkb | that seems reasonable to me | 19:36 |
ianw | i'll probably announce pending ipv6 enablement to the service-discuss list as a heads up | 19:36 |
clarkb | ++ | 19:36 |
ianw | before we tag any release that does that, and the next build picks it up | 19:36 |
fungi | yeah, even just getting working v6 on ovh is likely to be disruptive somehow | 19:37 |
clarkb | Alright let's continue as we have a few more items to get to | 19:37 |
clarkb | this all sounds good to me though and I'll try to review shortly | 19:38 |
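(For readers unfamiliar with the metadata under discussion: the config drive's network_data.json tags each network with a type, and "ipv6" means a full static configuration is supplied for the host to apply itself, while "ipv6_slaac" means the host should rely on router advertisements. A rough sketch of the shape, rendered as YAML for readability; the file itself is JSON and every value below is invented.)

```yaml
# Approximate shape of config-drive network_data.json (values invented):
networks:
  - id: network0
    link: tap-example
    type: ipv6            # static config supplied; glean must apply it
    ip_address: "2001:db8::10"
    netmask: "ffff:ffff:ffff:ffff::"
    routes:
      - network: "::"
        netmask: "::"
        gateway: "2001:db8::1"
  - id: network1
    link: tap-example
    type: ipv6_slaac      # host should autoconfigure from RAs instead
```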
clarkb | #topic Container Maintenance | 19:38 |
clarkb | ianw: on this topic I was wondering if you had time to look at the mariadb container upgrade path via the env var | 19:38 |
clarkb | if not thats fine (a lot has been going on) | 19:38 |
ianw | sorry not yet, but top of todo list is to recheck that and the new gerrit release (the ipv6 was way more work than i thought it would be :) | 19:39 |
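(The env var in question is presumably the mariadb image's MARIADB_AUTO_UPGRADE switch, which runs the schema upgrade automatically when a container starts against an older data directory. A hedged compose sketch, with service name, tag, and paths invented:)

```yaml
# Sketch: opting a mariadb container into automatic schema upgrades.
# Service name, image tag, and paths are illustrative only.
services:
  mariadb:
    image: mariadb:10.6
    environment:
      MARIADB_AUTO_UPGRADE: "1"  # run mysql_upgrade against older datadirs
    volumes:
      - /var/lib/mariadb:/var/lib/mysql
```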
clarkb | #topic Gerrit 3.5 upgrade | 19:39 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/843298 new gerrit minor releases and our image updates for them | 19:40 |
clarkb | Getting that landed and production updated is probably a good idea before we upgrade | 19:40 |
clarkb | ianw: are there any other upgrade related changes that need review or items that need assistance? | 19:40 |
ianw | i can look at a quick restart for the minor releases during my afternoon | 19:41 |
clarkb | thanks | 19:41 |
ianw | no updates, but tracking: | 19:42 |
clarkb | I do feel like I'm getting sucked into travel and summit prep and want to not be tied to too many ops obligations | 19:42 |
ianw | #link https://etherpad.opendev.org/p/gerrit-upgrade-3.5 | 19:42 |
ianw | if you're interested in keeping up, anything will be logged there | 19:42 |
clarkb | noted thanks | 19:42 |
clarkb | #topic Manually triggering periodic jobs in zuul | 19:42 |
clarkb | I saw some talk about this at one point so I kept it on the agenda this week | 19:42 |
clarkb | Does anyone know if we managed to trigger a periodic job manually? If not that's fine, we can continue, but I wanted to make sure we called it out if someone managed it and has notes to share | 19:43 |
frickler | I didn't succeed so far | 19:43 |
frickler | still planning to do local testing | 19:44 |
clarkb | thanks for the update | 19:44 |
ianw | ++ on getting instructions | 19:44 |
clarkb | #topic Zuul changing default versions of Ansible to v5 | 19:44 |
clarkb | There is still no hard date for when zuul will make this change but I think we should continue to aim for end of June in opendev's deployment | 19:45 |
clarkb | note the version needs to be a string in yaml (this was something else we hit when we upgraded zuul last week) | 19:45 |
clarkb | Our image updates to include the acl package appear to have made devstack happy and I think the zuul-jobs fixups have all landed | 19:45 |
clarkb | frickler: you mentioned wanting to test kolla jobs with newer ansible. Have you managed to do that yet? Anything to report if so? | 19:46 |
frickler | nope, still another todo | 19:46 |
clarkb | I think I'll send email about this change at the end of next week assuming we don't hit any major blockers by then | 19:46 |
clarkb | I'll send that to service-announce | 19:46 |
clarkb | and plan for a June 30 change (it's a Thursday) | 19:47 |
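(The string requirement mentioned above looks like this in job configuration; a minimal sketch with an invented job name:)

```yaml
# Quote the version: a bare 5 would be parsed by YAML as an integer,
# and zuul expects the ansible-version value to be a string.
- job:
    name: example-job
    ansible-version: '5'
```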
clarkb | #topic Removing Ethercalc | 19:48 |
clarkb | I have shutdown and snapshotted this server | 19:48 |
clarkb | I figure there isn't much harm in waiting a day or two before actually deleting it just in case someone starts screaming about problems due to this | 19:49 |
clarkb | any objections to leaving the turned off server around for a couple of days before I delete it and its dns records? | 19:49 |
clarkb | also if you have a moment to double check the snapshot exists and looks sane please do. | 19:49 |
ianw | nope, we have backups too | 19:50 |
fungi | lgtm | 19:50 |
ianw | i guess they will prune but otherwise just stay around | 19:50 |
clarkb | good point | 19:50 |
clarkb | alright ~thursday I'll delete the server and its dns records | 19:50 |
clarkb | #topic Do we want to have a meeting next week | 19:51 |
clarkb | last item on the agenda. Do we want to have a meeting next week? I think fungi corvus and myself will be unable to make it | 19:51 |
corvus | correct; i vote no | 19:52 |
fungi | yeah, i expect to be at a dinner at that time, i think | 19:52 |
fungi | at any rate, i won't be around | 19:52 |
frickler | I guess we can skip | 19:52 |
clarkb | I won't plan to send an agenda but if others do want a meeting and can run it feel free to send one and drive it | 19:53 |
clarkb | but ya I expect it is a good day for ianw to sleep in and otherwise not worry about it :) | 19:53 |
ianw | heh, yeah, let's see if anything comes up during the week that needs an eye kept on it | 19:53 |
clarkb | #topic Open Discussion | 19:54 |
clarkb | Anything else? | 19:54 |
ianw | openstacksdk 1.0.0 release seems to have potential :) but hopefully not! | 19:54 |
clarkb | ianw: we'll probably have everything pinned to 0.61.0 by then :) | 19:54 |
fungi | i ran across an explanation of how browsers detect whether a page is trying to "auto" play audio, apparently they check to see if the user interacts by clicking or entering things in the page first, so guidance from jitsi-meet is to turn on a landing page where users preselect their options before proceeding to the room, which is generally sufficient to keep browsers from thinking the | 19:56 |
fungi | page is just spontaneously generating sound | 19:56 |
fungi | seems like this may address a lot of the "my sound doesn't work" complaints we've been getting about meetpad | 19:56 |
clarkb | fungi: are there changes up to add the landing page thing to our jitsi ? | 19:57 |
clarkb | I know you were working on it then I got totally distracted | 19:57 |
fungi | i've also noticed that the upstream config files and examples have diverged quite a bit from the copies we forked, and a number of the options we set have become defaults now, so i'm trying to rectify all that first | 19:57 |
fungi | the current upstream configs turn on that landing page by default since some time | 19:57 |
clarkb | ah so if we sync with upstream then we'll get it for free | 19:58 |
fungi | so if i realign our configs with the upstream defaults as much as possible, that should just come for free, yes | 19:58 |
fungi | but in an effort not to destabilize anything people might be using for remote participation in forum sessions, i think we probably shouldn't merge any major config updates until after next week | 19:58 |
clarkb | wfm | 19:59 |
fungi | i'll try to push up something which minimizes our divergence from the upstream configs though, for further discussion | 19:59 |
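(For reference, in the docker-jitsi-meet images the landing page fungi describes is toggled through the web container's environment; a hedged sketch assuming the ENABLE_PREJOIN_PAGE knob and an invented service layout:)

```yaml
# Sketch: enable jitsi-meet's prejoin ("landing") page so users interact
# with the page before any audio plays, satisfying autoplay heuristics.
services:
  web:
    image: jitsi/web
    environment:
      ENABLE_PREJOIN_PAGE: "true"
```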
clarkb | thanks | 20:00 |
clarkb | and we are at time | 20:00 |
clarkb | Thank you everyone for joining us. We likely won't be here next week but should be back in two weeks | 20:00 |
clarkb | #endmeeting | 20:00 |
opendevmeet | Meeting ended Tue May 31 20:00:20 2022 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 20:00 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2022/infra.2022-05-31-19.01.html | 20:00 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2022/infra.2022-05-31-19.01.txt | 20:00 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2022/infra.2022-05-31-19.01.log.html | 20:00 |
fungi | thanks clarkb! | 20:01 |