Wednesday, 2023-04-05

opendevreviewAndy Wu proposed openstack/project-config master: Add cinder-nfs charm to OpenStack charms
opendevreviewMichal Nasiadka proposed opendev/irc-meetings master: kolla: Move meeting one hour backwards (DST)
opendevreviewMerged opendev/irc-meetings master: kolla: Move meeting one hour backwards (DST)
opendevreviewwaleed mousa proposed openstack/diskimage-builder master: fix ifupdown pkg map for dhcp-all-interfaces of redhat family
opendevreviewwaleed mousa proposed openstack/diskimage-builder master: fix ifupdown pkg map for dhcp-all-interfaces of redhat family
opendevreviewStephen Finucane proposed openstack/project-config master: Add github sqlalchemy/alembic to the project list
opendevreviewMerged openstack/project-config master: Add github sqlalchemy/alembic to the project list
TheJuliagood morning opendev! Looks like job 38b5b05b25894795b4997a7935c9ad89 would have a held node for the ironic-grenade job for us to investigate13:25
fungiTheJulia: i have a very nice server in rax-ord you might like13:28
fungiwhat ssh keys should i authorize?13:28
TheJuliaooooo ahhhhhhhhh13:28
fungiTheJulia: ssh root@
fungilet us know whenever you're done with it and we can clean up the hold13:30
* TheJulia does the whole evil laugh thing13:30
fungino rush13:30
opendevreviewAndy Wu proposed openstack/project-config master: Add cinder-nfs charm to OpenStack charms
fungiinfra-root: there's some outstanding project creation changes. should we hold those until friday in order to avoid creating merge conflicts for the renames?14:21
fungirenames and retirements both i guess14:21
fricklerfungi: sounds reasonable to me14:22
fungii was about to approve some, and then stopped myself14:22
fricklerbtw. fri+mon are bank holidays here, so I won't be around much then14:22
fungii mentioned it during the meeting, but just as a reminder i'll be gone all next week14:24
opendevreviewMichal Nasiadka proposed openstack/project-config master: Stop using Storyboard for Magnum projects
gthiemongeHi Folks, we (the Octavia team) have a lot of issues with vexxhost's Ubuntu Jammy nested virt hosts (we are still on ubuntu focal controllers, the commit that updates the jobs fails in the CI) because of kernel crashes:
gthiemongeIt seems that neutron had the same issue in the past:
gthiemongeshould we consider removing those hosts from the jammy nested-virt pool?14:51
fungigthiemonge: it looks like that would only leave ovh providing those labels, though we do have two ovh regions with them so there would be some redundancy15:06
gthiemongefungi: some vexxhost nodes work properly (maybe 60% of the nodes based on the neutron commit), could we identify the broken nodes and keep only the good ones?15:08
fungigthiemonge: zuul records the host-id hash which we can correlate and then provide to mnaser or guilhermesp and they can see if those correspond to specific underlying operating system versions, kernel versions, processor models, whatever15:11
fungigthiemonge: example...
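A minimal sketch of the host-id correlation fungi describes, assuming build results have already been exported as simple records. The `host_id` and `result` field names and the sample values are hypothetical, not Zuul's actual schema:

```python
from collections import defaultdict


def failure_counts_by_host(builds):
    """Group build records by host_id and return {host_id: (failures, total)}."""
    counts = defaultdict(lambda: [0, 0])
    for build in builds:
        entry = counts[build["host_id"]]
        entry[1] += 1  # every record counts toward the total
        if build["result"] != "SUCCESS":
            entry[0] += 1  # anything other than SUCCESS counts as a failure
    return {host: tuple(pair) for host, pair in counts.items()}


# Hypothetical sample data, for illustration only.
sample_builds = [
    {"host_id": "aaa111", "result": "SUCCESS"},
    {"host_id": "aaa111", "result": "SUCCESS"},
    {"host_id": "bbb222", "result": "FAILURE"},
    {"host_id": "bbb222", "result": "FAILURE"},
    {"host_id": "bbb222", "result": "SUCCESS"},
]

for host, (failed, total) in sorted(failure_counts_by_host(sample_builds).items()):
    print(f"{host}: {failed}/{total} failed")
```

Hosts whose failure count stands out would be the ones to hand to the cloud operators for comparison against kernel versions or processor models.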
clarkbgthiemonge: fungi: friendly reminder that those using the virt labels are expected to engage the cloud directly rather than play telephone with us. Part of adding clouds to those labels is double checking with the cloud that they are happy with that15:13
gthiemongefungi: do we know how many nodes are in this pool?15:13
clarkbI think the best thing would be to engage the cloud directly and if the issue can't be resolved in a reasonable time period we can remove the cloud from the label15:14
clarkbalso a friendly reminder that nested virt is highly unstable and you should expect these problems to occur and require debugging15:14
gthiemongeclarkb: ack, first we will identify which nodes are problematic15:14
fungiactually, i don't see evidence we're providing any nested virt labels in vexxhost? we have them configured for the sjc1 region but it's set to max-servers of 015:16
fungioh, never mind, we do have them in the ca-ymq-1 region15:16
fungigthiemonge: roughly appears to be 50 in vexxhost ca-ymq-1 vs 199 between ovh bhs1 and gra1 regions15:18
fungiso of the quota (based on our max-servers settings, which isn't perfectly accurate) where we could boot those labels, vexxhost accounts for roughly 20%15:19
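The "roughly 20%" figure follows directly from the two max-servers totals quoted above. A small sketch of the arithmetic (the function name is just for illustration):

```python
def label_share(provider_max, others_max):
    """Fraction of total label capacity that one provider accounts for."""
    return provider_max / (provider_max + others_max)


# 50 in vexxhost ca-ymq-1 vs 199 across ovh bhs1 and gra1 (numbers from the discussion).
vexxhost_share = label_share(50, 199)
print(f"vexxhost share: {vexxhost_share:.1%}")
```

As fungi notes, max-servers is only an approximation of real quota, so the result is a rough estimate.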
fungii'm heading out to lunch, but should be back in an hour-ish15:19
clarkbgenekuo: fungi: ya my focus for the next little bit within opendev will likely be updating old bionic servers. That involves changes like: these are probably less approachable to a new contributor though. As far as things that are more approachable go I would point out our system-config-run-* zuul jobs that deploy our infrastructure15:21
clarkblike production and test things like gerrit, gitea, and so on. Getting familiar with that and maybe adding some tests to less well tested services could be helpful and a good way to get familiar with our tooling15:21
clarkbgenekuo: fungi: then a possible good followup to that would be picking up the dedicated uid work for our docker containers. and/or updating mariadb versions in services that depend on a mariadb installation15:22
clarkbBut also as fungi pointed out there is the keycloak openid integration work and general keycloak opendev id service bring up.15:22
noonedeadpunkfolks, I'm not sure if you are aware or not, but centos is completely broken atm, including container images, cloud images, etc, due to release of gnupg2-2.3.3-3 -
noonedeadpunkSo might be good to know. I'm not sure how much this could affect us though, but it for sure will affect freshly-built zuul images in case they don't roll out a fix15:24
clarkbnoonedeadpunk: thanks for the heads up15:25
clarkbthat doesn't look like the sort of thing we could work around even if we wanted to (and generally we try to expose platform issues and force the software to workaround it as that is the value in testing on the platform)15:27
noonedeadpunkwell, they said that explicit `rpm --import` should still work15:28
noonedeadpunkbut things may get worse as multiple SIGs also don't have supported GPG keys, like NFV where I spotted the issue originally15:29
noonedeadpunk(rocky is not affected as you might guess)15:30
clarkbya this looks stream specific15:31
clarkbsimilar to the other stream issues we've hit previously where packages update and break but then it takes time to correct because the fixing happens in rhel future first or something15:31
noonedeadpunkyeah. which is super frustrating... So it's 3rd major breakage of Stream for OSA in last 2 weeks fwiw15:33
clarkbdefinitely seems like if centos stream is where people break first that it should also be fixed first15:33
clarkb this dib change is failing on the issue15:33
noonedeadpunkyeah, exactly the issue15:35
noonedeadpunkI wonder if it's worth writing a ML for holding rechecks15:35
noonedeadpunkAs I really have no idea how centos jobs are widespread15:36
opendevreviewClark Boylan proposed opendev/system-config master: Add static02 to inventory
opendevreviewClark Boylan proposed opendev/system-config master: Make etherpad configuration more generic for multiple hosts
opendevreviewClark Boylan proposed opendev/system-config master: Add etherpad02 to inventory
clarkbinfra-root ^ that should make for better testing of those services on jammy before we merge anything. Sorry I missed that on the first pass15:36
clarkbnoonedeadpunk: ++15:37
noonedeadpunkwell, I've just spotted email from TripleO on the same topic with solution like that
noonedeadpunkLikely this can be applied to dib as well?15:50
clarkbwe try to avoid those changes if we can because it hides the fact that your software is actually broken on centos15:51
clarkbwhat tripleo is doing is fine because it is the software and it is correcting the issue for itself15:51
clarkbhowever when the issue is deep in the package management system things get weird for sure and there may need to be compromise15:52
clarkbI think maybe doing the rpm --import is better though then we aren't using old gpg and instead working around current gpg?15:52
noonedeadpunkThe tricky thing about rpm --import is that nobody knows the full list of affected repos atm15:53
clarkbah that is repo specific because each repo has a different signing key. got it15:54
noonedeadpunkso then everyone needs to do that for each affected one which is quite annoying and plenty of work15:54
noonedeadpunkyeah, so if repo has gpg signed with sha-2 - it shouldn't be an issue15:54
noonedeadpunkor well !sha-1 at least15:55
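To illustrate why the `rpm --import` workaround is per-repo work, here is a hedged sketch that only builds the commands, one per signing key. The key file paths are hypothetical placeholders, since nobody has the full list of affected repos:

```python
def rpm_import_commands(key_paths):
    """Return one `rpm --import <key>` argument list per key file."""
    return [["rpm", "--import", path] for path in key_paths]


# Hypothetical key locations; each affected repo would need its own entry.
example_keys = [
    "/etc/pki/rpm-gpg/RPM-GPG-KEY-example-repo-a",
    "/etc/pki/rpm-gpg/RPM-GPG-KEY-example-repo-b",
]

for command in rpm_import_commands(example_keys):
    # Printed rather than executed, so the sketch is safe to run anywhere.
    print(" ".join(command))
```

Every repo with a different signing key adds another entry to the list, which is the maintenance burden being described.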
clarkbthe gpg keys themselves are independent of the hash as is the pubkey material right? The issue is on the signing side so theoretically they could resign everything and push and we'd be fine?15:56
clarkbgenekuo: fungi: also I would be happy to set up some time to talk on jitsi meet or similar if that would be helpful. I'm currently in UTC-7 which might make selecting a time painful but I'm sure we can make something work. I don't mind an early morning or later evening call15:58
noonedeadpunkclarkb: oh, yes, exactly16:00
noonedeadpunkso it's not a big technical issue, it's just super annoying and a matter of not being able to do much in a proper way16:02
clarkbya part of the issue here is endusers can't properly fix anything themselves16:02
noonedeadpunkAnd once they release it we need to also wait for mirrors to get these updated packages16:03
clarkbas it has to do with the trust relationship between the end user and the distro rather than the content of the distro itself16:03
noonedeadpunkI kind of wonder how that could pass any CI, but it's completely different topic....16:04
fricklerat some point we said we'd only support LTS releases, not unstable things. if we continue with this, we could as well start supporting sid or ubuntu 23.04 16:23
clarkbfrickler: yes, personally I think we should probably continue to push more towards rocky/openeuler for the bulk of the rhel like testing16:24
clarkbWe didn't have that option initially (since openeuler is a different kernel I don't want to force people that way if they don't want to use it), but now we've had rocky 8 and 9 for a bit and it seems reasonably stable16:25
fricklerI agree about rocky, openeuler keeps falling over, too16:25
fricklersee devstack+kolla16:25
clarkband then maybe also consider if fedora or centos stream provide more benefit from an upcoming updates perspective and focus on one. I think the transition from centos to centos stream has made this all a bit painful though and we're still slowly working through it16:26
clarkbpersonally I really like the idea of a forward looking distro to catch issues as early as possible too, but that requires dedicated effort and we've never really been able to find that person or people16:31
clarkblinux 6.2 broke s3 suspend on my laptop and now I have to use the more battery hungry s10x or whatever it's called16:32
clarkbcatching problems like that early before they cause problems for stable releases that affect many more users would be great16:32
fricklersure, but that should likely be an optional, non-voting scenario, not one where everyone goes screaming about their whole CI being borked16:34
clarkbfor sure16:35
johnsomHi there. I don't seem to have channel op status on #openstack-lbaas any more. It won't let me update the channel topic any longer. Can I get added to the op list for the channel?16:36
clarkbjohnsom: you will want to edit
johnsomack, thanks16:37
clarkbIt was very likely lost in the oftc migration since there was no guarantee that your old nick was the same on oftc we didn't port those from freenode16:37
opendevreviewMichael Johnson proposed openstack/project-config master: Add johnsom ops for #openstack-lbaas and -dns
fungiokay, lunch completed16:49
fricklerjohnsom: also you don't need to actually op yourself, better set the topic via chanserv. we still need to merge the above patch for that17:00
johnsomYeah, chanserv was rejecting me when I did the set topic command17:00
johnsom"You do not have access to the TOPIC command on channel #openstack-lbaas."17:01
fricklerah, o.k., that's the right command, wasn't clear from your earlier description. that should work soon, then17:02
opendevreviewMerged openstack/project-config master: Add johnsom ops for #openstack-lbaas and -dns
fungijohnsom: the deploy job finished so you should have access now17:46
johnsomThank you17:47
fungiyw. and yes, as clarkb noted we didn't copy access lists from freenode to oftc because we couldn't know if people were squatting some of the account names, but also extending the accessbot code to set those acls makes it easier for us to track permission requests now17:48
clarkbafs01.ord was migrated off of failing hardware and should be online again according to email18:35
clarkbI can look more closely after lunch18:35
clarkbalso Element Matrix Services will be doing maintenance on our instances April 13 between 02:24 UTC and 6:24 UTC for a predicted maximum downtime of 60 minutes18:36
clarkbI suspect no one will really notice in that time block. I'll be sure to check on it when I wake April 14th18:37
clarkbmaybe ianw will notice but it's friday afternoon/evening for ianw in that timeframe anyway18:38
clarkbfungi: if you get a chance can you review particularly the static replacement changes? I think landing those should be pretty safe? If we want to wait until after the gerrit stuff that is fine too, but hoping to have eyeballs on them before you go on vacation at least20:35
fungiyep, meant to look at those today20:46
clarkbthe four changes related to tomorrow's gerrit outage all lgtm (3 renames and 1 to update the gerrit config to 3.7)20:47
ianwoh is that the second time for that?21:15
ianwgenekuo: in approximately 25 hours from now we'll be going through for the gerrit upgrade, and discussing it here21:18
ianwyou are of course welcome to follow along21:18
ianw99% of our maintenance is not nearly so hands-on, as it were.  most everything else is gitops driven21:20
ianwnoonedeadpunk/clarkb: we can pause our builds, if we haven't already rebuilt with it21:21
ianwcentos9 i mean21:21
ianwi think it's actually building right now ...
ianw2023-04-05 20:47:28.542 |  gnupg2                      x86_64 2.3.3-3.el9                 baseos    2.5 M21:23
clarkbianw: I don't think it will help will it? you do a yum update in the job and it will break. But I guess maybe we don't do that everywhere?21:24
clarkbianw: oh I'm just bad at noticing timestamps. I think ord is fine as you point out21:25
ianwclarkb: yeah .. it might save something but not a generic solution21:26
clarkbianw: is there anything else you can think of that needs eyeballs prior to the downtime and work tomorrow? I think all of my concerns have been addressed at this point21:47
ianwclarkb: i don't think so, i'm happy with the checklist -- i assume you're happy with the approach in renames to merge two + wait for manage-proj to fail + unemergency + merge last?21:58
clarkbyup I think that plan sounds great22:05
opendevreviewMerged opendev/system-config master: install-launch-node: upgrade launch env periodically
ianwRequirement already satisfied: openstacksdk>=0.103 in /usr/launcher-venv/lib/python3.10/site-packages (from opendev-launch==1.0.0) (0.103.0)23:14
ianwso i guess "pip install -U <path-to-launch>" doesn't upgrade *everything*, only the launch script23:14
ianwwhich is obvious now i think about it with hindsight23:15
fungiyeah, the default upgrade strategy is conservative, i should have thought of that23:15
fungiwe need --upgrade-strategy=eager23:16
fungithe default strategy is "only-if-needed"23:16
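A small sketch of the difference fungi points out, building the pip command with and without the eager strategy. The target path is a hypothetical placeholder; `--upgrade-strategy` with the values `eager` and `only-if-needed` is pip's documented option:

```python
import sys


def pip_upgrade_command(target, eager=True):
    """Build a `pip install --upgrade` command, optionally upgrading dependencies too."""
    command = [sys.executable, "-m", "pip", "install", "--upgrade"]
    if eager:
        # Without this, pip keeps already-installed dependencies that still
        # satisfy the requirement (the default "only-if-needed" strategy).
        command += ["--upgrade-strategy", "eager"]
    command.append(target)
    return command


# Hypothetical path to the launch package; printed rather than executed.
print(" ".join(pip_upgrade_command("/path/to/launch")))
```

With the default strategy, a dependency such as openstacksdk stays at its installed version as long as it meets the stated minimum, which matches the "Requirement already satisfied" output quoted above.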
opendevreviewMerged opendev/ master: Add static02
opendevreviewMerged opendev/ master: Add etherpad02 to DNS
Clark[m]My parents are in town as of today. Going to skip out early today for dinner to catch up with them so I feel less bad when I ignore them tomorrow 23:18
fungihave fun!23:19
opendevreviewIan Wienand proposed opendev/system-config master: install-launch-node: upgrade all packages
fungithat looks right, thanks!23:22

