Monday, 2025-09-22

zigoHi there!07:04
zigoI've re-written completely my package status thingy over here: https://osbpo.debian.net/deb-status/07:04
zigoNow it's completely event-driven, as in, when I do a "git push" on a package, that package gets rebuilt by one of my jenkins, which then triggers a webhook.07:04
zigoWhen that webhook is triggered, osbpo does an rsync of packages, recalculates the indices for the matching release, and then recalculates the version report.07:04
zigoAt the moment, I've set up a cron job to pull from the release team's repo at https://opendev.org/openstack/releases07:04
zigoI wonder: would it be possible to send me a trigger, so I could git pull, and then recalculate *ONLY* for the matching OpenStack release? Event driven is so much nicer than a cron.07:04
zigoFYI, I'm using "webhook" from https://github.com/adnanh/webhook (in fact, the Debian package in version 2.8.0), so it does HMAC-256 checking of a signature, which is easy to do with a small shell script and curl.07:04
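A minimal sketch of the HMAC-SHA256 signing and checking that zigo describes, written in Python rather than the shell script and curl he mentions. The function names, the secret, and the payload are hypothetical. How the signature travels with the request (for example, which HTTP header carries it) depends on the hook configuration and is not shown here.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, payload: bytes) -> str:
    """Return the hex-encoded HMAC-SHA256 of payload under secret."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, payload: bytes, signature: str) -> bool:
    """Check a received hex signature against the expected one.

    compare_digest performs a constant-time comparison, which avoids
    leaking how many leading characters of the signature matched.
    """
    expected = sign_payload(secret, payload)
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    # Hypothetical shared secret and request body, for illustration only.
    secret = b"example-shared-secret"
    body = b'{"release": "example"}'
    signature = sign_payload(secret, body)
    print(verify_signature(secret, body, signature))
```

The sender computes the signature over the exact bytes of the request body, and the receiver recomputes it with the same shared secret, so any change to the body or the secret makes verification fail.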
*** liuxie is now known as liushy11:51
fungizigo: we have jobs that call webhook endpoints, e.g. https://zuul.opendev.org/t/openstack/job/trigger-readthedocs-webhook and https://zuul.opendev.org/t/openstack/job/kolla-copr-erlang-update13:22
fungiif the openstack release team is interested in doing it, some similar job could be added to their release-post pipeline i guess13:23
zigoThat'd be wonderful for me!13:59
zigofungi: Who should I ask?14:00
fungithey reside in the #openstack-release channel, though they're a bit busy for the next few weeks14:00
clarkbcorvus: fungi ykarel I'm going to move the discussion here. tl;dr is that we still have multiple interfaces on at least rax flex sjc3 nodes. As far as I can tell the clouds.yaml updated at 17:05 september 19 to remove the extra network config (it isn't there now and that is the file modified timestamp) and we restarted zuul launcher processes sometime september 20 as part of server14:28
clarkbupdates and reboots14:28
clarkbwe "know" that removing the zuul launcher network config from zuul-providers and adding it to clouds.yaml seems to work14:29
clarkbso we could "revert" back to that but that is an undesirable state in terms of ease of management14:29
clarkb(requires restarting launchers to apply those updates)14:29
clarkbI wonder what would happen if we simply remove the network config from zuul launcher and let openstacksdk try to figure it out (though I suspect it will go back to complaining about multiple networks)14:30
corvusclarkb: there were two settings in the clouds.yaml that aren't in the launcher config, do you think they are important?14:30
clarkbcorvus: after my read of the sdk code I didn't think so anymore, but maybe they are. specifically default_interface shouldn't matter if we only ask for one interface. Similar with nat destination that should also default to the lone interface if we have only one14:31
corvusclarkb: i only have a weak preference for having the config in zuul, all things being equal.  since they aren't, then i think it's fine to revert to the clouds.yaml state.14:36
corvusif you decide you want to pull on the thread and see if we can make it work the same way in zuul, i think it'd be worth performing some "manual" (ie, via a test script) launches of nodes with the extra arguments to verify that really makes a difference before we consider updating zuul to add those args.14:36
ykarelclarkb, ack i had created bug to track this https://bugs.launchpad.net/devstack/?orderby=-id&start=0 , something wrong in launchpad now as can't access the created bug https://bugs.launchpad.net/devstack/+bug/212541214:36
clarkbcorvus: do you think it is worth trying the quick experiment of removing the network config from zuul-provider to see if sdk figures it out automatically? I guess my two hunches are either the config provider is supplying is incorrect or sdk is merging some discovered config with our provided config and that results in two interfaces14:37
corvusyes, the most likely outcomes are 1) no change; 2) fails to launch completely; 3) works.  all of those seem safe.14:39
clarkbok let me push that up now14:39
opendevreviewClark Boylan proposed opendev/zuul-providers master: Remove raxflex networks config  https://review.opendev.org/c/opendev/zuul-providers/+/96198814:41
clarkbmy early morning network troubles were particularly bad today so I restarted my ONT. Perhaps unsurprisingly everything seems happy afterwards. It is interesting that they build a battery backup into the ONT but then don't give you a reset/reboot/restart button so you have to do surgery to actually power the thing down14:46
clarkbwith local network sorted and raxflex networking investigated I'm going to pause here and resume normal morning activities back in a bit14:47
fungiit's nice to see bindep and dib pkg-map covered as prior art in https://discuss.python.org/t/pep-804-an-external-dependency-registry-and-name-mapping-mechanism/10389115:22
fungii expect we'd extend bindep to support reading from pyproject.toml once pep 725 and 804 get nailed down15:23
opendevreviewMerged opendev/zuul-providers master: Remove raxflex networks config  https://review.opendev.org/c/opendev/zuul-providers/+/96198815:40
clarkbI'm back. I've self approved 961988. Once that is in I'll try to determine what node boots look like and decide if clouds.yaml needs network config and the launchers need restarts15:40
clarkblanded faster than I can type15:41
corvusretro +215:46
clarkbok 'Multiple possible networks found' has started being logged for iad3 and dfw315:48
clarkband sjc315:49
clarkbthis was expected but I wanted to confirm that we would get this behavior15:49
fungiso basically we're back to where we started out on wednesday last week15:50
clarkbfungi: sort of. When we got the issue previously we had the networks config in place. This is why I suspected some bad caching maybe15:51
clarkbwith no networks config in place I think this is the expected result and everything is working correctly. Previously it was not an expected result. But I wanted to check those assumptions15:51
opendevreviewClark Boylan proposed opendev/system-config master: Reapply "Select the network to use in raxflex"  https://review.opendev.org/c/opendev/system-config/+/96199515:53
clarkbI think we can go back to ^ and I can restart the launchers as we found that did seem to work last week15:53
clarkband then we can try to reproduce and debug outside of the production deployment for further updates15:54
fungii guess it "worked" in that it was able to create servers but they were coming up with more than one interface, presumably due to also setting the network in the zuul providers?15:55
fungior did you also try with just the clouds.yaml network section and not specifying it in the provider config on friday?15:56
clarkbbefore the restart the weekend prior everything seemed to work with networks only configured in zuul-providers. We didn't check that we had a single interface at that time but considering we had no complaints about errors it seems likely there was only one interface. Then after the restart the weekend prior to last we started getting 'Multiple possible networks found' errors. My15:59
clarkbhunch is that we cached some bad data on startup after that restart. Then mid week I noticed zuul was scheduling nodes very slowly and the queues were piling up. One of the issues detected then was the multiple networks found issue. I (naively) thought oh that means we just need to configure the networks in clouds.yaml as that is what we have done in the past for other clouds and so15:59
clarkbwe added that and restarted the launchers. At that point nodes started coming up with two interfaces instead of one. We dropped the zuul-provider networks config and started getting a single interface. So then we thought maybe the issue was the double configuration between zuul-providers and clouds.yaml so we removed the clouds.yaml config and added the zuul-provider config back and15:59
clarkblet the weekend reboot reset things for us. Except after that restart we were back to two interfaces until just now when I dropped the network config out of zuul-providers (it was already out of clouds.yaml) and now we have multiple possible networks found errors15:59
clarkbwe don't know why we are getting two interfaces when nodes don't fail to boot. But we know that configuring zuul-providers with no networks and configuring clouds.yaml with networks instead seems to produce a single interface16:00
fungiokay, got it, so you did test that specific combo already then16:02
clarkbyes, but we wanted to switch back to a zuul-providers only config if possible as its config updates apply immediately. Clouds.yaml requires a service restart16:02
clarkbunfortunately, it appears that that double nic issue is related to having networks configured in zuul-providers so we'll need to run that down16:03
fungiat least until someone can figure out why it's doing that16:03
clarkbya I think we should go back to what we had working last week and then restart the debugging process from there16:04
clarkbfungi: did you want to weigh in on https://review.opendev.org/c/opendev/system-config/+/961995 ? It has passed check tests so I plan to approve it in the next few minutes if not16:22
fungiapproved, thanks!16:23
clarkbopenmetal emailed frickler and me asking if we had any feedback for their cloud setup. I'm making a note that the grafana graph moved due to zuul launcher replacing nodepool and explaining how openmetal picked up the slack last week when other regions were not working (this was the multiple networks found issue and the leaked node cleanup not working problems) and from that I think16:29
clarkbeverything is working well16:29
clarkbdid anyone else have thoughts/feedback I should give them cc infra-root16:29
fungii didn't16:34
opendevreviewMerged opendev/system-config master: Reapply "Select the network to use in raxflex"  https://review.opendev.org/c/opendev/system-config/+/96199516:47
clarkboh right we don't update zuul when updating that file. The hourlies will take care of that though so once those complete in about 20 minutes I can restart the launchers16:48
clarkbcorvus: ^ just do one at a time and give each a moment to startup before proceeding with the next right?16:48
corvusclarkb: yes, that will allow failover; but also, if you just want to restart both at once, that's fine too.  that just means a momentary pause in request handling that no one will likely notice.16:57
clarkbgot it thank you for confirming16:58
clarkbzuul launchers have been restarted17:13
clarkbwe have nodes in iad3 and sjc3 (not dfw3 yet) and they all appear to have one interface again as expected17:19
clarkbwhat a fun mystery17:19
clarkbunfortunately I think I may need to delay digging into it further as I have a bunch of things I need to work through today that are a bit more time/deadline sensitive17:19
clarkband now there are dfw3 nodes and they look good too17:24
clarkbykarel: so at this point I think the issue is worked around. We still don't understand it, but your jobs shouldn't be affected as long as we're using this workaround17:24
clarkbI've made an initial pass at meeting agenda updates for tomorrow's meeting. Let me know if any content is missing, needs to be added, or just generally needs editing18:01
clarkbI'm going to pop out for a bike ride while I have the opportunity19:24
clarkbbut when I get back I may spend a quick moment looking at the zuul launcher code to see if anything stands out to me and/or try to start on a script to reproduce the issue. Problem is I really need to start on my summit presentation stuff so may not get into the launcher stuff today19:26
clarkbI've been using reveal.js lately for presentation stuff. Curious if anyone else has other tools they like. I know corvus wrote presentty.22:29
clarkbok last call on meeting agenda items. I'm going to get that sent out in the next little bit22:34
fungifor my talk for oid-na 2024, indiana university required slides in powerpoint. i used pandoc's pptx plugin with reveal.js: http://fungi.yuggoth.org/presentations/2024oidna/22:45
fungiit didn't work great, but it did at least work (and i could point attendees to the html version)22:46
fungialso used pandoc's beamer plugin to make a pdf version which was marginally more tolerable than the powerpoint formatted one22:47
clarkbI found the reveal.js pdf output was pretty good last I used it22:56
fungiyeah, the organizers insisted submissions had to be in powerpoint format, but then we uploaded them to a system that converted them on the fly to pdf anyway?!? *boggle*22:58
fungiso i think i eventually ended up giving them the beamer pdf render because the pandoc pptx output wasn't converting to pdf well in their backend22:59
clarkbmeeting agenda should land in inbox momentarily. Using lists to pull up the archive continues to be much quicker than before the data location move23:00
fungiyeah, i think we can go ahead and delete the old copy of the data at this point23:00
clarkbI've left the lists stuff on the agenda just as a chance to call out the work was done and to let us know if new problems arise23:01
clarkbbut I intend on taking it out of next week's meeting agenda23:01
fungisgtm23:01
