Monday, 2025-09-22

zigo	Hi there!	07:04
zigo	I've completely rewritten my package status thingy over here: https://osbpo.debian.net/deb-status/	07:04
zigo	Now it's completely event-driven, as in, when I do a "git push" on a package, that package gets rebuilt by one of my jenkins, which then triggers a webhook.	07:04
zigo	When that webhook is triggered, osbpo does an rsync of packages, recalculates the indices for the matching release, and then recalculates the version report.	07:04
zigo	At the moment, I've set up a cron job to pull from the release team's repo at https://opendev.org/openstack/releases	07:04
zigo	I wonder: would it be possible to send me a trigger, so I could git pull, and then recalculate ONLY for the matching OpenStack release? Event driven is so much nicer than a cron.	07:04
zigo	FYI, I'm using "webhook" from https://github.com/adnanh/webhook (in fact, the Debian package in version 2.8.0), so it does HMAC-SHA256 checking of a signature, which is easy to do with a small shell script and curl.	07:04
*** liuxie is now known as liushy		11:51
fungi	zigo: we have jobs that call webhook endpoints, e.g. https://zuul.opendev.org/t/openstack/job/trigger-readthedocs-webhook and https://zuul.opendev.org/t/openstack/job/kolla-copr-erlang-update	13:22
fungi	if the openstack release team is interested in doing it, some similar job could be added to their release-post pipeline i guess	13:23
zigo	That'd be wonderful for me!	13:59
zigo	fungi: Who should I ask?	14:00
fungi	they reside in the #openstack-release channel, though they're a bit busy for the next few weeks	14:00
clarkb	corvus: fungi ykarel I'm going to move the discussion here. tl;dr is that we still have multiple interfaces on at least rax flex sjc3 nodes. As far as I can tell the clouds.yaml was updated at 17:05 september 19 to remove the extra network config (it isn't there now and that is the file modified timestamp) and we restarted zuul launcher processes sometime september 20 as part of server	14:28
clarkb	updates and reboots	14:28
clarkb	we "know" that removing the zuul launcher network config from zuul-providers and adding it to clouds.yaml seems to work	14:29
clarkb	so we could "revert" back to that but that is an undesirable state in terms of ease of management	14:29
clarkb	(requires restarting launchers to apply those updates)	14:29
clarkb	I wonder what would happen if we simply remove the network config from zuul launcher and let openstacksdk try to figure it out (though I suspect it will go back to complaining about multiple networks)	14:30
corvus	clarkb: there were two settings in the clouds.yaml that aren't in the launcher config, do you think they are important?	14:30
clarkb	corvus: after my read of the sdk code I didn't think so anymore, but maybe they are. specifically default_interface shouldn't matter if we only ask for one interface. Similar with nat destination that should also default to the lone interface if we have only one	14:31
corvus	clarkb: i only have a weak preference for having the config in zuul, all things being equal. since they aren't, then i think it's fine to revert to the clouds.yaml state.	14:36
corvus	if you decide you want to pull on the thread and see if we can make it work the same way in zuul, i think it'd be worth performing some "manual" (ie, via a test script) launches of nodes with the extra arguments to verify that really makes a difference before we consider updating zuul to add those args.	14:36
ykarel	clarkb, ack i had created a bug to track this https://bugs.launchpad.net/devstack/?orderby=-id&start=0 , something is wrong in launchpad now as i can't access the created bug https://bugs.launchpad.net/devstack/+bug/2125412	14:36
clarkb	corvus: do you think it is worth trying the quick experiment of removing the network config from zuul-provider to see if sdk figures it out automatically? I guess my two hunches are either the config the provider is supplying is incorrect or sdk is merging some discovered config with our provided config and that results in two interfaces	14:37
corvus	yes, the most likely outcomes are 1) no change; 2) fails to launch completely; 3) works. all of those seem safe.	14:39
clarkb	ok let me push that up now	14:39
opendevreview	Clark Boylan proposed opendev/zuul-providers master: Remove raxflex networks config https://review.opendev.org/c/opendev/zuul-providers/+/961988	14:41
clarkb	my early morning network troubles were particularly bad today so I restarted my ONT. Perhaps unsurprisingly everything seems happy afterwards. It is interesting that they build a battery backup into the ONT but then don't give you a reset/reboot/restart button so you have to do surgery to actually power the thing down	14:46
clarkb	with local network sorted and raxflex networking investigated I'm going to pause here and resume normal morning activities back in a bit	14:47
fungi	it's nice to see bindep and dib pkg-map covered as prior art in https://discuss.python.org/t/pep-804-an-external-dependency-registry-and-name-mapping-mechanism/103891	15:22
fungi	i expect we'd extend bindep to support reading from pyproject.toml once pep 725 and 804 get nailed down	15:23
opendevreview	Merged opendev/zuul-providers master: Remove raxflex networks config https://review.opendev.org/c/opendev/zuul-providers/+/961988	15:40
clarkb	I'm back. I've self approved 961988. Once that is in I'll try to determine what node boots look like and decide if clouds.yaml needs network config and launchers need restarts	15:40
clarkb	landed faster than I can type	15:41
corvus	retro +2	15:46
clarkb	ok 'Multiple possible networks found' has started being logged for iad3 and dfw3	15:48
clarkb	and sjc3	15:49
clarkb	this was expected but I wanted to confirm that we would get this behavior	15:49
fungi	so basically we're back to where we started out on wednesday last week	15:50
clarkb	fungi: sort of. When we got the issue previously we had the networks config in place. This is why I suspected some bad caching maybe	15:51
clarkb	with no networks config in place I think this is the expected result and everything is working correctly. Previously it was not an expected result. But I wanted to check those assumptions	15:51
opendevreview	Clark Boylan proposed opendev/system-config master: Reapply "Select the network to use in raxflex" https://review.opendev.org/c/opendev/system-config/+/961995	15:53
clarkb	I think we can go back to ^ and I can restart the launchers as we found that did seem to work last week	15:53
clarkb	and then we can try to reproduce and debug outside of the production deployment for further updates	15:54
fungi	i guess it "worked" in that it was able to create servers but they were coming up with more than one interface, presumably due to also setting the network in the zuul providers?	15:55
fungi	or did you also try with just the clouds.yaml network section and not specifying it in the provider config on friday?	15:56
clarkb	before the restart the weekend prior everything seemed to work with networks only configured in zuul-providers. We didn't check that we had a single interface at that time but considering we had no complaints about errors it seems likely there was only one interface. Then after the restart the weekend prior to last we started getting 'Multiple possible networks found' errors. My	15:59
clarkb	hunch is that we cached some bad data on startup after that restart. Then mid week I noticed zuul was scheduling nodes very slowly and the queues were piling up. One of the issues detected then was the multiple networks found issue. I (naively) thought oh that means we just need to configure the networks in clouds.yaml as that is what we have done in the past for other clouds and so	15:59
clarkb	we added that and restarted the launchers. At that point nodes started coming up with two interfaces instead of one. We dropped the zuul-provider networks config and started getting a single interface. So then we thought maybe the issue was the double configuration between zuul-providers and clouds.yaml so we removed the clouds.yaml config and added the zuul-provider config back and	15:59
clarkb	let the weekend reboot reset things for us. Except after that restart we were back to two interfaces until just now when I dropped the network config out of zuul-providers (it was already out of clouds.yaml) and now we have multiple possible networks found errors	15:59
clarkb	we don't know why we are getting two interfaces when nodes don't fail to boot. But we know that configuring zuul-providers with no networks and configuring clouds.yaml with networks instead seems to produce a single interface	16:00
fungi	okay, got it, so you did test that specific combo already then	16:02
clarkb	yes, but we wanted to switch back to a zuul-providers only config if possible as its config updates apply immediately. Clouds.yaml requires a service restart	16:02
clarkb	unfortunately, it appears that the double nic issue is related to having networks configured in zuul-providers so we'll need to run that down	16:03
fungi	at least until someone can figure out why it's doing that	16:03
clarkb	ya I think we should go back to what we had working last week and then restart the debugging process from there	16:04
clarkb	fungi: did you want to weigh in on https://review.opendev.org/c/opendev/system-config/+/961995 ? It has passed check tests so I plan to approve it in the next few minutes if not	16:22
fungi	approved, thanks!	16:23
clarkb	openmetal emailed frickler and me asking if we had any feedback for their cloud setup. I'm making a note that the grafana graph moved due to zuul launcher replacing nodepool and explaining how openmetal picked up the slack last week when other regions were not working (this was the multiple networks found issue and the leaked node cleanup not working problems) and from that I think	16:29
clarkb	everything is working well	16:29
clarkb	did anyone else have thoughts/feedback I should give them cc infra-root	16:29
fungi	i didn't	16:34
opendevreview	Merged opendev/system-config master: Reapply "Select the network to use in raxflex" https://review.opendev.org/c/opendev/system-config/+/961995	16:47
clarkb	oh right we don't update zuul when updating that file. The hourlies will take care of that though so once those complete in about 20 minutes I can restart the launchers	16:48
clarkb	corvus: ^ just do one at a time and give each a moment to startup before proceeding with the next right?	16:48
corvus	clarkb: yes, that will allow failover; but also, if you just want to restart both at once, that's fine too. that just means a momentary pause in request handling that no one will likely notice.	16:57
clarkb	got it thank you for confirming	16:58
clarkb	zuul launchers have been restarted	17:13
clarkb	we have nodes in iad3 and sjc3 (not dfw3 yet) and they all appear to have one interface again as expected	17:19
clarkb	what a fun mystery	17:19
clarkb	unfortunately I think I may need to delay digging into it further as I have a bunch of things I need to work through today that are a bit more time/deadline sensitive	17:19
clarkb	and now there are dfw3 nodes and they look good too	17:24
clarkb	ykarel: so at this point I think the issue is worked around. We still don't understand it, but your jobs shouldn't be affected as long as we're using this workaround	17:24
clarkb	I've made an initial pass at meeting agenda updates for tomorrow's meeting. Let me know if any content is missing, needs to be added, or just generally needs editing	18:01
clarkb	I'm going to pop out for a bike ride while I have the opportunity	19:24
clarkb	but when I get back I may spend a quick moment looking at the zuul launcher code to see if anything stands out to me and/or try to start on a script to reproduce the issue. Problem is I really need to start on my summit presentation stuff so may not get into the launcher stuff today	19:26
clarkb	I've been using reveal.js lately for presentation stuff. Curious if anyone else has other tools they like. I know corvus wrote presentty.	22:29
clarkb	ok last call on meeting agenda items. I'm going to get that sent out in the next little bit	22:34
fungi	for my talk for oid-na 2024, indiana university required slides in powerpoint. i used pandoc's pptx plugin with reveal.js: http://fungi.yuggoth.org/presentations/2024oidna/	22:45
fungi	it didn't work great, but it did at least work (and i could point attendees to the html version)	22:46
fungi	also used pandoc's beamer plugin to make a pdf version which was marginally more tolerable than the powerpoint formatted one	22:47
clarkb	I found the reveal.js pdf output was pretty good last I used it	22:56
fungi	yeah, the organizers insisted submissions had to be in powerpoint format, but then we uploaded them to a system that converted them on the fly to pdf anyway?!? boggle	22:58
fungi	so i think i eventually ended up giving them the beamer pdf render because the pandoc pptx output wasn't converting to pdf well in their backend	22:59
clarkb	meeting agenda should land in inbox momentarily. Using lists to pull up the archive continues to be much quicker than before the data location move	23:00
fungi	yeah, i think we can go ahead and delete the old copy of the data at this point	23:00
clarkb	I've left the lists stuff on the agenda just as a chance to call out the work was done and to let us know if new problems arise	23:01
clarkb	but I intend on taking it out of next week's meeting agenda	23:01
fungi	sgtm	23:01
