| zigo | Hi there! | 07:04 |
|---|---|---|
| zigo | I've re-written completely my package status thingy over here: https://osbpo.debian.net/deb-status/ | 07:04 |
| zigo | Now it's completely event-driven, as in, when I do a "git push" on a package, that package gets rebuilt by one of my jenkins, which then triggers a webhook. | 07:04 |
| zigo | When that webhook is triggered, osbpo does an rsync of packages, recalculates the indices for the matching release, and then recalculates the version report. | 07:04 |
| zigo | At the moment, I've set up a cron job to pull from the release team's repo at https://opendev.org/openstack/releases | 07:04 |
| zigo | I wonder: would it be possible to send me a trigger, so I could git pull, and then recalculate *ONLY* for the matching OpenStack release? Event driven is so much nicer than a cron. | 07:04 |
| zigo | FYI, I'm using "webhook" from https://github.com/adnanh/webhook (in fact, the Debian package in version 2.8.0), so it does HMAC-SHA256 checking of a signature, which is easy to do with a small shell script and curl. | 07:04 |
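For illustration, a trigger like the one zigo describes could also be sent with a few lines of Python instead of a shell script and curl. This is only a sketch: the endpoint URL, shared secret, and signature header name below are placeholders, since adnanh/webhook's payload-hmac-sha256 rule takes the header name (and any "sha256=" prefix) from the hook definition.

```python
import hashlib
import hmac
import json
import urllib.request

# Hypothetical endpoint and secret; the real values would come from the
# webhook configuration on the receiving (osbpo) side.
WEBHOOK_URL = "https://osbpo.example.org/hooks/openstack-release"
SECRET = b"shared-secret"

# Example payload naming the OpenStack release to recalculate.
payload = json.dumps({"release": "caracal"}).encode()

# Hex HMAC-SHA256 of the request body, which is what the payload-hmac-sha256
# trigger rule verifies on the server side.
signature = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()

req = urllib.request.Request(
    WEBHOOK_URL,
    data=payload,
    headers={
        "Content-Type": "application/json",
        # Header name and "sha256=" prefix are assumptions; adjust to match
        # the hook definition.
        "X-Hub-Signature": "sha256=" + signature,
    },
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read().decode())
```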
| *** | liuxie is now known as liushy | 11:51 |
| fungi | zigo: we have jobs that call webhook endpoints, e.g. https://zuul.opendev.org/t/openstack/job/trigger-readthedocs-webhook and https://zuul.opendev.org/t/openstack/job/kolla-copr-erlang-update | 13:22 |
| fungi | if the openstack release team is interested in doing it, some similar job could be added to their release-post pipeline i guess | 13:23 |
| zigo | That'd be wonderful for me! | 13:59 |
| zigo | fungi: Who should I ask? | 14:00 |
| fungi | they reside in the #openstack-release channel, though they're a bit busy for the next few weeks | 14:00 |
| clarkb | corvus: fungi ykarel I'm going to move the discussion here. tl;dr is that we still have multiple interfaces on at least rax flex sjc3 nodes. As far as I can tell the clouds.yaml was updated at 17:05 on september 19 to remove the extra network config (it isn't there now and that is the file modified timestamp) and we restarted zuul launcher processes sometime september 20 as part of server | 14:28 |
| clarkb | updates and reboots | 14:28 |
| clarkb | we "know" that removing the zuul launcher network config from zuul-providers and adding it to clouds.yaml seems to work | 14:29 |
| clarkb | so we could "revert" back to that but that is an undesirable state in terms of ease of management | 14:29 |
| clarkb | (requires restarting launchers to apply those updates) | 14:29 |
| clarkb | I wonder what would happen if we simply remove the network config from zuul launcher and let openstacksdk try to figure it out (though I suspect it will go back to complaining about multiple networks) | 14:30 |
| corvus | clarkb: there were two settings in the clouds.yaml that aren't in the launcher config, do you think they are important? | 14:30 |
| clarkb | corvus: after my read of the sdk code I didn't think so anymore, but maybe they are. specifically default_interface shouldn't matter if we only ask for one interface. Similar with nat destination, which should also default to the lone interface if we have only one | 14:31 |
| corvus | clarkb: i only have a weak preference for having the config in zuul, all things being equal. since they aren't, then i think it's fine to revert to the clouds.yaml state. | 14:36 |
| corvus | if you decide you want to pull on the thread and see if we can make it work the same way in zuul, i think it'd be worth performing some "manual" (ie, via a test script) launches of nodes with the extra arguments to verify that really makes a difference before we consider updating zuul to add those args. | 14:36 |
| ykarel | clarkb, ack, i had created a bug to track this https://bugs.launchpad.net/devstack/?orderby=-id&start=0 , but something is wrong in launchpad now as i can't access the created bug https://bugs.launchpad.net/devstack/+bug/2125412 | 14:36 |
| clarkb | corvus: do you think it is worth trying the quick experiment of removing the network config from zuul-provider to see if sdk figures it out automatically? I guess my two hunches are either the config the provider is supplying is incorrect or sdk is merging some discovered config with our provided config and that results in two interfaces | 14:37 |
| corvus | yes, the most likely outcomes are 1) no change; 2) fails to launch completely; 3) works. all of those seem safe. | 14:39 |
| clarkb | ok let me push that up now | 14:39 |
| opendevreview | Clark Boylan proposed opendev/zuul-providers master: Remove raxflex networks config https://review.opendev.org/c/opendev/zuul-providers/+/961988 | 14:41 |
| clarkb | my early morning network troubles were particularly bad today so I restarted my ONT. Perhaps unsurprisingly everything seems happy afterwards. It is interesting that they build a battery backup into the ONT but then don't give you a reset/reboot/restart button so you have to do surgery to actually power the thing down | 14:46 |
| clarkb | with local network sorted and raxflex networking investigated I'm going to pause here and resume normal morning activities back in a bit | 14:47 |
| fungi | it's nice to see bindep and dib pkg-map covered as prior art in https://discuss.python.org/t/pep-804-an-external-dependency-registry-and-name-mapping-mechanism/103891 | 15:22 |
| fungi | i expect we'd extend bindep to support reading from pyproject.toml once pep 725 and 804 get nailed down | 15:23 |
| opendevreview | Merged opendev/zuul-providers master: Remove raxflex networks config https://review.opendev.org/c/opendev/zuul-providers/+/961988 | 15:40 |
| clarkb | I'm back. I've self-approved 961988. Once that is in I'll try to determine what node boots look like and decide if clouds.yaml needs network config and the launchers need restarts | 15:40 |
| clarkb | landed faster than I can type | 15:41 |
| corvus | retro +2 | 15:46 |
| clarkb | ok 'Multiple possible networks found' has started being logged for iad3 and dfw3 | 15:48 |
| clarkb | and sjc3 | 15:49 |
| clarkb | this was expected but I wanted to confirm that we would get this behavior | 15:49 |
| fungi | so basically we're back to where we started out on wednesday last week | 15:50 |
| clarkb | fungi: sort of. When we got the issue previously we had the networks config in place. This is why I suspected some bad caching maybe | 15:51 |
| clarkb | with no networks config in place I think this is the expected result and everything is working correctly. Previously it was not an expected result. But I wanted to check those assumptions | 15:51 |
| opendevreview | Clark Boylan proposed opendev/system-config master: Reapply "Select the network to use in raxflex" https://review.opendev.org/c/opendev/system-config/+/961995 | 15:53 |
| clarkb | I think we can go back to ^ and I can restart the launchers as we found that did seem to work last week | 15:53 |
| clarkb | and then we can try to reproduce and debug outside of the production deployment for further updates | 15:54 |
| fungi | i guess it "worked" in that it was able to create servers but they were coming up with more than one interface, presumably due to also setting the network in the zuul providers? | 15:55 |
| fungi | or did you also try with just the clouds.yaml network section and not specifying it in the provider config on friday? | 15:56 |
| clarkb | before the restart the weekend prior everything seemed to work with networks only configured in zuul-providers. We didn't check that we had a single interface at that time but considering we had no complaints about errors it seems likely there was only one interface. Then after the restart the weekend prior to last we started getting 'Multiple possible networks found' errors. My | 15:59 |
| clarkb | hunch is that we cached some bad data on startup after that restart. Then mid week I noticed zuul was scheduling nodes very slowly and the queues were piling up. One of the issues detected then was the multiple networks found issue. I (naively) thought oh that means we just need to configure the networks in clouds.yaml as that is what we have done in the past for other clouds and so | 15:59 |
| clarkb | we added that and restarted the launchers. At that point nodes started coming up with two interfaces instead of one. We dropped the zuul-provider networks config and started getting a single interface. So then we thought maybe the issue was the double configuration between zuul-providers and clouds.yaml so we removed the clouds.yaml config and added the zuul-provider config back and | 15:59 |
| clarkb | let the weekend reboot reset things for us. Except after that restart we were back to two interfaces until just now when I dropped the network config out of zuul-providers (it was already out of clouds.yaml) and now we have multiple possible networks found errors | 15:59 |
| clarkb | we don't know why we are getting two interfaces when nodes don't fail to boot. But we know that configuring zuul-providers with no networks and configuring clouds.yaml with networks instead seems to produce a single interface | 16:00 |
| fungi | okay, got it, so you did test that specific combo already then | 16:02 |
| clarkb | yes, but we wanted to switch back to a zuul-providers only config if possible as its config updates apply immediately. Clouds.yaml requires a service restart | 16:02 |
| clarkb | unfortunately, it appears that the double nic issue is related to having networks configured in zuul-providers so we'll need to run that down | 16:03 |
| fungi | at least until someone can figure out why it's doing that | 16:03 |
| clarkb | ya I think we should go back to what we had working last week and then restart the debugging process from there | 16:04 |
| clarkb | fungi: did you want to weigh in on https://review.opendev.org/c/opendev/system-config/+/961995 ? It has passed check tests so I plan to approve it in the next few minutes if not | 16:22 |
| fungi | approved, thanks! | 16:23 |
| clarkb | openmetal emailed frickler and me asking if we had any feedback for their cloud setup. I'm making a note that the grafana graph moved due to zuul launcher replacing nodepool and explaining how openmetal picked up the slack last week when other regions were not working (this was the multiple networks found issue and the leaked node cleanup not working problems) and from that I think | 16:29 |
| clarkb | everything is working well | 16:29 |
| clarkb | Did anyone else have thoughts/feedback I should give them? cc infra-root | 16:29 |
| fungi | i didn't | 16:34 |
| opendevreview | Merged opendev/system-config master: Reapply "Select the network to use in raxflex" https://review.opendev.org/c/opendev/system-config/+/961995 | 16:47 |
| clarkb | oh right we don't update zuul when updating that file. The hourlies will take care of that though so once those complete in about 20 minutes I can restart the launchers | 16:48 |
| clarkb | corvus: ^ just do one at a time and give each a moment to startup before proceeding with the next right? | 16:48 |
| corvus | clarkb: yes, that will allow failover; but also, if you just want to restart both at once, that's fine too. that just means a momentary pause in request handling that no one will likely notice. | 16:57 |
| clarkb | got it thank you for confirming | 16:58 |
| clarkb | zuul launchers have been restarted | 17:13 |
| clarkb | we have nodes in iad3 and sjc3 (not dfw3 yet) and they all appear to have one interface again as expected | 17:19 |
| clarkb | what a fun mystery | 17:19 |
| clarkb | unfortunately I think I may need to delay digging into it further as I have a bunch of things I need to work through today that are a bit more time/deadline sensitive | 17:19 |
| clarkb | and now there are dfw3 nodes and they look good too | 17:24 |
| clarkb | ykarel: so at this point I think the issue is worked around. We still don't understand it, but your jobs shouldn't be affected as long as we're using this workaround | 17:24 |
| clarkb | I've made an initial pass at meeting agenda updates for tomorrow's meeting. Let me know if any content is missing, needs to be added, or just generally needs editing | 18:01 |
| clarkb | I'm going to pop out for a bike ride while I have the opportunity | 19:24 |
| clarkb | but when I get back I may spend a quick moment looking at the zuul launcher code to see if anything stands out to me and/or try to start on a script to reproduce the issue. Problem is I really need to start on my summit presentation stuff so may not get into the launcher stuff today | 19:26 |
| clarkb | I've been using reveal.js lately for presentation stuff. Curious if anyone else has other tools they like. I know corvus wrote presentty. | 22:29 |
| clarkb | ok last call on meeting agenda items. I'm going to get that sent out in the next little bit | 22:34 |
| fungi | for my talk for oid-na 2024, indiana university required slides in powerpoint. i used pandoc's pptx plugin with reveal.js: http://fungi.yuggoth.org/presentations/2024oidna/ | 22:45 |
| fungi | it didn't work great, but it did at least work (and i could point attendees to the html version) | 22:46 |
| fungi | also used pandoc's beamer plugin to make a pdf version which was marginally more tolerable than the powerpoint formatted one | 22:47 |
| clarkb | I found the reveal.js pdf output was pretty good last I used it | 22:56 |
| fungi | yeah, the organizers insisted submissions had to be in powerpoint format, but then we uploaded them to a system that converted them on the fly to pdf anyway?!? *boggle* | 22:58 |
| fungi | so i think i eventually ended up giving them the beamer pdf render because the pandoc pptx output wasn't converting to pdf well in their backend | 22:59 |
| clarkb | meeting agenda should land in inbox momentarily. Using lists to pull up the archive continues to be much quicker than before the data location move | 23:00 |
| fungi | yeah, i think we can go ahead and delete the old copy of the data at this point | 23:00 |
| clarkb | I've left the lists stuff on the agenda just as a chance to call out the work was done and to let us know if new problems arise | 23:01 |
| clarkb | but I intend on taking it out of next weeks's meeting agenda | 23:01 |
| fungi | sgtm | 23:01 |