tkajinam | I think we can merge https://review.opendev.org/c/openstack/project-config/+/905820 now, because the governance change has already been merged | 08:45 |
tkajinam | and it'd be nice if https://review.opendev.org/c/openstack/project-config/+/912710 can be merged, too, to move the retirement process forward | 08:45 |
opendevreview | Takashi Kajinami proposed openstack/project-config master: Retire puppet-murano: End Project Gating https://review.opendev.org/c/openstack/project-config/+/913292 | 09:11 |
opendevreview | Merged openstack/project-config master: Retire puppet-ec2api: End Project Gating https://review.opendev.org/c/openstack/project-config/+/912710 | 09:12 |
opendevreview | Takashi Kajinami proposed openstack/project-config master: Retire puppet-murano: Remove Project from Infrastructure System https://review.opendev.org/c/openstack/project-config/+/913296 | 09:18 |
fungi | tkajinam: see comments on 905820, but i'm happy to merge once that's cleaned up | 12:47 |
opendevreview | Takashi Kajinami proposed openstack/project-config master: Retire heat-cfnclient: Remove Project from Infrastructure System https://review.opendev.org/c/openstack/project-config/+/905820 | 12:49 |
tkajinam | fungi, thanks ! (and thank you, frickler, too) | 12:49 |
tkajinam | fungi, I guess I can remove the groups key assuming it's for storyboard but lmk in case I have to restore it | 12:50 |
fungi | tkajinam: yeah, remove both. it was there for adding the repo to the corresponding project group in sb | 12:52 |
tkajinam | ok ! | 12:54 |
opendevreview | Merged openstack/project-config master: Retire puppet-murano: End Project Gating https://review.opendev.org/c/openstack/project-config/+/913292 | 13:02 |
opendevreview | Merged openstack/project-config master: Retire heat-cfnclient: Remove Project from Infrastructure System https://review.opendev.org/c/openstack/project-config/+/905820 | 13:34 |
Clark[m] | fungi: frickler: thoughts on removing centos 7 from base jobs and nodepool as planned today? I think fungi's efforts put us in a good spot to proceed with minimal impact | 14:52 |
fungi | i think we should do it today as we announced, yes | 14:52 |
Clark[m] | https://review.opendev.org/c/opendev/base-jobs/+/912786 is the next step then. If you want to review that I can remove my wip and approve once reviews are done | 14:53 |
fungi | and i liked the idea of capturing the config error json from tenants (or maybe just the openstack tenant? the others are small enough to work out by skimming) before we do | 14:53 |
Clark[m] | That way we can do a diff? I like that idea too | 14:54 |
fungi | yeah. it seems like zuul builds that json in a deterministic order, so diffing a yaml conversion of it is quite trivial | 14:54 |
fungi | when i merged the final devstack cleanup change, a diff of the error data cleanly pointed out three backports i'd accidentally not pushed to some old keystone branches, and once i did and merged those the diff was empty again | 14:55 |
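For reference, a minimal sketch of the snapshot-and-diff approach fungi describes, assuming the config-errors endpoint returns a JSON list and that PyYAML is installed; the filenames are only illustrative:

```python
#!/usr/bin/env python3
# Snapshot a tenant's config errors as YAML so two captures can be
# compared with plain diff (the order is deterministic, per fungi above).
import json
import urllib.request

import yaml

URL = "https://zuul.opendev.org/api/tenant/openstack/config-errors?limit=1000"


def snapshot(path):
    # Fetch the tenant's config errors and store them as YAML.
    with urllib.request.urlopen(URL) as resp:
        errors = json.load(resp)
    with open(path, "w") as fh:
        yaml.safe_dump(errors, fh, default_flow_style=False)
    return len(errors)


if __name__ == "__main__":
    print("captured", snapshot("config-errors-before.yaml"), "entries")
    # After the change merges, run again with a different filename and:
    #   diff -u config-errors-before.yaml config-errors-after.yaml
```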
fungi | just a sec and i'll review, then i'll grab a config error snapshot when we're approving | 14:56 |
Clark[m] | Thanks! | 15:01 |
clarkb | WIP is removed. I think you can go ahead and approve it when you review it (assuming you have no complaints with the change) | 15:11 |
clarkb | https://review.opendev.org/c/openstack/project-config/+/912787 and then that can go in once we're satisfied we don't have any crazy new config errors that need fixing first | 15:14 |
fungi | clarkb: okay, i've approved 912786 now | 15:37 |
fungi | snapshot of https://zuul.opendev.org/api/tenant/openstack/config-errors?limit=1000 grabbed (514 entries) | 15:39 |
clarkb | fungi: where is that snapshot? | 15:39 |
fungi | on my desktop | 15:42 |
fungi | it's nearly half a megabyte uncompressed so no way i can put it in a paste, but i can copy it to a server or something if needed | 15:44 |
clarkb | impressive :) | 15:44 |
fungi | i was just planning to put a yaml diff on paste (if it's compact enough) | 15:44 |
fungi | all errors mentioning centos-7 are either broken tripleo references, or various branches of openstack-ansible-functional-centos-7 which is already broken for other reasons | 15:46 |
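A rough sketch of the kind of scan fungi describes, grouping the captured errors that mention centos-7 by project and branch; it assumes each entry carries a source_context with project and branch keys, matching the fields discussed later in the log:

```python
#!/usr/bin/env python3
# Count config errors mentioning centos-7, grouped by project and branch.
import json
from collections import Counter

with open("config-errors-after.json") as fh:  # illustrative filename
    errors = json.load(fh)

hits = Counter()
for entry in errors:
    if "centos-7" in json.dumps(entry):
        ctx = entry.get("source_context") or {}
        hits[(ctx.get("project", "?"), ctx.get("branch", "?"))] += 1

for (project, branch), count in sorted(hits.items()):
    print(f"{count:4d}  {project}  {branch}")
```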
clarkb | I wonder if these phishes from "ovh" that hit the mailing list are a common attack sent to all mailing lists they can find, or if we're targeted because we have some dealings with ovh already | 15:50 |
opendevreview | Merged opendev/base-jobs master: Remove centos-7 nodeset https://review.opendev.org/c/opendev/base-jobs/+/912786 | 15:59 |
clarkb | there are 637 errors now | 16:02 |
fungi | every single mailing list is getting those, yes | 16:03 |
clarkb | fungi: looks like openstack/freezer is angry | 16:04 |
fungi | i'll work up the diff | 16:04 |
clarkb | but it was already sad about opensuse removal so not really a regression from that | 16:04 |
fungi | the diff is still ~95k | 16:05 |
clarkb | swift, freezer, and openstack-ansible-ops | 16:05 |
clarkb | fungi: I think there may be duplicates too | 16:06 |
clarkb | So maybe do the diff then sort and uniq the entries? | 16:06 |
fungi | basically looking through first to see if there's any impact to master or stable/2024.1 branches of active openstack deliverable repositories | 16:08 |
clarkb | fungi: I think freezer is, but the other two were only on older branches | 16:08 |
fungi | not found any yet but will take a bit | 16:08 |
clarkb | but not sure if freezer is part of active openstack release | 16:09 |
fungi | freezer is not active | 16:09 |
clarkb | ack | 16:09 |
fungi | it was officially declared inactive by the tc | 16:09 |
fungi | in large part because its zuul configuration was in a broken state for a prolonged period of time | 16:10 |
fungi | starlingx/zuul-jobs master branch is impacted, btw | 16:10 |
clarkb | yes I pushed a change to them when I did opensuse and it has been completely ignored | 16:11 |
clarkb | https://review.opendev.org/c/starlingx/zuul-jobs/+/909766 | 16:11 |
clarkb | I feel like I was helpful there and it is up to them to decide to accept the help | 16:12 |
fungi | and yeah, no references to 'branch: master' outside those two, no references at all to 'branch: stable\/2024\.1' | 16:12 |
clarkb | fungi: probably the biggest risk to the release then is swift, but we're reasonably confident that not having errors on the master and stable/2024.1 branches isolates us from problems? cc timburke | 16:13 |
clarkb | since openstack ansible ops isn't part of coordinated releasing as its a deployment trailing release repo I think | 16:13 |
fungi | i believe so, yes | 16:14 |
clarkb | in that case I think we can offer our help to timburke to bypass ci to merge cleanups if that is necessary and otherwise we can proceed with nodepool cleanup? or do we want to revert and clean up swift first? | 16:15 |
* clarkb will go do morning things while awaiting feedback on that | 16:16 |
fungi | this should only at worst prevent testing changes on the affected older stable branches, yes | 16:18 |
fungi | i think we can proceed with the nodepool removal | 16:18 |
fungi | i've rechecked it | 16:20 |
fungi | the varied diff context and interspersed escaped embedded newlines make analyzing the diff tough. if we really want a uniq'd breakdown of the 123 new errors we'll likely need to produce it by hand | 16:22 |
fungi | the actual errors are encoded as visually-formatted multi-line message strings rather than as structured data | 16:24 |
fungi | i could try to stream edit out the error messages with a smart enough pattern match, but that will take work too | 16:26 |
clarkb | fwiw I only skimmed for the centos-7 'nodeset not found' errors. I guess there could be newer transitive 'job not found' errors | 16:36 |
clarkb | maybe if you can extract any errors that are not nodeset not found that list will be smaller and potentially actionable? | 16:36 |
clarkb | but you said the master and stable/2024.1 branches don't show up so we're probably fine either way | 16:37 |
*** dmellado74522 is now known as dmellado | 16:41 | |
fungi | yeah, i need to step away for a few minutes, but can try to put together some sort of machine-assisted analysis of the diff shortly | 16:41 |
clarkb | there is one centos-7 node locked for deleting in nodepool. I suspect the problem is in the cloud though and deletion is failing | 16:42 |
clarkb | corvus: ^ do you think that will present a problem if we remove the centos-7 configuration from nodepool (no more labels, images, etc)? | 16:42 |
clarkb | I think we can manually clear out the record from zk and followup with the cloud later | 16:43 |
clarkb | I seem to recall node processing (even deletions) failing if the node and pool info is removed | 16:43 |
clarkb | the server was booted in january | 16:44 |
clarkb | its fault is {'message': 'MessagingTimeout', 'code': 500, 'created': '2024-01-21T03:26:57Z'} and it is in an error state | 16:44 |
clarkb | I'm going to try and manually issue a delete request (not that I expect a different result) | 16:44 |
clarkb | I think if we manually delete the record from zk then nodepool should create a leaked node record for it with less info that may avoid issues. However, I think I'm also happy to do the nodepool cleanup config and see if this creates any problems in the first place. But will defer to corvus on whether or not that is sane | 16:46 |
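A hedged sketch of the manual ZooKeeper cleanup clarkb describes as a fallback, assuming nodepool's usual /nodepool/nodes/<node-id> znode layout and using kazoo; the ZooKeeper hosts here are placeholders:

```python
#!/usr/bin/env python3
# Remove a stuck node record from nodepool's ZooKeeper tree so that
# nodepool treats the backing instance as a leaked server instead.
from kazoo.client import KazooClient

ZK_HOSTS = "zk01.example.org:2181"  # placeholder
NODE_ID = "0036475040"              # the stuck node mentioned above

zk = KazooClient(hosts=ZK_HOSTS)
zk.start()
path = f"/nodepool/nodes/{NODE_ID}"
if zk.exists(path):
    # recursive=True also removes any lock children under the node record
    zk.delete(path, recursive=True)
    print(f"deleted {path}")
else:
    print(f"{path} not found")
zk.stop()
```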
clarkb | My manual delete attempt has not created any change in the situation (as expected) | 16:55 |
corvus | clarkb: i think removing the image config should be fine as long as the provider still exists. i think as long as the zk record still exists, it will proceed like a normal node deletion, and if it doesn't exist it should proceed like a leaked instance delete | 16:55 |
clarkb | corvus: the provider will remain but not the pool config for that image/label in the provider | 16:55 |
corvus | clarkb: i think we don't actually create stub zk nodes anymore either btw (that doesn't change analysis of the situation -- just indicating that it means we fall into the leaked path in that case). | 16:56 |
clarkb | but sounds like we can proceed and see if nodepool has any problems and deal with them at that point if they occur | 16:56 |
corvus | clarkb: so i'm like 85% sure you should be able to proceed as you describe without problem, and if i'm wrong, i agree that a manual zk deletion is probably a good way out of that (or maybe if we see an error, something else will suggest itself) | 16:57 |
corvus | tldr: +1 delete label/image/etc | 16:57 |
clarkb | sounds like a plan | 16:57 |
corvus | fungi: remote: https://review.opendev.org/c/zuul/zuul/+/913434 Use NodesetNotFoundError class [NEW] | 17:10 |
corvus | i think that should make those errors a bit more filterable (not by nodeset name, but at least you should be able to filter by the class of error) | 17:11 |
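A speculative sketch of what filtering by error class might look like once those changes land; the "name" field and its value here are guesses for illustration, not anything confirmed in this log:

```python
#!/usr/bin/env python3
# Filter a config-errors snapshot down to one class of error, assuming a
# hypothetical machine-readable "name" field on each entry.
import json

with open("config-errors-after.json") as fh:  # illustrative filename
    errors = json.load(fh)

nodeset_errors = [
    e for e in errors
    if e.get("name") == "Nodeset Not Found"  # hypothetical field and value
]
print(len(nodeset_errors), "nodeset-not-found errors")
```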
corvus | also remote: https://review.opendev.org/c/zuul/zuul/+/913435 Use ProjectNotFoundError [NEW] | 17:25 |
fungi | okay, back and catching up | 17:41 |
fungi | corvus: ah, neat! yes those would help tremendously. as for filtering out the error strings, i'll probably resort to a python script to drop keys from the json rather than trying to do it all with simple command-line utilities | 17:43 |
clarkb | fungi: for https://review.opendev.org/c/openstack/project-config/+/912787 I'll let you approve when you're satisfied with your error deep dive since corvus seems to think the stuck deleting node shouldn't be a problem | 17:46 |
clarkb | I do have lunch plans with some friends today, but should be back at a reasonable time to debug any nodepool issues should those occur | 17:46 |
fungi | yeah, will do | 17:46 |
clarkb | corvus: that config error refactor change has import errors | 17:47 |
corvus | guess i should have run more than just flake8 on it :) | 17:48 |
corvus | it usually catches those | 17:48 |
fungi | okay, filtering out the error and short_error keys, the before/after diff is reduced to ~18k in size, probably still too big for paste.o.o | 17:55 |
fungi | ah, no, it fits! https://paste.opendev.org/show/bfRpuFOiI2OM6O77uPuV/ | 17:56 |
fungi | i suppose i could squeeze it down a bit more by dropping the source_context.path keys | 17:58 |
fungi | https://paste.opendev.org/show/b52SraP5g8re4jEB8IaC/ is without the source_context.path | 18:00 |
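A sketch of the key-stripping step fungi describes, dropping the free-form error and short_error messages (and optionally source_context.path) before converting to YAML for diffing; the filenames are illustrative, the key names are the ones mentioned in the log:

```python
#!/usr/bin/env python3
# Strip the visually-formatted message fields from a config-errors snapshot
# so a before/after diff only shows structural changes.
import json
import sys

import yaml

DROP_KEYS = ("error", "short_error")


def strip(path_in, path_out, drop_path=True):
    with open(path_in) as fh:
        errors = json.load(fh)
    for entry in errors:
        for key in DROP_KEYS:
            entry.pop(key, None)
        if drop_path:
            (entry.get("source_context") or {}).pop("path", None)
    with open(path_out, "w") as fh:
        yaml.safe_dump(errors, fh, default_flow_style=False)


if __name__ == "__main__":
    strip(sys.argv[1], sys.argv[2])
    # e.g.: strip before.json before.yaml; strip after.json after.yaml;
    #       diff -u before.yaml after.yaml
```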
frickler | I guess I should run the eom branch cleanup, lest someone tries to fix stable/vwx | 18:00 |
fungi | i already merged a bunch of backports to those earlier this week | 18:01 |
fungi | and yes, it would have been easier if they hadn't existed | 18:01 |
frickler | yes, but these should go into unmaintained/vwx now where those exist | 18:01 |
clarkb | fungi: looks like placement and tenks were things I missed when doing a manual scan. But neither are on master or stable/2024.1 so that still isn't an issue | 18:01 |
opendevreview | Merged openstack/project-config master: Remove centos-7 image uploads from Nodepool https://review.opendev.org/c/openstack/project-config/+/912787 | 18:17 |
clarkb | that config update is deploying nowish | 18:24 |
clarkb | the stuck node is in rax-dfw so if there are issues with deleting it we will see that on nl01 | 18:24 |
clarkb | deploy appears done | 18:32 |
clarkb | `grep 0036475040 /var/log/nodepool/nodepool-launcher.log` shows a new behavior: basically we hit https://opendev.org/zuul/nodepool/src/branch/master/nodepool/driver/utils.py#L360-L366 which has to do with quota calculation, then it says deletion should clean it up. Otherwise the exceptions deleting the node have continued the same before and after this update | 18:35 |
clarkb | all that to say I don't think this is going to be a problem | 18:35 |
clarkb | at least not any more than it would be if we kept centos-7 in the provider config | 18:35 |
clarkb | I'm going to pop out for that lunch momentarily. But will check in after to make sure all is still well. If so I think we can probably proceed with removing the disk image builds today as well? | 18:37 |
frickler | +1 | 18:46 |
fungi | wow, working on final contributor stats for openstack 2024.1 cycle and all those centos-7 removal patches shot me up to #13 by change count | 18:46 |
fungi | didn't realize there had been quite that many | 18:47 |
fungi | no, i was looking at the wrong file. #35 does seem more likely ;) | 18:51 |
frickler | 236 branches deleted, seems like ~100 fewer config errors https://paste.opendev.org/show/b8EXZ6co1dP8WrMl3J5I/ | 19:40 |
frickler | and with that I'm out for today and mostly for the weekend | 19:45 |
fungi | have a good weekend, thanks for the help! | 19:45 |
Clark[m] | fungi if you are still around can you recheck https://review.opendev.org/c/openstack/project-config/+/912788/ | 20:21 |
Clark[m] | Lunch should be winding down soon, but I realized that still has a config error -1 | 20:21 |
fungi | yep! | 20:25 |
fungi | it's passing now | 20:55 |
fungi | we can approve it when you get back | 20:55 |
clarkb | I'm back, sorry that took entirely too long, but the car said it was 70F, something we haven't had in months | 21:21 |
clarkb | I have approved it | 21:22 |
clarkb | I even turned off the hvac system. it's too bad it will only last a few days | 21:25 |
opendevreview | Merged openstack/project-config master: Remove centos-7 nodepool image builds https://review.opendev.org/c/openstack/project-config/+/912788 | 21:34 |
fungi | looks like it deployed | 21:46 |
fungi | no need to apologize! there are days when i've accidentally gone out for a 3-hour lunch | 21:46 |
clarkb | cool at this point we shouldn't have any new issues. We would've seen problems with the earlier cleanups | 21:46 |
clarkb | nb02 did leak an intermediate vhd conversion file for centos-7 from august. I've deleted it. That was the only file in /opt/nodepool_dib for centos-7 on either nb01 or nb02 | 22:04 |
opendevreview | Clark Boylan proposed opendev/system-config master: Stop mirroring CentOS 7 packages https://review.opendev.org/c/opendev/system-config/+/913453 | 22:47 |
opendevreview | Clark Boylan proposed opendev/system-config master: Cleanup opensuse mirroring configs entirely https://review.opendev.org/c/opendev/system-config/+/913454 | 22:47 |
clarkb | given the way buster went I think we wait on approving those until next week. But I wanted to get testing done for that (particularly the second change since it is a bit more involved) | 22:47 |
fungi | a reasonable precaution | 22:56 |
fungi | also i want to remember to tag a git-review release on monday | 22:57 |