opendevreview | Ian Wienand proposed opendev/system-config master: letsencrypt: build txt record lists betterer https://review.opendev.org/c/opendev/system-config/+/865203 | 00:02 |
---|---|---|
opendevreview | Ian Wienand proposed opendev/system-config master: system-config-run-gitea: use standard bridge host https://review.opendev.org/c/opendev/system-config/+/865204 | 00:02 |
opendevreview | Ian Wienand proposed opendev/system-config master: [wip] Bump bridge ansible to 6.6.0 https://review.opendev.org/c/opendev/system-config/+/865195 | 00:02 |
ianw | system-config-run-gitea : " ERROR: Could not find a version that satisfies the requirement setuptools>=61.0 " ... what is this all about :/ | 00:07 |
fungi | which package is requiring that? | 00:08 |
clarkb | ianw: is that for the ansible 6 upgrade change? | 00:08 |
fungi | does newer ansible need bleeding edge setuptools? | 00:08 |
clarkb | its almost certainly going to be the module shebang stuff | 00:08 |
clarkb | fungi: the problem is that we did a thing where we used /usr/bin/env python or /usr/bin/env python3 in our module files but ansible blows up on that | 00:09 |
clarkb | and it tries to run the modules against different python and things break | 00:09 |
fungi | oh it's trying to run with python 2.7 which has a too-old setuptools maybe (or no setuptools at all) | 00:09 |
clarkb | in this case ansible is being run out of a virtualenv on bridge but then executing the module using system python and things are unhappy | 00:09 |
clarkb | well I think python3 in giteas case but yes | 00:10 |
fungi | oic | 00:10 |
ianw | no actually this was a -2 on https://review.opendev.org/c/opendev/system-config/+/864600/, just that little doc update | 00:10 |
clarkb | interesting it failed to install the node launcher | 00:11 |
ianw | maybe it is transient, i always get paranoid when setuptools/pip turn up in a trace | 00:12 |
clarkb | ianw: its bridge99 complaining | 00:13 |
clarkb | and it has python3.6 and appears to be bionic. | 00:13 |
clarkb | I wonder if the problem is simply an old node specification for bridge in that job? | 00:13 |
clarkb | (and the new python package for bridge requires modern python which isn't found on bionic) | 00:13 |
clarkb | ya bridge is bionic on that job | 00:14 |
ianw | yeah, i fixed that in https://review.opendev.org/c/opendev/system-config/+/865204 | 00:14 |
ianw | i wonder how it's been working ... | 00:14 |
clarkb | ianw: but that fix isn't before the change that failed | 00:15 |
clarkb | I suspect we just don't run system-config-run-gitea often enough to have hit that problem | 00:15 |
clarkb | https://review.opendev.org/c/opendev/system-config/+/865204/2 seems like something that shouldn't be in a chain of things and should be an independent fix? +2 anyway | 00:15 |
ianw | but interestingly it did pass in the check | 00:19 |
ianw | yeah, i agree, i can split it out now | 00:20 |
ianw | ansible 6 is picking up some list constructions that i have no idea how they used to work :) | 00:20 |
clarkb | ianw: I think we only just landed the launch env change | 00:21 |
clarkb | possible when the check pass ran the launch env didn't exist on bridge yet | 00:21 |
ianw | oh yeah, *that* is probably it | 00:23 |
ianw | that makes sense | 00:23 |
opendevreview | Ian Wienand proposed opendev/system-config master: system-config-run-gitea: use standard bridge host https://review.opendev.org/c/opendev/system-config/+/865204 | 00:24 |
opendevreview | Ian Wienand proposed opendev/system-config master: system-config-run-gitea: use standard bridge host https://review.opendev.org/c/opendev/system-config/+/865204 | 00:26 |
opendevreview | Ian Wienand proposed opendev/system-config master: borg-backup-server: build borg users betterer https://review.opendev.org/c/opendev/system-config/+/865202 | 00:26 |
opendevreview | Ian Wienand proposed opendev/system-config master: letsencrypt: build txt record lists betterer https://review.opendev.org/c/opendev/system-config/+/865203 | 00:26 |
opendevreview | Ian Wienand proposed opendev/system-config master: [wip] Bump bridge ansible to 6.6.0 https://review.opendev.org/c/opendev/system-config/+/865195 | 00:26 |
*** rlandy|rover|biab is now known as rlandy|rover | 00:33 | |
ianw | i seem to have written a lot of things that don't really evaluate to lists but seem to work :) | 00:59 |
fungi | you've mastered dwim programming | 01:00 |
ianw | https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/letsencrypt-request-certs/tasks/main.yaml#L30 in hindsight probably isn't the best way to do things | 01:05 |
*** rlandy|rover is now known as rlandy|out | 01:08 | |
*** tkajinam is now known as Guest2162 | 02:04 | |
opendevreview | Ian Wienand proposed opendev/system-config master: letsencrypt: build txt record lists betterer https://review.opendev.org/c/opendev/system-config/+/865203 | 02:30 |
opendevreview | Ian Wienand proposed opendev/system-config master: [wip] Bump bridge ansible to 6.6.0 https://review.opendev.org/c/opendev/system-config/+/865195 | 02:30 |
opendevreview | Ian Wienand proposed opendev/system-config master: letsencrypt-request-certs: refactor certcheck list https://review.opendev.org/c/opendev/system-config/+/865218 | 02:30 |
opendevreview | Merged opendev/system-config master: system-config-run-gitea: use standard bridge host https://review.opendev.org/c/opendev/system-config/+/865204 | 02:40 |
*** tkajinam is now known as Guest2174 | 03:28 | |
opendevreview | Ian Wienand proposed opendev/system-config master: [wip] Bump bridge ansible to 6.6.0 https://review.opendev.org/c/opendev/system-config/+/865195 | 04:02 |
opendevreview | Ian Wienand proposed opendev/system-config master: gitea-git-repos: remove #!/usr/bin/env python https://review.opendev.org/c/opendev/system-config/+/865224 | 04:02 |
*** yadnesh|away is now known as yadnesh | 04:03 | |
*** ysandeep|out is now known as ysandeep | 04:49 | |
*** ysandeep is now known as ysandeep|ruck | 04:49 | |
*** pojadhav- is now known as pojadhav | 05:03 | |
opendevreview | Merged opendev/system-config master: opendev.org: add status update links https://review.opendev.org/c/opendev/system-config/+/864600 | 05:24 |
*** tkajinam is now known as Guest2179 | 05:36 | |
StutiArya[m] | fungi: can you please share the link for #openstack-qa channel that you mentioned | 05:38 |
*** pojadhav- is now known as pojadhav | 05:57 | |
*** ysandeep__ is now known as ysandeep|ruck | 06:53 | |
opendevreview | Ian Wienand proposed opendev/system-config master: opendev.org: close <li> tag properly https://review.opendev.org/c/opendev/system-config/+/865233 | 07:19 |
*** yadnesh is now known as yadnesh|afk | 08:20 | |
*** jpena|off is now known as jpena | 08:22 | |
*** ysandeep|ruck is now known as ysandeep|ruck|brb | 08:37 | |
*** yadnesh|afk is now known as yadnesh | 08:51 | |
*** jpena is now known as jpena|off | 08:56 | |
*** jpena|off is now known as jpena | 08:57 | |
frickler | infra-root: do we have a policy to clean up unused groups in gerrit? or do we just ignore them? like https://review.opendev.org/c/openstack/project-config/+/814597/2 left kolla-cli-core unused. bit confusing while searching for active kolla groups IMO | 08:59 |
*** ysandeep|ruck|brb is now known as ysandeep|ruck | 09:24 | |
*** ysandeep|ruck is now known as ysandeep|ruck|brb | 10:53 | |
*** ysandeep|ruck|brb is now known as ysandeep|ruck | 11:09 | |
*** dviroel|afk is now known as dviroel | 11:20 | |
*** rlandy|out is now known as rlandy|rover | 11:23 | |
*** dviroel_ is now known as dviroel | 11:38 | |
*** dviroel_ is now known as dviroel | 12:16 | |
fungi | frickler: long ago i would empty the groups, then rename them with a prefix and then set them to hidden, but i haven't done so for a long time | 12:42 |
fungi | and it looks like our options for cleaning those up haven't really changed. gerrit still doesn't seem to have a mechanism for deleting groups once they're created | 12:52 |
fungi | probably we could run a (complex) query to generate a list of groups which are neither referenced from an acl directly nor transitively through inclusion in another group, and then bulk rename and hide them | 12:57 |
fungi | and i'm not finding any official gerrit plugins for group deletion either | 13:00 |
fungi | though i did happen across this one which has account deletion: https://gerrit.googlesource.com/plugins/account/ | 13:00 |
*** ysandeep|ruck is now known as ysandeep|ruck|afk | 13:07 | |
frickler | seems someone started some time ago but then gave up (see the linked review, too) https://www.gerritcodereview.com/design-docs/delete-groups-solution-plugin.html | 13:07 |
frickler | this is also related https://duckduckgo.com/l/?uddg=https%3A%2F%2Fwww.gerritcodereview.com%2Fdesign%2Ddocs%2Fdelete%2Dgroups%2Dconclusion.html&rut=56e19180877adfa2ce096f0eee57023cbfd501c57e808cc4734b42a415aa8cb5 | 13:07 |
frickler | meh, should've been without that bird wrapper | 13:08 |
frickler | https://www.gerritcodereview.com/design-docs/delete-groups-conclusion.html | 13:08 |
frickler | oh, nice, the email address parser in gerrit is broken, it makes some.one@gmail.com split into two http URLs separated by a normal "@" https://review.opendev.org/c/openstack/requirements/+/865030 | 13:26 |
fungi | infra-root: the new mailman3 import i started last night based on the latest fix resulted in no more django path warnings, and my cleanup on the production list footer fields seems to have eliminated those template conversion errors too. now the only errors are for two hidden/private lists which were configured not to create archives (and in both cases the error is that there was no archive to import) | 13:30 |
fungi | if anyone wants to poke around at the held node, it's 104.130.140.226 just be aware that if you want to test authenticated features like list configuration/moderation or altering subscription preferences, you'll need to "reset" your password and then fish the corresponding token out of a deferred message in exim's queue, since we're blocking outbound delivery from the test node | 13:31 |
fungi | if this looks good, we're probably clear to merge the topic:mailman3 changes (except for the "dnm" one on top of course) | 13:33 |
fungi | after i do some spot checks myself, i'm going to start working on booting the new production server and checking its addresses against blocklists (filing exception/removal requests as needed) | 13:34 |
*** ysandeep|ruck|afk is now known as ysandeep|ruck | 13:53 | |
*** dasm|off is now known as dasm | 13:55 | |
*** frenzy_friday is now known as frenzy_friday|doc | 14:43 | |
Clark[m] | fungi: should we squash the image change into the parent or maybe flip the order so that we build images first and then deploy them? | 15:09 |
*** pojadhav is now known as pojadhav|afk | 15:32 | |
fungi | Clark[m]: i'm still a little too fuzzy on the container building and consuming workflow to know which of those is preferable. i figured the current sequence allows us to isolate the forking effort so that it's theoretically easier to roll back off of later if we want to? | 15:35 |
Clark[m] | fungi: the main thing is updating the docker compose file and then removing the other bits. I agree having a separate commit simplifies things | 15:51 |
fungi | only thing odd of note is that the exim queue had a bunch of notices to subscribers that their subscriptions had been disabled due to bounces. i *think* it's because those subscribers were set to get digests, and exim was prevented from delivering them, so working as designed if so. but hard to be entirely certain of that | 16:03 |
fungi | if we configured iptables to drop instead of reject outbound smtp, then i'd still have the originals in the queue in order to say for sure | 16:04 |
fungi | oh, though maybe i can tell from the delivery log | 16:05 |
fungi | also i was wrong about the password reset workflow. it appears users are not precreated by the import script (or maybe that changed in a more recent version than we originally tested), so i needed to click "sign up" and create an account on one of the list sites and then confirm the account creation by following the link from the outbound message stuck in exim's queue | 16:07 |
fungi | but once i logged in with that new account, it had my existing subscriptions and owner/moderator status for various lists already linked | 16:08 |
fungi | and i was able to log into another list site with the same username/password after that | 16:08 |
fungi | confirming that account creation is still global across all the list sites (but login state is not, just for the record, so you still need to log into each site separately to make changes) | 16:09 |
*** yadnesh is now known as yadnesh|away | 16:12 | |
*** dviroel is now known as dviroel|lunch | 16:19 | |
*** dasm is now known as dasm|off | 16:23 | |
clarkb | fungi: ya I think at this point I'm happy to land the changes and start pushing towards a real host. Your testing has been fairly extensive and we've caught a fair bit of stuff but I expect we're about at the limit of what is reasonable for our testing to cover | 16:32 |
fungi | also we're running clean on the absolute latest release of mailman from a few weeks ago, which is quite exciting | 16:33 |
clarkb | I hesitate to +2 the changes since I wrote a fair bit of them. But I assume if ianw, corvus, and/or frickler don't object as a third vote we can proceed? | 16:34 |
fungi | yeah, i mean, they've been up for review for months and discussed ad nauseam, so if anyone was going to object to the implementation/design choices they've had ample opportunity to do so | 16:36 |
fungi | and we seemed to have consensus on overall direction for this effort | 16:36 |
fungi | as we're following a spec we collectively approved last year | 16:36 |
*** hjensas is now known as hjensas|afk | 16:57 | |
*** marios is now known as marios|out | 17:00 | |
*** dviroel|lunch is now known as dviroel | 17:00 | |
corvus | i say proceed :) | 17:00 |
fungi | thanks! | 17:01 |
*** ysandeep|ruck is now known as ysandeep|out | 17:04 | |
fungi | i went ahead and approved the two changes. they don't alter current production services anyway so there's still time to make further adjustments before initial import maintenance for the opendev and zuul list sites anyway | 17:06 |
*** jpena is now known as jpena|off | 17:33 | |
*** frenzy_friday|doc is now known as frenzy_friday | 17:36 | |
opendevreview | Merged opendev/system-config master: Add a mailman3 list server https://review.opendev.org/c/opendev/system-config/+/851248 | 18:00 |
fungi | once the other shoe drops, i'll start launching the server | 18:02 |
fungi | trying out ianw's shiny new launcher package | 18:02 |
fungi | looks like it's /usr/launcher-venv/bin/launch-node on bridge01 | 18:05 |
clarkb | the readme was updated as part of the change but ya that sounds right | 18:07 |
fungi | interestingly, the osc/sdk versions installed in that venv don't work with rackspace's volume api | 18:08 |
fungi | $ sudo /usr/launcher-venv/bin/openstack --os-cloud=openstackci-rax --os-region-name=DFW volume list | 18:08 |
fungi | No module named 'cinderclient.v2' | 18:08 |
fungi | though works with the versions of things used by ~fungi/foo/bin/openstack (just gives a deprecation warning for cinder v2) | 18:09 |
clarkb | I think we half expected we might need to adjust those | 18:09 |
fungi | my venv was populated with `pip install openstackclient 'python-cinderclient<8'` | 18:10 |
opendevreview | Merged opendev/system-config master: Fork the maxking/docker-mailman images https://review.opendev.org/c/opendev/system-config/+/860157 | 18:11 |
fungi | looks like my venv has python-openstackclient==6.0.0 vs 4.0.2 in the global one, but i have older python-cinderclient==7.4.1 instead of 9.1.0 | 18:13 |
clarkb | I think those versions came from my venv on bridge01 ~clarkb/oldoscenv and those worked. But looking in my history they may have only worked against server commands and not volume commands. I thought I had tested both | 18:14 |
clarkb | I'm happy to update to match yours if it works | 18:15 |
fungi | my venv only backdates the cinderclient version sufficiently to not reach v2 support removal and otherwise works with latest versions of anything pip is able to install with that | 18:16 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Improve launch-node dependency versions https://review.opendev.org/c/opendev/system-config/+/865320 | 18:25 |
fungi | clarkb: ianw: ^ | 18:25 |
fungi | ~fungi/foo has been rebuilt now installing the launch package with that patch applied, and i can openstack volume list from rax dfw just fine | 18:28 |
frickler | https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/857031 looks weird to me, why was that recheck needed? the devstack patch merged 3 days ago, I assumed zuul to trigger gate for projects in the same (integrated) queue automatically | 18:31 |
clarkb | you probably need to look at the zuul scheduler logs from the 19th | 18:34 |
clarkb | the way it should work is when the parent change enqueues it also enqueues the child | 18:34 |
fungi | even for cross-project dependencies? | 18:35 |
fungi | i thought it only knew to do that for git relationships | 18:35 |
clarkb | yes I think so. However, it may rely on the change cache and the child was approved on the 16th which was possibly long enough for it to lose that change in the cache | 18:36 |
fungi | looks like the change it depends-on was approved two days later | 18:37 |
clarkb | looking at zuul really quickly we seem to track this on the child side so ya if the child isn't active any longer then maybe we lose that info | 18:40 |
fungi | 857031 was approved 2022-11-16 13:05z, then the change it depends-on (860795) was approved 2022-11-18 10:06z and was immediately rejected because it in turn depends-on another change with which it doesn't share a change queue | 18:40 |
fungi | then someone blindly rechecked 860795 thinking zuul must be confused about that fact and it was immediately rejected again for the same reason | 18:41 |
clarkb | though there is getChangesDependingOn which does a Gerrit query | 18:41 |
fungi | it was approved again and enqueued into the gate at 2022-11-19 20:11, but 857031 did not get automatically enqueued at the same time (i'm still not 100% sure i've ever seen it add previously-approved depends-on changes into the queue but i probably haven't paid close attention) | 18:43 |
clarkb | ya I think it tries to. getChangesDependingOn is part of that | 18:44 |
fungi | also note that there was a zuul rolling upgrade between those, if that matters at all | 18:45 |
clarkb | from zuul01 2022-11-19 20:11:12,812 DEBUG zuul.Pipeline.vexxhost.gate: [e: e3203a0382ea4077a21d6c60c0d76cf2] No changes need <Change 0x7f006cb75720 openstack/devstack 860795,9 | 18:51 |
clarkb | er thats the wrong pipeline | 18:51 |
fungi | new problem with launch-node... | 18:53 |
fungi | openstack.exceptions.BadRequestException: BadRequestException: 400: Client Error for url: https://dfw.servers.api.rackspacecloud.com/v2/610275/servers, Bad networks format | 18:53 |
clarkb | fungi: that might be why I used older osc | 18:54 |
clarkb | we might need older of both things I guess | 18:54 |
fungi | nope, same with the current version in /usr/launcher-venv | 18:54 |
fungi | i tried both that venv and mine just to be sure | 18:54 |
fungi | sudo /usr/launcher-venv/bin/launch-node --cloud=openstackci-rax --region=DFW --flavor="8GB Standard Instance" lists01.opendev.org | 18:55 |
fungi | that's what i ran | 18:55 |
clarkb | frickler: https://paste.opendev.org/show/b6P1Uy2VjAMZOqtBbwyz/ that is zuul deciding that there is one tempest change following that devstack change but that it isn't ready to gate. Your change doesn't show up in that list | 18:56 |
clarkb | frickler: so zuul didn't find the change for one reason or another. Maybe the gerrit query didn't return it in the list for some reason? | 18:56 |
clarkb | it does seem to if I manually construct the gerrit query so not sure why that would happen | 18:57 |
clarkb | fungi: I think ianw and corvus ran into this too fwiw on old bridge and they sorted it out there so they may have input | 18:57 |
fungi | wondering if i need to pass some specific --network label | 18:58 |
clarkb | fungi: no, rax doesn't do networks | 18:59 |
clarkb | it's a client/sdk/something issue where it thinks it has to do that but in reality you don't iirc | 18:59 |
clarkb | frickler: the depends on string for the change it found is identical to the one in your change | 18:59 |
clarkb | https://review.opendev.org/q/message:%257BDepends-On:+https://review.opendev.org/c/openstack/devstack/%252B/860795%257D is the query it should run I think | 19:00 |
clarkb | corvus: ^ any idea why zuul didn't find https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/857031 when https://review.opendev.org/c/openstack/devstack/+/860795 was approved but did find https://review.opendev.org/c/openstack/tempest/+/861110/ per https://paste.opendev.org/show/b6P1Uy2VjAMZOqtBbwyz/ | 19:03 |
fungi | problem #1 trying with the same package versions as corvus-env on the old bastion, new bastion python appears to be incompatible with openstacksdk 0.41.0: | 19:27 |
fungi | AttributeError: module 'collections' has no attribute 'MutableMapping' | 19:27 |
clarkb | ya you need 0.99.0 or newer for some of those python fixes iirc. There is another one that impacts python 3.11 | 19:28 |
clarkb | er I guess the 3.11 issue was addressed in 0.99.0. Not sure about that one | 19:28 |
fungi | 0.51.0 seems to work | 19:31 |
clarkb | fungi: I think you need ~0.50.0 for that fix | 19:32 |
fungi | 0.50.0 did not | 19:32 |
fungi | anyway, if that works, i'll see how far forward i can wind it | 19:32 |
fungi | FileNotFoundError: [Errno 2] No such file or directory: '/var/cache/ansible/inventory' | 19:34 |
fungi | that does indeed not exist on bridge01 | 19:34 |
clarkb | fungi: launch node is simply removing cache entries | 19:35 |
clarkb | so that may be something we can ignore now? but if ansible caches elsewhere we would need to update the path | 19:36 |
fungi | well, "ignore" will need patching i think. at least it seems to imply that's why the command failed | 19:36 |
clarkb | yes you'd need to update the tool | 19:37 |
clarkb | I'm just pointing out that the code deletes contents of that dir so if the dir doesn't exist we're already winning :) | 19:37 |
fungi | yep | 19:38 |
fungi | i've wrapped that in a try/except for FileNotFoundError so seeing if it gets farther | 19:39 |
fungi | looks like launch-node is expecting to find ansible playbooks relative to but outside the virtualenv, or something like that | 19:50 |
fungi | ERROR! the playbook: /home/fungi/bar/lib/python3.10/site-packages/opendev_launch/../playbooks/set-hostnames.yaml could not be found | 19:50 |
clarkb | ya line 215 ish of launch node script | 19:50 |
clarkb | that should probably just be hardcoded to the zuul managed repo paths | 19:51 |
clarkb | the upside to relative paths in the past was it allowed you to make local edits | 19:51 |
fungi | or get embedded as data files | 19:51 |
clarkb | fungi: I'm never certain if that is safe because ansible looks content up relative to the playbook file path | 19:51 |
clarkb | fungi: I would be wary of doing that unless you vendor in an entire system-config | 19:51 |
fungi | oh, right, we have to consider not just python but also ansible there | 19:52 |
clarkb | can just default to the zuul path and then make it overrideable with a command line option if we find we need to do local edits of playbook | 19:52 |
fungi | as best as i can tell, this would break for /usr/launcher-venv/bin/launch-node as well | 19:53 |
clarkb | yes | 19:54 |
clarkb | thats why I suggest just setting it to /home/zuul/src/opendev.org/opendev/system-config/playbooks/set-hostnames.yaml and so on | 19:54 |
clarkb | and a flag to change the prefix would be a good addition too if we decide using different sets of playbooks is useful | 19:55 |
clarkb | another option would be to not install the script to the venv as a command | 19:55 |
clarkb | and instead curate a venv to run it out of but expect users to invoke the script out of the launch dir as before | 19:56 |
fungi | looks like it's probably getting a lot further now that i've embedded the full playbook path | 19:58 |
fungi | oh, yeah it made it past the reboot | 19:59 |
ianw | (thanks for looking at this, i figured it might not 100% work. sorry it's on my todo list to launch nodes once it merged, but i got sidetracked by the ansible update stuff) | 19:59 |
fungi | hey no sweat, you did the heavy lifting | 20:00 |
fungi | i'll add these tweaks to 865320 | 20:00 |
fungi | and try to work out the latest actually viable package set which can do openstack server create and openstack volume list | 20:01 |
fungi | for rackspace specifically | 20:01 |
clarkb | fungi: looking in /var/cache/ansible I think we may want to delete host specific facts | 20:02 |
clarkb | but that is only an issue if we reuse names which we do far less of today | 20:02 |
clarkb | so ya I think just not deleting things is fine | 20:02 |
fungi | we sort of reuse hostnames if we have to keep retrying to boot the same machine over and over to test the launch tooling ;) | 20:02 |
fungi | this launch is still progressing, so seems it's probably working. just need to work out a more exact formula for the package versions to install into the launch venv now | 20:15 |
opendevreview | Merged opendev/system-config master: opendev.org: close <li> tag properly https://review.opendev.org/c/opendev/system-config/+/865233 | 20:20 |
fungi | launching this server has already taken 30 minutes. hopefully it's close to done applying our configuration | 20:26 |
fungi | looks like it's outputting syslog/journald messages instead of ansible progress. just constant spam from systemd and dbus-daemon | 20:28 |
fungi | and multipathd | 20:28 |
fungi | this seems to be TASK [Run unattended-upgrade on debuntu] which has been going for nearly 25 minutes now | 20:29 |
fungi | also it's dawned on me that i didn't tell it to boot a jammy image so it's using focal | 20:30 |
fungi | just as well, i was expecting to test this at least once more after i settle on newer python package versions | 20:30 |
*** dasm|off is now known as dasm | 20:31 | |
fungi | it finished! roughly 40 minutes start to end | 20:34 |
fungi | ianw: is there a patch to the launch script to make it include host keys in the copy-paste blob for the inventory at the end? | 20:34 |
fungi | just happened to notice that wasn't included | 20:34 |
*** dviroel is now known as dviroel|afk | 20:37 | |
clarkb | fungi: re unattended upgrades that ensures that base image is up to date before we reboot and try to configure stuff on it. But yes it can take some time occasionally | 20:38 |
fungi | yep, just didn't imagine it would take that long... then again this was focal, and possibly a very old one | 20:38 |
fungi | interesting. rackspace's jammy image insists we supply keypair data, but the focal image doesn't | 20:40 |
fungi | The requested image '...' requires remote authentication credentials to be passed to the image, but no supported credentials were found within the request. Supported methods: 'ssh_keypair'. (HTTP 400) | 20:40 |
clarkb | corvus: frickler: I've tried to manually step through what zuul would do to find that dependency. So far everything seems to work as expected. One thing I noticed is that the commit used Depends-on instead of Depends-On but we ignore case in our re matching | 20:44 |
clarkb | we know it found one following change which means it didn't shortcut due to change.uris being empty | 20:45 |
fungi | huh, so openstackclient can do a server create just fine with newer sdk, but the launch script seemingly cannot | 20:46 |
clarkb | fungi: we do pass a keypair though | 20:48 |
clarkb | its a throwaway key that we remove from both ends once bootstrapping is done | 20:48 |
clarkb | corvus: frickler: there is gerrit io logging but it seems we disable this (not surprising it would be very verbose) | 20:49 |
clarkb | but I'm beginning to think we got an incomplete response from gerrit | 20:49 |
clarkb | maybe the gerrit indexes were stale for some reason | 20:49 |
clarkb | either that or the gerrit query was case sensitive (though in my testing via the rest api it seems that it isn't) | 20:49 |
ianw | fungi: not yet (on the ssh keys) | 20:59 |
clarkb | I think there is one of two explanations. First is that gerrit just returned less data than we expect for some reason so zuul wasn't aware | 20:59 |
clarkb | second is that our query is returning no data for some reason (encoding, etc) and we're relying on the cached info about changes and that tempest change was live when the other change was approved | 21:00 |
clarkb | But I have no hard evidence of this as all my testing returns the results we expect | 21:00 |
*** join_eggdreamnft is now known as \join_subline | 21:01 | |
clarkb | I'm going to step away from depends on debugging now to look at python weirdness on jammy with ceph in tempest jobs | 21:01 |
corvus | clarkb: i think there's a third possibility which is some error updating the change cache (missed an update, race, etc), but that seems unlikely, especially if you didn't see any indications it wrote an updated change value in response to that query. | 21:07 |
corvus | clarkb: any extra debug lines you think we should add in case it happens again? | 21:07 |
corvus | (if you think the only additional info would be the gerrit io log, then that's probably the end of the line, since i don't think we want to turn that on in production) | 21:08 |
ianw | https://fosstodon.org/@opendevinfra has a green tick now, which is kinda cool | 21:09 |
clarkb | corvus: the logs I have appear to be a complete accounting for that processing so ya I don't think it had cache errors (we do log those iirc) | 21:14 |
ianw | clarkb: "the input device is not a TTY" <- does this seem familiar? istr you having some sort of issue with it with a recent jammy upgrade | 21:15 |
clarkb | corvus: maybe we should log following changes found by getChangesDependingOn separately from those already on the change object? then we'd have a clearer idea if the problem is talking to gerrit or our cache? | 21:15 |
clarkb | ianw: is that with docker? | 21:15 |
clarkb | ianw: when I ran into it it was with docker -it and it was a bug in docker. Upgrading docker fixes it (it took them a bit of time to make a fixed release though so at the time we just waited) | 21:16 |
ianw | yes, this is with https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_8a4/865195/7/check/system-config-run-gitea/8a45d93/bridge99.opendev.org/ara-report/results/523.html | 21:18 |
ianw | which is where we poke at the giteadb with a mariadb call to update logos | 21:18 |
ianw | but I guess this only appears when using ansible 6 on the bridge | 21:19 |
clarkb | previously the issue was in docker itself and could be worked around by not doing interactive sessions. Just running commands which can be clunky | 21:20 |
ianw | and it's talking to, and running this, on the gitea host which is bionic | 21:20 |
clarkb | corvus: ya I think maybe a bit of split up logging in enqueueChangesBehind() might help differentiate where we are missing info | 21:20 |
ianw | clarkb: hrm, yeah in this case it's "/usr/local/bin/docker-compose -f /etc/gitea-docker/docker-compose.yaml exec mariadb" | 21:21 |
corvus | clarkb: kk ping me if you write that | 21:21 |
clarkb | ianw: by default docker-compose asks for a tty iirc which is the inverse of `docker` | 21:21 |
clarkb | ianw: I think the flag is -T? to disable it? | 21:21 |
opendevreview | Ian Wienand proposed opendev/system-config master: borg-backup-server: build borg users betterer https://review.opendev.org/c/opendev/system-config/+/865202 | 21:26 |
opendevreview | Ian Wienand proposed opendev/system-config master: letsencrypt-request-certs: refactor certcheck list https://review.opendev.org/c/opendev/system-config/+/865218 | 21:26 |
opendevreview | Ian Wienand proposed opendev/system-config master: letsencrypt: build txt record lists betterer https://review.opendev.org/c/opendev/system-config/+/865203 | 21:26 |
opendevreview | Ian Wienand proposed opendev/system-config master: gitea-git-repos: remove #!/usr/bin/env python https://review.opendev.org/c/opendev/system-config/+/865224 | 21:26 |
opendevreview | Ian Wienand proposed opendev/system-config master: [wip] Bump bridge ansible to 6.6.0 https://review.opendev.org/c/opendev/system-config/+/865195 | 21:26 |
opendevreview | Ian Wienand proposed opendev/system-config master: gitea-set-org-logos: use -T on mariadb command https://review.opendev.org/c/opendev/system-config/+/865339 | 21:26 |
opendevreview | Ian Wienand proposed opendev/system-config master: borg-backup-server: build borg users betterer https://review.opendev.org/c/opendev/system-config/+/865202 | 21:26 |
opendevreview | Ian Wienand proposed opendev/system-config master: letsencrypt-request-certs: refactor certcheck list https://review.opendev.org/c/opendev/system-config/+/865218 | 21:26 |
opendevreview | Ian Wienand proposed opendev/system-config master: letsencrypt: build txt record lists betterer https://review.opendev.org/c/opendev/system-config/+/865203 | 21:26 |
opendevreview | Ian Wienand proposed opendev/system-config master: gitea-git-repos: remove #!/usr/bin/env python https://review.opendev.org/c/opendev/system-config/+/865224 | 21:26 |
opendevreview | Ian Wienand proposed opendev/system-config master: gitea-set-org-logos: use -T on mariadb command https://review.opendev.org/c/opendev/system-config/+/865339 | 21:26 |
opendevreview | Ian Wienand proposed opendev/system-config master: [wip] Bump bridge ansible to 6.6.0 https://review.opendev.org/c/opendev/system-config/+/865195 | 21:26 |
ianw | sorry, rebased to master instead of origin/master :/ anyway, there's a chance that's a complete stack for ansible 6 support | 21:27 |
*** dasm is now known as dasm|off | 22:25 | |
fungi | okay, through trial and error i've determined that launch-node will work with rackspace avoiding that network response parsing exception if i pin openstacksdk<0.99 (which results in installing 0.61.0) | 22:33 |
fungi | the problem arises in 0.99.0 | 22:34 |
opendevreview | Ian Wienand proposed opendev/system-config master: Bump bridge ansible to 6.6.0 https://review.opendev.org/c/opendev/system-config/+/865195 | 22:35 |
opendevreview | Ian Wienand proposed opendev/system-config master: bridge: Use any 6.X Ansible release https://review.opendev.org/c/opendev/system-config/+/865345 | 22:35 |
ianw | fungi: do you think it's worth bisecting? | 22:36 |
fungi | maybe, but for now i've at least got a patch i can push with accumulated launch-node fixes | 22:36 |
clarkb | fungi: and openstacksdk 0.102.0 also errors? | 22:37 |
fungi | yes | 22:37 |
clarkb | fungi: I ask because I was hoping to update nodepool but I think that means nodepool can't talk to rax if we do :/ | 22:37 |
fungi | i started at openstacksdk<1 and then wound backward by version until i hit the last one which avoided the error | 22:38 |
ianw | do we have a theory? if not, i can make some time to bisect it down to a change ... especially if we can pinpoint it for nodepool too | 22:38 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Improve launch-node deps and fix script bugs https://review.opendev.org/c/opendev/system-config/+/865320 | 22:41 |
fungi | ianw: i have no theory, other than something in 0.99.0 probably started expecting modern network information returned in the server create response which doesn't match what's coming back from rackspace | 22:42 |
fungi | also 0.99.0 was the merge of a feature branch, if memory serves, so may not be trivial to bisect | 22:42 |
fungi | anyway, 865320 is what i installed in a venv in my homedir on bridge01 and that seems to be working start to finish to launching a jammy node for the new lists server | 22:46 |
ianw | ++ | 22:47 |
ianw | i'll see if i can make a small replicator ... | 23:00 |
*** rlandy|rover is now known as rlandy|out | 23:09 | |
clarkb | fungi: did you want to check my comments on the launch node change and approve if you feel like those updates are too much? we can always do them in followups as we launch more nodes | 23:29 |
fungi | i'll take a look in a bit | 23:33 |
clarkb | ianw: question about https://review.opendev.org/c/opendev/system-config/+/865218/3/playbooks/roles/letsencrypt-request-certs/tasks/main.yaml you updated the example to show a port (8000) and the construction seems to default to :443. I thought certs didn't specify a port or service though? | 23:36 |
clarkb | oh wait this is just for verification not generation I get it | 23:36 |
ianw | clarkb: yep, that's right -- it's just what goes in the ssl check file | 23:37 |
ianw | the first regex should turn anything with ":<port>" into " <port>" and the second should be 'if this doesn't have a space in it, then add " 443"' | 23:38 |
ianw | i half considered dropping into python with a lib file for it, but i think it *just* meets criteria of being understandable :) | 23:38 |
clarkb | ianw: and the second regex matches because there are no spaces if we didn't replace : with ' ' ? Otherwise we don't match because the $ interferes with our trailing space? | 23:43 |
clarkb | left a small nit on it (use + instead of *) but otherwise that lgtm | 23:43 |
ianw | right; there shouldn't be any trailing spaces. i guess we could double check that with a |trim | 23:45 |
opendevreview | Ian Wienand proposed opendev/system-config master: rax: remove identity_api_version 2 pin https://review.opendev.org/c/opendev/system-config/+/865351 | 23:56 |
clarkb | ianw: we should check the catalog or whatever it is to double check there is a v3 there | 23:58 |
clarkb | just to make sure we know what we are falling back on | 23:59 |
ianw | yeah i'm still trying to understand it all | 23:59 |
ianw | i can say it makes "openstack server list" go from not working to working :) | 23:59 |
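
The two-regex normalization ianw describes at 23:38 (turn `:<port>` into ` <port>`, then append ` 443` to anything that still has no space) can be sketched in Python. This is only an illustration: the change under review uses Ansible filters, not Python, and the function name `to_certcheck_entry` and its `default_port` parameter are hypothetical. It also applies the `|trim` idea from 23:45 and uses `+` rather than `*`, as suggested in the review nit.

```python
import re


def to_certcheck_entry(name: str, default_port: int = 443) -> str:
    """Normalize 'host' or 'host:port' into 'host port' (hypothetical helper)."""
    # Trim first so stray whitespace cannot defeat the anchored patterns.
    entry = name.strip()
    # First regex: a trailing ":<port>" becomes " <port>".
    entry = re.sub(r":(\d+)$", r" \1", entry)
    # Second regex: an entry with no space had no port, so add the default.
    # "+" (not "*") means an empty string is left alone.
    entry = re.sub(r"^(\S+)$", rf"\1 {default_port}", entry)
    return entry
```

An entry that already carries a port contains a space after the first substitution, so the second pattern (which only matches non-space characters from start to end) leaves it untouched.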
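
For the "the input device is not a TTY" failure discussed at 21:15 to 21:21, the fix proposed in 865339 is to pass `-T` to `docker-compose exec`, which disables the pseudo-TTY that `docker-compose exec` allocates by default. A minimal sketch of building such a command line from Python follows. The helper name `compose_exec_argv` is hypothetical, and the command passed to the `mariadb` service is a placeholder, since the log does not show the real one.

```python
from typing import List, Sequence


def compose_exec_argv(
    compose_file: str,
    service: str,
    command: Sequence[str],
    tty: bool = False,
) -> List[str]:
    """Build a docker-compose exec command line (hypothetical helper)."""
    argv = ["docker-compose", "-f", compose_file, "exec"]
    if not tty:
        # -T disables pseudo-TTY allocation, which is needed when the
        # caller (for example an automation tool) has no terminal attached.
        argv.append("-T")
    argv.append(service)
    argv.extend(command)
    return argv


# Placeholder command; the real mariadb invocation is not shown in the log.
argv = compose_exec_argv(
    "/etc/gitea-docker/docker-compose.yaml", "mariadb", ["true"]
)
```

The resulting list can be handed to `subprocess.run(argv, check=True)` on a host where docker-compose is installed.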