ianw | ok, i've cleared out the deleting nodes manually | 00:26 |
---|---|---|
ianw | both builders are working away on images and have built one successfully | 00:26 |
ianw | openstack.exceptions.NotSupported: The image service for iweb:mtl01 exists but does not have any supported versions | 03:09 |
ianw | i wonder how new this is | 03:09 |
ianw | https://meetings.opendev.org/irclogs/%23zuul/%23zuul.2020-04-23.log.html#t2020-04-23T20:25:18 is another instance | 03:12 |
ianw | $ curl https://image.api.mtl01.cloud.iweb.com | 03:21 |
ianw | curl: (60) SSL certificate problem: unable to get local issuer certificate | 03:21 |
ianw | i think this is the real issue ... | 03:21 |
ianw | my browser seems to trust it, but the nodepool-builder image doesn't | 03:22 |
ianw | the underlying focal system doesn't seem to trust it, and neither does my debian testing system | 03:31 |
ianw | neither does fedora 35 | 03:31 |
ianw | Certificate chain | 03:36 |
ianw | 0 s:CN = image.api.mtl01.cloud.iweb.com | 03:36 |
ianw | i:C = GB, ST = Greater Manchester, L = Salford, O = Sectigo Limited, CN = Sectigo RSA Domain Validation Secure Server CA | 03:36 |
ianw | 1 s:C = US, O = DigiCert Inc, CN = RapidSSL TLS DV RSA Mixed SHA256 2020 CA-1 | 03:36 |
ianw | i:C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert Global Root CA | 03:36 |
ianw | 2 s:C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert Global Root CA | 03:36 |
ianw | i:C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert Global Root CA | 03:36 |
ianw | i think this is actually sectigo | 03:36 |
ianw | it looks like missing intermediate certificates to me | 03:41 |
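The chain pasted above can also be checked mechanically. In a correctly served chain, each certificate's issuer is the subject of the certificate after it. A minimal Python sketch of that comparison, using only the CN strings copied from the output above. It does not verify signatures, validity dates or the local trust store, so it only spots a mismatched or missing intermediate:

```python
# Subject/issuer CNs as printed in the "Certificate chain" output above.
# Index 0 is the leaf certificate served by the endpoint.
CHAIN = [
    ("image.api.mtl01.cloud.iweb.com",
     "Sectigo RSA Domain Validation Secure Server CA"),
    ("RapidSSL TLS DV RSA Mixed SHA256 2020 CA-1",
     "DigiCert Global Root CA"),
    ("DigiCert Global Root CA", "DigiCert Global Root CA"),
]


def first_broken_link(chain):
    """Return the first index i whose issuer is not the subject of chain[i + 1].

    Returns None when every link matches (or the chain is too short to check).
    """
    for i in range(len(chain) - 1):
        _subject, issuer = chain[i]
        next_subject, _next_issuer = chain[i + 1]
        if issuer != next_subject:
            return i
    return None


broken = first_broken_link(CHAIN)
if broken is not None:
    print(f"cert {broken} is issued by {CHAIN[broken][1]!r}, "
          f"but cert {broken + 1} is {CHAIN[broken + 1][0]!r}")
```

Here the leaf names a Sectigo CA as its issuer, but the server sends a RapidSSL/DigiCert intermediate next, which fits the missing-intermediate reading.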
opendevreview | Ian Wienand proposed openstack/project-config master: iweb: disable image uploads https://review.opendev.org/c/openstack/project-config/+/820473 | 04:12 |
ianw | on the plus side, "Linux fedora-35-inmotion-iad3-0027609836" ... fedora 35 now boots | 04:13 |
ianw | note per the commit in 820473 i've filed a ticket on the certificate issue | 04:15 |
*** pojadhav is now known as pojadhav|ruck | 04:31 | |
*** ysandeep|out is now known as ysandeep | 04:43 | |
*** marios is now known as marios|ruck | 06:37 | |
*** pojadhav|ruck is now known as pojadhav|rover | 06:39 | |
opendevreview | Riccardo Pittau proposed opendev/irc-meetings master: Move Ironic meeting 1 hour forward https://review.opendev.org/c/opendev/irc-meetings/+/820477 | 08:22 |
*** ysandeep is now known as ysandeep|lunch | 08:52 | |
*** ysandeep|lunch is now known as ysandeep | 09:11 | |
dtantsur | Hey folks. I'm seeing node_failures on fedora-latest and opensuse-15, is it expected/known? | 09:16 |
*** pojadhav is now known as pojadhav|rover | 09:18 | |
*** rlandy is now known as rlandy|ruck | 11:14 | |
*** ysandeep is now known as ysandeep|afk | 12:02 | |
*** pojadhav|rover is now known as pojadhav|brb | 12:08 | |
*** pojadhav|brb is now known as pojadhav|rover | 12:28 | |
*** ysandeep|afk is now known as ysandeep | 12:44 | |
*** marios|ruck is now known as marios|call | 13:37 | |
*** marios|call is now known as marios|ruck|call | 13:38 | |
*** marios|ruck|call is now known as marios|ruck | 14:34 | |
*** ysandeep is now known as ysandeep|afk | 14:40 | |
*** ykarel is now known as ykarel|away | 14:42 | |
opendevreview | Merged opendev/elastic-recheck rdo: Fix ER bot to report back to gerrit with bug/error report https://review.opendev.org/c/opendev/elastic-recheck/+/805638 | 14:56 |
fungi | dtantsur: i think that means we don't have any nodes matching those labels... i'll take a look | 14:58 |
fungi | i know ianw was working on getting fedora-35 booting, and i see one in a ready state in rax-dfw since almost 8 hours | 15:01 |
fungi | sorry, no, 8 minutes | 15:01 |
fungi | which means we're probably using them successfully but maybe we didn't have any when you tried since that was only a few hours after ianw said he'd gotten it booting | 15:02 |
fungi | oh, though the fedora-latest nodeset uses the fedora-34 label | 15:03 |
fungi | and yeah, we don't seem to have any of those | 15:04 |
fungi | right, we don't appear to have any fedora-34 images built | 15:05 |
fungi | nor opensuse-15 images | 15:05 |
fungi | looks like the builders are still spinning their wheels trying to upload to iweb-mtl01 because of the cert issue ianw observed earlier | 15:13 |
fungi | seems like this has effectively deadlocked our image builds, and for some reason the previous opensuse-15 and fedora-34 images were deleted and no replacements have been built yet | 15:14 |
fungi | we may need to temporarily remove iweb-mtl01 from our builder configs | 15:15 |
fungi | i'll work on a patch for that now | 15:15 |
*** ysandeep|afk is now known as ysandeep | 15:17 | |
fungi | oh, ianw already tried to push up a patch for that as 820473 but i think it's hitting the wrong file | 15:24 |
opendevreview | Jeremy Stanley proposed openstack/project-config master: iweb: disable image uploads https://review.opendev.org/c/openstack/project-config/+/820473 | 15:26 |
fungi | i'll emergency-approve that | 15:26 |
fungi | infra-root: i've put nb01/nb02 into the emergency disable list and manually applied that change while we wait for it to merge, though i think the builder containers are going to have to be restarted since they're looping far enough down that they're not going to reach the point where they reread their configs automatically | 15:33 |
corvus | fungi: if you like, i'll take a look at the logs and see if i concur with the looping hypothesis? | 15:35 |
fungi | corvus: please do | 15:36 |
fungi | i haven't restarted them yet | 15:36 |
corvus | fungi: 2021-12-06 15:30:25,831 DEBUG nodepool.ProviderManager: Creating new ProviderManager object for iweb-mtl01 | 15:37 |
corvus | makes me think it did reload the config... not sure what it's doing right now though | 15:37 |
fungi | yeah, actually, it looks like they may have | 15:37 |
fungi | that's roughly the time i made the edit | 15:37 |
corvus | it's deleting an image from mtl01 though, which is not what you want it to do right now? | 15:38 |
fungi | i wonder if it's trying to delete one it never successfully uploaded | 15:38 |
fungi | i set all the images to paused there in its config | 15:38 |
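Pausing the images under the iweb-mtl01 provider stops uploads to that one cloud while builds and uploads elsewhere continue. A simplified sketch of that filtering, assuming a per-provider `pause` flag on each image entry like the one nodepool's provider diskimage settings offer. The dict below is an illustrative stand-in for the YAML config, not a copy of project-config:

```python
# Illustrative stand-in for a nodepool builder config (the real file is YAML
# with many more keys). Provider and image names are examples only.
CONFIG = {
    "providers": [
        {"name": "iweb-mtl01",
         "diskimages": [{"name": "fedora-35", "pause": True},
                        {"name": "ubuntu-focal", "pause": True}]},
        {"name": "rax-dfw",
         "diskimages": [{"name": "fedora-35"},
                        {"name": "ubuntu-focal"}]},
    ],
}


def uploads_wanted(config):
    """Yield (provider, image) pairs that should still be uploaded.

    A paused image entry is skipped for that provider only; the same
    image keeps uploading to every other provider.
    """
    for provider in config["providers"]:
        for image in provider.get("diskimages", []):
            if not image.get("pause", False):
                yield provider["name"], image["name"]


print(sorted(uploads_wanted(CONFIG)))
```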
corvus | yes, it's deleting failed uploads | 15:39 |
corvus | and the one it said it was deleting is gone, so i think it may be done, and correctly idling now | 15:39 |
corvus | fungi: i think your procedure worked as expected :) | 15:40 |
corvus | that was on nb01 btw | 15:40 |
fungi | awesome. hopefully they'll proceed to build the missing images | 15:40 |
fungi | since we're short at least fedora-34 and opensuse-15 at the moment | 15:40 |
fungi | not sure why those disappeared globally, maybe related to the cleanup ianw did when resolving the full filesystems | 15:41 |
opendevreview | Merged openstack/project-config master: iweb: disable image uploads https://review.opendev.org/c/openstack/project-config/+/820473 | 15:49 |
fungi | the builders still don't actually seem to be doing anything now though... corvus: in theory they should be moving on to start building missing images, right? | 15:52 |
fungi | it's been over 20 minutes since they logged that they were deleting records for failed uploads | 15:53 |
fungi | since the pause merged, i'll take the builders back out of emergency disable now | 15:59 |
jrosser | corvus: here's the keycloak realm config ansible tasks https://gist.github.com/jrosser/0444430988ee4d28788f2577c64712a9 | 16:02 |
jrosser | feels like an example of where some custom module might be warranted now rather than massive use of the uri module | 16:03 |
jrosser | oh, it's not maintained the order, the entry point is configure_realms.yml | 16:04 |
fungi | corvus: i'm going to restart the nodepool-builder container on nb01 and see if it starts building things. i'll leave nb02 as-is for further inspection for the moment | 16:11 |
corvus | fungi: maybe a thread dump on nb02 would be warranted to see what it's up to | 16:12 |
fungi | yeah, i was going there next | 16:12 |
fungi | though the nb01 debug log has been silent since i restarted it, and it doesn't seem to think it needs to build any images | 16:14 |
fungi | still no opensuse-15 nor fedora-34 images according to dib-image-list nor image-list | 16:16 |
fungi | not even in a building state | 16:16 |
clarkb | something needs to trigger the daily builds if that is what you are waiting on | 16:16 |
fungi | i thought the builders automatically built any missing images? | 16:16 |
fungi | did they stop doing that? | 16:17 |
clarkb | they do, but there is still a process that goes through and finds which ones need to be rebuilt and explicitly enqueues build requests for them | 16:17 |
clarkb | if that thread isn't running or doesn't run extremely frequently we could still be waiting for that | 16:18 |
fungi | aah | 16:18 |
fungi | any idea how often? hourly? longer? | 16:19 |
fungi | i always recall it being fairly immediate | 16:19 |
*** ysandeep is now known as ysandeep|out | 16:20 | |
clarkb | looks like it checks every loop through on the build worker | 16:22 |
clarkb | which means if we're building an image already that could explain it | 16:22 |
fungi | except nothing is building from what i can see | 16:23 |
fungi | two thread dumps a minute apart and yappi stats are at the end of the nb02 debug log now | 16:24 |
fungi | 2021-12-06 16:21:45,839 DEBUG nodepool.stack_dump: Beginning debug handler | 16:25 |
fungi | starts there | 16:25 |
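The "Beginning debug handler" line comes from nodepool's stack dump handler, which is triggered by sending the process a signal. A minimal stdlib sketch of that kind of signal-triggered thread dump, not nodepool's actual code; using SIGUSR2 as the trigger is an assumption here:

```python
import signal
import sys
import threading
import traceback


def format_thread_stacks():
    """Return a text dump of every live thread's current stack."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for ident, frame in sys._current_frames().items():
        chunks.append(f"Thread: {ident} {names.get(ident, '?')}")
        chunks.append("".join(traceback.format_stack(frame)))
    return "\n".join(chunks)


def install_dump_handler(signum=signal.SIGUSR2):
    """Dump all thread stacks to stderr whenever signum is received.

    Must be called from the main thread; trigger with e.g. `kill -USR2 <pid>`.
    """
    def handler(_signum, _frame):
        print("Beginning debug handler", file=sys.stderr)
        print(format_thread_stacks(), file=sys.stderr)

    signal.signal(signum, handler)
```

A dump is only a snapshot: two dumps landing in the same function do not prove a thread is stuck there, so several dumps spaced apart are more telling.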
clarkb | do we know why those images got deleted? | 16:26 |
clarkb | both thread dumps show the build worker is in _checkForManualBuildRequest which happens after checking for stale images that need to be built. I'm not sure we can conclude it is stuck in there yet though and may have just gotten lucky when the signals were sent | 16:28 |
fungi | i do not. speculation is it was something to do with the cleanup ianw was doing after the filesystem filled up on all the builders due to hanging deletes | 16:28 |
clarkb | quickly double checking the images are still in the config and dib-image-list doesn't record them | 16:29 |
clarkb | ya I'm not sure why it isn't finding that those images don't exist and need to be built | 16:30 |
clarkb | we can manually request a build and see if that behaves different | 16:31 |
clarkb | should I go ahead and do that or do we want to do more debugging first? cc corvus | 16:31 |
corvus | clarkb: maybe do another dump and see? | 16:34 |
clarkb | sure can do | 16:35 |
corvus | oh actually it looks like the first thread dump was in _checkForScheduledImageUpdates | 16:36 |
clarkb | async_object.set_exception(ConnectionClosedError( <- my first dump caught that; the second seemed happier, though all were in the manual build requests | 16:37 |
clarkb | corvus: oh hrm. Ya so where is it deciding that it doesn't need to build those images? | 16:37 |
clarkb | check if disk images are paused. | 16:40 |
opendevreview | Riccardo Pittau proposed openstack/diskimage-builder master: Install python versions specific pip and virtualenv modules https://review.opendev.org/c/openstack/diskimage-builder/+/820563 | 16:41 |
corvus | fedora34 is paused in the config file | 16:41 |
clarkb | yes as is opensuse-15 | 16:41 |
clarkb | so mystery solved. However, I bet that means we can't build those images now :/ | 16:41 |
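The same kind of `pause` flag on a top-level diskimage stops builds entirely, which is why the deleted images were never rebuilt. A simplified sketch of the decision, not nodepool's actual code: it ignores rebuild age and manual build requests, and the image state below is illustrative:

```python
# Illustrative builder state: configured diskimages (with their pause flag)
# and the set of images that currently have a ready build.
DISKIMAGES = {
    "ubuntu-focal": {"pause": False},
    "fedora-34": {"pause": True},
    "opensuse-15": {"pause": True},
    "gentoo-17-0-systemd": {"pause": True},
}
READY_BUILDS = {"ubuntu-focal", "gentoo-17-0-systemd"}


def images_to_build(diskimages, ready):
    """Return missing images that a scheduled-update check would enqueue.

    Paused images are skipped even when no build exists at all.
    """
    return sorted(
        name for name, conf in diskimages.items()
        if name not in ready and not conf.get("pause", False)
    )


print(images_to_build(DISKIMAGES, READY_BUILDS))  # → []

# Un-pausing fedora-34 makes it eligible to be built again.
unpaused = {**DISKIMAGES, "fedora-34": {"pause": False}}
print(images_to_build(unpaused, READY_BUILDS))  # → ['fedora-34']
```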
clarkb | we can try unpausing them. I'll look in git history to see why they were paused and propose reverts/toggles | 16:41 |
fungi | oh, ugh | 16:41 |
corvus | program performs as instructed, news at 11 :) | 16:42 |
fungi | well, i guess that explains it | 16:42 |
fungi | makes me wonder all the more how they got deleted globally | 16:42 |
clarkb | fedora-34 was paused until https://review.opendev.org/c/openstack/diskimage-builder/+/817317 could be landed. It has landed but we also need it to be included in the nodepool builder images. I'm checking that next | 16:43 |
clarkb | yes I think we're good for that one. | 16:44 |
clarkb | there is a dib release after that merged and nodepool requires that latest version | 16:44 |
clarkb | opensuse-15 was paused due to "persistent build failures" almost a year ago | 16:45 |
fungi | fedora-34, gentoo-17-0-systemd, opensuse-15 and opensuse-tumbleweed are the ones which are paused. of those, only gentoo-17-0-systemd has images | 16:45 |
clarkb | I think we unpause opensuse-15 and debug the build failures | 16:45 |
fungi | so we seem to have lost fedora-34, opensuse-15 and opensuse-tumbleweed | 16:45 |
clarkb | then decide if we're going to keep building them | 16:45 |
clarkb | fungi: re tumbleweed, without suse's more active involvement I think we should consider not building it at all | 16:46 |
clarkb | I thought it would be a good way to get leading edge stuff but only if we had people around that could care for it. Opensuse-15 seems a bit more important. Anyway unpauses should be up as soon as I can write the commit | 16:46 |
fungi | yeah, makes sense, i'm currently just trying to discern if there's something about the paused image builds which resulted in their deletion | 16:46 |
clarkb | no I suspect they all got caught in ianw's deletion. Maybe because they were old looking and didn't get filtered by active in the dib-image-list? | 16:47 |
clarkb | or perhaps they had fallen out of the dib-image-list somehow but were still in the image-list? | 16:47 |
*** pojadhav|rover is now known as pojadhav|out | 16:48 | |
fungi | leaves me wondering why the gentoo image was spared that fate | 16:48 |
opendevreview | Clark Boylan proposed openstack/project-config master: Unpause fedora-34 and opensuse-15 image builds https://review.opendev.org/c/openstack/project-config/+/820565 | 16:49 |
fungi | there's also 816933 proposing to remove f34 images since roughly a month | 16:51 |
clarkb | we should probably unpause for now since we had those images recently, then work to retire them in the normal process. | 16:52 |
clarkb | if opensuse-15 completely fails to build still and no one shows up to fix them that might be another story though | 16:52 |
fungi | yeah | 16:52 |
fungi | testing on year-old opensuse images was probably not all that great in the first place | 16:52 |
fungi | willing to bet 15.2 isn't even the current point release | 16:53 |
fungi | and may not be available any longer for that matter | 16:53 |
clarkb | ya I think a new release has happened. We should see where we are at after getting logs, and send notice to people that the suse stuff needs care; we can guide people through that if there is interest, otherwise start to retire it | 16:53 |
*** marios|ruck is now known as marios|out | 16:54 | |
fungi | we do still seem to be mirroring 15.2 packages at least | 16:54 |
fungi | https://mirror.dfw.rax.opendev.org/opensuse/distribution/leap/15.2/repo/oss/x86_64/ has a bunch of rpms in it | 16:55 |
clarkb | ya part of the struggle here is that unlike ubuntu or centos when a major release gets updated the package mirrors don't auto roll forward | 16:55 |
clarkb | each "minor" release is a true release with an entirely separate package mirror aiui | 16:55 |
clarkb | which means that we need to spin up the 15.3 mirror first, then spin up the images, then remove 15.2 images then remove 15.2 mirror | 16:56 |
clarkb | (or maybe we start pushing back on mirroring for these lesser used images entirely and spin up 15.3 without a local mirror if people want 15.3) | 16:56 |
fungi | that's a possibility, sure | 16:56 |
clarkb | compared to say 20.04.x becoming 20.04.x+1 where the main mirror just updates and we auto sync | 16:57 |
clarkb | similar story with centos 8 (though it isn't updating anymore) | 16:57 |
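The difference shows up directly in the mirror paths. A small sketch using the Leap path from the mirror URL above; the Ubuntu path shape is an assumption included only for contrast:

```python
# Base URL taken from the opensuse mirror link above.
MIRROR = "https://mirror.dfw.rax.opendev.org"


def leap_repo(version):
    # Each Leap point release has its own directory, so moving from 15.2 to
    # 15.3 means a new mirror tree (and new images), not a sync of the old one.
    return f"{MIRROR}/opensuse/distribution/leap/{version}/repo/oss/x86_64/"


def ubuntu_repo(codename):
    # Assumed layout: 20.04.x point releases share one codename-based suite,
    # so the same path keeps receiving updates as the mirror syncs.
    return f"{MIRROR}/ubuntu/dists/{codename}/"


for url in (leap_repo("15.2"), leap_repo("15.3"), ubuntu_repo("focal")):
    print(url)
```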
clarkb | We should probably remind people that centos-8 is going away at the end of the year and plan to remove it in early january in our nodepools? | 16:57 |
clarkb | I need to find breakfast but I'll probably land the matrix-gerritbot image update change after that. System-config updates are happening normally as expected now right? | 16:59 |
clarkb | fungi: also how is lists.o.o fixing going /me finds those changes | 17:00 |
clarkb | looks like we may still need the port 25 update to make them mergeable? (Also a rebase? I bet it was the keycloak change that conflicted) | 17:02 |
fungi | i haven't pushed the update yet | 17:02 |
fungi | about to head out to an appointment but planned to send them up when i get back in a couple hours | 17:02 |
clarkb | sounds good, thanks | 17:02 |
clarkb | re opensuse mirroring. I'm currently getting a few KBps at home pulling updates | 17:03 |
clarkb | entirely possible that the suse distros aren't viable without mirroring :/ | 17:03 |
fungi | thinking about how our deployment testing works, i have a feeling i'm going to need a custom test-only role to block 25/tcp egress in tests, as we normally test only applying the production iptables rules configured in our inventory. i guess i would include it in jobs after playbooks/roles/iptables so it doesn't get undone | 17:10 |
fungi | and just have it inject rules via the cli | 17:10 |
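For such a test-only role, the rules injected via the CLI would look roughly like the commands built below. This is a hypothetical sketch: the function name and defaults are illustrative, and a real role would run such commands from Ansible after the iptables role rather than printing them:

```python
import shlex


def egress_block_cmds(port=25, proto="tcp", chain="OUTPUT"):
    """Return iptables and ip6tables commands rejecting outbound traffic to port."""
    # -I inserts at the head of the chain, so the rule is evaluated before any
    # ACCEPT rules already present; REJECT fails fast instead of hanging.
    rule = ["-I", chain, "-p", proto, "--dport", str(port), "-j", "REJECT"]
    return [shlex.join([tool, *rule]) for tool in ("iptables", "ip6tables")]


for cmd in egress_block_cmds():
    print(cmd)
```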
clarkb | I think you may also be able to add custom rules to the test base inventory? | 17:12 |
clarkb | opendev/system-config/playbooks/zuul/templates/group_var/all.yaml.j2 is the file | 17:13 |
fungi | oh, is that only used for testing? | 17:13 |
fungi | i should be able to add an iptables_rules var in there, in that case | 17:14 |
clarkb | ya the templates under there are put in place as fake private group vars | 17:14 |
opendevreview | Merged openstack/project-config master: Unpause fedora-34 and opensuse-15 image builds https://review.opendev.org/c/openstack/project-config/+/820565 | 17:14 |
fungi | like we do in inventory/service/group_vars/review.yaml | 17:14 |
clarkb | ya though the ones under inventory and on bridge are the real prod values | 17:15 |
clarkb | only the ones under playbooks/zuul/templates are test only | 17:15 |
fungi | okay, headed out to my appointment, should hopefully be back around 20:00 utc | 17:15 |
*** sshnaidm is now known as sshnaidm|afk | 17:25 | |
*** rlandy|ruck is now known as rlandy|ruck|brb | 17:26 | |
clarkb | once I've confirmed nodepool has unpaused properly I'm going to approve tristanC's matrix-gerritbot update | 17:26 |
tristanC | clarkb: nice, i'm here to check if it goes wrong | 17:30 |
clarkb | thanks | 17:31 |
clarkb | just waiting for the nodepool job to run then I'll get to that | 17:31 |
clarkb | they are unpaused and building now \o/ | 17:39 |
clarkb | and now matrix-gerritbot has been approved | 17:39 |
tristanC | clarkb: i'm watching curl eavesdrop01.opendev.org:9001/metrics | grep error | 17:45 |
clarkb | tristanC: ok I think it may take a little time to land the change since it does the deployment testing first | 17:45 |
clarkb | infra-root something like https://etherpad.opendev.org/p/xvu2oKUQVLiHIHsvGRUt for the image changes (this assumes the opensuse images are still broken) | 17:52 |
*** rlandy|ruck|brb is now known as rlandy|ruck | 18:13 | |
frickler | clarkb: also mention the f34 to f35 step in the same run or was that already announced? and what about tumbleweed? otherwise I'm fine with the wording | 18:13 |
clarkb | frickler: oh ya good point. We should mention fedora-34 being replaced by 35. For tumbleweed I guess we should say we'd prefer to turn it off entirely since it really hasn't been maintained | 18:14 |
clarkb | frickler: updated with the additional thoughts. Thanks that was good feedback | 18:18 |
opendevreview | Merged opendev/system-config master: Update the gerritbot-matrix image to support arbitrary uid https://review.opendev.org/c/opendev/system-config/+/818645 | 18:18 |
clarkb | tristanC: ^ I think that should apply shortly | 18:19 |
tristanC | clarkb: ssh_errors counter is increasing, has the service restarted already? | 18:33 |
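Watching `/metrics | grep error` amounts to comparing counter values between scrapes. A minimal sketch of that comparison over Prometheus text-format output. It assumes samples carry no trailing timestamps, and apart from the `ssh_errors` name mentioned above the metric names are invented; real ones may carry a prefix:

```python
def parse_counters(text):
    """Parse Prometheus text-format samples into {series: value}.

    Skips blank and comment (# HELP / # TYPE) lines. Assumes no trailing
    timestamps, so the last space-separated field is the value.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, value = line.rsplit(" ", 1)
        samples[series] = float(value)
    return samples


def increased(before, after, needle="error"):
    """Return {series: delta} for series containing needle that went up."""
    old, new = parse_counters(before), parse_counters(after)
    return {k: new[k] - old.get(k, 0.0)
            for k in new if needle in k and new[k] > old.get(k, 0.0)}


BEFORE = "# TYPE ssh_errors counter\nssh_errors 3\nhttp_errors 0\n"
AFTER = "# TYPE ssh_errors counter\nssh_errors 7\nhttp_errors 0\n"
print(increased(BEFORE, AFTER))  # → {'ssh_errors': 4.0}
```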
clarkb | Connecting to review.opendev.org:29418 No user exists for uid 11000 | 18:33 |
clarkb | tristanC: yes it just restarted and ^ is the error. I guess openssh or something else (the fork tooling in ghc?) doesn't like forking to a non existent user | 18:34 |
tristanC | that's odd, it did work with rootless podman | 18:34 |
clarkb | I'll manually undo the uid/gid specification and restart | 18:34 |
tristanC | clarkb: yes please, we'll need another update | 18:34 |
clarkb | ok thats done. The new image seems to be working with the uid:gid specification commented out of the docker-compose.yaml file | 18:35 |
clarkb | (I didn't revert the image) | 18:35 |
clarkb | I guess we need stronger checks in our deployment job too | 18:36 |
opendevreview | Clark Boylan proposed opendev/system-config master: Unset the matrix gerritbot uid:gid settings https://review.opendev.org/c/opendev/system-config/+/820583 | 18:38 |
clarkb | I'll self approve ^ | 18:38 |
clarkb | tristanC: ^ I included the entire log context around the error if that is helpful in the commit message there | 18:39 |
clarkb | sounds like openssh is the problem | 18:39 |
clarkb | it very specifically wants the uid to exist? that is odd, but ok | 18:40 |
clarkb | maybe a security thing? | 18:40 |
*** avass[m] is now known as AlbinVass[m] | 18:45 | |
tristanC | it is openssh complaining about the arbitrary uid, but i don't get that error when running `podman --user 11000` | 18:48 |
opendevreview | Clark Boylan proposed opendev/system-config master: Update gitea to 1.15.7 https://review.opendev.org/c/opendev/system-config/+/820267 | 19:02 |
tristanC | ok, i reproduce the issue when using rootfull docker | 19:02 |
clarkb | infra-root ^ I decided to double check that the templates didn't change and there is only a minor update to the head navbar template in a block that we comment out. I figured best to deal with small deltas now rather than bigger deltas in future upgrades | 19:03 |
clarkb | oh wow opensuse 15 and fedora-34 both built on their first passes | 19:04 |
clarkb | I'll need to update my email etherpad | 19:04 |
tristanC | clarkb: oh i see, podman populates a correct /etc/passwd in that case | 19:05 |
clarkb | that is an interesting behavior. Images aren't so immutable I guess :) | 19:07 |
clarkb | but that certainly explains why it doesn't error | 19:07 |
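The OpenSSH client looks up its own uid in the passwd database and exits with the "No user exists for uid 11000" error seen above when that lookup fails. Podman populates /etc/passwd for the `--user` uid, while rootful docker does not. A sketch of that check plus the kind of passwd line an entrypoint could append as a workaround (only possible if /etc/passwd is writable by that uid). The user name, home directory and shell are hypothetical, and this is not necessarily what 820591 does:

```python
import os
import pwd


def has_passwd_entry(uid):
    """True if uid resolves via getpwuid, the lookup that fails in the error above."""
    try:
        pwd.getpwuid(uid)
        return True
    except KeyError:
        return False


def passwd_line(uid, gid, name="gerritbot", home="/home/gerritbot",
                shell="/bin/sh"):
    # Standard 7-field format: name:password:uid:gid:gecos:home:shell.
    # name/home/shell defaults are hypothetical placeholders.
    return f"{name}:x:{uid}:{gid}::{home}:{shell}"


uid, gid = os.getuid(), os.getgid()
if not has_passwd_entry(uid):
    # An entrypoint would append this to a writable /etc/passwd; here we only print it.
    print(passwd_line(uid, gid))
```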
clarkb | dtantsur: I think you should be set for fedora and opensuse now | 19:08 |
fungi | okay, back and catching up | 19:18 |
fungi | clarkb: built maybe, but did they boot? | 19:20 |
clarkb | fungi: I haven't checked that | 19:23 |
fungi | i'll look | 19:23 |
fungi | | fedora-34-0000024221 | fedora-34 | nb01.opendev.org | qcow2,raw,vhd | ready | 00:00:25:59 | | 19:23 |
fungi | no, sorry, that's not what i meant to check | 19:23 |
fungi | though it's probably still uploading given it's that new | 19:24 |
fungi | yeah, still uploading everywhere | 19:24 |
fungi | same for opensuse-15 | 19:24 |
fungi | i'll check on it again in half an hour | 19:24 |
clarkb | fungi: I'll remove the ubuntu one 2fa topic from the meeting agenda? I think you've proceeded with that | 19:25 |
fungi | yep, sgtm | 19:26 |
fungi | unless you want to recap it | 19:26 |
clarkb | nah I think its fine as that has been on the mailing list and people can follow along there | 19:26 |
fungi | clarkb: for test nodes announcement, should we include anything about the debian-stretch/debian-stable image cleanups? are those still in progress? | 19:26 |
fungi | also gentoo probably needs to get the axe unless someone has time to figure out how to get it building/booting successfully | 19:27 |
clarkb | fungi: not sure what debian-stable is re our images. But stretch is long gone | 19:27 |
clarkb | hrm ya can add gentoo to the list | 19:27 |
fungi | debian-stable was an alias for debian-stretch, which we eliminated at the same time (in theory) | 19:27 |
fungi | looks like vexxhost/ansible-role-wireguard still references the debian-stretch node label, but that's the only straggler | 19:29 |
fungi | we don't still build them at least | 19:29 |
fungi | codesearch doesn't turn up any job configs referencing a debian-stable nodeset either, so i think that's all basically complete | 19:30 |
clarkb | I'll update the email draft once I'm done editing meeting agenda stuff | 19:31 |
clarkb | ok I've got my agenda edits in. I'll let them simmer for a bit if others want to add stuff before I send the actual agenda out later today | 19:34 |
clarkb | fungi: updated with a bit about gentoo. | 19:36 |
fungi | thanks... should we try un-pausing gentoo to see if it's somehow miraculously back to working again? | 19:36 |
fungi | i have at least one old zuul-jobs change blocked on being able to boot gentoo nodes (771106 and its parent) | 19:37 |
clarkb | I guess we can. Do we have hints as to what the problem is in our logs (opensuse lacked that so I decided the best option was to unpause) | 19:38 |
fungi | i'm not really sure at this point, our existing images are over a year old | 19:39 |
fungi | checking git history | 19:39 |
fungi | https://review.opendev.org/797790 nodepool: pause gentoo and tumbleweed builds | 19:40 |
fungi | "Both of these are failing in ways that look like we need to fix them in dib. Stop attempting to build them for now." | 19:41 |
fungi | not much to go on | 19:41 |
clarkb | ya so maybe we unpause and check the results, we may be surprised as with opensuse 15 | 19:41 |
clarkb | added a note to the agenda about image spring cleaning | 19:42 |
clarkb | and it isn't spring anywhere in the world right now is it? so no worries about hemisphere confusion :) | 19:42 |
clarkb | I've manually fixed up matrix gerritbot hopefully for the last time before my fix lands | 19:49 |
fungi | it's when you clean the loose springs out of the machinery | 19:49 |
clarkb | (just manually commenting out the user line in the docker compose file and restarting things) | 19:50 |
ianw | o/ ... thanks for fixing the image screwup | 19:50 |
fungi | well, fingers crossed they can actually boot | 19:51 |
fungi | checking again now that some of the uploads have completed | 19:52 |
fungi | | 0027618409 | ovh-gra1 | opensuse-15 | 597fa9a1-7752-43b4-8a61-62fe98f949e4 | 149.202.179.158 | | ready | 00:00:10:14 | unlocked | | 19:52 |
fungi | no fedora-34 nodes yet | 19:52 |
clarkb | I was able to ssh to that suse node | 19:53 |
opendevreview | Merged opendev/system-config master: Unset the matrix gerritbot uid:gid settings https://review.opendev.org/c/opendev/system-config/+/820583 | 19:54 |
ianw | clarkb: mail looks about right. gentoo is building in the dib gate ... i'm not sure but it might build ok now | 19:55 |
clarkb | ok, lets unpause it and see what happens then we can send the email out once we've got the data | 19:55 |
clarkb | fungi: did you want to push that change since you suggested it? or should I go ahead and push that up? | 19:55 |
fungi | i'm happy to push it now | 19:55 |
clarkb | cool thanks | 19:56 |
ianw | the problem has been it's a bit unreliable, sometimes it takes a long time to build and other times it breaks. prometheanfire is always responsive, but also i don't think anyone else has contributed to the image either | 19:56 |
fungi | should gentoo still be python3.8 and default/linux/amd64/17.1/systemd or something newer? | 19:57 |
prometheanfire | ya, I don't think you need to have an infra gentoo image anymore | 19:57 |
prometheanfire | I need to see what the current default is, it might be 3.9 | 19:57 |
opendevreview | Merged openstack/diskimage-builder master: Document EFI elements requirements https://review.opendev.org/c/openstack/diskimage-builder/+/819406 | 19:58 |
clarkb | I think it has a bit of value to projects like zuul and bindep that try to have broad assurance of functionality, but the cost of fixing random issues with gentoo when they show up in those tools is probably less than keeping the image alive? | 19:59 |
prometheanfire | looks like we are setting it to 3.9 | 19:59 |
fungi | prometheanfire: clarkb: ianw: i'll hold off pushing the un-pause for gentoo-17-0-systemd image builds until we know what we should build | 19:59 |
prometheanfire | fungi: atm I don't think the image is used anywhere in gate | 20:00 |
clarkb | fungi: what we should build meaning the python version? | 20:01 |
fungi | openstack/openstack-ansible-tests seems to reference it in several jobs (which are probably themselves unused) | 20:01 |
fungi | i have a long-wip change to fix gentoo support in the configure-mirrors role in zuul/zuul-jobs which is pending having working gentoo images again | 20:03 |
fungi | but will just abandon that stack if we're dropping gentoo images entirely | 20:03 |
clarkb | fungi: or do you mean not bother until I send the email out and check if anyone wants to fix them? | 20:04 |
prometheanfire | gentoo, for now, fully supports py38 and py39; for the python binary itself, 3.7 and 3.10 are also available | 20:04 |
clarkb | we should probably build with 39 if unpausing then | 20:05 |
fungi | clarkb: what we should build meaning the python version, but also profile and thus nodename | 20:05 |
clarkb | got it | 20:05 |
fungi | right now we have images defined for gentoo-17-0-systemd which set GENTOO_PROFILE: 'default/linux/amd64/17.1/systemd' | 20:05 |
fungi | it sounds like 19 is the current gentoo release? | 20:05 |
clarkb | alright I'm going to go find lunch. Back in a bit. Might be good to try and land the gitea 1.15.7 update later today if people have time to look that over | 20:05 |
prometheanfire | gentoo has releases? :P | 20:06 |
fungi | prometheanfire: it has versioned profiles, at least ;) | 20:06 |
ianw | fungi: it's probably worth double-checking what is being built in the dib jobs too, to keep in sync | 20:06 |
prometheanfire | true, 17.1/systemd is current | 20:06 |
fungi | oh, in that case we should stick with 17.1/systemd | 20:06 |
prometheanfire | that's separate from what you want for python | 20:07 |
fungi | upping the GENTOO_PYTHON_TARGETS and GENTOO_PYTHON_ACTIVE_VERSION envvars at least doesn't seem to imply we'd want a different image name | 20:08 |
fungi | i'll switch those to 3.9 in the unpause change | 20:08 |
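Only the Python variables change here while GENTOO_PROFILE stays at 17.1/systemd, so the gentoo-17-0-systemd image name can stay as it is. A sketch of the resulting env-vars, assuming the targets use Gentoo's PYTHON_TARGETS spelling (`python3_9`) and the active version uses the interpreter name (`python3.9`); the exact formats the diskimage-builder gentoo element expects should be checked against its docs:

```python
def gentoo_python_env(major, minor):
    """Build the Python-related env vars for the gentoo image definition.

    Assumed formats: "python3_9" for targets, "python3.9" for the active version.
    """
    return {
        "GENTOO_PYTHON_TARGETS": f"python{major}_{minor}",
        "GENTOO_PYTHON_ACTIVE_VERSION": f"python{major}.{minor}",
    }


# The profile is unchanged; only the Python settings move from 3.8 to 3.9.
env = {"GENTOO_PROFILE": "default/linux/amd64/17.1/systemd"}
env.update(gentoo_python_env(3, 9))
print(env)
```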
opendevreview | Jeremy Stanley proposed openstack/project-config master: Un-pause gentoo-17-0-systemd images https://review.opendev.org/c/openstack/project-config/+/820590 | 20:14 |
fungi | prometheanfire: clarkb: ianw: ^ | 20:14 |
opendevreview | Merged openstack/diskimage-builder master: Disable all repositories after attaching a pool https://review.opendev.org/c/openstack/diskimage-builder/+/727879 | 20:20 |
fungi | i see fedora-34 nodes building in ovh-bhs1 now, so i'll see what happens with them | 20:31 |
opendevreview | Tristan Cacqueray proposed opendev/system-config master: Update the gerritbot-matrix image to support arbitrary uid with docker https://review.opendev.org/c/opendev/system-config/+/820591 | 20:36 |
fungi | the fedora-34 nodes in ovh-bhs1 repeatedly raised nodepool.exceptions.ConnectionTimeoutException | 20:37 |
fungi | now there | 20:38 |
tristanC | clarkb: 820591 seems to fix the docker issue | 20:38 |
fungi | 's one booting in rax-ord i'll watch | 20:38 |
fungi | 2021-12-06 20:40:14,921 INFO nodepool.NodeLauncher: [e: dbb13cf88df149999975274eaf43da37] [node_request: 300-0016378697] [node: 0027618929] Node is ready | 20:45 |
clarkb | fungi: the ovh problem is probably the problem we had previously just continuing | 20:45 |
fungi | looks like we're successfully booting fedora-34 in rax-ord but not ovh-bhs1, which yeah seems like deja vu | 20:45 |
clarkb | ya I bet that is the old problem not being corrected but this should work long enough for us to delete f34 | 20:46 |
fungi | right | 20:47 |
clarkb | tristanC: fungi +2 on your respective changes thanks | 20:47 |
ianw | fungi: yep, i don't think we merged the initramfs regeneration | 20:56 |
ianw | clarkb/fungi: https://review.opendev.org/c/zuul/zuul-jobs/+/818702 updates zuul jobs for f35, and surprisingly (to me) passed without alteration | 20:57 |
clarkb | ianw: I too am surprised it passed without alteration but +2 from me. Do you have time for https://review.opendev.org/c/opendev/system-config/+/820267? | 21:15 |
clarkb | fungi: ianw: I'm also happy to watch https://review.opendev.org/c/opendev/system-config/+/820591 if one of you has time to review that one as well | 21:15 |
ianw | oh yep, looking | 21:18 |
ianw | i thought we took some screenshots of the gitea page; we should do that | 21:18 |
fungi | oh poop, the iptables_rules var has -A openstack-INPUT hard-coded so i can't use it for egress | 21:18 |
clarkb | fungi: oh we don't have an egress chain we can modify? hrm | 21:19 |
clarkb | How difficult would it be to add an output chain? | 21:19 |
clarkb | (and is that a good idea?) | 21:19 |
clarkb | ianw: Ya taking a screenshot of the front page, the system-config repo page, and a file in system-config is probably a good idea | 21:21 |
clarkb | (we populate system-config in that job which makes it a good test case) | 21:21 |
clarkb | let me see if I can figure that out | 21:21 |
clarkb | ianw: hrm the testinfra stuff calls take_screenshot in test_gitea | 21:23 |
clarkb | http://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_f6e/820267/2/check/system-config-run-gitea/f6e2e68/bridge.openstack.org/screenshots/ ah yup there they are | 21:23 |
clarkb | ok ya I think you already set this up for us :) | 21:24 |
mgagne | fungi: the issue with the certificate intermediate at iweb is being worked on | 21:27 |
ianw | clarkb: haha i thought so! i had the wrong job window open | 21:27 |
clarkb | ianw: and the screenshots look good for 1.15.7, no need to remove the +A | 21:27 |
ianw | mgagne: thanks! was it a recent change, or did we just not notice | 21:28 |
mgagne | The API certificates were renewed and intermediates didn't properly get updated. | 21:29 |
mgagne | ianw: I've been told the issue has been solved. | 21:30 |
ianw | mgagne: cool, yes looks like i can connect now | 21:31 |
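A quick way to spot the kind of intermediate mix-up mgagne describes is to check that each certificate the server sends is issued by the next one in the chain. The sketch below is a hypothetical helper, not something from the log or any OpenDev tooling; it assumes you have already copied the subject/issuer pairs out of `openssl s_client -showcerts` output, and it compares the printed names as plain strings, which is only a heuristic (real validation compares encoded names and signatures).

```python
from typing import List, Tuple

# Hypothetical input format: (subject, issuer) pairs in the order the
# server sent them, leaf first, as printed by `openssl s_client -showcerts`.
Chain = List[Tuple[str, str]]


def broken_links(chain: Chain) -> List[int]:
    """Return indices i where cert i's issuer is not cert i+1's subject.

    In a well-formed chain each certificate is issued by the one after it;
    a mismatch usually points at a missing or wrong intermediate.
    """
    return [
        i
        for i in range(len(chain) - 1)
        if chain[i][1] != chain[i + 1][0]
    ]
```

Fed the chain ianw pasted earlier (a leaf issued by "Sectigo RSA Domain Validation Secure Server CA", followed by a DigiCert "RapidSSL ... CA-1" intermediate), this would flag index 0, matching the diagnosis of a mis-updated intermediate.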
ianw | clarkb: not sure where we ended up, are the builders still in emergency? | 21:32 |
clarkb | ianw: no the builders should be out of emergency. At least the change I pushed to unpause opensuse15 and f34 landed and updated automatically | 21:32 |
opendevreview | Ian Wienand proposed openstack/project-config master: Revert "iweb: disable image uploads" https://review.opendev.org/c/openstack/project-config/+/820595 | 21:37 |
opendevreview | Merged openstack/project-config master: Revert "iweb: disable image uploads" https://review.opendev.org/c/openstack/project-config/+/820595 | 21:56 |
fungi | ianw: yeah, my update to your pause change merged at which point i took the builders out of emergency disable | 22:18 |
fungi | so just reverting that normally should be sufficient | 22:18 |
clarkb | the gitea change should merge shortly. I'll keep an eye on that and when that is done approve the matrix-gerritbot update | 22:18 |
opendevreview | Merged opendev/system-config master: Update gitea to 1.15.7 https://review.opendev.org/c/opendev/system-config/+/820267 | 22:21 |
clarkb | ianw: should we go ahead and land https://review.opendev.org/c/openstack/project-config/+/820590 to restart gentoo image builds too? | 22:21 |
fungi | adding egress rules shouldn't be hard, i'll just end up creating a separate var for them and corresponding support in the rules files | 22:22 |
opendevreview | Merged opendev/system-config master: Update letsencrypt role docs to suggest a specific order https://review.opendev.org/c/opendev/system-config/+/820409 | 22:23 |
clarkb | fungi: ya I'm more just wondering if the lack of prior art indicates maybe we should approach this from another angle. But I still like the firewall rule as it captures what we want pretty well I think from a prevention of state leakage perspective | 22:24 |
fungi | i'm trying to decide between just conditionally adding to the default OUTPUT chain vs creating a separate openstack-OUTPUT chain guarded by nonzero content in an iptables_rules_egress var | 22:26 |
fungi | we have significant enough regression testing with testinfra that probably if adding to the OUTPUT chain in test jobs broke anything we'd notice before it could merge anyway | 22:27 |
clarkb | the other place that this might affect is docker | 22:28 |
clarkb | for that reason I think we should stick to a separate chain since that has worked well for input | 22:28 |
fungi | that's fair | 22:29 |
fungi | also the openstack in openstack-INPUT saddens me, but it would be nontrivial to change given existing production deployments | 22:30 |
fungi | ideally we'd have chosen something more neutral | 22:30 |
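The "separate chain guarded by nonzero content" option can be sketched as a small renderer for iptables-restore lines. This is a hypothetical Python illustration only: the real system-config rules are Jinja templates driven by Ansible vars, the `iptables_rules_egress` name is just the one fungi floated, and the example rule is made up. The point it shows is that an empty egress list emits nothing, so the stock OUTPUT chain (and anything Docker hangs off it) is left alone.

```python
from typing import List


def render_egress(rules: List[str], chain: str = "openstack-OUTPUT") -> List[str]:
    """Render iptables-restore lines for a dedicated egress chain.

    Hypothetical helper: `rules` stands in for an iptables_rules_egress
    list. Returns no lines when it is empty, so OUTPUT is untouched.
    """
    if not rules:
        return []
    # Declare the user chain (policy "-" for user-defined chains), then
    # jump to it from OUTPUT, then append each rule to the new chain.
    lines = [f":{chain} - [0:0]", f"-A OUTPUT -j {chain}"]
    lines += [f"-A {chain} {rule}" for rule in rules]
    return lines
```

A caller would splice these into the `*filter` section alongside the existing openstack-INPUT declarations; keeping the rules in their own chain mirrors how the input side already works.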
clarkb | ok gitea01 is done. The stop and start of the containers was not quick. It took about 4-5 minutes I think | 22:41 |
clarkb | but it seems to have done so successfully and I expect the next 7 will also be happy | 22:41 |
clarkb | meeting agenda sent | 22:48 |
clarkb | I've gone ahead and approved the matrix gerritbot update since the gitea updates seem to be going well and are almost done | 22:55 |
clarkb | gitea upgrade all done | 23:03 |
ianw | all images available in iweb too | 23:14 |
opendevreview | Merged opendev/system-config master: Update the gerritbot-matrix image to support arbitrary uid with docker https://review.opendev.org/c/opendev/system-config/+/820591 | 23:26 |
opendevreview | Ian Wienand proposed opendev/dstat_graph master: Add .gitreview https://review.opendev.org/c/opendev/dstat_graph/+/820609 | 23:26 |
opendevreview | Ian Wienand proposed opendev/dstat_graph master: Update dependencies https://review.opendev.org/c/opendev/dstat_graph/+/820630 | 23:26 |
opendevreview | Ian Wienand proposed opendev/dstat_graph master: Trim styles https://review.opendev.org/c/opendev/dstat_graph/+/820631 | 23:26 |
opendevreview | Ian Wienand proposed opendev/dstat_graph master: Conver to bootstrap 5 https://review.opendev.org/c/opendev/dstat_graph/+/820632 | 23:26 |
opendevreview | Ian Wienand proposed opendev/dstat_graph master: Trim margin of overview panel https://review.opendev.org/c/opendev/dstat_graph/+/820633 | 23:26 |
opendevreview | Ian Wienand proposed opendev/dstat_graph master: Fix walking elements of input list https://review.opendev.org/c/opendev/dstat_graph/+/820634 | 23:26 |
opendevreview | Ian Wienand proposed opendev/dstat_graph master: Use list-group-item for graphs https://review.opendev.org/c/opendev/dstat_graph/+/820635 | 23:26 |
opendevreview | Ian Wienand proposed opendev/dstat_graph master: Filter out blank lines https://review.opendev.org/c/opendev/dstat_graph/+/820636 | 23:26 |
opendevreview | Ian Wienand proposed opendev/dstat_graph master: Replace remove .size() with .length https://review.opendev.org/c/opendev/dstat_graph/+/820637 | 23:26 |
opendevreview | Ian Wienand proposed opendev/dstat_graph master: Fix focus panel chart selection for pcp-dstat https://review.opendev.org/c/opendev/dstat_graph/+/820638 | 23:26 |
opendevreview | Ian Wienand proposed opendev/dstat_graph master: Update generate_page.sh https://review.opendev.org/c/opendev/dstat_graph/+/820639 | 23:26 |
opendevreview | Ian Wienand proposed opendev/dstat_graph master: Add a sample.csv and update docs https://review.opendev.org/c/opendev/dstat_graph/+/820640 | 23:26 |
opendevreview | Ian Wienand proposed opendev/dstat_graph master: Fix refresh for embedded csv https://review.opendev.org/c/opendev/dstat_graph/+/820641 | 23:26 |
opendevreview | Merged opendev/irc-meetings master: Move Ironic meeting 1 hour forward https://review.opendev.org/c/opendev/irc-meetings/+/820477 | 23:29 |
clarkb | matrix-gerritbot restarted about a minute ago and I see no errors | 23:34 |
clarkb | need an event to show up in the testing channel then I think we can be happy with this | 23:35 |
clarkb | anyone have a change to push up? :) | 23:35 |
*** rlandy|ruck is now known as rlandy|out | 23:37 | |
fungi | i'm too lazy to come up with one at this point in my evening | 23:38 |
clarkb | https://review.opendev.org/c/opendev/system-config/+/815049 is an easy one we can approve if you aren't too lazy to take a look at that one. ianw: another option is https://review.opendev.org/c/openstack/project-config/+/820590 I would approve it myself but don't want to leave you with a looping, failing gentoo build if it is unhappy later today | 23:40 |
ianw | clarkb: no problem, i can check on gentoo -- if it's non-obvious why it's failing we can pause it again | 23:41 |
ianw | the doc one lgtm | 23:42 |
clarkb | ianw: ya I'm hoping fungi can double check the doc one since we thought through that last time we did a rename | 23:42 |
fungi | i just approved it | 23:42 |
clarkb | thanks! | 23:42 |
fungi | thanks for writing it! | 23:42 |
opendevreview | Ian Wienand proposed opendev/dstat_graph master: Add basic Zuul job https://review.opendev.org/c/opendev/dstat_graph/+/820642 | 23:46 |
clarkb | that event showed up in matrix. I think we're good there. thank you tristanC | 23:46 |
opendevreview | Merged openstack/project-config master: Un-pause gentoo-17-0-systemd images https://review.opendev.org/c/openstack/project-config/+/820590 | 23:51 |
opendevreview | Ian Wienand proposed opendev/dstat_graph master: Add basic Zuul job https://review.opendev.org/c/opendev/dstat_graph/+/820642 | 23:52 |
clarkb | the letsencrypt job failed because the limestone mirror is not reachable | 23:54 |
clarkb | I can confirm it doesn't ping and http doesn't seem to work either | 23:54 |
clarkb | openstack server list says the server is in a shut off state | 23:55 |
clarkb | did we do that? Should we try turning it back on again or add it to the emergency file instead for now? | 23:55 |
clarkb | updated | 2021-12-06T10:26:00Z presumably that is when it was shut off | 23:56 |
fungi | i don't recall shutting it off, likely something happened on the host | 23:57 |
clarkb | I'm going to try starting it | 23:59 |
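The manual steps here (look the server up, see it is SHUTOFF, note `updated` as the likely shutoff time, start it) map onto the `openstack server show` / `openstack server start` commands, or onto the openstacksdk compute proxy. The sketch below is hypothetical and assumes an object behaving like openstacksdk's `conn.compute` (its `find_server` and `start_server` methods); it is written against that interface so a stub can stand in for a real cloud.

```python
def start_if_shutoff(compute, name: str) -> bool:
    """Start `name` if Nova reports it SHUTOFF; return True if a start was sent.

    `compute` is assumed to look like openstacksdk's `conn.compute` proxy.
    """
    server = compute.find_server(name, ignore_missing=False)
    if server.status != "SHUTOFF":
        return False
    # updated_at is the last state change, i.e. roughly when it went down.
    print(f"{server.name} SHUTOFF since {server.updated_at}; starting")
    compute.start_server(server)
    return True
```

With a real cloud this would be called as `start_if_shutoff(openstack.connect(cloud="limestone").compute, "<mirror name>")`, where the cloud and server names are placeholders. Whether to start it or just add it to the emergency file is still the judgment call discussed above.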