mnasiadka | morning | 07:47 |
mnasiadka | https://meetings.opendev.org/#Containers_Team_Meeting - where can I change the url it's pointing to? Magnum team has moved to using "magnum" as the meeting name, so that link is pointing to very old archives | 07:47 |
mnasiadka | (I mean url for meeting logs) | 07:47 |
frickler | mnasiadka: https://opendev.org/opendev/irc-meetings/src/branch/master/meetings/containers-team-meeting.yaml | 07:50 |
mnasiadka | frickler: thanks :) | 07:53 |
jrosser | could i get a hold on job openstack-ansible-deploy-aio_capi-ubuntu-jammy for change 893240 | 09:49 |
frickler | jrosser: I can do that in a moment | 10:09 |
frickler | corvus: can you have a look at https://zuul.opendev.org/t/openstack/build/52f01074b2eb487993ede049d858a660 please? nova certainly does have a master branch, doesn't it? | 10:10 |
frickler | note that this is for stable/pike and I'm deleting the whole .zuul.yaml there now anyway, I'm just not sure how to interpret that error | 10:11 |
frickler | jrosser: done, recheck triggered | 10:14 |
frickler | elodilles: infra-root: I've done https://review.opendev.org/c/openstack/blazar/+/893846 now, but (of course) now with no job running at all it also cannot be merged. would you prefer adding a noop job like we do for newly created projects or just force merge? | 10:37 |
elodilles | frickler: if noop does the trick, then it's OK to me | 10:51 |
jrosser | frickler: thankyou | 10:54 |
opendevreview | Alex Kavanagh proposed openstack/project-config master: Add charms-purestorage group for purestorage charms https://review.opendev.org/c/openstack/project-config/+/893377 | 11:36 |
fungi | we'll probably be able to take advantage of this by the time we get mailman auth wired up to our keycloak server: https://github.com/pennersr/django-allauth/commit/ab70468 (that's just merged to the auth lib we're using) | 11:51 |
*** dviroel_ is now known as dviroel | | 12:27 |
*** dhill is now known as Guest2025 | | 12:32 |
*** d34dh0r5- is now known as d34dh0r53 | | 12:33 |
opendevreview | Alex Kavanagh proposed openstack/project-config master: Add the Pure Storage Flashblade charm-manila subordinate https://review.opendev.org/c/openstack/project-config/+/893910 | 13:08 |
opendevreview | Alex Kavanagh proposed openstack/project-config master: Add the Pure Storage Flashblade charm-manila subordinate https://review.opendev.org/c/openstack/project-config/+/893910 | 13:08 |
opendevreview | Alex Kavanagh proposed openstack/project-config master: Add the Pure Storage Flashblade charm-manila subordinate https://review.opendev.org/c/openstack/project-config/+/893910 | 13:13 |
opendevreview | Alex Kavanagh proposed openstack/project-config master: Switch charm-cinder-purestorage acl https://review.opendev.org/c/openstack/project-config/+/893912 | 13:16 |
opendevreview | Alex Kavanagh proposed openstack/project-config master: Switch charm-cinder-purestorage acl https://review.opendev.org/c/openstack/project-config/+/893912 | 13:16 |
opendevreview | Alex Kavanagh proposed openstack/project-config master: Add the Pure Storage Flashblade charm-manila subordinate https://review.opendev.org/c/openstack/project-config/+/893910 | 13:22 |
opendevreview | Harry Kominos proposed openstack/diskimage-builder master: feat: Add new fail2ban elemenent https://review.opendev.org/c/openstack/diskimage-builder/+/892541 | 13:25 |
fungi | corvus: whenever you have a moment, no urgency at all, but this is an example of a leaked image in iad from yesterday. oddly, i can't seem to find a mention of the image name in either of our builders: https://paste.opendev.org/show/bIzGJD8QKpW13Dqg4LpH/ | 13:27 |
fungi | there's a list of 24 leaked image uuids in ~fungi/iad.leakedimages on bridge | 13:29 |
fungi | in case you need more examples | 13:29 |
fungi | oh, right, because the image names the builders log are in the new build id suffix format rather than the serial suffix that gets uploaded | 13:35 |
fungi | i'm not sure how to reverse the serial suffix name to a build id in order to find the exact upload attempts | 13:35 |
fungi | since the upload gives up before it gets a uuid for the image | 13:37 |
fungi | and the builder no longer mentions the serial at all | 13:37 |
corvus | frickler: bug in zuul. fix in https://review.opendev.org/c/zuul/zuul/+/893925 | 14:02 |
frickler | corvus: ah, cool, thx | 14:05 |
frickler | meh, I fixed blazar, now zuul complains about blazar-dashboard https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/891628 . corvus is there a chance zuul could report all possibly affected projects at once? | 14:06 |
opendevreview | Merged openstack/project-config master: Reduce frequency of image rebuilds https://review.opendev.org/c/openstack/project-config/+/893588 | 14:09 |
opendevreview | Bernhard Berg proposed zuul/zuul-jobs master: prepare-workspace-git: Add ability to define synced pojects https://review.opendev.org/c/zuul/zuul-jobs/+/887917 | 14:10 |
corvus | frickler: that could get very large; it only reports the first to avoid leaving multi-GB messages. normally i'd suggest looking at codesearch, but i guess these are on unindexed branches? | 14:11 |
opendevreview | Merged openstack/project-config master: Add charms-purestorage group for purestorage charms https://review.opendev.org/c/openstack/project-config/+/893377 | 14:17 |
corvus | fungi: well, that sure doesn't look like it has nodepool metadata attached to it. also, neither does any other image in rackspace. | 14:31 |
corvus | oh wait i take that back | 14:31 |
corvus | i see nodepool_build_id='5fefa50bae64483b9e6f8b6a21664758', nodepool_provider_name='rax-dfw', nodepool_upload_id='0000000001' in one of the other images | 14:32 |
corvus | but i don't see those in the paste | 14:32 |
corvus | so it does look like they were not added to the leaked image | 14:32 |
fungi | i expect that metadata gets attached to images after they're imported, so if the sdk gives up waiting for the image to show up then it never gets around to attaching the metadata to it? but i'm really not familiar enough with the business logic in the sdk to know for sure. it could also just be some sort of internal timeout between services inside rackspace responsible for handing off the metadata, i suppose | 14:36 |
fungi | thanks for looking, corvus! i wonder if we can more conveniently script something that just checks our tenant for private images lacking nodepool metadata and cleans those up directly, rather than comparing uuids between lists from zk and glance | 14:38 |
fungi | looks like `openstack image list` allows filtering by property (key=value), so might be able to fetch a list of them that way | 14:40 |
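A minimal sketch of the cleanup approach fungi floats here, using openstacksdk rather than the CLI: list the tenant's private images and flag any that lack the nodepool_* properties corvus mentioned above. The cloud name, the reliance on image.properties for custom glance metadata, and the commented-out delete call are assumptions/placeholders, not a vetted script.

```python
import openstack

# "rax-iad" is a placeholder clouds.yaml entry name.
conn = openstack.connect(cloud="rax-iad")

candidates = []
for image in conn.image.images(visibility="private"):
    # nodepool normally stamps uploads with nodepool_build_id,
    # nodepool_provider_name and nodepool_upload_id; here we assume custom
    # glance metadata surfaces in image.properties.
    props = getattr(image, "properties", None) or {}
    if "nodepool_build_id" not in props:
        candidates.append(image)

for image in candidates:
    print(f"possible leak: {image.id} {image.name}")
    # conn.image.delete_image(image)  # only after manual review
```

Note that `openstack image list --property key=value` filters on a property being present with a given value; filtering on the absence of a property is the part that seems to need a small script along these lines.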
corvus | fungi: https://opendev.org/openstack/openstacksdk/src/branch/master/openstack/image/v2/_proxy.py#L755 | 14:42 |
corvus | yes, it looks like when using the v2 task api, the sdk attaches image metadata after the image is successfully imported. | 14:42 |
corvus | i think that's the api that is relevant here? | 14:42 |
fungi | i agree | 14:43 |
fungi | i guess a longer timeout in the image create call would counter that to some extent | 14:43 |
corvus | do you know where the task api is documented? | 14:43 |
fungi | hah | 14:43 |
fungi | sorry, it's a running joke | 14:43 |
fungi | the task api hails from the bad old days of "vendor extensible apis" so it can be/do just about anything the provider wants | 14:44 |
corvus | so i guess there would be "rackspace task api documentation" then? | 14:44 |
fungi | as i understand it, yes | 14:45 |
corvus | i'm wondering if it would be valid to include the metadata in https://opendev.org/openstack/openstacksdk/src/branch/master/openstack/image/v2/_proxy.py#L732 | 14:45 |
corvus | alternatively, i wonder if glance_task.result['image_id'] is available and valid in the case of an exception, and could be used to attach metadata in the exception handler | 14:46 |
fungi | i think this might be the current api doc for tasks, but of course rackspace is also running something ancient which may not be entirely glance either: https://docs.openstack.org/api-ref/image/v2/index.html#tasks | 14:47 |
fungi | i've observed test uploads to their glance, and the basic process is that the image create call returns an "import" task id (uuid) then the task is inspected and, once it completes, its data includes the uuid of the image that was created | 14:48 |
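Purely to illustrate the flow fungi describes here (and corvus's "idea #2" of attaching metadata from the task result), a rough sketch; the proxy method names, the six-hour polling window, the cloud name, the image URL and the property values are assumptions or placeholders, not the actual sdk or nodepool code:

```python
import time

import openstack

conn = openstack.connect(cloud="rax-iad")  # placeholder cloud name

# Create the glance "import" task; per the provider docs corvus links just
# below, image_properties may only carry the image name at this stage.
task = conn.image.create_task(
    type="import",
    input={
        "import_from": "https://example.test/images/ubuntu-jammy.vhd",
        "image_properties": {"name": "ubuntu-jammy-0000000123"},
    },
)

# Poll the task well past the point the builder currently gives up; the
# image can appear hours after the upload itself finished.
deadline = time.time() + 6 * 3600
while time.time() < deadline:
    task = conn.image.get_task(task)
    if task.status in ("success", "failure"):
        break
    time.sleep(60)

if task.status == "success":
    # This is the glance_task.result['image_id'] corvus refers to; once we
    # have it we can attach the nodepool metadata ourselves, even if an
    # earlier wait timed out.
    image_id = task.result["image_id"]
    conn.image.update_image(
        image_id,
        nodepool_build_id="5fefa50bae64483b9e6f8b6a21664758",  # example values
        nodepool_provider_name="rax-dfw",
        nodepool_upload_id="0000000001",
    )
```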
corvus | https://docs-ospc.rackspace.com/cloud-images/v2/api-reference/image-task-operations#task-to-import-image | 14:49 |
corvus | warning:: Name is the only property that can be included in image-properties. Including any other property will cause the operation to fail. | 14:49 |
corvus | so that's a no on idea #1 | 14:49 |
corvus | fungi: yeah, but since it's leaving an image object around, perhaps it includes the image id in the failure case too | 14:50 |
corvus | so idea #2 might work if that's the case | 14:50 |
fungi | well, we're not getting task failures. the sdk is just giving up waiting for the image to be ready | 14:50 |
fungi | once the image is ready, the task object does include the uuid of the image | 14:50 |
fungi | regardless of whether the sdk is still looking for it at that point | 14:51 |
fungi | it's just that it might be 6 hours after the image was uploaded | 14:51 |
corvus | oh | 14:51 |
corvus | it looks like the timeout comes from the create_image call, so i think we can pass that in from nodepool | 14:52 |
fungi | yes, we can. frickler made a temporary patch that hard-coded a nondefault timeout value in the builder's invocation of the sdk method | 14:52 |
fungi | and afterward you indicated that adding a config option in nodepool for that would be acceptable if we wanted it | 14:53 |
fungi | we were just pursuing other avenues first before determining whether it was necessary | 14:53 |
corvus | we used to have a 6 hour timeout, but apparently that disappeared | 14:54 |
corvus | probably lost in one of the sdk refactors | 14:55 |
corvus | https://opendev.org/zuul/nodepool/src/branch/master/nodepool/builder.py#L43 | 14:55 |
corvus | but there's the constant; unused. | 14:55 |
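For context, roughly the shape such a fix could take: re-wire that unused six-hour value (or a per-provider setting) into the sdk call. This is a sketch only, not the content of the change corvus posts below; the image_upload_timeout config attribute is hypothetical, and the create_image kwargs are assumed to match the sdk's cloud-layer signature.

```python
# Fall back to a six-hour wait (the old default discussed above) unless the
# provider diskimage config supplies its own timeout. The config attribute
# name is made up for illustration.
DEFAULT_IMAGE_UPLOAD_TIMEOUT = 6 * 60 * 60


def upload_image(cloud, image_name, filename, provider_image, meta):
    timeout = (
        getattr(provider_image, "image_upload_timeout", None)
        or DEFAULT_IMAGE_UPLOAD_TIMEOUT
    )
    # openstacksdk's cloud-layer create_image() takes wait/timeout kwargs;
    # with wait=True it polls until the image is active or the timeout hits.
    return cloud.create_image(
        name=image_name,
        filename=filename,
        wait=True,
        timeout=timeout,
        meta=meta,
    )
```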
fungi | oh wow | 14:56 |
fungi | looks like it was last used in snapshot image updates, removed by https://review.openstack.org/396719 in 2016? | 14:58 |
fungi | and removed from upload waiting by https://review.openstack.org/226751 (presumably similarly set in shade back then) | 15:00 |
fungi | though i'm not immediately finding it in shade's history if so | 15:01 |
fungi | so anyway, i take this to mean we haven't had that 6-hour timeout in production for about 7 years | 15:02 |
fungi | wasn't a recent change | 15:02 |
corvus | remote: https://review.opendev.org/c/zuul/nodepool/+/893933 Add an image upload timeout to the openstack driver [NEW] | 15:05 |
fungi | thanks! | 15:05 |
fungi | clarkb: does https://zuul.opendev.org/t/openstack/build/cfe503710fa543618da6bf513c7576ba mean that opensuse dropped the libvirt-python package? i see a python3-libvirt-python in /afs/openstack.org/mirror/opensuse/distribution/leap/15.2/repo/oss/INDEX.gz but i have no idea if i'm looking in the right place | 15:07 |
clarkb | fungi: https://software.opensuse.org/package/libvirt-python "there is no official package for ALL distributions" | 15:14 |
clarkb | looking at https://software.opensuse.org/search?baseproject=ALL&q=libvirt-python I think you may need different package names for different suse releases | 15:15 |
clarkb | I would drop it from the bindep test | 15:15 |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Drop libvirt-python from suse in bindep fallback https://review.opendev.org/c/openstack/project-config/+/893935 | 15:19 |
fungi | clarkb: ^ thanks! | 15:19 |
fungi | apparently, git-review takes roughly 5 minutes to give up trying to reach gerrit over ipv6 | 15:20 |
clarkb | I think that will be based on your system tcp settings? | 15:21 |
clarkb | zanata didn't rotate mysql sessions overnight and the error still occurs. I see there is a wildfly systemd unit running on the server which I will restart now to see if that changes things | 15:22 |
clarkb | ianychoi[m]: the DB sql_mode update to the 5.6 default plus a restart of the zanata service appears to have your api request working | 15:24 |
corvus | i'm going to restart the schedulers now to get the default branch bugfix | 15:28 |
fungi | thanks corvus! | 15:28 |
fungi | clarkb: yeah, i know i can tweak the protocol fallback in the kernel, just hoping whatever the v6 routing issue is between here and there clears up soon | 15:29 |
corvus | i'm restarting web too; not necessary, but just to keep schedulers ~= web | 15:30 |
fungi | my outbound traceroute to the server transits eqix-ash.bdr01.12100sunrisevalleydr01.iad.beanfield.com and gets as far as 2607:f0c8:2:1::e (no reverse dns but whois puts that somewhere in beanfield as well), but the return route to me from the gerrit server seems to never escape the vexxhost border (bouncing back and forth between 2604:e100:1::1 and 2604:e100:1::2) | 15:32 |
fungi | guilhermesp_____: mnaser: ^ any idea if there's a routing table issue in ca-ymq-1 which could explain that? can you see where packets for 2600:6c60:5300:375:96de:80ff:feec:f9e7 are trying to go? | 15:34 |
fungi | it's been like this for at least a week, so doesn't seem to be clearing up on its own | 15:34 |
fungi | maybe my isp's announcements are being filtered at the edge or by a peer | 15:35 |
fungi | i haven't had trouble reaching other things over ipv6 though, i can get to servers in vexxhost's sjc1 region over ipv6 with no problem, for example | 15:36 |
corvus | #status log restarted zuul schedulers/web to pick up default branch bugfix | 15:37 |
opendevstatus | corvus: finished logging | 15:37 |
corvus | fungi: who needs to review https://review.opendev.org/893792 (openstack-zuul-jobs) ? | 15:38 |
fungi | corvus: i can, meant to look earlier, thanks! | 15:38 |
corvus | oh cool, thanks :) | 15:38 |
clarkb | https://23.253.22.116/ <- is a held gerrit running in a bookworm container with java 17 | 15:40 |
corvus | we might still see some default branch errors since those values are cached. they should clear as config changes are merged and the cache is updated; but we may still want to do a zuul-admin delete-state this weekend just to make sure all traces are gone. or if it becomes a real problem, we can do that earlier, but that's an outage, so i'd like to avoid it. | 15:40 |
clarkb | Gerrit + bookworm + java 17 seems to generally work. I think we can plan for a short gerrit outage to restart on that container if others agree | 15:42 |
fungi | seems fine, i suppose we could time that to coincide with the zuul restart? | 15:42 |
opendevreview | Merged openstack/project-config master: Drop libvirt-python from suse in bindep fallback https://review.opendev.org/c/openstack/project-config/+/893935 | 15:44 |
clarkb | the zuul restart is done? | 15:45 |
fungi | clarkb: the full down delete-state restart | 15:48 |
fungi | not the rolling restart | 15:48 |
clarkb | ah, I think that restart depends on gerrit being up, but yes we could do gerrit first and then zuul | 15:49 |
frickler | corvus: those issues are all on stable/pike it seems, and I can understand the concern about error size, ack. so I'll work my way through them one by one and hope that'll be a finite task | 15:52 |
clarkb | frickler: might be able to do a git grep for the negative lookahead regex and catch a lot of them upfront? | 15:54 |
frickler | clarkb: that's not about regexes, but about removing some old project templates https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/891628 | 15:56 |
frickler | and I don't have most of those repos locally | 15:57 |
clarkb | ah | 15:57 |
frickler | except now, when I need to create a patch for them | 15:57 |
corvus | another option to consider would be to force-merge the removal, then cleanup based on the new static config errors. you'd have to be pretty sure that isn't going to break branches you care about, because it will definitely break project configs on affected branches. not advocating it, just brainstorming. :) | 15:59 |
frickler | afaict all of those branches are only waiting to be eoled, because they predate release automation. so that's a good idea. let me try to do two more projects manually, and if more show up then we can take that path | 16:03 |
frickler | likely no one would even care if they stay broken until they're eoled | 16:03 |
frickler | I have a quiet fear that blazar might not be the only odd project, but rather just early in the alphabetical list of more to come | 16:04 |
opendevreview | Clark Boylan proposed openstack/project-config master: Set fedora labels min-ready to 0 https://review.opendev.org/c/openstack/project-config/+/893961 | 16:08 |
clarkb | infra-root ^ that one should be safe to merge now | 16:08 |
frickler | that sounds like a good first step, ack | 16:09 |
opendevreview | Clark Boylan proposed openstack/project-config master: Remove fedora-35 and fedora-36 from nodepool providers https://review.opendev.org/c/openstack/project-config/+/893963 | 16:17 |
opendevreview | Clark Boylan proposed openstack/project-config master: Remove fedora image builds https://review.opendev.org/c/openstack/project-config/+/893964 | 16:17 |
clarkb | These two should land on monday with some oversight to ensure things clean up sufficiently | 16:17 |
opendevreview | Merged openstack/project-config master: Set fedora labels min-ready to 0 https://review.opendev.org/c/openstack/project-config/+/893961 | 16:24 |
corvus | the next nodepool restarts should happen on bookworm-based images | 16:28 |
corvus | so heads up for any unexpected behavior changes there | 16:29 |
frickler | corvus: do you want to restart now/soon or wait for the usual cycle? | 16:33 |
corvus | i was assuming we just wanted the usual cycle; but i don't have a strong opinion | 16:42 |
frickler | if we see the probability of issues as low I'm fine with waiting, otherwise a restart where we can more closely watch might be better | 17:03 |
*** ykarel is now known as ykarel|away | | 17:29 |
clarkb | the nodepool services have updated fwiw | 17:29 |
clarkb | live disk images are huge these days. | 17:37 |
fungi | going to pop out for a late lunch, bbiab | 17:50 |
frickler | ah, I was confused earlier, thinking that nodepool would be updated together with zuul on the weekly schedule. sorry if what I was saying didn't make much sense | 17:58 |
clarkb | frickler: ah ya nodepool updates ~hourly | 18:00 |
clarkb | my laptop exhibits the same behavior under ubuntu jammy's desktop livecd, which runs linux 6.2, a kernel version that definitely worked back when I ran it. Unfortunately that makes me pretty confident I have a hardware issue | 18:00 |
guilhermesp_____ | fungi: huuuum i think to be able to filter then i would need to check which hv the vm lives on... we did have a minor issue with core switches this week but it was in the amsterdam region... nothing suspicious in mtl | 18:19 |
*** ralonsoh is now known as ralonsoh_away | | 18:55 |
frickler | corvus: zuul seems to be reporting the errors twice on https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/891628 each time, I didn't notice that earlier, but seems to have been happening since the first result | 19:09 |
frickler | also now some charm repo is broken. | 19:09 |
frickler | will continue looking at that tomorrow | 19:09 |
corvus | frickler: likely due to being in two tenants | 19:13 |
corvus | (or could just be 2 pipelines) | 19:14 |
fungi | guilhermesp_____: i see the same problem to both 16acb0cb-ead1-43b2-8be7-ab4a310b4e0a (review02.opendev.org) and a064b2e3-f47c-4e70-911d-363c81ccff5e (mirror01.ca-ymq-1.vexxhost.opendev.org) | 19:15 |
fungi | guilhermesp_____: from the cloud side it looks like a routing loop, but it's hard to tell with the limited amount of visibility i see | 19:16 |
corvus | frickler: it's because it's in check and check-arm64 | 19:17 |
corvus | clarkb: fungi maybe gerrit + zuul reboot saturday morning pst? | 20:04 |
fungi | i should be around, sgtm | 20:05 |
fungi | happy to do some/all of it | 20:05 |
clarkb | I can be around for that too | 20:14 |
corvus | i wonder how long it's been since we actually had zuul offline. it's been a pretty good run. :) | 20:16 |
clarkb | we should probably merge the gerrit change on friday then since we won't auto restart on the new image | 20:17 |
opendevreview | Merged zuul/zuul-jobs master: Remove the nox-py27 job https://review.opendev.org/c/zuul/zuul-jobs/+/892408 | 20:46 |
fungi | this discussion seems hauntingly familiar: https://discuss.python.org/t/33122 | 21:50 |
guilhermesp_____ | fungi: hum ok... im going out for holidays tomorrow. Maybe if you could just open up a ticket for us to keep track and we can investigate that ( or someone else from my team tomorrow when im back ) -- just send an email to support@vexxhost.com | 21:58 |
fungi | guilhermesp_____: sure, will do. thanks! | 22:32 |