corvus | naturally that failed; so i'm going to set another autohold and run again | 00:01 |
---|---|---|
mnasiadka | clarkb: Right, any idea when that might happen? I don’t think DIB is managed in openstack/releases? | 05:39 |
ianw | mnasiadka: dib releases are manual - clarkb happy to make one if you wish | 07:46 |
kevko | Hi, anybody know what is happening with pypi mirror ? | 11:36 |
kevko | Connection to mirror.bhs1.ovh.opendev.org timed out. (connect timeout=60.0)')': /pypi/simple/setuptools/ | 11:36 |
ianw | it does look like mirror.bhs1.ovh.opendev.org is not responding | 11:57 |
ianw | it reports as active from api | 12:02 |
kevko | can anybody check it please ? | 12:06 |
ianw | i can't see the console | 12:16 |
fungi | confirmed, i can ssh into mirror.gra1.ovh but not mirror.bhs1.ovh, checking the nova api | 12:16 |
fungi | server show says it's active, console log show is taking a while to return | 12:18 |
ianw | mirror03.gra1.ovh.opendev.org console works | 12:18 |
ianw | fungi: never returns for me; and also trying to get to it via the OVH mgmt website throws an error | 12:19 |
fungi | console url show also seems to be timing out | 12:19 |
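For reference, the checks described above map onto standard openstackclient commands; a minimal sketch, using the mirror's server name as identified later in this log:

```shell
# What state does nova think the instance is in?
openstack server show mirror02.bhs1.ovh.opendev.org -f value -c status

# Dump the serial console log (the call that was hanging here)
openstack console log show mirror02.bhs1.ovh.opendev.org

# Request a URL for the interactive console
openstack console url show mirror02.bhs1.ovh.opendev.org
```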
ianw | fungi: want me to reboot it, see what happens? i think this is an ovh problem | 12:20 |
fungi | ianw: should we try to do a server reboot? | 12:20 |
fungi | yeah, agreed | 12:20 |
fungi | go for it | 12:21 |
fungi | shows it's in a reboot state | 12:21 |
ianw | yeah, if console doesn't come back maybe a full stop/start | 12:22 |
fungi | https://public-cloud.status-ovhcloud.com/ doesn't indicate any widespread issue in that region at least | 12:23 |
fungi | if we can't get it restarted, we can temporarily turn down that nodepool region | 12:24 |
ianw | sigh, still rebooting, and can't stop it if it's rebooting | 12:25 |
fungi | well, stop would probably have failed similarly | 12:25 |
fungi | i've set max-servers to 0 there for the moment | 12:29 |
fungi | i'll add nl04 to the emergency disable list while i put a change together | 12:29 |
ianw | ++ | 12:33 |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Temporarily disable OVH BHS1 launching in nodepool https://review.opendev.org/c/openstack/project-config/+/931777 | 12:34 |
fungi | infra-root: ^ | 12:34 |
fungi | once that merges i'll take nl04 out of the emergency disable list | 12:35 |
Clark[m] | ianw: I think a dib release should be fine. There have only been a few commits since the recent release | 13:45 |
Clark[m] | fungi: the mirror responds to http(s) for me now | 13:50 |
opendevreview | James E. Blair proposed opendev/zuul-jobs master: Finish upload job https://review.opendev.org/c/opendev/zuul-jobs/+/931355 | 14:01 |
corvus | apparently it needs the segment-size argument ^ | 14:02 |
corvus | i thought that was automatic based on my previous testing, but for some reason, this time it just did one stream | 14:03 |
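For context, the swift CLI only splits an upload into concurrent segment streams when a segment size is given; a hedged sketch (container name, object name, and size are illustrative):

```shell
# Without --segment-size the object goes up as a single stream;
# with it, swift uploads segments in parallel and writes a manifest object
swift upload --segment-size 1073741824 images image.qcow2
```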
fungi | Clark[m]: oh! excellent, i was afk for a few but will check to see if they updated the ticket | 14:03 |
corvus | Clark: fungi ^ if you have a sec to re-review 931355 i think we can try again :) | 14:03 |
fungi | yep, on it | 14:04 |
fungi | lgtm! | 14:05 |
fungi | ovh hasn't replied to the ticket yet as far as i can see, so we should probably wait a bit for a post-mortem before we assume it's staying up | 14:07 |
fungi | also ssh is still timing out for me | 14:08 |
fungi | which is odd since https is working | 14:08 |
corvus | fungi: ssh wfm | 14:09 |
corvus | ipv4 | 14:09 |
fungi | yeah, socket timeout reaching 22/tcp over ipv6 | 14:09 |
fungi | v4 is indeed working | 14:09 |
fungi | aha, 443/tcp isn't reachable over v6 either | 14:10 |
fungi | so ipv6 connectivity to the mirror is still broken | 14:10 |
fungi | the server never rebooted either, "up 321 days" | 14:10 |
fungi | and yes, it's still stuck reporting "reboot" status according to nova | 14:11 |
fungi | i'll try rebooting it from the cli since i can reach it over v4 | 14:11 |
fungi | worth noting, the v6 default route on the node is still there, and through a gateway that's marked reachable in its neighbor table | 14:12 |
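The route and neighbor checks fungi describes look roughly like this when run on the mirror (a sketch; the gateway address is the one quoted later in this log):

```shell
# Is the default IPv6 route still present?
ip -6 route show default

# Gateway state in the neighbor table (REACHABLE/STALE/FAILED)
ip -6 neigh show | grep -i 2607:5300:201:2000::1

# Does the gateway answer ping from the server?
ping -6 -c 3 2607:5300:201:2000::1
```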
corvus | i just lost a connection to bridge | 14:14 |
fungi | sorry, that was me :/ | 14:14 |
corvus | whew. good to clean out the cobwebs every now and then anyway :) | 14:14 |
fungi | yeah, it's on its way back up now | 14:14 |
fungi | and up again | 14:15 |
fungi | #status log Rebooted bridge01 | 14:15 |
opendevstatus | fungi: finished logging | 14:16 |
fungi | at least we're running on a new kernel that way | 14:16 |
fungi | apologies to anyone who had something running there that i accidentally interrupted! | 14:16 |
fungi | so anyway, mirror02.bhs1.ovh has a default v6 route through 2607:5300:201:2000::1 and that's reachable and responding to ping from the server | 14:18 |
fungi | when i traceroute6 from home to the mirror, the last hop that responds is an address in an ovh assignment | 14:21 |
fungi | it's a small (/44) allocation, but unfortunately it has a very generic netname in whois so i don't know how far into their network that really is | 14:22 |
Clark[m] | Is it possible that v6 was the problem all along? | 14:23 |
fungi | entirely possible, i didn't think to try v4 networking, maybe ianw didn't either | 14:24 |
fungi | tracerouting in the opposite direction from the mirror to my home v6 address, it gets two hops through ovh's network and then stops | 14:24 |
fungi | i'll try a command-line reboot from the server now, but have an increasing suspicion it will come back into the same state | 14:25 |
stephenfin | fungi: clarkb: Would one of you be able to change the topic of the #openstack-sdks channel for us. We'd like it to point back to launchpad rather than storyboard | 14:26 |
Clark[m] | It wouldn't surprise me if the control plane is trying to use ipv6 too and failing, so the state changes never occur/register | 14:27 |
stephenfin | fungi: ...and I see you just replied on #openstack-sdks. Sorry for the noise | 14:27 |
corvus | should we consider omitting inactive repos from the on-image cache? | 14:29 |
fungi | i thought we did, i guess we only skip them if they're using the retired acl? | 14:29 |
corvus | oh that may be, i was just assuming based on names i see scrolling by; but yeah, maybe some of these just aren't actually retired | 14:30 |
fungi | we should at least consider also skipping any that aren't in any zuul tenant | 14:30 |
Clark[m] | Yes we should skip any with retired acls. Worth double checking though | 14:30 |
fungi | since we've been removing repos with broken job configs from tenants | 14:30 |
corvus | yes, we do have many fewer repos on the image than in projects.yaml | 14:31 |
fungi | it's also possible that when openstack got its own separate retired project acl, we didn't add it to the list of what to skip | 14:31 |
corvus | so i assume it's working, but we might be able to eke out a small improvement if we add in some other heuristics like zuul tenancy | 14:31 |
corvus | if acl and os.path.basename(acl) == 'retired.config': | 14:32 |
fungi | so, rebooting mirror02.bhs1.ovh neither fixed its v6 reachability nor did it clear the reboot status nova is reporting for it | 14:33 |
corvus | i think that retired.config works for both | 14:33 |
fungi | it remains reachable via ipv4 after the reboot however | 14:33 |
fungi | i'll update the ovh trouble ticket in a few minutes with latest findings | 14:34 |
clarkb | I guess we should still land the nodepool config change since no ipv6 is a problem for job success? | 14:37 |
clarkb | fungi: if you agree ^ maybe go ahead and +A the change? I +2'd it but didn't approve in case we think we can just return it to service somehow | 14:38 |
mnasiadka | ianw, clarkb : I assume we want to hold off with merging https://review.opendev.org/c/openstack/diskimage-builder/+/924421 with that planned release to fix locale and then do a major one after this lands? | 14:39 |
clarkb | mnasiadka: yes that seems like a good plan. Do a quick bugfix release for locales then plan to do a bigger release with the potentially breaking for users change to the rocky element | 14:41 |
clarkb | I don't think that will affect opendev because we always specify vm but other users may be impacted | 14:41 |
opendevreview | James E. Blair proposed opendev/zuul-jobs master: WIP: testing https://review.opendev.org/c/opendev/zuul-jobs/+/931347 | 14:44 |
corvus | clarkb: fungi ^ i'd like to see if we can use ansible env variables to hold the credential information and make it safe to run without no_log -- can you double check that test change before i run it through its paces? | 14:46 |
fungi | lookin | 14:46 |
corvus | (i want to run the job and then see if the fake credential strings show up in any of the logs) | 14:47 |
clarkb | corvus: you seem to have overridden the secret in its entirety so I don't think there is any way for that to leak info | 14:49 |
clarkb | or rather leak sensitive info. We don't know if it will leak the public test data | 14:49 |
opendevreview | Doug Goldstein proposed openstack/project-config master: Update ironic ACL for editHashtags https://review.opendev.org/c/openstack/project-config/+/931799 | 14:50 |
opendevreview | Mohammed Naser proposed zuul/zuul-jobs master: Stop using temporary registry https://review.opendev.org/c/zuul/zuul-jobs/+/931713 | 14:55 |
corvus | clarkb: yep, thanks; just wanted more eyes on that to make sure i didn't miss something :) | 14:57 |
corvus | i'll send it and we can see what happens | 14:58 |
fungi | yes, looks like a safe test | 14:59 |
fungi | and then we can examine the ansible output/manifest to see if the overridden strings show up anywhere | 14:59 |
opendevreview | Merged opendev/zuul-jobs master: Finish upload job https://review.opendev.org/c/opendev/zuul-jobs/+/931355 | 15:03 |
fungi | merge failed | 15:03 |
clarkb | fungi: merge of what failed? | 15:04 |
fungi | Error merging gerrit/opendev/zuul-jobs for 931347,11 | 15:04 |
fungi | the wip child of the change that just merged | 15:04 |
fungi | so we didn't get an actual build in gate to inspect | 15:05 |
fungi | outdated parent? | 15:05 |
clarkb | I think they may have been disconnected in git so not actually sharing a relationship to resolve conflicts | 15:05 |
fungi | ah | 15:05 |
fungi | yes, correct. they merge-conflicted in gate | 15:06 |
opendevreview | Merged openstack/project-config master: Update ironic ACL for editHashtags https://review.opendev.org/c/openstack/project-config/+/931799 | 15:06 |
fungi | so i guess it needs a rebase on the current branch tip | 15:06 |
fungi | mirror.bhs1.ovh is reachable over ipv6 again! | 15:10 |
clarkb | I wonder if ipv6 coming back allowed the nova status to reconcile too | 15:11 |
fungi | status is still "reboot" in nova though | 15:11 |
opendevreview | James E. Blair proposed opendev/zuul-jobs master: WIP: testing https://review.opendev.org/c/opendev/zuul-jobs/+/931347 | 15:16 |
opendevreview | James E. Blair proposed opendev/zuul-jobs master: WIP: testing https://review.opendev.org/c/opendev/zuul-jobs/+/931347 | 15:17 |
corvus | okay i think that dtrt | 15:18 |
corvus | oh! i forgot something in the real change -- the artifact was returned from the role i removed, so i need to add that back. | 15:18 |
corvus | oh here's a pickle -- we were using sdk to get the endpoint to construct the url of the image we uploaded; i wonder if the swift cli can provide that information | 15:21 |
timburke | corvus, running `swift stat -v <container> <obj>` should give you the full URL of the object (among a bunch of other info) | 15:24 |
corvus | timburke: perfect thanks! | 15:24 |
timburke | if you're just interested in the OS_STORAGE_URL, you can get that with `swift auth` | 15:24 |
corvus | oh yeah that'll work too | 15:25 |
corvus | neither of these are hard to parse -- but json output might be cool to have someday :) | 15:26 |
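A sketch of the two commands timburke mentions (container and object names are illustrative):

```shell
# Prints the full object URL along with ETag, size, and other headers
swift stat -v images image.qcow2

# Prints just the storage URL and token as shell exports, e.g.
#   export OS_STORAGE_URL=https://swift.example.com/v1/AUTH_...
#   export OS_AUTH_TOKEN=gAAAA...
swift auth
```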
timburke | wrote up https://bugs.launchpad.net/python-swiftclient/+bug/2083948 | 15:36 |
corvus | ++ | 15:36 |
opendevreview | James E. Blair proposed opendev/zuul-jobs master: Return image artifacts https://review.opendev.org/c/opendev/zuul-jobs/+/931815 | 15:59 |
corvus | clarkb: fungi https://zuul.opendev.org/t/opendev/build/fd8e5201483c4f5688f1368774f50885 the only "leak" i see is the output of the credential id in that error message; i think we decided we weren't very worried about that. i don't see "testcredentialsecret" in any of the output files, and i don't see testcredentialid anywhere except in those error messages, so i'm inclined to think this approach should be safe. | 16:03 |
corvus | if you agree, then https://review.opendev.org/931815 incorporates that, along with getting the url and checksums for returning the artifact. | 16:04 |
fungi | yeah, not long ago i was wondering really how sensitive we found that id, and a quick grep of irc logs indicates we've pasted urls with the id in them with some regularity | 16:05 |
fungi | since it appears as part of a url for at least some systems, i wouldn't consider it worth trying to keep secret | 16:06 |
clarkb | ya the only other thought I've got is wondering what the scope of that credential is. | 16:06 |
clarkb | If it does leak because ansible or swiftclient change, then what is the impact | 16:06 |
clarkb | is it bad enough that we want to try the dedicated user with acl thing and see if we can make that work instead or is the application credential scoped to that service and region already so maybe we care less? I don't know | 16:07 |
fungi | well, to be clear, i meant the parent account id is scattered all over the place, so worrying about a sub-credential id is fairly pointless | 16:07 |
clarkb | fungi: this isn't currently a sub credential aiui fwiw | 16:08 |
clarkb | but I agree that it seems like any id is fine to expose | 16:08 |
fungi | yes, passwords and api keys are what we should be worried about guarding | 16:09 |
corvus | oh the application credential that is the subject of this secret is not our main credential. it is an "application credential" that i created just for uploading images | 16:10 |
corvus | i don't think it's the thing that shows up in the url | 16:10 |
clarkb | corvus: right the thing in the url should be the id portion. What I'm curious to know is if that credential is a global one for the account | 16:11 |
clarkb | I don't know what kind of scoping it has if any | 16:11 |
corvus | yes it is global for the account; the scoping that it has seems pretty coarse | 16:11 |
clarkb | in that case I think my inclination would be to be careful here even if that means continuing to no log things | 16:11 |
corvus | there were like 6 things like "creator" "reader" and some others. no idea what any of them mean. | 16:12 |
clarkb | maybe rewrite them so that we can manually remove the no log to aid in debugging later while still likely being safe? | 16:12 |
fungi | i'm still not especially worried about the application credential id as long as there's still a strong api key or password we're not exposing, but i understand the hesitance | 16:13 |
corvus | clarkb: which part concerns you? having them as env variables? potential leaking of the id? or potential leaking of the secret? | 16:13 |
clarkb | corvus: potential leaking of the secret without no_log if, say, exception handling does the wrong thing | 16:14 |
fungi | in unrelated news, mirror02.bhs1.ovh.opendev.org is Status:ACTIVE again, so probably safe to put that region back into use but i've seen no reply on our trouble ticket about it yet | 16:15 |
jrosser | it is possible to add access rules to an application credential (different to keystone roles), to limit their usability to particular apis | 16:16 |
fungi | looks like the mirror has been up since the manual reboot i performed, so whatever got corrected was only on the backend | 16:16 |
corvus | clarkb: okay, if we're worried about command line clients outputting secrets, then i have no counter to that. but if so, then we should probably not use the environment variables at all since anything could access it and print it. that includes that "swift stat" command i just added. | 16:16 |
clarkb | corvus: historically openstack hasn't been very good about this. Openstack client very explicitly has tried to not leak things that way but historically the other client tools have not | 16:17 |
corvus | jrosser: possible for a user or a cloud admin? | 16:17 |
clarkb | unfortunately, we're not able to use the openstackclient here without hitting other bugs so... | 16:18 |
jrosser | corvus: you can do that as a user https://docs.openstack.org/keystone/latest/user/application_credentials.html#access-rules | 16:18 |
jrosser | what is difficult with it though, is for more complex things, perhaps server create, that you just have to know that glance/neutron/<other> are also involved and those have to be allowed as well | 16:19 |
clarkb | in this case it would just be for swift so maybe we're lucky with a simple case? | 16:19 |
corvus | jrosser: neat. i did not see that in the web ui i used to create the cred; but we could try doing that from the cli. | 16:19 |
jrosser | could well be, yes.... simple things will be much more straightforward | 16:20 |
fungi | fwiw, we've considered any leak of client credentials in client/lib projects a severe security vulnerability unless it occurs when debugging options are enabled | 16:20 |
jrosser | if you're concerned that the application credential is too powerful then access rules are useful, but I have found them to be quite difficult to configure | 16:20 |
clarkb | fungi: but aiui they still occur with non zero frequency in logs/crash handling? | 16:21 |
fungi | they shouldn't, unless it's in debug level log entries | 16:21 |
fungi | and even then, projects have moved toward trying to mask them (there's specific functionality for that in oslo.log now since years) | 16:22 |
corvus | (if we are concerned about this i could have saved some time and not run that test since this is not falsifiable with a test) | 16:22 |
clarkb | swiftclient doesn't use oslo.log I don't think | 16:22 |
opendevreview | James E. Blair proposed opendev/zuul-jobs master: Return image artifacts https://review.opendev.org/c/opendev/zuul-jobs/+/931815 | 16:24 |
corvus | okay, there it is with no env variables and no logging i think | 16:25 |
clarkb | ah but it does use keystoneauth1 which does use oslo.log? anyway, if we scope things I feel more comfortable dropping the no_log; if we're not scoping then I think we should be careful /me reviews the change above | 16:26 |
opendevreview | James E. Blair proposed opendev/zuul-jobs master: Delete images after 72 hours https://review.opendev.org/c/opendev/zuul-jobs/+/931819 | 16:28 |
clarkb | corvus: re the use of async: does that help if we're still doing things serially task by task on a single node? (I'm just trying to understand the benefit to async in the hash tasks) | 16:28 |
corvus | yes, those get backgrounded and then we check the result after the upload | 16:28 |
clarkb | oh I see we check for them later | 16:29 |
clarkb | got it, that's special poll=0 behavior | 16:29 |
corvus | yup | 16:30 |
clarkb | looking at that application credential acl stuff and thinking about jrosser's feedback I wonder if openstack could provide some "recipes" for that | 16:38 |
clarkb | like one for "create server" and another for "swift usage but nothing else" | 16:38 |
clarkb | but I think we could set up swift acls by setting path to /**, service to object/swift/whatevertheofficialserviceis, and then create a rule like that for one each of HEAD,PATCH,GET,POST,PUT ? | 16:39 |
fungi | i've removed nl04 from the emergency disable list, set max-servers back to 120 for ovh-bhs1, abandoned 931777 and closed the ticket in ovh about it | 16:39 |
clarkb | if we wanted to be even fancier we could scope it to the specific container using better path rules | 16:40 |
clarkb | but I think even just "this can only do swift api actions" is a big improvement | 16:40 |
corvus | do we have to specify the method? can we use * instead of GET? | 16:43 |
clarkb | corvus: the docs that were linked only mention wildcards being valid for path not method. But maybe the docs are incomplete? | 16:43 |
jrosser | here is what we ended up with for server create https://paste.opendev.org/show/bLejT3tuJqaUusE61tX4/ | 16:45 |
clarkb | oh swift probably uses DELETE too | 16:45 |
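Assembling clarkb's sketch into a concrete command, a swift-only application credential might look something like this (hedged: "object-store" as the service type is an assumption, as is the credential name, and the linked docs don't say whether the method field accepts wildcards):

```shell
# Hypothetical swift-only application credential; each rule allows one
# HTTP method against any object-store path
openstack application credential create image-uploader --access-rules '[
  {"service": "object-store", "path": "/**", "method": "HEAD"},
  {"service": "object-store", "path": "/**", "method": "GET"},
  {"service": "object-store", "path": "/**", "method": "PUT"},
  {"service": "object-store", "path": "/**", "method": "POST"},
  {"service": "object-store", "path": "/**", "method": "DELETE"}
]'
```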
clarkb | jrosser: recipe #1 right there :) more seriously that sort of thing would probably make a good appendix to the doc you linked to? | 16:47 |
clarkb | then people can add additional ones as they are created and known to work? | 16:47 |
jrosser | that would be great really | 16:47 |
jrosser | i think what was most difficult about it was that as an end user it was pretty opaque why the rule you were working on did not work | 16:48 |
jrosser | it was only by digging through the service logs that you could find some 2nd-order 4xx to then allow that as well | 16:48 |
fungi | keep in mind that if someone compromised the image publishing credentials, they could in theory upload their own image to replace one mid-process before it got pushed to glance and added to the zuul launcher, so could inject a custom version of some binary which then alters our container images of things like zuul or gerrit, backdooring them. or alter openstacksdk release packages so that when we install a new version on our bastion server it's compromised. as such i'm not sure there's a ton of benefit from spending lots of time trying to tightly scope it, since uploading images to the swift container is already a possible key to the kingdom (even if a somewhat circuitous one) | 16:51 |
jrosser | i would also note the gigantic caveat in the docs `Application credentials with access rules require additional configuration of each service that will use it` | 16:51 |
clarkb | jrosser: on the backend you mean? I guess they have to opt into checking the restrictions with keystone? | 16:52 |
corvus | so we'll need to figure out if this will work with rax-flex | 16:52 |
jrosser | https://opendev.org/openstack/keystone/commit/3856cbf10d4d19b9d7797d600ef096b0c04aaedb | 16:53 |
corvus | fungi: zuul can perform checksum when it's doing the cloud upload (i don't think we've written code that does that yet, but we can). we can compare it to the checksum we make when we return the artifact to zuul, so it's effectively a "did someone compromise the intermediary object storage" check. | 16:53 |
fungi | yes, i was thinking that checksum verification would be a good way to thwart that | 16:54 |
corvus | (only trusted points in that are zuul's database and the image build node) | 16:54 |
clarkb | jrosser: thanks. This is good info; if it doesn't work we can feed that back to $cloud | 16:55 |
fungi | afaik rax-flex is basically just a very recent vanilla openstack, and the "weirdness" with our basic credentials is simply because they're using a federated login to their old/existing account system, but the accounts also have local ids within keystone so we should be able to use whatever the usual openstack apis are to lock them down | 16:55 |
fungi | the ids we ended up using in our clouds.yaml are the local keystone ones rather than the federated ones | 16:56 |
clarkb | fungi: ya but according to the doc above they need to configure swift with keystonemiddleware to make this work | 16:56 |
clarkb | which they may or may not do | 16:56 |
fungi | oh, got it | 16:57 |
clarkb | I think everything still checks out on the vanilla cloud side of things and we should avoid problems with the federated logins. But it still needs special configuration | 16:57 |
corvus | remote: https://review.opendev.org/c/zuul/zuul/+/931824 WIP: verify downloaded image checksum [NEW] <-- made a note so i don't forget to write that. | 16:57 |
fungi | thanks! | 16:59 |
clarkb | fwiw I +2'd it because I think merging it with a todo is fine as well | 17:00 |
fungi | probably the biggest concern with that mitigation is time. reading a 25GB file in order to checksum it is not fast, though we could probably parallelize that by checksumming chunks | 17:00 |
fungi | and we have to checksum it twice (once when creating the image, then later when retrieving it for upload to glance), so twice the time | 17:01 |
fungi | maybe we want gpu flavors from our cloud providers ;) | 17:01 |
fungi | then again, it's just as likely to be i/o bound and that's not as easy to solve | 17:04 |
fungi | oh, unless we checksum the swift upload and download chunks since that's already happening in parallel? | 17:05 |
fungi | we already have to read the image from disk to upload it to swift and to glance, so i guess if we do the checksumming inline with those reads we're already stuck with, it shouldn't increase the number of reads from disk | 17:06 |
fungi | we could also checksum inline with the download from swift instead and then not even bother with starting the glance upload if the checksum doesn't match | 17:07 |
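As a rough illustration of that idea (the URL and filename are hypothetical), a single streaming pass can write the image to disk and hash it at the same time, so verification adds no extra read:

```shell
# One pass over the data: tee writes the image to disk while
# sha256sum hashes the same stream
curl -s "$OS_STORAGE_URL/images/image.qcow2" | tee image.qcow2 | sha256sum
# then compare the digest against the checksum zuul recorded with the artifact
```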
fungi | but yeah, regardless, keeping performance in mind in the design will be important | 17:08 |
mnasiadka | ianw: so if you can - then please release a new minor version of DIB - hopefully it fixes Kolla-Ansible Ansible locale issues ;-) | 17:40 |
clarkb | the TC meeting has me mulling an idea. A job setup where you deploy an openstack using devstack/kolla/whatever then pause that job and then have several other jobs run to test various api interactions using various versions of things | 18:27 |
clarkb | for example that the current release of openstacksdk/openstackclient work but also master and maybe the stable releases too | 18:27 |
clarkb | I don't know if that would be more or less headache than having a separate cloud for each of those | 18:27 |
clarkb | but it occurred to me that we could probably share resources if it would be useful | 18:28 |
fungi | in that design it would necessarily use multiple job nodes | 18:31 |
clarkb | particularly if you are booting many VMs | 18:33 |
opendevreview | Merged openstack/project-config master: Switch the remaining opendev zuul tenants to ansible 9 by default https://review.opendev.org/c/openstack/project-config/+/931320 | 19:15 |
clarkb | I've not seen anyone complain about ^ yet. I need to do a school run in a bit and if there are no complaints still I guess I'll proceed with merging that ozj change | 20:43 |
fungi | yeah, all's quiet on the western front | 20:51 |
opendevreview | James E. Blair proposed opendev/zuul-jobs master: Return image artifacts https://review.opendev.org/c/opendev/zuul-jobs/+/931815 | 21:42 |
opendevreview | James E. Blair proposed opendev/zuul-jobs master: Delete images after 72 hours https://review.opendev.org/c/opendev/zuul-jobs/+/931819 | 21:42 |
ianw | just for the logs i did try pinging on ipv4 when mirror02 was down: | 21:42 |
ianw | [iwienand@fedora19 dist-git]$ ping -4 mirror.bhs1.ovh.opendev.org | 21:42 |
ianw | PING mirror.bhs1.ovh.opendev.org (158.69.69.81) 56(84) bytes of data. | 21:42 |
ianw | --- mirror.bhs1.ovh.opendev.org ping statistics --- | 21:42 |
ianw | 3 packets transmitted, 0 received, 100% packet loss, time 2083ms | 21:42 |
ianw | yay clouds | 21:44 |
clarkb | maybe their network gear recovers ipv4 more quickly | 21:52 |
clarkb | looks like I'm still in the clear to land that ozj ansible-lint update. I'll hit the approval button shortly | 21:54 |
clarkb | I've rereviewed my own change to triple check there weren't any silly typos | 22:04 |
clarkb | I'm going to go ahead and self approve it now with only fungi's actual review | 22:05 |
clarkb | ianw: btw we just updated the openafs version built on centos 9 stream because the previous one stopped building there (incompatible function declarations in the kernel and openafs for abort()). I couldn't find anything else that needed to be done to consume that new rpm so I think it must be automatic? | 22:06 |
clarkb | not sure if you recall | 22:06 |
corvus | clarkb: apparently we don't get to register stdout on a no_log task, so my method of obtaining the url from the swift command won't work | 22:08 |
clarkb | corvus: I guess in that case we have to risk it. I did do some digging after fungi pointed out that oslo.log should handle things and while swift and swiftclient don't directly consume oslo.log the keystoneauth lib does and I guess as long as you mark items secret=True it is supposed to handle it automatically for you | 22:10 |
ianw | clarkb: that bump should be it; it should make its way to https://tarballs.opendev.org/openstack/openstack-zuul-jobs/openafs/ which is then used to install | 22:10 |
clarkb | now there has been at least one case of a config option accidentally lacking secret=True in the past, but we're probably fine for this case at least as long as the toolchain stays relatively static? We're installing from the distro right? so that should be the case until we bump the test node up? | 22:10 |
corvus | clarkb: an alternative would be to just hard-code the url. | 22:10 |
corvus | re distro install: yes | 22:11 |
clarkb | corvus: oh ya if it is a static thing in rax-flex that also seems reasonable (with rax proper you have to use the cdn and its a bit more convoluted) | 22:11 |
clarkb | I think it is still static for rax with the cdn but it's some hmac hashed domain name? | 22:11 |
corvus | i think it's https://swift.api.sjc3.rackspacecloud.com/v1/AUTH_f063ac0bb70c486db47bcf2105eebcbd for this account | 22:12 |
clarkb | that does seem workable too then | 22:12 |
clarkb | argh, centos 9 stream just updated the kernel again so the ozj change won't land | 22:13 |
clarkb | I'm going to manually request that nb04 rebuild centos-9-stream now in hopes that maybe I can land that tomorrow morning instead | 22:14 |
clarkb | kernel-devel-aarch64 = 5.14.0-513.el9 is needed by openafs-1.8.12.1-1.el9.aarch64 <- is the latest error. 514 appears to be the new kernel so I think we must be booted on 513 and thus it's looking for those headers but only finding the newer 514 ones? I'm trying to confirm via the job facts | 22:16 |
clarkb | ya BOOT_IMAGE: (hd0,gpt3)/boot/vmlinuz-5.14.0-513.el9.aarch64 | 22:16 |
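A quick way to confirm that kind of mismatch on the node (a sketch):

```shell
# The running kernel, which openafs's package dependency was generated against
uname -r                                  # e.g. 5.14.0-513.el9.aarch64

# The kernel-devel versions actually available in the repos
dnf list --showduplicates kernel-devel
```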
opendevreview | James E. Blair proposed opendev/zuul-jobs master: Return image artifacts https://review.opendev.org/c/opendev/zuul-jobs/+/931815 | 22:16 |
opendevreview | James E. Blair proposed opendev/zuul-jobs master: Delete images after 72 hours https://review.opendev.org/c/opendev/zuul-jobs/+/931819 | 22:16 |
corvus | clarkb: can you re-review 931815 and see if that lgty? | 22:16 |
clarkb | corvus: the artifact change lgtm. What is with the zuul.success to zuul_success change? | 22:19 |
clarkb | is the zuul var loaded at the top level as zuul_success instead of an entry in the zuul dict? | 22:19 |
corvus | yep | 22:19 |
corvus | there was a reason for that i think... since it changes over different playbook runs | 22:20 |
clarkb | got it | 22:20 |
clarkb | +2 from me | 22:20 |
clarkb | I made the image build request on nl01 and that returned an error about trying to build some non-diskimage-builder image. Running the command against nb04 it worked so I guess it's some sort of config issue | 22:27 |
clarkb | also we've got a zk entry for a debian-bookworm-arm64 image build with invalid json in it | 22:28 |
clarkb | doing a dib-image-list shows that. I suspect we can just delete the zk db record for that? | 22:28 |
clarkb | in any case I think the request is in now. A different image is currently building though. Hopefully this will be in a happier spot tomorrow morning | 22:28 |
opendevreview | Merged opendev/zuul-jobs master: Return image artifacts https://review.opendev.org/c/opendev/zuul-jobs/+/931815 | 23:27 |