opendevreview | Steve Baker proposed openstack/diskimage-builder master: Parse block device lvm lvs size attributes https://review.opendev.org/c/openstack/diskimage-builder/+/839829 | 02:04 |
---|---|---|
opendevreview | Steve Baker proposed openstack/diskimage-builder master: WIP Support LVM thin provisioning https://review.opendev.org/c/openstack/diskimage-builder/+/840144 | 02:04 |
ianw | looks like we might be having some issues getting an arm64 node | 03:09 |
ianw | hrm, might be a centos-8-stream arm64 node in particular | 03:10 |
ianw | looks like everything in linaro-us is stuck in build, which i think is the problem here | 03:23 |
ianw | kevinz: ^ | 03:23 |
ianw | we're just getting a no-info exception ... raise exceptions.ServerDeleteException( | 03:26 |
ianw | looks like kevinz might be on holidays. we might have to disable linaro-us, but i'm not sure about all this stuff in half-deleted mode, it might be a mess to clean up | 03:59 |
opendevreview | Ian Wienand proposed openstack/project-config master: Temporarly disable linaro-us https://review.opendev.org/c/openstack/project-config/+/840150 | 04:06 |
opendevreview | Merged openstack/project-config master: Temporarly disable linaro-us https://review.opendev.org/c/openstack/project-config/+/840150 | 05:08 |
opendevreview | Ian Wienand proposed opendev/system-config master: Test openafs roles on CentOS 9-stream https://review.opendev.org/c/opendev/system-config/+/839841 | 05:08 |
*** marios is now known as marios|ruck | 05:08 | |
opendevreview | Ian Wienand proposed opendev/system-config master: Test openafs roles on CentOS 9-stream https://review.opendev.org/c/opendev/system-config/+/839841 | 06:03 |
*** ysandeep is now known as ysandeep|lunch | 08:13 | |
*** pojadhav is now known as pojadhav|afk | 08:49 | |
*** ysandeep|lunch is now known as ysandeep | 08:53 | |
*** ysandeep is now known as ysandeep|sick | 08:55 | |
*** rlandy|out is now known as rlandy | 10:23 | |
*** pojadhav|afk is now known as pojadhav | 10:27 | |
opendevreview | Ian Wienand proposed opendev/system-config master: Test openafs roles on CentOS 9-stream https://review.opendev.org/c/opendev/system-config/+/839841 | 11:12 |
*** pojadhav is now known as pojadhav|afk | 13:41 | |
*** artom_ is now known as artom | 13:42 | |
clarkb | ianw: when I had keyboard issues it was related to power on my usb bus. I had to move devices around | 15:16 |
*** dviroel is now known as dviroel|lunch | 15:23 | |
clarkb | fungi: frickler: any chance we can move forward with https://review.opendev.org/c/opendev/system-config/+/839422 and https://review.opendev.org/c/opendev/system-config/+/839972 to add more jammy mirroring? | 15:27 |
clarkb | for that second one do we need to hold the lock and run the script without a timeout? I think that is the case and I'm happy to do that if so | 15:27 |
fungi | probably, otherwise it's likely to take multiple passes in cron before the mirror is ready | 15:28 |
fungi | won't break anything if we don't though | 15:28 |
clarkb | got it. I'm happy to grab the lock and help it through | 15:29 |
clarkb | also ianw has https://review.opendev.org/c/opendev/system-config/+/837637 proposed to do a bit more fedora mirror cleanup | 15:29 |
fungi | i can single-core approve the jammy changes, they should be entirely non-impacting | 15:30 |
clarkb | thanks. I'll reboot for local updates then grab that lock on mirror-update | 15:32 |
fungi | i've approved them both | 15:32 |
clarkb | I've got the lock (sorry got distracted after reboot putting keys back in place and stuff) | 15:52 |
clarkb | the lock is held in window 3 of the preexisting screen session on mirror-update. I'll run the script there once the change lands | 15:52 |
opendevreview | Merged opendev/system-config master: Mirror Jammy arm64 ubuntu-ports https://review.opendev.org/c/opendev/system-config/+/839972 | 15:56 |
opendevreview | Merged opendev/system-config master: Add Jammy Docker package mirroring https://review.opendev.org/c/opendev/system-config/+/839422 | 15:56 |
clarkb | the docker mirroring is much much smaller so I won't bother to manually run that one | 15:57 |
fungi | yeah, it should take no more than a few minutes | 16:01 |
clarkb | the ubuntu-ports update script is running now and of course I forgot to turn the timeout off :/ | 16:04 |
clarkb | that was even the whole point. I'll let it time out. Then rerun it with the timeout turned off. In the meantime I think I'll look at making no timeout the default and then set a timeout when run in cron | 16:05 |
fungi | yeah, that's fine really. it's more that this way you can catch it when it times out and restart it immediately rather than waiting up to 2 hours for it to continue automatically | 16:07 |
clarkb | ++ | 16:09 |
*** marios|ruck is now known as marios|out | 16:10 | |
opendevreview | Clark Boylan proposed opendev/system-config master: Run reprepro with no timeout by default https://review.opendev.org/c/opendev/system-config/+/840214 | 16:16 |
clarkb | Something like ^ that? | 16:16 |
fungi | oh, right, i totally forgot we talked about dropping it completely | 16:16 |
clarkb | at least for me I never remember it is timing out by default, but if we do it this way it is a bit more explicit in the cronjob command that you copy and run. This means you can drop it easily | 16:17 |
clarkb | anyway I'll keep an eye on it and restart it once it times out. This time with no timeout :) | 16:18 |
*** dviroel|lunch is now known as dviroel | 16:23 | |
*** rlandy is now known as rlandy|ruck | 16:25 | |
clarkb | looking at the gerrit memory stack's latest results after restacking them I continue to see no appreciable difference in a small setup | 16:38 |
clarkb | I think that's probably about as good as it will get out of the CI system. Those changes are landable though and will give us useful info | 16:39 |
clarkb | now to see if we can ask gerrit/jvm for heap info as well | 16:39 |
clarkb | I'm curious to see what kind of headroom we have | 16:39 |
clarkb | `gerrit show-caches --show-jvm` maybe? I'll give that a go in a bit once I've got admin creds loaded | 16:40 |
fungi | yeah, probably need a much larger and more active gerrit to see problems with it | 16:45 |
clarkb | looks like we shouldn't need --show-jvm for the heap info so I'll run that show-caches command without that extra flag | 16:45 |
clarkb | this command does not return quickly | 16:46 |
clarkb | 'Mem: 96.00g total = 22.32g used + 73.29g free + 399.99m buffers' based on that I'm not very concerned about more memory use in 3.5 | 16:47 |
clarkb | would have to be ~3x more memory use to be a problem | 16:47 |
clarkb | also only 166 open files (noting that as discussion about increasing ulimits has come up in the past) | 16:47 |
clarkb | so ya I think we can land that stack in preparation just to be sure we don't make it worse than it needs to be and we have extra logging info to track request costs. But otherwise proceed with 3.5 planning per usual | 16:49 |
fungi | yeah, agreed | 16:49 |
clarkb | fungi: wiki.openstack.org's SSL cert expires in 30 days | 16:52 |
clarkb | do we want to reissue another annual cert for that one? | 16:52 |
fungi | clarkb: yeah, looks like that's what we did last year | 16:54 |
clarkb | fungi: if you have time to rereview https://review.opendev.org/c/opendev/system-config/+/839251/ and parent that would be great (I just changed the order on them so that we would have CI results showing memory use with the performance logging toggle explicitly disabled) | 17:02 |
clarkb | er rather that we would have memory info with performance logging enabled and disabled to compare (and the comparison is pretty close and boring) | 17:02 |
opendevreview | Clark Boylan proposed opendev/system-config master: Upgrade gitea to v1.16.7 https://review.opendev.org/c/opendev/system-config/+/840218 | 17:12 |
clarkb | infra-root ^ that probably isn't super urgent but also should be low impact | 17:14 |
clarkb | the ubuntu-ports sync should timeout any minute now. | 17:31 |
clarkb | fungi: re the gerrit updates there is a grandparent that needs to be approved before those land. I'm happy to see if ianw can review them though | 17:32 |
clarkb | ok reprepro restarted with NO_TIMEOUT=1 set | 17:36 |
fungi | they're each fairly trivial, and easy to undo if concerns are raised | 17:40 |
clarkb | fungi: yup I just noticed you approved the parent and child but not grandparent | 17:41 |
fungi | i just plain forgot it was there ;) | 17:48 |
clarkb | gitea 1.16.7 passed ci. Always reassuring when those bugfixes don't magically stop working | 18:07 |
fungi | yeah | 18:07 |
fungi | regression testing is a wonderful thing | 18:07 |
clarkb | fungi: and I guess we can plan for a gerrit restart later today as load drops? that way we'll get the new logfile | 18:10 |
fungi | wfm, yeah | 18:24 |
*** rlandy|ruck is now known as rlandy|rover | 18:30 | |
opendevreview | Merged opendev/system-config master: Update Gerrit build checkouts https://review.opendev.org/c/opendev/system-config/+/839250 | 18:37 |
opendevreview | Merged opendev/system-config master: Enable Gerrit httpd requestLog https://review.opendev.org/c/opendev/system-config/+/839976 | 18:37 |
opendevreview | Merged opendev/system-config master: Explicitly disable Gerrit tracing.performanceLogging https://review.opendev.org/c/opendev/system-config/+/839251 | 18:39 |
*** timburke__ is now known as timburke | 19:24 | |
clarkb | we are mirroring supertux now heh | 20:08 |
clarkb | kind of cool that that has an arm build | 20:08 |
clarkb | the ports mirroring is writing out a bunch of zstd errors and not yet complete | 21:56 |
clarkb | I assume it will eventually get there though | 21:56 |
fungi | same errors we saw with the main mirror, no doubt | 21:58 |
clarkb | ya | 22:02 |
ianw | distros love changing their zip formats | 22:02 |
clarkb | ianw: good morning. Probably two things to be aware of. We landed a stack of small updates to the gerrit images and config as part of more prep work for 3.5. We should probably restart gerrit on that soon just to be sure it's all working in production. The other thing is I have a change to update gitea to 1.16.7 | 22:03 |
ianw | i'll be happy to restart gerrit this afternoon if we like? | 22:04 |
clarkb | ianw: sure. There are two config changes. One to disable performancelogging explicitly as that can apparently consume more memory and the other enables httpd logging so that we can track memory usage for requests (apparently useful for debugging if 3.5 ends up using all your memory) | 22:05 |
clarkb | the image updates just bring us up to date with the latest tags on things upstream | 22:05 |
ianw | gitea lgtm, nothing crazy in the changelog | 22:05 |
clarkb | on the memory side of things I can't really see it using more memory in CI. And our prod service has plenty of heap space according to `gerrit show-caches` so I'm not too worried about it | 22:05 |
ianw | that all sounds good. we've certainly spent a share of time with gerrit running out of memory with little insight as to why | 22:06 |
clarkb | oh and jammy ubuntu-ports mirroring is in progress with the lock held in the screen (window 3) on mirror-update | 22:06 |
ianw | although i don't feel like we've had anything like that since we moved to the bigger server | 22:07 |
clarkb | I had hoped that would be done before my day ends in about 2 hours but it's still going so who knows | 22:07 |
clarkb | ianw: ya the bigger server helped a lot | 22:07 |
ianw | yeah that will take a while, i can pull up that window and drop the lock if it finishes by eod | 22:07 |
ianw | i want to try looking at centos 9 arm64, i'm hopeful it "just works" | 22:09 |
ianw | yesterday i fixed publishing the openafs rpms, am just finishing fixing our testing | 22:09 |
clarkb | ianw: oh and I approved your zuul ansible json callback updates | 22:10 |
fungi | which are the remaining changes for c9arm? | 22:10 |
ianw | fungi: i think we need some node definitions and some general testing | 22:12 |
ianw | i need to pick it apart a bit and see where we got to | 22:12 |
fungi | oh, and if you want a quick diversion, https://review.opendev.org/839990 is some long-overdue mailing list cleanup (once that merges i can take care of the manual steps) | 22:14 |
clarkb | The images are in nodepool and the mirrors appear to have aarch64 packages. | 22:14 |
clarkb | we've also got the labels defined. You may just need to use them at this point and see if they work? | 22:14 |
ianw | yeah something seemed up yesterday but it might have also been my typos :) | 22:27 |
*** rlandy|rover is now known as rlandy|rover|bbl | 22:29 | |
ianw | fungi: 839990 lgtm, not sure if we want to link to opendev lists? i assume it's safe to approve? | 22:29 |
fungi | wdym "link to opendev lists?" | 22:30 |
fungi | oh, in the docs | 22:30 |
fungi | we could amend that, but it's worth a broader discussion on how (and if) we'd want to engage with third-party ci operators | 22:31 |
ianw | yeah, service-discuss/announce? | 22:31 |
ianw | ok, fair enough it's not a 1:1 replacement scenario | 22:32 |
ianw | school run, bib | 22:32 |
clarkb | fungi: I think we should continue to help them with setting up gerrit accounts and debugging gerrit interactions. But ya I'm wary about trying to solve all their CI issues. Thankfully that hasn't been a major issue for a while | 22:32 |
fungi | right, what i'm questioning is whether we want to make "subscribe to service-*@opendev" a general recommendation for all third-party ci operators | 22:33 |
fungi | we used to tell them to subscribe to third-party-ci-announce@openstack because at one point we were using that to notify of disabled accounts | 22:34 |
clarkb | fungi: maybe we should email the accounts email address directly instead? | 22:34 |
fungi | i don't think we'd notify the service-discuss ml about specific accounts | 22:35 |
clarkb | so something like "ensure your third party ci account's email address is up to date as important info may be sent to that address" | 22:35 |
clarkb | yes agreed about not using service-discuss for that | 22:35 |
fungi | yeah, i'm cool with that. i mostly wanted to avoid turning the ml removal change into a broader discussion about how we intend to interface with third-party ci operators, but am happy to do that in a followup patch. for this one i mainly just didn't feel right leaving references in the docs to an ml we're not using and are now removing | 22:37 |
corvus | apparently the nodepool builders have been restarted frequently but not the launchers | 22:43 |
corvus | so i will restart the launchers now | 22:43 |
corvus | it would be cool if maybe when folks are updating them they could maybe try to keep them closer to being in sync? | 22:43 |
clarkb | corvus: updating them == the builders? I think the issue is that the builders automatically restart but the launchers do not | 22:44 |
clarkb | we could possibly decide that automatically restarting the launchers is safe enough and set them up that way too | 22:44 |
corvus | why do the builders automatically restart (and why don't the launchers?) | 22:45 |
clarkb | corvus: I think the reason is that interrupting builds and breaking builds is low impact. We'll just use older images. But if the launchers break all CI will grind to a halt | 22:45 |
clarkb | in theory the launchers will restart automatically and clean up their previously building nodes and continue where they left off without trouble though so it would mostly be if we land a bug in the launchers that presents a problem I think | 22:46 |
corvus | yes, that's correct. | 22:46 |
clarkb | I'd be open to auto restarting launchers too and see how we do. It's been a while since we landed a halt the world bug to nodepool (yay testing) | 22:48 |
fungi | i guess the frequent dib releases are the driving interest in automatic builder restarts? | 22:48 |
clarkb | fungi: I think it's more that we can do automated builder restarts and not break anything user visible | 22:49 |
clarkb | because if image builds break we just continue along with the old images | 22:49 |
fungi | it seems like every time we need to fix a bug in image building or add support for a new platform, that's a dib change which needs to be merged, released, added to nodepool's minimum to trigger new container image uploads to dockerhub which we pull and restart onto | 22:49 |
corvus | though it also means that the builders and launchers were apparently 8 weeks apart in versions | 22:50 |
clarkb | fungi: yup, but that wasn't why we did automatic restarts | 22:50 |
fungi | i guess remembering to manually restart builders wouldn't be that big of a deal given all the other steps | 22:50 |
clarkb | yes, if we wanted to go the other direction we could do that too. Make them all manual then manually restart the full set when updating | 22:51 |
corvus | seems like if we want to move toward CD, taking the next step and doing the launchers is reasonable. | 22:51 |
corvus | #status log restarted nodepool launchers on a2e5e640ad13b5bf3e7322eb3b62005484e21765 | 22:52 |
opendevstatus | corvus: finished logging | 22:52 |
corvus | looks like nodes are becoming ready, so at least it hasn't blowed up. | 22:53 |
clarkb | another thing we could do to reduce the risk there is reduce the interval we update nodepool on. Currently it is every hour. Which means if a change lands upstream in nodepool about an hour later we'd get it | 22:54 |
clarkb | er not reduce it increase it | 22:55 |
corvus | nl02 has a pre-existing error about if node.type[0] not in provider_pool.labels: IndexError: list index out of range | 22:55 |
clarkb | so that we update nodepool when we expect opendev admins to be around should it stop things | 22:55 |
corvus | nl04 has a pre-existing error about Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible. | 22:55 |
corvus | <class 'nova.exception.OrphanedObjectError'> | 22:55 |
corvus | (by pre-existing i mean both of those appear in logs from before the recent restart) | 22:56 |
clarkb | a nova orphaned object error sounds like fun. nl04 is ovh. amorin may be interested in that one | 22:56 |
clarkb | for nl02's error I wonder if when we removed some older label we didn't properly let it steady state without nodes and now we just need to manually delete a zk record | 22:56 |
corvus | yep that's from ovh gra1 | 22:56 |
corvus | re nl02 probably something like that yeah | 22:57 |
clarkb | I can look at nl02's thing more closely tomorrow | 22:57 |
corvus | (or maybe that's the fake node for deleting issue that i think may have been addressed at some point recently? look into that first if you want to look into it) | 22:58 |
clarkb | k | 22:58 |
opendevreview | Ian Wienand proposed opendev/system-config master: Test openafs roles on CentOS 9-stream https://review.opendev.org/c/opendev/system-config/+/839841 | 23:09 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: third-party CI: reminder to keep address current https://review.opendev.org/c/opendev/system-config/+/840251 | 23:16 |
fungi | ianw: clarkb: ^ | 23:16 |
opendevreview | Merged opendev/system-config master: Upgrade gitea to v1.16.7 https://review.opendev.org/c/opendev/system-config/+/840218 | 23:24 |
clarkb | that should start applying in a few minutes once the hourly deploy is complete | 23:24 |
ianw | i guess we probably want to do an upload of our openafs to the ppa to support jammy too | 23:35 |
clarkb | oh right | 23:36 |
clarkb | unless jammy's package is new enough? I suppose that is possible | 23:36 |
ianw | true; we've generally just ended up on the release we have to work around issues | 23:39 |
clarkb | gitea01 has updated. Looks ok at first glance | 23:41 |
fungi | 01 seems to be working for me | 23:41 |
clarkb | oh I need to send out an agenda for tomorrow /me does this | 23:42 |
clarkb | ok agenda has been updated, I'll send it out in ~10 minutes giving others to add any additional items | 23:47 |
clarkb | * giving others time | 23:47 |
* fungi has nothing to add | 23:48 | |
clarkb | looks like ubuntu ports updating for jammy has completed. I'll release the lock now | 23:48 |
clarkb | the screen is still up but lock is dropped. We can probably close the screen session there if no one else needs it for anything? | 23:49 |
fungi | i don't | 23:49 |
fungi | though as a general rule, when i'm doing those updates manually, i do try to log all the output to the same log the cronjob uses, so the screen history usually doesn't have much value over that anyway | 23:50 |
clarkb | yup I logged to the same file. I just kept using the screen since it was there from previous recent mirror updates | 23:51 |
fungi | ah | 23:51 |
clarkb | all 8 giteas are updated now. The job isn't quite complete though | 23:58 |
*** dviroel is now known as dviroel|out | 23:59 |
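
The heap headroom estimate clarkb gives at 16:47 can be checked with a little arithmetic. What follows is a minimal Python sketch and not something that was run in the channel. Only the quoted `Mem:` line comes from the log; the helper name `parse_mem_line` and the reading of the `g`/`m` suffixes as binary multiples are assumptions made for illustration.

```python
import re

# Assumption: the k/m/g suffixes are binary multiples; values are normalised to GiB.
UNIT_TO_GIB = {"k": 1 / 1024**2, "m": 1 / 1024, "g": 1.0}


def parse_mem_line(line):
    """Hypothetical helper: map each labelled figure in the line to GiB."""
    fields = {}
    for value, unit, name in re.findall(r"([\d.]+)([kmg]) (\w+)", line):
        fields[name] = float(value) * UNIT_TO_GIB[unit]
    return fields


# The line quoted in the log at 16:47.
mem = parse_mem_line(
    "Mem: 96.00g total = 22.32g used + 73.29g free + 399.99m buffers"
)

# How many times current usage fits into the total heap.
growth_factor = mem["total"] / mem["used"]
print(f"usage could grow about {growth_factor:.1f}x before reaching the total")
```

With these figures the total is a bit over four times current usage, which is in line with the "~3x more" estimate given in the channel.
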
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!
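
The idea discussed around 16:05 to 16:17 (run the mirror sync with no timeout by default, and have the cron entry opt in to one) can be sketched as below. This is an illustration in Python under stated assumptions, not the content of change 840214 and not the real mirror-update script: the function name `run_sync` and its `timeout_seconds` parameter are hypothetical, and returning 124 on expiry simply mirrors the exit status GNU `timeout` uses.

```python
import subprocess
import sys


def run_sync(cmd, timeout_seconds=None):
    """Run cmd and return its exit status.

    No limit is applied unless timeout_seconds is given, so a manual run
    goes to completion while a cron entry can pass an explicit limit.
    """
    try:
        result = subprocess.run(cmd, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child at this point.
        return 124  # same status GNU `timeout` reports on expiry
    return result.returncode


# Manual run: no timeout argument, so the command may take as long as it needs.
status = run_sync([sys.executable, "-c", "pass"])
print("manual run exit status:", status)
```

A cron-style caller would pass the limit explicitly, for example `run_sync(cmd, timeout_seconds=7200)`, which keeps the timeout visible in the command someone copies when running the sync by hand.
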