opendevreview | Dale Smith proposed openstack/project-config master: Add magnum-capi-helm-charts to Magnum project https://review.opendev.org/c/openstack/project-config/+/893117 | 00:47 |
opendevreview | Dale Smith proposed openstack/project-config master: Add magnum-capi-helm-charts to Magnum project https://review.opendev.org/c/openstack/project-config/+/893117 | 00:52 |
opendevreview | Dale Smith proposed openstack/project-config master: Add magnum-capi-helm-charts to Magnum project https://review.opendev.org/c/openstack/project-config/+/893117 | 01:53 |
*** TheMaster is now known as Unit193 | 10:00 | |
opendevreview | Dr. Jens Harbott proposed openstack/project-config master: Unpause image uploads for rax-iad part 2 https://review.opendev.org/c/openstack/project-config/+/893145 | 10:36 |
frickler | clarkb: corvus: ^^ according to the nodepool docs, this is per provider, so should look like this? https://zuul-ci.org/docs/nodepool/latest/configuration.html#attr-providers.max-concurrency | 10:36 |
frickler | this would avoid the issue of slowing down uploads for other providers | 10:37 |
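For reference, a minimal sketch of the launcher attribute frickler links above (providers[].max-concurrency in the nodepool configuration); the provider entries and values are illustrative placeholders, and as corvus notes below this setting governs node request handling on launchers rather than image uploads:

```yaml
# Sketch of a nodepool launcher config snippet (placeholder providers/values).
# max-concurrency caps how many node requests the launcher will handle in
# parallel for that provider; omit it for the default of unlimited.
providers:
  - name: rax-iad
    max-concurrency: 1
  - name: rax-dfw
    # no max-concurrency set, so node requests are not limited here
```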
opendevreview | Merged openstack/project-config master: Add magnum-capi-helm-charts to Magnum project https://review.opendev.org/c/openstack/project-config/+/893117 | 11:56 |
*** d34dh0r5- is now known as d34dh0r53 | 12:14 | |
corvus | frickler: that just affects node requests on launchers; you want `--upload-workers` command line argument for the builder | 13:30 |
opendevreview | Lukas Kranz proposed zuul/zuul-jobs master: prepare-workspace-git: Add ability to define synced projects https://review.opendev.org/c/zuul/zuul-jobs/+/887917 | 13:53 |
opendevreview | Lukas Kranz proposed zuul/zuul-jobs master: prepare-workspace-git: Add ability to define synced projects https://review.opendev.org/c/zuul/zuul-jobs/+/887917 | 13:57 |
opendevreview | Maksim Malchuk proposed openstack/diskimage-builder master: Fix an issue with wait_for https://review.opendev.org/c/openstack/diskimage-builder/+/893196 | 14:32 |
fungi | just a heads up, i'm in and out doing storm prep and errands, but should be around more during my afternoon | 15:05 |
clarkb | fungi: it's expected to hit you tomorrow night? | 15:12 |
fungi | rain and wind are likely to pick up around 5pm local time here, but yeah if the eye regains coherence on the way out to the atlantic that will be tomorrow | 15:13 |
fungi | right now the eye is projected to pass south of us, but it's hard to track/predict accurately once it's over land | 15:13 |
clarkb | hopefully it doesn't have too big of an impact. That aside, it looks like it's already creating massive problems in florida | 15:14 |
fungi | the main thing we have to keep an eye on is wind-driven surge, which will depend a lot on wind direction (in turn depending on where the eye reappears) and how timing coincides with the tides | 15:15 |
fungi | north carolina has recently built out a really great flood inundation mapping and prediction network though, very glad that's a thing now: https://fiman.nc.gov/# | 15:16 |
fungi | and in the past few months they added a gauge a few blocks from our house, so even better | 15:17 |
clarkb | that's neat. They've even built it up far inland. I guess river flooding is an issue too | 15:19 |
fungi | yes, the topology in nc includes a mountain range and coastal plane. the eastern continental divide passes through the west end of the state, and so everything that falls from the sky in the state flows this direction | 15:27 |
fungi | er, coastal plain | 15:27 |
fungi | there are a variety of flood risks across the state, whether it's flash flooding in valleys, poor drainage in low-lying areas, or wind-driven surges on the shore | 15:29 |
fungi | stepping out again for a bit but should be back in an hour or so | 15:31 |
clarkb | I'm going to approve https://review.opendev.org/c/opendev/system-config/+/892701. That image was primarily created for gitea on k8s, which means it isn't used in production today, though it will trigger an infra-prod-service-gitea run which should noop | 15:36 |
opendevreview | Merged opendev/system-config master: Update jinja-init image to bookworm https://review.opendev.org/c/opendev/system-config/+/892701 | 16:13 |
*** ralonsoh is now known as ralonsoh_ooo | 16:20 | |
clarkb | as expected the gitea job ran but was quick and successful and the service is still reachable | 16:42 |
fungi | oh good | 17:06 |
fungi | (back now btw) | 17:06 |
TheJulia | Greetings folks, can I get a hold added for job name "ironic-tempest-ipa-partition-uefi-pxe-grub2" ? Thanks! | 17:36 |
clarkb | I was about to say sure. Then ssh failed because I haven't loaded keys yet /me looks for keys | 17:40 |
clarkb | TheJulia: it wouldn't let me create a hold without setting a project name so I set it for ironic | 17:46 |
fungi | yes, project name is required | 17:50 |
opendevreview | Merged openstack/diskimage-builder master: Fix an issue with wait_for https://review.opendev.org/c/openstack/diskimage-builder/+/893196 | 17:58 |
clarkb | fungi: what's your read on gitea + bookworm ssh risk. Do you think we should hold a gitea and a gerrit node and have them replicate to each other to ensure that mina can talk to openssh 9.2 using an rsa key? similarly what about the gerrit + bookworm change that bumps us to java 17? | 18:00 |
fungi | i think i missed some of the nuances of that (though i did see it being discussed). can you resummarize the issue? | 18:02 |
clarkb | fungi: for gitea we're upgrading bullseye to bookworm which bumps us from openssh 8.4 to 9.2. This crosses the openssh 8.8 "rsa with sha1 is bad/evil/disabled" threshold. Historically Gerrit's MINA library struggled with this. So far we've only run into problems with MINA as a server, but there is potential that it will break as the client too (though they say they fixed both) | 18:03 |
clarkb | In our CI testing we do test that we can push to gitea but we use git + openssh not Gerrit + MINA | 18:04 |
TheJulia | clarkb: perfect, thanks! | 18:04 |
clarkb | If we do have problems we can switch to an ed25519 key | 18:04 |
clarkb | The gerrit image upgrade to bookworm also bumps us to java 17. Bookworm has java 11 if we want to separate the distro update and the java update but currently as written it does both together. I don't have a full grasp on the differences between java 11 and java 17. In theory GC will perform better. Gerrit says they fully support java 17 at this point too so it should be fine | 18:05 |
fungi | could we test upgrading one gitea server and then take it down in haproxy if replication isn't working to it as expected? | 18:08 |
fungi | or are held nodes still an easier path for confirming? | 18:09 |
clarkb | fungi: yes I think we can put all but one of the gitea servers in the emergency file list, land the change, check replication, remove the other giteas from emergency and wait for the daily run to upgrade them | 18:09 |
clarkb | as an alternative to waiting for the daily run we can land another change to trigger the job or manually run the playbook from bridge | 18:09 |
fungi | i'd be fine with that. if all we're concerned with is impact to commit replication, then the problems that will briefly produce should be minimal | 18:12 |
frickler | sounds good to me too | 18:34 |
frickler | for nodepool, we have this variable here https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/nodepool-builder/defaults/main.yaml | 18:34 |
frickler | do we want to change the default or just override the nodepool group_var in our inventory? | 18:35 |
Clark[m] | I would override not change the default | 18:37 |
frickler | ok so I added "nodepool_builder_upload_workers: 1" in group_vars/nodepool.yaml, did not commit yet, can someone watch whether that works as expected? I'll update the project-config patch next | 19:06 |
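As a sketch of the uncommitted edit frickler describes, assuming (per corvus's earlier comment about --upload-workers) that the nodepool-builder role passes this variable through to the builder daemon's upload-worker count:

```yaml
# /etc/ansible/hosts/group_vars/nodepool.yaml on bridge (sketch of the
# uncommitted change described above).
# Assumed to end up as the builder's --upload-workers argument, limiting
# how many image uploads each builder performs in parallel.
nodepool_builder_upload_workers: 1
```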
fungi | i can check back in on it in a bit | 19:07 |
fungi | i still haven't caught up from this morning's personal life interruptions, but do still hope to get the ticket opened with rackspace before the end of my day | 19:08 |
opendevreview | Dr. Jens Harbott proposed openstack/project-config master: Unpause image uploads for rax-iad part 2 https://review.opendev.org/c/openstack/project-config/+/893145 | 19:08 |
frickler | once the builders are running with 1 thread, you could merge ^^ then, I'll check back on it tomorrow | 19:09 |
TheJulia | oooh ahh, autohold appears ready. Who shall I send my pub key to? Thanks in advance! | 19:45 |
fungi | TheJulia: ooh, gimme | 19:46 |
fungi | TheJulia: ssh root@213.32.74.11 | 19:47 |
TheJulia | muahahahahahahah | 19:48 |
fungi | world domination is closer than ever | 19:49 |
TheJulia | hey guys, you can reclaim that hold now, I've got what I needed! Thanks! | 20:21 |
frickler | TheJulia: done | 20:23 |
frickler | corvus: Clark[m]: nodepool-builder is running with 1 thread now on nb01+2, so 893145 should be good to go | 20:24 |
* frickler should really eod now | 20:24 | |
corvus | where's the change that changes the upload workers? | 20:46 |
fungi | corvus: the local vars on the bridge were updated per 19:06z in scrollback | 20:48 |
corvus | frickler fungi Clark i think/hope there was a miscommunication above. it looks like frickler was asking where to make the change for the number of upload workers and interpreted Clark's response as indicating that it should be made in the secret hostvars on bridge. but this is not a secret, and the change should be made somewhere in the opendev/system-config repository. it could either be made in the role definition itself (where frickler | 20:50 |
corvus | originally pointed) since we want it to apply to all builders and we are the only users; or it could be made in the inventory files in the opendev/system-config repo. but either way, it shouldn't be on bridge. | 20:50 |
fungi | makes sense. i can push a change to add that to system-config and then we can pull out the entry on bridge once it merges | 20:51 |
fungi | working on that now | 20:51 |
corvus | https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/nodepool-builder/defaults/main.yaml (original location from frickler ) | 20:52 |
corvus | or something like https://opendev.org/opendev/system-config/src/branch/master/inventory/service/group_vars/nodepool-builder.yaml if we want to narrow the scope | 20:52 |
corvus | or there's like 5 other places it could go :) | 20:53 |
opendevreview | Merged openstack/project-config master: Unpause image uploads for rax-iad part 2 https://review.opendev.org/c/openstack/project-config/+/893145 | 20:54 |
corvus | i would probably change the original location -- that seems less confusing to me... | 20:57 |
corvus | (because in this particular case, overriding anywhere else would basically mean we set a default value we never use?) | 20:57 |
clarkb | You can set it when the role is included | 20:58 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Temporarily limit node image upload concurrency https://review.opendev.org/c/opendev/system-config/+/893289 | 20:58 |
fungi | like that? i'll work on opening the ticket now and amend that change with the id in place of the todo comment | 20:59 |
clarkb | yes i think that will work, though corvus is suggesting we just change the role default | 20:59 |
clarkb | it's just weird to me to use a default like that. To me the default should be for an ideal or at least realistic state and we are overriding that for unexpected behavior | 20:59 |
corvus | but the role is "run the opendev nodepool builder" | 21:00 |
corvus | so the "default" is really the "way we run the opendev nodepool builder" | 21:00 |
fungi | both arguments make sense to me. i don't really have a preference but happy to adjust the change to whatever consensus is reached | 21:00 |
clarkb | ya I'm happy to do it the way corvus suggests since that is the strongest opinion any of us have expressed | 21:01 |
fungi | okay, i can switch the change to do that when i get the ticket open in a few minutes | 21:02 |
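A sketch of the approach corvus is advocating, changing the role default in playbooks/roles/nodepool-builder/defaults/main.yaml; the prior default value of 8 is an assumption here, not confirmed in the discussion:

```yaml
# playbooks/roles/nodepool-builder/defaults/main.yaml (sketch)
# Previously assumed to default to 8; temporarily lowered to reduce
# concurrent image uploads while rax-iad import processing is slow.
nodepool_builder_upload_workers: 1
```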
fungi | "Image upload processing delay in IAD | 21:06 |
fungi | For the past month (since around the end of July), when uploading images to the Glance API for the IAD region, backend processing takes at least 30 minutes after the upload has completed until the uploaded image appears in the image list. Uploading the same image to the DFW or ORD regions only takes a few minutes at this stage, by comparison. | 21:06 |
fungi | Worse, if we upload multiple images around the same time, the delay for any of them appearing in the image list appears to scale roughly linearly with the number of images uploaded, and so has been observed to exceed 5 hours in some cases. | 21:06 |
fungi | Thanks for looking into it!" | 21:07 |
fungi | does that seem to encapsulate the concern without getting too into the weeds? | 21:07 |
corvus | either way, it's an extra level of indirection which makes how we run the system a little less discoverable (or at least, prone to accidental misunderstanding) | 21:08 |
clarkb | fungi: the only other thing is maybe mention the task system? when uploading images to the Glance API using the task system... | 21:08 |
fungi | do we explicitly invoke tasks? | 21:08 |
fungi | i should probably also say we're uploading vhd images? | 21:08 |
clarkb | openstacksdk/shade/whatever it is called now does | 21:08 |
corvus | (heh, to be clear, the strength of my opinion on this is like 2 out of 10 -- but that may well still be the strongest :) | 21:08 |
fungi | the "import" task i guess? | 21:08 |
clarkb | maybe? maybe it's better to leave that out until they ask what specific apis are being used | 21:09 |
fungi | feels odd to say we're using tasks but not say what tasks | 21:09 |
fungi | maybe that just feels odd because i'm fuzzy on that part of the api | 21:09 |
clarkb | everyone is because it is undocumented :) | 21:10 |
fungi | but if it's undocumented how do we know we're using it? | 21:10 |
fungi | anyway, the api response does mention the import task | 21:10 |
clarkb | because it's the rax image upload system. Glance added tasks just for rax, and that is why it is undocumented: it wasn't a thing anyone else ever ended up using | 21:10 |
fungi | "the IAD region, backend processing takes at least 30 minutes after the upload has completed until the uploaded image appears in the image list, a while after the import task returned by the image create call is showing an image ID." | 21:12 |
clarkb | lgtm | 21:12 |
fungi | huh, that didn't copy all of what i thought i highlighted | 21:13 |
fungi | "For the past month (since around the end of July), when uploading VHD images to the Glance API for the IAD region, backend processing takes at least 30 minutes after the upload has completed until the uploaded image appears in the image list, a while after the import task returned by the image create call is showing an image ID." | 21:13 |
fungi | anyway, that adds mention of vhd and tasks | 21:13 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Temporarily limit node image upload concurrency https://review.opendev.org/c/opendev/system-config/+/893289 | 21:16 |
fungi | corvus: clarkb: ^ how's that? | 21:16 |
clarkb | +2 | 21:17 |
corvus | +3 | 21:17 |
fungi | once it merges i'll undo the similar addition on bridge | 21:18 |
opendevreview | Clark Boylan proposed openstack/project-config master: Switch OpenStack's Zuul tenant to Ansible 8 by default https://review.opendev.org/c/openstack/project-config/+/893290 | 21:21 |
clarkb | frickler: ^ there's the change to merge on Monday. | 21:21 |
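For context, the tenant default Ansible version is a single attribute in Zuul's tenant configuration; a minimal sketch follows (abbreviated and illustrative, not the actual contents of the project-config change):

```yaml
# Zuul tenant configuration sketch (abbreviated; not the actual change)
- tenant:
    name: openstack
    default-ansible-version: "8"
    # source and project stanzas unchanged
```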
JayF | I suddenly, for a completely unrelated reason, feel compelled to go run a test job on Ironic/bifrost ;) | 21:21 |
opendevreview | Merged opendev/system-config master: Temporarily limit node image upload concurrency https://review.opendev.org/c/opendev/system-config/+/893289 | 22:31 |
fungi | since that ^ has merged, i undid the corresponding edit to /etc/ansible/hosts/group_vars/nodepool.yaml | 22:37 |
fungi | i didn't revert it since it was never committed to git on bridge anyway | 22:37 |
*** benj_0 is now known as benj_ | 23:20 |