opendevreview | Merged zuul/zuul-jobs master: mirror-container-images: use skopeo to mirror multiarch images https://review.opendev.org/c/zuul/zuul-jobs/+/944878 | 00:00 |
clarkb | I'm going to look for dinner now but would be good if we can keep an eye on ^ during opendev's mirror jobs that trigger in ~2 hours | 00:02 |
clarkb | image mirroring looks ok to me https://quay.io/repository/opendevmirror/registry?tab=tags there are manifests for unknown arches and platforms in addition to the linux on amd64 linux on arm64 etc manifests | 02:24 |
clarkb | not sure what is with those unknown ones. I feel like we've looked into this with nodepool images before and decided it wasn't a problem but I don't recall specifics | 02:25 |
clarkb | corvus would be good for you to double check tomorrow but my first glance seems fine | 02:25 |
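The mirroring approach merged above can be sketched as a single skopeo invocation (a rough approximation, not the actual role code; image names are illustrative):

```shell
# Hedged sketch of multiarch mirroring with skopeo. --all copies every
# entry in the manifest list rather than just the one matching the local
# platform. Manifest lists built with attestations/provenance often show
# extra "unknown" arch/platform entries in registry UIs, which may be
# what the quay.io tags page is displaying.
skopeo copy --all \
  docker://docker.io/library/registry:latest \
  docker://quay.io/opendevmirror/registry:latest
```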
clarkb | also periodic jobs are a great way to exercise the new nodepool launchers | 02:25 |
corvus | clarkb: i agree that looks good. we could probably recheck a zuul change and that might exercise the images | 02:27 |
corvus | rechecked https://review.opendev.org/944303 | 02:27 |
clarkb | cool | 02:28 |
frickler | is this job supposed to do anything useful? https://zuul.opendev.org/t/zuul/builds?job_name=zuul-nox-py311-multi-scheduler&project=zuul%2Fzuul&result=SUCCESS&skip=0 it times out for me, and without the success filter I only see failures+timeouts | 07:44 |
*** dmellado0755393737 is now known as dmellado075539373 | 09:09 | |
*** ykarel_ is now known as ykarel | 11:12 | |
frickler | #status log paused ubuntu-noble image builds and deleted the most recent one to mitigate https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2104134 | 12:17 |
opendevstatus | frickler: finished logging | 12:17 |
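For reference, pausing an image build like this is normally a builder-config edit plus a CLI delete (a sketch following the usual nodepool config layout; the build id is illustrative):

```shell
# Hedged sketch of the mitigation: mark the diskimage paused in the
# nodepool builder config, then delete the bad build by id.
cat <<'EOF'
diskimages:
  - name: ubuntu-noble
    pause: true   # stop producing new images while the kernel bug stands
EOF
nodepool dib-image-list                            # find the latest build
nodepool dib-image-delete ubuntu-noble-0000000001  # id is illustrative
```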
frickler | jamespage: seems haleyb is out, can you take a look at this bug ^^ and make sure it gets proper attention? | 12:22 |
Clark[m] | frickler: I proposed a change yesterday to move that zuul job to the experimental pipeline. The job's purpose is to run tests with multiple coordinating schedulers, which has value, but getting it stable has been difficult. Maybe easier with larger test nodes, I don't know | 13:04 |
Clark[m] | Re the kernel bug this seems like deja vu I swear we had the same problem not too long ago | 13:06 |
Clark[m] | Oh jammy broke in December and now noble is broken on the same bug | 13:07 |
ykarel | Clark[m], yes same issue was with Jammy in the second-to-last week of December | 13:07 |
Clark[m] | https://wiki.ubuntu.com/KernelTeam says the kernel team is on matrix now | 13:15 |
Clark[m] | Bugs that break firewalls on lts kernels are probably worth being up there? | 13:15 |
Clark[m] | If no one beats me to it I can send a message once I'm actually fed and awake | 13:16 |
frickler | Clark[m]: iiuc ykarel did so already | 13:32 |
fungi | all's the better | 13:33 |
ykarel | Clark[m], frickler yes i already sent a message there | 13:59 |
opendevreview | Jeremy Stanley proposed opendev/bindep master: Comment reminding to replace extras with depgroups https://review.opendev.org/c/opendev/bindep/+/945402 | 14:14 |
fungi | latest test results on ^ indicate centos 9 mirrors are back to working again | 14:45 |
opendevreview | Jeremy Stanley proposed opendev/engagement master: Update project boilerplate https://review.opendev.org/c/opendev/engagement/+/945151 | 14:46 |
opendevreview | Jeremy Stanley proposed opendev/engagement master: Import old who-approves.py script https://review.opendev.org/c/opendev/engagement/+/945152 | 14:46 |
opendevreview | Jeremy Stanley proposed opendev/engagement master: Ratchet down and simplify linting rules https://review.opendev.org/c/opendev/engagement/+/945212 | 14:46 |
opendevreview | Jeremy Stanley proposed opendev/engagement master: Rename who-approves.py to maintainers.py https://review.opendev.org/c/opendev/engagement/+/945224 | 14:46 |
opendevreview | Jeremy Stanley proposed opendev/engagement master: Add a convenience entrypoint for maintainers.py https://review.opendev.org/c/opendev/engagement/+/945225 | 14:46 |
opendevreview | Jeremy Stanley proposed opendev/engagement master: Rewrite maintainers.py functionality https://review.opendev.org/c/opendev/engagement/+/945262 | 14:46 |
clarkb | infra-root https://review.opendev.org/c/openstack/project-config/+/945398 and https://review.opendev.org/c/opendev/zone-opendev.org/+/945399 are the last two changes for cleaning up the old nodepool launchers if the new launchers look good to you | 14:56 |
corvus | i don't think the mariadb statement timeouts are working in opendev. i ran through everything manually and they seem to work. so i'm going to restart the web servers again just to make sure i didn't get wires crossed and they somehow started using the mysql dialect dburi. if that doesn't work, then i'll have to dig deeper. | 15:48 |
clarkb | ack | 15:48 |
clarkb | fwiw the search-builds-by-project performance did seem a lot better | 15:48 |
corvus | yep, that much is working (which does make me suspect that the configuration is correct). but still, gotta cross this off the list. | 15:49 |
corvus | oh actually, that would hit with mysql dialect too | 15:49 |
corvus | so, yeah. restarting now. | 15:49 |
clarkb | ah | 15:50 |
corvus | i'll restart the schedulers too, just because there's a small version bump. that way they match. | 15:51 |
clarkb | last call for objections on 945398 and 945399 otherwise I'll approve them and then work on cleaning up nl01, nl02, nl03, and nl04 on the cloud side | 15:51 |
fungi | another fairly active thread has started up on the python community discourse in relation to yesterday's setuptools regression: https://discuss.python.org/t/how-can-build-backends-avoid-breaking-users-when-they-make-backwards-incompatible-changes/85847 | 15:55 |
fungi | clarkb: i've approved them both | 15:58 |
clarkb | fungi: thanks | 15:59 |
clarkb | I was just about to do so myself saved me a few clicks | 15:59 |
clarkb | re that thread it seems to be saying what I was trying to get at yesterday which is nice to see | 15:59 |
corvus | okay restart didn't fix it. off to the repl. | 16:01 |
opendevreview | Merged opendev/zone-opendev.org master: Cleanup nl01, nl02, nl03, and nl04 DNS records https://review.opendev.org/c/opendev/zone-opendev.org/+/945399 | 16:02 |
opendevreview | Merged openstack/project-config master: Cleanup configs for nl01, nl02, nl03, and nl04 https://review.opendev.org/c/openstack/project-config/+/945398 | 16:08 |
clarkb | once those have deployed I'll proceed with server deletion and emergency file cleanup. Should be able to get that done well before the next round of tuesday meetings | 16:09 |
clarkb | deployment succeeded for both changes. I'm proceeding with server deletions now | 16:23 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Add upload-image-s3 role https://review.opendev.org/c/zuul/zuul-jobs/+/944813 | 16:28 |
clarkb | #status log Deleted nl01.opendev.org (7bf432b1-392f-4c34-adc3-f11f8181a187), nl02.opendev.org (553767f5-b6af-4684-b716-3ad2e16e18e2), nl03.opendev.org (a53d3af1-dfc0-4cb0-9cd4-d57e43355230), and nl04.opendev.org (c8206f41-eded-44be-ae3f-a18f4788fd39). They have been replaced by nl05-08. | 16:30 |
clarkb | hrm status bot is here just being slow I guess | 16:31 |
opendevstatus | clarkb: finished logging | 16:31 |
fungi | clarkb: remember it does a synchronous write to the wiki | 16:42 |
clarkb | oh right | 16:43 |
fungi | so if the mediawiki api is dead slow responding (which it often is these days, especially for database writes), it can take an age | 16:43 |
jamespage | frickler: I need to find someone at canonical to point you at | 16:48 |
frickler | jamespage: seems haleyb was back today so best check with him I'd think | 16:50 |
jamespage | frickler: ack - ftr I'm no longer at Canonical so on the outside as well now :) | 16:52 |
jamespage | I've asked fnordahl to join this channel as he should be aware of this | 16:53 |
clarkb | other than the bug itself I think the main feedback may be that it would be good if ubuntu could track buggy kernel patches to avoid repeating the same bugs release by release months apart | 16:54 |
clarkb | bugs happen and the response in December was much appreciated. Ideally we'd avoid repeating the same issue in noble now | 16:54 |
frickler | jamespage: oh, I wasn't aware of that, I'll try to avoid annoying you with Canonical things in the future, then :-) | 16:56 |
jamespage | frickler: new news - only 2 weeks | 16:56 |
clarkb | oh congrats! | 16:56 |
jamespage | thanks | 16:57 |
frickler | jamespage: nice, so it looks like you're doing containers now. you may want to update your oif page anyway ;) | 17:00 |
jamespage | yep on the TODO list | 17:00 |
clarkb | LE will stop sending expiration email reminders. We're fine as we have our own monitoring and update 30 days in advance but mentioning it here in case anyone was relying on those emails | 17:34 |
fungi | also i don't think we ever received expiration reminders from them? or maybe we just renewed too soon to trigger any | 17:37 |
clarkb | I think we renew too soon to trigger them | 17:39 |
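Self-monitoring of the kind clarkb mentions can be as simple as querying the served cert's expiry (hostname is illustrative; what OpenDev actually runs isn't shown here):

```shell
# Ask a server for its certificate's notAfter date, the way an external
# expiry monitor might (host is illustrative):
openssl s_client -connect opendev.org:443 -servername opendev.org \
  </dev/null 2>/dev/null | openssl x509 -noout -enddate
# -checkend exits nonzero if the cert expires within N seconds,
# e.g. 30 days, which matches the renewal lead time mentioned above:
openssl s_client -connect opendev.org:443 -servername opendev.org \
  </dev/null 2>/dev/null | openssl x509 -noout -checkend $((30*24*3600))
```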
clarkb | fungi: I wouldn't say supporting old pythons is a lot of work. It's only extra work when devs choose to start making superficial changes that impact compatibility | 17:45 |
clarkb | at least for a tool like pbr | 17:45 |
clarkb | with minimal dependencies (setuptools only) and a narrow focus/scope | 17:46 |
clarkb | I think the recent breaking change is a good example of this. Setuptools can accept both variations of the names using - or _ indefinitely using a small compatibility shim. That is easy to maintain and understand basically forever. But the instant you decide to no longer be backward compatible you have to consider the impacts, and that is not easy and requires effort | 17:47 |
fungi | yeah, i tried to point out that cpython is surprisingly backward compatible, and it's setuptools deciding to drop support for old things that otherwise would still work which is causing headaches | 17:48 |
fungi | the effort in maintaining backward compatibility for pbr isn't nearly as much as for larger projects in openstack, but it's still more work than i'm sure some build backend maintainers want to sign up for | 17:50 |
slittle | Does opendev have any automated tools for keeping a feature branch up to date relative to the main branch. i.e. an automated daily merge from 'main' to 'my_branch'. I expect not, as the merge always risks failing on a conflict and manual intervention would be required at that point. | 17:51 |
clarkb | slittle: not for branches that both move independently. jeepyb does have the ability to update a local tracking branch to follow an upstream, but they can't diverge; it's a copy, not a merge | 17:52 |
clarkb | in general I suspect we'd largely recommend feature branches and similar types of work be as short lived as possible | 17:52 |
clarkb | you can maintain stacks of proposed changes on top of branches with fairly minimal effort which means unless there is a really good reason to fork temporarily you're probably better off doing that | 17:53 |
clarkb | fungi: one thing I find odd is that I think pbr is already doing the - to _ mapping for us. Are people then tripping because setuptools is also reading the file? | 17:53 |
clarkb | fungi: I wonder if we can make pbr/setuptools avoid that extra read and allow pbr to be a compatibility layer. That might work as a workaround for users of pbr | 17:54 |
clarkb | fungi: look at cfg_to_args() and setup_cfg_to_setup_kwargs() to see what I'm talking about | 17:54 |
fungi | clarkb: correct, setuptools has added setup.cfg file validation, based on (incorrect) assumptions that it's the only thing using that file | 17:55 |
fungi | and yeah, that's what i meant in my post about transparently transforming metadata options | 17:56 |
clarkb | slittle: if you can provide more info about your higher level use case that would help us provide advice that works with the existing tooling | 17:56 |
slittle | Is there any tooling to aid in maintaining such a stack of proposed changes? And sharing that stack? I know the pain of trying to keep just a few updates current in gerrit. In high traffic areas it usually throws a merge conflict pretty quick. | 17:58 |
clarkb | there is git restack: https://opendev.org/opendev/git-restack https://pypi.org/project/git-restack/ I personally just use git rebase -i HEAD~N where N is the number of commits back that I need to edit. I also do what I like to call "squash back" where I edit on the tip with new commits that I know will be squashed back into existing commits that already have changes | 18:00 |
clarkb | it's the sort of thing that becomes a lot easier with a little practice | 18:00 |
clarkb | newer/latest git has gotten a lot better about not conflicting on repeated work too, which helps when you rearrange the order of stuff | 18:01 |
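The "squash back" workflow described above can be demonstrated end to end in a throwaway repo (non-interactive here via GIT_SEQUENCE_EDITOR; in real use it's just `git rebase -i`, and all paths/messages are illustrative):

```shell
# Demonstrating "squash back": commit a late fix on the tip, marked as a
# fixup of an earlier commit, then let autosquash fold it back in place.
set -e
cd "$(mktemp -d)"
git init -q demo && cd demo
git config user.email dev@example.com
git config user.name dev
git config commit.gpgsign false
echo a > f.txt; git add f.txt; git commit -qm 'first: add f'
echo b > g.txt; git add g.txt; git commit -qm 'second: add g'
# A late fix that logically belongs in "first"; it touches a separate
# file so the replay stays conflict-free in this demo.
echo a2 > f2.txt; git add f2.txt
git commit -q --fixup "$(git rev-list --max-parents=0 HEAD)"
# Non-interactive autosquash; interactively this is `git rebase -i`:
GIT_SEQUENCE_EDITOR=: git rebase -q -i --autosquash --root
git log --format=%s    # back to two commits; the fixup was absorbed
```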
fungi | i use git-restack all the time. just used it today for this series of changes, for example: https://review.opendev.org/c/opendev/engagement/+/945225 | 18:01 |
slittle | Basically I have a starlingx feature about to launch that will run for 6 months minimum and hit a dozen gits. Right now my best recommendation for them is to branch all gits and DO NOT try to keep up with the 'main' branch on a continuing basis. Instead I'm suggesting they do just a few manual merges at well chosen times. i.e. when both main and feature are otherwise healthy. | 18:03 |
clarkb | ya so that's a pretty classic feature branch setup, and in general I think we expect those to merge manually (because you may need to merge in either direction, and it changes over time, and conflicts tend to be common with feature work) | 18:04 |
clarkb | the downside to working that way is merging can become a lot more difficult: you aren't doing it a piece at a time, it's everything all at once every time | 18:04 |
clarkb | the upside is you can ignore all the other work happening while you work on your feature branch until you go to merge | 18:05 |
fungi | right, usually whoever's maintaining that feature branch (e.g. release team members) will have the necessary permissions to merge from master into the feature branch at their discretion, whenever they feel it's needed, and then to merge the feature branch into master when they're ready to wrap it up | 18:05 |
clarkb | most openstack projects develop new features directly against master all the time and don't use feature branches. There are rare exceptions and they tend to be for specific features (though swift has used them more than others iirc) | 18:06 |
fungi | if instead you want to continually work in sync with master, rebasing a change series targeting master will be less work | 18:06 |
clarkb | which is to say both approaches are valid and do work. You just need to pick which poison is better for you | 18:06 |
slittle | What work is required to set up the feature owner with permissions to merge freely into their branch? | 18:07 |
fungi | also if you're doing this across multiple git repositories, you may need depends-on footers in the commit messages of some changes where they rely on series in a sister repository | 18:07 |
fungi | slittle: https://docs.opendev.org/opendev/infra-manual/latest/drivers.html#feature-branches | 18:08 |
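A Depends-On footer like fungi mentions is just a trailer line in the commit message that Zuul parses to sequence changes across repositories (the review URL below is made up):

```shell
# Demonstrating a Depends-On commit-message footer in a scratch repo.
# The change URL is illustrative, not a real review.
set -e
cd "$(mktemp -d)"
git init -q demo && cd demo
git config user.email dev@example.com
git config user.name dev
echo x > feature.txt; git add feature.txt
git commit -q -m 'Add client support for feature X

Depends-On: https://review.opendev.org/c/example/sister-repo/+/123456'
git log -1 --format=%B
```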
slittle | I guess the other aspect is that this is a multi-developer feature. I've only ever seen rebase used successfully for single-developer features. | 18:09 |
clarkb | there are two approaches to handling multiple devs working on the same stack that I've seen work well. The first is to always git review -d the stack before you edit it to ensure you have the latest copy, and do some lightweight comms: "I'm working on that now" | 18:10 |
clarkb | the other is to decouple it a bit and rely on depends-on rather than the git tree to enforce order | 18:11 |
clarkb | re automating merges, one thing to keep in mind is if you can git merge things trivially then that is trivial for anyone to do at any point and there is less value to doing it daily or on a schedule. If there are conflicts they need to be resolved and that requires a human anyway | 18:11 |
fungi | but generally the main reason to use a feature branch is if you want to make breaking changes that don't impact master until later, and are willing to incur the associated pain of dealing with that at merge points | 18:13 |
fungi | usually projects either develop in master and then create stable branches at some cadence to provide a lower-churn option, or they develop on feature branches so that master will be lower-churn. doing both at the same time is a lot less common | 18:14 |
clarkb | it also helps a lot to make code review and landing code an active part of the dev loop | 18:14 |
clarkb | that minimizes the critical sections and reduces the depths of stacks/context you have to deal with | 18:15 |
clarkb | consistent incremental progress essentially | 18:15 |
opendevreview | Stephen Finucane proposed openstack/project-config master: gerritbot: Log changes to stable branches on #openstack-keystone https://review.opendev.org/c/openstack/project-config/+/945512 | 18:35 |
opendevreview | Merged openstack/project-config master: gerritbot: Log changes to stable branches on #openstack-keystone https://review.opendev.org/c/openstack/project-config/+/945512 | 18:59 |
frickler | I'm seeing concerning job timeouts on rax-dfw. for the last two weeks or so that was mostly kolla jobs, now two for keystone, in particular a simple docs job that really really shouldn't time out https://zuul.opendev.org/t/openstack/build/b02c3d859f6e4084ac2447a0b353b8e2 https://zuul.opendev.org/t/openstack/build/d99690acde8e4745bf3c1d3aa832f974 | 20:39 |
frickler | these seem mostly to be happening when the cloud is running at capacity, so I'm thinking maybe to limit max-servers there for a while. like go to 100 from 140? https://grafana.opendev.org/d/a8667d6647/nodepool3a-rackspace?orgId=1&from=now-6h&to=now&timezone=utc&var-region=$__all | 20:41 |
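The throttle being proposed would be a small launcher-config edit (structure below follows typical nodepool provider configs; only the value comes from the discussion):

```shell
# Hedged sketch of lowering the rax-dfw quota; pool/provider layout is
# illustrative, the 140 -> 100 change is from the discussion above.
cat <<'EOF' > /tmp/rax-dfw-snippet.yaml
providers:
  - name: rax-dfw
    pools:
      - name: main
        max-servers: 100   # was 140; reduce while timeouts are investigated
EOF
```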
fungi | looks like it has the expected processor count and ram, at least | 20:43 |
fungi | so not a scheduling mix-up | 20:43 |
fungi | maybe this is a good incentive to pick jamesdenton's brain about shifting more of our quota from rackspace classic to flex? | 20:44 |
fungi | since the network and mirror rebuilds, i haven't observed any issues with the test nodes we've been booting in either dfw3 or sjc3 | 20:46 |
opendevreview | Aurelio Jargas proposed zuul/zuul-jobs master: Add role: `ensure-python-command`, refactor similar roles https://review.opendev.org/c/zuul/zuul-jobs/+/941490 | 21:06 |
gouthamr | has anyone run into an issue where devstack bails out quite early in CI jobs with apache2 restarts failing? my specific issue seems to occur after setting up "keystone-tls-proxy", and bouncing the apache2 service for that to take effect | 21:41 |
gouthamr | The error i see in the journal is "apache2.service: Failed with result 'start-limit-hit'." | 21:41 |
gouthamr | apache2.service: Start request repeated too quickly. | 21:41 |
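When systemd reports start-limit-hit, the standard places to look on the affected node are (diagnostic commands only, not a fix):

```shell
# Why did systemd refuse the start, and what are the current limits?
systemctl status apache2.service --no-pager
journalctl -u apache2.service --no-pager -n 100
# Show the rate-limit settings in effect for the unit:
systemctl show apache2.service -p StartLimitIntervalUSec -p StartLimitBurst
```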
tonyb | gouthamr: I haven't seen it. So you have logs from the failed job? | 21:43 |
gouthamr | tonyb: yes, https://zuul.opendev.org/t/openstack/build/e2fbf3148ba449c6ae5e0ec3f45c3318/log/controller/logs/devstacklog.txt#6218-6227 | 21:43 |
tonyb | https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_e2f/openstack/e2fbf3148ba449c6ae5e0ec3f45c3318/controller/logs/apache/tls-proxy_error_log.txt doesn't have any errors and I doubt that warning is the cause | 21:49 |
gouthamr | yeah :/ this is happening every time on a single change, but not always on the same devstack job, which gives me the feels that the particular change is cursed :D | 21:50 |
gouthamr | https://review.opendev.org/c/openstack/manila-tempest-plugin/+/942862 | 21:50 |
tonyb | I'll keep looking, but it's slow going because I'm on my phone | 21:50 |
gouthamr | ty for taking a look, tonyb | 21:50 |
gouthamr | ++ | 21:50 |
JayF | This sounds vaguely like an issue we had in ironic, I don't remember how we fixed it | 21:58 |
* JayF can't find it in gerrit | 22:00 | |
tonyb | gouthamr: I think I need my laptop to do more digging. Does a no op change on the same SHA with the (merged) depends-on fail the same way? | 22:03 |
Clark[m] | gouthamr tonyb https://serverfault.com/questions/845471/service-start-request-repeated-too-quickly-refusing-to-start-limit | 23:02 |
Clark[m] | Probably just need to update the unit file to allow more restarts. That will be simpler than changing how devstack updates apache. My guess is those jobs ran on faster rax flex nodes and that allows them to restart too quickly | 23:02 |
clarkb | I don't think you need to fully replace the /usr/lib/systemd/system unit you can just append to it via /etc/systemd/ or whatever the path is | 23:24 |
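Concretely, the drop-in approach clarkb describes might look like this (burst/interval values are illustrative and untested in devstack):

```shell
# Sketch of raising apache2's restart rate limit via a systemd drop-in,
# leaving the packaged unit file in /usr/lib untouched.
sudo mkdir -p /etc/systemd/system/apache2.service.d
sudo tee /etc/systemd/system/apache2.service.d/start-limit.conf <<'EOF'
[Unit]
StartLimitIntervalSec=60
StartLimitBurst=10
EOF
sudo systemctl daemon-reload
sudo systemctl restart apache2
```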
clarkb | whoever decided that pyenv installing python 3.13 should install to /usr/local/bin/python3.13.2t is crazy | 23:27 |
clarkb | ok now I shall go back to enjoying the nice weather. Tomorrow I'll try to land things that have been reviewed | 23:33 |
corvus | \o/ mariadb query timeouts look good now: Query | 1 | Sending data | SET STATEMENT max_statement_time=30.0 for ... | 23:48 |
corvus | i restarted the schedulers and web servers to pick up the fix | 23:48 |
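Spelled out, the per-statement timeout corvus verified looks like this (the SELECT is illustrative, since the actual zuul-web query is elided in the log line above):

```shell
# MariaDB per-statement timeout: the statement is killed if it runs
# longer than 30 seconds, without changing the session-wide setting.
# Table and query are illustrative, not zuul's real schema usage.
mysql zuul -e "
  SET STATEMENT max_statement_time=30.0 FOR
  SELECT * FROM example_builds ORDER BY id DESC LIMIT 10;"
```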
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!