opendevreview | Steve Baker proposed openstack/diskimage-builder master: growvols: reserve space for spare metadata volume https://review.opendev.org/c/openstack/diskimage-builder/+/892244 | 00:45 |
opendevreview | OpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml https://review.opendev.org/c/openstack/project-config/+/892257 | 02:39 |
opendevreview | Steve Baker proposed openstack/diskimage-builder master: growvols: reserve space for spare metadata volume https://review.opendev.org/c/openstack/diskimage-builder/+/892244 | 03:49 |
opendevreview | Merged openstack/project-config master: Normalize projects.yaml https://review.opendev.org/c/openstack/project-config/+/892257 | 05:14 |
fungi | judging from the running builds graph, looks like we're doing around 600 concurrent builds at the moment | 13:04 |
fungi | also very close to (or maybe right at) full quota saturation, so we likely won't see it climb any higher than that | 13:04 |
fungi | hovering around 650-660 nodes in an "in use" state | 13:06 |
fungi | actually looks like it may have gotten as high as 713 nodes in use | 13:06 |
fungi | oh, right, the executor queue graph gives a total number of running jobs and that reached 555 at the top of the hour | 13:07 |
fungi | so we seem to be averaging 1.28 nodes per build | 13:08 |
*** d34dh0r5- is now known as d34dh0r53 | 13:09 |
fungi | looks like a bunch of changes were dumped into openstack's check pipeline around 12:40z, leading most of the executors to stop accepting new builds for a bit, but they caught up with that right at 13:00z | 13:11 |
clarkb | high level queues don't look too bad at first glance | 15:00 |
fungi | at 14:56:40 we had 627 concurrent builds | 15:16 |
fungi | only 673 nodes were in use at that time, so 1.07 nodes per build | 15:17 |
fungi | at 13:57:20 we reached 719 nodes in use, but were only running 624 builds at that time | 15:19 |
fungi | we're back to having available quota again for the past half hour | 15:20 |
clarkb | fungi: ildikov: did the matrix eavesdrop bot ever get added to the starlingx rooms? | 16:21 |
opendevreview | Clark Boylan proposed opendev/bindep master: Drop fedora-latest test job https://review.opendev.org/c/opendev/bindep/+/892378 | 16:33 |
opendevreview | Clark Boylan proposed opendev/base-jobs master: Remove fedora-latest nodeset https://review.opendev.org/c/opendev/base-jobs/+/892380 | 16:39 |
frickler | clarkb: looking at your bindep patch I notice that there isn't any testing on debian, either, is that intentional or could we add at least bookworm maybe? | 16:42 |
clarkb | frickler: I think we can add bookworm. | 16:42 |
clarkb | I doubt it is intentional. More likely a result of us doing most of the bindep work long before we had debian test nodes | 16:42 |
frickler | oops and py27 is broken | 16:43 |
clarkb | that's weird, it says it can't find the 2.7 interpreter but the job does install it | 16:45 |
clarkb | I wonder if the latest release of nox breaks python2.7 interpreter discovery | 16:46 |
frickler | might be related to the venv it runs from? but then I don't know why it worked earlier | 16:46 |
clarkb | ya looks like the error actually comes from virtualenv according to google | 16:47 |
clarkb | I guess we need to cap the virtualenv version | 16:47 |
clarkb | I think in the ensure-nox role we need to cap virtualenv on python2.7 jobs | 16:50 |
clarkb | or maybe we should just downgrade it in a child job step | 16:50 |
fungi | clarkb: there has been no change yet to add the bot, but i can push one in a moment | 16:55 |
opendevreview | Clark Boylan proposed zuul/zuul-jobs master: Fix nox job python2.7 support https://review.opendev.org/c/zuul/zuul-jobs/+/892384 | 17:03 |
clarkb | frickler: something like that maybe ^ | 17:03 |
frickler | clarkb: that may need extra quoting for the "<", but with the downgrade my local bindep test succeeds | 17:05 |
clarkb | I think because it is command it's fine? | 17:06 |
clarkb | command doesn't do shell things like redirection | 17:06 |
frickler | maybe. anyway we can test it out, can't get more broken I guess | 17:07 |
clarkb | indeed | 17:08 |
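A minimal sketch of the kind of task being discussed, assuming the fix is to pin virtualenv below 20.22.0 (roughly where it dropped the ability to create Python 2.7 environments); the actual playbook in 892384 may differ, and the pip invocation here is an assumption:

```yaml
# Sketch only, not the exact content of change 892384.
- name: Cap virtualenv so it can still create Python 2.7 environments
  # The command module passes arguments literally instead of through a shell,
  # so the "<" in the version specifier is never treated as redirection.
  command: pip install 'virtualenv<20.22.0'   # pip location is an assumption
```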
frickler | hmm, weird, when I was looking in the UI at https://review.opendev.org/c/zuul/zuul-jobs/+/892384 I could see multiple conflicting patches. after looking at one of them and deciding it could be abandoned, I no longer see the others | 17:14 |
clarkb | abandoning may trigger a reindex that finds they no longer conflict? | 17:16 |
clarkb | that is odd though | 17:16 |
frickler | weirder even, after submitting my W+1, they're back | 17:16 |
frickler | same thing repeats after looking at the next patch in question. resolved by adding a comment | 17:18 |
clarkb | I'm guessing it has to do with index state for the change(s) | 17:18 |
clarkb | since taking some action restores the info maybe that triggers the data to be repopulated | 17:18 |
corvus | quick question: did anything test https://review.opendev.org/892384 ? | 17:18 |
clarkb | corvus: no but it's broken anyway so it seemed safe (e.g. we won't make it any more broken) | 17:19 |
frickler | corvus: no, but the job was broken anyway, so I decided it couldn't get worse | 17:19 |
frickler | see https://review.opendev.org/c/opendev/bindep/+/892378 | 17:20 |
corvus | i'm not doubting that it's broken | 17:20 |
clarkb | and frickler did a local reproduction of the steps to confirm that fixes it generally (but not through the playbook in that change, aiui) | 17:20 |
frickler | I tested locally that the venv cap solves the nox run | 17:20 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Add StarlingX Matrix channels to the logbot https://review.opendev.org/c/opendev/system-config/+/892387 | 17:20 |
fungi | clarkb: ildikov: ^ | 17:21 |
corvus | okay it's just... i mean, is it hard to test that? | 17:21 |
frickler | we'll test in bindep in 5 minutes when it is merged, not worth any additional effort imo | 17:21 |
fungi | dnm bindep change depending on it maybe? | 17:23 |
corvus | tests serve a purpose to avoid regressions too | 17:23 |
corvus | this just doesn't seem like our usual practice in zuul-jobs, especially with super popular roles/jobs like this one | 17:23 |
fungi | or set 892378 to depends-on it, that should work, right? | 17:23 |
corvus | this affects every zuul running nox-py27 | 17:24 |
corvus | fungi: and with a depends-on, there's no rush to merge it without review | 17:25 |
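For reference, the Depends-On approach fungi suggests is just a footer in the bindep change's commit message pointing at the zuul-jobs change, so Zuul tests both together before either merges; a sketch using the reviews mentioned above:

```
Drop fedora-latest test job

Depends-On: https://review.opendev.org/c/zuul/zuul-jobs/+/892384
```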
corvus | so why can't this be tested? (and also, could this be done in one of the roles instead?) | 17:27 |
opendevreview | Merged zuul/zuul-jobs master: Fix nox job python2.7 support https://review.opendev.org/c/zuul/zuul-jobs/+/892384 | 17:27 |
fungi | too late now i guess | 17:27 |
corvus | not really? | 17:27 |
fungi | unless we revert it | 17:28 |
fungi | i'm happy to push a revert if you like | 17:28 |
corvus | i don't think "we slipped this change in real fast before anyone could notice" is our policy either | 17:28 |
fungi | i agree | 17:28 |
fungi | i happened to be in a conference call and missed all the discussion around it | 17:28 |
corvus | i'd actually just like to have test coverage | 17:29 |
fungi | (still in a conference call unfortunately, actually in meetings for the next several hours straight) | 17:29 |
corvus | i believe frickler, but i also want to be able to merge changes to zuul-jobs with confidence that we don't break things | 17:29 |
corvus | that requires test coverage | 17:29 |
corvus | so i'd like to either have a test for it in zuul-jobs, or a good reason why we don't (we don't have 100% test coverage, and sometimes there's good reasons for that). i'm struggling to figure out what that is for nox-py27 -- which is not esoteric. | 17:30 |
frickler | but that is unrelated to fixing the job after it broke | 17:31 |
clarkb | corvus: we don't have any runtime testing of the nox jobs. Just the ensure-nox role | 17:31 |
frickler | it was broken before exactly because there is no regular test running that would have caught it regressing | 17:31 |
fungi | it provides confidence in the purported fix | 17:31 |
corvus | clarkb: indeed, which is why my next question is: could that be placed in the role instead so that we can maintain the "most of the work is in the roles" structure? | 17:32 |
corvus | frickler: it broke due to an external factor, not a change in zuul-jobs | 17:32 |
clarkb | corvus: we could put it in the role. The reason I didn't put it there is python2.7 will eventually go away and this gives us a nice cleanup spot. But we can also delete tasks out of a role | 17:33 |
clarkb | and ya sorry I'm multitasking with a phone call meeting | 17:33 |
frickler | corvus: yes, that's what causes 90% of all regressions in my experience, so? | 17:33 |
corvus | frickler: tests in zuul-jobs are to prevent changes in zuul-jobs from breaking zuul-jobs. they are not meant to catch external factors. | 17:33 |
corvus | this is a change to zuul-jobs, so it (1) should have had a test to verify it worked and (2) should have a test to verify that it doesn't break in the future if someone changes that file. | 17:34 |
frickler | but why is that one thing more important than the other? | 17:34 |
corvus | frickler: we can do something about it. | 17:34 |
frickler | well we could do something about this, too | 17:35 |
corvus | so to be clear, if i had been given the opportunity to review that change, i would have -1'd it asking for 1) can the logic be put in a role; 2) can it be tested? and i think #1 would have taken care of #2. | 17:35 |
frickler | well you can still do that now. what if I, as the potential patch submitter, had said I didn't want to go to the extra effort, would you have left the job broken? | 17:36 |
corvus | clarkb: frickler are either of you interested in implementing that? | 17:36 |
corvus | frickler: yes i would | 17:36 |
clarkb | oh the other issue with putting it in ensure-nox is we don't currently have any knowledge of what the python version or eventual sessions will be when installing nox | 17:37 |
clarkb | but we can add a flag like nox_python27_virtualenv: true or something to address that | 17:37 |
clarkb | and ya I can push a change up that does ^ | 17:38 |
corvus | clarkb: that's a good point. that would be one way of doing that, or, if we decide the job is the best place, we can add job tests. | 17:38 |
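A rough sketch of the role-level flag clarkb describes for ensure-nox: a boolean default plus a conditional task. The variable name comes from the discussion; the task body and version boundary are assumptions:

```yaml
# Sketch for an added task in the ensure-nox role; defaults would carry
# nox_python27_virtualenv: false.
- name: Cap virtualenv when the consuming job still targets Python 2.7
  command: pip install 'virtualenv<20.22.0'   # invocation is an assumption
  when: nox_python27_virtualenv | default(false) | bool
```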
fungi | if nobody is interested in adding regression testing for a role which has broken, that's a signal that removing it is the better alternative | 17:39 |
frickler | https://zuul.opendev.org/t/opendev/build/11deacb2f82e4ca1b29d7448eb4ddc19 just in case anyone cares | 17:39 |
corvus | fungi: and indeed we have deprecated and removed things from zuul-jobs because of that | 17:39 |
frickler | but yes, maybe dropping support for py27 would be a better path | 17:40 |
clarkb | that also works for me at this point I think. Bookworm took the plunge and that's a good indicator we can care less about python2.7 now | 17:40 |
corvus | after all, the python project did :) | 17:40 |
clarkb | I've got meetings and/or meeting prep for the next couple of hours. If we can decide on a preference between the brainstormed options I can try to push a change up to implement it after lunch | 17:41 |
corvus | here's my request: please take zuul-jobs testing seriously, and code review too. keep in mind that zuul-jobs is a zuul project, used by many zuul users, not just there for opendev's needs. | 17:43 |
corvus | i would be happy to review a removal of py27 because it's unsupported by upstream, or additional testing of the backport pin. | 17:45 |
clarkb | bindep is the only place using nox-py27 in codesearch and I think we can drop it from bindep | 17:48 |
clarkb | I'll take a look at cleaning up that job from bindep and zuul-jobs later today | 17:49 |
corvus | wfm, thanks! | 17:49 |
fungi | i'm in favor of dropping it but can't commit to pushing a change for that until middle of my tomorrow at the earliest | 17:49 |
frickler | I may not be around for the meeting later, so this may be a good opportunity for my question now: how do you intend to test rolling out ansible 8 for openstack? I have no idea how large the impact might be, but I very much would like to avoid a repetition of the queue config situation, which nobody except me seems to really care about | 18:01 |
clarkb | frickler: anyone and any project can opt into it right now with speculative testing of everything that isn't trusted | 18:01 |
frickler | clarkb: with 90% of openstack nearly dead, this will not happen | 18:02 |
fungi | i think we need to take a harder line on removing projects from the openstack tenant if they have broken configuration | 18:02 |
clarkb | in my head we'd set a date to make it the default sometime after the release and encourage people to test it via speculative changes before then | 18:02 |
clarkb | and ya if after the hard cutover date projects are dead we can remove them from the zuul tenant config | 18:02 |
fungi | "eol your broken branches or we will turn off testing for the entire repo" | 18:03 |
corvus | ++ to all of the above | 18:03 |
fungi | apropos, the openstack tc meeting is happening right this moment | 18:03 |
clarkb | the upside to tenant config cleanup is that it is really easy to revert and it doesn't really matter too much if the project configs are broken for a project. The config updates either way aiui | 18:05 |
clarkb | *the zuul tenant config updates either way | 18:05 |
corvus | btw, we have speculatively tested ansible 8 with zuul-jobs, so the basics are known to work. | 18:06 |
corvus | (no changes were required) | 18:06 |
clarkb | corvus: have we flipped the zuul tenant over by default yet? If not should we do that soon? | 18:07 |
corvus | so there's a possibility that it's not a big deal. | 18:07 |
corvus | clarkb: we have not. i could go either way on that (either flip it now for more frequent testing, or wait until zuul flips the default, which is the next step in the process) | 18:07 |
frickler | is there any way to configure it more specifically than at the tenant level? | 18:08 |
corvus | frickler: sure, at the job level | 18:08 |
corvus | that's how we did the initial testing, and how anyone can test it in their project now | 18:08 |
corvus | example change: https://review.opendev.org/c/zuul/zuul-jobs/+/890362 | 18:10 |
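The opt-in corvus describes uses Zuul's per-job ansible-version attribute, so a project can test ahead of the tenant-wide flip with a variant along these lines (job names are illustrative):

```yaml
# Sketch: run an existing job under Ansible 8 before the tenant default changes.
- job:
    name: my-project-unit-tests-ansible8   # illustrative name
    parent: my-project-unit-tests          # illustrative parent
    ansible-version: 8
```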
clarkb | I kinda like the idea of having zuul's tenant dogfood the change since that will give us broader coverage with people in a good position to debug/fix anything unexpected | 18:10 |
corvus | clarkb: i was mistaken, we are running ansible 8 in the zuul tenant, sorry | 18:11 |
clarkb | perfect :) | 18:11 |
corvus | https://review.opendev.org/c/openstack/project-config/+/890367 | 18:12 |
corvus | clarkb: we both have questionable memories :) | 18:12 |
clarkb | ha | 18:12 |
corvus | so that's been in place for almost weeks | 18:12 |
corvus | it is also possible to include/exclude config from specific branches (or branch patterns) for projects. not sure if we want to go that route in opendev, but it's a tool in the box. | 18:14 |
corvus | (that's back to the "what to do about projects that won't clean up their room" issue) | 18:14 |
clarkb | oh I thought that wasn't possible. Someone asked about it a couple months ago (jpew maybe?) | 18:16 |
clarkb | oh no that was merging those branches. I think it will always merge the branches but then it can choose to ignore the results? | 18:16 |
clarkb | anyway that is good to know | 18:16 |
corvus | i mean, i wouldn't do that on anything except a completely dead branch and then if someone proposed a change to that branch, not expect any kind of working behavior. | 18:17 |
corvus | but if we open that door in opendev, i worry we would have an unmanageable list of exceptional "stable/" and "bugfix/" branches that are different for every one of the 2000 openstack repos, so.... i don't advocate it. | 18:18 |
clarkb | good point | 18:19 |
corvus | (just mentioning it in case it's useful in some way that we think might actually avoid that outcome :) | 18:19 |
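For reference, the branch filtering corvus mentions is expressed per project in the tenant configuration; a hedged sketch, with tenant, project, and branch names purely illustrative:

```yaml
# Sketch of tenant-level branch filtering; all names are illustrative.
- tenant:
    name: example
    source:
      gerrit:
        untrusted-projects:
          - example/some-project:
              exclude-branches:
                - stable/dead-branch
```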
clarkb | I mentioned to the TC that error-free zuul config is different from working CI jobs though, which seems to be part of the confusion over the effort required | 18:19 |
JayF | The thing is they are often connected in practice | 18:22 |
JayF | unless we're force-merging changes to fix the config error alone | 18:22 |
fungi | or deleting enough configuration that what's left passes | 18:22 |
clarkb | you shouldn't need to force merge. you just delete what is broken | 18:22 |
JayF | In some of the Ironic cases, that woulda been all the integration tests. | 18:23 |
JayF | You aren't wrong; it's just not easy to toss that many tests | 18:23 |
frickler | finding out what's broken is a lot of work, just force-merging the queue config fix was a lot easier | 18:23 |
fungi | well, those tests are already not being run because zuul can't parse the configuration in order to know to run them | 18:23 |
clarkb | sure but the main concern is projects who are dead and don't have anyone to fix CI | 18:23 |
JayF | Yeah in this case we have some not-dead-at-all-but-neglected stuff | 18:24 |
JayF | e.g. ironicclient yoga is in this list | 18:24 |
fungi | so your choice is between not merging anything to the branch and keeping it open because its configuration is broken, or removing the branch | 18:24 |
clarkb | don't fix CI in those cases, just stop CI, either by removing the project from zuul or deleting the broken zuul config so that zuul stops complaining | 18:24 |
fungi | and in fact ci is already stopped anyway | 18:24 |
clarkb | JayF: hopefully ironicclient yoga branch isn't defining any integration jobs for the rest of ironic and if it is you probably do want to fix it anyway | 18:25 |
fungi | maybe it's unclear that when openstack/python-ironicclient stable/ussuri appears in that config-errors list, zuul already is not running any jobs for changes targeting that branch any longer | 18:25 |
JayF | fungi: it's clear, in this case the frustration is more related to me knowing someone is shipping that, and nobody giving a damn that ^^^ has been true for at least a year | 18:26 |
clarkb | noonedeadpunk: Zuul's tenant is already on ansible 8 by default. It works generally. | 18:52 |
clarkb | noonedeadpunk: we don't need to solve python version and openstack cloud collection problems for the general case. That is all done | 18:53 |
clarkb | however, specific jobs may have problems and that is why we won't flip the switch immediately across the board | 18:53 |
JayF | Honestly I think the biggest ansible pain is gonna be ironic/bifrost | 18:53 |
JayF | IIRC we aren't properly dual-nesting environments there | 18:53 |
JayF | I should run a test on that nowish so we get a canary, if that's true it'll be a project to fix | 18:54 |
clarkb | ++ and when we get emails out about this later this week we'll include instructions for doing that. But basically just push a change that explicitly sets the ansible version to 8 on your job(s) and they will run with newer ansible | 18:54 |
noonedeadpunk | clarkb: ah, ok, good to know. then indeed it should be generally fine | 18:55 |
noonedeadpunk | I don't think that either kolla or osa will be affected by this then | 18:56 |
fungi | those should, i expect, already be invoking their own versions of ansible on test nodes anyway | 18:58 |
fungi | this won't affect their ability to do that | 18:58 |
fungi | this is purely a change to the default version of ansible used by the executor to run playbooks listed as phases of the job definitions, and any roles included by them | 18:59 |
fungi | if those playbooks install and run another ansible version, that can be any version of ansible you like | 19:00 |
fungi | it's just another application at that point | 19:01 |
noonedeadpunk | nah, for sure not, I was just having weird behaviour of swift upload when I was messing around with ansible versions, or rather the openstack.cloud version, on ansible 6 | 19:03 |
noonedeadpunk | and things were breaking in multiple places, not in a good way, in post jobs at least | 19:03 |
noonedeadpunk | but yeah, that wasn't latest zuul version, so hopefully this is covered now | 19:04 |
clarkb | note the swift upload does not use ansible collections for openstack | 19:04 |
fungi | also we're only ourselves uploading to actual swift api endpoints. we tried to use providers who offered swift-like ceph+radosgw and it did not end well | 19:11 |
fungi | mainly it caused problems for the provider, because of lack of proper multi-tenancy and sub-par performance compared to real swift | 19:12 |
opendevreview | James E. Blair proposed openstack/project-config master: Switch opendev tenant to Ansible 8 https://review.opendev.org/c/openstack/project-config/+/892405 | 19:51 |
frickler | clarkb: regarding the gerrit change, anything from 6-16UTC should be fine for me | 20:02 |
clarkb | frickler: cool, I can probably be at the computer by 14:30UTC. Maybe you and/or fungi want to approve the config update prior to that so it is deployed; then I'm happy to help do a restart around 14:30 | 20:03 |
frickler | that's for tomorrow then, just to avoid any misunderstanding? | 20:04 |
clarkb | yes, tomorrow works for me | 20:05 |
frickler | ok, fine | 20:06 |
fungi | yeah, i'll hopefully be free by then. painters are coming by first thing in the morning to get our guest suite prepped for work over the course of the rest of the week, so i'll probably be interrupted/distracted a bit | 20:09 |
clarkb | ack | 20:09 |
clarkb | I made my service coordinator nomination official and will now find lunch | 20:11 |
opendevreview | Merged openstack/project-config master: Switch opendev tenant to Ansible 8 https://review.opendev.org/c/openstack/project-config/+/892405 | 20:14 |
corvus | that should take effect soon ^ | 20:17 |
Clark[m] | We can recheck the bindep change to exercise it | 20:18 |
opendevreview | Steve Baker proposed openstack/diskimage-builder master: growvols: reserve space for spare metadata volume https://review.opendev.org/c/openstack/diskimage-builder/+/892244 | 20:26 |
opendevreview | Steve Baker proposed openstack/diskimage-builder master: growvols: reserve space for spare metadata volume https://review.opendev.org/c/openstack/diskimage-builder/+/892244 | 20:29 |
ildikov | clarkb: Thanks for the heads up! I just +1'ed fungi's patch that adds the config changes. | 20:32 |
clarkb | fungi: how do we specify that wheels are python3 and not universal? | 21:24 |
clarkb | do we just drop the wheel section from setup.cfg? | 21:24 |
fungi | yes | 21:24 |
fungi | it's that wheel.universal=1 which tells it to claim support for py2.py3 instead of just whichever one it was built with | 21:25 |
fungi | without that, if we build under python3 it will make a py3 wheel, and if we build under python(2) it will create a py2 wheel | 21:26 |
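The setting being discussed lives in setup.cfg; the chat quotes the older [wheel] section spelling (modern setuptools uses [bdist_wheel]), and dropping it has the effect fungi describes. A sketch of the "before" state:

```ini
# Before: claims py2.py3 support regardless of which interpreter does the build.
# Removing this section (or setting universal = 0) makes the wheel tag follow
# the interpreter running the build, e.g. a py3 wheel when built under python3.
[wheel]
universal = 1
```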
opendevreview | Clark Boylan proposed opendev/bindep master: Drop python27 testing and support https://review.opendev.org/c/opendev/bindep/+/892407 | 21:27 |
*** timburke_ is now known as timburke | 21:28 | |
opendevreview | Clark Boylan proposed zuul/zuul-jobs master: Remove the nox-py27 job https://review.opendev.org/c/zuul/zuul-jobs/+/892408 | 21:30 |
clarkb | corvus: ^ fyi | 21:30 |
fungi | clarkb: soft -1 on 892407 | 21:49 |
clarkb | good point | 21:49 |
opendevreview | Clark Boylan proposed opendev/bindep master: Drop python27 testing and support https://review.opendev.org/c/opendev/bindep/+/892407 | 21:51 |
fungi | clarkb: fwiw, we also don't actually test 3.5 there currently either, should we continue to claim support for it? | 21:52 |
clarkb | fungi: I strongly suspect it would work given the small number of deps and lack of active development adding new syntax | 21:52 |
fungi | if dropping 2.7 testing means we should stop claiming to support 2.7, then lack of 3.5 testing likely means we shouldn't claim support for it either | 21:52 |
clarkb | but ya maybe we should clean it all up | 21:53 |
fungi | i mean, we currently don't have reason to believe 2.7 would immediately cease working either | 21:53 |
fungi | we seem to assume 3.5 works even though we haven't tested it for ages | 21:54 |
fungi | looks like we dropped it with the switch to nox last year | 21:55 |
opendevreview | Clark Boylan proposed opendev/bindep master: Drop Python 2.7/3.5 testing and support https://review.opendev.org/c/opendev/bindep/+/892407 | 21:56 |
clarkb | how does that look | 21:56 |
clarkb | I'm doing review of the gitea change and remembered/noticed that the existing gitea servers have /data/gitea/jwt/private.pem contents already and I'm not sure if those will change and invalidate any jwt stuff if I explicitly set a secret value | 22:04 |
clarkb | I think it may do that but since we don't do oauth it doesn't matter | 22:04 |
clarkb | but I can check on the held node if this is a problem by changing the secret by hand and restarting things | 22:04 |
fungi | on a related note, i looked at the recent zuul-announce posts i received and all the headers now correctly claim lists.zuul-ci.org | 23:00 |
fungi | the only opendev references were in the received headers, where exim helo's as lists01.opendev.org | 23:01 |
fungi | which is correct behavior by my reckoning | 23:01 |
fungi | since that's its reverse dns name too | 23:01 |
clarkb | sounds correct to me | 23:03 |
opendevreview | Merged opendev/bindep master: Drop fedora-latest test job https://review.opendev.org/c/opendev/bindep/+/892378 | 23:05 |
opendevreview | Merged opendev/bindep master: Drop Python 2.7/3.5 testing and support https://review.opendev.org/c/opendev/bindep/+/892407 | 23:06 |
opendevreview | Merged openstack/diskimage-builder master: growvols: reserve space for spare metadata volume https://review.opendev.org/c/openstack/diskimage-builder/+/892244 | 23:52 |
opendevreview | Merged openstack/diskimage-builder master: Fix baseurl for Fedora versions before 36 https://review.opendev.org/c/openstack/diskimage-builder/+/890650 | 23:52 |
opendevreview | Merged openstack/diskimage-builder master: Remove lower-constraints.txt https://review.opendev.org/c/openstack/diskimage-builder/+/878287 | 23:55 |