Tuesday, 2023-08-22

opendevreviewSteve Baker proposed openstack/diskimage-builder master: growvols: reserve space for spare metadata volume  https://review.opendev.org/c/openstack/diskimage-builder/+/89224400:45
opendevreviewOpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml  https://review.opendev.org/c/openstack/project-config/+/89225702:39
opendevreviewSteve Baker proposed openstack/diskimage-builder master: growvols: reserve space for spare metadata volume  https://review.opendev.org/c/openstack/diskimage-builder/+/89224403:49
opendevreviewMerged openstack/project-config master: Normalize projects.yaml  https://review.opendev.org/c/openstack/project-config/+/89225705:14
fungijudging from the running builds graph, looks like we're doing around 600 concurrent builds at the moment13:04
fungialso very close to (or maybe right at) full quota saturation, so we likely won't see it climb any higher than that13:04
fungihovering around 650-660 nodes in an "in use" state13:06
fungiactually looks like it may have gotten as high as 713 nodes in use13:06
fungioh, right, the executor queue graph gives a total number of running jobs and that reached 555 at the top of the hour13:07
fungiso we seem to be averaging 1.28 nodes per build13:08
*** d34dh0r5- is now known as d34dh0r5313:09
fungilooks like a bunch of changes were dumped into openstack's check pipeline around 12:40z, leading most of the executors to stop accepting new builds for a bit, but they caught up with that right at 13:00z13:11
clarkbhigh level queues don't look too bad at first glance15:00
fungiat 14:56:40 we had 627 concurrent builds15:16
fungionly 673 nodes were in use at that time, so 1.07 nodes per build15:17
fungiat 13:57:20 we reached 719 nodes in use, but were only running 624 builds at that time15:19
fungiwe're back to having available quota again for the past half hour15:20
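
The nodes-per-build figures quoted above can be reproduced with a short calculation. This is a minimal sketch using only the numbers mentioned in the log; pairing the 713-node peak with the 555 running jobs to get 1.28 is an assumption about which two figures were divided, and the function name and sample labels are hypothetical.

```python
def nodes_per_build(nodes_in_use: int, running_builds: int) -> float:
    """Return the average number of nodes consumed by each running build."""
    if running_builds <= 0:
        raise ValueError("running_builds must be positive")
    return nodes_in_use / running_builds


# (nodes in use, concurrent builds) as quoted in the log.
# The first pairing is an assumption about how the 1.28 figure was derived.
samples = {
    "around 13:00z": (713, 555),
    "14:56:40": (673, 627),
    "13:57:20": (719, 624),
}

for label, (nodes, builds) in samples.items():
    print(f"{label}: {nodes_per_build(nodes, builds):.2f} nodes per build")
```

A ratio above 1.0 reflects multi-node jobs, so the ratio shifts with the mix of jobs running at any given moment.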
clarkbfungi: ildikov: did the matrix eavesdrop bot ever get added to the starlingx rooms?16:21
opendevreviewClark Boylan proposed opendev/bindep master: Drop fedora-latest test job  https://review.opendev.org/c/opendev/bindep/+/89237816:33
opendevreviewClark Boylan proposed opendev/base-jobs master: Remove fedora-latest nodeset  https://review.opendev.org/c/opendev/base-jobs/+/89238016:39
fricklerclarkb: looking at your bindep patch I notice that there isn't any testing on debian, either, is that intentional or could we add at least bookworm maybe?16:42
clarkbfrickler: I think we can add bookworm.16:42
clarkbI doubt it is intentional. More likely a result of us doing most of the bindep work long before we had debian test nodes16:42
frickleroops and py27 is broken16:43
clarkbthats weird it says it can't find the 2.7 interpreter but the job does install it16:45
clarkbI wonder if the latest release of nox breaks python2.7 interpreter discovery16:46
fricklermight be related to the venv it runs from? but then I don't know why it worked earlier16:46
clarkbya looks like the error actually comes from virtualenv according to google16:47
clarkbI guess we need to cap the virtualenv version16:47
clarkbI think in the ensure-nox role we need to cap virtualenv on python2.7 jobs16:50
clarkbor maybe we should just downgrade it in a child job step16:50
fungiclarkb: there has been no change yet to add the bot, but i can push one in a moment16:55
opendevreviewClark Boylan proposed zuul/zuul-jobs master: Fix nox job python2.7 support  https://review.opendev.org/c/zuul/zuul-jobs/+/89238417:03
clarkbfrickler: something like that maybe ^17:03
fricklerclarkb: that may need extra quoting for the "<", but with the downgrade my local bindep test succeeds17:05
clarkbI think because it is command its fine?17:06
clarkbcommand doesn't do shell things like redirection17:06
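
The point about quoting can be illustrated outside Ansible. The sketch below is a Python analogy, not the zuul-jobs change itself: when a program is started from an argument list without a shell, a "<" inside an argument is passed through literally instead of being treated as input redirection. The requirement string is a hypothetical placeholder, since the actual pin used in the change does not appear in this log.

```python
import subprocess
import sys

# Hypothetical requirement string; the real version cap is not shown in the log.
requirement = "virtualenv<20.22.0"


def echo_argument(arg: str) -> str:
    """Run a child Python process without a shell and return the argument it saw."""
    result = subprocess.run(
        # List form means no shell is involved, so "<" is just a character.
        [sys.executable, "-c", "import sys; print(sys.argv[1])", arg],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


print(echo_argument(requirement))
```

If the same string were interpolated into a shell command line unquoted, the shell would try to open a file named after the text following the "<", which is the failure mode frickler was worried about.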
fricklermaybe. anyway we can test it out, can't get more broken I guess17:07
fricklerhmm, weird, when I was looking in the UI at https://review.opendev.org/c/zuul/zuul-jobs/+/892384 I could see multiple conflicting patches. after looking at one of them and deciding it could be abandoned, I no longer see the others17:14
clarkbabandoning may trigger a reindex that finds they no longer conflict?17:16
clarkbthat is odd though17:16
fricklerweirder even, after submitting my W+1, they're back17:16
fricklersame thing repeats after looking at the next patch in question. resolved by adding a comment17:18
clarkbI'm guessing it has to do with index state for the change(s)17:18
clarkbsince taking some action restores the info maybe that triggers the data to be repopulated17:18
corvusquick question: did anything test https://review.opendev.org/892384 ?17:18
clarkbcorvus: no but its broken anyway so seemed safe (eg we won't make it any more broken)17:19
fricklercorvus: no, but the job was broken anyway, so I decided it couldn't get worse17:19
fricklersee https://review.opendev.org/c/opendev/bindep/+/89237817:20
corvusi'm not doubting that it's broken17:20
clarkband frickler did a local reproduction of steps to see that fixes it generally (but not through the playbook in that change aiui)17:20
fricklerI tested locally that the venv cap solves the nox run17:20
opendevreviewJeremy Stanley proposed opendev/system-config master: Add StarlingX Matrix channels to the logbot  https://review.opendev.org/c/opendev/system-config/+/89238717:20
fungiclarkb: ildikov: ^17:21
corvusokay it's just... i mean, is it hard to test that?17:21
fricklerwe'll test in bindep in 5 minutes when it is merged, not worth any additional effort imo17:21
fungidnm bindep change depending on it maybe?17:23
corvustests serve a purpose to avoid regressions too17:23
corvusthis just doesn't seem like our usual practice in zuul-jobs, especially with super popular roles/jobs like this one17:23
fungior set 892378 to depends-on it should work right?17:23
corvusthis affects every zuul running nox-py2717:24
corvusfungi: and with a depends-on, there's no rush to merge it without review17:25
corvusso why can't this be tested?  (and also, could this be done in one of the roles instead?)17:27
opendevreviewMerged zuul/zuul-jobs master: Fix nox job python2.7 support  https://review.opendev.org/c/zuul/zuul-jobs/+/89238417:27
fungitoo late now i guess17:27
corvusnot really?17:27
fungiunless we revert it17:28
fungii'm happy to push a revert if you like17:28
corvusi don't think "we slipped this change in real fast before anyone could notice" is our policy either17:28
fungii agree17:28
fungii happened to be in a conference call and missed all the discussion around it17:28
corvusi'd actually just like to have test coverage17:29
fungi(still in a conference call unfortunately, actually in meetings for the next several hours straight)17:29
corvusi believe frickler, but i also want to be able to merge changes to zuul-jobs with confidence that we don't break things17:29
corvusthat requires test coverage17:29
corvusso i'd like to either have a test for it in zuul-jobs, or a good reason why we don't  (we don't have 100% test coverage, and sometimes there's good reasons for that).  i'm struggling to figure out what that is for nox-py27 -- which is not esoteric.17:30
fricklerbut that is unrelated to fixing the job after it broke17:31
clarkbcorvus: we don't have any runtime testing of the nox jobs. Just the ensure-nox role17:31
fricklerit was broken before exactly because there is no regular test running that would have caught it regressing17:31
fungiit provides confidence in the purported fix17:31
corvusclarkb: indeed, which is why my next question is: could that be placed in the role instead so that we can maintain the "most of the work is in the roles" structure?17:32
corvusfrickler: it broke due to an external factor, not a change in zuul-jobs17:32
clarkbcorvus: we could put it in the role. The reason I didn't put it there is python2.7 will eventually go away and this gives us a nice cleanup spot. But we can also delete tasks out of a role17:33
clarkband ya sorry I'm multitasking with a phone call meeting17:33
fricklercorvus: yes, that's what causes 90% of all regressions in my experience, so?17:33
corvusfrickler: tests in zuul-jobs are to prevent changes in zuul-jobs from breaking zuul-jobs.  they are not meant to catch external factors.17:33
corvusthis is a change to zuul-jobs, so it (1) should have had a test to verify it worked and (2) should have a test to verify that it doesn't break in the future if someone changes that file.17:34
fricklerbut why is that one thing more important than the other?17:34
corvusfrickler: we can do something about it.17:34
fricklerwell we could do something about this, too17:35
corvusso to be clear, if i had been given the opportunity to review that change, i would have -1'd it asking for 1) can the logic be put in a role; 2) can it be tested?  and i think #1 would have taken care of #2.17:35
fricklerwell you can still do that now. what if I as potential patch submitter would have said that I don't want to go the extra effort, would you have left the job broken?17:36
corvusclarkb: frickler are either of you interested in implementing that?17:36
corvusfrickler: yes i would17:36
clarkboh the other issue with putting it in ensure-nox is we don't currently have any knowledge of what the python version or eventual sessions will be when installing nox17:37
clarkbbut we can add a flag like nox_python27_virtualenv: true or something to address that17:37
clarkband ya I can push a change up that does ^17:38
corvusclarkb: that's a good point.  that would be one way of doing that, or, if we decide the job is the best place, we can add job tests.17:38
fungiif nobody is interested in adding regression testing for a role which has broken, that's a signal that removing it is the better alternative17:39
fricklerhttps://zuul.opendev.org/t/opendev/build/11deacb2f82e4ca1b29d7448eb4ddc19 just in case anyone cares17:39
corvusfungi: and indeed we have deprecated and removed things from zuul-jobs because of that17:39
fricklerbut yes, maybe dropping support for py27 would be a better path17:40
clarkbthat also works for me at this point I think. Bookworm took the plunge and thats a good indicator we can care less about python2.7 now17:40
corvusafter all, the python project did :)17:40
clarkbI've got meetings and/or meeting prep for the next couple of hours. If we can decide on a preference between the brainstormed options I can try to push a change up to implement it after lunch17:41
corvushere's my request: please take zuul-jobs testing seriously, and code review too.  keep in mind that zuul-jobs is a zuul project, used by many zuul users, not just there for opendev's needs.17:43
corvusi would be happy to review a removal of py27 because it's unsupported by upstream, or additional testing of the backport pin.17:45
clarkbbindep is the only place using nox-py27 in codesearch and I think we can drop it from bindep17:48
clarkbI'll take a look at cleaning up that job from bindep and zuul-jobs later today17:49
corvuswfm, thanks!17:49
fungii'm in favor of dropping it but can't commit to pushing a change for that until middle of my tomorrow at the earliest17:49
fricklerI may not be around for the meeting later, so this may be a good opportunity for my question now: how do you intend to test rolling out ansible 8 for openstack? I have no idea how large the impact might be, but I very much would like to avoid a repetition of the queue config situation, which nobody except me seems to really care about18:01
clarkbfrickler: anyone and any project can opt into it right now with speculative testing of everything that isn't trusted18:01
fricklerclarkb: with 90% of openstack nearly dead, this will not happen18:02
fungii think we need to take a harder line on removing projects from the openstack tenant if they have broken configuration18:02
clarkbin my head we'd set a date to make it the default sometime after the release and encourage people to test it via speculative changes before then18:02
clarkband ya if after the hard cutover date projects are dead we can remove them from the zuul tenant config18:02
fungi"eol your broken branches or we will turn off testing for the entire repo"18:03
corvus++ to all of the above18:03
fungiapropos, the openstack tc meeting is happening right this moment18:03
clarkbthe upside to tenant config cleanup is that it is really easy to revert and it doesn't really matter too much if the project configs are broken for a project. The config updates either way aiui18:05
clarkb*the zuul tenant config updates either way18:05
corvusbtw, we have speculatively tested ansible 8 with zuul-jobs, so the basics are known to work.18:06
corvus(no changes were required)18:06
clarkbcorvus: have we flipped the zuul tenant over by default yet? If not should we do that soon?18:07
corvusso there's a possibility that it's not a big deal.18:07
corvusclarkb: we have not.  i could go either way on that (either flip it now for more frequest testing, or wait until zuul flips the default, which is the next step in the process)18:07
corvusfrequent even18:07
frickleris there any way to configure it more specific than on a tenant level?18:08
corvusfrickler: sure, at the job level18:08
corvusthat's how we did the initial testing, and how anyone can test it in their project now18:08
corvusexample change: https://review.opendev.org/c/zuul/zuul-jobs/+/89036218:10
clarkbI kinda like the idea of having zuul's tenant dogfood the change since that will give us broader coverage with people in a good position to debug/fix anything unexpected18:10
corvusclarkb: i was mistaken we are running ansible 8 in zuul tenant, sorry18:11
clarkbperfect :)18:11
corvusclarkb: we both have questionable memories :)18:12
corvusso that's been in place for almost weeks18:12
corvusit is also possible to include/exclude config from specific branches (or branch patterns) for projects.  not sure if we want to go that route in opendev, but it's a tool in the box.18:14
corvus(that's back to the "what to do about projects that won't clean up their room" issue)18:14
clarkboh I thought that wasn't possible. Someone asked about it a couple months ago (jpew maybe?)18:16
clarkboh no that was merging those branches. I think it will always merge the branches but then it can choose to ignore the results?18:16
clarkbanyway that is good to know18:16
corvusi mean, i wouldn't do that on anything except a completely dead branch and then if someone proposed a change to that branch, not expect any kind of working behavior.18:17
corvusbut if we open that door in opendev, i worry we would have an unmanageable list of exceptional "stable/" and "bugfix/" branches that are different for every one of the 2000 openstack repos, so.... i don't advocate it.18:18
clarkbgood point18:19
corvus(just mentioning it in case it's useful in some way that we think might actually avoid that outcome :)18:19
clarkbI mentioned to the TC that error free zuulconfig is different than working CI jobs though which seems to be part of the confusion over effort required18:19
JayFThe thing is they are often connected in practice18:22
JayFunless we're force-merging changes to fix the config error alone18:22
fungior deleting enough configuration that what's left passes18:22
clarkbyou shouldn't need to force merge. you just delete what is broken18:22
JayFIn some of the Ironic cases, that woulda been all the integration tests.18:23
JayFYou aren't wrong; it's just not easy to toss that many tests18:23
fricklerfinding out what's broken is a lot of work, just force-merging the queue config fix was a lot easier18:23
fungiwell, those tests are already not being run because zuul can't parse the configuration in order to know to run them18:23
clarkbsure but the main concern is projects who are dead and don't have anyone to fix CI18:23
JayFYeah in this case we have some not-dead-at-all-but-neglected stuff18:24
JayFe.g. ironicclient yoga is in this list18:24
fungiso your choice is between not merging anything to the branch and keeping it open because its configuration is broken, or removing the branch18:24
clarkbdon't fix CI in those cases just stop CI. Either by removing the project from zuul or deleting the broken zuul config so that zuul stops complaining18:24
fungiand in fact ci is already stopped anyway18:24
clarkbJayF: hopefully ironicclient yoga branch isn't defining any integration jobs for the rest of ironic and if it is you probably do want to fix it anyway18:25
fungimaybe it's unclear that when openstack/python-ironicclient stable/ussuri appears in that config-errors list, zuul already is not running any jobs for changes targeting that branch any longer18:25
JayFfungi: it's clear, in this case the frustration is more related to me knowing someone is shipping that, and nobody giving a damn that ^^^ that has been true for at least a year18:26
clarkbnoonedeadpunk: Zuul's tenant is already on ansible 8 by default. It works generally.18:52
clarkbnoonedeadpunk: we don't need to solve python version and openstack cloud collection problems for the general case. That is all done18:53
clarkbhowever, specific jobs may have problems and that is why we won't flip the switch immediately across the board18:53
JayFHonestly I think the biggest ansible pain is gonna be ironic/bifrost18:53
JayFIIRC we aren't properly dual-nesting environments there18:53
JayFI should run a test on that nowish so we get a canary, if that's true it'll be a project to fix18:54
clarkb++ and when we get emails out about this later this week we'll include instructions for doing that. But basically just push a change that explicitly sets the ansible version to 8 on your job(s) and they will run with newer ansible18:54
noonedeadpunkclarkb: ah, ok, good to know. then indeed it should be generally fine18:55
noonedeadpunkI don't think that either kolla or osa will be affected by this then18:56
fungithose should, i expect, already be invoking their own versions of ansible on test nodes anyway18:58
fungithis won't affect their ability to do that18:58
fungithis is purely a change to the default version of ansible used by the executor to run playbooks listed as phases of the job definitions, and any roles included by them18:59
fungiif those playbooks install and run another ansible version, that can be any version of ansible you like19:00
fungiit's just another application at that point19:01
noonedeadpunknah, for sure not, I was just having weird behaviour of swift upload when was messing up with ansible versions, or better say openstack.cloud version on ansible 619:03
noonedeadpunkand things were breaking in multiple places not in a good way in post jobs at least19:03
noonedeadpunkbut yeah, that wasn't latest zuul version, so hopefully this is covered now19:04
clarkbnote the swift upload does not use ansible collections for openstack19:04
fungialso we're only ourselves uploading to actual swift api endpoints. we tried to use providers who offered swift-like ceph+radosgw and it did not end well19:11
fungimainly it caused problems for the provider, because of lack of proper multi-tenancy and sub-par performance compared to real swift19:12
opendevreviewJames E. Blair proposed openstack/project-config master: Switch opendev tenant to Ansible 8  https://review.opendev.org/c/openstack/project-config/+/89240519:51
fricklerclarkb: regarding the gerrit change, anything from 6-16UTC should be fine for me20:02
clarkbfrickler: cool I can probably be at the computer by 14:30UTC. Maybe you and/or fungi want to approve the config update prior to that so it is deployed then I'm happy to help do a restart around 14:3020:03
fricklerthat's for tomorrow then, just to avoid any misunderstanding?20:04
clarkbyes, tomorrow works for me20:05
fricklerok, fine20:06
fungiyeah, i'll hopefully be free by then. painters are coming by first thing in the morning to get our guest suite prepped for work over the course of the rest of the week, so i'll probably be interrupted/distracted a bit20:09
clarkbI made my service coordinator nomination official and will now find lunch20:11
opendevreviewMerged openstack/project-config master: Switch opendev tenant to Ansible 8  https://review.opendev.org/c/openstack/project-config/+/89240520:14
corvusthat should take effect soon ^20:17
Clark[m]We can recheck the bindep change to exercise it20:18
opendevreviewSteve Baker proposed openstack/diskimage-builder master: growvols: reserve space for spare metadata volume  https://review.opendev.org/c/openstack/diskimage-builder/+/89224420:26
opendevreviewSteve Baker proposed openstack/diskimage-builder master: growvols: reserve space for spare metadata volume  https://review.opendev.org/c/openstack/diskimage-builder/+/89224420:29
ildikovclarkb: Thanks for the heads up! I just +1'ed fungi's patch that adds the config changes.20:32
clarkbfungi: how do we specify that wheels are python3 and not universal?21:24
clarkbdo we just drop the wheel section from setup.cfg?21:24
fungiit's that wheel.universal=1 which tells it to claim support for py2.py3 instead of just whichever one it was built with21:25
fungiwithout that, if we build under python3 it will make a py3 wheel, and if we build under python(2) it will create a py2 wheel21:26
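
The behaviour described above can be modelled in a few lines. This is only a sketch of the rule as stated in the conversation, not how the wheel tooling is implemented: it reads setup.cfg text, looks for a universal flag in a [wheel] section (the section clarkb mentions) or a [bdist_wheel] section, and reports which Python tag a build would claim. The function name and its parameters are hypothetical.

```python
import configparser


def claimed_python_tag(setup_cfg_text: str, build_major: int) -> str:
    """Model which Python tag a wheel would claim, per the rule described in the log.

    setup_cfg_text: contents of a setup.cfg file.
    build_major: major version of the interpreter running the build (2 or 3).
    """
    parser = configparser.ConfigParser()
    parser.read_string(setup_cfg_text)

    universal = False
    # Check both section spellings; whichever is present first decides.
    for section in ("wheel", "bdist_wheel"):
        if parser.has_option(section, "universal"):
            universal = parser.getboolean(section, "universal")
            break

    # universal=1 claims both majors; otherwise only the building interpreter's.
    return "py2.py3" if universal else f"py{build_major}"


with_flag = "[wheel]\nuniversal = 1\n"
without_flag = "[metadata]\nname = example\n"

print(claimed_python_tag(with_flag, 3))
print(claimed_python_tag(without_flag, 3))
```

Under this model, removing the flag (or the whole section) is enough for a Python 3 build to stop advertising Python 2 support.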
opendevreviewClark Boylan proposed opendev/bindep master: Drop python27 testing and support  https://review.opendev.org/c/opendev/bindep/+/89240721:27
*** timburke_ is now known as timburke21:28
opendevreviewClark Boylan proposed zuul/zuul-jobs master: Remove the nox-py27 job  https://review.opendev.org/c/zuul/zuul-jobs/+/89240821:30
clarkbcorvus: ^ fyi21:30
fungiclarkb: soft -1 on 89240721:49
clarkbgood point21:49
opendevreviewClark Boylan proposed opendev/bindep master: Drop python27 testing and support  https://review.opendev.org/c/opendev/bindep/+/89240721:51
fungiclarkb: fwiw, we also don't actually test 3.5 there currently either, should we continue to claim support for it?21:52
clarkbfungi: I strongly suspect it would work given the small number of deps and lack of active development adding new syntax21:52
fungiif dropping 2.7 testing means we should stop claiming to support 2.7, then lack of 3.5 testing likely means we shouldn't claim support for it either21:52
clarkbbut ya maybe we should clean it all up21:53
fungii mean, we currently don't have reason to believe 2.7 would immediately cease working either21:53
fungiwe seem to assume 3.5 works even though we haven't tested it for ages21:54
fungilooks like we dropped it with the switch to nox last year21:55
opendevreviewClark Boylan proposed opendev/bindep master: Drop Python 2.7/3.5 testing and support  https://review.opendev.org/c/opendev/bindep/+/89240721:56
clarkbhow does that look21:56
clarkbI'm doing review of the gitea change and remembered/noticed that the existing gitea servers have /data/gitea/jwt/private.pem contents already and I'm not sure if those will change and invalidate any jwt stuff if I explicitly set a secret value22:04
clarkbI think it may do that but since we don't do oauth it doesn't matter22:04
clarkbbut I can check on the held node if this is a problem by changing the secret by hand and restarting things22:04
fungion a related note, i looked at the recent zuul-announce posts i received and all the headers now correctly claim lists.zuul-ci.org23:00
fungithe only opendev references were in the received headers, where exim helo's as lists01.opendev.org23:01
fungiwhich is correct behavior by my reckoning23:01
fungisince that's its reverse dns name too23:01
clarkbsounds correct to me23:03
opendevreviewMerged opendev/bindep master: Drop fedora-latest test job  https://review.opendev.org/c/opendev/bindep/+/89237823:05
opendevreviewMerged opendev/bindep master: Drop Python 2.7/3.5 testing and support  https://review.opendev.org/c/opendev/bindep/+/89240723:06
opendevreviewMerged openstack/diskimage-builder master: growvols: reserve space for spare metadata volume  https://review.opendev.org/c/openstack/diskimage-builder/+/89224423:52
opendevreviewMerged openstack/diskimage-builder master: Fix baseurl for Fedora versions before 36  https://review.opendev.org/c/openstack/diskimage-builder/+/89065023:52
opendevreviewMerged openstack/diskimage-builder master: Remove lower-constraints.txt  https://review.opendev.org/c/openstack/diskimage-builder/+/87828723:55
