Monday, 2022-09-19

*** rlandy is now known as rlandy|out01:27
*** akahat is now known as akahat|ruck04:37
*** ysandeep|away is now known as ysandeep04:49
opendevreviewIan Wienand proposed opendev/base-jobs master: configure-mirrors: enable extras-common for 9-stream
opendevreviewIan Wienand proposed zuul/zuul-jobs master: configure-mirrors: fix typo in 9-stream enablement list
opendevreviewIan Wienand proposed opendev/system-config master: testinfra: Update selenium calls
opendevreviewIan Wienand proposed opendev/system-config master: testinfra: Update selenium calls
*** jpena|off is now known as jpena07:34
*** ysandeep is now known as ysandeep|lunch08:22
*** ysandeep|lunch is now known as ysandeep09:00
*** pojadhav- is now known as pojadhav09:47
*** pojadhav- is now known as pojadhav09:54
*** rlandy|out is now known as rlandy|rover10:20
*** anbanerj is now known as frenzyfriday10:26
*** ysandeep is now known as ysandeep|afk10:34
priteauHello. I submitted this morning, but jobs don't appear to be running.10:48
fungipriteau: from the scheduler log, it looks like it thinks you have no jobs which match for that change. do you maybe make extensive use of file filters in your job definitions and don't match on that one file which the change is altering?11:45
fricklerfungi: I don't think so, at least the openstack python jobs should be running11:46
fungiif you look in the scheduler debug log on zuul01 you'll see the patch upload event concludes processing with this:11:48
fungi2022-09-19 09:25:25,806 DEBUG zuul.Pipeline.openstack.check: [e: d33a9d224832475abef8f94b0cce282a] No jobs for change <Change 0x7ff157492860 openstack/kolla-ansible 858270,1>11:48
frickleroh, is that the queue definition fallout?11:49
frickler needs backporting11:50
fricklermnasiadka: yoctozepto: priteau: ^^11:51
fungiahh, indeed, that came in during the zuul upgrade over the weekend11:52
yoctozeptoah, yup, seemingly we have forgotten to apply it to stable branches11:52
yoctozeptothanks, frickler, for noticing11:52
fricklerfungi: shouldn't zuul log a warning about that somewhere, too? would be good to see how many other projects might be affected11:54
yoctozeptoall k-a branches done11:54
yoctozeptonow for k11:54
fungiextra keys not allowed @ data['check']['queue']11:55
fungiet cetera11:55
fungii just didn't think to look there11:55
fungilooks like there are a ton in project-config too, i'll get a patch up for those now11:56
yoctozeptok and collection done11:57
yoctozeptokayobe remains11:57
yoctozeptokayobe done12:00
*** ysandeep|afk is now known as ysandeep12:03
fungifound an interesting error doing this... openstack/murano was in monasca's check queue but a murano gate queue. must have been copy/paste error12:06
opendevreviewJeremy Stanley proposed openstack/project-config master: Move queues from pipeline to project level
fungifrickler: ^12:19
fungii'll check for more in other central repos12:19
fungilooks like lots of individual projects copied that into their local configs as well, especially showing up in openstack stable branches12:21
fungiwe'll probably have to bypass zuul to merge that, since it won't get applied speculatively12:28
priteauThank you yoctozepto12:31
fungiokay, found a couple of lines i missed dedenting when i moved them, revision on the way12:33
opendevreviewJeremy Stanley proposed openstack/project-config master: Move queues from pipeline to project level
fungimerging that without zuul checking it makes me uneasy, because it's a pretty large configuration change and i'm not 100% sure i didn't flub something, but at least `tox -e linters` locally seems to pass12:34
fricklerfungi: I see no other way except downgrading zuul again temporarily, which I'm not sure would be better12:38
fungiat least it didn't complain with "unknown configuration error" on the second patchset, so i think that means it parsed12:44
fungiit just won't run any jobs because of the catch-2212:44
fungiinfra-root: i'm going to bypass zuul to merge 858307 shortly, last call for reviews12:44
fricklerfungi: ack from me, do you want me to add a formal W+1, too?12:47
fungino need12:47
fungithe example ssh command i used to rely on got moved around in our docs and is no longer copy-pasteable. i'll try to fix that once this is settled down12:48
opendevreviewMerged openstack/project-config master: Move queues from pipeline to project level
fricklerdown from 654 errors to 51212:54
fungiyeah, and i don't see any for project-config any longer, at least12:56
fricklerfungi: maybe send out some status notice, too? not sure about how to word it though12:56
fungii'll dig up the earlier announcement about this so people have context12:57
yoctozeptowow, 512 is high, sad12:58
fungiit was pretty high even before this12:59
yoctozeptowell, "extra keys not allowed" gets 463 hits13:01
fungiyes, it looks like very, very many projects copied this into local project-pipelines13:02
fungithough also those errors could be shadowing others. i'm not sure fixing those 463 errors will result in 463 fewer reported errors13:03
fungi(but they do still need fixing)13:03
yoctozeptofungi: are you sending that notice? I was planning to include it in the commit messages to masakari and blazar13:08
fungilooks like we neglected to send a notification about this to the service-announce ml, though we did give openstack-discuss a heads up (i think because that's the only impacted tenant)13:08
yoctozeptooh, so there was a more up-to-date one, checking13:08
yoctozeptoI will use this one13:09
fungithat was the original warning to openstack-discuss four months ago13:09
yoctozeptoyeah, that one I quoted in my patches in may ;s13:09
fungii'll send something to service-announce and then reference that13:10
fricklerah, I was more thinking about #status notice here, but mails are fine, too13:11
yoctozeptothe more, the merrier13:11
*** soniya is now known as soniya|afk13:11
fungifrickler: well, i didn't want to send a status notice pointed to reminders on mailing lists other than our own, but now that i've sent something to service-announce i can reference that in a status notice13:28
*** prometheanfire is now known as Guest93113:29
fungii don't want to give opendev collaboratory users the impression that they're expected to follow zuul or openstack mailing lists for information about changes to services in the collaboratory13:29
*** Guest772 is now known as dasm13:30
fungistatus notice As of the weekend, Zuul only supports queue declarations at the project level; if expected jobs aren't running, see this announcement:
fungifrickler: ^ how's that look?13:34
fricklerfungi: lgtm13:37
fungi#status notice As of the weekend, Zuul only supports queue declarations at the project level; if expected jobs aren't running, see this announcement:
opendevstatusfungi: sending notice13:37
-opendevstatus- NOTICE: As of the weekend, Zuul only supports queue declarations at the project level; if expected jobs aren't running, see this announcement:
opendevstatusfungi: finished sending notice13:40
Clark[m]fungi: correct I didn't bother with service-announce because openstack is the only affected user. I'm not sure we'd be giving an impression that openstack ml is required. They were the only ones affected here13:48
fungiwell, the openstack tenant is the only affected tenant, but there were a number of non-openstack projects affected in that tenant13:48
fungiincluding starlingx13:49
fungi(by way of all the errors in project-config anyway)13:49
Clark[m]Hrm I think those must've been added after the announcement then?13:49
Clark[m]Oh, then maybe the listing utility wasn't listing all the problems?13:49
fungii don't think so, i think we missed fixing project-config at the very least13:50
Clark[m]I pushed changes myself to the non openstack projects that the utility listed13:50
fungii wonder if the utility skipped config projects13:50
Clark[m]Starlingx is definitely not in that original list. It got added after or the utility didn't give us a complete listing13:54
fungiClark[m]: well, it was added to them in openstack/project-config (along with about 150 other repos)13:58
fungia lot of the entries affected in project-config were for very old/deprecated repos, so i don't think they got added recently13:59
fungiit's more likely the script simply skipped checking project-config13:59
fungior didn't support checking additions to foreign projects and only checked for additions affecting the project in which they were configured14:00
Clark[m]Ya just calling that out in case it helps other zuul users. At this point it seems we've addressed the problem in project-config and other repos that I expect were listed are what remain14:06
fungiyeah, i checked our config repos for some other tenants and didn't see a similar issue14:07
corvuswe can't run the script as written any more, but we can verify that it at least checks those project stanzas, and it appears to do so.  there could be a flaw in it, but it at least doesn't appear to be that it didn't check the project stanza in openstack/project-config14:10
*** rcastillo|rover is now known as rcastillo14:11
fungialso possible that it listed project-config and we forgot to fix it14:23
fungi does say openstack/project-config is one of the affected repos14:23
corvusyeah, if that's a list of "places where the config stanza is wrong" and starlingx didn't have any wrong config stanzas in their own git repos but rather only in project-config, then that would be consistent with the information at hand.14:25
fungito be clear, i just mentioned starlingx as being a non-openstack project impacted by the configuration issue, i don't know that they even noticed since i think the impact for them would have been limited to no queue getting applied to their changes14:30
fungibut since the tenant-wide configuration was affected for a tenant containing non-openstack projects, the impact was not limited to the openstack project as we had previously asserted14:31
fungigranted, the fix needed to be reviewed and approved by our central config reviewers, so couldn't be corrected entirely by the affected projects14:32
fricklerwe also seem to have some builder issue, likely unrelated since it is going on for longer
fungiand we seem to be having trouble consistently booting fedora-36 images in at least some providers, per discussion in #openstack-infra14:52
*** tweining[m] is now known as tweining14:56
*** ysandeep is now known as ysandeep|dinner14:58
opendevreviewJeremy Stanley proposed openstack/project-config master: Add missing fedora-36 label to nl01
fungithat looks like at least part of the problem15:08
fungii was seeing lots of immediate boot failures for fedora-36 in rackspace and exceptions in the debug log led to that discovery15:08
clarkbI think I had in my head that the affected projects in project-config would be responsible for fixing things as they would've been listed. But what we observed instead makes sense now that I think about it15:10
clarkbfungi: if this morning is still good for you I'm good to land as well. We can remove the jvbs from emergency and move the keystore aside on meetpad to force it to be recreated15:22
fungiyes, let's do. i've approved it now15:27
fungii'll take the jvbs out of the emergency file15:27
fungiand done15:27
clarkbI'm doing local updates and will reboot then load keys. I can move the keystore aside15:27
fungicool, thanks!15:27
*** marios is now known as marios|out15:28
opendevreviewMerged openstack/project-config master: Add missing fedora-36 label to nl01
fricklerone related question: designate has the queue definition both in the project-template as in each project stanza, cf. . this is redundant, so I think I want to clean it up, but not sure in which direction, do we have a preference? template has less config, but maybe more difficult to discover15:43
fungifrickler: could you be misreading that? looks like it's in the project definition for three projects15:47
fungior else i've misunderstood what you're saying15:48
fricklerah, forgot to mention, this is only in stable branches15:48
fungiokay, i see it in stable/yoga15:48
fungiis that valid?15:49
fungii guess i'll check the docs15:49
frickleror rather, in master the added stanza got dropped from the template15:49
clarkbzuul applies the first one it finds iirc15:49
fungiyeah, looks like it's valid for project-template entries as well15:50
frickler explicitly mentions project-template as an option15:50
fungifrickler: right, but that's the docs for project configuration. project-template (documented below there) merely says it takes the stuff project does, and doesn't call out queue specifically, so presumably fine15:51
fungianyway, i suppose it's up to the project which they prefer, but doing it in the project-template would seem to involve less work to maintain long term15:53
dtantsurhi folks! do fedora-latest nodes still exist?15:54
clarkbfungi: the keystore file is moved aside15:55
fungidtantsur: yes, are you seeing intermittent node_failure results for those jobs?15:55
yoctozeptohmm, does the new zuul "queue" place work? I see masakari repos using separate queues in gate:
fungidtantsur: if so, hopefully 858397 (merged a few minutes ago) will help15:56
dtantsurfungi: yep (ps 1)15:56
dtantsurcool, thanks!15:56
clarkbyoctozepto: masakari seems to have broken config15:57
frickleryoctozepto: it takes effect only after those patches are merged15:57
fungiyoctozepto: those look like implicit queues15:57
yoctozeptofrickler: thanks, that is what I hoped15:57
opendevreviewMerged opendev/system-config master: Fix jitsi meet jvb connection info and cert CN
yoctozeptoclarkb: yeah, these patches are fixing it15:57
fungi(zuul has had implicit queues for years, they're what you get if you don't associate the project with a named queue)15:58
yoctozepto(somehow I missed masakari from the zuul config radar before)15:58
fricklersee the note in the project.queue doc I cited earlier15:58
fungii need to step away for a few minutes, but should be back before the meetpad config update deploys15:59
*** ysandeep|dinner is now known as ysandeep16:02
*** ysandeep is now known as ysandeep|out16:03
clarkbI just did a quick sanity check with my laptop and desktop and it looks good16:06
clarkbNot sure what jvb I went through yet16:06
clarkbaccording to the nginx log I believe it used meetpad0116:07
clarkbI'm in with just my desktop now which doesn't have a camera (my laptop battery was dying so now it is plugged in elsewhere)16:10
clarkbI think we can confirm that is working for someone other than myself. Then stop jvb on meetpad01 and rejoin to see if we switch to one of the jvbs16:10
clarkbI need to make breakfast. I'll be back shortly to rejoin that room for testing16:26
*** jpena is now known as jpena|off16:28
fungiokay, back now. that deployed faster than i expected16:32
clarkband it looks good. jvb01 hosted our video bridge when I rejoined16:42
clarkbWe should now have a scale out system again with working video and tls16:42
clarkbfungi: dtantsur: fedora-36 launches are being attempted on the rax launcher now (nl01), but they seem to be hitting timeouts waiting for ssh to listen18:20
clarkbfedora like rocky is a container built image.18:20
clarkbI wonder if new fedora is having the same problems as rocky18:20
clarkbdiskimage_builder/elements/fedora/pre-install.d/02-set-machine-id runs systemd-machine-id-setup but fedora-container does not18:21
clarkbit's possible the fedora container images were updated to break that (which is what I think happened to rocky)18:23
fungiif i can catch it booting somewhere else i can grab a console log (unfortunately rackspace only gives novnc urls for these)18:30
clarkbyou can also manually boot it 18:30
clarkbthat is what I did with rocky and that way it is easier to grab a console log18:31
fungialso rackspace seems to give back a "instance not yet ready" error when i try on one of the booting nodes18:31
fungithere it goes, i just didn't wait long enough18:32
fungi"booting fedora linux container image..."18:32
clarkbI think the behavior with the machine id missing was that it fell back to a recovery shell?18:33
fungiseems to have gotten far enough to resize the rootfs18:33
clarkbsomething like that. It never got far enough to run glean which is the other usual culprit18:33
fungibut yeah it's just sitting there after that18:33
fungiit got all the way to a login prompt on tty1 now18:34
fungii can't reach its sshd though18:34
fungiand now nodepool has deleted it18:34
clarkbin that case it could be glean18:35
fungiTimeout waiting for connection to port 2218:35
clarkbdefinitely worth manually booting and then checking the console logs for glean info18:35
clarkbI need to find lunch shortly and start on some dinner prep, but after that I can help take a look if that is useful. Maybe after I get a meeting agenda out since that is somewhat time bound18:53
fricklerso /opt is full on nb01+2, I'm going to clean those up and I think we rebooted after that for good measure?18:56
clarkbfrickler: yes, because sometimes there are stale mounts (since / isn't filling it's less critical to reboot but may as well)18:57
fricklernb02 is done, nb01 seems to take much longer. but from it looks like sooner or later some more thorough solution may be needed. late last year we got 50% free after a cleanup, now only about 30%19:13
frickleralso seems to be a pretty regular pattern now, things filling up rather linearly over 2-3 months19:15
fungii wonder if that's proportional to the number of images we're building19:18
fungimaybe it's time for another x86 builder19:18
fricklerwould be interesting to watch disk usage on a fresh builder develop, we could try that and maybe drop the node again if it turns out it isn't needed. maybe we can discuss tomorrow, getting late here for me now19:24
opendevreviewlotorev vitaly proposed opendev/project-config master: Update link to zuul gating docs
clarkbyes it is proportional and we've added images with fedora and rocky recently19:46
fungivlotorev[m]: looks like 858454 is missing files. did you forget a git add?19:54
opendevreviewMerged opendev/system-config master: Add Jaeger tracing server
clarkbanything else need to go on the meeting agenda?21:00
fungii've got nuthin21:15
*** dasm is now known as dasm|off22:00
ianwhrm, f36 *has* been working so I wouldn't expect too much that would change :/22:26
ianw looks good at least22:32
clarkbianw: ya I think fungi caught it actually booting to a login prompt which indicates something with glean and network setup is more likely than the machine id stuff22:34
ianwinfra-root: has the follow-ups to enable repos for 9-stream.  i think that's pretty low risk, and would only affect centos-9, but I'm open to opinions22:35
fungiyes, i used console url show on one which was "building" in rackspace and managed to see it get all the way to a login prompt before the launcher deleted it for being unreachable22:36
ianwinteresting ... so only failing on rax?22:36
funginot sure, though i think i saw one work in inmotion22:37
fungipart of the confusion was that the config on nl01 was missing the label definition, so it was immediately failing to boot them until that got patched22:38
clarkbnode failure should only happen if all clouds fail to boot it though22:38
clarkbianw: while I agree the fallout should be minimal we should still update base-test re
ianwindeed you're correct; we should get that back in sync22:41
opendevreviewIan Wienand proposed opendev/base-jobs master: configure-mirrors: enable extras-common for 9-stream
opendevreviewIan Wienand proposed opendev/base-jobs master: Revert "Switch base-test to test-prepare-workspace-git"
opendevreviewIan Wienand proposed opendev/base-jobs master: base-test: add descriptive names
ianwclarkb: ^ thanks, just a very minor yak shave :)22:47
*** rlandy|rover is now known as rlandy|out22:50
opendevreviewIan Wienand proposed opendev/system-config master: testinfra: Update selenium calls
opendevreviewIan Wienand proposed opendev/system-config master: testinfra: Update selenium calls
