*** rlandy is now known as rlandy|out | 01:27 | |
*** akahat is now known as akahat|ruck | 04:37 | |
*** ysandeep|away is now known as ysandeep | 04:49 | |
opendevreview | Ian Wienand proposed opendev/base-jobs master: configure-mirrors: enable extras-common for 9-stream https://review.opendev.org/c/opendev/base-jobs/+/858255 | 06:15 |
opendevreview | Ian Wienand proposed zuul/zuul-jobs master: configure-mirrors: fix typo in 9-stream enablement list https://review.opendev.org/c/zuul/zuul-jobs/+/858256 | 06:17 |
opendevreview | Ian Wienand proposed opendev/system-config master: testinfra: Update selenium calls https://review.opendev.org/c/opendev/system-config/+/858003 | 06:43 |
opendevreview | Ian Wienand proposed opendev/system-config master: testinfra: Update selenium calls https://review.opendev.org/c/opendev/system-config/+/858003 | 07:10 |
*** jpena|off is now known as jpena | 07:34 | |
*** ysandeep is now known as ysandeep|lunch | 08:22 | |
*** ysandeep|lunch is now known as ysandeep | 09:00 | |
*** pojadhav- is now known as pojadhav | 09:47 | |
*** pojadhav- is now known as pojadhav | 09:54 | |
*** rlandy|out is now known as rlandy|rover | 10:20 | |
*** anbanerj is now known as frenzyfriday | 10:26 | |
*** ysandeep is now known as ysandeep|afk | 10:34 | |
priteau | Hello. I submitted https://review.opendev.org/c/openstack/kolla-ansible/+/858270 this morning, but jobs don't appear to be running. | 10:48 |
fungi | priteau: from the scheduler log, it looks like it thinks you have no jobs which match for that change. do you maybe make extensive use of file filters in your job definitions and don't match on that one file which the change is altering? | 11:45 |
frickler | fungi: I don't think so, at least the openstack python jobs should be running | 11:46 |
fungi | if you look in the scheduler debug log on zuul01 you'll see the patch upload event concludes processing with this: | 11:48 |
fungi | 2022-09-19 09:25:25,806 DEBUG zuul.Pipeline.openstack.check: [e: d33a9d224832475abef8f94b0cce282a] No jobs for change <Change 0x7ff157492860 openstack/kolla-ansible 858270,1> | 11:48 |
frickler | oh, is that the queue definition fallout? | 11:49 |
frickler | https://review.opendev.org/c/openstack/kolla-ansible/+/842280 needs backporting | 11:50 |
frickler | mnasiadka: yoctozepto: priteau: ^^ | 11:51 |
fungi | ahh, indeed, that came in during the zuul upgrade over the weekend | 11:52 |
yoctozepto | ah, yup, seemingly we have forgotten to apply it to stable branches | 11:52 |
yoctozepto | thanks, frickler, for noticing | 11:52 |
frickler | fungi: shouldn't zuul log a warning about that somewhere, too? would be good to see how many other projects might be affected | 11:54 |
yoctozepto | all k-a branches done | 11:54 |
yoctozepto | now for k | 11:54 |
fungi | https://zuul.opendev.org/t/openstack/config-errors | 11:55 |
fungi | extra keys not allowed @ data['check']['queue'] | 11:55 |
fungi | et cetera | 11:55 |
fungi | i just didn't think to look there | 11:55 |
fungi | looks like there are a ton in project-config too, i'll get a patch up for those now | 11:56 |
yoctozepto | k and collection done | 11:57 |
yoctozepto | kayobe remains | 11:57 |
yoctozepto | kayobe done | 12:00 |
*** ysandeep|afk is now known as ysandeep | 12:03 | |
fungi | found an interesting error doing this... openstack/murano was in monasca's check queue but a murano gate queue. must have been copy/paste error | 12:06 |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Move queues from pipeline to project level https://review.opendev.org/c/openstack/project-config/+/858307 | 12:19 |
fungi | frickler: ^ | 12:19 |
fungi | i'll check for more in other central repos | 12:19 |
fungi | looks like lots of individual projects copied that into their local configs as well, especially showing up in openstack stable branches | 12:21 |
fungi | we'll probably have to bypass zuul to merge that, since it won't get applied speculatively | 12:28 |
priteau | Thank you yoctozepto | 12:31 |
yoctozepto | yw | 12:32 |
fungi | okay, found a couple of lines i missed dedenting when i moved them, revision on the way | 12:33 |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Move queues from pipeline to project level https://review.opendev.org/c/openstack/project-config/+/858307 | 12:33 |
fungi | merging that without zuul checking it makes me uneasy, because it's a pretty large configuration change and i'm not 100% sure i didn't flub something, but at least `tox -e linters` locally seems to pass | 12:34 |
frickler | fungi: I see no other way except downgrading zuul again temporarily, which I'm not sure would be better | 12:38 |
fungi | yeah | 12:40 |
fungi | at least it didn't complain with "unknown configuration error" on the second patchset, so i think that means it parsed | 12:44 |
fungi | it just won't run any jobs because of the catch-22 | 12:44 |
fungi | infra-root: i'm going to bypass zuul to merge 858307 shortly, last call for reviews | 12:44 |
frickler | fungi: ack from me, do you want me to add a formal W+1, too? | 12:47 |
fungi | no need | 12:47 |
fungi | the example ssh command i used to rely on got moved around in our docs and is no longer copy-pasteable. i'll try to fix that once this is settled down | 12:48 |
opendevreview | Merged openstack/project-config master: Move queues from pipeline to project level https://review.opendev.org/c/openstack/project-config/+/858307 | 12:51 |
frickler | down from 654 errors to 512 | 12:54 |
fungi | yeah, and i don't see any for project-config any longer, at least | 12:56 |
frickler | fungi: maybe send out some status notice, too? not sure about how to word it though | 12:56 |
fungi | i'll dig up the earlier announcement about this so people have context | 12:57 |
yoctozepto | wow, 512 is high, sad | 12:58 |
fungi | it was pretty high even before this | 12:59 |
yoctozepto | ah | 13:00 |
yoctozepto | well, "extra keys not allowed" gets 463 hits | 13:01 |
fungi | yes, it looks like very, very many projects copied this into local project-pipelines | 13:02 |
fungi | though also those errors could be shadowing others. i'm not sure fixing those 463 errors will result in 463 fewer reported errors | 13:03 |
fungi | (but they do still need fixing) | 13:03 |
yoctozepto | yeah | 13:04 |
yoctozepto | fungi: are you sending that notice? I was planning to include it in the commit messages to masakari and blazar | 13:08 |
fungi | looks like we neglected to send a notification about this to the service-announce ml, though we did give openstack-discuss a heads up (i think because that's the only impacted tenant) | 13:08 |
yoctozepto | oh, so there was a more up-to-date one, checking | 13:08 |
yoctozepto | https://lists.openstack.org/pipermail/openstack-discuss/2022-September/030505.html | 13:08 |
fungi | https://lists.openstack.org/pipermail/openstack-discuss/2022-May/028603.html | 13:09 |
yoctozepto | I will use this one | 13:09 |
fungi | that was the original warning to openstack-discuss four months ago | 13:09 |
yoctozepto | yeah, that one I quoted in my patches in may ;s | 13:09 |
fungi | i'll send something to service-announce and then reference that | 13:10 |
frickler | ah, I was more thinking about #status notice here, but mails are fine, too | 13:11 |
yoctozepto | the more, the merrier | 13:11 |
*** soniya is now known as soniya|afk | 13:11 | |
fungi | frickler: well, i didn't want to send a status notice pointed to reminders on mailing lists other than our own, but now that i've sent something to service-announce i can reference that in a status notice | 13:28 |
*** prometheanfire is now known as Guest931 | 13:29 | |
fungi | i don't want to give opendev collaboratory users the impression that they're expected to follow zuul or openstack mailing lists for information about changes to services in the collaboratory | 13:29 |
*** Guest772 is now known as dasm | 13:30 | |
fungi | status notice As of the weekend, Zuul only supports queue declarations at the project level; if expected jobs aren't running, see this announcement: https://lists.opendev.org/pipermail/service-announce/2022-September/000044.html | 13:34 |
fungi | frickler: ^ how's that look? | 13:34 |
frickler | fungi: lgtm | 13:37 |
fungi | #status notice As of the weekend, Zuul only supports queue declarations at the project level; if expected jobs aren't running, see this announcement: https://lists.opendev.org/pipermail/service-announce/2022-September/000044.html | 13:37 |
opendevstatus | fungi: sending notice | 13:37 |
-opendevstatus- NOTICE: As of the weekend, Zuul only supports queue declarations at the project level; if expected jobs aren't running, see this announcement: https://lists.opendev.org/pipermail/service-announce/2022-September/000044.html | 13:37 | |
opendevstatus | fungi: finished sending notice | 13:40 |
Clark[m] | fungi: correct I didn't bother with service-announce because openstack is the only affected user. I'm not sure we'd be giving an impression that openstack ml is required. They were the only ones affected here | 13:48 |
fungi | well, the openstack tenant is the only affected tenant, but there were a number of non-openstack projects affected in that tenant | 13:48 |
fungi | including starlingx | 13:49 |
fungi | (by way of all the errors in project-config anyway) | 13:49 |
Clark[m] | Hrm I think those must've been added after the announcement then? | 13:49 |
Clark[m] | Oh, then maybe the listing utility wasn't listing all the problems? | 13:49 |
fungi | i don't think so, i think we missed fixing project-config at the very least | 13:50 |
Clark[m] | I pushed changes myself to the non-openstack projects that the utility listed | 13:50 |
fungi | i wonder if the utility skipped config projects | 13:50 |
Clark[m] | Starlingx is definitely not in that original list. It got added after or the utility didn't give us a complete listing | 13:54 |
fungi | Clark[m]: well, it was added to them in openstack/project-config (along with about 150 other repos) | 13:58 |
fungi | a lot of the entries affected in project-config were for very old/deprecated repos, so i don't think they got added recently | 13:59 |
fungi | it's more likely the script simply skipped checking project-config | 13:59 |
fungi | or didn't support checking additions to foreign projects and only checked for additions affecting the project in which they were configured | 14:00 |
Clark[m] | Ya just calling that out in case it helps other zuul users. At this point it seems we've addressed the problem in project-config and other repos that I expect were listed are what remain | 14:06 |
fungi | yeah, i checked our config repos for some other tenants and didn't see a similar issue | 14:07 |
corvus | we can't run the script as written any more, but we can verify that it at least checks those project stanzas, and it appears to do so. there could be a flaw in it, but it at least doesn't appear to be that it didn't check the project stanza in openstack/project-config | 14:10 |
*** rcastillo|rover is now known as rcastillo | 14:11 | |
fungi | also possible that it listed project-config and we forgot to fix it | 14:23 |
fungi | https://lists.openstack.org/pipermail/openstack-discuss/2022-May/028603.html does say openstack/project-config is one of the affected repos | 14:23 |
corvus | yeah, if that's a list of "places where the config stanza is wrong" and starlingx didn't have any wrong config stanzas in their own git repos but rather only in project-config, then that would be consistent with the information at hand. | 14:25 |
fungi | to be clear, i just mentioned starlingx as being a non-openstack project impacted by the configuration issue, i don't know that they even noticed since i think the impact for them would have been limited to no queue getting applied to their changes | 14:30 |
fungi | but since the tenant-wide configuration was affected for a tenant containing non-openstack projects, the impact was not limited to the openstack project as we had previously asserted | 14:31 |
fungi | granted, the fix needed to be reviewed and approved by our central config reviewers, so couldn't be corrected entirely by the affected projects | 14:32 |
frickler | we also seem to have some builder issue, likely unrelated since it is going on for longer https://grafana.opendev.org/d/f3089338b3/nodepool-dib-status?orgId=1 | 14:45 |
fungi | and we seem to be having trouble consistently booting fedora-36 images in at least some providers, per discussion in #openstack-infra | 14:52 |
*** tweining[m] is now known as tweining | 14:56 | |
*** ysandeep is now known as ysandeep|dinner | 14:58 | |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Add missing fedora-36 label to nl01 https://review.opendev.org/c/openstack/project-config/+/858397 | 15:07 |
fungi | that looks like at least part of the problem | 15:08 |
fungi | i was seeing lots of immediate boot failures for fedora-36 in rackspace and exceptions in the debug log led to that discovery | 15:08 |
clarkb | I think I had in my head that the affected projects in project-config would be responsible for fixing things as they would've been listed. But what we observed instead makes sense now that I think about it | 15:10 |
clarkb | fungi: if this morning is still good for you I'm good to land https://review.opendev.org/c/opendev/system-config/+/858224 as well. We can remove the jvbs from emergency and move the keystore aside on meetpad to force it to be recreated | 15:22 |
fungi | yes, let's do. i've approved it now | 15:27 |
clarkb | thanks! | 15:27 |
fungi | i'll take the jvbs out of the emergency file | 15:27 |
fungi | and done | 15:27 |
clarkb | I'm doing local updates and will reboot then load keys. I can move the keystore aside | 15:27 |
fungi | cool, thanks! | 15:27 |
*** marios is now known as marios|out | 15:28 | |
opendevreview | Merged openstack/project-config master: Add missing fedora-36 label to nl01 https://review.opendev.org/c/openstack/project-config/+/858397 | 15:38 |
frickler | one related question: designate has the queue definition both in the project-template and in each project stanza, cf. https://codesearch.openstack.org/?q=designate-devstack-jobs&i=nope&literal=nope&files=&excludeFiles=&repos= . this is redundant, so I think I want to clean it up, but not sure in which direction, do we have a preference? template has less config, but maybe more difficult to discover | 15:43 |
fungi | frickler: could you be misreading that? looks like it's in the project definition for three projects | 15:47 |
fungi | or else i've misunderstood what you're saying | 15:48 |
frickler | ah, forgot to mention, this is only in stable branches | 15:48 |
fungi | okay, i see it in stable/yoga | 15:48 |
fungi | is that valid? | 15:49 |
fungi | i guess i'll check the docs | 15:49 |
frickler | or rather, in master the added stanza got dropped from the template | 15:49 |
clarkb | zuul applies the first one it finds iirc | 15:49 |
fungi | yeah, looks like it's valid for project-template entries as well | 15:50 |
frickler | https://zuul-ci.org/docs/zuul/latest/config/project.html#attr-project.queue explicitly mentions project-template as an option | 15:50 |
fungi | frickler: right, but that's the docs for project configuration. project-template (documented below there) merely says it takes the stuff project does, and doesn't call out queue specifically, so presumably fine | 15:51 |
fungi | anyway, i suppose it's up to the project which they prefer, but doing it in the project-template would seem to involve less work to maintain long term | 15:53 |
dtantsur | hi folks! do fedora-latest nodes still exist? | 15:54 |
clarkb | fungi: the keystore file is moved aside | 15:55 |
fungi | dtantsur: yes, are you seeing intermittent node_failure results for those jobs? | 15:55 |
yoctozepto | hmm, does the new zuul "queue" place work? I see masakari repos using separate queues in gate: https://zuul.opendev.org/t/openstack/status#masakari | 15:56 |
fungi | dtantsur: if so, hopefully 858397 (merged a few minutes ago) will help | 15:56 |
dtantsur | fungi: yep https://review.opendev.org/c/openstack/bifrost/+/858391 (ps 1) | 15:56 |
dtantsur | cool, thanks! | 15:56 |
clarkb | yoctozepto: https://opendev.org/openstack/masakari/src/branch/master/.zuul.yaml#L92-L93 masakari seems to have broken config | 15:57 |
frickler | yoctozepto: it takes effect only after those patches are merged | 15:57 |
fungi | yoctozepto: those look like implicit queues | 15:57 |
yoctozepto | frickler: thanks, that is what I hoped | 15:57 |
opendevreview | Merged opendev/system-config master: Fix jitsi meet jvb connection info and cert CN https://review.opendev.org/c/opendev/system-config/+/858224 | 15:57 |
yoctozepto | clarkb: yeah, these patches are fixing it | 15:57 |
fungi | (zuul has had implicit queues for years, they're what you get if you don't associate the project with a named queue) | 15:58 |
yoctozepto | (somehow I missed masakari from the zuul config radar before) | 15:58 |
frickler | see the note in the project.queue doc I cited earlier | 15:58 |
fungi | i need to step away for a few minutes, but should be back before the meetpad config update deploys | 15:59 |
*** ysandeep|dinner is now known as ysandeep | 16:02 | |
*** ysandeep is now known as ysandeep|out | 16:03 | |
clarkb | I just did a quick sanity check with my laptop and desktop and it looks good | 16:06 |
clarkb | Not sure what jvb I went through yet | 16:06 |
clarkb | according to the nginx log I believe it used meetpad01 | 16:07 |
clarkb | I'm in https://meetpad.opendev.org/isitbroken with just my desktop now which doesn't have a camera (my laptop battery was dying so now it is plugged in elsewhere) | 16:10 |
clarkb | I think we can confirm that is working for someone other than myself. Then stop jvb on meetpad01 and rejoin to see if we switch to one of the jvbs | 16:10 |
clarkb | I need to make breakfast. I'll be back shortly to rejoin that room for testing | 16:26 |
*** jpena is now known as jpena|off | 16:28 | |
fungi | okay, back now. that deployed faster than i expected | 16:32 |
clarkb | and it looks good. jvb01 hosted our video bridge when I rejoined | 16:42 |
clarkb | We should now have a scale out system again with working video and tls | 16:42 |
clarkb | fungi: dtantsur: fedora-36 launches are being attempted on the rax launcher now (nl01), but they seem to be hitting timeouts waiting for ssh to listen | 18:20 |
clarkb | fedora like rocky is a container built image. | 18:20 |
clarkb | I wonder if new fedora is having the same problems as rocky | 18:20 |
clarkb | diskimage_builder/elements/fedora/pre-install.d/02-set-machine-id runs systemd-machine-id-setup but fedora-container does not | 18:21 |
clarkb | it's possible the fedora container images were updated to break that (which is what I think happened to rocky) | 18:23 |
fungi | if i can catch it booting somewhere else i can grab a console log (unfortunately rackspace only gives novnc urls for these) | 18:30 |
clarkb | you can also manually boot it | 18:30 |
clarkb | that is what I did with rocky and that way it is easier to grab a console log | 18:31 |
fungi | also rackspace seems to give back a "instance not yet ready" error when i try on one of the booting nodes | 18:31 |
fungi | there it goes, i just didn't wait long enough | 18:32 |
fungi | "booting fedora linux container image..." | 18:32 |
clarkb | I think the behavior with the machine id missing was that it fell back to a recovery shell? | 18:33 |
fungi | seems to have gotten far enough to resize the rootfs | 18:33 |
clarkb | something like that. It never got far enough to run glean which is the other usual culprit | 18:33 |
fungi | but yeah it's just sitting there after that | 18:33 |
fungi | it got all the way to a login prompt on tty1 now | 18:34 |
fungi | i can't reach its sshd though | 18:34 |
fungi | and now nodepool has deleted it | 18:34 |
clarkb | in that case it could be glean | 18:35 |
fungi | Timeout waiting for connection to port 22 | 18:35 |
clarkb | definitely worth manually booting and then checking the console logs for glean info | 18:35 |
clarkb | I need to find lunch shortly and start on some dinner prep, but after that I can help take a look if that is useful. Maybe after I get a meeting agenda out since that is somewhat time bound | 18:53 |
frickler | so /opt is full on nb01+2, I'm going to clean those up and I think we rebooted after that for good measure? | 18:56 |
clarkb | frickler: yes, because sometimes there are stale mounts (since / isn't filling it's less critical to reboot but may as well) | 18:57 |
frickler | nb02 is done, nb01 seems to take much longer. but from http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=68184&rra_id=all it looks like sooner or later some more thorough solution may be needed. late last year we got 50% free after a cleanup, now only about 30% | 19:13 |
frickler | also seems to be a pretty regular pattern now, things filling up rather linearly over 2-3 months | 19:15 |
fungi | i wonder if that's proportional to the number of images we're building | 19:18 |
fungi | maybe it's time for another x86 builder | 19:18 |
frickler | would be interesting to watch disk usage on a fresh builder develop, we could try that and maybe drop the node again if it turns out it isn't needed. maybe we can discuss tomorrow, getting late here for me now | 19:24 |
opendevreview | lotorev vitaly proposed opendev/project-config master: Update link to zuul gating docs https://review.opendev.org/c/opendev/project-config/+/858454 | 19:26 |
clarkb | yes it is proportional and we've added images with fedora and rocky recently | 19:46 |
fungi | vlotorev[m]: looks like 858454 is missing files. did you forget a git add? | 19:54 |
opendevreview | Merged opendev/system-config master: Add Jaeger tracing server https://review.opendev.org/c/opendev/system-config/+/855983 | 20:51 |
clarkb | anything else need to go on the meeting agenda? | 21:00 |
fungi | i've got nuthin | 21:15 |
*** dasm is now known as dasm|off | 22:00 | |
ianw | hrm, f36 *has* been working so I wouldn't expect too much that would change :/ | 22:26 |
ianw | https://zuul.opendev.org/t/openstack/builds?job_name=dib-nodepool-functional-openstack-fedora-36-containerfile-src looks good at least | 22:32 |
clarkb | ianw: ya I think fungi caught it actually booting to a login prompt which indicates something with glean and network setup is more likely than the machine id stuff | 22:34 |
ianw | :/ | 22:34 |
ianw | infra-root: https://review.opendev.org/q/topic:configure-mirrors-centos has the follow-ups to enable repos for 9-stream. i think that's pretty low risk, and would only affect centos-9, but I'm open to opinions | 22:35 |
fungi | yes, i used console url show on one which was "building" in rackspace and managed to see it get all the way to a login prompt before the launcher deleted it for being unreachable | 22:36 |
ianw | interesting ... so only failing on rax? | 22:36 |
fungi | not sure, though i think i saw one work in inmotion | 22:37 |
fungi | part of the confusion was that the config on nl01 was missing the label definition, so it was immediately failing to boot them until that got patched | 22:38 |
clarkb | node failure should only happen if all clouds fail to boot it though | 22:38 |
clarkb | ianw: while I agree the fallout should be minimal we should still update base-test re https://review.opendev.org/c/opendev/base-jobs/+/858255 | 22:39 |
ianw | indeed you're correct; we should get that back in sync | 22:41 |
opendevreview | Ian Wienand proposed opendev/base-jobs master: configure-mirrors: enable extras-common for 9-stream https://review.opendev.org/c/opendev/base-jobs/+/858255 | 22:47 |
opendevreview | Ian Wienand proposed opendev/base-jobs master: Revert "Switch base-test to test-prepare-workspace-git" https://review.opendev.org/c/opendev/base-jobs/+/858473 | 22:47 |
opendevreview | Ian Wienand proposed opendev/base-jobs master: base-test: add descriptive names https://review.opendev.org/c/opendev/base-jobs/+/858474 | 22:47 |
ianw | clarkb: ^ thanks, just a very minor yak shave :) | 22:47 |
*** rlandy|rover is now known as rlandy|out | 22:50 | |
opendevreview | Ian Wienand proposed opendev/system-config master: testinfra: Update selenium calls https://review.opendev.org/c/opendev/system-config/+/858003 | 23:11 |
opendevreview | Ian Wienand proposed opendev/system-config master: testinfra: Update selenium calls https://review.opendev.org/c/opendev/system-config/+/858003 | 23:50 |