*** travisholton3 is now known as travisholton | 00:11 | |
opendevreview | OpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml https://review.opendev.org/c/openstack/project-config/+/895992 | 02:39 |
opendevreview | Merged openstack/project-config master: Normalize projects.yaml https://review.opendev.org/c/openstack/project-config/+/895992 | 06:00 |
frickler | fungi: checking merge conflicts for ^^ I saw https://review.opendev.org/c/openstack/project-config/+/685778 , do you still want to keep that open? | 06:04 |
opendevreview | daniel.pawlik proposed zuul/zuul-jobs master: Parametrize cri-o version https://review.opendev.org/c/zuul/zuul-jobs/+/895597 | 07:02 |
opendevreview | daniel.pawlik proposed zuul/zuul-jobs master: Add feature to set --vm-driver name for minikube https://review.opendev.org/c/zuul/zuul-jobs/+/894755 | 07:14 |
opendevreview | daniel.pawlik proposed zuul/zuul-jobs master: Add feature to set --vm-driver name for minikube https://review.opendev.org/c/zuul/zuul-jobs/+/894755 | 07:49 |
opendevreview | daniel.pawlik proposed zuul/zuul-jobs master: Add feature to set --vm-driver name for minikube https://review.opendev.org/c/zuul/zuul-jobs/+/894755 | 08:13 |
opendevreview | daniel.pawlik proposed zuul/zuul-jobs master: Add feature to set --vm-driver name for minikube https://review.opendev.org/c/zuul/zuul-jobs/+/894755 | 08:29 |
*** elodilles_pto is now known as elodilles | 08:47 | |
opendevreview | daniel.pawlik proposed zuul/zuul-jobs master: Add feature to set --vm-driver name for minikube https://review.opendev.org/c/zuul/zuul-jobs/+/894755 | 09:48 |
opendevreview | daniel.pawlik proposed zuul/zuul-jobs master: Add feature to set --vm-driver name for minikube https://review.opendev.org/c/zuul/zuul-jobs/+/894755 | 09:59 |
fungi | frickler: thanks for spotting that, i've abandoned it since i don't expect to find time to work on that any time soon | 11:35 |
*** amoralej is now known as amoralej|lunch | 12:23 | |
*** amoralej|lunch is now known as amoralej | 13:10 | |
frickler | fungi: we might have an issue with some log uploads, two post_failures without logs in https://review.opendev.org/c/openstack/requirements/+/896053 , do you have time to check zuul logs? | 13:12 |
fungi | yep, just a sec | 13:30 |
fungi | f802192c96bd4bec80b3948761297e8a (one in your example) ran on ze01 and picked ovh_gra as its upload destination | 13:34 |
fungi | 2023-09-21 11:50:20,595 DEBUG zuul.AnsibleJob.output: [e: c56e5ae5b08444c39a56600d02319824] [build: f802192c96bd4bec80b3948761297e8a] Ansible output: b'TASK [upload-logs-swift : Upload logs to swift] ********************************' | 13:39 |
fungi | 2023-09-21 11:50:30,679 INFO zuul.AnsibleJob: [e: c56e5ae5b08444c39a56600d02319824] [build: f802192c96bd4bec80b3948761297e8a] Early failure in job | 13:39 |
fungi | 2023-09-21 11:50:30,684 DEBUG zuul.AnsibleJob.output: [e: c56e5ae5b08444c39a56600d02319824] [build: f802192c96bd4bec80b3948761297e8a] Ansible result output: b'RESULT failure' | 13:39 |
fungi | 2023-09-21 11:50:30,685 DEBUG zuul.AnsibleJob.output: [e: c56e5ae5b08444c39a56600d02319824] [build: f802192c96bd4bec80b3948761297e8a] Ansible output: b'fatal: [localhost]: FAILED! => {"censored": "the output has been hidden due to the fact that \'no_log: true\' was specified for this result", "changed": false}' | 13:39 |
fungi | well that's not much help | 13:40 |
fungi | so precisely 10 seconds into the "upload-logs-swift : Upload logs to swift" task, there was an unspecified failure | 13:40 |
fungi | infra-root: i'm 10 minutes late for approving 895205 so doing that now | 13:41 |
fungi | bedf3ffb2a2e42178199337efeb6fb8c, the other build from the example change, ran on ze04 and also selected ovh_gra | 13:44 |
fungi | https://public-cloud.status-ovhcloud.com/ indicates the only known incidents today for their public cloud was in the de1 region | 13:47 |
fungi | i guess we should keep an eye out for more of the same in case it's ongoing | 13:48 |
opendevreview | Merged opendev/system-config master: Move OpenInfra and StarlingX lists to Mailman 3 https://review.opendev.org/c/opendev/system-config/+/895205 | 14:09 |
fungi | mailing list migration maintenance starts in 30 minutes | 15:00 |
fungi | 895205 deployed about 5 minutes ago, so we should be all set | 15:01 |
clarkb | sounds good. I'm just sitting down now. | 15:08 |
fungi | i'm taking the opportunity to revisit moderation queues while my tea steeps | 15:08 |
clarkb | I'm drinking my tea far too quickly in an effort to warm up | 15:11 |
fungi | um, this is unfortunate timing... does anyone else get an error when going to https://lists.openinfra.dev/ ? | 15:16 |
clarkb | fungi: yes, thats the old system judging by the resulting cgi url? | 15:17 |
fungi | right. looks like the cert got replaced a couple of hours ago? | 15:18 |
fungi | Not Before: Sep 21 13:42:09 2023 GMT | 15:18 |
clarkb | I don't get a cert error | 15:18 |
clarkb | I get an Internal Server Error | 15:18 |
fungi | huh | 15:18 |
clarkb | cert is valid until november 8 | 15:18 |
fungi | i just get firefox telling me "Warning: Potential Security Risk Ahead. Firefox detected a potential security threat and did not continue to lists.openinfra.dev. If you visit this site, attackers could try to steal information like your passwords, emails, or credit card details." | 15:19 |
fungi | "Unable to communicate securely with peer: requested domain name does not match the server’s certificate." | 15:20 |
clarkb | fungi: double check you are talking to the correct server? maybe you're pointed at the mm3 server and the new cert didn't get deployed? | 15:20 |
corvus | cn and san are both for lists.openstack.org for me | 15:21 |
fungi | it's the correct ip address | 15:21 |
corvus | ie, my experience matches fungi | 15:21 |
fungi | https://paste.opendev.org/show/b4PCRUWDYpBrNN004JHn/ | 15:21 |
clarkb | https://review.opendev.org/c/opendev/system-config/+/895205/1/inventory/service/host_vars/lists.openstack.org.yaml is the reason | 15:22 |
fungi | maybe a problem with the chain? | 15:22 |
clarkb | and I must be talking to an apache child that hasn't restarted yet | 15:22 |
fungi | oh | 15:22 |
fungi | yep, okay | 15:22 |
fungi | so i guess that's a good reason to put the old server in emergency disable in the future | 15:22 |
fungi | i bet it redid the cert on the server when the config deployed | 15:23 |
fungi | okay, mystery solved | 15:23 |
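As an aside, a quick way to see which certificate a server actually presents for a given name (the check corvus and fungi were doing by hand) is to pull it down and dump the subject and validity; a minimal sketch, assuming a recent Python and the openssl CLI on the path:

```python
import ssl
import subprocess

# Fetch the certificate the server presents for this SNI name, without
# validating it, then let the openssl CLI print subject and validity so a
# name mismatch (e.g. lists.openstack.org vs lists.openinfra.dev) is obvious.
pem = ssl.get_server_certificate(("lists.openinfra.dev", 443))
dump = subprocess.run(
    ["openssl", "x509", "-noout", "-subject", "-dates"],
    input=pem, capture_output=True, text=True, check=True,
)
print(dump.stdout)
```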
fungi | we're 5 minutes out from starting at this point anyway | 15:24 |
ajaiswal[m] | how do i re-run a failed job, or a job which was not triggered? | 15:25 |
ajaiswal[m] | for example https://review.opendev.org/c/starlingx/ansible-playbooks/+/894802/5?tab=change-view-tab-header-zuul-results-summary | 15:25 |
clarkb | as noted in the zuul matrix room recheck is what you'd typically do. But that has apparently not worked | 15:26 |
clarkb | ajaiswal[m]: are there two problems here? One with a job not triggering and another where recheck doesn't work on a job that previously failed? | 15:26 |
corvus | ajaiswal: what job didn't run that you expected to run? | 15:27 |
ajaiswal[m] | https://review.opendev.org/c/starlingx/ansible-playbooks/+/894802/5?tab=change-view-tab-header-zuul-results-summary | 15:27 |
ajaiswal[m] | corvus: gating job which will provide verified+1 | 15:28 |
clarkb | ajaiswal[m]: that link is a link to the jobs which did run. There were 3 and they did provide a +1. Which job did not run that you expected to run? | 15:29 |
fungi | #status notice The lists.openinfra.dev and lists.starlingx.io sites will be offline briefly for migration to a new server | 15:30 |
opendevstatus | fungi: sending notice | 15:30 |
-opendevstatus- NOTICE: The lists.openinfra.dev and lists.starlingx.io sites will be offline briefly for migration to a new server | 15:30 | |
fungi | i'm starting to do dns updates now | 15:30 |
opendevstatus | fungi: finished sending notice | 15:32 |
fungi | dns changes submitted as of 15:33z, so we should be safe to proceed with service shutdowns after 15:38z | 15:33 |
fungi | we're on line 220 of https://etherpad.opendev.org/p/mm3migration now | 15:34 |
corvus | ajaiswal: if you're wondering why the recheck of the earlier patchset (#4) did not work, it's because that patchset depended on an outdated change; the rebase fixed that and that's why the jobs ran. | 15:34 |
fungi | okay, lists.openinfra.dev and lists.starlingx.io resolve to the temporary addresses to me and we're well past the prior ttl, so we should be able to proceed | 15:39 |
fungi | i'll stop the related services on the old server now | 15:40 |
fungi | services stopped and disabled for mailman-openinfra and mailman-starlingx | 15:41 |
fungi | final rsyncs done | 15:41 |
fungi | i've got a root screen session open on the new server now, proceeding with the import steps | 15:42 |
fungi | migrate script is now running for lists.openinfra.dev | 15:43 |
jkt | hi there, a long time Zuul user here. Just wondering, we have a FLOSS project hosted at GitHub, with code review at GerritHub, and so far we used managed Zuul at Vexxhost | 15:47 |
jkt | but it seems that we might be hitting some internal budget problems and therefore won't be able to pay for the resources because of some internal reasons | 15:48 |
jkt | I was wondering if it's possible for such an "external project" to piggyback on the opendev's Zuul infrastructure | 15:49 |
fungi | jkt: not really. we don't run zuul-as-a-service, we run a code hosting and development collaboration platform which happens to have an integrated ci system that uses zuul | 15:50 |
jkt | https://github.com/Telecominfraproject/oopt-gnpy is the project, it's BSD-3-clause, and the patch traffic is rather low | 15:50 |
jkt | fungi: understood | 15:50 |
fungi | if you're interested in relocating your development into the opendev collaboratory, then it could use opendev's integrated zuul project gating | 15:50 |
jkt | I guess that it's a bit complex to move existing patches, or is that possible with upstream Gerrit now? | 15:51 |
clarkb | jkt: I think the problem with moving existing patches is they are tied to a specific installation id in the notedb | 15:51 |
jkt | yeah | 15:51 |
fungi | it might be possible to import changes with the new notedb backend, but we've never tried it, and yes that | 15:51 |
fungi | there's almost certainly some sort of transformation that would be necessary | 15:52 |
jkt | I recall reading *something* in the release notes of the next release though | 15:52 |
fungi | however, some of the review history (who reviewed/approved, et cetera) is incorporated into git notes associated with the commits that merged | 15:52 |
jkt | right | 15:53 |
fungi | i wonder if luca has ever considered attaching a zuul to gerrithub | 15:53 |
jkt | I am that guy who Luca mentioned at https://groups.google.com/g/repo-discuss/c/MkrP8RmErOk?pli=1 | 15:53 |
fungi | heh | 15:54 |
jkt | but yeah, it would be awesome if you could "just" plug your own openstack API endpoint & credentials into a zero-cost-zuul-plus-nodepool somewhere :) | 15:54 |
fungi | i think there's some optional notion of tenancy in nodepool recently, but it might become a lot easier post-nodepool when zuul is in charge of allocating job resources more directly | 15:56 |
clarkb | the struggles we had when we tried to open up zuul elsewhere are that the problems arise at those integration boundaries and because you have less strong community ties there is less communication. Then when something inevitably goes wrong zuul/opendev get blamed for something that may be completely out of our control or at least require the user involved to intervene in some way. | 15:56 |
clarkb | That is how we ended up with the policy of keeping things in house | 15:56 |
clarkb | we can debug our gerrit and our zuul and our cloud accounts. Its much more difficult to sort out why zuul is hitting rate limits in github when we don't control the application installation | 15:56 |
jkt | yeah | 15:58 |
fungi | migrate script is now running for lists.starlingx.io | 15:59 |
clarkb | the hope too is that we can convince people to help us out, and while taking advantage of the resources available also contribute back to running them, add cloud resources, etc | 15:59 |
jkt | I'm also running Zuul at $other-company. It's a small dev team, just a handful of people, so we only upgrade when we have a very compelling reason to upgrade, which means that it's like Gerrit 3.3 and one of the oldish Zuul versions pre-zookeeper I think | 16:00 |
fungi | last week when i was trying to explain the opendev collaboratory to someone, i hit on this analogy: it's like dinner at grandma's... guests are welcome but you eat what's for dinner, and if you come around too often you're likely to get put to work in the kitchen | 16:00 |
jkt | but the most time-demanding thing for us are probably the image builds | 16:01 |
jkt | due to the fact that we're a C++ shop, and we tend to bump into various incompatibilities in -devel, libraries, etc in these projects | 16:01 |
jkt | Boost for example :) | 16:01 |
jkt | also, from the perspective of that first project that I mentioned, when we asked for some budget, I felt like the feedback I got was essentially "why don't you use github actions for that" | 16:03 |
jkt | anyway, realistically -- GNPy is a small Python thing, it requires a minute or two for the test suite to pass, and the only thing that might be non-standard is that we care about more Python versions, like 3.8-3.12 | 16:04 |
jkt | but since it's something that's mostly being worked on by people from academia, I'm not sure if we can commit to that kitchen duty of potato peeling | 16:04 |
clarkb | I think we're far less worried about the job runtime and more about the integration, as all of that falls on us. We're happy to host things on our systems; then we can spread that load out across the few thousand repos we are hosting instead of ending up with a complete one-off | 16:06 |
clarkb | one thing I like to remind people is that openstack is about 95% of our resource consumption. Adding other projects next to openstack doesn't move that needle much. | 16:06 |
jkt | that makes sense; I remember the days of JJB and Turbo-Hipster that I absolutely loved | 16:06 |
jkt | how does it look on the VM images, what determines which Python versions are available? Or is it something like "latest ubuntu" being the default, and jobs installing some oldish python in a pre-run playbook? | 16:08 |
clarkb | jkt: we offer ubuntu, centos stream, rocky linux, etc images that are relatively barebones. Projects then execute jobs that install what they need at runtime (often pulling through or from our in cloud mirrors/caches). | 16:09 |
clarkb | For example tox/nox python jobs will run the ensure-python role to install python at the required version (you need to align this with the distro offerings if using distro python) and also run bindep package listings and installations for other system deps | 16:10 |
clarkb | then they run tox or nox depending on the variant to pull in the python deps, build the project and execute the test suite | 16:10 |
fungi | also zuul's standard library contains simple roles for things like "provide this version of python" and "install and run tox" | 16:10 |
fungi | we do keep caching proxies/mirrors of things like package repositories network-local to the cloud regions where our job nodes are booted too, in order to speed up run-time installation of job requirements | 16:11 |
fungi | and to shield them from the inherent instability of the internet and remote services | 16:11 |
fungi | and we also bake fairly up-to-date copies of git repositories into the images the nodes are booted from, so that they spend as little time as possible syncing additional commits over the network | 16:14 |
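For a rough picture of what those jobs end up doing on the node at runtime (not the actual zuul-jobs roles, just a sketch assuming bindep and tox are already installed on an apt-based node):

```python
import subprocess
import sys

def run(cmd):
    """Echo and run a command, failing loudly on error."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. bindep -b lists the declared system packages that are missing on the node;
#    it exits non-zero when something is missing, so no check=True here.
missing = subprocess.run(["bindep", "-b"], capture_output=True, text=True)
packages = missing.stdout.split()

# 2. Install whatever is missing (apt assumed; the real roles detect the distro).
if packages:
    run(["sudo", "apt-get", "install", "-y", *packages])

# 3. tox pulls in the python deps, builds the project and runs the test suite.
run(["tox", "-e", sys.argv[1] if len(sys.argv) > 1 else "py311"])
```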
fungi | okay, listserv imports have completed. stdout/stderr are recorded to import_openinfra.log and import_starlingx.log in root's homedir on the new server. proceeding with the manual django site additions next | 16:17 |
*** blarnath is now known as d34dh0r53 | 16:19 | |
jkt | nice, so it looks like the Ubuntu LTS versions combined together do have a reasonable Python version coverage; https://zuul.opendev.org/t/openstack/build/0c26758ef7a14b88b939ef45e07fc5d4/console says it took 6s to install that | 16:19 |
jkt | that's nice | 16:19 |
fungi | switching dns for both sites to the new server now | 16:20 |
fungi | dns records updated as of 16:22z, ttl for the previous entries should expire by 16:27z and then we can proceed with testing | 16:22 |
fungi | okay, we're there | 16:27 |
fungi | dns is resolving to the new addresses for me | 16:27 |
fungi | testing urls at line 245 of https://etherpad.opendev.org/p/mm3migration | 16:27 |
clarkb | dns is updated for me as well /me looks at urls | 16:28 |
clarkb | looks good to me | 16:28 |
fungi | https://lists.openinfra.dev/archives/list/foundation@lists.openinfra.dev/ shows some very old messages which got imported incompletely. i think they were missing dates and so got imported with today's date | 16:29 |
fungi | but yeah, everything checks out | 16:30 |
clarkb | there are 6 of them judging by the None subjects and timestamps | 16:30 |
fungi | okay, i'm going to send completion notifications to the two lists indicated, which will also serve to test message delivery and archive updates. i have them already drafted | 16:31 |
corvus | received my copy @foundation | 16:34 |
clarkb | me too and that one actually goes to gmail | 16:34 |
clarkb | I forget why I'm double subscribed on that one | 16:34 |
clarkb | I see the starlingx email as well | 16:35 |
corvus | me too | 16:35 |
fungi | yeah, i received my reply on both lists and the archive looks right too (links added to the pad) | 16:35 |
corvus | \o/ | 16:35 |
fungi | that concludes today's maintenance activity, not too far outside the announced window either | 16:36 |
corvus | regarding the zuul regex email -- i sent a copy to service-discuss: https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/DLBYJAVTIJOXZPY6JOLP3AJHDTA6XF2R/ | 16:36 |
corvus | but it occurs to me that maybe that should have gone to service-announce? | 16:36 |
fungi | corvus: i had a similar thought when i saw it | 16:36 |
corvus | should i resend a copy to -announce then? | 16:36 |
clarkb | ya -announce is probably best to ensure more people see it | 16:36 |
fungi | might be worth sending a copy there, i agree | 16:37 |
corvus | ok will do | 16:37 |
fungi | i'll point openstack folks to that announcement | 16:37 |
clarkb | I'll probably have to log in and moderate it through. Let me know when it is waiting for me | 16:37 |
fungi | (as a subtle reminder to at least subscribe to our announcements ml) | 16:37 |
corvus | okay it awaits moderator approval now | 16:38 |
fungi | i'll get it | 16:40 |
clarkb | I got it | 16:40 |
fungi | you're too quick for me | 16:41 |
fungi | thanks! | 16:41 |
corvus | ha, the quote of the release note is collapsed by default in the web ui | 16:42 |
corvus | (which i guess makes sense under the assumption that quotes are things from earlier in the thread; but not in this case...) | 16:42 |
corvus | click "..." to read the important part! fortunately, i think all the important things show up later in the text | 16:43 |
corvus | but next time, i'll just block indent it without the > | 16:44 |
fungi | yeah, one of the unfortunate design elements that people seem to want ported from webmail clients | 16:44 |
clarkb | my mail client seems to know it wasn't previously viewed so doesn't collapse it at least | 16:45 |
fungi | it's not as annoying as discourse at least. i've basically stopped including quoted context in my replies to the python community "mailing lists" because it just eats them unless i include specially-formatted markdown tags | 16:45 |
fungi | discourse's "mailing list mode" is basically incapable of parsing traditional attribution lines in replies | 16:46 |
fungi | granted they're not all that standardized, but it could at least avoid eating them | 16:46 |
clarkb | fungi: how difficult would it be to skim the openstack-discuss or other list archives to see if they have similar problems to the foundation archive emails | 16:47 |
clarkb | slightly worried there could be hundreds in that particular mailing list archive. But maybe not since that list is relatively new? | 16:47 |
fungi | well, we have a log of the import, so it may have clues | 16:47 |
clarkb | aha | 16:47 |
fungi | and yeah, openstack-discuss is comparatively new | 16:47 |
fungi | looking at lists01:/root/import_openinfra.log it doesn't mention any errors from the hyperkitty_import step for that list | 16:49 |
fungi | we also have the original mbox files (both on the old server and the new one) we can look those messages up in | 16:50 |
TheJulia | o/ Anyone up for a bit of a mystery with zuul starting around august 10th, to do with artifact build jobs? | 16:55 |
TheJulia | https://zuul.opendev.org/t/openstack/builds?job_name=ironic-python-agent-build-image-tinyipa&project=openstack/ironic-python-agent | 16:56 |
fungi | it was colonel mustard in the library with the candlestick | 16:56 |
TheJulia | eh, well, aside from focal in openstack/project-config.... | 16:56 |
TheJulia | I'm wondering if we've got a new timeout from Zuul?! | 16:57 |
TheJulia | (which, I thought was 30 minutes, but dunno now) | 16:57 |
fungi | module_stderr: Killed | 16:58 |
fungi | rc: 137 | 16:58 |
fungi | https://zuul.opendev.org/t/openstack/build/dc30232cfd42432597a7b1c0ab2c12ba/console#2/0/3/ubuntu-focal | 16:58 |
TheJulia | that looks like the actual zuul task was killed | 16:58 |
fungi | took 3 mins 4 secs | 16:58 |
TheJulia | :\ | 16:58 |
fungi | er, took 13 mins 4 secs | 16:58 |
TheJulia | still, not that long | 16:59 |
TheJulia | Did we forget to bake fresh chocolate chip cookies and give them to zuul?! | 16:59 |
fungi | just remember zuul is allergic to nuts | 16:59 |
TheJulia | ... oh. | 16:59 |
fungi | i don't see any long-running tasks prior to that one | 17:00 |
* TheJulia whistles innocently as it was Ironic with chocolate walnut cookies | 17:00 | |
fungi | and that's pretty early in the job to have been a timeout | 17:00 |
TheJulia | That is what I thought | 17:00 |
clarkb | the taks after the one that is killed is a failed ssh connection | 17:00 |
clarkb | could the image build be breaking the system in such a way that ansible just stops working? | 17:00 |
TheJulia | it is running gcc in most of the cases I looked at | 17:00 |
TheJulia | like, nowhere near there | 17:01 |
clarkb | possibly by filling the disk? (oddly, if ansible can't write to /tmp to copy its scripts then it treats that as a network failure) | 17:01 |
TheJulia | ... I doubt it | 17:01 |
TheJulia | that is true, afaik the script uses the current working directory, but I'll check | 17:01 |
clarkb | that is long enough ago that I bet we don't have executor logs for it, which is the next thing I would check to see if we get more verbose info on the killed task | 17:02 |
TheJulia | ugh | 17:02 |
TheJulia | off hand, how much space should /tmp have on these test VMs? | 17:02 |
JayF | I'm going to note that post job is running on focal | 17:02 |
JayF | when most of our other items are on jammy | 17:02 |
clarkb | it depends on the cloud. / and /tmp are shared and real disk (not tmpfs iirc) | 17:03 |
TheJulia | well, yes. It does seem that openstack/project-config needs an update | 17:03 |
fungi | looks like that node started out with a ~40gb fs that was nearly half full: https://zuul.opendev.org/t/openstack/build/dc30232cfd42432597a7b1c0ab2c12ba/log/zuul-info/zuul-info.ubuntu-focal.txt#125 | 17:03 |
clarkb | so on rax you'll get 20GB - 13GB or so for 7GB free | 17:03 |
clarkb | and on ovh it will be someting like 80GB - 13GB or so | 17:03 |
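If disk exhaustion were the culprit, a cheap guard at the start of the build would make it obvious; a minimal sketch (the 5 GiB floor is arbitrary, and / and /tmp share one filesystem on these nodes):

```python
import shutil

gib = 1024 ** 3
usage = shutil.disk_usage("/")  # / and /tmp are on the same filesystem here
print(f"total={usage.total / gib:.1f}GiB "
      f"used={usage.used / gib:.1f}GiB free={usage.free / gib:.1f}GiB")

# Bail out early rather than letting the build wedge the node later on.
if usage.free < 5 * gib:
    raise SystemExit("not enough free disk for the image build")
```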
TheJulia | Those builds are a couple hundred megs... tops | 17:03 |
JayF | when complete; but while dib runs it can use GBs of cache | 17:04 |
corvus | https://zuul.opendev.org/t/openstack/build/69805541fb554b2a8d48753d7c9b2aca/console is a recent build with logs on ze02 | 17:04 |
TheJulia | I've never seen it use more than like 500 mb | 17:04 |
clarkb | it definitely looks like something has made the test node unresponsive though | 17:04 |
TheJulia | but it has been a while | 17:04 |
JayF | TheJulia: ack | 17:04 |
* TheJulia gives it a spin | 17:04 | |
clarkb | which could be external too, but image builds also mess with filesystems and do things that if done wrong could hose a host | 17:04 |
corvus | the logs don't have any additional info | 17:05 |
corvus | 2023-09-21 12:49:37,990 DEBUG zuul.AnsibleJob.output: [e: 4fcf54103f834451985e34885dff8dad] [build: 69805541fb554b2a8d48753d7c9b2aca] Ansible output: b'fatal: [ubuntu-focal]: FAILED! => {"changed": false, "module_stderr": "Killed\\n", "module_stdout": "", "msg": "MODULE FAILURE\\nSee stdout/stderr for the exact error", "rc": 137}' | 17:05 |
* TheJulia runs a test build locally to see | 17:05 | |
fungi | also notable, the builds for stable/2023.1 seem to be successful | 17:06 |
fungi | but not stable/2023.2 nor master | 17:06 |
JayF | The builds do succeed in CI; we have jobs that build an image and use it | 17:07 |
fungi | so the start date for the problem could be as early as 2023-08-07 because the one that succeeded on 2023-08-10 was for stable/2023.1 which also succeeded on 2023-08-30 | 17:07 |
JayF | just not in post, which is weird | 17:07 |
fungi | JayF: i guess the check/gate jobs have a different name then? | 17:08 |
TheJulia | yeah, they do | 17:08 |
JayF | https://opendev.org/openstack/ironic-python-agent/src/branch/master/zuul.d/ironic-python-agent-jobs.yaml#L22 | 17:08 |
fungi | they may have other differences in that case | 17:08 |
JayF | basically ipa-*-src jobs | 17:08 |
TheJulia | These jobs get sourced out of ironic-python-agent-builder | 17:09 |
JayF | I'm pointing it out as a data point we can look at for success in master branch | 17:09 |
TheJulia | for the build itself | 17:09 |
TheJulia | the check jobs invoke builder | 17:09 |
fungi | i'd say look into what's different in the check/gate version of the job, and also what's different in the stable/2023.1 vs stable/2023.2 builds | 17:09 |
* TheJulia is building locally... finally | 17:10 | |
TheJulia | I'm way past where it dies at and /tmp is untouched | 17:12 |
corvus | there are streaming logs from that task; https://zuul.opendev.org/t/openstack/build/69805541fb554b2a8d48753d7c9b2aca/log/job-output.txt#2244 is the point where it died | 17:12 |
TheJulia | JayF: way past the failure point, by like miles, and only at 502MB so far | 17:13 |
JayF | ack | 17:13 |
clarkb | does the waiting on logger there after the failure imply the zuul logging daemon was killed (possibly by OOMKiller or a reboot?) | 17:14 |
corvus | clarkb: yes it is suggestive of that; and that it took 29 minutes to run df further points to "unhappiness on the remote node" | 17:16 |
corvus | probably worth collecting syslog from the remote system in a post-run playbook | 17:17 |
TheJulia | JayF: so more like 2GB total disk space utilized so far, but I'm deep in the compile process like 10 minutes past where it failed | 17:19 |
JayF | TheJulia: I realized with your second comment that my experiences, and comment, was not relating to the tinyipa image, but the dib one | 17:19 |
TheJulia | ahh! | 17:19 |
TheJulia | okay | 17:19 |
TheJulia | anyhow, it's all in the current working directory where it's executed, which means the user's folder where it was cloned | 17:20 |
TheJulia | so it looks like it was downloading when it failed too | 17:20 |
TheJulia | So, when it fails, it is doing the build in a chroot | 17:25 |
clarkb | I wonder if we get some sort of kernel panic because of some incompatibility between new image build stuff (mounting a fs or whatever) and the old(ish) focal kernel | 17:25 |
TheJulia | getting and loading the files | 17:25 |
TheJulia | That could be actually | 17:25 |
TheJulia | maybe there is some issue we're ticking within the chroot which is causing a panic | 17:26 |
TheJulia | Only thing I can guess at the moment | 17:26 |
clarkb | switching the job to jammy might be a quick way to narrow down on something with the focal kernel | 17:26 |
TheJulia | yup | 17:26 |
TheJulia | should we consider switching up openstack/project-config in general? | 17:27 |
clarkb | the default nodeset in zuul should already be jammy | 17:27 |
clarkb | I think this job must specifically ask for focal | 17:27 |
TheJulia | project-config carries an override | 17:27 |
TheJulia | and it is all rooted on the artifact upload job | 17:27 |
TheJulia | in project-config | 17:27 |
clarkb | ah this one publish-openstack-artifacts | 17:28 |
clarkb | I would start by overriding in your job since that will affect all openstack artifact publishing and the relaase is soon | 17:28 |
TheJulia | ack | 17:28 |
clarkb | but ya bumping that post release is probably a good idea | 17:28 |
corvus | fungi: clarkb here's one of the None messages: https://web.archive.org/web/20221202000314/https://lists.openinfra.dev/pipermail/foundation/2011-October/000353.html | 17:30 |
fungi | oh, good find! i hadn't started trying to hunt one down yet | 17:31 |
fungi | the >From escaping is likely the problem | 17:31 |
TheJulia | JayF: change posted to ironic-python-agent-builder, we'll need to merge it to see | 17:31 |
fungi | i saw someone mention it on the mm3 list earlier this week even | 17:31 |
JayF | TheJulia: +2A as a single core as a CI fix; I think you'll still have to wait for jobs | 17:32 |
fungi | https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/thread/5JDF7OVYZNLBEHROZWGNMWR5LQICNHMB/ | 17:33 |
TheJulia | JayF: indeed :( | 17:34 |
clarkb | fungi: new favorite word: "guessalasys" | 17:35 |
clarkb | I'm not sure I understand the nuances of quoting there though | 17:35 |
fungi | i think it's that every message in an mbox file starts with a line matching "^From .*", so any line in a message body that starts with "From " gets rewritten to ">From ..." in order to not prematurely truncate the message | 17:37 |
fungi | but then something about the import is unescaping the >From to From which results in everything after that looking like a new message with no headers | 17:38 |
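A quick way to reproduce that grep per message and see whether the escaped line is the first line of a body (the case suspected of confusing hyperkitty_import); a sketch assuming the pipermail monthly archive was downloaded as 2011-October.txt:

```python
import mailbox
import re

escaped = re.compile(r"^>From ", re.MULTILINE)

box = mailbox.mbox("2011-October.txt")  # downloaded pipermail monthly archive
for msg in box:
    body = msg.get_payload(decode=False)
    if not isinstance(body, str):
        continue  # skip multipart messages for this rough check
    hits = escaped.findall(body)
    if hits:
        first = body.lstrip("\n").splitlines()[0] if body.strip() else ""
        print(f"{msg['Message-ID']}: {len(hits)} escaped '>From' line(s), "
              f"first body line escaped: {first.startswith('>From ')}")
```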
fungi | i don't have my head fully wrapped around it myself either | 17:39 |
fungi | the case described in that thread isn't exactly ours because it's someone exporting list archives from one mm3 server and importing onto another | 17:39 |
corvus | fungi: according to that thread, you probably shouldn't commit crimes unless you fully understand `from` escaping. | 17:40 |
fungi | but my point was the hyperkitty_import tool seems to possibly have some rough edges around escaped froms | 17:40 |
fungi | so maybe those messages are tripping that (which would match the results we're seeing at least) | 17:40 |
fungi | oh, also no need for the wayback machine, we still serve those original archives unmodified: https://lists.openinfra.dev/pipermail/foundation/2011-October/000353.html | 17:43 |
fungi | though maybe sans a bit of theming (not sure where that went) | 17:43 |
fungi | i downloaded the mbox file from https://lists.openinfra.dev/pipermail/foundation/ and this is what that same message looks like trimmed out: https://paste.opendev.org/show/bl8lVessN5hQHRFsdLCJ/ | 17:46 |
fungi | if i grep that month's mbox file for '^>From ' i get 9 matches though, not 6 | 17:47 |
corvus | may only trigger as the first line of the message? | 17:49 |
stephenfin | I'd like to cut a release of x/wsme, but I notice it's not hooked into openstack/release. How do I push tags without that? I already have access to PyPI so I can do it locally but I'd like to push the tag up to the remote also | 17:52 |
Clark[m] | stephenfin: the first things to check are that you are a member of the release group in gerrit for the project and that the project is configured to run the release jobs when you push a tag. With that in place you tag a signed tag then push the tag to Gerrit and let zuul do the rest | 17:56 |
fungi | corvus: oh, yeah maybe needs a preceding blank line | 17:56 |
Clark[m] | Keep in mind you need to take care to push commits that are already merged to a branch in Gerrit. Otherwise you'll push a bunch of unreviewed code up hanging off of that tag. We also don't typically delete tags because git doesn't notice when tags change | 17:56 |
fungi | corvus: though all 9 matches have blank lines before the >From so it must be something more subtle | 17:58 |
stephenfin | Clark[m]: thanks | 17:59 |
fungi | stephenfin: https://docs.opendev.org/opendev/infra-manual/latest/drivers.html#tagging-a-release | 17:59 |
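A sketch of the flow Clark describes (the remote name "gerrit", branch, and version string are illustrative; the linked infra-manual section is the authoritative reference):

```python
import subprocess

def git(*args):
    print("+ git", " ".join(args))
    subprocess.run(["git", *args], check=True)

VERSION = "1.2.3"   # illustrative version
BRANCH = "master"

# Only tag a commit that is already merged to a branch in Gerrit, so the tag
# doesn't drag unreviewed commits along with it.
git("fetch", "gerrit")
merged = subprocess.check_output(
    ["git", "rev-parse", f"gerrit/{BRANCH}"], text=True).strip()

git("tag", "-s", VERSION, "-m", f"release {VERSION}", merged)  # signed tag
git("push", "gerrit", VERSION)  # zuul's release pipeline takes it from here
```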
fungi | okay, a new twist. while there are 9 messages in that mbox file which match ^>From they are 3 duplicate copies of 3 messages each | 18:10 |
fungi | all 3 unique messages are in the set of ones that didn't import correctly | 18:10 |
clarkb | maybe it was able to do deduplication but only one layer deep resulting in 6 messages being imported wrongly? | 18:12 |
clarkb | s/wrongly/in a weird way/ | 18:12 |
fungi | aha, yes, the messages incorrectly imported in hyperkitty appear to be 2 duplicates of each of 3 messages | 18:13 |
fungi | maybe there is a copy of each one in the right place as well | 18:14 |
fungi | body getting stripped out resulting in a blank message: https://lists.openinfra.dev/archives/list/foundation@lists.openinfra.dev/thread/VEFLPHLB43Z4EM5KJN76XEUHHOT5HR4A/ | 18:15 |
clarkb | that would be fun. We could potentially just delete the bad ones out of the archive | 18:15 |
fungi | https://lists.openinfra.dev/archives/list/foundation@lists.openinfra.dev/thread/2UJR6X6ZAMH4TNEPU4ABFXHMKXEWCOLP/ and https://lists.openinfra.dev/archives/list/foundation@lists.openinfra.dev/thread/7EKUWZXEMOWODPWUMFPP3WIRADQ6BH5C/ are the other two | 18:16 |
fungi | so in summary, there are three messages in the october 2011 archive which are disembodied heads, and then two duplicate copies of their headless zombie bodies roaming the september 2023 archives | 18:18 |
fungi | ...come back from the grave to reclaim their heads | 18:18 |
fungi | anyway, i think we can delete the extra headless corpses and then try to reattach bodies to the three severed heads from 2011, but the latter part may require db surgery (and lots of lightning) | 18:22 |
fungi | also not urgent, we can mull the situation over as long as we like, maybe discuss in tuesday's meeting | 18:23 |
fungi | and with that, i've got an appointment to get to... bbiaw | 18:26 |
TheJulia | so... crazy question. Is it just me, or does the favicon for lists.openinfra.dev look like moopsy ? | 18:57 |
TheJulia | (from star trek lower decks) | 18:58 |
TheJulia | Oh, I see, it is "Hyperkitty" | 18:59 |
TheJulia | ... still looks like moopsy :) | 19:00 |
fungi | it does indeed look like moopsy | 19:44 |
fungi | we can change it, we just haven't invested in theming postorius/hyperkitty until we're done migrating everything | 19:44 |
fungi | in a few weeks we'll hopefully do openstack, and then everything will be fully on mm3 and we can probably spend some time beautifying it | 19:45 |
fungi | like add site-specific favicons and logos for each list domain and stuff | 19:46 |
opendevreview | Jay Faulkner proposed openstack/project-config master: JayF volunteering for IRC ops https://review.opendev.org/c/openstack/project-config/+/896162 | 19:50 |
TheJulia | fungi: oh, no worries. I love it! | 20:52 |
fungi | now you're channeling nils ;) | 22:18 |
TheJulia | lol | 22:55 |
fungi | that video is immortal | 22:56 |