clarkb | ianw: I think you can also just restart the persistent firewall service | 00:00 |
---|---|---|
clarkb | (fwiw I'm convincing myself that nb01/nb02.opendev.org will get the nodepool configs we want them to get now) | 00:00 |
ianw | yeah, they should be fine with the normal "nodepool.yaml" config file | 00:01 |
clarkb | yup just found the ternary condition for that | 00:01 |
clarkb | it will use a host-specific config first if found, else fall back to nodepool.yaml, which is what we want for nb01 and nb02 | 00:02 |
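A minimal shell sketch of the selection logic being described; the actual ternary lives in the system-config Ansible role, and the paths here are illustrative assumptions, not the real ones:

```bash
# Illustrative only: a host-specific config wins if present, otherwise
# fall back to the shared nodepool.yaml (the nb01/nb02 case).
host_conf="/etc/nodepool/$(hostname -s).yaml"
default_conf="/etc/nodepool/nodepool.yaml"

if [ -f "$host_conf" ]; then
    conf="$host_conf"      # e.g. a special nb04 config
else
    conf="$default_conf"   # nb01/nb02 fall through to this
fi
echo "nodepool will use: $conf"
```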
ianw | yep and when up, we can remove nb04 and make sure 01/02 are building everything | 00:02 |
openstackgerrit | Merged opendev/system-config master: install-docker: remove arch match https://review.opendev.org/724435 | 00:34 |
ianw | hrm, a POST_FAILURE on one of the beaker jobs | 00:40 |
ianw | https://zuul.opendev.org/t/openstack/build/1eb289a5cbcc42f8aba47ce3c35a2eff | 00:40 |
clarkb | ianw: that could be the ovh issue | 00:41 |
clarkb | they are investigating and have confirmed the problem | 00:41 |
clarkb | details in #openstack-infra | 00:41 |
clarkb | (that is where amorin was lurking) | 00:41 |
ianw | i did see that but didn't we stop uploading there? | 00:42 |
clarkb | ianw: I don't think that change merged | 00:43 |
ianw | available identity versions when contacting https://auth.cloud.ovh.net/ | 00:46 |
openstackgerrit | Merged opendev/system-config master: Add nb01/nb02 opendev servers https://review.opendev.org/726021 | 00:51 |
fungi | the change to stop uploading to ovh has not been approved | 00:56 |
ianw | ok, well i'll keep an eye ... | 01:02 |
ianw | i think it might be time to merge it; about 2/3 of my changes had one job that failed | 01:08 |
openstackgerrit | Ian Wienand proposed opendev/base-jobs master: Revert "Temporarily disable OVH Swift uploads" https://review.opendev.org/726028 | 01:10 |
openstackgerrit | Merged opendev/base-jobs master: Temporarily disable OVH Swift uploads https://review.opendev.org/725943 | 01:13 |
openstackgerrit | Merged zuul/zuul-jobs master: ensure-tox: use venv to install https://review.opendev.org/725737 | 01:28 |
ianw | nb01.opendev.org was in emergency from prior work, i've taken it out | 02:09 |
*** olaph has quit IRC | 02:39 | |
ianw | 01 needs another run to fix up its certs, as i only noticed it was in emergency after the LE phase ran | 02:42 |
ianw | but it appears to be building | 02:42 |
ianw | as mentioned, nb01/02.openstack.org are currently borked with locked files. i'm going to just shut them down now that nb01/nb02.opendev.org seem to be working | 02:48 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Upload images to dockerhub with buildx when using buildx https://review.opendev.org/726033 | 02:50 |
mordred | ianw: woot! that's exciting | 02:51 |
mordred | ianw: so that means we're just down to nb03 - and that should be gtg as soon as we actually start uploading our multi-arch images | 02:54 |
ianw | #status nb01/02.openstack.org shutdown and in emergency file; nb01/02.opendev.org are replacements | 02:55 |
openstackstatus | ianw: unknown command | 02:55 |
ianw | #status log nb01/02.openstack.org shutdown and in emergency file; nb01/02.opendev.org are replacements | 02:55 |
openstackstatus | ianw: finished logging | 02:55 |
ianw | yep! | 02:55 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Retire nb01/02.openstack.org https://review.opendev.org/726034 | 03:07 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: nodepool-builder: fix servername https://review.opendev.org/726035 | 03:10 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] arm64 builder test https://review.opendev.org/726037 | 03:33 |
openstackgerrit | Merged opendev/system-config master: Add system-config-run-base-arm64 https://review.opendev.org/724439 | 03:54 |
*** ykarel|away is now known as ykarel | 04:21 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] arm64 builder test https://review.opendev.org/726037 | 04:59 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Move build-essential arm64 things to base https://review.opendev.org/726039 | 04:59 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: service-bridge: skip osc/kubectl things for arm64 https://review.opendev.org/726040 | 04:59 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Move build-essential arm64 things to base https://review.opendev.org/726039 | 05:25 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: service-bridge: skip osc/kubectl things for arm64 https://review.opendev.org/726040 | 05:25 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] arm64 builder test https://review.opendev.org/726037 | 05:25 |
*** ysandeep|away is now known as ysandeep | 05:42 | |
openstackgerrit | Ian Wienand proposed openstack/project-config master: Remove special nb04 config file https://review.opendev.org/726043 | 05:53 |
openstackgerrit | Ian Wienand proposed openstack/project-config master: Remove special nb04 config file https://review.opendev.org/726043 | 05:54 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] arm64 builder test https://review.opendev.org/726037 | 06:12 |
*** DSpider has joined #opendev | 06:51 | |
*** ralonsoh has joined #opendev | 07:26 | |
*** tosky has joined #opendev | 07:31 | |
*** jaicaa has quit IRC | 07:31 | |
*** jaicaa has joined #opendev | 07:33 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Bumb ansible to version 2.6 https://review.opendev.org/726054 | 07:40 |
*** rpittau|afk is now known as rpittau | 07:47 | |
*** ysandeep is now known as ysandeep|lunch | 07:50 | |
*** mnasiadka has quit IRC | 07:56 | |
*** panda has quit IRC | 07:59 | |
*** mnasiadka has joined #opendev | 08:01 | |
*** panda has joined #opendev | 08:02 | |
*** roman_g has quit IRC | 08:20 | |
*** ysandeep|lunch is now known as ysandeep | 08:44 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 08:45 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 08:46 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 08:53 |
*** ykarel is now known as ykarel|lunch | 09:01 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 09:10 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Bumb ansible to version 2.6 https://review.opendev.org/726054 | 09:16 |
*** dtantsur|afk is now known as dtantsur | 09:18 | |
*** diablo_rojo has quit IRC | 09:24 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 09:38 |
*** Eighth_Doctor has quit IRC | 09:39 | |
*** sshnaidm|afk is now known as sshnaidm | 09:44 | |
*** ykarel|lunch is now known as ykarel | 09:44 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 09:54 |
*** Eighth_Doctor has joined #opendev | 09:58 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 10:07 |
*** rpittau is now known as rpittau|bbl | 10:10 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 10:27 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 10:46 |
*** ysandeep is now known as ysandeep|brb | 10:57 | |
*** ysandeep|brb is now known as ysandeep | 11:10 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 11:18 |
*** ysandeep is now known as ysandeep|afk | 11:48 | |
*** ysandeep|afk is now known as ysandeep | 12:00 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 12:02 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 12:11 |
*** rpittau|bbl is now known as rpittau | 12:11 | |
*** hashar has joined #opendev | 12:24 | |
*** hrw has joined #opendev | 12:48 | |
hrw | morning | 12:49 |
hrw | can someone take a look at the http://mirror.regionone.linaro-us.opendev.org/ mirror? it 403s all centos8 packages | 12:50 |
hrw | https://39a670f0ca097ca9d2d3-0327a7e653cb74ce0efd34fcb3f0b3e6.ssl.cf5.rackcdn.com/725032/5/check-arm64/kolla-build-centos8-source-aarch64/ff0437e/kolla/build/000_FAILED_base.log shows | 12:50 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 12:52 |
*** ykarel is now known as ykarel|afk | 13:05 | |
openstackgerrit | Sorin Sbarnea (zbr) proposed zuul/zuul-jobs master: Made sequence indent consistent https://review.opendev.org/725538 | 13:15 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 13:17 |
*** roman_g has joined #opendev | 13:18 | |
openstackgerrit | Merged openstack/project-config master: Add repository for oslo.metrics https://review.opendev.org/725847 | 13:21 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 13:22 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Upload images to dockerhub with buildx when using buildx https://review.opendev.org/726033 | 13:26 |
fungi | hrw: those urls seem to be okay when i try to retrieve them | 13:29 |
fungi | do you know whether it's still happening? | 13:29 |
fungi | looks like there was probably a slew of afs connectivity issues from the linaro mirror at 10:29 utc, and then i see a stray error at 11:55 and another at 12:09 | 13:36 |
*** rosmaita has quit IRC | 13:38 | |
fungi | though dmesg shows a bunch more outside those times | 13:38 |
fungi | doesn't look like it's on the afs server side as our other mirrors aren't reporting similar errors | 13:39 |
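For reference, the kind of client-side checks behind that diagnosis: standard OpenAFS tooling run on the mirror node, sketched here rather than copied from the session:

```bash
# Which fileservers does the AFS client currently believe are down?
fs checkservers
# Kernel-logged AFS timeouts/errors, with human-readable timestamps.
dmesg -T | grep -i afs
```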
donnyd | is there a central ARA for the CI? | 13:39 |
donnyd | like a place to see everything | 13:39 |
donnyd | or are they by job | 13:39 |
fungi | donnyd: nope, though not sure what you mean by "everything" | 13:39 |
fungi | oh, as in an aggregate of all job logs? no, we have logstash/elasticsearch/kibana though | 13:40 |
donnyd | I didn't know if there was like a central ARA or something. Shows how much I know about ARA | 13:40 |
fungi | we don't run ara generally for our zuul jobs, though some jobs which involve nested ansible invocation also install ara in the job and produce a report from the inner ansible | 13:41 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 13:42 |
dmsimard | donnyd: it's doable but openstack-scale is hard :) | 13:42 |
dmsimard | https://api.demo.recordsansible.org/ has data from different jobs | 13:43 |
*** jhesketh has quit IRC | 13:43 | |
donnyd | thank you dmsimard | 13:44 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Upload images to dockerhub with buildx when using buildx https://review.opendev.org/726033 | 13:46 |
*** jhesketh has joined #opendev | 13:47 | |
fungi | hrw: so far, it looks like there may be some sort of intermittent network connectivity issue preventing mirror.regionone.linaro-us.opendev.org from reaching afs01.dfw.openstack.org for around 30 seconds at a time. the server itself seems healthy, other than some fairly large spikes of network utilization around the same times i'm seeing the afs errors: | 13:50 |
fungi | http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=67974&rra_id=all | 13:50 |
fungi | i guess that could be job nodes in that region retrieving files from the mirror | 13:50 |
fungi | it looks somewhat consistent with previous days, so could just be that those are the times it's being used, and we're less likely to see errors retrieving files when nothing's retrieving files | 13:52 |
donnyd | Is OE going to be put in there to monitor? | 13:57 |
fungi | oh, is it not configured in cacti yet? i'll get to that in a bit | 14:01 |
fungi | yeah, i don't see that we're collecting system metrics for the openedge mirror | 14:02 |
*** ykarel|afk is now known as ykarel | 14:07 | |
openstackgerrit | Merged zuul/zuul-jobs master: Bumb ansible to version 2.6 https://review.opendev.org/726054 | 14:21 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Upload images to dockerhub with buildx when using buildx https://review.opendev.org/726033 | 14:24 |
*** tosky_ has joined #opendev | 14:29 | |
*** tosky has quit IRC | 14:32 | |
*** ysandeep is now known as ysandeep|afk | 14:33 | |
openstackgerrit | Merged zuul/zuul-jobs master: Made sequence indent consistent https://review.opendev.org/725538 | 14:42 |
*** panda is now known as panda|pto | 14:42 | |
*** tkajinam has quit IRC | 14:43 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Add focal to system-config base job https://review.opendev.org/725676 | 14:47 |
mordred | clarkb, fungi: https://review.opendev.org/#/c/723528/ is finally ready to go | 14:47 |
mordred | as is its child adding focal to the base job | 14:47 |
mordred | infra-root: also - when you get a chance, https://review.opendev.org/#/c/724682 could use a +A | 14:48 |
corvus | i'm going to perform a shopping outing this morning, which as you doubtless know, can be a bit of an event, so i'll be afk for a while (starting in maybe 30m) | 14:50 |
fungi | our grocery pickup appointment is tomorrow morning, but the town has also reopened the recycling facility and we have an entire station wagon load of the past couple months' recycling to go drop off | 14:52 |
fungi | so yeah, i'll probably disappear for a good chunk of tomorrow morning | 14:53 |
openstackgerrit | Merged zuul/zuul-jobs master: Upload images to dockerhub with buildx when using buildx https://review.opendev.org/726033 | 15:01 |
*** tosky_ is now known as tosky | 15:04 | |
mordred | corvus, fungi: before you disappear, have a minute to review https://review.opendev.org/#/c/724682 ? we need it in so we can re-do clarkb's zuul.yaml reorg | 15:06 |
clarkb | fungi: and then stand in line behind all the other full station wagons | 15:07 |
corvus | mordred: +2 (i looked at that yesterday -- but i understand it today :) did not +W; will let you or fungi do that | 15:08 |
fungi | clarkb: indeed, so very much looking forward to that | 15:09 |
hrw | fungi: thanks. will check when other ci jobs run | 15:09 |
fungi | mordred: my outing's not until tomorrow, but i will try to take a look while in meetings right now | 15:09 |
fungi | hrw: yeah, i still need to perform some more thorough network tests to see if i can spot more general network connectivity issues from the linaro-us mirror | 15:10 |
mordred | corvus: awesome - thanks! | 15:12 |
openstackgerrit | Merged openstack/project-config master: Remove special nb04 config file https://review.opendev.org/726043 | 15:13 |
clarkb | infra-root disk for / on zuul01 is running out. It has occurred to me I didn't run docker-compose prune in zuul-web after updating its image a few times. I'm going to do that now | 15:16 |
clarkb | oh wait, we run docker image prune, not docker-compose image prune. Does the second thing even exist? | 15:17 |
clarkb | mordred: ^ I think this is a general issue with us pulling images but not restarting and pruning | 15:18 |
clarkb | mordred: I'm thinking we should move the pull into the start playbooks | 15:18 |
clarkb | otherwise we add on more and more images and slowly fill the disk | 15:18 |
fungi | oh, could that explain some of the disk utilization on review.o.o too? | 15:19 |
clarkb | fungi: we build images less frequently there, but it may explain a small portion of it | 15:19 |
clarkb | fungi: the issue here is I think everytime zuul builds a new image we are pulling them down with ansible but not restarting and pruning on them | 15:20 |
clarkb | infra-root I'm going to run the image prune on zuul now. It will also remove newer zuul-scheduler images than we are running because that hasn't been restarted. The next run of ansible should pull the latest down which will have us covered for tomorrow | 15:21 |
clarkb | Total reclaimed space: 3.285GB | 15:22 |
mordred | clarkb: ++ | 15:22 |
mordred | clarkb: I agree about moving the pull to the start | 15:22 |
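A sketch of that combined flow; the compose directory is an assumption, but the ordering is the point: pull, restart, then prune, so old layers never accumulate:

```bash
cd /etc/zuul-scheduler        # hypothetical compose directory
docker-compose pull           # fetch newer images; old layers stay on disk
docker-compose up -d          # recreate containers on the new image
docker image prune --force    # reclaim the now-dangling old layers
```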
clarkb | mordred: cool I'll work on that patch after tea and breakfast | 15:23 |
mordred | cool | 15:23 |
mordred | fungi: you're good with apache - wanna review clarkb's apache caching patch: https://review.opendev.org/#/c/724682 ? | 15:23 |
clarkb | mordred: I think we should also have bup exclude the docker stuff if we haven't already | 15:24 |
mordred | also https://review.opendev.org/#/c/724778/ is related to apache and caching | 15:24 |
clarkb | /root/.bup on zuul and gerrit continue to be huge consumers of disk | 15:24 |
clarkb | and that is index files for files being backed up aiui | 15:24 |
mordred | clarkb: I believe I checked and I believe we do? but maybe we don't? | 15:24 |
mordred | clarkb: there's almost certainly _something_ we're backing up that we don't need to :) | 15:25 |
clarkb | /var/lib/docker/* <- is excluded | 15:25 |
mordred | \o/ | 15:26 |
fungi | i'll take a look at the apache change next once i get through 724682 | 15:27 |
clarkb | mordred: I think we can exclude /var/lib/zuul too? | 15:27 |
mordred | clarkb: well - on the scheduler we want to make sure we're backing up keys | 15:28 |
clarkb | mordred, fungi: and ya, /var/cache/apache2 is 1.5GB, so another not-small cost | 15:28 |
clarkb | mordred: good point | 15:28 |
clarkb | though that means we are backing up the status.json dumps | 15:29 |
clarkb | and those are reasonably large and change constantly which could be part of the large /root/.bup | 15:29 |
clarkb | maybe we are ok with excluding those? | 15:29 |
mordred | clarkb: we could exclude /var/lib/zuul/backups on zuul | 15:29 |
mordred | yeah | 15:29 |
clarkb | we do already exclude /var/cache/* so aren't trying to backup apache's cache | 15:30 |
clarkb | (which is good) | 15:30 |
mordred | clarkb: /var/lib/zuul/times might also be icky to backup? | 15:30 |
clarkb | mordred: ++ we can live without those too I think | 15:30 |
clarkb | mordred: basically I think the /root/.bup cost is indexing many files | 15:30 |
clarkb | and I expect having files with names that change over time contributes to that too if I've read bup docs correctly | 15:31 |
clarkb | but also we might consider adding these excludes then "restarting" zuul backups in order to clear all that out? | 15:31 |
openstackgerrit | Merged opendev/system-config master: Remove old init scripts and services for zuul/nodepool https://review.opendev.org/726011 | 15:31 |
openstackgerrit | Merged opendev/system-config master: Run cloud_launcher from zuul https://review.opendev.org/718798 | 15:31 |
clarkb | I think that was the only way I could find to clean that up. Basically start from a new epoch for the append-only backups | 15:31 |
clarkb | alright breakfast and tea and will write some changes | 15:31 |
mordred | clarkb: ++ | 15:32 |
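A hedged sketch of what those excludes look like at bup index time; the real list lives in the backup script deployed by system-config, and the paths below simply follow the discussion above:

```bash
# Index the filesystem, skipping the paths discussed, then save to the
# (assumed) remote backup target.
bup index \
    --exclude /var/lib/docker \
    --exclude /var/cache \
    --exclude /var/lib/zuul/backups \
    --exclude /var/lib/zuul/times \
    /
bup save -r backup-server: -n "$(hostname -s)" /
```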
fungi | mordred: your reference to "clarkb's apache caching patch" back at 15:23 seems to have been a link to your ansible roles change i was already reviewing. which one did you mean? | 15:32 |
AJaeger | mordred: https://zuul.opendev.org/t/openstack/build/5ce869e00e044ed1b6f328fc7f8265d7 failed, that's the deploy from 726043 | 15:34 |
mordred | fungi: whoops! https://review.opendev.org/#/c/724444/ and https://review.opendev.org/#/c/724778/ | 15:35 |
fungi | cool, i just approved that second one, taking a look at the first now | 15:35 |
mordred | AJaeger: thanks - looking | 15:35 |
mordred | Error executing /usr/sbin/apache2ctl: AH00526: Syntax error on line 19 of /etc/apache2/sites-enabled/001-nb.conf: | 15:36 |
mordred | SSLCertificateFile: file '/etc/letsencrypt-certs/nb01.opendev.org/nb01.opendev.org.cer' does not exist or is empty | 15:36 |
mordred | I think ianw said something about needing another pulse of le - but that doesn't seem to have happened | 15:36 |
mordred | yeah - we haven't run letsencrypt again - let me run it real quick | 15:37 |
clarkb | mordred: as a side note, that is just for serving images and status, so it's not critical to nodepool function | 15:42 |
mordred | clarkb: yeah - but it causes the nodepool playbook to bomb | 15:42 |
clarkb | ah | 15:43 |
mordred | ok - I've run the le playbook and have re-run the service-nodepool playbook | 15:44 |
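Roughly what that manual remediation looks like from bridge; the playbook paths here are assumptions for illustration, not copied from the session:

```bash
cd /home/zuul/src/opendev.org/opendev/system-config  # assumed checkout location
ansible-playbook playbooks/letsencrypt.yaml          # refresh/issue the certs first
ansible-playbook playbooks/service-nodepool.yaml     # re-run the playbook that bombed
```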
AJaeger | thanks, mordred ! | 15:45 |
*** priteau has joined #opendev | 15:46 | |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Update bup excludes for zuul-scheduler https://review.opendev.org/726183 | 15:46 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Pull and prune docker images together https://review.opendev.org/726185 | 15:54 |
clarkb | mordred: infra-root ^ two more disk saving changes | 15:54 |
smcginnis | Interesting workshop on GitHub Actions. They've done a lot to make GitHub work like Gerrit. | 15:56 |
smcginnis | Other than needing to commit a whole stack of half baked commits to get to a good state. | 15:56 |
mordred | smcginnis: yeah - and lack of support for patch series | 15:57 |
fungi | you can avoid that, you just lose the history | 15:57 |
clarkb | also acls | 15:57 |
smcginnis | That patch series support has been something I've really missed in GitHub projects. | 15:57 |
clarkb | fungi: they actually save the diff contexts and comments within that scope now | 15:57 |
* mordred ran into the patch series issue with subsurface on monday - I pushed up a patch, review found an unrelated thing that needed to be on top - I had to wait until the first PR was landed before I could push up the followup | 15:58 | |
fungi | clarkb: oh, even if you push --force a new commit? | 15:58 |
clarkb | fungi: yup, it's still not great because you can't easily revert and I think comments outside the context of the git diff are either lost or only kept with no context | 15:58 |
clarkb | but its something | 15:58 |
smcginnis | Comments on the PR will be kept, but the commits will be lost. | 15:58 |
smcginnis | AFAIK | 15:58 |
mordred | the comments are still there, but they don't show | 15:58 |
clarkb | smcginnis: correct the commits go away. | 15:58 |
mordred | if you go to your email and find when you got the original comment and click that | 15:59 |
mordred | it'll show it to you | 15:59 |
mordred | but you can't browse to it | 15:59 |
smcginnis | Anyway, not tried to throw shade or anything. They've done a lot of great work. Just found it humorous how much work has been put in to make it gerrit like. | 15:59 |
fungi | mordred: could you have created a new branch off your first pr's branch and then committed the second fix there and made a pr from that? | 16:00 |
mordred | fungi: I could have - but it would have still shown the first pr's commits too | 16:00 |
clarkb | ya there is no way to partition the review context | 16:00 |
clarkb | the PR will always contain HEAD to remote HEAD commits in it | 16:00 |
mordred | smcginnis: yah - totally - there has been some excellent work done | 16:00 |
mordred | smcginnis: it's still not a system I enjoy using, but it has definitely improved over the years | 16:01 |
smcginnis | ++ | 16:01 |
openstackgerrit | Merged zuul/zuul-jobs master: Check blocks recursively for loops https://review.opendev.org/724967 | 16:02 |
fungi | #status log deleted openstack/rally branch stable/0.9 (304c76a939b013cbc4b5d0cbbaadecb6c3e88289) per https://review.opendev.org/721687 | 16:03 |
openstackstatus | fungi: finished logging | 16:03 |
clarkb | smcginnis: I think what we are seeing is a realization that they have popularity but if they want to continue to be considered by serious software development environments they need to improve the tools that are available to those groups | 16:04 |
clarkb | Github has always been great for hobby projects or school assignments and even small groups of people working together. | 16:04 |
clarkb | But I have no idea how projects like ansible or kubernetes manage to survive on it | 16:05 |
smcginnis | The one thing they have now that I think could really help us, especially for drive-by contributions, is being able to open a web-based VS Code environment to make updates. | 16:05 |
clarkb | and slowly github is making the lives of ^ projects like that better | 16:05 |
clarkb | smcginnis: eh | 16:05 |
clarkb | smcginnis: you forget the CLA | 16:05 |
smcginnis | Yeah, I know there are a lot of other things that need to be done. | 16:05 |
clarkb | from a technical standpoint maybe, but you've got to get the board to get out of the way | 16:05 |
smcginnis | But that's kind of the problem. | 16:05 |
smcginnis | Drive by contributors may have a quick fix, look at the steps needed to submit a patch, and say a great big nope. | 16:06 |
openstackgerrit | Merged zuul/zuul-jobs master: Remove opensuse-15-plain testing https://review.opendev.org/725750 | 16:06 |
clarkb | smcginnis: sure, but the biggest issue there is legal and not technical so addressing it from a technical standpoint doesn't help | 16:06 |
clarkb | if we can remove the legal hurdle then addressing technical hurdles makes a lot more sense | 16:06 |
clarkb | and gerrit is adding that functionality | 16:07 |
*** ykarel is now known as ykarel|away | 16:07 | |
clarkb | I think paladox was saying it is really close | 16:07 |
paladox | clarkb you mean file uploads? | 16:07 |
clarkb | paladox: ya I think you said that could be used to generate a new change in the browser? | 16:08 |
paladox | ah, yeh. That's been merged now | 16:08 |
paladox | (so will be included in 3.2) | 16:08 |
paladox | it's live on https://gerrit-review.googlesource.com | 16:08 |
clarkb | nice | 16:09 |
paladox | https://imgur.com/a/dL91kwI is what it looks like | 16:09 |
fungi | right, for now the biggest hurdle to casual contribution, as clarkb points out, is that the osf board of directors and legal counsel are still unwilling to consider that the legal safety net they believe crucial to prevent companies from going to war with each other over their contributions is itself a significant hindrance to code contributors, especially casual contributors | 16:15 |
fungi | we need folks who are on the osf board of directors to take up the challenge of revisiting that and possibly refuting the earlier concerns of member companies whose legal counsel were so insistent on enforcing contributor legal agreements (especially the corporate contributor license agreement) | 16:17 |
JayF | (I'm 100% serious) can someone run for the board on a platform of "lowering barriers to entry for contributors", that'd get my vote | 16:17 |
mordred | it's complicated | 16:17 |
mordred | the board mostly works via consensus, so if even a few of the directors are generally uncomfortable with something, it's unlikely to go anywhere | 16:18 |
fungi | i've heard from the folks managing ccla compliance at some of these larger member companies who ask why osf still insists on such arcane legal red tape when other large projects are able to get by with something simple like the dco | 16:18 |
mordred | not because it'll get voted down - but because the board won't feel like it has gotten to a point where a vote would be consensual | 16:19 |
mordred | so - as a former board member who has been in favor of fixing this for years - I can tell you it's an arcane enough topic that enough of the members just don't feel like they have enough of a handle on it, so they're not interested in taking on the risk | 16:20 |
mordred | without an active push from legal counsel saying "this is a good idea and we should do it" (which allows such board members to just say "legal counsel said it's a good idea; if I don't understand the issue I can just trust that") - I think it's unlikely to get traction | 16:21 |
mordred | ALL of that said - we *do* have approval from the board to drop the icla in favor of the dco, and moving forward on that is in our court | 16:22 |
mordred | and is backed up behind a few other things last I remember | 16:22 |
fungi | not really | 16:22 |
fungi | i mean we could basically ignore the real thrust of that board resolution | 16:22 |
fungi | which said that we could drop the icla *if* we did a better job of reporting ccla affiliation | 16:23 |
clarkb | which is somehow our problem | 16:23 |
fungi | and that we could rely on the dco for unaffiliated contributors | 16:23 |
mordred | I do not remember it that way | 16:23 |
fungi | reread it ;) | 16:23 |
mordred | oh - no, ccla affiliation is the important bit | 16:23 |
mordred | but I do not remember *us* having to report that better being part of it | 16:23 |
clarkb | mordred: right companies are unable to track their own employees so we have to do it for them | 16:23 |
fungi | https://wiki.openstack.org/wiki/Governance/Foundation/26Oct2015BoardMeetingMinutes | 16:24 |
mordred | the members who cared said they were fine with updating employee ccla activity | 16:24 |
mordred | "WHEREAS, the Foundation needs to develop and implement new software to identify individuals who are listed in the Corporate Contribution License Agreements and implement a new process for all contributions (“New Process”); " | 16:24 |
mordred | yes | 16:24 |
mordred | that's not us | 16:24 |
mordred | that's the foudation | 16:24 |
mordred | and my understanding is openstackid is the answer from the affiliation side | 16:24 |
fungi | stuff like "...adoption of the DCO instead of the Individual Contributor License Agreement (“ICLA”) for contributions by individuals who are not making contributions on behalf of a corporate employer..." | 16:24 |
mordred | so when I say it's backed up | 16:25 |
mordred | I mean the issue we still ahve on our side is that we can't let people log in with openstackid to gerrit | 16:25 |
fungi | "...the Foundation needs to develop and implement new software to identify individuals who are listed in the Corporate Contribution License Agreements and implement a new process for all contributions..." | 16:25 |
mordred | openstackid is the "affiliation tracking system" | 16:25 |
mordred | so it has long been my understanding that we can't make progress on this until we make progress on new SSO | 16:26 |
fungi | but anyway, we have the okay from the board to adopt the dco for individuals not making contributions on behalf of their employer, so long as we can figure out who they are | 16:26 |
clarkb | mordred: I seemed to remember it being a lot more strict than that but maybe the thoughts around it have changed over time. | 16:27 |
mordred | whether there needs to be additional things past that at that point remains to be seen, but until tying an openstackid to a dev account is a thing, there is really nothing we can do | 16:27 |
fungi | my take is that we could just drop the icla enforcement, turn on dco enforcement for relevant repos, and call it a day. but we need to present that in a way that the osf executive team will be okay with | 16:27 |
clarkb | mordred: ah maybe that was it. step 0 is openstackid maybe there is step 1, 2, n | 16:28 |
mordred | fungi: right - my memory is that once people can login with openstackid, the executive team is much more comfortable | 16:28 |
mordred | yes | 16:28 |
mordred | there might be a step 1 or 2 | 16:28 |
fungi | and if the osf executives feel like they can't make that call without checking with legal counsel and/or the board, we're sunk | 16:28 |
mordred | but we haven't even thought about it because of step 0 | 16:28 |
*** dtantsur is now known as dtantsur|afk | 16:28 | |
mordred | and step 0 turns out to be hard | 16:28 |
mordred | I'm mostly just pointing out - there is a known step 0 that is blocking on us - even if there may be unknown step 1, 2, n - and it's a step 0 we want to do for other reasons anyway | 16:29 |
mordred | it might still be blocked on other people or other things after step 0 | 16:29 |
fungi | on a related note, i have some 350 lines scraped from years of old e-mail threads chucked into a spec template i'm trying to refine to make something current | 16:30 |
fungi | i still feel like it's not *actually* blocked on us, it's blocked on somewhat unreasonable logistical demands we have to come up with a technical solution for | 16:31 |
mordred | smcginnis: also - for the record - the cncf also requires a CCLA for contributions, and it took me over a month to figure out how to get myself added to a ccla list so that I could submit a PR to a cncf project | 16:32 |
fungi | we can attempt to un-block it with a semi-reasonable technical compromise, but there's every chance we'll be told that's still not good enough to meet the legal obligation the board insisted on | 16:33 |
mordred | so - let's remember next time someone tells us "it's easier for projects on github" - that for other big projects that statement is an outright lie | 16:33 |
smcginnis | mordred: Not an uncommon experience either. | 16:33 |
mordred | yeah | 16:33 |
fungi | heck, corvus has a tiny (two line?) patch for jitsi-meet rotting in their pr queue blocked on red hat legal signing jitsi's ccla | 16:34 |
*** rpittau is now known as rpittau|afk | 16:34 | |
mordred | fungi: ++ | 16:34 |
*** hashar has quit IRC | 16:39 | |
clarkb | mordred: you reviewed the pull/prune symmetry change but not the bup update https://review.opendev.org/#/c/726183/ any chance you can review that one too? | 16:39 |
mordred | clarkb: lgtm | 16:43 |
clarkb | mordred: is the stack at https://review.opendev.org/#/c/724682 the last major piece before the reorg of system-config zuul confs? | 16:43 |
*** sshnaidm is now known as sshnaidm|afk | 16:46 | |
mordred | clarkb: yes | 17:03 |
mordred | clarkb: https://review.opendev.org/#/c/723528/ <-- also that please | 17:03 |
mordred | I have confirmed that the cloud-launcher cronjob has been removed from bridge, so landing the "stop removing cron job" patch is fine - I've +A'd the middle of the stack | 17:04 |
mordred | clarkb: https://review.opendev.org/#/c/726034/ is also a fun one to land :) | 17:05 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Add docker_dockerfile to upload-docker-image defaults https://review.opendev.org/726210 | 17:08 |
clarkb | mordred: that is a fun one | 17:08 |
mordred | clarkb: ikr? | 17:08 |
clarkb | ugh I think weechat decided mouse mode is on for some reason | 17:10 |
openstackgerrit | Merged opendev/system-config master: python-builder: drop # from line https://review.opendev.org/725374 | 17:10 |
openstackgerrit | Merged opendev/system-config master: Configure htcacheclean for zuul-web https://review.opendev.org/724778 | 17:10 |
openstackgerrit | Merged opendev/system-config master: Cache static zuul resources in apache https://review.opendev.org/724444 | 17:10 |
clarkb | mordred: ianw re https://review.opendev.org/#/c/726034/ should we s/nb04/nb01/ in a followup? | 17:11 |
mordred | clarkb: yeah - might as well | 17:11 |
*** priteau has quit IRC | 17:25 | |
openstackgerrit | Merged zuul/zuul-jobs master: Add docker_dockerfile to upload-docker-image defaults https://review.opendev.org/726210 | 17:25 |
*** avass has joined #opendev | 17:50 | |
clarkb | mordred: https://review.opendev.org/#/c/724682 failed in the gate | 17:50 |
*** ralonsoh has quit IRC | 17:51 | |
openstackgerrit | Merged opendev/system-config master: Test zuul-executor on focal https://review.opendev.org/723528 | 17:53 |
openstackgerrit | Merged opendev/system-config master: Retire nb01/02.openstack.org https://review.opendev.org/726034 | 18:00 |
mordred | clarkb: I've got a recheck on it - it's an unreachable error to the xenial node, so I think it was a cloud-derp | 18:02 |
mordred | (the arm test came after the recheck because slow arm - the recheck is still running) | 18:02 |
*** roman_g has quit IRC | 18:04 | |
corvus | i'm back from errands, but going to do some lunch prep, etc, now, so not really fully back yet | 18:09 |
clarkb | corvus: all fine I think we are mostly waiting on gerrit/zuul for the next step in CD stuff | 18:10 |
openstackgerrit | Merged opendev/system-config master: Use zuul checkouts of ansible roles from other repos https://review.opendev.org/724682 | 18:41 |
openstackgerrit | Merged opendev/system-config master: Stop logging the rsync of puppet https://review.opendev.org/724419 | 18:41 |
openstackgerrit | Merged opendev/system-config master: Stop removing cloud-launcher cron https://review.opendev.org/718799 | 18:41 |
mordred | clarkb: I got a question in sdks about image checksum issues in osc - do you remember us having some issue with that here in the last couple of months? | 18:43 |
*** dtroyer has joined #opendev | 18:44 | |
clarkb | mordred: ish. They don't do anything useful except for the swift-based upload dedup | 18:49 |
clarkb | mordred: glance can change the image without telling you, so the checksum you provide to glance is useless; also glance doesn't check it aiui | 18:50 |
clarkb | those are issues but nothing that affected usability in a regressive way | 18:52 |
mordred | clarkb: nod. well - issue at hand is issues uploading an image to a vmware-backed cloud with osc v5 which uses sdk instead of glanceclient with a checksum mismatch | 18:56 |
mordred | I'm guessing the vmware driver is modifying the image so the cloud checksum is different | 18:56 |
mordred | which makes me think having osc pass "validate_checksums=True" is a bad idea and we should set that to false - and probably default it to false in sdk | 18:57 |
clarkb | ya I don't think you can generally verify checksums unfortunately | 18:57 |
clarkb | it would be good for glance to checksum what it receives separately to what it stores for use | 18:57 |
clarkb | then clients can confirm the remote received the data properly | 18:58 |
mordred | clarkb: ++ | 19:08 |
mordred | clarkb: and could know that the cloud modified the binary payload | 19:08 |
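For context, the comparison a client can make today is against whatever the cloud reports after import (e.g. glance's legacy md5 `checksum` field), which is exactly why it breaks when a driver rewrites the image; the image name here is hypothetical:

```bash
md5sum ./myimage.qcow2                             # checksum of what we sent
openstack image show -f value -c checksum myimage  # checksum of what the cloud stored
# If the cloud (e.g. a vmware driver) converted the image, these differ
# even though the upload itself was fine.
```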
mordred | clarkb: ZOMG - the stack landed | 19:12 |
mordred | clarkb: I pass you the mutex on the repo for zuul.yaml changes | 19:12 |
mordred | clarkb: also - I'm going for a bike ride - back in a bit | 19:12 |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: Stop building opensuse-15-plain images https://review.opendev.org/725751 | 19:20 |
clarkb | cool will get the new ps up in a moment | 19:20 |
corvus | really back now | 19:20 |
clarkb | finishing lunch | 19:20 |
fungi | and i'm about to start cooking dinner | 19:20 |
fungi | like ships that pass in the night ;) | 19:21 |
*** dpawlik has quit IRC | 19:23 | |
clarkb | corvus: fungi https://review.opendev.org/726183 and https://review.opendev.org/726185 are additional efforts to get disk consumption into reasonable levels | 19:27 |
clarkb | and now to redo the reorg change | 19:27 |
openstackgerrit | James E. Blair proposed opendev/system-config master: WIP: add Zookeeper TLS support https://review.opendev.org/720302 | 19:28 |
corvus | clarkb: the prune is an improvement, but i think that the ideal (if we have time to figure this out) is something more like the kernel: always pull, and prune all but the current and latest. | 19:30 |
corvus | clarkb: that way we're not pulling on start, but we're also not wasting space | 19:31 |
corvus | clarkb: actually -- does prune do that? | 19:31 |
corvus | clarkb: ie, does it not remove the running image because it's running, and not remove the :latest image because it's tagged? | 19:32 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Organize zuul jobs in zuul.d/ dir https://review.opendev.org/722394 | 19:37 |
clarkb | corvus: it doesn't remove the running image. I think it may remove the latest image | 19:38 |
clarkb | but I'm not sure about latest | 19:38 |
clarkb | it removes all "dangling" images. Now to figure out what that means | 19:39 |
clarkb | "Dangling images are layers that have no relationship to any tagged images." | 19:39 |
corvus | clarkb: when i do 'docker image prune' locally, i still have a bunch of tagged images in the list | 19:39 |
clarkb | so ya we can probably just pull and prune always | 19:39 |
corvus | ++ | 19:39 |
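That matches docker's documented behaviour; a quick local check makes it concrete:

```bash
# Untagged ("dangling") layers -- what `docker image prune` would remove:
docker images --filter dangling=true
# Prune keeps tagged images (e.g. :latest) and images backing running
# containers; only the dangling ones are reclaimed.
docker image prune --force
```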
clarkb | infra-root https://review.opendev.org/#/c/722394/ should be ready for review and the conflicts-with list looks less bad than before | 19:40 |
clarkb | I believe all testing that will run is happy now too but there may be surprises I suppose | 19:40 |
clarkb | corvus: I guess we move the pull and prune into main.yaml and simplify start.yaml then? I can work on that patchset if we like that idea | 19:41 |
corvus | clarkb: yeah that sounds right | 19:43 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Pull and prune docker images together https://review.opendev.org/726185 | 19:52 |
clarkb | corvus: ^ I think I switched gears there properly | 19:52 |
openstackgerrit | Merged opendev/system-config master: Update bup excludes for zuul-scheduler https://review.opendev.org/726183 | 19:54 |
openstackgerrit | Merged openstack/project-config master: Stop building opensuse-15-plain images https://review.opendev.org/725751 | 20:06 |
paladox | corvus wondering if you saw my ping from earlier in the morning? :) | 20:08 |
paladox | err wrong channel | 20:08 |
corvus | paladox: no -- which channel? :) | 20:13 |
corvus | ah found it | 20:15 |
paladox | zuul :) | 20:15 |
openstackgerrit | Gage Hugo proposed openstack/project-config master: Retire syntribos - Step 1 https://review.opendev.org/726237 | 20:26 |
clarkb | infra-root if you get a chance it might be good to get other eyeballs onto the disk constraints issue on zuul01.openstack.org. Hoping we can avoid filling that disk, but worried the last thing I'm seeing there is the /root/.bup dir which I'm not sure how to safely clean up | 20:30 |
clarkb | ianw: ^ I know you've done some bup things; any idea if we can move the old zuul backups aside, rm /root/.bup, then start a new backup series for zuul01? | 20:30 |
clarkb | or possibly even rm /root/.bup and keep backing up to the existing location? | 20:31 |
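One possible shape for that "new epoch", as an untested sketch rather than an established procedure:

```bash
# Client side: set the old index aside and start fresh.
mv /root/.bup /root/.bup.old
bup init
# Server side (conceptually): also move the old repository aside, so the
# next `bup save` starts a new append-only history instead of growing
# the old one. Exact paths depend on the backup server layout.
```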
clarkb | https://review.opendev.org/#/c/722394/ passed testing | 20:41 |
corvus | clarkb: interesting it ran so many jobs... | 20:43 |
corvus | clarkb: they shouldn't be changing, right? | 20:44 |
clarkb | corvus: the content shouldn't be changing, its literally copy paste from one file into another | 20:44 |
clarkb | but I think zuul may see that as an update anyway? | 20:44 |
corvus | clarkb: it's basically a diff of the json serialization of the job | 20:44 |
clarkb | corvus: is it possible that order matters? | 20:45 |
clarkb | corvus: iirc zuul loads the jobs in a sorted order when in zuul.d | 20:45 |
clarkb | and that new order may be different from what was in .zuul.yaml? | 20:45 |
mordred | corvus: do you want me to remove my +A? | 20:45 |
corvus | clarkb: should be job-by-job | 20:47 |
corvus | mordred: nah; let's just keep an eye out | 20:47 |
clarkb | also for each patchset I didn't do a rebase. Instead I redid the copy paste process to be sure I caught changes in master | 20:48 |
clarkb | it was fairly mechanical so doing that was easier than reasoning about merge conflicts | 20:48 |
mordred | clarkb: you might find https://review.opendev.org/#/c/725103/ amusing | 20:50 |
clarkb | mordred: yay puppet | 20:51 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Remove puppet on non-puppeted servers https://review.opendev.org/725104 | 20:53 |
corvus | mordred: holy cow, the zk tls change passed tests | 20:57 |
mordred | corvus: woot! | 20:57 |
corvus | mordred: however, it seems it managed to do that despite the zuul scheduler failing to start | 20:59 |
mordred | corvus: for the job diff - is it possible that the serialization contains yaml context info (for error line numbers and stuff?) | 20:59 |
mordred | corvus: I think we need better testing of the zuul/nodepool services actually starting | 20:59 |
corvus | mordred: yeah, there's a source_context in there | 21:00 |
corvus | we should drop that from the comparison | 21:00 |
mordred | \o/ | 21:00 |
* mordred had a thought | 21:01 | |
corvus | maybe the description, too? | 21:01 |
mordred | corvus: yeah - maybe so - I don't think we need to run tests if the description changes | 21:01 |
corvus | mordred, clarkb: it looks like the zuul installation as created by system-config on current master largely doesn't work | 21:15 |
corvus | we need to fix that before we can use it to validate the tls change | 21:15 |
corvus | the scheduler says: | 21:15 |
corvus | configparser.NoSectionError: No section: 'gearman' | 21:15 |
corvus | which is weird, because the zuul.conf has a gearman section. and the other services don't have that error | 21:15 |
clarkb | maybe a bind mount issue? | 21:16 |
corvus | maybe; that makes me wonder how prod is working though? | 21:16 |
clarkb | /etc/zuul:/etc/zuul is a bind mount in docker-compose | 21:17 |
clarkb | and there is a gearman section in prod's /etc/zuul/zuul.conf | 21:18 |
corvus | the one we copy to logs is in /etc/zuul/zuul.conf | 21:18 |
corvus | and it has a gearman section | 21:18 |
clarkb | I would expect a permissions error to manifest differently | 21:20 |
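A couple of hedged checks that would distinguish a bind-mount problem from a config problem; the service and container names here are assumptions:

```bash
# Does the container see the same zuul.conf the host has?
docker-compose exec scheduler sh -c 'grep -n "^\[gearman\]" /etc/zuul/zuul.conf'
# Is /etc/zuul actually bind-mounted where we think it is?
docker inspect --format '{{ json .Mounts }}' zuul-scheduler_scheduler_1
```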
* mordred is confused | 21:22 | |
corvus | i downloaded the zuul.conf from the test and configparser is fine with it | 21:27 |
mordred | corvus: do we need to hold a node and recheck the job? | 21:27 |
mordred | I don't see anything that explains what's different from the logs | 21:27 |
mordred | everything ran that should and the file has the right content | 21:27 |
openstackgerrit | James E. Blair proposed opendev/system-config master: DNM: fail zuul tests https://review.opendev.org/726248 | 21:31 |
corvus | mordred: yeah, i guess so | 21:31 |
corvus | nodepool fails to connect to zk | 21:33 |
corvus | so i guess i should update that too | 21:33 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Remove puppet on non-puppeted servers https://review.opendev.org/725104 | 21:33 |
openstackgerrit | James E. Blair proposed opendev/system-config master: DNM: fail zuul tests https://review.opendev.org/726248 | 21:33 |
mordred | corvus: your comment on the previous ps ^^ made me realize I was doing that WAY too absurdly | 21:33 |
*** DSpider has quit IRC | 21:34 | |
corvus | mordred: much nicer :) | 21:34 |
corvus | okay, we should have a held zuul and nodepool set soon | 21:35 |
clarkb | mordred: we need to disable puppet agent on puppet hosts though | 21:36 |
clarkb | mordred: so I think we want to keep running the old role on puppet too | 21:36 |
clarkb | mordred: left that note on https://review.opendev.org/#/c/725104/5 if you can double check it | 21:39 |
openstackgerrit | Merged opendev/system-config master: Stop running mcollective https://review.opendev.org/725103 | 21:42 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Actually include platform in the upload build https://review.opendev.org/726251 | 21:58 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Process siblings in upload-image push https://review.opendev.org/726253 | 22:02 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Add focal to system-config base job https://review.opendev.org/725676 | 22:15 |
mordred | clarkb: ^^ I rebased this on your reorg patch | 22:16 |
clarkb | hrm did we break a bunch of our tests? the reorg change is failing everything now | 22:16 |
mordred | clarkb: oh good | 22:16 |
clarkb | mordred: openshift client tarball fetch is returning HTTP 403 | 22:17 |
corvus | this is a really helpful error from systemd: https://27fbadc9cf2dd544239f-7d0e556db3075d25d1b91bbdcc8a4562.ssl.cf1.rackcdn.com/722394/3/gate/system-config-run-static/cc59be9/bridge.openstack.org/ara-report/result/25990b08-c9a9-4a19-9b80-7839f9c6b1f3/ | 22:17 |
mordred | 2020-05-07 22:10:09.258383 | bridge.openstack.org | TASK [install-kubectl : Download openshift client tarball] ********************* | 22:18 |
mordred | yeah | 22:18 |
mordred | corvus: yeah - it really is isn't it? | 22:18 |
corvus | clarkb: https://zuul.opendev.org/t/openstack/build/cc59be9dfac6482dbd543a9a9258f93c/log/static01.opendev.org/syslog.txt#1556 | 22:19 |
corvus | clarkb, mordred: ^ that's the static/apache failure | 22:19 |
corvus | something about the LE cert | 22:19 |
clarkb | corvus: I'm guessing the LE demo site failed and so we didn't fake-issue our fake cert? | 22:20 |
clarkb | decoupling from their fake site now that we know things essentially work may be a good idea | 22:20 |
clarkb | ianw: ^ is that something you've thought about as an option? | 22:20 |
clarkb | the openshiftclient issue seems to be a github thing? | 22:20 |
mordred | yeah - github seems to be happier now ... I just curl'd the file locally | 22:20 |
mordred | although if I try to click on it in my browser it's still sad | 22:20 |
clarkb | I'm still getting the aws error | 22:21 |
corvus | clarkb: do we hit their fake server? | 22:21 |
mordred | nope. still error | 22:21 |
openstackgerrit | Merged zuul/zuul-jobs master: Actually include platform in the upload build https://review.opendev.org/726251 | 22:21 |
clarkb | corvus: yes, then if fake server succeeds we copy over a snakeoil cert | 22:21 |
openstackgerrit | Merged zuul/zuul-jobs master: Process siblings in upload-image push https://review.opendev.org/726253 | 22:21 |
corvus | clarkb: there's nothing failing for the tarball server: https://27fbadc9cf2dd544239f-7d0e556db3075d25d1b91bbdcc8a4562.ssl.cf1.rackcdn.com/722394/3/gate/system-config-run-static/cc59be9/bridge.openstack.org/ara-report/result/bc0f0019-15c4-438d-9a54-41198f4bcf85/ | 22:21 |
mordred | I found better url | 22:22 |
corvus | clarkb: i can't see anything wrong | 22:22 |
mordred | https://artifacts-openshift-release-3-11.svc.ci.openshift.org/zips/openshift-origin-client-tools-v3.11.0-d699176-406-linux-64bit.tar.gz is not a github url and is mentioned on the github release page | 22:22 |
mordred | (also it works) | 22:22 |
corvus | mordred: it's a google ip | 22:23 |
clarkb | this happens on kubernetes releases too | 22:23 |
mordred | should we update our download to download from there instead? | 22:23 |
clarkb | so I think its a complete failure of github's release system | 22:23 |
mordred | clarkb: but this isn't a new release issue | 22:23 |
mordred | this is an old release | 22:23 |
clarkb | mordred: ya I know | 22:23 |
clarkb | mordred: I'm saying other releases for other projects have the same issue | 22:23 |
corvus | mordred: google's usually pretty good about keeping their servers up | 22:23 |
mordred | nod | 22:23 |
mordred | corvus: yeah | 22:23 |
clarkb | it looks like they've misconfigured their aws file proxying | 22:23 |
mordred | clarkb: you wanna update your patch or I can I've got it in my cache | 22:23 |
clarkb | mordred: we can update the url in a separate change right? I think that will be cleaner for review and history | 22:24 |
mordred | yeah | 22:25 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Organize zuul jobs in zuul.d/ dir https://review.opendev.org/722394 | 22:25 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Add focal to system-config base job https://review.opendev.org/725676 | 22:25 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Use google hosted oc client tools location https://review.opendev.org/726256 | 22:25 |
mordred | clarkb, corvus : ^^ that's the file and checksum listed at https://artifacts-openshift-release-3-11.svc.ci.openshift.org/zips/ | 22:26 |
clarkb | mordred: hrm I wasn't expecting different shas there | 22:26 |
clarkb | looks like the filename-embedded sha and the blob sha are different? Is that like when we do a dev release, it's 3.11 + this delta? | 22:26 |
mordred | me either - but I'm guessing they built their own to upload to the google location and had github build the ones on github maybe? | 22:27 |
clarkb | heh I was going to download both and compare them | 22:28 |
corvus | mordred: where's the page where openshift promises this isn't a haxor trojan thing? | 22:28 |
clarkb | then I realized I can't download the old one :) | 22:28 |
ianw | clarkb: we test against staging ... if anything we should move to testing against some sort of dummy issuer | 22:29 |
mordred | eek no don't load that | 22:29 |
mordred | that's the build from the tip of the release-3.11 branch | 22:29 |
mordred | and it'll change every time they land a new commit | 22:29 |
clarkb | mordred: thats what I was thinking | 22:29 |
ianw | https://zuul.opendev.org/t/openstack/build/cc59be9dfac6482dbd543a9a9258f93c/log/static01.opendev.org/acme.sh/acme.sh.log doesn't seem to have anything obvious | 22:30 |
clarkb | fwiw I can download and extract it and it seems to have the contents we expect, but if that's dev ya I don't think we want that | 22:30 |
mordred | https://github.com/openshift/origin/commits/d699176b22a0836c4f2dcd327a685473249e9633 | 22:30 |
mordred | that's the sha in that filename | 22:30 |
* mordred unrestacks that | 22:30 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Organize zuul jobs in zuul.d/ dir https://review.opendev.org/722394 | 22:30 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Add focal to system-config base job https://review.opendev.org/725676 | 22:30 |
ianw | sorry, take that back; it failed with "35" -> https://zuul.opendev.org/t/openstack/build/cc59be9dfac6482dbd543a9a9258f93c/log/static01.opendev.org/acme.sh/acme.sh.log#461 | 22:31 |
mordred | I -2'd the update patch and then abandoned it | 22:31 |
clarkb | ianw: thanks for tracking that back | 22:31 |
mordred | CURLE_SSL_CONNECT_ERROR | 22:31 |
mordred | haha | 22:31 |
mordred | it's an ssl error | 22:31 |
corvus | mordred: nice :) le inception | 22:32 |
corvus | i think that part of the internet is telling us to go do something else for a while | 22:32 |
mordred | corvus: yeah | 22:32 |
corvus | oh, the nodepool/zuul jobs are done | 22:33 |
mordred | neat! | 22:33 |
clarkb | it certainly feels that way | 22:33 |
mordred | ianw: fwiw - I still don't have an arm image for you - there were a series of tiny derps in the publish job | 22:33 |
clarkb | it's not made it to the github status dashboard | 22:33 |
clarkb | though I expect if I mention it to jlk at this point it would just be an unnecessary interrupt | 22:34 |
mordred | ianw: hahaha. your patch to add build-essential for the arm host - is that because we pip install docker-compose? that's actually pretty funny | 22:34 |
ianw | mordred: ok ... https://c8abc17054e797f6cc7e-38b170b34202fbd22a3d39c7d4e00ec5.ssl.cf5.rackcdn.com/726037/4/check-arm64/system-config-run-nodepool-builder-arm64/886487d/bridge.openstack.org/ara-report/ got further than i thought | 22:35 |
openstackgerrit | James E. Blair proposed opendev/system-config master: DNM: fail zuul tests https://review.opendev.org/726248 | 22:35 |
corvus | failed to fail | 22:35 |
mordred | ianw: oh. hah. I hadn't thought about needing arm zk images for our tests | 22:35 |
ianw | mordred: yeah i think that was it, but it could be anything we install that has a wheel on x86 that doesn't on arm | 22:36 |
mordred | ianw: yah - I just checked because we're trying to install fewer things with global pip on these docker hosts | 22:36 |
ianw | it did pull a nodepool image, though | 22:36 |
mordred | but - docker-compose is the exception | 22:36 |
mordred | ianw: it did? | 22:36 |
clarkb | mordred: ianw in theory zk on arm should be easy because jvm at least | 22:36 |
corvus | mordred, clarkb: can we switch back to distro d-c on focal? | 22:36 |
ianw | yeah ... just looking now :) | 22:37 |
mordred | corvus: maybe | 22:37 |
mordred | so - uhm. how did the job pull a nodepool image? | 22:37 |
ianw | "Pulling nodepool-builder ... digest: sha256:82af2f94898157ea82...", | 22:37 |
ianw | not sure why that's cut off | 22:38 |
mordred | I do agree - it does very much look like it successfully pulled one | 22:38 |
* clarkb looks at ubuntu package versions | 22:38 | |
mordred | but - I haven't seen us upload one | 22:38 |
ianw | i'm presuming you have memorised the sha256's of all our container images | 22:38 |
mordred | corvus: is it possible that the original docker push based upload to dockerhub actually was working for multi-arch manifest? | 22:39 |
clarkb | corvus: yes I think docker-compose on focal is plenty new enough (its from november 2019) | 22:39 |
mordred | cool | 22:39 |
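Switching to the distro package on focal would then be roughly (a sketch; the 1.25.x figure matches the November 2019 timing mentioned above, but verify with apt-cache):

    apt-cache policy docker-compose     # focal ships 1.25.x
    apt-get install -y docker-compose   # no pip, no build-essential needed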
ianw | should we see it @ https://hub.docker.com/r/zuul/nodepool-builder/tags ? | 22:40 |
mordred | ianw: I think I'd expect to see more than one arch at https://hub.docker.com/layers/zuul/nodepool-builder/latest/images/sha256-f7d46486c717ed08d1b4ea90f77bf1a6e578cbfcb24f4bb353dd7345c88410a9?context=explore | 22:42 |
ianw | clarkb: i just saw those openshift client failures too, but can download the file here; is it just a recheck situation? | 22:44 |
clarkb | if I click on the kubernetes release file it works now | 22:44 |
corvus | mordred: no idea, i took your word that it wasn't working | 22:44 |
clarkb | ianw: ya I think if there isn't another source for that (maybe that obs kubic related packaging for libcontainers has a sibling for openshift?) then we just recheck once it works, and I think it's working for me now | 22:45 |
mordred | corvus: I haven't been able to see any evidence it was working | 22:45 |
ianw | mordred: am i missing how to see older versions? | 22:46 |
mordred | ianw: there isn't a way to see older versions | 22:46 |
corvus | mordred: easiest thing would be to fetch the manifest, yeah? | 22:46 |
mordred | corvus: yeah - and manifest inspect for me shows me only amd64 | 22:47 |
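The check being described here, for reference (a sketch; older Docker CLIs need experimental features enabled for `docker manifest`):

    # A multi-arch tag resolves to a manifest list, one entry per platform:
    docker manifest inspect zuul/nodepool-builder:latest | jq '.manifests[].platform'
    # A single-arch tag returns a plain manifest with no .manifests array,
    # which is what "only amd64" looks like here.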
corvus | also, i guess the arch should show up on https://hub.docker.com/r/zuul/nodepool-builder/tags ? | 22:47 |
mordred | yeah | 22:47 |
mordred | so - I'm baffled by the pull working in that nodepool job -- oh wait | 22:48 |
mordred | corvus: is there any chance that job was finding an arm image in our intermediate registry? | 22:48 |
mordred | those should be cross-tenant so almost certainly not, right? | 22:49 |
mordred | ianw: what change was that job log from? | 22:49 |
ianw | https://review.opendev.org/#/c/726037/ | 22:49 |
*** tkajinam has joined #opendev | 22:49 | |
ianw | but the registry job started at 6:19 | 22:50 |
ianw | https://zuul.opendev.org/t/openstack/build/7980170503ff45cfbb354eb5630e5c63 : 2020-05-07T06:19:16 | 22:50 |
mordred | yeah - there's nothing about that that should have been able to find a non-published internal image | 22:50 |
ianw | and it also *didn't* find a zk image | 22:51 |
ianw | i've got a recheck going, let's see if it replicates first | 22:53 |
mordred | ++ | 22:54 |
corvus | mordred, ianw: am i correct in understanding that your current question is "what is sha256:82af2f94898157ea82 and where did it come from?" | 22:54 |
mordred | corvus: well - my main question is "how did docker on arm find _anything_ to install for zuul/nodepool" - but I think that's a valid sub question | 22:55 |
ianw | corvus: yes, that is what the arm64 job appeared to pull | 22:55 |
corvus | mordred: it's hard to directly answer the real question because we changed dockerhub since then, right? | 22:55 |
corvus | (once ianw rechecks, we'll have a test where we can actually inspect dockerhub at the same time though) | 22:55 |
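Once the recheck runs, a mystery digest can be pinned down without pulling all the layers (a sketch; <digest> stands in for the full sha256, which is truncated in the log above):

    # skopeo fetches only the manifest:
    skopeo inspect docker://docker.io/zuul/nodepool-builder@sha256:<digest>

    # or pull exactly that digest and check what it claims to be:
    docker pull zuul/nodepool-builder@sha256:<digest>
    docker image inspect --format '{{.Os}}/{{.Architecture}}' \
      zuul/nodepool-builder@sha256:<digest>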
ianw | https://c8abc17054e797f6cc7e-38b170b34202fbd22a3d39c7d4e00ec5.ssl.cf5.rackcdn.com/726037/4/check-arm64/system-config-run-nodepool-builder-arm64/886487d/bridge.openstack.org/ara-report/result/8a4c85f5-536f-4ecd-98b0-b352b83dc6b7/ | 22:56 |
mordred | https://zuul.opendev.org/t/openstack/build/886487d6c14749ce8b7e7b29196fcec5/log/nb03.opendev.org/docker/nodepool-builder-compose_nodepool-builder_1.txt#1 | 22:56 |
mordred | it pulled amd64 | 22:56 |
mordred | and then failed | 22:56 |
corvus | ianw, mordred: can you give your links more context? | 22:57 |
corvus | mordred: that build link is different than the one ianw pointed to earlier | 22:57 |
ianw | corvus: https://review.opendev.org/#/c/726037/ is the job that is trying to run gate testing of arm64 nodepool builder | 22:57 |
mordred | I'm looking at https://zuul.opendev.org/t/openstack/build/886487d6c14749ce8b7e7b29196fcec5 - which is the build for that job | 22:57 |
corvus | ok, thx | 22:58 |
mordred | and that was the docker log for the nodepool container - which shows docker pulled an image for it but failed to exec the contents inside - because "awesome" | 22:58 |
ianw | so we see it failing to get a zk image due to the arch not matching, but downloading an x86 container for nodepool-builder and trying to run it | 22:58 |
ianw | zk failure to pull due to arch @ https://c8abc17054e797f6cc7e-38b170b34202fbd22a3d39c7d4e00ec5.ssl.cf5.rackcdn.com/726037/4/check-arm64/system-config-run-nodepool-builder-arm64/886487d/bridge.openstack.org/ara-report/result/d48ca6f2-75bf-4a17-8dff-3bf2659f270b/ | 22:58 |
corvus | ianw, mordred: this is the build that pushed sha256:82af... https://zuul.opendev.org/t/zuul/build/efe660457c1a4c34bb9fe46e3587dbe2 | 23:01 |
ianw | hopefully the currently running one holds the nodes | 23:02 |
ianw | ssh root@2604:1380:4111:3e56:f816:3eff:fe74:ee7d is the zk node ; ssh root@2604:1380:4111:3e56:f816:3eff:fe5f:c9cf is the nb (to be) host | 23:05 |
corvus | ianw, mordred: dockerhub is updated and the images are now multi-arch: https://hub.docker.com/r/zuul/nodepool-builder/tags | 23:05 |
ianw | heh, ok so this build is going to grab that anyway, hopefully | 23:06 |
* mordred needs to eod | 23:06 | |
mordred | corvus: woot! | 23:07 |
ianw | if things are looking good, i guess i'll just move forward on a nb03.opendev.org | 23:09 |
ianw | i wonder actually if we have enough quota for a duplicate builder | 23:15 |
openstackgerrit | Merged opendev/system-config master: Pull and prune docker images together https://review.opendev.org/726185 | 23:16 |
mordred | corvus: I'm trying to eod - but I notice that we've uploaded to latest there, not just to change_{foo}_latest | 23:17 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Don't upload to the tag with buildx - only to the change tag https://review.opendev.org/726261 | 23:18 |
mordred | corvus, ianw: ^^ we should land that soonish - only affects nodepool images - but still | 23:19 |
ianw | mordred: we're a few seconds away from seeing what the test nb03 tries to pull :) | 23:19 |
corvus | mordred: yes, important, thanks | 23:20 |
corvus | mordred: i just assumed the promote job failed to delete the change tag | 23:20 |
mordred | ianw: it should pull the right thing - although the k8s functional test in the nodepool job has not succeeded - so this is an accident :) | 23:21 |
ianw | "Labels": { | 23:21 |
ianw | "com.docker.compose.config-hash": "2549820a8f46c69784b2981e97d86858ce83ccbb73c5abd374c5c17abe038156", | 23:21 |
ianw | standard_init_linux.go:211: exec user process caused "exec format error" | 23:21 |
corvus | mordred: but yes, now i see that change hasn't even merged yet :( | 23:21 |
ianw | i would say it has not | 23:21 |
mordred | poo | 23:22 |
mordred | SO - I noticed something in the zk situation | 23:22 |
mordred | which is that it was looking for linux/arm64/v8 | 23:22 |
mordred | and we are so far only building linux/arm64 | 23:22 |
mordred | do we need to try building linux/arm64/v8 ? | 23:22 |
ianw | Architecture: aarch64 | 23:23 |
ianw | from docker info | 23:23 |
mordred | linux/amd64, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6 | 23:24 |
mordred | those are the archs we can build for | 23:24 |
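That platform list comes from the buildx builder itself; targeting arm64 alongside amd64 looks roughly like this (a sketch; the image name and build context are placeholders):

    # What the current builder can emulate or natively build:
    docker buildx ls

    # Build and push one multi-arch manifest covering both archs:
    docker buildx build --platform linux/amd64,linux/arm64 \
      -t docker.io/example/nodepool-builder:latest --push .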
ianw | but again, the zk one failed with the no arch match ... | 23:24 |
mordred | we're so close ... | 23:25 |
ianw | the only difference seems to be :latest in nodepool vs image: docker.io/library/zookeeper:3.5 | 23:27 |
ianw | a specific tag | 23:27 |
corvus | zk only has amd64 for that tag | 23:27 |
ianw | OH ... i thought it was our zk image ... that's an upstream image | 23:28 |
mordred | corvus: so, this: docker run --rm arm64v8/alpine uname -a | 23:28 |
mordred | on mttest-docker | 23:28 |
corvus | some old zk image tags have linux/arm/v6 arch | 23:28 |
mordred | corvus: shows that binfmt has been registered for the appropriate arch for our arm64/v8 host | 23:29 |
mordred | corvus: so I think we just need to learn how to configure the builder we're launching to work with it | 23:29 |
mordred | assuming that everything is pulling the right images | 23:29 |
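For completeness, the binfmt registration that makes that cross-arch `docker run` work is usually set up like this (a sketch; tonistiigi/binfmt is what current buildx docs point at, multiarch/qemu-user-static is the older equivalent):

    # Register qemu handlers for foreign architectures with the kernel:
    docker run --privileged --rm tonistiigi/binfmt --install all

    # Verify what got registered:
    ls /proc/sys/fs/binfmt_misc/

    # Smoke test, as above:
    docker run --rm arm64v8/alpine uname -a   # should print aarch64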
mordred | oh - actually - arm64v8/alpine is arm64 arch in its image manifest | 23:30 |
mordred | so maybe those two are compatible | 23:30 |
corvus | mordred: yes i was about to point that out | 23:30 |
corvus | i think the v8 thing is a red herring | 23:31 |
mordred | yeah | 23:31 |
corvus | (that came from the zk failure, but the zk failure doesn't have *any* kind of an arm image, so we shouldn't fixate on that) | 23:31 |
mordred | agree | 23:32 |
corvus | ianw: did your test run pull this image? https://hub.docker.com/layers/zuul/nodepool-builder/latest/images/sha256-606d630c83ada2f40a182f0ea88d6616f2005a1effb11f8e305aed07ee57ea85?context=explore | 23:32 |
ianw | no | 23:33 |
ianw | zuul/nodepool-builder latest d3999fec2fbc 41 minutes ago 634MB | 23:33 |
openstackgerrit | Merged zuul/zuul-jobs master: Don't upload to the tag with buildx - only to the change tag https://review.opendev.org/726261 | 23:34 |
ianw | that has in it "org.zuul-ci.change_url": "https://review.opendev.org/614074" | 23:34 |
ianw | when i inspect that, it has "Architecture": "arm64", | 23:35 |
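Pulling specific fields out without the full inspect dump (a sketch; the label key is the one quoted above):

    # Which change built this image, and for which arch?
    docker image inspect \
      --format '{{index .Config.Labels "org.zuul-ci.change_url"}} {{.Architecture}}' \
      zuul/nodepool-builder:latest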
corvus | ianw: that's weird; dockerhub says that :latest was uploaded 32 minutes ago | 23:35 |
mordred | corvus: also - one more thought - we do install dumb-init in python-base - perhaps with docker layers we're getting an x86 dumb-init | 23:35 |
corvus | mordred: interesting | 23:35 |
mordred | corvus: so maybe we actually DO need to build arm python-base | 23:35 |
ianw | http://paste.openstack.org/show/793297/ is the full inspect | 23:36 |
mordred | the python base image itself is multi-arch - and nothing in our docker build of nodepool interacts with dumb-init | 23:36 |
*** tosky has quit IRC | 23:36 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Build multi-arch python-base/python-builder https://review.opendev.org/726263 | 23:39 |
corvus | ianw: based on the labels and arch, that appears to be the right image | 23:39 |
*** mlavalle has quit IRC | 23:40 | |
mordred | corvus: so x86 python-base dumb-init is my best hypothesis right now | 23:40 |
corvus | ianw: maybe you can try some docker run in that image? | 23:41 |
corvus | ianw: basically, everything should work except dumb-init if mordred's hypothesis is correct | 23:41 |
ianw | corvus: yeah i am, but i always get the exec format error | 23:41 |
mordred | ianw: what are you running? | 23:41 |
ianw | # docker run --entrypoint "/usr/bin/python" zuul/nodepool-builder:latest | 23:41 |
ianw | standard_init_linux.go:211: exec user process caused "exec format error" | 23:41 |
mordred | ah - interesting | 23:41 |
ianw | is there some way to inspect the container from the outside; mount it as a volume or something? | 23:42 |
ianw | i could export it i guess? | 23:42 |
corvus | ianw: yeah, docker save? | 23:42 |
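Worth noting the distinction (a sketch): `docker save` dumps the image as layer tarballs, while `docker export` on a container gives a single flattened filesystem, which is easier to poke at with `file`:

    # Flattened rootfs of a container made from the image (no need to run it):
    docker create --name probe zuul/nodepool-builder:latest
    mkdir -p /tmp/probe
    docker export probe | tar -x -C /tmp/probe
    docker rm probe
    file /tmp/probe/bin/bash    # "ARM aarch64" expected on a correct image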
mordred | ok. I really do have to EOD- good luck | 23:43 |
ianw | file /usr/bin/python2.7 | 23:45 |
ianw | /usr/bin/python2.7: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV) | 23:45 |
ianw | ok, so i exported the image and untarred it | 23:47 |
ianw | # ./bin/bash | 23:47 |
ianw | -bash: ./bin/bash: cannot execute binary file: Exec format error | 23:47 |
ianw | http://paste.openstack.org/show/793298/ ... | 23:49 |
ianw | file format elf64-littleaarch64 (good) vs file format elf64-little (bad) | 23:51 |
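Those strings are binutils BFD target names: elf64-littleaarch64 is a recognized arm64 binary, while a bare elf64-little generally means objdump could not match a specific target, here the foreign x86-64 binaries. Checking a suspect file directly (a sketch; the dumb-init path is illustrative):

    objdump -f /tmp/probe/usr/bin/dumb-init | grep 'file format'
    file /tmp/probe/usr/bin/dumb-init   # file names the arch explicitly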