*** brinzhang has joined #opendev | 00:02 | |
fungi | sent just now | 00:14 |
---|---|---|
openstackgerrit | Merged zuul/zuul-jobs master: ensure-docker: ensure docker.socket is stopped https://review.opendev.org/c/zuul/zuul-jobs/+/787271 | 00:18 |
brinzhang | mordred: hi, I am a cyborg core, we would like to switch to using launchpad instead of storyboard, but the bugfix and/or feature commits cannot have a relation, and they cannot be counted in statistics by https://www.stackalytics.io/ | 00:23 |
brinzhang | mordred: could you help us, and have a look? I saw https://launchpad.net/openstack here created by yourself ^ | 00:24 |
fungi | brinzhang: maybe you want to revisit the conversation i had with xinranwang in #openstack-infra earlier: http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2021-04-20.log.html | 00:35 |
fungi | hopefully that answers all your questions | 00:35 |
brinzhang | fungi: ack, let me see, thanks | 00:37 |
*** elod has quit IRC | 00:40 | |
*** elod has joined #opendev | 00:42 | |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: dib-run-parts: stop leaving PROFILE_DIR behind https://review.opendev.org/c/openstack/diskimage-builder/+/787303 | 00:42 |
fungi | rackspace e-mailed us about an outage for the ethercalc server around 14:00 utc, if someone gets a chance to check in on it | 01:25 |
ianw | i'm just cleaning up nb03 and old images from the arm64 region rename, hopefully then we can upload raw images to OSU | 01:45 |
fungi | awesome | 01:48 |
ianw | hrm, i might have messed up a bit swapping the names | 01:57 |
ianw | there's a bunch of deleting images for "linaro-us" "osuosl" (as opposed to "linaro-us-regionone", "osuosl-regionone") but i don't think anything will pick them up to delete them, as there isn't a provider for that now | 01:57 |
ianw | two thoughts are to manually delete the ZK nodes, or put in a provider by that name with no images by hand to see if it picks them up | 01:59 |
openstackgerrit | Wenping Song proposed openstack/project-config master: Change cyborg project track to launchpad https://review.opendev.org/c/openstack/project-config/+/787306 | 02:00 |
ianw | adding a fake provider seems to be deleting the images | 02:03 |
*** artom has quit IRC | 02:06 | |
ianw | i don't know why, but nb03 doesn't seem to want to refresh the images for osu | 02:14 |
openstackgerrit | Wenping Song proposed openstack/project-config master: Change cyborg project track to launchpad https://review.opendev.org/c/openstack/project-config/+/787306 | 02:50 |
*** amoralej|off has quit IRC | 03:37 | |
*** ykarel|away has joined #opendev | 04:06 | |
*** ykarel_ has joined #opendev | 04:10 | |
*** ykarel|away has quit IRC | 04:12 | |
fungi | #status log Removed temporary block of 161.170.233.0/24 in iptables on gitea-lb01.opendev.org after discussion with operators of the systems therein | 04:39 |
openstackstatus | fungi: finished logging | 04:39 |
*** hamalq has quit IRC | 04:49 | |
*** vishalmanchanda has joined #opendev | 04:55 | |
*** ralonsoh has joined #opendev | 05:29 | |
*** ralonsoh has quit IRC | 05:33 | |
*** ralonsoh has joined #opendev | 05:36 | |
*** ralonsoh_ has joined #opendev | 05:55 | |
*** seongsoocho_ has joined #opendev | 05:55 | |
*** seongsoocho has quit IRC | 05:55 | |
*** seongsoocho_ is now known as seongsoocho | 05:55 | |
*** paladox has quit IRC | 05:55 | |
*** ykarel_ has quit IRC | 05:55 | |
*** ykarel__ has joined #opendev | 05:55 | |
*** ralonsoh has quit IRC | 05:57 | |
*** mlavalle has quit IRC | 05:57 | |
*** mlavalle has joined #opendev | 05:58 | |
*** mnaser has quit IRC | 05:59 | |
*** mnaser has joined #opendev | 06:00 | |
*** marios has joined #opendev | 06:03 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: nodepool-base: prefer ZK IPv6 addresses https://review.opendev.org/c/opendev/system-config/+/787313 | 06:11 |
*** sboyron has joined #opendev | 06:21 | |
*** ykarel_ has joined #opendev | 06:38 | |
*** ykarel__ has quit IRC | 06:40 | |
ianw | bullseye images should be making their way out | 06:43 |
ianw | arm64 is uploaded | 06:43 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: nodepool-base: prefer ZK IPv6 addresses https://review.opendev.org/c/opendev/system-config/+/787313 | 06:56 |
*** slaweq has joined #opendev | 07:02 | |
*** fressi has joined #opendev | 07:07 | |
*** avass has quit IRC | 07:09 | |
*** avass has joined #opendev | 07:10 | |
*** andrewbonney has joined #opendev | 07:11 | |
*** whoami-rajat has joined #opendev | 07:12 | |
*** ralonsoh_ is now known as ralonsoh | 07:21 | |
*** gothicserpent has quit IRC | 07:25 | |
*** gothicserpent has joined #opendev | 07:27 | |
*** amoralej has joined #opendev | 07:29 | |
*** rpittau|afk is now known as rpittau | 07:33 | |
*** eolivare has joined #opendev | 07:41 | |
*** tosky has joined #opendev | 07:46 | |
*** ysandeep|away is now known as ysandeep | 07:49 | |
*** brinzhang has quit IRC | 07:50 | |
*** ykarel_ has quit IRC | 07:52 | |
*** brinzhang has joined #opendev | 07:55 | |
*** jpena|off is now known as jpena | 07:56 | |
*** ysandeep is now known as ysandeep|lunch | 08:15 | |
*** ykarel_ has joined #opendev | 08:27 | |
kevinz | ianw: Good evening :-) | 08:46 |
kevinz | ianw: is ZK running on the node nb03.opendev.org? | 08:46 |
ianw | kevinz: nb03.opendev.org talks to the zookeeper cluster of zk*.openstack.org hosts for communication from zuul | 08:52 |
kevinz | ianw: OK, so the ZK connection loss happened today? It was working well before, right? | 08:53 |
ianw | kevinz: it's been a persistent problem. it drops the connection, but then connects again quickly | 08:53 |
hrw | ianw: cool (bullseye images)! images means nodes. nodes means wheel cache. so soon bullseye will be ready for CI jobs ;D | 09:10 |
*** ysandeep|lunch is now known as ysandeep | 09:23 | |
*** ykarel_ has quit IRC | 09:34 | |
*** lpetrut has joined #opendev | 09:39 | |
*** slaweq_ has joined #opendev | 10:14 | |
*** slaweq has quit IRC | 10:18 | |
*** slaweq_ is now known as slaweq | 10:18 | |
*** dtantsur|afk is now known as dtantsur | 10:37 | |
*** jpena is now known as jpena|lunch | 11:32 | |
*** sshnaidm has quit IRC | 12:00 | |
*** amoralej is now known as amoralej|lunch | 12:07 | |
*** sshnaidm has joined #opendev | 12:07 | |
hrw | hm. python3 crashes in centos-8-stream-aarch64 nodes ;( | 12:27 |
hrw | https://5a819d33a93f73917b61-d92e3f8cc209d4a3d1e66263399702fb.ssl.cf2.rackcdn.com/772479/31/check-arm64/kolla-build-centos8s-source-aarch64/6219eb7/job-output.txt | 12:27 |
*** jpena|lunch is now known as jpena | 12:32 | |
fungi | (core dumped) python3 | 13:04 |
fungi | we may have more details in the console json | 13:04 |
*** amoralej|lunch is now known as amoralej | 13:05 | |
fungi | nope, job-output.json doesn't have any additional output, it was probably swallowed by the shell invocation | 13:06 |
fungi | looks like the last success result was 2021-04-16 20:06:14 | 13:20 |
fungi | so this has potentially been failing for almost 5 days | 13:21 |
fungi | hrw: i set an autohold for that job on https://review.opendev.org/772479 and have added a check arm64 comment to get it rerun | 13:27 |
hrw | ok | 13:28 |
fungi | so once it fails we should be able to install the symbols and load the core in gdb, hopefully | 13:29 |
*** artom has joined #opendev | 13:38 | |
fungi | hrw: well, the bad news is it doesn't trivially crash, so we'll probably need to try to replicate what ansible is asking python to do | 14:05 |
fungi | likely there's some c extension getting involved | 14:06 |
hrw | ok. should we disable testing on centos for now? | 14:08 |
fungi | probably so until we get to the bottom of this | 14:09 |
hrw | ok. will prepare patch | 14:09 |
*** fressi has quit IRC | 14:09 | |
fungi | i may need some help from someone better versed in ansible's internals, but it looks like there are several /tmp/ansible_setup_payload_*/ansible_setup_payload.zip we could try unpacking and running to narrow down the cause of the crash | 14:12 |
hrw | sorry, I barely know how to use ansible so cannot help | 14:14 |
fungi | yeah, not a problem, i'm sure we've got folks hanging around in the shadows ;) | 14:14 |
hrw | https://review.opendev.org/c/openstack/kolla/+/787375 submitted to stop jobs | 14:15 |
fungi | if i had to place a wager, i'd say odds favor some non-stdlib ansible dependency has a c extension compiled for the wrong architecture... but at this stage i have no evidence to support that theory | 14:16 |
*** lpetrut has quit IRC | 14:29 | |
johnsom | clarkb Many moons ago I requested to delete the xenial "test" image on tarballs.o.o. I think you wanted to wait a week or such before deleting it. Can we delete that now? It hasn't been updated since 2019.... | 14:33 |
johnsom | https://tarballs.opendev.org/openstack/octavia/test-images/ | 14:33 |
fungi | not sure what the concern with deleting it back then was, but i can take care of it shortly | 14:35 |
johnsom | That would be great, thank you | 14:35 |
clarkb | johnsom: ah sorry | 14:39 |
clarkb | I think the concern was it had just been announced? something like that | 14:39 |
johnsom | No worries, just following up | 14:39 |
hrw | fungi: ok. centos stream 8 aarch64 job disabled | 15:22 |
fungi | thanks, once i'm off my current conference call i hope to get back to trying to come up with a python3 invocation to directly reproduce the crash | 15:30 |
*** eolivare has quit IRC | 15:59 | |
*** tkajinam has quit IRC | 16:07 | |
*** tkajinam has joined #opendev | 16:07 | |
zigo | fungi: Just checking (no pressure): do we have bullseye now? :) | 16:19 |
*** chandankumar is now known as raukadah | 16:22 | |
*** hamalq has joined #opendev | 16:22 | |
*** hamalq has quit IRC | 16:23 | |
*** hamalq has joined #opendev | 16:24 | |
fungi | zigo: i believe so? | 16:37 |
fungi | i've been fairly busy today but i saw ianw mention while i was asleep that we had nodes booting | 16:38 |
zigo | Ah cool ! :) | 16:42 |
fungi | zigo: a quick check of nodepool says we have bullseye images uploaded to our arm64 providers but not amd64 | 16:44 |
zigo | Oh ... :/ | 16:45 |
fungi | though we did build an amd64 image | 16:45 |
zigo | So, that's for tomorrow? | 16:45 |
fungi | aha, it was only built an hour ago so may still be uploading | 16:45 |
fungi | er, no, it started building an hour aho | 16:45 |
fungi | ago | 16:45 |
fungi | no, wait, that's a minute ago | 16:46 |
fungi | debian-bullseye-arm64 completed building almost 12 hours ago so maybe there's something wrong with the amd64 image build still | 16:46 |
*** amoralej is now known as amoralej|off | 16:51 | |
zigo | :/ | 16:51 |
fungi | yeah, insta-failing: https://nb02.opendev.org/debian-bullseye-0000001505.log | 16:52 |
fungi | Err:13 https://mirror.dfw.rax.opendev.org/debian-security bullseye-security/updates/main amd64 Packages | 16:53 |
fungi | 404 Not Found [IP: 2001:4800:7819:105:be76:4eff:fe04:9b8a 443] | 16:53 |
fungi | the /updates is the problem | 16:53 |
fungi | it's just https://mirror.dfw.rax.opendev.org/debian-security/dists/bullseye-security/main/ | 16:54 |
fungi | so we've got something wrong in our mirror entry for the amd64 image builds but not arm64 | 16:56 |
*** ysandeep is now known as ysandeep|away | 16:57 | |
*** marios is now known as marios|out | 16:57 | |
*** jpena is now known as jpena|off | 17:03 | |
*** ralonsoh has quit IRC | 17:09 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-jobs master: ensure-docker: prevent issue on centos-7 where the socket does not exists https://review.opendev.org/c/zuul/zuul-jobs/+/787421 | 17:12 |
*** marios|out has quit IRC | 17:21 | |
*** rpittau is now known as rpittau|afk | 17:24 | |
*** dtantsur is now known as dtantsur|afk | 17:40 | |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Add inmotion cloud to cloud launcher https://review.opendev.org/c/opendev/system-config/+/787425 | 17:40 |
*** ocsabat has joined #opendev | 17:41 | |
*** ocsabat has quit IRC | 17:53 | |
openstackgerrit | Clark Boylan proposed openstack/project-config master: Add InMotion cloud to nodepool https://review.opendev.org/c/openstack/project-config/+/787428 | 17:56 |
*** andrewbonney has quit IRC | 18:01 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-jobs master: ensure-docker: do not manage the socket on distro centos https://review.opendev.org/c/zuul/zuul-jobs/+/787429 | 18:01 |
*** elod has quit IRC | 18:04 | |
*** elod has joined #opendev | 18:06 | |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Add inmotion cloud to cloud launcher https://review.opendev.org/c/opendev/system-config/+/787425 | 18:18 |
openstackgerrit | Merged zuul/zuul-jobs master: ensure-docker: prevent issue on centos-7 where the socket does not exists https://review.opendev.org/c/zuul/zuul-jobs/+/787421 | 18:19 |
*** vishalmanchanda has quit IRC | 18:54 | |
*** whoami-rajat has quit IRC | 18:55 | |
*** lpetrut has joined #opendev | 19:32 | |
openstackgerrit | Merged opendev/system-config master: Add inmotion cloud to cloud launcher https://review.opendev.org/c/opendev/system-config/+/787425 | 19:32 |
*** lpetrut has quit IRC | 19:34 | |
*** hamalq has quit IRC | 19:44 | |
*** hamalq has joined #opendev | 19:44 | |
corvus | infra-root: i'd like to restart zuul with the latest batch of zk changes (secrets in zk); any objections or points to consider? | 19:58 |
fungi | corvus: we've got a deploy item in progress we'd like to see finish soonish (deploying credentials for the inmotion hosting cloud) | 20:02 |
fungi | other than that no concerns on my end | 20:03 |
fungi | item 787425 in the deploy pipeline | 20:04 |
corvus | cool, i'll give that a little bit | 20:05 |
clarkb | corvus: airship did finally do their release aiui so should be good on that front (though I've been busy with this cloud stuff today and haven't gotten to zk cluster upgrades yet) | 20:07 |
fungi | #status log Deleted /afs/.openstack.org/project/tarballs.opendev.org/openstack/octavia/test-images/test-only-amphora-x64-haproxy-ubuntu-xenial.qcow2 as requested by johnsom | 20:09 |
openstackstatus | fungi: finished logging | 20:09 |
corvus | clarkb, fungi: fyi the base and letsencrypt playbooks failed; manage-projects is running now. | 20:18 |
fungi | thanks, i'll take a look. we're mostly concerned with the nodepool and cloud-launcher jobs i think, depending on where base failed at least | 20:20 |
corvus | fungi: buildset complete: https://zuul.opendev.org/t/openstack/buildset/7e90244873494a059d3f2fe3b6aac7f9 | 20:24 |
fungi | oh poo, all the other prod jobs were skipped, i guess because they're set to depend on infra-prod-base | 20:24 |
fungi | yep | 20:24 |
corvus | fungi: actually i think they skipped due to letsencrypt | 20:26 |
fungi | oh, huh... | 20:26 |
fungi | fatal: [mirror01.regionone.limestone.opendev.org]: UNREACHABLE! | 20:28 |
fungi | yeah, that's the underlying reason for both builds failing | 20:29 |
fungi | i also can't ssh into it | 20:30 |
clarkb | I think we can probably make cloud launcher run pretty independent of everything else | 20:30 |
clarkb | but figuring out limestone first seems like a good idea | 20:30 |
fungi | i can get to the api for limestone, server list shows the server is in SHUTOFF state | 20:34 |
fungi | i've asked it to start, will check its logs if it lets me ssh in now | 20:34 |
johnsom | fungi Thank you! | 20:35 |
fungi | something seems to have asked that mirror to shutdown at 15:27:47 utc today | 20:38 |
fungi | no indication in syslog of what that was | 20:39 |
fungi | logan-: ^ just a heads up, if you're around, seems like there may have been some trauma in that cloud | 20:39 |
fungi | may also be ongoing, but it was at least willing to let me start the mirror server back up again | 20:40 |
fungi | #status log Started mirror01.regionone.limestone.opendev.org, which seems to have spontaneously shutdown at 15:27:47 UTC today | 20:41 |
openstackstatus | fungi: finished logging | 20:41 |
*** slaweq has quit IRC | 20:46 | |
*** sboyron has quit IRC | 20:49 | |
clarkb | corvus: if we can let the next hourly run run through the cloud launcher that should finish about 21:22 ish that would be great | 20:52 |
fungi | corvus: clarkb: seems like it's safe to go ahead with the zuul restart? i can always manually reenqueue the failed deploy after zuul's back up | 20:52 |
clarkb | fungi: if we do that we may need to enqueue an hourly run? | 20:52 |
fungi | heh, messages crossed on the wire | 20:52 |
fungi | clarkb: well, if zuul's restarted right now, the currently incomplete hourly run will be reenqueued | 20:52 |
clarkb | ah yup | 20:54 |
clarkb | either way I guess then | 20:54 |
fungi | it ends up showing a 0 ref instead of refs/heads/master i think, but we observed the jobs still end up using the correct branch | 20:55 |
fungi | or maybe that's fixed more recently | 20:55 |
corvus | ... :) | 20:56 |
corvus | yeah, i think there's no harm to the inmotion project by restarting zuul now | 20:57 |
fungi | i agree. clarkb? | 20:57 |
corvus | (it may even speed it up -- all assuming the zuul deploy doesn't explode) | 20:58 |
clarkb | ya unless the job ends before we grab the queues | 20:58 |
clarkb | but we can manually add in an hourly run at that point if necessary | 20:58 |
corvus | i have run zuul_pull | 21:00 |
corvus | hourly just finished | 21:00 |
corvus | i will save queues now and restart | 21:00 |
corvus | zuul is currently iterating through all the projects loading keys into zk from the filesystem | 21:02 |
fungi | awesome | 21:02 |
corvus | it's probably doing a little more than 5 projects/sec... we might want to think about making that more efficient... | 21:03 |
corvus | also... | 21:03 |
corvus | i think i'd like to restart the scheduler again immediately after it finishes, just to observe a second startup with keys already in zk | 21:03 |
fungi | seems like a reasonable precaution | 21:04 |
fungi | the earlier we catch a problem there the better | 21:04 |
corvus | (after it finishes loading keys into zk -- we don't need to wait for the cat jobs) | 21:04 |
clarkb | ~5 minutes for loading keys I guess? maybe a little less since projects in zuul is < projects in gerrit? | 21:06 |
corvus | 21:06:15,080 - 21:07:42,111 the second time around | 21:08 |
clarkb | status loads now | 21:11 |
corvus | re-enqueing | 21:11 |
corvus | #status log restarted zuul on commit 620d7291b9e7c24bb97633270492abaa74f5a72b | 21:16 |
openstackstatus | corvus: finished logging | 21:16 |
clarkb | looks like we didn't end up with an hourly job. How would I go about enqueuing that? | 21:17 |
corvus | clarkb: i'll do it in a sec | 21:17 |
corvus | a job uploaded logs: https://4b2e22934a7c49e91cc6-5f33f4a8f6999785c5e66684a945b77a.ssl.cf1.rackcdn.com/783969/37/check/openstack-tox-pep8/2c04510/ | 21:18 |
clarkb | cool thanks, if you can share the command after that would be great :) | 21:18 |
corvus | i think that means secrets still work :) | 21:18 |
corvus | clarkb: i'll just let the current re-enqueue finish | 21:18 |
corvus | clarkb: i snagged a queue dump while the hourly was still running as well, so hopefully it already has the command in it :) | 21:18 |
clarkb | oh got it | 21:19 |
corvus | zuul enqueue-ref --tenant openstack --pipeline opendev-prod-hourly --project opendev.org/opendev/system-config --ref refs/heads/master | 21:19 |
corvus | clarkb: that's what the script came up with and looks reasonable to me, so i'll run that | 21:19 |
clarkb | cool it is enqueue-ref which was one of my big questions | 21:19 |
clarkb | then we just tell it to use master | 21:19 |
corvus | clarkb: enqueued | 21:20 |
clarkb | thanks! | 21:20 |
ianw | fungi: if i read correctly two things that need action are bullseye amd64 builds and centos8 ansible arm64 is somehow crashing? | 21:25 |
fungi | ianw: yeah, sorry, it's been a crazy day. i haven't made much progress on those. i held an arm64 centos-8-stream node to try and work out why ansible gathering facts leads to a coredump | 21:26 |
clarkb | didn't ansible have similar issues with a systemd update once | 21:26 |
fungi | the bullseye amd64 problem is probably a quick fix... just need to correct the security mirror suite name | 21:26 |
clarkb | something about how ansible gloms onto systemd on centos | 21:26 |
fungi | i'll see if i can figure out where/why bullseye-security is misconfigured | 21:28 |
ianw | systemd is involved, well i never! | 21:28 |
* ianw clutches pearls | 21:28 | |
clarkb | ianw: well I don't know that it is just that I seem to recall similar ansible problems that did involve it once | 21:28 |
clarkb | :) | 21:28 |
ianw | clarkb: if you have a quick sec https://review.opendev.org/c/openstack/diskimage-builder/+/787303 should stop leaking all those profile dirs i hope | 21:29 |
clarkb | oh neat, I do have time for a review while I wait for the cloud launcher to run | 21:30 |
clarkb | ianw: does bash trap put things in a stack? | 21:31 |
clarkb | (just wondering if we need to worry about other exit traps being overwritten) | 21:31 |
fungi | that's one of those things i would whip up a local test to determine | 21:32 |
clarkb | looks like it overrides according to trap | 21:32 |
clarkb | er man trap | 21:32 |
fungi | good to know | 21:32 |
ianw | it should be running the actual scripts in a subshell though | 21:33 |
clarkb | yup and man trap also says they are reset per subshell | 21:33 |
clarkb | just making sure this won't cause issues, it shouldn't as long as no other trap EXIT exists in this particular script | 21:33 |
clarkb | and there isn't another | 21:34 |
clarkb | \o/ bridge is now able to talk to the inmotion cloud (I can list images for example) | 21:36 |
corvus | i had to scroll past 4 pages of zuul status before i saw a failing job; i thought something was wrong for a minute :) | 21:36 |
clarkb | now just need cloud launcher to configure the things it configures and next step can do a mirror launch | 21:36 |
ianw | corvus: if you have a sec, could you look at https://review.opendev.org/c/opendev/system-config/+/787313 to switch zk for nodepool to ipv6 | 21:37 |
ianw | primarily, i'd like to see if this helps nb03 stay in zookeeper. it will give us a data point that it really is ipv4 that is getting us cut off (as opposed to more general network issues) | 21:37 |
ianw | but generally, i think we're ok to use ipv6 for it | 21:38 |
ianw | fungi: alright, i'm seeing the bullseye build failure. i don't think i'm quite understanding how things are laid out incorrectly, and why it hasn't failed in the gate | 21:40 |
ianw | https://mirror.dfw.rax.opendev.org/debian-security/dists/ ... should this just be "bullseye", not "bullseye-security"? | 21:41 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: debian-security: fix bullseye codename https://review.opendev.org/c/opendev/system-config/+/787447 | 21:44 |
clarkb | I want to say bullseye changed how security was handled compared to buster | 21:45 |
clarkb | I would definitely have fungi ack that one since I don't grok all the debianness there | 21:45 |
mordred | yeah - I agree - I looked at repo paths a few weeks ago and it's definitely laid out a little different for bullseye | 21:45 |
mordred | the sources.list in a bullseye container is different too | 21:46 |
ianw | http://security.debian.org/debian-security/dists/ | 21:46 |
ianw | hrm, indeed | 21:46 |
fungi | sorry, trying to juggle too many discussions at once. the "/updates" in the security suite is the problem | 21:47 |
fungi | for some reason the amd64 image tries to get bullseye-security/updates/main instead of bullseye-security/main | 21:48 |
fungi | (that would have been correct for buster, but is not correct for bullseye) | 21:48 |
ianw | i note that http://security.debian.org/debian-security/dists/bullseye-security/updates/ is a recursive link to itself apparently | 21:49 |
ianw | http://security.debian.org/debian-security/dists/bullseye-security/updates/updates/updates/updates/updates/updates/updates/ ... i wonder how far it can go | 21:49 |
clarkb | cloud launcher failed. I'm looking at it. I think it managed to update security groups though? | 21:49 |
ianw | clarkb: note that it will fail on osu using the default sdk | 21:51 |
ianw | until https://review.opendev.org/c/openstack/openstacksdk/+/786148 is incorporated | 21:52 |
clarkb | BadRequestException: 400: Client Error 400: Client Error for url: https://173.231.255.228:9696/v2.0/security-group-rules, Unrecognized attribute(s) 'remote_address_group_id' | 21:52 |
clarkb | in this case the failure was in the new cloud, but good to know | 21:52 |
clarkb | it's the same issue | 21:53 |
clarkb | ianw: did you run cloud launcher by hand with a different sdk version somehow? | 21:53 |
ianw | clarkb: yep | 21:53 |
ianw | sudo ./venv/bin/ansible-playbook -e ansible_python_interpreter=/home/ianw/system-config/venv/bin/python3 -v ./playbooks/run_cloud_launcher.yaml | 21:54 |
ianw | that would probably work for you too | 21:54 |
clarkb | thanks I'll try it | 21:54 |
ianw | ok, so the gate tests don't use our mirror -> https://zuul.opendev.org/t/openstack/build/933e4b208ea44bfbb97ea08e7bd66c96/log/nodepool/builds/test-image-0000000001.log#227 | 21:54 |
clarkb | ianw: is that ./venv/bin/ansible-playbook that same as /home/ianw/system-config/venv ? | 21:54 |
ianw | umm yep, but i think you could probably just use ansible-playbook | 21:55 |
ianw | (from path) | 21:55 |
ianw | that venv just has an old openstacksdk installed | 21:56 |
clarkb | ok I'll try sudo ansible-playbook -e ansible_python_interpreter=/home/ianw/system-config/venv/bin/python3 -v path/to/normal/system-config-check/playbooks/run_cloud_launcher.yaml | 21:56 |
ianw | ++ | 21:57 |
fungi | so to refresh my memory there, openstacksdk released a behavior change which is broken on anything except wallaby neutron? | 21:58 |
clarkb | the cloud we're trying to get going now is victoria so certainly seems like it must be a very recent cloud? | 21:59 |
ianw | i think that is an accurate summary; though i would stand to be corrected if someone dug through the various points the api bits got released | 22:00 |
clarkb | we don't notice it against the other clouds because we don't try to change things if they are already up to date | 22:00 |
clarkb | so we notice on the new clouds as we enroll them | 22:00 |
ianw | fungi: sorry, i know you're doing other things | 22:02 |
ianw | i'm seeing upstream job working with | 22:02 |
ianw | http://security.debian.org/ bullseye-security/updates main | 22:02 |
ianw | and our job failing with | 22:02 |
ianw | https://mirror.dfw.rax.opendev.org/debian-security bullseye-security/updates main | 22:02 |
ianw | the difference being, we don't have an "updates" in our mirror, while upstream does | 22:02 |
ianw | (http://security.debian.org/debian-security/dists/bullseye-security/updates/) | 22:03 |
ianw | but upstream's "updates" appears to be a recursive link to itself | 22:03 |
ianw | which is about the state my brain is now in :) | 22:03 |
fungi | ianw: yeah, i think we owe that to reprepro... pretty sure that updates recursive symlink is a temporary workaround debian added on their mirrors to make upgrades less painful | 22:09 |
clarkb | ianw: fungi: there was an issue in the cloud launcher config that I set up where it improperly applied config to osuosl that I wanted against inmotion | 22:14 |
clarkb | I'm working on a fix, but I think it may have created a network and subnet (but not router) on osuosl's zuul project :/ I'm not sure if this has an impact on our ability to boot functioning instances there yet | 22:14 |
clarkb | the keypairs and security groups should be identical so those don't matter | 22:15 |
ianw | clarkb: probably unlikely as we hard-code the network to use, as they have two, and use fixed IPs on them | 22:15 |
clarkb | ianw: ok good | 22:15 |
clarkb | I'll get this fixed up version pushed then look at cleaning up the mess | 22:15 |
ianw | it might be easier to delete via horizon | 22:16 |
clarkb | ya probably since order matters | 22:17 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Set the correct cloud for opendevzuul-inmotion enrollment https://review.opendev.org/c/opendev/system-config/+/787452 | 22:17 |
clarkb | that should fix the problem | 22:17 |
fungi | ianw: see evidence of a somewhat mass bug filing trying to get stuff to stop relying on the symlink... https://duckduckgo.com/?q=site:bugs.debian.org+"bullseye+updates+security" | 22:18 |
clarkb | I won't do the cleanup yet as it will just get recreated until https://review.opendev.org/c/opendev/system-config/+/787452 merges. Since we are explicit about networks I don't expect this is causing functional issues, it's just messy | 22:20 |
ianw | fungi: but for the immediate issue; our missing symlink is the problem? i'm not sure what creates that on our mirror | 22:20 |
fungi | well, nothing creates it on our mirror, it's not on our mirror | 22:21 |
fungi | i think our sources.list needs to be fixed | 22:21 |
fungi | in the images | 22:21 |
ianw | https://mirror.dfw.rax.opendev.org/debian-security/dists/buster/updates/ is there? | 22:21 |
fungi | yes, but not bullseye. they redid how the security repository is organized starting in bullseye | 22:22 |
clarkb | oh wait my patch is still wrong one moment | 22:22 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Set the correct cloud for opendevzuul-inmotion enrollment https://review.opendev.org/c/opendev/system-config/+/787452 | 22:23 |
ianw | fungi: right. *our* mirror doesn't have the updates symlink. but debian-security does -> http://security.debian.org/debian-security/dists/bullseye-security/updates/ | 22:23 |
clarkb | I'm going to manually run cloud launcher again but against 787452's state to get the new cloud set up | 22:25 |
fungi | ianw: yep, and that's apparently a temporary hack to fix mostly regressions in debian's package testing infrastructure and ease upgrades from buster. deployed bullseye systems shouldn't follow that symlink | 22:26 |
fungi | so either we add a similar hack symlink to our reprepro mirror, and then fix it properly for the subsequent debian release, or we solve it now | 22:27 |
fungi | ianw: what i haven't figured out yet is where we write out the sources.list file when creating those images | 22:29 |
ianw | diskimage_builder/elements/debian-minimal/environment.d/10-debian-minimal.bash is what builds it | 22:30 |
ianw | diskimage_builder/elements/debian-minimal/root.d/75-debian-minimal-baseinstall is what ends up writing it | 22:31 |
corvus | ianw: 787313 lgtm | 22:31 |
fungi | aha, now i see why my naive codesearch patterns weren't turning up any hits | 22:31 |
ianw | corvus: thanks; OSU also just added an ipv6 address and AAAA records for their API too | 22:33 |
ianw | like WOPR i think we've figured out the best way to diagnose ipv4 nat issues is just not to play the game :) | 22:34 |
fungi | ianw: so i think just moving the DIB_DEBIAN_SECURITY_SUBPATH assignment into the bullseye conditional block will probably address this | 22:34 |
fungi | i'll gibe that a shot | 22:34 |
fungi | give | 22:34 |
ianw | fungi: perhaps we should just hand-edit on nb01 and see, because the gate job isn't setting us up to use our security mirror | 22:35 |
fungi | yeah i can try that too | 22:35 |
ianw | (we can fix that too) | 22:35 |
fungi | but i'll still push a change first just to record what i'm trying | 22:36 |
openstackgerrit | Jeremy Stanley proposed openstack/diskimage-builder master: debian-minimal: bullseye: /updates -> -security https://review.opendev.org/c/openstack/diskimage-builder/+/787454 | 22:40 |
fungi | ianw: ^ so that's what i want to try | 22:40 |
fungi | zigo: ^ next time you're around, in case you want to provide more context | 22:42 |
zigo | fungi: I'm here ! | 22:42 |
* zigo reads | 22:42 | |
fungi | zigo: i'm fairly sure that's why the current images are failing to build | 22:42 |
clarkb | it seems running the cloud launcher against my checkout is insufficient | 22:42 |
clarkb | (I think var lookup paths may not be as relative as I want in this case) | 22:42 |
ianw | fungi: ok, just have to do school run, bib | 22:43 |
zigo | fungi: Because of bullseye/updates -> bullseye-security thingy ? | 22:43 |
clarkb | I think that means I need https://review.opendev.org/c/opendev/system-config/+/787452 in | 22:43 |
ianw | clarkb: yeah, ansible.conf is probably making it look in global paths | 22:43 |
clarkb | ianw: ya | 22:43 |
clarkb | Zuul has +1'd https://review.opendev.org/c/opendev/system-config/+/787452 if I can get reviews :) | 22:43 |
fungi | zigo: yep, official debian mirrors have added an updates symlink in bullseye-security as a temporary workaround, but reprepro doesn't know to create that (and our sources.list files shouldn't depend on it anyway) | 22:44 |
zigo | fungi: This thing is a major pain in many components, but it's very helpful for Debian users, so they no longer confuse stable/updates with stable-updates ... | 22:45 |
zigo | fungi: My own mirror does *NOT* have a bullseye/updates folder ... | 22:45 |
fungi | absolutely, i followed the discussions on the ml when reorganizing the security layout was proposed, it all makes sense. just means we need to do a bit more special-casing (we already did in fact, but it was incomplete) | 22:45 |
clarkb | the opendevci side seems to be fine so I will proceed with launching the mirror | 22:45 |
zigo | So that symlink is not an official thing at all. | 22:46 |
fungi | right, if memory serves it was added to work around some autopkgtests and to ease in-place upgrades from buster | 22:46 |
fungi | there was an mbf for updating packages which had hard-coded the old pattern | 22:47 |
fungi | bullseye d-i will properly write out sources.list without the /updates though, and dib should be made to do the same | 22:48 |
openstackgerrit | Merged opendev/system-config master: Set the correct cloud for opendevzuul-inmotion enrollment https://review.opendev.org/c/opendev/system-config/+/787452 | 22:56 |
clarkb | ssh failed to the mirror I tried launching. I'm going to try booting something by hand now | 22:56 |
clarkb | I thought horizon might make this easy. I was wrong | 22:57 |
zigo | fungi: Could you point at the DIB code so that I can try to help? | 23:06 |
zigo | Which part contains the brokenness ? | 23:06 |
ianw | zigo: fungi already proposed the fix :) https://review.opendev.org/c/openstack/diskimage-builder/+/787454 | 23:07 |
zigo | Brilliant ! :) | 23:07 |
ianw | i can try putting this on nb01 and see if it picks it up | 23:07 |
zigo | Though: "if [ "${DIB_RELEASE}" = "bullseye" ]; then" is probably not the right way to go... | 23:07 |
zigo | I would have gone the other way around. | 23:07 |
zigo | if [ "${DIB_RELEASE}" = "buster" ] || [ "${DIB_RELEASE}" = "stretch" ]; then | 23:08 |
zigo | Because now, we have a problem with Bookworms ... :) | 23:08 |
fungi | zigo: yep, and if we decide to do a sid image or something | 23:09 |
fungi | i was trying not to disrupt the current logic there, but i agree it needs future-proofing | 23:09 |
clarkb | the issue with launching the new mirror seems to be that focal has decided that you cannot ssh in as root and must use ubuntu now? | 23:17 |
clarkb | That said when I manually booted an instance root seemed to work, so I wonder if this is impacted by cloud-init somehow | 23:17 |
ianw | clarkb: you know i think i hit that launching the osu mirror, hand-edited launch.py and thought "that's weird" and forgot about it | 23:22 |
ianw | fungi: ok, bullseye is building further along now on nb02 with your change applied | 23:22 |
clarkb | ianw: I think I figured it out. It's managed by the ssh key data. The reason my manual boot worked is that our ssh key for infra root is a number of keys, and the way they break you is by prefixing the expected single key with a command | 23:23 |
clarkb | ianw: I think if we modify launch-node.py to put the pubkey in twice it would work | 23:23 |
clarkb | which is probably my favorite hack of the last little while | 23:23 |
clarkb | I'm going to try this | 23:23 |
clarkb | but I need to clean up my instance first | 23:23 |
clarkb | oh we may actually already handle this properly | 23:26 |
clarkb | just need to remove root from the front of the attempted list | 23:27 |
clarkb | stuff seems to be moving now that i did ^ | 23:28 |
clarkb | I think we need to properly catch the error there and check the next one in the list though | 23:29 |
clarkb | that is the proper fix | 23:29 |
*** hamalq has quit IRC | 23:34 | |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Add iad3.inmotion mirror node https://review.opendev.org/c/opendev/system-config/+/787456 | 23:37 |
clarkb | I need to get dns address before ^ lands | 23:39 |
clarkb | working on dns now | 23:39 |
*** tosky has quit IRC | 23:42 | |
openstackgerrit | Clark Boylan proposed opendev/zone-opendev.org master: Add inmotion mirror to DNS https://review.opendev.org/c/opendev/zone-opendev.org/+/787459 | 23:43 |
clarkb | ok I think DNS first then inventory then we should be good | 23:44 |
fungi | thanks! reviewing | 23:48 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Handle focal's insistence we don't use root in launch-node.py https://review.opendev.org/c/opendev/system-config/+/787461 | 23:53 |
clarkb | this last change isn't tested, but I think that is the correct fix to launch node for our root vs ubuntu issues | 23:53 |
clarkb | oh the ns update will be stuck behind the cloud launcher fix too so this might not move too quickly | 23:53 |
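
The security-suite naming discussed above (buster and stretch use `<release>/updates`, bullseye onward uses `<release>-security`) and zigo's suggestion to special-case the old releases rather than the new one can be sketched roughly as follows. This is only an illustration in Python, not the actual diskimage-builder code, which is shell. The names `LEGACY_SECURITY_RELEASES`, `security_suite` and `security_sources_line` are hypothetical, and the default mirror URL is an assumption standing in for whatever mirror an image build would really use.

```python
# Hypothetical sketch: pick the Debian security suite name for a release.
# Only the releases zigo named are treated as legacy; anything else is
# assumed to follow the newer "<release>-security" layout.
LEGACY_SECURITY_RELEASES = {"stretch", "buster"}


def security_suite(release: str) -> str:
    """Return the security suite for a release codename."""
    if release in LEGACY_SECURITY_RELEASES:
        # Old layout, e.g. "buster/updates".
        return f"{release}/updates"
    # New layout, e.g. "bullseye-security" or "bookworm-security".
    return f"{release}-security"


def security_sources_line(
    release: str,
    mirror: str = "http://security.debian.org/debian-security",  # assumed default
    components: tuple = ("main",),
) -> str:
    """Build one sources.list line for the security archive."""
    return f"deb {mirror} {security_suite(release)} {' '.join(components)}"


if __name__ == "__main__":
    for codename in ("buster", "bullseye", "bookworm"):
        print(security_sources_line(codename))
```

Listing the old releases explicitly means a future codename such as bookworm gets the new layout without another conditional, which is the future-proofing zigo and fungi agreed was needed.
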
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!
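
clarkb's description of the proper launch-node.py fix (catch the login error and check the next user in the list, instead of giving up when root is refused) might look something like the sketch below. It is not the code from change 787461. `LoginRefused`, `first_working_user` and the `connect` callable are hypothetical names used only for illustration, and the real script's SSH handling is not shown.

```python
from typing import Callable, Iterable


class LoginRefused(Exception):
    """Hypothetical error raised when a login user is rejected."""


def first_working_user(users: Iterable[str], connect: Callable[[str], None]) -> str:
    """Try each candidate user in order and return the first that connects.

    `connect` is an assumed callable that raises LoginRefused on failure.
    """
    candidates = list(users)
    last_error = None
    for user in candidates:
        try:
            connect(user)
            return user
        except LoginRefused as exc:
            # Remember the failure and fall through to the next candidate.
            last_error = exc
    raise LoginRefused(f"no usable login among {candidates}") from last_error


if __name__ == "__main__":
    def fake_connect(user: str) -> None:
        # Stand-in for an image that rejects root but accepts ubuntu.
        if user == "root":
            raise LoginRefused("root login disabled")

    print(first_working_user(["root", "ubuntu"], fake_connect))
```

With this shape the order of the candidate list no longer matters for correctness, so root would not need to be removed from the front of it by hand.
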