openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: [wip] ensure-tox use venv https://review.opendev.org/725737 | 00:19 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: [wip] ensure-tox use venv https://review.opendev.org/725737 | 00:26 |
*** larainema has joined #opendev | 00:42 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: ensure-pip : fix Xenial exported virtualenv command https://review.opendev.org/725743 | 00:57 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: ensure-pip : fix Xenial exported virtualenv command https://review.opendev.org/725743 | 01:06 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: ensure-pip : fix Xenial exported virtualenv command https://review.opendev.org/725743 | 01:09 |
*** _mlavalle_1 has quit IRC | 01:10 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: ensure-pip : fix Xenial exported virtualenv command https://review.opendev.org/725743 | 01:15 |
openstackgerrit | Merged zuul/zuul-jobs master: ensure-pip : fix Xenial exported virtualenv command https://review.opendev.org/725743 | 02:06 |
openstackgerrit | Ian Wienand proposed openstack/project-config master: Drop pip-and-virtualenv from SUSE 15 https://review.opendev.org/725749 | 02:41 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: Remove opensuse-15-plain testing https://review.opendev.org/725750 | 02:42 |
openstackgerrit | Ian Wienand proposed openstack/project-config master: Stop building opensuse-15-plain images https://review.opendev.org/725751 | 02:45 |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: [wip] focal ubuntu-minimal testing https://review.opendev.org/725752 | 02:49 |
fungi | #status log unlocked mirror.opensuse afs volume and manually performed vos release to clear stale lock from 2020-04-28 afs01.dfw outage | 02:55 |
openstackstatus | fungi: finished logging | 02:55 |
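A minimal sketch of the unlock-and-release sequence fungi logs above, assuming admin AFS tokens on the fileserver; the volume name comes from the status message:

```bash
# clear the stale volume lock left by the 2020-04-28 afs01.dfw outage
vos unlock mirror.opensuse
# then push the read-write volume out to its read-only replicas
vos release mirror.opensuse -verbose
```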
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: ensure-tox: use venv to install https://review.opendev.org/725737 | 03:01 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [dnm] test with plain nodes https://review.opendev.org/712819 | 03:09 |
mordred | ianw: so ... next time you're bored - would you look at https://review.opendev.org/#/c/715717/ ? (it's in merge conflict, and I'm not 100% sure it's right - but I was in there and it seems like we're making a logic error - you touch the nodepool siblings stuff more than I do - take a peek and see if you think I'm right?) | 03:12 |
ianw | hrm looking | 03:14 |
openstackgerrit | Merged opendev/system-config master: Revert "Clear LD_PRELOAD variable on zuul-web containers" https://review.opendev.org/725730 | 03:36 |
*** ykarel|away is now known as ykarel | 03:54 | |
*** openstack has joined #opendev | 04:21 | |
*** ChanServ sets mode: +o openstack | 04:21 | |
*** cgoncalves has joined #opendev | 05:38 | |
*** ysandeep|away is now known as ysandeep | 05:51 | |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: Remove some temporary files https://review.opendev.org/725650 | 05:54 |
*** diablo_rojo has quit IRC | 05:57 | |
*** dpawlik has joined #opendev | 06:03 | |
*** ysandeep is now known as ysandeep|away | 06:12 | |
*** roman_g has joined #opendev | 06:28 | |
*** ysandeep|away has quit IRC | 06:29 | |
*** ysandeep has joined #opendev | 06:38 | |
*** ysandeep is now known as ysandeep|brb | 06:38 | |
*** ysandeep|brb is now known as ysandeep | 06:39 | |
*** ralonsoh has joined #opendev | 06:45 | |
*** lpetrut has joined #opendev | 07:19 | |
*** tosky has joined #opendev | 07:30 | |
*** rpittau|afk is now known as rpittau | 07:31 | |
*** cgoncalves has quit IRC | 07:34 | |
*** cgoncalves has joined #opendev | 07:40 | |
*** dtantsur|afk is now known as dtantsur | 07:40 | |
*** roman_g has quit IRC | 07:41 | |
*** openstackstatus has quit IRC | 07:46 | |
*** openstack has joined #opendev | 07:48 | |
*** ChanServ sets mode: +o openstack | 07:48 | |
*** citizen_stig has quit IRC | 08:03 | |
*** ysandeep is now known as ysandeep|lunch | 08:03 | |
*** dpawlik has quit IRC | 08:09 | |
*** dpawlik has joined #opendev | 08:09 | |
*** roman_g has joined #opendev | 08:28 | |
*** larainema has quit IRC | 08:30 | |
*** elod is now known as elod_pto | 08:43 | |
*** ykarel is now known as ykarel|lunch | 08:45 | |
openstackgerrit | Merged zuul/zuul-jobs master: Remove some temporary files https://review.opendev.org/725650 | 09:00 |
*** ysandeep|lunch is now known as ysandeep | 09:20 | |
*** ykarel|lunch is now known as ykarel | 09:27 | |
*** rpittau is now known as rpittau|bbl | 09:44 | |
*** owalsh has quit IRC | 09:46 | |
*** owalsh has joined #opendev | 10:09 | |
*** priteau has joined #opendev | 11:03 | |
*** ysandeep is now known as ysandeep|brb | 11:16 | |
*** ysandeep|brb is now known as ysandeep | 11:54 | |
*** hashar has joined #opendev | 12:11 | |
*** rpittau|bbl is now known as rpittau | 12:18 | |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Retry buildx builds https://review.opendev.org/725843 | 12:23 |
*** tkajinam has joined #opendev | 12:34 | |
*** ykarel is now known as ykarel|afk | 12:38 | |
openstackgerrit | Merged zuul/zuul-jobs master: Retry buildx builds https://review.opendev.org/725843 | 12:47 |
openstackgerrit | Thierry Carrez proposed openstack/project-config master: Add repository for oslo.metrics https://review.opendev.org/725847 | 12:49 |
*** ralonsoh has quit IRC | 13:08 | |
*** ralonsoh has joined #opendev | 13:11 | |
*** olaph has joined #opendev | 13:12 | |
*** cgoncalves has quit IRC | 13:19 | |
fungi | #status log unlocked mirror.yum-puppetlabs afs volume and manually performed vos release to clear stale lock from 2020-04-28 afs01.dfw outage | 13:22 |
openstackstatus | fungi: finished logging | 13:22 |
*** cgoncalves has joined #opendev | 13:25 | |
mordred | infra-root: I'm going to put nodepool hosts into emergency and then land the nodepool patch so that we can update them one at a time | 13:28 |
fungi | mordred: thanks, i'll be around to help | 13:31 |
fungi | optimal caffeination nearly achieved | 13:31 |
*** slittle1 has quit IRC | 13:37 | |
mordred | fungi: I feel like that's an approximation that would be better described by a limit | 13:47 |
mordred | because i'm not sure it's possible to ever actually reach that state, but only to approach it closer and closer | 13:47 |
fungi | asymptotic caffeine ideal | 13:49 |
*** ykarel|afk is now known as ykarel | 13:54 | |
*** slittle1 has joined #opendev | 13:54 | |
*** slittle1 has quit IRC | 14:01 | |
*** ysandeep is now known as ysandeep|away | 14:02 | |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: WIP configure cache-to and cache-from for buildx https://review.opendev.org/725862 | 14:02 |
*** slittle1 has joined #opendev | 14:03 | |
openstackgerrit | Merged opendev/system-config master: Run nodepool launchers with ansible and containers https://review.opendev.org/720527 | 14:21 |
*** jhesketh has quit IRC | 14:22 | |
*** lpetrut has quit IRC | 14:30 | |
mordred | woot | 14:40 |
mordred | infra-root: I'm gonna run ansible on nl01 - I'm in a screen in case anyone wants to watch, but I'm fairly sure it'll go fine | 14:44 |
*** mlavalle has joined #opendev | 14:49 | |
fungi | jumping on now, thanks | 14:49 |
fungi | i guess it's not there yet? or already gone | 14:50 |
*** jhesketh has joined #opendev | 14:51 | |
mordred | fungi: I'm in screen | 14:52 |
mordred | on bridge | 14:52 |
mordred | fungi: ok - I did chowns on nl01 and did a docker-compose start - and it seems like it's running well | 14:55 |
mordred | well - except- we seem to be logging to the docker logs instead of to /var/log/nodepool | 14:56 |
mordred | but docker-compose logs -f seems to look decent - except for some quota issues | 14:57 |
* mordred is going to wait here and not do any further nodes so people can look at things | 14:57 | |
*** smcginnis has quit IRC | 14:58 | |
fungi | oh, bridge | 15:00 |
fungi | duh ;) | 15:00 |
*** lpetrut has joined #opendev | 15:00 | |
fungi | mind if i scroll the buffer? | 15:01 |
clarkb | mordred: did the logging config not get supplied properly? it configures the /var/log/nodepool stuff iirc | 15:04 |
*** smcginnis has joined #opendev | 15:04 | |
mordred | fungi: please do! | 15:07 |
mordred | clarkb: we are totally not passing -l /etc/nodepool/launcher-logging.conf in the compose file | 15:08 |
*** hashar has quit IRC | 15:08 | |
fungi | okay, and the usermod errors are expected since there are running processes for that user, same as we saw with zuul | 15:11 |
fungi | lgtm | 15:11 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Configure nodepool launcher to use logging config https://review.opendev.org/725889 | 15:17 |
mordred | clarkb, fungi: ^^ | 15:17 |
fungi | so the command parameter defaults to just `nodepool-launcher -c /etc/nodepool/nodepool.yaml` somehow? | 15:19 |
fungi | (and where was that happening?) | 15:19 |
mordred | fungi: https://opendev.org/zuul/nodepool/src/branch/master/Dockerfile#L41 | 15:20 |
mordred | in the zuul setup, we have a zuul.conf file which is where we configure which logging config to use, so it wasn't an issue | 15:21 |
mordred | fungi: actually - we're logging to docker in nb04 too | 15:22 |
fungi | neat | 15:22 |
mordred | and we don't write out a logging config there | 15:23 |
mordred | so - should we align to just using docker logs for this? or also add a builder logging config? | 15:23 |
mordred | (the dib builds are still logging to /var/log/nodepool/builds) | 15:23 |
*** ykarel is now known as ykarel|away | 15:23 | |
mordred | clarkb, corvus : ^^ thoughts? | 15:23 |
clarkb | mordred: I think the special logging is more important for the builder than the launcher as it splits logs out by image | 15:24 |
clarkb | we then expose those per image build logs publicly | 15:24 |
mordred | yeah - and that's working on the builder | 15:26 |
mordred | should we just keep logging the service logs to docker for both? | 15:26 |
fungi | i agree separating out builder logs for publication is more important | 15:26 |
mordred | if so - I can just stop writing out a logging config file | 15:26 |
fungi | but if we can publish launcher logs safely too, then why wouldn't we? | 15:26 |
clarkb | mordred: if we do that I think we should consider doing it for all the services we run too (at least as much as we can, haproxy will only log to syslog iirc) | 15:27 |
clarkb | for me the big thing is consistency and not trying to figure out where logs are on different hosts/services | 15:27 |
mordred | clarkb: yeah - I think we've got a hybrid mix of logging currently | 15:27 |
mordred | but - I think for zuul/nodepool that sounds like a good reason to do logging config files | 15:28 |
mordred | since we're doing them for zuul | 15:28 |
mordred | that way at least we're consistent within that service | 15:28 |
clarkb | ++ | 15:30 |
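For reference, a hedged sketch of what a launcher logging config in Python's fileConfig format could look like; the handler class, levels, and file path here are assumptions, not the production config:

```bash
cat > /etc/nodepool/launcher-logging.conf <<'EOF'
[loggers]
keys=root

[handlers]
keys=file

[formatters]
keys=simple

[logger_root]
level=DEBUG
handlers=file

[handler_file]
class=logging.handlers.WatchedFileHandler
level=DEBUG
formatter=simple
args=('/var/log/nodepool/launcher-debug.log',)

[formatter_simple]
format=%(asctime)s %(levelname)s %(name)s: %(message)s
EOF
```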
clarkb | infra-root I've pulled the latest zuul-web image down which is built without jemalloc. The docker compose file for zuul-web no longer updates LD_PRELOAD. I'd like to restart the zuul-web service now. Let me know if I should hold off on that | 15:30 |
mordred | clarkb: go for it | 15:31 |
clarkb | done | 15:32 |
clarkb | this should serve as a last sanity check for the jemalloc cleanup addressing memory pressure | 15:32 |
corvus | mordred: i'd be more comfortable logging to disk everywhere | 15:32 |
*** roman_g has quit IRC | 15:33 | |
*** roman_g has joined #opendev | 15:34 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Configure nodepool to use logging config https://review.opendev.org/725889 | 15:37 |
mordred | corvus, clarkb: ^^ that should get us logging to disk on builder and launcher | 15:37 |
clarkb | mordred: how does https://review.opendev.org/#/c/725889/2/playbooks/roles/nodepool-builder/files/logging.conf produce the per image logs? or is nodepool doing that internally? I thought at one time we generated a very large logging.conf with all our images listed | 15:38 |
mordred | clarkb: I think nodepool is doing that internally | 15:39 |
clarkb | mordred: also found a file path bug on the latest ps | 15:39 |
mordred | clarkb: that's the logging config file from puppet | 15:39 |
clarkb | (its noted inline) | 15:39 |
clarkb | mordred: gotcha | 15:40 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Configure nodepool to use logging config https://review.opendev.org/725889 | 15:40 |
clarkb | mordred: double check ps3 too. I think builder and logging are transposed in the two locations they are used | 15:41 |
mordred | awesome | 15:41 |
clarkb | builder-logging.conf vs logging-builder.conf | 15:41 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Configure nodepool to use logging config https://review.opendev.org/725889 | 15:42 |
mordred | clarkb: yup. thanks | 15:42 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Split the arch building and pushing separately https://review.opendev.org/725905 | 15:44 |
*** panda|pto has quit IRC | 16:03 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 16:06 |
*** panda has joined #opendev | 16:06 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 16:10 |
*** dtantsur is now known as dtantsur|afk | 16:11 | |
mordred | fungi, corvus : if you get a sec, could you +A https://review.opendev.org/#/c/725889/ so we can roll forward with nodepool? | 16:11 |
mordred | fungi: thanks! | 16:14 |
openstackgerrit | Merged zuul/zuul-jobs master: Split the arch building and pushing separately https://review.opendev.org/725905 | 16:14 |
mordred | onoes. | 16:14 |
fungi | wot? | 16:15 |
mordred | fungi: https://zuul.opendev.org/t/openstack/build/476e48ecdf6f4c63a37f9ce9a91ec9fa | 16:17 |
mordred | fungi: file (/var/log/nodepool) is absent, cannot continue | 16:17 |
mordred | what? | 16:17 |
mordred | oh. HAHAHAHAHA | 16:18 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Configure nodepool to use logging config https://review.opendev.org/725889 | 16:18 |
mordred | fungi, clarkb : please to enjoy the silly mistake - and to also be thankful that we have testing | 16:18 |
fungi | oh, indeed, a /var/log/nodepool file would have been less useful | 16:20 |
clarkb | directories and files are just inodes in the end right? why is ext4 so picky :P | 16:20 |
mordred | clarkb: we should switch to running hurd | 16:21 |
clarkb | related: you can't hardlink directories, but ufs has/had an internals debugging tool that did allow you to effectively hardlink directories by constructing the inode contents correctly | 16:22 |
*** rpittau is now known as rpittau|afk | 16:24 | |
*** priteau has quit IRC | 16:25 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 16:29 |
Open10K8S | Hey infra team! | 16:37 |
Open10K8S | I am facing POST_FAILURE in zuul gating | 16:37 |
AJaeger | do you have a link to a failure, Open10K8S ? | 16:38 |
Open10K8S | sure | 16:38 |
Open10K8S | https://zuul.opendev.org/t/vexxhost/status | 16:39 |
*** hrw has quit IRC | 16:39 | |
Open10K8S | there are 2 projects | 16:39 |
Open10K8S | https://review.opendev.org/#/c/723087/ | 16:39 |
Open10K8S | https://review.opendev.org/#/c/725923/ | 16:39 |
AJaeger | Links just to "finger://ze09.openstack.org/9a693d2cf30e4ba1ba649d0df2e3e291" in one case ;( | 16:40 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 16:40 |
AJaeger | Open10K8S: best to recheck and follow the log file to see what's happening | 16:40 |
Open10K8S | AJaeger: ok, gotcha | 16:41 |
fungi | or wait until one of those reports and see if you get anything useful from the build result page for it | 16:41 |
Open10K8S | fungi: ok, anyway, could you share some info related to this failure or your idea? | 16:41 |
Open10K8S | fungi: i mean the root cause :) | 16:42 |
AJaeger | Open10K8S: Is that a job using container? | 16:42 |
Open10K8S | yeah, exactly k8s | 16:42 |
AJaeger | openstack-operator:linters:tox does not use container, does it? | 16:43 |
smcginnis | Lots of CANCELED and a POST_FAILURE on this one as well: https://review.opendev.org/#/c/722715/ | 16:43 |
Open10K8S | aha, yes, I only mentioned the whole pipeline | 16:43 |
AJaeger | infra-root, 722715 looks serious ^ | 16:44 |
smcginnis | Seeing quite a few post_failures looking at the check queue. | 16:47 |
mnaser | POST_FAILURE seems to occur after log upload | 16:49 |
mnaser | https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_e04/722715/1/gate/openstack-tox-lower-constraints/e0438b4/job-output.txt this POST_FAILURE'd and not really anything helpful | 16:49 |
fungi | 2020-05-06 16:41:57,490 INFO zuul.ExecutorClient: [e: 55d61c7159914bd1a5117b757ecf5db3] [build: 02e48b46d36b4622aae99c3df5063525] Cancel build <Build 02e48b46d36b4622aae99c3df5063525 of openstack-tox-docs voting:True on <Worker ze01.openstack.org>> for job openstack-tox-docs | 16:49 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 16:50 |
corvus | fungi, AJaeger, smcginnis: well, it looks like someone configured fail-fast on octavia: https://opendev.org/openstack/octavia/src/branch/master/zuul.d/projects.yaml#L61 | 16:51 |
corvus | so, i mean, zuul did exactly what you asked it to? | 16:51 |
corvus | i did not think we were planning on using that in openstack | 16:52 |
fungi | yeah, that looks to be the case for the cancelled states | 16:52 |
johnsom | I did | 16:52 |
smcginnis | Ah, OK. I haven't done enough with octavia to have seen that one before. | 16:52 |
fungi | so it's just the openstack-tox-lower-constraints post_failure which triggered all those | 16:53 |
johnsom | Those jobs can run well over an hour, so since there is no point letting them run if one fails I enabled that. | 16:53 |
smcginnis | So looks like just a transient failure that then caused the rest of the CANCELED ones. | 16:53 |
smcginnis | Makes sense. | 16:53 |
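A hedged sketch of the project-pipeline option corvus points at above; with fail-fast set, the first failing build cancels the rest of the buildset (the job names are illustrative):

```bash
cat > .zuul.yaml <<'EOF'
- project:
    gate:
      fail-fast: true
      jobs:
        - openstack-tox-lower-constraints
        - octavia-v2-dsvm-scenario
EOF
```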
fungi | and that looks like it could be a rogue vm in inap | 16:53 |
fungi | something at least caused the executor to think the ssh host key for the job node changed when it tried to collect some data from it | 16:54 |
smcginnis | Would that be what's impacting all of those POST_FAILURES? | 16:54 |
fungi | if the other post_failure results all correlate to similar ip addresses in inap, then that's the most likely explanation | 16:54 |
smcginnis | Here's a release job failure: https://zuul.opendev.org/t/openstack/build/11cee1a24fc64f048459c7eb732f5497 | 16:55 |
smcginnis | No link to results, so I'm not sure. | 16:55 |
fungi | i'll see if we have something useful in executor logs on that one | 16:56 |
clarkb | mnaser: note we can't tell if the issue is after log upload, as job-output.txt is uploaded before the upload itself completes (if that makes sense) | 16:56 |
clarkb | mnaser: from a logging perspective there is a race there that we can't solve without time travel | 16:57 |
clarkb | what we can do though is check the executor logs | 16:58 |
clarkb | as those don't have the upload race with time moving in one direction | 16:58 |
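The executor-side check clarkb describes, as a command; the log path is an assumption, and the build UUID is the one quoted from the status page:

```bash
# search the executor's debug log for everything recorded about one build
grep 02e48b46d36b4622aae99c3df5063525 /var/log/zuul/executor-debug.log
```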
fungi | i see "Could not find versioned identity endpoints when attempting to authenticate. Please check that your auth_url is correct. Service Unavailable (HTTP 503)" in a traceback when communicating with ovh (swift presumably) | 17:00 |
fungi | so maybe at least some of these post_failure results are an ovh swift issue? | 17:01 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 17:01 |
fungi | yeah, the exception was raised when running upload-logs-swift | 17:02 |
clarkb | fungi: we forced v3 there iirc because they were dropping v2 | 17:02 |
clarkb | fungi: possible they don't like the force of v3 anymore? | 17:03 |
fungi | ooh, maybe that was taking effect today | 17:03 |
fungi | and yeah, i find a few of those in the executor-debug.log on ze01, but not prior to 16:32:37 | 17:04 |
fungi | i'll check other executors | 17:04 |
clarkb | in mnasers example we have: 2020-05-06 16:40:49,425 DEBUG zuul.AnsibleJob.output: [e: 55d61c7159914bd1a5117b757ecf5db3] [build: e0438b4528fa4c559a09b504025543e7] Ansible output: b'[WARNING]: Failure using method (v2_runner_item_on_failed) in callback plugin' which is a different issue during fetch-subunit-output | 17:04 |
clarkb | thats an actual issue in zuul's callback plugins I think. Unfortunately thats as much info as we seem to get | 17:04 |
corvus | clarkb: i don't think that's a fatal error? | 17:07 |
fungi | okay, so every executor has logged multiple tracebacks for ovh swift like that, the earliest was 15:00:05 on ze11 | 17:08 |
clarkb | corvus: ya maybe. It's confusing: the output says failed: 1 with no obvious failures, then later 2020-05-06 16:40:50.196207 | POST-RUN END RESULT_NORMAL: [untrusted : opendev.org/zuul/zuul-jobs/playbooks/unittests/post.yaml@master] so maybe it is reporting it as a failure but not actually failing? | 17:08 |
clarkb | fungi: on bridge I can run catalog list against BHS1 and GRA1 and we don't seem to force an identity version so thats my best guess still | 17:09 |
fungi | should we temporarily disable ovh swift in zuul, or try that as a fix? | 17:09 |
clarkb | oh we don't set it for our ovh secret either from what I can see | 17:10 |
corvus | clarkb: the error for e0438b4528fa4c559a09b504025543e7 on ze01 is WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! | 17:11 |
fungi | also possible this was transient... checking to see when the last errors were logged | 17:11 |
fungi | corvus: for that one, was it also in inap? | 17:11 |
fungi | ovh swift error does not seem to have abated on its own, btw, ze02 saw it as recently as 17:11:28 | 17:12 |
corvus | fungi: yes e0438b4528fa4c559a09b504025543e7 on ze01 was on ubuntu-bionic-inap-mtl01-0016406504 | 17:12 |
fungi | corvus: so that makes two post failures we've seen for a host key error in inap, https://zuul.opendev.org/t/openstack/build/e0438b4528fa4c559a09b504025543e7 was another | 17:13 |
clarkb | corvus: I guess that doesn't show up if you grep the uuid? I don't get that in my grep output | 17:13 |
fungi | corvus: oh, wait, that's the same one i already looked at earlier | 17:13 |
fungi | so we only have one case of that failure | 17:14 |
clarkb | looks like we pass in opendev/base-jobs/zuul.d/secrets.yaml:opendev_cloud_ovh_bhs directly as the cloud config to shade/openstacksdk | 17:16 |
clarkb | thats a profile, auth dict with name, passwd, and project name, then a region name | 17:16 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 17:16 |
clarkb | the project name value should force an implied v3 identification | 17:16 |
clarkb | thats basically the same config that works on bridge | 17:17 |
clarkb | this makes me wonder if it is a "we need to update openstacksdk to have new profile for ovh" problem | 17:17 |
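A hedged sketch of the shape of the cloud config clarkb is describing: a vendor profile plus v3-style auth, where the presence of project_name implies identity v3. All values are placeholders, not the real secret:

```bash
cat > clouds.yaml <<'EOF'
clouds:
  opendev_cloud_ovh_bhs:
    profile: ovh
    region_name: BHS1
    auth:
      username: EXAMPLE_USER
      password: EXAMPLE_PASSWORD
      project_name: EXAMPLE_PROJECT
EOF
```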
corvus | clarkb: it does show up in a grep for the uuid; log line starts with 2020-05-06 16:41:43,430 DEBUG zuul.AnsibleJob.output: [e: 55d61c7159914bd1a5117b757ecf5db3] [build: e0438b4528fa4c559a09b504025543e7] Ansible output | 17:18 |
fungi | openstacksdk 0.46.0 is what's on the executors, ftr | 17:18 |
clarkb | ze01 openstacksdk==0.41.0 bridge is ^ ya that | 17:18 |
clarkb | fungi: remember zuul executors run ansible in venvs | 17:19 |
clarkb | its the venv version that matters here I think | 17:19 |
fungi | oh, is this ansible calling openstacksdk? | 17:19 |
clarkb | fungi: yes | 17:19 |
fungi | /usr/lib/zuul/ansible/2.8/lib/python3.5/site-packages/keystoneauth1/identity/generic/base.py | 17:20 |
fungi | indeed | 17:20 |
corvus | openstacksdk==0.41.0 | 17:20 |
corvus | is what i see on the 2.8 venv on ze01 | 17:20 |
fungi | openstacksdk 0.41.0 according to /usr/lib/zuul/ansible/2.8/bin/pip | 17:20 |
fungi | yep | 17:20 |
fungi | i guess we just also have openstacksdk installed globally | 17:21 |
fungi | s/globally/in the system context/ | 17:21 |
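The comparison being made above, as commands; the venv path is the one fungi quotes, while using pip3 for the system context is an assumption:

```bash
/usr/lib/zuul/ansible/2.8/bin/pip show openstacksdk  # venv ansible runs from: 0.41.0
pip3 show openstacksdk                               # system context: 0.46.0
```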
clarkb | the ovh vendor profile hasn't changed meaningfully | 17:22 |
clarkb | its possible its a bug/change in openstacksdk itself then | 17:22 |
clarkb | corvus: re that REMOTE HOST I guess I was blind or mixing up my greps on ze01 and of https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_e04/722715/1/gate/openstack-tox-lower-constraints/e0438b4/job-output.txt any idea why it doesn't show up in the uploaded log? | 17:23 |
clarkb | maybe I'm still blind and it is there too | 17:23 |
openstackgerrit | Merged opendev/system-config master: Configure nodepool to use logging config https://review.opendev.org/725889 | 17:23 |
corvus | clarkb: i don't think all the ansible warnings go in to the console log | 17:24 |
fungi | i feel like we ought to disable uploads to ovh swift while we keep troubleshooting keystone/sdk | 17:24 |
fungi | i can push up that change unless there are objections | 17:24 |
clarkb | fungi: I've yet to reproduce outside of zuul though | 17:24 |
clarkb | so we may be turning off our ability to debug this easily (we can use base-test though) | 17:24 |
clarkb | and ya that seems like a reasonable next step. maybe switch base-test to ovh only too and we can start iterating there? | 17:25 |
fungi | are additional job failures going to make this easier to debug? we already have dozens (maybe hundreds) | 17:25 |
clarkb | for reproduction I've installed openstacksdk==0.41.0 in a venv on bridge and am using the clouds.yaml there (whcih appears equivalent to my eye) | 17:25 |
fungi | i'll do the base-test tweak in the same change, sure | 17:25 |
clarkb | fungi: its not additional failures, but being able to change something and then check if that changes behavior since we don't have a reproducer right now | 17:26 |
*** sshnaidm is now known as sshnaidm|afk | 17:26 | |
fungi | i can't remember, is this all inherited from opendev/base-jobs now or do i need to make a separate change for each tenant's config repo? | 17:27 |
clarkb | fungi: it should all be in opendev/base-jobs | 17:27 |
fungi | thanks, that's quicker at least | 17:27 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 17:27 |
clarkb | https://auth.cloud.ovh.net/ produces what I expect from my home network connection | 17:31 |
openstackgerrit | Jeremy Stanley proposed opendev/base-jobs master: Temporarily disable OVH Swift uploads https://review.opendev.org/725943 | 17:32 |
fungi | infra-root: ^ | 17:32 |
clarkb | fungi: ah ok I see now you noted we get an HTTP 503 | 17:32 |
clarkb | that likely explains why we can't reproduce | 17:32 |
clarkb | its a server side error | 17:32 |
fungi | we might want to submit that in gerrit to reduce delay, especially since it's likely to post_failure on the problem it's trying to stop | 17:32 |
clarkb | fungi: before we land that can we check really quickly if we still observe the 503s? | 17:32 |
fungi | checking | 17:33 |
fungi | most recent occurrence was 17:11:39 on ze04 | 17:35 |
fungi | there's every chance that this is only happening for some fraction of api calls, and so could require lots of tries to reproduce | 17:35 |
clarkb | looks like they happened in clusters about every 20 minutes or so from 15:40 to 17:11 | 17:36 |
fungi | also maybe it was finally fixed 25 minutes ago | 17:36 |
corvus | clarkb, fungi: todays post failures: http://paste.openstack.org/show/793212/ | 17:36 |
fungi | well, i also saw it as early as 14:00 | 17:36 |
fungi | er, 15:00 | 17:36 |
clarkb | and this is executor to ovh regionless url so won't be region specific | 17:37 |
corvus | i think the cluster around 1630 is what got people's attention. it may have settled back down to normal levels | 17:37 |
corvus | (we can't tell whether those post failures are all log upload failures) | 17:38 |
clarkb | http://travaux.ovh.net/?do=details&id=44436 possibly related | 17:38 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 17:38 |
clarkb | however I don't think the timestamps line up properly for us in that ticket | 17:39 |
clarkb | corvus: fungi ya maybe we can check the list again in 10 minutes or so and see if it has grown new cases? | 17:40 |
clarkb | if it has then remove ovh from the rotation if not then its looking happier? | 17:40 |
corvus | yeah, i'm inclined to think it's abated | 17:41 |
fungi | clarkb: corvus: yeah, could have been a temporary glitch while they were maintenancing | 17:42 |
clarkb | that said I think we may want to consider updating openstacksdk periodically in those virtualenvs. If we get to running in containers I think that will happen automatically | 17:42 |
clarkb | openstacksdk gets updates to handle changes in real world cloud deployments so being able to pull those in seems like a good thing | 17:43 |
clarkb | (and maybe the answer is just containers) | 17:43 |
corvus | containers on executors are still blocked by afs | 17:43 |
clarkb | ze07 and ze12 have just recorded occurrences at 17:44 | 17:45 |
clarkb | but only one each | 17:45 |
fungi | so it's infrequent, but frequent enough to be disruptive | 17:46 |
clarkb | ya there has been talk of having that role retry but the way we feed it random inputs makes that difficult | 17:47 |
clarkb | essentially we'd be retrying against the same cloud which in many cases is the wrong thing (though in this case might be the correct thing) | 17:48 |
clarkb | or correct enough in this case anyway since the 503 seems infrequent | 17:48 |
*** hashar has joined #opendev | 17:58 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 18:05 |
*** ralonsoh has quit IRC | 18:19 | |
*** lpetrut has quit IRC | 18:22 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 18:23 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 18:44 |
clarkb | mordred: is nl01 happy now? should we proceed with the others? | 18:58 |
mordred | clarkb: let me restart it and make sure it's logging to the right place | 18:59 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 19:03 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Pass -f to nodepool to run in foreground https://review.opendev.org/725967 | 19:07 |
mordred | clarkb, fungi, corvus : ^^ | 19:08 |
mordred | we almost got the last patch right | 19:08 |
mordred | clarkb: I applied that by hand on nl01 and it's happy | 19:08 |
mordred | clarkb: so once we land that patch ^^ I believe we can proceed with the others | 19:08 |
mordred | (and nl01 - bugs not withstanding, went quickly - so we shoudl be able to knock this out) | 19:09 |
clarkb | mordred: I don't understand the relationship between logging config and foreground vs not | 19:09 |
mordred | clarkb: we are overriding the command line in the compose file | 19:09 |
mordred | the commmand line in the dockerfile that we're overriding has a -f | 19:09 |
clarkb | gotcha | 19:09 |
mordred | sorry - I englished bad | 19:10 |
clarkb | and because it's docker it doesn't like a proper daemon | 19:10 |
mordred | well - that -- and also when we daemonize we want to write a pidfile | 19:10 |
mordred | and we want to write it in a location that doesn't exist in the container | 19:10 |
mordred | so it flat won't start | 19:10 |
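Putting the thread together, a hedged sketch of the compose service definition: overriding the image's default command means re-adding both -f (stay in the foreground, so the container doesn't exit trying to daemonize and write a pidfile) and -l (the logging config). Paths and volume mounts are assumptions:

```bash
cat > docker-compose.yaml <<'EOF'
services:
  nodepool-launcher:
    image: zuul/nodepool-launcher
    network_mode: host
    volumes:
      - /etc/nodepool:/etc/nodepool
      - /var/log/nodepool:/var/log/nodepool
    command: >-
      nodepool-launcher -f
      -c /etc/nodepool/nodepool.yaml
      -l /etc/nodepool/launcher-logging.conf
EOF
```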
*** DSpider has joined #opendev | 19:13 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 19:22 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 19:34 |
mordred | clarkb: feel like +Aing https://review.opendev.org/#/c/722483/ and https://review.opendev.org/#/c/725611/ ? | 19:35 |
clarkb | mordred: done | 19:37 |
clarkb | mordred: I guess once the container nodepool rollout is done we can replace all the builders and launchers with bionic (or even focal?) hosts? | 19:38 |
mordred | clarkb: yup | 19:39 |
mordred | clarkb: should be a piece of cake | 19:39 |
mordred | clarkb: speaking of: https://review.opendev.org/#/c/723528/ fixes base-server for focal | 19:40 |
mordred | clarkb: so once that's in - we can basically do a rolling replacement of ze* nl* nb* zm* with focal | 19:40 |
clarkb | mordred: for some reason I thought we had a 5 node limit on our jobs? | 19:42 |
clarkb | (that adds a 6th node) | 19:42 |
mordred | clarkb: we apparently do not | 19:46 |
fungi | i thought it was 7, but i really don't have much evidence on which to base that vague recollection | 19:47 |
corvus | 10: https://opendev.org/openstack/project-config/src/branch/master/zuul/main.yaml#L3 | 19:49 |
corvus | since https://opendev.org/openstack/project-config/commit/58c9ca0e9a910e8bf101f2b4a95abbf7aac8f0ce | 19:50 |
clarkb | mordred: noted a couple small cleanups on that change I think we should do to avoid confusion | 19:50 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 19:50 |
corvus | looks like we're maintaining our "looking one year ahead" track record :) | 19:51 |
mordred | clarkb, corvus: should we keep installing jemalloc on the executors? | 19:53 |
mordred | especially on focal | 19:53 |
clarkb | mordred: I've had a bit of a think on that. My hunch is that the jemalloc on Kubic (is that what our base docker images are?) is buggy | 19:54 |
clarkb | mordred: because python is going to use the same number of mallocs and frees if running with jemalloc or not | 19:54 |
mordred | clarkb: s/kubic/buster/ | 19:54 |
clarkb | mordred: its likely that newer jemalloc on ubuntu would suffer similar issues if sharing a similar jemalloc version | 19:55 |
mordred | yeah - focal is also jemalloc2 | 19:55 |
corvus | yeah, given that things are looking better without it so far (under different circumstances, but it's the best we got), i'd say lets try focal without jemalloc | 19:55 |
mordred | so I think based on our container experience we should avoid jemalloc on focal | 19:55 |
mordred | ++ | 19:55 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Test zuul-executor on focal https://review.opendev.org/723528 | 20:00 |
mordred | corvus, clarkb: how do those changes look? | 20:00 |
mordred | I realized we also weren't actually, you know, using jemalloc1 on not-focal even though we were installing it | 20:01 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 20:02 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Test zuul-executor on focal https://review.opendev.org/723528 | 20:04 |
mordred | clarkb: I mean - we _were_ using jemalloc on the existing executors - but only by accident since there was a defaults file there already | 20:04 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 20:10 |
fungi | infra-root: just as a heads up, we're still seeing keystone 503 errors out of ovh as recently as 20:02:30 | 20:11 |
clarkb | fungi: I did +2 the change back when it was pushed so that we can land it if necessary | 20:12 |
fungi | make that 20:08:12 | 20:13 |
*** diablo_rojo has joined #opendev | 20:13 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 20:18 |
openstackgerrit | Merged opendev/system-config master: Pass -f to nodepool to run in foreground https://review.opendev.org/725967 | 20:19 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 20:32 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 20:43 |
*** hashar has quit IRC | 20:49 | |
fungi | infra-root: update from conversation with an ovh engineer (rledisez) in #openstack-infra just now, this does seem to be a problem on their end and is now being looked into since clarkb brought it to their attention | 20:51 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role https://review.opendev.org/725678 | 20:53 |
slittle1 | Not long ago we created a new repo starlingx/kernel.git | 20:56 |
fungi | i remember | 20:57 |
slittle1 | I do the branch management for starlingx, but I don't seem to be able to push a branch into starlingx/kernel.git | 20:57 |
slittle1 | a new branch that is | 20:57 |
slittle1 | I'm ondering if we missed something | 20:58 |
slittle1 | ondering ? make that 'wondering' | 20:58 |
fungi | looks like you're in the starlingx-release group which has create-reference permission for refs/heads/* in that repo | 20:59 |
fungi | slittle1: if you go to https://review.opendev.org/#/admin/projects/starlingx/kernel,branches is the create branch button available | 20:59 |
fungi | ? | 21:00 |
slittle1 | yes | 21:00 |
fungi | you should be able to plug a branch name and then a ref into the fields there and create a new branch | 21:01 |
fungi | this version of gerrit isn't great about equating that to pushing a new branch via git protocol, unfortunately, if that's what you were trying | 21:01 |
slittle1 | I normally work on large numbers of gits at the same time via bash scripts | 21:01 |
fungi | i'm not sure if that will improve in the near future when we upgrade, but it might | 21:01 |
fungi | there is a rest api which can be used to create branches | 21:02 |
fungi | the ssh "command-line" gerrit api also has a create-branch command | 21:02 |
slittle1 | so 'git push gerrit <new_branch>' isn't going to be working for any of our gits ? | 21:03 |
slittle1 | ok, can you point me to a reference for that ? | 21:03 |
fungi | not if the branch doesn't exist yet. branch creation in gerrit is a separate step, at least in gerrit 2.13 | 21:03 |
fungi | yeah, getting you the reference docs for both rest and ssh options now | 21:03 |
fungi | https://review.opendev.org/Documentation/rest-api-projects.html#create-branch | 21:04 |
fungi | https://review.opendev.org/Documentation/cmd-create-branch.html | 21:04 |
fungi | slittle1: ^ those are scriptable alternatives to the webui "create branch" form | 21:05 |
clarkb | the openstack release team has scripts to automate branch creation which may be helpful to | 21:05 |
fungi | slittle1: if you're configured to push via https then the rest api will use the same credentials, or if you push to gerrit via ssh on 29418/tcp then the ssh command-line api uses the same ssh key | 21:06 |
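Hedged examples of the two scriptable alternatives fungi links above; the branch name and starting revision are placeholders:

```bash
# ssh command-line API (same key as pushing to gerrit over 29418/tcp)
ssh -p 29418 review.opendev.org gerrit create-branch \
    starlingx/kernel new-branch master

# REST API (same credentials as pushing via https); project name is URL-encoded
curl -X PUT -u username:http-password \
    -H 'Content-Type: application/json' \
    -d '{"revision": "master"}' \
    https://review.opendev.org/a/projects/starlingx%2Fkernel/branches/new-branch
```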
mordred | clarkb: nl01 seems good. I'm going to move on to nl02 if you concur | 21:09 |
slittle1 | ok, one snag | 21:10 |
clarkb | mordred: give me a few minutes and I can check in the host | 21:10 |
mordred | clarkb: cool thanks. I've put the disable file on bridge fwiw | 21:11 |
slittle1 | I'm not branching from an existing revision | 21:11 |
fungi | slittle1: in git there's no such thing | 21:11 |
fungi | all branches share some history, even if it's the base commit | 21:11 |
slittle1 | not so sure that's true | 21:12 |
mordred | fungi: yeah - I agree with slittle1 - this is how we stitch different repos together | 21:12 |
fungi | slittle1: there are tricks you can use to graft a branch into a repository where it shares no history with other branches, but that requires rewriting the branch, i think | 21:12 |
mordred | they wind up with two completely separate history parents | 21:12 |
slittle1 | exactly what I'm trying to do | 21:13 |
* fungi checks on that process | 21:13 | |
fungi | been a while since i've had to do that | 21:13 |
slittle1 | There was a subdirectory that should have been included in kernel.git | 21:14 |
slittle1 | I'm trying to transfer that directory while preserving its history | 21:14 |
slittle1 | I had hoped to deliver it to a side branch | 21:14 |
slittle1 | then put the merge commit for normal review | 21:15 |
mordred | I'll see what fungi finds - I think what you might need to do is create a branch from some ref (doesn't matter) - then force-push to that branch (which will overwrite it with your non-parent-sharing new import) | 21:15 |
clarkb | fwiw the 0 * 40 sha is the null parent | 21:15 |
clarkb | so fungi is right but there is a hack | 21:16 |
mordred | and then once you have done that, the refs in it will all exist in gerrit, so when you do the merge commit, it will only be the merge commit that gets reviewed | 21:16 |
clarkb | to make that true for the first commit | 21:16 |
mordred | ah - cool | 21:16 |
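A hedged sketch of one way to carry a subdirectory's history over as an unrelated branch, per the plan above; repo paths and the directory prefix are illustrative, and this is not necessarily the exact method slittle1 used:

```bash
# in the old repo: split the subdirectory out into its own branch,
# preserving its commit history (shares no ancestor with anything else)
cd old-repo
git subtree split --prefix=path/to/subdir -b move-mellanox-with-history

# bring that branch into a local clone of the target repo
cd ../kernel
git fetch ../old-repo move-mellanox-with-history:move-mellanox-with-history

# pushing it straight to gerrit bypasses review, so this step needs an admin
git push gerrit move-mellanox-with-history
```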
fungi | slittle1: so... just to confirm, you have a bunch of commits already in a branch which aren't reviews in gerrit, and *now* you want to push them into the repository, completely bypassing the code review system | 21:16 |
slittle1 | strictly speaking, they were reviewed in the old project. | 21:17 |
fungi | oh, these are being copied from a different gerrit project? | 21:18 |
slittle1 | don't want to subject the community to dozens of re-reviews | 21:18 |
slittle1 | was hoping for a single review in the merge commit | 21:18 |
fungi | well, either way, it's going to require an admin to push --force the commits since they're not being pushed for review | 21:18 |
clarkb | mordred: I see a running nodepool-launcher; /usr/local/bin/nodepool is not container installed, but running list with it showed me a recently used node. Grepping /var/log/nodepool/launcher-debug.log for that node id shows me the node was created after the launcher process started | 21:19 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Test zuul-executor on focal https://review.opendev.org/723528 | 21:19 |
clarkb | mordred: thats all looking great to me on nl01. I think the only thing I've noticed is we may need to cleanup the global install at some point | 21:19 |
clarkb | mordred: ++ to nl02 | 21:19 |
fungi | slittle1: if you can get the branch you want pushed in published somewhere public i can pull from, i'll add it to the repo in gerrit with whatever branch name you need | 21:19 |
mordred | clarkb: yes to global install (although that'll get sorted when we re-install on focal) | 21:20 |
slittle1 | ok, so I need to push what I have up to github and ask one of you to do it ? | 21:20 |
slittle1 | the force push that is | 21:20 |
fungi | slittle1: can be github, or really anywhere i can clone from | 21:21 |
fungi | i frequently just push via ssh to a webserver i have access to, that also works | 21:21 |
fungi | just needs to be reachable long enough for me to clone | 21:21 |
mordred | clarkb: playbooks/service-nodepool-manual.yaml in /home/zuul/src/opendev.org/opendev/system-config has the playbook I'm using | 21:21 |
mordred | clarkb: and I am running in a screen | 21:22 |
clarkb | mordred: k. I'm not on the screen now but let me know if I should be | 21:24 |
mordred | clarkb: nope. going smoothly. nl02 is done | 21:28 |
mordred | moving on to nl03 | 21:28 |
slittle1 | ready https://github.com/slittle1/starlingx-kernel-additions.git branch=move-mellanox-with-history | 21:28 |
fungi | slittle1: thanks, working on it now | 21:29 |
mordred | nl03 is done - moving on to 04 | 21:33 |
fungi | slittle1: you want the branch in gerrit to be called move-mellanox-with-history as well? | 21:33 |
slittle1 | curiosity: 'fungi' as in 'fun guy' :) | 21:34 |
slittle1 | yes please | 21:34 |
fungi | slittle1: fungi in my case is an obscure reference to a series of sonnets by an early 20th century horror and science fiction author named howard phillips lovecraft | 21:35 |
slittle1 | cool | 21:35 |
* corvus can confirm fungi is fun nonetheless | 21:36 | |
fungi | i try ;) | 21:36 |
*** DSpider has quit IRC | 21:36 | |
corvus | "fungi is fun in a lovecraftian sort of way" | 21:37 |
mordred | #status log finished rolling out nodepool ansible change - all launchers now running in docker | 21:39 |
openstackstatus | mordred: finished logging | 21:39 |
mordred | infra-root: ^^ | 21:39 |
mordred | I have removed nodepool from the emergency file and removed the ansible lock on bridge | 21:40 |
clarkb | mordred: \o/ | 21:40 |
fungi | corvus: that's likely closer to the truth | 21:41 |
fungi | slittle1: okay, this is the output from pushing (looks like --force was unneeded): http://paste.openstack.org/show/793221/ | 21:42 |
fungi | slittle1: see if the history here looks right to you: https://opendev.org/starlingx/kernel/commits/branch/move-mellanox-with-history | 21:43 |
slittle1 | yes, that looks correct | 21:43 |
fungi | it still would have needed an admin for the impersonate committer permissions | 21:43 |
fungi | slittle1: cool, don't hesitate to let us know if you need anything else | 21:44 |
slittle1 | thanks | 21:44 |
fungi | any time! | 21:45 |
*** smcginnis has quit IRC | 21:46 | |
clarkb | I'm overdue for a bike ride and the sun just popped out. Going to afk for a bit | 21:46 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Tag the images pulled back into docker for changes https://review.opendev.org/726007 | 21:47 |
ianw | infra-root: if i could get one more eye on https://review.opendev.org/#/c/725749/ i think we're ready to drop pip-and-virtualenv from suse at least. i can monitor it, but we're at a point we can roll-back by just re-adding the element and rebuilding now | 21:51 |
ianw | also https://review.opendev.org/725737 ; this modifies ensure-tox to install in a venv instead of --user and is required to run testinfra on our plain nodes, because we run it as a different user | 21:52 |
openstackgerrit | James E. Blair proposed opendev/system-config master: WIP: add Zookeeper TLS support https://review.opendev.org/720302 | 21:53 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Tag the images pulled back into docker for changes https://review.opendev.org/726007 | 21:54 |
mordred | ianw: have fun! | 21:55 |
mordred | ianw: isn't that zuul-jobs patch going to break on xenial for people who are not using opendev nodes? | 21:56 |
ianw | mordred: i don't think so ... {{ ensure_pip_virtualenv_command }} should always be a working thing that you can put into virtualenv_command, anywhere | 21:57 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Add focal to system-config base job https://review.opendev.org/725676 | 21:59 |
mordred | ianw: ok - I just thought there was something about old pip on xenial that's broken or something | 22:00 |
mordred | but - I guess if that was what's on the node before, this is no different | 22:00 |
ianw | mordred: it's pip 8 which xenial ships that has problems falling back to pypi with our mirrors; but for any external users (if there are any) who aren't setting up mirroring there won't be any issues | 22:03 |
ianw | that was what https://review.opendev.org/#/c/724788/ works around | 22:03 |
clarkb | mordred: so I dont forget. Did we disable the system units for nodepool on nl01-04? | 22:04 |
*** smcginnis has joined #opendev | 22:05 | |
ianw | as soon as i drop pip-and-virtualenv for xenial, we can consistently be using python3 -m venv to install things. i.e. we can be out of the virtualenv game -- if jobs want it they can manage it themselves (we give them ensure-virtualenv, but it's not required) | 22:06 |
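A hedged sketch of the venv-based install ensure-tox moves to, versus the old `pip install --user tox`; the venv location is an assumption:

```bash
python3 -m venv ~/tox-venv
~/tox-venv/bin/pip install tox
# jobs should reference the result via {{ tox_executable }} rather than
# hardcoding the old --user path /home/zuul/.local/bin/tox
```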
openstackgerrit | Merged openstack/project-config master: Drop pip-and-virtualenv from SUSE 15 https://review.opendev.org/725749 | 22:07 |
mordred | clarkb: no - let me make a patch do that. I think we need to do it on zuul hosts too right? | 22:08 |
mordred | clarkb: and - it looks like we were never installing units - we were installing sysvinit scripts and letting compat take care of it | 22:08 |
clarkb | mordred: ya sorry `systemctl disable foo` basically that works for units and compat | 22:11 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Remove old init scripts and services for zuul/nodepool https://review.opendev.org/726011 | 22:15 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Revert "Remove old init scripts and services for zuul/nodepool" https://review.opendev.org/726012 | 22:15 |
mordred | clarkb: there's two - one to land and run and disable/remove things - then a second to remove that from ansible so we don't do it a billion times | 22:15 |
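A minimal sketch of that cleanup; `systemctl disable` covers both native units and sysvinit scripts handled through systemd's compat layer (service and script names assumed):

```bash
systemctl disable nodepool-launcher   # works for units and sysvinit compat
rm -f /etc/init.d/nodepool-launcher   # drop the old sysvinit script
systemctl daemon-reload               # forget the generated compat unit
```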
corvus | mordred: is it correct that we can remove nodepool-base-legacy role now? | 22:18 |
mordred | corvus: we still have three hosts on puppet | 22:19 |
mordred | corvus: we should be clear to shift them over to ansible/docker now | 22:19 |
corvus | mordred: ack | 22:19 |
mordred | ianw: any reason to not just update nb01 and nb02 in place? | 22:20 |
* mordred is pretty sure we should boot a second builder on arm running the new arm container to verify that | 22:20 | |
corvus | mordred: hostnames for nb01/02? | 22:21 |
mordred | or - we could also combine it with fresh new servers and roll out nb01.opendev.org and nb02.opendev.org | 22:21 |
corvus | ya that | 22:21 |
mordred | yeah - so maybe that's the cleanest thing to do | 22:21 |
ianw | mordred: also config things changed a bit, so i think it would leave a bit of a mess | 22:21 |
mordred | yeah - I think for the zuul hosts that we did in place doing a rolling replacement is going to be nice | 22:21 |
mordred | infra-root: we can now land these: https://review.opendev.org/#/q/topic:infra-prod-zuul+status:open which should open the door to clarkb's reorg of zuul.yaml | 22:22 |
ianw | i can work on bringing them up now, if we're ready | 22:22 |
mordred | ianw: I think we are, yes | 22:23 |
mordred | ianw: also - well - one sec | 22:23 |
ianw | we have nb04 running, so really we could just create 05 and shutdown 01/02 and be in the same capacity | 22:23 |
mordred | ianw: ++ | 22:23 |
mordred | sounds great to me | 22:24 |
ianw | ok, i'll start on that | 22:24 |
mordred | I think the only thing we have any questions on wrt proceeding is 03 - and we need to publish the docker image for us to test that | 22:25 |
corvus | why wouldn't we call them 01 and 02? | 22:25 |
corvus | they're under different domains | 22:25 |
openstackgerrit | Merged zuul/zuul-jobs master: Tag the images pulled back into docker for changes https://review.opendev.org/726007 | 22:25 |
mordred | good point | 22:25 |
ianw | corvus: when i did try that things exploded :) | 22:26 |
corvus | did that not get fixed? | 22:26 |
fungi | i want to say after that we merged a change to start using the fqdn for identifiers in the metadata | 22:27 |
fungi | instead of just the short hostname | 22:27 |
corvus | fungi, ianw: or maybe this was the fix? https://review.opendev.org/713057 | 22:28 |
ianw | yeah i feel like i commented on a change to confirm that, looking | 22:28 |
ianw | yeah that's the one i'm thinking of | 22:29 |
corvus | so maybe we should go with 01/02 and place our bets on us having actually fixed that :) | 22:29 |
ianw | i can try nb01.opendev.org, and if something doesn't work we'll know what to look at | 22:29 |
mordred | ++ I think that's a great plan | 22:31 |
ianw | are we at the point to try it as a focal image? | 22:31 |
mordred | ianw: this needs to land - but I think it's ready to: https://review.opendev.org/#/c/723528/ | 22:32 |
mordred | ianw: probably worth pushing up an patch to add an nb01.opendev.org to the nodepool test running on focal just to make sure | 22:33 |
mordred | ianw: or - we could just stick to bionic on the builders since we know it's working | 22:34 |
fungi | #status log deleted openstack/rally branches stable/0.10 (67759651f129704242d346a2c045413fcdea912d) and stable/0.12 (99f13ca7972d7f64b84204c49f1ab91da6d6cb6b) per https://review.opendev.org/721687 (stable/0.9 left in place for now as it still has open changes) | 22:34 |
mordred | and leave focal for later | 22:34 |
openstackstatus | fungi: finished logging | 22:34 |
mordred | corvus: what do you think? | 22:34 |
corvus | mordred: that change is 99% to the executors | 22:34 |
corvus | mordred: i think trying focal on the builders is fine, but we should separate that out | 22:34 |
mordred | ++ | 22:35 |
mordred | so let's roll out new builders on bionic to get rid of puppet | 22:35 |
mordred | and unblock the zk tls work | 22:35 |
corvus | sounds good | 22:35 |
openstackgerrit | James E. Blair proposed opendev/system-config master: WIP: add Zookeeper TLS support https://review.opendev.org/720302 | 22:38 |
corvus | okay, throwing spaghetti at the wall and seeing what sticks :) | 22:38 |
openstackgerrit | James E. Blair proposed opendev/system-config master: WIP: add Zookeeper TLS support https://review.opendev.org/720302 | 22:45 |
mordred | corvus: mmm. spaghetti | 22:52 |
*** tosky has quit IRC | 22:56 | |
openstackgerrit | Ian Wienand proposed opendev/zone-opendev.org master: Add nb01/nb02.opendev.org https://review.opendev.org/726018 | 22:59 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Test zuul-executor on focal https://review.opendev.org/723528 | 23:00 |
mordred | ianw: woot | 23:03 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Add nb01/nb02 opendev servers https://review.opendev.org/726021 | 23:11 |
ianw | we probably have to reboot graphite as i think it is old enough to have a firewall rule from the last time we tried nb01.opendev.org | 23:11 |
mordred | corvus: hrm. there still may be a flaw in our wondrous multi-arch work | 23:13 |
mordred | but it'll have to be a tomorrow thing | 23:13 |
clarkb | mordred: the init script stack is +2's all around from me | 23:38 |
clarkb | mordred: and looks like I've already reviewed the other stack you linked | 23:39 |
clarkb | maybe tomorrow we'll be able to land those and the reorg change? | 23:39 |
ianw | clarkb: if you have second for https://review.opendev.org/#/c/725737/, i might be able to get a arm64 nb container test up | 23:44 |
ianw | it has to run on a full arm64 stack, though, see https://review.opendev.org/#/c/724439/ and https://review.opendev.org/#/c/724435/1 | 23:45 |
clarkb | ianw: looking | 23:45 |
clarkb | ianw: hrm I worry about that move breaking jobs if they hardcode the old path to tox | 23:46 |
ianw | they should be using {{ tox_executable }} though? | 23:47 |
clarkb | ianw: yes, but it wouldn't surprise me if they did something like /home/zuul/.local/pip/bin/tox (I don't know where --user actually stashes it) | 23:47 |
clarkb | we can check codesearch for that path and if it shows no results is probably reasonably safe and people are using the ansible var instead | 23:48 |
clarkb | /home/zuul/.local/bin/tox | 23:48 |
clarkb | ianw: https://opendev.org/x/tobiko/src/branch/master/roles/tobiko-ensure-tox/tasks/tox.yaml#L24 thats the only case I find in codesearch | 23:50 |
fungi | http://codesearch.openstack.org/?q=local%2Fbin%2Ftox&i=nope&files=&repos= | 23:50 |
fungi | yeah, tobiko is the only one i found | 23:50 |
fungi | and zuul-jobs obviously | 23:50 |
ianw | arrgghhh | 23:51 |
ianw | oh, hang on, that's not using our role? | 23:51 |
ianw | https://opendev.org/x/tobiko/src/branch/master/roles/tobiko-ensure-tox/tasks/tox.yaml#L3 | 23:51 |
fungi | convenient! | 23:51 |
clarkb | ianw: oh thats a good point, they do their own install at the top, weird | 23:52 |
ianw | it looks like nb01.openstack.org is borked ATM anyway ... a stuck process it seems | 23:52 |
clarkb | so ya maybe it is safe after all (fwiw I +2'd but didn't approve, thinking we could fix tobiko then approve zuul-jobs, but maybe we can just land it now) | 23:52 |
clarkb | ianw: is it safe to land https://review.opendev.org/#/c/724439/1 or do we want to wait for the tox fixing first? | 23:54 |
ianw | clarkb: it's safe to land, as it doesn't vote | 23:54 |
ianw | i think it's that the focal builds bail on the xenial hosts and then leave behind lockfiles. | 23:54 |
ianw | at this point, i'm just going to push on with the replacement nodes | 23:55 |
ianw | https://review.opendev.org/#/c/726021/ to add the new nodes just reported back, clarkb/fungi maybe you could take a look | 23:56 |
clarkb | uhm is it safe to reuse the names? | 23:56 |
clarkb | I feel like we fixed it sufficiently that it would be now but I also don't want to repeat that deletion thing all over again | 23:56 |
ianw | we discussed it @http://eavesdrop.openstack.org/irclogs/%23opendev/%23opendev.2020-05-06.log.html#t2020-05-06T22:29:13 | 23:57 |
clarkb | thanks | 23:57 |
openstackgerrit | Merged opendev/zone-opendev.org master: Add nb01/nb02.opendev.org https://review.opendev.org/726018 | 23:58 |
ianw | graphite is the only thing i can see that is holding on to an old firewall rule for the old nb01, i'll reboot that quickly once the new address resolves ^ | 23:58 |