corvus | ugh, we probably should have just frozen the whole system | 00:01 |
---|---|---|
corvus | i spent the day trying to get the letsencrypt playbook to run, cause we're still not at the point where my change could actually deploy | 00:01 |
corvus | but the project addition change jumped the gun on it :/ | 00:01 |
corvus | (presumably because it didn't have to run all the failing base playbooks) | 00:01 |
corvus | mordred: i've got like another minute i can help; are things stable? | 00:06 |
mordred | corvus: I think things are stable yeah - the rename at least didn't seem to break anything further | 00:06 |
fungi | i think things were stable enough anyway, i just didn't realize tenant config updates were going to be blocked | 00:06 |
fungi | so had started digging into it to figure out why | 00:07 |
corvus | well, they weren't blocked :/ | 00:07 |
mordred | corvus: I think leaving it in place is fine - it should make the running of the service-zuul playbook work when the other parts do | 00:07 |
fungi | er, prevented from getting applied to the scheduler i mean | 00:07 |
corvus | yeah, i mean, we basically may have possibly jumped to the end of the process | 00:07 |
corvus | depends on whether we whacked all the moles | 00:07 |
corvus | we'll need to check the periodic pipeline to see | 00:08 |
corvus | (at this point, we should stop enqueuing old changes into deploy, since a much newer change has actually deployed) | 00:08 |
fungi | unless someone's had a chance to run through the whole inventory, i expect we still probably have at least a few ipv4 addresses we'll need to accept ssh host keys for | 00:09 |
corvus | fungi: what happened is that i expected the zuuld change to possibly break, but because of all the problems with the base and letsencrypt playbooks, it was never applied (that's what i've spent 2 days on). but a project-config change applies the same zuul playbook without running the base or letsencrypt playbooks first, so that was when it was finally applied. so it broke, as expected. just not when expected. | 00:11 |
mordred | fungi: I could run a quick ansible command on all and accept the rest of the host keys | 00:12 |
corvus | so i think this is another aspect of the base playbook tree that we need to keep in mind as we consider how to re-work it | 00:12 |
fungi | i see. so we should have frozen any approval of project-config changes is what you were saying | 00:12 |
fungi | at least any touching files which could set off the zuul deploy job anyway | 00:12 |
corvus | fungi: yep; a characteristic of our new/current system is that there are multiple paths to get to the same deployment step. that wasn't the case before. project-config changes are a shortcut. | 00:13 |
fungi | mordred: like a no-op sort of command across the whole inventory? that ought to do the trick | 00:13 |
mordred | fungi: yeah - I'm running -mshell -aecho right now | 00:14 |
fungi | awesome, thanks | 00:14 |
mordred | it was really pissed off about logstash-worker01 | 00:15 |
corvus | okay, i have to run. i think the key thing to know is that we won't be sure the system is in equilibrium until we have a clean periodic run. when that happens, we should be gtg. until then, there could still be dragons. | 00:15 |
fungi | thanks corvus! | 00:16 |
fungi | i'll try to remember to check the periodic builds when i wake up tomorrow | 00:16 |
mordred | fungi: ok. all done | 00:19 |
mordred | fungi: I had to replace an old key for logstash-worker01 - but other than that all should be good and happy now | 00:19 |
fungi | thanks again! | 00:20 |
*** ysandeep|away is now known as ysandeep | 00:38 | |
*** Meiyan has joined #opendev | 01:03 | |
*** larainema has joined #opendev | 02:19 | |
*** DSpider has joined #opendev | 02:21 | |
*** stephenfin has quit IRC | 05:21 | |
*** stephenfin has joined #opendev | 05:28 | |
*** slaweq has joined #opendev | 06:20 | |
*** sgw has quit IRC | 06:32 | |
*** roman_g has quit IRC | 07:15 | |
*** slaweq has quit IRC | 07:16 | |
*** lpetrut has joined #opendev | 07:21 | |
zbr | morning! any core around today or busy shopping for the bbq? | 07:21 |
*** lpetrut has quit IRC | 07:22 | |
openstackgerrit | gugug proposed openstack/project-config master: Retire kolla-cli project - step 1 end project gating https://review.opendev.org/730431 | 07:31 |
openstackgerrit | gugug proposed openstack/project-config master: Retire kolla-cli project - step 3 remove from infra system https://review.opendev.org/730432 | 07:40 |
openstackgerrit | gugug proposed openstack/project-config master: Retire kolla-cli project - step 1 end project gating https://review.opendev.org/730431 | 07:40 |
*** Meiyan has quit IRC | 07:51 | |
openstackgerrit | Sorin Sbarnea (zbr) proposed opendev/elastic-recheck master: Bumped flake8 https://review.opendev.org/729328 | 07:55 |
*** moppy has quit IRC | 08:01 | |
*** moppy has joined #opendev | 08:01 | |
*** slaweq has joined #opendev | 08:32 | |
*** slaweq has quit IRC | 08:47 | |
*** slaweq has joined #opendev | 08:49 | |
zbr | infra-core: please merge ^ | 08:54 |
openstackgerrit | Sorin Sbarnea (zbr) proposed zuul/zuul-jobs master: Make gentoo jobs nv https://review.opendev.org/728640 | 09:17 |
*** slaweq has quit IRC | 10:12 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: fetch-subunit-output: Do not fail if find is not installed https://review.opendev.org/730448 | 10:23 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: fetch-subunit-output: Do not fail if find is not installed https://review.opendev.org/730448 | 10:36 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: fetch-subunit-output: Do not fail if find is not installed https://review.opendev.org/730448 | 10:38 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: WIP: add simple test runner https://review.opendev.org/728684 | 10:44 |
*** tosky has joined #opendev | 11:00 | |
*** slaweq has joined #opendev | 11:02 | |
*** yuri has quit IRC | 11:11 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: zuul-test: Add more labels https://review.opendev.org/730449 | 11:11 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: zuul-test: Add more labels https://review.opendev.org/730449 | 11:14 |
*** hrw has quit IRC | 13:07 | |
*** Eighth_Doctor is now known as Eleventh_Doctor | 13:11 | |
*** hrw has joined #opendev | 13:19 | |
mordred | zbr: looks good - I left a +2 with a comment that could either be done on that or as a followup | 13:29 |
fungi | mordred: looks like all our periodic deploy jobs ran cleanly: https://zuul.opendev.org/t/openstack/builds?pipeline=periodic&project=opendev%2Fsystem-config | 13:40 |
fungi | so i think that means the ssh host key update took care of those | 13:40 |
mordred | fungi: \o/ | 13:40 |
fungi | *however* we don't seem to run the zuul deploy in periodic? | 13:40 |
mordred | I think we run it hourly? | 13:41 |
mordred | fungi: opendev-prod-hourly is the pipeline | 13:41 |
mordred | fungi: and it is red | 13:41 |
fungi | aha, yep | 13:42 |
fungi | https://zuul.opendev.org/t/openstack/builds?pipeline=opendev-prod-hourly&project=opendev%2Fsystem-config | 13:42 |
fungi | i just went looking for it | 13:42 |
fungi | also infra-prod-remote-puppet-else is failing there | 13:42 |
fungi | "chown failed: failed to look up user zuul" | 13:44 |
fungi | also the puppet-else log seems to include a lot of failures due to "fatal: Unable to create '/etc/ansible/roles/puppet/.git/index.lock': File exists." | 13:47 |
fungi | on various servers | 13:47 |
fungi | i assume removing /etc/ansible/roles/puppet (or all of /etc/ansible/roles maybe?) on those servers will cause them to get re-cloned correctly on the next pass | 13:48 |
mordred | fungi: yeah | 13:48 |
mordred | fungi: I mean - I'm honestly not sure why we're pushing ansible roles to those servers | 13:48 |
mordred | fungi: WTF | 13:53 |
mordred | fungi: OH | 13:53 |
mordred | ***OH**** | 13:53 |
fungi | on the "failed to look up user zuul" message? | 13:53 |
mordred | no - the puppet one | 13:53 |
fungi | is there a become? | 13:53 |
fungi | oh, okay | 13:53 |
mordred | we're running install-ansible-roles delegated to localhost but on each host | 13:54 |
mordred | which means we're trying to clone in parallel | 13:54 |
* mordred makes patch | 13:54 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Only install ansible roles once per run https://review.opendev.org/730460 | 13:59 |
mordred | fungi: ^^ that should fix the git error | 13:59 |
fungi | oh! they were git collisions on bridge.o.o and not the individual servers? | 14:03 |
fungi | that makes more sense | 14:03 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Fix a few missing zuul_user usages https://review.opendev.org/730461 | 14:05 |
mordred | fungi: yeah | 14:05 |
mordred | fungi: and that should fix the zuul issues | 14:06 |
fungi | aha, yep | 14:07 |
fungi | and at least these are triggered hourly so we should know quickly if there's anything else hiding behind those errors | 14:08 |
mordred | yah - well, they're also triggered from landing - so we should know real quick | 14:15 |
fungi | oh, right that too ;) | 14:22 |
corvus | mordred, fungi: retro +2d thx :) | 14:41 |
mordred | corvus: I think as fallout goes, that was pretty low :) | 14:44 |
corvus | ya :) | 14:44 |
openstackgerrit | Merged opendev/system-config master: Only install ansible roles once per run https://review.opendev.org/730460 | 14:56 |
openstackgerrit | Merged opendev/system-config master: Fix a few missing zuul_user usages https://review.opendev.org/730461 | 15:28 |
fungi | on the run triggered by 730460 we still hit "fatal: Unable to create '/etc/ansible/roles/puppet/.git/index.lock': File exists." on logstash-worker01 and logstash-worker18 | 15:37 |
fungi | those are the only task failures i see in the log though | 15:37 |
mordred | WEIRD | 15:51 |
mordred | I cannot account for that behavior | 15:51 |
fungi | also the infra-prod-service-zuul build triggered by 730461 failed, though i'm mucking with my internet connection so can't check the log for it just yet | 16:08 |
fungi | looks much better though | 16:36 |
fungi | the executors all failed with errors like "usermod: user zuuld is currently used by process NNNN" | 16:36 |
fungi | which in retrospect is expected | 16:36 |
fungi | we need the executors offline to apply that change | 16:36 |
fungi | i suppose i could stop half of them, wait for a periodic cycle, start those again and stop the other half, wait for another periodic cycle, then start them back up | 16:37 |
fungi | the tenant config change from Open10K8S yesterday finally got applied at least | 16:38 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: WIP: add simple test runner https://review.opendev.org/728684 | 17:12 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: zuul-test: Add more labels https://review.opendev.org/730449 | 17:13 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: zuul-test: Add more labels https://review.opendev.org/730449 | 17:13 |
openstackgerrit | Marcin Juszkiewicz proposed opendev/base-jobs master: add arm64 nodesets https://review.opendev.org/728810 | 19:31 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: ensure-package-repositories: fix loopvar collision https://review.opendev.org/730477 | 20:14 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Fix deprecation warning from multinode tests https://review.opendev.org/730479 | 21:26 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Fix deprecation warning from multinode tests https://review.opendev.org/730479 | 21:28 |
*** DSpider has quit IRC | 22:14 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: fetch-subunit-output: stop using system os-testr https://review.opendev.org/730482 | 22:34 |
fungi | infra-root: out of curiosity i stopped zuul-executor on ze01 and that allowed ansible to run to completion on it in the next hourly pulse, *but* now the service won't start. first thing i spot is that the initscript references USER=zuul | 22:38 |
fungi | (and of course now /etc/passwd only has a zuuld user) | 22:38 |
fungi | i'll leave it stopped for now | 22:39 |
fungi | maybe we only need to patch https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/zuul-executor/files/zuul-executor.init#L21 ? | 22:40 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Update username in Zuul executor initscript https://review.opendev.org/730483 | 22:43 |
fungi | pushed that ^ in case folks think it's the next step | 22:43 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: fetch-subunit-output: stop using system os-testr https://review.opendev.org/730482 | 22:46 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: WIP: add simple test runner https://review.opendev.org/728684 | 23:05 |
*** slaweq has quit IRC | 23:05 | |
*** Meiyan has joined #opendev | 23:19 | |
*** tosky has quit IRC | 23:27 |
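The "Unable to create '/etc/ansible/roles/puppet/.git/index.lock': File exists." failures discussed around 13:54 to 14:03 came from install-ansible-roles being delegated to localhost but executed once per inventory host, so several processes on bridge.o.o tried to update the same checkout at once. Git takes index.lock with an exclusive create, so only one concurrent writer can win. The Python below is a minimal sketch of that lock behavior only, not the actual playbook or the content of change 730460; the function names and host list are hypothetical.

```python
import os
import tempfile
import threading


def try_lock(lock_path: str) -> bool:
    """Create lock_path exclusively, the way git creates index.lock.

    Returns True if this caller now holds the lock, False if it already existed.
    """
    try:
        # O_CREAT | O_EXCL makes creation atomic: it fails if the file exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def install_roles_per_host(hosts, lock_path):
    """Hypothetical buggy pattern: one install per inventory host, all on the same machine."""
    results = {}
    barrier = threading.Barrier(len(hosts))

    def worker(host):
        got = try_lock(lock_path)
        results[host] = got
        # Hold the lock until every worker has tried to take it, so the
        # overlap (and the "File exists" failures) is deterministic here.
        barrier.wait()
        if got:
            os.remove(lock_path)

    threads = [threading.Thread(target=worker, args=(h,)) for h in hosts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def install_roles_once(lock_path):
    """Hypothetical fixed pattern: a single install per run, so nothing contends for the lock."""
    got = try_lock(lock_path)
    if got:
        os.remove(lock_path)
    return got


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        lock = os.path.join(d, "index.lock")
        hosts = ["logstash-worker01", "logstash-worker18", "ze01"]
        per_host = install_roles_per_host(hosts, lock)
        print(sum(per_host.values()), "of", len(hosts), "workers got the lock")
        print("single install succeeded:", install_roles_once(lock))
```

Running one install per host yields exactly one winner and a lock failure for every other host, while the single install always succeeds. The sketch does not explain the two later failures on logstash-worker01 and logstash-worker18 after 730460 merged.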