Nick | Message | Time |
---|---|---|
mnaser | 286 periodic and 464 periodic buildsets.. that seems.. excessive lol | 02:20 |
mnaser | ah, it runs at 2:01 utc and that was a few minutes ago | 02:22 |
fungi | yeah, that's mostly the stable branch jobs for every openstack project. it's grown the more branches we leave open since the extended maintenance model went into effect | 11:16 |
corvus | i'm going to perform a rolling restart+reboot of all zuul servers | 17:11 |
corvus | this will include the relaxed ansible restrictions | 17:11 |
fungi | thanks! i'm semi-around now (was in the midst of gardening and away from computers) | 19:26 |
fungi | i assume this is mostly the long wait for graceful stop on executors, ze01-06 seem to have stopped | 19:28 |
fungi | everything else seems to be on a newer revision | 19:28 |
fungi | (besides the remaining executors, i mean) | 19:29 |
corvus | slower than that i'm afraid -- i've only stopped the mergers and 1/2 the executors, so everything still running is on the old version | 19:38 |
corvus | i'm about to bring the mergers up now | 19:38 |
fungi | ahh, okay, i was going by the versions reported by the components page | 19:38 |
corvus | mergers are up now, so it looks like dev32 is our target version | 19:39 |
fungi | oh, got it, when i looked everything was on dev3 except the running executors which were still on dev2 | 19:39 |
corvus | rebooting executors now | 19:42 |
corvus | er (01-06) that is | 19:42 |
corvus | 01-06 are up; stopping 07-12 now | 19:54 |
corvus | waiting on 09 and 10 to stop | 19:56 |
fungi | awesome | 20:04 |
fungi | looks like it's down to just 09 and 10 now | 20:05 |
fungi | yep | 20:05 |
fungi | and they're down now too | 21:13 |
corvus | i'll reboot them now | 21:17 |
fungi | thanks! | 21:18 |
corvus | all the executors are running now... i'll cycle zuul01 now | 21:53 |
fungi | lgtm, everything except zuul02 seems to be updated | 22:29 |
corvus | i'll restart zuul02 now | 23:10 |
corvus | at this point we should be running zuul 6 | 23:11 |
corvus | and having said that, it sure looks like there's a problem that warrants a rollback to 5.x | 23:14 |
fungi | oh? | 23:14 |
corvus | any of the builds i look at on openstack's tenant seem to have errors relating to ansible on the controller | 23:15 |
fungi | ooh, er yeah, i see all the retry_limit results | 23:16 |
corvus | here's a sample: https://zuul.opendev.org/t/openstack/build/75ffc120188245c0927a7c10e3f1dc0a | 23:16 |
fungi | do we need to rerun our ansible-manage? | 23:16 |
corvus | that should all be in the zuul-executor container | 23:16 |
fungi | "The conditional check 'hostvars[inventory_hostname]['nodepool']['interface_ip'] |ipv6' failed. The error was: The ipv6 filter requires python's netaddr be installed on the ansible controller" | 23:16 |
fungi | i guess the executor images are incomplete? | 23:17 |
corvus | hrm, i'm a little confused... i'm going to poke around in what's running | 23:18 |
fungi | pip list in the container says netaddr 0.8.0 is present (on ze01 anyway) | 23:19 |
corvus | hrm. but `/usr/local/lib/zuul/ansible/2.9/bin/pip list|grep netaddr` is empty | 23:20 |
corvus | (inside the container) | 23:20 |
fungi | ohh, right there's a separate venv for ansible | 23:21 |
fungi | and i agree netaddr isn't in it | 23:22 |
corvus | lemme try running that on 5.2.5 | 23:22 |
corvus | yes it's in there | 23:23 |
fungi | looks like /usr/local/lib/zuul/ansible/2.9 is part of the container image, i don't see it mapped into the container from the host fs anyway | 23:23 |
corvus | here's the requirements difference between 5.2.5 and master: https://paste.opendev.org/show/bT1vCgQcKqDFQRR7ATaD/ | 23:24 |
fungi | clearly netaddr was a transitive dep of something there | 23:25 |
fungi | looks like we dropped boto3, Jinja2, ara, added openshift | 23:26 |
corvus | openshift was there already | 23:26 |
corvus | i'm eyeing ara.... | 23:26 |
fungi | nevermind, openshift was also there before | 23:26 |
fungi | as was boto3 | 23:27 |
fungi | yeah, it was probably dragged in by ara (i can't imagine Jinja2 depending on it anyway) | 23:27 |
corvus | yeah, a pip install of that version of ara drags in netaddr | 23:28 |
corvus | okay, i think we should roll back to 5.2.5, then add netaddr, then try again | 23:29 |
fungi | mystery solved i guess, we need netaddr for ipv6 support in ansible, but ansible doesn't declare a dep on it | 23:29 |
fungi | possible there are other libs in a similar situation, but hopefully not | 23:29 |
corvus | yeah... one of those "technically not required unless you want to use it" sort of things :/ | 23:29 |
fungi | we can just roll back the executors, right? | 23:30 |
corvus | best to do the whole thing i think | 23:30 |
fungi | ah, okay | 23:31 |
corvus | i have pulled 5.2.5 on all the hosts and retagged that as "latest" | 23:35 |
corvus | i'm going to hard-stop the executors now | 23:36 |
corvus | and restart them and the mergers | 23:36 |
corvus | bringing up zuul02 now | 23:38 |
corvus | fungi: hah, i just saw your commit message in the scheduler log :) | 23:39 |
fungi | thanks! | 23:39 |
fungi | and yeah, hopefully not duplicating effort | 23:39 |
corvus | nope, i had my hands full, thanks :) | 23:40 |
fungi | np, i feel like i'm waving my pompoms from the sidelines here anyway | 23:41 |
corvus | it'll take a bit for zuul02 to come up; once that's done we can restart zuul01 and the rollback should be done. i don't think we need to delete state for this rollback, but it'll be good to run a change through its paces | 23:41 |
fungi | yeah, i was wondering if there were any zk schema changes we needed to worry about, but figured it's not been that long since we last updated | 23:42 |
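
The debugging steps in the log (checking whether netaddr is importable in the separate Ansible venv, then comparing package lists between 5.2.5 and master) can be sketched roughly as below. This is a minimal illustration, not Zuul tooling: the helper names `module_available` and `dropped_packages` are hypothetical, the package versions in the usage note are made up, and the sketch assumes the venv has a `bin/python` next to the `bin/pip` that corvus ran.

```python
import re
import subprocess


def module_available(python_path, module):
    """Return True if `module` can be found by the interpreter at `python_path`.

    Asking the venv's own interpreter avoids the trap seen in the log, where
    the container's main environment had netaddr but the Ansible venv did not.
    """
    probe = (
        "import importlib.util, sys; "
        "sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)"
    )
    result = subprocess.run([python_path, "-c", probe, module])
    return result.returncode == 0


def parse_freeze(text):
    """Parse `pip freeze`-style lines ("name==version") into a dict."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        # Normalize names so "Jinja2" and "jinja2" compare equal.
        packages[re.sub(r"[-_.]+", "-", name).lower()] = version
    return packages


def dropped_packages(old_freeze, new_freeze):
    """List packages present in the old environment but absent from the new one."""
    old = parse_freeze(old_freeze)
    new = parse_freeze(new_freeze)
    return sorted(set(old) - set(new))
```

A possible use, with illustrative data only: `dropped_packages("ara==1.0.0\nnetaddr==0.8.0\n", "")` returns `["ara", "netaddr"]`, and `module_available("/usr/local/lib/zuul/ansible/2.9/bin/python", "netaddr")` would report whether the Ansible venv can import netaddr (the interpreter path here is an assumption based on the pip path quoted in the log). A package that disappears only because its parent was removed, as netaddr did with ara, shows up in the dropped list even though it was never requested directly.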