*** DSpider has quit IRC | 00:12 | |
*** mlavalle has quit IRC | 00:31 | |
*** roman_g has quit IRC | 03:37 | |
*** DSpider has joined #opendev | 07:58 | |
*** ralonsoh has joined #opendev | 09:19 | |
*** ralonsoh has quit IRC | 11:27 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Make applytest files outside of system-config https://review.opendev.org/720848 | 14:00 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Move puppet apply jobs to system-config repo https://review.opendev.org/720887 | 14:00 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Remove unused rspec tests https://review.opendev.org/720802 | 14:14 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Make applytest files outside of system-config https://review.opendev.org/720848 | 14:14 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Move puppet apply jobs to system-config repo https://review.opendev.org/720887 | 14:14 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Remove global variables from manifest/site.pp https://review.opendev.org/720800 | 14:14 |
openstackgerrit | Monty Taylor proposed openstack/project-config master: Use legacy infra puppet jobs from system-config https://review.opendev.org/720889 | 14:14 |
mordred | fungi, clarkb, corvus: if you have some bored time - that stack above ^^ the applytest patch is the more important one (I fixed the error I think) but the others are part of starting to clean up the mess that is our puppet testing | 14:21 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Run Zuul using Ansible and Containers https://review.opendev.org/717620 | 14:29 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Run Zuul using Ansible and Containers https://review.opendev.org/717620 | 14:30 |
*** roman_g has joined #opendev | 14:37 | |
*** roman_g has quit IRC | 14:37 | |
AJaeger | mordred: one suggestion for https://review.opendev.org/720887 | 14:40 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Stop cloning a bunch of puppet modules we don't use https://review.opendev.org/720892 | 14:42 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Make applytest files outside of system-config https://review.opendev.org/720848 | 14:45 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Move puppet apply jobs to system-config repo https://review.opendev.org/720887 | 14:45 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Remove global variables from manifest/site.pp https://review.opendev.org/720800 | 14:45 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Stop cloning a bunch of puppet modules we don't use https://review.opendev.org/720892 | 14:45 |
mordred | AJaeger: ++ | 14:46 |
openstackgerrit | Monty Taylor proposed openstack/project-config master: Use legacy infra puppet jobs from system-config https://review.opendev.org/720889 | 15:31 |
openstackgerrit | Monty Taylor proposed openstack/project-config master: Stop running jobs on unused puppet repos https://review.opendev.org/720900 | 15:31 |
openstackgerrit | Monty Taylor proposed openstack/project-config master: Retire unused puppet modules https://review.opendev.org/720901 | 15:31 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Stop cloning a bunch of puppet modules we don't use https://review.opendev.org/720892 | 15:31 |
openstackgerrit | Monty Taylor proposed openstack/project-config master: Use legacy infra puppet jobs from system-config https://review.opendev.org/720889 | 15:55 |
openstackgerrit | Monty Taylor proposed openstack/project-config master: Stop running jobs on unused puppet repos https://review.opendev.org/720900 | 15:55 |
openstackgerrit | Monty Taylor proposed openstack/project-config master: Retire unused puppet modules https://review.opendev.org/720901 | 15:55 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Run Zuul using Ansible and Containers https://review.opendev.org/717620 | 15:58 |
*** olaph_ has joined #opendev | 16:00 | |
*** olaph has quit IRC | 16:00 | |
*** olaph_ is now known as olaph | 16:00 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Remove global variables from manifest/site.pp https://review.opendev.org/720800 | 16:03 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Stop cloning a bunch of puppet modules we don't use https://review.opendev.org/720892 | 16:03 |
AJaeger | mordred: let's first migrate the jobs, merge 720892 - and then retire repos. | 16:07 |
AJaeger | Please propose, for each repo, a change that "empties" it | 16:07 |
mordred | AJaeger: ++ | 16:17 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Remove global variables from manifest/site.pp https://review.opendev.org/720800 | 16:43 |
corvus | okay, i'm going to muck about on the zk servers | 16:46 |
mordred | corvus: sounds like fun - I'm here if you want eyeballs on anything | 16:47 |
corvus | mordred: thanks! | 16:48 |
corvus | zk01 is the leader; i'm going to kill it and see what happens | 16:49 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Fix a typo in letsencrypt cron job name https://review.opendev.org/720904 | 16:51 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Run Zuul using Ansible and Containers https://review.opendev.org/717620 | 16:53 |
corvus | restarted zk01, no problems there; zk03 is the new leader | 16:56 |
corvus | i'll restart 02 now | 16:56 |
mordred | corvus: I wonder - if you shut one of them down (so that a leader election happens but also so that zk notices there are only 2) - and then run ansible to write out the config files | 16:57 |
corvus | mordred: ? | 16:59 |
mordred | corvus: hrm - as I try to write a better version of that I think maybe I'm just on crack - so nevermind :) | 16:59 |
corvus | mordred: currently zk does not have write permission for either of the config files, so they shouldn't be changing | 17:00 |
corvus | 2 is back up; no problems so far | 17:00 |
corvus | i'm going to restart 3 now; it's the current leader, which means we'll get another epoch (we started at 0xd, we're currently at 0xf) | 17:01 |
corvus | 2 won the election, the new epoch is 0x10 | 17:03 |
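The epoch values quoted here are the high 32 bits of ZooKeeper's 64-bit zxid: each successful leader election bumps the epoch and resets the low 32-bit transaction counter to zero. A minimal Python sketch for reading them, assuming status text shaped like the reply to the `srvr` four-letter-word command (e.g. `echo srvr | nc zk01 2181`), which includes `Zxid:` and `Mode:` lines. The helper names and the sample value are hypothetical.

```python
from typing import Dict, Tuple


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Split a 64-bit zxid into (epoch, counter).

    High 32 bits: leader epoch, bumped by each successful election.
    Low 32 bits: transaction counter within that epoch.
    """
    return zxid >> 32, zxid & 0xFFFFFFFF


def parse_srvr(output: str) -> Dict[str, object]:
    """Hypothetical helper: pull mode, epoch and counter from srvr-style text."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    # int(..., 16) accepts the "0x" prefix that srvr prints.
    epoch, counter = split_zxid(int(fields["Zxid"], 16))
    return {"mode": fields.get("Mode"), "epoch": epoch, "counter": counter}


sample = "Zxid: 0x110000002a\nMode: leader\n"  # made-up reply, epoch 0x11
print(parse_srvr(sample))  # {'mode': 'leader', 'epoch': 17, 'counter': 42}
```

Comparing the epoch before and after each restart shows directly whether that step triggered an election.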
corvus | starting 3 | 17:03 |
corvus | it's back up, no complaints | 17:04 |
mordred | corvus: that is disappointing | 17:05 |
corvus | that appears to be a complete rolling restart with no errors. the sequence was: leader (1), follower (2), new leader (3) | 17:05 |
corvus | i think the sequence previously was: follower (1), follower (2), leader (3). perhaps that makes a difference | 17:05 |
corvus | maybe next i should try 1-3-2 to try to approximate that | 17:06 |
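The sequences being compared are different orderings of the same one-at-a-time rolling restart. A hypothetical Python sketch of the two orderings, assuming each server's current mode is already known (for example from `srvr`). With three voting servers a quorum is two, so only one server may be down at any moment.

```python
from typing import Dict, List


def quorum_size(ensemble_size: int) -> int:
    # Writes need a strict majority of the voting members.
    return ensemble_size // 2 + 1


def restart_order(modes: Dict[str, str], leader_last: bool = True) -> List[str]:
    """Hypothetical helper: order hosts followers-first or leader-first."""
    leaders = sorted(h for h, m in modes.items() if m == "leader")
    followers = sorted(h for h, m in modes.items() if m != "leader")
    return followers + leaders if leader_last else leaders + followers


modes = {"zk01": "follower", "zk02": "follower", "zk03": "leader"}
print(restart_order(modes))                     # ['zk01', 'zk02', 'zk03']
print(restart_order(modes, leader_last=False))  # ['zk03', 'zk01', 'zk02']
```

Restarting the leader triggers an election, so a plan like this only reflects the state at the start; each step would still need to wait for the restarted server to rejoin before the next one goes down.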
corvus | oh, another difference is that i am running 'docker stop' and not 'docker-compose down' | 17:07 |
corvus | i'll keep using docker stop for the next test | 17:07 |
mordred | ++ | 17:08 |
corvus | (the permissions should forbid it writing any state anywhere, but maybe i'm wrong about that -- maybe there's something in /tmp or somesuch) | 17:08 |
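One concrete difference between the two methods bears on the /tmp question. `docker stop` sends SIGTERM (then SIGKILL after a grace period, 10 seconds by default) and leaves the stopped container, writable layer included, in place for `docker start`. `docker-compose down` stops the containers the same way but then removes them, so the next `docker-compose up -d` starts from a fresh container and anything written outside mounted volumes is gone. A hypothetical Python helper that only builds the two command sequences (nothing is executed; the method names are made up for illustration):

```python
from typing import List


def restart_commands(method: str, container: str) -> List[List[str]]:
    """Return argv lists for one stop/start cycle; nothing is run here.

    "stop":    docker stop + docker start; the same container is reused.
    "compose": docker-compose down + up -d, run from the project directory;
               acts on every service in the project, and containers are
               removed and recreated, so the container name is unused.
    """
    if method == "stop":
        return [["docker", "stop", container], ["docker", "start", container]]
    if method == "compose":
        return [["docker-compose", "down"], ["docker-compose", "up", "-d"]]
    raise ValueError(f"unknown method: {method!r}")
```

Each argv list could be passed to `subprocess.run(argv, check=True)`, with `cwd=` pointing at the compose project for the second method.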
corvus | i've restarted 1 and 3 without incident; 2 (leader) is next | 17:11 |
corvus | 3 is new leader | 17:12 |
corvus | epoch is 0x11 | 17:12 |
mordred | corvus: this is unfortunately problem free | 17:13 |
corvus | yeah, 2 is up and running again | 17:13 |
clarkb | perhaps docker stop is more graceful than docker-compose down? | 17:13 |
corvus | maybe | 17:13 |
clarkb | that said you'd expect zk to handle ungraceful power outages | 17:13 |
mordred | yeah | 17:13 |
mordred | kind of pointless to have an HA system if it can't | 17:14 |
corvus | but since 3 is the leader now, we can replay the sequence from yesterday: follower (1), follower (2), leader (3) | 17:14 |
mordred | ++ | 17:14 |
corvus | i'll do that, but still using docker stop | 17:14 |
corvus | so the only known variables will be docker vs docker-compose and the traffic volume | 17:14 |
corvus | (there's still enough volume for us to have increasing zxids, but clearly not as much as yesterday) | 17:15 |
corvus | (that could have secondary effects though, like driving when a snapshot is written, which could have some impact) | 17:15 |
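For scale: the ZooKeeper admin guide says each server takes a snapshot once its transaction log reaches a randomly chosen count in the range [snapCount/2+1, snapCount], with snapCount defaulting to 100,000. A small Python sketch of the resulting snapshot interval, assuming a steady transaction rate estimated from two zxid counter readings taken within one epoch. The function names and the input numbers are illustrative, not measured.

```python
from typing import Tuple


def tx_rate(counter_start: int, counter_end: int, seconds: float) -> float:
    # Only meaningful within one epoch: the counter resets on election.
    return (counter_end - counter_start) / seconds


def snapshot_interval(rate: float, snap_count: int = 100_000) -> Tuple[float, float]:
    """Shortest and longest time (s) between snapshots at a steady rate."""
    return (snap_count // 2 + 1) / rate, snap_count / rate


rate = tx_rate(1_000, 61_000, 600.0)  # 100 tx/s from made-up readings
print(snapshot_interval(rate))        # (500.01, 1000.0), about 8 to 17 minutes
```

Lower traffic stretches that window proportionally, which is one way a quieter day could change whether a snapshot lands in the middle of a restart.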
corvus | anyway, i'll do 1-2-3(leader) now with docker stop | 17:16 |
corvus | 1,2 done; starting 3 now | 17:19 |
corvus | 2 is leader | 17:20 |
corvus | 3 is back up; no issues | 17:20 |
corvus | okay, maybe it's time to try 1-3-2(leader) with docker-compose down | 17:21 |
corvus | i'm poking around inside the container -- the entire zk binary installation dir is writable by the zookeeper user (however, we are not running as the zookeeper user, so we can't write to it) | 17:24 |
corvus | (that seems like an error in the build process) | 17:24 |
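A rough way to audit that from inside the container: a hypothetical Python sketch that walks a directory (the install path is left as a parameter, since it isn't named here) and reports entries a given uid owns with the owner-write bit set, or that are world-writable. It looks at mode bits only, so group write, ACLs and root's override are not considered.

```python
import os
import stat
from typing import List


def writable_paths(root: str, uid: int) -> List[str]:
    """List paths under root (inclusive) that uid can write as owner,
    or that anyone can write. Symlinks are skipped."""
    candidates = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        candidates.extend(os.path.join(dirpath, n) for n in dirnames + filenames)
    hits = []
    for path in candidates:
        st = os.lstat(path)
        if stat.S_ISLNK(st.st_mode):
            continue  # symlink mode bits are not meaningful on Linux
        owner_write = st.st_uid == uid and bool(st.st_mode & stat.S_IWUSR)
        world_write = bool(st.st_mode & stat.S_IWOTH)
        if owner_write or world_write:
            hits.append(path)
    return hits
```

The uid for a named account can be looked up with `pwd.getpwnam("zookeeper").pw_uid`.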
corvus | there's nothing interesting in /tmp | 17:24 |
corvus | okay, restarting 1 with d-c down | 17:25 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Remove global variables from manifest/site.pp https://review.opendev.org/720800 | 17:26 |
corvus | all back up without error; 3 is the new leader | 17:28 |
corvus | i want to repeat that once more with the 1-2-3 sequence from yesterday with docker-compose | 17:28 |
*** hrw has quit IRC | 17:29 | |
corvus | yeah, that worked flawlessly too | 17:30 |
corvus | in my mind, that suggests that either the error yesterday was due to some inconsistency that has been ironed out since (maybe something related to the formatting of the myid files after the upgrade); or it's driven by load either directly (like a race condition with connections) or indirectly (like timing of snapshots during the restart process) | 17:32 |
corvus | at least it doesn't happen every time. so i think maybe we should keep an eye on the system and see if it happens again | 17:33 |
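If the myid theory is worth ruling out, the invariant is simple: the `myid` file in each server's dataDir should hold that server's numeric id, and `zoo.cfg` should have a matching `server.<id>=<host>:...` line. A hypothetical Python check, assuming both files are already read into strings and that the hosts in zoo.cfg use the server's own name. It is deliberately strict (the first line must be ASCII digits only), and the example.org hostnames are placeholders.

```python
import re
from typing import Dict


def server_ids(zoo_cfg: str) -> Dict[int, str]:
    """Map server id -> host from `server.N=host:peer:election...` lines."""
    ids = {}
    for line in zoo_cfg.splitlines():
        m = re.match(r"\s*server\.(\d+)\s*=\s*([^:;\s]+)", line)
        if m:
            ids[int(m.group(1))] = m.group(2)
    return ids


def check_myid(myid_text: str, zoo_cfg: str, hostname: str) -> bool:
    """True if myid's first line is a bare id whose server.N entry is hostname."""
    first = myid_text.splitlines()[0] if myid_text else ""
    if not re.fullmatch(r"[0-9]+", first):
        return False
    return server_ids(zoo_cfg).get(int(first)) == hostname


cfg = (
    "dataDir=/data\n"
    "server.1=zk01.example.org:2888:3888\n"
    "server.2=zk02.example.org:2888:3888\n"
    "server.3=zk03.example.org:2888:3888;2181\n"
)
print(check_myid("2\n", cfg, "zk02.example.org"))   # True
print(check_myid(" 2\n", cfg, "zk02.example.org"))  # False
```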
mordred | ++ | 17:33 |
mordred | yeah- like | 17:33 |
mordred | absent more data - it seems like there's nothing more we can do - but I suppose it's good that you've rolling restarted it multiple times now with no issue | 17:33 |
*** hrw has joined #opendev | 17:34 | |
* fungi is around-ish now as well | 17:38 | |
fungi | sounds like a heisenbug :/ | 17:40 |
mordred | zomg! | 17:43 |
mordred | corvus: in good news - https://review.opendev.org/#/c/717620/ and https://review.opendev.org/#/c/720527/ (zuul and nodepool in containers) are both finally green! | 17:44 |
*** hashar has joined #opendev | 17:49 | |
*** hashar has quit IRC | 17:49 | |
openstackgerrit | James E. Blair proposed opendev/system-config master: Meetpad: proxy through meetpad to etherpad.opendev.org https://review.opendev.org/720095 | 18:03 |
corvus | mordred: zomg :) | 18:03 |
mordred | corvus: ikr? | 18:09 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Remove global variables from manifest/site.pp https://review.opendev.org/720800 | 18:12 |
mordred | corvus: quick meetbot patch: https://review.opendev.org/#/c/720904/ | 18:12 |
corvus | mordred: huh. we don't actually use that | 18:13 |
corvus | maybe we should send that patch upstream | 18:13 |
corvus | (that's just for our fork of the image) | 18:13 |
corvus | mordred: https://github.com/jitsi/docker-jitsi-meet/blob/master/web/rootfs/etc/cont-init.d/10-config#L39 | 18:13 |
fungi | yeah, our letsencrypt renewals are all handled from system-config | 18:15 |
*** tobiash has quit IRC | 18:17 | |
*** tobiash has joined #opendev | 18:18 | |
mordred | corvus: ah - hah | 18:33 |
mordred | (I mostly just found it when I was grepping for our letsencrypt stuff but typoed on the command line :) ) | 18:33 |
corvus | heh, i wondered :) | 18:33 |
fungi | clearly a common typographical error | 18:35 |
fungi | or typing error at least | 18:36 |
*** smcginnis has quit IRC | 18:36 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Make applytest files outside of system-config https://review.opendev.org/720848 | 18:42 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Move puppet apply jobs to system-config repo https://review.opendev.org/720887 | 18:42 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Remove global variables from manifest/site.pp https://review.opendev.org/720800 | 18:42 |
openstackgerrit | Merged opendev/system-config master: Meetpad: proxy through meetpad to etherpad.opendev.org https://review.opendev.org/720095 | 18:53 |
mordred | infra-root: cleanup stack ending https://review.opendev.org/#/c/720800/ is green and ready to go | 19:39 |
*** smcginnis has joined #opendev | 20:43 | |
*** noonedeadpunk has quit IRC | 21:06 | |
*** noonedeadpunk has joined #opendev | 21:07 | |
*** DSpider has quit IRC | 21:40 | |
*** hrw has quit IRC | 22:34 | |
*** paladox has quit IRC | 22:37 | |
*** hrw has joined #opendev | 22:40 | |
*** paladox has joined #opendev | 22:42 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!