*** jkilpatr has joined #openstack-sprint | 00:00 | |
pabelanger | okay, fixed | 00:01 |
---|---|---|
pabelanger | well, manually fixed | 00:01 |
clarkb | jeblair: once 528101 is merged I am going to clean up the testing of my change on es05-07 and abandon that change | 00:01 |
pabelanger | we need to update our vhost | 00:01 |
pabelanger | let me get up a aptch | 00:01 |
pabelanger | patch* | 00:01 |
pabelanger | remote: https://review.openstack.org/528132 Add javascript alias to cacti.o.o for xenial | 00:04 |
pabelanger | jeblair: clarkb: ianw: fix for cacti02.o.o^ missing alias for javascript folder | 00:04 |
pabelanger | jeblair: thanks for pointer, would have been waiting a while for it to update | 00:05 |
pabelanger | :) | 00:05 |
pabelanger | okay, stepping away for a bit, will check back to make sure firewall rules are happy | 00:09 |
pabelanger | cool, starting to see data show up on cacti02.o.o now | 00:29 |
pabelanger | I'll delete cacti01.o.o in the morning if nobody objects | 00:29 |
pabelanger | EOD | 00:29 |
*** jkilpatr has quit IRC | 00:58 | |
clarkb | ianw: can I get a second review on https://review.openstack.org/#/c/528101/ ? that will allow elasticsearch upgrades to proceed | 00:59 |
jeblair | pabelanger: oh wow that's a crazy new tree thing. yep, data looking good. thanks! | 01:01 |
ianw | arrghh slight stuff up in the ordering of /var/www in the puppet-hound module ... fixing and adding a test. | 02:32 |
*** harlowja has joined #openstack-sprint | 02:41 | |
*** harlowja has quit IRC | 02:42 | |
*** harlowja has joined #openstack-sprint | 02:43 | |
*** harlowja has quit IRC | 03:15 | |
*** harlowja has joined #openstack-sprint | 04:36 | |
ianw | clarkb: http://logs.openstack.org/30/528130/4/check/legacy-puppet-beaker-rspec-infra/58ed957/job-output.txt.gz#_2017-12-15_04_34_45_888316 | 04:38 |
ianw | it's not looking great for etherpad ... looks like a bunch of version stuff to work through | 04:39 |
*** harlowja has quit IRC | 04:45 | |
ianw | https://review.openstack.org/528156 starting to bump everything, not sure how far i'll get with it | 04:50 |
clarkb | ianw: probably need to update etherpad version | 05:05 |
ianw | yeah, trying that, i've put a basic rspec test in there, so it's a start | 05:06 |
ianw | ok, codesearch01 is alive! | 05:09 |
*** harlowja has joined #openstack-sprint | 05:21 | |
*** harlowja has quit IRC | 05:22 | |
*** harlowja has joined #openstack-sprint | 05:24 | |
*** harlowja has quit IRC | 05:29 | |
*** harlowja has joined #openstack-sprint | 05:38 | |
ianw | frickler: after 17 revisions, i *think* ethercalc is ready to go | 06:03 |
ianw | i've updated the changes required in the etherpad. if you want to shepherd them through, then launch the node, should be gt | 06:05 |
ianw | gtg | 06:05 |
*** harlowja has quit IRC | 06:32 | |
frickler | wow, so much backlog ... | 07:35 |
frickler | looks like es03+04 are still untouched, so I'm going to launch a new es04, verify the firewall patch there locally, merge it and watch everything go boom | 07:39 |
frickler | uhoh, seems we need to take care of cleaning up nodes first, not enough quota left to start even one additional small node ... Requested 61440, but already used 1094656 of 1152000 ram | 07:42 |
frickler | that's 4G missing | 07:43 |
frickler | so I found an instance of subunit-worker01.openstack.org in error state, 140991b2-b376-4990-aed5-a07ffeb94ea6, launched earlier than the currently running 540b860a-ed52-4307-99ca-9f51103ae3f2 | 08:29 |
frickler | I'm going to remove the errored one, hoping that that will be enough of a quota cleanup | 08:30 |
frickler | that worked fine, confirmed fw patch, merging now | 08:48 |
frickler | ianw: major issue with https://review.openstack.org/528156 seems to be lacking systemd service definition, will try to fix that once I'm done with elasticsearch | 08:56 |
frickler | o.k., so new es04 looks sane to me, starting with migration tasks now | 10:50 |
*** jkilpatr has joined #openstack-sprint | 11:38 | |
frickler | new es04 is active, syncing shards, bbiab | 11:40 |
*** jkilpatr has quit IRC | 11:51 | |
*** jkilpatr has joined #openstack-sprint | 11:52 | |
*** jkilpatr has quit IRC | 12:07 | |
frickler | cluster is green, removing old es04 now and launching new es03 | 12:35 |
frickler | es03 now syncing. puppet runs on trusty nodes seem to take a long time, scanning all es data, won't investigate further as that will soon no longer affect us | 13:21 |
*** clayton has quit IRC | 13:29 | |
*** baoli has joined #openstack-sprint | 14:21 | |
*** clayton has joined #openstack-sprint | 14:22 | |
frickler | old es03 deleted, new es02 launched, waiting for clarkb to start the grand finale now ;) | 14:29 |
clarkb | frickler: to do es02 we just have to update the apache proxy config on logstash.openstack.org | 14:43 |
clarkb | right now we point it at es02 but can point it to any of the others | 14:43 |
clarkb | frickler: you were verifying that https://review.openstack.org/#/c/528101/ applied properly? I am assuming so as you approved the change | 14:51 |
frickler | clarkb: yes, I did. causes a different workflow for changing hosts now, needing a puppet run instead of just an iptables restart | 14:54 |
clarkb | frickler: but seems to be working ok? | 14:55 |
frickler | clarkb: yep, except for the long run time on the old nodes I noted earlier, but that should now be obsolete anyway | 14:56 |
frickler | for changing logstash, is that the "discover_node" entry? | 14:57 |
clarkb | frickler: yes in puppet-logstash/templates/kibana.vhost.erb we do something like RewriteRule ^/elasticsearch/((.*/)?_search)$ http://<%= @discover_nodes[0] %>/$1 [P] | 14:58 |
clarkb | frickler: so I think changing the first element of that list will be what we want | 14:59 |
pabelanger | morning | 15:09 |
clarkb | good morning | 15:10 |
jeblair | good morning! | 15:10 |
*** harlowja has joined #openstack-sprint | 15:11 | |
*** harlowja has quit IRC | 15:13 | |
pabelanger | I confirmed with ttx that design-summit-prep can be deleted | 15:14 |
pabelanger | I'll do that shortly | 15:14 |
frickler | clarkb: https://review.openstack.org/528305 Prepare for replacing elasticsearch02 | 15:15 |
pabelanger | remote: https://review.openstack.org/528306 Delete design-summit-prep node | 15:17 |
pabelanger | easy review for people | 15:17 |
pabelanger | I'll delete the server and dns now | 15:17 |
pabelanger | cacti02.o.o looks to be setup correctly, I'll delete cacti01.o.o now unless people object | 15:35 |
clarkb | no objection here | 15:36 |
clarkb | pabelanger: maybe you can be second review on https://review.openstack.org/#/c/528305/ so that frickler can finish up the elasticsearch cluster upgrades | 15:36 |
pabelanger | looking | 15:37 |
pabelanger | +3 | 15:37 |
clarkb | general note to the channel, can you try and make sure the etherpad is up to date with the work you did around the sprint this week before you sign off for the weekend? I will use that to put together an email summary of what we did | 15:38 |
pabelanger | https://review.openstack.org/528133/1 and https://review.openstack.org/528135/ are also some easy views for clean up of system-config | 15:38 |
pabelanger | reviews* | 15:38 |
pabelanger | I'll start looking into static.o.o, but that will likely need some sort of announcement for an outage | 15:42 |
pabelanger | and we likely don't want to roll that out today on a friday | 15:42 |
clarkb | pabelanger: ya we'll want to look at scheduling the more difficult upgrades around feature freeze/release | 15:43 |
clarkb | (as I expect we won't be getting much done after this week simply due to holidays and all that and then it's into last milestone and feature freeze and all that fun) | 15:44 |
pabelanger | agree | 15:46 |
pabelanger | let me see what else we could finish off today before looking at static.o.o | 15:46 |
pabelanger | I'm going to read up on how to migrate kerberos | 15:48 |
clarkb | pabelanger: see comment on 528135 (mostly just looking to see what others think) | 15:48 |
clarkb | pabelanger: I wrote docs on how to do no downtime kerberos reboots. I expect that will come in to play a little | 15:49 |
pabelanger | clarkb: yah, I seem to remember we did that before with kerberos | 15:49 |
pabelanger | okay, so I think we maybe stand up kdc04, as new standby. Join it to kdc01 and kdc02, confirm it works, then offline kdc02. | 15:51 |
pabelanger | run run-kprop.sh to make kdc04 primary, stand up kdc03 (xenial primary), join to kdc04, delete kdc01, then run-kprop.sh again so kdc03 is new final primary | 15:52 |
pabelanger | clarkb: ^seem right? | 15:52 |
clarkb | pabelanger: it sounds good, but I have little to go on as far as previous experience to know if it is right :) | 15:53 |
clarkb | jeblair: ^ is probably the best person to get input on that | 15:53 |
pabelanger | same | 15:53 |
pabelanger | let me propose puppet patches first to stand up xenial but not join | 15:56 |
pabelanger | then we can finalize order | 15:56 |
clarkb | ok | 15:57 |
jeblair | clarkb, pabelanger: some dns changes may be necessary too? | 16:11 |
jeblair | huh, i think we may have forgotten to document those | 16:12 |
jeblair | i'll catalog those real quick | 16:15 |
pabelanger | okay, thanks | 16:22 |
frickler | clarkb: so I can run the usual puppet apply on logstash.o.o? | 16:22 |
jeblair | pabelanger, clarkb: remote: https://review.openstack.org/528323 Add kerberos / afs dns info | 16:24 |
jeblair | pabelanger, clarkb: if you want to change hostnames, we'll need to update dns | 16:25 |
clarkb | frickler: there is a playbook you can run on puppetmaster that does it | 16:26 |
clarkb | frickler: sorry I am dealing with kids right now but will be back at computer soon | 16:26 |
frickler | clarkb: o.k., I'll be back in about an hour then, too | 16:28 |
pabelanger | remote: https://review.openstack.org/528328 Add kdc04.o.o xenial node | 16:40 |
pabelanger | okay, as I understand, we shouldn't have an issue with multiple slave KDCs with our master | 16:43 |
jeblair | pabelanger: i think so | 17:10 |
clarkb | frickler we use the remote_puppet_adhoc.yaml playbook. It runs by default using hosts * so you have to use ansible-playbook --limit some.fqdn.here to restrict it to just the host you want | 17:10 |
clarkb | frickler: the other thing to keep in mind is it does not update the puppet modules and other git repos | 17:10 |
jeblair | clarkb: doesn't kick.sh do that? | 17:10 |
clarkb | oh kick.sh may | 17:11 |
* clarkb checks | 17:11 | |
jeblair | that's what i usually use | 17:11 |
clarkb | no kick.sh is basically just the same thing as above so I don't think the git repos update | 17:11 |
jeblair | that's what i meant. | 17:11 |
jeblair | irc lag | 17:11 |
clarkb | gotcha ya kick.sh makes it simpler to use | 17:11 |
clarkb | because you just pass the hostname and it does the limit for you | 17:11 |
jeblair | yep | 17:12 |
clarkb | frickler: so system-config/tools/kick.sh some.fqdn.here is simpler | 17:12 |
pabelanger | clarkb: mind a review on https://review.openstack.org/528328/ and parent https://review.openstack.org/528319 | 17:26 |
pabelanger | I believe that will allow us to bring a new slave online | 17:26 |
clarkb | pabelanger: what points afs at kdc*.openstack.org? Might be getting ahead of myself but not seeing that in the change to add the new kdc (or in system-config otherwise) | 17:31 |
pabelanger | clarkb: we'll need some followup DNS changes to bring kdc04.o.o online in dns, so AFS could see it | 17:34 |
clarkb | right but what tells afs to look at kdc04 too? is that just afs config? | 17:35 |
pabelanger | clarkb: that should be https://review.openstack.org/528323/ | 17:35 |
pabelanger | clarkb: I believe it just resolves it via dns | 17:35 |
pabelanger | but I will confirm | 17:35 |
clarkb | oh kerberos uses srv records perfect | 17:35 |
pabelanger | yah | 17:36 |
clarkb | pabelanger: jeblair looks like zuul.openstack.org is still up and running, is that so that it can redirect to zuulv3.openstack.org? | 17:37 |
pabelanger | yah, fungi suggested we might be able to just update DNS now and delete zuul.o.o | 17:38 |
pabelanger | then discuss moving zuulv3.o.o back to zuul.o.o in the future | 17:39 |
*** baoli has quit IRC | 17:48 | |
frickler | clarkb: o.k., so the config seems to have been applied in the meantime, I'd assume I could go on replacing es02 now | 18:01 |
*** baoli has joined #openstack-sprint | 18:03 | |
clarkb | frickler: go for it | 18:03 |
pabelanger | okay, trying kdc04 in ord same location as kdc02 | 18:19 |
frickler | o.k., new es02 running, waiting on 2 shards. logstash.o.o seems to be doing fine | 18:28 |
clarkb | frickler: woot | 18:28 |
clarkb | frickler: if you need to run I can remove the old es02 when cluster goes green | 18:36 |
clarkb | frickler: you don't need to hold off on your weekend for that :) thank you for all the help this week! | 18:36 |
*** baoli has quit IRC | 18:42 | |
*** baoli has joined #openstack-sprint | 18:46 | |
frickler | clarkb: thx, deleted old server now. have a nice weekend, everyone (though I'll probably be back tomorrow anyway ;) | 19:05 |
pabelanger | looks like some systemd issues with kerberos, looking now | 19:16 |
fungi | yeah, my primary concern with deleting the old zuul.o.o instance is if anyone has anything in their homedirs/shell histories they want to grab first | 19:29 |
fungi | i doubt we still care about the logs on it at this point (odds are they've been rotated into oblivion by now anyway) | 19:30 |
*** jkilpatr has joined #openstack-sprint | 20:21 | |
*** baoli has quit IRC | 21:27 | |
clarkb | I shouldn't have anything in my homedir on any of our hosts that I care about | 22:20 |
*** dteselkin has quit IRC | 22:31 | |
*** dteselkin has joined #openstack-sprint | 22:39 | |
clarkb | I've cleaned out the infra netfilter persistent unit on the 3 elasticsearch nodes that got it. Going to abandon that change then work on an email summarizing what we did | 22:39 |
fungi | thanks for summarizing! i'll fix up the irc topics now | 23:21 |
*** ChanServ changes topic to "OpenStack Virtual Sprints, schedule at https://wiki.openstack.org/wiki/VirtualSprints | Channel logs at: http://eavesdrop.openstack.org/irclogs/%23openstack-sprint/" | 23:22 | |
*** rwsu has quit IRC | 23:37 |
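
Note on the quota error frickler reported at 07:42: the "4G missing" figure follows directly from the numbers in the error message, assuming they are in MB (the message itself does not state a unit). A minimal sketch of that arithmetic; the function name `ram_shortfall` is made up for illustration and is not part of any OpenStack tooling:

```python
def ram_shortfall(requested, used, limit):
    """Return how much RAM is missing to satisfy a request (0 if it fits)."""
    available = limit - used
    return max(0, requested - available)


# Figures copied from the error message in the log; unit assumed to be MB.
shortfall_mb = ram_shortfall(requested=61440, used=1094656, limit=1152000)
print(shortfall_mb, "MB short =", shortfall_mb / 1024, "GB")
```

This is why deleting a single errored instance could plausibly free enough quota, as frickler hoped at 08:30: only about 4 GB had to be reclaimed.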
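
Note on the RewriteRule clarkb quoted at 14:58: search requests are proxied to whichever node is first in `discover_nodes`, so reordering that list changes the proxy target. The sketch below only models the rule's regular expression in Python; it is not how Apache evaluates it, and the node names (including whether the entries carry a port) are hypothetical placeholders rather than values from the real config:

```python
import re

# Same pattern as the RewriteRule quoted in the log.
SEARCH_PATH = re.compile(r"^/elasticsearch/((.*/)?_search)$")


def proxy_target(path, discover_nodes):
    """Return the URL a matching request would be proxied to, or None."""
    match = SEARCH_PATH.match(path)
    if match is None:
        return None
    # The template uses only the first element of the list.
    return "http://{}/{}".format(discover_nodes[0], match.group(1))


# Hypothetical node entries, for illustration only.
nodes = ["es-a.example.org:9200", "es-b.example.org:9200"]
print(proxy_target("/elasticsearch/logstash-2017.12.15/_search", nodes))
print(proxy_target("/elasticsearch/_search", list(reversed(nodes))))
```

Requests that do not end in `_search` fall through this rule, which the sketch represents by returning `None`.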