*** jkilpatr has joined #openstack-sprint | 00:09 | |
*** baoli has quit IRC | 03:34 | |
*** skramaja has joined #openstack-sprint | 05:26 | |
*** ianychoi_ is now known as ianychoi | 06:12 | |
frickler | ianw: hi, sorry for being late, is there anything left I could help you with? or are you done for today? | 08:37 |
* frickler is seeing messages of type "< openstackrecheck> Console logs not available after ..." for the first time after weeks again this morning, does anyone know what happened there? | 09:04 | |
ianw | frickler: hey, thanks for the reviews | 10:47 |
ianw | you can pick out any parts, or wait for clarkb etc | 10:48 |
ianw | if you want to jump further into ethercalc, feel free | 10:48 |
ianw | basically, ssh to the testing host 23.253.119.134 and "cd /opt/system-config/production; puppet apply -v --modulepath=modules:/etc/puppet/modules manifests/site.pp" and keep fixing stuff till it works :) there's some notes on the etherpad, for sure our puppet needs to ship a .service file instead of an upstart, for example | 10:50 |
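The iterate-until-it-works loop ianw describes, spelled out as a shell session (the host IP and paths are taken from his message; the login-as-yourself-then-sudo flow is as described later in this log):

```shell
# Log in to the testing host, become root, and apply the production
# manifest; fix whatever failed and re-run until the apply is clean.
ssh 23.253.119.134
sudo -s
cd /opt/system-config/production
puppet apply -v --modulepath=modules:/etc/puppet/modules manifests/site.pp
```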
ianw | i'm off but will jump back in tomorrow! | 10:50 |
frickler | I've looked at your notes for ethercalc and was wondering whether we should do a systemd service file directly | 10:51
frickler | otherwise I can go on with iterating. do I need to become root for that? | 10:52 |
ianw | frickler: we'll want to replace https://git.openstack.org/cgit/openstack-infra/puppet-ethercalc/tree/templates/upstart.erb (and all the stuff that writes that out) with a .service file | 10:54 |
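A minimal sketch of what such a .service replacement might look like. Every path, user name, and option below is a placeholder for illustration, not taken from the actual puppet-ethercalc change; puppet would template this file out in place of templates/upstart.erb:

```shell
# Hypothetical unit file contents; names and paths are assumptions.
sudo tee /etc/systemd/system/ethercalc.service <<'EOF'
[Unit]
Description=Ethercalc collaborative spreadsheet
After=network.target

[Service]
User=ethercalc
ExecStart=/usr/bin/node /opt/ethercalc/app.js
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
# Tell systemd about the new unit, then enable and start it:
sudo systemctl daemon-reload
sudo systemctl enable --now ethercalc
```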
ianw | frickler: yep; for these hosts log in as yourself and sudo -s, for ci hosts you log in as root@ (your key should be deployed, i went through all that today :) | 10:55 |
frickler | ianw: yeah, I'm on the host already, will go ahead and try to build a service definition | 10:56 |
ianw | that'll be step 1 ... the nodejs deployment stuff might need fiddling. i think that will work out common for etherpad too, so that's good. it's just a matter of trying & fixing failures till it works really | 10:58 |
ianw | i thought this would be an easy one :) status.o.o is probably *really* an easy one ... but you never know :) | 10:59 |
frickler | everything looks easy from the outside probably ;) | 11:01 |
*** skramaja_ has joined #openstack-sprint | 11:20 | |
*** skramaja has quit IRC | 11:21 | |
*** skramaja has joined #openstack-sprint | 11:25 | |
*** skramaja_ has quit IRC | 11:25 | |
*** jkilpatr has quit IRC | 11:37 | |
*** ianychoi has quit IRC | 11:37 | |
*** ianychoi has joined #openstack-sprint | 11:50 | |
*** jkilpatr has joined #openstack-sprint | 12:11 | |
*** baoli has joined #openstack-sprint | 13:10 | |
*** clarkb has joined #openstack-sprint | 13:15 | |
clarkb | frickler: dmsimard the puppetmaster:/etc/puppet/hieradata/production git repo is where we keep the root non public hiera data | 13:20 |
frickler | clarkb: so do I connect with my account and use sudo then? | 13:20 |
clarkb | frickler: correct | 13:20 |
clarkb | the reason our email addresses are not public is because we found people were using our puppet modules, which installed our email addresses as the root contact, resulting in us getting their root email | 13:21
frickler | clarkb: can you take a look at elasticsearch in the meantime? seems the cluster is stuck | 13:21 |
frickler | probably since ian did some updates earlier | 13:21 |
clarkb | looks like es02 and es04 did not have their elasticsearch processes running (we don't let them start on boot to give us more control over cluster management) so I started the service on those two hosts | 13:22
clarkb | in general editing this hiera repo is what we'll do to update ssl certs or db credentials and so on | 13:23 |
clarkb | so adding yourself to the sysadmins list is a good first exposure to where that lives and how to update it | 13:23 |
frickler | o.k., so I'll start with that now | 13:23 |
clarkb | and since it's a shared repo, when we edit it we'll usually drop a note in IRC to let others know not to conflict with us | 13:25
clarkb | frickler: looks like you are all done? | 13:27 |
frickler | oh, I'm seeing a note in the log about issues with google mail | 13:27 |
frickler | my address is also gmail rebranded | 13:27 |
frickler | maybe I should set something different from my work email anyway | 13:28
frickler | but for now I'm done with editing, yes | 13:28 |
clarkb | it may not be an issue for you, I think pabelanger's red hat email is gmail too | 13:28 |
clarkb | I personally had problems with gmail and switched away from it though | 13:29 |
clarkb | frickler: dmsimard the next thing I had in mind was to replace a logstash-workerNN.openstack.org node each since those are straightforward to replace and should give us the ability to focus more on process than specific service details | 13:30
frickler | my private stuff is hosted at hetzner.de but I need to move things around there a bit first | 13:30 |
clarkb | frickler: dmsimard if you haven't seen it yet you probably want to start at https://git.openstack.org/cgit/openstack-infra/system-config/tree/launch/README | 13:30 |
clarkb | system-config/launch is our openstack cloud VM launching tool for booting new instances in clouds | 13:31 |
clarkb | when executed from the puppetmaster it can make use of our clouds.yaml on that node making the process fairly straightforward | 13:32 |
clarkb | I personally have a git clone of system-config in my homedir on the puppetmaster that I run that from | 13:32 |
frickler | I noticed that I'm in the admin group but not the puppet group. Is the idea to set this up manually when needed or should this get better automation? | 13:32
clarkb | frickler: I think we've always just run the manual group addition like in that doc, but we probably could automate that instead | 13:33 |
clarkb | if you want to work on a change to automate that I think it would be a good addition | 13:33 |
clarkb | (but maybe for later so we can focus on launch node things now) | 13:34 |
frickler | yeah, I'll put it on my todo list | 13:34 |
*** skramaja has quit IRC | 13:36 | |
frickler | the pip install is failing for me https://git.openstack.org/cgit/openstack-infra/system-config/tree/launch/README#n21 | 13:37 |
clarkb | fun, is it a dependency issue? | 13:37 |
frickler | failing to build multiple wheels | 13:37 |
* clarkb works to reproduce | 13:37 | |
frickler | http://paste.openstack.org/show/628615 | 13:38 |
frickler | that's the full log | 13:38
clarkb | looks like it failed to find Python.h which comes from python-dev | 13:40 |
clarkb | I think this may be a new dependency or we otherwise were able to pull wheels for it in the past? | 13:40 |
*** ianychoi has quit IRC | 13:40 | |
dmsimard | I'm here for around ~30 minutes before I have to afk briefly, going to look at step 0 and bootstrapping | 13:40 |
clarkb | oh wait I see | 13:40 |
clarkb | frickler: we have python 2 dev files installed but not python3 and virtualenv defaulted to python3 for some reason | 13:41 |
clarkb | frickler: I'm testing with `virtualenv -p python2 launch-env` | 13:42 |
Shrews | clarkb: i'm probably going to need the same bootstrapping as frickler and dmsimard | 13:42 |
clarkb | Shrews: good morning, feel free to follow along, ask questions, etc. We have plenty of logstash worker nodes so there should be plenty of room. | 13:43
*** baoli has quit IRC | 13:43 | |
clarkb | Shrews: dmsimard has indicated he is editing the sysadmins list in hiera and since that is a shared git repo we will have to wait for him to indicate completion before you add yourself | 13:43 |
clarkb | Shrews: the file for that is puppetmaster.openstack.org:/etc/puppet/hieradata/production/common.yaml when dmsimard is done | 13:43 |
Shrews | k k | 13:43 |
clarkb | basically you edit and commit as root and sign off on the change with your name in the commit message | 13:44
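As a sketch, the edit-and-commit workflow described here might look like the following; the exact commit-message convention is an assumption beyond "sign off with your name", and the file path is the one given above:

```shell
# Announce in IRC first -- the repo is shared. Then become root on the
# puppetmaster and edit the hiera repo:
sudo -s
cd /etc/puppet/hieradata/production
$EDITOR common.yaml        # e.g. add your address to the sysadmins list
git add common.yaml
git commit -m "Add <yournick> to sysadmins list - <yournick>"
```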
dmsimard | clarkb: I see symlinks from <nickname> to production | 13:44 |
clarkb | dmsimard: yes, that is an artifact of puppet environments | 13:44 |
dmsimard | clarkb: is that a system used to "lock" ? i.e, we grep to see if there is a user doing it ? | 13:44 |
dmsimard | oh, ok | 13:44 |
clarkb | I've personally not used puppet environments any time recently because they are often quite clunky (and I think ansible-puppet may have mostly negated their usefulness by local applying everything) | 13:45
clarkb | Instead I do my best to run puppet locally until I'm happy with it (which is probably a better way to do things anyways) | 13:45 |
clarkb | frickler: yes virtualenv -p python2 launch-env seemed to work | 13:45 |
clarkb | frickler: that forced virtualenv to make the env with python2 instead of python3 | 13:45 |
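The fix in full, as a hedged sketch (the package set installed into the env is inferred from the launch README discussion above):

```shell
# virtualenv defaults to the interpreter it was installed under
# (python3 here), so request python2 explicitly:
virtualenv -p python2 launch-env
./launch-env/bin/python --version   # should report a 2.7.x interpreter
./launch-env/bin/pip install ansible shade
```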
dmsimard | clarkb: do people typically remain as their own user or they sudo as root ? i.e, I'd want to move to /etc/puppet/hieradata | 13:45 |
clarkb | dmsimard: I think it's a mix. I know pleia2 for example was really good about always sudoing everything and never properly becoming root. I came from an env where we didn't have sudo and only had proper root, so I end up as proper root more often than is probably good | 13:46
frickler | clarkb: confirmed, ansible installed fine now | 13:47 |
dmsimard | oh wow, nano as default editor on git commit.. that's something I haven't seen in a long time | 13:47 |
dmsimard | :D | 13:48 |
frickler | dmsimard: I stumbled over that too ;) | 13:48
clarkb | dmsimard: I think that is how we've avoided the vi(m) vs emacs battle :P | 13:48 |
clarkb | frickler: great, I'll push a patch up for that now and then add you two to the infra root gerrit group so you can review it for me :) | 13:48 |
dmsimard | Shrews: I'm done editing hieradata | 13:48 |
Shrews | dmsimard: k | 13:49 |
Shrews | dmsimard: i noticed you didn't sign your commit. want to amend before I change anything? | 13:50 |
dmsimard | Shrews: let me see.. | 13:51 |
clarkb | frickler: https://review.openstack.org/527092 and I will have gerrit groups updated momentarily | 13:51 |
dmsimard | Shrews: by sign you mean append my nickname to the commit description ? | 13:51 |
dmsimard | Shrews: or gpg sign ? | 13:52 |
Shrews | dmsimard: just a nick in the commit msg | 13:52 |
dmsimard | Shrews: ok, I added it | 13:52 |
dmsimard | Shrews: er, hang on.. | 13:52 |
clarkb | 13:50:23 Shrews | dmsimard: k | 13:53 |
clarkb | silly weechat mouse mode | 13:54 |
dmsimard | It's picking up the author as "Your Name <you@example.com>" | 13:54 |
dmsimard | ¯\_(ツ)_/¯ | 13:54 |
dmsimard | fixing that | 13:54 |
dmsimard | Shrews: ok, go | 13:55 |
clarkb | frickler: dmsimard you have been added to the infra-core group in gerrit. So you can now +/-2 +/-A changes like https://review.openstack.org/527092 | 13:56 |
frickler | clarkb: already done ;) | 13:57 |
clarkb | dmsimard: you'll want to read https://git.openstack.org/cgit/openstack-infra/system-config/tree/launch/README next and follow the steps through to line 21 (but using my edit in change 527092) | 13:57 |
Shrews | alright, done. i don't have any option but a gmail address, so *shrug* if there's a problem with that | 13:57 |
clarkb | Shrews: ^ you'll want to follow that too | 13:57 |
dmsimard | you'd think that "virtualenv" would be py2 and "virtualenv-3.4" would be py3 :D | 13:58 |
clarkb | ya I'm not sure why it's picking python3 yet | 13:58
clarkb | I think because it got installed under python3? | 13:58 |
dmsimard | ahhhh | 13:59 |
dmsimard | The default is the interpreter that virtualenv was installed with (/usr/bin/python3) | 13:59 |
clarkb | I'm going to make tea while everyone makes virtualenvs | 14:00 |
* Shrews is now virtual | 14:01 | |
dmsimard | Those are some kind of old versions of shade and ansible by now -- Ansible 2.1 is EOL actually. Are they pinned for a good reason ? | 14:01 |
clarkb | dmsimard: they are pinned because releases of both tend to break things. I'm not sure that they are pinned to those specific versions for a good reason though | 14:02
clarkb | I expect that ansible 2.3 would work as well | 14:02 |
dmsimard | clarkb: yeah that's totally fair, I would up the pin. I'll guinea pig ? | 14:02 |
clarkb | dmsimard: maybe after the first round so that we can hopefully avoid problems first time through? | 14:03 |
dmsimard | sure | 14:03 |
clarkb | when we upgrade nodes typically what that actually means is replacing the instance with a new instance running newer software | 14:04 |
clarkb | I only know of one case where we upgraded in place, which was the lists.openstack.org upgrade, and we did that to keep the IP and its reputation for sending email | 14:05
clarkb | to upgrade logstash worker nodes we will be using the replace method | 14:05 |
clarkb | So the next step is looking at the old instance(s) to see what flavor/size/distro we need `openstack --os-cloud openstackci-rax --os-region DFW server show logstash-worker01.openstack.org` should be runnable as a normal user on puppetmaster to give you that info | 14:06 |
clarkb | in this case we see the flavor is 'performance1-4' and it is indeed a trusty node so we will want to replace it with a 16.04 xenial node | 14:06 |
Shrews | aye | 14:07 |
dmsimard | clarkb: should we grab a copy of clouds.yaml from root and put it in our home directory ? | 14:07 |
clarkb | dmsimard: no, you should probably use the root copy it should be readable by your user | 14:07 |
* dmsimard looks | 14:08 | |
clarkb | dmsimard: the root copy is the default for openstack client and this way we can keep it up to date more easily | 14:08 |
clarkb | you can also do things like flavor list and image list to get a sense of what flavors and images are available | 14:08 |
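For example, building on the server show command pasted above (these use the root-maintained clouds.yaml that openstackclient picks up by default on the puppetmaster):

```shell
# Inspect the server being replaced, then survey available flavors
# and images in the same cloud/region:
openstack --os-cloud openstackci-rax --os-region DFW \
    server show logstash-worker01.openstack.org
openstack --os-cloud openstackci-rax --os-region DFW flavor list
openstack --os-cloud openstackci-rax --os-region DFW image list | grep -i xenial
```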
clarkb | one piece of information that the launch README doesn't really call out that is probably worth being more explicit about is that we have two tenants/users/projects/whateveritscalledtoday | 14:09 |
clarkb | we have the openstackci account and the openstackjenkins/openstackzuul account. openstackci is where we run the control plane servers and openstackjenkins/openstackzuul is what we give nodepool access to | 14:09 |
Shrews | yeah, that's kinda important | 14:10 |
clarkb | in this case we are using the openstackci account because logstash workers are in the control plane but when you work with nodepool nodes you will use the openstackzuul/openstackjenkins account | 14:10 |
dmsimard | clarkb: yeah I guess that's why I was asking for the clouds.yaml -- in order to use openstackclient "freely" | 14:10 |
clarkb | dmsimard: you should be able to use it freely already | 14:11 |
clarkb | does the command I pasted above work for you? | 14:11 |
clarkb | (it should work as is) | 14:11 |
Shrews | i don't see the other account(s) in clouds.yaml | 14:11 |
Shrews | oh, all-clouds.yaml has them | 14:12 |
clarkb | Shrews: oh that brings up another important piece of info. We have two clouds.yaml files: the default file only has control plane stuff and then there is all-clouds.yaml which you can set with an env var for everything | 14:12
Shrews | *nod* | 14:12 |
dmsimard | clarkb: what is this magic, are we created as uid 0 ? | 14:12 |
clarkb | the reason for this is the ansible-puppet things use the default file and we don't want it attempting to puppet nodepool nodes | 14:12
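A sketch of switching between the two files. OS_CLIENT_CONFIG_FILE is the standard os-client-config override variable, but the all-clouds path and the nodepool cloud name used here are inferred from the conversation, not verified:

```shell
# Default: control-plane clouds only, from /etc/openstack/clouds.yaml
openstack --os-cloud openstackci-rax --os-region DFW server list

# Point os-client-config at the expanded file to also reach the
# nodepool (openstackzuul/openstackjenkins) accounts -- cloud name
# below is a hypothetical example:
export OS_CLIENT_CONFIG_FILE=/etc/openstack/all-clouds.yaml
openstack --os-cloud openstackjenkins-rax server list
```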
*** baoli has joined #openstack-sprint | 14:12 | |
clarkb | dmsimard: I think its just group membership | 14:12 |
dmsimard | clarkb: huh, I totally expected osc to seek in ~/.config, not ~/root/.config | 14:13 |
clarkb | dmsimard: ya group admin gets rw access to the file | 14:13 |
dmsimard | well, wfm | 14:13 |
clarkb | dmsimard: it is actually looking at /etc/openstack/clouds.yaml | 14:13 |
Shrews | dmsimard: shade (or occ, rather) will look in /etc/openstack and ~/.config | 14:13 |
Shrews | part of occ magic | 14:13 |
dmsimard | clarkb: ohhhhhh, yeah /etc/openstack totally makes more sense than my confused explanation | 14:14 |
Shrews | os-client-config for non-shorthand | 14:14 |
clarkb | so now we should all pick a unique logstash-workerNN NN value then we can start running some boots | 14:14 |
* frickler picks 01 | 14:14 | |
* Shrews picks 02 | 14:15 | |
dmsimard | I have to afk briefly but I'm all set up, I'll pick a number when I'm back | 14:15 |
dmsimard | Let's use the pad to keep up with who's doing what | 14:15 |
clarkb | dmsimard: ++ to using etherpad to track | 14:15 |
dmsimard | pad is here: https://etherpad.openstack.org/p/infra-sprint-xenial-upgrades | 14:15 |
dmsimard | ok, brb | 14:16 |
Shrews | clarkb: and we use launch-node.py, right? | 14:16 |
frickler | clarkb: what does our quota look like? do we need to check before launching new servers? | 14:17
clarkb | Shrews: correct | 14:17 |
clarkb | frickler: I don't actually know but we can ask openstackclient for that info (or we can just execute the command and if we don't have enough quota it will fail fast) | 14:17
clarkb | before we start though a few more things | 14:17 |
Shrews | clarkb: value for $FQDN can be the same as the thing we are replacing? | 14:18 |
clarkb | since this is a base distro image upgrade we should be careful to explicitly set the image name we want. Also make sure the flavor matches the old server's | 14:18
clarkb | Shrews: yes | 14:18 |
* clarkb will make a quick paste for what the commands should look like in this specific case | 14:18 | |
Shrews | a3b50a75-2fe0-437a-bf7a-04c2cf0adf4c | Ubuntu 16.04 LTS (Xenial Xerus) (PVHVM) | 14:19 |
clarkb | ya, something like http://paste.openstack.org/show/628623/ | 14:20 |
clarkb | replacing the NN with your chosen value | 14:21 |
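The paste itself is not preserved in this log; as a guess at its shape, using the flavor and image names mentioned above (the exact launch-node.py flags may differ from this sketch):

```shell
# Run from a system-config checkout on the puppetmaster; NN is your
# chosen worker number. Flag names here are assumptions.
cd ~/system-config/launch
./launch-node.py logstash-workerNN.openstack.org \
    --cloud openstackci-rax --region DFW \
    --flavor performance1-4 \
    --image "Ubuntu 16.04 LTS (Xenial Xerus) (PVHVM)"
```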
clarkb | also I tend to run this in screen | 14:21 |
clarkb | some server builds take longer than expected and being able to close the laptop is nice | 14:21 |
Shrews | oh, yeah. that's a good tip | 14:22 |
dmsimard | Oh yay other screen users | 14:22 |
* dmsimard needs to learn tmux | 14:22 | |
Shrews | especially since i have a chiro appointment soon | 14:22 |
clarkb | frickler: Shrews but ya I think you can go ahead and run that whenever you are ready | 14:22 |
clarkb | in this specific case the server we are bringing up is largely stateless and will start its life firewalled off from the rest of the cluster so very little to worry about :) | 14:23 |
* Shrews launching | 14:24 | |
* frickler is launching too and will be back in a couple of minutes | 14:24 | |
* fungi sprints in, very late | 14:26 | |
fungi | i'll get something good in the channel topic in just a sec | 14:27 |
fungi | didn't we have an ml thread discussing this? was it just in meetings? | 14:28 |
fungi | i guess i can link the 'pad | 14:28 |
clarkb | there is a ml thread too but I think the etherpad is likely most useful | 14:29 |
*** ChanServ changes topic to "OpenStack Infra team Xenial upgrade sprint | Coordinating at https://etherpad.openstack.org/p/infra-sprint-xenial-upgrades" | 14:31 | |
clarkb | Shrews: frickler let me know when that completes (there should be a bunch of information about dns related items which we'll talk about next once we have that info) | 14:32 |
Shrews | ooh exception | 14:33 |
clarkb | woo fun | 14:33 |
fungi | the launch script raises an exception if puppet (or anything really) fails during the process | 14:34 |
Shrews | http://paste.openstack.org/show/628626/ | 14:34 |
clarkb | ok I think I actually know what this bug is | 14:35 |
clarkb | I expect we'll be seeing a lot of this one because systemd | 14:35 |
* fungi shakes fist at systemd | 14:35 | |
clarkb | well systemd and puppet. The problem (I think) is that we use sys V init scripts which systemd supports but you have to reload its config for it to find them | 14:35 |
clarkb | puppet does not reload this config for us automatically so we'll need to add some puppetry to do that | 14:35 |
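Done by hand, the missing step is a daemon-reload between installing the init script and starting the service. A sketch of the failure mode, with a hypothetical script name standing in for the logstash worker's:

```shell
# Install a SysV init script on a systemd host (hypothetical path):
sudo cp log-worker.init /etc/init.d/log-worker
# Without this reload, systemd does not know the new unit exists and
# "service log-worker start" fails -- this is the step the puppetry
# needs to add:
sudo systemctl daemon-reload
sudo service log-worker start
```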
clarkb | I've done this before for zanata let me dig up that change | 14:36 |
dmsimard | ok, I'm back | 14:36 |
fungi | for basically any service we're adding custom initscripts with i suppose | 14:36 |
clarkb | fungi: ya | 14:36 |
* Shrews has to step away a sec. brb | 14:36 | |
fungi | i guess the distro packages use maintscripts to register their initscripts with systemd | 14:36 |
dmsimard | fungi: that was the thread for the sprint: http://lists.openstack.org/pipermail/openstack-infra/2017-November/005702.html | 14:36 |
*** mrhillsman has joined #openstack-sprint | 14:37 | |
clarkb | fungi: ya and puppet's excuse is that this is how you are supposed to use puppet | 14:37
fungi | dmsimard: ahh, back in november. no wonder i wasn't spotting it | 14:37 |
clarkb | we basically need to add the code that was removed in https://review.openstack.org/#/c/423369/3/manifests/wildfly.pp to the puppet for logstash workers | 14:39 |
clarkb | (it was removed in ^ because an external dep solved the problem for us, but we don't have external deps for logstash workers like that so we'll carry it ourselves) | 14:39 |
clarkb | does someone else want to work on that change or should I? | 14:39 |
clarkb | Shrews: frickler another note, by default launch-node.py will clean up after itself on failure so you shouldn't need to do anything special here | 14:40
fungi | and if you need it _not_ to clean up after itself, add --keep | 14:41 |
dmsimard | clarkb: I can send a patch. | 14:41 |
clarkb | dmsimard: cool I think you want to edit worker.pp in puppet-log_processor repo | 14:41 |
fungi | i'm starting on subunit-worker01 (to replace subunit-worker02) since i had actually started trying to boot it on xenial a month or so ago and then got sidetracked by other stuff | 14:42 |
fungi | odds are i'll want to copy dmsimard's patch for that | 14:42 |
clarkb | yup | 14:42 |
fungi | should we patch tools/launch-node.py to switcn the default image to 'Ubuntu 16.04 LTS (Xenial Xerus) (PVHVM)' now? | 14:45 |
fungi | s/switcn/switch/ | 14:45 |
clarkb | fungi: probably a good idea to prevent regressions launching new servers or mistakes if we forget to specify the image | 14:45 |
fungi | patch on the way then | 14:46 |
fungi | do we have a review topic we're using? | 14:46 |
dmsimard | should we use a topic for sprint patches ? | 14:46 |
dmsimard | wow, fungi beat me to it :) | 14:47 |
fungi | heh | 14:47 |
fungi | let's use topic:xenial-upgrades | 14:47 |
dmsimard | wfm | 14:47 |
dmsimard | https://review.openstack.org/527109 is up for logprocessor | 14:47 |
frickler | ianw started with topic infra-xenial already | 14:48 |
fungi | ahh, i'll adjust accordingly. now i see it in the notes section of the pad | 14:48 |
fungi | totally missed it earlier | 14:48 |
clarkb | dmsimard: +2 | 14:48 |
frickler | couple of patches that could be reviewed there already https://review.openstack.org/#/q/status:open+topic:infra-xenial | 14:49 |
dmsimard | ok /me switches topic | 14:49 |
Shrews | So once that lands to the puppet-log_processor repo, do we need to update a repo on puppetmaster, or is that done automatically by the launch script? | 14:49 |
clarkb | Shrews: the puppet modules are updated by the ansible run puppet cron. Which runs every 15 minutes but due to how long it takes to get through effectively runs every 45 minutes | 14:50 |
clarkb | in this case I think we can go ahead and update the git repo early to speed up the process | 14:51 |
dmsimard | frickler: oh your comment on https://review.openstack.org/#/c/515279/ .. I remember writing a blog post exactly for stuff around those lines when 14.04 came out | 14:51 |
fungi | heh, pabelanger already beat me to https://review.openstack.org/502856 "Bump default image to xenial to launch-node.py" | 14:52 |
fungi | so we can already skip specifying --image | 14:52 |
dmsimard | clarkb: it looks like https://review.openstack.org/#/c/515279/ would save us some trouble | 14:54 |
clarkb | reading now | 14:55 |
Shrews | clarkb: so I'm not seeing another logstash-worker02 in the server list. i guess the process didn't get far enough to create it | 14:55 |
Shrews | or it automatically deleted it? | 14:56 |
*** jeblair has joined #openstack-sprint | 14:56 | |
fungi | likely the latter | 14:56 |
clarkb | Shrews: it automatically deleted it | 14:56 |
Shrews | maybe i should just look at the code :) | 14:56 |
clarkb | launch-node tries to be helpful that way | 14:56 |
fungi | if the launch fails for any reason then the script will delete the instance | 14:56 |
Shrews | yay | 14:56 |
fungi | unless you specify --keep and then you can use the temporary root ssh key for that uuid in /tmp to log into it if you need to investigate it directly | 14:57 |
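A sketch of that debugging flow; the key path pattern under /tmp is paraphrased from the description above and not verified, and the trailing flags are elided as before:

```shell
# Keep the half-built instance around on failure for inspection:
./launch-node.py logstash-workerNN.openstack.org --keep ...
# Then log in with the one-off root key launch-node left behind
# (placeholder path -- the real name embeds the server uuid):
ssh -i /tmp/<server-uuid> root@<server-ip>
```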
*** baoli has quit IRC | 14:57 | |
clarkb | fungi: are you willing to be second reviewer on https://review.openstack.org/#/c/527109/1 ? | 14:57 |
Shrews | fungi: *nod* thx | 14:57 |
*** baoli has joined #openstack-sprint | 15:00 | |
*** baoli has quit IRC | 15:00 | |
*** pabelanger has joined #openstack-sprint | 15:01 | |
pabelanger | o/ | 15:01 |
pabelanger | running a little behind this morning | 15:01 |
pabelanger | just getting coffee and will start reviewing changes that are up | 15:02 |
fungi | subunit-worker01 puppet-user[11998]: (/Stage[main]/Subunit2sql/Package[subunit2sql]/ensure) change from absent to latest failed: Could not update: Execution of '/usr/local/bin/pip install -q --upgrade subunit2sql' returned 1: Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-9bVQMS/netifaces/setup.py';f=getattr(tokenize, 'open', | 15:03 |
fungi | open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-wuw6O6-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-9bVQMS/netifaces/ | 15:03 |
frickler | not directly related but adding me to accessbot could also use another review https://review.openstack.org/526125 | 15:03 |
fungi | looks like building netifaces from sdist is failing when attempting to install subunit2sql | 15:03 |
*** baoli has joined #openstack-sprint | 15:04 | |
clarkb | fungi: I wonder if that is due to the same issue we had with virtualenv on the puppetmaster (using python3 instead of 2) | 15:04 |
fungi | ahh, right, this is the case where we want to override --upgrade-strategy for pip | 15:04 |
dmsimard | frickler: ah I guess I should do that too. | 15:04 |
fungi | it's calling /usr/bin/python according to that message, so should be python 2.7 | 15:05 |
clarkb | oh ya | 15:05 |
* clarkb pops out for a few, brb | 15:06 | |
fungi | i think it's just trying to upgrade to a later netifaces than the distro package because it sees that what's on pypi is newer (even though what's already installed is sufficient) | 15:06 |
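The override fungi refers to, sketched as a bare pip invocation (only-if-needed is a real pip upgrade strategy; wiring it into the puppet module is what the change discussed below this point does):

```shell
# Keep the already-installed distro netifaces instead of rebuilding it
# from sdist when pulling in subunit2sql:
pip install --upgrade --upgrade-strategy only-if-needed subunit2sql
```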
pabelanger | cool, looks like logstash-workers have been started | 15:06 |
pabelanger | I'm going to delete puppetdb.o.o and puppetdb01.o.o, and clean up system-config | 15:07 |
Shrews | ALL: I have to step out for a chiropractor appointment now. I'll catch up on things when I return. Shouldn't be long. | 15:07 |
fungi | didn't we have a change up to do --upgrade-strategy=only-if-needed for pip in some module recently? codesearch isn't turning it up for me so maybe hasn't merged yet? | 15:09 |
clarkb | fungi: yes, ara in puppet-zuul iirc | 15:10 |
fungi | thanks, finding | 15:10 |
clarkb | doesn't look like it merged though | 15:10 |
clarkb | fungi: https://review.openstack.org/#/c/516740/ yup not merged yet | 15:11 |
pabelanger | remote: https://review.openstack.org/449167 Remove puppetdb / puppetboard server | 15:12 |
fungi | annoying that gerrit message searches unconditionally replace hyphens with spaces so you can't search for strings containing hyphens | 15:12 |
pabelanger | clarkb: fungi: any objections to deleting puppetdb^/puppetboard above? it is still precise | 15:12
pabelanger | err | 15:12 |
pabelanger | yah, precise | 15:13 |
clarkb | for some reason I thought that was already done so no objection from me | 15:13 |
fungi | pabelanger: by all means | 15:13 |
pabelanger | okay, done | 15:17 |
pabelanger | updating etherpad | 15:17 |
dmsimard | pabelanger, clarkb: that reminds me.. i'll take the opportunity of the sprint week to write the draft for continuous deployment dashboard to replace puppetboard | 15:20 |
clarkb | waiting on gating for the log processor fix was clearly a missed opportunity to make breakfast | 15:21 |
dmsimard | yeah it hasn't passed check yet | 15:22 |
* dmsimard starts working on draft | 15:22 | |
pabelanger | dmsimard: great | 15:22 |
pabelanger | okay, working on tripleo mirror now, going to ping them for a larger flavor. 100GB is the max listed right now | 15:23 |
dmsimard | pabelanger: yeah good idea | 15:23 |
pabelanger | I also think we might now be able to move mirror-update.o.o into a zuulv3 job and periodic pipeline (may have to create one) | 15:25
clarkb | I'm worried that the log processor fix for centos 7 is out to lunch | 15:28 |
clarkb | er the centos7 job is | 15:28 |
clarkb | we may have to recheck it and if that happens I am making breakfast | 15:29 |
dmsimard | clarkb: should we check out the patch locally ? | 15:31 |
dmsimard | or wait it out ? | 15:31 |
clarkb | I think we should wait it out, if the job doesn't go out to lunch it runs fairly quickly and this way we can't lose track of where we have or haven't fixed this particular systemd/xenial thing | 15:32 |
dmsimard | ack. | 15:32 |
clarkb | (also its not an emergency) | 15:32 |
dmsimard | indeed. | 15:33 |
jeblair | pabelanger: let's not tackle mirror-update right now. i think it will take some work, and just replacing the server will be easier. | 15:33 |
pabelanger | sure | 15:37 |
jeblair | what data does grafana require be migrated? | 15:38 |
pabelanger | jeblair: for AFS services, we can join the new (xenial) servers to the existing AFS cells right? Then after some sync process retire the original trusty based servers? | 15:38 |
jeblair | pabelanger: depends on the servers -- can you be more specific | 15:39 |
pabelanger | jeblair: sure, afsdb01/afsdb02 right now. Could we bring online afsdb03 and join the existing? | 15:40 |
pabelanger | jeblair: I think we'd need to update puppet-grafana in system-config to work on xenial; it is also possible we might need to patch grafyaml too. I think they changed some of the APIs in newer versions. | 15:41
jeblair | pabelanger: yes -- i forget off the top of my head how to tell it to join, but we should be able to tell it to sync its data from the others, then remove them. | 15:41 |
pabelanger | okay cool | 15:42 |
jeblair | pabelanger: okay, that's not a data migration though... | 15:42 |
jeblair | there's a lot of servers under "these servers require data to be migrated" which i don't think require data to be migrated | 15:43 |
pabelanger | Yah, I might have put it there by mistake. We shouldn't need any data because of grafyaml | 15:43 |
*** baoli has quit IRC | 15:45 | |
*** baoli has joined #openstack-sprint | 15:45 | |
clarkb | dmsimard: frickler Shrews ok if someone can recheck that change when the centos7 job finally times out I am going to make breakfast (there are plenty of other roots around now to answer questions and walk through the process, so feel free to ping them too) | 15:47
dmsimard | k | 15:48 |
pabelanger | I'm going to start on eavesdrop01.o.o replacement | 15:49 |
*** baoli has quit IRC | 15:50 | |
pabelanger | IIRC, we'll need to migrate the volume between servers | 15:50 |
pabelanger | clarkb: mind a +3: https://review.openstack.org/449167/ | 15:51 |
*** baoli has joined #openstack-sprint | 15:57 | |
fungi | pabelanger: yeah, /dev/mapper/main-meetbot seems to be on a cinder volume | 15:58 |
pabelanger | yah | 15:58 |
pabelanger | remote: https://review.openstack.org/527139 Update eavesdrop.o.o to support xenial | 15:58 |
pabelanger | reworks eavesdrop.o.o to support numeric hosts | 15:59 |
pabelanger | and ups our testing to start on xenial | 15:59 |
frickler | so I have a patch to make puppet-ethercalc work on xenial. question is: do we need to keep it backwards compatible for < xenial at the same time? or can we avoid a lot of extra code and target only xenial/systemd-based hosts? | 16:03 |
clarkb | looks like the log processor fix is queuing the centos7 job again so we may not need a recheck after all | 16:08 |
clarkb | frickler: I think it best to keep support for both | 16:08 |
clarkb | frickler: makes the upgrade process (replacing servers) a little simpler | 16:09 |
frickler | clarkb: hmm, I just submitted the xenial-only version, will update later: https://review.openstack.org/527144 Update to work on Ubuntu Xenial or newer | 16:09 |
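For context, this is the shape of the systemd unit that replaces puppet-ethercalc's upstart template. A minimal sketch only: the user, paths, and ExecStart are illustrative assumptions, not the contents of the change under review (527144).

```
[Unit]
Description=Ethercalc collaborative spreadsheet server
After=network.target

[Service]
User=ethercalc
Environment=NODE_ENV=production
ExecStart=/usr/local/bin/ethercalc
Restart=on-failure

[Install]
WantedBy=multi-user.target
```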
clarkb | pabelanger: done | 16:13 |
pabelanger | clarkb: frickler: Yah, that one doesn't look too bad to support both. for ethercalc | 16:13 |
pabelanger | clarkb: danke | 16:13 |
*** baoli has quit IRC | 16:15 | |
pabelanger | I'm going to run into town for a quick errand / lunch. But have 2 servers in my name | 16:16 |
pabelanger | I also added a 'Bug Fixes' section to https://etherpad.openstack.org/p/infra-sprint-xenial-upgrades so we can quickly identify things we need to merge | 16:16 |
pabelanger | we should also pick a topic to make it easier, if somebody wants to do so | 16:16 |
pabelanger | should be back in 45mins | 16:16 |
clarkb | there is a topic for the sprint already, not sure if we need another for bugfixes? | 16:17 |
dmsimard | let's use the same topic ? | 16:22 |
dmsimard | we have infra-xenial right now | 16:22 |
*** baoli has joined #openstack-sprint | 16:27 | |
*** baoli has quit IRC | 16:29 | |
*** baoli has joined #openstack-sprint | 16:29 | |
* jeblair looks up grafana stats on cacti | 16:33 | |
jeblair | grafana has like no cpu or memory usage. i think we can shrink the flavor | 16:33 |
jeblair | the 1 year max used ram is 771M (!) | 16:34 |
clarkb | ++ to shrinking flavor | 16:35 |
jeblair | the load average is 0.016 | 16:35 |
jeblair | 1 year max | 16:35 |
jeblair | 2G then? | 16:35 |
clarkb | what is it now? | 16:36 |
clarkb | but ya thats double max ram usage which seems like safe overhead | 16:37 |
jeblair | 8G | 16:37 |
clarkb | 2G sounds good to me | 16:37 |
jeblair | http://cacti.openstack.org/cacti/graph.php?action=zoom&local_graph_id=2715&rra_id=4&view_type=&graph_start=1479957086&graph_end=1513010270 | 16:38 |
jeblair | we will need to reboot it at least once every 2 years i think. | 16:38 |
frickler | pabelanger: clarkb: next question then, may also affect other upgrades: do we want to continue piping the output to log files like here for backwards compatibility? or can we use native systemd/journald log handling? http://git.openstack.org/cgit/openstack-infra/puppet-ethercalc/tree/templates/upstart.erb#n26 | 16:39 |
clarkb | frickler: I'm good with journald, the one thing we should check on that before we commit to it though is whether or not journald is logging persistently on our nodes | 16:40 |
clarkb | should get infra-root's larger opinion too | 16:40 |
* Shrews catches up | 16:42 | |
fungi | as long as stuff gets logged *somewhere* and i can find it, i'm fine | 16:42 |
clarkb | it doesn't look like journald is currently logging persistently on ubuntu fwiw | 16:42 |
clarkb | we can address that though | 16:43 |
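Making journald log persistently is a one-line config change; a sketch, assuming stock Ubuntu defaults (under the default Storage=auto, journald only persists when /var/log/journal already exists):

```
# /etc/systemd/journald.conf
[Journal]
Storage=persistent
```

Equivalently, creating /var/log/journal and restarting systemd-journald makes the default Storage=auto persistent without editing the file.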
jeblair | me too. that hasn't been my experience with journald to date, but if someone's willing to go on a limb and guarantee that, i'm fine with it. :) | 16:43 |
clarkb | Shrews: zuul had a burp on one job for the systemd fix so we are still waiting for that to merge, but its close now | 16:44 |
Shrews | ah, then i haven't missed much fun | 16:45 |
Shrews | looks like it just merged | 17:00 |
frickler | so https://review.openstack.org/527109 merged, do we need to update puppetmaster or is there a cron? | 17:00 |
clarkb | frickler: there is a cron but its a "slow" one. The cron that updates those puppet modules is our main run puppet with ansible cron job | 17:00 |
clarkb | frickler: Shrews dmsimard you can check where that cron is by looking at /var/log/puppet_run_all.log on the puppetmaster | 17:01 |
clarkb | it looks like it just started at 1700UTC which I think means it will have just updated the module for us | 17:02 |
clarkb | frickler: Shrews dmsimard you can confirm this by running git log at puppetmaster:/etc/puppet/modules/log_processor | 17:02 |
clarkb | once you've poked at those items and have convinced yourselves that I actually did check before making those claims :) I think we can go ahead and try the new instance boot again | 17:03 |
Shrews | yuppers | 17:03 |
Shrews | i used the wait time to setup my tmux properly | 17:04 |
* clarkb migrates into the office now that the kids are awake | 17:06 | |
Shrews | is this something to be concerned about? http://paste.openstack.org/show/628639/ | 17:11 |
Shrews | it seems we progressed past that, but happened to notice it in the output | 17:11 |
clarkb | pabelanger: I think ^ may be related to your host removal work | 17:11 |
clarkb | Shrews: my guess is that the host deletions pabelanger has been doing have resulted in some groups defined that don't match any instances | 17:12 |
clarkb | pabelanger: is that something you can look into? | 17:12 |
clarkb | if that is the cause then I don't think we need to worry about it | 17:12 |
frickler | anyway it failed again for me, will retry with --keep for better debugging, not sure about the failure from the log | 17:14 |
clarkb | frickler: can you share the log? | 17:14 |
jeblair | 17:12 < openstackgerrit> James E. Blair proposed openstack-infra/system-config master: Support xenial on health https://review.openstack.org/527169 | 17:14 |
jeblair | 17:14 < openstackgerrit> James E. Blair proposed openstack-infra/system-config master: Support xenial on stackalytics https://review.openstack.org/527171 | 17:14 |
jeblair | since the first step is to update the node selector and the node-os comment in site.pp, and then wait for that to gate, is there any reason we shouldn't do a bunch of those ahead of time ^ ? | 17:15 |
clarkb | jeblair: probably not, just split them up so that failures can be debugged individually | 17:15 |
jeblair | clarkb: ya, i've pushed up 3 all based on tip so far | 17:16 |
frickler | http://paste.openstack.org/show/628641/ is the tail of it, neglected to tee all of it | 17:16 |
clarkb | frickler: ya may need --keep or a bigger scrollback buffer to see why puppet is unhappy | 17:18 |
clarkb | some dependency for logrotate failed looks like | 17:19 |
frickler | ya, need to amend my tmux settings to have more scrollback and searching | 17:19 |
clarkb | maybe its a new package name or different dir path for that config? | 17:19 |
jeblair | just so we're really clear, i'm pushing up a bunch of changes, but i don't plan on doing all these servers, i'm just trying to save time so that the initial step (with a bunch of waiting) is already done. please grab/update/abandon my changes as needed as you work on servers. | 17:20 |
frickler | clarkb: /tmp/launch-log on puppetmaster is the complete log now, instance is kept for checking | 17:25 |
clarkb | frickler: it's looking like the reload for systemctl isn't finding the sys v compat scripts? maybe permissions or something is wrong with them? | 17:27 |
clarkb | frickler: running the systemctl reload in the foreground may have more details? possibly also list-units? | 17:28 |
clarkb | I need to pop out again to help with kids now that they are awake. Back in a bit. Look forward to seeing what you find out | 17:30 |
frickler | clarkb: http://paste.openstack.org/show/628645/ looking deeper into the service definitions now | 17:32 |
Shrews | i know less about puppet than anybody, but there is this in that log: | Dec 11 17:20:36 logstash-worker01 puppet-agent[10308]: Could not run: SIGTERM | 17:32 |
clarkb | Shrews: that is expected since we are puppet apply only I think. That happened as a result of the puppet agent stop I think | 17:34 |
dmsimard | frickler: the daemon reload isn't working | 17:35 |
dmsimard | frickler: (/Stage[main]/Openstack_project::Logstash_worker/Log_processor::Worker[B]/Service[jenkins-log-worker-B]/enable) change from false to true failed: Could not enable jenkins-log-worker-B: | 17:35 |
frickler | ya, fix upcoming | 17:35 |
jeblair | could folks +3 https://review.openstack.org/527168 please? | 17:36 |
dmsimard | jeblair: do we actually have different grafana numbered nodes ? | 17:36 |
jeblair | dmsimard: not yet -- we're transitioning all of the hosts to numbered hosts so it's easier to replace them | 17:37 |
dmsimard | jeblair: makes sense | 17:37 |
jeblair | dmsimard: so the replacement for grafana.o.o will be grafana01.o.o, with a cname in dns | 17:37 |
dmsimard | jeblair: in any case, that pattern should match numbered or not | 17:37 |
jeblair | yep | 17:37 |
frickler | clarkb: dmsimard: that fixed it on my node: https://review.openstack.org/527193 Fix multiple workers for systemd | 17:37 |
jeblair | i'm using \d* so we continue to have puppet operate on the current host | 17:38 |
dmsimard | frickler: makes sense | 17:38 |
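Not the merged fix (527193 keeps separate per-worker unit files), but for illustration: a systemd template unit is another way to handle lettered workers, instantiated once per letter. The ExecStart path, script name, and config layout below are assumptions, not puppet-log_processor's actual files.

```
# /etc/systemd/system/jenkins-log-worker@.service (hypothetical)
[Unit]
Description=Jenkins log worker %i
After=network.target

[Service]
ExecStart=/usr/local/bin/log-gearman-worker.py -c /etc/logprocessor/jenkins-log-worker-%i.yaml
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

One would then run `systemctl enable jenkins-log-worker@A jenkins-log-worker@B` to get one service instance per worker.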
jeblair | dmsimard, frickler: not sure if you're aware -- the node-os comment is read by the infra apply jobs, so adding that xenial comment causes those jobs to run, and we verify that at least puppet apply -noop works on that os. | 17:39 |
dmsimard | I wasn't aware those comments were actually important, thanks for that | 17:40 |
* frickler needs a break now, will take another look later | 17:40 | |
jeblair | it looks like i got 36% of the way through site.pp updating the node matchers and os comments. i'm going to stop there and leave more for others to do. :) | 17:41 |
dmsimard | jeblair: have a comment on https://review.openstack.org/#/c/527172/ | 17:45 |
dmsimard | question came up when I was looking at https://review.openstack.org/#/c/527186/1/manifests/site.pp with the files group left intact | 17:45 |
clarkb | frickler: back and reviewing your fix as well as jeblairs now | 17:53 |
clarkb | oh frickler is taking a break, I have a comment on the fix I'll just update the patch | 17:54 |
clarkb | dmsimard: new patchset on https://review.openstack.org/527193 can you rereview? jeblair care to review as well? | 17:55 |
dmsimard | clarkb: ah I guess frickler's patch was working although it was a little bit uglier with two dashes | 17:56 |
clarkb | dmsimard: ya and may have confused systemd slightly depending on how important that name is | 17:56 |
clarkb | figure better to just get it matching the name used elsewhere and not worry about it | 17:56 |
* dmsimard nods | 17:57 | |
jeblair | dmsimard: good catch thanks | 18:01 |
*** baoli has quit IRC | 18:01 | |
*** baoli_ has joined #openstack-sprint | 18:04 | |
clarkb | I'm just going to approve all those changes without check results as long as my eyeballs don't catch anything wrong with them. Then if tests do fail we can sort them out (otherwise there is just too much state to track) | 18:05 |
clarkb | its unfortunate that our puppet apply --noop testing won't catch the systemd reload issue though | 18:06 |
fungi | i'm looking at the implementation of that in puppet-zuul | 18:09 |
fungi | looks like there's a manifests/systemd_reload.pp classfile implementing it | 18:09 |
fungi | which gets called out as a require line in services | 18:09 |
fungi | but then there's also what looks like basically a duplicate implementation of it in manifests/executor.pp | 18:10 |
clarkb | fungi: that would be one way to do it. The tricky thing is requiring something that won't necessarily be in place on all platforms (but hiding it in a class of its own is one way to do that | 18:10 |
fungi | am i right in thinking that's redundant? | 18:10 |
clarkb | oh ya if there is something else doing it then it probably is redundant /me looks | 18:10 |
fungi | or is it serving some subtle purpose i'm not picking up? | 18:11 |
clarkb | it looks redundant to me as well, but maybe there is an ordering issue that isn't immediately apparent that that works around | 18:12 |
*** baoli has joined #openstack-sprint | 18:46 | |
*** baoli_ has quit IRC | 18:47 | |
clarkb | for anyone wondering why it got quiet all of a sudden we are mostly just waiting on CI to finish and changes to merge at this point (lots of demand in zuul right now) | 18:54 |
pabelanger | and back | 18:57 |
pabelanger | catching up on backscroll | 18:57 |
clarkb | the log_processor fix has finally started jobs | 19:05 |
clarkb | hopefully will be in gate in the not too distant future then shrews and dmsimard (and frickler if still around) can give it another go. | 19:05 |
fungi | i have a couple of puppet-subunit2sql changes proposed to help me build the replacement worker | 19:12 |
clarkb | I'll do another round of reviews shortly | 19:13 |
pabelanger | remote: https://review.openstack.org/526194 Remove zuulv2 long lived servers | 19:18 |
pabelanger | could use another +3 on^ had to rebase | 19:18 |
pabelanger | clarkb: Shrews: is the pastebin from above on expand-groups.sh still an issue?> | 19:18 |
dmsimard | pabelanger: I believe so | 19:22 |
pabelanger | k, lets land 526194, then delete ansible-inventory cache, since we've deleted some servers | 19:23 |
pabelanger | okay, tripleo has bumped the flavor for mirror to 150GB | 19:27 |
pabelanger | uploading xenial cloud image to tripleo-test-cloud-rh1 now | 19:27 |
clarkb | fungi: did you see https://review.openstack.org/#/c/527193/ ? you may need similar for subunit2sql | 19:33 |
fungi | clarkb: oh, thanks! i missed that. will update my open change if it's not merged yet | 19:36 |
fungi | added | 19:39 |
pabelanger | okay, mirror01.regionone.tripleo-test-cloud-rh1.openstack.org launched properly | 19:40 |
pabelanger | setting up DNS now | 19:40 |
pabelanger | http://mirror01.regionone.tripleo-test-cloud-rh1.openstack.org/ | 19:43 |
pabelanger | everything seems okay | 19:43 |
pabelanger | I'm going to redirect mirror.regionone to mirror01.regionone now | 19:44 |
pabelanger | DNS updated, waiting to confirm it correct | 19:49 |
clarkb | pabelanger: remember to use hour long ttls on those records (to avoid dns requerying | 19:50 |
pabelanger | clarkb: Yup! confirmed at 60min | 19:50 |
pabelanger | and cname is working | 19:50 |
pabelanger | will accept ssh hostkey on puppetmaster | 19:51 |
pabelanger | remote: https://review.openstack.org/507266 Comment out server in puppet.conf | 19:54 |
pabelanger | I believe that will stop puppet from hanging for 2mins when we boot new servers | 19:55 |
clarkb | pabelanger: will puppet apply do that? | 19:55 |
clarkb | seems like that should be a noop | 19:55 |
clarkb | especially now that ianw's change to stop the agent is in | 19:56 |
ianw | oh good | 19:56 |
clarkb | ianw: good morning | 19:56 |
ianw | sorry, just catching up with reviews etc | 19:56 |
ianw | morning! | 19:57 |
pabelanger | clarkb: I think it is a race condition: we install puppet with install_puppet.sh, but the server boots and puppet-agent tries to connect to the puppetmaster while puppet apply is running. So I think it might be too late | 20:01 |
pabelanger | also, trying to see the change ian did | 20:01 |
clarkb | pabelanger: ya but ianw's patch explicitly stop puppet agent | 20:02 |
clarkb | and puppet apply shouldn't talk to a server aiui | 20:02 |
pabelanger | clarkb: I don't think it worked, cause It still happened when I tried bringing tripleo mirror online | 20:02 |
pabelanger | let me see which system-config I had | 20:03 |
pabelanger | | Dec 11 19:36:28 mirror01 puppet-agent[4061]: Could not request certificate: Failed to open TCP connection to puppet:8140 (getaddrinfo: Name or service not known) | 20:03 |
clarkb | ya I'm thinking your system-config may not have been up to date? the change just merged a few hours ago | 20:03 |
pabelanger | also, I see HTTP requests to new tripleo mirror now | 20:03 |
clarkb | Shrews: the fix for log_processor appears to be about to merge, will you be able to give that another shot in a few minutes? | 20:04 |
Shrews | clarkb: yeah | 20:04 |
Shrews | getting frustrated with sockets so could use a diversion | 20:04 |
ianw | i dropped a comment ... so the package just assumes that there's a resolvable remote host called "puppet" ? | 20:04 |
clarkb | ianw: ya that's puppet's default behavior | 20:05 |
pabelanger | yup | 20:05 |
pabelanger | I think I had ianw commit when I ran it just now | 20:05 |
ianw | ok, TIL :) | 20:06 |
pabelanger | but, will know in a moment when I try to launch next server | 20:06 |
pabelanger | server=puppet | 20:06 |
pabelanger | that is what the default is in puppet.conf for us | 20:07 |
pabelanger | at one point I think we managed it on server boot | 20:07 |
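The idea behind change 507266, sketched as a config fragment; the exact section placement and surrounding settings in the real puppet.conf are assumptions:

```
# /etc/puppet/puppet.conf on a masterless host (sketch)
[main]
# server = puppet
# Commented out: the package default points the agent at a host
# literally named "puppet", which these puppet-apply-only hosts
# can never resolve, so the agent hangs retrying on boot.
```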
clarkb | is the problem that we can't stop/disable the service until after we've already started it and sent it off trying? | 20:07 |
pabelanger | | + systemctl disable puppet | 20:08 |
pabelanger | okay I see that in my console | 20:08 |
pabelanger | | Executing /lib/systemd/systemd-sysv-install disable puppet | 20:08 |
pabelanger | | Dec 11 19:32:26 mirror01 systemd[1]: Started Puppet agent. | 20:09 |
pabelanger | so, something started it again | 20:09 |
pabelanger | then | 20:09 |
pabelanger | | Dec 11 19:32:28 mirror01 puppet-agent[4061]: Could not request certificate: Failed to open TCP connection to puppet:8140 (getaddrinfo: Name or service not known) | 20:09 |
clarkb | huh | 20:11 |
clarkb | Shrews: arg zuul just put the trusty job back to queuing | 20:12 |
clarkb | I'm worried that infracloud networking is falling over with nodepool running at full capacity | 20:12 |
Shrews | clarkb: did we put your nodepool fix in? | 20:12 |
Shrews | clarkb: if we didn't, we could be hitting that again | 20:13 |
clarkb | Shrews: I don't think so. That reminds me, I want to say tobias had comments for me to address and I completely forgot with the sprint stuff this morning | 20:13 |
* Shrews checks nodepool | 20:13 | |
clarkb | looks like the comments are more along the lines of "this is weird and test doesn't do a good job reproducing but dunno what is going on yet" | 20:14 |
Shrews | clarkb: hrm, only 1 ready&locked node, so unlikely we're hitting the issue you found | 20:15 |
Shrews | just busy | 20:15 |
clarkb | Shrews: ya I'm thinking the networking in hpcloud just can't handle the demand and is dropping connections | 20:16 |
pabelanger | okay, moving on to eavesdrop01.o.o | 20:23 |
pabelanger | | Dec 11 20:30:35 eavesdrop01 puppet-user[11951]: Could not find data item openstack_meetbot_password in any Hiera data file and no default supplied at /opt/system-config/production/manifests/site.pp:347 on node eavesdrop01.openstack.org | 20:32 |
pabelanger | how did we handle hiera data for numeric hosts again? | 20:32 |
pabelanger | did we just move them into a group | 20:32 |
clarkb | pabelanger: yes that is what I have been doing with eg translate | 20:32 |
pabelanger | okay, wanted to confirm | 20:33 |
pabelanger | I'll send a patch for eavesdrop here shortly | 20:33 |
clarkb | Shrews: ok fix for log_processor merged | 20:34 |
clarkb | Shrews: puppetmaster:/var/log/puppet_run_all.log says that the ansible puppet cron is currently running so we can either wait for it to finish or just manually update the puppet module on the puppet master | 20:35 |
clarkb | Shrews: if you are able to give the node launch another go right now I can walk through updating the puppet module | 20:35 |
clarkb | dmsimard: ^you too | 20:35 |
dmsimard | yeah will give a try after extinguishing a fire | 20:36 |
clarkb | (I expect at this point frickler has called it a day) | 20:36 |
Shrews | clarkb: waiting for the puppet repo to update | 20:38 |
Shrews | clarkb: oh, that's what you want to walk me thru | 20:38 |
Shrews | :) | 20:38 |
Shrews | yeah, i'm ready | 20:38 |
clarkb | Shrews: cool, so the module is at /etc/puppet/modules/log_processor | 20:40 |
ianw | with something like 527144 ... do we care about effectively dropping trusty support? should we put a tag in before merging maybe? | 20:40 |
ianw | that's puppet-ethercalc btw, moving from an upstart file to a .service file | 20:40 |
clarkb | Shrews: as root you will want to do a `git remote update` to fetch latest changes then `git checkout origin/master` it might be `git checkout origin master` I can never remember where git wants the / | 20:41 |
clarkb | Shrews: however the cron will update it in 3 minutes if you want to wait (and avoid conflicts though git should sort those for us in this case) | 20:41 |
clarkb | ianw: ya frickler had asked about that and I had asked to keep trusty support for now. Simplifies the transition/upgrade too | 20:42 |
Shrews | clarkb: those commands put me in a detached HEAD state. is that the norm? | 20:42 |
clarkb | Shrews: yes | 20:43 |
clarkb | Shrews: rather than try and curate a local branch we just checkout upstream states | 20:43 |
clarkb | Shrews: its easier this way when you rely on code review to specify a state | 20:43 |
Shrews | clarkb: that's done then | 20:43 |
clarkb | cool I think you can give the launch node script another go then | 20:43 |
Shrews | if i could get my copy-pasta fixed | 20:48 |
Shrews | k. kicked off | 20:49 |
Shrews | fwiw, launch-node.py does not play nicely with tee | 20:50 |
clarkb | is it writing to stderr? | 20:50 |
Shrews | i guess? | 20:50 |
clarkb | I wonder if that is because that is how ansible does it? | 20:50 |
clarkb | mordred or dmsimard may know | 20:50 |
dmsimard | Shrews: launch-node.py 2>&1 | tee -a file.out ? | 20:51 |
dmsimard | or maybe the PYTHONUNBUFFERED thing | 20:51 |
Shrews | i'll just depend on my tmux buffer | 20:52 |
dmsimard | if we were really motivated, we could do, like, launch-node.py | systemd-cat -t launch-node | 20:55 |
dmsimard | that sends the output straight to the journal and then you can do, like, journalctl -t launch-node | 20:55 |
clarkb | dmsimard: currently no journald on that node | 20:55 |
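The tee problem above comes down to stream routing: tee only reads stdin, so anything a program writes to stderr bypasses it unless redirected first (and the redirect spelling matters: `2>&1 | tee` works, `2&>1` does not). A small local demonstration:

```shell
# demo stands in for launch-node.py: one line to each stream.
demo() { echo to-stdout; echo to-stderr >&2; }
demo | tee only-stdout.log >/dev/null   # stderr still hits the terminal
demo 2>&1 | tee both.log >/dev/null     # both streams reach tee
grep -c . only-stdout.log               # prints 1
grep -c . both.log                      # prints 2
```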
clarkb | fungi: https://review.openstack.org/#/c/527203/2 failed testing | 20:57 |
fungi | grar | 20:58 |
* dmsimard pictures fungi growling | 20:58 | |
fungi | it's not a particularly intimidating growl | 20:58 |
*** jkilpatr has quit IRC | 20:59 | |
Shrews | clarkb: looks like we can haz new node | 21:00 |
clarkb | Shrews: yay | 21:00 |
clarkb | Shrews: ok now don't immediately do the dns stuff yet | 21:00 |
clarkb | because dns is a pita we should probably talk about it a little | 21:01 |
clarkb | dmsimard: maybe you want to get to the point where you have a launched logstash worker too and we can go through that together? | 21:01 |
fungi | cue rant about proprietary dns hosting api | 21:01 |
dmsimard | clarkb: fire almost extinguished | 21:01 |
clarkb | dmsimard: cool | 21:01 |
Shrews | aaaaaaaand go go gadget rant | 21:01 |
clarkb | Shrews: does it work if I grab lunch and dmsimard gets a node launched before we dig into the next step? | 21:02 |
dmsimard | for context, I don't think I've mentioned this before but I'm basically infra-root for RDO's infrastructure | 21:02 |
dmsimard | so from time to time there's those fires :) | 21:02 |
Shrews | clarkb: yes. i will task switch back to the finger gateway, but we do have the zuul meeting in an hour | 21:02 |
jeblair | i'm back from lunch if needed here | 21:03 |
clarkb | Shrews: oh right zuul meeting | 21:04 |
clarkb | Shrews: we can also just go through the dns stuff and take the pressure off getting everything done in that time | 21:04 |
clarkb | its not the end of the world to go through it multiple times | 21:04 |
clarkb | Shrews: so the deal with DNS is its hosted by rackspace and they use a proprietary client and service for managing it | 21:05 |
clarkb | Shrews: this works reasonably well for when you are just adding a new host (and not replacing an existing one) because adding records is super easy | 21:05 |
clarkb | Shrews: the problems largely lie in removing old records safely because there is no version control like you get with gandi and other services | 21:05 |
clarkb | Shrews: and since we share the openstack.org domain with the foundation we have had cases of stepping on each others toes in the past :/ | 21:06 |
clarkb | Shrews: in this case of replacing an instance my preferred method is to use the command line client to update only the reverse PTR records, then log in to the web ui and delete the old A and AAAA records and add new ones | 21:07 |
clarkb | this means we'll only run half of the commands printed out by the launch script (2/4 that update the reverse ptr records) | 21:07 |
clarkb | fungi: jeblair do you recall if the reverse ptr records are the first two commands or the second two? I think they are the first two | 21:08 |
jeblair | they are the first | 21:08 |
Shrews | so command line for one direction resolution, gui for the other | 21:09 |
jeblair | example: http://paste.openstack.org/show/628658/ | 21:09 |
clarkb | Shrews: correct | 21:09 |
clarkb | Shrews: so you can go ahead and run the commands above line 15 in jeblair's example (but use the commands that were printed out for your launch invocation) | 21:10 |
jeblair | we have some (a lot of) hiera data assigned by fqdn. i'm guessing that as we transition nodes to numbered, we're going to need to move those to groups, yeah? | 21:11 |
clarkb | jeblair: yup, pabelanger ran into that with eavesdrop and I did with translate*. Making a copy of the hiera data in a group is what I did for translate | 21:12 |
clarkb | then once things are transitioned we can remove the fqdn specific data | 21:12 |
fungi | clarkb: the entries with ip addresses are the address records, then entries with server uuids are the reverse ptrs | 21:12 |
fungi | i don't recall what order they wind up in | 21:12 |
jeblair | clarkb: i can never remember how our split group system works. what do i need to do to make a grafana group and add grafana01 to it? | 21:12 |
clarkb | jeblair: in the site.pp add group = grafana line like the other examples in there | 21:13 |
jeblair | that's the only thing? | 21:13 |
clarkb | jeblair: then we need to update the ansible group file that I can never remember the path to /me finds it | 21:13 |
jeblair | yeah, that's the thing i was worried about :) | 21:13 |
clarkb | jeblair: openstack-infra/system-config/modules/openstack_project/files/puppetmaster/groups.txt | 21:14 |
Shrews | clarkb: done | 21:14 |
clarkb | Shrews: ok next step is the fun step | 21:14 |
* clarkb actually goes through process with shrews to figure it out | 21:15 | |
Shrews | you mean the fun doesn't stop there????? | 21:15 |
Shrews | :) | 21:15 |
clarkb | Shrews: go to https://www.rackspace.com/login then click on cloud control panel login | 21:15 |
clarkb | Shrews: username and password can be found in the file being sourced on line 16 in jeblairs example | 21:16 |
fungi | next, attempt to extrude your brain matter through a collander | 21:16 |
clarkb | Shrews: once there click on Networking -> Cloud DNS | 21:17 |
clarkb | then click on openstack.org | 21:17 |
fungi | because, you know, dns is totally a network thing | 21:17 |
clarkb | Now my favorite part of this whole process: it doesn't load all of the records for you to search at once, so you want to scroll that scroll bar until it's done loading all the things it can load | 21:18 |
* fungi wonders why they don't also put database services under the "storage" menu | 21:18 | |
jeblair | remote: https://review.openstack.org/527245 Create a grafana group | 21:18 |
jeblair | clarkb: can you ^ pls? | 21:18 |
clarkb | jeblair: yup | 21:18 |
clarkb | Shrews: let me know when you get there | 21:18 |
Shrews | clarkb: there, and see logstash-worker02 | 21:19 |
pabelanger | remote: https://review.openstack.org/527246 Add eavesdrop into groups.txt | 21:19 |
fungi | yeah, i basically scroll as far down as it will go, then do that again, and again, and again... until it stops letting me do it any longer or i get distracted and go do something else | 21:19 |
pabelanger | clarkb: jeblair: also^ | 21:19 |
fungi | Shrews: there will be two, one for ipv4 and one for ipv6... and they won't be even remotely adjacent in the ui | 21:19 |
Shrews | oh | 21:20 |
clarkb | pabelanger: you have two different regexes in use fwiw | 21:20 |
jeblair | pabelanger, clarkb: i used \d* and pabelanger used \d+. which is better? | 21:20 |
fungi | which is why once you've gotten it to load all the paginated chunks of the set, you can then use in-browser keyword searching to find them all | 21:20 |
clarkb | jeblair: I think you got yours right because it matches the node spec in site.pp | 21:20 |
jeblair | i mean, specifically, because i don't understand the group system, i don't know if things will break if they are different | 21:20 |
clarkb | pabelanger should update his change to use * in groups.txt I think | 21:20 |
pabelanger | ah, I copypasted another | 21:20 |
pabelanger | let me fix | 21:20 |
Shrews | ah yes. i see both A and AAAA entries | 21:20 |
jeblair | clarkb: sounds like you are inclined to think they may break -- ie, puppet will expect a group to be present that ansible won't have placed on the filesystem, unless they match? | 21:21 |
clarkb | jeblair: I don't think they will break but the old server will continue to fail to find the group it thinks it is in and fall back to the fqdn system instead until it is gone | 21:21 |
clarkb | Shrews: cool now you can use browser search to find logstash-worker02 (I think you were 02) | 21:21 |
jeblair | clarkb: that sounds reasonable too. hrm | 21:21 |
pabelanger | okay, updated | 21:22 |
pabelanger | remote: https://review.openstack.org/527246 Add eavesdrop into groups.txt | 21:22 |
Shrews | clarkb: yup | 21:22 |
clarkb | Shrews: then click the little gear next to the records name and click modify record | 21:22 |
clarkb | Shrews: then replace the ipv6 address if modifying the AAAA record with the one launch printed out or the ipv4 if modifying the A record | 21:22 |
clarkb | Shrews: and do that for both the A and AAAA records | 21:22 |
clarkb | pabelanger: approved | 21:23 |
*** jkilpatr has joined #openstack-sprint | 21:24 | |
Shrews | clarkb: done | 21:24 |
clarkb | Shrews: then you can `dig +short logstash-worker02.openstack.org` and `dig +short AAAA logstash-worker02.openstack.org` to see when the records update | 21:25 |
clarkb | once that happens there is one last step we have for the logstash workers which is updating the firewalls to accept the new host and making sure services on new host are functioning | 21:26 |
Shrews | groovy. i can dig it | 21:26 |
Shrews | far out | 21:26 |
pabelanger | okay, I'm going to delete the old mirror in triple-test-cloud-rh1, I don't see any traffic in apache logs for 45mins now | 21:27 |
* Shrews speaks fungi language | 21:27 | |
clarkb | pabelanger: sounds good | 21:28 |
fungi | gnarly | 21:29 |
Shrews | pfft, that's 2 decades beyond | 21:29 |
jeblair | i need a translator | 21:29 |
dmsimard | ok fire extinguished | 21:30 |
dmsimard | going through a logstashworker now. | 21:30 |
pabelanger | and deleted | 21:30 |
Shrews | clarkb: anyhoo, dig seems to be immediately returning the correct things (from multiple places) | 21:30 |
clarkb | Shrews: awesome, so now some logstash-worker specific things. We use unauthenticated connectivity to gearman (which could be changed) and to elasticsearch (which can't be changed without paying them money or writing our own auth plugin for es) | 21:31 |
pabelanger | ianw: clarkb: so, what are we thinking on https://review.openstack.org/507266/ (puppet DNS error on server boot) | 21:31 |
fungi | clarkb: there are also two other steps... updating the ssh host key cached by root on puppetmaster, and truncating the ansible inventory cache | 21:32 |
pabelanger | clarkb: Shrews: we'll also need to restart firewalls too, to pickup new IP addresses | 21:32 |
clarkb | Shrews: this means we have to kick the firewall on logstash.openstack.org (where gearman server runs) and elasticsearch[2-7].openstack.org where elasticsearch runs to have it pick up the new IPs based on name | 21:32 |
clarkb | fungi: oh right | 21:32 |
fungi | steps which i frequently forget | 21:32 |
pabelanger | i think the last time we changed out logstash workers I wrote an ansible-playbook to restart firewalls, I think I added it to system-config | 21:33 |
clarkb | Shrews: the way to restart the firewall on those nodes is to run `service restart iptables-persistent` | 21:33 |
ianw | pabelanger: my only thought is that it's quite untested on everything other than xenial? | 21:33 |
clarkb | fungi: doesn't launch node automatically truncate the cache file now? | 21:33 |
clarkb | fungi: I think it may, but the ssh key add will need to be done | 21:34 |
clarkb | pabelanger: oh cool | 21:34 |
ianw | pabelanger: maybe we should just limit it to that for now? | 21:34 |
pabelanger | ianw: sure, we can do it for xenial, then add it to others | 21:34 |
fungi | clarkb: oh, maybe | 21:34 |
clarkb | pabelanger: I don't see it, maybe it hasn't merged? | 21:34 |
pabelanger | clarkb: yah, looking now | 21:35 |
clarkb | Shrews: anyways let me know once that is run on logstash.o.o and elasticsearch[2-7].o.o (can just ssh directly or figure out ansible) | 21:35 |
ianw | pabelanger: although actually, the apply tests do run it | 21:35 |
ianw | http://logs.openstack.org/66/507266/2/check/legacy-infra-puppet-apply-3-centos-7/5db1915/job-output.txt.gz#_2017-12-11_20_21_21_950690 | 21:36 |
Shrews | clarkb: will do | 21:36 |
fungi | clarkb: easiest way to be sure is to check whether the old instance continues to appear in the inventory cache file, i guess | 21:36 |
Shrews | clarkb: should these be done in any particular order? | 21:38 |
pabelanger | clarkb: yah, I don't see it any more but it wasn't a big playbook. I can whip up a replacement if needed | 21:38 |
Shrews | clarkb: like logstash.o.o first, then the elasticsearch nodes? | 21:38 |
clarkb | Shrews: probably best if elasticsearch is done first as it's at the end of the data processing pipeline | 21:38 |
clarkb | Shrews: this way we don't try processing anything until the whole pipeline can talk | 21:38 |
clarkb | fungi: ya there is code in the launch script to make sure the inventory cache file is not out of date | 21:39 |
fungi | oh, good | 21:40 |
pabelanger | ianw: yah, your call. If you want only xenial, I can propose that. | 21:42 |
dmsimard | I guess I'll go learn what the DNS stuff looks like while logstash-worker03 is installing. | 21:42 |
Shrews | clarkb: that should be 'service iptables-persistent restart', right? | 21:43 |
clarkb | Shrews: possibly, systemctl goes one way and service the other, so I mix them up | 21:43 |
dmsimard | Shrews: oh, that's different from trusty to xenial | 21:43 |
clarkb | dmsimard: ya | 21:43 |
dmsimard | Shrews: in xenial it's netfilter-persistent | 21:43 |
clarkb | Shrews: if your command works and mine doesn't then yours is correct :) | 21:44 |
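The trusty/xenial difference above can be captured in a tiny helper so nobody has to remember which release uses which name. A sketch (the service names come from the discussion; the function name and usage line are made up):

```shell
# return the persistent-firewall service name for a given Ubuntu release
fw_service() {
  case "$1" in
    trusty) echo "iptables-persistent" ;;   # pre-16.04 name
    *)      echo "netfilter-persistent" ;;  # xenial and later
  esac
}

# usage sketch: ssh root@$host service "$(fw_service trusty)" restart
fw_service trusty
```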
dmsimard | clarkb: where is the rackdns script ? | 21:46 |
clarkb | you mean where does things like rdns-create live? | 21:47 |
clarkb | dmsimard: http://paste.openstack.org/show/628658/ is jeblairs example. It lives in the virtualenv that is sourced early in that | 21:47 |
dmsimard | oh, root/rackdns-venv/ | 21:47 |
Shrews | clarkb: those are done | 21:48 |
clarkb | Shrews: ok now we want to hop on the node itself and check the services are working, then we will swing around and do the thing fungi mentioned and remove the old instance | 21:49 |
clarkb | Shrews: there are 4 log worker processes that log in /var/log/logprocessor and one logstash jvm process that logs in /var/log/logstash | 21:49 |
clarkb | Shrews: if you tail the files in /var/log/logprocessor you should see it grabbing gearman jobs and pushing log files | 21:50 |
clarkb | logstash on the other hand seems to make on-demand http connections to the elasticsearch servers, so as long as the process is running it should be fine | 21:50 |
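The health check clarkb describes can be summarized as a small sketch (the counts and log paths come from the discussion above; the helper function and the exact tail commands are illustrative):

```shell
# expected process counts on a healthy logstash-worker, per the notes above
expected_procs() {
  case "$1" in
    logprocessor) echo 4 ;;  # four log worker processes
    logstash)     echo 1 ;;  # one logstash jvm
  esac
}

# on the worker itself one would then watch the logs:
#   tail -f /var/log/logprocessor/*  # should show gearman jobs being grabbed
#   tail -f /var/log/logstash/*      # jvm up means on-demand ES connections work
expected_procs logprocessor
```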
dmsimard | what was that about the firewall ? I think I need that too. /me reads backlog | 21:51 |
dmsimard | getting connection denied to gearman from the new worker | 21:51 |
clarkb | unfortunately logstash doesn't log as well as I'd like | 21:51 |
Shrews | clarkb: yep, seeing that | 21:51 |
fungi | pretty ironic considering its name | 21:51 |
clarkb | dmsimard: yup we use the dns names to set up firewall rules so you need to "restart" the iptables-persistent service once you are happy with the state of dns | 21:52 |
clarkb | on logstash.o.o and elasticsearch[2-7].o.o | 21:52 |
fungi | or netfilter-persistent | 21:52 |
Shrews | those machines are still trusty | 21:52 |
fungi | ahh, right-o | 21:52 |
clarkb | Shrews: so I think this node is happy | 21:52 |
dmsimard | clarkb: hmm, so we need to change the DNS before the worker can connect to gearman ? | 21:52 |
clarkb | dmsimard: correct | 21:52 |
dmsimard | thus we can't really validate that it works | 21:52 |
dmsimard | should we perhaps use /etc/hosts ? | 21:53 |
dmsimard | at least before changing the DNS to ensure it works | 21:53 |
fungi | if we're worried about not being able to switch back and forth quickly enough, set a low ttl on the record | 21:53 |
clarkb | fungi: ya that | 21:53 |
clarkb | dmsimard: ^ | 21:53 |
dmsimard | TTLs are mostly a suggestion though | 21:53 |
dmsimard | but sure | 21:53 |
clarkb | this is also fairly specific to the logstash workers, of which we have many and which can be replaced at any time | 21:53 |
clarkb | because elasticsearch is money grabbing for features | 21:54 |
dmsimard | lol | 21:54 |
*** pabelanger_ has joined #openstack-sprint | 21:55 | |
*** EmilienM_ has joined #openstack-sprint | 21:55 | |
clarkb | Shrews: now before zuul meeting. As root on puppet master you need to ssh to logstash-worker02 and accept its ssh host key. This is so that ansible can ssh to it for puppetting | 21:55 |
*** EmilienM has quit IRC | 21:56 | |
*** pabelanger has quit IRC | 21:56 | |
clarkb | Shrews: then for deleting the old instance when we are happy with how new one is functioning (seems fine to me so far) | 21:56 |
clarkb | Shrews: I like to do something like `openstack --os-cloud openstackci-rax --os-region DFW server show cf873928-122c-447b-ad24-d1e213d277f0` to confirm the uuid I think is the old instance is actually the old instance | 21:56 |
*** EmilienM_ is now known as EmilienM | 21:56 | |
dmsimard | TTL is already 300, short enough | 21:56 |
clarkb | Shrews: then I can change the 'show' in that command to 'delete' to delete it | 21:56 |
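The show-then-delete habit can be wrapped so only the action word changes between the two invocations. A sketch (the helper echoes the command rather than running it, so it can be reviewed first; the cloud and region flags come from clarkb's example, the function name is made up):

```shell
# build the openstack CLI invocation; run "show" first to confirm the
# uuid is really the old instance, then rerun with "delete"
os_server() {
  local action="$1" uuid="$2"
  echo "openstack --os-cloud openstackci-rax --os-region DFW server ${action} ${uuid}"
}

os_server show cf873928-122c-447b-ad24-d1e213d277f0
```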
*** EmilienM has quit IRC | 21:56 | |
*** EmilienM has joined #openstack-sprint | 21:56 | |
Shrews | clarkb: known_hosts updated | 21:57 |
*** pabelanger_ is now known as pabelanger | 21:57 | |
dmsimard | will it work if we do a rdns create/record create on a record that already exists ? | 21:58 |
clarkb | dmsimard: sort of | 21:58 |
Shrews | clarkb: old server deleted. many thx for the guidance | 21:58 |
dmsimard | clarkb: heh, okay, let's see. | 21:58 |
fungi | yeah, having the same reverse dns for multiple systems is perfectly fine | 21:58 |
clarkb | dmsimard: I walked Shrews through it above; for the reverse dns you can run the commands that launch spat out. So basically everything above line 15 in jeblair's example | 21:58 |
clarkb | dmsimard: but when replacing a server it is easier to update the forward A and AAAA records through the gui | 21:59 |
Shrews | etherpad updated. now meeting | 21:59 |
clarkb | dmsimard: otherwise you get a round robin between the instances | 21:59 |
clarkb | Shrews: thanks! | 21:59 |
dmsimard | clarkb: yeah I've seen that, but for an existing node I'd tend to do a delete before the create -- or there is a record modify command, but not a rdns modify. | 21:59 |
pabelanger | IIRC, rdns won't update, but will create a 2nd DNS entry | 21:59 |
Shrews | clarkb: oh, updating ansible inventory cache? | 21:59 |
clarkb | Shrews: launch handled that for us, built-in feature | 21:59 |
Shrews | cool cool cool | 22:00 |
clarkb | dmsimard: ya rdns is specific to the IP address | 22:00 |
clarkb | dmsimard: and the other rdns record gets removed when you delete the old instance | 22:00 |
clarkb | dmsimard: whereas A and AAAA are specific to the name | 22:00 |
pabelanger | sorry, record-create will not update | 22:00 |
jeblair | it's zuul meeting time in #openstack-meeting-alt | 22:00 |
clarkb | dmsimard: so it's an artifact of how DNS + rax dns service operate | 22:00 |
fungi | dmsimard: problem is you need to know the "record id" for it, which you can only get from the api, but the api refuses to return more than 100 records i think, and has no pagination, so you usually can't get the info you need to delete or modify a record via the api | 22:00 |
dmsimard | bah | 22:01 |
dmsimard | jeblair has not written a raxtty yet? :D | 22:01 |
fungi | for the a/aaaa records | 22:01 |
fungi | i doubt jeblair has any interest in writing a client for a proprietary api | 22:01 |
dmsimard | it was mostly a joke, but indeed | 22:01 |
clarkb | dmsimard: so your general process here is run the commands for reverse dns, then ignore the forward dns commands. Switch over to rax gui using steps I described above for shrews and modify the A and AAAA records to point at the new IP addresses | 22:02 |
clarkb | dmsimard: then once dig reports new addrs "restart" the iptables-persistent service on the nodes that firewall things (logstash.o.o and elasticsearch[2-7].o.o) | 22:02 |
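The list of hosts that need the firewall kick can be generated rather than typed out. A sketch assuming the elasticsearch hosts are numbered 2 through 7 as stated above (the function name and the commented ssh loop are illustrative):

```shell
# hosts whose firewall rules reference the logstash worker names
fw_hosts() {
  echo "logstash.openstack.org"
  for i in 2 3 4 5 6 7; do
    echo "elasticsearch${i}.openstack.org"
  done
}

# once dig reports the new addresses (these hosts are trusty):
#   for h in $(fw_hosts); do ssh "root@$h" service iptables-persistent restart; done
fw_hosts
```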
dmsimard | yup, I'll figure it out and report back if I have issues | 22:02 |
clarkb | and be very careful when modifying openstack.org records as there is no revision control and it is a shared resource :/ | 22:03 |
clarkb | DNS is basically the least optimal part of this whole process | 22:03 |
clarkb | dmsimard: also totally happy to walk you through it step by step like I did with shrews after the zuul meeting if you like | 22:21 |
*** baoli has quit IRC | 22:28 | |
*** baoli has joined #openstack-sprint | 22:29 | |
*** baoli has quit IRC | 22:33 | |
*** larainema has quit IRC | 22:45 | |
dmsimard | clarkb: DNS updated so I'll check every once in a while. Someone mentioned there was an ansible inventory somewhere ? | 22:52 |
clarkb | dmsimard: there is, it is what the ansible that runs puppet uses to know what to puppet, but the launch node script automatically updates that for you so you should be fine | 22:53 |
dmsimard | clarkb: oh, it was mostly to do like ansible -i inventory -m command "dig ..." :) | 22:54 |
clarkb | oh that, I think the default inventory will work | 22:54 |
clarkb | but default inventory has every control plane host in it so be careful | 22:54 |
dmsimard | yeah, but where is it ? | 22:55 |
dmsimard | oh, /etc/ansible/hosts, got it | 22:55 |
clarkb | dmsimard: /etc/ansible/hosts/openstack it uses the openstack dynamic inventory thing | 22:55 |
clarkb | (with a cache file that is the thing that launch-node.py updates) | 22:55 |
dmsimard | ansible -i /etc/ansible/hosts/openstack logstash.openstack.org,elasticsearch* --list-hosts <-- does what I wanted | 22:59 |
dmsimard | clarkb: new logstash-worker03 is processing things \o/ | 23:02 |
clarkb | dmsimard: woot | 23:02 |
dmsimard | so delete the old one and done ? | 23:03 |
clarkb | dmsimard: so the last two steps are to make sure root accepts the host key for the new host on puppetmaster (just ssh to the host and accept it if it looks good), then delete the old one | 23:03 |
clarkb | dmsimard: like I told shrews I like to use openstack server show $uuid and check that the uuid I have is the one then change show to delete to delete it | 23:03 |
*** jesusaur has quit IRC | 23:03 | |
clarkb | dmsimard: and you have to use uuid in this case because there are duplicate matching names | 23:03 |
dmsimard | yeah | 23:04 |
dmsimard | I always use UUIDs anyway, even for flavors and images | 23:04 |
dmsimard | name matching is nice but.. | 23:04 |
pabelanger | heads up, I'm modifying /etc/puppet/hieradata for eavesdrop01 | 23:04 |
pabelanger | testing out, then will commit changes | 23:04 |
clarkb | dmsimard: looks like 582c3ddf-a669-4c2b-bdd3-87a5ca088d0f in this case | 23:05 |
dmsimard | yeah | 23:05 |
dmsimard | 582c3ddf-a669-4c2b-bdd3-87a5ca088d0f is deleted \o/ | 23:06 |
pabelanger | cool | 23:06 |
dmsimard | ok, that was easy enough once we churned through some of the patches | 23:06 |
clarkb | dmsimard: if the host key has been accepted I think that's it | 23:06 |
dmsimard | I have to step away for dinner but I'll probably take a few out | 23:06 |
dmsimard | clarkb: yeah, did that too. | 23:07 |
clarkb | ya I'm about to call it a day myself. Got up very early and expect I'll try that again to walk frickler through the rest of the process | 23:07 |
dmsimard | clarkb: I'll send you a link later tonight for continuous deployment dashboard spec | 23:07 |
dmsimard | no rush, just sayin | 23:07 |
*** jesusaur has joined #openstack-sprint | 23:09 | |
ianw | would someone mind a quick eye on https://review.openstack.org/#/c/526975/ and i'll see about status.o.o | 23:10 |
ianw | i'm also working through the puppet for nodejs and ethercalc | 23:10 |
clarkb | ianw: ya I can take a look before I call it a day | 23:10 |
ianw | yep we were chatting yesterday, all good | 23:11 |
clarkb | ianw: re 526975 I think you also want to add a status group? see https://review.openstack.org/527245 | 23:13 |
ianw | clarkb: ok, done | 23:15 |
clarkb | ianw: one thing inline | 23:17 |
pabelanger | okay, hieradata for eavesdrop group works, I've committed the change | 23:18 |
jeblair | i've added a grafana group to private hiera | 23:18 |
clarkb | ianw: +2 thanks | 23:19 |
pabelanger | okay, eavesdrop server failed. running with --keep to debug and propose fixes | 23:20 |
ianw | hiera will fall back to the fqdn if the group doesn't exist? | 23:26 |
clarkb | ianw: yes | 23:26 |
clarkb | most specific match wins | 23:26 |
clarkb | in the case of status.o.o -> status01.o.o there won't be an fqdn file for status01.o.o. What I did for translate was to copy the existing translate.o.o fqdn hiera data to a group for translate | 23:27 |
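The fqdn-then-group precedence being described might look something like this in a hiera hierarchy (a sketch only; the paths are illustrative and not the actual system-config layout):

```yaml
# hiera.yaml hierarchy sketch: sources are tried top to bottom, first match wins
:hierarchy:
  - "fqdn/%{::fqdn}"   # e.g. fqdn/status01.openstack.org.yaml (may not exist)
  - "group/%{group}"   # e.g. group/status.yaml, shared by all status* hosts
  - common
```

So when no fqdn file exists for a new host like status01.o.o, lookups fall through to the group data.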
clarkb | and I've now got kids telling me it's walk time so I gotta go | 23:27 |
clarkb | thanks everyone see you tomorrow | 23:27 |
jeblair | http://grafana01.openstack.org/dashboard/db/zuul-status | 23:29 |
jeblair | that looks really promising | 23:29 |
jeblair | i'll delete dns for the old server and add a cname now | 23:30 |
pabelanger | nice | 23:38 |
jeblair | new dns has taken effect for me | 23:41 |
jeblair | i'll delete the old server tomorrow unless someone screams | 23:42 |
*** baoli has joined #openstack-sprint | 23:46 | |
*** baoli_ has joined #openstack-sprint | 23:50 | |
*** baoli has quit IRC | 23:50 | |
pabelanger | okay, I see the issue with eavesdrop01 | 23:52 |
pabelanger | Dec 11 23:26:21 eavesdrop01 puppet-user[11794]: (/Stage[main]/Ptgbot/Exec[install_ptgbot]) Failed to call refresh: Could not find command 'pip3' | 23:52 |
pabelanger | I'll start working on a fix | 23:52 |