*** openstack has joined #openstack-sprint | 00:04 | |
pabelanger | decided to switch to rsync | 00:18 |
pabelanger | had to bring meetbot back online | 00:18 |
*** baoli has joined #openstack-sprint | 00:25 | |
*** openstackstatus has quit IRC | 00:46 | |
*** openstack has joined #openstack-sprint | 00:55 | |
pabelanger | \o/ | 00:55 |
pabelanger | logs persisted to cinder on eavesdrop.o.o | 00:56 |
pabelanger | just waiting for confirmation that logging is still working | 00:57 |
*** anteaya has quit IRC | 00:59 | |
pabelanger | w00t: http://eavesdrop.openstack.org/irclogs/%23openstack-sprint/latest.log.html | 01:00 |
pabelanger | I'll finish the migration of eavesdrop.o.o tomorrow | 01:00 |
pabelanger | should be able to use the same rsync process for lists.o.o too, and have minimal downtime for migrating the data to cinder | 01:01 |
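(A rough sketch of the rsync approach described above; the IRC log tree and cinder mount point paths are placeholders, not the actual locations used. The idea is to bulk-copy while the service is live, then stop it briefly for a final delta sync before swapping the mount.)

    # initial bulk copy while the service is still running; most of the data moves here
    rsync -a /srv/irclogs/ /mnt/cinder/irclogs/
    # stop the service, re-run rsync to pick up only the delta, then swap the mount and restart
    rsync -a --delete /srv/irclogs/ /mnt/cinder/irclogs/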
jhesketh | with server deletions are we taking snapshots or anything before offlining them? | 02:04 |
jhesketh | do we have any recovery plans if we delete something we actually needed? | 02:04 |
*** openstack has joined #openstack-sprint | 02:40 | |
*** rfolco has quit IRC | 03:03 | |
*** baoli has quit IRC | 03:26 | |
*** rfolco has joined #openstack-sprint | 03:46 | |
*** rfolco has quit IRC | 04:16 | |
*** baoli has joined #openstack-sprint | 04:27 | |
*** baoli has quit IRC | 05:39 | |
*** openstack has joined #openstack-sprint | 06:55 | |
*** baoli has joined #openstack-sprint | 12:46 | |
*** baoli_ has joined #openstack-sprint | 12:49 | |
*** baoli has quit IRC | 12:52 | |
*** anteaya has joined #openstack-sprint | 13:16 | |
pabelanger | jhesketh: I've been deleting servers as I have replaced them, once I've confirmed things are working correctly. | 13:18 |
jhesketh | pabelanger: sure I more meant as a backup in case we've missed something and don't notice for a week | 13:19 |
pabelanger | jhesketh: ya, I've assumed we'd find that data on our bup.o.o server, but that is a big assumption on my behalf | 13:20 |
jhesketh | Hmm okay | 13:22 |
pabelanger | I think I'll push eavesdrop.o.o to tomorrow | 13:22 |
jhesketh | There's probably no harm in taking snapshots right? As in the storage is reasonably cheap | 13:22 |
pabelanger | since there are only 2 meetings on Friday | 13:22 |
pabelanger | jhesketh: I don't think so | 13:22 |
*** rcarrillocruz has joined #openstack-sprint | 13:23 | |
rcarrillocruz | oh | 13:24 |
rcarrillocruz | was not aware of a sprint :/ | 13:24 |
pabelanger | np | 13:25 |
pabelanger | What is the plan for logstash-workerXX.o.o? You and yolanda are working on 01? | 13:26 |
rcarrillocruz | we started on it about 45min ago | 13:26 |
rcarrillocruz | playbooks and roles run with the latest changes | 13:26 |
fungi | pabelanger: yeah, i wouldn't make assumptions about backups being viable unless you 1. confirm the server in question actually has backups configured, and 2. test restoring some data from it | 13:26 |
rcarrillocruz | but the ansible-puppet role fails | 13:26 |
rcarrillocruz | as it cannot access hiera files | 13:26 |
*** yolanda has joined #openstack-sprint | 13:26 | |
rcarrillocruz | not sure why | 13:26 |
rcarrillocruz | do you folks run launch-node.py as root? | 13:27 |
yolanda | hi | 13:27 |
rcarrillocruz | yolanda: could you share the commands you are running with pabelanger so he can login and check | 13:27 |
yolanda | sure | 13:27 |
pabelanger | rcarrillocruz: yolanda: Ah, yes that is a bug with our file permissions. We need to change the hieradata to puppet:puppet IMO | 13:27 |
pabelanger | fungi: ack | 13:27 |
rcarrillocruz | oh ok | 13:27 |
pabelanger | I've been doing launch-node.py as root | 13:28 |
jhesketh | fungi: what do you think about snapshotting nodes before deletion? | 13:28 |
rcarrillocruz | so, yolanda , ansible-playbook as 'root' should work then | 13:28 |
yolanda | yes, running with root now | 13:28 |
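(A minimal sketch of the permissions change pabelanger proposes above, assuming the hieradata tree lives at /etc/puppet/hieradata and a 'puppet' group already exists; group ownership plus group read is an assumption about what launch-node.py needs as non-root.)

    # allow members of the puppet group to read hieradata without being root
    sudo chown -R puppet:puppet /etc/puppet/hieradata
    sudo chmod -R g+rX /etc/puppet/hieradata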
fungi | jhesketh: for servers with actual state on their filesystems, i'm not opposed to snapshotting | 13:28 |
fungi | jhesketh: for stuff like zmXX or logstash-workerXX i'm less concerned | 13:28 |
jhesketh | fungi: well that's the tricky part right. If we've got all the state elsewhere (such as git in the case of apps.o.o) we're fine. It's more for what we might have missed or not noticed | 13:29 |
yolanda | rcarrillocruz, weird, it timed out on creating keypairs | 13:30 |
yolanda | going to retry | 13:30 |
jhesketh | But yeah those workers are more obvious | 13:30 |
rcarrillocruz | rax api or net transient issue i assume | 13:30 |
fungi | jhesketh: the main things which make snapshotting useful for some of those is so that we can dig up logs from before the upgrade (or similar stuff like periodic mysqldumps) | 13:31 |
jhesketh | That's a good point too | 13:31 |
rcarrillocruz | once we migrate all the stuff to trusty, i'd like to start a conversation on how we are going to manage the infra resources. Making a playbook that leverages the cloud-launcher role to have feature parity with launch_node.py is ok, but the advantage of the role is to have a full infra defined in a yaml | 13:31 |
jhesketh | Can we set retentions on snapshots? Do we need to or do we have enough quota to keep them indefinitely | 13:32 |
rcarrillocruz | i.e. work out a process for adding new servers / resources in a resources.yml and continuously deploy them | 13:32 |
pabelanger | rcarrillocruz: I think that is some of the proposal nibalizer added, but ya. Eager to do that too | 13:32 |
jhesketh | +1 | 13:33 |
rcarrillocruz | it's linked to what nibalizer talked about one-off thing , yeah... | 13:33 |
fungi | rcarrillocruz: and the design for that is also predicated on refactoring a lot of our manifests so that we can start having hostnames differ from service names/site names | 13:33 |
rcarrillocruz | my next work item is adding the ability for servers in the resources.yml to have a 'node_count', so if you have let's say a server 'logstash-worker.openstack.org' with the node_count attribute set to two, the role would create two numbered resources | 13:34 |
rcarrillocruz | fungi: ++ | 13:34 |
fungi | jhesketh: we've hardly used snapshotting (aside from our old nodepool image method, and that was in a separate tenant) so no idea | 13:34 |
yolanda | rcarrillocruz, it worked fine with root | 13:39 |
anteaya | rcarrillocruz: we talked about the sprint several times in the weekly meeting, how could you have found out about it prior to it starting? | 13:39 |
rcarrillocruz | \o/ | 13:39 |
rcarrillocruz | pabelanger: ^ | 13:39 |
rcarrillocruz | so | 13:39 |
rcarrillocruz | i guess i'll leave it to both of you | 13:39 |
rcarrillocruz | to get rid of ls workers | 13:39 |
rcarrillocruz | and you spin replacements | 13:39 |
anteaya | rcarrillocruz yolanda glad you are here, just wanting to know how you consume information of this nature? | 13:39 |
rcarrillocruz | clarkb mentioned also the dns corrections and modifying the firewall for the changes | 13:40 |
rcarrillocruz | let me search for that irc conversation, i'll paste here | 13:41 |
pabelanger | rcarrillocruz: cool, so a bug on our permissions. Will wait until nibalizer is around to confirm we can update /etc/puppet/hieradata to puppet:puppet | 13:42 |
rcarrillocruz | this is what clarkb put yesterday about additional steps: | 13:42 |
rcarrillocruz | http://paste.openstack.org/show/505652/ | 13:42 |
rcarrillocruz | pabelanger , yolanda : ^ | 13:43 |
rcarrillocruz | anteaya: sorry, i don't follow what you mean by consume information of this nature | 13:43 |
rcarrillocruz | not sure if i missed an earlier comment | 13:43 |
rcarrillocruz | oh | 13:43 |
rcarrillocruz | the sprint announcement | 13:43 |
rcarrillocruz | well | 13:43 |
pabelanger | rcarrillocruz: Yup, had to do the same with zuul.o.o | 13:43 |
rcarrillocruz | i missed the meeting | 13:43 |
rcarrillocruz | i used to be able to attend | 13:43 |
rcarrillocruz | but haven't been able to for the last one | 13:43 |
rcarrillocruz | i should have read the meeting wrap-up i guess | 13:44 |
rcarrillocruz | anyway, it's good that incidentally i was working on something that is *about* this sprint topic :D | 13:44 |
anteaya | rcarrillocruz: oh okay, I just assumed you were reading meeting logs | 13:45 |
anteaya | I'm glad you are here | 13:45 |
rcarrillocruz | out for a bit, my wife is about to arrive with the kid | 13:45 |
rcarrillocruz | be back in an 1h, haven't had lunch yet (starving!) | 13:45 |
anteaya | rcarrillocruz: happy family time | 13:45 |
yolanda | rcarrillocruz, so i totally lost the scope of the sprint, and the intentions with logstash. I have not been able to follow chat and contribute to infra so much lately... so what shall we do with that logstash-test? first of all, is that naming something temporary? i guess we need to be launching replacements for logstash-workers 1-20? | 13:46 |
anteaya | yolanda: hope all is well with your family | 13:46 |
yolanda | anteaya, a bit better, but things are still complicated. Thanks... | 13:47 |
anteaya | yolanda: I understand, yes it is hard to be all the places you want to be | 13:47 |
anteaya | thanks for your help here | 13:47 |
yolanda | family is first priority... but trying to be back to work normally | 13:48 |
yolanda | rcarrillocruz, i guess first step is to approve your changes, so we have a functional launch-node play | 13:48 |
anteaya | yolanda: yup, I agree with your priorities | 13:49 |
anteaya | and I understand enjoying the routine of work | 13:49 |
anteaya | clarkb was back within about 3 weeks of the babies being born | 13:50 |
yolanda | we may be a bit workaholics... | 13:52 |
anteaya | agreed | 13:54 |
anteaya | a big family of workaholics | 13:54 |
pabelanger | yolanda: rcarrillocruz: okay, I am going to do logstash-worker01.o.o using launch-node.py, just to confirm everything works correctly on ubuntu-trusty for the first one. Then, I'll review the topic and see how to use cloud-launcher for the next server | 14:07 |
yolanda | ++ | 14:07 |
rcarrillocruz | ++ | 14:07 |
pabelanger | clarkb: is there a procedure for taking a logstash-worker out of service? | 14:07 |
pabelanger | dropping the DNS TTL to 5mins on logstash-workers | 14:09 |
clarkb | pabelanger: no, you just stop services on them, logstash and the 4 jenkins log workers | 14:12 |
clarkb | as rcarrillocruz mentioned you will have to bounce iptables on the elasticsearch hosts and logstash.o.o after dns updates | 14:13 |
pabelanger | clarkb: perfect | 14:13 |
pabelanger | ack | 14:13 |
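(For context, 'bouncing iptables' here means reloading the persisted firewall rules on the hosts that whitelist the workers so the rules re-resolve the updated DNS names; the exact command below is an assumption about how the trusty-era hosts manage their rules.)

    # on logstash.o.o / the elasticsearch hosts, after DNS has been updated
    sudo service iptables-persistent restart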
rcarrillocruz | yolanda: can you pls add the steps to create a worker with the launcher onto https://etherpad.openstack.org/p/newton-infra-distro-upgrade-plans ? | 14:14 |
rcarrillocruz | with the source ansible/hacking/env-setup etc | 14:14 |
rcarrillocruz | rax params and all | 14:14 |
rcarrillocruz | so pabelanger or whoever can just copy paste and run it | 14:14 |
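(Roughly the kind of copy-paste one-liner being requested on the etherpad; the playbook name and extra-vars below are placeholders, only the env-setup sourcing comes from the discussion above.)

    # pick up the ansible checkout via hacking/env-setup, then run the cloud-launcher play
    source ~/ansible/hacking/env-setup
    ansible-playbook -v cloud_launcher.yml -e 'cloud=rax region=DFW'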
pabelanger | rcarrillocruz: why do we need ansible-devel branch? | 14:15 |
rcarrillocruz | because of this bug: https://github.com/ansible/ansible/issues/14146 | 14:16 |
rcarrillocruz | you can't have include with_items nested, it's broken | 14:16 |
rcarrillocruz | that will land on ansible 2.1 , which is about to be released | 14:16 |
pabelanger | 2.1 was released yesterday :) | 14:16 |
yolanda | i tested with ansible 2.2 on my venv | 14:17 |
rcarrillocruz | was it? | 14:17 |
rcarrillocruz | \o/ | 14:17 |
rcarrillocruz | haven't seen the announcement on ansible-devel | 14:17 |
* rcarrillocruz goes check | 14:17 | |
rcarrillocruz | weeee | 14:18 |
rcarrillocruz | you folks ok if I bump the version on puppetmaster | 14:18 |
rcarrillocruz | ? | 14:18 |
pabelanger | clarkb: does it make more sense to just delete the logstash-worker from rackspace vs stopping services? | 14:20 |
clarkb | pabelanger: I would leave old one up, make new one, bounce iptables, make sure new one works then delete old | 14:21 |
pabelanger | okay, I'm fine with that | 14:21 |
clarkb | its fine to have both running at the same time (we normally have 20 instances) | 14:21 |
rcarrillocruz | hmm | 14:21 |
rcarrillocruz | https://github.com/openstack-infra/puppet-ansible/blob/master/manifests/init.pp | 14:21 |
rcarrillocruz | so by default the class installs latest from pip | 14:22 |
rcarrillocruz | i vaguely recall an issue with the pip provider where latest had issues | 14:22 |
rcarrillocruz | can someone confirm if the puppetmaster ansible is now on 2.1 ? | 14:22 |
rcarrillocruz | i.e. 'ansible --version' | 14:22 |
yolanda | checking | 14:22 |
yolanda | 2.1.0.0 | 14:23 |
rcarrillocruz | \o/ | 14:23 |
rcarrillocruz | yesterday you pasted me 2.0.0.2 | 14:23 |
rcarrillocruz | so all good then | 14:23 |
rcarrillocruz | just paste the ansible-playbook command and will be good :D | 14:24 |
yolanda | yes | 14:24 |
yolanda | it got updated | 14:24 |
rcarrillocruz | thx a bunch | 14:24 |
yolanda | let me try without my venv, just with the ansible we have | 14:24 |
rcarrillocruz | and sorry for being a pain and being woman-in-the-middle :/ | 14:24 |
yolanda | glad to help a bit with the sprint, even just for that... | 14:25 |
pabelanger | launching replacement logstash-worker01.o.o now | 14:25 |
yolanda | confirmed that play works with ansible 2.1 version | 14:28 |
rcarrillocruz | 'o/ | 14:30 |
rcarrillocruz | \o/ | 14:30 |
rcarrillocruz | and with that, i can resume the work on my tests for the cloud launcher... was reluctant to test against devel on requirements.txt | 14:31 |
pabelanger | clarkb: so looking at logstash-worker01.o.o replacement, I guess our services don't launch on start up? They need to be manually started? | 14:47 |
rcarrillocruz | that would follow a common pattern on other manifests, like zuul... | 14:48 |
clarkb | pabelanger: ya that sounds right | 14:48 |
pabelanger | rcarrillocruz: zuul-mergers will start on boot, that's why I was asking | 14:49 |
pabelanger | clarkb: ack | 14:49 |
clarkb | logtsash should start iirc | 14:49 |
clarkb | but not the workers | 14:49 |
pabelanger | Right, jenkins workers are stopped | 14:50 |
nibalizer | pabelanger: rcarrillocruz heya | 15:20 |
nibalizer | what's up with hiera? | 15:20 |
pabelanger | clarkb: okay, I think logstash-worker01.o.o is upgraded properly. Is there an easy way to confirm it is working properly? | 15:20 |
pabelanger | nibalizer: We think the permissions on /etc/puppet/hieradata are too restrictive for non-root users in the puppet group, and we have some issues using launch-node.py as non-root | 15:22 |
clarkb | pabelanger: tail /var/log/logprocessor files | 15:22 |
clarkb | you should see those log files reporting work is happening | 15:22 |
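(Concretely, something like the following on the new worker; the exact filenames are a guess at the layout clarkb describes, one log per jenkins-log-worker process.)

    # watch the log processor files; they should keep advancing if work is flowing
    tail -f /var/log/logprocessor/*.log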
pabelanger | clarkb: only issue I see is some HTTPError: HTTP Error 404: Not Found | 15:23 |
pabelanger | not sure if that is expected | 15:23 |
clarkb | ya that is expected since it is greedy and many log files dont exist on all the jobs | 15:24 |
clarkb | pabelanger: but the log files are advancing? | 15:24 |
pabelanger | clarkb: yes | 15:24 |
clarkb | then should be good | 15:25 |
pabelanger | perfect! | 15:25 |
nibalizer | pabelanger: well those are the keys to the kingdom... where are you seeing a permission denied? | 15:25 |
clarkb | they will not advance if part of the pipeline is plugged | 15:25 |
nibalizer | i am surprised that launch-node is going into /etc/puppet/hiera | 15:25 |
pabelanger | nibalizer: issue revolves around copying hiera data onto the new node, the bits fail to copy properly (which I assume is because of permission issues). I haven't debugged it, but have switched to root for the moment. | 15:26 |
rcarrillocruz | nibalizer: not the script itself, but the remote-adhoc-puppet playbook, which in turn uses ansible-puppet | 15:26 |
nibalizer | oh interesting | 15:28 |
nibalizer | yep that will fail | 15:28 |
nibalizer | we wrote the hiera-copy stuff assuming root | 15:29 |
nibalizer | so it comes down to which is more valuable - running launch_node.py as nonroot or having /etc/hieradata locked off | 15:29 |
pabelanger | right | 15:29 |
nibalizer | there is at least one more option, that sudo is used to grab the file from /etc/hieradata | 15:29 |
nibalizer | so yea i'd defer to fungi clarkb and jeblair on that one | 15:30 |
fungi | yes, i've been running in an interactive root shell because the hiera copying doesn't work if launch-node.py is run as non-root | 15:32 |
rcarrillocruz | pabelanger: so, does trusty work ok for the logstash worker manifest? | 15:37 |
pabelanger | rcarrillocruz: yup | 15:37 |
rcarrillocruz | nice | 15:37 |
pabelanger | going to do a few more now, then try out cloud-launcher after lunch | 15:39 |
rcarrillocruz | ++ | 15:40 |
* rcarrillocruz goes for a coffee | 15:42 | |
pabelanger | okay, letting logstash-worker05.o.o build while I get some food | 16:27 |
pabelanger | once online, I'll use cloud-launcher for logstash-worker06.o.o | 16:27 |
*** rfolco has joined #openstack-sprint | 16:53 | |
rcarrillocruz | pabelanger: i can't see the entry from yolanda on the etherpad about the exact one-liner she ran | 16:54 |
rcarrillocruz | but left instructions at the bottom | 16:54 |
rcarrillocruz | you should be able to work it out with that | 16:54 |
pabelanger | switching to cloud-launcher | 17:01 |
clarkb | I am about to do logstash.o.o again | 17:02 |
*** baoli_ has quit IRC | 17:06 | |
*** baoli has joined #openstack-sprint | 17:06 | |
pabelanger | rcarrillocruz: first issue: http://paste.openstack.org/show/505708/ | 17:13 |
rcarrillocruz | ah well | 17:13 |
rcarrillocruz | put this in your .ansible.cfg: | 17:13 |
rcarrillocruz | [defaults] | 17:13 |
rcarrillocruz | host_key_checking = no | 17:13 |
rcarrillocruz | alternatively | 17:13 |
rcarrillocruz | export ANSIBLE_HOST_KEY_CHECKING=False | 17:14 |
rcarrillocruz | that should get you past that issue | 17:14 |
clarkb | please do not set that in roots ansible.cfg | 17:14 |
pabelanger | Right, I'm not a fan of having to setup defaults in ansible.cfg to run it honestly | 17:14 |
clarkb | the env var for one time use would be preferable | 17:14 |
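(i.e. set it only for the single run instead of persisting it in root's ansible.cfg; the playbook name here is just a placeholder.)

    # one-off run with host key checking disabled
    ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook launch-something.yml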
pabelanger | we should be able to setup a pre_task or something to dynamically disable it | 17:15 |
jeblair | it looks like ansible on puppetmaster is sploding: | 17:16 |
jeblair | 2016-05-26 17:13:10,734 p=13399 u=root | An exception occurred during task execution. To see the full traceback, use -vvv. The error was: TypeError: string indices must be integers, not str | 17:16 |
jeblair | 2016-05-26 17:13:10,735 p=13399 u=root | fatal: [paste.openstack.org -> localhost]: FAILED! => {"changed": false, "failed": true, "module_stderr": "Traceback (most recent call last):\n File \"/tmp/ansible_MIFKWV/ansible_module_puppet_post_puppetdb.py\", line 149, in <module>\n main()\n File \"/tmp/ansible_MIFKWV/ansible_module_puppet_post_puppetdb.py\", line 78, in main\n fqdn = ... | 17:16 |
jeblair | ... p['hostvars']['inventory_hostname']\nTypeError: string indices must be integers, not str\n", "module_stdout": "", "msg": "MODULE FAILURE", "parsed": false} | 17:16 |
jeblair | that's for every host | 17:16 |
rcarrillocruz | ansible 2.1 issue ? | 17:16 |
jeblair | i guess it's just for posting facts | 17:16 |
jeblair | oh, istr someone saying posting facts is broken | 17:17 |
jeblair | is that what was meant, or is that something new? | 17:17 |
rcarrillocruz | not aware of issues specific of ansible 2.1, but it got upgraded on the puppetmaster yesterday | 17:18 |
rcarrillocruz | so unless someone identifies that prior to yesterday, could be linked | 17:18 |
rcarrillocruz | although that error message suggests what the fix should be | 17:19 |
rcarrillocruz | passing a filter to type cast it to int | 17:19 |
pabelanger | jeblair: missing if __name__ == '__main__' bits? | 17:19 |
pabelanger | ya, ansible-puppet doesn't have that | 17:20 |
pabelanger | let me patch, see if that helps | 17:20 |
jeblair | how is it working at all then? | 17:20 |
pabelanger | I thought the magic bits were optional until 2.1? I think that is what dshrews mentioned? | 17:21 |
pabelanger | honestly, I'm just guessing ATM | 17:21 |
jeblair | yeah, i just mean, the "run puppet" task works, but the "post facts" task doesn't. they are 2 different modules, but both lack the ifnamemain | 17:21 |
pabelanger | Right, not sure actually | 17:22 |
jeblair | it also looks like no 'group' hiera files are being copied over | 17:25 |
clarkb | finally tracked down the logstash.o.o failures, related to the apache update it looks like | 17:25 |
jeblair | grep "hieradata/production/group" puppet_run_all.log | 17:25 |
jeblair | returns nothing | 17:25 |
jeblair | group copying last worked at 2016-05-25 12:52:02,469 | 17:26 |
rcarrillocruz | pabelanger: going thru now? | 17:26 |
rcarrillocruz | i gotta leave shortly to buy some food | 17:27 |
pabelanger | rcarrillocruz: stopped for the moment, going to try again shortly | 17:27 |
jeblair | "MODULE FAILURE" first happened at 2016-05-25 13:03:58,514 | 17:27 |
jeblair | do we know when we upgraded to 2.1? | 17:27 |
rcarrillocruz | yesterday jeblair | 17:28 |
rcarrillocruz | the class installs latest from pip by default | 17:28 |
jeblair | i mean a specific time | 17:28 |
rcarrillocruz | time not sure, pypi page should tell | 17:28 |
rcarrillocruz | sec | 17:28 |
pabelanger | I don't see a timestamp on https://pypi.python.org/pypi/ansible/2.1.0.0 | 17:28 |
jeblair | i'm trying to determine if both of those problems (which started within 10 minutes of each other) are related to 2.1 | 17:28 |
jeblair | well, i'm more interested in when our puppet upgraded it :) | 17:29 |
rcarrillocruz | hmm, i don't see upload time | 17:29 |
rcarrillocruz | https://pypi.python.org/pypi/ansible/2.1.0.0 | 17:29 |
*** baoli_ has joined #openstack-sprint | 17:29 | |
jeblair | May 25 13:02:02 puppetmaster puppet-user[5811]: (/Stage[main]/Ansible/Package[ansible]/ensure) ensure changed '2.0.2.0' to '2.1.0.0' | 17:29 |
pabelanger | was just about to link that | 17:30 |
jeblair | so yep, i think that both the post module failure and the group hiera copying issues are likely related | 17:30 |
rcarrillocruz | ttyl | 17:31 |
*** baoli has quit IRC | 17:32 | |
jeblair | remote: https://review.openstack.org/321772 Pin to ansible 2.0.2.0 | 17:33 |
pabelanger | jeblair: +2 | 17:34 |
jeblair | clarkb: would you +3 that please? | 17:34 |
* clarkb looks | 17:35 | |
anteaya | jeblair: where is the remote: bit coming from, is that an artifact from gertty? | 17:36 |
clarkb | approved, note that that host may not be able to puppet itself right now | 17:36 |
anteaya | when you post a patch url | 17:36 |
clarkb | we may have to manually downgrade after that merges | 17:36 |
jeblair | anteaya: it's what gerrit sends back to 'git review' (and git review prints it on the terminal) | 17:41 |
jeblair | anteaya: i've just been copy/pasting the whole line, so 'remote:' shows up | 17:42 |
anteaya | jeblair: ah yes, makes sense, thank you | 17:42 |
jeblair | clarkb: and yeah, i can manually downgrade once that lands | 17:42 |
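(The manual downgrade would be roughly the following, assuming ansible is installed system-wide via pip on puppetmaster as the earlier checks suggest.)

    # pin back to the last known-good release until the 2.1 issues are sorted out
    sudo pip install ansible==2.0.2.0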
fungi | pabelanger: were you wanting to snapshot status.o.o and then delete it? or want me to? | 17:46 |
fungi | (the old one i mean, not the new one of course) | 17:46 |
pabelanger | fungi: Ya, I just have it shutdown for the moment. I'll defer to you on how to handle it | 17:47 |
fungi | pabelanger: i'm creating an image from it called "status-precise-backup" and will delete the server instance once that shows up | 17:48 |
pabelanger | fungi: ack | 17:48 |
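(With python-openstackclient that kind of pre-deletion snapshot is roughly the following; the image name matches what fungi mentions above, but the exact client invocation is an assumption.)

    # snapshot the old server, wait for the image to go active, then delete the instance
    openstack server image create --name status-precise-backup status.openstack.org
    openstack image show status-precise-backup
    openstack server delete status.openstack.org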
fungi | once it looks like ansible is happy on puppetmaster again, i'll try booting paste01.openstack.org (bkero's change to pass through the vhost name has merged just a little while ago) | 17:50 |
bkero | \o/ | 18:05 |
clarkb | pabelanger: for the recently completed 08 host are you using new ansible thingy or old launch script? | 18:12 |
clarkb | curious because downgrade of ansible will break the new thing aiui | 18:12 |
pabelanger | clarkb: I reverted to launch-node.py | 18:12 |
pabelanger | I tried cloud-launcher a few more times, but ran into issues | 18:13 |
pabelanger | likely trivial to fix, but don't want to get bogged down debugging it right now | 18:13 |
clarkb | kk | 18:13 |
clarkb | I am really happy at how many of these we are knocking out. I am sad I didn't have time to help more early in the week but the list of precise is dwindling | 18:14 |
pabelanger | Ya, so far things are working really well | 18:14 |
pabelanger | I am pleased | 18:14 |
clarkb | pabelanger: if my current stack for logstash.o.o fails I will need to add a third change which adds the guards to puppet-logstash | 18:16 |
anteaya | pabelanger: eavesdrop is done? | 18:16 |
pabelanger | anteaya: not yet, going to do it tomorrow. only 2 meetings scheduled | 18:16 |
pabelanger | but data is persisted | 18:16 |
anteaya | pabelanger: awesome | 18:16 |
anteaya | now I understand what is in the etherpad for that server | 18:17 |
pabelanger | I quickly looked at lists.o.o for the data, but need to figure out the mount point | 18:17 |
anteaya | fungi: storyboard.o.o is done I think, yes? | 18:17 |
anteaya | pabelanger: cool | 18:17 |
rcarrillocruz | pabelanger: paste me the issues so I can check when back home | 18:17 |
pabelanger | rcarrillocruz: will do | 18:18 |
clarkb | jeblair: does zuul 2.5 have a story for our privileged long running slaves? wondering if jenkins.o.o needs to be treated separately from the other jenkinses | 18:18 |
anteaya | zuul got restarted yesterday to pick up a patch, what is the status of its server? | 18:18 |
clarkb | anteaya: zuul is still precise | 18:18 |
anteaya | okay | 18:19 |
pabelanger | Ya, that should be straightforward to upgrade. We just need to schedule the outage I think | 18:19 |
jeblair | clarkb: thanks for asking! https://review.openstack.org/321584 https://review.openstack.org/321615 https://review.openstack.org/321616 | 18:19 |
anteaya | so zuul, wiki and static I think have no notes beside them | 18:19 |
clarkb | pabelanger: doing the ES hosts shouldn't be too difficult either. The process there will be to run that temporary no-allocation curl command, shut off ES on a host, detach its cinder volume, boot the new host reattaching that cinder volume, start ES, delete the old host, then enable allocation again | 18:20 |
anteaya | all others appear to be in some sort of progress | 18:20 |
clarkb | pabelanger: really similar to how we did the ES upgrades | 18:20 |
pabelanger | clarkb: Ya, was going to ask what was needed for that. But makes sense | 18:21 |
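(A sketch of that per-host sequence; the allocation toggle is the standard Elasticsearch cluster setting, while the server and volume names below are placeholders.)

    # 1. stop shard allocation so the cluster does not rebalance while a node is down
    curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.enable":"none"}}'
    # 2. stop ES on the old host and detach its data volume
    sudo service elasticsearch stop
    openstack server remove volume old-es-host es-data-volume
    # 3. boot the trusty replacement, reattach the same volume, start ES
    openstack server add volume new-es-host es-data-volume
    # 4. delete the old host, then re-enable allocation
    curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.enable":"all"}}'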
fungi | anteaya: yep, just crossed it out | 18:21 |
*** baoli_ has quit IRC | 18:21 | |
anteaya | fungi: awesome, thank you | 18:21 |
*** baoli has joined #openstack-sprint | 18:21 | |
clarkb | just looking at the list static is likely to be tricky | 18:22 |
clarkb | we will need to pause all new jobs, wait for running jobs to finish (or kill them early), then do a cinder volume shuffle | 18:22 |
pabelanger | clarkb: maybe we can do both zuul.o.o and static.o.o at the same time | 18:23 |
clarkb | pabelanger: good idea | 18:24 |
clarkb | jeblair: comments on https://review.openstack.org/#/c/321616/2 | 18:25 |
jeblair | i can not type that line correctly | 18:27 |
clarkb | jeblair: comment on https://review.openstack.org/#/c/321615/2 as well | 18:28 |
clarkb | now to review the big change that implements the thing | 18:28 |
clarkb | "big" | 18:29 |
jeblair | heh, it totally replaces a comment with the thing the comment said we would replace it with someday! | 18:29 |
anteaya | ...how big is it... | 18:29 |
clarkb | well its dense | 18:29 |
clarkb | all that erb to read | 18:30 |
clarkb | half the characters are non-alnum | 18:30 |
clarkb | jeblair: squashing may not be a terrible idea, but either way | 18:31 |
jeblair | clarkb: there is some trepidation the hiera thing may not work. i'd like to land both and try it, but if it doesn't, i kept them separate so we can revert the 2nd and have working zl. | 18:32 |
clarkb | gotcha | 18:32 |
jeblair | testing also caught an issue with 615, fixing that too | 18:33 |
jeblair | both sysconfig changes updated | 18:36 |
pabelanger | okay, 10 of 20 logstash-workers upgraded to ubuntu-trusty | 18:37 |
pabelanger | Going to take a break and walk down to pickup my daughter from school | 18:37 |
pabelanger | I'll finish off the other 10 when I get back | 18:37 |
jeblair | oh, updated one more time because i forgot our group double-accounting | 18:41 |
jeblair | pabelanger: btw, you can #status in any statusbot channel (incl this one) | 18:41 |
clarkb | jeblair: one more thing on https://review.openstack.org/#/c/321615/4 I don't think the regexes currently match up between ansible and puppet, I left a comment for what a possible regex would be | 18:45 |
jeblair | clarkb: oh yep | 18:46 |
jeblair | done | 18:47 |
*** baoli has quit IRC | 19:01 | |
*** baoli has joined #openstack-sprint | 19:01 | |
rcarrillocruz | back | 19:08 |
rcarrillocruz | sup pabelanger , how many workers have been migrated | 19:08 |
anteaya | rcarrillocruz: he is getting his daughter | 19:12 |
rcarrillocruz | cool , thx | 19:13 |
anteaya | welcome | 19:13 |
nibalizer | yall are doing a great job! | 19:13 |
nibalizer | sorry im not helping! | 19:14 |
anteaya | nibalizer: I think you have helped in a few key moments | 19:14 |
clarkb | nodepool needs a hug | 19:19 |
anteaya | <hug> | 19:21 |
*** baoli has quit IRC | 19:23 | |
pabelanger | jeblair: neat, TIL | 19:29 |
pabelanger | rcarrillocruz: 10 of 20 ATM | 19:30 |
pabelanger | going to finish them off using launch-node.py | 19:30 |
pabelanger | plan to do some more testing of cloud-launcher once I'm finished | 19:30 |
rcarrillocruz | you remember what kind of issues you had earlier so I could poke? | 19:34 |
pabelanger | rcarrillocruz: ssh hostchecking was 1 | 19:35 |
pabelanger | let me see if I have backscroll of other | 19:35 |
pabelanger | I did a quick hack to disable it via env in playbook | 19:35 |
clarkb | my logstash fixes are still hanging out in check | 19:36 |
anteaya | playing cards, drinking beer | 19:37 |
pabelanger | clarkb: Ya, don't see any stale nodes. Just busy today it seems | 19:39 |
*** baoli has joined #openstack-sprint | 19:41 | |
pabelanger | #status log logstash-worker11.openstack.org now running ubuntu-trusty and processing requests | 19:45 |
openstackstatus | pabelanger: finished logging | 19:45 |
clarkb | pabelanger: busy and osic and bluebox are basically offline due to fip things | 19:46 |
clarkb | or rather osic is not sure about bluebox | 19:46 |
pabelanger | clarkb: Ya, looking forward to when we fix shade | 19:47 |
fungi | looks like we're back to a working ansible version on puppetmaster again | 19:47 |
fungi | thanks jeblair! | 19:47 |
clarkb | fungi: jeblair did someone manually downgrade or did it sort itself out? | 19:48 |
fungi | i don't know, i simply checked `pip list|grep ansible` | 19:48 |
pabelanger | not I | 19:49 |
pabelanger | clarkb: we do have a large number of servers in delete state on nodepool.o.o, about 132 | 19:52 |
pabelanger | 86 in OSIC alone | 19:52 |
pabelanger | so, not that bad, if we account for the FIP issue | 19:53 |
clarkb | pabelanger: they are in that state due to the fip issue | 19:53 |
clarkb | pabelanger: they take an hour to build, timeout, then get deleted | 19:53 |
pabelanger | we seem to be on an uptick of deleting nodes however: http://grafana.openstack.org/dashboard/db/nodepool | 19:53 |
pabelanger | Hmm, something up with ORD: http://grafana.openstack.org/dashboard/db/nodepool-rackspace | 19:54 |
pabelanger | 13mins time to ready ATM | 19:54 |
pabelanger | status.rackspace.com is reporting some ORD storage maintenance today: | 19:55 |
pabelanger | https://status.rackspace.com/ | 19:55 |
fungi | unfortunately my first attempt at booting paste01 failed, so i'm rerunning with --keep and going out for a walk | 20:02 |
fungi | bbiaw | 20:02 |
pabelanger | I'm having some issues using ansible-playbook on puppetmaster.o.o. | 20:08 |
pabelanger | looks to be related to the inventory | 20:08 |
anteaya | fungi: enjoy your walk | 20:09 |
pabelanger | I suspect JJB is running on jenkins servers, which is affecting it | 20:11 |
clarkb | can has approval for https://review.openstack.org/#/c/321778/ and its dependency? | 20:12 |
jeblair | fungi, clarkb, pabelanger: neat. i did not manually fix puppetmaster, guess it fixed itself | 20:13 |
clarkb | with that stack in I will retry making logstash.o.o | 20:13 |
jeblair | clarkb: you might use zuul enqueue | 20:13 |
clarkb | jeblair: I don't think I need enqueue, they both passed testing | 20:13 |
clarkb | just need review and approvals | 20:13 |
jeblair | clarkb: oh, thought you mentioned something being stuck in check | 20:13 |
clarkb | they were I occupied my time with other stuff so it was fine | 20:14 |
pabelanger | http://paste.openstack.org/show/505734/ | 20:15 |
pabelanger | that's the error I am seeing now when I run ansible-playbook | 20:15 |
pabelanger | hoping it fixes itself | 20:15 |
clarkb | hrm looks like it can't talk to osic? | 20:16 |
clarkb | maybe try using openstackclient against the same clouds.yaml | 20:17 |
pabelanger | clarkb: ya, looks to be an issue | 20:18 |
pabelanger | going to hop into #osic to see what is going on | 20:18 |
clarkb | ok | 20:18 |
pabelanger | clarkb: all quiet in #osic. Do we have another contact besides cloudnull? I believe he is on vacation today | 20:23 |
pabelanger | additionally, guess we found a bug in openstack inventory | 20:24 |
pabelanger | since losing a cloud stop our puppet wheel | 20:25 |
clarkb | oh you know what | 20:26 |
clarkb | I think the ssl cert had a really short time before expiry | 20:26 |
clarkb | pabelanger: maybe check if the ssl cert for it is still good? | 20:26 |
pabelanger | Hmm | 20:27 |
pabelanger | Issued On Thursday, May 26, 2016 at 2:27:00 PM according to chrome | 20:28 |
pabelanger | I'm also using python-openstackclient | 20:29 |
clarkb | ya so thats brand new I wonder if related | 20:30 |
pabelanger | https://bugs.launchpad.net/python-openstackclient/+bug/1447704 | 20:30 |
openstack | Launchpad bug 1447704 in python-openstackclient "token issue fails for keystone v2 if OS_PROJECT_DOMAIN_NAME or OS_USER_DOMAIN_NAME are set" [Medium,Fix released] - Assigned to Hieu LE (hieulq) | 20:30 |
pabelanger | looks like the same backtrace I am seeing | 20:30 |
pabelanger | DiscoveryFailure: Could not determine a suitable URL for the plugin | 20:31 |
clarkb | maybe we updated other libs? | 20:32 |
pabelanger | python-openstackclient 2.5.0 was just tagged 1 hour ago | 20:32 |
pabelanger | with a fix | 20:32 |
pabelanger | let me test in a venv | 20:32 |
pabelanger | same issue | 20:35 |
pabelanger | and --insecure doesn't work either | 20:35 |
pabelanger | http://paste.openstack.org/show/505739/ | 20:37 |
pabelanger | SNIMissingWarning is new to me | 20:37 |
clarkb | pabelanger: thats part of urllib3 trying to be a good citizen by annoying its users in hopes they will get the services they talk to to fix their ssl certs | 20:38 |
jeblair | oh, it's over here :) | 20:38 |
clarkb | er not SNI there is a different one. In any case urllib3 has a handful of warnings that are "hey user bad things that you probably can't easily fix yourself" | 20:38 |
pabelanger | right | 20:39 |
jeblair | OpenStackCloudException: error fetching floating IPs list: 503 Service Unavailable | 20:40 |
jeblair | The server is currently unavailable. Please try again at a later time. | 20:40 |
jeblair | that's what running nodepool is seeing | 20:40 |
jeblair | or at least one of the errors | 20:40 |
pabelanger | so, maybe they are down | 20:40 |
jeblair | of course, it may have something cached | 20:40 |
pabelanger | I have a query into #osic | 20:40 |
jeblair | we might see something different if we restart nodepool | 20:40 |
pabelanger | jeblair: clarkb: seems to be related to the new SSL cert, according to #osic. They are working on it | 20:49 |
clarkb | fun | 20:50 |
clarkb | pabelanger: I am guessing that broken ansible inventory is preventing the puppet modules from updating on puppetmaster because we use ansible to do that | 20:50 |
jeblair | oh, what's wrong with ansible inventory? | 20:51 |
clarkb | jeblair: the osic thing | 20:52 |
clarkb | it doesn't gracefully handle clouds being gone | 20:52 |
jeblair | oh | 20:52 |
pabelanger | clarkb: ya | 20:52 |
pabelanger | I don't know how to tell openstack inventory to skip osic | 20:52 |
jeblair | we could remove it from that clouds.yaml | 20:52 |
jeblair | it might be nice to fix it so that it fails gracefully, but some day we're going to have to think about what that means for a system that wants to automatically create servers that don't exist | 20:53 |
pabelanger | Ya, I think commenting it out for the moment is our fix | 20:54 |
clarkb | it will put itself back in if you don't disable puppet on the puppetmaster | 20:54 |
jeblair | i will do these things | 20:54 |
pabelanger | thanks | 20:55 |
jeblair | #status log puppet disabled on puppetmaster (for the puppetmaster host itself -- not globally) and OSIC manually removed from clouds.yaml because OSIC is down which is causing ansible openstack inventory to fail | 20:57 |
openstackstatus | jeblair: finished logging | 20:57 |
clarkb | tyty | 20:57 |
clarkb | new logstash.o.o launching now | 21:03 |
pabelanger | #osic says they are reverting the SSL cert now | 21:03 |
jeblair | puppet run all is running | 21:03 |
pabelanger | nodepool.o.o is building nodes again in OSIC | 21:04 |
pabelanger | #status log logstash-worker12.openstack.org now running ubuntu-trusty and processing requests | 21:05 |
openstackstatus | pabelanger: finished logging | 21:05 |
fungi | bkero: if i try to launch paste01.openstack.org it doesn't puppet sufficiently for me to even be able to log into it, so no idea what's wrong there. if i launch paste.openstack.org it works fine: 2001:4800:7817:104:be76:4eff:fe06:83b8, 23.253.238.187 | 21:06 |
fungi | (for definitions of fine where i needed to `sudo start openstack-paste` because it doesn't start automagically, that is) | 21:07 |
bkero | fungi: Huh, let me check the service resource agaib | 21:08 |
bkero | again* | 21:08 |
fungi | so anyway, i'm inclined to just replace it with the trusty one i booted as paste.o.o and have tested and confirmed to be up and working/serving content from trove | 21:12 |
bkero | ok | 21:13 |
* bkero looks at the puppetboard run anyway to see if it's something we need to be worried about | 21:13 | |
clarkb | pabelanger: I got a whole bunch of http://paste.openstack.org/show/505745/ | 21:14 |
clarkb | that almost seems like an issue with new ansible | 21:15 |
clarkb | but pip says 2.0.2.0 is installed | 21:15 |
clarkb | in any case I appear to have a new logstash.o.o that didn't break during puppeting | 21:16 |
clarkb | should I go ahead and use it or debug the above issues first? | 21:16 |
jeblair | clarkb: yeah, that's the same inventory error. apparently i failed at preventing it from being reverted | 21:17 |
fungi | bkero: i can retry with paste01 one more time just to confirm it wasn't a fluke | 21:17 |
jeblair | clarkb: because it's 'localhost' | 21:18 |
clarkb | aha | 21:19 |
bkero | fungi: ok, do you know where i can see the report made by the puppet run? | 21:19 |
clarkb | jeblair: do you think I should rerun? I do not know which step requires the inventory, but the host is built | 21:19 |
clarkb | jeblair: its pretty cheap to delete and rebuild for safety though | 21:19 |
jeblair | clarkb: i think if you run with our ansible kick thing, it should be fine | 21:19 |
fungi | bkero: nope. i don't think launch-node does trigger a report? | 21:19 |
clarkb | jeblair: I am not sure I know what that is | 21:19 |
bkero | Oh, hrm | 21:20 |
fungi | bkero: i'm really not sure if it does or not anyway | 21:20 |
jeblair | clarkb: tools/kick.sh (which runs the adhoc playbook) | 21:20 |
bkero | clarkb: any clue if launch-node generates puppet reports? | 21:20 |
fungi | bkero: puppet apply logs in syslog, but since we don't get that back through ansible, if it doesn't puppet far enough to set up my account i can't ssh in to look at the errors | 21:20 |
clarkb | bkero: no idea | 21:20 |
clarkb | jeblair: oh puppet ran and everything just fine | 21:21 |
bkero | i tested locally, but obv the environment is different | 21:21 |
clarkb | which is why I am confused about why it needs to execute the inventory script, maybe to update the cache | 21:21 |
jeblair | (though, looking at that, i wonder if disabling localhost (which i have now done) will have further adverse effects) | 21:21 |
jeblair | clarkb: oh! yes, launch-node does a cache flush | 21:21 |
jeblair | clarkb: i thought it just removed, but maybe it repopulates too? | 21:21 |
clarkb | ya I am guessing that is what it is trying to do | 21:21 |
clarkb | I am going to just redo since its quick and low cost and will ensure all that data is correct | 21:22 |
jeblair | manage-projects is running on review.o.o and taking seriously long time | 21:22 |
jeblair | clarkb: it runs 'expand-groups' after clearing the cache | 21:22 |
jeblair | clarkb: which uses ansible to list hosts, so yeah | 21:23 |
clarkb | ok rebuilding now | 21:24 |
bkero | I'm surprised puppet could be borked enough to not even set up users. I wonder if install_puppet had a network hiccup or something | 21:24 |
pabelanger | clarkb: that usually happens when ansible inventory is doing something | 21:25 |
pabelanger | clarkb: I don't know what, but it eventually fixes itself | 21:26 |
fungi | bkero: yeah, the failure is consistent. this is the error i get back from the launch script: | 21:29 |
fungi | fatal: [paste01.openstack.org]: FAILED! => {"changed": false, "disabled": false, | 21:29 |
fungi | "error": true, "failed": true, "msg": "puppet did not run", "rc": 1, "stderr": "", "stdout": "", "stdout_lines": []} | 21:29 |
fungi | bkero: and if i try to ssh to the ipv4 or ipv6 address of the kept (broken) server, it prompts me for a password implying it didn't get far enough to puppet my ssh key on there | 21:30 |
bkero | fungi: the comments in ansible-puppet seem to indicate that it's a compilation failure. | 21:31 |
fungi | and if i make the hostname paste.openstack.org instead, it's fine | 21:31 |
clarkb | well it failed again, I am just going to ignore that stuff for now and move forward with finishing this server replacement | 21:31 |
clarkb | any objections? | 21:32 |
jeblair | clarkb: sounds good | 21:33 |
bkero | fungi: The only difference I can think of is if the "vhost_name" parameter makes catalog application fail :/ | 21:34 |
fungi | bkero: which is odd since it's a class parameter passed directly to http://git.openstack.org/cgit/openstack-infra/system-config/tree/modules/openstack_project/manifests/paste.pp#n6 | 21:36 |
bkero | fungi: Yep. Shouldn't make a diff. | 21:36 |
clarkb | DNS updates are done | 21:37 |
clarkb | will bounce iptables on jenkinses and logstash workers as soon as the new stuff resolves | 21:37 |
pabelanger | okay, moving on to logstash-worker13.o.o replacement | 21:39 |
clarkb | actually I think only the jenkinses need it | 21:39 |
clarkb | since the workers connect to it | 21:39 |
*** rfolco has quit IRC | 21:43 | |
clarkb | http://logstash.openstack.org/ forbidden! | 21:47 |
clarkb | I think this means I need to do the file stuff for 2.4? | 21:47 |
clarkb | everything else seems to be functioning | 21:49 |
fungi | clarkb: i get a ton of the same errors you pasted in http://paste.openstack.org/show/505745/ every time i successfully launch a server too, so it's not just you. nobody else seemed to be able to reproduce it, but i guess you can | 21:52 |
fungi | also, i'm updating dns for the new paste.o.o now | 21:53 |
clarkb | I am making sure all the workers are talking to new logstash.o.o then will work on fixing apache config, then can delete old one | 21:54 |
fungi | #status log paste.openstack.org now running ubuntu-trusty and successfully responding to requests | 21:59 |
openstackstatus | fungi: finished logging | 21:59 |
fungi | i wonder if we should take the downtime during the gerrit rename maintenance window as an opportunity to replace static.o.o | 22:01 |
anteaya | fungi: earlier there was a thought that when zuul was being replaced that that downtime might make a good static replacement window | 22:03 |
anteaya | but I don't know if zuul has already been replaced | 22:03 |
anteaya | if not, two windows to replace static | 22:03 |
fungi | yeah, that's a possibility, or we also do zuul during that same window (but zuul seems like it would be potentially quicker to replace?) | 22:03 |
anteaya | yeah the gerrit downtime is not until a week tomorrow | 22:04 |
clarkb | fungi: pabelanger mentioned doing zuul at the same time | 22:04 |
anteaya | but I'm not on the root end so whatever rooters want | 22:04 |
anteaya | yeah that's right it was pabelanger's idea | 22:05 |
anteaya | sorry about that, forgot who mentioned it | 22:05 |
fungi | we've made huge progress this week, so if some stuff gets pushed off i won't object | 22:05 |
fungi | i mean, we've already identified at least a couple we won't be able to migrate for a while | 22:05 |
fungi | (planet, wiki) | 22:05 |
anteaya | pleia2: was still working on planet last I understood | 22:06 |
anteaya | is there more to the tale? | 22:06 |
clarkb | fungi: you were saying we did the Require all granted stuff in two different ways? one of them is using mod version to switch on including it, the other is what? I need to add it to kibana's vhost | 22:06 |
anteaya | I think wiki was the only server that didn't get touched or talked about | 22:06 |
fungi | clarkb: yeah, i don't remember and would resort to digging up examples | 22:06 |
anteaya | resort! | 22:08 |
anteaya | resorts are nice I hear | 22:08 |
anteaya | beaches and so on | 22:08 |
clarkb | now to find the conditional for installing mod version | 22:08 |
pabelanger | 2016-05-26 22:08:45,320 Error connecting to logstash.openstack.org port 4730 | 22:08 |
pabelanger | that is what I am seeing now | 22:08 |
clarkb | pabelanger: you have to bounce iptables on logstash.o.o | 22:09 |
clarkb | after dns is updated | 22:09 |
pabelanger | clarkb: did | 22:09 |
pabelanger | well, I think I did | 22:09 |
clarkb | pabelanger: on the new one? | 22:09 |
pabelanger | I have to check, I wrote a quick ansible playbook to do the bouncing | 22:10 |
clarkb | 23.x.y.z is new. 166.x.y.z is old | 22:10 |
pabelanger | 08c356e5-d225-4163-9dce-c57b4d68eb55 : ok=0 changed=0 unreachable=1 failed=0 | 22:10 |
clarkb | pabelanger: thats the right uuid, maybe you raced the record timing out? | 22:11 |
clarkb | in any case the other 19 are working | 22:11 |
pabelanger | okay better | 22:11 |
pabelanger | I had to fix SSH host keys on puppetmaster.o.o | 22:11 |
pabelanger | #status log logstash-worker13.openstack.org now running ubuntu-trusty and processing requests | 22:12 |
openstackstatus | pabelanger: finished logging | 22:12 |
fungi | i'm not seeing any new traffic to the old paste server for ~10 minutes now, so i'm going to halt and start snapshotting it | 22:12 |
clarkb | pabelanger: which repo did you do the mod version thing in again? | 22:12 |
fungi | up 449 days | 22:12 |
fungi | sorry to see some of these uptimes die | 22:12 |
pabelanger | clarkb: puppet-graphite I think | 22:13 |
pabelanger | ya, that's right | 22:13 |
clarkb | yup found it, thanks | 22:15 |
fungi | the status-precise-backup image exists now, so i'm deleting the old offline server instance | 22:15 |
pabelanger | ack | 22:16 |
clarkb | https://review.openstack.org/321875 should be all that is needed to finish up logstash.o.o move | 22:17 |
bkero | fungi: odd. I'm trying to replicate again locally using a simple manifest. Seems to apply fine. O_o http://paste.openstack.org/show/505749/ | 22:24 |
bkero | creating a user for you, write perms, etc | 22:24 |
bkero | on a clean trusty host :/ | 22:25 |
pabelanger | #status log logstash-worker14.openstack.org now running ubuntu-trusty and processing requests | 22:33 |
openstackstatus | pabelanger: finished logging | 22:33 |
clarkb | pabelanger: if now is a good time for reviews https://review.openstack.org/#/c/321875/ has your name on it :) | 22:34 |
pabelanger | clarkb: WFM | 22:35 |
pabelanger | I can +A if needed | 22:35 |
clarkb | +A would be great | 22:35 |
bkero | Does openstack-infra's hiera limit access to variables based on node name? | 22:43 |
clarkb | bkero: ya ansible only copies the data that belongs to a specific host or group | 22:44 |
bkero | clarkb: I'm wondering. http://paste.openstack.org/show/505752/ | 22:45 |
bkero | paste01.openstack.org fails, but paste.openstack.org succeeds (as trusty) | 22:45 |
bkero | I'm wondering if 1) that hostname regex isn't matching like I assume it is, or 2) hiera values aren't accessible. | 22:46 |
clarkb | oh ya the hostname changes so the ansible hiera matching stuff won't pick it up | 22:46 |
bkero | Would that cause ansible-puppet to hit this? https://github.com/openstack-infra/ansible-puppet/blob/master/library/puppet#L219 | 22:47 |
bkero | That's what's happening | 22:47 |
clarkb | I would expect it to run and fail on hiera lookups | 22:48 |
bkero | fungi's user isn't even being created, so it can't be getting terribly far based on my local run. http://sprunge.us/hOGO | 22:50 |
clarkb | what is hiera's behavior when there is no hieradata? | 22:51 |
clarkb | maybe its bailing out really early? | 22:51 |
bkero | That's why the hiera() function has 2 parameters | 22:52 |
bkero | if it fails it returns the 2nd | 22:52 |
bkero | if it doesn't have a 2nd...dunno | 22:52 |
clarkb | possible it also has different behavior when the files that should have the data don't exist | 22:53 |
bkero | fungi: I'm guessing that's it ^ | 22:53 |
bkero | clarkb: is infra's hiera data in a secret repo? | 22:54 |
clarkb | bkero: the secret info is, there is a public set of data in system-config too | 22:54 |
bkero | clarkb: I'm guessing the new hostname should probably be in system-config/hiera/common.yaml | 22:55 |
bkero | fungi: ^ | 22:56 |
bkero | (sorry for double-ping) | 22:56 |
clarkb | it should go in cacti hosts maybe, depending on how dns is configured | 22:56 |
clarkb | but otherwise I don't think it needs anything there | 22:56 |
bkero | Cacti? For paste.o.o? | 22:57 |
clarkb | ya that list there tells cacti which hosts to poll | 22:57 |
bkero | Ah | 22:58 |
pabelanger | #status log logstash-worker15.openstack.org now running ubuntu-trusty and processing requests | 23:01 |
*** rfolco has joined #openstack-sprint | 23:01 | |
openstackstatus | pabelanger: finished logging | 23:01 |
clarkb | ok kibana apache 2.4 fix has merged, almost done here | 23:01 |
fungi | bkero: clarkb: ooh! great point. our host-based hiera split is almost certainly at odds with the idea that we'll have host name patterns in most cases going forward | 23:07 |
fungi | i didn't even consider that | 23:07 |
fungi | i'll follow up to the ml thread with that as yet another caveat | 23:08 |
bkero | fungi: Don't know if you want another review to add that to the correct hiera groups, or to just go with the old naming scheme and adding a Node tag to it. | 23:08 |
fungi | we can hash it out on the ml. there are more stakeholders potentially following that thread | 23:08 |
bkero | ok | 23:08 |
bkero | Sounds good | 23:09 |
*** asselin_ has quit IRC | 23:23 | |
pabelanger | #status log logstash-worker16.openstack.org now running ubuntu-trusty and processing requests | 23:29 |
openstackstatus | pabelanger: finished logging | 23:29 |
clarkb | I am not seeing puppet update logstash.o.o with my apache 2.4 fix, guessing ansible + puppet are still both unhappy? | 23:29 |
clarkb | maybe I didn't get the key accepted like I thought I did | 23:29 |
clarkb | no, ssh works | 23:29 |
pabelanger | starting wheel again now | 23:30 |
clarkb | logstash.o.o doesn't show up in the puppet run all log | 23:32 |
pabelanger | clarkb: try the UUId | 23:32 |
pabelanger | suspect there are 2 logstash.o.o servers ATM | 23:32 |
clarkb | oh right | 23:32 |
clarkb | oh hrm it looks like our puppet runs are taking a large amount of time and we may not be updating every 15 minutes currently | 23:34 |
clarkb | I need to practice patience | 23:34 |
pabelanger | we usually get back to back puppet runs | 23:37 |
fungi | the snapshot for the old paste.o.o instance is still 0% complete even after i went out to dinner and came back | 23:37 |
fungi | openstack has bugs? | 23:37 |
pabelanger | JJB is usually the reason we don't | 23:37 |
clarkb | fungi: are we snapshotting for safety? or are you wanting to boot off of snapshot? | 23:38 |
fungi | i'm snapshotting before deleting stuff that's not farm-style | 23:38 |
fungi | jhesketh talked me into it | 23:39 |
fungi | it _looks_ like the ci-backup-rs-ord migration completed | 23:44 |
fungi | though there are two still listed in the server list, the one on the new flavor seems to have inherited the ip addresses we put in dns and the legacy one was assigned different addresses | 23:45 |
clarkb | fatal: [git06.openstack.org]: FAILED! => {"failed": true, "msg": "'dict object' has no attribute 'gitinfo'"} | 23:48 |
clarkb | that seems unhappy | 23:49 |
clarkb | and the puppet-kibana module doesn't appear to have updated on the puppetmaster which means it isn't getting updated on logstash.o.o | 23:49 |
clarkb | but I am quickly running out of steam for the day, may have to pick that up in the morning | 23:49 |
bkero | womp womp missing ansible dict elements | 23:49 |