Tuesday, 2015-05-19

SpamapSgreghaynes: it worked for me when I did that too00:04
greghaynesSpamapS: Well, did you find a way in which it didnt work?00:37
openstackgerritgreghaynes proposed openstack/diskimage-builder: Add tests for building *-minimal images  https://review.openstack.org/18116200:59
greghaynesmordred: we should make an installtype for the simple-init element that installs the rust version01:04
clarkbits no longer a beta language so you wont have to rebuild everyday01:10
bkeroit should be 'stable'01:20
untriaged-botUntriaged bugs so far:03:00
openstackLaunchpad bug 1455175 in tripleo "Option to configure gateway through keepalived" [Undecided,New] - Assigned to Mayank (mayank0107)03:00
openstackLaunchpad bug 1449852 in diskimage-builder "Buidling ramdisk with ironic-agent behind proxy fails" [Undecided,In progress] - Assigned to Ramakrishnan G (rameshg87) (rameshg87)03:00
openstackLaunchpad bug 1449854 in diskimage-builder "Ironic agent ramdisk built using disk-image-create fails with iscsi_ilo driver" [Undecided,Fix committed] - Assigned to Ramakrishnan G (rameshg87) (rameshg87)03:00
openstackLaunchpad bug 1454803 in tripleo "puppet: Neutron is not configured with L2 population" [Undecided,New]03:00
openstackLaunchpad bug 1452752 in tuskar "keystone_authtoken section is wrong in default shipped tuskar.conf.sample" [Undecided,Confirmed]03:00
openstackLaunchpad bug 1454802 in tripleo "puppet: Neutron does not use Nova notifications" [Undecided,New]03:00
*** julim has quit IRC05:41
*** ishant has joined #tripleo07:25
*** aufi has joined #tripleo07:29
*** gfidente has joined #tripleo07:59
gfidentejistr, morning :)07:59
jistrmorning :)07:59
gfidentelooks like we got something yesterday07:59
gfidenteit seems to work fine for me07:59
jistrgfidente: neat, which patch were you on?08:00
gfidentemy WIP which included yours08:00
gfidenteI see CI failing apparently on the running time08:00
gfidenteso I merged the change which was increasing the HA job timeout08:01
gfidentejistr, so I tested this https://review.openstack.org/#/c/184078/08:03
gfidenteand also https://review.openstack.org/#/c/184043/ and both worked08:03
gfidentemarios, jistr can you check this (and the dep) https://review.openstack.org/#/c/183097/08:05
gfidenteit'd be nice to merge that and then https://review.openstack.org/#/c/183472/08:06
jistrgfidente: yeah i'm going to test this https://review.openstack.org/#/c/183097/ (+ its long dep chain :) ) and if that gives us a successful deployment, then i think we should merge away08:08
jistrgonna do the remaining visual reviews first08:09
gfidenteFWIW the commit here is wrong https://review.openstack.org/18407208:10
gfidenteit is only moving VIPs08:10
openstackgerritGiulio Fidente proposed openstack/tripleo-heat-templates: Enable VIPs via Pacemaker from step 2 instead of step 1  https://review.openstack.org/18407208:11
mariosgfidente: sure man gimme few (just done one round of reviews)08:12
gfidentejistr, the thing I found about clustercheck is that it doesn't work the way it is now08:14
gfidentewe need to add the service into haproxy08:14
gfidentewhich gives back status of synchronization to the resource agent08:14
gfidenteI was going to add that today08:14
jistrgfidente: by "doesn't work" you mean it reports OK even though the cluster is not ready yet08:15
gfidentejistr, so from what I understand reading the script08:16
*** mcornea has joined #tripleo08:16
gfidentethe script itself is started by xinetd and it does work08:16
gfidentebut we miss in haproxy the service which is polling the service08:17
gfidentelet me point at the code which is easier08:17
jistrgfidente: ack. btw the commit message change caused the later patches to depend on [outdated], it should go away if you submit the whole branch08:19
gfidenteL22 there is playing with the results of the xinetd service on 920008:20
jistrah ok08:21
gfidentejistr, the galera resource agent itself doesn't call it https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/galera08:24
gfidenteso I suspect the mechanism is nodes are removed from balancer if unsynched08:24
jistryeah that sounds sensible08:26
jistrso i'm rebuilding from scratch08:26
jistrwith this https://review.openstack.org/#/c/184078/08:27
* jistr fingers crossed08:27
gfidentejistr, so that makes me think08:29
gfidenteif nodes are only removed from balancer08:29
gfidentewhat is one supposed to do at full cluster restart?08:30
gfidenteI'll ping jayg|g0n3 when he is online to see if we miss pieces08:30
jistryeah. IIRC full cluster restarts are an issue in general. Not sure if it's solved in Astapor ATM.08:31
gfidenteso to add that into the balancer we need to make sure clustercheck works in step108:35
jprovaznjistr: thanks, I'm about to send scaledown patch, so I would lean to add count check in a separate patch, but if nobody +2 it soon, I might have it done sooner :)08:53
gfidentejprovazn, can stack_id and plan_id really be None?08:53
mariosjprovazn: thanks man, added commented, i completely missed https://github.com/openstack/tripleo-common08:53
mariosjprovazn: have added it to my review list for tomorrow morning08:53
jprovaznmarios: thanks08:54
jistrgfidente: so re clustercheck in step 1. I'm not experienced in this area, but could it be that we add it to HAProxy when it's still reporting failure, which means HAProxy would have no backends, and then when galera comes up, clustercheck starts reporting OK and HAProxy notices that and starts balancing over those nodes?08:54
*** jang1 has quit IRC08:54
*** jang has quit IRC08:54
jistrbecause i'm not sure we can get galera up in step 1...08:54
jistrwe need to write the config a step earlier08:55
gfidentejistr, yeah 1 is actually 2, where pcmk is starting services08:55
gfidenteack on letting haproxy/galera race08:56
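A minimal sketch of the haproxy/galera race agreed on above. It assumes the clustercheck service on port 9200 answers HTTP 200 when a node is synced and 503 otherwise; the function and node names are hypothetical and are not taken from tripleo-heat-templates.

```python
# Hypothetical illustration: a balancer keeps only the backends whose
# health check succeeds. Status codes are an assumption
# (200 = node synced, 503 = node not synced yet).

def active_backends(check_results):
    """Return the nodes a balancer would route to, given {node: http_status}."""
    return sorted(node for node, status in check_results.items() if status == 200)

# Before Galera is up, every check fails, so there are no backends at all.
before = {"controller-0": 503, "controller-1": 503, "controller-2": 503}
# As nodes sync, the balancer notices them one at a time.
during = {"controller-0": 200, "controller-1": 503, "controller-2": 503}
after = {"controller-0": 200, "controller-1": 200, "controller-2": 200}

for label, results in (("before", before), ("during", during), ("after", after)):
    print(label, active_backends(results))
```

Under these assumptions the balancer can be configured while the checks still fail: it starts with an empty backend list and picks up nodes as they report OK.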
jprovazngfidente: re none value for plan/stack - the method would fail if not proper ids are passed, but user input is checked in CLI part - https://review.gerrithub.io/#/c/231550/4/rdomanager_oscplugin/v1/overcloud_scale.py08:56
*** dguerri_ is now known as dguerri08:57
gfidentejprovazn, yeah I suppose L32 or L36 will raise something? would it be worth removing the =None default?08:59
jprovazngfidente: yep, doing it now including jistr's check08:59
jprovaznthanks for feedback :)09:00
gfidentejistr, how did the deployment go? it worked for me a couple of times09:01
jistrdevtest_overcloud.sh@495: wait_for_stack_ready -w 3600 10 overcloud09:01
gfidentemarios, +A! :)09:03
mariosgfidente: i _will_ hit you09:05
mariosgfidente: done :)09:05
gfidentemarios, we have an entire week to break CI ... and FIX it! :)09:05
mariosgfidente: you do, i'm off on thursday09:05
mariosgfidente: thus i have an entire day to break stuff for you09:05
* marios goes to work09:05
jistrnow the bad spot09:06
jistrdevtest_overcloud.sh@601: wait_for -w 300 --delay 10 -- nova service-list --binary nova-compute '2>/dev/null' '|' grep 'enabled.*\ up\ '09:06
gfidentejistr, marios, so apparently no OPM build is out yet?09:06
* jistr fingers crossed09:06
gfidentejistr, I also noticed tpe seems to have been rebased actually?09:06
gfidentejistr, I didn't go that far!09:07
mariosgfidente: till yesterday no09:07
mariosgfidente: i was going to revisit that now and see what to do about a test env... o_O09:07
gfidentemarios, please please please https://review.openstack.org/#/c/183096/09:07
mariosgfidente: i also need to update my neutron-* to look like jistr and spredzy|afk with the puppet-pacemaker09:07
gfidentemarios, so I think we have cinder and glance and neutron and keystone pending09:08
mariosgfidente: its like you enjoy the pain09:08
gfidenteall of which need to be updated09:08
jistrgfidente, marios: hmm so it's still failing... That's exactly the same spot where it was failing yesterday when i tested just my patches. I like the refactors you made though, so i'd like to merge all that stuff anyway.09:10
*** regebro has quit IRC09:10
openstackgerritMerged openstack/tripleo-heat-templates: Add a directory for overcloud heat environments  https://review.openstack.org/18309609:10
gfidentejistr, so on the messaging part09:10
*** regebro has joined #tripleo09:10
gfidenteI think we need this: https://review.openstack.org/#/c/181081/09:11
openstackgerritMerged openstack/tripleo-heat-templates: Environment which configures puppet pacemaker.  https://review.openstack.org/18309709:11
gfidenteAND we need to https://github.com/redhat-openstack/astapor/blob/master/puppet/modules/quickstack/manifests/openstack_common.pp#L1509:11
jistryeah that does look like it might affect the issue09:12
gfidenteand if after those it still won't work, we might want to make some pressure on the depends09:12
jistrcurrently the master is broken09:12
jistrit doesn't finish stack-create at all09:12
gfidenteHA master?09:12
jistryeah i meant HA09:12
jistrnow we have a bunch of patches that get us to a successful stack-create at least09:13
jistri'd say let's merge them and continue from there09:13
jistrgfidente, marios: does it sound sensible?09:14
jistri'm not sure we should keep piling up the changes and rebasing09:14
gfidenteit'd be nice to get some eyes09:14
gfidentebut I can't think of many except jayg09:14
gfidentesysctl though should have been set by the element anyway, jistr can you check on your env?09:40
gfidentemarios, the horizon thing only applies to instack which installs OPM, devtest does checkout of puppet modules from git09:56
mariosgfidente: thanks10:02
openstackgerritGiulio Fidente proposed openstack/tripleo-heat-templates: Move sysctl settings into hieradata  https://review.openstack.org/18421010:23
openstackgerritJan Provaznik proposed openstack/tripleo-common: Scale out heat stack  https://review.openstack.org/17328310:25
jprovazngfidente: jistr: marios: ^ when you have a sec10:26
*** trown|outttypeww is now known as trown10:31
*** dguerri is now known as dguerri_10:36
gfidentejistr, IT PASSED!10:50
gfidentejistr, marios and the sysctl change can land later because we already do it via sysctl element, so just this: https://review.openstack.org/#/c/181081/ + https://review.openstack.org/#/c/184078/10:51
gfidenteand we get back green again!10:51
jistrgfidente: wow that's awesome news!10:58
gfidentejistr, try for yourself :)10:58
*** mmagr|afk is now known as mmagr10:58
jistrgfidente: so should we rebase this on top of the HA patches? https://review.openstack.org/#/c/181081/10:59
gfidentethey don't really depend on each other but I can11:00
gfidenteyou want to see it green11:01
gfidenteI know11:01
gfidentelet me try11:01
*** dguerri_ is now known as dguerri11:06
*** dguerri is now known as dguerri_11:08
openstackgerritGiulio Fidente proposed openstack/tripleo-heat-templates: Provide RabbitMQ clients with a list of servers instead of VIP  https://review.openstack.org/18108111:09
openstackgerritGiulio Fidente proposed openstack/tripleo-heat-templates: Enable VIPs via Pacemaker from step 2 instead of step 1  https://review.openstack.org/18407211:09
openstackgerritGiulio Fidente proposed openstack/tripleo-heat-templates: Consolidate use of $pacemaker_master in step 2  https://review.openstack.org/18407811:09
jistrgfidente: thx! i handed out the ticks there ;) Gonna test manually too.11:11
* jistr wants to land green HA so much11:11
gfidentejistr, but as last time ... we'd only be half-there11:11
jistryeah, still missing services pacemaker coverage and maybe diverging from recommended arch here and there, but all the subsequent changes would be tested by CI then instead of manual testing, that's a huge win11:14
*** panda has quit IRC11:14
*** panda has joined #tripleo11:14
*** masco has quit IRC11:19
gfidentejistr, didn't jayg|g0n3  and cwolferh have more changes?11:21
mariosjistr: so the pcmk_resource_create stuff shouldn't be necessary for current instack packages right (puppet-pacemaker puppet-tripleo etc?)11:22
mariosjistr: gfidente: I set export DIB_INSTALLTYPE_puppet_modules=source in instack-build-images -and then also grabbed the pcmk changes ... haven't seen pcs status for a few days ;)11:23
mariosgfidente: your patches go ontop? ^^^11:23
* marios reads back11:23
gfidenteso this should be live CI of topmost https://jenkins06.openstack.org/job/check-tripleo-ironic-overcloud-f20puppet-ha/61/11:23
openstackgerritJan Provaznik proposed openstack/tripleo-common: Scale out heat stack  https://review.openstack.org/17328311:24
jistrmarios: yeah with that you'd also have to switch to a custom branch of t-h-t with my and gfidente's patches11:24
mariosjistr: basically i've been looking for an env i can work with, i think this is it... (add horizon and tidy up neutron-* pcmk resources)11:25
marios(i mean even if it doesn't complete 100% if pcs is doing its thing by that point)11:25
gfidenteyeah problem is they diverged a lot recently11:26
mariosstill running though let's see11:26
jistrmandre: with DIB_INSTALLTYPE_puppet_modules=source you won't need any customizations for puppet modules i think11:26
gfidenteso you want updated OPM, updated TPE and merge updates to THT11:26
mariosgfidente: i think i'm missing opm from these11:26
gfidentejistr, does instack honor that?11:27
jistrmandre: sorry for the noise11:27
mariosgfidente: i rebuilt images to pull pe from src and also patched templates (and rebuilt roles)11:27
gfidentemarios, or you move to devtest11:27
jistrgfidente: i think it should use the same tooling, it only overrides the variable if it wasn't set previously it seems https://github.com/rdo-management/instack-undercloud/blob/e921062e7751a48af09f9a9f5f275fc31a9657e3/elements/undercloud-package-install/environment.d/00-package-install#L1011:28
mariosgfidente: you move to devtest11:28
gfidentejistr, ack, thanks11:29
gfidenteI thought was pulled in by some RPM as dependency11:29
gfidentemarios, it is not so terrible nowadays, pretty stable actually11:30
mariosgfidente: indeed last time i poked was end of march, according to DAS LOG11:31
mariosgfidente: but you're there now, so i'm ok for a bit longer11:32
mariosMay 19 07:19:27 localhost pengine[17336]: warning: unpack_rsc_op_failure: Processing failed op start for galera:0 on ov-jbgefv6jimp-0-uzushvxqjdlv-controller-sg72b4wylo36.novalocal: not configured (6)11:33
mariosis this the startup race (looks like)11:33
marios(might be i only grabbed the 'rabbitmq startup race' and not the other one)11:34
gfidentemarios, those images also still have .novalocal11:35
gfidentethis is going to be epic rebase11:36
jistrepic rebase, yeah :D11:37
gfidenteloads of things could either get fixed OR BREAK! :)11:39
*** thrash|g0ne is now known as thrash11:39
*** jistr is now known as jistr|class11:40
*** dguerri_ is now known as dguerri11:49
*** jayg|g0n3 is now known as jayg12:14
lsmola_gfidente: ping12:31
lsmola_gfidente: when I want to deploy ceph, should I use this export CINDER_ISCSI=112:31
lsmola_gfidente: the docs are unclear for me :-)12:31
gfidentehi lsmola_ we need to add some documentation about ceph cause things changed a bit12:32
gfidenteare you trying via instack or via devtest?12:32
lsmola_gfidente: the rdo-manager docs, with rhel12:32
gfidenteso you mean the bits here: https://repos.fedorapeople.org/repos/openstack-m/docs/master/basic_deployment/basic_deployment.html#deploy-the-overcloud ?12:33
lsmola_gfidente: there is basically just this param and number of ceph nodes for configuration documented12:33
lsmola_gfidente: https://repos.fedorapeople.org/repos/openstack-m/docs/internal/master/basic_deployment/basic_deployment.html12:33
lsmola_gfidente: the internal for rhel12:33
*** shardy_ has joined #tripleo12:33
lsmola_gfidente: but it's the same :-)12:34
gfidentelsmola_, so we support multibackend in cinder now12:34
gfidenteif you use CINDER_ISCSI=1 you will get both Ceph and LVM backends enabled12:34
gfidenteif you set CINDER_ISCSI=0, only Ceph12:34
gfidente(backend name in that message should really refer to LVM not ISCSI)12:34
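A minimal sketch of the CINDER_ISCSI behaviour described above, assuming a deployment that includes Ceph nodes. The helper name is hypothetical; it only restates the two cases from the chat and is not the deployment code.

```python
# Hypothetical helper: which Cinder backends end up enabled for a given
# CINDER_ISCSI value, as described in the chat (assumes Ceph is deployed).

def cinder_backends(cinder_iscsi):
    """Return the enabled backend names for CINDER_ISCSI=0 or CINDER_ISCSI=1."""
    if cinder_iscsi not in (0, 1):
        raise ValueError("CINDER_ISCSI is expected to be 0 or 1")
    backends = ["ceph"]
    if cinder_iscsi == 1:
        # The flag name says ISCSI, but the backend it enables is LVM.
        backends.append("lvm")
    return backends

print(cinder_backends(1))
print(cinder_backends(0))
```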
gfidenteyou comfortable with updating the docs to word it better?12:35
*** shardy has quit IRC12:35
lsmola_gfidente: ok12:35
lsmola_gfidente: I guess only ceph is good for me, since I will deploy only ceph nodes now12:36
lsmola_gfidente: I'll send it to jcoufal to reword it, I don't have access to the docs12:36
gfidenteso to be clear, upstream we do support scaling the LVM nodes as well, not in instack yet though12:36
lsmola_gfidente: or do I? never tried it :-)12:36
lsmola_gfidente: ok12:36
gfidentelsmola_, I think we all do, it is on gerrithub12:36
gfidenteso from instack, when using LVM backend, only the controllers will contribute to the LVM space12:37
lsmola_gfidente: is there any doc for the doc? :-)12:37
gfidentelsmola_, you have to use git-review12:37
gfidentedoc is built from instack inline12:37
lsmola_gfidente: ah12:37
gfidentelsmola_, my fault, not for instack12:38
gfidenteinstack doc https://github.com/rdo-management/instack-undercloud/blob/master/doc/source/basic_deployment/basic_deployment.rst12:38
lsmola_gfidente: ook, I'll ask jcoufal, he will send me how to contribute to the docs :-)12:39
*** shardy_ has quit IRC12:39
*** shardy has joined #tripleo12:40
*** jprovazn has quit IRC12:46
adrianopetrichhey folks12:57
*** jistr|class is now known as jistr12:59
gfidenteso looks like we'll have to do a recheck13:17
gfidentebecause CI failed with Error: Could not start Service[httpd]: Execution of '/sbin/service httpd start' returned 113:17
gfidenteonly on controller-213:17
jistrgfidente: ack on recheck, in the meantime we can investigate13:17
jistrcould be a race condition again13:18
jistri don't see any httpd logs written on the ctrl 213:18
gfidentejistr, my fault, only controller-013:19
mariosguys which one am i missing to still be seeing http://paste.openstack.org/show/228197/13:20
jistrgfidente:  ERROR:scss.expression:Function not found: twbs-font-path:113:20
gfidentejistr, I also noticed from host_info that httpd is actually running on the other nodes even though there is no mention of httpd in os-collect-config13:21
*** dasm is now known as dasm|afk13:21
gfidentemarios, I think you wanted to take on horizon/ha ?13:21
mariosgfidente: when i get an env i will :)13:22
mariosi may have borked something on a rebase with heat templates... gonna try tidy that up a bit and go again13:23
mariosbut first 5 minutes afk13:23
*** jprovazn has joined #tripleo13:23
gfidentemarios, looks like it couldn't start haproxy13:24
gfidentecan you check systemctl status haproxy?13:24
gfidenteor the journalctl to see if you can spot anything about haproxy?13:24
jistrso that twbs-font-path thing -- twbs stands for twitter bootstrap :)13:25
jayggfidente jistr: so if I want to do a fresh deploy to test HA, is there a set of patches you recommend?  or perhaps a single one that is the base for others?13:27
gfidentejayg, we were waiting for you!13:27
gfidentejayg, get back to master13:27
gfidentecheckout the entire tree from here: https://review.openstack.org/#/c/18108113:27
gfidenteand make sure you get overcloud deployed! :)13:27
jistr^ yeah13:28
jaygcool, I'll give it a shot, thanks!13:28
jistrthat works for both gfidente and me13:28
jaygis there a way in gerrit to grab _all_ those patches at once?13:29
jistrso re that failure, i'm not sure if it's the twitter bootstrap errors that crash it or if it's just that the initial operation (whatever it does) takes too long13:29
jistr httpd.service start-pre operation timed out. Terminating.13:29
gfidentejistr, yeah was thinking same, have to compare with nonha13:29
jistrjayg: just take the last one, it will include the ~10 others13:29
gfidentejayg, use 'checkout'13:29
jistrgit fetch https://review.openstack.org/openstack/tripleo-heat-templates refs/changes/81/181081/7 && git checkout FETCH_HEAD13:30
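A short sketch of how the ref in the command above is put together: Gerrit publishes each patchset under refs/changes/<last two digits of the change number>/<change number>/<patchset>. The helper names are hypothetical; the repository URL is the one from the chat.

```python
# Hypothetical helpers that rebuild the fetch command quoted in the chat.

def gerrit_change_ref(change_number, patchset):
    """Build the Gerrit ref for a given change number and patchset."""
    suffix = f"{change_number % 100:02d}"  # last two digits, zero-padded
    return f"refs/changes/{suffix}/{change_number}/{patchset}"

def fetch_command(repo_url, change_number, patchset):
    """Return the shell command that fetches and checks out that patchset."""
    ref = gerrit_change_ref(change_number, patchset)
    return f"git fetch {repo_url} {ref} && git checkout FETCH_HEAD"

print(fetch_command(
    "https://review.openstack.org/openstack/tripleo-heat-templates", 181081, 7))
```

Because the change sits on top of its dependencies, fetching the topmost patchset brings the rest of the chain along with it.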
jaygjistr: thx13:30
gfidentefrom the top-right 'download' menu13:30
jaygI prefer to pull to a new local branch, but I get the idea, thx13:30
jaygalso, any other patches like the puppet-tripleo one that bit me the other day, which I need to get on my control images?13:30
gfidentejistr, same messages but then it started http://logs.openstack.org/81/181081/7/check-tripleo/check-tripleo-ironic-overcloud-f20puppet-nonha/9fa3962/logs/ov-sdfgdlh7b3x-0-3mtvyb4vjsu7-Controller_logs/httpd.txt.gz13:31
jistrjayg: nothing special is required right now i think13:31
gfidentejistr, instead of timing out13:31
jistrgfidente: ack, so it's just horizon trying to eat the world on httpd startup13:32
gfidentelooks like13:32
gfidentein nonha it takes from 21:28 to 21:47 to start13:32
jaygjistr: cool, thx13:33
gfidentein ha at 55:27 starts but at 57:18 times out13:33
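A worked version of the timing comparison above, assuming the quoted values are minutes:seconds taken from the httpd logs (the chat does not say so explicitly). The helper names are hypothetical.

```python
# Hypothetical helpers: compare how long httpd startup took in each job,
# reading the quoted timestamps as MM:SS (an assumption).

def to_seconds(mm_ss):
    """Convert an 'MM:SS' string to a number of seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)

def elapsed(start, end):
    """Seconds between two 'MM:SS' timestamps within the same hour."""
    return to_seconds(end) - to_seconds(start)

nonha_startup = elapsed("21:28", "21:47")   # nonha job: httpd came up
ha_until_timeout = elapsed("55:27", "57:18")  # ha job: start timed out

print(nonha_startup, ha_until_timeout)
```

Read this way, the nonha job started httpd in 19 seconds, while the HA job ran for 111 seconds before the timeout.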
jaygjistr: are you using apache in the resource type like in the ref arch?13:33
jaygcrag had problems with that yesterday13:34
gfidentejayg, horizon is not in pacemaker yet13:34
jistrjayg: i think at this point we don't have it as a resource13:34
jaygah, k13:34
gfidentejayg, so this is the puppet-horizon module starting httpd at some point13:34
jaygwhy not just disable it until it is managed?13:35
gfidentewell, none of the openstack services is at this point, but if it helps in making CI green, we might do it13:36
gfidentesadly leaving work on the head of marios13:36
jistrso the scary thing is that the timeout comes from systemd itself13:37
mariosgfidente: i will hit you many times13:37
gfidentewho can push changes until he gets green CI with horizon13:37
jistr..which means that value might be hard to change13:37
mariosgfidente: my latest run is using your branch above (I see it has all the things pcmk_ etc)13:38
gfidentejistr, on the other hand, almost 2mins is reasonable timeout13:38
mariosgfidente: if it doesn't complete successfully13:38
mariosgfidente: i think you know what happens13:38
mariosgfidente: should i expect the horizon error then?13:38
gfidentemarios, so I haven't seen it on the local dev env, but upstream CI is still failing on horizon13:39
jistrgfidente: yeah that's right. Possibly something went wrong there really, not just taking its time.13:39
mariosgfidente: looks like haproxy started ok this time...13:39
jaygguys, /7 doesn't apply to master without a merge commit, should it be rebased?13:39
mariosjayg: i'd do fresh git clone and then git review -d changid13:40
mariosjayg: assuming you have gerrit id review.openstack.org13:40
jistrjayg: yeah are you sure you have current master?13:40
gfidentemarios, but you still got CREATE_FAILED?13:40
mariosgfidente: waiting still13:40
jaygmarios: ok, I will try that, I think I set up gerrit on test box13:40
* gfidente pushing a temporary thing to disable horizon on top13:41
mariosjayg: yeah if new box you'll need to cp the .ssh/id_rsa.pub to gerrit13:41
* gfidente still have hopes of not disabling it though13:41
jistrgfidente: ack13:41
* marios palmface forgot to change registry13:43
jaygmarios: that did the trick, thanks, will try to redeploy now13:43
* jayg still gerrit n00b13:44
jistrgfidente: so regarding the instance check timeout i mentioned to you, the overcloud instance didn't get created at all, this is from nova-api log: http://fpaste.org/223328/14320430/raw/13:45
jistraand when i tried to re-run the nova boot command that devtest runs:13:47
jistr[root@dell-t5810ws-rdo-10 ~]# nova boot --key-name default --flavor m1.tiny --block-device source=image,id=855b0b2e-9fe6-4789-b58c-aee37dc44a0c,dest=volume,size=3,shutdown=preserve,bootindex=0 demo13:47
jistrERROR (BadRequest): Multiple possible networks found, use a Network ID to be more specific. (HTTP 400) (Request-ID: req-45b0ab52-3dbb-412c-bb37-bec7f06a3463)13:47
gfidentejistr, that is because in CI it runs with demo user credentials13:47
gfidentejistr, so it doesn't have multiple networks13:47
jistrduh thx13:47
jistrah so the instance *is* up, and has the correct ip assigned, but i can't ping it13:50
* jistr proceeds to investigate further13:50
gfidentejistr, I think I know this :)13:50
openstackgerritGiulio Fidente proposed openstack/tripleo-heat-templates: Temporarily disable Horizon in HA job  https://review.openstack.org/18425213:51
* jistr is one big ear13:51
gfidentejistr, jprovazn probably remembers as well13:51
* jistr hopes that translates from czech correctly13:51
gfidenteso the reconnect to rabbit from neutron-ovs-agent13:51
gfidentemight cause desync of ovs-agent with neutron-server13:51
gfidenteand you end up with only 'some' of the ovs tunnels in between controller/compute13:52
* jprovazn reads back13:52
gfidenteI used to do this:13:52
gfidenteovs-vsctl show13:52
gfidenteon all three controllers AND compute13:52
gfidentecompute should have a tunnel to each controller13:52
*** dguerri is now known as dguerri`away13:53
gfidenteand controller to each other controller + to compute13:53
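A minimal sketch of the tunnel check described above, assuming the peers seen in `ovs-vsctl show` on each node have already been collected by hand into a dictionary. The function and node names are hypothetical, and the rule encoded is only the one stated in the chat for a single compute node.

```python
# Hypothetical helpers: work out which OVS tunnels each node should have
# and report the ones that are missing from what was observed.

def expected_tunnels(controllers, computes):
    """Controllers peer with every other controller and every compute;
    computes peer with every controller (as described in the chat)."""
    expected = {}
    for ctrl in controllers:
        others = [c for c in controllers if c != ctrl]
        expected[ctrl] = sorted(others + list(computes))
    for comp in computes:
        expected[comp] = sorted(controllers)
    return expected

def missing_tunnels(expected, observed):
    """Return {node: [peers without a tunnel]} for nodes that lack any."""
    missing = {}
    for node, peers in expected.items():
        gap = sorted(set(peers) - set(observed.get(node, [])))
        if gap:
            missing[node] = gap
    return missing

controllers = ["ctrl-0", "ctrl-1", "ctrl-2"]
computes = ["compute-0"]
expected = expected_tunnels(controllers, computes)

# Example: compute-0 lost its tunnel to ctrl-2 after a desync.
observed = dict(expected, **{"compute-0": ["ctrl-0", "ctrl-1"]})
print(missing_tunnels(expected, observed))
```

With three controllers and one compute, every node should end up with three tunnels, one to each of the other nodes.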
jprovaznjistr, gfidente is the right guy you want to ask about this :) - all I remember is just that he debugged&solved it13:54
gfidentejprovazn, sure13:54
gfidenteespecially solved13:54
jistrhmm interesting. seems all the nodes have 3 tunnels present (to all the remaining nodes of the four)13:57
gfidentealso yes14:23
jayghrm, so the good news is I finally get a cluster14:25
jaygthe bad news is it appears puppet did not actually try to add any of the resources to that cluster14:25
gfidentejayg, any at all sounds like it failed14:25
jaygthe last Pacemaker ref I see in my puppet log is turning off stonith (no fails I see at end either)14:25
*** jrist has quit IRC14:25
gfidentejayg, can we make some resource agent verbose?14:44
gfidentelooks like we ended up with pacemaker saying the rabbitmq cluster is up but rabbitmqctl says they are isolated14:44
jayggfidente: I am not sure what you mean by agent being verbose - pacemaker only reports on if service responds as running or not, not whether it is working properly14:46
gfidentejayg, yeah so rabbitmq-cluster ocf14:46
gfidentelogging cluster bootstrapped14:47
gfidentebut the rabbitmq nodes are isolated14:47
*** akrivoka has joined #tripleo14:47
*** sdake has quit IRC14:47
jistrpresently we're missing ordered=true interleave=true as clone params though, i'm wondering if that could make a difference.14:47
* jistr will try14:48
jaygjistr: you have switched to newest puppet-pacemaker, right?14:48
jistrjayg: yes14:48
jistri know adding them is easy14:49
jaygthere was a patch from spredzy|afk after crag's stuff, which we merged, just want to make sure you have that too14:49
jistri rebuilt the images today so i'm hoping i do14:49
gfidentejistr, looks like ordered is pretty important14:50
*** zbitter has joined #tripleo14:50
jaygyeah, those two look like the biggest diffs in config atm14:52
lsmola_gfidente: is it possible we don't have cinder registered to keystone?14:52
gfidentelsmola_, don't think so, something failed if you don't have it14:52
lsmola_gfidente: I see only object store there14:53
lsmola_gfidente: damn14:53
openstackgerritJiri Stransky proposed openstack/tripleo-heat-templates: Clone params for pacemaker rabbitmq resource  https://review.openstack.org/18426315:03
gfidentejistr, on top or outside that tree?15:04
*** akrivoka__ has joined #tripleo15:04
jistrgfidente: on top, sans the horizon bit15:04
gfidenteI'm even more convinced we need to merge things now :)15:04
jistr(it has to be on top of at least some of the moves, otherwise it would conflict)15:04
jistryeah :D15:04
gfidenteah right15:04
gfidentejayg, any chance to cherry-pick https://review.openstack.org/#/c/184263 on top of what you have already?15:05
jayggfidente: already grabbing it  :)15:05
lsmola_gfidente: so running service list, but I see only one volume service, for 2 deployed ceph nodes15:16
gfidentelsmola_, correct, you have cinder-volume running only on controller15:17
gfidenteceph osd dump15:17
gfidenteshould tell you about the ceph storage status15:17
gfidenteceph -s15:17
lsmola_gfidente: and hostname is weird, controller service has hostname of controller node, but volume doesn't match any ceph node15:17
*** zbitter has quit IRC15:17
gfidentehostname of the tripleo_ceph instance is customized to arbitrary string on purpose, jistr ^^15:18
gfidenteif you had more controllers they were all sharing same string for that backend15:18
gfidenteinstead of their hostname15:18
gfidentejayg, and you were probably enabling l3_ha based on number of controllers right? because it doesn't work with 1 node15:19
*** akrivoka__ has quit IRC15:20
*** ParsectiX has joined #tripleo15:21
gfidentejayg, I also wanted to ask about clustercheck15:22
gfidenteping when you're not testing :P15:22
lsmola_gfidente: hmm, so the volume <-> host relation is not exposed in APIs?15:22
*** zb has joined #tripleo15:22
jayggfidente: we allow l3_ha even on one node, but there are a couple other settings that vary if you have one node or more15:23
* jayg wrapping up call :)15:23
*** akrivoka has joined #tripleo15:23
gfidentejayg, isn't it breaking neutron-server on 1 node preventing it from starting?15:23
jayggfidente: let me double check myself, but that has not been a problem15:25
gfidenteack thanks, then I can ask about clustercheck15:25
gfidenteyou should put a bot in the channel15:25
gfidentecapturing questions15:25
gfidenteand giving known good responses15:25
gfidentelsmola_, I am not sure about 'not exposed' ?15:26
*** akrivoka_ has joined #tripleo15:26
lsmola_gfidente: similar to VMs, you can see on which host it is running15:27
*** zaneb has quit IRC15:27
jaygthis is what we do https://github.com/redhat-openstack/astapor/blob/master/puppet/modules/quickstack/manifests/pacemaker/neutron.pp#L10215:27
gfidentelsmola_, ah so cinder-volume is always running on all controllers15:27
lsmola_gfidente: so here it's hidden behind the cluster?15:27
gfidentelsmola_, which piece of information is hidden?15:27
lsmola_gfidente: and the actual deployed volume?15:27
*** zbitter has joined #tripleo15:27
*** daneyon has quit IRC15:28
*** akrivoka__ has joined #tripleo15:28
gfidentelsmola_, the volume is not hosted by a specific node, that is the very purpose of customizing the host setting15:28
gfidenteif we were to do it, volume would only be available if backing host is up15:28
gfidentejayg, YOU HACK THE NUMBERS! :P15:29
* gfidente crying15:29
*** akrivoka__ has quit IRC15:29
gfidentejayg, maybe we should do that too :)15:29
lsmola_gfidente: I thought it would be just replicated on a few hosts?15:29
gfidentelsmola_, it is, but ceph is doing that, not cinder15:29
jistrgfidente, jayg: :)))15:29
lsmola_gfidente: right, so that info is not exposed through cinder?15:30
jayggfidente: I don't hack, them, I set them correctly  :)15:30
gfidentelsmola_, ack now I get it, that info is not even known to cinder15:30
lsmola_gfidente: any idea which command can list this talking to ceph?15:30
*** akrivoka has quit IRC15:30
gfidentejayg, YOU HACK THE NUMBER!15:30
*** zaneb has joined #tripleo15:30
gfidenteand I mean YOU15:30
gfidentenot you15:30
gfidenteand now I WILL too :15:31
*** weshay has joined #tripleo15:31
*** akrivoka_ has quit IRC15:31
jaygooooh, I am watching a second puppet run that may be doing what I want for a change here....15:31
gfidentejayg, good :)15:32
*** zb has quit IRC15:32
*** zbitter has quit IRC15:34
*** whayutin_ has joined #tripleo15:35
*** julim has quit IRC15:37
*** weshay has quit IRC15:38
*** CheKoLyN has quit IRC15:38
*** saguilar has joined #tripleo15:39
* gfidente wait_for ping15:45
gfidentelsmola_, so cinder doesn't know about what ceph is doing in terms of replication15:46
gfidentejistr, IT REPLIED TO PING!15:47
lsmola_gfidente: ok, I'll try to investigate if I can get that relation from ceph API15:47
jistrgfidente: /me wait_for ping15:47
lsmola_gfidente: if not, I guess we don't need it that much :-)15:47
gfidentelsmola_, with rbd lspool volumes15:48
gfidenteyou get list of objects stored from cinder15:48
gfidenteand uuid matches the cinder volume itself15:48
gfidentebut you don't get which nodes are hosting the volume15:48
gfidenteceph is, in the background, maintaining 2/3 replicas at all times15:48
*** whayutin_ has quit IRC15:49
jistrgfidente: IT DID REPLY TO PING. which means i've just had a first END 2 END SUCCESSFUL HA DEPLOYMENT15:49
gfidenteAHAH :)15:49
lsmola_gfidente: where do I run this command?15:49
gfidentelsmola_, controllers15:49
gfidentejistr, including horizon, as it was for me as well15:50
gfidenteso I am not sure what to do with it15:50
jayggfidente jistr: ok, so after my deploy, I have one seemingly transient failure, but galera running - http://paste.fedoraproject.org/223415/14320505/15:50
gfidenteupstream CI still failing instead15:50
jaygand no rabbit partitions - http://paste.fedoraproject.org/223414/05054114/15:50
gfidentejayg, I have seen the galera_monitor to fail as well15:50
gfidenteI vote for merging THE WHOLE THING15:51
jistrgfidente, jayg: yeah i've seen the error too15:51
lsmola_gfidente: seems like wrong command, rbd lspool volumes, running it on overcloud controller15:51
gfidenteand then get back on galera15:51
jistrgfidente: we don't have enough +2s on some15:51
gfidentemarios, ^^ :)15:51
gfidentejayg, question about galera clustercheck is15:52
gfidenteI see it is polled by haproxy15:52
gfidenteto remove unsynced nodes15:52
jistrgfidente: should we start landing those which do have enough +2s? i'd say yes15:52
gfidenteis it used for full cluster restart as well?15:52
gfidentejistr, sure!15:53
gfidentefrom where we are, it's better to merge15:53
jayggfidente: in what way do you mean?15:53
openstackgerritMerged openstack/tripleo-heat-templates: Fix RabbitMQ startup race  https://review.openstack.org/18139815:53
openstackgerritMerged openstack/tripleo-heat-templates: Update to reflect puppet-pacemaker changes  https://review.openstack.org/18310315:54
openstackgerritMerged openstack/tripleo-heat-templates: Configure HAProxy, Galera and MongoDB before start  https://review.openstack.org/18404315:54
openstackgerritMerged openstack/tripleo-heat-templates: Remove unused enable_pacemaker setting from templates  https://review.openstack.org/18405715:55
*** eghobo has joined #tripleo15:55
* jistr EOD ttyl15:57
*** jistr has quit IRC15:57
gfidentemarios, you around for some merging?15:57
lsmola_gfidente: rbd ls volumes --long15:57
jayggfidente: so unless I read /usr/lib/ocf/resource.d/heartbeat/galera incorrectly, I dont see it using clustercheck wrt restarts15:58
lsmola_gfidente: but I don't see any link to hosts or replication15:58
lsmola_gfidente: have to run, will try that tomorrow again15:58
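
The point above, that the objects listed in the volumes pool carry the same UUID as the Cinder volume, can be sketched as a small name-to-UUID mapping. This is only an illustration: it assumes image names follow Cinder's default "volume-<uuid>" template, and the function name is hypothetical. As noted in the discussion, it says nothing about which nodes host the replicas.

```python
import uuid

# Assumption: Cinder's default volume_name_template, "volume-<uuid>".
PREFIX = "volume-"


def cinder_ids_from_rbd_names(names):
    """Return the Cinder volume UUIDs found in a list of RBD image names.

    Names that do not start with the prefix, or whose suffix is not a
    valid UUID, are skipped.
    """
    ids = []
    for name in names:
        if not name.startswith(PREFIX):
            continue
        candidate = name[len(PREFIX):]
        try:
            ids.append(str(uuid.UUID(candidate)))
        except ValueError:
            continue
    return ids
```

The input would be the image names taken from the rbd listing of the volumes pool, run on a controller.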
gfidentewhat is the purpose of clustercheck, just to report sync status to haproxy?15:58
jaygwe use it also to make sure galera is ready before trying to set up anything that depends on it15:59
jaygand haproxy takes no action, that is just how it monitors for issues15:59
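
A minimal sketch of how a load balancer could interpret clustercheck results, assuming the commonly distributed clustercheck script (HTTP 200 when the node is synced, 503 otherwise) and an HTTP health check that counts 2xx and 3xx responses as healthy. The function names and the dict layout are hypothetical; this is not the TripleO or quickstack configuration itself.

```python
def node_passes_check(status_code):
    """Return True if an HTTP health check would count the node as healthy.

    None stands for "no response" (connection refused or timeout).
    """
    return status_code is not None and 200 <= status_code < 400


def usable_nodes(check_results):
    """Return the names of the nodes whose last check passed.

    check_results maps node name -> last HTTP status code (or None).
    """
    return [name for name, code in check_results.items()
            if node_passes_check(code)]
```

A node that answers 503 (not synced) or does not answer at all is simply left out of the usable set until a later check passes.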
*** Marga_ has joined #tripleo16:01
gfidenteisn't it removing out-of-sync nodes from the stick table?16:01
*** saguilar has quit IRC16:01
*** cwolferh has joined #tripleo16:02
*** mestery has joined #tripleo16:04
*** dguerri is now known as dguerri`away16:05
jayggfidente: if the node is down, it would move traffic to a different one, but I think it keeps checking16:06
*** daneyon has joined #tripleo16:07
gfidenteso we have something which defines different nodes as master/backup16:07
gfidentebut is there any other difference with the sticky-table approach that you can tell?16:08
jaygcwolferh: maybe you can answer this question better?  wrt stick table+galera+haproxy16:09
*** yamahata has quit IRC16:10
gfidentecurrently we do deploy clustercheck but we don't use it from haproxy yet because of the master/backup pre-existing approach16:10
*** ParsectiX has quit IRC16:10
cwolferher, what is the question?16:10
gfidentecwolferh, sec code is easier16:11
gfidentewe don't have this in tripleo https://github.com/redhat-openstack/astapor/blob/master/puppet/modules/quickstack/manifests/load_balancer/galera.pp#L2216:11
openstackgerritPino Toscano proposed openstack/diskimage-builder: Cleanup the build directories earlier  https://review.openstack.org/18426816:11
gfidenteinstead we append backup key to all-except-one node16:12
gfidenteyet we deploy clustercheck16:12
*** athomas has quit IRC16:13
jayghmm, well the linked quickstack bit is what the ref arch recommends...16:13
jaygso why arent we doing that?16:13
gfidenteso the questions are: 1) is it needed for full cluster restart in some way? is there more than haproxy polling 9200? and 2) how does sticktable in haproxy compare to the master/backup thing? shall we use sticktable instead?16:13
gfidentejayg, because some of this stuff was in tripleo already16:13
gfidentejayg, it aimed at multiple controllers without pacemaker16:13
*** mestery has quit IRC16:14
cwolferhi don't understand what the master/backup thing is, i thought all nodes were masters when galera was deployed as a pacemaker service16:14
gfidentejayg, so we're slowly 'migrating' the config16:14
jaygah, ok16:14
gfidentecwolferh, all galera nodes are, it is in haproxy we define all-but-one as backup so they only get proxied if master is down16:14
gfidentecwolferh, https://github.com/stackforge/puppet-tripleo/blob/master/manifests/loadbalancer.pp#L571-L58816:15
cwolferhgfidente, n-1 masters and 1 backup, or 1 master and n-1 backups?16:15
gfidenten-1 backups16:15
cwolferheither way though, i think you are better off with stick table16:15
cwolferhif the master goes down, then you are in a round robin scenario between backups it seems to me (naively, not really familiar with master/backup)16:16
gfidentenah it is only using 1 backup at a time16:16
gfidenteas long as one is available it uses the first one it can find16:16
gfidenteanyway, I'll track this as a bug so we get on it with some medium priority16:16
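
The master/backup behaviour described above (one non-backup server, n-1 backups, and only the first available backup used when the master is down) can be sketched as follows. It assumes the default HAProxy behaviour without "option allbackups"; the Server class and function name are hypothetical and only model the selection rule.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    backup: bool  # True if the server line carries the "backup" keyword
    up: bool      # True if the health check currently passes


def active_servers(servers):
    """Return the servers that would receive traffic.

    Non-backup servers that are up take all traffic. If none is up,
    only the first backup that is up (in declaration order) is used.
    """
    primaries = [s for s in servers if s.up and not s.backup]
    if primaries:
        return primaries
    for s in servers:
        if s.backup and s.up:
            return [s]
    return []
```

With the master down and two healthy backups, only the first backup is returned, so traffic is not balanced round robin across the backups.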
*** zaneb has quit IRC16:17
*** saguilar has joined #tripleo16:19
gfidentejayg, so regarding the l3_ha setting instead16:20
gfidentehttps://github.com/redhat-openstack/astapor/blob/master/puppet/modules/quickstack/manifests/pacemaker/neutron.pp#L102 < I see minimum set to 216:20
gfidentebut I get InternalServerError: Not enough l3 agents available to ensure HA. Minimum required 2, available 1.16:20
gfidentedo you disable it with a single node then?16:21
*** daneyon has quit IRC16:21
*** Marga_ has quit IRC16:27
*** Marga_ has joined #tripleo16:27
jayggfidente: hmm, not explicitly that I recall, unless staypuft was doing so16:28
gfidenteyeah because if I enforce min to 1 I get16:29
gfidenteHAMinimumAgentsNumberNotValid: min_l3_agents_per_router config parameter is not valid. It has to be equal to or more than 2 for HA.16:29
jayggfidente: can you paste your neutron config?16:31
gfidentejayg sure16:31
gfidentenot anymore I deleted stack16:31
jaygah, ok16:32
jaygbecause if you look, even with l3_ha= true, if the cluster size were 1, we would set dhcp_agents_per_network=1, and min is the same in either case16:33
gfidentedhcp_agents is fine16:33
jaygthe only bits that change are clone-max and max_l3_agents_per_router16:33
gfidenteit is min_l3_agents the issue16:33
jaygthat value is the default anyway16:33
jaygso there must be a different setting triggering this16:34
gfidenteyes indeed if you don't change it and set l3_ha with 1 neutron server, the neutron client returns the error I pasted16:34
gfidenteclone is only affecting the pacemaker resource no?16:34
jaygah, looking at neutron::server, it must be as you say, that 'DEFAULT/l3_ha':                    value => true;16:36
jaygmust break it16:36
*** mestery has joined #tripleo16:37
gfidenteyep it won't work with less than 216:39
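
The two errors quoted above can be summarised as one validation rule. This is a sketch, not Neutron's code: the function name and signature are hypothetical, the messages are copied from the log, and in practice the two checks may run at different times (configuration load versus router creation).

```python
# Floor quoted in the HAMinimumAgentsNumberNotValid message above.
MIN_HA_AGENTS = 2


def check_l3_ha(l3_ha, min_agents, available_agents):
    """Return an error message if the L3 HA settings cannot work, else None."""
    if not l3_ha:
        # Without l3_ha, a single L3 agent is enough.
        return None
    if min_agents < MIN_HA_AGENTS:
        return ("min_l3_agents_per_router config parameter is not valid. "
                "It has to be equal to or more than 2 for HA.")
    if available_agents < min_agents:
        return ("Not enough l3 agents available to ensure HA. "
                f"Minimum required {min_agents}, available {available_agents}.")
    return None
```

For a single controller, the only combination that returns None is l3_ha set to False, which matches the recollection that staypuft disabled it for one node.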
jayggfidente: you know, now that I have been trying to remember for a bit, I think I have heard reports of this before, and that staypuft would set it to false if one node16:40
gfidenteyou DO NOT HACK THE NUMBERS!16:40
gfidentesee you tomorrow16:41
gfidenteI am done for today16:41
jayggood night!16:41
openstackgerritGiulio Fidente proposed openstack/tripleo-heat-templates: Enable NeutronL3HA by default in Pacemaker scenario  https://review.openstack.org/18428916:42
*** ParsectiX has quit IRC16:47
openstackgerritGiulio Fidente proposed openstack/tripleo-heat-templates: Enable NeutronL3HA by default in Pacemaker scenario  https://review.openstack.org/18428916:47
*** ParsectiX has joined #tripleo16:47
gfidentejayg, can you vote the deps tree as well?16:49
jayggfidente: do I just do each individually?16:50
gfidenteyeah you have to do individually16:50
gfidentetopmost can't be merged until the deps also are16:50
gfidenteeven if it gets all the votes16:50
jaygsure thing, I'll go through them each as well16:51
gfidentewe switched rabbit client to list of hosts as well16:51
gfidenteand there was some missing sync_db16:52
gfidentethe other three are minor restructuring16:52
gfidenteagain, ttyt :)16:52
jaygk, ttyt16:52
*** gfidente has quit IRC16:53
*** sdake_ is now known as sdake16:55
*** ParsectiX has quit IRC16:57
*** ParsectiX has joined #tripleo16:57
*** trown is now known as trown|lunch17:01
*** Marga_ has quit IRC17:01
*** mestery has joined #tripleo17:28
*** sdake has quit IRC17:40
*** sdake has joined #tripleo17:42
*** daneyon has joined #tripleo17:50
*** david-lyle has joined #tripleo17:53
*** sdake has joined #tripleo17:56
*** saguilar has joined #tripleo18:10
*** daneyon has quit IRC18:14
*** daneyon has joined #tripleo18:16
*** daneyon has quit IRC18:18
*** trown|lunch is now known as trown18:18
*** spredzy|afk is now known as spredzy18:18
*** gfidente has joined #tripleo18:41
*** gfidente has quit IRC18:44
*** barra204 has joined #tripleo18:50
*** barra204_ has joined #tripleo18:50
*** adrianopetrich has quit IRC18:54
*** david-lyle has quit IRC18:57
*** david-lyle has joined #tripleo19:01
openstackgerritgreghaynes proposed openstack/diskimage-builder: Add debian build test case  https://review.openstack.org/18116119:23
openstackgerritgreghaynes proposed openstack/diskimage-builder: Add tests for building *-minimal images  https://review.openstack.org/18116219:23
*** alop has quit IRC19:28
*** akrivoka has joined #tripleo19:33
*** barra204__ has joined #tripleo19:35
*** barra204 has joined #tripleo19:35
*** barra204 has quit IRC19:37
*** barra204__ has quit IRC19:37
*** barra204__ has joined #tripleo19:37
*** barra204 has joined #tripleo19:37
openstackgerritgreghaynes proposed openstack/diskimage-builder: Add tests for building *-minimal images  https://review.openstack.org/18116219:42
*** daneyon has joined #tripleo19:45
*** akrivoka has quit IRC19:46
*** cwolferh has joined #tripleo19:48
*** david-lyle has quit IRC19:52
*** dguerri is now known as dguerri`away19:54
*** daneyon has quit IRC20:24
*** akrivoka has joined #tripleo20:29
*** daneyon has joined #tripleo20:41
*** eghobo has joined #tripleo20:42
*** daneyon has quit IRC20:45
*** eghobo_ has joined #tripleo20:45
openstackgerritChristopher Dearborn proposed openstack/tripleo-image-elements: Have all os-refresh-config elements use su instead of sudo  https://review.openstack.org/17130320:47
*** eghobo has quit IRC20:48
*** daneyon has joined #tripleo20:53
*** daneyon has quit IRC20:54
*** david-lyle has joined #tripleo20:55
*** david-lyle has quit IRC20:58
*** eghobo_ has quit IRC21:05
*** dguerri`away is now known as dguerri21:16
openstackgerritDan Prince proposed openstack/tripleo-heat-templates: Update neutron local_ip to use the tenant network  https://review.openstack.org/17871622:15
openstackgerritDan Prince proposed openstack/tripleo-heat-templates: Add a network ports IP mapping resource  https://review.openstack.org/17871422:15
openstackgerritDan Prince proposed openstack/tripleo-heat-templates: Add isolated network ports to block storage roles  https://review.openstack.org/18082422:15
openstackgerritDan Prince proposed openstack/tripleo-heat-templates: Add isolated network ports to swift roles  https://review.openstack.org/18082322:15
openstackgerritDan Prince proposed openstack/tripleo-heat-templates: Add isolated network ports to ceph roles  https://review.openstack.org/18082222:15
openstackgerritDan Prince proposed openstack/tripleo-heat-templates: Add isolated network ports to controller roles  https://review.openstack.org/17784622:15
openstackgerritDan Prince proposed openstack/tripleo-heat-templates: Add isolated network ports to compute roles  https://review.openstack.org/18082122:15
openstackgerritDan Prince proposed openstack/tripleo-heat-templates: Add a ports (ip address) abstraction layer  https://review.openstack.org/17784522:15
openstackgerritDan Prince proposed openstack/tripleo-heat-templates: Add isolated net parameters to net-config stacks  https://review.openstack.org/18082022:15
openstackgerritDan Prince proposed openstack/tripleo-heat-templates: Wire in optional network creation for overcloud  https://review.openstack.org/17784422:15
openstackgerritDan Prince proposed openstack/tripleo-heat-templates: Switch net-config templates to use OS::stack_id  https://review.openstack.org/18434322:15
*** david-lyle has joined #tripleo22:21
*** david-lyle has quit IRC22:27
*** chlong has quit IRC23:31
