*** sdake_ has joined #tripleo | 00:04 | |
*** sdake has quit IRC | 00:07 | |
*** sdake has joined #tripleo | 00:20 | |
*** shadower has quit IRC | 00:23 | |
*** shadower has joined #tripleo | 00:23 | |
*** sdake_ has quit IRC | 00:23 | |
*** olaph has joined #tripleo | 00:38 | |
*** Goneri has quit IRC | 00:43 | |
*** mestery has joined #tripleo | 00:59 | |
*** yamahata has joined #tripleo | 01:27 | |
*** mestery has quit IRC | 01:36 | |
*** mestery has joined #tripleo | 01:36 | |
*** mestery has quit IRC | 01:37 | |
*** yamahata has quit IRC | 01:47 | |
*** al has quit IRC | 01:59 | |
*** al has joined #tripleo | 02:01 | |
*** panda has quit IRC | 02:09 | |
*** panda has joined #tripleo | 02:10 | |
*** aukhan has joined #tripleo | 02:41 | |
*** untriaged-bot has joined #tripleo | 03:00 | |
untriaged-bot | Untriaged bugs so far: | 03:00 |
---|---|---|
untriaged-bot | https://bugs.launchpad.net/os-collect-config/+bug/1482510 | 03:00 |
openstack | Launchpad bug 1482510 in heat "OS::Heat::SoftwareDeployment failed due SSL certificate verification error" [Medium,Triaged] - Assigned to Rico Lin (rico-lin) | 03:00 |
untriaged-bot | https://bugs.launchpad.net/diskimage-builder/+bug/1466037 | 03:00 |
openstack | Launchpad bug 1466037 in diskimage-builder "Signed Fedora and Ubuntu user image built by DIB can`t boot on HP DL380 Gen8 server for lack of mpt2sas driver" [Undecided,Incomplete] | 03:00 |
uvirtbot | Launchpad bug 1482510 in os-collect-config "OS::Heat::SoftwareDeployment failed due SSL certificate verification error" [Undecided,New] | 03:00 |
uvirtbot | Launchpad bug 1482510 in os-collect-config "OS::Heat::SoftwareDeployment failed due SSL certificate verification error" [Undecided,New] https://launchpad.net/bugs/1482510 | 03:00 |
untriaged-bot | https://bugs.launchpad.net/diskimage-builder/+bug/1483385 | 03:00 |
openstack | Launchpad bug 1483385 in diskimage-builder "install_grub failing for centos7" [Undecided,In progress] - Assigned to Abel Lopez (al592b) | 03:00 |
uvirtbot | Launchpad bug 1466037 in diskimage-builder "Signed Fedora and Ubuntu user image built by DIB can`t boot on HP DL380 Gen8 server for lack of mpt2sas driver" [Undecided,Incomplete] | 03:00 |
untriaged-bot | https://bugs.launchpad.net/diskimage-builder/+bug/1471802 | 03:00 |
openstack | Launchpad bug 1471802 in diskimage-builder "ironic-agent element hardcodes interfaces names for DHCP." [Undecided,Fix committed] - Assigned to Om Kumar (om-kumar) | 03:00 |
uvirtbot | Launchpad bug 1466037 in diskimage-builder "Signed Fedora and Ubuntu user image built by DIB can`t boot on HP DL380 Gen8 server for lack of mpt2sas driver" [Undecided,Incomplete] https://launchpad.net/bugs/1466037 | 03:00 |
uvirtbot | Launchpad bug 1483385 in diskimage-builder "install_grub failing for centos7" [Undecided,In progress] | 03:00 |
*** untriaged-bot has quit IRC | 03:00 | |
uvirtbot | Launchpad bug 1483385 in diskimage-builder "install_grub failing for centos7" [Undecided,In progress] https://launchpad.net/bugs/1483385 | 03:00 |
uvirtbot | Launchpad bug 1471802 in diskimage-builder "ironic-agent element hardcodes interfaces names for DHCP." [Undecided,Fix committed] | 03:00 |
uvirtbot | Launchpad bug 1471802 in diskimage-builder "ironic-agent element hardcodes interfaces names for DHCP." [Undecided,Fix committed] https://launchpad.net/bugs/1471802 | 03:00 |
*** al has quit IRC | 03:08 | |
*** al has joined #tripleo | 03:10 | |
*** openstack has joined #tripleo | 04:19 | |
*** masco has joined #tripleo | 05:20 | |
*** jprovazn has joined #tripleo | 05:52 | |
*** bvandenh has joined #tripleo | 06:06 | |
*** lsmola has joined #tripleo | 06:12 | |
*** Marga_ has joined #tripleo | 06:12 | |
*** Marga_ has quit IRC | 06:13 | |
*** Marga_ has joined #tripleo | 06:14 | |
*** Marga_ has quit IRC | 06:17 | |
*** Marga_ has joined #tripleo | 06:17 | |
*** ifarkas has joined #tripleo | 06:40 | |
*** sdake_ has joined #tripleo | 06:44 | |
*** sdake has quit IRC | 06:47 | |
marios | " It's just you. http://review.openstack.org is up. " :/ | 06:54 |
-openstackstatus- NOTICE: Gerrit is currently under very high load and may be unresponsive. infra are looking into the issue. | 07:07 | |
*** sdake_ has quit IRC | 07:08 | |
marios | so perhaps not just me then | 07:09 |
*** jtomasek has joined #tripleo | 07:09 | |
*** pblaho has joined #tripleo | 07:15 | |
openstackgerrit | Yanis Guenane proposed openstack/tripleo-heat-templates: [test] Ensuring ha job is working when stonith is fully disabled https://review.openstack.org/212566 | 07:25 |
*** pblaho has quit IRC | 07:32 | |
*** pblaho has joined #tripleo | 07:32 | |
*** sthillma has joined #tripleo | 07:35 | |
*** aufi has joined #tripleo | 07:35 | |
*** sthillma_ has joined #tripleo | 07:36 | |
*** matbu has joined #tripleo | 07:38 | |
*** sthillma has quit IRC | 07:39 | |
*** sthillma_ is now known as sthillma | 07:39 | |
*** yog_ has joined #tripleo | 07:41 | |
*** sthillma_ has joined #tripleo | 07:41 | |
*** sthillma has quit IRC | 07:44 | |
*** sthillma_ is now known as sthillma | 07:44 | |
*** lucasagomes has joined #tripleo | 07:53 | |
*** stendulker has joined #tripleo | 07:55 | |
*** matbu has quit IRC | 07:57 | |
*** matbu has joined #tripleo | 08:02 | |
*** matbu has quit IRC | 08:07 | |
*** sthillma has quit IRC | 08:07 | |
*** shardy has joined #tripleo | 08:09 | |
*** derekh has joined #tripleo | 08:14 | |
derekh | Anybody looking at the failures in devtest/CI ? | 08:15 |
spredzy | derekh, I've been investigating the HA job failure for a few days | 08:19 |
spredzy | no luck yet | 08:19 |
spredzy | As it only happens in the CI but not on my setup :/ | 08:19 |
derekh | spredzy: ok, I'm gonna try the nonha job first to see if I can figure that one out | 08:20 |
spredzy | Issues are: sometimes the cluster doesn't form at all. Sometimes it does, but Galera can't create the cluster | 08:20 |
spredzy | derekh, I think puppet-nonha job was green last time I checked | 08:21 |
*** jistr has joined #tripleo | 08:21 | |
* spredzy dealing with gerrit atm is really painful | 08:21 | |
derekh | spredzy: yup it is, the one I'm looking at is overcloud-f21-nonha | 08:21 |
*** matbu has joined #tripleo | 08:28 | |
*** shardy_ has joined #tripleo | 08:33 | |
*** shardy has quit IRC | 08:34 | |
*** matbu has quit IRC | 08:35 | |
*** adrianopetrich has quit IRC | 08:36 | |
*** matbu has joined #tripleo | 08:37 | |
*** shardy_ has quit IRC | 08:38 | |
*** shardy has joined #tripleo | 08:39 | |
*** mcornea has joined #tripleo | 08:39 | |
*** mbound has joined #tripleo | 08:47 | |
*** gfidente has joined #tripleo | 08:48 | |
*** regebro has joined #tripleo | 08:57 | |
*** untriaged-bot has joined #tripleo | 09:00 | |
untriaged-bot | Untriaged bugs so far: | 09:00 |
untriaged-bot | https://bugs.launchpad.net/os-collect-config/+bug/1482510 | 09:00 |
openstack | Launchpad bug 1482510 in heat "OS::Heat::SoftwareDeployment failed due SSL certificate verification error" [Medium,Triaged] - Assigned to Rico Lin (rico-lin) | 09:00 |
untriaged-bot | https://bugs.launchpad.net/diskimage-builder/+bug/1466037 | 09:00 |
uvirtbot | Launchpad bug 1482510 in os-collect-config "OS::Heat::SoftwareDeployment failed due SSL certificate verification error" [Undecided,New] | 09:00 |
openstack | Launchpad bug 1466037 in diskimage-builder "Signed Fedora and Ubuntu user image built by DIB can`t boot on HP DL380 Gen8 server for lack of mpt2sas driver" [Undecided,Incomplete] | 09:00 |
untriaged-bot | https://bugs.launchpad.net/diskimage-builder/+bug/1483385 | 09:00 |
uvirtbot | Launchpad bug 1466037 in diskimage-builder "Signed Fedora and Ubuntu user image built by DIB can`t boot on HP DL380 Gen8 server for lack of mpt2sas driver" [Undecided,Incomplete] | 09:00 |
uvirtbot | Launchpad bug 1482510 in os-collect-config "OS::Heat::SoftwareDeployment failed due SSL certificate verification error" [Undecided,New] https://launchpad.net/bugs/1482510 | 09:00 |
openstack | Launchpad bug 1483385 in diskimage-builder "install_grub failing for centos7" [Undecided,In progress] - Assigned to Abel Lopez (al592b) | 09:00 |
untriaged-bot | https://bugs.launchpad.net/diskimage-builder/+bug/1471802 | 09:00 |
openstack | Launchpad bug 1471802 in diskimage-builder "ironic-agent element hardcodes interfaces names for DHCP." [Undecided,Fix committed] - Assigned to Om Kumar (om-kumar) | 09:00 |
uvirtbot | Launchpad bug 1466037 in diskimage-builder "Signed Fedora and Ubuntu user image built by DIB can`t boot on HP DL380 Gen8 server for lack of mpt2sas driver" [Undecided,Incomplete] https://launchpad.net/bugs/1466037 | 09:00 |
uvirtbot | Launchpad bug 1483385 in diskimage-builder "install_grub failing for centos7" [Undecided,In progress] | 09:00 |
*** untriaged-bot has quit IRC | 09:00 | |
uvirtbot | Launchpad bug 1483385 in diskimage-builder "install_grub failing for centos7" [Undecided,In progress] https://launchpad.net/bugs/1483385 | 09:00 |
uvirtbot | Launchpad bug 1471802 in diskimage-builder "ironic-agent element hardcodes interfaces names for DHCP." [Undecided,Fix committed] | 09:00 |
uvirtbot | Launchpad bug 1471802 in diskimage-builder "ironic-agent element hardcodes interfaces names for DHCP." [Undecided,Fix committed] https://launchpad.net/bugs/1471802 | 09:00 |
*** pelix has joined #tripleo | 09:06 | |
*** Marga_ has quit IRC | 09:08 | |
*** Marga_ has joined #tripleo | 09:08 | |
*** kbyrne has quit IRC | 09:12 | |
*** kbyrne has joined #tripleo | 09:17 | |
*** bvandenh has quit IRC | 09:18 | |
*** akrivoka has joined #tripleo | 09:19 | |
*** athomas has joined #tripleo | 09:23 | |
*** akrivoka has quit IRC | 09:34 | |
*** akrivoka has joined #tripleo | 09:36 | |
*** Guest47951 is now known as d0ugal | 09:49 | |
*** d0ugal has quit IRC | 09:49 | |
*** d0ugal has joined #tripleo | 09:49 | |
*** Marga_ has quit IRC | 09:54 | |
*** Marga_ has joined #tripleo | 09:55 | |
*** panda has quit IRC | 10:09 | |
*** panda has joined #tripleo | 10:10 | |
*** leanderthal has quit IRC | 10:20 | |
*** leanderthal has joined #tripleo | 10:22 | |
-openstackstatus- NOTICE: review.openstack.org (aka gerrit) is going down for an emergency restart | 10:23 | |
*** ChanServ changes topic to "review.openstack.org (aka gerrit) is going down for an emergency restart" | 10:23 | |
*** bvandenh has joined #tripleo | 10:39 | |
*** ChanServ changes topic to "CI failing on https://bugs.launchpad.net/tripleo/+bug/1483706 and https://bugs.launchpad.net/tripleo/+bug/1482195 | Deploying OpenStack Using OpenStack | https://wiki.openstack.org/wiki/TripleO" | 10:50 | |
-openstackstatus- NOTICE: Gerrit restart has resolved the issue and systems are back up and functioning | 10:50 | |
*** yog_ has quit IRC | 10:56 | |
spredzy | derekh, ping | 10:59 |
spredzy | derekh, would you happen to know if there are any kind of multicast filtering in our CI system ? | 10:59 |
*** regebro has quit IRC | 11:00 | |
*** regebro has joined #tripleo | 11:01 | |
derekh | spredzy: none that I'm aware of | 11:02 |
*** Marga_ has quit IRC | 11:02 | |
spredzy | derekh, ack. I am starting to run out of ideas then :) | 11:07 |
*** aukhan has quit IRC | 11:11 | |
derekh | spredzy: I'll see if I can set you up a VM on the ci cloud to reproduce | 11:12 |
*** paramite has joined #tripleo | 11:13 | |
spredzy | derekh, thanks that would be awesome. Question: Can devtest run in a VM ? | 11:13 |
spredzy | When I tried months ago it had to be run on baremetal AFAIK | 11:13 |
derekh | spredzy: it can run in a VM but needs to control VMs that run on baremetal (i.e. the undercloud/overcloud can't be nested virt instances) | 11:15 |
spredzy | ack no nested virt. That was what I tried | 11:15 |
spredzy | thanks for clearing this up | 11:16 |
derekh | spredzy: can you point me at a public key for you, I've spun you up a VM | 11:17 |
derekh | spredzy: I'll then start a screen session to show you what I'm going to do | 11:17 |
spredzy | derekh, ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC+ZFQv3MyjtL1BMpSA0o0gIkzLVVC711rthT29hBNeORdNowQ7FSvVWUdAbTq00U7Xzak1ANIYLJyn+0r7olsdG4XEiUR0dqgC99kbT/QhY5mLe5lpl7JUjW9ctn00hNmt+TswpatCKWPNwdeAJT2ERynZaqPobENgvIq7jfOFWQIVew7qFeZygxsPVn36EUr2Cdq7Nb7U0XFXh3x1p0v0+MbL4tiJwPlMAGvFTKIMt+EaA+AsRIxiOo9CMk5ZuOl9pT8h5vNuEOcvS0qx4v44EAD2VOsCVCcrPNMcpuSzZP8dRTGU9wRREAWXngD0Zq9YJMH38VTxHiskoBw1NnPz spredzy@murcia.yanisguenane. | 11:18 |
spredzy | ack | 11:18 |
derekh | spredzy: got it | 11:18 |
derekh | ssh fedora@66.187.229.66 | 11:18 |
derekh | spredzy: ^ | 11:18 |
spredzy | derekh, in | 11:19 |
derekh | spredzy: ok, screen -x | 11:19 |
spredzy | derekh, in | 11:19 |
derekh | spredzy: this is a vanilla f21 cloud instance, the first thing that infra do is install a bunch of stuff to make it a nodepool template, one sec and I'll make a script | 11:20 |
*** mburned_` is now known as mburned_out | 11:20 | |
*** mburned_out is now known as mburned | 11:20 | |
*** stendulker has quit IRC | 11:20 | |
derekh | spredzy: essentially this gets run | 11:21 |
derekh | http://paste.openstack.org/show/419081/ | 11:21 |
spredzy | derekh, ack (looking at the screen session) | 11:22 |
derekh | spredzy: hmm, running it twice is a problem, gonna remove a few things | 11:23 |
derekh | spredzy: ok, so amongst other things that clones a shed load of git repositories; it takes about 30 minutes (maybe a little longer), so we can get back to this once it's finished | 11:25 |
spredzy | ok | 11:25 |
spredzy | derekh, ^ | 11:25 |
*** lucasagomes is now known as lucas-hungry | 11:28 | |
*** shardy_ has joined #tripleo | 11:43 | |
*** shardy has quit IRC | 11:45 | |
*** Marga_ has joined #tripleo | 11:49 | |
*** shardy_ has quit IRC | 11:49 | |
*** shardy has joined #tripleo | 11:49 | |
spredzy | derekh, ping. process seems over | 11:53 |
*** rhallisey has joined #tripleo | 11:53 | |
derekh | spredzy: yup, ok so what normally happens now (in nodepool) is that a snapshot is taken, this snapshot is what all CI tests will start from (stop me if I'm telling you things you already know) | 11:54 |
spredzy | derekh, not at all, I am totally unfamiliar with this process | 11:55 |
*** funzo has joined #tripleo | 11:55 | |
derekh | spredzy: ok, I've SU'd to jenkins, everything else will run as jenkins | 11:55 |
spredzy | derekh, ok | 11:56 |
derekh | spredzy: we also need to configure eth1 | 11:56 |
derekh | spredzy: eth1 is on a special "test" network, it will allow us to talk to the bm hosts that are used to bring up instances | 11:58 |
spredzy | derekh, ack | 11:58 |
*** chlong has quit IRC | 11:59 | |
derekh | spredzy: so, nearly there ;-) , nodepool/zuul kicks off jobs with a load of job specific stuff defined | 11:59 |
*** adrianopetrich has joined #tripleo | 11:59 | |
*** funzo has quit IRC | 12:00 | |
spredzy | derekh, what was the last command you ran? It went by pretty fast :/ | 12:02 |
*** adrianopetrich_ has joined #tripleo | 12:02 | |
derekh | spredzy: I essentially went through this (I had it in some notes) http://paste.openstack.org/show/419108/ | 12:02 |
derekh | spredzy: and changed the job name to the ha job | 12:03 |
derekh | spredzy: this sets things up and then calls "gate_hook" | 12:03 |
derekh | spredzy: I've stubbed out gate_hook so we can run it separately in a minute | 12:04 |
spredzy | derekh, ok. I think it failed atm | 12:04 |
derekh | spredzy: checking | 12:04 |
*** adrianopetrich has quit IRC | 12:04 | |
spredzy | derekh, timeout -s 9 m vs. timeout -s 9m | 12:06 |
spredzy | no ? | 12:06 |
spredzy | well I see some command not found above also | 12:06 |
derekh | spredzy: yup possibly but I *think* it's ok in this case | 12:06 |
derekh | spredzy: so I'm going to try the tripleo command to see how it goes | 12:07 |
derekh | spredzy: nope, I'm wrong, it's not ok, /opt/stack/new wasn't set up | 12:08 |
derekh | spredzy: I gotta pop away for about 30 minutes can we reconvene after that ? (or at some stage that suits you?) | 12:09 |
spredzy | sure | 12:09 |
spredzy | derekh, ping me when you're back | 12:09 |
derekh | spredzy: will do | 12:10 |
spredzy | thx | 12:10 |
*** jayg|g0n3 is now known as jayg | 12:15 | |
*** shardy_ has joined #tripleo | 12:28 | |
*** shardy has quit IRC | 12:29 | |
*** lucas-hungry is now known as lucasagomes | 12:31 | |
*** shardy_ has quit IRC | 12:33 | |
*** rbrady has joined #tripleo | 12:34 | |
*** shardy has joined #tripleo | 12:34 | |
*** noslzzp has joined #tripleo | 12:41 | |
*** Marga_ has quit IRC | 12:47 | |
*** matbu has quit IRC | 12:47 | |
*** adrianopetrich_ has quit IRC | 12:51 | |
*** matbu has joined #tripleo | 12:51 | |
*** dprince has joined #tripleo | 12:54 | |
*** Marga_ has joined #tripleo | 12:56 | |
*** sdake has joined #tripleo | 13:00 | |
derekh | spredzy: back, just taking a look now to see what I screwed up | 13:01 |
spredzy | derekh, ack I am around | 13:01 |
derekh | spredzy: I recloned openstack-infra/devstack-gate into $WORKSPACE | 13:04 |
*** adrianopetrich_ has joined #tripleo | 13:05 | |
derekh | spredzy: I must have done it wrong the first time | 13:05 |
*** matbu has quit IRC | 13:05 | |
spredzy | derekh, ok let's see where it leads us now | 13:05 |
*** lifeless has quit IRC | 13:05 | |
derekh | spredzy: ok, that worked, moving into the tripleo-ci directory to run tripleo ci ha job | 13:08 |
*** rlandy has joined #tripleo | 13:08 | |
derekh | spredzy: when a ci job finishes the instances are freed up for another CI job, that sleep keeps them around until you kill the script | 13:09 |
spredzy | derekh, ack | 13:09 |
spredzy | derekh, so now the regular CI job is running, correct ? | 13:10 |
derekh | spredzy: then we run this command, the overcloud-puppet ha ci job is running | 13:10 |
derekh | spredzy: yup | 13:10 |
spredzy | so once it is done - failed mostly - the system won't be torn down, thanks to the sleep you just put in | 13:11 |
spredzy | derekh, so from there I'll be able to debug/break/change things | 13:11 |
derekh | spredzy: exactly, it will only be released once the sleep is finished/killed | 13:11 |
spredzy | derekh, yeah that's about 16h | 13:12 |
spredzy | that gives me plenty of time :) | 13:12 |
spredzy | derekh, thanks for setting this up for me | 13:12 |
derekh | spredzy: you might need to change one or two things so the script can be rerun (not all of it is idempotent) but that should essentially be it | 13:12 |
derekh | spredzy: no prob, image download seems slow but let's see how far this gets and we can see if I've forgotten anything ;-) | 13:13 |
*** devvesa has joined #tripleo | 13:14 | |
*** sdake_ has joined #tripleo | 13:15 | |
derekh | hmmm, image download isn't usually this slow | 13:16 |
*** karume has joined #tripleo | 13:16 | |
derekh | spredzy: it's getting slower, going to cancel it and start over | 13:17 |
spredzy | derekh, ok | 13:17 |
derekh | spredzy: going to change a few things so it can be rerun multiple times | 13:17 |
*** yog_ has joined #tripleo | 13:18 | |
*** sdake has quit IRC | 13:19 | |
*** sdake_ has quit IRC | 13:19 | |
derekh | spredzy: 3 changes there, 1. remove the check to specifically stop us reusing nodes, 2. add "|| true" after the ci-branch is created (as it errors if the branch already exists) | 13:20 |
derekh | and 3 don't download the fedora image each time | 13:20 |
spredzy | ack | 13:21 |
*** matbu has joined #tripleo | 13:22 | |
*** hewbrocca has joined #tripleo | 13:23 | |
spredzy | derekh, was that caused by the || true on the ci-branch thingy ? | 13:24 |
derekh | spredzy: it was because of the ci-branch yes but I don't think the "|| true" did it | 13:25 |
derekh | spredzy: looks like I gotta change a few more things so this can be rerun, bear with me | 13:25 |
derekh | this script is normally only run once and the VM is thrown away | 13:26 |
*** funzo has joined #tripleo | 13:26 | |
*** lifeless has joined #tripleo | 13:28 | |
spredzy | derekh, now it fails with "AttributeError: 'module' object has no attribute 'wraps'" | 13:29 |
*** absubram has quit IRC | 13:29 | |
derekh | spredzy: yup, the first pass through this script must have updated something (in six...?) | 13:30 |
*** julim has joined #tripleo | 13:31 | |
*** funzo has quit IRC | 13:31 | |
jistr | btw i looked into this a bit as well on my local setup, didn't find anything yet (still unable to reproduce it) but here are some info points: | 13:32 |
jistr | the weirdest thing is that CI doesn't execute the "Disable STONITH" exec, as spredzy pointed out before | 13:33 |
jistr | the cause could be that this unless condition is not met for some reason | 13:33 |
jistr | https://github.com/redhat-openstack/puppet-pacemaker/blob/master/manifests/stonith.pp#L5 | 13:33 |
spredzy | jistr, weirdest thing is that each CI run gives a different kind of error | 13:34 |
spredzy | Sometimes corosync can't form the cluster | 13:34 |
spredzy | Sometimes corosync is ok, pacemaker is ok but galera fails | 13:34 |
*** mcornea has quit IRC | 13:34 | |
spredzy | jistr, if you look at the logs from this https://review.openstack.org/#/c/212566/ | 13:34 |
*** adrianopetrich_ has quit IRC | 13:35 | |
spredzy | in the mysqld.log for each node, you'll see the cluster forms (3 nodes), then 1 leaves, then another leaves, then it's brought back up: 1, then 2, then 1 leaves again, non-stop | 13:35 |
jistr | but "Disable STONITH" is never run, is it? | 13:35 |
jistr | on my local env it's always run | 13:35 |
jistr | i noticed that there's an updated corosync RPM in Fedora which my image didn't contain | 13:36 |
jistr | so i ran yum -y update with a firstboot script when deploying | 13:36 |
jistr | but it still didn't reproduce | 13:36 |
jistr | and disable stonith was run | 13:36 |
spredzy | + sudo pcs stonith show --full | 13:37 |
spredzy | + which crm_verify | 13:37 |
spredzy | so not sure what to think | 13:37 |
*** mcornea has joined #tripleo | 13:37 | |
derekh | spredzy: ok, it's getting further now, currently building the ramdisk, hopefully it keeps going | 13:37 |
jistr | stonith show won't give any relevant info afaik | 13:37 |
spredzy | jistr, what about + sudo pcs stonith show --full | 13:38 |
spredzy | + which crm_verify | 13:38 |
spredzy | ooops | 13:38 |
spredzy | sorry | 13:38 |
spredzy | <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/> | 13:38 |
jistr | yeah that looks like it's disabled then | 13:38 |
jistr | (pcs property show stonith-enabled would show from cmdline) | 13:38 |
jistr | but i still wonder how the difference between CI and my local env happens... | 13:39 |
jistr | how come it's disabled in CI even though the "disable stonith" exec isn't being run | 13:39 |
spredzy | looking at the corosync logs and the galera logs I have the feeling one node of the cluster is flapping (in and out of the cluster) | 13:40 |
spredzy | and I have no idea what could be the cause of that | 13:40 |
spredzy | derekh, ack thx | 13:40 |
spredzy | jistr, ^ | 13:40 |
spredzy | jistr, on the run I am talking about it did run: I can see '/Stage[main]/Pacemaker::Stonith/Exec[Disable STONITH]/returns: executed successfully' in the os-collect-config log | 13:41 |
jistr | hmm interesting | 13:41 |
spredzy | (review: https://review.openstack.org/#/c/212566/) | 13:41 |
spredzy | but before the recheck that wasn't the case | 13:41 |
*** sseago has joined #tripleo | 13:41 | |
jistr | one thing which flew by is if the CI job cluster could interfere with other CI job clusters, i.e. if we're using multicast | 13:42 |
jistr | but we're using unicast | 13:42 |
jistr | transport: udpu | 13:42 |
spredzy | yeah, by default when you run pcs cluster start it will use --transport udpu | 13:42 |
jistr | (which doesn't disprove the interference hypothesis, just a data point) | 13:42 |
spredzy | jistr, I ran it this weekend in the CI while no other tripleo-heat-templates jobs were running | 13:43 |
spredzy | still had the same issue :( | 13:43 |
spredzy | jistr, have you ever experiences nodes flapping in/out the cluster ? | 13:43 |
*** bvandenh has quit IRC | 13:43 | |
spredzy | s/experiences/experienced | 13:43 |
jistr | no i haven't | 13:44 |
spredzy | jistr, http://logs.openstack.org/66/212566/3/check-tripleo/gate-tripleo-ironic-overcloud-f21puppet-ha/9fe3de0/logs/overcloud-controller-2_logs/corosync.txt.gz | 13:44 |
spredzy | if you grep Retransmit | 13:44 |
spredzy | you'll see quite a few messages ... this worries me, as it might indicate some cluster malfunction | 13:45 |
spredzy | but I still can't pinpoint the actual issue | 13:45 |
*** sseago has quit IRC | 13:46 | |
*** Goneri has joined #tripleo | 13:48 | |
*** masco has quit IRC | 13:48 | |
*** sdake has joined #tripleo | 13:49 | |
*** sdake has quit IRC | 13:49 | |
*** sdake has joined #tripleo | 13:49 | |
*** matbu has quit IRC | 13:55 | |
openstackgerrit | Derek Higgins proposed openstack-infra/tripleo-ci: Switch nova temprevert to a cherrypick https://review.openstack.org/213714 | 13:57 |
*** matbu has joined #tripleo | 13:57 | |
derekh | That ^^^ I think is why the non puppet jobs are failing | 13:57 |
dprince | derekh: ack | 13:57 |
dprince | derekh: fast tracking the switch to instack underclouds would help us here too | 13:58 |
derekh | spredzy: seed has booted on that manual ci job | 13:58 |
*** pradk has joined #tripleo | 13:59 | |
*** spzala has joined #tripleo | 13:59 | |
derekh | dprince: yup, I'm back looking at it today should have a good idea how close we are in a bit | 13:59 |
spredzy | derekh, ok | 14:00 |
dprince | derekh: cool, yeah we don't use the "ephemeral" partitions for those jobs. Or we shouldn't need to if we do | 14:00 |
derekh | dprince: ack | 14:00 |
*** chlong has joined #tripleo | 14:03 | |
jistr | spredzy: well guess what :) after yum -y update i get the same behavior you posted with flapping nodes (previously i was focusing just on the "disable stonith" exec) | 14:04 |
hewbrocca | hey fellas! | 14:04 |
jistr | and the deployment seems stuck | 14:05 |
jistr | spredzy: so our suspect is corosync/corosynclib RPMs i'd say | 14:05 |
jistr | hewbrocca: hello | 14:05 |
*** lblanchard has joined #tripleo | 14:08 | |
spredzy | jistr, that would confirm my guess about corosync being the guilty one, but then I would suspect the issue to be 'network' related and not specifically configuration related | 14:08 |
spredzy | (I also asked on #clusterlabs to see if they have any ideas) | 14:08 |
spredzy | hewbrocca, o/ | 14:08 |
*** panda has quit IRC | 14:09 | |
*** panda has joined #tripleo | 14:10 | |
*** sdake has quit IRC | 14:10 | |
gfidente | spredzy, any trace of network manager? | 14:11 |
jistr | here's the full list of suspects http://fpaste.org/255866/14398206/raw/ | 14:13 |
*** w_ has joined #tripleo | 14:14 | |
spredzy | jistr, ahaha it couldn't be shorter :) | 14:14 |
spredzy | :D | 14:14 |
spredzy | gfidente, apparently nope (http://logs.openstack.org/66/212566/3/check-tripleo/gate-tripleo-ironic-overcloud-f21puppet-ha/9fe3de0/logs/overcloud-controller-0_logs/host_info.txt.gz) | 14:14 |
* spredzy out for the next 4 hours. | 14:15 | |
*** olaph has quit IRC | 14:16 | |
*** spredzy is now known as spredzy|afk | 14:16 | |
*** bvandenh has joined #tripleo | 14:17 | |
*** sdake has joined #tripleo | 14:17 | |
*** olaph has joined #tripleo | 14:20 | |
*** w_ has quit IRC | 14:21 | |
*** mcornea has quit IRC | 14:23 | |
*** mcornea has joined #tripleo | 14:24 | |
*** funzo has joined #tripleo | 14:27 | |
*** adrianopetrich has joined #tripleo | 14:32 | |
*** funzo has quit IRC | 14:32 | |
*** olaph has quit IRC | 14:33 | |
openstackgerrit | Jiri Stransky proposed openstack/diskimage-builder: Fedora: install older corosync https://review.openstack.org/213736 | 14:38 |
*** paramite is now known as paramite|afk | 14:41 | |
openstackgerrit | Jiri Stransky proposed openstack-infra/tripleo-ci: Fedora: install older corosync https://review.openstack.org/213737 | 14:45 |
jistr | spredzy|afk, derekh: ^ tried to pin down corosync and corosynclib to see if it fixes us | 14:45 |
*** paramite|afk is now known as paramite | 14:46 | |
*** mburned is now known as mburned_out | 14:52 | |
*** mburned_out is now known as mburned | 14:52 | |
*** bvandenh has quit IRC | 14:56 | |
*** shardy_ has joined #tripleo | 14:59 | |
*** untriaged-bot has joined #tripleo | 15:00 | |
untriaged-bot | Untriaged bugs so far: | 15:00 |
untriaged-bot | https://bugs.launchpad.net/os-collect-config/+bug/1482510 | 15:00 |
openstack | Launchpad bug 1482510 in heat "OS::Heat::SoftwareDeployment failed due SSL certificate verification error" [Medium,Triaged] - Assigned to Rico Lin (rico-lin) | 15:00 |
uvirtbot | Launchpad bug 1482510 in os-collect-config "OS::Heat::SoftwareDeployment failed due SSL certificate verification error" [Undecided,New] | 15:00 |
untriaged-bot | https://bugs.launchpad.net/diskimage-builder/+bug/1466037 | 15:00 |
openstack | Launchpad bug 1466037 in diskimage-builder "Signed Fedora and Ubuntu user image built by DIB can`t boot on HP DL380 Gen8 server for lack of mpt2sas driver" [Undecided,Incomplete] | 15:00 |
uvirtbot | Launchpad bug 1482510 in os-collect-config "OS::Heat::SoftwareDeployment failed due SSL certificate verification error" [Undecided,New] https://launchpad.net/bugs/1482510 | 15:00 |
uvirtbot | Launchpad bug 1466037 in diskimage-builder "Signed Fedora and Ubuntu user image built by DIB can`t boot on HP DL380 Gen8 server for lack of mpt2sas driver" [Undecided,Incomplete] | 15:00 |
untriaged-bot | https://bugs.launchpad.net/diskimage-builder/+bug/1483385 | 15:00 |
openstack | Launchpad bug 1483385 in diskimage-builder "install_grub failing for centos7" [Undecided,In progress] - Assigned to Abel Lopez (al592b) | 15:00 |
uvirtbot | Launchpad bug 1466037 in diskimage-builder "Signed Fedora and Ubuntu user image built by DIB can`t boot on HP DL380 Gen8 server for lack of mpt2sas driver" [Undecided,Incomplete] https://launchpad.net/bugs/1466037 | 15:00 |
untriaged-bot | https://bugs.launchpad.net/diskimage-builder/+bug/1471802 | 15:00 |
openstack | Launchpad bug 1471802 in diskimage-builder "ironic-agent element hardcodes interfaces names for DHCP." [Undecided,Fix committed] - Assigned to Om Kumar (om-kumar) | 15:00 |
uvirtbot | Launchpad bug 1483385 in diskimage-builder "install_grub failing for centos7" [Undecided,In progress] | 15:00 |
*** untriaged-bot has quit IRC | 15:00 | |
uvirtbot | Launchpad bug 1483385 in diskimage-builder "install_grub failing for centos7" [Undecided,In progress] https://launchpad.net/bugs/1483385 | 15:00 |
uvirtbot | Launchpad bug 1471802 in diskimage-builder "ironic-agent element hardcodes interfaces names for DHCP." [Undecided,Fix committed] | 15:00 |
uvirtbot | Launchpad bug 1471802 in diskimage-builder "ironic-agent element hardcodes interfaces names for DHCP." [Undecided,Fix committed] https://launchpad.net/bugs/1471802 | 15:00 |
openstackgerrit | Dan Prince proposed openstack/os-net-config: os-net-config: ensure ifup is called just once https://review.openstack.org/213746 | 15:00 |
*** shardy has quit IRC | 15:01 | |
*** dsneddon has joined #tripleo | 15:03 | |
*** dsneddon has quit IRC | 15:03 | |
*** dsneddon has joined #tripleo | 15:03 | |
*** funzo has joined #tripleo | 15:04 | |
*** shardy_ has quit IRC | 15:04 | |
*** shardy has joined #tripleo | 15:05 | |
*** mcornea has quit IRC | 15:07 | |
*** dsneddon has quit IRC | 15:08 | |
*** dsneddon has joined #tripleo | 15:08 | |
openstackgerrit | Dan Prince proposed openstack/tripleo-image-elements: os-net-config: add configure_safe_defaults https://review.openstack.org/213748 | 15:09 |
*** dsneddon has quit IRC | 15:11 | |
openstackgerrit | greghaynes proposed openstack/diskimage-builder: create growroot element https://review.openstack.org/206636 | 15:11 |
*** dsneddon has joined #tripleo | 15:11 | |
*** yamahata has joined #tripleo | 15:11 | |
*** paramite is now known as paramite|afk | 15:11 | |
*** paramite|afk is now known as paramite | 15:12 | |
*** Marga_ has quit IRC | 15:17 | |
*** chlong has quit IRC | 15:18 | |
*** dprince has quit IRC | 15:20 | |
*** mbound has quit IRC | 15:22 | |
*** spzala has quit IRC | 15:30 | |
*** chlong has joined #tripleo | 15:31 | |
*** paramite is now known as paramite|afk | 15:32 | |
*** chlong has quit IRC | 15:38 | |
*** trown is now known as trown|lunch | 15:40 | |
*** chlong has joined #tripleo | 15:40 | |
*** aufi has quit IRC | 15:42 | |
*** Marga_ has joined #tripleo | 15:43 | |
*** lazy_prince has joined #tripleo | 15:45 | |
*** jprovazn has quit IRC | 15:52 | |
*** mestery has joined #tripleo | 15:53 | |
*** ifarkas has quit IRC | 15:56 | |
*** rwsu has joined #tripleo | 15:57 | |
*** yog_ has quit IRC | 16:00 | |
*** dprince has joined #tripleo | 16:03 | |
*** lucasagomes is now known as lucas-brb | 16:04 | |
*** lazy_prince has quit IRC | 16:06 | |
*** alop has joined #tripleo | 16:08 | |
*** pbourke has quit IRC | 16:10 | |
*** pbourke has joined #tripleo | 16:10 | |
*** adrianopetrich has quit IRC | 16:11 | |
*** regebro has quit IRC | 16:12 | |
openstackgerrit | Derek Higgins proposed openstack-infra/tripleo-ci: Switch nova temprevert to a cherrypick https://review.openstack.org/213714 | 16:12 |
*** david-ly_ is now known as david-lyle | 16:15 | |
*** Marga_ has quit IRC | 16:17 | |
*** Marga_ has joined #tripleo | 16:18 | |
*** lazy_prince has joined #tripleo | 16:18 | |
*** jistr has quit IRC | 16:19 | |
*** spzala has joined #tripleo | 16:20 | |
openstackgerrit | greghaynes proposed openstack/diskimage-builder: Fix init-scripts element path munging and deps https://review.openstack.org/213180 | 16:20 |
derekh | spredzy|afk: that run has finished now, I've logged into one of the controllers | 16:20 |
openstackgerrit | greghaynes proposed openstack/diskimage-builder: Install-static depends on rsync https://review.openstack.org/213309 | 16:21 |
openstackgerrit | greghaynes proposed openstack/diskimage-builder: create growroot element https://review.openstack.org/206636 | 16:21 |
derekh | spredzy|afk: if you need to redo it later when you're back, you can rerun the devtest command and, when it's done, log in like this: http://paste.openstack.org/show/419343/ | 16:21 |
*** sthillma has joined #tripleo | 16:26 | |
*** Marga_ has quit IRC | 16:30 | |
*** olaph has joined #tripleo | 16:32 | |
*** yamahata has quit IRC | 16:35 | |
*** sthillma has quit IRC | 16:35 | |
*** matbu has quit IRC | 16:37 | |
*** olaph has quit IRC | 16:42 | |
*** matbu has joined #tripleo | 16:42 | |
*** trown|lunch is now known as trown | 16:44 | |
*** devvesa has quit IRC | 16:46 | |
*** tzumainn has joined #tripleo | 16:49 | |
*** lucas-brb is now known as lucasagomes | 16:54 | |
*** derekh has quit IRC | 16:57 | |
*** lazy_prince has quit IRC | 16:57 | |
*** yamahata has joined #tripleo | 16:58 | |
*** dsneddon has quit IRC | 17:01 | |
*** karume has quit IRC | 17:05 | |
*** bvandenh has joined #tripleo | 17:14 | |
*** mestery has quit IRC | 17:15 | |
*** sdake_ has joined #tripleo | 17:20 | |
*** athomas has quit IRC | 17:21 | |
*** sdake has quit IRC | 17:24 | |
*** mestery has joined #tripleo | 17:26 | |
*** lsmola has quit IRC | 17:31 | |
*** sdake has joined #tripleo | 17:31 | |
*** mestery has quit IRC | 17:34 | |
*** sdake_ has quit IRC | 17:35 | |
*** morazi has joined #tripleo | 17:37 | |
*** sthillma has joined #tripleo | 17:48 | |
openstackgerrit | Dan Prince proposed openstack/tripleo-heat-templates: Docker compute role configured via Puppet https://review.openstack.org/209505 | 17:57 |
openstackgerrit | Dan Prince proposed openstack/tripleo-heat-templates: Docker compute role configured via Puppet https://review.openstack.org/209505 | 17:57 |
*** dsneddon has joined #tripleo | 17:58 | |
*** sthillma has quit IRC | 18:07 | |
*** sthillma has joined #tripleo | 18:07 | |
*** panda has quit IRC | 18:10 | |
*** panda has joined #tripleo | 18:10 | |
*** lucasagomes is now known as lucas-dinner | 18:10 | |
*** pelix has quit IRC | 18:11 | |
*** lucas-dinner has quit IRC | 18:20 | |
*** bvandenh has quit IRC | 18:24 | |
*** yamahata has quit IRC | 18:28 | |
*** shivrao has joined #tripleo | 18:29 | |
*** spzala has quit IRC | 18:39 | |
*** olaph has joined #tripleo | 18:41 | |
*** matbu has quit IRC | 18:43 | |
*** karume has joined #tripleo | 18:45 | |
*** olaph has quit IRC | 18:51 | |
*** olaph has joined #tripleo | 18:53 | |
*** Marga_ has joined #tripleo | 18:58 | |
*** spzala has joined #tripleo | 19:02 | |
*** matbu has joined #tripleo | 19:03 | |
*** olaph has quit IRC | 19:09 | |
*** karume has quit IRC | 19:16 | |
*** matbu has quit IRC | 19:32 | |
*** matbu has joined #tripleo | 19:34 | |
*** adrianopetrich has joined #tripleo | 19:34 | |
*** Goneri has quit IRC | 19:37 | |
*** penick has joined #tripleo | 19:58 | |
*** jayg is now known as jayg|g0n3 | 20:02 | |
*** noslzzp has quit IRC | 20:17 | |
*** Goneri has joined #tripleo | 20:28 | |
openstackgerrit | Rob Pothier proposed openstack/tripleo-heat-templates: Enable Cisco Nexus and UCSM plugins https://review.openstack.org/198754 | 20:29 |
*** paramite|afk is now known as paramite | 20:36 | |
*** paramite has quit IRC | 20:38 | |
*** akrivoka has quit IRC | 20:40 | |
*** spredzy|afk is now known as spredzy | 20:41 | |
*** shardy has quit IRC | 20:50 | |
*** trown is now known as trown|outttypeww | 20:53 | |
*** matbu has quit IRC | 20:54 | |
*** untriaged-bot has joined #tripleo | 21:00 | |
untriaged-bot | Untriaged bugs so far: | 21:00 |
untriaged-bot | https://bugs.launchpad.net/os-collect-config/+bug/1482510 | 21:00 |
openstack | Launchpad bug 1482510 in heat "OS::Heat::SoftwareDeployment failed due SSL certificate verification error" [Medium,Triaged] - Assigned to Rico Lin (rico-lin) | 21:00 |
uvirtbot | Launchpad bug 1482510 in os-collect-config "OS::Heat::SoftwareDeployment failed due SSL certificate verification error" [Undecided,New] | 21:00 |
untriaged-bot | https://bugs.launchpad.net/diskimage-builder/+bug/1466037 | 21:00 |
openstack | Launchpad bug 1466037 in diskimage-builder "Signed Fedora and Ubuntu user image built by DIB can`t boot on HP DL380 Gen8 server for lack of mpt2sas driver" [Undecided,Incomplete] | 21:00 |
uvirtbot | Launchpad bug 1482510 in os-collect-config "OS::Heat::SoftwareDeployment failed due SSL certificate verification error" [Undecided,New] https://launchpad.net/bugs/1482510 | 21:00 |
uvirtbot | Launchpad bug 1466037 in diskimage-builder "Signed Fedora and Ubuntu user image built by DIB can`t boot on HP DL380 Gen8 server for lack of mpt2sas driver" [Undecided,Incomplete] | 21:00 |
untriaged-bot | https://bugs.launchpad.net/diskimage-builder/+bug/1483385 | 21:00 |
openstack | Launchpad bug 1483385 in diskimage-builder "install_grub failing for centos7" [Undecided,In progress] - Assigned to Abel Lopez (al592b) | 21:00 |
uvirtbot | Launchpad bug 1466037 in diskimage-builder "Signed Fedora and Ubuntu user image built by DIB can`t boot on HP DL380 Gen8 server for lack of mpt2sas driver" [Undecided,Incomplete] https://launchpad.net/bugs/1466037 | 21:00 |
untriaged-bot | https://bugs.launchpad.net/diskimage-builder/+bug/1471802 | 21:00 |
openstack | Launchpad bug 1471802 in diskimage-builder "ironic-agent element hardcodes interfaces names for DHCP." [Undecided,Fix committed] - Assigned to Om Kumar (om-kumar) | 21:00 |
uvirtbot | Launchpad bug 1483385 in diskimage-builder "install_grub failing for centos7" [Undecided,In progress] | 21:00 |
*** untriaged-bot has quit IRC | 21:00 | |
uvirtbot | Launchpad bug 1483385 in diskimage-builder "install_grub failing for centos7" [Undecided,In progress] https://launchpad.net/bugs/1483385 | 21:00 |
uvirtbot | Launchpad bug 1471802 in diskimage-builder "ironic-agent element hardcodes interfaces names for DHCP." [Undecided,Fix committed] | 21:00 |
uvirtbot | Launchpad bug 1471802 in diskimage-builder "ironic-agent element hardcodes interfaces names for DHCP." [Undecided,Fix committed] https://launchpad.net/bugs/1471802 | 21:00 |
*** yamahata has joined #tripleo | 21:01 | |
*** lblanchard has quit IRC | 21:03 | |
*** sthillma_ has joined #tripleo | 21:04 | |
*** sthillma has quit IRC | 21:06 | |
*** sthillma_ is now known as sthillma | 21:06 | |
*** jtomasek has quit IRC | 21:09 | |
openstackgerrit | Dan Sneddon proposed openstack/tripleo-heat-templates: Remove hardcoded bridge name in bonded compute NIC config https://review.openstack.org/213861 | 21:11 |
*** mestery has joined #tripleo | 21:15 | |
*** julim has quit IRC | 21:17 | |
*** gfidente has quit IRC | 21:41 | |
*** Marga_ has quit IRC | 21:46 | |
*** uvirtbot has quit IRC | 21:50 | |
*** mestery has quit IRC | 22:01 | |
*** mestery has joined #tripleo | 22:04 | |
*** mestery has quit IRC | 22:09 | |
*** panda has quit IRC | 22:09 | |
*** sdake_ has joined #tripleo | 22:10 | |
openstackgerrit | Yanis Guenane proposed openstack-infra/tripleo-ci: Adding more CPU in the HA scenario https://review.openstack.org/213885 | 22:10 |
*** panda has joined #tripleo | 22:10 | |
*** sdake has quit IRC | 22:13 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack/tuskar: Updated from global requirements https://review.openstack.org/186939 | 22:13 |
*** pradk has quit IRC | 22:14 | |
*** dprince has quit IRC | 22:21 | |
openstackgerrit | Tomoki Sekiyama proposed openstack/os-net-config: Support multiple addresses assignment with ifcfg https://review.openstack.org/213902 | 22:26 |
openstackgerrit | Tomoki Sekiyama proposed openstack/os-net-config: Support multiple addresses assignment with eni https://review.openstack.org/213903 | 22:26 |
*** Goneri has quit IRC | 22:33 | |
*** chlong has quit IRC | 22:34 | |
*** shivrao has quit IRC | 22:41 | |
*** shivrao has joined #tripleo | 22:53 | |
openstackgerrit | Derek Higgins proposed openstack-infra/tripleo-ci: Nothing to see here https://review.openstack.org/111011 | 22:56 |
*** sdake_ is now known as sdake | 23:01 | |
*** rhallisey has quit IRC | 23:11 | |
*** mestery has joined #tripleo | 23:13 | |
*** sdake_ has joined #tripleo | 23:23 | |
*** sdake has quit IRC | 23:26 | |
*** spzala has quit IRC | 23:38 | |
*** mestery has quit IRC | 23:55 |