*** bank_ has quit IRC | 00:02 | |
*** apetrich has quit IRC | 00:04 | |
*** apetrich has joined #tripleo | 00:04 | |
*** ooolpbot has joined #tripleo | 00:10 | |
ooolpbot | URGENT TRIPLEO TASKS NEED ATTENTION | 00:10 |
---|---|---|
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1637961 | 00:10 |
openstack | Launchpad bug 1637961 in tripleo "periodic HA master job pingtest times out" [Critical,In progress] - Assigned to Gabriele Cerami (gcerami) | 00:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1638350 | 00:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1638864 | 00:10 |
*** ooolpbot has quit IRC | 00:10 | |
openstack | Launchpad bug 1638350 in tripleo "pingtest failing on OVB jobs to create Cinder volume and Nova server" [Critical,In progress] - Assigned to Emilien Macchi (emilienm) | 00:10 |
openstack | Launchpad bug 1638864 in tripleo "mariadb-10.1.18 breaks the resource agent" [Critical,In progress] - Assigned to Michele Baldessari (michele) | 00:10 |
*** fultonj has quit IRC | 00:12 | |
*** fultonj has joined #tripleo | 00:14 | |
*** panda is now known as panda|zZ | 00:18 | |
*** links has joined #tripleo | 00:27 | |
*** limao has joined #tripleo | 00:36 | |
*** b00tcat has quit IRC | 00:44 | |
*** dougbtv has joined #tripleo | 00:46 | |
*** percevalbot has quit IRC | 00:58 | |
*** cdearborn has quit IRC | 01:00 | |
*** maticue has quit IRC | 01:06 | |
*** jkilpatr has quit IRC | 01:07 | |
*** ooolpbot has joined #tripleo | 01:10 | |
ooolpbot | URGENT TRIPLEO TASKS NEED ATTENTION | 01:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1637961 | 01:10 |
openstack | Launchpad bug 1637961 in tripleo "periodic HA master job pingtest times out" [Critical,In progress] - Assigned to Gabriele Cerami (gcerami) | 01:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1638350 | 01:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1638864 | 01:10 |
*** ooolpbot has quit IRC | 01:10 | |
openstack | Launchpad bug 1638350 in tripleo "pingtest failing on OVB jobs to create Cinder volume and Nova server" [Critical,In progress] - Assigned to Emilien Macchi (emilienm) | 01:10 |
openstack | Launchpad bug 1638864 in tripleo "mariadb-10.1.18 breaks the resource agent" [Critical,In progress] - Assigned to Michele Baldessari (michele) | 01:10 |
*** b00tcat has joined #tripleo | 01:11 | |
*** saneax is now known as saneax-_-|AFK | 01:15 | |
*** lblanchard has quit IRC | 01:19 | |
*** jerrygb has joined #tripleo | 01:26 | |
*** jerrygb_ has quit IRC | 01:27 | |
*** jerrygb_ has joined #tripleo | 01:32 | |
*** jerrygb has quit IRC | 01:35 | |
*** jerrygb_ has quit IRC | 01:39 | |
*** jerrygb has joined #tripleo | 01:41 | |
*** dougbtv has quit IRC | 01:44 | |
*** jerrygb has quit IRC | 01:49 | |
*** jerrygb has joined #tripleo | 01:50 | |
*** chlong has joined #tripleo | 02:06 | |
*** ooolpbot has joined #tripleo | 02:10 | |
ooolpbot | URGENT TRIPLEO TASKS NEED ATTENTION | 02:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1637961 | 02:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1638350 | 02:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1638864 | 02:10 |
*** ooolpbot has quit IRC | 02:10 | |
openstack | Launchpad bug 1637961 in tripleo "periodic HA master job pingtest times out" [Critical,In progress] - Assigned to Gabriele Cerami (gcerami) | 02:10 |
openstack | Launchpad bug 1638350 in tripleo "pingtest failing on OVB jobs to create Cinder volume and Nova server" [Critical,In progress] - Assigned to Emilien Macchi (emilienm) | 02:10 |
openstack | Launchpad bug 1638864 in tripleo "mariadb-10.1.18 breaks the resource agent" [Critical,In progress] - Assigned to Michele Baldessari (michele) | 02:10 |
*** jerrygb has quit IRC | 02:12 | |
*** jerrygb has joined #tripleo | 02:13 | |
*** fzdarsky__ has joined #tripleo | 02:14 | |
*** rlandy has quit IRC | 02:14 | |
*** fzdarsky_ has quit IRC | 02:17 | |
*** rhallisey has quit IRC | 02:17 | |
openstackgerrit | Steve Baker proposed openstack/tripleo-common: Clean up configure_containers.sh script https://review.openstack.org/384865 | 02:21 |
openstackgerrit | Steve Baker proposed openstack/tripleo-common: Allow building heat-agents image from master https://review.openstack.org/384866 | 02:21 |
openstackgerrit | Steve Baker proposed openstack/tripleo-common: Create new docker command hook https://review.openstack.org/312723 | 02:21 |
openstackgerrit | Steve Baker proposed openstack/tripleo-common: Install configuration files for all downloaded packages https://review.openstack.org/347412 | 02:21 |
openstackgerrit | Steve Baker proposed openstack/tripleo-common: Create new docker command hook https://review.openstack.org/312723 | 02:24 |
*** jerrygb has quit IRC | 02:29 | |
*** jerrygb has joined #tripleo | 02:30 | |
*** jerrygb_ has joined #tripleo | 02:42 | |
*** jerrygb has quit IRC | 02:44 | |
*** jerrygb has joined #tripleo | 02:47 | |
*** jerrygb_ has quit IRC | 02:50 | |
*** dmacpher has joined #tripleo | 02:54 | |
*** ooolpbot has joined #tripleo | 03:10 | |
ooolpbot | URGENT TRIPLEO TASKS NEED ATTENTION | 03:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1637961 | 03:10 |
openstack | Launchpad bug 1637961 in tripleo "periodic HA master job pingtest times out" [Critical,In progress] - Assigned to Gabriele Cerami (gcerami) | 03:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1638350 | 03:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1638864 | 03:10 |
*** ooolpbot has quit IRC | 03:10 | |
openstack | Launchpad bug 1638350 in tripleo "pingtest failing on OVB jobs to create Cinder volume and Nova server" [Critical,In progress] - Assigned to Emilien Macchi (emilienm) | 03:10 |
openstack | Launchpad bug 1638864 in tripleo "mariadb-10.1.18 breaks the resource agent" [Critical,In progress] - Assigned to Michele Baldessari (michele) | 03:10 |
*** yamahata has quit IRC | 03:10 | |
openstackgerrit | Steve Baker proposed openstack/tripleo-common: Create new docker command hook https://review.openstack.org/312723 | 03:14 |
openstackgerrit | RedHat RDO CI proposed openstack/tripleo-heat-templates: GATE TEST, please ignore https://review.openstack.org/365449 | 03:30 |
*** apetrich has quit IRC | 03:35 | |
*** apetrich has joined #tripleo | 03:36 | |
*** sudipto has joined #tripleo | 03:46 | |
*** sudipto_ has joined #tripleo | 03:47 | |
*** ebalduf has quit IRC | 03:50 | |
*** numans has joined #tripleo | 03:51 | |
*** ooolpbot has joined #tripleo | 04:10 | |
ooolpbot | URGENT TRIPLEO TASKS NEED ATTENTION | 04:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1637961 | 04:10 |
openstack | Launchpad bug 1637961 in tripleo "periodic HA master job pingtest times out" [Critical,In progress] - Assigned to Gabriele Cerami (gcerami) | 04:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1638350 | 04:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1638864 | 04:10 |
*** ooolpbot has quit IRC | 04:10 | |
openstack | Launchpad bug 1638350 in tripleo "pingtest failing on OVB jobs to create Cinder volume and Nova server" [Critical,In progress] - Assigned to Emilien Macchi (emilienm) | 04:10 |
openstack | Launchpad bug 1638864 in tripleo "mariadb-10.1.18 breaks the resource agent" [Critical,In progress] - Assigned to Michele Baldessari (michele) | 04:10 |
*** jerrygb has quit IRC | 04:24 | |
*** abregman has quit IRC | 04:25 | |
*** tzumainn has quit IRC | 04:46 | |
*** sudipto_ has quit IRC | 05:10 | |
*** sudipto has quit IRC | 05:10 | |
*** ooolpbot has joined #tripleo | 05:10 | |
ooolpbot | URGENT TRIPLEO TASKS NEED ATTENTION | 05:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1637961 | 05:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1638350 | 05:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1638864 | 05:10 |
*** ooolpbot has quit IRC | 05:10 | |
openstack | Launchpad bug 1637961 in tripleo "periodic HA master job pingtest times out" [Critical,In progress] - Assigned to Gabriele Cerami (gcerami) | 05:10 |
openstack | Launchpad bug 1638350 in tripleo "pingtest failing on OVB jobs to create Cinder volume and Nova server" [Critical,In progress] - Assigned to Emilien Macchi (emilienm) | 05:10 |
openstack | Launchpad bug 1638864 in tripleo "mariadb-10.1.18 breaks the resource agent" [Critical,In progress] - Assigned to Michele Baldessari (michele) | 05:10 |
*** abehl has joined #tripleo | 05:13 | |
*** masco has joined #tripleo | 05:24 | |
*** ramishra has quit IRC | 05:29 | |
*** ramishra has joined #tripleo | 05:31 | |
*** saneax-_-|AFK is now known as saneax | 05:52 | |
*** rcernin has joined #tripleo | 05:55 | |
*** sudipto_ has joined #tripleo | 05:56 | |
*** sudipto has joined #tripleo | 05:56 | |
*** ooolpbot has joined #tripleo | 06:10 | |
ooolpbot | URGENT TRIPLEO TASKS NEED ATTENTION | 06:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1637961 | 06:10 |
openstack | Launchpad bug 1637961 in tripleo "periodic HA master job pingtest times out" [Critical,In progress] - Assigned to Gabriele Cerami (gcerami) | 06:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1638350 | 06:10 |
*** ooolpbot has quit IRC | 06:10 | |
openstack | Launchpad bug 1638350 in tripleo "pingtest failing on OVB jobs to create Cinder volume and Nova server" [Critical,In progress] - Assigned to Emilien Macchi (emilienm) | 06:10 |
*** florianf has joined #tripleo | 06:38 | |
*** florianf has quit IRC | 06:42 | |
*** florianf has joined #tripleo | 06:49 | |
*** gfidente has quit IRC | 06:52 | |
*** nyechiel has joined #tripleo | 06:55 | |
*** bana_k has joined #tripleo | 06:57 | |
*** mrunge has quit IRC | 07:03 | |
*** lmiccini has joined #tripleo | 07:04 | |
*** tesseract has joined #tripleo | 07:04 | |
*** tesseract is now known as Guest13194 | 07:04 | |
*** rasca has joined #tripleo | 07:05 | |
*** mrunge has joined #tripleo | 07:07 | |
*** ooolpbot has joined #tripleo | 07:10 | |
ooolpbot | URGENT TRIPLEO TASKS NEED ATTENTION | 07:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1637961 | 07:10 |
openstack | Launchpad bug 1637961 in tripleo "periodic HA master job pingtest times out" [Critical,In progress] - Assigned to Gabriele Cerami (gcerami) | 07:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1638350 | 07:10 |
*** ooolpbot has quit IRC | 07:10 | |
openstack | Launchpad bug 1638350 in tripleo "pingtest failing on OVB jobs to create Cinder volume and Nova server" [Critical,In progress] - Assigned to Emilien Macchi (emilienm) | 07:10 |
*** bana_k has quit IRC | 07:12 | |
*** asalkeld has joined #tripleo | 07:14 | |
*** liverpooler has joined #tripleo | 07:20 | |
*** jprovazn has joined #tripleo | 07:22 | |
openstackgerrit | Merged openstack/tripleo-validations: Pass the the custom cacert to nova and heat client https://review.openstack.org/390833 | 07:30 |
*** pcaruana has joined #tripleo | 07:33 | |
*** chlong has quit IRC | 07:33 | |
*** b00tcat` has joined #tripleo | 07:34 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack/tripleo-validations: Updated from global requirements https://review.openstack.org/391170 | 07:38 |
*** cylopez has joined #tripleo | 07:38 | |
*** mhenkel has joined #tripleo | 07:43 | |
*** mcornea has joined #tripleo | 07:46 | |
*** zoli|gone is now known as zoli|trng-afk | 07:48 | |
*** dmacpher has quit IRC | 07:52 | |
*** chandankumar has joined #tripleo | 07:52 | |
*** ccamacho has quit IRC | 07:53 | |
*** florianf has quit IRC | 07:58 | |
*** ohamada has joined #tripleo | 07:59 | |
*** shardy has joined #tripleo | 08:00 | |
*** florianf has joined #tripleo | 08:04 | |
*** ccamacho has joined #tripleo | 08:05 | |
*** jaosorior has joined #tripleo | 08:07 | |
*** d0ugal has joined #tripleo | 08:08 | |
*** fragatina has joined #tripleo | 08:09 | |
*** fragatina has quit IRC | 08:10 | |
*** ooolpbot has joined #tripleo | 08:10 | |
ooolpbot | URGENT TRIPLEO TASKS NEED ATTENTION | 08:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1637961 | 08:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1638350 | 08:10 |
*** ooolpbot has quit IRC | 08:10 | |
openstack | Launchpad bug 1637961 in tripleo "periodic HA master job pingtest times out" [Critical,In progress] - Assigned to Gabriele Cerami (gcerami) | 08:10 |
openstack | Launchpad bug 1638350 in tripleo "pingtest failing on OVB jobs to create Cinder volume and Nova server" [Critical,In progress] - Assigned to Emilien Macchi (emilienm) | 08:10 |
*** fragatina has joined #tripleo | 08:10 | |
openstackgerrit | Steven Hardy proposed openstack/tripleo-heat-templates: WIP prototyping composable upgrades with Heat+Ansible https://review.openstack.org/393448 | 08:11 |
shardy | marios: Hey g'morning - FYI I started trying some things related to the discussion on the spec, see ^^ | 08:12 |
shardy | not yet functional, but if you're happy with the approach I'll spend some more time on it today | 08:12 |
shardy | it's pretty similar to the previous ansible prototyping, but it's wired in with the new composable services interfaces | 08:13 |
marios | shardy: awesome. i tried keeping it updated after the initial discussion this morning (thanks for including the context/chat with emilienm there nice to have it all in one place) | 08:15 |
marios | shardy: the spec i mean | 08:15 |
marios | heh s/this morning/ this week even :/ | 08:15 |
* marios gets the coffee | 08:15 | |
shardy | marios: yup thanks for updating the spec - perhaps we can push a little more on the first pass of prototyping then do a final spec update when we're comfortable with the approach/interfaces | 08:15 |
marios | shardy: yeah sounds good... i think by next week end we should have a fairly good idea on the overall approach so we can land it in time for O1 easy | 08:16 |
shardy | I feel fairly happy with how it fits now, and we could still reuse the same pieces in future even if ansible was run by something other than heat | 08:16 |
*** ebarrera has joined #tripleo | 08:16 | |
marios | shardy: thanks very much for picking that up shardy even though I'd love to have the time to play a bit too :( | 08:16 |
shardy | marios: np - honestly I think the hard part will be writing all the per-service snippets and not so much the initial architecture | 08:17 |
*** yamahata has joined #tripleo | 08:17 | |
marios | shardy: check email when you have a chance pls | 08:17 |
shardy | I'm thinking we may be able to reuse some of the stuff from jistr's earlier ansible prototype there tho | 08:17 |
marios | shardy: yeah i softened the 'no ansible' on the alternatives... it had occurred to me that since we are using ansible now we may be able to pick something from there... though that was designed to be run stand-alone. I guess the output of the upgrade snippet munging on the tht side will be something like the stuff that jistr wrote (which were the e-e upgrades playbooks) | 08:19 |
*** athomas has joined #tripleo | 08:19 | |
shardy | marios: yeah exactly - I expect the output to be a playbook with a bunch of tasks tagged with each step | 08:20 |
shardy | we can even write it to e.g /root/tripleo_upgrade_steps.yml on the nodes (we could do the same with the per-role puppet manifests) | 08:20 |
shardy | or at least make it easy to grab it if you want to run it via ansible directly | 08:21 |
marios | shardy: yeah that would be cool... we could then invoke that with w/e - even just a 'upgrade-my-node.sh' which invoked the playbook. i mean we already have upgrade-non-controller.sh | 08:21 |
shardy | marios: yeah, I was thinking we'd retain the option to just deploy an upgrade script on the node, and optionally run the upgrade | 08:22 |
shardy | that way folks can do their special snowflake things on upgrade if they really want to | 08:22 |
marios | shardy: i mean that assumes a /root/tripleo_upgrade_node.sh | 08:22 |
shardy | marios: yeah, but that could just be a wrapper for ansible or $whatever | 08:23 |
openstackgerrit | Juan Antonio Osorio Robles proposed openstack-infra/tripleo-ci: WIP TLS everywhere job https://review.openstack.org/391738 | 08:23 |
shardy | brb | 08:23 |
marios | shardy: right... i'm saying the 'deliver but not invoke yet' is what we do for the non controller upgrade scripts | 08:23 |
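The output shardy describes at 08:20 (one playbook whose tasks are tagged with their upgrade step) can be sketched roughly as follows. This is only an illustration under stated assumptions: the per-service snippet format, the service names, the task contents, and the function name `build_upgrade_playbook` are all hypothetical and not taken from the review under discussion. JSON is emitted purely to avoid a third-party YAML dependency in the sketch.

```python
import json

# Hypothetical per-service upgrade snippets. Each task records the upgrade
# step it belongs to; names and task contents are made up for illustration.
SERVICE_SNIPPETS = {
    "glance_api": [
        {"step": 1, "name": "Stop glance-api",
         "service": {"name": "openstack-glance-api", "state": "stopped"}},
        {"step": 2, "name": "Update glance packages",
         "yum": {"name": "openstack-glance", "state": "latest"}},
    ],
    "swift_proxy": [
        {"step": 1, "name": "Stop swift-proxy",
         "service": {"name": "openstack-swift-proxy", "state": "stopped"}},
        {"step": 10, "name": "Start swift-proxy",
         "service": {"name": "openstack-swift-proxy", "state": "started"}},
    ],
}


def build_upgrade_playbook(snippets):
    """Merge per-service tasks into one localhost play, tagged per step."""
    collected = []
    for service_name in sorted(snippets):
        for task in snippets[service_name]:
            collected.append((task["step"], service_name, task))
    # Order by numeric step first, then by service name for stable output.
    collected.sort(key=lambda item: (item[0], item[1]))

    tasks = []
    for step, _service_name, task in collected:
        tagged = {k: v for k, v in task.items() if k != "step"}
        tagged["tags"] = ["step%d" % step]
        tasks.append(tagged)
    return [{"hosts": "localhost", "connection": "local", "tasks": tasks}]


if __name__ == "__main__":
    print(json.dumps(build_upgrade_playbook(SERVICE_SNIPPETS), indent=2))
```

If such a file were written to the node (for example to the /root/tripleo_upgrade_steps.yml path mentioned above), a single step could presumably be selected with ansible-playbook's `--tags` option, which is what would make the "deliver but not invoke yet" model workable.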
matbu | shardy: hey, i was looking briefly at your review | 08:24 |
matbu | shardy: the ansible playbook would be run on localhost on the nodes ? | 08:25 |
*** d0ugal has quit IRC | 08:28 | |
matbu | marios: by the way, i kick the pingtest between controller upgrade and compute, and it failed, glance is not reachable, i need to investigate | 08:28 |
*** amoralej|off is now known as amoralej | 08:29 | |
openstackgerrit | Carlos Camacho proposed openstack/tripleo-heat-templates: Reload haproxy configuration as a post-deployment step https://review.openstack.org/393644 | 08:30 |
marios | matbu: check if swift is started on overcloud they might be down | 08:30 |
marios | matbu: matbu https://review.openstack.org/#/c/392680/ | 08:31 |
matbu | marios: yup i'll check, but i applied the review (afair, it was late :)) | 08:31 |
marios | matbu: was the pingtest fail like this https://bugzilla.redhat.com/attachment.cgi?id=1216574 | 08:32 |
marios | 500 Internal Server Error tripleo.sh -- Overcloud pingtest, uploading demo tenant image to glance | 08:32 |
matbu | marios: 500 Internal Server Error: Failed to upload image | 08:33 |
matbu | marios: i didn't check the log | 08:33 |
marios | matbu: and on controllers there is this trace https://bugzilla.redhat.com/attachment.cgi?id=1216577 ... | 08:33 |
marios | matbu: yeah sounds similar anyway | 08:33 |
marios | matbu: but even after i fixed swift, still have an issue with heat stack domain... there is info in https://bugzilla.redhat.com/show_bug.cgi?id=1386719#c8 | 08:33 |
openstack | bugzilla.redhat.com bug 1386719 in rhel-osp-director "OSP9 to OSP10 upgrade pingtest fails." [High,New] - Assigned to mandreou | 08:33 |
matbu | marios: actually it depends on the priority, should i spend some time on that ? or just try to go further (i mean compute nodes upgrade and converge) | 08:34 |
marios | matbu: so i'd say try and continue for now. i think it would be good if you could first verify that you are hitting the same 'swift was down' main issue there and that review fixes it | 08:34 |
*** jpena|off is now known as jpena | 08:35 | |
marios | matbu: if you also then hit the subsequent heat stack domain auth issue then perhaps we should file BZ for it as another issue (i.e. use https://bugzilla.redhat.com/show_bug.cgi?id=1386719 for the 'fix swift' which is a problem anyway) | 08:35 |
openstack | bugzilla.redhat.com bug 1386719 in rhel-osp-director "OSP9 to OSP10 upgrade pingtest fails." [High,New] - Assigned to mandreou | 08:35 |
matbu | marios: ack | 08:35 |
marios | matbu: s/we/i will file a bz | 08:36 |
marios | matbu: thanks | 08:36 |
matbu | marios: yep pretty much the same stack trace | 08:38 |
matbu | but i applied the fix | 08:38 |
*** ebarrera has quit IRC | 08:40 | |
marios | matbu: hmm there may be a nit there then I set -1 on the review https://review.openstack.org/#/c/392680/2 | 08:41 |
marios | matbu: can you check the logs. was there an attempt to start the swift services? | 08:42 |
matbu | marios: yep | 08:42 |
matbu | marios: but actually swift is running | 08:43 |
marios | matbu: k, unsetting -1 then if swift is running :) | 08:44 |
matbu | marios: lol seconds | 08:45 |
marios | matbu: /me jumpy | 08:45 |
matbu | hehe | 08:45 |
matbu | marios: so yes swift is running, the review works fine, but there is another issue | 08:46 |
marios | matbu: ok | 08:47 |
*** hewbrocca_afk is now known as hewbrocca | 08:48 | |
marios | matbu: so don't spend *too* long imo... capture some debug info in case but then try to get onto the converge etc | 08:48 |
social | moin | 08:49 |
matbu | marios: yep, it looks like it's an authentication issue | 08:49 |
marios | matbu: right so it may be the overcloud heat domain issue i saw then | 08:49 |
marios | https://bugzilla.redhat.com/show_bug.cgi?id=1386719#c7 | 08:50 |
openstack | bugzilla.redhat.com bug 1386719 in rhel-osp-director "OSP9 to OSP10 upgrade pingtest fails." [High,New] - Assigned to mandreou | 08:50 |
marios | matbu: like this https://bugzilla.redhat.com/attachment.cgi?id=1216586 | 08:50 |
marios | ERROR: Authorization failed. | 08:50 |
shardy | marios: are you sure that's not another manifestation of https://bugzilla.redhat.com/show_bug.cgi?id=1388474 ? | 08:51 |
openstack | bugzilla.redhat.com bug 1388474 in openstack-tripleo "Overcloud heat fails to create an IAM user" [Unspecified,On_dev] - Assigned to shardy | 08:51 |
shardy | fixed by https://review.openstack.org/#/c/392288 | 08:51 |
shardy | puppet misconfigures the heat.conf after the upgrade unless you have that fix | 08:52 |
* matbu lost in the bz | 08:52 | |
marios | shardy: thanks i was not aware of that bug reading | 08:52 |
shardy | you can tell by looking inside heat.conf - without the fix the heat domain settings are all unset, and openstack domain list with the overcloudrc.v3 won't have the special heat_stack domain | 08:53 |
*** paramite has joined #tripleo | 08:53 | |
marios | shardy: i went looking for earlier 8/9 heat domain related issues we had in the past yesterday but this seems to be a new thing | 08:53 |
shardy | matbu: Hey sorry missed your question, yes for now at least the playbook will be run just on the localhost on a node by node basis | 08:55 |
shardy | that is the only ansible model heat supports | 08:55 |
shardy | in future we might enable a different mode where the same playbooks are driven from the undercloud without heat, but this is IMO an easier first step | 08:55 |
marios | shardy: cool would be fantastic if this fixes it ... i just checked heat.conf and i do see things like stack_domain_admin = heat_stack_domain_admin | 08:55 |
marios | stack_domain_admin_password = BQqY etc | 08:56 |
shardy | marios: ah, possibly a different issue then, IME those are unset when the bug I referenced is present | 08:56 |
shardy | puppet actually unconfigures them on update | 08:56 |
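The heat.conf check described at 08:53 (the heat domain settings are all unset when the bug is present) could be automated along these lines. This is a minimal sketch with assumptions: it checks only the two options quoted in the channel, it assumes they live in the `[DEFAULT]` section, and the function name `missing_stack_domain_settings` is hypothetical.

```python
import configparser

# Only the two options quoted in the channel; whether other domain-related
# options should also be checked is left open here.
REQUIRED_OPTIONS = ("stack_domain_admin", "stack_domain_admin_password")


def missing_stack_domain_settings(conf_text, section="DEFAULT"):
    """Return the required options that are absent or empty in conf_text."""
    # strict=False tolerates repeated options; interpolation=None keeps any
    # '%' characters in passwords from being treated as interpolation syntax.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    missing = []
    for option in REQUIRED_OPTIONS:
        value = parser.get(section, option, fallback="")
        if not value.strip():
            missing.append(option)
    return missing


if __name__ == "__main__":
    # The path is an assumption about where heat.conf lives on the node.
    with open("/etc/heat/heat.conf") as conf_file:
        unset = missing_stack_domain_settings(conf_file.read())
    print("unset heat domain settings:", unset or "none")
```

An empty result matches what marios reports seeing (both options set), which is consistent with shardy's conclusion that this may be a different issue from the referenced bug.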
matbu | shardy: k, yes i was wondering of something more "ansible oriented". | 08:57 |
matbu | shardy: actually heat is limiting us | 08:57 |
marios | shardy: ok... it still sounds relevant though i mean it is about keystone domain auth failure which isn't happening before we upgrade the controllers | 08:57 |
shardy | matbu: Yeah - like, this is the first step, e.g generating the playbook with all the per-upgrade-step tags, for all services | 08:57 |
shardy | matbu: heat provides an easy way to run those, but we could also dump the playbook out and let the user run it some other way | 08:58 |
shardy | and/or automate doing that via a mistral workflow like we do for validations | 08:58 |
*** d0ugal_ has joined #tripleo | 08:58 | |
shardy | but that's a more difficult fit for our existing heat orientated architecture, so I'd prefer to tackle that as a later feature | 08:58 |
*** gfidente has joined #tripleo | 08:58 | |
*** gfidente has quit IRC | 08:58 | |
*** gfidente has joined #tripleo | 08:58 | |
matbu | shardy: yep, k | 08:58 |
shadower | jaosorior: could you have a look at these one liners, pls? https://review.openstack.org/#/c/391280/ https://review.openstack.org/#/c/391272/ and https://review.openstack.org/#/c/391267/ | 08:58 |
shardy | matbu: well heat isn't limiting us exactly, it's just that there is overlap between heat and ansible in this case | 08:59 |
matbu | shardy: and what about handling ansible via the tripleoclient ? | 08:59 |
shardy | e.g heat or ansible could orchestrate a rolling upgrade, but not both at the same time | 08:59 |
matbu | shardy: yes agree with the overlap | 08:59 |
*** jpich has joined #tripleo | 08:59 | |
*** jlinkes has joined #tripleo | 08:59 | |
shardy | matbu: not sure I follow, what is the ansible/tripleoclient requirement? | 09:00 |
openstackgerrit | Merged openstack/tripleo-quickstart: Drop *openstack/common* in flake8 exclude list https://review.openstack.org/373874 | 09:00 |
jaosorior | shardy: sure | 09:01 |
matbu | shardy: well, it's just a thought, but i was wondering about just using heat for managing deploying nodes and ansible for applying config (and so upgrade things) ... but it's really far away from what we have today | 09:01 |
shardy | matbu: yes, that's one possible end-goal here, but not something I'm considering in this iteration | 09:01 |
matbu | shardy: so the client could orchestrate ansible (through the python ansible api) | 09:01 |
shardy | matbu: I think when we have this model in place, it'd be pretty simple to go that extra step and disable heat running both puppet and ansible | 09:02 |
shardy | and dump out a playbook which contains the deploy and upgrade steps | 09:02 |
matbu | shardy: i was also wondering of something that could make a graph of the dependencies of the services/roles for ordering the upgrade (in the client) | 09:02 |
shardy | then folks can do whatever they want (and stop complaining about heat :\) | 09:02 |
matbu | shardy: hehe yep | 09:03 |
shardy | matbu: So, yes, but we have to avoid putting business logic in tripleoclient itself, particularly if we want that functionality to be available via the UI | 09:03 |
shardy | so it might be in a mistral workflow instead | 09:03 |
matbu | shardy: ha yes right , i always forgot the UI :/ | 09:04 |
shardy | I can imagine e.g openstack overcloud deploy --templates -e $tht/environments/no-config.yaml | 09:04 |
openstackgerrit | Merged openstack/tripleo-validations: Fix the pacemaker-status validation https://review.openstack.org/391280 | 09:05 |
shardy | which would deploy the nodes and noop all puppet/ansible configuration | 09:05 |
shardy | then we'd provide the per-role puppet and ansible stuff as outputs from the stack | 09:05 |
openstackgerrit | Merged openstack/tripleo-validations: Fix the rabbitmq-limits validations https://review.openstack.org/391272 | 09:05 |
openstackgerrit | Merged openstack/tripleo-validations: Fix the check-network-gateway validation https://review.openstack.org/391267 | 09:05 |
shardy | or write them into swift, or whatever | 09:05 |
matbu | yep | 09:06 |
shardy | matbu: something that graphs the dependencies for both deploy and upgrade would be really useful | 09:06 |
shardy | I suspect it might be fairly tricky to write tho | 09:06 |
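The ordering half of matbu's idea (graph the service dependencies, then derive an upgrade order) can be sketched with a topological sort. The dependency map below is entirely hypothetical and chosen for illustration; working out the real dependencies is the part described above as tricky. The sketch uses `graphlib` from the Python standard library (3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical map: service -> services that must be upgraded before it.
DEPENDS_ON = {
    "mysql": set(),
    "rabbitmq": set(),
    "keystone": {"mysql"},
    "glance": {"keystone", "mysql"},
    "nova": {"keystone", "glance", "rabbitmq"},
    "heat": {"keystone", "rabbitmq"},
}


def upgrade_order(depends_on):
    """Return services ordered so each comes after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    try:
        print(" -> ".join(upgrade_order(DEPENDS_ON)))
    except CycleError as exc:
        # A cycle means no valid order exists for the given map.
        print("dependency cycle:", exc.args[1])
```

Several valid orders usually exist, so the result should be treated as one acceptable order rather than the only one. As noted in the discussion, logic like this would more likely belong in a mistral workflow than in tripleoclient itself if the UI is meant to use it.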
openstackgerrit | Julie Pichon proposed openstack/python-tripleoclient: Pass clients to get the get_password function https://review.openstack.org/393192 | 09:06 |
matbu | shardy: so the UI could handle the deploy/upgrade with this solution ? (calling mistral heat api only) | 09:07 |
shardy | matbu: yes, that's one reason I've left driving ansible inside heat | 09:07 |
matbu | shardy: yes i was wondering about that, i wanted to test something, but the upgrade bugs are killing me :) | 09:07 |
shardy | the UI doesn't have to change at all, and all existing upgrade docs still work | 09:07 |
matbu | (i mean graph) | 09:07 |
matbu | shardy: k, but is it too late for ocata ? or do you think we could try something like this for ocata ? | 09:09 |
shardy | the other thing to consider is containers - I actually think the heat model will work much better for upgrades in that case | 09:09 |
shardy | so I wanted to wire this all in via the heat model now, then we can review the next step when containerization is done | 09:09 |
shardy | matbu: this upgrade work is targeted at ocata | 09:09 |
shardy | it's basically essential since we released composable roles | 09:10 |
*** ooolpbot has joined #tripleo | 09:10 | |
ooolpbot | URGENT TRIPLEO TASKS NEED ATTENTION | 09:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1637961 | 09:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1638350 | 09:10 |
*** ooolpbot has quit IRC | 09:10 | |
openstack | Launchpad bug 1637961 in tripleo "periodic HA master job pingtest times out" [Critical,In progress] - Assigned to Gabriele Cerami (gcerami) | 09:10 |
openstack | Launchpad bug 1638350 in tripleo "pingtest failing on OVB jobs to create Cinder volume and Nova server" [Critical,In progress] - Assigned to Emilien Macchi (emilienm) | 09:10 |
matbu | shardy: yep but i mean the model : let heat managing/deploying nodes and output the playbook/manifest to ansible/puppet .. this model could be target for ocata ? | 09:10 |
*** psanchez has quit IRC | 09:10 | |
*** lucas-afk is now known as lucasagomes | 09:12 | |
*** nyechiel has quit IRC | 09:13 | |
*** katkapilatova has joined #tripleo | 09:13 | |
*** psanchez has joined #tripleo | 09:13 | |
shardy | matbu: possibly but I don't view it as a super high priority given all the other work we have this (short) cycle | 09:16 |
shardy | matbu: basically we have to get composable upgrades done first, and you might consider your requirement related to https://blueprints.launchpad.net/tripleo/+spec/split-stack-default | 09:16 |
shardy | I suspect by the end of ocata what you're asking for will be possible, but perhaps not fully tested/supported yet | 09:16 |
shardy | lets see how long the composable upgrades stuff takes to get finished, then we can assess how much work remains to fully split things | 09:17 |
hewbrocca | shardy: I never would have predicted the shape this is taking, but it's not bad :) | 09:17 |
matbu | shardy: ack | 09:18 |
shardy | hewbrocca: heh, thanks (I think?! ;) | 09:19 |
shardy | I think it actually fits together fairly well, even though it does end up with heat being essentially a translation layer for the software config | 09:19 |
shardy | provided we still want to use heat for the node/network orchestration that's probably not so bad tho | 09:20 |
hewbrocca | That's the thing I wouldn't have predicted | 09:20 |
hewbrocca | and, yes | 09:20 |
hewbrocca | shardy: makes me wonder if we shouldn't resurrect that ironic resource for Heat | 09:20 |
matbu | shardy: agree | 09:21 |
openstackgerrit | Arx Cruz proposed openstack-infra/tripleo-ci: Reducing a few minutes from the job timeout to save the logs https://review.openstack.org/393309 | 09:21 |
hewbrocca | There's even less reason to have Nova in the picture if we're really just driving Ironic | 09:21 |
shardy | hewbrocca: actually I think those are already getting resurrected by ricolin, but I'm not sure they are vital to our use-case | 09:21 |
shardy | hewbrocca: I was planning instead a mistral workflow that coordinates the dance between ironic and neutron ref https://review.openstack.org/#/c/313048/ | 09:22 |
hewbrocca | Agree, I wouldn't say vital | 09:22 |
hewbrocca | well, you are two steps ahead of me, as usual | 09:22 |
shardy | need to spend some time getting that working, then we could remove nova potentially | 09:22 |
hewbrocca | carry on :) | 09:22 |
shardy | hehe :) | 09:22 |
openstackgerrit | Juan Antonio Osorio Robles proposed openstack-infra/tripleo-ci: WIP TLS everywhere job https://review.openstack.org/391738 | 09:23 |
*** lmiccini has quit IRC | 09:24 | |
*** hjensas has quit IRC | 09:25 | |
*** ebarrera has joined #tripleo | 09:26 | |
openstackgerrit | Julie Pichon proposed openstack-infra/tripleo-ci: Add UI to undercloud sanity checks https://review.openstack.org/390845 | 09:28 |
openstackgerrit | Juan Antonio Osorio Robles proposed openstack-infra/tripleo-ci: WIP TLS everywhere job https://review.openstack.org/391738 | 09:29 |
*** dtantsur|afk is now known as dtantsur | 09:32 | |
openstackgerrit | Merged openstack/tripleo-common: Install configuration files for all downloaded packages https://review.openstack.org/347412 | 09:36 |
openstackgerrit | Katerina Pilatova proposed openstack/tripleo-validations: undercloud-disk-space.yaml: improved output https://review.openstack.org/393354 | 09:36 |
jaosorior | Has anybody attempted to reproduce the CI issues locally? | 09:37 |
openstackgerrit | Katerina Pilatova proposed openstack/tripleo-validations: undercloud-disk-space.yaml: improved output https://review.openstack.org/393354 | 09:39 |
*** charliejllewelly has joined #tripleo | 09:40 | |
*** Steve__ has joined #tripleo | 09:40 | |
*** Steve__ is now known as TheRealCharlieJl | 09:41 | |
*** TheRealCharlieJl is now known as SteveRelf | 09:41 | |
jaosorior | ccamacho: have you tested this? https://review.openstack.org/#/c/393644/1 | 09:42 |
charliejllewelly | Hi All, is anyone able to help me understand how the auth_url for os-collect-config is set? Currently it is returning the internal API address space which prevents it accessing the HEAT API, I would expect it to be using the Public URL. | 09:43 |
ccamacho | jaosorior checking that with https://bugzilla.redhat.com/show_bug.cgi?id=1390962 | 09:43 |
openstack | bugzilla.redhat.com bug 1390962 in rhel-osp-director "HAProxy doesn't load the new configuration after scaling out the role running the Openstack API services" [Urgent,Assigned] - Assigned to ccamacho | 09:43 |
ccamacho | jaosorior, good morning man :) | 09:43 |
*** milan has joined #tripleo | 09:44 | |
jaosorior | ccamacho: I remember at the summit, bandini, mcornea, jistr and I arrived at the grim conclusion that no pacemaker restarts were happening due to a slip-up in the composable roles work, not sure if that made it into a bug report, but yeah | 09:44 |
jaosorior | ccamacho: so it might be that the stuff you put there doesn't even get set up | 09:44 |
jaosorior | :/ | 09:44 |
jaosorior | *doesn't even get ran | 09:44 |
openstackgerrit | OpenStack Proposal Bot proposed openstack/tripleo-common: Updated from global requirements https://review.openstack.org/389957 | 09:45 |
openstackgerrit | OpenStack Proposal Bot proposed openstack/tripleo-validations: Updated from global requirements https://review.openstack.org/391170 | 09:45 |
ccamacho | jaosorior, aaaaaa crap... Do you have a bug for that? | 09:45 |
ccamacho | im adding some echo's locally to test | 09:45 |
jaosorior | ccamacho: that's what I said, I'm not sure if that made it into a bug report, we just figured out in the summit cause of another bug | 09:45 |
ccamacho | mmmmm jaosorior thanks man | 09:45 |
jaosorior | ccamacho: yeah :/ | 09:46 |
mcornea | jaosorior: ccamacho I don't think there's a bug except the SSL cert related one | 09:46 |
ccamacho | :) thanks for the hint dude | 09:46 |
jaosorior | ccamacho: so, hopefully this saves you some time debugging the same stuff we did :/ | 09:46 |
jaosorior | ccamacho: good morning, by the way :D | 09:46 |
jpich | d0ugal_: Good morning! Is it fair to assume you're going to keep the approach of updating the passwords in the mistral env for https://review.openstack.org/#/c/392593/ ? As in, should I make my patch dependent on yours I can solve All My Problems and assume get_password() will expect passwords to be in the new format (name), regardless of whether this is Mitaka/Newton/an upgrade? | 09:47 |
jtomasek | shardy: Hi, so I've been looking into whether it is possible to add additional sections to the environment, and it seems that it is not possible (I am getting: environment has wrong section "metadata"). Did I misunderstand something during the summit session? | 09:47 |
*** percevalbot has joined #tripleo | 09:47 | |
jaosorior | mcornea: was the patch that I sent you the other day about the VIP hosts useful? | 09:48 |
therve | jtomasek, It's not possible, sections are validated | 09:48 |
shardy | jtomasek: It's not possible, we'd need to propose changes to both heat and heatclient | 09:48 |
jaosorior | shardy: you might know the answer to charliejllewelly's question :O | 09:48 |
shardy | jtomasek: I suggested an alternative approach where we abuse a specially named parameter_default | 09:48 |
therve | Uhhhhh | 09:49 |
shardy | but we can revive the discussion wrt just adding some things to the environment too | 09:49 |
* therve didn't see that | 09:49 | |
shardy | like at least a description field | 09:49 |
shardy | therve: I was only mentioning it in the context of a temporary workaround | 09:49 |
jtomasek | shardy: yeah, I outlined the intention here https://review.openstack.org/#/c/393365/1/specs/ocata/gui-deployment-configuration.rst | 09:50 |
therve | shardy, I know :) | 09:50 |
d0ugal_ | jpich: Yup, I will still be doing that, one way or another | 09:50 |
mcornea | jaosorior: yep, I was looking for you yesterday to confirm but it was too late. I got an environment deployed with CloudName and it worked well. I believe we can document the parameters as the different domain for public/internal makes sense. | 09:50 |
d0ugal_ | jpich: I think I just need to be a bit smarter about when I do that, as bnemec pointed out it will get a bit messy after the upgrade unless the user deletes the password file. | 09:50 |
jtomasek | shardy: as I looked into it more, 'title' and 'description' would be sufficient | 09:50 |
mcornea | jaosorior: probably we need to take them into consideration when we're going to test SSL for internal services? | 09:50 |
jtomasek | shardy: could you please point me to the 'description' discussion? | 09:50 |
*** d0ugal_ is now known as d0ugal | 09:51 | |
jaosorior | mcornea: yep, actually, I only test internal SSL with CLOUDNAME, FreeIPA doesn't give you certs for IP addresses anyway | 09:51 |
jpich | d0ugal_: Cool! We get to play to the "patch dependency chain" game again :) | 09:51 |
mcornea | jaosorior: interesting, I'm going to look into that after I finish with the composable roles testing | 09:52 |
*** d0ugal has joined #tripleo | 09:52 | |
shardy | jtomasek: this thread http://lists.openstack.org/pipermail/openstack-dev/2016-June/097178.html | 09:52 |
jtomasek | shardy: thank you | 09:52 |
jaosorior | mcornea: if you have time to try it out, I have this blog post about it :D http://jaormx.github.io/2016/testing-out-the-tls-everywhere-patches-for-tripleo/ | 09:52 |
mcornea | jaosorior: yep, it's on my to do list :) | 09:53 |
*** rhallisey has joined #tripleo | 09:53 | |
*** panda|zZ is now known as panda | 09:55 | |
openstackgerrit | Merged openstack/tripleo-validations: undercloud-disk-space.yaml: improved output https://review.openstack.org/393354 | 09:59 |
shardy | charliejllewelly: Hi, you can configure the heat.conf on the undercloud to influence the os-collect-config settings | 10:00 |
shardy | basically we get the endpoint from keystone on the undercloud, and inject it via cloud-init userdata | 10:00 |
openstackgerrit | Merged openstack/tripleo-validations: Updated from global requirements https://review.openstack.org/391170 | 10:00 |
*** hjensas has joined #tripleo | 10:00 | |
*** stevemul has joined #tripleo | 10:00 | |
shardy | you can set the [clients_heat] endpoint_type to e.g publicURL | 10:00 |
shardy | charliejllewelly: note that on recent TripleO versions os-collect-config is actually polling swift, not heat | 10:01 |
shardy | so that will change which config setting is modified | 10:01 |
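A minimal sketch of the heat.conf change shardy describes above, assuming the file is plain INI that Python's `configparser` can parse. Only the `[clients_heat]` section and the `endpoint_type` option come from the discussion; the sample file contents, the `internalURL` starting value, and the helper name `set_heat_endpoint_type` are made up for illustration.

```python
import configparser
import io

# Made-up sample; a real heat.conf has many more sections and options.
SAMPLE_HEAT_CONF = """\
[DEFAULT]
debug = False

[clients_heat]
endpoint_type = internalURL
"""


def set_heat_endpoint_type(conf_text, endpoint_type="publicURL"):
    """Return conf_text with [clients_heat] endpoint_type set to endpoint_type."""
    # interpolation=None so that literal '%' characters in values are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("clients_heat"):
        parser.add_section("clients_heat")
    parser.set("clients_heat", "endpoint_type", endpoint_type)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


updated = set_heat_endpoint_type(SAMPLE_HEAT_CONF)
print(updated)
```

Note that `configparser` drops comments when it rewrites a file, so on a real undercloud a targeted edit with a configuration management tool is probably the safer route.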
openstackgerrit | Merged openstack/tripleo-validations: Change HAProxy timeouts to match the defaults https://review.openstack.org/391359 | 10:01 |
openstackgerrit | Chris Jones proposed openstack/puppet-tripleo: Improve failed mysql node removal time in HA deploys. https://review.openstack.org/393673 | 10:03 |
charliejllewelly | shardy: thanks for the pointer but this is for our users of the overcloud. Unfortunately we are not running swift in our cloud hence having to use cfn or heat. I couldn't see a way to set the auth_url in heat.conf...must be being silly | 10:03 |
openstackgerrit | Julie Pichon proposed openstack/python-tripleoclient: Pass clients to get the get_password function https://review.openstack.org/393192 | 10:04 |
shardy | charliejllewelly: the same applies I think, sec let me see how to configure it | 10:04 |
*** thrash|g0ne is now known as thrash | 10:06 | |
shardy | charliejllewelly: do you have HeatApiNetwork assigned to external in the ServiceNetMap ? | 10:07 |
charliejllewelly | shardy: not sure just checking | 10:08 |
*** ooolpbot has joined #tripleo | 10:10 | |
ooolpbot | URGENT TRIPLEO TASKS NEED ATTENTION | 10:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1637961 | 10:10 |
openstack | Launchpad bug 1637961 in tripleo "periodic HA master job pingtest times out" [Critical,In progress] - Assigned to Gabriele Cerami (gcerami) | 10:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1638350 | 10:10 |
*** ooolpbot has quit IRC | 10:10 | |
openstack | Launchpad bug 1638350 in tripleo "pingtest failing on OVB jobs to create Cinder volume and Nova server" [Critical,In progress] - Assigned to Emilien Macchi (emilienm) | 10:10 |
charliejllewelly | shardy: Could you explain how I would check that please? | 10:11 |
*** limao has quit IRC | 10:11 | |
charliejllewelly | a catalog list only shows a heat endpoint in the service catalog with public, internal and admin endpoints | 10:12 |
shardy | charliejllewelly: by default all services listen on the internal_api network, but there's a parameter available which lets you assign services to whatever network you want | 10:12 |
shardy | charliejllewelly: Ok so all the endpoints in catalog list for heat are on your internal_api network? | 10:12 |
shardy | openstack endpoint show heat should also tell you | 10:13 |
charliejllewelly | shardy: correct | 10:13 |
shardy | charliejllewelly: what version of TripleO are you running please? | 10:14 |
charliejllewelly | although they also list public and admin | 10:14 |
charliejllewelly | Redhat OSP8 | 10:14 |
openstackgerrit | Giulio Fidente proposed openstack/tripleo-heat-templates: Defaults kernel.pid_max to 1048576 https://review.openstack.org/393682 | 10:14 |
*** athomas has quit IRC | 10:15 | |
jaosorior | gfidente: hey dude | 10:16 |
shardy | charliejllewelly: Ok, so if you want to reassign heat to your external network, you need to pass a parameter which overrides the default ServiceNetMap, see here: | 10:16 |
shardy | https://github.com/openstack/tripleo-heat-templates/blob/stable/liberty/overcloud.yaml#L672 | 10:16 |
jaosorior | gfidente: so, I've been digging into the CI issue involving cinder... it seems that cinder is actually up and running, but for some reason it takes a VERY long time to respond. | 10:16 |
jaosorior | gfidente: doing a simple volume list, it seems that it takes around 7 to 8 seconds to do anything in my deployment. And I think it's the same issue in CI. | 10:17 |
jaosorior | EmilienM: ^^ | 10:17 |
shardy | charliejllewelly: it would look something like this: | 10:18 |
shardy | http://paste.openstack.org/show/587877/ | 10:18 |
shardy | note that on liberty and mitaka you need to copy the whole map (for all services), but since Newton we allow just passing those services you want to override | 10:18 |
shardy | you can assign any other services you want on the external net there too, then check it all looks good via the catalog | 10:19 |
*** akrivoka has joined #tripleo | 10:19 | |
shardy | basically heat puts the value from the catalog in the userdata for os-collect-config by default | 10:19 |
shardy | but if for any reason that doesn't work for you, there is a config option to override it | 10:19 |
shardy | https://github.com/openstack/heat/blob/master/heat/engine/clients/os/heat_plugin.py#L78 | 10:20 |
shardy | probably not needed though if you just set ServiceNetMap how you want | 10:20 |
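The difference between the two ServiceNetMap behaviours shardy mentions can be sketched as a dictionary merge. This is only an illustration under stated assumptions: the default map below is a heavily shortened, hypothetical stand-in (the real one lives in tripleo-heat-templates and has many more entries), and `merge_service_net_map` is an invented helper, not a TripleO function.

```python
# Hypothetical, shortened defaults; per the discussion, services listen on
# internal_api unless reassigned.
DEFAULT_SERVICE_NET_MAP = {
    "HeatApiNetwork": "internal_api",
    "CinderApiNetwork": "internal_api",
    "NovaApiNetwork": "internal_api",
}


def merge_service_net_map(defaults, overrides):
    """Return a new map where overrides replace matching default entries."""
    merged = dict(defaults)  # copy so the defaults are not mutated
    merged.update(overrides)
    return merged


# Newton and later: only the entries to change need to be passed.
partial_override = {"HeatApiNetwork": "external"}

# Liberty/Mitaka: the whole map has to be passed, which is equivalent to
# writing out the merged result by hand.
full_map = merge_service_net_map(DEFAULT_SERVICE_NET_MAP, partial_override)
print(full_map)
```

After deploying, the service catalog remains the place to confirm that the endpoints ended up on the intended network.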
*** athomas has joined #tripleo | 10:21 | |
*** tremble has joined #tripleo | 10:22 | |
dtantsur | hey folks! I've heard CI is having numerous problems. is there something I could help with from Ironic side? | 10:22 |
charliejllewelly | shardy: brilliant thanks for the info, I'll have a read | 10:22 |
openstackgerrit | Juan Antonio Osorio Robles proposed openstack-infra/tripleo-ci: CINDER TEST https://review.openstack.org/393687 | 10:22 |
openstackgerrit | Marios Andreou proposed openstack/tripleo-heat-templates: Add special case handling for OVS upgrade in updates and upgrades https://review.openstack.org/393688 | 10:23 |
openstackgerrit | Marios Andreou proposed openstack/tripleo-heat-templates: Add replacepkgs to the manual ovs upgrade workaround and fix a typo https://review.openstack.org/393689 | 10:23 |
shadower | shardy: how should we proceed with this? https://review.openstack.org/#/c/390854/ | 10:24 |
shadower | shardy: you wrote the first revision so I'm not sure whether you can review it or not | 10:24 |
*** lmiccini has joined #tripleo | 10:24 | |
shadower | but it's a prerequisite for an important validation fix so I'd like to land it asap | 10:25 |
jaosorior | shadower: well, we do need to fix CI before landing more stuff | 10:25 |
shadower | jaosorior: yeah that's true | 10:26 |
*** ealcaniz has joined #tripleo | 10:26 | |
panda | image builder is not honouring OVERCLOUD_IMAGES_ARGS and I still don't understand why ... selinux is still on enforcing ... | 10:26 |
jaosorior | panda: well, we do modify the images after being built | 10:27 |
jaosorior | panda: why not do that? | 10:28 |
gfidente | jaosorior I think I need to update my image templates because I still get cinder-api as service | 10:28 |
panda | jaosorior: where, with what ? virt-customize is not installed in undercloud ? | 10:29 |
jaosorior | gfidente: probably that and the puppet-tripleo from the images | 10:29 |
shardy | shadower: I'm fine for it to land when we fix CI | 10:30 |
shadower | shardy: thanks | 10:30 |
shardy | I +2'd it but someone else should probably approve since I'm a co-author | 10:30 |
*** yamahata has quit IRC | 10:30 | |
jaosorior | panda: look at the update_image function in common_functions.sh | 10:30 |
shardy | shadower: Also I'd welcome your feedback on https://review.openstack.org/#/c/393448/ | 10:31 |
shadower | shardy: I'll have a look | 10:31 |
shardy | shadower: I'm experimenting with building an ansible playbook for upgrades from the composable service interfaces | 10:31 |
shardy | it occurs to me that the same approach might work well for ansible (or $whatever) based validations of the deployed overcloud | 10:31 |
*** skramaja has quit IRC | 10:32 | |
shardy | particularly if we get to per-service validations as has recently been proposed by flepied in https://review.openstack.org/#/c/372336/ | 10:32 |
bogdando | folks, I tried the tripleo quickstart https://github.com/openstack/tripleo-quickstart/ and it fails with "modprobe: ERROR: could not insert 'kvm_intel': Operation not supported" Is it because my VIRTHOST is a VM? | 10:34 |
panda | bogdando: yes | 10:35 |
bogdando | is the VM case without HW accel supported? :( | 10:35 |
bogdando | how could I try it on the GCE instance? | 10:35 |
bogdando | any w/a for that, to run non HW accelerated VMs by installer? | 10:36 |
bogdando | or what is another most simple alternative to fit my case? | 10:36 |
openstackgerrit | Gabriele Cerami proposed openstack-infra/tripleo-ci: Workaround: Set selinux to permissive to workaround bug LP#1637961 https://review.openstack.org/392703 | 10:37 |
jaosorior | panda: did you find the function I mentioned? | 10:40 |
*** kbyrne has quit IRC | 10:40 | |
*** abehl has quit IRC | 10:40 | |
panda | jaosorior: yes, but it will take a while, in the meantime, I'll try with my original ugly hack. | 10:40 |
*** NobodyCam has quit IRC | 10:42 | |
*** Ng has quit IRC | 10:42 | |
*** hrybacki has quit IRC | 10:42 | |
*** kbyrne has joined #tripleo | 10:42 | |
*** ramishra has quit IRC | 10:43 | |
*** akrivoka has quit IRC | 10:44 | |
*** afazekas has quit IRC | 10:44 | |
*** fungi has quit IRC | 10:44 | |
*** tdasilva has quit IRC | 10:44 | |
panda | bogdando: I think using non-accelerated VMs on top of another VM would take so much to deploy that I'm not sure it's even worth it. If you still want to do it, you may try to modify the ansible tasks to not install the kvm_intel module and use qemu to launch the instances. | 10:44 |
*** mgould|afk is now known as mgould | 10:44 | |
*** ramishra has joined #tripleo | 10:45 | |
*** ramishra has quit IRC | 10:45 | |
*** Ng has joined #tripleo | 10:45 | |
*** ChanServ sets mode: +v Ng | 10:45 | |
*** hrybacki has joined #tripleo | 10:46 | |
panda | jaosorior: I'm confused. we use update_image when we're not building the image, but the gates are actually building it, and it's weird because the image should be built only during periodic jobs ... | 10:46 |
*** ramishra has joined #tripleo | 10:46 | |
bogdando | panda, so recommended way to make dev envs is use only a BM host? | 10:46 |
shardy | bogdando: most folks are either using a BM host or an environment where nested virt is enabled | 10:47 |
jaosorior | panda: we don't build the images in every gate. tripleo-ci builds them, but AFAIK when it's run on tripleo-heat-templates we don't. | 10:47 |
panda | jaosorior: anyway, since I just want to change selinux and not updating all the image, I'll have to use part of this function up to some point, and use it only to change SELINUX variable | 10:47 |
bogdando | shardy, got it thanks | 10:47 |
*** openstackgerrit has quit IRC | 10:47 | |
*** openstackgerrit has joined #tripleo | 10:48 | |
*** afazekas has joined #tripleo | 10:49 | |
panda | jaosorior: or maybe I'll just install libguestfs-tools and use virt-customize | 10:50 |
jaosorior | panda: sounds reasonable too | 10:50 |
jaosorior | panda: this is a workaround anyway, doesn't need to be perfect. | 10:51 |
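The narrow change panda is after (flip only the SELinux mode, leave the rest of the image alone) comes down to rewriting one line of the SELinux config file. A minimal sketch, assuming the usual `/etc/selinux/config` layout with a `SELINUX=` line; the sample text and the helper name `set_selinux_mode` are illustrative, and getting the result into the image (for example with virt-customize) is not shown here.

```python
import re

# Sample contents in the usual /etc/selinux/config layout (illustrative).
SAMPLE_SELINUX_CONFIG = """\
# This file controls the state of SELinux on the system.
SELINUX=enforcing
SELINUXTYPE=targeted
"""

VALID_MODES = ("enforcing", "permissive", "disabled")


def set_selinux_mode(config_text, mode="permissive"):
    """Return config_text with the SELINUX= line set to mode."""
    if mode not in VALID_MODES:
        raise ValueError("unknown SELinux mode: %s" % mode)
    # '^SELINUX=' does not match 'SELINUXTYPE=' because '=' must follow directly.
    new_text, count = re.subn(r"(?m)^SELINUX=.*$", "SELINUX=" + mode, config_text)
    if count == 0:
        # No existing setting: append one.
        new_text = config_text.rstrip("\n") + "\nSELINUX=" + mode + "\n"
    return new_text


permissive_config = set_selinux_mode(SAMPLE_SELINUX_CONFIG)
print(permissive_config)
```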
*** afazekas has quit IRC | 10:54 | |
openstackgerrit | Merged openstack/tripleo-validations: Fix the ctlplane-ip-range validation https://review.openstack.org/392868 | 10:54 |
openstackgerrit | Merged openstack/tripleo-validations: Fix the mysql-open-files-limit validation https://review.openstack.org/392869 | 10:54 |
*** afazekas has joined #tripleo | 10:55 | |
*** NobodyCam has joined #tripleo | 10:56 | |
*** akrivoka has joined #tripleo | 10:57 | |
*** fungi has joined #tripleo | 10:58 | |
openstackgerrit | Juan Antonio Osorio Robles proposed openstack-infra/tripleo-ci: CINDER TEST https://review.openstack.org/393687 | 11:00 |
*** jlinkes_ has joined #tripleo | 11:05 | |
*** jkilpatr has joined #tripleo | 11:07 | |
openstackgerrit | Florian Fuchs proposed openstack/tripleo-ui: Validate JSON parameters https://review.openstack.org/393713 | 11:09 |
panda | libguestfs method works locally ... | 11:09 |
*** jlinkes has quit IRC | 11:09 | |
*** ooolpbot has joined #tripleo | 11:10 | |
ooolpbot | URGENT TRIPLEO TASKS NEED ATTENTION | 11:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1637961 | 11:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1638350 | 11:10 |
*** ooolpbot has quit IRC | 11:10 | |
openstack | Launchpad bug 1637961 in tripleo "periodic HA master job pingtest times out" [Critical,In progress] - Assigned to Gabriele Cerami (gcerami) | 11:10 |
openstack | Launchpad bug 1638350 in tripleo "pingtest failing on OVB jobs to create Cinder volume and Nova server" [Critical,In progress] - Assigned to Emilien Macchi (emilienm) | 11:10 |
charliejllewelly | shardy: thanks for the advice earlier, I have found that by changing the auth_uri in heat.conf it allows the correct auth endpoint to be set via os-collect-config. Would this be considered an acceptable solution or is the correct method still to change the endpoint map? | 11:10 |
charliejllewelly | My concern is changing the isolated network that some of the internal service interactions take place over may be insecure? | 11:11 |
shardy | charliejllewelly: so, you want to have keystone listen on the external network, and heat only listen on the internal_api network? | 11:12 |
shardy | maybe I misunderstood, I thought you wanted the connection to heat to happen via a different network | 11:12 |
*** katkapilatova has left #tripleo | 11:12 | |
charliejllewelly | I was attempting to get VM's in the overcloud to interact with heat via the public URL's but if other services internal to the overcloud consume heat they should use the internal API network | 11:14 |
*** jlinkes_ has quit IRC | 11:14 | |
*** jlinkes_ has joined #tripleo | 11:15 | |
shardy | charliejllewelly: ah, so yeah looking at the liberty templates we actually force the URL to the vip for the HeatApiNetwork | 11:18 |
shardy | https://github.com/openstack/tripleo-heat-templates/blob/stable/liberty/overcloud.yaml#L1011 | 11:18 |
shardy | which overrides the heat logic that detects the endpoint from the catalog | 11:18 |
shardy | we fixed that in Newton | 11:18 |
charliejllewelly | Ah...that makes sense :) | 11:18 |
shardy | https://github.com/openstack/tripleo-heat-templates/blob/stable/liberty/puppet/controller.yaml#L953 | 11:19 |
shardy | https://github.com/openstack/tripleo-heat-templates/blob/stable/liberty/puppet/controller.yaml#L1390 | 11:19 |
openstackgerrit | Gabriele Cerami proposed openstack-infra/tripleo-ci: Workaround: Set selinux to permissive to workaround bug LP#1637961 https://review.openstack.org/392703 | 11:20 |
openstackgerrit | Merged openstack/tripleo-heat-templates: Enable proxy headers parsing for Neutron https://review.openstack.org/393130 | 11:21 |
*** pblaho has joined #tripleo | 11:22 | |
shardy | charliejllewelly: yeah so if you just want to force a different value in the heat.conf you can do that like this: | 11:22 |
shardy | http://paste.openstack.org/show/587886/ | 11:22 |
shardy | then you can set whatever url you want to override what we set in the template as linked above | 11:22 |
*** dsariel has quit IRC | 11:24 | |
shardy | charliejllewelly: you can use the same trick if you wish to change the auth_uri when using the native heat transport | 11:24 |
openstackgerrit | Julie Pichon proposed openstack/instack-undercloud: Open firewall port for the TripleO UI https://review.openstack.org/393719 | 11:25 |
charliejllewelly | shardy: perfect thanks for the help, I'll try that now | 11:25 |
*** jprovazn has quit IRC | 11:26 | |
openstackgerrit | Bogdan Dobrelya proposed openstack/tripleo-quickstart: Add support of deploy on a VM w/o nested virt https://review.openstack.org/393720 | 11:28 |
bogdando | panda, shardy , mwhahaha ^^ (still testing tho) | 11:28 |
openstackgerrit | Merged openstack/tripleo-common: Configure run-validation to use the custom output https://review.openstack.org/393467 | 11:29 |
*** mburned_out is now known as mburned | 11:31 | |
*** dprince has joined #tripleo | 11:36 | |
cschwede | Hello there! Question: is it possible to disable a single service on one role only? I know I can modify a role (eg. Controller), but if I want to disable only a single service there might be an easier way? | 11:37 |
cschwede | Maybe using resource_registry & OS::Heat::None but restrict that to one role only (not global)? | 11:38 |
shardy | cschwede: you can map the service to OS::Heat::None, or modify roles_data.yaml to remove the service you don't want | 11:38 |
shardy | both basically do the same, other than the None approach giving you a spurious service when looking at the deployed heat stack | 11:39 |
shardy | https://github.com/openstack/tripleo-heat-templates/blob/master/overcloud-resource-registry-puppet.j2.yaml#L105 | 11:39 |
*** tdasilva has joined #tripleo | 11:40 | |
*** d0ugal has quit IRC | 11:40 | |
shardy | cschwede: it's actually how we currently disable services which we don't want deployed by default | 11:40 |
cschwede | shardy: ok, but if I map a service (say OS::TripleO::Services::SwiftStorage) to OS::Heat::None, it will affect all roles, and I only want this for the controller role | 11:40 |
shardy | but pretty soon we'll be moving away from that as heat allows for merging environments now | 11:40 |
cschwede | so i need to copy the default controller role definition, and just remove the unwanted service | 11:40 |
EmilienM | hello | 11:40 |
shardy | cschwede: Ok, then you have to either remove it from the roles_data.yaml or pass a modified ControllerServices list as a parameter | 11:40 |
shardy | cschwede: yes, or just copy the list of services and pass a different ControllerServices list | 11:41 |
shardy | https://github.com/openstack-infra/tripleo-ci/blob/master/test-environments/scenario001-multinode.yaml#L5 | 11:41 |
shardy | cschwede: ^^ like that | 11:41 |
cschwede | shardy: yes, both is working for me, thx. i was just wondering if there is an easier way when i only want to disable one service on one role | 11:41 |
cschwede | shardy: like not adding 50-1 services to my custom template | 11:42 |
shardy | cschwede: currently not really - you could pass hierdata via ControllerExtraConfig which tells puppet not to start the service, but that might cause other issues because it would still be wired in to any other services which expect to interface with it | 11:42 |
cschwede | shardy: yes, tried that, failed for exactly the reason you described | 11:43 |
EmilienM | do we have progress on our CI issues? | 11:43 |
shardy | cschwede: Yeah, maybe we can enhance the heat parameter merging stuff to allow an xor merge or something | 11:43 |
cschwede | shardy: i add that to my list of topics/questions for next months OOO workshop :) | 11:43 |
shardy | cschwede: it'd also be pretty easy to have e.g ControllerExcludedServices as a parameter which we then use to filter the ControllerServices list | 11:43 |
shardy | but that seems like a kind of klunky interface | 11:44 |
panda | EmilienM: mariadb issue fixed, my workaround needed more passes instead | 11:44 |
EmilienM | panda: right, mariadb we updated packages last night again | 11:44 |
shardy | one of the aims with the composable services/roles was to keep the interface simple, and the explicit list does at least keep things simple | 11:44 |
EmilienM | but what about the cinder problem? | 11:44 |
EmilienM | jaosorior: ^ | 11:44 |
panda | EmilienM: there's a review from jaosorior | 11:44 |
panda | EmilienM: https://review.openstack.org/393687 | 11:45 |
cschwede | shardy: agree; i was just a bit worried that eg the controller list gets updated between releases, and an operator might forget to update the customized services then | 11:45 |
*** d0ugal has joined #tripleo | 11:45 | |
EmilienM | panda: interesting | 11:45 |
shardy | cschwede: yeah, either way it's a compromise, but so far I've preferred the explicit approach of having the operator just define the list of services they want | 11:46 |
*** ealcaniz has quit IRC | 11:46 | |
shardy | seems like it should yield the least surprising results in most cases despite the minor cut/paste invonvenience | 11:46 |
*** lmiccini has quit IRC | 11:46 | |
shardy | inconvenience even | 11:46 |
cschwede | shardy: yep, makes sense to me | 11:46 |
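The `ControllerExcludedServices` idea shardy floats above was only a suggestion in this discussion, not an existing parameter. As a rough sketch of what such a filter would do, assuming plain lists of service names (the shortened service list and the helper `filter_services` are hypothetical):

```python
def filter_services(services, excluded):
    """Return services without the excluded entries, preserving order."""
    excluded_set = set(excluded)
    unknown = excluded_set - set(services)
    if unknown:
        # Fail loudly so a typo in the exclusion list is not silently ignored.
        raise ValueError("not in the service list: %s" % sorted(unknown))
    return [name for name in services if name not in excluded_set]


# Hypothetical, heavily shortened stand-in for the ControllerServices list.
controller_services = [
    "OS::TripleO::Services::Keystone",
    "OS::TripleO::Services::HeatApi",
    "OS::TripleO::Services::SwiftStorage",
]

remaining = filter_services(
    controller_services,
    ["OS::TripleO::Services::SwiftStorage"],
)
print(remaining)
```

This also illustrates the trade-off raised in the conversation: the exclusion list is shorter to maintain, but the resulting service set is no longer spelled out explicitly in the operator's environment.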
openstackgerrit | Steven Hardy proposed openstack/tripleo-heat-templates: WIP prototyping composable upgrades with Heat+Ansible https://review.openstack.org/393448 | 11:48 |
shardy | marios: ^^ it now works! :) | 11:48 |
shardy | well, appending to /root/upgrade.log does, it doesn't really implement the upgrade workflow yet | 11:48 |
shardy | question, do we still need major-upgrade-pacemaker-init.yaml if we can configure the new repos e.g as step1 of the upgrade, or run the UpgradeInitCommand before running the steps? | 11:50 |
*** zoli|trng-afk is now known as zoli | 11:52 | |
*** zoli is now known as zoli|lunch | 11:52 | |
*** bfournie has quit IRC | 11:55 | |
*** morazi has quit IRC | 11:57 | |
cschwede | would be great if someone has a few minutes for a small t-h-t bugfix review: https://review.openstack.org/#/c/391222/ | 11:59 |
EmilienM | cschwede: looking | 12:00 |
cschwede | EmilienM: thx a lot! | 12:00 |
weshay | sshnaidm, panda how's the ci situation? | 12:02 |
EmilienM | shardy: I don't think we need this init thing, if we have step1,2,3,4,5,etc in place | 12:02 |
EmilienM | weshay: I see HA job very unstable | 12:03 |
*** jayg|g0n3 is now known as jayg | 12:03 | |
EmilienM | weshay: tripleo.org/cistatus.html | 12:03 |
weshay | ya.. I'm looking | 12:03 |
weshay | ha ping test | 12:03 |
panda | weshay: mariadb solved, redis workaround still under work | 12:03 |
EmilienM | slagle: we have 3nodes job in place | 12:06 |
*** fultonj_ has joined #tripleo | 12:06 | |
shardy | EmilienM: ack, I'll see how the init stuff could be wired in via the steps instead | 12:08 |
slagle | EmilienM: yes i saw | 12:08 |
slagle | EmilienM: it will be a while before i can work on it, as i'm working on other things | 12:09 |
EmilienM | slagle: right, just fyi | 12:09 |
*** ooolpbot has joined #tripleo | 12:10 | |
ooolpbot | URGENT TRIPLEO TASKS NEED ATTENTION | 12:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1637961 | 12:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1638350 | 12:10 |
*** ooolpbot has quit IRC | 12:10 | |
openstack | Launchpad bug 1637961 in tripleo "periodic HA master job pingtest times out" [Critical,In progress] - Assigned to Gabriele Cerami (gcerami) | 12:10 |
openstack | Launchpad bug 1638350 in tripleo "pingtest failing on OVB jobs to create Cinder volume and Nova server" [Critical,In progress] - Assigned to Emilien Macchi (emilienm) | 12:10 |
*** chlong has joined #tripleo | 12:13 | |
panda | I can't believe it ... it's easier to fix the issue than to put a workaround ... | 12:14 |
openstackgerrit | Gabriele Cerami proposed openstack-infra/tripleo-ci: Workaround: Set selinux to permissive to workaround bug LP#1637961 https://review.openstack.org/392703 | 12:18 |
marios | shardy: cool i will have a closer look still todo :) thanks a lot | 12:21 |
*** cdearborn has joined #tripleo | 12:21 | |
EmilienM | panda: can I assign 1638350 to you please? I'm leaving on PTO tonight, I'm not going to take care of it during the next days | 12:22 |
panda | EmilienM: yes | 12:23 |
EmilienM | panda: thx | 12:23 |
slagle | EmilienM: you're going to get rid of the 2 node job? | 12:23 |
*** jerrygb has joined #tripleo | 12:23 | |
EmilienM | slagle: not yet i think, we might want to wait to have a stable 3nodes job in place before, no? | 12:23 |
slagle | i wasnt thinking of removing it at all honestly | 12:24 |
EmilienM | I was actually thinking of keeping it for some projects or kind of patches | 12:24 |
EmilienM | and keep 3nodes only for some projects (THT & puppet-tripleo) | 12:24 |
shadower | jaosorior: can I have one more 1 liner? https://review.openstack.org/#/c/391259/ | 12:25 |
EmilienM | I see some value in keeping the 2nodes job, where a project can't break composable roles & network isolation | 12:25 |
*** sudipto_ has quit IRC | 12:25 | |
*** sudipto has quit IRC | 12:25 | |
slagle | EmilienM: ok. your comment sounded like you planned to get rid of it | 12:25 |
EmilienM | slagle: I was wrong, let me phrase it again in the review | 12:25 |
*** maticue has joined #tripleo | 12:29 | |
*** bfournie has joined #tripleo | 12:29 | |
*** jerrygb has quit IRC | 12:31 | |
*** maeca1 has joined #tripleo | 12:33 | |
weshay | panda, fyi.. the ping test issues have been escalated | 12:36 |
*** rlandy has joined #tripleo | 12:36 | |
*** dsariel has joined #tripleo | 12:41 | |
EmilienM | weshay: what does it mean? | 12:41 |
*** jerrygb has joined #tripleo | 12:42 | |
*** kjw3 has joined #tripleo | 12:44 | |
weshay | EmilienM, bugs that stop CI related to the prod chain, like upstream CI are now treated like gss/customer escalations and use the same process | 12:44 |
EmilienM | nice | 12:44 |
*** lmiccini has joined #tripleo | 12:45 | |
beagles | is there a way to sort gerrit search results? | 12:45 |
weshay | it helps to get visibility, getting people to help out, and offers some amount of root cause analysis to prevent the same issue in the future | 12:45 |
*** jlinkes_ has quit IRC | 12:45 | |
beagles | like by subject, etc | 12:45 |
shardy | beagles: have you tried gerrit dash creator? https://github.com/openstack/gerrit-dash-creator | 12:46 |
*** jlinkes_ has joined #tripleo | 12:46 | |
*** chlong has quit IRC | 12:46 | |
*** rodrigods has quit IRC | 12:46 | |
beagles | shardy: nope.. not yet :) | 12:46 |
*** rodrigods has joined #tripleo | 12:46 | |
beagles | shardy: but it looks like I'm about to | 12:46 |
panda | beagles: shardy gertty allows sorting by change number and last update | 12:47 |
shardy | I created some tripleo dashboards with that then bookmarked them, you can hack the tripleo.dash and/or tripleo-stable.dash to suit your needs :) | 12:47 |
*** jlinkes_ has quit IRC | 12:47 | |
*** tzumainn has joined #tripleo | 12:48 | |
beagles | shardy: cool.. I've reached my tipping point where procrastinating on building some custom dashboards is no longer an option :) | 12:49 |
*** dougbtv has joined #tripleo | 12:49 | |
*** lucasagomes is now known as lucas-hungry | 12:52 | |
*** jlinkes has joined #tripleo | 12:52 | |
*** percevalbot has quit IRC | 12:53 | |
*** maeca1 has quit IRC | 12:55 | |
*** morazi has joined #tripleo | 12:55 | |
*** percevalbot has joined #tripleo | 12:56 | |
EmilienM | marios: approving https://review.openstack.org/#/c/392680/ please backport it asap | 12:57 |
EmilienM | slagle: I think we can also approve https://review.openstack.org/#/c/392313/ -- wdyt? | 12:58 |
*** jcoufal has joined #tripleo | 12:58 | |
slagle | it's not passed ha | 12:59 |
EmilienM | social: we can approve https://review.openstack.org/#/c/392123/ as it won't break CI - please make sure you backport it into stable/newton asap | 12:59 |
openstackgerrit | Marios Andreou proposed openstack/tripleo-heat-templates: Fixup the start of swift services https://review.openstack.org/393760 | 12:59 |
marios | EmilienM: thanks done setting -1 till master merges | 12:59 |
slagle | EmilienM: i guess it got past the point where that code is executed | 12:59 |
EmilienM | slagle: right, that's why I think we can go ahead | 13:00 |
EmilienM | but I'm fine waiting | 13:00 |
*** jerrygb has quit IRC | 13:01 | |
*** morazi has quit IRC | 13:02 | |
*** masco has quit IRC | 13:02 | |
social | EmilienM: https://review.openstack.org/#/c/392260/ | 13:02 |
social | EmilienM: related commit | 13:03 |
EmilienM | social: I know, waiting for CI though | 13:03 |
*** lblanchard has joined #tripleo | 13:03 | |
social | EmilienM: ack | 13:03 |
*** maeca1 has joined #tripleo | 13:06 | |
*** tobias_fiberdata has joined #tripleo | 13:06 | |
*** tobias-fiberdata has quit IRC | 13:08 | |
openstackgerrit | Arx Cruz proposed openstack-infra/tripleo-ci: Reducing a few minutes from the job timeout to save the logs https://review.openstack.org/393309 | 13:09 |
EmilienM | shardy: https://review.openstack.org/#/c/391064/ is also a good candidate for backport | 13:09 |
dtantsur | folks, I've started a blueprint to get back RAID: https://blueprints.launchpad.net/tripleo/+spec/raid-workflow | 13:09 |
dtantsur | mgould, I've assigned you just because you've been working on it ^^^ | 13:10 |
dtantsur | maybe we'll shuffle assignees (e.g. lucas-hungry might want to help you) | 13:10 |
mgould | dtantsur: OK, thanks | 13:10 |
shardy | EmilienM: yeah I was waiting for it to land before proposing the backport | 13:10 |
shardy | but I can cherry-pick it now | 13:10 |
marios | shardy: thanks, the review looks great. it's excellent we have actual code to point at by the end of the first week after summit (session was last thursday) | 13:10 |
ansiwen | what is the current status of CI? working again? (sorry, didn't follow here) | 13:10 |
EmilienM | shardy: no, just making sure it's in the radar | 13:10 |
EmilienM | ansiwen: right now, AFAIK it's unstable | 13:11 |
shardy | EmilienM: ack, thanks | 13:11 |
EmilienM | ansiwen: you can use tripleo.org/cistatus.html to follow it | 13:11 |
ansiwen | EmilienM: thank you! | 13:11 |
*** ooolpbot has joined #tripleo | 13:11 | |
ooolpbot | URGENT TRIPLEO TASKS NEED ATTENTION | 13:11 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1637961 | 13:11 |
openstack | Launchpad bug 1637961 in tripleo "periodic HA master job pingtest times out" [Critical,In progress] - Assigned to Gabriele Cerami (gcerami) | 13:11 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1638350 | 13:11 |
*** ooolpbot has quit IRC | 13:11 | |
openstack | Launchpad bug 1638350 in tripleo "pingtest failing on OVB jobs to create Cinder volume and Nova server" [Critical,In progress] - Assigned to Gabriele Cerami (gcerami) | 13:11 |
*** d0ugal has quit IRC | 13:12 | |
*** pradk has quit IRC | 13:12 | |
*** pradk has joined #tripleo | 13:13 | |
*** links has quit IRC | 13:13 | |
*** zoli|lunch is now known as zoli | 13:15 | |
*** morazi has joined #tripleo | 13:15 | |
*** zoli is now known as zoliXXL | 13:15 | |
*** akshai has joined #tripleo | 13:19 | |
*** d0ugal has joined #tripleo | 13:21 | |
*** jprovazn has joined #tripleo | 13:22 | |
openstackgerrit | Gabriele Cerami proposed openstack-infra/tripleo-ci: Workaround: Set selinux to permissive to workaround bug LP#1637961 https://review.openstack.org/392703 | 13:24 |
openstackgerrit | Merged openstack/tripleo-heat-templates: Add option to disable "d1" Swift device https://review.openstack.org/391222 | 13:24 |
openstackgerrit | Merged openstack/puppet-tripleo: Deploy monitoring/logging agents sooner https://review.openstack.org/390802 | 13:24 |
*** [1]cdearborn has joined #tripleo | 13:25 | |
openstackgerrit | Christian Schwede proposed openstack/tripleo-heat-templates: Add option to disable "d1" Swift device https://review.openstack.org/393769 | 13:26 |
openstackgerrit | Christian Schwede proposed openstack/puppet-tripleo: swift/proxy: configure rabbitmq properly https://review.openstack.org/393770 | 13:27 |
EmilienM | cschwede: you missed -x option to cherry-pick ^ | 13:27 |
jaosorior | EmilienM: did you see what I posted about the cinder issue? | 13:27 |
EmilienM | cschwede: also, master patch is not merged yet, we shouldn't propose backports before | 13:28 |
cschwede | EmilienM: bummer; I did that in Gerrit | 13:28 |
EmilienM | jaosorior: not much | 13:28 |
cschwede | EmilienM: eh, i have too many patches open. abandoning that, it was the wrong one. sorry for the noise | 13:28 |
EmilienM | cschwede: np | 13:28 |
EmilienM | cschwede: we need to backport it but only when master patch is merged | 13:29 |
cschwede | EmilienM: yep, fully agree | 13:29 |
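For context on the `-x` request above: `git cherry-pick -x` appends a "(cherry picked from commit <sha>)" line to the commit message, which is what lets reviewers trace a backport to its master commit. A minimal sketch of checking for that line follows; the function names are hypothetical and this is not part of any TripleO or Gerrit tooling.

```python
import re

# `git cherry-pick -x` appends a line of this form to the commit message.
# Abbreviated hashes are accepted here for leniency (an assumption).
CHERRY_PICK_RE = re.compile(
    r"^\(cherry picked from commit ([0-9a-f]{7,40})\)$", re.MULTILINE
)


def cherry_pick_sources(commit_message):
    """Return the source SHAs recorded in the message, in order of appearance."""
    return CHERRY_PICK_RE.findall(commit_message)


def is_traceable_backport(commit_message):
    """True if the message records at least one cherry-pick source."""
    return bool(cherry_pick_sources(commit_message))
```

A backport proposed without `-x` returns False from `is_traceable_backport`, which is the situation EmilienM flags.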
jaosorior | EmilienM: right, so I did a local deployment, and at first it actually shows the same deal, no cinder logs, but no error in any logs either. However, when running volume list, the first runs take a VERY long time, and then it works. So right now my theory is that it's apache's lazy load being the culprit. Cinder is working, but doing the initial load takes too long and it times out. | 13:29 |
jaosorior | EmilienM: I think we could merge the eventlet patch, and right now I'm figuring out a way to speed up apache's loading of the virtual host | 13:30 |
EmilienM | jaosorior: ok. Is it the number of workers? | 13:30 |
EmilienM | jaosorior: look, it failed on latest run https://review.openstack.org/#/c/392647/ | 13:31 |
EmilienM | http://logs.openstack.org/47/392647/7/check-tripleo/gate-tripleo-ci-centos-7-ovb-nonha/f920b85/console.html#_2016-11-04_12_23_52_588576 | 13:31 |
EmilienM | another issue | 13:31 |
jaosorior | EmilienM: that was an ironic issue | 13:31 |
EmilienM | d0ugal: ^ can you look at this one? | 13:31 |
EmilienM | ah | 13:31 |
EmilienM | a known issue? | 13:31 |
d0ugal | EmilienM: looking | 13:31 |
d0ugal | EmilienM: just in a meeting | 13:31 |
jaosorior | EmilienM: not that I know of | 13:32 |
*** amoralej is now known as amoralej|lunch | 13:32 | |
jaosorior | EmilienM: seen it a couple of times now in the 5 minutes I've been checking for those. Since that happened in another patch that I wanted to try to load cinder before the pingtest https://review.openstack.org/#/c/393687/ | 13:32 |
*** ccamacho is now known as ccamacho|lunch | 13:32 | |
d0ugal | EmilienM: I've not seen that before, is it a one off? | 13:33 |
openstackgerrit | Juan Antonio Osorio Robles proposed openstack-infra/tripleo-ci: CINDER TEST https://review.openstack.org/393687 | 13:34 |
d0ugal | EmilienM: oh, actually, it might be related to ... https://review.openstack.org/#/c/388920/ | 13:34 |
jaosorior | d0ugal: it isn't | 13:34 |
jaosorior | d0ugal: * it isn't a one-off | 13:34 |
*** d0ugal has quit IRC | 13:35 | |
*** d0ugal has joined #tripleo | 13:37 | |
jpich | https://review.openstack.org/#/c/392148/ might also be of interest for the node locked error | 13:37 |
d0ugal | EmilienM, jaosorior - sorry, I am on an unstable connection. | 13:38 |
*** jpena is now known as jpena|lunch | 13:39 | |
*** jerrygb has joined #tripleo | 13:39 | |
*** jerrygb has quit IRC | 13:39 | |
d0ugal | EmilienM, jaosorior - I think it might be related to the fact that the Ironic client mistral creates doesn't do any retrying by default | 13:39 |
*** cdearborn has quit IRC | 13:40 | |
*** jroll is now known as jrollinhatin | 13:40 | |
d0ugal | i.e. we do this: https://github.com/openstack/tripleo-common/blob/d0fb1822def70068b9f8a4aa70602e8ff6d06920/tripleo_common/actions/base.py#L52-L64 | 13:40 |
d0ugal | and depending on the path in the workflow, you get that client, or the one mistral creates without any retrying values | 13:41 |
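The retry gap d0ugal describes can be illustrated with a generic wrapper. This is a sketch under stated assumptions, not the tripleo-common or ironicclient implementation: `NodeLocked`, `call_with_retries`, and the default values are hypothetical names and numbers chosen for illustration.

```python
import time


class NodeLocked(Exception):
    """Hypothetical stand-in for a transient 'node locked' conflict error."""


def call_with_retries(func, max_retries=3, retry_interval=1.0,
                      retry_on=(NodeLocked,), sleep=time.sleep):
    """Call func(); on a retryable error, wait and try again.

    max_retries counts the extra attempts made after the first call.
    The last error is re-raised once the retries are used up.
    """
    attempt = 0
    while True:
        try:
            return func()
        except retry_on:
            if attempt >= max_retries:
                raise
            attempt += 1
            sleep(retry_interval)
```

A client built without retry values behaves like `max_retries=0` here: the first transient conflict propagates straight to the workflow.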
openstackgerrit | Tomas Sedovic proposed openstack/tripleo-specs: Allow specifying multiple sources of validations https://review.openstack.org/393775 | 13:41 |
EmilienM | cschwede: please submit the backport again, but with -x option. I think it's fine. I don't want to forget this one | 13:42 |
cschwede | EmilienM: looking | 13:42 |
EmilienM | d0ugal: ok thanks | 13:44 |
*** jerrygb has joined #tripleo | 13:46 | |
social | EmilienM: and from updates I think we need to push more for https://review.openstack.org/#/c/392593 and https://review.openstack.org/#/c/389830/ :( | 13:46 |
EmilienM | social: none of them pass CI and have positive review | 13:48 |
*** jerrygb has quit IRC | 13:48 | |
EmilienM | I'm afraid some work needs to be done before we land them | 13:48 |
openstackgerrit | Christian Schwede proposed openstack/puppet-tripleo: swift/proxy: configure rabbitmq properly https://review.openstack.org/393770 | 13:48 |
*** d0ugal_ has joined #tripleo | 13:51 | |
openstackgerrit | Merged openstack/tripleo-quickstart: Add roles-gate playbook run to OVB ci-script https://review.openstack.org/392873 | 13:53 |
gfidente | jaosorior so cinder is not that slow for me | 13:53 |
gfidente | not slower than other APIs | 13:53 |
*** d0ugal has quit IRC | 13:53 | |
jaosorior | gfidente: well, it's timing out in CI | 13:54 |
gfidente | jaosorior the pingtest you mean? | 13:55 |
jaosorior | gfidente: my theory at the moment is that the issue is the first command given to an instance (a vhost in a specific host), since that takes some time for apache to load it (apache does lazy initialization of the vhosts) | 13:55 |
jaosorior | gfidente: correct | 13:55 |
*** ebalduf has joined #tripleo | 13:56 | |
*** lucas-hungry is now known as lucasagomes | 13:56 | |
gfidente | jaosorior but the nonha job is passing | 13:57 |
*** tobias-fiberdata has joined #tripleo | 13:57 | |
jaosorior | gfidente: so it seems to be the case that one cinder instance in the ha job has logs (meaning it can respond quickly) while the other two nodes have no logs for cinder (yet), which is something I could reproduce locally. | 13:57 |
jaosorior | gfidente: did you do an ha deployment? Can you verify that in your case cinder is working as well? | 13:58 |
gfidente | I did an HA deployment but with a single controller | 13:58 |
*** Guest13194 has quit IRC | 13:58 | |
gfidente | and it's working fine it seems | 13:58 |
gfidente | it couldn't start redis instead | 13:59 |
jaosorior | gfidente: right, that is the issue that panda is trying to solve | 13:59 |
jaosorior | gfidente: can you do an HA deployment with 3 controllers and verify? | 14:00 |
gfidente | jaosorior yep | 14:00 |
jaosorior | gfidente: awesome :D | 14:00 |
*** fultonj_ has quit IRC | 14:00 | |
gfidente | in CI I see the nova api returning bad status line | 14:01 |
panda | well, the issue is solved with the package ... it's the workaround that is actually taking ages :( | 14:01 |
*** tobias_fiberdata has quit IRC | 14:01 | |
*** fultonj has quit IRC | 14:01 | |
panda | gfidente: link ? | 14:01 |
gfidente | http://logs.openstack.org/82/393682/1/check-tripleo/gate-tripleo-ci-centos-7-ovb-ha/d2b3ff2/console.html#_2016-11-04_12_16_09_456876 | 14:02 |
*** fultonj has joined #tripleo | 14:02 | |
jaosorior | gfidente: check the logs, that seems to be when attempting to boot from volume | 14:03 |
shardy | gfidente: Hey are you planning to refresh your patch adding the ansible hook to overcloud-full? | 14:04 |
shardy | I think we just need to install the python-heat-agent-ansible package now | 14:04 |
gfidente | shardy instead of heat-config-ansible ? | 14:05 |
openstackgerrit | Dougal Matthews proposed openstack/python-tripleoclient: Fix password handling for users upgrading from Mitaka https://review.openstack.org/392593 | 14:05 |
gfidente | oh no you mean the rpm, not the element | 14:06 |
shardy | gfidente: Yeah I don't think we need the element anymore, now that it's packaged | 14:06 |
gfidente | so that patch should go against a different repo | 14:06 |
gfidente | send it if you have it already | 14:07 |
shardy | gfidente: oh yeah, not got a patch yet but I can push one, thanks! | 14:07 |
openstackgerrit | Merged openstack/tripleo-common: Clean up configure_containers.sh script https://review.openstack.org/384865 | 14:07 |
gfidente | shardy thanks for pinging though :) | 14:07 |
openstackgerrit | Merged openstack/tripleo-common: Allow building heat-agents image from master https://review.openstack.org/384866 | 14:07 |
openstackgerrit | Juan Antonio Osorio Robles proposed openstack/tripleo-heat-templates: Add preload wsgi script option for cinder https://review.openstack.org/393790 | 14:07 |
openstackgerrit | Merged openstack/tripleo-heat-templates: Add special case handling for OVS upgrade in updates and upgrades https://review.openstack.org/393688 | 14:07 |
openstackgerrit | Merged openstack/tripleo-heat-templates: Add replacepkgs to the manual ovs upgrade workaround and fix a typo https://review.openstack.org/393689 | 14:08 |
*** eggmaster has joined #tripleo | 14:08 | |
openstackgerrit | Merged openstack/tripleo-heat-templates: Fixup the start of swift services https://review.openstack.org/392680 | 14:08 |
openstackgerrit | Merged openstack/tripleo-heat-templates: Update openstack-puppet-modules dependencies https://review.openstack.org/392123 | 14:08 |
*** Goneri has joined #tripleo | 14:09 | |
*** ooolpbot has joined #tripleo | 14:10 | |
ooolpbot | URGENT TRIPLEO TASKS NEED ATTENTION | 14:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1637961 | 14:10 |
openstack | Launchpad bug 1637961 in tripleo "periodic HA master job pingtest times out" [Critical,In progress] - Assigned to Gabriele Cerami (gcerami) | 14:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1638350 | 14:10 |
*** ooolpbot has quit IRC | 14:10 | |
openstack | Launchpad bug 1638350 in tripleo "pingtest failing on OVB jobs to create Cinder volume and Nova server" [Critical,In progress] - Assigned to Gabriele Cerami (gcerami) | 14:10 |
*** lmiccini has quit IRC | 14:11 | |
*** lmiccini has joined #tripleo | 14:11 | |
*** dtantsur is now known as creepy_owlet | 14:11 | |
*** Guest13194 has joined #tripleo | 14:13 | |
openstackgerrit | Merged openstack/tripleo-ui: Prepare 1.0.6 https://review.openstack.org/391798 | 14:16 |
jtomasek | florianf: do you intend to fix the node power state bug by introducing polling? | 14:16 |
openstackgerrit | OpenStack Proposal Bot proposed openstack/tripleo-common: Updated from global requirements https://review.openstack.org/389957 | 14:17 |
florianf | jtomasek: not sure yet. But I guess there is nothing coming through the websocket... | 14:17 |
*** zoliXXL is now known as zoli|brb | 14:17 | |
jtomasek | florianf: yeah, problem is that ironic would have to send zaqar messages | 14:17 |
florianf | jtomasek: exactly. do you think that is something worth investigating? | 14:18 |
jtomasek | florianf: I am quite in favor of introducing polling | 14:18 |
openstackgerrit | Paul Belanger proposed openstack/diskimage-builder: Switch to bindep to manage OS requirements https://review.openstack.org/391931 | 14:18 |
florianf | jtomasek: I think for now that's the best option | 14:18 |
*** saneax is now known as saneax-_-|AFK | 14:19 | |
gfidente | jaosorior panda apparently cinder-volume created and attached the volume at 12:14:36 | 14:19 |
jtomasek | florianf: we can either file it as a bug in ironic or just hide the specific node under transparent loader | 14:19 |
jaosorior | gfidente: so it did get attached? right, so then it seems that it's just a slowness issue | 14:20 |
jtomasek | florianf: I am not sure if the node power state is any important during introspection | 14:20 |
gfidente | jaosorior I am still not sure | 14:20 |
gfidente | looking at nova logs | 14:20 |
gfidente | to see what causes the badstatusline | 14:20 |
florianf | jtomasek: AFAIK it doesn't update the maintenance state, so power state is probably the only column that is updated during introspection. | 14:21 |
*** apetrich has quit IRC | 14:21 | |
gfidente | jaosorior see the heat volume1 resource went in COMPLETE state | 14:22 |
florianf | jtomasek: So are you saying it's not even worth updating it? | 14:22 |
jtomasek | florianf: yes | 14:22 |
*** apetrich has joined #tripleo | 14:23 | |
jrist | honza: might need a rebase on this? https://review.openstack.org/#/c/392589/ | 14:24 |
openstackgerrit | Lukas Bezdicka proposed openstack/tripleo-heat-templates: Update openstack-puppet-modules dependencies https://review.openstack.org/393797 | 14:25 |
*** rajinir has quit IRC | 14:26 | |
florianf | jtomasek: hmm. but then we shouldn't just hide the power state, but all columns. | 14:27 |
honza | jrist: might have to redo the whole thing :) | 14:27 |
florianf | jtomasek: except for the mac address, name and role | 14:27 |
jpich | jtomasek: Hey, a couple of days ago folks mentioned letting users switch nodes back to 'manageable', looks like we do have a workflow for changing state already -> https://github.com/openstack/tripleo-common/blob/master/workbooks/baremetal.yaml#L8 | 14:28 |
jpich | jtomasek: (I made the mistake a couple of times where I forgot to run introspection and ended up stuck, from the UI perspective) | 14:28 |
*** ccamacho|lunch is now known as ccamacho | 14:28 | |
*** jistr is now known as jistr|call | 14:29 | |
*** jaosorior is now known as jaosorior_mtg | 14:29 | |
*** jaosorior_mtg has quit IRC | 14:31 | |
jtomasek | jpich: cool | 14:31 |
jtomasek | sure we can add it then | 14:31 |
EmilienM | mwhahaha: can you confirm we don't need this backport in stable/newton https://review.openstack.org/#/c/393455/ ? It's an Ocata thing, right? | 14:32 |
*** jaosorior_mtg has joined #tripleo | 14:32 | |
jpich | jtomasek: Cool! I'll open a bug to remind us to do it sometime | 14:32 |
mwhahaha | EmilienM: i thought we did that change in newton upstream | 14:32 |
jtomasek | jpich: thanks! | 14:32 |
EmilienM | mwhahaha: https://review.openstack.org/#/c/368169/ was not backported | 14:32 |
mwhahaha | EmilienM: technically I don't think we need it in newton | 14:33 |
EmilienM | ok | 14:33 |
jtomasek | florianf: so, I am not against doing polling, but we need to make sure we only start polling when a node operation starts and stop when the node operation finishes | 14:33 |
mwhahaha | EmilienM: it's in stable/newton, it was merged 6+ weeks ago | 14:33 |
jtomasek | florianf: we will not solve the problem of restarting the polling when user refreshes the page (for now) | 14:33 |
mwhahaha | EmilienM: there's nothing to backport in puppet-keystone | 14:33 |
mwhahaha | EmilienM: it was in 9.4.0 | 14:34 |
EmilienM | mwhahaha: mhh ok | 14:34 |
EmilienM | so I can approve it | 14:34 |
jtomasek | florianf: (that is an app state recovery problem) | 14:34 |
mwhahaha | EmilienM: yea | 14:34 |
*** zoli|brb is now known as zoli | 14:36 | |
marios | EmilienM: thanks for the backport magic | 14:37 |
*** SteveRelf has quit IRC | 14:38 | |
*** d0ugal_ has quit IRC | 14:39 | |
*** d0ugal has joined #tripleo | 14:39 | |
jpich | jtomasek: FYI - https://bugs.launchpad.net/tripleo/+bug/1639262 | 14:40 |
openstack | Launchpad bug 1639262 in tripleo "Cannot set nodes to 'manageable' from the UI" [Medium,Triaged] | 14:40 |
*** fragatina has quit IRC | 14:42 | |
*** fragatina has joined #tripleo | 14:42 | |
jrist | honza: ah | 14:42 |
jrist | honza: is that the one with the circular dependencies | 14:42 |
honza | jrist: yes, sir | 14:43 |
*** sudipto has joined #tripleo | 14:43 | |
jrist | ok don't mind me :) | 14:43 |
*** sudipto_ has joined #tripleo | 14:43 | |
*** rajinir has joined #tripleo | 14:43 | |
EmilienM | marios: yw :) | 14:45 |
*** amoralej|lunch is now known as amoralej | 14:46 | |
*** dmacpher has joined #tripleo | 14:49 | |
*** jpena|lunch is now known as jpena | 14:50 | |
openstackgerrit | James Slagle proposed openstack/tripleo-heat-templates: TEST: Disable convergence on the overcloud https://review.openstack.org/393813 | 14:51 |
*** jerrygb_ has joined #tripleo | 14:52 | |
florianf | jtomasek: ok, makes sense. are we tackling app state recovery for ocata? | 14:56 |
jtomasek | florianf: probably not | 14:56 |
therve | slagle, Dang, I really hope convergence is not an issue :/ | 14:58 |
d0ugal | shardy: Hey | 14:59 |
d0ugal | or any other Heat experts around :) | 15:00 |
therve | d0ugal, Try me :) | 15:00 |
d0ugal | therve: Am I correct in thinking I can get the parameters back out of Heat? | 15:00 |
slagle | therve: can you see why the stack is stuck in create in progress here? http://logs.openstack.org/13/392313/7/check-tripleo/gate-tripleo-ci-centos-7-ovb-ha/f383107/ | 15:00 |
jaosorior_mtg | EmilienM: this is my current attempt at going at the cinder issue https://review.openstack.org/#/q/topic:wsgi_script_preload | 15:00 |
therve | slagle, Looking | 15:00 |
therve | d0ugal, You mean with stack-show? | 15:01 |
d0ugal | therve: That might be it, and can I get hidden params with that too? passwords? | 15:01 |
*** jerrygb has joined #tripleo | 15:01 | |
EmilienM | jaosorior_mtg: ack | 15:01 |
therve | d0ugal, Good question, let me check | 15:01 |
slagle | therve: i see "Stack CREATE COMPLETE (tenant-stack): Stack CREATE completed successfully" in the heat-engine.log from controller 0, yet when we resource-list at the end of the pingtest, the server1 resource is still CREATE_IN_PROGRESS | 15:02 |
jaosorior_mtg | EmilienM: the result of this https://review.openstack.org/#/c/393687/ will let us know if it's actually the lag of apache's lazy load that's actually the issue | 15:02 |
EmilienM | jaosorior_mtg: ok cool. it would be cool to also test your series of patches in tripleo ci | 15:03 |
*** jistr|call is now known as jistr | 15:03 | |
therve | s; | 15:03 |
jaosorior_mtg | EmilienM: right, so the last of that patch in the series is to tripleo-heat-templates | 15:03 |
therve | slagle, The tenant-stack is still in progress, so that message is probably wrong | 15:04 |
jaosorior_mtg | EmilienM: so that'll trigger the ovb-ha job anyway | 15:04 |
shardy | d0ugal: Hey, sorry was in a meeting - yeah stack-show will give you parameters but IIRC it obfuscates hidden parameters | 15:04 |
shardy | if they end up in server metadata and/or a SoftwareDeployment they are actually visible via the resource-metadata or deployment-show commands tho... | 15:04 |
*** jerrygb_ has quit IRC | 15:04 | |
therve | slagle, Oh, it just happened to be very very slow | 15:04 |
EmilienM | jaosorior_mtg: right | 15:05 |
therve | slagle, finished at 14:12:23, but your tests timed out at 300s | 15:05 |
EmilienM | good | 15:05 |
slagle | therve: ah, indeed | 15:05 |
therve | slagle, at 14:10 | 15:05 |
therve | So it took 7mins instead of 5 | 15:05 |
slagle | ok, so we need to increase the timeout | 15:05 |
therve | Maybe | 15:06 |
therve | OTOH it took 30sec to do a stack-show, which is not a good sign | 15:06 |
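The timeout behaviour discussed above (a stack that needs about 7 minutes against a 300s limit) can be sketched as a poll loop with a deadline. This is an illustrative sketch only, not the pingtest script: `wait_for_status` and its parameters are hypothetical, and `get_status` stands in for whatever call returns the stack status.

```python
import time


def wait_for_status(get_status, done=("CREATE_COMPLETE",),
                    failed=("CREATE_FAILED",), timeout=300, interval=5,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it reports a final state or the deadline passes.

    Returns the final status on success, raises RuntimeError on a failed
    state and TimeoutError if `timeout` seconds elapse first.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in done:
            return status
        if status in failed:
            raise RuntimeError("stack entered %s" % status)
        if clock() >= deadline:
            raise TimeoutError("still %s after %ss" % (status, timeout))
        sleep(interval)
```

With `timeout=300`, a stack that completes after 420 seconds is reported as a timeout even though it would have succeeded, which matches what therve observed. Raising the limit hides the symptom but not the underlying slowness.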
panda | slagle: therve: this is the redis bug again. | 15:07 |
therve | panda, Is it? It completed successfully though | 15:08 |
panda | therve: http://logs.openstack.org/13/392313/7/check-tripleo/gate-tripleo-ci-centos-7-ovb-ha/f383107/logs/overcloud-controller-0/var/log/audit/audit.txt.gz | 15:08 |
therve | d0ugal, it looks like you may not get the hidden value, but I didn't try | 15:08 |
panda | therve: search for "denied" | 15:08 |
d0ugal | shardy: hrm, I'll have to try. Trying to find a sensible way to get the passwords for Mitaka users upgrading, my current approach isn't sensible. | 15:09 |
d0ugal | therve: dang, shardy suggested looking in the resource-metadata or deployment-show commands - so I can try that if I can navigate my way in | 15:09 |
therve | panda, Wouldn't that break things though? | 15:09 |
*** ooolpbot has joined #tripleo | 15:10 | |
ooolpbot | URGENT TRIPLEO TASKS NEED ATTENTION | 15:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1637961 | 15:10 |
openstack | Launchpad bug 1637961 in tripleo "periodic HA master job pingtest times out" [Critical,In progress] - Assigned to Gabriele Cerami (gcerami) | 15:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1638350 | 15:10 |
*** ooolpbot has quit IRC | 15:10 | |
openstack | Launchpad bug 1638350 in tripleo "pingtest failing on OVB jobs to create Cinder volume and Nova server" [Critical,In progress] - Assigned to Gabriele Cerami (gcerami) | 15:10 |
slagle | ok, so gnocchi-metricd is consuming all CPU | 15:10 |
*** jistr is now known as jistr|biab | 15:10 | |
slagle | panda: yea, sounds like the same issue i guess | 15:11 |
therve | slagle, gnocchi is connecting to redis, so that might be it | 15:11 |
*** chandankumar has quit IRC | 15:12 | |
panda | therve: yep, metricd eating all CPU, everything slows down. | 15:12 |
therve | OK | 15:12 |
panda | therve: the weird thing is that deploy reports CREATE_COMPLETE even if redis resource is stopped in all the nodes. | 15:14 |
*** jistr|biab is now known as jistr | 15:14 | |
openstackgerrit | Merged openstack/tripleo-heat-templates: Fixup the start of swift services https://review.openstack.org/393760 | 15:15 |
therve | panda, I think very few services depend on redis, but if it makes things slow it can impact the test stack | 15:15 |
EmilienM | jaosorior_mtg: can you remove -1 on https://review.openstack.org/#/c/392647/ please? | 15:15 |
therve | Should https://review.openstack.org/#/c/392703/ land then? | 15:16 |
jaosorior_mtg | EmilienM: I am not convinced that that is the real issue, which is why I did the -1 there. | 15:16 |
panda | therve: I'm waiting for it to pass the gates | 15:17 |
*** hjensas has quit IRC | 15:17 | |
jaosorior_mtg | EmilienM: panda's fix even makes more sense to me. | 15:17 |
slagle | panda: it looks like the liberty jobs have already failed on the patch | 15:17 |
slagle | is that a known issue? | 15:17 |
shardy | d0ugal: aha, you don't need to fish in the stack parameters at all | 15:17 |
shardy | d0ugal: try openstack stack environment show overcloud | 15:18 |
shardy | I think I even added that API but forgot about it :) | 15:18 |
slagle | ImportError: cannot import name secretutils | 15:18 |
panda | slagle: is there a way to see how the liberty gate failed before zuul reports back to the review? | 15:18 |
EmilienM | panda: http://status.openstack.org/zuul/ | 15:18 |
slagle | panda: yea, you can look here: http://status.openstack.org/zuul/ | 15:18 |
slagle | panda: type in the patch # in the filter | 15:19 |
slagle | 392703 | 15:19 |
EmilienM | looks like it could be a backport in liberty? let me check | 15:19 |
panda | slagle: EmilienM, when the build has already failed, telnet to host is impossible | 15:19 |
*** sudipto_ has quit IRC | 15:19 | |
*** sudipto has quit IRC | 15:19 | |
EmilienM | looks like something in https://github.com/openstack/osprofiler | 15:20 |
EmilienM | panda: right but you have logs | 15:20 |
slagle | filed bug: https://bugs.launchpad.net/tripleo/+bug/1639274 | 15:20 |
openstack | Launchpad bug 1639274 in tripleo "ImportError: cannot import name secretutils" [Undecided,New] | 15:20 |
openstackgerrit | Merged openstack/tripleo-heat-templates: Use correct password for keystone bootstrap https://review.openstack.org/393455 | 15:20 |
panda | EmilienM: yes, my question is where I can see the log of that build when telnet is not available and logs links are not in the review | 15:20 |
slagle | panda: from here: http://status.openstack.org/zuul/ | 15:21 |
slagle | use the filter, and then click on the failed job name | 15:21 |
EmilienM | slagle: it looks like a packaging issue with either osprofiler or oslo.utils, I'm investigating | 15:21 |
panda | slagle: All the links I have in zuul for the builds on my review are telnet URIs ... | 15:23 |
d0ugal | shardy: oh, wow. That is perfect. Thanks! | 15:23 |
panda | slagle: nevermind, was looking at the wrong page | 15:23 |
d0ugal | therve: FYI, this does it: openstack stack environment show overcloud | 15:23 |
EmilienM | slagle: https://github.com/openstack/osprofiler/commit/a3accb74512b5355a020c876190e3d1a44455b6c | 15:24 |
EmilienM | I think we don't pin osprofiler in RDO/liberty | 15:24 |
EmilienM | let me check | 15:24 |
*** rcernin has quit IRC | 15:25 | |
*** dsariel has quit IRC | 15:25 | |
*** rcernin has joined #tripleo | 15:27 | |
EmilienM | slagle: bingo | 15:28 |
EmilienM | I'm sending a patch in rdo | 15:28 |
openstackgerrit | Merged openstack/puppet-tripleo: Add port to rabbitmq node ip list https://review.openstack.org/392996 | 15:28 |
panda | oh, my. This review is a bug magnet. memcache, mariadb, osprofiler. We'll fix internet before landing this | 15:29 |
openstackgerrit | Thiago da Silva proposed openstack/tripleo-heat-templates: set url_base option in static web middleware https://review.openstack.org/392918 | 15:29 |
*** absubram has joined #tripleo | 15:32 | |
EmilienM | panda: pingtest is running :) | 15:34 |
EmilienM | let's see | 15:34 |
panda | EmilienM: I'm glued to the monitor ... | 15:34 |
*** jprovazn has quit IRC | 15:35 | |
panda | EmilienM: why doesn't osprofiler have stable/liberty, and we're using master even for newton ? | 15:36 |
*** yamahata has joined #tripleo | 15:36 | |
panda | EmilienM: tenant-stack creation should take ~3 minutes | 15:37 |
EmilienM | panda: I don't know about osprofiler | 15:37 |
EmilienM | I never contributed to it yet | 15:37 |
panda | \o/ woohoo | 15:38 |
EmilienM | Overcloud pingtest, heat stack CREATE_COMPLETE | 15:38 |
EmilienM | well, we set selinux in permissive | 15:38 |
panda | yes | 15:38 |
*** Guest13194 is now known as tesseract- | 15:38 | |
panda | but it took some attempts to get it right | 15:39 |
EmilienM | panda: please revert this change as soon as we have resource-agent package | 15:39 |
panda | EmilienM: of course | 15:39 |
panda | maybe at this point I should've patched the resource agent directly | 15:40 |
therve | d0ugal, Nice | 15:40 |
bogdando | so folks, the patch https://review.openstack.org/#/c/393720 seems almost working for me, almost as it fails randomly on *different* places :) please comment if you think the case is interesting at all | 15:40 |
therve | shardy, So I started at looking at using zaqar for events. What's the procedure to propose something, a spec, a spec-lite bug? | 15:41 |
bogdando | failures are mostly related to intermittent libguestfs / virt-customize errors | 15:41 |
shardy | therve: we've mostly been using blueprints for features but a spec lite bug is OK too | 15:42 |
EmilienM | panda: what does the patch look like? any example? | 15:42 |
shardy | therve: if you feel a spec will be useful you're free to propose one, but we've not enforced them for smaller features | 15:42 |
therve | shardy, OK. It should be a small one, it's relatively self contained | 15:42 |
panda | EmilienM: https://github.com/ClusterLabs/resource-agents/commit/ba604dde3a58e1aaf4218487fcf40543a0e79db2 | 15:43 |
EmilienM | panda: sounds tricky ;) but if you can do it... The thing is, you need to make sure to apply this patch *after* the package install | 15:44 |
panda | EmilienM: this means modifying some manifests, I think it's simpler what I did at this time .. and It's easier to remove once all landed. It's not touching any project | 15:45 |
*** artom has quit IRC | 15:45 | |
*** artom_ has joined #tripleo | 15:46 | |
EmilienM | slagle: can you review https://review.openstack.org/#/c/393797/ please? | 15:46 |
EmilienM | panda: I see nonha job failing though | 15:46 |
slagle | EmilienM: done | 15:47 |
panda | jaosorior_mtg: http://logs.openstack.org/03/392703/9/check-tripleo/gate-tripleo-ci-centos-7-ovb-nonha/a72e0e2/console.html#_2016-11-04_15_32_47_857988 working around redis bug did not fix cinder problem it seems | 15:47 |
*** dsariel has joined #tripleo | 15:48 | |
jaosorior_mtg | panda: so the selinux issue didn't fix the cinder issue? | 15:48 |
jaosorior_mtg | *selinux fix | 15:48 |
panda | EmilienM: jaosorior_mtg , ClientException: resources.volume1: Gateway Time-out (HTTP 504) | 15:49 |
jaosorior_mtg | or workaround | 15:49 |
slagle | EmilienM: should we go ahead and merge https://review.openstack.org/#/c/392703/ anyway? | 15:49 |
EmilienM | it doesn't sound like | 15:49 |
EmilienM | slagle: wait | 15:49 |
slagle | ok | 15:49 |
EmilienM | slagle: nonha job failed for cinder timeout | 15:49 |
therve | panda, http://logs.openstack.org/03/392703/9/check-tripleo/gate-tripleo-ci-centos-7-ovb-nonha/a72e0e2/logs/overcloud-controller-0/var/log/gnocchi/metricd.txt.gz redis issue still here? | 15:50 |
openstackgerrit | Pradeep Kilambi proposed openstack/tripleo-heat-templates: swift/proxy: remove swift::proxy::ceilometer::rabbit_host https://review.openstack.org/391864 | 15:50 |
openstackgerrit | Dougal Matthews proposed openstack/tripleo-common: Use parameters from existing Heat stack if it already exists https://review.openstack.org/393831 | 15:51 |
panda | therve: I don't see any denied in audit.log | 15:52 |
therve | panda, Yeah, but it still can't connect somehow | 15:52 |
therve | http://logs.openstack.org/03/392703/9/check-tripleo/gate-tripleo-ci-centos-7-ovb-nonha/a72e0e2/logs/overcloud-controller-0/var/log/redis/redis.txt.gz is fishy as well | 15:52 |
d0ugal | rbrady, marios, bnemec, chem: I *think* this is a better solution. https://review.openstack.org/#/c/393831/ - thoughts? (NOTE: not tested yet, doing that now) | 15:53 |
*** rbowen has quit IRC | 15:54 | |
*** dsariel has quit IRC | 15:54 | |
panda | therve: damn ... pacemaker is not even used in nonha | 15:54 |
matbu | d0ugal: +1 agree with patching tripleo-common instead | 15:54 |
matbu | d0ugal: looking at the review atm | 15:54 |
openstackgerrit | Merged openstack/puppet-tripleo: pacemaker/mysql: wait step 2 to remove default accounts https://review.openstack.org/393317 | 15:55 |
*** athomas has quit IRC | 15:55 | |
d0ugal | matbu: Thanks - it is much much simpler | 15:55 |
*** numans has quit IRC | 15:55 | |
matbu | d0ugal: yep | 15:55 |
panda | therve: with pacemaker it's the resource agent that creates the /var/run/redis dir | 15:55 |
panda | not sure what's supposed to create it without pacemaker | 15:56 |
openstackgerrit | Dougal Matthews proposed openstack/tripleo-common: Use parameters from existing Heat stack if it already exists https://review.openstack.org/393831 | 15:56 |
*** rcernin has quit IRC | 15:56 | |
openstackgerrit | Merged openstack/puppet-tripleo: Make sure keepalived is restarted before haproxy. https://review.openstack.org/393361 | 15:56 |
d0ugal | shardy: If you have time to look at https://review.openstack.org/#/c/393831 I would really appreciate it | 15:57 |
openstackgerrit | Juan Antonio Osorio Robles proposed openstack-infra/tripleo-ci: CINDER TEST https://review.openstack.org/393687 | 15:57 |
*** dprince has quit IRC | 15:58 | |
social | slagle: EmilienM: can you have a look on https://review.openstack.org/#/c/392260/ | 15:58 |
*** pcaruana has quit IRC | 15:58 | |
EmilienM | social: recheck | 15:59 |
*** tesseract- has quit IRC | 15:59 | |
panda | how did it work until now ? | 15:59 |
marios | d0ugal: you must have been sneezing a few minutes ago, we were talking about you | 16:00 |
marios | (lifecycle scrum) talking about the overcloudrc issue | 16:00 |
marios | d0ugal: will look thanks | 16:00 |
*** athomas has joined #tripleo | 16:01 | |
openstackgerrit | Merged openstack/puppet-tripleo: Set redis file descriptor limit when run via pacemaker https://review.openstack.org/393343 | 16:02 |
d0ugal | marios: haha, oh dear, I can't imagine you were saying anything good :) | 16:03 |
marios | d0ugal: course not! | 16:03 |
*** ebarrera has quit IRC | 16:03 | |
openstackgerrit | Merged openstack/tripleo-heat-templates: gnocchi statsd should be able to send data to port 8125 https://review.openstack.org/393296 | 16:05 |
ccamacho | jaosorior dude upstream updates are brooooken mmm a mistral workflow snag I believe... | 16:05 |
d0ugal | ccamacho: related to the passwords? | 16:06 |
ccamacho | d0ugal there is an env file generated empty from the jinja templates and it's breaking my "update"; here is part of the logs: http://paste.openstack.org/show/587908/ | 16:08 |
d0ugal | ccamacho: oh, weird - I've not seen that one before. | 16:08 |
ccamacho | just reprovisioned the server, and launched the deployment command 2 times.. | 16:09 |
ccamacho | the second time failed with the mistral error | 16:09 |
*** ooolpbot has joined #tripleo | 16:10 | |
ooolpbot | URGENT TRIPLEO TASKS NEED ATTENTION | 16:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1637961 | 16:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1638350 | 16:10 |
*** ooolpbot has quit IRC | 16:10 | |
openstack | Launchpad bug 1637961 in tripleo "periodic HA master job pingtest times out" [Critical,In progress] - Assigned to Gabriele Cerami (gcerami) | 16:10 |
openstack | Launchpad bug 1638350 in tripleo "pingtest failing on OVB jobs to create Cinder volume and Nova server" [Critical,In progress] - Assigned to Gabriele Cerami (gcerami) | 16:10 |
*** jaosorior_mtg is now known as jaosorior | 16:11 | |
jaosorior | ccamacho: fuck, so they're broken again? | 16:12 |
jaosorior | :( | 16:12 |
d0ugal | ccamacho: Can you share the mistral executor logs? | 16:12 |
openstackgerrit | Charlie Llewellyn proposed openstack/tripleo-heat-templates: Add method to retry registration as we expect occasional network issues https://review.openstack.org/386529 | 16:13 |
panda | there's nothing in redis that directly supports having a socket in tmpfs ... | 16:13 |
*** tiswanso has joined #tripleo | 16:14 | |
ccamacho | d0ugal sure let me re-reproduce the error to confirm and Ill get the executor logs. | 16:15 |
panda | The only workaround that I can think of is adding an ExecStartPre in systemd that creates the dir | 16:15 |
panda | but with hardcoded path | 16:15 |
*** akshai has quit IRC | 16:15 | |
*** percevalbot has quit IRC | 16:17 | |
*** percevalbot has joined #tripleo | 16:19 | |
*** saneax-_-|AFK is now known as saneax | 16:20 | |
*** dprince has joined #tripleo | 16:21 | |
*** dsariel has joined #tripleo | 16:26 | |
*** abehl has joined #tripleo | 16:27 | |
panda | EmilienM: is it possible to use something like this in erb files? <%= File.dirname(@unixsocket) %> | 16:28 |
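The workaround panda describes (derive the directory from the socket path instead of hardcoding it, then have systemd create it before redis starts) could be sketched roughly as below. This is only an illustration in Python, not the puppet-redis change itself: the function name `render_dropin`, the socket filename `redis.sock`, and the `/usr/bin/mkdir` path are assumptions, and whether the `ExecStartPre` command has permission to create a directory under /var/run depends on the unit's `User=` setting, which this sketch does not address.

```python
import posixpath


def render_dropin(unixsocket):
    """Render a systemd drop-in that creates the socket's parent directory.

    Mirrors the idea of File.dirname(@unixsocket) from the ERB snippet:
    the directory is derived from the socket path, not hardcoded.
    """
    socket_dir = posixpath.dirname(unixsocket)
    if not socket_dir or socket_dir == "/":
        raise ValueError("socket path needs a parent directory: %r" % unixsocket)
    # /usr/bin/mkdir is an assumed location for mkdir on the target host.
    return "[Service]\nExecStartPre=/usr/bin/mkdir -p {}\n".format(socket_dir)


if __name__ == "__main__":
    # Hypothetical socket path; only the /var/run/redis directory is from the log.
    print(render_dropin("/var/run/redis/redis.sock"))
```

Because /var/run is tmpfs, the directory has to be recreated on every boot, which is why the sketch puts the mkdir in the unit rather than doing it once at install time.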
*** bnemec is now known as beekneemech | 16:28 | |
openstackgerrit | Julie Pichon proposed openstack/python-tripleoclient: Pass clients to get the get_password function https://review.openstack.org/393192 | 16:29 |
*** panda is now known as panda|bbl | 16:29 | |
panda|bbl | be back in ~3hours | 16:29 |
*** jcoufal has quit IRC | 16:29 | |
*** jcoufal has joined #tripleo | 16:30 | |
*** dsariel has quit IRC | 16:33 | |
openstackgerrit | Dougal Matthews proposed openstack/tripleo-common: Use parameters from existing Heat stack if it already exists https://review.openstack.org/393831 | 16:34 |
*** rbrady is now known as rbrady-run | 16:35 | |
ayoung | No op redeploy started 15:31:36Z ended 16:09:59Z one controller, one compute, virtualized env | 16:35 |
ayoung | 10 minutes were between 2016-11-04 15:59:20Z [NovaComputeDeployment]: SIGNAL_COMPLETE Unknown | 16:36 |
ayoung | and 2016-11-04 16:09:38Z [overcloud-AllNodesDeploySteps-tg2bq2qb4ogw-ControllerDeployment_Step5-duronstn7yce.0]: SIGNAL_IN_PROGRESS Signal: deployment d3ddc624-372d-4379-b0d1-219e4d9a4134 succeeded | 16:36 |
ayoung | Something is hanging. How do I debug? | 16:36 |
ayoung | same thing earlier in the deploy: 10 minutes between 2016-11-04 15:46:31Z [NovaComputeDeployment]: SIGNAL_COMPLETE Unknown and SIGNAL_IN_PROGRESS Signal: deployment caa21456-320d-418b-9392-4a8f2d171a44 succeeded | 16:37 |
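One way to see where the time goes in a run like ayoung's is to sort the Heat event lines by timestamp and list the largest gaps between consecutive events. The sketch below assumes each event line starts with a `YYYY-MM-DD HH:MM:SSZ` timestamp, as in the lines quoted above; the helper name `event_gaps` is hypothetical and not part of any OpenStack tool.

```python
from datetime import datetime


def event_gaps(lines):
    """Return (gap_seconds, earlier_line, later_line) tuples, largest gap first."""
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # The timestamp is the first 20 characters, e.g. "2016-11-04 15:59:20Z".
        stamp = datetime.strptime(line[:20], "%Y-%m-%d %H:%M:%SZ")
        events.append((stamp, line))
    events.sort(key=lambda e: e[0])
    gaps = []
    for (t0, l0), (t1, l1) in zip(events, events[1:]):
        gaps.append(((t1 - t0).total_seconds(), l0, l1))
    return sorted(gaps, key=lambda g: g[0], reverse=True)


if __name__ == "__main__":
    # The two event lines quoted in the log above.
    sample = [
        "2016-11-04 15:59:20Z [NovaComputeDeployment]: SIGNAL_COMPLETE Unknown",
        "2016-11-04 16:09:38Z [overcloud-AllNodesDeploySteps-tg2bq2qb4ogw-"
        "ControllerDeployment_Step5-duronstn7yce.0]: SIGNAL_IN_PROGRESS "
        "Signal: deployment d3ddc624-372d-4379-b0d1-219e4d9a4134 succeeded",
    ]
    seconds, before, after = event_gaps(sample)[0]
    print(seconds)  # 618.0, i.e. a little over 10 minutes
```

This only shows which deployment the stack was waiting on; finding out what that deployment was doing still needs its output or a look at the node itself.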
*** cylopez has quit IRC | 16:41 | |
*** zoli is now known as zoli|gone | 16:42 | |
*** chandankumar has joined #tripleo | 16:42 | |
shardy | ayoung: log onto the node and ps ax | grep heat - see if there's a heat-config hook running and stuck | 16:43 |
*** jlinkes has quit IRC | 16:43 | |
ayoung | shardy, ok...lets see... | 16:43 |
*** jaosorior has quit IRC | 16:43 | |
shardy | if there is you can copy the command, kill the hook and re-run puppet with --debug to see where it's stuck | 16:43 |
shardy | ayoung: note if you do that, you'll need to note the step it's stuck on and pass that to puppet | 16:44 |
shardy | e.g by adding step: 3 to a hieradata file or something | 16:44 |
ayoung | shardy, so the deploy finally finished | 16:44 |
ayoung | I'm on the controller, and Heat is running as a service. Did you mean I should log in to a compute? | 16:44 |
shardy | well, log in to whichever node has a stuck Deployment resource | 16:45 |
*** yamahata has quit IRC | 16:45 | |
ayoung | shardy, so kick it off again and ssh in when it hangs. But it did actually complete. Is there some log I should look at from the last time first? | 16:46 |
ayoung | /var/log/os-apply-config.log ? | 16:46 |
shardy | ayoung: you can look at the stdout from the deployment e.g | 16:47 |
ayoung | shardy, that is what I pasted above | 16:47 |
shardy | openstack software deployment output show caa21456-320d-418b-9392-4a8f2d171a44 --all --long | 16:47 |
shardy | no that's just the events | 16:47 |
ayoung | http://paste.openstack.org/show/587912/ | 16:47 |
ayoung | ah...ok | 16:47 |
shardy | sounds like you want to profile the puppet run to make it faster? | 16:48 |
ayoung | gah just lost all output in this window...one sec | 16:48 |
ayoung | shardy, yeah, or at least understand where the time is going | 16:48 |
EmilienM | slagle: can you please approve https://review.openstack.org/#/c/393769/ ? | 16:50 |
openstackgerrit | Emilien Macchi proposed openstack/tripleo-heat-templates: Include redis/mongo hiera when using pacemaker https://review.openstack.org/393318 | 16:51 |
openstackgerrit | Emilien Macchi proposed openstack/tripleo-heat-templates: Remove duplicate metadata keys from nova-api.yaml https://review.openstack.org/393327 | 16:51 |
*** mcornea has quit IRC | 16:51 | |
*** ayoung has quit IRC | 16:51 | |
openstackgerrit | James Slagle proposed openstack/python-tripleoclient: Fix nodes count check on stack-update with custom roles https://review.openstack.org/393855 | 16:51 |
*** ayoung has joined #tripleo | 16:52 | |
*** paramite has quit IRC | 16:54 | |
*** bana_k has joined #tripleo | 17:03 | |
gfidente | panda|bbl so I don't see any particular slowdown in HA job with three controllers | 17:03 |
gfidente | and pingtest succeeded | 17:03 |
gfidente | I can see initial request for cinder taking a bit longer than the others, so wsgi preload might help | 17:03 |
gfidente | but I am inclined to think that this isn't the root cause | 17:03 |
gfidente | as the volume resource in heat goes into COMPLETE state | 17:04 |
gfidente | it's when we boot the nova guest that things fail, probably because of CPU overload | 17:04 |
*** flepied has joined #tripleo | 17:04 | |
ccamacho | d0ugal reported here: https://bugs.launchpad.net/tripleo/+bug/1639302 | 17:04 |
openstack | Launchpad bug 1639302 in tripleo "Started Mistral Workflow fails due to malformed template" [Undecided,New] | 17:04 |
*** hewbrocca is now known as hewbrocca_afk | 17:07 | |
d0ugal | ccamacho: Thanks. | 17:08 |
shardy | ayoung: Note that you can get the puppet manifest for a role from heat, e.g | 17:10 |
shardy | openstack software config list | grep ComputePuppetConfigImpl | 17:10 |
shardy | openstack software config show <ID> | 17:10 |
shardy | which makes it fairly easy to cut/paste then run on the node manually | 17:10 |
ayoung | shardy, oh, I did not realize, but it makes sense | 17:10 |
shardy | I've been thinking we should write the manifest on each node to make this easier | 17:10 |
ayoung | shardy, yes | 17:10 |
ayoung | shardy, so I was thinking it should be like this: | 17:11 |
shardy | ayoung: the other option is when the heat hook is running, you can copy the <deploymentid>.pp file somewhere | 17:11 |
shardy | then run it again after the heat hook finishes | 17:11 |
ayoung | 1. User calls an API to update the manifest in heat. That is saved as a delta | 17:11 |
ayoung | 2. User can see both the old and new state in heat | 17:11 |
shardy | (in both cases you have to add a step value to hiera) | 17:11 |
ayoung | 3. User can run the deploy in "test only" mode | 17:11 |
ayoung | finally, apply.... | 17:11 |
*** fragatina has quit IRC | 17:11 | |
ayoung | but even then, we need to be able to trace and see the log as it happens to diagnose the unpredicted breakages, and to fix a broken redeploy | 17:12 |
ayoung | is a workflow like that possible? | 17:12 |
openstackgerrit | Honza Pokorny proposed openstack/tripleo-ui: Redirect user to login page when token expires https://review.openstack.org/392589 | 17:12 |
shardy | ayoung: probably yes, it'd take a bit of work to wire it all together tho | 17:12 |
*** jpich has quit IRC | 17:13 | |
shardy | the biggest missing piece is a way for the heat hook to pass line-by-line data back instead of waiting for puppet to finish (or hang..) | 17:13 |
shardy | ayoung: if we write the manifest then exist, it'd be pretty easy to allow folks to run it directly or via another tool tho | 17:13 |
shardy | s/exist/exit | 17:14 |
*** jprovazn has joined #tripleo | 17:17 | |
openstackgerrit | Giulio Fidente proposed openstack/tripleo-common: Install Heat's Ansible agent in the overcloud-full image https://review.openstack.org/393872 | 17:20 |
*** tremble has quit IRC | 17:22 | |
*** ohamada has quit IRC | 17:23 | |
openstackgerrit | Merged openstack/tripleo-heat-templates: Updated Nuage neutron plugin name https://review.openstack.org/391923 | 17:23 |
*** saneax is now known as saneax-_-|AFK | 17:23 | |
*** rbowen has joined #tripleo | 17:23 | |
*** florianf has quit IRC | 17:23 | |
*** florianf has joined #tripleo | 17:29 | |
openstackgerrit | Alfredo Moralejo proposed openstack/tripleo-quickstart: Increase haproxy timeouts https://review.openstack.org/393876 | 17:30 |
slagle | amoralej: should we just fix that properly? instead of a ci workaround? | 17:32 |
*** lucasagomes is now known as lucas-afk | 17:32 | |
amoralej | slagle, i'm open to any suggestion, but i think slow hardware in rdo-ci is behind it | 17:32 |
amoralej | in rdo-ci GET heat api calls can take more than 60 seconds to respond | 17:33 |
amoralej | while in better hardware it takes 12seconds max | 17:33 |
amoralej | we are trying to tune things in rdo-ci in parallel | 17:34 |
*** ebarrera has joined #tripleo | 17:35 | |
*** lmiccini has quit IRC | 17:36 | |
*** d0ugal has quit IRC | 17:39 | |
*** rasca has quit IRC | 17:40 | |
slagle | sounds like shardy saw the same issue in his env. makes me think the root cause might be something other than slow hardware since this appears to have just come up | 17:40 |
*** jkilpatr has quit IRC | 17:42 | |
*** jkilpatr has joined #tripleo | 17:43 | |
shardy | Yeah, I didn't have much luck figuring out the root cause tho unfortunately | 17:44 |
shardy | it was definitely the post to swift which caused it tho | 17:44 |
*** creepy_owlet is now known as dtantsur|afk | 17:44 | |
*** ebarrera has quit IRC | 17:44 | |
*** ebarrera has joined #tripleo | 17:47 | |
*** shardy has quit IRC | 17:51 | |
*** ebarrera has quit IRC | 17:54 | |
*** charliejllewelly has quit IRC | 18:00 | |
*** akshai has joined #tripleo | 18:00 | |
*** yamahata has joined #tripleo | 18:03 | |
*** rbrady-run is now known as rbrady | 18:03 | |
openstackgerrit | David Moreau Simard proposed openstack/tripleo-quickstart: Properly reload kvm module when trying to set up nested virtualization https://review.openstack.org/386012 | 18:07 |
amoralej | shardy, slagle it may be different issues, have you seen https://bugzilla.redhat.com/show_bug.cgi?id=1320164#c21 ? | 18:07 |
openstack | bugzilla.redhat.com bug 1320164 in openstack-swift "2 tests for object storage failed with BadStatusLine" [Medium,Post] - Assigned to zaitcev | 18:07 |
*** links has joined #tripleo | 18:09 | |
*** links has quit IRC | 18:10 | |
openstackgerrit | Emilien Macchi proposed openstack/tripleo-heat-templates: Add preload wsgi script option for cinder https://review.openstack.org/393790 | 18:11 |
*** liverpooler has quit IRC | 18:14 | |
*** jpena is now known as jpena|off | 18:16 | |
openstackgerrit | Jiri Stransky proposed openstack/tripleo-heat-templates: Updated Nuage neutron plugin name https://review.openstack.org/393892 | 18:18 |
*** stevemul has left #tripleo | 18:20 | |
openstackgerrit | Merged openstack/python-tripleoclient: Fix nodes count check on stack-update with custom roles https://review.openstack.org/392313 | 18:20 |
*** sshnaidm has quit IRC | 18:24 | |
*** sshnaidm has joined #tripleo | 18:27 | |
*** rhefner has quit IRC | 18:27 | |
openstackgerrit | Bob Fournier proposed openstack/diskimage-builder: Set "NM_CONTROLLED=no" for ifcfg files in dhcp-all-interfaces.sh https://review.openstack.org/393897 | 18:28 |
EmilienM | slagle: can you remove -1 on https://review.openstack.org/#/c/393855/ please? | 18:30 |
openstackgerrit | Bob Fournier proposed openstack/diskimage-builder: Set "NM_CONTROLLED=no" for ifcfg files in dhcp-all-interfaces.sh https://review.openstack.org/393897 | 18:32 |
*** pblaho has quit IRC | 18:33 | |
panda|bbl | bkero: what's the upstream for puppet-redis ? | 18:35 |
*** mgould is now known as mgould|afk | 18:37 | |
*** milan has quit IRC | 18:38 | |
*** saneax-_-|AFK has quit IRC | 18:38 | |
EmilienM | panda|bbl: https://github.com/arioch/puppet-redis | 18:41 |
EmilienM | I think | 18:41 |
EmilienM | let me check | 18:41 |
EmilienM | yes | 18:41 |
panda|bbl | EmilienM: thanks. | 18:42 |
EmilienM | panda|bbl: FYI it's here https://github.com/redhat-openstack/rdoinfo/blob/master/rdo.yml#L324 | 18:42 |
panda|bbl | EmilienM: ok. I'm not sure how any of this worked, but if we really want redis socket to be in /var/run, something has to be changed. The best way is to change the systemd service file. I'm not sure if it's better to propose a change upstream or put a patch in the package. Maybe both and then remove the patch when upstream lands ... | 18:44 |
*** jerrygb has quit IRC | 18:45 | |
Slower | is there some magic to killing a stack while it's being started? | 18:45 |
Slower | every time I try it ends up taking forever | 18:45 |
beagles | bandini: some other info on the whole "what creates this route" question of yesterday | 18:45 |
Slower | eg while in CREATE_IN_PROGRESS | 18:45 |
*** ebalduf has quit IRC | 18:45 | |
*** jerrygb has joined #tripleo | 18:45 | |
bandini | beagles: oh I found out what happened | 18:45 |
bandini | beagles: your pointer gave me the right tip | 18:46 |
*** akshai has quit IRC | 18:46 | |
beagles | bandini: ok cool.. fwiw: if net-config-noop.yaml is spec'd in the resource registry, something else is used... | 18:46 |
bandini | beagles: basically it was a wrongly configured EC2MetadataIP and ControlPlaneDefaultRoute. So the templates were adding routes to 169.254... that were impossible so the kernel would not create them and everything broke apart | 18:47 |
bandini | beagles: ah cool | 18:47 |
beagles | bandini: ahhh.. yeah, that'd do it :) | 18:48 |
bandini | beagles: I'd have to check if validations would have caught this one and file a bug ;) | 18:48 |
bandini | it's not entirely trivial to debug because the deployment simply times out | 18:49 |
beagles | bandini: ugh | 18:49 |
*** jkilpatr has quit IRC | 18:50 | |
beagles | bandini: speaks to something that's been brewing for me.. I'd like us to explore better separation between the essential "infrastructure" and overcloud networking elements. For example, right now we configure neutron in the overcloud with mappings to the same bridge we use for the control plane | 18:52 |
beagles | bandini: we could have them as two distinct bridges and link them with patch ports | 18:52 |
beagles | bandini: so if we screw up configuration of a neutron service in the overcloud, it is less likely to mess up the networking for the control plane | 18:53 |
* beagles is just thinking out loud | 18:53 | |
bandini | beagles: sounds like worth exploring, yes | 18:53 |
bandini | beagles: I need to play and see how much we can cover of these issues via validations | 18:54 |
beagles | bandini: cool | 18:54 |
bandini | the case I looked at the other day, screamed for a warning saying "dude, not going to work" ;) | 18:54 |
beagles | :) | 18:55 |
beekneemech | Now that we have the api, we should stop requiring users to specify the route to the undercloud api endpoints. | 18:55 |
beekneemech | We have a thing that knows what address the undercloud is using and can pass that into the overcloud config automatically. | 18:55 |
bandini | yeah that would definitely be an improvement as well | 18:56 |
*** jkilpatr has joined #tripleo | 19:03 | |
*** dsariel has joined #tripleo | 19:07 | |
*** fzdarsky__ is now known as fzdarsky|afk | 19:08 | |
EmilienM | slagle: can you approve https://review.openstack.org/#/c/393770/ please? | 19:08 |
slagle | EmilienM: the master patch hasn't merged | 19:09 |
EmilienM | my bad | 19:09 |
openstackgerrit | Emilien Macchi proposed openstack/puppet-tripleo: swift/proxy: configure rabbitmq properly https://review.openstack.org/391862 | 19:10 |
EmilienM | rebasing it/ recheck -- failed to pass CI | 19:10 |
openstackgerrit | Martin André proposed openstack/tripleo-heat-templates: Containerized Services for Composable Roles https://review.openstack.org/330659 | 19:10 |
openstackgerrit | Merged openstack/tripleo-heat-templates: Update openstack-puppet-modules dependencies https://review.openstack.org/393797 | 19:10 |
slagle | EmilienM: should we w-1 the backport? | 19:10 |
slagle | someone might come along and merge it accidentally | 19:11 |
openstackgerrit | Ian Main proposed openstack/tripleo-heat-templates: Containerized Services for Composable Roles https://review.openstack.org/330659 | 19:11 |
EmilienM | slagle: you can, I won't otherwise you'll be stuck | 19:11 |
EmilienM | slagle: or -1 it | 19:11 |
EmilienM | it's visible enough | 19:11 |
*** rbrady is now known as rbrady-afk | 19:11 | |
*** amoralej is now known as amoralej|off | 19:11 | |
slagle | oh, well if i decide to never come back to work after today, i guess it may be stuck | 19:11 |
EmilienM | but won't block if we want to approve | 19:11 |
EmilienM | lol | 19:11 |
EmilienM | slagle: please come back | 19:12 |
slagle | ok, yea, i just -1'd | 19:12 |
slagle | if i'm not around when the master patch lands, someone else can push it through if they so desire | 19:12 |
EmilienM | sounds good | 19:12 |
openstackgerrit | James Slagle proposed openstack/python-tripleoclient: Fix nodes count check on stack-update with custom roles https://review.openstack.org/393855 | 19:13 |
slagle | EmilienM: can you approve that ^ :) | 19:14 |
EmilienM | slagle: -2 | 19:14 |
slagle | it fully passed CI and all i did was update commit message | 19:14 |
EmilienM | yeah | 19:14 |
slagle | thanks | 19:14 |
*** jerrygb has quit IRC | 19:14 | |
*** saneax-_-|AFK has joined #tripleo | 19:16 | |
*** jerrygb has joined #tripleo | 19:16 | |
EmilienM | slagle: the list of things we know critical is getting smaller, wdyt about doing an etherpad maybe? or a gerrit topic | 19:21 |
slagle | EmilienM: that might help | 19:26 |
EmilienM | slagle: etherpad.openstack.org/p/tripleo-newton-2 | 19:27 |
*** jcoufal has quit IRC | 19:28 | |
*** akshai has joined #tripleo | 19:34 | |
*** akshai has quit IRC | 19:35 | |
*** fragatina has joined #tripleo | 19:44 | |
openstackgerrit | Ian Main proposed openstack/tripleo-heat-templates: Containerized Services for Composable Roles https://review.openstack.org/330659 | 19:44 |
panda|bbl | first hastily made attempt to solve redis issue even without a resource agent: https://github.com/arioch/puppet-redis/pull/131 | 19:44 |
panda|bbl | but if this ever merges, it will take ages .... and we need a workaround anyway ... | 19:45 |
panda|bbl | the easiest thing is really use just /tmp/redis.sock as socket. Any suggestion appreciated | 19:47 |
*** chandankumar has quit IRC | 19:54 | |
openstackgerrit | Merged openstack/puppet-tripleo: Create heat user in keystone profile https://review.openstack.org/393000 | 19:57 |
openstackgerrit | Merged openstack/tripleo-heat-templates: Remove duplicate metadata keys from nova-api.yaml https://review.openstack.org/393327 | 19:59 |
*** jerrygb_ has joined #tripleo | 20:01 | |
*** fragatina has quit IRC | 20:03 | |
*** jerrygb has quit IRC | 20:05 | |
*** jerrygb_ has quit IRC | 20:06 | |
*** tiswanso has quit IRC | 20:06 | |
*** coolsvap has quit IRC | 20:07 | |
*** morazi has quit IRC | 20:09 | |
*** dprince has quit IRC | 20:12 | |
*** abregman has joined #tripleo | 20:14 | |
*** jayg is now known as jayg|g0n3 | 20:14 | |
*** florianf has quit IRC | 20:15 | |
*** jprovazn has quit IRC | 20:15 | |
openstackgerrit | Merged openstack/tripleo-heat-templates: Include redis/mongo hiera when using pacemaker https://review.openstack.org/393318 | 20:22 |
*** jkilpatr has quit IRC | 20:25 | |
*** athomas has quit IRC | 20:25 | |
*** derekh has joined #tripleo | 20:27 | |
derekh | Are we still hitting CI issues? anything that needs an extra pair of hands to debug? | 20:28 |
*** panda|bbl is now known as panda | 20:31 | |
EmilienM | panda: managing this file is imho a bad idea | 20:31 |
EmilienM | it should be done by packaging | 20:31 |
EmilienM | derekh, slagle: FYI I also see some issues when deploying undercloud | 20:32 |
EmilienM | ironic api sounds unreachable sometimes | 20:32 |
EmilienM | http://logs.openstack.org/76/391876/5/check-tripleo/gate-tripleo-ci-centos-7-ovb-ha/ab8a98e/console.html#_2016-11-04_19_14_23_949674 | 20:32 |
derekh | EmilienM: I'll try to spin up an env and see which of the problems I hit first, then debug | 20:33 |
panda | EmilienM: the problem is that with the package it is not easy to get the config file options for the unix socket | 20:33 |
panda | is there something that works this week ? :( | 20:34 |
panda | EmilienM: but I agree that is a bad idea ... | 20:35 |
EmilienM | fwiw, stable/newton seems pretty stable | 20:35 |
panda | EmilienM: I'd like to check if redis starts there ... | 20:36 |
slagle | EmilienM: you wanna know why? | 20:36 |
beagles | this still the selinux issue | 20:36 |
beagles | overcloud nodes are permissive in newton :) | 20:36 |
slagle | it's b/c we didnt backport https://review.openstack.org/#/c/393472/ | 20:37 |
slagle | right | 20:37 |
slagle | that's why i -1'd the backport | 20:37 |
slagle | maybe we should revert on master | 20:37 |
beagles | it would help unblock CI, but it kind of sucks | 20:37 |
panda | slagle: it's not enough; for nonha it's not a selinux problem. In HA, redis is started by the resource agent, which creates the /var/run/redis dir. Without pacemaker there is nothing that creates that dir, so the unix socket cannot be created and redis does not start anyway. | 20:38 |
slagle | panda: do we know what has changed to cause that problem? | 20:39 |
* beagles reflects that is kind of weird that is happening now | 20:39 | |
beagles | yeah, what he said | 20:40 |
panda | slagle: no, I wonder if this ever worked ... /var/run should have been a tmpfs for a long time now | 20:40 |
slagle | right. /var/run has been tmpfs for at least all of centos 7 | 20:40 |
beagles | I imagine we've been down the obvious routes of having the service file do something.. like in the ExecStartPre or something? | 20:41 |
panda | we don't have overcloud logs for newton ... nodes are deleted when all goes well, or seems to go well | 20:44 |
EmilienM | panda: what? | 20:45 |
*** tiswanso has joined #tripleo | 20:45 | |
*** afazekas_ has joined #tripleo | 20:45 | |
EmilienM | panda: that's newton/ovb-ha logs http://logs.openstack.org/18/393318/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-ha/9c6ffa2/logs/ | 20:45 |
panda | EmilienM: https://bugs.launchpad.net/tripleo/+bug/1631942 | 20:46 |
openstack | Launchpad bug 1631942 in tripleo "Ha periodic jobs don't gather overcloud nodes logs on successful runs" [Undecided,New] - Assigned to Gabriele Cerami (gcerami) | 20:46 |
EmilienM | ah, periodic jobs | 20:46 |
*** jkilpatr has joined #tripleo | 20:46 | |
*** afazekas has quit IRC | 20:48 | |
panda | EmilienM: http://logs.openstack.org/18/393318/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-nonha/0b5af97/logs/overcloud-controller-0/var/log/redis/redis.txt.gz | 20:49 |
panda | redis did not start in newton nonha! | 20:49 |
panda | but maible ceilometer was disabled | 20:49 |
panda | maybe* | 20:49 |
*** tiswanso has quit IRC | 20:50 | |
panda | nothing was using it | 20:50 |
EmilienM | panda: have we compared the version of redis over the last days/weeks? | 20:50 |
openstackgerrit | Merged openstack/tripleo-heat-templates: Add option to disable "d1" Swift device https://review.openstack.org/393769 | 20:51 |
panda | EmilienM: I didn't. Not even sure where to start to do it. | 20:51 |
EmilienM | I'm doing it but let me show you: | 20:51 |
EmilienM | you go on http://tripleo.org/cistatus.html and you take a job from 1 or 2 weeks ago. You inspect the logs and look what version of redis is installed on the overcloud (/var/log/hosts-info.txt) | 20:52 |
EmilienM | and you compare with the version we have right now | 20:52 |
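Editor's sketch of the comparison described above, in Python. It assumes the two package listings have already been downloaded from the job logs and that they contain one RPM name-version-release.arch string per line (the form of redis-3.2.4-1.el7.x86_64); the helper names and the sample listings are hypothetical and are not part of any TripleO tooling.

```python
import re

# Assumption: one RPM NVRA per line, e.g. "redis-3.2.4-1.el7.x86_64".
# The version is required to start with a digit so that ordinary dashed
# words in the log are less likely to be mistaken for packages.
NVRA = re.compile(
    r"^(?P<name>.+)-(?P<version>\d[^-]*)-(?P<release>[^-]+)\.(?P<arch>[^.]+)$"
)


def parse_packages(text):
    """Map package name -> "version-release" for every NVRA-looking line."""
    packages = {}
    for line in text.splitlines():
        match = NVRA.match(line.strip())
        if match:
            packages[match["name"]] = f"{match['version']}-{match['release']}"
    return packages


def changed_packages(old_text, new_text):
    """Return {name: (old, new)} for packages present in both listings
    whose version-release differs."""
    old, new = parse_packages(old_text), parse_packages(new_text)
    return {
        name: (old[name], new[name])
        for name in sorted(old.keys() & new.keys())
        if old[name] != new[name]
    }


if __name__ == "__main__":
    # Made-up sample listings, for illustration only.
    older = "redis-3.2.3-1.el7.x86_64\npython-redis-2.10.3-1.el7.noarch\n"
    newer = "redis-3.2.4-1.el7.x86_64\npython-redis-2.10.3-1.el7.noarch\n"
    for name, (before, after) in changed_packages(older, newer).items():
        print(f"{name}: {before} -> {after}")
```

Restricting the output to packages whose version-release changed keeps the diff short enough to read when the full package lists differ in many unrelated ways.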
*** dbecker has joined #tripleo | 20:54 | |
*** dbecker has quit IRC | 20:54 | |
panda | EmilienM: last job I see is from 25 October | 20:54 |
EmilienM | ok so version of redis looks ok | 20:54 |
EmilienM | panda: is this file ok? http://logs.openstack.org/56/390556/3/check-tripleo/gate-tripleo-ci-centos-7-ovb-ha/5ede534/logs/overcloud-controller-0/var/log/redis/redis.txt.gz | 20:55 |
*** pradk has quit IRC | 20:55 | |
EmilienM | it's an ovb job from 10 days ago, redis seems to start | 20:55 |
EmilienM | let's now compare packages between there and now (diff will be huge but still helpful) | 20:55 |
panda | EmilienM: this is ha, the resource agent takes care of creating the /var/run/redis dir | 20:55 |
panda | EmilienM: you have to look at nonha | 20:56 |
EmilienM | ok | 20:56 |
EmilienM | panda: http://logs.openstack.org/56/390556/3/check-tripleo/gate-tripleo-ci-centos-7-ovb-nonha/f3972f2/logs/overcloud-controller-0/var/log/redis/redis.txt.gz | 20:56 |
EmilienM | is it ok? | 20:56 |
panda | EmilienM: yes | 20:57 |
EmilienM | ● redis.service loaded failed failed Redis persistent key-value database | 20:57 |
EmilienM | I'm not sure | 20:57 |
panda | EmilienM: if there's that error, it did not start | 20:58 |
EmilienM | see in http://logs.openstack.org/56/390556/3/check-tripleo/gate-tripleo-ci-centos-7-ovb-nonha/f3972f2/logs/overcloud-controller-0/var/log/host_info.txt.gz | 20:58 |
EmilienM | so it was 10 days ago, which means redis was already broken? | 20:58 |
panda | EmilienM: yes | 20:58 |
EmilienM | we really need to get scenario001 green again, it's a telemetry CI job | 20:58 |
EmilienM | panda: ok I'm looking for old jobs now | 20:59 |
EmilienM | ok I found redis working | 21:00 |
EmilienM | http://logs.openstack.org/67/371567/1/check-tripleo/gate-tripleo-ci-centos-7-ovb-nonha/f48f1f0/logs/overcloud-controller-0/var/log/redis/redis.txt.gz | 21:00 |
EmilienM | Sep 17 | 21:00 |
openstackgerrit | Merged openstack/python-tripleoclient: Fix nodes count check on stack-update with custom roles https://review.openstack.org/393855 | 21:00 |
EmilienM | so now we have redis-3.2.4-1.el7.x86_64 | 21:01 |
EmilienM | but it worked with redis version 3.2.3 | 21:01 |
EmilienM | I guess we need to investigate diff between 3.2.3 and 3.2.4 | 21:01 |
EmilienM | looking at https://github.com/antirez/redis/releases | 21:02 |
EmilienM | 3.2.4 was released at the end of September, which makes sense regarding our failures | 21:02 |
*** flepied has quit IRC | 21:03 | |
mwhahaha | more package fun? | 21:03 |
panda | EmilienM: https://github.com/antirez/redis/blob/3.2/00-RELEASENOTES | 21:04 |
*** dougbtv has quit IRC | 21:04 | |
EmilienM | mwhahaha: yes, I think redis broke us end of september and we didn't catch it | 21:04 |
mwhahaha | nice | 21:04 |
EmilienM | I know a guy, called mwhahaha, he's super good at finding what commit broke us :P | 21:04 |
mwhahaha | teach me to speak up on a friday afternoon | 21:05 |
panda | EmilienM: maybe we should look at packaging changes | 21:05 |
EmilienM | panda: both code and packaging changes could break us | 21:06 |
EmilienM | looking at release notes, I think it's in code | 21:06 |
EmilienM | https://github.com/antirez/redis/compare/3.2.3...3.2.4 | 21:06 |
EmilienM | diff is not too bad | 21:06 |
EmilienM | I like their commit messages | 21:07 |
EmilienM | https://github.com/antirez/redis/commit/c01abcdebf4fa2b1cd3d3a89049651d528ed5656 | 21:07 |
EmilienM | "fix the fix" | 21:07 |
panda | I don't see anything related. They changed the message for socket bind problem, but that's it | 21:09 |
*** ooolpbot has joined #tripleo | 21:10 | |
ooolpbot | URGENT TRIPLEO TASKS NEED ATTENTION | 21:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1637961 | 21:10 |
openstack | Launchpad bug 1637961 in tripleo "periodic HA master job pingtest times out" [Critical,In progress] - Assigned to Gabriele Cerami (gcerami) | 21:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1638350 | 21:10 |
*** ooolpbot has quit IRC | 21:10 | |
openstack | Launchpad bug 1638350 in tripleo "pingtest failing on OVB jobs to create Cinder volume and Nova server" [Critical,In progress] - Assigned to Gabriele Cerami (gcerami) | 21:10 |
EmilienM | panda: maybe I'm wrong wrt redis version | 21:11 |
EmilienM | let me find a CI job right after redis upgrade and right before | 21:11 |
panda | EmilienM: in the meantime I checked whether, by any chance, we enabled pacemaker for nonha jobs too. It doesn't seem to be the case. | 21:13 |
*** abregman has quit IRC | 21:13 | |
*** absubram has quit IRC | 21:13 | |
EmilienM | panda: ok so I checked the day after the redis release and redis was failing in tripleo CI | 21:15 |
jidar | is there any example of using something like tripleo-common/blob/master/scripts/upload-puppet-modules to augment an existing puppet post-config update? | 21:16 |
jidar | I'm not sure what glue there is between using that script to create a swift object for the puppet modules and then using it on the other end in post-config deployment pattern | 21:16 |
jidar | I would suppose you just build a heat resource that pulls it down and unzips it in /etc/puppet/modules ? | 21:17 |
EmilienM | panda: and 2 days earlier with previous version it was working | 21:17 |
EmilienM | so I'm quite sure now that the redis version has something to do with this | 21:17 |
EmilienM | slagle: do we already have a bug for redis? | 21:20 |
EmilienM | otherwise i'll create one in launchpad | 21:20 |
slagle | EmilienM: not one outside of the 2 that have been alerting already | 21:21 |
EmilienM | ok | 21:21 |
EmilienM | 3.2.5 is out also, I'm wondering if this one would work | 21:22 |
EmilienM | looks like no, it's just a compilation fix | 21:23 |
mwhahaha | well the redis package works with no configuration so it must be the way we're configuring it | 21:23 |
mwhahaha | do we capture the redis configs? | 21:23 |
beekneemech | jidar: Try http://hardysteven.blogspot.com/2016/08/tripleo-deploy-artifacts-and-puppet.html | 21:24 |
EmilienM | slagle: https://bugs.launchpad.net/tripleo/+bug/1639356 | 21:24 |
openstack | Launchpad bug 1639356 in tripleo "Redis 3.2.4 breaks TripleO CI" [Critical,In progress] | 21:24 |
EmilienM | mwhahaha: yes you need to download log file | 21:25 |
mwhahaha | the /etc/redis folder is empty | 21:25 |
*** rhallisey has quit IRC | 21:25 | |
mwhahaha | (from the extracted overcloud-controller-0.tar.xz) | 21:26 |
jidar | beekneemech: yea, that's where I found this in the first place - was hoping for some code showing how to recreate that image | 21:26 |
EmilienM | o_O | 21:26 |
mwhahaha | there's an etc/redis.conf.puppet | 21:26 |
mwhahaha | which seems weird | 21:27 |
mwhahaha | oh its /etc/redis.conf | 21:27 |
panda | what else changed that day ? | 21:28 |
panda | EmilienM: what's the day redis was upgraded ? | 21:28 |
EmilienM | panda: Sep 26 | 21:29 |
beekneemech | jidar: image? | 21:30 |
jidar | beekneemech: oh, the image there just shows a workflow where you pull data out of swift using "deploy-artifacts.sh" | 21:30 |
jidar | via the heat templates | 21:31 |
*** rlandy has quit IRC | 21:32 | |
beekneemech | jidar: Right. upload-puppet-modules creates an env file for you that you pass to your deployment to apply the artifacts to the deployed servers. | 21:33 |
beekneemech | By default, $HOME/.tripleo/environments/puppet-modules-url.yaml | 21:33 |
*** maeca1 has quit IRC | 21:34 | |
EmilienM | mwhahaha: even when redis was working, no config in /etc/redis | 21:35 |
EmilienM | actually we have etc/redis-sentinel.conf | 21:35 |
*** jrollinhatin is now known as jroll | 21:36 | |
*** paramite has joined #tripleo | 21:37 | |
panda | EmilienM: look in /etc/redis.conf | 21:37 |
panda | Is there a way to download the logs? | 21:37 |
EmilienM | panda: yes, go in http://logs.openstack.org/39/375339/1/check-tripleo/gate-tripleo-ci-centos-7-ovb-nonha/c3b122d/logs/ | 21:37 |
EmilienM | you download the tar gz | 21:37 |
mwhahaha | Yea I found the redis.conf and was looking at it | 21:38 |
*** abehl has quit IRC | 21:40 | |
*** ebalduf has joined #tripleo | 21:42 | |
*** jeckersb is now known as jeckersb_gone | 21:44 | |
EmilienM | mhh the only diff in config is the IP address we use for binding | 21:45 |
EmilienM | before (working) it was 192.0.2.11 and now it's 172.17.0.21 | 21:46 |
EmilienM | I wonder if we care | 21:46 |
beekneemech | How are the stable/newton jobs passing with redis 3.2.4? | 21:49 |
EmilienM | beekneemech: same problem | 21:50 |
EmilienM | we cut stable/newton after 3.2.4 anyway | 21:50 |
panda | beekneemech: because there's nothing that uses redis there ... | 21:50 |
EmilienM | so if it's a config problem, it's in newton | 21:50 |
EmilienM | panda: I have to go now, please use the launchpad bug if you find something | 21:51 |
beekneemech | Ah, so redis is still broken there, but it doesn't matter. | 21:51 |
panda | beekneemech: it doesn't matter for our CI, which is probably disabling some service | 21:51 |
*** jerrygb has joined #tripleo | 21:51 | |
panda | EmilienM: sure, have a nice PTO | 21:51 |
panda | EmilienM: leave us in despair | 21:51 |
panda | EmilienM: :) | 21:52 |
*** morazi has joined #tripleo | 21:52 | |
mwhahaha | i think it's the socket file dir | 21:52 |
panda | mwhahaha: ? | 21:52 |
mwhahaha | yea | 21:53 |
mwhahaha | so /var/run/redis doesn't exist | 21:53 |
mwhahaha | so it won't start | 21:53 |
mwhahaha | you comment that out of the config and it works fine | 21:53 |
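Editor's sketch of the check described above, in Python. It assumes a redis.conf in the usual "directive value" form and looks only at the `unixsocket` directive; the function names are hypothetical and the sample config values are illustrative, not taken from the CI logs.

```python
import os


def unixsocket_path(conf_text):
    """Return the path from the last active `unixsocket` directive, or None."""
    path = None
    for line in conf_text.splitlines():
        parts = line.split()
        # A commented-out directive has "#" (or "#unixsocket") as its first
        # token, and "unixsocketperm" is a different token, so neither matches.
        if len(parts) >= 2 and parts[0] == "unixsocket":
            path = parts[1].strip('"')
    return path


def missing_socket_dir(conf_text):
    """Return the socket's parent directory if it does not exist, else None."""
    path = unixsocket_path(conf_text)
    if path is None:
        return None
    parent = os.path.dirname(path)
    return None if os.path.isdir(parent) else parent


if __name__ == "__main__":
    # Illustrative config fragment only.
    sample = "bind 172.17.0.21\nunixsocket /var/run/redis/redis.sock\n"
    missing = missing_socket_dir(sample)
    if missing:
        print(f"socket directory {missing} does not exist")
    else:
        print("socket directory exists or no unix socket is configured")
```

Running a check like this on a freshly booted node would show whether the configured socket directory is present before redis is started.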
*** dtrainor has quit IRC | 21:54 | |
mwhahaha | the old spec used to create /var/run/redis | 21:54 |
panda | mwhahaha: oh? where ? | 21:55 |
panda | mwhahaha: link to the old spec ? | 21:55 |
panda | mwhahaha: if it's really doing that, it's broken anyway | 21:55 |
mwhahaha | i pulled the srpm from cbs | 21:55 |
panda | mwhahaha: /var/run is tmpfs, at first reboot /var/run/redis is deleted, and redis won't start | 21:55 |
mwhahaha | so we need to stop configuring that | 21:56 |
mwhahaha | it gets put in from the puppet | 21:56 |
*** jerrygb has quit IRC | 21:56 | |
mwhahaha | - remove /var/run/redis with systemd #1374728 | 21:56 |
mwhahaha | so that was in 3.2.3-2 | 21:56 |
*** paramite has quit IRC | 21:56 | |
jidar | beekneemech: oh crap, so you can just generate the yaml there and then `-e $HOME/.path/to/yaml-file.yaml` straight out of the script? Super neat. I did an artifact deploy to drop some custom images in there and didn't realize I could just include it directly | 21:57 |
mwhahaha | but if you look in the redis.conf it's configuring /var/run/redis/redis.sock | 21:57 |
jidar | I feel silly haha | 21:57 |
mwhahaha | panda: https://cbs.centos.org/koji/buildinfo?buildID=12030 old file | 21:57 |
mwhahaha | or old package (you can just download and extract the srpm) | 21:57 |
panda | mwhahaha: puppet-redis is also configuring that socket | 21:57 |
*** derekh has quit IRC | 21:58 | |
mwhahaha | yea that's the problem so we need to either pass a new dir (or stop configuring it) | 21:58 |
beekneemech | jidar: Yeah, I think that's it. | 21:58 |
panda | mwhahaha: we have other alternatives, but I'd like to know if: 1) we really need to use unix socket (I think we do) 2) the socket file has to be in /var/run | 21:59 |
panda | the unix socket should be faster | 21:59 |
mwhahaha | panda: gimme a min to look, what puppet module do we use? | 21:59 |
panda | mwhahaha: https://github.com/arioch/puppet-redis.git | 22:00 |
panda | mwhahaha: if the answer is yes for both, this may prove useful https://github.com/arioch/puppet-redis/pull/131 | 22:00 |
mwhahaha | https://bugzilla.redhat.com/show_bug.cgi?id=1374728 | 22:00 |
openstack | bugzilla.redhat.com bug 1374728 in redis "/var/run/redis and /usr/lib/tmpfiles/redis.conf not used" [Unspecified,New] - Assigned to fpercoco | 22:00 |
mwhahaha | i like how it said it wasn't used | 22:01 |
mwhahaha | :( | 22:01 |
mwhahaha | unless we're configuring something to consume the unix socket, i don't think we need to configure it | 22:02 |
mwhahaha | i wonder if it would just be ok if we didn't configure it | 22:02 |
panda | mwhahaha: what is gnocchi-metricd using ? | 22:02 |
mwhahaha | i have no idea | 22:03 |
*** Vijayendra has quit IRC | 22:03 | |
*** Vijayendra has joined #tripleo | 22:04 | |
mwhahaha | so we're getting that socket config from the defaults in the module, https://github.com/arioch/puppet-redis/blob/master/manifests/params.pp#L75-L76 | 22:04 |
mwhahaha | panda: is there a bug for this? | 22:04 |
openstackgerrit | Alex Schultz proposed openstack/puppet-tripleo: Skip unix socket configuration for redis https://review.openstack.org/393940 | 22:06 |
mwhahaha | panda: EmilienM -^ | 22:06 |
panda | mwhahaha: https://bugs.launchpad.net/tripleo/+bug/1639356 | 22:07 |
openstack | Launchpad bug 1639356 in tripleo "Redis 3.2.4 breaks TripleO CI" [Critical,In progress] - Assigned to Alex Schultz (alex-schultz) | 22:07 |
*** derekh has joined #tripleo | 22:09 | |
*** ooolpbot has joined #tripleo | 22:10 | |
ooolpbot | URGENT TRIPLEO TASKS NEED ATTENTION | 22:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1637961 | 22:10 |
openstack | Launchpad bug 1637961 in tripleo "periodic HA master job pingtest times out" [Critical,In progress] - Assigned to Gabriele Cerami (gcerami) | 22:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1638350 | 22:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1639356 | 22:10 |
*** ooolpbot has quit IRC | 22:10 | |
openstack | Launchpad bug 1638350 in tripleo "pingtest failing on OVB jobs to create Cinder volume and Nova server" [Critical,In progress] - Assigned to Gabriele Cerami (gcerami) | 22:10 |
openstack | Launchpad bug 1639356 in tripleo "Redis 3.2.4 breaks TripleO CI" [Critical,In progress] - Assigned to Alex Schultz (alex-schultz) | 22:10 |
mwhahaha | oh that's going to be annoying if it spams by name on a timer | 22:10 |
panda | mwhahaha: no pressure :) | 22:10 |
panda | mwhahaha: can you depend on change I1fc28d522d05033add53dad41b7519a1bd033b62 ? | 22:11 |
openstackgerrit | Alex Schultz proposed openstack/puppet-tripleo: Skip unix socket configuration for redis https://review.openstack.org/393940 | 22:11 |
mwhahaha | k | 22:11 |
panda | mwhahaha: thanks. If we end up not using the socket, maybe my change is not needed. | 22:13 |
mwhahaha | i don't think we need it | 22:13 |
mwhahaha | but i'm looking | 22:13 |
panda | gnocchi configuration points to the IP, not the socket, for redis | 22:13 |
mwhahaha | seems that in THT the only thing we configure a socketdir for is ovs | 22:14 |
*** mhenkel has quit IRC | 22:14 | |
*** fultonj has quit IRC | 22:19 | |
*** artom_ has quit IRC | 22:19 | |
*** gfidente has quit IRC | 22:19 | |
*** noslzzp has quit IRC | 22:24 | |
*** noslzzp has joined #tripleo | 22:24 | |
*** flepied has joined #tripleo | 22:27 | |
*** bfournie has quit IRC | 22:29 | |
openstackgerrit | Ben Nemec proposed openstack/tripleo-heat-templates: Include keystone authtoken config in manila-share service https://review.openstack.org/393947 | 22:34 |
openstackgerrit | Ben Nemec proposed openstack/tripleo-heat-templates: Move db settings from manila-api to manila-base https://review.openstack.org/393948 | 22:34 |
*** rajinir has quit IRC | 22:36 | |
*** artom has joined #tripleo | 22:36 | |
*** amoralej|off is now known as amoralej | 22:46 | |
*** jerrygb has joined #tripleo | 22:52 | |
*** jerrygb has quit IRC | 22:58 | |
derekh | my attempt at reproducing failed with this error, is it one of the current issues? | 23:00 |
derekh | ++ timeout -k 10 240 ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=Verbose -o PasswordAuthentication=no -o ConnectionAttempts=32 heat-admin@192.0.2.10 sudo crm_resource -r openstack-heat-api --wait | 23:00 |
derekh | crm_resource for openstack-heat-api has failed! | 23:00 |
*** b00tcat` has quit IRC | 23:01 | |
*** ooolpbot has joined #tripleo | 23:10 | |
ooolpbot | URGENT TRIPLEO TASKS NEED ATTENTION | 23:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1637961 | 23:10 |
openstack | Launchpad bug 1637961 in tripleo "periodic HA master job pingtest times out" [Critical,In progress] - Assigned to Gabriele Cerami (gcerami) | 23:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1638350 | 23:10 |
ooolpbot | https://bugs.launchpad.net/tripleo/+bug/1639356 | 23:10 |
*** ooolpbot has quit IRC | 23:10 | |
openstack | Launchpad bug 1638350 in tripleo "pingtest failing on OVB jobs to create Cinder volume and Nova server" [Critical,In progress] - Assigned to Gabriele Cerami (gcerami) | 23:10 |
openstack | Launchpad bug 1639356 in tripleo "Redis 3.2.4 breaks TripleO CI" [Critical,In progress] | 23:10 |
*** saneax-_-|AFK is now known as saneax | 23:19 | |
*** bfournie has joined #tripleo | 23:29 | |
*** akshai has joined #tripleo | 23:35 | |
*** akshai has quit IRC | 23:36 | |
EmilienM | mwhahaha, panda: back, what's up? | 23:42 |
EmilienM | slagle: can you approve https://review.openstack.org/#/c/393770/ ? | 23:43 |
EmilienM | err https://review.openstack.org/#/c/391862/ | 23:43 |
EmilienM | well we can approve both | 23:43 |
*** akshai has joined #tripleo | 23:44 | |
panda | EmilienM: we have the root cause of all root causes | 23:46 |
panda | EmilienM: all the work of the past week can be removed, abandoned, destroyed. | 23:46 |
panda | (the work to fix the breakage) | 23:47 |
*** akshai has quit IRC | 23:48 | |
mwhahaha | EmilienM: https://bugzilla.redhat.com/show_bug.cgi?id=1374728 it's a packaging fix | 23:53 |
openstack | bugzilla.redhat.com bug 1374728 in redis "/var/run/redis and /usr/lib/tmpfiles/redis.conf not used" [Unspecified,Post] - Assigned to apevec | 23:53 |
*** maticue has quit IRC | 23:55 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack/python-tripleoclient: Updated from global requirements https://review.openstack.org/389945 | 23:58 |
*** amoralej is now known as amoralej|off | 23:59 |