*** britthouser has quit IRC | 00:01 | |
*** alop has quit IRC | 00:17 | |
*** daneyon has joined #openstack-ansible | 00:39 | |
openstackgerrit | Merged stackforge/os-ansible-deployment: Updated master for new dev work - 15 Aug 2015 https://review.openstack.org/212919 | 00:42 |
---|---|---|
*** daneyon_ has joined #openstack-ansible | 00:46 | |
*** daneyon has quit IRC | 00:49 | |
*** sigmavirus24 is now known as sigmavirus24_awa | 00:49 | |
*** shoutm has quit IRC | 00:51 | |
*** shoutm has joined #openstack-ansible | 01:04 | |
*** JRobinson__ has joined #openstack-ansible | 01:16 | |
*** shoutm has quit IRC | 02:18 | |
*** shoutm has joined #openstack-ansible | 02:20 | |
*** galstrom_zzz is now known as galstrom | 02:28 | |
*** galstrom is now known as galstrom_zzz | 02:28 | |
*** shoutm has quit IRC | 02:44 | |
*** logan2 has quit IRC | 02:44 | |
*** shoutm has joined #openstack-ansible | 03:01 | |
*** sdake has joined #openstack-ansible | 03:32 | |
*** logan2 has joined #openstack-ansible | 03:34 | |
*** sdake_ has quit IRC | 03:36 | |
*** markvoelker has quit IRC | 03:42 | |
*** sdake_ has joined #openstack-ansible | 03:42 | |
*** sdake has quit IRC | 03:45 | |
*** shoutm has quit IRC | 04:00 | |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Add nova_libvirt_live_migration_flag variable https://review.openstack.org/212452 | 04:02 |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Use slurp to get the content of the ceph.conf file https://review.openstack.org/213170 | 04:02 |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Remove hardcoded config drive enforcement https://review.openstack.org/212497 | 04:03 |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Removes trailing whitespace for bashate https://review.openstack.org/207663 | 04:03 |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Update the documented ceph user variables https://review.openstack.org/209130 | 04:04 |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Allow cinder-backup to use ceph https://review.openstack.org/209537 | 04:04 |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Add support for additional nova.conf options https://review.openstack.org/210492 | 04:04 |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Keystone SSL cert/key distribution and configuration https://review.openstack.org/194474 | 04:05 |
*** shoutm has joined #openstack-ansible | 04:06 | |
*** JRobinson__ is now known as JRobinson__afk | 04:17 | |
*** sdake_ is now known as sdake | 04:22 | |
*** shoutm_ has joined #openstack-ansible | 04:27 | |
*** shoutm has quit IRC | 04:28 | |
*** markvoelker has joined #openstack-ansible | 04:42 | |
*** JRobinson__afk is now known as JRobinson__ | 04:45 | |
*** markvoelker has quit IRC | 04:47 | |
*** shoutm_ has quit IRC | 05:03 | |
*** shoutm has joined #openstack-ansible | 05:34 | |
*** ashishb has joined #openstack-ansible | 05:58 | |
*** javeriak has joined #openstack-ansible | 06:01 | |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Replace ADFS example DNS name with something appropriate https://review.openstack.org/214012 | 06:02 |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Adds a pep8 target to tox.ini https://review.openstack.org/214013 | 06:03 |
*** shoutm has quit IRC | 06:05 | |
*** shoutm has joined #openstack-ansible | 06:12 | |
*** britthou_ has quit IRC | 06:22 | |
*** britthouser has joined #openstack-ansible | 06:22 | |
*** JRobinson__ has quit IRC | 06:23 | |
*** shoutm_ has joined #openstack-ansible | 06:39 | |
*** shoutm has quit IRC | 06:42 | |
*** openstackgerrit_ has joined #openstack-ansible | 06:43 | |
*** markvoelker has joined #openstack-ansible | 06:43 | |
*** javeriak has quit IRC | 06:47 | |
*** javeriak has joined #openstack-ansible | 06:48 | |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Update rabbitmq version deployed to v3.5.4 https://review.openstack.org/210516 | 06:48 |
*** markvoelker has quit IRC | 06:48 | |
*** javeriak_ has joined #openstack-ansible | 06:51 | |
*** javeriak has quit IRC | 06:52 | |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Update tempest configuration https://review.openstack.org/210107 | 06:57 |
*** fawadkhaliq has joined #openstack-ansible | 07:05 | |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Switch tempest to test Cinder API v2 https://review.openstack.org/214045 | 07:22 |
*** benwh4 has joined #openstack-ansible | 07:50 | |
benwh4 | I have a sysadmin question: should the container-mgmt network have web access, or is that handled entirely by the host network? In the example, should 10.240.0.0/24 have WAN connectivity? | 07:56 |
odyssey4me | benwh4 I'm not sure that I understand your question. | 07:58 |
*** fawadkhaliq has quit IRC | 08:01 | |
*** fawadkhaliq has joined #openstack-ansible | 08:01 | |
mattt | benwh4: that is a private network, you don't want that publicly exposed | 08:06 |
benwh4 | during the osad deployment, following the example, you have networks 10.240.0.0 & 172.29.236.0 | 08:06 |
benwh4 | and should 172.29.236.0 have internet connectivity to perform the openstack installation? or is that managed by the 10.240.0.0 network? | 08:07 |
odyssey4me | benwh4 you can allow the networks out if you want to, but don't allow public access into anything except the external load balancer address | 08:07 |
benwh4 | ok, but which network installs the openstack packages from source? the management-net (10.240.0.0) or the container-mgmt (172.29.236.0/22)? | 08:08 |
benwh4 | because ansible needs to reach both networks over ssh and I need to set up my firewall to allow the traffic, I just want to know which network will install the openstack services | 08:10 |
odyssey4me | benwh4 I think all containers use eth0 for the default gateway, and therefore their traffic will pass through the host and be natted - so the access to public packages will depend on the host's default gateway, which is something you setup | 08:10 |
odyssey4me | yes, ansible reaches the hosts and containers on the mgmt network | 08:11 |
benwh4 | eth0, which is bound to bond0, correct? | 08:12 |
odyssey4me | benwh4 the host network configuration is up to you | 08:13 |
odyssey4me | the containers do not use bonded network configurations though | 08:13 |
odyssey4me | only the hosts | 08:13 |
odyssey4me | and the outgoing traffic will pass through the default gateway for the hosts | 08:14 |
benwh4 | yes, but I still have an ssh error, and I suppose it is due to my firewall, which hosts the network VLANs and gateway | 08:15 |
odyssey4me | an ssh error from where to where? | 08:15 |
benwh4 | but if all traffic goes through 10.240.0.0 (or my equivalent config) it should work fine, but it doesn't | 08:16 |
benwh4 | from the deployment node to the containers, I think, because from the deployment node I can ssh to all my targets | 08:17 |
odyssey4me | ok, is the deployment node on the same network as the containers? | 08:17 |
mattt | we typically deploy from one of the nodes in the cluster itself | 08:17 |
odyssey4me | ie does it have an address on the management subnet? | 08:17 |
benwh4 | yes | 08:17 |
odyssey4me | ok, then no routing should be involved | 08:18 |
odyssey4me | is the ssh error all the time, or sometimes? | 08:18 |
odyssey4me | and is it when you use ansible only, or when you ssh too? | 08:18 |
benwh4 | all the time, to the target hosts | 08:18 |
benwh4 | only when I use the setup-hosts.yml playbook | 08:19 |
odyssey4me | ok - can you ssh to the host using the ssh client (not ansible) | 08:19 |
mattt | benwh4: do you have keys setup? | 08:19 |
benwh4 | from my deployment node yes | 08:19 |
benwh4 | yes | 08:19 |
benwh4 | I have keys set up | 08:20 |
mattt | benwh4: and when you ssh in you're doing so as root right? | 08:21 |
mattt | all the ansible stuff runs as root | 08:21 |
benwh4 | yes I ssh using root | 08:21 |
odyssey4me | benwh4 can you ssh as root to the target host? | 08:21 |
benwh4 | yes | 08:21 |
mattt | benwh4: go into /path/to/os-ansible-deployment/playbooks, and type "ansible all -m ping" | 08:22 |
benwh4 | should my deployment node have the same network configuration as the target ? | 08:22 |
odyssey4me | benwh4 no, but it does need to be able to communicate via ssh to the target | 08:22 |
odyssey4me | is there any firewalling on the target hosts? | 08:23 |
benwh4 | the ansible all -m ping doesn't work, it gives me an ssh error | 08:25 |
benwh4 | no, there is no firewalling on the target hosts | 08:26 |
odyssey4me | it sounds to me like you probably have some sort of sshd configuration issue that's preventing ansible from working properly | 08:26 |
odyssey4me | on the target hosts | 08:26 |
benwh4 | but when I do ssh targethost it works | 08:26 |
benwh4 | ok I will double check | 08:27 |
odyssey4me | yes, but ansible uses more options | 08:27 |
odyssey4me | try reverting the config to defaults, enabling key based access for root and then re-enabling any special settings you have | 08:27 |
benwh4 | ok, last thing: is chmod 600 on the authorized_keys file fine, or too restrictive? | 08:28 |
mattt | benwh4: that's fine | 08:33 |
mattt | assuming it's owned by the correct user | 08:33 |
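The two conditions mentioned above (mode 600 and correct ownership) can be checked programmatically. The following is a minimal Python sketch, not something from the chat: the function name `key_file_is_safe` is hypothetical, and it assumes that the relevant rule is "owned by the expected user and not writable by group or others", which mode 600 satisfies.

```python
import os
import stat


def key_file_is_safe(path, expected_uid):
    """Return True if `path` is owned by `expected_uid` and is not
    writable by group or others (mode 600 passes this check)."""
    info = os.stat(path)
    mode = stat.S_IMODE(info.st_mode)  # permission bits only
    owned_by_user = info.st_uid == expected_uid
    no_group_or_other_write = (mode & (stat.S_IWGRP | stat.S_IWOTH)) == 0
    return owned_by_user and no_group_or_other_write
```

For the root user discussed here, a call would look like `key_file_is_safe("/root/.ssh/authorized_keys", 0)`; the path is the conventional location and is an assumption about this particular setup.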
*** c0m0 has joined #openstack-ansible | 08:39 | |
*** shausy has joined #openstack-ansible | 08:42 | |
*** markvoelker has joined #openstack-ansible | 08:45 | |
*** shoutm_ has quit IRC | 08:48 | |
*** markvoelker has quit IRC | 08:50 | |
*** d0ugal has joined #openstack-ansible | 08:53 | |
evrardjp | good morning everyone | 08:59 |
odyssey4me | o/ evrardjp | 08:59 |
evrardjp | I've got a working draft for haproxy | 08:59 |
odyssey4me | good news - I seem to have tracked down the pattern for the build failures in master: https://bugs.launchpad.net/openstack-ansible/+bug/1485917 | 09:00 |
openstack | Launchpad bug 1485917 in openstack-ansible trunk "hpcloud AIO's are failing tempest tests" [Critical,Confirmed] - Assigned to Jesse Pretorius (jesse-pretorius) | 09:00 |
evrardjp | I'll create the blueprint with the features | 09:00 |
evrardjp | nice! | 09:00 |
odyssey4me | evrardjp great! if you can prep a spec for what you're implementing, that'd be great | 09:00 |
evrardjp | I've never done that | 09:00 |
odyssey4me | ie register a blueprint with a short summary, then the spec with the details | 09:00 |
evrardjp | ok | 09:01 |
evrardjp | I'll try and we'll adapt if necessary | 09:01 |
odyssey4me | clone https://github.com/stackforge/os-ansible-deployment-specs | 09:01 |
odyssey4me | copy https://github.com/stackforge/os-ansible-deployment-specs/blob/master/specs/template.rst to the 'liberty' folder and name the file according to the same name as the blueprint | 09:01 |
odyssey4me | you'll see the pattern with existing files there | 09:01 |
odyssey4me | then edit the file with the details | 09:02 |
odyssey4me | evrardjp here are some examples in flight https://review.openstack.org/213779 and https://review.openstack.org/207713 | 09:02 |
odyssey4me | note that you don't have to complete everything in round one - just do what you can and we can discuss and iterate from there | 09:03 |
odyssey4me | hughsaunders please can you review https://review.openstack.org/213439 - 11.1.1 has a pbr issue which breaks the repo build | 09:05 |
evrardjp | odyssey4me: ok | 09:05 |
hughsaunders | odyssey4me: yep | 09:06 |
evrardjp | odyssey4me: if it's a blueprint that doesn't target liberty, but works for kilo, should I create the spec in the liberty folder? | 09:06 |
odyssey4me | evrardjp you're welcome to propose it to the kilo folder - we can hopefully get it in for kilo then, perhaps in 11.2.0 | 09:07 |
*** fawadkhaliq has quit IRC | 09:30 | |
evrardjp | odyssey4me: so for the spec, I should do a git review too? | 09:36 |
evrardjp | or simply pushing it? | 09:36 |
odyssey4me | evrardjp yes, you submit it using git review | 09:36 |
odyssey4me | it's reviewed in gerrit | 09:36 |
openstackgerrit | Jean-Philippe Evrard proposed stackforge/os-ansible-deployment-specs: Add spec to change haproxy default behaviour https://review.openstack.org/214089 | 09:37 |
evrardjp | woops | 09:37 |
evrardjp | whitespaces | 09:37 |
evrardjp | I'm fixing that | 09:37 |
openstackgerrit | Jean-Philippe Evrard proposed stackforge/os-ansible-deployment-specs: Add spec to change haproxy default behaviour https://review.openstack.org/214089 | 09:39 |
odyssey4me | git-harry https://bugs.launchpad.net/openstack-ansible/+bug/1484011 | 09:41 |
openstack | Launchpad bug 1484011 in openstack-ansible trunk "python-openstacksdk build fails" [Critical,New] | 09:41 |
*** neillc is now known as neillc_away | 09:42 | |
openstackgerrit | Matt Thompson proposed stackforge/os-ansible-deployment: Disable tempest swift tests when DEPLOY_SWIFT=no https://review.openstack.org/214102 | 09:56 |
openstackgerrit | Jean-Philippe Evrard proposed stackforge/os-ansible-deployment: [WIP] HAProxy rewrite https://review.openstack.org/214107 | 10:03 |
openstackgerrit | Jean-Philippe Evrard proposed stackforge/os-ansible-deployment: Enables the admin level on the haproxy stats socket. https://review.openstack.org/214110 | 10:07 |
evrardjp | I have two hosts for working on OSAD: my main workstation and one of my deploy hosts... when I'm working on a patch, I don't know how I should transfer the state of the work in progress... I thought it was best to use another repo (like a personal github repo) | 10:11 |
evrardjp | but it seems to make a mess of the commits (pushes to the personal repo followed by git review seem to produce more than one commit)... how should I do it? | 10:12 |
odyssey4me | evrardjp typically I use a deployment host in a test environment to work out a working patch, then I diff it and copy the diff to my workstation where I patch it into a branch and submit it via gerrit | 10:12 |
evrardjp | atm I'm using patch to solve this | 10:13 |
evrardjp | ok | 10:13 |
evrardjp | I'm doing the same | 10:13 |
evrardjp | then it's not that bad | 10:13 |
evrardjp | ;) | 10:13 |
odyssey4me | I also ensure that I do at least one patch set upload each day so that the work can be reviewed, commented on or continue | 10:13 |
evrardjp | I ddin't get that | 10:13 |
evrardjp | didn't* | 10:13 |
odyssey4me | yeah, it's not the best workflow - but the alternative would be to put my ssh keys on the lab host which is not all that wise | 10:14 |
evrardjp | oh you make sure that your code is submitted on a day to day basis, right? | 10:14 |
odyssey4me | yes | 10:14 |
odyssey4me | that way it's safe, and it can be reviewed | 10:14 |
evrardjp | I don't have that opportunity, deploy host is in isolated lab ;) | 10:14 |
evrardjp | hard to reach I'd say | 10:15 |
odyssey4me | but if I'm co-authoring a patch with someone then they can also continue from where I left off | 10:15 |
evrardjp | that makes sense | 10:15 |
hughsaunders | I use my github fork to shuffle patches between my workstation and test env before pushing to gerrit for review | 10:15 |
*** sdake has quit IRC | 10:15 | |
*** britthouser has quit IRC | 10:17 | |
evrardjp | hughsaunders: so you pull and then review, but do you have to squash commits, or pay attention to specific stuff? | 10:19 |
hughsaunders | evrardjp: I tend to work on a single commit, so I don't need to squash. I commit --amend for each update that I want to test, then force push to my github fork. Then when testing is done, I only have a single commit to submit to gerrit | 10:20 |
evrardjp | I maybe did something wrong... I try to also amend all the time | 10:23 |
openstackgerrit | Merged stackforge/os-ansible-deployment: Update kilo for new dev work - 15 Aug 2015 https://review.openstack.org/213439 | 10:24 |
hughsaunders | evrardjp: you will have multiple commits to submit if you are working on a patch that depends on other patches that haven't merged yet | 10:26 |
*** shoutm has joined #openstack-ansible | 10:27 | |
*** britthouser has joined #openstack-ansible | 10:36 | |
*** ashishb has quit IRC | 10:37 | |
*** ashishb has joined #openstack-ansible | 10:37 | |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Switch tempest to test Cinder API v2 https://review.openstack.org/214045 | 10:41 |
*** javeriak_ has quit IRC | 10:42 | |
*** javeriak has joined #openstack-ansible | 10:43 | |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Switch Nova/Tempest to use/test Cinder API v2 https://review.openstack.org/214045 | 10:44 |
*** markvoelker has joined #openstack-ansible | 10:46 | |
*** javeriak has quit IRC | 10:48 | |
*** markvoelker has quit IRC | 10:50 | |
*** shausy has quit IRC | 11:09 | |
*** javeriak has joined #openstack-ansible | 11:09 | |
*** shausy has joined #openstack-ansible | 11:09 | |
*** britthouser has quit IRC | 11:11 | |
*** britthouser has joined #openstack-ansible | 11:12 | |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Switch Nova/Tempest to use/test Cinder API v2 https://review.openstack.org/214045 | 11:13 |
*** javeriak has quit IRC | 11:15 | |
*** javeriak_ has joined #openstack-ansible | 11:15 | |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Update tempest configuration https://review.openstack.org/210107 | 11:16 |
*** fawadkhaliq has joined #openstack-ansible | 11:37 | |
*** rward has quit IRC | 11:37 | |
*** britthouser has quit IRC | 11:43 | |
*** markvoelker has joined #openstack-ansible | 11:46 | |
*** fawadk has joined #openstack-ansible | 11:46 | |
*** fawadkhaliq has quit IRC | 11:47 | |
*** markvoelker has quit IRC | 11:51 | |
*** fawadk has quit IRC | 11:55 | |
mgariepy | good morning everyone. | 12:11 |
mgariepy | odyssey4me, do you sleep ? | 12:12 |
*** markvoelker has joined #openstack-ansible | 12:12 | |
*** woodard has joined #openstack-ansible | 12:14 | |
mgariepy | Sam-I-Am, I would like to have a quick idea on how to implement the external network with osad if you don't mind. | 12:17 |
*** woodard has quit IRC | 12:26 | |
odyssey4me | mgariepy sometimes :p | 12:50 |
*** woodard has joined #openstack-ansible | 12:52 | |
mgariepy | haha | 12:52 |
*** britthouser has joined #openstack-ansible | 12:59 | |
*** britthou_ has joined #openstack-ansible | 13:01 | |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Set iptables-persistent install execution to append to log https://review.openstack.org/214172 | 13:02 |
*** tlian has joined #openstack-ansible | 13:03 | |
*** KLevenstein has joined #openstack-ansible | 13:04 | |
*** britthouser has quit IRC | 13:04 | |
evrardjp | mgariepy: I'm not sure. Maybe he sleeps between some red bulls ;) | 13:07 |
evrardjp | mgariepy: you mean using neutron? | 13:08 |
mgariepy | yeah, | 13:08 |
evrardjp | you can connect on the utility container first | 13:08 |
mgariepy | i just want to have a rough idea of how to specify the network for the container. | 13:08 |
evrardjp | source the rc file there | 13:08 |
evrardjp | ho | 13:08 |
evrardjp | we aren't speaking about the same thing I think | 13:08 |
evrardjp | what do you mean? | 13:09 |
mgariepy | first. on the node i have a br-ex (external net) maps to eth12 on the neutron container | 13:09 |
mgariepy | http://paste.ubuntu.com/12118008/ | 13:10 |
mgariepy | I guess i need that but i'm not quite sure. haha | 13:10 |
evrardjp | this means that a network named "ex", with untagged traffic will arrive to neutron from the host bridge br-ex | 13:11 |
evrardjp | so your host should have a br-ex bridge (manually configured on the host) | 13:12 |
mgariepy | so I add this to the neutron container, got my eth12 interface, after this i think it need to be added to neutron config ? | 13:12 |
mgariepy | yes i have that. | 13:12 |
evrardjp | ok | 13:12 |
mgariepy | and this maps the veth in the container. | 13:12 |
evrardjp | neutron can then leverage that network in openstack | 13:12 |
mgariepy | where do i add this in neutron ? | 13:13 |
evrardjp | it depends on what you want to do | 13:13 |
evrardjp | let me find a good doc that explains that | 13:13 |
mgariepy | only by adding a external network in openstack ? or do i need to specify it in config files ? | 13:14 |
*** ashishb has quit IRC | 13:14 | |
evrardjp | you define it with neutron CLI | 13:14 |
evrardjp | the configuration files are handled by OSAD | 13:14 |
evrardjp | check maybe here | 13:15 |
evrardjp | http://docs.openstack.org/networking-guide/deploy.html | 13:15 |
*** woodard has quit IRC | 13:16 | |
mattt | mgariepy: https://github.com/stackforge/os-ansible-deployment/blob/master/etc/openstack_deploy/openstack_user_config.yml.example#L254-L262 | 13:18 |
mattt | mgariepy: https://github.com/stackforge/os-ansible-deployment/blob/master/etc/network/interfaces.d/openstack_interface.cfg.example#L102-L109 | 13:19 |
*** woodard has joined #openstack-ansible | 13:19 | |
*** persia_ is now known as persia | 13:19 | |
mattt | mgariepy: so if you're using a vlan or flat it needs to be specified in openstack_user_config.yml | 13:20 |
evrardjp | mattt: with the paste above, I think mgariepy: already configured his openstack_user_config | 13:21 |
mgariepy | my br-ex is already part of a vlan, so my eth12 will need to be flat | 13:21 |
*** prad_ has joined #openstack-ansible | 13:39 | |
*** woodard has quit IRC | 13:39 | |
*** woodard has joined #openstack-ansible | 13:40 | |
*** javeriak_ has quit IRC | 13:41 | |
Sam-I-Am | mgariepy: br-ex is an openvswitch thing | 13:50 |
Sam-I-Am | osad uses linuxbridge, which creates bridges that bind to interfaces | 13:51 |
mgariepy | Sam-I-Am, ok so the right way to do it is ? | 13:53 |
Sam-I-Am | what kind of networks do you use? | 13:54 |
Sam-I-Am | vlan? vxlan? tenant/provider? | 13:54 |
mgariepy | vxlan for the tenant | 13:54 |
mgariepy | and vlan for public net. | 13:54 |
Sam-I-Am | on each hosts, you have a br-vlan and br-vxlan, right? | 13:56 |
mgariepy | yes | 13:56 |
Sam-I-Am | you can remove the section in the config about 'flat' networks if you dont need them | 13:56 |
*** javeriak has joined #openstack-ansible | 14:00 | |
mgariepy | ok, and then create the provider network with: neutron net-create provider-101 --shared --provider:physical_network provider --provider:network_type vlan --provider:segmentation_id 101 | 14:00 |
mgariepy | and then also the subnet | 14:00 |
mgariepy | simple as that ? | 14:01 |
mattt | mgariepy: let me look for some documentation on what our support team does | 14:04 |
palendae | odyssey4me: Why was pbr completely removed here? https://review.openstack.org/#/c/213439/2/playbooks/roles/repo_server/defaults/main.yml | 14:04 |
*** Mudpuppy has joined #openstack-ansible | 14:04 | |
Sam-I-Am | mgariepy: depends how you configured your deployment | 14:05 |
odyssey4me | palendae because it was only added the previous sha bump as a workaround until https://review.openstack.org/211800 merged | 14:05 |
Sam-I-Am | but assuming thats correct, neutron is just neutron | 14:05 |
Sam-I-Am | physical_network would probably be 'vlan' unless you called it something else | 14:05 |
palendae | odyssey4me: I think we need to keep it at 0.11 or higher | 14:05 |
mgariepy | it's vlan | 14:05 |
mgariepy | a few pointers in the docs to configure this would be nice though ;) | 14:06 |
odyssey4me | palendae it now uses the global requirements from OpenStack, which are pbr>=0.6,!=0.7,<1.0 for stable/kilo | 14:06 |
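The requirement string quoted above, `pbr>=0.6,!=0.7,<1.0`, combines three constraints. As an illustration only, here is a simplified Python sketch that checks plain numeric versions against it. The helper names are hypothetical, and the comparison is a deliberate simplification (dotted integers only, no pre-release or post-release handling), not a full implementation of Python's version specifier rules.

```python
def parse_version(text):
    """Turn '0.11' into (0, 11, 0), padding to three numeric components."""
    parts = [int(p) for p in text.split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def satisfies_kilo_pbr(version):
    """Check a version against pbr>=0.6,!=0.7,<1.0 as quoted in the chat."""
    v = parse_version(version)
    return (
        v >= parse_version("0.6")
        and v != parse_version("0.7")
        and v < parse_version("1.0")
    )


for candidate in ["0.5", "0.7", "0.11", "1.0"]:
    print(candidate, satisfies_kilo_pbr(candidate))
```

Under this sketch, 0.11 falls inside the allowed range, but so do older releases such as 0.6, which is the gap palendae's "0.11 or higher" comment points at.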
*** phalmos has joined #openstack-ansible | 14:06 | |
Sam-I-Am | mgariepy: the docs do say this | 14:06 |
Sam-I-Am | mgariepy: after you deploy openstack, its neutron stuff, already documented upstream here: http://docs.openstack.org/networking-guide/scenario_legacy_lb.html | 14:07 |
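To tie the exchange together: mgariepy's command used `provider` as the physical network name, while Sam-I-Am suggested it would probably be `vlan` in this deployment. The Python sketch below only assembles the command line from the flags already quoted in the chat, so the value can be swapped without retyping. The function name `provider_net_command` is hypothetical, and nothing here runs the neutron CLI.

```python
import shlex


def provider_net_command(name, physical_network, segmentation_id):
    """Build the neutron net-create command line quoted in the chat
    for a shared VLAN provider network."""
    argv = [
        "neutron", "net-create", name,
        "--shared",
        "--provider:physical_network", physical_network,
        "--provider:network_type", "vlan",
        "--provider:segmentation_id", str(segmentation_id),
    ]
    # shlex.join (Python 3.8+) quotes arguments where the shell needs it.
    return shlex.join(argv)


print(provider_net_command("provider-101", "vlan", 101))
```

The physical network name has to match whatever the deployment actually defined; `vlan` is the value confirmed in this conversation, not a universal default.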
evrardjp | mattt: as you were working with ceph stuff, did you get issues for creating instances? | 14:08 |
*** javeriak has quit IRC | 14:08 | |
mattt | evrardjp: i have seen a few issues after deploying where i need to bounce nova-compute, i need to investigate this now that you mention it | 14:09 |
mattt | evrardjp: but further than that no, it works for me | 14:09 |
evrardjp | what do you mean by "bounce nova-compute" ? | 14:10 |
evrardjp | restart nova-compute services? | 14:10 |
mattt | evrardjp: yep | 14:10 |
mattt | evrardjp: i think that service isn't getting restarted properly after we do the ceph bits | 14:10 |
mattt | evrardjp: what kind of problems you running into? | 14:11 |
evrardjp | it still fails on me | 14:11 |
*** sigmavirus24_awa is now known as sigmavirus24 | 14:11 | |
evrardjp | No valid host was found. There are not enough hosts available | 14:12 |
evrardjp | wait | 14:12 |
* mattt waits | 14:13 | |
evrardjp | I'll first check if nova still responds | 14:13 |
evrardjp | yup, nova services down | 14:14 |
*** spotz_zzz is now known as spotz | 14:14 | |
evrardjp | should I do something else after a service nova-compute restart ? | 14:15 |
evrardjp | the process is listed | 14:15 |
evrardjp | on the compute node | 14:15 |
mattt | evrardjp: nope, they take a while to check back in tho | 14:16 |
evrardjp | I've restarted libvirt-bin just in case | 14:16 |
evrardjp | Ok, I'll tell you after a few minutes if it's still failing | 14:17 |
Apsu | evrardjp: You have to check the nova-compute logs to see what/if there's a problem nova-compute is having with identifying available resources | 14:17 |
mattt | yeah, i'd recommend that ... or the scheduler logs in case it's not finding a compute node | 14:18 |
Apsu | "No valid host" means either A) nova-computes aren't checking in, or B) they're checking in but misidentifying resources available for what you're trying to boot. | 14:18 |
evrardjp | I'd check the scheduler myself | 14:18 |
Apsu | Which could be neutron services down | 14:18 |
Apsu | So might want to make sure neutron's agent is running on the hypervisor too. | 14:18 |
Apsu | Check its logs, etc | 14:18 |
evrardjp | the process runs, and I've checked in the nova-compute logs, but I'm not sure about what I should see | 14:19 |
andymccr | evrardjp: since Apsu mentions neutron services - its worth checking, I had that issue with an install I did a few days ago, there was a neutron-db-manage issue so the neutron-server services werent starting. | 14:19 |
Apsu | If they're not checking in (nova service-list shows it down)... | 14:19 |
evrardjp | I see a lot of dumps about capabilities | 14:19 |
Ti-mo | Hi, just found this in nrb's gists: https://gist.github.com/nrb/d6142c104677c09683f1, anyone else facing this issue by any chance ? | 14:19 |
evrardjp | neutron service runs | 14:19 |
Apsu | evrardjp: Is it repeating the capability identification over and over? | 14:19 |
evrardjp | neutron-linuxbridge-agent at least | 14:19 |
evrardjp | yup | 14:19 |
Apsu | evrardjp: neutron agent-list, make sure compute's agent is ":-)" | 14:19 |
Apsu | nova service-list, make sure the compute host is "up" | 14:20 |
*** shoutm has quit IRC | 14:20 | |
Apsu | The fact it's repeating the capability identification means it's almost certainly "up" | 14:20 |
Apsu | That's the log output when it checks in | 14:20 |
evrardjp | ok | 14:21 |
*** jmckind has joined #openstack-ansible | 14:21 | |
Apsu | So check neutron agent-list | 14:21 |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Enable/disable Swift/OpenStack deployment properly https://review.openstack.org/214213 | 14:22 |
evrardjp | a lot of smileys | 14:22 |
evrardjp | only smileys I should say | 14:22 |
odyssey4me | mattt there you go - https://review.openstack.org/214213 | 14:23 |
Apsu | If the compute's neutron agent is ":-)", you can start worrying about nova config being wrong on that node, the flavor looking for things not on that node, needing to restart the nova-scheduler... | 14:23 |
*** benwh4 has quit IRC | 14:23 | |
Apsu | Easiest order is restart nova-scheduler, first. | 14:23 |
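Apsu's triage for "No valid host" can be summarized as a small decision function. This is a Python sketch of the order described in the chat, not a tool that exists in OSAD: the function name `next_step` is hypothetical, and the two inputs are assumed to be read by hand from `nova service-list` (compute host "up") and `neutron agent-list` (agent shown as ":-)").

```python
def next_step(compute_up, neutron_agent_alive):
    """Suggest the next check for a 'No valid host' error, following
    the order discussed in the chat."""
    if not compute_up:
        # Case A: nova-compute is not checking in.
        return "check nova-compute logs: the compute node is not checking in"
    if not neutron_agent_alive:
        # Compute checks in, but networking on the hypervisor is suspect.
        return "check the neutron agent and its logs on the compute host"
    # Case B: everything reports healthy, so look at scheduling inputs.
    return "restart nova-scheduler, then review nova config and flavor requirements"


print(next_step(compute_up=True, neutron_agent_alive=True))
```

In evrardjp's situation, the agent list showed only smileys, so under this sketch the remaining branch is the scheduler restart followed by a look at the node's nova configuration.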
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Enable/disable Swift/OpenStack deployment properly https://review.openstack.org/214213 | 14:23 |
*** wmlynch has quit IRC | 14:23 | |
*** wmlynch has joined #openstack-ansible | 14:24 | |
mattt | odyssey4me: did you deliberately remove export BOOTSTRAP_AIO ? | 14:25 |
odyssey4me | mattt nope, sorry - lemme put that back | 14:25 |
mattt | k | 14:26 |
odyssey4me | urgh, I see some other nonsense in there too | 14:26 |
odyssey4me | lemme clean it up | 14:26 |
*** prad_ is now known as pradk | 14:26 | |
evrardjp | I think it's an issue with the scheduler, but it didn't show up before I tried ceph | 14:28 |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Enable/disable Swift/OpenStack deployment properly https://review.openstack.org/214213 | 14:28 |
evrardjp | in my "host aggregates" that I see on horizon (I don't recall the command to show that), I see "Services Down" for the availability zone "Nova" | 14:29 |
*** wmlynch has quit IRC | 14:29 | |
evrardjp | and I've restarted nova-schedulers and nova-compute | 14:29 |
odyssey4me | mattt ^ updated | 14:29 |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Enable/disable Swift/OpenStack deployment properly https://review.openstack.org/214213 | 14:31 |
*** sdake has joined #openstack-ansible | 14:31 | |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Enable/disable Swift/OpenStack deployment properly https://review.openstack.org/214213 | 14:31 |
*** woodard has quit IRC | 14:33 | |
*** rward has joined #openstack-ansible | 14:34 | |
*** jmckind has quit IRC | 14:34 | |
*** sdake_ has joined #openstack-ansible | 14:37 | |
*** jwagner_away is now known as jwagner | 14:37 | |
*** sdake has quit IRC | 14:40 | |
*** woodard has joined #openstack-ansible | 14:41 | |
*** phalmos has quit IRC | 14:42 | |
evrardjp | mattt: on your working installation, shouldn't virsh pool-list mention the ceph pool ? | 14:43 |
*** wmlynch has joined #openstack-ansible | 14:44 | |
mattt | evrardjp: mine doesn't | 14:45 |
evrardjp | ok | 14:45 |
evrardjp | I thought it had it | 14:45 |
evrardjp | too* | 14:45 |
evrardjp | to* | 14:45 |
evrardjp | I'm redeploying os-nova-install.yml, just to make sure | 14:45 |
mattt | are your instances booting now? | 14:45 |
evrardjp | nope | 14:45 |
evrardjp | and my nova services are still down | 14:46 |
evrardjp | and no error in the logs | 14:46 |
evrardjp | just some info | 14:46 |
evrardjp | amqp seems fine | 14:47 |
odyssey4me | evrardjp is your time consistent across nodes? ie ntp sync | 14:48 |
odyssey4me | also is the DB healthy | 14:48 |
odyssey4me | and is nova-conductor running? | 14:48 |
evrardjp | I have a ntp server, and my nodes should be using it | 14:49 |
odyssey4me | nova-conductor is the interface between the compute nodes and the DB | 14:49 |
mattt | evrardjp: sometimes it's helpful to flip debug on in the nova.conf file | 14:49 |
odyssey4me | ++ | 14:49 |
*** phalmos has joined #openstack-ansible | 14:50 | |
evrardjp | ok | 14:50 |
evrardjp | then I should restart all the nova services, right? is there a specific order for that? | 14:50 |
Apsu | Not really | 14:50 |
evrardjp | ok | 14:50 |
evrardjp | good | 14:50 |
Apsu | I mean, nova-server first is probably the least noisy | 14:51 |
evrardjp | I'll check nova-conductor on all the nodes first | 14:51 |
Apsu | Then conductor+compute | 14:51 |
evrardjp | then flip debug on server - conductor - compute | 14:51 |
evrardjp | is it bad if I see Failed to consume message from queue on a conductor, and then 2 lines after: Connecting to AMQP Server, and then Connected to AMQP server | 14:54 |
evrardjp | After the "Failed to consume message from queue:" I've got "skipping periodic task _periodic_update_dns because its interval is negative" | 14:54 |
evrardjp | the time seems good though at first sight | 14:55 |
mattt | evrardjp: is it a production install? | 14:55 |
evrardjp | nope, but it should be treated as one | 14:56 |
evrardjp | why? | 14:56 |
mattt | evrardjp: just curious | 14:56 |
*** wmlynch_ has joined #openstack-ansible | 14:56 | |
evrardjp | It has worked in the past | 14:56 |
mattt | evrardjp: your rabbit cluster is up right? | 14:56 |
evrardjp | yup | 14:57 |
evrardjp | else it would say connected | 14:57 |
evrardjp | wouldn't* | 14:57 |
mattt | yeah that is true | 14:57 |
evrardjp | yeah, list_queues and list_policies seem fine | 14:58 |
*** rromans_ has joined #openstack-ansible | 14:58 | |
evrardjp | I'm flipping debug | 14:58 |
*** tobasco_ has quit IRC | 14:59 | |
*** mgariepy has quit IRC | 14:59 | |
*** metral has quit IRC | 14:59 | |
*** rromans has quit IRC | 14:59 | |
*** Ti-mo has quit IRC | 14:59 | |
*** tobasco has joined #openstack-ansible | 14:59 | |
mattt | evrardjp: how many compute nodes do you have ? | 14:59 |
*** woodard_ has joined #openstack-ansible | 15:02 | |
evrardjp | in that environment, just one | 15:02 |
evrardjp | but it should be enough for starting my vm | 15:02 |
evrardjp | enough cpu/ram | 15:02 |
mattt | evrardjp: are you using the libvirt rbd thing, or booting w/ cinder rbd ? | 15:02 |
evrardjp | good question | 15:02 |
*** woodard_ has quit IRC | 15:03 | |
*** woodard_ has joined #openstack-ansible | 15:03 | |
evrardjp | I'm not starting from a volume | 15:03 |
*** yaya has joined #openstack-ansible | 15:03 | |
*** javeriak has joined #openstack-ansible | 15:03 | |
evrardjp | so I guess it's libvirt rbd | 15:03 |
mattt | evrardjp: ok, cool ... you created the necessary user on the ceph cluster right? | 15:03 |
evrardjp | I'm not the ceph expert of my company, so someone created it for me | 15:04 |
evrardjp | it should have failed in the playbooks if it was wrong, right? | 15:04 |
mattt | evrardjp: should have, wondering if it was given the necessary permissions | 15:05 |
mattt | and that the right pool was created etc. | 15:05 |
mattt | just thinking of all the things that can go wrong here | 15:05 |
*** woodard has quit IRC | 15:05 | |
openstackgerrit | Merged stackforge/os-ansible-deployment-specs: Add tox generated files to .gitignore https://review.openstack.org/208440 | 15:06 |
*** javeriak has quit IRC | 15:07 | |
*** javeriak has joined #openstack-ansible | 15:07 | |
Apsu | So, really, "no valid host" should have more detail available in some logfile | 15:08 |
Apsu | nova's api service, or the nova-scheduler | 15:08 |
evrardjp | I've enabled debug | 15:09 |
*** phalmos has quit IRC | 15:09 | |
* Apsu nods | 15:09 | |
sigmavirus24 | odyssey4me: where are the logs you wanted me to look at? | 15:09 |
evrardjp | I'll check again on nova api and nova scheduler | 15:09 |
evrardjp | to follow the flow | 15:09 |
*** mgariepy has joined #openstack-ansible | 15:12 | |
*** Ti-mo has joined #openstack-ansible | 15:12 | |
*** metral has joined #openstack-ansible | 15:12 | |
*** phalmos has joined #openstack-ansible | 15:17 | |
evrardjp | debug indeed helps | 15:19 |
Sam-I-Am | mgariepy: did you find what you needed? | 15:21 |
odyssey4me | sigmavirus24 if you have a gap, note that the pbr limit in https://review.openstack.org/211265 also came with a python-openstacksdk issue which I worked around, if you can figure out the source of the issue so that we can get it fixed then that'd be awesome | 15:25 |
sigmavirus24 | odyssey4me: so that bug report is incomplete | 15:25 |
odyssey4me | sigmavirus24 https://bugs.launchpad.net/openstack-ansible/+bug/1484011 is the issue | 15:25 |
openstack | Launchpad bug 1484011 in openstack-ansible trunk "python-openstacksdk build fails" [Critical,New] | 15:25 |
sigmavirus24 | It doesn't tell me what version it was trying to build of openstacksdk | 15:25 |
sigmavirus24 | Yeah I'm reading that issue | 15:25 |
mgariepy | Sam-I-Am, yeah i'll be ok. | 15:25 |
sigmavirus24 | ENOTENOUGHINFORMATION | 15:25 |
mgariepy | thanks for your help. | 15:25 |
odyssey4me | sigmavirus24 check the first comment | 15:25 |
sigmavirus24 | "git tag and github show differing tags, so I suspect there's an issue somewhere - but fixing the requirements on a static version fixes this" | 15:26 |
odyssey4me | (and the last) | 15:26 |
evrardjp | paste.openstack.org/show/NWWiRXC9yrE2aPkOfbFw | 15:26 |
sigmavirus24 | "This now also affects the master branch." | 15:26 |
sigmavirus24 | still no information about what was being built that failed | 15:26 |
sigmavirus24 | odyssey4me: I don't mean version of osad | 15:26 |
odyssey4me | sigmavirus24 'Command "python setup.py egg_info" failed with error code 1 in /tmp/openstack-builder/python-openstacksdk' | 15:26 |
sigmavirus24 | I mean version of python-openstacksdk | 15:26 |
sigmavirus24 | Yeah I get that | 15:27 |
odyssey4me | sigmavirus24 I haven't managed to replicate it down to a specific tag - that's what needs to be figured out | 15:27 |
odyssey4me | I tried a little this morning, but then squirrels | 15:27 |
*** alop has joined #openstack-ansible | 15:27 | |
odyssey4me | sigmavirus24 I'll retry a build without that version set and see how it goes - master seems not to be affected... suffice to say that this may not be an issue any more with other updates | 15:29 |
sigmavirus24 | odyssey4me: no need | 15:29 |
odyssey4me | I just need an independent verification | 15:29 |
sigmavirus24 | I'm trying to do the same | 15:29 |
*** woodard_ has quit IRC | 15:30 | |
odyssey4me | for the issue in master related to constant failures - pick a failed build, any build from master - you'll see that hpcloud is common, but I haven't found a cause yet | 15:30 |
sigmavirus24 | odyssey4me: we should start an etherpad to track failed builds | 15:32 |
*** woodard has joined #openstack-ansible | 15:32 | |
odyssey4me | sigmavirus24 sounds like a plan | 15:32 |
sigmavirus24 | So odyssey4me on master, I don't see any mention of openstacksdk in the playbooks | 15:32 |
sigmavirus24 | or anywhere in osad | 15:32 |
odyssey4me | sigmavirus24 yep, it's a dep | 15:33 |
sigmavirus24 | it's a transitive dep? | 15:33 |
sigmavirus24 | i.e., a dependency of something we're using? | 15:33 |
*** javeriak_ has joined #openstack-ansible | 15:34 | |
palendae | Wouldn't be surprised if that came from openstack global requirements | 15:34 |
odyssey4me | sigmavirus24 it comes from global requirements: https://github.com/openstack/requirements/blob/master/global-requirements.txt#L160 | 15:34 |
*** javeriak has quit IRC | 15:34 | |
sigmavirus24 | Oh right | 15:34 |
palendae | Caaaaaallled it | 15:34 |
sigmavirus24 | I forget we just build everything in there | 15:34 |
odyssey4me | and https://github.com/openstack/requirements/blob/stable/kilo/global-requirements.txt#L136 | 15:34 |
sigmavirus24 | so does yaprt always take the lowest possible version? | 15:35 |
sigmavirus24 | I can never remember the answer to this | 15:35 |
Apsu | safest* | 15:35 |
Apsu | Because old is safe | 15:35 |
odyssey4me | sigmavirus24 and the issue is http://logs.openstack.org/65/211265/2/check/gate-os-ansible-deployment-dsvm-commit/3b38df0/console.html.gz#_2015-08-12_07_32_17_794 | 15:35 |
odyssey4me | sigmavirus24 it seems to be the case that it takes the lowest possible option, yes | 15:36 |
*** rward has quit IRC | 15:36 | |
svg | Does OSAD do something wrt lvm.conf or lvm in general, that could potentially overwrite an existing lvm setup? | 15:37 |
*** phalmos has quit IRC | 15:39 | |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Remove python-openstacksdk version spec https://review.openstack.org/214242 | 15:39 |
odyssey4me | sigmavirus24 ^ that is a test to see if it works again without the version specification | 15:39 |
odyssey4me | svg it would appear so: playbooks/roles/openstack_hosts/tasks/openstack_lvm_config.yml and playbooks/roles/os_cinder/tasks/cinder_lvm_config.yml | 15:40 |
svg | at least looking in ./playbooks/roles/openstack_hosts/templates/lvm.conf.j2 it seems it checks for current setup and defines them there | 15:41 |
svg | colleague mailed me about pre-existing setup where /openstack is mounted on an external storage backed lvm device, where this fails after the osad rollout and after a reboot | 15:44 |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Remove hardcoded config drive enforcement https://review.openstack.org/212497 | 15:55 |
openstackgerrit | Merged stackforge/os-ansible-deployment: Enable debug logging for gate checks https://review.openstack.org/213438 | 15:56 |
*** wmlynch has quit IRC | 15:57 | |
*** javeriak_ has quit IRC | 15:58 | |
evrardjp | found my issue \o/ | 16:01 |
evrardjp | nothing related to nova | 16:01 |
Apsu | yey | 16:01 |
evrardjp | routing on the nova host was failing | 16:01 |
evrardjp | it worked perfectly fine at first sight, but routing to ceph instance was not working | 16:02 |
*** javeriak has joined #openstack-ansible | 16:02 | |
evrardjp | damn ipv6 | 16:02 |
odyssey4me | howdy all - ready for bug triage? | 16:03 |
odyssey4me | cloudnull, mattt, andymccr, d34dh0r53, hughsaunders, b3rnard0, palendae, Sam-I-Am, odyssey4me, serverascode, rromans, mancdaz, dolphm, _shaps_, BjoernT, claco, echiu, dstanek, jwagner, ayoung, prometheanfire, evrardjp bug triage in this room | 16:04 |
dstanek | i'm half here...trying to type faster to get stuff done this morning | 16:05 |
odyssey4me | if it's ok with everyone, I'd like to get through the new bugs quickly - then spend some time showing everyone how to debug gate failures | 16:05 |
odyssey4me | just orientation regarding where the logs are, how to read them, etc | 16:05 |
sigmavirus24 | that's not what a bug triage is for | 16:05 |
sigmavirus24 | =P | 16:05 |
odyssey4me | sure, but if we triage the bugs - then I think that there are a few people who may be able to help spot the cause of the master blocker right now | 16:06 |
rromans_ | . | 16:06 |
sigmavirus24 | odyssey4me: I'm just teasing | 16:07 |
palendae | Ready for triaging | 16:07 |
odyssey4me | first up: https://bugs.launchpad.net/openstack-ansible/+bug/1485547 | 16:07 |
openstack | Launchpad bug 1485547 in openstack-ansible trunk "Need ability to set default_volume_type in cinder" [Undecided,New] | 16:07 |
odyssey4me | seems like a good enhancement request - happy for this to be on the wish list? | 16:08 |
palendae | I am | 16:08 |
odyssey4me | that's a low hanging fruit patch for anyone wanting to patch it up :) | 16:09 |
* odyssey4me looks at evrardjp :) | 16:09 | |
evrardjp | :) | 16:09 |
evrardjp | I'll check if I have time | 16:10 |
evrardjp | I'll tell later | 16:10 |
odyssey4me | we have some docs bugs, I'll allocate those appropriately | 16:10 |
odyssey4me | https://bugs.launchpad.net/openstack-ansible/+bug/1482265 | 16:10 |
openstack | Launchpad bug 1482265 in openstack-ansible juno "Nova-computes need nfs-common installed if NFS cinder backend" [High,Confirmed] - Assigned to Jesse Pretorius (jesse-pretorius) | 16:10 |
odyssey4me | I think we discussed this last week, but I haven't had a chance to look into it for master/kilo. | 16:10 |
odyssey4me | Does anyone have a gap to confirm the bug? If you do, please feel free to allocate it to yourself | 16:11 |
odyssey4me | otherwise I'll get to it asap | 16:11 |
odyssey4me | https://bugs.launchpad.net/openstack-ansible/+bug/1484256 | 16:11 |
openstack | Launchpad bug 1484256 in openstack-ansible "Apache servers reporting version in response header" [Undecided,New] | 16:11 |
palendae | That was reported by our netsec folks | 16:12 |
odyssey4me | that's a good call, and another low hanging fruit fix :) | 16:12 |
odyssey4me | not really a bug though? wishlist? | 16:13 |
sigmavirus24 | odyssey4me: eh | 16:14 |
odyssey4me | due to the security factor I am tempted to rate it a medium bug though | 16:14 |
sigmavirus24 | low is better | 16:14 |
palendae | Possibly a security concern | 16:14 |
palendae | Low seems reasonable to me | 16:14 |
sigmavirus24 | Put it like this, it makes it easier for someone to know which exploits to test against our apache servers | 16:14 |
odyssey4me | ok, low it is | 16:14 |
sigmavirus24 | That cuts down the attack time | 16:14 |
sigmavirus24 | But it doesn't mean they won't find a vulnerability if they're attacking the server | 16:15 |
odyssey4me | I'll target it for 11.2.0 and hopefully someone can pick it up before then. | 16:15 |
sigmavirus24 | So medium works too but I don't think it's a huge concern | 16:15 |
odyssey4me | https://bugs.launchpad.net/openstack-ansible/+bug/1484619 | 16:15 |
openstack | Launchpad bug 1484619 in openstack-ansible "Document host_bind_override option" [Undecided,New] | 16:15 |
sigmavirus24 | assign Sam-I-Am | 16:16 |
sigmavirus24 | =P | 16:16 |
odyssey4me | Sam-I-Am are you around perhaps? | 16:16 |
odyssey4me | done | 16:17 |
sigmavirus24 | We need a bug triage bot so we can do "#assign Sam-I-Am" | 16:17 |
sigmavirus24 | and it just works | 16:17 |
odyssey4me | ok, now the current master blocker: https://bugs.launchpad.net/openstack-ansible/+bug/1485917 | 16:17 |
openstack | Launchpad bug 1485917 in openstack-ansible trunk "hpcloud AIO's are failing tempest tests" [Critical,New] - Assigned to Jesse Pretorius (jesse-pretorius) | 16:17 |
odyssey4me | palendae do you want to do a run-through of navigating gerrit and identifying the logs | 16:18 |
Apsu | Looking at the volume failure right now | 16:18 |
Apsu | Logs for the volume failure* that is | 16:18 |
odyssey4me | palendae ? | 16:19 |
palendae | Basically we're looking through https://review.openstack.org/#/q/project:stackforge/os-ansible-deployment+branch:master,n,z to find -1 verifieds; the gate-os-ansible-dsvm-commit log will contain the failures in the 'console' log | 16:19 |
palendae | As to the ones specifically affecting master on HP cloud, I have not looked in depth to know what the errors in that log are | 16:19 |
Apsu | odyssey4me: It was me who asked what was hpcloud and what wasn't, earlier. I'm quite familiar with identifying logs and navigating gerrit ;P | 16:19 |
odyssey4me | Apsu sure, but evrardjp mgariepy and others are new to this :) | 16:20 |
palendae | That part I also don't know...I assume there's a host name in the logs? | 16:20 |
evrardjp | I am but I can grep -i FAIL * | 16:20 |
odyssey4me | ok, so you can identify the provider and region right at the top of the console log | 16:20 |
odyssey4me | eg: http://logs.openstack.org/67/213467/1/gate/gate-os-ansible-deployment-dsvm-commit/ab4aa89/console.html#_2015-08-18_15_02_50_318 | 16:20 |
palendae | (also, upstream jobs have a standard footer that goes on all their job index pages, which I would love to include someday) | 16:20 |
odyssey4me | see devstack-trusty-hpcloud-b2-<number> | 16:21 |
evrardjp | same as Apsu | 16:21 |
odyssey4me | that means it's the devstack image, on ubuntu trusty, running in hpcloud region b2 | 16:21 |
Apsu | Tempest is failing on waiting for cinder volume create because the build goes into ERROR. Why is what I'm looking into now | 16:22 |
odyssey4me | Apsu ok, so here's where some context may help | 16:22 |
evrardjp | OSError: [Errno 2] No such file or directory seems bad too | 16:23 |
odyssey4me | note: https://github.com/stackforge/os-ansible-deployment/blob/master/scripts/scripts-library.sh#L21 | 16:23 |
odyssey4me | and https://github.com/stackforge/os-ansible-deployment/blob/master/scripts/scripts-library.sh#L65-L111 | 16:23 |
evrardjp | but not that much apparently | 16:23 |
odyssey4me | and also http://logs.openstack.org/67/213467/1/gate/gate-os-ansible-deployment-dsvm-commit/ab4aa89/logs/instance-info/ | 16:23 |
Apsu | ok | 16:23 |
odyssey4me | the host_info files contain information about the instance | 16:23 |
odyssey4me | including disk layout | 16:24 |
evrardjp | ok | 16:24 |
Apsu | Cool | 16:25 |
odyssey4me | if you check successful master jobs, you'll be able to compare as well | 16:26 |
odyssey4me | here's a recent master success which ran in rax: https://review.openstack.org/212919 | 16:27 |
*** phalmos has joined #openstack-ansible | 16:27 | |
odyssey4me | you'll see that the disk layouts are different | 16:27 |
odyssey4me | and you'll notice that the script sets up the cinder-volumes vg differently | 16:27 |
*** rward has joined #openstack-ansible | 16:28 | |
odyssey4me | in hpcloud it's on a real disk because there's enough space to do that, but on rax we have to use a loopback disk for cinder | 16:28 |
odyssey4me | in hpcloud there's not enough space on the system disk, whereas on rax there is | 16:28 |
odyssey4me | something I noticed is that we're missing the cinder-volume log here: http://logs.openstack.org/67/213467/1/gate/gate-os-ansible-deployment-dsvm-commit/ab4aa89/logs/aio1-cinder/ | 16:29 |
Apsu | Interesting. | 16:29 |
Apsu | Also, appears the 1gb test volume create/deletes are successful, at least from the api/scheduler perspectives | 16:30 |
odyssey4me | yep | 16:30 |
*** yaya has quit IRC | 16:30 | |
odyssey4me | I built a g1-8 on rax and added a 500gb disk to provide a similar setup - then bootstrapped master and confirmed that it was set up with a similar layout to hpcloud. | 16:31 |
odyssey4me | tempest failed for me, multiple times - but each time it failed differently | 16:31 |
odyssey4me | and none of those times were related to volumes | 16:31 |
Apsu | Are the account/container/container-error/object logs supposed to have things in them, under logs/aio1? | 16:31 |
Apsu | Because they're all empty | 16:31 |
odyssey4me | Apsu those are swift logs, and yeah - I think there's an issue where rsyslog needs to be restarted before those get populated. | 16:32 |
odyssey4me | maybe another bug to pick up on | 16:32 |
odyssey4me | but not related | 16:32 |
Apsu | kk | 16:32 |
Apsu | Figured they were swift, but ok | 16:32 |
odyssey4me | evrardjp do you see anything unusual? | 16:33 |
odyssey4me | I expect that we may be finding plenty of red herrings here, which is why I'm asking for more eyes. | 16:33 |
evrardjp | I'm looking into that right now | 16:33 |
evrardjp | http://logs.openstack.org/67/213467/1/gate/gate-os-ansible-deployment-dsvm-commit/ab4aa89/console.html#_2015-08-18_15_02_50_796 | 16:33 |
odyssey4me | the issue is most likely cinder/nova related, I think - but it befuzzles me why it works in rax but not in hpcloud | 16:34 |
evrardjp | I read your analysis | 16:34 |
*** sdake_ is now known as sdake | 16:34 | |
odyssey4me | the following patches have been submitted through my digging: https://review.openstack.org/214172 | 16:35 |
odyssey4me | https://review.openstack.org/210107 | 16:35 |
odyssey4me | https://review.openstack.org/214045 | 16:35 |
evrardjp | I never used tempest, so it's really hard for me to understand the impact of all changes, but I get what you've done. Although I don't get why yet | 16:37 |
odyssey4me | well, basic logic tells me that the underlying difference in storage architecture is the most likely culprit | 16:38 |
odyssey4me | but that very same architecture is being used for juno and kilo with success, so it may have to do with changes in cinder code instead | 16:39 |
Apsu | Wonder what's up with no cinder-volume log | 16:39 |
*** phalmos has quit IRC | 16:39 | |
*** shausy has quit IRC | 16:39 | |
evrardjp | about that you were right | 16:40 |
evrardjp | http://logs.openstack.org/67/213467/1/gate/gate-os-ansible-deployment-dsvm-commit/ab4aa89/console.html#_2015-08-18_16_10_48_791 | 16:40 |
odyssey4me | Apsu I suspect the switch from on_metal false to on_metal true was not done properly - the log link probably has not been implemented | 16:40 |
Apsu | odyssey4me: Ah | 16:40 |
odyssey4me | evrardjp yep, below that you'll see the tempest debug output showing the stack traces for the failures | 16:41 |
evrardjp | I'll get there, it's hard to do everything at the same time ;) | 16:41 |
*** c0m0 has quit IRC | 16:43 | |
odyssey4me | let me fyi - I've logged https://bugs.launchpad.net/openstack-ansible/+bug/1486133 | 16:44 |
openstack | Launchpad bug 1486133 in openstack-ansible "zero sized swift logs" [Low,New] | 16:44 |
Apsu | Without that log, this might not be easy to figure out. The API call logs from tempest show the create call succeeding, the status is "creating", followed by "error" | 16:44 |
evrardjp | odyssey4me: what do you mean by the switch with on_metal? | 16:44 |
Apsu | Then it deletes it, and the delete succeeds | 16:44 |
odyssey4me | will also log the bug regarding the cinder-volume log, as that'd be very helpful right now | 16:44 |
Apsu | evrardjp: cinder-volume isn't contained | 16:44 |
odyssey4me | evrardjp the cinder container thing that you did some documentation for | 16:45 |
evrardjp | yup I remember that ;) | 16:45 |
odyssey4me | well, since that change we no longer have a cinder-volume log | 16:45 |
evrardjp | but we are still on_metal because we're using the lvm right? | 16:47 |
*** ashishb has joined #openstack-ansible | 16:47 | |
evrardjp | I know nothing about these devstack instances, I'll check the link you posted | 16:47 |
odyssey4me | evrardjp in this case 'devstack-trusty' refers to the base image used inside openstack-ci | 16:48 |
evrardjp | anyway that's not the important part of the conversation | 16:48 |
evrardjp | I get that | 16:48 |
odyssey4me | it's an Ubuntu Trusty image with a boatload of other software on it | 16:48 |
evrardjp | I've checked now what's inside | 16:49 |
evrardjp | with your link | 16:49 |
odyssey4me | we have discovered that sometimes the image is inconsistent, so that's quite possibly a cause | 16:49 |
evrardjp | so by default, there isn't any lvm in the /dev/vd* | 16:49 |
odyssey4me | another bug https://bugs.launchpad.net/openstack-ansible/+bug/1486137 | 16:49 |
openstack | Launchpad bug 1486137 in openstack-ansible "cinder-volume log missing" [Medium,New] | 16:49 |
evrardjp | ok | 16:49 |
evrardjp | so there is a complete reverse engineering to do about the is_metal change | 16:50 |
evrardjp | ;) | 16:50 |
*** rromans_ is now known as rromans_afk | 16:51 | |
evrardjp | I'm taking the conversation away from its original goal, sorry | 16:51 |
*** phalmos has joined #openstack-ansible | 16:51 | |
evrardjp | quick question: isn't it quicker to temporarily recheck and land on a different host, or even ask for it to be built on a specific host? | 16:52 |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Set iptables-persistent install execution to append to log https://review.openstack.org/214172 | 16:53 |
odyssey4me | evrardjp I've been rechecking for days now. This is too disruptive. | 16:53 |
odyssey4me | The node pools aren't allocated evenly. | 16:54 |
odyssey4me | And we don't get to choose which node pool, or which host to build on. | 16:54 |
odyssey4me | This is a general CI service for all of openstack. See jobs come and go here: http://status.openstack.org/zuul/ | 16:54 |
evrardjp | yeah I was guessing that, since the jobs running or failing have the same name | 16:56 |
evrardjp | I should check on zuul, learn more about it | 16:56 |
odyssey4me | evrardjp workhole alert: http://docs.openstack.org/infra/manual/ | 16:57 |
odyssey4me | *wormhole | 16:57 |
evrardjp | I'll stay away from time warps as much as I can, thanks for the alert | 16:58 |
evrardjp | I'll have to go for today, but I can still learn/check tomorrow | 17:01 |
odyssey4me | I need to get out of here - time to get home. I need a break for it. | 17:02 |
odyssey4me | If anyone discovers anything that's a pattern across failed builds that might be worth looking into then please let me know. | 17:02 |
evrardjp | I guess we'll try to see what will change with your api version change, right? | 17:02 |
evrardjp | at least trying | 17:03 |
evrardjp | ok odyssey4me, let's try this tomorrow | 17:03 |
Apsu | odyssey4me: rsyslog usage and config/vars seem the same in the cinder playbook as nova and such | 17:03 |
Apsu | Very odd | 17:03 |
odyssey4me | evrardjp already been through and failed :/ | 17:03 |
Apsu | odyssey4me: Where does the bit that uploads the logs on the gate live? | 17:04 |
odyssey4me | Apsu that's a jenkins thingy in openstack-ci and out of our control | 17:04 |
Apsu | yey | 17:04 |
odyssey4me | we do this to facilitate it: https://github.com/stackforge/os-ansible-deployment/blob/master/scripts/gate-check-commit.sh#L49-L52 | 17:05 |
odyssey4me | essentially whatever ends up in a subdirectory of the pwd called 'logs' ends up being uploaded by jenkins | 17:05 |
Apsu | Gotcha | 17:05 |
*** spotz is now known as spotz_zzz | 17:05 | |
odyssey4me | we symlink it to /openstack/logs seeing as that's where we store all our logs anyway | 17:06 |
odyssey4me | *ahem '/openstack/log' | 17:06 |
odyssey4me | if we're still stuck tomorrow and need more info, we'll try and crank out some bugfixes for the above-registered bugs | 17:06 |
Apsu | yep | 17:06 |
odyssey4me | but Apsu if there's info you're missing that could be useful, let us know and we can pop in a review to get those included - whether permanently or temporarily | 17:07 |
* Apsu nods | 17:08 | |
Apsu | Thanks sir. Catch you later | 17:08 |
openstackgerrit | Merged stackforge/os-ansible-deployment: Remove unused variables in os_swift role https://review.openstack.org/213114 | 17:10 |
*** dabernie_ has quit IRC | 17:12 | |
*** jwagner is now known as jwagner_away | 17:18 | |
*** KLevenstein has quit IRC | 17:22 | |
*** yaya has joined #openstack-ansible | 17:24 | |
openstackgerrit | Merged stackforge/os-ansible-deployment: development-stack Doc Update https://review.openstack.org/206016 | 17:28 |
*** britthouser has joined #openstack-ansible | 17:30 | |
*** TheIntern has joined #openstack-ansible | 17:31 | |
*** britthou_ has quit IRC | 17:33 | |
*** phalmos has quit IRC | 17:34 | |
*** phalmos has joined #openstack-ansible | 17:35 | |
*** yaya has quit IRC | 17:35 | |
Sam-I-Am | odyssey4me: moo? | 17:35 |
Sam-I-Am | odyssey4me: was conferencing during bug triage | 17:35 |
*** k_stev has joined #openstack-ansible | 17:38 | |
prometheanfire | Sam-I-Am: no irc for you | 17:39 |
Sam-I-Am | shhh you | 17:40 |
*** alop has quit IRC | 17:41 | |
*** prometheanfire has quit IRC | 17:45 | |
*** prometheanfire has joined #openstack-ansible | 17:46 | |
*** KLevenstein has joined #openstack-ansible | 17:50 | |
*** TheIntern has quit IRC | 18:05 | |
*** yaya has joined #openstack-ansible | 18:05 | |
*** openstackgerrit_ has quit IRC | 18:11 | |
*** spotz_zzz is now known as spotz | 18:23 | |
openstackgerrit | Merged stackforge/os-ansible-deployment: Remove python-openstacksdk version spec https://review.openstack.org/214242 | 18:34 |
openstackgerrit | Merged stackforge/os-ansible-deployment: Remove hardcoded config drive enforcement https://review.openstack.org/212497 | 18:34 |
odyssey4me | Sam-I-Am it was just to discuss https://bugs.launchpad.net/openstack-ansible/+bug/1484619 but we decided to assign it to you instead :p | 18:38 |
openstack | Launchpad bug 1484619 in openstack-ansible "Document host_bind_override option" [Low,Confirmed] - Assigned to Matt Kassawara (ionosphere80) | 18:38 |
Sam-I-Am | oh, yeah... it's just not documented anywhere | 18:39 |
Sam-I-Am | except if you look at the ansibles :) | 18:40 |
Sam-I-Am | or... understand what it does | 18:40 |
Sam-I-Am | self-documenting? | 18:40 |
odyssey4me | Apsu if you're up to continue debugging the master block then we can continue - just give me an hour or so to make food and such. | 18:40 |
odyssey4me | It's nagging me, which means that sleep will be elusive once again. | 18:41 |
Apsu | odyssey4me: Yep, in meetings right now, will hit you up in a bit | 18:44 |
Sam-I-Am | odyssey4me: whats broked this time? | 18:45 |
odyssey4me | Sam-I-Am https://bugs.launchpad.net/openstack-ansible/+bug/1485917 | 18:45 |
openstack | Launchpad bug 1485917 in openstack-ansible trunk "hpcloud AIO's are failing tempest tests" [Critical,In progress] - Assigned to Jesse Pretorius (jesse-pretorius) | 18:45 |
Sam-I-Am | if hpcloud exit 0 ? | 18:46 |
sigmavirus24 | lol | 18:46 |
odyssey4me | not sure of the root cause yet | 18:46 |
Sam-I-Am | hp | 18:47 |
Apsu | return true | 18:48 |
sigmavirus24 | odyssey4me: http://paste.openstack.org/show/420843/ | 18:48 |
sigmavirus24 | odyssey4me: specifically "No valid host was found. No weighed hosts available" | 18:49 |
odyssey4me | sigmavirus24 hmm, I've seen that before - and also seen devstack gate references recently to it - but it wasn't consistent between failures | 18:50 |
odyssey4me | but it may be a clue | 18:50 |
sigmavirus24 | odyssey4me: just looking through one set of logs | 18:50 |
sigmavirus24 | trying to dig in | 18:50 |
sigmavirus24 | and look around | 18:50 |
odyssey4me | I suspect in that case it may be due to cinder-volume not starting, or something to that effect | 18:50 |
sigmavirus24 | it's the only "error" in cinder's logs though | 18:50 |
Sam-I-Am | yeah | 18:50 |
Sam-I-Am | thats what i'm thinking | 18:50 |
odyssey4me | with https://bugs.launchpad.net/openstack-ansible/+bug/1486137 it makes it harder to know | 18:50 |
openstack | Launchpad bug 1486137 in openstack-ansible "cinder-volume log missing" [Medium,New] | 18:50 |
sigmavirus24 | hah | 18:50 |
sigmavirus24 | I was like, "Well that explains why I can't find cinder volume's log file" | 18:51 |
*** woodard has quit IRC | 18:51 | |
sigmavirus24 | I'll tackle that now | 18:51 |
Sam-I-Am | is cinder finding all of its infra bits? | 18:51 |
sigmavirus24 | And see if it shows up in the logs | 18:51 |
Sam-I-Am | like... fake devices | 18:51 |
odyssey4me | I suspect we have a missing symlink which needs to be placed there if cinder-volume is 'on metal' | 18:51 |
odyssey4me | Sam-I-Am in most failed builds the scheduler shows successful volume creates, deletes, scrubs, etc | 18:52 |
odyssey4me | also, hpcloud happens to not use a fake device for cinder - that's the odd bit | 18:52 |
*** yaya has quit IRC | 18:52 | |
Sam-I-Am | what does it do? | 18:52 |
Sam-I-Am | i havent looked at the infra jobs | 18:52 |
odyssey4me | the other thing I'm thinking is that perhaps lvm is saying that it supports 'thin' lv's - but we're missing a package to actually make that work properly | 18:53 |
Sam-I-Am | why would this be different on different clouds? | 18:53 |
odyssey4me | Sam-I-Am in hpcloud there's a big enough ephemeral disk for us to share it between the containers and cinder - so there are two vg's. It uses real LVM on a real disk. | 18:53 |
Sam-I-Am | ahh | 18:54 |
Sam-I-Am | vs. containers just using root? | 18:54 |
odyssey4me | Sam-I-Am there's a difference in underlying disk structure, but also why is this working perfectly for kilo - but not master | 18:54 |
odyssey4me | Sam-I-Am both hp and rax have their containers built in an LVM vg | 18:54 |
Sam-I-Am | but rax uses a fake for cinder? | 18:55 |
odyssey4me | only cinder's vg differs in the underlying supporting disk - in rax it's a fake loopback disk, in hp it's a real disk | 18:55 |
Sam-I-Am | hmmm | 18:55 |
odyssey4me | in both cases they are still lvm vg's - it's just the underlying disk that differs | 18:55 |
odyssey4me | Sam-I-Am more details in your backscroll :p | 18:56 |
*** sdake_ has joined #openstack-ansible | 18:56 | |
*** yaya has joined #openstack-ansible | 18:58 | |
*** k_stev has quit IRC | 18:58 | |
*** sdake has quit IRC | 18:59 | |
*** k_stev has joined #openstack-ansible | 19:00 | |
*** cloudnull_afk is now known as cloudkiller | 19:02 | |
*** fawadkhaliq has joined #openstack-ansible | 19:02 | |
*** javeriak has quit IRC | 19:07 | |
*** javeriak has joined #openstack-ansible | 19:10 | |
*** fawadk has joined #openstack-ansible | 19:13 | |
*** fawadkhaliq has quit IRC | 19:16 | |
*** sdake_ is now known as sdake | 19:16 | |
*** alop has joined #openstack-ansible | 19:25 | |
*** jwagner_away is now known as jwagner | 19:25 | |
*** fawadk has quit IRC | 19:41 | |
*** rromans_afk is now known as rromans | 19:42 | |
*** cloudkiller is now known as cloudnull_zzz | 19:48 | |
*** woodard has joined #openstack-ansible | 19:51 | |
*** britthouser has quit IRC | 19:54 | |
*** britthouser has joined #openstack-ansible | 19:54 | |
*** woodard has quit IRC | 19:57 | |
*** sigmavirus24 is now known as sigmavirus24_awa | 19:57 | |
*** sigmavirus24_awa is now known as sigmavirus24 | 19:58 | |
*** ashishb has quit IRC | 20:06 | |
*** britthou_ has joined #openstack-ansible | 20:12 | |
*** britthouser has quit IRC | 20:15 | |
openstackgerrit | Merged stackforge/os-ansible-deployment: Correct binding logic in haproxy configuration https://review.openstack.org/213230 | 20:16 |
*** britthouser has joined #openstack-ansible | 20:18 | |
*** britthou_ has quit IRC | 20:19 | |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Remove unused variables in os_swift role https://review.openstack.org/214326 | 20:19 |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Correct binding logic in haproxy configuration https://review.openstack.org/214328 | 20:19 |
*** woodard has joined #openstack-ansible | 20:31 | |
sigmavirus24 | odyssey4me: you around? | 20:32 |
sigmavirus24 | So, I found that while cinder-volume.log may not be present in HP Cloud, aio1-cinder may not be present at all either | 20:32 |
sigmavirus24 | Apsu: http://paste.openstack.org/show/420987/ | 20:36 |
sigmavirus24 | odyssey4me: ^ | 20:36 |
*** javeriak has quit IRC | 20:40 | |
odyssey4me | sigmavirus24 pong | 20:40 |
sigmavirus24 | nevermind | 20:41 |
sigmavirus24 | that's wrong | 20:41 |
odyssey4me | sigmavirus24 I was just having a wtf moment there | 20:41 |
sigmavirus24 | yeah | 20:41 |
sigmavirus24 | the logging is not helpful | 20:42 |
sigmavirus24 | because it runs the restarts across cinder_all | 20:42 |
sigmavirus24 | Instead of restarting specific services on the host they're running on | 20:42 |
sigmavirus24 | which my Cmd-F searching found, and that confused me | 20:42 |
odyssey4me | sigmavirus24 our playbooks are too broad, and our roles do too much magic with the inventory - it's time we did better | 20:42 |
sigmavirus24 | so | 20:44 |
sigmavirus24 | aio1-cinder/ is missing on some of these jenkins runs | 20:44 |
sigmavirus24 | which makes it seem like cinder-backup/cinder-volume never start or never actually print anything out | 20:44 |
odyssey4me | that makes sense to me in light of what's going on - so I'd like to zone in on that somehow | 20:45 |
*** britthou_ has joined #openstack-ansible | 20:45 | |
odyssey4me | we have a ton of superfluous stuff that gets run and logged to try and figure out the issue with hpcloud-b4 - and much of it is getting in the way | 20:45 |
odyssey4me | I'm tempted to rip it all out and just exit fail on hpcloud-b4 | 20:45 |
sigmavirus24 | odyssey4me: Apsu brought up the theory that this could be related to base images | 20:46 |
sigmavirus24 | I'm asking in -infra if they updated the dsvm image recently? | 20:46 |
sigmavirus24 | s/\?// | 20:46 |
d34dh0r53 | food for thought, I start getting SSH errors on boxen with a large number of processors, we should probably cap the FORKS number, number of procs up until a cap | 20:46 |
odyssey4me | sigmavirus24 yes - although in theory the base images are synchronised, there is no guarantee | 20:46 |
sigmavirus24 | right | 20:46 |
odyssey4me | d34dh0r53 long fixed on that one :p | 20:46 |
d34dh0r53 | there's a cap? | 20:46 |
*** yaya has quit IRC | 20:47 | |
odyssey4me | well, we use the cpu number for some tasks - but on normal openstack-ansible executions we use the default and let the deployer change it | 20:47 |
odyssey4me | the gate uses the cpu number as a forks override | 20:47 |
odyssey4me | but we must really stop using overrides, dammit | 20:47 |
*** britthouser has quit IRC | 20:47 | |
d34dh0r53 | heh | 20:48 |
odyssey4me | d3 https://review.openstack.org/207474 | 20:48 |
odyssey4me | d34dh0r53 ^ | 20:48 |
d34dh0r53 | odyssey4me: right, what I'm saying is that number of cpu's can be greater than what FORKS can successfully run at. | 20:50 |
*** britthou_ has quit IRC | 20:51 | |
odyssey4me | d34dh0r53 interesting - perhaps it's based on the target then? | 20:52 |
odyssey4me | I expect that perhaps we should do something like #cpu's or 8, whichever is lower | 20:52 |
odyssey4me | or something like that | 20:52 |
odyssey4me | maybe not 8, but 10 or 12 - but you get the idea | 20:53 |
d34dh0r53 | odyssey4me: yeah, it's strange, I think we can go higher than 8, but I'm not sure what the cap is, on this 40 core OM box I fail at 40 but seem to be running ok at 20 | 20:53 |
odyssey4me | maybe 16 is a more reasonable number | 20:54 |
d34dh0r53 | odyssey4me: yeah, I'm thinking 12-16 is probably the range | 20:54 |
odyssey4me | or perhaps some sort of calc based on the number of threads available, or cores per proc | 20:54 |
odyssey4me | well, 15 has been the default where we've failed so often in the gate | 20:55 |
odyssey4me | 8 has worked well for the p1-8 in the gate and in AIO tests | 20:55 |
d34dh0r53 | right, so it probably should follow the number of cores up until the limit | 20:55 |
odyssey4me | 10 worked well for QE with hardware - and they needed it to be less than 15. | 20:55 |
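The "#cpu's or a cap, whichever is lower" idea discussed above can be sketched as follows. This is a minimal illustration only, not code from the repository: the function name `capped_forks` and the default cap of 16 are assumptions, with 16 chosen because it is one of the values floated in the discussion (8, 10, 12 and 16 were all mentioned).

```python
import os


def capped_forks(cpu_count=None, cap=16):
    """Follow the CPU count up to a cap (hypothetical helper, not project code)."""
    if cpu_count is None:
        # os.cpu_count() may return None when the count cannot be determined.
        cpu_count = os.cpu_count() or 1
    # Never go below 1 fork, never above the cap.
    return max(1, min(cpu_count, cap))


if __name__ == "__main__":
    for cpus in (8, 20, 40):
        print(cpus, "cpus ->", capped_forks(cpus), "forks")
```

With this shape, an 8-core gate node keeps using 8 forks, while the 40-core box mentioned above would be held at the cap instead of 40. The right cap value remains an open question in the discussion.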
bgmccollum | does ansible take into account delegate to when popping tasks from the queue? i can imagine a situation where lots of delegates for containers could hit the ssh limit for the host... | 20:56 |
odyssey4me | bgmccollum no idea personally, but once we have some numbers it'd be great to approach them to figure it out | 20:56 |
odyssey4me | ideally ansible should 'know better' after an initial handshake with its target - and after knowing its deployment host | 20:57 |
d34dh0r53 | odyssey4me: ^ | 20:57 |
d34dh0r53 | ansible should definitely know better | 20:57 |
odyssey4me | it's a young tool, so we forgive it a little :p | 20:58 |
*** yaya has joined #openstack-ansible | 20:58 | |
bgmccollum | d34dh0r53: maybe bump the sshd conf that limits the number of simul connections... | 20:58 |
d34dh0r53 | A more robust SSH retry mechanism is in 2.0 so this hopefully will be a moot point | 20:58 |
d34dh0r53 | bgmccollum: interesting idea, I may play around with that | 20:59 |
odyssey4me | bgmccollum while that may be a factor, we did spend an awful lot of time trying that sort of thing a few revisions ago - things may have gotten better, but you can ask hughsaunders who literally spent around two months trying everything under the sun to make it work better | 20:59 |
odyssey4me | the ssh retry mechanism in ansible 2.0 was the resulting work which was the only thing that actually achieved a consistent result | 21:01 |
odyssey4me | sigmavirus24 a part of me wants to normalise the gate checks to use the same underlying storage to see what happens, but another part of me is happy that using different mechanisms is like a canary | 21:04 |
odyssey4me | I can't work out whether ti love it or hate it. | 21:04 |
odyssey4me | *to | 21:04 |
sigmavirus24 | odyssey4me: so | 21:04 |
sigmavirus24 | "sigmavirus24: it's possible we stopped doing the ephemeral drive formatting on one for some reason" from -infra | 21:05 |
* odyssey4me switches to -infra to see the discussion | 21:05 | |
sigmavirus24 | When I asked about the differences (because there are flavor differences) between the two providers I got "sigmavirus24: devstack-gate's setup function documents via code" | 21:05 |
sigmavirus24 | odyssey4me: this was in the scrollback | 21:05 |
sigmavirus24 | I'm reading devstack's function(s) to find wtf they're talking about | 21:05 |
odyssey4me | I see it. | 21:06 |
odyssey4me | the scrollback I mean | 21:06 |
odyssey4me | as I recall in hpcloud the ephemeral disk is mounted at startup - we then dismount, repartition and continue | 21:07 |
odyssey4me | ie we dismount and remove the config: https://github.com/stackforge/os-ansible-deployment/blob/master/scripts/scripts-library.sh#L75-L79 | 21:08 |
odyssey4me | we only care about devices beyond the first one: https://github.com/stackforge/os-ansible-deployment/blob/master/scripts/scripts-library.sh#L72 | 21:08 |
*** woodard_ has joined #openstack-ansible | 21:09 | |
sigmavirus24 | hm | 21:10 |
odyssey4me | it seems though, that we do format - but only for the lxc partition: https://github.com/stackforge/os-ansible-deployment/blob/master/scripts/scripts-library.sh#L81-L109 | 21:10 |
odyssey4me | so, correct me on this - it's been a while since I wrote it - but... | 21:11 |
odyssey4me | if we have more than 250GB of disk space on a secondary disk | 21:11 |
odyssey4me | then partition 80% for lxc, and the rest for cinder-volume | 21:11 |
d34dh0r53 | yep | 21:12 |
d34dh0r53 | that's how I read it | 21:12 |
odyssey4me | (no formatting on either as they're both lvm vg's - so lv's will get created inside them and those will be formatted) | 21:12 |
*** woodard has quit IRC | 21:12 | |
sigmavirus24 | That's 50GB for cinder | 21:12 |
odyssey4me | otherwise (if we have less than 250gb - ie rax), just format the thing as ext4 and use it for lxc | 21:13 |
*** woodard_ has quit IRC | 21:13 | |
Apsu | Cinder's still not even being found as a service to potentially start | 21:14 |
Apsu | Ignoring how much space it might have if it did :P | 21:14 |
odyssey4me | the otherwise bit does not use lvm at all - it's simply a normal partition mount | 21:14 |
odyssey4me | Apsu right, but not always - we have inconsistency here | 21:14 |
odyssey4me | so bootstrap-aio runs the configure_diskspace function early: https://github.com/stackforge/os-ansible-deployment/blob/master/scripts/bootstrap-aio.sh#L164 | 21:15 |
odyssey4me | so if that results in the 'cinder-volumes' vg, then great - otherwise: https://github.com/stackforge/os-ansible-deployment/blob/master/scripts/bootstrap-aio.sh#L199-L214 | 21:16 |
odyssey4me | so either way - hpcloud or rax - a vg is being used for volumes... but the underlying disk is different | 21:16 |
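The disk layout decision described above (as recalled in the discussion, not checked against the script) can be sketched like this. It is an illustration under stated assumptions: the function name `plan_secondary_disk`, its parameters and the returned keys are hypothetical, and the real logic lives in the shell scripts linked above. The 250GB threshold and the 80/20 split are the figures quoted in the chat.

```python
def plan_secondary_disk(size_gb, threshold_gb=250, lxc_percent=80):
    """Describe how a secondary disk would be used (hypothetical sketch)."""
    if size_gb > threshold_gb:
        # Big disk (the hpcloud case): two LVM volume groups on the real disk.
        lxc_gb = size_gb * lxc_percent // 100
        return {
            "lxc_layout": "lvm vg",
            "lxc_gb": lxc_gb,
            "cinder_backing": "real disk",
            "cinder_gb": size_gb - lxc_gb,
        }
    # Small disk (the rax case): plain ext4 mount for lxc, no LVM on this disk.
    # The cinder-volumes vg is then created on a loopback file instead.
    return {
        "lxc_layout": "ext4 mount",
        "lxc_gb": size_gb,
        "cinder_backing": "loopback file",
        "cinder_gb": 0,
    }


if __name__ == "__main__":
    print(plan_secondary_disk(300))
    print(plan_secondary_disk(100))
```

A disk just over the threshold leaves about 50GB for cinder, which matches the figure mentioned above. In both branches cinder ends up with an LVM vg. Only the backing device differs.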
sigmavirus24 | Apsu: I was misreading the logs | 21:17 |
*** phalmos has quit IRC | 21:18 | |
odyssey4me | I don't see how a thin/thick volume issue could arise - unless somehow the driver is detecting badly. Doing lvm thin provisioning requires a special setup which we don't do: https://gist.github.com/sidnei/7041338 | 21:19 |
odyssey4me | but yes, we need that cinder-volume log | 21:20 |
odyssey4me | sigmavirus24 did you figure out how to make the magic? | 21:20 |
sigmavirus24 | odyssey4me: not yet | 21:21 |
sigmavirus24 | odyssey4me: because sometimes cinder-backup.log doesn't appear either | 21:21 |
odyssey4me | sigmavirus24 that would make sense if the failure is earlier in the process | 21:21 |
odyssey4me | so I have been wondering - perhaps we should carry a sha for tempest_lib or something else to help this | 21:22 |
sigmavirus24 | odyssey4me: we do | 21:22 |
sigmavirus24 | or don't we on master? | 21:22 |
odyssey4me | sigmavirus24 we did - I removed them as an experiment | 21:22 |
odyssey4me | we should not carry specific sha's unless we need to | 21:23 |
odyssey4me | we tend to carry baggae | 21:23 |
odyssey4me | *baggage | 21:23 |
odyssey4me | however - I looked at the lib, and it's been quite dormant | 21:23 |
sigmavirus24 | So the logs indicate this isn't a problem with tempest though | 21:23 |
sigmavirus24 | Or if it is, it's a very very very bizarre one | 21:23 |
odyssey4me | I suspect the issue is flux relating to nova, cinder and/or neutron... and each fail relates to different herrings. | 21:24 |
odyssey4me | I have learned that we've been far too willing to turn tests off. | 21:24 |
sigmavirus24 | We could try an earlier SHA for cinder to see if that fixes it but I doubt that's the issue | 21:25 |
sigmavirus24 | That said, tempest talks directly to cinder's API | 21:25 |
sigmavirus24 | I don't think nova has any part of this | 21:26 |
odyssey4me | sigmavirus24 yes, that's my lowest order suspect - it's only suspect in the interaction with neutron and cinder | 21:27 |
*** wmlynch_ has quit IRC | 21:29 | |
*** k_stev has quit IRC | 21:32 | |
*** mpmsimo has joined #openstack-ansible | 21:38 | |
odyssey4me | I'm pretty seriously considering changing the -infra timeout for our builds to 90 mins instead of 120 mins. | 21:40 |
d34dh0r53 | odyssey4me: +1 at 90 minutes "It's dead Jim" | 21:43 |
odyssey4me | d34dh0r53 urm, it looks like you hit the tree jim | 21:43 |
d34dh0r53 | Wow, I thought I was the only one who used old-school Links references :) | 21:44 |
odyssey4me | yeah - we have a clear 70-80 min run on the longest running check | 21:44 |
odyssey4me | :) | 21:44 |
d34dh0r53 | Links 2004 I think it was, one of the best golf games ever | 21:44 |
odyssey4me | that and the original fifa games were pretty epic at the time | 21:45 |
*** k_stev has joined #openstack-ansible | 21:45 | |
odyssey4me | never mind that car game - what was it.... | 21:45 |
d34dh0r53 | RalliSport Challenge? | 21:46 |
odyssey4me | d34dh0r53 Test Drive, I think. | 21:48 |
d34dh0r53 | odyssey4me: yeah, test drive, one of the best games ever | 21:48 |
*** Mudpuppy has quit IRC | 21:48 | |
odyssey4me | although Death Chase on the ZX Spectrum was the first I ever got to play - loaded off an audio tape! | 21:48 |
odyssey4me | oh yeah, rock that game: https://www.youtube.com/watch?v=snpr8hFIf3U | 21:49 |
odyssey4me | ascii graphics to the maxx :p | 21:49 |
d34dh0r53 | lol, awesome | 21:51 |
odyssey4me | and then there was https://www.youtube.com/watch?v=GtpKfSY0MBw | 21:52 |
odyssey4me | hold me back | 21:52 |
*** yaya has quit IRC | 21:53 | |
* d34dh0r53 thanks odyssey4me for the youtube black hole | 21:53 | |
odyssey4me | oh yes! https://www.youtube.com/watch?v=iJNfMqEK7VI | 21:54 |
*** JRobinson__ has joined #openstack-ansible | 21:54 | |
odyssey4me | please do not cue windows 3.0 videos | 21:55 |
odyssey4me | or windows 2.0 for that matter | 21:55 |
d34dh0r53 | many hours wasted https://www.youtube.com/watch?v=M9Bp3N9TdLc | 21:55 |
odyssey4me | d34dh0r53 I did enjoy this one in the arcade though: https://www.youtube.com/watch?v=J4tshJrkBw0 | 21:56 |
odyssey4me | Test Drive! Yeah! It came on around 10-20 'floppy' disks as I recall | 21:56 |
d34dh0r53 | haha | 21:56 |
d34dh0r53 | Golden Axe was so much fun | 21:57 |
*** yaya has joined #openstack-ansible | 21:57 | |
*** mpmsimo has quit IRC | 21:59 | |
*** k_stev has quit IRC | 22:01 | |
sigmavirus24 | so I can't seem to see anything that we committed recently to do this to ourselves | 22:06 |
palendae | Tempest SHA change? | 22:06 |
sigmavirus24 | nah this is pretty clearly in cinder's court | 22:06 |
sigmavirus24 | no cinder-volume/backup logs | 22:06 |
sigmavirus24 | that's very very suspicious | 22:06 |
palendae | Are *their* things passing? | 22:06 |
sigmavirus24 | cinder's gate? I hadn't checked recently but the current state of their gate is irrelevant to ours since we're pinning to what is likely quite a few commits ago | 22:07 |
palendae | On our master? | 22:07 |
sigmavirus24 | Yes | 22:07 |
palendae | Ok | 22:07 |
odyssey4me | I agree that it looks more code-orientated. | 22:09 |
*** shoutm has joined #openstack-ansible | 22:09 | |
sigmavirus24 | but it's so bizarre that this only happens on hpcloud | 22:09 |
odyssey4me | We have passing gate checks with the same underlying architecture for juno and kilo. | 22:09 |
odyssey4me | that's the confusing part | 22:09 |
sigmavirus24 | is there a way for me to make a (free|cheap) account there to test this? | 22:09 |
odyssey4me | sigmavirus24 in hp cloud? | 22:11 |
sigmavirus24 | mhm | 22:11 |
odyssey4me | well, in a few days cloudnull_zzz will return and he has one | 22:12 |
odyssey4me | omg he has lurked | 22:12 |
sigmavirus24 | lol | 22:12 |
palendae | odyssey4me: he was on company chat earlier | 22:12 |
palendae | Waiting in an airport | 22:12 |
palendae | And thankfully he tl;dr'd the scrollback | 22:12 |
odyssey4me | airports... ugh | 22:12 |
sigmavirus24 | airports are the best places ever | 22:15 |
sigmavirus24 | especially when flights are delayed 3 hours | 22:15 |
sigmavirus24 | such that you have to sleep overnight in the next airport you get to | 22:16 |
palendae | Or like when O'Hare gives you free vouchers to a hotel that sends a shuttle | 22:16 |
sigmavirus24 | loooool | 22:17 |
sigmavirus24 | and I'm out | 22:17 |
palendae | *never sends a shuttle | 22:17 |
sigmavirus24 | the shuttle will be there in the next 48 hours | 22:17 |
sigmavirus24 | you're welcome, we're sorry | 22:18 |
*** sigmavirus24 is now known as sigmavirus24_awa | 22:18 | |
odyssey4me | sigmavirus24 I do think that if we continue to keep up to date with SHA's in master, we probably need to be better at keeping in touch with each project - including devstack. | 22:19 |
*** pradk has quit IRC | 22:24 | |
*** mpmsimo has joined #openstack-ansible | 22:32 | |
*** neillc_away is now known as neillc | 22:33 | |
*** mpmsimo has quit IRC | 22:35 | |
*** mpmsimo has joined #openstack-ansible | 22:35 | |
*** jwagner is now known as jwagner_away | 22:38 | |
*** yaya has quit IRC | 22:38 | |
*** darrenc is now known as darrenc_afk | 22:40 | |
*** spotz is now known as spotz_zzz | 22:43 | |
*** KLevenstein has quit IRC | 22:49 | |
*** tlian2 has joined #openstack-ansible | 22:50 | |
*** tlian has quit IRC | 22:51 | |
*** darrenc_afk is now known as darrenc | 22:56 | |
*** britthouser has joined #openstack-ansible | 23:24 | |
*** britthou_ has joined #openstack-ansible | 23:25 | |
*** britthouser has quit IRC | 23:28 | |
*** mpmsimo has quit IRC | 23:48 | |
*** britthou_ has quit IRC | 23:56 |