*** vmtrooper has joined #openstack-ansible | 00:53 | |
*** vmtrooper has quit IRC | 00:58 | |
*** mahito has joined #openstack-ansible | 01:02 | |
*** galstrom_zzz is now known as galstrom | 02:00 | |
*** vmtrooper has joined #openstack-ansible | 02:41 | |
*** vmtrooper has quit IRC | 02:47 | |
*** stevemar has joined #openstack-ansible | 03:19 | |
*** galstrom is now known as galstrom_zzz | 04:01 | |
*** galstrom_zzz is now known as galstrom | 04:06 | |
*** mahito has quit IRC | 04:08 | |
*** vmtrooper has joined #openstack-ansible | 04:30 | |
*** vmtrooper has quit IRC | 04:36 | |
*** galstrom is now known as galstrom_zzz | 05:05 | |
*** stevemar has quit IRC | 05:55 | |
*** vmtrooper has joined #openstack-ansible | 06:19 | |
*** vmtrooper has quit IRC | 06:24 | |
*** mahito has joined #openstack-ansible | 06:56 | |
*** vmtrooper has joined #openstack-ansible | 08:08 | |
*** vmtrooper has quit IRC | 08:13 | |
*** mahito has quit IRC | 08:33 | |
*** vmtrooper has joined #openstack-ansible | 09:57 | |
*** vmtrooper has quit IRC | 10:02 | |
*** vmtrooper has joined #openstack-ansible | 11:46 | |
*** jaypipes has joined #openstack-ansible | 11:51 | |
*** vmtrooper has quit IRC | 11:51 | |
*** britthouser has joined #openstack-ansible | 11:53 | |
*** galstrom_zzz is now known as galstrom | 12:25 | |
*** sdake has joined #openstack-ansible | 12:47 | |
*** openstackgerrit has quit IRC | 12:50 | |
*** openstackgerrit has joined #openstack-ansible | 12:50 | |
*** galstrom is now known as galstrom_zzz | 13:08 | |
openstackgerrit | git-harry proposed stackforge/os-ansible-deployment: Test commit - do not review https://review.openstack.org/161670 | 13:31 |
*** sandywalsh has joined #openstack-ansible | 13:31 | |
*** vmtrooper has joined #openstack-ansible | 13:35 | |
*** KLevenstein has joined #openstack-ansible | 13:39 | |
*** vmtrooper has quit IRC | 13:40 | |
*** Mudpuppy has joined #openstack-ansible | 13:56 | |
*** Mudpuppy has quit IRC | 14:07 | |
*** Mudpuppy has joined #openstack-ansible | 14:08 | |
*** sigmavirus24_awa is now known as sigmavirus24 | 14:25 | |
openstackgerrit | Kevin Carter proposed stackforge/os-ansible-deployment: Adds rsyslog client role and enables it in all plays https://review.openstack.org/164714 | 14:36 |
*** galstrom_zzz is now known as galstrom | 14:37 | |
openstackgerrit | Kevin Carter proposed stackforge/os-ansible-deployment: Adds rsyslog client role and enables it in all plays https://review.openstack.org/164714 | 14:39 |
openstackgerrit | Hugh Saunders proposed stackforge/os-ansible-deployment: Add ldappool to keystone packages https://review.openstack.org/164715 | 14:40 |
*** stevemar has joined #openstack-ansible | 14:40 | |
*** prometheanfire has joined #openstack-ansible | 14:45 | |
*** alextricity has quit IRC | 14:50 | |
*** alextricity has joined #openstack-ansible | 15:06 | |
*** daneyon has quit IRC | 15:11 | |
*** vmtrooper has joined #openstack-ansible | 15:24 | |
*** vmtrooper has quit IRC | 15:29 | |
openstackgerrit | Hugh Saunders proposed stackforge/os-ansible-deployment: Ensure return code passes through output trimming https://review.openstack.org/164480 | 15:35 |
*** galstrom is now known as galstrom_zzz | 15:55 | |
*** openstackgerrit has quit IRC | 16:11 | |
*** openstackgerrit has joined #openstack-ansible | 16:12 | |
*** sandywalsh has quit IRC | 16:36 | |
palendae | cloudnull: Now that https://github.com/ansible/ansible-modules-extras/blob/devel/cloud/lxc/lxc_container.py is merged, do we need a blueprint to get that in? A bug at least | 16:36 |
*** openstackgerrit has quit IRC | 16:54 | |
*** openstackgerrit has joined #openstack-ansible | 16:54 | |
*** britthouser has quit IRC | 17:01 | |
openstackgerrit | Merged stackforge/os-ansible-deployment: Reduce script verbosity https://review.openstack.org/164165 | 17:07 |
openstackgerrit | Merged stackforge/os-ansible-deployment: Removed all rackspace related logging parts https://review.openstack.org/164470 | 17:08 |
openstackgerrit | Merged stackforge/os-ansible-deployment: Add new rsyslog server role https://review.openstack.org/164471 | 17:08 |
openstackgerrit | Merged stackforge/os-ansible-deployment: Do not assume users have names https://review.openstack.org/164236 | 17:08 |
palendae | One more? | 17:08 |
openstackgerrit | Merged stackforge/os-ansible-deployment: Update contributing guidelines with backport guidance https://review.openstack.org/163764 | 17:08 |
palendae | Boom | 17:08 |
openstackgerrit | Miguel Alejandro Cantu proposed stackforge/os-ansible-deployment: Change heat_metadata_server_url to external API. https://review.openstack.org/164785 | 17:09 |
*** vmtrooper has joined #openstack-ansible | 17:13 | |
*** vmtrooper has quit IRC | 17:18 | |
*** daneyon has joined #openstack-ansible | 17:18 | |
*** sigmavirus24 is now known as sigmavirus24_awa | 17:51 | |
*** Mudpuppy has quit IRC | 17:51 | |
*** KLevenstein has quit IRC | 18:06 | |
*** galstrom_zzz is now known as galstrom | 18:07 | |
*** KLevenstein has joined #openstack-ansible | 18:25 | |
cloudnull | palendae the module is merged, however it's not part of a 'release' quite yet, so i think we should hold off on that for now. | 18:27 |
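A minimal sketch of what the merged lxc_container module offers once it lands in an Ansible release; the container name and template below are illustrative only:

    # hypothetical values -- name/template/state are documented lxc_container parameters
    - name: Build a test LXC container
      lxc_container: name=test1 template=ubuntu state=started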
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Simplify and improve bootstrap/gate/run scripts https://review.openstack.org/163837 | 18:35 |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Revise gate script library to report correctly https://review.openstack.org/163914 | 18:35 |
prometheanfire | apparently gate finally works for neutron, after 3-4 days of not working (at least) | 18:37 |
prometheanfire | well, grenade at least | 18:37 |
*** daneyon_ has joined #openstack-ansible | 18:45 | |
*** daneyon has quit IRC | 18:46 | |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Remove 'holland' package from the wheel repository https://review.openstack.org/164822 | 18:48 |
daneyon_ | Any background on why all the neutron agents share the same container? | 18:50 |
*** Mudpuppy has joined #openstack-ansible | 18:52 | |
*** Mudpuppy has quit IRC | 18:57 | |
*** Mudpuppy has joined #openstack-ansible | 18:57 | |
odyssey4me | Apsu rackertom cloudnull ^ see question from daneyon_ | 19:01 |
palendae | cloudnull: Ah, that's fair | 19:01 |
*** vmtrooper has joined #openstack-ansible | 19:02 | |
palendae | prometheanfire: For neutron the project, or our neutron tests? | 19:02 |
Apsu | daneyon_: Meaning l3 + dhcp + metadata + l2? | 19:03 |
cloudnull | daneyon_ the neutron agents all share the same container because the core agents all require the use of the same namespace. | 19:03 |
Apsu | daneyon_: The alternative would be l3 + metadata + l2 && dhcp + metadata + l2. | 19:04 |
*** sigmavirus24_awa is now known as sigmavirus24 | 19:04 | |
Apsu | Which we felt was worse than combining l3 and dhcp. | 19:04 |
cloudnull | what Apsu said ^ | 19:04 |
cloudnull | we've taken a similar approach with heat and glance. | 19:06 |
Apsu | Yep. We combine where it makes sense, and separate where it doesn't. | 19:06 |
prometheanfire | palendae: project | 19:06 |
*** vmtrooper has quit IRC | 19:06 | |
prometheanfire | palendae: the entire project was stalled because grenade failed | 19:06 |
daneyon_ | cloudnull and Apsu that's what I was thinking but I just wanted to make sure. I would think it would be pretty straightforward to create plays that separated the LB agent (running on compute nodes) from the other Neutron agents that would run on control nodes. | 19:08 |
Apsu | daneyon_: Ah, well, now you're thinking in DVR terms. | 19:08 |
Apsu | And that's different than LinuxBridge terms. For now :) | 19:08 |
cloudnull | doing a deployment using full micro-services, as it has been dubbed by the docker community, is a lofty goal and one that can be done in academia but simply doesn't make sense in production in most cases. We found that we'd need to work around issues presented by using micro-services like that, which surface bugs that are really not bugs | 19:08 |
Apsu | ^ iscsi kernel module doesn't respect network namespaces, for instance. | 19:09 |
Apsu | Making putting nova into a container partially crippled. | 19:09 |
cloudnull | daneyon_ running LB agents and things like DVR on compute nodes is already possible in the way that we spec the environment. IE neutron-linuxbridge-agent is running on the compute nodes. | 19:10 |
cloudnull | and you'd build off of that to continue that trend | 19:11 |
Apsu | Yep | 19:11 |
cloudnull | if you have a look at master there's a lot more separation of services and roles than in the other feature branches. | 19:11 |
daneyon_ | Apsu: OK. I haven't touched Neutron HA in a while. My approach in the past for Neutron HA was to use provider networking instead of the L3 agent. It appears that v10 supports L3 HA. How well has this worked for you? | 19:12 |
Apsu | Sam-I-Am: daneyon_ has a question for you, I believe :P | 19:12 |
cloudnull | daneyon_ l3_ha is disabled by default, as it pertains to neutron. | 19:12 |
Apsu | daneyon_: I would defer to Sam-I-Am who has flagellated himself the most extensively with L3HA/DVR. | 19:13 |
Sam-I-Am | hahah | 19:13 |
cloudnull | we're using the l3 att tool to ensure that l3 is ha between multiple active nodes. | 19:13 |
Sam-I-Am | l3ha and linuxbridge is not recommended in juno (v10) | 19:13 |
Sam-I-Am | you can probably configure it, but ymmv on operation | 19:13 |
daneyon_ | cloudnull: I agree. I think Docker's inability to natively support multiple processes gets in the way sometimes. | 19:14 |
cloudnull | this is the tool that att created to do l3 ha failover https://github.com/stackforge/os-ansible-deployment/blob/master/playbooks/roles/os_neutron/files/neutron-ha-tool.py | 19:14 |
Apsu | ^ we switched to that from our former solution, a daemon I wrote to watch agent heartbeats through rabbit and reschedule routers/networks as appropriate | 19:15 |
Apsu | The daemon used to break older neutron. It was too efficient at its job, and Neutron has some serious race conditions in it | 19:15 |
Apsu | Especially around network scheduling | 19:16 |
Apsu | Tempest can do it too, before they added waits into it between network spinup/teardown :P | 19:16 |
cloudnull | additionally we're using the chance / least-routers scheduler | 19:16 |
cloudnull | https://github.com/stackforge/os-ansible-deployment/blob/master/playbooks/roles/os_neutron/defaults/main.yml#L48-L49 | 19:16 |
Apsu | ^ | 19:16 |
cloudnull | which allows us to make sure scheduling is relatively diverse | 19:16 |
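A rough sketch of the scheduler settings being referenced; the variable names below are guesses (check the linked defaults/main.yml for the real ones), though the driver classes are the upstream neutron chance and least-routers schedulers:

    # illustrative override names -- the values are real neutron scheduler classes
    neutron_dhcp_scheduler_driver: neutron.scheduler.dhcp_agent_scheduler.ChanceScheduler
    neutron_router_scheduler_driver: neutron.scheduler.l3_agent_scheduler.LeastRoutersScheduler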
Apsu | That's relatively new, but basically obsoleted rpcdaemon (which did the same basic thing) | 19:17 |
cloudnull | ^ true story | 19:17 |
Apsu | So we get to leverage builtin schedulers instead with manual rescheduling on failure (through cron'd ATT script) | 19:17 |
Apsu | Easier for everyone | 19:17 |
Apsu | What I really should do next is write an auto-rescheduler... | 19:17 |
Apsu | Bake it into upstream neutron | 19:17 |
cloudnull | that way we can say gty to the l3_ha :) | 19:18 |
palendae | Apsu: Do you want core? That's how you get core. | 19:18 |
daneyon_ | Apsu: I briefly looked at the iscsi bug you're referencing. Compute containers were interesting from an upgrade perspective... being able to upgrade nova-compute with little downtime to the running instances. I suspect you simply rely on migration to evacuate instances from a compute node before upgrading, correct? | 19:18 |
Apsu | palendae: haha | 19:18 |
Apsu | daneyon_: Essentially yeah. I mean, that code path Still works; running instances in containers. | 19:18 |
cloudnull | daneyon_ we can upgrade in place without impacting the running instances, so long as you're not upgrading libvirt. | 19:19 |
Apsu | The whole thing works great. It's just iscsi failing to rewrite their netlink driver code | 19:19 |
daneyon_ | cloudnull Apsu: OK, I'm used to running only the L3 and LB agents on the compute nodes. Many times I would not even run L3 on the compute node and use provider networking to get around the Neutron L3 HA limitations. I would run the other agents on the control nodes. | 19:20 |
cloudnull | deploying new nova-compute/neutron-linuxbridge code is relatively benign. | 19:20 |
cloudnull | ah. this is a bit of an architectural shift from that. | 19:21 |
Apsu | daneyon_: I hear you. We didn't even ship l3 agents when they were first out, nor OVS. | 19:21 |
cloudnull | in test we run neutron agents on controller nodes, but in production we set up neutron agents on standalone hosts. | 19:22 |
Apsu | I wonder if the metadata route injection bug is still present in Kilo... | 19:22 |
daneyon_ | cloudnull: i'm unfamiliar with the l3 att tool. Do you have a pointer? | 19:22 |
palendae | https://github.com/stackforge/os-ansible-deployment/blob/master/playbooks/roles/os_neutron/files/neutron-ha-tool.py | 19:22 |
Apsu | I should probably pick this back up and make sure it's fixed or doesn't need the workaround anymore: https://review.openstack.org/#/c/40487/ | 19:23 |
daneyon_ | cloudnull: nm | 19:23 |
cloudnull | https://github.com/stackforge/os-ansible-deployment/blob/master/playbooks/roles/os_neutron/files/neutron-ha-tool.py | 19:23 |
cloudnull | oh sorry :) | 19:23 |
Apsu | Only need to wait a few more months before 2 years abandoned! | 19:23 |
daneyon_ | cloudnull: even if the nova-compute upgrade doesn't upgrade libvirt, you don't evacuate instances to CYA? | 19:27 |
daneyon_ | Sam-I-Am: Is the plan to move from the att ha tool to dvr in Kilo? | 19:31 |
Sam-I-Am | dvr requires ovs, so until there's ovs support in os-ansible or lb support in dvr... probably not. | 19:31 |
daneyon_ | Sam-I-Am: OK. Thanks | 19:33 |
Sam-I-Am | i suspect L3HA will be easier to implement | 19:34 |
odyssey4me | In my opinion, L3HA is a better model from a security standpoint - it's certainly easier to control what's exposed with that model. DVR just requires far too much additional hole-plugging. | 19:39 |
odyssey4me | far fewer moving parts and hocus pocus too | 19:40 |
daneyon_ | cloudnull: Back to my cinder issue from Friday: error while evaluating conditional: CURRENT not in cinder_get.content I think it may be due to the cinder scheduler not running. I don't understand why the cinder-scheduler container is not being created and configured. | 19:43 |
cloudnull | the scheduler is running within the volume container. | 19:44 |
daneyon_ | odyssey4me Sam-I-Am I'm a big fan of simple solutions to fix problems. | 19:44 |
daneyon_ | cloudnull: ah... I see | 19:44 |
cloudnull | which is due to issues with cinder-volume not wanting to communicate with other schedulers upon failover. | 19:44 |
cloudnull | so we run the scheduler on each volume node, to maximize uptime and availability. | 19:45 |
daneyon_ | cloudnull: Is there a bug ID for that? | 19:47 |
cloudnull | let me go see if i can find it . | 19:47 |
daneyon_ | cloudnull: I guess that puts me back at square one. I have cinder scheduler/volume running, I can curl the API VIP, and get status CURRENT but I get the error above running the play. | 19:48 |
Sam-I-Am | odyssey4me: there are some benefits to l3ha over dvr | 19:48 |
Sam-I-Am | both of them are sort of half-baked | 19:48 |
Sam-I-Am | (when you go digging) | 19:48 |
daneyon_ | cloudnull: I'm going to temporarily remove the 'check cinder api service is available' task from the play, run the play and check to make sure everything works | 19:49 |
cloudnull | hughsaunders might you be able to have a look at what could be going on with that? didn't you work on that part of cinder? or am i remembering wrong? | 19:49 |
* hughsaunders reads | 19:49 | |
odyssey4me | Sam-I-Am my support of L3HA is more conceptual at this point - I haven't done any digging | 19:49 |
*** sdake has quit IRC | 19:50 | |
Sam-I-Am | odyssey4me: we should have a chat | 19:50 |
cloudnull | daneyon_: https://bugs.launchpad.net/cinder/+bug/1409012 | 19:50 |
openstack | Launchpad bug 1409012 in Cinder "Volume becomes in 'error' state after scheduler starts" [High,Fix committed] - Assigned to Michal Dulko (michal-dulko-f) | 19:50 |
cloudnull | fix-committed in master 23 hours ago | 19:51 |
*** sdake has joined #openstack-ansible | 19:51 | |
daneyon_ | cloudnull: thx for the bug info | 19:52 |
odyssey4me | Sam-I-Am part of that support comes from running a public cloud for some time and having to forcibly plug holes and put complex iptables blocking in to protect compute nodes when we were using nova-network... all of which became a lot simpler when we switched to the Neutron L3 agent model on designated network nodes. Our only issue then was scale for provider networks... but that was easy enough to resolve by moving | 19:53 |
odyssey4me | our network control VM's to less contentious hosts and beefing the VM's up as required. | 19:53 |
*** britthouser has joined #openstack-ansible | 19:55 | |
daneyon_ | cloudnull: the rest of the openstack-setup runs when i remove the api check from the cinder backend setup. I am able to create a volume and attach it to an instance. | 19:58 |
Sam-I-Am | odyssey4me: from the ops meetup, seems a lot of people use nova-net or neutron w/ providernets | 19:58 |
Sam-I-Am | and nova-net is popular because, even with dvr, there's not really a parallel in neutron for flatdhcp | 19:59 |
cloudnull | hmm, that seems odd daneyon_, as i've not seen that in production before. that said, there are a few of us looking into it to see if we can figure out why that is. | 20:04 |
odyssey4me | Sam-I-Am from the ops meetup notes, it appeared common that people weren't aware that provider nets can be used with neutron - which I thought was odd... we had that setup for neutron back in Grizzly | 20:05 |
Apsu | odyssey4me: Yep. | 20:05 |
Sam-I-Am | what? | 20:06 |
Apsu | People seem confused by the fact that all networks are provider networks. | 20:06 |
odyssey4me | we had provider nets for in-DC traffic, and customer dedicated WAN links... the provider nets ran straight from the compute hosts via vlan tags | 20:06 |
odyssey4me | for those not using provider nets, or even those using them but implementing virtual routers to gre networks, those went via the L3 Agents | 20:07 |
Apsu | There's no such thing as a "tenant" or "overlay" network, per se. As far as Neutron is concerned, a network is a network, and you pick the type by either accepting the default type for a non-admin tenant (i.e., "tenant" network), which may be a tunnel type ("overlay"). | 20:07 |
Apsu | But all of them involve either accepting the default or using the provider extension with --provider:key=value | 20:07 |
Apsu | They're all provider networks :P | 20:07 |
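As a sketch of the point above, creating a network with the provider extension versus taking the tenant default might look like this (wrapped in command tasks only to keep the example in Ansible form; the physnet name and VLAN id are made up):

    - name: Tenant network, accepting the default type
      command: neutron net-create demo-net

    - name: Explicit provider network via --provider:key=value
      command: >
        neutron net-create ext-net
        --provider:network_type vlan
        --provider:physical_network physnet1
        --provider:segmentation_id 200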
odyssey4me | Apsu exactly - but it seems that most operators don't get that yet... which is not surprising as neutron is far more complicated to piece together compared to nova-network. | 20:08 |
Apsu | Sure | 20:08 |
odyssey4me | You actually do need to understand networking. I was lucky in that I had someone who worked with me who did. | 20:09 |
* Apsu nods | 20:09 | |
Apsu | I find it silly to attempt to network a cluster of any size or complexity without having someone versed in at least traditional networking, if not linux networking specifically | 20:09 |
odyssey4me | Sam-I-Am flatdhcp - is that where there is only one network shared by all projects? | 20:10 |
Apsu | Unless you're outsourcing deployment entirely | 20:10 |
Apsu | odyssey4me: No. It's a nova-network network architecture type. | 20:10 |
Apsu | There was VLANManager, FlatManager, FlatDHCPManager, essentially | 20:10 |
odyssey4me | Apsu we used vlanmanager - but is there really no neutron topology that is similar to the flatdhcpmanager? | 20:11 |
Apsu | Most people used FlatDHCP, with the (eventual) multi_host=True | 20:11 |
Apsu | The way that worked, was you put a dhcp server on each compute host | 20:11 |
Sam-I-Am | yeah... multi_host is the big deal | 20:11 |
Apsu | Each compute host had a linux bridge, with IPs on it from your instance networks | 20:11 |
Apsu | The IPs served as the gateways/DHCP server binds | 20:12 |
Sam-I-Am | and some hackey goodness to make fixed/floating work | 20:12 |
odyssey4me | nova-network's networking was horrible - even with everything in segregated vlan's you couldn't safely overlap subnets | 20:12 |
Apsu | So each compute host could route traffic through each compute's bridge. | 20:12 |
Sam-I-Am | it is horrible, and its not self-service | 20:12 |
Sam-I-Am | however, people got used to those hacks, and think its fine. | 20:12 |
Apsu | Required much more extensive linux networking knowledge to configure and maintain I'd say | 20:12 |
Apsu | Neutron is just presented poorly and has a very large potential scope | 20:12 |
odyssey4me | yeah, neutron was written by networking people... nova-net was written by server people | 20:13 |
Apsu | But the actual configuration is relatively simple. Much simpler than provisioning an equally complex nova-net | 20:13 |
odyssey4me | well, that's the conclusion I drew | 20:13 |
Apsu | There's been a lot of work trying to emulate multi_host over the past few years | 20:13 |
Sam-I-Am | dvr is about as close as it gets | 20:13 |
Sam-I-Am | except now its "too complex" | 20:14 |
Apsu | One of the main pieces of work was from a guy at IBM (iirc), which got pushed back again and again until eventually he abandoned it | 20:14 |
Apsu | Some other folks retried with DVR. Similar concept, but overengineered and poorly implemented from what I can see | 20:14 |
Apsu | The core concept is very simple and many people (myself included) have come up with it independently. | 20:14 |
Apsu | There's even other options possible with upstream network device support, such as ECMP | 20:15 |
Apsu | Or using MAC load-balancing, like CARP | 20:15 |
Sam-I-Am | Apsu: patches accepted | 20:15 |
Sam-I-Am | well, proposed :P | 20:15 |
Apsu | :P | 20:15 |
Apsu | I might. I'm afraid of getting core, because I'll have to take up drinking that way | 20:15 |
Sam-I-Am | you're not drinking now? | 20:15 |
palendae | Apsu: Is there anything legally binding with core? | 20:16 |
hughsaunders | daneyon_: did you rerun when "check cinder api service is available" failed? I'm curious as to whether it failed multiple times. If only once, could have been that the retries expired before cinder was available? | 20:16 |
prometheanfire | cloudnull: https://review.openstack.org/#/c/154128/ kthnx | 20:16 |
prometheanfire | :D | 20:16 |
prometheanfire | Apsu: I suppose you too | 20:16 |
Apsu | palendae: Nah. Just my non-existent professional reputation. I'm dressing for the job I want. | 20:17 |
odyssey4me | Apsu if you can simplify it, then make the code... do it! | 20:17 |
Apsu | odyssey4me: Probably will | 20:17 |
Apsu | I've been kicking around the idea for 3 years. Have had many conversations with Vish, Dan and the IBM guy. | 20:17 |
Apsu | Sadly, when Dan was in charge, his response to "What about multi_host parity?" was "I don't see why you would do anything different than run 2 networking nodes and use OVS" | 20:18 |
palendae | Apsu: So you're trying to improve it? | 20:18 |
Apsu | Note that he said it in person with an earnest face. So... yeah. | 20:18 |
Apsu | palendae: Yeah, ideally | 20:18 |
palendae | Suck up :p | 20:18 |
odyssey4me | so by multi-host parity, they mean having dhcp on every compute node? | 20:19 |
Apsu | odyssey4me: Nah, that's not the primary goal. | 20:19 |
Apsu | The HA'ness of agents isn't the point. That's essentially solved already. | 20:19 |
Apsu | You can put them anywhere, as many as you like, the scheduler is (almost) fine. | 20:20 |
Apsu | The issue is what's often called "direct return", in the load-balancing world. | 20:20 |
Sam-I-Am | ooo yeah | 20:20 |
Apsu | I.e., instances on a given compute host will directly route through the upstream (non-virtual) switch to reach the outside world, and likewise for traffic coming back in to that instance. | 20:20 |
Apsu | Which is also known as direct north-south traversal. | 20:21 |
openstackgerrit | Kevin Carter proposed stackforge/os-ansible-deployment: Adds rsyslog client role and enables it in all plays https://review.openstack.org/164714 | 20:21 |
Apsu | Currently, the path is east-west to network nodes, then north-south | 20:21 |
Apsu | Aggregating all routed traffic through the network nodes. | 20:21 |
Apsu | Generally switches can handle aggregate traffic better than servers, and their uplinks are capable of being much better, so you're artificially limiting your (aggregate) instance bandwidth by funneling through the east-west path. | 20:22 |
odyssey4me | Apsu yeah, but that choking point is actually a positive thing - assuming that you're able to scale those network control points sideways and perhaps also set them into AZ's. | 20:22 |
Apsu | Even worse, you generally traverse the same switch to get to the network node in the first place | 20:22 |
Apsu | And if it's the same physical interface (different VLAN, say), you've just halved your routed bandwidth | 20:22 |
odyssey4me | yeah, but hang on - this is why perhaps you should be implementing neutron with a real-world controller, instead of OVS. | 20:22 |
Apsu | Sure. This is part of what led people to put OVS on physical switches. | 20:23 |
Apsu | Got one under my desk right now :P | 20:23 |
palendae | Apsu: O.o | 20:23 |
Apsu | palendae: Oh yes. | 20:23 |
palendae | O. k. | 20:23 |
Apsu | Runs a full Debian distro, has OVS, can see each physical port as an OVS port | 20:23 |
Apsu | All the needfuls | 20:24 |
Sam-I-Am | debian? so it has ovs 1.3? | 20:24 |
Apsu | Sam-I-Am: First rule of Debian club. | 20:24 |
odyssey4me | so if you chose to use a Cisco Nexus fabric with its controller in a VDC, allowing Neutron to orchestrate directly, you'd be sitting pretty (although somewhat poorer compared to a similar OVS setup) | 20:24 |
Apsu | Sam-I-Am: I'm trying to reduce the salt level, not increase it ;P | 20:24 |
Sam-I-Am | Apsu: when did we move to salt? | 20:24 |
Apsu | odyssey4me: Welcome to what almost every single network vendor has been working on for ~2 years. | 20:25 |
Sam-I-Am | or you can use brocade :) | 20:25 |
Apsu | Sam-I-Am: Before we realized it wasn't the expression of our saltiness. | 20:25 |
odyssey4me | but that's the sort of setup needed for a real production environment - the OVS setup really is only useful for small setups | 20:25 |
odyssey4me | Sam-I-Am yeah, or Arista | 20:25 |
Apsu | odyssey4me: Eh, that's debatable. Google's got OVS running their internal backbone switches | 20:25 |
Apsu | They were one of the first to stick it on a physical switch | 20:26 |
Sam-I-Am | rackspace uses ovs | 20:26 |
Apsu | True, public cloud networking is partly OVS. | 20:26 |
Apsu | Well, I should say partly NSX.. | 20:26 |
Sam-I-Am | one could argue that if you have to pick software, ovs scales better than linuxbridge | 20:26 |
odyssey4me | fair enough - OVS can be used... but then it should be on specialised hardware... and for a decent L3 setup I expect that a more capable controller would be needed than the basic stuff that usually gets used | 20:26 |
odyssey4me | maybe opendaylight or something? dunno - it's been a while | 20:26 |
palendae | This discussion is working well with the class right now | 20:27 |
Apsu | odyssey4me: Sure. To be fair, OVS 2.3+ is way more advanced than Neutron has begun to take advantage of | 20:27 |
Apsu | Still, I don't think it's "better" than native linux. | 20:28 |
Apsu | Namespaces and tunnel interfaces and minor orchestration can do what you need. | 20:28 |
daneyon_ | hughsaunders: I ran multiple times and i hit the same error. The only way I can get around it is by removing the api check. | 20:28 |
odyssey4me | also to be fair, our horrible experiences with OVS have a lot to do with early versions and the inability for Ubuntu to provide kernel and OVS kernel module patches properly... I do have horrible memories of kernel panics after almost every package update | 20:29 |
hughsaunders | daneyon_: would you be able to pastebin the response you get from cinder? | 20:29 |
Apsu | odyssey4me: The biggest benefits from OVS come from using *actual* metrics and monitoring at the flow level to provide dynamic adjustments to traffic, to maximize utilization | 20:29 |
Apsu | odyssey4me: That's the whole value prop of OpenFlow. The dynamic, programmable feedback loop | 20:29 |
Apsu | Then you can adjust QoS, pick different datacenter paths based on link utilization, etc | 20:29 |
Sam-I-Am | what is this qos? | 20:29 |
odyssey4me | Apsu yeah, I do remember my fellow architect schooling me in those mysteries :p | 20:29 |
Apsu | Gives you a good place to interface with OSPF in the backbone layer... | 20:30 |
Apsu | Sam-I-Am: I have no idea what I'm talking about, I just started pasting from a buzzword generator. Don't mind me. | 20:30 |
hughsaunders | Apsu: now we have it in writing! | 20:30 |
palendae | Apsu is a markov chain | 20:30 |
Apsu | hughsaunders: Was it ever in question? | 20:30 |
Apsu | palendae: What do you feel about Apsu is a markov chain ? | 20:31 |
daneyon_ | hughsaunders: just to clarify, the response I get from Cinder API when I simply curl the VIP? | 20:34 |
hughsaunders | daneyon_: yes please | 20:34 |
hughsaunders | just to see if theres some reason its not matching | 20:35 |
daneyon_ | hughsaunders: https://etherpad.mozilla.org/74IKwZSD2b | 20:35 |
hughsaunders | daneyon_: thanks | 20:36 |
*** vmtrooper has joined #openstack-ansible | 20:50 | |
daneyon_ | hughsaunders: yw | 20:53 |
*** vmtrooper has quit IRC | 20:55 | |
*** jaypipes has quit IRC | 20:56 | |
*** jaypipes has joined #openstack-ansible | 20:57 | |
*** Mudpuppy has quit IRC | 21:00 | |
*** sigmavirus24 is now known as sigmavirus24_awa | 21:00 | |
hughsaunders | daneyon_: I setup nc to respond with the response you provided, and pointed the "check cinder api service is available" task from the tip of the juno branch at it, and I can't get it to fail :( so I'm not sure what's happening. Which SHA of os-ansible-deployment are you on? | 21:08 |
daneyon_ | Sam-I-Am: I've had a chance to look at the HA tool in more detail. It seems like the tool is more of a DR solution than an HA solution. How long do you see the typical fail-over time between L3 agents in your test cases? | 21:08 |
daneyon_ | hughsaunders: commit 2c6e3b5c5958feda28c800a3acec9165051e6fdc | 21:09 |
hughsaunders | daneyon_: thanks | 21:09 |
daneyon_ | hughsaunders: yw | 21:11 |
daneyon_ | hughsaunders: bb in 10-15.. need food. | 21:12 |
*** Mudpuppy has joined #openstack-ansible | 21:14 | |
*** Mudpuppy_ has joined #openstack-ansible | 21:17 | |
*** Mudpuppy has quit IRC | 21:20 | |
*** Mudpuppy_ is now known as Mudpuppy | 21:31 | |
hughsaunders | daneyon_: still can't get it to fail, tried ansible 1.6.10 (as per requirements) and 1.8.4 (latest release). could you add a debug task after "check cinder api service is available" and pastebin the result? -debug: var=cinder_get | 21:34 |
*** sigmavirus24_awa is now known as sigmavirus24 | 21:47 | |
daneyon_ | hughsaunders: Do I add something like this: debug: cinder_get | 21:52 |
hughsaunders | daneyon_: yeah -debug: var=cinder_get | 21:53 |
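For reference, an approximation of that check plus the suggested debug task (the URL variable, port, and retry counts here are illustrative; the real task in the juno branch may differ):

    - name: Check cinder api service is available
      # internal_vip_address is a placeholder -- substitute whatever the play really uses
      uri: url="http://{{ internal_vip_address }}:8776" return_content=yes
      register: cinder_get
      until: "'CURRENT' in cinder_get.content"
      retries: 5
      delay: 10

    - name: Show the raw response the check saw
      debug: var=cinder_get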
*** stevemar has quit IRC | 22:13 | |
*** Mudpuppy has quit IRC | 22:19 | |
daneyon_ | hughsaunders: i updated the etherpad | 22:28 |
daneyon_ | hughsaunders: Is there a log I should check b/c I get the same error msg | 22:29 |
hughsaunders | daneyon_: yeah, it seems the debug is not evaluated because the previous task fails :( | 22:30 |
Apsu | daneyon_: < 5 min for failover. Which isn't very fast in the worst case. It's more like Medium Availability, I guess. There are better ways to do it but not without upstream scheduler changes (which currently don't handle failover at all for L3), running the cronjobs more often (which can be too slow and pile up with lots of networks/agents), or using a dedicated daemon to plug into agent statuses through AMQP and do its own scheduling and queueing | 22:34 |
Apsu | (like rpcdaemon did/does). | 22:34 |
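For context, the cron'd approach Apsu mentions might be wired up roughly like this; the script path and the --l3-agent-migrate flag are assumptions about the ATT tool's CLI rather than confirmed options:

    - name: Periodically reschedule routers away from dead L3 agents
      # the flag below is an assumed option of neutron-ha-tool.py -- check its --help
      cron: name="neutron-ha-tool" minute="*/5" user=root job="/usr/local/bin/neutron-ha-tool.py --l3-agent-migrate"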
daneyon_ | Apsu: Thx. Can you refresh my memory why you went away from rpcdaemon then? | 22:36 |
Apsu | daneyon_: Neutron finally implemented a decent L3 scheduler that handled most of the edge cases it used to not, and only lacked the basic case of: agent down -> move routers to other agents | 22:38 |
Apsu | The LeastUsedScheduler or whatever. It just needs to get poked a little bit | 22:39 |
Apsu | Which the ATT script did in a simple straightforward way | 22:39 |
daneyon_ | Apsu: OK. Thanks! | 22:39 |
Apsu | rpcdaemon is ... a slightly complex, multithreaded, proper daemon. | 22:39 |
*** vmtrooper has joined #openstack-ansible | 22:39 | |
Apsu | I can totally show it to you and explain it if you're interested, but you'd have to disable the builtin schedulers | 22:39 |
daneyon_ | Apsu: Maybe at some point. Right now, I'm just trying to gather data. | 22:41 |
Apsu | It's faster than processing the same bits in bash, for sure, but the real solution is honestly to make a better scheduler. | 22:42 |
Apsu | I think Neutron had a blueprint for one.... | 22:42 |
*** sigmavirus24 is now known as sigmavirus24_awa | 22:43 | |
Apsu | Might have gotten displaced a little due to DVR/HA bits coming in | 22:43 |
hughsaunders | daneyon_: thanks for providing debug info, the only thing I can think of is that the internal vip or service port are wrong, but I'm not sure how that would happen. | 22:44 |
Apsu | Looks like this part is in at least: https://github.com/openstack/neutron/blob/e933891462408435c580ad42ff737f8bff428fbc/neutron/scheduler/l3_agent_scheduler.py#L126 | 22:44 |
Apsu | auto_schedule_routers | 22:44 |
Apsu | Which was part of the automatic rescheduler class | 22:44 |
hughsaunders | The content check stands up to scrutiny | 22:44 |
Apsu | Anyhow, heading out. | 22:44 |
hughsaunders | I'm off now, but will ping if I think of anything else. | 22:44 |
openstackgerrit | Kevin Carter proposed stackforge/os-ansible-deployment: Adds rsyslog client role and enables it in all plays https://review.openstack.org/164714 | 22:44 |
*** vmtrooper has quit IRC | 22:45 | |
daneyon_ | hughsaunders: what's weird is the play checks the vip and port in the previous step. That passes. I even went and changed the vip/port vars in the api check to the real IP/port and I get the same failure. I'm going to do a rebuild of my env soon and I'll let you know if it comes back. | 22:45 |
*** sdake has quit IRC | 22:55 | |
*** KLevenstein has quit IRC | 22:59 | |
*** galstrom is now known as galstrom_zzz | 22:59 | |
*** sdake has joined #openstack-ansible | 23:24 | |
*** jaypipes has quit IRC | 23:41 | |
*** sdake__ has joined #openstack-ansible | 23:45 | |
*** sdake has quit IRC | 23:49 |