Monday, 2015-03-16

00:53 *** vmtrooper has joined #openstack-ansible
00:58 *** vmtrooper has quit IRC
01:02 *** mahito has joined #openstack-ansible
02:00 *** galstrom_zzz is now known as galstrom
02:41 *** vmtrooper has joined #openstack-ansible
02:47 *** vmtrooper has quit IRC
03:19 *** stevemar has joined #openstack-ansible
04:01 *** galstrom is now known as galstrom_zzz
04:06 *** galstrom_zzz is now known as galstrom
04:08 *** mahito has quit IRC
04:30 *** vmtrooper has joined #openstack-ansible
04:36 *** vmtrooper has quit IRC
05:05 *** galstrom is now known as galstrom_zzz
05:55 *** stevemar has quit IRC
06:19 *** vmtrooper has joined #openstack-ansible
06:24 *** vmtrooper has quit IRC
06:56 *** mahito has joined #openstack-ansible
08:08 *** vmtrooper has joined #openstack-ansible
08:13 *** vmtrooper has quit IRC
08:33 *** mahito has quit IRC
09:57 *** vmtrooper has joined #openstack-ansible
10:02 *** vmtrooper has quit IRC
11:46 *** vmtrooper has joined #openstack-ansible
11:51 *** jaypipes has joined #openstack-ansible
11:51 *** vmtrooper has quit IRC
11:53 *** britthouser has joined #openstack-ansible
12:25 *** galstrom_zzz is now known as galstrom
12:47 *** sdake has joined #openstack-ansible
12:50 *** openstackgerrit has quit IRC
12:50 *** openstackgerrit has joined #openstack-ansible
13:08 *** galstrom is now known as galstrom_zzz
13:31 <openstackgerrit> git-harry proposed stackforge/os-ansible-deployment: Test commit - do not review  https://review.openstack.org/161670
13:31 *** sandywalsh has joined #openstack-ansible
13:35 *** vmtrooper has joined #openstack-ansible
13:39 *** KLevenstein has joined #openstack-ansible
13:40 *** vmtrooper has quit IRC
13:56 *** Mudpuppy has joined #openstack-ansible
14:07 *** Mudpuppy has quit IRC
14:08 *** Mudpuppy has joined #openstack-ansible
14:25 *** sigmavirus24_awa is now known as sigmavirus24
14:36 <openstackgerrit> Kevin Carter proposed stackforge/os-ansible-deployment: Adds rsyslog client role and enables it in all plays  https://review.openstack.org/164714
14:36 <openstackgerrit> Kevin Carter proposed stackforge/os-ansible-deployment: Adds rsyslog client role and enables it in all plays  https://review.openstack.org/164714
14:37 *** galstrom_zzz is now known as galstrom
14:39 <openstackgerrit> Kevin Carter proposed stackforge/os-ansible-deployment: Adds rsyslog client role and enables it in all plays  https://review.openstack.org/164714
14:40 <openstackgerrit> Hugh Saunders proposed stackforge/os-ansible-deployment: Add ldappool to keystone packages  https://review.openstack.org/164715
14:40 *** stevemar has joined #openstack-ansible
14:45 *** prometheanfire has joined #openstack-ansible
14:50 *** alextricity has quit IRC
15:06 *** alextricity has joined #openstack-ansible
15:11 *** daneyon has quit IRC
15:24 *** vmtrooper has joined #openstack-ansible
15:29 *** vmtrooper has quit IRC
15:35 <openstackgerrit> Hugh Saunders proposed stackforge/os-ansible-deployment: Ensure return code passes through output trimming  https://review.openstack.org/164480
15:55 *** galstrom is now known as galstrom_zzz
16:11 *** openstackgerrit has quit IRC
16:12 *** openstackgerrit has joined #openstack-ansible
16:36 *** sandywalsh has quit IRC
16:36 <palendae> cloudnull: Now that https://github.com/ansible/ansible-modules-extras/blob/devel/cloud/lxc/lxc_container.py is merged, do we need a blueprint to get that in? A bug at least
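
A minimal sketch of what driving the lxc_container module from a play could look like once it ships in an Ansible release; the container name, template and options below are illustrative only, not taken from any existing play:

    # Minimal, hypothetical use of the lxc_container module from
    # ansible-modules-extras; all values are placeholders.
    - name: Create and start an example LXC container
      lxc_container:
        name: example-container
        template: ubuntu
        template_options: --release trusty
        state: started
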
16:54 *** openstackgerrit has quit IRC
16:54 *** openstackgerrit has joined #openstack-ansible
17:01 *** britthouser has quit IRC
17:07 <openstackgerrit> Merged stackforge/os-ansible-deployment: Reduce script verbosity  https://review.openstack.org/164165
17:08 <openstackgerrit> Merged stackforge/os-ansible-deployment: Removed all rackspace related logging parts  https://review.openstack.org/164470
17:08 <openstackgerrit> Merged stackforge/os-ansible-deployment: Add new rsyslog server role  https://review.openstack.org/164471
17:08 <openstackgerrit> Merged stackforge/os-ansible-deployment: Do not assume users have names  https://review.openstack.org/164236
17:08 <palendae> One more?
17:08 <openstackgerrit> Merged stackforge/os-ansible-deployment: Update contributing guidelines with backport guidance  https://review.openstack.org/163764
17:08 <palendae> Boom
17:09 <openstackgerrit> Miguel Alejandro Cantu proposed stackforge/os-ansible-deployment: Change heat_metadata_server_url to external API.  https://review.openstack.org/164785
17:13 *** vmtrooper has joined #openstack-ansible
17:18 *** vmtrooper has quit IRC
17:18 *** daneyon has joined #openstack-ansible
17:51 *** sigmavirus24 is now known as sigmavirus24_awa
17:51 *** Mudpuppy has quit IRC
18:06 *** KLevenstein has quit IRC
18:07 *** galstrom_zzz is now known as galstrom
18:25 *** KLevenstein has joined #openstack-ansible
18:27 <cloudnull> palendae the module is merged, however it's not part of a 'release' quite yet, so i think we should hold off on that for now.
18:35 <openstackgerrit> Jesse Pretorius proposed stackforge/os-ansible-deployment: Simplify and improve bootstrap/gate/run scripts  https://review.openstack.org/163837
18:35 <openstackgerrit> Jesse Pretorius proposed stackforge/os-ansible-deployment: Revise gate script library to report correctly  https://review.openstack.org/163914
18:37 <prometheanfire> apparently gate finally works for neutron, after 3-4 days of not working (at least)
18:37 <prometheanfire> well, grenade at least
18:45 *** daneyon_ has joined #openstack-ansible
18:46 *** daneyon has quit IRC
18:48 <openstackgerrit> Jesse Pretorius proposed stackforge/os-ansible-deployment: Remove 'holland' package from the wheel repository  https://review.openstack.org/164822
18:50 <daneyon_> Any background on why all the neutron agents share the same container?
18:52 *** Mudpuppy has joined #openstack-ansible
18:57 *** Mudpuppy has quit IRC
18:57 *** Mudpuppy has joined #openstack-ansible
19:01 <odyssey4me> Apsu rackertom cloudnull ^ see question from daneyon_
19:01 <palendae> cloudnull: Ah, that's fair
19:02 *** vmtrooper has joined #openstack-ansible
19:02 <palendae> prometheanfire: For neutron the project or our neutron tests?
19:03 <Apsu> daneyon_: Meaning l3 + dhcp + metadata + l2?
19:03 <cloudnull> daneyon_ the neutron agents all share the same container because the core agents all require the use of the same namespace.
19:04 <Apsu> daneyon_: The alternative would be l3 + metadata + l2 && dhcp + metadata + l2.
19:04 *** sigmavirus24_awa is now known as sigmavirus24
19:04 <Apsu> Which we felt was worse than combining l3 and dhcp.
19:04 <cloudnull> what Apsu said ^
19:06 <cloudnull> we've taken a similar approach with heat and glance.
19:06 <Apsu> Yep. We combine where it makes sense, and separate where it doesn't.
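
To illustrate the grouping being described, a rough, hypothetical sketch in the spirit of the deployment's environment layout; the file and key names here are illustrative and may not match the repository, the point is only that every agent service maps into one shared agents container so the agents can share namespaces:

    # Hypothetical sketch only; not copied from the repository.
    container_skel:
      neutron_agents_container:
        belongs_to:
          - network_containers
        contains:
          - neutron_linuxbridge_agent
          - neutron_dhcp_agent
          - neutron_l3_agent
          - neutron_metadata_agent
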
19:06 <prometheanfire> palendae: project
19:06 *** vmtrooper has quit IRC
19:06 <prometheanfire> palendae: the entire project was stalled because grenade fail
19:08 <daneyon_> cloudnull and Apsu that's what I was thinking but I just wanted to make sure. I would think it would be pretty straightforward to create plays that separated the LB agent (running on compute nodes) from the other Neutron agents that would run on control nodes.
19:08 <Apsu> daneyon_: Ah, well, now you're thinking in DVR terms.
19:08 <Apsu> And that's different than LinuxBridge terms. For now :)
19:08 <cloudnull> doing a deployment using full micro-services, as has been dubbed by the docker community, is a lofty goal and one that can be done in academia but simply doesn't make sense in production in most cases. We found that we'd need to work around issues presented by using micro-services like that, which surface bugs that are really not bugs
19:09 <Apsu> ^ iscsi kernel module doesn't respect network namespaces, for instance.
19:09 <Apsu> Making putting nova into a container partially crippled.
19:10 <cloudnull> daneyon_ running LB agents and things like DVR on compute nodes is already possible in the way that we spec the environment. IE neutron-linuxbridge-agent is running on the compute nodes.
19:11 <cloudnull> and you'd build off of that to continue that trend
19:11 <Apsu> Yep
19:11 <cloudnull> if you have a look at master there's a lot more separation of services and roles than in the other feature branches.
19:12 <daneyon_> Apsu: OK. I haven't touched Neutron HA in a while. My approach in the past for Neutron HA was to use provider networking instead of the L3 agent. It appears that v10 supports L3 HA. How well has this worked for you?
19:12 <Apsu> Sam-I-Am: daneyon_ has a question for you, I believe :P
19:12 <cloudnull> daneyon_ l3_ha is disabled by default, as it pertains to neutron.
19:13 <Apsu> daneyon_: I would defer to Sam-I-Am who has flagellated himself the most extensively with L3HA/DVR.
19:13 <Sam-I-Am> hahah
19:13 <cloudnull> we're using the l3 att tool to ensure that l3 is ha between multiple active nodes.
19:13 <Sam-I-Am> l3ha and linuxbridge is not recommended in juno (v10)
19:13 <Sam-I-Am> you can probably configure it, but ymmv on operation
19:14 <daneyon_> cloudnull: I agree. I think Docker's inability to natively support multiple processes gets in the way sometimes.
19:14 <cloudnull> this is the tool that att created to do l3 ha failover https://github.com/stackforge/os-ansible-deployment/blob/master/playbooks/roles/os_neutron/files/neutron-ha-tool.py
19:15 <Apsu> ^ we switched to that from our former solution, a daemon I wrote to watch agent heartbeats through rabbit and reschedule routers/networks as appropriate
19:15 <Apsu> The daemon used to break older neutron. It was too efficient at its job, and Neutron has some serious race conditions in it
19:16 <Apsu> Especially around network scheduling
19:16 <Apsu> Tempest can do it too, before they added waits into it between network spinup/teardown :P
19:16 <cloudnull> additionally we're using the chance / least router scheduler
19:16 <cloudnull> https://github.com/stackforge/os-ansible-deployment/blob/master/playbooks/roles/os_neutron/defaults/main.yml#L48-L49
19:16 <Apsu> ^
19:16 <cloudnull> which allows us to make sure scheduling is relatively diverse
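
The defaults linked above map onto neutron's scheduler driver options; a hedged sketch of the equivalent override, where the variable names are illustrative while the option names and scheduler classes are the upstream neutron ones:

    # Illustrative variable names; they correspond to neutron.conf's
    # network_scheduler_driver and router_scheduler_driver options.
    neutron_dhcp_network_scheduler: neutron.scheduler.dhcp_agent_scheduler.ChanceScheduler
    neutron_l3_router_scheduler: neutron.scheduler.l3_agent_scheduler.LeastRoutersScheduler
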
19:17 <Apsu> That's relatively new, but basically obsoleted rpcdaemon (which did the same basic thing)
19:17 <cloudnull> ^ true story
19:17 <Apsu> So we get to leverage builtin schedulers instead with manual rescheduling on failure (through cron'd ATT script)
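
A hedged sketch of how that cron'd rescheduling might be wired up with Ansible's cron module; the interval, install path, user and the tool's command-line flag are assumptions for illustration, not the exact values used by the role:

    # Hypothetical example; flag, path, user and interval are assumptions.
    - name: Periodically move routers off dead L3 agents
      cron:
        name: neutron-ha-tool
        minute: "*/5"
        user: neutron
        job: "/usr/local/bin/neutron-ha-tool.py --l3-agent-migrate"
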
19:17 <Apsu> Easier for everyone
19:17 <Apsu> What I really should do next is write an auto-rescheduler...
19:17 <Apsu> Bake it into upstream neutron
19:18 <cloudnull> that way we can say gty to the l3_ha :)
19:18 <palendae> Apsu: Do you want core? That's how you get core.
19:18 <daneyon_> Apsu: I briefly looked at the iscsi bug you're referencing. Compute containers were interesting from an upgrade perspective... being able to upgrade nova-compute with little downtime to the running instances. I suspect you simply rely on migration to evacuate instances from a compute node before upgrading, correct?
19:18 <Apsu> palendae: haha
19:18 <Apsu> daneyon_: Essentially yeah. I mean, that code path still works; running instances in containers.
19:19 <cloudnull> daneyon_ we can upgrade in place without impacting the running instances, so long as you're not upgrading libvirt.
19:19 <Apsu> The whole thing works great. It's just iscsi failing to rewrite their netlink driver code
19:20 <daneyon_> cloudnull Apsu: OK, I'm used to running only the L3 and LB agents on the compute nodes. Many times I would not even run L3 on the compute node and use provider networking to get around the Neutron L3 HA limitations. I would run the other agents on the control nodes.
19:20 <cloudnull> deploying new nova-compute/neutron-linuxbridge code is relatively benign.
19:21 <cloudnull> ah. this is a bit of an architectural shift from that.
19:21 <Apsu> daneyon_: I hear you. We didn't even ship l3 agents when they were first out, nor OVS.
19:22 <cloudnull> in test we run neutron agents on controller nodes, but in production we setup neutron agents on standalone hosts.
19:22 <Apsu> I wonder if the metadata route injection bug is still present in Kilo...
19:22 <daneyon_> cloudnull: i'm unfamiliar with the l3 att tool. Do you have a pointer?
19:22 <palendae> https://github.com/stackforge/os-ansible-deployment/blob/master/playbooks/roles/os_neutron/files/neutron-ha-tool.py
19:23 <Apsu> I should probably pick this back up and make sure it's fixed or doesn't need the workaround anymore: https://review.openstack.org/#/c/40487/
19:23 <daneyon_> cloudnull: nm
19:23 <cloudnull> https://github.com/stackforge/os-ansible-deployment/blob/master/playbooks/roles/os_neutron/files/neutron-ha-tool.py
19:23 <cloudnull> oh sorry :)
19:23 <Apsu> Only need to wait a few more months before 2 years abandoned!
19:27 <daneyon_> cloudnull: even if the nova-compute upgrade doesn't upgrade libvirt, you don't evacuate instances to CYA?
19:31 <daneyon_> Sam-I-Am: Is the plan to move from the att ha tool to dvr in Kilo?
19:31 <Sam-I-Am> dvr requires ovs, so until there's ovs support in os-ansible or lb support in dvr... probably not.
19:33 <daneyon_> Sam-I-Am: OK. Thanks
19:34 <Sam-I-Am> i suspect L3HA will be easier to implement
19:39 <odyssey4me> In my opinion, L3HA is a better model from a security standpoint - it's certainly easier to control what's exposed with that model. DVR just requires far too much additional hole-plugging.
19:40 <odyssey4me> far less moving parts and hocus pocus too
19:43 <daneyon_> cloudnull: Back to my cinder issue from Friday: error while evaluating conditional: CURRENT not in cinder_get.content. I think it may be due to the cinder scheduler not running. I don't understand why the cinder-scheduler container is not being created and configured.
19:44 <cloudnull> the scheduler is running within the volume container.
19:44 <daneyon_> odyssey4me Sam-I-Am I'm a big fan of simple solutions to fix problems.
19:44 <daneyon_> cloudnull: ah... I see
19:44 <cloudnull> which is due to issues with the cinder-volumes not wanting to communicate to other schedulers upon failover.
19:45 <cloudnull> so we run the scheduler on each volume node, to ensure that it maximizes uptime and availability.
19:47 <daneyon_> cloudnull: Is there a bug ID for that?
19:47 <cloudnull> let me go see if i can find it.
19:48 <daneyon_> cloudnull: I guess that puts me back at square one. I have cinder scheduler/volume running, I can curl the API VIP, and get status CURRENT but I get the error above running the play.
19:48 <Sam-I-Am> odyssey4me: there are some benefits to l3ha over dvr
19:48 <Sam-I-Am> both of them are sort of half-baked
19:48 <Sam-I-Am> (when you go digging)
19:49 <daneyon_> cloudnull: I'm going to temporarily remove the "check cinder api service is available" task from the play, run the play and check to make sure everything works
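
For context, the failing conditional is the shape of result a uri-based availability check produces; a minimal sketch of such a task follows (not the exact task from the os_cinder role; the URL, variable names and retry values are illustrative):

    # Illustrative sketch of an API availability check of this shape.
    - name: Check cinder api service is available
      uri:
        url: "http://{{ internal_lb_vip_address }}:8776/"
        return_content: yes
      register: cinder_get
      until: "'CURRENT' in cinder_get.content"
      retries: 5
      delay: 10

The failure daneyon_ is seeing means the registered content never contained the string CURRENT within the retry window.
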
19:49 <cloudnull> hughsaunders might you be able to have a look at what could be going on with that? didn't you work on that part of cinder? or am i remembering wrong?
19:49 * hughsaunders reads
19:49 <odyssey4me> Sam-I-Am my support of L3HA is more conceptual at this point - I haven't done any digging
19:50 *** sdake has quit IRC
19:50 <Sam-I-Am> odyssey4me: we should have a chat
19:50 <cloudnull> daneyon_: https://bugs.launchpad.net/cinder/+bug/1409012
19:50 <openstack> Launchpad bug 1409012 in Cinder "Volume becomes in 'error' state after scheduler starts" [High,Fix committed] - Assigned to Michal Dulko (michal-dulko-f)
19:51 <cloudnull> fix-committed in master 23 hours ago
19:51 *** sdake has joined #openstack-ansible
19:52 <daneyon_> cloudnull: thx for the bug info
19:53 <odyssey4me> Sam-I-Am part of that support comes from running a public cloud for some time and having to forcibly plug holes and put complex iptables blocking in to protect compute nodes when we were using nova-network... all of which became a lot simpler when we switched to the Neutron L3 agent model on designated network nodes. Our only issue then was scale for provider networks... but that was easy enough to resolve by moving
19:53 <odyssey4me> our network control VM's to less contentious hosts and beefing the VM's up as required.
19:55 *** britthouser has joined #openstack-ansible
19:58 <daneyon_> cloudnull: the rest of the openstack-setup runs when i remove the check api from the cinder backend setup. I am able to create a volume and attach it to an instance.
19:58 <Sam-I-Am> odyssey4me: from the ops meetup, seems a lot of people use nova-net or neutron w/ providernets
19:59 <Sam-I-Am> and nova-net is popular because, even with dvr, there's not really a parallel in neutron for flatdhcp
20:04 <cloudnull> hum. that seems odd daneyon_. as i've not seen that in production before. that said there are a few of us looking into it to see if we can figure out why that is.
20:05 <odyssey4me> Sam-I-Am from the ops meetup notes, it appeared common that people weren't aware that provider nets can be used with neutron - which I thought was odd... we had that setup for neutron back in Grizzly
20:05 <Apsu> odyssey4me: Yep.
20:06 <Sam-I-Am> what?
20:06 <Apsu> People seem confused by the fact that all networks are provider networks.
20:06 <odyssey4me> we had provider nets for in-DC traffic, and customer dedicated WAN links... the provider nets ran straight from the compute hosts via vlan tags
20:07 <odyssey4me> for those not using provider nets, or even those using them but implementing virtual routers to gre networks, those went via the L3 Agents
20:07 <Apsu> There's no such thing as a "tenant" or "overlay" network, per se. As far as Neutron is concerned, a network is a network, and you pick the type by either accepting the default type for a non-admin tenant (i.e., a "tenant" network), which may be a tunnel type ("overlay"), or specifying it explicitly.
20:07 <Apsu> But all of them involve either accepting the default or using the provider extension with --provider:key=value
20:07 <Apsu> They're all provider networks :P
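
To make that concrete, a hedged example of creating a network explicitly through the provider extension, written as an Ansible command task to match the other sketches here; the physical network name and VLAN ID are assumptions:

    # Illustrative only; physnet name and segmentation id are placeholders.
    - name: Create a network explicitly via the provider extension
      command: >
        neutron net-create example-vlan-net
        --provider:network_type vlan
        --provider:physical_network physnet1
        --provider:segmentation_id 101

Omitting the provider arguments and letting the server pick the tenant default is the other path Apsu describes.
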
20:08 <odyssey4me> Apsu exactly - but it seems that most operators don't get that yet... which is not surprising as neutron is far more complicated to piece together compared to nova-network.
20:08 <Apsu> Sure
20:09 <odyssey4me> You actually do need to understand networking. I was lucky in that I had someone who worked with me who did.
20:09 * Apsu nods
20:09 <Apsu> I find it silly to attempt to network a cluster of any size or complexity without having someone versed in at least traditional networking, if not linux networking specifically
20:10 <odyssey4me> Sam-I-Am flatdhcp - is that where there is only one network shared by all projects?
20:10 <Apsu> Unless you're outsourcing deployment entirely
20:10 <Apsu> odyssey4me: No. It's a nova-network network architecture type.
20:10 <Apsu> There was VLANManager, FlatManager, FlatDHCPManager, essentially
20:11 <odyssey4me> Apsu we used vlanmanager - but is there really no neutron topology that is similar to the flatdhcpmanager?
20:11 <Apsu> Most people used FlatDHCP, with the (eventual) multi_host=True
20:11 <Apsu> The way that worked, was you put a dhcp server on each compute host
20:11 <Sam-I-Am> yeah... multi_host is the big deal
20:11 <Apsu> Each compute host had a linux bridge, with IPs on it from your instance networks
20:12 <Apsu> The IPs served as the gateways/DHCP server binds
20:12 <Sam-I-Am> and some hackey goodness to make fixed/floating work
20:12 <odyssey4me> nova-network's networking was horrible - even with everything in segregated vlan's you couldn't safely overlap subnets
20:12 <Apsu> So each compute host could route traffic through each compute's bridge.
20:12 <Sam-I-Am> it is horrible, and its not self-service
20:12 <Sam-I-Am> however, people got used to those hacks, and think its fine.
20:12 <Apsu> Required much more extensive linux networking knowledge to configure and maintain I'd say
20:12 <Apsu> Neutron is just presented poorly and has a very large potential scope
20:13 <odyssey4me> yeah, neutron was written by networking people... nova-net was written by server people
20:13 <Apsu> But the actual configuration is relatively simple. Much simpler than provisioning an equally complex nova-net
20:13 <odyssey4me> well, that's the conclusion I drew
20:13 <Apsu> There's been a lot of work trying to emulate multi_host over the past few years
20:13 <Sam-I-Am> dvr is about as close as it gets
20:14 <Sam-I-Am> except now its "too complex"
20:14 <Apsu> One of the main pieces of work was from a guy at IBM (iirc), which got pushed back again and again until eventually he abandoned it
20:14 <Apsu> Some other folks retried with DVR. Similar concept, but overengineered and poorly implemented from what I can see
20:14 <Apsu> The core concept is very simple and many people (myself included) have come up with it independently.
20:15 <Apsu> There's even other options possible with upstream network device support, such as ECMP
20:15 <Apsu> Or using MAC load-balancing, like CARP
20:15 <Sam-I-Am> Apsu: patches accepted
20:15 <Sam-I-Am> well, proposed :P
20:15 <Apsu> :P
20:15 <Apsu> I might. I'm afraid of getting core, because I'll have to take up drinking that way
20:15 <Sam-I-Am> you're not drinking now?
20:16 <palendae> Apsu: Is there anything legally binding with core?
20:16 <hughsaunders> daneyon_: did you rerun when "check cinder api service is available" failed? I'm curious as to whether it failed multiple times. If only once, could have been that the retries expired before cinder was available?
20:16 <prometheanfire> cloudnull: https://review.openstack.org/#/c/154128/ kthnx
20:16 <prometheanfire> :D
20:16 <prometheanfire> Apsu: I suppose you too
20:17 <Apsu> palendae: Nah. Just my non-existent professional reputation. I'm dressing for the job I want.
20:17 <odyssey4me> Apsu if you can simplify it, then make the code... do it!
20:17 <Apsu> odyssey4me: Probably will
20:17 <Apsu> I've been kicking around the idea for 3 years. Have had many conversations with Vish, Dan and the IBM guy.
20:18 <Apsu> Sadly, when Dan was in charge, his response to "What about multi_host parity?" was "I don't see why you would do anything different than run 2 networking nodes and use OVS"
20:18 <palendae> Apsu: So you're trying to improve it?
20:18 <Apsu> Note that he said it in person with an earnest face. So... yeah.
20:18 <Apsu> palendae: Yeah, ideally
20:18 <palendae> Suck up :p
20:19 <odyssey4me> so by multi-host parity, then mean having dhcp on every compute node?
20:19 <Apsu> odyssey4me: Nah, that's not the primary goal.
20:19 <odyssey4me> *they
20:19 <Apsu> The HA'ness of agents isn't the point. That's essentially solved already.
20:20 <Apsu> You can put them anywhere, as many as you like, the scheduler is (almost) fine.
20:20 <Apsu> The issue is what's often called "direct return", in the load-balancing world.
20:20 <Sam-I-Am> ooo yeah
20:20 <Apsu> I.e., instances on a given compute host will directly route through the upstream (non-virtual) switch to reach the outside world, and likewise for traffic coming back in to that instance.
20:21 <Apsu> Which is also known as direct north-south traversal.
20:21 <openstackgerrit> Kevin Carter proposed stackforge/os-ansible-deployment: Adds rsyslog client role and enables it in all plays  https://review.openstack.org/164714
20:21 <Apsu> Currently, the path is east-west to network nodes, then north-south
20:21 <Apsu> Aggregating all routed traffic through the network nodes.
20:22 <Apsu> Generally switches can handle aggregate traffic better than servers, and their uplinks are capable of being much better, so you're artificially limiting your (aggregate) instance bandwidth by funneling through the east-west path.
20:22 <odyssey4me> Apsu yeah, but that choking point is actually a positive thing - assuming that you're able to scale those network control points sideways and perhaps also set them into AZ's.
20:22 <Apsu> Even worse, you generally traverse the same switch to get to the network node in the first place
20:22 <Apsu> And if it's the same physical interface (different VLAN, say), you've just halved your routed bandwidth
20:22 <odyssey4me> yeah, but hang on - this is why perhaps you should be implementing neutron with a real-world controller, instead of OVS.
20:23 <Apsu> Sure. This is part of what led people to put OVS on physical switches.
20:23 <Apsu> Got one under my desk right now :P
20:23 <palendae> Apsu: O.o
20:23 <Apsu> palendae: Oh yes.
20:23 <palendae> O. k.
20:23 <Apsu> Runs a full Debian distro, has OVS, can see each physical port as an OVS port
20:24 <Apsu> All the needfuls
20:24 <Sam-I-Am> debian? so it has ovs 1.3?
20:24 <Apsu> Sam-I-Am: First rule of Debian club.
20:24 <odyssey4me> so if you chose to use a Cisco Nexus fabric with its controller in a VDC, allowing Neutron to orchestrate directly, you'd be sitting pretty (although somewhat poorer compared to a similar OVS setup)
20:24 <Apsu> Sam-I-Am: I'm trying to reduce the salt level, not increase it ;P
20:24 <Sam-I-Am> Apsu: when did we move to salt?
20:25 <Apsu> odyssey4me: Welcome to what almost every single network vendor has been working on for ~2 years.
20:25 <Sam-I-Am> or you can use brocade :)
20:25 <Apsu> Sam-I-Am: Before we realized it wasn't the expression of our saltiness.
20:25 <odyssey4me> but that's the sort of setup needed for a real production environment - the OVS setup really is only useful for small setups
20:25 <odyssey4me> Sam-I-Am yeah, or Arista
20:25 <Apsu> odyssey4me: Eh, that's debatable. Google's got OVS running their internal backbone switches
20:26 <Apsu> They were one of the first to stick it on a physical switch
20:26 <Sam-I-Am> rackspace uses ovs
20:26 <Apsu> True, public cloud networking is partly OVS.
20:26 <Apsu> Well, I should say partly NSX..
20:26 <Sam-I-Am> one could argue that if you have to pick software, ovs scales better than linuxbridge
20:26 <odyssey4me> fair enough - OVS can be used... but then it should be on specialised hardware... and for a decent L3 setup I expect that a more capable controller would be needed than the basic stuff that usually gets used
20:26 <odyssey4me> maybe opendaylight or something? dunno - it's been a hiwle
20:27 <odyssey4me> *while
20:27 <palendae> This discussion is working well with the class right now
20:27 <Apsu> odyssey4me: Sure. To be fair, OVS 2.3+ is way more advanced than Neutron has begun to take advantage of
20:28 <Apsu> Still, I don't think it's "better" than native linux.
20:28 <Apsu> Namespaces and tunnel interfaces and minor orchestration can do what you need.
20:28 <daneyon_> hughsaunders: I ran multiple times and i hit the same error. The only way I can get around it is by removing the api check.
20:29 <odyssey4me> also to be fair, our horrible experiences with OVS have a lot to do with early versions and the inability for Ubuntu to provide kernel and OVS kernel module patches properly... I do have horrible memories of kernel panics after almost every package update
20:29 <hughsaunders> daneyon_: would you be able to pastebin the response you get from cinder?
20:29 <Apsu> odyssey4me: The biggest benefits from OVS come from using *actual* metrics and monitoring at the flow level to provide dynamic adjustments to traffic, to maximize utilization
20:29 <Apsu> odyssey4me: That's the whole value prop of OpenFlow. The dynamic, programmable feedback loop
20:29 <Apsu> Then you can adjust QoS, pick different datacenter paths based on link utilization, etc
20:29 <Sam-I-Am> what is this qos?
20:29 <odyssey4me> Apsu yeah, I do remember my fellow architect schooling me in those mysteries :p
20:30 <Apsu> Gives you a good place to interface with OSPF in the backbone layer...
20:30 <Apsu> Sam-I-Am: I have no idea what I'm talking about, I just started pasting from a buzzword generator. Don't mind me.
20:30 <hughsaunders> Apsu: now we have it in writing!
20:30 <palendae> Apsu is a markov chain
20:30 <Apsu> hughsaunders: Was it ever in question?
20:31 <Apsu> palendae: What do you feel about Apsu is a markov chain ?
20:34 <daneyon_> hughsaunders: just to clarify, the response I get from Cinder API when I simply curl the VIP?
20:34 <hughsaunders> daneyon_: yes please
20:35 <hughsaunders> just to see if theres some reason its not matching
20:35 <daneyon_> hughsaunders: https://etherpad.mozilla.org/74IKwZSD2b
20:36 <hughsaunders> daneyon_: thanks
20:50 *** vmtrooper has joined #openstack-ansible
20:53 <daneyon_> hughsaunders: yw
20:55 *** vmtrooper has quit IRC
20:56 *** jaypipes has quit IRC
20:57 *** jaypipes has joined #openstack-ansible
21:00 *** Mudpuppy has quit IRC
21:00 *** sigmavirus24 is now known as sigmavirus24_awa
21:08 <hughsaunders> daneyon_: I setup nc to respond with the response you provided, and pointed the "check cinder api service is available" task from the tip of the juno branch at it, and I can't get it to fail :( so I'm not sure what's happening. Which SHA of os-ansible-deployment are you on?
21:08 <daneyon_> Sam-I-Am: I've had a chance to look at the HA tool in more detail. It seems like the tool is more of a DR solution than an HA solution. How long do you see the typical fail-over time between L3 agents in your test cases?
21:09 <daneyon_> hughsaunders: commit 2c6e3b5c5958feda28c800a3acec9165051e6fdc
21:09 <hughsaunders> daneyon_: thanks
21:11 <daneyon_> hughsaunders: yw
21:12 <daneyon_> hughsaunders: bb in 10-15.. need food.
21:14 *** Mudpuppy has joined #openstack-ansible
21:17 *** Mudpuppy_ has joined #openstack-ansible
21:20 *** Mudpuppy has quit IRC
21:31 *** Mudpuppy_ is now known as Mudpuppy
21:34 <hughsaunders> daneyon_: still can't get it to fail, tried ansible 1.6.10 (as per requirements) and 1.8.4 (latest release). could you add a debug task after "check cinder api service is available" and pastebin the result?  -debug: var=cinder_get
21:47 *** sigmavirus24_awa is now known as sigmavirus24
21:52 <daneyon_> hughsaunders: Do I add something like this: debug: cinder_get
21:53 <hughsaunders> daneyon_: yeah -debug: var=cinder_get
22:13 *** stevemar has quit IRC
22:19 *** Mudpuppy has quit IRC
22:28 <daneyon_> hughsaunders: i updated the etherpad
22:29 <daneyon_> hughsaunders: Is there a log I should check b/c I get the same error msg
22:30 <hughsaunders> daneyon_: yeah, it seems the debug is not evaluated because the previous task fails :(
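
One way to reach the debug task despite the failure, sketched under the assumption that ignoring the check's failure is acceptable while debugging; this reuses the illustrative check from earlier, not the exact task from the role:

    # Debugging-only sketch: same illustrative check as above, with
    # ignore_errors added so the play continues to the debug task.
    - name: Check cinder api service is available
      uri:
        url: "http://{{ internal_lb_vip_address }}:8776/"
        return_content: yes
      register: cinder_get
      until: "'CURRENT' in cinder_get.content"
      retries: 5
      delay: 10
      ignore_errors: yes

    - name: Dump the registered response
      debug:
        var: cinder_get
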
22:34 <Apsu> daneyon_: < 5 min for failover. Which isn't very fast in the worst case. It's more like Medium Availability, I guess. There are better ways to do it but not without upstream scheduler changes (which currently don't handle failover at all for L3), running the cronjobs more often (which can be too slow and pile up with lots of networks/agents), or using a dedicated daemon to plug into agent statuses through AMQP and do its own scheduling and queueing
22:34 <Apsu> (like rpcdaemon did/does).
22:36 <daneyon_> Apsu: Thx. Can you refresh my memory why you went away from rpcdaemon then?
22:38 <Apsu> daneyon_: Neutron finally implemented a decent L3 scheduler that handled most of the edge cases it used to not, and only lacked the basic case of: agent down -> move routers to other agents
22:39 <Apsu> The LeastUsedScheduler or whatever. It just needs to get poked a little bit
22:39 <Apsu> Which the ATT script did in a simple straightforward way
22:39 <daneyon_> Apsu: OK. Thanks!
22:39 <Apsu> rpcdaemon is ... a slightly complex, multithreaded, proper daemon.
22:39 *** vmtrooper has joined #openstack-ansible
22:39 <Apsu> I can totally show it to you and explain it if you're interested, but you'd have to disable the builtin schedulers
22:41 <daneyon_> Apsu: Maybe at some point. Right now, I'm just trying to gather data.
22:42 <Apsu> It's faster than processing the same bits in bash, for sure, but the real solution is honestly to make a better scheduler.
22:42 <Apsu> I think Neutron had a blueprint for one....
22:43 *** sigmavirus24 is now known as sigmavirus24_awa
22:43 <Apsu> Might have gotten displaced a little due to DVR/HA bits coming in
22:44 <hughsaunders> daneyon_: thanks for providing debug info, the only thing I can think of is that the internal vip or service port are wrong, but I'm not sure how that would happen.
22:44 <Apsu> Looks like this part is in at least: https://github.com/openstack/neutron/blob/e933891462408435c580ad42ff737f8bff428fbc/neutron/scheduler/l3_agent_scheduler.py#L126
22:44 <Apsu> auto_schedule_routers
22:44 <Apsu> Which was part of the automatic rescheduler class
22:44 <hughsaunders> The content check stands up to scrutiny
22:44 <Apsu> Anyhow, heading out.
22:44 <hughsaunders> I'm off now, but will ping if I think of anything else.
22:44 <openstackgerrit> Kevin Carter proposed stackforge/os-ansible-deployment: Adds rsyslog client role and enables it in all plays  https://review.openstack.org/164714
22:45 *** vmtrooper has quit IRC
22:45 <daneyon_> hughsaunders: what's weird is the play checks the vip and port in the previous step. That passes. I even went and changed the vip/port vars in the api check to the real IP/port and I get the same failure. I'm going to do a rebuild of my env soon and I'll let you know if it comes back.
22:55 *** sdake has quit IRC
22:59 *** KLevenstein has quit IRC
22:59 *** galstrom is now known as galstrom_zzz
23:24 *** sdake has joined #openstack-ansible
23:41 *** jaypipes has quit IRC
23:45 *** sdake__ has joined #openstack-ansible
23:49 *** sdake has quit IRC
