*** vmtrooper has joined #openstack-ansible | 00:53 | |
*** vmtrooper has quit IRC | 00:58 | |
*** mahito has joined #openstack-ansible | 01:02 | |
*** galstrom_zzz is now known as galstrom | 02:00 | |
*** vmtrooper has joined #openstack-ansible | 02:41 | |
*** vmtrooper has quit IRC | 02:47 | |
*** stevemar has joined #openstack-ansible | 03:19 | |
*** galstrom is now known as galstrom_zzz | 04:01 | |
*** galstrom_zzz is now known as galstrom | 04:06 | |
*** mahito has quit IRC | 04:08 | |
*** vmtrooper has joined #openstack-ansible | 04:30 | |
*** vmtrooper has quit IRC | 04:36 | |
*** galstrom is now known as galstrom_zzz | 05:05 | |
*** stevemar has quit IRC | 05:55 | |
*** vmtrooper has joined #openstack-ansible | 06:19 | |
*** vmtrooper has quit IRC | 06:24 | |
*** mahito has joined #openstack-ansible | 06:56 | |
*** vmtrooper has joined #openstack-ansible | 08:08 | |
*** vmtrooper has quit IRC | 08:13 | |
*** mahito has quit IRC | 08:33 | |
*** vmtrooper has joined #openstack-ansible | 09:57 | |
*** vmtrooper has quit IRC | 10:02 | |
*** vmtrooper has joined #openstack-ansible | 11:46 | |
*** jaypipes has joined #openstack-ansible | 11:51 | |
*** vmtrooper has quit IRC | 11:51 | |
*** britthouser has joined #openstack-ansible | 11:53 | |
*** galstrom_zzz is now known as galstrom | 12:25 | |
*** sdake has joined #openstack-ansible | 12:47 | |
*** openstackgerrit has quit IRC | 12:50 | |
*** openstackgerrit has joined #openstack-ansible | 12:50 | |
*** galstrom is now known as galstrom_zzz | 13:08 | |
openstackgerrit | git-harry proposed stackforge/os-ansible-deployment: Test commit - do not review https://review.openstack.org/161670 | 13:31 |
*** sandywalsh has joined #openstack-ansible | 13:31 | |
*** vmtrooper has joined #openstack-ansible | 13:35 | |
*** KLevenstein has joined #openstack-ansible | 13:39 | |
*** vmtrooper has quit IRC | 13:40 | |
*** Mudpuppy has joined #openstack-ansible | 13:56 | |
*** Mudpuppy has quit IRC | 14:07 | |
*** Mudpuppy has joined #openstack-ansible | 14:08 | |
*** sigmavirus24_awa is now known as sigmavirus24 | 14:25 | |
openstackgerrit | Kevin Carter proposed stackforge/os-ansible-deployment: Adds rsyslog client role and enables it in all plays https://review.openstack.org/164714 | 14:36 |
*** galstrom_zzz is now known as galstrom | 14:37 | |
openstackgerrit | Kevin Carter proposed stackforge/os-ansible-deployment: Adds rsyslog client role and enables it in all plays https://review.openstack.org/164714 | 14:39 |
openstackgerrit | Hugh Saunders proposed stackforge/os-ansible-deployment: Add ldappool to keystone packages https://review.openstack.org/164715 | 14:40 |
*** stevemar has joined #openstack-ansible | 14:40 | |
*** prometheanfire has joined #openstack-ansible | 14:45 | |
*** alextricity has quit IRC | 14:50 | |
*** alextricity has joined #openstack-ansible | 15:06 | |
*** daneyon has quit IRC | 15:11 | |
*** vmtrooper has joined #openstack-ansible | 15:24 | |
*** vmtrooper has quit IRC | 15:29 | |
openstackgerrit | Hugh Saunders proposed stackforge/os-ansible-deployment: Ensure return code passes through output trimming https://review.openstack.org/164480 | 15:35 |
*** galstrom is now known as galstrom_zzz | 15:55 | |
*** openstackgerrit has quit IRC | 16:11 | |
*** openstackgerrit has joined #openstack-ansible | 16:12 | |
*** sandywalsh has quit IRC | 16:36 | |
palendae | cloudnull: Now that https://github.com/ansible/ansible-modules-extras/blob/devel/cloud/lxc/lxc_container.py is merged, do we need a blueprint to get that in? A bug at least | 16:36 |
*** openstackgerrit has quit IRC | 16:54 | |
*** openstackgerrit has joined #openstack-ansible | 16:54 | |
*** britthouser has quit IRC | 17:01 | |
openstackgerrit | Merged stackforge/os-ansible-deployment: Reduce script verbosity https://review.openstack.org/164165 | 17:07 |
openstackgerrit | Merged stackforge/os-ansible-deployment: Removed all rackspace related logging parts https://review.openstack.org/164470 | 17:08 |
openstackgerrit | Merged stackforge/os-ansible-deployment: Add new rsyslog server role https://review.openstack.org/164471 | 17:08 |
openstackgerrit | Merged stackforge/os-ansible-deployment: Do not assume users have names https://review.openstack.org/164236 | 17:08 |
palendae | One more? | 17:08 |
openstackgerrit | Merged stackforge/os-ansible-deployment: Update contributing guidelines with backport guidance https://review.openstack.org/163764 | 17:08 |
palendae | Boom | 17:08 |
openstackgerrit | Miguel Alejandro Cantu proposed stackforge/os-ansible-deployment: Change heat_metadata_server_url to external API. https://review.openstack.org/164785 | 17:09 |
*** vmtrooper has joined #openstack-ansible | 17:13 | |
*** vmtrooper has quit IRC | 17:18 | |
*** daneyon has joined #openstack-ansible | 17:18 | |
*** sigmavirus24 is now known as sigmavirus24_awa | 17:51 | |
*** Mudpuppy has quit IRC | 17:51 | |
*** KLevenstein has quit IRC | 18:06 | |
*** galstrom_zzz is now known as galstrom | 18:07 | |
*** KLevenstein has joined #openstack-ansible | 18:25 | |
cloudnull | palendae the module is merged, however it's not part of a 'release' quite yet, so i think we should hold off on that for now. | 18:27 |
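A minimal sketch of what the merged lxc_container module offers once it lands in an Ansible release; the container name and template below are illustrative only:

    # hypothetical values -- name/template/state are documented lxc_container parameters
    - name: Build a test LXC container
      lxc_container: name=test1 template=ubuntu state=started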
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Simplify and improve bootstrap/gate/run scripts https://review.openstack.org/163837 | 18:35 |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Revise gate script library to report correctly https://review.openstack.org/163914 | 18:35 |
prometheanfire | apparently gate finally works for neutron, after 3-4 days of not working (at least) | 18:37 |
prometheanfire | well, grenade at least | 18:37 |
*** daneyon_ has joined #openstack-ansible | 18:45 | |
*** daneyon has quit IRC | 18:46 | |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Remove 'holland' package from the wheel repository https://review.openstack.org/164822 | 18:48 |
daneyon_ | Any background on why all the neutron agents share the same container? | 18:50 |
*** Mudpuppy has joined #openstack-ansible | 18:52 | |
*** Mudpuppy has quit IRC | 18:57 | |
*** Mudpuppy has joined #openstack-ansible | 18:57 | |
odyssey4me | Apsu rackertom cloudnull ^ see question from daneyon_ | 19:01 |
palendae | cloudnull: Ah, that's fair | 19:01 |
*** vmtrooper has joined #openstack-ansible | 19:02 | |
palendae | prometheanfire: For neutron the project, or our neutron tests? | 19:02 |
Apsu | daneyon_: Meaning l3 + dhcp + metadata + l2? | 19:03 |
cloudnull | daneyon_ the neutron agents all share the same container because the core agents all require the use of the same namespace. | 19:03 |
Apsu | daneyon_: The alternative would be l3 + metadata + l2 && dhcp + metadata + l2. | 19:04 |
*** sigmavirus24_awa is now known as sigmavirus24 | 19:04 | |
Apsu | Which we felt was worse than combining l3 and dhcp. | 19:04 |
cloudnull | what Apsu said ^ | 19:04 |
cloudnull | we've taken a similar approach with heat and glance. | 19:06 |
Apsu | Yep. We combine where it makes sense, and separate where it doesn't. | 19:06 |
prometheanfire | palendae: project | 19:06 |
*** vmtrooper has quit IRC | 19:06 | |
prometheanfire | palendae: the entire project was stalled because grenade failed | 19:06 |
daneyon_ | cloudnull and Apsu that's what I was thinking but I just wanted to make sure. I would think it would be pretty straightforward to create plays that separated the LB agent (running on compute nodes) from the other Neutron agents that would run on control nodes. | 19:08 |
Apsu | daneyon_: Ah, well, now you're thinking in DVR terms. | 19:08 |
Apsu | And that's different than LinuxBridge terms. For now :) | 19:08 |
cloudnull | doing a deployment using full micro-services, as it has been dubbed by the docker community, is a lofty goal and one that can be done in academia but simply doesn't make sense in production in most cases. We found that we'd need to work around issues presented by using micro-services like that, which surface bugs that are really not bugs | 19:08 |
Apsu | ^ iscsi kernel module doesn't respect network namespaces, for instance. | 19:09 |
Apsu | Making putting nova into a container partially crippled. | 19:09 |
cloudnull | daneyon_ running LB agents and things like DVR on compute nodes is already possible in the way that we spec the environment. IE neutron-linuxbridge-agent is running on the compute nodes. | 19:10 |
cloudnull | and you'd build off of that to continue that trend | 19:11 |
Apsu | Yep | 19:11 |
cloudnull | if you have a look at master there's a lot more separation of services and roles than in the other feature branches. | 19:11 |
daneyon_ | Apsu: OK. I haven't touched Neutron HA in a while. My approach in the past for Neutron HA was to use provider networking instead of the L3 agent. It appears that v10 supports L3 HA. How well has this worked for you? | 19:12 |
Apsu | Sam-I-Am: daneyon_ has a question for you, I believe :P | 19:12 |
cloudnull | daneyon_ l3_ha is disabled by default, as it pertains to neutron. | 19:12 |
Apsu | daneyon_: I would defer to Sam-I-Am who has flagellated himself the most extensively with L3HA/DVR. | 19:13 |
Sam-I-Am | hahah | 19:13 |
cloudnull | we're using the l3 att tool to ensure that l3 is ha between multiple active nodes. | 19:13 |
Sam-I-Am | l3ha and linuxbridge is not recommended in juno (v10) | 19:13 |
Sam-I-Am | you can probably configure it, but ymmv on operation | 19:13 |
daneyon_ | cloudnull: I agree. I think Docker's inability to natively support multiple processes gets in the way sometimes. | 19:14 |
cloudnull | this is the tool that att created to do l3 ha failover https://github.com/stackforge/os-ansible-deployment/blob/master/playbooks/roles/os_neutron/files/neutron-ha-tool.py | 19:14 |
Apsu | ^ we switched to that from our former solution, a daemon I wrote to watch agent heartbeats through rabbit and reschedule routers/networks as appropriate | 19:15 |
Apsu | The daemon used to break older neutron. It was too efficient at its job, and Neutron has some serious race conditions in it | 19:15 |
Apsu | Especially around network scheduling | 19:16 |
Apsu | Tempest can do it too, before they added waits into it between network spinup/teardown :P | 19:16 |
cloudnull | additionally we're using the chance / least-routers scheduler | 19:16 |
cloudnull | https://github.com/stackforge/os-ansible-deployment/blob/master/playbooks/roles/os_neutron/defaults/main.yml#L48-L49 | 19:16 |
Apsu | ^ | 19:16 |
cloudnull | which allows us to make sure scheduling is relatively diverse | 19:16 |
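A rough sketch of the scheduler settings being referenced; the variable names below are guesses (check the linked defaults/main.yml for the real ones), though the driver classes are the upstream neutron chance and least-routers schedulers:

    # illustrative override names -- the values are real neutron scheduler classes
    neutron_dhcp_scheduler_driver: neutron.scheduler.dhcp_agent_scheduler.ChanceScheduler
    neutron_router_scheduler_driver: neutron.scheduler.l3_agent_scheduler.LeastRoutersScheduler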
Apsu | That's relatively new, but basically obsoleted rpcdaemon (which did the same basic thing) | 19:17 |
cloudnull | ^ true story | 19:17 |
Apsu | So we get to leverage builtin schedulers instead with manual rescheduling on failure (through cron'd ATT script) | 19:17 |
Apsu | Easier for everyone | 19:17 |
Apsu | What I really should do next is write an auto-rescheduler... | 19:17 |
Apsu | Bake it into upstream neutron | 19:17 |
cloudnull | that way we can say gty to the l3_ha :) | 19:18 |
palendae | Apsu: Do you want core? That's how you get core. | 19:18 |
daneyon_ | Apsu: I briefly looked at the iscsi bug you're referencing. Compute containers were interesting from an upgrade perspective... being able to upgrade nova-compute with little downtime to the running instances. I suspect you simply rely on migration to evacuate instances from a compute node before upgrading, correct? | 19:18 |
Apsu | palendae: haha | 19:18 |
Apsu | daneyon_: Essentially yeah. I mean, that code path Still works; running instances in containers. | 19:18 |
cloudnull | daneyon_ we can upgrade in place without impacting the running instances, so long as you're not upgrading libvirt. | 19:19 |
Apsu | The whole thing works great. It's just iscsi failing to rewrite their netlink driver code | 19:19 |
daneyon_ | cloudnull Apsu: OK, I'm used to running only the L3 and LB agents on the compute nodes. Many times I would not even run L3 on the compute node and use provider networking to get around the Neutron L3 HA limitations. I would run the other agents on the control nodes. | 19:20 |
cloudnull | deploying new nova-compute/neutron-linuxbridge code is relatively benign. | 19:20 |
cloudnull | ah. this is a bit of an architectural shift from that. | 19:21 |
Apsu | daneyon_: I hear you. We didn't even ship l3 agents when they were first out, nor OVS. | 19:21 |
cloudnull | in test we run neutron agents on controller nodes, but in production we set up neutron agents on standalone hosts. | 19:22 |
Apsu | I wonder if the metadata route injection bug is still present in Kilo... | 19:22 |
daneyon_ | cloudnull: i'm unfamiliar with the l3 att tool. Do you have a pointer? | 19:22 |
palendae | https://github.com/stackforge/os-ansible-deployment/blob/master/playbooks/roles/os_neutron/files/neutron-ha-tool.py | 19:22 |
Apsu | I should probably pick this back up and make sure it's fixed or doesn't need the workaround anymore: https://review.openstack.org/#/c/40487/ | 19:23 |
daneyon_ | cloudnull: nm | 19:23 |
cloudnull | https://github.com/stackforge/os-ansible-deployment/blob/master/playbooks/roles/os_neutron/files/neutron-ha-tool.py | 19:23 |
cloudnull | oh sorry :) | 19:23 |
Apsu | Only need to wait a few more months before 2 years abandoned! | 19:23 |
daneyon_ | cloudnull: even if the nova-compute upgrade doesn't upgrade libvirt, you don't evacuate instances to CYA? | 19:27 |
daneyon_ | Sam-I-Am: Is the plan to move from the att ha tool to dvr in Kilo? | 19:31 |
Sam-I-Am | dvr requires ovs, so until there's ovs support in os-ansible or lb support in dvr... probably not. | 19:31 |
daneyon_ | Sam-I-Am: OK. Thanks | 19:33 |
Sam-I-Am | i suspect L3HA will be easier to implement | 19:34 |
odyssey4me | In my opinion, L3HA is a better model from a security standpoint - it's certainly easier to control what's exposed with that model. DVR just requires far too much additional hole-plugging. | 19:39 |
odyssey4me | far fewer moving parts and hocus pocus too | 19:40 |
daneyon_ | cloudnull: Back to my cinder issue from Friday: error while evaluating conditional: CURRENT not in cinder_get.content I think it may be due to the cinder scheduler not running. I don't understand why the cinder-scheduler container is not being created and configured. | 19:43 |
cloudnull | the scheduler is running within the volume container. | 19:44 |
daneyon_ | odyssey4me Sam-I-Am I'm a big fan of simple solutions to fix problems. | 19:44 |
daneyon_ | cloudnull: ah... I see | 19:44 |
cloudnull | which is due to issues with cinder-volume not wanting to communicate with other schedulers upon failover. | 19:44 |
cloudnull | so we run the scheduler on each volume node, to maximize uptime and availability. | 19:45 |
daneyon_ | cloudnull: Is there a bug ID for that? | 19:47 |
cloudnull | let me go see if i can find it . | 19:47 |
daneyon_ | cloudnull: I guess that puts me back at square one. I have cinder scheduler/volume running, I can curl the API VIP, and get status CURRENT but I get the error above running the play. | 19:48 |
Sam-I-Am | odyssey4me: there are some benefits to l3ha over dvr | 19:48 |
Sam-I-Am | both of them are sort of half-baked | 19:48 |
Sam-I-Am | (when you go digging) | 19:48 |
daneyon_ | cloudnull: I'm going to temporarily remove the 'check cinder api service is available' task from the play, run the play and check to make sure everything works | 19:49 |
cloudnull | hughsaunders might you be able to have a look at what could be going on with that? didn't you work on that part of cinder? or am i remembering wrong? | 19:49 |
* hughsaunders reads | 19:49 | |
odyssey4me | Sam-I-Am my support of L3HA is more conceptual at this point - I haven't done any digging | 19:49 |
*** sdake has quit IRC | 19:50 | |
Sam-I-Am | odyssey4me: we should have a chat | 19:50 |
cloudnull | daneyon_: https://bugs.launchpad.net/cinder/+bug/1409012 | 19:50 |
openstack | Launchpad bug 1409012 in Cinder "Volume becomes in 'error' state after scheduler starts" [High,Fix committed] - Assigned to Michal Dulko (michal-dulko-f) | 19:50 |
cloudnull | fix-committed in master 23 hours ago | 19:51 |
*** sdake has joined #openstack-ansible | 19:51 | |
daneyon_ | cloudnull: thx for the bug info | 19:52 |
odyssey4me | Sam-I-Am part of that support comes from running a public cloud for some time and having to forcibly plug holes and put complex iptables blocking in to protect compute nodes when we were using nova-network... all of which became a lot simpler when we switched to the Neutron L3 agent model on designated network nodes. Our only issue then was scale for provider networks... but that was easy enough to resolve by moving | 19:53 |
odyssey4me | our network control VM's to less contentious hosts and beefing the VM's up as required. | 19:53 |
*** britthouser has joined #openstack-ansible | 19:55 | |
daneyon_ | cloudnull: the rest of the openstack-setup runs when i remove the api check from the cinder backend setup. I am able to create a volume and attach it to an instance. | 19:58 |
Sam-I-Am | odyssey4me: from the ops meetup, seems a lot of people use nova-net or neutron w/ providernets | 19:58 |
Sam-I-Am | and nova-net is popular because, even with dvr, there's not really a parallel in neutron for flatdhcp | 19:59 |
cloudnull | hmm, that seems odd daneyon_, as i've not seen that in production before. that said, there are a few of us looking into it to see if we can figure out why that is. | 20:04 |
odyssey4me | Sam-I-Am from the ops meetup notes, it appeared common that people weren't aware that provider nets can be used with neutron - which I thought was odd... we had that setup for neutron back in Grizzly | 20:05 |
Apsu | odyssey4me: Yep. | 20:05 |
Sam-I-Am | what? | 20:06 |
Apsu | People seem confused by the fact that all networks are provider networks. | 20:06 |
odyssey4me | we had provider nets for in-DC traffic, and customer dedicated WAN links... the provider nets ran straight from the compute hosts via vlan tags | 20:06 |
odyssey4me | for those not using provider nets, or even those using them but implementing virtual routers to gre networks, those went via the L3 Agents | 20:07 |
Apsu | There's no such thing as a "tenant" or "overlay" network, per se. As far as Neutron is concerned, a network is a network, and you pick the type by either accepting the default type for a non-admin tenant (i.e., "tenant" network), which may be a tunnel type ("overlay"). | 20:07 |
Apsu | But all of them involve either accepting the default or using the provider extension with --provider:key=value | 20:07 |
Apsu | They're all provider networks :P | 20:07 |
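As a sketch of the point above, creating a network with the provider extension versus taking the tenant default might look like this (wrapped in command tasks only to keep the example in Ansible form; the physnet name and VLAN id are made up):

    - name: Tenant network, accepting the default type
      command: neutron net-create demo-net

    - name: Explicit provider network via --provider:key=value
      command: >
        neutron net-create ext-net
        --provider:network_type vlan
        --provider:physical_network physnet1
        --provider:segmentation_id 200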
odyssey4me | Apsu exactly - but it seems that most operators don't get that yet... which is not surprising as neutron is far more complicated to piece together compared to nova-network. | 20:08 |
Apsu | Sure | 20:08 |
odyssey4me | You actually do need to understand networking. I was lucky in that I had someone who worked with me who did. | 20:09 |
* Apsu nods | 20:09 | |
Apsu | I find it silly to attempt to network a cluster of any size or complexity without having someone versed in at least traditional networking, if not linux networking specifically | 20:09 |
odyssey4me | Sam-I-Am flatdhcp - is that where there is only one network shared by all projects? | 20:10 |
Apsu | Unless you're outsourcing deployment entirely | 20:10 |
Apsu | odyssey4me: No. It's a nova-network network architecture type. | 20:10 |
Apsu | There was VLANManager, FlatManager, FlatDHCPManager, essentially | 20:10 |
odyssey4me | Apsu we used vlanmanager - but is there really no neutron topology that is similar to the flatdhcpmanager? | 20:11 |
Apsu | Most people used FlatDHCP, with the (eventual) multi_host=True | 20:11 |
Apsu | The way that worked, was you put a dhcp server on each compute host | 20:11 |
Sam-I-Am | yeah... multi_host is the big deal | 20:11 |
Apsu | Each compute host had a linux bridge, with IPs on it from your instance networks | 20:11 |
Apsu | The IPs served as the gateways/DHCP server binds | 20:12 |
Sam-I-Am | and some hackey goodness to make fixed/floating work | 20:12 |
odyssey4me | nova-network's networking was horrible - even with everything in segregated vlan's you couldn't safely overlap subnets | 20:12 |
Apsu | So each compute host could route traffic through each compute's bridge. | 20:12 |
Sam-I-Am | it is horrible, and its not self-service | 20:12 |
Sam-I-Am | however, people got used to those hacks, and think its fine. | 20:12 |
Apsu | Required much more extensive linux networking knowledge to configure and maintain I'd say | 20:12 |
Apsu | Neutron is just presented poorly and has a very large potential scope | 20:12 |
odyssey4me | yeah, neutron was written by networking people... nova-net was written by server people | 20:13 |
Apsu | But the actual configuration is relatively simple. Much simpler than provisioning an equally complex nova-net | 20:13 |
odyssey4me | well, that's the conclusion I drew | 20:13 |
Apsu | There's been a lot of work trying to emulate multi_host over the past few years | 20:13 |
Sam-I-Am | dvr is about as close as it gets | 20:13 |
Sam-I-Am | except now its "too complex" | 20:14 |
Apsu | One of the main pieces of work was from a guy at IBM (iirc), which got pushed back again and again until eventually he abandoned it | 20:14 |
Apsu | Some other folks retried with DVR. Similar concept, but overengineered and poorly implemented from what I can see | 20:14 |
Apsu | The core concept is very simple and many people (myself included) have come up with it independently. | 20:14 |
Apsu | There's even other options possible with upstream network device support, such as ECMP | 20:15 |
Apsu | Or using MAC load-balancing, like CARP | 20:15 |
Sam-I-Am | Apsu: patches accepted | 20:15 |
Sam-I-Am | well, proposed :P | 20:15 |
Apsu | :P | 20:15 |
Apsu | I might. I'm afraid of getting core, because I'll have to take up drinking that way | 20:15 |
Sam-I-Am | you're not drinking now? | 20:15 |
palendae | Apsu: Is there anything legally binding with core? | 20:16 |
hughsaunders | daneyon_: did you rerun when "check cinder api service is available" failed? I'm curious as to whether it failed multiple times. If only once, could have been that the retries expired before cinder was available? | 20:16 |
prometheanfire | cloudnull: https://review.openstack.org/#/c/154128/ kthnx | 20:16 |
prometheanfire | :D | 20:16 |
prometheanfire | Apsu: I suppose you too | 20:16 |
Apsu | palendae: Nah. Just my non-existent professional reputation. I'm dressing for the job I want. | 20:17 |
odyssey4me | Apsu if you can simplify it, then make the code... do it! | 20:17 |
Apsu | odyssey4me: Probably will | 20:17 |
Apsu | I've been kicking around the idea for 3 years. Have had many conversations with Vish, Dan and the IBM guy. | 20:17 |
Apsu | Sadly, when Dan was in charge, his response to "What about multi_host parity?" was "I don't see why you would do anything different than run 2 networking nodes and use OVS" | 20:18 |
palendae | Apsu: So you're trying to improve it? | 20:18 |
Apsu | Note that he said it in person with an earnest face. So... yeah. | 20:18 |
Apsu | palendae: Yeah, ideally | 20:18 |
palendae | Suck up :p | 20:18 |
odyssey4me | so by multi-host parity, they mean having dhcp on every compute node? | 20:19 |
Apsu | odyssey4me: Nah, that's not the primary goal. | 20:19 |
Apsu | The HA'ness of agents isn't the point. That's essentially solved already. | 20:19 |
Apsu | You can put them anywhere, as many as you like, the scheduler is (almost) fine. | 20:20 |
Apsu | The issue is what's often called "direct return", in the load-balancing world. | 20:20 |
Sam-I-Am | ooo yeah | 20:20 |
Apsu | I.e., instances on a given compute host will directly route through the upstream (non-virtual) switch to reach the outside world, and likewise for traffic coming back in to that instance. | 20:20 |
Apsu | Which is also known as direct north-south traversal. | 20:21 |
openstackgerrit | Kevin Carter proposed stackforge/os-ansible-deployment: Adds rsyslog client role and enables it in all plays https://review.openstack.org/164714 | 20:21 |
Apsu | Currently, the path is east-west to network nodes, then north-south | 20:21 |
Apsu | Aggregating all routed traffic through the network nodes. | 20:21 |
Apsu | Generally switches can handle aggregate traffic better than servers, and their uplinks are capable of being much better, so you're artificially limiting your (aggregate) instance bandwidth by funneling through the east-west path. | 20:22 |
odyssey4me | Apsu yeah, but that choking point is actually a positive thing - assuming that you're able to scale those network control points sideways and perhaps also set them into AZ's. | 20:22 |
Apsu | Even worse, you generally traverse the same switch to get to the network node in the first place | 20:22 |
Apsu | And if it's the same physical interface (different VLAN, say), you've just halved your routed bandwidth | 20:22 |
odyssey4me | yeah, but hang on - this is why perhaps you should be implementing neutron with a real-world controller, instead of OVS. | 20:22 |
Apsu | Sure. This is part of what led people to put OVS on physical switches. | 20:23 |
Apsu | Got one under my desk right now :P | 20:23 |
palendae | Apsu: O.o | 20:23 |
Apsu | palendae: Oh yes. | 20:23 |
palendae | O. k. | 20:23 |
Apsu | Runs a full Debian distro, has OVS, can see each physical port as an OVS port | 20:23 |
Apsu | All the needfuls | 20:24 |
Sam-I-Am | debian? so it has ovs 1.3? | 20:24 |
Apsu | Sam-I-Am: First rule of Debian club. | 20:24 |
odyssey4me | so if you chose to use a Cisco Nexus fabric with its controller in a VDC, allowing Neutron to orchestrate directly, you'd be sitting pretty (although somewhat poorer compared to a similar OVS setup) | 20:24 |
Apsu | Sam-I-Am: I'm trying to reduce the salt level, not increase it ;P | 20:24 |
Sam-I-Am | Apsu: when did we move to salt? | 20:24 |
Apsu | odyssey4me: Welcome to what almost every single network vendor has been working on for ~2 years. | 20:25 |
Sam-I-Am | or you can use brocade :) | 20:25 |
Apsu | Sam-I-Am: Before we realized it wasn't the expression of our saltiness. | 20:25 |
odyssey4me | but that's the sort of setup needed for a real production environment - the OVS setup really is only useful for small setups | 20:25 |
odyssey4me | Sam-I-Am yeah, or Arista | 20:25 |
Apsu | odyssey4me: Eh, that's debatable. Google's got OVS running their internal backbone switches | 20:25 |
Apsu | They were one of the first to stick it on a physical switch | 20:26 |
Sam-I-Am | rackspace uses ovs | 20:26 |
Apsu | True, public cloud networking is partly OVS. | 20:26 |
Apsu | Well, I should say partly NSX.. | 20:26 |
Sam-I-Am | one could argue that if you have to pick software, ovs scales better than linuxbridge | 20:26 |
odyssey4me | fair enough - OVS can be used... but then it should be on specialised hardware... and for a decent L3 setup I expect that a more capable controller would be needed than the basic stuff that usually gets used | 20:26 |
odyssey4me | maybe opendaylight or something? dunno - it's been a while | 20:26 |
palendae | This discussion is working well with the class right now | 20:27 |
Apsu | odyssey4me: Sure. To be fair, OVS 2.3+ is way more advanced than Neutron has begun to take advantage of | 20:27 |
Apsu | Still, I don't think it's "better" than native linux. | 20:28 |
Apsu | Namespaces and tunnel interfaces and minor orchestration can do what you need. | 20:28 |
daneyon_ | hughsaunders: I ran multiple times and i hit the same error. The only way I can get around it is by removing the api check. | 20:28 |
odyssey4me | also to be fair, our horrible experiences with OVS have a lot to do with early versions and the inability for Ubuntu to provide kernel and OVS kernel module patches properly... I do have horrible memories of kernel panics after almost every package update | 20:29 |
hughsaunders | daneyon_: would you be able to pastebin the response you get from cinder? | 20:29 |
Apsu | odyssey4me: The biggest benefits from OVS come from using *actual* metrics and monitoring at the flow level to provide dynamic adjustments to traffic, to maximize utilization | 20:29 |
Apsu | odyssey4me: That's the whole value prop of OpenFlow. The dynamic, programmable feedback loop | 20:29 |
Apsu | Then you can adjust QoS, pick different datacenter paths based on link utilization, etc | 20:29 |
Sam-I-Am | what is this qos? | 20:29 |
odyssey4me | Apsu yeah, I do remember my fellow architect schooling me in those mysteries :p | 20:29 |
Apsu | Gives you a good place to interface with OSPF in the backbone layer... | 20:30 |
Apsu | Sam-I-Am: I have no idea what I'm talking about, I just started pasting from a buzzword generator. Don't mind me. | 20:30 |
hughsaunders | Apsu: now we have it in writing! | 20:30 |
palendae | Apsu is a markov chain | 20:30 |
Apsu | hughsaunders: Was it ever in question? | 20:30 |
Apsu | palendae: What do you feel about Apsu is a markov chain ? | 20:31 |
daneyon_ | hughsaunders: just to clarify, the response I get from Cinder API when I simply curl the VIP? | 20:34 |
hughsaunders | daneyon_: yes please | 20:34 |
hughsaunders | just to see if theres some reason its not matching | 20:35 |
daneyon_ | hughsaunders: https://etherpad.mozilla.org/74IKwZSD2b | 20:35 |
hughsaunders | daneyon_: thanks | 20:36 |
*** vmtrooper has joined #openstack-ansible | 20:50 | |
daneyon_ | hughsaunders: yw | 20:53 |
*** vmtrooper has quit IRC | 20:55 | |
*** jaypipes has quit IRC | 20:56 | |
*** jaypipes has joined #openstack-ansible | 20:57 | |
*** Mudpuppy has quit IRC | 21:00 | |
*** sigmavirus24 is now known as sigmavirus24_awa | 21:00 | |
hughsaunders | daneyon_: I setup nc to respond with the response you provided, and pointed the "check cinder api service is available" task from the tip of the juno branch at it, and I can't get it to fail :( so I'm not sure what's happening. Which SHA of os-ansible-deployment are you on? | 21:08 |
daneyon_ | Sam-I-Am: I've had a chance to look at the HA tool in more detail. It seems like the tool is more of a DR solution than an HA solution. How long do you see the typical fail-over time between L3 agents in your test cases? | 21:08 |
daneyon_ | hughsaunders: commit 2c6e3b5c5958feda28c800a3acec9165051e6fdc | 21:09 |
hughsaunders | daneyon_: thanks | 21:09 |
daneyon_ | hughsaunders: yw | 21:11 |
daneyon_ | hughsaunders: bb in 10-15.. need food. | 21:12 |
*** Mudpuppy has joined #openstack-ansible | 21:14 | |
*** Mudpuppy_ has joined #openstack-ansible | 21:17 | |
*** Mudpuppy has quit IRC | 21:20 | |
*** Mudpuppy_ is now known as Mudpuppy | 21:31 | |
hughsaunders | daneyon_: still can't get it to fail, tried ansible 1.6.10 (as per requirements) and 1.8.4 (latest release). could you add a debug task after "check cinder api service is available" and pastebin the result? -debug: var=cinder_get | 21:34 |
*** sigmavirus24_awa is now known as sigmavirus24 | 21:47 | |
daneyon_ | hughsaunders: Do I add something like this: debug: cinder_get | 21:52 |
hughsaunders | daneyon_: yeah -debug: var=cinder_get | 21:53 |
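For reference, an approximation of that check plus the suggested debug task (the URL variable, port, and retry counts here are illustrative; the real task in the juno branch may differ):

    - name: Check cinder api service is available
      # internal_vip_address is a placeholder -- substitute whatever the play really uses
      uri: url="http://{{ internal_vip_address }}:8776" return_content=yes
      register: cinder_get
      until: "'CURRENT' in cinder_get.content"
      retries: 5
      delay: 10

    - name: Show the raw response the check saw
      debug: var=cinder_get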
*** stevemar has quit IRC | 22:13 | |
*** Mudpuppy has quit IRC | 22:19 | |
daneyon_ | hughsaunders: i updated the etherpad | 22:28 |
daneyon_ | hughsaunders: Is there a log I should check b/c I get the same error msg | 22:29 |
hughsaunders | daneyon_: yeah, it seems the debug is not evaluated because the previous task fails :( | 22:30 |
Apsu | daneyon_: < 5 min for failover. Which isn't very fast in the worst case. It's more like Medium Availability, I guess. There are better ways to do it but not without upstream scheduler changes (which currently don't handle failover at all for L3), running the cronjobs more often (which can be too slow and pile up with lots of networks/agents), or using a dedicated daemon to plug into agent statuses through AMQP and do its own scheduling and queueing | 22:34 |
Apsu | (like rpcdaemon did/does). | 22:34 |
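For context, the cron'd approach Apsu mentions might be wired up roughly like this; the script path and the --l3-agent-migrate flag are assumptions about the ATT tool's CLI rather than confirmed options:

    - name: Periodically reschedule routers away from dead L3 agents
      # the flag below is an assumed option of neutron-ha-tool.py -- check its --help
      cron: name="neutron-ha-tool" minute="*/5" user=root job="/usr/local/bin/neutron-ha-tool.py --l3-agent-migrate"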
daneyon_ | Apsu: Thx. Can you refresh my memory why you went away from rpcdaemon then? | 22:36 |
Apsu | daneyon_: Neutron finally implemented a decent L3 scheduler that handled most of the edge cases it used to not, and only lacked the basic case of: agent down -> move routers to other agents | 22:38 |
Apsu | The LeastUsedScheduler or whatever. It just needs to get poked a little bit | 22:39 |
Apsu | Which the ATT script did in a simple straightforward way | 22:39 |
daneyon_ | Apsu: OK. Thanks! | 22:39 |
Apsu | rpcdaemon is ... a slightly complex, multithreaded, proper daemon. | 22:39 |
*** vmtrooper has joined #openstack-ansible | 22:39 | |
Apsu | I can totally show it to you and explain it if you're interested, but you'd have to disable the builtin schedulers | 22:39 |
daneyon_ | Apsu: Maybe at some point. Right now, I'm just trying to gather data. | 22:41 |
Apsu | It's faster than processing the same bits in bash, for sure, but the real solution is honestly to make a better scheduler. | 22:42 |
Apsu | I think Neutron had a blueprint for one.... | 22:42 |
*** sigmavirus24 is now known as sigmavirus24_awa | 22:43 | |
Apsu | Might have gotten displaced a little due to DVR/HA bits coming in | 22:43 |
hughsaunders | daneyon_: thanks for providing debug info, the only thing I can think of is that the internal vip or service port are wrong, but I'm not sure how that would happen. | 22:44 |
Apsu | Looks like this part is in at least: https://github.com/openstack/neutron/blob/e933891462408435c580ad42ff737f8bff428fbc/neutron/scheduler/l3_agent_scheduler.py#L126 | 22:44 |
Apsu | auto_schedule_routers | 22:44 |
Apsu | Which was part of the automatic rescheduler class | 22:44 |
hughsaunders | The content check stands up to scrutiny | 22:44 |
Apsu | Anyhow, heading out. | 22:44 |
hughsaunders | I'm off now, but will ping if I think of anything else. | 22:44 |
openstackgerrit | Kevin Carter proposed stackforge/os-ansible-deployment: Adds rsyslog client role and enables it in all plays https://review.openstack.org/164714 | 22:44 |
*** vmtrooper has quit IRC | 22:45 | |
daneyon_ | hughsaunders: what's weird is the play checks the vip and port in the previous step. That passes. I even went and changed the vip/port vars in the api check to the real IP/port and I get the same failure. I'm going to do a rebuild of my env soon and I'll let you know if it comes back. | 22:45 |
*** sdake has quit IRC | 22:55 | |
*** KLevenstein has quit IRC | 22:59 | |
*** galstrom is now known as galstrom_zzz | 22:59 | |
*** sdake has joined #openstack-ansible | 23:24 | |
*** jaypipes has quit IRC | 23:41 | |
*** sdake__ has joined #openstack-ansible | 23:45 | |
*** sdake has quit IRC | 23:49 |