*** jamesdenton has joined #openstack-astara | 00:06 | |
*** jamesdenton has quit IRC | 00:11 | |
*** shashank_hegde has quit IRC | 01:04 | |
*** ssalagame has quit IRC | 01:18 | |
*** jamesdenton has joined #openstack-astara | 03:52 | |
*** jamesdenton has quit IRC | 03:52 | |
*** shashank_hegde has joined #openstack-astara | 03:52 | |
*** pcaruana has joined #openstack-astara | 04:11 | |
*** shashank_hegde has quit IRC | 04:12 | |
*** shashank_hegde has joined #openstack-astara | 04:13 | |
*** pcaruana has quit IRC | 04:24 | |
*** shashank_hegde has quit IRC | 04:29 | |
*** shashank_hegde has joined #openstack-astara | 04:29 | |
*** pcaruana has joined #openstack-astara | 04:34 | |
*** pcaruana has quit IRC | 04:40 | |
*** ssalagame has joined #openstack-astara | 04:59 | |
*** shashank_hegde has quit IRC | 05:00 | |
*** shashank_hegde has joined #openstack-astara | 05:01 | |
*** shashank_hegde has quit IRC | 05:33 | |
*** ssalagame has quit IRC | 05:45 | |
*** shashank_hegde has joined #openstack-astara | 06:42 | |
*** ronis has joined #openstack-astara | 07:08 | |
*** shashank_hegde has quit IRC | 08:48 | |
*** phil_h has quit IRC | 11:19 | |
*** ronis has quit IRC | 12:50 | |
*** ronis has joined #openstack-astara | 12:50 | |
*** ssalagame has joined #openstack-astara | 13:43 | |
*** jamesdenton has joined #openstack-astara | 14:11 | |
*** ssalagame has quit IRC | 14:24 | |
*** ssalagame has joined #openstack-astara | 14:26 | |
*** ronis has quit IRC | 14:27 | |
*** jamesdenton has quit IRC | 14:30 | |
*** ssalagame has quit IRC | 15:06 | |
*** justinlund has joined #openstack-astara | 15:32 | |
*** ssalagame has joined #openstack-astara | 15:45 | |
*** ronis has joined #openstack-astara | 16:11 | |
*** justinlund has quit IRC | 16:21 | |
*** justinlund has joined #openstack-astara | 16:37 | |
*** ronis has quit IRC | 16:57 | |
*** shashank_hegde has joined #openstack-astara | 17:16 | |
*** shashank_hegde has quit IRC | 17:17 | |
*** justinlund has quit IRC | 17:19 | |
*** ssalagame has quit IRC | 17:23 | |
*** ssalagame has joined #openstack-astara | 17:23 | |
*** ssalagame has quit IRC | 17:23 | |
*** justinlund has joined #openstack-astara | 17:28 | |
*** shashank_hegde has joined #openstack-astara | 17:52 | |
*** ssalagame has joined #openstack-astara | 18:21 | |
*** ronis has joined #openstack-astara | 18:52 | |
*** mhayden has quit IRC | 19:00 | |
*** mhayden has joined #openstack-astara | 19:00 | |
*** phil_h has joined #openstack-astara | 19:16 | |
phil_h | Eric, are you there? | 19:30 |
elo | yes, just jumped on this computer | 19:32 |
elo | what's up? | 19:32 |
phil_h | Just wanted to check to see if you have given the ansible stuff much thought? | 19:33 |
elo | not off-hand | 19:34 |
phil_h | I am looking at what plumgrid did and seeing if I can mod my astara ansible stuff to fit | 19:34 |
elo | looked at it earlier... the way they disable stuff is what we need to borrow | 19:35 |
phil_h | I would like to get astara included in the OSA stuff | 19:35 |
elo | ok | 19:35 |
elo | brb. bio break | 19:36 |
phil_h | I need to figure out how to use OSA to install OpenStack so I can then start to merge the stuff | 19:36 |
phil_h | K | 19:36 |
*** ronis has quit IRC | 20:06 | |
elo | back | 20:08 |
phil_h | k | 20:17 |
phil_h | anyway I am trying to find some time to get started on the OSA stuff for Astara | 20:18 |
phil_h | Is anyone else interested? | 20:18 |
stupidnic | I am wondering if anybody is around to help me troubleshoot an issue I am having with a virtual network. | 21:45 |
stupidnic | I have a client (only one) that seems to be having some sort of issue with their instances. Basically the instances stop pinging all of a sudden. Very random and intermittent. | 21:47 |
stupidnic | I have a ping running externally to a floating IP as well as a ping from the router to the internal IP address. Randomly the internal ping stops responding. | 21:49 |
stupidnic | I have used tcpdump on the hypervisor to confirm that the packets are getting through to the compute node, it's just that the instance suddenly stops responding. | 21:49 |
stupidnic | In the tcpdump I can see the packets going unanswered, and then suddenly the router does an ARP request for the IP address and it magically starts working again | 21:50 |
stupidnic | This seems to be happening to all their instances. I started up another instance just for testing and it is also impacted by this. | 21:51 |
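A minimal sketch of the diagnostics stupidnic describes above, assuming a Linux bridge setup; the interface names and addresses are placeholders, not values from this log:

    # from outside the cloud: ping the floating IP
    ping 203.0.113.10
    # from inside the router VM: ping the instance's internal address
    ping 10.0.0.5
    # on the compute node: confirm packets reach the tenant bridge,
    # and watch for the ARP exchange that precedes recovery
    tcpdump -eni brq12345678 'icmp or arp'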
*** ssalagame has quit IRC | 22:10 | |
fzylogic_ | any chance that they have a second router instance running somewhere? We've seen that happen a few times when an instance partially fails, but astara isn't able to delete the service VM (usually due to a nova bug/outage) | 22:11 |
fzylogic_ | doesn't seem *quite* right if you're seeing traffic hitting the VM, but that's the most frequent problem we see that affects only a single tenant. | 22:13 |
*** ssalagame has joined #openstack-astara | 22:15 | |
stupidnic | fzylogic_: Hmmm... Perhaps. It seems to be really at the router level. So I went to each compute node and started a tcpdump on the bridge interface for the tenant | 22:22 |
stupidnic | What is odd... when the ping stops on the router, I can still see the packets being passed between the two compute nodes | 22:22 |
stupidnic | So the pings keep going, it's just the router isn't seeing the traffic I think | 22:23 |
fzylogic_ | could definitely be the case that there's a duplicate router somewhere taking over that gateway IP | 22:23 |
stupidnic | That makes sense. | 22:24 |
stupidnic | I have also seen some weird behavior like this a long long time ago where somebody assigned a broadcast IP as an IP on a server | 22:24 |
stupidnic | It had similar behavior | 22:25 |
stupidnic | Odd... | 22:25 |
stupidnic | I just typed arp on the router | 22:26 |
stupidnic | and it is very slow to respond | 22:26 |
fzylogic_ | dns not working? | 22:26 |
stupidnic | doesn't seem to be... I can't ping 8.8.8.8 | 22:27 |
stupidnic | Okay... So looking at the public interface on the router... I am seeing dropped packets there | 22:28 |
*** ssalagame has quit IRC | 22:33 | |
stupidnic | fzylogic_: is there any way to find out if there is another instance hanging around? | 22:37 |
*** ssalagame has joined #openstack-astara | 22:39 | |
fzylogic_ | we've got a script that compares the list of routers in neutron with a list of VMs in nova | 22:40 |
fzylogic_ | https://github.com/dreamhost/os-maintenance-tools/blob/master/bin/nova_audit_routers.py | 22:41 |
fzylogic_ | or if your cluster's small enough, you can probably just eyeball it | 22:41 |
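For a small cluster, the eyeball check can be a rough one-liner; this sketch assumes the Astara appliance VMs carry a recognizable name prefix such as ak-, which is an assumption here, not something confirmed in this log:

    # number of routers neutron knows about
    neutron router-list -f value -c id | wc -l
    # number of appliance VMs nova is running (ak- prefix is an assumption)
    nova list --all-tenants 1 | grep -c 'ak-'
    # if the VM count exceeds the router count, a stale service VM likely survives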
stupidnic | Okay. So another data point. | 22:41 |
stupidnic | I just went and checked another tenant's router | 22:42 |
stupidnic | and I can ping google, the gateway, etc | 22:42 |
stupidnic | But in this tenant's router I can't do that | 22:42 |
stupidnic | That doesn't make any sense to me | 22:43 |
fzylogic_ | if you leave a ping going for a while, do you see intermittent success at all? | 22:44 |
fzylogic_ | if so, that sounds like 2 routers fighting over an IP | 22:44 |
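One direct way to test the two-routers-fighting-over-an-IP theory is to probe ARP for the contested address and see whether more than one MAC answers; a sketch, with the interface and IP as placeholders:

    # probe for the gateway IP and note the MAC in each reply
    arping -I eth0 -c 5 192.168.0.1
    # or watch ARP traffic for that address directly
    tcpdump -eni eth0 'arp host 192.168.0.1'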
stupidnic | let me setup a ping to the router's external interface | 22:45 |
stupidnic | I will say that I already rebooted the router in the tenant earlier today (was one of the first things I tried) | 22:48 |
stupidnic | Okay... pinging the external IP of the router doesn't seem to show any issues with it | 22:54 |
stupidnic | pinging clean | 22:54 |
fzylogic_ | do you see the packets on the router VM that you're logged into? | 22:54 |
stupidnic | Cross checking | 22:55 |
stupidnic | Okay... the VM seems to be hung? | 22:55 |
stupidnic | wtf | 22:56 |
stupidnic | yep... the instance rebooted | 22:57 |
stupidnic | that's really weird | 22:57 |
stupidnic | Also... 8 routers, 8 instances | 22:58 |
stupidnic | so no duplicates as near as I can tell | 22:58 |
stupidnic | Can't ping the gateway though | 23:00 |
stupidnic | from the new router that just started up | 23:00 |
stupidnic | So to cross check this... I have another router running on the same compute node, and everything works fine on that router | 23:03 |
stupidnic | I can ping out | 23:04 |
stupidnic | Okay. I confirmed that the two routers can ping each other | 23:06 |
stupidnic | fzylogic_: okay. So looking at the router VM external interface... I am not seeing the ICMP packets | 23:12 |
stupidnic | So something else has that IP address I am guessing | 23:13 |
stupidnic | Alright... getting somewhere now | 23:13 |
stupidnic | Checking the core for the IP address assigned to the router VM (162.220.49.14) I show a different MAC address than the one assigned to the router's public interface | 23:14 |
stupidnic | Does neutron have a way of searching by mac address? | 23:15 |
fzylogic_ | none that I can think of | 23:18 |
fzylogic_ | at least not directly | 23:18 |
fzylogic_ | but if you do a port-list, I think it should show the mac addresses on each one | 23:19 |
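A sketch of that port-list check, with the mystery MAC as a placeholder:

    # list ports with their MACs and search for the unexpected one
    neutron port-list -c id -c mac_address -c fixed_ips | grep -i 'fa:16:3e:xx:xx:xx'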
stupidnic | Okay. That's the right track. Now to see if I can find that in the haystack | 23:20 |
stupidnic | awesome... that mac address doesn't exist in the port list | 23:22 |
stupidnic | Okay. So I finally found the mac address. | 23:26 |
stupidnic | it belongs to a tap interface on one of the hypervisors | 23:27 |
fzylogic_ | grep for it in /etc/libvirt/qemu/* on each one | 23:27 |
fzylogic_ | sounds like you may have an instance that libvirt hung onto after nova tried deleting it | 23:28 |
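Concretely, that search might look like this; the MAC and domain file name are placeholders:

    # find the libvirt domain definition carrying the stray MAC
    grep -il 'fa:16:3e:xx:xx:xx' /etc/libvirt/qemu/*.xml
    # pull the nova UUID out of the matching definition
    grep -i '<uuid>' /etc/libvirt/qemu/instance-0000abcd.xml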
stupidnic | Yep. | 23:29 |
fzylogic_ | I've seen hypervisor crashes do this if nova's not configured just right | 23:29 |
stupidnic | So there is an instance xml file that has that mac address | 23:30 |
stupidnic | so... how do I get rid of this? | 23:30 |
fzylogic_ | the same xml file will have the UUID of the instance so you can ask nova if it's supposed to exist or not | 23:30 |
fzylogic_ | if not, you can just do `virsh destroy instance-whatever` | 23:30 |
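Put together, the check and cleanup look roughly like this; the UUID and domain name are placeholders, and the undefine step (removing the leftover definition as well as stopping the domain) is an addition beyond what fzylogic_ spelled out:

    # ask nova whether it still tracks the instance from the XML's <uuid>
    nova show 6f3b1c2e-aaaa-bbbb-cccc-000000000000
    # if nova returns no such instance, stop and remove the stray domain
    virsh destroy instance-0000abcd
    virsh undefine instance-0000abcd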
stupidnic | just confirming here... that the main <uuid> is the one I should be looking for, right? | 23:32 |
fzylogic_ | yep | 23:32 |
stupidnic | Yeah looks like Nova doesn't know anything about that. I went to the Hypervisor list... and I don't see that instance id listed for the compute node at all | 23:33 |
stupidnic | This was probably related to the hostname change we made to the controller a while back | 23:34 |
stupidnic | That wrecked some stuff | 23:34 |
fzylogic_ | could definitely be | 23:34 |
stupidnic | need a blink tag on that one... | 23:34 |
fzylogic_ | might be interesting to look through old nova logs, but for now you can pretty safely just delete that VM with virsh | 23:34 |
stupidnic | changing the hostname on the controller is bad... very bad | 23:34 |
fzylogic_ | also, check what you have configured for the running_deleted_instance_action, running_deleted_instance_timeout, and running_deleted_instance_poll_interval options in nova.conf | 23:36 |
fzylogic_ | you want running_deleted_instance_action=reap | 23:36 |
stupidnic | Good to know. | 23:36 |
stupidnic | Thank you | 23:36 |
fzylogic_ | and whatever values for the other 2 you consider sane for your environment | 23:36 |
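As configuration, that advice lands in nova.conf on each hypervisor; the two timer values below are illustrative, not recommendations from this conversation:

    [DEFAULT]
    # reclaim instances that are still running after nova deleted them
    running_deleted_instance_action = reap
    # how long a deleted-but-running instance may linger, and how often to check
    running_deleted_instance_timeout = 1800
    running_deleted_instance_poll_interval = 1800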
fzylogic_ | sure thing | 23:37 |
stupidnic | Hot damn | 23:37 |
stupidnic | that's got it | 23:37 |
stupidnic | man... that was a lot harder than it should have been | 23:38 |
stupidnic | fzylogic_: is that option in the compute nova.conf or at the server? | 23:39 |
fzylogic_ | nova.conf on your hypervisors | 23:39 |
stupidnic | roger | 23:39 |
fzylogic_ | it's used by nova-compute | 23:39 |