Friday, 2016-01-15

openstackgerrit	Merged openstack/astara: devstack doesn't check ASTARA_APPLIANCE_SSH_PUBLIC_KEY existence https://review.openstack.org/264589	00:21
adam_g	when it rains it pours: https://bugs.launchpad.net/diskimage-builder/+bug/1534387	00:27
openstack	Launchpad bug 1534387 in diskimage-builder "debian DIB_EXTLINUX=1 builds fail: Missing package name for distro/element: debian/base" [Undecided,New]	00:27
openstackgerrit	Adam Gandelman proposed openstack/astara: Add astara-ctl + API functional tests https://review.openstack.org/243874	00:36
openstackgerrit	Adam Gandelman proposed openstack/astara: fail func tests early (do not merge) https://review.openstack.org/266666	00:36
openstackgerrit	Merged openstack/astara: Drop unused call to non-existent function https://review.openstack.org/267127	00:41
markmcclain	adam_g: fun times	00:47
*** stanchan has quit IRC		01:01
*** yanghy has joined #openstack-astara		01:58
*** outofmemory is now known as reedip		02:05
openstackgerrit	Yang Hongyang proposed openstack/astara: Refactor ensure_cache for loadbalancer driver https://review.openstack.org/266741	02:44
openstackgerrit	Yang Hongyang proposed openstack/astara: Refactor ensure_cache for instance manager https://review.openstack.org/266740	02:44
openstackgerrit	Yang Hongyang proposed openstack/astara: Refactor ensure_cache for router driver https://review.openstack.org/266742	02:44
openstackgerrit	Yang Li proposed openstack/astara: Add an option to get max sleep time from the config file https://review.openstack.org/266213	02:55
openstackgerrit	Yang Hongyang proposed openstack/astara: Remove unnecessary nosetest param https://review.openstack.org/267904	03:12
openstackgerrit	Yang Hongyang proposed openstack/astara: Remove unused openstack common conf https://review.openstack.org/267920	03:34
openstackgerrit	xiayu proposed openstack/astara: Automatically generate etc/orchestrator.ini file https://review.openstack.org/265576	03:40
openstackgerrit	Yang Li proposed openstack/astara: Add an option to get max sleep time from the config file https://review.openstack.org/266213	04:42
*** leonstack has quit IRC		04:54
*** leonstack has joined #openstack-astara		04:56
*** leonstack has quit IRC		05:19
*** leonstack has joined #openstack-astara		05:20
*** leonstack1 has joined #openstack-astara		06:37
*** leonstack has quit IRC		06:40
*** leonstack has joined #openstack-astara		06:46
*** leonstack1 has quit IRC		06:47
*** leonstack has quit IRC		06:49
*** leonstack has joined #openstack-astara		07:48
*** ronis has joined #openstack-astara		08:30
*** yanghy_ has joined #openstack-astara		08:32
*** yanghy has quit IRC		08:35
*** reedip is now known as outofmemory		08:49
*** leonstack1 has joined #openstack-astara		08:55
*** leonstack has quit IRC		08:58
*** outofmemory has quit IRC		09:07
*** ronis has quit IRC		09:31
*** leonstack has joined #openstack-astara		09:43
*** leonstack1 has quit IRC		09:45
*** xiayu has quit IRC		09:47
*** xiayu has joined #openstack-astara		09:48
*** xiayu has quit IRC		10:00
openstackgerrit	Yang Li proposed openstack/astara: Fix a bug for default provider https://review.openstack.org/268085	11:28
*** leonstack1 has joined #openstack-astara		12:18
*** leonstack has quit IRC		12:21
*** leonstack has joined #openstack-astara		12:31
*** leonstack1 has quit IRC		12:32
*** openstackgerrit has quit IRC		12:50
*** openstackgerrit has joined #openstack-astara		12:51
*** yanghy_ has quit IRC		14:38
ryanpetrello	adam_g you around?	15:54
ryanpetrello	or markmcclain	15:54
ryanpetrello	I was talking to rods about this orphaned vrrp port issue	15:54
markmcclain	ryanpetrello, rods: yes	15:55
ryanpetrello	we had discussed the idea of having a "Cleanup" state	15:55
ryanpetrello	my concern is that it complicates the state machine more	15:55
ryanpetrello	and also if you run into a situation where port deletes aren't working, it clogs up the state machine workers	15:55
ryanpetrello	what about some way to mark ports as orphaned after detachment?	15:55
markmcclain	yeah... am concerned about that too	15:55
ryanpetrello	and a thread in the rug that does nothing but deletes them up in the background?	15:55
ryanpetrello	maybe even just changing the name to some identifier that flags them for cleanup via the rug	15:56
markmcclain	yeah.. we could have a reaper thread	15:56
markmcclain	the issue is HA rug	15:56
markmcclain	who runs the reaper?	15:56
markmcclain	the alternate is that we set a dirty bit	15:57
ryanpetrello	is it a huge issue if they're both issuing deletes?	15:57
markmcclain	only if deletes are failing for strange reasons	15:57
ryanpetrello	yea	15:57
markmcclain	because you'll have N processes attempting	15:57
ryanpetrello	right	15:57
ryanpetrello	how does the HA sharding currently work?	15:58
markmcclain	it's based on the hash-ring	15:58
markmcclain	of resource id	15:58
markmcclain	we could alternately set a dirty bit	15:58
ryanpetrello	where?	15:58
markmcclain	and if it's set the exit of is_alive	15:58
markmcclain	makes 1 attempt to delete	15:59
markmcclain	and then continues on with the state machine	15:59
markmcclain	so blocking wouldn't occur and we'd get periodic cleanups	15:59
ryanpetrello	maybe	15:59
rods	markmcclain one of the issue that I'm seeing when deleting the port in the REPLUG is that there are cases where if an exception is raised the rug moves the router from REPLUG to CONFIG to CALCACTION so just cleanup in the replug is not enough	15:59
markmcclain	right	15:59
ryanpetrello	right	15:59
ryanpetrello	as the state machine evolves	15:59
ryanpetrello	we're going to keep running into this	15:59
ryanpetrello	which is why I think something that reaps in the background makes sense	16:00
markmcclain	I'm thinking we just flag that the state is dirty	16:00
markmcclain	and there's some kind of janitor state	16:00
ryanpetrello	another consideration	16:00
ryanpetrello	this approach only handles ports detached by the rug	16:00
markmcclain	well the configure step will notice that there's a mismatch	16:01
ryanpetrello	true	16:01
markmcclain	and attempt to clean up	16:01
ryanpetrello	and replug	16:01
markmcclain	basically replug is supposed to be the state where ports are reconciled	16:02
ryanpetrello	right	16:02
ryanpetrello	I'm not against the idea of this happening in is_alive	16:02
ryanpetrello	my original inclination with the reaping was in tandem w/ the health check	16:02
ryanpetrello	there's obviously still room for orphaned ports	16:02
markmcclain	yeah.. that's what I'm leaning towards	16:03
ryanpetrello	e.g., stop the rug at the wrong time	16:03
markmcclain	is that we set a dirty bit	16:03
markmcclain	and the health check can try to make the instance healthy	16:03
ryanpetrello	do neutron ports have metadata?	16:03
ryanpetrello	when you say dirty bit, do you mean something at runtime in the state machine?	16:04
ryanpetrello	or something stored at the DB level	16:04
ryanpetrello	whatever approach we take here	16:04
ryanpetrello	I'd like it to be something the rug can recover if it's restarted	16:04
ryanpetrello	e.g., if a replug happens and then we immediately restart the rug	16:04
ryanpetrello	when the new rug process comes back up, it should notice the orphaned port and delete it	16:04
ryanpetrello	(imo)	16:04
rods	yeah, should be something in the db	16:04
ryanpetrello	otherwise we're just slowly going to leak ports	16:04
ryanpetrello	maybe we rename ports with some signification of orphaned + hash-ring resource ID	16:05
ryanpetrello	so each rug only handles the orphaned ports it should care about	16:05
rods	is that going to cause issues on rebalancing?	16:07
rods	brb	16:09
ryanpetrello	interested in adam_g's perspective when he's in	16:14
markmcclain	sadly the ports don't have metadata	16:26
markmcclain	so we'll have to see a dirty bit in the workers machine state	16:26
markmcclain	the nice thing is that even if control is handed off the dirty bit would be set again becuase the ports would not match	16:26
*** ryanpetrello is now known as ryanpetrello1		16:47
*** ryanpetrello1 is now known as ryanpetrello		16:48
openstackgerrit	mark mcclain proposed openstack/astara-neutron: Add reno for release notes management https://review.openstack.org/267812	17:13
* adam_g reads backscroll		17:46
adam_g	marking stuff dirty for later cleanup seems like we'd still be in the situation of things being eventually consistent and tenants not being able to delete their things (at least for some period of time)	17:50
*** cleverdevil has quit IRC		17:55
*** cleverdevil has joined #openstack-astara		17:56
*** smartshader has quit IRC		18:17
rods	I think there are no alternatives to the eventually consistent behaviour right now, we may want to focus on not leaving stray ports	18:34
markmcclain	adam_g: https://review.openstack.org/#/c/267962/1	18:38
openstackgerrit	mark mcclain proposed openstack/astara-neutron: remove dead floating IP code https://review.openstack.org/268286	18:48
*** leonstack1 has joined #openstack-astara		18:56
*** leonstack has quit IRC		18:57
openstackgerrit	mark mcclain proposed openstack/astara-neutron: remove dead floating IP code https://review.openstack.org/268286	18:57
openstackgerrit	mark mcclain proposed openstack/astara-neutron: remove dead floating IP code https://review.openstack.org/268286	18:59
openstackgerrit	Merged openstack/astara-neutron: Add reno for release notes management https://review.openstack.org/267812	19:01
fzylogic	markmcclain: that patch appears to be working as expected	19:01
fzylogic	thanks!	19:01
markmcclain	awesome	19:01
adam_g	markmcclain, so looking at the gate failures	19:27
adam_g	i think that test is just bad	19:28
adam_g	and we'd be better off dropping it once the newer tests land	19:28
adam_g	which create tenant/network/router per test instead of relying on the devstack created one	19:29
markmcclain	yeah.. thanks makes sense	19:29
openstackgerrit	mark mcclain proposed openstack/astara-neutron: allow DHCP from router interfaces https://review.openstack.org/266586	19:32
markmcclain	adam_g: fixed rebase conflict if you want to re-add +A	19:33
openstackgerrit	Adam Gandelman proposed openstack/astara: Add astara-ctl + API functional tests https://review.openstack.org/243874	19:33
markmcclain	adam_g: thanks	19:36
openstackgerrit	Adam Gandelman proposed openstack/astara: Add astara-ctl + API functional tests https://review.openstack.org/243874	20:00
openstackgerrit	Adam Gandelman proposed openstack/astara: Enrich functional test suite https://review.openstack.org/219952	20:00
openstackgerrit	Adam Gandelman proposed openstack/astara: fail func tests early (do not merge) https://review.openstack.org/266666	20:00
markmcclain	rods, ryanpetrello: https://review.openstack.org/#/c/246005/14	20:25
*** davidlenwell has quit IRC		20:38
*** davidlenwell has joined #openstack-astara		20:39
openstackgerrit	mark mcclain proposed openstack/astara: Move settings from plugin.sh to the settings file https://review.openstack.org/268327	20:50
adam_g	markmcclain, so im trying to make network cleanup more robust in this test suite and hitting network in use errors because of ports not being cleaned up in time	21:16
adam_g	markmcclain, wondering if overriding the ml2's delete_networks() to avoid raising the in-use exception if the ports its found as astara-internal ports, under the assumption that we will be cleaning those up later	21:17
adam_g	... if that coupled with some reaper thread in astara would help with the stray ports issue rods is having	21:18
adam_g	http://git.openstack.org/cgit/openstack/neutron/tree/neutron/db/db_base_plugin_v2.py#n58	21:21
markmcclain	adam_g: yeah... considered that too	21:29
markmcclain	adam_g: I've also wanted to retire the ml2 plugin wrapper too	21:29
adam_g	markmcclain, it seems thats exactly what happens for dhcp ports/etc	21:29
markmcclain	right it's a bit different because dhcp does not use nova	21:29
markmcclain	so not sure what's going to happen if we delete things out from underneath nova	21:29
adam_g	yea	21:32
openstackgerrit	Adam Gandelman proposed openstack/astara: Add astara-ctl + API functional tests https://review.openstack.org/243874	21:34
openstackgerrit	Adam Gandelman proposed openstack/astara: Enrich functional test suite https://review.openstack.org/219952	21:34
openstackgerrit	Adam Gandelman proposed openstack/astara: fail func tests early (do not merge) https://review.openstack.org/266666	21:34
stupidnic	Okay. I seem to be having a problem with the Astara instance starting up.	21:42
stupidnic	I blew away our entire install and redeployed. Most everything works. We can turn up instances, but I can't seem to boot an Astara instance.	21:43
openstackgerrit	Merged openstack/astara: Allow API listening address to be specified in config https://review.openstack.org/246005	21:43
markmcclain	stupidnic: seeing any errors in the logs?	21:44
stupidnic	markmcclain: the only thing I am seeing is a traceback on CheckBoot.execute()	21:44
stupidnic	Let me dig a bit deeper in the logs	21:44
* adam_g needs to run baby to the baby doc. ttyl		21:44
markmcclain	stupidnic: ok.. can you paste the traceback to paste.openstack.org?	21:46
stupidnic	Sure. Thinking on this a bit more... is it possible that image isn't in the correct project and as such Astara can't access it?	21:47
markmcclain	possibly	21:47
markmcclain	especially if the instance isn't showing up in Nova	21:47
stupidnic	Well it shows up, but it immediately errors	21:47
stupidnic	Let me look at Nova	21:48
stupidnic	RescheduledException: Build of instance dacef104-0eda-4b7a-95f4-f452574dadd9 was re-scheduled: 'dict' object has no attribute 'disk_format'\n"	21:49
stupidnic	bad image import?	21:49
stupidnic	odd... glance shows the image is disk_format raw which is what it should be	21:51
stupidnic	might be related to a bug in rbd... I seem to recall having this fixed before and we made a Salt rule to update the file with the patched version, but we pulled it exepcting the patch to have made it up stream	21:57
stupidnic	double checking	21:57
elo	is this related to this bug: https://bugs.launchpad.net/nova/+bug/1508230	21:57
openstack	Launchpad bug 1508230 in nova (Ubuntu Wily) "regression in cloning raw image type with ceph" [High,Fix committed] - Assigned to James Page (james-page)	21:57
stupidnic	elo: one and the same	21:58
stupidnic	We made the mistake of assuming that after 3 months this bug would have been patched in the debs that Ubuntu is publishing... sadly that is not the case	21:59
stupidnic	Yep... sigh	22:00
openstackgerrit	mark mcclain proposed openstack/astara: Move settings from plugin.sh to the settings file https://review.openstack.org/268327	22:01
stupidnic	Alright... have the instance booted... can't talk to it over ipv6 though... digging into that now	22:14
stupidnic	Okay. I see the traffic making it over the VXLAN tunnels we are using	22:17
stupidnic	from the controller to the compute node	22:18
elo	can't talk to the management IP address	22:18
stupidnic	anywhere else I should look?	22:18
stupidnic	Yeah. The instance is up and runninng (I can see the console) but I can't ping the management ip	22:18
elo	I assume using OVS on the nodes	22:19
stupidnic	negative	22:19
stupidnic	I hate it :)	22:19
stupidnic	It's not a bad technology... it just makes things way more complicated than they really need to be	22:19
elo	markmcclain: could this be the issue that you seen with linux bridge is not replicating packet from one bridge to another?	22:21
stupidnic	What is the details to connect to the Astara instance (I used the default one)	22:22
markmcclain	elo: possibly	22:22
markmcclain	if everything is on same host then it's something else	22:22
stupidnic	it's not	22:22
markmcclain	ok.. so it's a replication issue	22:23
stupidnic	we have separate controllers and compute nodes	22:23
markmcclain	since neighbor discoverty/arp has to work	22:23
stupidnic	I can see the packets on the vxlan interface on the compute node... so the packets are making it that far	22:23
elo	do you see any drops in the IPtables for packets	22:23
elo	on the VXlan bridge interface	22:24
markmcclain	hmmm.. if they're making to the compute node then things are good	22:24
stupidnic	17:26:46.517809 IP6 fdd6:a1fa:cfa8:748e:f816:3eff:fefc:fb88 > ff02::1:ff15:d0dc: ICMP6, neighbor solicitation, who has fdd6:a1fa:cfa8:748e:f816:3eff:fe15:d0dc, length 32	22:26
stupidnic	that's on the vxlan interface on the compute node	22:27
stupidnic	I see fb88 is the IPv6 on the controller	22:27
openstackgerrit	mark mcclain proposed openstack/astara: make the enabled_drivers configurable in devstack https://review.openstack.org/268353	22:28
stupidnic	I would like to login to the console on the Astara router to confirm it actually has the IPv6 address we are looking for	22:28
markmcclain	you have to build an appliance with the demo user	22:29
markmcclain	or mount the disk image and user that can login with username/passwd	22:29
markmcclain	s/demo/debug/	22:29
stupidnic	Okay.	22:29
markmcclain	has the L2 agent moved the vxlan device to proper bridge?	22:30
stupidnic	How would I confirm that?	22:30
markmcclain	brctl show	22:31
markmcclain	should list the interfaces added to the bridge	22:31
stupidnic	Okay I show that vxlan-10 and tapf92... are on the same bridge	22:32
stupidnic	so in theory I should also see the packets on that same interface if I tcpdump it	22:32
stupidnic	yep	22:32
elo	yes	22:32
stupidnic	I can the who has	22:33
stupidnic	I can see	22:33
stupidnic	So the packets are making it into the tap interface	22:33
stupidnic	on the compute node	22:33
stupidnic	I suspect something is up with the instance itself....	22:34
stupidnic	maybe dhcp	22:34
elo	config drive is used for router instance for network configuration	22:35
stupidnic	how do I enable the debug user in disk-image-create	22:37
elo	Ill send you instruction on how to backdoor a qcow image	22:37
markmcclain	ok.. I've got head out for a bit	22:37
stupidnic	I am pretty sure this is a DHCP issue	22:41
stupidnic	I just checked the network in Horizon and got an error about DHCP for the service network	22:42
elo	https://gist.github.com/eric-lopez/d0321112cc1678566c7e	22:50
stupidnic	elo: can you clarify something for me? We have dhcp-agent and metadata-agent disabled on the controller. Is that correct?	22:51
elo	yes	22:52
elo	these services are handled by the router instance	22:52
stupidnic	That's what I thought	22:52
elo	if the router is configured properly or connected to the tenant network, the instance in that network will not get DHCP or metadata info	22:53
stupidnic	heh, vi isn't in the image	22:56
elo	ok. will update gist. this was quickly written up	23:03
stupidnic	it's no problem... I just had a laugh	23:04
stupidnic	I echo'ed it	23:04
stupidnic	and we are in	23:09
stupidnic	and now I can't type any more	23:09
stupidnic	sigh	23:10
stupidnic	I should start drinking	23:10
stupidnic	man... that's weird	23:11
elo	TGIF...	23:12
elo	yes	23:12
stupidnic	I have another instance that has been running for hours... I can type on the console for that one	23:12
stupidnic	it's like the instance locks up	23:12
elo	is it on the same compute node?	23:13
stupidnic	No. Different compute node	23:13
stupidnic	but they are all identical	23:13
*** shashank_hegde has joined #openstack-astara		23:13
stupidnic	Soft rebooting isn't working either	23:13
stupidnic	something up there	23:14
stupidnic	Okay. I don't see any configuration for eth0 or eth1	23:17
stupidnic	the interfaces are there, but there is nothing in the interfaces file	23:17
stupidnic	Where is the astara appliance supposed to pull its configuration information from?	23:21
elo	confdrive	23:22
stupidnic	Okay. The lock issue is probably due to the rug rebooting the instance out from under me	23:27
stupidnic	it just did it to me again	23:27
stupidnic	Okay. So why isn't this instance getting its configuration	23:28
stupidnic	What services do I need to have running on the controller?	23:28
elo	that makes sense as it can't configure the instance as it isn't getting a mgmt IP address	23:29
stupidnic	Right. So where might we going wrong?	23:29
elo	nova-compute should of installed genisoimage packages that configdrive requires	23:31
fzylogic	assuming nova-compute is configured to use ISO confdrive images	23:32
fzylogic	might be vfat	23:32
fzylogic	both have their own optional requirements	23:32
elo	correct. I forgot about vfat	23:32
elo	check nova.conf on the format the config_drive_format is set	23:34
stupidnic	nope, not set on the compute nodes	23:34
stupidnic	negative on the controller as well	23:35
stupidnic	so if it is not set then it will default to iso	23:36
stupidnic	and I can confirm that genisoimage	23:36
stupidnic	is installed	23:36
elo	reference docs says default is set to iso9660	23:38
fzylogic	nova-compute.log should tell you for sure if it's being built when the instance spawns	23:42
stupidnic	the confdrive?	23:43
stupidnic	not seeing anything in the logs reference confdrive or geniso	23:43
fzylogic	yeah	23:44
fzylogic	when instances boot here, it logs 2 things	23:44
fzylogic	"instance: <uuid>] Using config drive"	23:44
fzylogic	"instance: <uuid>] Creating config drive at <path>"	23:44
stupidnic	Yeah I have that	23:44
fzylogic	you can either kill the rug or put the router into debug mode after it boots so you don't get the appliance pulled out from under you	23:45
fzylogic	that'll let you poke around a bit more thoroughly	23:45
stupidnic	Okay. So just drop orchestrator?	23:46
openstackgerrit	Adam Gandelman proposed openstack/astara: Add astara-ctl + API functional tests https://review.openstack.org/243874	23:46
openstackgerrit	Adam Gandelman proposed openstack/astara: fail func tests early (do not merge) https://review.openstack.org/266666	23:46
elo	astara-ctl router debug <router_id>	23:47
fzylogic	^^	23:47
stupidnic	would the router ID be the instance ID or?	23:48
fzylogic	no, the router ID as neutron knows it	23:48
stupidnic	Okay... got it	23:49
stupidnic	So... we know that nova-compute is building the confdrive	23:49
elo	so is it configuring the mgmt interface correctly?	23:52
stupidnic	No there are no interfaces on the instance other than lo	23:53
stupidnic	well there are interfaces but they are not configured	23:53
stupidnic	I just checked the path for the config drive as specified in the logs and that file does not exist	23:53
stupidnic	The path is there and contains console.log and libvirt.xml	23:54
adam_g	stupidnic, check /proc/partitions in the appliance, should have config drive in there as sr0	23:56
adam_g	on the compute node, virsh dumpxml for the instance, should be a <disk/> entry for disk.config	23:57
elo	should be a disk (qcow image) and disk.config (config drive) files in that directory	23:57
stupidnic	There isn't.	23:57
stupidnic	And the debug didn't work	23:57
stupidnic	the instance is still being yanked out from under me	23:58
stupidnic	Is it possible that there is an issue with Ceph here?	23:58
stupidnic	Looking in the libvirt.xml there is a reference to disk.conf but it is referencing an rbd	23:58
adam_g	stupidnic, is 'force_config_drive=True' set in nova.conf ?	23:58
stupidnic	no, I don't have any settings related to config_drive	23:59
stupidnic	<source protocol="rbd" name="volumes/01e62deb-2fd4-4b38-935b-4326a426ecb0_disk.config">	23:59
adam_g	im not sure what the status of config drive /w ceph is	23:59

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!