Tuesday, 2014-12-02

*** Haomeng|2 has joined #openstack-ironic		00:01
dlaube	jroll: I last cloned devstack right around the juno release	00:02
jroll	dlaube: ah, that console thing is new	00:02
*** Haomeng has quit IRC		00:02
*** smoriya has joined #openstack-ironic		00:03
*** russellb has joined #openstack-ironic		00:04
*** davideagnello has quit IRC		00:08
*** davideagnello has joined #openstack-ironic		00:10
dlaube	I do see console logs though	00:11
dlaube	going to try another nova boot and will share a paste of the output	00:11
*** ChuckC_ has joined #openstack-ironic		00:12
jroll	dlaube: right, there are console logs for the VM, but IPA logs won't be in there	00:13
NobodyCam	also dlaube have you seen : https://ask.openstack.org/en/question/50080/ironicdriversmodulesagent-node-command-status-errored-error-downloading-image/	00:14
*** Masahiro has joined #openstack-ironic		00:15
*** pensu has joined #openstack-ironic		00:18
*** Masahiro has quit IRC		00:19
dlaube	crap, yeah.. baremetal console log just shows it sitting at a coreos login	00:20
*** ChuckC_ is now known as ChuckC		00:20
dlaube	presumably the deploy image	00:20
dlaube	before it goes to install my ubuntu	00:20
jroll	yeah, with newer devstack it will keep logging	00:24
jroll	here, lemme grab the patch	00:24
jroll	you can just patch locally	00:24
jroll	dlaube: https://review.openstack.org/#/c/136867/2/lib/ironic	00:25
*** ryanpetrello has quit IRC		00:26
jroll	you can actually just edit ironic.conf and restart conductor	00:26
*** ChuckC has quit IRC		00:29
dlaube	thanks jroll!	00:33
dlaube	I know how to restart ironic deploy via apt on our lab… but in devstack I normally ./unstack.sh and ./stack.sh	00:33
dlaube	is there an easier way to restart just ironic conductor in devstack?	00:34
*** anderbubble has joined #openstack-ironic		00:34
JayF	yeah, connect up to the screen for it (ir-cond should be the name)	00:35
JayF	^c and restart the process	00:35
dlaube	thanks JayF	00:35
JayF	I think I usually hit ctrl+c then hit up to get the command used to spawn it	00:36
NobodyCam	JayF: ++	00:36
openstackgerrit	Michael Davies proposed openstack/ironic-specs: Proposal to add logical names to Ironic nodes https://review.openstack.org/134439	00:39
*** pensu has quit IRC		00:40
*** penick has quit IRC		00:40
jroll	dlaube: :) np	00:40
*** lucas-dinner has quit IRC		00:41
*** penick has joined #openstack-ironic		00:42
*** penick has quit IRC		00:42
*** Marga_ has quit IRC		00:45
*** Marga_ has joined #openstack-ironic		00:46
dlaube	hmm	00:46
dlaube	can only find the sample conf	00:46
dlaube	heh	00:46
*** ryanpetrello has joined #openstack-ironic		00:46
dlaube	root@lab7:~/devstack# find /opt/stack/ -name ironic.conf*	00:46
dlaube	/opt/stack/ironic/etc/ironic/ironic.conf.sample	00:46
*** igordcard has quit IRC		00:47
*** Marga_ has quit IRC		00:52
*** Masahiro has joined #openstack-ironic		00:52
*** Marga_ has joined #openstack-ironic		00:52
*** ryanpetrello has quit IRC		00:53
*** ryanpetrello has joined #openstack-ironic		00:56
*** ryanpetrello has quit IRC		01:00
*** spandhe has quit IRC		01:10
*** anderbubble has quit IRC		01:28
*** Masahiro has quit IRC		01:30
*** Masahiro has joined #openstack-ironic		01:33
dlaube	hmm	01:33
*** r-daneel has quit IRC		01:50
*** Marga_ has quit IRC		01:58
zer0c00l	What are the drivers used by the devstack setup mentioned here? http://docs.openstack.org/developer/ironic/dev/dev-quickstart.html#deploying-ironic-with-devstack	01:58
zer0c00l	i see the power driver is ssh	01:58
zer0c00l	Does it use the pxe driver?	01:58
zer0c00l	If i have to make ironic use my custom driver, where should i add it?	01:58
zer0c00l	IRONIC_ENABLED_DRIVERS ?	01:58
zer0c00l	Also how do i see the existing enabled drivers?	01:59
zer0c00l	My openrc looks like this http://paste.fedoraproject.org/155684/85573141	01:59
zer0c00l	i can't see anything else other than the ssh driver mentioned in the ironic conductor log	02:00
jroll	dlaube: should be /etc/ironic.conf	02:00
jroll	zer0c00l: IRONIC_ENABLED_DRIVERS, yeah, and restack	02:01
jroll	(I think)	02:01
*** nosnos has joined #openstack-ironic		02:02
zer0c00l	jroll: enabled_drivers = fake,pxe_ssh,pxe_ipmitool	02:02
openstackgerrit	Tan Lin proposed openstack/ironic: Fixed typo in Drac management driver test https://review.openstack.org/138028	02:02
jroll	zer0c00l: yeah, it needs to be there	02:03
zer0c00l	It is mentioned in /etc/ironic/ironic.conf	02:03
jroll	IRONIC_ENABLED_DRIVERS in devstack makes it go there	02:03
zer0c00l	so if i add a new one there and restart the conductor the new driver should be loaded	02:03
zer0c00l	?	02:03
jroll	yes	02:03
jroll	oh	02:03
zer0c00l	sure. Thanks!	02:03
jroll	you need it in setup.cfg	02:03
zer0c00l	setup.cfg?	02:03
zer0c00l	of ironic?	02:03
jroll	and that requires setup.py install	02:03
jroll	yes	02:03
*** marcoemorais has quit IRC		02:03
Haomeng|2	zer0c00l: ironic.conf	02:03
zer0c00l	ok	02:04
Haomeng|2	zer0c00l: you can change it in ironic.conf, for example - enabled_drivers = fake,pxe_ssh,pxe_ipmitool	02:04
Haomeng|2	zer0c00l: and restart the ironic-conductor process	02:05
zer0c00l	Got it!	02:05
zer0c00l	Thanks	02:05
Haomeng|2	zer0c00l: welcome	02:05
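
A minimal sketch of one way to answer "how do I see the existing enabled drivers?" by reading the option straight out of the config file. It assumes `enabled_drivers` sits in the `[DEFAULT]` section, as in the line quoted above; `SAMPLE_CONF` and the helper name `enabled_drivers` are made up for illustration, and in a real setup you would read /etc/ironic/ironic.conf instead of the inline string.

```python
import configparser

# Inline stand-in for /etc/ironic/ironic.conf (assumption: the option
# lives in the [DEFAULT] section).
SAMPLE_CONF = """
[DEFAULT]
enabled_drivers = fake,pxe_ssh,pxe_ipmitool
"""


def enabled_drivers(conf_text):
    """Return the comma-separated enabled_drivers option as a list."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    raw = parser.get("DEFAULT", "enabled_drivers", fallback="")
    # Tolerate stray whitespace and empty items around the commas.
    return [name.strip() for name in raw.split(",") if name.strip()]


print(enabled_drivers(SAMPLE_CONF))
```

As jroll notes above, listing a custom driver here is not enough on its own: it also has to be registered in ironic's setup.cfg and installed before the conductor can load it.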
*** Dafna has quit IRC		02:16
*** takadayuiko has joined #openstack-ironic		02:16
openstackgerrit	Haomeng,Wang proposed openstack/ironic: boot_devices.PXE value should match with pyghmi define https://review.openstack.org/137745	02:20
*** Nisha has joined #openstack-ironic		02:25
zer0c00l	The "Boot" and the "Deploy" Stuff are not decoupled in the ironic pxe drivers?	02:30
zer0c00l	PXE is a boot method, iscsi is a deploy method	02:30
zer0c00l	The "PXEDeploy" class always deploys using iscsi method	02:30
zer0c00l	if i have to write my own deploy method, i need to either make a copy of PXEDeploy or decouple this thing	02:31
zer0c00l	?	02:31
*** ChuckC has joined #openstack-ironic		02:34
Haomeng|2	zer0c00l: yes, good catch	02:35
Haomeng|2	zer0c00l: we have a bp which tries to decouple deploy and boot	02:35
Haomeng|2	zer0c00l: let me find it	02:35
jroll	zer0c00l: yeah, that's something we want to do asap	02:36
jroll	:(	02:36
zer0c00l	i can take a look at it	02:37
zer0c00l	Is there a bug#?	02:37
jroll	https://review.openstack.org/#/q/status:open+branch:master+topic:bp/new-boot-interface,n,z	02:37
jroll	is where lucas has been working on it	02:38
jroll	I gotta run, later	02:38
zer0c00l	sure	02:38
zer0c00l	let me check	02:38
*** killer_prince is now known as lazy_prince		02:40
Haomeng|2	yes, it is	02:41
harlowja_	zer0c00l have u tried using libvirt vms to be the pxeboot targets, i'm not sure if thats what ironic does (vs use the fake stuff)	02:48
harlowja_	that might be a way to get everything all on your laptop	02:48
*** Masahiro has quit IRC		02:48
harlowja_	including vms that act as 'machines' that u can pxeboot	02:48
harlowja_	https://bugzilla.redhat.com/show_bug.cgi?id=815136 might be sorta neat too	02:49
harlowja_	libvirt + ipmi	02:49
*** ryanpetrello has joined #openstack-ironic		02:51
*** Masahiro has joined #openstack-ironic		02:53
*** dlaube has quit IRC		02:55
*** ramineni has joined #openstack-ironic		03:02
*** achanda has joined #openstack-ironic		03:03
*** Masahiro has quit IRC		03:11
*** nosnos has quit IRC		03:29
*** Masahiro has joined #openstack-ironic		03:35
*** Haomeng|2 has quit IRC		03:38
*** Masahiro has quit IRC		03:40
*** Masahiro has joined #openstack-ironic		03:43
*** Masahiro has quit IRC		03:45
*** pensu has joined #openstack-ironic		03:47
*** rloo_ has quit IRC		03:47
*** lazy_prince has quit IRC		03:52
*** Masahiro has joined #openstack-ironic		03:55
*** rushiagr_away is now known as rushiagr		03:58
*** Masahiro has quit IRC		04:07
*** naohirot has joined #openstack-ironic		04:07
naohirot	good afternoon ironic!	04:07
*** nosnos has joined #openstack-ironic		04:17
openstackgerrit	Merged openstack/ironic: Fixed typo in Drac management driver test https://review.openstack.org/138028	04:19
*** achanda has quit IRC		04:21
*** Masahiro has joined #openstack-ironic		04:28
*** ryanpetrello has quit IRC		04:28
*** Marga_ has joined #openstack-ironic		04:47
*** killer_prince has joined #openstack-ironic		04:48
*** killer_prince is now known as lazy_prince		04:48
*** Haomeng has joined #openstack-ironic		04:53
*** pensu has quit IRC		05:11
*** lazy_prince is now known as killer_prince		05:18
*** rameshg87 has joined #openstack-ironic		05:21
*** pcrews has quit IRC		05:30
*** achanda has joined #openstack-ironic		05:36
*** achanda has quit IRC		05:38
*** achanda has joined #openstack-ironic		05:39
*** pensu has joined #openstack-ironic		05:40
*** achanda has quit IRC		05:43
*** achanda has joined #openstack-ironic		05:44
*** Masahiro has quit IRC		05:46
openstackgerrit	Harshada Mangesh Kakad proposed openstack/ironic: Add documentation for SeaMicro driver https://review.openstack.org/136324	05:53
*** lintan has joined #openstack-ironic		06:02
*** Masahiro has joined #openstack-ironic		06:04
*** Marga_ has quit IRC		06:06
*** achanda has quit IRC		06:14
*** achanda has joined #openstack-ironic		06:14
*** achanda has quit IRC		06:19
*** mrda is now known as mrda-away		06:28
*** pradipta_away is now known as pradipta		06:33
openstackgerrit	sandhya proposed openstack/ironic-specs: Chassis Level Node Discovery https://review.openstack.org/134866	06:34
*** Marga_ has joined #openstack-ironic		06:41
*** harlowja_ is now known as harlowja_away		06:45
*** killer_prince has quit IRC		06:47
*** lazy_prince has joined #openstack-ironic		06:47
*** chenglch has joined #openstack-ironic		06:54
*** Masahiro has quit IRC		07:11
*** subscope has quit IRC		07:12
*** Masahiro has joined #openstack-ironic		07:16
*** k4n0 has joined #openstack-ironic		07:21
*** subscope has joined #openstack-ironic		07:27
lintan	lintan:	07:40
Haomeng	lintan: hi	07:45
Haomeng	lintan: understand you are trying to ping your self for testing:)	07:46
lintan	Haomeng: haha, you get me :)	07:47
Haomeng	lintan: :)	07:47
Haomeng	lintan: but if it is working, that should be irc client bug, because, that does not make sense:)	07:48
Haomeng	lintan: :)	07:48
lintan	Haomeng: :( yes, it doesn't work as you said. I just have a try.	07:49
Haomeng	lintan: :)	07:49
*** LuisArizmendi has joined #openstack-ironic		08:04
*** romcheg has joined #openstack-ironic		08:13
*** ndipanov_gone is now known as ndipanov		08:19
*** Masahiro has quit IRC		08:33
*** dlpartain has joined #openstack-ironic		08:36
takadayuiko	stackuser common-venv use-ephemeral deploy-ironic	08:38
takadayuiko	mistook :O	08:39
*** Masahiro has joined #openstack-ironic		08:41
*** jcoufal has joined #openstack-ironic		08:42
*** andreykurilin has joined #openstack-ironic		08:45
*** vinbs has joined #openstack-ironic		08:50
Nisha	hi dtantsur|afk	08:51
*** nosnos has quit IRC		09:00
openstackgerrit	Nisha Agarwal proposed openstack/ironic-specs: Discover node properties using new CLI node-discover-properties https://review.openstack.org/100951	09:14
*** pradipta is now known as pradipta_away		09:15
*** pradipta_away is now known as pradipta		09:17
*** pradipta is now known as pradipta_away		09:17
*** jistr has joined #openstack-ironic		09:18
*** chenglch|2 has joined #openstack-ironic		09:24
*** chenglch has quit IRC		09:27
*** dlpartain1 has joined #openstack-ironic		09:31
*** dlpartain has quit IRC		09:31
*** igordcard has joined #openstack-ironic		09:31
*** andreykurilin has quit IRC		09:32
*** viktors|afk has quit IRC		09:34
*** viktors has joined #openstack-ironic		09:34
*** athomas has joined #openstack-ironic		09:35
*** dlpartain1 has quit IRC		09:35
*** lucasagomes has joined #openstack-ironic		09:39
*** yuriyz has quit IRC		09:40
*** viktors has quit IRC		09:40
*** derekh has joined #openstack-ironic		09:40
*** dtantsur|afk is now known as dtantsur		09:41
dtantsur	Morning!	09:42
naohirot	dtantsur: good morning :)	09:42
*** foexle has joined #openstack-ironic		09:42
*** lsmola has quit IRC		09:43
openstackgerrit	Merged openstack/ironic: boot_devices.PXE value should match with pyghmi define https://review.openstack.org/137745	09:43
*** jcoufal_ has joined #openstack-ironic		09:44
*** jcoufal has quit IRC		09:47
*** chenglch|2 has quit IRC		09:52
Nisha	dtantsur, good morning	09:53
Nisha	i have updated the spec	09:54
Nisha	dtantsur, i have used the term introspect in the spec instead of discovery now	09:55
dtantsur	Nisha, good. The parent spec is not updated with it, but I believe we all agreed.	09:55
dtantsur	and good morning :)	09:55
Nisha	dtantsur, let me know your comments/suggestions on it	09:55
dtantsur	yeah sure, gimme a moment	09:55
Nisha	dtantsur, :)	09:55
Nisha	dtantsur, :)	09:56
dtantsur	Nisha, shouldn't CLI command be also called node-introspect-properties? (or even just node-introspect)	09:56
Nisha	i actually wanted to ask that before posting the spec... :)	09:57
Nisha	but then i thought i will do that once discussed	09:57
dtantsur	Nisha, also there are a few places where you still use DISCOVERING and DISCOVERYFAIL	09:57
Nisha	Ok. Let me see.	09:57
Nisha	I will repost the spec	09:57
*** lsmola has joined #openstack-ironic		09:58
dtantsur	Nisha, also introspection_timeout option should be mentioned in "deployer impact" section (with the default value)	09:58
Nisha	ok	09:59
dtantsur	lemme post the comments on the spec, IRC is a bad reference source :)	10:00
*** sambetts has joined #openstack-ironic		10:01
*** viktors has joined #openstack-ironic		10:01
*** Masahiro has quit IRC		10:03
dtantsur	done	10:04
*** luisjariz has joined #openstack-ironic		10:04
*** luisjariz has quit IRC		10:05
*** LuisArizmendi has quit IRC		10:07
dtantsur	lucasagomes, o/ may I use you for discoverd reviews today as well? :)	10:07
lucasagomes	dtantsur, heh sure	10:08
lucasagomes	morning all	10:08
dtantsur	lucasagomes, I have 4 more, the most important right now being https://review.openstack.org/#/c/137418/ and https://review.openstack.org/#/c/137361/	10:08
dtantsur	thanks in advance :)	10:08
takadayuiko	Hi, lucasagomes	10:08
* dtantsur had a hard rebase yesterday...		10:09
lucasagomes	takadayuiko, hello there :)	10:09
dtantsur	takadayuiko, o/	10:09
sambetts	dtantsur, lucasagomes: I had to head out yesterday evening, was a final decision made about the state machine?	10:09
takadayuiko	dtantsur, o/	10:09
dtantsur	sambetts, I guess it was "carry on with what we have now"	10:09
*** Masahiro has joined #openstack-ironic		10:09
lucasagomes	sambetts, yeah improve the one we have now	10:10
sambetts	so continue to implement the ideas proposed at the summit?	10:10
dtantsur	yep	10:10
lucasagomes	some of the stuff proposed has made it into the new model, like less multipaths, states are now classified as passive/active (kinda like the state action), some name changes	10:11
sambetts	cool cool :-)	10:11
openstackgerrit	Nisha Agarwal proposed openstack/ironic-specs: Discover node properties using new CLI node-introspect https://review.openstack.org/100951	10:11
sambetts	lucasagomes: ah ok, just refined a bit from the whiteboard scribble	10:11
lucasagomes	yup	10:11
Nisha	dtantsur, lucasagomes could i request your reviews on https://review.openstack.org/134022 and https://review.openstack.org/137024	10:12
*** yuriyz has joined #openstack-ironic		10:12
Nisha	posted long back and no reviews till now	10:13
dtantsur	yeah, review queue is huge for us, sorry :) will try to find time today	10:13
Nisha	dtantsur, thanks	10:13
dtantsur	lucasagomes, thanks, updated	10:33
*** lsmola has quit IRC		10:34
lucasagomes	Nisha, #137024 reviewed	10:35
*** naohirot has quit IRC		10:37
*** vdrok has joined #openstack-ironic		10:38
lucasagomes	dtantsur, what does /v1/continue do?	10:39
* lucasagomes brb 1 sec		10:39
dtantsur	lucasagomes, it's an endpoint receiving callback from the ramdisk	10:39
dtantsur	and I know that I suck at naming :)	10:39
lucasagomes	dtantsur, this is where all the pos_* plugins will run?	10:42
lucasagomes	I'm wondering if the http request won't timeout there making it sync	10:42
*** pelix has joined #openstack-ironic		10:42
dtantsur	lucasagomes, yep. it corresponds to process() function	10:42
*** Masahiro has quit IRC		10:42
dtantsur	well, it depends on the timeout :)	10:42
dtantsur	even on the timeout of CURL, right?	10:43
dtantsur	(talking about the bash ramdisk)	10:43
lucasagomes	yeah, usually tools have their own timeout	10:44
lucasagomes	you can modify it with some -- options	10:44
lucasagomes	dtantsur, but does the ramdisk need to wait for the /continue to finish ?	10:44
dtantsur	lucasagomes, actually it took 1-2 seconds last time I checked :)	10:44
lucasagomes	I thought it would post the data and poweroff and the service would then process the data and do what it needs to do	10:44
dtantsur	lucasagomes, it does, if we want to implement IPMI credentials setting	10:44
lucasagomes	oh	10:45
dtantsur	lucasagomes, also: it's nice to leave the ramdisk in the troubleshoot mode if we sent some crap and discoverd returned an error	10:45
lucasagomes	dtantsur, yeah I'm just wondering when more plugins come in	10:45
lucasagomes	we can kinda lose the control of the time	10:45
lucasagomes	and being async sounds more flexible	10:45
lucasagomes	unless we have a notification to tell the ramdisk to continue that's hard	10:46
lucasagomes	but I think that for v1/ it may be fine to leave it sync	10:46
dtantsur	well... if plugins don't need to return the result to the ramdisk, they could use greenthread.spawn() and become async	10:46
mrda-away	hey jroll, are you happy with my response to your query on the logical name spec?	10:46
dtantsur	if they do need to return the result, it can't be helped	10:46
lucasagomes	dtantsur, right	10:47
lucasagomes	ok grand then :)	10:47
dtantsur	cool :)	10:47
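
A rough sketch of the sync/async split discussed above: plugins whose result must go back to the ramdisk run inline, while the rest are handed off so the HTTP response is not held up. It uses the standard library's `ThreadPoolExecutor` as a stand-in for eventlet green threads, and every name here (`handle_continue`, `validate`, `slow_background_plugin`) is hypothetical, not discoverd's actual code.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Stand-in for green threads; a real service would likely use eventlet.
_pool = ThreadPoolExecutor(max_workers=4)


def validate(data):
    """Sync plugin: its result is returned to the ramdisk."""
    if "macs" not in data:
        raise ValueError("ramdisk sent no MAC addresses")
    return {"valid": True}


def slow_background_plugin(data):
    """Async plugin: the ramdisk does not need to wait for this."""
    time.sleep(0.05)  # pretend to do slow post-processing
    return "stored %d macs" % len(data["macs"])


def handle_continue(data, sync_plugins, background_plugins):
    """Run sync plugins inline and schedule the rest in the background."""
    response = {}
    for plugin in sync_plugins:
        response.update(plugin(data))
    futures = [_pool.submit(plugin, data) for plugin in background_plugins]
    return response, futures
```

The caller can return `response` to the ramdisk right away. If a sync plugin raises, the error reaches the ramdisk, which matches the "leave the ramdisk in troubleshoot mode" case mentioned above.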
* dtantsur yoga time, brb		10:47
*** lsmola has joined #openstack-ironic		10:50
*** ramineni has quit IRC		11:06
openstackgerrit	Lucas Alvares Gomes proposed openstack/ironic: Extend API multivalue fields https://review.openstack.org/137762	11:07
openstackgerrit	Lucas Alvares Gomes proposed openstack/ironic: Extend API multivalue fields https://review.openstack.org/137762	11:08
*** Nisha has quit IRC		11:14
*** bradjones has quit IRC		11:14
*** Nisha has joined #openstack-ironic		11:15
*** rameshg87 has quit IRC		11:15
*** bradjones has joined #openstack-ironic		11:19
*** bradjones has quit IRC		11:19
*** bradjones has joined #openstack-ironic		11:19
*** alexpilotti_ has joined #openstack-ironic		11:19
*** alexpilotti has quit IRC		11:20
*** alexpilotti_ has quit IRC		11:23
*** vinbs has quit IRC		11:38
*** smoriya has quit IRC		11:41
*** jistr is now known as jistr|training		11:42
*** Masahiro has joined #openstack-ironic		11:43
*** Masahiro has quit IRC		11:48
*** takadayuiko has quit IRC		11:52
*** romcheg has quit IRC		11:54
*** romcheg has joined #openstack-ironic		11:54
*** naohirot has joined #openstack-ironic		11:54
openstackgerrit	Nisha Agarwal proposed openstack/ironic-specs: Discover node properties using new CLI node-introspect https://review.openstack.org/100951	11:56
*** Haomeng|2 has joined #openstack-ironic		11:58
*** Haomeng has quit IRC		11:59
openstackgerrit	Nisha Agarwal proposed openstack/ironic-specs: Discover node properties for iLO drivers https://review.openstack.org/103007	12:13
*** pensu has quit IRC		12:14
openstackgerrit	Nisha Agarwal proposed openstack/ironic-specs: uefi support for agent-ilo driver https://review.openstack.org/137024	12:25
*** k4n0 has quit IRC		12:28
*** lucasagomes is now known as lucas-hungry		12:32
*** lazy_prince is now known as killer_prince		12:43
*** ryanpetrello has joined #openstack-ironic		12:43
*** Masahiro has joined #openstack-ironic		12:52
*** Masahiro has quit IRC		12:57
*** erwan_taf has joined #openstack-ironic		13:11
*** dprince has joined #openstack-ironic		13:15
*** lucas-hungry is now known as lucasagomes		13:23
*** igordcard has quit IRC		13:28
*** killer_prince is now known as lazy_prince		13:29
*** igordcard has joined #openstack-ironic		13:31
openstackgerrit	Oleksii Chuprykov proposed openstack/ironic-python-agent: Use oslo.utils and oslo.concurrency https://review.openstack.org/138116	13:33
*** ryanpetrello has quit IRC		13:38
*** ryanpetrello has joined #openstack-ironic		13:39
*** Marga_ has quit IRC		13:47
*** rloo has joined #openstack-ironic		13:54
*** rushiagr is now known as rushiagr_away		13:55
*** igordcard has quit IRC		13:56
*** jjohnson2 has joined #openstack-ironic		13:57
*** Nisha has quit IRC		13:59
*** ryanpetrello_ has joined #openstack-ironic		14:06
*** ryanpetrello has quit IRC		14:08
*** ryanpetrello_ is now known as ryanpetrello		14:08
*** linggao has joined #openstack-ironic		14:13
*** ndipanov has quit IRC		14:16
*** Marga_ has joined #openstack-ironic		14:18
*** ryanpetrello has quit IRC		14:18
*** ryanpetrello has joined #openstack-ironic		14:19
*** Marga_ has quit IRC		14:23
ChuckC	morning ironic	14:24
openstackgerrit	Merged openstack/ironic-specs: Proposal to add logical names to Ironic nodes https://review.openstack.org/134439	14:35
jroll	morning everybody	14:35
Shrews	morning jroll	14:35
Shrews	and ChuckC	14:36
jroll	mrda-away: landed your spec, I think you can provide a node uuid in the body and I forgot about that	14:36
jroll	heya Shrews, ChuckC :)	14:36
rloo	morning ChuckC, jroll, Shrews	14:36
*** dlaube has joined #openstack-ironic		14:36
Shrews	o/ rloo	14:37
jroll	\o rloo	14:37
jroll	(jinx)	14:37
Shrews	get outta my head jroll	14:37
jroll	:D	14:38
openstackgerrit	Harshada Mangesh Kakad proposed openstack/ironic: Add documentation for SeaMicro driver https://review.openstack.org/136324	14:38
rloo	'great minds think alike' ?	14:38
jroll	great is an interesting word for my mind :P	14:40
Shrews	great is an incorrect word for my mind :P	14:41
*** Masahiro has joined #openstack-ironic		14:41
rloo	fools seldom differ? ;)	14:43
jroll	lolol	14:44
dtantsur	Morning ChuckC, jroll, Shrews, rloo!	14:44
jroll	hey dtantsur :)	14:44
rloo	afternoon dtantsur	14:44
Shrews	rloo: that's more appropriate :)	14:44
Shrews	hey dtantsur	14:44
lucasagomes	jrist, Shrews rloo ChuckC morning	14:46
*** Masahiro has quit IRC		14:46
lucasagomes	jroll, :)	14:46
rloo	afternoon lucasagomes	14:46
lucasagomes	jr<tab> is dangerous	14:46
jroll	lol	14:46
*** rushiagr_away is now known as rushiagr		14:49
*** lazy_prince is now known as killer_prince		14:50
*** naohirot has quit IRC		14:53
NobodyCam	good morning Ironic-ers	14:55
dtantsur	NobodyCam, o/	14:55
jroll	hiya NobodyCam :)	14:56
lucasagomes	NobodyCam, morning	14:56
NobodyCam	morning dtantsur jroll lucasagomes :)	14:56
* jroll tosses a pot of coffee to NobodyCam		14:56
jroll	hmm, need to find a nova core	14:56
NobodyCam	oh thank you jroll :) needed :)	14:56
NobodyCam	nova core?	14:57
jroll	yeah for https://review.openstack.org/#/c/98930/	14:58
jroll	configdrive	14:58
dlaube	g'morning	14:59
*** r-daneel has joined #openstack-ironic		15:00
jroll	hiya dlaube :)	15:00
jroll	NobodyCam: found one \o/	15:01
jroll	darn, now I have to actually write code	15:01
NobodyCam	oh Nice spec	15:01
NobodyCam	lol	15:01
NobodyCam	morning dlaube	15:01
*** rushiagr is now known as rushiagr_away		15:04
lucasagomes	jroll, o/	15:11
rloo	jroll: I opened a bug about setting maintenance mode off via node-update, needing to clear maint reason	15:12
rloo	jroll: https://bugs.launchpad.net/ironic/+bug/1398191	15:12
lucasagomes	jroll, any news on the rebuild vs configdrive thing?	15:12
rloo	jroll: so we don't forget ;)	15:12
NobodyCam	morning rloo :)	15:13
rloo	morning to the man who hopefully has had coffee	15:13
*** ndipanov has joined #openstack-ironic		15:13
NobodyCam	:) yep :) working on first cup now	15:14
jroll	lucasagomes: we haven't talked about it more, yesterday was pretty busy	15:16
lucasagomes	:) yeah I hear ya	15:17
jroll	rloo: cool, thanks	15:18
*** Marga_ has joined #openstack-ironic		15:19
*** Marga_ has quit IRC		15:23
*** lynxman has quit IRC		15:25
*** lynxman has joined #openstack-ironic		15:26
NobodyCam	lucasagomes: your comment on https://review.openstack.org/#/c/132137 is that because you foresee anyone wanting to set the uuid? or is there another reason I'm not thinking about?	15:27
NobodyCam	s/foresee/don't foresee	15:27
lucasagomes	NobodyCam, yeah, I mean I can't think about any use case where someone wants to create a node in Ironic and input a UUID by hand	15:28
lucasagomes	I understand it's supported in the API so makes sense to be able to do in the client	15:28
jroll	I would probably use it if my cmdb used UUIDs	15:28
lucasagomes	but a UUID seems like something that should always be generated (to guarantee uniqueness)	15:28
dtantsur	lucasagomes, discoverd may be such a thing, if folks do force me to handle creation of Ironic nodes (which I try to avoid)	15:28
jroll	just for easy linkage	15:28
lucasagomes	right	15:29
NobodyCam	yea, I was thinking about folks who have a existing cmdb and just wanted to keep the same id's	15:29
dtantsur	yeah and CMDB use case too	15:29
lucasagomes	yeah, ok :)	15:29
lucasagomes	I was fine with the change	15:29
jroll	lucasagomes: the uuid library doesn't ensure uniqueness by remembering uuids or anything, there can actually be collisions I would guess	15:29
lucasagomes	my -1 is because of the lack of tests	15:29
NobodyCam	still needs tests	15:29
NobodyCam	yea just wanted to make sure	15:29
jroll	we have the unique constraint for ensuring uniqueness :P	15:29
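
A small sketch of jroll's point: generating a UUID does not guarantee uniqueness by itself, so the database's unique constraint is what rejects a duplicate, whether the UUID was generated or supplied by the caller (e.g. copied from a CMDB). The table layout and the helpers `make_db` and `create_node` are invented for illustration and are not Ironic's schema or API.

```python
import sqlite3
import uuid


def make_db():
    """Create an in-memory table with a UNIQUE constraint on uuid."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nodes (id INTEGER PRIMARY KEY, uuid TEXT NOT NULL UNIQUE)"
    )
    return conn


def create_node(conn, node_uuid=None):
    """Insert a node, using the caller's UUID if given, else a fresh one."""
    if node_uuid is not None:
        # uuid.UUID() validates the string and normalizes its format.
        node_uuid = str(uuid.UUID(node_uuid))
    else:
        node_uuid = str(uuid.uuid4())
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO nodes (uuid) VALUES (?)", (node_uuid,))
    return node_uuid
```

Calling `create_node` twice with the same UUID raises `sqlite3.IntegrityError` on the second call, which is the behaviour the unique constraint is there to provide.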
lucasagomes	yeah, I haven't thought about the CMDB thing	15:30
lucasagomes	I can see it, but still, I find it odd to input a UUID by hand	15:30
NobodyCam	anyone seen arun on line?	15:30
lucasagomes	in the conference we talked about having alias and all for nodes	15:30
lucasagomes	I like that	15:31
jroll	yeah	15:31
jroll	I also find it odd, but could be useful	15:31
jroll	also, I just landed mrda's spec for the name thing	15:31
lucasagomes	w00t	15:31
NobodyCam	I hope folks doing that are adding nodes via script and not hand :-p	15:31
NobodyCam	nice	15:31
jroll	:P	15:32
lucasagomes	jroll, how does IPA pick the disk to use for the deployment?	15:33
lucasagomes	u guys have some mechanism there? or pick the first one?	15:33
jroll	it's pluggable	15:34
jroll	but it chooses the smallest disk above 4GB	15:34
jroll	https://github.com/openstack/ironic-python-agent/blob/master/ironic_python_agent/hardware.py#L262-270	15:34
* lucasagomes looks		15:34
jroll	you could override that with a little hardware manager plugin	15:35
jroll	super easy	15:35
lucasagomes	nice yeah!	15:35
jroll	for example this is our manager for our hardware https://github.com/rackerlabs/onmetal-ironic-hardware-manager/blob/master/onmetal_ironic_hardware_manager/__init__.py#L39	15:36
jroll	just replace the methods in that class with whatever you want to override	15:36
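
A sketch of the selection rule jroll describes (smallest disk above 4GB) and of overriding it in a subclass. This is not the ironic-python-agent code: `Disk`, `DefaultManager`, `LargestDiskManager` and `pick_install_disk` are hypothetical names, and treating a disk of exactly 4 GiB as acceptable is an assumption.

```python
from dataclasses import dataclass

MIN_SIZE_BYTES = 4 * 1024 ** 3  # 4 GiB threshold (boundary is an assumption)


@dataclass
class Disk:
    name: str
    size: int  # bytes


class DefaultManager:
    """Hypothetical stand-in for a pluggable hardware manager."""

    def pick_install_disk(self, disks):
        candidates = [d for d in disks if d.size >= MIN_SIZE_BYTES]
        if not candidates:
            raise ValueError("no disk of at least 4 GiB found")
        # Smallest disk that is still big enough.
        return min(candidates, key=lambda d: d.size).name


class LargestDiskManager(DefaultManager):
    """Override only the method whose behaviour should change."""

    def pick_install_disk(self, disks):
        if not disks:
            raise ValueError("no disks found")
        return max(disks, key=lambda d: d.size).name
```

With disks of 500 GiB, 2 GiB and 32 GiB, the default rule picks the 32 GiB disk, while the overriding manager picks the 500 GiB one.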
lucasagomes	jroll, having some hints in the node.properties about which disk to pick would make sense for u guys too?	15:36
lucasagomes	(like UUID, WWN, etc...)	15:36
jroll	lucasagomes: it would make sense, we personally probably wouldn't use it but yeah	15:36
lucasagomes	ack	15:37
* lucasagomes clicks on the example		15:37
rloo	NobodyCam, lucasagomes. wrt 132137, i could be wrong but that discussion about PUT in the meeting was partially due to a bug, where they wanted to specify the uuid when creating a node	15:48
rloo	NobodyCam: and I mentioned yesterday to arun that it would be nice if he could add a test to that patch ;)	15:49
*** dlaube has quit IRC		15:51
NobodyCam	rloo: awesome Thank you :)	15:52
*** dlaube has joined #openstack-ironic		15:55
*** zz_jgrimm is now known as jgrimm		15:57
*** alexpilotti has joined #openstack-ironic		15:58
lucasagomes	rloo, ahh right	15:58
lucasagomes	so it seems people do have many use cases for it :)	15:59
lucasagomes	which is good	15:59
rloo	lucasagomes: we aim to please. ha ha.	15:59
*** alexpilotti has quit IRC		15:59
*** alexpilotti has joined #openstack-ironic		15:59
lucasagomes	rloo, lol! aye	16:00
*** yjiang5 is now known as yjiang5_away		16:01
*** anderbubble has joined #openstack-ironic		16:01
*** pcrews has joined #openstack-ironic		16:04
NobodyCam	hummm	16:06
lucasagomes	dtantsur, rloo added a comment about DISCOVERING->PREBOOTING/AVAILABLE on the state machine thing	16:12
NobodyCam	jroll: the IPA `iso-image-create` does not create the agent image itself... correct?	16:13
jroll	mmm	16:13
* jroll looks		16:13
*** rushiagr_away is now known as rushiagr		16:13
jroll	NobodyCam: no, but 'make iso' will	16:14
jroll	through the dependencies	16:14
rloo	thx lucasagomes. it isn't clear to me whether one can opt out of zapping/discovery at any time 'around' the state machine?	16:14
NobodyCam	:) commenting on the agent-ilo0uefi support spec	16:14
lucasagomes	rloo, it doesn't seem to be	16:16
lucasagomes	but I think that those states could be no-ops	16:16
rloo	lucasagomes: do people really want to eg do discovering every time a node is going to be made available again?	16:17
lucasagomes	if the driver doesn't contain any zapping steps, or introspection interface	16:17
lucasagomes	zapping and discovering just moves to the next state	16:17
lucasagomes	same for prebooting	16:17
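
A sketch of the "no-op state" idea lucasagomes describes: a state for which the driver defines no steps is simply passed through. The state order below (ZAPPING, DISCOVERING, PREBOOTING, AVAILABLE) is an assumption pieced together from this conversation, not the final state machine, and `advance` and the step names are hypothetical.

```python
# Assumed ordering of the states mentioned in the discussion.
STATE_ORDER = ["ZAPPING", "DISCOVERING", "PREBOOTING", "AVAILABLE"]


def advance(current, steps_by_state):
    """Return the next state that has work to do, skipping no-op states.

    steps_by_state maps a state name to the list of steps the driver
    defines for it; a missing or empty list makes that state a no-op.
    """
    idx = STATE_ORDER.index(current)
    for state in STATE_ORDER[idx + 1:]:
        if state == "AVAILABLE" or steps_by_state.get(state):
            return state
    return "AVAILABLE"
```

For a driver with no introspection interface, `advance("ZAPPING", {"PREBOOTING": ["set_boot_device"]})` goes straight to PREBOOTING, and a driver with no steps at all ends up in AVAILABLE.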
rloo	lucasagomes: yeah, but what if the driver wants to do discovering once, the very first time the node is enrolled?	16:17
lucasagomes	rloo, if zapping is updating firmware for e.g it can introduce new capabilties that needs to be discovered	16:17
rloo	lucasagomes: but what if zapping doesn't do anything that requires discovering.	16:18
lucasagomes	rloo, I hope that we are going to introduce a state before INIT to do that	16:18
lucasagomes	so that it can be configurable, once the node enters the main loop (ZAPPING->...)	16:18
lucasagomes	the second discover can be configured	16:18
lucasagomes	rloo, so it can be skipped	16:19
openstackgerrit	Dmitry Tantsur proposed openstack/ironic-specs: In-band hardware properites discovery via ironic-discoverd https://review.openstack.org/135605	16:19
jroll	whoa, I need to look at the state machine again, we should never automatically go to discovered :|	16:19
rloo	lucasagomes: as long as something can be skipped ...	16:19
lucasagomes	rloo, but that doesn't totally invalidate discovering, I believe that discovering you could catch things like "hey this disk doesn't exist anymore"	16:19
lucasagomes	due some failure	16:19
*** Marga_ has joined #openstack-ironic		16:20
rloo	jroll: yeah, please look. i think we should try to give this spec high priority.	16:20
lucasagomes	jroll, I was arguing about, making some states optional	16:20
jroll	lucasagomes: discovery in that case would touch node.properties to update the disk size or whatever	16:20
jroll	not alert that a disk is gone	16:20
jroll	(AIUI)	16:20
lucasagomes	yeah in that diagram it seems it's going to do that	16:21
dtantsur	reference ramdisk for discoverd is merged https://review.openstack.org/#/c/122151/ \o/	16:21
jroll	yeah, do not want	16:21
jroll	dtantsur: nice!	16:21
lucasagomes	the way I thought about it before was to have a consistency check that could be implemented	16:21
lucasagomes	after discovering	16:21
lucasagomes	and after AVAILABLE	16:21
lucasagomes	because the machine could be hanging there in AVAILABLE for days before a nova boot comes in	16:22
lucasagomes	and something may fail, or someone wrongly pulled a cable etc	16:22
lucasagomes	that way we would reject the machine if it was picked for deployment but the consistency check had failed	16:22
lucasagomes	(and nova with retry filter would then pick another machine to deploy that instance)	16:22
jroll	yeah, agree, we need to check for consistency, not sure if that should be in zapping or a different thing, but I really don't think it should be in discovery	16:23
dtantsur	lucasagomes, on the review you didn't answer the question, how (and why) discoverd will figure out whether to move node to prebooting or available :-/	16:23
rloo	we already have periodic task that checks the power; would a periodic check for consistency on nodes that are avail do what you want?	16:23
lucasagomes	dtantsur, my suggestion was to always move to prebooting	16:23
jroll	dtantsur: I would think ironic would handle that?	16:23
jroll	rloo: just a check at some point before making the node available	16:24
jroll	rloo: check that all the ram/disks/networks/etc are there	16:24
lucasagomes	jroll, yeah I don't think it should be discovering either	16:24
dtantsur	jroll, Ironic can't :) Ironic should be told that discoverd is done. and it's told by moving a node to the next state	16:24
rloo	jroll: i thought lucas mentioned that the node is already in avail and something happens to it in that state	16:24
lucasagomes	but I was trying to fit in the current diagram	16:24
*** Marga_ has quit IRC		16:24
jroll	dtantsur: not sure how I feel about an api call with a target state of available	16:25
lucasagomes	rloo, jroll yeah that too. so we added two consistency checks (same code/action) in diff stages	16:25
lucasagomes	one after discovering and one after available	16:25
jroll	dtantsur: perhaps /nodes/uuid/states/provision {'target': 'discoverydone'} and ironic decides what to do	16:25
*** yjiang5_away is now known as yjiang5		16:25
dtantsur	jroll, that works for me	16:25
jroll	lucasagomes: discovering changes node.properties, why does it need to verify node.properties	16:25
lucasagomes	jroll, I was trying to fit in the current diagram	16:26
lucasagomes	but I think we need some other thing there to actually perform the checking	16:26
dtantsur	jroll, lucasagomes, will I be able to move it to INTROSPECTIONFAIL too?	16:26
lucasagomes	dtantsur, somehow you have to tell ironic that it has failed to introspect	16:26
lucasagomes	so I believe yes	16:26
lucasagomes	idk about an API call with target:introspectionfail, that makes little sense to me	16:27
lucasagomes	but some way you gotta notify it	16:27
dtantsur	lucasagomes, for now I'm planning on Ironic timing out the introspection :D but it doesn't look too friendly	16:27
jroll	hmm	16:27
jroll	this is hard :(	16:28
lucasagomes	yeah, it's not "fast" enough :)	16:28
lucasagomes	jroll, yup	16:28
lucasagomes	jroll, the way we are architect'ing things, we could use a state to do few other steps	16:29
dtantsur	I guess it's the first time we try to fit a 3rdparty to do some long-running node job for Ironic...	16:29
lucasagomes	like DEPLOYING could check consistency	16:29
rloo	dtantsur: your discoverd needs that wait flag thing that we haven't described yet, right?	16:29
dtantsur	rloo, yes	16:29
lucasagomes	(I don't necessarily think it's great, but I believe that it could be part of that state)	16:29
dtantsur	I mean, it can live without it, but it will be a strange state of node :)	16:29
*** Masahiro has joined #openstack-ironic		16:30
rloo	dtantsur: it seems like part of that wait flag mechanism should allow for whatever it is waiting for, to indicate that it is done and whether it succeeded or not.	16:30
rloo	dtantsur: or to time out waiting ;)	16:30
dtantsur	rloo, good idea	16:30
lucasagomes	sounds good rloo	16:31
rloo	dtantsur: and then it seems like if it fails, it does into whatever *FAIL state associated with the state where the wait was?	16:31
rloo	s/does/goes/	16:31
dtantsur	yeah	16:31
rloo	with some meaningful msg of course :-)	16:31
dtantsur	heh, do we need 'transition_reason' now?	16:32
dtantsur	like we had 'maintenance_reason'?	16:32
jroll	... or use last_error?	16:33
jroll	that's exactly what last_error is for	16:33
rloo	dtantsur: dunno. I need to wait for the states to settle down first, etc. then we can work on adding the bits we need to deal with it all ;)	16:33
dtantsur	jroll, make last_error also available for writing from outside?	16:34
dtantsur	rloo, right	16:34
jroll	sigh	16:34
*** Masahiro has quit IRC		16:34
rloo	why the sigh, jroll?	16:35
jroll	discoverd or whatever sends {"target": "DISCOVERYFAIL", "reason": "everything is broken"}, ironic handles the state change and updating last_error	16:35
lucasagomes	maybe as part of the body request to unset the wait flag	16:35
lucasagomes	you can give the message	16:35
jroll	yeah	16:35
lucasagomes	not even need the target	16:35
lucasagomes	cause if it's DISCOVERING the DISCOVERYFAIL is the error state associated with it	16:35
lucasagomes	so ironic can figure that out	16:35
jroll	/nodes/uuid/discovery {"result": "success", "reason": ""}	16:36
jroll	for example	16:36
jroll	/nodes/uuid/discovery {"result": "error", "reason": "busted"}	16:36
jroll	something like that	16:36
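A minimal sketch of the callback jroll describes, assuming a hypothetical `apply_discovery_result` helper and a plain dict in place of a real node object. The DISCOVERING and DISCOVERYFAIL names and the result/reason body come from the discussion; the `DISCOVERED` success state and the `wait` key are placeholders, since the channel had not settled either of them.

```python
# Hypothetical sketch, not Ironic code. The failure state is looked up from
# the current state, so the caller never has to name a target state.
FAIL_STATE = {"DISCOVERING": "DISCOVERYFAIL"}
SUCCESS_STATE = {"DISCOVERING": "DISCOVERED"}  # placeholder name


def apply_discovery_result(node, body):
    """Update a node dict from a body like {"result": "error", "reason": "busted"}."""
    state = node["provision_state"]
    if state not in FAIL_STATE:
        raise ValueError("node is not waiting on discovery: %s" % state)
    if body["result"] == "success":
        node["provision_state"] = SUCCESS_STATE[state]
        node["last_error"] = None
    elif body["result"] == "error":
        node["provision_state"] = FAIL_STATE[state]
        # The reported reason ends up in last_error, as jroll suggests.
        node["last_error"] = body.get("reason") or "discovery failed"
    else:
        raise ValueError("unknown result: %r" % body["result"])
    node["wait"] = False  # the external job has reported back
    return node
```

Because the failure state is derived from the current state, the caller only reports an outcome and a reason, which is the point lucasagomes makes about not needing a target.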
rloo	I have to read dtantsur's spec still. I don't like discoverd hooked into the introspecting state.	16:36
jroll	I don't think we should be talking about intimate details of how discovery works before we figure out the states	16:37
lucasagomes	I think I will need a small spec for the disk hints :/	16:40
lucasagomes	to determine what could be used to figure out which device to pick, such as UUID, NAME, MODEL, SERIAL, WWN etc...	16:41
jroll	and any combination	16:42
lucasagomes	yes	16:42
* lucasagomes starts a spec		16:42
* jroll bbiab		16:43
*** dprince has quit IRC		16:45
*** romcheg has quit IRC		16:48
openstackgerrit	Victor Lowther proposed openstack/ironic-specs: New Ironic provisioner state machine. https://review.openstack.org/133828	16:48
devananda	morning, all	16:51
*** Marga_ has joined #openstack-ironic		16:52
NobodyCam	good morning devananda	16:53
* devananda reads scrollback		16:53
*** achanda has joined #openstack-ironic		16:53
dtantsur	devananda, morning	16:54
lucasagomes	devananda, morning	16:54
devananda	lucasagomes: consistency check after AVAILABLE, in the deploy pipeline? a) I would rather not do that for several reasons, b) we're really adding a lot of features to this state machine now ...	16:55
*** achanda has quit IRC		16:56
lucasagomes	devananda, right, idk how long the machine could be sitting there waiting for the nova boot command to come	16:56
*** jcoufal_ has quit IRC		16:57
lucasagomes	so it would be nice if we could test things like is the cable still connected to the right port (like JoshNang does when zapping)	16:57
lucasagomes	doesn't need to be a full check, but allowing some check would be nice (optional, even if part of the deploy)	16:57
*** pensu has joined #openstack-ironic		16:58
*** achanda has joined #openstack-ironic		16:58
NobodyCam	devananda: didn't you add rebuild to the new state machine?	16:59
devananda	lucasagomes: sure, I don't know that either. it could be just a few seconds, too.	16:59
*** igordcard has joined #openstack-ironic		17:00
lucasagomes	devananda, sure, it could be there for 10s or 1 month	17:00
lucasagomes	we can't predict that	17:00
lucasagomes	(that's why I thought that some checks would be nice to have)	17:01
*** Nisha has joined #openstack-ironic		17:01
devananda	having hints in node.properties about which disk to pick -- also not something I want to encourage. how is that not getting into the department of snowflake-management?	17:01
lucasagomes	devananda, if we are supporting RAID	17:02
lucasagomes	how I can tell ironic to use the raid device I just created?	17:02
dtantsur	also cases like big RAID for data and small SSD for an OS image... Someone told me about it.	17:03
lucasagomes	devananda, it's not about describing all disks we have in the server	17:03
lucasagomes	but giving hints about which one we should deploy the image onto	17:03
NobodyCam	I have seen use cases where the customer wanted OS to be sdb not sda	17:03
lucasagomes	that seems aligned with the project scope	17:04
lucasagomes	it's useful data for the deployment	17:04
victor_lowther	nottomention that what /dev/sda is in your discovery image may not be the same as it is to the installed OS.	17:04
* victor_lowther has encountered that		17:04
lucasagomes	victor_lowther, exactly, device names can change on each boot	17:05
lucasagomes	but things like serial, wwn, UUID	17:05
lucasagomes	can't	17:05
victor_lowther	right	17:05
lucasagomes	those are the hints I'm thinking of	17:05
lucasagomes	or even, if u want to make it more generic you could say	17:05
victor_lowther	devananda: they will definitely be needed.	17:05
lucasagomes	hey pick the disk which is >=1TB	17:05
victor_lowther	it is snowflake management for vms because their idea of disk order is much more predictable.	17:06
dtantsur	lucasagomes, this logic is better left to discoverd...	17:06
lucasagomes	dtantsur, it's at deploy time	17:06
dtantsur	but being able to precisely point to a disk seems very much needed to me	17:06
lucasagomes	dtantsur, idk how discoverd can do that	17:06
lucasagomes	victor_lowther, +1	17:06
dtantsur	lucasagomes, well, in RAID case it probably can't. w/o RAID it can pre-populate this property based on some logic	17:07
victor_lowther	scan /dev/disk/by-whatever and pick the awesomest ones	17:07
dtantsur	(in theory, it's not implemented)	17:07
lucasagomes	victor_lowther, yup, or lsblk	17:07
victor_lowther	after your raid arrays are created.	17:07
lucasagomes	but in Ironic POV, hints are generic, in the ramdisk we can create the logic using sysfs or lsblk or whatever to figure it out	17:07
victor_lowther	ya	17:08
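A rough sketch of how generic hints could be matched inside the ramdisk, assuming the block devices have already been collected into dicts (for example from lsblk or sysfs). The hint keys, the `min_size_gb` name, and the device layout are all hypothetical; nothing here is an agreed format.

```python
# Hypothetical sketch: hint names and the device dict layout are assumptions.
def matches(device, hints):
    """Return True if a device satisfies every hint (hints are ANDed)."""
    for key, wanted in hints.items():
        if key == "min_size_gb":
            if device.get("size_gb", 0) < wanted:
                return False
        elif device.get(key) != wanted:
            return False
    return True


def pick_root_device(devices, hints):
    """Return the single device matching the hints, or raise if ambiguous."""
    found = [d for d in devices if matches(d, hints)]
    if len(found) != 1:
        raise LookupError("expected exactly one match, got %d" % len(found))
    return found[0]


devices = [
    {"name": "/dev/sda", "serial": "S1", "wwn": "0x5000a", "size_gb": 120},
    {"name": "/dev/sdb", "serial": "S2", "wwn": "0x5000b", "size_gb": 2000},
]
# "pick the disk which is >=1TB"
root = pick_root_device(devices, {"min_size_gb": 1000})
```

Refusing to choose when zero or several devices match is one possible design choice; it keeps the result repeatable instead of falling back to enumeration order.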
lucasagomes	devananda, does that sound fair?	17:08
*** Marga_ has quit IRC		17:09
* devananda returns to reading scrollback, had to pop into a meeting ...		17:09
*** Marga_ has joined #openstack-ironic		17:09
victor_lowther	Crowbar used an unholy combination of /dev/disk/by-id and the actual full device paths in sysfs to figure things out.	17:10
lucasagomes	hardware device path?	17:10
victor_lowther	due to fun on one of our targets where some random disk in an external drive array was /dev/sda	17:11
lucasagomes	heh yeah without udev rules you can't predict it	17:11
lucasagomes	(unless it's LVM so u can :))	17:11
victor_lowther	due to PCI bus ordering and module insertion ordering fun.	17:11
victor_lowther	LVM and UUIDs are basically useless in the scenario I am thinking of	17:12
victor_lowther	because they are not applicable to raw physical disks.	17:12
lucasagomes	sure	17:12
victor_lowther	so don't help you figure out where the boot partition should be.	17:12
victor_lowther	UEFI is sorta nice there in that your boot partition can be anywhere as long as UEFI can see it.	17:13
victor_lowther	lucasagomes: ya, hardware device path.	17:14
lucasagomes	right in this case we want to find the root device, and lay the image onto it	17:14
*** viktors is now known as viktors\|afk		17:14
lucasagomes	the root partition (in case of a full disk image) will be part of the image itself	17:14
lucasagomes	victor_lowther, right	17:14
victor_lowther	so, dtantsur	17:16
victor_lowther	how do you want discoverd to work in the state machine?	17:16
rloo	NobodyCam: I just looked at this spec https://review.openstack.org/#/c/101122/ which is still targeted for juno. Shouldn't it have been abandoned?	17:16
NobodyCam	yea let me look	17:17
*** marcoemorais has joined #openstack-ironic		17:17
NobodyCam	ah ha.. seems I added a comment instead of abandoning it	17:18
dtantsur	victor_lowther, in case of using discoverd for introspection, Ironic sets node to DISCOVERING, WAIT -> true, calls to discoverd and relaxes :)	17:18
dtantsur	victor_lowther, after that discoverd comes back and advances node state to the next one	17:18
devananda	https://drive.google.com/file/d/0Bz_nyJF_YYGZZ05zaU9kb2Z4SE0/view?usp=sharing	17:19
*** hemna__ has joined #openstack-ironic		17:19
NobodyCam	rloo: updated	17:19
rloo	thx NobodyCam	17:19
*** Marga_ has quit IRC		17:19
devananda	victor_lowther: my draft from last night, forgot to post anywhere - also working a bit more on it now	17:19
devananda	victor_lowther: i'm looking at your latest spec now ...	17:20
NobodyCam	looking very good devananda :)	17:20
*** dtantsur is now known as dtantsur\|brb		17:22
lucasagomes	devananda, sorry for insisting, but just to get the initial feedback cause I'm currently writing the spec for it. R you ok with the hints?	17:23
*** dprince has joined #openstack-ironic		17:28
devananda	oslo change / hacking rule has broken our gate (and just about everyone else's)	17:29
devananda	https://bugs.launchpad.net/hacking/+bug/1398472	17:29
lucasagomes	:/	17:30
devananda	lucasagomes: do we guarantee which disk device the node will boot from?	17:31
lucasagomes	devananda, yes, after writing the image we get the UUID and set it as the ROOT=UUID=	17:31
lucasagomes	so at boot time, after deployment it's going to boot from the right device	17:32
devananda	lucasagomes: that's a kernel param, right?	17:32
lucasagomes	devananda, yes	17:32
devananda	which applies only to net boot	17:32
lucasagomes	yes, for the fulldisk images I believe they already have a bootloader in the image	17:33
JayF	yes	17:33
devananda	yup	17:33
lucasagomes	with the right config to boot the right partition	17:33
*** achanda has quit IRC		17:33
devananda	and if we're providing an API to change which disk the image is written to, are we sure that we wrote that to the disk the system expects the bootloader to be on?	17:33
*** achanda has joined #openstack-ironic		17:34
devananda	also, after an image is written to disk, and the instance is deployed, what does it mean if the operator changes this value in the API without redeploying a new instance?	17:34
lucasagomes	devananda, is a) the fulldisk image example?	17:34
lucasagomes	devananda, because right now if u have 2 disks and ironic pick the first	17:35
lucasagomes	it's random	17:35
lucasagomes	ironic could boot once and get disk A and boot second time and get disk B	17:35
lucasagomes	it's way more unpredictable the way it works now	17:35
lucasagomes	for B) this is what zapping should do, clean the disks	17:36
lucasagomes	I mean, both examples could happen as-is	17:36
lucasagomes	with the current code	17:36
devananda	I'm not sure where zapping got introduced to this question	17:36
devananda	as for the current behavior, if it's not repeatable -- that's a bug	17:37
devananda	assuming no hardware failures, any number of (re)deployments, given the same inputs, to the same node, should result in the same disk being used	17:37
lucasagomes	I'm not sure I'm in sync. Right yeah it's bug	17:37
devananda	if that's not the case, that should definitely be fixed	17:37
lucasagomes	and the hints help fix it	17:37
devananda	that's where I disagree	17:38
lucasagomes	devananda, so that assumption is wrong	17:38
devananda	we should have predictable behavior. if I understand where you're going with hints (and maybe I dont)	17:38
lucasagomes	cause you don't have udev rules enforcing device names or ordering	17:38
devananda	they will lead to less repeatability, not more	17:38
lucasagomes	you can't assume that	17:38
*** achanda has quit IRC		17:39
JayF	lucasagomes: devananda: device names / identifiers are not used at all by IPA today	17:39
victor_lowther	the issue is that (for example) discoverd's idea of what /dev/sda is may not be the same as what any random other distro's idea is.	17:39
devananda	right	17:39
lucasagomes	the hints give the operator a way to tell, based on persistent block device naming, which device to use to deploy the image	17:39
victor_lowther	because SCSI device ordering in Linux is not stable, and it never will be.	17:39
JayF	Although I think we take the "first", smallest disk >4GB ... so it's very possible that it could be inconsistent	17:39
devananda	each OS may order the disks differently	17:39
JayF	but in the case of IPA we could easily make the pick-a-device-to-deploy-to decision have more knowledge	17:39
devananda	the OS-specific ordering shouldn't matter to Ironic. That it matters within the context of discoverd is a different problem	17:40
JayF	I think the case that lucasagomes suggests is possible (and I've seen it happen), especially since deploy ramdisks use coreos+very recent kernel which means ordering could be radically different than an older (think: RHEL5/6) image would have	17:40
lucasagomes	JayF, yeah, that's what I'm trying to abstract	17:40
JayF	devananda: I'm saying I think it can today with IPA, if you have disks of equal size	17:41
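To illustrate the equal-size case JayF mentions, here is a small sketch of a "first, smallest disk over 4GB" rule. It is not IPA's actual code; the tuple layout, function names, and serial numbers are made up for the example.

```python
# Hypothetical sketch of "first, smallest disk over 4GB"; not IPA's real code.
MIN_BYTES = 4 * 1024 ** 3


def smallest_big_enough(disks):
    """disks: list of (name, size_bytes, serial) in enumeration order."""
    candidates = [d for d in disks if d[1] > MIN_BYTES]
    return min(candidates, key=lambda d: d[1])  # ties: first in list wins


def smallest_big_enough_stable(disks):
    """Same rule, but break size ties on serial instead of enumeration order."""
    candidates = [d for d in disks if d[1] > MIN_BYTES]
    return min(candidates, key=lambda d: (d[1], d[2]))


size = 500 * 1024 ** 3
# The same two physical disks, enumerated in a different order on each boot.
boot_one = [("sda", size, "SER-B"), ("sdb", size, "SER-A")]
boot_two = [("sda", size, "SER-A"), ("sdb", size, "SER-B")]
```

With equal sizes, the first function returns whichever disk the kernel happened to enumerate first, so the two boots select different physical disks. Keying the tie-break on a persistent identifier such as the serial makes both boots agree.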
victor_lowther	the stablest solution I have found is to figure out what disk to use via its sysfs device path, then ensure that the OS install and bootloader use that disk based on its disk ID/WWN/serial number.	17:41
JayF	victor_lowther: that's exactly how we do discovery for updating firmwares on our raid cards during decom	17:41
devananda	can Ironic repeatably a) deploy an image b) onto the same disk c) which is the disk that that machine attempts to boot from \|\| has the UUID which we pass as a kernel param via PXE	17:41
lucasagomes	victor_lowther, JayF yup. That's what I'm proposing	17:41
victor_lowther	devananda: UUID is a filesystem/partition property, not a disk property.	17:42
lucasagomes	devananda, the UUID is after the deploy	17:42
victor_lowther	Best to not rely on it for OS install purposes	17:42
lucasagomes	cause it's a property of the fs	17:42
lucasagomes	yeah that ^	17:42
devananda	victor_lowther: yes. argh....	17:42
JayF	you can rely on labels	17:42
JayF	and label the partition how you expect	17:42
JayF	when you deploy it	17:42
JayF	although that doesn't help in the deploy ramdisk case	17:43
devananda	can Ironic repeatably a) deploy an image b) onto the same disk c) which is the disk that that machine attempts to boot from	17:43
victor_lowther	that is still after you pick the disk to deploy to	17:43
lucasagomes	devananda, if ur booting via the network idk about c)	17:43
JayF	devananda: my answer would be that IPA today, in some cases (two disks of same size), could consistently fail c, but I wouldn't expect unstable behavior across provisions	17:43
victor_lowther	Only if something tells it which physical disk to use by ID	17:43
lucasagomes	+1	17:44
victor_lowther	otherwise it will just usually work, not always work.	17:44
devananda	so, IIUC, the problem you're all referring to is that the ramdisk agents (whether iSCSI or IPA or discoverd) may use non-repeatable ordering, or may use different ordering from each other (eg, if using discoverd and then using IPA on the same node)	17:44
victor_lowther	yes	17:44
lucasagomes	devananda, correct	17:44
devananda	and just to check, the problem you're referring to is NOT related to the physical boot device or knowing which physical disk the server will boot from	17:45
*** jistr\|training has quit IRC		17:45
victor_lowther	it is related.	17:45
lucasagomes	devananda, the problem is about finding which device we should write the image onto. And that means that we want to boot from that disk too	17:47
lucasagomes	(that's why we enforce ROOT=UUID= in the kernel cmdline)... for the fulldisk image it already has a bootloader configured (but there's still a problem if both disks contain a bootloader)	17:48
lucasagomes	which is not solved by the hints (and is not intended to solve that too)	17:48
victor_lowther	and if you are managing the RAID controller, you can directly control that. If you are using UEFI, you don't have to care about that. Otherwise, you have to rely on having enough BIOS control, heuristics and/or quality of firmware.	17:48
*** sambetts has quit IRC		17:48
JayF	victor_lowther: was not my intention to -1 you as soon as you pushed a new patchset. The comment I just posted is from yesterday and it just never made its way up :(	17:49
lucasagomes	yeah, for RAID it's good too, to make sure you are using the device you've just created	17:49
victor_lowther	it helps that most RAID controllers let you specify which disk they will try to boot from.	17:49
lucasagomes	victor_lowther, [off-topic] http://doodle.com/9h4ncgx4etkyfgdw2wpdircv help us voting on the mascot name :)	17:49
*** athomas has quit IRC		17:50
victor_lowther	JayF: no worries	17:50
*** Marga_ has joined #openstack-ironic		17:51
JayF	jroll: nova-spec for configdrive just went in	17:52
JayF	jroll: woo	17:52
NobodyCam	nice :)	17:52
victor_lowther	lucasagomes: Doodle 404!	17:55
lucasagomes	ew lemme check	17:55
lucasagomes	victor_lowther, http://doodle.com/9h4ncgx4etkyfgdw here it go	17:56
victor_lowther	That worked	17:57
lucasagomes	thanks :D	17:57
NobodyCam	brb	17:58
*** rushiagr is now known as rushiagr_away		17:59
lucasagomes	aight I will go now	18:02
lucasagomes	have a good night everyone	18:02
lucasagomes	(I will check the channel from time to time)	18:02
*** lucasagomes is now known as lucas-dinner		18:02
*** derekh has quit IRC		18:06
*** rwsu has joined #openstack-ironic		18:07
*** Nisha has quit IRC		18:10
NobodyCam	have a good night lucas-dinner	18:11
*** anderbubble has quit IRC		18:18
*** Masahiro has joined #openstack-ironic		18:19
*** anderbubble has joined #openstack-ironic		18:21
*** pensu has quit IRC		18:21
*** pensu has joined #openstack-ironic		18:23
PaulCzar	getting a fun new error from ironic-conductor. it happens during what looks to be laying down the image files in the tftproot directory:	18:23
PaulCzar	https://gist.github.com/paulczar/ca2c3c21612b5cef5cc3	18:23
*** Masahiro has quit IRC		18:23
PaulCzar	unfortunately the error isn't very clear where to look to work out what is going wrong	18:24
devananda	PaulCzar: is this repeatable? if so, under what conditions?	18:26
PaulCzar	any time I try to provision a node ?	18:27
PaulCzar	still using the ssh_agent	18:27
PaulCzar	with virtualbox	18:27
NobodyCam	PaulCzar: swift is setup for temp urls?	18:30
*** harlowja_away is now known as harlowja_		18:31
NobodyCam	and you have enough free disk space to hold the image?	18:31
devananda	this is odd: 2014-12-02 18:21:19.361 4171 DEBUG ironic.drivers.modules.image_cache [-] Destination /tftpboot/96b061ed-eb91-4b12-b2d6-9d1c6ce34369/deploy_kernel already exists for image 88786efb-8985-4017-8839-7cd98ff9c87a fetch_image /usr/local/lib/python2.7/dist-packages/ironic/drivers/modules/image_cache.py:110	18:31
jroll	2014-12-02 18:21:19.361 4171 WARNING ironic.conductor.manager [-] Error in deploy of node 96b061ed-eb91-4b12-b2d6-9d1c6ce34369: 'image_source'	18:31
devananda	PaulCzar: can you manually delete the /tftpboot/96b061ed-eb91-4b12-b2d6-9d1c6ce34369 directory	18:31
jroll	the node doesn't have instance_info.image_source	18:31
jroll	err, instance_info['image_source']	18:32
jroll	which nova is supposed to put down afaik	18:32
devananda	jroll: huh. that's not a great error message, then.	18:32
devananda	yea, Nova should be passing that in, but it should fail that check sooner, right?	18:32
jroll	devananda: really? I think it's a great error :P	18:32
devananda	look at where it comes from	18:32
jroll	I'm joking	18:32
devananda	heh	18:33
devananda	there should be an exception traceback just after that	18:33
PaulCzar	hmm where does it get image_source from? glance metadata ?	18:33
jroll	PaulCzar: nova puts the image uuid there	18:33
devananda	PaulCzar: that should be the image UUID that you requested via "nova boot"	18:33
PaulCzar	crazy question ... if I used the image name will nova still pass the uuid ?	18:34
devananda	the nova.virt.ironic driver should populate the node.instance_info['image_source'] field during driver.deploy	18:34
devananda	PaulCzar: yup	18:34
devananda	well. it should. if it's not ....	18:34
jroll	so that probably comes through from https://github.com/openstack/ironic/blob/master/ironic/drivers/modules/agent.py#L259	18:35
jroll	that's the only direct image_source access we do	18:35
jroll	more specifically https://github.com/openstack/ironic/blob/master/ironic/drivers/modules/agent.py#L177	18:35
PaulCzar	let me try again with the uuid just to make sure	18:35
jroll	bah, we don't validate image_source	18:36
devananda	https://github.com/openstack/ironic/blob/master/ironic/conductor/manager.py#L715	18:36
*** dtantsur\|brb is now known as dtantsur		18:36
devananda	PaulCzar: is there an exception traceback in the conductor log after that, which you didn't include in the paste?	18:36
PaulCzar	devananda: nope, there's no exception traceback	18:37
devananda	so that bothers me a bit	18:37
jroll	btw that comes from https://github.com/openstack/ironic/blob/master/ironic/conductor/manager.py#L715	18:37
devananda	we're catching and reraising, but losing the traceback somehow?	18:37
jroll	it's a catch-all	18:37
devananda	jroll: yea, I just linked that :)	18:37
* jroll is blind right now		18:37
devananda	also, that exception, when converted to a string, is just the word "image_source"	18:38
jroll	so, it should reraise that exception	18:38
jroll	where does that end up? :\|	18:38
devananda	well, I bet it is, but there's no logger set up for that thread	18:38
jroll	mmm, right.	18:38
*** rwsu has quit IRC		18:38
dlaube	hey guys, I'm seeing "ERROR ironic.drivers.modules.agent [-] node 1d0203c6-151c-4113-bb31-738b336a07e4 command status errored: {u'message': u'Error downloading image.', u'code': 500, u'type': u'ImageDownloadError', u'details': u'Could not download image with id 1edccf41-f244-4304-ab65-66d28d5a86a7.'}" when I try to nova boot	18:39
devananda	s/logger/exception handler	18:39
devananda	since, clearly, it's capable of logging from that thread	18:39
dlaube	I've added agent_pxe_append_params = nofb nomodeset vga=normal console=ttyS0 systemd.journald.forward_to_console=yes but I'm not sure what I should be looking for in the bm console logs	18:39
jroll	so maybe outside of that with reraise block we should log.exception(e)	18:39
jroll	dlaube: look for anything interesting... or feel free to just paste the whole thing	18:39
devananda	jroll: I'm guessing a bit here, but it seems like either we can drop the reraise and just log and clean up, or we should sort out why worker threads aren't logging exceptions.	18:41
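A minimal sketch of the two fixes being discussed: validating `image_source` up front, and logging the traceback before re-raising. The unhelpful message is consistent with a bare `KeyError`, whose string form is just the quoted key. `MissingParameter`, `validate_instance_info`, and `do_deploy` are hypothetical names for this example, not Ironic's actual functions.

```python
import logging

LOG = logging.getLogger(__name__)


class MissingParameter(Exception):
    """Hypothetical exception type for this sketch."""


def validate_instance_info(node):
    """Fail early with a readable message instead of a bare KeyError."""
    if "image_source" not in node.get("instance_info", {}):
        raise MissingParameter(
            "node %s has no instance_info['image_source']" % node["uuid"])


def do_deploy(node, deploy):
    """Run deploy(node); log the full traceback before re-raising."""
    try:
        validate_instance_info(node)
        deploy(node)
    except Exception:
        # LOG.exception records the traceback even if nothing further up
        # the worker thread ever reports the re-raised exception.
        LOG.exception("Error in deploy of node %s", node["uuid"])
        raise
```

Logging inside the `except` block means the traceback is recorded in the worker thread itself, regardless of what the caller does with the re-raised exception.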
* devananda returns to thinking about the state machine		18:42
dlaube	http://paste.openstack.org/show/p3Hvh7DAfDzspebWFxbl/	18:42
dlaube	looks like coreos deploy image went well… nothing from IPA stands out to me	18:42
dlaube	am I able to ssh into the coreos deploy image while it is doing the rest of the deployment?	18:43
jroll	devananda: both, imo? unexpected exceptions that we don't catch should be logged, we've written a bunch of code to work around it	18:43
jroll	dlaube: if you build your own ramdisk and embed ssh keys, you can :)	18:43
jroll	ugh, ipa y u no lo	18:43
jroll	log, even	18:43
jroll	JayF: ^ that's what IPA console logs look like in devstack fyi	18:44
rloo	devananda, jroll: isn't it because there is a hook there to call ._provisioning_error_handler()	18:44
dlaube	jroll: can I still use disk-image-builder and then add some things in before I glance image-create? or is a custom deploy image/ramdisk more involved in that	18:46
dlaube	googling now	18:46
jroll	dlaube: blah, I think we're missing some log things	18:47
jroll	dlaube: for IPA, we don't use DIB, there's a builder in the repo	18:47
jroll	dlaube: https://github.com/openstack/ironic-python-agent/tree/master/imagebuild/coreos	18:47
jroll	drop things in oem/ as needed	18:47
jroll	(e.g. oem/authorized_keys)	18:48
NobodyCam	brb...	18:48
jroll	rloo: where is there a hook for ._provisioning_error_handler()?	18:48
devananda	jroll: https://github.com/openstack/ironic/blob/master/ironic/conductor/manager.py#L698	18:48
PaulCzar	I think I'm going backwards now ... - Error in deploy of node 6bbed42b-68eb-4946-90a5-68c797762f94: HTTPNotFound (HTTP 404)	18:48
devananda	https://github.com/openstack/ironic/blob/master/ironic/conductor/manager.py#L593	18:49
dlaube	good to know. thanks jroll	18:49
devananda	PaulCzar: does "ironic node-show <UUID>" indicate anything on the last_error field?	18:49
jroll	ah, I see	18:49
jroll	dlaube: np	18:49
PaulCzar	also these errors seem to be surfacing as WARN rather than ERROR	18:50
jroll	rloo: devananda: yeah, so _provisioning_error_handler doesn't actually do anything in this case	18:50
PaulCzar	devananda: last_error: none	18:50
rloo	jroll: right. The comment for _provisioning_error_handler() is perhaps why. It thinks it is only being called when there was an exception spawning a worker.	18:51
devananda	rloo: that does not appear to be the case, based on reading task_manager	18:51
jroll	it knows more, it just only handles that sort of exception	18:51
devananda	rloo: but IMBW ...	18:51
rloo	devananda: right, based on the comments in task_manager (I'm too lazy to read the code now)	18:52
devananda	PaulCzar: something's off. if the deploy fails, it should save the failed state and reason why	18:52
PaulCzar	paste of conductor log is in the first comment here - https://gist.github.com/paulczar/ca2c3c21612b5cef5cc3	18:53
devananda	PaulCzar: paste of "ironic node-show <UUID>" for the failed node?	18:54
*** dtantsur is now known as dtantsur\|afk		18:54
PaulCzar	added as comment in above gist	18:55
NobodyCam	instance uuid is none?	18:56
devananda	PaulCzar: this has been cleaned up. there's no failure here.	18:56
devananda	PaulCzar: can you capture that after the boot fails?	18:56
PaulCzar	that is after the boot fails	18:56
PaulCzar	blerg, nova scheduler fails the nova boot with no valid host	18:57
dlaube	the thing I find interesting about the failure to download the image error reported in the ironic cond log is that it looks like the image is being retrieved just fine according to glance reg "29831 DEBUG glance.registry.api.v1.images [7c53c4cb-4271-478c-ae2c-aa87300f7471 08df81fb719d413eacb36c2a249f1514 dcd2a172eb934e39a99ddb216e94b69f - - -] Successfully retrieved image ef0270d4-9e13-4c58-a8e9-bf8aad31d09d show /opt/stack/glance/glance/registry/api	18:57
PaulCzar	dlaube: I had something similar to that the other day and I think the metadata for the image had bad values for kernel and ramdisk	18:59
devananda	PaulCzar: gotcha. it looks like maybe this image is not accessible by the ironic service user? b195e0dc-fb06-4032-aae9-720b63abb923	19:00
PaulCzar	son of a!	19:00
PaulCzar	the newer glance cli doesn't allow --public	19:01
dlaube	PaulCzar: thanks, I'll double check the ids for the kernel and initrd specified in glance for the image I'm using with my nova boot call	19:02
jroll	dlaube: we do a glance.show() on the image, the agent actually downloads it from swift	19:03
devananda	victor_lowther: i think i missed the discussion somewhere -- is there a reason that your new draft does not have a path from DELETED to AVAILABLE, without going through both ZAP and INTROSPECT ?	19:03
dlaube	jroll: ahh	19:03
jroll	dlaube: I think you're hitting this one https://github.com/openstack/ironic-python-agent/blob/master/ironic_python_agent/extensions/standby.py#L146	19:03
jroll	would love a patch to add logging for that	19:04
dlaube	so I should check my ~/logs/screen-s-object.log in devstack then	19:04
victor_lowther	yes, and it has to do with drawing that diagram while listening to an all-hands. :/	19:04
PaulCzar	is swift needed for the non-devstack ?	19:04
PaulCzar	it's not mentioned in the developer quick start	19:04
devananda	victor_lowther: hah	19:04
jroll	PaulCzar: for the agent driver (docs suck really bad for the agent driver) :(	19:05
jroll	victor_lowther: devananda: I really really think that zapping should not imply introspecting and vice versa	19:05
devananda	victor_lowther: ok. I'm taking another stab at it. I like the new wording around multiple states. What do you think of denoting that with a symbol to be even clearer, eg, [DEPLOY*]	19:06
devananda	jroll: ditto	19:06
victor_lowther	Make a comment, and I will fix it in the next rev -- there are a few other reviews I missed while hacking up version 8.	19:06
* jroll comments on review		19:06
PaulCzar	are you kidding me right now ? something as critical as mentioning swift is required isn't mentioned anywhere ?	19:06
victor_lowther	jroll: why not?	19:06
*** mikedillion has joined #openstack-ironic		19:07
victor_lowther	devananda: sure, I can throw a symbol there.	19:07
jroll	PaulCzar: I'm a horrible person :(	19:08
jroll	PaulCzar: to be clear, this is only for agent driver, not pxe driver	19:08
*** spandhe has joined #openstack-ironic		19:08
jroll	victor_lowther: 1) because they are two separate things; 2) introspecting should never be automatically triggered	19:08
PaulCzar	so my options are to run swift or spend $1000 on building a pxe lab ?	19:09
jroll	PaulCzar: I don't understand what infra you need for a pxe lab that you don't need for the agent	19:09
victor_lowther	jroll: I think ZAPPING is a special case because out of all the states we have, ZAPPING is where we are going to make changes that can change what the node hardware looks like to everyone else.	19:10
PaulCzar	jroll: doesn't pxe require ipmi gear to power on/off ?	19:10
jroll	PaulCzar: how are you doing power control with the agent driver?	19:10
jroll	because both require some form of power control	19:10
victor_lowther	so it should update node.properties for cpus, arch. memory, disk sizes, etc.	19:10
PaulCzar	jroll: using the virtualbox power controls in ssh_agent	19:11
jroll	PaulCzar: you can do that with the pxe driver as well, see pxe_ssh	19:11
jroll	victor_lowther: if the operator does things in zapping that will change properties, they should run discovery afterward or something. maybe we can make that an optional thing.	19:11
*** pensu has quit IRC		19:11
jroll	victor_lowther: imagine you flash new firmware in zapping and a disk disappears, do you want to update node.properties to reflect that disk is gone?	19:12
victor_lowther	hell yes	19:12
PaulCzar	jroll: ah, good to know	19:12
jroll	what	19:12
jroll	what	19:12
PaulCzar	jroll : docs for pxe? or do I need to dig through the source to figure it out? :)	19:12
jroll	if a disk disappears for an unknown reason, I want ironic to fail that node hard	19:12
*** sreekanth has joined #openstack-ironic		19:13
jroll	victor_lowther: ^	19:13
jroll	PaulCzar: docs for pxe are pretty alright, just follow the deploy docs that are up	19:13
* jroll finds a link		19:13
victor_lowther	the one does not preclude the other,	19:13
devananda	PaulCzar: PXE_* drivers are better documented at this point. largely because they've been around the project the longest	19:13
victor_lowther	or do you want the hardware properties in node.properties to be prescriptive instead of descriptive?	19:13
jroll	victor_lowther: I don't understand	19:14
devananda	PaulCzar: that said, if you're looking at doing anything meaningful with physical hardware, why are you using *_ssh / virtualbox instead of IPMI ?	19:14
jroll	PaulCzar: http://docs.openstack.org/developer/ironic/deploy/install-guide.html	19:14
PaulCzar	devananda: because I want to prove this out locally and be able to do CI etc without tying up physical hardware	19:15
victor_lowther	do you want the info in node.properties to declare what you expect a node to have	19:15
jroll	victor_lowther: very much yes	19:15
victor_lowther	or do you want it to reflect what the node has?	19:15
jroll	victor_lowther: perhaps discovery should fill that in the first time, to reflect the actual properties	19:15
jroll	victor_lowther: but I never want those to change without me knowing	19:15
devananda	victor_lowther, jroll: we have an important distinction here. "properties declare what the node should have; if it doesn't, fail fast" \|\|	19:15
devananda	\|\| "I don't know what the node has, go discover it and update properties"	19:16
NobodyCam	devananda: ++	19:16
devananda	the latter is exceptionally rare	19:16
devananda	and, IMO, should require external initiation	19:16
jroll	completely agree	19:16
victor_lowther	the latter is the model I have always operated in.	19:16
victor_lowther	it does not inhibit error detection and resolution due to missing hardware	19:16
PaulCzar	btw all the docs say to use glance ... --public ... new clients should be glance ... --is-public=true	19:16
jroll	victor_lowther: so if half of your ram fails, you want to just keep using that machine?	19:16
victor_lowther	because I have always tracked those changes over time.	19:16
NobodyCam	PaulCzar: can you file a bug on that so we don't forget	19:17
victor_lowther	if half the ram fails without redundancy, the machine usually dies a horrible death	19:17
victor_lowther	not continue to work silently	19:17
devananda	victor_lowther: if the characteristics of the node change in any meaningful way, unintentionally, chances are good that it won't match a Nova Flavor any more	19:17
victor_lowther	unless it lost half the ram due to someone stealing it.	19:18
jroll	victor_lowther: right. then by the existing spec: the user ends up calling nova delete, ironic introspects the hardware and updates node.properties, it suddenly has half the ram	19:18
devananda	victor_lowther: so then the node sits idle, but not in an error state, indefinitely, since it no longer can be used by Nova	19:18
victor_lowther	then your inventory control system notices and goes WTF	19:18
victor_lowther	(which should be something besides Ironic)	19:19
devananda	victor_lowther: you assume someone has a CMDB	19:19
devananda	which, I think they should, but I also think that encoding that behavior in ironic is not helpful.	19:19
victor_lowther	and if they do not, how do they track everything?	19:19
devananda	doing the other thing (take the node out of rotation) does not prevent an external system from doing what you expect (namely, noticing the error)	19:19
devananda	victor_lowther: napkins	19:20
victor_lowther	Excel	19:20
NobodyCam	most testing env's don't have a cmdb	19:20
devananda	PaulCzar: if you don't want to run swift, and you are building a virtualized CI system for Ironic, then yes, you probably want to stick to pxe_ssh driver	19:20
PaulCzar	finding out all of this the hard way :)	19:21
devananda	PaulCzar: that said, the Agent is pretty spiffy, and I didn't think running swift would be enough of a burden to prevent someone from choosing the Agent	19:22
jroll	victor_lowther: even if you do have a CMDB, you'd have to set it up to poll ironic, notice changes, alert, etc etc	19:22
victor_lowther	If y'all intend node.properties to be prescriptive, I don't have an issue with that.	19:22
victor_lowther	that is just a mode that Crowbar has never operated in.	19:22
devananda	victor_lowther: :)	19:22
jroll	victor_lowther: I may be biased, I don't even care to run discovery because I want to tell ironic what should be there	19:22
victor_lowther	jroll: that inevitably fails	19:23
jroll	why?	19:23
jroll	it's working great for me	19:23
victor_lowther	so someone else sorts out inventory from ordering then?	19:23
jroll	we get a spreadsheet from our vendor of exactly what was shipped	19:24
devananda	for my uses, discovery is good once and only twice -- when new hardware arrives and I want to ensure the factory manifest is accurate (which it usually isn't) and after replacing (parts of) hardware, for the same reason	19:24
jroll	we run a shitty python script to take that data and put it in ironic	19:24
devananda	jroll: next you'll tell me that spreadsheet is never wrong	19:24
jroll	plug everything in and pxe boot	19:24
jroll	devananda: for the purposes of node.properties, it's never been wrong :)	19:24
devananda	jroll: fair 'nuf	19:24
* jroll will not say the same about other info there		19:25
devananda	NIC MAC's and IPMI info are the largest problem I've seen so far, actually	19:25
devananda	not # of CPU cores	19:25
jroll	yep, same here	19:25
victor_lowther	Must not order systems in Q4, then. :)	19:25
victor_lowther	but anyways	19:26
victor_lowther	I am fine with logic that has node.properties for hardware config be prescriptive if set	19:26
victor_lowther	we would just need to have logic at certain points in the state machine to check it.	19:27
jroll	agree	19:27
jroll	what we do today as part of zapping is verify that what's in the node is what's in ironic	19:28
jroll	and fail out if there's a mismatch	19:28
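
The check described above (compare what ironic has recorded against what the node actually reports, and fail on a mismatch) could look roughly like the sketch below. This is only an illustration: `verify_properties`, `PropertiesMismatch`, and the property keys are hypothetical names chosen for the example, not Ironic's or IPA's actual API.

```python
# Hypothetical sketch: names and keys are illustrative, not a real Ironic API.
class PropertiesMismatch(Exception):
    """Raised when discovered hardware differs from the recorded properties."""


def verify_properties(expected, discovered,
                      keys=("cpus", "cpu_arch", "memory_mb", "local_gb")):
    """Treat `expected` as prescriptive: fail fast instead of updating it."""
    mismatches = {
        key: (expected.get(key), discovered.get(key))
        for key in keys
        if expected.get(key) != discovered.get(key)
    }
    if mismatches:
        # Maps each differing key to (expected value, discovered value).
        raise PropertiesMismatch(mismatches)
```

With this shape, a node whose RAM has halved raises an error naming `memory_mb` rather than silently getting new properties.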
jroll	I don't know if that's the best route, but it's been working for us, we've caught real hardware failures with this	19:28
devananda	jroll: iiuc, lucas-dinner was proposing to put in an assertion-check at the beginning of deployment	19:29
jroll	devananda: right, I disagree, too slow	19:29
devananda	I would strongly prefer to put that before making the node available	19:29
jroll	yep	19:29
victor_lowther	so, in INTROSPECTING, then? :)	19:29
jroll	IMO no	19:30
jroll	introspecting should be manually triggered, look at what's there and update the node object, no questions ask	19:30
*** ndipanov is now known as ndipanov_gone		19:30
victor_lowther	ok	19:30
jroll	asked*	19:30
victor_lowther	then it should be outside the state machine	19:30
jroll	but... folks can and will disagree with me :)	19:30
jroll	I mean, it's still a valid state	19:31
victor_lowther	since it can happen any time	19:31
victor_lowther	by operator request	19:31
victor_lowther	sort of like maintenance mode.	19:31
jroll	no, it can only happen in certain states (available/init/maybe more)	19:31
jroll	like, you couldn't send a DEPLOYING node into introspection	19:31
*** pelix has quit IRC		19:31
jroll	because it's busy	19:31
devananda	not in available	19:32
jroll	so you would go available -> init -> introspecting/	19:32
jroll	?	19:32
jroll	that would be fine with me	19:32
victor_lowther	meh	19:32
devananda	jroll: s/init/managed/	19:32
jroll	sure	19:32
devananda	it's just a word, but i think that substitution has value in our discussion	19:32
jroll	so many new words	19:32
victor_lowther	I would just set maintenance mode, run introspection, verify that I got what I expect, and unset maintenance mode.	19:33
devananda	maintenance mode doesn't affect current state, and can be applied to any state	19:33
victor_lowther	Exactly.	19:33
devananda	what about requiring the node be transitioned back to managed (aka init) in order to initiate introspection?	19:34
victor_lowther	back to the zapping can change things point	19:34
devananda	a node that is managed but not available for use yet can be (re)introspected	19:34
devananda	zapping is orthogonal	19:34
devananda	I can zap at that point. I can also zap between delete and available	19:34
devananda	it's automatic between delete and available	19:35
devananda	it's manual on a managed node	19:35
devananda	jroll: I think this is closer to what ya'll are doing today?	19:35
jroll	devananda: without the intermediate managed state, yes	19:35
jroll	(and we don't have any concept of introspection, ofc)	19:36
devananda	sure	19:36
devananda	do you zap nodes which are available to nova?	19:36
devananda	or do you do something to take them out of scheduling first?	19:36
*** mrda-away is now known as mrda		19:36
mrda	Morning Ironic	19:36
NobodyCam	morning mrda	19:37
jroll	devananda: we zap available nodes, they go to the "ZAPPING" state (DECOMMISSIONING downstream) where they can no longer be scheduled	19:37
devananda	k	19:37
devananda	so there's a small race there	19:37
jroll	yet another :)	19:38
devananda	jroll, victor_lowther: http://paste.openstack.org/show/XsFt3wtJevf8LCcCB5fG/	19:38
devananda	lemme know what you think	19:38
jroll	we don't actually do that very often, only if e.g. we get new firmware or whatever	19:38
jroll	is R:thing a request for thing?	19:39
devananda	jroll: yes. see victor's draft for an explanation	19:39
devananda	jroll: tldr; the PUT to the REST API	19:39
jroll	thought so, thanks	19:39
jroll	right	19:39
devananda	[FOO*] indicates FOOING, FOOED, and FOOFAIL	19:39
victor_lowther	hm	19:39
jroll	right	19:40
victor_lowther	I think it is weird to have [ZAP*] in two places in the graph	19:40
devananda	victor_lowther: i agree	19:40
devananda	but I wanted it to be clear that managed -> zap -> managed is a manually-invoked process	19:40
devananda	and active -> delete -> zap -> available is automatic	19:41
devananda	wasn't sure how else to do that	19:41
victor_lowther	hm...	19:41
jroll	devananda: yeah, I think this looks fine	19:41
devananda	i also left out preboot :(	19:41
victor_lowther	managed -> zap+flash -> managed	19:41
victor_lowther	and delete -> zap -> available	19:42
victor_lowther	preboot is easy to forget	19:42
victor_lowther	I would just make it a rule that all fooing states should be booted into something	19:43
victor_lowther	maybe.	19:43
victor_lowther	Have to think about what that entails.	19:43
victor_lowther	hm...	19:43
victor_lowther	managed -> [mangle] -> [introspect] -> managed?	19:44
devananda	that's not necessarily linear	19:44
victor_lowther	make it clear that zap will not change what the hardware looks like, whereas mangle can?	19:44
* victor_lowther nods		19:44
devananda	which is why i made zap and inspect separate loops	19:44
victor_lowther	but hard to freehand nonlinear things in irc. :)	19:45
devananda	indeed	19:45
devananda	jroll: could you guys implement the long-running-ramdisk stuff within the zap* state?	19:46
jroll	devananda: as in, boot the ramdisk at the end of zap*	19:46
jroll	seems weird	19:46
devananda	yes	19:46
jroll	I don't think we need the preboot state, tbh	19:46
devananda	it's part of "get it ready for provisioning again"	19:46
devananda	right?	19:46
jroll	I guess?	19:46
jroll	I mean, it's an optimization	19:47
devananda	that's what I'm getting at	19:47
NobodyCam	I agree with jroll sounds strange at first read	19:47
jroll	you can schedule to nodes that are prebooted or not prebooted	19:47
PaulCzar	any benefits to using ipxe over pxe ?	19:47
PaulCzar	reading through the pxe docs right now	19:47
devananda	preboot as a separate requestable state, which is essentially just a permutation of AVAILABLE, doesn't fit in the general case	19:47
jroll	one is faster, should prefer one, just check power on to decide	19:47
victor_lowther	PaulCzar: less tftp traffic	19:47
jroll	PaulCzar: http > pxe	19:47
PaulCzar	ipxe seems to be simpler ?	19:47
jroll	err	19:47
devananda	PaulCzar: http > tftp	19:47
jroll	http > tftp	19:47
*** lucas-dinner has quit IRC		19:48
devananda	PaulCzar: PXE is simpler in a sense. iPXE is both more extensible and more robust.	19:48
victor_lowther	potential downside is that booting to local disk can be problematic depending on firmware	19:48
openstackgerrit	Jarrod Johnson proposed stackforge/pyghmi: Implement server side IPMI protocol (WIP) https://review.openstack.org/138109	19:48
devananda	PaulCzar: you'll need a tftp service (just run tftpd) for PXE. you'll also need an HTTP service (eg, apache) for iPXE	19:48
PaulCzar	devananda: right ... I'll start with pxe then ... although I do like the idea of http ... tftp is super slow	19:49
jjohnson2	tftp means a really ugly workaround for > 65k blocks, and tftp is easy to implement in software, meaning it isn't fancy enough to do things like send more than one packet without acknowledgement	19:49
devananda	PaulCzar: start simple ++	19:49
devananda	jroll: if you just want a scheduling hint, why not use something in node.properties ?	19:50
devananda	jroll: we don't need a discrete state for that	19:50
devananda	like a nova filter which prefers nodes with "is_prewarmed" in node.properties['capabilities'] over ones that do not have that key	19:51
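
The scheduling hint suggested here (prefer nodes carrying "is_prewarmed" over those without it, without adding a state) can be sketched as a simple ordering. The node layout below, with `capabilities` as a dict under `properties`, and the function name `prefer_prewarmed` are assumptions made for the example; this is not an actual nova filter or weigher.

```python
# Hypothetical sketch: node layout and function name are assumptions.
def prefer_prewarmed(nodes):
    """Order candidate nodes so those flagged is_prewarmed come first.

    Nodes without the key stay schedulable; they are only ranked lower.
    """
    def is_prewarmed(node):
        capabilities = node.get("properties", {}).get("capabilities", {})
        return "is_prewarmed" in capabilities

    # sorted() is stable, so the original order is kept within each group.
    return sorted(nodes, key=lambda node: not is_prewarmed(node))
```

The point of the design is that prewarming stays a preference, so a cold node can still be picked when no prewarmed one is left.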
victor_lowther	that is what I suggested to hint to Ironic whether and what to preboot.	19:51
devananda	victor_lowther: the question in my mind right now is, how would one indicate that to ironic	19:52
devananda	either ...	19:52
devananda	- all nodes preboot all the time	19:52
devananda	- none preboot automatically, but an operator (or external service) can manually request a specific node to preboot	19:53
devananda	- some magical orchestration logic gets added to ironic with knobs that allow it to decide when and how many to preboot	19:53
devananda	i clearly don't like option #3	19:53
victor_lowther	definitely not 3	19:53
victor_lowther	I would lean to 2	19:53
devananda	if it's 1, I would say it belongs in ZAP*	19:54
devananda	and I don't actually see a benefit to 2	19:54
jroll	devananda: yeah, that's fine, I've never thought we needed preboot states	19:54
devananda	and I don't actually see a benefit to 2 -- as a separate state in the state machine	19:54
jroll	devananda: yeah, actually, in ZAP* might work	19:55
* jroll looks at code		19:55
devananda	jroll: cool. i'm curious to know how you'd implement 2 in ZAP*	19:55
victor_lowther	so, how to tell what gets prebooted?	19:55
victor_lowther	(the image, not the nodes)	19:55
victor_lowther	in that case?	19:55
jroll	devananda: 2 would not work in ZAP*	19:55
NobodyCam	brb	19:55
devananda	jroll: hm. ok. then ya'll would just preboot all the time?	19:56
jroll	yes	19:57
jroll	I see why others might want to only preboot some	19:57
jjohnson2	fyi, the ipmi target implementation can sanely respond to a get channel auth request	19:58
jroll	devananda: what we do now is at the end of ZAP*, we reboot instead of power off	19:58
jjohnson2	NobodyCam,so not too much work and the hard part is done	19:58
jjohnson2	too much more work until the hard part is over that is	19:58
*** ParsectiX has joined #openstack-ironic		19:58
devananda	jroll: nice. that's what I thought. so this works for you	19:59
devananda	jroll: and I'm not aware of any other contributors working on / asking for a prewarmed-some-of-the-time optimization. so I'm OK with not making this more complicated to accommodate that	19:59
*** spandhe has quit IRC		19:59
victor_lowther	ok	20:00
*** spandhe has joined #openstack-ironic		20:00
jroll	devananda: sure, makes sense	20:00
devananda	victor_lowther: new version of mine coming in 10m	20:00
victor_lowther	so to me that sounds like PREBOOT* should vanish in favor of a "don't shut down when you hit AVAILABLE" flag.	20:01
victor_lowther	coupled with "ZAP* always boots into something like discoverd or IPA"	20:01
*** andreykurilin_ has joined #openstack-ironic		20:02
devananda	gah. TC meeting ... make that 2 hours	20:02
* jroll also has a meeting now		20:02
victor_lowther	:) No worries. I am not going to so anything with the spec until more folks have commented on it anyways.	20:03
victor_lowther	er, do anything.	20:03
devananda	http://paste.openstack.org/show/JZ0tjouzcFYWZx1P18z6/	20:04
jroll	devananda: transition from AVAILABLE to MANAGED?	20:04
devananda	yes	20:04
devananda	missing an ^ arrow	20:05
jroll	devananda: other than that, lgtm, would love for victor_lowther to push a new patchset with this	20:07
*** Masahiro has joined #openstack-ironic		20:07
NobodyCam	jjohnson2: I'll have a look in a bit	20:08
NobodyCam	but awesome	20:08
victor_lowther	I will, but not until this evening. Have to give interested folks on the far side of the planet their chance to point out other things I got wrong.	20:08
rloo	devananda: I'm not sure how you get from delete->zap (unless I am misinterpreting [DELET*/AVAILABLE] -- doesn't this mean go to AVAILABLE after deleted?	20:09
devananda	rloo: it means the target state is AVAILABLE	20:09
devananda	but it can go through anything the state machine wants to along the way	20:09
jroll	victor_lowther: if we've already agreed something is wrong, we should fix it asap so that people aren't commenting on outdated things	20:09
devananda	current / target	20:09
devananda	that tracks the difference between ZAP/MANAGE and ZAP/AVAILABLE	20:10
jroll	victor_lowther: give them a chance to comment on the new state machine overnight, rather than change it tomorrow and wait another night	20:10
devananda	victor_lowther: if you dont mind and dont have time, i can just push a new rev over yours, with that picture in it	20:10
rloo	devananda: hmm. ok, it is the only one that is different (target state of others, go to that state next)	20:11
victor_lowther	it is not just the picture, there is feedback that I inadvertently ignored from the last rev that also needs to be incorporated.	20:11
rloo	devananda: you lost me with delet* -> zap/manage vs delet* -> zap/available. I don't see that in the diagram.	20:11
jroll	rloo: right, the ACTIVE -> AVAILABLE transition goes through deleting and zapping	20:11
devananda	rloo: ZAP is not a target state, ever, actually	20:11
victor_lowther	but a comment on the current rev with a link to your new state machine would be appreciated. :)	20:12
devananda	victor_lowther: ack, will do	20:12
jroll	devananda: hmm, so what's the API call for MANAGED -> ZAP -> MANAGED	20:12
*** Masahiro has quit IRC		20:12
jroll	target: ZAPPED?	20:12
devananda	jroll: PUT /... {target: zap}	20:12
JayF	As a note, since you're talking about this; I do have a proposal up on the state machine spec that suggests "decom" work be done in another step other than ZAPPING	20:12
jroll	ok	20:12
jroll	JayF: I don't love that :(	20:13
devananda	which goes to current=ZAPPING,target=MANAGED	20:13
JayF	because ZAPPING being arbitrary stuff or security cleanups is incredibly confusing	20:13
devananda	JayF: look at http://paste.openstack.org/show/JZ0tjouzcFYWZx1P18z6/	20:13
devananda	JayF: and tell me if you still think that	20:13
devananda	current=ZAPPING,target=MANAGED is different from current=ZAPPING,target=AVAILABLE	20:14
devananda	JayF: which I think captures enough information for your needs	20:14
JayF	so what kicks things from MANAGED to AVAILABLE	20:14
devananda	a human	20:14
victor_lowther	I would rename ZAP/AVAILABLE to CLEAN/AVAILABLE	20:14
jroll	devananda: yep, got it	20:14
devananda	we could just rename ZAP to CLEAN	20:15
victor_lowther	nah	20:15
devananda	:)	20:15
JayF	If we rename zap to clean for the case of cleanup after deleted	20:15
JayF	that's exactly what I wanted	20:15
victor_lowther	because then we can rule that only cleaning stuff happens in the ->AVAILABLE transition, and if you want to do arbitrary things you have to MANAGE the node first.	20:15
victor_lowther	and ZAP/MANAGE would be clean + other interesting things.	20:16
*** r-daneel has quit IRC		20:17
*** r-daneel has joined #openstack-ironic		20:17
rloo	+1 for not having both zap/managed and zap/available. clean/available works for me.	20:22
JayF	victor_lowther: exactly what I was thinking. Although ZAP/MANAGE should not do the CLEAN steps	20:23
victor_lowther	JayF: why not?	20:23
jroll	JayF: what are the CLEAN steps? how do you update your fleet's firmware?	20:23
devananda	JayF: sure it should	20:23
JayF	jroll: by managing the node, then using zap	20:23
jroll	Although ZAP/MANAGE should not do the CLEAN steps	20:24
jroll	""	20:24
JayF	victor_lowther: My thought is more that any state that goes to ZAP/MANAGE would've already been CLEAN	20:24
jroll	idgi	20:24
victor_lowther	if they don't then you potentially have not zeroed the disks on a node you are transitioning to AVAILABLE for the first time.	20:24
jroll	unless steps can be in both ZAP and CLEAN	20:24
JayF	12:16:27 <victor_lowther> and ZAP/MANAGE would be clean + other interesting things. <-- that's the part I disagree with	20:24
jroll	yeah, that too	20:24
JayF	jroll: ++ that's more that I was thinking	20:24
JayF	ZAP steps can be identical to CLEAN steps if that's what you want	20:24
jroll	JayF: I think victor is adding things like "build a raid" into that	20:24
victor_lowther	yy	20:24
JayF	for purposes of ZAP, sure	20:25
jroll	I think ZAP/MANAGE should likely be CLEAN + other things	20:25
JayF	but I'm just saying we shouldn't do all the 'decom' / 'clean' steps in ZAP/MANAGE	20:25
jroll	I can't think of anything you wouldn't do	20:25
JayF	How about secure erasing a JBOD	20:25
jroll	especially because ZAP/MANAGE is how you bring in a new node	20:25
JayF	ZAP/MANAGE would also be used for people who wanted to change a node config after it's been created, right?	20:26
victor_lowther	JayF: what specifically would you not want to do in ZAP/MANAGE that you would want to do in CLEAN/AVAILABLE?	20:26
devananda	JayF: zap/manage is a superset of zap/available	20:26
JayF	I disagree with ^ the idea that zap/manage should be a superset of zap/available	20:26
JayF	this is honestly part of why I don't want clean/zap conflated	20:26
devananda	though as I read this -- I am more and more leaning towards zap/manage and clean/available	20:26
JayF	if we want CLEAN, we should do a CLEAN then do a ZAP	20:26
victor_lowther	other way around.	20:27
JayF	er, okay	20:27
NobodyCam	++ other way around	20:27
JayF	maybe going from MANAGE/ZAP could have an optional trip through CLEAN/AVAILABLE	20:27
JayF	which seems like it'd fulfill your desires	20:27
JayF	without mixing the things together which I don't like	20:27
victor_lowther	I don't think it should be optional	20:27
devananda	JayF: if we have CLEAN/AVAILABLE, do you think we can put all long-running reconfiguration tasks outside of the AVAILABLE loop entirely?	20:27
JayF	define:long-running	20:28
devananda	eg, require operators to transition a node back to MANAGED in order to do things which are not part of CLEAN	20:28
victor_lowther	I basically always want to CLEAN things before they go into production	20:28
devananda	for what ever definition of CLEAN you use	20:28
*** lucasagomes has joined #openstack-ironic		20:28
JayF	victor_lowther: +1 I agree	20:29
JayF	I just think that CLEAN before going into prod should actually just go into CLEAN/AVAILABLE	20:29
JayF	after ZAP/MANAGE is done	20:30
JayF	pretty much anytime something "leaves" the AVAILABLE loop, its reentry point (if cleaning enabled) should be CLEAN/AVAILABLE	20:30
victor_lowther	well, then we will have to redraw the graph.	20:30
devananda	JayF: what would you do in one state that you wouldn't do in the other?	20:30
victor_lowther	but I am fine with that.	20:30
victor_lowther	i.e: MANAGE -> CLEAN -> AVAILABLE	20:31
JayF	devananda: I don't think that's even relevant here	20:31
victor_lowther	instead of MANAGE -> AVAILABLE	20:31
JayF	if we're saying a node should be CLEAN before being AVAILABLE	20:31
JayF	we should just have it transit that state	20:31
JayF	rather than overloading ZAP to do n+CLEAN without going through the actual CLEAN state	20:31
devananda	JayF: I'd like to know, though. what would you do in one state that you wouldn't do in the other?	20:32
devananda	also, maybe someone already said that and I missed it -- multitasking meetings is great	20:32
JayF	devananda: we have steps in our CLEAN that would change BIOS settings to enable access to some hardware then flips them back later	20:32
JayF	if there's a case where I'm doing something like rebuilding a RAID, I may not want those settings changed	20:32
JayF	or generally doing anything unnecessary that can be limited by write cycles (like reflashing a firmware)	20:33
jroll	mmm.	20:33
*** ParsectiX has quit IRC		20:33
JayF	plus I might enqueue a ZAP task to fix a node that was in CLEAN FAILED	20:34
devananda	JayF: so that would leave the managed -> zap -> managed loop intact	20:34
devananda	JayF: change zap/available to clean/available	20:34
JayF	devananda: Yah; I'm very OK with that	20:34
devananda	JayF: and insert that step in the connection from managed -> available as well	20:35
JayF	then MANAGED -> [CLEAN] -> AVAILABLE	20:35
devananda	right	20:35
dlaube	ZAP? sounds pretty ….shocking.	20:35
dlaube	:P	20:35
victor_lowther	dlaube: it is my greatest contribution to Ironic ever.	20:37
*** anderbubble has quit IRC		20:38
*** jjohnson2 has quit IRC		20:39
devananda	JayF: http://paste.openstack.org/show/b0XISoOBBfD48UrSB0CI/	20:40
dlaube	victor_lowther: is this like IPA but with extra awesome sauce added or something?	20:40
dlaube	:D	20:40
JayF	devananda: dumb question; what's the "R:" in that diagram?	20:41
victor_lowther	It has _all_ the awesome sauce. And sprinkles.	20:41
victor_lowther	JayF: API call	20:41
jroll	JayF: the PUT request requesting a state	20:41
JayF	that's what I thought	20:41
jroll	dlaube: it's things like wiping disks, flashing firmware, etc	20:41
JayF	devananda: +1	20:41
victor_lowther	dlaube: really, just the name of a state.	20:41
*** ParsectiX has joined #openstack-ironic		20:41
jroll	victor_lowther: ... for now :)	20:42
victor_lowther	A state that does awesome things	20:42
jroll	def do_zap()	20:42
victor_lowther	ZOMG!	20:42
dlaube	ahh ok	20:42
jroll	it's going to end up everywhere	20:42
dlaube	gotcha	20:42
victor_lowther	I imagine IPA will be a player in that state.	20:42
JayF	zap == Ironic is doing a thing to the hardware that the operator requested that isn't CLEANING or DEPLOYING	20:42
victor_lowther	where such a thing can be "blow away my RAID array" or "flash all the things"	20:43
JayF	or do a burn-in test	20:44
victor_lowther	that too	20:46
devananda	anyone care to poke more holes in my latest dia?	20:47
lucasagomes	devananda, yup yeah I was suggesting that (checking before deploying)	20:47
NobodyCam	same link?	20:47
* lucasagomes too much scrollback		20:47
devananda	NobodyCam: http://paste.openstack.org/show/b0XISoOBBfD48UrSB0CI/	20:47
jroll	whoa, lucas is here late	20:48
lucasagomes	jroll, devananda the thing is that putting before doesn't help cause you don't know how much time the machine is sitting on available (could be 1 month)	20:48
NobodyCam	devananda: I asked earlier but may have missed the answer. you dropped the rebuild state?	20:48
lucasagomes	jroll, and it could be optional :) so it can be fast	20:48
lucasagomes	for the baremetal-to-tenant use case	20:48
lucasagomes	jroll, yeah	20:48
lucasagomes	I should go sleep :D	20:49
jroll	lucasagomes: I still disagree	20:49
devananda	lucasagomes: we already have a power status check loop for nodes in that state	20:49
devananda	lucasagomes: no reason we couldn't add a similar check loop for other things there	20:49
lucasagomes	power status check?	20:49
lucasagomes	devananda, +1	20:49
lucasagomes	yeah offering some flexibility for checks there is good	20:49
lucasagomes	if operators wants, cause u know someone may have pulled the cable	20:49
devananda	lucasagomes: that runs in the background on available nodes, but not after a user has started booting one	20:49
lucasagomes	like a periodic task?	20:50
devananda	lucasagomes: we'll already notice if someone pulled the IPMI cable out	20:50
devananda	lucasagomes: yes. periodic task	20:50
devananda	which only runs on nodes in AVAILABLE state	20:50
lucasagomes	right, yeah it's fine. Doesn't need to be a state	20:50
lucasagomes	but it's good that we have in mind that this is a valid use case and we need to tackle it somehow	20:50
lucasagomes	devananda, fair enff	20:50
lucasagomes	jroll, seems to disagree, I still think its valid :)	20:51
lucasagomes	but I'm ok with the periodic task	20:51
devananda	lucasagomes: i object to putting in a mandatory inband-status-assertion-check during deploy	20:51
devananda	that'll slow down deploys by way, way too much	20:51
lucasagomes	devananda, it could be optional	20:51
lucasagomes	that's what I'm trying to point, it's not because it's represented as a state that it has to be mandatory	20:52
lucasagomes	zapping is optional afaiui	20:52
devananda	but doing something in a periodic task to assert that nodes which we think are AVAILABLE actually are, and still have the same properties as the last time we checked?	20:52
devananda	sure, that's fine	20:52
*** igordcard has quit IRC		20:52
lucasagomes	devananda, yeah the periodic task works too :0	20:52
devananda	lucasagomes: in a state machine like A -> B -> C, state "B" is not optional	20:52
lucasagomes	:)*	20:52
* jroll just keeps his datacenter locked and doesn't worry about idle servers being changed		20:53
devananda	it may be no-op'd, but it's not optional	20:53
lucasagomes	devananda, that goes back to the FSM	20:53
lucasagomes	state -> action -> state	20:53
NobodyCam	jroll: power supplies do fail	20:53
devananda	jroll: garden gnomes. they're sneaky ...	20:53
lucasagomes	actions can be optional	20:53
lucasagomes	which is where the task runs	20:53
jroll	NobodyCam: I feel like we might notice that through IPMI, dunno	20:53
lucasagomes	in a FSM the engine drives the code from one state to another and just calls some hooks (aka actions)	20:53
lucasagomes	and it could be non-op	20:54
jroll	look, without this, worst case scenario is the deploy fails and is rescheduled	20:54
devananda	jroll: ++	20:54
jroll	which sucks as far as time that 'nova boot' takes	20:54
jroll	but like, isn't that bad	20:54
devananda	and the node gets kicked into maintenance mode	20:54
jroll	error status, but yeah	20:54
jroll	same idea	20:54
devananda	right	20:54
lucasagomes	alright I think we agreed with the periodic task ting	20:55
lucasagomes	thing	20:55
lucasagomes	I don't wanna go back to discuss FSM vs non-FSM	20:55
lucasagomes	(did had a pleasant time doing that)	20:55
lucasagomes	didn't*	20:55
*** anderbubble has joined #openstack-ironic		20:55
*** igordcard has joined #openstack-ironic		20:56
* lucasagomes brb		20:58
*** jjohnson2 has joined #openstack-ironic		21:00
devananda	anyone interested in the cross-project meeting?	21:00
devananda	it's starting now	21:00
NobodyCam	did we have a volenter from our team for that?	21:01
* jroll will be lurking		21:01
devananda	we have folks doing multiple cross project things	21:01
NobodyCam	volunteer	21:01
devananda	api, oslo, stable maint, vuln, etc ...	21:01
devananda	this is a general all kinds of cross project thing	21:01
devananda	thing	21:01
NobodyCam	lucasagomes: when you get back.. take a look at https://review.openstack.org/#/c/138109 if you have the time	21:03
*** marcoemorais has quit IRC		21:03
jjohnson2	well, I officially have coded in VR	21:04
NobodyCam	n VR?	21:04
jjohnson2	NobodyCam, yeah, had my editor up in my oculus	21:04
NobodyCam	oh cool	21:04
jjohnson2	now I'm done doing that	21:04
jjohnson2	it needs a few more pixels before it'll be comfortable developing on a 12 meter screen 10 meters away	21:05
rloo	devananda: wrt your latest diagram, nit: need arrow from MANAGED to AVAILABLE (and should the request be 'manage'? or eg 'available'?)	21:09
devananda	rloo: there is such an arrow. it goes through CLEAN though	21:10
devananda	rloo: and the verb is "provide"	21:10
rloo	devananda: oh, I thought it was going to be optional to clean from MANAGED -> AVAILABLE. guess not.	21:11
rloo	devananda: that was my (hopefully) last question. why is the verb 'provide' instead of 'clean'?	21:11
devananda	rloo: i thought so too, but folks this morning were fairly adamant about it	21:11
JayF	If you have cleaning disabled, obviously that step's a no-op, right?	21:11
devananda	clean could, presumably, decide if it wants to no-op when coming from managed (or something)	21:11
devananda	JayF: sure	21:11
JayF	yeah exactly	21:12
rloo	JayF: I was wondering if you might want to skip the clean when you go from MANAGED-> AVAIL, but always clean after a deploy.	21:12
mrda	rloo: thank you for your wise and thorough review of the logical-name spec	21:12
JayF	this is a plank in my+joshnang's platform	21:12
rloo	I guess if the 'clean' is smart enough to know when it might want to clean. like a lazy janitor :D	21:12
mrda	rloo: but now I have to patch the merged spec as the point you raised is very valid :)	21:14
rloo	mrda: yw. sorry, i meant to look at it sooner. but I feel overwhelmed when I look at the list of specs and my 'method' of starting with the older specs was probably not a good idea.	21:14
NobodyCam	rloo: ++ to starting with older specs ... Thank you	21:15
mrda	rloo: +1	21:15
mrda	rloo: appreciate your comments - had a small brain fade, which I now have to fix :)	21:16
rloo	NobodyCam, mrda: good idea in theory, but I'm learning not to be too strict about it ;)	21:16
rloo	mrda: no worries. I'm sure we would have picked them up at coding time. but I like sooner better than later ;)	21:16
mrda	yup	21:17
rloo	mrda. i think it is difficult to get a spec 'correct'. so good enuf is good enuf!	21:17
jroll	rloo: great comments, sorry I landed that early	21:18
rloo	jroll: that issue with no stack trace for the exception in the conductor/deploy, are you going to handle that? (eg open a ticket or whatever)?	21:18
jroll	gah, did a bug not get filed?	21:19
rloo	jroll: no worries. I don't think you landed that early, we have to get those specs approved.	21:19
rloo	jroll: I don't think so. looking...	21:19
jroll	I'd rather not own that	21:19
rloo	jroll: ok, that was my other question. ok, i'll open a bug for it so we don't forget. I'm only opening a bug ;)	21:20
jroll	ok, thanks	21:20
jroll	we may have a use for lots of low-hanging fruit :)	21:21
rloo	it'll take me longer to write up the bug than to fix it I think ;)	21:21
mrda	So, regarding the logical-name spec, now that I have to patch it - do I just raise a new review with the (small) patch to remove the reference to tenant?	21:21
mrda	and should I worry about a bug?	21:21
* mrda thinks not		21:21
rloo	mrda: yup. i've got a small patch up for some other spec.	21:21
JayF	I wouldn't bug it at all	21:21
rloo	mrda: no bug	21:21
jroll	mrda: jfdi :)	21:22
rloo	ha ha	21:22
mrda	ok cool, I'll raise a new review and fix my brain fade. Thanks for the direction.	21:22
lucasagomes	NobodyCam, sure :) I will add it to the todo list here and review tomorrow morning	21:32
lucasagomes	NobodyCam, it's a bit late now :) /me wants to relax :D	21:32
*** mikedillion has quit IRC		21:32
NobodyCam	lucasagomes: it's the start of jjohnson2's ipmi to system command listener	21:33
lucasagomes	yeah I did a quick skimming	21:33
NobodyCam	:)	21:33
jjohnson2	huh?	21:33
NobodyCam	your wip patch	21:33
lucasagomes	awesome! seems we are going to have the ipmi listener (BiMiC) :)	21:33
jjohnson2	yeah, ipmi 2.0 only	21:34
jjohnson2	I'm doing the rmcp+ open session request parsing now	21:34
*** openstackgerrit has quit IRC		21:34
lucasagomes	cool! 1.5 can come later no hurry (if needed as well)	21:34
*** openstackgerrit has joined #openstack-ironic		21:35
NobodyCam	brb ... /me looks for some food stuffs	21:35
openstackgerrit	Merged openstack/ironic-specs: iRMC Power Driver for Ironic https://review.openstack.org/134487	21:35
jjohnson2	I might further walk the line of ipmitool and pyghmi compatibility testing first	21:35
jjohnson2	cipher suite 3 specifically	21:36
*** alexpilotti has quit IRC		21:38
devananda	now, with REBUILD: http://paste.openstack.org/show/ojbuBbsQGNlDMyz2mnPj/	21:40
*** romcheg has joined #openstack-ironic		21:40
JayF	devananda: [ot] does nova rebuild in Ironic guarantee the same backend node?	21:40
devananda	JayF: yes	21:41
JayF	clearly it does with preserve ephemeral, but what about the other cases?	21:41
openstackgerrit	Merged openstack/ironic-specs: Don't deprecate maint mode updates via node-update https://review.openstack.org/138178	21:46
*** ParsectiX has quit IRC		21:50
NobodyCam	w00 h00 :) rebuild	21:52
*** mikedillion has joined #openstack-ironic		21:53
NobodyCam	devananda: do you think rescue would ever have a need to redeploy anything?	21:54
*** sreekanth has quit IRC		21:54
*** anderbubble has quit IRC		21:55
*** Masahiro has joined #openstack-ironic		21:56
*** ParsectiX has joined #openstack-ironic		21:57
*** ParsectiX has quit IRC		21:59
*** ParsectiX has joined #openstack-ironic		21:59
devananda	NobodyCam: then it's rebuild	22:00
devananda	NobodyCam: rescue should be "net boot this machine into a recovery ramdisk so I can troubleshoot it"	22:00
*** Masahiro has quit IRC		22:01
devananda	which, actually, as an operator, i might want to do from the MANAGEMENT side	22:01
devananda	er, MANAGED	22:01
devananda	without ever going to AVAILABLE or ACTIVE	22:01
devananda	bah	22:01
devananda	why did i have to think of that	22:01
*** linggao has quit IRC		22:02
devananda	right, that's thinking of it the wrong way. time for more coffee.	22:02
*** anderbubble has joined #openstack-ironic		22:04
*** dprince has quit IRC		22:06
NobodyCam	:)	22:06
*** alexpilotti has joined #openstack-ironic		22:08
victor_lowther	devananda: I will get started on the next rev of the state machine spec using your latest graph.	22:10
*** igordcard has quit IRC		22:12
devananda	victor_lowther: cheers	22:13
*** ryanpetrello has quit IRC		22:18
openstackgerrit	Jarrod Johnson proposed stackforge/pyghmi: Implement server side IPMI protocol (WIP) https://review.openstack.org/138109	22:19
*** mjturek has quit IRC		22:21
*** Hefeweizen has quit IRC		22:22
*** mikedillion has quit IRC		22:23
*** ryanpetrello has joined #openstack-ironic		22:23
*** lucasagomes has quit IRC		22:25
JayF	devananda: no	22:29
*** jjohnson2 has quit IRC		22:29
JayF	devananda: rescue is a nova concept; how can you rescue an instance if there is no instance to rescue	22:30
devananda	JayF: right. and the thing I was thinking of is ZAP	22:32
*** foexle has quit IRC		22:38
*** ryanpetrello has quit IRC		22:40
*** anderbubble has quit IRC		22:42
*** ryanpetrello has joined #openstack-ironic		22:44
*** erwan_taf has quit IRC		22:45
openstackgerrit	Michael Davies proposed openstack/ironic-specs: Updates to logical name spec from review 134439 https://review.openstack.org/138565	22:45
JayF	mrda: ^ +2	22:47
*** ryanpetrello has quit IRC		22:48
victor_lowther	devananda: why UNRESCUE?	22:48
devananda	it's not DEPLOYING and it's not RESCUING	22:49
*** anderbubble has joined #openstack-ironic		22:49
victor_lowther	what I mean is	22:49
devananda	returning the instance to the ACTIVE state	22:50
victor_lowther	what is Ironic doing during that state transition?	22:50
JayF	turning machine off	22:50
jroll	rebooting to the instance	22:50
JayF	changing boot device	22:50
JayF	flipping networks	22:50
devananda	probably changing the PXE configs	22:50
JayF	turning machine on	22:50
devananda	possibly those things too	22:50
JayF	aweeks: ^ you should likely read this	22:50
victor_lowther	ok.	22:50
victor_lowther	question answered.	22:50
devananda	so yah. there is an implicit UNRESCUEFAIL in my diagram too	22:50
aweeks	yo	22:51
mrda	thanks JayF	22:51
devananda	as a path forward for the code itself	22:51
aweeks	I'm in the process of implementing rescue mode	22:51
aweeks	internally so far	22:51
devananda	do y'all think it might be worth implementing the current states in a state machine	22:52
devananda	landing that	22:52
devananda	then moving to the new states?	22:52
JayF	That just sounds really confusing tbh	22:52
JayF	versus a clean "break" and migration	22:52
NobodyCam	that seems like more work	22:52
devananda	more work yes. easier to reason about the steps involved, migration / upgrade path, etc	22:52
devananda	JayF: clean break doesn't sound like a good thing to me. why does it to you?	22:53
aweeks	devananda: is there currently a proposal for states related to rescue mode?	22:53
devananda	aweeks: http://paste.openstack.org/show/ojbuBbsQGNlDMyz2mnPj/	22:53
victor_lowther	For (UN)?RESCUING, is ironic responsible for the pxe/bootimg/whatever swizzling, or Something Nova-Ish?	22:53
aweeks	ah, thanks	22:53
devananda	victor_lowther: ironic is	22:53
jroll	victor_lowther: ironic is	22:53
JayF	victor_lowther: Ironic does it; Nova just tells us to rescue/unrescue	22:53
victor_lowther	ok	22:53
jroll	nova has the virt driver calls	22:53
aweeks	yeah, there are two calls: rescue(), unrescue(), and a "RESCUED" state in Nova	22:54
JayF	devananda: At first thought it seems simpler ... but honestly I'd defer to knowledge of others :) Upgrading a state machine without breaking backwards compat is hard :)	22:54
*** jgrimm is now known as zz_jgrimm		22:55
*** marcoemorais has joined #openstack-ironic		22:55
*** anderbubble has quit IRC		22:56
devananda	JayF: let's assume it is hard but not impossible within this cycle. is that worth it?	22:56
aweeks	devananda: in that diagram, what do the s represent? "[RESCU/RESCUE]"	22:56
*** anderbubble has joined #openstack-ironic		22:57
devananda	aweeks: see 133828 and comments thereon	22:57
JayF	devananda: I don't know :)	22:57
*** romcheg has quit IRC		22:57
aweeks	devananda: got it, thanks	22:58
devananda	JayF: massively breaking backwards compat the first cycle after integration isn't exactly on my priority list, btw :)	22:58
*** marcoemorais1 has joined #openstack-ironic		22:58
JayF	devananda: are you sure? I think it'd be great, and there's lots of precedent for it to ;)	22:59
JayF	s/to/too/	22:59
devananda	lol	22:59
*** bradjones has quit IRC		22:59
*** marcoemorais has quit IRC		23:00
aweeks	devananda: not sure if relevant, but my implementation so far only has two states: RESCUEWAIT, and RESCUED in ironic	23:02
aweeks	and implements the rescue() and unrescue() functions in the virt driver	23:02
devananda	aweeks: that's fine for now	23:02
JayF	aweeks: likely you'll want to either convince devananda to adopt the states you see or change the states you're using, lol	23:02
devananda	aweeks: we'll likely rename all the states soon anyway	23:03
aweeks	I don't really care about the names	23:03
* devananda gets out the alphabet soup		23:03
jroll	hehe	23:03
*** marcoemorais1 has quit IRC		23:03
aweeks	devananda: JayF: also, to be clear, the proposal includes removing *WAIT states, and instead has two separate states (the actual state, and a "wait" state)?	23:05
*** marcoemorais has joined #openstack-ironic		23:06
devananda	aweeks: nope. it introduces a not-yet-well-defined wait flag	23:06
*** ryanpetrello has joined #openstack-ironic		23:06
aweeks	hurm	23:06
devananda	it does remove the *WAIT states, though -- that's correct	23:06
devananda	DEPLOYING+wait	23:06
devananda	RESCUING+wait	23:06
devananda	etc	23:07
NobodyCam	rloo: if you have a free minute, can you give https://review.openstack.org/#/c/138565/ a quick look over	23:07
*** harlowja_ is now known as harlowja_away		23:07
*** spandhe has quit IRC		23:08
openstackgerrit	Victor Lowther proposed openstack/ironic-specs: New Ironic provisioner state machine. https://review.openstack.org/133828	23:09
victor_lowther	devananda: I think we should drop the wait flag stuff for now	23:09
*** alexpilotti has quit IRC		23:09
victor_lowther	in the interests of finalizing the spec by the end of the week.	23:09
devananda	victor_lowther: then we need to add WAIT to the STATE description	23:10
aweeks	so, my possibly uninformed perspective is that it seems like the ironic state machine should be a super set of the nova state machine. in that there are a set of states in ironic that are 1-1 with the nova states, and edges in the nova state machine can be replaced by 1 or more states/edges in ironic?	23:10
devananda	victor_lowther: because we must have a DEPLOYWAIT state, or equivalent	23:10
devananda	or the state machine can't handle the current drivers	23:10
victor_lowther	ah	23:10
devananda	aweeks: superset, yes. there are also states within ironic where the node is not even visible to nova	23:11
victor_lowther	Mind throwing which states need that treatment at the newly-updated spec?	23:11
aweeks	the idea being that the ACTIVE (nova) -> rescue() -> RESCUED (nova) -> unrescue() -> ACTIVE (nova) in nova could be transformed into: ACTIVE (ironic) -> rescue() -> INTERNALSTATE (ironic) -> ... RESCUED (ironic) -> INTERNALSTATE (ironic) -> unrescue() -> ACTIVE (ironic)	23:12
*** harlowja_away is now known as harlowja_		23:12
aweeks	with ACTIVE and RESCUED being 1-1 between ironic/nova	23:12
aweeks	but with intermediate states potentially in ironic	23:12
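aweeks's "superset" idea can be sketched as a check that each Nova edge is refined by an Ironic path that starts and ends on the shared states. This is only an illustration under stated assumptions: the intermediate names `RESCUING` and `UNRESCUING` stand in for the "INTERNALSTATE" placeholders, and `refines` is a hypothetical helper, not part of Nova or Ironic.

```python
# States assumed to be 1-1 between Nova and Ironic in this discussion.
SHARED_STATES = {"ACTIVE", "RESCUED"}

# Nova edge (source, verb, target) -> assumed Ironic path for that edge.
IRONIC_PATHS = {
    ("ACTIVE", "rescue", "RESCUED"): ["ACTIVE", "RESCUING", "RESCUED"],
    ("RESCUED", "unrescue", "ACTIVE"): ["RESCUED", "UNRESCUING", "ACTIVE"],
}


def project(path):
    """Drop Ironic-internal states, keeping only those Nova can see."""
    return [state for state in path if state in SHARED_STATES]


def refines(nova_edge, ironic_path):
    """True if the Ironic path looks like the Nova edge from the outside."""
    source, _verb, target = nova_edge
    return project(ironic_path) == [source, target]
```

Any number of internal states can be added to a path without changing what Nova observes, as long as the endpoints stay the same.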
jroll	unrelated: I really wish I could add arbitrary fields to node-list in the client	23:12
devananda	victor_lowther: at a minimum, deploy, clean, zap.. possibly also validate, inspect	23:13
victor_lowther	jroll: I was surprised that you could not.	23:13
aweeks	or similar for other state transitions	23:13
devananda	jroll: you mean, because the client has to change, or the API service doesn't return the fields you want?	23:13
victor_lowther	devananda: so basically, instead of -ING states they should be -WAIT	23:14
victor_lowther	?	23:14
NobodyCam	victor_lowther: "In the active state, Ironic is doing something to the node." just checking that's not referring to the ACTIVE state	23:14
victor_lowther	line?	23:14
NobodyCam	119-120	23:14
jroll	devananda: I mean, as an operator, I want to do a node-list and get last_error as well	23:15
jroll	just throwing it out there	23:15
victor_lowther	no, otherwise it would be in CAPS. That usage refers to the -ING state.	23:15
NobodyCam	:)	23:16
*** alexpilotti has joined #openstack-ironic		23:16
devananda	victor_lowther: more granularly, it may go like this, for some drivers	23:16
devananda	DEPLOYING (ironic-conductor is doing things)	23:16
devananda	DEPLOYWAIT (conductor is idle, lock is released, and an agent is doing something on the node locally)	23:17
devananda	DEPLOYING (ironic conductor is working on it again, since the agent is done)	23:17
devananda	DEPLOYDONE (hand off...)	23:17
devananda	ACTIVE	23:17
devananda	jroll: ^ fair statements?	23:17
jroll	yes	23:18
devananda	I think that is better modelled by DEPLOYING +/- WAIT_FLAG	23:18
victor_lowther	well, if lucasgomes does not like our state machine now...	23:19
victor_lowther	ya	23:19
victor_lowther	what I have been missing is a clear articulation of precisely when and how the wait flag would work.	23:19
devananda	it signals that ironic is mid-task, but has released the lock and is waiting for an external call-back	23:19
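The DEPLOYING +/- wait-flag sequence described above could look roughly like this. It is a sketch only: the flag was explicitly "not-yet-well-defined" at this point, so the `Node` fields, the function names, and the mapping back to the older *WAIT names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    state: str = "AVAILABLE"
    waiting: bool = False  # hypothetical wait flag
    locked: bool = False   # whether the conductor holds the node lock


def start_deploy(node):
    # Conductor takes the lock and begins doing things.
    node.locked, node.state, node.waiting = True, "DEPLOYING", False


def hand_off_to_agent(node):
    # Conductor goes idle and releases the lock; the state name is unchanged.
    node.waiting, node.locked = True, False


def agent_callback(node):
    # External call-back: only valid while mid-task and waiting.
    if not (node.state == "DEPLOYING" and node.waiting):
        raise RuntimeError("unexpected call-back in %s" % node.state)
    node.locked, node.waiting = True, False


def finish_deploy(node):
    node.state, node.locked = "ACTIVE", False


# Assumed mapping from state + flag to the existing *WAIT state names.
LEGACY_WAIT_NAMES = {"DEPLOYING": "DEPLOYWAIT", "RESCUING": "RESCUEWAIT"}


def legacy_state(node):
    if node.waiting:
        return LEGACY_WAIT_NAMES.get(node.state, node.state)
    return node.state
```

Keeping the flag separate means drivers that never hand off to an agent simply never set it, while the older state names can still be derived when needed.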
*** ryanpetrello has quit IRC		23:20
devananda	the same process happens within introspection	23:20
victor_lowther	specifically around how the node handoff to and from whatever external agent works.	23:20
victor_lowther	argh, after 1700 here.	23:20
NobodyCam	thats just what we do now	23:21
devananda	for the PXE driver, there's a waiting period after the machine is first powered on	23:21
victor_lowther	Gotta scram.	23:21
devananda	victor_lowther: ack, ttyl	23:21
*** alexpilotti has quit IRC		23:21
NobodyCam	have a good night victor_lowther	23:21
NobodyCam	thank you for the awesome effort	23:21
NobodyCam	and others too	23:21
dlaube	g'night victor_lowther	23:22
*** bradjones has joined #openstack-ironic		23:28
*** Haomeng has joined #openstack-ironic		23:33
*** spandhe has joined #openstack-ironic		23:34
*** Haomeng\|2 has quit IRC		23:34
*** anderbubble has quit IRC		23:35
*** andreykurilin_ has quit IRC		23:39
*** Masahiro has joined #openstack-ironic		23:45
rloo	mrda: I was just looking at your patch 138565	23:47
rloo	mrda: does it say anywhere that the logical names must be unique?	23:47
*** yuanying has joined #openstack-ironic		23:47
NobodyCam	rloo: I inferred that from the 1:1 uuid statement... mabe incorrectly	23:47
NobodyCam	maybe*	23:47
rloo	mrda: or is that what '1:1 mapping between a <logical name> and a <node uuid>' means.	23:48
rloo	NobodyCam: ok. I must be tired, I don't remember what 1:1 mapping means!	23:48
* NobodyCam just found https://wiki.openstack.org/wiki/OpenStackClient/HumanInterfaceGuidelines		23:48
JayF	rloo: 1:1 mapping means no duped names or uuids	23:49
JayF	rloo: each name maps to exactly one uuid and vice-versa	23:49
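JayF's definition of a 1:1 mapping can be illustrated with a small two-way lookup that rejects duplicates in either direction. This is a hypothetical sketch, not how the logical-name spec or Ironic implements it; `NameRegistry` and its methods are invented for the example.

```python
class NameRegistry:
    """Toy 1:1 mapping between logical names and node UUIDs."""

    def __init__(self):
        self._uuid_by_name = {}
        self._name_by_uuid = {}

    def assign(self, name, uuid):
        # Reject a name already bound to a different UUID, and vice versa.
        if self._uuid_by_name.get(name, uuid) != uuid:
            raise ValueError("name %r already in use" % name)
        if self._name_by_uuid.get(uuid, name) != name:
            raise ValueError("uuid %r already has a name" % uuid)
        self._uuid_by_name[name] = uuid
        self._name_by_uuid[uuid] = name

    def uuid_for(self, name):
        return self._uuid_by_name[name]

    def name_for(self, uuid):
        return self._name_by_uuid[uuid]
```

Because both directions are checked, uniqueness of the logical name follows directly from the mapping being 1:1.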
NobodyCam	does this mean we need to support --format in our cli	23:49
rloo	JayF: thx for clarifying!	23:49
*** Masahiro has quit IRC		23:49
JayF	NobodyCam: openstackclient != python-*client iirc	23:49
JayF	NobodyCam: I think openstackclient is the "openstack" command/sdk people are working on	23:49
NobodyCam	ok I took it as openstack clientS	23:50
NobodyCam	you are correct	23:50
*** yuanying_ has quit IRC		23:50
rloo	NobodyCam: Thx for asking; I'm good with 138565. Would you do the honours and approve it?	23:51
NobodyCam	:)	23:51
NobodyCam	rloo will do	23:51
NobodyCam	rloo: done	23:51
mrda	rloo: yes, a 1:1 mapping between logical_name and uuid implies that logical_name needs to be unique	23:52
mrda	(or at least I intended it to be so)	23:52
openstackgerrit	Josh Gachnang proposed openstack/ironic-python-agent: Use LLDP to get switch port mapping https://review.openstack.org/92627	23:52
NobodyCam	thats how I took it	23:52
*** Hefeweizen has joined #openstack-ironic		23:52
NobodyCam	JoshNang: ooooo neat-oh	23:52
JoshNang	NobodyCam: :D	23:52
JoshNang	it's basically going to require a custom hardware manager per switch manufacturer though. lldp is a not great format	23:53
JayF	NobodyCam: we're running that in our prod hw manager to verify our ports are accurate today	23:53
* mrda just upgraded his internet from 6/0.3 to 30/1. It's a nice change :)		23:54
openstackgerrit	Merged openstack/ironic-specs: Updates to logical name spec from review 134439 https://review.openstack.org/138565	23:54
JayF	mrda: congratulations, that's what, like 20% of all the internet down there :P	23:54
NobodyCam	lol	23:54
mrda	lol, not exactly. Some people are getting 1000 down.	23:55
NobodyCam	JayF: would that look like this: https://scholarworks.iu.edu/dspace/bitstream/handle/2022/171/image9CP.JPG	23:56
mrda	ADSL -> Cable	23:56
* NobodyCam has cable modem installed in his RV :) 100/30 atm I think		23:56
JayF	NobodyCam: which one of the cans is for the nsa?	23:56
NobodyCam	lol	23:56
mrda	NobodyCam: I can go there too, but it's an extra 20/month. I'll see how this goes for now.	23:57
JayF	that's nice.	23:57
JayF	I have 50 down now but I can get 300 down for like $20/month	23:58
mrda	It's really the change from 0.3 to 1 up that's important. Video calls are hard at 0.3 up.	23:58
NobodyCam	:) /me pays a bit more as he never gets the "contract" price	23:58
NobodyCam	mrda: audio only is ruff at .3	23:59
*** ryanpetrello has joined #openstack-ironic		23:59
