*** Haomeng|2 has joined #openstack-ironic | 00:01 | |
dlaube | jroll: I last cloned devstack right around the juno release | 00:02 |
---|---|---|
jroll | dlaube: ah, that console thing is new | 00:02 |
*** Haomeng has quit IRC | 00:02 | |
*** smoriya has joined #openstack-ironic | 00:03 | |
*** russellb has joined #openstack-ironic | 00:04 | |
*** davideagnello has quit IRC | 00:08 | |
*** davideagnello has joined #openstack-ironic | 00:10 | |
dlaube | I do see console logs though | 00:11 |
dlaube | going to try another nova boot and will share a paste of the output | 00:11 |
*** ChuckC_ has joined #openstack-ironic | 00:12 | |
jroll | dlaube: right, there are console logs for the VM, but IPA logs won't be in there | 00:13 |
NobodyCam | also dlaube have you seen : https://ask.openstack.org/en/question/50080/ironicdriversmodulesagent-node-command-status-errored-error-downloading-image/ | 00:14 |
*** Masahiro has joined #openstack-ironic | 00:15 | |
*** pensu has joined #openstack-ironic | 00:18 | |
*** Masahiro has quit IRC | 00:19 | |
dlaube | crap, yeah.. baremetal console log just shows it sitting at a coreos login | 00:20 |
*** ChuckC_ is now known as ChuckC | 00:20 | |
dlaube | presumably the deploy image | 00:20 |
dlaube | before it goes to install my ubuntu | 00:20 |
jroll | yeah, with newer devstack it will keep logging | 00:24 |
jroll | here, lemme grab the patch | 00:24 |
jroll | you can just patch locally | 00:24 |
jroll | dlaube: https://review.openstack.org/#/c/136867/2/lib/ironic | 00:25 |
*** ryanpetrello has quit IRC | 00:26 | |
jroll | you can actually just edit ironic.conf and restart conductor | 00:26 |
*** ChuckC has quit IRC | 00:29 | |
dlaube | thanks jroll! | 00:33 |
dlaube | I know how to restart ironic deploy via apt on our lab… but in devstack I normally ./unstack.sh and ./stack.sh | 00:33 |
dlaube | is there an easier way to restart just ironic conductor in devstack? | 00:34 |
*** anderbubble has joined #openstack-ironic | 00:34 | |
JayF | yeah, connect up to the screen for it (ir-cond should be the name) | 00:35 |
JayF | ^c and restart the process | 00:35 |
dlaube | thanks JayF | 00:35 |
JayF | I think I usually hit ctrl+c then hit up to get the command used to spawn it | 00:36 |
NobodyCam | JayF: ++ | 00:36 |
openstackgerrit | Michael Davies proposed openstack/ironic-specs: Proposal to add logical names to Ironic nodes https://review.openstack.org/134439 | 00:39 |
*** pensu has quit IRC | 00:40 | |
*** penick has quit IRC | 00:40 | |
jroll | dlaube: :) np | 00:40 |
*** lucas-dinner has quit IRC | 00:41 | |
*** penick has joined #openstack-ironic | 00:42 | |
*** penick has quit IRC | 00:42 | |
*** Marga_ has quit IRC | 00:45 | |
*** Marga_ has joined #openstack-ironic | 00:46 | |
dlaube | hmm | 00:46 |
dlaube | can only find the sample conf | 00:46 |
dlaube | heh | 00:46 |
*** ryanpetrello has joined #openstack-ironic | 00:46 | |
dlaube | root@lab7:~/devstack# find /opt/stack/ -name ironic.conf* | 00:46 |
dlaube | /opt/stack/ironic/etc/ironic/ironic.conf.sample | 00:46 |
*** igordcard has quit IRC | 00:47 | |
*** Marga_ has quit IRC | 00:52 | |
*** Masahiro has joined #openstack-ironic | 00:52 | |
*** Marga_ has joined #openstack-ironic | 00:52 | |
*** ryanpetrello has quit IRC | 00:53 | |
*** ryanpetrello has joined #openstack-ironic | 00:56 | |
*** ryanpetrello has quit IRC | 01:00 | |
*** spandhe has quit IRC | 01:10 | |
*** anderbubble has quit IRC | 01:28 | |
*** Masahiro has quit IRC | 01:30 | |
*** Masahiro has joined #openstack-ironic | 01:33 | |
dlaube | hmm | 01:33 |
*** r-daneel has quit IRC | 01:50 | |
*** Marga_ has quit IRC | 01:58 | |
zer0c00l | What are the drivers used by the devstack setup mentioned here? http://docs.openstack.org/developer/ironic/dev/dev-quickstart.html#deploying-ironic-with-devstack | 01:58 |
zer0c00l | i see the power driver is ssh | 01:58 |
zer0c00l | Does it use the pxe driver? | 01:58 |
zer0c00l | If i have to make ironic use my custom driver, where should i add it? | 01:58 |
zer0c00l | IRONIC_ENABLED_DRIVERS ? | 01:58 |
zer0c00l | Also how do i see the existing enabled drivers? | 01:59 |
zer0c00l | My openrc looks like this http://paste.fedoraproject.org/155684/85573141 | 01:59 |
zer0c00l | i can't see anything other than the ssh driver mentioned in the ironic conductor log | 02:00 |
jroll | dlaube: should be /etc/ironic/ironic.conf | 02:00 |
jroll | zer0c00l: IRONIC_ENABLED_DRIVERS, yeah, and restack | 02:01 |
jroll | (I think) | 02:01 |
*** nosnos has joined #openstack-ironic | 02:02 | |
zer0c00l | jroll: enabled_drivers = fake,pxe_ssh,pxe_ipmitool | 02:02 |
openstackgerrit | Tan Lin proposed openstack/ironic: Fixed typo in Drac management driver test https://review.openstack.org/138028 | 02:02 |
jroll | zer0c00l: yeah, it needs to be there | 02:03 |
zer0c00l | It is mentioned in /etc/ironic/ironic.conf | 02:03 |
jroll | IRONIC_ENABLED_DRIVERS in devstack makes it go there | 02:03 |
zer0c00l | so if i add a new one there and restart the conductor the new driver should be loaded | 02:03 |
zer0c00l | ? | 02:03 |
jroll | yes | 02:03 |
jroll | oh | 02:03 |
zer0c00l | sure. Thanks! | 02:03 |
jroll | you need it in setup.cfg | 02:03 |
zer0c00l | setup.cfg? | 02:03 |
zer0c00l | of ironic? | 02:03 |
jroll | and that requires setup.py install | 02:03 |
jroll | yes | 02:03 |
*** marcoemorais has quit IRC | 02:03 | |
Haomeng|2 | zer0c00l: ironic.conf | 02:03 |
zer0c00l | ok | 02:04 |
Haomeng|2 | zer0c00l: you can change it in ironic.conf, for example - enabled_drivers = fake,pxe_ssh,pxe_ipmitool | 02:04 |
Haomeng|2 | zer0c00l: and restart the ironic-conductor process | 02:05 |
zer0c00l | Got it! | 02:05 |
zer0c00l | Thanks | 02:05 |
Haomeng|2 | zer0c00l: welcome | 02:05 |
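The `enabled_drivers` edit discussed above can be sketched in Python. This is only a minimal sketch, assuming an INI-style ironic.conf with `enabled_drivers` under `[DEFAULT]`; the helper `enable_driver` and the driver name `my_driver` are hypothetical. As jroll notes, a custom driver also needs an entry in Ironic's setup.cfg and a `setup.py install` before a restarted conductor can load it.

```python
import configparser


def enable_driver(conf_text, driver):
    """Return the enabled_drivers value with `driver` appended if missing."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # Assumption: the option lives in the [DEFAULT] section.
    current = parser.get("DEFAULT", "enabled_drivers", fallback="")
    drivers = [d.strip() for d in current.split(",") if d.strip()]
    if driver not in drivers:
        drivers.append(driver)
    return ",".join(drivers)


# Sample content taken from the value quoted in the channel.
sample = "[DEFAULT]\nenabled_drivers = fake,pxe_ssh,pxe_ipmitool\n"
print(enable_driver(sample, "my_driver"))  # "my_driver" is a hypothetical name
```

The helper only computes the new value; writing it back to the file and restarting ironic-conductor are left to the operator.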
*** Dafna has quit IRC | 02:16 | |
*** takadayuiko has joined #openstack-ironic | 02:16 | |
openstackgerrit | Haomeng,Wang proposed openstack/ironic: boot_devices.PXE value should match with pyghmi define https://review.openstack.org/137745 | 02:20 |
*** Nisha has joined #openstack-ironic | 02:25 | |
zer0c00l | The "Boot" and the "Deploy" Stuff are not decoupled in the ironic pxe drivers? | 02:30 |
zer0c00l | PXE is a boot method, iscsi is a deploy method | 02:30 |
zer0c00l | The "PXEDeploy" class always deploys using iscsi method | 02:30 |
zer0c00l | if i have to write my own deploy method, i need to either make a copy of PXEDeploy or decouple this thing | 02:31 |
zer0c00l | ? | 02:31 |
*** ChuckC has joined #openstack-ironic | 02:34 | |
Haomeng|2 | zer0c00l: yes, good catch | 02:35 |
Haomeng|2 | zer0c00l: we have a bp which tries to decouple deploy and boot | 02:35 |
Haomeng|2 | zer0c00l: let me find it | 02:35 |
jroll | zer0c00l: yeah, that's something we want to do asap | 02:36 |
jroll | :( | 02:36 |
zer0c00l | i can take a look at it | 02:37 |
zer0c00l | Is there a bug#? | 02:37 |
jroll | https://review.openstack.org/#/q/status:open+branch:master+topic:bp/new-boot-interface,n,z | 02:37 |
jroll | is where lucas has been working on it | 02:38 |
jroll | I gotta run, later | 02:38 |
zer0c00l | sure | 02:38 |
zer0c00l | let me check | 02:38 |
*** killer_prince is now known as lazy_prince | 02:40 | |
Haomeng|2 | yes, it is | 02:41 |
harlowja_ | zer0c00l have u tried using libvirt vms to be the pxeboot targets, i'm not sure if thats what ironic does (vs use the fake stuff) | 02:48 |
harlowja_ | that might be a way to get everything all on your laptop | 02:48 |
*** Masahiro has quit IRC | 02:48 | |
harlowja_ | including vms that act as 'machines' that u can pxeboot | 02:48 |
harlowja_ | https://bugzilla.redhat.com/show_bug.cgi?id=815136 might be sorta neat too | 02:49 |
harlowja_ | libvirt + ipmi | 02:49 |
*** ryanpetrello has joined #openstack-ironic | 02:51 | |
*** Masahiro has joined #openstack-ironic | 02:53 | |
*** dlaube has quit IRC | 02:55 | |
*** ramineni has joined #openstack-ironic | 03:02 | |
*** achanda has joined #openstack-ironic | 03:03 | |
*** Masahiro has quit IRC | 03:11 | |
*** nosnos has quit IRC | 03:29 | |
*** Masahiro has joined #openstack-ironic | 03:35 | |
*** Haomeng|2 has quit IRC | 03:38 | |
*** Masahiro has quit IRC | 03:40 | |
*** Masahiro has joined #openstack-ironic | 03:43 | |
*** Masahiro has quit IRC | 03:45 | |
*** pensu has joined #openstack-ironic | 03:47 | |
*** rloo_ has quit IRC | 03:47 | |
*** lazy_prince has quit IRC | 03:52 | |
*** Masahiro has joined #openstack-ironic | 03:55 | |
*** rushiagr_away is now known as rushiagr | 03:58 | |
*** Masahiro has quit IRC | 04:07 | |
*** naohirot has joined #openstack-ironic | 04:07 | |
naohirot | good afternoon ironic! | 04:07 |
*** nosnos has joined #openstack-ironic | 04:17 | |
openstackgerrit | Merged openstack/ironic: Fixed typo in Drac management driver test https://review.openstack.org/138028 | 04:19 |
*** achanda has quit IRC | 04:21 | |
*** Masahiro has joined #openstack-ironic | 04:28 | |
*** ryanpetrello has quit IRC | 04:28 | |
*** Marga_ has joined #openstack-ironic | 04:47 | |
*** killer_prince has joined #openstack-ironic | 04:48 | |
*** killer_prince is now known as lazy_prince | 04:48 | |
*** Haomeng has joined #openstack-ironic | 04:53 | |
*** pensu has quit IRC | 05:11 | |
*** lazy_prince is now known as killer_prince | 05:18 | |
*** rameshg87 has joined #openstack-ironic | 05:21 | |
*** pcrews has quit IRC | 05:30 | |
*** achanda has joined #openstack-ironic | 05:36 | |
*** achanda has quit IRC | 05:38 | |
*** achanda has joined #openstack-ironic | 05:39 | |
*** pensu has joined #openstack-ironic | 05:40 | |
*** achanda has quit IRC | 05:43 | |
*** achanda has joined #openstack-ironic | 05:44 | |
*** Masahiro has quit IRC | 05:46 | |
openstackgerrit | Harshada Mangesh Kakad proposed openstack/ironic: Add documentation for SeaMicro driver https://review.openstack.org/136324 | 05:53 |
*** lintan has joined #openstack-ironic | 06:02 | |
*** Masahiro has joined #openstack-ironic | 06:04 | |
*** Marga_ has quit IRC | 06:06 | |
*** achanda has quit IRC | 06:14 | |
*** achanda has joined #openstack-ironic | 06:14 | |
*** achanda has quit IRC | 06:19 | |
*** mrda is now known as mrda-away | 06:28 | |
*** pradipta_away is now known as pradipta | 06:33 | |
openstackgerrit | sandhya proposed openstack/ironic-specs: Chassis Level Node Discovery https://review.openstack.org/134866 | 06:34 |
*** Marga_ has joined #openstack-ironic | 06:41 | |
*** harlowja_ is now known as harlowja_away | 06:45 | |
*** killer_prince has quit IRC | 06:47 | |
*** lazy_prince has joined #openstack-ironic | 06:47 | |
*** chenglch has joined #openstack-ironic | 06:54 | |
*** Masahiro has quit IRC | 07:11 | |
*** subscope has quit IRC | 07:12 | |
*** Masahiro has joined #openstack-ironic | 07:16 | |
*** k4n0 has joined #openstack-ironic | 07:21 | |
*** subscope has joined #openstack-ironic | 07:27 | |
lintan | lintan: | 07:40 |
Haomeng | lintan: hi | 07:45 |
Haomeng | lintan: understand you are trying to ping yourself for testing:) | 07:46 |
lintan | Haomeng: haha, you get me :) | 07:47 |
Haomeng | lintan: :) | 07:47 |
Haomeng | lintan: but if it is working, that should be irc client bug, because, that does not make sense:) | 07:48 |
Haomeng | lintan: :) | 07:48 |
lintan | Haomeng: :( yes, it doesn't work as you said. I just have a try. | 07:49 |
Haomeng | lintan: :) | 07:49 |
*** LuisArizmendi has joined #openstack-ironic | 08:04 | |
*** romcheg has joined #openstack-ironic | 08:13 | |
*** ndipanov_gone is now known as ndipanov | 08:19 | |
*** Masahiro has quit IRC | 08:33 | |
*** dlpartain has joined #openstack-ironic | 08:36 | |
takadayuiko | stackuser common-venv use-ephemeral deploy-ironic | 08:38 |
takadayuiko | mistook :O | 08:39 |
*** Masahiro has joined #openstack-ironic | 08:41 | |
*** jcoufal has joined #openstack-ironic | 08:42 | |
*** andreykurilin has joined #openstack-ironic | 08:45 | |
*** vinbs has joined #openstack-ironic | 08:50 | |
Nisha | hi dtantsur|afk | 08:51 |
*** nosnos has quit IRC | 09:00 | |
openstackgerrit | Nisha Agarwal proposed openstack/ironic-specs: Discover node properties using new CLI node-discover-properties https://review.openstack.org/100951 | 09:14 |
*** pradipta is now known as pradipta_away | 09:15 | |
*** pradipta_away is now known as pradipta | 09:17 | |
*** pradipta is now known as pradipta_away | 09:17 | |
*** jistr has joined #openstack-ironic | 09:18 | |
*** chenglch|2 has joined #openstack-ironic | 09:24 | |
*** chenglch has quit IRC | 09:27 | |
*** dlpartain1 has joined #openstack-ironic | 09:31 | |
*** dlpartain has quit IRC | 09:31 | |
*** igordcard has joined #openstack-ironic | 09:31 | |
*** andreykurilin has quit IRC | 09:32 | |
*** viktors|afk has quit IRC | 09:34 | |
*** viktors has joined #openstack-ironic | 09:34 | |
*** athomas has joined #openstack-ironic | 09:35 | |
*** dlpartain1 has quit IRC | 09:35 | |
*** lucasagomes has joined #openstack-ironic | 09:39 | |
*** yuriyz has quit IRC | 09:40 | |
*** viktors has quit IRC | 09:40 | |
*** derekh has joined #openstack-ironic | 09:40 | |
*** dtantsur|afk is now known as dtantsur | 09:41 | |
dtantsur | Morning! | 09:42 |
naohirot | dtantsur: good morning :) | 09:42 |
*** foexle has joined #openstack-ironic | 09:42 | |
*** lsmola has quit IRC | 09:43 | |
openstackgerrit | Merged openstack/ironic: boot_devices.PXE value should match with pyghmi define https://review.openstack.org/137745 | 09:43 |
*** jcoufal_ has joined #openstack-ironic | 09:44 | |
*** jcoufal has quit IRC | 09:47 | |
*** chenglch|2 has quit IRC | 09:52 | |
Nisha | dtantsur, good morning | 09:53 |
Nisha | i have updated the spec | 09:54 |
Nisha | dtantsur, i have now used the term introspect in the spec instead of discovery | 09:55 |
dtantsur | Nisha, good. The parent spec is not updated with it, but I believe we all agreed. | 09:55 |
dtantsur | and good morning :) | 09:55 |
Nisha | dtantsur, let me know your comments/suggestions on it | 09:55 |
dtantsur | yeah sure, gimme a moment | 09:55 |
Nisha | dtantsur, :) | 09:55 |
Nisha | dtantsur, :) | 09:56 |
dtantsur | Nisha, shouldn't CLI command be also called node-introspect-properties? (or even just node-introspect) | 09:56 |
Nisha | i actually wanted to ask that before posting the spec... :) | 09:57 |
Nisha | but then i thought i would do that once discussed | 09:57 |
dtantsur | Nisha, also there are a few places where you still use DISCOVERING and DISCOVERYFAIL | 09:57 |
Nisha | Ok. Let me see. | 09:57 |
Nisha | I will repost the spec | 09:57 |
*** lsmola has joined #openstack-ironic | 09:58 | |
dtantsur | Nisha, also introspection_timeout option should be mentioned in "deployer impact" section (with the default value) | 09:58 |
Nisha | ok | 09:59 |
dtantsur | lemme post the comments on the spec, IRC is a bad reference source :) | 10:00 |
*** sambetts has joined #openstack-ironic | 10:01 | |
*** viktors has joined #openstack-ironic | 10:01 | |
*** Masahiro has quit IRC | 10:03 | |
dtantsur | done | 10:04 |
*** luisjariz has joined #openstack-ironic | 10:04 | |
*** luisjariz has quit IRC | 10:05 | |
*** LuisArizmendi has quit IRC | 10:07 | |
dtantsur | lucasagomes, o/ may I use you for discoverd reviews today as well? :) | 10:07 |
lucasagomes | dtantsur, heh sure | 10:08 |
lucasagomes | morning all | 10:08 |
dtantsur | lucasagomes, I have 4 more, the most important right now being https://review.openstack.org/#/c/137418/ and https://review.openstack.org/#/c/137361/ | 10:08 |
dtantsur | thanks in advance :) | 10:08 |
takadayuiko | Hi, lucasagomes | 10:08 |
* dtantsur had a hard rebase yesterday... | 10:09 | |
lucasagomes | takadayuiko, hello there :) | 10:09 |
dtantsur | takadayuiko, o/ | 10:09 |
sambetts | dtantsur, lucasagomes: I had to head out yesterday evening, was a final decision made about the state machine? | 10:09 |
takadayuiko | dtantsur, o/ | 10:09 |
dtantsur | sambetts, I guess it was "carry on with what we have now" | 10:09 |
*** Masahiro has joined #openstack-ironic | 10:09 | |
lucasagomes | sambetts, yeah improve the one we have now | 10:10 |
sambetts | so continue to implement the ideas proposed at the summit? | 10:10 |
dtantsur | yep | 10:10 |
lucasagomes | some of the stuff proposed has made it into the new model, like fewer multipaths, states are now classified as passive/active (kinda like the state action), some name changes | 10:11 |
sambetts | cool cool :-) | 10:11 |
openstackgerrit | Nisha Agarwal proposed openstack/ironic-specs: Discover node properties using new CLI node-introspect https://review.openstack.org/100951 | 10:11 |
sambetts | lucasagomes: ah ok, just refined a bit from the whiteboard scribble | 10:11 |
lucasagomes | yup | 10:11 |
Nisha | dtantsur, lucasagomes could i request your reviews on https://review.openstack.org/134022 and https://review.openstack.org/137024 | 10:12 |
*** yuriyz has joined #openstack-ironic | 10:12 | |
Nisha | posted long back and no reviews till now | 10:13 |
dtantsur | yeah, review queue is huge for us, sorry :) will try to find time today | 10:13 |
Nisha | dtantsur, thanks | 10:13 |
dtantsur | lucasagomes, thanks, updated | 10:33 |
*** lsmola has quit IRC | 10:34 | |
lucasagomes | Nisha, #137024 reviewed | 10:35 |
*** naohirot has quit IRC | 10:37 | |
*** vdrok has joined #openstack-ironic | 10:38 | |
lucasagomes | dtantsur, what does /v1/continue do? | 10:39 |
* lucasagomes brb 1 sec | 10:39 | |
dtantsur | lucasagomes, it's an endpoint receiving callback from the ramdisk | 10:39 |
dtantsur | and I know that I suck at naming :) | 10:39 |
lucasagomes | dtantsur, this is where all the pos_* plugins will run? | 10:42 |
lucasagomes | I'm wondering if the http request won't timeout there making it sync | 10:42 |
*** pelix has joined #openstack-ironic | 10:42 | |
dtantsur | lucasagomes, yep. it corresponds to process() function | 10:42 |
*** Masahiro has quit IRC | 10:42 | |
dtantsur | well, it depends on the timeout :) | 10:42 |
dtantsur | even on the timeout of CURL, right? | 10:43 |
dtantsur | (talking about the bash ramdisk) | 10:43 |
lucasagomes | yeah, usually tools have their own timeout | 10:44 |
lucasagomes | you can modify it with some -- options | 10:44 |
lucasagomes | dtantsur, but does the ramdisk need to wait for the /continue to finish ? | 10:44 |
dtantsur | lucasagomes, actually it took 1-2 seconds last time I checked :) | 10:44 |
lucasagomes | I thought it would post the data and poweroff and the service would then process the data and do what it needs to do | 10:44 |
dtantsur | lucasagomes, it does, if we want to implement IPMI credentials setting | 10:44 |
lucasagomes | oh | 10:45 |
dtantsur | lucasagomes, also: it's nice to leave the ramdisk in the troubleshoot mode if we sent some crap and discoverd returned an error | 10:45 |
lucasagomes | dtantsur, yeah I'm just wondering when more plugins comes in | 10:45 |
lucasagomes | we can kinda lose the control of the time | 10:45 |
lucasagomes | and being async sounds more flexible | 10:45 |
lucasagomes | unless we have a notification to tell the ramdisk to continue that's hard | 10:46 |
lucasagomes | but I think that for v1/ it may be fine to leave it sync | 10:46 |
dtantsur | well... if plugins don't need to return the result to the ramdisk, they could use greenthread.spawn() and become async | 10:46 |
mrda-away | hey jroll, are you happy with my response to your query on the logical name spec? | 10:46 |
dtantsur | if they do need to return the result, it can't be helped | 10:46 |
lucasagomes | dtantsur, right | 10:47 |
lucasagomes | ok grand then :) | 10:47 |
dtantsur | cool :) | 10:47 |
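The sync-versus-async trade-off for the `/v1/continue` callback can be sketched as follows. This is a hedged illustration, not discoverd's code: it swaps in standard-library threads for the eventlet green threads dtantsur mentions, and `handle_continue`, `set_ipmi_credentials`, and `record_inventory` are hypothetical names. Plugins whose result must go back to the ramdisk run inline; the rest are spawned in the background.

```python
import threading


def handle_continue(data, sync_plugins, async_plugins):
    """Process one ramdisk callback.

    Async plugins are started in the background (threads stand in for
    green threads here); sync plugins run inline because their results
    are returned to the ramdisk in the HTTP response.
    """
    workers = []
    for plugin in async_plugins:
        worker = threading.Thread(target=plugin, args=(data,))
        worker.start()
        workers.append(worker)

    response = {}
    for plugin in sync_plugins:
        response.update(plugin(data))
    return response, workers


seen = []


def set_ipmi_credentials(data):
    # Hypothetical sync plugin: the ramdisk needs this result.
    return {"ipmi_username": "hypothetical-user"}


def record_inventory(data):
    # Hypothetical async plugin: nothing is returned to the ramdisk.
    seen.append(data["macs"])


response, workers = handle_continue(
    {"macs": ["aa:bb"]}, [set_ipmi_credentials], [record_inventory])
for worker in workers:
    worker.join()  # only so the example finishes deterministically
```

Only the sync plugins contribute to the time the ramdisk's HTTP client has to wait, which is the concern lucasagomes raises about timeouts.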
* dtantsur yoga time, brb | 10:47 | |
*** lsmola has joined #openstack-ironic | 10:50 | |
*** ramineni has quit IRC | 11:06 | |
openstackgerrit | Lucas Alvares Gomes proposed openstack/ironic: Extend API multivalue fields https://review.openstack.org/137762 | 11:07 |
openstackgerrit | Lucas Alvares Gomes proposed openstack/ironic: Extend API multivalue fields https://review.openstack.org/137762 | 11:08 |
*** Nisha has quit IRC | 11:14 | |
*** bradjones has quit IRC | 11:14 | |
*** Nisha has joined #openstack-ironic | 11:15 | |
*** rameshg87 has quit IRC | 11:15 | |
*** bradjones has joined #openstack-ironic | 11:19 | |
*** bradjones has quit IRC | 11:19 | |
*** bradjones has joined #openstack-ironic | 11:19 | |
*** alexpilotti_ has joined #openstack-ironic | 11:19 | |
*** alexpilotti has quit IRC | 11:20 | |
*** alexpilotti_ has quit IRC | 11:23 | |
*** vinbs has quit IRC | 11:38 | |
*** smoriya has quit IRC | 11:41 | |
*** jistr is now known as jistr|training | 11:42 | |
*** Masahiro has joined #openstack-ironic | 11:43 | |
*** Masahiro has quit IRC | 11:48 | |
*** takadayuiko has quit IRC | 11:52 | |
*** romcheg has quit IRC | 11:54 | |
*** romcheg has joined #openstack-ironic | 11:54 | |
*** naohirot has joined #openstack-ironic | 11:54 | |
openstackgerrit | Nisha Agarwal proposed openstack/ironic-specs: Discover node properties using new CLI node-introspect https://review.openstack.org/100951 | 11:56 |
*** Haomeng|2 has joined #openstack-ironic | 11:58 | |
*** Haomeng has quit IRC | 11:59 | |
openstackgerrit | Nisha Agarwal proposed openstack/ironic-specs: Discover node properties for iLO drivers https://review.openstack.org/103007 | 12:13 |
*** pensu has quit IRC | 12:14 | |
openstackgerrit | Nisha Agarwal proposed openstack/ironic-specs: uefi support for agent-ilo driver https://review.openstack.org/137024 | 12:25 |
*** k4n0 has quit IRC | 12:28 | |
*** lucasagomes is now known as lucas-hungry | 12:32 | |
*** lazy_prince is now known as killer_prince | 12:43 | |
*** ryanpetrello has joined #openstack-ironic | 12:43 | |
*** Masahiro has joined #openstack-ironic | 12:52 | |
*** Masahiro has quit IRC | 12:57 | |
*** erwan_taf has joined #openstack-ironic | 13:11 | |
*** dprince has joined #openstack-ironic | 13:15 | |
*** lucas-hungry is now known as lucasagomes | 13:23 | |
*** igordcard has quit IRC | 13:28 | |
*** killer_prince is now known as lazy_prince | 13:29 | |
*** igordcard has joined #openstack-ironic | 13:31 | |
openstackgerrit | Oleksii Chuprykov proposed openstack/ironic-python-agent: Use oslo.utils and oslo.concurrency https://review.openstack.org/138116 | 13:33 |
*** ryanpetrello has quit IRC | 13:38 | |
*** ryanpetrello has joined #openstack-ironic | 13:39 | |
*** Marga_ has quit IRC | 13:47 | |
*** rloo has joined #openstack-ironic | 13:54 | |
*** rushiagr is now known as rushiagr_away | 13:55 | |
*** igordcard has quit IRC | 13:56 | |
*** jjohnson2 has joined #openstack-ironic | 13:57 | |
*** Nisha has quit IRC | 13:59 | |
*** ryanpetrello_ has joined #openstack-ironic | 14:06 | |
*** ryanpetrello has quit IRC | 14:08 | |
*** ryanpetrello_ is now known as ryanpetrello | 14:08 | |
*** linggao has joined #openstack-ironic | 14:13 | |
*** ndipanov has quit IRC | 14:16 | |
*** Marga_ has joined #openstack-ironic | 14:18 | |
*** ryanpetrello has quit IRC | 14:18 | |
*** ryanpetrello has joined #openstack-ironic | 14:19 | |
*** Marga_ has quit IRC | 14:23 | |
ChuckC | morning ironic | 14:24 |
openstackgerrit | Merged openstack/ironic-specs: Proposal to add logical names to Ironic nodes https://review.openstack.org/134439 | 14:35 |
jroll | morning everybody | 14:35 |
Shrews | morning jroll | 14:35 |
Shrews | and ChuckC | 14:36 |
jroll | mrda-away: landed your spec, I think you can provide a node uuid in the body and I forgot about that | 14:36 |
jroll | heya Shrews, ChuckC :) | 14:36 |
rloo | morning ChuckC, jroll, Shrews | 14:36 |
*** dlaube has joined #openstack-ironic | 14:36 | |
Shrews | o/ rloo | 14:37 |
jroll | \o rloo | 14:37 |
jroll | (jinx) | 14:37 |
Shrews | get outta my head jroll | 14:37 |
jroll | :D | 14:38 |
openstackgerrit | Harshada Mangesh Kakad proposed openstack/ironic: Add documentation for SeaMicro driver https://review.openstack.org/136324 | 14:38 |
rloo | 'great minds think alike' ? | 14:38 |
jroll | great is an interesting word for my mind :P | 14:40 |
Shrews | great is an incorrect word for my mind :P | 14:41 |
*** Masahiro has joined #openstack-ironic | 14:41 | |
rloo | fools seldom differ? ;) | 14:43 |
jroll | lolol | 14:44 |
dtantsur | Morning ChuckC, jroll, Shrews, rloo! | 14:44 |
jroll | hey dtantsur :) | 14:44 |
rloo | afternoon dtantsur | 14:44 |
Shrews | rloo: that's more appropriate :) | 14:44 |
Shrews | hey dtantsur | 14:44 |
lucasagomes | jrist, Shrews rloo ChuckC morning | 14:46 |
*** Masahiro has quit IRC | 14:46 | |
lucasagomes | jroll, :) | 14:46 |
rloo | afternoon lucasagomes | 14:46 |
lucasagomes | jr<tab> is dangerous | 14:46 |
jroll | lol | 14:46 |
*** rushiagr_away is now known as rushiagr | 14:49 | |
*** lazy_prince is now known as killer_prince | 14:50 | |
*** naohirot has quit IRC | 14:53 | |
NobodyCam | good morning Ironic-ers | 14:55 |
dtantsur | NobodyCam, o/ | 14:55 |
jroll | hiya NobodyCam :) | 14:56 |
lucasagomes | NobodyCam, morning | 14:56 |
NobodyCam | morning dtantsur jroll lucasagomes :) | 14:56 |
* jroll tosses a pot of coffee to NobodyCam | 14:56 | |
jroll | hmm, need to find a nova core | 14:56 |
NobodyCam | oh thank you jroll :) needed :) | 14:56 |
NobodyCam | nova core? | 14:57 |
jroll | yeah for https://review.openstack.org/#/c/98930/ | 14:58 |
jroll | configdrive | 14:58 |
dlaube | g'morning | 14:59 |
*** r-daneel has joined #openstack-ironic | 15:00 | |
jroll | hiya dlaube :) | 15:00 |
jroll | NobodyCam: found one \o/ | 15:01 |
jroll | darn, now I have to actually write code | 15:01 |
NobodyCam | oh Nice spec | 15:01 |
NobodyCam | lol | 15:01 |
NobodyCam | morning dlaube | 15:01 |
*** rushiagr is now known as rushiagr_away | 15:04 | |
lucasagomes | jroll, o/ | 15:11 |
rloo | jroll: I opened a bug about setting maintenance mode off via node-update, needing to clear maint reason | 15:12 |
rloo | jroll: https://bugs.launchpad.net/ironic/+bug/1398191 | 15:12 |
lucasagomes | jroll, any news on the rebuild vs configdrive thing? | 15:12 |
rloo | jroll: so we don't forget ;) | 15:12 |
NobodyCam | morning rloo :) | 15:13 |
rloo | morning to the man who hopefully has had coffee | 15:13 |
*** ndipanov has joined #openstack-ironic | 15:13 | |
NobodyCam | :) yep :) working on first cup now | 15:14 |
jroll | lucasagomes: we haven't talked about it more, yesterday was pretty busy | 15:16 |
lucasagomes | :) yeah I hear ya | 15:17 |
jroll | rloo: cool, thanks | 15:18 |
*** Marga_ has joined #openstack-ironic | 15:19 | |
*** Marga_ has quit IRC | 15:23 | |
*** lynxman has quit IRC | 15:25 | |
*** lynxman has joined #openstack-ironic | 15:26 | |
NobodyCam | lucasagomes: your comment on https://review.openstack.org/#/c/132137 is that because you foresee anyone wanting to set the uuid? or is there another reason I'm not thinking about? | 15:27 |
NobodyCam | s/foresee/don't foresee | 15:27 |
lucasagomes | NobodyCam, yeah, I mean I can't think about any use case where someone wants to create a node in Ironic and input a UUID by hand | 15:28 |
lucasagomes | I understand it's supported in the API so makes sense to be able to do in the client | 15:28 |
jroll | I would probably use it if my cmdb used UUIDs | 15:28 |
lucasagomes | but a UUID seems like something that should always be generated (to guarantee uniqueness) | 15:28 |
dtantsur | lucasagomes, discoverd may be such a thing, if folks do force me to handle creation of Ironic nodes (which I try to avoid) | 15:28 |
jroll | just for easy linkage | 15:28 |
lucasagomes | right | 15:29 |
NobodyCam | yea, I was thinking about folks who have a existing cmdb and just wanted to keep the same id's | 15:29 |
dtantsur | yeah and CMDB use case too | 15:29 |
lucasagomes | yeah, ok :) | 15:29 |
lucasagomes | I was fine with the change | 15:29 |
jroll | lucasagomes: the uuid library doesn't ensure uniqueness by remembering uuids or anything, there can actually be collisions I would guess | 15:29 |
lucasagomes | my -1 is because of the lack of tests | 15:29 |
NobodyCam | still needs tests | 15:29 |
NobodyCam | yea just wanted to make sure | 15:29 |
jroll | we have the unique constraint for ensuring uniqueness :P | 15:29 |
lucasagomes | yeah, I haven't thought about the CMDB thing | 15:30 |
lucasagomes | I can see it, but still, I find it odd to input a UUID by hand | 15:30 |
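The point about caller-supplied UUIDs and the unique constraint can be sketched like this. It is an illustrative sketch only: `NodeRegistry` is a hypothetical in-memory stand-in for Ironic's database, where the real uniqueness check is the DB constraint jroll mentions.

```python
import uuid


class NodeRegistry:
    """Hypothetical in-memory stand-in for a table with a unique uuid column."""

    def __init__(self):
        self._nodes = {}

    def create_node(self, node_uuid=None):
        if node_uuid is None:
            # Generated UUIDs are random; nothing remembers earlier values.
            node_uuid = str(uuid.uuid4())
        else:
            # Validate and normalize a caller-supplied UUID (e.g. from a CMDB);
            # uuid.UUID raises ValueError on malformed input.
            node_uuid = str(uuid.UUID(node_uuid))
        if node_uuid in self._nodes:
            # Plays the role of the database unique constraint.
            raise ValueError("node %s already exists" % node_uuid)
        self._nodes[node_uuid] = {}
        return node_uuid


registry = NodeRegistry()
cmdb_id = "1BE26C0B-03F2-4D2E-AE87-C02D7F33C123"  # made-up example value
created = registry.create_node(cmdb_id)
generated = registry.create_node()
```

Either way the uniqueness guarantee comes from the store, not from how the UUID was produced.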
NobodyCam | anyone seen arun on line? | 15:30 |
lucasagomes | in the conference we talked about having alias and all for nodes | 15:30 |
lucasagomes | I like that | 15:31 |
jroll | yeah | 15:31 |
jroll | I also find it odd, but could be useful | 15:31 |
jroll | also, I just landed mrda's spec for the name thing | 15:31 |
lucasagomes | w00t | 15:31 |
NobodyCam | I hope folks doing that are adding nodes via script and not hand :-p | 15:31 |
NobodyCam | nice | 15:31 |
jroll | :P | 15:32 |
lucasagomes | jroll, how IPA picks the disk to use for the deployment? | 15:33 |
lucasagomes | u guys have some mechanism there? or pick the first one? | 15:33 |
jroll | it's pluggable | 15:34 |
jroll | but it chooses the smallest disk above 4GB | 15:34 |
jroll | https://github.com/openstack/ironic-python-agent/blob/master/ironic_python_agent/hardware.py#L262-270 | 15:34 |
* lucasagomes looks | 15:34 | |
jroll | you could override that with a little hardware manager plugin | 15:35 |
jroll | super easy | 15:35 |
lucasagomes | nice yeah! | 15:35 |
jroll | for example this is our manager for our hardware https://github.com/rackerlabs/onmetal-ironic-hardware-manager/blob/master/onmetal_ironic_hardware_manager/__init__.py#L39 | 15:36 |
jroll | just replace the methods in that class with whatever you want to override | 15:36 |
lucasagomes | jroll, having some hints in the node.properties about which disk to pick would make sense for u guys too? | 15:36 |
lucasagomes | (like UUID, WWN, etc...) | 15:36 |
jroll | lucasagomes: it would make sense, we personally probably wouldn't use it but yeah | 15:36 |
lucasagomes | ack | 15:37 |
* lucasagomes clicks on the example | 15:37 | |
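The selection rule jroll describes (smallest disk above 4GB) can be sketched as below. This is a minimal sketch under stated assumptions, not the ironic-python-agent code linked above: `BlockDevice` and `pick_os_disk` are hypothetical names, and the threshold is taken as 4 GiB inclusive.

```python
from collections import namedtuple

# Hypothetical record for a disk: device path and size in bytes.
BlockDevice = namedtuple("BlockDevice", ["name", "size"])

MIN_SIZE = 4 * 1024 ** 3  # assumed threshold: 4 GiB


def pick_os_disk(devices, min_size=MIN_SIZE):
    """Return the name of the smallest device that is at least min_size."""
    candidates = [d for d in devices if d.size >= min_size]
    if not candidates:
        raise ValueError("no disk of at least %d bytes" % min_size)
    return min(candidates, key=lambda d: d.size).name


disks = [
    BlockDevice("/dev/sda", 2 * 1024 ** 3),    # too small, skipped
    BlockDevice("/dev/sdb", 500 * 1024 ** 3),
    BlockDevice("/dev/sdc", 120 * 1024 ** 3),  # smallest that qualifies
]
chosen = pick_os_disk(disks)
```

A custom hardware manager, as in the linked example, would replace this rule with whatever suits the hardware.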
rloo | NobodyCam, lucasagomes. wrt 132137, i could be wrong but that discussion about PUT in the meeting was partially due to a bug, where they wanted to specify the uuid when creating a node | 15:48 |
rloo | NobodyCam: and I mentioned yesterday to arun that it would be nice if he could add a test to that patch ;) | 15:49 |
*** dlaube has quit IRC | 15:51 | |
NobodyCam | rloo: awesome Thank you :) | 15:52 |
*** dlaube has joined #openstack-ironic | 15:55 | |
*** zz_jgrimm is now known as jgrimm | 15:57 | |
*** alexpilotti has joined #openstack-ironic | 15:58 | |
lucasagomes | rloo, ahh right | 15:58 |
lucasagomes | so it seems people do have many use cases for it :) | 15:59 |
lucasagomes | which is good | 15:59 |
rloo | lucasagomes: we aim to please. ha ha. | 15:59 |
*** alexpilotti has quit IRC | 15:59 | |
*** alexpilotti has joined #openstack-ironic | 15:59 | |
lucasagomes | rloo, lol! aye | 16:00 |
*** yjiang5 is now known as yjiang5_away | 16:01 | |
*** anderbubble has joined #openstack-ironic | 16:01 | |
*** pcrews has joined #openstack-ironic | 16:04 | |
NobodyCam | hummm | 16:06 |
lucasagomes | dtantsur, rloo added a comment about DISCOVERING->PREBOOTING/AVAILABLE on the state machine thing | 16:12 |
NobodyCam | jroll: the IPA `iso-image-create` does not create the agent image itself... correct? | 16:13 |
jroll | mmm | 16:13 |
* jroll looks | 16:13 | |
*** rushiagr_away is now known as rushiagr | 16:13 | |
jroll | NobodyCam: no, but 'make iso' will | 16:14 |
jroll | through the dependencies | 16:14 |
rloo | thx lucasagomes. it isn't clear to me whether one can opt out of zapping/discovery at any time 'around' the state machine? | 16:14 |
NobodyCam | :) commenting on the agent-ilo0uefi support spec | 16:14 |
lucasagomes | rloo, it doesn't seem to be | 16:16 |
lucasagomes | but I think that those states could be no-ops | 16:16 |
rloo | lucasagomes: do people really want to eg do discovering every time a node is going to be made available again? | 16:17 |
lucasagomes | if the driver doesn't contain any zapping steps, or introspection interface | 16:17 |
lucasagomes | zapping and discovering just moves to the next state | 16:17 |
lucasagomes | same for prebooting | 16:17 |
rloo | lucasagomes: yeah, but what if the driver wants to do discovering once, the very first time the node is enrolled? | 16:17 |
lucasagomes | rloo, if zapping is updating firmware, for example, it can introduce new capabilities that need to be discovered | 16:17 |
rloo | lucasagomes: but what if zapping doesn't do anything that requires discovering. | 16:18 |
lucasagomes | rloo, I hope that we are going to introduce a state before INIT to do that | 16:18 |
lucasagomes | so that it can be configurable, once the node enters the main loop (ZAPPING->...) | 16:18 |
lucasagomes | the second discover can be configured | 16:18 |
lucasagomes | rloo, so it can be skipped | 16:19 |
openstackgerrit | Dmitry Tantsur proposed openstack/ironic-specs: In-band hardware properites discovery via ironic-discoverd https://review.openstack.org/135605 | 16:19 |
jroll | whoa, I need to look at the state machine again, we should never automatically go to discovered :| | 16:19 |
rloo | lucasagomes: as long as something can be skipped ... | 16:19 |
lucasagomes | rloo, but that doesn't totally invalidate discovering, I believe that with discovering you could catch things like "hey this disk doesn't exist anymore" | 16:19 |
lucasagomes | due to some failure | 16:19 |
*** Marga_ has joined #openstack-ironic | 16:20 | |
rloo | jroll: yeah, please look. i think we should try to give this spec high priority. | 16:20 |
lucasagomes | jroll, I was arguing about, making some states optional | 16:20 |
jroll | lucasagomes: discovery in that case would touch node.properties to update the disk size or whatever | 16:20 |
jroll | not alert that a disk is gone | 16:20 |
jroll | (AIUI) | 16:20 |
lucasagomes | yeah in that diagram it seems it's going to do that | 16:21 |
dtantsur | reference ramdisk for discoverd is merged https://review.openstack.org/#/c/122151/ \o/ | 16:21 |
jroll | yeah, do not want | 16:21 |
jroll | dtantsur: nice! | 16:21 |
lucasagomes | the way I thought about it before was to have a consistent check that could be implemented | 16:21 |
lucasagomes | after discovering | 16:21 |
lucasagomes | and after AVAILABLE | 16:21 |
lucasagomes | because the machine could be hanging there in AVAILABLE for days before a nova boot comes in | 16:22 |
lucasagomes | and something may fail, or someone wrongly pulled a cable etc | 16:22 |
lucasagomes | that way we would reject the machine if it was picked for deployment but the consistency check has failed | 16:22 |
lucasagomes | (and nova with retry filter would then pick another machine to deploy that instance) | 16:22 |
jroll | yeah, agree, we need to check for consistency, not sure if that should be in zapping or a different thing, but I really don't think it should be in discovery | 16:23 |
dtantsur | lucasagomes, on the review you didn't answer the question, how (and why) discoverd will figure out whether to move node to prebooting or available :-/ | 16:23 |
rloo | we already have periodic task that checks the power; would a periodic check for consistency on nodes that are avail do what you want? | 16:23 |
lucasagomes | dtantsur, my suggestion was to always move to prebooting | 16:23 |
jroll | dtantsur: I would think ironic would handle that? | 16:23 |
jroll | rloo: just a check at some point before making the node available | 16:24 |
jroll | rloo: check that all the ram/disks/networks/etc are there | 16:24 |
lucasagomes | jroll, yeah I don't think it should be discovering either | 16:24 |
dtantsur | jroll, Ironic can't :) Ironic should be told that discoverd is done. and it's told by moving a node to the next state | 16:24 |
rloo | jroll: i thought lucas mentioned that the node is already in avail and something happens to it in that state | 16:24 |
lucasagomes | but I was trying to fit in the current diagram | 16:24 |
*** Marga_ has quit IRC | 16:24 | |
jroll | dtantsur: not sure how I feel about an api call with a target state of available | 16:25 |
lucasagomes | rloo, jroll yeah that too. so we added two consistent checks (same code/action) in diff stages | 16:25 |
lucasagomes | one after discovering and one after available | 16:25 |
jroll | dtantsur: perhaps /nodes/uuid/states/provision {'target': 'discoverydone'} and ironic decides what to do | 16:25 |
*** yjiang5_away is now known as yjiang5 | 16:25 | |
dtantsur | jroll, that works for me | 16:25 |
jroll | lucasagomes: discovering changes node.properties, why does it need to verify node.properties | 16:25 |
lucasagomes | jroll, I was trying to fit in the current diagram | 16:26 |
lucasagomes | but I think we need some other thing there to actually perform the checking | 16:26 |
dtantsur | jroll, lucasagomes, will I be able to move it to INTROSPECTIONFAIL too? | 16:26 |
lucasagomes | dtantsur, somehow you have to tell ironic that it has failed to introspect | 16:26 |
lucasagomes | so I believe yes | 16:26 |
lucasagomes | idk about an API call with target:introspectionfail, that makes little sense to me | 16:27 |
lucasagomes | but some way you gotta notify it | 16:27 |
dtantsur | lucasagomes, for now I'm planning on Ironic timing out the introspection :D but it doesn't look too friendly | 16:27 |
jroll | hmm | 16:27 |
jroll | this is hard :( | 16:28 |
lucasagomes | yeah, it's not "fast" enough :) | 16:28 |
lucasagomes | jroll, yup | 16:28 |
lucasagomes | jroll, the way we are architect'ing things, we could use a state to do few other steps | 16:29 |
dtantsur | I guess it's the first time we try to fit a 3rdparty to do some long-running node job for Ironic... | 16:29 |
lucasagomes | like DEPLOYING could check consistency | 16:29 |
rloo | dtantsur: your discoverd needs that wait flag thing that we haven't described yet, right? | 16:29 |
dtantsur | rloo, yes | 16:29 |
lucasagomes | (I not necessarily think it's great, but I believe that it could be part of that state) | 16:29 |
dtantsur | I mean, it can live without it, but it will be a strange state of node :) | 16:29 |
*** Masahiro has joined #openstack-ironic | 16:30 | |
rloo | dtantsur: it seems like part of that wait flag mechanism should allow for whatever it is waiting for, to indicate that it is done and whether it succeeded or not. | 16:30 |
rloo | dtantsur: or to time out waiting ;) | 16:30 |
dtantsur | rloo, good idea | 16:30 |
lucasagomes | sounds good rloo | 16:31 |
rloo | dtantsur: and then it seems like if it fails, it does into whatever *FAIL state associated with the state where the wait was? | 16:31 |
rloo | s/does/goes/ | 16:31 |
dtantsur | yeah | 16:31 |
rloo | with some meaningful msg of course :-) | 16:31 |
dtantsur | heh, do we need 'transition_reason' now? | 16:32 |
dtantsur | like we had 'maintenance_reason'? | 16:32 |
jroll | ... or use last_error? | 16:33 |
jroll | that's exactly what last_error is for | 16:33 |
rloo | dtantsur: dunno. I need to wait for the states to settle down first, etc. then we can work on adding the bits we need to deal with it all ;) | 16:33 |
dtantsur | jroll, make last_error also available for writing from outside? | 16:34 |
dtantsur | rloo, right | 16:34 |
jroll | sigh | 16:34 |
*** Masahiro has quit IRC | 16:34 | |
rloo | why the sigh, jroll? | 16:35 |
jroll | discoverd or whatever sends {"target": "DISCOVERYFAIL", "reason": "everything is broken"}, ironic handles the state change and updating last_error | 16:35 |
lucasagomes | maybe as part of the body request to unset the wait flag | 16:35 |
lucasagomes | you can give the message | 16:35 |
jroll | yeah | 16:35 |
lucasagomes | not even need the target | 16:35 |
lucasagomes | cause if it's DISCOVERING the DISCOVERYFAIL is the error state associated with it | 16:35 |
lucasagomes | so ironic can figure that out | 16:35 |
jroll | /nodes/uuid/discovery {"result": "success", "reason": ""} | 16:36 |
jroll | for example | 16:36 |
jroll | /nodes/uuid/discovery {"result": "error", "reason": "busted"} | 16:36 |
jroll | something like that | 16:36 |
rloo | I have to read dtantsur's spec still. I don't like discoverd hooked into the introspecting state. | 16:36 |
jroll | I don't think we should be talking about intimate details of how discovery works before we figure out the states | 16:37 |
lucasagomes | I think I will need a small spec for the disk hints :/ | 16:40 |
lucasagomes | to determine what could be used to figure out which device to pick, such as UUID, NAME, MODEL, SERIAL, WWN etc... | 16:41 |
jroll | and any combination | 16:42 |
lucasagomes | yes | 16:42 |
* lucasagomes starts a spec | 16:42 | |
* jroll bbiab | 16:43 | |
*** dprince has quit IRC | 16:45 | |
*** romcheg has quit IRC | 16:48 | |
openstackgerrit | Victor Lowther proposed openstack/ironic-specs: New Ironic provisioner state machine. https://review.openstack.org/133828 | 16:48 |
devananda | morning, all | 16:51 |
*** Marga_ has joined #openstack-ironic | 16:52 | |
NobodyCam | good morning devananda | 16:53 |
* devananda reads scrollback | 16:53 | |
*** achanda has joined #openstack-ironic | 16:53 | |
dtantsur | devananda, morning | 16:54 |
lucasagomes | devananda, morning | 16:54 |
devananda | lucasagomes: consistency check after AVAILABLE, in the deploy pipeline? a) I would rather not do that for several reasons, b) we're really adding a lot of features to this state machine now ... | 16:55 |
*** achanda has quit IRC | 16:56 | |
lucasagomes | devananda, right, idk how long the machine could be sitting there waiting for the nova boot command to come | 16:56 |
*** jcoufal_ has quit IRC | 16:57 | |
lucasagomes | so it would be nice if we could test things like is the cable still connect to the right port (like JoshNang does when zapping) | 16:57 |
lucasagomes | doesn't need to be a full check, but allowing some check would be nice (optional, even if part of the deploy) | 16:57 |
*** pensu has joined #openstack-ironic | 16:58 | |
*** achanda has joined #openstack-ironic | 16:58 | |
NobodyCam | devananda: didn't you add rebuild to the new state machine? | 16:59 |
devananda | lucasagomes: sure, I don't know that either. it could be just a few seconds, too. | 16:59 |
*** igordcard has joined #openstack-ironic | 17:00 | |
lucasagomes | devananda, sure, it could be there for 10s or 1 month | 17:00 |
lucasagomes | we can't predict that | 17:00 |
lucasagomes | (that's why I thought that some checks would be nice to have) | 17:01 |
*** Nisha has joined #openstack-ironic | 17:01 | |
devananda | having hints in node.properties about which disk to pick -- also not something I want to encourage. how is that not getting into the department of snowflake-management? | 17:01 |
lucasagomes | devananda, if we are supporting RAID | 17:02 |
lucasagomes | how I can tell ironic to use the raid device I just created? | 17:02 |
dtantsur | also cases like big RAID for data and small SSD for an OS image... Someone told me about it. | 17:03 |
lucasagomes | devananda, it's not about describing all disks we have in the server | 17:03 |
lucasagomes | but giving hints about which one we should deploy the image onto | 17:03 |
NobodyCam | I have seen use cases where the customer wanted OS to be sdb not sda | 17:03 |
lucasagomes | that seems aligned with the project scope | 17:04 |
lucasagomes | it's useful data for the deployment | 17:04 |
victor_lowther | not to mention that what /dev/sda is in your discovery image may not be the same as it is to the installed OS. | 17:04 |
* victor_lowther has encountered that | 17:04 | |
lucasagomes | victor_lowther, exactly, device names can change on each boot | 17:05 |
lucasagomes | but things like serial, wwn, UUID | 17:05 |
lucasagomes | can't | 17:05 |
victor_lowther | right | 17:05 |
lucasagomes | those are the hints I'm thinking of | 17:05 |
lucasagomes | or even, if u want to make it more generic you could say | 17:05 |
victor_lowther | devananda: they will definitely be needed. | 17:05 |
lucasagomes | hey pick the disk which is >=1TB | 17:05 |
victor_lowther | it is snowflake management for vms because their idea of disk order is much more predictable. | 17:06 |
dtantsur | lucasagomes, this logic is better left to discoverd... | 17:06 |
lucasagomes | dtantsur, it's at deploy time | 17:06 |
dtantsur | but being able to precisely point to a disk seems very much needed to me | 17:06 |
lucasagomes | dtantsur, idk how discoverd can do that | 17:06 |
lucasagomes | victor_lowther, +1 | 17:06 |
dtantsur | lucasagomes, well, in RAID case it probably can't. w/o RAID it can pre-populate this property based on some logic | 17:07 |
victor_lowther | scan /dev/disks/by-whatever and pick the awesomest ones | 17:07 |
dtantsur | (in theory, it's not implemented) | 17:07 |
lucasagomes | victor_lowther, yup, or lsblk | 17:07 |
victor_lowther | after your raid arrays are created. | 17:07 |
lucasagomes | but from Ironic's POV, hints are generic, in the ramdisk we can create the logic using sysfs or lsblk or whatever to figure it out | 17:07 |
victor_lowther | ya | 17:08 |
lucasagomes | devananda, does that sound fair? | 17:08 |
*** Marga_ has quit IRC | 17:09 | |
* devananda returns to reading scrollback, had to pop into a meeting ... | 17:09 | |
*** Marga_ has joined #openstack-ironic | 17:09 | |
victor_lowther | Crowbar used an unholy combination of /dev/disk/by-id and the actual full device paths in sysfs to figure things out. | 17:10 |
lucasagomes | hardware device path? | 17:10 |
victor_lowther | due to fun on one of our targets where some random disk in an external drive array was /dev/sda | 17:11 |
lucasagomes | heh yeah without udev rules you can't predict it | 17:11 |
lucasagomes | (unless it's LVM so u can :)) | 17:11 |
victor_lowther | due to PCI bus ordering and module insertion ordering fun. | 17:11 |
victor_lowther | LVM and UUIDs are basically useless in the scenario I am thinking of | 17:12 |
victor_lowther | because they are not applicable to raw physical disks. | 17:12 |
lucasagomes | sure | 17:12 |
victor_lowther | so they don't help you figure out where the boot partition should be. | 17:12 |
victor_lowther | UEFI is sorta nice there in that your boot partition can be anywhere as long as UEFI can see it. | 17:13 |
victor_lowther | lucasagomes: ya, hardware device path. | 17:14 |
lucasagomes | right in this case we want to find the root device, and lay the image onto it | 17:14 |
*** viktors is now known as viktors|afk | 17:14 | |
lucasagomes | the root partition (in case of a full disk image) will be part of the image itself | 17:14 |
lucasagomes | victor_lowther, right | 17:14 |
victor_lowther | so, dtantsur | 17:16 |
victor_lowther | how do you want discoverd to work in the state machine? | 17:16 |
rloo | NobodyCam: I just looked at this spec https://review.openstack.org/#/c/101122/ which is still targeted for juno. Shouldn't it have been abandoned? | 17:16 |
NobodyCam | yea let me look | 17:17 |
*** marcoemorais has joined #openstack-ironic | 17:17 | |
NobodyCam | ah ha.. seems I added a comment instead of abandoning it | 17:18 |
dtantsur | victor_lowther, in case of using discoverd for introspection, Ironic sets node to DISCOVERING, WAIT -> true, calls to discoverd and relaxes :) | 17:18 |
dtantsur | victor_lowther, after that discoverd comes back and advances node state to the next one | 17:18 |
devananda | https://drive.google.com/file/d/0Bz_nyJF_YYGZZ05zaU9kb2Z4SE0/view?usp=sharing | 17:19 |
*** hemna__ has joined #openstack-ironic | 17:19 | |
NobodyCam | rloo: updated | 17:19 |
rloo | thx NobodyCam | 17:19 |
*** Marga_ has quit IRC | 17:19 | |
devananda | victor_lowther: my draft from last night, forgot to post anywhere - also working a bit more on it now | 17:19 |
devananda | victor_lowther: i'm looking at your latest spec now ... | 17:20 |
NobodyCam | looking very good devananda :) | 17:20 |
*** dtantsur is now known as dtantsur|brb | 17:22 | |
lucasagomes | devananda, sorry for insisting, but just to get the initial feedback cause I'm currently writing the spec for it. R you ok with the hints? | 17:23 |
*** dprince has joined #openstack-ironic | 17:28 | |
devananda | oslo change / hacking rule has broken our gate (and just about everyone else's) | 17:29 |
devananda | https://bugs.launchpad.net/hacking/+bug/1398472 | 17:29 |
lucasagomes | :/ | 17:30 |
devananda | lucasagomes: do we guarantee which disk device the node will boot from? | 17:31 |
lucasagomes | devananda, yes, after writing the image we get the UUID and set it as the ROOT=UUID= | 17:31 |
lucasagomes | so at boot time, after deployment it's going to boot from the right device | 17:32 |
devananda | lucasagomes: that's a kernel param, right? | 17:32 |
lucasagomes | devananda, yes | 17:32 |
devananda | which applies only to net boot | 17:32 |
lucasagomes | yes, for the fulldisk images I believe they already have a bootloader in the image | 17:33 |
JayF | yes | 17:33 |
devananda | yup | 17:33 |
lucasagomes | with the right config to boot the right partition | 17:33 |
*** achanda has quit IRC | 17:33 | |
devananda | and if we're providing an API to change which disk the image is written to, are we sure that we wrote that to the disk the system expects the bootloader to be on? | 17:33 |
*** achanda has joined #openstack-ironic | 17:34 | |
devananda | also, after an image is written to disk, and the instance is deployed, what does it mean if the operator changes this value in the API without redeploying a new instance? | 17:34 |
lucasagomes | devananda, is a) the fulldisk image example? | 17:34 |
lucasagomes | devananda, because right now if u have 2 disks and ironic pick the first | 17:35 |
lucasagomes | it's random | 17:35 |
lucasagomes | ironic could boot once and get disk A and boot second time and get disk B | 17:35 |
lucasagomes | it's way more unpredictable the way it works now | 17:35 |
lucasagomes | for B) this is what zapping should do, clean the disks | 17:36 |
lucasagomes | I mean, both examples could happen as-is | 17:36 |
lucasagomes | with the current code | 17:36 |
devananda | I'm not sure where zapping got introduced to this question | 17:36 |
devananda | as for the current behavior, if it's not repeatable -- that's a bug | 17:37 |
devananda | assuming no hardware failures, any number of (re)deployments, given the same inputs, to the same node, should result in the same disk being used | 17:37 |
lucasagomes | I'm not sure I'm in sync. Right, yeah, it's a bug | 17:37 |
devananda | if that's not the case, that should definitely be fixed | 17:37 |
lucasagomes | and the hints help fix it | 17:37 |
devananda | that's where I disagree | 17:38 |
lucasagomes | devananda, so that assumption is wrong | 17:38 |
devananda | we should have predictable behavior. if I understand where you're going with hints (and maybe I dont) | 17:38 |
lucasagomes | cause you don't have udev rules enforcing device names or ordering | 17:38 |
devananda | they will lead to less repeatability, not more | 17:38 |
lucasagomes | you can't assume that | 17:38 |
*** achanda has quit IRC | 17:39 | |
JayF | lucasagomes: devananda: device names / identifiers are *not used at all* by IPA today | 17:39 |
victor_lowther | the issue is that (for example) discoverd's idea of what /dev/sda is may not be the same as what any random other distro's idea is. | 17:39 |
devananda | right | 17:39 |
lucasagomes | the hints is giving the operator a way to tell based on persistent block device naming which device to use to deploy the image | 17:39 |
victor_lowther | because SCSI device ordering in Linux is not stable, and it never will be. | 17:39 |
JayF | Although I think we take the "first", smallest disk >4GB ... so it's very possible that it could be inconsistent | 17:39 |
devananda | each OS may order the disks differently | 17:39 |
JayF | but in the case of IPA we could easily make the pick-a-device-to-deploy-to decision have more knowledge | 17:39 |
devananda | the OS-specific ordering shouldn't matter to Ironic. That it matters within the context of discoverd is a different problem | 17:40 |
JayF | I think the case that lucasagomes suggests is possible (and I've seen it happen), especially since deploy ramdisks use coreos+very recent kernel which means ordering could be radically different than an older (think: RHEL5/6) image would have | 17:40 |
lucasagomes | JayF, yeah, that's what I'm trying to abstract | 17:40 |
JayF | devananda: I'm saying I think it can today with IPA, if you have disks of equal size | 17:41 |
victor_lowther | the stablest solution I have found is to figure out what disk to use via its sysfs device path, then ensure that the OS install and bootloader use that disk based on its disk ID/WWN/serial number. | 17:41 |
JayF | victor_lowther: that's *exactly* how we do discovery for updating firmwares on our raid cards during decom | 17:41 |
devananda | can Ironic repeatably a) deploy an image b) onto the same disk c) which is the disk that that machine attempts to boot from || has the UUID which we pass as a kernel param via PXE | 17:41 |
lucasagomes | victor_lowther, JayF yup. That's what I'm proposing | 17:41 |
victor_lowther | devananda: UUID is a filesystem/partition property, not a disk property. | 17:42 |
lucasagomes | devananda, the UUID is after the deploy | 17:42 |
victor_lowther | Best to not rely on it for OS install purposes | 17:42 |
lucasagomes | cause it's a property of the fs | 17:42 |
lucasagomes | yeah that ^ | 17:42 |
devananda | victor_lowther: yes. argh.... | 17:42 |
JayF | you can rely on labels | 17:42 |
JayF | and label the partition how you expect | 17:42 |
JayF | when you deploy it | 17:42 |
JayF | although that doesn't help in the deploy ramdisk case | 17:43 |
devananda | can Ironic repeatably a) deploy an image b) onto the same disk c) which is the disk that that machine attempts to boot from | 17:43 |
victor_lowther | that is still after you pick the disk to deploy to | 17:43 |
lucasagomes | devananda, if ur booting via the network idk about c) | 17:43 |
JayF | devananda: my answer would be that IPA today, in some cases (two disks of same size), could consistently fail c, but I wouldn't expect unstable behavior across provisions | 17:43 |
victor_lowther | Only if something tells it which physical disk to use by ID | 17:43 |
lucasagomes | +1 | 17:44 |
victor_lowther | otherwise it will just usually work, not always work. | 17:44 |
devananda | so, IIUC, the problem you're all referring to is that the ramdisk agents (whether iSCSI or IPA or discoverd) may use non-repeatable ordering, or may use different ordering from each other (eg, if using discoverd and then using IPA on the same node) | 17:44 |
victor_lowther | yes | 17:44 |
lucasagomes | devananda, correct | 17:44 |
devananda | and just to check, the problem you're referring to is NOT related to the physical boot device or knowing which physical disk the server will boot from | 17:45 |
*** jistr|training has quit IRC | 17:45 | |
victor_lowther | it is related. | 17:45 |
lucasagomes | devananda, the problem is about finding which device we should write the image onto. And that means that we want to boot from that disk too | 17:47 |
lucasagomes | (that's why we enforce ROOT=UUID= in the kernel cmdline)... for the fulldisk image it already has a bootloader configured (but there's still a problem if both disks contain a bootloader) | 17:48 |
lucasagomes | which is not solved by the hints (and is not intended to solve that too) | 17:48 |
victor_lowther | and if you are managing the RAID controller, you can directly control that. If you are using UEFI, you don't have to care about that. Otherwise, you have to rely on having enough BIOS control, heuristics and/or quality of firmware. | 17:48 |
*** sambetts has quit IRC | 17:48 | |
JayF | victor_lowther: was not my intention to -1 you as soon as you pushed a new patchset. The comment I just posted is from yesterday and it just never made its way up :( | 17:49 |
lucasagomes | yeah, for RAID it's good too, to make sure you are using the device you've just created | 17:49 |
victor_lowther | it helps that most RAID controllers let you specify which disk they will try to boot from. | 17:49 |
lucasagomes | victor_lowther, [off-topic] http://doodle.com/9h4ncgx4etkyfgdw2wpdircv help us voting on the mascot name :) | 17:49 |
*** athomas has quit IRC | 17:50 | |
victor_lowther | JayF: no worries | 17:50 |
*** Marga_ has joined #openstack-ironic | 17:51 | |
JayF | jroll: nova-spec for configdrive just went in | 17:52 |
JayF | jroll: woo | 17:52 |
NobodyCam | nice :) | 17:52 |
victor_lowther | lucasagomes: Doodle 404! | 17:55 |
lucasagomes | ew lemme check | 17:55 |
lucasagomes | victor_lowther, http://doodle.com/9h4ncgx4etkyfgdw here it go | 17:56 |
victor_lowther | That worked | 17:57 |
lucasagomes | thanks :D | 17:57 |
NobodyCam | brb | 17:58 |
*** rushiagr is now known as rushiagr_away | 17:59 | |
lucasagomes | aight I will go now | 18:02 |
lucasagomes | have a good night everyone | 18:02 |
lucasagomes | (I will check the channel from time to time) | 18:02 |
*** lucasagomes is now known as lucas-dinner | 18:02 | |
*** derekh has quit IRC | 18:06 | |
*** rwsu has joined #openstack-ironic | 18:07 | |
*** Nisha has quit IRC | 18:10 | |
NobodyCam | have a good night lucas-dinner | 18:11 |
*** anderbubble has quit IRC | 18:18 | |
*** Masahiro has joined #openstack-ironic | 18:19 | |
*** anderbubble has joined #openstack-ironic | 18:21 | |
*** pensu has quit IRC | 18:21 | |
*** pensu has joined #openstack-ironic | 18:23 | |
PaulCzar | getting a fun new error from ironic-conductor. it happens during what looks to be laying down the image files in the tftproot directory: | 18:23 |
PaulCzar | https://gist.github.com/paulczar/ca2c3c21612b5cef5cc3 | 18:23 |
*** Masahiro has quit IRC | 18:23 | |
PaulCzar | unfortunately the error isn't very clear where to look to work out what is going wrong | 18:24 |
devananda | PaulCzar: is this repeatable? if so, under what conditions? | 18:26 |
PaulCzar | any time I try to provision a node ? | 18:27 |
PaulCzar | still using the ssh_agent | 18:27 |
PaulCzar | with virtualbox | 18:27 |
NobodyCam | PaulCzar: swift is setup for temp urls? | 18:30 |
*** harlowja_away is now known as harlowja_ | 18:31 | |
NobodyCam | and you have enough free disk space to hold the image? | 18:31 |
devananda | this is odd: 2014-12-02 18:21:19.361 4171 DEBUG ironic.drivers.modules.image_cache [-] Destination /tftpboot/96b061ed-eb91-4b12-b2d6-9d1c6ce34369/deploy_kernel already exists for image 88786efb-8985-4017-8839-7cd98ff9c87a fetch_image /usr/local/lib/python2.7/dist-packages/ironic/drivers/modules/image_cache.py:110 | 18:31 |
jroll | 2014-12-02 18:21:19.361 4171 WARNING ironic.conductor.manager [-] Error in deploy of node 96b061ed-eb91-4b12-b2d6-9d1c6ce34369: 'image_source' | 18:31 |
devananda | PaulCzar: can you manually delete the /tftpboot/96b061ed-eb91-4b12-b2d6-9d1c6ce34369 directory | 18:31 |
jroll | the node doesn't have instance_info.image_source | 18:31 |
jroll | err, instance_info['image_source'] | 18:32 |
jroll | which nova is supposed to put down afaik | 18:32 |
devananda | jroll: huh. that's not a great error message, then. | 18:32 |
devananda | yea, Nova should be passing that in, but it should fail that check sooner, right? | 18:32 |
jroll | devananda: really? I think it's a great error :P | 18:32 |
devananda | look at where it comes from | 18:32 |
jroll | I'm joking | 18:32 |
devananda | heh | 18:33 |
devananda | there should be an exception traceback just after that | 18:33 |
PaulCzar | hmm where does it get image_source from? glance metadata ? | 18:33 |
jroll | PaulCzar: nova puts the image uuid there | 18:33 |
devananda | PaulCzar: that should be the image UUID that you requested via "nova boot" | 18:33 |
PaulCzar | crazy question ... if I used the image name will nova still pass the uuid ? | 18:34 |
devananda | the nova.virt.ironic driver should populate the node.instance_info['image_source'] field during driver.deploy | 18:34 |
devananda | PaulCzar: yup | 18:34 |
devananda | well. it should. if it's not .... | 18:34 |
jroll | so that probably comes through from https://github.com/openstack/ironic/blob/master/ironic/drivers/modules/agent.py#L259 | 18:35 |
jroll | that's the only direct image_source access we do | 18:35 |
jroll | more specifically https://github.com/openstack/ironic/blob/master/ironic/drivers/modules/agent.py#L177 | 18:35 |
PaulCzar | let me try again with the uuid just to make sure | 18:35 |
jroll | bah, we don't validate image_source | 18:36 |
devananda | https://github.com/openstack/ironic/blob/master/ironic/conductor/manager.py#L715 | 18:36 |
*** dtantsur|brb is now known as dtantsur | 18:36 | |
devananda | PaulCzar: is there an exception traceback in the conductor log after that, which you didn't include in the paste? | 18:36 |
PaulCzar | devananda: nope, there's no exception traceback | 18:37 |
devananda | so that bothers me a bit | 18:37 |
jroll | btw that comes from https://github.com/openstack/ironic/blob/master/ironic/conductor/manager.py#L715 | 18:37 |
devananda | we're catching and reraising, but losing the traceback somehow? | 18:37 |
jroll | it's a catch-all | 18:37 |
devananda | jroll: yea, I just linked that :) | 18:37 |
* jroll is blind right now | 18:37 | |
devananda | also, that exception, when converted to a string, is just the word "image_source" | 18:38 |
jroll | so, it should reraise that exception | 18:38 |
jroll | where does that end up? :| | 18:38 |
devananda | well, I bet it is, but there's no logger set up for that thread | 18:38 |
jroll | mmm, right. | 18:38 |
*** rwsu has quit IRC | 18:38 | |
dlaube | hey guys, I'm seeing "ERROR ironic.drivers.modules.agent [-] node 1d0203c6-151c-4113-bb31-738b336a07e4 command status errored: {u'message': u'Error downloading image.', u'code': 500, u'type': u'ImageDownloadError', u'details': u'Could not download image with id 1edccf41-f244-4304-ab65-66d28d5a86a7.'}" when I try to nova boot | 18:39 |
devananda | s/logger/exception handler | 18:39 |
devananda | since, clearly, it's capable of logging from that thread | 18:39 |
dlaube | I've added agent_pxe_append_params = nofb nomodeset vga=normal console=ttyS0 systemd.journald.forward_to_console=yes but I'm not sure what I should be looking for in the bm console logs | 18:39 |
jroll | so maybe outside of that with reraise block we should log.exception(e) | 18:39 |
jroll | dlaube: look for anything interesting... or feel free to just paste the whole thing | 18:39 |
devananda | jroll: I'm guessing a bit here, but it seems like either we can drop the reraise and just log and clean up, or we should sort out why worker threads aren't logging exceptions. | 18:41 |
* devananda returns to thinking about the state machine | 18:42 | |
dlaube | http://paste.openstack.org/show/p3Hvh7DAfDzspebWFxbl/ | 18:42 |
dlaube | looks like coreos deploy image went well… nothing from IPA stands out to me | 18:42 |
dlaube | am I able to ssh into the coreos deploy image while it is doing the rest of the deployment? | 18:43 |
jroll | devananda: both, imo? unexpected exceptions that we don't catch should be logged, we've written a bunch of code to work around it | 18:43 |
jroll | dlaube: if you build your own ramdisk and embed ssh keys, you can :) | 18:43 |
jroll | ugh, ipa y u no lo | 18:43 |
jroll | log, even | 18:43 |
jroll | JayF: ^ that's what IPA console logs look like in devstack fyi | 18:44 |
rloo | devananda, jroll: isn't it because there is a hook there to call ._provisioning_error_handler() | 18:44 |
dlaube | jroll: can I still use disk-image-builder and then add some things in before I glance image-create? or is a custom deploy image/ramdisk more involved in that | 18:46 |
dlaube | googling now | 18:46 |
jroll | dlaube: blah, I think we're missing some log things | 18:47 |
jroll | dlaube: for IPA, we don't use DIB, there's a builder in the repo | 18:47 |
jroll | dlaube: https://github.com/openstack/ironic-python-agent/tree/master/imagebuild/coreos | 18:47 |
jroll | drop things in oem/ as needed | 18:47 |
jroll | (e.g. oem/authorized_keys) | 18:48 |
NobodyCam | brb... | 18:48 |
jroll | rloo: where is there a hook for ._provisioning_error_handler()? | 18:48 |
devananda | jroll: https://github.com/openstack/ironic/blob/master/ironic/conductor/manager.py#L698 | 18:48 |
PaulCzar | I think I'm going backwards now ... - Error in deploy of node 6bbed42b-68eb-4946-90a5-68c797762f94: HTTPNotFound (HTTP 404) | 18:48 |
devananda | https://github.com/openstack/ironic/blob/master/ironic/conductor/manager.py#L593 | 18:49 |
dlaube | good to know. thanks jroll | 18:49 |
devananda | PaulCzar: does "ironic node-show <UUID>" indicate anything on the last_error field? | 18:49 |
jroll | ah, I see | 18:49 |
jroll | dlaube: np | 18:49 |
PaulCzar | also these errors seem to be surfacing as WARN rather than ERROR | 18:50 |
jroll | rloo: devananda: yeah, so _provisioning_error_handler doesn't actually do anything in this case | 18:50 |
PaulCzar | devananda: last_error: none | 18:50 |
rloo | jroll: right. The comment for _provisioning_error_handler() is perhaps why. It thinks it is only being called when there was an exception spawning a worker. | 18:51 |
devananda | rloo: that does not appear to be the case, based on reading task_manager | 18:51 |
jroll | it knows more, it just only handles that sort of exception | 18:51 |
devananda | rloo: but IMBW ... | 18:51 |
rloo | devananda: right, based on the comments in task_manager (I'm too lazy to read the code now) | 18:52 |
devananda | PaulCzar: something's off. if the deploy fails, it should save the failed state and reason why | 18:52 |
PaulCzar | paste of conductor log is in the first comment here - https://gist.github.com/paulczar/ca2c3c21612b5cef5cc3 | 18:53 |
devananda | PaulCzar: paste of "ironic node-show <UUID>" for the failed node? | 18:54 |
*** dtantsur is now known as dtantsur|afk | 18:54 | |
PaulCzar | added as comment in above gist | 18:55 |
NobodyCam | instance uuid is none? | 18:56 |
devananda | PaulCzar: this has been cleaned up. there's no failure here. | 18:56 |
devananda | PaulCzar: can you capture that after the boot fails? | 18:56 |
PaulCzar | that is after the boot fails | 18:56 |
PaulCzar | blerg, nova scheduler fails the nova boot with no valid host | 18:57 |
dlaube | the thing I find interesting about the failure to download the image error reported in the ironic cond log is that it looks like the image is being retrieved just fine according to glance reg "29831 DEBUG glance.registry.api.v1.images [7c53c4cb-4271-478c-ae2c-aa87300f7471 08df81fb719d413eacb36c2a249f1514 dcd2a172eb934e39a99ddb216e94b69f - - -] Successfully retrieved image ef0270d4-9e13-4c58-a8e9-bf8aad31d09d show /opt/stack/glance/glance/registry/api | 18:57 |
PaulCzar | dlaube: I had something similar to that the other day and I think the metadata for the image had bad values for kernel and ramdisk | 18:59 |
devananda | PaulCzar: gotcha. it looks like maybe this image is not accessible by the ironic service user? b195e0dc-fb06-4032-aae9-720b63abb923 | 19:00 |
PaulCzar | son of a! | 19:00 |
PaulCzar | the newer glance cli doesn't allow --public | 19:01 |
dlaube | PaulCzar: thanks, I'll double check the ids for the kernel and initrd specified in glance for the image I'm using with my nova boot call | 19:02 |
jroll | dlaube: we do a glance.show() on the image, the agent actually downloads it from swift | 19:03 |
devananda | victor_lowther: i think i missed the discussion somewhere -- is there a reason that your new draft does not have a path from DELETED to AVAILABLE, without going through both ZAP and INTROSPECT ? | 19:03 |
dlaube | jroll: ahh | 19:03 |
jroll | dlaube: I think you're hitting this one https://github.com/openstack/ironic-python-agent/blob/master/ironic_python_agent/extensions/standby.py#L146 | 19:03 |
jroll | would love a patch to add logging for that | 19:04 |
dlaube | so I should check my ~/logs/screen-s-object.log in devstack then | 19:04 |
victor_lowther | yes, and it has to do with drawing that diagram while listening to an all-hands. :/ | 19:04 |
PaulCzar | is swift needed for the non-devstack ? | 19:04 |
PaulCzar | it's not mentioned in the developer quick start | 19:04 |
devananda | victor_lowther: hah | 19:04 |
jroll | PaulCzar: for the agent driver (docs suck really bad for the agent driver) :( | 19:05 |
jroll | victor_lowther: devananda: I really really think that zapping should not imply introspecting and vice versa | 19:05 |
devananda | victor_lowther: ok. I'm taking another stab at it. I like the new wording around multiple states. What do you think of denoting that with a symbol to be even clearer, eg, [DEPLOY*] | 19:06 |
devananda | jroll: ditto | 19:06 |
victor_lowther | Make a comment, and I will fix it in the next rev -- there are a few other reviews I missed while hacking up version 8. | 19:06 |
* jroll comments on review | 19:06 | |
PaulCzar | are you kidding me right now ? something as critical as mentioning swift is required isn't mentioned anywhere ? | 19:06 |
victor_lowther | jroll: why not? | 19:06 |
*** mikedillion has joined #openstack-ironic | 19:07 | |
victor_lowther | devananda: sure, I can throw a symbol there. | 19:07 |
jroll | PaulCzar: I'm a horrible person :( | 19:08 |
jroll | PaulCzar: to be clear, this is only for agent driver, not pxe driver | 19:08 |
*** spandhe has joined #openstack-ironic | 19:08 | |
jroll | victor_lowther: 1) because they are two separate things; 2) introspecting should never be automatically triggered | 19:08 |
PaulCzar | so my options are to run swift or spend $1000 on building a pxe lab ? | 19:09 |
jroll | PaulCzar: I don't understand what infra you need for a pxe lab that you don't need for the agent | 19:09 |
victor_lowther | jroll: I think ZAPPING is a special case because out of all the states we have, ZAPPING is where we are going to make changes that can change what the node hardware looks like to everyone else. | 19:10 |
PaulCzar | jroll: doesn't pxe require ipmi gear to power on/off ? | 19:10 |
jroll | PaulCzar: how are you doing power control with the agent driver? | 19:10 |
jroll | because both require *some* form of power control | 19:10 |
victor_lowther | so it should update node.properties for cpus, arch. memory, disk sizes, etc. | 19:10 |
PaulCzar | jroll: using the virtualbox power controls in ssh_agent | 19:11 |
jroll | PaulCzar: you can do that with the pxe driver as well, see pxe_ssh | 19:11 |
jroll | victor_lowther: if the operator does things in zapping that will change properties, they should run discovery afterward or something. maybe we can make that an optional thing. | 19:11 |
*** pensu has quit IRC | 19:11 | |
jroll | victor_lowther: imagine you flash new firmware in zapping and a disk disappears, do you want to update node.properties to reflect that disk is gone? | 19:12 |
victor_lowther | hell yes | 19:12 |
PaulCzar | jroll: ah, good to know | 19:12 |
jroll | what | 19:12 |
jroll | what | 19:12 |
PaulCzar | jroll : docs for pxe? or do I need to dig through the source to figure it out? :) | 19:12 |
jroll | if a disk disappears for an unknown reason, I want ironic to fail that node hard | 19:12 |
*** sreekanth has joined #openstack-ironic | 19:13 | |
jroll | victor_lowther: ^ | 19:13 |
jroll | PaulCzar: docs for pxe are pretty alright, just follow the deploy docs that are up | 19:13 |
* jroll finds a link | 19:13 | |
victor_lowther | the one does not preclude the other, | 19:13 |
devananda | PaulCzar: PXE_* drivers are better documented at this point. largely because they've been around the project the longest | 19:13 |
victor_lowther | or do you want the hardware properties in node.properties to be prescriptive instead of descriptive? | 19:13 |
jroll | victor_lowther: I don't understand | 19:14 |
devananda | PaulCzar: that said, if you're looking at doing anything meaningful with physical hardware, why are you using *_ssh / virtualbox instead of IPMI ? | 19:14 |
jroll | PaulCzar: http://docs.openstack.org/developer/ironic/deploy/install-guide.html | 19:14 |
PaulCzar | devananda: because I want to prove this out locally and be able to do CI etc without tying up physical hardware | 19:15 |
victor_lowther | do you want the info in node.properties to declare what you expect a node to have | 19:15 |
jroll | victor_lowther: very much yes | 19:15 |
victor_lowther | or do you want it to reflect what the node has? | 19:15 |
jroll | victor_lowther: perhaps discovery should fill that in the first time, to reflect the actual properties | 19:15 |
jroll | victor_lowther: but I never want those to change without me knowing | 19:15 |
devananda | victor_lowther, jroll: we have an important distinction here. "properties declare what the node should have; if it doesn't, fail fast" || | 19:15 |
devananda | || "I don't know what the node has, go discover it and update properties" | 19:16 |
NobodyCam | devananda: ++ | 19:16 |
devananda | the latter is exceptionally rare | 19:16 |
devananda | and, IMO, should require external initiation | 19:16 |
jroll | completely agree | 19:16 |
victor_lowther | the latter is the model I have always operated in. | 19:16 |
victor_lowther | it does not inhibit error detection and resolution due to missing hardware | 19:16 |
PaulCzar | btw all the docs say to use glance ... --public ... new clients should be glance ... --is-public=true | 19:16 |
jroll | victor_lowther: so if half of your ram fails, you want to just keep using that machine? | 19:16 |
victor_lowther | because I have always tracked those changes over time. | 19:16 |
NobodyCam | PaulCzar: can you file a bug on that so we don't forget | 19:17 |
victor_lowther | if half the ram fails without redundancy, the machine usually dies a horrible death | 19:17 |
victor_lowther | not continue to work silently | 19:17 |
devananda | victor_lowther: if the characteristics of the node change in any meaningful way, unintentionally, chances are good that it won't match a Nova Flavor any more | 19:17 |
victor_lowther | unless it lost half the ram due to someone stealing it. | 19:18 |
jroll | victor_lowther: right. then by the existing spec: the user ends up calling nova delete, ironic introspects the hardware and updates node.properties, it suddenly has half the ram | 19:18 |
devananda | victor_lowther: so then the node sits idle, but not in an error state, indefinitely, since it no longer can be used by Nova | 19:18 |
victor_lowther | then your inventory control system notices and goes WTF | 19:18 |
victor_lowther | (which should be something besides Ironic) | 19:19 |
devananda | victor_lowther: you assume someone has a CMDB | 19:19 |
devananda | which, I think they should, but I also think that encoding that behavior in ironic is not helpful. | 19:19 |
victor_lowther | and if they do not, how do they track everything? | 19:19 |
devananda | doing the other thing (take the node out of rotation) does not prevent an external system from doing what you expect (namely, noticing the error) | 19:19 |
devananda | victor_lowther: napkins | 19:20 |
victor_lowther | Excel | 19:20 |
NobodyCam | most testing env's don't have a cmdb | 19:20 |
devananda | PaulCzar: if you don't want to run swift, and you are building a virtualized CI system for Ironic, then yes, you probably want to stick to pxe_ssh driver | 19:20 |
PaulCzar | finding out all of this the hard way :) | 19:21 |
devananda | PaulCzar: that said, the Agent is pretty spiffy, and I didn't think running swift would be enough of a burden to prevent someone from choosing the Agent | 19:22 |
jroll | victor_lowther: even if you do have a CMDB, you'd have to set it up to poll ironic, notice changes, alert, etc etc | 19:22 |
victor_lowther | If y'all intend node.properties to be prescriptive, I don't have an issue with that. | 19:22 |
victor_lowther | that is just a mode that Crowbar has never operated in. | 19:22 |
devananda | victor_lowther: :) | 19:22 |
jroll | victor_lowther: I may be biased, I don't even care to run discovery because I want to tell ironic what should be there | 19:22 |
victor_lowther | jroll: that inevitably fails | 19:23 |
jroll | why? | 19:23 |
jroll | it's working great for me | 19:23 |
victor_lowther | so someone else sorts out inventory from ordering then? | 19:23 |
jroll | we get a spreadsheet from our vendor of exactly what was shipped | 19:24 |
devananda | for my uses, discovery is good once and only twice -- when new hardware arrives and I want to ensure the factory manifest is accurate (which it usually isn't) and after replacing (parts of) hardware, for the same reason | 19:24 |
jroll | we run a shitty python script to take that data and put it in ironic | 19:24 |
devananda | jroll: next you'll tell me that spreadsheet is never wrong | 19:24 |
jroll | plug everything in and pxe boot | 19:24 |
jroll | devananda: for the purposes of node.properties, it's never been wrong :) | 19:24 |
devananda | jroll: fair 'nuf | 19:24 |
* jroll will not say the same about other info there | 19:25 | |
devananda | NIC MAC's and IPMI info are the largest problem I've seen so far, actually | 19:25 |
devananda | not # of CPU cores | 19:25 |
jroll | yep, same here | 19:25 |
victor_lowther | Must not order systems in Q4, then. :) | 19:25 |
victor_lowther | but anyways | 19:26 |
victor_lowther | I am fine with logic that has node.properties for hardware config be prescriptive if set | 19:26 |
victor_lowther | we would just need to have logic at certain points in the state machine to check it. | 19:27 |
jroll | agree | 19:27 |
jroll | what we do today as part of zapping is verify that what's in the node is what's in ironic | 19:28 |
jroll | and fail out if there's a mismatch | 19:28 |
jroll | I don't know if that's the best route, but it's been working for us, we've caught real hardware failures with this | 19:28 |
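
The "verify and fail on mismatch" check jroll describes could look roughly like the sketch below. This is only an illustration, not Ironic's implementation: `HardwareMismatch`, `verify_properties`, and the property keys are hypothetical names, and the node data is modelled as plain dicts.

```python
class HardwareMismatch(Exception):
    """Raised when discovered hardware differs from the declared properties."""


def verify_properties(declared, discovered):
    """Compare declared node properties against what was found on the node.

    Both arguments are plain dicts; the keys used here are illustrative.
    Raises HardwareMismatch carrying {key: (expected, found)} for every
    declared key whose discovered value differs.
    """
    mismatches = {
        key: (expected, discovered.get(key))
        for key, expected in declared.items()
        if discovered.get(key) != expected
    }
    if mismatches:
        raise HardwareMismatch(mismatches)


declared = {"cpus": 8, "memory_mb": 32768, "local_gb": 500}
verify_properties(declared, dict(declared))  # identical: returns without error

try:
    # Half the RAM has gone missing, so the check fails instead of updating.
    verify_properties(declared, {"cpus": 8, "memory_mb": 16384, "local_gb": 500})
except HardwareMismatch as exc:
    print("mismatch:", exc.args[0])
```

The point of the design is that the declared values are never overwritten by the check; a difference is treated as a failure to investigate.
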
devananda | jroll: iiuc, lucas-dinner was proposing to put in an assertion-check at the beginning of deployment | 19:29 |
jroll | devananda: right, I disagree, too slow | 19:29 |
devananda | I would strongly prefer to put that before making the node available | 19:29 |
jroll | yep | 19:29 |
victor_lowther | so, in INTROSPECTING, then? :) | 19:29 |
jroll | IMO no | 19:30 |
jroll | introspecting should be manually triggered, look at what's there and update the node object, no questions ask | 19:30 |
*** ndipanov is now known as ndipanov_gone | 19:30 | |
victor_lowther | ok | 19:30 |
jroll | asked* | 19:30 |
victor_lowther | then it should be outside the state machine | 19:30 |
jroll | but... folks can and will disagree with me :) | 19:30 |
jroll | I mean, it's still a valid state | 19:31 |
victor_lowther | since it can happen any time | 19:31 |
victor_lowther | by operator request | 19:31 |
victor_lowther | sort of like maintenance mode. | 19:31 |
jroll | no, it can only happen in certain states (available/init/maybe more) | 19:31 |
jroll | like, you couldn't send a DEPLOYING node into introspection | 19:31 |
*** pelix has quit IRC | 19:31 | |
jroll | because it's busy | 19:31 |
devananda | not in available | 19:32 |
jroll | so you would go available -> init -> introspecting/ | 19:32 |
jroll | ? | 19:32 |
jroll | that would be fine with me | 19:32 |
victor_lowther | meh | 19:32 |
devananda | jroll: s/init/managed/ | 19:32 |
jroll | sure | 19:32 |
devananda | it's just a word, but i think that substitution has value in our discussion | 19:32 |
jroll | so many new words | 19:32 |
victor_lowther | I would just set maintenance mode, run introspection, verify that I got what I expect, and unset maintenance mode. | 19:33 |
devananda | maintenance mode doesn't affect current state, and can be applied to any state | 19:33 |
victor_lowther | Exactly. | 19:33 |
devananda | what about requiring the node be transitioned back to managed (aka init) in order to initiate introspection? | 19:34 |
victor_lowther | back to the zapping can change things point | 19:34 |
devananda | a node that is managed but not available for use yet can be (re)introspected | 19:34 |
devananda | zapping is orthogonal | 19:34 |
devananda | I can zap at that point. I can also zap between delete and available | 19:34 |
devananda | it's automatic between delete and available | 19:35 |
devananda | it's manual on a managed node | 19:35 |
devananda | jroll: I think this is closer to what ya'll are doing today? | 19:35 |
jroll | devananda: without the intermediate managed state, yes | 19:35 |
jroll | (and we don't have any concept of introspection, ofc) | 19:36 |
devananda | sure | 19:36 |
devananda | do you zap nodes which are available to nova? | 19:36 |
devananda | or do you do something to take them out of scheduling first? | 19:36 |
*** mrda-away is now known as mrda | 19:36 | |
mrda | Morning Ironic | 19:36 |
NobodyCam | morning mrda | 19:37 |
jroll | devananda: we zap available nodes, they go to the "ZAPPING" state (DECOMMISSIONING downstream) where they can no longer be scheduled | 19:37 |
devananda | k | 19:37 |
devananda | so there's a small race there | 19:37 |
jroll | yet another :) | 19:38 |
devananda | jroll, victor_lowther: http://paste.openstack.org/show/XsFt3wtJevf8LCcCB5fG/ | 19:38 |
devananda | lemme know what you think | 19:38 |
jroll | we don't actually do that very often, only if e.g. we get new firmware or whatever | 19:38 |
jroll | is R:thing a request for thing? | 19:39 |
devananda | jroll: yes. see victor's draft for an explanation | 19:39 |
devananda | jroll: tldr; the PUT to the REST API | 19:39 |
jroll | thought so, thanks | 19:39 |
jroll | right | 19:39 |
devananda | [FOO*] indicates FOOING, FOOED, and FOOFAIL | 19:39 |
victor_lowther | hm | 19:39 |
jroll | right | 19:40 |
victor_lowther | I think it is weird to have [ZAP*] in two places in the graph | 19:40 |
devananda | victor_lowther: i agree | 19:40 |
devananda | but I wanted it to be clear that managed -> zap -> managed is a manually-invoked process | 19:40 |
devananda | and active -> delete -> zap -> available is automatic | 19:41 |
devananda | wasn't sure how else to do that | 19:41 |
victor_lowther | hm... | 19:41 |
jroll | devananda: yeah, I think this looks fine | 19:41 |
devananda | i also left out preboot :( | 19:41 |
victor_lowther | managed -> zap+flash -> managed | 19:41 |
victor_lowther | and delete -> zap -> available | 19:42 |
victor_lowther | preboot is easy to forget | 19:42 |
victor_lowther | I would just make it a rule that all fooing states should be booted into something | 19:43 |
victor_lowther | maybe. | 19:43 |
victor_lowther | Have to think about what that entails. | 19:43 |
victor_lowther | hm... | 19:43 |
victor_lowther | managed -> [mangle] -> [introspect] -> managed? | 19:44 |
devananda | that's not necessarily linear | 19:44 |
victor_lowther | make it clear that zap will not change what the hardware looks like, whereas mangle can? | 19:44 |
* victor_lowther nods | 19:44 | |
devananda | which is why i made zap and inspect separate loops | 19:44 |
victor_lowther | but hard to freehand nonlinear things in irc. :) | 19:45 |
devananda | indeed | 19:45 |
devananda | jroll: could you guys implement the long-running-ramdisk stuff within the zap* state? | 19:46 |
jroll | devananda: as in, boot the ramdisk at the end of zap* | 19:46 |
jroll | seems weird | 19:46 |
devananda | yes | 19:46 |
jroll | I don't think we need the preboot state, tbh | 19:46 |
devananda | it's part of "get it ready for provisioning again" | 19:46 |
devananda | right? | 19:46 |
jroll | I guess? | 19:46 |
jroll | I mean, it's an optimization | 19:47 |
devananda | that's what I'm getting at | 19:47 |
NobodyCam | I agree with jroll sounds strange at first read | 19:47 |
jroll | you can schedule to nodes that are prebooted or not prebooted | 19:47 |
PaulCzar | any benefits to using ipxe over pxe ? | 19:47 |
PaulCzar | reading through the pxe docs right now | 19:47 |
devananda | preboot as a separate requestable state, which is essentially just a permutation of AVAILABLE, doesn't fit in the general case | 19:47 |
jroll | one is faster, should prefer one, just check power on to decide | 19:47 |
victor_lowther | PaulCzar: less tftp traffic | 19:47 |
jroll | PaulCzar: http > pxe | 19:47 |
PaulCzar | ipxe seems to be simpler ? | 19:47 |
jroll | err | 19:47 |
devananda | PaulCzar: http > tftp | 19:47 |
jroll | http > tftp | 19:47 |
*** lucas-dinner has quit IRC | 19:48 | |
devananda | PaulCzar: PXE is simpler in a sense. iPXE is both more extensible and more robust. | 19:48 |
victor_lowther | potential downside is that booting to local disk can be problematic depending on firmware | 19:48 |
openstackgerrit | Jarrod Johnson proposed stackforge/pyghmi: Implement server side IPMI protocol (WIP) https://review.openstack.org/138109 | 19:48 |
devananda | PaulCzar: you'll need a tftp service (just run tftpd) for PXE. you'll *also* need an HTTP service (eg, apache) for iPXE | 19:48 |
PaulCzar | devananda: right ... I'll start with pxe then ... although I do like the idea of http ... tftp is super slow | 19:49 |
jjohnson2 | tftp means a really ugly workaround for > 65k blocks, and tftp is easy to implement in software, meaning it isn't fancy enough to do things like send more than one packet without acknowledgement | 19:49 |
devananda | PaulCzar: start simple ++ | 19:49 |
devananda | jroll: if you just want a scheduling hint, why not use something in node.properties ? | 19:50 |
devananda | jroll: we don't need a discrete state for that | 19:50 |
devananda | like a nova filter which prefers nodes with "is_prewarmed" in node.properties['capabilities'] over ones that do not have that key | 19:51 |
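
A rough sketch of the scheduling hint devananda suggests here. It is not a real Nova filter or weigher: `prefer_prewarmed` is a hypothetical helper, nodes are plain dicts, and treating `capabilities` as a comma-separated string of `key:value` pairs is an assumption made for the example.

```python
def prefer_prewarmed(nodes):
    """Order candidates so nodes flagged 'is_prewarmed' come first.

    Each node is a plain dict shaped like {'uuid': ..., 'properties': {...}}.
    Nodes without the key are still eligible; they are only ranked lower.
    """
    def is_prewarmed(node):
        caps = node.get("properties", {}).get("capabilities", "")
        keys = {
            item.split(":", 1)[0].strip()
            for item in caps.split(",")
            if item.strip()
        }
        return "is_prewarmed" in keys

    # sorted() is stable, so nodes keep their relative order within each group.
    return sorted(nodes, key=lambda node: not is_prewarmed(node))


candidates = [
    {"uuid": "node-a", "properties": {"capabilities": "boot_mode:bios"}},
    {"uuid": "node-b", "properties": {"capabilities": "is_prewarmed:true"}},
    {"uuid": "node-c", "properties": {}},
]
print([node["uuid"] for node in prefer_prewarmed(candidates)])
```

Because this only reorders candidates, it expresses a preference rather than a requirement, which matches the "hint, not a discrete state" argument.
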
victor_lowther | that is what I suggested to hint to Ironic whether and what to preboot. | 19:51 |
devananda | victor_lowther: the question in my mind right now is, how would one indicate that to ironic | 19:52 |
devananda | either ... | 19:52 |
devananda | - all nodes preboot all the time | 19:52 |
devananda | - none preboot automatically, but an operator (or external service) can manually request a specific node to preboot | 19:53 |
devananda | - some magical orchestration logic gets added to ironic with knobs that allow it to decide when and how many to preboot | 19:53 |
devananda | i clearly don't like option #3 | 19:53 |
victor_lowther | definitely not 3 | 19:53 |
victor_lowther | I would lean to 2 | 19:53 |
devananda | if it's 1, I would say it belongs in ZAP* | 19:54 |
devananda | and I don't actually see a benefit to 2 | 19:54 |
jroll | devananda: yeah, that's fine, I've never thought we needed preboot states | 19:54 |
devananda | and I don't actually see a benefit to 2 -- as a separate state in the state machine | 19:54 |
jroll | devananda: yeah, actually, in ZAP* might work | 19:55 |
* jroll looks at code | 19:55 | |
devananda | jroll: cool. i'm curious to know how you'd implement 2 in ZAP* | 19:55 |
victor_lowther | so, how to tell what gets prebooted? | 19:55 |
victor_lowther | (the image, not the nodes) | 19:55 |
victor_lowther | in that case? | 19:55 |
jroll | devananda: 2 would not work in ZAP* | 19:55 |
NobodyCam | brb | 19:55 |
devananda | jroll: hm. ok. then ya'll would just preboot all the time? | 19:56 |
jroll | yes | 19:57 |
jroll | I see why others might want to only preboot some | 19:57 |
jjohnson2 | fyi, the ipmi target implementation can sanely respond to a get channel auth request | 19:58 |
jroll | devananda: what we do now is at the end of ZAP*, we reboot instead of power off | 19:58 |
jjohnson2 | NobodyCam,so not too much work and the hard part is done | 19:58 |
jjohnson2 | too much more work until the hard part is over that is | 19:58 |
*** ParsectiX has joined #openstack-ironic | 19:58 | |
devananda | jroll: nice. that's what I thought. so this works for you | 19:59 |
devananda | jroll: and I'm not aware of any other contributors working on / asking for a prewarmed-some-of-the-time optimization. so I'm OK with not making this more complicated to accommodate that | 19:59 |
*** spandhe has quit IRC | 19:59 | |
victor_lowther | ok | 20:00 |
*** spandhe has joined #openstack-ironic | 20:00 | |
jroll | devananda: sure, makes sense | 20:00 |
devananda | victor_lowther: new version of mine coming in 10m | 20:00 |
victor_lowther | so to me that sounds like PREBOOT* should vanish in favor of a "don't shut down when you hit AVAILABLE" flag. | 20:01 |
victor_lowther | coupled with "ZAP* always boots into something like discoverd or IPA" | 20:01 |
*** andreykurilin_ has joined #openstack-ironic | 20:02 | |
devananda | gah. TC meeting ... make that 2 hours | 20:02 |
* jroll also has a meeting now | 20:02 | |
victor_lowther | :) No worries. I am not going to so anything with the spec until more folks have commented on it anyways. | 20:03 |
victor_lowther | er, do anything. | 20:03 |
devananda | http://paste.openstack.org/show/JZ0tjouzcFYWZx1P18z6/ | 20:04 |
jroll | devananda: transition from AVAILABLE to MANAGED? | 20:04 |
devananda | yes | 20:04 |
devananda | missing an ^ arrow | 20:05 |
jroll | devananda: other than that, lgtm, would love for victor_lowther to push a new patchset with this | 20:07 |
*** Masahiro has joined #openstack-ironic | 20:07 | |
NobodyCam | jjohnson2: I'll have a look in a bit | 20:08 |
NobodyCam | but awesome | 20:08 |
victor_lowther | I will, but not until this evening. Have to give interested folks on the far side of the planet their chance to point out other things I got wrong. | 20:08 |
rloo | devananda: I'm not sure how you get from delete->zap (unless I am misinterpreting [DELET*/AVAILABLE]) -- doesn't this mean go to AVAILABLE after deleted? | 20:09 |
devananda | rloo: it means the target state is AVAILABLE | 20:09 |
devananda | but it can go through anything the state machine wants to along the way | 20:09 |
jroll | victor_lowther: if we've already agreed something is wrong, we should fix it asap so that people aren't commenting on outdated things | 20:09 |
devananda | current / target | 20:09 |
devananda | that tracks the difference between ZAP/MANAGE and ZAP/AVAILABLE | 20:10 |
jroll | victor_lowther: give them a chance to comment on the new state machine overnight, rather than change it tomorrow and wait another night | 20:10 |
devananda | victor_lowther: if you dont mind and dont have time, i can just push a new rev over yours, with that picture in it | 20:10 |
rloo | devananda: hmm. ok, it is the only one that is different (target state of others, go to that state next) | 20:11 |
victor_lowther | it is not just the picture, there is feedback that I inadvertently ignored from the last rev that also needs to be incorporated. | 20:11 |
rloo | devananda: you lost me with delet* -> zap/manage vs delet* -> zap/available. I don't see that in the diagram. | 20:11 |
jroll | rloo: right, the ACTIVE -> AVAILABLE transition goes through deleting and zapping | 20:11 |
devananda | rloo: ZAP is not a target state, ever, actually | 20:11 |
victor_lowther | but a comment on the current rev with a link to your new state machine would be appreciated. :) | 20:12 |
devananda | victor_lowther: ack, will do | 20:12 |
jroll | devananda: hmm, so what's the API call for MANAGED -> ZAP -> MANAGED | 20:12 |
*** Masahiro has quit IRC | 20:12 | |
jroll | target: ZAPPED? | 20:12 |
devananda | jroll: PUT /... {target: zap} | 20:12 |
JayF | As a note, since you're talking about this; I do have a proposal up on the state machine spec that suggests "decom" work be done in another step other than ZAPPING | 20:12 |
jroll | ok | 20:12 |
jroll | JayF: I don't love that :( | 20:13 |
devananda | which goes to current=ZAPPING,target=MANAGED | 20:13 |
JayF | because ZAPPING being arbitrary stuff or security cleanups is incredibly confusing | 20:13 |
devananda | JayF: look at http://paste.openstack.org/show/JZ0tjouzcFYWZx1P18z6/ | 20:13 |
devananda | JayF: and tell me if you still think that | 20:13 |
devananda | current=ZAPPING,target=MANAGED is different from current=ZAPPING,target=AVAILABLE | 20:14 |
devananda | JayF: which I think captures enough information for your needs | 20:14 |
JayF | so what kicks things from MANAGED to AVAILABLE | 20:14 |
devananda | a human | 20:14 |
victor_lowther | I would rename ZAP/AVAILABLE to CLEAN/AVAILABLE | 20:14 |
jroll | devananda: yep, got it | 20:14 |
devananda | we could just rename ZAP to CLEAN | 20:15 |
victor_lowther | nah | 20:15 |
devananda | :) | 20:15 |
JayF | If we rename zap to clean for the case of cleanup after deleted | 20:15 |
JayF | that's exactly what I wanted | 20:15 |
victor_lowther | because then we can rule that only cleaning stuff happens in the ->AVAILABLE transition, and if you want to do arbitrary things you have to MANAGE the node first. | 20:15 |
victor_lowther | and ZAP/MANAGE would be clean + other interesting things. | 20:16 |
*** r-daneel has quit IRC | 20:17 | |
*** r-daneel has joined #openstack-ironic | 20:17 | |
rloo | +1 for not having both zap/managed and zap/available. clean/available works for me. | 20:22 |
JayF | victor_lowther: exactly what I was thinking. Although ZAP/MANAGE should not do the CLEAN steps | 20:23 |
victor_lowther | JayF: why not? | 20:23 |
jroll | JayF: what are the CLEAN steps? how do you update your fleet's firmware? | 20:23 |
devananda | JayF: sure it should | 20:23 |
JayF | jroll: by managing the node, then using zap | 20:23 |
jroll | Although ZAP/MANAGE should not do the CLEAN steps | 20:24 |
jroll | "" | 20:24 |
JayF | victor_lowther: My thought is more that any state that goes to ZAP/MANAGE would've already been CLEAN | 20:24 |
jroll | idgi | 20:24 |
victor_lowther | if they don't then you potentially have not zeroed the disks on a node you are transitioning to AVAILABLE for the first time. | 20:24 |
jroll | unless steps can be in both ZAP and CLEAN | 20:24 |
JayF | 12:16:27 <victor_lowther> and ZAP/MANAGE would be clean + other interesting things. <-- that's the part I disagree with | 20:24 |
jroll | yeah, that too | 20:24 |
JayF | jroll: ++ that's more that I was thinking | 20:24 |
JayF | ZAP steps can be identical to CLEAN steps if that's what you want | 20:24 |
jroll | JayF: I think victor is adding things like "build a raid" into that | 20:24 |
victor_lowther | yy | 20:24 |
JayF | for purposes of ZAP, sure | 20:25 |
jroll | I think ZAP/MANAGE should likely be CLEAN + other things | 20:25 |
JayF | but I'm just saying we shouldn't do all the 'decom' / 'clean' steps in ZAP/MANAGE | 20:25 |
jroll | I can't think of anything you wouldn't do | 20:25 |
JayF | How about secure erasing a JBOD | 20:25 |
jroll | especially because ZAP/MANAGE is how you bring in a new node | 20:25 |
JayF | ZAP/MANAGE would also be used for people who wanted to change a node config after it's been created, right? | 20:26 |
victor_lowther | JayF: what specifically would you not want to do in ZAP/MANAGE that you would want to do in CLEAN/AVAILABLE? | 20:26 |
devananda | JayF: zap/manage is a superset of zap/available | 20:26 |
JayF | I disagree with ^ the idea that zap/manage should be a superset of zap/available | 20:26 |
JayF | this is honestly part of why I don't want clean/zap conflated | 20:26 |
devananda | though as I read this -- I am more and more leaning towards zap/manage and clean/available | 20:26 |
JayF | if we want CLEAN, we should do a CLEAN then do a ZAP | 20:26 |
victor_lowther | other way around. | 20:27 |
JayF | er, okay | 20:27 |
NobodyCam | ++ other way around | 20:27 |
JayF | maybe going from MANAGE/ZAP could have an optional trip through CLEAN/AVAILABLE | 20:27 |
JayF | which seems like it'd fulfill your desires | 20:27 |
JayF | without mixing the things together which I don't like | 20:27 |
victor_lowther | I don't think it should be optional | 20:27 |
devananda | JayF: if we have CLEAN/AVAILABLE, do you think we can put all long-running reconfiguration tasks outside of the AVAILABLE loop entirely? | 20:27 |
JayF | define:long-running | 20:28 |
devananda | eg, require operators to transition a node back to MANAGED in order to do things which are not part of CLEAN | 20:28 |
victor_lowther | I basically always want to CLEAN things before they go into production | 20:28 |
devananda | for what ever definition of CLEAN you use | 20:28 |
*** lucasagomes has joined #openstack-ironic | 20:28 | |
JayF | victor_lowther: +1 I agree | 20:29 |
JayF | I just think that CLEAN before going into prod should actually just go into CLEAN/AVAILABLE | 20:29 |
JayF | after ZAP/MANAGE is done | 20:30 |
JayF | pretty much anytime something "leaves" the AVAILABLE loop, its reentry point (if cleaning enabled) should be CLEAN/AVAILABLE | 20:30 |
victor_lowther | well, then we will have to redraw the graph. | 20:30 |
devananda | JayF: what would you do in one state that you wouldn't do in the other? | 20:30 |
victor_lowther | but I am fine with that. | 20:30 |
victor_lowther | i.e: MANAGE -> CLEAN -> AVAILABLE | 20:31 |
JayF | devananda: I don't think that's even relevant here | 20:31 |
victor_lowther | instead of MANAGE -> AVAILABLE | 20:31 |
JayF | if we're saying a node should be CLEAN before being AVAILABLE | 20:31 |
JayF | we should just have it transit that state | 20:31 |
JayF | rather than overloading ZAP to do n+CLEAN without going through the actual CLEAN state | 20:31 |
devananda | JayF: I'd like to know, though. what would you do in one state that you wouldn't do in the other? | 20:32 |
devananda | also, maybe someone already said that and I missed it -- multitasking meetings is great | 20:32 |
JayF | devananda: we have steps in our CLEAN that would change BIOS settings to enable access to some hardware then flips them back later | 20:32 |
JayF | if there's a case where I'm doing something like rebuilding a RAID, I may not want those settings changed | 20:32 |
JayF | or generally doing anything unnecessary that can be limited by write cycles (like reflashing a firmware) | 20:33 |
jroll | mmm. | 20:33 |
*** ParsectiX has quit IRC | 20:33 | |
JayF | plus I might enqueue a ZAP task to fix a node that was in CLEAN FAILED | 20:34 |
devananda | JayF: so that would leave the managed -> zap -> managed loop intact | 20:34 |
devananda | JayF: change zap/available to clean/available | 20:34 |
JayF | devananda: Yah; I'm very OK with that | 20:34 |
devananda | JayF: and insert that step in the connection from managed -> available as well | 20:35 |
JayF | then MANAGED -> [CLEAN] -> AVAILABLE | 20:35 |
devananda | right | 20:35 |
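The graph agreed on above can be sketched as a small transition table. This is only an illustration under assumptions: the state names and the dictionary layout are hypothetical and not taken from Ironic's code. It shows ZAP looping back to MANAGED, and CLEAN sitting on the only path from MANAGED to AVAILABLE.

```python
# Hypothetical transition table; names are illustrative, not Ironic's real states.
TRANSITIONS = {
    "MANAGED": {"ZAP", "CLEAN"},   # operator can zap, or move toward AVAILABLE
    "ZAP": {"MANAGED"},            # managed -> zap -> managed loop stays intact
    "CLEAN": {"AVAILABLE"},        # clean/available replaces zap/available
    "AVAILABLE": set(),            # deploy edges omitted from this sketch
}


def can_move(src, dst):
    """Return True if there is a direct edge from src to dst."""
    return dst in TRANSITIONS.get(src, set())


def paths(src, dst, seen=()):
    """Yield every loop-free path from src to dst as a tuple of states."""
    if src == dst:
        yield (*seen, src)
        return
    for nxt in TRANSITIONS.get(src, set()):
        if nxt not in seen and nxt != src:
            yield from paths(nxt, dst, (*seen, src))
```

In this sketch there is no direct MANAGED -> AVAILABLE edge, so every path into AVAILABLE transits CLEAN.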
dlaube | ZAP? sounds pretty ….shocking. | 20:35 |
dlaube | :P | 20:35 |
victor_lowther | dlaube: it is my greatest contribution to Ironic ever. | 20:37 |
*** anderbubble has quit IRC | 20:38 | |
*** jjohnson2 has quit IRC | 20:39 | |
devananda | JayF: http://paste.openstack.org/show/b0XISoOBBfD48UrSB0CI/ | 20:40 |
dlaube | victor_lowther: is this like IPA but with extra awesome sauce added or something? | 20:40 |
dlaube | :D | 20:40 |
JayF | devananda: dumb question; what's the "R:" in that diagram? | 20:41 |
victor_lowther | It has _all_ the awesome sauce. And sprinkles. | 20:41 |
victor_lowther | JayF: API call | 20:41 |
jroll | JayF: the PUT request requesting a state | 20:41 |
JayF | that's what I thought | 20:41 |
jroll | dlaube: it's things like wiping disks, flashing firmware, etc | 20:41 |
JayF | devananda: +1 | 20:41 |
victor_lowther | dlaube: really, just the name of a state. | 20:41 |
*** ParsectiX has joined #openstack-ironic | 20:41 | |
jroll | victor_lowther: ... for now :) | 20:42 |
victor_lowther | A state that does awesome things | 20:42 |
jroll | def do_zap() | 20:42 |
victor_lowther | ZOMG! | 20:42 |
dlaube | ahh ok | 20:42 |
jroll | it's going to end up everywhere | 20:42 |
dlaube | gotcha | 20:42 |
victor_lowther | I imagine IPA will be a player in that state. | 20:42 |
JayF | zap == Ironic is doing a thing to the hardware that the operator requested that isn't CLEANING or DEPLOYING | 20:42 |
victor_lowther | where such a thing can be "blow away my RAID array" or "flash all the things" | 20:43 |
JayF | or do a burn-in test | 20:44 |
victor_lowther | that too | 20:46 |
devananda | anyone care to poke more holes in my latest dia? | 20:47 |
lucasagomes | devananda, yup yeah I was suggesting that (checking before deploying) | 20:47 |
NobodyCam | same link? | 20:47 |
* lucasagomes too much scrollback | 20:47 | |
devananda | NobodyCam: http://paste.openstack.org/show/b0XISoOBBfD48UrSB0CI/ | 20:47 |
jroll | whoa, lucas is here late | 20:48 |
lucasagomes | jroll, devananda the thing is that putting before doesn't help cause you don't know how much time the machine is sitting on available (could be 1 month) | 20:48 |
NobodyCam | devananda: I asked earlier but may have missed the answer. you dropped the rebuild state? | 20:48 |
lucasagomes | jroll, and it could be optional :) so it can be fast | 20:48 |
lucasagomes | for the baremetal-to-tenant use case | 20:48 |
lucasagomes | jroll, yeah | 20:48 |
lucasagomes | I should go sleep :D | 20:49 |
jroll | lucasagomes: I still disagree | 20:49 |
devananda | lucasagomes: we already have a power status check loop for nodes in that state | 20:49 |
devananda | lucasagomes: no reason we couldn't add a similar check loop for other things there | 20:49 |
lucasagomes | power status check? | 20:49 |
lucasagomes | devananda, +1 | 20:49 |
lucasagomes | yeah offering some flexibility for checks there is good | 20:49 |
lucasagomes | if operators wants, cause u know someone may have pulled the cable | 20:49 |
devananda | lucasagomes: that runs in the background on available nodes, but not after a user has started booting one | 20:49 |
lucasagomes | like a periodic task? | 20:50 |
devananda | lucasagomes: we'll already notice if someone pulled the IPMI cable out | 20:50 |
devananda | lucasagomes: yes. periodic task | 20:50 |
devananda | which only runs on nodes in AVAILABLE state | 20:50 |
lucasagomes | right, yeah it's fine. Doesn't need to be a state | 20:50 |
lucasagomes | but it's good that we have in mind that this is a valid use case and we need to tackle it somehow | 20:50 |
lucasagomes | devananda, fair enff | 20:50 |
lucasagomes | jroll, seems to disagree, I still think its valid :) | 20:51 |
lucasagomes | but I'm ok with the periodic task | 20:51 |
devananda | lucasagomes: i object to putting in a mandatory inband-status-assertion-check during deploy | 20:51 |
devananda | that'll slow down deploys by way, way too much | 20:51 |
lucasagomes | devananda, it could be optional | 20:51 |
lucasagomes | that's what I'm trying to point, it's not because it's represented as a state that it has to be mandatory | 20:52 |
lucasagomes | zapping is optional afaiui | 20:52 |
devananda | but doing something in a periodic task to assert that nodes which we think are AVAILABLE actually are, and still have the same properties as the last time we checked? | 20:52 |
devananda | sure, that's fine | 20:52 |
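A minimal sketch of the periodic check described above, assuming a hypothetical `Node` record and a hypothetical `probe` callable that returns the properties currently observed on a node. Neither name comes from Ironic. Only nodes in AVAILABLE are examined, matching the existing power status loop's scope.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical record; not Ironic's node object.
    uuid: str
    state: str
    properties: dict = field(default_factory=dict)


def check_available_nodes(nodes, probe):
    """Return uuids of AVAILABLE nodes whose observed properties have drifted.

    `probe` is an assumed callable: node -> dict of properties seen right now.
    Nodes in any other state are skipped, so deploys are never slowed down.
    """
    drifted = []
    for node in nodes:
        if node.state != "AVAILABLE":
            continue
        if probe(node) != node.properties:
            drifted.append(node.uuid)
    return drifted
```

A scheduler would call `check_available_nodes` on an interval; what to do with drifted nodes (for example, flagging them for an operator) is left open here, as it was in the discussion.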
*** igordcard has quit IRC | 20:52 | |
lucasagomes | devananda, yeah the periodic task works too :0 | 20:52 |
devananda | lucasagomes: in a state machine like A -> B -> C, state "B" is not optional | 20:52 |
lucasagomes | :)* | 20:52 |
* jroll just keeps his datacenter locked and doesn't worry about idle servers being changed | 20:53 | |
devananda | it may be no-op'd, but it's not optional | 20:53 |
lucasagomes | devananda, that goes back to the FSM | 20:53 |
lucasagomes | state -> action -> state | 20:53 |
NobodyCam | jroll: power supplies do fail | 20:53 |
devananda | jroll: garden gnomes. they're sneaky ... | 20:53 |
lucasagomes | actions can be optional | 20:53 |
lucasagomes | which is where the task runs | 20:53 |
jroll | NobodyCam: I feel like we might notice that through IPMI, dunno | 20:53 |
lucasagomes | in a FSM the engine drives the code from one state to another and just calls some hooks (aka actions) | 20:53 |
lucasagomes | and it could be non-op | 20:54 |
jroll | look, without this, worst case scenario is the deploy fails and is rescheduled | 20:54 |
devananda | jroll: ++ | 20:54 |
jroll | which sucks as far as time that 'nova boot' takes | 20:54 |
jroll | but like, isn't that bad | 20:54 |
devananda | and the node gets kicked into maintenance mode | 20:54 |
jroll | error status, but yeah | 20:54 |
jroll | same idea | 20:54 |
devananda | right | 20:54 |
lucasagomes | alright I think we agreed with the periodic task ting | 20:55 |
lucasagomes | thing | 20:55 |
lucasagomes | I don't wanna go back to discuss FSM vs non-FSM | 20:55 |
lucasagomes | (did had a pleasant time doing that) | 20:55 |
lucasagomes | didn't* | 20:55 |
*** anderbubble has joined #openstack-ironic | 20:55 | |
*** igordcard has joined #openstack-ironic | 20:56 | |
* lucasagomes brb | 20:58 | |
*** jjohnson2 has joined #openstack-ironic | 21:00 | |
devananda | anyone interested in the cross-project meeting? | 21:00 |
devananda | it's starting now | 21:00 |
NobodyCam | did we have a volenter from our team for that? | 21:01 |
* jroll will be lurking | 21:01 | |
devananda | we have folks doing multiple cross project things | 21:01 |
NobodyCam | volunteer | 21:01 |
devananda | api, oslo, stable maint, vuln, etc ... | 21:01 |
devananda | this is a general all kinds of cross project thing | 21:01 |
devananda | thing | 21:01 |
NobodyCam | lucasagomes: when you get back.. take a look at https://review.openstack.org/#/c/138109 if you have the time | 21:03 |
*** marcoemorais has quit IRC | 21:03 | |
jjohnson2 | well, I officially have coded in VR | 21:04 |
NobodyCam | n VR? | 21:04 |
jjohnson2 | NobodyCam, yeah, had my editor up in my oculus | 21:04 |
NobodyCam | oh cool | 21:04 |
jjohnson2 | now I'm done doing that | 21:04 |
jjohnson2 | it needs a few more pixels before it'll be comfortable developing on a 12 meter screen 10 meters away | 21:05 |
rloo | devananda: wrt your latest diagram, nit: need arrow from MANAGED to AVAILABLE (and should the request be 'manage'? or eg 'available'?) | 21:09 |
devananda | rloo: there is such an arrow. it goes through CLEAN though | 21:10 |
devananda | rloo: and the verb is "provide" | 21:10 |
rloo | devananda: oh, I thought it was going to be optional to clean from MANAGED -> AVAILABLE. guess not. | 21:11 |
rloo | devananda: that was my (hopefully) last question. why is the verb 'provide' instead of 'clean'? | 21:11 |
devananda | rloo: i thought so too, but folks this morning were fairly adamant about it | 21:11 |
JayF | If you have cleaning disabled, obviously that step's a noop, right? | 21:11 |
devananda | clean could, presumably, decide if it wants to no-op when coming from managed (or something) | 21:11 |
devananda | JayF: sure | 21:11 |
JayF | yeah exactly | 21:12 |
rloo | JayF: I was wondering if you might want to skip the clean when you go from MANAGED-> AVAIL, but always clean after a deploy. | 21:12 |
mrda | rloo: thank you for your wise and thorough review of the logical-name spec | 21:12 |
JayF | this is a plank in my+joshnang's platform | 21:12 |
rloo | I guess if the 'clean' is smart enough to know when it might want to clean. like a lazy janitor :D | 21:12 |
mrda | rloo: but now I have to patch the merged spec as the point you raised is very valid :) | 21:14 |
rloo | mrda: yw. sorry, i meant to look at it sooner. but I feel overwhelmed when I look at the list of specs and my 'method' of starting with the older specs was probably not a good idea. | 21:14 |
NobodyCam | rloo: ++ to starting with older specs ... Thank you | 21:15 |
mrda | rloo: +1 | 21:15 |
mrda | rloo: appreciate your comments - had a small brain fade, which I now have to fix :) | 21:16 |
rloo | NobodyCam, mrda: good idea in theory, but I'm learning not to be too strict about it ;) | 21:16 |
rloo | mrda: no worries. I'm sure we would have picked them up at coding time. but I like sooner better than later ;) | 21:16 |
mrda | yup | 21:17 |
rloo | mrda. i think it is difficult to get a spec 'correct'. so good enuf is good enuf! | 21:17 |
jroll | rloo: great comments, sorry I landed that early | 21:18 |
rloo | jroll: that issue with no stack trace for the exception in the conductor/deploy, are you going to handle that? (eg open a ticket or whatever)? | 21:18 |
jroll | gah, did a bug not get filed? | 21:19 |
rloo | jroll: no worries. I don't think you landed that early, we have to get those specs approved. | 21:19 |
rloo | jroll: I don't think so. looking... | 21:19 |
jroll | I'd rather not own that | 21:19 |
rloo | jroll: ok, that was my other question. ok, i'll open a bug for it so we don't forget. I'm only opening a bug ;) | 21:20 |
jroll | ok, thanks | 21:20 |
jroll | we may have a use for lots of low-hanging fruit :) | 21:21 |
rloo | it'll take me longer to write up the bug than to fix it I think ;) | 21:21 |
mrda | So, regarding the logical-name spec, now that I have to patch it - do I just raise a new review with the (small) patch to remove the reference to tenant? | 21:21 |
mrda | and should I worry about a bug? | 21:21 |
* mrda thinks not | 21:21 | |
rloo | mrda: yup. i've got a small patch up for some other spec. | 21:21 |
JayF | I wouldn't bug it at all | 21:21 |
rloo | mrda: no bug | 21:21 |
jroll | mrda: jfdi :) | 21:22 |
rloo | ha ha | 21:22 |
mrda | ok cool, I'll raise a new review and fix my brain fade. Thanks for the direction. | 21:22 |
lucasagomes | NobodyCam, sure :) I will add it to the todo list here and review tomorrow morning | 21:32 |
lucasagomes | NobodyCam, it's a bit late now :) /me wants to relax :D | 21:32 |
*** mikedillion has quit IRC | 21:32 | |
NobodyCam | lucasagomes: it's the start of jjohnson2's ipmi to system command listener | 21:33 |
lucasagomes | yeah I did a quick skimming | 21:33 |
NobodyCam | :) | 21:33 |
jjohnson2 | huh? | 21:33 |
NobodyCam | your wip patch | 21:33 |
lucasagomes | awesome! seems we are going to have the ipmi listener (BiMiC) :) | 21:33 |
jjohnson2 | yeah, ipmi 2.0 only | 21:34 |
jjohnson2 | I'm doing the rmcp+ open session request parsing now | 21:34 |
*** openstackgerrit has quit IRC | 21:34 | |
lucasagomes | cool! 1.5 can come later no hurry (if needed as well) | 21:34 |
*** openstackgerrit has joined #openstack-ironic | 21:35 | |
NobodyCam | brb ... /me looks for some food stuffs | 21:35 |
openstackgerrit | Merged openstack/ironic-specs: iRMC Power Driver for Ironic https://review.openstack.org/134487 | 21:35 |
jjohnson2 | I might further walk the line of ipmitool and pyghmi compatibility testing first | 21:35 |
jjohnson2 | cipher suite 3 specifically | 21:36 |
*** alexpilotti has quit IRC | 21:38 | |
devananda | now, with REBUILD: http://paste.openstack.org/show/ojbuBbsQGNlDMyz2mnPj/ | 21:40 |
*** romcheg has joined #openstack-ironic | 21:40 | |
JayF | devananda: [ot] does nova rebuild in Ironic guarantee the same backend node? | 21:40 |
devananda | JayF: yes | 21:41 |
JayF | clearly it does with preserve ephemeral, but what about the other cases? | 21:41 |
openstackgerrit | Merged openstack/ironic-specs: Don't deprecate maint mode updates via node-update https://review.openstack.org/138178 | 21:46 |
*** ParsectiX has quit IRC | 21:50 | |
NobodyCam | w00 h00 :) rebuild | 21:52 |
*** mikedillion has joined #openstack-ironic | 21:53 | |
NobodyCam | devananda: do you think rescue would ever have a need to redeploy anything? | 21:54 |
*** sreekanth has quit IRC | 21:54 | |
*** anderbubble has quit IRC | 21:55 | |
*** Masahiro has joined #openstack-ironic | 21:56 | |
*** ParsectiX has joined #openstack-ironic | 21:57 | |
*** ParsectiX has quit IRC | 21:59 | |
*** ParsectiX has joined #openstack-ironic | 21:59 | |
devananda | NobodyCam: then its rebuild | 22:00 |
devananda | NobodyCam: rescue should be "net boot this machine into a recovery ramdisk so I can troubleshoot it" | 22:00 |
*** Masahiro has quit IRC | 22:01 | |
devananda | which, actually, as an operator, i might want to do from the MANAGEMENT side | 22:01 |
devananda | er, MANAGED | 22:01 |
devananda | without ever going to AVAILABLE or ACTIVE | 22:01 |
devananda | bah | 22:01 |
devananda | why did i have to think of that | 22:01 |
*** linggao has quit IRC | 22:02 | |
devananda | right, that's thinking of it the wrong way. time for more coffee. | 22:02 |
*** anderbubble has joined #openstack-ironic | 22:04 | |
*** dprince has quit IRC | 22:06 | |
NobodyCam | :) | 22:06 |
*** alexpilotti has joined #openstack-ironic | 22:08 | |
victor_lowther | devananda: I will get started on the next rev of the state machine spec using your latest graph. | 22:10 |
*** igordcard has quit IRC | 22:12 | |
devananda | victor_lowther: cheers | 22:13 |
*** ryanpetrello has quit IRC | 22:18 | |
openstackgerrit | Jarrod Johnson proposed stackforge/pyghmi: Implement server side IPMI protocol (WIP) https://review.openstack.org/138109 | 22:19 |
*** mjturek has quit IRC | 22:21 | |
*** Hefeweizen has quit IRC | 22:22 | |
*** mikedillion has quit IRC | 22:23 | |
*** ryanpetrello has joined #openstack-ironic | 22:23 | |
*** lucasagomes has quit IRC | 22:25 | |
JayF | devananda: no | 22:29 |
*** jjohnson2 has quit IRC | 22:29 | |
JayF | devananda: rescue is a nova concept; how can you rescue an instance if there is no instance to rescue | 22:30 |
devananda | JayF: right. and the thing I was thinking of is ZAP | 22:32 |
*** foexle has quit IRC | 22:38 | |
*** ryanpetrello has quit IRC | 22:40 | |
*** anderbubble has quit IRC | 22:42 | |
*** ryanpetrello has joined #openstack-ironic | 22:44 | |
*** erwan_taf has quit IRC | 22:45 | |
openstackgerrit | Michael Davies proposed openstack/ironic-specs: Updates to logical name spec from review 134439 https://review.openstack.org/138565 | 22:45 |
JayF | mrda: ^ +2 | 22:47 |
*** ryanpetrello has quit IRC | 22:48 | |
victor_lowther | devananda: why UNRESCUE? | 22:48 |
devananda | it's not DEPLOYING and it's not RESCUING | 22:49 |
*** anderbubble has joined #openstack-ironic | 22:49 | |
victor_lowther | what I mean is | 22:49 |
devananda | returning the instance to the ACTIVE state | 22:50 |
victor_lowther | what is Ironic doing during that state transition? | 22:50 |
JayF | turning machine off | 22:50 |
jroll | rebooting to the instance | 22:50 |
JayF | changing boot device | 22:50 |
JayF | flipping networks | 22:50 |
devananda | probably changing the PXE configs | 22:50 |
JayF | turning machine on | 22:50 |
devananda | possibly those things too | 22:50 |
JayF | aweeks: ^ you should likely read this | 22:50 |
victor_lowther | ok. | 22:50 |
victor_lowther | question answered. | 22:50 |
devananda | so yah. there is an implicit UNRESCUEFAIL in my diagram too | 22:50 |
aweeks | yo | 22:51 |
mrda | thanks JayF | 22:51 |
devananda | as a path forward for the code itself | 22:51 |
aweeks | I'm in the process of implementing rescue mode | 22:51 |
aweeks | internally so far | 22:51 |
devananda | do y'all think it might be worth implementing the current states in a state machine | 22:52 |
devananda | landing that | 22:52 |
devananda | then moving to the new states? | 22:52 |
JayF | That just sounds really confusing tbh | 22:52 |
JayF | versus a clean "break" and migration | 22:52 |
NobodyCam | that seems like more work | 22:52 |
devananda | more work yes. easier to reason about the steps involved, migration / upgrade path, etc | 22:52 |
devananda | JayF: clean break doesn't sound like a good thing to me. why does it to you? | 22:53 |
aweeks | devananda: is there currently a proposal for states related to rescue mode? | 22:53 |
devananda | aweeks: http://paste.openstack.org/show/ojbuBbsQGNlDMyz2mnPj/ | 22:53 |
victor_lowther | For (UN)?RESCUING, is ironic responsible for the pxe/bootimg/whatever swizzling, or Something Nova-Ish? | 22:53 |
aweeks | ah, thanks | 22:53 |
devananda | victor_lowther: ironic is | 22:53 |
jroll | victor_lowther: ironic is | 22:53 |
JayF | victor_lowther: Ironic does it; Nova just tells us to rescue/unrescue | 22:53 |
victor_lowther | ok | 22:53 |
jroll | nova has the virt driver calls | 22:53 |
aweeks | yeah, there are two calls: rescue(), unrescue(), and a "RESCUED" state in Nova | 22:54 |
JayF | devananda: At first thought it seems simpler ... but honestly I'd defer to knowledge of others :) Upgrading a state machine without breaking backwards compat is hard :) | 22:54 |
*** jgrimm is now known as zz_jgrimm | 22:55 | |
*** marcoemorais has joined #openstack-ironic | 22:55 | |
*** anderbubble has quit IRC | 22:56 | |
devananda | JayF: let's assume it is hard but not impossible within this cycle. is that worth it? | 22:56 |
aweeks | devananda: in that diagram, what do the *s represent? "[RESCU*/RESCUE]" | 22:56 |
*** anderbubble has joined #openstack-ironic | 22:57 | |
devananda | aweeks: see 133828 and comments thereon | 22:57 |
JayF | devananda: I don't know :) | 22:57 |
*** romcheg has quit IRC | 22:57 | |
aweeks | devananda: got it, thanks | 22:58 |
devananda | JayF: massively breaking backwards compat the first cycle after integration isn't exactly on my priority list, btw :) | 22:58 |
*** marcoemorais1 has joined #openstack-ironic | 22:58 | |
JayF | devananda: are you sure? I think it'd be great, and there's lots of precedent for it to ;) | 22:59 |
JayF | s/to/too/ | 22:59 |
devananda | lol | 22:59 |
*** bradjones has quit IRC | 22:59 | |
*** marcoemorais has quit IRC | 23:00 | |
aweeks | devananda: not sure if relevant, but my implementation so far only has two states: RESCUEWAIT, and RESCUED in ironic | 23:02 |
aweeks | and implements the rescue() and unrescue() functions in the virt driver | 23:02 |
devananda | aweeks: that's fine for now | 23:02 |
JayF | aweeks: likely you'll want to either convince devananda to adopt the states you see or change the states you're using, lol | 23:02 |
devananda | aweeks: we'll likely rename all the states soon anyway | 23:03 |
aweeks | I don't really care about the names | 23:03 |
* devananda gets out the alphabet soup | 23:03 | |
jroll | hehe | 23:03 |
*** marcoemorais1 has quit IRC | 23:03 | |
aweeks | devananda: JayF: also, to be clear, the proposal includes removing *WAIT states, and instead has two separate states (the actual state, and a "wait" state)? | 23:05 |
*** marcoemorais has joined #openstack-ironic | 23:06 | |
devananda | aweeks: nope. it introduces a not-yet-well-defined wait flag | 23:06 |
*** ryanpetrello has joined #openstack-ironic | 23:06 | |
aweeks | hurm | 23:06 |
devananda | it does remove the *WAIT states, though -- that's correct | 23:06 |
devananda | DEPLOYING+wait | 23:06 |
devananda | RESCUING+wait | 23:06 |
devananda | etc | 23:07 |
NobodyCam | rloo: if you have a free minute, can you give https://review.openstack.org/#/c/138565/ a quick look over | 23:07 |
*** harlowja_ is now known as harlowja_away | 23:07 | |
*** spandhe has quit IRC | 23:08 | |
openstackgerrit | Victor Lowther proposed openstack/ironic-specs: New Ironic provisioner state machine. https://review.openstack.org/133828 | 23:09 |
victor_lowther | devananda: I think we should drop the wait flag stuff for now | 23:09 |
*** alexpilotti has quit IRC | 23:09 | |
victor_lowther | in the interests of finalizing the spec by the end of the week. | 23:09 |
devananda | victor_lowther: then we need to add *WAIT to the STATE* description | 23:10 |
aweeks | so, my possibly uninformed perspective is that it seems like the ironic state machine should be a super set of the nova state machine. in that there are a set of states in ironic that are 1-1 with the nova states, and edges in the nova state machine can be replaced by 1 or more states/edges in ironic? | 23:10 |
devananda | victor_lowther: because we must have a DEPLOYWAIT state, or equivalent | 23:10 |
devananda | or the state machine can't handle the current drivers | 23:10 |
victor_lowther | ah | 23:10 |
devananda | aweeks: superset, yes. there are also states within ironic where the node is not even visible to nova | 23:11 |
victor_lowther | Mind throwing which states need that treatment at the newly-updated spec? | 23:11 |
aweeks | the idea being that the ACTIVE (nova) -> rescue() -> RESCUED (nova) -> unrescue() -> ACTIVE (nova) in nova could be transformed into: ACTIVE (ironic) -> rescue() -> INTERNALSTATE (ironic) -> ... RESCUED (ironic) -> INTERNALSTATE (ironic) -> unrescue() -> ACTIVE (ironic) | 23:12 |
*** harlowja_away is now known as harlowja_ | 23:12 | |
aweeks | with ACTIVE and RESCUED being 1-1 between ironic/nova | 23:12 |
aweeks | but with intermediate states potentially in ironic | 23:12 |
jroll | unrelated: I really wish I could add arbitrary fields to node-list in the client | 23:12 |
devananda | victor_lowther: at a minimum, deploy, clean, zap.. possibly also validate, inspect | 23:13 |
victor_lowther | jroll: I was surprised that you could not. | 23:13 |
aweeks | or similar for other state transitions | 23:13 |
devananda | jroll: you mean, because the client has to change, or the API service doesn't return the fields you want? | 23:13 |
victor_lowther | devananda: so basically, instead of -ING states they should be -WAIT | 23:14 |
victor_lowther | ? | 23:14 |
NobodyCam | victor_lowther: "In the active state, Ironic is doing something to the node." just checking that's not referring to the ACTIVE state | 23:14 |
victor_lowther | line? | 23:14 |
NobodyCam | 119-120 | 23:14 |
jroll | devananda: I mean, as an operator, I want to do a node-list and get last_error as well | 23:15 |
jroll | just throwing it out there | 23:15 |
victor_lowther | no, otherwise it would be in CAPS. That usage refers to the -ING state. | 23:15 |
NobodyCam | :) | 23:16 |
*** alexpilotti has joined #openstack-ironic | 23:16 | |
devananda | victor_lowther: more granularly, it may go like this, for some drivers | 23:16 |
devananda | DEPLOYING (ironic-conductor is doing things) | 23:16 |
devananda | DEPLOYWAIT (conductor is idle, lock is released, and an agent is doing something on the node locally) | 23:17 |
devananda | DEPLOYING (ironic conductor is working on it again, since the agent is done) | 23:17 |
devananda | DEPLOYDONE (hand off...) | 23:17 |
devananda | ACTIVE | 23:17 |
devananda | jroll: ^ fair statements? | 23:17 |
jroll | yes | 23:18 |
devananda | I think that is better modelled by DEPLOYING +/- WAIT_FLAG | 23:18 |
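The DEPLOYING +/- wait flag idea can be sketched roughly as follows. This assumes a hypothetical `Provision` class and method names invented for illustration; the wait flag itself was described in the discussion as not yet well defined, so this is one possible reading, not the spec.

```python
from dataclasses import dataclass


@dataclass
class Provision:
    # Hypothetical model: one state plus a boolean flag instead of a DEPLOYWAIT state.
    state: str = "DEPLOYING"
    waiting: bool = False

    def hand_off(self):
        """Conductor goes idle, releases its lock, and waits for an external call-back."""
        self.waiting = True

    def callback(self):
        """The agent reports back; the conductor resumes work on the node."""
        if not self.waiting:
            raise RuntimeError("no call-back was expected")
        self.waiting = False

    def finish(self):
        """Complete the deploy; refused while still waiting on the agent."""
        if self.waiting:
            raise RuntimeError("still waiting on the agent")
        self.state = "ACTIVE"
```

The state stays DEPLOYING throughout; only the flag flips, which mirrors the DEPLOYING, DEPLOYWAIT, DEPLOYING sequence listed above without adding a separate state per phase.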
victor_lowther | well, if lucasgomes does not like our state machine now... | 23:19 |
victor_lowther | ya | 23:19 |
victor_lowther | what I have been missing is a clear articulation of precisely when and how the wait flag would work. | 23:19 |
devananda | it signals that ironic is mid-task, but has released the lock and is waiting for an external call-back | 23:19 |
*** ryanpetrello has quit IRC | 23:20 | |
devananda | the same process happens within introspection | 23:20 |
victor_lowther | specifically around how the node handoff to and from whatever external agent works. | 23:20 |
victor_lowther | argh, after 1700 here. | 23:20 |
NobodyCam | thats just what we do now | 23:21 |
devananda | for the PXE driver, there's a waiting period after the machine is first powered on | 23:21 |
victor_lowther | Gotta scram. | 23:21 |
devananda | victor_lowther: ack, ttyl | 23:21 |
*** alexpilotti has quit IRC | 23:21 | |
NobodyCam | have a good night victor_lowther | 23:21 |
NobodyCam | thank you for the awesome effort | 23:21 |
NobodyCam | and others too | 23:21 |
dlaube | g'night victor_lowther | 23:22 |
*** bradjones has joined #openstack-ironic | 23:28 | |
*** Haomeng has joined #openstack-ironic | 23:33 | |
*** spandhe has joined #openstack-ironic | 23:34 | |
*** Haomeng|2 has quit IRC | 23:34 | |
*** anderbubble has quit IRC | 23:35 | |
*** andreykurilin_ has quit IRC | 23:39 | |
*** Masahiro has joined #openstack-ironic | 23:45 | |
rloo | mrda: I was just looking at your patch 138565 | 23:47 |
rloo | mrda: does it say anywhere that the logical names must be unique? | 23:47 |
*** yuanying has joined #openstack-ironic | 23:47 | |
NobodyCam | rloo: I inferred that from the 1:1 uuid statement... mabe incorrectly | 23:47 |
NobodyCam | maybe* | 23:47 |
rloo | mrda: or is that what '1:1 mapping between a <logical name> and a <node uuid>' means. | 23:48 |
rloo | NobodyCam: ok. I must be tired, I don't remember what 1:1 mapping means! | 23:48 |
* NobodyCam just found https://wiki.openstack.org/wiki/OpenStackClient/HumanInterfaceGuidelines | 23:48 | |
JayF | rloo: 1:1 mapping means no duped names or uuids | 23:49 |
JayF | rloo: each name maps to exactly one uuid and vice-versa | 23:49 |
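To illustrate what a 1:1 mapping implies, here is a small sketch assuming a hypothetical in-memory `NameRegistry`; the logical-name spec itself does not define this class. Both directions are indexed, so a duplicate name or a duplicate uuid is rejected.

```python
class NameRegistry:
    """Hypothetical registry enforcing a 1:1 logical name <-> node uuid mapping."""

    def __init__(self):
        self._by_name = {}
        self._by_uuid = {}

    def assign(self, name, uuid):
        # Reject the assignment if either side is already in use.
        if name in self._by_name or uuid in self._by_uuid:
            raise ValueError("name or uuid is already mapped")
        self._by_name[name] = uuid
        self._by_uuid[uuid] = name

    def uuid_for(self, name):
        return self._by_name[name]

    def name_for(self, uuid):
        return self._by_uuid[uuid]
```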
NobodyCam | does this mean we need to support --format in our cli | 23:49 |
rloo | JayF: thx for clarifying! | 23:49 |
*** Masahiro has quit IRC | 23:49 | |
JayF | NobodyCam: openstackclient != python-*client iirc | 23:49 |
JayF | NobodyCam: I think openstackclient is the "openstack" command/sdk people are working on | 23:49 |
NobodyCam | ok I took it as openstack clientS | 23:50 |
NobodyCam | you are correct | 23:50 |
*** yuanying_ has quit IRC | 23:50 | |
rloo | NobodyCam: Thx for asking; I'm good with 138565. Would you do the honours and approve it? | 23:51 |
NobodyCam | :) | 23:51 |
NobodyCam | rloo will do | 23:51 |
NobodyCam | rloo: done | 23:51 |
mrda | rloo: yes, a 1:1 mapping between logical_name and uuid implies that logical_name needs to be unique | 23:52 |
mrda | (or at least I intended it to be so) | 23:52 |
openstackgerrit | Josh Gachnang proposed openstack/ironic-python-agent: Use LLDP to get switch port mapping https://review.openstack.org/92627 | 23:52 |
NobodyCam | thats how I took it | 23:52 |
*** Hefeweizen has joined #openstack-ironic | 23:52 | |
NobodyCam | JoshNang: ooooo neat-oh | 23:52 |
JoshNang | NobodyCam: :D | 23:52 |
JoshNang | it's basically going to require a custom hardware manager per switch manufacturer though. lldp is a not great format | 23:53 |
JayF | NobodyCam: we're running that in our prod hw manager to verify our ports are accurate today | 23:53 |
* mrda just upgraded his internet from 6/0.3 to 30/1. It's a nice change :) | 23:54 | |
openstackgerrit | Merged openstack/ironic-specs: Updates to logical name spec from review 134439 https://review.openstack.org/138565 | 23:54 |
JayF | mrda: congratulations, that's what, like 20% of all the internet down there :P | 23:54 |
NobodyCam | lol | 23:54 |
mrda | lol, not exactly. Some people are getting 1000 down. | 23:55 |
NobodyCam | JayF: would that look like this: https://scholarworks.iu.edu/dspace/bitstream/handle/2022/171/image9CP.JPG | 23:56 |
mrda | ADSL -> Cable | 23:56 |
* NobodyCam has cable modem installed in his RV :) 100/30 atm I think | 23:56 | |
JayF | NobodyCam: which one of the cans is for the nsa? | 23:56 |
NobodyCam | lol | 23:56 |
mrda | NobodyCam: I can go there too, but it's an extra 20/month. I'll see how this goes for now. | 23:57 |
JayF | that's nice. | 23:57 |
JayF | I have 50 down now but I can get 300 down for like $20/month | 23:58 |
mrda | It's really the change from 0.3 to 1 up that's important. Video calls are hard at 0.3 up. | 23:58 |
NobodyCam | :) /me pays a bit more as he never gets the "contract" price | 23:58 |
NobodyCam | mrda: audio only is ruff at .3 | 23:59 |
*** ryanpetrello has joined #openstack-ironic | 23:59 |