*** max_lobur has joined #openstack-ironic | 00:11 | |
*** max_lobur has quit IRC | 00:11 | |
*** max_lobur has joined #openstack-ironic | 00:11 | |
*** max_lobur has quit IRC | 00:11 | |
*** matsuhashi has joined #openstack-ironic | 00:25 | |
*** max_lobur has joined #openstack-ironic | 00:31 | |
*** max_lobur has quit IRC | 00:31 | |
*** yongli has joined #openstack-ironic | 01:02 | |
*** BadCub01 has quit IRC | 01:12 | |
*** eghobo has joined #openstack-ironic | 01:23 | |
*** nosnos has joined #openstack-ironic | 01:36 | |
*** eghobo has quit IRC | 02:08 | |
openstackgerrit | Dan Prince proposed a change to openstack/ironic: Run ipmi power status less aggressively https://review.openstack.org/80400 | 02:28 |
---|---|---|
openstackgerrit | Dan Prince proposed a change to openstack/ironic: Run ipmi power status less aggressively https://review.openstack.org/80400 | 02:30 |
*** matsuhashi has quit IRC | 02:55 | |
*** eghobo has joined #openstack-ironic | 03:13 | |
*** matsuhashi has joined #openstack-ironic | 03:21 | |
*** rameshg87 has joined #openstack-ironic | 03:38 | |
*** ekarlso has quit IRC | 03:48 | |
*** ekarlso has joined #openstack-ironic | 03:48 | |
*** killer_p- has joined #openstack-ironic | 03:52 | |
*** killer_p- is now known as killer_prince | 03:52 | |
*** lazy_prince has quit IRC | 03:53 | |
*** todd_dsm has joined #openstack-ironic | 04:05 | |
*** killer_prince has quit IRC | 04:16 | |
*** lazy_prince has joined #openstack-ironic | 04:18 | |
*** lazy_prince is now known as killer_prince | 04:18 | |
*** todd_dsm has quit IRC | 04:41 | |
*** vkozhukalov has joined #openstack-ironic | 04:42 | |
*** vkozhukalov has left #openstack-ironic | 04:45 | |
*** killer_prince2 has joined #openstack-ironic | 04:52 | |
*** killer_prince2 is now known as lazy_prince | 04:52 | |
*** nosnos_ has joined #openstack-ironic | 05:16 | |
*** pradipta_away is now known as pradipta | 05:18 | |
*** nosnos has quit IRC | 05:19 | |
*** vkozhukalov has joined #openstack-ironic | 05:40 | |
*** vkozhukalov has left #openstack-ironic | 05:41 | |
*** todd_dsm has joined #openstack-ironic | 05:52 | |
*** matsuhas_ has joined #openstack-ironic | 06:00 | |
*** matsuhashi has quit IRC | 06:01 | |
openstackgerrit | Jenkins proposed a change to openstack/ironic: Imported Translations from Transifex https://review.openstack.org/78862 | 06:07 |
*** nosnos has joined #openstack-ironic | 06:19 | |
*** nosnos_ has quit IRC | 06:22 | |
*** matsuhas_ has quit IRC | 06:31 | |
*** matsuhashi has joined #openstack-ironic | 06:31 | |
*** saju_m has joined #openstack-ironic | 06:35 | |
*** mrda is now known as mrda_away | 06:45 | |
*** saju_m has quit IRC | 06:59 | |
*** max_lobur has joined #openstack-ironic | 07:08 | |
*** saju_m has joined #openstack-ironic | 07:12 | |
*** saju_m has quit IRC | 07:17 | |
*** saju_m has joined #openstack-ironic | 07:17 | |
*** branen_ has joined #openstack-ironic | 07:18 | |
*** todd_dsm has quit IRC | 07:19 | |
*** branen__ has quit IRC | 07:19 | |
*** saju_m has quit IRC | 07:22 | |
*** saju_m has joined #openstack-ironic | 07:23 | |
*** saju_m has quit IRC | 07:30 | |
*** saju_m has joined #openstack-ironic | 07:31 | |
*** eghobo has quit IRC | 07:32 | |
*** jistr has joined #openstack-ironic | 07:38 | |
*** todd_dsm has joined #openstack-ironic | 07:57 | |
*** athomas has joined #openstack-ironic | 08:21 | |
*** tzumainn has joined #openstack-ironic | 08:26 | |
*** ifarkas has joined #openstack-ironic | 08:35 | |
*** ndipanov has joined #openstack-ironic | 08:41 | |
*** lucasagomes has joined #openstack-ironic | 08:52 | |
*** derekh has joined #openstack-ironic | 08:57 | |
*** jistr has quit IRC | 08:59 | |
*** martyntaylor has joined #openstack-ironic | 09:01 | |
*** jistr has joined #openstack-ironic | 09:16 | |
dtantsur | Morning Ironic | 09:19 |
GheRivero | morning | 09:19 |
dtantsur | Can anyone have a quick look at https://review.openstack.org/#/c/81770/ ? It is (hopefully) the last patch to run Devstack+Ironic on Fedora | 09:20 |
*** martyntaylor has left #openstack-ironic | 09:20 | |
agordeev | dtantsur, GheRivero morning! | 09:22 |
yuriyz | morning Ironic | 09:26 |
agordeev | yuriyz: morning! | 09:28 |
*** pradipta is now known as pradipta_away | 09:28 | |
lucasagomes | morning all :) | 09:36 |
agordeev | dtantsur: are you in a hurry with that patch? | 09:41 |
agordeev | lucasagomes: morning! | 09:41 |
dtantsur | agordeev, no, nothing serious, just wanting to close this topic | 09:42 |
agordeev | dtantsur: ah, you can add this link/topic to today's meeting agenda :) | 09:47 |
dtantsur | agordeev, the thing is, I am pretty new here and I'm not sure how to add something to the agenda (btw, when and where is the meeting?) | 09:48 |
ifarkas | dtantsur, https://wiki.openstack.org/wiki/Meetings#Ironic_.28Bare_Metal.29_team_meeting | 09:50 |
*** vkozhukalov has joined #openstack-ironic | 09:50 | |
dtantsur | ifarkas, thnx. We could discuss Fedora status during the "integration & testing" topic or at the end of the meeting | 09:53 |
mdurnosvistov | Morning all! :) | 09:53 |
ifarkas | dtantsur, sure. I think it fits better to the "integration & testing" part | 09:54 |
agordeev | mdurnosvistov: morning | 10:01 |
*** dshulyak has joined #openstack-ironic | 10:09 | |
*** vkozhukalov has left #openstack-ironic | 10:11 | |
*** aignatov is now known as bucash | 10:20 | |
*** bucash is now known as aignatov | 10:20 | |
openstackgerrit | A change was merged to openstack/ironic: Correct version.py and update current version string https://review.openstack.org/81327 | 10:24 |
*** jistr has quit IRC | 10:24 | |
*** jistr has joined #openstack-ironic | 10:27 | |
*** romcheg has joined #openstack-ironic | 10:29 | |
*** matsuhashi has quit IRC | 10:33 | |
*** matsuhas_ has joined #openstack-ironic | 10:35 | |
*** max_lobur1 has joined #openstack-ironic | 10:35 | |
*** max_lobur has quit IRC | 10:36 | |
*** vkozhukalov has joined #openstack-ironic | 10:54 | |
*** rameshg87 has quit IRC | 10:55 | |
*** max_lobur1 has quit IRC | 11:02 | |
*** matsuhas_ has quit IRC | 11:07 | |
*** athomas has quit IRC | 11:16 | |
*** nosnos has quit IRC | 11:23 | |
*** athomas has joined #openstack-ironic | 11:24 | |
*** romcheg has quit IRC | 11:42 | |
*** romcheg1 has joined #openstack-ironic | 11:42 | |
*** romcheg has joined #openstack-ironic | 11:47 | |
*** romcheg1 has quit IRC | 11:47 | |
openstackgerrit | Jenkins proposed a change to openstack/ironic: Updated from global requirements https://review.openstack.org/79334 | 11:48 |
*** matsuhashi has joined #openstack-ironic | 11:53 | |
*** nosnos has joined #openstack-ironic | 11:54 | |
*** ndipanov has quit IRC | 11:59 | |
*** ndipanov has joined #openstack-ironic | 12:03 | |
*** lucasagomes is now known as lucas-hungry | 12:05 | |
*** max_lobur has joined #openstack-ironic | 12:08 | |
*** romcheg has quit IRC | 12:11 | |
*** matsuhashi has quit IRC | 12:14 | |
*** matsuhas_ has joined #openstack-ironic | 12:17 | |
*** romcheg has joined #openstack-ironic | 12:21 | |
*** romcheg has quit IRC | 12:25 | |
*** romcheg has joined #openstack-ironic | 12:31 | |
*** lazy_prince has quit IRC | 12:31 | |
*** romcheg has quit IRC | 12:34 | |
*** romcheg has joined #openstack-ironic | 12:34 | |
*** rloo has joined #openstack-ironic | 12:36 | |
*** matty_dubs|brno is now known as matty_dubs | 12:46 | |
*** vkozhukalov has quit IRC | 12:49 | |
*** vkozhukalov1 has joined #openstack-ironic | 12:50 | |
*** linggao has joined #openstack-ironic | 12:55 | |
*** romcheg has quit IRC | 13:03 | |
*** lucas-hungry is now known as lucasagomes | 13:07 | |
*** matsuhas_ has quit IRC | 13:12 | |
*** matsuhashi has joined #openstack-ironic | 13:13 | |
jroll | vkozhukalov1: hi | 13:13 |
*** ndipanov_ has joined #openstack-ironic | 13:13 | |
jroll | vkozhukalov1: etherpad isn't working for me, the cursor is jumping all over the place, so I want to make a couple of quick comments here | 13:14 |
*** romcheg has joined #openstack-ironic | 13:14 | |
*** jbjohnso_ has joined #openstack-ironic | 13:14 | |
jroll | vkozhukalov1: devananda is correct in that a CMDB is far outside the scope of the agent project. I want to be clear that this is an *ironic* agent, not a generic agent framework. | 13:15 |
*** ndipanov has quit IRC | 13:16 | |
*** matsuhashi has quit IRC | 13:17 | |
jroll | vkozhukalov1: also, sorry for butchering "command line" in the etherpad, again, etherpad isn't working right for me :( | 13:19 |
*** max_lobur has quit IRC | 13:22 | |
*** max_lobur has joined #openstack-ironic | 13:23 | |
*** rustlebee is now known as russellb | 13:23 | |
*** nosnos has quit IRC | 13:24 | |
openstackgerrit | Rohan Kanade proposed a change to openstack/ironic: Adds max retry limit to sync_power_state task https://review.openstack.org/77420 | 13:24 |
vkozhukalov1 | jroll: hello, it is ok, external cmdb is not critical | 13:24 |
*** nosnos has joined #openstack-ironic | 13:25 | |
vkozhukalov1 | jroll: let's leave it outside of scope | 13:25 |
vkozhukalov1 | jroll: I've just added some points to discuss at today's meeting. https://wiki.openstack.org/wiki/Meetings/Ironic#Agenda_for_next_meeting | 13:26 |
jroll | vkozhukalov1: ok :) you can always have the CMDB talk to ir-api and the agent, to do what you need | 13:26 |
jroll | cool | 13:26 |
jroll | vkozhukalov1: LGTM | 13:27 |
*** nosnos has quit IRC | 13:29 | |
vkozhukalov1 | jroll: the main question is granular procedural approach vs declarative (hard coded) approach, I believe both of them are compatible with one another. | 13:31 |
jroll | I agree | 13:31 |
jroll | I think granular APIs are a hard requirement | 13:32 |
jroll | and if you have that, adding a "flow" sort of thing on top is simple | 13:32 |
*** tatyana has joined #openstack-ironic | 13:32 | |
jroll | because it's just a list of arbitrary commands to run | 13:32 |
jroll | validation might be difficult, but should be doable | 13:33 |
jroll | vkozhukalov1: I'd like to focus on the granular APIs first, the "flow" thing to me is just a nice to have | 13:33 |
vkozhukalov1 | jroll: yes, I agree that implementing "flow" over granular API is quite simple | 13:34 |
jroll | cool :) | 13:35 |
*** tatyana has quit IRC | 13:36 | |
*** tatyana has joined #openstack-ironic | 13:36 | |
jroll | vkozhukalov1: I'm also working on a draft for updating the agent wiki page - I'll wait until after the meeting to finish it, though | 13:36 |
vkozhukalov1 | jroll: another question about granularity and validation is: should particular tasks be dependent on others? For example, if you want to granularly install bootloader, it is certainly not always possible when you did not configure disks before. It seems not so trivial to follow such inter-dependencies. | 13:38 |
*** jgrimm has quit IRC | 13:39 | |
jroll | vkozhukalov1: I would argue that the API should not handle those dependencies | 13:40 |
jroll | with the bootloader example | 13:40 |
jroll | the install bootloader endpoint should first check if it is possible | 13:40 |
jroll | e.g. if there are no partitions, error out | 13:40 |
jroll | but it should not actually do the disk setup | 13:40 |
jroll | does that make sense? | 13:40 |
openstackgerrit | Rohan Kanade proposed a change to openstack/ironic: Adds max retry limit to sync_power_state task https://review.openstack.org/77420 | 13:42 |
*** jrist has joined #openstack-ironic | 13:44 | |
vkozhukalov1 | jroll: we are on the same page here, I also think that driver should just error out, not doing anything if it is not possible | 13:45 |
*** max_lobur has quit IRC | 13:46 | |
jroll | :) | 13:46 |
*** max_lobur has joined #openstack-ironic | 13:46 | |
vkozhukalov1 | AFK | 13:48 |
*** rloo has quit IRC | 13:54 | |
*** rloo has joined #openstack-ironic | 13:54 | |
openstackgerrit | Dan Prince proposed a change to openstack/ironic: Run ipmi power status less aggressively https://review.openstack.org/80400 | 13:54 |
*** rloo has quit IRC | 13:55 | |
*** rloo has joined #openstack-ironic | 13:55 | |
*** vkozhukalov1 has left #openstack-ironic | 13:57 | |
*** nosnos has joined #openstack-ironic | 13:58 | |
*** matsuhashi has joined #openstack-ironic | 13:59 | |
*** vkozhukalov1 has joined #openstack-ironic | 13:59 | |
*** nosnos has quit IRC | 14:00 | |
*** nosnos has joined #openstack-ironic | 14:01 | |
*** nosnos has quit IRC | 14:05 | |
NobodyCam | Good Morning Ironic | 14:09 |
romcheg | Morning NobodyCam | 14:09 |
lucasagomes | morning NobodyCam romcheg :) | 14:10 |
romcheg | Morning lucasagomes | 14:10 |
*** nosnos has joined #openstack-ironic | 14:11 | |
NobodyCam | Morning romcheg and lucasagomes :) | 14:11 |
NobodyCam | how was your weekends | 14:12 |
*** vkozhukalov1 has left #openstack-ironic | 14:12 | |
lucasagomes | great :) just came back from prague | 14:13 |
lucasagomes | awesome city | 14:13 |
NobodyCam | :) | 14:13 |
romcheg | I've returned from Moscow. Border control guys there are assholes :) | 14:13 |
*** nosnos has quit IRC | 14:14 | |
romcheg | How was yours? | 14:14 |
*** krtaylor has quit IRC | 14:15 | |
*** vkozhukalov1 has joined #openstack-ironic | 14:16 | |
*** matsuhashi has quit IRC | 14:16 | |
*** vkozhukalov1 has left #openstack-ironic | 14:16 | |
*** vkozhukalov1 has joined #openstack-ironic | 14:16 | |
*** rwsu has joined #openstack-ironic | 14:17 | |
jroll | morning all :) | 14:18 |
*** max_lobur1 has joined #openstack-ironic | 14:19 | |
*** jgrimm has joined #openstack-ironic | 14:21 | |
*** max_lobur has quit IRC | 14:21 | |
*** vkozhukalov1 has left #openstack-ironic | 14:22 | |
NobodyCam | was good ... installed a water misting system on the rv for the trip to Atlanta | 14:22 |
NobodyCam | morning jroll | 14:22 |
jroll | that'll be a heck of a drive | 14:23 |
NobodyCam | yep leaving april 1 :) | 14:24 |
jroll | nice :) | 14:25 |
*** vkozhukalov1 has joined #openstack-ironic | 14:26 | |
*** vkozhukalov1 has left #openstack-ironic | 14:26 | |
*** toure has joined #openstack-ironic | 14:28 | |
*** vkozhukalov1 has joined #openstack-ironic | 14:29 | |
*** matsuhashi has joined #openstack-ironic | 14:32 | |
*** lucasagomes_ has joined #openstack-ironic | 14:33 | |
*** lucasagomes has quit IRC | 14:34 | |
*** lucasagomes_ is now known as lucasagomes | 14:37 | |
*** toure has quit IRC | 14:37 | |
* NobodyCam calls tmobile | 14:37 | |
*** toure has joined #openstack-ironic | 14:39 | |
*** saju_m has quit IRC | 14:39 | |
*** ndipanov_ has quit IRC | 14:41 | |
*** krtaylor has joined #openstack-ironic | 14:44 | |
NobodyCam | woo hoo /me has a phone again :-p | 14:45 |
NobodyCam | so much for updating the sim card over the weekend | 14:45 |
*** ndipanov_ has joined #openstack-ironic | 14:53 | |
*** vkozhukalov1 has left #openstack-ironic | 15:03 | |
*** dhellmann_ is now known as dhellmann | 15:05 | |
*** vkozhukalov has joined #openstack-ironic | 15:08 | |
NobodyCam | Thank you lucasagomes :) | 15:10 |
lucasagomes | NobodyCam, np! thank u :) | 15:10 |
*** todd_dsm has quit IRC | 15:11 | |
*** vkozhukalov has quit IRC | 15:12 | |
*** toure has quit IRC | 15:13 | |
*** toure has joined #openstack-ironic | 15:13 | |
*** eghobo has joined #openstack-ironic | 15:16 | |
*** ndipanov_ has quit IRC | 15:20 | |
*** matsuhashi has quit IRC | 15:25 | |
*** todd_dsm has joined #openstack-ironic | 15:27 | |
NobodyCam | brb | 15:27 |
*** matsuhas_ has joined #openstack-ironic | 15:28 | |
*** matsuhas_ has quit IRC | 15:33 | |
*** ndipanov_ has joined #openstack-ironic | 15:34 | |
*** mkerrin has joined #openstack-ironic | 15:35 | |
*** rloo has quit IRC | 15:39 | |
*** rloo has joined #openstack-ironic | 15:40 | |
devananda | g'morning, all | 15:48 |
NobodyCam | good morning devananda :) | 15:48 |
lucasagomes | morning derekh | 15:52 |
lucasagomes | devananda, | 15:52 |
lucasagomes | sorry derekh :) | 15:52 |
derekh | lucasagomes: np | 15:52 |
*** BadCub01 has joined #openstack-ironic | 15:53 | |
*** blamar has joined #openstack-ironic | 15:59 | |
*** blamar has joined #openstack-ironic | 16:00 | |
*** ifarkas has quit IRC | 16:04 | |
* NobodyCam starts a devtest run... | 16:08 | |
NobodyCam | brb | 16:08 |
*** dwalleck has joined #openstack-ironic | 16:11 | |
*** dwalleck has quit IRC | 16:15 | |
*** eghobo has quit IRC | 16:16 | |
devananda | romcheg: hi there! welcome back :) | 16:19 |
romcheg | Morning devananda | 16:19 |
*** romcheg has left #openstack-ironic | 16:19 | |
*** romcheg has joined #openstack-ironic | 16:19 | |
romcheg | whoops, accidentally left the chat | 16:19 |
*** tatyana has quit IRC | 16:21 | |
*** eghobo has joined #openstack-ironic | 16:22 | |
romcheg | *is trying to understand the math here https://review.openstack.org/#/c/80400/6/ironic/drivers/modules/ipmitool.py* | 16:23 |
*** toure has left #openstack-ironic | 16:23 | |
jroll | heh | 16:23 |
*** tatyana has joined #openstack-ironic | 16:27 | |
devananda | ugh | 16:34 |
NobodyCam | ugh? | 16:34 |
devananda | that does not need to be called recursively, on every iteration of the loop | 16:34 |
jroll | not to mention it will infinitely recurse... | 16:36 |
jroll | x ** 2 will never == -1 | 16:36 |
lucasagomes | jroll, but that's total_time | 16:37 |
jroll | OH | 16:37 |
lucasagomes | he's checking if retry == -1 | 16:37 |
* jroll runs off to get coffee :) | 16:37 | |
lucasagomes | but anyway, that logic is a bit confusing | 16:37 |
devananda | when retry == 0, the next call will == -1 and not recurse further | 16:38 |
devananda | it's really bad logic. and the variable "total_time" is not total_time at all | 16:38 |
lucasagomes | hehe yeah | 16:38 |
devananda | and there's a simple formula for sum-of-a-sequence | 16:38 |
lucasagomes | should we have 2 parameters for it? one for max sleep time | 16:39 |
lucasagomes | and the second for number of retries? | 16:39 |
devananda | no | 16:39 |
devananda | just one -- total time to retry | 16:39 |
romcheg | Isn't that too implicit? | 16:39 |
devananda | then do an exponential back-off until that total time is reached | 16:39 |
devananda | 1, 2, 4, 8, .... | 16:39 |
devananda | or use fibonacci sequence. similar effect | 16:40 |
lucasagomes | 1,4,9,16... | 16:40 |
devananda | n^2 vs 2^n :) | 16:40 |
lucasagomes | heh | 16:41 |
devananda | point is to avoid DoSing a BMC by polling the power state | 16:41 |
devananda | and we don't need complicated logic, multiple CONF options, or recursive methods to do that | 16:42 |
lucasagomes | yeah, indeed | 16:43 |
jroll | +1 | 16:43 |
devananda | romcheg: https://review.openstack.org/#/c/81336/ could use another pair of eyes | 16:43 |
romcheg | *looks* | 16:43 |
devananda | hmm | 16:44 |
devananda | lucasagomes: you -1'd https://review.openstack.org/#/c/81340/ - let's chat for a minute | 16:44 |
lucasagomes | devananda, sure | 16:44 |
lucasagomes | I put some comments there | 16:44 |
lucasagomes | I agree that the rescue api is incomplete/not mature | 16:44 |
devananda | lucasagomes: ah. so i wasn't clear -- i meant, we have no REST API for it | 16:44 |
lucasagomes | devananda, ahh that's correct | 16:45 |
devananda | lucasagomes: the driver API may or may not be (in)complete -- it's untested as no driver has even started implementing it | 16:45 |
lucasagomes | devananda, right | 16:45 |
devananda | lucasagomes: so I wanted to hide the driver interface so it isn't presented in the list of supported interfaces | 16:45 |
devananda | in the API | 16:45 |
NobodyCam | bbiafm post bbt walkies... | 16:45 |
devananda | without any REST API to call it | 16:45 |
lucasagomes | devananda, right | 16:46 |
lucasagomes | but when it's implemented we would need to revert the change that that patch is doing | 16:46 |
devananda | lucasagomes: dependency of https://review.openstack.org/#/c/81336/ which you +2 :) | 16:46 |
devananda | yes | 16:46 |
lucasagomes | so I would just leave it there for now | 16:46 |
lucasagomes | ironic didn't graduate so our api will still change | 16:46 |
devananda | sure | 16:46 |
lucasagomes | devananda, I agree with 81336, which shows rescue as not supported | 16:47 |
lucasagomes | but 81340 to me sounds like, removing part of the plumbing that will be needed in the future | 16:47 |
devananda | lucasagomes: so when someone implements an out-of-tree driver based on the Icehouse release *and* adds a driver.rescue() interface | 16:47 |
devananda | lucasagomes: they will have been misled since there wouldn't be any REST API to invoke it | 16:47 |
devananda | I'm happy to revert 81340 as soon as Juno opens | 16:48 |
devananda | but I think it should be hidden in Icehouse release | 16:48 |
lucasagomes | right... so wouldn't it be better to implement the rest api instead of ripping that off? | 16:48 |
devananda | well. if we had done that a month ago, yes :) | 16:48 |
lucasagomes | heh | 16:48 |
devananda | RC1 is this week, I think | 16:49 |
*** harlowja has joined #openstack-ironic | 16:49 | |
lucasagomes | ok grand, we also need to remove it from the python client | 16:49 |
devananda | there's still a bunch on https://launchpad.net/ironic/+milestone/icehouse-rc1 that haven't landed yet | 16:49 |
lucasagomes | which on previous versions will still show rescue in the validate | 16:49 |
devananda | hm? | 16:49 |
devananda | "on previous versions" -- i'm not sure what you mean | 16:50 |
lucasagomes | devananda, ah ignore that, I thought that in the client we were using print_dict | 16:51 |
lucasagomes | and had a list with the driver interfaces that need to be listed | 16:52 |
lucasagomes | but we are not | 16:52 |
lucasagomes | (which is good) | 16:52 |
lucasagomes | https://github.com/openstack/python-ironicclient/blob/master/ironicclient/v1/node_shell.py#L162 | 16:52 |
lucasagomes | devananda, right... so agreed, for the icehouse release we can hide that interface | 16:53 |
lucasagomes | devananda, also, can I have ur opinion on https://blueprints.launchpad.net/ironic/+spec/credentials-keystone-v3 ? | 16:53 |
NobodyCam | lucasagomes: +1 from me on that BP | 16:54 |
lucasagomes | NobodyCam, :) yeah I was doing some experimentation here with keystone, seems quite flexible | 16:55 |
devananda | lucasagomes: hm, interesting idea | 16:55 |
lucasagomes | yeah | 16:56 |
lucasagomes | I don't think ironic should be managing credentials at all | 16:56 |
NobodyCam | devananda: remove all passwords from our db | 16:56 |
devananda | lucasagomes: i've been toying with ironic using reversible AES to store credentials | 16:56 |
devananda | I agree we need to handle credentials more sanely than we do today | 16:56 |
lucasagomes | yeah | 16:56 |
devananda | like, seriously. that's something I should have done a while back | 16:56 |
lucasagomes | but there's also a problem with fragmentation, cause keystone is the service that is supposed to manage this sort of thing | 16:57 |
devananda | keystone manages credentials for openstack services | 16:57 |
devananda | hmm | 16:57 |
lucasagomes | yeah, but with v3 they made it flexible enough for other services to store other credentials within keystone as well | 16:58 |
devananda | it's also a question of how tightly coupled should ironic be with keystone. if we put all creds there, then it's not possible to start ironic until after keystone starts | 16:58 |
NobodyCam | i like offloading the creds to keystone, I would be ok with keeping basic password support as we have now so that ironic could be used by "devs" outside of the openstack env for testing | 16:58 |
devananda | if we support >1 location for cred storage, it needs to be pluggable. and then we've just re-implemented keystone | 16:59 |
dtantsur | I need your help :) I'm trying to follow recently-merged guide for Ironic on devstack (and as usual on Fedora 20), and I'm stuck with the problem: instance is in "building" state for looong-long time, then fails | 16:59 |
lucasagomes | devananda, maybe we should offer some flexibility like... you can store, ipmi_password/username, ipmi_credential_id? | 17:00 |
dtantsur | the only thing I found in logs grepping by ERROR was: http://paste.openstack.org/show/74162/ | 17:00 |
dtantsur | (traceback converted to human-readable form: http://paste.openstack.org/show/74164/ ) | 17:00 |
dtantsur | any advice is appreciated | 17:00 |
devananda | dtantsur: ERROR ironic.conductor.manager [-] Timeout reached when waiting callback for node | 17:01 |
devananda | dtantsur: check syslog for messages from tftp | 17:01 |
NobodyCam | dtantsur: how did you build your deploy ramdisk? | 17:01 |
romcheg | devananda: I've looked through 81336 and I think we can land it | 17:01 |
dtantsur | devananda, yeah, that should be the cause, though exception also bothers me | 17:01 |
romcheg | agree? | 17:01 |
JayF | If I understand correctly, you said v3 Keystone is what you need to store creds for other services. Is it OK for Ironic to take a hard dep on Keystone v3 API? | 17:01 |
*** vkozhukalov has joined #openstack-ironic | 17:01 | |
dtantsur | NobodyCam, devstack: BM_DEPLOY_FLAVOR="-a amd64 fedora deploy-ironic" | 17:01 |
NobodyCam | ok | 17:02 |
lucasagomes | JayF, that needs to be discussed | 17:02 |
lucasagomes | JayF, right now we have a mix of v2 and v3 | 17:02 |
lucasagomes | which seems kinda messy | 17:02 |
NobodyCam | dtantsur: do you have access to console on the node you're deploying? | 17:02 |
devananda | dtantsur: if syslog shows the node being served kernel/ramdisk by tftp, and also fetching the deploy token, then it should have worked. my guess is one or both of those did not get pulled | 17:02 |
lucasagomes | maybe we should do -just like other services: heat - and use v3 only | 17:02 |
devananda | lucasagomes: i thought we are only keystone v2 today | 17:03 |
*** todd_dsm has quit IRC | 17:03 | |
lucasagomes | devananda, the common/keystone.py supports v3 as well | 17:03 |
JayF | I think we'd strongly prefer to not have a hard dep on Keystone v3 | 17:03 |
devananda | ah | 17:03 |
lucasagomes | to get things from the catalog | 17:03 |
dtantsur | devananda, is it sudo journalctl | grep -i tftp ? Then nothing intriguing.. | 17:03 |
lucasagomes | JayF, right, any particular reason? | 17:03 |
devananda | dtantsur: i dont know journalctl. /var/log/syslog ? | 17:03 |
JayF | lucasagomes: it significantly raises the bar for integration with existing clouds | 17:04 |
devananda | JayF: hm. afaik, we need keyv3 for signed URLs, both in swift and ironic, which is how we are looking at doing any sort of secure callback from the agent | 17:04 |
dtantsur | devananda, sudo grep -rni tftp /var/log/ gives nothing (journalctl seems to be replacement for syslog in Fedora) | 17:04 |
devananda | dtantsur: are you running a tftp service? | 17:04 |
lucasagomes | JayF, yeah indeed... but afaik services like heat only supports v3 no? | 17:04 |
lucasagomes | so we won't be first | 17:05 |
lucasagomes | devananda, yeah, the trust thing for the ramdisk we might need v3 as well (I think) | 17:05 |
devananda | dtantsur: look in devstack/lib/ironic for 'tftp' and see which service should be running | 17:05 |
JayF | IMO heat taking the dependency wasn't a decision I would've made/supported, but I work on Ironic not heat :) | 17:05 |
lucasagomes | heh | 17:06 |
JayF | I'm just saying it might be worthwhile to have it be more than a passing IRC conversation to add that dependency | 17:06 |
lucasagomes | JayF, oh sure | 17:06 |
lucasagomes | I mean, the credentials thing is not even approved or anything | 17:06 |
lucasagomes | we will discuss it more | 17:06 |
lucasagomes | and see the impacts | 17:06 |
JayF | Thanks, that's what I wanted to be sure of | 17:07 |
* NobodyCam starts devtest again, | 17:07 | |
devananda | lucasagomes: so one reason I suspect folks will object to using keystone for IPMI creds | 17:07 |
devananda | lucasagomes: is privilege separation | 17:07 |
devananda | direct BMC access (and the credentials thereof) are generally much more tightly controlled than access to the tools which *use* those accounts to provision hardware | 17:08 |
devananda | eg, separate accounts to grant "nova boot --flavor baremetal" vs "ironic node-show | grep password" | 17:09 |
devananda | right now, we only support very rudimentary access for this via keystone v2 | 17:09 |
devananda | but we do support separating those two actions so that "users" of baremetal can't see the credentials | 17:09 |
lucasagomes | right | 17:10 |
lucasagomes | so yeah the keystone thing right now is admin only | 17:10 |
dtantsur | devananda, I see no signs of tftp in ps, nor in services; seems like tftp should be operated by xinetd, and there's config file for it. It is supposed to listen on port 69, right? | 17:10 |
lucasagomes | ironic would only store a ref to the credentials | 17:10 |
*** romcheg has quit IRC | 17:10 | |
lucasagomes | but only admins will be able to list/get it | 17:10 |
devananda | dtantsur: it should definitely be running. that's (one of) the problem(s) | 17:10 |
lucasagomes | https://github.com/openstack/keystone/blob/master/etc/policy.json#L59 | 17:10 |
dtantsur | devananda, netstat only gives udp6 0 0 :::69 :::* | 17:10 |
dtantsur | devananda, maybe it only binds to ipv6 endpoint? | 17:11 |
NobodyCam | dtantsur: that should be both v4 and v6 | 17:11 |
devananda | lucasagomes: IMHO, ironic should continue to "own" BMC creds, but use keyv3 policy to limit access to them even further | 17:11 |
devananda | lucasagomes: perhaps to the point of preventing retrieval via the REST API | 17:11 |
devananda | lucasagomes: eg, write-but-dont-read | 17:11 |
devananda | DC ops often have security/compliance requirements around this stuff | 17:12 |
devananda | my feeling right now is that stashing the BMC creds in keystone is going to violate those compliance reqs | 17:13 |
NobodyCam | devananda: thats why I like pushing the creds on to keystone | 17:13 |
devananda | NobodyCam: huh? | 17:14 |
NobodyCam | you think so | 17:14 |
*** ifarkas has joined #openstack-ironic | 17:14 | |
devananda | NobodyCam: yes, because it would put all the credentials in one place | 17:14 |
NobodyCam | keystone's job is creds so they are keeping up on how to store passwords / keys | 17:14 |
NobodyCam | oh | 17:14 |
devananda | it's clearly worth bringing this up on the ML :) | 17:15 |
lucasagomes | devananda, +1 | 17:15 |
dtantsur | devananda, NobodyCam: added `flags = IPv4` to the xinetd configuration, now I have xinetd listening on the ipv4 endpoint as well, will try again | 17:15 |
JayF | Yeah but there needs to be some separation of responsibilities; user auth and BMC auth are different use cases and security cases IMO | 17:15 |
devananda | JayF: right | 17:15 |
JayF | devananda: perhaps a topic for the meeting? But maybe needs to be hashed out more first | 17:15 |
lucasagomes | yeah I think for the meeting next week we could talk more about it | 17:15 |
lucasagomes | it's too fresh right now | 17:16 |
lucasagomes | needs more thought | 17:16 |
NobodyCam | lucasagomes: is what dtantsur just did a fedora requirement? | 17:16 |
* lucasagomes reads | 17:16 | |
NobodyCam | added `flags = IPv4` to the xinetd configuration, | 17:17 |
dtantsur | NobodyCam, I've seen similar issues on boxes with both IPv4 and v6 supported, when a service uses the common approach: enumerate all possible endpoints and try to bind until one succeeds | 17:18 |
JoshNang | I added a blueprint to describe the agent driver we're working on. I should have some preliminary code pushed up shortly. Perhaps it's another topic for the meeting? https://blueprints.launchpad.net/ironic/+spec/agent-driver | 17:18 |
lucasagomes | NobodyCam, hmm I didn't do that in my env | 17:18 |
*** epim has joined #openstack-ironic | 17:18 | |
jroll | JoshNang: nice | 17:18 |
jroll | JoshNang: that depends on some un-merged reviews, right? | 17:19 |
jroll | oh, I should finish reading first ;) | 17:19 |
NobodyCam | devananda: lucasagomes want a creds topic on the agenda, or, as JayF says, too early? | 17:20 |
JoshNang | jroll: yup. maybe some other ones i forgot about too | 17:20 |
lucasagomes | NobodyCam, maybe fft | 17:20 |
NobodyCam | :) | 17:20 |
JayF | NobodyCam: I'd say next week after a ML thread this week? | 17:21 |
lucasagomes | or for the meeting next week, with a better idea/understanding of the implications | 17:21 |
jroll | JoshNang: I think it's just that one | 17:21 |
lucasagomes | JayF, sounds good | 17:21 |
NobodyCam | :) we'll hold off to see what the ML comes up with | 17:21 |
devananda | JayF, lucasagomes: either of you want to start the ML thread? | 17:23 |
lucasagomes | sure | 17:23 |
devananda | lucasagomes: also, did you change your mind on https://review.openstack.org/#/c/81340/2 ? | 17:24 |
devananda | lucasagomes: *thanks, also ... | 17:24 |
lucasagomes | devananda, oh yeah, I was about to remove my -1 | 17:24 |
lucasagomes | devananda, there's that nit in the commit message, but I think it's ok, if u rebase that or put another patch set up u can fix | 17:25 |
devananda | lucasagomes: ah. I'll fix that nit if you're ready to +2 afterwards | 17:26 |
devananda | just want to get it in today if we're going to get it in :) | 17:26 |
lucasagomes | devananda, don't need to, I will +2 that with the nit | 17:26 |
lucasagomes | cause it's not like a problem :) | 17:26 |
devananda | NobodyCam: did you decide what you want to do with https://review.openstack.org/#/c/80376/ ? | 17:26 |
devananda | lucasagomes: cheers | 17:26 |
devananda | oh, and https://review.openstack.org/#/c/81336/ needs a +A | 17:27 |
devananda | looks like romcheg +2'd but didn't +A | 17:27 |
lucasagomes | devananda, done | 17:28 |
devananda | ty | 17:28 |
NobodyCam | devananda: I'd like to switch to keystone, but will wait for ML, if we keep creds in our db then the reversible aes stuff will be needed, and a what to filter out the password/key from node show | 17:29 |
* NobodyCam looks | 17:29 | |
NobodyCam | lucasagomes: beat me to the +A | 17:30 |
NobodyCam | :) | 17:30 |
lucasagomes | NobodyCam, I will try to send the email to the ML today or tomorrow morning tops | 17:30 |
NobodyCam | s/what/way/ | 17:30 |
NobodyCam | lucasagomes: :) TY :) | 17:30 |
lucasagomes | does anyone know whether the signed url for the ramdisk needs v3 or not? | 17:33 |
devananda | yes | 17:33 |
devananda | AFAIK, signed urls are v3 only | 17:33 |
lucasagomes | right | 17:33 |
lucasagomes | thanks | 17:33 |
JayF | That is code that doesn't exist atm though :) | 17:34 |
JoshNang | lucasagomes: signed url to download the glance image from swift? | 17:34 |
JayF | JoshNang: I think they're talking about what was mentioned at the mid-cycle meetup of using signed urls, provided by keystone, to auth ironic to the agent and back | 17:34 |
lucasagomes | yup, neither signed urls nor our credentials in keystone exist, I just want to gather the ideas we have at the moment that touch v3 | 17:34 |
devananda | JoshNang: that too. but we've also been discussing a signed url for the agent to POST back to ir-api | 17:34 |
devananda | JayF: yes | 17:35 |
lucasagomes | JayF,* | 17:35 |
JoshNang | gotcha | 17:35 |
lucasagomes | JoshNang, yeah for the vendor_passthru, pass_deploy_info() that we use in pxe right now | 17:35 |
JoshNang | definitely a very useful security feature | 17:35 |
lucasagomes | yeah, the auth_token in the /tftp needs to go away :) | 17:36 |
*** jistr has quit IRC | 17:36 | |
devananda | lucasagomes: another interesting one for you https://review.openstack.org/#/c/78912/ | 17:36 |
JayF | Probably something we might want to wait to actually implement in the agent+driver until after Atlanta summit though, since it's not strictly required for a working prototype | 17:36 |
lucasagomes | devananda, will see | 17:37 |
devananda | JayF: ++ | 17:37 |
NobodyCam | lucasagomes: devananda: https://review.openstack.org/#/c/82180/ | 17:37 |
devananda | we may well end up with a session for key v3 integration discussion @Atlanta | 17:37 |
NobodyCam | add driver-show to client | 17:37 |
devananda | sounds like there are multiple angles to consider and things to implement | 17:38 |
NobodyCam | devananda: seems like it would be worth it | 17:38 |
lucasagomes | devananda, +1 | 17:39 |
NobodyCam | humm NodeLocked: Node 24e2d627-5a8d-4619-abfe-14f54d428783 is locked by host ubuntu, please retry after the current operation is completed. | 17:40 |
lucasagomes | devananda, would be nice to have some ceilometer guys there too, to rethink the push-ipmi-data-to-ceilometer | 17:40 |
devananda | lucasagomes: i thought that bp was abandoned // haomeng was working on integration | 17:40 |
devananda | so that ir-cond would push notifications to ceil | 17:40 |
lucasagomes | devananda, right, yeah he was working on it | 17:41 |
JayF | Do all OpenStack services only push statistics to Ceilometer? Are there any integrations with non-OS monitoring tools like statsd? | 17:41 |
lucasagomes | devananda, but I think we agreed with the ceilometer guys that ironic will do it, and the main reason was because ironic owns the ipmi credentials | 17:41 |
lucasagomes | which might change in that discussion | 17:41 |
lucasagomes | so it would be nice to have their presence there as well | 17:41 |
devananda | yuriyz: on 81763, any reason not to set action_timeout to 0? | 17:42 |
devananda | yuriyz: afaict, this is only controlling the sleep() time for a mocked function anyway | 17:42 |
lucasagomes | JayF, hmm I think ceilometer has a central agent that pulls statistics as well | 17:42 |
lucasagomes | JayF, and the central agent sends it via RPC to the collectors | 17:42 |
lucasagomes | that then write to the database | 17:43 |
lucasagomes | what Ironic would do is send it via RPC to the collectors directly, AFAIUI | 17:43 |
JayF | Just thinking about how I would collect metrics from Ironic in an environment without Ceilometer. | 17:43 |
JayF | It sounds like the answer might have to be, run a Ceilometer and have it report into another monitoring system | 17:44 |
*** athomas has quit IRC | 17:45 | |
devananda | lucasagomes: re: "ironic owns the creds" -- sorta. the real reason AFAIR is "ironic owns the BMC access channel" | 17:45 |
devananda | lucasagomes: adding more services with access to the BMC means more complicated ops and security | 17:46 |
lucasagomes | devananda, sure, but we could do it in a controlled way using trust etc | 17:46 |
devananda | lucasagomes: the original BP for ceilo was to use *local* IPMI access via an agent on the host. which clearly doesn't apply to ironic's instances | 17:46 |
devananda | JayF: that depends on what you want to monitor | 17:47 |
lucasagomes | devananda, but that original approach changed, no? | 17:47 |
devananda | JayF: resource utilization (how many nodes used / available) vs stats of each node (cpu temp, fan speed, etc) | 17:47 |
lucasagomes | after the discussion u guys had in hong kong | 17:47 |
devananda | lucasagomes: right | 17:48 |
JayF | devananda: I'd probably want to monitor both, but do different things with them (i.e. capacity planning vs failure prediction) | 17:48 |
lucasagomes | which would be the ironic conductor that would retrieve the data and send it to the ceilometer collectors? | 17:48 |
JayF | devananda: which is why I'd want the flexibility to get the data, especially the second type of data, into another system outside of openstack | 17:48 |
lucasagomes | JayF, which ceilometer might fit well with their alarms | 17:49 |
JayF | lucasagomes: to be blunt; my A+ preference would be to cut ceilometer out of the conversation completely, but given that's not possible, I'm just looking for the most efficient way to get the data out and into another system | 17:50 |
lucasagomes | right ack | 17:50 |
devananda | JayF: my ideal would be for ironic to expose an API for the retrieval of said information | 17:51 |
devananda | JayF: iirc, and it's been a while so imbw, ceilometer didn't want to poll to get it -- they wanted ironic to push the notifications out on some periodicity | 17:52 |
devananda | NobodyCam: https://review.openstack.org/#/c/77939/ could use eyes | 17:52 |
JayF | Pushing fits more with other monitoring systems | 17:52 |
jroll | devananda: sounds expensive unless ironic is storing that data already | 17:52 |
JayF | but I'd love it to be pluggable | 17:52 |
devananda | jroll: right | 17:53 |
* NobodyCam looks | 17:53 | |
JayF | where I could, for instance, ship to statsd and/or graphite (in addition to? || in place of?) ceilometer | 17:53 |
devananda | so. pushing means it needs to be pluggable | 17:53 |
jroll | indeed | 17:53 |
JayF | Has there been much thought on this? Any blueprints, etc? I'll gladly toss up a blueprint for metrics gathering | 17:53 |
devananda | as long as the message format is standardized across all drivers (which is one of the driver API req's) then I think that should be doable | 17:53 |
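The pluggable push model discussed above (one standardized message, several possible destinations) can be sketched as follows. This is only an illustration, not code from Ironic or Ceilometer: `MetricsBackend`, `InMemoryBackend`, `build_message`, `publish`, and the message fields are all hypothetical names.

```python
import time


class MetricsBackend:
    """Hypothetical plugin interface; a real backend would ship data out."""

    def emit(self, message):
        raise NotImplementedError


class InMemoryBackend(MetricsBackend):
    """Toy backend that only records messages, standing in for a real sink."""

    def __init__(self):
        self.messages = []

    def emit(self, message):
        self.messages.append(message)


def build_message(node_uuid, driver, sensors, timestamp=None):
    """Build one driver-independent message (field names are assumptions)."""
    return {
        "node_uuid": node_uuid,
        "driver": driver,
        "timestamp": time.time() if timestamp is None else timestamp,
        "sensors": dict(sensors),
    }


def publish(backends, message):
    """Push the same message to every configured backend."""
    for backend in backends:
        backend.emit(message)
```

Because every backend receives the same dictionary, adding a second destination next to (or instead of) the first is a configuration change rather than a driver change.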
devananda | yes | 17:53 |
devananda | https://blueprints.launchpad.net/ironic/+spec/send-data-to-ceilometer | 17:54 |
*** derekh has quit IRC | 17:54 | |
JayF | I'll add some comments to that then. Thanks | 17:54 |
NobodyCam | devananda: jenkins failed on that review...? | 17:54 |
devananda | and https://review.openstack.org/#/c/72538/ | 17:54 |
NobodyCam | ahh rebase | 17:54 |
devananda | JayF: there's a _long_ discussion there. worth reading up on before you comment | 17:54 |
JayF | devananda: absolutely :) reading the context is part of the process. Thanks for the links | 17:55 |
NobodyCam | yuriyz: want to toss up a quick rebase on https://review.openstack.org/#/c/77939/ | 17:55 |
*** max_lobur1 has quit IRC | 17:58 | |
Shrews | devananda: for bug 1295870, i was sort of thinking following the model nova uses, which can be seen in this class: https://github.com/openstack/nova/blob/master/nova/image/glance.py#L164 | 17:59 |
Shrews | devananda: i don't think putting retry logic in the client itself is the better route. no other client code does that, from what i can tell. And I think a developer using the client lib would want tighter control over that, anyway | 18:00 |
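The caller-side retry approach Shrews describes, where the code using the client library decides how often to retry, might look roughly like this. It is a sketch under assumptions, not Nova's or Ironic's actual code: `TransientError`, `retry_call`, and `flaky_update` are hypothetical names.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a retryable client error (e.g. a conflict)."""


def retry_call(func, *args, attempts=3, delay=0.0, **kwargs):
    """Call func, retrying on TransientError up to `attempts` times."""
    last_exc = None
    for attempt in range(1, attempts + 1):
        try:
            return func(*args, **kwargs)
        except TransientError as exc:
            last_exc = exc
            if attempt < attempts:
                time.sleep(delay)
    # Every attempt failed: surface the last error to the caller.
    raise last_exc


# Hypothetical client call that fails twice before succeeding.
calls = {"n": 0}


def flaky_update(node_uuid):
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("node locked")
    return "updated " + node_uuid
```

Keeping the wrapper outside the client library lets each caller pick its own `attempts` and `delay`, which is the tighter control mentioned above.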
* NobodyCam makes a bagel b4 meeting ... brb | 18:03 | |
Shrews | NobodyCam: I should really invest some money in that bagel company you're supporting :) | 18:05 |
devananda | Shrews: ++ | 18:06 |
NobodyCam | lol :) | 18:06 |
NobodyCam | http://www.saraleebread.com | 18:08 |
openstackgerrit | A change was merged to openstack/ironic: Change JsonEncodedType.impl to TEXT https://review.openstack.org/81583 | 18:09 |
openstackgerrit | A change was merged to openstack/ironic: Imported Translations from Transifex https://review.openstack.org/78862 | 18:09 |
lucasagomes | Shrews, nice! yeah calling manually the _retry_if... is not ideal | 18:10 |
Shrews | agreed | 18:11 |
*** zul has quit IRC | 18:13 | |
openstackgerrit | Jarrod Johnson proposed a change to stackforge/pyghmi: Fix missing delay_xmit argument breaking power wait requests https://review.openstack.org/82569 | 18:15 |
*** zul has joined #openstack-ironic | 18:16 | |
openstackgerrit | A change was merged to openstack/python-ironicclient: Add support for 'driver-show' command https://review.openstack.org/82180 | 18:17 |
jroll | vkozhukalov, agordeev, if either of you are around, here's my draft for updating the agent wiki page: https://etherpad.openstack.org/p/282Ocf7oXR | 18:18 |
jroll | and anyone else interested ^ | 18:18 |
jroll | of course, some of that may change after today's meeting | 18:18 |
jroll | s/may/will likely/ :) | 18:18 |
vkozhukalov | jroll: having a look | 18:19 |
NobodyCam | speaking of meeting ... anyone have or want anything on the agenda that's not there already? | 18:20 |
*** lucasagomes is now known as lucas-afk | 18:21 | |
jroll | agenda lgtm :) | 18:21 |
devananda | easy review for another core to approve: https://review.openstack.org/#/c/81555/ | 18:23 |
* NobodyCam clicks | 18:24 | |
NobodyCam | :-p | 18:24 |
dtantsur | In the meanwhile, I've tried to deploy instance, while tftp should be listening on ipv4 udp:69, but with the same result. And again no signs of tftpd working in any logs.. | 18:26 |
dtantsur | lucas-afk, did you say you use Fedora as well? | 18:27 |
NobodyCam | no tftp at all? ie deployment k&R not served? | 18:28 |
devananda | dtantsur: do you have network bridge properly set up? you should see DHCP BOOTP request coming from the VM | 18:28 |
devananda | dtantsur: either tail dnsmasq's log or tcpdump that network | 18:28 |
NobodyCam | dtantsur: real hw or vm? | 18:28 |
dtantsur | devananda, I'll see | 18:29 |
dtantsur | NobodyCam, vm, libvirt | 18:29 |
*** Hefeweizen has quit IRC | 18:30 | |
NobodyCam | dtantsur: dies node get any dhcp info... ie an ip | 18:30 |
NobodyCam | s/dies/does/ | 18:30 |
dtantsur | NobodyCam, `DHCPACK(tap1d4fb9d1-3e) 10.0.0.5 52:54:00:4f:8e:7a host-10-0-0-5` <-- seems like yes | 18:31 |
dtantsur | NobodyCam, and DHCPRELEASE follows in 30 minutes | 18:32 |
NobodyCam | dtantsur: any firewall running on host? | 18:32 |
dtantsur | NobodyCam, I think firewalld. Let me check.. | 18:34 |
NobodyCam | oh if so is port 69 open? | 18:35 |
dtantsur | maybe not, trying to figure out | 18:37 |
* dtantsur never liked iptables >_< | 18:38 | |
NobodyCam | last I used iptables was CentOS but I used something like iptables -I INPUT -p udp --dport 69 -j ACCEPT | 18:39 |
devananda | NobodyCam: another one for you - https://review.openstack.org/#/c/81340/2 | 18:39 |
NobodyCam | to open the port | 18:39 |
NobodyCam | lol that's a good reason : because the API for this has not been created yet | 18:40 |
dtantsur | ok, trying again with port opened | 18:41 |
NobodyCam | dtantsur: :) | 18:41 |
NobodyCam | should work better that way :) | 18:41 |
*** greghaynes has joined #openstack-ironic | 18:42 | |
NobodyCam | devananda: you ok with the spelling error lucas pointed out? just commit message so I'm ok with it :) | 18:42 |
dtantsur | btw, I get {"message": "'HTTPInternalServerError' object has no attribute '__name__'", "code": 500, "created": "2014-03-24T18:41:50Z"} while trying to delete failed instance, but that should be another story... | 18:44 |
NobodyCam | dtantsur: ya sounds unrelated to your current deploy issue | 18:45 |
NobodyCam | undercloud deployed from seed with ironic but getting error in undercloud.. | 18:46 |
NobodyCam | Timing out after 600 seconds: | 18:46 |
NobodyCam | COMMAND=ironic chassis-create -d devtest_canary | 18:46 |
NobodyCam | I think I know why... | 18:46 |
openstackgerrit | A change was merged to openstack/ironic: Stop incorrectly returning rescue: supported https://review.openstack.org/81336 | 18:47 |
*** martyntaylor has joined #openstack-ironic | 18:48 | |
*** martyntaylor has left #openstack-ironic | 18:49 | |
NobodyCam | ten minute bell! | 18:50 |
dtantsur | NobodyCam, great, some tftp activity in logs and node can be pinged, but ssh gives "connection refused" (better than "no route to host" already!) | 18:52 |
NobodyCam | dtantsur: what is the status | 18:53 |
NobodyCam | the initial deployment ramdisk will not have ssh | 18:53 |
dtantsur | NobodyCam, "spawning" for now. Should I wait more? | 18:54 |
NobodyCam | yes wait | 18:54 |
NobodyCam | can you watch the node's console? | 18:54 |
dtantsur | NobodyCam, not sure how | 18:55 |
NobodyCam | its ok | 18:55 |
NobodyCam | watch the status field | 18:55 |
*** tatyana has left #openstack-ironic | 18:56 | |
devananda | dtantsur: what's the status in ironic's API? | 18:56 |
devananda | "spawning" is from nova-api | 18:56 |
*** romcheg has joined #openstack-ironic | 18:56 | |
NobodyCam | dtantsur: ironic node-show | 18:56 |
NobodyCam | or just ironic node-list | 18:57 |
dtantsur | devananda, NobodyCam provision_state | wait call-back | 18:57 |
dtantsur | target_provision_state | deploy complete | 18:57 |
*** lucas-afk is now known as lucasagomes | 18:57 | |
*** agordeev2 has joined #openstack-ironic | 18:58 | |
*** romcheg has quit IRC | 18:58 | |
devananda | dtantsur: that looks good. tail ir-cond log | 18:58 |
lucasagomes | dtantsur, yeah I do, haven't tested that devstack patch tho | 18:58 |
NobodyCam | dtantsur: nova is waiting for the node to ping back and say start deploy | 18:58 |
*** romcheg has joined #openstack-ironic | 18:58 | |
devananda | ^ s/nova/ironic/ | 18:58 |
NobodyCam | doh TY devananda :) | 18:58 |
devananda | dtantsur: you should see in the tftp logs, in addition to kernel & ramdisk, that the token file is also fetched | 18:58 |
devananda | dtantsur: depending on how slow nested virt is for you, it may be several minutes before the deployment resumes | 18:59 |
dtantsur | devananda, "/tftpboot/token-..."? Yes, seems like it was fetched | 18:59 |
dtantsur | devananda, ok, thank you. I'll try not to panic a bit more :) | 19:00 |
*** harlowja has quit IRC | 19:01 | |
*** max_lobur has joined #openstack-ironic | 19:02 | |
*** mrda_away is now known as mrda | 19:02 | |
*** harlowja has joined #openstack-ironic | 19:03 | |
*** romcheg has quit IRC | 19:09 | |
*** romcheg has joined #openstack-ironic | 19:09 | |
*** yonglihe_ has joined #openstack-ironic | 19:15 | |
*** yongli has quit IRC | 19:16 | |
*** romcheg1 has joined #openstack-ironic | 19:22 | |
*** romcheg has quit IRC | 19:24 | |
openstackgerrit | A change was merged to openstack/ironic: Hide rescue interface from validate() output https://review.openstack.org/81340 | 19:26 |
dtantsur | To panic again: still does not work :( | 19:29 |
dtantsur | and I'm going to file a bug about cleaning up after failure: http://paste.openstack.org/show/74179/ http://paste.openstack.org/show/74164/ | 19:30 |
*** romcheg1 has quit IRC | 19:38 | |
*** romcheg has joined #openstack-ironic | 19:38 | |
*** epim has quit IRC | 19:42 | |
openstackgerrit | A change was merged to openstack/ironic: Fix traceback hook for avoid duplicate traces https://review.openstack.org/81555 | 19:46 |
dtantsur | created https://bugs.launchpad.net/ironic/+bug/1296918 | 19:49 |
dtantsur | lifeless, you mentioned something about nova delete not working? I have troubles with deleting failed instances - they just stay in "deleting" state. May it be related? | 19:51 |
lifeless | dtantsur: that's the symptom | 19:52 |
lifeless | dtantsur: though it fails for running instances too for me | 19:52 |
dtantsur | lifeless, nova show for such instance gives me {"message": "'HTTPInternalServerError' object has no attribute '__name__'", "code": 500, "created": "2014-03-24T17:15:50Z"} | 19:53 |
dtantsur | is it the same for you? | 19:53 |
lifeless | dunno right now | 19:54 |
lifeless | I've context switched away from Ironic until we get consensus on the n-n startup issue | 19:55 |
dtantsur | ok, will try to investigate as well | 19:56 |
lifeless | n-c, I mean | 19:58 |
lifeless | dtantsur: https://bugs.launchpad.net/ironic/+bug/1295503 | 19:59 |
jroll | 12:59:24 JayF | I think we should take jroll's proposed wiki page, make it the wiki page, and put arch discussions in there/in a blueprint | 20:00 |
jroll | I agree | 20:00 |
agordeev2 | +1 | 20:00 |
devananda | JayF: so etherpads are, IMO, good for near-real-time discussions | 20:00 |
devananda | JayF: not so much for long async design sessions | 20:00 |
dtantsur | lifeless, thanks, will look into this as well | 20:00 |
devananda | JayF: but BPs are even worse for that | 20:00 |
vkozhukalov | jroll: devananda: let's remove all that stuff about CMDB | 20:00 |
JayF | I completely agree | 20:00 |
jroll | vkozhukalov: +1 | 20:00 |
*** JoshNang has quit IRC | 20:00 | |
NobodyCam | brb | 20:00 |
JayF | So what's the best way to go? I suggest codifying the Minimum-viable-architecture in the wiki, working towards implementing that in a prototype for the summit, then at the summit hashing it out further | 20:01 |
*** JoshNang_ has joined #openstack-ironic | 20:01 | |
jroll | makes sense to me | 20:01 |
devananda | JayF: that sounds good. do you have that MVP drafted somewhere already? | 20:01 |
jroll | devananda: https://etherpad.openstack.org/p/282Ocf7oXR | 20:01 |
lucasagomes | what about some clarification on the procedural vs declarative API? | 20:01 |
JayF | jroll: you can hash out the architecture stuff more in that though, right? and pull references to the old etherpad? | 20:01 |
vkozhukalov | one of the main questions is about procedural vs declarative approach | 20:02 |
jroll | JayF: yes, I can | 20:02 |
jroll | ok, so | 20:02 |
jroll | is anyone opposed to a fine-grained API with all available commands? | 20:02 |
devananda | lifeless: ok, so the problem seems to actually be here: https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1008 | 20:02 |
devananda | lifeless: which is out of scope for us to be able to change | 20:02 |
vkozhukalov | is everyone ok with having both of those approaches simultaneously? | 20:02 |
jroll | and is anyone opposed to adding an additional endpoint to submit multiple commands? | 20:02 |
jroll | I think we should have both | 20:02 |
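Supporting both styles, a fine-grained call per command plus an extra endpoint that accepts several commands at once, could be sketched like this. None of it is the agent's real API: `AgentCommands`, `CommandError`, `run`, and `run_batch` are hypothetical names, and stopping a batch at the first failure is an assumption.

```python
class CommandError(Exception):
    """Raised when a command name is not registered."""


class AgentCommands:
    """Hypothetical command table exposed by an agent."""

    def __init__(self):
        self._commands = {}

    def register(self, name, func):
        self._commands[name] = func

    def run(self, name, **params):
        """Fine-grained entry point: run exactly one command."""
        if name not in self._commands:
            raise CommandError("unknown command: %s" % name)
        return self._commands[name](**params)

    def run_batch(self, requests):
        """Batch entry point: run commands in order, stop at first failure."""
        results = []
        for request in requests:
            try:
                result = self.run(request["name"], **request.get("params", {}))
            except Exception as exc:
                results.append({"name": request["name"], "error": str(exc)})
                break
            results.append({"name": request["name"], "result": result})
        return results
```

The batch method is only a loop over the single-command method, so the two entry points cannot drift apart in behavior.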
devananda | lifeless: it's not the nova.virt.ironic.driver:init_host() method that's failing -- it's compute.manager :( | 20:02 |
*** max_lobur has quit IRC | 20:03 | |
jroll | lucasagomes: I think we should support both - I'll add that to https://etherpad.openstack.org/p/282Ocf7oXR | 20:04 |
lucasagomes | jroll, cheers | 20:04 |
vkozhukalov | jroll: i still think we need to have a minimal list of drivers as well, let's move one by one and decide which of those mentioned here https://etherpad.openstack.org/p/IronicPythonAgent we really need. | 20:06 |
jroll | vkozhukalov: I mentioned them in the other etherpad | 20:06 |
jroll | not which functionality they cover, but which drivers we should have | 20:06 |
jroll | and I can list out which functions they should support, if that's helpful | 20:07 |
vkozhukalov | jroll: ok, see, line 36 | 20:07 |
jroll | yes | 20:07 |
devananda | vkozhukalov: jroll: one thing to consider about the agent -- ironic's driver API and the agent API will converge over time | 20:07 |
jroll | devananda: perhaps. I think the agent API will always be more fine-grained | 20:08 |
devananda | any action that we need to perform on hardware will a) need to be expressed in some way via the REST and Driver APIs | 20:08 |
lifeless | devananda: yes, I know :( | 20:08 |
devananda | and b) likely be expressed in other driver APIs as well | 20:08 |
lifeless | devananda: I have an idea though which you may hate | 20:08 |
*** romcheg has quit IRC | 20:08 | |
lifeless | devananda: which is to be evil | 20:08 |
devananda | lifeless: ehh.... | 20:08 |
jroll | devananda: eh, maybe you're right. I can definitely see them converging. maybe not 100% but close. | 20:09 |
*** JoshNang_ has quit IRC | 20:09 | |
devananda | jroll: what we do with the agent, vendors will do in hardware | 20:09 |
devananda | jroll: update firmware? yea, some vendors can do that directly via the BMC | 20:10 |
devananda | same for build raid, etc | 20:10 |
jroll | sure | 20:10 |
NobodyCam | lifeless: evil? how so? | 20:10 |
JayF | I think it's possible that the agent might be able to do things that other drivers might find difficult/impossible over time | 20:11 |
lifeless | devananda: https://review.openstack.org/#/c/81959/ + https://review.openstack.org/#/c/82414/ + https://review.openstack.org/#/c/81627/ | 20:11 |
JayF | but that doesn't exclude those things from being a part of the larger ironic api if some driver comes along that can implement them, good | 20:11 |
lifeless | devananda: should let you a) get a seed ironic that Just Works | 20:11 |
devananda | jroll: not 100% identical, but the agent and driver APIs will necessarily converge (since ir-cond needs a way to tell the driver to do things on the hardware) | 20:11 |
lifeless | devananda: and b) see the undercloud fail | 20:11 |
jroll | JayF: at a minimum, the agent driver could implement them :P | 20:11 |
jroll | devananda: right, I agree | 20:11 |
JayF | jroll: exactly :D | 20:11 |
devananda | JayF: yes. and things which the upstream agent can do which vendor drivers _can't_ will encourage vendors to start _adding_ those functions to their hardware | 20:12 |
JayF | exactly | 20:12 |
JayF | Look at this -> we need a whole ramdisk + some python agent just to update your bios. Don't you just want to make it a call via the BMC? | 20:12 |
JayF | That's why I like getting to a full prototype of things working together. Almost certainly we can learn more from a working example than from talking about what a working example might look like ;) | 20:13 |
lifeless | JayF: am I wrong to say please nononono I can audit the ramdisk and python code | 20:13 |
lifeless | JayF: I can't audit vendor code, and we *know* it's often got problems | 20:13 |
devananda | lifeless: you have different priorities than hw vendors ;) | 20:13 |
JayF | lifeless: In my experience, you sometimes might be exec()'ing a vendor binary to flash the firmware or update the bios | 20:14 |
lifeless | devananda: arguably not:) | 20:14 |
JayF | lifeless: so the only thing auditable in those cases are the scaffolding we've built to enable it | 20:14 |
devananda | and almost all the time, we'll be relying on vendor tooling to do the actual plumbing anyway | 20:14 |
lifeless | JayF: yes but at least I'm not gambling they get network security correct | 20:14 |
vkozhukalov | another question which is still open for me is: how is conductor going to do inventory? is it supposed to be implemented as periodic task? | 20:14 |
lifeless | JayF: cipher suite 0. | 20:14 |
devananda | vkozhukalov: two approaches have been proposed so far | 20:14 |
devananda | vkozhukalov: 1) default PXE ramdisk which POSTs enrollment data | 20:15 |
devananda | vkozhukalov: 2) driver API to "scan" for unregistered hardware | 20:15 |
jroll | ^ | 20:15 |
JayF | lifeless: I just think we'll have won if we can get most vendors to start cryptographically signing their firmwares and configs and things of that matter. That way the path doesn't matter as much. | 20:15 |
devananda | JayF: ++ | 20:15 |
jroll | devananda, vkozhukalov: I think we could support both methods. | 20:15 |
jroll | s/could/should | 20:15 |
*** JoshNang_ has joined #openstack-ironic | 20:16 | |
devananda | jroll: ++should | 20:16 |
NobodyCam | brb | 20:16 |
*** romcheg has joined #openstack-ironic | 20:18 | |
*** romcheg has left #openstack-ironic | 20:18 | |
lifeless | devananda: so by evil I mean tarpitting the ironic calls c.m makes | 20:18 |
*** romcheg has joined #openstack-ironic | 20:18 | |
*** lucasagomes is now known as lucas-dinner | 20:18 | |
lifeless | devananda: but can we go big picture for a second ? | 20:18 |
devananda | lifeless: sure | 20:19 |
lifeless | devananda: isn't the idea of Ironic that nova instances won't be owned by a nova compute in quite the way they are today? | 20:19 |
lifeless | devananda: e.g. three n-cs, three ironics, one fails, there shouldn't be any big drama | 20:19 |
devananda | lifeless: yes, except that will require rearchitecting some chunks of n-cpu, n-cond, n-sched | 20:19 |
lifeless | sure sure | 20:19 |
lifeless | but if that's the big picture | 20:19 |
lifeless | perhaps we can do some compromise stuff today with that in mind | 20:19 |
lifeless | e.g. | 20:19 |
lifeless | the failure in list_instances | 20:20 |
vkozhukalov | jroll: devananda: for me they are two different tasks 1) discovery - list of nodes available 2) inventory - hardware details | 20:20 |
JayF | vkozhukalov: What use is a list of nodes without the details with them? | 20:21 |
lifeless | what if list_instances was also ring based (using the nova knowledge of running n-c) | 20:21 |
devananda | vkozhukalov: perhaps there is a terminology difference. "discovery" for us means "find new hardware that is not yet known by ironic" | 20:21 |
vkozhukalov | jroll: devananda: 1) discovery - heartbeat 2) inventory - hardware info exposed via API | 20:21 |
jroll | vkozhukalov: yes. that's why we should support both. | 20:21 |
devananda | vkozhukalov: discovery != heartbeat | 20:21 |
lifeless | devananda: agh, got too detailed. Let me make a pad. | 20:21 |
devananda | lifeless: please do. I dont believe nova has any hash ring or knowledge of ir-cond's ring | 20:22 |
jroll | vkozhukalov: so, discovery is so far an unsolved problem, and I'd like to punt on that for now. | 20:22 |
lifeless | https://etherpad.openstack.org/p/ironic-nova-friction | 20:23 |
devananda | lifeless: that said, i /think/ i see where you're going, broadly... eager to see it more clearly | 20:23 |
vkozhukalov | devananda: heartbeat > discovery, right? | 20:23 |
jroll | vkozhukalov: heartbeat, the hardware info is sent on the *first* heartbeat and stored on the node object in the db | 20:23 |
devananda | vkozhukalov: heartbeat == "an idle agent informing ir-cond that it is still alive" | 20:23 |
jroll | vkozhukalov: further heartbeats just say "I'm alive" | 20:23 |
devananda | jroll: "hw info is sent on first POST" | 20:23 |
devananda | jroll: to clarify my view, that POST is not a heartbeat | 20:24 |
jroll | vkozhukalov: (I should mention that the "first heartbeat" is not actually a heartbeat) | 20:24 |
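The distinction drawn here, a first POST that carries hardware info followed by heartbeats that only say "I'm alive", can be sketched on the receiving side as below. This is an illustration under assumptions, not ir-cond code: `AgentRegistry` and its methods are hypothetical, and a plain dict stands in for the node object in the db.

```python
import time


class AgentRegistry:
    """Hypothetical conductor-side bookkeeping for agent check-ins."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self.nodes = {}

    def register(self, node_id, hardware):
        """First POST from an agent: store the hardware info once."""
        self.nodes[node_id] = {
            "hardware": dict(hardware),
            "last_seen": self._clock(),
        }

    def heartbeat(self, node_id):
        """Later check-ins only refresh the liveness timestamp."""
        if node_id not in self.nodes:
            raise KeyError("agent must register before heartbeating")
        self.nodes[node_id]["last_seen"] = self._clock()

    def is_alive(self, node_id, timeout):
        """True if the agent has checked in within `timeout` seconds."""
        node = self.nodes.get(node_id)
        return node is not None and self._clock() - node["last_seen"] <= timeout
```

Passing the clock in as a parameter is a design choice for the sketch: it lets the liveness timeout be exercised without sleeping.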
vkozhukalov | jroll: ok, now i see | 20:24 |
jroll | devananda: agreed, bad wording on my part | 20:24 |
devananda | :) | 20:24 |
devananda | jroll: do you have a pad / flow diagram of the agent's init process? | 20:25 |
jroll | vkozhukalov: for "inventory", I think that may be outside the scope of ironic itself, but I do like the idea of exposing an API endpoint in both ironic and the agent | 20:25 |
devananda | jroll: eg, from DHCP BOOT through ... until it is finally idle | 20:25 |
*** romcheg1 has joined #openstack-ironic | 20:25 | |
jroll | devananda: I think so but not positive, let me poke around | 20:26 |
devananda | as far as inventory and exposing hw specs -- to me, the answer is simple: stash any details that ironic doesn't need in node.properties or node.extra | 20:26 |
jroll | +1 | 20:26 |
devananda | ir-api already exposes those | 20:26 |
devananda | so there's NOTHING to do :) | 20:26 |
jroll | :) | 20:26 |
devananda | jroll: thanks. i'd like to understand how you see the auth during agent startup (both for a known and an unknown node) | 20:26 |
*** romcheg has quit IRC | 20:27 | |
jroll | devananda: we use a trusted network for authentication right now | 20:27 |
*** harlowja_ has joined #openstack-ironic | 20:27 | |
vkozhukalov | devananda: jroll: we definitely need to have detailed diagram about heartbeat/discovery/inventory flow | 20:27 |
JayF | devananda: I think we're enforcing security of the network agents come on, rather than in the agent itself right now | 20:27 |
*** romcheg has joined #openstack-ironic | 20:28 | |
jroll | vkozhukalov: except for exposing a "list hardware" api call, we're not in the business of inventory management | 20:28 |
lifeless | devananda: some thoughts there, but stepping away for a sec, we have a contractor here who needs to be shown around | 20:28 |
devananda | lifeless: ack | 20:28 |
JayF | devananda: if, for instance, I pass some option or PXE config based on MAC onto a network, any device can spoof that mac and get the 'credentials' which means truly authenticating an agent on a trustworthy network is borderline impossible | 20:28 |
*** notq has joined #openstack-ironic | 20:29 | |
devananda | JayF: unless SDN // MAC filtering on the switch's ports | 20:29 |
devananda | JayF: then yes. but we're not there yet :) | 20:29 |
JayF | devananda: exactly, and at that point you're completely relying on the security of the network | 20:29 |
*** epim has joined #openstack-ironic | 20:29 | |
NobodyCam | lifeless: question, is there any reason each element that needs it could not check if keystone is setup and if not call a keystone init from a O-R-C script? | 20:30 |
JayF | devananda: not saying we shouldn't add layers eventually, just saying I think there's, at least for a little while unless there's some more clever solution I haven't thought of/heard yet, an intrinsic requirement that the PXE network be secured | 20:30 |
*** romcheg2 has joined #openstack-ironic | 20:30 | |
*** romcheg1 has quit IRC | 20:30 | |
devananda | JayF: i agree that there is that assumption today | 20:31 |
*** harlowja has quit IRC | 20:31 | |
devananda | JayF: but it's one that i think we should aim to get away from | 20:31 |
JayF | I'm just curious as to what could happen in the future, even, to change that assumption | 20:31 |
JayF | Not really urgent to know now, I just don't see a path away from that requirement | 20:32 |
devananda | ack | 20:32 |
JayF | unless you change from 'pxe' to some more authenticated mechanism to transmit the agent and/or configs (like virtual media, or some fancy UEFI secure remote booting thing that may not exist yet but would theoretically be awesome) | 20:32 |
devananda | JayF: example: ^ | 20:32 |
devananda | yep | 20:33 |
*** romcheg has quit IRC | 20:33 | |
devananda | work is in flight for exactly that | 20:33 |
jroll | how will that work? | 20:34 |
devananda | hypothetically | 20:36 |
devananda | driver creates disk w/ secure token, mounts disk via VM channel | 20:37 |
devananda | driver powers on node | 20:37 |
devananda | node PXE boots generic ramdisk | 20:37 |
devananda | then uses secure token to auth back to ironic | 20:37 |
devananda | <end> | 20:37 |
devananda | same process can be extended for validating signature of the PXE and user images, firmware image, etc. it's a step towards UEFI support | 20:38 |
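The token step in the flow devananda outlines above can be sketched roughly as follows. This is only an illustration: `TokenRegistry`, `issue` and `verify` are hypothetical names, not Ironic code, and the virtual media attachment itself is not modelled.

```python
import hmac
import secrets


class TokenRegistry:
    """Hypothetical in-memory store of per-node boot tokens (not Ironic code)."""

    def __init__(self):
        self._tokens = {}

    def issue(self, node_uuid):
        # The driver would write this token to the disk it attaches over
        # the virtual media channel before powering the node on.
        token = secrets.token_urlsafe(32)
        self._tokens[node_uuid] = token
        return token

    def verify(self, node_uuid, presented):
        # The booted ramdisk presents the token when it calls back.
        expected = self._tokens.get(node_uuid)
        if expected is None:
            return False
        # Constant-time comparison avoids leaking the token through timing.
        return hmac.compare_digest(expected, presented)


registry = TokenRegistry()
token = registry.issue("node-1")
print(registry.verify("node-1", token))      # ramdisk that read the mounted disk
print(registry.verify("node-1", "spoofed"))  # device that only spoofed the MAC
```

The point of the sketch is that the secret travels over the out-of-band channel, so a device that merely spoofs a MAC on the PXE network never sees it.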
jroll | hmm, ok | 20:38 |
jroll | what is VM channel? | 20:38 |
jroll | virtual media? | 20:39 |
devananda | virtual media channel | 20:39 |
JayF | so there's an implicit requirement for hardware support for mounting remote media? | 20:39 |
devananda | not part of IPMI spec -- but nearly all vendors have one | 20:39 |
devananda | yes | 20:39 |
jroll | hmm | 20:39 |
devananda | this is one of the main benefits that hw vendors are looking to drive | 20:39 |
devananda | *get in their drivers | 20:39 |
devananda | that's not to say "assume PXE net is secure" is an invalid model -- it's fine for a lot of situations | 20:40 |
devananda | but as we think about APIs and architecture, we should avoid limiting ourselves to ^ | 20:40 |
JayF | well it's the only model without hardware cooperation of some kind it seems :C | 20:40 |
devananda | to be pedantic, we kinda need hardware cooperation to do /any/ of this :p | 20:41 |
jroll | hahaha | 20:41 |
JoshNang_ | :D | 20:41 |
lifeless | ok back | 20:41 |
JayF | ipmitool chassis power on | 20:41 |
JayF | # no | 20:41 |
devananda | hehehe :) | 20:41 |
* JayF has seen BMCs almost that ornery | 20:41 | |
lifeless | NobodyCam: say there are three undercloud nodes | 20:41 |
lifeless | NobodyCam: which one should do the init-keystone initialisation ? | 20:41 |
* jroll afk for a few | 20:42 | |
NobodyCam | lifeless: hummm.. yes odd on they would all try at the same time | 20:42 |
NobodyCam | s/odd/odds/ | 20:44 |
vkozhukalov | good night guys, tomorrow going to draw heartbeat/discovery sequence diagram, and think we are almost agreed about main points | 20:45 |
*** linggao has quit IRC | 20:45 | |
NobodyCam | night vkozhukalov | 20:45 |
devananda | comstud: around? want to continue the compute.manager:init_host discussion from last week? we're jotting notes at https://etherpad.openstack.org/p/ironic-nova-friction | 20:46 |
lifeless | NobodyCam: so, no, not without some careful plumbing/thought. | 20:46 |
*** max_lobur has joined #openstack-ironic | 20:46 | |
NobodyCam | lifeless: would be neat if there was a hash ring type check the undercloud could do.. | 20:47 |
NobodyCam | so only one would attempt to register | 20:47 |
NobodyCam | :-p | 20:47 |
lifeless | NobodyCam: we have a etherpad for that | 20:47 |
NobodyCam | was just a thought I had walking the dogs | 20:47 |
NobodyCam | oh link? | 20:47 |
devananda | vkozhukalov: g'night! thanks! | 20:47 |
devananda | lifeless: re: hash ring for tripleo things -- wouldn't it be nice if we had a quorum manager? ;) | 20:48 |
devananda | something like, oh, zookeeper maybe ... | 20:49 |
lifeless | devananda: then we have the same bootstrap problem for that :) | 20:49 |
*** vkozhukalov has quit IRC | 20:49 | |
devananda | hehe | 20:49 |
lifeless | devananda: we're looking at some plumbing to help, and yes, I think perhaps we need to make a consistent API and facility for that in openstack but thats a later battle. | 20:49 |
NobodyCam | :) | 20:50 |
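A minimal sketch of the hash ring idea NobodyCam floats above, assuming every undercloud node already knows the same list of peers (which is exactly the bootstrap problem lifeless points out). The `HashRing` class, the host names and the `init-keystone` key are illustrative assumptions, not TripleO or Ironic code.

```python
import bisect
import hashlib


def _hash(value):
    # Stable across processes and hosts, unlike Python's built-in hash().
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent hash ring mapping a key to one host."""

    def __init__(self, hosts, replicas=16):
        # Each host gets several points on the ring to even out ownership.
        self._ring = sorted(
            (_hash("%s-%d" % (host, i)), host)
            for host in hosts
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def owner(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


hosts = ["undercloud-0", "undercloud-1", "undercloud-2"]
ring = HashRing(hosts)
owner = ring.owner("init-keystone")
for me in hosts:
    print(me, "runs init-keystone" if me == owner else "skips")
```

Every node computes the same owner independently, so only one attempts the initialisation, but only as long as they all agree on the membership list.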
devananda | lifeless: so a problem with nova.virt.ironic.driver:list_instances merely returning [] when unable to auth/reach ironic/etc is that this will lead to very aberrant behavior later on | 20:51 |
comstud | devananda: Half around... making/eating lunch, but I can multitask | 20:52 |
devananda | lifeless: short of that, i'm not sure what you could be proposing | 20:52 |
lifeless | devananda: just finishing the analysis, bear with me :)_ | 20:52 |
lifeless | I may be on crack | 20:52 |
lifeless | this code is a problem | 20:54 |
lifeless | for instance in local_instances: | 20:54 |
lifeless | if instance.host != our_host: | 20:54 |
russell_h | lifeless: "Nova-compute is intrinsically HA with federated state mirrored into a central scheduler + DB" <- that seems wrong | 21:02 |
lifeless | russell_h: ok! how so | 21:03 |
*** jbjohnso_ has quit IRC | 21:03 | |
russell_h | lifeless: I mean, the statements about state mirroring are correct, but the conclusion that its HA I think is wrong | 21:03 |
russell_h | today, to make nova-compute HA you would need to implement some sort of failover or master-election on top | 21:03 |
russell_h | and there are definitely a lot of challenges along the way | 21:03 |
lifeless | russell_h: if you have 5 libvirt kvm hypervisors and one fails, the system as a whole remains available | 21:03 |
*** JoshNang_ is now known as JoshNang | 21:04 | |
russell_h | lifeless: ah, I see what you mean. You mean that collectively, the nova-compute layer is HA? | 21:04 |
lifeless | russell_h: no, there is no master in n-c, its federated, not replicated state. | 21:04 |
lifeless | right | 21:04 |
russell_h | ok, gotcha, agreed | 21:04 |
lifeless | individual VMs aren't HA with any of the current hypervisors | 21:04 |
comstud | russell_h: There is a way to do HA for nova-compute | 21:04 |
lifeless | but n-c isn't a hypervisor, its an abstraction. | 21:04 |
comstud | it's kinda hacky, but we think it works | 21:04 |
lifeless | comstud: running live-migrate and stopping pre-pivot ? | 21:04 |
comstud | I'm talking about nova-compute itself | 21:05 |
comstud | run N of them | 21:05 |
comstud | set their 'host' all to the same thing | 21:05 |
comstud | CONF.host | 21:05 |
lifeless | comstud: oh right - so yeah thats one of the options here for ironic actually | 21:05 |
lifeless | comstud: but big concerns about scale | 21:05 |
comstud | yeah, i'm just talking in the ironic context | 21:05 |
lifeless | like, 10K machines with Ironic, do you want them all reporting all instances ? | 21:05 |
comstud | not really! | 21:05 |
comstud | but | 21:05 |
comstud | that's what cells is for | 21:06 |
comstud | :) | 21:06 |
* comstud hides | 21:06 | |
lifeless | comstud: so not :) | 21:06 |
comstud | so is | 21:06 |
russell_h | what we need is a hashring service | 21:06 |
lifeless | comstud: I thought cells was for scaling DB / rabbit /scheduler usage ? | 21:07 |
comstud | lifeless: it's a general way to break up work | 21:07 |
comstud | But yes, I still don't like all nova-computes talking to ironic and getting all instances | 21:07 |
lifeless | comstud: its very manual though, right? you have to decide how many cells | 21:07 |
comstud | it's not really a solution for that.. for that part I'm somewhat kidding | 21:07 |
comstud | yes | 21:08 |
comstud | you break up hosts into cells manually... or you can wrap config with something automatic | 21:08 |
lifeless | yes, so - not dynamic :) | 21:08 |
lifeless | anyhooo | 21:08 |
comstud | i don't think you want 10K nodes in a single cell either way though | 21:09 |
comstud | but we can ignore that | 21:09 |
comstud | I think you need a solution regardless | 21:09 |
comstud | coincidentally i was just talking to dansmith about this re: nova conductor | 21:11 |
comstud | trying to find a way to perhaps have N nova-conductors manage a set of M nova-computes | 21:11 |
lifeless | it doesn't do that already? | 21:12 |
lifeless | I thought the whole *point* of nova-conductor was to scale DB access separately to computes (as well as the security aspects) | 21:12 |
comstud | well, right now there's no need to have any conductor own a particular compute | 21:13 |
comstud | but we're talking about maybe doing that with respect to periodic tasks | 21:13 |
comstud | so yes, we have that today without any 'ownership' | 21:13 |
comstud | (M:N) | 21:14 |
lifeless | for things with no real locality of reference, that seems fine to me | 21:15 |
lifeless | anyhow, hash rings++ | 21:15 |
lifeless | comstud: you're looking at https://etherpad.openstack.org/p/ironic-nova-friction ? | 21:15 |
mrda | comstud: so formalising the relationship between computes and conductors. Where *would* cells fit into that discussion? | 21:16 |
comstud | lifeless: I haven't looked at that, no | 21:16 |
comstud | mrda: It doesn't | 21:16 |
mrda | :) | 21:16 |
comstud | cells is mostly supposed to be a manual "i assign certain nodes to cells" and they don't flip | 21:17 |
comstud | because there can be layer 2 boundaries and so forth | 21:17 |
*** jbjohnso_ has joined #openstack-ironic | 21:19 | |
devananda | N n-cpu with a static 1:M relationship to nodes invalidates the HA which ir-api/ir-cond services provide | 21:19 |
*** agordeev2 has quit IRC | 21:19 | |
mrda | hmmm | 21:20 |
devananda | and N n-cpu with no relationship to nodes, where each n-cpu reports all nodes, will confound and overwhelm the scheduler | 21:20 |
devananda | and only 1 N-cpu instance invalidates nova's HA | 21:20 |
devananda | therefore: we must remove n-cpu! :-D | 21:20 |
devananda | lifeless: actually, what's the harm in running only 1 n-cpu instance? | 21:21 |
devananda | lifeless: if we assume it is easy to start a new one if the existing one fails | 21:21 |
devananda | can heat ensure that we have 0 or 1 copies of n-cpu running at any given time, but no more than 1? | 21:22 |
lifeless | devananda: means we need to wrap it in pacemaker, which is sad. | 21:22 |
lifeless | devananda: no, heat cannot. | 21:22 |
devananda | darn | 21:22 |
lifeless | devananda: heats granularity is the APIs it orchestrates(*) | 21:22 |
lifeless | *: this is changing a bit with software config, but not to solve this aspect yet | 21:22 |
lifeless | devananda: also scale | 21:23 |
devananda | right. until we test it, I suspect scale of n-cpu won't be that much of an issue | 21:23 |
devananda | since it's merely greenthreads waiting on API calls with a small amount of python in the middle | 21:23 |
lifeless | devananda: it has a (green)thread pool | 21:23 |
devananda | CPU's scale well | 21:23 |
devananda | make the pool bigger :) | 21:23 |
lifeless | devananda: R U SRS? | 21:23 |
devananda | we've talked about the scaling of ir-cond :: nodes since conductors do a lot of IO | 21:24 |
devananda | and in nova-bm that same IO pressure is on n-cpu | 21:24 |
devananda | but not the case with nova.virt.ironic | 21:24 |
devananda | or am i missing something? | 21:24 |
devananda | comstud: roughly, what's the scale factor for n-cpu :: n-cond ? | 21:25 |
lifeless | devananda: I'm worried about efficiency of things like list_instances | 21:25 |
lifeless | devananda: and the interaction with e.g. scheduler etc there, which partitioning amongst n-cs would address. | 21:25 |
devananda | lifeless: you are referring to the work done inside n-cpu when it gets the list of instances back from ir-api? | 21:27 |
devananda | lifeless: or the latency in getting said list? | 21:27 |
devananda | *are you | 21:27 |
lifeless | devananda: yes, yes and then what it hands to conductor etc. | 21:28 |
lifeless | devananda: all those codepaths are designed for up to hundreds of VMs, not thousands+ | 21:28 |
*** romcheg has joined #openstack-ironic | 21:29 | |
devananda | ah. so you think CPU would become the bottleneck then. gotcha | 21:29 |
lifeless | concerned | 21:29 |
lifeless | data will show :) | 21:29 |
devananda | indeed | 21:29 |
comstud | devananda: not sure that's been determined | 21:30 |
*** romcheg1 has joined #openstack-ironic | 21:30 | |
comstud | devananda: it's a lot higher than it should be right now... because DB calls in conductor all block | 21:30 |
devananda | comstud: <facepalm> | 21:30 |
comstud | hehe ya.. well they do everywhere | 21:31 |
*** romcheg2 has quit IRC | 21:31 | |
comstud | it's just that spreading the load across all computes is actually better right now | 21:31 |
comstud | but anyway... fixes coming hopefully soonish | 21:31 |
lifeless | devananda: so _init_instance | 21:32 |
devananda | ok, task at hand. relationship between each n-cpu host and ironic nodes | 21:32 |
lifeless | devananda: have you read through that ? | 21:32 |
lifeless | devananda: its basically recovering from things that ironic either doesn't support (e.g. migrations) or partially applied local logic like deletes. | 21:33 |
lifeless | I think, if we have 2 n-cs with the same hostname that we *DO NOT WANT* any _init_instance stuff happening, as its just a massive race condition waiting to happen | 21:33 |
devananda | lifeless: right | 21:33 |
*** romcheg has quit IRC | 21:34 | |
openstackgerrit | Jay Faulkner proposed a change to openstack/ironic: Set good defaults for heartbeat interval & timeout https://review.openstack.org/82615 | 21:34 |
*** romcheg has joined #openstack-ironic | 21:34 | |
devananda | lifeless: do we ever want any of that stuff to happen (eg, if we had 1 n-cpu and restarted it) | 21:35 |
devananda | i think the answer is yes | 21:35 |
lifeless | devananda: it would be a nicety - e.g. if n-c is killed mid-delete | 21:35 |
devananda | right | 21:36 |
lifeless | finishing the cleanup without a user having to run 'nova delete' again | 21:36 |
lifeless | but, in principle, its no different than if the delete threw an exception and n-c wasn't restarted. | 21:36 |
devananda | but some steps in there dont make sense | 21:36 |
*** romcheg1 has quit IRC | 21:37 | |
lifeless | so | 21:38 |
lifeless | we want destroy-evacuated-instances to be a no-op | 21:38 |
lifeless | we want init-instance to run but only on one n-c | 21:38 |
lifeless | and the rest is a wash | 21:39 |
openstackgerrit | Jay Faulkner proposed a change to openstack/ironic: Set good defaults for heartbeat interval & timeout https://review.openstack.org/82615 | 21:39 |
lifeless | so I've put my evil plan in the etherpad ;) | 21:40 |
lifeless | devananda: plug_vifs for instance doesn't make any sense | 21:41 |
devananda | lifeless: the more i read _init_instance() the more i think none of that needs to be run for ironic | 21:41 |
devananda | lifeless: a few things (clean up a failed delete && retry any pending reboots) make sense but are not strictly necessary | 21:42 |
lifeless | right | 21:42 |
devananda | so it's cleaner to just avoid the whole thing | 21:42 |
devananda | except i dont know that we actually can do that :( | 21:42 |
lifeless | we could use a different compute manager - etc/nova/nova.conf.sample:#compute_manager=nova.compute.manager.ComputeManager | 21:43 |
jroll | updated the agent wiki, if anyone is interested: https://wiki.openstack.org/wiki/Ironic-python-agent | 21:43 |
jroll | not 100% done but it's most of the way there | 21:44 |
JayF | jroll: we might wanna get the source for that PDF and s/teeth-agent/ironic-python-agent/ ;) | 21:44 |
devananda | enfi? | 21:44 |
jroll | JayF: oops | 21:44 |
jroll | JayF: also, blamar should have the source for that PDF | 21:45 |
lifeless | devananda: E No F* Idea. | 21:45 |
devananda | ahh | 21:45 |
devananda | E_NFI | 21:45 |
comstud | I'd avoid a different compute manager.. that starts to get into hacky territory | 21:46 |
comstud | although we're trying about making compute pluggable at a higher layer at some point | 21:46 |
comstud | I'd try to hold out for that | 21:46 |
lifeless | comstud: we just need to neuter one method | 21:46 |
comstud | Things like cells, vmware, hyperv, ironic all fit into a model where an external thing is managing the real nodes/hosts | 21:47 |
lifeless | comstud: and we're looking at 'how to make this usable for I', not long term plans. | 21:47 |
comstud | _init_instance? i missed why it needs neutered | 21:47 |
comstud | sure | 21:47 |
lifeless | comstud: init_host actually, thats 90% irrelevant/problematic. _init_instance is problematic for two reasons. | 21:48 |
comstud | It may do a bunch of unnecessary things, but there's nothing you can just choose to 'fake' in the driver? | 21:48 |
lifeless | comstud: a) with 3 N-C's with the same hostname we'll run _init_instance *concurrently* for the same instances on the different N-Cs | 21:48 |
comstud | we can look at moving some of _init_instance into the driver layer or something | 21:48 |
comstud | yeah ok | 21:48 |
devananda | comstud: several methods called in _init_instance are also called elsewhere. we can't fake all of those. | 21:48 |
lifeless | b) much of it is irrelevant (and just noise but not harmful) but some is just weird to do to an already running instance - like plug_vifs. | 21:49 |
comstud | i gotcha | 21:49 |
lifeless | comstud: I agree with longer term refactorings | 21:49 |
lifeless | comstud: problem is the chicken-egg thing | 21:49 |
comstud | sure | 21:49 |
lifeless | comstud: until we're @ parity with nova-bm + tested, no inclusion. Parity means no less robust etc, and nova-bm had a federated model of host ownership | 21:50 |
lifeless | so it had the 'generally available with N n-cs if one fails' property that we're looking at reclaiming here. | 21:50 |
comstud | i'm for getting things working now :) | 21:50 |
lifeless | comstud: so my proposed short term plan is | 21:50 |
comstud | just don't want "*too* hacky*" | 21:50 |
comstud | there are some things already that I think need to move to driver layer | 21:51 |
devananda | lifeless: no, it didn't | 21:51 |
comstud | _init_instance could be one of them | 21:51 |
devananda | lifeless: nova-bm had a strict ownership | 21:51 |
lifeless | a) run N n-c same hostname. b) subclass ComputeManager and neuter init-host to permit 1) startup without ironic for TripleO and 2) avoid _init_instance | 21:51 |
lifeless | devananda: yes it did | 21:51 |
devananda | lifeless: when did you add that, and why wasn't the BP updated? | 21:51 |
lifeless | devananda: which meant if I had 100 nodes and 4 n-cs 'nova boot' would work if one n-c was offline. | 21:51 |
devananda | lifeless: it would work 75% of the time | 21:52 |
devananda | if the scheduler happened to pick a node not owned by the offline n-cpu | 21:52 |
lifeless | devananda: scheduler would reschedule | 21:52 |
devananda | heh | 21:52 |
lifeless | devananda: and after timeout scheduler would pick a live n-c directly. | 21:52 |
devananda | gotcha | 21:52 |
lifeless | devananda: what blueprint ? what did I add? | 21:52 |
comstud | scheduler does not reschedule if the n-c is down | 21:52 |
devananda | lifeless: nvm. i misunderstood your statement about "generally available" | 21:52 |
comstud | and it sends to there | 21:52 |
comstud | How would that have worked? | 21:53 |
lifeless | comstud: it doesn't ? | 21:53 |
comstud | but yes, after timeout, things would have been fine | 21:53 |
comstud | no | 21:53 |
comstud | it has no idea if compute got the msg or not | 21:53 |
comstud | it's a cast | 21:53 |
lifeless | comstud: ah, so my misunderstanding, but timeout would do it too | 21:53 |
comstud | yeah, after the timeout period, sched would notice it's down | 21:53 |
lifeless | either way | 21:53 |
lifeless | you don't need to run an ops firedrill over a down n-c | 21:53 |
lifeless | do we have consensus on this approach ? | 21:54 |
comstud | you did with bm after the instance is built | 21:54 |
devananda | assuming you have considerable excess capacity | 21:54 |
comstud | can't do any actions | 21:54 |
devananda | and no running instances owned by that n-cpu | 21:54 |
lifeless | comstud: yeah, you can't ignore it | 21:54 |
comstud | (but that is not the case with ironic) | 21:54 |
lifeless | comstud: but you don't need to be screaming down the streets with sirens on either | 21:54 |
comstud | i'd still call it an ops firedrill :) | 21:54 |
lifeless | devananda: running instances are fine until they want to reboot | 21:54 |
devananda | lifeless: right | 21:54 |
comstud | i guess it depends on if you're running a large public cloud or not | 21:55 |
comstud | :) | 21:55 |
*** max_lobur has quit IRC | 21:55 | |
lifeless | comstud: a cloud of nova-bm or one deployed by nova-bm:) | 21:55 |
lifeless | anyhow | 21:55 |
comstud | a cloud of nova-bm | 21:55 |
lifeless | comstud: so, don't do that :) | 21:55 |
devananda | comstud: hopefully no one is doing that :) | 21:55 |
comstud | in this case | 21:55 |
lifeless | anyhoo.... | 21:55 |
comstud | i doubt anyone is doing that with baremetal driver | 21:56 |
comstud | yeah, anyhoo... | 21:56 |
devananda | 21:51:22 < lifeless> a) run N n-c same hostname. b) subclass ComputeManager and neuter init-host ... | 21:56 |
devananda | backing up to that :) | 21:56 |
comstud | seems like a reasonable option for now | 21:56 |
devananda | lifeless: i think that might work. but ... it means ironic isn't just a driver for nova | 21:56 |
comstud | devananda: you already need the special scheduler host manager | 21:57 |
comstud | :-/ | 21:57 |
comstud | which i'd also like to ditch the need for | 21:57 |
devananda | yea ... | 21:57 |
comstud | i think it might be fine short term | 21:57 |
comstud | but | 21:57 |
devananda | comstud: so i think, long term, the scheduler host manager could be addressed by more capable filtering | 21:57 |
comstud | I think maybe we can get more of the init process down to the driver layer in nova-compute | 21:58 |
comstud | so that you can ... not do things | 21:58 |
devananda | that would be much better IMO | 21:58 |
devananda | at this point, landing a new compute.manager subclass in nova is probably out of the question for Icehouse :) | 21:58 |
comstud | correct | 21:58 |
comstud | within nova | 21:58 |
devananda | i haven't looked at how easily we can plug in an out-of-tree compute mgr | 21:58 |
comstud | you could ship one in Ironic | 21:58 |
devananda | possible? | 21:58 |
comstud | yeah it is very possible last i knew | 21:58 |
devananda | k | 21:59 |
comstud | checking... | 21:59 |
comstud | should be compute_manager conf setting | 21:59 |
devananda | #compute_manager=nova.compute.manager.ComputeManager | 21:59 |
devananda | so, hypothetically, yea | 21:59 |
comstud | cfg.StrOpt('compute_manager', default='nova.compute.manager.ComputeManager', | 21:59 |
lifeless | devananda: its trivial. | 21:59 |
devananda | k | 21:59 |
lifeless | devananda: I linked the setting aove | 21:59 |
lifeless | above | 21:59 |
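The "subclass ComputeManager and neuter init-host" plan discussed above might look roughly like this. The base class here is a tiny stand-in written for the example, NOT nova's real `ComputeManager`, and `IronicComputeManager` is a hypothetical name; the real `init_host` does far more than loop over instances.

```python
class ComputeManager:
    """Stand-in for nova.compute.manager.ComputeManager (not the real class)."""

    def __init__(self, instances):
        self.instances = instances
        self.initialised = []

    def _init_instance(self, instance):
        # The real method recovers from interrupted deletes, reboots, etc.
        self.initialised.append(instance)

    def init_host(self):
        for instance in self.instances:
            self._init_instance(instance)


class IronicComputeManager(ComputeManager):
    """Hypothetical subclass that skips per-instance recovery at startup."""

    def init_host(self):
        # Deliberately a no-op: with N nova-compute services sharing one
        # hostname, running _init_instance on each of them would race.
        return None


stock = ComputeManager(["a", "b"])
stock.init_host()
neutered = IronicComputeManager(["a", "b"])
neutered.init_host()
print(stock.initialised, neutered.initialised)
```

Such a class would then be selected through the `compute_manager` setting quoted above, pointing at whatever module path the subclass ends up living in.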
devananda | so | 22:00 |
devananda | dependency here that we need to address anyway, but this just raises it | 22:00 |
devananda | we need to run the nova unit test suite | 22:00 |
devananda | in our check/gate | 22:00 |
devananda | for as long as these things aren't in nova | 22:00 |
devananda | at least then we will spot when nova changes an internal API and breaks the out of tree code | 22:01 |
devananda | which seriously sucks. but that's the life of an incubated project for now. | 22:01 |
*** jbjohnso_ has quit IRC | 22:03 | |
lifeless | devananda: I don't hold much confidence that it will tell us that. | 22:05 |
lifeless | devananda: because there are few interface tests in nova | 22:05 |
devananda | sigh | 22:05 |
lifeless | devananda: by which I mean tests that test that 'all drivers support X' | 22:05 |
lifeless | NobodyCam: so, https://review.openstack.org/#/c/80376/ ? | 22:07 |
NobodyCam | ahh yes, I chatted with lucas, and even before I got to chat with him he had put up. https://blueprints.launchpad.net/ironic/+spec/credentials-keystone-v3 | 22:10 |
lifeless | NobodyCam: does that block the ssh patch somehow ? | 22:11 |
lifeless | NobodyCam: it seems like a good improvement to me, but orthogonal. | 22:11 |
NobodyCam | he is going to shoot a letter to the ML to see what people think about the idea of removing all user creds from ironic | 22:11 |
openstackgerrit | Jim Rollenhagen proposed a change to openstack/ironic: Add Node.instance_info field https://review.openstack.org/79466 | 22:11 |
lifeless | NobodyCam: sure, but - does this block it ? | 22:12 |
lifeless | NobodyCam: if its going to be blocked indefinitely, I'll abandon it to get it out of my working set. | 22:12 |
NobodyCam | not block but seems an extra amount of work to land only to pull out again | 22:12 |
lifeless | NobodyCam: otherwise I'm checking it every day to see if there are things I need to do to it. | 22:12 |
lifeless | NobodyCam: it wouldn't get pulled out | 22:12 |
comstud | devananda: different topic. Were objects added to Ironic somewhat after some code was already using dbapi directly? (Pointing to some things in conductor somewhat suggests this...like getting a list of nodes) | 22:13 |
lifeless | NobodyCam: it comprises two distinct things | 22:13 |
lifeless | NobodyCam: a) storing a different form of password | 22:13 |
*** matty_dubs is now known as matty_dubs|gone | 22:13 | |
lifeless | NobodyCam: b) utilising a stored SSH key in a different way | 22:13 |
comstud | devananda: (Also things passing a node_id to conductor instead of the Node object itself) | 22:13 |
lifeless | NobodyCam: a) is a patch that has to happen to all drivers with the keystone v3 thing. b) would stay in. | 22:13 |
Shrews | so this is an easily missed bug that somehow found its way in: https://github.com/openstack/ironic/blob/master/ironic/nova/virt/ironic/driver.py#L266 | 22:14 |
lifeless | NobodyCam: or put another way if you do the keystone v3 thing first, we'll still have to land a patch to enable SSH with API provided keys in future. | 22:14 |
devananda | comstud: iirc, only things doing direct dbapi calls are in the api service for object creation | 22:15 |
devananda | comstud: there may be a better way to do that with objects that was added after we copied objects from nova | 22:15 |
devananda | comstud: as for passing node_id to conductor -- we started by passing the whole object, but then quickly realized that was wasted bytes in RPC when we need to fetch it from the DB anyway | 22:16 |
NobodyCam | devananda: how much work did you put into that aes crypto stuff you looked at | 22:16 |
devananda | comstud: if a conductor trusted the potentially-stale RPC object, it would lead to bad locking | 22:17 |
devananda | NobodyCam: no code yet | 22:17 |
lifeless | NobodyCam: does that make sense ? | 22:17 |
NobodyCam | lifeless: that put another way does make a vary valid point | 22:17 |
devananda | Shrews: what's the bug? | 22:17 |
NobodyCam | very even | 22:18 |
Shrews | devananda: no 'raise' :-P | 22:18 |
*** romcheg has quit IRC | 22:18 | |
devananda | oh. hah! | 22:18 |
Shrews | as i said, easily missed | 22:18 |
devananda | again, why we need unit tests | 22:18 |
devananda | *why we need to be running unit tests | 22:19 |
comstud | devananda: Yeah, there's a nice refresh() call on the object that could be used | 22:19 |
comstud | well, at least in nova | 22:19 |
notq | on the ironic white board, it says that ironic failed to graduate in icehouse. what does "graduate" in this context mean? Is that graduate from an incubation project, or does that mean it won't be released in icehouse? | 22:19 |
comstud | devananda: but if you're always going to refresh, I can understand the bytes saving | 22:19 |
devananda | lifeless: quick testr question -- testr config in ironic to a) clone nova b) run nova's unit test suite on the nova driver code in our tree | 22:20 |
comstud | devananda: But anyway, there's a number of direct DB API use in conductor right now... | 22:20 |
comstud | it seems for things that are not implemented in objects yet | 22:20 |
comstud | like (getting) a list of nodes | 22:20 |
devananda | notq: means we are still an incubated project and therefore not part of the official release or the symmetric gate | 22:20 |
devananda | notq: does not mean that downstream package managers won't release packages of ironic (in fact, some of them are) | 22:20 |
*** romcheg has joined #openstack-ironic | 22:21 | |
notq | devananda: okay, so, from a user perspective not much difference? | 22:21 |
*** romcheg has quit IRC | 22:21 | |
lifeless | devananda: it all depends on discover | 22:21 |
devananda | comstud: ahh. yes. those return a list of objects tho | 22:21 |
comstud | it should, but right now it's a list of sql-a models | 22:21 |
comstud | so it seems | 22:22 |
lifeless | devananda: but again, novas code doesn't understand 'run tests on a driver' | 22:22 |
comstud | $ grep -c dbapi *.py | 22:22 |
lifeless | devananda: drivers have tests. | 22:22 |
comstud | manager.py:12 | 22:22 |
comstud | task_manager.py:7 | 22:22 |
*** eghobo has quit IRC | 22:22 | |
comstud | devananda: I'm guessing you're open to fixes :) | 22:22 |
NobodyCam | we need to move the password / key out of the info dict | 22:22 |
lifeless | devananda: so I think you're approaching the problem backwards. The way I'd approach it is 'how to have Ironic unittests that extend/trigger on Nova changes and exercise the nova driver' | 22:22 |
lifeless | NobodyCam: separate problem though, right ? :) | 22:23 |
devananda | comstud: you mean get_nodeinfo_list? | 22:23 |
comstud | Node has no 'destroy' right now, so dbapi.destroy_node() is called, as an example as well | 22:23 |
NobodyCam | lifeless: yes it is... | 22:23 |
comstud | yeah, get_nodeinfo_list would be the first one I'd nail | 22:23 |
lifeless | NobodyCam: note that ssh keys are better than passwords with the current structure, because ssh keys can be limited but passwords cannot | 22:23 |
lifeless | (by sshd) | 22:23 |
devananda | comstud: so that should be the only one... and it should stay that way :) | 22:23 |
comstud | node_list = self.dbapi.get_nodeinfo_list(columns=columns, | 22:23 |
comstud | filters=filters) | 22:23 |
comstud | what should stay what way? | 22:23 |
devananda | comstud: well. let me ask | 22:24 |
comstud | ok | 22:24 |
JayF | has check-tempest-dsvm-virtual-ironic been passing in most cases? I see it's nonvoting, but it failed for my patch and was curious if that was expected or somehow caused by my task | 22:24 |
devananda | comstud: with the object code, can an object be init'd with only partial field data? if so, what does that do later on? | 22:24 |
jroll | JayF: I haven't seen it pass, ever | 22:24 |
comstud | you can | 22:24 |
comstud | and you can make it lazy load other junk on access later if you want | 22:24 |
devananda | comstud: the point of get_nodeinfo_list is to avoid unnecessarily fetching the whole node when you only need eg. 2 columns | 22:24 |
devananda | com | 22:24 |
devananda | ok | 22:24 |
comstud | oh, gotcha | 22:25 |
devananda | comstud: so the lazy stuff either wasn't implemented, or I didn't understand at the time | 22:25 |
comstud | you can do that still, yes | 22:25 |
comstud | in fact, we just recently had to do this for something in nova | 22:25 |
comstud | that queried specific columns | 22:25 |
devananda | cool. so yea, that'd be fine :) | 22:25 |
comstud | it may not have been implemented at the time, depending on how early you grabbed stuff | 22:25 |
devananda | comstud: as for destroy ... that was probably an oversight on my part. objects should have destroy :) | 22:25 |
JayF | jroll: thanks, that's the feedback I was lookin' for | 22:25 |
comstud | devananda: cools | 22:25 |
jroll | :) | 22:25 |
comstud | node = self.dbapi.get_node(node_uuid) | 22:26 |
notq | trying to get me head around it all, tasked to create a baremetal cloud. just got some contacts with hp cloud os that i'm supposed to work with to help us, but trying to catch up and understand as much as possible. | 22:26 |
comstud | there's that one too which is just an oversight i'm guessing | 22:26 |
notq | anyway, don't mean to clutter your dev work. i'll shut up now :) | 22:26 |
NobodyCam | lifeless: I have not tested, but do you know off the top of your head what paramiko would do with both key and key file set | 22:26 |
lifeless | NobodyCam: doesn't matter, the code prevents that | 22:26 |
lifeless | NobodyCam: one and only one credential is permitted | 22:27 |
lifeless | NobodyCam: 0/2/3 all error - and there are tests :) | 22:27 |
comstud | devananda: I'll get some patches together here | 22:27 |
comstud | i can see some of these not really needing objects, like 494 self.dbapi.touch_conductor(self.host) | 22:27 |
devananda | JayF: last i heard from adam_g this morning, -virtual-ironic test is passing on HPCS nodes but not on RAX nodes. pending more work | 22:27 |
comstud | heh | 22:27 |
devananda | comstud: right :) | 22:27 |
NobodyCam | ahh yes it does | 22:27 |
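The "one and only one credential" rule lifeless describes (0, 2, or 3 supplied credentials all error) could look something like the sketch below. The function name and the three parameter names are assumptions for illustration; they are not taken from the Ironic SSH driver.

```python
def pick_ssh_credential(password=None, key_contents=None, key_filename=None):
    """Return the name of the single credential supplied, else raise.

    All names here are hypothetical; only the exactly-one rule comes
    from the discussion.
    """
    candidates = {
        "password": password,
        "key_contents": key_contents,
        "key_filename": key_filename,
    }
    supplied = [name for name, value in candidates.items() if value]
    if len(supplied) != 1:
        raise ValueError(
            "exactly one SSH credential is required, got %d: %s"
            % (len(supplied), ", ".join(supplied) or "none")
        )
    return supplied[0]
```

Because the check rejects every combination except exactly one, the question of what paramiko does with both a key and a key file never arises.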
comstud | devananda: btw, thanks for implementing your DB api as a class :) | 22:28 |
devananda | comstud: welcome :) | 22:28 |
comstud | for nova, I have larger plans | 22:28 |
JayF | devananda: ah yeah, I didn't make the connection that was the test he was talking about, but he said something to dwalleck and I about it before. Apparently not much we can do on the image side to fix but he's working with libcloud for something upstream | 22:28 |
devananda | comstud: the singleton thing gets a bit wonky, but the class should make life much easier in the long run | 22:28 |
comstud | where the DB API actually exposes objects itself | 22:28 |
comstud | so we don't have this huge list of bullshit | 22:28 |
devananda | JayF: yep | 22:28 |
comstud | devananda: right | 22:29 |
devananda | lifeless: so there are two different angles | 22:29 |
devananda | lifeless: a) have reasonable unit test coverage of the nova.virt.ironic driver, and ensure we can catch issues in it at that layer | 22:29 |
devananda | lifeless: b) have some integration tests with nova + the ironic driver | 22:30 |
adam_g | devananda, heh no, not exactly. the actual devstack setup of ironic is succeeding but tempest-against-ironic is not. still work to do there | 22:30 |
adam_g | JayF, ^ | 22:30 |
devananda | adam_g: ahh. and is that succeeding on both RAX and HPCS? | 22:30 |
adam_g | devananda, HPCS so far | 22:30 |
devananda | ack | 22:30 |
JayF | adam_g: k, ty | 22:31 |
*** ifarkas has quit IRC | 22:31 | |
openstackgerrit | lifeless proposed a change to openstack/ironic: Provide a new ComputeManager for Ironic. https://review.openstack.org/82637 | 22:32 |
devananda | lifeless: commented on https://bugs.launchpad.net/ironic/+bug/1295503 | 22:32 |
lifeless | devananda: what do you mean by integration tests | 22:33 |
devananda | lifeless: in this case, i mean something which ensures the nova virt driver API as implemented by ironic's driver isn't stale | 22:33 |
devananda | lifeless: as your point earlier, that may not be well served by nova's unit test suite. i haven't checked | 22:34 |
devananda | lifeless: but my first point (unit tests of our driver code) should be getting run | 22:34 |
devananda | and they're not run anywhere today | 22:34 |
lifeless | devananda: they should be | 22:34 |
lifeless | test_command=OS_STDOUT_CAPTURE=1 OS_STDERR_CAPTURE=1 OS_TEST_TIMEOUT=60 ${PYTHON:-python} -m subunit.run discover -t ./ ./ $LISTOPT $IDOPTION | 22:34 |
lifeless | devananda: are you sure they are not ? | 22:34 |
comstud | Is this new compute manager only to work around the fact that you're trying to use keystone before keystone is configured? | 22:35 |
lifeless | comstud: no | 22:35 |
NobodyCam | devananda: did we have a BP to refactor password out of dict already? | 22:35 |
comstud | I assume it's more than that | 22:35 |
comstud | ok | 22:35 |
lifeless | comstud: you did read the commit message right ? | 22:35 |
comstud | sorry, I was reading a bug | 22:35 |
devananda | NobodyCam: not afaik | 22:35 |
lifeless | comstud: and the etherpad | 22:35 |
lifeless | comstud: where we listed 5 or so reasons Ironic might not be available at init-host time | 22:36 |
NobodyCam | lifeless: 80376 LGTM | 22:36 |
lifeless | NobodyCam: so +A it ;) | 22:36 |
devananda | comstud: see my comment at end of bug | 22:36 |
comstud | yeah | 22:36 |
NobodyCam | lifeless: I am now... :) | 22:36 |
NobodyCam | devananda: also going to file BP for that refactor | 22:37 |
comstud | sorry, I just reverted back to a sore subject I have | 22:37 |
comstud | :) | 22:37 |
lifeless | comstud: whats the sore subject? | 22:37 |
comstud | needlessly jumping to conclusions | 22:38 |
devananda | lifeless: i believe instances = instance_obj.InstanceList.get_by_host is not necessary | 22:38 |
comstud | lifeless: needing to bring services online unconfigured in order to configure them | 22:38 |
lifeless | comstud: I'm not sure which side of that you're on :) | 22:39 |
lifeless | comstud: I'm on the 'I just want to deploy easily and not have to embed arbitrary state machines into my deploy logic' | 22:39 |
lifeless | comstud: side. | 22:39 |
comstud | i'm on the side that I think it's dumb to have to bring up unconfigured services | 22:39 |
comstud | :) | 22:39 |
comstud | nod | 22:39 |
lifeless | comstud: so, in this case, nova *is* configured. | 22:39 |
lifeless | comstud: a *dependency* isn't. | 22:39 |
comstud | in this case i'm referring to keystone | 22:39 |
lifeless | sure, keystone is a sore point for me too :) | 22:40 |
comstud | if you start up nova first | 22:40 |
comstud | and ironic before that even | 22:40 |
comstud | then you start keystone and configure it | 22:40 |
comstud | there's a race in there | 22:40 |
comstud | etc | 22:40 |
lifeless | whats the race? | 22:40 |
lifeless | I think I know, but for clarity | 22:40 |
comstud | nova querying keystone between unconfigured and configured | 22:41 |
lifeless | which will error right ? | 22:41 |
comstud | it gets a 401 and should not retry because it generally should mean you put invalid creds in | 22:41 |
comstud | right | 22:41 |
openstackgerrit | lifeless proposed a change to openstack/ironic: Provide a new ComputeManager for Ironic. https://review.openstack.org/82637 | 22:41 |
lifeless | so perhaps nova should rety | 22:41 |
lifeless | retry there | 22:41 |
comstud | nova can retry on CONNREFUSED easily. once keystone is up, it should be configured | 22:41 |
devananda | the issue is, if keystone's offline, it should retry | 22:41 |
comstud | i argue nova should not retry on 401 :) | 22:42 |
devananda | if the auth is bad, it shouldn't assume auth will change | 22:42 |
comstud | because that's a 'invalid creds man, you screwed up your nova config' | 22:42 |
lifeless | but the auth might be bad because the creds are being rotated | 22:42 |
comstud | (or you screwed up your keystone config) | 22:42 |
devananda | so we can point the finger at keystone for being "available" when it's not configured yet | 22:42 |
lifeless | you don't know if its keystone or nova thats wrong | 22:42 |
comstud | hah, well, that's an interesting case | 22:42 |
lifeless | initial bring up is a special case of key rotation | 22:42 |
lifeless | :) | 22:42 |
comstud | i think you add new creds first to keystone | 22:42 |
comstud | fix nova | 22:43 |
comstud | remove old creds | 22:43 |
comstud | ;p | 22:43 |
comstud | (no retry on 401 in nova) | 22:43 |
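The position comstud and devananda argue here (retry while the service is unreachable, but stop on a 401) can be sketched as below. It shows only that one side of the debate, which lifeless disputes for the credential-rotation case. `call_with_retry` and the `Unauthorized` exception are hypothetical names, not nova or keystoneclient APIs.

```python
import time


class Unauthorized(Exception):
    """Stand-in for an HTTP 401 from the identity service (hypothetical)."""


def call_with_retry(func, attempts=5, delay=1.0, sleep=time.sleep):
    """Call func(), retrying only while the connection is refused."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except ConnectionRefusedError:
            # Service is offline: wait and try again, up to the limit.
            if attempt == attempts:
                raise
            sleep(delay)
        # Unauthorized is deliberately not caught: it propagates at once,
        # on the assumption that bad credentials will not fix themselves.
```

Supporting lifeless's rotation argument would mean catching `Unauthorized` in the same loop, at the cost of masking a genuinely wrong config for longer.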
lifeless | so, in a SOA, stopping cold and not retrying is a pretty significant increase in management complexity. | 22:44 |
comstud | i could see nova maybe retrying once it's been running | 22:44 |
lifeless | Whats the rationale for making a 401 stop cold | 22:44 |
comstud | but on startup, it should maybe bail | 22:44 |
comstud | yeah, i dunno, but it's all somewhat besides my point | 22:45 |
comstud | generally you bring services online when they are ready to be used | 22:45 |
comstud | you don't want people querying an unconfigured service | 22:45 |
devananda | lifeless: please no (c) in empty files | 22:46 |
devananda | want to test it, but otherwise LGTM | 22:46 |
NobodyCam | devananda: https://blueprints.launchpad.net/ironic/+spec/refactor-password-key-storage | 22:46 |
devananda | NobodyCam: thanks | 22:47 |
NobodyCam | lifeless: in zuuls hands now! | 22:47 |
openstackgerrit | lifeless proposed a change to openstack/ironic: Provide a new ComputeManager for Ironic. https://review.openstack.org/82637 | 22:47 |
NobodyCam | brb mid afternoon walkies | 22:47 |
* devananda rewrites dprince's patch | 22:48 | |
lifeless | comstud: so, I agree about not querying an unconfigured service, but not about nova giving up :) | 22:48 |
lifeless | comstud: keystone is kindof special that much of its configuration is done through the service | 22:48 |
comstud | that's fair | 22:48 |
comstud | well | 22:49 |
lifeless | comstud: so one could argue that its actually configured once it has a service token | 22:49 |
comstud | not sure keystone is special here in OpenStack with that regard | 22:49 |
comstud | my soreness comes from nova wanting to remove nova-manage and make it all API driven | 22:49 |
comstud | because it's apparently difficult to roll out configs. | 22:49 |
comstud | or something. | 22:50 |
comstud | (but easy to query the API) | 22:50 |
lifeless | not sure I see the connect between nova-manage and configs | 22:50 |
lifeless | wasn't nova-manage all DB backdoor access? | 22:50 |
comstud | yeah, sorry it's more "because it's apparently hard to update the DB" | 22:51 |
comstud | yes it was/is | 22:51 |
comstud | There was pushback to adding cells configuration (which is DB driven) to nova-manage | 22:51 |
comstud | I made enough of a case to land it, but still annoyed by the general thought process, myself | 22:52 |
comstud | maybe i'm in the minority | 22:52 |
jroll | devananda: working on this agent startup diagram - how much detail are you looking for? this is what I have right now, it's a little hand-wavey: https://dl.dropboxusercontent.com/u/363486/IPA-Startup-Flow.png | 22:53 |
adam_g | dtantsur, still around? | 22:53 |
devananda | jroll: looking | 22:53 |
jroll | but I think it's a good at-a-glance overview | 22:53 |
devananda | jroll: some numbers or an indication of where to "enter" the flow would help | 22:54 |
devananda | i think i got it, though | 22:54 |
jroll | ok, cool, thanks | 22:54 |
jroll | it'll be on the wiki soon | 22:54 |
devananda | jroll: also, clarification that the response to POST for unknown hw info | 22:54 |
devananda | s/that/what/ | 22:55 |
devananda | otherwise, yea, good start, thanks | 22:55 |
jroll | sure | 22:55 |
NobodyCam | lifeless: in your TripleO / Ironic testing the under cloud did deploy from the seed you weren't getting Stack create failed, status FAILED on the undercloud?? is that correct? | 23:10 |
lifeless | NobodyCam: stack will fail because the wait condition never triggers | 23:11 |
lifeless | NobodyCam: which is due to the bug we just worked through | 23:11 |
NobodyCam | ack | 23:11 |
NobodyCam | :) | 23:11 |
lifeless | NobodyCam: to fix, you need to use the review I put up, add a new setting to nova.conf to set the compute manager, and have undercloud-vm-ironic-source.yaml set that setting | 23:12 |
NobodyCam | :) | 23:13 |
*** dwalleck has joined #openstack-ironic | 23:13 | |
jroll | devananda: https://dl.dropboxusercontent.com/u/363486/IPA-Startup-Flow.png | 23:15 |
devananda | jroll: nice. i would add a logical break above "issue commands" | 23:16 |
jroll | devananda: like | 23:17 |
devananda | but everything is much clearer | 23:17 |
jroll | "some time passes..." | 23:17 |
jroll | ? | 23:17 |
devananda | jroll: something to indicate that a User is driving the API at that point. not the agent | 23:17 |
devananda | it looks like the agent is driving itself in a very round-about awy :) | 23:18 |
devananda | way | 23:18 |
jroll | right right | 23:18 |
jroll | will do | 23:18 |
*** dwalleck has quit IRC | 23:21 | |
openstackgerrit | Devananda van der Veen proposed a change to openstack/ironic: Run ipmi power status less aggressively https://review.openstack.org/82668 | 23:30 |
openstackgerrit | Adam Gandelman proposed a change to openstack/ironic: Pass no arguments to _wait_for_provision_state() https://review.openstack.org/82669 | 23:31 |
adam_g | devananda, ^ something merged friday and broke instance deletion. :| | 23:32 |
devananda | :( | 23:32 |
jroll | adam_g: should check-tripleo-ironic-undercloud-precise be passing, currently? | 23:33 |
devananda | oil-ci-bot ?? | 23:34 |
adam_g | is the nova driver unit test coverage in the ironic tree currently, or buried in nova's history somewhere? | 23:34 |
adam_g | devananda, heh, my firefox LP session was still logged into an old bot account. | 23:34 |
adam_g | i barely use FF these days | 23:34 |
adam_g | jrist, i do not know about tripleo | 23:34 |
adam_g | jroll, ^ | 23:35 |
jroll | ah | 23:35 |
jroll | whose gate jobs are those? | 23:35 |
devananda | adam_g: lol | 23:35 |
devananda | jroll: see #tripleo :) | 23:35 |
adam_g | jroll, that is a separate CI effort focused on tripleo. #tripleo would be able to tell you more | 23:35 |
devananda | adam_g: it should be contained in ironic's tree... let me check | 23:36 |
jroll | got it, thanks | 23:36 |
JayF | This fix to the heartbeat bug (https://bugs.launchpad.net/ironic/+bug/1295874) that was assigned to me in the meeting is awaiting core review: https://review.openstack.org/#/c/82615/ | 23:37 |
devananda | JayF: thanks for the ping | 23:38 |
JayF | devananda: reading through the periodic task code, it looked like they might also need a bugreport filed for not logging or throwing an exception when you give an interval lower than the logic permits (it idles 60s between runs), wdyt? | 23:39 |
JayF | I would think a WARN in the log would be sufficient if it sees an interval lower than DEFAULT_INTERVAL | 23:40 |
devananda | JayF: yea, please follow up on that | 23:40 |
devananda | (because I forgot to) | 23:40 |
devananda | i remember a discussion around the last summit of, hey, we should fix that. but i don't think anyone did | 23:41 |
JayF | yeah it's not even documented in the comments on the decorator that appear to be pulled into some user docs | 23:41 |
JayF | so it needs to throw a warn and get the docs fixed :x | 23:41 |
JayF | I might just file the bug and push the fix to them too if it's easy | 23:41 |
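JayF's suggestion of a WARN when the requested interval is lower than the loop can honor might look roughly like this. It is a sketch under assumptions: the 60 second floor comes from the discussion ("it idles 60s between runs"), while `effective_spacing` and the clamping behavior are hypothetical and not the actual periodic task code.

```python
import logging

LOG = logging.getLogger(__name__)

# Assumed floor, taken from the "idles 60s between runs" remark.
DEFAULT_INTERVAL = 60.0


def effective_spacing(requested):
    """Return the spacing that would really apply, warning if it differs."""
    if requested < DEFAULT_INTERVAL:
        LOG.warning(
            "Requested interval %.1fs is below the %.1fs minimum; "
            "the task will run every %.1fs instead.",
            requested, DEFAULT_INTERVAL, DEFAULT_INTERVAL,
        )
        return DEFAULT_INTERVAL
    return requested
```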
devananda | so in previous systems I've worked on | 23:42 |
devananda | hb_timeout is always ~ 2.5x hb_interval | 23:42 |
JayF | Here's my logic in making it what it is: | 23:42 |
JayF | When we run tests, do we think it's OK for tests to pass consistently if the first heartbeat fails. | 23:42 |
JayF | My gut said no, so I made the timeout reflect that. | 23:43 |
JayF | I agree I would not use these values in a production environment though. | 23:43 |
devananda | heh | 23:43 |
devananda | #sanedefaults | 23:43 |
devananda | I agree with your reasoning. but I think we should have sane production defaults to the extent possible | 23:43 |
devananda | and it's much easier to codify changing these in a test env | 23:43 |
devananda | eg, in the test class setUp, just override the conf | 23:44 |
devananda | we do that a lot already | 23:44 |
russell_h | devananda: 2.5x - 3x is what I've always done too | 23:44 |
JayF | I have no problem with bumping the timeout to 150s, I'll see if I can also fix the tests to use 60/90 | 23:44 |
JayF | Should that be in the same merge req/commit or a separate one? | 23:44 |
devananda | JayF: same one | 23:44 |
devananda | JayF: i'll toss a comment up | 23:44 |
JayF | ty | 23:44 |
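The rule of thumb devananda and russell_h mention (timeout of about 2.5x the heartbeat interval, so 150s for a 60s interval) works out as in the sketch below. The function names are hypothetical and the factor is just the lower end of the 2.5x to 3x range quoted above.

```python
def heartbeat_timeout(interval, factor=2.5):
    """Timeout derived from the heartbeat interval (factor is a rule of thumb)."""
    return interval * factor


def is_alive(last_heartbeat, now, interval):
    """True while the most recent heartbeat is within the derived timeout."""
    return (now - last_heartbeat) <= heartbeat_timeout(interval)
```

With a 60s interval the heartbeats due at 60s and 120s can both be missed before the 150s timeout expires, which is why this default tolerates a failed first heartbeat while a tighter test-only setting would not.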
NobodyCam | hey hey lifeless still about? | 23:53 |
lifeless | yes | 23:53 |
NobodyCam | :) can you push up a quick rebase on https://review.openstack.org/#/c/82637 also it got hit by H803 git commit title ('Provide a new ComputeManager for Ironic.') should not end with period | 23:54 |
NobodyCam | :-p | 23:55 |
openstackgerrit | A change was merged to openstack/ironic: Permit passing SSH keys into the Ironic API https://review.openstack.org/80376 | 23:55 |
NobodyCam | also ^^^ :) | 23:55 |
openstackgerrit | Jim Rollenhagen proposed a change to openstack/ironic: Add Node.instance_info field https://review.openstack.org/79466 | 23:58 |