17:00:08 <TheJulia> #startmeeting ironic
17:00:09 <openstack> Meeting started Mon Feb 12 17:00:08 2018 UTC and is due to finish in 60 minutes. The chair is TheJulia. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:10 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:11 <TheJulia> o/
17:00:12 <dtantsur> o/
17:00:12 <openstack> The meeting name has been set to 'ironic'
17:00:17 <etingof> o/
17:00:19 <mjturek> o/
17:00:21 <jroll> \o
17:00:23 <stendulker> o/
17:00:34 <TheJulia> Our meeting agenda can be found on the wiki, as always!
17:00:37 <hshiina> o/
17:00:39 <TheJulia> #link https://wiki.openstack.org/wiki/Meetings/Ironic
17:01:03 <rloo> o/
17:01:05 <mgoddard_> o/
17:01:07 <TheJulia> #topic Announcements / Reminder
17:01:29 <TheJulia> First off, Thank you dtantsur for your hard work!
17:01:35 <jroll> ++
17:01:43 <dtantsur> :)
17:01:55 <rloo> +++
17:01:57 <dtantsur> and congrats TheJulia for taking this hard work ;)
17:02:00 <etingof> dtantsur++
17:02:05 <jlvillal> o/
17:02:11 <jlvillal> +1 :)
17:02:12 <rloo> +++ congrats!
17:02:16 <TheJulia> I hope to meet everyone's expectations. Please remember that I too am human, as I take on the role of fearless leader.
17:02:22 <stendulker> Congrats Julia !!
17:02:26 <rloo> and THANK YOU TheJulia for volunteering!
17:02:30 <etingof> ++TheJulia
17:02:43 * jlvillal had expectations of an in-human leader :P
17:02:51 <TheJulia> Anyway, time for the remaining annoucements
17:02:59 <TheJulia> jlvillal: only if I get a flying aircraft carrier ;)
17:03:11 <jlvillal> :)
17:03:18 * dtantsur announces that his visa to ireland was finally approved
17:03:36 <TheJulia> The PTG is coming up. Please update the PTG Planning etherpad today and over this week.
17:03:39 <TheJulia> #link https://etherpad.openstack.org/p/ironic-rocky-ptg
17:04:00 * jlvillal thought we had a new core reviewer
17:04:07 <TheJulia> I will break the etherpad up and generate a schedule for Wednesday/Thursday on this coming friday.
17:04:30 <TheJulia> jlvillal: thanks for the reminder!
17:04:46 <TheJulia> #info hshiina is now a member of ironic-core, congrats!
17:04:50 <jroll> \o/
17:04:53 <jlvillal> Nice! :)
17:05:06 <hshiina> thanks, everyone
17:05:13 <stendulker> Congrats hshiina !!
17:05:24 <TheJulia> I think that is about it for annoucements, does anyone else have anything to annouce?
17:05:29 <TheJulia> announce
17:05:30 <rloo> congrats hshiina, welcome to moar reviews! :)
17:05:35 * jlvillal checks time in Japan and sees it is 2:05AM. Yowzer!
17:05:37 <rloo> ptg get together poll: https://doodle.com/poll/d4ff6m9hxg887n9q
17:05:38 <dtantsur> when is queens final?
17:05:56 <TheJulia> #link https://doodle.com/poll/d4ff6m9hxg887n9q
17:06:01 <sambetts> o/
17:06:13 <TheJulia> Please respond to the doodle so we can schedule an evening gathering at the PTG.
17:06:34 <rloo> if you're interested in joining the ptg get-together, please indicate your availability via the poll. i had indicated a feb 16 deadline, but i think i'd like to book a place sooner rather than later, so please sign up by tomorrow. i'll send out email too. thx.
17:06:51 <dtantsur> ++ let's not wait till the last moment with booking
17:06:52 <TheJulia> #info Queens final releases are slated for the week of the 19th-23rd.
17:07:08 <TheJulia> rloo: thanks!
17:07:20 <dtantsur> oh, so one more week to fix all the bugs
17:07:31 * rloo wonder, what bugs? impossible...
17:07:35 <TheJulia> and tests... and gates
17:07:45 <dtantsur> rloo: 3 critical bugs..
17:07:47 <TheJulia> Anyway, we should move on
17:07:58 <dtantsur> move on \o/
17:07:59 <TheJulia> dtantsur: is there a list on the whiteboard?
17:08:01 * dtantsur flies to the woods
17:08:04 <dtantsur> TheJulia: there is
17:08:09 <TheJulia> #topic Review action items from previous meeting
17:09:04 <TheJulia> Looks like our only action item was to review/triage and work on bugs last week.
17:09:56 <TheJulia> I think we can just move on since this week should be the same. Any disagreements?
17:10:21 <dtantsur> +++
17:10:44 <TheJulia> Moving on then!
17:11:12 <TheJulia> #topic Subteam status reports
17:11:16 <TheJulia> #link https://etherpad.openstack.org/p/IronicWhiteBoard
17:11:20 <TheJulia> Starting at Line 202
17:12:40 <rloo> dtantsur: are you going to work on classic driver deprecation this week? (doc needs updating?)
17:12:46 <TheJulia> FYI, for those that don't see it, dtantsur has put the list of critical Queens bugs that need to land and be backported this week under the bugs section.
17:12:49 <dtantsur> rloo: very likely so
17:13:02 <dtantsur> rloo: after solving the API issue we talked about
17:13:10 <rloo> dtantsur: we should ping vendors that need to update their docs wrt classic driver deprecations
17:13:12 * dtantsur finally has a devstack environment to test things
17:13:23 <dtantsur> rloo: well, I did a call on the ML, and at least 2 vendors proposed patches
17:13:34 <rloo> dtantsur: ok good.
17:13:37 * dtantsur hands TheJulia a loooong stick to poke people
17:13:56 <jroll> the critical bugs are at line 215, for anyone else that also can't read today
17:14:04 <rloo> dtantsur: also the TODOs wrt migrating CI to hardware types. who's going to do all those?
17:14:05 <TheJulia> dtantsur: is the end sharpened ?
17:14:19 <dtantsur> TheJulia: just enough to make it annoying
17:14:26 <TheJulia> dtantsur: awesome!
17:14:30 <dtantsur> rloo: I suspect me, unless somebody wants to help
17:14:50 <rloo> just a heads up in case you miss it, traits is almost done but we forgot one thing, it'll need to be backported (L285)
17:15:07 <TheJulia> mgoddard: re: traits, is any further action absolutely required for this release?
17:15:17 * TheJulia looks at 285
17:15:26 <jroll> should just be that one
17:16:08 <TheJulia> https://review.openstack.org/#/c/543461/
17:16:08 <patchbot> patch 543461 - ironic - Validate instance_info.traits against node traits
17:16:20 <rloo> it isn't clear to me what we did wrt routed network support :) L314+
17:16:52 * TheJulia stats a list
17:16:55 <dtantsur> hjensas: mind cleaning it up please ^^^?
17:17:18 <dtantsur> to make it clear what is to be done for queens, what is a follow-up for rocky, etc
17:17:42 <hjensas> o/ Will do.
17:17:55 <rloo> i am deleting the 'split away tempest plugin', meant to do that before this meeting (L423)
17:18:16 <TheJulia> Looks like the the ansible docs need a revision, and we should likely try to land/backport
17:18:23 <dtantsur> oh, before I forgot: TheJulia we need to document creating queens jobs for the tempest plugin in our releasing documentation
17:18:32 <dtantsur> and, well, create queens jobs :)
17:18:50 <mgoddard_> TheJulia: that patch is borderline required IMO, but would be nice to get it in. I have just found a small issue with the nova virt driver that will need fixing
17:19:06 <rloo> TheJulia: wrt bifrost L431 -- that's the bug we're fixing now, right?
17:19:23 <TheJulia> dtantsur: is it done for this cycle?
17:19:39 <dtantsur> TheJulia: nope, just remembered
17:19:48 <dtantsur> I can do it while we talk
17:20:21 <TheJulia> rloo: no, different but not really a big deal, just needs to be gotten to "soon" since it is all the way off in keystoneauth1
17:20:34 <rloo> TheJulia: :-(
17:20:50 <TheJulia> Yeah :\
17:21:10 <TheJulia> Anyway, I think we've looked most everything over for subteams
17:21:18 <TheJulia> Are we ready to move on?
17:21:52 <rloo> + moving on. do we want to continue with these subteam statuses until after PTG, or put on hold until after PTG?
17:22:34 <rloo> although i guess we aren't quite done with queens so maybe continue...
17:22:36 <openstackgerrit> Dmitry Tantsur proposed openstack/ironic-tempest-plugin master: Add jobs for stable/queens https://review.openstack.org/543555
17:22:36 <dtantsur> TheJulia: ^^^
17:22:38 <TheJulia> We ought to hold off for next week, I think next week will all be discussion
17:22:41 <TheJulia> dtantsur: thanks!
17:23:06 <TheJulia> #topic Priorities for this week
17:23:18 <TheJulia> I'm going to remove the list of things from last week that are struck out
17:24:51 <TheJulia> dtantsur: do you think we should explicitly add the list of bugs to the priority list?
17:25:06 <rloo> TheJulia: dtantsur's doc patch for classic drivers dep: https://review.openstack.org/#/c/537959/
17:25:07 <patchbot> patch 537959 - ironic - Switch contributor documentation to hardware types
17:25:17 <dtantsur> TheJulia: probably won't hurt
17:25:29 <TheJulia> dtantsur: can you perform that copy/paste?
17:25:41 <dtantsur> yep
17:25:44 <openstackgerrit> Dmitry Tantsur proposed openstack/ironic master: releasing docs: document stable jobs for the tempest plugin https://review.openstack.org/543558
17:26:48 <TheJulia> I think that works order wise
17:26:52 <TheJulia> thoughts/objections?
17:27:21 <dtantsur> LGTM
17:27:32 <TheJulia> Are we ready to move on?
17:27:39 <rloo> i guess it is implicit that the traits patch is a weekly priority?
17:27:54 <dtantsur> it's in "Required backports"
17:28:04 <rloo> dtantsur: hence 'implicit' :)
17:28:11 <dtantsur> :)
17:29:19 <TheJulia> One thing to keep in mind, if anyone becomes aware of something that must be backported, please raise visibility as soon as possible.
17:29:33 <TheJulia> Time to move on :)
17:29:46 <dtantsur> I think https://review.openstack.org/542214 is nice to have
17:29:46 <patchbot> patch 542214 - ironic-inspector - Only set switch_id in local_link_connection if it ...
17:30:02 <TheJulia> I agree
17:30:35 <TheJulia> #topic Bug Triaging for the week
17:30:54 <TheJulia> Same as last week?
17:31:01 <dtantsur> ++
17:31:29 <TheJulia> #action Everyone to triage/review bugs in preparation for final Queens release.
17:31:47 <TheJulia> Moving on!
17:31:52 <TheJulia> #topic Discussion
17:32:07 <TheJulia> First, and only topic it looks like, is what to do with grenade.
17:32:14 <rloo> fix it :)
17:32:27 <TheJulia> The problem is there is no fixing it as-is...
17:32:47 <rloo> so it seems like grenade framework doesn't work for us/rolling upgrades
17:33:14 <rloo> have we had discussion with the grenade folks in the past about it? cuz we're now trying to continue to hack to get it to work for us
17:33:34 <rloo> and we hack something but then don't follow up. and then something breaks :-(
17:33:56 <TheJulia> Yeah, we can't complete the nova upgrade without a nova patch in-place to handle version negotiation either.
17:34:05 <jroll> what's the latest problem with grenade?
17:34:06 <rloo> which doesn't mean that we shouldn't hack something now but ...
17:34:34 <TheJulia> jroll: tl;dr sqlalchemy gets upgraded, and old nova is incompatible with newer sqlalchemy
17:34:40 <TheJulia> *boom*
17:35:02 <rloo> and in our rolling upgrades scenario, we don't upgrade nova, just ironic
17:35:11 * jroll thinks he needs more time than we have to fully understand the thing
17:35:23 <rloo> cuz the order of upgrading is ironic first, then nova
17:35:27 <jroll> are we back to the segfault problem, is my actual question
17:35:32 <TheJulia> And since we don't upgrade ironic-api either, we can't actually upgrade nova
17:35:41 <TheJulia> jroll: We are! :)
17:35:55 <jroll> TheJulia: that seems to me like a critical bug to be fixed, likely by the nova team
17:36:08 * jroll recalls dansmith saying similar, and then the bug disappeared for a while
17:36:12 <TheJulia> A critical bug in Pike?
17:36:19 <jroll> yes
17:36:26 <rloo> but should old s/w be expected to work with new packages?
17:36:49 <jroll> running software should not be expected to segfault after an apt-get upgrade.
17:37:00 <jroll> ever, that's a bug, flat-out.
17:37:06 <rloo> jroll: ok, in that case, it is a nova bug.
17:37:39 <dtantsur> we don't upper-cap sqlalchemy in requirements, so we're expected to work with newer versions
17:38:11 <dansmith> rloo: a segv after a package upgrade would be a bug in some library
17:38:13 <jroll> I'm totally open to a conversation about whether grenade is the right tool for the job here, but it seems to me we've been doing a lot to hack around this bug, and then complaining that grenade makes those hacks hard :)
17:38:16 <TheJulia> So then it is a nova bug
17:38:25 <dansmith> rloo: there should be nothing you can do from python land to segv yourself
17:38:42 * jroll isn't sure it's a nova bug, but it's a bug with how nova interacts with the system, yes
17:38:45 <rloo> dansmith: that is good to know!
17:39:00 <dansmith> jroll: you might even argue that grenade is the right tool since it's poking something that needs fixing :)
17:39:10 <TheJulia> jroll: I think at the same time, we have an unrealistic scenario that we're executing with grenade
17:39:22 <jroll> dansmith: yeah, I should have finished with "so that's a separate conversation" :)
17:39:27 <dansmith> jroll: aye :)
17:39:35 <dansmith> TheJulia: what's unrealistic about it?
17:39:51 <dansmith> aside from the fact that nobody would deploy any of this from devstack anyway
17:39:57 <TheJulia> dansmith: Upgrade everything but nova on the same machine without isolation of underlying shared packages
17:40:08 <TheJulia> which we do because we can't run newer nova with older ironic
17:40:12 <dansmith> TheJulia: I don't think that's unrealistic
17:40:36 <dansmith> it's unideal for sure,
17:40:54 <dansmith> but if the package versions don't prohibit it, I think people would expect that should work
17:41:18 <jroll> would and do, unfortunately
17:41:25 <dansmith> right
17:41:41 <TheJulia> so will we actually get traction for nova to fix it in stable/pike?
17:41:55 <TheJulia> Well, for a fix to land
17:42:09 <dansmith> if there's something nova has to do, then sure, but I can't imagine what that is
17:42:48 <rloo> i think we may need to work with nova to help pinpoint where/how it is failing... seems like if we take ironic out of the picture, nova should still segv?
17:42:49 <TheJulia> If dtantsur's assertion that projects must be compatible with future sqlalchemy versions, then there is an extra kwarg that needs to be removed that is currently ignored I believe
17:43:21 <TheJulia> if the underlying bytecode is removed that the python runtime is using, does it recompile the bytecode?
17:43:33 <jroll> dansmith: I think it's less that nova needs to do something and more us begging for help because we've cumulatively put hundreds of people-hours into trying to track this down and/or fix it :(
17:43:45 <dansmith> jroll: I hear ya
17:44:47 <dtantsur> TheJulia: IIRC yes
17:44:48 <dansmith> TheJulia: that shouldn't cause an segv, otherwise that'd be a python bug
17:44:54 <rloo> doesn't nova have a rolling upgrades/grenade job? I'd think it would have barfed there too?
17:45:06 <dansmith> rloo: several of them yeah
17:45:20 <jroll> rloo: nova-conductor is upgraded in that job (which is the service that is segfaulting)
17:45:40 <jroll> that's why it isn't seen there
17:45:41 <TheJulia> but do those upgrades not actually upgrade nova?
17:45:44 <rloo> jroll: really, i thought in our job, we didn't upgrade nova. let's take it offline
17:46:17 <jroll> rloo: correct, we do not upgrade nova. nova's grenade jobs do. nova-conductor only breaks when not upgraded.
17:46:18 <dansmith> TheJulia: we upgrade pieces of nova in the partial job, but conductor always gets upgraded (i.e. restarted(
17:46:25 <TheJulia> dansmith: ok
17:46:43 <rloo> jroll: ah got it. so can we change their test to not upgrade and see if it barfs?
17:46:57 <dansmith> rloo: no, the whole point of our grenade test is to upgrade conductor :)
17:47:01 <TheJulia> The take-away I'm getting is we don't try and change the grenade scenario, that we hunt down and try and fix the root cause of the segfault?
17:47:24 <dansmith> has anyone tried to reproduce this locally?
17:47:39 <jroll> TheJulia: that's my opinion, yes
17:47:42 <rloo> TheJulia: yup, we should fix root cause
17:47:46 <dansmith> because doing that would let us get core files more easily and dig into what was going on when the segv is triggered
17:47:50 <TheJulia> dansmith: I'm fairly sure I did so last week
17:47:52 * jroll has not tried locally
17:48:08 <dansmith> um okay :)
17:48:18 <TheJulia> I wiped the machine out though
17:48:27 <dansmith> TheJulia: does that mean you're fairly sure you reproduced it? or fairly sure you tried?
17:48:39 * rloo wonders why the segv appeared, then disappeared, then appeared again...
17:48:52 <dansmith> rloo: that's usually the nature of such things
17:49:04 <TheJulia> dansmith: I really don't remember at this point :(
17:49:11 <rloo> dansmith: that explains it then!
17:49:22 <TheJulia> I think that I did, but last week was a blur
17:49:24 <dansmith> they can be deterministic, but often not, due to ordering and timing
17:50:00 <rloo> dansmith: so it might be hard to reproduce. great.
17:50:43 <rloo> although zuul is having great luck reproducing
17:51:19 <TheJulia> unless it is breaking updated bytecode that is causing the segfault.. I seem to remember the first time we ran into this we got some lsofs out of a running system where the conductor was crashing and we had some sqlalchemy files open but not all...
17:51:59 <TheJulia> I'll continue to work on it this week, but with the constraint of not changing the job or scenario
17:52:01 <dansmith> AFAIK, python only opens those files whilst loading them the first time, not continually
17:52:25 <dansmith> and I don't think it ever purges them and has to reload them
17:52:43 <jroll> I would imagine it's more about some shared library underneath getting upgraded
17:52:47 <dansmith> yes
17:52:51 <dansmith> I would bet on it
17:53:06 <TheJulia> The case is the same for shared libraries
17:53:15 <TheJulia> Open file handler don't change
17:53:32 <dansmith> TheJulia: but shared libraries can be opened and closed,
17:53:33 <TheJulia> it would have ot be opening a new file/library/thing that is often accessed
17:53:39 <jroll> even when the process is forked?
17:53:51 <dansmith> jroll: yes if it's just a fork
17:54:11 <TheJulia> We're running out of time today
17:54:19 <TheJulia> rloo: It doesn't look like we're goin gto get to RFEs at this point
17:54:28 <rloo> no worries
17:54:39 <rloo> i might poke people about them later. or not :)
17:54:50 <TheJulia> rloo: I believe that is reasonable
17:55:33 <TheJulia> #action TheJulia to try and reproduce the fun grenade crash situation locally and use that to try and collect data
17:56:01 <TheJulia> Since we have only 4 minutes left, does anyone have anything else that needs to be discussed today?
17:56:52 * TheJulia queries crickets as a service
17:57:12 * jroll has nothing
17:57:18 <rloo> crickets
17:57:27 * dtantsur too
17:57:28 <TheJulia> Okay, thanks everyone!
17:57:32 <TheJulia> Have a wonderful week!
17:57:35 <dtantsur> thanks TheJulia and congrats again
17:57:43 <TheJulia> #endmeeting
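
Note on the 17:43 and 17:52 exchange about whether Python goes back to module files on disk once they are loaded: the following is a minimal sketch, not something run or shown in the meeting. It assumes a throwaway pure-Python module (the name `demo_mod` and the helper `load_then_delete` are hypothetical) and only illustrates that an already-imported module keeps working from memory after its source and cached bytecode are deleted. It says nothing about C extensions or shared libraries, which is where jroll and dansmith suspected the actual crash.

```python
import importlib
import os
import shutil
import sys
import tempfile


def load_then_delete():
    """Import a throwaway module, delete its files, then call it again.

    'demo_mod' is a hypothetical module created only for this sketch.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo_mod.py")
        with open(path, "w") as f:
            f.write("def answer():\n    return 42\n")

        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            mod = importlib.import_module("demo_mod")
            before = mod.answer()

            # Remove both the source file and any cached bytecode on disk.
            os.remove(path)
            shutil.rmtree(os.path.join(tmp, "__pycache__"), ignore_errors=True)

            # The code object already lives in memory, so the call still works.
            after = mod.answer()
            still_cached = "demo_mod" in sys.modules
        finally:
            # Clean up so repeated runs start from a fresh state.
            sys.path.remove(tmp)
            sys.modules.pop("demo_mod", None)
    return before, after, still_cached


if __name__ == "__main__":
    print(load_then_delete())
```

A module that has not yet been imported when its files are replaced is a different case: it would be loaded from whatever is on disk at import time, which is one way a long-running, non-restarted process could end up mixing old and new code.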