14:00:41 #startmeeting nova
14:00:42 Meeting started Thu Apr 2 14:00:41 2015 UTC and is due to finish in 60 minutes. The chair is johnthetubaguy. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:43 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:45 The meeting name has been set to 'nova'
14:00:49 o/
14:00:49 o/
14:00:50 hi
14:00:51 o/
14:00:51 \o
14:00:51 o/
14:00:52 o/
14:00:53 o/
14:00:56 o/
14:00:56 o/
14:00:59 \o
14:01:04 o/
14:01:06 #topic Kilo Release Status
14:01:12 hello hello
14:01:34 so its the final burn down to RC
14:01:51 basically lets merge lots of bug fixes and test kilo as much as we can
14:02:12 lets leave the bug talk to the bug section
14:02:17 #topic Gate status
14:02:30 stable was busted but that's fixed now
14:02:35 there was a gate fix around policy issues, but anything else cropping up?
14:02:51 cool, fixed now is better than not fixed
14:02:54 there was a bad trace with n-net and deallocate after some tempest changes but that's fixed and i'll backport to stable today
14:03:04 i've added links to some non-voting jobs
14:03:04 o/
14:03:06 o/
14:03:09 mriedem: anything you need people to jump on
14:03:16 the pci bug with compute not being able to start has a fix re-posted and it looks better
14:03:17 yes
14:03:17 aiocpu (multi-node devstack job that runs live migration testing) fails a lot, see bug https://bugs.launchpad.net/nova/+bug/1438803
14:03:18 Launchpad bug 1438803 in OpenStack Compute (nova) "libvirt error libvirtError: Requested operation is not valid: nwfilter is in use hit in _post_live_migration" [Undecided,In progress] - Assigned to Matt Riedemann (mriedem)
14:03:31 https://review.openstack.org/#/c/131321/ <<== this
14:03:33 i have a patch up for ^ but it's hacky and i haven't seen it demonstrate that it hits the flow
14:03:44 we are talking gate bugs here
14:03:47 we have the cells job that is going to be fixed soon
14:03:56 ah ok
14:04:03 johnthetubaguy: i realize, but these are also jobs running on nova code
14:04:08 so if they don't work, it means nova doesn't work
14:04:11 but let's discuss the cells job in the bugs section
14:04:14 sorry bout that
14:04:21 mriedem: totally
14:04:31 oh, nvm :)
14:04:34 i got defensive for nothing
14:04:41 moving on - ceph job: ceph non-voting job fails hard on test_volume_boot_pattern: https://bugs.launchpad.net/cinder/+bug/1439371
14:04:42 Launchpad bug 1439371 in Cinder "Volume creation from image fails for UEC+Ceph" [Undecided,New] - Assigned to Jon Bernard (jbernard)
14:04:55 dansmith: and jbernard looked at that yesterday, looks like it's in cinder's court now?
14:05:09 yeah, although I doubt there will be a fix for kilo
14:05:14 ouch
14:05:15 jbernard is going to try
14:05:18 mriedem: not got priorities on these, what do you want to set them as?
14:05:20 but I think it's probably not going to happen
14:05:25 however, it's not a huge deal, IMHO
14:05:33 it's only ceph with UEC images AND boot from volume
14:05:39 easy to release note that and fix it
14:05:48 ah, OK, so we can probably mark that with the tag: https://launchpad.net/nova/+bugs?field.tag=kilo-rc-potential
14:05:54 I am working on a change to the gate to use the disk image for ceph jobs
14:05:58 for the live migration one, i don't know if that's an rc blocker or not
14:05:59 so we can get actual runs again
14:06:19 mriedem: just thinking we should triage the bug, thats all really
14:06:28 if i can get a run with my workaround retry patch for live migration and show it's hitting the retry loop, we could merge that as a workaround for kilo
14:06:39 mriedem: on livemigration it's something we only recently got testing, so I'm not surprised we're exposing issues
14:06:42 johnthetubaguy: well, we were told it's due to super old libvirt/qemu
14:06:47 mostly we should consider them for backporting
14:07:05 sdague: yeah, thats a good point
14:07:11 if they're not regressions, they're less important I think
14:07:12 yeah, like i said, not an rc blocker imo
14:07:29 +1 to all those comments
14:07:31 the other thing i had listed was the cells job, but sounds like bauzas was going to talk about that later
14:07:42 I trust mriedem to make that clear on the bug
14:07:48 cools…
14:07:48 mriedem: well, all the related bugs are RC1 related
14:08:12 but we can give a status here
14:08:15 #topic Bugs
14:08:24 so before we start… a quick process reminder
14:08:41 we can release RC1 once all the bugs listed here are merged:
14:08:54 https://launchpad.net/nova/+milestone/kilo-rc1
14:09:00 …so
14:09:07 rc1 is sort of scheduled for 4/9 right?
14:09:10 a week from now
14:09:16 please only target bugs to rc1 if you think we should block the release on that
14:09:21 or is 4/9 just when the project / release managers get really antsy?
14:09:34 mriedem: yes, https://wiki.openstack.org/wiki/Kilo_Release_Schedule
14:09:43 4/9 is the last possible day
14:09:43 well block seems strong here
14:09:52 mriedem: question was answered by mikal saying "the sooner is the better"
14:10:02 ideally before 4/9 IIUC
14:10:04 yeh, earlier is better
14:10:09 yeah, we can also do rc2 if needed
14:10:10 well, we don't want to cut RC1 too early, so we force RC2, but yeah
14:10:20 anyways, so thats the exit criteria
14:10:28 we have the importance - so I would imagine we target what we think is reasonable to land
14:10:29 all the RC1 targeted bugs merged
14:10:43 so if you target a bug, you are saying "please block the release on this"
14:10:48 now we also have a tag
14:10:56 if you want to say, please consider blocking on this one
14:10:59 then use this:
14:11:07 https://launchpad.net/nova/+bugs?field.tag=kilo-rc-potential
14:11:18 OK, so does that all sound like people expected?
14:11:35 its the same as juno, as far as I remember
14:11:37 yes mostly
14:11:40 johnthetubaguy: yep
14:11:48 cools
14:12:04 #info targeting to kilo-rc1 means please block the kilo release on my bug
14:12:12 lets talk bugs
14:12:25 there was a cells one?
14:13:06 yup
14:13:11 so, just a quick status
14:13:39 chain starting with https://review.openstack.org/#/c/168294/3 is marked as RC1 and aims to reduce the failures down to 3
14:13:57 bauzas: and the last 3 are going to be whitelisted out right?
14:14:05 and https://review.openstack.org/#/c/166396/ is whitelisting those 3
14:14:22 with a Depends-On tag on the last patch from the series
14:14:59 yeh, I'm not sure why we do depends on there
14:15:02 all the patches but one are good to review, I should provide a last update for https://review.openstack.org/#/c/169400/3 in the next 2 hours
14:15:23 sdague: if we did depends the other way around we should see the cells test pass I guess?
14:15:29 sdague: because it will remove from the whitelist the failures that are fixed by the series
14:15:42 johnthetubaguy: no we wouldn't
14:16:02 OK, I guess we can't depend on that repo
14:16:02 bauzas: we can do it offline, but it's not needed here, I'm going to delete it
14:16:10 OK, cool
14:16:14 sdague: okay, let's discuss offline, sure
14:16:30 any more on those? they look good but not quite blockers I guess
14:16:54 johnthetubaguy: we actually would like to see the cells job green by Kilo hence the RC1 tag
14:17:13 if not, it would require backports
14:17:28 bauzas: OK
14:17:37 bauzas: if so, you need to respin patches faster when alaski -1s them :)
14:17:52 sdague: yey, I know...
14:17:53 any more for this before we go onto the open discussion bits?
14:18:11 bauzas: I can help if needed
14:18:23 honestly, I don't think the cells patches should hold up rc, because they are iterating too slow
14:18:53 if it comes to it I agree, but I would like to see them in
14:18:57 sdague: I am with you, we can remove those when we get closer
14:18:58 sdague: eh, it was -1'd yesterday evening my time
14:18:58 I can help iterate too
14:19:05 not holding up rc makes sense
14:19:12 continuing to merge until the line does as well, IMHO
14:19:24 dansmith: ++
14:19:37 they *have* made amazing progress in a short time,
14:19:44 +1
14:19:45 progress that we've been looking for for years :)
14:20:05 …. to be clear
14:20:24 we can merge any non-violating bug fixes until we try to cut RC1
14:20:50 so no need to target a bug to be able to merge it right now
14:20:58 we are trusting all cores to do the right thing here
14:21:01 okay, lemme try to fix the -1 in that hour, that's doable
14:21:01 anyways, moving on
14:21:13 #topic open discussion
14:21:32 so skipping stuck reviews
14:21:36 as there are none in the agenda
14:21:41 lets go through the agenda here
14:22:04 #link http://lists.openstack.org/pipermail/openstack-dev/2015-April/060360.html
14:22:10 systemz fun
14:22:11 markus_z: ^?
14:22:22 yepp, that's me
14:22:24 aren't we using fakelibvirt in the tests?
14:22:27 we should be
14:22:36 are they not running them in tox or something?
14:22:45 dansmith: yeh, I thought so, it might be falling back through under some cases?
14:22:53 johnthetubaguy: I just added an item for the Bugs section in the agenda just before the meeting
14:22:56 (my bad)
14:22:58 are we requiring zkvm CI to run unit tests?
14:22:59 I didn't think it ... would.
14:23:00 i thought it was just tempest?
14:23:12 generally that's the case, yeah
14:23:16 we don't require other virt drivers to run unit tests
14:23:21 *CI i mean
14:23:27 right
14:23:42 saying the unit tests pass on platform X (or Z heh) doesn't mean much, IMHO
14:23:46 just a waste of resources I think
14:23:49 yeah
14:23:56 markus_z: so don't run unit tests on z :)
14:23:59 focus on tempest
14:24:00 dansmith: +1
14:24:01 problem solved!
14:24:03 Thanks, that was easy :)
14:24:05 heh
14:24:06 cools
14:24:25 so, hypervisor support matrix
14:24:30 also z
14:24:38 That's also from my side.
14:24:48 markus_z: for the support matrix, i think you just need to get a patch up and then it'll be iterated in review
14:24:55 right
14:25:02 Is the CI a precondition?
14:25:05 if something is partial, i think there is a place to add notes
14:25:07 no
14:25:13 well, i assume not
14:25:17 zkvm blueprint is merged and in tree
14:25:19 Some items are unclear to me, what do I do with them?
14:25:22 markus_z: CI is more about A vs C category
14:25:32 johnthetubaguy: Ah, ok
14:25:41 markus_z: maybe put TBD in the review and then -W it
14:25:42 yeah
14:25:47 then iterate in review
14:25:55 TBD in the table cell i mean for the item in question
14:25:56 yeah, lets get the known ticks and known crosses
14:25:58 johnthetubaguy: does that mean any hypervisor can submit it without CI? code in stackforge is also ok?
14:26:04 jichen: no
14:26:14 powervc and zvm in stackforge don't apply here, or nova-docker :)
14:26:19 zkvm is in tree in the libvirt driver
14:26:20 right
14:26:22 jichen: we are talking about in three drivers here
14:26:29 mriedem: ok, got it
14:26:30 s/three/tree/
14:26:40 johnthetubaguy: ok, thanks
14:26:41 jichen: you should probably have your own support matrix for zvm in the stackforge repo though
14:26:51 markus_z: have we unblocked you now?
14:27:05 markus_z: I guess just get the review up, and we can talk more in gerrit
14:27:06 Not yet, sorry.
14:27:26 np, whats the next question?
14:27:28 Some of the items themselves are unclear. Even in the description there is "something something, dark side".
14:27:34 That's what I meant
14:27:52 markus_z: let's talk about them off the meeting
14:28:00 assuming you're talking about the matrix...
14:28:01 markus_z: right, lets talk about that offline
14:28:06 dansmith: yes I do
14:28:12 cool
14:28:13 just catch us in #openstack-nova as normal
14:28:16 cools
14:28:19 OK, pushing it next week and talk in review
14:28:26 so a message from mikal…
14:28:26 no family guy jokes here!
14:28:37 okidoki
14:28:43 #link http://lists.openstack.org/pipermail/openstack-dev/2015-April/060360.html
14:28:52 arg
14:28:59 #link https://etherpad.openstack.org/p/liberty-nova-summit-ideas
14:29:03 thats what I meant
14:29:07 the flood gates are open
14:29:23 well, you can probably throw nova-net -> neutron migration on there
14:29:25 and evacuate
14:29:26 but I would expect to be requested to have a spec up for review, if you want to discuss your feature at the summit
14:30:05 johnthetubaguy: how many fishbowl sessions ?
14:30:09 johnthetubaguy: 2 days ?
14:30:11 I think the deadline for sessions will be announced post election
14:30:15 bauzas: I think so
14:30:26 fwiw, i hope everything is scheduled similar to paris
14:30:27 mriedem: evacuate?
14:30:31 same as last time, is what I remember we agreed
14:30:37 with 2 back to back sessions for hairy issues
14:30:42 right
14:30:44 cfriesen: yeah, as in "effing fix it"
14:30:45 ok, then the contrib meetup, fair
14:31:04 and a thing thurs afternoon to decide release priorities
14:31:06 yeh, that worked well last time
14:31:12 agreed
14:31:15 then the orgy on friday
14:31:16 :)
14:31:32 * dansmith bites his tongue
14:31:43 yes...
14:31:52 i'll throw the evacuate wip spec in here
14:31:52 so the etherpad is the source for the sessions ? no longer the website ?
14:31:55 since it was controversial
14:31:57 bauzas: you had something to discuss?
14:32:04 bauzas: I believe so, just like paris
14:32:07 johnthetubaguy: ok
14:32:13 mriedem: I don't think the spec is controversial
14:32:20 johnthetubaguy: yeah, just re: http://lists.openstack.org/pipermail/openstack-dev/2015-April/060448.html
14:32:20 dansmith: not the spec
14:32:24 mriedem: the "do we do anything in the meantime" was the controversial bit
14:32:25 okay
14:32:36 the thing that led to the spec
14:32:49 well, I think that's sailed now
14:33:22 in a sea of f bombs
14:33:22 yes
14:33:29 heh
14:33:40 we could add a log message saying not to do it
14:33:43 * johnthetubaguy ducks
14:34:06 not to do evacuate?
14:34:07 bauzas: so you had something about that thread?
14:34:26 seems like path forward on quotas might be useful as well
14:34:28 johnthetubaguy: I was just thinking we should maybe discuss with the bug monkeys to help check it
14:34:33 LOG.warning('Oh jeez, you just effed up your system')
14:34:34 because that's a ball of mud
14:34:34 sdague: quotas is in the etherpad
14:34:37 dansmith: i was kinda joking, but yeah
14:34:38 sdague: yeah
14:34:49 I looked and Launchpad is buggy
14:34:50 bauzas: I liked James Bottomley's suggestion of having it automatically change state when more info provided
14:34:58 mriedem: oh I missed it
14:35:09 cfriesen: Launchpad doesn't do this
14:35:16 sdague: +1 on fixing quotas, using compare and swap in the DB was the biggest lead I heard, but yeah
14:35:30 cfriesen: that assumes all kinds of things about launchpad without knowing any of the launchpad limitations
14:35:44 cfriesen: and it's buggy because if you check 'Incomplete with response', it will give you bugs that have replies, but not only ones from the owner
14:35:46 it was an exceptionally unproductive response honestly
14:35:48 s/owner/reporter
14:36:18 http://goo.gl/YPMUf3 is the list of incomplete bugs having replies
14:36:22 49 IIRC
14:36:40 and most of them have replies, but not from the reporter, just as follow-up messages
14:37:02 so we would need to iterate over them
14:37:16 as said, it needs to be done manually until someone lazy enough automates it
14:37:38 I was thinking that the trivial bug monkeys were a good gang for helping that
14:37:52 yeh, I did go through a couple hundred incomplete bugs yesterday, closed a lot of them, it's just a lot of manual work
14:38:16 sdague: I tried during lunch to play with launchpadlib, that's doable but risky
14:38:35 bauzas: I don't trust launchpadlib
14:38:42 sdague: agreed
14:38:48 given how often launchpad times out rest calls
14:38:49 sdague: as I said, the report is wrong
14:38:52 yep
14:39:02 Oh, good, /me was about to play with launchpad lib
14:39:18 kashyap: that's fine but that's really buggy
14:39:23 Also, auto-closing some bugs can close off valid bugs as ttx noted :-(
14:39:43 kashyap: ^ hence my point, we need to do this by hand :(
14:39:49 But, I think given the volume, I think this auto-expiration needs to be in place I guess
14:39:58 manually is a valid option.
14:40:11 garyk: only if I'm not the only one doing it :P
14:40:11 the rub is, we need people to do this work
14:40:19 johnthetubaguy: exactly
14:40:21 sdague: +1
14:40:23 so auto expire is back on
14:40:24 johnthetubaguy: Yes, and it's utterly unthankful
14:40:24 sdague: you should certainly not be the one doing it.
14:40:33 If you end up expiring manually all bugs after a given abandon time, you should just autoexpire and be done with them :)
14:40:37 kashyap: agreed
14:40:37 it should be the onus of people who are responsible for tagged bugs
14:40:46 sdague: that's why I think we should call the bug monkeys to look at them
14:40:48 garyk: I clearly am, there were 200+ incomplete bugs when I looked yesterday
14:40:55 My point on the ML was that the tool is not really helping having a conversation with the reporter anyway
14:41:17 Also, hard as it might be to digest, sometimes some bugs will take a year or so to get to a proper resolution.
14:41:22 sdague: so I think auto abandon on is the right choice
14:41:28 I've seen such bugs in other community projects I participate in.
14:41:30 johnthetubaguy: agreed
14:41:34 sdague: i know. but you should not be responsible for doing this by yourself.
14:41:54 it is our responsibility so we should get our act together
14:41:56 ttx: that's a tool issue, we should have a bug system that should put back the ticket in another state if the reporter is replying
14:42:19 at least if he's ticking that he answers the question
14:42:46 #help we need more folks to help with bug triage
14:42:56 OK, I don't see lots of us stepping up to help here
14:43:01 thats what we need
14:43:05 bauzas: I don't think you are helping support getting someone to do the work by using the phrase bug monkeys
14:43:12 maybe need some kind of bounty
14:43:25 anteaya: just because it worked for trivial bugs
14:43:29 but I think this conversation has run its course a bit...
14:43:38 johnthetubaguy: agreed, move on
14:43:43 anything more before we move back over to #openstack-nova ?
14:43:48 anteaya: https://etherpad.openstack.org/p/kilo-nova-priorities-tracking L148 and below
14:43:51 end it!
14:43:58 mriedem: agreed
14:43:59 fair enough
14:44:01 thanks all
14:44:19 happy bug triage and fixing
14:44:23 #endmeeting
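
The manual check discussed under open discussion (an Incomplete bug only needs another look when the reporter, not just any commenter, has replied since it was marked Incomplete) can be sketched in plain Python. This is a minimal illustration under stated assumptions: the Bug and Comment classes and the reporter_replied and needs_retriage helpers are hypothetical names, the data model is invented for the example, and nothing here calls Launchpad or launchpadlib.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Comment:
    # Hypothetical record of one bug comment: who wrote it and when.
    author: str
    posted: datetime


@dataclass
class Bug:
    # Hypothetical record of one Incomplete bug (not a Launchpad object).
    bug_id: int
    reporter: str
    marked_incomplete: datetime
    comments: List[Comment] = field(default_factory=list)


def reporter_replied(bug: Bug) -> bool:
    """True if the reporter commented after the bug was marked Incomplete."""
    return any(
        c.author == bug.reporter and c.posted > bug.marked_incomplete
        for c in bug.comments
    )


def needs_retriage(bugs: List[Bug]) -> List[int]:
    """IDs of bugs where the reporter (not a third party) has responded."""
    return [b.bug_id for b in bugs if reporter_replied(b)]
```

A follow-up comment from someone other than the reporter, or a reporter comment older than the Incomplete marker, leaves the bug out of the list, which is the distinction the 'Incomplete with response' report was said to miss.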