18:02:27 #startmeeting cue
18:02:28 Meeting started Mon Jun 29 18:02:27 2015 UTC and is due to finish in 60 minutes. The chair is sputnik13. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:02:29 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:02:32 The meeting name has been set to 'cue'
18:02:38 roll call
18:02:44 1 o/
18:03:03 2 o/
18:03:03 2
18:03:06 3
18:03:46 o/
18:04:13 ok we have quorum, I assume vipul is otherwise occupied :)
18:04:24 let's start with action items
18:04:31 o/
18:04:35 i'll be in and out
18:04:45 #link http://eavesdrop.openstack.org/meetings/cue/2015/cue.2015-06-22-18.00.html
18:05:12 #note AI #1 esmute__ and davideagnello to fix the rally gate job and get rally tests running
18:05:26 esmute__ and davideagnello can you give us an update
18:06:17 I have made a necessary rally directory and file structure change needed for Rally to run in the CI gate
18:06:18 davideagnello had a good handle on it so he did the work on this
18:06:23 this was checked in
18:06:56 is the rally test running now?
18:07:20 there are currently two patches outstanding; following up on getting those merged. Sergey created them to resolve getting Devstack installed in the prehook before rally tests start running
18:07:33 davideagnello: can you link these patches?
18:08:03 I saw the rally test running in CI once with my patch but it was because of a race condition which these patches will resolve:
18:08:10 https://review.openstack.org/#/c/195991/
18:08:18 https://review.openstack.org/#/c/195992/
18:08:22 #link https://review.openstack.org/#/c/195991/
18:08:29 #link https://review.openstack.org/#/c/195992/
18:09:21 once these are in, will be testing our rally gate job
18:09:50 ok
18:10:03 can you follow up on this for next week's meeting?
18:10:32 #action davideagnello to follow up on merge of https://review.openstack.org/#/c/195991/ and https://review.openstack.org/#/c/195992/
18:11:03 next item...
18:11:10 #note AI #2 vipuls to link Josh's patch with correct bug number
18:11:32 doh
18:11:36 sputnik13: I may have merged it :P
18:11:38 #info AI #2 vipuls to link Josh's patch with correct bug number
18:11:48 i'll update the bug
18:13:01 ok
18:13:06 next item...
18:13:18 #info AI #3 sputnik13 to fill in details for https://blueprints.launchpad.net/cue/+spec/kafka
18:13:28 right, not done, will work on it this week :)
18:13:35 #action sputnik13 to fill in details for https://blueprints.launchpad.net/cue/+spec/kafka
18:13:44 kicking the can down the road :)
18:14:03 that's it for actions from last week
18:14:18 I think there's something that's not on the action item list, the tempest gate is not voting yet
18:14:37 esmute__ do you have the patch that makes this voting?
18:14:38 I put up a patch last week to make it voting
18:14:44 and run on gate
18:14:56 can you link the patch
18:14:57 #link https://review.openstack.org/#/c/194324/
18:15:01 you need one more +2
18:15:17 I guess this should be a new #topic
18:15:27 #topic make tempest gate voting
18:15:42 If i dont get a +2 in the next day or two, ill go and pester them
18:15:52 #info patch submitted to make tempest gate voting
18:15:53 #link https://review.openstack.org/#/c/194324/
18:16:11 #action esmute__ to follow up with patch and ensure it gets merged
18:16:22 just so we remember to follow up next week :)
18:17:03 any other topics to discuss before we go on to bugs?
18:17:22 oh, we're official official now
18:18:09 #topic Cue is Openstack!!
18:18:12 Woo! Celebration
18:18:34 woo
18:19:02 awesome!
18:19:04 #info Cue was accepted by TC as an Openstack project 2015/06/23 and merged 2015/06/25
18:19:11 #link https://review.openstack.org/#/c/191173/
18:19:13 #celebrate
18:19:15 :-D
18:20:08 w00t
18:20:10 hopefully we see more contributors going forward :)
18:20:33 would be great for the team to be bigger by the next summit
18:20:42 #link https://review.openstack.org/#/c/196268/
18:20:45 so we can do mid-cycles and whatever else
18:20:46 :)
18:20:51 patch to move our stuff to openstack
18:20:57 yay
18:21:09 what does that mean for outstanding patches?
18:21:21 https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Upcoming_Project_Renames
18:21:49 do patches get moved to the new namespaces?
18:22:00 or do we have to make sure there are no outstanding patches?
18:22:04 I doubt it, we may need to rebase and resubmit
18:22:16 ic
18:22:17 outstanding patches are fine.. there just might be some manual labor needed to move over
18:22:29 ok
18:22:37 good news is we don't have a lot of outstanding patches :)
18:22:44 is that good news :P
18:22:59 yes that means cores are doing their jobs and getting patches merged!
18:23:05 fair enough ;)
18:23:13 that's my story and I'm sticking to it ;)
18:23:44 is the 2x +2 to +wf a general openstack thing or is that at project discretion?
18:24:10 it's followed by most openstack projects.. but no hard rule
18:24:28 We need to transition to that
18:25:06 ok, in due time
18:25:44 other topics? or are we ready for some bug squashing? :)
18:26:55 we need to get better at positive confirmation to move on in meetings
18:27:06 * sputnik13 pokes dkalleg esmute__ abitha davideagnello
18:27:23 hello~
18:27:28 No other topics from me
18:27:47 nope
18:27:53 yes will do that
18:28:01 ok moving on
18:28:06 #topic Bug Squash
18:28:33 new bug: #link https://bugs.launchpad.net/cue/+bug/1469823
18:28:33 Launchpad bug 1469823 in Cue "Delete Cluster fails when Cluster contains VM's which have already been deleted" [High,New]
18:28:40 #link http://bit.ly/1MChDwJ
18:29:11 I should just put that in the topic from now on rather than as separate links
18:29:12 meh
18:29:18 davideagnello: how is this reproduced?
18:29:25 Does it happen every time a cluster is deleted?
18:29:50 create a cluster of n nodes with VM capacity of n-1
18:30:45 or create a cluster, then delete one of the clustered VMs when the rabbit check is taking place, this will trigger a rollback
18:31:00 then issue a cluster delete
18:31:10 do we have a node status field?
18:31:26 node status or cluster status?
18:31:30 node status
18:31:30 so this issue only happens when a rollback is performed
18:31:35 yes
18:31:38 I'm pretty sure there was a node status
18:31:46 there is, in the db
18:31:54 we are not exposing it though
18:32:02 so the rollback probably should update the status
18:32:06 it doesn't need to be exposed
18:32:49 if we've previously deleted a VM and its associated resources, and we have a record that mirrors nova status, it should be updated
18:32:53 yes, with the change of moving the node update ahead we now need to update this
18:33:00 I think it could be argued we shouldn't be caching status at all
18:33:12 because that's yet another thing that could become out of sync
18:33:21 but as long as we have it it should be kept as updated as possible
18:33:34 agreed
18:33:55 I think there are two issues here
18:34:04 one, the node status is not being updated as required
18:34:46 two, when we are listing interfaces on a specific VM during deletion we are not catching the not-found exception, which will then fail our delete flow
18:35:11 we should be resolving the second one as a higher priority since this breaks cue
18:35:14 if we have node status that says the node is already deleted, should we be deleting?
18:35:44 if we're not doing a delete because the delete already happened, the second becomes a non-issue
18:35:48 we delete a vm after we get its interfaces, we would have to re-structure our delete flow
18:37:06 I don't think this is a High bug, it's Medium at best
18:37:31 it doesn't result in leaked resources, it doesn't impede any other functionality
18:37:45 not to say it's not something we need to resolve
18:37:53 but it's a matter of does it need to be done immediately
18:39:05 we have bugs that are piling up, I think we should start putting some portion of time into getting these resolved
18:39:33 yes, and that needs to be done for critical and high importance bugs
18:39:46 and to some extent medium bugs
18:40:02 I would argue this is more of a high importance bug
18:40:20 anyone else want to weigh in?
18:40:32 reading scrollback
18:40:45 this is directly at the user's interface, as far as the user is concerned one of our basic CRD functions is broken
18:40:58 reading..
18:41:02 * the delete
18:41:08 well, we need to be consistent in application of the definition
18:41:09 https://wiki.openstack.org/wiki/Bugs
18:41:25 High is "Data corruption / complete failure affecting most users, with workaround"
18:41:33 and "Failure of a significant feature, no workaround"
18:41:44 that's probably more an "or"
18:42:02 the delete doesn't work, but the cluster does not take up any resources
18:42:14 all underlying resources are cleaned
18:42:23 the bug should be high if there is no workaround
18:42:24 there is no data corruption
18:42:35 yes but the user wouldn't be aware of that given our api alone
18:43:07 there is some corruption in the DB records
18:43:16 why not fix this by catching the exception?
18:43:35 that's not a corruption
18:43:43 already have that fixed, but we are looking at a more inclusive solution
18:43:48 and I don't think it constitutes "data"
18:43:59 the state is out of sync
18:44:00 ok
18:44:13 it's a bit of an academic debate :)
18:45:44 currently we have no quota enforcement for clusters, so this doesn't prevent a user from creating and using brokers
18:46:02 davideagnello: do you have a patch so we can see the fix you are proposing?
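(For context while reading the patch discussion below: a minimal sketch, in Python, of the kind of not-found handling being proposed for the list-interfaces step of the delete flow. The function and parameter names are illustrative assumptions, not Cue's actual task code or the patch under review.)

    # Illustrative sketch only -- not the actual Cue task or the patch under review.
    from novaclient import exceptions as nova_exc

    def list_vm_interfaces(nova_client, vm_id):
        """Return the VM's interfaces, or an empty list if the VM is already gone.

        Treating a 404 as "nothing to detach" lets the cluster delete flow
        continue instead of failing when a node was deleted out-of-band.
        """
        try:
            return nova_client.servers.interface_list(vm_id)
        except nova_exc.NotFound:
            # VM no longer exists (e.g. rolled back or deleted manually);
            # there is nothing left to clean up for this node.
            return []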
18:46:31 esmute__: this resolved the underlying issue: https://review.openstack.org/#/c/196332/
18:46:51 but we don't want the list interface task to always cover the exception
18:49:31 sorry i've been in and out..
18:49:36 :)
18:49:41 why don't we discuss this offline in #openstack-cue
18:49:55 ok, let's table this one
18:50:04 ok
18:50:07 there's a critical bug
18:50:16 https://bugs.launchpad.net/cue/+bug/1466609
18:50:16 Launchpad bug 1466609 in Cue "user supplied network is not validated before attaching" [Critical,New]
18:50:29 does this look critical or should it be reclassified?
18:51:03 is this critical because of the security implication?
18:51:04 So if a user were to supply a network-id that they do not own.. would they have a way to get to the broker
18:51:16 yes, the reason it's critical is because of the security implication
18:51:30 ok
18:51:48 but it's currently not a security issue because a user who attaches to someone else's network also can't access it themselves
18:51:51 possibly
18:52:07 this wouldn't be an issue until we allow multiple networks to be supplied
18:52:08 vipul: no, currently they wouldn't be able to get to the broker
18:52:10 which we do not do today
18:52:14 correct
18:52:17 reclassify?
18:52:22 i would say that's the reason to deprioritize
18:52:40 so if i pass in a network that doesnt belong to me, cue will still attach it no?
18:52:44 ok, I think medium is fine in that light
18:52:47 maybe put a note saying this becomes high pri when we implement multi nic
18:53:01 esmute__: yes. that's the bug
18:53:02 that also should apply for single network
18:53:08 not just multiple
18:53:20 esmute__: right, and they wouldn't be able to access the broker
18:53:35 so given they won't be able to access it, it does break their ability to use the broker
18:53:52 they should know better :P
18:54:09 maybe that justifies prioritizing it as high?
18:54:24 high or medium
18:54:29 ?
18:54:51 i think sputnik13 has a point.. it shouldnt be high since we dont support multiple networks
18:55:02 I don't think it's as simple as saying does the user own the network.. what if it's a shared network, what if the user is the admin of multiple tenants
18:55:58 that gets messy fast
18:56:06 is this even something we should verify more than saying the id is a network id? isn't it up to the user to provide the correct network id?
18:56:38 davideagnello that's fine if the only ones they can harm are themselves
18:56:38 I think we do want to make sure the user can't shoot themselves in the foot.. that's a better UX.. but there are some open questions on how we would verify
18:56:58 this could be used as a DOS against some poor tenant
18:57:05 vipul: if the user is an admin of multiple tenants, then he should be able to use these networks
18:57:24 whose network id can be misused, and cue ends up eating up all his available ports
18:57:28 vipul: well it would use neutron port quota on the cue tenant I think, not on the tenant that owns the network
18:57:52 oh wait
18:57:54 it takes up an IP
18:57:56 sputnik13: that would be good to verify
18:58:01 right that's another issue..
18:58:10 so it could become a DOS
18:58:15 regardless of quotas
18:58:37 i think High is probably right
18:58:42 need to think about the impl
18:58:59 so one of the purposes of discussing these bugs during the meeting is to verify classification then change status on anything that's "New"
18:59:08 we're agreeing this is something we will fix yes?
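(As an illustration of the ownership/visibility check being debated for bug 1466609: a minimal sketch using python-neutronclient. The function name, parameters, and the owned-or-shared rule are assumptions for illustration only; they are not Cue's actual validation logic, and they deliberately ignore the harder cases raised above, such as shared networks used cross-tenant and admins that span tenants.)

    # Illustrative sketch only -- one possible shape of the validation discussed above.
    from neutronclient.common import exceptions as neutron_exc

    def validate_network_for_tenant(neutron_client, network_id, tenant_id):
        """Allow the network only if the tenant owns it or it is shared."""
        try:
            network = neutron_client.show_network(network_id)['network']
        except neutron_exc.NotFound:
            raise ValueError("network %s not found or not visible" % network_id)
        if network.get('shared') or network.get('tenant_id') == tenant_id:
            return network
        raise ValueError("network %s is not owned by or shared with tenant %s"
                         % (network_id, tenant_id))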
18:59:27 the DOS possibility makes it a high
18:59:30 does that mean it should be "Confirmed" or "Triaged"?
18:59:32 we should fix this
18:59:35 vipul: I agree
19:00:01 any dissent?
19:00:03 we have 1 minute
19:00:11 +1
19:00:12 your clock is slow
19:00:17 blah
19:00:19 your clock is fast
19:00:21 :-P
19:00:36 ok will reclassify to High
19:00:44 and it's Triaged
19:01:12 any last things before we end the meeting?
19:01:23 going once
19:01:26 going twice
19:01:33 #endmeeting