18:02:27 <sputnik13> #startmeeting cue
18:02:28 <openstack> Meeting started Mon Jun 29 18:02:27 2015 UTC and is due to finish in 60 minutes. The chair is sputnik13. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:02:29 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:02:32 <openstack> The meeting name has been set to 'cue'
18:02:38 <sputnik13> roll call
18:02:44 <sputnik13> 1 o/
18:03:03 <dkalleg> 2 o/
18:03:03 <abitha> 2
18:03:06 <davideagnello> 3
18:03:46 <esmute__> o/
18:04:13 <sputnik13> ok we have quorum, I assume vipul is otherwise occupied :)
18:04:24 <sputnik13> let's start with action items
18:04:31 <vipul> o/
18:04:35 <vipul> i'll be in an hout
18:04:36 <vipul> out
18:04:45 <sputnik13> #link http://eavesdrop.openstack.org/meetings/cue/2015/cue.2015-06-22-18.00.html
18:05:12 <sputnik13> #note AI #1 esmute__ and davideagnello to fix the rally gate job and get rally tests running
18:05:26 <sputnik13> esmute__ and davideagnello can you give us an update
18:06:17 <davideagnello> I have made a necessary rally directory and file structure change needed for Rally to run in CI gate
18:06:18 <esmute__> davideagnello had a good handle on it so he did the work on this
18:06:23 <davideagnello> this was checked in
18:06:56 <sputnik13> is the rally test running now?
18:07:20 <davideagnello> there are currently two patches outstanding following up on getting those merged. Sergey created them to resolve getting Devstack installed in prehook before rally tests start running
18:07:33 <esmute__> davideagnello: can you link these patches?
18:08:03 <davideagnello> I saw the rally test running in CI once with my patch but it was because of a race condition which these patches will resolve:
18:08:10 <davideagnello> https://review.openstack.org/#/c/195991/
18:08:18 <davideagnello> https://review.openstack.org/#/c/195992/
18:08:22 <sputnik13> #link https://review.openstack.org/#/c/195991/
18:08:29 <sputnik13> #link https://review.openstack.org/#/c/195992/
18:09:21 <davideagnello> once these are in, will be testing our rally gate job
18:09:50 <sputnik13> ok
18:10:03 <sputnik13> can you follow up on this for next week's meeting?
18:10:32 <sputnik13> #action davideagnello to follow up on merge of https://review.openstack.org/#/c/195991/ and https://review.openstack.org/#/c/195992/
18:11:03 <sputnik13> next item...
18:11:10 <sputnik13> #note AI #2 vipuls to link Josh's patch with correct bug number
18:11:32 <sputnik13> doh
18:11:36 <vipul> sputnik13: I may have merged it :P
18:11:38 <sputnik13> #info AI #2 vipuls to link Josh's patch with correct bug number
18:11:48 <vipul> i'll update the bug
18:13:01 <sputnik13> ok
18:13:06 <sputnik13> next item...
18:13:18 <sputnik13> #info AI #3 sputnik13 to fill in details for https://blueprints.launchpad.net/cue/+spec/kafka
18:13:28 <sputnik13> right, not done, will work on this week :)
18:13:35 <sputnik13> #action sputnik13 to fill in details for https://blueprints.launchpad.net/cue/+spec/kafka
18:13:44 <sputnik13> kicking the can down the road :)
18:14:03 <sputnik13> that's it for actions from last week
18:14:18 <sputnik13> I think there's something that's not on the action item list, the tempest gate is not voting yet
18:14:37 <sputnik13> esmute__ do you have the patch that makes this voting?
18:14:38 <esmute__> I have put a patch last week to make it voting
18:14:44 <esmute__> and run on gate
18:14:56 <sputnik13> can you link the patch
18:14:57 <esmute__> #link https://review.openstack.org/#/c/194324/
18:15:01 <esmute__> you need one more +2
18:15:17 <sputnik13> I guess this should be a new #topic
18:15:27 <sputnik13> #topic make tempest gate voting
18:15:42 <esmute__> If i dont get a +2 in the next day or two, ill go and pester them
18:15:52 <sputnik13> #info patch submitted to make tempest gate voting
18:15:53 <sputnik13> #link https://review.openstack.org/#/c/194324/
18:16:11 <sputnik13> #action esmute__ to follow up with patch and ensure it gets merged
18:16:22 <sputnik13> just so we remember to follow up next week :)
18:17:03 <sputnik13> any other topics to discuss before we go on to bugs?
18:17:22 <sputnik13> oh, we're official official now
18:18:09 <sputnik13> #topic Cue is Openstack!!
18:18:12 <esmute__> Woo! Celebration
18:18:34 <dkalleg> woo
18:19:02 <davideagnello> awesome!
18:19:04 <sputnik13> #info Cue was accepted by TC as an Openstack project 2015/06/23 and merged 2015/06/25
18:19:11 <sputnik13> #link https://review.openstack.org/#/c/191173/
18:19:13 <sputnik13> #celebrate
18:19:15 <sputnik13> :-D
18:20:08 <vipul> w00t
18:20:10 <sputnik13> hopefully we see more contributors going forward :)
18:20:33 <sputnik13> would be great for the team to be bigger by the next summit
18:20:42 <vipul> #link https://review.openstack.org/#/c/196268/
18:20:45 <sputnik13> so we can do mid-cycles and whatever else
18:20:46 <sputnik13> :)
18:20:51 <vipul> patch to move our stuff to openstack
18:20:57 <sputnik13> yay
18:21:09 <sputnik13> what does that mean for outstanding patches?
18:21:21 <vipul> https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Upcoming_Project_Renames
18:21:49 <sputnik13> do patches get moved to the new namespaces?
18:22:00 <sputnik13> or do we have to make sure there's no outstanding patches?
18:22:04 <vipul> I doubt it, we may need to rebase and resubmit
18:22:16 <sputnik13> ic
18:22:17 <vipul> outstanding patches are fine.. there just might be some manual labor needed to move over
18:22:29 <sputnik13> ok
18:22:37 <sputnik13> good news is we don't have a lot of outstanding patches :)
18:22:44 <vipul> is that good news :P
18:22:59 <sputnik13> yes that means cores are doing their jobs and getting patches merged!
18:23:05 <vipul> fair enough ;)
18:23:13 <sputnik13> that's my story and I'm sticking to it ;)
18:23:44 <sputnik13> is the 2x +2 to +wf a general openstack thing or is that at project discretion?
18:24:10 <vipul> it's followed by most openstack projects.. but no hard rule
18:24:28 <vipul> We need to transition to that
18:25:06 <sputnik13> ok, in due time
18:25:44 <sputnik13> other topics? or are we ready for some bug squashing? :)
18:26:55 <sputnik13> we need to get better at positive confirmation to move on meetings
18:27:06 * sputnik13 pokes dkalleg esmute__ abitha davideagnello
18:27:23 <sputnik13> hello~
18:27:28 <dkalleg> No other topics from me
18:27:47 <davideagnello> nope
18:27:53 <abitha> yes will do that
18:28:01 <sputnik13> ok moving on
18:28:06 <sputnik13> #topic Bug Squash
18:28:33 <davideagnello> new bug: #link https://bugs.launchpad.net/cue/+bug/1469823
18:28:33 <openstack> Launchpad bug 1469823 in Cue "Delete Cluster fails when Cluster contains VM's which have already been deleted" [High,New]
18:28:40 <sputnik13> #link http://bit.ly/1MChDwJ
18:29:11 <sputnik13> I should just put that in the topic from now on rather than as separate links
18:29:12 <sputnik13> meh
18:29:18 <esmute__> davideagnello: how is this reproduced?
18:29:25 <esmute__> Does it happen every time a cluster is deleted?
18:29:50 <davideagnello> create a cluster of n nodes with VM capacity of n-1
18:30:45 <davideagnello> or create a cluster, then delete one of the clustered VMs when rabbit check is taking place, this will trigger a rollback
18:31:00 <davideagnello> then issue a cluster delete
18:31:10 <sputnik13> do we have a node status field?
18:31:26 <davideagnello> node status or cluster status?
18:31:30 <sputnik13> node status
18:31:30 <esmute__> so this issue only happens when a rollback is performed
18:31:35 <davideagnello> yes
18:31:38 <sputnik13> I'm pretty sure there was a node status
18:31:46 <davideagnello> there is, in the db
18:31:54 <davideagnello> we are not exposing it though
18:32:02 <sputnik13> so the rollback probably should update the status
18:32:06 <sputnik13> it doesn't need to be exposed
18:32:49 <sputnik13> if we've previously deleted a VM and its associated resources, and we have a record that mirrors nova status, it should be updated
18:32:53 <davideagnello> yes, with the change of moving the node update ahead, we now need to update this
18:33:00 <sputnik13> I think it could be argued we shouldn't be caching status at all
18:33:12 <sputnik13> because that's yet another thing that could become out of sync
18:33:21 <sputnik13> but as long as we have it, it should be kept as updated as possible
18:33:34 <davideagnello> agreed
18:33:55 <davideagnello> I think there are two issues here
18:34:04 <davideagnello> one, the node status is not being updated as required
18:34:46 <davideagnello> two, when we are listing interfaces on a specific VM during deletion we are not catching the not found exception, which will then fail our delete flow
18:35:11 <davideagnello> we should be resolving the second one as a higher priority since this breaks cue
18:35:14 <sputnik13> if we have node status that says the node is already deleted should we be deleting?
18:35:44 <sputnik13> if we're not doing a delete because the delete already happened the second becomes a non-issue
18:35:48 <davideagnello> we delete a VM after we get its interfaces, we would have to re-structure our delete flow
18:37:06 <sputnik13> I don't think this is a High bug, it's Medium at best
18:37:31 <sputnik13> it doesn't result in leaked resources, it doesn't impede any other functionality
18:37:45 <sputnik13> not to say it's not something we need to resolve
18:37:53 <sputnik13> but it's a matter of whether it needs to be done immediately
18:39:05 <davideagnello> we have bugs that are piling up, I think we should start putting in some portion of time to get these resolved
18:39:33 <sputnik13> yes, and that needs to be done for critical and high importance bugs
18:39:46 <sputnik13> and to some extent medium bugs
18:40:02 <davideagnello> I would argue this is more of a high importance bug
18:40:20 <sputnik13> anyone else want to weigh in?
18:40:32 <vipul> reading scrollback
18:40:45 <davideagnello> this is directly at the user's interface, as far as the user is concerned one of our basic CRD functions is broken
18:40:58 <esmute__> reading..
18:41:02 <davideagnello> * the delete
18:41:08 <sputnik13> well, we need to be consistent in application of the definition
18:41:09 <sputnik13> https://wiki.openstack.org/wiki/Bugs
18:41:25 <sputnik13> High is "Data corruption / complete failure affecting most users, with workaround"
18:41:33 <sputnik13> and "Failure of a significant feature, no workaround"
18:41:44 <sputnik13> that's probably more an "or"
18:42:02 <sputnik13> the delete doesn't work, but the cluster does not take up any resources
18:42:14 <sputnik13> all underlying resources are cleaned
18:42:23 <esmute__> the bug should be high if there is no workaround
18:42:24 <sputnik13> there is no data corruption
18:42:35 <davideagnello> yes but the user wouldn't be aware of that given our API alone
18:43:07 <davideagnello> there is some corruption in the DB records
18:43:16 <esmute__> why not fix this by catching the exception?
18:43:35 <sputnik13> that's not a corruption
18:43:43 <davideagnello> already have that fixed, but we are looking at a more inclusive solution
18:43:48 <sputnik13> and I don't think it constitutes "data"
18:43:59 <sputnik13> the state is out of sync
18:44:00 <davideagnello> ok
18:44:13 <sputnik13> it's a bit of an academic debate :)
18:45:44 <sputnik13> currently we have no quota enforcement for clusters, so this doesn't prevent a user from creating and using brokers
18:46:02 <esmute__> davideagnello: do you have a patch so we can see the fix you are proposing?
18:46:31 <davideagnello> esmute__: this resolved the underlying issue: https://review.openstack.org/#/c/196332/
18:46:51 <davideagnello> but we don't want the list interface task to always cover the exception
18:49:31 <vipul> sorry i've been in and out..
18:49:36 <sputnik13> :)
18:49:41 <vipul> why don't we discuss this offline in #openstack-cue
18:49:55 <sputnik13> ok, let's table this one
18:50:04 <davideagnello> ok
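(A minimal sketch of the fix discussed above for bug 1469823, added for illustration only. It assumes a python-novaclient style client; the helper names `get_vm_interfaces` and `mark_node_deleted` are hypothetical and are not the actual task names in Cue or in https://review.openstack.org/#/c/196332/.)

```python
# Illustrative only: get_vm_interfaces() and mark_node_deleted() are
# hypothetical stand-ins, not Cue's actual task or helper names.
from novaclient import exceptions as nova_exc


def get_vm_interfaces(nova, vm_id, mark_node_deleted):
    """List a VM's interfaces during cluster delete, tolerating missing VMs.

    If nova reports the VM as gone (e.g. it was removed during a rollback),
    record that fact instead of letting the NotFound error fail the flow.
    """
    try:
        return nova.servers.interface_list(vm_id)
    except nova_exc.NotFound:
        # The VM no longer exists; update the cached node status so the
        # rest of the delete flow can skip this node rather than erroring out.
        mark_node_deleted(vm_id)
        return []
```

As davideagnello notes above, the open question is whether the list-interfaces task should swallow NotFound unconditionally or only when a delete or rollback is in progress; the sketch shows the former, simpler option.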
18:50:07 <sputnik13> there's a critical bug
18:50:16 <sputnik13> https://bugs.launchpad.net/cue/+bug/1466609
18:50:16 <openstack> Launchpad bug 1466609 in Cue "user supplied network is not validated before attaching" [Critical,New]
18:50:29 <sputnik13> does this look critical or should it be reclassified?
18:51:03 <davideagnello> is this critical because of the security implication?
18:51:04 <vipul> So if a user were to supply a network-id that they do not own.. would they have a way to get to the broker
18:51:16 <sputnik13> yes, the reason it's critical is because of the security implication
18:51:30 <davideagnello> ok
18:51:48 <sputnik13> but it's currently not a security issue because a user who attaches to someone else's network also can't access it themselves
18:51:51 <sputnik13> possibly
18:52:07 <vipul> this wouldn't be an issue until we allow multiple networks to be supplied
18:52:08 <sputnik13> vipul: no currently they wouldn't be able to get to the broker
18:52:10 <vipul> which we do not do today
18:52:14 <sputnik13> correct
18:52:17 <sputnik13> reclassify?
18:52:22 <vipul> i would say that's the reason to deprioritize
18:52:40 <esmute__> so if i pass in a network that doesn't belong to me, cue will still attach it no?
18:52:44 <sputnik13> ok, I think medium is fine in that light
18:52:47 <vipul> maybe put a note saying this becomes high pri when we implement multi-nic
18:53:01 <vipul> esmute__: yes. that's the bug
18:53:02 <esmute__> that also should apply for single network
18:53:08 <esmute__> not just multiple
18:53:20 <sputnik13> esmute__: right, and they wouldn't be able to access the broker
18:53:35 <sputnik13> so given they won't be able to access it, it does break their ability to use the broker
18:53:52 <vipul> they should know better :P
18:54:09 <sputnik13> maybe that justifies prioritizing it as high?
18:54:24 <sputnik13> high or medium
18:54:29 <sputnik13> ?
18:54:51 <esmute__> i think sputnik13 has a point.. it shouldn't be high since we don't support multiple networks
18:55:02 <vipul> I don't think it's as simple as saying does the user own the network.. what if it's a shared network, what if the user is the admin of multiple tenants
18:55:58 <sputnik13> that gets messy fast
18:56:06 <davideagnello> is this even something we should verify more than saying the id is a network id? isn't it up to the user to provide the correct network id?
18:56:38 <sputnik13> davideagnello that's fine if the only ones they can harm are themselves
18:56:38 <vipul> I think we do want to make sure the user can't shoot themselves in the foot.. that's a better UX.. but there are some open questions on how we would verify
18:56:58 <vipul> this could be used as a DOS to some poor tenant
18:57:05 <esmute__> vipul: if the user is an admin of multiple tenants, then he should be able to use these networks
18:57:24 <vipul> whose network id can be misused, and cue ends up eating up all his available ports
18:57:28 <sputnik13> vipul: well it would use neutron port quota on the cue tenant I think, not on the tenant that owns the network
18:57:52 <sputnik13> oh wait
18:57:54 <sputnik13> it takes up an IP
18:57:56 <vipul> sputnik13: that would be good to verify
18:58:01 <vipul> right that's another issue..
18:58:10 <sputnik13> so it could become a DOS
18:58:15 <sputnik13> regardless of quotas
18:58:37 <vipul> i think High is probably right
18:58:42 <vipul> need to think about the impl
18:58:59 <sputnik13> so one of the purposes of discussing these bugs during the meeting is to verify classification then change status on anything that's "New"
18:59:08 <sputnik13> we're agreeing this is something we will fix yes?
18:59:27 <vipul> the DOS possibility makes it a high
18:59:30 <sputnik13> does that mean it should be "Confirmed" or "Triaged"?
18:59:32 <vipul> we should fix this
18:59:35 <sputnik13> vipul: I agree
19:00:01 <sputnik13> any dissent?
19:00:03 <sputnik13> we have 1 minute
19:00:11 <esmute__> +1
19:00:12 <vipul> your clock is slow
19:00:17 <sputnik13> blah
19:00:19 <sputnik13> your clock is fast
19:00:21 <sputnik13> :-P
19:00:36 <sputnik13> ok will reclassify to High
19:00:44 <sputnik13> and it's Triaged
19:01:12 <sputnik13> any last things before we end the meeting?
19:01:23 <sputnik13> going once
19:01:26 <sputnik13> going twice
19:01:33 <sputnik13> #endmeeting
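(A minimal sketch of the validation discussed for bug 1466609, added for illustration only. It assumes a python-neutronclient style client; `validate_network` is a hypothetical helper, not Cue's actual API-layer check, and, as vipul points out above, shared networks and admins spanning multiple tenants mean a real implementation needs more than a simple owner check.)

```python
# Illustrative only: validate_network() is a hypothetical helper, not
# Cue's actual check; it assumes a python-neutronclient style client.
from neutronclient.common import exceptions as neutron_exc


def validate_network(neutron, network_id, requester_tenant_id):
    """Reject a cluster request whose network the caller cannot legitimately use.

    Per the bug 1466609 discussion: the network must exist, and it must be
    either owned by the requesting tenant or shared.  Admin users and more
    exotic sharing setups would need additional handling.
    """
    try:
        network = neutron.show_network(network_id)['network']
    except neutron_exc.NeutronClientException:
        raise ValueError("network %s not found" % network_id)

    if network.get('shared'):
        return network
    if network.get('tenant_id') != requester_tenant_id:
        raise ValueError("network %s does not belong to this tenant"
                         % network_id)
    return network
```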