18:02:27 <sputnik13> #startmeeting cue
18:02:28 <openstack> Meeting started Mon Jun 29 18:02:27 2015 UTC and is due to finish in 60 minutes.  The chair is sputnik13. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:02:29 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:02:32 <openstack> The meeting name has been set to 'cue'
18:02:38 <sputnik13> roll call
18:02:44 <sputnik13> 1 o/
18:03:03 <dkalleg> 2 o/
18:03:03 <abitha> 2
18:03:06 <davideagnello> 3
18:03:46 <esmute__> o/
18:04:13 <sputnik13> ok we have quorum, I assume vipul is otherwise occupied :)
18:04:24 <sputnik13> let's start with action items
18:04:31 <vipul> o/
18:04:35 <vipul> i'll be in and out
18:04:45 <sputnik13> #link http://eavesdrop.openstack.org/meetings/cue/2015/cue.2015-06-22-18.00.html
18:05:12 <sputnik13> #note AI #1 esmute__ and davideagnello to fix the rally gate job and get rally tests running
18:05:26 <sputnik13> esmute__ and davideagnello can you give us an update
18:06:17 <davideagnello> I made the Rally directory and file structure change needed for Rally to run in the CI gate
18:06:18 <esmute__> davideagnello had a good handle on it so he did the work on this
18:06:23 <davideagnello> this was checked in
18:06:56 <sputnik13> is the rally test running now?
18:07:20 <davideagnello> there are currently two outstanding patches; I'm following up on getting those merged.  Sergey created them so Devstack gets installed in the pre-hook before the rally tests start running
18:07:33 <esmute__> davideagnello: can you link these patches?
18:08:03 <davideagnello> I saw the rally test run in CI once with my patch, but only because of a race condition, which these patches will resolve:
18:08:10 <davideagnello> https://review.openstack.org/#/c/195991/
18:08:18 <davideagnello> https://review.openstack.org/#/c/195992/
18:08:22 <sputnik13> #link https://review.openstack.org/#/c/195991/
18:08:29 <sputnik13> #link https://review.openstack.org/#/c/195992/
18:09:21 <davideagnello> once these are in, I will be testing our rally gate job
18:09:50 <sputnik13> ok
18:10:03 <sputnik13> can you follow up on this for next week's meeting?
18:10:32 <sputnik13> #action davideagnello to follow up on merge of https://review.openstack.org/#/c/195991/ and https://review.openstack.org/#/c/195992/
18:11:03 <sputnik13> next item...
18:11:10 <sputnik13> #note AI #2 vipuls to link Josh's patch with correct bug number
18:11:32 <sputnik13> doh
18:11:36 <vipul> sputnik13: I may have merged it :P
18:11:38 <sputnik13> #info AI #2 vipuls to link Josh's patch with correct bug number
18:11:48 <vipul> i'll update the bug
18:13:01 <sputnik13> ok
18:13:06 <sputnik13> next item...
18:13:18 <sputnik13> #info AI #3 sputnik13 to fill in details for https://blueprints.launchpad.net/cue/+spec/kafka
18:13:28 <sputnik13> right, not done, will work on this week :)
18:13:35 <sputnik13> #action sputnik13 to fill in details for https://blueprints.launchpad.net/cue/+spec/kafka
18:13:44 <sputnik13> kicking the can down the road :)
18:14:03 <sputnik13> that's it for actions from last week
18:14:18 <sputnik13> I think there's something that's not on the action item list, the tempest gate is not voting yet
18:14:37 <sputnik13> esmute__ do you have the patch that makes this voting?
18:14:38 <esmute__> I put up a patch last week to make it voting
18:14:44 <esmute__> and run on gate
18:14:56 <sputnik13> can you link the patch
18:14:57 <esmute__> #link https://review.openstack.org/#/c/194324/
18:15:01 <esmute__> you need one more +2
18:15:17 <sputnik13> I guess this should be a new #topic
18:15:27 <sputnik13> #topic make tempest gate voting
18:15:42 <esmute__> If i dont get a +2 in the next day or two, ill go and pester them
18:15:52 <sputnik13> #info patch submitted to make tempest gate voting
18:15:53 <sputnik13> #link https://review.openstack.org/#/c/194324/
18:16:11 <sputnik13> #action esmute__ to follow up with patch and ensure it gets merged
18:16:22 <sputnik13> just so we remember to follow up next week :)
18:17:03 <sputnik13> any other topics to discuss before we go on to bugs?
18:17:22 <sputnik13> oh, we're official official now
18:18:09 <sputnik13> #topic Cue is Openstack!!
18:18:12 <esmute__> Woo! Celebration
18:18:34 <dkalleg> woo
18:19:02 <davideagnello> awesome!
18:19:04 <sputnik13> #info Cue was accepted by TC as an Openstack project 2015/06/23 and merged 2015/06/25
18:19:11 <sputnik13> #link https://review.openstack.org/#/c/191173/
18:19:13 <sputnik13> #celebrate
18:19:15 <sputnik13> :-D
18:20:08 <vipul> w00t
18:20:10 <sputnik13> hopefully we see more contributors going forward :)
18:20:33 <sputnik13> would be great for the team to be bigger by the next summit
18:20:42 <vipul> #link https://review.openstack.org/#/c/196268/
18:20:45 <sputnik13> so we can do mid-cycles and whatever else
18:20:46 <sputnik13> :)
18:20:51 <vipul> patch to move our stuff to openstack
18:20:57 <sputnik13> yay
18:21:09 <sputnik13> what does that mean for outstanding patches?
18:21:21 <vipul> https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Upcoming_Project_Renames
18:21:49 <sputnik13> do patches get moved to the new namespaces?
18:22:00 <sputnik13> or do we have to make sure there's no outstanding patches?
18:22:04 <vipul> I doubt it, we may need to rebase and resubmit
18:22:16 <sputnik13> ic
18:22:17 <vipul> outstanding patches are fine.. there just might be some manual labor needed to move over
18:22:29 <sputnik13> ok
18:22:37 <sputnik13> good news is we don't have a lot of outstanding patches :)
18:22:44 <vipul> is that good news :P
18:22:59 <sputnik13> yes that means cores are doing their jobs and getting patches merged!
18:23:05 <vipul> fair enough ;)
18:23:13 <sputnik13> that's my story and I'm sticking to it ;)
18:23:44 <sputnik13> is the 2x +2 to +wf a general openstack thing or is that at project discretion?
18:24:10 <vipul> it's followed by most openstack projects.. but no hard rule
18:24:28 <vipul> We need to transition to that
18:25:06 <sputnik13> ok, in due time
18:25:44 <sputnik13> other topics?  or are we ready for some bug squashing? :)
18:26:55 <sputnik13> we need to get better at positive confirmation to keep meetings moving
18:27:06 * sputnik13 pokes dkalleg esmute__ abitha davideagnello
18:27:23 <sputnik13> hello~
18:27:28 <dkalleg> No other topics from me
18:27:47 <davideagnello> nope
18:27:53 <abitha> yes will do that
18:28:01 <sputnik13> ok moving on
18:28:06 <sputnik13> #topic Bug Squash
18:28:33 <davideagnello> new bug:  #link https://bugs.launchpad.net/cue/+bug/1469823
18:28:33 <openstack> Launchpad bug 1469823 in Cue "Delete Cluster fails when Cluster contains VM's which have already been deleted" [High,New]
18:28:40 <sputnik13> #link http://bit.ly/1MChDwJ
18:29:11 <sputnik13> I should just put that in the topic from now on rather than as separate links
18:29:12 <sputnik13> meh
18:29:18 <esmute__> davideagnello: how is this reproduced?
18:29:25 <esmute__> Does it happen every time a cluster is deleted?
18:29:50 <davideagnello> create a cluster of n nodes with VM capacity of n-1
18:30:45 <davideagnello> or create a cluster, then delete one of the clustered VMs while the rabbit check is taking place; this will trigger a rollback
18:31:00 <davideagnello> then issue a cluster delete
18:31:10 <sputnik13> do we have a node status field?
18:31:26 <davideagnello> node status or cluster status?
18:31:30 <sputnik13> node status
18:31:30 <esmute__> so this issue only happens when a rollback is performed
18:31:35 <davideagnello> yes
18:31:38 <sputnik13> I'm pretty sure there was a node status
18:31:46 <davideagnello> there is, in the db
18:31:54 <davideagnello> we are not exposing it though
18:32:02 <sputnik13> so the rollback probably should update the status
18:32:06 <sputnik13> it doesn't need to be exposed
18:32:49 <sputnik13> if we've previously deleted a VM and its associated resources, and we have a record that mirrors nova status, it should be updated
18:32:53 <davideagnello> yes, with the change that moved the node update ahead, we now need to update this
18:33:00 <sputnik13> I think it could be argued we shouldn't be caching status at all
18:33:12 <sputnik13> because that's yet another thing that could become out of sync
18:33:21 <sputnik13> but as long as we have it it should be kept as updated as possible
18:33:34 <davideagnello> agreed
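A minimal sketch of the status-sync point above, assuming a taskflow-style flow (the "delete flow"/rollback language suggests one); CreateBrokerVm, update_node_status, and the status strings are illustrative stand-ins rather than Cue's actual classes or DB API:

    from taskflow import task


    class CreateBrokerVm(task.Task):
        """Boots a broker VM; taskflow calls revert() if the flow rolls back."""

        def execute(self, nova_client, node_id, update_node_status):
            server = nova_client.servers.create(name='cue-node-%s' % node_id,
                                                image='image-id',    # placeholder
                                                flavor='flavor-id')  # placeholder
            update_node_status(node_id, 'ACTIVE')
            return server.id

        def revert(self, node_id, update_node_status, *args, **kwargs):
            # Rollback path: the VM is being (or was never) torn down, so record
            # that here instead of leaving a stale status for a later cluster
            # delete to trip over.  (Actual VM cleanup is omitted for brevity.)
            update_node_status(node_id, 'DELETED')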
18:33:55 <davideagnello> I think there are two issues here
18:34:04 <davideagnello> one, the node status is not being updated as required
18:34:46 <davideagnello> two, when we are listing interfaces on a specific VM during deletion we are not catching the NotFound exception, which then fails our delete flow
18:35:11 <davideagnello> we should be resolving the second one as a higher priority since this breaks cue
18:35:14 <sputnik13> if we have node status that says the node is already deleted should we be deleting?
18:35:44 <sputnik13> if we're not doing a delete because the delete already happened the second becomes a non-issue
18:35:48 <davideagnello> we delete a VM after we get its interfaces; we would have to restructure our delete flow
18:37:06 <sputnik13> I don't think this is a High bug, it's Medium at best
18:37:31 <sputnik13> it doesn't result in leaked resources, it doesn't impede any other functionality
18:37:45 <sputnik13> not to say it's not something we need to resolve
18:37:53 <sputnik13> but it's a matter of does it need to be done immediately
18:39:05 <davideagnello> we have bugs piling up; I think we should start putting some portion of our time toward getting these resolved
18:39:33 <sputnik13> yes, and that needs to be done for critical and high importance bugs
18:39:46 <sputnik13> and to some extent medium bugs
18:40:02 <davideagnello> I would argue this is more of a high importance bug
18:40:20 <sputnik13> anyone else want to weigh in?
18:40:32 <vipul> reading scrollback
18:40:45 <davideagnello> this is directly at the user's interface; as far as the user is concerned, one of our basic CRD functions is broken
18:40:58 <esmute__> reading..
18:41:02 <davideagnello> * the delete
18:41:08 <sputnik13> well, we need to be consistent in application of the definition
18:41:09 <sputnik13> https://wiki.openstack.org/wiki/Bugs
18:41:25 <sputnik13> High is "Data corruption / complete failure affecting most users, with workaround"
18:41:33 <sputnik13> and "Failure of a significant feature, no workaround"
18:41:44 <sputnik13> that's probably more an "or"
18:42:02 <sputnik13> the delete doesn't work, but the cluster does not take up any resources
18:42:14 <sputnik13> all underlying resources are cleaned
18:42:23 <esmute__> the bug should be high if there is no workaround
18:42:24 <sputnik13> there is no data corruption
18:42:35 <davideagnello> yes but the user wouldn't be aware of that given our api alone
18:43:07 <davideagnello> there is some corruption in the DB records
18:43:16 <esmute__> why not fix this by catching the exception?
18:43:35 <sputnik13> that's not a corruption
18:43:43 <davideagnello> already have that fixed, but we are looking at a more inclusive solution
18:43:48 <sputnik13> and I don't think it constitutes "data"
18:43:59 <sputnik13> the state is out of sync
18:44:00 <davideagnello> ok
18:44:13 <sputnik13> it's a bit of an academic debate :)
18:45:44 <sputnik13> currently we have no quota enforcement for clusters, so this doesn't prevent a user from creating and using brokers
18:46:02 <esmute__> davideagnello: do you have a patch so we can see the fix you are proposing?
18:46:31 <davideagnello> esmute__: this resolved the underlying issue:  https://review.openstack.org/#/c/196332/
18:46:51 <davideagnello> but we don't want the list interface task to always swallow the exception
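A minimal sketch of the narrower fix being weighed here, assuming python-novaclient; the function name is illustrative, and whether the list-interfaces step should always swallow NotFound is exactly the open question:

    from novaclient import exceptions as nova_exc


    def list_vm_interfaces(nova_client, vm_id):
        """Return the VM's interfaces, or [] if nova has already deleted it."""
        try:
            return nova_client.servers.interface_list(vm_id)
        except nova_exc.NotFound:
            # The VM was removed out-of-band (e.g. by a rollback); there is
            # nothing to detach, so don't fail the whole cluster delete flow.
            return []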
18:49:31 <vipul> sorry i've been in and out..
18:49:36 <sputnik13> :)
18:49:41 <vipul> why don't we discuss this offline in #openstack-cue
18:49:55 <sputnik13> ok, let's table this one
18:50:04 <davideagnello> ok
18:50:07 <sputnik13> there's a critical bug
18:50:16 <sputnik13> https://bugs.launchpad.net/cue/+bug/1466609
18:50:16 <openstack> Launchpad bug 1466609 in Cue "user supplied network is not validated before attaching" [Critical,New]
18:50:29 <sputnik13> does this look critical or should it be reclassified?
18:51:03 <davideagnello> is this critical because of the security implication?
18:51:04 <vipul> So if a user were to supply a network-id that they do not own.. would they have a way to get to the broker
18:51:16 <sputnik13> yes, the reason it's critical is because of the security implication
18:51:30 <davideagnello> ok
18:51:48 <sputnik13> but it's currently not a security issue because a user who attaches to someone else's network also can't access it themselves
18:51:51 <sputnik13> possibly
18:52:07 <vipul> this wouldn't be an issue until we allow multiple networks to be supplied
18:52:08 <sputnik13> vipul: no currently they wouldn't be able to get to the broker
18:52:10 <vipul> which we do not do today
18:52:14 <sputnik13> correct
18:52:17 <sputnik13> reclassify?
18:52:22 <vipul> i would say that's the reason to deprioritze
18:52:40 <esmute__> so if i pass in a network that doesn't belong to me, cue will still attach to it, no?
18:52:44 <sputnik13> ok, I think medium is fine in that light
18:52:47 <vipul> maybe put a note saying this becomes high priority when we implement multi-nic
18:53:01 <vipul> esmute__: yes. that's the bug
18:53:02 <esmute__> that also should apply for single network
18:53:08 <esmute__> not just multiple
18:53:20 <sputnik13> esmute__: right, and they wouldn't be able to access the broker
18:53:35 <sputnik13> so given they won't be able to access it, it does break their ability to use the broker
18:53:52 <vipul> they should know better :P
18:54:09 <sputnik13> maybe that justifies prioritizing it as high?
18:54:24 <sputnik13> high or medium
18:54:29 <sputnik13> ?
18:54:51 <esmute__> i think sputnik13 got a point.. it shouldnt be high since we dont support multiple networks
18:55:02 <vipul> I don't think it's as simple as saying does the user own the network.. what if it's a shared network, what if the user is the admin of multiple tenants
18:55:58 <sputnik13> that gets messy fast
18:56:06 <davideagnello> is this even something we should verify more than saying the id is a network id?  isn't it up to the user to provide the correct network id?
18:56:38 <sputnik13> davideagnello: that's fine if the only one they can harm is themselves
18:56:38 <vipul> I think we do want to make sure the user can't shoot themselves in the foot.. that's a better UX.. but there are some open questions on how we would verify
18:56:58 <vipul> this could be used as a DOS to some poor tenant
18:57:05 <esmute__> vipul: if the user is an admin of multiple tenants, then he should be able to use these networks
18:57:24 <vipul> whose network id can be misused, and cue ends up eating up all his available ports
18:57:28 <sputnik13> vipul: well it would use neutron port quota on the cue tenant I think, not on the tenant that owns the network
18:57:52 <sputnik13> oh wait
18:57:54 <sputnik13> it takes up an IP
18:57:56 <vipul> sputnik13: that would be good to verify
18:58:01 <vipul> right that's another issue..
18:58:10 <sputnik13> so it could become a DOS
18:58:15 <sputnik13> regardless of quotas
18:58:37 <vipul> i think High is probably right
18:58:42 <vipul> need to think about the impl
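A rough sketch of the kind of ownership check being discussed, assuming python-neutronclient; vipul's caveats (shared networks, admins spanning tenants) are only partly handled here, so treat the policy as illustrative rather than settled:

    from neutronclient.common import exceptions as neutron_exc


    def check_network_usable(neutron_client, network_id, project_id):
        """Reject a network the requesting project clearly cannot use."""
        try:
            network = neutron_client.show_network(network_id)['network']
        except neutron_exc.NeutronClientException:
            raise ValueError('network %s not found' % network_id)
        if network.get('shared'):
            return  # shared networks are usable by any project
        if network.get('tenant_id') != project_id:
            # Attaching here would eat IPs/ports on someone else's network (the
            # DOS concern above) and the caller could never reach the broker.
            raise ValueError('network %s is not usable by this project'
                             % network_id)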
18:58:59 <sputnik13> so one of the purposes of discussing these bugs during the meeting is to verify classification then change status on anything that's "New"
18:59:08 <sputnik13> we're agreeing this is something we will fix yes?
18:59:27 <vipul> the DOS possibility makes it a high
18:59:30 <sputnik13> does that mean it should be "Confirmed" or "Triaged"?
18:59:32 <vipul> we should fix this
18:59:35 <sputnik13> vipul: I agree
19:00:01 <sputnik13> any dissent?
19:00:03 <sputnik13> we have 1 minute
19:00:11 <esmute__> +1
19:00:12 <vipul> your clock is slow
19:00:17 <sputnik13> blah
19:00:19 <sputnik13> your clock is fast
19:00:21 <sputnik13> :-P
19:00:36 <sputnik13> ok will reclassify to High
19:00:44 <sputnik13> and it's Triaged
19:01:12 <sputnik13> any last things before we end the meeting?
19:01:23 <sputnik13> going once
19:01:26 <sputnik13> going twice
19:01:33 <sputnik13> #endmeeting