21:00:11 <alaski> #startmeeting nova_cells
21:00:12 <openstack> Meeting started Wed May 13 21:00:11 2015 UTC and is due to finish in 60 minutes.  The chair is alaski. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:13 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:15 <openstack> The meeting name has been set to 'nova_cells'
21:00:28 <alaski> Anyone here for the cells meeting?
21:00:33 <melwitt> o/
21:00:38 <dansmith> damn
21:00:39 <dansmith> o/
21:00:48 <belmoreira> o/
21:01:00 <alaski> dansmith: heh
21:01:06 <alaski> cool, let's get started
21:01:12 <alaski> #topic Tempest testing
21:01:25 <alaski> the cells job is in very good shape
21:01:27 <alaski> http://goo.gl/b7R8wq
21:01:30 <edleafe> o/
21:01:42 <alaski> there was a recent hiccup with a tempest change, but that has been addressed
21:01:44 <bauzas> \o
21:02:02 <alaski> I'm consistently seeing tempest.api.compute.servers.test_list_servers_negative.ListServersNegativeTestJSON fail still
21:02:04 <dansmith> that was tempest's fault anyway right?
21:02:09 <alaski> yes
21:02:46 <bauzas> a negative test failing ?
21:02:53 <alaski> the test fails infrequently, but seems to happen a couple of times a day
21:03:01 <alaski> I just put up https://review.openstack.org/#/c/182772/ for it
21:03:16 <alaski> though it may just address the trace, the actual test failure still needs some digging
21:03:40 <alaski> I'm somewhat hopeful it will fix the test though :)
21:03:56 <melwitt> what about the UnexpectedVMStateError traces? are they okay?
21:04:13 <alaski> like http://logs.openstack.org/51/179951/12/check/check-tempest-dsvm-cells/9bfbc3d/logs/screen-n-cell-region.txt.gz?level=TRACE ?
21:04:42 <melwitt> the thing that's hard about the cells job is that exceptions at the messaging level are swallowed, so even if it voted things could get past it I think
21:04:52 <melwitt> alaski: yes
21:05:33 <alaski> I've had that tab open for a while and didn't know if it was still relevant
21:06:00 <alaski> melwitt: true, we can only really be sure about what tempest explicitly checks for
21:06:25 <melwitt> I bring it up because I was checking the logs for my now monstrous cells Instance object patch, and there were a lot of them. so I checked for other test runs and see it there too though not nearly as many as on my patch
21:07:17 <alaski> that's helpful to know
21:07:32 <melwitt> alaski: what I mean is, in a non-cells environment exceptions like that would cause a tempest test failure but with cells, not necessarily
21:08:20 <melwitt> example being I had a problem in an earlier patch set of accessing a non existing attribute on a dict and it passed the tempest job
21:09:19 <bauzas> melwitt: probably because most of the exceptions are swallowed like alaski said ?
21:09:24 <melwitt> and I think maybe it's just that we have to check the logs for traces when we're doing cells/messaging.py changes ourselves
21:09:38 <alaski> melwitt: I see.  Is the issue that tempest isn't looking at enough, or that cells is really bad at bubbling things up, or both?
21:09:39 <melwitt> bauzas: yes, that's what I said
21:10:23 <alaski> do we still have the mechanism to fail a test on exceptions in certain logs?
21:10:30 <dansmith> heh
21:10:40 <dansmith> I dunno, but it's turned off for everything I think
21:10:45 <dansmith> which sucks
21:10:49 <dansmith> IMHO
21:10:53 <alaski> yeah
21:10:55 <bauzas> agreed
21:10:59 <melwitt> alaski: cells catches the exception and returns it to the caller, and the caller can decide what to do with that
21:11:06 <alaski> we should turn that on for cells if we can
21:11:08 <dansmith> you could probably get it turned on for a non-voting thing
21:11:08 <bauzas> could we maybe reraise the exceptions ?
21:11:26 <bauzas> and not just silently drop them ?
21:11:54 <alaski> melwitt: the caller is an rpc cast in most cases though.  is that where it's dropped?
21:12:31 <alaski> dansmith: agreed.  definitely worth looking into getting it going again in some capacity
21:13:45 <alaski> bauzas: I think the issue may be that exceptions that would normally happen in the compute api end up on the other side of an rpc cast
21:14:00 <melwitt> alaski: maybe. I'm not sure if "cells messages" always have request/response or if they have a cast too
21:14:14 <bauzas> alaski: yeah that's my thought too hence the reraise
21:14:26 <bauzas> oh, a cast, nvm
21:14:41 * bauzas is just tied
21:14:44 <alaski> melwitt: gotcha.  the cells messages should have a response, but the loss is probably between the api and the parent cell service
21:14:44 <bauzas> tired even
21:14:56 <alaski> not between parent and cell
21:15:37 <alaski> it sounds like checking logs, manually if necessary, but preferably via an automated thing is a good thing to do
21:16:09 <alaski> #action alaski to look into options for failing on exceptions in cells logs
21:16:33 <alaski> anything else on testing?
21:16:56 <alaski> numbers are really good now, and we could almost get away with voting I think
21:17:13 <alaski> but knocking out one/two more issues should get us there IMO
21:17:29 <alaski> #topic Specs
21:17:37 <dansmith> oh
21:17:39 <dansmith> god
21:17:39 <dansmith> no
21:17:44 <dansmith> not........SPECS
21:17:57 <alaski> lol
21:18:00 <alaski> so, there are specs
21:18:08 <dansmith> I swear, alaski has more specs than should be allowed
21:18:10 <bauzas> well, I just sucked in reviewing those
21:18:15 <alaski> and there will be a quiz on them for entrance to the cells summit session
21:18:27 <dansmith> I'll happily fail that then :)
21:18:35 <bauzas> I still have a question tho
21:18:43 <alaski> dansmith: hah, you get a mandatory exception
21:18:47 <dansmith> fsck.
21:18:57 <alaski> bauzas: sure
21:19:04 <bauzas> alaski: https://review.openstack.org/#/c/169901/
21:19:30 <bauzas> alaski: before commenting it, could you please refresh my mind why it would need a separate object ?
21:19:44 <alaski> separate from request_spec?
21:19:51 <bauzas> alaski: yup for persisting
21:20:08 <bauzas> alaski: I understand we need to carry a relationship w/ the instance
21:20:15 <alaski> because there are things that we need to store that don't relate to the reqspec's purpose
21:20:39 <bauzas> alaski: your spec was mentioning that, but do you have examples ?
21:21:10 <alaski> it's in the spec but availability_zone, power_state, task_state, uuid, key_name, metadata,
21:21:12 <alaski> security_groups, etc...
21:21:19 <bauzas> alaski: oh right
21:21:57 <alaski> some of those things aren't actually necessary, like power/task state, but uuid and metadata are
21:22:39 <bauzas> mmm ok, I have my answers, I'll comment out the spec then
21:22:44 <alaski> cool
21:22:58 <bauzas> (well, in the plane)
21:23:11 <alaski> https://review.openstack.org/141486 https://review.openstack.org/#/c/169901/ https://review.openstack.org/#/c/182715/ https://review.openstack.org/#/c/136490/ (for posterity)
21:23:35 <alaski> probably a couple more to come, to dansmith's chagrin
21:24:31 <dansmith> heh
21:24:33 <alaski> but my goal is to get everyone on the same/similar page during the summit, so the specs can be easy reads/reviews
21:25:25 <alaski> anyone want to talk more about specs?
21:25:58 <alaski> #topic Open Discussion
21:26:10 <alaski> I have one thing to mention
21:26:23 <alaski> we should skip next week's meeting
21:26:32 <alaski> but also I'm out the following week
21:26:52 <alaski> if someone wants to run the meeting please speak up, otherwise we can skip that week as well
21:26:57 <dansmith> +1 for skip
21:27:01 <dansmith> I will be gone too
21:27:01 <bauzas> yeah we need vacations :)
21:27:28 <alaski> sounds good
21:27:56 <alaski> anyone have a topic to bring up?
21:28:35 <bauzas> well, just to mention we can review the etherpads for the summit
21:29:02 <alaski> ahh, good point
21:29:04 <alaski> there is https://etherpad.openstack.org/p/YVR-nova-cells-v2
21:29:14 <alaski> and https://etherpad.openstack.org/p/YVR-nova-scalling-out-scheduler-for-cells
21:29:32 <alaski> both of which I should edit a little bit it looks like
21:29:58 <bauzas> alaski: fair point, I also need to do that for the latter
21:30:24 <bauzas> alaski: there are 2 points, one for providing incremental updates and one for having a shared state
21:30:40 <bauzas> both are complementary IMHO
21:30:53 <bauzas> but let's not tease all our sessions :)
21:31:00 <alaski> heh
21:31:22 <alaski> what I would like to nail down is cells vs aggregates vs azs, and how does the scheduler deal with them
21:31:38 <bauzas> that's a 3rd point :)
21:31:42 <alaski> and for someone to convince me that a cell is a host property
21:31:55 <bauzas> I can provide beers
21:32:48 <alaski> I think we should have http://www.sortilegewhisky.com/en/the-original/
21:33:04 * alaski has no clue if that's good, but it looks intriguing
21:33:32 <alaski> anything else?
21:33:45 <bauzas> alaski: at least one beverage which is called a trick, nice
21:33:59 <belmoreira> alaski: nice way to close the meeting :)
21:34:16 <alaski> belmoreira: :)
21:34:33 <alaski> see you all next week (hopefully)!
21:34:40 <alaski> #endmeeting