21:00:11 <alaski> #startmeeting nova_cells
21:00:12 <openstack> Meeting started Wed May 13 21:00:11 2015 UTC and is due to finish in 60 minutes. The chair is alaski. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:13 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:15 <openstack> The meeting name has been set to 'nova_cells'
21:00:28 <alaski> Anyone here for the cells meeting?
21:00:33 <melwitt> o/
21:00:38 <dansmith> damn
21:00:39 <dansmith> o/
21:00:48 <belmoreira> o/
21:01:00 <alaski> dansmith: heh
21:01:06 <alaski> cool, let's get started
21:01:12 <alaski> #topic Tempest testing
21:01:25 <alaski> the cells job is in very good shape
21:01:27 <alaski> http://goo.gl/b7R8wq
21:01:30 <edleafe> o/
21:01:42 <alaski> there was a recent hiccup with a tempest change, but that has been addressed
21:01:44 <bauzas> \o
21:02:02 <alaski> I'm consistently seeing tempest.api.compute.servers.test_list_servers_negative.ListServersNegativeTestJSON fail still
21:02:04 <dansmith> that was tempest's fault anyway, right?
21:02:09 <alaski> yes
21:02:46 <bauzas> a negative test failing?
21:02:53 <alaski> the test fails infrequently, but it seems to happen a couple of times a day
21:03:01 <alaski> I just put up https://review.openstack.org/#/c/182772/ for it
21:03:16 <alaski> though it may just address the trace, the actual test failure still needs some digging
21:03:40 <alaski> I'm somewhat hopeful it will fix the test though :)
21:03:56 <melwitt> what about the UnexpectedVMStateError traces? are they okay?
21:04:13 <alaski> like http://logs.openstack.org/51/179951/12/check/check-tempest-dsvm-cells/9bfbc3d/logs/screen-n-cell-region.txt.gz?level=TRACE ?
21:04:42 <melwitt> the thing that's hard about the cells job is that exceptions at the messaging level are swallowed, so even if it voted things could get past it I think
21:04:52 <melwitt> alaski: yes
21:05:33 <alaski> I've had that tab open for a while and didn't know if it was still relevant
21:06:00 <alaski> melwitt: true, we can only really be sure about what tempest explicitly checks for
21:06:25 <melwitt> I bring it up because I was checking the logs for my now monstrous cells Instance object patch, and there were a lot of them. so I checked other test runs and see it there too, though not nearly as many as on my patch
21:07:17 <alaski> that's helpful to know
21:07:32 <melwitt> alaski: what I mean is, in a non-cells environment exceptions like that would cause a tempest test failure, but with cells, not necessarily
21:08:20 <melwitt> example being I had a problem in an earlier patch set of accessing a non-existent attribute on a dict and it passed the tempest job
21:09:19 <bauzas> melwitt: probably because most of the exceptions are swallowed like alaski said?
21:09:24 <melwitt> and I think maybe it's just that we have to check the logs for traces ourselves when we're doing cells/messaging.py changes
21:09:38 <alaski> melwitt: I see. Is the issue that tempest isn't looking at enough, or that cells is really bad at bubbling things up, or both?
21:09:39 <melwitt> bauzas: yes, that's what I said
21:10:23 <alaski> do we still have the mechanism to fail a test on exceptions in certain logs?
21:10:30 <dansmith> heh
21:10:40 <dansmith> I dunno, but it's turned off for everything I think
21:10:45 <dansmith> which sucks
21:10:49 <dansmith> IMHO
21:10:53 <alaski> yeah
21:10:55 <bauzas> agreed
21:10:59 <melwitt> alaski: cells catches the exception and returns it to the caller, and the caller can decide what to do with that
21:11:06 <alaski> we should turn that on for cells if we can
21:11:08 <dansmith> you could probably get it turned on for a non-voting thing
21:11:08 <bauzas> could we maybe reraise the exceptions?
21:11:26 <bauzas> and not just silently drop them?
21:11:54 <alaski> melwitt: the caller is an rpc cast in most cases though. is that where it's dropped?
21:12:31 <alaski> dansmith: agreed. definitely worth looking into getting it going again in some capacity
21:13:45 <alaski> bauzas: I think the issue may be that exceptions that would normally happen in the compute api end up on the other side of an rpc cast
21:14:00 <melwitt> alaski: maybe. I'm not sure if "cells messages" always have a request/response or if they have a cast too
21:14:14 <bauzas> alaski: yeah, that's my thought too, hence the reraise
21:14:26 <bauzas> oh, a cast, nvm
21:14:41 * bauzas is just tied
21:14:44 <alaski> melwitt: gotcha. the cells messages should have a response, but the loss is probably between the api and the parent cell service
21:14:44 <bauzas> tired even
21:14:56 <alaski> not between parent and cell
21:15:37 <alaski> it sounds like checking logs, manually if necessary but preferably via an automated thing, is a good thing to do
21:16:09 <alaski> #action alaski to look into options for failing on exceptions in cells logs
21:16:33 <alaski> anything else on testing?
21:16:56 <alaski> numbers are really good now, and we could almost get away with voting I think
21:17:13 <alaski> but knocking out one or two more issues should get us there IMO
21:17:29 <alaski> #topic Specs
21:17:37 <dansmith> oh
21:17:39 <dansmith> god
21:17:39 <dansmith> no
21:17:44 <dansmith> not........SPECS
21:17:57 <alaski> lol
21:18:00 <alaski> so, there are specs
21:18:08 <dansmith> I swear, alaski has more specs than should be allowed
21:18:10 <bauzas> well, I've been bad at reviewing those
21:18:15 <alaski> and there will be a quiz on them for entrance to the cells summit session
21:18:27 <dansmith> I'll happily fail that then :)
21:18:35 <bauzas> I still have a question tho
21:18:43 <alaski> dansmith: hah, you get a mandatory exception
21:18:47 <dansmith> fsck.
21:18:57 <alaski> bauzas: sure
21:19:04 <bauzas> alaski: https://review.openstack.org/#/c/169901/
21:19:30 <bauzas> alaski: before commenting on it, could you please refresh my memory on why it would need a separate object?
21:19:44 <alaski> separate from request_spec?
21:19:51 <bauzas> alaski: yup, for persisting
21:20:08 <bauzas> alaski: I understand we need to carry a relationship w/ the instance
21:20:15 <alaski> because there are things that we need to store that don't relate to the reqspec's purpose
21:20:39 <bauzas> alaski: your spec was mentioning that, but do you have examples?
21:21:10 <alaski> it's in the spec but availability_zone, power_state, task_state, uuid, key_name, metadata,
21:21:12 <alaski> security_groups, etc...
21:21:19 <bauzas> alaski: oh right
21:21:57 <alaski> some of those things aren't actually necessary, like power/task state, but uuid and metadata are
21:22:39 <bauzas> mmm ok, I have my answers, I'll comment on the spec then
21:22:44 <alaski> cool
21:22:58 <bauzas> (well, on the plane)
21:23:11 <alaski> https://review.openstack.org/141486 https://review.openstack.org/#/c/169901/ https://review.openstack.org/#/c/182715/ https://review.openstack.org/#/c/136490/ (for posterity)
21:23:35 <alaski> probably a couple more to come, to dansmith's chagrin
21:24:31 <dansmith> heh
21:24:33 <alaski> but my goal is to get everyone on the same/similar page during the summit, so the specs can be easy reads/reviews
21:25:25 <alaski> anyone want to talk more about specs?
21:25:58 <alaski> #topic Open Discussion
21:26:10 <alaski> I have one thing to mention
21:26:23 <alaski> we should skip next week's meeting
21:26:32 <alaski> but also I'm out the following week
21:26:52 <alaski> if someone wants to run the meeting please speak up, otherwise we can skip that week as well
21:26:57 <dansmith> +1 for skip
21:27:01 <dansmith> I will be gone too
21:27:07 <bauzas> yeah we need vacations :;)
21:27:28 <alaski> sounds good
21:27:56 <alaski> anyone have a topic to bring up?
21:28:35 <bauzas> well, just to mention we can review the etherpads for the summit
21:29:02 <alaski> ahh, good point
21:29:04 <alaski> there is https://etherpad.openstack.org/p/YVR-nova-cells-v2
21:29:14 <alaski> and https://etherpad.openstack.org/p/YVR-nova-scalling-out-scheduler-for-cells
21:29:32 <alaski> both of which I should edit a little bit, it looks like
21:29:58 <bauzas> alaski: fair point, I also need to do that for the latter
21:30:24 <bauzas> alaski: there are 2 points, one for providing incremental updates and one for having a shared state
21:30:40 <bauzas> both are complementary IMHO
21:30:53 <bauzas> but let's not tease all our sessions :)
21:31:00 <alaski> heh
21:31:22 <alaski> what I would like to nail down is cells vs aggregates vs azs, and how the scheduler deals with them
21:31:38 <bauzas> that's a 3rd point :)
21:31:42 <alaski> and for someone to convince me that a cell is a host property
21:31:55 <bauzas> I can provide beers
21:32:48 <alaski> I think we should have http://www.sortilegewhisky.com/en/the-original/
21:33:04 * alaski has no clue if that's good, but it looks intriguing
21:33:32 <alaski> anything else?
21:33:45 <bauzas> alaski: at least one beverage which is called a trick, nice
21:33:59 <belmoreira> alaski: nice way to close the meeting :)
21:34:16 <alaski> belmoreira: :)
21:34:33 <alaski> see you all next week (hopefully)!
21:34:40 <alaski> #endmeeting