21:00:11 #startmeeting nova_cells
21:00:12 Meeting started Wed May 13 21:00:11 2015 UTC and is due to finish in 60 minutes. The chair is alaski. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:13 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:15 The meeting name has been set to 'nova_cells'
21:00:28 Anyone here for the cells meeting?
21:00:33 o/
21:00:38 damn
21:00:39 o/
21:00:48 o/
21:01:00 dansmith: heh
21:01:06 cool, let's get started
21:01:12 #topic Tempest testing
21:01:25 the cells job is in very good shape
21:01:27 http://goo.gl/b7R8wq
21:01:30 o/
21:01:42 there was a recent hiccup with a tempest change, but that has been addressed
21:01:44 \o
21:02:02 I'm consistently seeing tempest.api.compute.servers.test_list_servers_negative.ListServersNegativeTestJSON fail still
21:02:04 that was tempest's fault anyway right?
21:02:09 yes
21:02:46 a negative test failing?
21:02:53 the test fails infrequently, but seems to happen a couple of times a day
21:03:01 I just put up https://review.openstack.org/#/c/182772/ for it
21:03:16 though it may just address the trace, the actual test failure still needs some digging
21:03:40 I'm somewhat hopeful it will fix the test though :)
21:03:56 what about the UnexpectedVMStateError traces? are they okay?
21:04:13 like http://logs.openstack.org/51/179951/12/check/check-tempest-dsvm-cells/9bfbc3d/logs/screen-n-cell-region.txt.gz?level=TRACE ?
21:04:42 the thing that's hard about the cells job is that exceptions at the messaging level are swallowed, so even if it voted things could get past it I think
21:04:52 alaski: yes
21:05:33 I've had that tab open for a while and didn't know if it was still relevant
21:06:00 melwitt: true, we can only really be sure about what tempest explicitly checks for
21:06:25 I bring it up because I was checking the logs for my now monstrous cells Instance object patch, and there were a lot of them. So I checked other test runs and see it there too, though not nearly as many as on my patch
21:07:17 that's helpful to know
21:07:32 alaski: what I mean is, in a non-cells environment exceptions like that would cause a tempest test failure, but with cells, not necessarily
21:08:20 example being, I had a problem in an earlier patch set of accessing a nonexistent attribute on a dict and it passed the tempest job
21:09:19 melwitt: probably because most of the exceptions are swallowed, like alaski said?
21:09:24 and I think maybe it's just that we have to check the logs for traces ourselves when we're doing cells/messaging.py changes
21:09:38 melwitt: I see. Is the issue that tempest isn't looking at enough, or that cells is really bad at bubbling things up, or both?
21:09:39 bauzas: yes, that's what I said
21:10:23 do we still have the mechanism to fail a test on exceptions in certain logs?
21:10:30 heh
21:10:40 I dunno, but it's turned off for everything I think
21:10:45 which sucks
21:10:49 IMHO
21:10:53 yeah
21:10:55 agreed
21:10:59 alaski: cells catches the exception and returns it to the caller, and the caller can decide what to do with that
21:11:06 we should turn that on for cells if we can
21:11:08 you could probably get it turned on for a non-voting thing
21:11:08 could we maybe reraise the exceptions?
21:11:26 and not just silently drop them?
21:11:54 melwitt: the caller is an rpc cast in most cases though. is that where it's dropped?
21:12:31 dansmith: agreed. definitely worth looking into getting it going again in some capacity
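The exception-swallowing behavior discussed above comes down to cast versus call semantics. Below is a toy Python sketch, not actual nova or oslo.messaging code: a cast enqueues work with no reply channel, so an exception raised in the handler dies on the server side, while a call ships it back to the caller.

```python
import queue
import threading


class ToyRPC:
    """Toy model of cast (fire-and-forget) vs. call (waits for a reply)."""

    def __init__(self, handler):
        self.handler = handler
        self.inbox = queue.Queue()
        threading.Thread(target=self._serve, daemon=True).start()

    def _serve(self):
        while True:
            method, args, reply_q = self.inbox.get()
            try:
                result = getattr(self.handler, method)(*args)
                if reply_q is not None:
                    reply_q.put(('ok', result))
            except Exception as exc:
                if reply_q is not None:
                    reply_q.put(('err', exc))  # call: the error travels back
                # cast: no reply queue, so the exception dies right here

    def cast(self, method, *args):
        # Fire and forget: the caller gets no result and no exception.
        self.inbox.put((method, args, None))

    def call(self, method, *args):
        # Request/response: the caller waits, and remote errors re-raise.
        reply_q = queue.Queue()
        self.inbox.put((method, args, reply_q))
        status, payload = reply_q.get()
        if status == 'err':
            raise payload
        return payload


class Handler:
    def boom(self):
        raise RuntimeError('remote failure')


rpc = ToyRPC(Handler())
rpc.cast('boom')  # returns immediately; the failure is invisible here
try:
    rpc.call('boom')  # the same failure, but it propagates to the caller
except RuntimeError as exc:
    print('call surfaced:', exc)
```

This is the trade-off the discussion circles: with a cast on the api-to-cell path, failures surface only in the service logs, which is why automated log checking comes up next.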
21:13:45 bauzas: I think the issue may be that exceptions that would normally happen in the compute api end up on the other side of an rpc cast
21:14:00 alaski: maybe. I'm not sure if "cells messages" always have request/response or if they have a cast too
21:14:14 alaski: yeah that's my thought too, hence the reraise
21:14:26 oh, a cast, nvm
21:14:41 * bauzas is just tied
21:14:44 melwitt: gotcha. the cells messages should have a response, but the loss is probably between the api and the parent cell service
21:14:44 tired even
21:14:56 not between parent and cell
21:15:37 it sounds like checking the logs, manually if necessary but preferably via something automated, is a good thing to do
21:16:09 #action alaski to look into options for failing on exceptions in cells logs
21:16:33 anything else on testing?
21:16:56 numbers are really good now, and we could almost get away with voting I think
21:17:13 but knocking out one or two more issues should get us there IMO
21:17:29 #topic Specs
21:17:37 oh
21:17:39 god
21:17:39 no
21:17:44 not........SPECS
21:17:57 lol
21:18:00 so, there are specs
21:18:08 I swear, alaski has more specs than should be allowed
21:18:10 well, I just got sucked into reviewing those
21:18:15 and there will be a quiz on them for entrance to the cells summit session
21:18:27 I'll happily fail that then :)
21:18:35 I still have a question though
21:18:43 dansmith: hah, you get a mandatory exception
21:18:47 fsck.
21:18:57 bauzas: sure
21:19:04 alaski: https://review.openstack.org/#/c/169901/
21:19:30 alaski: before commenting on it, could you please refresh my memory on why it would need a separate object?
21:19:44 separate from request_spec?
21:19:51 alaski: yup, for persisting
21:20:08 alaski: I understand we need to carry a relationship w/ the instance
21:20:15 because there are things that we need to store that don't relate to the reqspec's purpose
21:20:39 alaski: your spec mentions that, but do you have examples?
21:21:10 it's in the spec, but availability_zone, power_state, task_state, uuid, key_name, metadata,
21:21:12 security_groups, etc...
21:21:19 alaski: oh right
21:21:57 some of those things aren't actually necessary, like power/task state, but uuid and metadata are
21:22:39 mmm ok, I have my answers, I'll comment on the spec then
21:22:44 cool
21:22:58 (well, on the plane)
21:23:11 https://review.openstack.org/141486 https://review.openstack.org/#/c/169901/ https://review.openstack.org/#/c/182715/ https://review.openstack.org/#/c/136490/ (for posterity)
21:23:35 probably a couple more to come, to dansmith's chagrin
21:24:31 heh
21:24:33 but my goal is to get everyone on the same/similar page during the summit, so the specs can be easy reads/reviews
21:25:25 anyone want to talk more about specs?
21:25:58 #topic Open Discussion
21:26:10 I have one thing to mention
21:26:23 we should skip next week's meeting
21:26:32 but also I'm out the following week
21:26:52 if someone wants to run the meeting please speak up, otherwise we can skip that week as well
21:26:57 +1 for skip
21:27:01 I will be gone too
21:27:07 yeah we need vacations :)
21:27:28 sounds good
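Since the #action above is about failing jobs on exceptions in the cells logs, here is a minimal sketch of what such a check could look like. This is illustrative only, not devstack-gate's actual mechanism; the whitelist entry and the example log filename are hypothetical.

```python
#!/usr/bin/env python3
"""Minimal sketch of a gate-style log check: exit nonzero if a service log
contains ERROR/TRACE lines that are not on a known-issue whitelist.
Illustrative only; whitelist entries and filenames are hypothetical.
"""
import re
import sys

# Hypothetical whitelist of traces that are tracked but not yet fixed.
WHITELIST = [
    re.compile(r'UnexpectedVMStateError'),
]


def offending_lines(path):
    hits = []
    with open(path) as log:
        for num, line in enumerate(log, 1):
            if ' ERROR ' not in line and ' TRACE ' not in line:
                continue
            if any(pattern.search(line) for pattern in WHITELIST):
                continue
            hits.append((num, line.rstrip()))
    return hits


if __name__ == '__main__':
    # e.g. python3 check_logs.py screen-n-cell-region.txt
    bad = offending_lines(sys.argv[1])
    for num, line in bad:
        print('%d: %s' % (num, line))
    sys.exit(1 if bad else 0)
```

Run non-voting at first, as suggested above, so known-benign traces can be whitelisted before the check gates anything.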
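For the separate-object question in the 169901 discussion, the shape of the split can be sketched with plain dataclasses. The class names and field groupings below are illustrative guesses based on the fields listed in the exchange above, not the actual objects the spec proposes.

```python
from dataclasses import dataclass, field


@dataclass
class RequestSpec:
    """Scheduling-facing request: what the scheduler needs to place an
    instance. Fields here are illustrative only."""
    num_instances: int = 1
    flavor_id: str = ''
    image_id: str = ''
    scheduler_hints: dict = field(default_factory=dict)


@dataclass
class InstanceBuildRecord:
    """Hypothetical companion object for the fields named above that must
    persist alongside the instance but don't serve the reqspec's purpose."""
    uuid: str = ''
    availability_zone: str = ''
    key_name: str = ''
    metadata: dict = field(default_factory=dict)
    security_groups: list = field(default_factory=list)
```

Keeping the two apart lets the request spec stay a pure scheduler input while the companion object carries instance bookkeeping such as uuid and metadata.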
21:27:56 anyone have a topic to bring up?
21:28:35 well, just to mention we can review the etherpads for the summit
21:29:02 ahh, good point
21:29:04 there is https://etherpad.openstack.org/p/YVR-nova-cells-v2
21:29:14 and https://etherpad.openstack.org/p/YVR-nova-scalling-out-scheduler-for-cells
21:29:32 both of which I should edit a little bit, it looks like
21:29:58 alaski: fair point, I also need to do that for the latter
21:30:24 alaski: there are 2 points, one for providing incremental updates and one for having a shared state
21:30:40 both are complementary IMHO
21:30:53 but let's not tease all our sessions :)
21:31:00 heh
21:31:22 what I would like to nail down is cells vs aggregates vs azs, and how the scheduler deals with them
21:31:38 that's a 3rd point :)
21:31:42 and for someone to convince me that a cell is a host property
21:31:55 I can provide beers
21:32:48 I think we should have http://www.sortilegewhisky.com/en/the-original/
21:33:04 * alaski has no clue if that's good, but it looks intriguing
21:33:32 anything else?
21:33:45 alaski: at least one beverage which is called a trick, nice
21:33:59 alaski: nice way to close the meeting :)
21:34:16 belmoreira: :)
21:34:33 see you all next week (hopefully)!
21:34:40 #endmeeting