17:01:33 <dansmith> #startmeeting nova_cells
17:01:34 <openstack> Meeting started Wed Mar 14 17:01:33 2018 UTC and is due to finish in 60 minutes. The chair is dansmith. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:01:35 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:01:38 <openstack> The meeting name has been set to 'nova_cells'
17:01:40 <melwitt> o7
17:01:41 <tssurya> o/
17:01:48 <dansmith> sorry, I clicked the wrong channel and was all "where the fsck are those people?"
17:01:56 <tssurya> :D
17:01:56 <melwitt> heh
17:02:16 <dansmith> #link https://wiki.openstack.org/wiki/Meetings/NovaCellsv2
17:02:27 <belmoreira> o/
17:02:27 <dansmith> #topic bugs
17:02:38 <dansmith> So, first bug on the list is something I want to talk about selfishly
17:02:56 * bauzas sits at the back of the room
17:03:02 <dansmith> internally, ironic people have been saying for a while that host discovery with ironic nodes was buggy and problematic,
17:03:11 <dansmith> and I wasn't buying it
17:03:12 <mriedem> o/
17:03:35 <dansmith> we added the scheduler periodic for them so things were a little more automatic, which helped a little, but there's a big problem they finally illuminated
17:03:47 <dansmith> #link https://bugs.launchpad.net/nova/+bug/1755602
17:03:48 <openstack> Launchpad bug 1755602 in OpenStack Compute (nova) "Ironic computes may not be discovered when node count is less than compute count" [Medium,In progress] - Assigned to Dan Smith (danms)
17:04:04 <dansmith> I tried to put a lot of detail in the bug there, so hopefully it's clear what is going on,
17:04:14 <dansmith> but basically if you have more nova-computes than ironic nodes,
17:04:35 <dansmith> we'll discover one host and map the compute node, and then never discover other hosts until they get an unmapped node assigned to them by the hash ring
17:04:54 <dansmith> they've been working around it by deleting nodes and re-discovering, or just having enough nodes to not notice,
17:05:03 <dansmith> but it's a real problem for bootstrapping a deployment
17:05:24 <dansmith> so I have a potential patch up, which lets us discover by compute service instead of just node,
17:05:47 <dansmith> which also solves their problem of not being able to get discovery done until they have ironic nodes enrolled, which was an early complaint about this process in newton or ocata from them if you recall
17:06:04 <dansmith> #link https://review.openstack.org/#/c/552691/
17:06:31 <dansmith> so that is backportable and minimally invasive, but I think like compute node, we probably need to grow a mapped field so we can do that efficiently
17:06:56 <dansmith> the ironic people like this because they can provision their nova services, get them mapped, and then move on to ironic and not have to come back to re-do discovery
17:07:07 <dansmith> TBH, I'm not sure why we didn't just do this when they initially complained and we added the periodic,
17:07:18 <dansmith> so I'm putting it out here in case anyone can see why we *shouldn't* do this
17:07:42 <mriedem> have the internal guys picked this up and tested it yet?
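Note: the discovery problem described above can be illustrated with a minimal sketch. It assumes a toy hash ring and hypothetical function names; it is not nova's or ironic's actual hash ring or discovery code, only a model of why node-based discovery can miss hosts when there are fewer ironic nodes than nova-compute services.

```python
import hashlib


def assign_nodes(nodes, hosts):
    """Toy stand-in for a hash ring: map each ironic node to one compute host."""
    ring = sorted(hosts)
    assignment = {}
    for node in nodes:
        digest = int(hashlib.sha256(node.encode()).hexdigest(), 16)
        assignment[node] = ring[digest % len(ring)]
    return assignment


def discover_by_node(assignment, mapped_nodes, mapped_hosts):
    """Node-based discovery: a host is mapped only when it owns an unmapped node."""
    for node, host in assignment.items():
        if node not in mapped_nodes:
            mapped_nodes.add(node)
            mapped_hosts.add(host)
    return mapped_hosts


def discover_by_service(hosts, mapped_hosts):
    """Service-based discovery: map every compute service, with or without nodes."""
    mapped_hosts.update(hosts)
    return mapped_hosts


if __name__ == "__main__":
    hosts = ["compute-1", "compute-2", "compute-3"]  # hypothetical services
    nodes = ["node-a"]                               # fewer nodes than hosts
    assignment = assign_nodes(nodes, hosts)
    print(sorted(discover_by_node(assignment, set(), set())))
    print(sorted(discover_by_service(hosts, set())))
```

With three services and one node, the node-based pass can map at most one host, while the service-based pass maps all three even before any node is enrolled.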
17:08:10 <dansmith> they haven't because I told them to wait, and because they have to change when this is run and with the new flag
17:08:16 <dansmith> so I wanted to float it at least before they go do that
17:08:20 <dansmith> (and this came up only last night)
17:08:46 <dansmith> they currently run it really late and in a weird spot because they have to in order for it to work at all,
17:08:59 <dansmith> so this is moving from like total post-deploy steps way up into the nova phase of setup
17:09:07 <dansmith> which in tripleo moves it from like heat to puppet or something
17:09:27 <mriedem> sure, i'd just like to know that someone on the deployment side of the house that needs this can pick it up, hack it in to a test env and say 'yup does the trick, thanks'
17:09:44 <melwitt> I'm still trying to understand what's going on, so I'll have to comment later. (unmapped node assigned by hash ring)
17:09:52 <dansmith> if there are no philosophical objections then I'll get them to do that
17:10:09 <mriedem> i have no such objections, it's all optional
17:10:14 <dansmith> mriedem: okay
17:10:31 <dansmith> melwitt: let's continue so we don't miss anything then and we can circle back to discuss more at the end if you want
17:10:43 <bauzas> I'm not opinionated either
17:10:48 <mriedem> i'm told there are at least 30 tripleo core team members so finding someone to test it should be easy :)
17:10:54 <dansmith> heh
17:11:05 <dansmith> well, getting it tested won't be too hard, yeah :)
17:11:15 <melwitt> yeah, I don't expect to be opinionated, just need to read through the bug and patch
17:11:19 <dansmith> the version I had last night was just thrown together and didn't work anyway
17:11:26 <dansmith> melwitt: okay
17:11:40 <dansmith> so, I think the rest of the bugs on the list here have been here for a while and have patches up
17:11:50 <dansmith> I think I've voted on most of them at one time or another
17:11:57 <dansmith> anything specific people want to bring up about any of these?
17:12:10 <tssurya> nope
17:12:14 <bauzas> IIRC, the tripleo folks had an experimental job, nope ?
17:12:22 <dansmith> tssurya: your marker reset one will have to change again I guess, sorry about that :/
17:12:41 <tssurya> dansmith: yes I have to re-rebase :)
17:12:48 <mriedem> bauzas: that job won't do anything if tripleo itself isn't changed to use the new flag
17:12:49 <dansmith> tssurya: apologies :)
17:13:03 <tssurya> dansmith: no it's fine I will ping you when I do that
17:13:09 <bauzas> mriedem: sure, I'm saying they could magically test things
17:13:21 <dansmith> tssurya: ack
17:13:49 <tssurya> dansmith: btw, this has been there for some time - https://review.openstack.org/#/c/519275/, I think you and the submitter disagree on some things
17:14:02 <tssurya> just bringing it to your notice
17:14:24 <dansmith> tssurya: ah okay I'll look
17:14:44 <dansmith> the other non-WIP two have +2s from me, so other people could slam them in easily
17:14:59 <dansmith> I have updated the priorities etherpad to highlight the ones that need one more +2
17:15:02 <dansmith> or did yesterday
17:15:26 <tssurya> ack
17:15:32 <dansmith> any other bugs to highlight?
17:15:43 <tssurya> no
17:16:01 <dansmith> #topic open reviews
17:16:23 <dansmith> So edleafe has got the member_of api patch up and close to being merged, which is required for my placement request filter stuff
17:16:33 <dansmith> I've rebased on it, but it has moved a bit since I did
17:16:44 <dansmith> but once that merges, the rest will be ready for review, up until the last WIP at the end
17:16:55 <dansmith> I was thinking about how to test this, other than just with functional tests
17:17:28 <dansmith> could do it potentially in tempest by creating aggregates and making sure that it refuses to build with no matching tenant aggregate, but otherwise it'd be a little hard
17:17:35 <mriedem> you probably missed my comments in https://review.openstack.org/#/c/545002/ since i didn't vote
17:17:38 <dansmith> any comments or opinions on how that should be done?
17:17:44 <mriedem> but i was going to ask for functional tests for all of the new filters too
17:17:53 <dansmith> mriedem: ah yeah
17:17:58 <mriedem> i don't think we need tempest for these filters
17:18:04 <dansmith> aight cool
17:18:20 <mriedem> the shit gibi is working on with port bw allocation crap, that's definitely tempest territory
17:18:52 <dansmith> any other open reviews people want to highlight?
17:19:13 <mriedem> uh yeah
17:19:21 <mriedem> https://review.openstack.org/#/q/topic:bp/remove-nova-network+(status:open+OR+status:merged)
17:19:31 <mriedem> that moves the cells v1 job in tree
17:19:32 <dansmith> oh my
17:19:36 <mriedem> and then changes it to use neutron
17:19:40 <dansmith> does it work?
17:19:42 <mriedem> yeah
17:19:45 <dansmith> amazing
17:19:52 * dansmith puts that on the prio etherpad
17:19:53 <mriedem> i've rechecked a few times and hit some race failures,
17:20:10 <mriedem> but have fixed up the config and blacklist that i don't think those should be a problem now,
17:20:19 <dansmith> awesome
17:20:23 <mriedem> problem was some tempest tests were looking for active ports assuming that b/c the server was active the ports would be
17:20:41 <dansmith> I can't believe that wasn't more of a giant dumpster fire
17:20:47 <mriedem> me neither
17:20:49 <dansmith> makes me suspicious ;)
17:20:58 <mriedem> i have https://blueprints.launchpad.net/nova/+spec/remove-nova-network for tracking removal work which is a specless bp
17:21:00 <mriedem> on the agenda for thursday
17:21:09 <dansmith> okay cool
17:21:11 <mriedem> unless melwitt wants to just make an executive decision now
17:21:26 <melwitt> I just approved it since we agreed at the PTG
17:21:37 <mriedem> alright
17:21:38 <mriedem> cool
17:21:42 <dansmith> the other one we need to make sure gets approved is the xenapi one that will lead to removing the upcall
17:21:49 <bauzas> I proxy my +1 here for the nova-net removal
17:22:03 <mriedem> proxy?
17:22:05 <mriedem> for yourself?
17:22:07 <bauzas> given I won't attend the meeting
17:22:09 <mriedem> oh
17:22:10 <mriedem> heh
17:22:40 <mriedem> next time i ask laura how she spent $200 at target i'll preface with "i'll let my proxy ask this"
17:22:48 <dansmith> okay any other open reviews?
17:23:18 <bauzas> mriedem: uh
17:23:40 <mriedem> end it
17:23:47 <dansmith> #topic open discussion
17:24:04 <dansmith> melwitt: want to discuss the hash ring stuff, or just leave you to read the bug?
17:24:05 <tssurya> Question - Do we still plan to keep this NewtonCellsCheck Test in Rocky ? -> https://github.com/openstack/nova/blob/master/nova/tests/unit/db/test_sqlalchemy_migration.py#L356 ; since it uses a particularly older DBVersion it fails for my new DB version; not sure how to handle it because it is wrong to just update it I guess.
17:24:21 <tssurya> dansmith: oops, after you
17:24:29 <melwitt> dansmith: I'll read the bug. thanks tho
17:24:33 <dansmith> melwitt: ack
17:24:38 <mriedem> tssurya: do you have a patch where that's failing? otherwise i don't understand the problem.
17:24:49 <tssurya> http://logs.openstack.org/05/552505/1/check/openstack-tox-py27/d5ecd2c/testr_results.html.gz
17:25:29 <dansmith> ah because the model is changing
17:25:47 <dansmith> the last time we hit something like this we just removed the ancient check I think
17:26:04 <dansmith> the only other option is to convert it to straight sqla I think,
17:26:09 <dansmith> and in that case at least, we just removed it
17:26:14 <dansmith> it being the test
17:26:20 <mriedem> i think i've converted to using straight sqla
17:26:25 <mriedem> in at least one case like this
17:26:43 <dansmith> yeah for flavors I did I think
17:26:49 <mriedem> https://github.com/openstack/nova/commit/3ca7eaab0287ce3f6d556baf0d1e0bb2f9d8aeb5#diff-43368cee9c9999756b4b7d140ef1055aR385
17:26:52 <dansmith> because we had default flavors in the initial migration or something
17:27:27 <tssurya> hmm, so I do something similar and keep the test
17:27:36 <dansmith> mriedem: that is the opposite problem here though
17:27:46 <mriedem> how?
17:27:58 <mriedem> i added description to the flavors table, and an older migration using the model couldn't handle that
17:28:00 <mriedem> same issue here
17:28:05 <dansmith> it's not doing create,
17:28:13 <dansmith> it's doing save as part of the test
17:28:24 <mriedem> https://github.com/openstack/nova/blob/master/nova/tests/unit/db/test_sqlalchemy_migration.py#L440
17:28:25 <dansmith> you'd have to convert the test itself to not use the objects
17:28:27 <mriedem> it's failing on create
17:28:48 <dansmith> oh you're right, sorry I was looking at models.py::save()
17:28:53 <dansmith> File "/home/zuul/src/git.openstack.org/openstack/nova/.tox/py27/local/lib/python2.7/site-packages/oslo_db/sqlalchemy/models.py", line 50, in save
17:28:54 <dansmith> that line
17:29:02 <dansmith> but it's create in the object yeah
17:29:21 <mriedem> i think the test is just using the object as a convenience
17:29:28 <dansmith> it is yeah
17:29:48 <mriedem> changing all of those create calls could be annoying
17:30:03 <mriedem> unless you monkeypatched it or something
17:30:27 <mriedem> idk, i generally don't like just deleting tests for code that we still have in tree
17:30:43 <mriedem> especially now that branches are never going away :)
17:30:56 <tssurya> oh yea
17:30:58 <dansmith> it's not that many cases, it's quite doable
17:31:14 <mriedem> tssurya: i could help out if you need it by putting a patch at the bottom of your series to do that change
17:31:30 <tssurya> mriedem : awesome thanks!
17:31:38 <mriedem> better than looking at gd specs anyway
17:32:11 <tssurya> mriedem : I already have a need help thing in the priorities etherpad also for you
17:32:24 <mriedem> ooo i'm honored
17:32:28 <tssurya> regarding another test :)
17:32:36 <dansmith> mriedem: 0ce4dff41f8d38edf790a301ec8e7040b279d65a
17:32:38 <tssurya> so will add this too to the list
17:32:47 <dansmith> mriedem: that's the one I was thinking of where we just removed the test
17:32:55 <dansmith> but agree, if it's trivial to keep it, by all means
17:33:10 <mriedem> tssurya: ok https://review.openstack.org/#/c/546660/ yeah i can help out with that after this
17:33:16 <mriedem> i do enjoy writing tests...
17:33:17 <tssurya> I am up for removing it considering it's a Newton test
17:33:34 <mriedem> tssurya: but someone could skip newton
17:33:38 <mriedem> and go directly to pike or some crap
17:33:55 <tssurya> mriedem : yea your call, :)
17:34:09 <mriedem> i'll push a patch to convert that test to use sqla
17:34:11 <mriedem> see how bad it is
17:34:14 <tssurya> thanks
17:34:54 <dansmith> okay anything else for open discussion?
17:35:00 <tssurya> nope
17:35:51 <dansmith> #endmeeting
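Note: the test problem discussed under open discussion (a test pinned to an old DB version failing because the current model writes a column the old schema lacks) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's stdlib sqlite3 rather than SQLAlchemy, and the table, columns, and helper names are hypothetical, not taken from nova.

```python
import sqlite3

# Schema as it existed at the old migration under test (hypothetical table).
OLD_SCHEMA = "CREATE TABLE flavors (id INTEGER PRIMARY KEY, name TEXT)"

# Columns the current model would write, including one added in a later migration.
CURRENT_MODEL_COLUMNS = ("name", "description")


def insert_via_model(conn, **values):
    """Mimic an object create(): always writes every current model column."""
    cols = ", ".join(CURRENT_MODEL_COLUMNS)
    marks = ", ".join("?" for _ in CURRENT_MODEL_COLUMNS)
    row = [values.get(col) for col in CURRENT_MODEL_COLUMNS]
    conn.execute(f"INSERT INTO flavors ({cols}) VALUES ({marks})", row)


def insert_direct(conn, name):
    """Write only the columns that exist at the old schema version."""
    conn.execute("INSERT INTO flavors (name) VALUES (?)", (name,))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(OLD_SCHEMA)
    try:
        insert_via_model(conn, name="m1.tiny")
    except sqlite3.OperationalError as exc:
        print("model-based insert failed:", exc)
    insert_direct(conn, "m1.tiny")
    print(conn.execute("SELECT COUNT(*) FROM flavors").fetchone()[0])
```

Writing rows with explicit statements against the old schema keeps the test independent of later model changes, which is the trade-off weighed above against simply removing the test.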