17:01:33 <dansmith> #startmeeting nova_cells
17:01:34 <openstack> Meeting started Wed Mar 14 17:01:33 2018 UTC and is due to finish in 60 minutes.  The chair is dansmith. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:01:35 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:01:38 <openstack> The meeting name has been set to 'nova_cells'
17:01:40 <melwitt> o7
17:01:41 <tssurya> o/
17:01:48 <dansmith> sorry, I clicked the wrong channel and was all "where the fsck are those people?"
17:01:56 <tssurya> :D
17:01:56 <melwitt> heh
17:02:16 <dansmith> #link https://wiki.openstack.org/wiki/Meetings/NovaCellsv2
17:02:27 <belmoreira> o/
17:02:27 <dansmith> #topic bugs
17:02:38 <dansmith> So, first bug on the list is something I want to talk about selfishly
17:02:56 * bauzas sits at the back of the room
17:03:02 <dansmith> internally, ironic people have been saying for a while that host discovery with ironic nodes was buggy and problematic,
17:03:11 <dansmith> and I wasn't buying it
17:03:12 <mriedem> o/
17:03:35 <dansmith> we added the scheduler periodic for them so things were a little more automatic, which helped a little, but there's a big problem they finally illuminated
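(For context: the scheduler periodic dansmith refers to is the discover_hosts_in_cells_interval option, which makes nova-scheduler run cells v2 host discovery on a timer instead of requiring a manual nova-manage run. A minimal nova.conf snippet; the interval value here is just an example:)

```ini
[scheduler]
# Run cells v2 host discovery every 5 minutes from the scheduler.
# The default of -1 disables the periodic entirely.
discover_hosts_in_cells_interval = 300
```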
17:03:47 <dansmith> #link https://bugs.launchpad.net/nova/+bug/1755602
17:03:48 <openstack> Launchpad bug 1755602 in OpenStack Compute (nova) "Ironic computes may not be discovered when node count is less than compute count" [Medium,In progress] - Assigned to Dan Smith (danms)
17:04:04 <dansmith> I tried to put a lot of detail in the bug there, so hopefully it's clear what is going on,
17:04:14 <dansmith> but basically if you have more nova-computes than ironic nodes,
17:04:35 <dansmith> we'll discover one host and map the compute node, and then never discover other hosts until they get an unmapped node assigned to them by the hash ring
17:04:54 <dansmith> they've been working around it by deleting nodes and re-discovering, or just having enough nodes to not notice,
17:05:03 <dansmith> but it's a real problem for bootstrapping a deployment
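(The counting problem is easy to see with a toy model: ironic nodes are spread across the registered nova-compute services by a hash ring, so with fewer nodes than services some services own no node at all, and node-based discovery never creates a host mapping for them. A self-contained illustration, using simple modular assignment as a stand-in for the real hash ring:)

```python
# Toy model: 2 ironic nodes spread across 5 nova-compute services.
services = ['compute%d' % i for i in range(5)]
nodes = ['node-a', 'node-b']

# Stand-in for the hash ring: each node lands on exactly one service.
assignment = {node: services[hash(node) % len(services)] for node in nodes}

# Node-based discovery only maps hosts that own an unmapped node, so any
# service that drew no node is never discovered.
discovered = set(assignment.values())
undiscovered = [s for s in services if s not in discovered]
print(undiscovered)  # at least 3 of the 5 services stay unmapped
```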
17:05:24 <dansmith> so I have a potential patch up, which lets us discover by compute service instead of just node,
17:05:47 <dansmith> which also solves their problem of not being able to get discovery done until they have ironic nodes enrolled, which was an early complaint of theirs about this process back in newton or ocata, if you recall
17:06:04 <dansmith> #link https://review.openstack.org/#/c/552691/
17:06:31 <dansmith> so that is backportable and minimally invasive, but I think that, like compute node, the service record probably needs to grow a 'mapped' field so we can do this efficiently
17:06:56 <dansmith> the ironic people like this because they can provision their nova services, get them mapped, and then move on to ironic and not have to come back to re-do discovery
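(A rough sketch of what discovering by service rather than by compute node amounts to, using nova's existing objects API; the helper name is hypothetical, and the patch linked above exposes this behind a new flag on discover_hosts, so the real flow will differ in detail:)

```python
from nova import context as nova_context
from nova import exception
from nova import objects

def discover_hosts_by_service(ctxt, cell_mapping):
    # Hypothetical helper: map every nova-compute service in the cell,
    # even if it has zero ironic nodes assigned yet.
    with nova_context.target_cell(ctxt, cell_mapping) as cctxt:
        services = objects.ServiceList.get_by_binary(
            cctxt, 'nova-compute', include_disabled=True)
    for service in services:
        try:
            objects.HostMapping.get_by_host(ctxt, service.host)
        except exception.HostMappingNotFound:
            objects.HostMapping(ctxt, host=service.host,
                                cell_mapping=cell_mapping).create()
```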
17:07:07 <dansmith> TBH, I'm not sure why we didn't just do this when they initially complained and we added the periodic,
17:07:18 <dansmith> so I'm putting it out here in case anyone can see why we *shouldn't* do this
17:07:42 <mriedem> have the internal guys picked this up and tested it yet?
17:08:10 <dansmith> they haven't, because I told them to wait, and because they'd have to change when this is run and use the new flag
17:08:16 <dansmith> so I wanted to float it at least before they go do that
17:08:20 <dansmith> (and this came up only last night)
17:08:46 <dansmith> they currently run it really late and in a weird spot because they have to in order for it to work at all,
17:08:59 <dansmith> so this is moving from like total post-deploy steps way up into the nova phase of setup
17:09:07 <dansmith> which in tripleo moves it from like heat to puppet or something
17:09:27 <mriedem> sure, i'd just like to know that someone on the deployment side of the house that needs this can pick it up, hack it in to a test env and say 'yup does the trick, thanks'
17:09:44 <melwitt> I'm still trying to understand what's going on, so I'll have to comment later. (unmapped node assigned by hash ring)
17:09:52 <dansmith> if there are no philosophical objections then I'll get them to do  that
17:10:09 <mriedem> i have no such objections, it's all optional
17:10:14 <dansmith> mriedem: okay
17:10:31 <dansmith> melwitt: let's continue so we don't miss anything then and we can circle back to discuss more at the end if you want
17:10:43 <bauzas> I'm not opinionated either
17:10:48 <mriedem> i'm told there are at least 30 tripleo core team members so finding someone to test it should be easy :)
17:10:54 <dansmith> heh
17:11:05 <dansmith> well, getting it tested won't be too hard, yeah :)
17:11:15 <melwitt> yeah, I don't expect to be opinionated, just need to read through the bug and patch
17:11:19 <dansmith> the version I had last night was just thrown together and didn't work anyway
17:11:26 <dansmith> melwitt: okay
17:11:40 <dansmith> so, I think the rest of the bugs on the list here have been here for a while and have patches up
17:11:50 <dansmith> I think I've voted on most of them at one time or another
17:11:57 <dansmith> anything specific people want to bring up about any of these?
17:12:10 <tssurya> nope
17:12:14 <bauzas> IIRC, the tripleo folks had an experimental job, no?
17:12:22 <dansmith> tssurya: your marker reset one will have to change again I guess, sorry about that :/
17:12:41 <tssurya> dansmith: yes I have to re-rebase :)
17:12:48 <mriedem> bauzas: that job won't do anything if tripleo itself isn't changed to use the new flag
17:12:49 <dansmith> tssurya: apologies :)
17:13:03 <tssurya> dansmith: no, it's fine, I will ping you when I do that
17:13:09 <bauzas> mriedem: sure, I'm saying they could magically test things
17:13:21 <dansmith> tssurya: ack
17:13:49 <tssurya> dansmith: btw, this has been there for some time - https://review.openstack.org/#/c/519275/, I think you and the submitter disagree on some things
17:14:02 <tssurya> just bringing it to your notice
17:14:24 <dansmith> tssurya: ah okay I'll look
17:14:44 <dansmith> the other two non-WIP ones have +2s from me, so other people could slam them in easily
17:14:59 <dansmith> I have updated the priorities etherpad to highlight the ones that need one more +2
17:15:02 <dansmith> or did yesterday
17:15:26 <tssurya> ack
17:15:32 <dansmith> any other bugs to highlight?
17:15:43 <tssurya> no
17:16:01 <dansmith> #topic open reviews
17:16:23 <dansmith> So edleafe has got the member_of api patch up and close to being merged, which is required for my placement request filter stuff
17:16:33 <dansmith> I've rebased on it, but it has moved a bit since I did
17:16:44 <dansmith> but once that merges, the rest will be ready for review, up until the last WIP at the end
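(For reference, member_of was being added to GET /allocation_candidates in placement microversion 1.21 that cycle; a hedged example of the kind of query the request filters rely on, with placeholder endpoint, token, and aggregate UUIDs:)

```python
import requests

# Placeholders; real code would use a keystoneauth session and get the
# endpoint from the service catalog.
token = 'gAAAA...'
agg_uuids = ['agg-uuid-1', 'agg-uuid-2']

# Ask placement only for candidates in one of the given aggregates.
resp = requests.get(
    'http://placement.example/allocation_candidates',
    params={'resources': 'VCPU:1,MEMORY_MB:512',
            'member_of': 'in:' + ','.join(agg_uuids)},
    headers={'X-Auth-Token': token,
             'OpenStack-API-Version': 'placement 1.21'})
```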
17:16:55 <dansmith> I was thinking about how to test this, other than just with functional tests
17:17:28 <dansmith> we could potentially do it in tempest by creating aggregates and making sure a build is refused when there's no matching tenant aggregate, but otherwise it'd be a little hard
17:17:35 <mriedem> you probably missed my comments in https://review.openstack.org/#/c/545002/ since i didn't vote
17:17:38 <dansmith> any comments or opinions on how that should be done?
17:17:44 <mriedem> but i was going to ask for functional tests for all of the new filters too
17:17:53 <dansmith> mriedem: ah yeah
17:17:58 <mriedem> i don't think we need tempest for these filters
17:18:04 <dansmith> aight cool
17:18:20 <mriedem> the shit gibi is working on with port bw allocation crap, that's definitely tempest territory
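(For readers following along: a request filter here is just a function that massages the RequestSpec before the placement query is built; the tenant one effectively turns aggregate metadata into a member_of constraint. A hedged sketch only; the metadata key convention and the Destination handling are assumptions about the in-review series, not its final form:)

```python
from nova import exception
from nova import objects

TENANT_METADATA_KEY = 'filter_tenant_id'  # assumed convention

def require_tenant_aggregate(ctxt, request_spec):
    # Restrict scheduling to aggregates tagged with the request's tenant.
    aggregates = objects.AggregateList.get_by_metadata(
        ctxt, key=TENANT_METADATA_KEY, value=request_spec.project_id)
    if not aggregates:
        raise exception.RequestFilterFailed(
            reason='No aggregate for tenant %s' % request_spec.project_id)
    if ('requested_destination' in request_spec
            and request_spec.requested_destination is not None):
        destination = request_spec.requested_destination
    else:
        destination = objects.Destination()
    destination.require_aggregates([agg.uuid for agg in aggregates])
    request_spec.requested_destination = destination
```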
17:18:52 <dansmith> any other open reviews people want to highlight?
17:19:13 <mriedem> uh yeah
17:19:21 <mriedem> https://review.openstack.org/#/q/topic:bp/remove-nova-network+(status:open+OR+status:merged)
17:19:31 <mriedem> that moves the cells v1 job in tree
17:19:32 <dansmith> oh my
17:19:36 <mriedem> and then changes it to use neutron
17:19:40 <dansmith> does it work?
17:19:42 <mriedem> yeah
17:19:45 <dansmith> amazing
17:19:52 * dansmith puts that on the prio etherpad
17:19:53 <mriedem> i've rechecked a few times and hit some race failures,
17:20:10 <mriedem> but i've fixed up the config and blacklist, so i don't think those should be a problem now,
17:20:19 <dansmith> awesome
17:20:23 <mriedem> problem was some tempest tests were looking for active ports, assuming that because the server was active the ports would be too
17:20:41 <dansmith> I can't believe that wasn't more of a giant dumpster fire
17:20:47 <mriedem> me neither
17:20:49 <dansmith> makes me suspicious ;)
17:20:58 <mriedem> i have https://blueprints.launchpad.net/nova/+spec/remove-nova-network for tracking removal work which is a specless bp
17:21:00 <mriedem> on the agenda for thursday
17:21:09 <dansmith> okay cool
17:21:11 <mriedem> unless melwitt wants to just make an executive decision now
17:21:26 <melwitt> I just approved it since we agreed at the PTG
17:21:37 <mriedem> alright
17:21:38 <mriedem> cool
17:21:42 <dansmith> the other one we need to make sure gets approved is the xenapi one that will lead to removing the upcall
17:21:49 <bauzas> I proxy my +1 here for the nova-net removal
17:22:03 <mriedem> proxy?
17:22:05 <mriedem> for yourself?
17:22:07 <bauzas> given I won't attend the meeting
17:22:09 <mriedem> oh
17:22:10 <mriedem> heh
17:22:40 <mriedem> next time i ask laura how she spent $200 at target i'll preface with "i'll let my proxy ask this"
17:22:48 <dansmith> okay any other open reviews?
17:23:18 <bauzas> mriedem: uh
17:23:40 <mriedem> end it
17:23:47 <dansmith> #topic open discussion
17:24:04 <dansmith> melwitt: want to discuss the hash ring stuff, or just leave you to read the bug?
17:24:04 <tssurya> Question - do we still plan to keep the NewtonCellsCheck test in Rocky? -> https://github.com/openstack/nova/blob/master/nova/tests/unit/db/test_sqlalchemy_migration.py#L356 ; it uses a particularly old DB version, so it fails with my new DB version, and I'm not sure how to handle it because just updating it seems wrong.
17:24:21 <tssurya> dansmith: oops, after you
17:24:29 <melwitt> dansmith: I'll read the bug. thanks tho
17:24:33 <dansmith> melwitt: ack
17:24:38 <mriedem> tssurya: do you have a patch where that's failing? otherwise i don't understand the problem.
17:24:49 <tssurya> http://logs.openstack.org/05/552505/1/check/openstack-tox-py27/d5ecd2c/testr_results.html.gz
17:25:29 <dansmith> ah because the model is changing
17:25:47 <dansmith> the last time we hit something like this we just removed the ancient check I think
17:26:04 <dansmith> the only other option is to convert it to straight sqla I think,
17:26:09 <dansmith> and in that case at least, we just removed it
17:26:14 <dansmith> it being the test
17:26:20 <mriedem> i think i've converted to using straight sqla
17:26:25 <mriedem> in at least one case like this
17:26:43 <dansmith> yeah for flavors I did I think
17:26:49 <mriedem> https://github.com/openstack/nova/commit/3ca7eaab0287ce3f6d556baf0d1e0bb2f9d8aeb5#diff-43368cee9c9999756b4b7d140ef1055aR385
17:26:52 <dansmith> because we had default flavors in the initial migration or something
17:27:27 <tssurya> hmm, so should I do something similar and keep the test?
17:27:36 <dansmith> mriedem: that is the opposite problem here though
17:27:46 <mriedem> how?
17:27:58 <mriedem> i added description to the flavors table, and an older migration using the model couldn't handle that
17:28:00 <mriedem> same issue here
17:28:05 <dansmith> it's not doing create,
17:28:13 <dansmith> it's doing save as part of the test
17:28:24 <mriedem> https://github.com/openstack/nova/blob/master/nova/tests/unit/db/test_sqlalchemy_migration.py#L440
17:28:25 <dansmith> you'd have to convert the test itself to not use the objects
17:28:27 <mriedem> it's failing on create
17:28:48 <dansmith> oh you're right, sorry I was looking at models.py::save()
17:28:53 <dansmith> File "/home/zuul/src/git.openstack.org/openstack/nova/.tox/py27/local/lib/python2.7/site-packages/oslo_db/sqlalchemy/models.py", line 50, in save
17:28:54 <dansmith> that line
17:29:02 <dansmith> but it's create in the object yeah
17:29:21 <mriedem> i think the test is just using the object as a convenience
17:29:28 <dansmith> it is yeah
17:29:48 <mriedem> changing all of those create calls could be annoying
17:30:03 <mriedem> unless you monkeypatched it or something
17:30:27 <mriedem> idk, i generally don't like just deleting tests for code that we still have in tree
17:30:43 <mriedem> especially now that branches are never going away :)
17:30:56 <tssurya> oh yea
17:30:58 <dansmith> it's not that many cases, it's quite doable
17:31:14 <mriedem> tssurya: i could help out if you need it by putting a patch at the bottom of your series to do that change
17:31:30 <tssurya> mriedem : awesome thanks!
17:31:38 <mriedem> better than looking at gd specs anyway
17:32:11 <tssurya> mriedem: I already have a 'needs help' item for you in the priorities etherpad
17:32:24 <mriedem> ooo i'm honored
17:32:28 <tssurya> regarding another test :)
17:32:36 <dansmith> mriedem: 0ce4dff41f8d38edf790a301ec8e7040b279d65a
17:32:38 <tssurya> so I will add this one to the list too
17:32:47 <dansmith> mriedem: that's the one I was thinking of where we just removed the test
17:32:55 <dansmith> but agree, if it's trivial to keep it, by all means
17:33:10 <mriedem> tssurya: ok https://review.openstack.org/#/c/546660/ yeah i can help out with that after this
17:33:16 <mriedem> i do enjoy writing tests...
17:33:17 <tssurya> I am up for removing it, considering it's a Newton test
17:33:34 <mriedem> tssurya: but someone could skip newton
17:33:38 <mriedem> and go directly to pike or some crap
17:33:55 <tssurya> mriedem : yea your call, :)
17:34:09 <mriedem> i'll push a patch to convert that test to use sqla
17:34:11 <mriedem> see how bad it is
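(The conversion pattern being volunteered for here, and the one used in the flavors commit linked above, is roughly: build the fixture rows with SQLAlchemy against the schema as it exists in the test database, instead of through versioned objects, so later model changes like a new column can't break the old-migration test. A minimal sketch under the era's SQLAlchemy API; the helper name is illustrative:)

```python
from sqlalchemy import MetaData, Table

def create_cell_mapping(engine, **values):
    # Reflect the table as it exists in the schema under test rather than
    # going through objects.CellMapping(...).create().
    meta = MetaData(bind=engine)
    cell_mappings = Table('cell_mappings', meta, autoload=True)
    engine.execute(cell_mappings.insert().values(**values))
```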
17:34:14 <tssurya> thanks
17:34:54 <dansmith> okay anything else for open discussion?
17:35:00 <tssurya> nope
17:35:51 <dansmith> #endmeeting