15:01:37 <ihrachys> #startmeeting neutron_upgrades
15:01:38 <openstack> Meeting started Mon Jul 18 15:01:37 2016 UTC and is due to finish in 60 minutes.  The chair is ihrachys. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:40 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:43 <openstack> The meeting name has been set to 'neutron_upgrades'
15:01:48 <slunkad_> hello
15:01:57 <korzen> Hi ihrachys, good to see you back in business
15:02:11 <ihrachys> it's good indeed to be back! :)
15:02:27 * ihrachys waves at sc68cal and rossella_s
15:02:43 <rossella_s> hi ihrachys and all
15:02:46 <jlibosva> hello
15:03:12 <ihrachys> #link https://wiki.openstack.org/wiki/Meetings/Neutron-Upgrades-Subteam Agenda
15:03:18 <ihrachys> #topic Announcements
15:03:29 <ihrachys> not much from me. we delivered N2, and N3 is in August.
15:04:27 <ihrachys> we should try to land subnet and maybe port by then
15:04:33 <ihrachys> #topic Partial Multinode Grenade
15:04:40 <ihrachys> there was some progress on that one lately
15:05:15 <ihrachys> linuxbridge flavour added to experimental: https://review.openstack.org/336793 and https://review.openstack.org/340962
15:05:26 <ihrachys> that said, the job currently fails on three tests accessing FIP
15:05:37 <ihrachys> with ssh timeout
15:05:37 <ihrachys> suggesting another MTU issue :)
15:05:40 <ihrachys> speaking of MTU...
15:06:01 <ihrachys> we had a bad week for multinode grenade, the job started to misbehave
15:06:17 <ihrachys> #link https://bugs.launchpad.net/neutron/+bug/1603268
15:06:17 <openstack> Launchpad bug 1603268 in neutron "Unstable grenade multinode job" [Critical,In progress] - Assigned to Ihar Hrachyshka (ihar-hrachyshka)
15:06:57 <ihrachys> the current theory is: we ssh into a FIP attached to an instance via br-ex, and the bridge has mtu = 1500 while the underlying network is at 1450
15:07:07 <ihrachys> so requests don't get through
15:07:22 <ihrachys> we are not yet sure how it still works at all, even if unstably
15:07:31 <ihrachys> anyhow, there is a set of patches that should get it back to decency
15:07:41 <ihrachys> starting at https://review.openstack.org/#/c/343024/ plus Depends-On links
15:08:31 <ihrachys> overall, there are some things to tweak in devstack-gate to fix mtu sanity, like properly configuring neutron with global_physnet_mtu instead of network_device_mtu: https://review.openstack.org/342975
15:08:51 <ihrachys> that plus some other cleanup bits should hopefully get the job in better shape
15:09:01 <ihrachys> including linuxbridge one
15:09:12 <ihrachys> I will recheck the latter once the ovs one that is in the gate settles down.
15:09:16 <korzen> MTU.. never ending story ;)
15:09:49 <ihrachys> aye. I have like 8 patches related to MTU sitting in my queue :)
15:10:07 <ihrachys> there was also some good news about dvr
15:10:12 <ihrachys> thanks to sc68cal, we now have it voting: https://review.openstack.org/336116
15:10:14 <ihrachys> YAY
15:10:23 <korzen> :)
15:10:29 <ihrachys> so both legacy and dvr upgrades are covered in gate.
15:10:59 <ihrachys> I guess the plan now is to monitor both, then later consider dropping the legacy one (at least that's what I saw in the etherpad crafted by armax and sc68cal)
15:11:23 <korzen> dvr should be covering the legacy...
15:11:37 <ihrachys> korzen: elaborate
15:11:40 <korzen> the question is how much DVR the dvr job actually tests now
15:12:16 <korzen> in my understanding, the current dvr job is running legacy too
15:12:29 <korzen> because it has smoke tests
15:12:39 <korzen> which is using legacy
15:12:57 <ihrachys> korzen: ok, so you suggest we may have testing gaps in dvr cases in dvr job.
15:13:15 <korzen> but I'm not sure whether smoke tests will create legacy routers or neutron is configured to launch dvr routers?
15:14:04 <korzen> grenade is running the general tempest tests, and I'm not sure if dvr specific tests are launched
15:14:17 <ihrachys> yeah, something to consider. also of interest for 'long standing resources' that are created by grenade and are, I suspect, all legacy (if we create routers at all)
15:14:59 <ihrachys> tests executed in dvr jobs: http://logs.openstack.org/58/342958/3/check/gate-grenade-dsvm-neutron-dvr-multinode/df59b0d/logs/testr_results.html.gz
15:15:25 <ihrachys> it's really basic so far.
15:15:58 <ihrachys> speaking of which, I think grenade does not even execute smoke tests from neutron tree right now
15:16:08 <ihrachys> there is a patch to fix that: https://review.openstack.org/#/c/337372/
15:16:36 <ihrachys> without it, our smoke tags are of no use for grenade runs
15:17:08 <ihrachys> ok, it's something to consider in the next iterations around those jobs. overall it's cool to see progress.
15:17:18 <korzen> yeap
15:17:22 <ihrachys> #topic Object implementation
15:18:07 <ihrachys> as I mentioned before, we are short on time, and we should really strive to land some big objects, subnet and port are likely the best candidates at this point.
15:18:28 <ihrachys> now that I am back, I started looking at the subnet set of patches from korzen and will try to avoid slowing down Artur
15:18:47 <ihrachys> rossella_s: korzen has a bunch of patches in his queue that I believe are mergeable, I marked some with +2
15:18:57 <ihrachys> would be cool to try to flush those asap
15:19:00 <korzen> I'm constantly debugging and fixing found issues
15:19:07 <korzen> for subnet
15:19:41 <rossella_s> ihrachys, ack...I will look at them tomorrow
15:20:07 <ihrachys> ok, I think with korzen on subnet, it's covered.
15:20:22 <ihrachys> what about port? I believe jlibosva did not have time to revive it?
15:20:25 <slunkad_> I guess we are only waiting for one testcase for the sg patch. korzen, I don't really understand what you have done in the _load_shared method for subnet
15:20:31 <jlibosva> nope, I didn't
15:20:59 <ihrachys> jlibosva: so you either step in now, or risk me taking it over!!!1! :)
15:21:13 <jlibosva> ihrachys: I think it would be better if you take over
15:21:50 <ihrachys> slunkad_: I see a -1 at https://review.openstack.org/#/c/284738/ that has been there for some time. are you planning to respin it this week so that we can land it?
15:21:58 <ihrachys> jlibosva: yessir
15:22:06 <jlibosva> thanks and sorry :)
15:22:40 <slunkad_> ihrachys: yes I am, in the last meeting we discussed the testcase that is needed for is_default
15:22:47 <ihrachys> slunkad_: also, we were suggesting before that new patches should contain code that integrates those objects into db code. is it on your radar?
15:23:12 <slunkad_> ihrachys: yes surely, after the sg is merged
15:23:37 <ihrachys> slunkad_: it would be better to have both pieces ready to land before we push buttons.
15:23:38 <slunkad_> korzen: if you could explain a little more what exactly you are doing for subnet, it would help me move forward
15:23:52 <ihrachys> slunkad_: at least to grasp whether the initial version of the object is good enough to be used for something.
15:24:07 <ihrachys> slunkad_: otherwise it's code that hangs detached from the actual neutron-server
15:24:12 <slunkad_> ihrachys: oh ok, then I can start working on it sooner
15:24:23 <ihrachys> slunkad_: thanks
15:24:29 <ihrachys> slunkad_: I see you already have a WIP for that
15:24:39 <slunkad_> ihrachys: yes but that is really really old
15:25:29 <korzen> sorry I've got disconnected
15:25:43 <ihrachys> slunkad_: I am not following how korzen's shared field relates to the sg patch you have. can you elaborate?
15:26:22 <korzen> ihrachys, I was suggesting that the is_default attribute be implemented similarly to subnet's shared
15:26:36 <slunkad_> ihrachys: apparently since is_default is only handled in create
15:26:45 <korzen> and I mean the load_shared logic
15:27:35 <korzen> we need to write a special loading method for a synthetic field that does not have an OVO
15:27:55 <slunkad_> korzen: I guess you mean just the skeleton of load_shared, because the stuff in it looks quite different
15:28:50 <korzen> what I mean is to create a load_default() method and call it in the overridden from_db_obj() method in the SG class
15:29:24 <slunkad_> korzen: ok
15:30:40 <ihrachys> ok I hope we are clear on the way forward there.
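A minimal Python sketch of the loader pattern korzen describes for is_default, for illustration only. The classes below are hypothetical stand-ins, not the real neutron.objects or oslo.versionedobjects API, and the DB row is modeled as a plain dict. It assumes the row exposes a related default_security_group record when the group is its tenant's default; method names follow the discussion loosely.

```python
# Hypothetical sketch: a synthetic field (is_default) with no versioned
# object of its own is filled in by a dedicated loader that an overridden
# from_db_object() calls after the base class copies the plain columns.


class BaseObject:
    fields = ()

    def __init__(self):
        # Start with every declared field unset.
        for name in self.fields:
            setattr(self, name, None)

    def from_db_object(self, db_obj):
        # Copy only the fields that exist as columns in the DB row.
        for name in self.fields:
            if name in db_obj:
                setattr(self, name, db_obj[name])


class SecurityGroup(BaseObject):
    fields = ("id", "name", "tenant_id", "is_default")

    def from_db_object(self, db_obj):
        super().from_db_object(db_obj)
        self.load_default(db_obj)

    def load_default(self, db_obj):
        # is_default is not a column of the security group row; here it is
        # derived from an assumed related "default_security_group" record.
        self.is_default = bool(db_obj.get("default_security_group"))
```

The base class still handles the regular columns, so the override only has to add the one derived field, which keeps the synthetic-field logic in a single place.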
15:31:43 <ihrachys> there are other object patches in the queue, though I haven't reached them just yet. I try to stick to what is most critical, which is subnet and port in my world.
15:31:55 <ihrachys> that said, we'll get there with reviews for other patches.
15:32:07 <ihrachys> #topic Other patches on review
15:32:27 <ihrachys> I had something related to mtu, but now in database.
15:32:53 <ihrachys> so in mitaka, we changed the way we configure mtu. new networks got correct mtu values calculated and stored in db.
15:33:18 <ihrachys> now, for existing networks created before the fixes, db contains bad values (like 0) that break backend mtu setup.
15:33:30 <ihrachys> and those networks have not been migrated to correct values just yet.
15:33:59 <ihrachys> so, I have a patch that makes mtu field not persisted in db but calculated on demand on every network fetch: https://review.openstack.org/336805
15:34:37 <ihrachys> basically, for upgrades, it means that after upgrade to N, old networks may have MTUs changed (as returned by neutron-server)
15:34:48 <ihrachys> now, if we talk about api only, it's all good and done.
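A minimal Python sketch of the "calculate on every fetch" idea behind the patch above, under stated assumptions: the function names are hypothetical, not neutron's code; the only input is a global_physnet_mtu-style value; VXLAN overhead over IPv4 is taken as 50 bytes (30 bytes of VXLAN, UDP and inner Ethernet headers plus a 20-byte outer IPv4 header); and flat and vlan networks are assumed to add none. Whatever MTU is stored in the row, including bad 0 values from older releases, is ignored.

```python
# Hypothetical sketch: report a network's MTU from configuration on every
# fetch instead of trusting the value persisted in the database.

# Assumed per-type encapsulation overhead in bytes, for IPv4 underlays.
ENCAP_OVERHEAD = {
    "flat": 0,
    "vlan": 0,
    "vxlan": 30 + 20,  # VXLAN + UDP + inner Ethernet, plus outer IPv4 header
}


def calculate_network_mtu(network_type, global_physnet_mtu=1500):
    """Return the MTU to expose for a network of the given type."""
    return global_physnet_mtu - ENCAP_OVERHEAD[network_type]


def make_network_dict(db_row, global_physnet_mtu=1500):
    """Build the API view of a network row, recomputing 'mtu'."""
    network = dict(db_row)
    # Old rows may hold bogus values such as 0, so the stored value is
    # overwritten rather than read.
    network["mtu"] = calculate_network_mtu(
        db_row["network_type"], global_physnet_mtu)
    return network
```

With the default of 1500, a VXLAN network comes out at 1450, the same figure quoted for the underlying gate network earlier. Because nothing is persisted, old networks get new values as soon as neutron-server is upgraded, which is the API-side effect described in the log, while agents keep whatever they applied before.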
15:35:37 <ihrachys> but then, we have agents still running with bad MTU applied. the question is, should we do something more than just fixing it on neutron-server, like triggering MTU reset on agents somehow?
15:35:50 <ihrachys> how do we handle such changes in general?
15:36:35 <korzen> I guess we can resync
15:36:48 <korzen> then agents will get new values of MTU
15:37:05 <korzen> it should be maybe exposed via CLI?
15:37:12 <korzen> to force the sync on old agents?
15:37:20 <ihrachys> will agents always trigger full resync on restart? or do they sometimes apply some graceful techniques to skip some of work?
15:37:47 <ihrachys> because upgrade kinda requires eventual restart of an agent, and maybe it's good enough.
15:37:57 <korzen> that is one option
15:38:04 <korzen> but what about old agents?
15:38:07 <ihrachys> rossella_s: comments?
15:38:34 <korzen> what if you leave some L2 agents running for a longer period before upgrading?
15:38:37 <ihrachys> korzen: in a way, they will stay broken (using mixed bad and good MTUs) until they are upgraded.
15:39:00 <rossella_s> ihrachys, the l2 agent will always do a full resync
15:39:22 <ihrachys> korzen: for new ports on a network handled by new neutron-server, MTU will be correct; only devices that were created before neutron-server is upgraded will have bad MTUs.
15:39:37 <ihrachys> the mixed nature of MTU of course will make debugging issues fun
15:40:05 <korzen> ihrachys, sounds like ops are going to have more fun because of us :)
15:40:15 <ihrachys> but then, an argument can be made that 'some ports have broken mtu' is better than 'all ports have broken mtu' :)
15:40:41 <ihrachys> rossella_s: I assume it's the same for l3 and dhcp
15:41:07 <rossella_s> ihrachys, I guess so
15:42:02 <korzen> it is like L2 is wiring them
15:42:11 <korzen> so the L2 is enough
15:42:35 <ihrachys> l3 router also does its job though, since it handles namespaces.
15:42:44 <ihrachys> they all rely on interface_driver
15:43:20 <korzen> so where exactly is MTU set?
15:44:01 <ihrachys> on all ports plugged into a network. it's both router/dhcp ports as well as instance taps
15:44:20 <ihrachys> for tap, it's both nova and neutron that handle it.
15:44:26 <korzen> ihrachys, but where in code?
15:44:47 <korzen> sorry if I'm asking obvious things
15:44:54 <ihrachys> neutron/agent/linux/interface.py is for interface_drivers
15:45:00 <korzen> ok
15:45:14 <ihrachys> for hybrid bridge, it's somewhere in nova tree, not neutron. though neutron provides the value to apply.
15:46:16 <ihrachys> anyway, it seems like we may assume a resync. mixed mtus are not what I would like to see, but even if we trigger a resync somehow, it may still take time to apply, and in the meantime we are still left with mixed mtus.
15:47:59 <ihrachys> ok, any more upgrade related patches to discuss?
15:48:26 <ihrachys> #topic Open discussion
15:48:43 <ihrachys> I see two patches from korzen in the agenda: https://review.openstack.org/334380 and https://review.openstack.org/334381
15:48:50 <ihrachys> I already reviewed both and they are good
15:48:59 <ihrachys> korzen: do we need anything there apart from +W?
15:49:08 <ihrachys> or is there something to discuss?
15:49:20 <korzen> they are ok
15:49:51 <ihrachys> ok, anything more to discuss?
15:50:09 <korzen> I guess not
15:50:29 <ihrachys> I bet not. ok, let's focus on subnet and port and make progress there.
15:50:35 <ihrachys> thanks everyone!
15:50:36 <ihrachys> #endmeeting