15:01:37 <ihrachys> #startmeeting neutron_upgrades
15:01:38 <openstack> Meeting started Mon Jul 18 15:01:37 2016 UTC and is due to finish in 60 minutes. The chair is ihrachys. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:40 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:43 <openstack> The meeting name has been set to 'neutron_upgrades'
15:01:48 <slunkad_> hello
15:01:57 <korzen> Hi ihrachys, good to see you back in business
15:02:11 <ihrachys> it's good indeed to be back! :)
15:02:27 * ihrachys waves at sc68cal and rossella_s
15:02:43 <rossella_s> hi ihrachys and all
15:02:46 <jlibosva> hello
15:03:12 <ihrachys> #link https://wiki.openstack.org/wiki/Meetings/Neutron-Upgrades-Subteam Agenda
15:03:18 <ihrachys> #topic Announcements
15:03:29 <ihrachys> not much from me. we delivered N2, and N3 is in August.
15:04:27 <ihrachys> we should try to land subnet and maybe port till then
15:04:33 <ihrachys> #topic Partial Multinode Grenade
15:04:40 <ihrachys> there was some progress on that one lately
15:05:15 <ihrachys> linuxbridge flavour added to experimental: https://review.openstack.org/336793 and https://review.openstack.org/340962
15:05:26 <ihrachys> that said, the job currently fails on three tests accessing FIP
15:05:37 <ihrachys> with ssh timeout
15:05:37 <ihrachys> suggesting another MTU issue :)
15:05:40 <ihrachys> speaking of MTU...
15:06:01 <ihrachys> we had a bad week for multinode grenade, the job started to misbehave
15:06:17 <ihrachys> #link https://bugs.launchpad.net/neutron/+bug/1603268
15:06:17 <openstack> Launchpad bug 1603268 in neutron "Unstable grenade multinode job" [Critical,In progress] - Assigned to Ihar Hrachyshka (ihar-hrachyshka)
15:06:57 <ihrachys> the current theory is, we issue ssh into FIP attached to an instance using br-ex, and the bridge has mtu = 1500, while underlying network is at 1450
15:07:07 <ihrachys> so requests don't pass thru
15:07:22 <ihrachys> we are not yet sure how it still works, even if unstable
15:07:31 <ihrachys> anyhow, there is a set of patches that should get it back to decency
15:07:41 <ihrachys> starting at https://review.openstack.org/#/c/343024/ plus Depends-On links
15:08:31 <ihrachys> overall, there are some things to tweak in devstack-gate to fix mtu sanity, like properly configuring neutron with global_physnet_mtu instead of network_device_mtu: https://review.openstack.org/342975
15:08:51 <ihrachys> that plus some other cleanup bits should hopefully get the job in better shape
15:09:01 <ihrachys> including linuxbridge one
15:09:12 <ihrachys> I will recheck the latter once we settle down the ovs one that is in gate.
15:09:16 <korzen> MTU.. never ending story ;)
15:09:49 <ihrachys> aye. I have like 8 patches related to MTU sitting in my queue :)
15:10:07 <ihrachys> there were also some good news about dvr
15:10:12 <ihrachys> thanks to sc68cal, we now have it voting: https://review.openstack.org/336116
15:10:14 <ihrachys> YAY
15:10:23 <korzen> :)
15:10:29 <ihrachys> so both legacy and dvr upgrades are covered in gate.
15:10:59 <ihrachys> I guess the plan now is to monitor both, then later consider dropping the legacy one (at least that's what I saw in the etherpad crafted by armax and sc68cal)
15:11:23 <korzen> dvr should be covering the legacy...
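For reference, the 1500-vs-1450 mismatch in the MTU theory above follows from tunnel encapsulation overhead. The sketch below assumes VXLAN over IPv4 with no IP options or VLAN tags and uses the standard on-wire header sizes (20-byte outer IPv4, 8-byte UDP, 8-byte VXLAN, 14-byte inner Ethernet); the helper names are hypothetical and are not Neutron's API.

```python
"""Sketch: why a 1500-byte MTU bridge misbehaves over a 1450-byte underlay.

Assumption: VXLAN over IPv4, no IP options, no VLAN tag. Header sizes are
standard on-wire values, not constants imported from Neutron.
"""

OUTER_IPV4 = 20      # outer IPv4 header
OUTER_UDP = 8        # outer UDP header
VXLAN_HEADER = 8     # VXLAN header
INNER_ETHERNET = 14  # encapsulated Ethernet header

VXLAN_OVERHEAD = OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET


def inner_mtu(physical_mtu: int, overhead: int = VXLAN_OVERHEAD) -> int:
    """Largest inner IP packet that fits in one outer packet."""
    return physical_mtu - overhead


def fits(packet_size: int, path_mtu: int) -> bool:
    """True if a packet of packet_size bytes crosses path_mtu unfragmented."""
    return packet_size <= path_mtu


if __name__ == "__main__":
    underlay = inner_mtu(1500)
    print(f"VXLAN overhead: {VXLAN_OVERHEAD} bytes, underlay MTU: {underlay}")
    # A bridge configured at 1500 may emit full-size packets (for example
    # large SSH key-exchange messages) that do not fit the 1450 path.
    for size in (1400, 1450, 1500):
        print(size, "fits" if fits(size, underlay) else "dropped or fragmented")
```

Small packets (pings, the initial TCP handshake) still get through, which is one plausible reason a job like this fails intermittently rather than always.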
15:11:37 <ihrachys> korzen: elaborate
15:11:40 <korzen> the question is how much DVR is dvr job now
15:12:16 <korzen> current dvr job in my understanding is running legacy too
15:12:29 <korzen> because it has smoke tests
15:12:39 <korzen> which is using legacy
15:12:57 <ihrachys> korzen: ok, so you suggest we may have testing gaps in dvr cases in dvr job.
15:13:15 <korzen> but I'm not sure if smoke will create legacy or neutron is configured to launch dvr routers?
15:14:04 <korzen> grenade is running the general tempest tests, and I'm not sure if dvr specific tests are launched
15:14:17 <ihrachys> yeah, something to consider. also of interest for 'long standing resources' that are created by grenade and are, I suspect, all legacy (if we create routers at all)
15:14:59 <ihrachys> tests executed in dvr jobs: http://logs.openstack.org/58/342958/3/check/gate-grenade-dsvm-neutron-dvr-multinode/df59b0d/logs/testr_results.html.gz
15:15:25 <ihrachys> it's really basic so far.
15:15:58 <ihrachys> speaking of which, I think grenade does not even execute smoke tests from neutron tree right now
15:16:08 <ihrachys> there is a patch to fix that: https://review.openstack.org/#/c/337372/
15:16:36 <ihrachys> without it, our smoke tags are of no use for grenade runs
15:17:08 <ihrachys> ok, it's something to consider in the next iterations around those jobs. overall it's cool to see progress.
15:17:18 <korzen> yeap
15:17:22 <ihrachys> #topic Object implementation
15:18:07 <ihrachys> as I mentioned before, we are short on time, and we should really strive to land some big objects, subnet and port are likely the best candidates at this point.
15:18:28 <ihrachys> now that I am back, I started looking at subnet set of patches from korzen and will try to avoid slowing down Artur
15:18:47 <ihrachys> rossella_s: korzen has a bunch of patches in his queue that I believe are mergeable, I marked some with +2
15:18:57 <ihrachys> would be cool to try to flush those asap
15:19:00 <korzen> I'm constantly debugging and fixing found issues
15:19:07 <korzen> for subnet
15:19:41 <rossella_s> ihrachys, ack...I will look at them tomorrow
15:20:07 <ihrachys> ok, I think with korzen on subnet, it's covered.
15:20:22 <ihrachys> what about port? I believe jlibosva did not have time to revive it?
15:20:25 <slunkad_> I guess we are only waiting for one testcase for the sg patch .. korzen I don't really understand what you have done in the _load_shared method for subnet
15:20:31 <jlibosva> nope, I didn't
15:20:59 <ihrachys> jlibosva: so you either step in now, or risk me taking it over!!!1! :)
15:21:13 <jlibosva> ihrachys: I think it would be better if you take over
15:21:50 <ihrachys> slunkad_: I see -1 at https://review.openstack.org/#/c/284738/ that is there for some time. are you planning to respin it this week so that we can land it?
15:21:58 <ihrachys> jlibosva: yessir
15:22:06 <jlibosva> thanks and sorry :)
15:22:40 <slunkad_> ihrachys: yes I am, we discussed in the last meeting about the testcase that is needed for is_default
15:22:47 <ihrachys> slunkad_: also, we were suggesting before that new patches should contain code that integrate those objects in db code. is it on your radar?
15:23:12 <slunkad_> ihrachys: yes surely, after the sg is merged
15:23:37 <ihrachys> slunkad_: it would be better to have both pieces ready to land before we push buttons.
15:23:38 <slunkad_> kong: if you could explain a little more what exactly you are doing for subnet it would help me to move forward
15:23:52 <ihrachys> slunkad_: at least to grasp if the initial version of the object is good enough to be used at least for something.
15:24:07 <ihrachys> slunkad_: otherwise it's code that hangs untangled from the actual neutron-serve
15:24:09 <ihrachys> *server
15:24:12 <slunkad_> ihrachys: oh ok, then I can start working on it sooner
15:24:23 <ihrachys> slunkad_: thanks
15:24:29 <ihrachys> slunkad_: I see you already have a WIP for that
15:24:39 <slunkad_> ihrachys: yes but that is really really old
15:25:28 <slunkad_> s/kong/korzen
15:25:29 <korzen> sorry I've got disconnected
15:25:43 <ihrachys> slunkad_: I am not following how korzen shared field relates to sg patch you have. can you elaborate?
15:26:22 <korzen> ihrachys, I was suggesting for is-default attribute to be implemented similarly like subnet's shared
15:26:36 <slunkad_> ihrachys: apparently since is_default is only handled in create
15:26:45 <korzen> and I mean the load_shared logic
15:27:35 <korzen> we need to write special loading method for synthetic field that has not an OVO
15:27:55 <slunkad_> korzen: I guess you mean just the skeleton of load_shared because the stuff in it looks quite different
15:28:50 <korzen> what I mean is to create method load_default() and put it in from_db_obj() overridden method in SG class
15:29:24 <slunkad_> korzen: ok
15:30:40 <ihrachys> ok I hope we are clear on the way forward there.
15:31:43 <ihrachys> there are other object patches in the queue, though I haven't reached them just yet. I try to stick to what is most critical, which is subnet and port in my world.
15:31:55 <ihrachys> that said, we'll get there with reviews for other patches.
15:32:07 <ihrachys> #topic Other patches on review
15:32:27 <ihrachys> I had something related to mtu, but now in database.
15:32:53 <ihrachys> so in mitaka, we changed a way we configure mtu. new networks got correct mtu values calculated and stored in db.
15:33:18 <ihrachys> now, for existing networks created before fixes, db contains bad values (like 0) that break backend mtu setup.
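korzen's suggestion in the object discussion (fill the synthetic is_default field from an overridden from_db_object(), called from_db_obj() in the log, through a dedicated load_default() method, in the spirit of subnet's _load_shared) can be sketched as below. This is a self-contained illustration, not Neutron or oslo.versionedobjects code: the dataclass, the dict-based DB row, and the DEFAULT_SG_ROWS table (tenant to default security group id) are all hypothetical stand-ins.

```python
"""Sketch: loading a synthetic field that has no object of its own.

Hypothetical stand-ins: a plain dict plays the DB row, and DEFAULT_SG_ROWS
plays a per-tenant "default security group" table. Not Neutron's real API.
"""
from dataclasses import dataclass
from typing import Dict

# Hypothetical table: tenant_id -> id of that tenant's default SG.
DEFAULT_SG_ROWS: Dict[str, str] = {"tenant-a": "sg-1"}


@dataclass
class SecurityGroup:
    id: str
    tenant_id: str
    name: str
    # Synthetic field: not a column on the security group row itself.
    is_default: bool = False

    @classmethod
    def from_db_object(cls, db_obj: dict) -> "SecurityGroup":
        """Build the object from a DB row, then fill synthetic fields."""
        obj = cls(id=db_obj["id"], tenant_id=db_obj["tenant_id"],
                  name=db_obj["name"])
        obj.load_default()
        return obj

    def load_default(self) -> None:
        """Derive is_default by looking the SG up in the default table."""
        self.is_default = DEFAULT_SG_ROWS.get(self.tenant_id) == self.id
```

Keeping the lookup in its own method means every path that builds the object from the database fills the field the same way, instead of only the create path setting it.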
15:33:30 <ihrachys> and those networks have not migrated just yet to correct values.
15:33:59 <ihrachys> so, I have a patch that makes mtu field not persisted in db but calculated on demand on every network fetch: https://review.openstack.org/336805
15:34:37 <ihrachys> basically, for upgrade matters, it means that after upgrade to N, old networks may have MTUs changed (as returned by neutron-server)
15:34:48 <ihrachys> now, if we talk about api only, it's all good and done.
15:35:37 <ihrachys> but then, we have agents still running with bad MTU applied. the question is, should we do something more than just fixing it on neutron-server, like triggering MTU reset on agents somehow?
15:35:50 <ihrachys> how do we handle such changes in general?
15:36:35 <korzen> I guess we can resync
15:36:48 <korzen> then agents will get new values of MTU
15:37:05 <korzen> it should be maybe exposed via CLI?
15:37:12 <korzen> to forve the sync on old agent?
15:37:19 <korzen> force*
15:37:20 <ihrachys> will agents always trigger full resync on restart? or do they sometimes apply some graceful techniques to skip some of work?
15:37:47 <ihrachys> because upgrade kinda requires eventual restart of an agent, and maybe it's good enough.
15:37:57 <korzen> that is one option
15:38:04 <korzen> but what about old agents?
15:38:07 <ihrachys> rossella_s: comments?
15:38:34 <korzen> if you will leave some L2 agents running for longer period before upgrade?
15:38:37 <ihrachys> korzen: in a way, they will stay broken (using mixed bad and good MTUs) until they are upgraded.
15:39:00 <rossella_s> ihrachys, the l2 agent will always full resync
15:39:22 <ihrachys> korzen: for new ports on a network handled by new neutron-server, MTU will be correct; only devices that were created before neutron-server is upgraded will have bad MTUs.
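The approach discussed above (compute the MTU on every network fetch rather than trusting a stored value such as 0, then rely on a full agent resync to correct already-plugged devices) can be illustrated with a minimal sketch. It is not the patch under review: the per-type overhead table, the dict-based network rows, and the function names are assumptions for illustration, and the resync step only builds `ip link set dev <name> mtu <value>` commands (standard iproute2 syntax) without executing them.

```python
"""Sketch: MTU computed on demand, plus an agent-side resync pass.

Assumptions (hypothetical): networks are plain dicts, overheads below are
per network type over IPv4, and commands are built but never executed.
"""
from typing import Dict, List

# Assumed encapsulation overhead per network type, in bytes.
OVERHEAD = {"flat": 0, "vlan": 0, "vxlan": 50}


def network_mtu(network: dict, global_physnet_mtu: int = 1500) -> int:
    """Calculate MTU on demand; any stored 'mtu' value (e.g. 0) is ignored."""
    return global_physnet_mtu - OVERHEAD[network["network_type"]]


def fetch_network(db_row: dict, global_physnet_mtu: int = 1500) -> dict:
    """Return the API view of a network with a freshly computed MTU."""
    result = dict(db_row)
    result["mtu"] = network_mtu(db_row, global_physnet_mtu)
    return result


def resync_commands(applied: Dict[str, int], expected: int) -> List[List[str]]:
    """Commands a full resync would run for devices with a stale MTU."""
    return [["ip", "link", "set", "dev", dev, "mtu", str(expected)]
            for dev, mtu in sorted(applied.items()) if mtu != expected]


if __name__ == "__main__":
    row = {"id": "net-1", "network_type": "vxlan", "mtu": 0}  # stale DB value
    net = fetch_network(row)
    print(net["mtu"])
    # tap1 predates the upgrade, tap2 was plugged by the new server.
    print(resync_commands({"tap1": 1500, "tap2": 1450}, net["mtu"]))
```

Until the resync runs on a given agent, its old devices keep their stale values while new ports get the computed one, which is the mixed-MTU window raised in the discussion.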
15:39:37 <ihrachys> the mixed nature of MTU of course will make debugging issues fun
15:40:05 <korzen> ihrachys, sounds like ops are going to have more fun because of us :)
15:40:15 <ihrachys> but then, an argument can be made that 'some ports have broken mtu' is better than 'all ports have broken mtu' :)
15:40:41 <ihrachys> rossella_s: I assume it's the same for l3 and dhcp
15:41:07 <rossella_s> ihrachys, I guess so
15:42:02 <korzen> it is like L2 is wiring them
15:42:11 <korzen> so the L2 is enough
15:42:35 <ihrachys> l3 router also does its job though, since it handles namespaces.
15:42:44 <ihrachys> they all rely on interface_driver
15:43:20 <korzen> so where is exactly MTU set?
15:44:01 <ihrachys> on all ports plugged into a network. it's both router/dhcp ports as well as instance taps
15:44:20 <ihrachys> for tap, it's both nova and neutron that handle it.
15:44:26 <korzen> ihrachys, but where in code?
15:44:47 <korzen> sorry if I'm asking obvious things
15:44:54 <ihrachys> neutron/agent/linux/interface.py is for interface_drivers
15:45:00 <korzen> ok
15:45:14 <ihrachys> for hybrid bridge, it's somewhere in nova tree, not neutron. though neutron provides the value to apply.
15:46:16 <ihrachys> anyway, seems like we may assume a resync. mixed mtus are not the best thing I would like to see, but even if we trigger resync somehow, it may still take time to apply, in the meantime we are still left with mixed mtus.
15:47:59 <ihrachys> ok, any more upgrade related patches to discuss?
15:48:26 <ihrachys> #topic Open discussion
15:48:43 <ihrachys> I see two patches from korzen in the agenda: https://review.openstack.org/334380 and https://review.openstack.org/334381
15:48:50 <ihrachys> I already reviewed both and they are good
15:48:59 <ihrachys> korzen: do we need anything there apart from +W?
15:49:08 <ihrachys> or is there something to discuss?
15:49:20 <korzen> they are ok
15:49:51 <ihrachys> ok, anything more to discuss?
15:50:09 <korzen> I guess not
15:50:29 <ihrachys> I bet not. ok, let's focus on subnet and port and make progress there.
15:50:35 <ihrachys> thanks everyone!
15:50:36 <ihrachys> #endmeeting