15:01:37 #startmeeting neutron_upgrades
15:01:38 Meeting started Mon Jul 18 15:01:37 2016 UTC and is due to finish in 60 minutes. The chair is ihrachys. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:40 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:43 The meeting name has been set to 'neutron_upgrades'
15:01:48 hello
15:01:57 Hi ihrachys, good to see you back in business
15:02:11 it's good indeed to be back! :)
15:02:27 * ihrachys waves at sc68cal and rossella_s
15:02:43 hi ihrachys and all
15:02:46 hello
15:03:12 #link https://wiki.openstack.org/wiki/Meetings/Neutron-Upgrades-Subteam Agenda
15:03:18 #topic Announcements
15:03:29 not much from me. we delivered N2, and N3 is in August.
15:04:27 we should try to land subnet and maybe port till then
15:04:33 #topic Partial Multinode Grenade
15:04:40 there was some progress on that one lately
15:05:15 linuxbridge flavour added to experimental: https://review.openstack.org/336793 and https://review.openstack.org/340962
15:05:26 that said, the job currently fails on three tests accessing FIP
15:05:37 with ssh timeout
15:05:37 suggesting another MTU issue :)
15:05:40 speaking of MTU...
15:06:01 we had a bad week for multinode grenade, the job started to misbehave
15:06:17 #link https://bugs.launchpad.net/neutron/+bug/1603268
15:06:17 Launchpad bug 1603268 in neutron "Unstable grenade multinode job" [Critical,In progress] - Assigned to Ihar Hrachyshka (ihar-hrachyshka)
15:06:57 the current theory is, we issue ssh into FIP attached to an instance using br-ex, and the bridge has mtu = 1500, while underlying network is at 1450
15:07:07 so requests don't pass thru
15:07:22 we are not yet sure how it still works, even if unstable
15:07:31 anyhow, there is a set of patches that should get it back to decency
15:07:41 starting at https://review.openstack.org/#/c/343024/ plus Depends-On links
15:08:31 overall, there are some things to tweak in devstack-gate to fix mtu sanity, like properly configuring neutron with global_physnet_mtu instead of network_device_mtu: https://review.openstack.org/342975
15:08:51 that plus some other cleanup bits should hopefully get the job in better shape
15:09:01 including the linuxbridge one
15:09:12 I will recheck the latter once we settle down the ovs one that is in gate.
15:09:16 MTU.. never ending story ;)
15:09:49 aye. I have like 8 patches related to MTU sitting in my queue :)
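To make the mismatch concrete, here is a rough, hypothetical illustration of the failure mode described above. The 1500/1450 figures come from the discussion; attributing the 1450-byte path to VXLAN's ~50-byte IPv4 encapsulation overhead is an assumption, as are all names in the snippet.

```python
# Rough illustration of the MTU mismatch discussed above, not code from the
# grenade job itself. The 1450-byte path MTU is from the discussion; deriving
# it as a 1500-byte underlay minus VXLAN's ~50-byte IPv4 overhead is assumed.

VXLAN_IPV4_OVERHEAD = 50            # typical figure, treat as an assumption

underlay_mtu = 1500                 # inter-node network in the gate
tenant_path_mtu = underlay_mtu - VXLAN_IPV4_OVERHEAD   # 1450

br_ex_mtu = 1500                    # bridge carrying the ssh-to-FIP traffic

# Packets sized for the 1500-byte bridge exceed what the 1450-byte path can
# carry, so large frames (e.g. during the ssh key exchange) are dropped and
# the connection times out.
print(br_ex_mtu > tenant_path_mtu)  # True
```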
15:10:07 there was also some good news about dvr
15:10:12 thanks to sc68cal, we now have it voting: https://review.openstack.org/336116
15:10:14 YAY
15:10:23 :)
15:10:29 so both legacy and dvr upgrades are covered in gate.
15:10:59 I guess the plan now is to monitor both, then later consider dropping the legacy one (at least that's what I saw in the etherpad crafted by armax and sc68cal)
15:11:23 dvr should be covering the legacy...
15:11:37 korzen: elaborate
15:11:40 the question is how much DVR the dvr job really is now
15:12:16 the current dvr job, in my understanding, is running legacy too
15:12:29 because it has smoke tests
15:12:39 which are using legacy
15:12:57 korzen: ok, so you suggest we may have testing gaps in dvr cases in the dvr job.
15:13:15 but I'm not sure if smoke will create legacy routers or if neutron is configured to launch dvr routers?
15:14:04 grenade is running the general tempest tests, and I'm not sure if dvr specific tests are launched
15:14:17 yeah, something to consider.
also of interest for 'long standing resources' that are created by grenade and are, I suspect, all legacy (if we create routers at all)
15:14:59 tests executed in dvr jobs: http://logs.openstack.org/58/342958/3/check/gate-grenade-dsvm-neutron-dvr-multinode/df59b0d/logs/testr_results.html.gz
15:15:25 it's really basic so far.
15:15:58 speaking of which, I think grenade does not even execute smoke tests from the neutron tree right now
15:16:08 there is a patch to fix that: https://review.openstack.org/#/c/337372/
15:16:36 without it, our smoke tags are of no use for grenade runs
15:17:08 ok, it's something to consider in the next iterations around those jobs. overall it's cool to see progress.
15:17:18 yeap
15:17:22 #topic Object implementation
15:18:07 as I mentioned before, we are short on time, and we should really strive to land some big objects; subnet and port are likely the best candidates at this point.
15:18:28 now that I am back, I started looking at the subnet set of patches from korzen and will try to avoid slowing down Artur
15:18:47 rossella_s: korzen has a bunch of patches in his queue that I believe are mergeable, I marked some with +2
15:18:57 would be cool to try to flush those asap
15:19:00 I'm constantly debugging and fixing found issues
15:19:07 for subnet
15:19:41 ihrachys, ack...I will look at them tomorrow
15:20:07 ok, I think with korzen on subnet, it's covered.
15:20:22 what about port? I believe jlibosva did not have time to revive it?
15:20:25 I guess we are only waiting for one test case for the sg patch .. korzen I don't really understand what you have done in the _load_shared method for subnet
15:20:31 nope, I didn't
15:20:59 jlibosva: so you either step in now, or risk me taking it over!!!1! :)
15:21:13 ihrachys: I think it would be better if you take over
15:21:50 slunkad_: I see a -1 at https://review.openstack.org/#/c/284738/ that has been there for some time. are you planning to respin it this week so that we can land it?
15:21:58 jlibosva: yessir
15:22:06 thanks and sorry :)
15:22:40 ihrachys: yes I am, we discussed in the last meeting the testcase that is needed for is_default
15:22:47 slunkad_: also, we were suggesting before that new patches should contain code that integrates those objects into db code. is it on your radar?
15:23:12 ihrachys: yes surely, after the sg is merged
15:23:37 slunkad_: it would be better to have both pieces ready to land before we push buttons.
15:23:38 korzen: if you could explain a little more what exactly you are doing for subnet it would help me to move forward
15:23:52 slunkad_: at least to grasp if the initial version of the object is good enough to be used at least for something.
15:24:07 slunkad_: otherwise it's code that hangs untangled from the actual neutron-server
15:24:12 ihrachys: oh ok, then I can start working on it sooner
15:24:23 slunkad_: thanks
15:24:29 slunkad_: I see you already have a WIP for that
15:24:39 ihrachys: yes but that is really really old
15:25:29 sorry I've got disconnected
15:25:43 slunkad_: I am not following how korzen's shared field relates to the sg patch you have. can you elaborate?
15:26:22 ihrachys, I was suggesting for the is-default attribute to be implemented similarly to subnet's shared
15:26:36 ihrachys: apparently since is_default is only handled in create
15:26:45 and I mean the load_shared logic
15:27:35 we need to write a special loading method for a synthetic field that does not have an OVO
15:27:55 korzen: I guess you mean just the skeleton of _load_shared, because the stuff in it looks quite different
15:28:50 what I mean is to create a method load_default() and put it in the overridden from_db_obj() method in the SG class
15:29:24 korzen: ok
15:30:40 ok I hope we are clear on the way forward there.
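As a rough standalone sketch of the approach korzen outlines above (illustrative names only, not the actual neutron code): the synthetic is_default field has no versioned object of its own, so a dedicated loader is called from the overridden from_db_obj()-style hook, mirroring the _load_shared() pattern mentioned for Subnet.

```python
# Illustrative stand-in for the pattern discussed above, not actual neutron
# code: a synthetic field ("is_default") that has no OVO of its own gets a
# dedicated loader invoked from the from_db_obj()-style hook, similar to the
# _load_shared() helper mentioned for Subnet.


class FakeSecurityGroupDbRow:
    """Stand-in for the SQLAlchemy model handed to the object."""

    def __init__(self, id_, name, default_entry):
        self.id = id_
        self.name = name
        # In the real schema the "default" flag lives in a separate
        # association table; a plain attribute is enough for this sketch.
        self.default_entry = default_entry


class SecurityGroupSketch:
    fields = ('id', 'name', 'is_default')
    synthetic_fields = ('is_default',)   # not a plain column on the row

    def _load_is_default(self, db_obj):
        # Dedicated loader for the synthetic field, analogous to
        # _load_shared() on Subnet.
        self.is_default = db_obj.default_entry is not None

    def from_db_obj(self, db_obj):
        # Copy the regular columns, then fill in synthetic fields via
        # their own loaders.
        for field in self.fields:
            if field in self.synthetic_fields:
                continue
            setattr(self, field, getattr(db_obj, field))
        self._load_is_default(db_obj)


row = FakeSecurityGroupDbRow('uuid-1', 'default', default_entry=object())
sg = SecurityGroupSketch()
sg.from_db_obj(row)
print(sg.is_default)   # True
```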
15:31:43 there are other object patches in the queue, though I haven't reached them just yet. I try to stick to what is most critical, which is subnet and port in my world.
15:31:55 that said, we'll get there with reviews for other patches.
15:32:07 #topic Other patches on review
15:32:27 I had something related to mtu, but now in the database.
15:32:53 so in mitaka, we changed the way we configure mtu. new networks got correct mtu values calculated and stored in the db.
15:33:18 now, for existing networks created before the fixes, the db contains bad values (like 0) that break backend mtu setup.
15:33:30 and those networks have not yet migrated to correct values.
15:33:59 so, I have a patch that makes the mtu field not persisted in the db but calculated on demand on every network fetch: https://review.openstack.org/336805
15:34:37 basically, for upgrade matters, it means that after the upgrade to N, old networks may have their MTUs changed (as returned by neutron-server)
15:34:48 now, if we talk about api only, it's all good and done.
15:35:37 but then, we have agents still running with bad MTU applied. the question is, should we do something more than just fixing it on neutron-server, like triggering an MTU reset on agents somehow?
15:35:50 how do we handle such changes in general?
15:36:35 I guess we can resync
15:36:48 then agents will get the new values of MTU
15:37:05 maybe it should be exposed via CLI?
15:37:12 to force the sync on old agents?
15:37:20 will agents always trigger a full resync on restart? or do they sometimes apply some graceful techniques to skip some of the work?
15:37:47 because upgrade kinda requires an eventual restart of an agent, and maybe it's good enough.
15:37:57 that is one option
15:38:04 but what about old agents?
15:38:07 rossella_s: comments?
15:38:34 what if you leave some L2 agents running for a longer period before upgrade?
15:38:37 korzen: in a way, they will stay broken (using mixed bad and good MTUs) until they are upgraded.
15:39:00 ihrachys, the l2 agent will always fully resync
15:39:22 korzen: for new ports on a network handled by the new neutron-server, MTU will be correct; only devices that were created before neutron-server is upgraded will have bad MTUs.
15:39:37 the mixed nature of MTU of course will make debugging issues fun
15:40:05 ihrachys, sounds like ops are going to have more fun because of us :)
15:40:15 but then, an argument can be made that 'some ports have broken mtu' is better than 'all ports have broken mtu' :)
15:40:41 rossella_s: I assume it's the same for l3 and dhcp
15:41:07 ihrachys, I guess so
15:42:02 it is like L2 is wiring them
15:42:11 so the L2 is enough
15:42:35 the l3 router also does its job though, since it handles namespaces.
15:42:44 they all rely on interface_driver
15:43:20 so where exactly is MTU set?
15:44:01 on all ports plugged into a network.
it's both router/dhcp ports as well as instance taps
15:44:20 for the tap, it's both nova and neutron that handle it.
15:44:26 ihrachys, but where in code?
15:44:47 sorry if I'm asking obvious things
15:44:54 neutron/agent/linux/interface.py is for interface_drivers
15:45:00 ok
15:45:14 for the hybrid bridge, it's somewhere in the nova tree, not neutron. though neutron provides the value to apply.
15:46:16 anyway, it seems like we may assume a resync. mixed mtus are not the best thing I would like to see, but even if we trigger a resync somehow, it may still take time to apply, and in the meantime we are still left with mixed mtus.
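A minimal sketch of the idea behind the MTU patch discussed above (not the actual change in https://review.openstack.org/336805): derive the network MTU on every fetch from the configured underlay MTU (global_physnet_mtu, mentioned earlier) and the network type's encapsulation overhead, rather than trusting a stale or zero value persisted in the database. The overhead figures and function names below are assumptions for illustration.

```python
# Minimal sketch of the idea behind the patch discussed above, not the actual
# code in https://review.openstack.org/336805: the network MTU is computed on
# every fetch from the configured underlay MTU instead of being read from a
# possibly stale (or zero) value stored in the database.

# Typical IPv4 encapsulation overheads in bytes; treat the exact figures as
# assumptions, since they vary with the tunnel type and IP version.
ENCAP_OVERHEAD = {
    'flat': 0,
    'vlan': 0,
    'gre': 42,
    'vxlan': 50,
}


def network_mtu(network_type, global_physnet_mtu=1500):
    """Return the MTU to expose for a network of the given type."""
    return global_physnet_mtu - ENCAP_OVERHEAD.get(network_type, 0)


# A pre-Mitaka network that has 0 stored in the database still gets a sane
# value on fetch, which is the upgrade behaviour described above.
print(network_mtu('vxlan'))   # 1450
print(network_mtu('vlan'))    # 1500
```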
15:47:59 ok, any more upgrade related patches to discuss?
15:48:26 #topic Open discussion
15:48:43 I see two patches from korzen in the agenda: https://review.openstack.org/334380 and https://review.openstack.org/334381
15:48:50 I already reviewed both and they are good
15:48:59 korzen: do we need anything there apart from +W?
15:49:08 or is there something to discuss?
15:49:20 they are ok
15:49:51 ok, anything more to discuss?
15:50:09 I guess not
15:50:29 I bet not. ok, let's focus on subnet and port and make progress there.
15:50:35 thanks everyone!
15:50:36 #endmeeting