15:00:56 #startmeeting neutron_upgrades
15:01:02 Meeting started Mon Feb 6 15:00:56 2017 UTC and is due to finish in 60 minutes. The chair is ihrachys. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:03 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:06 The meeting name has been set to 'neutron_upgrades'
15:01:17 hello
15:01:18 hola
15:01:20 hi
15:01:25 hi
15:02:23 #link https://wiki.openstack.org/wiki/Meetings/Neutron-Upgrades-Subteam Agenda
15:02:37 hey everyone
15:02:40 #topic Announcements
15:02:54 rc1 is cut off master, so master is Pike now, and we can start landing patches as usual
15:03:20 we may have some preparation work before we merge at full steam, but overall, the branch is open
15:03:43 Hello
15:04:45 before we proceed, I'd like to run through PTG plans once again
15:04:55 #topic PTG in Atlanta
15:05:24 korzen: I think you were planning to walk through the agenda pad and write up something for our cause
15:05:30 https://etherpad.openstack.org/p/neutron-ptg-pike-upgrades
15:05:31 #link https://etherpad.openstack.org/p/neutron-ptg-pike PTG etherpad
15:05:42 korzen++
15:06:23 my intention was to have a more detailed technical identification of tasks to be done
15:07:06 to be done, or to be covered during the PTG?
15:07:50 to be done
15:08:17 putting the priority on online data migration
15:08:40 korzen: about online data migration, can you expand on that definition?
15:08:44 yes, that's the fuzziest topic. do we want to do prior research?
15:08:56 yes, I will do the research
15:10:05 korzen: thanks.
15:10:14 electrocucaracha, moving the data from one format to the newer one
15:10:23 online
15:10:25 ;)
15:11:18 we do not have a candidate for it now, but when one appears, the framework should be ready to use
15:11:36 I believe it comprises 1) some common hooking mechanism in get_object(s)/create/delete/update that would allow us to reduce the code needed for each case and 2) some hook in neutron-db-manage allowing us to enforce the transition for all pending updates.
15:11:40 korzen: during that time, does the table need to be locked?
15:12:22 electrocucaracha, no, the table should not be locked
15:12:30 at least not for a long period of time
15:12:54 the operations on the DB should be atomic and doable in chunks
15:13:15 so it can be done in the background
15:13:32 also, when accessing an object, the data will be migrated as well
15:13:42 so get_objects etc. will count
15:14:35 as ihrachys said, a neutron-db-manage --online-data-migration or similar command should be added
15:15:11 and when all data is migrated and the old format is no longer needed, it can be removed in the following release
15:15:19 in a contract migration script
15:15:46 so basically all the new calls will be addressed to the new schema
15:16:07 in n+2, yes
15:16:25 n+1 should be backward compatible
15:17:33 korzen: ok, I think the agenda makes sense, should we move it into the common etherpad? or at least link to it from there?
15:17:45 yes, we can link to it
15:19:56 ok done. the drivers team will work this week on clarifying the general agenda for the event.
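[Editor's note: the pattern korzen and ihrachys sketch above combines lazy "migrate on read" with chunked background migration. Below is a minimal, self-contained Python illustration of that idea, not actual Neutron code; names such as _migrate_record and run_online_data_migration, and the fake table, are invented for the example.]

```python
# Illustrative sketch only: lazy migration on read plus chunked background
# migration, as discussed above. Not Neutron code.

import itertools

# Pretend "rows" loaded from the DB; old-format rows lack the 'new_attr' key.
FAKE_TABLE = [{'id': i, 'legacy_attr': 'x%d' % i} for i in range(10)]


def _migrate_record(row):
    """Rewrite a single old-format row into the new format (idempotent)."""
    if 'new_attr' not in row:
        row['new_attr'] = row.pop('legacy_attr')
    return row


def get_object(object_id):
    """Read-path hook: any row touched by the API is migrated on the fly."""
    row = next(r for r in FAKE_TABLE if r['id'] == object_id)
    return _migrate_record(row)


def run_online_data_migration(chunk_size=2):
    """What a 'neutron-db-manage --online-data-migration'-style command
    could do: walk pending rows in small chunks so the table is never
    locked for long, until nothing is left to migrate."""
    pending = (r for r in FAKE_TABLE if 'new_attr' not in r)
    while True:
        chunk = list(itertools.islice(pending, chunk_size))
        if not chunk:
            break
        for row in chunk:  # each chunk would be one short DB transaction
            _migrate_record(row)


if __name__ == '__main__':
    get_object(3)                 # lazy migration of one record
    run_online_data_migration()   # background migration of the rest
    assert all('new_attr' in r for r in FAKE_TABLE)
```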
15:20:46 #topic Partial Multinode Grenade
15:20:57 korzen: any update on mixed server versions?
15:21:42 not yet, I was fighting with the setup
15:23:06 now that the Ocata rc1 is released, I can check the Newton/Ocata compatibility
15:23:45 I'm fixing the last issues with the k8s environment and should be ready to test
15:24:17 the containers should give us the proper approach for upgrading the neutron servers
15:24:54 I would roll out the new neutron server with the old one running on the other node
15:24:56 with one DB
15:25:10 and the new server will call expand on the DB schema
15:25:33 at that level, I will check the API CRUD operations and launching the VMs
15:27:14 I've heard that Mirantis achieved zero downtime in their tests, including neutron, but it seems that they stopped the neutron server for a short while
15:27:20 'new server will call expand' - you mean, you will restart with the new code, then call neutron-db-manage --upgrade --expand from that same container?
15:27:33 ihrachys, yes
15:27:44 ok makes sense
15:28:59 funny thing that Mirantis claims that only the nova scheduler is a problem to achieve zero downtime
15:29:27 but technically speaking, neutron and other projects do not support zero-downtime upgrades as of now
15:30:34 I guess they were lucky enough to have, for example, a large offline data migration that would delay the networking API from coming up
15:30:41 not to have*
15:31:30 maybe, or they're still using nova-network
15:31:41 they don't
15:31:59 I will need to ask for more details
15:32:29 hopefully we can get an answer before the PTG
15:32:49 some people even feel ok changing the underlying schema while the service is running. it may produce some hiccups during that time, but in some cases it's well isolated. so maybe that's it.
15:32:59 on the other hand, zero downtime is a hot topic in OSIC, we have folks from nova, QA, cinder and Kolla-Ansible working on that
15:34:59 ok moving on
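[Editor's note: for the expand step korzen describes, here is a sketch of what an additive, backward-compatible "expand" alembic migration looks like. The revision IDs, table, and column are invented for illustration; this is not a migration from the Neutron tree.]

```python
# Sketch of an additive "expand" migration (hypothetical revision IDs and
# column). `neutron-db-manage upgrade --expand` runs only additive changes
# like this, so the old server can keep operating against the widened schema
# while the new server is rolled out; removal of any old column would live
# in a later "contract" migration.

from alembic import op
import sqlalchemy as sa

# revision identifiers, used by Alembic (made up for this example).
revision = 'abc123expand'
down_revision = 'def456previous'


def upgrade():
    # Purely additive: a new nullable column does not break the old code
    # that is still running against the same database.
    op.add_column('ports', sa.Column('new_attr', sa.String(64), nullable=True))
```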
15:35:00 #topic Object implementation
15:35:16 #link https://review.openstack.org/#/q/status:open+project:openstack/neutron+branch:master+topic:bp/adopt-oslo-versioned-objects-for-db Open patches
15:35:48 I am going to walk through the list to see what was blocked before the rc1 cut-off, and will land what's due.
15:36:09 asingh_: please rebase https://review.openstack.org/356825 (tag patch), that should be ready I think
15:36:52 korzen: as for the port binding patch, we don't seem to have a resolution yet? https://review.openstack.org/#/c/407868/
15:37:49 I have a couple of patches that only refactor the existing code
15:38:09 ihrachys, nope, it is still not passing the gate
15:38:48 electrocucaracha: I will get back to those, opened in tabs
15:39:00 I have two patches: 1. quota OVO https://review.openstack.org/#/c/338625/
15:39:21 2. external network OVO https://review.openstack.org/#/c/353088/
15:41:00 manjeets: ack
15:41:21 ihrachys, how do we go about the network segment synthetic field in port binding level? Shall we include both the segment id and the segment object? https://review.openstack.org/#/c/382037/
15:41:27 I can suggest changing the status on the spreadsheet to "Ready"; that way we can distinguish the patches that are ready for review from those that are in progress
15:42:46 sshank: wait, you are talking about 'creating' a segment object there. in which case, there won't be a level?
15:43:33 so the order would be 1. create segment 2. create port 3. create binding and levels. I don't see where we have a circular creation problem. or do I miss something?
15:44:22 ihrachys, for the create segment, we need segment_id in the fields of the level object.
15:44:54 sshank: when you create a new segment, you don't pass any levels; they will be created later.
15:46:44 ihrachys, but for creating the segment, it needs the segment ID in the level object for the foreign key reference. I just wanted to confirm whether we need to add this, since I was told in previous reviews not to have both segment_id and a segment synthetic field.
15:46:48 wait, do you mean 'creating an object' as in 'instantiating a python object', not a model?
15:47:50 it may be the case that there is some work to do to make the object-field handling code work for this scenario
15:48:02 lemme check once more, and I will report back on gerrit
15:48:49 ihrachys, I think we need both the segment id and the segment synthetic field, since the former is needed for the foreign key referral and the latter for push notifications.
15:49:56 I think the foreign key referral usage is bound to how we implemented that logic, not to the essence of the goal, that is, pulling the right related objects.
15:50:02 anyhow, let's take it to gerrit
15:50:13 ihrachys, okay. Thank you.
15:50:39 I will skip 'Other patches' for this meeting, nothing interesting there
15:50:44 #topic Open discussion
15:50:50 I have one thing to point out
15:51:06 some of you may have noticed already that the gate is unstable lately
15:51:37 one of the failures we see is memory consumption going too high, making the oom-killer shoot processes, usually mysql.
15:51:57 we also see libvirtd dying on malloc(), so that can also be related
15:52:22 there is a long thread on that started by armax at http://lists.openstack.org/pipermail/openstack-dev/2017-February/111413.html
15:52:34 basically, neutron and nova are the memory hogs
15:52:58 and we raised memory consumption significantly going from mitaka to newton to ocata
15:53:06 so we were thinking about what could trigger that
15:53:28 ihrachys: does neutron have a periodic task mechanism like nova?
15:53:34 and both nova folks and some of us were thinking, maybe it's the OVO adoption that makes services keep some object references in memory
15:53:49 electrocucaracha: what's the mechanism? you would need to elaborate.
15:54:20 ihrachys: by default, nova checks the status of the VMs every 60 secs
15:54:56 ihrachys: there is a way to change this behavior to subscribe instead of polling the compute nodes
15:55:09 so, though we don't have any numbers yet (people are working on generating memory usage profiles), it's worth a note here that OVO is a suspect, and we may need to get involved in whatever comes out of the investigation.
15:55:21 electrocucaracha: how does it relate to memory consumption?
15:55:33 I think it could change the cpu usage pattern, but not memory?
15:56:05 ihrachys, is it a general problem of the OVO lib?
15:56:56 we don't know yet, maybe nova and neutron do something wrong; maybe it's not even OVO.
15:57:13 but nova folks said they noticed memory usage going up when they started adoption
15:57:17 so that's now two of us
15:57:21 :)
15:57:33 maybe that's related, maybe not. a memory profile should give us data.
15:57:46 for now, let's point fingers in oslo's direction :))
15:57:54 I guess cinder also has OVO, so it may or may not be OVO
15:58:10 yeah, for now it's just unfounded suspicion
15:58:18 but it's bad for PR ;)
15:59:12 ok let's call it a day. thanks everyone.
15:59:14 #endmeeting
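[Editor's note: the memory usage profiles mentioned in the last topic were still being generated at the time of this meeting, and the tooling used there is not specified. Purely as illustration, the sketch below shows one generic way to profile memory growth of a suspect Python code path with the standard-library tracemalloc module (Python 3); suspect_workload and profile are invented names.]

```python
# Illustrative sketch (not the investigation's actual tooling): compare
# tracemalloc snapshots taken before and after a suspect workload to see
# which allocation sites grew.

import tracemalloc


def suspect_workload():
    # Stand-in for the code path under suspicion, e.g. repeatedly building
    # objects from DB rows.
    return [{'id': i, 'payload': 'x' * 1024} for i in range(10000)]


def profile(func):
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    result = func()
    after = tracemalloc.take_snapshot()
    # Print the top allocation sites that grew between the two snapshots.
    for stat in after.compare_to(before, 'lineno')[:5]:
        print(stat)
    return result


if __name__ == '__main__':
    profile(suspect_workload)
```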