15:00:56 <ihrachys> #startmeeting neutron_upgrades
15:01:02 <openstack> Meeting started Mon Feb  6 15:00:56 2017 UTC and is due to finish in 60 minutes.  The chair is ihrachys. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:03 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:06 <openstack> The meeting name has been set to 'neutron_upgrades'
15:01:17 <korzen> hello
15:01:18 <electrocucaracha> hola
15:01:20 <sindhu> hi
15:01:25 <dasanind> hi
15:02:23 <ihrachys> #link https://wiki.openstack.org/wiki/Meetings/Neutron-Upgrades-Subteam Agenda
15:02:37 <ihrachys> hey everyone
15:02:40 <ihrachys> #topic Announcements
15:02:54 <ihrachys> rc1 is cut from master, so master is Pike now, and we can start landing patches as usual
15:03:20 <ihrachys> we may have some preparation work before we do merge full steam, but overall, the branch is open
15:03:43 <sshank> Hello
15:04:45 <ihrachys> before we proceed, I'd like to run through ptg plans once again
15:04:55 <ihrachys> #topic PTG in Atlanta
15:05:24 <ihrachys> korzen: I think you were planning to walk through the agenda pad and write up something for our cause
15:05:30 <korzen> https://etherpad.openstack.org/p/neutron-ptg-pike-upgrades
15:05:31 <ihrachys> #link https://etherpad.openstack.org/p/neutron-ptg-pike PTG etherpad
15:05:42 <ihrachys> korzen++
15:06:23 <korzen> my intention was to have more detailed technical identification of tasks to be done
15:07:06 <ihrachys> to be done, or to be covered during PTG?
15:07:50 <korzen> to be done
15:08:17 <korzen> putting the priority on online data migration
15:08:40 <electrocucaracha> korzen: about online data migration, can you expand on that definition?
15:08:44 <ihrachys> yes, that's the most fuzzy topic. do we want to do prior research?
15:08:56 <korzen> yes, I will do the research
15:10:05 <ihrachys> korzen: thanks.
15:10:14 <korzen> electrocucaracha, moving the data from one format to a newer one
15:10:23 <korzen> online
15:10:25 <korzen> ;)
15:11:18 <korzen> we do not have a candidate for it now, but when one appears, the framework should be ready to use
15:11:36 <ihrachys> I believe it comprises 1) some common hooking mechanism into get_object(s)/create/delete/update that would allow us to reduce the code needed for each case and 2) some hook in neutron-db-manage allowing us to force the transition for all pending updates.
15:11:40 <electrocucaracha> korzen: does the table need to be blocked during that time?
15:12:22 <korzen> electrocucaracha, no, the table should not be blocked
15:12:30 <korzen> at least not for a long period of time
15:12:54 <korzen> the operations on the DB should be atomic and doable in chunks
15:13:15 <korzen> so it can be done in background
15:13:32 <korzen> also, when accessing the object, the data will be migrated as well
15:13:42 <korzen> so get_objects etc will count
15:14:35 <korzen> as ihrachys said, a neutron-db-manage --online-data-migration or similar command should be added
15:15:11 <korzen> and when all data is migrated and the old format is no longer needed, it can be removed in the following release
15:15:19 <korzen> in a contract migration script
15:15:46 <electrocucaracha> so basically all the new calls will go to the new schema
15:16:07 <korzen> in n+2 yes
15:16:25 <korzen> n+1 should be backward compatible
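A rough sketch of the chunked, background-friendly migration helper described above (the table, columns, and helper names are invented for illustration, and the neutron-db-manage driver for this does not exist yet):

    # Minimal, self-contained sketch of chunked online data migration: each
    # batch is its own short transaction, so the API keeps serving requests
    # while old-format rows are rewritten in the background.
    import sqlalchemy as sa
    from sqlalchemy import orm
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()
    BATCH_SIZE = 50  # small chunks so no row is locked for long


    class Port(Base):  # stand-in for a real neutron table
        __tablename__ = 'ports'
        id = sa.Column(sa.Integer, primary_key=True)
        old_format = sa.Column(sa.String(255))
        new_format = sa.Column(sa.String(255), nullable=True)


    def migrate_one_batch(session):
        """Rewrite up to BATCH_SIZE rows; returns 0 when nothing is left."""
        rows = (session.query(Port)
                .filter(Port.new_format.is_(None))
                .limit(BATCH_SIZE)
                .all())
        for row in rows:
            row.new_format = row.old_format.upper()  # stand-in for the real conversion
        session.commit()  # one short transaction per chunk
        return len(rows)


    def run_online_data_migration(session):
        # Loop until done; this could run in the background or be driven by
        # the proposed neutron-db-manage command.
        while migrate_one_batch(session):
            pass


    if __name__ == '__main__':
        engine = sa.create_engine('sqlite://')
        Base.metadata.create_all(engine)
        session = orm.Session(engine)
        session.add_all([Port(old_format='fmt-%d' % i) for i in range(200)])
        session.commit()
        run_online_data_migration(session)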
15:17:33 <ihrachys> korzen: ok I think the agenda makes sense, should we move it into the common etherpad? or at least link to it from there?
15:17:45 <korzen> yes, we can link to it
15:19:56 <ihrachys> ok done. drivers team will work this week on clarifying general agenda for the event.
15:20:46 <ihrachys> #topic Partial Multinode Grenade
15:20:57 <ihrachys> korzen: any update on mixed server version?
15:21:42 <korzen> not yet, I was fighting with the setup
15:23:06 <korzen> now that the Ocata rc1 is released, I can check the Newton/Ocata compatibility
15:23:45 <korzen> I'm fixing the last issues with the k8s environment and should be ready to test
15:24:17 <korzen> containers should give us the proper approach for upgrading the neutron servers
15:24:54 <korzen> I would roll out the new neutron server with the old one running on the other node
15:24:56 <korzen> with one DB
15:25:10 <korzen> and the new server will call expand on DB schema
15:25:33 <korzen> on that level, I will check the API CRUD operations and launching the VMs
15:27:14 <korzen> I've heard that Mirantis did zero downtime in their tests, including neutron, but it seems that they stopped the neutron server for a short while
15:27:20 <ihrachys> 'new server will call expand' - you mean, you will restart with new code, then call neutron-db-manage upgrade --expand from that same container?
15:27:33 <korzen> ihrachys, yes
15:27:44 <ihrachys> ok makes sense
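A rough sketch of what that init step in the new-code container could look like (the wrapper itself and the config path are illustrative; only the neutron-db-manage and neutron-server commands are real):

    # Illustrative startup wrapper for the upgraded neutron-server container:
    # apply only the additive (expand) schema migrations, which the old server
    # on the other node can tolerate, then start the API service.
    import subprocess


    def start_upgraded_server():
        # Additive schema changes only; the contract phase is deferred until
        # all servers run the new code.
        subprocess.check_call(['neutron-db-manage', 'upgrade', '--expand'])
        # Then launch the new-code API server against the shared database.
        subprocess.check_call(['neutron-server', '--config-file',
                               '/etc/neutron/neutron.conf'])


    if __name__ == '__main__':
        start_upgraded_server()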
15:28:59 <korzen> funny thing that Mirantis claims only the nova scheduler is a problem for achieving zero downtime
15:29:27 <korzen> but technically speaking, neutron and other projects do not support zero downtime upgrades as of now
15:30:34 <korzen> I guess that they were lucky enough not to have, for example, a large offline data migration that would delay the networking API from coming up
15:31:30 <electrocucaracha> maybe, or they're still using nova-network
15:31:41 <ihrachys> they don't
15:31:59 <korzen> I will need to ask more details
15:32:29 <korzen> hopefully we can get an answer before PTG
15:32:49 <ihrachys> some people even feel ok changing the underlying schema while the service is running. it may produce some hiccups during that time, but in some cases it's well isolated. so maybe that's it.
15:32:59 <electrocucaracha> on the other hand, zero downtime is a hot topic in OSIC; we have folks from nova, QA, cinder and Kolla-Ansible working on that
15:34:59 <ihrachys> ok moving on
15:35:00 <ihrachys> #topic Object implementation
15:35:16 <ihrachys> #link https://review.openstack.org/#/q/status:open+project:openstack/neutron+branch:master+topic:bp/adopt-oslo-versioned-objects-for-db Open patches
15:35:48 <ihrachys> I am going to walk through the list to see what we blocked before rc1 cut-off, and will land where it's due.
15:36:09 <ihrachys> asingh_: please rebase https://review.openstack.org/356825 (tag patch), that should be ready I think
15:36:52 <ihrachys> korzen: as for the port binding patch, we don't seem to have a resolution yet? https://review.openstack.org/#/c/407868/
15:37:49 <electrocucaracha> I have a couple of patches that only refactor the existing code
15:38:09 <korzen> ihrachys, nope, it is still not passing the gate
15:38:48 <ihrachys> electrocucaracha: I will get back to those, opened in tabs
15:39:00 <manjeets> i have two patches 1. quota ovo https://review.openstack.org/#/c/338625/
15:39:21 <manjeets> 2. external network ovo https://review.openstack.org/#/c/353088/
15:41:00 <ihrachys> manjeets: ack
15:41:21 <sshank> ihrachys, How do we go about the network segment synthetic field in port binding level? Shall we include both segment id and segment object? https://review.openstack.org/#/c/382037/.
15:41:27 <electrocucaracha> I suggest changing the status on the spreadsheet to "Ready"; that way we can distinguish the patches that are ready to be reviewed from those that are still in progress
15:42:46 <ihrachys> sshank: wait, you are talking about 'creating' a segment object there. in which case, there won't be a level?
15:43:33 <ihrachys> so the order would be 1. create segment 2. create port 3. create binding and levels. I don't see where we have a circular creation problem. or do I miss something?
15:44:22 <sshank> ihrachys, For creating the segment, we need segment_id in the fields of the level object.
15:44:54 <ihrachys> sshank: when you create a new segment, you don't pass any levels, they will be created later.
15:46:44 <sshank> ihrachys, But for creating the segment, it needs the segment ID in the level object for the foreign key reference. I just wanted to confirm whether we need to add this, since I was told in previous reviews not to have both segment_id and the segment synthetic field.
15:46:48 <ihrachys> wait, do you talk about 'creating an object' as in 'instantiating python object', not a model?
15:47:50 <ihrachys> it may be the case that there is some work to do to make the object-field handling code work for this scenario
15:48:02 <ihrachys> lemme check once more, and I will report back on gerrit
15:48:49 <sshank> ihrachys, I think we need both segment id and the segment synthetic field, since the former is needed for the foreign key referral and the latter for push notifications.
15:49:56 <ihrachys> I think foreign key referral usage is bound to how we implemented that logic, not to the essence of the goal, that is, pulling the right related objects.
15:50:02 <ihrachys> anyhow, let's take it to gerrit
15:50:13 <sshank> ihrachys, Okay. Thank you.
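A rough sketch of the pattern being discussed, with the level object carrying both the plain segment_id (backing the foreign key) and a synthetic segment object field (useful e.g. for push notifications). Field names and the version are illustrative, not the code in the patch under review; it assumes neutron's NeutronDbObject base and common_types:

    from oslo_versionedobjects import base as obj_base
    from oslo_versionedobjects import fields as obj_fields

    from neutron.objects import base
    from neutron.objects import common_types


    @obj_base.VersionedObjectRegistry.register
    class PortBindingLevel(base.NeutronDbObject):
        # db_model would point at the corresponding SQLAlchemy model
        VERSION = '1.0'

        primary_keys = ['port_id', 'host', 'level']

        fields = {
            'port_id': common_types.UUIDField(),
            'host': obj_fields.StringField(),
            'level': obj_fields.IntegerField(),
            # plain FK value kept in the DB row
            'segment_id': common_types.UUIDField(nullable=True),
            # related object, loaded separately rather than stored in this row
            'segment': obj_fields.ObjectField('NetworkSegment', nullable=True),
        }

        synthetic_fields = ['segment']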
15:50:39 <ihrachys> I will skip 'Other patches' for this meeting, nothing interesting there
15:50:44 <ihrachys> #topic Open discussion
15:50:50 <ihrachys> I have one thing to point out
15:51:06 <ihrachys> some of you may have noticed already that gate is unstable lately
15:51:37 <ihrachys> one of the failures we see is memory consumption going too high, making the oom-killer shoot processes, usually mysql.
15:51:57 <ihrachys> we also see libvirtd dying on malloc(), so that can also be related
15:52:22 <ihrachys> there is a long thread on that started by armax at http://lists.openstack.org/pipermail/openstack-dev/2017-February/111413.html
15:52:34 <ihrachys> basically, neutron and nova are the memory hoggers
15:52:58 <ihrachys> and we raised memory consumption significantly going from Mitaka to Newton to Ocata
15:53:06 <ihrachys> so we were thinking what could trigger that
15:53:28 <electrocucaracha> ihrachys: does neutron have a periodic task mechanism like nova?
15:53:34 <ihrachys> and both nova folks and some of us were thinking, maybe it's OVO adoption that makes services keep some object references in memory
15:53:49 <ihrachys> electrocucaracha: what's the mechanism? you would need to elaborate.
15:54:20 <electrocucaracha> ihrachys: by default, nova checks every 60 secs the status of the VMs
15:54:56 <electrocucaracha> ihrachys: there is a way to change this behavior to subscribe to events instead of polling the compute nodes
15:55:09 <ihrachys> so, though we don't have any numbers yet (people are working on generating memory usage profiles), it's worth a note here that OVO is a suspect, and we may need to get involved in whatever comes from the investigation.
15:55:21 <ihrachys> electrocucaracha: how does it relate to memory consumption?
15:55:33 <ihrachys> I think it could change cpu usage pattern but not memory?
15:56:05 <korzen> ihrachys, is it a general OVO lib problem?
15:56:56 <ihrachys> we don't know yet, maybe nova and neutron do something wrong; maybe it's not even OVO.
15:57:13 <ihrachys> but nova folks said they noticed memory usage going up when they started adoption
15:57:17 <ihrachys> so that's now two of us
15:57:21 <korzen> :)
15:57:33 <ihrachys> maybe that's related, maybe not. a memory profile should give us data.
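For reference, a generic way to collect the kind of memory growth data mentioned above, using only the stdlib tracemalloc module; this is just a sketch that would have to run inside the service process (e.g. hooked in at startup), not the actual gate tooling:

    # Record allocation tracebacks and report the top sources of memory growth
    # between two snapshots taken inside the running service.
    import tracemalloc

    tracemalloc.start(25)  # keep up to 25 frames per allocation
    baseline = tracemalloc.take_snapshot()

    # ... let the service handle traffic for a while, then:
    current = tracemalloc.take_snapshot()
    for stat in current.compare_to(baseline, 'lineno')[:10]:
        print(stat)  # top 10 code locations by memory growth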
15:57:46 <ihrachys> for now, let's point fingers into oslo direction :))
15:57:54 <manjeets> i guess cinder also has ovo; it may or may not be ovo
15:58:10 <ihrachys> yeah, for now it's just unfounded suspicions
15:58:18 <ihrachys> but it's bad for PR ;)
15:59:12 <ihrachys> ok let's call it a day. thanks everyone.
15:59:14 <ihrachys> #endmeeting