14:00:35 <haleyb> #startmeeting neutron_drivers
14:00:35 <opendevmeet> Meeting started Fri May 31 14:00:35 2024 UTC and is due to finish in 60 minutes.  The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:35 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:35 <opendevmeet> The meeting name has been set to 'neutron_drivers'
14:00:46 <haleyb> ping list: ykarel, mlavalle, mtomaska, slaweq, obondarev, tobias-urdin, lajoskatona, amotoki, haleyb, ralonsoh
14:00:47 <slaweq> o/
14:00:49 <ihrachys> o/
14:01:02 <lajoskatona> o/
14:01:05 <mlavalle> \o
14:02:46 <haleyb> i know one of the rfes was from seba but he's marked away
14:03:31 <lajoskatona> seba is Christian Rohmann's nick?
14:04:14 <haleyb> sebastian lohff - i had https://bugs.launchpad.net/neutron/+bug/2065198 on the list
14:05:16 <haleyb> well, i didn't put it in the on-demand, it's just been screened
14:05:29 <lajoskatona> haleyb: ack
14:05:57 <haleyb> what is Christian's nick? he didn't follow the (nick):topic format
14:06:45 <mlavalle> we can discuss the RFE without him in the meeting, can't we?
14:06:55 <haleyb> sure, just if we had questions
14:07:13 <mlavalle> if we have questions we can ask in Launchpad or the patches
14:07:18 <haleyb> alright, can get started
14:07:50 <haleyb> #link https://bugs.launchpad.net/neutron/+bug/2064120
14:07:57 <haleyb> [RFE] BGP Dynamic Routing for IPv4 AND IPv6 routes not possible with just one dragent (really make use of MP-BGP or support multiple BGP speakers / sessions per dragent)
14:08:25 <haleyb> i triaged this about a month ago
14:09:47 <lajoskatona> racosta actually asked about this topic also (https://meetings.opendev.org/irclogs/%23openstack-neutron/%23openstack-neutron.2024-05-28.log.html )
14:10:35 <haleyb> and i see he had a comment on the ML
14:11:20 <racosta> o/
14:11:26 <racosta> I'm just following the discussion...
14:11:51 <lajoskatona> To tell the truth I am not so deeply familiar with n-d-r, but based on the options in the RFE, a spec would be good to dig deeper and decide which option to select
14:12:00 <lajoskatona> 1) Enabling a BGP speaker to have more than one address family
14:12:09 <lajoskatona> 2) Somehow get more than one BGP speaker per dragent working by
14:12:39 <haleyb> racosta: hi, you probably know more about this space than most
14:13:35 <haleyb> 2a) extend os_ken to support more than one BGP session
14:13:54 <racosta> yeah, the os-ken side is the biggest problem in my opinion...
14:13:58 <haleyb> 2b)  if extending os_ken is not an option, maybe the whole dragent can be built to run multiple instances of a single driver (os_ken) to then accept more than one bgp speaker scheduled to it.
14:14:11 <haleyb> copy/paste from the email
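For context, a minimal sketch of option 1 above (one speaker carrying both address families over a single MP-BGP session) at the os-ken level; BGPSpeaker and neighbor_add() come from the ryu-derived API, but the exact keyword arguments shown here are assumptions from memory and may differ in os-ken:

    # Hedged sketch of option 1: a single BGP speaker/session announcing both
    # IPv4 and IPv6 routes (MP-BGP). Keyword names such as enable_ipv4 and
    # enable_ipv6 follow the ryu API from memory and are assumptions here.
    from os_ken.services.protocols.bgp.bgpspeaker import BGPSpeaker

    speaker = BGPSpeaker(as_number=64512, router_id='192.0.2.1')
    speaker.neighbor_add('192.0.2.254', remote_as=64513,
                         enable_ipv4=True, enable_ipv6=True)

    # Routes from both families go over the same session, so a single dragent
    # would no longer need a second speaker just for the other family.
    speaker.prefix_add(prefix='198.51.100.0/24', next_hop='192.0.2.1')
    speaker.prefix_add(prefix='2001:db8:100::/64', next_hop='2001:db8::1')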
14:14:35 <slaweq> so this seems like a clearly n-d-r-specific RFE, and because of that I think we may also ask frickler for his opinion as he is one of the most active maintainers of that stadium project
14:14:55 <mlavalle> ++
14:15:13 <slaweq> regarding os-ken - do You know if maybe something like that was implemented in ryu after we forked it?
14:15:19 <lajoskatona> true, frickler has a good background for n-d-r
14:15:32 <slaweq> maybe we 'just' would need to port some changes
14:16:05 <lajoskatona> I checked ryu a few weeks ago and there were no changes as I remember, and the last changes/fixes were ported by ralonsoh
14:16:15 <lajoskatona> but it can be double checked
14:16:19 <haleyb> yes, good idea. and it seems a spec would be good as lajoskatona mentioned so we have more info on the possible solutions
14:16:22 <slaweq> lajoskatona ok, thx
14:18:18 <haleyb> so i will add some comments based on above and ask for a spec so we can discuss options
14:18:32 <lajoskatona> +1
14:19:21 <mlavalle> +1
14:19:30 <slaweq> before the spec shouldn't we vote on whether we accept that rfe in general? I'm ok with it and a spec can help to choose the correct solution to be implemented, but just for formality I think we should approve the rfe :)
14:20:00 <mlavalle> the rfe is reasonable IMO
14:20:45 <lajoskatona> +1 from me to the RFE
14:21:12 <haleyb> slaweq: true, i was only thinking of getting more info, but it does seem reasonable to me as well
14:22:50 <haleyb> i actually don't remember the rules, is 4 quorum?
14:23:04 <haleyb> it might be the best we can do until rodolfo gets back
14:23:46 <slaweq> +1 from me for the RFE
14:23:51 <ihrachys> I can throw my +0.5 in (hides)
14:23:54 <slaweq> and I think we already have 5 votes now
14:24:04 <slaweq> 5.5 :)
14:24:46 <mlavalle> lol 5.5
14:25:25 <haleyb> i only counted myself lajoskatona mlavalle slaweq (4) + ihrachys (.5)
14:25:26 <racosta> I also think this RFE is reasonable ;)
14:26:00 <mlavalle> thanks for your opinion racosta :-)
14:26:02 <haleyb> racosta: ack, thanks, i will approve then and ask for a spec
14:26:06 <slaweq> yes, it's 4
14:26:07 <slaweq> sorry
14:26:32 * haleyb is only on first coffee and was out late so had to double check
14:27:09 * slaweq had a holiday yesterday and today work is hard :P
14:28:14 <haleyb> ok, next topic
14:28:48 <haleyb> lajoskatona added a topic about sqlalchemy, i've been looking at this too
14:29:47 <lajoskatona> yes, actually my goal was to make it visible to everybody and have a discussion
14:30:07 <haleyb> i was in the process of creating a bug yesterday when i had to leave
14:30:29 <haleyb> and i was testing a change to see if it helped with the cover job
14:30:47 <ihrachys> is there more to do there except posting a sed s/old/new/ patch + pass CI? are there hidden risks beyond it?
14:30:47 <lajoskatona> so my understanding is that we currently have lazy=subquery which was fine for years with sqlalchemy<2
14:31:02 <haleyb> but let's talk, i have a link to the selectin page
14:31:14 <lajoskatona> and even today we can live with it, but it is not efficient
14:31:25 <haleyb> #link https://docs.sqlalchemy.org/en/20/orm/queryguide/relationships.html#selectin-eager-loading
14:31:33 <lajoskatona> ihrachys: I suppose no but never tried :-)
14:31:39 <haleyb> What Kind of Loading to Use ?
14:31:53 <haleyb> that's where i was reading
14:32:43 <haleyb> One to Many / Many to Many Collection - The selectinload() is generally the best loading strategy to use
14:32:48 <ihrachys> "The only scenario in which selectin eager loading is not feasible is when the model is using composite primary keys, and the backend database does not support tuples with IN, which currently includes SQL Server." does it imply we are good as long as we don't deploy on microsoft?
14:33:27 <lajoskatona> "The subqueryload() eager loader is mostly legacy at this point, superseded by the selectinload()"
14:34:09 <lajoskatona> so it can even happen that it will be dropped in a future release, I suppose
14:34:52 <lajoskatona> ihrachys: is microsoft a deploy destination for Neutron / openstack?
14:35:00 <slaweq> IMO we are still in a pretty good moment in the cycle to try it - worst case we can revert the change, right?
14:35:05 <ihrachys> won't be dropped; not until ms sql is gone I guess. but we don't care.
14:35:15 <lajoskatona> :-)
14:35:27 <lajoskatona> slaweq: +1
14:35:33 <ihrachys> lajoskatona I don't think a lot of people even try postgres... we are a mysql shop, for better or worse.
14:35:39 <slaweq> lajoskatona: no, we officially support only mysql/mariadb and postgresql
14:35:52 <ihrachys> so let's just post a sed s/old/new/ and see if anything falls off :)
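The change being discussed amounts to swapping the eager-loading strategy on ORM relationships; a minimal sketch with hypothetical model names (not actual neutron models):

    # Minimal sketch of the lazy='subquery' -> lazy='selectin' swap; the model
    # and relationship names are hypothetical, not actual neutron models.
    import sqlalchemy as sa
    from sqlalchemy import orm

    Base = orm.declarative_base()

    class Network(Base):
        __tablename__ = 'networks'
        id = sa.Column(sa.String(36), primary_key=True)
        # before: subnets = orm.relationship('Subnet', lazy='subquery')
        subnets = orm.relationship('Subnet', lazy='selectin')

    class Subnet(Base):
        __tablename__ = 'subnets'
        id = sa.Column(sa.String(36), primary_key=True)
        network_id = sa.Column(sa.String(36), sa.ForeignKey('networks.id'))

    # selectin loading emits one extra SELECT ... WHERE network_id IN (...)
    # per relationship instead of re-running the original query as a subquery.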
14:35:55 <haleyb> i can create a bug today and propose a patch. i've only looked in neutron so far, not neutron-lib or other places
14:36:01 <slaweq> ihrachys++
14:36:09 <mlavalle> yeap, let's give it a try
14:36:34 <lajoskatona> +1, haleyb if you lack time next week I can jump in
14:36:58 <ihrachys> haleyb probably a good idea to run neutron-lib change against master neutron test suite... which reminds me that we don't have such a job :) https://bugs.launchpad.net/neutron/+bug/2067515
14:37:06 <lajoskatona> I quickly grepped the stadium projects and as far as I can see there is no direct mention of lazy=subquery
14:37:41 <haleyb> lajoskatona: i already have a patch locally, i was waiting for cover test to complete when i had to leave yesterday
14:38:39 <lajoskatona> haleyb: cool, thanks
14:38:55 <ihrachys> beyond the sql query type change... do we want to touch on memory consumption in general?
14:39:21 <ihrachys> afair Mike said query change may save memory, but sqla2 should not have blown the memory in the first place
14:39:23 <haleyb> we might need a change in neutron-lib though, in _load_one_to_manys(), will have to look
14:40:05 <ihrachys> haleyb can probably switch in stages, no need to replace all of them at once if some pose unexpected work
14:40:34 <racosta> will it be possible to see a reduction in the time spent on queries through tests in Zuul?
14:40:43 <racosta> I remembered this discussion here: https://etherpad.opendev.org/p/oct2023-ptg-os-dbperf-crossproject
14:41:02 <haleyb> ihrachys: my results show old and new versions of coverage use about the same amount of memory, so no smoking gun yet
14:41:39 <ihrachys> :(
14:41:59 <ihrachys> racosta I see select counters; will the SELECT IN switch show a reduction?
14:42:33 <haleyb> and it's still unclear to me why the sqlalchemy bump triggered this
14:43:35 <ihrachys> haleyb I think I ran with sqla1 vs 2 locally (cover tox) and both seemed to have grown in memory consumed over time. so I couldn't reproduce locally. can you?
14:45:46 <haleyb> ihrachys: i can occasionally get my VM to go sideways with sqla2, i need to move back to v1 and retry there
14:46:22 <ihrachys> ack. "occasionally" is the worst. maybe I was just lucky then.
14:46:38 <haleyb> the fact each coverage process can consume 30-40% memory is sad
14:47:33 <ihrachys> thanks for the update. I guess we will learn more in the next weeks. we should probably move on?
14:47:41 <haleyb> we have one more topic from slaweq
14:48:16 <slaweq> thx haleyb, I wanted to ask for Your opinion on https://bugs.launchpad.net/neutron/+bug/2022043/
14:48:34 <slaweq> I was trying to explain it better in comment #5 in that LP
14:48:39 <ihrachys> slaweq thanks for the last comment, helps a lot
14:48:51 <slaweq> there was a fix for that but apparently it broke some ci jobs
14:49:19 <lajoskatona> tempest jobs also?
14:49:22 <slaweq> so I am now not sure if we want/should try to fix it, or maybe it is better to leave it as it is now and simply close the LP
14:49:38 <ihrachys> I have a question about "Unfortunately, due to the "populate_project_id" method ... project_id is always added to every resource". AFAIU the patch (reverted) was accommodating this behavior. Should this behavior be unconditional though in the first place? E.g. should populate_project_id populate all dicts regardless of whether the API definition has the field?..
14:50:17 <slaweq> ihrachys that is a good question which I asked myself yesterday too
14:50:50 <ihrachys> do you have an answer, or are you also confused like me?
14:50:56 <slaweq> maybe it should be fixed in a different way, and in https://github.com/openstack/neutron-lib/blob/master/neutron_lib/api/attributes.py#L259 it shouldn't be added to the res_dict at all if the attributes don't have it
14:51:10 <slaweq> ihrachys I don't have an answer for that
14:51:12 <slaweq> :)
14:52:16 <ihrachys> on a superficial level, this SOUNDS like a reasonable thing to do - don't add fields that are not supported. I suspect there may be a risk of breakage if a plugin then relies on them being present?
14:53:50 <slaweq> ihrachys maybe but I am not aware of anything like that really. Also tenant_id is always still in the context object. If a plugin relies on e.g. flavor['project_id'] then this is IMO simply a mistake
14:54:05 <haleyb> slaweq: so if these resources don't need a project_id, what is it set to? just the caller's project_id?
14:54:28 <slaweq> haleyb yes, look in https://github.com/openstack/neutron-lib/blob/master/neutron_lib/api/attributes.py#L259
14:54:51 <slaweq> for every POST request, if there is no project_id in the res_dict it is added from the context
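In other words, roughly this behavior (a simplified sketch of what slaweq describes, not the actual neutron-lib code; the signature is a guess):

    # Simplified sketch, not the actual neutron-lib implementation: on create,
    # if the request body carries no project_id, it is filled in from the
    # caller's context even when the API definition does not declare the field.
    def populate_project_id(context, res_dict, is_create):
        if is_create and 'project_id' not in res_dict:
            if context.project_id:
                res_dict['project_id'] = context.project_id
        # (the real code also handles the tenant_id alias and mismatch checks)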
14:54:58 <ihrachys> it is a bug in a plugin if it does rely on it. still a break that can go nasty. :) (gonna revert another one lol)
14:55:04 <haleyb> ack
14:55:06 <slaweq> for resources which belong to a project it makes sense
14:55:16 <slaweq> but for some resources not really
14:55:44 <slaweq> ihrachys and that's why I am asking here what we should do with it
14:55:57 <slaweq> maybe it is simply better to leave it as it is
14:55:58 <slaweq> :)
14:56:25 <ihrachys> I'm always for trying to make the code correct. we can throw this change at the wall and see if it sticks, like with sql queries :)
14:57:16 <ihrachys> if not, we should add a comment to api definitions explaining why these are part of the definitions. perhaps link to a bug.
14:58:07 <mlavalle> yes, let's try to fix it
14:58:38 <slaweq> ok, so I will try to propose another fix for it and make sure I don't break any ci this time
14:58:45 <slaweq> :)
14:58:52 <ihrachys> thanks! we should check against neutron suite at least.
14:58:54 <mlavalle> like you said, we are still early in the cycle
14:59:18 <slaweq> thank You for the help with this
14:59:27 <lajoskatona> +1
14:59:33 <haleyb> slaweq: ack, thanks, but someone beat you to breaking the CI, that job is failing again :(
14:59:33 <slaweq> and sorry for breaking ci so You had to revert it :)
14:59:46 <slaweq> LOL
15:00:00 * haleyb is crying
15:00:01 <ihrachys> np, it's part of the job. we learned ci is not testing something, so hopefully we'll plug the gap
15:01:06 <haleyb> ihrachys: it might have been one of the netaddr changes...
15:01:40 <ihrachys> I was going to ask about the logs after the meeting because of course - landed a bunch of these yesterday.
15:02:30 <haleyb> alright, we can end meeting
15:02:44 <ihrachys> o/
15:03:00 <mlavalle> \o
15:03:02 <haleyb> #endmeeting