14:00:35 <haleyb> #startmeeting neutron_drivers
14:00:35 <opendevmeet> Meeting started Fri May 31 14:00:35 2024 UTC and is due to finish in 60 minutes. The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:35 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:35 <opendevmeet> The meeting name has been set to 'neutron_drivers'
14:00:46 <haleyb> ping list: ykarel, mlavalle, mtomaska, slaweq, obondarev, tobias-urdin, lajoskatona, amotoki, haleyb, ralonsoh
14:00:47 <slaweq> o/
14:00:49 <ihrachys> o/
14:01:02 <lajoskatona> o/
14:01:05 <mlavalle> \o
14:02:46 <haleyb> i know one of the rfes was from seba but he's marked away
14:03:31 <lajoskatona> seba is Christian Rohmann's nick?
14:04:14 <haleyb> sebastian lohff - i had https://bugs.launchpad.net/neutron/+bug/2065198 on the list
14:05:16 <haleyb> well, i didn't put it in the on-demand, it's just been screened
14:05:29 <lajoskatona> haleyb: ack
14:05:57 <haleyb> what is Christian's nick? he didn't follow the (nick):topic format
14:06:45 <mlavalle> we can discuss the RFE without him in the meeting, can't we?
14:06:55 <haleyb> sure, just if we had questions
14:07:13 <mlavalle> if we have questions we can ask in Launchpad or the patches
14:07:18 <haleyb> alright, we can get started
14:07:50 <haleyb> #link https://bugs.launchpad.net/neutron/+bug/2064120
14:07:57 <haleyb> [RFE] BGP Dynamic Routing for IPv4 AND IPv6 routes not possible with just one dragent (really make use of MP-BGP or support multiple BGP speakers / sessions per dragent)
14:08:25 <haleyb> i triaged this about a month ago
14:09:47 <lajoskatona> racosta actually asked about this topic also (https://meetings.opendev.org/irclogs/%23openstack-neutron/%23openstack-neutron.2024-05-28.log.html)
14:10:35 <haleyb> and i see he had a comment on the ML
14:11:20 <racosta> o/
14:11:26 <racosta> I'm just following the discussion...
14:11:51 <lajoskatona> To tell the truth I am not so deeply familiar with n-d-r, but based on the options in the RFE a spec would be good to dig deeper and select which option to go with
14:12:00 <lajoskatona> 1) Enabling a BGP speaker to have more than one address family
14:12:09 <lajoskatona> 2) Somehow get more than one BGP speaker per dragent working by
14:12:39 <haleyb> racosta: hi, you probably know more about this space than most
14:13:35 <haleyb> 2a) extend os_ken to support more than one BGP session
14:13:54 <racosta> yeah, the os-ken side is the biggest problem in my opinion...
14:13:58 <haleyb> 2b) if extending os_ken is not an option, maybe the whole dragent can be built to run multiple instances of a single driver (os_ken) to then accept more than one bgp speaker scheduled to it.
14:14:11 <haleyb> copy/paste from the email
14:14:35 <slaweq> so this seems like a clearly n-d-r specific RFE and because of that I think we may also ask frickler about his opinion as he is one of the most active maintainers of that stadium project
14:14:55 <mlavalle> ++
14:15:13 <slaweq> regarding os-ken - do You know if maybe something like that was implemented in ryu after we forked it?
14:15:19 <lajoskatona> true, frickler has good background for n-d-r
14:15:32 <slaweq> maybe we 'just' would need to port some changes
14:16:05 <lajoskatona> I checked ryu a few weeks ago and there were no changes as I remember, and the last changes/fixes were ported by ralonsoh
14:16:15 <lajoskatona> but it can be double checked
14:16:19 <haleyb> yes, good idea. and it seems a spec would be good as lajoskatona mentioned so we have more info on the possible solutions
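(For context, a minimal sketch of what option (1) would roughly look like at the os-ken level: a single BGPSpeaker carrying both the IPv4 and IPv6 unicast address families over one MP-BGP session. The AS numbers, addresses and prefixes are made-up examples, and the capability flags follow the os-ken (ex-Ryu) BGPSpeaker API as documented; this is an illustration only, not the proposed implementation.)

```python
# Rough sketch of option (1): one os-ken BGPSpeaker announcing both IPv4 and
# IPv6 unicast routes over a single MP-BGP session. AS numbers, addresses and
# prefixes are made up; the enable_ipv4/enable_ipv6 capability flags come from
# the os-ken (ex-Ryu) BGPSpeaker API and should be double-checked against os-ken.
from os_ken.services.protocols.bgp.bgpspeaker import BGPSpeaker


def best_path_change(event):
    # Called by os-ken when the best path for a prefix changes.
    print(event.prefix, event.nexthop, event.is_withdraw)


speaker = BGPSpeaker(as_number=64512, router_id='192.0.2.1',
                     best_path_change_handler=best_path_change)

# Negotiate both address families with the same peer, instead of needing a
# second speaker/session per family.
speaker.neighbor_add('192.0.2.2', remote_as=64513,
                     enable_ipv4=True, enable_ipv6=True)

speaker.prefix_add('203.0.113.0/24', next_hop='192.0.2.1')
speaker.prefix_add('2001:db8:1::/64', next_hop='2001:db8::1')
```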
14:16:22 <slaweq> lajoskatona ok, thx
14:18:18 <haleyb> so i will add some comments based on above and ask for a spec so we can discuss options
14:18:32 <lajoskatona> +1
14:19:21 <mlavalle> +1
14:19:30 <slaweq> before the spec shouldn't we vote if we accept that rfe in general? I'm ok with it and the spec can help to choose the correct solution to be implemented, but just for formality I think we should approve the rfe :)
14:20:00 <mlavalle> the rfe is reasonable IMO
14:20:45 <lajoskatona> +1 from me to the RFE
14:21:12 <haleyb> slaweq: true, i was only thinking of getting more info, but it does seem reasonable to me as well
14:22:50 <haleyb> i actually don't remember the rules, is 4 quorum?
14:23:04 <haleyb> it might be the best we can do until rodolfo gets back
14:23:46 <slaweq> +1 from me for the RFE
14:23:51 <ihrachys> I can throw my +0.5 in (hides)
14:23:54 <slaweq> and I think we already have 5 votes now
14:24:04 <slaweq> 5.5 :)
14:24:46 <mlavalle> lol 5.5
14:25:25 <haleyb> i only counted myself lajoskatona mlavalle slaweq (4) + ihrachys (.5)
14:25:26 <racosta> I also think this RFE is reasonable ;)
14:26:00 <mlavalle> thanks for your opinion racosta :-)
14:26:02 <haleyb> racosta: ack, thanks, i will approve then and ask for a spec
14:26:06 <slaweq> yes, it's 4
14:26:07 <slaweq> sorry
14:26:32 * haleyb is only on first coffee and was out late so had to double check
14:27:09 * slaweq had holiday yesterday and today work is hard :P
14:28:14 <haleyb> ok, next topic
14:28:48 <haleyb> lajoskatona added a topic about sqlalchemy, i've been looking at this too
14:29:47 <lajoskatona> yes, actually my goal was to make it visible to everybody and have a discussion
14:30:07 <haleyb> i was in the process of creating a bug yesterday when i had to leave
14:30:29 <haleyb> and i was testing a change to see if it helped with the cover job
14:30:47 <ihrachys> is there more to do there except posting a sed s/old/new/ patch + pass CI? are there hidden risks beyond it?
14:30:47 <lajoskatona> so my understanding is that we currently have lazy=subquery which was fine for years with sqlalchemy<2
14:31:02 <haleyb> but let's talk, i have a link to the selectin page
14:31:14 <lajoskatona> and even today we can live with it, but it is not efficient
14:31:25 <haleyb> #link https://docs.sqlalchemy.org/en/20/orm/queryguide/relationships.html#selectin-eager-loading
14:31:33 <lajoskatona> ihrachys: I suppose no but never tried :-)
14:31:39 <haleyb> What Kind of Loading to Use ?
14:31:53 <haleyb> that's where i was reading
14:32:43 <haleyb> One to Many / Many to Many Collection - The selectinload() is generally the best loading strategy to use
14:32:48 <ihrachys> "The only scenario in which selectin eager loading is not feasible is when the model is using composite primary keys, and the backend database does not support tuples with IN, which currently includes SQL Server." does it imply we are good as long as we don't deploy on microsoft?
14:33:27 <lajoskatona> "The subqueryload() eager loader is mostly legacy at this point, superseded by the selectinload()"
14:34:09 <lajoskatona> so it can happen even that it will be dropped in a future release I suppose
14:34:52 <lajoskatona> ihrachys: is microsoft a deploy destination for Neutron / openstack?
14:35:00 <slaweq> IMO we are still in a pretty good moment in the cycle to try it - worst case we can revert the change, right?
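(For reference, the loader change being discussed looks roughly like this in isolation; a generic SQLAlchemy 2.x sketch with placeholder Network/Subnet models, not Neutron's actual mappings.)

```python
# Generic SQLAlchemy 2.x sketch with placeholder models (not Neutron's actual
# mappings): the change under discussion is essentially swapping the loader
# strategy on the relationship().
from sqlalchemy import ForeignKey, select
from sqlalchemy.orm import (DeclarativeBase, Mapped, mapped_column,
                            relationship, selectinload)


class Base(DeclarativeBase):
    pass


class Network(Base):
    __tablename__ = 'networks'
    id: Mapped[int] = mapped_column(primary_key=True)
    # Old: lazy='subquery' re-embeds the original query as a subquery in the
    # extra SELECT; mostly legacy in SQLAlchemy 2.x.
    # subnets: Mapped[list['Subnet']] = relationship(lazy='subquery')
    # New: lazy='selectin' emits "SELECT ... WHERE network_id IN (...)" instead.
    subnets: Mapped[list['Subnet']] = relationship(lazy='selectin')


class Subnet(Base):
    __tablename__ = 'subnets'
    id: Mapped[int] = mapped_column(primary_key=True)
    network_id: Mapped[int] = mapped_column(ForeignKey('networks.id'))


# The same strategy can also be requested per query, without touching the model:
stmt = select(Network).options(selectinload(Network.subnets))
```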
14:35:05 <ihrachys> won't be dropped; not until ms sql is gone I guess. but we don't care.
14:35:15 <lajoskatona> :-)
14:35:27 <lajoskatona> slaweq: +1
14:35:33 <ihrachys> lajoskatona I don't think a lot of people even try postgres... we are a mysql shop, for better or worse.
14:35:39 <slaweq> lajoskatona: no, we officially support only mysql/mariadb and postgresql
14:35:52 <ihrachys> so let's just post a sed s/old/new/ and see if anything falls off :)
14:35:55 <haleyb> i can create a bug today and propose a patch. i've only looked in neutron so far, not neutron-lib or other places
14:36:01 <slaweq> ihrachys++
14:36:09 <mlavalle> yeap, let's give it a try
14:36:34 <lajoskatona> +1, haleyb if you lack time next week I can jump in
14:36:58 <ihrachys> haleyb probably a good idea to run the neutron-lib change against the master neutron test suite... which reminds me that we don't have such a job :) https://bugs.launchpad.net/neutron/+bug/2067515
14:37:06 <lajoskatona> I quickly grepped the stadiums and as far as I can see there is no direct mention of lazy=subquery
14:37:41 <haleyb> lajoskatona: i already have a patch locally, i was waiting for the cover test to complete when i had to leave yesterday
14:38:39 <lajoskatona> haleyb: cool, thanks
14:38:55 <ihrachys> beyond the sql query type change... do we want to touch on memory consumption in general?
14:39:21 <ihrachys> afair Mike said the query change may save memory, but sqla2 should not have blown the memory in the first place
14:39:23 <haleyb> we might need a change in neutron-lib though, in _load_one_to_manys(), will have to look
14:40:05 <ihrachys> haleyb can probably switch in stages, no need to replace all of them at once if some pose unexpected work
14:40:34 <racosta> will it be possible to see a reduction in the time spent on queries through tests in Zuul?
14:40:43 <racosta> I remembered this discussion here: https://etherpad.opendev.org/p/oct2023-ptg-os-dbperf-crossproject
14:41:02 <haleyb> ihrachys: my results show old and new versions of coverage use about the same amount of memory, so no smoking gun yet
14:41:39 <ihrachys> :(
14:41:59 <ihrachys> racosta I see select counters; will the SELECT IN switch show a reduction?
14:42:33 <haleyb> and it's still unclear to me why the sqlalchemy bump triggered this
14:43:35 <ihrachys> haleyb I think I ran with sqla1 vs 2 locally (cover tox) and both seemed to have grown in memory consumed over time. so I couldn't reproduce locally. can you?
14:45:46 <haleyb> ihrachys: i can occasionally get my VM to go sideways with sqla2, i need to move back to v1 and retry there
14:46:22 <ihrachys> ack. "occasionally" is the worst. maybe I was just lucky then.
14:46:38 <haleyb> the fact each coverage process can consume 30-40% memory is sad
14:47:33 <ihrachys> thanks for the update. I guess we will learn more in the next weeks. we should probably move on?
14:47:41 <haleyb> we have one more topic from slaweq
14:48:16 <slaweq> thx haleyb, I wanted to ask about Your opinion on https://bugs.launchpad.net/neutron/+bug/2022043/
14:48:34 <slaweq> I was trying to explain it better in comment #5 in that LP
14:48:39 <ihrachys> slaweq thanks for the last comment, helps a lot
14:48:51 <slaweq> there was a fix for that but apparently it broke some ci jobs
14:49:19 <lajoskatona> tempest jobs also?
14:49:22 <slaweq> so I am now not sure if we want/should try to fix it or maybe it is better to leave it as it is now and simply close the LP
14:49:38 <ihrachys> I have a question about "Unfortunately, due to the "populate_project_id" method ... project_id is always added to every resource". AFAIU the patch (reverted) was accommodating this behavior. Should this behavior be unconditional though in the first place? E.g. should populate_project_id populate for all dicts regardless of whether the API definition has the field?..
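(A rough illustration of the behavior ihrachys is asking about, not the actual neutron-lib code; the real logic lives in neutron_lib/api/attributes.py, linked later in the discussion, and the body below is simplified.)

```python
# Illustration only -- not the actual neutron-lib implementation. A simplified
# picture of what populate_project_id effectively does for create (POST) calls.
def populate_project_id(context, res_dict, is_create):
    """Fill in project_id from the request context if the body omitted it."""
    if res_dict.get('project_id') or res_dict.get('tenant_id'):
        return
    if is_create and context.project_id:
        # Happens for every POST, even for resources (e.g. flavors) whose API
        # definition does not declare a project_id attribute -- which is the
        # unconditional behavior being questioned above.
        res_dict['project_id'] = context.project_id
```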
14:50:17 <slaweq> ihrachys that is a good question which I asked myself yesterday too
14:50:50 <ihrachys> you have an answer or are you also confused like me?
14:50:56 <slaweq> maybe it should be fixed in a different way and in https://github.com/openstack/neutron-lib/blob/master/neutron_lib/api/attributes.py#L259 it shouldn't be added at all to the res_dict if attributes don't have it
14:51:10 <slaweq> ihrachys I don't have an answer for that
14:51:12 <slaweq> :)
14:52:16 <ihrachys> on a superficial level, this SOUNDS like a reasonable thing to do - don't add fields that are not supported. I suspect there may be a risk of breakage if a plugin then relies on them being present?
14:53:50 <slaweq> ihrachys maybe, but I am not aware of anything like that really. Also tenant_id is still always in the context object. If a plugin relies on e.g. flavor['project_id'] then this is IMO simply a mistake
14:54:05 <haleyb> slaweq: so if these resources don't need a project_id, what is it set to? just the caller's project_id?
14:54:28 <slaweq> haleyb yes, look in https://github.com/openstack/neutron-lib/blob/master/neutron_lib/api/attributes.py#L259
14:54:51 <slaweq> for every POST request, if there is no project_id in the res_dict it is added from the context
14:54:58 <ihrachys> it is a bug in a plugin if it does rely on it. still a break that can go nasty. :) (gonna revert another one lol)
14:55:04 <haleyb> ack
14:55:06 <slaweq> for resources which belong to a project it makes sense
14:55:16 <slaweq> but for some resources not really
14:55:44 <slaweq> ihrachys and that's why I am asking here what we should do with it
14:55:57 <slaweq> maybe it is simply better to leave it as it is
14:55:58 <slaweq> :)
14:56:25 <ihrachys> I'm always for trying to make the code correct. we can throw this change at the wall and see if it sticks, like with the sql queries :)
14:57:16 <ihrachys> if not, we should add a comment to the api definitions explaining why these are part of the definitions. perhaps link to a bug.
14:58:07 <mlavalle> yes, let's try to fix it
14:58:38 <slaweq> ok, so I will try to propose another fix for it and make sure I don't break any ci this time
14:58:45 <slaweq> :)
14:58:52 <ihrachys> thanks! we should check against the neutron suite at least.
14:58:54 <mlavalle> like you said, we are still early in the cycle
14:59:18 <slaweq> thank You for the help with this
14:59:27 <lajoskatona> +1
14:59:33 <haleyb> slaweq: ack, thanks, but someone beat you to breaking the CI, that job is failing again :(
14:59:33 <slaweq> and sorry for breaking ci so You had to revert it :)
14:59:46 <slaweq> LOL
15:00:00 * haleyb is crying
15:00:01 <ihrachys> np, it's part of the job. we learned ci is not testing something, so hopefully we'll plug the gap
15:01:06 <haleyb> ihrachys: it might have been one of the netaddr changes...
15:01:40 <ihrachys> I was going to ask about the logs after the meeting because of course - landed a bunch of these yesterday.
15:02:30 <haleyb> alright, we can end the meeting
15:02:44 <ihrachys> o/
15:03:00 <mlavalle> \o
15:03:02 <haleyb> #endmeeting