11:00:29 <oneswig> #startmeeting scientific-sig
11:00:30 <openstack> Meeting started Wed May 22 11:00:29 2019 UTC and is due to finish in 60 minutes.  The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:00:31 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:00:33 <openstack> The meeting name has been set to 'scientific_sig'
11:00:40 <oneswig> up up and away
11:00:57 <oneswig> #link Agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_May_22nd_2019
11:01:06 <oneswig> Hi all
11:01:21 <martial_> Good morning ;)
11:01:23 <janders> hi All!
11:01:53 <oneswig> Let's get started ...
11:02:02 <janders> SDN time!
11:02:07 <oneswig> #topic coherent management of SDN fabrics
11:02:13 <oneswig> janders: what's been going wrong?
11:02:44 <janders> as discussed in the Scientific SIG at the PTG, I took the challenge of SDN consistency issues to the Neutron folks
11:03:00 <janders> so - to recap - what is the challenge in the first place?
11:03:30 <janders> I will use SuperCloud's Mellanox SDN as an example (but I believe this challenge is platform agnostic)
11:04:45 <janders> in my deployment we occasionally hit a scenario where for whatever reason (packet drops, system overload, ...) Neutron requests a change on the SDN
11:05:00 <janders> but that change never completes, despite Neutron thinking it has completed
11:05:32 <oneswig> First question - if this is a request via TCP, how can packet drops be a factor?
11:05:52 <janders> there are several layers of APIs on the SDN side so this might happen between them
11:06:09 <janders> good question oneswig, unfortunately I don't have the answer
11:06:32 <oneswig> OK, carry on...
11:06:34 <janders> I'd think NEO<>UFM traffic is TCP as well so this should not be happening - but it does
11:06:56 <janders> it's rare but when it happens it's a massive PITA to reconcile all the sources of truth
11:07:28 <janders> so basically we're looking at a scenario where Neutron requests something from the SDN, SDN returns 200 OK but for some reason things aren't right
11:07:43 <janders> (or another one where things were right but for some reason stopped being right)
11:08:16 <janders> I believe in stock-standard Neutron/OVS, the L2 agent is polling the port status and if it finds any mismatches it will rectify those
11:08:22 <martial_> That sounds like it could be frustrating indeed
11:08:38 <janders> this is missing from nearly all SDN solutions that I know of - either API or ansible driven
11:09:08 <janders> Neutron configures ports once - and when it gets a 200 it never verifies the config again (let alone fixing anything)
11:09:46 <oneswig> The 200 OK response when a change has not been applied might be a specific issue with this driver.
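The "poll and rectify" behaviour janders describes for the OVS L2 agent can be sketched roughly as follows. This is an illustrative sketch only, assuming simple dict-based views of the desired (Neutron) and actual (SDN) port state; `apply_port_config` is a hypothetical callback, not a real Neutron API.

```python
# Illustrative sketch of a poll-and-rectify loop, as described above for the
# OVS L2 agent. The desired/actual dicts and apply_port_config callback are
# assumptions for the example, not real Neutron interfaces.

def find_mismatches(desired: dict, actual: dict) -> list:
    """Return port IDs whose actual config differs from the desired config."""
    mismatched = []
    for port_id, config in desired.items():
        if actual.get(port_id) != config:
            mismatched.append(port_id)
    return mismatched

def reconcile(desired: dict, actual: dict, apply_port_config) -> None:
    """Re-apply the desired config for every mismatched port."""
    for port_id in find_mismatches(desired, actual):
        apply_port_config(port_id, desired[port_id])
```

The point of the complaint above is that most SDN drivers run the equivalent of `apply_port_config` exactly once and never re-run `find_mismatches` afterwards.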
11:09:56 <janders> this is not cool, so as we discussed I spoke to Miguel about it at the PTG. He was supportive, promised to have a look at it with the team, and asked me to open a bug:
11:10:07 <janders> https://bugs.launchpad.net/neutron/+bug/1829449
11:10:08 <openstack> Launchpad bug 1829449 in neutron "Implement consistency check and self-healing for SDN-managed fabrics" [Undecided,New]
11:10:19 <oneswig> good bug, great read.
11:10:39 <janders> the Neutron guys pointed me to this blueprint:
11:10:40 <janders> https://review.opendev.org/#/c/565463/12/specs/stein/sbi-database-consistency.rst
11:10:54 <janders> it's interesting - it seems people have been looking at similar issues for a while
11:11:40 <oneswig> This spec covers similar areas and I'm aware that the mlnx SDN driver needed better concurrency handling on the southbound API to it.
11:11:52 <janders> oneswig: I agree there is room for improvement on the Mellanox SDN side, but even if the SDN stops returning 200 OK before the fabric is configured, I bet we'll still run into an issue or two where for some reason state is lost somewhere
11:12:42 <oneswig> I think you and mgoddard both hit issues with races on this interface, right?
11:12:44 <janders> plus it would be good if it can fix itself when an operator makes a mistake
11:12:55 <janders> I think so
11:13:04 <janders> I think what was killing us was garbage in the journal
11:13:40 <janders> neutron would try replaying the journal (D)DoSing the SDN API
11:13:49 <janders> with requests that were no longer valid
11:14:18 <janders> from my understanding this spec touches on both things - consistency checking/enforcement (which I like)
11:14:46 <janders> and also on the journal which was supposed to help with some of these challenges - but in my experience it does not always help, sometimes it makes things worse
11:14:59 <janders> but overall yeah I think there's value in this blueprint
11:15:23 <janders> what I wanted to get out of discussing this here is to find out if you think implementing this blueprint would fix the issues you're seeing
11:15:58 <janders> personally I would like to see generic functionality in neutron which checks the state of the fabric for inconsistencies and resolves these
11:15:59 <janders> PLUS
11:16:18 <oneswig> It makes reference to Open Daylight's journal but mostly is retrofitted from OVN's implementation.
11:16:24 <janders> I would like the vendors to 1) work on general resiliency of their solutions but 2) also provide the functionality that the above would plug into
11:16:42 <oneswig> If there's an effort to make something cross-SDN, it would be good to see more consideration of that I think
11:16:58 <oneswig> janders: agreed
11:17:22 <janders> I know for sure Juniper users suffer with the same (neutron and the SDN getting out of sync + having to merge manually which is a nightmare)
11:17:31 <janders> Blair isn't here but I think he's had some of this with Cumulus too
11:18:25 <janders> when I was chatting to Miguel with Adrian from Mellanox, an Ericsson developer came and joined us, saying their stuff is suffering from this too
11:19:07 <janders> I was trying to get some feedback from Mellanox on this but my contacts are OOO - so we're probably looking at next week at the earliest
11:19:18 <oneswig> Way back (Paris summit) I recall discussing this kind of issue with the team from Big Switch - part of their solution was to checksum current state and check it against the state in the SDN controller. All port state is cheaply compared in one move - it's only costly if the checksums don't tally
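The checksum idea oneswig recalls can be sketched like this: hash a canonical serialisation of the whole port table on each side, and only walk the full state when the digests disagree. This is an assumed reconstruction of the technique, not Big Switch's actual implementation.

```python
# Sketch of the checksum-comparison idea described above (an assumption for
# illustration, not Big Switch's real code): one cheap hash comparison per
# sync cycle; the costly full diff only runs when the digests differ.
import hashlib
import json

def state_digest(ports: dict) -> str:
    """Digest of the whole port table; sorted keys make it order-independent."""
    canonical = json.dumps(ports, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def states_match(neutron_view: dict, sdn_view: dict) -> bool:
    """Cheap check: compare one digest per side instead of every port."""
    return state_digest(neutron_view) == state_digest(sdn_view)
```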
11:22:46 <janders> get_inconsistent_resources: Get a list of inconsistent resources whose revision number from the aforementioned table differs from the standardattributes table.
11:22:57 <janders> (line 332)
11:24:01 <janders> sync_resource: The method provides a way to handle cases in which the SBI needs to be called, bumping the revision number once syncing with the SBI backend has succeeded. In most cases this will be called in the ``*_postcommit()`` methods.
11:24:09 <janders> (line 226)
11:24:31 <janders> if I'm reading this right these two functions should be able to resolve issues in the most common scenarios I'm seeing
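The revision-number scheme the two quoted spec methods rely on can be sketched as below. The real `get_inconsistent_resources`/`sync_resource` live in the Neutron spec under review; the dict-based signatures and the `push_to_sbi` callback here are illustrative assumptions.

```python
# Rough sketch of the spec's revision-number reconciliation (names mirror the
# spec, but these signatures are illustrative, not Neutron's actual API).

def get_inconsistent_resources(sdn_revisions: dict, neutron_revisions: dict) -> list:
    """Resources whose SDN-side revision lags the one recorded by Neutron
    (the standardattributes table in the real spec)."""
    return [rid for rid, rev in neutron_revisions.items()
            if sdn_revisions.get(rid, -1) < rev]

def sync_resource(rid: str, neutron_revisions: dict, sdn_revisions: dict,
                  push_to_sbi) -> None:
    """Call the SBI, and only bump the stored revision once the push succeeds.
    A failed push leaves the resource inconsistent, so it is retried later."""
    if push_to_sbi(rid):
        sdn_revisions[rid] = neutron_revisions[rid]
```

The key property is that a lost or half-applied change never advances the revision, so the resource keeps showing up as inconsistent until a sync actually succeeds.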
11:24:49 <oneswig> I was interested by the treatment on existing implementations here: https://review.opendev.org/#/c/565463/12/specs/stein/sbi-database-consistency.rst@451
11:26:36 <janders> this is what Neutron is doing today, right?
11:26:55 <oneswig> It's what OpenDaylight's Neutron driver is doing today, apparently
11:27:14 <janders> if that's the case it would work if it can't talk to the SDN, but if the SDN accepts the request, it's deleted from journal and that's it
11:27:45 <janders> if something goes sideways afterwards, or the port "unconfigures itself" somehow journal won't help
11:27:54 <janders> I believe the Mellanox mechanism driver does exactly the same
11:29:04 <janders> I think get_inconsistent_resources and sync_resource could provide functionality to deal with these issues
11:29:37 <oneswig> That's something I think at the PTG we compared with a deep scrub in Ceph, right?
11:29:43 <janders> yes
11:32:02 <janders> so - that's where things are at in regards to this challenge at this point in time
11:32:11 <janders> I'm thinking what should we do next
11:32:26 <oneswig> The prerequisites (line 104) - do you think Neo could track a revision sequence number like this?
11:32:38 <janders> I'll try to run the blueprint by the Mellanox guys when they're back
11:33:07 <janders> good question! My guess is - the current version might not have that field, but I suppose it shouldn't be a big deal to add it..
11:34:03 <oneswig> It might be exposing an assumption that (for example) the OVN controller doesn't expect there to be other users making changes, where Neo is designed as a multi-user tool for which Neutron is one user.
11:34:05 <janders> I also believe there's a fair bit of activity around SDN at Mellanox so perhaps there are some new avenues that could be used to get this functionality
11:34:35 <oneswig> janders: biggest hope would be to roll Neo and UFM into one ...
11:34:51 <janders> agreed and I don't think that's completely off the table
11:35:23 <janders> although it won't happen overnight..
11:35:43 <oneswig> No indeed.  And I wouldn't want to use it if it did!
11:35:51 <janders> haha! :)
11:36:08 <janders> do you think you would be able to run this blueprint by John and/or Mark for extra feedback?
11:36:52 <janders> I'm thinking after we've all looked through it, perhaps let's have one more discussion here (I will see if we can get the Mellanoxes here) and maybe then let's ask the Neutron folks if we could bring this up in their meeting?
11:36:57 <oneswig> I think there are questions to ask of this spec but it's pretty close. My main concern is whether it can be implemented elsewhere, and to what extent it reinvents existing work in each implementation
11:37:15 <janders> all valid concerns
11:37:27 <janders> the approach I was thinking of taking is get it right for one vendor plugin
11:37:39 <janders> and abstract the implementation away so that the others can use it too
11:37:44 <oneswig> janders: would definitely like to see this go further though.
11:38:04 <janders> yeah - I think this will literally make or break SDN based deployments
11:38:06 <oneswig> you mean apart from the reference implementation in OVN?
11:38:09 <janders> people are struggling at scale
11:38:23 <janders> right, perhaps OVN could be that reference
11:38:41 <janders> when I made that statement first I wasn't aware to what extent this was already investigated for OVN
11:38:46 <oneswig> I think it needs one in addition to OVN - otherwise it will include OVN-centric assumptions
11:39:04 <janders> and I suppose being the open implementation it's well positioned as a reference / good practice
11:40:51 <oneswig> Tungsten or ODL would be good to see as examples of how this proposal would interoperate with a proven system.
11:41:49 <oneswig> It would not be a trivial change for them.  The Mellanox driver is smaller and simpler.
11:42:23 <janders> indeed
11:42:48 <oneswig> Anyway, I'll circulate it here and see what people think.
11:43:17 <janders> what was the system that you guys built that used dual eth+ib interconnect? Was that used for ASKAP?
11:43:22 <janders> I wonder about scale of it
11:43:29 <oneswig> It would certainly be worthwhile if you can canvas your Mellanox contact on it too
11:43:47 <oneswig> janders: not for ASKAP.  It's ALaSKA at Cambridge University.
11:43:57 <oneswig> Getting a Rocky upgrade today.
11:44:04 <janders> oh wow
11:44:20 <oneswig> Scale is 2 racks, 2 IB switches.
11:44:24 <janders> yeah I meant ALaSKA - when I think of SKA I automatically type ASKAP :)
11:44:29 <janders> right!
11:44:45 <janders> yeah I suspect that we can be quite successful building small-medium scale systems with the current stack
11:44:53 <oneswig> ASKAP's far more impressive as a prototype :-)
11:45:00 <janders> but before we go thousands we need to improve this, otherwise operations might get tricky
11:45:26 <janders> especially if it's a highly dynamic system
11:45:44 <oneswig> I very much agree.
11:45:55 <janders> ok! I think we have a very good plan
11:46:03 <oneswig> The part that isn't covered here is any means of requesting a resync from SDN to Neutron
11:46:05 <janders> I suggest we touch base on this again next week
11:46:19 <janders> true!
11:46:25 <oneswig> I'm out next week, alas. 2 weeks time?
11:46:41 <janders> I haven't thought about how that consistency check function is called...
11:46:56 <janders> I wouldn't mind having a CLI command to run manually to start with
11:47:08 <janders> for 1) consistency check and 2) reconcile
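The manual CLI janders suggests - a read-only check plus a separate reconcile - might look something like this. The command name, subcommands, and flags are all hypothetical; the bodies would call the check/sync machinery discussed earlier.

```python
# Hypothetical CLI for the two operations suggested above: a read-only
# "check" and a mutating "reconcile". Everything here (command name,
# subcommands, --dry-run flag) is an assumption for illustration.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="sdn-consistency")
    sub = parser.add_subparsers(dest="command", required=True)
    # 1) report Neutron/SDN mismatches without changing anything
    sub.add_parser("check", help="report Neutron/SDN mismatches, change nothing")
    # 2) re-apply desired state, optionally previewing first
    reconcile = sub.add_parser("reconcile",
                               help="re-apply desired state to the SDN")
    reconcile.add_argument("--dry-run", action="store_true",
                           help="show what would be re-applied")
    return parser
```

Separating check from reconcile keeps the dangerous operation explicit, which matters when operators are already nervous about the SDN diverging.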
11:47:32 <janders> OK! sounds like a plan. That should be the right amount of time to engage the key individuals
11:47:41 <oneswig> Any path to complete the feedback loop would be good, no matter how slow.
11:48:02 <oneswig> OK janders, remind me to put it on the agenda for follow-up in 2 weeks.
11:48:10 <janders> ok! will do
11:48:20 <oneswig> #topic AOB
11:48:34 <janders> if you have any comments / questions in the meantime drop me a line - or just update the bug directly
11:48:41 <janders> the guys are quite responsive
11:48:42 <oneswig> will do, thanks
11:48:49 <oneswig> Anything else new?
11:49:19 <janders> nothing I'm allowed to talk about - made some good progress with procurement..
11:49:31 <janders> that's been my life for the last week mostly :(
11:49:40 <janders> how about you?
11:49:42 <oneswig> Ah, shopping, what a pleasant distraction on a winter's night :-)
11:49:55 <janders> Rocky upgrade sounds exciting! :)
11:50:24 <oneswig> Proceeding through update to latest Queens so far... nearly ready for the jump (but I'm not driving it)
11:50:51 <oneswig> I'm working on my slides for CERN next week - 2 presentations on SKA and Swiss Personalized Health Network...
11:51:18 <janders> great! are the presentations recorded?
11:51:35 <oneswig> I know a few folks from the SIG will be there so I'm hoping to make it worthwhile :-)
11:51:41 <oneswig> Don't know about recording, I expect so.
11:51:44 <janders> I'm sure it will be
11:51:54 <janders> that would be great! :)
11:52:29 <oneswig> Better get back to the preparing then, no pressure.
11:53:06 <janders> I better leave you to it :)
11:53:07 <oneswig> BTW do you have a Cumulus deployment?  I wonder if they might provide another view
11:53:21 <janders> I don't - I believe Blair used to run one
11:53:34 <janders> I think they are closer to networking-ansible driver
11:53:48 <janders> they do have an API that's like a proxy which in turn sshes into switches
11:54:19 <oneswig> Right, let's copy him in too, see if there's a known issue and if they'd be interested.
11:54:20 <janders> and - sometimes the state change doesn't propagate all the way to the port and manual intervention is needed - at least that's what I heard some time back
11:54:28 <janders> that's a great idea
11:54:39 <oneswig> OK janders, any more to cover?
11:54:47 <janders> no, I think we're good
11:54:54 <janders> enjoy Geneva!
11:54:58 <oneswig> Great, let's close, it must be late at your end.
11:54:59 <oneswig> Thanks!
11:55:02 <janders> till next time
11:55:03 <oneswig> #endmeeting