11:00:19 <oneswig> #startmeeting scientific-sig
11:00:20 <openstack> Meeting started Wed Jan 29 11:00:19 2020 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:00:21 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:00:23 <openstack> The meeting name has been set to 'scientific_sig'
11:00:36 <oneswig> Hello
11:00:58 <oneswig> ... echo ...
11:05:03 <ttx> ... o ...
11:05:49 <oneswig> hi ttx
11:06:05 <oneswig> quiet, real quiet today (should have posted an agenda... :-(
11:06:12 <ttx> I replied so that you did not feel too lonely :)
11:06:57 <oneswig> We usually pull in 4-5 but there were no agenda items from the sig slack channel
11:08:26 <janders> g'day all
11:08:30 <janders> sorry mac issues
11:08:47 <janders> time to get a real laptop not a toy
11:08:55 <oneswig> Hi janders, just us and ttx right now...
11:09:23 <janders> I did see that mention of the Mellanox HDR OpenStack press release from last week's log
11:09:24 <oneswig> Macbook? Mine runs IRC just fine... got about 6GB in swap doing something though.
11:09:48 <janders> my 5 cents - it was indeed running fine for quite a while, I think they just streamlined it a little more
11:09:56 <oneswig> janders: is that your doing I wonder?
11:10:16 <oneswig> what's different with hdr that required change?
11:10:20 <janders> I might have contributed a little :) but mostly good work by MLNX
11:11:00 <janders> I think most work was around tripleo integration bits
11:11:20 <oneswig> Ah, Ok.
11:11:21 <janders> HDR...
good question - I haven't played with HDR200 vfs
11:11:36 <janders> I've got HDR200 storage and HDR100 computes
11:11:43 <oneswig> I've been having lots of fun this last week with VF-LAG
11:11:47 <janders> *maybe* there was a bit of fiddling there
11:12:29 <janders> specifically - HDR200 on PCIe3
11:12:39 <janders> other than that - I think it was mostly integration work
11:12:49 <janders> VF-LAG - interesting!
11:12:57 <janders> what is it like?
11:13:24 <janders> I've never touched it - and it could be interesting for my cyber system going forward in certain circumstances
11:13:39 <oneswig> Jury's out at present. The major issue I had was the systemd/udev plumbing to put VFs into switchdev mode instead of "legacy" mode.
11:14:10 <oneswig> Curiously, it worked in legacy mode but I haven't got around to trying it in switchdev mode yet. That's today's fun.
11:14:31 <janders> what does it look like from the VM owner perspective? ethX and ethY, each coming off a different port?
11:14:37 <oneswig> When I was misconfiguring it, I got Tx bandwidth of 1 port and it appeared I could receive on either port.
11:14:56 <oneswig> janders: no, 1 VF, somehow coupled to 2 PFs.
11:14:59 <janders> or is it magicnic0 which stays up despite ports backing it going up and down?
11:15:09 <janders> ok so option 2) nice!
11:15:12 <oneswig> I think that's the idea. Haven't tested the failover bit yet.
11:15:37 <janders> is the promise that you can aggregate bandwidth and failover at the same time?
11:16:32 <oneswig> The aggregated bandwidth needs verifying but the failover capability is intended for sure.
11:16:38 <oneswig> I'll let you know!
11:17:38 <oneswig> Any news on Supercloud?
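[Editor's note: the switchdev plumbing oneswig describes around 11:13 is not spelled out in the log. A minimal sketch of the usual legacy-to-switchdev flip on a ConnectX NIC follows; the PCI addresses and interface name are placeholders, not values from this deployment.]

```shell
# Hypothetical PF at 0000:5e:00.0, netdev enp94s0f0 - adjust for your host.
# Create the VFs first:
echo 2 > /sys/class/net/enp94s0f0/device/sriov_numvfs

# mlx5 refuses the eswitch mode change while VFs are bound, so unbind them
# (VF PCI addresses below are placeholders):
echo 0000:5e:00.2 > /sys/bus/pci/drivers/mlx5_core/unbind
echo 0000:5e:00.3 > /sys/bus/pci/drivers/mlx5_core/unbind

# Flip the embedded switch from the default "legacy" SR-IOV mode to
# switchdev (required for VF-LAG and OVS offload):
devlink dev eswitch set pci/0000:5e:00.0 mode switchdev
devlink dev eswitch show pci/0000:5e:00.0
```

Getting this sequence to run reliably at boot is exactly the systemd/udev ordering problem mentioned above.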
11:18:11 <oneswig> btw I'm hoping to write up the VF-LAG experience for next week, it's almost completely undocumented fwiw
11:18:45 <janders> actually I do have some news, what might be disappointing is these are not very performance related
11:18:58 <janders> SuperCloud got bogged down in the cyber security realm for some time it seems
11:19:15 <janders> but I have hit some interesting challenges and issues lately
11:19:51 <janders> in terms of cyber specifics I proposed the volume-transfer based mechanism of copying unsafe data in (and at a later stage - possibly copying data out)
11:20:05 <janders> will see what my security team have to say about this one
11:20:19 <janders> but so far so good, people seem to be liking the idea
11:20:23 <oneswig> this is malware payloads for analysis?
11:20:30 <janders> yes, in encrypted form
11:20:51 <oneswig> Loving your work, janders :-)
11:21:01 <janders> I might try to submit something for Vancouver on this, it is getting quite interesting really
11:21:16 <janders> very different from what I typically do
11:21:19 <oneswig> Is this into bare metal instances?
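[Editor's note: the volume-transfer mechanism janders proposes at 11:19 maps onto cinder's standard volume-transfer flow. A hedged sketch follows; the volume name is a placeholder and the exact workflow on SuperCloud may differ.]

```shell
# In the source (ingest) project: create a transfer request for the
# encrypted volume holding the unsafe data. "unsafe-sample-vol" is a
# placeholder name.
openstack volume transfer request create unsafe-sample-vol
# Prints a transfer id and an auth_key.

# In the destination (analysis) project: accept the transfer using both
# values, which moves ownership of the volume across the project boundary.
openstack volume transfer request accept --auth-key <auth_key> <transfer_id>
```

The appeal for a cyber-security workflow is that the data crosses the project boundary without ever traversing a shared network path between the two projects.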
11:21:25 <janders> VMs at this stage
11:21:36 <janders> but you raised an interesting point
11:21:56 <oneswig> probably wise unless you really trust your layers of defence :-)
11:22:09 <janders> this system will be used by guys who uncovered spectre/meltdown so looking at hardware layer security might be in scope
11:22:25 <janders> it wasn't brought up yet, so mostly looking at VMs
11:22:59 <janders> I should probably say - guys who participated in uncovering spectre/meltdown - it was a big, orchestrated effort
11:23:11 <janders> so that's one interesting bit
11:23:16 <janders> the other bit is more operational
11:23:33 <janders> hit some interesting failure modes in OSP13 which likely impact later releases, too
11:23:35 <janders> BZ coming soon
11:23:47 <janders> https://bugzilla.redhat.com/show_bug.cgi?id=1795402
11:23:47 <oneswig> You might need to give those researchers access to bios settings, microcode etc.
11:23:48 <openstack> bugzilla.redhat.com bug 1795402 in openstack-nova "Nova list --all-tenants | ERROR (ClientException): Unexpected API Error. | <type 'exceptions.TypeError'> (HTTP 500)" [Medium,New] - Assigned to nova-maint
11:24:09 <janders> essentially we had AMQP dropout brick an entire project
11:24:14 <oneswig> typeerror - suggests a code path nobody else is tickling
11:24:35 <janders> (and "openstack server list --all-projects")
11:24:48 <janders> I think it is rare, but when you hit it it does get very ugly very quickly
11:24:59 <oneswig> ouch
11:25:03 <oneswig> is that Queens?
11:25:24 <janders> essentially if AMQP dropout happens at a very bad time during instance creation, the instance ends up with missing fields in the DB and nova can't handle that
11:25:26 <janders> yes, queens
11:25:44 <janders> OSP16 should be out soon, but it will probably be a while before we upgrade
11:26:10 <janders> I fixed this within an hour from when it happened by setting deleted=1 for the instance and re-creating
11:26:11 <oneswig> Are you getting rabbit issues often enough to hit this?
11:26:19 <janders> it only happened once so far
11:26:33 <oneswig> Nice work for getting to the bottom of it!
11:26:33 <janders> but as per Murphy's law it happened at the least appropriate time
11:26:52 <janders> when I had a contractor set up something time-sensitive
11:27:14 <janders> so I did a quick and dirty fix and then had the RHAT guys have a look
11:27:30 <janders> they have some really smart fellas up in Brisbane
11:27:42 <janders> the "proper" fix in the BZ was their work
11:28:04 <janders> it's interesting how nova ended up in that state - I thought I'd share
11:28:33 <oneswig> thanks, really interesting to know!
11:28:41 <janders> there are known issues with OSP13 AMQP connection handling in nova-compute which are fixed in the latest minor updates
11:28:48 <janders> those might be affecting later versions, not sure
11:28:59 <janders> so the root cause of this is likely fixed
11:29:08 <janders> but in case of future errors...
nova should be smarter
11:29:17 <janders> I hope to see some resiliency enhancements
11:29:30 <janders> it should purge the network cache itself instead of needing this
11:29:47 <janders> supposedly there is a periodic task that does that but what I'm hearing is it is yet to be seen to fix anything
11:30:15 <janders> so - if you ever hit this (hope not) here's the SQL :)
11:31:20 <janders> other than that I will be putting some policy customisations into heat - and splitting the GPFS backend for "connected" and "disconnected" instances soon
11:31:37 <janders> (connected=with_floating_ips disconnected=only_accessible_via_vnc in our terminology)
11:32:35 <janders> so - long story short - not a lot of action, but some interesting stuff happening
11:32:50 <oneswig> sounds like you've had your hands busy!
11:33:20 <janders> a little, yes! :)
11:33:43 <oneswig> Thanks for the update, always good to hear your news
11:34:49 <janders> https://photos.app.goo.gl/dcLYzuRMNeHJKHv4A this nasty thing has been keeping us busy lately
11:35:22 <janders> Canberra has been affected by bushfire smoke lately but was lucky enough to avoid big fires nearby... till this week
11:35:32 <janders> so far it's not too bad but we gotta keep an eye on it
11:35:32 <oneswig> ouch. Take care janders!
11:35:56 <janders> thanks oneswig :)
11:36:15 <oneswig> I don't think there's anything else to cover here.
11:36:19 <janders> do you know when the CFP for Vancouver opens?
11:36:35 <janders> I keep checking the website but haven't seen much so far
11:36:55 <priteau> janders: Given the new format of this event, I am not sure if there will be a CFP as usual?
11:37:09 <janders> interesting!
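[Editor's note: the SQL janders refers to at 11:30:15 was not captured in the log. The following is a hedged reconstruction of the deleted=1 workaround he describes at 11:26, not the actual statement from the BZ; the instance UUID is a placeholder.]

```shell
# Soft-delete the half-created instance record so nova's listing code
# stops tripping over its missing fields (the HTTP 500 in bug 1795402),
# after which the instance can simply be re-created.
mysql nova <<'SQL'
UPDATE instances
   SET deleted = 1, deleted_at = NOW()
 WHERE uuid = '<broken-instance-uuid>'
   AND deleted = 0;
SQL
```

As janders notes, this is a quick-and-dirty operator fix; the "proper" fix lives in the Bugzilla entry linked earlier.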
11:37:12 <priteau> Maybe ttx knows more about this
11:37:19 <janders> would you be happy to explain the new format in a nutshell?
11:37:51 <priteau> I only know what's publicly available on the eventbrite page
11:38:00 <priteau> Each day will be broken into three parts:
11:38:10 <priteau> Short kickoff with all attendees to set the goals for the day or discuss the outcomes of the previous day
11:38:22 <priteau> OpenDev: Morning discussions covering projects like Ansible, Ceph, Kubernetes, OpenStack, and more
11:38:38 <priteau> PTG: Afternoon working sessions for project teams and SIGs to continue the morning's discussions.
11:38:51 <priteau> See https://www.eventbrite.com/e/opendev-ptg-vancouver-2020-tickets-88923270897
11:38:56 <janders> sounds like fewer presentations, more discussions?
11:39:47 <priteau> Yes, it explicitly says: OpenDev [...] will include discussion oriented sessions around a particular topic to explore a problem within a topic area, share common architectures, and collaborate around potential solutions.
11:40:10 <janders> looks like the balance between the "main" summit and PTG is about to turn upside-down...
11:40:34 <janders> which might not be a bad thing, but it is indeed very different from what it used to be
11:40:45 <janders> good? bad? what do you guys think?
11:41:08 <priteau> AFAIK there will still be a usual summit in Q4 2020
11:41:27 <priteau> Dunno if this new event is an improvement, we shall see
11:41:30 <oneswig> I like the focus of the new session, I hope it turns out how it is being described
11:41:51 <janders> one tricky bit is
11:42:08 <janders> with many organisations, getting travel approvals is heaps easier when you have a presentation slot
11:42:14 <janders> (that does include CSIRO)
11:42:41 <oneswig> lightning talk slot in a sig session perhaps :-)
11:42:56 <janders> haha :D I like your thinking oneswig
11:43:45 <janders> if the SARS-ng epidemic isn't too bad I'd very much like to re-visit Vancouver (and probably LA/California on the way back)
11:44:34 <janders> looks like Qantas is now flying direct SYD-YVR - AND getting out of the US is so much easier than getting in, immigration-queues wise
11:44:54 <janders> the above would be my way to go
11:45:42 <janders> June is the right time to do this, too
11:46:45 <janders> alright! it's getting close to the hour - so if you guys would like to wrap up, please go ahead
11:46:55 <janders> good chatting - and we'll chat next week
11:47:07 <ttx> re opendev
11:47:12 <oneswig> same to you janders
11:47:22 <ttx> there will not really be a CFP per se
11:48:09 <janders> ttx so more like tracks moderated by project leads, etc?
11:48:09 <ttx> each theme should have a programming committee responsible for picking content. Those may use a CFP, but probably will just reach out to find suitable speakers.... Presentations are just one side of the event
11:48:14 <ttx> most of it is open discussion
11:48:47 <ttx> More details will be out soon as we push out the call for programming committee members
11:49:04 <oneswig> great, thanks for clarifying ttx
11:49:52 <janders> +1
11:50:20 <oneswig> OK, any final items?
11:51:08 <janders> ttx we're happy to help as the SIG :)
11:52:21 <janders> oneswig I think we're good. See you next week!
:)
11:52:30 <oneswig> likewise janders
11:52:33 <oneswig> #endmeeting