11:00:19 #startmeeting scientific-sig
11:00:20 Meeting started Wed Jan 29 11:00:19 2020 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:00:21 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:00:23 The meeting name has been set to 'scientific_sig'
11:00:36 Hello
11:00:58 ... echo ...
11:05:03 ... o ...
11:05:49 hi ttx
11:06:05 quiet, real quiet today (should have posted an agenda... :-(
11:06:12 I replied so that you did not feel too lonely :)
11:06:57 We usually pull in 4-5 but there were no agenda items from the sig slack channel
11:08:26 g'day all
11:08:30 sorry mac issues
11:08:47 time to get a real laptop not a toy
11:08:55 Hi janders, just us and ttx right now...
11:09:23 I did see that mention of the Mellanox HDR OpenStack press release from last week's log
11:09:24 Macbook? Mine runs IRC just fine... got about 6GB in swap doing something though.
11:09:48 my 5 cents - it was indeed running fine for quite a while, I think they just streamlined it a little more
11:09:56 janders: is that your doing I wonder?
11:10:16 what's different with hdr that required change?
11:10:20 I might have contributed a little :) but mostly good work by MLNX
11:11:00 I think most work was around tripleo integration bits
11:11:20 Ah, Ok.
11:11:21 HDR... good question - I haven't played with HDR200 vfs
11:11:36 I've got HDR200 storage and HDR100 computes
11:11:43 I've been having lots of fun this last week with VF-LAG
11:11:47 *maybe* there was a bit of fiddling there
11:12:29 specifically - HDR200 on PCIe3
11:12:39 other than that - I think it was most integration work
11:12:49 VF-LAG - interesting!
11:12:57 what is it like?
11:13:24 I've never touched it - and it could be interesting for my cyber system going forward in certain circumstances
11:13:39 Jury's out at present.
The major issue I had was the systemd/udev plumbing to put VFs into switchdev mode instead of "legacy" mode.
11:14:10 Curiously, it worked in legacy mode but I haven't got around to trying it in switchdev mode yet. That's today's fun.
11:14:31 what does it look like from VM owner perspective? ethX and ethY, each coming off a different port?
11:14:37 When I was misconfiguring it, I got Tx bandwidth of 1 port and it appeared I could receive on either port.
11:14:56 janders: no, 1 VF, somehow coupled to 2 PFs.
11:14:59 or is it magicnic0 which stays up despite ports backing it going up and down?
11:15:09 ok so option 2) nice!
11:15:12 I think that's the idea. Haven't tested the failover bit yet.
11:15:37 is the promise that you can aggregate bandwidth and failover at the same time?
11:16:32 The aggregated bandwidth needs verifying but the failover capability is intended for sure.
11:16:38 I'll let you know!
11:17:38 Any news on Supercloud?
11:18:11 btw I'm hoping to write up the VF-LAG experience for next week, it's almost completely undocumented fwiw
11:18:45 actually I do have some news, what might be disappointing is these are not very performance related
11:18:58 SuperCloud got bogged down in the cyber security realm for some time it seems
11:19:15 but I have hit some interesting challenges and issues lately
11:19:51 in terms of cyber specifics I proposed the volume-transfer based mechanism of copying unsafe data in (and at a later stage - possibly copying data out)
11:20:05 will see what my security team have to say about this one
11:20:19 but so far so good, people seem to be liking the idea
11:20:23 this is malware payloads for analysis?
11:20:30 yes, in encrypted form
11:20:51 Loving your work, janders :-)
11:21:01 I might try submit something for Vancouver on this, it is getting quite interesting really
11:21:16 very different than what I typically do
11:21:19 Is this into bare metal instances?
11:21:25 VMs at this stage
11:21:36 but you raised an interesting point
11:21:56 probably wise unless you really trust your layers of defence :-)
11:22:09 this system will be used by guys who uncovered spectre/meltdown so looking at hardware layer security might be in scope
11:22:25 it wasn't brought up yet, so mostly looking at VMs
11:22:59 I should probably say - guys who participated in uncovering spectre/meltdown - it was a big, orchestrated effort
11:23:11 so that's one interesting bit
11:23:16 the other bit is more operational
11:23:33 hit some interesting failure modes in OSP13 which likely impact later releases, too
11:23:35 BZ coming soon
11:23:47 https://bugzilla.redhat.com/show_bug.cgi?id=1795402
11:23:47 You might need to give those researchers access to bios settings, microcode etc.
11:23:48 bugzilla.redhat.com bug 1795402 in openstack-nova "Nova list --all-tenants | ERROR (ClientException): Unexpected API Error. | (HTTP 500)" [Medium,New] - Assigned to nova-maint
11:24:09 essentially we had AMQP dropout brick an entire project
11:24:14 typeerror - suggests a code path nobody else is tickling
11:24:35 (and "openstack server list --all-projects")
11:24:48 I think it is rare, but when you hit it it does get very ugly very quickly
11:24:59 ouch
11:25:03 is that Queens?
11:25:24 essentially if AMQP dropout happens in a very bad time during instance creation, instance ends up with missing fields in DB and nova can't handle that
11:25:26 yes, queens
11:25:44 OSP16 should be out soon, but it will probably be a while before we upgrade
11:26:10 I fixed this within an hour from when it happened by setting deleted=1 for instance and re-creating
11:26:11 Are you getting rabbit issues often enough to hit this?
11:26:19 it only happened once so far
11:26:33 Nice work for getting to the bottom of it!
11:26:33 but as per Murphy's law it happened at the least appropriate time
11:26:52 when I had a contractor set up something time-sensitive
11:27:14 so I did a quick and dirty fix and then had RHAT guys have a look
11:27:30 they have some really smart fellas up in Brisbane
11:27:42 the "proper" fix in the BZ was their work
11:28:04 it's interesting how nova ended up in that state - I thought I'd share
11:28:33 thanks, really interesting to know!
11:28:41 there are known issues with OSP13 AMQP connection handling in nova-compute which are fixed in latest minor updates
11:28:48 those might be affecting later versions, not sure
11:28:59 so the root cause of this is likely fixed
11:29:08 but in case of future errors... nova should be smarter
11:29:17 I hope to see some resiliency enhancements
11:29:30 it should purge network cache itself instead of needing this
11:29:47 supposedly there is a periodic task that does that but what I'm hearing is it is yet to be seen to fix anything
11:30:15 so - if you ever hit this (hope not) here's the SQL :)
11:31:20 other than that I will be putting in some policy customisations into heat - and splitting GPFS backend for "connected" and "disconnected" instances soon
11:31:37 (connected=with_floating_ips disconnected=only_accessible_via_vnc in our terminology)
11:32:35 so - long story short - not a lot of action, but some interesting stuff happening
11:32:50 sounds like you've had your hands busy!
11:33:20 a little, yes! :)
11:33:43 Thanks for the update, always good to hear your news
11:34:49 https://photos.app.goo.gl/dcLYzuRMNeHJKHv4A this nasty thing has been keeping us busy lately
11:35:22 Canberra has been affected by bushfire smoke lately but was lucky enough to avoid big fires nearby... till this week
11:35:32 so far it's not too bad but we gotta keep an eye on it
11:35:32 ouch. Take care janders!
11:35:56 thanks oneswig :)
11:36:15 I don't think there's anything else to cover here.
11:36:19 do you know when does CFP for Vancouver open?
11:36:35 I keep checking the website but haven't seen much so far
11:36:55 janders: Given the new format of this event, I am not sure if there will be a CFP as usual?
11:37:09 interesting!
11:37:12 Maybe ttx knows more about this
11:37:19 would you be happy to explain the new format in a nutshell?
11:37:51 I only know what's publicly available on the eventbrite page
11:38:00 Each day will be broken into three parts:
11:38:10 Short kickoff with all attendees to set the goals for the day or discuss the outcomes of the previous day
11:38:22 OpenDev: Morning discussions covering projects like Ansible, Ceph, Kubernetes, OpenStack, and more
11:38:38 PTG: Afternoon working sessions for project teams and SIGs to continue the morning’s discussions.
11:38:51 See https://www.eventbrite.com/e/opendev-ptg-vancouver-2020-tickets-88923270897
11:38:56 sounds like less presentations, more discussions?
11:39:47 Yes, it explicitly says: OpenDev [...] will include discussion oriented sessions around a particular topic to explore a problem within a topic area, share common architectures, and collaborate around potential solutions.
11:40:10 looks like the balance between the "main" summit and PTG is about to turn upside-down...
11:40:34 which might not be a bad thing, but it is indeed very different than what it used to be
11:40:45 good? bad? what do you guys think?
11:41:08 AFAIK there will still be a usual summit in Q4 2020
11:41:27 Dunno if this new event is an improvement, we shall see
11:41:30 I like the focus of the new session, I hope it turns out how it is being described
11:41:51 one tricky bit is
11:42:08 with many organisations getting travel approvals is heaps easier when you have a presentation slot in
11:42:14 (that does include CSIRO)
11:42:41 lightning talk slot in a sig session perhaps :-)
11:42:56 haha :D I like your thinking oneswig
11:43:45 if SARS-ng epidemic isn't too bad I'd very much like to re-visit Vancouver (and probably LA/California on the way back)
11:44:34 looks like Qantas is now flying direct SYD-YVR - AND getting out of US is so much easier than getting in, immigration-queues wise
11:44:54 the above would be my way to go
11:45:42 June is the right time to do this, too
11:46:45 all right! it's getting close to the hour - so if you guys would like to wrap up, please go ahead
11:46:55 good chatting - and we'll chat next week
11:47:07 re opendev
11:47:12 same to you janders
11:47:22 there will not really be a CFP per se
11:48:09 ttx so more like tracks moderated by project leads, etc?
11:48:09 each theme should have a programming committee responsible for picking content. Those may use a CFP, but probably will just reach out to find suitable speakers.... Presentations are just one side of the event
11:48:14 most of it is open discussion
11:48:47 More details will be out soon as we push out the call for programming committee members
11:49:04 great, thanks for clarifying ttx
11:49:52 +1
11:50:20 OK, any final items?
11:51:08 ttx we're happy to help as the SIG :)
11:52:21 oneswig I think we're good. See you next week! :)
11:52:30 likewise janders
11:52:33 #endmeeting