11:04:27 <oneswig> #startmeeting scientific-sig
11:04:28 <openstack> Meeting started Wed Jul 3 11:04:27 2019 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:04:29 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:04:31 <openstack> The meeting name has been set to 'scientific_sig'
11:04:33 <oneswig> #chair b1airo
11:04:33 <openstack> Current chairs: b1airo oneswig
11:04:43 <oneswig> janders: used the 2600TP white box servers
11:04:44 <b1airo> sorry, laptop just froze at a very inopportune moment
11:04:59 <janders> oneswig: yes, those ones
11:05:11 <janders> do you have any experience with automating BIOS configuration?
11:05:12 <oneswig> Would prefer not to, bmc and bios firmware was not good.
11:05:28 <janders> oops then we might be a little screwed
11:05:37 <oneswig> janders: I don't think it is possible. I recall we had to upgrade NIC firmware on them and it was painful at the time.
11:05:46 <oneswig> (this was 2 years ago, things may have improved).
11:05:47 <janders> they have partial redfish support but it seems very read-only especially if done via ansible
11:06:04 <janders> there are some new tools my guys are testing
11:06:05 <oneswig> sounds like things have evolved a bit then.
11:06:18 <b1airo> were they cheap enough to warrant it...?
11:06:37 <janders> but I'm in discussion with their devs, they seem to be keen to make things better
11:07:08 <janders> allright.. it's good to know we're not alone
11:07:20 <janders> these seem pretty cool piece of kit other than this "little
11:07:24 <janders> " detail
11:07:41 <janders> I will try to work out something with Intel and happy to report back if anyone is interested
11:08:00 <oneswig> always like a discussion on things like that...
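
(A note on the BIOS-automation question above: where a node's BMC exposes a writable Redfish BIOS resource, a rough sketch using the sushy Redfish client library might look like the snippet below; Ansible's redfish_config module covers similar ground. The BMC URL, credentials, system path and attribute names are placeholders, and on the boxes discussed here the BIOS resource may well turn out to be read-only, as janders noted - this is a sketch of the general approach, not a claim that it works on that hardware.)

    import sushy

    # Connect to the node's BMC over Redfish (URL and credentials are placeholders).
    root = sushy.Sushy('https://bmc.example.com/redfish/v1',
                       username='admin', password='secret', verify=False)

    # Fetch the system resource; the identity path varies between vendors.
    system = root.get_system('/redfish/v1/Systems/1')

    # Inspect the current BIOS attributes...
    print(system.bios.attributes)

    # ...and, if the firmware allows PATCHing them, queue new values
    # (attribute names are vendor-specific and purely illustrative here).
    system.bios.set_attributes({'BootMode': 'Uefi'})
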
11:08:27 <oneswig> belmoreira: I have John and Mark here, are you available to discuss next week?
11:09:21 <belmoreira> should be fine. Send me your availability (day, hour) and I will check with the other guys
11:09:39 <oneswig> will do, thanks (and apologies for the thrash)
11:10:08 <belmoreira> oneswig no worries
11:10:35 <oneswig> How's the new control plane rollout going?
11:10:53 <b1airo> how's your control-plane k8s-ing going belmoreira ?
11:11:05 <b1airo> jinx
11:11:12 <oneswig> :-)
11:11:19 <belmoreira> :)
11:11:42 <belmoreira> we have production request going to the k8s cluster with glance
11:12:03 <belmoreira> next step is nova-api (maybe next week)
11:12:11 <janders> nice!
11:12:17 <belmoreira> and then have a full cell control plane
11:12:43 <belmoreira> the main motivation is to gain ops experience and see if it's makes sense to move the control plane to k8s
11:13:02 <janders> belmoreira: when complete, will you run a "small/internal" k8s to run OpenStack control plane that will then run "large/user-facing" k8s on top?
11:14:40 <belmoreira> janders not clear. What I would like is to have the k8s cluster deployed with magnum in the same infrastructure. Like the inception that we have today for the control plane VMs
11:15:41 <janders> inception... is the OpenStack running itself, essentially?
11:15:52 <b1airo> if i've understood belmoreira's current cell-level control-plane architecture correctly then each control-plane k8s cluster will itself be running across some subset of his production cloud! it's turtles, or dogfood - hopefully those things aren't equivalent - all the way down/round
11:16:57 <belmoreira> b1airo: that's it
11:17:54 <b1airo> memory can't be too rusty yet then, despite being part of a "fleshy von neumann machine" (referencing a conversation earlier this evening)
11:18:06 <b1airo> anyone played with Singularity 3.3 yet?
11:18:16 <b1airo> fakeroot builds and such?
11:18:48 <b1airo> (it's still in rc for the moment i think, so won't be surprised if not)
11:21:38 <oneswig> belmoreira: got any tips for ceph mds performance? We've got a deployment with CephFS as a cluster filesystem and it's apparently pegged in directory operations
11:23:58 <b1airo> yeesh, is telling them not to do that an option oneswig ?
11:24:44 <oneswig> belmoreira: back on the self-hosted control plane, what have you looked at in terms of disaster recovery scenarios? How to fix a broken control plane from within?
11:25:05 <oneswig> b1airo: It was my idea ...
11:25:37 <oneswig> I have to head off
11:26:20 <oneswig> b1airo: can you take the reins? Got to head out now
11:26:34 <b1airo> oh, well in that case it's a fine idea :-P. i guess that means it's not a prod deployment that you're trying to fix though?
11:26:40 <b1airo> sure oneswig
11:26:47 <oneswig> see you janders belmoreira b1airo, nice to briefly catch up
11:26:59 <oneswig> b1airo: no, a bit of scientific experimentation
11:27:03 <oneswig> cheers all
11:27:06 <janders> oneswig: thank you, till next time
11:27:07 <b1airo> o/
11:27:55 <belmoreira> sorry, was away for few moments
11:28:13 <b1airo> i'm assuming oneswig must have already looked at directory fragmentation/sharding across multiple MSD's, but perhaps worth mentioning anyway
11:28:23 <b1airo> *MDS's
11:28:51 <belmoreira> oneswig "ceph mds performance" I don't have great tips. Maybe Dan can help
11:30:38 <belmoreira> oneswig "disaster recovery scenarios" in case of catastrophe... we always keep few physical nodes to bootstrap the cloud. But that would be very unlikely
11:31:08 <b1airo> also, vul at unimelb might have some tips - they are running a HTC/HPC system against CephFS, so have probably seen some of these things
11:32:33 <b1airo> ok, i think we can probably call it quits for now. late here anyway
11:32:46 <janders> agreed
11:32:49 <janders> thanks guys!
11:32:52 <janders> till next time
11:33:15 <belmoreira> thanks
11:34:02 <b1airo> cheers!
11:34:06 <b1airo> #endmeeting
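
(A footnote on the CephFS MDS discussion above: one common way to relieve a metadata server that is pegged on directory operations is to allow more than one active MDS (ceph fs set <fsname> max_mds 2) and pin busy subtrees to specific ranks via the ceph.dir.pin virtual extended attribute, as b1airo alluded to. A minimal sketch from a client mount is below; the mount point and rank numbers are placeholders, and whether explicit pinning or the default dynamic subtree balancing is the better fit depends on the workload.)

    import os

    # Pin two busy project directories to different active MDS ranks by
    # setting the ceph.dir.pin virtual xattr on a CephFS client mount.
    # Paths and rank numbers are placeholders for illustration.
    os.setxattr('/mnt/cephfs/project_a', 'ceph.dir.pin', b'0')
    os.setxattr('/mnt/cephfs/project_b', 'ceph.dir.pin', b'1')

    # A pin of -1 hands the subtree back to the default balancer.
    # os.setxattr('/mnt/cephfs/project_a', 'ceph.dir.pin', b'-1')
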