21:00:21 <oneswig> #startmeeting scientific-sig
21:00:22 <openstack> Meeting started Tue Mar 2 21:00:21 2021 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:23 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:26 <openstack> The meeting name has been set to 'scientific_sig'
21:00:33 <martial> Hello Stig
21:00:40 <oneswig> Hi martial
21:00:45 <oneswig> #chair martial
21:00:46 <openstack> Current chairs: martial oneswig
21:00:49 <oneswig> How's things?
21:01:17 <oneswig> P2302 is that the NIST federation work?
21:01:29 <martial> not bad, just a little crazy :)
21:02:29 <oneswig> Hopefully crazy in a good way
21:02:29 <martial> IEEE actually, the NIST work got published in NIST SP500-332
21:02:38 <martial> #link https://www.nist.gov/publications/nist-cloud-federation-reference-architecture
21:03:12 <janders> g'day oneswig martial o/
21:03:18 <janders> how is it going?
21:03:38 <oneswig> Hi janders, good thanks
21:03:55 <martial> doing well, thanks janders
21:04:35 <oneswig> Busy :-)
21:05:43 <martial> busy good? :)
21:05:51 <julianp> Hi all. Bookmarked that link martial. Time to learn more about cloud federation.
21:05:54 <oneswig> I'm only 2 years late on this but I saw a really neat talk on large-scale Ceph administration: https://www.youtube.com/watch?v=niFNZN5EKvE
21:06:44 <oneswig> Well worth a look, it presents a very nice way of visualising the spread of utilisation across nodes.
21:11:01 <oneswig> It came up after a group we work with were adding larger drives to an existing Ceph cluster which pushed it to hitting hard limits of PGs/OSD. Sounds quite painful.
21:13:13 <oneswig> julianp: you were asking a while back about infrastructure. I think we are getting much closer now to having guests on the system.
21:13:37 <julianp> Eeeexcellent. Thanks for thinking of me oneswig.
21:13:39 <martial> I wonder if we can ask Rion to have another Minio conversation
21:14:00 <oneswig> Rion having more fun with MinIO?
21:15:40 <oneswig> julianp: will be in touch soon I hope!
21:16:12 <martial> well we are heavy with Ceph on SSDs but if you remember we had a small video chat with Rion about why Minio was useful for deployments
21:16:25 <julianp> Much obliged oneswig.
21:17:09 <julianp> martial do you remember why Minio was considered useful?
21:18:14 <martial> ease of deployment seemed to be a core reason
21:18:34 <julianp> Gotcha.
21:21:16 <oneswig> martial: ever compared it to Portworx?
21:22:40 <martial> I have a very small minio equivalent setup for testing but never tried portworx
21:25:28 <martial> any new setup for you Stig?
21:26:04 <oneswig> I've been banging my head on a real puzzler for the last few days. I have a set of hosts that take ~3s to run "time ssh centos@host hostname"
21:26:19 <oneswig> It's not DNS before you ask, pretty sure on that now :-)
21:27:22 <priteau> Something auth-related, PAM?
21:27:26 <janders> what OS is the ssh connection originating from?
21:27:34 <oneswig> I'm still uncertain on the root cause. There's some smoking guns relating to SELinux blocking access to /etc/ld.so.cache that look suspicious.
21:27:37 <oneswig> CentOS 8.3
21:28:24 <oneswig> Hi priteau :-) auth is ssh keys - although there's plenty of pam modules involved in that login.
21:28:25 <janders> does 'setenforce 0' make any difference?
21:28:41 <janders> let me see what my NVMe cleaning lab is on
21:28:44 <oneswig> janders: disabling selinux and rebooting the node is not apparently helping...
21:29:11 <oneswig> It's bizarre because I have other nodes in the same environment for which the same test takes a more sensible 0.2s
21:29:26 <janders> oneswig disabling SEL both client and server side?
21:29:26 <martial> same hardware/kernel version?
21:29:37 <janders> oneswig melding servers would come in handy :)
21:29:38 <oneswig> Different hardware, same kernel
21:29:57 <oneswig> melding?
21:30:22 <oneswig> janders: making the client == the server, ie ssh localhost, has the same delay
21:30:48 <martial> ohhhh I have had this happen, it was a network device driver for me
21:31:11 <julianp> Does `ssh -vvvv` show you where it gets stuck?
21:31:26 <priteau> ssh localhost shouldn't be slowed down by a NIC issue though
21:31:35 <janders> oneswig does the problem seem to stick to either the piece of hardware in question being a client or a server?
21:31:41 <janders> priteau good point
21:32:09 <martial> pierre, agreeing with you but localhost also was slow
21:32:25 <oneswig> julianp: a bit. The ssh debug output isn't timestamped, alas. There was a message, I'll see if I can dig it out.
21:32:32 <julianp> I bet it was the butler in the library with a candlestick.
21:32:46 <martial> now I can not remember if there was something else related to it
21:34:09 <oneswig> I've been running strace on client and server to try to spot something, that's my current effort.
21:34:18 <oneswig> Interesting though!
21:34:36 <oneswig> brb
21:34:45 <priteau> Have you tried changing various other settings in sshd_config? GSSAPI maybe?
21:36:16 <martial> I was checking in our slack to see if we documented this one
21:36:39 <oneswig> I removed the GSSAPI auth method and that bought some time, a fraction of a second
21:36:48 <martial> no luck
21:38:08 <martial> silly question because that was part of our checklist: IPv6 disabled?
21:38:14 <oneswig> uninstalling cockpit also gained me about 0.1s. Small things.
21:38:37 <oneswig> IPV6 I haven't tried - worth a shot!
21:39:42 <priteau> MTU? (although I've seen it cause hangs, not slowdowns)
21:40:33 <priteau> (and it would probably not affect ssh localhost)
21:41:05 <oneswig> martial: just tried it, no joy alas
21:41:14 <martial> files listed first in your nsswitch.conf? so it uses /etc/hosts ?
21:41:50 <janders> oneswig a bit brute-force, but maybe worthwhile copying /etc of a "good" and a "bad" machine and doing a recursive diff?
21:42:11 <oneswig> priteau: I don't think it's MTU.
21:42:43 <martial> similar idea as above, you could try "UseDNS no" in your sshd_config
21:43:03 <janders> oneswig is console login normal (making sure it's ssh only)?
21:43:07 <oneswig> If I run "ssh-keygen" on a dodgy node, there's a long delay before it prompts me for the filename. That might be connected
21:43:35 <oneswig> martial: UseDNS no is set - been bitten by that one before :-)
21:44:04 <martial> okay not DNS, not IPv6 (use -4 :) )
21:44:04 <julianp> As for the timestamps not being in the ssh output, you can add it using `ts` found in `moreutils` TIL: ssh -vvvv some-host hostname 2>&1 | ts
21:44:32 <oneswig> janders: I'd need to get onto the BMC and it's one of those HPE boxes where you have to buy a license to use the console after the node boots...
21:45:12 <oneswig> julianp: that is new to me, neat trick!
21:46:00 <priteau> The slow ssh-keygen is very odd, I don't think it should do any I/O
21:46:07 <priteau> Network I/O I mean
21:46:12 <priteau> Something related to OpenSSL then?
21:46:27 <janders> oneswig that licensing model is ridiculous!
21:46:53 <priteau> oneswig: what does "cat /proc/sys/kernel/random/entropy_avail" say?
21:47:14 <julianp> Oh that's a good idea priteau. We've run into that.
21:47:38 <oneswig> entropy_avail = 3443 - is that enough?
21:47:47 <priteau> Should be
21:48:04 <janders> oneswig the two machines I'm currently working on have ~3400 and ~3800
21:48:10 <janders> priteau nice trick!
21:48:13 <oneswig> Doesn't look like I can get moreutils on CentOS - perhaps it's in EPEL?
21:49:02 <julianp> I believe you can replicate the moreutils ts functionality using some awk.
21:49:28 <oneswig> julianp: now we are talking :-)
21:49:49 <julianp> xD
21:52:19 <julianp> ssh -vvvv -C condo hostname 2>&1 | awk '{ print strftime(),$0 }'
21:52:28 <oneswig> I'm clearly going to have to follow up to the SIG on this if I ever figure it out...
21:52:49 <julianp> Yes please! My curiosity is piqued.
21:53:17 <oneswig> Anything can be curious, until you *have* to solve it
21:54:06 <oneswig> for AOB: couple of small points
21:54:15 <martial> nice trick julian
21:55:29 <oneswig> PTG (virtual) dates were announced - 19-23 April - https://www.openstack.org/ptg/
21:56:47 <oneswig> Next week (Wednesday 1100 UTC) I'm hoping to have a session on Jupyterhub and OpenStack. I think there's a good deal to cover on how to provide user-friendly integrations
21:57:09 <julianp> Ooh. I'm interested in that.
21:57:35 <oneswig> julianp: hopefully not too early a start for you?
21:57:53 <janders> oneswig from my side, I completed the initial work on support for NVMe-native disk cleaning in Ironic
21:58:02 <janders> if you are interested, I can give a preso on that in a couple weeks
21:58:21 <oneswig> janders: That would be great, I'd love to see it.
21:58:46 <janders> initial = works best on an all-NVMe nodes (it doesn't make much of a difference in hybrid HDD-NVMe configs or on SSDs)
21:58:58 <oneswig> Random but related question: Do you know, with software RAID in Ironic, can I label the RAID block devices? They come up in an arbitrary order.
21:59:29 <janders> oneswig not sure. Might be worth asking on #openstack-ironic
21:59:31 <oneswig> janders: would be good to hear more.
21:59:41 <oneswig> thanks janders, will do
21:59:50 <oneswig> OK, nearly time - final comments?
22:00:17 <oneswig> #endmeeting
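
Follow-up sketch for the timestamping trick discussed at 21:44 and 21:52 (`ts` from moreutils, or awk with `strftime()`). Where neither tool is available, something like the Python below might do a similar job. It is only an illustration, not what anyone in the meeting ran: the function names `stamp_lines` and `run_stamped` and the script name `stamp.py` are made up for this example. It prints the elapsed time and the gap since the previous line, which may make a multi-second stall in `ssh -vvvv` output easier to spot than wall-clock timestamps.

```python
#!/usr/bin/env python3
"""Run a command and prefix each output line with elapsed time and gap.

Hypothetical usage (the script name stamp.py is an assumption):
    python3 stamp.py ssh -vvvv some-host hostname
"""
import subprocess
import sys
import time


def stamp_lines(lines, clock=time.monotonic):
    """Yield each line prefixed with total elapsed seconds and the gap since the previous line."""
    start = previous = clock()
    for line in lines:
        now = clock()
        yield f"{now - start:8.3f}s (+{now - previous:6.3f}s) {line.rstrip()}"
        previous = now


def run_stamped(command):
    """Run command with stderr merged into stdout (like 2>&1) and print stamped lines."""
    proc = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # ssh writes its -v debug output to stderr
        text=True,
    )
    for stamped in stamp_lines(proc.stdout):
        print(stamped, flush=True)
    return proc.wait()


# Only run when a command is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_stamped(sys.argv[1:]))
```

A large value in the `(+...)` column marks the line that arrived after a pause, so the step just before it is a reasonable place to start looking. The timings are taken when each line is read from the pipe, so they are approximate.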