12:04:07 #startmeeting TripleO Security Squad
12:04:08 Meeting started Wed Jun 27 12:04:07 2018 UTC and is due to finish in 60 minutes. The chair is jaosorior. Information about MeetBot at http://wiki.debian.org/MeetBot.
12:04:09 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
12:04:09 hello folks!
12:04:11 The meeting name has been set to 'tripleo_security_squad'
12:04:12 o/
12:04:24 👋
12:04:25 #link https://etherpad.openstack.org/p/tripleo-security-squad
12:04:31 as always, there's the etherpad link
12:05:16 Tengu, moguimar: around?
12:05:56 hey jaosorior
12:06:09 et al
12:06:10 lhinds: hey! how's it going?
12:06:12 weshay|ruck: quiquell|rover: looks like introspection failures in ovb, are we missing a patch or something in rdo cloud?
12:06:24 good thanks
12:06:40 I guess we can start
12:06:46 #topic Public TLS by default work update
12:07:39 So, this work all depends on the work that Jaganathan Palanisamy is doing
12:07:45 mostly this patch https://review.openstack.org/#/c/572678/
12:08:14 skramaja: do you know if there are any blockers that Jaganathan has? any way we could help?
12:08:49 Sagi Shnaidman proposed openstack/tripleo-upgrade stable/queens: Unpin ansible-lint from fixed version https://review.openstack.org/578364
12:09:43 uhm...
I guess he's not around
12:09:54 anyway, so, this is my last week before my PTO
12:10:03 so, if someone wants to take over this work, it would be very appreciated
12:10:05 Sagi Shnaidman proposed openstack/tripleo-upgrade stable/pike: Unpin ansible-lint from fixed version https://review.openstack.org/578365
12:10:09 else we'll have to postpone it for Stein
12:10:17 URGENT TRIPLEO TASKS NEED ATTENTION
12:10:19 https://bugs.launchpad.net/tripleo/+bug/1777939
12:10:19 https://bugs.launchpad.net/tripleo/+bug/1778040
12:10:20 https://bugs.launchpad.net/tripleo/+bug/1778847
12:10:21 https://bugs.launchpad.net/tripleo/+bug/1778865
12:10:21 Launchpad bug 1777939 in tripleo "Unable to connect to AMQP server on undercloud.internalapi.localdomain:5672 after None tries: (0, 0): (403) ACCESS_REFUSED " [Critical,In progress] - Assigned to wes hayutin (weshayutin)
12:10:22 Launchpad bug 1778040 in tripleo "Error at overcloud_prep_containers Package: qpid-dispatch-router-0.8.0-1.el7.x86_64 (@delorean-master-testing)", " Requires: libqpid-proton.so.10()(64bit)" [Critical,In progress] - Assigned to Quique Llorente (quiquell)
12:10:23 Launchpad bug 1778847 in tripleo "fs027 __init__() got an unexpected keyword argument 'cafile'" [Critical,In progress] - Assigned to Quique Llorente (quiquell)
12:10:24 Launchpad bug 1778865 in tripleo "Failing openstack-pox-linter TypeError: unbound method construct_mapping()" [Critical,Fix committed] - Assigned to Quique Llorente (quiquell)
12:10:32 quiquell|rover, https://review.openstack.org/578365 https://review.openstack.org/578364
12:10:40 So, if there's anybody interested, please let me know and I'll guide you through what's left to do
12:11:06 maybe Tengu ^^ ?
12:11:12 anyway, that's all on that topic
12:11:27 any questions/feedback?
12:12:08 #topic Limit TripleO users work update
12:12:29 lhinds: now that you're around, mind giving an update on this?
12:12:34 so I have my spec up: https://review.openstack.org/#/c/572760/
12:12:55 could do with some reviews if possible folks, just what you think of the approach
12:13:19 in the meantime I have the main script developed that I am just working out how to best call in CI
12:13:50 I will check that in soon'ish and tag it on the blueprint
12:14:00 that's it for now
12:14:12 Note there is also https://review.openstack.org/#/c/572761/ that would cover TripleO
12:14:17 TripleO's python scripts
12:14:21 mainly python-tripleoclient
12:14:31 Yep, that's Tengu who I think is back from PTO now
12:14:45 I think he might be in his NHO though
12:14:53 ah ok
12:15:03 beer holiday in Munich
12:15:09 alright, any questions/feedback on this topic?
12:15:09 jaosorior: sorted out with latest comments. we will be adding a parameter to identify if hw data is required or not in the plan-environment. will be enabled for nfv and hci
12:15:38 skramaja: any estimates on when that will be sorted out?
12:17:06 #topic any other business
12:17:08 jaosorior: i assume jagan is working on it. haven't spoken to him. will apprise him tomorrow.
12:17:14 Marios Andreou proposed openstack/tripleo-quickstart master: Adds virtualenv into dependencies for the reproducer script https://review.openstack.org/578366
12:17:16 Marios Andreou proposed openstack/tripleo-quickstart-extras master: Adds check for python-virtualenv with error message https://review.openstack.org/578081
12:17:18 Anything else folks want to bring up to the meeting?
12:17:30 skramaja: thanks
12:17:43 quiquell|rover, did you see mwhahaha's question?
12:17:48 jaosorior, not sure if we wanted to talk about Vault during this meeting?
12:18:17 redrobot: we could bring it up if you have any updates relevant to the TripleO implications of it :D else we can discuss that offline.
12:18:23 Quique Llorente proposed openstack/tripleo-quickstart master: Unpin ansible-lint https://review.openstack.org/578340
12:18:34 weshay|ruck: Nop
12:18:34 I mean, off the meeting
12:18:48 weshay|ruck: where ?
12:19:12 nothing from me
12:19:19 I took some notes on things to consider when deploying Vault
12:19:34 Alright, lets take a look
12:19:36 #link https://etherpad.openstack.org/p/production-vault
12:19:52 mwhahaha, I see fs01/35 master working on no-op changes
12:20:27 weshay|ruck: when did those last run?
12:20:34 redrobot: what is SSSS?
12:20:38 Jun 27, 2018 8:00:21 AM
12:20:46 Shamir's Secret-Sharing Scheme
12:20:49 https://review.openstack.org/#/c/560445
12:20:50 weshay|ruck: k it was overnight so maybe it's ok now
12:20:52 redrobot: ah that, right
12:21:02 mwhahaha, weshay|ruck: What question ?
12:21:04 mwhahaha, ya.. I was having issues last night as well
12:21:11 RDO is ok now
12:21:14 redrobot: is there any other scheme to protect the key that would be considered secure?
12:21:24 not in Vault, no
12:21:27 OK
12:21:32 Flavio Percoco proposed openstack/tripleo-heat-templates master: Add the ability to scaleup the openshift stack https://review.openstack.org/578285
12:21:32 so I guess the answer is a yes :D
12:21:37 Haha
12:21:43 that was easy
12:22:01 yeah, that's where it gets tricky... not sure how much TripleO automates vs. how much is left to the deployer to figure out
12:22:02 and I guess we can stick with the default until we do further research about how many is good
12:22:39 redrobot: well, the more we can automate the better :D
12:22:51 The thing about SSSS is that you can't really automate Vault unsealing
12:23:01 since each key-shard holder will need to submit their own piece
12:23:12 uhm
12:23:15 how does that work/
12:23:17 ?
12:23:29 so, when you start Vault, it starts up in a "sealed" state
12:23:36 so the encryption key is not available to it
12:23:40 right
12:24:16 when you use SSSS, each piece is held by a key holder (security person working at deployer's company)
12:24:17 Quique Llorente proposed openstack/tripleo-upgrade stable/pike: Unpin ansible-lint https://review.openstack.org/578351
12:24:26 and each needs to submit their piece to the API
12:24:37 until X (threshold) pieces have been submitted
12:24:56 then Vault can recreate the master key and decrypt the encryption key and become "unsealed"
12:25:02 the thing about SSSS is how many key shards we'd like to have for tripleO
12:25:04 and then you can start using it.
12:25:14 Quique Llorente proposed openstack/tripleo-upgrade stable/queens: Unpin ansible-lint https://review.openstack.org/578352
12:25:48 Quique Llorente proposed openstack/tripleo-quickstart-extras master: Unpin ansible-lint https://review.openstack.org/578339
12:25:55 Alright, so we would deploy Vault using Ansible (which is run on TripleO). Wouldn't it be possible to have ansible have all the keys on first-installation, temporarily, and unseal the vault
12:25:57 >?
12:26:06 Quique Llorente proposed openstack/tripleo-validations master: Unpin ansible-lint https://review.openstack.org/578368
12:26:07 moguimar, I think the issue is that 1 shard implies 1 person is available who understands GPG and is available to unseal the Vault
12:26:08 if not, we wouldn't be able to deploy anything... and this wouldn't be a very practical thing :/
12:26:15 weshay|ruck: We need to land 5 patches to unblock check
12:26:30 unblock check?
12:26:45 weshay|ruck: linting is failing
12:26:49 jaosorior, it's that tradeoff between security and convenience
12:27:12 in order to let Ansible have enough shards, you can't use GPG to protect the shard-generation
12:27:14 we can start with the threshold of 1
12:27:24 and then think about increasing it later if necessary
12:27:48 mwhahaha, assuming the patch to turn off idempotency undercloud in fs02 was not in the latest run.. although having a time finding the git commit used
12:28:11 moguimar, what do you think about GPG usage to protect shards?
12:28:12 weshay|ruck: It was not in the last run.
12:28:16 redrobot: we don't have a way to have steps in between to pause the deployment and get folks to start inputting their keys... and even if we did, that's not great usability for folks that are already using TripleO :/
12:28:25 quiquell|rover, where is the git commit in the log?
12:28:29 for tq
12:28:33 weshay|ruck: commit for idempotency https://review.openstack.org/#/c/577809/
12:28:48 https://review.rdoproject.org/jenkins/job/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset002-master-upload/335/consoleFull
12:28:50 weshay|ruck: The run from my morning didn't have it
12:28:54 * weshay|ruck is blind
12:28:58 apparently
12:29:31 I think having all the shards at the same place is virtually equal to having only one shard
12:29:49 So I'd go with a single key for now
12:29:52 fair enough
12:30:08 moguimar, I can agree with that, but then what does it matter if you have only one shard vs only having the master key?
12:30:22 they are the same =D
12:30:25 I'm actually not even sure if SSSS supports a threshold of one?
12:30:33 I'm just using a common word
12:30:34 Steven Hardy proposed openstack/tripleo-common master: Verify the Swift container exists with a small utility workflow https://review.openstack.org/528213
12:30:38 moguimar, so you're basically saying SSSS should not be used?
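The shard and threshold mechanics discussed above can be sketched with a toy version of Shamir's scheme. This is only an illustration of the math over a prime field, not Vault's implementation; the names `split_secret` and `recover_secret` and the choice of prime are assumptions made for this example. It also shows that, in the math at least, a threshold of one simply makes each share equal to the secret, which says nothing about what Vault itself accepts.

```python
import secrets

# Toy field for illustration only: a Mersenne prime larger than the secret.
PRIME = 2**127 - 1


def split_secret(secret, shares, threshold):
    """Split an integer secret into `shares` points; any `threshold` recover it."""
    if not 1 <= threshold <= shares:
        raise ValueError("threshold must be between 1 and the number of shares")
    if not 0 <= secret < PRIME:
        raise ValueError("secret must fit in the field")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        # Horner's rule, reduced modulo PRIME at every step.
        acc = 0
        for coeff in reversed(coeffs):
            acc = (acc * x + coeff) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points):
    """Lagrange-interpolate the polynomial at x=0 from the given points."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # Modular inverse via Fermat's little theorem (PRIME is prime).
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret
```

With five shares and a threshold of three, any three of the returned points passed to `recover_secret` give back the original value, which mirrors "until X (threshold) pieces have been submitted" in the discussion.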
12:30:54 I'm fine with that, btw
12:30:57 would make things easier
12:31:14 I've heard somewhere in the documentation that a single key is 1 shard with 1 threshold
12:31:19 redrobot: well, either way this is just preliminary discussion. Ideally this will all go into a blueprint
12:31:23 s/heard/read
12:31:48 k,
12:32:01 moving on, I understand HA a bit more
12:32:03 hopefully by having a blueprint there will be more folks with relevant feedback
12:32:04 lolz
12:32:09 the thing is how to get different players to unseal the vault whenever needed
12:32:25 by players I mean people or other processes
12:32:36 to submit key shards to the vault server
12:32:38 basically Vault open source can be clustered in a Master+Hot Standby mode
12:32:47 right
12:32:59 but to enable that, you need a "HA" enabled backend
12:33:02 which mysql is not
12:33:10 but you could use mysql for storage
12:33:16 so basically you configure 2 backends
12:33:23 lets go with that assumption for now... but I would sure like to know why it's not HA
12:33:23 1 has to be HA-backend
12:33:51 right
12:33:54 could be etcd
12:33:55 as I understand it, Vault is using a distributed lock to decide what node is master
12:33:58 weshay|ruck: TASK [validate-undercloud : Reinstall the undercloud to check idempotency] *****
12:34:00 2018-06-27 00:06:19.688 | Wednesday 27 June 2018 00:06:19 +0000 (0:00:00.096) 0:37:55.012 ********
12:34:02 2018-06-27 00:06:19.714 | skipping: [undercloud]
12:34:06 Bogdan Dobrelya proposed openstack/tripleo-heat-templates master: Add support for j2-rendered user defined drop-ins https://review.openstack.org/578370
12:34:12 mwhahaha, quiquell|rover this may be new..
in latest promotion jobs https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-singlenode-featureset027-master/5fd6688/undercloud/home/jenkins/undercloud_sanity_check.log.txt.gz#_2018-06-27_00_06_50
12:34:16 and for some reason, the mysql backend can't be used for that
12:34:18 most everything else is passing afaict
12:34:23 https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-singlenode-featureset027-master/5fd6688/console.txt.gz#_2018-06-27_00_06_19_669
12:34:32 not sure if it's not possible or just not implemented
12:34:39 maybe just not implemented
12:34:40 weshay|ruck: new fire... what a wonderful news
12:34:41 alright
12:34:47 so lets say we have etcd, which is supported for HA
12:35:01 Yes, then ETCD can be used for the lock, and mysql for storage
12:35:02 weshay|ruck: i saw quiquell|rover had a patch for that
12:35:02 the caveat is that we have no automatic failover, how so?
12:35:22 mwhahaha: Yep... is the ironic api change
12:35:25 so, when the master fails, a hot-standby can become master
12:35:45 but afaict there is no mechanism to let the clients know
12:35:57 so the client needs to be smart enough to go check other nodes when one fails out
12:36:11 or we need to have a smart lb that can reconfigure itself when a master fails
12:36:19 weshay|ruck, mwhahaha: Need some reviewing on the fixes
12:36:30 * mwhahaha reviewed
12:36:32 quiquell|rover, k.. we may need to change the timeout on https://review.rdoproject.org/jenkins/job/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-master/1021/console
12:36:38 the simple approach is to have a round robin to all nodes
12:36:45 redrobot: what about having clients access Vault behind HAProxy?
12:36:56 so only the master responds
12:37:01 and the hot standby nodes just forward the request to the master
12:37:04 hot-standby nodes always respond with a redirect
12:37:12 ah
12:37:14 I see
12:37:15 weshay|ruck: We can do that in RDO ?
12:37:25 moguimar, ah yes, that's also an option
12:37:34 they can redirect or forward
12:38:37 so, with redirects turned on, and a round-robin lb, you still need to take failed masters out of rotation
12:39:05 So this basically all means we gotta write a smart Vault client library
12:39:19 quiquell|rover, ya
12:39:26 well, smart lb or smart client, yeah
12:39:52 weshay|ruck, mwhahaha, sshnaidm: +2 +1w to be able to fix TQE check https://review.openstack.org/#/c/578339/
12:39:53 redrobot: OR... we could go around it by tying Vault to a Virtual IP just for Vault usage and having a resource agent tied to it.
12:40:03 with that we can try to merge the ironic fix
12:40:13 weshay|ruck, marios please review fixes for linter: https://review.openstack.org/#/c/578364/ https://review.openstack.org/#/c/578365 https://review.openstack.org/#/c/578340/ https://review.openstack.org/#/c/578339
12:40:26 weshay|ruck, mwhahaha: The actual fix https://review.openstack.org/#/c/578292/
12:40:27 😯 that could work too jaosorior
12:40:55 redrobot: pacemaker could be monitoring the master's health, and have a VIP attached to that node. When the master fails, it would then move the VIP to where the new master is. The caveat is that there will be a small service interruption.
12:41:08 which is 10s IIRC... however long pacemaker takes to do a monitor action
12:41:42 sshnaidm: Thanks
12:41:44 so yeah, it should be possible. It would just require us to write a pacemaker resource agent that monitors Vault master nodes.
12:41:59 the magic that ties that agent to the VIP is already available via pacemaker
12:42:20 redrobot: mind if I write that down on your notes?
12:42:29 jaosorior, not at all...
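The "smart client" option raised above (check the other nodes when one fails, and skip failed masters) can be sketched roughly as below. This is a minimal sketch under stated assumptions: `pick_active` and the `probe` callable are hypothetical names invented for the example, and no real Vault API call is made; a real probe would have to ask each node whether it is currently the active one.

```python
def pick_active(nodes, probe):
    """Return the first node the probe reports as active, or None if none is.

    `probe` is a hypothetical callable: it returns True for the active node,
    False for a standby, and raises OSError when the node is unreachable.
    """
    for node in nodes:
        try:
            if probe(node):
                return node
        except OSError:
            # Unreachable node: treat it as out of rotation and keep looking.
            continue
    return None
```

The same check is what a pacemaker monitor action or a load balancer health check would need to perform; the difference between the options discussed is only where that logic lives.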
12:42:53 my notes are y'alls notes 😁
12:43:51 written
12:43:58 jaosorior, sweet
12:44:00 sshnaidm: ack in a bit
12:44:07 last thing to consider is the Vault Policy Engine
12:44:18 quiquell|rover, this needs to be rebased, not dependent: https://review.openstack.org/#/c/578292/
12:44:25 as policies need to be defined
12:44:27 quiquell|rover, it's commits in the same repo
12:44:40 yay, more policy work
12:44:48 sshnaidm: You are right same repo
12:45:11 sshnaidm: Have so much commits in my brains right now
12:45:18 redrobot: OK, so we need to maintain a policy file for Vault in TripleO then, right?
12:45:23 yup
12:45:38 quiquell|rover, yeah, totally understandable..
12:45:47 OK, once you have more notes about how that works, we can get some dedicated time to brainstorm how that would look like
12:45:54 if we use more than the root token, yes
12:46:14 but it sounds like a bad idea to use root token
12:46:23 haha, yeah, moguimar
12:46:23 moguimar: hopefully we do use something else than root token :D
12:46:36 basically the root token has godlike powers
12:46:43 and it can create other tokens
12:46:50 quiquell|rover, you know how to fix https://review.openstack.org/#/c/578292/ ?
12:46:53 we could use it if we're just doing a POC though
12:46:58 I was thinking we can define some paths and then build some policies around that.
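The idea of defining paths and building policies around them could look roughly like the sketch below, which renders one path-scoped policy per service in Vault's HCL policy syntax. The `secret/<service>/*` layout, the `service_policy` helper, and the chosen capabilities are assumptions for illustration only; nothing in the meeting settled on them.

```python
def service_policy(service, mount="secret"):
    """Render a policy granting read/list only below <mount>/<service>/.

    The mount name and path layout are hypothetical, for illustration only.
    """
    path = "{}/{}/*".format(mount, service)
    return 'path "{}" {{\n  capabilities = ["read", "list"]\n}}\n'.format(path)


# One policy document per service, each scoped to that service's own path.
policies = {name: service_policy(name) for name in ("nova", "glance")}
```

A token created with only the `nova` policy would then have no grant on paths under `secret/glance/`, whereas the root token bypasses such scoping entirely.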
12:47:06 then you have policies to address specific access level for the new tokens
12:47:08 thanks for the help sshnaidm
12:47:23 Quique Llorente proposed openstack/tripleo-quickstart-extras master: Use openstack CLI for ironic https://review.openstack.org/578292
12:47:45 redrobot: if you can add some notes on how policy works for Vault that we could read, it would be great :D
12:47:47 sshnaidm: updated https://review.openstack.org/#/c/578292/
12:47:58 one thing I'd like to see with the Vault Policy, is that one service could not read secrets that belong to another service
12:48:02 that way I can get on-par with the stuff you're checking out, and hopefully give more useful comments.
12:48:25 quiquell|rover, why did you abandon it?
12:48:33 jaosorior, ack, I'll add more notes to the policy stuff
12:48:34 weshay|ruck: Didn't have time to really investigate it, just gave a fix
12:48:37 it just needed to be rebased
12:48:39 quiquell|rover, abandoned?
12:48:48 weshay|ruck: I have reparented it onto another patch
12:48:56 right
12:49:07 no need to abandon though
12:49:25 weshay|ruck: brain fart, restored
12:49:27 redrobot: anyway, so far the things we could already start considering for a POC are: * master key initially (no SSSS). * etcd as a backend for HA.
12:49:30 quiquell|rover, :)
12:49:48 jaosorior, yep, and mysql for storage backend
12:50:12 folks, do you know, can I use get_file in parameter_defaults?
12:50:12 (I'm assuming mysql is preferred for storage because it may already have backup policies around it?)
12:50:26 right
12:50:36 and we could start researching what's needed to be written for the pacemaker resource agent
12:50:41 I'm really stuck with https://review.openstack.org/#/c/578088/
12:51:00 need to figure out the way to refer the env file named lately
12:51:01 we could start this using the root token.
And after having something that kinda works, we could move on to have a nicer implementation with proper policies
12:51:24 redrobot: does that sound alright to you?
12:51:30 undercloud_config knows nothing of heat stacks, and I need to have it in the -e foo file name
12:51:31 yep
12:51:35 great great
12:51:49 redrobot: thanks for bringing it up!
12:51:53 so thinking of using get_file in parameter_defaults... would it work?
12:52:03 👍
12:52:12 bogdando: No get_file won't work in an environment file
12:52:17 * bogdando sigh
12:52:17 weshay|ruck, sshnaidm, mwhahaha, marios: https://review.openstack.org/#/c/578351/, https://review.openstack.org/#/c/578352/, https://review.openstack.org/#/c/578368/
12:52:23 bogdando: you'll have to write the file then append it to the -e options
12:52:25 missing linting reviews
12:52:26 redrobot: anything else you want to bring up regarding that topic?
12:52:34 I think that's all for now
12:52:41 shardy: I cannot write it before I know the stack id
12:52:44 or set the StackAction in one of the environment files we already write
12:52:44 alright
12:52:46 it's created later
12:52:48 thanks for joining folks!
12:52:50 Also, one thing
12:52:56 I'll be off for three weeks on PTO
12:53:02 quiquell|rover, mm.. it's duplicates
12:53:06 ah yes, I'm out on PTO next week as well
12:53:08 lhinds: could you take on driving these meetings while I'm gone?
12:53:16 shardy: so I did https://review.openstack.org/#/c/578370/ but it won't help much :(
12:53:17 bogdando: The ID won't be known until the ephemeral stack is created, so I think the best you can do is use the name e.g the --stack option
12:53:23 as I said in the review previously
12:53:32 jaosorior, I can't do next week, but I could do the two weeks after that.
12:53:37 jaosorior: sure, np..the next two right?
12:53:39 shardy: yea, but that looks too fragile...
12:53:44 lhinds: next three :D
12:53:58 sshnaidm: which one ?
12:54:03 shall i do next week and redrobot the following two?
12:54:05 I'll be off next week as well
12:54:07 quiquell|rover, I did for stable branches: https://review.openstack.org/578364 https://review.openstack.org/578365
12:54:09 bogdando: really? I think the --stack option will be rarely used, and we can just document that (as with all tripleo deployments) the name must be unique?
12:54:13 lhinds, redrobot: that sounds like a plan!
12:54:15 thanks guys!
12:54:19 and thanks everybody for joining
12:54:20 k, no worries
12:54:22 it'll be better than using one name anyway
12:54:25 #endmeeting