Tuesday, 2021-04-13

*** hamalq has quit IRC		01:19
*** sanjayu__ has quit IRC		01:21
*** ajitha has joined #zuul		01:57
*** rlandy has joined #zuul		02:17
*** rlandy is now known as rlandy\|rover		02:18
*** evrardjp has quit IRC		02:33
*** evrardjp has joined #zuul		02:33
*** rlandy\|rover has quit IRC		02:41
*** bhavikdbavishi has joined #zuul		03:13
*** bhavikdbavishi1 has joined #zuul		03:16
*** bhavikdbavishi has quit IRC		03:17
*** bhavikdbavishi1 is now known as bhavikdbavishi		03:17
*** vishalmanchanda has joined #zuul		03:18
*** jfoufas1 has joined #zuul		04:49
*** fbo has joined #zuul		05:07
*** y2kenny has quit IRC		05:26
*** bhavikdbavishi has quit IRC		06:17
*** bhavikdbavishi has joined #zuul		06:23
*** saneax has joined #zuul		06:30
*** dpawlik has joined #zuul		06:33
*** dpawlik has quit IRC		06:37
*** dpawlik has joined #zuul		06:40
*** dpawlik has quit IRC		06:42
*** dpawlik has joined #zuul		06:55
*** dpawlik has quit IRC		06:55
*** bhavikdbavishi has quit IRC		06:58
*** jcapitao has joined #zuul		07:12
*** dpawlik has joined #zuul		07:13
*** dpawlik has quit IRC		07:13
*** dpawlik has joined #zuul		07:14
*** jpena\|off is now known as jpena		07:34
*** rpittau\|afk is now known as rpittau		07:43
*** tosky has joined #zuul		07:49
*** nils has joined #zuul		08:11
*** dpawlik has quit IRC		09:28
*** nhicher has quit IRC		09:28
*** fbo has quit IRC		09:29
*** nhicher has joined #zuul		09:48
*** fbo has joined #zuul		09:48
*** jcapitao has quit IRC		09:49
*** dpawlik has joined #zuul		09:52
*** rpittau is now known as rpittau\|bbl		09:54
*** bhavikdbavishi has joined #zuul		10:26
*** bhavikdbavishi has quit IRC		10:44
*** dpawlik has quit IRC		10:45
*** dpawlik9 has joined #zuul		10:53
swest	corvus: 785972 lgtm. no need to wait longer than necessary from my pov	11:14
*** bhavikdbavishi has joined #zuul		11:29
*** jpena is now known as jpena\|lunch		11:31
*** rlandy has joined #zuul		11:37
*** rlandy is now known as rlandy\|rover		11:37
*** bhavikdbavishi1 has joined #zuul		11:57
*** rlandy\|rover is now known as rlandy\|rover\|mtg		11:58
*** bhavikdbavishi has quit IRC		11:58
*** bhavikdbavishi1 is now known as bhavikdbavishi		11:58
*** jcapitao_ has joined #zuul		12:02
*** jcapitao_ is now known as jcapitao		12:04
*** rpittau\|bbl is now known as rpittau		12:31
*** jpena\|lunch is now known as jpena		12:37
*** tosky has quit IRC		13:19
*** tosky has joined #zuul		13:25
*** tosky has quit IRC		13:29
*** tosky has joined #zuul		13:29
*** bhavikdbavishi has quit IRC		13:40
*** rlandy\|rover\|mtg is now known as rlandy\|rover		13:59
openstackgerrit	Merged zuul/zuul master: Make buildset mandatory on build https://review.opendev.org/c/zuul/zuul/+/770900	14:02
*** sanjayu_ has joined #zuul		14:30
*** saneax has quit IRC		14:33
*** sanjayu__ has joined #zuul		14:39
*** sanjayu_ has quit IRC		14:42
*** hamalq has joined #zuul		16:20
*** bhavikdbavishi has joined #zuul		16:22
*** rpittau is now known as rpittau\|afk		16:24
corvus	swest: ack	16:28
*** jcapitao has quit IRC		16:29
corvus	zuul-maint: https://review.opendev.org/758940 is going to be a minor breaking change for operators (we need them to generate a new secret and put it in zuul.conf). if i send an announcement about that today, how long should we wait to merge that change in master? until tomorrow, or until monday?	16:30
corvus	obviously that only affects people running master CD; at the same time we can also let folks know that it will be required in the next release, which i imagine will be at least a few weeks from now.	16:31
corvus	(and people can add the new section asap, so this is about letting people know to add this now to prepare for when it's merged/release)	16:32
*** nils has quit IRC		16:32
clarkb	corvus: the new secret is the encryption key for the data at rest in zk?	16:33
corvus	clarkb: yep	16:33
clarkb	is that something we can auto generate to ease deployment? or would that reduce security posture?	16:33
corvus	it's used (or will be used) by the schedulers and executors, and needs to be distributed to them by some means other than zk. right now the implementation puts it in the zuul.conf file; we could conceivably auto-generate it and write it to another file, but i don't know how we'd distribute it.	16:36
fungi	i don't think autogenerating it would reduce security posture, however it would mean additional code and something to write it out	16:36
fungi	and yes, once it's used by additional machines, you have a classic key distribution problem	16:36
corvus	(i mean, if you look at the state right now with exactly one scheduler needing it, it's easy. but with >1 shedulers ande >1 exeuctors, it's a distribution problem)	16:36
clarkb	ah ya that makes sense since all members of the cluster will need to decrypt eventually	16:36
corvus	maybe not web and merger, but certainly a lot	16:37
corvus	(and who knows, maybe they will)	16:37
fungi	key distribution is considered notoriously hard in cryptography circles. there are already many solutions out there, we should allow operators to pick one which best suits them and not try to invent our own (which we would invariably get wrong, and that would reduce the overall security posture)	16:38
corvus	i'm expecting executors to need it because if we don't want to store decrypted secrets in zk, we'll need to push the decryption to later (ie, right now we put decrypted secrets in gearman, but we won't want to do that in zk)	16:38
clarkb	makes sense	16:38
corvus	fungi: yeah, for now it's "whatever you're doing with your config management"; i could definitely see us doing something like adding support for a vault or something later. much later. :)	16:39
fungi	so in opendev for example, i expect we would put the key material in our private variable store on our bastion, and then use our ansible jinja2 configuration template as usual	16:39
*** jfoufas1 has quit IRC		16:40
corvus	fungi: exactly like https://review.opendev.org/785980	16:41
fungi	i guess the point is, zuul already needs to be configured with other secrets, so this one should be handled the same as the rest for consistency (i.e. we punt and expect the sysadmin to address the need)	16:41
corvus	yep. gerrit/github http creds are handled this way	16:41
clarkb	corvus: I feel like tomorrow and maybe even monday may not be sufficient notice? however, I'm not sure how many people are installing from source (tobiash, gerrit zuul, and opendev zuul?)	16:42
corvus	source or :latest images	16:43
corvus	clarkb: want to do 2 weeks then?	16:43
clarkb	we've still got big release stuff happening on the opendev zuul this week. I wasn't planning to do the zk upgrade stuff until friday at this point :/	16:43
*** jpena is now known as jpena\|off		16:43
corvus	clarkb: oh, i'm not worried about opendev :)	16:43
fungi	i think if the error message logged when that key is missing is clear, easy to spot and has sufficient breadcrumbs/guidance in the upgrade release note, it should be fine?	16:43
corvus	opendev will be taken care of within the hour	16:43
clarkb	corvus: we'd still need to restart once the change lands and ensure everything is working as expected?	16:44
tobiash	we're also updating in a controlled manner so we're unaffected as well	16:44
clarkb	the change being the zuul side change, we can stage the config change early and wait	16:44
corvus	clarkb: at some point, yeah, but i'm in no rush to restart zuul right now	16:44
clarkb	corvus: got it	16:45
corvus	i think the issue is this: how much notice do we give people running CD that we're making a breaking deployment change?	16:45
corvus	we usually give lots of notice because we usually only do them on major releases	16:45
corvus	we should have included this in v4, but we forgot	16:45
clarkb	maybe say mid next week and give people a week?	16:46
corvus	and i don't think this is worth revving to v5 and then renaming the distributed scheduler as v6; but if we wanted to be really strict about our versions, we could do that.	16:46
clarkb	that should heopfully be enough time to notice and either pin deployment or stage the key appropriately	16:46
corvus	(mostly because this is a really easy deployment change to make)	16:46
corvus	clarkb: that sounds reasonable; i can send the email today with a target merge date of april 20	16:47
corvus	zuul-maint: ^ sound good?	16:47
fungi	seems like plenty of time to me. i'm reading through the change now just to get an idea of what error message the scheduler is likely to throw if started without that set	16:48
clarkb	corvus: sounds good to me	16:48
corvus	fungi: i think it's: raise RuntimeError("No key store password configured!")	16:49
fungi	aha, thanks!	16:49
corvus	so i think that would show up as an exception in the scheduler log	16:50
corvus	with traceback	16:50
fungi	i was getting bogged down in the various kinds of keys handed in zuul/lib/keystorage.py	16:50
fungi	and yeah, the RuntimeError exception was in zuul/scheduler.py which i hadn't reached yet	16:51
fungi	that looks clearly noticeable	16:51
fungi	it'll also be safe to drop back to 4.2.x until you correct your configuration, i expect	16:52
corvus	yep	16:53
corvus	zuul-maint: draft email https://etherpad.opendev.org/p/ta01qhhP9DNg0jpmMFjK	16:57
fungi	corvus: does it need any guidance on the format/length of teh value?	16:59
*** decimuscorvinus has quit IRC		16:59
fungi	er, i guess you say "strong password" so good enough	16:59
fungi	yeah lgtm	16:59
*** decimuscorvinus has joined #zuul		17:00
fungi	ideally secret encryption key would be at least as strong as the strongest secret encrypted with it	17:00
fungi	is there an effective size limit for that value?	17:01
corvus	fungi: yeah, i was thinking not to be too prescriptive there, but i could give an example if it would help...	17:01
corvus	fungi: i don't believe so; we treat it as a string and pass it to the 'password' field when serializing the private keys	17:02
fungi	i'm going back over the change a bit more closely just to see what kdf is being used, though presumably it's consistent with the one we use on the per-project keys	17:03
corvus	serialization.BestAvailableEncryption(password) looks like the important bit there	17:03
fungi	oh, right, we're using rsa to encrypt the project secrets	17:04
fungi	not symmetric encryptuion	17:05
fungi	encryption	17:05
corvus	yep, so the issue here is storage of the rsa private key (and also the ssh private key for the similar per-project ssh key feature)	17:05
fungi	yeah, symmetrically encrypting asymmetric encryption private keys, essentially	17:06
corvus	yep	17:07
fungi	i'll check out whether cryptography.hazmat.primitives.serialization.BestAvailableEncryption() does any key stretching	17:07
fungi	or if we need to layer some in there	17:07
corvus	fungi: the ssh key side of this would be this from paramiko: http://docs.paramiko.org/en/stable/api/keys.html#paramiko.pkey.PKey.write_private_key	17:11
corvus	(with an RSAKey)	17:11
clarkb	is there a risk using a best available method that if best available changes later you'll eb out of sync with the stored data?	17:12
clarkb	(thinking about the symettry of decryption and encryption, you encrypt with best available and if it doesn't appropriately identify the storage system then decryption may get confused?)	17:13
fungi	the serialization should encode an algorithmic identifier	17:13
fungi	so the decryption method can use that to determine how to decrypt it	17:14
clarkb	ok (the docs do confirm it can change over time, but as long as they are writing metadata to make an older or newer option useable then should be fine)	17:14
fungi	the examples do demonstrate passing a raw passphrase directly to serialization.BestAvailableEncryption() so this is likely safe without doing any stretching before hand to convert it into a usable symmetric key with even distribution	17:17
*** decimuscorvinus has quit IRC		17:19
*** decimuscorvinus has joined #zuul		17:23
*** decimuscorvinus has quit IRC		17:23
*** decimuscorvinus has joined #zuul		17:25
fungi	and it looks like the underlying implementation is specific to the kind of material being encrypted too	17:27
*** sshnaidm has joined #zuul		17:28
*** sshnaidm is now known as sshnaidm\|pto		17:28
clarkb	which makes sense since the produced key values will want different attriutes based on whether or not you use rsa or ed25519 or whatever under the hood?	17:28
clarkb	I was looking at it and it appears they do handle a "password is too long" return code from openssl, but I can't find any indication of what the limit might actully be on the openssl side	17:29
clarkb	I suspect that it is quite large and we don't need to wrry about it	17:29
clarkb	ah there we go there is a check for > 1023 bytes	17:30
fungi	so basically don't use 8 kilobit keys	17:31
fungi	but 4096 bits would be reasonable	17:31
corvus	swest, tobiash, fungi: i did happen to spot a minor upgrade-related concern i missed yesterday and switched my vote to a -1; i'll see if i can make sure that code path is tested later today.	17:32
fungi	especially if you're using it to encode 4096-bit private keys	17:32
clarkb	I haven't looked much at the zuul side, but still slightly concerend about best available changing. We use bestAvailable to produce a private key, will that private key continue to decrypt data in zk if the best available encrpytion changes?	17:32
corvus	clarkb: i think we use best-available to choose the serialization encryption for the private key	17:33
clarkb	corvus: ah the serialization not the actual private key derivation, got it	17:33
corvus	clarkb: so i think the thought is the serialization handler should have the metadata fungi was talking about to know which mech to use to deserialize it with the provided password	17:33
fungi	worst case if that goes through a shift you may need to clear it out of zk and take some downtime to restart all schedulers i guess	17:34
corvus	i don't know that for certain, but that's my read	17:34
fungi	but yeah, how it's signalled is going to probably be up to cryptography's load methods autodetecting how to deserialize	17:35
fungi	for example we're using TraditionalOpenSSL format for the rsa private keys, so it will be up setting/reading to whatever algorithmic identifiers that format supports	17:37
fungi	this isn't an opaque symmetrically encrypted blob we're storing in zk, we're relying on whatever serializations are typical for the various kinds of key material we're relying on, and whatever encryption options they have natively	17:38
clarkb	ya I guess I'm just wondering what happens if you end up with older ssl on half the hosts and they cannot do best available for the others. Or if best available moves ahead for some other reason (cyptography update?)	17:40
fungi	so that's saying to use whatever the "best" parameters are for TraditionalOpenSSL	17:40
fungi	and yes, we'll probably need to set some expectations around upgrade ordering for things like ssl libs	17:41
corvus	well, if best available doesn't immediately change to a previously unavailable algo, then it should be fine	17:41
clarkb	corvus: that is a good point	17:41
corvus	iow, if best available switches from A to B but B was previously available, then it should be invisible	17:42
fungi	right, you'll probably need to avoid upgrading some of your systems and then blowing away and reencoding the keys you're storing in zk	17:42
corvus	but if a scheduler does not have B, and another scheduler both has B and bestavailable switches to B, then it's a prob	17:42
corvus	hopefully not common in practice, but i guess if you made a huge (10 year?) jump it might be expected	17:43
fungi	and may need to avoid adding new projects if part of your system isn't upgraded yet	17:43
fungi	upgrade some but not all of your servers from ubuntu 16.04 to 20.04 and then add new projects, you might end up encoding with the openssl from 20.04 in ways which the openssl on 16.04 can't handle... or the other way around openssl may be configured on 20.04 to no longer allow some encoding which was "best" as of 16.04 (though much more unlikely, typical deprecation is much slower than that, but maybe some	17:47
fungi	advance in factoring will suddenly make sha2 unsafe in a few years?)	17:47
fungi	and yeah, i'd worry less about what the openssl on those platforms supports, more about what the distros might decide to disallow through runtime or buildtime configuration	17:48
fungi	as we've seen with fedora 33 and rsa for openssh	17:48
clarkb	I guess the good news with ^ is that undoing that as a user isn't terribly difficult	17:49
clarkb	you reduce your default security stance but that may be worthwhile to get all hosts up to date, then restrict again in a short term period	17:49
fungi	anyway, i'm satisfied at this point it doesn't represent a glaring risk of future breakage. there may be changes we become aware of which are problematic for some deployments, but we should be able to tackle them if and when they arise	17:51
fungi	and i wouldn't want to prematurely optimize for problems we don't have, what's there now is already necessarily complex so we shouldn't make it any more complex than it has to be ;)	17:52
fungi	after all, complexity is the greatest source of security problems, not lack of foresight	17:52
corvus	++	17:58
corvus	i'll go ahead and send that email	17:58
fungi	out of curiosity, what would the process look like for changing that key? shut down all dependent services, remove the project keys from zk, then start services again with the new config?	17:59
openstackgerrit	Merged zuul/zuul master: Move project secrets key loading to key storage https://review.opendev.org/c/zuul/zuul/+/758939	18:00
clarkb	fungi: probably need to clear entries in zk too?	18:00
clarkb	to avoid zuul trying to read them without the old key present	18:00
fungi	i said "remove the project keys from zk" but you mean something in addition to those?	18:01
clarkb	fungi: yes all of the other secret data is or will be encrypted with the resulting encryption key if I read things right	18:01
clarkb	if left behind zuul would try to read and decrypt it with the new key and fail?	18:02
fungi	oh, i missed that the scheduler is decrypting secrets with the project keys and then reencrypting them wit this shared key	18:02
corvus	er that should not be what's happening	18:02
fungi	yeah, so you shouldn't need to remove job secrets from zk right?	18:03
clarkb	specifically "All private data in Zookeeper will be encrypted at rest."	18:03
clarkb	from teh commit message	18:03
fungi	yeah, but the job secrets are already encrypted with per-project keys	18:03
clarkb	oh I see	18:04
fungi	so that part was already solved	18:04
corvus	correct. and now the per-project keys are encrypted with this new master password, and since they are, they can now be stored in zk.	18:04
clarkb	that only becomes a problem if the project keys change then which doesn't necessarily happy if you change the top level key, though it may	18:04
fungi	and those keys themselves aren't changing, just how those keys are stored is	18:04
clarkb	(if you wanted to rotate everyting out below in the "tree")	18:04
*** josefwells has joined #zuul		18:40
*** sanjayu__ has quit IRC		18:53
*** josefwells has quit IRC		18:59
*** bhavikdbavishi has quit IRC		19:32
*** ajitha has quit IRC		19:46
*** vishalmanchanda has quit IRC		19:55
*** avass has quit IRC		19:55
*** avass has joined #zuul		20:02
*** y2kenny has joined #zuul		20:17
y2kenny	Where can I find out more about the Zuul contribution workflow? (I see the Workflow label on Gerrit but I am not sure what is missing for the value to change.)	20:20
fungi	y2kenny: one of the zuul maintainers has to set the workflow label to +1 (it's controlled via group acls in gerrit)	20:21
clarkb	y2kenny: there are three review categories. The first is verified which zuul supplies a +1 for if check jobs pass after uploading your patchset. Next is code-review and workflow. These tend to be applied by core reviewers. Our gerrit enforces a single code-review +2 and a workflow +1 to merge code. When these happen zuul reruns tests again and if they pass it +2 verifies and then merges it	20:21
clarkb	the code review and workflow labels are separated so that you can control the timing of a merge if it doesn't make sense to land something yet but you still want to indicate you have done proper review	20:22
clarkb	it also helps us to do our by convention two reviewers +2 before approval process. The second person to review can add the approval	20:22
y2kenny	fungi, clarkb: ok. Is there a major release coming that slowing submission? I am just wondering if there's anything else I need to do for a patch to go in and the timing of the patch. (It's a fix that I wanted to get into my deployment. https://review.opendev.org/c/zuul/zuul/+/783986)	20:24
clarkb	y2kenny: a release was just made so I think any holdups around that should be gone now	20:27
clarkb	I've persoanlly been distracted with other things lately, but I can take a look at that change	20:27
y2kenny	clarkb: thanks!	20:27
clarkb	today is actually a limbo day with openstack release tomorrow and me not wanting to break anything before then :)	20:27
y2kenny	make sense.	20:28
clarkb	that change does conflict with a number of other changes. corvus any concern with that one going in?	20:29
clarkb	I +2'd it but didn't approve to give corvus a chance to weigh in on the conflcits resolution, but I suspect we can land it	20:31
y2kenny	great. Thanks!	20:31
corvus	y2kenny, clarkb: it's worth noting it conflicts with another implementation of the fix :)	20:35
y2kenny	corvus: oops... I did not realize that	20:36
corvus	well, to be fair, it has a -1 and may have fatal flaws. i don't mind at all :)	20:36
corvus	but we should look closely at it and see if the issues there are resolved by the new one :)	20:37
clarkb	++	20:37
corvus	the other two conflicts should be easily resolved though; i can't speak for their authors, but in general, i think a fix like this can go in first and rebasing on it shouldn't be a big deal (to my knowledge the other two changes don't change this code, just move it around, so the rebase should be easy or nearly automatic)	20:38
corvus	but yeah, let's take a quick look at the other and make sure we have the bases covered	20:39
corvus	https://review.opendev.org/738668 is the mirror change	20:39
clarkb	I think y2kenny's case doesn't handle when a run is aborted	20:42
clarkb	but I'm not sure we want it to since the intent is to rerun?	20:42
clarkb	and aborted jobs should be those that are forcefully killed (do we want logs in that case?)	20:42
y2kenny	my implementation also doesn't handle RESULT_DISK_FULL	20:46
y2kenny	so perhaps I should handle both UNREACHABLE and DISK_FULL but not ABORT	20:47
corvus	disk_full is handled at the level above, it's not necessary to do it there.	21:00
y2kenny	ok	21:01
corvus	and i don't think there's anything else we want to do with aborted; we should just return asap when we abort, and that still happens	21:02
corvus	i think tristanC_'s question about skipping the pause stanza if we're unreachable is a good question, which i never answered on my version. i tend to think it's mostly harmless to leave there, as setting the pause flag is probably the last thing it does in the playbook, so unreachable isn't likely to happen after it's set. but it could, so skipping that may be an improvement.	21:04
corvus	oh, but maybe the success check handles that	21:05
corvus	yeah, we can't run pause because we won't have set success	21:05
corvus	that actually wasn't present in the old version, so that's improved in the mean time	21:05
corvus	y2kenny: i believe your patch is not subject to either of the issues tristanC_ raised on mine, so we should abandon mine and continue to merge yours	21:07
y2kenny	corvus: sounds good.	21:08
corvus	let's give tristanC_ a little bit in case he wants to review https://review.opendev.org/783986 since he looked at the earlier one; but i'm happy it has sufficient review and addresses his earlier concerns, so if he's busy and doesn't weigh in soon, we can go ahead and approve it.	21:08
*** rlandy\|rover is now known as rlandy\|rov\|biab		21:09
tristanC_	corvus: thanks for the reminder, i actually already started the review, but got distracted and forget to +2 the change	21:19
*** tristanC_ is now known as tristanC		21:19
*** rlandy\|rov\|biab is now known as rlandy\|rover		21:58
openstackgerrit	Merged zuul/zuul master: Allow post playbook to run when run playbook has unreachable result https://review.opendev.org/c/zuul/zuul/+/783986	22:23
*** y2kenny has quit IRC		23:11
*** tosky has quit IRC		23:34
openstackgerrit	James E. Blair proposed zuul/zuul-operator master: Allow terminationGracePeriodSeconds to be configurable https://review.opendev.org/c/zuul/zuul-operator/+/785989	23:48
corvus	tristanC: i'm interested in your thoughts on https://review.opendev.org/785985	23:50
corvus	tristanC: you and i both have independently used secrets for everything -- in the operator, and in my raw k8s i set up for gerrit's zuul. but avass rightly points out that we can use configmaps for some things, and that's what the original spec said. personally, i think i gravitated towards secrets for everything for simplicity's sake. i'm interested in your thoughts / experiences there.	23:51
corvus	i'm honestly pretty ambivalent about it; i could go either way.	23:52
corvus	(no rush on that, just wanted to draw attention to it as something in particular that could use more discussion)	23:53
*** ajitha has joined #zuul		23:56
corvus	remote: https://gerrit-review.googlesource.com/c/zuul/ops/+/303243 Add a keystore password for Zuul [NEW]	23:59
corvus	there's the keystore change for gerrit's zuul	23:59

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!