*** tosky has quit IRC | 00:00 | |
*** mattw4 has quit IRC | 00:02 | |
SpamapS | mnaser: v3 is pretty nice.. I'd be down for testing. :) | 00:03 |
mnaser | SpamapS: yeah, funnily enough tho, we've historically never used tiller ever :p | 00:03 |
mnaser | so helm v3 was a "hey glad y'all finally caught on" :P | 00:03 |
*** saneax has quit IRC | 00:20 | |
corvus | mnaser: sweet -- maybe we can use those in the gerrit-review deployment | 00:21 |
mnaser | corvus: ill push it up at some point, probably later today or tomorrow and then we can look about adding it to opendev | 00:22 |
corvus | ++ | 00:22 |
mnaser | the only help i'd appreciate is (i tried this and failed a while back) is tagged releases on dockerhub would be nice | 00:25 |
*** rlandy has quit IRC | 00:29 | |
corvus | agreed; i'll try to get to that if someone else doesn't soon, but i've a bit larger backlog than normal right now | 00:35 |
*** sgw has joined #zuul | 01:01 | |
SpamapS | TBH, Helm is a far far simpler approach than the operator. | 01:09 |
SpamapS | It can't do as much, but it's definitely intended as "package management for k8s apps" rather than "do all the magic" | 01:09 |
mnaser | SpamapS: yep, its a good intermediate step for now | 01:12 |
SpamapS | We've actually shifted over to managing kubernetes objects with Terraform of late. The lifecycle and dependency management is particularly nice. | 01:20 |
SpamapS | But we have a bunch of helm stuff that will stay in helm for a while. | 01:21 |
*** swest has joined #zuul | 01:30 | |
*** swest has quit IRC | 01:35 | |
*** swest has joined #zuul | 01:50 | |
*** bhavikdbavishi has joined #zuul | 02:47 | |
*** gouthamr has quit IRC | 03:14 | |
*** gouthamr has joined #zuul | 03:24 | |
*** jamesmcarthur has joined #zuul | 03:29 | |
*** jamesmcarthur has quit IRC | 03:44 | |
*** pcaruana has joined #zuul | 05:24 | |
*** raukadah is now known as chkumar|rover | 05:58 | |
*** saneax has joined #zuul | 06:24 | |
*** jamesmcarthur has joined #zuul | 06:46 | |
*** jamesmcarthur has quit IRC | 06:51 | |
*** jcapitao|afk has joined #zuul | 07:27 | |
*** jcapitao|afk is now known as jcapitao | 07:28 | |
*** gtema_ has joined #zuul | 07:58 | |
*** tosky has joined #zuul | 08:18 | |
*** jpena|off is now known as jpena | 08:22 | |
*** gtema_ has quit IRC | 08:26 | |
*** tosky has quit IRC | 08:32 | |
*** avass has joined #zuul | 08:44 | |
*** themroc has joined #zuul | 08:49 | |
*** tosky has joined #zuul | 09:16 | |
*** mhu has joined #zuul | 09:33 | |
*** bhavikdbavishi has quit IRC | 09:58 | |
*** tosky has quit IRC | 10:02 | |
*** tosky has joined #zuul | 10:03 | |
*** jcapitao is now known as jcapitao|afk | 11:29 | |
*** sshnaidm has quit IRC | 12:04 | |
*** bhavikdbavishi has joined #zuul | 12:04 | |
tristanC | fwiw i'm still exploring dhall-lang, and for application deployment here is what i wrote for zuul: https://github.com/TristanCacqueray/dhall-operator/blob/master/applications/Zuul.dhall | 12:07 |
*** avass has quit IRC | 12:11 | |
*** mgoddard has quit IRC | 12:31 | |
*** jpena is now known as jpena|lunch | 12:41 | |
*** armstrongs has joined #zuul | 12:41 | |
*** mgoddard has joined #zuul | 12:49 | |
*** armstrongs has quit IRC | 12:50 | |
*** mgoddard has quit IRC | 12:54 | |
*** rlandy has joined #zuul | 12:57 | |
*** sshnaidm has joined #zuul | 12:58 | |
*** Goneri has quit IRC | 12:58 | |
*** sshnaidm has quit IRC | 13:02 | |
*** sshnaidm has joined #zuul | 13:03 | |
*** AshBullock has joined #zuul | 13:04 | |
*** electrofelix has joined #zuul | 13:05 | |
*** jamesmcarthur has joined #zuul | 13:05 | |
*** mgoddard has joined #zuul | 13:09 | |
AshBullock | Hey all, I have a question on the nodepool kubernetes driver, I'm seeing some jobs running on our eks cluster hitting RETRY_LIMIT intermittently, I was wondering if there is an undocumented max-pods setting for kubernetes similar to the openshift driver ? https://zuul-ci.org/docs/nodepool/configuration.html#attr-providers.[openshiftpods].max-pods | 13:09 |
*** jamesmcarthur has quit IRC | 13:12 | |
*** jamesmcarthur has joined #zuul | 13:13 | |
*** jcapitao|afk is now known as jcapitao | 13:16 | |
*** ssbarnea has quit IRC | 13:22 | |
*** jamesmcarthur has quit IRC | 13:29 | |
Shrews | AshBullock: no, there is no such setting in the kubernetes driver | 13:33 |
Shrews | AshBullock: that being said, quota issues (such as max-pods) should not cause RETRY_LIMIT errors. You would see messages in nodepool about "not enough quota to satisfy" the request, and it would simply not handle the request until quota freed up. You are likely hitting some sort of communication issue, I'm guessing (not a k8s expert). | 13:39 |
Shrews | first place you may want to look is the zuul executor logs for the builds encountering the retry. might be more info there | 13:43 |
*** jpena|lunch is now known as jpena | 13:45 | |
*** Goneri has joined #zuul | 13:46 | |
*** bhavikdbavishi has quit IRC | 13:48 | |
*** jamesmcarthur has joined #zuul | 13:55 | |
*** jamesmcarthur has quit IRC | 13:59 | |
AshBullock | Thanks for the help Shrews, I'll take a look at the executor logs to see if I can find anything | 14:00 |
*** sshnaidm has quit IRC | 14:02 | |
*** jamesmcarthur has joined #zuul | 14:03 | |
Shrews | corvus: the etherpad lgtm. who is going to run/operate their zuul? | 14:04 |
*** chkumar|rover is now known as chandankumar | 14:07 | |
clarkb | NODE_FAILURE is what you get if you can't provision with nodepool | 14:14 |
clarkb | retry limit happens when the job runs but zuul detects failures in pre-run or ansible returns an exit code indicating network problems | 14:15 |
clarkb | zuul by default tries 3 times to run the job when it hits this | 14:15 |
clarkb | failing 3 times results in retry limit | 14:15 |
clarkb | AshBullock: ^ | 14:15 |
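[Editor's note: clarkb's description of the RETRY_LIMIT semantics can be sketched as a toy loop. This is an illustrative model only, not Zuul's actual implementation; `run_with_retries` and its names are invented for this sketch.]

```python
# Toy model (NOT Zuul's real code) of the retry semantics described above:
# a failure in pre-run (or an ansible exit code that looks like network
# trouble) causes the job to be re-run, and exhausting the attempt limit
# (3 by default) reports RETRY_LIMIT instead of FAILURE.
def run_with_retries(job, max_attempts=3):
    """job() returns True on success, False on a retryable failure."""
    for _ in range(max_attempts):
        if job():
            return "SUCCESS"
    return "RETRY_LIMIT"
```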
mordred | Shrews: they/we are | 14:20 |
*** themroc has quit IRC | 14:21 | |
mordred | Shrews: idea being having a community repo/repos sort of similar to opendev where we have a gitops repo for running it | 14:23 |
*** bhavikdbavishi has joined #zuul | 14:48 | |
*** sshnaidm has joined #zuul | 14:53 | |
*** AshBullock has quit IRC | 15:04 | |
*** ssbarnea has joined #zuul | 15:12 | |
mnaser | hrm, the latest nodepool-builder image on dockerhub doesnt have debootstrap | 15:28 |
mnaser | https://opendev.org/zuul/nodepool/commit/46d0ce248326127c2d883a415af98fea66af889d this commit implies "Note the sibling build will have installed many of these from the bindep.txt file from diskimage-builder itself." but adding "However, when using releases this is not done." | 15:28 |
mnaser | does that mean i have to build my own images and add these bits on top of it (i'm ok with that, it just seems like uh, work that new users might struggle with) | 15:29 |
mordred | hrm | 15:29 |
mordred | I do not believe that's the intent | 15:29 |
mordred | mnaser: let's loop corvus in when he gets up so we can talk through it | 15:30 |
mnaser | ya looking at the /var/log/apt/history.log in the container, its not htere | 15:31 |
mnaser | *there | 15:31 |
mordred | mnaser: (the ultimate solution here is the finishing of the docker-base-image patches for dib so that debootstrap is not needed anymore) | 15:31 |
mnaser | mordred: yeah i was thinking about that too! i was wondering about the feasibility of running docker-base-image inside a container | 15:31 |
*** panda has quit IRC | 15:31 | |
mordred | mnaser: should be fine actually - it just does a podman export | 15:32 |
mordred | so it doesn't actually _run_ a container build or anything - just fetches and then exports the filesystem | 15:32 |
mnaser | mordred: i guess we'd need podman as a runtime dependency in that case but that's fine by me | 15:32 |
mordred | yah | 15:32 |
mordred | (it's actually written to work with podman or docker - but I think podman is the nicer runtime dep) | 15:33 |
mnaser | yeah its probably not gonna try and mess around with trying to get a systemd service/etc done | 15:33 |
mordred | mnaser: https://review.opendev.org/#/c/693619/ is what I've got so far | 15:33 |
*** panda has joined #zuul | 15:33 | |
mnaser | mordred: i actually searched for that patch hoping it'd have merged | 15:34 |
mordred | mnaser: I think that part is solid enough - the hard bits are going to be https://review.opendev.org/#/c/693642/ and similar for the other base os images | 15:34 |
mnaser | and then i would have started using it =P | 15:34 |
clarkb | mordred: mnaser that commit merged yesterday, maybe it failed to upload the updated image? | 15:34 |
mordred | mnaser: honestly - I think it's all likely not too hard to get moving | 15:34 |
clarkb | or your pull is from pre merge? | 15:34 |
mnaser | clarkb: the last merge was 18 hours ago and the most recent image in dockerhub was 18 hours ago (and also the one i have) | 15:34 |
mordred | clarkb: I don't think that commit is sufficient | 15:34 |
clarkb | oh I thought debootstrap was specifically listed | 15:35 |
mordred | clarkb: somewhere we missed that we need some of the siblings behavior in production builds too | 15:35 |
mordred | it's not | 15:35 |
clarkb | ah | 15:35 |
mordred | I'm honestly not sure what the *right* solution is - think it's worth a quick discussion - I'm pretty sure implementing the right solution won't be as hard as figuring out what it is :) | 15:36 |
mnaser | yep, agreed | 15:36 |
mnaser | i think for now ill kinda just uh, have an image with a few extra packages (based on the most recent tagged release) | 15:36 |
mnaser | just to unblock the helm charts work im at | 15:37 |
mordred | mnaser: I betcha those two dib patches would get you a bootable ubuntu image if you did container-base-image vm ubuntu-kernel DIB_CONTAINER_IMAGE=docker.io/library/ubuntu | 15:37 |
mnaser | so far launcher works well (tested with cloudimages) and working on builder now | 15:37 |
mordred | mnaser: woot | 15:37 |
clarkb | I think if we are installing vhd-utils and debian-keyring and yum etc we may as well install debootstrap | 15:38 |
clarkb | yum is the equivalent of debootstrap there for red hat distros | 15:39 |
mordred | I agree | 15:39 |
openstackgerrit | Monty Taylor proposed zuul/nodepool master: Add debootstrap to builder package list https://review.opendev.org/699707 | 15:40 |
clarkb | I know ianw intends to get this into production after PTO | 15:40 |
mordred | clarkb, mnaser: ^^ | 15:40 |
clarkb | I expect things will work a bit more happily out of the box once we dogfood it | 15:41 |
mnaser | clarkb: thats a very reasonable argument IMHO | 15:41 |
*** ssbarnea has quit IRC | 15:46 | |
mnaser | fwiw mnaser/nodepool-builder:latest is running with mordred patch, so ill test that and see if any other things pop up missing | 15:50 |
tristanC | Software Factory 3.4 has been released, among other things it removes SCL for python3 and the zuul rpm doesn't have patches anymore: https://www.softwarefactory-project.io/releases/3.4/ | 15:51 |
corvus | ohai | 15:53 |
clarkb | tristanC: removes SCL because centos/rhel 7 provide python 3 directly? | 15:54 |
mnaser | challenge #2: sudo mount --bind /opt/cache/apt/debian /tmp/dib_build.8Jsgxogy/mnt/var/cache/apt/archives => mount: /tmp/dib_build.8Jsgxogy/mnt/var/cache/apt/archives: permission denied. -- gonna guess i have to find the right capability to add to this container | 15:55 |
tristanC | clarkb: yes, we rebuilt every python3 components using the python-3.6 provided by el7 | 15:55 |
* mnaser goes back to research | 15:55 | |
clarkb | mnaser: ya you'll need privileges | 15:55 |
corvus | mordred: ++ 699707 | 15:55 |
mnaser | clarkb: ya im trying to avoid privileged: true and finding the right caps to add.. | 15:55 |
corvus | mnaser: you are my hero | 15:56 |
clarkb | I think mount is its own cap? | 15:56 |
mordred | mnaser: we've been wanting a human to do that for like 2 years now | 15:56 |
tristanC | clarkb: we still enable the rh-git-218 SCL because zuul needs a more recent git | 15:56 |
mnaser | corvus, mordred \o/ | 15:56 |
clarkb | tristanC: hrm what aspect of zuul requires newer git? | 15:56 |
mnaser | mount seems to require CAP_SYS_ADMIN:X | 15:57 |
tristanC | clarkb: iirc GIT_SSH_COMMAND doesn't work on el7 | 15:57 |
mordred | of course it does | 15:57 |
mnaser | super unrelated and old school but http://linux-vserver.org/Capabilities_and_Flags -- seems like there are contexts | 15:57 |
mnaser | and we can build caps based on contexts, SECURE_MOUNT which is allowing to mount | 15:57 |
mnaser | i wonder if k8s can support these | 15:57 |
mnaser | seems like a vserver construct tho :< | 15:58 |
clarkb | there are ways around that if people want to hack on dib. One method is FUSE (might make builds slower?) another is mkfs.* for certain filesystems can take an existing fs tree and write it into the new fs on a file without mounting it | 15:59 |
clarkb | ext4 can do that but I don't think xfs or btrfs can | 16:00 |
mnaser | yeah it's gonna have to be CAP_SYS_ADMIN because thats the only way you can get `mount` :( | 16:03 |
corvus | i wonder if it's because of things like proc and sysfs | 16:04 |
clarkb | corvus: that and the way it writes the file image out is to mount a file as block device | 16:04 |
mnaser | actually proc and sysfs seem to be ok to mount by default | 16:04 |
clarkb | it then unmounts that file and you get the .raw image. This is then converted to other formats | 16:04 |
clarkb | in theory that could be fuse mounted | 16:05 |
clarkb | it could also have its contents written directly by mkfs if the fs types support that | 16:05 |
mnaser | apparently certain file systems have a `FS_USERNS_MOUNT` flag (procfs, tmpfs, sysfs) which makes them ok, but mounting ext4/nfs/btrfs/overlayfs etc is no bueno | 16:06 |
mordred | mnaser: how does img work? (or the other "build images in unprivileged containers using user namespaces") | 16:08 |
clarkb | mordred: they don't have to create a proper filesystem aiui | 16:09 |
mnaser | mordred: i mean i think i could have avoided that error by disabling the apt cache | 16:09 |
mnaser | cause it was trying to bind mount /var/cache/apt/archives | 16:09 |
clarkb | mnaser: I don't think so. the end of dib runs is to mkfs on a file, mount it, write the fs out, unmount it, and convert from raw to $format | 16:10 |
clarkb | that mount will need perms | 16:10 |
mnaser | yeah so i would have failed later if i disabled it | 16:10 |
clarkb | yes | 16:10 |
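[Editor's note: clarkb's mount-free alternative above refers to the real `mkfs.ext4 -d` option (from e2fsprogs), which populates a new filesystem from an existing directory tree without ever loop-mounting it. The helper below is a hypothetical sketch that only assembles the command lines; actually running them still requires `truncate` and `mkfs.ext4` on the host.]

```python
# Hedged sketch of the unprivileged image-build path discussed above:
# create a sparse raw image file, then let mkfs.ext4 -d copy the chroot
# tree into the fresh filesystem -- no loop device, no CAP_SYS_ADMIN.
# build_image_cmds is an invented name; it returns the commands rather
# than executing them so the sketch stays side-effect free.
def build_image_cmds(tree_dir, image_path, size_mb):
    return [
        ["truncate", "-s", f"{size_mb}M", image_path],  # sparse raw image file
        ["mkfs.ext4", "-d", tree_dir, image_path],      # write tree without mounting
    ]
```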
*** chandankumar is now known as raukadah | 16:10 | |
mnaser | btw lol, the next nugget | 16:10 |
mnaser | "ps: command not found" | 16:10 |
mnaser | :p | 16:10 |
clarkb | dib is running ps or you are? | 16:11 |
mnaser | /usr/local/lib/python3.7/site-packages/diskimage_builder/lib/common-functions: line 177: ps: command not found | 16:11 |
mnaser | dib is | 16:11 |
mnaser | https://github.com/openstack/diskimage-builder/blob/master/diskimage_builder/lib/common-functions#L172-L189 | 16:11 |
clarkb | do we need to install coreutils/sysutils because the container images strip that out | 16:11 |
clarkb | also ^ would make the container image based bootstrap weird I expect | 16:12 |
mnaser | im gonna keep iterating and push up a patch with all the things ill find.. | 16:13 |
mnaser | mkdir: cannot create directory '/etc/modprobe.d': Permission denied | 16:19 |
mnaser | hmmm | 16:19 |
mnaser | this feels like a bug | 16:20 |
mnaser | https://opendev.org/openstack/diskimage-builder/src/branch/master/diskimage_builder/elements/modprobe/extra-data.d/50-modprobe-blacklist -- it should be `$TMP_MOUNT_PATH/etc/modprobe.d` no ? | 16:21 |
mnaser | or maybe it should be prefixed with a sudo | 16:21 |
clarkb | mnaser: I think it should be $TMP_MOUNT_PATH prefixed | 16:24 |
mnaser | right, because we're not actually trying to check if the host has kmod or not.. | 16:24 |
clarkb | exactly | 16:24 |
mnaser | ok ill push up a patch now | 16:24 |
*** ssbarnea has joined #zuul | 16:26 | |
openstackgerrit | Merged zuul/nodepool master: Add debootstrap to builder package list https://review.opendev.org/699707 | 16:27 |
mnaser | https://review.opendev.org/699722 modprobe.d: use $TMP_MOUNT_PATH | 16:27 |
openstackgerrit | Mohammed Naser proposed zuul/nodepool master: Add procps to packages in Dockerfile https://review.opendev.org/699725 | 16:32 |
mnaser | hmm | 16:33 |
mnaser | mkdir: cannot create directory '/tmp/dib_build.qfvFRgnB/mnt/etc/modprobe.d': Permission denied | 16:33 |
mnaser | i guess we gotta sudo that? | 16:33 |
clarkb | possibly. The relative permissions get mind bendy there because that is in the nested fs and ya /etc there is probably owned by uid 0 which is root | 16:34 |
clarkb | and sudo will reconcile that | 16:34 |
mnaser | ok great, got past that | 16:46 |
mnaser | im gonna keep putting things inside https://review.opendev.org/#/q/topic:nodepool-in-k8s | 16:47 |
*** jcapitao is now known as jcapitao|afk | 16:48 | |
*** rlandy is now known as rlandy|brb | 17:17 | |
*** hashar has joined #zuul | 17:21 | |
*** bhavikdbavishi has quit IRC | 17:23 | |
*** mattw4 has joined #zuul | 17:35 | |
*** jpena is now known as jpena|off | 17:36 | |
*** panda has quit IRC | 17:36 | |
*** panda has joined #zuul | 17:39 | |
mnaser | exec_sudo: losetup: cannot find an unused loop device | 17:40 |
mnaser | i've wrestled this enough for a while with no success, added CAP_MKNOD and no bueno | 17:41 |
mnaser | apparently we do some mknods but that feels wrong https://serverfault.com/a/720496 | 17:42 |
mnaser | and seems to imply that they are shared with the host | 17:42 |
tristanC | mnaser: perhaps try to bind mount /dev/loop-control or authorize the c:10:237 device ? | 17:42 |
mnaser | tristanC: ok, i'lll dig from there | 17:45 |
*** rlandy|brb is now known as rlandy | 17:46 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: spec: add a zuul-runner cli https://review.opendev.org/681277 | 17:59 |
mnaser | hmm | 18:04 |
mnaser | we can look into this at some point: https://github.com/braincorp/partfs | 18:04 |
*** rfolco has quit IRC | 18:07 | |
*** electrofelix has quit IRC | 18:09 | |
tristanC | mnaser: you'd still need a device access, e.g. /dev/fuse | 18:09 |
clarkb | tristanC: ya but anyone can read and write to that device | 18:10 |
clarkb | at least on my machine | 18:10 |
clarkb | you won't need additioanl permissions/capabilities past that aiui | 18:10 |
mnaser | tristanC, clarkb: but apparently also fuse can be mounted inside containers | 18:10 |
mnaser | according to some article i remember reading at some point | 18:10 |
pabelanger | mnaser: https://review.opendev.org/415927/ might be helpful, that was my last attempt at DIB inside docker, back in 2017 | 18:13 |
pabelanger | there was even work to use docker for dib matrix of tests | 18:14 |
*** yolanda__ is now known as yolanda | 18:18 | |
*** tosky has quit IRC | 18:23 | |
*** Goneri has quit IRC | 18:28 | |
mnaser | pabelanger: neat | 18:35 |
mnaser | im so tempted to just say f'it and add "privileged: true" :( | 18:35 |
mnaser | we're at: "failed to set up loop device: Operation not permitted" | 18:36 |
tobiash | mnaser: there was a time when containerized dib leaked loop devices | 18:36 |
mnaser | tobiash: thats what im worried about too | 18:36 |
mnaser | i exposed a single loop device only (loop0) | 18:36 |
tobiash | mnaser: you'll need to use privileged | 18:36 |
pabelanger | mnaser: yah, needs to be privileged right now | 18:36 |
mnaser | tobiash: ive been adding manual CAPS as needed.. | 18:37 |
mnaser | im at CAP_MKNOD and CAP_SYS_ADMIN .. | 18:37 |
tobiash | ok, that'll take some time to find all needed privs ;) | 18:37 |
mnaser | i think ill cutover to privileged for now to make sure it works then i will scale back | 18:38 |
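[Editor's note: the capability experiments above map onto a Kubernetes container `securityContext`. The sketch below is an assumption-laden illustration of the two options discussed (full `privileged: true` versus only the capabilities mnaser found necessary); `builder_security_context` is an invented helper, and note that Kubernetes capability names drop the `CAP_` prefix.]

```python
# Hypothetical helper producing the securityContext fragment for a
# nodepool-builder pod spec, as a plain dict. Either fully privileged
# (what the chat falls back to), or just the capabilities that were
# found necessary for dib's mount/mknod calls.
def builder_security_context(privileged=False):
    if privileged:
        return {"privileged": True}
    # Kubernetes spells CAP_SYS_ADMIN / CAP_MKNOD without the CAP_ prefix.
    return {"capabilities": {"add": ["SYS_ADMIN", "MKNOD"]}}
```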
tobiash | mnaser: for the loopdev leak, we have this in the root.d phase: http://paste.openstack.org/show/787742/ | 18:41 |
tobiash | but no idea if that's still required | 18:41 |
tobiash | we needed this back in zuulv2 days and stuck with it | 18:42 |
mnaser | tobiash: ya i remember running into similar issues a long time ago | 18:43 |
*** openstackgerrit has quit IRC | 18:43 | |
mnaser | ya privileged just uh, fixed it all, but we'll see. | 18:44 |
clarkb | tobiash: mnaser re the loop leak I would expect that would affect containerized and not containerized dib the same and i don't believe that is something we see leaking on our builders | 18:44 |
clarkb | heh but now that I check I think maybe we do | 18:45 |
mnaser | :P | 18:45 |
clarkb | what is weird about that is we don't seem to hit the node limit | 18:45 |
clarkb | so we don't leak them quickly? | 18:45 |
clarkb | in any case that isn't container specific | 18:45 |
mnaser | i think im going to make builders a statefulset | 18:46 |
tobiash | clarkb: back then we had this issue only in dockerized envs | 18:46 |
mnaser | because the builder hostname will be changing often during redeploys and the builder ids are constantly changing | 18:47 |
clarkb | mnaser: that shouldn't matter? the biggest reason to make it stateful will be to keep the cache around so that your builds are faster | 18:47 |
tobiash | mnaser: yes, builders need to be a statefulset | 18:47 |
mnaser | https://www.irccloud.com/pastebin/VSmF0iC0/ | 18:47 |
clarkb | tobiash: oh? | 18:47 |
tobiash | As well as the executors | 18:47 |
tobiash | clarkb: but I don't remember the reasons | 18:48 |
mnaser | every time you redeploy, it'll be a different hostname | 18:48 |
mnaser | the executors might make sense bc of the cache | 18:48 |
clarkb | mnaser: hrm and I guess we use the hostnames to identify deleting images? | 18:48 |
mnaser | yep | 18:48 |
mnaser | so they're not deleting cause those "nodes" arent responding | 18:48 |
clarkb | seems like we could make that better in nodepool, but probably also low priority | 18:49 |
clarkb | (the image is already deleted on the nodepool side if the host is gone so it should noop and be happy there) | 18:49 |
tobiash | mnaser: the executors also need a stable identity because of live streaming | 18:49 |
*** openstackgerrit has joined #zuul | 18:50 | |
openstackgerrit | Merged zuul/nodepool master: Add procps to packages in Dockerfile https://review.opendev.org/699725 | 18:50 |
openstackgerrit | Merged zuul/nodepool master: Functional tests - use common verification script https://review.opendev.org/698834 | 18:50 |
*** rfolco has joined #zuul | 18:52 | |
*** sshnaidm is now known as sshnaidm|afk | 18:58 | |
*** jamesmcarthur has quit IRC | 19:03 | |
mnaser | ok so https://review.opendev.org/#/c/699722/ helped me build images locally if anyone wants to help push that through | 19:25 |
clarkb | mnaser: +2. ianw is out on pto so may not get to it soon. If another infra reviewer can ack it though I think we can merge it without ianw | 19:26 |
*** jamesmcarthur has joined #zuul | 19:26 | |
* mnaser is trying to avoid having a local build as much as possible | 19:27 | |
*** mgoddard has quit IRC | 19:33 | |
*** mgoddard has joined #zuul | 19:34 | |
*** Goneri has joined #zuul | 19:37 | |
*** jamesmcarthur has quit IRC | 19:38 | |
*** mhu has quit IRC | 19:41 | |
SpamapS | I hit an interesting problem today | 19:48 |
SpamapS | we have a job in our gate that creates a terraform plan... that's a diff against the infrastructure that it saves as an artifact... | 19:48 |
SpamapS | but we don't apply until promote, post-merge. The promote job goes and digs out the artifact, and applies that diff, or complains if the infrastructure changed and the diff is stale. | 19:49 |
SpamapS | We approved two changes in rapid succession, and zuul went [gate changeA = plan1][gate changeB = plan1][merge changeA][promote changeA+plan1 == SUCCESS][merge changeB][promote changeB+plan1 == stale FAIL] | 19:50 |
*** decimuscorvinus has quit IRC | 19:51 | |
*** decimuscorvinus has joined #zuul | 19:52 | |
SpamapS | Now, if we semaphore the gate job and the promote job, we can shrink the window for duplicate plans, but we can't eliminate it. There's a window where the semaphore is unlocked, and changeA is merged, and gating changeB wins, and makes a duplicate plan, and then the same scenario happens... | 19:52 |
SpamapS | Any ideas? At this point, we're thinking block the plan creation gate job until any unapplied plans are applied.. but that's also going to make things extremely serialized (maybe that's what we want?) | 19:53 |
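[Editor's note: the race SpamapS describes can be captured in a toy model. This is not Terraform; the names below are invented. Each gate job snapshots the current infrastructure "state serial" into its plan artifact, and promote refuses any plan whose snapshot no longer matches production -- so two changes gated back-to-back against the same state produce one SUCCESS and one stale FAIL.]

```python
# Toy reproduction of the [gate A=plan1][gate B=plan1][promote A ok]
# [promote B stale] sequence from the chat. Applying a plan mutates the
# shared production state, invalidating any sibling plan made earlier.
class Infra:
    def __init__(self):
        self.serial = 0  # stand-in for Terraform's state serial

def gate_plan(infra):
    """Gate job: record the state the plan was computed against."""
    return {"base_serial": infra.serial}

def promote(infra, plan):
    """Promote job: apply the plan, or complain if it is stale."""
    if plan["base_serial"] != infra.serial:
        return "stale FAIL"
    infra.serial += 1  # the apply changes production state
    return "SUCCESS"
```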
corvus | SpamapS: does the plan for B not include the changes that A makes? | 19:54 |
mordred | SpamapS: it seems like the gate job for changeB needs to be making a plan/diff that would be the result of changeB being applied if changeA was already applied | 19:54 |
corvus | (just wondering why the promote of B didn't see the A plan component as existing/noop) | 19:54 |
corvus | i think mordred and i are saying similar things | 19:54 |
mordred | which - if it's diffing against production, is kind of hard to simulate in the gate job, since changeA hasn't been applied yet | 19:54 |
mordred | corvus: I agree :_) | 19:55 |
mordred | SpamapS: are corvus and I tracking the problem correctly at least? | 19:55 |
SpamapS | Terraform doesn't give us the option to assume a plan has already been applied. It's not as smart as git. | 19:58 |
SpamapS | mordred: yes you're right that the gate job for change B needs to make a plan that includes the results of changeA's plan. In order to do that, one must apply change A's plan. There's no stacking. | 19:59 |
corvus | yeah, from a high level, it seems like "the production system" is a part of the gate environment. i think that means it either needs to be able to be modeled serially (so that changes are stacked correctly), or mutexed into the singleton that it is | 20:01 |
SpamapS | corvus: right, basically we're going to end up passing a lock through as an artifact that promote will unlock by applying or failing, and until that happens, no plans can be created. | 20:02 |
corvus | (i wonder if it's feasible to make a tool which manipulates terraform plans that way -- subtraction, addition, etc) | 20:02 |
SpamapS | I'm not sure it would be valid unfortunately. Cloud APIs often have emergent effects. | 20:03 |
SpamapS | The plan may be "create a foo" and that creation will get an ID that is now part of the state of the system. | 20:03 |
corvus | SpamapS: ack | 20:03 |
SpamapS | I'm also not sure this is how we need the system to work. | 20:04 |
SpamapS | We did this to lock in the handoff from gate tests -> promote applies. | 20:04 |
corvus | this seems like an interesting consideration in deployment systems -- how stateful vs stateless they are | 20:04 |
SpamapS | Agreed, this is a particularly sticky wicket. | 20:05 |
SpamapS | Up until yesterday, we'd just let the promote job apply whatever it needed to based on the code in the repo. | 20:05 |
corvus | SpamapS: incidentally, if it's not too boring for you to explain it to me, why make a plan in the gate? why isn't that just something that happens post-merge? | 20:05 |
corvus | oh, heh, i think your last sentence is getting at my question :) | 20:06 |
corvus | so yeah, what changed? | 20:06 |
SpamapS | That's a good question. | 20:06 |
SpamapS | We wanted to make sure that what we gated is the *only* change that happens. I'm not sure it's as valuable as we thought, especially if it makes our deployment pipeline serialize with the gate. | 20:06 |
corvus | oh interesting | 20:07 |
SpamapS | The minor problem we were solving is that sometimes there are in-process manual changes that may get overridden by stuff landing in the gate. | 20:07 |
SpamapS | TBH I'm struggling to come up with reasons of value. | 20:08 |
corvus | heh, it's always those "manual" edge cases that mess up this whole gitops thing | 20:08 |
SpamapS | We may just want to drop it and be a bit more forceful with "apply what's in the code base" | 20:08 |
* corvus looks at opendev's "emergency" file | 20:08 | |
SpamapS | There was also some talk about vetting the plans in the gate, so like, make sure it never deletes an RDS or something, but one can't really inspect them in the current state of terraform, so that's just a fantasy. | 20:09 |
*** jamesmcarthur has joined #zuul | 20:09 | |
SpamapS | I actually think the thing we want is not plan-in-the-gate but plan-in-check. So.. inform the user of what this change would do, and then maybe give them some kind of option to say "yes this is approved, but only with that plan" | 20:11 |
clarkb | corvus: re gerrit and zuul. I don't think you can do blue/green deployments of zuul if gating. We can do that for subcomponents that scale out like the executor though | 20:12 |
corvus | clarkb: is that re the original msg or the reply i just sent? | 20:13 |
clarkb | corvus: the one you just sent and the one from thomas | 20:13 |
clarkb | but as you say its well tested | 20:13 |
corvus | i meant to say that blue/green wasn't necessary because of gating, but maybe i could have been more explicit | 20:13 |
corvus | i wrote too many words | 20:14 |
clarkb | corvus: ya I got that but it goes the other way too. If the zuul is gating a project you can't blue/green that install due to shared state | 20:14 |
clarkb | but ya I think once you get to gated state a lot of those concerns go away | 20:15 |
corvus | ah yes. the flip side of SpamapS's coin | 20:15 |
SpamapS | corvus: mordred thanks for your wisdom. I think we're going to move the plan generation into promote. | 20:15 |
*** rf0lc0 has joined #zuul | 20:15 | |
corvus | SpamapS: sounds good; thanks for the brain food. i love hearing about use cases like this | 20:16 |
SpamapS | That way we still get a plan artifact of what happened, but we don't block gate jobs from running. | 20:16 |
*** hashar has quit IRC | 20:16 | |
SpamapS | (terraform plans tied to git commits are extremely useful for audits / RCA's) | 20:16 |
*** rfolco has quit IRC | 20:17 | |
*** hashar has joined #zuul | 20:24 | |
SpamapS | corvus: if we find that we do want to have this serialization between gate and promote, I wonder if there's room for a new type of mutex-ish object where it follows the artifacts across pipelines. As annoying as serializing the plan generation would be.. as long as everything else could run in the gate.. and there's just this big queue of "generate plan and upload artifact" waiting.. that's a fast process.. | 20:37 |
SpamapS | this might still be useful in other contexts. | 20:37 |
SpamapS | but, yeah, let's wait for a second attempt at it before we go beyond noodling | 20:37 |
*** hashar has quit IRC | 20:38 | |
*** rf0lc0 has quit IRC | 20:41 | |
mnaser | does paramiko have any sort of uh, run-time dependencies like the actual ssh binaries? | 20:42 |
corvus | i don't think so | 20:42 |
mnaser | ssh-keyscan to the IP of this machine works perfectly, but nodepool is timing out | 20:43 |
corvus | mnaser: is it using the right ip? it should be in the error log if it's timing out | 20:55 |
mnaser | corvus: yeah, its the ipv4 one (eliminating any ipv6 shenanigans) | 20:56 |
pabelanger | what timeout do you have set up for boot? | 20:56 |
pabelanger | maybe keyscan happening too fast? | 20:56 |
corvus | mnaser: this is the method, if you want to try manual debug: https://opendev.org/zuul/nodepool/src/branch/master/nodepool/nodeutils.py#L58 | 20:57 |
corvus | mnaser: you can actually just import that and run it from the repl; no special objects/classes needed | 20:57 |
mnaser | yeah im running it manually on my local machine vs that container to see what is different | 20:57 |
mnaser | it def times out running it separately too | 20:57 |
* mnaser hmms | 20:57 | |
mnaser | the nodepool-builder container has a bit more stuff in there (ssh tool is there) and it scans successfully | 20:59 |
mnaser | let me see if it actually runs there | 20:59 |
mnaser | ok, extracting the method out, it actually.. works? | 21:06 |
mnaser | it is returning a `ssh-ed25519` key | 21:06 |
*** rf0lc0 has joined #zuul | 21:26 | |
*** rf0lc0 has quit IRC | 21:26 | |
*** jcapitao|afk has quit IRC | 21:38 | |
corvus | clarkb: do you think we should restart opendev before releasing 3.14? | 21:44 |
clarkb | corvus: let me look at the git log | 21:44 |
corvus | it's the 2.5 removal, smart-reconfig, and a couple of bugfixes. | 21:45 |
clarkb | ya those bugfixes are maybe worth restarting for since they affect pipeline behavior? | 21:46 |
clarkb | I'm not too concerned about the ansible version removal | 21:46 |
corvus | k, i'll get that started then | 21:46 |
fungi | i'm only on for a moment from tonight's hotel, but have to cheer for the "pi release" | 21:48 |
corvus | fungi: wait till the bugfix releases.... 3.14.15926535 | 21:49 |
fungi | yass | 21:50 |
clarkb | they put pi on the wall in our only underground MAX station here. And got it wrong | 21:50 |
corvus | knuths christmas lecture this year was on pi | 21:50 |
clarkb | apparently they had taken the value as printed in some textbook which also got it wrong | 21:50 |
corvus | clarkb: wow | 21:50 |
clarkb | and its carved into the stone wall | 21:50 |
clarkb | so they never changed it :) | 21:51 |
fungi | should i be worried about how well the max isn't engineered, if they can't get pi right? | 21:51 |
clarkb | fungi: I think siemens makes the trains and not local construction company so probably ok | 21:51 |
corvus | https://www.roadsideamerica.com/tip/20814 | 21:51 |
fungi | i bet siemens knows pi | 21:51 |
*** dtroyer has joined #zuul | 21:52 | |
*** pcaruana has quit IRC | 21:54 | |
*** saneax has quit IRC | 21:55 | |
corvus | also letterspacing lining numbers is a bit of a typographic blunder. | 21:57 |
clarkb | corvus: if only they had you to help them do the layout :) | 21:58 |
corvus | yes, they could have had the wrong numbers in better style! | 21:58 |
clarkb | one of the really neat things about that station is they have the vertical cores they took laid out horizontally, with geological timeline tidbits applied along them | 21:59 |
corvus | ok, opendev restarted; we'll watch that a bit and then cut a release | 22:01 |
*** jamesmcarthur has quit IRC | 22:04 | |
*** mattw4 has quit IRC | 22:07 | |
*** mattw4 has joined #zuul | 22:10 | |
*** mattw4 has quit IRC | 22:51 | |
*** saneax has joined #zuul | 22:54 | |
*** mattw4 has joined #zuul | 22:59 | |
*** rlandy is now known as rlandy|bbl | 23:11 | |
*** saneax has quit IRC | 23:24 | |
*** mattw4 has quit IRC | 23:40 | |
*** mattw4 has joined #zuul | 23:40 | |
*** mattw4 has quit IRC | 23:45 | |
*** mattw4 has joined #zuul | 23:46 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!