*** tosky has quit IRC | 00:09 | |
*** piotrowskim has quit IRC | 01:02 | |
*** holser has quit IRC | 01:45 | |
*** evrardjp has quit IRC | 03:33 | |
*** evrardjp has joined #zuul | 03:33 | |
*** ykarel has joined #zuul | 04:05 | |
*** ykarel has quit IRC | 04:15 | |
*** ykarel has joined #zuul | 04:15 | |
*** vishalmanchanda has joined #zuul | 04:42 | |
*** jfoufas1 has joined #zuul | 05:24 | |
*** ajitha has joined #zuul | 05:26 | |
*** wuchunyang has joined #zuul | 05:36 | |
*** saneax has joined #zuul | 05:56 | |
*** zbr|rover has quit IRC | 06:08 | |
*** zbr|rover has joined #zuul | 06:10 | |
*** parallax has quit IRC | 07:29 | |
*** jcapitao has joined #zuul | 07:46 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Route streams to different zones via finger gateway https://review.opendev.org/c/zuul/zuul/+/664965 | 08:11 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Support ssl encrypted fingergw https://review.opendev.org/c/zuul/zuul/+/664950 | 08:11 |
*** ykarel is now known as ykarel|lunch | 08:17 | |
*** rpittau|afk is now known as rpittau | 08:19 | |
*** hashar has joined #zuul | 08:28 | |
openstackgerrit | Simon Westphahl proposed zuul/zuul master: Add UUID for queue items https://review.opendev.org/c/zuul/zuul/+/772512 | 08:31 |
openstackgerrit | Simon Westphahl proposed zuul/zuul master: Store semaphore state in Zookeeper https://review.opendev.org/c/zuul/zuul/+/772513 | 08:31 |
*** ricolin has joined #zuul | 08:32 | |
*** tosky has joined #zuul | 08:41 | |
openstackgerrit | Simon Westphahl proposed zuul/zuul master: Unify handling of dequeue and enqueue events https://review.opendev.org/c/zuul/zuul/+/781099 | 08:46 |
openstackgerrit | Simon Westphahl proposed zuul/zuul master: Improve test output by using named queues https://review.opendev.org/c/zuul/zuul/+/775620 | 08:46 |
openstackgerrit | Simon Westphahl proposed zuul/zuul master: Avoid race when task from queue is in progress https://review.opendev.org/c/zuul/zuul/+/775621 | 08:46 |
openstackgerrit | Simon Westphahl proposed zuul/zuul master: Implement Zookeeper backed connection event queues https://review.opendev.org/c/zuul/zuul/+/775622 | 08:46 |
openstackgerrit | Simon Westphahl proposed zuul/zuul master: Dispatch Github webhook events via Zookeeper https://review.opendev.org/c/zuul/zuul/+/775624 | 08:46 |
openstackgerrit | Simon Westphahl proposed zuul/zuul master: Dispatch Pagure webhook events via Zookeeper https://review.opendev.org/c/zuul/zuul/+/775623 | 08:46 |
openstackgerrit | Simon Westphahl proposed zuul/zuul master: Dispatch Gitlab webhook events via Zookeeper https://review.opendev.org/c/zuul/zuul/+/775625 | 08:46 |
openstackgerrit | Sorin Sbârnea proposed zuul/zuul master: Document tox environments https://review.opendev.org/c/zuul/zuul/+/766460 | 09:00 |
*** vishalmanchanda has quit IRC | 09:01 | |
*** jpena|off is now known as jpena | 09:09 | |
*** holser has joined #zuul | 09:16 | |
*** ykarel|lunch is now known as ykarel | 09:16 | |
*** nils has joined #zuul | 09:37 | |
*** vishalmanchanda has joined #zuul | 09:42 | |
*** parallax has joined #zuul | 09:54 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Route streams to different zones via finger gateway https://review.opendev.org/c/zuul/zuul/+/664965 | 10:03 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Support ssl encrypted fingergw https://review.opendev.org/c/zuul/zuul/+/664950 | 10:04 |
*** jangutter_ has joined #zuul | 10:24 | |
*** jangutter has quit IRC | 10:27 | |
*** wuchunyang has quit IRC | 10:59 | |
*** jcapitao is now known as jcapitao_lunch | 11:06 | |
*** hashar has quit IRC | 11:07 | |
*** tobias-urdin has joined #zuul | 11:09 | |
tobias-urdin | is there any way to make nodepool (or zuul?) wait for cloud-init to complete on a node before using it | 11:09 |
tobias-urdin | for example, with an image that does not have python installed and cloud-init (passed through nodepool) installing it, there is sometimes a race where zuul tries to execute ansible before python is installed on the node | 11:10 |
tobias-urdin | im thinking, a pre-run task using the "raw" module and pausing until the python binary is found, but that's kind of hacky | 11:11 |
tobias-urdin | but maybe it breaks even earlier than that | 11:11 |
avass | tobias-urdin: you're installing python with cloud init? | 11:17 |
avass | I think corvus had similar problems where there was a race with host-keys generated by cloud-init | 11:18 |
avass | tobias-urdin: if you can somehow install python before enabling sshd that would do it I think. | 11:19 |
tobias-urdin | avass: yeah, nodepool passes userdata that installs python, but sometimes python is not installed in time when zuul starts using the node | 11:34 |
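A minimal sketch of the kind of user-data being described here, assuming it is passed through nodepool's OpenStack label `userdata` option; the package name is illustrative:

```yaml
#cloud-config
# illustrative user-data: have cloud-init install python on first boot
package_update: true
packages:
  - python3
```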
*** rlandy has joined #zuul | 11:35 | |
tobias-urdin | for now i'm just pausing the execution in pre-run, seems to work, i just need to make sure nothing that uses modules runs before it i guess | 11:36 |
tristanC | tobias-urdin: maybe you could use a `raw` task to wait for python installation, e.g. https://docs.ansible.com/ansible/latest/collections/ansible/builtin/raw_module.html#examples | 11:55 |
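A rough sketch of that approach: a pre-run play that uses only `raw` (so nothing needs python yet) and polls until the interpreter shows up; the path and timeout are illustrative:

```yaml
- hosts: all
  gather_facts: false          # gathering facts would already require python
  tasks:
    - name: Wait for cloud-init to finish installing python
      raw: |
        timeout=300
        until [ -x /usr/bin/python3 ]; do
          sleep 5
          timeout=$((timeout - 5))
          [ "$timeout" -le 0 ] && exit 1
        done
      changed_when: false
```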
avass | tristanC, tobias-urdin: doesn't the initial ansible-setup need python too? | 12:07 |
*** sshnaidm|off is now known as sshnaidm | 12:13 | |
*** jpena is now known as jpena|lunch | 12:32 | |
*** jcapitao_lunch is now known as jcapitao | 12:42 | |
*** ykarel has quit IRC | 13:04 | |
*** ykarel has joined #zuul | 13:07 | |
*** ykarel_ has joined #zuul | 13:13 | |
*** ykarel has quit IRC | 13:14 | |
mordred | tobias-urdin: I don't suppose you can tell nodepool to build you an image that has python already? | 13:50 |
mordred | but in general - "cloud-init is complete" is, AIUI, a hard condition to generalize. I'm sure some people would like that if there is a good way to express it and it can be determined via the cloud's api | 13:51 |
mordred | if we're talking about doing it in nodepool that is. for zuul - yeah- a very early raw task in your base pre-run playbook that waits for python to exist might be a good idea | 13:52 |
swest | tobias-urdin: would executing 'cloud-init status --wait' work for you? | 13:54 |
tobias-urdin | mordred: unfortunately it does not have python because we want to use as non-customized an image as possible, but i'm sticking to the pre-run trick for now | 13:54 |
tobias-urdin | swest: didn't know about that, perhaps that is better as a raw task than simply pausing, thanks! | 13:55 |
mordred | swest: ooh that's a potentially neat trick | 13:55 |
corvus | ++ sounds like a good role for zuul-jobs if it works :) | 13:56 |
mordred | ++ | 13:56 |
fungi | tobias-urdin: i'm slightly confused... how does cloud-init run without python? | 13:57 |
fungi | or has it been rewritten in something other than python? | 13:57 |
tobias-urdin | fungi: on centos (and similar) it uses platform-python (which is python3 but is /usr/libexec/platform-python or smth like that) and not /usr/bin/python3 | 13:59 |
tobias-urdin | pretty much all system tooling uses that, but python is not "installed" | 13:59 |
fungi | tobias-urdin: aha, got it. so your nodes have python, just not the python you want to use to run your tests | 13:59 |
tobias-urdin | yeah, so i can use that to run stuff but i can't use that for my applications | 13:59 |
avass | I thought the "ansible '*' -m setup" needed python, maybe it doesn't | 14:01 |
mordred | tobias-urdin: you know - for the ansible case, trying to use /usr/libexec/platform-python might be an interesting experiment. it would potentially help ansible not pollute the images you are otherwise trying to keep as pristine test environments | 14:01 |
mordred | that way you could actually have jobs that install python as part of the workload - which would be neat | 14:01 |
tobias-urdin | mordred: yeah :) | 14:03 |
corvus | avass: i'm curious about that too | 14:05 |
mordred | avass, corvus: since that's for fact caching, do we just catch the exception and not cache facts if it doesn't work? | 14:08 |
corvus | mordred: it's actually mostly for ssh connection testing; so we want it to fail if it doesn't work | 14:08 |
mordred | oh - right | 14:09 |
mordred | we do in fact want that to work | 14:09 |
mordred | I wonder if the setup module finds /usr/libexec/platform-python | 14:09 |
mordred | system_interpreters = ['/usr/libexec/platform-python', '/usr/bin/python3', '/usr/bin/python'] | 14:10 |
mordred | also: https://github.com/ansible/ansible/blob/4c5ce5a1a9e79a845aff4978cfeb72a0d4ecf7d6/lib/ansible/modules/package_facts.py#L242 | 14:10 |
*** ykarel_ is now known as ykarel | 14:10 | |
mordred | it looks like ansible is aware of and will attempt to find platform-python | 14:10 |
corvus | ah nice; that would probably be used for all ansible tasks then, right? (once facts are gathered?) | 14:11 |
mordred | yeah - except I think we set ansible_python_interpreter somewhere, no? | 14:12 |
mordred | which I think might stem from older days when ansible was worse at finding the right interpreter? | 14:12 |
mordred | (like, I'm wondering if setup works because ansible finds platform-python, but then we configure ansible more explicitly which then breaks things) | 14:13 |
corvus | mordred: i think we set it to auto which is the default now? | 14:13 |
mordred | I think you're right? | 14:13 |
mordred | tobias-urdin: ^^ are you setting ansible_python_interpreter somewhere? | 14:13 |
* mordred is learning fascinating things today | 14:14 | |
corvus | tobias-urdin: and is your race condition with running ansible or running your code which requires python? (ie, is ansible breaking, or is a shell task that runs "python something.py" breaking)? | 14:15 |
tobias-urdin | it's actually pretty messy, i'm setting python-path to /usr/bin/python3 in nodepool, passing userdata in nodepool to install python3 | 14:24 |
tobias-urdin | then the application im running is python3, which is running inside a venv, that in itself runs ansible, that sets ansible_python_interpreter to bootstrap the node, which in turn installs python3 (it's already installed) which then runs another playbook with interpreter set to python3 | 14:25 |
tobias-urdin | i'm basically testing bootstrapping code, that is a python app, that runs ansible, inside zuul that runs it with ansible :p | 14:26 |
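For context, a sketch of the nodepool side of a setup like this, assuming the OpenStack driver's `python-path` and `userdata` label options (all names and values are illustrative, and only the provider fragment is shown); as discussed below, zuul carries python-path into the generated inventory as ansible_python_interpreter:

```yaml
# nodepool.yaml fragment (illustrative)
providers:
  - name: example-cloud
    driver: openstack
    cloud: example
    pools:
      - name: main
        labels:
          - name: centos-pristine
            cloud-image: centos-8-generic
            flavor-name: m1.small
            python-path: /usr/bin/python3    # becomes ansible_python_interpreter
            userdata: |
              #cloud-config
              packages:
                - python3
```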
avass | heh :) | 14:26 |
corvus | hrm, then i wonder how ansible -m setup works | 14:26 |
avass | corvus: maybe that ignores ansible_python_interpreter or fails and tries to find another interpreter from the defaults | 14:27 |
corvus | maybe | 14:27 |
*** jpena|lunch is now known as jpena | 14:28 | |
avass | actually I think ansible_python_interpreter might just be missing since that step uses a separate inventory file | 14:29 |
corvus | avass: oh, that would be interesting | 14:30 |
avass | I checked and that command fails if ansible_python_interpreter is set to a bad path | 14:30 |
tobias-urdin | anyway, i added a pre-run playbook to the base job which just does cloud-init status --wait, which seems like the best solution; we always want to wait for cloud-init so that's fine for us (thanks swest!) | 14:30 |
corvus | tobias-urdin: did you use raw for that? | 14:31 |
tobias-urdin | yes, straight up "raw: cloud-init status --wait" with gather_facts set to false for the playbook | 14:32 |
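In playbook form, that workaround looks roughly like this (file placement is illustrative):

```yaml
# pre-run playbook on the base job (illustrative)
- hosts: all
  gather_facts: false          # skip fact gathering; python may not exist yet
  tasks:
    - name: Wait for cloud-init to finish
      raw: cloud-init status --wait
      changed_when: false
```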
corvus | okay, that's not inconsistent with setup doing something special | 14:36 |
corvus | i'm doing some tests :) with absolutely no python on the system, setup module fails | 14:40 |
corvus | ansible will use platform-python. setup honors ansible_python_interpreter. and if it's set, it won't fall back on platform-python. | 14:45 |
corvus | and from what i can tell, zuul includes ansible_python_interpreter in the setup inventory. | 14:45 |
corvus | avass, tobias-urdin, mordred: so i don't understand how tobias-urdin's system gets past zuul's invocation of ansible -m setup. | 14:45 |
corvus | the ansible_python_interpreter that's set via nodepool should cause that to fail before even running the pre-playbook. | 14:46 |
corvus | (i just confirmed on opendev that our setup-inventory files have ansible_python_interpreter in them) | 14:48 |
corvus | tobias-urdin: are there any messages in your executor log related to zuul's ansible setup phase? | 14:50 |
corvus | tobias-urdin: it'll be an ansible command with "-m setup" in the args | 14:51 |
corvus | tobias-urdin: because i'm stumped as to how this is working for you; i would expect you to need to use the default of 'auto' for python-path for ansible to work reliably, then in a regular pre-run task (no "raw" or anything) either install python, or wait for cloud-init to finish doing it for you. then proceed with running your app. | 14:53 |
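A sketch of that recommended sequence, assuming python-path is left at its default of `auto` so ansible can fall back to platform-python on CentOS-like images, which lets ordinary modules run before python3 is installed (package name illustrative):

```yaml
# regular pre-run playbook (illustrative), no raw tasks needed
- hosts: all
  tasks:
    - name: Wait for cloud-init to finish
      command: cloud-init status --wait
      changed_when: false

    - name: Ensure python3 is present for the application under test
      package:
        name: python3
        state: present
      become: true
```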
corvus | mordred, avass, swest: ^ | 14:54 |
avass | yep I agree | 14:54 |
avass | corvus: oh actually I think I got it | 14:57 |
avass | corvus, mordred: ansible setup exits with 127 and that's not handled: https://opendev.org/zuul/zuul/src/branch/master/zuul/executor/server.py#L2314 | 14:59 |
avass | oh nvm, I might have gotten the actual exitcode and the "rc" response it returns mixed up | 15:00 |
mordred | wow | 15:00 |
corvus | exit code is 2 if it's not found | 15:01 |
corvus | so that's line 2361? | 15:01 |
avass | yeah I read "rc": 127 as what the exitcode would be without actually checking the exitcode | 15:01 |
corvus | avass: that would make sense :) | 15:01 |
corvus | there's a whole bunch of processing there... i think we may not actually hit any of the cases for exit code 2 that cause it to return | 15:02 |
corvus | here's my test run: http://paste.openstack.org/show/803869/ | 15:03 |
avass | corvus: but that path only returns a non-RESULT_NORMAL status if this is in the log: https://opendev.org/zuul/zuul/src/branch/master/zuul/executor/server.py#L2370 | 15:03 |
corvus | avass: right that's what i'm saying | 15:03 |
avass | corvus: ah I was busy writing :) | 15:03 |
corvus | avass: so i think you solved it :) apparently we *only* care about network issues at that stage, so we bypass the interpreter error | 15:04 |
corvus | mordred: ^ | 15:04 |
corvus | it's like, it worked well enough to bomb out, carry on! | 15:05 |
corvus | tobias-urdin: okay, i think we understand why your sequence works. :) the other one might work too if you run into problems. | 15:06 |
openstackgerrit | James E. Blair proposed zuul/nodepool master: Azure: replace driver with state machine driver https://review.opendev.org/c/zuul/nodepool/+/781925 | 15:12 |
openstackgerrit | James E. Blair proposed zuul/nodepool master: Azure: update documentation https://review.opendev.org/c/zuul/nodepool/+/781926 | 15:12 |
avass | corvus: I also found this comment which might be worth taking a look at heh: https://opendev.org/zuul/zuul/src/branch/master/zuul/executor/server.py#L2416 :) | 15:20 |
corvus | avass: yep, that could probably be handled now | 15:21 |
mordred | nice | 15:21 |
corvus | that would let us increase the persistent ssh session timeout which would improve efficiency on heavily loaded systems, while at the same time closing the persistent ssh connections immediately at the end of jobs | 15:22 |
tobias-urdin | nice :D | 15:36 |
corvus | swest, tobiash: i've started a "punch list" for v5: https://etherpad.opendev.org/p/zuulv5 | 15:43 |
corvus | i think so far we've got 2 things we should remember to do; ideally before our need for them is critical :) | 15:44 |
*** jfoufas1 has quit IRC | 15:55 | |
avass | does zookeeper have something like leases in etcd? just wondering if keys could be attached to a lease so if a client doesn't bump its lease (like after a crash) the keys would just be dropped | 15:59 |
avass | or if there's another reason why that's not used | 15:59 |
*** ykarel is now known as ykarel|away | 16:00 | |
corvus | avass: there are ephemeral nodes | 16:00 |
corvus | avass: and we use them where appropriate; but the 2 items on that list aren't a good match for ephemeral nodes (in the first case, the wrong client would own the node) | 16:01 |
corvus | and in the second, we don't want the node to disappear even if the client does | 16:01 |
corvus | we're actually probably going to be using ephemeral nodes a lot less in the future, as we decouple persistent "system state" from clients | 16:02 |
corvus | we might be able to use ephemeral nodes for item 1 if we swap things around so the originator of the request creates the result node | 16:05 |
avass | corvus: checking the second case the jobs holding the semaphores could be stored as znodes below (sub znodes?) the semaphore itself so they're dropped. but maybe that's much less efficient | 16:07 |
* avass will now read up on the sos spec | 16:09 | |
corvus | yeah, i don't think that improves efficiency, and it doesn't address the "leak due to bug" case | 16:09 |
corvus | i think you might be on to something with #1 though, i'll give that a look later :) | 16:10 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Don't refresh change when enqueuing an dequeue event https://review.opendev.org/c/zuul/zuul/+/782812 | 16:21 |
*** ykarel|away has quit IRC | 16:22 | |
*** hamalq has joined #zuul | 16:32 | |
openstackgerrit | Jeremy Stanley proposed zuul/zuul-jobs master: WIP: Set Gentoo profile in configure-mirrors https://review.opendev.org/c/zuul/zuul-jobs/+/782339 | 16:36 |
openstackgerrit | Jeremy Stanley proposed zuul/zuul-jobs master: Revert "Temporarily stop running Gentoo base role tests" https://review.opendev.org/c/zuul/zuul-jobs/+/771106 | 16:36 |
avass | corvus: no but it should at least be possible to repair the system with a restart that way | 16:37 |
corvus | avass: which thing are you talking about, 1 or 2? | 16:47 |
*** masterpe has quit IRC | 16:48 | |
*** Eighth_Doctor has quit IRC | 16:48 | |
*** mordred has quit IRC | 16:48 | |
corvus | if 2, then we don't want the semaphore to disappear when a scheduler restarts | 16:48 |
avass | corvus: 2 but in my head the semaphore and the executors holding part of the semaphore are different nodes, so if an executor crashes it drops the node it was holding | 16:51 |
*** mordred has joined #zuul | 16:51 | |
*** masterpe has joined #zuul | 16:52 | |
avass | so instead of the semaphore being one znode containing how many are currently in use, it would just contain a max number, with sub-nodes held by executors referencing the jobs that are currently using it | 16:52 |
avass | (but maybe there's a good reason why it shouldn't work like that) | 16:53 |
corvus | avass: because the scheduler is responsible for acquiring the semaphore before scheduling the job for an executor | 16:54 |
avass | then that makes more sense | 16:55 |
*** Eighth_Doctor has joined #zuul | 17:04 | |
*** y2kenny has joined #zuul | 17:19 | |
*** rpittau is now known as rpittau|afk | 17:24 | |
*** jcapitao has quit IRC | 17:47 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: Use ephemeral nodes for management result events https://review.opendev.org/c/zuul/zuul/+/782834 | 17:56 |
corvus | avass: ^ inspired by your comment on #1; swest, tobiash: ^ | 17:56 |
corvus | i think that also fixes a race | 18:02 |
*** jpena is now known as jpena|off | 18:04 | |
avass | wait what's the difference between a gerrit hashtag and a gerrit topic? | 18:08 |
corvus | avass: you can only have one topic, and can have many hashtags | 18:09 |
avass | oh cool | 18:09 |
corvus | (also, gerrit uses topics for things like submitting groups of changes together, which we don't enable in opendev) | 18:10 |
corvus | i'm currently using the "sos" hashtag to identify a working set of changes -- like, let's review this group of changes as a unit, get them merged, then restart. otherwise the entire topic is too big to deal with. | 18:11 |
corvus | (so that's why only some "topic:sos" changes have "hashtag:sos") | 18:11 |
corvus | (could also do something like sos-1 sos-2 etc, but i haven't found the need for that yet) | 18:13 |
*** hashar has joined #zuul | 18:54 | |
*** sshnaidm is now known as sshnaidm|afk | 19:12 | |
*** GomathiselviS has joined #zuul | 19:15 | |
*** hashar is now known as hasharAway | 19:58 | |
*** hasharAway is now known as hashar | 20:24 | |
*** GomathiselviS has quit IRC | 20:28 | |
*** zettabyte has joined #zuul | 20:41 | |
*** zettabyte has quit IRC | 20:48 | |
tobiash | corvus: so an ephemeral node stays ephemeral even when updated by a different session? | 20:58 |
*** hamalq has quit IRC | 20:59 | |
*** hamalq has joined #zuul | 21:00 | |
corvus | tobiash: yep; i believe we use that with node requests | 21:01 |
tobiash | Cool :) | 21:01 |
openstackgerrit | Albin Vass proposed zuul/nodepool master: Document ImagePullPolicy for kubernetes driver. https://review.opendev.org/c/zuul/nodepool/+/764463 | 21:10 |
*** y2kenny has quit IRC | 21:14 | |
*** y2kenny has joined #zuul | 21:18 | |
y2kenny | corvus: your suggestion from yesterday worked. | 21:18 |
tobiash | corvus: regarding 781099, does it make sense to do the same with the promote event later as well? | 21:18 |
y2kenny | corvus: thanks | 21:19 |
*** jangutter has joined #zuul | 21:22 | |
*** jangutter_ has quit IRC | 21:25 | |
*** zettabyte has joined #zuul | 21:26 | |
*** hashar has quit IRC | 21:33 | |
*** zettabyte has quit IRC | 21:33 | |
*** zettabyte has joined #zuul | 21:34 | |
corvus | y2kenny: \o/ | 21:34 |
openstackgerrit | Merged zuul/zuul master: Add UUID for queue items https://review.opendev.org/c/zuul/zuul/+/772512 | 21:36 |
zettabyte | We're trying to speed up one of our zuul builds by moving some steps into disk image builder. One of our steps is to clone a few docker images, so I want to put these in disk image builder so that they are available before the zuul job starts. | 21:36 |
zettabyte | Does anyone know if this can be done and if there are perhaps some examples to look at? | 21:36 |
corvus | tobiash: maybe? i think that one was mostly aimed at reducing the code in process_global_management_queue (lines 1232 through 1262 on the old side), so i think it's mission accomplished there; but it seems likely that there's some more consolidation we can do on the rpc side | 21:36 |
tobiash | ++ | 21:37 |
tobiash | corvus, swest: commented on 775622 | 21:37 |
corvus | zettabyte: that's a good question; we haven't done that in opendev so i don't have an example to point at. the main thing i'd be concerned about is starting/stopping the docker daemon. it's worth a try. it might be simpler with podman. and finally, if worse comes to worst, you may be able to download the images as files and then import them into docker on the node. | 21:39 |
fungi | zettabyte: i don't know about the part where you tell your docker client where to find the images/cache, but we do something similar in opendev to pre-clone all our git repositories and download a number of files a lot of our builds rely on | 21:39 |
tobiash | zettabyte: if you're using podman look at https://review.opendev.org/c/openstack/diskimage-builder/+/767706 | 21:40 |
fungi | we run a lot of jobs which start nested virtual machines, and so pre-download iso images that those nested vm instances would boot and store them in known paths in our nodepool diskimages | 21:40 |
zettabyte | corvus: Yeah, that's exactly what I'm struggling with. Starting the docker daemon. I don't think you can do that in chroot | 21:40 |
tobiash | zettabyte: I meant if you're using docker | 21:41 |
tobiash | with podman it's likely easier | 21:41 |
corvus | tobiash: good catch re iter | 21:41 |
fungi | zettabyte: you might be able to run the docker bits outside the chroot and then copy the results into it? | 21:41 |
tobiash | fungi, zettabyte: we're running docker within bwrap when pulling the images | 21:42 |
corvus | tobiash: oh nice that looks like just what zettabyte needs? | 21:43 |
zettabyte | fungi: Yeah, that was my next thought. I'm trying post-root.d, but does that mean I need docker installed on the nodepool-builder host? I'm getting a bit confused there | 21:43 |
tobiash | yes, I've spent quite some time back then to get this working ;) | 21:43 |
zettabyte | tobiash: Yeah, I'll lookup podman thanks. I don't know it | 21:43 |
corvus | tobiash: is it only not merged because of pep8? | 21:43 |
corvus | zettabyte: to be clear, there was a mistake in an earlier comment, you should look at https://review.opendev.org/767706 with docker as it may do what you are talking about today | 21:44 |
zettabyte | corvus: Thanks! | 21:45 |
corvus | tobiash: and, you know, the "POC" in the title? :) | 21:45 |
tobiash | corvus: I don't know, I wasn't sure if this is interesting for folks. I've uploaded it some time ago because someone asked a similar question | 21:45 |
*** ajitha has quit IRC | 21:45 | |
tobiash | since the way it works is a bit hacky ;) | 21:45 |
corvus | tobiash: sounds like there are at least 3 people in the world interested :) | 21:45 |
tobiash | I can easily un-poc it :) | 21:45 |
corvus | tobiash: might be worth doing and see if dib wants it; could always add it with a warning it may muck up your networking or something | 21:46 |
avass | zettabyte, tobiash: we're eventually going to have the same problem with pre-pulling images but haven't had much time to take a look at it yet. if you happen to find a better solution (or get docker to stop forcing everything to go through its daemon) we'd be very interested :) | 21:46 |
tobiash | at least we've been using that for years in production, even with nodepool-builder running containerized within openshift | 21:47 |
tobiash | so it even works with docker-in-docker | 21:47 |
avass | it's a bit sad that even pulling images requires the daemon to be running | 21:47 |
corvus | skopeo/podman are great for that; i'd be really tempted to use those to write a file then import to docker on boot, just to keep from pulling hair out. | 21:48 |
corvus | (assuming i wanted to use docker in the actual tests) | 21:48 |
avass | yeah that's an alternative. I tried getting podman to work by symlinking docker->podman but the tools rely on it being docker a lot | 21:49 |
corvus | while using podman for everything would be great, to be clear, i'm suggesting using skopeo/podman in dib to make the image, then on the actual booted node, importing the image into docker | 21:50 |
corvus | a little extra time at boot, but shouldn't be much | 21:51 |
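A sketch of the two halves of that idea; the image name, archive path, and task wording are illustrative:

```yaml
# at diskimage build time, something along the lines of:
#   skopeo copy docker://docker.io/library/alpine:latest \
#       docker-archive:/opt/prepulled/alpine.tar:alpine:latest
#
# then on the booted node, an early (pre-run) task imports the archive:
- hosts: all
  tasks:
    - name: Load pre-downloaded image archives into docker
      command: docker load --input /opt/prepulled/alpine.tar
      become: true
```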
tobiash | that works fine for small images probably but not for our 15gb+ images unfortunately | 21:51 |
zettabyte | tobiash: https://review.opendev.org/c/openstack/diskimage-builder/+/767706/ looks great. Would have taken a week of pain to figure something like that out | 21:51 |
corvus | tobiash: point | 21:51 |
avass | same for our 6gb+ images :) | 21:51 |
tobiash | I've removed the poc, but I guess before being accepted the docs need to be added | 21:51 |
corvus | it's also probably possible to write directly to docker's image cache. can't be too hard, right? :) | 21:52 |
tobiash | however I don't have much time right now to do that so if anyone wants to take it over feel free | 21:52 |
tobiash | corvus: I've tried that hard back then since that's the preferable solution but skopeo at least at that time also relies on a docker daemon | 21:52 |
avass | corvus: that can actually be very hard :) | 21:53 |
tobiash | at least two years ago docker had no library for that | 21:53 |
tobiash | no idea if that has changed since then | 21:53 |
avass | tobiash: I think the structure skopeo stores the images in is different to how docker does it | 21:54 |
tobiash | yes, that was the problem | 21:54 |
avass | and I didn't find any tool to do that last time I checked ~3months ago | 21:54 |
avass | I suppose that would be a good candidate for a side project | 21:56 |
fungi | one which you get to revise constantly each time docker inc decide to restructure their cache | 21:56 |
avass | fungi: the overlay2 structure doesn't seem to have changed in at least 3 months so it can't be too bad can it? ;) | 21:58 |
*** zettabyte has quit IRC | 22:00 | |
*** zettabyte has joined #zuul | 22:01 | |
fungi | by modern standards that's positively fossilized | 22:02 |
fungi | i don't suppose serving the images from a local registry on the node would perform any better than importing them from a filesystem path | 22:03 |
avass | probably not | 22:04 |
*** zettabyte has quit IRC | 22:07 | |
*** zettabyte has joined #zuul | 22:08 | |
*** zettabyte has quit IRC | 22:13 | |
*** zettabyte has joined #zuul | 22:15 | |
*** zettabyte has quit IRC | 22:24 | |
*** zettabyte has joined #zuul | 22:25 | |
openstackgerrit | Merged zuul/zuul master: Unify handling of dequeue and enqueue events https://review.opendev.org/c/zuul/zuul/+/781099 | 22:30 |
*** zettabyte has quit IRC | 22:31 | |
*** zettabyte has joined #zuul | 22:32 | |
*** zettabyte has quit IRC | 22:38 | |
*** zettabyte has joined #zuul | 22:38 | |
*** vishalmanchanda has quit IRC | 22:41 | |
openstackgerrit | Merged zuul/nodepool master: Document ImagePullPolicy for kubernetes driver. https://review.opendev.org/c/zuul/nodepool/+/764463 | 22:44 |
*** zettabyte has quit IRC | 22:45 | |
*** zettabyte has joined #zuul | 22:46 | |
openstackgerrit | Merged zuul/zuul master: Improve test output by using named queues https://review.opendev.org/c/zuul/zuul/+/775620 | 22:46 |
openstackgerrit | Merged zuul/zuul master: Avoid race when task from queue is in progress https://review.opendev.org/c/zuul/zuul/+/775621 | 22:46 |
*** zettabyte has quit IRC | 22:52 | |
*** zettabyte has joined #zuul | 22:53 | |
*** nils has quit IRC | 22:57 | |
*** zettabyte has quit IRC | 23:00 | |
*** zettabyte has joined #zuul | 23:01 | |
y2kenny | If I have ProjectA (config project) and ProjectB (untrusted project) and I pushed a job in ProjectB pre-submit that inherit a job in ProjectA that in turn calls a role in ProjectB that is not yet submitted, is that supposed to work? | 23:01 |
y2kenny | (I am currently getting role not found.) | 23:01 |
*** rlandy has quit IRC | 23:05 | |
*** zettabyte has quit IRC | 23:09 | |
*** zettabyte has joined #zuul | 23:09 | |
openstackgerrit | Merged zuul/nodepool master: Mention node id when unlock failed https://review.opendev.org/c/zuul/nodepool/+/777678 | 23:12 |
*** zettabyte has quit IRC | 23:15 | |
*** zettabyte has joined #zuul | 23:16 | |
fungi | y2kenny: https://zuul-ci.org/docs/zuul/reference/job_def.html#attr-job.roles states " Zuul roles are able to benefit from speculative merging and cross-project dependencies when used by playbooks in untrusted projects." | 23:17 |
fungi | so it has to do with where the playbook resides | 23:17 |
mordred | yah - otherwise you could add a role to an untrusted project that overrides a role in a trusted project and then execute code speculatively in a trusted context - which would be bad | 23:18 |
fungi | if the playbook is in a config project and references a role from an untrusted project, it needs that role to be present on the appropriate branch of the untrusted project | 23:21 |
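In other words, keeping the job's playbook in the untrusted project lets its roles pick up the speculative state; roughly (all names illustrative):

```yaml
# in ProjectB (the untrusted project), e.g. zuul.d/jobs.yaml
- job:
    name: projectb-test
    parent: base
    run: playbooks/test.yaml       # playbook lives in the untrusted project,
    roles:                         # so roles from it see the speculative merge
      - zuul: example/projectb
```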
*** zettabyte has quit IRC | 23:26 | |
*** zettabyte has joined #zuul | 23:27 | |
y2kenny | fungi, mordred: yea, I was reading that and I want to make sure I understood it correctly. I am doing something funky to get around some of my logging setup. It's not so much the role in the untrusted project overriding the trusted one... I am actually passing the role name from the untrusted project into the trusted project to execute. But I | 23:30 |
y2kenny | get why that is a security issue (so the security model worked :)). | 23:30 |
mordred | \o/ | 23:31 |
y2kenny | I know this looks like I am replicating the whole pre/run/post structure but I kind of have to because of various permission/security issues. | 23:32 |
y2kenny | basically I needed to start something (a logging process) that spans the entire duration of the job on the executor. | 23:32 |
y2kenny | which has to be in a trusted project but can't be just in the pre because the pre playbook will exit. | 23:33 |
y2kenny | the workaround is fine if I just pass the commands as a variable but I was hoping to do something more advanced by passing a role to be executed | 23:34 |
fungi | could it technically be a separate build/job which runs on the executor, in parallel to the main job, just paused until the main job ends? | 23:34 |
y2kenny | um... separate as in starts by the same trigger but not in a parent-child relationship? | 23:35 |
fungi | we already have a model for concurrent interdependent builds | 23:36 |
fungi | for example, our container testing workflow starts a job which runs an image registry on a node, starts another job when that first job "pauses" and the second job can add images to or retrieve images from the registry being served by the node for the paused job, then once the second job completes the first is unpaused and cleans up | 23:37 |
y2kenny | I think I saw that example but may be I misunderstood the implementation | 23:38 |
y2kenny | I thought the registry job is parent to the second job | 23:38 |
fungi | so i don't know the details of your logger, but you could in theory run the "logger job" on the executor, "pause" it (the logger started by the job keeps running), then you start your second job you want logged on an ephemeral node or whatever, once the logged job is done the logger job wakes back up, shuts down the logger process, and archives the logs or whatever | 23:39 |
corvus | fungi, y2kenny: i think a paused job still ends the main run playbook, so any running processes will be terminated -- it just waits for children to finish before starting the post-run playbook. | 23:39 |
fungi | ahh, okay, so you'd have to have some way of leaving it running outside the playbook regardless | 23:40 |
corvus | ya | 23:40 |
fungi | for our image registry example, i suppose the registry is a background process disassociated from the ansible which started it | 23:40 |
corvus | fungi: it's on a worker node | 23:40 |
fungi | right, of course | 23:41 |
corvus | the trick here is y2kenny doesn't have a convenient worker node to run the ipmitool on and would like to use the executor | 23:41 |
fungi | was just going to say that gets harder if you try to do it on the executor through the bubblewrap layer | 23:41 |
corvus | (partly due to nodepool's inability to handle cross-provider requests -- can't request a static node and a vm at the same time) | 23:42 |
y2kenny | I am actually using the registry example to start the baremetal node (since the baremetal will stay powered on after the playbook quits) | 23:42 |
fungi | i suppose it would be possible, but would require a separate supervisor to handle the logger process | 23:42 |
y2kenny | I can potentially have two separate job off the same trigger | 23:42 |
corvus | y2kenny, fungi: but tie these 2 things together and we may have anohter option: | 23:42 |
fungi | i.e. something else running independently on the executor, which the playbooks talk to | 23:42 |
y2kenny | so no parent/child relationship | 23:42 |
corvus | outer job runs on vm, starts ipmitool, pauses; inner job starts on baremetal, completes; outer job on vm resumes | 23:43 |
corvus | it's 2 separate jobs, so it gets around the nodepool cross-provider issue | 23:43 |
corvus | oh, but the outer job would need to know the baremetal node... | 23:44 |
corvus | nevermind | 23:44 |
fungi | i thought zuul wanted dependent jobs to be in the same provider too | 23:44 |
corvus | i'm unsure if that's a preference or a hard requirement | 23:45 |
fungi | or it could be i imagined it | 23:45 |
corvus | i'd look it up but i guess it doesn't matter | 23:45 |
corvus | fungi: no you're right, i'm just not sure if it will entertain nodes from another provider if the current provider can't actually supply them | 23:45 |
corvus | anyway, moot point | 23:45 |
y2kenny | wait... so is this the dependency between separate job or the dependencies between parent and child? | 23:46 |
y2kenny | for separate jobs but with a specified dependencies, I can certainly use different nodeset | 23:46 |
fungi | job dependencies, not inheritence of job definitions | 23:46 |
fungi | but yeah, as corvus points out, the ipmitool job won't know where to find the corresponding baremetal ipmi interface | 23:47 |
y2kenny | oh right... because the inner job gets the node allocated later | 23:48 |
corvus | we can pass info from outer to inner job, but not the other way around | 23:48 |
y2kenny | right | 23:48 |
fungi | if there were some separate ipmitool-as-a-service with an api the executor could talk to, then you could theoretically communicate that to start/stop logging of a specified node and retrieve the log data | 23:49 |
fungi | but that's a lot of additional bespoke engineering | 23:49 |
y2kenny | yea... the alternative I was going to do is feed the dmesg to a server somewhere else via netconsole | 23:50 |
*** zettabyte has quit IRC | 23:50 | |
y2kenny | but that's a few more things to setup | 23:50 |
y2kenny | and netconsole is still not as complete as a BMC serial-over-LAN capture. | 23:51 |
fungi | i suppose you're not running ironic; otherwise you could probably have the executor talk to it to collect console logs | 23:51 |
corvus | acutally.... | 23:51 |
y2kenny | but I think what I have currently should be sufficient for now. I am just passing the test command on the baremetal via variable. | 23:52 |
corvus | fungi: i think the provider preference is a preference -- other providers will handle the request if the requested provider has declined it (which it would do if it can't satisfy it because it doesn't have that type) | 23:52 |
*** zettabyte has joined #zuul | 23:52 | |
corvus | fungi, y2kenny: so if you wanted to write a little bit of code, you could probably start a daemon on the outer job, return the network address of the daemon to the inner job via zuul, pause, then have the inner job connect to the daemon and tell it which ipmi host to connect to; then the daemon can start logging and the inner job can proceed | 23:53 |
corvus | whether *that* rube-goldberg machine is preferable to any of the others, i can't say :) | 23:54 |
corvus | but at least everything is ephemeral | 23:54 |
y2kenny | haha... yea... I will need to think about that. | 23:54 |
corvus | and to be clear, since i'm making up the outer/inner job terminology here, the inner job is just a job that has "job.dependencies: outer-job" | 23:55 |
y2kenny | right. | 23:55 |
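A rough sketch of that shape, using zuul_return to pause the outer job (all names are illustrative; per corvus's caveat above, anything the outer playbook starts would still need to survive the playbook exiting, e.g. by running on a worker node or as a detached daemon):

```yaml
# job definitions (illustrative)
- job:
    name: ipmi-logger              # the "outer" job
    run: playbooks/ipmi-logger.yaml

- job:
    name: baremetal-test           # the "inner" job
    dependencies:
      - ipmi-logger

# playbooks/ipmi-logger.yaml: start the logger, then pause so the
# dependent job can run; the job resumes when its dependents finish
- hosts: localhost
  tasks:
    - name: Pause this job while dependent jobs run
      zuul_return:
        data:
          zuul:
            pause: true
```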
y2kenny | anyway, thank fungi and corvus for brain storming. | 23:58 |
y2kenny | thank you* | 23:59 |
fungi | in a single-job model, could each playbook on the executor start up an ipmitool background process streaming to a file, and then in post just concatenate them? | 23:59 |
fungi | there could be gaps, of course | 23:59 |
fungi | but in theory the gaps at least wouldn't be while playbooks were running | 23:59 |