*** tosky has quit IRC | 00:09 | |
*** piotrowskim has quit IRC | 01:02 | |
*** holser has quit IRC | 01:45 | |
*** evrardjp has quit IRC | 03:33 | |
*** evrardjp has joined #zuul | 03:33 | |
*** ykarel has joined #zuul | 04:05 | |
*** ykarel has quit IRC | 04:15 | |
*** ykarel has joined #zuul | 04:15 | |
*** vishalmanchanda has joined #zuul | 04:42 | |
*** jfoufas1 has joined #zuul | 05:24 | |
*** ajitha has joined #zuul | 05:26 | |
*** wuchunyang has joined #zuul | 05:36 | |
*** saneax has joined #zuul | 05:56 | |
*** zbr|rover has quit IRC | 06:08 | |
*** zbr|rover has joined #zuul | 06:10 | |
*** parallax has quit IRC | 07:29 | |
*** jcapitao has joined #zuul | 07:46 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Route streams to different zones via finger gateway https://review.opendev.org/c/zuul/zuul/+/664965 | 08:11 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Support ssl encrypted fingergw https://review.opendev.org/c/zuul/zuul/+/664950 | 08:11 |
*** ykarel is now known as ykarel|lunch | 08:17 | |
*** rpittau|afk is now known as rpittau | 08:19 | |
*** hashar has joined #zuul | 08:28 | |
openstackgerrit | Simon Westphahl proposed zuul/zuul master: Add UUID for queue items https://review.opendev.org/c/zuul/zuul/+/772512 | 08:31 |
openstackgerrit | Simon Westphahl proposed zuul/zuul master: Store semaphore state in Zookeeper https://review.opendev.org/c/zuul/zuul/+/772513 | 08:31 |
*** ricolin has joined #zuul | 08:32 | |
*** tosky has joined #zuul | 08:41 | |
openstackgerrit | Simon Westphahl proposed zuul/zuul master: Unify handling of dequeue and enqueue events https://review.opendev.org/c/zuul/zuul/+/781099 | 08:46 |
openstackgerrit | Simon Westphahl proposed zuul/zuul master: Improve test output by using named queues https://review.opendev.org/c/zuul/zuul/+/775620 | 08:46 |
openstackgerrit | Simon Westphahl proposed zuul/zuul master: Avoid race when task from queue is in progress https://review.opendev.org/c/zuul/zuul/+/775621 | 08:46 |
openstackgerrit | Simon Westphahl proposed zuul/zuul master: Implement Zookeeper backed connection event queues https://review.opendev.org/c/zuul/zuul/+/775622 | 08:46 |
openstackgerrit | Simon Westphahl proposed zuul/zuul master: Dispatch Github webhook events via Zookeeper https://review.opendev.org/c/zuul/zuul/+/775624 | 08:46 |
openstackgerrit | Simon Westphahl proposed zuul/zuul master: Dispatch Pagure webhook events via Zookeeper https://review.opendev.org/c/zuul/zuul/+/775623 | 08:46 |
openstackgerrit | Simon Westphahl proposed zuul/zuul master: Dispatch Gitlab webhook events via Zookeeper https://review.opendev.org/c/zuul/zuul/+/775625 | 08:46 |
openstackgerrit | Sorin Sbârnea proposed zuul/zuul master: Document tox environments https://review.opendev.org/c/zuul/zuul/+/766460 | 09:00 |
*** vishalmanchanda has quit IRC | 09:01 | |
*** jpena|off is now known as jpena | 09:09 | |
*** holser has joined #zuul | 09:16 | |
*** ykarel|lunch is now known as ykarel | 09:16 | |
*** nils has joined #zuul | 09:37 | |
*** vishalmanchanda has joined #zuul | 09:42 | |
*** parallax has joined #zuul | 09:54 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Route streams to different zones via finger gateway https://review.opendev.org/c/zuul/zuul/+/664965 | 10:03 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Support ssl encrypted fingergw https://review.opendev.org/c/zuul/zuul/+/664950 | 10:04 |
*** jangutter_ has joined #zuul | 10:24 | |
*** jangutter has quit IRC | 10:27 | |
*** wuchunyang has quit IRC | 10:59 | |
*** jcapitao is now known as jcapitao_lunch | 11:06 | |
*** hashar has quit IRC | 11:07 | |
*** tobias-urdin has joined #zuul | 11:09 | |
tobias-urdin | is there any way to make nodepool (or zuul?) wait for cloud-init to complete on a node before using it | 11:09 |
tobias-urdin | for example, with an image that does not have python installed and cloud-init (passed through nodepool) installing it, there is sometimes a race where zuul tries to execute ansible before python is installed on the node | 11:10 |
tobias-urdin | im thinking, a pre-run task using the "raw" module and pausing until the python binary is found, but that's kind of hacky | 11:11 |
tobias-urdin | but maybe it breaks even earlier than that | 11:11 |
avass | tobias-urdin: you're installing python with cloud init? | 11:17 |
avass | I think corvus had similar problems where there was a race with host-keys generated by cloud-init | 11:18 |
avass | tobias-urdin: if you can somehow install python before enabling sshd that would do it I think. | 11:19 |
tobias-urdin | avass: yeah, nodepool passes userdata that installs python, but sometimes python is not installed in time when zuul starts using the node | 11:34 |
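A minimal sketch of the kind of user-data being described here, assuming it is passed through nodepool's OpenStack label `userdata` option; the package name is illustrative:

```yaml
#cloud-config
# illustrative user-data: have cloud-init install python on first boot
package_update: true
packages:
  - python3
```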
*** rlandy has joined #zuul | 11:35 | |
tobias-urdin | for now i'm just pausing the execution in pre-run, seems to work, i just need to make sure nothing that uses modules runs before it i guess | 11:36 |
tristanC | tobias-urdin: maybe you could use a `raw` task to wait for python installation, e.g. https://docs.ansible.com/ansible/latest/collections/ansible/builtin/raw_module.html#examples | 11:55 |
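A rough sketch of that approach: a pre-run play that uses only `raw` (so nothing needs python yet) and polls until the interpreter shows up; the path and timeout are illustrative:

```yaml
- hosts: all
  gather_facts: false          # gathering facts would already require python
  tasks:
    - name: Wait for cloud-init to finish installing python
      raw: |
        timeout=300
        until [ -x /usr/bin/python3 ]; do
          sleep 5
          timeout=$((timeout - 5))
          [ "$timeout" -le 0 ] && exit 1
        done
      changed_when: false
```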
avass | tristanC, tobias-urdin: doesn't the initial ansible-setup need python too? | 12:07 |
*** sshnaidm|off is now known as sshnaidm | 12:13 | |
*** jpena is now known as jpena|lunch | 12:32 | |
*** jcapitao_lunch is now known as jcapitao | 12:42 | |
*** ykarel has quit IRC | 13:04 | |
*** ykarel has joined #zuul | 13:07 | |
*** ykarel_ has joined #zuul | 13:13 | |
*** ykarel has quit IRC | 13:14 | |
mordred | tobias-urdin: I don't suppose you can tell nodepool to build you an image that has python already? | 13:50 |
mordred | but in general - "cloud-init is complete" is, AIUI, a hard condition to generalize. I'm sure some people would like that if there is a good way to express it and it can be determined via the cloud's api | 13:51 |
mordred | if we're talking about doing it in nodepool that is. for zuul - yeah- a very early raw task in your base pre-run playbook that waits for python to exist might be a good idea | 13:52 |
swest | tobias-urdin: would executing 'cloud-init status --wait' work for you? | 13:54 |
tobias-urdin | mordred: unfortunately it does not have python because we want to use as non-customized an image as possible, but i'm sticking to the pre-run trick for now | 13:54 |
tobias-urdin | swest: didn't know about that, perhaps that is better as a raw task than simply pausing, thanks! | 13:55 |
mordred | swest: ooh that's a potentially neat trick | 13:55 |
corvus | ++ sounds like a good role for zuul-jobs if it works :) | 13:56 |
mordred | ++ | 13:56 |
fungi | tobias-urdin: i'm slightly confused... how does cloud-init run without python? | 13:57 |
fungi | or has it been rewritten in something other than python? | 13:57 |
tobias-urdin | fungi: on centos (and similar) it uses platform-python (which is python3 but is /usr/libexec/platform-python or smth like that) and not /usr/bin/python3 | 13:59 |
tobias-urdin | pretty much all system tooling uses that, but python is not "installed" | 13:59 |
fungi | tobias-urdin: aha, got it. so your nodes have python, just not the python you want to use to run your tests | 13:59 |
tobias-urdin | yeah, so i can use that to run stuff but i can't use that for my applications | 13:59 |
avass | I thought the "ansible '*' -m setup" needed python, maybe it doesn't | 14:01 |
mordred | tobias-urdin: you know - for the ansible case, trying to use /usr/libexec/platform-python might be an interesting experiment. it would potentially help ansible not pollute the images you are otherwise trying to keep as pristine test environments | 14:01 |
mordred | that way you could actually have jobs that install python as part of the workload - which would be neat | 14:01 |
tobias-urdin | mordred: yeah :) | 14:03 |
corvus | avass: i'm curious about that too | 14:05 |
mordred | avass, corvus: since that's for fact caching, do we just catch the exception and not cache facts if it doesn't work? | 14:08 |
corvus | mordred: it's actually mostly for ssh connection testing; so we want it to fail if it doesn't work | 14:08 |
mordred | oh - right | 14:09 |
mordred | we do in fact want that to work | 14:09 |
mordred | I wonder if the setup module finds /usr/libexec/platform-python | 14:09 |
mordred | system_interpreters = ['/usr/libexec/platform-python', '/usr/bin/python3', '/usr/bin/python'] | 14:10 |
mordred | also: https://github.com/ansible/ansible/blob/4c5ce5a1a9e79a845aff4978cfeb72a0d4ecf7d6/lib/ansible/modules/package_facts.py#L242 | 14:10 |
*** ykarel_ is now known as ykarel | 14:10 | |
mordred | it looks like ansible is aware of and will attempt to find platform-python | 14:10 |
corvus | ah nice; that would probably be used for all ansible tasks then, right? (once facts are gathered?) | 14:11 |
mordred | yeah - except I think we set ansible_python_interpreter somewhere, no? | 14:12 |
mordred | which I think might stem from older days when ansible was worse at finding the right interpreter? | 14:12 |
mordred | (like, I'm wondering if setup works because ansible finds platform-python, but then we configure ansible more explicitly which then breaks things) | 14:13 |
corvus | mordred: i think we set it to auto which is the default now? | 14:13 |
mordred | I think you're right? | 14:13 |
mordred | tobias-urdin: ^^ are you setting ansible_python_interpreter somewhere? | 14:13 |
* mordred is learning fascinating things today | 14:14 | |
corvus | tobias-urdin: and is your race condition with running ansible or running your code which requires python? (ie, is ansible breaking, or is a shell task that runs "python something.py" breaking)? | 14:15 |
tobias-urdin | it's actually pretty messy, i'm setting python-path to /usr/bin/python3 in nodepool, passing userdata in nodepool to install python3 | 14:24 |
tobias-urdin | then the application im running is python3, which is running inside a venv, that in itself runs ansible, that sets ansible_python_interpreter to bootstrap the node, which in turn installs python3 (it's already installed) which then runs another playbook with interpreter set to python3 | 14:25 |
tobias-urdin | i'm basically testing bootstrapping code, that is a python app, that runs ansible, inside zuul that runs it with ansible :p | 14:26 |
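For context, a sketch of the nodepool side of a setup like this, assuming the OpenStack driver's `python-path` and `userdata` label options (all names and values are illustrative, and only the provider fragment is shown); as discussed below, zuul carries python-path into the generated inventory as ansible_python_interpreter:

```yaml
# nodepool.yaml fragment (illustrative)
providers:
  - name: example-cloud
    driver: openstack
    cloud: example
    pools:
      - name: main
        labels:
          - name: centos-pristine
            cloud-image: centos-8-generic
            flavor-name: m1.small
            python-path: /usr/bin/python3    # becomes ansible_python_interpreter
            userdata: |
              #cloud-config
              packages:
                - python3
```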
avass | heh :) | 14:26 |
corvus | hrm, then i wonder how ansible -m setup works | 14:26 |
avass | corvus: maybe that ignores ansible_python_interpreter or fails and tries to find another interpreter from the defaults | 14:27 |
corvus | maybe | 14:27 |
*** jpena|lunch is now known as jpena | 14:28 | |
avass | actually I think ansible_python_interpreter might just be missing since that step uses a separate inventory file | 14:29 |
corvus | avass: oh, that would be interesting | 14:30 |
avass | I checked and that command fails if ansible_python_interpreter is set to a bad path | 14:30 |
tobias-urdin | anyway, i added a pre-run playbook to the base job which just does cloud-init status --wait, which seems like the best solution; we always want to wait for cloud-init so that's fine for us (thanks swest!) | 14:30 |
corvus | tobias-urdin: did you use raw for that? | 14:31 |
tobias-urdin | yes, straight up "raw: cloud-init status --wait" with gather_facts set to false for the playbook | 14:32 |
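In playbook form, that workaround looks roughly like this (file placement is illustrative):

```yaml
# pre-run playbook on the base job (illustrative)
- hosts: all
  gather_facts: false          # skip fact gathering; python may not exist yet
  tasks:
    - name: Wait for cloud-init to finish
      raw: cloud-init status --wait
      changed_when: false
```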
corvus | okay, that's not inconsistent with setup doing something special | 14:36 |
corvus | i'm doing some tests :) with absolutely no python on the system, setup module fails | 14:40 |
corvus | ansible will use platform-python. setup honors ansible_python_interpreter. and if it's set, it won't fall back on platform-python. | 14:45 |
corvus | and from what i can tell, zuul includes ansible_python_interpreter in the setup inventory. | 14:45 |
corvus | avass, tobias-urdin, mordred: so i don't understand how tobias-urdin's system gets past zuul's invocation of ansible -m setup. | 14:45 |
corvus | the ansible_python_interpreter that's set via nodepool should cause that to fail before even running the pre-playbook. | 14:46 |
corvus | (i just confirmed on opendev that our setup-inventory files have ansible_python_interpreter in them) | 14:48 |
corvus | tobias-urdin: are there any messages in your executor log related to zuul's ansible setup phase? | 14:50 |
corvus | tobias-urdin: it'll be an ansible command with "-m setup" in the args | 14:51 |
corvus | tobias-urdin: because i'm stumped as to how this is working for you; i would expect you to need to use the default of 'auto' for python-path for ansible to work reliably, then in a regular pre-run task (no "raw" or anything) either install python, or wait for cloud-init to finish doing it for you. then proceed with running your app. | 14:53 |
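A sketch of that recommended sequence, assuming python-path is left at its default of `auto` so ansible can fall back to platform-python on CentOS-like images, which lets ordinary modules run before python3 is installed (package name illustrative):

```yaml
# regular pre-run playbook (illustrative), no raw tasks needed
- hosts: all
  tasks:
    - name: Wait for cloud-init to finish
      command: cloud-init status --wait
      changed_when: false

    - name: Ensure python3 is present for the application under test
      package:
        name: python3
        state: present
      become: true
```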
corvus | mordred, avass, swest: ^ | 14:54 |
avass | yep I agree | 14:54 |
avass | corvus: oh actually I think I got it | 14:57 |
avass | corvus, mordred: ansible setup exits with 127 and that's not handled: https://opendev.org/zuul/zuul/src/branch/master/zuul/executor/server.py#L2314 | 14:59 |
avass | oh nvm, I might have gotten the actual exitcode and the "rc" response it returns mixed up | 15:00 |
mordred | wow | 15:00 |
corvus | exit code is 2 if it's not found | 15:01 |
corvus | so that's line 2361? | 15:01 |
avass | yeah I read "rc": 127 as what the exitcode would be without actually checking the exitcode | 15:01 |
corvus | avass: that would make sense :) | 15:01 |
corvus | there's a whole bunch of processing there... i think we may not actually hit any of the cases for exit code 2 that cause it to return | 15:02 |
corvus | here's my test run: http://paste.openstack.org/show/803869/ | 15:03 |
avass | corvus: but that path only returns a non-RESULT_NORMAL status if this is in the log: https://opendev.org/zuul/zuul/src/branch/master/zuul/executor/server.py#L2370 | 15:03 |
corvus | avass: right that's what i'm saying | 15:03 |
avass | corvus: ah I was busy writing :) | 15:03 |
corvus | avass: so i think you solved it :) apparently we *only* care about network issues at that stage, so we bypass the interpreter error | 15:04 |
corvus | mordred: ^ | 15:04 |
corvus | it's like, it worked well enough to bomb out, carry on! | 15:05 |
corvus | tobias-urdin: okay, i think we understand why your sequence works. :) the other one might work too if you run into problems. | 15:06 |
openstackgerrit | James E. Blair proposed zuul/nodepool master: Azure: replace driver with state machine driver https://review.opendev.org/c/zuul/nodepool/+/781925 | 15:12 |
openstackgerrit | James E. Blair proposed zuul/nodepool master: Azure: update documentation https://review.opendev.org/c/zuul/nodepool/+/781926 | 15:12 |
avass | corvus: I also found this comment which might be worth taking a look at heh: https://opendev.org/zuul/zuul/src/branch/master/zuul/executor/server.py#L2416 :) | 15:20 |
corvus | avass: yep, that could probably be handled now | 15:21 |
mordred | nice | 15:21 |
corvus | that would let us increase the persistent ssh session timeout which would improve efficiency on heavily loaded systems, while at the same time closing the persistent ssh connections immediately at the end of jobs | 15:22 |
tobias-urdin | nice :D | 15:36 |
corvus | swest, tobiash: i've started a "punch list" for v5: https://etherpad.opendev.org/p/zuulv5 | 15:43 |
corvus | i think so far we've got 2 things we should remember to do; ideally before our need for them is critical :) | 15:44 |
*** jfoufas1 has quit IRC | 15:55 | |
avass | does zookeeper have something like leases in etcd? just wondering if keys could be attached to a lease so if a client doesn't bump its lease (like after a crash) the keys would just be dropped | 15:59 |
avass | or if there's another reason why that's not used | 15:59 |
*** ykarel is now known as ykarel|away | 16:00 | |
corvus | avass: there are ephemeral nodes | 16:00 |
corvus | avass: and we use them where appropriate; but the 2 items on that list aren't a good match for ephemeral nodes (in the first case, the wrong client would own the node) | 16:01 |
corvus | and in the second, we don't want the node to disappear even if the client does | 16:01 |
corvus | we're actually probably going to be using ephemeral nodes a lot less in the future, as we decouple persistent "system state" from clients | 16:02 |
corvus | we might be able to use ephemeral nodes for item 1 if we swap things around so the originator of the request creates the result node | 16:05 |
avass | corvus: checking the second case the jobs holding the semaphores could be stored as znodes below (sub znodes?) the semaphore itself so they're dropped. but maybe that's much less efficient | 16:07 |
* avass will now read up on the sos spec | 16:09 | |
corvus | yeah, i don't think that improves efficiency, and it doesn't address the "leak due to bug" case | 16:09 |
corvus | i think you might be on to something with #1 though, i'll give that a look later :) | 16:10 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Don't refresh change when enqueuing an dequeue event https://review.opendev.org/c/zuul/zuul/+/782812 | 16:21 |
*** ykarel|away has quit IRC | 16:22 | |
*** hamalq has joined #zuul | 16:32 | |
openstackgerrit | Jeremy Stanley proposed zuul/zuul-jobs master: WIP: Set Gentoo profile in configure-mirrors https://review.opendev.org/c/zuul/zuul-jobs/+/782339 | 16:36 |
openstackgerrit | Jeremy Stanley proposed zuul/zuul-jobs master: Revert "Temporarily stop running Gentoo base role tests" https://review.opendev.org/c/zuul/zuul-jobs/+/771106 | 16:36 |
avass | corvus: no but it should at least be possible to repair the system with a restart that way | 16:37 |
corvus | avass: which thing are you talking about, 1 or 2? | 16:47 |
*** masterpe has quit IRC | 16:48 | |
*** Eighth_Doctor has quit IRC | 16:48 | |
*** mordred has quit IRC | 16:48 | |
corvus | if 2, then we don't want the semaphore to disappear when a scheduler restarts | 16:48 |
avass | corvus: 2 but in my head the semaphore and the executors holding part of the semaphore are different nodes, so if an executor crashes it drops the node it was holding | 16:51 |
*** mordred has joined #zuul | 16:51 | |
*** masterpe has joined #zuul | 16:52 | |
avass | so instead of the semaphore being one znode containing how many are currently in use, it would just contain a max number, with sub-nodes held by executors referencing the jobs that are currently using it | 16:52 |
avass | (but maybe there's a good reason why it shouldn't work like that) | 16:53 |
corvus | avass: because the scheduler is responsible for acquiring the semaphore before scheduling the job for an executor | 16:54 |
avass | then that makes more sense | 16:55 |
*** Eighth_Doctor has joined #zuul | 17:04 | |
*** y2kenny has joined #zuul | 17:19 | |
*** rpittau is now known as rpittau|afk | 17:24 | |
*** jcapitao has quit IRC | 17:47 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: Use ephemeral nodes for management result events https://review.opendev.org/c/zuul/zuul/+/782834 | 17:56 |
corvus | avass: ^ inspired by your comment on #1; swest, tobiash: ^ | 17:56 |
corvus | i think that also fixes a race | 18:02 |
*** jpena is now known as jpena|off | 18:04 | |
avass | wait what's the difference between a gerrit hashtag and a gerrit topic? | 18:08 |
corvus | avass: you can only have one topic, and can have many hashtags | 18:09 |
avass | oh cool | 18:09 |
corvus | (also, gerrit uses topics for things like submitting groups of changes together, which we don't enable in opendev) | 18:10 |
corvus | i'm currently using the "sos" hashtag to identify a working set of changes -- like, let's review this group of changes as a unit, get them merged, then restart. otherwise the entire topic is too big to deal with. | 18:11 |
corvus | (so that's why only some "topic:sos" changes have "hashtag:sos") | 18:11 |
corvus | (could also do something like sos-1 sos-2 etc, but i haven't found the need for that yet) | 18:13 |
*** hashar has joined #zuul | 18:54 | |
*** sshnaidm is now known as sshnaidm|afk | 19:12 | |
*** GomathiselviS has joined #zuul | 19:15 | |
*** hashar is now known as hasharAway | 19:58 | |
*** hasharAway is now known as hashar | 20:24 | |
*** GomathiselviS has quit IRC | 20:28 | |
*** zettabyte has joined #zuul | 20:41 | |
*** zettabyte has quit IRC | 20:48 | |
tobiash | corvus: so an ephemeral node stays ephemeral even when updated by a different session? | 20:58 |
*** hamalq has quit IRC | 20:59 | |
*** hamalq has joined #zuul | 21:00 | |
corvus | tobiash: yep; i believe we use that with node requests | 21:01 |
tobiash | Cool :) | 21:01 |
openstackgerrit | Albin Vass proposed zuul/nodepool master: Document ImagePullPolicy for kubernetes driver. https://review.opendev.org/c/zuul/nodepool/+/764463 | 21:10 |
*** y2kenny has quit IRC | 21:14 | |
*** y2kenny has joined #zuul | 21:18 | |
y2kenny | corvus: your suggestion from yesterday worked. | 21:18 |
tobiash | corvus: regarding 781099, does it make sense to do the same with the promote event later as well? | 21:18 |
y2kenny | corvus: thanks | 21:19 |
*** jangutter has joined #zuul | 21:22 | |
*** jangutter_ has quit IRC | 21:25 | |
*** zettabyte has joined #zuul | 21:26 | |
*** hashar has quit IRC | 21:33 | |
*** zettabyte has quit IRC | 21:33 | |
*** zettabyte has joined #zuul | 21:34 | |
corvus | y2kenny: \o/ | 21:34 |
openstackgerrit | Merged zuul/zuul master: Add UUID for queue items https://review.opendev.org/c/zuul/zuul/+/772512 | 21:36 |
zettabyte | We're trying to speed up one of our zuul builds by moving some steps into disk image builder. One of our steps is to clone a few docker images, so I want to put these in disk image builder so that they are available before the zuul job starts. | 21:36 |
zettabyte | Does anyone know if this can be done and if there are perhaps some examples to look at? | 21:36 |
corvus | tobiash: maybe? i think that one was mostly aimed at reducing the code in process_global_management_queue (lines 1232 through 1262 on the old side), so i think it's mission accomplished there; but it seems likely that there's some more consolidation we can do on the rpc side | 21:36 |
tobiash | ++ | 21:37 |
tobiash | corvus, swest: commented on 775622 | 21:37 |
corvus | zettabyte: that's a good question; we haven't done that in opendev so i don't have an example to point at. the main thing i'd be concerned about is starting/stopping the docker daemon. it's worth a try. it might be simpler with podman. and finally, if worse comes to worst, you may be able to download the images as files and then import them into docker on the node. | 21:39 |
fungi | zettabyte: i don't know about the part where you tell your docker client where to find the images/cache, but we do something similar in opendev to pre-clone all our git repositories and download a number of files a lot of our builds rely on | 21:39 |
tobiash | zettabyte: if you're using podman look at https://review.opendev.org/c/openstack/diskimage-builder/+/767706 | 21:40 |
fungi | we run a lot of jobs which start nested virtual machines, and so pre-download iso images that those nested vm instances would boot and store them in known paths in our nodepool diskimages | 21:40 |
zettabyte | corvus: Yeah, that's exactly what I'm struggling with. Starting the docker daemon. I don't think you can do that in chroot | 21:40 |
tobiash | zettabyte: I meant if you're using docker | 21:41 |
tobiash | with podman it's likely easier | 21:41 |
corvus | tobiash: good catch re iter | 21:41 |
fungi | zettabyte: you might be able to run the docker bits outside the chroot and then copy the results into it? | 21:41 |
tobiash | fungi, zettabyte: we're running docker within bwrap when pulling the images | 21:42 |
corvus | tobiash: oh nice that looks like just what zettabyte needs? | 21:43 |
zettabyte | fungi: Yeah, that was my next thought. I'm trying post-root.d, but does that mean I need docker installed on the nodepool-builder host? I'm getting a bit confused there | 21:43 |
tobiash | yes, I've spent quite some time back then to get this working ;) | 21:43 |
zettabyte | tobiash: Yeah, I'll lookup podman thanks. I don't know it | 21:43 |
corvus | tobiash: is it only not merged because of pep8? | 21:43 |
corvus | zettabyte: to be clear, there was a mistake in an earlier comment, you should look at https://review.opendev.org/767706 with docker as it may do what you are talking about today | 21:44 |
zettabyte | corvus: Thanks! | 21:45 |
corvus | tobiash: and, you know, the "POC" in the title? :) | 21:45 |
tobiash | corvus: I don't know, I wasn't sure if this is interesting for folks. I've uploaded it some time ago because someone asked a similar question | 21:45 |
*** ajitha has quit IRC | 21:45 | |
tobiash | since the way it works is a bit hacky ;) | 21:45 |
corvus | tobiash: sounds like there are at least 3 people in the world interested :) | 21:45 |
tobiash | I can easily un-poc it :) | 21:45 |
corvus | tobiash: might be worth doing and see if dib wants it; could always add it with a warning it may muck up your networking or something | 21:46 |
avass | zettabyte, tobiash: we're eventually going to have the same problem with pre-pulling images but haven't had much time to take a look at it yet. if you happen to find a better solution (or get docker to stop forcing everything to go through its daemon) we'd be very interested :) | 21:46 |
tobiash | at least we've been using that for years in production, even with nodepool-builder running containerized within openshift | 21:47 |
tobiash | so it even works with docker-in-docker | 21:47 |
avass | it's a bit sad that even pulling images requires the daemon to be running | 21:47 |
corvus | skopeo/podman are great for that; i'd be really tempted to use those to write a file then import to docker on boot, just to keep from pulling hair out. | 21:48 |
corvus | (assuming i wanted to use docker in the actual tests) | 21:48 |
avass | yeah that's an alternative. I tried getting podman to work by symlinking docker->podman but the tools rely on it being docker a lot | 21:49 |
corvus | while using podman for everything would be great, to be clear, i'm suggesting using skopeo/podman in dib to make the image, then on the actual booted node, importing the image into docker | 21:50 |
corvus | a little extra time at boot, but shouldn't be much | 21:51 |
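A sketch of the two halves of that idea; the image name, archive path, and task wording are illustrative:

```yaml
# at diskimage build time, something along the lines of:
#   skopeo copy docker://docker.io/library/alpine:latest \
#       docker-archive:/opt/prepulled/alpine.tar:alpine:latest
#
# then on the booted node, an early (pre-run) task imports the archive:
- hosts: all
  tasks:
    - name: Load pre-downloaded image archives into docker
      command: docker load --input /opt/prepulled/alpine.tar
      become: true
```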
tobiash | that works fine for small images probably but not for our 15gb+ images unfortunately | 21:51 |
zettabyte | tobiash: https://review.opendev.org/c/openstack/diskimage-builder/+/767706/ looks great. Would have taken a week of pain to figure something like that out | 21:51 |
corvus | tobiash: point | 21:51 |
avass | same for our 6gb+ images :) | 21:51 |
tobiash | I've removed the poc, but I guess before being accepted the docs need to be added | 21:51 |
corvus | it's also probably possible to write directly to docker's image cache. can't be too hard, right? :) | 21:52 |
tobiash | however I don't have much time right now to do that so if anyone wants to take it over feel free | 21:52 |
tobiash | corvus: I've tried that hard back then since that's the preferable solution but skopeo at least at that time also relies on a docker daemon | 21:52 |
avass | corvus: that can actually be very hard :) | 21:53 |
tobiash | at least two years ago docker had no library for that | 21:53 |
tobiash | no idea if that has changed since then | 21:53 |
avass | tobiash: I think the structure skopeo stores the images in is different to how docker does it | 21:54 |
tobiash | yes, that was the problem | 21:54 |
avass | and I didn't find any tool to do that last time I checked ~3months ago | 21:54 |
avass | I suppose that would be a good candidate for a side project | 21:56 |
fungi | one which you get to revise constantly each time docker inc decide to restructure their cache | 21:56 |
avass | fungi: the overlay2 structure doesn't seem to have changed in at least 3 months so it can't be too bad can it? ;) | 21:58 |
*** zettabyte has quit IRC | 22:00 | |
*** zettabyte has joined #zuul | 22:01 | |
fungi | by modern standards that's positively fossilized | 22:02 |
fungi | i don't suppose serving the images from a local registry on the node would perform any better than importing them from a filesystem path | 22:03 |
avass | probably not | 22:04 |
*** zettabyte has quit IRC | 22:07 | |
*** zettabyte has joined #zuul | 22:08 | |
*** zettabyte has quit IRC | 22:13 | |
*** zettabyte has joined #zuul | 22:15 | |
*** zettabyte has quit IRC | 22:24 | |
*** zettabyte has joined #zuul | 22:25 | |
openstackgerrit | Merged zuul/zuul master: Unify handling of dequeue and enqueue events https://review.opendev.org/c/zuul/zuul/+/781099 | 22:30 |
*** zettabyte has quit IRC | 22:31 | |
*** zettabyte has joined #zuul | 22:32 | |
*** zettabyte has quit IRC | 22:38 | |
*** zettabyte has joined #zuul | 22:38 | |
*** vishalmanchanda has quit IRC | 22:41 | |
openstackgerrit | Merged zuul/nodepool master: Document ImagePullPolicy for kubernetes driver. https://review.opendev.org/c/zuul/nodepool/+/764463 | 22:44 |
*** zettabyte has quit IRC | 22:45 | |
*** zettabyte has joined #zuul | 22:46 | |
openstackgerrit | Merged zuul/zuul master: Improve test output by using named queues https://review.opendev.org/c/zuul/zuul/+/775620 | 22:46 |
openstackgerrit | Merged zuul/zuul master: Avoid race when task from queue is in progress https://review.opendev.org/c/zuul/zuul/+/775621 | 22:46 |
*** zettabyte has quit IRC | 22:52 | |
*** zettabyte has joined #zuul | 22:53 | |
*** nils has quit IRC | 22:57 | |
*** zettabyte has quit IRC | 23:00 | |
*** zettabyte has joined #zuul | 23:01 | |
y2kenny | If I have ProjectA (config project) and ProjectB (untrusted project) and I pushed a job in ProjectB pre-submit that inherit a job in ProjectA that in turn calls a role in ProjectB that is not yet submitted, is that supposed to work? | 23:01 |
y2kenny | (I am currently getting role not found.) | 23:01 |
*** rlandy has quit IRC | 23:05 | |
*** zettabyte has quit IRC | 23:09 | |
*** zettabyte has joined #zuul | 23:09 | |
openstackgerrit | Merged zuul/nodepool master: Mention node id when unlock failed https://review.opendev.org/c/zuul/nodepool/+/777678 | 23:12 |
*** zettabyte has quit IRC | 23:15 | |
*** zettabyte has joined #zuul | 23:16 | |
fungi | y2kenny: https://zuul-ci.org/docs/zuul/reference/job_def.html#attr-job.roles states " Zuul roles are able to benefit from speculative merging and cross-project dependencies when used by playbooks in untrusted projects." | 23:17 |
fungi | so it has to do with where the playbook resides | 23:17 |
mordred | yah - otherwise you could add a role to an untrusted project that overrides a role in a trusted project and then execute code speculatively in a trusted context - which would be bad | 23:18 |
fungi | if the playbook is in a config project and references a role from an untrusted project, it needs that role to be present on the appropriate branch of the untrusted project | 23:21 |
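In other words, keeping the job's playbook in the untrusted project lets its roles pick up the speculative state; roughly (all names illustrative):

```yaml
# in ProjectB (the untrusted project), e.g. zuul.d/jobs.yaml
- job:
    name: projectb-test
    parent: base
    run: playbooks/test.yaml       # playbook lives in the untrusted project,
    roles:                         # so roles from it see the speculative merge
      - zuul: example/projectb
```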
*** zettabyte has quit IRC | 23:26 | |
*** zettabyte has joined #zuul | 23:27 | |
y2kenny | fungi, mordred: yea, I was reading that and I want to make sure I understood it correctly. I am doing something funky to get around some of my logging setup. It's not so much the role in the untrusted project overriding the trusted one... I am actually passing the role name from the untrusted project into the trusted project to execute. But I | 23:30 |
y2kenny | get why that is a security issue (so the security model worked :)). | 23:30 |
mordred | \o/ | 23:31 |
y2kenny | I know this looks like I am replicating the whole pre/run/post structure but I kind of have to because of various permission/security issues. | 23:32 |
y2kenny | basically I needed to start something (a logging process) that spans the entire duration of the job on the executor. | 23:32 |
y2kenny | which has to be in a trusted project but can't be just in the pre because the pre playbook will exit. | 23:33 |
y2kenny | the workaround is fine if I just pass the commands as a variable but I was hoping to do something more advanced by passing a role to be executed | 23:34 |
fungi | could it technically be a separate build/job which runs on the executor, in parallel to the main job, just paused until the main job ends? | 23:34 |
y2kenny | um... separate as in starts by the same trigger but not in a parent-child relationship? | 23:35 |
fungi | we already have a model for concurrent interdependent builds | 23:36 |
fungi | for example, our container testing workflow starts a job which runs an image registry on a node, starts another job when that first job "pauses" and the second job can add images to or retrieve images from the registry being served by the node for the paused job, then once the second job completes the first is unpaused and cleans up | 23:37 |
y2kenny | I think I saw that example but may be I misunderstood the implementation | 23:38 |
y2kenny | I thought the registry job is parent to the second job | 23:38 |
fungi | so i don't know the details of your logger, but you could in theory run the "logger job" on the executor, "pause" it (the logger started by the job keeps running), then you start your second job you want logged on an ephemeral node or whatever, once the logged job is done the logger job wakes back up, shuts down the logger process, and archives the logs or whatever | 23:39 |
corvus | fungi, y2kenny: i think a paused job still ends the main run playbook, so any running processes will be terminated -- it just waits for children to finish before starting the post-run playbook. | 23:39 |
fungi | ahh, okay, so you'd have to have some way of leaving it running outside the playbook regardless | 23:40 |
corvus | ya | 23:40 |
fungi | for our image registry example, i suppose the registry is a background process disassociated from the ansible which started it | 23:40 |
corvus | fungi: it's on a worker node | 23:40 |
fungi | right, of course | 23:41 |
corvus | the trick here is y2kenny doesn't have a convenient worker node to run the ipmitool on and would like to use the executor | 23:41 |
fungi | was just going to say that gets harder if you try to do it on the executor through the bubblewrap layer | 23:41 |
corvus | (partly due to nodepool's inability to handle cross-provider requests -- can't request a static node and a vm at the same time) | 23:42 |
y2kenny | I am actually using the registry example to start the baremetal node (since the baremetal will stay powered on after the playbook quits) | 23:42 |
fungi | i suppose it would be possible, but would require a separate supervisor to handle the logger process | 23:42 |
y2kenny | I can potentially have two separate job off the same trigger | 23:42 |
corvus | y2kenny, fungi: but tie these 2 things together and we may have anohter option: | 23:42 |
fungi | i.e. something else running independently on the executor, which the playbooks talk to | 23:42 |
y2kenny | so no parent/child relationship | 23:42 |
corvus | outer job runs on vm, starts ipmitool, pauses; inner job starts on baremetal, completes; outer job on vm resumes | 23:43 |
corvus | it's 2 separate jobs, so it gets around the nodepool cross-provider issue | 23:43 |
corvus | oh, but the outer job would need to know the baremetal node... | 23:44 |
corvus | nevermind | 23:44 |
fungi | i thought zuul wanted dependent jobs to be in the same provider too | 23:44 |
corvus | i'm unsure if that's a preference or a hard requirement | 23:45 |
fungi | or it could be i imagined it | 23:45 |
corvus | i'd look it up but i guess it doesn't matter | 23:45 |
corvus | fungi: no you're right, i'm just not sure if it will entertain nodes from another provider if the current provider can't actually supply them | 23:45 |
corvus | anyway, moot point | 23:45 |
y2kenny | wait... so is this the dependency between separate job or the dependencies between parent and child? | 23:46 |
y2kenny | for separate jobs but with a specified dependencies, I can certainly use different nodeset | 23:46 |
fungi | job dependencies, not inheritence of job definitions | 23:46 |
fungi | but yeah, as corvus points out, the ipmitool job won't know where to find the corresponding baremetal ipmi interface | 23:47 |
y2kenny | oh right... because the inner job gets the node allocated later | 23:48 |
corvus | we can pass info from outer to inner job, but not the other way around | 23:48 |
y2kenny | right | 23:48 |
fungi | if there were some separate ipmitool-as-a-service with an api the executor could talk to, then you could theoretically communicate that to start/stop logging of a specified node and retrieve the log data | 23:49 |
fungi | but that's a lot of additional bespoke engineering | 23:49 |
y2kenny | yea... the alternative I was going to do is feed the dmesg to a server somewhere else via netconsole | 23:50 |
*** zettabyte has quit IRC | 23:50 | |
y2kenny | but that's a few more things to setup | 23:50 |
y2kenny | and netconsole is still not as complete as a BMC serial-over-LAN capture. | 23:51 |
fungi | i suppose you're not running ironic; otherwise you could probably have the executor talk to it to collect console logs | 23:51 |
corvus | acutally.... | 23:51 |
y2kenny | but I think what I have currently should be sufficient for now. I am just passing the test command on the baremetal via variable. | 23:52 |
corvus | fungi: i think the provider preference is a preference -- other providers will handle the request if the requested provider has declined it (which it would do if it can't satisfy it because it doesn't have that type) | 23:52 |
*** zettabyte has joined #zuul | 23:52 | |
corvus | fungi, y2kenny: so if you wanted to write a little bit of code, you could probably start a daemon on the outer job, return the network address of the daemon to the inner job via zuul, pause, then have the inner job connect to the daemon and tell it which ipmi host to connect to; then the daemon can start logging and the inner job can proceed | 23:53 |
corvus | whether *that* rube-goldberg machine is preferable to any of the others, i can't say :) | 23:54 |
corvus | but at least everything is ephemeral | 23:54 |
y2kenny | haha... yea... I will need to think about that. | 23:54 |
corvus | and to be clear, since i'm making up the outer/inner job terminology here, the inner job is just a job that has "job.dependencies: outer-job" | 23:55 |
y2kenny | right. | 23:55 |
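A rough sketch of that shape, using zuul_return to pause the outer job (all names are illustrative; per corvus's caveat above, anything the outer playbook starts would still need to survive the playbook exiting, e.g. by running on a worker node or as a detached daemon):

```yaml
# job definitions (illustrative)
- job:
    name: ipmi-logger              # the "outer" job
    run: playbooks/ipmi-logger.yaml

- job:
    name: baremetal-test           # the "inner" job
    dependencies:
      - ipmi-logger

# playbooks/ipmi-logger.yaml: start the logger, then pause so the
# dependent job can run; the job resumes when its dependents finish
- hosts: localhost
  tasks:
    - name: Pause this job while dependent jobs run
      zuul_return:
        data:
          zuul:
            pause: true
```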
y2kenny | anyway, thank fungi and corvus for brain storming. | 23:58 |
y2kenny | thank you* | 23:59 |
fungi | in a single-job model, could each playbook on the executor start up an ipmitool background process streaming to a file, and then in post just concatenate them? | 23:59 |
fungi | there could be gaps, of course | 23:59 |
fungi | but in theory the gaps at least wouldn't be while playbooks were running | 23:59 |