*** tosky has quit IRC | 00:01 | |
*** zxiiro has quit IRC | 00:10 | |
*** jamesmcarthur has joined #zuul | 00:27 | |
*** mattw4 has quit IRC | 00:28 | |
*** jamesmcarthur has quit IRC | 00:33 | |
*** sugaar has quit IRC | 01:02 | |
*** reiterative has quit IRC | 01:03 | |
*** sugaar has joined #zuul | 01:05 | |
*** reiterative has joined #zuul | 01:06 | |
*** mattw4 has joined #zuul | 01:13 | |
*** Goneri has quit IRC | 01:13 | |
*** mattw4 has quit IRC | 01:18 | |
*** rlandy has quit IRC | 01:21 | |
y2kenny | Is node-attributes only applicable to openstack driver? | 01:53 |
---|---|---|
*** jamesmcarthur has joined #zuul | 02:03 | |
*** swest has quit IRC | 02:57 | |
*** jamesmcarthur has quit IRC | 03:02 | |
*** jamesmcarthur has joined #zuul | 03:03 | |
*** jamesmcarthur has quit IRC | 03:06 | |
*** jamesmcarthur_ has joined #zuul | 03:06 | |
*** bhavikdbavishi has joined #zuul | 03:11 | |
*** swest has joined #zuul | 03:12 | |
SpamapS | y2kenny: yes. | 03:15 |
*** bhavikdbavishi has quit IRC | 03:19 | |
*** bhavikdbavishi has joined #zuul | 03:28 | |
*** jamesmcarthur_ has quit IRC | 03:47 | |
*** jamesmcarthur has joined #zuul | 03:49 | |
*** jamesmcarthur has quit IRC | 03:54 | |
*** jamesmcarthur has joined #zuul | 04:18 | |
*** jamesmcarthur has quit IRC | 04:25 | |
*** bhavikdbavishi has quit IRC | 04:43 | |
*** jamesmcarthur has joined #zuul | 04:44 | |
*** bhavikdbavishi has joined #zuul | 04:47 | |
*** jamesmcarthur has quit IRC | 05:06 | |
*** sgw has quit IRC | 05:25 | |
*** bhavikdbavishi has quit IRC | 05:26 | |
*** bhavikdbavishi has joined #zuul | 05:26 | |
*** evrardjp has quit IRC | 05:36 | |
*** evrardjp has joined #zuul | 05:36 | |
*** y2kenny has quit IRC | 05:40 | |
*** saneax has joined #zuul | 06:20 | |
*** jamesmcarthur has joined #zuul | 07:08 | |
*** bhavikdbavishi has quit IRC | 07:09 | |
*** jamesmcarthur has quit IRC | 07:12 | |
*** dpawlik has joined #zuul | 07:16 | |
*** bhavikdbavishi has joined #zuul | 07:56 | |
*** jcapitao has joined #zuul | 07:57 | |
*** avass has joined #zuul | 08:27 | |
*** tosky has joined #zuul | 08:41 | |
*** sshnaidm|pto is now known as sshnaidm | 08:48 | |
*** jpena|off is now known as jpena | 09:04 | |
*** bolg has joined #zuul | 09:10 | |
*** harrymichal has joined #zuul | 09:17 | |
*** panda|off is now known as panda | 09:21 | |
*** bolg has quit IRC | 09:34 | |
*** bolg has joined #zuul | 09:36 | |
*** bolg has quit IRC | 09:36 | |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: WIP: Store unparsed branch config in Zookeeper https://review.opendev.org/705716 | 10:25 |
*** avass has quit IRC | 10:41 | |
*** avass has joined #zuul | 11:10 | |
*** dpawlik has quit IRC | 11:33 | |
*** dpawlik has joined #zuul | 11:34 | |
*** rlandy has joined #zuul | 11:40 | |
*** bhavikdbavishi has quit IRC | 11:58 | |
*** jcapitao is now known as jcapitao_lunch | 11:59 | |
*** jpena is now known as jpena|lunch | 11:59 | |
*** rfolco has joined #zuul | 12:02 | |
*** bolg has joined #zuul | 12:28 | |
*** bhavikdbavishi has joined #zuul | 12:41 | |
*** jpena|lunch is now known as jpena | 13:01 | |
*** sgw has joined #zuul | 13:04 | |
*** hashar has joined #zuul | 13:05 | |
zbr | do we have any reason not to continue with https://review.opendev.org/#/c/677971/ ? | 13:12 |
*** jcapitao_lunch is now known as jcapitao | 13:25 | |
mordred | zbr: just left comments | 13:31 |
*** avass has quit IRC | 13:34 | |
zbr | mordred: thanks, so mostly not against it. | 13:36 |
mordred | no - just think there's some things we can do/update to make it work more better | 13:36 |
zbr | tbh, the python version is not so important but distro + ansible version used by zuul are, because these can have a relevant impact on how it behaves | 13:36 |
zbr | sometimes people forget that zuul may run a different ansible version than they usually do | 13:37 |
*** hashar has quit IRC | 13:37 | |
mordred | totally - although I think ansible version is already there. is there a behavior difference people should know about related to the executor distro though? | 13:37 |
zbr | probably not, let me clean it and have another look at it. | 13:42 |
*** harrymichal has quit IRC | 13:44 | |
*** y2kenny has joined #zuul | 13:55 | |
*** Goneri has joined #zuul | 13:57 | |
*** bhavikdbavishi has quit IRC | 13:59 | |
y2kenny | I just want to double confirm, is node-attributes only applicable to openstack nodepool driver? If that's the case, it explains my observation of the scheduler using the wrong executor to reach a node. | 14:01 |
y2kenny | (the node I created was using static driver) | 14:02 |
*** bolg has quit IRC | 14:11 | |
Shrews | y2kenny: yes, openstack driver only | 14:13 |
y2kenny | Shrews: Are there any plans to expand that as a general thing? Without this, the executor zone concept wouldn't work for other drivers | 14:15 |
y2kenny | (as far as I understand.) | 14:16 |
*** hashar has joined #zuul | 14:16 | |
*** bolg has joined #zuul | 14:18 | |
*** harrymichal has joined #zuul | 14:21 | |
*** jamesmcarthur has joined #zuul | 14:23 | |
Shrews | y2kenny: no current plans that i'm aware of. i don't see any immediate reason to not support that in all drivers | 14:24 |
Shrews | i suppose at the time we added it, we only needed it for openstack | 14:24 |
mordred | yeah - it seems like a good thing to support everywhere | 14:26 |
Shrews | i think i could code something up fairly quickly. lemme see how much effort that is | 14:27 |
Shrews | oh, i smartly made it a common attribute. each driver just needs to read it from the config. this should be easy | 14:34 |
mordred | Shrews: good job! | 14:41 |
Shrews | of course, the static driver has to be difficult b/c it's so different | 14:54 |
Shrews | ugh | 14:54 |
mordred | Shrews: day drinking? | 14:55 |
*** y2kenny has quit IRC | 14:59 | |
Shrews | mordred: i wish | 15:06 |
openstackgerrit | David Shrewsbury proposed zuul/nodepool master: Support node-attributes in static driver https://review.opendev.org/714672 | 15:08 |
Shrews | aaaaaand y2kenny is gone | 15:08 |
Shrews | mordred: but there it is ^^^ | 15:08 |
Shrews | oh, that's missing something | 15:09 |
*** mattw4 has joined #zuul | 15:10 | |
Shrews | oh, nope. i think that's right | 15:12 |
Shrews | mordred: also, i think node-attributes is already supported in all the other drivers, we just don't have it documented. i need to verify that first, though | 15:14 |
Shrews | but that can be done separately | 15:14 |
clarkb | what do the node attributes do? | 15:15 |
clarkb | are they inventory host vars in the end? | 15:15 |
fungi | sounds like they're used for zoning executors/nodes? | 15:15 |
Shrews | just arbitrary data we set in the node zk data that zuul can do whatever it wants with. right now i think it's just used for zones | 15:16 |
Shrews | i wanted something flexible for similar new zuul features that didn't require updating nodepool to support | 15:16 |
Shrews | i don't think any of it is passed through at all | 15:17 |
Shrews | (as host vars, that is) | 15:18 |
clarkb | gotcha | 15:18 |
*** bhavikdbavishi has joined #zuul | 15:24 | |
tobiash | I just found a funny way of leaking jobs in the executor. Restarting the scheduler will leak all paused jobs in the executors (not nodes) if not all executors are restarted as well. | 15:33 |
corvus | tobiash: i would think that we would lose the gearman job and that should take care of it | 15:34 |
tobiash | corvus: that doesn't seem to be the case. The executor just sits there and waits for the resume that will never happen. | 15:36 |
tobiash | however that's not a huge problem | 15:36 |
corvus | there's a chain: scheduler -> gearman server -> executor, and if that's broken, the job should abort | 15:36 |
tobiash | except for our graceful draining script that will wait forever | 15:36 |
tobiash | does the executor get an event from gearman about that? (actually it disconnects and reconnects gearman if the scheduler restarts) | 15:37 |
tobiash | if that's possible we could react on a gearman disconnect in the executor and just drop all jobs | 15:38 |
corvus | tobiash: hrm, i may be remembering incorrectly; the worker may not get notified on a client disconnect, which means it will only discover that it's gone when it tries to send more data, which a paused job won't do | 15:42 |
corvus | that would explain the behavior | 15:42 |
tobiash | yes | 15:43 |
corvus | tobiash: we could have the executor send a "still paused" data packet every 5 minutes :) | 15:44 |
tobiash | corvus: we could react on https://opendev.org/zuul/zuul/src/branch/master/zuul/lib/gearworker.py#L96 | 15:44 |
tobiash | and throw away all jobs in this case (as we lose every job on a disconnect) | 15:44 |
corvus | tobiash: but that's only if the executor disconnects | 15:45 |
tobiash | right, which only covers some scheduler restarts | 15:45 |
corvus | i thought this was in the case that the scheduler only disconnects | 15:45 |
*** zxiiro has joined #zuul | 15:46 | |
tobiash | then the only way I see is the "still paused" heartbeat | 15:46 |
tobiash | in our case the gearman server is started together with the scheduler | 15:46 |
corvus | (opendev uses the embedded gear server, so *everything* disconnects when we restart the scheduler, which is probably why we don't see this; but iirc, you run a separate geard) | 15:46 |
corvus | tobiash: wait, do you restart your gear server when you restart your scheduler? | 15:46 |
tobiash | yes | 15:46 |
corvus | oh, that's different. 1 sec. | 15:46 |
corvus | tobiash: there's a handleDisconnect callback that the worker should get | 15:47 |
tobiash | that's not handled in the executor :) | 15:47 |
tobiash | I guess we handle that nowhere in zuul | 15:48 |
corvus | tobiash: at least not anymore | 15:48 |
corvus | it looks like we lost it somewhere between 2.5 and now | 15:49 |
*** jamesmcarthur has quit IRC | 15:49 | |
tobiash | I'll see if I can re-plumb that through | 15:49 |
corvus | oh, it may only ever have been on the client side | 15:49 |
corvus | we may never have had it on the server side | 15:49 |
corvus | tobiash: take a look at ZuulGearmanClient in executor/client.py | 15:50 |
tobiash | yes, just saw it on merge and execute client | 15:50 |
corvus | i think we'll need a subclass of Text/BinaryWorker like that for the executor (though it'll be a little trickier there because of the way the classes are set up) | 15:50 |
tobiash | corvus: hrm, seems like handleDisconnect is not part of the inheritance chain of TextWorker | 15:56 |
tobiash | handleDisconnect is defined in Client, but TextWorker only contains BaseClient | 15:57 |
*** jamesmcarthur has joined #zuul | 15:58 | |
corvus | tobiash: weird -- because this should be in the hierarchy: https://opendev.org/opendev/gear/src/branch/master/gear/__init__.py#L842 | 16:00 |
tobiash | yeah I just saw it's called, but not down there defined in the hierarchy | 16:02 |
tobiash | I guess this will just throw an exception | 16:02 |
tobiash | but it also means that adding it should work | 16:02 |
*** jcapitao is now known as jcapitao_afk | 16:07 | |
*** jamesmcarthur has quit IRC | 16:09 | |
*** mattw4 has quit IRC | 16:18 | |
*** mattw4 has joined #zuul | 16:19 | |
*** jamesmcarthur has joined #zuul | 16:21 | |
*** armstrongs has joined #zuul | 16:25 | |
*** jamesmcarthur has quit IRC | 16:26 | |
*** jcapitao_afk is now known as jcapitao | 16:32 | |
*** harrymichal has quit IRC | 16:37 | |
*** armstrongs has quit IRC | 16:38 | |
*** tflink has quit IRC | 16:43 | |
*** tflink has joined #zuul | 16:43 | |
tobiash | corvus: I think that should move it into the right place: https://review.opendev.org/714709 Move handleDisconnect into BaseClientServer | 16:48 |
*** tflink has quit IRC | 16:49 | |
*** tflink has joined #zuul | 16:50 | |
*** y2kenny has joined #zuul | 16:51 | |
y2kenny | Shrews: sorry I had to head to the office to get some equipment. Thanks for the quick patch. | 16:54 |
y2kenny | per the documentation here, I need to set the node-attributes if I am to use the zone feature. For example, if I have a nodepool inside a private cluster (no external IP for the nodes) and an executor in this private cluster, I would want the scheduler to only use the private-cluster executor to talk to the nodes. | 16:57 |
*** mattw4 has quit IRC | 16:57 | |
*** mattw4 has joined #zuul | 16:58 | |
y2kenny | I tried this feature by manually launching a single node in the private cluster and have the nodepool provide it via a static driver | 16:58 |
y2kenny | (sorry... the document here: https://zuul-ci.org/docs/zuul/discussion/components.html#executor) | 16:59 |
y2kenny | what I noticed is that the scheduler tries to reach the node in the private cluster via another executor I have outside of the private cluster and the node is unreachable | 16:59 |
y2kenny | Shrews: does a similar patch need to be applied for the Kubernetes driver as well for the k8s driver to support zone? | 17:03 |
y2kenny | Shrews: also, should I go and vote as an interested user for the patch you just made? | 17:12 |
*** sshnaidm is now known as sshnaidm|afk | 17:13 | |
fungi | y2kenny: absolutely! and if you get a chance to skim the implementation and spot possible bugs or things which you think might work better another way, please leave a comment and point them out | 17:21 |
y2kenny | fungi: ok, will do. | 17:22 |
fungi | opendev's code review platform is configured to allow anyone to vote and comment on changes, because we host projects which want users who provide feedback ;) | 17:23 |
y2kenny | opendev looks pretty awesome. I didn't come across it until diving into Zuul. | 17:24 |
y2kenny | I think it's a pretty good alternative to GitHub and looks like it uses all open source components. | 17:24 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Stop jobs on gearman disconnect https://review.opendev.org/714722 | 17:25 |
tobiash | corvus: I think this should fix it, but no idea how to test that yet ^ | 17:25 |
clarkb | tobiash: could have a job that starts a process that sits and waits, then disconnect gearman and check ps to see the processes are cleaned up? | 17:27 |
clarkb | (or will that not properly check the state in the executor itself?) | 17:27 |
tobiash | clarkb: I guess I'd have to shutdown the gearman server during the test | 17:27 |
clarkb | tobiash: likely yes | 17:28 |
fungi | y2kenny: it's a work in progress, but the idea is that open source projects who value relying on open source tools for their contributor workflows but lack the resources to run all those services themselves can collaborate with other like-minded projects to operate a platform which meets their collective needs | 17:28 |
tobiash | it's gonna be interesting if it's possible to stop the gear server without breaking the test case :) | 17:28 |
*** hashar has quit IRC | 17:29 | |
*** evrardjp has quit IRC | 17:36 | |
*** evrardjp has joined #zuul | 17:36 | |
*** ianychoi has quit IRC | 17:45 | |
*** yolanda has quit IRC | 17:45 | |
*** mgoddard has quit IRC | 17:45 | |
*** openstackstatus has quit IRC | 17:45 | |
*** dmsimard|off has quit IRC | 17:45 | |
*** ianychoi has joined #zuul | 17:46 | |
*** yolanda has joined #zuul | 17:46 | |
*** irclogbot_3 has quit IRC | 17:47 | |
*** openstackstatus has joined #zuul | 17:48 | |
*** ChanServ sets mode: +v openstackstatus | 17:48 | |
Shrews | y2kenny: Well, like I said in the scrollback, I believe node-attributes should already work with the rest of the drivers. I'll be putting up a change shortly that validates that and also updates the docs to reflect it. | 17:48 |
y2kenny | Shrews: Thanks! | 17:48 |
*** mgoddard has joined #zuul | 17:49 | |
Shrews | y2kenny: it didn't work for the static driver because it's a different beast from the other drivers | 17:49 |
*** irclogbot_2 has joined #zuul | 17:49 | |
y2kenny | Shrews: ah ok. I have no idea how the drivers work under the hood (I assumed they are all somewhat the same until I read what you said about pre-launching having different behaviour.) | 17:51 |
y2kenny | Quick question, do you guys have some stats on Zuul usage around the OpenStack project? (I am trying to pitch my management.) I tried to find the slides from the Gerrit User Summit but looks like it's not posted yet. | 17:57 |
clarkb | y2kenny: ya we publish things occasionally. I think some official numbers may have ended up in the OSF annual report | 17:58 |
clarkb | y2kenny: let me see if I can find a link | 17:58 |
y2kenny | some numbers on number of tests and nodes for size and scale would be very useful. | 17:58 |
clarkb | y2kenny: https://www.openstack.org/foundation/2019-openstack-foundation-annual-report grep for "Zuul Project Update" | 17:59 |
clarkb | y2kenny: there is a small blurb in there about openstack's use of zuul and leboncoin's | 17:59 |
clarkb | y2kenny: also the articles linked from https://zuul-ci.org/users.html | 18:00 |
clarkb | we can dig up more specifics if it helps too, just let us know | 18:00 |
clarkb | (it's usually a matter of running queries against graphite, which you can do too) | 18:00 |
fungi | y2kenny: but also we maintain some real-time graphs like http://grafana.openstack.org/d/T6vSHcSik/zuul-status?orgId=1 | 18:01 |
y2kenny | unfortunately I am not familiar with graphite. | 18:01 |
y2kenny | Can you provide a few of these figures? | 18:02 |
y2kenny | 1) number of components served by OpenStack's CI | 18:03 |
y2kenny | 2) number of build nodes | 18:03 |
y2kenny | 3) number of test nodes | 18:03 |
y2kenny | 4) number of build per week | 18:03 |
y2kenny | 5) number of tests per week | 18:03 |
*** jcapitao has quit IRC | 18:03 | |
fungi | ahh, so more of a sizing thing | 18:04 |
y2kenny | 6) active developers contributing to the workload | 18:04 |
y2kenny | I think Monty has some of those figures in his slides | 18:04 |
fungi | we peak at a little over 1k job nodes at the moment | 18:04 |
y2kenny | I just can't find the slides :) | 18:04 |
fungi | in use simultaneously | 18:04 |
*** jamesmcarthur has joined #zuul | 18:04 | |
clarkb | mordred: ^ you have those someplace you can share easily from? | 18:04 |
fungi | we have 12 executors and 8 additional mergers to support those | 18:05 |
fungi | build nodes and test nodes are indistinguishable for opendev at least, i'm not sure how you're differentiating those terms | 18:05 |
y2kenny | fungi: understood... I think that makes sense for your use case. I differentiate them mostly for my use case. | 18:06 |
fungi | y2kenny: well, my point was more that i don't know what you mean by those terms | 18:06 |
corvus | y2kenny: for 1 do you mean git repos? | 18:06 |
openstackgerrit | Merged zuul/zuul-jobs master: Trim whitespace from uri password for docker promote https://review.opendev.org/714506 | 18:06 |
y2kenny | fungi: i.e., we generally wouldn't want a baremetal node with GPUs for testing to spend time on compiling things. That's the reason we differentiate. | 18:07 |
fungi | i took #1 to be how many separate zuul service instances (executors and mergers) we were running | 18:07 |
clarkb | https://graphite01.opendev.org/render/?title=Build%20Nodes%20Used%20Per%20Day&from=20200101&until=now&target=alias(summarize(sumSeries(stats.gauges.nodepool.label.*.nodes.in-use),%20%271d%27),%20%27Nodes%20In%20Use%27)&height=800&width=1280 is an example graphite query that answers 2/3 on a daily basis | 18:07 |
clarkb | note you can change the date range there and also ask for json output for hard numbers | 18:07 |
y2kenny | corvus: for 1, I meant openstack components that interact with each other (it's been a while since I did openstack, but I would say Nova is a component and cinder is another?) | 18:08 |
fungi | y2kenny: oh, i see, you're saying you plan to have some nodes which do "builds" and some nodes which run "tests" and they're separate. we consider tests to also be a kind of "build" so don't really have a means of separating statistics around them, at least not which i can think of | 18:08 |
mordred | y2kenny, clarkb: https://opendev.org/inaugust/inaugust.com/src/branch/master/src/zuulv3/gus2019.rst was what I did at Gerrit Summit | 18:08 |
corvus | y2kenny: that's a hard question to answer, it's a little fuzzy. depending on how you count, there are between 6 and a few hundred of those, but opendev's zuul supports 2,000 git repos. | 18:09 |
corvus | https://www.openstack.org/software/project-navigator/openstack-components#openstack-services | 18:10 |
mordred | maybe it might make sense to think of opendev buildsets for the "builds" number and opendev builds for the "Tests" number | 18:10 |
corvus | that's some of the different ways of counting them | 18:10 |
y2kenny | corvus: the idea behind that question is to talk about the complexity of the project. The counter example would be typical webapp where it's really one component with a single button deploy and release. Another definition for component may be individually versioned projects (Can nova and cinder get release independently?... I forgot... I vaguely | 18:11 |
y2kenny | remember you guys do release train.) | 18:11 |
*** jpena is now known as jpena|off | 18:12 | |
y2kenny | corvus: the 2000 git repo is a good number :) | 18:13 |
corvus | y2kenny: the full set of repos under openstack project governance is 694 -- that represents the set of git repos that coordinate release activity and follow the same testing standards, etc. | 18:14 |
corvus | many others are associated but have looser requirements. | 18:15 |
corvus | as fungi said, i think 2 and 3 are the same for us, and that's 1,000 right now. | 18:15 |
y2kenny | so the 694 is under one tennant? | 18:16 |
y2kenny | these are all very good numbers. Thanks guys. | 18:16 |
mordred | for 6 we're at a lower point right now - but the system has managed in the 2500 active devs number | 18:16 |
corvus | y2kenny: even more under one tenant, close to the full 2000, because we haven't gotten around to moving the others out yet | 18:17 |
mordred | (we have fewer devs today - but that doesn't mean that this zuul hasn't taken care of more in the not too distant past) | 18:17 |
corvus | y2kenny: based on clark's graph, i'd estimate #4 and #5 (again, the same for us) at about 90,000 per week | 18:18 |
mordred | there was a point where we were doing 2k per hour at peak load | 18:18 |
y2kenny | mordred: Understood. I have some more politically minded people challenging my CI capacity by asking about utilization during lunch time :) | 18:18 |
mordred | y2kenny: how friendly :) | 18:19 |
*** jamesmcarthur has quit IRC | 18:19 | |
*** jamesmcarthur has joined #zuul | 18:19 | |
clarkb | one of the great things about zuul is it has been designed to scale up with your needs | 18:19 |
corvus | yeah, our peak load has probably been more like 150,000 to 200,000 per week | 18:19 |
clarkb | if you need to run more jobs you add more cloud capacity and more executors | 18:20 |
mordred | yeah - but even that is only that low because devs go to sleep - if the 2k jobs per hour held, that would make our estimated peak weekly load 336k jobs / week | 18:20 |
mordred | depending on what it is we're wanting to communicate :) | 18:20 |
*** bhavikdbavishi has quit IRC | 18:24 | |
fungi | it's a matter of capacity vs utilization | 18:25 |
fungi | we have capacity to run 336k/wk, we run fewer because utilization fluctuates with waking hours of developers interacting with our code review system | 18:26 |
Shrews | corvus: umm, did we not add nodepool tests for the gce driver? | 18:27 |
corvus | Shrews: maybe not. i think we have some very limited testing for aws, we could probably add something similar. | 18:29 |
openstackgerrit | David Shrewsbury proposed zuul/nodepool master: Update tests for node-attributes https://review.opendev.org/714738 | 18:31 |
Shrews | corvus: note the frowny face in 714738 :( | 18:31 |
Shrews | y2kenny: it seems node-attributes was broken in the openshift drivers, but works for k8s. 714738 should update the docs for us. Thanks for helping us find that. | 18:33 |
y2kenny | Shrews: thanks for the quick response! | 18:34 |
*** saneax has quit IRC | 18:39 | |
*** mattw4 has quit IRC | 18:46 | |
*** dpawlik has quit IRC | 18:53 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Stop jobs on gearman disconnect https://review.opendev.org/714722 | 18:58 |
tobiash | corvus, clarkb: the initial version didn't work, so yay tests :) | 18:59 |
*** avass has joined #zuul | 19:19 | |
*** avass has quit IRC | 19:30 | |
Shrews | zuul-maint: fyi, something has broken the nodepool-zuul-functional test in nodepool while also removing that job from zuul's suite of tests | 19:36 |
Shrews | oh, nm on the zuul side. user error, but still broken :/ | 19:41 |
*** avass has joined #zuul | 19:41 | |
mordred | Shrews: I didn't do it | 19:42 |
Shrews | :) attempting to trace it back | 19:42 |
*** hashar has joined #zuul | 19:43 | |
Shrews | https://zuul.opendev.org/t/zuul/build/dc97ec7941eb490a937569a2f11de947/log/nodepool/log/nodepool-builder.log#19 | 19:45 |
Shrews | possibly something with the container builds? | 19:45 |
Shrews | i don't see any nodepool or zuul repo merges relevant to that | 19:46 |
mordred | Shrews: maybe ianw knows ^^ | 19:48 |
corvus | i don't think that should have anything to do with containers | 19:52 |
corvus | is it saying it can't execute that program? | 19:52 |
Shrews | that's how i read it | 19:58 |
ianw | umm, did the config inheritance changes merge? i did update fake dib in that ... i certainly didn't mean to change permissions or anything | 20:08 |
ianw | hasn't merged anyway | 20:08 |
Shrews | ianw: no, i checked that | 20:10 |
ianw | how bizarre, not much merged in zuul that looks related either | 20:13 |
Shrews | yeah, this is a difficult one | 20:18 |
ianw | just grabbing some breakfast ... will look after | 20:24 |
Shrews | according to https://zuul.opendev.org/t/zuul/builds?job_name=nodepool-zuul-functional it broke sometime between 2020-03-19T22:36:07 and 2020-03-21T17:20:55 | 20:24 |
Shrews | i guess 2020-03-22T13:07:26 is the real first failure | 20:31 |
*** y2kenny has quit IRC | 20:31 | |
Shrews | ianw: could this change have caused some unintended side effect? https://review.opendev.org/#/c/713759/1/playbooks/group_vars/nodepool-builder_opendev.yaml | 20:40 |
Shrews | seems unlikely, but grasping at straws rn | 20:41 |
ianw | Shrews: i don't think so ... let's put a node on hold, i'll set that up now | 20:43 |
Shrews | yeah, i think that's the next course of action now | 20:44 |
Shrews | ianw: you can use https://review.opendev.org/714672 if you like | 20:45 |
ianw | cool, next run should hold | 20:47 |
*** jamesmcarthur has quit IRC | 20:51 | |
*** jamesmcarthur has joined #zuul | 20:52 | |
*** jamesmcarthur has quit IRC | 20:56 | |
ianw | -rw-r--r-- 1 root staff 2294 Mar 24 20:58 /usr/local/lib/python3.6/dist-packages/nodepool/../nodepool/tests/fake-image-create | 21:07 |
ianw | it must be something in setuptools, etc? | 21:07 |
*** hashar has quit IRC | 21:08 | |
corvus | something that now mutates the permissions when installing it? | 21:10 |
ianw | yeah how weird. it's correct in zuul's checkout in /home/zuul/opendev ... | 21:11 |
ianw | it's not explicitly listed in setup.cfg or anything like that | 21:11 |
ianw | "root staff" seems odd ... "staff"? | 21:12 |
*** rfolco has quit IRC | 21:12 | |
fungi | staff is standardized as gid 50 on debian and derivatives | 21:14 |
*** jamesmcarthur has joined #zuul | 21:14 | |
fungi | gets used for a lot of stuff which would traditionally have been the "wheel" or "admin" group | 21:15 |
*** jamesmcarthur has quit IRC | 21:15 | |
fungi | something in a parent directory could be setgid? | 21:15 |
*** jamesmcarthur has joined #zuul | 21:16 | |
ianw | drwxr-sr-x 6 root staff 4096 Mar 24 20:58 /usr/local/lib/python3.6/dist-packages/nodepool/ | 21:16 |
fungi | yep, setgid staff there | 21:16 |
ianw | somehow a umask has changed? or the way we install has changed? | 21:18 |
fungi | are any parents of that also setgid staff? maybe all the way down to /usr itself? | 21:19 |
fungi | what distro is it? | 21:20 |
corvus | i don't think the staff part is important | 21:20 |
corvus | it's the lack of execute | 21:20 |
ianw | https://opendev.org/zuul/nodepool/src/branch/master/roles/nodepool-zuul-functional/tasks/main.yaml#L3 | 21:20 |
ianw | cmd: sudo pip3 install . | 21:21 |
ianw | i feel like maybe we changed sudo lately? | 21:21 |
ianw | 2020-03-24 20:59:04.240633 | TASK [revoke-sudo : Remove sudo access for zuul user.] ... no that all happens after, nothing there | 21:22 |
ianw | # pip3 --version | 21:24 |
ianw | pip 20.0.2 from /usr/local/lib/python3.6/dist-packages/pip (python 3.6) | 21:24 |
corvus | ianw, /win 11 | 21:24 |
corvus | sorry | 21:24 |
fungi | it would be odd for umask to mask out setting exec bit, so i don't think it's that either | 21:24 |
corvus | i figure we have 2 approaches: either figure out what changed and see if we can revert or work around it; or change the approach to not have the functional test run that command out of the install dir (instead, run it out of the git checkout) | 21:25 |
mordred | corvus, tristanC: I just noticed: http://zuul.opendev.org/t/openstack/status/change/714763,1 <-- if someone gives you a link and you open it - there is no indication which pipeline that is :) | 21:26 |
corvus | i agree and i agree that could be improved | 21:27 |
mordred | (it's not a sequence of actions I think I've ever taken before now) | 21:27 |
corvus | re nodepool; i sort of think the approach of just changing the path for the functional test config may be the way to go | 21:28 |
corvus | it's clearly not a test that runs outside of ci very often | 21:28 |
corvus | so just update its config file to set dib-cmd to /home/zuul/....; or if we really wanted to do it right, update the playbook to template that out | 21:29 |
ianw | or perhaps pip install -e | 21:32 |
corvus | ianw: yeah, it's probably worth throwing a change up to do that and see if it works | 21:40 |
corvus | mordred: i'm trying to debug that behavior from earlier, and this command doesn't seem to work: zuul enqueue --tenant openstack --pipeline third-party-check --project ansible/ansible --change 68122,8de92b45b3ea7576912ce7686381618456189ab9 | 21:41 |
corvus | i get zuul.rpcclient.RPCFailure: Invalid change: 68122,8de92b45b3ea7576912ce7686381618456189ab9 | 21:41 |
corvus | which i find really confusing | 21:41 |
corvus | it's still open, and that looks like the right commit in that pr | 21:42 |
mordred | corvus: yeah. that's not some issue with there being a 3pci check still on there? | 21:43 |
fungi | what were the details of the event which triggered it? i recall looking at the pr and it was closed and reopened and some labels removed and added and a new commit pushed all around the time the build occurred | 21:46 |
corvus | i don't think there was a new commit? | 21:46 |
corvus | but lots of closing/opening happened afterwards | 21:46 |
mordred | yeah - he was trying to retrigger ci runs | 21:46 |
corvus | but i don't think any of that should impact zuul being able to load the change... | 21:47 |
corvus | mordred: did anything change with respect to the opendev zuul app installation? | 21:47 |
mordred | corvus: not to my knowledge? | 21:48 |
mordred | corvus: let me go check though | 21:48 |
mordred | corvus: I may not know how to check that | 21:54 |
fungi | i've heard github is intuitive, can't you just... intuit it? | 21:55 |
corvus | i'm trying to set up a local zuul with a github connection to see if i can debug this | 22:01 |
*** jamesmcarthur has quit IRC | 22:03 | |
*** jamesmcarthur has joined #zuul | 22:04 | |
ianw | so it looks like the file in the wheel is : -rw-r--r-- 2.0 unx 2294 b- defN 20-Mar-24 20:55 nodepool/tests/fake-image-create | 22:05 |
*** jamesmcarthur has quit IRC | 22:05 | |
*** jamesmcarthur has joined #zuul | 22:05 | |
ianw | so it's not so much extraction of it, as it's going into the wheel wrong | 22:05 |
corvus | mordred: wow, all that work into setting up a local zuul to test it, and it worked fine | 22:10 |
corvus | mordred: like, i can enqueue it into a local zuul with an anonymous github connection without error | 22:11 |
corvus | but the same action on opendev's zuul gets me an "invalid change" | 22:11 |
corvus | which means either there's some state in opendev's zuul that's wedged, or something about using the authenticated app installation is breaking it | 22:11 |
mordred | corvus: wow. | 22:14 |
mordred | corvus: well, I pinged gundalow to see if anything changed permissions-wise | 22:16 |
corvus | mordred: switching to #opendev | 22:18 |
openstackgerrit | Ian Wienand proposed zuul/nodepool master: nodepool-zuul-functional: switch to editable install https://review.opendev.org/714788 | 22:51 |
*** jamesmcarthur has quit IRC | 23:12 | |
*** jamesmcarthur has joined #zuul | 23:50 | |
*** jamesmcarthur has quit IRC | 23:57 | |
*** rlandy has quit IRC | 23:57 |
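
Editor's note on the 22:05 observation that fake-image-create is already stored in the wheel as -rw-r--r--: the same check can be scripted rather than read off a listing. What follows is a minimal sketch, not anything taken from nodepool. It assumes the wheel is an ordinary zip archive written on a Unix host, so the file mode sits in the high 16 bits of each entry's `external_attr`. The function names and the demo archive contents are hypothetical and exist only for illustration.

```python
import io
import stat
import zipfile

# Any of the user/group/other execute bits counts as "executable".
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def non_executable_members(wheel_file, names):
    """Return the entries from `names` that carry no execute bit in the archive."""
    missing = []
    with zipfile.ZipFile(wheel_file) as archive:
        for name in names:
            info = archive.getinfo(name)
            # For archives created on Unix, the high 16 bits hold the file mode.
            mode = info.external_attr >> 16
            if not mode & EXEC_BITS:
                missing.append(name)
    return missing


def build_demo_archive():
    """Build an in-memory zip with one 0644 and one 0755 entry (hypothetical names)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, mode in (("pkg/tests/fake-tool", 0o644),
                           ("pkg/tests/good-tool", 0o755)):
            info = zipfile.ZipInfo(name)
            info.external_attr = (stat.S_IFREG | mode) << 16
            archive.writestr(info, "#!/bin/sh\n")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    demo = build_demo_archive()
    print(non_executable_members(demo, ["pkg/tests/fake-tool", "pkg/tests/good-tool"]))
```

Pointed at a real wheel path instead of the demo buffer, a check like this would flag a helper script that lost its execute bit at build time, which is the situation described at 22:05.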
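
Editor's note on the capacity versus utilization figures quoted between 18:18 and 18:26: the 336k jobs per week number is the 2k jobs per hour peak held for a full week. The short sketch below only reproduces that arithmetic. The function names are hypothetical, and the ratio it prints uses corvus's rough estimate of 90,000 per week, so it is illustrative rather than a measured statistic.

```python
HOURS_PER_WEEK = 24 * 7  # 168


def weekly_capacity(jobs_per_hour):
    """Jobs per week if the given hourly rate were sustained around the clock."""
    return jobs_per_hour * HOURS_PER_WEEK


def utilization(observed_per_week, jobs_per_hour):
    """Fraction of that sustained-rate capacity actually used."""
    return observed_per_week / weekly_capacity(jobs_per_hour)


if __name__ == "__main__":
    print(weekly_capacity(2000))  # 336000
    print(f"{utilization(90_000, 2000):.0%}")
```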