*** dkranz has joined #zuul | 00:11 | |
*** xinliang has quit IRC | 03:43 | |
*** xinliang has joined #zuul | 03:55 | |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Make encrypt_secret.py work with OpenSSL 0.x https://review.openstack.org/517133 | 06:37 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Reset state on unpaused, declined request https://review.openstack.org/517417 | 06:44 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add BaseSource.getProjectReadonly and refactor https://review.openstack.org/517067 | 06:55 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Do not add invalid projects via the /keys API https://review.openstack.org/517078 | 06:55 |
*** hashar has joined #zuul | 08:58 | |
*** electrofelix has joined #zuul | 09:38 | |
*** Cibo_ has joined #zuul | 10:43 | |
*** Cibo has joined #zuul | 10:54 | |
*** Cibo_ has quit IRC | 10:56 | |
*** hashar has quit IRC | 11:08 | |
*** hashar has joined #zuul | 11:13 | |
*** hashar has quit IRC | 11:37 | |
*** hashar has joined #zuul | 12:19 | |
dmsimard | Anyone know where I could start troubleshooting three mergers that are connected to geard properly but only one of them is picking up work? | 13:52 |
*** jkilpatr has quit IRC | 13:52 | |
dmsimard | it's connected properly ... zuul-merg 10705 zuul 6u IPv4 26617254 0t0 TCP zm01.review.rdoproject.org:41876->managesf.review.rdoproject.org:4730 (ESTABLISHED) | 13:52 |
dmsimard | restarted the two not picking up work in debug and that's all I get http://paste.openstack.org/raw/625431/ | 13:53 |
dmsimard | as I mention that, things start rolling again... | 14:09 |
dmsimard | ¯\_(ツ)_/¯ | 14:09 |
*** jkilpatr has joined #zuul | 14:12 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Switch to threading model of socketserver https://review.openstack.org/517437 | 14:45 |
*** sambetts|afk has quit IRC | 14:51 | |
*** jkilpatr has quit IRC | 15:01 | |
*** dkranz has quit IRC | 15:09 | |
jeblair | dmsimard: telnet geardserver 4730 | 15:10 |
jeblair | dmsimard: workers | 15:10 |
jeblair | dmsimard: status | 15:11 |
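The check jeblair suggests can be scripted: geard answers the plain-text gearman admin protocol on port 4730, where `status` lists per-function job counts and available workers and `workers` lists connected clients. A minimal Python sketch of the same telnet session (the hostname is taken from the lsof output above):

```python
# Query geard's gearman admin protocol; responses end with a lone "." line.
import socket

def geard_admin(host, command, port=4730):
    """Send an admin command like 'status' or 'workers' and return its lines."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(command.encode() + b"\n")
        buf = b""
        while not buf.endswith(b".\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            buf += chunk
    return buf.decode().splitlines()[:-1]

# 'status' rows are FUNCTION<TAB>TOTAL<TAB>RUNNING<TAB>AVAILABLE_WORKERS;
# a merger function showing queued jobs but idle available workers would
# point at workers that are connected yet not grabbing work.
for line in geard_admin("managesf.review.rdoproject.org", "status"):
    print(line)
```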
dmsimard | jeblair: it was a red herring, what was stuck was actually our nodepool | 15:11 |
dmsimard | it was just weird that two mergers were not picking up anything but all three are doing stuff now. | 15:11 |
jeblair | dmsimard: it's worth noting that unless the system is busy and there are jobs queued, it's quite likely for only one to pick up jobs (and the same one). all depends on cpu speed and network topology. | 15:11 |
jeblair | dmsimard: the geard server doesn't round-robin or anything, it wakes up all idle workers, and the first one to respond gets the next job. | 15:12 |
dmsimard | yeah, we had to get more than one merger.. we have three and it's fine now. Back when we only had one, sometimes we would get one of those "rebase bombs" from the upstream gerrit and that monopolizes a merger for quite a bit | 15:12 |
jeblair | dmsimard: and it wakes them in the same order :| | 15:12 |
dmsimard | like a 22-long patch stack with depends-on mixed in between | 15:13 |
jeblair | good times | 15:13 |
*** jkilpatr has joined #zuul | 15:14 | |
*** kmalloc has joined #zuul | 15:37 | |
dmsimard | jeblair: I'm trying to hunt down the best solution to fix https://review.openstack.org/#/c/514489/ and https://review.openstack.org/#/c/514490/ | 15:50 |
dmsimard | jeblair: I wondered if we should just make sure the hostvars apply to localhost as well | 15:50 |
dmsimard | but at the same time, there can be legitimate "nodeless" jobs which run *only* on localhost, so I'm not sure to what extent the concept of nodepool vars applies (cloud, provider, ip addresses, etc.) | 15:52 |
dmsimard | For example, tristanC's work on the container driver defines a different inventory layout https://review.openstack.org/#/c/468753/24/nodepool/driver/oci/provider.py | 15:54 |
dmsimard | It looks like emit-job-header would be driver dependent | 15:55 |
dmsimard | so maybe we could pull that back into the executor code itself, rather than trying to keep it as a role/playbook | 15:56 |
*** bhavik has joined #zuul | 15:56 | |
SpamapS | jeblair: oh waking in the same order. That would be an easy thing to fix. ;) | 16:06 |
* SpamapS throws it on the pile | 16:06 | |
SpamapS | jeblair: note that ideally they'd be woken up in the order that they slept. | 16:09 |
SpamapS | so the one that gave you a PRE_SLEEP first is the first one to wake up. That sort of functions as a round robin. | 16:10 |
* SpamapS ponders putting that into the protocol doc | 16:10 | |
jeblair | SpamapS: ++ | 16:27 |
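What SpamapS sketches here, as illustration only (not geard's actual code): track sleeping workers in the order their PRE_SLEEP packets arrived and send NOOP wakeups from the front of that queue, which approximates round-robin:

```python
# Illustrative only: FIFO wakeups keyed by PRE_SLEEP order.
from collections import OrderedDict

class WakeQueue:
    def __init__(self):
        self._sleeping = OrderedDict()  # worker id -> worker, in sleep order

    def pre_sleep(self, worker_id, worker):
        # A worker that sleeps again goes to the back of the line.
        self._sleeping.pop(worker_id, None)
        self._sleeping[worker_id] = worker

    def wake_all(self):
        # NOOP the earliest sleeper first, so it gets first shot at the job.
        for worker in list(self._sleeping.values()):
            worker.send_noop()
        self._sleeping.clear()
```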
jeblair | dmsimard: indeed we have nodeless jobs already. | 16:28 |
dmsimard | jeblair: what does the inventory look like for nodeless jobs? | 16:30 |
jeblair | dmsimard: i think the right solution is for zuul not to care. i think our job header should gracefully handle missing data. | 16:30 |
jeblair | dmsimard: "hosts: []" | 16:31 |
jeblair | or hosts: {} | 16:31 |
jeblair | something like that | 16:31 |
jeblair | dmsimard: yeah, should be "all: hosts: {}" | 16:31 |
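Putting jeblair's two points together: a nodeless inventory has an empty host map, so anything like the job-header role should treat every per-node field as optional. A sketch; the `nodepool` hostvar layout below follows the common variable names but should be treated as an assumption:

```python
# Sketch: read an inventory that may have no nodes at all and default
# any missing nodepool vars instead of failing.
import yaml

inventory = yaml.safe_load("""
all:
  hosts: {}
""")

hosts = inventory.get("all", {}).get("hosts") or {}
if not hosts:
    print("nodeless job: no per-node vars to report")
for name, hostvars in hosts.items():
    nodepool = (hostvars or {}).get("nodepool", {})
    # cloud/provider follow the usual nodepool var names, but a driver
    # like the container one may not set them at all.
    print(name, nodepool.get("cloud", "n/a"), nodepool.get("provider", "n/a"))
```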
*** Cibo has quit IRC | 17:03 | |
*** dkranz has joined #zuul | 17:07 | |
*** jkilpatr has quit IRC | 17:16 | |
*** jkilpatr has joined #zuul | 17:28 | |
*** bhavik has quit IRC | 17:36 | |
*** Cibo has joined #zuul | 18:10 | |
*** Cibo has quit IRC | 18:16 | |
*** weshay is now known as weshay_brb | 18:30 | |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Make encrypt_secret.py work with OpenSSL 0.x https://review.openstack.org/517133 | 18:37 |
jlk | ugh. So if I bring 20K to the table, I can get a decent XC90 on a 72 month term for the same monthly payments I'm making now. Only, I think I'm underwater on current car, so blah. | 18:46 |
SpamapS | jlk: wrong window :) | 18:50 |
jlk | ah shit. | 18:50 |
jlk | too bad this isn't slack, I can't delete it. | 18:50 |
SpamapS | you can't really delete it in slack either. :) | 18:50 |
SpamapS | You can just hide it from unprivileged users who haven't thought to run clients that log everything yet. ;) | 18:51 |
*** Cibo has joined #zuul | 18:52 | |
jlk | sure | 18:52 |
jlk | SpamapS: as a GH user, care to review https://review.openstack.org/#/c/517121/ ? | 18:52 |
jlk | jeblair: ^^ is ready for review | 18:53 |
SpamapS | jlk: oh yeah that one is cool :) | 18:54 |
SpamapS | even though I still don't have apps ;) | 18:54 |
jlk | ah right. | 18:54 |
jlk | I wonder if you'll get GraphQL before apps | 18:54 |
SpamapS | but I can review with hope for the future | 18:54 |
*** Cibo has quit IRC | 19:11 | |
SpamapS | Looks like we might have a GH bug: http://paste.openstack.org/show/625458/ | 19:19 |
*** weshay_brb is now known as weshay | 19:20 | |
*** electrofelix has quit IRC | 19:21 | |
jlk | interesting! | 19:32 |
jlk | we only seem to set that value in pull request style events | 19:33 |
jlk | what event was that? | 19:33 |
jlk | I'm guessing a push | 19:33 |
jlk | or... | 19:34 |
jlk | wow, why is it a ZuulTriggerEvent? | 19:35 |
SpamapS | 2017-11-03 12:18:43,376 DEBUG zuul.ZuulTrigger: onChangeEnqueued {'parent-change-enqueued'} | 19:49 |
SpamapS | 2017-11-03 12:18:43,376 DEBUG zuul.ZuulTrigger: Checking for changes needing <Change 0x7f2c1c016a90 20,472d642f38f232719e8d75ee15c87ac09d2fa2bd>: | 19:49 |
SpamapS | jlk: just added parent-change-enqueued as a trigger. :) | 19:50 |
jlk | okay, so that's the new thing we never hit in Bonny | 19:50 |
SpamapS | which is how you get dependencies to merge along with their parents | 19:50 |
jlk | interesting that the trigger object is a zuul trigger event, so it doesn't have that updated_at | 19:50 |
SpamapS | the source is zuul. :) | 19:51 |
SpamapS | so it makes sense | 19:51 |
jlk | It means we wouldn't be able to determine if it's an updateOf something else | 19:52 |
jlk | maybe what could be done is if updated_at isn't a key, create it and make it "now" | 19:53 |
*** dkranz has quit IRC | 20:41 | |
SpamapS | Yeah that's the way to go I think. | 20:50 |
SpamapS | Because that's pretty much the way it happened | 20:50 |
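jlk's fix in sketch form; a hypothetical helper, not the actual patch:

```python
# Hypothetical sketch of the suggestion above: zuul-generated events
# (e.g. parent-change-enqueued) carry no GitHub updated_at, so default
# it to "now" in GitHub's timestamp format before any updated_at check.
from datetime import datetime, timezone

def ensure_updated_at(event_data):
    if "updated_at" not in event_data:
        event_data["updated_at"] = datetime.now(timezone.utc).strftime(
            "%Y-%m-%dT%H:%M:%SZ")
    return event_data["updated_at"]
```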
SpamapS | so I have an interesting problem to deal with now.. hoping semaphore can help but I'm not sure. | 20:52 |
SpamapS | I have a pool of resources that are statically allocated to my CI (user accounts). Creating/deleting them is heavy so we don't want the churn. But the accounts give us isolation during a CI run... | 20:53 |
SpamapS | What I'm not sure about is how to hand out one account to only one job at a time. | 20:53 |
SpamapS | the semaphore will just tell me how many are concurrently running. But I kind of need something more like a resource pool. | 20:54 |
clarkb | could have a pre run on the executor do a checkout of the account and lock it | 20:55 |
clarkb | and do it all in ansible | 20:55 |
clarkb | though not sure if you can then modify the inventory | 20:56 |
clarkb | that might be the issue with this plan | 20:56 |
SpamapS | I think I need a state store for that. | 21:01 |
SpamapS | and I may need one, period. | 21:01 |
clarkb | ya I think that's a given if you need to coordinate arbitrary data among jobs? | 21:01 |
SpamapS | yeah I was just trying to think if I could use something in zuul already but I don't think I can. | 21:02 |
SpamapS | I honestly just need a pool of integers. | 21:02 |
SpamapS | since I could have an array of 10 user accounts and just pick using the integers as offsets. | 21:03 |
SpamapS | Feels like something zookeeper or etcd would do well. | 21:04 |
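One shape the "pool of integers" could take with the ZooKeeper zuul already runs, sketched with kazoo's lock recipe (hosts, paths, and account names below are placeholders):

```python
# Sketch: treat each integer slot as a kazoo lock; a non-blocking
# acquire either checks the slot out or moves on to the next one.
from kazoo.client import KazooClient

ACCOUNTS = ["ci-user-%d" % i for i in range(10)]  # placeholder accounts

zk = KazooClient(hosts="zk.example.com:2181")
zk.start()

def checkout_account(holder_id):
    """Return (account, lock) for a free slot, or (None, None) if all busy."""
    for i, account in enumerate(ACCOUNTS):
        lock = zk.Lock("/ci/accounts/%d" % i, identifier=holder_id)
        # Lock contenders are ephemeral znodes, so a crashed holder's
        # slot frees itself when its session expires.
        if lock.acquire(blocking=False):
            return account, lock
    return None, None

account, lock = checkout_account("build-1234")
if account:
    try:
        ...  # run the job as this account
    finally:
        lock.release()
```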
jeblair | things like this may or may not come under nodepool's remit. we're giving it more node types, and we may extend it to support more types of cloud resources. but this is perhaps more general. perhaps extending semaphores in zuul to include data, or making secrets reservable is another option. | 21:04 |
jeblair | SpamapS: but i agree, nothing off-the-shelf. | 21:05 |
clarkb | reservable secrets makes sense to me | 21:05 |
clarkb | "this is my trove cluster now" "this job has complete control of this k8s deployment" | 21:06 |
clarkb | maybe those aren't the best examples but ya | 21:06 |
jeblair | SpamapS: a pre-playbook could reserve it via zk or etcd, but you'd still need a way to put it back, and we need cleanup jobs (your idea!) for that. we haven't implemented them yet, but they're probably not *too* hard. | 21:06 |
SpamapS | jeblair: oh yeah I'd love it if nodepool could be extended to do this. | 21:07 |
SpamapS | For now I think what I'm going to do is just abuse the zookeeper we already have for zuul but with different creds and different "chroot" | 21:10 |
jeblair | SpamapS: yeah, if you can do it all within one playbook then an ephemeral zk node should work well | 21:12 |
SpamapS | should work OK if I start a daemon-ish process early in each playbook which holds the resource while the playbook runs. As long as the node timeout is longer than the lag between pre/run/post | 21:12 |
SpamapS | No I was thinking it would cross the boundaries but that as long as we refresh the ephemeral nodes in time it wouldn't be a problem. | 21:13 |
jeblair | SpamapS: i'm not sure that's going to work. once a playbook ends, bwrap will kill the process, and ephemeral nodes are tied to connections, so zk will soon realize that connection is dead and delete it. if you tried to lock it again, it could race with another connection (and using the standard zk locking algorithm, if there is a race, it will lose, because it'll be at the back of the line) | 21:18 |
SpamapS | jeblair: there's a timeout for re-establishing that dead connection, I was hoping we can make that long enough to survive pre to run. | 21:19 |
jeblair | SpamapS: yeah, but you won't be re-establishing it, you'll be making a new one | 21:19 |
SpamapS | Actually | 21:19 |
SpamapS | I have nodes | 21:19 |
jeblair | doing it on the node works :) | 21:19 |
SpamapS | And in my evil world, those nodes can reach my zk (or.. a ZK anyway) | 21:19 |
SpamapS | but gah | 21:20 |
SpamapS | this got evil | 21:20 |
SpamapS | Maybe I can just write it as a zuul feature. ;) | 21:20 |
SpamapS | Another thought is to just write a little API frontend. | 21:20 |
jeblair | you could implement cleanup jobs :) | 21:20 |
SpamapS | Yeah | 21:20 |
SpamapS | Not sure how much more I can invest in Zuul.. just getting it up and running has been a bit of a side-job. | 21:21 |
SpamapS | Right now my automation spins up the heavy accounts and deletes them. And that's ok because the job that needs those accounts runs like, 3 times a day. But the admins of that service have made it clear that the harder I push it, the more it will fail. | 21:22 |
jeblair | also, i wonder if another sort of job might be useful -- one that starts before any others in the buildset, runs for the duration of the buildset, then gets a signal (or is killed) when all others are done. so basically doing a pre-job + cleanup-job pair but with only one job. | 21:22 |
SpamapS | so the pool of accounts is something I'll have to do before I get more jobs on Zuul.. but it's also something I can't do until I get more buy-in for zuul ;-) | 21:22 |
SpamapS | jeblair: oh interesting! | 21:22 |
SpamapS | A supervisor job | 21:23 |
jeblair | good name | 21:23 |
jeblair | that's still probably about as much work as the cleanup job (they share enough maybe we can implement both at once) | 21:23 |
*** Cibo has joined #zuul | 21:23 | |
jeblair | SpamapS: i have a really hacky idea | 21:23 |
jeblair | this is terrible and no one should ever do it | 21:23 |
jeblair | but, you could probably make your own supervisor job just by having a long-running zero-node job that queries the zuul status.json to figure out when its peers are all done. | 21:24 |
jeblair | you'd still need to do some inter-job communication outside of zuul to coordinate things. but it's an option. | 21:25 |
jeblair | i'm not actually sure it's any less work than implementing cleanup job. but just brainstorming here. :) | 21:25 |
SpamapS | I like it | 21:27 |
SpamapS | But I think the trouble is still finding the coordination point. | 21:28 |
SpamapS | dunno, I've sort of backburnered it now as I think I'll have to let it sit and stew for the next 2 weeks while I get some other stuff done | 21:29 |
SpamapS | I'm sure if I was coming to Sydney I could have figured it out with all of you over fried Wallaby toes. | 21:29 |
SpamapS | jlk: have you ever succeeded in getting Zuul to dump debug logs about the github API requests it is making? | 21:35 |
SpamapS | I'd really like to include the ETags and stuff so GH support can debug with me. | 21:35 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Prime github app install map on connection load https://review.openstack.org/517121 | 21:50 |
*** harlowja has quit IRC | 22:12 | |
jlk | SpamapS: yeah I thought with a log config that was cranked up to debug you can get some | 23:01 |
jlk | oh, maybe not. I see "caching due to etag" but not what the etag itself is | 23:02 |
jlk | so maybe it's more github3.py debugging | 23:02 |
jlk | let me see if I can insert a debugger here and get data | 23:03 |
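Not a zuul knob, just the generic debugging switch for the requests/urllib3 stack that github3.py sits on; it prints the raw request and response headers, ETags included:

```python
# Generic HTTP-layer debugging for anything built on requests/urllib3.
import http.client
import logging

http.client.HTTPConnection.debuglevel = 1  # echo raw headers to stdout
logging.basicConfig(level=logging.DEBUG)   # let urllib3's DEBUG records out
logging.getLogger("urllib3").setLevel(logging.DEBUG)
```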
*** harlowja has joined #zuul | 23:04 | |
openstackgerrit | Emilien Macchi proposed openstack-infra/zuul-jobs master: version-from-git: fix logic with tags https://review.openstack.org/517733 | 23:05 |
*** Cibo has quit IRC | 23:39 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: version-from-git: fix logic with tags https://review.openstack.org/517733 | 23:48 |
*** hashar has quit IRC | 23:55 |