clarkb | ssh happens in a forked process so I don't think paramiko updating would affect that | 00:00 |
SpamapS | AFAIK we don't use paramiko | 00:00 |
clarkb | we do just at the beginning for host key handling | 00:01 |
SpamapS | yeah just confirmed, ansible is using ssh | 00:01 |
SpamapS | ah dunno about that | 00:01 |
SpamapS | but I see what you're saying | 00:01 |
clarkb | but ya ansible is forked and uses openssh by default | 00:01 |
SpamapS | nodepool might have torched the node | 00:01 |
SpamapS | because my deploy job is just a job that runs on a bastion | 00:01 |
SpamapS | the bastion being a regular node | 00:01 |
SpamapS | hrm.. hard to find the node's real hostname anywhere | 00:04 |
SpamapS | since it was a post_failure | 00:04 |
SpamapS | no logs were saved | 00:04 |
SpamapS | which kinda sucks.. probably bad form on my post playbook part | 00:04 |
clarkb | connectivity problems do make this difficult | 00:04 |
clarkb | I say as I need to address my derp home networking. Warm weather seems to have made my office's wireless bridge device unhappy | 00:05 |
SpamapS | nodepool did delete the node out from under the job | 00:07 |
SpamapS | I wonder if I *am* restarting zookeeper or something | 00:07 |
*** rlandy has quit IRC | 00:08 | |
SpamapS | 2018-04-26 15:45:23,507 DEBUG zuul.AnsibleJob: [build: f287dae10e8d4b98a70a97b21f7f021c] Ansible output: b'RUNNING HANDLER [zookeeper : Restart zookeeper] ********************************' | 00:09 |
SpamapS | yep | 00:09 |
SpamapS | restarted it, which presumably caused the lock to be lost | 00:09 |
SpamapS | well now at least I know | 00:09 |
SpamapS | - meta: flush_handlers | 00:13 |
SpamapS | >:| | 00:13 |
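(Editor's note: the failure above is the classic handler-flush pitfall. A minimal sketch, assuming a role with a `Restart zookeeper` handler: `- meta: flush_handlers` runs any notified handlers immediately, mid-play, instead of at the end of the play, which is how the deploy job restarted the very ZooKeeper holding nodepool's node lock.)

```yaml
# Hypothetical excerpt of such a deploy playbook: the config change notifies
# a restart handler defined in the zookeeper role, and flush_handlers fires
# it immediately.
- name: Install zookeeper config
  template:
    src: zoo.cfg.j2
    dest: /etc/zookeeper/conf/zoo.cfg
  notify: Restart zookeeper

# Runs all notified handlers right here, mid-play; on a host that backs the
# nodepool/zuul ZooKeeper, this drops client sessions and their locks.
- meta: flush_handlers
```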
tristanC | fdegir: zuul.rpm only contains the cli and the module... the doc, webui and services are sub packages. you can get them all using "yum install rh-python35-zuul-*" | 00:25 |
tristanC | clarkb: the lock should survive a zookeeper restart if the client reconnects before the session timeout | 00:39 |
clarkb | tristanC: it may actually happen because nodepool sees all the nodes as aliens if zk isn't responding? | 01:16 |
*** harlowja has quit IRC | 01:23 | |
tristanC | clarkb: can't find that behavior in the launcher code, maybe this happens if a zk call is executed when the service is down | 01:35 |
clarkb | corvus: so I don't forget: your changes to config loading probably deserve a release note | 02:16 |
SpamapS | http://paste.openstack.org/show/719983/ | 03:41 |
SpamapS | Been getting these a lot | 03:41 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul master: Sometimes GitHub doesn't return repo permissions https://review.openstack.org/564666 | 03:54 |
SpamapS | ^^ looks like a simple case of assuming the latest version of an API that isn't stable. | 03:54 |
SpamapS | Heh in fact, looks like GHE 2.13 doesn't even have /collaborators | 03:56 |
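(Editor's note: the fix in 564666 comes down to not assuming the newest shape of the API response. A minimal illustration of the defensive pattern, with `repo_data` standing in for the decoded GitHub response; the names here are assumptions, not Zuul's actual code.)

```python
# Older GitHub Enterprise releases (e.g. 2.13) may omit the "permissions"
# key entirely, so default to no access instead of raising KeyError.
permissions = repo_data.get('permissions', {})
can_push = permissions.get('push', False)
```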
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: mqtt: add basic reporter https://review.openstack.org/535543 | 04:13 |
SpamapS | hrm | 04:36 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: mqtt: add basic reporter https://review.openstack.org/535543 | 04:40 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul-jobs master: Make revoke-sudo work on base cloud-init images https://review.openstack.org/564674 | 04:45 |
SpamapS | ^^ FYI, I want this for our internal cloud tests here at GD, because I want to run things like tox/flake8/etc. with the exact image that most of our users use.. | 04:46 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: mqtt: add basic reporter https://review.openstack.org/535543 | 05:06 |
*** swest has joined #zuul | 05:12 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add OpenAPI documentation https://review.openstack.org/535541 | 05:52 |
SpamapS | hrm, how does ensure-tox work exactly? it installs tox with --user ... but .local/bin is only added to path on login shells.. which you don't get with the command: module. | 05:57 |
SpamapS | Guessing I need to start installing tox without --user | 05:57 |
tristanC | SpamapS: .local/bin could be added to the environment, like so: https://review.openstack.org/#/c/532083/7/roles/ansible-lint/tasks/main.yaml | 06:08 |
SpamapS | tristanC: yeah, it could. But it's not yet. ;) | 06:33 |
SpamapS | and I believe this works fine because tox is pre-installed on custom images | 06:33 |
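(Editor's note: a hedged sketch of the workaround tristanC links above: extend PATH for the task so a `pip install --user` tox is found even in the non-login shell the command module uses. The tox env name is illustrative.)

```yaml
# Make ~/.local/bin visible to this task only; non-login shells spawned by
# the command module do not source the profile that would normally add it.
- name: Run tox
  command: tox -e linters
  environment:
    PATH: "{{ ansible_env.HOME }}/.local/bin:{{ ansible_env.PATH }}"
```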
*** yolanda__ is now known as yolanda | 06:59 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool master: builder: support setting diskimage env-vars in secure configuration https://review.openstack.org/564687 | 07:13 |
*** xinliang has quit IRC | 07:14 | |
*** xinliang has joined #zuul | 07:15 | |
*** ssbarnea_ has joined #zuul | 07:45 | |
*** hashar has joined #zuul | 07:47 | |
*** jamesblonde has joined #zuul | 07:50 | |
*** jpena|off is now known as jpena | 07:52 | |
jamesblonde | hello :) are there people around to answer my questions ? | 07:54 |
tobiash | jamesblonde: just post your question, but note that most people here are located in us timezones | 07:59 |
jamesblonde | that's why I asked, so I will try to stay tuned. My question is: how is nodepool connected to jenkins ? | 08:18 |
openstackgerrit | Matthieu Huin proposed openstack-infra/nodepool master: Add separate modules for management commands https://review.openstack.org/536303 | 08:28 |
openstackgerrit | Matthieu Huin proposed openstack-infra/nodepool master: Add separate modules for management commands https://review.openstack.org/536303 | 08:37 |
jamesblonde | and what is the difference between Zuul launcher + Zuul trigger (v2) and Zuul executor (v3)? were both replaced by it ? | 08:49 |
tobiash | jamesblonde: nodepool v2 or v3? | 09:05 |
tobiash | v3 has no connection to jenkins (as there is no jenkins with zuul v3) | 09:06 |
tobiash | jamesblonde: zuul launcher (v2) was replaced by zuul executor (v3) | 09:07 |
tobiash | jamesblonde: not sure what you mean with zuul trigger (v2) | 09:08 |
*** jamesblonde has quit IRC | 09:10 | |
openstackgerrit | Matthieu Huin proposed openstack-infra/zuul master: zuul web: add admin endpoint, enqueue & autohold commands https://review.openstack.org/539004 | 10:23 |
*** CrayZee has joined #zuul | 10:27 | |
openstackgerrit | Matthieu Huin proposed openstack-infra/zuul master: zuul web: add admin endpoint, enqueue & autohold commands https://review.openstack.org/539004 | 10:30 |
openstackgerrit | Matthieu Huin proposed openstack-infra/zuul master: zuul web: add admin endpoint, enqueue & autohold commands https://review.openstack.org/539004 | 10:45 |
openstackgerrit | Matthieu Huin proposed openstack-infra/zuul master: zuul web: add admin endpoint, enqueue & autohold commands https://review.openstack.org/539004 | 11:19 |
*** jpena is now known as jpena|lunch | 11:56 | |
*** ssbarnea_ has quit IRC | 12:06 | |
*** ssbarnea_ has joined #zuul | 12:07 | |
*** ssbarnea_ has quit IRC | 12:09 | |
*** ssbarnea_ has joined #zuul | 12:13 | |
mordred | tobiash: perhaps we need a FAQ entry for v2 -> v3 migrations - I think this is the second day a similar question has been asked about launchers/executors - might be nice if we had a short page "so you're already running a zuul v2 and looking to upgrade" | 12:18 |
tobiash | mordred: good idea | 12:18 |
mordred | because they're certainly fair questions | 12:18 |
tobiash | mordred: so you're back from traveling hell? | 12:18 |
tobiash | ;) | 12:19 |
*** ssbarnea_ has quit IRC | 12:19 | |
mordred | tobiash: yes! | 12:19 |
mordred | tobiash: my couch at home is much more comfortable than places that are not my couch at home | 12:20 |
*** jamesblonde has joined #zuul | 12:20 | |
tobiash | mordred: I can imagine that | 12:20 |
jamesblonde | the nodepool that comes with zuul v3 ? | 12:21 |
tobiash | jamesblonde: the nodepool that comes with v3 has no linkage to jenkins | 12:21 |
*** ssbarnea_ has joined #zuul | 12:21 | |
tobiash | jamesblonde: as jenkins is replaced in v3 by zuul-executor | 12:21 |
jamesblonde | ok got it, and what if i want to keep using jenkins with gearman plugin ? should I keep executors ? | 12:22 |
mordred | jamesblonde: you should not upgrade to zuul v3 at the moment if you want to keep using jenkins. however, there are a few people - electrofelix is one - who have been working on zuul v3 + jenkins | 12:23 |
jamesblonde | That's my thought right now. Is there a particular reason ? Not tested yet ? | 12:24 |
tobiash | jamesblonde: the data driven architecture has been changed | 12:24 |
tobiash | jamesblonde: the sources are now pushed to the nodes by the executor | 12:25 |
tobiash | jamesblonde: the merger doesn't serve any repos anymore | 12:25 |
pabelanger | mordred: jamesblonde: tobiash: I was recently pointed to https://github.com/jenkinsci/nodepool-agents-plugin for nodepoolv3 and jenkins | 12:25 |
pabelanger | I believe it is coming out of rackspace | 12:25 |
mordred | pabelanger: neat! yah - that's hughsaunders and odyssey4me | 12:26 |
pabelanger | yar | 12:26 |
jamesblonde | Yes, instead of pulling it with jenkins... I am asking because we don't want to use an openstack cloud but wanted to migrate to zuul v3. | 12:26 |
mordred | jamesblonde: oh - well, you don't have to use an openstack cloud with v3 | 12:27 |
odyssey4me | yep, I didn't do the development - that's down to hughsaunders and some other team members... I'm just a tester :) | 12:27 |
jamesblonde | Thanks for your recommendation, i'll check the repo and keep electrofelix in mind | 12:27 |
mordred | jamesblonde: zuulv3 has direct support for pre-defined static nodes | 12:27 |
mordred | as well as a growing number of non-openstack node providers | 12:27 |
mordred | so if that's the reason you wanted to keep your jenkins - we've got you covered :) | 12:27 |
mordred | I should say - nodepool v3 has direct support for static nodes as well as a growing number of non-openstack dynamic node providers | 12:28 |
mordred | zuul v3 has support for whatever nodepool gives it :) | 12:28 |
jamesblonde | So that would be the best for us | 12:28 |
mordred | \o/ | 12:28 |
mordred | odyssey4me: 'just a tester' | 12:29 |
jamesblonde | And in this case zuul executor is not needed like in the v2 ? Should I keep my zuul launcher & trigger ? | 12:30 |
pabelanger | right, zuul-executor will only work with zuulv3 | 12:30 |
mordred | wait - I think y'all just said different things | 12:30 |
mordred | in v3 you need a zuul-scheduler and at least one executor | 12:31 |
mordred | and a nodepool | 12:31 |
jamesblonde | That's what I want, use zuul v3 but using another node pool manager | 12:31 |
pabelanger | ah, yes. I was only focusing on zuul-launcher / zuul-executor part | 12:31 |
mordred | yah - so that's not really a thing with zuul v3 | 12:31 |
mordred | zuul v3 gets nodes from nodepool | 12:32 |
mordred | if you want to use zuul v3 and the nodes are somewhere, the best bet would be to write a plugin for nodepool to get nodes from whatever is managing them | 12:32 |
jamesblonde | ok so zuul v3 is only pre-configured to work with nodepool | 12:32 |
tobiash | jamesblonde: yes, so essentially nodepool is actually a mandatory part of zuul now | 12:33 |
mordred | yes | 12:33 |
mordred | but - it itself is pluggable - so nodepool should be able to get nodes from whatever system - be it static or openstack or ec2 or something homegrown | 12:34 |
tobiash | jamesblonde: but as mordred said nodepool also can manage a pool of e.g. statically defined nodes | 12:34 |
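(Editor's note: for reference, a minimal sketch of a static-node pool in nodepool v3 configuration; the hostnames, label, and key are placeholders, and the exact schema should be checked against the nodepool docs for your version.)

```yaml
providers:
  - name: static-rack
    driver: static
    pools:
      - name: main
        nodes:
          - name: node01.example.com     # pre-existing machine, not launched
            labels: centos-7-static      # label jobs request via nodesets
            username: zuul
            host-key: "ssh-rsa AAAA..."  # placeholder scanned host key
```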
electrofelix | jamesblonde: from my testing so far the upgrade path will be to migrate to nodepool v3 with jenkins; hughsaunders is the person to chat to about the plugin. once we've had a chance to migrate locally ourselves, we're hoping to help him with that plugin and subsequently a zuul-trigger plugin to allow zuul v3 -> jenkins communication | 12:35 |
jamesblonde | ok i am going to think about writing such a plugin | 12:35 |
tobiash | jamesblonde: do you use a system for dynamic node provisioning? | 12:36 |
electrofelix | jamesblonde: I'd get nodepoolv3 integrated with jenkins as a first pass | 12:37 |
mordred | electrofelix: I think the issue is that the reason they were having jenkins in the mix was to avoid nodepool since they have jenkins getting nodes from somewhere else | 12:37 |
jamesblonde | we use the AWS cloud only, but it is more about VM instances | 12:37 |
mordred | jamesblonde: there is an ec2 driver up for review already actually | 12:37 |
mordred | jamesblonde: https://review.openstack.org/#/c/535558/ | 12:38 |
pabelanger | yah, best to talk with tristanC about nodepool drivers, he writes them in his sleep :) | 12:38 |
jamesblonde | my goal is to use a Zuul v2 like behavior with a set of dynamic nodes to manage ephemeral resources (because today we have 5 fulltime jenkins masters running) | 12:38 |
tobiash | jamesblonde: in this case I think you want v3 without jenkins and with nodepool and https://review.openstack.org/535558 | 12:39 |
*** rlandy has joined #zuul | 12:39 | |
jamesblonde | I think that's exactly what we are looking for | 12:40 |
mordred | sweet | 12:41 |
jamesblonde | so I was wrong to think that nodepool was made for OpenStack-based clouds => as you can read here https://docs.openstack.org/infra/nodepool/ "It is designed to work with any OpenStack based cloud," | 12:41 |
mordred | oh. heh. good call! | 12:41 |
jamesblonde | (i am french so that sentence made me think I had to use an openstack based cloud, and not standalone physical or virtual machines) | 12:42 |
openstackgerrit | Monty Taylor proposed openstack-infra/nodepool master: Clarify in doc introduction that OpenStack is not required https://review.openstack.org/564746 | 12:44 |
mordred | jamesblonde: ^^ maybe that will prevent such confusion next time | 12:44 |
jamesblonde | oh indeed | 12:45 |
jamesblonde | yes i did not see that one | 12:45 |
jamesblonde | good review btw ;) | 12:45 |
mordred | \o/ | 12:46 |
jamesblonde | It is clearer in my brain now ^^ i am going to test it using the aws driver | 12:46 |
mordred | jamesblonde: sweet. I think tristanC has used that for some things, so it should work, but it's also new, so please let us know if you have any issues with it | 12:47 |
jamesblonde | of course I'll come back and write some doc too | 12:48 |
jamesblonde | (if i plan to use it | 12:48 |
Shrews | i think the proposed aws driver is fairly limited | 12:48 |
Shrews | more of a WIP | 12:49 |
mordred | Shrews: you're more of a WIP | 12:50 |
jamesblonde | i will let you know but could be a good idea to contribute on it | 12:50 |
mordred | ++ that would be very welcome | 12:50 |
Shrews | /ignore mordred --reason "just cause" | 12:50 |
*** jpena|lunch is now known as jpena | 12:54 | |
openstackgerrit | Merged openstack-infra/zuul master: Fix zuul home directory in zuul from scratch document https://review.openstack.org/564386 | 12:57 |
openstackgerrit | Merged openstack-infra/nodepool master: Clarify in doc introduction that OpenStack is not required https://review.openstack.org/564746 | 12:57 |
*** dkranz has joined #zuul | 13:14 | |
SpamapS | Even if it is limited.. | 13:49 |
SpamapS | It's needed. | 13:49 |
SpamapS | And you have to start somewhere. | 13:49 |
SpamapS | What's blocking it currently? | 13:49 |
* SpamapS going through the review slowly | 13:49 | |
SpamapS | also, if we do want people to write drivers, https://review.openstack.org/#/c/535555/ is critical | 13:50 |
SpamapS | (and has two +2's.. so...) | 13:50 |
SpamapS | Make that 3 | 13:53 |
SpamapS | Shrews: was there some unstated reason we haven't landed 535555? | 13:53 |
SpamapS | Actually it just looks like it's been sitting ready to ship for a few days, so, +3'd | 13:54 |
Shrews | SpamapS: nope | 13:55 |
SpamapS | werd | 14:02 |
Shrews | i sort of want to make the drivers pluggable (except for openstack and maybe static) so others don't need to wait on nodepool releases to get the latest and greatest driver | 14:04 |
Shrews | i wonder how others feel about that though | 14:04 |
Shrews | also, i don't really want to review AWS changes :) | 14:04 |
openstackgerrit | Merged openstack-infra/nodepool master: Refactor NodeLauncher to be generic https://review.openstack.org/535555 | 14:05 |
Shrews | or VMWare changes | 14:07 |
Shrews | or Azure changes | 14:07 |
Shrews | etc | 14:07 |
mordred | Shrews: well - I'd agree, except the horizon/neutron plugin testing mess makes me think twice about that | 14:08 |
Shrews | mordred: i am not aware of the details there | 14:08 |
mordred | Shrews: it's solvable - but basically the out of tree driver needs the thing it's a driver for in order to test - it's probably fine for us since we release frequently | 14:09 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool master: Add fedora-28 to nodepool dsvm https://review.openstack.org/559211 | 14:09 |
Shrews | mordred: well, i mean, the shifting of responsibility to make sure it works with nodepool is the main reason i'd like it pluggable | 14:11 |
Shrews | because we can't test with any other provider ourselves | 14:11 |
Shrews | so we'd just be guessing | 14:11 |
Shrews | but if a driver author has those resources to test... great | 14:11 |
mordred | nod. yeah - it's a topic we should certainly consider how to deal with | 14:12 |
Shrews | i just imagine someone coming to #zuul and saying "hey, the aws driver doesn't do this thing" | 14:12 |
Shrews | how can we (other than tristanC) test and fix? | 14:13 |
mordred | Shrews: that said, I bet markatwood would give us quota to test the ec2 driver | 14:13 |
Shrews | mordred: there's still vmware, azure, kubernetes, some-other-latest-greatest-thing | 14:13 |
Shrews | i guess i've already decided which way *I* lean on this :) | 14:14 |
mordred | vmware is the only one f those that seems problematic though - since we'd have to install vmware and that would suck | 14:14 |
mordred | Shrews: hehe | 14:14 |
pabelanger | it would be great for driver authors to set up third-party CI on the nodepool driver somehow | 14:14 |
pabelanger | and report results | 14:14 |
rcarrillocruz | problem is on clouds/products that are not free | 14:15 |
pabelanger | unless openstack-infra gets credentials to azure / aws | 14:15 |
Shrews | it's not just access... it's a working knowledge of the thing | 14:15 |
rcarrillocruz | like, tristan developed his aws driver by using the free tier account | 14:15 |
rcarrillocruz | but that goes away after a year methinks | 14:15 |
pabelanger | yah | 14:15 |
mordred | I think my concern is that I don't want to wind up with key things in a contrib ghetto | 14:15 |
Shrews | e.g., i don't have any desire to learn vmware | 14:15 |
rcarrillocruz | lol | 14:15 |
mordred | however, whatever we can do to make sure that they're in good shape and reasonable for people to use, I'm in favor of | 14:16 |
pabelanger | Shrews: easy, nova vmware driver | 14:16 |
mordred | pabelanger: ;) | 14:16 |
pabelanger | monies please | 14:16 |
rcarrillocruz | this is what ansible folks use to test vmware modules, https://github.com/vmware/govmomi/tree/master/vcsim , but yeah, i hear what Shrews says about 'knowing everything about all drivers to review them' | 14:16 |
pabelanger | :D | 14:16 |
mordred | yah. the ansible community choice to empower driver authors to care about their driver is more scalable than them all having to learn all of the drivers | 14:17 |
mordred | so it might be more a matter of figuring out where the line is: which drivers we think are important enough that we should collectively learn something about them | 14:17 |
mordred | and also have a mechanism for people who want to care and feed for a driver that we can not care about | 14:18 |
Shrews | i think this warrants a ML discussion. i can start that up | 14:18 |
mordred | coolio | 14:19 |
mordred | cause I think the major cloud providers (other than openstack of course) - ec2, gce and azure - are ones we should have out of the box support for - just like having out of the box support for github for zuul | 14:20 |
mordred | now - the others - the digital oceans and mac stadiums - the line gets much more blurry for me | 14:20 |
Shrews | i think the line needs to be drawn on what we actively test | 14:21 |
Shrews | not on popularity | 14:22 |
Shrews | but i'll put that in the initial email | 14:22 |
tobiash | Shrews, mordred: the pluggable driver interface was discussed a few months ago and the decision at that time was that we want such a thing but need time to stabilize the driver api first before making that public | 14:25 |
mordred | tobiash: ++ | 14:26 |
mordred | Shrews: oh totally - but I think we should actively test ec2, azure and gce in addition to openstack | 14:26 |
tobiash | I think corvus wanted to land a few more drivers before making that step to get real experiences | 14:27 |
mordred | (assuming, of course, we can get donated quota to do such a thing) | 14:27 |
tobiash | so maybe we want to wait until some of tristanC's drivers landed to validate that the internal api works and can be published | 14:27 |
Shrews | mordred: if we can actively test them, i'm more ameniable to having them in-tree | 14:28 |
Shrews | amenable | 14:29 |
Shrews | words are hard | 14:29 |
* mordred hands Shrews a box of ameniable rhinocerouses | 14:29 | |
Shrews | mmm, yummy | 14:29 |
dmsimard | mordred: I think I found a bug in the zuul UI ? If I go here: http://zuul.openstack.org/jobs.html and then ctrl+f our oddly specific job "legacy-grenade-dsvm-cinder-mn-sub-volbak", clicking on the "builds" link changes the link in the address bar to http://zuul.openstack.org/builds.html?job_name=legacy-grenade-dsvm-cinder-mn-sub-volbak but it doesn't actually refresh the page to go to the builds for that job. | 14:32 |
rcarrillocruz | we could team up with ansible/ansible to see if they could donate us 'some' quota for those providers | 14:33 |
rcarrillocruz | hint hint | 14:33 |
mordred | dmsimard: that doesn't seem awesome | 14:36 |
mordred | dmsimard: although I need to finish the angular5 patch (one more thing outstanding) - so let's check it against that (tracking it down in the current code is likely not going to be the world's most fun thing) | 14:37 |
dmsimard | mordred: np | 14:38 |
mordred | dmsimard: http://logs.openstack.org/89/551989/31/check/zuul-build-dashboard/f6d6097/npm/html/builds.html?job_name=legacy-grenade-dsvm-cinder-mn-sub-volbak <-- worked on top of the angular5 patch | 14:39 |
mordred | dmsimard: so - I think I've fixed your bug in an upcoming patch | 14:39 |
dmsimard | mordred: going to that URL directly works | 14:39 |
dmsimard | mordred: it's clicking on the builds link from the jobs page that doesn't, let me try | 14:39 |
mordred | dmsimard: ya - but I got to that by following your process | 14:39 |
mordred | http://logs.openstack.org/89/551989/31/check/zuul-build-dashboard/f6d6097/npm/html/jobs.html | 14:40 |
dmsimard | ah, ++ | 14:40 |
dmsimard | mordred: you're so good you fix problems you didn't even know you had :) | 14:40 |
mordred | unfortunately I have a half-done fix for a different problem sitting on that patch locally - but I haven't touched it in a week so I don't remember what the problem was anymore | 14:40 |
*** jimi|ansible has quit IRC | 14:42 | |
corvus | mordred, tobiash, Shrews: i very much think that a reasonable set of popular drivers should be in-tree in order to be useful for users. and yes, they should be tested, though i'm not sure they always need to tested against live systems -- betamax/mocks/fakes may be enough in some circumstances. as core reviewers we don't need to know everything about them. we need to make a good api interface so that | 14:50 |
corvus | people who do know about them can maintain them. | 14:50 |
Shrews | i don't understand the reasoning that having them in-tree makes it more useful | 14:51 |
Shrews | it may make it simpler | 14:52 |
Shrews | email just sent, btw | 14:52 |
*** acozine1 has joined #zuul | 14:53 | |
corvus | Shrews: yes, simpler is useful | 14:53 |
Shrews | and i fear the "drive-by" driver contribution. we accept a new driver, but the author then disappears and doesn't maintain it | 14:54 |
corvus | Shrews: having them in or out of tree has no impact on that. if there's no one to maintain it, it's dead either way. | 14:55 |
corvus | Shrews: we need to be responsible for nodepool being usable and functional; it's too important for us to outsource that. | 14:56 |
corvus | Shrews: i'm not saying *all* drivers need to be in-tree | 14:56 |
corvus | Shrews: but most of the ones on the table so far should be, because they're all pretty major players. | 14:57 |
corvus | (i'm fine with creating an out-of-tree driver interface after we have openstack/aws/k8s/... in tree) | 14:58 |
*** gtema has joined #zuul | 15:09 | |
*** jimi|ansible has joined #zuul | 15:12 | |
*** jimi|ansible has joined #zuul | 15:12 | |
openstackgerrit | Matthieu Huin proposed openstack-infra/zuul master: zuul web: add admin endpoint, enqueue & autohold commands https://review.openstack.org/539004 | 15:29 |
*** myoung is now known as myoung|email-unl | 15:32 | |
*** myoung|email-unl is now known as myoung|emailplz | 15:32 | |
corvus | how about we merge my config changes? | 15:34 |
corvus | i hit w+1 on the ones that lacked it | 15:34 |
corvus | maybe we can restart openstack-infra with them today and see how they perform | 15:35 |
clarkb | corvus: and watch cacti memory graphs | 15:37 |
corvus | clarkb: oh, also, i agree about a release note | 15:38 |
jimi|ansible | mordred / Shrews : does zuul support restart sub-jobs yet? having a discussion about our current CI and figured if zuul doesn't do this yet we should probably start pestering you for it now :) | 15:40 |
clarkb | jimi|ansible: you are describing a feature that would let you tell zuul it is ok for a job to run up to N times before succeeding, and if it eventually succeeds, treat it as a success? | 15:43 |
clarkb | or is this restart in another context? | 15:43 |
mordred | clarkb: I think this is "recheck specific-job" | 15:46 |
clarkb | ah | 15:46 |
jimi|ansible | yeah just restart a job due to transient network/etc. failures | 15:48 |
jimi|ansible | for example, in ansible we do integration tests across all the distros, and quite often we'll see failures on ubuntu or fedora for example due to failures in the apt/yum/dnf/whatever tests because the remote resource had an issue | 15:49 |
jimi|ansible | so rather than re-run the entire test suite just restart that sub-job | 15:49 |
openstackgerrit | Merged openstack-infra/zuul master: Don't store references to secret objects from jobs https://review.openstack.org/553596 | 15:50 |
gtema | sorry for the stupid question. When I install a fresh nodepool and configure a static pool with 1 host, should 'nodepool list' show this node? I'm trying to install an on-premise zuul but struggling here. I see that nodepool tries to login to the host upon service restart, but it fails and no proper log information is available | 15:55 |
*** jamesblonde has quit IRC | 15:55 | |
*** hashar is now known as hasharAway | 15:56 | |
clarkb | gtema: reading the code it looks like static node info isn't written into zookeeper until first use, and zookeeper's node records are where `nodepool list` output comes from | 16:01 |
corvus | gtema: i believe it should not appear in the list. personally, i think it should, but the driver isn't implemented that way. | 16:02 |
corvus | i'd like us to revisit that. | 16:02 |
clarkb | corvus: ++ it appears that when the static nodes are launched() their records are written; we can probably just write records for all of them on startup, then launch will update the status? | 16:02 |
corvus | yeah, it seems like it should be possible | 16:03 |
tobiash | corvus: maybe you want to rebase the timeout fix to the start of your stack to minimize rechecks ;) | 16:03 |
corvus | i think we talked about it in review; i'm not sure why it didn't work out | 16:03 |
*** hasharAway is now known as hashar | 16:03 | |
corvus | tobiash: yeah, now that i've incurred the cost, i'll do that :) | 16:03 |
clarkb | gtema: as for debugging the ssh, I don't know that nodepool actually tries to login, but it will ask the remote node for its ssh hostkey | 16:04 |
clarkb | gtema: what logs do you have? | 16:04 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Allow extra time for some ansible tests https://review.openstack.org/564572 | 16:05 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Fix race in test_bubblewrap_leak https://review.openstack.org/564640 | 16:05 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Perform late validation of secrets https://review.openstack.org/553041 | 16:05 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Perform late validation of nodesets https://review.openstack.org/553088 | 16:05 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Late bind projects https://review.openstack.org/553618 | 16:05 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Make config objects freezable https://review.openstack.org/562816 | 16:05 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Remove layout from ParseContext https://review.openstack.org/563695 | 16:05 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Remove 'base' from UnparsedAbideConfig https://review.openstack.org/563757 | 16:05 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Change TestMaxTimeout to not run ansible https://review.openstack.org/564562 | 16:05 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Store source context on all config objects https://review.openstack.org/564563 | 16:05 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Cache configuration objects in addition to YAML dicts https://review.openstack.org/564061 | 16:05 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Stop deep-copying job variables https://review.openstack.org/564564 | 16:05 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Remove source_context argument to Pipeline https://review.openstack.org/564642 | 16:05 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Simplify UnparsedConfig.copy https://review.openstack.org/564647 | 16:05 |
corvus | all rebase ^ | 16:05 |
clarkb | corvus: I'm assuming that gerrit will reapply our +2 in most (all) changes since the thing changing was two lines in the tests | 16:05 |
clarkb | corvus: let me know if I need to rereview something | 16:06 |
gtema | only by switching nodepool to debug, manually changing the 'normal' handler in logconfig.py to DEBUG | 16:06 |
corvus | (we need to make that a command line argument) ^ | 16:06 |
clarkb | gtema: can you share those logs with a paste service so that we can see what it is doing? | 16:06 |
gtema | clarkb: and on the target host failed attempts from audit.log | 16:06 |
gtema | clarkb: https://pastebin.com/H9CdA3Vi - nodepool.log | 16:09 |
gtema | clarkb: immediately after restart in the /var/log/messages: https://pastebin.com/S05KiRCf | 16:12 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Add regex support to project stanzas https://review.openstack.org/535713 | 16:12 |
tobiash | clarkb, corvus: rebased ^ to match your stack | 16:13 |
corvus | tobiash: thanks and sorry | 16:13 |
tobiash | I just hit the rebase button ;) | 16:14 |
tobiash | votes are retained | 16:14 |
clarkb | gtema: my initial reading of the nodepool logs is that nothing is wrong, nodepool creates the records it needs then is waiting for node requests from zuul | 16:14 |
tobiash | corvus: shall we +w this too or do you want a further review on that? | 16:14 |
corvus | tobiash: let me sanity check it in the new context | 16:14 |
clarkb | and I don't see whee nodepool would be logging into the remote host, it definitely does a keyscan though | 16:14 |
tobiash | ok | 16:14 |
clarkb | oh wait its gonna do the ready check isn't it /me digs more | 16:15 |
tobiash | clarkb: it does a keyscan during reconfig | 16:15 |
gtema | clarkb: ok, thanks. I was confused that the nodes are not listed. Will continue the zuul setup. But would those nodes be listed only while tasks are executing there, or permanently after the first task was executed? | 16:15 |
corvus | gtema: only when tasks are executed, i believe | 16:16 |
gtema | clarkb: ok, thanks | 16:16 |
clarkb | tobiash: ya I see the keyscanning. gtema best guess is that the keyscan implementation attempts to do a login to get the key(s)? | 16:16 |
tobiash | clarkb: no, it just does a keyscan | 16:17 |
tobiash | so it can hand them over to zuul | 16:17 |
clarkb | there is definitely a paramiko.start_client then client.get_remote_server_key | 16:17 |
clarkb | unsure if the start_client will attempt a login? | 16:17 |
clarkb | or at least appear that way from audit.log's perspective | 16:17 |
corvus | clarkb: maybe any ssh connection that doesn't end with a login is a "login failed" ? | 16:18 |
corvus | from sshd's pov | 16:18 |
clarkb | corvus: ya | 16:18 |
clarkb | also no account info in that logged entry | 16:18 |
clarkb | which lines up with I just made keyscan | 16:18 |
corvus | gtema: so best guess is that everything's working okay, and if you continue with zuul setup so it requests a static node, it should (hopefully) work | 16:19 |
gtema | ok, thanks | 16:19 |
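(Editor's note: what the keyscan amounts to, roughly: open an SSH transport, read the server's host key, and hang up without ever authenticating, which is why sshd's audit log shows a connection but no login. A hedged sketch with paramiko; host and port are assumed.)

```python
import socket

import paramiko

# Perform only the SSH handshake and grab the host key; never authenticate.
sock = socket.create_connection(('node01.example.com', 22), timeout=10)
transport = paramiko.Transport(sock)
transport.start_client()
key = transport.get_remote_server_key()
print(key.get_name(), key.get_base64())
transport.close()
```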
tobiash | corvus: I think I'll rebase the regex change on top of your complete stack, currently it's somewhere in the middle | 16:22 |
corvus | tobiash: sounds good | 16:22 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Add allowed-triggers and allowed-reporters tenant settings https://review.openstack.org/554082 | 16:31 |
openstackgerrit | Merged openstack-infra/zuul master: Allow extra time for some ansible tests https://review.openstack.org/564572 | 16:44 |
openstackgerrit | Merged openstack-infra/zuul master: Fix race in test_bubblewrap_leak https://review.openstack.org/564640 | 16:45 |
tobiash | hrm, everything broken, I think I have to restructure the regex change | 16:47 |
clarkb | oh ya types move and stuff | 16:47 |
corvus | tobiash: yeah, that's what i was worried about; let me know if you need help. | 16:47 |
tobiash | corvus: just some guidance about the way to choose, in UnparsedConfig.copy | 16:48 |
tobiash | first choice is to keep the regex projects grouped by regex, but then it would be something extra during copy | 16:49 |
tobiash | or making that a list and group them by regex in tenantparser._addlayoutitem | 16:50 |
tobiash | I'm leaning towards option 2 even if that may incur a slight performance cost | 16:50 |
corvus | tobiash: hrm, i'm not sure i understand completely. in both options, where would you separate out the regex projects from the regular ones? | 16:53 |
corvus | tobiash: also, while i'm thinking about it, my guess is that your main loop should go in Layout.getProjectPipelineConfig now | 16:53 |
corvus | tobiash: maybe the thing to do is to just keep them in the project list with all the others in UnparsedConfig, but then separate them out into their own list or dict in parseConfig. | 16:55 |
corvus | (so UnparsedConfig only has "projects" and ParsedConfig has "projects" and "projects_by_regex") | 16:56 |
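(Editor's note: a minimal sketch of the split corvus describes, with function and attribute names assumed for illustration: the unparsed layer keeps one flat project list, and the parse step separates regex stanzas from literal ones.)

```python
import re

def parse_projects(unparsed_projects):
    """Hypothetical parse step: split literal project stanzas from regex ones.

    unparsed_projects is a list of raw dicts, each with a 'name' key that is
    either a literal project name or a regex (marked by a leading '^').
    """
    projects = {}
    projects_by_regex = {}
    for stanza in unparsed_projects:
        name = stanza['name']
        if name.startswith('^'):
            if name not in projects_by_regex:
                projects_by_regex[name] = (re.compile(name), [])
            projects_by_regex[name][1].append(stanza)
        else:
            projects.setdefault(name, []).append(stanza)
    return projects, projects_by_regex
```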
openstackgerrit | Merged openstack-infra/zuul master: Perform late validation of secrets https://review.openstack.org/553041 | 16:58 |
openstackgerrit | Merged openstack-infra/zuul master: Perform late validation of nodesets https://review.openstack.org/553088 | 16:58 |
openstackgerrit | Merged openstack-infra/zuul master: Late bind projects https://review.openstack.org/553618 | 16:58 |
tobiash | right, the unparsed config should not know about regex | 16:58 |
tobiash | I'll try that | 16:58 |
clarkb | I like that separation as the unparsedConfig is just raw datastructures | 16:59 |
corvus | friendly reminder, today is a fine day to update https://etherpad.openstack.org/p/zuul-update-email | 17:02 |
corvus | mordred: clarkb and i were just having a chat in etherpad about the fact that we probably should have added a release note about the new re2 dependency | 17:07 |
*** jpena is now known as jpena|off | 17:07 | |
corvus | mordred: do you know if we can retroactively add a note? | 17:08 |
corvus | (i mean, obviously we can add it to the next release, but i mean is there a way to get it categorized under the previous one?) | 17:08 |
*** gtema has quit IRC | 17:09 | |
corvus | i'll ask in #openstack-release | 17:09 |
*** jimi|ansible has quit IRC | 17:13 | |
corvus | mordred: there's a mypy error in http://logs.openstack.org/28/564628/2/check/tox-pep8/0c5268b/job-output.txt.gz | 17:17 |
corvus | mordred: oh, i think it's correct :) | 17:20 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Report git sha in status page version https://review.openstack.org/564628 | 17:21 |
*** kmalloc has joined #zuul | 17:22 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Add release note about re2 https://review.openstack.org/564847 | 17:27 |
corvus | clarkb: ^ apparently we can just do that :) | 17:28 |
corvus | it's probably worth thinking about whether we want to add release notes for dependency additions though. one could argue that openstack-infra is just broken because we don't run bindep on our install. :) | 17:28 |
openstackgerrit | Matthieu Huin proposed openstack-infra/zuul master: zuul web: add admin endpoint, enqueue & autohold commands https://review.openstack.org/539004 | 17:34 |
JosefWells | Hey, zuul masters, I was wondering if any other CI systems have a similar nearest-non-failing algorithm for starting test runs, etc | 18:00 |
clarkb | JosefWells: the only one that comes to mind is chef's thing oh what is it called. Its not open source but is zuul inspired | 18:06 |
JosefWells | I've seen similar systems in semiconductor companies, but nothing open source till zuul | 18:07 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Add debug info to test_slow_start https://review.openstack.org/564857 | 18:11 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Add regex support to project stanzas https://review.openstack.org/535713 | 18:14 |
tobiash | corvus: had to reimplement half of this but got it running now ^ | 18:15 |
JosefWells | Thanks clarkb! I'm off to play with zuul! | 18:16 |
tobiash | corvus: 564847 results in a strange ordering of the release notes: http://logs.openstack.org/47/564847/1/check/build-sphinx-docs/a6f6b7e/html/releasenotes.html | 18:21 |
SpamapS | JosefWells: I believe the Prow folks are thinking of doing it. | 18:23 |
SpamapS | https://github.com/kubernetes/test-infra/tree/master/prow | 18:23 |
SpamapS | but for now IIRC it uses a simpler "1+n" window algorithm where they try 1, and then 1+n, and that way they have a chance at landing 1 or 1+n changes. | 18:24 |
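(Editor's note: a hedged sketch of that simpler "1 and 1+n" idea as described here, not Prow's actual implementation: test the head change alone and the whole window together, so at least one of the two outcomes can merge.)

```python
def test_window(changes, run_tests):
    # Speculatively test the head change and the full batch in one round.
    head_ok = run_tests(changes[:1])
    batch_ok = run_tests(changes)
    if batch_ok:
        return changes      # the whole window can merge
    if head_ok:
        return changes[:1]  # fall back to landing just the head change
    return []               # nothing merges this round
```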
clarkb | tobiash: looks like the tests didn't have to chagne though thats good | 18:28 |
*** electrofelix has quit IRC | 18:31 | |
mordred | corvus: wow - mypy caught an actual thing? neat | 18:31 |
SpamapS | #winning | 18:31 |
tobiash | clarkb: yeah, had to reimplement almost everything except the tests ;) | 18:31 |
SpamapS | Hm.. feature idea.. let trusted playbooks request holds. | 18:35 |
SpamapS | It would be cool to basically be able to say "If you find XYZ in the logs, and the author doesn't have any other holds active, hold these nodes" | 18:36 |
clarkb | SpamapS: you could implement it as a playbook/role with a secret (to talk to nodepool) | 18:38 |
SpamapS | yeah, I also just want nodepool to have a rest API | 18:39 |
SpamapS | so I can do exactly that | 18:39 |
SpamapS | I need a non CLI non-shared-box UI for nodepool | 18:39 |
*** jimi|ansible has joined #zuul | 18:39 | |
*** jimi|ansible has joined #zuul | 18:39 | |
*** elyezer has quit IRC | 18:39 | |
SpamapS | right now I have people logging in and sudo'ing to nodepool/zuul to make holds and clean them up | 18:39 |
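(Editor's note: the manual flow being described maps onto the existing autohold RPC command; a hedged example of the CLI invocation on the scheduler host, with the tenant/project/job values as placeholders.)

```shell
# Hold the nodes of the next failing run of this job for later ssh debugging.
sudo -u zuul zuul autohold --tenant mytenant --project org/repo \
    --job tox-py35 --reason "debugging transient failure" --count 1
```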
openstackgerrit | Andreas Jaeger proposed openstack-infra/zuul master: Fix description for DependentPipelineManager https://review.openstack.org/564862 | 18:40 |
clarkb | tobiash: left a couple comments but they don't apepar to be regressions so didn't -1 | 18:43 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/zuul master: Fix some code description https://review.openstack.org/564862 | 18:44 |
Shrews | SpamapS: you've seen https://review.openstack.org/539004 ? | 18:45 |
clarkb | SpamapS: I'm sure your copying of the entire journald data contents is related, but is it common to not be able to debug based on logs in your env? | 18:47 |
clarkb | (it's one of the big things I push back on with openstack teams: if you can't debug it from the logs then your ops can't either) | 18:47 |
SpamapS | clarkb: people use it as a dev-on-demand service | 18:49 |
SpamapS | write the patch, throw at wall, log in and fix wrong assumptions, repeat | 18:49 |
SpamapS | works pretty well | 18:50 |
SpamapS | would like this to be a first class paradigm in zuul eventually | 18:50 |
openstackgerrit | Merged openstack-infra/zuul master: Make config objects freezable https://review.openstack.org/562816 | 18:51 |
openstackgerrit | Merged openstack-infra/zuul master: Remove layout from ParseContext https://review.openstack.org/563695 | 18:51 |
clarkb | ah so assumption is that initial pass will fail and dev will jump on to iterate | 18:51 |
clarkb | interesting | 18:51 |
SpamapS | sometimes | 18:52 |
SpamapS | not always | 18:52 |
SpamapS | just a common like, "I need to fiddle with it some" | 18:52 |
SpamapS | and rather than having a parallel vagrant path.. | 18:52 |
SpamapS | just zuul for all | 18:52 |
corvus | SpamapS: i don't see a problem with this in principle, but i think we'll want to explore the ux around it a bit. how would the idea of, rather than requesting it in a playbook, simply every failed job was auto-held, perhaps up to a per-author or per-tenant limit or something? could even be a limit of 1 -- so the last failed job for $author is auto-held for 24 hours. | 18:54 |
corvus | (to be clear, i'm just brainstorming) | 18:54 |
SpamapS | Yeah I've been wondering that too. | 18:54 |
SpamapS | Have had similar thoughts | 18:55 |
SpamapS | Another thought I've had is to dump an SSH key into a recheck comment. | 18:55 |
SpamapS | Like "I'm a trusted person and I want to be able to get into the nodes if this fails" | 18:55 |
SpamapS | recheck-with-hold | 18:56 |
SpamapS | something like that | 18:56 |
SpamapS | anyway.. just something I'm thinking about | 18:56 |
SpamapS | too many ideas to get done | 18:56 |
SpamapS | For a team of about 10 users, the current method is working fine. | 18:57 |
SpamapS | But I can see it failing to scale quickly. | 18:57 |
corvus | SpamapS: that's a promising idea too -- it sounds like it could have a good level of delegation there (presumably could be enabled per-pipeline) | 18:57 |
openstackgerrit | Merged openstack-infra/zuul master: Remove 'base' from UnparsedAbideConfig https://review.openstack.org/563757 | 18:58 |
openstackgerrit | Merged openstack-infra/zuul master: Change TestMaxTimeout to not run ansible https://review.openstack.org/564562 | 18:58 |
openstackgerrit | Merged openstack-infra/zuul master: Store source context on all config objects https://review.openstack.org/564563 | 18:58 |
openstackgerrit | Merged openstack-infra/zuul master: Cache configuration objects in addition to YAML dicts https://review.openstack.org/564061 | 18:58 |
openstackgerrit | Merged openstack-infra/zuul master: Stop deep-copying job variables https://review.openstack.org/564564 | 18:58 |
openstackgerrit | Merged openstack-infra/zuul master: Remove source_context argument to Pipeline https://review.openstack.org/564642 | 18:58 |
openstackgerrit | Merged openstack-infra/zuul master: Simplify UnparsedConfig.copy https://review.openstack.org/564647 | 18:58 |
corvus | welp, that's that landed! | 18:58 |
*** elyezer has joined #zuul | 19:00 | |
*** spsurya has quit IRC | 19:01 | |
openstackgerrit | Merged openstack-infra/zuul master: Report git sha in status page version https://review.openstack.org/564628 | 19:15 |
*** myoung|emailplz is now known as myoung | 19:18 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Add regex support to project stanzas https://review.openstack.org/535713 | 19:35 |
tobiash | clarkb: adapted to your comments ^ | 19:36 |
openstackgerrit | Fatih Degirmenci proposed openstack-infra/nodepool master: Add nodepool service file for CentOS7 https://review.openstack.org/564872 | 19:49 |
corvus | tobiash: dhellman says it's a bug in reno and is unrelated to that patch. the sections can end up in any order, and in fact, i think we're seeing it in action right now with them changing on the website. | 19:54 |
corvus | tobiash: https://storyboard.openstack.org/#!/story/2001934 | 19:54 |
corvus | so it's not related to the change to add the re2 releasenote, that should be safe to land | 19:54 |
mordred | corvus: we've exercised reno a bit recently haven't we? | 19:55 |
corvus | ayup | 19:56 |
tobiash | Ah ok | 19:56 |
pabelanger | fdegir: left a suggestion on 564872 | 20:01 |
*** CrayZee has quit IRC | 20:06 | |
fdegir | pabelanger: just looking at it | 20:11 |
fdegir | pabelanger: i didn't get this part of the comment: it will combined both files and use the proper path for centos. | 20:12 |
fdegir | pabelanger: when you say "both files", which files do you mean? | 20:12 |
pabelanger | you'd install the existing nodepool-launcher.service and new nodepool-launcher.d/centos.config | 20:14 |
fdegir | pabelanger: now i got it | 20:14 |
pabelanger | centos.conf | 20:14 |
fdegir | right when you responded | 20:14 |
pabelanger | :) | 20:14 |
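(Editor's note: a hedged sketch of the drop-in approach: ship the stock unit unchanged and layer a small CentOS-specific override next to it. The override contents here are illustrative; the real ones are whatever 564872 settles on.)

```ini
# /etc/systemd/system/nodepool-launcher.service.d/centos.conf
# Merged over the stock nodepool-launcher.service; override only what
# differs on CentOS 7 (here, hypothetically, the entry-point path).
[Service]
ExecStart=
ExecStart=/usr/bin/nodepool-launcher
```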
fdegir | but... | 20:15 |
fdegir | pabelanger: even though systemd seems to be happy, sudo systemctl start nodepool-launcher hangs | 20:15 |
pabelanger | does nodepool-launcher -d work? | 20:16 |
fdegir | that works | 20:16 |
pabelanger | if so, you might have permissions issues | 20:16 |
fdegir | I think we need the start command updated | 20:16 |
pabelanger | no, I suspect you cannot create the pid file | 20:16 |
fdegir | and modified it like the one i have in the centos one, with -d | 20:16 |
fdegir | pid file is there | 20:16 |
fdegir | i was looking at the service file from softwarefactory | 20:17 |
pabelanger | should be in /var/run/nodepool, which systemd creates with RuntimeDirectory=nodepool | 20:17 |
pabelanger | is the nodepool-launcher process running, maybe strace | 20:17 |
fdegir | Job for nodepool-launcher.service failed because a timeout was exceeded. See "systemctl status nodepool-launcher.service" and "journalctl -xe" for details. | 20:17 |
fdegir | it is running | 20:17 |
pabelanger | forking? | 20:17 |
fdegir | yes | 20:17 |
fdegir | if I use the one from sf with Type=simple and /usr/bin/nodepool-launcher -d | 20:18 |
fdegir | it works | 20:18 |
pabelanger | might need guessmainpid=no | 20:18 |
pabelanger | and pidfile set | 20:18 |
fdegir | let me try that one | 20:18 |
pabelanger | I stopped testing with centos, but you're likely hitting some issues with systemd and python-daemon. | 20:19 |
pabelanger | you can also enable systemd debugs to get more info on why it is failing | 20:19 |
pabelanger | I guess nothing in journalctl -u nodepool-launcher.service | 20:20 |
fdegir | pabelanger: nodepool-launcher.service start operation timed out. Terminating. | 20:20 |
fdegir | pabelanger: if you look at this one | 20:20 |
fdegir | https://review.openstack.org/#/c/564872/1/etc/centos7/nodepool-launcher.service | 20:20 |
fdegir | the 3 main differences are the Type, ExecStart, and PIDFile | 20:21 |
fdegir | and that one works with no issues | 20:21 |
fdegir | but since i don't have fedora system, I am not sure if the one i sent for centos works on fedora as well | 20:22 |
pabelanger | right, you can use nodepool-launcher -d and type=simple but don't want that to be default | 20:22 |
corvus | heads up that current master may be broken (we apparently have a hole in our testing) | 20:22 |
pabelanger | you should be able to use type=forking, pidfile, execstart | 20:22 |
pabelanger | but likey need more flags on centos | 20:22 |
fdegir | ok | 20:22 |
pabelanger | maybe guessmainpid=no | 20:23 |
pabelanger | I think that will read the PIDfile for the process to watch | 20:23 |
fdegir | tried guessmainpid and it timed our as well | 20:23 |
fdegir | out* | 20:23 |
pabelanger | I'd enable debugging in systemd and see what is happening | 20:24 |
pabelanger | fdegir: but I do use nodepool-launcher -d myself and it works | 20:24 |
pabelanger | we just want zfs to use type=forking | 20:24 |
*** acozine1 has quit IRC | 20:26 | |
fdegir | pabelanger: yes, if i run manually things work | 20:26 |
fdegir | pabelanger: but not as a service | 20:26 |
pabelanger | that to me sounds like a permissions issue or an selinux issue | 20:27 |
pabelanger | might want to check audit logs | 20:27 |
pabelanger | or set selinux to passive for nwo | 20:27 |
pabelanger | now* | 20:27 |
fdegir | sorry, didn't help | 20:28 |
fdegir | the thing is | 20:28 |
fdegir | when i issue systemctl start, i see the process | 20:28 |
fdegir | the pid is in pidfile | 20:28 |
fdegir | the nodepool reporting 2018-04-27 20:27:33,673 INFO nodepool.NodePool: Starting PoolWorker.static-vms-main | 20:28 |
fdegir | so everything seems to be working but the systemctl start doesn't seem to proceed further; it keeps waiting and finally times out | 20:29 |
pabelanger | does process die too? | 20:29 |
fdegir | yes | 20:30 |
pabelanger | yah, likely python-daemon cannot start properly. Check permissions on all folders, eg: /var/log/nodepool, etc | 20:30 |
pabelanger | /etc/nodepool | 20:31 |
pabelanger | if you sudo su nodepool | 20:31 |
pabelanger | then run nodepool-launcher | 20:31 |
pabelanger | it also likely fails | 20:31 |
pabelanger | which common cause is permissions issue | 20:31 |
pabelanger | and because python-daemon has stderr=None, you don't see the failure | 20:31 |
clarkb | (because proper unix daemonization says you should close all open fds) | 20:36 |
pabelanger | yah, wonder if we need a --noop / --dry-run, or script to validate proper permissions on folders so daemon can properly start. Pretty hard for a new user to nodepool to understand what is happening when not using -d | 20:37 |
corvus | pabelanger: https://review.openstack.org/547889 | 20:38 |
pabelanger | yay! | 20:39 |
corvus | if ianw is busy, maybe someone else can port that to zuul | 20:39 |
*** ssbarnea_ has quit IRC | 20:43 | |
fdegir | pabelanger: this is what i get with systemd debugging | 20:50 |
fdegir | pabelanger: https://hastebin.com/ofidunewiw.sql | 20:50 |
clarkb | fdegir: pabelanger I think that is telling us we set the type to forking but the fork parent never exited (we know it did fork though because the child is mentioned in the log) | 20:53 |
fdegir | again, all the permissions are right | 20:57 |
fdegir | i can start things manually | 20:57 |
fdegir | with systemctl start, i see | 20:57 |
fdegir | cat /var/run/nodepool/nodepool.pid | 20:57 |
fdegir | 20732 | 20:57 |
fdegir | nodepool 20732 1 4 20:56 ? 00:00:01 /usr/bin/python3.5 /usr/bin/nodepool-launcher | 20:57 |
fdegir | while systemctl start is waiting | 20:58 |
clarkb | ya rereading docs the parent isn't exiting | 20:58 |
fdegir | and then the stuff you see in log happens | 20:58 |
fungi | could it be blocking on additional (higher-numbered) file descriptors inherited from the shell or something? i've never looked to see whether that daemon library is smart enough to iterate over all bound fds | 21:00 |
fdegir | a few weeks ago when i tried it on fedora, it worked | 21:00 |
fdegir | so this seems to be centos thingy | 21:00 |
fungi | some naive daemonization routines just assume closing stdin, stdout and stderr is sufficient | 21:00 |
fdegir | and seeing sf using simple made me think they have a reason to use simple | 21:01 |
fdegir | they might have faced similar issue | 21:01 |
clarkb | fungi: systemd says it waits for parent to exit | 21:02 |
clarkb | https://pagure.io/python-daemon/blob/master/f/daemon/daemon.py#_812 is how the library decides to detach or not by default | 21:02 |
clarkb | so oddly I think that means we don't want type = forking or we want to set detach process flag to true | 21:03 |
clarkb | this feels like an optimization for systemd | 21:03 |
clarkb | pabelanger: ^ does forking work for you? I Think you said you had tested it on fedora at least | 21:04 |
clarkb | fdegir: try it without the -d and type simple | 21:05 |
pabelanger | clarkb: I can test quickly | 21:05 |
pabelanger | I haven't yet | 21:05 |
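(Editor's note: the gist of the linked python-daemon code: DaemonContext only double-forks when it decides to detach, and it skips detaching when it believes it was started by init, as under systemd, so a Type=forking unit waits forever for a parent exit that never happens. A hedged sketch of forcing the fork; `run_launcher` is a placeholder.)

```python
import daemon

# detach_process defaults to auto-detection and stays False under systemd;
# forcing True restores the double-fork that Type=forking units expect.
with daemon.DaemonContext(detach_process=True):
    run_launcher()  # placeholder for the nodepool-launcher main loop
```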
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Coerce MappingProxyTypes in job vars to dicts https://review.openstack.org/564886 | 21:05 |
*** harlowja has joined #zuul | 21:07 | |
pabelanger | okay, I don't think we tested this on fedora, it is also hanging for me | 21:08 |
pabelanger | let me try something | 21:08 |
fdegir | clarkb: it works | 21:08 |
fdegir | clarkb: i mean without -d and type simple | 21:08 |
pabelanger | clarkb: fdegir: that was the fix, detach_process=True | 21:19 |
fdegir | pabelanger: so forking didn't work on fedora as well? | 21:20 |
pabelanger | only after I patched nodepool/cmd/__init__.py | 21:21 |
pabelanger | I've been using simple and -d myself | 21:22 |
pabelanger | so, if we want to support forking, we'll need to patch nodepool / zuul | 21:22 |
pabelanger | however, having issue with pidfile | 21:22 |
corvus | pabelanger: the zfs docs should work on fedora, are you saying they don't? | 21:53 |
pabelanger | corvus: I was testing with nodepool-builder, let me try nodepool-launcher | 21:55 |
fdegir | i just tried again now and it didn't work | 21:56 |
fdegir | on fedora27 | 21:56 |
fdegir | same timeout occurs there too | 21:56 |
fdegir | Apr 27 21:44:06 fedora.localdomain systemd[1]: nodepool-launcher.service: Start operation timed out. Terminating. | 21:56 |
corvus | are the service files that ended up in the repo the same ones from the earlier version of the docs? | 21:56 |
fdegir | i used the one from nodepool repo | 21:57 |
fdegir | oh | 21:57 |
fdegir | corvus: i just looked at leifmadsen's gist | 21:57 |
fdegir | clarkb: and that gist has simple there so the one in nodepool repo doesn't match to that | 21:58 |
fdegir | corvus: ^ | 21:58 |
pabelanger | yah, nodepool-launcher and forking isn't working. I'm not sure anybody actually tested it | 21:58 |
corvus | gist? | 21:58 |
fdegir | https://gist.github.com/leifmadsen/93b9283d10dfddba096e32fb172cf569 | 21:58 |
pabelanger | it is failing on fedora for me | 21:58 |
corvus | fdegir: oh, that's ... rather out of date :) | 21:58 |
fdegir | because i was 100% sure it worked on fedora for me when he was working with the first version | 21:58 |
fdegir | but the service file contains simple | 21:58 |
corvus | fdegir: this is the most up to date thing, which is derived from that: https://zuul-ci.org/docs/zuul/admin/zuul-from-scratch.html | 21:59 |
fdegir | so if that part of nodepool hasn't changed then the service file that ended up in nodepool wasn't the correct one | 21:59 |
fdegir | corvus: yes | 21:59 |
*** elyezer has quit IRC | 21:59 | |
fdegir | corvus: the "official" one points to service files from nodepool repo | 21:59 |
fdegir | corvus: and that's what i've been working on for centos docs | 21:59 |
corvus | okay, that should work for fedora | 22:00 |
fdegir | corvus: when you said if the right service files ended up in repo then i checked gist | 22:00 |
fdegir | corvus: it doesn't | 22:00 |
corvus | fdegir: oh, i meant the ones from a previous version of the docs, but later than the gist | 22:00 |
fdegir | the official one doesn't work | 22:00 |
corvus | fdegir: to be clear: you're saying if i follow the instructions in https://zuul-ci.org/docs/zuul/admin/zuul-from-scratch.html it won't work? | 22:00 |
fdegir | yes, that's what i am saying | 22:01 |
fdegir | the service file the doc tells user to copy from nodepool repo is the problem | 22:01 |
fdegir | https://zuul-ci.org/docs/zuul/admin/nodepool_install.html | 22:01 |
fdegir | sudo cp etc/nodepool-launcher.service /etc/systemd/system/nodepool-launcher.service | 22:01 |
fdegir | this service file has forking in it | 22:01 |
corvus | okay, that's a problem for which i will drop everything and run through the instructions again | 22:01 |
clarkb | corvus: the issue is https://pagure.io/python-daemon/blob/master/f/daemon/daemon.py#_812 | 22:02 |
fdegir | i think the easiest fix is to switch to simple instead | 22:02 |
fdegir | until nodepool/zuul is patched according to what clarkb just pasted | 22:02 |
clarkb | corvus: fdegir we can either decide to use simple and allow default behavior from ^ or override the default behavior and fork twice | 22:02 |
pabelanger | okay | 22:02 |
pabelanger | the issue is type=forking | 22:02 |
clarkb | sort of | 22:03 |
pabelanger | switching back to type=simple, the pidfile is created properly | 22:03 |
pabelanger | and systemd starts properly | 22:03 |
pabelanger | however, I don't think that is the right way systemd wants the process to work | 22:03 |
pabelanger | we'd need the setting clarkb said above for type=forking I think | 22:03 |
*** rlandy has quit IRC | 22:04 | |
clarkb | right forking is fine if you fork. and simple is fine if you don't fork. Just have to decide which we want | 22:04 |
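(Editor's note: spelled out, the two self-consistent unit shapes look roughly like this; paths are assumed and the two [Service] stanzas are alternatives, not one file.)

```ini
# Alternative 1: stay in the foreground (-d) and let systemd supervise.
[Service]
Type=simple
ExecStart=/usr/bin/nodepool-launcher -d

# Alternative 2: actually double-fork (needs detach_process=True in
# python-daemon) and point systemd at the child's pid file.
[Service]
Type=forking
ExecStart=/usr/bin/nodepool-launcher
PIDFile=/var/run/nodepool/nodepool.pid
```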
corvus | http://git.zuul-ci.org/cgit/zuul/commit/doc/source/admin/zuul-from-scratch.rst?id=28d99222a6cb82aaf7698571359363be6416b38f | 22:04 |
openstackgerrit | Merged openstack-infra/zuul master: Coerce MappingProxyTypes in job vars to dicts https://review.openstack.org/564886 | 22:04 |
fdegir | same problem probably exists for zuul-{scheduler, executor} as well since those service files use type=forking too | 22:04 |
corvus | the service file that was added to nodepool was *not* the one that was in the docs | 22:04 |
pabelanger | nope, I lied type=simple doesn't work | 22:04 |
pabelanger | it was killed after x seconds | 22:05 |
corvus | Shrews: ^ | 22:05 |
clarkb | pabelanger: ok that at least makes me think we didn't do something completely wrong in investigating the forking option | 22:05 |
corvus | pabelanger, fdegir: have you tried the version in http://git.zuul-ci.org/cgit/zuul/commit/doc/source/admin/zuul-from-scratch.rst?id=28d99222a6cb82aaf7698571359363be6416b38f ? | 22:06 |
corvus | pabelanger, fdegir: specifically at http://git.zuul-ci.org/cgit/zuul/tree/doc/source/admin/zuul-from-scratch.rst?id=38b26de3b398e1ee1fa2bcbed0a6bc5105589f67#n254 | 22:06 |
pabelanger | clarkb: yah, enabling detach_process=True is what gets type=forking working | 22:06 |
pabelanger | corvus: testing | 22:06 |
*** elyezer has joined #zuul | 22:08 | |
clarkb | what is odd about fdegir's log is that it seems to indicate there is a child | 22:09 |
clarkb | but the only os.fork happens if detach_process=True | 22:09 |
fdegir | corvus: that seems to work | 22:11 |
pabelanger | confirmed | 22:11 |
fdegir | corvus: it's still alive | 22:11 |
corvus | pabelanger: can you please propose that as a patch. can you also please verify that the zuul service files are the same ones from that version of the documentation? | 22:12 |
pabelanger | but, I don't think systemd will ever use the pid file we are creating as PIDfile is only used with forking | 22:12 |
pabelanger | corvus: sure | 22:12 |
fdegir | pabelanger: can you add me to those changes as reviewer so i can continue with centos instructions based on those? | 22:13 |
corvus | pabelanger, Shrews, tobiash: i'd like us to be very careful with the zuul-from-scratch document. when we make changes, we need someone to actually do the process manually and verify that it works. | 22:13 |
corvus | what happened here is that after i spent several days running through the document and verified everything in it, we made changes based on things that people thought "should work". let's not do that again. | 22:14 |
corvus | so please at least get a review comment from someone -- the author or a reviewer -- that says "i tested this and it works" | 22:14 |
pabelanger | yah, I left a +2 saying I have not tested; I should have really done a +1 | 22:15 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool master: Fix nodepool-launcher systemd file https://review.openstack.org/564901 | 22:18 |
pabelanger | corvus: fdegir: clarkb: ^ that is working systemd file | 22:18 |
pabelanger | for nodepool | 22:19 |
pabelanger | I'll test zuul over the weekend | 22:19 |
fdegir | tried it and it works | 22:20 |
fdegir | thanks all for the help | 22:22 |
fdegir | now i can go back to where i left things | 22:22 |
corvus | fdegir: it looks like the same problem exists for the zuul service files | 22:22 |
corvus | fdegir: you can get a good version of those from the doc i linked earlier until we fix it | 22:23 |
corvus | pabelanger: it looks like you asked Shrews to make the same erroneous changes to the service files in zuul, can we go ahead and fix that now? | 22:23 |
fdegir | corvus: will look for those patches as well and base the work on it | 22:23 |
pabelanger | corvus: yes, I won't be able to test them until later however | 22:24 |
corvus | pabelanger: as long as they match the version i confirmed was working earlier, i'm happy. they're certainly broken now. | 22:24 |
corvus | clarkb: can you approve https://review.openstack.org/564901 ? | 22:26 |
mordred | corvus: I +2d - want me to wait on clarkb or just +A? | 22:26 |
corvus | mordred: +a | 22:27 |
mordred | corvus: done | 22:27 |
fdegir | would you like me to send those? with the new nodepool service files I am now moving on to the zuul steps | 22:27 |
clarkb | sorry finally getting to lunch now | 22:27 |
fdegir | and can verify those service files from the earlier version of the doc and send the change | 22:27 |
corvus | fdegir: i think pabelanger is about to do that in just a few mins | 22:28 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul master: Fix zuul systemd files https://review.openstack.org/564903 | 22:28 |
fdegir | ok | 22:28 |
corvus | seconds even | 22:28 |
fdegir | :) | 22:28 |
pabelanger | revert, but untested | 22:28 |
pabelanger | (by me) | 22:28 |
corvus | they match the ones i tested | 22:29 |
*** hashar has quit IRC | 23:19 | |
openstackgerrit | Fatih Degirmenci proposed openstack-infra/zuul master: Add CentOS 7 environment setup instructions https://review.openstack.org/564948 | 23:24 |
openstackgerrit | Fatih Degirmenci proposed openstack-infra/zuul master: Add CentOS 7 environment setup instructions https://review.openstack.org/564948 | 23:26 |
openstackgerrit | Fatih Degirmenci proposed openstack-infra/nodepool master: Add systemd drop-in file for CentOS 7 https://review.openstack.org/564872 | 23:41 |
openstackgerrit | Fatih Degirmenci proposed openstack-infra/zuul master: Add steps to use systemd drop-in for Nodepool on CentOS 7 https://review.openstack.org/564950 | 23:47 |