openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add general sphinx and reno jobs and role https://review.openstack.org/521142 | 00:00 |
---|---|---|
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add support for warning-is-error to sphinx role https://review.openstack.org/521618 | 00:00 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Update fetch sphinx output to use sphinx vars https://review.openstack.org/521590 | 00:00 |
tristanC | mordred: oups, the angular version note got lost in the rebase, it was documented here: https://review.openstack.org/#/c/466561/1/etc/status/fetch-dependencies.sh (v1.5.6) | 01:00 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul feature/zuulv3: web: add /static installation instructions https://review.openstack.org/521694 | 01:24 |
tristanC | mordred: /win 45 | 01:25 |
tristanC | oops, well this https://review.openstack.org/521694 recaps the /static installation instructions | 01:25 |
mordred | tristanC: ah - cool! I can update my patch to use 1.5.6 instead of 1.5.8 | 01:28 |
mordred | tristanC: https://review.openstack.org/#/c/521625/ - unless you want to squash mine into your patch there - either way is fine with me | 01:28 |
tristanC | mordred: or i can verify zuul.angular.js works with 1.5.8, the concern is about the $locationProvider used in builds.json to parse the query string args | 01:29 |
tristanC | mordred: it feels like those curl scripts are a band-aid until we integrate webpack or something | 01:30 |
tristanC | clarkb: agreed, what matters is how zuul jobs leverage ansible, though there is one bit to account for: how the zuul_stream callback works. iiuc the zuul-executor needs a tcp connection to the zuul_console daemon on the node | 01:33 |
tristanC | which seems to assume the nodepool node already has a regular network path alongside the ansible_connection | 01:35 |
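(For context on the streaming bit tristanC mentions: in zuul v3 the base job's pre playbook starts a small console daemon on each node, and the executor's zuul_stream callback connects to it over TCP to stream task output. A minimal sketch of that pre-playbook step; the zuul_console module name and default port are recalled from the zuul source rather than this transcript, so verify before relying on them:)

```yaml
# Hedged sketch: start the console log streamer the executor later connects to.
# The zuul_console module ships with zuul's ansible; 19885 is its usual default port.
- hosts: all
  tasks:
    - name: Start the zuul console log streamer on each node
      zuul_console:
        port: 19885
```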
pabelanger | clarkb: jeblair: I added the notes about the zuulv3.o.o outage last week to: https://wiki.openstack.org/wiki/Infrastructure_Status | 01:39 |
pabelanger | https://review.openstack.org/513915/ was the commit in question that stopped us from starting zuul again | 01:39 |
pabelanger | and required a force merge of: https://review.openstack.org/519949/ | 01:39 |
jeblair | pabelanger: that was the original version of that commit; it was fine. mordred merged that repo into another outside of gerrit, which is why the problem arose. | 02:09 |
pabelanger | okay | 02:17 |
pabelanger | remote: https://review.openstack.org/521700 demo variable scoping issue in ansible | 02:27 |
pabelanger | jeblair: mordred: clarkb: dmsimard: a very simple patch to demo the issues I am having ^. This isn't a zuul issue, but a difference in how different inventory files can affect how ansible runs. We can go into more detail in the morning | 02:29 |
tristanC | regarding nodepool backends, this isn't a blocker to release v3 today from my point of view. though we might want to merge a few simple additions to support custom ansible_connection and ansible_user so that it works for tobiash's use case. | 02:40 |
pabelanger | tristanC: which patch is that? | 02:55 |
tristanC | pabelanger: https://review.openstack.org/453983 and https://review.openstack.org/453983 | 02:56 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: web: add /static installation instructions https://review.openstack.org/521694 | 03:00 |
pabelanger | tristanC: think one was meant to be https://review.openstack.org/501976 | 03:04 |
pabelanger | but cool, never knew we had that | 03:04 |
pabelanger | looks interesting | 03:04 |
pabelanger | tobiash: left comments on 501976 | 03:07 |
*** harlowja has quit IRC | 03:22 | |
dmsimard | mordred, tristanC, jeblair: I'd love to pick your brain about a question I have regarding the API implementation in ARA if you happen to be around | 03:45 |
dmsimard | jlk: maybe you too since you used it in BonnyCI :D | 03:47 |
dmsimard | before 1.0, aggregating data to a single location (i.e., running ansible from different servers with ARA set up) meant using a database server, like MySQL, creating credentials and a database -- and then configuring those credentials so that ARA knows how to connect to that database | 03:48 |
dmsimard | In OpenStack terms, it's not very different from how nova compute nodes know where the nova database is, as well as the username and password | 03:49 |
dmsimard | It's not ideal for a number of reasons, one of which is because the user has read/write access to the database and those credentials might end up on users' laptops because that's where they run ansible from. A bit meh. | 03:49 |
dmsimard | So, enter 1.0 with this shiny new API. There's either the default standalone/offline/internal API which has no authentication, no network calls and no HTTP involved. | 03:50 |
dmsimard | Or there's the HTTP REST API that you can make available so that you can get/post/update data | 03:51 |
dmsimard | I don't really want to be in the business of managing API tokens or credentials, or ACLs, but there might not be any other way. I'd really have two kinds of "users" (or "tokens"), read-only and read-write. However, I'm not really sure how to go about managing tokens/users. | 03:52 |
dmsimard | I think when we discussed the Zuul API in Denver jeblair said he also wasn't interested in managing credentials and would rather keep the API open and leave securing it as an exercise for the operator (i.e., hide /api/admin properly through a webserver or something) | 03:55 |
dmsimard | I'm wondering if I should do the same thing or not. I really want to keep the code base as simple as possible.. I'm a bit concerned about the implications of adding credentials, permissions, etc. | 03:56 |
dmsimard | </endwalloftext> | 03:56 |
tristanC | dmsimard: how about using http authorization and adding htaccess in front of the ara server? | 04:00 |
tristanC | i guess this is how it's going to be implemented for the zuul-web/admin endpoint | 04:01 |
*** harlowja has joined #zuul | 04:01 | |
dmsimard | tristanC: Like http authentication? or restriction by IP? | 04:01 |
dmsimard | I'm not sure how http authentication in front of an API would work from a client perspective | 04:01 |
dmsimard | restriction by IP (or hosting the API in a restricted network to begin with) is probably what I had in mind | 04:02 |
dmsimard | I suppose since the client uses python requests, it's probably easy to go through the http auth and then do a GET/POST/PATCH/etc, just never seen that done before | 04:03 |
dmsimard | but yes, it's an interesting idea I hadn't thought about. I really just don't want to end up *validating* the credentials and matching those to some permissions | 04:04 |
tristanC | dmsimard: i meant like supporting an 'authorization' or even an 'x-auth-token' http header at the client level, and then using a middleware to authorize the request on top of the ara server | 04:07 |
tristanC | dmsimard: though, isn't zuul going to only use the standalone/offline/internal api of ara? | 04:09 |
dmsimard | tristanC: probably, yes.. the API is useful to ARA first of all, it is consuming the API instead of doing custom SQL queries everywhere | 04:11 |
dmsimard | the API endpoint is available if people are interested in aggregating data from different locations that way | 04:12 |
dmsimard | but it also allows to query ARA programmatically over HTTP | 04:13 |
dmsimard | i.e, give me the tasks for this playbook -- or give me the results for this task | 04:13 |
dmsimard | Running the API endpoint is not required at all, the default is still the internal API that is completely offline without HTTP | 04:14 |
tristanC | dmsimard: speaking of which, i'd be interested in the 'give me the output of all the failed tasks' | 04:14 |
tristanC | which sounds like the first query the zuul user should get when looking at the ara report of his job | 04:15 |
dmsimard | tristanC: yup, you could totally do something like this (totally just wrote it now) http://paste.openstack.org/raw/626895/ | 04:21 |
tristanC | so that would be part of a "ara generate report --failed-first" or something like that? | 04:22 |
dmsimard | tristanC: that's python, it's not a frontend/UI implementation | 04:22 |
dmsimard | tristanC: it's something that, for example, the zuul executor could do to learn about failures and maybe link to them directly or something. | 04:23 |
* dmsimard waves hands like mordred would | 04:24 | |
tristanC | dmsimard: i meant, right now you have to click "logs -> ara -> playbook -> task-page -> the task that failed" to get the reason why your job failed | 04:25 |
*** haint has quit IRC | 04:25 | |
tristanC | dmsimard: what would be cool is to shorten all those intermediary clicks so that when you click logs, then you get the output of the tasks that failed | 04:25 |
dmsimard | tristanC: yeah but really this failed task result is already available to a direct link like http://logs.openstack.org/72/516172/4/check/openstack-tox-cover/f3e9208/ara/result/09382b17-4cfe-44dd-b0c1-729feeef3e4f/ | 04:26 |
dmsimard | A static report isn't going to have an API available in order to query it | 04:27 |
dmsimard | But the executor can query ARA after the playbook has completed, determine if there has been any failures, and link to it accordingly | 04:27 |
dmsimard | Anyway, your imagination is the limit around what you want to end up doing with the API | 04:29 |
*** smyers has quit IRC | 04:36 | |
*** smyers has joined #zuul | 04:36 | |
*** yolanda has quit IRC | 04:44 | |
*** nguyentrihai has joined #zuul | 05:34 | |
*** haint has joined #zuul | 05:40 | |
*** nguyentrihai has quit IRC | 05:43 | |
*** harlowja has quit IRC | 05:52 | |
tobiash | pabelanger: did you mean 503148 or forgot to click send on 501976? | 06:16 |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool feature/zuulv3: Rename ssh_port to connection_port https://review.openstack.org/500800 | 06:26 |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool feature/zuulv3: Support username also for unmanaged cloud images https://review.openstack.org/500808 | 06:28 |
*** yolanda has joined #zuul | 06:45 | |
*** hashar has joined #zuul | 07:03 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool feature/zuulv3: Add connection-type to provider diskimage https://review.openstack.org/503148 | 07:38 |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool feature/zuulv3: Don't gather host keys for non ssh connections https://review.openstack.org/503166 | 07:38 |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool feature/zuulv3: Add connection-port to provider diskimage https://review.openstack.org/504112 | 07:38 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Use username from node information if available https://review.openstack.org/453983 | 07:45 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Rename ssh_port to connection_port https://review.openstack.org/500799 | 07:45 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Use connection type supplied from nodepool https://review.openstack.org/501976 | 07:45 |
*** rcarrillocruz has quit IRC | 08:49 | |
*** rcarrillocruz has joined #zuul | 09:42 | |
*** hashar has quit IRC | 10:03 | |
*** jesusaur has quit IRC | 10:07 | |
*** hashar has joined #zuul | 10:16 | |
*** electrofelix has joined #zuul | 10:18 | |
*** jesusaur has joined #zuul | 10:19 | |
*** jhesketh has quit IRC | 10:28 | |
*** jhesketh has joined #zuul | 10:30 | |
*** jkilpatr has joined #zuul | 12:04 | |
*** isaacb has joined #zuul | 12:13 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Use username from node information if available https://review.openstack.org/453983 | 12:28 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Rename ssh_port to connection_port https://review.openstack.org/500799 | 12:28 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Use connection type supplied from nodepool https://review.openstack.org/501976 | 12:28 |
rcarrillocruz | hey folks, to trigger check jobs on .zuul.yaml, that's not implicit right? | 13:08 |
rcarrillocruz | like | 13:09 |
rcarrillocruz | i have to explicitly put a files regex on .zuul.yaml if i want the job to trigger should it be modified | 13:09 |
rcarrillocruz | ? | 13:09 |
rcarrillocruz | i.e. | 13:09 |
rcarrillocruz | files: | 13:09 |
rcarrillocruz | - ^lib/ansible/modules/network/ovs/.*$ | 13:09 |
rcarrillocruz | - ^test/integration/targets/openvswitch.* | 13:09 |
rcarrillocruz | i should also add | 13:09 |
rcarrillocruz | - .zuul.yaml | 13:09 |
rcarrillocruz | should I want that job to be triggered on .zuul.yaml mod ? | 13:10 |
tobiash | rcarrillocruz: if you use a files filter you probably also want to add .zuul.yaml in case you touch the corresponding job | 13:15 |
rcarrillocruz | Ok, thx for confirming | 13:16 |
tobiash | rcarrillocruz: and probably the corresponding playbook | 13:17 |
rcarrillocruz | Aye | 13:32 |
tobiash | rcarrillocruz: if you want to limit which jobs run on .zuul.yaml changes you could also split that into several files | 13:56 |
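(A sketch of the configuration rcarrillocruz and tobiash are talking about: an in-repo job whose files matcher also lists .zuul.yaml and the job's playbook, so touching either re-triggers the job. The job name, parent, and playbook path below are made up for illustration; the file regexes are the ones pasted above:)

```yaml
# Hypothetical .zuul.yaml entry with an explicit files matcher
- job:
    name: ovs-integration-tests            # made-up job name
    parent: base
    run: playbooks/ovs-integration.yaml    # made-up playbook path
    files:
      - ^lib/ansible/modules/network/ovs/.*$
      - ^test/integration/targets/openvswitch.*
      - ^playbooks/ovs-integration\.yaml$
      - ^\.zuul\.yaml$
```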
*** jkilpatr_ has joined #zuul | 14:00 | |
*** jkilpatr has quit IRC | 14:03 | |
dmsimard | jeblair: so I noticed that ARA is still not up to date on the executors.. we had gotten stuck by https://review.openstack.org/#/c/516740/ | 14:04 |
dmsimard | I happened to have switched to Firefox (57 is awesome) and there's a bugfix in one of the latest releases that resolves an issue with permanent links on firefox :( | 14:04 |
*** hashar has quit IRC | 14:11 | |
*** hashar has joined #zuul | 14:11 | |
rcarrillocruz | mordred: in terms of zuul, third-party CI and github, how's the story there? will 3rd parties wanting to CI create their own GH app and 'we' install it on our repos, or is there another mechanism in the roadmap? | 14:49 |
rcarrillocruz | other question: depends-on does not work in multi-CI envs (like mixing a GitHub and a Gerrit) iiuc, does it work github to github tho? | 14:55 |
*** weshay is now known as weshay_pto | 14:55 | |
mordred | rcarrillocruz: yes to the first question | 14:56 |
mordred | rcarrillocruz: for the second, cross-source depends-on is the thing that doesn't work yet- but it's on the short-term roadmap | 14:57 |
rcarrillocruz | Does it work if both sources are GH? | 14:57 |
mordred | rcarrillocruz: so that'll be fixed before we cut an official 3.0 release | 14:58 |
mordred | yes - it works with gh | 14:58 |
rcarrillocruz | Sweet thx | 14:58 |
pabelanger | tobiash: rcarrillocruz: I would think .zuul.yaml would be implicitly matched by files matchers, but I've never tested it | 15:06 |
rcarrillocruz | yeah, thought so too, but that's what I ran into. I think it makes sense, as you tie a file to a job, to a pipeline | 15:07 |
rcarrillocruz | if it was implied | 15:07 |
rcarrillocruz | that would mean kicking off on all pipelines | 15:07 |
rcarrillocruz | at least that's what i assume the rationale is for needing that to be explicit | 15:07 |
tobiash | pabelanger: oh, didn't think of that possibility | 15:09 |
tobiash | is it really implied? | 15:09 |
pabelanger | tobiash: I am not sure, I assumed it was, but need to test myself. i think zuul will always load its config | 15:09 |
rcarrillocruz | so, reading that roadmap thing | 15:15 |
rcarrillocruz | i'm curious about the dashboard | 15:15 |
rcarrillocruz | what does it mean | 15:15 |
rcarrillocruz | bundling something in zuul, html stuff and all, to get a zuulv3.openstack.org kind of interface? | 15:16 |
rcarrillocruz | from what i see, 8001 is the zuul 'api' to see the live status of queues. I assume the dashboard as we know it is something we deploy outside of the zuul package | 15:16 |
rcarrillocruz | ? | 15:16 |
rcarrillocruz | heh, was chatting with dmsimard the other day that we may eventually also need ansible_connection along with ansible_user plumbed up to zuul, just spotted tobiash's https://review.openstack.org/#/c/501976/ | 15:19 |
rcarrillocruz | ++ | 15:19 |
dmsimard | tobiash++ | 15:19 |
rcarrillocruz | tristanC: the dashboard thing you have assigned, is that a dashboard that will be bundled within zuul? | 15:20 |
rcarrillocruz | do you have changes for it to look at? | 15:20 |
rcarrillocruz | nm, https://review.openstack.org/#/q/topic:zuul-web+(status:open+OR+status:merged) | 15:27 |
pabelanger | rcarrillocruz: I think it would be something like: https://softwarefactory-project.io/zuul3/local/builds.html | 15:28 |
rcarrillocruz | oh | 15:28 |
rcarrillocruz | mucho bonito! | 15:28 |
jeblair | rcarrillocruz: dashboard will be built in to zuul -- all the web stuff will be combined | 15:28 |
pabelanger | I think SF rolled that out yesterday, which is the basis for the zuul-dashboard | 15:28 |
jeblair | rcarrillocruz: "topic:zuul-web" has the changes | 15:28 |
rcarrillocruz | ++ | 15:28 |
rcarrillocruz | that's great | 15:29 |
rcarrillocruz | cos i was having a hard time figuring out how to get a dashboard last night | 15:29 |
rcarrillocruz | i think i'd better wait for that to get merged | 15:29 |
tobiash | rcarrillocruz, dmsimard: yeah, just rebased it this morning :) | 15:31 |
rcarrillocruz | that's great for me, cos i need ansible_connection to be either local or network_cli in order to test network devices from the executor | 15:32 |
rcarrillocruz | my workaround now: i create a bastion with nodepool that creates an inventory on the fly with the needed vars | 15:32 |
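(As a rough illustration of the kind of vars such an on-the-fly inventory would carry for a network device -- the hostname and values are hypothetical, and ansible_network_os only applies to the newer network_cli connection:)

```yaml
# Hypothetical YAML inventory a bastion might generate for a network device
all:
  hosts:
    switch01.example.com:
      ansible_connection: network_cli   # or 'local' with the 2.3/2.4 action-plugin trick
      ansible_network_os: ios
      ansible_user: admin
```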
jeblair | rcarrillocruz: i wonder how that will work with the security protections we have against local connections for untrusted jobs (it doesn't apply to trusted jobs, but i'm sure we'd want to find a way to make both work). (cc: mordred) | 15:38 |
rcarrillocruz | yeah. for this POC, i wanted to have jobs in-repo cos i find it super useful to get those tests on commit. OTOH, that forces me to get this additional bastion as i can't do certain things on the executor. I think when I show this to my peers and we move it forward in prod i'll just make the 'run ansible network integration tests' job a role in the config project so I don't double-jump to kick off tests, | 15:43 |
rcarrillocruz | executor -> bastion -> testnode | 15:43 |
jeblair | rcarrillocruz: i'm assuming the network_cli connection plugin wouldn't cause many security concerns on the executor... what do you need local for? | 15:45 |
rcarrillocruz | it's a trick we have in the network modules in 2.3/2.4. We check in the action plugins if the connection is local, then switch to network_cli. We had to do that back in 2.3 to leverage ansible command line flags like -k and -u, instead of needing to pass creds as module-side args | 15:48 |
rcarrillocruz | https://github.com/ansible/ansible/blob/stable-2.4/lib/ansible/plugins/action/ios.py#L51 | 15:49 |
rcarrillocruz | good news is that on devel we're moving away from that hack, and devices ssh connection will use connection: network_cli , no more local | 15:49 |
rcarrillocruz | but there are a few platform families needing that transition | 15:49 |
mordred | jeblair, rcarrillocruz: I believe ansible_connection should be fine with our security protections - if we have nodepool pass it, then it'll be in the inventory - what we protect against is a user setting it as a variable (iirc) | 15:49 |
mordred | lemme check though | 15:50 |
jeblair | mordred: okay, that's the way my brain was heading, but wanted to make sure | 15:50 |
jeblair | so even that local->network_cli hack may work | 15:50 |
rcarrillocruz | so something i haven't tested yet but where i think i may hit a roadblock is https://github.com/openstack-infra/zuul/blob/feature/zuulv3/zuul/executor/server.py#L1406 | 15:52 |
rcarrillocruz | by default, there's a gather_facts on nodepool nodes | 15:53 |
rcarrillocruz | however, for network devices that will fail | 15:53 |
rcarrillocruz | we don't have a shell to play with | 15:53 |
rcarrillocruz | let alone python to gather facts | 15:53 |
rcarrillocruz | should that be tweakable somehow, or should the gather facts phase be overridable in the job section | 15:53 |
rcarrillocruz | ? | 15:53 |
pabelanger | I wonder if you could lay down an empty file in the fact cache, like we do for localhost in a pre playbook | 15:54 |
mordred | rcarrillocruz: well - we do that in the executor server to pre-cache the facts - which, along with gathering: smart, means tasks in jobs shouldn't themselves run fact gathering ... | 15:54 |
rcarrillocruz | or paramiko nodepool connection_port as we chatted the other day | 15:55 |
rcarrillocruz | mordred: but don't we fail early with NODE_FAILURE should that pre stage fail? | 15:55 |
mordred | perhaps, similar to connection, it's something we need to know about a node from nodepool - 'supports fact gathering' | 15:55 |
mordred | rcarrillocruz: oh - absolutely- that'll totally break you as it is today | 15:55 |
mordred | biggest question would be how to know that the node type in question does not support fact gathering | 15:56 |
rcarrillocruz | so not sure how we would tackle that, as an executor flag (pre fact gather yes/no) | 15:56 |
rcarrillocruz | or have a new param on the node | 15:56 |
rcarrillocruz | saying | 15:56 |
rcarrillocruz | 'supports fact gathering' | 15:56 |
mordred | rcarrillocruz: how do the network modules themselves handle playbook automatic fact gathering? | 15:56 |
rcarrillocruz | both are not mutually exclusive | 15:56 |
rcarrillocruz | mordred: today we have <platform>_facts | 15:56 |
rcarrillocruz | in the short term, gather_facts will be pluggable | 15:57 |
rcarrillocruz | meaning, if we hint ansible that the node is a network thing ( think with ansible_network_os), then executor will spawn the right 'driver' | 15:57 |
rcarrillocruz | i think alikins was on it, not sure if we'll get that for 2.6 at the very least | 15:57 |
mordred | rcarrillocruz: yah - but ansible-playbook runs fact gathering on hosts ... are there just hard-coded lists in playbook that say "don't run fact gathering if ansible_connection is network_cli or something?" | 15:57 |
pabelanger | well, I think we only run setup_playbooks (gather facts) today to ensure SSH has been set up properly. We could re-tweak that again and stop doing ansible -m setup to validate SSH is working, which moves fact gathering back into ansible-playbook. Wouldn't that allow somebody to set gather_facts: false in all playbooks? | 15:58 |
mordred | oh - wait | 15:58 |
rcarrillocruz | yeah, i think gather_facts is only on ssh connection | 15:58 |
mordred | if we need to plumb ansible_connection in anyway, we could just check if ansible_connection == 'ssh' in that fact gathering | 15:58 |
rcarrillocruz | mordred: http://docs.ansible.com/ansible/latest/ios_facts_module.html we do fact gathering as modules | 15:58 |
mordred | since, as pabelanger says, it's in support of our ssh connections | 15:59 |
rcarrillocruz | cool | 15:59 |
rcarrillocruz | i think that's a good compromise | 15:59 |
pabelanger | yah | 15:59 |
rcarrillocruz | so, wait for tobiash's changes to land | 15:59 |
rcarrillocruz | then change executor logic to test that | 15:59 |
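(A sketch of the compromise being discussed: only run setup against hosts reached over ssh, and lean on the per-platform facts modules otherwise. This is an illustration of the idea, not zuul's actual setup playbook:)

```yaml
# Hedged sketch of connection-aware fact gathering
- hosts: all
  gather_facts: false
  tasks:
    - name: Gather facts only on regular ssh nodes
      setup:
      when: ansible_connection | default('ssh') == 'ssh'

    - name: Gather facts from IOS devices with the platform facts module
      ios_facts:
      when: ansible_network_os | default('') == 'ios'
```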
jeblair | well, it's twofold | 16:00 |
jeblair | it's not just to validate that ssh is working, but also to establish the ssh controlpersist connections | 16:00 |
jlk | As folks who work on CI, y'all will appreciate this: https://unix.stackexchange.com/questions/405783/why-does-man-print-gimme-gimme-gimme-at-0030 | 16:01 |
pabelanger | woot! https://github.com/gtest-org/ansible/pull/1 tox-pep8 works (still) via github connection driver | 16:01 |
pabelanger | took 10mins to sync git repo to node however :D | 16:01 |
rcarrillocruz | jeblair: what's the reason to check for the controlpersist? asking because in paramiko we don't have that; it's the reason why ansible-connection was written, to have 'feature parity' | 16:02 |
*** isaacb has quit IRC | 16:02 | |
mordred | rcarrillocruz: we set up controlpersist independently | 16:03 |
rcarrillocruz | jlk: off-topic, does anyone know sigmavirus or where he hangs out? https://github.com/sigmavirus24/github3.py/pull/671 , i guess it would be good to get a release so we don't have to carry the editable package in requirements.txt | 16:03 |
mordred | rcarrillocruz: because of wrapping ansible-playbook calls in bubblewrap | 16:03 |
rcarrillocruz | ic | 16:05 |
jlk | rcarrillocruz: I know not, but definitely worth poking upstream again :( | 16:05 |
jeblair | digging deeper, i *think* things should still work even if controlpersist isn't established there | 16:05 |
mordred | rcarrillocruz: also because we start an ssh agent so that we can inject the ssh keys into it and then remove them so that they are not there for the jobs | 16:05 |
rcarrillocruz | ah | 16:06 |
rcarrillocruz | so that explains the remove_build_key role | 16:06 |
rcarrillocruz | i was wondering what that was about | 16:06 |
mordred | I mean - there's a few things we could make ssh-aware - like we don't need to start an ssh agent if ansible_connection != ssh | 16:06 |
rcarrillocruz | that's the rationale for it? | 16:06 |
jlk | hrm. | 16:06 |
jlk | mordred: would we need to model that "add/remove" capability if the connection is not ssh? like if the connection is kubectl exec ? | 16:07 |
mordred | rcarrillocruz: ya - we have a base key that we manage, we use that in service of creating a per-build key and adding it to the remote nodes, then removing access to the original key from the job before handing things off | 16:07 |
jlk | is the threat model written up somewhere w/ the keys? | 16:08 |
jeblair | rcarrillocruz: that way a job can't (somehow) ssh into another host outside the set it's been given | 16:08 |
rcarrillocruz | was it ever on the table the idea to spawn executors from nodepool itself ? like a control plane pool | 16:08 |
jeblair | it shouldn't be able to do that anyway, but just in case | 16:08 |
mordred | jlk: unsure - we could make the key dance in the base job a no-op if ansible_connection != ssh - or it's possible we'll need to do similar things for other systems, like winrm which uses passwords/certs iirc | 16:08 |
jlk | mordred: yeah we may need to do that in the k8s exec route. Otherwise a task on the executor could exec into pods/containers from another job | 16:09 |
pabelanger | jlk: github question, how does the 'detail' url in the 'all checks have passed' box work? https://github.com/gtest-org/ansible/pull/1 | 16:10 |
jlk | unless we figure out a way to prevent docker/kubectl calls from happening via shell on the executor | 16:10 |
mordred | jlk: nod. yah - it seems like there is a dance we need to do generally, but the impl may be different for each type of ansible connection plugin | 16:10 |
mordred | jlk: we have that way | 16:10 |
pabelanger | jlk: is that something we need zuul to update with stream.html page / final logs? | 16:10 |
jeblair | it's worth noting the key swap is a second-layer defense. it should not be possible for an untrusted job to add a host to the inventory or run a local shell. | 16:10 |
mordred | jlk: docker/kubectl calls are already prevented from running via shell on the executor | 16:10 |
jlk | pabelanger: it's the zuul_url fed back through as part of the status POST call | 16:10 |
mordred | jeblair: ++ | 16:11 |
jlk | jeblair: oh good point. | 16:11 |
jeblair | (but just in case that happens somehow, we didn't want the result of that attack to be "you can ssh into any node zuul can ssh into") | 16:11 |
pabelanger | jlk: thanks, in bonnyci, did you get it properly configured to point to your final logs? | 16:11 |
mordred | yah | 16:11 |
jlk | pabelanger: for zuul 2.x yes, for 3.x I think that's still an open question | 16:11 |
mordred | it seems like good form to have equivalent auth dances for other connection types | 16:11 |
jlk | (particularly since that URL dances around) | 16:11 |
jlk | You can have only one URL, so you need it to link to a page that shows all the jobs from a pipeline, with links into their logs | 16:12 |
jeblair | jlk: what did you do in zuul v2? | 16:12 |
*** isaacb has joined #zuul | 16:12 | |
jlk | we pointed to a directory | 16:13 |
jlk | and that directory had subdirs for all the jobs I believe | 16:13 |
jeblair | oh, so you constructed the logpath specifically for that case | 16:13 |
jeblair | makes sense | 16:13 |
pabelanger | okay, so it is possible we still have some work to do on v3 | 16:14 |
jeblair | our path in openstack is constructed to organize by change, but not buildset | 16:14 |
jeblair | maybe we could switch it? | 16:15 |
jeblair | instead of /change/patchset/pipeline/job/build/ we could use /change/patchset/pipeline/buildset/job/ | 16:15 |
rcarrillocruz | i copy-pasted the url format from bonnyCI, this is how it looks on my side: http://38.145.34.35/logs/ansible-networking/check/github.com/rcarrillocruz-org/ansible-fork/5/c66a514898a14a9ba93a813c8d32a117/ | 16:15 |
jeblair | or we can link to the dashboard url for the buildset | 16:16 |
jeblair | once the dashboard lands | 16:16 |
jeblair | that may be the better approach | 16:16 |
jeblair | rcarrillocruz: ah thx, makes sense | 16:17 |
mordred | I like the dashboard approach ... since that link could potentially contain the in-progress links and change to the log links (at least in theory) | 16:17 |
jlk | yeah the dashboard was what we were hoping for | 16:17 |
jlk | and works more like Travis, CircleCI, Shippable, etc. | 16:17 |
mordred | so the link given in the status could be a persistent link that people could re-use | 16:18 |
jeblair | mordred: i agree, though currently the dashboard doesn't handle in-progress links | 16:18 |
mordred | yah | 16:18 |
jeblair | and it's not trivial to add | 16:18 |
pabelanger | jeblair: don't mind trying the new URL format | 16:18 |
* rcarrillocruz will follow what shippable does, to make people more comfortable on current way of doing ansible CI things | 16:18 | |
jeblair | (i think we *can*, it's just programming, but it's merging two data sources) | 16:18 |
jeblair | rcarrillocruz: what does that mean? | 16:19 |
jeblair | tristanC: replied on https://review.openstack.org/503270 | 16:26 |
tristanC | jeblair: followed up :) | 16:39 |
rcarrillocruz | just echoing jlk 'works more like travis, shippable'. At Ansible they use Shippable, so I'll try to show things like Shippable on zuul PR notifications | 16:40 |
jeblair | rcarrillocruz: right, i'm asking what that means :) | 16:41 |
tristanC | fwiw i'm not convinced the routes we decided on at the ptg are the best, it makes apache rewrites a bit weird to serve static .html files on dynamic paths | 16:41 |
tristanC | i wonder if we shouldn't step back and instead have a single .html file that would query the different controller paths | 16:42 |
tristanC | or if you have another suggestion, i wouldn't mind using another routes list and refactoring the html bits | 16:44 |
rcarrillocruz | this is how a link on an ansible PR looks: https://app.shippable.com/github/ansible/ansible/runs/44996/summary/console . From the gtest PR that was put up earlier we point to the main zuul v3 dashboard; it would be good to point to the actual job stream link | 16:45 |
rcarrillocruz | not sure how we get the shippable 'run' link | 16:45 |
rcarrillocruz | i can ask mattclay | 16:45 |
jlk | Pretty sure it comes with the status from Shippable | 16:46 |
jlk | the pending one | 16:46 |
rcarrillocruz | wootz | 16:46 |
rcarrillocruz | https://github.com/rcarrillocruz-org/ansible-fork/pull/5 | 16:46 |
rcarrillocruz | janky | 16:46 |
rcarrillocruz | but i get 'usable' links back on PR | 16:46 |
rcarrillocruz | just added zuul_return on the base post playbook | 16:46 |
jlk | https://travis-ci.org/BonnyCI/hoist/builds/267787248?utm_source=github_status&utm_medium=notification is a relevant link from Travis | 16:47 |
jlk | it's the URL it tosses on status POSTs | 16:48 |
mattclay | rcarrillocruz: You had a question about getting Shippable run links? | 16:48 |
rcarrillocruz | oh | 16:48 |
rcarrillocruz | did not even know you were here mattclay | 16:48 |
rcarrillocruz | :-) | 16:48 |
* mattclay waves | 16:49 | |
rcarrillocruz | so we were wondering how the shippable 'detail' link gets you straight to the job being run | 16:49 |
rcarrillocruz | as the zuul report on openstack just points to the main dashboard | 16:49 |
rcarrillocruz | https://github.com/gtest-org/ansible/pull/1 | 16:49 |
mattclay | rcarrillocruz: You mean the 'Details' link for the Shippable status that shows up on a PR? | 16:50 |
rcarrillocruz | yah | 16:50 |
mattclay | rcarrillocruz: I believe it's this: https://developer.github.com/v3/repos/statuses/#create-a-status | 16:50 |
jlk | right, like I said. It's a URL that is provided as part of the POST to set the commit status | 16:51 |
mattclay | It gets updated every time the run status changes until it's finished. | 16:52 |
pabelanger | I don't think shippable comments on PRs like zuul does, right? | 16:55 |
*** isaacb has quit IRC | 16:57 | |
rcarrillocruz | bit different yeah, it doesn't put a comment per-se | 16:57 |
rcarrillocruz | https://github.com/ansible/ansible/pull/33146 | 16:57 |
rcarrillocruz | it's an 'all checks have passed' that you can click | 16:58 |
rcarrillocruz | iirc with zuul we put a straight comment from the bot | 16:58 |
mordred | rcarrillocruz: we can do either - it's configurable | 16:58 |
mordred | rcarrillocruz: you can configure it to report into that status link, or to leave comments, or both | 16:58 |
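(A rough sketch of what that configuration might look like in a pipeline definition. The github reporter option names used here (status, comment) are recalled from the zuul GitHub driver and should be checked against its documentation; triggers and other required settings are omitted for brevity:)

```yaml
# Hedged sketch: a check pipeline reporting to GitHub via commit status and PR comments
- pipeline:
    name: check
    manager: independent
    start:
      github:
        status: pending    # sets the commit status that carries the Details link
        comment: false
    success:
      github:
        status: success
        comment: true      # also leave a summary comment on the PR
    failure:
      github:
        status: failure
        comment: true
```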
*** hashar is now known as hasharAway | 17:00 | |
rcarrillocruz | aha | 17:00 |
* rcarrillocruz just looking at comment option of GH reporter | 17:00 | |
jeblair | i thought we could not update the link...? | 17:00 |
jeblair | if we can update the url, then we should link to the status page in the start report, then link to the logs/dashboard in the final report | 17:05 |
jeblair | but i thought someone said we could only set the url once | 17:05 |
jeblair | oh, maybe i'm misremembering, and the problem is that, without the dashboard, we don't have a single url for the buildset after the builds complete? | 17:06 |
jeblair | so once we *do* have the dashboard, we can do what i described above: set the url to status page on start, then set the url to dashboard on final | 17:07 |
jeblair | rcarrillocruz, jlk: ^ does that sound right? | 17:07 |
mordred | yes - I think that was the main issue | 17:07 |
pabelanger | yah, that looks to be right based on docs | 17:07 |
jeblair | cool. i will be very happy when this is all straightened out. :) | 17:07 |
jeblair | tristanC: can we just have zuul-web serve the static html files? | 17:08 |
*** bhavik1 has joined #zuul | 17:16 | |
tristanC | jeblair: well yes, that's what it does by default | 17:17 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul feature/zuulv3: web: add /{tenant}/jobs route https://review.openstack.org/503270 | 17:18 |
jeblair | tristanC: sorry, i may have misunderstood what you were saying about apache rewriting | 17:21 |
tristanC | jeblair: to keep things dead simple from a user pov, we said that the status page would be available at /{tenant}/status.html | 17:22 |
tristanC | jeblair: that content is served by /{tenant}/status.json and the page just does a "get status.json" | 17:22 |
tristanC | jeblair: which is all good with a standalone zuul-web service | 17:23 |
tristanC | jeblair: but to serve those html files (which include builds.html, jobs.html, and later {jobname}.html) from a proxy, we need to rewrite those urls using something like: | 17:24 |
jeblair | why would we need to rewrite individual urls in the proxy? normally i would just expect to proxy the root | 17:25 |
tristanC | AliasMatch "^/zuul3/.*/(.*).html" "/var/www/zuul-web/static/$1.html" | 17:27 |
tristanC | jeblair: rewrite the static files so that they are served by apache instead of aiohttp | 17:27 |
jeblair | tristanC: why not let aiohttp serve them? | 17:27 |
tristanC | jeblair: good point :-) i may have over optimized that thing... | 17:28 |
jeblair | tristanC: also, we're sending cache-control headers for the status page at least, we can probably make sure we set those correctly for the html pages too, and then apache will end up serving them from cache anyway most of the time, with no extra configuration | 17:29 |
tristanC | alright then nevermind that concern, let's do this instead | 17:32 |
tristanC | just need to add cache-control to the static file controller | 17:33 |
jeblair | ++ | 17:35 |
pabelanger | jeblair: mordred: clarkb: dmsimard: I linked this last night, but https://review.openstack.org/521700/ is an example of the issues I was trying to explain around the need for https://review.openstack.org/521324/ | 17:39 |
pabelanger | it shows how group vars are handled differently based on inventory file | 17:40 |
pabelanger | jlk: ^ might be interesting to you too | 17:40 |
* dmsimard looks | 17:42 | |
dmsimard | pabelanger: I think I understand what's going on but that seems like a bug in Ansible to me | 17:44 |
dmsimard | Doing something in Zuul to address that seems like a workaround for a bug | 17:44 |
dmsimard | v3-inventory and v3-inventory-group should behave the same | 17:45 |
dmsimard | Well.. maybe not, actually | 17:45 |
jeblair | pabelanger: which numbers do you get when you run v3-inventory-group? | 17:52 |
*** bhavik1 has quit IRC | 17:53 | |
jeblair | ah it's in the job log -- 67890 | 17:53 |
clarkb | pabelanger: so the problem is in how group vars are associated to a host if its logical name doesn't change? | 17:53 |
jeblair | pabelanger: you switched the order of the groups in v3-inventory vs v3-inventory-group. is that important? | 17:55 |
dmsimard | pabelanger: I sort of remember something related to changes in variable scopes and inheritance in 2.4... let me check | 17:56 |
dmsimard | pabelanger: heh, that sounds like our culprit too: https://github.com/ansible/ansible/issues/29008 | 17:57 |
dmsimard | "import_playbook from child directory break var scope" | 17:57 |
pabelanger | jeblair: oh, that is a typo, doesn't affect things | 17:57 |
dmsimard | pabelanger: bcoca explains the change here: https://github.com/ansible/ansible/issues/29008#issuecomment-330558987 | 17:58 |
pabelanger | clarkb: maybe? I don't know why it doesn't work | 17:58 |
pabelanger | dmsimard: looking | 17:58 |
dmsimard | pabelanger: tl;dr, in 2.3 vars were loaded at the start (which confuses Ansible in your case because you have one host in two groups) and in 2.4 they are loaded on demand which should have the desired behavior | 17:58 |
jeblair | dmsimard: yeah, that's how i'm reading it | 17:59 |
dmsimard | pabelanger: the issue is prevalent especially if you have the same hostvar in more than one group_vars | 18:00 |
dmsimard | otherwise it probably doesn't reproduce | 18:00 |
pabelanger | dmsimard: right, I know include is deprecated in 2.4 and we should switch to the new syntax, but I haven't tested that yet | 18:00 |
jeblair | pabelanger: would it be very difficult for you to try your example under 2.4? | 18:00 |
pabelanger | jeblair: nope, i can run that now | 18:00 |
dmsimard | pabelanger: it's not a matter of using include or import, there *is* a change in how variables are loaded in 2.4 | 18:00 |
dmsimard | pabelanger: see bcoca's comment | 18:01 |
pabelanger | yes | 18:01 |
pabelanger | let me first test with include and 2.4 | 18:01 |
pabelanger | then, switch up to import_playbook | 18:02 |
dmsimard | I don't think either matters | 18:02 |
dmsimard | at least going off by what they're saying in the bug | 18:02 |
pabelanger | okay, 2.4.1.0 also failed. changing some syntax | 18:06 |
pabelanger | v3-inventory-group also fails using import_playbook | 18:07 |
pabelanger | dmsimard: which is what you expected | 18:07 |
pabelanger | dmsimard: so, what are you thinking is the correct process? | 18:07 |
pabelanger | I think it comes down to: http://paste.openstack.org/show/626981/ | 18:09 |
pabelanger | v3-inventory is 2 plays (which seems to load vars properly) and v3-inventory-group is 1 play | 18:10 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add inventory variables for checkouts https://review.openstack.org/521976 | 18:21 |
electrofelix | does zuul support running a specific job on failure? we need to do some parsing of data in order to report back to users on a change failure | 18:33 |
jlk | I don't think we have a 'finally' type bit of a pipeline | 18:35 |
jlk | that's an interesting feature addition, that a spec would be nice for | 18:35 |
electrofelix | I was hoping the failure_actions in v3 might be something along those lines | 18:35 |
jlk | in your playbook you could catch failure from a play and handle it within that job | 18:35 |
electrofelix | bit difficult when any of 5 jobs launched could be the cause of the failure | 18:36 |
jlk | nod, where do you see failure_actions? I may have missed something | 18:36 |
mordred | electrofelix: so - one of the things on the todo list is better parsing/presentation of the logged json for each job... for instance: | 18:37 |
mordred | electrofelix: http://logs.openstack.org/05/521105/3/infra-check/openstack-zuul-jobs-linters/942626d/job-output.json.gz | 18:37 |
mordred | electrofelix: has all the base data that was used to produce http://logs.openstack.org/05/521105/3/infra-check/openstack-zuul-jobs-linters/942626d/job-output.txt.gz | 18:38 |
electrofelix | it's in zuul/model.py, I figured it might be a generic replacement for how the failure message was handled before | 18:38 |
tobiash | electrofelix: post playbooks should be executed regardless of a failed run playbook (within the same job) | 18:38 |
mordred | electrofelix: so with an html view of that, collapsing the non-error portions and expanding only the failure portion should be fairly easy | 18:38 |
tobiash | if that's enough | 18:38 |
mordred | and yah - what tobiash said | 18:38 |
electrofelix | tobiash: we'd end up needing to copy the same code to the post playbook of all jobs (and btw, we're still using Jenkins...) | 18:39 |
mordred | electrofelix: I think I may not fully understand which thing you're trying to do? | 18:39 |
electrofelix | I was hoping there might be something where we say: on failure of any of the jobs for the change in the pipeline, run this job | 18:39 |
tobiash | electrofelix: so you're talking about zuulv2? | 18:40 |
mordred | electrofelix: what would you do in the job that runs in response to on_failure? | 18:40 |
electrofelix | tobiash: yes, but also considering moving to zuulv3 (still works with Gearman) | 18:40 |
mordred | electrofelix: but you don't need to copy the same code to the post playbook of all jobs - you should be able to put the code you need in the post playbook of your base job | 18:41 |
tobiash | so cleanup jobs are not there in zuul but I think there were already discussions about that some months ago | 18:41 |
mordred | electrofelix: also, did I paste you https://etherpad.openstack.org/p/zuulv3-jenkins-integration yet? | 18:42 |
electrofelix | mordred: take the git tree, parse some metadata stored in the failed commit message, look up some changes further upstream (we're doing artifact promotion), and notify the source projects that produced the artifact that just failed its promotion | 18:42 |
jeblair | tobiash: yes, i think we're planning on adding them shortly after 3.0 | 18:42 |
mordred | electrofelix: yah - you should totally be able to just do that in a post playbook on your base job - it'll have a variable that indicates whether the job failed or not, and it also has all of the git repo state available | 18:43 |
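(A sketch of what such a base-job post playbook could look like. mordred doesn't name the variable here; zuul_success is assumed, and the notification role is hypothetical:)

```yaml
# Hedged sketch of a post-run playbook on the base job
- hosts: localhost
  tasks:
    - name: Notify upstream projects that an artifact failed promotion
      include_role:
        name: notify-promotion-failure   # hypothetical role doing the parsing/reporting electrofelix describes
      when: not (zuul_success | default(true) | bool)
```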
electrofelix | mordred: yep, I think it's orthogonal, I possibly just need to understand a bit more about base jobs and what that means in working with gearman/jenkins | 18:43 |
mordred | electrofelix: ++ | 18:43 |
mordred | electrofelix: mostly poking to make sure I understand the thing you're wanting to accomplish. you could do it with a cleanup job, but I'm pretty sure you could do it with a base job. | 18:45 |
jeblair | the difference between cleanup/base would be whether it happens once per job in a buildset, or once for the whole buildset | 18:45 |
mordred | jeblair: ++ | 18:46 |
mordred | electrofelix: all that said - you do know that zuulv3 isn't compatible with the jenkins gearman plugin, yeah? that's the reason I pasted that etherpad about v3/jenkins integration thoughts | 18:46 |
electrofelix | based on my hazy understanding of the terminology, once for the whole buildset; we only care if any of the jobs for the change have failed, we don't care which one | 18:46 |
mordred | nod. so cleanup job, once it exists, may map better for you | 18:47 |
jlk | woo, use case -> solution. | 18:47 |
jlk | go team | 18:47 |
jeblair | ya, and now we have 2 use cases for cleanup | 18:48 |
mordred | and since all you need is the git repo state, you should be able to potentially write a nodeless cleanup job | 18:48 |
mordred | \o/ | 18:48 |
mordred | that means it may even be a good idea :) | 18:48 |
electrofelix | mordred: I thought the additional work was to make it work better rather than it not working at all. I see there are still references to gearman in v3 | 18:50 |
electrofelix | mordred: what is it that doesn't currently work? is it just the nodepool integration? or can zuulv3 not launch jobs on Jenkins with static slaves at all? | 18:55 |
mordred | electrofelix: oh - it still uses gearman, but it uses gearman as an internal communication mechanism, not as an interface with external systems | 18:55 |
jeblair | the zuulv3 spec may provide some background: http://specs.openstack.org/openstack-infra/infra-specs/specs/zuulv3.html | 18:56 |
jeblair | also some of the docs we wrote for the infra migration: https://docs.openstack.org/infra/manual/zuulv3.html#what-is-zuul-v3 | 18:57 |
jeblair | in short, it handles execution and multi-node orchestration itself, via ansible. jobs run as ansible playbooks. those can be simple playbooks which just run tests (which is the bulk of what we do in openstack infra), but ansible gives us a lot of flexibility in interacting with other systems, so mordred's etherpad lays out a way of doing so | 18:59 |
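(For illustration, the "simple playbooks which just run tests" jeblair mentions can be as small as this; the tox environment is arbitrary, and zuul.project.src_dir is the variable zuul provides for the checked-out repo location:)

```yaml
# Minimal sketch of a v3 job playbook that just runs tests on the node
- hosts: all
  tasks:
    - name: Run the linters
      command: tox -e pep8
      args:
        chdir: "{{ zuul.project.src_dir }}"
```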
*** jeblair is now known as thecount | 19:01 | |
*** thecount is now known as jeblair | 19:01 | |
mordred | yah. one of the biggest bits is figuring out how to get the zuul prepared repo state onto the node that jenkins is going to run the job on - that's the reason for the handoff dance in the second part of the etherpad | 19:01 |
mordred | if you're doing static nodes, then obviously the nodepool integration bit isn't as important | 19:01 |
electrofelix | is it no longer possible from the node side to clone/pull from zuul merger? | 19:02 |
mordred | although we might have to brainstorm about how to get the information about the correct static node to zuul if it's static nodes jenkins owns rather than static nodes zuul owns and passes over | 19:02 |
mordred | electrofelix: nope. they don't run git servers at all | 19:03 |
jeblair | that was mostly to facilitate the workflow where cloud resources don't have access to the control plane, so it's a push rather than a pull now | 19:03 |
electrofelix | mordred: ah, well that would be a problem, I wonder if we could hack something there temporarily as otherwise it might become so difficult to migrate that no time ever gets allocated for us to help work on it | 19:04 |
mordred | electrofelix: well - once the jenkins integration stuff is done (several people want it/need it) I imagine it would be much easier for you to migrate | 19:05 |
jeblair | (and as mordred mentioned, doing a first pass of the zuul-trigger-plugin without nodepool support should be a lot easier) | 19:05 |
jeblair | (if you only have static nodes to worry about for the moment) | 19:05 |
mordred | ++ | 19:06 |
* rcarrillocruz vaguely recalls having a reverse tunnel to the merger in Gozer just for that | 19:06 | |
mordred | rcarrillocruz: sssh. don't put that tunnel in the docs :) | 19:06 |
dmsimard | pabelanger: hey sorry I went to get lunch | 19:06 |
* rcarrillocruz also has dug Gozer deep in its memory so he may be wrong | 19:06 | |
rcarrillocruz | :P | 19:07 |
dmsimard | pabelanger: so you were not able to get group_vars to load expectedly in either cases with 2.4.1.0 ? | 19:07 |
electrofelix | rcarrillocruz: I thought gozer was old enough that it still had the push merge functionality ;-) | 19:07 |
rcarrillocruz | lol, we had so many hacks it's hard to remember | 19:07 |
electrofelix | Needing to hack something together to deal with this (reporting back to other repos on a failure in a different repo) might make it hard to persuade people it's worth writing the zuul trigger plugin, and then mean we have a more difficult time migrating | 19:08 |
rcarrillocruz | like all the proxy mesh i put to make pulls to work on the internal labs | 19:08 |
pabelanger | dmsimard: both v2-inventory and v3-inventory work as expected, v3-inventory-group fails | 19:08 |
dmsimard | pabelanger: so same behavior as 2.3 ?? | 19:08 |
pabelanger | dmsimard: right | 19:09 |
pabelanger | which, is fine for me. | 19:09 |
dmsimard | let me try something | 19:09 |
* tobiash had fun deploying openshift a hundred times during the last two weeks | 19:13 | |
electrofelix | mordred jeblair: so we run a git daemon from the same container as the zuul merger instance (using supervisor), which is obviously a giant security hole for private repos, but hey let's not worry about that. Seems like that might allow us to migrate to v3 without the zuul trigger plugin for getting code onto the slaves | 19:16 |
dmsimard | pabelanger: ok, FWIW I confirm the behavior -- asking bcoca about it, it is expected behavior. The problem is basically that you have group vars for the *same* host in two groups; the last one loaded wins in that case, which is ultimately determined by alphabetical order by default | 19:21 |
pabelanger | right | 19:22 |
dmsimard | pabelanger: but this behavior can be changed with the "ansible_group_priority" var.. I don't see it on docs.ansible.org but there's a mention of it here https://github.com/ansible/ansible/pull/28777 | 19:22 |
pabelanger | dmsimard: https://review.openstack.org/521324/ is my attempt to fix it | 19:23 |
jeblair | electrofelix: well, in v3 the git repos we want to put onto the workers are on the new zuul-executor server, and they're in a job-specific directory. it, erm, would be physically possible for you to do the same sort of thing, except that it's an even larger security hole. it's definitely not intended to be served out. tbh, i'm not sure it'd be that much harder to do the jenkins plugin. | 19:23 |
dmsimard | pabelanger: that makes sense | 19:23 |
dmsimard | pabelanger: this problem hurts my brain | 19:23 |
dmsimard | jeblair: I understand pabelanger's issue now | 19:24 |
dmsimard | jeblair: forget about SSH, different plays or var scopes.. it's about the same *inventory host* being in two different groups, and these two groups each having group_vars.. There has to be one group_vars that wins over the other. What pabelanger aims to fix is to provide the ability to generate different *inventory hosts*, which are really the same nodepool VM, so as to make sure each group_vars is loaded properly | 19:26 |
dmsimard | I hope that makes sense, this one hurts my brain for some reason | 19:26 |
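(A minimal reproduction of what dmsimard describes, with made-up names: one inventory host in two groups, each group's group_vars defining the same variable. The last group loaded -- alphabetical order by default -- wins, and ansible_group_priority, mentioned by dmsimard above, is the knob for changing that:)

```yaml
# inventory.yaml -- the same host appears in two groups
all:
  children:
    controller:
      hosts:
        node1: {}
    worker:
      hosts:
        node1: {}

# group_vars/controller.yaml
listen_port: 12345
# ansible_group_priority: 10   # uncomment to make this group win the conflict

# group_vars/worker.yaml
listen_port: 67890              # wins by default: 'worker' sorts after 'controller'
```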
electrofelix | jeblair: the problem is selling it: it can sound much easier when it's supposedly just a script, and far more work when it's a plugin; whether it is actually the same amount of work doesn't always figure into it... | 19:27 |
electrofelix | mordred: I'll try chatting to you more about the plugin, I've a feeling I won't be able to get it to fly this side of feb, but might at least try | 19:29 |
dmsimard | pabelanger: something that is worth mentioning is that this problem doesn't reproduce if you have different var names | 19:30 |
dmsimard | pabelanger: we're seeing this "race" because the same var is defined in both places | 19:30 |
clarkb | dmsimard: interesting so it must merge the vars together? | 19:30 |
clarkb | and its last overlapping name wins? | 19:30 |
dmsimard | clarkb: yes, child > parent, priority and then 'alpha sort' | 19:31 |
dmsimard | There's arguably not much else they can do | 19:31 |
dmsimard | There has to be something to resolve conflicts | 19:31 |
pabelanger | dmsimard: right, I want to keep the variable names, but set them to different values based on the host. I could rewrite the playbooks to use unique vars, but that's not something I'd like to do | 19:31 |
mordred | electrofelix: so - it's also been suggested to me that the thing I'm calling a plugin might be able to be done with a groovy script in a jenkinsfile | 19:32 |
dmsimard | pabelanger: yup, just clarifying the behavior about "conflicting" group_vars | 19:32 |
pabelanger | dmsimard: yah, thanks | 19:32 |
pabelanger | you explained it better then I could | 19:32 |
mordred | electrofelix: I don't really know much about those - but I bet if we put our heads together we could come up with a hacky POC approach that would do the handoffs appropriately but not involve a new plugin | 19:32 |
electrofelix | mordred: yes, but it requires a system groovy script and that would just be a precursor to a plugin because you really wouldn't want to have to replicate that for every job | 19:33 |
mordred | electrofelix: nod | 19:34 |
jeblair | dmsimard: yes, though i had read the comment on the bug about loading on demand as suggesting that perhaps when a host is being used because it's in a specific group, that group would win, not some arbitrary first or last group. | 19:35 |
jeblair | but i'm not arguing with reality, that was just what i was hoping for :) | 19:36 |
dmsimard | jeblair: that's what I thought too, actually, which is why I was surprised to see the issue stayed there in 2.4 | 19:39 |
dmsimard | let me challenge upstream on that | 19:39 |
pabelanger | I should also note, https://review.openstack.org/519596/ didn't actually fix the issue with var scoping as I expected. So, we could just abandon that now, if we don't see value in doing it | 19:40 |
pabelanger | it still required an updated inventory file | 19:40 |
dmsimard | pabelanger: added a comment on https://review.openstack.org/#/c/521324/ which summarizes what we discussed | 19:42 |
*** hasharAway is now known as hashar | 19:44 | |
*** electrofelix has quit IRC | 19:45 | |
dmsimard | jeblair: vars are loaded "just in time" for the host, but it doesn't change the fact that when it loads the vars there is a conflict that needs to be resolved, basically | 19:47 |
dmsimard | There's no awareness of context as to what group the play is running against vs variable loading | 19:48 |
dmsimard | Which would be awkward anyway; if you target a play against "all", you don't really know what group you're targeting | 19:48 |
jeblair | ya, makes sense | 19:49 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Only run whereto if htaccess file exists https://review.openstack.org/521996 | 19:59 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Only run whereto if htaccess file exists https://review.openstack.org/521996 | 20:00 |
*** jasondotstar has joined #zuul | 20:13 | |
kklimonda | how does zuul promote work? | 20:23 |
kklimonda | based on the description I expected promote to move the given change to the top of the queue (below the currently running jobs), but either that did not happen, or the UI is just not showing it correctly. | 20:25 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Add general sphinx and reno jobs and role https://review.openstack.org/521142 | 20:29 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Add support for warning-is-error to sphinx role https://review.openstack.org/521618 | 20:34 |
jeblair | kklimonda: it should move the change to the top of the queue *ahead* of the currently running jobs | 20:36 |
kklimonda | @jeblair is there any way to see the internal zuul queue to check if that has happened? Or is the zuul web/status.json the "canonical" representation anyway? | 20:37 |
jeblair | kklimonda: it's canonical | 20:37 |
jeblair | kklimonda: can you describe your initial state, your promote command, and the state after running promote in more detail? | 20:38 |
kklimonda | sure | 20:38 |
jeblair | kklimonda: (maybe use etherpad.openstack.org if it helps to write it out there) | 20:38 |
kklimonda | jeblair: https://etherpad.openstack.org/p/zuulv3-promote - zuulv3 web is public, and nothing critical in logs, so I just wrote it all down | 20:42 |
jeblair | looking | 20:42 |
jeblair | oh it's check, i was assuming gate | 20:43 |
kklimonda | is this a gate-only feature? what's the difference? | 20:44 |
jeblair | within a pipeline, there are multiple queues. in gate (dependent pipelines), these are determined by which projects affect each other and need to be tested together. in check (independent pipelines), all of the items are independent (ie, their ordering in the pipeline doesn't affect each other), so every item gets its own queue. | 20:45 |
kklimonda | ah, that makes sense | 20:45 |
jeblair | so yeah, promote isn't going to do anything in that case since it's a queue of one | 20:45 |
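A conceptual Python sketch of the queue behaviour described above; this is illustrative only, not Zuul's code. In an independent pipeline every item sits alone in its own queue, so promote has nothing to reorder, while a dependent pipeline groups items into shared queues based on which projects gate together:

    # Conceptual sketch of queue assignment for the two pipeline manager types;
    # illustrative only, not Zuul's implementation.

    def assign_queues(items, manager, shared_projects=None):
        """Return the pipeline's queues as lists of items."""
        if manager == "independent":
            # Every item is alone in its own queue, so there is nothing for
            # "promote" to reorder against.
            return [[item] for item in items]

        # Dependent: items whose projects are configured to gate together share
        # a change queue, in arrival order.
        queues = {}
        for item in items:
            key = (shared_projects or {}).get(item["project"], item["project"])
            queues.setdefault(key, []).append(item)
        return list(queues.values())

    items = [
        {"change": "521142,1", "project": "openstack-infra/zuul-jobs"},
        {"change": "521618,1", "project": "openstack-infra/zuul-jobs"},
        {"change": "521694,1", "project": "openstack-infra/zuul"},
    ]
    shared = {"openstack-infra/zuul": "zuul", "openstack-infra/zuul-jobs": "zuul"}

    print(assign_queues(items, "independent"))        # three one-item queues
    print(assign_queues(items, "dependent", shared))  # one shared three-item queue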
jeblair | we could probably alter it to do something more useful in that case | 20:46 |
kklimonda | would it be possible to implement that for check too? As in, how much work would that be? I'm only juggling 2 patches right now ;) | 20:46 |
kklimonda | s/check/independent pipelines/ | 20:47 |
jeblair | i don't think it would be a simple change... mostly because the behavior we get in gate comes as a side effect of re-ordering the queue (the dependency stack has changed, so zuul cancels jobs and re-launches) | 20:48 |
jeblair | here's an idea though | 20:48 |
jeblair | the goal is really "get me results for this change faster", right? | 20:48 |
pabelanger | I actually think if you promote a change in check, it gets moved back to the bottom of the status page | 20:49 |
pabelanger | at least that is how I remember it when I tried to promote something in check many moons ago | 20:49 |
jlk | punishment! | 20:50 |
jeblair | perhaps we could add a command to change the priority for a specific change. normally priority is determined by the pipeline. but if we had a command to say "increase the priority of this change", zuul could cancel the node request for that change, and re-issue it with the updated, higher, priority. this would let it get the nodes faster and therefore complete faster. | 20:50 |
kklimonda | right, that's how I assumed it worked in the first place - then I started reading the code, and got confused :) | 20:51 |
kklimonda | I was missing "gate-only" part of the puzzle, now that code makes more sense | 20:51 |
jeblair | node allocation is now the dominant factor in when changes start running jobs. the gearman queue is far less relevant now | 20:51 |
kklimonda | right | 20:51 |
jeblair | this priority change would probably be a lot easier to do. | 20:51 |
kklimonda | would that also affect dependent pipelines, or is promote basically doing the same thing anyway? | 20:52 |
pabelanger | Ya, priority would be nice | 20:52 |
kklimonda | if it is, perhaps we could just reuse promote for both pipelines, and just make zuul do a different thing based on the pipeline type.. which sounds pretty nasty.. | 20:53 |
jeblair | kklimonda: it could work on dependent pipelines, but promote would still be better there, because a change at the end of the queue with jobs that have finished still won't report until the change ahead has. though you could use priority on a set of changes in one change queue to give them an advantage over a different change queue. | 20:54 |
jeblair | kklimonda: yeah... i'm sort of thinking that two commands may be clearer, but maybe we should have 'zuul promote' error out on independent pipelines? | 20:54 |
jeblair | ("You probably don't want this, use priority instead") | 20:55 |
kklimonda | mhm, error out with a message about the other command could work | 20:55 |
kklimonda | jeblair: btw, now that the summit is over if you have time, I've reworked https://review.openstack.org/#/c/515169/ a bit | 20:59 |
kklimonda | I've also had an idea of how to unify autohold requests for jobs, changes and refs by making the last part of the key a regex (.* for job-wide, refs/changes/[change]/.* for changes and the full ref for refs) | 21:01 |
kklimonda | but before I write it I wanted someone to take a look at the current revision | 21:02 |
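A purely hypothetical Python sketch of the key scheme being proposed, where the last component of an autohold key is a regex matched against the ref under test. The tenant, project and job names are invented; the change number is the one mentioned above:

    import re

    # Hypothetical illustration of the proposed autohold keys, where the last
    # component is a regular expression matched against the ref being tested.
    # Tenant/project/job names are invented; 515169 is the change discussed above.
    holds = {
        ("tenant", "project", "some-job", r".*"):
            "hold any build of some-job",
        ("tenant", "project", "some-job", r"refs/changes/69/515169/.*"):
            "hold builds of some-job for change 515169 only",
    }

    def matching_holds(tenant, project, job, ref):
        return [
            reason
            for (t, p, j, ref_re), reason in holds.items()
            if (t, p, j) == (tenant, project, job) and re.fullmatch(ref_re, ref)
        ]

    # Both the job-wide hold and the change-specific hold match this ref.
    print(matching_holds("tenant", "project", "some-job",
                         "refs/changes/69/515169/3"))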
jeblair | kklimonda: ah, thanks! i had a successful vacation and managed to completely forget everything from before the summit. :) | 21:02 |
dmsimard | jeblair: that's quite the feat | 21:03 |
kklimonda | haha, didn't know that was actually possible - tell me your secret ;) | 21:03 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Set success-url for sphinx-docs to html https://review.openstack.org/522017 | 21:05 |
mordred | kklimonda: vacationing in places where there is no internet is helpful - I did the same thing before the summit - it made the beginning of the summit fun, as I had to re-learn what a computer is | 21:06 |
dmsimard | mordred, jeblair, pabelanger: By the way, static report generation in ARA might not make it into 1.0. The use case with Zuul made me realize that it /really/ doesn't scale well and I'd much rather improve the sqlite "middleware" option I came up with instead ( http://ara.readthedocs.io/en/latest/advanced.html ) | 21:07 |
jeblair | i deprived myself of oxygen by climbing something like 6 thousand stairs; probably caused permanent brain damage but felt great. | 21:07 |
mordred | jeblair: \o/ | 21:07 |
mordred | jeblair: you had too many brain nuggets anyway | 21:07 |
mordred | dmsimard: nod. where did we get on deploying the middleware version in openstack land? | 21:08 |
jeblair | dmsimard: thanks, makes sense | 21:08 |
dmsimard | The static generation in ARA doesn't come for free, there are some constraints and hacks involved to ensure parity between the dynamic and the static version -- so improving the story around "arbitrary" sqlite databases and making the report always "dynamic" will allow for more freedom | 21:08 |
dmsimard | mordred: not yet, there's reviews for logs-dev.o.o here: https://review.openstack.org/#/q/topic:ara-sqlite-middleware | 21:09 |
dmsimard | I just -W'd https://review.openstack.org/#/c/513866/ because I need to double check something with the vhost setup first. | 21:09 |
kklimonda | @dmsimard with sqlite middleware, would it be possible to "parse" ara reports programmatically, for example to get per-task durations? | 21:10 |
dmsimard | kklimonda: when 1.0 is released, yes -- not in the current "stable" version | 21:10 |
dmsimard | kklimonda: I actually discussed this last night, hang on | 21:11 |
dmsimard | kklimonda: http://eavesdrop.openstack.org/irclogs/%23zuul/%23zuul.2017-11-21.log.html#t2017-11-21T04:14:39 | 21:12 |
dmsimard | see for example http://paste.openstack.org/raw/626895/ which gets information about failed tasks for a particular playbook | 21:12 |
kklimonda | mhm, that will probably make a lot of things easier :) | 21:13 |
kklimonda | I had to gather the duration of a single task across all the jobs; right now I ended up parsing html (with the power of grep and sed) but being able to just load a bunch of sqlite DBs and run queries on them would be much nicer | 21:14 |
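A hedged Python sketch of what that could look like against the per-job sqlite databases. The logs/*/ara-report/ansible.sqlite path, the tasks table and the name/duration columns are assumptions about ARA's layout and schema, not a documented interface, and would need checking against a real database:

    import glob
    import sqlite3

    # Sketch of pulling per-task durations out of per-job ARA sqlite databases
    # instead of scraping HTML.  The path, the "tasks" table and the "name" and
    # "duration" columns are assumed, not a documented schema.

    TASK_NAME = "some-role : some task name"  # hypothetical task to look for

    for db_path in glob.glob("logs/*/ara-report/ansible.sqlite"):
        conn = sqlite3.connect(db_path)
        try:
            rows = conn.execute(
                "SELECT name, duration FROM tasks WHERE name = ?", (TASK_NAME,)
            ).fetchall()
            for name, duration in rows:
                print(db_path, name, duration)
        finally:
            conn.close()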
dmsimard | kklimonda: and the cool thing about the API is that the client-side implementation (that paste just now) knows how to "talk" to the API offline/internally or over HTTP REST without any changes in the implementation | 21:15 |
dmsimard | so people can write "plugins" or whatever they want and it'll just work, whether I'm running it locally on my laptop without a centralized instance or if I'm sending data over http | 21:16 |
kklimonda | so with this implementation anyone could write a python script that will connect to ara endpoints and query them for various data? | 21:16 |
dmsimard | yes, right now the client is bundled in ara -- but the plan is to unbundle it.. like python-araclient or something. Same for the other components (webapp especially) | 21:17 |
dmsimard | It's not 100% clear yet how the API will end up being restricted (or not).. I'm not interested in the business of handling credentials, passwords, permissions, ACLs/RBAC, etc. This might be an exercise left to the operator -- to restrict through a webserver or something. | 21:18 |
kklimonda | right now there is no RBAC etc. anyway, right? | 21:20 |
dmsimard | Right, but there's also no API and the interface is 100% passive | 21:20 |
kklimonda | anyone can just pull static files and have their sanity tested by parsing it with regex | 21:20 |
dmsimard | The interface in 1.0 remains 100% passive, but you can POST/PATCH/DELETE through the API | 21:20 |
kklimonda | hum | 21:20 |
kklimonda | what would be the usecase for making changes to the already generated report? | 21:21 |
kklimonda | (I'm probably missing something obvious, I only see ARA as a tool to display zuul job results right now :)) | 21:21 |
dmsimard | Mostly things that you don't know until later | 21:22 |
dmsimard | For example, we might want to create a record in the database for a task | 21:22 |
dmsimard | and then update it later once we know if it failed or passed | 21:22 |
dmsimard | ara itself isn't really going to be modifying historical data, but the ability is there -- the api is super generic | 21:23 |
dmsimard | kklimonda: https://github.com/ansible/ansible/blob/devel/lib/ansible/plugins/callback/default.py might give some context around how things work | 21:24 |
dmsimard | kklimonda: ara is a callback plugin that leverages each of these hooks (v2_playbook_on_start, v2_task_on_failed, etc.) and in some circumstances you want to circle back around an event that started (a task) and "finish" it (mark it as successful) | 21:25 |
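For context, a bare-bones Python sketch of that callback-plugin shape, using the standard CallbackBase hooks (the failure hook is v2_runner_on_failed); unlike ara it only prints events instead of writing them to a database:

    from ansible.plugins.callback import CallbackBase


    class CallbackModule(CallbackBase):
        """Record a task when it starts and mark its outcome when a result
        arrives, the "circle back and finish it" pattern described above."""

        CALLBACK_VERSION = 2.0
        CALLBACK_TYPE = 'notification'
        CALLBACK_NAME = 'minimal_recorder'

        def __init__(self):
            super(CallbackModule, self).__init__()
            self.open_tasks = {}

        def v2_playbook_on_task_start(self, task, is_conditional):
            # Create a record now; the outcome is unknown at this point.
            self.open_tasks[task._uuid] = {'name': task.get_name(), 'status': None}

        def v2_runner_on_ok(self, result):
            self._finish(result, 'ok')

        def v2_runner_on_failed(self, result, ignore_errors=False):
            self._finish(result, 'failed')

        def _finish(self, result, status):
            # Circle back to the record created on task start and close it out.
            record = self.open_tasks.setdefault(result._task._uuid, {'name': None})
            record['status'] = status
            self._display.display('%s -> %s' % (record['name'], status))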
dmsimard | mordred: btw "select count(*)" is stupid slow for a number of reasons and I kind of want to keep numbers about the amount of data processed by ara. How do you feel about just selecting the last row and getting the id instead? like "select id from table order by id desc limit 1"? It's going to be inaccurate if you end up deleting data but it's not really a lie in the sense that ara did process those rows | 21:31 |
dmsimard | I totally hacked something to make it run faster with sqlalchemy (thank you anonymous stackoverflow person) but it's still way too slow | 21:32 |
mordred | dmsimard: have you tried "select count(id)" ? | 21:33 |
mordred | dmsimard: select count(*) has a special optimization in mysql that makes it fast | 21:33 |
mordred | dmsimard: but if you select count(id) sqlalchemy _should_ be able to use the index on the primary key | 21:34 |
dmsimard | I don't remember, I do know that it is a fairly well documented issue that select count is slow in sqlalchemy | 21:34 |
mordred | nod. well - getting the highest value from an auto increment int primary key column should be good enough | 21:34 |
dmsimard | mordred: that special optimization is in innodb ? | 21:35 |
mordred | dmsimard: oh - actually, I think it's just in myisam - trying to remember - it's been a few years since my consulting days and it gets hazy | 21:37 |
dmsimard | heh | 21:37 |
dmsimard | I vaguely remember doing repeated "show table status" on innodb tables and the row count varying wildly | 21:38 |
jeblair | mordred: maybe you're at the phase now where you can only tune mysql while drunk | 21:38 |
dmsimard | oh look it's explained here | 21:38 |
dmsimard | The number of rows. Some storage engines, such as MyISAM, store the exact count. For other storage engines, such as InnoDB, this value is an approximation, and may vary from the actual value by as much as 40 to 50%. In such cases, use SELECT COUNT(*) to obtain an accurate count. | 21:38 |
dmsimard | The Rows value is NULL for tables in the INFORMATION_SCHEMA database. | 21:38 |
mordred | dmsimard: yah - there it is | 21:38 |
dmsimard | good to know | 21:39 |
mordred | and https://www.percona.com/blog/2007/04/10/count-vs-countcol/ explains how innodb will do which type of scans in which cases | 21:39 |
mordred | also https://www.percona.com/blog/2006/12/01/count-for-innodb-tables/ | 21:40 |
mordred | depending on how much you want to know :) | 21:40 |
dmsimard | It's been at least 2 years since I've actively tuned mysql but it's still fun :) | 21:41 |
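For illustration, the two approaches weighed above, sketched in Python against a throwaway sqlite database (the playbooks table here is hypothetical): an exact COUNT versus simply reading the highest auto-increment id, which is cheap but over-counts once rows have been deleted:

    import sqlite3

    # Hypothetical "playbooks" table; only the auto-increment id matters here.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE playbooks (id INTEGER PRIMARY KEY AUTOINCREMENT, path TEXT)")
    conn.executemany("INSERT INTO playbooks (path) VALUES (?)",
                     [("site.yml",), ("deploy.yml",), ("test.yml",)])
    conn.execute("DELETE FROM playbooks WHERE id = 2")

    # Exact but potentially slow on big tables: has to scan an index or the table.
    exact = conn.execute("SELECT COUNT(id) FROM playbooks").fetchone()[0]

    # Cheap approximation: read the last assigned id.  Over-counts once rows are
    # deleted, but still reflects how many rows were ever processed.
    approx = conn.execute(
        "SELECT id FROM playbooks ORDER BY id DESC LIMIT 1").fetchone()[0]

    print(exact, approx)  # prints: 2 3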
*** jkilpatr_ has quit IRC | 21:47 | |
*** threestrands has joined #zuul | 21:48 | |
*** jkilpatr has joined #zuul | 22:05 | |
*** hashar has quit IRC | 23:22 |