openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add general sphinx and reno jobs and role https://review.openstack.org/521142 | 00:00 |
---|---|---|
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add support for warning-is-error to sphinx role https://review.openstack.org/521618 | 00:00 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Update fetch sphinx output to use sphinx vars https://review.openstack.org/521590 | 00:00 |
tristanC | mordred: oups, the angular version note got lost in the rebase, it was documented here: https://review.openstack.org/#/c/466561/1/etc/status/fetch-dependencies.sh (v1.5.6) | 01:00 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul feature/zuulv3: web: add /static installation instructions https://review.openstack.org/521694 | 01:24 |
tristanC | mordred: /win 45 | 01:25 |
tristanC | oops, well this https://review.openstack.org/521694 recaps the /static installation instructions | 01:25 |
mordred | tristanC: ah - cool! I can update my patch to use 1.5.6 instead of 1.5.8 | 01:28 |
mordred | tristanC: https://review.openstack.org/#/c/521625/ - unless you want to squash mine into your patch there - either way is fine with me | 01:28 |
tristanC | mordred: or i can verify zuul.angular.js works with 1.5.8, the concern is about the $locationProvider used in builds.json to parse the query string args | 01:29 |
tristanC | mordred: it feels like those curl scripts are a band-aid until we integrate webpack or something | 01:30 |
tristanC | clarkb: agreed, what matters is how zuul jobs leverage ansible, though there is one bit to account for: how the zuul_stream callback works. iiuc the zuul-executor needs a tcp connection to the zuul_console daemon on the node | 01:33 |
tristanC | which seems to assume the nodepool node already has a regular network path alongside the ansible_connection | 01:35 |
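(For context on the streaming bit tristanC mentions: in zuul v3 the base job's pre playbook starts a small console daemon on each node, and the executor's zuul_stream callback connects to it over TCP to stream task output. A minimal sketch of that pre-playbook step; the zuul_console module name and default port are recalled from the zuul source rather than this transcript, so verify before relying on them:)

```yaml
# Hedged sketch: start the console log streamer the executor later connects to.
# The zuul_console module ships with zuul's ansible; 19885 is its usual default port.
- hosts: all
  tasks:
    - name: Start the zuul console log streamer on each node
      zuul_console:
        port: 19885
```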
pabelanger | clarkb: jeblair: I added the notes about the zuulv3.o.o outage last week to: https://wiki.openstack.org/wiki/Infrastructure_Status | 01:39 |
pabelanger | https://review.openstack.org/513915/ was the commit in question that stopped us from starting zuul again | 01:39 |
pabelanger | and required a force merge of: https://review.openstack.org/519949/ | 01:39 |
jeblair | pabelanger: that was the original version of that commit; it was fine. mordred merged that repo into another outside of gerrit, which is why the problem arose. | 02:09 |
pabelanger | okay | 02:17 |
pabelanger | remote: https://review.openstack.org/521700 demo variable scoping issue in ansible | 02:27 |
pabelanger | jeblair: mordred: clarkb: dmsimard: a very simple patch to demo the issues I am having ^. This isn't a zuul issue, but a difference in how different inventory files can affect how ansible runs. We can go into more detail in the morning | 02:29 |
tristanC | regarding nodepool backends, this isn't a blocker to release v3 today from my point of view. though we might want to merge a few simple additions to support custom ansible_connection and ansible_user so that it works for tobiash's use case. | 02:40 |
pabelanger | tristanC: which patch is that? | 02:55 |
tristanC | pabelanger: https://review.openstack.org/453983 and https://review.openstack.org/453983 | 02:56 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: web: add /static installation instructions https://review.openstack.org/521694 | 03:00 |
pabelanger | tristanC: think one was meant to be https://review.openstack.org/501976 | 03:04 |
pabelanger | but cool, never knew we had that | 03:04 |
pabelanger | looks interesting | 03:04 |
pabelanger | tobiash: left comments on 501976 | 03:07 |
*** harlowja has quit IRC | 03:22 | |
dmsimard | mordred, tristanC, jeblair: I'd love to pick your brain about a question I have regarding the API implementation in ARA if you happen to be around | 03:45 |
dmsimard | jlk: maybe you too since you used it in BonnyCI :D | 03:47 |
dmsimard | before 1.0, aggregating data to a single location (i.e., running ansible from different servers with ARA set up) meant using a database server, like MySQL, creating credentials and a database -- and then configuring those credentials so that ARA knows how to connect to that database | 03:48 |
dmsimard | In OpenStack terms, it's not very different from how nova compute nodes know where the nova database is, as well as the username and password | 03:49 |
dmsimard | It's not ideal for a number of reasons, one of which is because the user has read/write access to the database and those credentials might end up on users' laptops because that's where they run ansible from. A bit meh. | 03:49 |
dmsimard | So, enter 1.0 with this shiny new API. There's either the default standalone/offline/internal API which has no authentication, no network calls and no HTTP involved. | 03:50 |
dmsimard | Or there's the HTTP REST API that you can make available so that you can get/post/update data | 03:51 |
dmsimard | I don't really want to be in the business of managing API tokens or credentials, or ACLs, but there might not be any other way. I'd really have two kinds of "users" (or "tokens"), read-only and read-write. However, I'm not really sure how to go about managing tokens/users. | 03:52 |
dmsimard | I think when we discussed the Zuul API in Denver jeblair said he also wasn't interested in managing credentials and would rather keep the API open and leave securing it as an exercise for the operator (i.e., hide /api/admin properly through a webserver or something) | 03:55 |
dmsimard | I'm wondering if I should do the same thing or not. I really want to keep the code base as simple as possible.. I'm a bit concerned about the implications of adding credentials, permissions, etc. | 03:56 |
dmsimard | </endwalloftext> | 03:56 |
tristanC | dmsimard: how about using http authorization and adding htaccess in front of the ara server? | 04:00 |
tristanC | i guess this is how it's going to be implemented for the zuul-web/admin endpoint | 04:01 |
*** harlowja has joined #zuul | 04:01 | |
dmsimard | tristanC: Like http authentication? or restriction by IP? | 04:01 |
dmsimard | I'm not sure how http authentication in front of an API would work from a client perspective | 04:01 |
dmsimard | restriction by IP (or hosting the API in a restricted network to begin with) is probably what I had in mind | 04:02 |
dmsimard | I suppose since the client uses python requests, it's probably easy to go through the http auth and then do a GET/POST/PATCH/etc, just never seen that done before | 04:03 |
dmsimard | but yes, it's an interesting idea I hadn't thought about. I really just don't want to end up *validating* the credentials and matching those to some permissions | 04:04 |
tristanC | dmsimard: i meant like supporting an 'authorization' or even an 'x-auth-token' http header at the client level, and then using a middleware to authorize the request on top of the ara server | 04:07 |
tristanC | dmsimard: though, isn't zuul going to only use the standalone/offline/internal api of ara? | 04:09 |
dmsimard | tristanC: probably, yes.. the API is useful to ARA first of all, it is consuming the API instead of doing custom SQL queries everywhere | 04:11 |
dmsimard | the API endpoint is available if people are interested in aggregating data from different locations that way | 04:12 |
dmsimard | but it also allows to query ARA programmatically over HTTP | 04:13 |
dmsimard | i.e, give me the tasks for this playbook -- or give me the results for this task | 04:13 |
dmsimard | Running the API endpoint is not required at all, the default is still the internal API that is completely offline without HTTP | 04:14 |
tristanC | dmsimard: speaking of which, i'd be interested in the 'give me the output of all the failed tasks' | 04:14 |
tristanC | which sounds like the first query the zuul user should get when looking at the ara report of his job | 04:15 |
dmsimard | tristanC: yup, you could totally do something like this (totally just wrote it now) http://paste.openstack.org/raw/626895/ | 04:21 |
tristanC | so that would be part of a "ara generate report --failed-first" or something like that? | 04:22 |
dmsimard | tristanC: that's python, it's not a frontend/UI implementation | 04:22 |
dmsimard | tristanC: it's something that, for example, the zuul executor could do to learn about failures and maybe link to them directly or something. | 04:23 |
* dmsimard waves hands like mordred would | 04:24 | |
tristanC | dmsimard: i meant, right now you have to click "logs -> ara -> playbook -> task-page -> the task that failed" to get the reason why your job failed | 04:25 |
*** haint has quit IRC | 04:25 | |
tristanC | dmsimard: what would be cool is to shorten all those intermediary clicks so that when you click logs, then you get the output of the tasks that failed | 04:25 |
dmsimard | tristanC: yeah but really this failed task result is already available to a direct link like http://logs.openstack.org/72/516172/4/check/openstack-tox-cover/f3e9208/ara/result/09382b17-4cfe-44dd-b0c1-729feeef3e4f/ | 04:26 |
dmsimard | A static report isn't going to have an API available in order to query it | 04:27 |
dmsimard | But the executor can query ARA after the playbook has completed, determine if there has been any failures, and link to it accordingly | 04:27 |
dmsimard | Anyway, your imagination is the limit around what you want to end up doing with the API | 04:29 |
*** smyers has quit IRC | 04:36 | |
*** smyers has joined #zuul | 04:36 | |
*** yolanda has quit IRC | 04:44 | |
*** nguyentrihai has joined #zuul | 05:34 | |
*** haint has joined #zuul | 05:40 | |
*** nguyentrihai has quit IRC | 05:43 | |
*** harlowja has quit IRC | 05:52 | |
tobiash | pabelanger: did you mean 503148 or forgot to click send on 501976? | 06:16 |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool feature/zuulv3: Rename ssh_port to connection_port https://review.openstack.org/500800 | 06:26 |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool feature/zuulv3: Support username also for unmanaged cloud images https://review.openstack.org/500808 | 06:28 |
*** yolanda has joined #zuul | 06:45 | |
*** hashar has joined #zuul | 07:03 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool feature/zuulv3: Add connection-type to provider diskimage https://review.openstack.org/503148 | 07:38 |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool feature/zuulv3: Don't gather host keys for non ssh connections https://review.openstack.org/503166 | 07:38 |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool feature/zuulv3: Add connection-port to provider diskimage https://review.openstack.org/504112 | 07:38 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Use username from node information if available https://review.openstack.org/453983 | 07:45 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Rename ssh_port to connection_port https://review.openstack.org/500799 | 07:45 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Use connection type supplied from nodepool https://review.openstack.org/501976 | 07:45 |
*** rcarrillocruz has quit IRC | 08:49 | |
*** rcarrillocruz has joined #zuul | 09:42 | |
*** hashar has quit IRC | 10:03 | |
*** jesusaur has quit IRC | 10:07 | |
*** hashar has joined #zuul | 10:16 | |
*** electrofelix has joined #zuul | 10:18 | |
*** jesusaur has joined #zuul | 10:19 | |
*** jhesketh has quit IRC | 10:28 | |
*** jhesketh has joined #zuul | 10:30 | |
*** jkilpatr has joined #zuul | 12:04 | |
*** isaacb has joined #zuul | 12:13 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Use username from node information if available https://review.openstack.org/453983 | 12:28 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Rename ssh_port to connection_port https://review.openstack.org/500799 | 12:28 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Use connection type supplied from nodepool https://review.openstack.org/501976 | 12:28 |
rcarrillocruz | hey folks, to trigger check jobs on .zuul.yaml, that's not implicit right? | 13:08 |
rcarrillocruz | like | 13:09 |
rcarrillocruz | i have to explicitly put a files regex on .zuul.yaml if i want the job to trigger should it be modified | 13:09 |
rcarrillocruz | ? | 13:09 |
rcarrillocruz | i.e. | 13:09 |
rcarrillocruz | files: | 13:09 |
rcarrillocruz | - ^lib/ansible/modules/network/ovs/.*$ | 13:09 |
rcarrillocruz | - ^test/integration/targets/openvswitch.* | 13:09 |
rcarrillocruz | i should also add | 13:09 |
rcarrillocruz | - .zuul.yaml | 13:09 |
rcarrillocruz | should I want that job to be triggered on .zuul.yaml mod ? | 13:10 |
tobiash | rcarrillocruz: if you use a files filter you probably also want to add .zuul.yaml in case you touch the corresponding job | 13:15 |
rcarrillocruz | Ok, thx for confirming | 13:16 |
tobiash | rcarrillocruz: and probably the corresponding playbook | 13:17 |
rcarrillocruz | Aye | 13:32 |
tobiash | rcarrillocruz: if you want to limit which jobs run on .zuul.yaml changes you could also split that into several files | 13:56 |
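(A sketch of the configuration rcarrillocruz and tobiash are talking about: an in-repo job whose files matcher also lists .zuul.yaml and the job's playbook, so touching either re-triggers the job. The job name, parent, and playbook path below are made up for illustration; the file regexes are the ones pasted above:)

```yaml
# Hypothetical .zuul.yaml entry with an explicit files matcher
- job:
    name: ovs-integration-tests            # made-up job name
    parent: base
    run: playbooks/ovs-integration.yaml    # made-up playbook path
    files:
      - ^lib/ansible/modules/network/ovs/.*$
      - ^test/integration/targets/openvswitch.*
      - ^playbooks/ovs-integration\.yaml$
      - ^\.zuul\.yaml$
```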
*** jkilpatr_ has joined #zuul | 14:00 | |
*** jkilpatr has quit IRC | 14:03 | |
dmsimard | jeblair: so I noticed that ARA is still not up to date on the executors.. we had gotten stuck by https://review.openstack.org/#/c/516740/ | 14:04 |
dmsimard | I happened to have switched to Firefox (57 is awesome) and there's a bugfix in one of the latest releases that resolves an issue with permanent links on firefox :( | 14:04 |
*** hashar has quit IRC | 14:11 | |
*** hashar has joined #zuul | 14:11 | |
rcarrillocruz | mordred: in terms of zuul, third-party CI and github, how's the story there? will 3rd parties wanting to CI create their own GH app and 'we' install it on our repos, or is there another mechanism in the roadmap? | 14:49 |
rcarrillocruz | other question: depends-on does not work in multi-CI envs (like mixing a GitHub and a Gerrit) iiuc, does it work github to github tho? | 14:55 |
*** weshay is now known as weshay_pto | 14:55 | |
mordred | rcarrillocruz: yes to the first question | 14:56 |
mordred | rcarrillocruz: for the second, cross-source depends-on is the thing that doesn't work yet- but it's on the short-term roadmap | 14:57 |
rcarrillocruz | Does it work if both sources are GH? | 14:57 |
mordred | rcarrillocruz: so that'll be fixed before we cut an official 3.0 release | 14:58 |
mordred | yes - it works with gh | 14:58 |
rcarrillocruz | Sweet thx | 14:58 |
pabelanger | tobiash: rcarrillocruz: I would think .zuul.yaml would be implicitly matched by files matchers, but I've never tested it | 15:06 |
rcarrillocruz | yeah, thought so too, but that's what I ran into. I think it makes sense, as you tie a file to a job, to a pipeline | 15:07 |
rcarrillocruz | if it was implied | 15:07 |
rcarrillocruz | that would mean kicking off on all pipelines | 15:07 |
rcarrillocruz | at least that's what i assume the rationale is for needing that to be explicit | 15:07 |
tobiash | pabelanger: oh, didn't think of that possibility | 15:09 |
tobiash | is it really implied? | 15:09 |
pabelanger | tobiash: I am not sure, I assumed it was, but need to test myself. i think zuul will always load its config | 15:09 |
rcarrillocruz | so, reading that roadmap thing | 15:15 |
rcarrillocruz | i'm curious about the dashboard | 15:15 |
rcarrillocruz | what does it mean | 15:15 |
rcarrillocruz | bundling something in zuul, html stuff and all, to get a zuulv3.openstack.org kind of interface? | 15:16 |
rcarrillocruz | from what i see, 8001 is the zuul 'api' to see the live status of queues. I assume the dashboard as we know it is something we deploy outside of the zuul package | 15:16 |
rcarrillocruz | ? | 15:16 |
rcarrillocruz | heh, was chatting with dmsimard the other day that we may eventually also need ansible_connection along with ansible_user plumbed up to zuul, just spotted tobiash's https://review.openstack.org/#/c/501976/ | 15:19 |
rcarrillocruz | ++ | 15:19 |
dmsimard | tobiash++ | 15:19 |
rcarrillocruz | tristanC: the dashboard thing you have assigned, is that a dashboard that will be bundled within zuul? | 15:20 |
rcarrillocruz | do you have changes for it to look at? | 15:20 |
rcarrillocruz | nm, https://review.openstack.org/#/q/topic:zuul-web+(status:open+OR+status:merged) | 15:27 |
pabelanger | rcarrillocruz: I think it would be something like: https://softwarefactory-project.io/zuul3/local/builds.html | 15:28 |
rcarrillocruz | oh | 15:28 |
rcarrillocruz | mucho bonito! | 15:28 |
jeblair | rcarrillocruz: dashboard will be built in to zuul -- all the web stuff will be combined | 15:28 |
pabelanger | I think SF rolled that out yesterday, which is the basis for the zuul-dashboard | 15:28 |
jeblair | rcarrillocruz: "topic:zuul-web" has the changes | 15:28 |
rcarrillocruz | ++ | 15:28 |
rcarrillocruz | that's great | 15:29 |
rcarrillocruz | cos i was having a hard time figuring out how to get a dashboard last night | 15:29 |
rcarrillocruz | i think i'd better wait for that to get merged | 15:29 |
tobiash | rcarrillocruz, dmsimard: yeah, just rebased it this morning :) | 15:31 |
rcarrillocruz | that's great for me, cos i need ansible_connection to be either local or network_cli in order to test network devices from the executor | 15:32 |
rcarrillocruz | my workaround now: i create a bastion with nodepool that creates an inventory on the fly with the needed vars | 15:32 |
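(As a rough illustration of the kind of vars such an on-the-fly inventory would carry for a network device -- the hostname and values are hypothetical, and ansible_network_os only applies to the newer network_cli connection:)

```yaml
# Hypothetical YAML inventory a bastion might generate for a network device
all:
  hosts:
    switch01.example.com:
      ansible_connection: network_cli   # or 'local' with the 2.3/2.4 action-plugin trick
      ansible_network_os: ios
      ansible_user: admin
```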
jeblair | rcarrillocruz: i wonder how that will work with the security protections we have against local connections for untrusted jobs (it doesn't apply to trusted jobs, but i'm sure we'd want to find a way to make both work). (cc: mordred) | 15:38 |
rcarrillocruz | yeah. for this POC, i wanted to have jobs in-repo cos i find it super useful to get those tests on commit. OTOH, that forces me to get this additional bastion as i can't do certain things on the executor. I think when I show this to my peers and we move it forward in prod i'll just make the 'run ansible network integration tests' job a role in the config project so I don't double-jump to kick off tests, | 15:43 |
rcarrillocruz | executor -> bastion -> testnode | 15:43 |
jeblair | rcarrillocruz: i'm assuming the network_cli connection plugin wouldn't cause many security concerns on the executor... what do you need local for? | 15:45 |
rcarrillocruz | it's a trick we have in the network modules in 2.3/2.4. We check in the action plugins if the connection is local, then switch to network_cli. We had to do that back in 2.3 to leverage ansible command line flags like -k and -u, instead of needing to pass creds as module-side args | 15:48 |
rcarrillocruz | https://github.com/ansible/ansible/blob/stable-2.4/lib/ansible/plugins/action/ios.py#L51 | 15:49 |
rcarrillocruz | good news is that on devel we're moving away from that hack, and devices ssh connection will use connection: network_cli , no more local | 15:49 |
rcarrillocruz | but there are a few platform families needing that transition | 15:49 |
mordred | jeblair, rcarrillocruz: I believe ansible_connection should be fine with our security protections - if we have nodepool pass it, then it'll be in the inventory - what we protect against is a user setting it as a variable (iirc) | 15:49 |
mordred | lemme check though | 15:50 |
jeblair | mordred: okay, that's the way my brain was heading, but wanted to make sure | 15:50 |
jeblair | so even that local->network_cli hack may work | 15:50 |
rcarrillocruz | so something i haven't tested yet but where i think i may hit a roadblock is https://github.com/openstack-infra/zuul/blob/feature/zuulv3/zuul/executor/server.py#L1406 | 15:52 |
rcarrillocruz | by default, there's a gather_facts on nodepool nodes | 15:53 |
rcarrillocruz | however, for network devices that will fail | 15:53 |
rcarrillocruz | we don't have a shell to play with | 15:53 |
rcarrillocruz | let alone python to gather facts | 15:53 |
rcarrillocruz | should that be tweakable somehow, or should the gather facts phase be overridable in the job section | 15:53 |
rcarrillocruz | ? | 15:53 |
pabelanger | I wonder if you could lay down an empty file in the fact cache, like we do for localhost in a pre playbook | 15:54 |
mordred | rcarrillocruz: well - we do that in the executor server to pre-cache the facts - which, along with gathering: smart, means tasks in jobs shouldn't themselves run fact gathering ... | 15:54 |
rcarrillocruz | or paramiko nodepool connection_port as we chatted the other day | 15:55 |
rcarrillocruz | mordred: but don't we fail early with NODE_FAILURE should that pre stage fail? | 15:55 |
mordred | perhaps, similar to connection, it's something we need to know about a node from nodepool - 'supports fact gathering' | 15:55 |
mordred | rcarrillocruz: oh - absolutely- that'll totally break you as it is today | 15:55 |
mordred | biggest question would be how to know that the node type in question does not support fact gathering | 15:56 |
rcarrillocruz | so not sure how we would tackle that, as an executor flag (pre fact gather yes/no) | 15:56 |
rcarrillocruz | or have a new param on the node | 15:56 |
rcarrillocruz | saying | 15:56 |
rcarrillocruz | 'supports fact gathering' | 15:56 |
mordred | rcarrillocruz: how do the network modules themselves handle playbook automatic fact gathering? | 15:56 |
rcarrillocruz | both are not mutually exclusive | 15:56 |
rcarrillocruz | mordred: today we have <platform>_facts | 15:56 |
rcarrillocruz | in the short term, gather_facts will be pluggable | 15:57 |
rcarrillocruz | meaning, if we hint ansible that the node is a network thing ( think with ansible_network_os), then executor will spawn the right 'driver' | 15:57 |
rcarrillocruz | i think alikins was on it, not sure if we'll get that for 2.6 at the very least | 15:57 |
mordred | rcarrillocruz: yah - but ansible-playbook runs fact gathering on hosts ... are there just hard-coded lists in playbook that say "don't run fact gathering if ansible_connection is network_cli or something?" | 15:57 |
pabelanger | well, I think we only run setup_playbooks (gather facts) today to ensure SSH has been set up properly. We could re-tweak that again and stop doing ansible -m setup to validate SSH is working, which moves fact gathering back into ansible-playbook. Wouldn't that allow somebody to set gather_facts: false in all playbooks? | 15:58 |
mordred | oh - wait | 15:58 |
rcarrillocruz | yeah, i think gather_facts is only on ssh connection | 15:58 |
mordred | if we need to plumb ansible_connection in anyway, we could just check if ansible_connection == 'ssh' in that fact gathering | 15:58 |
rcarrillocruz | mordred: http://docs.ansible.com/ansible/latest/ios_facts_module.html we do fact gathering as modules | 15:58 |
mordred | since, as pabelanger says, it's in support of our ssh connections | 15:59 |
rcarrillocruz | cool | 15:59 |
rcarrillocruz | i think that's a good compromise | 15:59 |
pabelanger | yah | 15:59 |
rcarrillocruz | so, wait for tobiash's changes to land | 15:59 |
rcarrillocruz | then change executor logic to test that | 15:59 |
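(A sketch of the compromise being discussed: only run setup against hosts reached over ssh, and lean on the per-platform facts modules otherwise. This is an illustration of the idea, not zuul's actual setup playbook:)

```yaml
# Hedged sketch of connection-aware fact gathering
- hosts: all
  gather_facts: false
  tasks:
    - name: Gather facts only on regular ssh nodes
      setup:
      when: ansible_connection | default('ssh') == 'ssh'

    - name: Gather facts from IOS devices with the platform facts module
      ios_facts:
      when: ansible_network_os | default('') == 'ios'
```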
jeblair | well, it's twofold | 16:00 |
jeblair | it's not just to validate that ssh is working, but also to establish the ssh controlpersist connections | 16:00 |
jlk | As folks who work on CI, y'all will appreciate this: https://unix.stackexchange.com/questions/405783/why-does-man-print-gimme-gimme-gimme-at-0030 | 16:01 |
pabelanger | woot! https://github.com/gtest-org/ansible/pull/1 tox-pep8 works (still) via github connection driver | 16:01 |
pabelanger | took 10mins to sync git repo to node however :D | 16:01 |
rcarrillocruz | jeblair: what's the reason to check for the controlpersist? asking because in paramiko we don't have that; it's the reason why ansible-connection was written, to have 'feature parity' | 16:02 |
*** isaacb has quit IRC | 16:02 | |
mordred | rcarrillocruz: we set up controlpersist independently | 16:03 |
rcarrillocruz | jlk: off-topic, does anyone know sigmavirus or where he hangs out? https://github.com/sigmavirus24/github3.py/pull/671 , i guess it would be good to get a release so we don't have to carry the editable package in requirements.txt | 16:03 |
mordred | rcarrillocruz: because of wrapping ansible-playbook calls in bubblewrap | 16:03 |
rcarrillocruz | ic | 16:05 |
jlk | rcarrillocruz: I know not, but definitely worth poking upstream again :( | 16:05 |
jeblair | digging deeper, i *think* things should still work even if controlpersist isn't established there | 16:05 |
mordred | rcarrillocruz: also because we start an ssh agent so that we can inject the ssh keys into it and then remove them so that they are not there for the jobs | 16:05 |
rcarrillocruz | ah | 16:06 |
rcarrillocruz | so that explains the remove_build_key role | 16:06 |
rcarrillocruz | i was wondering what that was about | 16:06 |
mordred | I mean - there's a few things we could make ssh-aware - like we don't need to start an ssh agent if ansible_connection != ssh | 16:06 |
rcarrillocruz | that's the rationale for it? | 16:06 |
jlk | hrm. | 16:06 |
jlk | mordred: would we need to model that "add/remove" capability if the connection is not ssh? like if the connection is kubectl exec ? | 16:07 |
mordred | rcarrillocruz: ya - we have a base key that we manage, we use that in service of creating a per-build key and adding it to the remote nodes, then removing access to the original key from the job before handing things off | 16:07 |
jlk | is the threat model written up somewhere w/ the keys? | 16:08 |
jeblair | rcarrillocruz: that way a job can't (somehow) ssh into another host outside the set it's been given | 16:08 |
rcarrillocruz | was it ever on the table the idea to spawn executors from nodepool itself ? like a control plane pool | 16:08 |
jeblair | it shouldn't be able to do that anyway, but just in case | 16:08 |
mordred | jlk: unsure - we could make the key dance in the base job a no-op if ansible_connection != ssh - or it's possible we'll need to do similar things for other systems, like winrm which uses passwords/certs iirc | 16:08 |
jlk | mordred: yeah we may need to do that in the k8s exec route. Otherwise a task on the executor could exec into pods/containers from another job | 16:09 |
pabelanger | jlk: github question, how does the 'detail' url in the 'all checks have passed' box work? https://github.com/gtest-org/ansible/pull/1 | 16:10 |
jlk | unless we figure out a way to prevent docker/kubectl calls from happening via shell on the executor | 16:10 |
mordred | jlk: nod. yah - it seems like there is a dance we need to do generally, but the impl may be different for each type of ansible connection plugin | 16:10 |
mordred | jlk: we have that way | 16:10 |
pabelanger | jlk: is that something we need zuul to update with stream.html page / final logs? | 16:10 |
jeblair | it's worth noting the key swap is a second-layer defense. it should not be possible for an untrusted job to add a host to the inventory or run a local shell. | 16:10 |
mordred | jlk: docker/kubectl calls are already prevented from running via shell on the executor | 16:10 |
jlk | pabelanger: it's the zuul_url fed back through as part of the status POST call | 16:10 |
mordred | jeblair: ++ | 16:11 |
jlk | jeblair: oh good point. | 16:11 |
jeblair | (but just in case that happens somehow, we didn't want the result of that attack to be "you can ssh into any node zuul can ssh into") | 16:11 |
pabelanger | jlk: thanks, in bonnyci, did you get it properly configured to point to your final logs? | 16:11 |
mordred | yah | 16:11 |
jlk | pabelanger: for zuul 2.x yes, for 3.x I think that's still an open question | 16:11 |
mordred | it seems like good form to have equivalent auth dances for other connection types | 16:11 |
jlk | (particularly since that URL dances around) | 16:11 |
jlk | You can have only one URL, so you need it to link to a page that shows all the jobs from a pipeline, with links into their logs | 16:12 |
jeblair | jlk: what did you do in zuul v2? | 16:12 |
*** isaacb has joined #zuul | 16:12 | |
jlk | we pointed to a directory | 16:13 |
jlk | and that directory had subdirs for all the jobs I believe | 16:13 |
jeblair | oh, so you constructed the logpath specifically for that case | 16:13 |
jeblair | makes sense | 16:13 |
pabelanger | okay, so it is possible we still have some work to do on v3 | 16:14 |
jeblair | our path in openstack is constructed to organize by change, but not buildset | 16:14 |
jeblair | maybe we could switch it? | 16:15 |
jeblair | instead of /change/patchset/pipeline/job/build/ we could use /change/patchset/pipeline/buildset/job/ | 16:15 |
rcarrillocruz | i copy-pasted the url format from bonnyCI, this is how it looks on my side: http://38.145.34.35/logs/ansible-networking/check/github.com/rcarrillocruz-org/ansible-fork/5/c66a514898a14a9ba93a813c8d32a117/ | 16:15 |
jeblair | or we can link to the dashboard url for the buildset | 16:16 |
jeblair | once the dashboard lands | 16:16 |
jeblair | that may be the better approach | 16:16 |
jeblair | rcarrillocruz: ah thx, makes sense | 16:17 |
mordred | I like the dashboard approach ... since that link could potentially contain the in-progress links and change to the log links (at least in theory) | 16:17 |
jlk | yeah the dashboard was what we were hoping for | 16:17 |
jlk | and works more like Travis, CircleCI, Shippable, etc. | 16:17 |
mordred | so the link given in the status could be a persistent link that people could re-use | 16:18 |
jeblair | mordred: i agree, though currently the dashboard doesn't handle in-progress links | 16:18 |
mordred | yah | 16:18 |
jeblair | and it's not trivial to add | 16:18 |
pabelanger | jeblair: don't mind trying the new URL format | 16:18 |
* rcarrillocruz will follow what shippable does, to make people more comfortable on current way of doing ansible CI things | 16:18 | |
jeblair | (i think we *can*, it's just programming, but it's merging two data sources) | 16:18 |
jeblair | rcarrillocruz: what does that mean? | 16:19 |
jeblair | tristanC: replied on https://review.openstack.org/503270 | 16:26 |
tristanC | jeblair: followed up :) | 16:39 |
rcarrillocruz | just echoing jlk 'works more like travis, shippable'. At Ansible they use Shippable, so I'll try to show things like Shippable on zuul PR notifications | 16:40 |
jeblair | rcarrillocruz: right, i'm asking what that means :) | 16:41 |
tristanC | fwiw i'm not convinced the routes we decided on at the ptg are the best, it makes apache rewrites a bit weird to serve static .html files on dynamic paths | 16:41 |
tristanC | i wonder if we shouldn't step back and instead have a single .html file that would query the different controller paths | 16:42 |
tristanC | or if you have another suggestion, i wouldn't mind using another routes list and refactoring the html bits | 16:44 |
rcarrillocruz | this is how a link on an ansible PR looks: https://app.shippable.com/github/ansible/ansible/runs/44996/summary/console . From the gtest PR that was put up earlier we point to the main zuul v3 dashboard; it would be good to point to the actual job stream link | 16:45 |
rcarrillocruz | not sure how we get the shippable 'run' link | 16:45 |
rcarrillocruz | i can ask mattclay | 16:45 |
jlk | Pretty sure it comes with the status from Shippable | 16:46 |
jlk | the pending one | 16:46 |
rcarrillocruz | wootz | 16:46 |
rcarrillocruz | https://github.com/rcarrillocruz-org/ansible-fork/pull/5 | 16:46 |
rcarrillocruz | janky | 16:46 |
rcarrillocruz | but i get 'usable' links back on PR | 16:46 |
rcarrillocruz | just added zuul_return on the base post playbook | 16:46 |
jlk | https://travis-ci.org/BonnyCI/hoist/builds/267787248?utm_source=github_status&utm_medium=notification is a relevant link from Travis | 16:47 |
jlk | it's the URL it tosses on status POSTs | 16:48 |
mattclay | rcarrillocruz: You had a question about getting Shippable run links? | 16:48 |
rcarrillocruz | oh | 16:48 |
rcarrillocruz | did not even know you were here mattclay | 16:48 |
rcarrillocruz | :-) | 16:48 |
* mattclay waves | 16:49 | |
rcarrillocruz | so we were wondering how the shippable 'detail' link gets you straight to the job being run | 16:49 |
rcarrillocruz | as the zuul report on openstack just points to the main dashboard | 16:49 |
rcarrillocruz | https://github.com/gtest-org/ansible/pull/1 | 16:49 |
mattclay | rcarrillocruz: You mean the 'Details' link for the Shippable status that shows up on a PR? | 16:50 |
rcarrillocruz | yah | 16:50 |
mattclay | rcarrillocruz: I believe it's this: https://developer.github.com/v3/repos/statuses/#create-a-status | 16:50 |
jlk | right, like I said. It's a URL that is provided as part of the POST to set the commit status | 16:51 |
mattclay | It gets updated every time the run status changes until it's finished. | 16:52 |
pabelanger | I don't think shippable comments on PRs like zuul does, right? | 16:55 |
*** isaacb has quit IRC | 16:57 | |
rcarrillocruz | bit different yeah, it doesn't put a comment per-se | 16:57 |
rcarrillocruz | https://github.com/ansible/ansible/pull/33146 | 16:57 |
rcarrillocruz | it's an 'all checks have passed' that you can click | 16:58 |
rcarrillocruz | iirc with zuul we put a straight comment from the bot | 16:58 |
mordred | rcarrillocruz: we can do either - it's configurable | 16:58 |
mordred | rcarrillocruz: you can configure it to report into that status link, or to leave comments, or both | 16:58 |
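(A rough sketch of what that configuration might look like in a pipeline definition. The github reporter option names used here (status, comment) are recalled from the zuul GitHub driver and should be checked against its documentation; triggers and other required settings are omitted for brevity:)

```yaml
# Hedged sketch: a check pipeline reporting to GitHub via commit status and PR comments
- pipeline:
    name: check
    manager: independent
    start:
      github:
        status: pending    # sets the commit status that carries the Details link
        comment: false
    success:
      github:
        status: success
        comment: true      # also leave a summary comment on the PR
    failure:
      github:
        status: failure
        comment: true
```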
*** hashar is now known as hasharAway | 17:00 | |
rcarrillocruz | aha | 17:00 |
* rcarrillocruz just looking at comment option of GH reporter | 17:00 | |
jeblair | i thought we could not update the link...? | 17:00 |
jeblair | if we can update the url, then we should link to the status page in the start report, then link to the logs/dashboard in the final report | 17:05 |
jeblair | but i thought someone said we could only set the url once | 17:05 |
jeblair | oh, maybe i'm misremembering, and the problem is that, without the dashboard, we don't have a single url for the buildset after the builds complete? | 17:06 |
jeblair | so once we *do* have the dashboard, we can do what i described above: set the url to status page on start, then set the url to dashboard on final | 17:07 |
jeblair | rcarrillocruz, jlk: ^ does that sound right? | 17:07 |
mordred | yes - I think that was the main issue | 17:07 |
pabelanger | yah, that looks to be right based on docs | 17:07 |
jeblair | cool. i will be very happy when this is all straightened out. :) | 17:07 |
jeblair | tristanC: can we just have zuul-web serve the static html files? | 17:08 |
*** bhavik1 has joined #zuul | 17:16 | |
tristanC | jeblair: well yes, that's what it does by default | 17:17 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul feature/zuulv3: web: add /{tenant}/jobs route https://review.openstack.org/503270 | 17:18 |
jeblair | tristanC: sorry, i may have misunderstood what you were saying about apache rewriting | 17:21 |
tristanC | jeblair: to keep things dead simple from a user pov, we said that the status page would be available at /{tenant}/status.html | 17:22 |
tristanC | jeblair: that content is served by /{tenant}/status.json and the page just does a "get status.json" | 17:22 |
tristanC | jeblair: which is all good with a standalone zuul-web service | 17:23 |
tristanC | jeblair: but to serve those html files (which include builds.html, jobs.html, and later {jobname}.html) from a proxy, we need to rewrite those urls using something like: | 17:24 |
jeblair | why would we need to rewrite individual urls in the proxy? normally i would just expect to proxy the root | 17:25 |
tristanC | AliasMatch "^/zuul3/.*/(.*).html" "/var/www/zuul-web/static/$1.html" | 17:27 |
tristanC | jeblair: rewrite the static files so that they are served by apache instead of aiohttp | 17:27 |
jeblair | tristanC: why not let aiohttp serve them? | 17:27 |
tristanC | jeblair: good point :-) i may have over optimized that thing... | 17:28 |
jeblair | tristanC: also, we're sending cache-control headers for the status page at least, we can probably make sure we set those correctly for the html pages too, and then apache will end up serving them from cache anyway most of the time, with no extra configuration | 17:29 |
tristanC | alright then nevermind that concern, let's do this instead | 17:32 |
tristanC | just need to add cache-control to the static file controller | 17:33 |
jeblair | ++ | 17:35 |
pabelanger | jeblair: mordred: clarkb: dmsimard: I linked this last night, but https://review.openstack.org/521700/ is an example of the issues I was trying to explain around the need for https://review.openstack.org/521324/ | 17:39 |
pabelanger | it shows how group vars are handled differently based on inventory file | 17:40 |
pabelanger | jlk: ^ might be interesting to you too | 17:40 |
* dmsimard looks | 17:42 | |
dmsimard | pabelanger: I think I understand what's going on but that seems like a bug in Ansible to me | 17:44 |
dmsimard | Doing something in Zuul to address that seems like a workaround for a bug | 17:44 |
dmsimard | v3-inventory and v3-inventory-group should behave the same | 17:45 |
dmsimard | Well.. maybe not, actually | 17:45 |
jeblair | pabelanger: which numbers do you get when you run v3-inventory-group? | 17:52 |
*** bhavik1 has quit IRC | 17:53 | |
jeblair | ah it's in the job log -- 67890 | 17:53 |
clarkb | pabelanger: so the problem is in how group vars are associated to a host if its logical name doesn't change? | 17:53 |
jeblair | pabelanger: you switched the order of the groups in v3-inventory vs v3-inventory-group. is that important? | 17:55 |
dmsimard | pabelanger: I sort of remember something related to changes in variable scopes and inheritance in 2.4... let me check | 17:56 |
dmsimard | pabelanger: heh, that sounds like our culprit too: https://github.com/ansible/ansible/issues/29008 | 17:57 |
dmsimard | "import_playbook from child directory break var scope" | 17:57 |
pabelanger | jeblair: oh, that is a typo, doesn't affect things | 17:57 |
dmsimard | pabelanger: bcoca explains the change here: https://github.com/ansible/ansible/issues/29008#issuecomment-330558987 | 17:58 |
pabelanger | clarkb: maybe? I don't know why it doesn't work | 17:58 |
pabelanger | dmsimard: looking | 17:58 |
dmsimard | pabelanger: tl;dr, in 2.3 vars were loaded at the start (which confuses Ansible in your case because you have one host in two groups) and in 2.4 they are loaded on demand which should have the desired behavior | 17:58 |
jeblair | dmsimard: yeah, that's how i'm reading it | 17:59 |
dmsimard | pabelanger: the issue is prevalent especially if you have the same hostvar in more than one group_vars | 18:00 |
dmsimard | otherwise it probably doesn't reproduce | 18:00 |
pabelanger | dmsimard: right, I know include is deprecated in 2.4 and we should switch to the new syntax, but I haven't tested that yet | 18:00 |
jeblair | pabelanger: would it be very difficult for you to try your example under 2.4? | 18:00 |
pabelanger | jeblair: nope, i can run that now | 18:00 |
dmsimard | pabelanger: it's not a matter of using include or import, there *is* a change in how variables are loaded in 2.4 | 18:00 |
dmsimard | pabelanger: see bcoca's comment | 18:01 |
pabelanger | yes | 18:01 |
pabelanger | let me first test with include and 2.4 | 18:01 |
pabelanger | then, switch up to import_playbook | 18:02 |
dmsimard | I don't think either matters | 18:02 |
dmsimard | at least going off by what they're saying in the bug | 18:02 |
pabelanger | okay, 2.4.1.0 also failed. changing some syntax | 18:06 |
pabelanger | v3-inventory-group also fails using import_playbook | 18:07 |
pabelanger | dmsimard: which is what you expected | 18:07 |
pabelanger | dmsimard: so, what are you thinking is the correct process? | 18:07 |
pabelanger | I think it comes down to: http://paste.openstack.org/show/626981/ | 18:09 |
pabelanger | v3-inventory is 2 plays (which seems to load vars properly) and v3-inventory-group is 1 play | 18:10 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add inventory variables for checkouts https://review.openstack.org/521976 | 18:21 |
electrofelix | does zuul support running a specific job on failure? we need to do some parsing of data in order to report back to users on a change failure | 18:33 |
jlk | I don't think we have a 'finally' type bit of a pipeline | 18:35 |
jlk | that's an interesting feature addition, that a spec would be nice for | 18:35 |
electrofelix | I was hoping the failure_actions in v3 might be something along those lines | 18:35 |
jlk | in your playbook you could catch failure from a play and handle it within that job | 18:35 |
electrofelix | bit difficult when any of 5 jobs launched could be the cause of the failure | 18:36 |
jlk | nod, where do you see failure_actions? I may have missed something | 18:36 |
mordred | electrofelix: so - one of the things on the todo list is better parsing/presentation of the logged json for each job... for instance: | 18:37 |
mordred | electrofelix: http://logs.openstack.org/05/521105/3/infra-check/openstack-zuul-jobs-linters/942626d/job-output.json.gz | 18:37 |
mordred | electrofelix: has all the base data that was used to produce http://logs.openstack.org/05/521105/3/infra-check/openstack-zuul-jobs-linters/942626d/job-output.txt.gz | 18:38 |
electrofelix | it's in zuul/model.py, I figured it might be a generic replacement for how the failure message was handled before | 18:38 |
tobiash | electrofelix: post playbooks should be executed regardless of a failed run playbook (within the same job) | 18:38 |
mordred | electrofelix: so with an html view of that, collapsing the non-error portions and expanding only the failure portion should be fairly easy | 18:38 |
tobiash | if that's enough | 18:38 |
mordred | and yah - what tobiash said | 18:38 |
electrofelix | tobiash: we'd end up needing to copy the same code to the post playbook of all jobs (and btw, we're still using Jenkins...) | 18:39 |
mordred | electrofelix: I think I may not fully understand which thing you're trying to do? | 18:39 |
electrofelix | I was hoping there might be something where we say: on failure of any of the jobs for the change in the pipeline, run this job | 18:39 |
tobiash | electrofelix: so you're talking about zuulv2? | 18:40 |
mordred | electrofelix: what would you do in the job that runs in response to on_failure? | 18:40 |
electrofelix | tobiash: yes, but also considering moving to zuulv3 (still works with Gearman) | 18:40 |
mordred | electrofelix: but you don't need to copy the same code to the post playbook of all jobs - you should be able to put the code you need in the post playbook of your base job | 18:41 |
tobiash | so cleanup jobs are not there in zuul but I think there were already discussions about that some months ago | 18:41 |
mordred | electrofelix: also, did I paste you https://etherpad.openstack.org/p/zuulv3-jenkins-integration yet? | 18:42 |
electrofelix | mordred: take the git tree, parse some metadata stored in the failed commit message, look up some changes further upstream (we're doing artifact promotion), and notify the source projects that produced the artifact that just failed its promotion | 18:42 |
jeblair | tobiash: yes, i think we're planning on adding them shortly after 3.0 | 18:42 |
mordred | electrofelix: yah - you should totally be able to just do that in a post playbook on your base job - it'll have a variable that indicates whether the job failed or not, and it also has all of the git repo state available | 18:43 |
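(A sketch of what such a base-job post playbook could look like. mordred doesn't name the variable here; zuul_success is assumed, and the notification role is hypothetical:)

```yaml
# Hedged sketch of a post-run playbook on the base job
- hosts: localhost
  tasks:
    - name: Notify upstream projects that an artifact failed promotion
      include_role:
        name: notify-promotion-failure   # hypothetical role doing the parsing/reporting electrofelix describes
      when: not (zuul_success | default(true) | bool)
```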
electrofelix | mordred: yep, I think it's orthogonal, I possibly just need to understand a bit more about base jobs and what that means in working with gearman/jenkins | 18:43 |
mordred | electrofelix: ++ | 18:43 |
mordred | electrofelix: mostly poking to make sure I understand the thing you're wanting to accomplish. you could do it with a cleanup job, but I'm pretty sure you could do it with a base job. | 18:45 |
jeblair | the difference between cleanup/base would be whether it happens once per job in a buildset, or once for the whole buildset | 18:45 |
mordred | jeblair: ++ | 18:46 |
mordred | electrofelix: all that said - you do know that zuulv3 isn't compatible with the jenkins gearman plugin, yeah? that's the reason I pasted that etherpad about v3/jenkins integration thoughts | 18:46 |
electrofelix | based on my hazy understanding of the terminology, once for the whole buildset; we only care if any of the jobs for the change have failed, we don't care which one | 18:46 |
mordred | nod. so cleanup job, once it exists, may map better for you | 18:47 |
jlk | woo, use case -> solution. | 18:47 |
jlk | go team | 18:47 |
jeblair | ya, and now we have 2 use cases for cleanup | 18:48 |
mordred | and since all you need is the git repo state, you should be able to potentially write a nodeless cleanup job | 18:48 |
mordred | \o/ | 18:48 |
mordred | that means it may even be a good idea :) | 18:48 |
electrofelix | mordred: I thought the additional work was to make it work better rather than it not working at all. I see there are still references to gearman in v3 | 18:50 |
electrofelix | mordred: what is it that doesn't currently work? is it just the nodepool integration? or can zuulv3 not launch jobs on Jenkins with static slaves at all? | 18:55 |
mordred | electrofelix: oh - it still uses gearman, but it uses gearman as an internal communication mechanism, not as an interface with external systems | 18:55 |
jeblair | the zuulv3 spec may provide some background: http://specs.openstack.org/openstack-infra/infra-specs/specs/zuulv3.html | 18:56 |
jeblair | also some of the docs we wrote for the infra migration: https://docs.openstack.org/infra/manual/zuulv3.html#what-is-zuul-v3 | 18:57 |
jeblair | in short, it handles execution and multi-node orchestration itself, via ansible. jobs run as ansible playbooks. those can be simple playbooks which just run tests (which is the bulk of what we do in openstack infra), but ansible gives us a lot of flexibility in interacting with other systems, so mordred's etherpad lays out a way of doing so | 18:59 |
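(For illustration, the "simple playbooks which just run tests" jeblair mentions can be as small as this; the tox environment is arbitrary, and zuul.project.src_dir is the variable zuul provides for the checked-out repo location:)

```yaml
# Minimal sketch of a v3 job playbook that just runs tests on the node
- hosts: all
  tasks:
    - name: Run the linters
      command: tox -e pep8
      args:
        chdir: "{{ zuul.project.src_dir }}"
```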
*** jeblair is now known as thecount | 19:01 | |
*** thecount is now known as jeblair | 19:01 | |
mordred | yah. one of the biggest bits is figuring out how to get the zuul prepared repo state onto the node that jenkins is going to run the job on - that's the reason for the handoff dance in the second part of the etherpad | 19:01 |
mordred | if you're doing static nodes, then obviously the nodepool integration bit isn't as important | 19:01 |
electrofelix | is it no longer possible from the node side to clone/pull from zuul merger? | 19:02 |
mordred | although we might have to brainstorm about how to get the information about the correct static node to zuul if it's static nodes jenkins owns rather than static nodes zuul owns and passes over | 19:02 |
mordred | electrofelix: nope. they don't run git servers at all | 19:03 |
jeblair | that was mostly to facilitate the workflow where cloud resources don't have access to the control plane, so it's a push rather than a pull now | 19:03 |
electrofelix | mordred: ah, well that would be a problem, I wonder if we could hack something there temporarily as otherwise it might become so difficult to migrate that no time ever gets allocated for us to help work on it | 19:04 |
mordred | electrofelix: well - once the jenkins integration stuff is done (several people want it/need it) I imagine it would be much easier for you to migrate | 19:05 |
jeblair | (and as mordred mentioned, doing a first pass of the zuul-trigger-plugin without nodepool support should be a lot easier) | 19:05 |
jeblair | (if you only have static nodes to worry about for the moment) | 19:05 |
mordred | ++ | 19:06 |
* rcarrillocruz vaguely recalls having a reverse tunnel to the merger in Gozer just for that | 19:06 | |
mordred | rcarrillocruz: sssh. don't put that tunnel in the docs :) | 19:06 |
dmsimard | pabelanger: hey sorry I went to get lunch | 19:06 |
* rcarrillocruz also has dug Gozer deep in its memory so he may be wrong | 19:06 | |
rcarrillocruz | :P | 19:07 |
dmsimard | pabelanger: so you were not able to get group_vars to load expectedly in either cases with 2.4.1.0 ? | 19:07 |
electrofelix | rcarrillocruz: I thought gozer was old enough that it still had the push merge functionality ;-) | 19:07 |
rcarrillocruz | lol, we had so many hacks it's hard to remember | 19:07 |
electrofelix | Needing to hack something together to deal with this (reporting back to other repos on a failure in a different repo) might make it hard to persuade people it's worth writing the zuul trigger plugin, and then mean we have a more difficult time migrating | 19:08 |
rcarrillocruz | like all the proxy mesh i put to make pulls to work on the internal labs | 19:08 |
pabelanger | dmsimard: both v2-inventory and v3-inventory work as expected, v3-inventory-group fails | 19:08 |
dmsimard | pabelanger: so same behavior as 2.3 ?? | 19:08 |
pabelanger | dmsimard: right | 19:09 |
pabelanger | which, is fine for me. | 19:09 |
dmsimard | let me try something | 19:09 |
* tobiash had fun deploying openshift a hundred times during the last two weeks | 19:13 | |
electrofelix | mordred jeblair: so we run a git daemon from the same container as the zuul merger instance (using supervisor), which is obviously a giant security hole for private repos, but hey let's not worry about that. Seems like that might allow us to migrate to v3 without the zuul trigger plugin for getting code onto the slaves | 19:16 |
dmsimard | pabelanger: ok, FWIW I confirm the behavior -- asking bcoca about it, it is expected behavior. The problem is basically that you have group vars for the *same* host in two groups; the last one loaded wins in that case, which is ultimately determined by alphabetical order by default | 19:21 |
pabelanger | right | 19:22 |
dmsimard | pabelanger: but this behavior can be changed with the "ansible_group_priority" var.. I don't see it on docs.ansible.org but there's a mention of it here https://github.com/ansible/ansible/pull/28777 | 19:22 |
pabelanger | dmsimard: https://review.openstack.org/521324/ is my attempt to fix it | 19:23 |
jeblair | electrofelix: well, in v3 the git repos we want to put onto the workers are on the new zuul-executor server, and they're in a job-specific directory. it, erm, would be physically possible for you to do the same sort of thing, except that it's an even larger security hole. it's definitely not intended to be served out. tbh, i'm not sure it'd be that much harder to do the jenkins plugin. | 19:23 |
dmsimard | pabelanger: that makes sense | 19:23 |
dmsimard | pabelanger: this problem hurts my brain | 19:23 |
dmsimard | jeblair: I understand pabelanger's issue now | 19:24 |
dmsimard | jeblair: forget about SSH, different plays or var scopes.. it's about the same *inventory host* being in two different groups, and these two groups each having group_vars.. There has to be one group_vars that wins over the other. What pabelanger aims to fix is to provide the ability to generate different *inventory hosts*, which are really the same nodepool VM, so as to make sure each group_vars is loaded properly | 19:26 |
dmsimard | I hope that makes sense, this one hurts my brain for some reason | 19:26 |
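(A minimal reproduction of what dmsimard describes, with made-up names: one inventory host in two groups, each group's group_vars defining the same variable. The last group loaded -- alphabetical order by default -- wins, and ansible_group_priority, mentioned by dmsimard above, is the knob for changing that:)

```yaml
# inventory.yaml -- the same host appears in two groups
all:
  children:
    controller:
      hosts:
        node1: {}
    worker:
      hosts:
        node1: {}

# group_vars/controller.yaml
listen_port: 12345
# ansible_group_priority: 10   # uncomment to make this group win the conflict

# group_vars/worker.yaml
listen_port: 67890              # wins by default: 'worker' sorts after 'controller'
```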
electrofelix | jeblair: the problem is selling it: it can sound much easier when it's supposedly just a script, and far more work when it's a plugin; whether it is actually the same amount of work doesn't always figure into it... | 19:27 |
electrofelix | mordred: I'll try chatting to you more about the plugin, I've a feeling I won't be able to get it to fly this side of feb, but might at least try | 19:29 |
dmsimard | pabelanger: something that is worth mentioning is that this problem doesn't reproduce if you have different var names | 19:30 |
dmsimard | pabelanger: we're seeing this "race" because the same var is defined in both places | 19:30 |
clarkb | dmsimard: interesting so it must merge the vars together? | 19:30 |
clarkb | and its last overlapping name wins? | 19:30 |
dmsimard | clarkb: yes, child > parent, priority and then 'alpha sort' | 19:31 |
dmsimard | There's arguably not much else they can do | 19:31 |
dmsimard | There has to be something to resolve conflicts | 19:31 |
pabelanger | dmsimard: right, I want to keep the variable names, but set them to different values based on the host. I could rewrite the playbooks to use unique vars, but that's not something I'd like to do | 19:31 |
mordred | electrofelix: so - it's also been suggested to me that the thing I'm calling a plugin might be able to be done with a groovy script in a jenkinsfile | 19:32 |
dmsimard | pabelanger: yup, just clarifying the behavior about "conflicting" group_vars | 19:32 |
pabelanger | dmsimard: yah, thanks | 19:32 |
pabelanger | you explained it better then I could | 19:32 |
mordred | electrofelix: I don't really know much about those - but I bet if we put our heads together we could come up with a hacky POC approach that would do the handoffs appropriately but not involve a new plugin | 19:32 |
electrofelix | mordred: yes, but it requires a system groovy script and that would just be a precursor to a plugin because you really wouldn't want to have to replicate that for every job | 19:33 |
mordred | electrofelix: nod | 19:34 |
jeblair | dmsimard: yes, though i had read the comment on the bug about loading on demand as suggesting that perhaps when a host is being used because it's in a specific group, that group would win, not some arbitrary first or last group. | 19:35 |
jeblair | but i'm not arguing with reality, that was just what i was hoping for :) | 19:36 |
dmsimard | jeblair: that's what I thought too, actually, which is why I was surprised to see the issue stayed there in 2.4 | 19:39 |
dmsimard | let me challenge upstream on that | 19:39 |
pabelanger | I should also note, https://review.openstack.org/519596/ didn't actually fix the issue with var scoping as I expected. So, we could just abandon that now, if we don't see value in doing it | 19:40 |
pabelanger | it still required an updated inventory file | 19:40 |
dmsimard | pabelanger: added a comment on https://review.openstack.org/#/c/521324/ which summarizes what we discussed | 19:42 |
*** hasharAway is now known as hashar | 19:44 | |
*** electrofelix has quit IRC | 19:45 | |
dmsimard | jeblair: vars are loaded "just in time" for the host, but it doesn't change the fact that when it loads the vars there is a conflict that needs to be resolved, basically | 19:47 |
dmsimard | There's no awareness of context as to what group the play is running against vs variable loading | 19:48 |
dmsimard | Which would be awkward anyway; if you target a play against "all", you don't really know what group you're targeting | 19:48 |
jeblair | ya, makes sense | 19:49 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Only run whereto if htaccess file exists https://review.openstack.org/521996 | 19:59 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Only run whereto if htaccess file exists https://review.openstack.org/521996 | 20:00 |
*** jasondotstar has joined #zuul | 20:13 | |
kklimonda | how does zuul promote work? | 20:23 |
kklimonda | based on the description I expected promote to move the given change to the top of the queue (below the currently running jobs), but either that did not happen, or the UI is just not showing it correctly. | 20:25 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Add general sphinx and reno jobs and role https://review.openstack.org/521142 | 20:29 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Add support for warning-is-error to sphinx role https://review.openstack.org/521618 | 20:34 |
jeblair | kklimonda: it should move the change to the top of the queue *ahead* of the currently running jobs | 20:36 |
kklimonda | @jeblair is there any way to see the internal zuul queue to check if that has happened? Or is the zuul web/status.json the "canonical" representation anyway? | 20:37 |
jeblair | kklimonda: it's canonical | 20:37 |
jeblair | kklimonda: can you describe your initial state, your promote command, and the state after running promote in more detail? | 20:38 |
kklimonda | sure | 20:38 |
jeblair | kklimonda: (maybe use etherpad.openstack.org if it helps to write it out there) | 20:38 |
kklimonda | jeblair: https://etherpad.openstack.org/p/zuulv3-promote - zuulv3 web is public, and nothing critical in logs, so I just wrote it all down | 20:42 |
jeblair | looking | 20:42 |
jeblair | oh it's check, i was assuming gate | 20:43 |
kklimonda | is this a gate-only feature? what's the difference? | 20:44 |
jeblair | within a pipeline, there are multiple queues. in gate (dependent pipelines), these are determined by which projects affect each other and need to be tested together. in check (independent pipelines), all of the items are independent (ie, their ordering in the pipeline doesn't affect each other), so every item gets its own queue. | 20:45 |
kklimonda | ah, that makes sense | 20:45 |
jeblair | so yeah, promote isn't going to do anything in that case since it's a queue of one | 20:45 |
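A conceptual Python sketch of the queue behaviour described above; this is illustrative only, not Zuul's code. In an independent pipeline every item sits alone in its own queue, so promote has nothing to reorder, while a dependent pipeline groups items into shared queues based on which projects gate together:

    # Conceptual sketch of queue assignment for the two pipeline manager types;
    # illustrative only, not Zuul's implementation.

    def assign_queues(items, manager, shared_projects=None):
        """Return the pipeline's queues as lists of items."""
        if manager == "independent":
            # Every item is alone in its own queue, so there is nothing for
            # "promote" to reorder against.
            return [[item] for item in items]

        # Dependent: items whose projects are configured to gate together share
        # a change queue, in arrival order.
        queues = {}
        for item in items:
            key = (shared_projects or {}).get(item["project"], item["project"])
            queues.setdefault(key, []).append(item)
        return list(queues.values())

    items = [
        {"change": "521142,1", "project": "openstack-infra/zuul-jobs"},
        {"change": "521618,1", "project": "openstack-infra/zuul-jobs"},
        {"change": "521694,1", "project": "openstack-infra/zuul"},
    ]
    shared = {"openstack-infra/zuul": "zuul", "openstack-infra/zuul-jobs": "zuul"}

    print(assign_queues(items, "independent"))        # three one-item queues
    print(assign_queues(items, "dependent", shared))  # one shared three-item queue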
jeblair | we could probably alter it to do something more useful in that case | 20:46 |
kklimonda | would it be possible to implement that for check too? As in, how much work would that be? I'm only juggling 2 patches right now ;) | 20:46 |
kklimonda | s/check/independent pipelines/ | 20:47 |
jeblair | i don't think it would be a simple change... mostly because the behavior we get in gate comes as a side effect of re-ordering the queue (the dependency stack has changed, so zuul cancels jobs and re-launches) | 20:48 |
jeblair | here's an idea though | 20:48 |
jeblair | the goal is really "get me results for this change faster", right? | 20:48 |
pabelanger | I actually think if you promote a change in check, it gets moved back to the bottom of the status page | 20:49 |
pabelanger | at least that is how I remember it when I tried to promote something in check many moons ago | 20:49 |
jlk | punishment! | 20:50 |
jeblair | perhaps we could add a command to change the priority for a specific change. normally priority is determined by the pipeline. but if we had a command to say "increase the priority of this change", zuul could cancel the node request for that change, and re-issue it with the updated, higher, priority. this would let it get the nodes faster and therefore complete faster. | 20:50 |
kklimonda | right, that's how I assumed it worked in the first place - then I started reading the code, and got confused :) | 20:51 |
kklimonda | I was missing "gate-only" part of the puzzle, now that code makes more sense | 20:51 |
jeblair | node allocation is now the dominant factor in when changes start running jobs. the gearman queue is far less relevant now | 20:51 |
kklimonda | right | 20:51 |
jeblair | this priority change would probably be a lot easier to do. | 20:51 |
kklimonda | would that also affect dependent pipelines, or is promote basically doing the same thing anyway? | 20:52 |
pabelanger | Ya, priority would be nice | 20:52 |
kklimonda | if it is, perhaps we could just reuse promote for both pipelines, and just make zuul do a different thing based on the pipeline type.. which sounds pretty nasty.. | 20:53 |
jeblair | kklimonda: it could work on dependent pipelines, but promote would still be better there, because a change at the end of the queue with jobs that have finished still won't report until the change ahead has. though you could use priority on a set of changes in one change queue to give them an advantage over a different change queue. | 20:54 |
jeblair | kklimonda: yeah... i'm sort of thinking that two commands may be clearer, but maybe we should have 'zuul promote' error out on independent pipelines? | 20:54 |
jeblair | ("You probably don't want this, use priority instead") | 20:55 |
kklimonda | mhm, error out with a message about the other command could work | 20:55 |
kklimonda | jeblair: btw, now that the summit is over if you have time, I've reworked https://review.openstack.org/#/c/515169/ a bit | 20:59 |
kklimonda | I've also had an idea of how to unify autohold requests for jobs, changes and refs by making the last part of the key a regex (.* for job-wide, refs/changes/[change]/.* for changes and the full ref for refs) | 21:01 |
kklimonda | but before I write it I wanted someone to take a look at the current revision | 21:02 |
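A purely hypothetical Python sketch of the key scheme being proposed, where the last component of an autohold key is a regex matched against the ref under test. The tenant, project and job names are invented; the change number is the one mentioned above:

    import re

    # Hypothetical illustration of the proposed autohold keys, where the last
    # component is a regular expression matched against the ref being tested.
    # Tenant/project/job names are invented; 515169 is the change discussed above.
    holds = {
        ("tenant", "project", "some-job", r".*"):
            "hold any build of some-job",
        ("tenant", "project", "some-job", r"refs/changes/69/515169/.*"):
            "hold builds of some-job for change 515169 only",
    }

    def matching_holds(tenant, project, job, ref):
        return [
            reason
            for (t, p, j, ref_re), reason in holds.items()
            if (t, p, j) == (tenant, project, job) and re.fullmatch(ref_re, ref)
        ]

    # Both the job-wide hold and the change-specific hold match this ref.
    print(matching_holds("tenant", "project", "some-job",
                         "refs/changes/69/515169/3"))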
jeblair | kklimonda: ah, thanks! i had a successful vacation and managed to completely forget everything from before the summit. :) | 21:02 |
dmsimard | jeblair: that's quite the feat | 21:03 |
kklimonda | haha, didn't know that was actually possible - tell me your secret ;) | 21:03 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Set success-url for sphinx-docs to html https://review.openstack.org/522017 | 21:05 |
mordred | kklimonda: vacationing in places where there is no internet is helpful - I did the same thing before the summit - it made the beginning of the summit fun, as I had to re-learn what a computer is | 21:06 |
dmsimard | mordred, jeblair, pabelanger: By the way, static report generation in ARA might not make it into 1.0. The use case with Zuul made me realize that it /really/ doesn't scale well and I'd much rather improve the sqlite "middleware" option I came up with instead ( http://ara.readthedocs.io/en/latest/advanced.html ) | 21:07 |
jeblair | i deprived myself of oxygen by climbing something like 6 thousand stairs; probably caused permanent brain damage but felt great. | 21:07 |
mordred | jeblair: \o/ | 21:07 |
mordred | jeblair: you had too many brain nuggets anyway | 21:07 |
mordred | dmsimard: nod. where did we get on deploying the middleware version in openstack land? | 21:08 |
jeblair | dmsimard: thanks, makes sense | 21:08 |
dmsimard | The static generation in ARA doesn't come for free, there are some constraints and hacks involved to ensure parity between the dynamic and the static version -- so improving the story around "arbitrary" sqlite databases and making the report always "dynamic" will allow for more freedom | 21:08 |
dmsimard | mordred: not yet, there's reviews for logs-dev.o.o here: https://review.openstack.org/#/q/topic:ara-sqlite-middleware | 21:09 |
dmsimard | I just -W'd https://review.openstack.org/#/c/513866/ because I need to double check something with the vhost setup first. | 21:09 |
kklimonda | @dmsimard with sqlite middleware, would it be possible to "parse" ara reports programmatically, for example to get per-task durations? | 21:10 |
dmsimard | kklimonda: when 1.0 is released, yes -- not in the current "stable" version | 21:10 |
dmsimard | kklimonda: I actually discussed this last night, hang on | 21:11 |
dmsimard | kklimonda: http://eavesdrop.openstack.org/irclogs/%23zuul/%23zuul.2017-11-21.log.html#t2017-11-21T04:14:39 | 21:12 |
dmsimard | see for example http://paste.openstack.org/raw/626895/ which gets information about failed tasks for a particular playbook | 21:12 |
kklimonda | mhm, that will probably make a lot of things easier :) | 21:13 |
kklimonda | I had to gather the duration of a single task across all the jobs; right now I ended up parsing html (with the power of grep and sed) but being able to just load a bunch of sqlite DBs and run queries on them would be much nicer | 21:14 |
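A hedged Python sketch of what that could look like against the per-job sqlite databases. The logs/*/ara-report/ansible.sqlite path, the tasks table and the name/duration columns are assumptions about ARA's layout and schema, not a documented interface, and would need checking against a real database:

    import glob
    import sqlite3

    # Sketch of pulling per-task durations out of per-job ARA sqlite databases
    # instead of scraping HTML.  The path, the "tasks" table and the "name" and
    # "duration" columns are assumed, not a documented schema.

    TASK_NAME = "some-role : some task name"  # hypothetical task to look for

    for db_path in glob.glob("logs/*/ara-report/ansible.sqlite"):
        conn = sqlite3.connect(db_path)
        try:
            rows = conn.execute(
                "SELECT name, duration FROM tasks WHERE name = ?", (TASK_NAME,)
            ).fetchall()
            for name, duration in rows:
                print(db_path, name, duration)
        finally:
            conn.close()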
dmsimard | kklimonda: and the cool thing about the API is that the client-side implementation (that paste just now) knows how to "talk" to the API offline/internally or over HTTP REST without any changes in the implementation | 21:15 |
dmsimard | so people can write "plugins" or whatever they want and it'll just work, whether I'm running it locally on my laptop without a centralized instance or if I'm sending data over http | 21:16 |
kklimonda | so with this implementation anyone could write a python script that will connect to ara endpoints and query them for various data? | 21:16 |
dmsimard | yes, right now the client is bundled in ara -- but the plan is to unbundle it.. like python-araclient or something. Same for the other components (webapp especially) | 21:17 |
dmsimard | It's not 100% clear yet how the API will end up being restricted (or not).. I'm not interested in the business of handling credentials, passwords, permissions, ACLs/RBAC, etc. This might be an exercise left to the operator -- to restrict through a webserver or something. | 21:18 |
kklimonda | right now there is no RBAC etc. anyway, right? | 21:20 |
dmsimard | Right, but there's also no API and the interface is 100% passive | 21:20 |
kklimonda | anyone can just pull static files and have their sanity tested by parsing it with regex | 21:20 |
dmsimard | The interface in 1.0 remains 100% passive, but you can POST/PATCH/DELETE through the API | 21:20 |
kklimonda | hum | 21:20 |
kklimonda | what would be the usecase for making changes to the already generated report? | 21:21 |
kklimonda | (I'm probably missing something obvious, I only see ARA as a tool to display zuul job results right now :)) | 21:21 |
dmsimard | Mostly things that you don't know until later | 21:22 |
dmsimard | For example, we might want to create a record in the database for a task | 21:22 |
dmsimard | and then update it later once we know if it failed or passed | 21:22 |
dmsimard | ara itself isn't really going to be modifying historical data, but the ability is there -- the api is super generic | 21:23 |
dmsimard | kklimonda: https://github.com/ansible/ansible/blob/devel/lib/ansible/plugins/callback/default.py might give some context around how things work | 21:24 |
dmsimard | kklimonda: ara is a callback plugin that leverages each of these hooks (v2_playbook_on_start, v2_task_on_failed, etc.) and in some circumstances you want to circle back around an event that started (a task) and "finish" it (mark it as successful) | 21:25 |
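For context, a bare-bones Python sketch of that callback-plugin shape, using the standard CallbackBase hooks (the failure hook is v2_runner_on_failed); unlike ara it only prints events instead of writing them to a database:

    from ansible.plugins.callback import CallbackBase


    class CallbackModule(CallbackBase):
        """Record a task when it starts and mark its outcome when a result
        arrives, the "circle back and finish it" pattern described above."""

        CALLBACK_VERSION = 2.0
        CALLBACK_TYPE = 'notification'
        CALLBACK_NAME = 'minimal_recorder'

        def __init__(self):
            super(CallbackModule, self).__init__()
            self.open_tasks = {}

        def v2_playbook_on_task_start(self, task, is_conditional):
            # Create a record now; the outcome is unknown at this point.
            self.open_tasks[task._uuid] = {'name': task.get_name(), 'status': None}

        def v2_runner_on_ok(self, result):
            self._finish(result, 'ok')

        def v2_runner_on_failed(self, result, ignore_errors=False):
            self._finish(result, 'failed')

        def _finish(self, result, status):
            # Circle back to the record created on task start and close it out.
            record = self.open_tasks.setdefault(result._task._uuid, {'name': None})
            record['status'] = status
            self._display.display('%s -> %s' % (record['name'], status))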
dmsimard | mordred: btw "select count(*)" is stupid slow for a number of reasons and I kind of want to keep numbers about the amount of data processed by ara. How do you feel about just selecting the last row and getting the id instead? like "select id from table order by id desc limit 1"? It's going to be inaccurate if you end up deleting data but it's not really a lie in the sense that ara did process those rows | 21:31 |
dmsimard | I totally hacked something to make it run faster with sqlalchemy (thank you anonymous stackoverflow person) but it's still way too slow | 21:32 |
mordred | dmsimard: have you tried "select count(id)" ? | 21:33 |
mordred | dmsimard: select count(*) has a special optimization in mysql that makes it fast | 21:33 |
mordred | dmsimard: but if you select count(id) sqlalchemy _should_ be able to use the index on the primary key | 21:34 |
dmsimard | I don't remember, I do know that it is a fairly well documented issue that select count is slow in sqlalchemy | 21:34 |
mordred | nod. well - getting the highest value from an auto increment int primary key column should be good enough | 21:34 |
dmsimard | mordred: that special optimization is in innodb ? | 21:35 |
mordred | dmsimard: oh - actually, I think it's just in myisam - trying to remember - it's been a few years since my consulting days and it gets hazy | 21:37 |
dmsimard | heh | 21:37 |
dmsimard | I vaguely remember doing repeated "show table status" on innodb tables and the row count varying wildly | 21:38 |
jeblair | mordred: maybe you're at the phase now where you can only tune mysql while drunk | 21:38 |
dmsimard | oh look it's explained here | 21:38 |
dmsimard | The number of rows. Some storage engines, such as MyISAM, store the exact count. For other storage engines, such as InnoDB, this value is an approximation, and may vary from the actual value by as much as 40 to 50%. In such cases, use SELECT COUNT(*) to obtain an accurate count. | 21:38 |
dmsimard | The Rows value is NULL for tables in the INFORMATION_SCHEMA database. | 21:38 |
mordred | dmsimard: yah - there it is | 21:38 |
dmsimard | good to know | 21:39 |
mordred | and https://www.percona.com/blog/2007/04/10/count-vs-countcol/ explains how innodb will do which type of scans in which cases | 21:39 |
mordred | also https://www.percona.com/blog/2006/12/01/count-for-innodb-tables/ | 21:40 |
mordred | depending on how much you want to know :) | 21:40 |
dmsimard | It's been at least 2 years since I've actively tuned mysql but it's still fun :) | 21:41 |
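For illustration, the two approaches weighed above, sketched in Python against a throwaway sqlite database (the playbooks table here is hypothetical): an exact COUNT versus simply reading the highest auto-increment id, which is cheap but over-counts once rows have been deleted:

    import sqlite3

    # Hypothetical "playbooks" table; only the auto-increment id matters here.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE playbooks (id INTEGER PRIMARY KEY AUTOINCREMENT, path TEXT)")
    conn.executemany("INSERT INTO playbooks (path) VALUES (?)",
                     [("site.yml",), ("deploy.yml",), ("test.yml",)])
    conn.execute("DELETE FROM playbooks WHERE id = 2")

    # Exact but potentially slow on big tables: has to scan an index or the table.
    exact = conn.execute("SELECT COUNT(id) FROM playbooks").fetchone()[0]

    # Cheap approximation: read the last assigned id.  Over-counts once rows are
    # deleted, but still reflects how many rows were ever processed.
    approx = conn.execute(
        "SELECT id FROM playbooks ORDER BY id DESC LIMIT 1").fetchone()[0]

    print(exact, approx)  # prints: 2 3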
*** jkilpatr_ has quit IRC | 21:47 | |
*** threestrands has joined #zuul | 21:48 | |
*** jkilpatr has joined #zuul | 22:05 | |
*** hashar has quit IRC | 23:22 |