clarkb | any idea how http://logs.openstack.org/58/671858/3/gate/zuul-tox-docs/b6f9626/job-output.txt.gz#_2019-07-23_16_48_40_674698 happened and if it has been corrected? | 00:14 |
*** ianychoi has quit IRC | 00:18 | |
*** ianychoi has joined #zuul | 00:20 | |
fungi | clarkb: fixed by 672372 maybe? | 00:22 |
clarkb | ah I see that now in scrollback thanks | 00:23 |
openstackgerrit | Merged zuul/zuul master: Fix sphinx error https://review.opendev.org/672372 | 00:33 |
*** mattw4 has quit IRC | 00:37 | |
*** jamesmcarthur has joined #zuul | 00:52 | |
*** jamesmcarthur has quit IRC | 00:58 | |
*** igordc has quit IRC | 01:04 | |
*** jamesmcarthur has joined #zuul | 01:20 | |
*** bhavikdbavishi has joined #zuul | 01:52 | |
*** bhavikdbavishi1 has joined #zuul | 01:55 | |
*** bhavikdbavishi has quit IRC | 01:57 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 01:57 | |
*** jamesmcarthur has quit IRC | 02:16 | |
*** jamesmcarthur has joined #zuul | 02:17 | |
*** jamesmcarthur has quit IRC | 02:21 | |
*** jamesmcarthur has joined #zuul | 02:22 | |
*** jamesmcarthur has quit IRC | 02:24 | |
*** jamesmcarthur has joined #zuul | 02:24 | |
*** jamesmcarthur has quit IRC | 02:32 | |
*** bhavikdbavishi has quit IRC | 02:46 | |
*** mattw4 has joined #zuul | 03:21 | |
*** bhavikdbavishi has joined #zuul | 03:35 | |
*** mattw4 has quit IRC | 03:37 | |
*** mattw4 has joined #zuul | 03:43 | |
*** mattw4 has quit IRC | 03:51 | |
*** igordc has joined #zuul | 03:58 | |
*** igordc has quit IRC | 04:01 | |
*** bolg has joined #zuul | 04:02 | |
*** raukadah is now known as chandankumar | 04:02 | |
*** pcaruana has joined #zuul | 04:27 | |
*** michael-beaver has quit IRC | 04:31 | |
*** bjackman has joined #zuul | 04:43 | |
*** pcaruana has quit IRC | 05:12 | |
*** jangutter has quit IRC | 05:18 | |
*** pcaruana has joined #zuul | 05:25 | |
*** bolg has quit IRC | 05:34 | |
*** bolg has joined #zuul | 05:36 | |
*** bolg has quit IRC | 05:55 | |
*** tosky has joined #zuul | 06:41 | |
*** rlandy has joined #zuul | 06:58 | |
*** jpena|off is now known as jpena | 07:12 | |
*** jangutter has joined #zuul | 07:19 | |
*** jpena is now known as jpena|mtg | 07:19 | |
daniel2 | So with the dockerized zuul setup, how does nodepool build images? Does it still use diskimagebuilder? | 07:21 |
*** jangutter has quit IRC | 07:23 | |
openstackgerrit | Ian Wienand proposed zuul/nodepool master: Enable debug logs for openstack-functional tests https://review.opendev.org/672412 | 07:23 |
*** hashar has joined #zuul | 07:41 | |
*** bolg has joined #zuul | 07:53 | |
*** threestrands has joined #zuul | 08:02 | |
*** rlandy is now known as rlandy|mtg | 08:51 | |
*** altlogbot_2 has quit IRC | 08:51 | |
*** irclogbot_2 has quit IRC | 08:51 | |
*** altlogbot_2 has joined #zuul | 08:53 | |
*** irclogbot_3 has joined #zuul | 08:53 | |
*** hwangbo has quit IRC | 09:04 | |
*** jangutter has joined #zuul | 09:16 | |
*** saneax has joined #zuul | 09:35 | |
*** bhavikdbavishi has quit IRC | 09:38 | |
*** bolg has quit IRC | 09:39 | |
*** bolg has joined #zuul | 09:48 | |
*** jamesmcarthur has joined #zuul | 09:49 | |
*** arxcruz is now known as arxcruz|brb | 09:54 | |
*** jamesmcarthur has quit IRC | 09:58 | |
*** jamesmcarthur_ has joined #zuul | 09:58 | |
*** threestrands has quit IRC | 10:08 | |
*** sshnaidm|afk is now known as sshnaidm | 10:10 | |
*** jamesmcarthur_ has quit IRC | 10:11 | |
zbr_ | Can I add some extra tasks to run on a child job without overriding pre-run from the parent? Is there a way to do this? | 10:13 |
zbr_ | i guess roles would do the trick, but I am not sure how inheritance works with them | 10:14 |
AJaeger | zbr_: if you add a pre-run, it is run *in-addition*, see also https://zuul-ci.org/docs/zuul/user/config.html#job | 10:16 |
AJaeger | zbr_: so, you never override pre-run | 10:16 |
openstackgerrit | Fabien Boucher proposed zuul/zuul master: Return dependency cycle failure to user https://review.opendev.org/672487 | 10:16 |
zbr_ | AJaeger: super! thanks. | 10:17 |
zbr_ | it totally makes sense to do it like that. | 10:17 |
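As AJaeger explains, pre-run playbooks nest rather than override: the parent's pre-run still executes before the child's. A minimal sketch of this (job and playbook names are hypothetical):

```yaml
- job:
    name: parent-job
    pre-run: playbooks/parent-pre.yaml

- job:
    name: child-job
    parent: parent-job
    # Runs in addition to, and after, the parent's pre-run;
    # it does not replace it.
    pre-run: playbooks/child-pre.yaml
```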
*** hashar has quit IRC | 10:40 | |
*** saneax has quit IRC | 11:32 | |
*** saneax has joined #zuul | 11:33 | |
*** irclogbot_3 has quit IRC | 11:33 | |
*** irclogbot_3 has joined #zuul | 11:36 | |
*** wxy-xiyuan has quit IRC | 11:42 | |
*** hashar has joined #zuul | 11:57 | |
*** bolg has quit IRC | 12:04 | |
*** bhavikdbavishi has joined #zuul | 12:12 | |
*** roman_g has joined #zuul | 12:22 | |
roman_g | Hello team! Users question. I have 2 dependent changes in 2 repos - A and B. Change in repo B has proper Depends-On: xxxxxx pointing to the change in repo A. Code in repo B git-clones repo A internally and it currently clones 'master'. How could I properly utilize Zuul to do git-clone repo A for me, and apply the parent change over it? | 12:28 |
roman_g | *User's | 12:28 |
flaper87 | what ansible version does zuul-executor use by default? | 12:28 |
flaper87 | (when there are multiple versions installed, that is) | 12:28 |
flaper87 | is it the oldest or the latest? | 12:29 |
*** arxcruz|brb is now known as arxcruz | 12:29 | |
Shrews | flaper87: https://zuul-ci.org/docs/zuul/user/config.html#attr-job.ansible-version | 12:29 |
*** bhavikdbavishi has quit IRC | 12:35 | |
*** bhavikdbavishi has joined #zuul | 12:39 | |
flaper87 | Shrews: thanks | 12:39 |
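The attribute Shrews links to lets a job pin its Ansible version explicitly rather than relying on the operator's configured default. A sketch (job name hypothetical; the versions actually available depend on the executor's installation):

```yaml
- job:
    name: my-job
    # Selects which of the executor's installed Ansible versions runs this job.
    ansible-version: "2.8"
```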
*** bhavikdbavishi1 has joined #zuul | 12:42 | |
*** bhavikdbavishi has quit IRC | 12:43 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 12:43 | |
AJaeger | roman_g: Use required-projects - it's a list of repos that we download on disk for you, it's the same branch plus your depends-on changes on it. | 12:44 |
AJaeger | roman_g: So, just use that repo, you get it for free ;) | 12:44 |
AJaeger | I meant: checked out tree - and not entirely free but no magic for you to apply a change | 12:44 |
AJaeger | roman_g: you can pass that dir to your jobs like in https://opendev.org/zuul/nodepool/src/branch/master/playbooks/nodepool-functional-openstack/pre.yaml#L4 | 12:47 |
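Putting AJaeger's advice together: repo B's job declares repo A in required-projects, so Zuul places a checked-out tree with the Depends-On change already applied on the node, and the job points its tooling at that path instead of git-cloning master. A sketch with hypothetical project names and variable:

```yaml
- job:
    name: repo-b-integration
    required-projects:
      # Zuul checks this out on the node with any Depends-On
      # changes already applied.
      - example.org/myorg/repo-a
    vars:
      # Hypothetical variable the job's playbooks could consume
      # instead of cloning master themselves.
      repo_a_src_dir: "{{ ansible_user_dir }}/src/example.org/myorg/repo-a"
```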
*** bolg has joined #zuul | 12:50 | |
*** yoctozepto has joined #zuul | 12:54 | |
roman_g | AJaeger: thank you! Looking into it. | 12:56 |
*** bjackman has quit IRC | 12:59 | |
*** bolg has quit IRC | 12:59 | |
*** bolg has joined #zuul | 13:01 | |
*** bolg has quit IRC | 13:05 | |
*** jku has joined #zuul | 13:06 | |
*** jku has quit IRC | 13:06 | |
*** jank has joined #zuul | 13:07 | |
*** jank has quit IRC | 13:07 | |
tristanC | corvus: about zuul-tests.d, is there a change planned to match such test job when a job definition changes (to avoid adding the full zuul.yaml to the files list)? | 13:09 |
*** jank has joined #zuul | 13:10 | |
AJaeger | tristanC: already merged ;) | 13:11 |
AJaeger | tristanC: see https://review.opendev.org/669752 | 13:11 |
*** jpena|mtg is now known as jpena|off | 13:12 | |
*** jank has quit IRC | 13:14 | |
tristanC | AJaeger: thanks. So a tox-linters-test job would set tox-linters as parent to trigger when the tox-linters definition changes, right? | 13:15 |
*** jank has joined #zuul | 13:15 | |
openstackgerrit | Fabien Boucher proposed zuul/zuul master: Return dependency cycle failure to user https://review.opendev.org/672487 | 13:15 |
AJaeger | tristanC: not sure what happens if parent changes, best check implementation or ask corvus... | 13:16 |
*** bhavikdbavishi has quit IRC | 13:20 | |
tristanC | corvus: it seems like match-on-config-updates implies that a job is able to test itself. Wouldn't it make sense to let another job run when a job definition changes? e.g. tox-linters-test.match-on-job-update: [tox-linters] ? | 13:24 |
*** jpena|off is now known as jpena | 13:25 | |
*** jpena is now known as jpena|mtg | 13:27 | |
*** jamesmcarthur has joined #zuul | 13:55 | |
*** jeliu_ has joined #zuul | 14:07 | |
*** lennyb has quit IRC | 14:18 | |
*** jamesmcarthur has quit IRC | 14:25 | |
*** michael-beaver has joined #zuul | 14:29 | |
*** mattw4 has joined #zuul | 14:31 | |
*** rlandy|mtg has quit IRC | 14:32 | |
fungi | daniel2: depends on what you mean by "dockerized zuul setup" but sure you can run nodepool builders (including diskimage-builder) in a container. nodepool can also talk to container orchestration engines (kubernetes, openshift) to launch containers for jobs to run in, if that's what you're asking | 14:35 |
*** jpena|mtg is now known as jpena|off | 14:36 | |
corvus | tristanC: right, the match-on-config-update feature was designed to eliminate the need for '.zuul.yaml' in files matchers, so that when a job is updated, it is run. most jobs are self-testing. i'm having trouble imagining a job which tests another job without also being descended from that job. in zuul-jobs, we have a lot of test-foo jobs, but they are testing roles. a tox-linters-test job could inherit | 14:39 |
corvus | from tox-linters, and it should match on a job update to either. so i think even that use-case is covered. | 14:39 |
tristanC | corvus: so match-on-config-updates also matches parent job definition change? | 14:42 |
corvus | tristanC: afaik it should | 14:42 |
corvus | tristanC: it's basically implemented as a diff for the finalized job config; if anything about what it's about to run changes, it'll match | 14:43 |
tristanC | corvus: the use-case is to be able to test the pre/post phase of the job | 14:43 |
tristanC | corvus: for example, for a tox job, we might want to be able to set up a tools/tests-setup.sh, which would need to be done before the unittest pre phase | 12:44 |
corvus | tristanC: got it. yeah, that should work | 14:45 |
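The pattern corvus describes for exercising another job's pre/post phases is plain inheritance: because match-on-config-update diffs the finalized job config, a change to either the parent or the child triggers the test job. A sketch (names hypothetical):

```yaml
- job:
    name: tox-linters-test
    # Inheriting pulls the parent's finalized config into this job,
    # so an update to tox-linters also matches and runs this job.
    parent: tox-linters
    run: playbooks/assert-linters-behaviour.yaml
```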
*** igordc has joined #zuul | 14:49 | |
*** AshBullock has joined #zuul | 14:51 | |
*** mattw4 has quit IRC | 14:51 | |
AshBullock | Hey guys, we are trying to hook up kubernetes to nodepool. We see in the documented configs two labels, namespace and pod; we've added both of these, but are seeing "AttributeError: 'NoneType' object has no attribute 'create_namespace'" in the nodepool logs. How are these labels supposed to be set up? Thanks | 14:55 |
tristanC | AJaeger: there should be an exception in the logs about "Couldn't load client from config" | 14:56 |
tristanC | AshBullock: ^ (sorry AJaeger, wrong autocomplete) | 14:56 |
AshBullock | this is our config file for nodepool http://paste.openstack.org/show/754804/ | 14:59 |
*** mattw4 has joined #zuul | 15:00 | |
AshBullock | and the error we get is http://paste.openstack.org/show/754805/ | 15:00 |
AshBullock | and to confirm we do see the error you mentioned 2019-07-24 14:55:57,139 ERROR nodepool.driver.kubernetes.KubernetesProvider: Couldn't load client from config | 15:01 |
*** swest has quit IRC | 15:03 | |
tristanC | corvus: alright, we'll give this a try then. I guess we can re-run the pre and post phase of the parent in the run phase of the test job and do the assert in the child job | 15:03 |
tristanC | s/test job/child job/ | 15:04 |
tristanC | AshBullock: did you setup the ~nodepool/.kube/config file? | 15:05 |
AshBullock | have the kube config added yes | 15:06 |
AshBullock | Now receiving this error: "Failure","message":"namespaces is forbidden: User \"system:anonymous\" cannot create resource \"namespaces\" in API group \"\" at the cluster scope","reason":"Forbidden","details":{"kind":"namespaces"},"code":403} | 15:07 |
AshBullock | after updating my nodepool config | 15:08 |
AshBullock | to reference the kube context name correctly | 15:08 |
tristanC | AshBullock: perhaps there is something to enable in EKS to enable your service account to list/create resources | 15:09 |
*** jank has quit IRC | 15:09 | |
AshBullock | thanks, I'll look into that now | 15:09 |
tristanC | AshBullock: iirc eks requires an iam token to use the api from outside. is your nodepool-launcher service running in eks? | 15:12 |
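For reference, a nodepool kubernetes provider section with the two label types discussed above might look like the sketch below; the context and names are hypothetical, and the context must resolve to valid credentials in ~nodepool/.kube/config, otherwise the "Couldn't load client from config" error appears:

```yaml
providers:
  - name: k8s-provider
    driver: kubernetes
    context: my-eks-context      # must exist in the nodepool user's kubeconfig
    pools:
      - name: main
        labels:
          - name: kubernetes-namespace
            type: namespace      # hands the job an empty namespace
          - name: pod-fedora
            type: pod            # hands the job a single pod
            image: docker.io/fedora:28
```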
*** roman_g has left #zuul | 15:29 | |
AshBullock | thanks, managed to get it spinning up now, issue was the aws client was out of date and did not have the get-token command | 15:32 |
AshBullock | thanks for all the help | 15:33 |
clarkb | we might want to make note of that requirement in the docs? | 15:34 |
corvus | yeah, any additional info that could help future users would be great :) | 15:37 |
AshBullock | So I've got the containers running but get this error: main | MODULE FAILURE: error: You must be logged in to the server (Unauthorized). After installing kubectl on the executor, I can run kubectl commands as the nodepool user, but I assume the ansible run is using a virtual env. Any ideas how to solve this? | 15:42 |
AshBullock | this is running on the pre.yml tasks | 15:43 |
*** mattw4 has quit IRC | 15:44 | |
*** mattw4 has joined #zuul | 15:44 | |
AshBullock | which runs the zuul roles add-build-sshkey and prepare-workspace | 15:45 |
tristanC | AshBullock: nodepool creates a service account per zuul noderequest like so: https://opendev.org/zuul/nodepool/src/branch/master/nodepool/driver/kubernetes/provider.py#L155 | 15:46 |
tristanC | AshBullock: which may different from the one you had to configure for nodepool... | 15:47 |
AshBullock | so that created service account probably doesn't have access to the credentials? | 15:48 |
AshBullock | so how would I pass them through to the user? | 15:48 |
*** mattw4 has quit IRC | 15:49 | |
tristanC | AshBullock: it shouldn't have access to the credentials, to prevent the job from tampering with another job's resources. | 15:50 |
tristanC | AshBullock: it is meant to be a restricted account for the namespace created per job | 15:51 |
AshBullock | we're targeting hosts: all, should we be targeting hosts: pod as per this guide: https://www.softwarefactory-project.io/tech-preview-using-openshift-as-a-resource-provider.html | 15:51 |
tristanC | "hosts: all" should match the resources given to the job | 15:53 |
tristanC | the blog post uses "pod" instead just to be more explicit | 15:54 |
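A minimal job playbook for a pod-backed node, per the exchange above: "hosts: all" matches whatever resources nodepool attached, so it works for pods as well as VMs (the task content is hypothetical):

```yaml
- hosts: all
  tasks:
    - name: Run a command inside the pod
      command: echo hello-from-the-pod
```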
AshBullock | with containers is there a list of approved zuul roles to use? | 15:56 |
tristanC | AshBullock: most should work, except for the one using the "synchronize" module | 15:57 |
tristanC | AshBullock: here is how we copy the sources to pod: https://review.opendev.org/631402 | 15:58 |
*** AshBullock has quit IRC | 16:04 | |
clarkb | I have rechecked https://review.opendev.org/#/c/671858/3 since sphinx builds were fixed (just a heads up that that should be going in) | 16:06 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Download-artifact: use the artifact type rather than name https://review.opendev.org/672557 | 16:09 |
*** hashar has quit IRC | 16:10 | |
*** AshBullock has joined #zuul | 16:13 | |
*** pcaruana has quit IRC | 16:13 | |
mordred | corvus: that also looks good - I can go either way | 16:20 |
corvus | mordred: it's a little more work, but ending up with "_type" is going to bother me less, so i'm perfectly happy to do it :) | 16:21 |
mordred | \o/ | 16:21 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Add log browsing to build page https://review.opendev.org/671906 | 16:25 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Move artifacts to their own section https://review.opendev.org/672379 | 16:25 |
daniel2 | fungi: So I added the nodepool builder container. I'm not sure if you or anyone active here has done this, but I can't get the images directory created due to permission issues, even if it's put inside of the home /var/lib/nodepool. | 16:27 |
daniel2 | Docker log shows: PermissionError: [Errno 13] Permission denied: '/var/lib/nodepool/images/builder_id.txt' | 16:28 |
clarkb | daniel2: can you check the uid:gid ownership of that file and that of the nodepool-builder process? | 16:30 |
clarkb | the nodepool-builder records that file so that it identifies itself uniquely to zookeeper iirc. It will need to be able to write to that directory and file | 16:31 |
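The builder-id pattern clarkb describes can be sketched as follows. This is an illustrative reimplementation, not nodepool's actual code, but it shows why the builder must be able to write to images-dir on first start:

```python
import os
import tempfile
import uuid


def get_builder_id(images_dir):
    """Return a stable per-builder UUID, creating it on first start."""
    path = os.path.join(images_dir, "builder_id.txt")
    if not os.path.exists(path):
        # First start: this write is what needs permission on images_dir.
        with open(path, "w") as f:
            f.write(str(uuid.uuid4()))
    with open(path) as f:
        return f.read().strip()


images_dir = tempfile.mkdtemp()
first = get_builder_id(images_dir)
second = get_builder_id(images_dir)
print(first == second)
```

Once written, the same id is returned on every subsequent start, which is how the builder identifies itself uniquely to zookeeper.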
daniel2 | clarkb: so when docker runs, it does a whoami, but fails `whoami: cannot find name for user ID 10001` | 16:31 |
daniel2 | This is using the zuul/nodepool-builder image | 16:31 |
daniel2 | That folder is probably not the right permissions but not sure how to change that. I guess I could write a file to run before the command | 16:32 |
clarkb | I expect the nodepool-builder image wants you to mount in a volume or bindmount for that directory so that it persists between docker container instances | 16:32 |
clarkb | and you'll have to set permissions appropriately? | 16:32 |
daniel2 | clarkb: That's what I'm doing. | 16:32 |
daniel2 | using a bind mount | 16:32 |
clarkb | daniel2: you can docker exec a shell into the image to poke around and check things like permissions | 16:32 |
daniel2 | I can't because the container died due to nodepool-builder exiting. | 16:32 |
clarkb | you can run it then and change the command | 16:33 |
daniel2 | dunno why I didnt think of that | 16:33 |
daniel2 | must be the lack of sleep | 16:33 |
clarkb | just double confirming that uid is the one set by the image so that is the expected value | 16:35 |
daniel2 | clarkb: this is just strange, the permissions appear correct. | 16:38 |
daniel2 | Would it possibly be an issue with the host? | 16:38 |
pabelanger | is /var/lib/nodepool/images/ a mount? | 16:39 |
daniel2 | /dev/mapper/area50--vg-root on /var/lib/nodepool/images type ext4 (rw,relatime,errors=remount-ro,data=ordered) | 16:39 |
pabelanger | that is from inside container? | 16:40 |
daniel2 | yes | 16:40 |
pabelanger | and touch /var/lib/nodepool/images/foo.txt works? | 16:40 |
daniel2 | no, gives permission denied | 16:40 |
pabelanger | what does mount outside of container look like | 16:41 |
clarkb | and what are permissions of the directory | 16:41 |
clarkb | (to create a file you need w on the dir iirc) | 16:42 |
daniel2 | drwxrwxr-x 2 ubuntu ubuntu 4.0K Jul 24 04:50 images | 16:42 |
pabelanger | what is the uid of ubuntu? | 16:42 |
clarkb | that would be the issue I think | 16:42 |
daniel2 | 1000 | 16:42 |
pabelanger | that should be the same as user in container | 16:42 |
daniel2 | eheh | 16:42 |
daniel2 | Can I specify the user id in the docker compose file | 16:43 |
clarkb | we may need to modify our entrypoint to chown that | 16:43 |
daniel2 | ah I see | 16:43 |
pabelanger | I don't think we start a nodepool-builder in quickstart do we? | 16:43 |
daniel2 | no we don't. | 16:43 |
clarkb | pabelanger: we don't | 16:43 |
daniel2 | I added that myself. | 16:43 |
Shrews | quickstart uses static nodes so builder isn't necessary | 16:45 |
daniel2 | builder_1 | chown: changing ownership of '/var/lib/nodepool/images': Operation not permitted | 16:45 |
daniel2 | :D | 16:45 |
pabelanger | daniel2: https://opendev.org/windmill/ansible-role-nodepool/src/branch/master/tests/playbooks/templates/etc/systemd/system/nodepool-builder.service.j2 is how I do it directly with docker, -u 1001:1001, for the volumes | 16:46 |
*** mattw4 has joined #zuul | 16:46 | |
pabelanger | not sure how do to it with compose | 16:46 |
clarkb | pabelanger: that presumes you chown outside of docker though right? | 16:46 |
daniel2 | ohhh | 16:46 |
clarkb | pabelanger: it's the same problem in either case whether you change the uid or not | 16:46 |
clarkb | something has to set ownership on that dir an dlooks like the current image is failing to do so | 16:47 |
daniel2 | Well, I guess changing the id wouldn't work in nodepool-builder | 16:47 |
clarkb | daniel2: do you have more log context for the chown failure? | 16:47 |
daniel2 | no thats all it said | 16:47 |
pabelanger | clarkb: I can't remember. I'd have to look at the docs | 16:47 |
pabelanger | but I don't chown anything directly, docker does it | 16:47 |
pabelanger | daniel2: but that is using zuul/nodepool-builder images, so should work | 16:48 |
daniel2 | I could try and bind mount the service file | 16:48 |
daniel2 | oh no | 16:49 |
daniel2 | Thats to start with docker | 16:49 |
pabelanger | yah, this isn't using compose, just docker directly | 16:49 |
pabelanger | you could just try command manually, and see if it works | 16:50 |
daniel2 | https://shafer.cc/paste/view/0c89553c That was what I had in docker-compose.yaml when I tried with chown | 16:50 |
pabelanger | then, work to add it to compose | 16:50 |
clarkb | daniel2: did you chown it to ubuntu:ubuntu then? | 16:51 |
daniel2 | I got past it | 16:51 |
daniel2 | I set user: 1000:1000 in docker-compose section for builder | 16:52 |
clarkb | sure but if you didn't chown it to 1000:1000 in the first place would it have owrked? | 16:52 |
daniel2 | No, I didn't chown it until you guys had mentioned it | 16:52 |
daniel2 | before I wasn't doing anything outside of the norm | 16:52 |
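The fix daniel2 landed on can be expressed in compose like this; the uid/gid must match the owner of the bind-mounted host directory (the paths and values here are illustrative, not his actual file):

```yaml
services:
  nodepool-builder:
    image: zuul/nodepool-builder
    # Run as the host user that owns the bind mount (here ubuntu, uid 1000),
    # instead of the image's default uid 10001.
    user: "1000:1000"
    volumes:
      - /var/lib/nodepool/images:/var/lib/nodepool/images
```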
clarkb | fwiw our entrypoint for the opendev gitea images does an explicit chown of the mounts. I'm not seeing where we might do that for nodepool-builder | 16:52 |
clarkb | daniel2: what is the context of builder_1 | chown: changing ownership of '/var/lib/nodepool/images': Operation not permitted ? was that a chown you ran or a chown that it tried to run on its own? | 16:53 |
daniel2 | it was a chown I ran | 16:53 |
daniel2 | using the command config line in the docker-compose section | 16:53 |
pabelanger | clarkb: Yah, same. I have ansible chown / chmod the /var/lib/nodepool folder to a specific user, which is the uid of the user in the container. I assume compose has the same ability | 16:53 |
clarkb | gotcha so I think we may want to update nodepool/tools/uid_entrypoint.sh to chown things properly | 16:53 |
clarkb | pabelanger: its not compose's job, it is the entrypoint | 16:54 |
clarkb | at least with how we've set up gitea | 16:54 |
Shrews | perhaps the best course of action is to set images-dir in nodepool.yaml to some place that a) it has permission to write to, and b) has space to actually build images | 17:00 |
daniel2 | mkdir: cannot create directory '/var/lib/nodepool/.cache/image-create': Permission denied | 17:00 |
daniel2 | this is getting old D: | 17:00 |
Shrews | that will fix the builder_id.txt issue since it writes to images-dir | 17:00 |
Shrews | an external volume is probably best | 17:01 |
daniel2 | It is. | 17:01 |
daniel2 | I have no name!@7c14b606484d:/$ ls | 17:01 |
daniel2 | Nice hostname | 17:01 |
daniel2 | or username. | 17:01 |
clarkb | Shrews: yes the way it is done with other tools is to have the entrypoint do the chown | 17:01 |
clarkb | Shrews: it would also work to chown outside of the container | 17:01 |
* clarkb gets an example | 17:02 | |
clarkb | Shrews: https://opendev.org/opendev/system-config/src/branch/master/docker/gitea-init/entrypoint.sh#L17-L26 we could do that with nodepool's entrypoint | 17:04 |
Shrews | clarkb: but /var/lib is within the actual container, right? i'm suggesting an external volume mounted at run time | 17:04 |
clarkb | Shrews: yes | 17:04 |
clarkb | same as with /data/ in the gitea example | 17:05 |
*** AshBullock has quit IRC | 17:09 | |
*** igordc has quit IRC | 17:16 | |
*** igordc has joined #zuul | 17:17 | |
daniel2 | So I fixed one problem and created another :) | 17:23 |
openstackgerrit | James E. Blair proposed zuul/zuul-operator master: WIP: testing https://review.opendev.org/672567 | 17:24 |
*** sgw has quit IRC | 17:25 | |
fungi | daniel2: welcome to computers? ;) | 17:25 |
daniel2 | haha right | 17:25 |
fungi | pretty much describes my typical day | 17:25 |
daniel2 | at least I finished enough of the CI setup that we were able to close out that issue for the sprint. | 17:26 |
daniel2 | Thats why I was up so late, wanted to knock that out. | 17:26 |
fungi | awesome~! | 17:26 |
daniel2 | We moved the nodepool stuff to another issue. | 17:26 |
*** igordc has quit IRC | 17:27 | |
yoctozepto | tried asking on infra but maybe here is a better place - any idea why http://zuul.openstack.org/builds?project=openstack%2Fkolla-ansible&pipeline=periodic&pipeline=periodic-stable&branch=master&branch=stable%2Fstein&branch=stable%2Frocky&branch=stable%2Fqueens does not return? | 17:29 |
yoctozepto | it does if you replace kolla-ansible with kolla | 17:29 |
yoctozepto | or remove filter on periodic-stable | 17:30 |
yoctozepto | otherwise it does not work (or would take hours? waited several minutes already) | 17:30 |
fungi | it's not clear to me what that query means | 17:33 |
fungi | are those terms expected to be anded? ored? | 17:33 |
clarkb | fungi: it operates as an AND | 17:33 |
clarkb | (sorry lucene query language all caps habit) | 17:33 |
clarkb | I think that is why you get no results | 17:34 |
clarkb | you can't be in two pipelines | 17:34 |
fungi | are the types compared via and but multiple options within each type compared with or? | 17:34 |
clarkb | fungi: I don't think so | 17:34 |
* clarkb looks at the sql query | 17:34 | |
clarkb | hrm it moved | 17:35 |
AJaeger | clarkb, fungi: http://zuul.openstack.org/builds?project=openstack%2Fkolla-ansible&pipeline=periodic-stable is much simpler and shows the problem that yoctozepto has... | 17:36 |
AJaeger | yoctozepto: keep it simple, please ;) | 17:36 |
fungi | i'm guessing the expectation is that this should query "project:openstack/kolla-ansible AND pipeline:(periodic OR periodic-stable) AND branch:(master OR stable/stein OR stable/rocky OR stable/queens)" | 17:36 |
corvus | if someone wants to implement that, go for it, but that's not how it works :) | 17:37 |
yoctozepto | AJaeger: I did in the other channel, then I discovered this one loads in a couple of minutes | 17:37 |
fungi | ahh, sorry, my network connection here is going in and out, so my responses are lagging somewhat | 17:37 |
yoctozepto | ;-) | 17:37 |
clarkb | as a sanity check periodic-stable pipeline does have the mysql reporter listed | 17:37 |
yoctozepto | guys | 17:38 |
yoctozepto | but it works for kolla | 17:38 |
yoctozepto | magic: http://zuul.openstack.org/builds?project=openstack%2Fkolla&pipeline=periodic&pipeline=periodic-stable&branch=master&branch=stable%2Fstein&branch=stable%2Frocky&branch=stable%2Fqueens | 17:38 |
clarkb | AJaeger: that url works for me | 17:38 |
yoctozepto | hence someone has implemented it | 17:38 |
yoctozepto | but it does not want to work for kolla-ansible | 17:38 |
fungi | or this is undefined behavior | 17:38 |
yoctozepto | for no particular reason | 17:38 |
clarkb | looking at the api code it expects a singular pipeline | 17:38 |
yoctozepto | lolz, but it worked so great until I tried it on k-a | 17:39 |
yoctozepto | ;D | 17:39 |
yoctozepto | http://zuul.openstack.org/builds?project=openstack%2Fnova&pipeline=periodic&pipeline=periodic-stable&branch=master&branch=stable%2Fstein&branch=stable%2Frocky&branch=stable%2Fqueens | 17:39 |
yoctozepto | etc. | 17:39 |
yoctozepto | ;D | 17:39 |
fungi | but yes, the results there do seem to match the pseudoquery i wrote above, so possible it works that way by accident | 17:39 |
yoctozepto | it also does work for check+gate: http://zuul.openstack.org/builds?project=openstack%2Fkolla-ansible&pipeline=gate&pipeline=check&branch=master&branch=stable%2Fstein&branch=stable%2Frocky&branch=stable%2Fqueens | 17:40 |
corvus | if there are multiple values for a parameter, then they are treated as an "in" query | 17:40 |
fungi | i wonder if one of those branches or pipelines has no kolla-ansible matches and that's the difference | 17:40 |
yoctozepto | only not if you sprinkle periodic-stable | 17:41 |
yoctozepto | fungi: good one | 17:41 |
yoctozepto | lemme check | 17:41 |
AJaeger | yoctozepto: so, what's the smallest query that shows the problem? | 17:41 |
*** pcaruana has joined #zuul | 17:41 | |
yoctozepto | AJaeger: I wish I knew | 17:41 |
AJaeger | yoctozepto: yeah, get now results for the one I posted using kolla-ansible - was too impatient last time ;) | 17:41 |
clarkb | q = self.listFilter(q, buildset_table.c.pipeline, pipeline) | 17:42 |
clarkb | corvus: ^ that generates the "in" ? | 17:42 |
*** saneax has quit IRC | 17:42 | |
corvus | clarkb: yep | 17:42 |
corvus | if it's single, it's "==", otherwise it's "in" | 17:42 |
clarkb | ah yup I see the definition of listFilter now | 17:43 |
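The listFilter behaviour corvus describes ("==" for a single value, "in" for several) can be illustrated with a simplified stand-in; this is not Zuul's actual implementation, just the dispatch rule rendered as SQL-ish strings:

```python
def render_filter(column, value):
    """Render one query parameter the way listFilter dispatches it."""
    if value is None:
        return ""
    if isinstance(value, (list, tuple)):
        if len(value) > 1:
            # Repeated URL parameters become an IN clause.
            quoted = ", ".join("'%s'" % v for v in value)
            return "%s IN (%s)" % (column, quoted)
        # A single-element list collapses to a plain equality.
        value = value[0]
    return "%s = '%s'" % (column, value)


print(render_filter("pipeline", ["periodic", "periodic-stable"]))
print(render_filter("project", "openstack/kolla-ansible"))
```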
yoctozepto | rocky/queens do not have any periodic-stable | 17:43 |
yoctozepto | but withouth them: http://zuul.openstack.org/builds?project=openstack%2Fkolla-ansible&branch=master&branch=stable%2Fstein&pipeline=periodic&pipeline=periodic-stable | 17:43 |
yoctozepto | still nothing | 17:43 |
*** saneax has joined #zuul | 17:43 | |
*** saneax has quit IRC | 17:44 | |
yoctozepto | AJaeger: you might like this, it's shorter ;p | 17:44 |
yoctozepto | I think I overloaded Zuul now | 17:44 |
yoctozepto | it says Fetching info... | 17:44 |
yoctozepto | for the previously working too now | 17:44 |
fungi | well, it's only zuul-web you're overloading, presumably | 17:45 |
yoctozepto | yeah, thought about that | 17:45 |
fungi | (if you are) | 17:45 |
fungi | separate daemon from the scheduler | 17:45 |
yoctozepto | I'm not well versed in zuul's components yet | 17:45 |
yoctozepto | I see, will have that in mind | 17:45 |
yoctozepto | (the next time I overload it) | 17:45 |
corvus | this is the query that has been running for 11 minutes: http://paste.openstack.org/show/754810/ | 17:46 |
clarkb | is it possible that seraching over (project, branch, pipeline) simply needs to be better indexed? | 17:46 |
clarkb | the query that works is only (project, pipeline) | 17:46 |
yoctozepto | clarkb: it works fast for kolla and nova for example | 17:47 |
* yoctozepto - zuul-web official overloader | 17:47 | |
corvus | wow it does construct the query differently for kolla and kolla-ansible | 17:47 |
* yoctozepto convicted for generating a 11 minute query :-( | 17:48 | |
fungi | huh, that's bizarre | 17:48 |
yoctozepto | maybe due to -? | 17:48 |
corvus | http://paste.openstack.org/show/754811/ | 17:48 |
yoctozepto | I mean: - | 17:48 |
corvus | first one is k-a, second is k | 17:48 |
yoctozepto | ah, on the mysql side | 17:49 |
yoctozepto | Using filesort and Using join buffer (Block Nested Loop) spoil the play :-( | 17:50 |
corvus | mordred, Shrews: ^ i'm stumped as to why the query planner would make that choice | 17:57 |
corvus | kolla-ansible has 11k buildsets in the db, and kolla has 10k. so they should both be in the index | 17:58 |
yoctozepto | corvus: could you try explain after replacing USE INDEX with FORCE INDEX? | 17:58 |
yoctozepto | corvus: you can also compare with nova as it is fine too | 17:58 |
corvus | yoctozepto: force index with k-a shows the same plan | 17:59 |
Shrews | what indexes are available on the zuul_build table? | 18:00 |
yoctozepto | corvus: yeah, was worth trying anyway, it protects agains full scan on the same table though... | 18:00 |
Shrews | it's doing a table scan on that table for both queries | 18:00 |
corvus | https://etherpad.openstack.org/p/Z0cucbdugf | 18:00 |
*** igordc has joined #zuul | 18:01 | |
* fungi is enjoying this impromptu crash course in database mysteries | 18:01 | |
Shrews | corvus: could you get a describe for the other two tables? | 18:02 |
* yoctozepto sharing fungi's enjoyment | 18:03 | |
corvus | done | 18:03 |
Shrews | what's the difference in those two queries? my eyes can't find it | 18:04 |
yoctozepto | Shrews: kolla vs kolla-ansible | 18:04 |
yoctozepto | changing project caused it | 18:04 |
corvus | yeah, only the string constant; no structural change | 18:04 |
Shrews | and they're both slow? | 18:04 |
yoctozepto | exactly | 18:04 |
yoctozepto | nope | 18:05 |
yoctozepto | kolla fast | 18:05 |
yoctozepto | kolla-ansible slow | 18:05 |
yoctozepto | nova fast | 18:05 |
yoctozepto | probably many more fast | 18:05 |
Shrews | oh. figuring out why the optimizer does what it does is dark magic. lemme see if i can spot anything obvious though. my guess is the cardinality is significantly different for the projects | 18:06 |
fungi | (de)optimizer | 18:07 |
Shrews | select count(*) from zuul_buildset where project = "openstack/kolla" <--- that and a similar count for "openstack/kolla-ansible" might be useful | 18:09 |
yoctozepto | and nova | 18:09 |
Shrews | or at least interesting | 18:09 |
fungi | 17:58 <corvus> kolla-ansible has 11k buildsets in the db, and kolla has 10k... | 18:09 |
yoctozepto | and nova has probably many more | 18:10 |
corvus | 46 | 18:10 |
corvus | exact numbers in etherpad now | 18:10 |
yoctozepto | ;D | 18:10 |
yoctozepto | k-a has an even number | 18:11 |
yoctozepto | the others have odd | 18:11 |
corvus | nova has the fast query, nova-specs is slow | 18:12 |
yoctozepto | and count is? | 18:13 |
corvus | yoctozepto: and nova-specs is 2153 -- odd :) | 18:13 |
yoctozepto | :-( | 18:13 |
Shrews | pipeline might make a difference in counts | 18:13 |
corvus | yoctozepto: but that is making me wonder about your '-' theory | 18:13 |
yoctozepto | corvus: yeah but why xD | 18:13 |
corvus | it's a longer form of something that's also in the index | 18:14 |
yoctozepto | I proposed that before you explained it is mysql query optimizer | 18:14 |
yoctozepto | ah, you think this way | 18:14 |
yoctozepto | well, it did try to use project_pipeline_idx | 18:15 |
yoctozepto | and cardinality is different/better | 18:15 |
yoctozepto | because as you observed we have a prefix | 18:15 |
yoctozepto | I would try out some others but zuul-web is still angry at me | 18:16 |
yoctozepto | actually it does not load for me at all atm | 18:16 |
clarkb | ya I think it is angry more globally | 18:16 |
clarkb | scheduler is still running so nothing should be lost | 18:16 |
yoctozepto | I hope so | 18:17 |
corvus | i can probably kill the queries | 18:17 |
clarkb | I checked the scheduler logs and it was busy and happy | 18:17 |
yoctozepto | but if mysql locked tables then it is bad anyway | 18:17 |
corvus | ah, there's only one long query right now | 18:17 |
corvus | running for 1566 seconds | 18:17 |
yoctozepto | kill it anyways | 18:17 |
corvus | just died on its own :) | 18:18 |
yoctozepto | ok ;-) | 18:18 |
yoctozepto | (yeah, right) | 18:18 |
corvus | if we have more questions, i'm happy to run 'explain' commands so we can find out without tying things up | 18:18 |
yoctozepto | zuul-web still angry | 18:18 |
yoctozepto | sure, let's go with more - dash examples | 18:18 |
yoctozepto | something small | 18:19 |
yoctozepto | karbor | 18:19 |
yoctozepto | karbor-dashboard | 18:19 |
yoctozepto | (well, relatively to nova) | 18:19 |
Shrews | so, when the pipeline is considered in the counts, the cardinality of the rows is much more different: 820 rows vs. 3307 | 18:20 |
Shrews | might be enough to cause the optimizer to choose a different plan | 18:20 |
Shrews | sadly, i've forgotten sooooo much about query optimization | 18:20 |
yoctozepto | Shrews: A vs B | 18:21 |
yoctozepto | A = ? B = ? | 18:21 |
Shrews | prepared statements might make sense here | 18:23 |
yoctozepto | Shrews: what cardinalities were those that you posted? | 18:24 |
yoctozepto | k vs k-a? | 18:24 |
yoctozepto | k-a vs k? | 18:24 |
yoctozepto | something vs nova? :D | 18:24 |
yoctozepto | (I can't do that myself, you know) | 18:24 |
Shrews | project name and pipeline counts from zuul_buildsets | 18:24 |
Shrews | select count(*) from zuul_buildset where project = "openstack/kolla-ansible" and pipeline IN ('periodic', 'periodic-stable'); | 18:24 |
Shrews | select count(*) from zuul_buildset where project = "openstack/kolla" and pipeline IN ('periodic', 'periodic-stable'); | 18:25 |
yoctozepto | so kolla simply has more here? | 18:25 |
*** panda has quit IRC | 18:25 | |
Shrews | yes. but nova has 3000+ too. | 18:26 |
yoctozepto | and that's why it is fast | 18:26 |
Shrews | no. just speculation | 18:26 |
yoctozepto | yeah, but quite possible | 18:26 |
yoctozepto | it actually used that index | 18:26 |
yoctozepto | with 820 | 18:27 |
yoctozepto | | 1 | SIMPLE | zuul_buildset | NULL | range | PRIMARY,project_pipeline_idx,project_change_idx | project_pipeline_idx | 1536 | NULL | 820 | 4.00 | Using index condition; Using where; Using temporary; Using filesort | | 18:27 |
yoctozepto | so it liked this one specifically | 18:27 |
yoctozepto | <corvus> yoctozepto: and nova-specs is 2153 -- odd :) | 18:27 |
yoctozepto | Shrews: can you check nova-specs filtered? | 18:27 |
Shrews | i get 0 with nova-specs | 18:28 |
corvus | nova-specs with periodic pipelines is 0 | 18:28 |
corvus | which makes sense | 18:28 |
yoctozepto | and it's slow | 18:28 |
yoctozepto | slowest 0 in history, been there, done that | 18:28 |
Shrews | with the smaller cardinality of that combo, it's picking a poor index (project_pipeline_idx) and we then scan 820 x 7093912 rows, vs just 7093912 | 18:31 |
Shrews | is that a needed index? | 18:31 |
yoctozepto | we can exclude it | 18:32 |
corvus | Shrews: which one, project_pipeline_idx? | 18:32 |
*** panda has joined #zuul | 18:33 | |
Shrews | i'd have to get a mysqldump of that data and then remember a whole bunch of stuff before i could say what a fix is | 18:33 |
Shrews | corvus: yeah | 18:33 |
yoctozepto | IGNORE INDEX (blah) | 18:33 |
corvus | Shrews: i think it's only there to speed up queries like this :) | 18:34 |
Shrews | corvus: seems to do the opposite :) | 18:34 |
yoctozepto | if you don't want to remove it entirely, try whether IGNORE INDEX after that join will help us for now | 18:35 |
corvus | yoctozepto: yeah that seems to switch to the other form | 18:35 |
yoctozepto | YAY | 18:35 |
corvus | 2614 rows in set (9.24 sec) | 18:35 |
yoctozepto | reasonable | 18:35 |
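The IGNORE INDEX experiment above is MySQL-specific, but the effect is easy to reproduce with the stdlib sqlite3 bindings, where the `NOT INDEXED` qualifier plays the same role. A minimal sketch (the table and index names are simplified stand-ins for zuul_buildset and project_pipeline_idx; MySQL's actual spelling is `IGNORE INDEX (project_pipeline_idx)`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE buildset (id INTEGER PRIMARY KEY, project TEXT, pipeline TEXT)"
)
conn.execute("CREATE INDEX project_pipeline_idx ON buildset (project, pipeline)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the access path in their last column
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Left alone, the optimizer walks project_pipeline_idx for the equality match...
indexed = plan("SELECT * FROM buildset WHERE project = 'openstack/kolla-ansible'")
# ...while NOT INDEXED (the SQLite analogue of MySQL's IGNORE INDEX)
# forces it back to a plain table scan
scanned = plan(
    "SELECT * FROM buildset NOT INDEXED WHERE project = 'openstack/kolla-ansible'"
)
```

Flipping between the two plans is exactly what the IGNORE INDEX trial above did, just from the hint's other direction.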
*** hwangbo has joined #zuul | 18:36 | |
yoctozepto | still, could probably be better if things were indexed more optimally for this type of query | 18:36 |
corvus | it also seems that removing the branch terms uses the better query | 18:36 |
corvus | yeah, i think the thinking for the project_pipeline_idx was that it should help this case, because we're giving it a project and a pipeline, so it should be able to get the 820 buildsets that match, then join with the builds. | 18:38 |
yoctozepto | ¯\_(ツ)_/¯ | 18:38 |
yoctozepto | ^ that's mysql to us | 18:38 |
corvus | yoctozepto, Shrews: using a single pipeline helps as well | 18:39 |
corvus | i wonder if indexing the project and pipeline separately would help? | 18:39 |
yoctozepto | corvus: http://zuul.openstack.org/builds?project=openstack%2Fkolla-ansible&pipeline=periodic-stable&branch=master&branch=stable%2Fstein&branch=stable%2Frocky&branch=stable%2Fqueens - seems slow ? | 18:40 |
yoctozepto | ah not that much | 18:40 |
yoctozepto | like 15 s | 18:40 |
yoctozepto | so fine indeed | 18:40 |
yoctozepto | I got impatient waiting minutes for the bad one | 18:40 |
yoctozepto | just like AJaeger did | 18:41 |
corvus | that's actually a 3rd query plan | 18:41 |
corvus | yoctozepto: http://paste.openstack.org/show/754812/ | 18:42 |
corvus | when i switched to 1 pipeline, i did 'periodic' and got the 'fast' one. your switching to 'periodic-stable' got us a new 'medium' one :) | 18:42 |
*** tosky has quit IRC | 18:42 | |
corvus | (it looks sort of like the 'slow' one, but it's a 'ref' rather than a 'range' scan) | 18:42 |
yoctozepto | lol | 18:43 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: WIP: debug zuul-operator-functional-k8s job https://review.opendev.org/672576 | 18:43 |
clarkb | unrelated but do we need to edit zuul's pipeline configs so that rechecked changes go into the gate? | 18:44 |
yoctozepto | I don't get this thingy | 18:45 |
yoctozepto | we have | 18:45 |
yoctozepto | INNER JOIN zuul_buildset ON zuul_buildset.id = zuul_build.buildset_id | 18:45 |
yoctozepto | [ FROM zuul_build USE INDEX (PRIMARY) ] | 18:45 |
tristanC | jeliu_: corvus: we shouldn't need the operator-sdk to test the zuul-operator. It seems like the cli should be used for local dev... I'll have a look why the pod doesn't start now | 18:45 |
yoctozepto | buildset has | 18:45 |
yoctozepto | PRIMARY KEY (`id`), | 18:45 |
corvus | tristanC: agreed, and thanks! i think jeliu_ was also just looking into that too | 18:45 |
yoctozepto | build has | 18:45 |
*** fdegir has quit IRC | 18:45 | |
yoctozepto | PRIMARY KEY (`id`), | 18:45 |
yoctozepto | KEY `buildset_id` (`buildset_id`), | 18:45 |
yoctozepto | yet we get: | 18:45 |
yoctozepto | | 1 | SIMPLE | zuul_build | NULL | ALL | NULL | NULL | NULL | NULL | 7094344 | 0.00 | Using where; Using join buffer (Block Nested Loop) | | 18:45 |
yoctozepto | it's counterintuitive for me atm | 18:46 |
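The counterintuitive plan above is what a pinned index produces: once zuul_build is told to use only its primary key (the `USE INDEX (PRIMARY)` fragment quoted earlier), the buildset_id index is off limits for the join, and the optimizer has nothing left but a full scan. A sketch of the same effect with stdlib sqlite3, where `NOT INDEXED` is the closest analogue of `USE INDEX (PRIMARY)` since it leaves only the rowid usable (table names are simplified stand-ins):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Disable transient auto-indexes so the "hint" truly starves the join
conn.execute("PRAGMA automatic_index = OFF")
conn.execute("CREATE TABLE buildset (id INTEGER PRIMARY KEY, project TEXT)")
conn.execute("CREATE INDEX project_idx ON buildset (project)")
conn.execute("CREATE TABLE build (id INTEGER PRIMARY KEY, buildset_id INTEGER)")
conn.execute("CREATE INDEX buildset_id_idx ON build (buildset_id)")

def plan(sql):
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Free choice: the join probes build through its buildset_id index
free = plan(
    "SELECT count(*) FROM buildset JOIN build"
    " ON build.buildset_id = buildset.id"
    " WHERE buildset.project = 'openstack/kolla-ansible'"
)
# Hinted: with secondary indexes forbidden on build, the join degenerates
# into a scan of the whole build table (the 7-million-row scan in the log)
hinted = plan(
    "SELECT count(*) FROM buildset JOIN build NOT INDEXED"
    " ON build.buildset_id = buildset.id"
    " WHERE buildset.project = 'openstack/kolla-ansible'"
)
```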
*** fdegir has joined #zuul | 18:46 | |
yoctozepto | damn lol | 18:46 |
yoctozepto | USE INDEX | 18:46 |
yoctozepto | what happens if drop this bastard? | 18:46 |
yoctozepto | if you* drop | 18:46 |
corvus | yoctozepto: language :) | 18:46 |
yoctozepto | sorry | 18:47 |
yoctozepto | non-native speaker here, I probably feel it differently | 18:47 |
corvus | yoctozepto: that corrected some very bad queries we used to have | 18:47 |
yoctozepto | hmm, do you have examples? | 18:47 |
yoctozepto | did they use buildset_id or job_name_buildset_id_idx? | 18:48 |
yoctozepto | maybe let's add at least buildset_id | 18:48 |
yoctozepto | to the list | 18:48 |
corvus | yoctozepto: https://review.opendev.org/605170 says it was the build list query we were fixing | 18:49 |
corvus | we could probably find more detail in irc logs around that | 18:49 |
yoctozepto | I checked https://dictionary.cambridge.org/pl/dictionary/english/bastard some dictionaries really list that as offensive, sorry | 18:50 |
corvus | yoctozepto: no prob :) | 18:50 |
corvus | yoctozepto: but i think that may have been the query without any project filters | 18:50 |
yoctozepto | fun thing is the best translations to Polish are not offensive language at all, they are actually much politer than what you actually hear | 18:51 |
yoctozepto | "MySQL's query optimizer is choosing a poor index to use when" | 18:51 |
yoctozepto | poor index is not very precise :D | 18:51 |
corvus | yoctozepto: agreed. though i'm not sure we could characterize our current problem with 100% precision :) | 18:51 |
yoctozepto | it's a very generic method anyway | 18:52 |
yoctozepto | this probably hit some things good, some things bad | 18:52 |
yoctozepto | corvus: well, forcibly excluding the best JOIN INDEX | 18:53 |
yoctozepto | is a BAD IDEA | 18:53 |
yoctozepto | (TM) | 18:53 |
corvus | if we drop use index, things get very fast indeed, for k-a, 2614 rows in set (0.99 sec) | 18:53 |
yoctozepto | can you re-explain the queries without this? | 18:54 |
corvus | yoctozepto: line 114 of etherpad | 18:54 |
yoctozepto | can you do that for kolla and nova and nova-specs too? | 18:55 |
yoctozepto | and possibly kolla-ansible in the "medium case"? | 18:55 |
yoctozepto | (interested in how mysql goes about it) | 18:56 |
yoctozepto | | 1 | SIMPLE | zuul_build | NULL | ref | buildset_id | buildset_id | 5 | zuul.zuul_buildset.id | 8 | 100.00 | NULL | | 18:56 |
yoctozepto | ^ this, sir, is pure heaven | 18:56 |
corvus | i'm about 30 mins late for lunch now, maybe Shrews can help? | 18:56 |
yoctozepto | corvus: sorry, I am actually planning my bedtime now | 18:57 |
Shrews | we still may want to consider prepared statement for this if it's executed a lot | 18:57 |
Shrews | if it's rare/on-demand... meh | 18:57 |
corvus | yoctozepto: i'm pretty sure where that breaks down is when we do the same queries without any project, pipeline, or branch selection; ie, what you get when you first hit the builds page without entering any search terms | 18:57 |
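The landing-page case corvus describes can be seen in miniature: reading the newest 50 rows straight off the primary key needs no sort at all, while ordering on anything unindexed materializes and sorts everything first. A sqlite3 sketch with a stand-in schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE build (id INTEGER PRIMARY KEY, result TEXT)")

def plan(sql):
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Newest-first off the primary key: a reverse rowid walk that stops at 50,
# which is why the unfiltered builds page is cheap when driven this way
by_id = plan("SELECT * FROM build ORDER BY id DESC LIMIT 50")
# Ordering on an unindexed column: a full scan plus a temporary sort
by_result = plan("SELECT * FROM build ORDER BY result LIMIT 50")
```

The hint preserved exactly the first shape for the no-filter query; the question in the log is what it costs everywhere else.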
fungi | it's on-demand insofar as folks are querying it via the builds page on the web dashboard | 18:57 |
yoctozepto | on-demand, my case possibly rare but I wanted to propose it as CI health check | 18:57 |
Shrews | glad my suspicion bore us some fruit though | 18:57 |
yoctozepto | corvus: it can be checked easily | 18:58 |
corvus | yep; someone needs to take over for me though; i need food | 18:58 |
corvus | biab | 18:58 |
yoctozepto | Shrews/fungi? | 18:59 |
Shrews | yoctozepto: sorry, got distracted... what do you need? | 19:00 |
yoctozepto | let's have explains for completeness | 19:00 |
yoctozepto | without the use index | 19:00 |
yoctozepto | corvus produced one at L114 | 19:00 |
yoctozepto | would like to look at kolla and nova cases too | 19:01 |
yoctozepto | and maybe the no-WHERE case as corvus suggested? | 19:01 |
Shrews | lemme look | 19:01 |
yoctozepto | (apart from tenant WHERE probably) | 19:01 |
Shrews | yoctozepto: posted nova and kolla explains | 19:05 |
yoctozepto | hmm, you got the same color as I have ;D | 19:06 |
yoctozepto | ok, it did them all in the same way | 19:06 |
yoctozepto | which is... good | 19:06 |
Shrews | and why would we want no WHERE clause? | 19:06 |
yoctozepto | not that 'Using index condition; Using where; Using temporary; Using filesort' is very good but still | 19:07 |
yoctozepto | Shrews: corvus suggested main page was slow without the use index | 19:07 |
yoctozepto | could be* | 19:07 |
yoctozepto | (not was) | 19:07 |
yoctozepto | it should run just the tenant where | 19:07 |
yoctozepto | but not the others | 19:07 |
Shrews | personally, i'm not interested in complex, slow queries without a where clause | 19:08 |
Shrews | we're doing something wrong if we are doing that | 19:08 |
yoctozepto | Shrews: it's the curse of on-demand stuff | 19:09 |
yoctozepto | Shrews: in case you want to take a look: http://zuul.openstack.org/builds | 19:10 |
yoctozepto | this is what we users are filtering | 19:10 |
mordred | Shrews: I'm back from lunch - it seems like this is a topic that someone might reasonably expect me to also look at? | 19:11 |
yoctozepto | mordred: be my guest | 19:11 |
* mordred is still reading scrollback | 19:11 | |
Shrews | mordred: i think the immediate need has dissipated | 19:11 |
mordred | ok. there's a LOT of scrollback | 19:12 |
fungi | we painted an elder sign on the side of the database and it seems to be keeping the great old ones on the other side of the portal now | 19:12 |
mordred | oh thank god | 19:12 |
mordred | I was worried I was going to have to page in the mysql optimizer internals again | 19:13 |
yoctozepto | fungi: fantasy RPG fan? | 19:13 |
yoctozepto | mordred: we kind of found the culprit | 19:13 |
fungi | yoctozepto: or h.p. lovecraft stories. take your pick | 19:13 |
yoctozepto | fungi: or both? | 19:13 |
yoctozepto | mordred: USE INDEX(PRIMARY) kills our best available JOIN index | 19:14 |
Shrews | mordred: it's been over 10 years for both of us. i question any of the knowledge that we've paged out at this point | 19:14 |
mordred | Shrews: yeah. | 19:14 |
mordred | yoctozepto: something was doing an explicit index hint? | 19:14 |
yoctozepto | mordred: yeah, the very generic method | 19:15 |
yoctozepto | https://review.opendev.org/#/c/605170/2/zuul/driver/sql/sqlconnection.py | 19:15 |
yoctozepto | wish the commit message was a bit less enigmatic | 19:15 |
mordred | ah - yeah - I think I remember chatting about that at the time. 9 times out of 10 index hints wind up being the wrong choice - I think at the time it was presenting as the strange case where it was needed - but now things have shifted again (which is usually the issue with index hints - most of the time humans can't keep up with the optimizer in terms of dealing with a changing data set) | 19:17 |
yoctozepto | mordred: optimizer changes too | 19:17 |
yoctozepto | heuristic change | 19:17 |
yoctozepto | data changes | 19:18 |
mordred | http://eavesdrop.openstack.org/irclogs/%23zuul/%23zuul.2018-09-25.log.html#t2018-09-25T16:25:10 | 19:18 |
mordred | there's the original conversation | 19:18 |
fungi | yeah, we're likely running a newer mysql/mariadb in opendev than we were back then | 19:18 |
fungi | so entirely possible the optimizer is no longer the same | 19:18 |
yoctozepto | "doing a full table scan to check the tenant" | 19:19 |
yoctozepto | not happening today | 19:20 |
mordred | well - that's good at least :) | 19:20 |
fungi | ahh, actually we're using a trove instance for the db, so i doubt the version has been updated | 19:21 |
yoctozepto | "a box of Sakila memberberries" | 19:21 |
mordred | yoctozepto: I set them right next to my little glass dolphin sculpture | 19:26 |
mordred | of course, now I can't remember where either of them are | 19:26 |
fungi | in a storage unit somewhere by the airport | 19:26 |
mordred | oh right | 19:26 |
mordred | although it turns out "by" in this case is give or take an hour | 19:27 |
fungi | for atlanta, that's practically next door? | 19:27 |
mordred | we opted for a unit in kennesaw, because it was super cheap and near one of my cousins | 19:27 |
yoctozepto | ah, guys | 19:27 |
yoctozepto | back to real life | 19:27 |
mordred | :) | 19:27 |
* mordred disturbs the real work | 19:27 | |
yoctozepto | what about reverting that one change | 19:28 |
yoctozepto | and observing? | 19:28 |
fungi | we would only need to restart zuul-web with that revert applied, right? | 19:29 |
fungi | if so, doesn't seem terribly risky | 19:29 |
mordred | well - I don't think we should revert, should we? we should only remove the with_hint line? | 19:30 |
mordred | or have I not paged in enough of the backscroll? | 19:30 |
mordred | NEVERMIND | 19:30 |
mordred | the other bits are just formatting change | 19:30 |
fungi | that's what reverting 605170 would be, right | 19:30 |
mordred | yeah | 19:31 |
mordred | want me to propose a revert? | 19:31 |
fungi | whoever wants to push the button, i'm happy to review | 19:31 |
fungi | though i was thinking we could just hand-apply it on zuul.opendev.org and restart zuul-web | 19:33 |
mordred | fungi: that's also a fine choice | 19:33 |
mordred | fungi: we could do those in parallel even | 19:33 |
fungi | and see if it has the anticipated performance impact | 19:33 |
openstackgerrit | Monty Taylor proposed zuul/zuul master: Revert "Speed up build list query under mysql" https://review.opendev.org/672581 | 19:33 |
yoctozepto | but don't forget to change it permanently later | 19:33 |
clarkb | ya the hand revert and restart is a common thing we've done when checking things like memory leaks | 19:34 |
mordred | that gives us a place to discuss | 19:34 |
clarkb | I would do it that way to quickly get results without ending up with revert revert revert revert commits | 19:34 |
yoctozepto | yeah, it needs some time for observations | 19:34 |
fungi | yoctozepto: right, i mean hand-apply the proposed change without approving it straight away | 19:34 |
fungi | still propose it for review so we have a place to better record our observations | 19:34 |
*** hwangbo has quit IRC | 19:34 | |
fungi | which mordred has done | 19:35 |
openstackgerrit | Merged zuul/zuul master: Update xterm to >= 3.14.5 https://review.opendev.org/671858 | 19:39 |
clarkb | I'll keep an eye on ^ | 19:39 |
clarkb | was tested with the check queue results so should be fine | 19:40 |
*** hwangbo has joined #zuul | 19:50 | |
corvus | no way | 19:54 |
fungi | no way what? | 19:54 |
corvus | the solution to the problem is to go back to when things were even worse? | 19:54 |
corvus | i sunk several hours of analysis into this, including why that revert would be a bad idea | 19:55 |
corvus | and i go out for lunch, and come back to find that in my absence, folks have just decided to ignore that? | 19:55 |
fungi | ahh, i misunderstood that was the conclusion | 19:55 |
corvus | 19:31 < mordred> want me to propose a revert? | 19:56 |
corvus | 19:31 < fungi> whoever wants to push the button, i'm happy to review | 19:56 |
fungi | review by testing it, to see if that made it worse/better | 19:57 |
fungi | (push revert button generating the revert, not button to approve it) | 19:58 |
corvus | a year ago, that query took 20 seconds. why would it be faster now? | 19:58 |
corvus | 50 rows in set (1 min 14.12 sec) | 19:59 |
fungi | it sounded like the theory was that changes in relative table sizes have caused the optimizer to choose different indices now than they did in october, but i have likely misunderstood | 19:59 |
corvus | fungi: i tested it ^ | 19:59 |
corvus | that's going to happen everytime someone goes to the build page http://zuul.openstack.org/builds | 20:01 |
fungi | got it, so 672581 will cause the builds page default query to take 20s to load on opendev's deployment | 20:02 |
corvus | fungi: no, it would have taken 20 seconds a year ago, it would take 1 minute 14 seconds now. | 20:02 |
clarkb | 74.12 seconds now looks like | 20:02 |
fungi | well, even 20s would not be great, agreed | 20:03 |
fungi | so yes the revert does seem to not be a solution | 20:03 |
fungi | mordred: yoctozepto: Shrews: ^ | 20:03 |
corvus | it will make one arcane and rarely used query faster at the cost of making the simple query used many times per day extremely slow | 20:04 |
Shrews | i just returned from physical therapy, catching up. what's being reverted? | 20:04 |
corvus | Shrews: nothing -- https://review.opendev.org/672581 | 20:05 |
mordred | nothing. corvus already analyzed it and determined it's a terrible idea | 20:05 |
Shrews | oh yeah. that wasn't the solution | 20:06 |
corvus | with my limited knowledge, i can think of 2 ways to proceed -- 1) repeat the process from a year ago and try to come up with a better set of indexes that works in all cases; or 2) maybe we look into keeping the hint where we aren't filtering by project, etc, but drop it when we are | 20:07 |
corvus | option 2 seems kind of hacky, like we're doubling down on the "try to beat the optimizer" bet. but it also seems like it might be practical. | 20:08 |
corvus | and i agree with mordred in principle; i would love to not try to beat the optimizer | 20:08 |
fungi | but by default the optimizer is selecting a less optimal query for the default page view | 20:09 |
mordred | corvus: I'm still digesting - but I agree - we're already in beat-the-optimizer land, so I don't think 2 is any WORSE - and if we know in code the difference between filtering by project and not, sending two different queries to the db is a fine choice | 20:09 |
mordred | like, it's not uncommon for the answer to hard query optimization to be "put more logic in the app and ask the database different questions" - even though it frequently feels wrong | 20:10 |
corvus | mordred: okay, i think it'd be great if you continued to digest this (https://etherpad.openstack.org/p/Z0cucbdugf has a bunch of queries you can run on the prod zuul db) 'cause you and Shrews are gonna be more likely to synthesize an option 3 than i am. but while you work on that, i can try regression testing a bunch of queries against the use/don't-use hint idea and see if that's feasible. | 20:12 |
mordred | corvus: yeah. I am now reading the etherpad and enjoying it | 20:12 |
corvus | also, if anyone else wants to do this, i'm sure we can put up a copy of the db somewhere; there isn't a whit of sensitive data in it. | 20:13 |
Shrews | corvus: i was about to suggest letting me mysqldump the data and play with it locally. | 20:13 |
Shrews | of course, now i have to remember how to use mysqldump :/ | 20:13 |
corvus | Shrews: ++ | 20:13 |
clarkb | mysqldump -u foo -p databasename | gzip -9 > foo.sql.gz | 20:14 |
clarkb | roughly | 20:14 |
Shrews | hrm, how can i pull that out of the container | 20:14 |
corvus | Shrews: no container on this host | 20:14 |
clarkb | I don't think it is in a container | 20:14 |
Shrews | oh wait | 20:14 |
Shrews | yah | 20:14 |
Shrews | nm | 20:14 |
corvus | least, not yet :) | 20:15 |
corvus | (but also, ftr, we have figured out how to do that; it's a bit wonky with lots of weird shell quoting; we do that for gitea) | 20:15 |
fungi | Shrews: yeah, the database is in a trove instance, so you need to know the hostname along with the credentials | 20:16 |
fungi | but can be found in the zuul configs in /etc | 20:17 |
mordred | corvus: so - just so I can make sure I'm understanding what's going on from the etherpad ... | 20:17 |
mordred | the first query is an example of with the hint but filtering by project | 20:17 |
mordred | which shows the results of 820 * 7093912 rows which == a lot of rows | 20:18 |
Shrews | mordred: the query plan changes based on project name (which caused me to suspect cardinality of that data). the one you reference there is the slow plan | 20:19 |
corvus | mordred: ah, the hint didn't really come into the conversation until really late. line 114. | 20:19 |
mordred | wait - both .. yeah | 20:19 |
mordred | this is why I'm talking out loud :) | 20:19 |
corvus | mordred: so, yes that's true for the first query. but it's also true for the second query. but then what Shrews said -- the only diff between those 2 queries is the project name. | 20:19 |
Shrews | mordred: corvus: anyone else: data dump in my home dir on zuul01 (zuul.sql.gz) | 20:20 |
fungi | thanks Shrews! | 20:20 |
corvus | cool; if anyone without access to that server wants it, i can put it somewhere public | 20:21 |
Shrews | 5.7.18 MySQL Community Server | 20:23 |
Shrews | for those playing along | 20:23 |
Shrews | mordred: we could put everything in ndb | 20:25 |
* Shrews waits for hurled wet critters | 20:25 | |
mordred | Shrews: I was actually thinking mongo | 20:26 |
mordred | (but i could totally kill this with NdbRecord) | 20:26 |
corvus | yeah, this is the kind of data it was made for | 20:31 |
Shrews | oh wow. 5.7.18 is really old | 20:34 |
mordred | yeah. when we start deploying zuul from our container images I TOTALLY want to reassess our db backend | 20:34 |
corvus | yeah, that's why we're tending toward mariadb containers for new things | 20:34 |
*** panda has quit IRC | 20:35 | |
Shrews | 5.7.27 is the closest archive download available. will try that | 20:35 |
*** panda has joined #zuul | 20:37 | |
mordred | Shrews: your user on servers is uncapitalized which confuses me | 20:45 |
Shrews | keeps the feds off my trail | 20:46 |
* mordred decapitalizes | 20:46 | |
mordred | corvus: I can't believe I'm about to say this - but we should change this to be a subquery | 20:54 |
corvus | mordred: feeling feverish? | 20:55 |
mordred | I know. but rewriting the first query as a subquery returns in 0.03 seconds and looks at 1536 * 5 rows | 20:55 |
mordred | I'm still experimenting, mind you - so let's consider that an anecdote | 20:56 |
mordred | for now | 20:56 |
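The shape of the rewrite mordred is experimenting with looks roughly like the sketch below (hedged: the real Zuul query selects many more columns and carries pipeline, branch, and tenant filters). Both forms return the same rows; the point is that they invite different plans from the optimizer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE buildset (id INTEGER PRIMARY KEY, project TEXT);
    CREATE TABLE build (id INTEGER PRIMARY KEY, buildset_id INTEGER);
    INSERT INTO buildset VALUES (1, 'openstack/kolla-ansible'), (2, 'openstack/nova');
    INSERT INTO build VALUES (10, 1), (11, 1), (12, 2);
    """
)

project = "openstack/kolla-ansible"

# Original form: a straight join filtered on the buildset side
joined = conn.execute(
    "SELECT build.id FROM build"
    " JOIN buildset ON buildset.id = build.buildset_id"
    " WHERE buildset.project = ? ORDER BY build.id",
    (project,),
).fetchall()

# Rewritten form: resolve the matching buildset ids first,
# then fetch builds against that small id set
sub = conn.execute(
    "SELECT id FROM build WHERE buildset_id IN"
    " (SELECT id FROM buildset WHERE project = ?) ORDER BY id",
    (project,),
).fetchall()
```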
corvus | mordred, Shrews: i have some initial results from my regression testing on dropping the hint. as expected, so far it's only a problem for the queries which have no project or pipeline conditionals. if you have either of those, it's good. if you have a project, it's excellent (0 seconds). if you have a pipeline without a project, it's okay (1-5 seconds). that makes me wonder if an independent pipeline | 20:58 |
corvus | index would additionally be helpful (we only have (project, pipeline) right now) | 20:58 |
mordred | I doubt it - the pipeline cardinality is going to be super low - it's really only valuable combined with something else | 20:59 |
corvus | oh that's a good point | 21:00 |
mordred | oh - let me check pipeline without project in my subquery experiment | 21:00 |
corvus | i think i should improve this script a little bit to make it more automated, and so we can add in more of the fields we search (job, build, etc), and we can also do apples/apples with mordred's subquery idea | 21:01 |
mordred | 2.8 seconds for pipeline without project. 0.29 seconds for nova | 21:01 |
corvus | mordred: branch or no? | 21:02 |
mordred | yeah - those all have branches | 21:02 |
mordred | or - a list of them | 21:02 |
corvus | mordred: then that's the same time i got (2.35s) | 21:02 |
mordred | awesome | 21:02 |
corvus | mordred: so what's the subquery like with only "zuul_buildset.tenant='openstack'" in the where clause? | 21:03 |
mordred | checking | 21:04 |
*** pcaruana has quit IRC | 21:05 | |
mordred | 8.69 seconds | 21:05 |
mordred | of course, that produces 7.2 million records | 21:06 |
corvus | well, it's better than 74, but not better than 0 | 21:07 |
corvus | mordred: throw a 'limit 50' on the end there :) | 21:07 |
corvus | (that query with the hint and limit 50 is 0.00 seconds) | 21:07 |
mordred | it's no better with limit because there is no support (at least in 5.7) for limit in a subquery | 21:08 |
mordred | oh - although I guess it's actually not correct to limit in the subquery in this case | 21:08 |
mordred | corvus: it's possible we might actually have a collection of different queries that are each better for different use cases | 21:09 |
corvus | mordred: well, also i'm wondering if the subquery is producing pretty similar results to the single query but without the hint | 21:10 |
mordred | it might be - except for the limit 50 case with the hint where that's smoking the subquery | 21:11 |
Shrews | are you both testing on the production db? i just remembered sometimes running ANALYZE on a table makes a difference on funky path decisions. might want to give that a whirl | 21:12 |
Shrews | just got my local db loaded | 21:12 |
mordred | Shrews: I am testing on the production db | 21:12 |
corvus | yep | 21:12 |
mordred | which is a sentence I realize is a crazy sentence to type | 21:12 |
Shrews | how very infra-responsible of you two :) | 21:13 |
corvus | it's not *that* important of a db :) | 21:13 |
* fungi wonders if mariadb has added support for freudian_analyze | 21:13 | |
mordred | corvus, Shrews: I put a couple of subquery examples at the bottom | 21:13 |
fungi | er, i should have said psychoanalyze ;) | 21:14 |
corvus | mordred: is there a case where you think the subquery is better than not? | 21:14 |
mordred | corvus: we could totally get around the limit limitation by performing it as 2 queries - one with a limit on the buildset table to produce a list of ids, then a second query with a constructed in list with its own limit | 21:15 |
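mordred's two-step idea can be sketched like this (hypothetical helper; the real queries carry many more filters and columns): limit on the buildset side first, then expand the resulting ids into a constructed IN list for the build fetch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE buildset (id INTEGER PRIMARY KEY, project TEXT);
    CREATE TABLE build (id INTEGER PRIMARY KEY, buildset_id INTEGER);
    INSERT INTO buildset VALUES (1, 'openstack/kolla'), (2, 'openstack/kolla'), (3, 'openstack/nova');
    INSERT INTO build VALUES (10, 1), (11, 2), (12, 2), (13, 3);
    """
)

def latest_builds(project, limit):
    # Query 1: a cheap, LIMIT-able query against buildset alone,
    # sidestepping the subquery-LIMIT restriction in MySQL 5.7
    ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM buildset WHERE project = ?"
            " ORDER BY id DESC LIMIT ?",
            (project, limit),
        )
    ]
    if not ids:
        return []
    # Query 2: expand the ids into a constructed IN list for the build fetch
    placeholders = ",".join("?" * len(ids))
    return conn.execute(
        f"SELECT id FROM build WHERE buildset_id IN ({placeholders})"
        " ORDER BY id DESC",
        ids,
    ).fetchall()
```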
mordred | corvus: I think it's very good on things that filter by project | 21:15 |
corvus | mordred: but i think the single query without the hint is also good on things that filter by project | 21:15 |
mordred | it's not awful at things that don't limit by project | 21:15 |
mordred | yeah | 21:15 |
corvus | given that so far the single query with hint seems to be the only thing that can handle the query with no additional filters, it may not be worth the complexity of adding subquery into the mix if it doesn't have a win over single-query-without-hint | 21:16 |
mordred | corvus: what about with the hint without project and with limit - is that still terrible? | 21:16 |
mordred | agree | 21:17 |
*** armstrongs has joined #zuul | 21:17 | |
corvus | mordred: re your question ^ are you asking about pipeline and branch or no? | 21:17 |
*** mattw4 has quit IRC | 21:18 | |
Shrews | my analyze suggestion does nothing locally, fwiw | 21:19 |
corvus | basically: hint: yes, pipeline: no, project: no, branch: no --> 0.00s (this is the query for the builds landing page) | 21:19 |
corvus | but so far, that's the only thing we'd want to use the hint for | 21:20 |
corvus | oh, sorry -- if we only have branch, we want to use the hint | 21:20 |
mordred | yeah, because branch also has super low cardinality | 21:21 |
corvus | so let me rephrase that: so far, it's looking like "use the hint only if we have neither pipeline nor project; otherwise do not use the hint" with a single query is producing excellent results | 21:21 |
*** armstrongs has quit IRC | 21:22 | |
mordred | wfm | 21:24 |
clarkb | console log streaming still works in the web ui so the xterm update must be working | 21:26 |
mordred | clarkb: woot | 21:28 |
clarkb | mordred: corvus points out we aren't updating the js in production currently | 21:29 |
clarkb | because js tarball has moved | 21:29 |
mordred | clarkb: oh. yeah. so we have not learned that the xterm update is working | 21:30 |
*** jeliu_ has quit IRC | 21:30 | |
*** mattw4 has joined #zuul | 21:31 | |
corvus | mordred: for timing purposes i would like to eliminate the query cache | 21:31 |
corvus | mordred: know of a way to do that? | 21:31 |
corvus | cause i'm starting to get crazy fast times | 21:32 |
mordred | yes - one sec | 21:32 |
mordred | corvus: select SQL_NO_CACHE ... | 21:32 |
mordred | corvus: or SET SESSION query_cache_type = OFF; | 21:33 |
Shrews | corvus: that's a deprecated modifier now. weird | 21:38 |
corvus | good thing we're on an old server | 21:38 |
corvus | okay, i have automated my regression script; running it now | 21:38 |
mordred | yeah - that's the 5.7 thing | 21:38 |
mordred | cool | 21:38 |
Shrews | corvus: it should be deprecated even on 5.7 ('show warnings' after your select will output the deprecation notice) | 21:39 |
Shrews | still deprecated in 8.0 so... whatever | 21:39 |
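[editor's note: the per-query modifier mordred suggests above can be added mechanically when benchmarking against a pre-8.0 server. A minimal sketch, assuming queries are built as plain strings; the helper name is made up, and the modifier only exists on MySQL <= 5.7 (deprecated there, removed along with the query cache in 8.0):]

```python
def no_cache(query):
    """Add MySQL's SQL_NO_CACHE modifier so timings bypass the query cache.

    Only meaningful on MySQL <= 5.7; the query cache (and this modifier)
    were removed in MySQL 8.0.
    """
    # Rewrite only the first SELECT so any subqueries are left untouched.
    return query.replace("SELECT", "SELECT SQL_NO_CACHE", 1)
```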
corvus | mordred, Shrews: http://paste.openstack.org/show/754818/ | 21:39 |
fungi | there's a mysql 8.0? what rock have i been living under? | 21:40 |
corvus | i omitted the queries with no project or pipeline, since i ran them earlier and we know that without the hint they take forever | 21:40 |
corvus | that's measuring execution time in python, so that includes query parsing, fetching data, etc. it's going to be a little more than what the mysql cli would report | 21:40 |
mordred | cool | 21:41 |
Shrews | fungi: went from 5.7 to 8.0, so you haven't been under a rock too long | 21:41 |
fungi | ahh, okay | 21:41 |
fungi | i see, there were 7.x cluster releases | 21:41 |
corvus | i'm running my script with the other search terms now (job, build, result, etc) | 21:52 |
corvus | it's looking like we also don't want to use the hint when we have a change; that's the other thing that's indexed on the buildset table. | 21:58 |
corvus | so far: if not (project or pipeline or change): use hint; otherwise do not use hint | 21:58 |
corvus | http://paste.openstack.org/show/754820/ | 22:01 |
corvus | that held up for that set of things | 22:01 |
mordred | corvus: I'm fascinated that pipeline makes it ok | 22:02 |
mordred | project and change make sense to me - they're the things that are the most specific | 22:02 |
corvus | let me run that narrowly | 22:02 |
mordred | corvus: also - I have meltybrain - can I restate to make sure I'm parsing - if project or pipeline or change: do not use hint ; else: use hint | 22:03 |
corvus | mordred: yep -- except i think you are right to be suspicious of pipeline | 22:04 |
corvus | http://paste.openstack.org/show/754821/ | 22:04 |
corvus | it looks like we *are* better off using the hint with pipeline; it's just that if we don't use the hint with pipeline, it's not terrible | 22:05 |
mordred | corvus: I think if we wanted to generalize, we could do a query at startup something like "select count(distinct(column_name))" to get the number of distinct values - and then define a column threshold over which we use the hint | 22:05 |
mordred | that's super weird - we only have like 10 pipeline names | 22:06 |
mordred | but - empirical data wins | 22:06 |
mordred | but for now - I think just hard-coding the logic as you have defined it seems like a fine choice | 22:06 |
corvus | mordred: yeah, there's a relationship to the indexes here, so it's not *crazy*; just incomprehensible | 22:06 |
mordred | yeah | 22:07 |
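[editor's note: the hard-coded rule the discussion converges on could be sketched as below. Function and argument names are hypothetical, not the actual patch:]

```python
def should_use_hint(project=None, pipeline=None, change=None):
    # Empirical rule from the timing runs: the index hint only wins when
    # none of the selective filters are present.  (Pipeline was borderline:
    # slightly better with the hint, but acceptable without it.)
    return not (project or pipeline or change)
```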
* mordred needs to dinner ... I'll check back post dinnering to see if there are further things to be baffled by | 22:07 | |
corvus | project and change are the most accessible indexes on that table | 22:07 |
corvus | k, i'll probably check a few more things then write this up | 22:07 |
mordred | yeah. project and change are the best things to be looking for | 22:07 |
corvus | sorta weird that build isn't faster; we may want to look into that one | 22:08 |
corvus | oh that's not weird at all | 22:08 |
corvus | we don't have an index on build uuid | 22:08 |
corvus | that's a big oversight | 22:08 |
corvus | Shrews: are you still around? can you try some queries for me before and after adding an index? | 22:09 |
Shrews | corvus: can be. just a sec | 22:09 |
corvus | i'm digging up the q's | 22:09 |
Shrews | k. ready | 22:10 |
corvus | Shrews: http://paste.openstack.org/show/754822/ | 22:10 |
corvus | Shrews: can you run both of those before and after adding an index on zuul_build.uuid ? | 22:11 |
Shrews | yup | 22:11 |
Shrews | corvus: err, the explains, yes? | 22:11 |
Shrews | or do you want the actual data? | 22:11 |
corvus | Shrews: timing | 22:12 |
Shrews | oh | 22:12 |
corvus | (explains could be interesting too) | 22:12 |
*** mattw4 has quit IRC | 22:12 | |
corvus | that's the table with 7m rows, so it may take a minute to build the index | 22:12 |
corvus | also, how long does it take to build the index is good info :) | 22:12 |
Shrews | before index, 1st query is 11.71s, 2nd query is 2.96s | 22:13 |
Shrews | corvus: explains for those are at the end of the etherpad. anything else before i add the index | 22:14 |
corvus | Shrews: not that i can think of now | 22:15 |
Shrews | ok. gimme a sec to recall the create index syntax | 22:15 |
*** mattw4 has joined #zuul | 22:16 | |
Shrews | corvus: index creation data at the end of the etherpad | 22:18 |
corvus | 30s isn't too bad | 22:19 |
Shrews | corvus: that uuid does not exist in my data | 22:19 |
corvus | oh, heh, it's really new | 22:19 |
corvus | i'm sure i have an old one handy, 1 sec | 22:19 |
corvus | Shrews: try 0f4573fc46934b79bec64438d7d63c70 | 22:20 |
Shrews | good news is with the index, both empty result sets returned REALLY quickly | 22:20 |
Shrews | corvus: see end of etherpad. the results of the non-index queries will be slightly skewed since there were no results | 22:23 |
Shrews | but you can consider the timings best case i guess? | 22:23 |
Shrews | 0s and 2.78s for the queries with the index | 22:23 |
Shrews | i can drop the index and retry them with the existing uuid if you like | 22:25 |
*** jeliu_ has joined #zuul | 22:25 | |
corvus | Shrews: can i beg you to drop the index and repeat that with a slight modification? (1 sec and i'll explain) | 22:26 |
Shrews | yep | 22:26 |
corvus | we have a job name index there that includes the buildset id, so i looked up why and found this: https://review.opendev.org/481614 | 22:27 |
corvus | a "covering index" | 22:27 |
corvus | i'm thinking that we'd probably want the same thing for a build uuid index -- so can you create it as (uuid, buildset_id) ? | 22:28 |
Shrews | creating. you want the queries with the existing uuid i assume corvus? | 22:29 |
corvus | yeah | 22:29 |
Shrews | index create about the same, 28.6s | 22:29 |
Shrews | corvus: 0.00s on the first query, 2.84s on the second | 22:31 |
corvus | k, about the same | 22:31 |
Shrews | yeah | 22:31 |
corvus | but theoretically better assuming the 'covering index' thing works like that | 22:32 |
corvus | that seems like a Vocabulary Word so i assume mordred is right about that :) | 22:32 |
Shrews | corvus: the explain actually looks much better | 22:32 |
openstackgerrit | Jeff Liu proposed zuul/zuul-operator master: [WIP] Verify Operator Pod Running https://review.opendev.org/670395 | 22:32 |
Shrews | corvus: see end of etherpad again | 22:33 |
corvus | thx | 22:33 |
Shrews | that actually may have more to do with using an existing uuid | 22:34 |
Shrews | hrm, nope | 22:35 |
Shrews | (uuid, buildset_id) looks like a winner | 22:36 |
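[editor's note: the "covering index" effect visible in the explain can be reproduced with SQLite's planner, which reports when a query is satisfied entirely from the index. The table and index names below are illustrative, not Zuul's real schema:]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE zuul_build (id INTEGER PRIMARY KEY, uuid TEXT, buildset_id INTEGER)"
)
# Composite index storing both the lookup column and the wanted column.
conn.execute("CREATE INDEX uuid_buildset_idx ON zuul_build (uuid, buildset_id)")

# Because the index already contains buildset_id, this query never has to
# touch the table rows themselves; the plan notes a covering index scan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT buildset_id FROM zuul_build WHERE uuid = ?",
    ("0f4573fc46934b79bec64438d7d63c70",),
).fetchall()
```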
corvus | cool, i'll include a schema change with this patch too | 22:36 |
*** jeliu_ has quit IRC | 22:36 | |
corvus | now i'm curious why job_name isn't performing in the same way; maybe it's because the cardinality is too low on something like 'openstack-tox-py27' | 22:36 |
corvus | i'm trying it with a more rare job | 22:36 |
corvus | yep, that's it. if i use 'zuul-operator-functional-k8s' as the job name, it looks like what you got for the build uuid query | 22:38 |
corvus | so, because the uuid is unique, once we add that index, we're always going to have good results with no hint. | 22:39 |
Shrews | corvus: openstack-tox-py27 occurs more than any other value | 22:40 |
corvus | but with the same kind of index for job name, we get better results using the hint with a job with lots of hits like 'openstack-tox-py27', but better results without the hint for a rare job. | 22:40 |
Shrews | i placed the top 20 counts in the etherpad | 22:40 |
corvus | this is where it would be really nice if the optimizer were making the right choices :) | 22:41 |
corvus | http://paste.openstack.org/show/754823/ numbers for those ^ | 22:41 |
Shrews | only 29 for zuul-operator-functional-k8s | 22:42 |
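[editor's note: mordred's earlier idea of choosing the hint from measured cardinality would look roughly like this. The threshold is made up, and SQLite stands in for MySQL so the sketch is runnable:]

```python
import sqlite3

def distinct_count(conn, table, column):
    # Measure cardinality once (e.g. at startup), per mordred's suggestion.
    return conn.execute(
        f"SELECT COUNT(DISTINCT {column}) FROM {table}"
    ).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE zuul_build (job_name TEXT)")
conn.executemany(
    "INSERT INTO zuul_build (job_name) VALUES (?)",
    [("openstack-tox-py27",)] * 5 + [("zuul-operator-functional-k8s",)],
)

CARDINALITY_THRESHOLD = 100  # hypothetical cutoff
# Low-cardinality columns (like branch or pipeline) are where the hint helps.
use_hint = distinct_count(conn, "zuul_build", "job_name") < CARDINALITY_THRESHOLD
```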
Shrews | corvus: need anything else? food time here | 22:43 |
corvus | Shrews: nope, thanks! | 22:43 |
corvus | i'll flip a coin on whether to use the hint for job name, then write up the patch | 22:43 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Improve SQL query performance in some cases https://review.opendev.org/672606 | 23:13 |
corvus | Shrews, mordred, yoctozepto: ^ whew. there we go. thanks for your help there | 23:14 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Improve SQL query performance in some cases https://review.opendev.org/672606 | 23:14 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Improve SQL query performance in some cases https://review.opendev.org/672606 | 23:15 |
*** michael-beaver has quit IRC | 23:24 | |
openstackgerrit | Ian Wienand proposed zuul/nodepool master: Enable debug logs for openstack-functional tests https://review.opendev.org/672412 | 23:28 |
*** mattw4 has quit IRC | 23:44 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!