openstackgerrit | Merged zuul/zuul-jobs master: ensure-docker: ensure docker.socket is stopped https://review.opendev.org/c/zuul/zuul-jobs/+/787271 | 00:18 |
openstackgerrit | Merged zuul/nodepool master: Require dib 3.10.0 https://review.opendev.org/c/zuul/nodepool/+/786984 | 00:23 |
*** sam_wan has joined #zuul | 00:59 | |
*** sam_wan has quit IRC | 01:36 | |
*** ikhan has quit IRC | 02:07 | |
*** ajitha has joined #zuul | 02:30 | |
*** evrardjp has quit IRC | 02:33 | |
*** evrardjp has joined #zuul | 02:33 | |
*** sam_wan has joined #zuul | 03:16 | |
*** rlandy|rover has quit IRC | 03:34 | |
*** ykarel|away has joined #zuul | 04:06 | |
*** ykarel_ has joined #zuul | 04:10 | |
*** ykarel|away has quit IRC | 04:12 | |
*** bhavikdbavishi has joined #zuul | 04:15 | |
*** bhavikdbavishi1 has joined #zuul | 04:18 | |
*** bhavikdbavishi has quit IRC | 04:20 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 04:20 | |
*** bhavikdbavishi has quit IRC | 04:27 | |
*** bhavikdbavishi has joined #zuul | 04:28 | |
*** bhavikdbavishi has quit IRC | 04:39 | |
*** hamalq has quit IRC | 04:49 | |
*** vishalmanchanda has joined #zuul | 04:55 | |
*** jfoufas1 has joined #zuul | 05:11 | |
*** paladox has quit IRC | 05:55 | |
*** ykarel_ has quit IRC | 05:55 | |
*** ykarel__ has joined #zuul | 05:55 | |
*** mnaser has quit IRC | 05:59 | |
*** bhavikdbavishi has joined #zuul | 05:59 | |
*** mnaser has joined #zuul | 06:00 | |
*** saneax has joined #zuul | 06:25 | |
*** jcapitao has joined #zuul | 06:34 | |
*** ykarel_ has joined #zuul | 06:38 | |
*** ykarel__ has quit IRC | 06:40 | |
*** bhavikdbavishi1 has joined #zuul | 06:53 | |
*** bhavikdbavishi has quit IRC | 06:54 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 06:54 | |
*** avass has quit IRC | 07:09 | |
*** avass has joined #zuul | 07:10 | |
*** rpittau|afk is now known as rpittau | 07:33 | |
*** bhavikdbavishi has quit IRC | 07:35 | |
*** bhavikdbavishi has joined #zuul | 07:35 | |
*** tosky has joined #zuul | 07:46 | |
*** bhavikdbavishi has quit IRC | 07:48 | |
*** ykarel_ has quit IRC | 07:52 | |
*** jpena|off is now known as jpena | 07:56 | |
*** nils has joined #zuul | 08:08 | |
*** bhavikdbavishi has joined #zuul | 08:08 | |
*** bhavikdbavishi1 has joined #zuul | 08:11 | |
*** bhavikdbavishi has quit IRC | 08:13 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 08:13 | |
*** ykarel_ has joined #zuul | 08:27 | |
*** ykarel_ has quit IRC | 09:34 | |
*** holser has joined #zuul | 09:52 | |
openstackgerrit | Ian Wienand proposed zuul/nodepool master: Account for resource usage of leaked nodes https://review.opendev.org/c/zuul/nodepool/+/785821 | 10:12 |
*** bhavikdbavishi has quit IRC | 10:18 | |
*** bhavikdbavishi has joined #zuul | 10:25 | |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: web UI: user login with OpenID Connect https://review.opendev.org/c/zuul/zuul/+/734082 | 10:28 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: Add authentication-realm attribute to tenants https://review.opendev.org/c/zuul/zuul/+/735586 | 10:29 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: web UI: allow a privileged user to dequeue a change https://review.opendev.org/c/zuul/zuul/+/734850 | 10:29 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: web UI: allow a privileged user to re-enqueue a change https://review.opendev.org/c/zuul/zuul/+/736772 | 10:29 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: Web UI: allow a privileged user to request autohold https://review.opendev.org/c/zuul/zuul/+/768115 | 10:30 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: Web UI: add Autoholds, Autohold page https://review.opendev.org/c/zuul/zuul/+/768199 | 10:31 |
*** jcapitao is now known as jcapitao_lunch | 10:36 | |
*** bhavikdbavishi has quit IRC | 11:17 | |
*** bhavikdbavishi has joined #zuul | 11:29 | |
*** jpena is now known as jpena|lunch | 11:32 | |
*** rlandy has joined #zuul | 11:48 | |
*** rlandy is now known as rlandy|rover | 11:49 | |
*** rlandy|rover has quit IRC | 11:54 | |
*** sshnaidm has quit IRC | 12:00 | |
*** jcapitao_lunch is now known as jcapitao | 12:06 | |
*** sshnaidm has joined #zuul | 12:07 | |
*** rlandy has joined #zuul | 12:08 | |
*** rlandy is now known as rlandy|rover | 12:08 | |
*** okamis has joined #zuul | 12:30 | |
*** jpena|lunch is now known as jpena | 12:32 | |
*** sam_wan has quit IRC | 12:52 | |
*** fsvsbs has quit IRC | 12:55 | |
*** bhavikdbavishi has quit IRC | 13:35 | |
*** saneax has quit IRC | 14:16 | |
corvus | tobiash: do you have any thoughts on http://lists.zuul-ci.org/pipermail/zuul-discuss/2021-April/001566.html ? | 14:33 |
avass | corvus: I'm running my own deployment on the tip of the master branch and don't have those problems | 14:55 |
avass | so maybe there's a change combined with specific github configuration that causes that? | 14:56 |
corvus | huh. super weird. i guess we'll just wait for gtema_ to do more investigation | 14:56 |
corvus | avass: maybe so? | 14:56 |
*** jfoufas1 has quit IRC | 14:57 | |
*** bhavikdbavishi has joined #zuul | 14:59 | |
*** saneax has joined #zuul | 15:05 | |
avass | also I have some ideas how to extend zuul-cache to also handle provides/requires and fetching artifacts from previous pipelines with the artifacts api :) | 15:13 |
avass | corvus: I double checked and both labels and reviews work for me and I'm running 4.2.1.dev8 4f3f973a | 15:15 |
*** okamis has quit IRC | 15:21 | |
*** bhavikdbavishi1 has joined #zuul | 15:28 | |
*** bhavikdbavishi has quit IRC | 15:30 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 15:30 | |
openstackgerrit | Merged zuul/nodepool master: Log decline reason at info https://review.opendev.org/c/zuul/nodepool/+/786513 | 15:39 |
*** bhavikdbavishi1 has joined #zuul | 16:08 | |
*** bhavikdbavishi has quit IRC | 16:11 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 16:11 | |
*** saneax has quit IRC | 16:16 | |
corvus | hrm, https://review.opendev.org/758940 failed quickstart, but afaict it just looks like a random zk disconnect | 16:20 |
corvus | i'm going to recheck, but let's keep that in mind | 16:20 |
corvus | (visible in the nodepool launcher) | 16:20 |
*** hamalq has joined #zuul | 16:22 | |
*** hamalq has quit IRC | 16:23 | |
*** hamalq has joined #zuul | 16:24 | |
*** jcapitao has quit IRC | 16:41 | |
*** jpena is now known as jpena|off | 17:03 | |
*** bhavikdbavishi has quit IRC | 17:07 | |
*** bhavikdbavishi has joined #zuul | 17:08 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-jobs master: ensure-docker: prevent issue on centos-7 where the socket does not exists https://review.opendev.org/c/zuul/zuul-jobs/+/787421 | 17:12 |
*** bhavikdbavishi has quit IRC | 17:24 | |
*** rpittau is now known as rpittau|afk | 17:24 | |
tristanC | according to https://bugs.launchpad.net/tripleo/+bug/1925372, the change for ensure-docker with the socket service broke centos-7 job | 17:26 |
openstack | Launchpad bug 1925372 in tripleo "centos-7 content provider failing to install and start docker" [Critical,Triaged] | 17:26 |
corvus | tristanC: any idea why the centos7 test job didn't catch that? | 17:38 |
corvus | zuul-jobs-test-ensure-docker-centos-7 | 17:39 |
openstackgerrit | Merged zuul/zuul master: Store secrets keys and SSH keys in Zookeeper https://review.opendev.org/c/zuul/zuul/+/758940 | 17:40 |
*** bhavikdbavishi has joined #zuul | 17:46 | |
corvus | tristanC: is it because the zuul-jobs test uses upstream repos and tripleo does not? | 17:48 |
mordred | corvus: looking through logs - the difference ... yup | 17:48 |
mordred | that's what I was just in the middle of writing | 17:48 |
mordred | tripleo jobs are using distro, zuul-jobs test is using upstream | 17:48 |
corvus | ok, that makes sense | 17:48 |
corvus | and distro might not even have a socket service | 17:48 |
mordred | we could potentially put a when: not distro instead of a failed_when false | 17:49 |
tristanC | tripleo jobs do seem to be using distro packages | 17:49 |
mordred | but- that might not be accurate anywhere other than centos7 (I'm guessing distro-docker on centos7 is old) | 17:49 |
mordred | so it might need to be when: not centos7 and not use-distro-packages | 17:50 |
corvus | maybe we could add a comment so that we remember what we're protecting against | 17:50 |
mordred | yeah | 17:50 |
corvus | and maybe we need to 2x the jobs and run them both ways? | 17:50 |
corvus | normally i'd hesitate to do that, but this role is important and becoming widely used, and it's almost two very different circumstances depending on the flag | 17:51 |
mordred | yeah - I don't think it's a crazy idea | 17:52 |
tristanC | i don't mind using an alternative attribute/comment | 17:52 |
corvus | tristanC: cool -- how about we merge your existing change to fix tripleo quick, then add a comment and/or change the condition and add a second set of tests in a followup? | 17:53 |
tristanC | that works for me, let me do a follow-up then | 17:54 |
corvus | cool, +3 | 17:55 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-jobs master: ensure-docker: do not manage the socket on distro centos https://review.opendev.org/c/zuul/zuul-jobs/+/787429 | 18:01 |
tristanC | here is a follow-up using `when: not centos7 and not use-distro-packages`, but perhaps we could also check for a docker.socket service instead? | 18:02 |
*** y2kenny has joined #zuul | 18:06 | |
*** nils has quit IRC | 18:07 | |
avass | what would happen if "enabled: true" is removed? if the docker.socket isn't present maybe ansible is smart enough to not do anything? | 18:07 |
avass | docs say "At least one of state and enabled are required." and "started/stopped are idempotent actions that will not run commands unless necessary." so maybe? | 18:09 |
y2kenny | Hi, this has been bugging me for a while but I am not sure if it's a known bug or configuration issue. On the Zuul web UI status page, when a build set has multiple jobs running, while the job is in progress, there is a link going to the stream log. When a build/job is finished while other jobs in the same buildset are still going, there's a | 18:11 |
y2kenny | link to http://<server>/t/<tenant>/build/<build id> for the finished job. But that link always goes to "build does not exist" with 404 error on api/tenant/<tenant>/build/<build id>... Is that a known issue? | 18:11 |
fungi | y2kenny: should be configurable, in opendev's deployment it goes to the upload location for that build's logs (since the build is not recorded into the database until the entire buildset reports) | 18:13 |
avass | looks like the service module still fails if it's told to stop a service that doesn't exist | 18:14 |
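[editor's note] The guarded task being discussed might look roughly like the sketch below. Variable and fact names here are hypothetical, not the actual zuul-jobs role: the idea is simply to skip managing docker.socket where it is known not to exist (distro docker on centos-7), since the service module fails when told to stop a missing unit.

```yaml
# Sketch only: hypothetical variable names, not the real ensure-docker role.
- name: Ensure docker.socket is stopped
  service:
    name: docker.socket
    state: stopped
  # Distro docker on centos-7 ships no docker.socket unit, and the
  # service module fails when asked to stop a unit that doesn't exist.
  when: not (ansible_distribution == 'CentOS' and
             ansible_distribution_major_version == '7' and
             use_distro_packages | default(false))
```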
fungi | y2kenny: soon i think zuul is switching when build information is written to the db to be as soon as each build completes rather than being implemented as a reporter | 18:14 |
y2kenny | fungi: Ah ok, I was about to ask about that. | 18:15 |
tobiash | corvus: I've seen that. I think this should be analyzed. However we're running 4.2.0 in production without issues so that sounds weird | 18:15 |
openstackgerrit | Merged zuul/zuul-jobs master: ensure-docker: prevent issue on centos-7 where the socket does not exists https://review.opendev.org/c/zuul/zuul-jobs/+/787421 | 18:19 |
corvus | tobiash, avass: so we have 1 report of failure (with no logs) and 2 successes. | 18:23 |
corvus | oof, 2 unit test timeouts on https://review.opendev.org/785972 both on bhs1 | 18:28 |
corvus | we're probably getting close to the point where we need to bump the timeout; i sort of expected us to slowly creep up as we did more zk work | 18:28 |
avass | yeah and since they mention k8s and doing a rollback maybe they also reverted something else. we're going to need a bit more information at least | 18:28 |
fungi | probably ovh-bhs1 is acting as a canary because it's least suited to whatever the bulk of the resource consumption in those jobs is | 18:30 |
fungi | i agree it seems like an indication we need to increase the timeout (or improve test efficiency somehow) | 18:30 |
corvus | i have an "easy" way which is not so easy: if we can find a way to roll-up sql schema migrations it would save a huge amount of time | 18:31 |
corvus | i just don't see how to do that with alembic and still support arbitrary migrations | 18:32 |
corvus | i think what i really want is a tree of migrations with multiple starting points; like $current can be reached via the existing tree of migrations or a rollup migration. | 18:32 |
corvus | then 99% of the tests can use the rollup. but i haven't seen how to convince alembic that a tree with multiple roots is okay. | 18:34 |
corvus | (i believe a common way to handle rollup migrations with alembic is to require your users to upgrade to a certain point before upgrading past it; so you simply remove the ability to upgrade from any point before then. that sounds user-unfriendly) | 18:35 |
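[editor's note] corvus's rollup idea can be modeled as a toy (names and structure are illustrative only, not alembic's API): the target schema is reachable either through the full chain of step migrations or through a single rollup, and the two paths must agree so fresh databases (most tests) can take the short path.

```python
# Toy model of the "rollup migration" idea: two roots reaching one schema.
def step_1(schema):
    schema["builds"] = ["id"]

def step_2(schema):
    schema["builds"].append("result")

def step_3(schema):
    schema["buildsets"] = ["id", "tenant"]

CHAIN = [step_1, step_2, step_3]

def rollup(schema):
    # Must be kept equivalent to applying every step in CHAIN in order.
    schema["builds"] = ["id", "result"]
    schema["buildsets"] = ["id", "tenant"]

def migrate(schema, fresh=True):
    """Fresh databases take the single rollup; existing installs replay
    the chain (sketch of the concept, not Zuul's actual migration code)."""
    if fresh:
        rollup(schema)
    else:
        for step in CHAIN:
            step(schema)
    return schema

assert migrate({}, fresh=True) == migrate({}, fresh=False)
```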
*** y2kenny has quit IRC | 18:45 | |
*** vishalmanchanda has quit IRC | 18:54 | |
*** ajitha has quit IRC | 19:00 | |
*** bhavikdbavishi has quit IRC | 19:10 | |
*** hamalq has quit IRC | 19:44 | |
*** hamalq has joined #zuul | 19:44 | |
openstackgerrit | Merged zuul/zuul master: Move key_store_password to keystore section in zuul.conf https://review.opendev.org/c/zuul/zuul/+/785972 | 19:45 |
openstackgerrit | Merged zuul/zuul master: Support key versions and unique names in ZK keystorage https://review.opendev.org/c/zuul/zuul/+/786774 | 19:50 |
openstackgerrit | Merged zuul/zuul master: Pseudo-shard unique project names in keystore https://review.opendev.org/c/zuul/zuul/+/786983 | 19:50 |
corvus | huzzah! | 19:57 |
*** nils has joined #zuul | 20:08 | |
tobiash | \o/ | 20:11 |
corvus | i'm coordinating a restart in #opendev | 20:19 |
corvus | i'd like to restart opendev with that, and then land the global repo state changes | 20:19 |
*** nils has quit IRC | 20:21 | |
*** nils has joined #zuul | 20:43 | |
corvus | tobiash, swest: opendev is restarted with secrets in zk. it took a few (~5?) minutes to import them for all the projects. i restarted it a second time after that, and it took about 1.5 minutes to load them. that's definitely workable, but it might be worth taking a look at whether we can speed that up. | 21:22 |
mordred | corvus: once we have multi-scheduler the 1.5 minutes might no longer matter? | 21:40 |
corvus | mordred: yeah; there's definitely a balancing act between making things "worse" now in order to make them "better" later... | 21:40 |
corvus | mordred: but even with multi-sched, it wouldn't hurt to be faster :) | 21:40 |
corvus | so i'm thinking if we can do one or two low-hanging-fruit kind of things (like just call get_children once per connection) and they make a difference, great | 21:41 |
corvus | otherwise, it's not worth fretting over | 21:41 |
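[editor's note] The "get_children once per connection" suggestion amounts to replacing one round trip per project with a single listing that is then checked in memory. With a counting stand-in for the client (hypothetical interface, only loosely kazoo-shaped), the difference looks like:

```python
class FakeZK:
    """Minimal stand-in that counts ZooKeeper round trips (hypothetical)."""
    def __init__(self, children):
        self.children = children          # path -> list of child names
        self.round_trips = 0

    def get_children(self, path):
        self.round_trips += 1
        return self.children.get(path, [])

children = {"/keys/gerrit": ["project-a", "project-b", "project-c"]}
projects = ["project-a", "project-b", "project-c"]

# Naive: one probe per project -> N round trips.
zk = FakeZK(children)
present = {p: p in zk.get_children("/keys/gerrit") for p in projects}
assert zk.round_trips == len(projects)

# Batched: list the connection's children once, then check in memory.
zk = FakeZK(children)
listing = set(zk.get_children("/keys/gerrit"))
present = {p: p in listing for p in projects}
assert zk.round_trips == 1
```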
openstackgerrit | James E. Blair proposed zuul/zuul master: Add a fast-forward test https://review.opendev.org/c/zuul/zuul/+/786521 | 22:05 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Correct repo_state format in isUpdateNeeded https://review.opendev.org/c/zuul/zuul/+/786522 | 22:05 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Revert "Revert "Make repo state buildset global"" https://review.opendev.org/c/zuul/zuul/+/785535 | 22:05 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Fix repo state restore / Keep jobgraphs frozen https://review.opendev.org/c/zuul/zuul/+/785536 | 22:05 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Restore repo state in checkoutBranch https://review.opendev.org/c/zuul/zuul/+/786523 | 22:05 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Clarify merger updates and resets https://review.opendev.org/c/zuul/zuul/+/786744 | 22:05 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Support overlapping repos and a flat workspace scheme https://review.opendev.org/c/zuul/zuul/+/787451 | 22:05 |
corvus | that's a rebase plus a new one | 22:06 |
*** nils has quit IRC | 22:06 | |
*** rlandy|rover is now known as rlandy|rover|bbl | 22:24 | |
corvus | clarkb: when you have a second, would you mind doing a re-review of https://review.opendev.org/785536 ? i think you previously +2d it when it was a pair of changes; i've squashed it since then. and also https://review.opendev.org/786744 which is new -- it's an attempt to make merger stuff easier to understand. | 22:30 |
clarkb | I'll try! (too many things today) | 22:31 |
corvus | oh yeah, sorry, i just saw the new cloud was :( | 22:32 |
clarkb | no worries, it was my own fault | 22:35 |
clarkb | just working to make it happy now | 22:35 |
ianw | 2021-04-21 22:35:36,688 ERROR nodepool.builder.CleanupWorker.0: Exception cleaning up image fedora-32: | 22:36 |
ianw | 2021-04-21 22:35:36,687 ERROR nodepool.zk.ZooKeeper: Error loading json data from image build /nodepool/images/fedora-32/builds/0000057968 | 22:37 |
ianw | json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) | 22:37 |
ianw | is this ringing any bells, a null entry in ZK maybe? | 22:37 |
clarkb | ianw: ya I think that is an issue no one has been able to track down | 22:38 |
clarkb | and we've just manually removed the znode to address it in the past? (though we made its impact less bad by skipping to the next cleanup iirc rather than bailing out) | 22:38 |
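[editor's note] The traceback ianw pasted is exactly what the stdlib json module raises for empty input, which is why an empty znode is the usual suspect. A defensive loader (a sketch, not nodepool's actual code) would treat empty payloads as "no record":

```python
import json

def load_znode_json(data):
    """Parse a znode payload, treating empty data as missing instead of
    letting JSONDecodeError escape (sketch, not nodepool's real loader)."""
    if not data or not data.strip():
        return None
    return json.loads(data)

# An empty znode reproduces the logged error message verbatim:
try:
    json.loads(b"")
except json.JSONDecodeError as err:
    print(err)  # Expecting value: line 1 column 1 (char 0)

assert load_znode_json(b"") is None
assert load_znode_json(b'{"state": "ready"}') == {"state": "ready"}
```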
ianw | ok | 22:39 |
ianw | this started happening on nb03 @ | 22:39 |
ianw | 2021-04-21 09:17:00,614 DEBUG nodepool.builder.CleanupWorker.0: Removing failed upload record: <ImageUpload {'state': 'uploading', 'state_time': 1618996371.6001904, 'external_id': None, 'external_name': None, 'format': None, 'username': 'zuul', 'python_path': 'auto', 'shell_type': None, 'id': '0000000004', 'build_id': '0000096307', 'provider_name': 'osuosl-regionone', 'image_name': 'debian-buster-arm64'}> | 22:39 |
ianw | 2021-04-21 09:17:00,705 ERROR nodepool.zk.ZooKeeper: Error loading json data from image build /nodepool/images/fedora-32/builds/0000057968 | 22:40 |
ianw | that's pretty close together ... i wonder if the removing failed upload somehow affected it? | 22:40 |
ianw | however, nb03 has hours of attempting to upload to osu and failing before that as well (see other discussions on the suspected ipv4 issues there) | 22:41 |
*** hamalq has quit IRC | 23:34 | |
ianw | 2021-04-21 06:35:47,362 INFO nodepool.builder.CleanupWorker.0: Deleting image build fedora-32-0000057968 from ovh-bhs1 | 23:37 |
ianw | 2021-04-21 06:35:53,663 ERROR nodepool.zk.ZooKeeper: Error loading json data from image build /nodepool/images/fedora-32/builds/0000057968 | 23:37 |
ianw | WE DON'T LOG ANYTHING BETWEEN THOSE TWO | 23:38 |
ianw | sorry, caps lock | 23:38 |
ianw | (CONNECTED [localhost:2181]) /nodepool/images/fedora-32/builds> json_cat 0000057968 | 23:40 |
ianw | it's just empty, as suspected | 23:40 |
ianw | ahh, no it's not! | 23:41 |
ianw | (CONNECTED [localhost:2181]) /nodepool/images/fedora-32/builds/0000057968/providers> ls | 23:41 |
ianw | ovh-bhs1 | 23:41 |
*** tosky has quit IRC | 23:42 | |
ianw | (CONNECTED [localhost:2181]) /nodepool/images/fedora-32/builds/0000057968/providers/ovh-bhs1/images> ls | 23:44 |
ianw | is blank. so somehow ovh-bhs1 has no recorded images but a zombie entry | 23:45 |
fungi | i feel like we've had empty image build znodes before, and never managed to work out what causes that to happen | 23:49 |
corvus | it could be a race/sequencing issue with locks | 23:50 |
ianw | nodepool-builder.log.2021-04-20_23:2021-04-21 06:22:40,129 INFO nodepool.builder.CleanupWorker.0: Deleting image build fedora-32-0000057968 from vexxhost-ca-ymq-1 | 23:52 |
ianw | nodepool-builder.log.2021-04-20_23:2021-04-21 06:35:47,362 INFO nodepool.builder.CleanupWorker.0: Deleting image build fedora-32-0000057968 from ovh-bhs1 | 23:52 |
ianw | the vexxhost one didn't seem to have any issues. the ovh-bhs1 did. so possibly looking for something that happened between 06:22 -> 06:35 | 23:53 |
ianw | this is on nb01. i wonder if 02 did something in that period? | 23:53 |
corvus | if it's a lock race/sequencing issue it would be triggered by the last one. | 23:53 |
fungi | and yeah, the previous incidents i've observed did look like they came in bursts | 23:55 |
ianw | 2021-04-21 06:22:31,024 ERROR nodepool.builder.UploadWorker.0: Failed to upload build 0000057970 of image fedora-32 to provider ovh-bhs1 | 23:55 |
ianw | openstack.exceptions.ConflictException: ConflictException: 409: Client Error for url: https://image.compute.bhs1.cloud.ovh.net/v2/images/2da84d78-f42f-4f8a-95f4-9405df0d9443/file, Conflict | 23:56 |
ianw | don't know what a "conflictexception" means | 23:56 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Lock node requests in fake nodepool https://review.opendev.org/c/zuul/zuul/+/787301 | 23:56 |
fungi | ianw: looks like openstacksdk raises that on a 409 status response from keystoneauth | 23:57 |
fungi | so i guess the real question is under what circumstances does keystoneauth get a 409 | 23:58 |
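[editor's note] The log below shows the pattern that actually resolved this: the first upload 409'd at 06:22 and a retry had the image ready by 06:35. Treating 409 Conflict as retryable can be sketched like this (the exception class and upload callable are stand-ins, not openstacksdk's or nodepool's real code):

```python
import time

class ConflictException(Exception):
    """Stand-in for openstack.exceptions.ConflictException (HTTP 409)."""

def upload_with_retry(upload, attempts=3, delay=0.0):
    """Retry an image upload when the cloud answers 409 Conflict
    (sketch of the behavior visible in the log, not nodepool's code)."""
    for attempt in range(1, attempts + 1):
        try:
            return upload()
        except ConflictException:
            if attempt == attempts:
                raise
            time.sleep(delay)

# Simulated provider that 409s once, then accepts the upload:
calls = {"n": 0}
def fake_upload():
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConflictException("409: Conflict")
    return "ready"

assert upload_with_retry(fake_upload) == "ready"
assert calls["n"] == 2
```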
ianw | 2021-04-21 06:35:26,661 INFO nodepool.builder.UploadWorker.2: Image build fedora-32-0000057970 (external_id 12743826-2016-4ae7-b838-e4aefd919c7d) in ovh-bhs1 is ready | 23:59 |
ianw | 2021-04-21 06:22:31,140 INFO nodepool.builder.UploadWorker.2: Uploading DIB image build 0000057970 from /opt/nodepool_dib/fedora-32-0000057970.qcow2 to ovh-bhs1 | 23:59 |
ianw | sorry, that's reversed, but it tries again and the image is ready by 06:35 | 23:59 |
fungi | you'd think the whole point of mapping error codes to custom python exceptions would be so you could also attach descriptive explanations ;) | 23:59 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!