openstackgerrit | Merged zuul/zuul-jobs master: ensure-docker: ensure docker.socket is stopped https://review.opendev.org/c/zuul/zuul-jobs/+/787271 | 00:18 |
openstackgerrit | Merged zuul/nodepool master: Require dib 3.10.0 https://review.opendev.org/c/zuul/nodepool/+/786984 | 00:23 |
*** sam_wan has joined #zuul | 00:59 | |
*** sam_wan has quit IRC | 01:36 | |
*** ikhan has quit IRC | 02:07 | |
*** ajitha has joined #zuul | 02:30 | |
*** evrardjp has quit IRC | 02:33 | |
*** evrardjp has joined #zuul | 02:33 | |
*** sam_wan has joined #zuul | 03:16 | |
*** rlandy|rover has quit IRC | 03:34 | |
*** ykarel|away has joined #zuul | 04:06 | |
*** ykarel_ has joined #zuul | 04:10 | |
*** ykarel|away has quit IRC | 04:12 | |
*** bhavikdbavishi has joined #zuul | 04:15 | |
*** bhavikdbavishi1 has joined #zuul | 04:18 | |
*** bhavikdbavishi has quit IRC | 04:20 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 04:20 | |
*** bhavikdbavishi has quit IRC | 04:27 | |
*** bhavikdbavishi has joined #zuul | 04:28 | |
*** bhavikdbavishi has quit IRC | 04:39 | |
*** hamalq has quit IRC | 04:49 | |
*** vishalmanchanda has joined #zuul | 04:55 | |
*** jfoufas1 has joined #zuul | 05:11 | |
*** paladox has quit IRC | 05:55 | |
*** ykarel_ has quit IRC | 05:55 | |
*** ykarel__ has joined #zuul | 05:55 | |
*** mnaser has quit IRC | 05:59 | |
*** bhavikdbavishi has joined #zuul | 05:59 | |
*** mnaser has joined #zuul | 06:00 | |
*** saneax has joined #zuul | 06:25 | |
*** jcapitao has joined #zuul | 06:34 | |
*** ykarel_ has joined #zuul | 06:38 | |
*** ykarel__ has quit IRC | 06:40 | |
*** bhavikdbavishi1 has joined #zuul | 06:53 | |
*** bhavikdbavishi has quit IRC | 06:54 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 06:54 | |
*** avass has quit IRC | 07:09 | |
*** avass has joined #zuul | 07:10 | |
*** rpittau|afk is now known as rpittau | 07:33 | |
*** bhavikdbavishi has quit IRC | 07:35 | |
*** bhavikdbavishi has joined #zuul | 07:35 | |
*** tosky has joined #zuul | 07:46 | |
*** bhavikdbavishi has quit IRC | 07:48 | |
*** ykarel_ has quit IRC | 07:52 | |
*** jpena|off is now known as jpena | 07:56 | |
*** nils has joined #zuul | 08:08 | |
*** bhavikdbavishi has joined #zuul | 08:08 | |
*** bhavikdbavishi1 has joined #zuul | 08:11 | |
*** bhavikdbavishi has quit IRC | 08:13 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 08:13 | |
*** ykarel_ has joined #zuul | 08:27 | |
*** ykarel_ has quit IRC | 09:34 | |
*** holser has joined #zuul | 09:52 | |
openstackgerrit | Ian Wienand proposed zuul/nodepool master: Account for resource usage of leaked nodes https://review.opendev.org/c/zuul/nodepool/+/785821 | 10:12 |
*** bhavikdbavishi has quit IRC | 10:18 | |
*** bhavikdbavishi has joined #zuul | 10:25 | |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: web UI: user login with OpenID Connect https://review.opendev.org/c/zuul/zuul/+/734082 | 10:28 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: Add authentication-realm attribute to tenants https://review.opendev.org/c/zuul/zuul/+/735586 | 10:29 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: web UI: allow a privileged user to dequeue a change https://review.opendev.org/c/zuul/zuul/+/734850 | 10:29 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: web UI: allow a privileged user to re-enqueue a change https://review.opendev.org/c/zuul/zuul/+/736772 | 10:29 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: Web UI: allow a privileged user to request autohold https://review.opendev.org/c/zuul/zuul/+/768115 | 10:30 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: Web UI: add Autoholds, Autohold page https://review.opendev.org/c/zuul/zuul/+/768199 | 10:31 |
*** jcapitao is now known as jcapitao_lunch | 10:36 | |
*** bhavikdbavishi has quit IRC | 11:17 | |
*** bhavikdbavishi has joined #zuul | 11:29 | |
*** jpena is now known as jpena|lunch | 11:32 | |
*** rlandy has joined #zuul | 11:48 | |
*** rlandy is now known as rlandy|rover | 11:49 | |
*** rlandy|rover has quit IRC | 11:54 | |
*** sshnaidm has quit IRC | 12:00 | |
*** jcapitao_lunch is now known as jcapitao | 12:06 | |
*** sshnaidm has joined #zuul | 12:07 | |
*** rlandy has joined #zuul | 12:08 | |
*** rlandy is now known as rlandy|rover | 12:08 | |
*** okamis has joined #zuul | 12:30 | |
*** jpena|lunch is now known as jpena | 12:32 | |
*** sam_wan has quit IRC | 12:52 | |
*** fsvsbs has quit IRC | 12:55 | |
*** bhavikdbavishi has quit IRC | 13:35 | |
*** saneax has quit IRC | 14:16 | |
corvus | tobiash: do you have any thoughts on http://lists.zuul-ci.org/pipermail/zuul-discuss/2021-April/001566.html ? | 14:33 |
avass | corvus: I'm running my own deployment on the tip of the master branch and don't have those problems | 14:55 |
avass | so maybe there's a change combined with specific github configuration that causes that? | 14:56 |
corvus | huh. super weird. i guess we'll just wait for gtema_ to do more investigation | 14:56 |
corvus | avass: maybe so? | 14:56 |
*** jfoufas1 has quit IRC | 14:57 | |
*** bhavikdbavishi has joined #zuul | 14:59 | |
*** saneax has joined #zuul | 15:05 | |
avass | also I have some ideas how to extend zuul-cache to also handle provides/requires and fetching artifacts from previous pipelines with the artifacts api :) | 15:13 |
avass | corvus: I double checked and both labels and reviews work for me and I'm running 4.2.1.dev8 4f3f973a | 15:15 |
*** okamis has quit IRC | 15:21 | |
*** bhavikdbavishi1 has joined #zuul | 15:28 | |
*** bhavikdbavishi has quit IRC | 15:30 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 15:30 | |
openstackgerrit | Merged zuul/nodepool master: Log decline reason at info https://review.opendev.org/c/zuul/nodepool/+/786513 | 15:39 |
*** bhavikdbavishi1 has joined #zuul | 16:08 | |
*** bhavikdbavishi has quit IRC | 16:11 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 16:11 | |
*** saneax has quit IRC | 16:16 | |
corvus | hrm, https://review.opendev.org/758940 failed quickstart, but afaict it just looks like a random zk disconnect | 16:20 |
corvus | i'm going to recheck, but let's keep that in mind | 16:20 |
corvus | (visible in the nodepool launcher) | 16:20 |
*** hamalq has joined #zuul | 16:22 | |
*** hamalq has quit IRC | 16:23 | |
*** hamalq has joined #zuul | 16:24 | |
*** jcapitao has quit IRC | 16:41 | |
*** jpena is now known as jpena|off | 17:03 | |
*** bhavikdbavishi has quit IRC | 17:07 | |
*** bhavikdbavishi has joined #zuul | 17:08 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-jobs master: ensure-docker: prevent issue on centos-7 where the socket does not exists https://review.opendev.org/c/zuul/zuul-jobs/+/787421 | 17:12 |
*** bhavikdbavishi has quit IRC | 17:24 | |
*** rpittau is now known as rpittau|afk | 17:24 | |
tristanC | according to https://bugs.launchpad.net/tripleo/+bug/1925372, the change for ensure-docker with the socket service broke centos-7 job | 17:26 |
openstack | Launchpad bug 1925372 in tripleo "centos-7 content provider failing to install and start docker" [Critical,Triaged] | 17:26 |
corvus | tristanC: any idea why the centos7 test job didn't catch that? | 17:38 |
corvus | zuul-jobs-test-ensure-docker-centos-7 | 17:39 |
openstackgerrit | Merged zuul/zuul master: Store secrets keys and SSH keys in Zookeeper https://review.opendev.org/c/zuul/zuul/+/758940 | 17:40 |
*** bhavikdbavishi has joined #zuul | 17:46 | |
corvus | tristanC: is it because the zuul-jobs test uses upstream repos and tripleo does not? | 17:48 |
mordred | corvus: looking through logs - the difference ... yup | 17:48 |
mordred | that's what I was just in the middle of writing | 17:48 |
mordred | tripleo jobs are using distro, zuul-jobs test is using upstream | 17:48 |
corvus | ok, that makes sense | 17:48 |
corvus | and distro might not even have a socket service | 17:48 |
mordred | we could potentially put a when: not distro instead of a failed_when false | 17:49 |
tristanC | tripleo jobs do seem to be using distro packages | 17:49 |
mordred | but- that might not be accurate anywhere other than centos7 (I'm guessing distro-docker on centos7 is old) | 17:49 |
mordred | so it might need to be when: not centos7 and not use-distro-packages | 17:50 |
corvus | maybe we could add a comment so that we remember what we're protecting against | 17:50 |
mordred | yeah | 17:50 |
corvus | and maybe we need to 2x the jobs and run them both ways? | 17:50 |
corvus | normally i'd hesitate to do that, but this role is important and becoming widely used, and it's almost two very different circumstances depending on the flag | 17:51 |
mordred | yeah - I don't think it's a crazy idea | 17:52 |
tristanC | i don't mind using an alternative attribute/comment | 17:52 |
corvus | tristanC: cool -- how about we merge your existing change to fix tripleo quick, then add a comment and/or change the condition and add a second set of tests in a followup? | 17:53 |
tristanC | that works for me, let me do a follow-up then | 17:54 |
corvus | cool, +3 | 17:55 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-jobs master: ensure-docker: do not manage the socket on distro centos https://review.opendev.org/c/zuul/zuul-jobs/+/787429 | 18:01 |
tristanC | here is a follow-up using `when: not centos7 and not use-distro-packages`, but perhaps we could also check for a docker.socket service instead? | 18:02 |
*** y2kenny has joined #zuul | 18:06 | |
*** nils has quit IRC | 18:07 | |
avass | what would happen if "enabled: true" is removed? if the docker.socket isn't present maybe ansible is smart enough to not do anything? | 18:07 |
avass | docs say "At least one of state and enabled are required." and "started/stopped are idempotent actions that will not run commands unless necessary." so maybe? | 18:09 |
y2kenny | Hi, this has been bugging me for a while but I am not sure if it's a known bug or configuration issue. On the Zuul web UI status page, when a build set has multiple jobs running, while the job is in progress, there is a link going to the stream log. When a build/job is finished while other jobs in the same buildset are still going, there's a | 18:11 |
y2kenny | link to http://<server>/t/<tenant>/build/<build id> for the finished job. But that link always goes to "build does not exist" with 404 error on api/tenant/<tenant>/build/<build id>... Is that a known issue? | 18:11 |
fungi | y2kenny: should be configurable, in opendev's deployment it goes to the upload location for that build's logs (since the build is not recorded into the database until the entire buildset reports) | 18:13 |
avass | looks like the service module still fails if it's told to stop a service that doesn't exist | 18:14 |
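[editor's note] The guarded task being discussed might look roughly like the sketch below. Variable and fact names here are hypothetical, not the actual zuul-jobs role: the idea is simply to skip managing docker.socket where it is known not to exist (distro docker on centos-7), since the service module fails when told to stop a missing unit.

```yaml
# Sketch only: hypothetical variable names, not the real ensure-docker role.
- name: Ensure docker.socket is stopped
  service:
    name: docker.socket
    state: stopped
  # Distro docker on centos-7 ships no docker.socket unit, and the
  # service module fails when asked to stop a unit that doesn't exist.
  when: not (ansible_distribution == 'CentOS' and
             ansible_distribution_major_version == '7' and
             use_distro_packages | default(false))
```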
fungi | y2kenny: soon i think zuul is switching when build information is written to the db to be as soon as each build completes rather than being implemented as a reporter | 18:14 |
y2kenny | fungi: Ah ok, I was about to ask about that. | 18:15 |
tobiash | corvus: I've seen that. I think this should be analyzed. However we're running 4.2.0 in production without issues so that sounds weird | 18:15 |
openstackgerrit | Merged zuul/zuul-jobs master: ensure-docker: prevent issue on centos-7 where the socket does not exists https://review.opendev.org/c/zuul/zuul-jobs/+/787421 | 18:19 |
corvus | tobiash, avass: so we have 1 report of failure (with no logs) and 2 successes. | 18:23 |
corvus | oof, 2 unit test timeouts on https://review.opendev.org/785972 both on bhs1 | 18:28 |
corvus | we're probably getting close to the point where we need to bump the timeout; i sort of expected us to slowly creep up as we did more zk work | 18:28 |
avass | yeah and since they mention k8s and doing a rollback maybe they also reverted something else. we're going to need a bit more information at least | 18:28 |
fungi | probably ovh-bhs1 is acting as a canary because it's least suited to whatever the bulk of the resource consumption in those jobs is | 18:30 |
fungi | i agree it seems like an indication we need to increase the timeout (or improve test efficiency somehow) | 18:30 |
corvus | i have an "easy" way which is not so easy: if we can find a way to roll-up sql schema migrations it would save a huge amount of time | 18:31 |
corvus | i just don't see how to do that with alembic and still support arbitrary migrations | 18:32 |
corvus | i think what i really want is a tree of migrations with multiple starting points; like $current can be reached via the existing tree of migrations or a rollup migration. | 18:32 |
corvus | then 99% of the tests can use the rollup. but i haven't seen how to convince alembic that a tree with multiple roots is okay. | 18:34 |
corvus | (i believe a common way to handle rollup migrations with alembic is to require your users to upgrade to a certain point before upgrading past it; so you simply remove the ability to upgrade from any point before then. that sounds user-unfriendly) | 18:35 |
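[editor's note] corvus's rollup idea can be modeled as a toy (names and structure are illustrative only, not alembic's API): the target schema is reachable either through the full chain of step migrations or through a single rollup, and the two paths must agree so fresh databases (most tests) can take the short path.

```python
# Toy model of the "rollup migration" idea: two roots reaching one schema.
def step_1(schema):
    schema["builds"] = ["id"]

def step_2(schema):
    schema["builds"].append("result")

def step_3(schema):
    schema["buildsets"] = ["id", "tenant"]

CHAIN = [step_1, step_2, step_3]

def rollup(schema):
    # Must be kept equivalent to applying every step in CHAIN in order.
    schema["builds"] = ["id", "result"]
    schema["buildsets"] = ["id", "tenant"]

def migrate(schema, fresh=True):
    """Fresh databases take the single rollup; existing installs replay
    the chain (sketch of the concept, not Zuul's actual migration code)."""
    if fresh:
        rollup(schema)
    else:
        for step in CHAIN:
            step(schema)
    return schema

assert migrate({}, fresh=True) == migrate({}, fresh=False)
```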
*** y2kenny has quit IRC | 18:45 | |
*** vishalmanchanda has quit IRC | 18:54 | |
*** ajitha has quit IRC | 19:00 | |
*** bhavikdbavishi has quit IRC | 19:10 | |
*** hamalq has quit IRC | 19:44 | |
*** hamalq has joined #zuul | 19:44 | |
openstackgerrit | Merged zuul/zuul master: Move key_store_password to keystore section in zuul.conf https://review.opendev.org/c/zuul/zuul/+/785972 | 19:45 |
openstackgerrit | Merged zuul/zuul master: Support key versions and unique names in ZK keystorage https://review.opendev.org/c/zuul/zuul/+/786774 | 19:50 |
openstackgerrit | Merged zuul/zuul master: Pseudo-shard unique project names in keystore https://review.opendev.org/c/zuul/zuul/+/786983 | 19:50 |
corvus | huzzah! | 19:57 |
*** nils has joined #zuul | 20:08 | |
tobiash | \o/ | 20:11 |
corvus | i'm coordinating a restart in #opendev | 20:19 |
corvus | i'd like to restart opendev with that, and then land the global repo state changes | 20:19 |
*** nils has quit IRC | 20:21 | |
*** nils has joined #zuul | 20:43 | |
corvus | tobiash, swest: opendev is restarted with secrets in zk. it took a few (~5?) minutes to import them for all the projects. i restarted it a second time after that, and it took about 1.5 minutes to load them. that's definitely workable, but it might be worth taking a look at whether we can speed that up. | 21:22 |
mordred | corvus: once we have multi-scheduler the 1.5 minutes might no longer matter? | 21:40 |
corvus | mordred: yeah; there's definitely a balancing act between making things "worse" now in order to make them "better" later... | 21:40 |
corvus | mordred: but even with multi-sched, it wouldn't hurt to be faster :) | 21:40 |
corvus | so i'm thinking if we can do one or two low-hanging-fruit kind of things (like just call get_children once per connection) and they make a difference, great | 21:41 |
corvus | otherwise, it's not worth fretting over | 21:41 |
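[editor's note] The "get_children once per connection" suggestion amounts to replacing one round trip per project with a single listing that is then checked in memory. With a counting stand-in for the client (hypothetical interface, only loosely kazoo-shaped), the difference looks like:

```python
class FakeZK:
    """Minimal stand-in that counts ZooKeeper round trips (hypothetical)."""
    def __init__(self, children):
        self.children = children          # path -> list of child names
        self.round_trips = 0

    def get_children(self, path):
        self.round_trips += 1
        return self.children.get(path, [])

children = {"/keys/gerrit": ["project-a", "project-b", "project-c"]}
projects = ["project-a", "project-b", "project-c"]

# Naive: one probe per project -> N round trips.
zk = FakeZK(children)
present = {p: p in zk.get_children("/keys/gerrit") for p in projects}
assert zk.round_trips == len(projects)

# Batched: list the connection's children once, then check in memory.
zk = FakeZK(children)
listing = set(zk.get_children("/keys/gerrit"))
present = {p: p in listing for p in projects}
assert zk.round_trips == 1
```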
openstackgerrit | James E. Blair proposed zuul/zuul master: Add a fast-forward test https://review.opendev.org/c/zuul/zuul/+/786521 | 22:05 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Correct repo_state format in isUpdateNeeded https://review.opendev.org/c/zuul/zuul/+/786522 | 22:05 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Revert "Revert "Make repo state buildset global"" https://review.opendev.org/c/zuul/zuul/+/785535 | 22:05 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Fix repo state restore / Keep jobgraphs frozen https://review.opendev.org/c/zuul/zuul/+/785536 | 22:05 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Restore repo state in checkoutBranch https://review.opendev.org/c/zuul/zuul/+/786523 | 22:05 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Clarify merger updates and resets https://review.opendev.org/c/zuul/zuul/+/786744 | 22:05 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Support overlapping repos and a flat workspace scheme https://review.opendev.org/c/zuul/zuul/+/787451 | 22:05 |
corvus | that's a rebase plus a new one | 22:06 |
*** nils has quit IRC | 22:06 | |
*** rlandy|rover is now known as rlandy|rover|bbl | 22:24 | |
corvus | clarkb: when you have a second, would you mind doing a re-review of https://review.opendev.org/785536 ? i think you previously +2d it when it was a pair of changes; i've squashed it since then. and also https://review.opendev.org/786744 which is new -- it's an attempt to make merger stuff easier to understand. | 22:30 |
clarkb | I'll try! (too many things today) | 22:31 |
corvus | oh yeah, sorry, i just saw the new cloud was :( | 22:32 |
clarkb | no worries, it was my own fault | 22:35 |
clarkb | just working to make it happy now | 22:35 |
ianw | 2021-04-21 22:35:36,688 ERROR nodepool.builder.CleanupWorker.0: Exception cleaning up image fedora-32: | 22:36 |
ianw | 2021-04-21 22:35:36,687 ERROR nodepool.zk.ZooKeeper: Error loading json data from image build /nodepool/images/fedora-32/builds/0000057968 | 22:37 |
ianw | json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) | 22:37 |
ianw | is this ringing any bells, a null entry in ZK maybe? | 22:37 |
clarkb | ianw: ya I think that is an issue no one has been able to track down | 22:38 |
clarkb | and we've just manually removed the znode to address it in the past? (though we made its impact less bad by skipping to the next cleanup iirc rather than bailing out) | 22:38 |
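[editor's note] The traceback ianw pasted is exactly what the stdlib json module raises for empty input, which is why an empty znode is the usual suspect. A defensive loader (a sketch, not nodepool's actual code) would treat empty payloads as "no record":

```python
import json

def load_znode_json(data):
    """Parse a znode payload, treating empty data as missing instead of
    letting JSONDecodeError escape (sketch, not nodepool's real loader)."""
    if not data or not data.strip():
        return None
    return json.loads(data)

# An empty znode reproduces the logged error message verbatim:
try:
    json.loads(b"")
except json.JSONDecodeError as err:
    print(err)  # Expecting value: line 1 column 1 (char 0)

assert load_znode_json(b"") is None
assert load_znode_json(b'{"state": "ready"}') == {"state": "ready"}
```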
ianw | ok | 22:39 |
ianw | this started happening on nb03 @ | 22:39 |
ianw | 2021-04-21 09:17:00,614 DEBUG nodepool.builder.CleanupWorker.0: Removing failed upload record: <ImageUpload {'state': 'uploading', 'state_time': 1618996371.6001904, 'external_id': None, 'external_name': None, 'format': None, 'username': 'zuul', 'python_path': 'auto', 'shell_type': None, 'id': '0000000004', 'build_id': '0000096307', 'provider_name': 'osuosl-regionone', 'image_name': 'debian-buster-arm64'}> | 22:39 |
ianw | 2021-04-21 09:17:00,705 ERROR nodepool.zk.ZooKeeper: Error loading json data from image build /nodepool/images/fedora-32/builds/0000057968 | 22:40 |
ianw | that's pretty close together ... i wonder if the removing failed upload somehow affected it? | 22:40 |
ianw | however, nb03 has hours of attempting to upload to osu and failing before that as well (see other discussions on the suspected ipv4 issues there) | 22:41 |
*** hamalq has quit IRC | 23:34 | |
ianw | 2021-04-21 06:35:47,362 INFO nodepool.builder.CleanupWorker.0: Deleting image build fedora-32-0000057968 from ovh-bhs1 | 23:37 |
ianw | 2021-04-21 06:35:53,663 ERROR nodepool.zk.ZooKeeper: Error loading json data from image build /nodepool/images/fedora-32/builds/0000057968 | 23:37 |
ianw | WE DON'T LOG ANYTHING BETWEEN THOSE TWO | 23:38 |
ianw | sorry, caps lock | 23:38 |
ianw | (CONNECTED [localhost:2181]) /nodepool/images/fedora-32/builds> json_cat 0000057968 | 23:40 |
ianw | it's just empty, as suspected | 23:40 |
ianw | ahh, no it's not! | 23:41 |
ianw | (CONNECTED [localhost:2181]) /nodepool/images/fedora-32/builds/0000057968/providers> ls | 23:41 |
ianw | ovh-bhs1 | 23:41 |
*** tosky has quit IRC | 23:42 | |
ianw | (CONNECTED [localhost:2181]) /nodepool/images/fedora-32/builds/0000057968/providers/ovh-bhs1/images> ls | 23:44 |
ianw | is blank. so somehow ovh-bhs1 has no recorded images but a zombie entry | 23:45 |
fungi | i feel like we've had empty image build znodes before, and never managed to work out what causes that to happen | 23:49 |
corvus | it could be a race/sequencing issue with locks | 23:50 |
ianw | nodepool-builder.log.2021-04-20_23:2021-04-21 06:22:40,129 INFO nodepool.builder.CleanupWorker.0: Deleting image build fedora-32-0000057968 from vexxhost-ca-ymq-1 | 23:52 |
ianw | nodepool-builder.log.2021-04-20_23:2021-04-21 06:35:47,362 INFO nodepool.builder.CleanupWorker.0: Deleting image build fedora-32-0000057968 from ovh-bhs1 | 23:52 |
ianw | the vexxhost one didn't seem to have any issues. the ovh-bhs1 did. so possibly looking for something that happened between 06:22 -> 06:35 | 23:53 |
ianw | this is on nb01. i wonder if 02 did something in that period? | 23:53 |
corvus | if it's a lock race/sequencing issue it would be triggered by the last one. | 23:53 |
fungi | and yeah, the previous incidents i've observed did look like they came in bursts | 23:55 |
ianw | 2021-04-21 06:22:31,024 ERROR nodepool.builder.UploadWorker.0: Failed to upload build 0000057970 of image fedora-32 to provider ovh-bhs1 | 23:55 |
ianw | openstack.exceptions.ConflictException: ConflictException: 409: Client Error for url: https://image.compute.bhs1.cloud.ovh.net/v2/images/2da84d78-f42f-4f8a-95f4-9405df0d9443/file, Conflict | 23:56 |
ianw | don't know what a "conflictexception" means | 23:56 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Lock node requests in fake nodepool https://review.opendev.org/c/zuul/zuul/+/787301 | 23:56 |
fungi | ianw: looks like openstacksdk raises that on a 409 status response from keystoneauth | 23:57 |
fungi | so i guess the real question is under what circumstances does keystoneauth get a 409 | 23:58 |
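[editor's note] The log below shows the pattern that actually resolved this: the first upload 409'd at 06:22 and a retry had the image ready by 06:35. Treating 409 Conflict as retryable can be sketched like this (the exception class and upload callable are stand-ins, not openstacksdk's or nodepool's real code):

```python
import time

class ConflictException(Exception):
    """Stand-in for openstack.exceptions.ConflictException (HTTP 409)."""

def upload_with_retry(upload, attempts=3, delay=0.0):
    """Retry an image upload when the cloud answers 409 Conflict
    (sketch of the behavior visible in the log, not nodepool's code)."""
    for attempt in range(1, attempts + 1):
        try:
            return upload()
        except ConflictException:
            if attempt == attempts:
                raise
            time.sleep(delay)

# Simulated provider that 409s once, then accepts the upload:
calls = {"n": 0}
def fake_upload():
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConflictException("409: Conflict")
    return "ready"

assert upload_with_retry(fake_upload) == "ready"
assert calls["n"] == 2
```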
ianw | 2021-04-21 06:35:26,661 INFO nodepool.builder.UploadWorker.2: Image build fedora-32-0000057970 (external_id 12743826-2016-4ae7-b838-e4aefd919c7d) in ovh-bhs1 is ready | 23:59 |
ianw | 2021-04-21 06:22:31,140 INFO nodepool.builder.UploadWorker.2: Uploading DIB image build 0000057970 from /opt/nodepool_dib/fedora-32-0000057970.qcow2 to ovh-bhs1 | 23:59 |
ianw | sorry, that's reversed, but it tries again and the image is ready by 06:35 | 23:59 |
fungi | you'd think the whole point of mapping error codes to custom python exceptions would be so you could also attach descriptive explanations ;) | 23:59 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!