Tuesday, 2021-07-06

ianwhttps://zuul.opendev.org/t/openstack/builds?job_name=release-wheel-cache does not appear to be running still01:04
*** ykarel_ is now known as ykarel04:48
*** ysandeep|away is now known as ysandeep05:10
*** jpena|off is now known as jpena06:58
*** amoralej|off is now known as amoralej07:07
*** rpittau|afk is now known as rpittau07:30
*** ysandeep is now known as ysandeep|lunch07:55
*** ykarel is now known as ykarel|lunch08:04
*** ykarel|lunch is now known as ykarel09:17
*** ysandeep|lunch is now known as ysandeep09:19
*** bhagyashris_ is now known as bhagyashris|ruck09:50
opendevreviewDmitry Tantsur proposed ttygroup/gertty master: Suggest a 'cherry-picked from' line when cherry picking  https://review.opendev.org/c/ttygroup/gertty/+/79964111:06
opendevreviewDmitry Tantsur proposed ttygroup/gertty master: examples: match 'commit <hash>'  https://review.opendev.org/c/ttygroup/gertty/+/79964211:13
*** jpena is now known as jpena|lunch11:31
*** ysandeep is now known as ysandeep|brb12:12
*** amoralej is now known as amoralej|lunch12:20
*** ysandeep|brb is now known as ysandeep12:28
*** jpena|lunch is now known as jpena12:37
*** amoralej|lunch is now known as amoralej13:15
*** osmanlicilegi is now known as Guest413:34
*** prometheanfire is now known as Guest313:35
opendevreviewDanni Shi proposed openstack/diskimage-builder master: Add a keylime-agent element and a tpm-emulator element  https://review.opendev.org/c/openstack/diskimage-builder/+/78960113:44
*** ykarel is now known as ykarel|away14:38
*** ysandeep is now known as ysandeep|away14:41
clarkbianw: I wonder if that hit the fallout of jinja changes in zuul?14:56
clarkbI can probably take a look after my morning meeting and sending out the infra meeting agenda14:56
clarkbfungi: I'm also hoping to run my gerrit account retirement script over the list I generated on Friday. Any chance you might be able to take a look at the double check list today?14:56
fungiclarkb: i looked at it some over the weekend, spot-checking the danger list, and everything i queried looked reasonable to me. i'll take another look today between meetings though14:57
clarkbexcellent, in that case I'll plan to start processing that list after meetings and such14:58
clarkbfungi: in https://review.opendev.org/c/openstack/project-config/+/799123 why do we remove support to check track-upstream just because we drop the feature from one project? Do we clean that up so we can remove those tools/ scripts that get cleaned up?15:13
clarkbI'm ok with that if that is the case, just want to make sure I understand the additional cleanup of feature support for jeepyb in there15:13
fungiclarkb: well, at least as a start, it lets us catch if someone tries to add track-upstream on another project15:22
fungithe expectation is that we don't want to continue to support that15:23
*** gthiemon1e is now known as gthiemonge15:23
fungithe accompanying system-config change rips out the cronjob for it, which has broken on the new gerrit deployment15:23
clarkbah I hadn't realized it wasn't working ++ in that case15:25
fungiyeah, it's been spamming us ~hourly since the new server was built15:25
fungiand as we don't need it any longer, it seemed like my time was better invested simplifying it away rather than fixing it15:25
clarkbsounds good. I'll finish up my review on those changes as soon as I get the infra meeting agenda out15:27
fungii haven't done anything to remove the feature from jeepyb yet, just our use of it15:28
fungibut that could be a next step (there's a fair bit we can probably clean up in jeepyb at this point)15:28
clarkbwe might leave it in jeepyb in case other users are using it (though I don't really know of any other users of jeepyb at this point)15:29
fungiright, that was sort of why i hadn't approached that end of it yet15:30
clarkbI've got the meeting agenda updated. Any last minute items to add before I send it out?15:30
fungii can't think of any15:35
clarkbfungi: I left a comment on 799123. Maybe you can check it to see if you want to address that in a followup and approve 799123 as is?15:40
* clarkb finds breakfast15:44
*** marios is now known as marios|out16:04
*** jpena is now known as jpena|off16:14
opendevreviewRich Bowen proposed opendev/yaml2ical master: Report which week a meeting occurs.  https://review.opendev.org/c/opendev/yaml2ical/+/79969116:29
clarkbfungi: I've got my input list for retire-user.sh ready to go as well as an updated heredoc git commit message in that script. Should I start processing the list or do you want to double check more accounts first?16:38
*** rpittau is now known as rpittau|afk16:39
fungiclarkb: i checked a few more, seems like it should be safe enough. we should expect at least a few people to reactivate eventually and run into problems, but can't make an omelette otherwise16:39
clarkbya I think if we wait a couple of weeks to give them a chance to complain we'll be fine.16:40
clarkbalright I'll start running that here16:40
clarkbjust have to figure out tee syntax with for loops again16:40
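A minimal sketch of the "tee with a for loop" pattern mentioned above: the pipe is attached to the loop's `done`, so the output of every iteration is both shown and appended to one log file. The function name `run_over_list`, the account IDs, and the `echo` placeholder are hypothetical; the real arguments of retire-user.sh are not shown in this log.

```shell
# Run a per-account step for every ID and log all output through tee.
# Usage: run_over_list LOG_FILE ACCOUNT_ID...
run_over_list() {
    log_file=$1
    shift
    for account in "$@"; do
        # Placeholder for the real per-account command, which the log does not show.
        echo "retiring ${account}"
    done 2>&1 | tee -a "$log_file"
}
```

For example, `run_over_list retire.log 1001 1002` prints two lines and appends the same two lines to retire.log.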
opendevreviewJeremy Stanley proposed openstack/project-config master: Drop use of track-upstream  https://review.opendev.org/c/openstack/project-config/+/79912316:45
*** amoralej is now known as amoralej|off16:58
fricklerclarkb: ianw: I'm not sure I'll make it to today's meeting, but I want to mention that I try to at least passively watch most of the meetings, even if I don't talk much. so I'd not be super happy with moving it even later, but given what ianw does, I'd also agree that giving him easier access would be more important17:10
clarkbfrickler: that is good to know. I thought I'd raise the question at least.17:10
fricklermaybe another option would be to move to your evening, which would be early morning for me? 5 UTC would be feasible for me, maybe 4 UTC too, at least during daylight saving here17:12
fricklerthough that might be too late for fungi17:13
fungii'm flexible... don't have kids or other obligations really17:13
fungi5utc would be 1am local right now, so not ideal, but i'd make it work17:14
frickleron a slightly different topic I'll be mostly offline (otherwise known as PTO) for 3 weeks starting this friday. maybe you can experiment with the timing during that interval17:15
fungiyeah, i'll be gone for the next two meetings after today, on the road all day both tuesdays (i might come home monday instead in which case i'll just miss next week's)17:20
*** mgoddard- is now known as mgoddard17:26
clarkbfrickler: enjoy your time off and that is a neat idea re experimenting with times17:31
clarkbok the account retirements are done. I'll get the log file stashed in the usual location momentarily. Then I'll rerun the audit script to pick up these changes17:59
fungiawesome, thanks again!17:59
clarkbthe log is on review now and I'm running the audit now18:04
clarkbI'm going to take a break then prep for our meeting while that is running18:04
AJaegerHi, just noticed that a promote job failed, seems a fallout from the recent security Zuul change.18:21
AJaegerCould somebody look at https://zuul.opendev.org/t/openstack/build/1c148f751a594ceab627020b6f11dd36 and propose a fix, please?18:21
clarkbAJaeger: thanks for the heads up I'm sure one of us can. ianw also noticed some periodic jobs are not running so we need to dig into those too18:23
AJaegerthanks, clarkb !18:23
clarkbfungi: ^ AJaeger's example looks like the one you were working on before with targets being undefined18:32
fungiThe task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'targets' The error appears to be in '/var/lib/zuul/builds/1c148f751a594ceab627020b6f11dd36/trusted/project_0/opendev.org/opendev/base-jobs/playbooks/docs/promote.yaml': line 47, column 18:32
clarkbbut this is for openstack manuals18:32
fungiyeah, was just taking a peek there18:32
fungipromote-openstack-manuals seems to descend from opendev-promote-docs-base which i think we adjusted18:32
fungiso need to figure out what the mismatch is there18:33
clarkbfungi: I have approved the track upstream removal change18:33
clarkbthe gerrit user audit has completed and results look how I expect them. I'll push that yaml file up to the normal spot18:35
fungithanks x2!18:35
opendevreviewMerged openstack/project-config master: Drop use of track-upstream  https://review.opendev.org/c/openstack/project-config/+/79912318:41
fungionce the system-config change lands, optional follow-ups are retiring the opendev/gerrit repo and ripping track-upstream support out of jeepyb18:42
clarkblooks like the linaro cloud ssl cert is still expired. I'll email kevinz 18:48
fungiyeah, seems like he didn't see the ping in here last week18:48
clarkbemail sent18:53
fungiokay, looks like the problem with the openstack manuals promotion is afsdocs_secret-openstack-manuals needs to be reworked to look similar to afsdocs_secret-tox-docs18:59
fungii'll give it a shot18:59
opendevreviewJeremy Stanley proposed openstack/project-config master: afsdocs_secret-openstack-manuals: Zuul 4.6.0 fix  https://review.opendev.org/c/openstack/project-config/+/79971019:03
fungithat ^ should hopefully solve the issue AJaeger pointed out earlier19:03
ianwjust fyi i'm out my thu/fri this week19:56
ianwclarkb: re the periodic jobs, i'm sure it is some sort of job config issue but it's not obvious to me where the error is19:59
ianwi would have thought there'd still be failed jobs20:00
clarkbianw: ya I looked at the zuul status bell and I didn't see any errors that looked suspect there20:00
clarkbmaybe we need to grep scheduler logs for the job name and take it from there?20:00
ianwclarkb: yeah, that's pretty much what i did :) http://lists.zuul-ci.org/pipermail/zuul-discuss/2021-July/001660.html20:01
clarkbaha there is a thread on zuul-discuss20:01
ianwhttp://people.redhat.com/~iwienand/zuul-periodic-27-06-2021/963101cc7c01460abfd34664e5610e18.txt is basically the last time it ran20:01
ianwhttp://people.redhat.com/~iwienand/zuul-periodic-27-06-2021/afdb7039494649c09e8bb2b64158a385.txt i don't know.  that log is > 100mb as it seems to be in a bit of a loop.  but also, i think during that period requirements had config errors20:03
ianwhttp://people.redhat.com/~iwienand/zuul-periodic-27-06-2021/5fe4dd08d1a44700bf92fd475565f687.txt is after zuul got restarted20:03
clarkbChange <Branch 0x7fa023409a60 openstack/requirements refs/heads/master updated None..None> is already in pipeline, ignoring20:05
clarkbaha that is it20:05
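The log search described above can be sketched roughly as follows: filter a scheduler log for one project, then for the "is already in pipeline, ignoring" message quoted above. The function name `find_ignored_items` and the log file path passed to it are hypothetical; the actual scheduler log location is not given in this log.

```shell
# Print scheduler log lines where an item for PROJECT was ignored because
# it was already in the pipeline.
# Usage: find_ignored_items LOG_FILE PROJECT
find_ignored_items() {
    log_file=$1
    project=$2
    # -F treats both patterns as fixed strings, so the '/' and '.' are literal.
    grep -F "$project" "$log_file" | grep -F "is already in pipeline, ignoring"
}
```

A match for a project whose periodic jobs have stopped suggests an older item is still sitting in the queue.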
clarkbianw: look at the status page we have a 90 hour old periodic entry in the queue20:06
clarkbianw: I think we dequeue that then let it schedule directly as usual? I think enqueue of periodic doesn't work the way we want and that is causing the problem20:06
ianwpublish-wheel-cache-debian-bullseye-arm64 queued20:07
ianwand then everything else with error20:08
ianwthere is an 86-hour old one too20:08
clarkbya and the arm64 queue is related to the linaro cert I think? I sent email about that20:08
clarkbI need to eat some lunch but I can help clean that up after20:09
ianwhrm, we should have osuosl nodes for that20:09
ianwhrm, the one that has all errors is openstack/requirements 000000020:16
ianwi wonder if that got re-enqueued after the zuul restart20:16
ianwthere is a kolla change in similar state20:16
ianw"sudo docker-compose  exec scheduler zuul dequeue --tenant openstack --pipeline periodic --project openstack/requirements --ref 0000000000000000000000000000000000000000" failed, but the entry went away from the status page20:22
ianw"sudo docker-compose  exec scheduler zuul dequeue --tenant openstack --pipeline periodic --project openstack/requirements --ref refs/heads/master" worked but i had to run it twice (i reloaded status page between)20:23
clarkbianw: yes it was in the queue when things got stopped and then reenqueued after and I think the reenqueue doesn't work for periodic jobs20:37
clarkbfungi: in https://review.opendev.org/c/openstack/project-config/+/799710/1/zuul.d/secrets.yaml is "branch" as a key there going to do the right thing? I guess I don't understand how the generic "branch" branch name fits into publishing for manuals20:40
clarkbianw: re arm64 and linaro outage does osuosl provide the larger arm64 flavor type or just linaro? I thought that may be the problem we are seeing20:41
fungiclarkb: i'm not positive, that change just makes it consistent with the other secrets being passed to the same parent. the playbook is what actually accesses that array20:48
ianwthe requirements job shouldn't require larger images, although the kolla one maybe20:50
clarkbfungi: ya I worry that the mapping from special var in the past doesn't map onto what is in there now. I think you may need to write down all the branches? Worth cross checking anyway20:50
ianwok, right now i just ran21:10
ianwsudo docker-compose  exec scheduler zuul dequeue --tenant openstack --pipeline periodic --project openstack/kolla --ref refs/heads/master21:10
ianwwhich seemed to remove the stuck kolla periodic buildset @ 000000000000000000000000000000000000000021:11
ianwbut the one for refs/heads/master is still there21:11
fungiyeah, the 0x0 items are from a reenqueue21:11
ianwi am going to run it again21:12
ianwok, so the *second* run got the one that was in the queue with refs/heads/master21:13
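As a dry-run sketch of the cleanup above, the helper below only prints the dequeue command line quoted in this log for a given project and ref; it does not run anything. The helper name `dequeue_cmd` is hypothetical, and since the log shows the command had to be run twice per project, any real use would need the status page rechecked between runs.

```shell
# Print (not run) the dequeue command used in the log for one project and ref.
# Usage: dequeue_cmd TENANT PIPELINE PROJECT REF
dequeue_cmd() {
    tenant=$1 pipeline=$2 project=$3 ref=$4
    echo "sudo docker-compose exec scheduler zuul dequeue --tenant ${tenant} --pipeline ${pipeline} --project ${project} --ref ${ref}"
}

# Emit one command per project; these are the projects named in the log.
for project in openstack/requirements openstack/kolla; do
    dequeue_cmd openstack periodic "$project" refs/heads/master
done
```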
fungiworth working out whether we're actually able to pass the right parameters through the rpc interface to correctly reenqueue a timer-triggered item21:13
ianwi have great deja vu of never being able to figure that out21:13
ianwhttp://lists.zuul-ci.org/pipermail/zuul-discuss/2019-May/000909.html google tells me21:14
ianwi'm not sure why the first dequeue with --ref refs/heads/master removes the buildset against 000....00021:15
ianwanyway, there is a wallaby one for kolla too21:15
ianwok, that is gone now too21:15
ianwi will look at the arm64 nodes before 06:00 UTC and see if we can't sort this out21:16
clarkbfungi: your fix for the promotion job looks correct after reading the playbook. However, https://opendev.org/opendev/base-jobs/src/branch/master/zuul.d/jobs.yaml#L277-L290 should also get updated21:19
clarkbfungi: I've approved the secrets fix and we can followup on ^21:19
clarkbdocs_tag_path only shows up in the docstrings of those jobs21:20
ianw799126 should also finally unbreak the dib gate and also our centos-8-stream image building21:20
clarkbI think it has been replaced with target.tag. I'll work on an update to base-jobs21:20
opendevreviewMerged openstack/project-config master: afsdocs_secret-openstack-manuals: Zuul 4.6.0 fix  https://review.opendev.org/c/openstack/project-config/+/79971021:27
fungiclarkb: thanks, i'll try to write that one you spotted up now21:28
fungiaha, the description for opendev-publish-tox-docs-base21:30
opendevreviewClark Boylan proposed opendev/base-jobs master: Fix docstrings to match job updates  https://review.opendev.org/c/opendev/base-jobs/+/79972021:34
clarkbfungi: ^ something like that maybe21:34
fungiooh, i was struggling to find words. i'll review that and criticize your choices instead! ;)21:34
fungii've asked a question on it, because my fixes are very close to being a cargo-cult of prior fixes21:37
fungiso my understanding is cloudy21:37
fungiit seems like the playbooks aren't treating those as jobvars at all, since they're accessed via the secret values21:39
clarkbfungi: responded21:39
clarkbjobvar is just an rst rendering thing21:40
opendevreviewClark Boylan proposed opendev/base-jobs master: Fix docstrings to match job updates  https://review.opendev.org/c/opendev/base-jobs/+/79972021:42
clarkbianw: left some comments on the paste change. I think there are a few testing things to clean up and I had a question about some of that too21:52
clarkbcorvus: catching up on the matrix spec it isn't fully clear to me if we need to manage the synapse server to run a k8s bridge or our own irc bridge. Do those run in the server or as separate software than well bridges?22:10
mordredclarkb: the IRC bridges already exist - so we don't need to run anything there22:14
mordredif we wanted to supply a bridge to, for instance, k8s slack, that we can have EMS run that for us (it's $20/month per bridge)22:15
clarkbI see22:17
mordred(they run as separate processes, but aiui you also need to configure the synapse software to interact with them)22:18
corvusyep; and there's also some cooperation needed on the slack side -- for each slack instance (so it doesn't scale as well as, say, hooking up to an entire irc network)22:18
corvusif anyone here is on the gerrit slack, you're welcome to use my matrix bridge, btw.  just ping me.22:19
* mordred uses the gerrit slack via corvus' matrix bridge22:19
ianwclarkb: on the buildset-registry job for the paste service in gate; i think i left that out deliberately because at gate time it should only be pulling the lodgeit image from upstream22:22
ianwi think that's right; if it depends-on a lodgeit change, that change would have to be merged (and pushed to dockerhub) before it got to gate?22:22
clarkbianw: I think if you use the buildset registry then that wouldn't need to be the case, but I'm not completely sure22:22
ianwi guess i'm saying we shouldn't be using the intermediate/buildset registry in gate here because the change should be published22:24
ianwi think that's different for images that are part of system-config22:25
corvuswhatever you do, i wouldn't make check differ from gate22:25
clarkbianw: I think if you have a depends on then zuul should be able to find the image in the intermediate registry and pull it to the buildset registry then when the parent merges dockerhub will be updated but that could happen concurrently with gating the paste job22:27
corvusdepends-on also means that it won't merge until the change ahead is merged.  if we're willing to accept the small race condition between gate and promote, then it should be reasonable to use the buildset registry in both check and gate.22:27
*** Guest3 is now known as prometheanfire22:28
opendevreviewIan Wienand proposed opendev/system-config master: Add paste service  https://review.opendev.org/c/opendev/system-config/+/79840022:28
opendevreviewIan Wienand proposed opendev/system-config master: lodgeit: use mariadb connector  https://review.opendev.org/c/opendev/system-config/+/79900422:28
corvushowever, deployment jobs that use a mutex should be strictly sequenced, so even that race shouldn't be a problem22:28
clarkbyup we promote before running infra-prod deployment jobs iirc22:30
clarkbwhich results in a strict ordering22:30
ianwwell in this case it's the lodgeit job that would need to promote22:30
clarkbah right22:31
clarkbeven then I think the risk is fairly low. But point taken22:31
corvustrue, that could race a deployment job.  i think the chances are small though22:31
clarkbOn the zuul and nodepool side we run their deployment jobs hourly to reduce the pain of that aiui22:32
ianwi also think there is a strong possibility the lodgeit container will never again be updated :)22:32
corvusianw: i think your idea of basically having the system-config gate fail if the lodgeit promote failed has merit -- though there is a race condition there too.  if you do decide to make gate!=check, please leave a note as to why since usually we just assume that's a bug.22:32
corvusbut all things being equal, i like the current ps where check==gate22:33
ianwi'm fine with that22:33
opendevreviewMerged openstack/diskimage-builder master: Mount /sys RO  https://review.opendev.org/c/openstack/diskimage-builder/+/79912622:44
clarkbside note: the zookeeper metrics look good according to grafana22:54
ianwcorvus: i know we're a few days out, but do you foresee any issues if i restart zuul to incorporate https://review.opendev.org/c/opendev/system-config/+/798243 on the 11th (UTC ~11pm, i.e at about this time)22:54
clarkbwatches strongly correlate to ephemeral nodes22:54
corvusianw: lgtm22:56
*** ysandeep|away is now known as ysandeep23:38
