*** rlandy has quit IRC | 00:00 | |
openstackgerrit | Merged zuul/zuul master: Add a simple test for upstream renaming branches https://review.opendev.org/739255 | 00:14 |
---|---|---|
fungi | y2kenny: not sure if it's a direction you want to go, but nodepool launcher daemons have worker threads which perform periodic tasks for purpose of node lifecycle management. maybe one of them could renew tokens for any nodes it knows about? | 00:18 |
openstackgerrit | Merged zuul/zuul master: Expire Github installation key 3 minutes before https://review.opendev.org/738772 | 00:25 |
openstackgerrit | Guillaume Chauvel proposed zuul/zuul master: scheduler: Fix event process abide hasUnparsedBranchCache argument https://review.opendev.org/739042 | 00:26 |
openstackgerrit | Guillaume Chauvel proposed zuul/zuul master: Fix branch name and project name for ref-updated create/delete https://review.opendev.org/738320 | 00:26 |
openstackgerrit | Guillaume Chauvel proposed zuul/zuul master: FakeGerritChange: Add Change-Id in commit message https://review.opendev.org/739197 | 00:26 |
openstackgerrit | Guillaume Chauvel proposed zuul/zuul master: WIP: Scheduler: Reconfiguration ref-updated create/delete https://review.opendev.org/739198 | 00:26 |
openstackgerrit | Guillaume Chauvel proposed zuul/zuul master: WIP: Scheduler: Reconfiguration ref-updated oldrev+newrev https://review.opendev.org/739078 | 00:26 |
*** hashar has joined #zuul | 00:46 | |
*** rfolco has quit IRC | 00:48 | |
*** hashar has quit IRC | 01:11 | |
*** swest has quit IRC | 01:50 | |
*** swest has joined #zuul | 02:04 | |
y2kenny | fungi: do you separate from the cleanup leaked node ones? are those worker threads accessible from the drivers? | 02:09 |
y2kenny | did you mean* | 02:09 |
*** bhavikdbavishi has joined #zuul | 02:21 | |
*** hamalq has quit IRC | 02:25 | |
*** ysandeep|away is now known as ysandeep | 02:26 | |
*** hamalq has joined #zuul | 02:33 | |
*** bhavikdbavishi has quit IRC | 02:34 | |
*** bhavikdbavishi has joined #zuul | 02:37 | |
fungi | the cleanup thread would be an example... it's been a while since i poked around in that bit of the codebase but they'd be driver-specific by nature | 02:54 |
y2kenny | fungi: ok thanks | 02:56 |
*** hamalq has quit IRC | 03:04 | |
*** bhavikdbavishi1 has joined #zuul | 03:07 | |
*** bhavikdbavishi has quit IRC | 03:09 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 03:09 | |
*** bhavikdbavishi has quit IRC | 04:04 | |
*** bhavikdbavishi has joined #zuul | 04:14 | |
*** wuchunyang has joined #zuul | 04:33 | |
*** evrardjp has quit IRC | 04:33 | |
*** evrardjp has joined #zuul | 04:33 | |
*** sgw1 has quit IRC | 04:39 | |
*** vishalmanchanda has joined #zuul | 04:44 | |
*** wuchunyang has quit IRC | 04:59 | |
*** wuchunyang has joined #zuul | 04:59 | |
*** sugaar has quit IRC | 05:02 | |
swest | ianw: would be great if you could have another look at https://review.opendev.org/#/c/728824/ | 05:25 |
*** y2kenny has quit IRC | 05:25 | |
*** wuchunyang has quit IRC | 05:29 | |
*** bhagyashris is now known as bhagyashris|brb | 05:50 | |
*** marios has joined #zuul | 06:01 | |
*** bhagyashris|brb is now known as bhagyashris | 06:16 | |
*** bhavikdbavishi1 has joined #zuul | 06:20 | |
*** bhavikdbavishi has quit IRC | 06:22 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 06:22 | |
ianw | swest: sorry, i've just had my head in non dib things | 06:22 |
*** wuchunyang has joined #zuul | 06:31 | |
openstackgerrit | Felix Edel proposed zuul/zuul master: Introduce Patternfly 4 https://review.opendev.org/736225 | 06:33 |
openstackgerrit | Felix Edel proposed zuul/zuul master: PF4: Add new Zuul logo with text https://review.opendev.org/738033 | 06:33 |
openstackgerrit | Felix Edel proposed zuul/zuul master: PF4: Update "fetching info ..." and refresh animation https://review.opendev.org/738010 | 06:33 |
openstackgerrit | Felix Edel proposed zuul/zuul master: PF4: Update buildset result page (new layout and styling) https://review.opendev.org/738011 | 06:33 |
swest | ianw: thanks a lot | 06:59 |
*** bhavikdbavishi has quit IRC | 07:06 | |
openstackgerrit | Merged zuul/zuul master: Replace cookie use with localStorage https://review.opendev.org/739454 | 07:13 |
*** jcapitao has joined #zuul | 07:22 | |
*** bhavikdbavishi has joined #zuul | 07:30 | |
*** tosky has joined #zuul | 07:36 | |
*** hashar has joined #zuul | 07:42 | |
*** saneax has joined #zuul | 07:47 | |
openstackgerrit | Sorin Sbarnea (zbr) proposed zuul/zuul master: Reduce table nesting on build pages https://review.opendev.org/739559 | 07:50 |
zbr | apparently we still have gate failures on zuul https://zuul.opendev.org/t/zuul/build/f973fd1dff9743449e6c1a34eea5f3aa | 07:51 |
zbr | promote pipeline failed consistently for more than... zuul can display. | 07:52 |
openstackgerrit | Benjamin Schanzel proposed zuul/zuul master: github: use change.message in squahsed commit message https://review.opendev.org/736019 | 08:04 |
*** sugaar has joined #zuul | 08:10 | |
*** nils has joined #zuul | 08:10 | |
*** nils has quit IRC | 08:11 | |
*** nils has joined #zuul | 08:11 | |
*** ysandeep is now known as ysandeep|lunch | 08:34 | |
tobiash | corvus: I've administratively -2ed the auto gc patch (https://review.opendev.org/723800) so we can observe this in combination with the followup for 1-2 weeks in prod | 08:44 |
tobiash | corvus: we've been running this for many weeks now and the followup (which we deployed yesterday) should rule out the last problem with it. So I'd like to give it a burn-in test in our prod system for another week or so before landing those. | 08:45 |
openstackgerrit | Sorin Sbarnea (zbr) proposed zuul/zuul master: Avoid interactive when building containers https://review.opendev.org/739680 | 09:00 |
*** bhavikdbavishi has quit IRC | 09:11 | |
openstackgerrit | Sorin Sbarnea (zbr) proposed zuul/zuul master: Attempt to remove .keep file https://review.opendev.org/739684 | 09:11 |
tobiash | zbr: regarding ^ check out https://review.opendev.org/663108 | 09:17 |
zbr | tobiash: lol, more than a year to remove a file, and not merged yet. | 09:18 |
tobiash | zbr: it was harder than initially expected and low prio ;) | 09:18 |
zbr | sadly i cannot help you merge it, but once recheck passes i will try to poke others. | 09:19 |
openstackgerrit | Felix Edel proposed zuul/zuul master: Introduce Patternfly 4 https://review.opendev.org/736225 | 09:20 |
openstackgerrit | Felix Edel proposed zuul/zuul master: PF4: Add new Zuul logo with text https://review.opendev.org/738033 | 09:20 |
openstackgerrit | Felix Edel proposed zuul/zuul master: PF4: Update "fetching info ..." and refresh animation https://review.opendev.org/738010 | 09:20 |
openstackgerrit | Felix Edel proposed zuul/zuul master: PF4: Update buildset result page (new layout and styling) https://review.opendev.org/738011 | 09:20 |
tobiash | to be fair I wasn't too active on that change and forgot to ask for a review after addressing the comments | 09:21 |
tobiash | corvus: ++ for the branch release dance :) | 09:27 |
*** jcapitao is now known as jcapitao_afk | 09:30 | |
*** tumble has joined #zuul | 09:31 | |
*** bhavikdbavishi has joined #zuul | 09:43 | |
*** holser has quit IRC | 09:48 | |
*** ysandeep|lunch is now known as ysandeep | 09:48 | |
*** jcapitao_afk is now known as jcapitao | 09:49 | |
*** tosky has quit IRC | 10:00 | |
zbr | tobiash: corvus: small https://review.opendev.org/#/c/739680/ | 10:00 |
*** tosky has joined #zuul | 10:01 | |
tobiash | zbr: afaik this has been fixed in the python-builder base image. Can you update them and try again? | 10:03 |
zbr | afaik these values are not persistent inside the image. | 10:05 |
zbr | i am almost sure my image was downloaded and failed to run | 10:06 |
zbr | basically you need to be sure you defined them whenever you perform the build | 10:06 |
*** hashar has quit IRC | 10:07 | |
*** wuchunyang has quit IRC | 10:08 | |
tobiash | zbr: the central fix was supposed to be https://review.opendev.org/738121 | 10:09 |
zbr | i got the error today, on master. | 10:09 |
zbr | i bet it came from another call of apt | 10:10 |
*** bhavikdbavishi has quit IRC | 10:13 | |
*** holser has joined #zuul | 10:15 | |
zbr | tobiash: I can assure you that "docker build ." on zuul master does trigger the interactive prompt. | 10:20 |
zbr | your fix was good, but for another use-case. | 10:20 |
zbr | http://paste.openstack.org/show/795598/ | 10:22 |
tobiash | it wasn't my fix ;) | 10:22 |
*** bhavikdbavishi has joined #zuul | 10:28 | |
*** bhagyashris is now known as bhagyashris|brb | 10:37 | |
*** harrymichal has joined #zuul | 10:41 | |
*** wuchunyang has joined #zuul | 10:50 | |
*** wuchunyang has quit IRC | 10:55 | |
*** felixedel has joined #zuul | 10:57 | |
*** jcapitao is now known as jcapitao_lunch | 10:59 | |
felixedel | zuul-maint: Question about the build result page: Is there a reason why there are three dedicated pages Build, BuildLogs and BuildConsole instead of just a single page using three tabs? Currently, each of those three pages is kind of using the same container, but fills it with a different content and activates a different tab. | 11:00 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Resume jobs after reenqueue of an item https://review.opendev.org/739709 | 11:01 |
*** holser has quit IRC | 11:06 | |
*** bhagyashris|brb is now known as bhagyashris | 11:12 | |
avass | tobiash: wanna +2: https://review.opendev.org/#/c/727158/ to drop ansible 2.6 in zuul-jobs? | 11:16 |
tobiash | avass: you mean +3? | 11:17 |
avass | should have been done by 2nd of july: http://lists.zuul-ci.org/pipermail/zuul-announce/2020-June/000075.html | 11:17 |
avass | ah yeah | 11:17 |
tobiash | done | 11:17 |
avass | thanks! | 11:18 |
openstackgerrit | Merged zuul/zuul-jobs master: Drop support for ansible 2.6 https://review.opendev.org/727158 | 11:32 |
tristanC | felixedel: none that I can remember... it sounds like a refactor opportunity | 11:36 |
*** ysandeep is now known as ysandeep|afk | 11:41 | |
felixedel | tristanC: I think I found it: https://review.opendev.org/#/c/675235/ | 11:41 |
*** felixedel has quit IRC | 11:43 | |
*** hashar has joined #zuul | 11:52 | |
*** rfolco has joined #zuul | 12:04 | |
*** ysandeep|afk is now known as ysandeep | 12:06 | |
*** jcapitao_lunch is now known as jcapitao | 12:15 | |
*** holser has joined #zuul | 12:18 | |
*** rlandy has joined #zuul | 12:19 | |
*** vishalmanchanda has quit IRC | 12:22 | |
*** harrymichal has quit IRC | 12:29 | |
*** harrymichal has joined #zuul | 12:29 | |
*** bhavikdbavishi has quit IRC | 12:35 | |
*** bhavikdbavishi has joined #zuul | 12:36 | |
*** piotrowskim has joined #zuul | 12:52 | |
*** sgw1 has joined #zuul | 13:11 | |
*** Goneri has joined #zuul | 13:18 | |
*** Goneri has quit IRC | 13:33 | |
*** bhavikdbavishi has quit IRC | 13:51 | |
openstackgerrit | Simon Westphahl proposed zuul/zuul master: Protect repo update for cat/fileschanges with lock https://review.opendev.org/739761 | 13:52 |
openstackgerrit | Simon Westphahl proposed zuul/zuul master: Correctly fail cat/fileschanges when update fails https://review.opendev.org/739762 | 13:52 |
*** olaph has quit IRC | 13:58 | |
*** Goneri has joined #zuul | 14:02 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: Update to dhall lang v14 https://review.opendev.org/739767 | 14:03 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: Add reformat changes to the blame ignore list https://review.opendev.org/739768 | 14:03 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: Update to dhall lang v17 https://review.opendev.org/739767 | 14:03 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: Add reformat changes to the blame ignore list https://review.opendev.org/739768 | 14:04 |
*** bhagyashris is now known as bhagyashris|dinn | 14:09 | |
*** holser has quit IRC | 14:35 | |
*** holser has joined #zuul | 14:36 | |
*** saneax has quit IRC | 14:51 | |
tobiash | tristanC: mind an easy review? https://review.opendev.org/738620 (removes some unused variables) | 14:54 |
*** saneax has joined #zuul | 14:55 | |
*** bhagyashris|dinn is now known as bhagyashris | 14:56 | |
*** saneax has quit IRC | 14:56 | |
*** saneax has joined #zuul | 14:57 | |
corvus | tobiash: sounds good (re admin -2 on git patch) | 15:00 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: Use ensure-pip role to unblock the CI https://review.opendev.org/739786 | 15:00 |
zbr | avass: https://github.com/ansible/ansible-lint/pull/881 please, i think that webknjaz may be on vacation this week. ignore zuul error (infra) | 15:00 |
tobiash | corvus: great, I'll +w it in 1-2 weeks if we don't observe issues | 15:01 |
*** saneax has quit IRC | 15:01 | |
*** saneax has joined #zuul | 15:04 | |
*** saneax has quit IRC | 15:05 | |
*** hamalq has joined #zuul | 15:10 | |
*** bhavikdbavishi has joined #zuul | 15:12 | |
*** hamalq_ has joined #zuul | 15:12 | |
*** hamalq has quit IRC | 15:15 | |
*** bhavikdbavishi has quit IRC | 15:18 | |
*** vishalmanchanda has joined #zuul | 15:25 | |
*** bhavikdbavishi has joined #zuul | 15:28 | |
openstackgerrit | Benjamin Schanzel proposed zuul/zuul master: github: use change.message in squahsed commit message https://review.opendev.org/736019 | 15:29 |
*** ysandeep is now known as ysandeep|away | 15:32 | |
openstackgerrit | Guillaume Chauvel proposed zuul/zuul master: scheduler: Fix event process abide hasUnparsedBranchCache argument https://review.opendev.org/739042 | 15:38 |
openstackgerrit | Guillaume Chauvel proposed zuul/zuul master: Fix branch name and project name for ref-updated create/delete https://review.opendev.org/738320 | 15:38 |
openstackgerrit | Guillaume Chauvel proposed zuul/zuul master: FakeGerritChange: Add Change-Id in commit message https://review.opendev.org/739197 | 15:38 |
openstackgerrit | Guillaume Chauvel proposed zuul/zuul master: WIP: Scheduler: Reconfiguration ref-updated create/delete https://review.opendev.org/739198 | 15:38 |
openstackgerrit | Guillaume Chauvel proposed zuul/zuul master: WIP: Scheduler: Reconfiguration ref-updated oldrev+newrev https://review.opendev.org/739078 | 15:38 |
zbr | felixedel did a great job with pf4, i wonder when it will merge. new UI looks a bit weird but not necessarily in a bad way. | 15:39 |
zbr | tobiash: the .keep removal still has an issue | 15:41 |
tobiash | yes :( | 15:42 |
*** bhavikdbavishi1 has joined #zuul | 15:52 | |
zbr | how can i find all jobs that run with a specific nodeset? i am aware of a problem with fedora-31 but i do not know how to find jobs that used it recently (last 24h) | 15:52 |
openstackgerrit | Merged zuul/zuul master: Avoid interactive when building containers https://review.opendev.org/739680 | 15:53 |
zbr | i am not aware of any filtering tricks that could allow me to do that | 15:53 |
corvus | zbr: i don't think zuul provides that. but if you want to hop over to #opendev, i can look that up for you in the logs. | 15:53 |
*** bhavikdbavishi has quit IRC | 15:53 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 15:53 | |
tobiash | otherwise one would need to query all builds, squash them into a list of jobnames and query them from the zuul api (which includes the nodeset) | 15:55 |
fungi | also, "i want to know all the jobs with <property x>" is a bit vague. presumably the desire is to just know about jobs in merged configuration and not in speculative configuration | 15:57 |
*** reiterative has quit IRC | 15:57 | |
fungi | but zuul can run plenty of jobs which are not in its merged configuration state | 15:57 |
tobiash | in that case one could loop over the jobs list which is non-speculative | 15:57 |
*** reiterative has joined #zuul | 15:58 | |
tobiash | but doesn't cover the 'in the last 24h' filter | 15:58 |
fungi | right, "jobs which ran with <property x>" and "jobs in merged config which could run with <property x>" are overlapping sets, but neither is necessarily a proper subset of the other | 15:59 |
fungi | if this were going to eventually become queryable from the builds api then i guess it should be the former, but from the jobs api it should be the latter | 16:00 |
*** marios has quit IRC | 16:02 | |
corvus | we don't store enough info in the builds api (yet) to make that queryable. there's a node field, but it's inadequate for the current data model | 16:02 |
corvus | we don't store enough info in the builds table (yet) to make that queryable. there's a node field, but it's inadequate for the current data model | 16:03 |
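The approach tobiash outlines (collect recent builds, reduce them to a set of job names, then look up each job's nodeset) can be sketched in Python. This is only an illustration: the helper name `jobs_using_label`, the `fetch_job_variants` callable, and the field names `job_name`, `start_time`, and `nodeset.nodes[].label` are assumptions about the shape of the data, not something confirmed in this log.

```python
from datetime import datetime, timedelta


def jobs_using_label(builds, fetch_job_variants, label, now,
                     window=timedelta(hours=24)):
    """Return sorted names of recently run jobs whose definition uses `label`.

    `builds` is a list of dicts with assumed keys `job_name` and
    `start_time` (ISO 8601 string). `fetch_job_variants(name)` is a
    hypothetical callable returning a list of job variant dicts.
    """
    cutoff = now - window
    # Squash the recent builds into a set of distinct job names.
    recent = {
        b["job_name"]
        for b in builds
        if b.get("start_time")
        and datetime.fromisoformat(b["start_time"]) >= cutoff
    }
    matches = []
    for name in sorted(recent):
        for variant in fetch_job_variants(name):
            nodes = (variant.get("nodeset") or {}).get("nodes", [])
            if any(node.get("label") == label for node in nodes):
                matches.append(name)
                break  # one matching variant is enough
    return matches
```

As fungi notes, a lookup like this reflects the nodeset in the merged configuration only; a build that ran with a speculative configuration may have used something else.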
*** jcapitao has quit IRC | 16:11 | |
*** ysandeep|away is now known as ysandeep | 16:21 | |
openstackgerrit | Merged zuul/zuul-jobs master: Allow deleting workspace after running terraform destroy https://review.opendev.org/738771 | 16:29 |
*** chandankumar is now known as raukadah | 16:38 | |
*** SpamapS has quit IRC | 16:42 | |
webknjaz | @zbr: Monday was a public holiday in CZ, but not the whole week | 16:52 |
openstackgerrit | Merged zuul/zuul master: GitHub Reporter: Fix User Email in Merge Commit Message https://review.opendev.org/738590 | 16:58 |
openstackgerrit | Merged zuul/zuul master: Introduce Patternfly 4 https://review.opendev.org/736225 | 17:07 |
openstackgerrit | Merged zuul/zuul master: PF4: Add new Zuul logo with text https://review.opendev.org/738033 | 17:11 |
fungi | webknjaz: you could have pretended, most of us wouldn't have questioned it ;) | 17:12 |
webknjaz | lol 😂 | 17:19 |
zbr | too late to hide now | 17:19 |
*** SpamapS has joined #zuul | 17:20 | |
openstackgerrit | Merged zuul/zuul master: Remove some unused variables https://review.opendev.org/738620 | 17:23 |
corvus | tobiash, tristanC: this is something i've noticed with the zuul for the gerrit project. it's running in k8s with a single mysql server, and if the scheduler pod restarts while the sql server is down, it disables the sql reporter. then i need to manually restart the scheduler to fix it. | 17:29 |
corvus | the best solution would probably be to have an HA mysql service. :) | 17:29 |
corvus | but should we also re-think how we handle disabling sql reporters? | 17:29 |
tobiash | HA mysql is always good to have :) | 17:30 |
tobiash | I think we shouldn't disable them at all probably | 17:30 |
corvus | should we have the scheduler wait for sql before starting? or should we have it continually retry after starting and put it into service if it shows up | 17:30 |
corvus | or that | 17:30 |
corvus | just let it soft-fail like any other reporter if it isn't there | 17:30 |
tobiash | ++ | 17:30 |
corvus | (i think the main thing is that right now the scheduler does the schema upgrades, so there is something special that has to happen on start) | 17:31 |
corvus | (but maybe that can still happen any time | 17:31 |
tobiash | hrm, good question | 17:31 |
corvus | (ugh, also, the k8s cluster has done the thing where it lost its internal dns server; i think i need to reboot the whole cluster) | 17:32 |
tobiash | we saw schema updates taking up to 30min in prod and actually I'd rather wait for it to succeed except 'losing' builds from the users' point of view | 17:32 |
tobiash | s/except/instead of | 17:32 |
fungi | "reboot the whole cluster" | 17:37 |
fungi | so basically the cluster is the new server, and kubernetes is init | 17:37 |
*** hashar has quit IRC | 17:47 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Revert "Revert "Create zuul/web/static on demand"" https://review.opendev.org/663108 | 17:55 |
tobiash | zbr: I think that should finally do it ^ | 17:55 |
tristanC | corvus: it seems like the ideal behavior would be to wait for a reporter instead of disabling it | 18:00 |
openstackgerrit | James E. Blair proposed zuul/zuul master: WIP: keep retrying SQL db init https://review.opendev.org/739827 | 18:01 |
corvus | tristanC, tobiash: ^ maybe something like that? i haven't tested it yet; just sketching it out | 18:01 |
openstackgerrit | Sorin Sbarnea (zbr) proposed zuul/zuul master: Enable ANSI rendering via react-ansi https://review.opendev.org/739444 | 18:02 |
tobiash | corvus: I think _init() requires a lock, other than that I think it makes sense to be able to configure zuul such that startup still fails when sql is not available (which would be the mode we'd be using) | 18:05 |
zbr | corvus: fungi: wdyt about https://review.opendev.org/#/c/739559/ ? | 18:05 |
corvus | tobiash: yeah, i think you're right about the lock since it could be triggered by a web gear rpc call (otherwise it should all be in the scheduler main loop) | 18:05 |
tobiash | ah yeah, didn't think about the main loop :D | 18:06 |
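A minimal sketch of the idea discussed above: keep retrying SQL initialization instead of disabling the reporter for good, guard `_init()` with a lock, and optionally fail at startup when the database is unavailable. The class name `LazySQLConnection`, the `connect` callable, and the `fail_on_startup` option are hypothetical; this is not the code in change 739827.

```python
import threading


class LazySQLConnection:
    """Hypothetical stand-in for a SQL connection whose setup may fail."""

    def __init__(self, connect, fail_on_startup=False):
        # `connect` is any callable that raises ConnectionError while
        # the database is down (an assumption made for this sketch).
        self._connect = connect
        self._lock = threading.Lock()
        self._initialized = False
        if fail_on_startup:
            # Strict mode: let the exception abort startup.
            self._init()

    def _init(self):
        # The lock keeps two threads (for example the scheduler main loop
        # and an RPC handler) from running the setup at the same time.
        with self._lock:
            if not self._initialized:
                self._connect()  # schema upgrades would also run here
                self._initialized = True

    def report(self, result):
        # Retry initialization on every use rather than disabling forever.
        try:
            self._init()
        except ConnectionError:
            return False  # soft-fail, like any other reporter
        # A real reporter would write `result` to the database here.
        return True
```

With this shape, a scheduler that starts while the database is down begins reporting again as soon as the database returns, without a manual restart.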
corvus | zbr: lgtm but i'm going to recheck it since we just landed pf4 | 18:07 |
fungi | zbr: seems reasonable, though i'll admit i'm not really savvy with react/bootstrap panel containers | 18:07 |
zbr | indeed, probably not impacted but better to test. | 18:08 |
fungi | i definitely agree with the reasoning in the commit message | 18:08 |
fungi | and assuming the recheck looks good for build results views i'm tentatively +2 on it | 18:09 |
zbr | fungi: when it comes to UI, less (noise) is more. | 18:09 |
fungi | absolutely | 18:09 |
fungi | i still think in sgml, for what it's worth ;) | 18:09 |
zbr | fungi: i asked yesterday but did not get a clear answer: i want to propose ditching the "popup on result label" and display all details in the expansion. | 18:10 |
zbr | so we would no longer have 3 places to render task details | 18:10 |
corvus | zbr: let's also ask tristanC about the panel -- he might have insight about the most patternfly-typical way to handle things like that | 18:11 |
*** nils has quit IRC | 18:11 | |
zbr | some "typical" ways were changed in pf4 too, the idea is that we should not ab(use) all cool UI elements it provides. | 18:12 |
fungi | zbr: i'm a fairly utilitarian sort of guy... if it still works and i can find the information i need and it's simpler than what we had before, then it's fine by me. but i would like broader feedback on ui preferences than just mine | 18:12 |
corvus | zbr: i'm open to an experiment about the popup, but it's going to have to work really well for me to favor that. | 18:13 |
zbr | i do not plan to make major redesigns, only to simplify parts that proved to be confusing | 18:13 |
corvus | zbr: don't embark on it if you aren't prepared for rejection. | 18:13 |
zbr | i wasn't expecting more than this, only to see if people are open to something like this | 18:14 |
zbr | obviously, the usual "we need to see it to decide" applies | 18:14 |
corvus | definitely open. this is our most novel ui feature, so we have the most to gain and the most to lose with experiments/changes. :) | 18:15 |
fungi | i definitely favor simplicity (both for the user and for the maintainer), but i recognize that usability is a multi-dimensional vector and different people favor different dimensions in the field | 18:16 |
corvus | https://ci.gerritcodereview.com/t/gerrit/builds | 18:17 |
corvus | that's back up now that i have upgraded the cluster (an effective rolling restart) | 18:17 |
corvus | and of course, since it was a restart, that's the branch tip | 18:18 |
corvus | so you can see the pf4 stuff in action | 18:18 |
tobiash | cool :) | 18:19 |
tristanC | corvus: zbr: i don't have much insight about pf or ui design... i guess regular users are used to clicking on the [result] box to get the detail, but i can see how new users can be confused by the current layout | 18:19 |
fungi | nice! i don't really see much of a change, which i suppose is praise coming from me ;) | 18:19 |
corvus | tristanC: sorry, i was asking about the panel change in https://review.opendev.org/739559 | 18:20 |
tobiash | and I see the retry reporting there as well :) | 18:20 |
corvus | yep | 18:20 |
corvus | tristanC, zbr: seems like zbr is saying we should reserve panels for pages which display multiple items, so each item gets a panel; i was wondering if there's an overall page structure framework we should use in pf, or if a simple h2 is the way to go | 18:22 |
tristanC | corvus: zbr: if we are going to change the layout to accommodate new users, we might want to help existing users get used to the new behavior | 18:22 |
tristanC | corvus: about page panel, i don't know, a simple h2 seems to work fine | 18:22 |
tobiash | corvus: do you mind a re-review of 710034 (the github auth refactor). I had to do a rebase in the meantime and addressed a comment clarkb had. | 18:23 |
tristanC | i would say that if the page only has one panel header, then that can be replaced by a h2 | 18:23 |
avass | webknjaz: seeing emojis in irc feels.. strange | 18:23 |
fungi | tristanC: that sounds like what 739559 is doing then | 18:24 |
fungi | avass: i recently upgraded my font set to include the noto (no tofu) family and started getting them in my consoles. i agree it continues to surprise me | 18:25 |
corvus | zbr, tristanC: this is the last time we discussed the task expansion: http://eavesdrop.openstack.org/irclogs/%23zuul/%23zuul.2020-06-10.log.html#t2020-06-10T15:44:00 | 18:27 |
avass | zbr: lgtm and I agree that it doesn't make sense to split that test into several tests so I'll go ahead and approve that | 18:27 |
avass | webknjaz: unless you have something else you want to change | 18:28 |
corvus | i believe everything said then still stands. it's worth a re-read before embarking on any changes. | 18:28 |
avass | zbr: uh, except that the tests aren't passing ;) | 18:28 |
zbr | avass: ignore zuul jobs (best channel to say this....) | 18:29 |
corvus | zbr: is that the f31 issue? | 18:30 |
zbr | yes | 18:30 |
zbr | i think paul started to work on it | 18:30 |
zbr | we did not see it because we do not (usually) run fedora jobs, started last night | 18:31 |
avass | tristanC: I started looking into the zuul-operator a bit and might push some changes later in the week when I get around to it. But I think I've found some bugs :) | 18:32 |
zbr | maybe that is a good opportunity to ask about having a periodic-nodeset-sanity-pipeline, that runs daily and makes use of most important zuul-jobs roles | 18:33 |
zbr | so we would know when something breaks from outside | 18:33 |
zbr | same could be used to validate that a new nodeset image "looks" good to be used | 18:33 |
avass | tristanC: is there a list of things that needs to be done somewhere? | 18:34 |
tristanC | avass: nice, that's quite possible, i plan on working on it some more. here is a list of missing things: https://review.opendev.org/#/c/718755/4/README.md | 18:36 |
tristanC | avass: iirc we talked about providing a one file install to setup all the services on different providers, and i think we should be able to generate such a file from the existing configuration | 18:38 |
tobiash | corvus: I noticed another unrelated failure in test_playbook and it looks like it retries the job 'timeout' instead of marking it as TIMED_OUT. However I have no idea yet why. | 18:40 |
tobiash | https://16017867d971cd1a3c19-21d6bae6f57d664d8ef403dd2ad49654.ssl.cf1.rackcdn.com/738620/1/gate/tox-py35/219edc3/testr_results.html | 18:40 |
tristanC | avass: also i'd like to adapt this k8s deploy function to work using podman play kube directly, without k8s : https://softwarefactory-project.io/cgit/zuul-images-jobs/tree/functions/deploy.dhall | 18:40 |
tobiash | maybe it's not even a test race but a real race | 18:40 |
tristanC | avass: i think that would be a nice little transformer to convert the StatefulSet/Deployment to a simpler Pod | 18:40 |
avass | tristanC: ah cool, I've been messing around with it with Kind so far. | 18:41 |
tristanC | avass: Kind sounds like something interesting to document too | 18:43 |
*** vishalmanchanda has quit IRC | 18:44 | |
tristanC | avass: i meant to try podman as a way to run the smallest zuul possible locally | 18:44 |
avass | tristanC: I think I'll have some reading to do to get my head around dhall first ;) | 18:51 |
*** bhavikdbavishi has quit IRC | 19:07 | |
zbr | is it common for zuul to fail py35 tests randomly? https://zuul.opendev.org/t/zuul/build/62339692bbcf43a68c3c7d9463cde97f -- looks random to me. | 19:09 |
fungi | randomness is an illusion, nondeterminism is more likely | 19:13 |
* fungi risks veering into philosophy | 19:13 | |
tobiash | corvus: I think I found out the cause of the test_playbook fail. In this case it was slow and the 'timeout' job already timed out during the pre playbook causing it to retry hitting the test timeout in the end | 19:15 |
tobiash | corvus: now the key question, do we want a job to retry if it times out in a pre playbook or not? | 19:16 |
tobiash | corvus: the docs are not very clear about that: https://zuul-ci.org/docs/zuul/reference/job_def.html#attr-job.timeout | 19:17 |
avass | zbr: is it always that test or is it random? | 19:20 |
zbr | yep | 19:21 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Make test_playbook more stable https://review.opendev.org/739835 | 19:21 |
tobiash | corvus: if we want to retain the current behavior, this should make the test more stable ^ | 19:22 |
tobiash | zbr: there are some test cases that might contain races which over time accumulate until someone takes the time to track each one down(like that one ^ as well) | 19:25 |
tobiash | zbr: in your case the command socket thread didn't exit cleanly as it seems | 19:30 |
zbr | btw, i find it weird to see py35 being the only python tested instead of py36 or py37. | 19:31 |
fungi | making sure we don't break "old" python? | 19:31 |
*** hashar has joined #zuul | 19:34 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Join command thread on exit https://review.opendev.org/739838 | 19:36 |
tobiash | zbr: this might fix the issue you spotted ^ | 19:37 |
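For illustration, joining a worker thread on exit usually looks something like the following. This is a generic Python sketch with hypothetical names (`CommandLoop`, `is_running`), not the content of change 739838.

```python
import threading


class CommandLoop:
    """Hypothetical worker that polls for commands until told to stop."""

    def __init__(self):
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, name="command")

    def start(self):
        self._thread.start()

    def _run(self):
        # Event.wait() returns True once stop() has set the flag.
        while not self._stop.wait(0.05):
            pass  # a real loop would read and dispatch commands here

    def is_running(self):
        return self._thread.is_alive()

    def stop(self):
        self._stop.set()
        # Joining guarantees the thread has finished before we return,
        # so a test cannot observe it as a leaked thread afterwards.
        self._thread.join()
```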
tobiash | zbr: in order to save resources we test the oldest and newest supported python version and not all in between | 19:37 |
zbr | py35 is joining the py27 club in two months, is that what we call newest? | 19:38 |
tobiash | zbr: why newest? we test py35 and py38, with py35 being our oldest supported version | 19:39 |
zbr | there are important changes around threading on py36,py37,... so I would not attempt to skip any version. | 19:39 |
fungi | why not compile and test every point release then? we have to draw the line somewhere | 19:41 |
fungi | previously we said testing oldest and newest we support should be sufficient to catch most errors | 19:42 |
mordred | zbr: those threading changes should exist in 38 too no? | 19:43 |
tobiash | the problem is that the zuul tests take quite some time combined with unfortunately some degree of unstable tests | 19:43 |
tobiash | testing four python versions is just not feasible currently | 19:43 |
zbr | just to clarify it: i am not asking to test all 3x python, i argue that py35 may be a poor choice, and that py36 or py37 would be much more useful | 19:44 |
tobiash | but there was already discussion to ditch py35 in a not too distant future so then we'd test py36 and py38 | 19:44 |
zbr | probably py36+py38 would be the best values, imho | 19:44 |
zbr | w/o 37 due to resource | 19:45 |
tobiash | py35 is not a poor choice as opendev is still running zuul with that version (unless the switch to containers is already complete) | 19:45 |
tobiash | and testing the lowest supported version ensures that we don't use language features that are not supported in py35 | 19:45 |
tobiash | mordred: did opendev already switch to containers or is that wip? | 19:46 |
mordred | tobiash: we're mostly on containers | 19:46 |
mordred | tobiash: we're still working on getting executors on containers - as well as our arm nodepool-builder | 19:46 |
mordred | but I think both of those are really close | 19:47 |
zbr | as long as we replace it with py36 before 2020-09-13 we should be fine | 19:47 |
mordred | tobiash: but yes, that's right- we test py35 to ensure that we don't accidentally use too-new features | 19:47 |
mordred | and in so doing break the opendev deployment | 19:47 |
tobiash | zbr: fyi, there is a mailing list discussion about dropping py35: http://lists.zuul-ci.org/pipermail/zuul-discuss/2020-May/001225.html | 19:49 |
tobiash | tldr as I understood is that as soon as opendev switched to containers we can ditch it | 19:50 |
zbr | that message describes the situation very well | 19:50 |
hashar | hello | 20:03 |
hashar | I am not talking much in here anymore. But just wanted to highlight Wikimedia has upgraded its Gerrit from 2.15 to 3.2 ~ 8 days ago | 20:03 |
hashar | in short: new modern ui that is way more pleasant than the old GWT based one | 20:04 |
hashar | there is support for git protocol v2 which makes fetches dramatically faster | 20:04 |
fungi | hashar: thanks for the update! paladox has been keeping us apprised too | 20:04 |
hashar | and for the upgrade itself the person that did an update wrote a blog post upstream https://groups.google.com/g/repo-discuss/c/G5wucKJg9Ag/m/pLin-i3mBgAJ?pli=1 | 20:04 |
hashar | ahh paladox :] so nice | 20:05 |
fungi | opendev is in progress planning a similar upgrade (though we'll need a pause at 2.16 i think for the notedb transition) | 20:05 |
mordred | hashar: yes - I will be using info from that blog post to work on ours! :) | 20:05 |
mordred | yeah. although having read the wikimedia account, maybe just plowing all the way to 3.2 over a weekend is the right choice ... | 20:06 |
hashar | Paladox had setup a dev platform months ahead and contributed a lot back to Gerrit upstream | 20:07 |
fungi | hashar: also, good to see you around again for a bit--missed you! | 20:07 |
tristanC | we are also planning to work on upgrading software factory to gerrit 3.x in the next few months | 20:07 |
paladox | i fixed zuul v2 to support at least gerrit 2.16 | 20:07 |
hashar | and Christian had set up a whole replica of production as a staging/test bed area to do the whole migration without affecting production | 20:07 |
paladox | and seems to have worked with 3.2 :) | 20:07 |
hashar | fungi: so kind thank you :] I have been busy with a bunch of other duties | 20:07 |
fungi | aren't we all? ;) | 20:08 |
avass | mordred: that's what we did when we upgraded from 2.15 to 3.2 ;) | 20:09 |
hashar | I guess software wise most culprits have probably been fixed by now, though there might still be some fixes pending review in upstream gerrit | 20:11 |
hashar | the key I guess was to be able to play the whole upgrade outside of production with same hardware/os/packages/git repos/database | 20:11 |
hashar | and do it several times making sure everything went fine | 20:11 |
paladox | i'm currently migrating dark theme to a pref (so it'll replicate across all login users) | 20:11 |
paladox | *logged in | 20:12 |
mordred | I'm fairly confident it's going to go pretty well for us due to all of the work of everyone else upgrading already :) | 20:12 |
tristanC | mordred: ++ :) | 20:13 |
hashar | ;D | 20:14 |
*** harrymichal has quit IRC | 20:42 | |
*** tumble has quit IRC | 20:44 | |
fungi | one of the (very few) benefits to being slow to upgrade, i guess | 20:46 |
corvus | i was just tracking down an issue seen on gerrit's nodepool that i mentioned here last week | 20:56 |
corvus | where zuul was throwing a bunch of retry errors because the host key doesn't match | 20:56 |
fungi | new developments then? | 20:57 |
corvus | that happens at the setup playbook, so i think it's not matching the host key that nodepool sends it | 20:57 |
corvus | and i think that's because nodepool is getting the host key very quickly after boot | 20:57 |
corvus | i'm guessing the image may have a host key burned into it but then it gets re-generated at boot | 20:57 |
fungi | oh, and nodepool is racing the key regen at boot | 20:58 |
corvus | ya | 20:58 |
fungi | what's the distro? | 20:58 |
corvus | debian buster | 20:58 |
corvus | i'll try to catch 2 in a row and see if they start with the same key | 20:58 |
fungi | i thought the usual tactic was to strip the keys from the image so that sshd blocks waiting for key generation | 20:59 |
corvus | me too | 20:59 |
fungi | i wonder if instead something like cloud-init is replacing keys proactively | 20:59 |
corvus | only other theory i have is arp related | 20:59 |
fungi | well, yeah, i wouldn't be surprised if there are rogue instances squatting the same ip addresses | 21:00 |
fungi | it's not like that problem is necessarily unique to openstack nova (or even virtual machines in general) | 21:00 |
corvus | cool, i spotted a second booting with the same key | 21:08 |
corvus | AAAAC3NzaC1lZDI1NTE5AAAAIMWnt6KgVST9yHYCOCmSz7YxFG6lB7JuHt9NXfBeKi2I | 21:08 |
corvus | it then immediately changed | 21:08 |
fungi | same key but more importantly different ip address? | 21:09 |
corvus | so i think that's good evidence for the theory that the image has the key baked in and it gets updated on boot | 21:09 |
corvus | ya | 21:09 |
corvus | i'm not sure how to work around this | 21:09 |
fungi | agreed then, that implies the image | 21:09 |
tobiash | corvus: interesting, but how can that be fixed (assuming it's a non custom image) | 21:10 |
fungi | i mean, ultra hacky solution is that if you know the baked-in key fingerprint, keep retrying until it's not that | 21:10 |
fungi | but not at all elegant, and fragile if the image changes its baked-in key | 21:10 |
tobiash | Blacklisting in nodepool | 21:10 |
corvus | hrm, yeah, that could work. that's probably better than "sleep(5)" which is the best i've come up with so far :) | 21:11 |
tobiash | I guess fixing the image is not an option? | 21:11 |
corvus | it's the cloud-standard google-provided image; there might be a way to fix it, but it'd be nice to handle this case anyway (and for any cloud) | 21:12 |
tobiash | Nodepool could auto blacklist by remembering the last x keys | 21:12 |
corvus | (also, did i say buster? i think i meant stretch) | 21:13 |
corvus | tobiash: that's good too -- everything after the first failure should work :) | 21:14 |
tobiash | Like store a set of the last 1000 keys encountered and wait until we get a unique one | 21:14 |
avass | I believe ec2 has an 'initializing' phase while running user-data, but I've only seen that through the web interface. does openstack/gce have something similar? | 21:14 |
fungi | in theory, the folks in charge of building and uploading those images are regulars on the debian-cloud ml... i could certainly ask some questions if desired | 21:15 |
corvus | avass: good q, i'll look for that | 21:16 |
corvus | also, there are 'startup scripts'; i could probably provide one that sets a metadata flag, and assuming it runs after key generation, that would indicate that we had progressed past that | 21:16 |
corvus | avass: (also, yeah this is gce) | 21:16 |
corvus | (or if the startup script doesn't run after that, it could background and wait) | 21:17 |
avass | I'm trying to find out how to get that state through the api though | 21:17 |
corvus | i only see the "status: RUNNING" field (which is what we're already using to detect the instance is ready) | 21:20 |
corvus | https://wiki.debian.org/Cloud/GoogleComputeEngineImage is relevant | 21:22 |
corvus | i wonder if buster is different | 21:23 |
avass | I think I found it for ec2 at least, it's under DescribeInstanceStatus: https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeInstanceStatus.html instance-status.reachability and system-status.reachability | 21:23 |
fungi | corvus: official debian cloud image building for buster changed significantly in an attempt to be more consistent across providers | 21:24 |
corvus | here's what i think i'll do: i'll set up my local test env to do some boot tests where i output the ssh key as fast as possible, and try that with buster vs stretch | 21:25 |
corvus | if buster is better, then it might make sense to just see if we can move gerrit's zuul to use buster :) | 21:25 |
corvus | if they both show the problem, i'll look into a startup script to work around it | 21:26 |
fungi | ahh, yeah, transitioned from bootstrap-vz to fai for buster | 21:26 |
fungi | sounds like a good plan | 21:26 |
*** armstrongs has joined #zuul | 21:33 | |
corvus | oooh | 21:36 |
corvus | in my local tests, i'm using the public ip, but gerrit's nodepool/zuul are using the private ip | 21:37 |
corvus | i wonder if the public ip takes longer to hook up (NAT), so the key generation is done by the time i can connect | 21:37 |
corvus | that would explain why i never saw this locally | 21:37 |
corvus | (and why now that i'm looking closely, it seems like it's taking much longer in my tests) | 21:38 |
corvus | that means i'll need to move my testing in-cloud :/ | 21:39 |
*** armstrongs has quit IRC | 21:41 | |
corvus | we could also probably just turn off host key checking; it seems pretty low-risk since this is entirely inside the cloud | 21:54 |
avass | corvus: looked through the gce docs but couldn't think of a better way to get the correct host-key except to let nodepool manage the host-keys. But that would require nodepool to be able to ssh to the node. | 21:55 |
corvus | (perhaps in the gce driver, use-internal-ip:True should imply host-key-checking:False) | 21:55 |
corvus | avass: that is what nodepool does | 21:55 |
avass | corvus: I mean generate new host-keys and install them :) | 21:55 |
corvus | ooh :) | 21:55 |
corvus | yeah, i don't think we need to go that far | 21:55 |
corvus | hey i managed to snag a reproduction in testing | 21:56 |
corvus | it took about 4 tries | 21:56 |
corvus | and reprod with buster too, same key as stretch | 21:59 |
corvus | (so maybe there's only one key to watch out for?) | 22:00 |
*** rlandy is now known as rlandy|bbl | 22:08 | |
*** tobiash has quit IRC | 22:12 | |
corvus | i think for the moment, i'm going to just turn off host key checking in gerrit's nodepool, since this is starting to look like it's hard to hit otherwise. if that works, then it may not be worth doing any of the other rube-goldberg stuff. :) | 22:14 |
corvus | https://gerrit-review.googlesource.com/c/zuul/ops/+/274652 | 22:23 |
*** avass has quit IRC | 22:25 | |
mordred | mhu: related to our discussion yesterday about translations - here's a guide to aliasing git commands into more native swedish: https://github.com/bjorne/git-pa-svenska :) | 22:27 |
corvus | mordred: if you're around, can you look at that gerrit-zuul change ^ | 22:36 |
corvus | it's all commit message :) | 22:36 |
mordred | corvus: yes! | 22:38 |
mordred | corvus: done | 22:39 |
mordred | corvus: (I'd already read the scrollback here) | 22:39 |
*** erbarr has quit IRC | 22:45 | |
corvus | mordred: hrm, apparently that was wrong | 22:46 |
corvus | 2020-07-07 22:45:47,586 DEBUG zuul.AnsibleJob.output: [e: 390951245725432ba62ac536b8014109] [build: 10c8322dbefe4cf2bfc67f51c70c9c21] Ansible output: b' "msg": "Data could not be sent to remote host \\"10.128.15.214\\". Make sure this host can be reached over ssh: Host key verification failed.\\r\\n",' | 22:46 |
*** gmann has quit IRC | 22:47 | |
*** erbarr has joined #zuul | 22:48 | |
corvus | i'm not entirely sure why we have that option if it behaves like that | 22:48 |
*** PrinzElvis has quit IRC | 22:50 | |
*** webknjaz has quit IRC | 22:50 | |
*** iamweswilson has quit IRC | 22:50 | |
*** kmalloc has quit IRC | 22:50 | |
*** mwhahaha has quit IRC | 22:50 | |
*** mnaser has quit IRC | 22:50 | |
*** jbryce has quit IRC | 22:50 | |
*** piotrowskim has quit IRC | 22:51 | |
*** kklimonda has quit IRC | 22:51 | |
*** stevthedev has quit IRC | 22:51 | |
*** johnsom has quit IRC | 22:51 | |
*** maxamillion has quit IRC | 22:51 | |
*** dcastellani has quit IRC | 22:51 | |
*** rpittau has quit IRC | 22:51 | |
*** evgenyl has quit IRC | 22:52 | |
*** gundalow has quit IRC | 22:52 | |
*** ericsysmin has quit IRC | 22:52 | |
*** erbarr has quit IRC | 22:52 | |
*** tdasilva has quit IRC | 22:52 | |
*** samccann has quit IRC | 22:53 | |
*** ChrisShort has quit IRC | 22:53 | |
*** Open10K8S has quit IRC | 22:53 | |
*** lseki has quit IRC | 22:53 | |
*** zbr has quit IRC | 22:53 | |
*** guilhermesp has quit IRC | 22:53 | |
*** donnyd has quit IRC | 22:53 | |
corvus | okay, i'm really confused. is anyone setting "host-key-checking" to false? | 22:59 |
corvus | because based on what i just saw the gerrit zuul do, i don't see how you could use it and have a working configuration | 23:00 |
*** tosky has quit IRC | 23:01 | |
fungi | i wonder if behavior for ansible changed since we implemented that | 23:04 |
mordred | corvus: yeah - that seems very not ok | 23:10 |
*** hamalq_ has quit IRC | 23:10 | |
mordred | corvus: is there a corresponding action we need to do? | 23:10 |
mordred | corvus: the docs say "when set to false nodepool-launcher will not ssh-keyscan nodes" | 23:11 |
mordred | but that's nodepool side - we'd need to tell ansible on zuul's side to not do host key validation | 23:11 |
corvus | mordred: there is no such option for zuul; i think the only thing we could do would be to somehow set ansible inventory variables to do that | 23:13 |
mordred | corvus: I agree - I don't see how this could be used - I think we need to pass host-key-checking in the node dict, and then in zuul we need to set host_key_checking = False in the ansible.cfg | 23:13 |
corvus | yeah | 23:13 |
mordred | or inventory args: ansible_ssh_extra_args='-o StrictHostKeyChecking=no' | 23:13 |
mordred | https://stackoverflow.com/questions/23074412/how-to-set-host-key-checking-false-in-ansible-inventory-file | 23:13 |
mordred | corvus: so clearly nobody is using this | 23:14 |
corvus | i think that's a reasonable thing to do (we'd probably want it to be inventory, since it's a per-host variable) | 23:14 |
corvus | but yeah, i'd also like to double check that | 23:14 |
mordred | yeah. agree | 23:14 |
corvus | hrm, a lot of eu folks are not in channel right now | 23:14 |
*** hashar has quit IRC | 23:15 | |
mordred | corvus: we should also double check that ansible_ssh_extra_args is extra and doesn't override ssh_args from ansible.cfg | 23:15 |
corvus | i was hoping i could ping them now and collect responses tomorrow; i'll just try to ask first thing when i get up | 23:15 |
mordred | I'd guess it is from naming | 23:15 |
mordred | corvus: it definitely seems like if we set host-key-checking off in nodepool that we'd want that to follow the node | 23:16 |
* mordred has to afk | 23:16 | |
*** rlandy|bbl is now known as rlandy | 23:21 | |
*** Goneri has quit IRC | 23:25 | |
openstackgerrit | Guillaume Chauvel proposed zuul/zuul master: WIP: Scheduler: Reconfiguration ref-updated create/delete https://review.opendev.org/739198 | 23:39 |
openstackgerrit | Guillaume Chauvel proposed zuul/zuul master: WIP: Scheduler: Reconfiguration ref-updated oldrev+newrev https://review.opendev.org/739078 | 23:39 |
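
Two of the ideas floated above can be sketched in Python. This is only an illustrative sketch, not nodepool or zuul code: `RecentKeyFilter`, `wait_for_unique_key`, `host_vars_for_node`, and the node dict keys `interface_ip` and `host_key_checking` are hypothetical names invented here. The first part follows tobiash's suggestion of remembering the last N host keys and retrying until a key not seen before shows up (so, as corvus notes, the first node booted from an image with a baked-in key would still slip through). The second part follows mordred's suggestion of passing the setting through to the inventory via `ansible_ssh_extra_args`; whether that variable adds to or overrides `ssh_args` from ansible.cfg is left unverified in the discussion and is not checked here.

```python
import time
from collections import deque


class RecentKeyFilter:
    """Remember the last `maxlen` host keys and flag repeats (hypothetical helper)."""

    def __init__(self, maxlen=1000):
        self._maxlen = maxlen
        self._order = deque()   # insertion order, oldest first
        self._seen = set()      # fast membership test

    def is_new(self, key):
        return key not in self._seen

    def remember(self, key):
        if key in self._seen:
            return
        if len(self._order) >= self._maxlen:
            # Evict the oldest key so the cache stays bounded.
            self._seen.discard(self._order.popleft())
        self._order.append(key)
        self._seen.add(key)


def wait_for_unique_key(scan, key_filter, attempts=10, delay=1.0, sleep=time.sleep):
    """Call scan() until it returns a key not seen recently.

    `scan` stands in for whatever performs the key scan and returns the
    key as a string (or None if the host is not reachable yet).
    Returns the accepted key, or None if every attempt was a repeat.
    """
    for attempt in range(attempts):
        key = scan()
        if key is not None and key_filter.is_new(key):
            key_filter.remember(key)
            return key
        if attempt + 1 < attempts:
            sleep(delay)
    return None


def host_vars_for_node(node):
    """Build per-host inventory variables from a node dict.

    Assumes (hypothetically) that the node dict carries 'interface_ip'
    and an optional 'host_key_checking' flag set on the nodepool side.
    """
    hostvars = {"ansible_host": node["interface_ip"]}
    if not node.get("host_key_checking", True):
        # Per-host setting, as suggested in the discussion above.
        hostvars["ansible_ssh_extra_args"] = "-o StrictHostKeyChecking=no"
    return hostvars
```

With a filter shared across launches, a node that first presents an already-seen key is rescanned until its regenerated key appears; a node whose `host_key_checking` flag is false gets the extra ssh argument in its inventory entry and nothing else changes for other nodes.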