*** tosky has quit IRC | 00:02 | |
*** jamesmcarthur has joined #zuul | 00:23 | |
*** jamesmcarthur has quit IRC | 00:29 | |
*** jamesmcarthur has joined #zuul | 01:04 | |
*** jamesmcarthur has quit IRC | 01:09 | |
*** jamesmcarthur has joined #zuul | 01:35 | |
openstackgerrit | Ian Wienand proposed zuul/nodepool master: Dockerfile: add DEBUG environment flag https://review.opendev.org/694845 | 01:37 |
---|---|---|
openstackgerrit | Ian Wienand proposed zuul/nodepool master: Also build sibling container images https://review.opendev.org/697393 | 01:37 |
openstackgerrit | Ian Wienand proposed zuul/nodepool master: Add container-with-siblings functional test https://review.opendev.org/693464 | 01:37 |
openstackgerrit | Ian Wienand proposed zuul/nodepool master: Dockerfile: install nodepool-builder dependencies https://review.opendev.org/693306 | 01:37 |
openstackgerrit | Ian Wienand proposed zuul/nodepool master: Add a container-with-releases functional test https://review.opendev.org/698818 | 01:37 |
openstackgerrit | Ian Wienand proposed zuul/nodepool master: Functional tests - use common verification script https://review.opendev.org/698834 | 01:37 |
*** jamesmcarthur has quit IRC | 01:40 | |
*** jamesmcarthur has joined #zuul | 02:36 | |
*** jamesmcarthur_ has joined #zuul | 02:46 | |
*** jamesmcarthur has quit IRC | 02:46 | |
*** bhavikdbavishi has joined #zuul | 02:53 | |
*** jamesmcarthur_ has quit IRC | 03:28 | |
*** jamesmcarthur has joined #zuul | 03:32 | |
*** jamesmcarthur has quit IRC | 03:37 | |
*** swest has quit IRC | 04:08 | |
*** openstackgerrit has quit IRC | 04:08 | |
*** dmellado has quit IRC | 04:08 | |
*** irclogbot_0 has quit IRC | 04:08 | |
*** dmsimard has quit IRC | 04:08 | |
*** shanemcd has quit IRC | 04:08 | |
*** klindgren_ has quit IRC | 04:08 | |
*** ianw has quit IRC | 04:08 | |
*** aspiers has quit IRC | 04:08 | |
*** gothicmindfood has quit IRC | 04:08 | |
*** openstackstatus has quit IRC | 04:11 | |
*** openstackstatus has joined #zuul | 04:14 | |
*** ChanServ sets mode: +v openstackstatus | 04:14 | |
*** raukadah is now known as chkumar|rover | 05:22 | |
*** saneax has joined #zuul | 06:42 | |
*** sanjayu_ has joined #zuul | 06:48 | |
*** saneax has quit IRC | 06:51 | |
*** themroc has joined #zuul | 07:15 | |
*** pcaruana has joined #zuul | 07:16 | |
*** jcapitao|off has joined #zuul | 07:47 | |
*** tosky has joined #zuul | 08:02 | |
*** jcapitao|off is now known as jcapitao | 08:16 | |
*** hashar has joined #zuul | 08:37 | |
*** sshnaidm|off is now known as sshnaidm | 08:41 | |
*** avass has joined #zuul | 08:49 | |
avass | Is it possible to reload the tenant config for only one tenant? | 08:50 |
*** jpena|off is now known as jpena | 08:52 | |
*** fbo has joined #zuul | 09:07 | |
*** mugsie has quit IRC | 09:19 | |
*** mugsie has joined #zuul | 09:21 | |
*** yolanda has quit IRC | 09:28 | |
*** yolanda__ has joined #zuul | 09:28 | |
*** mhu has joined #zuul | 09:32 | |
*** shanemcd has joined #zuul | 10:07 | |
*** dmellado has joined #zuul | 10:07 | |
*** ianw has joined #zuul | 10:07 | |
*** klindgren has joined #zuul | 10:07 | |
*** themroc has quit IRC | 10:07 | |
*** themroc has joined #zuul | 10:08 | |
*** irclogbot_0 has joined #zuul | 10:10 | |
*** aspiers has joined #zuul | 10:14 | |
*** jcapitao is now known as jcapitao|afk | 11:50 | |
*** rfolco has joined #zuul | 12:03 | |
*** dmsimard has joined #zuul | 12:12 | |
*** pcaruana has quit IRC | 12:27 | |
*** pcaruana has joined #zuul | 12:33 | |
*** jpena is now known as jpena|lunch | 12:39 | |
*** sanjayu_ has quit IRC | 12:56 | |
*** rlandy has joined #zuul | 12:58 | |
*** jamesmcarthur has joined #zuul | 13:04 | |
*** jcapitao|afk is now known as jcapitao | 13:09 | |
*** jamesmcarthur has quit IRC | 13:15 | |
*** jpena|lunch is now known as pjena | 13:31 | |
*** pjena is now known as jpena | 13:32 | |
*** Goneri has joined #zuul | 13:44 | |
fungi | avass: i don't think so... what's the use case? is there some problem when the other tenants are reconfigured? | 13:48 |
mordred | fungi, avass: I want to say this is something tobiash was interested in a while back | 14:05 |
mordred | and maybe did something about? or maybe didn't do something about? | 14:06 |
tobiash | mordred, avass: check out https://review.opendev.org/#/c/652114 | 14:19 |
tobiash | btw, this is ready for review and we've been using it in production for two months now | 14:19 |
tobiash | ;) | 14:20 |
tobiash | fungi: with many tenants this stalls zuul quite a while. A full reconfiguration can take up to 20 minutes in our deployment | 14:21 |
*** saneax has joined #zuul | 14:29 | |
*** openstackgerrit has joined #zuul | 14:30 | |
openstackgerrit | Monty Taylor proposed zuul/zuul master: Add --check-config option to zuul scheduler https://review.opendev.org/542160 | 14:30 |
mnaser | FYI my talk was accepted into FOSDEM so I’ll be talking zuul: https://fosdem.org/2020/schedule/event/safe_gated_and_integrated_gitops_for_kubernetes/ | 14:32 |
tristanC | tobiash: left a comment | 14:33 |
mordred | mnaser: cool! | 14:34 |
tristanC | i meant, as an operator, it may be confusing to pick between a smart-reconfigure and full-reconfigure... shouldn't the reconfigure be always smart? | 14:37 |
mordred | tristanC: yeah, I'd imagine scripting to trigger reconfigures when landing a change would almost always want smart - I'm not sure if there's still a use case for full ... but maybe adding smart is a more conservative way to add it - and then maybe at some point in the future if we're all happy with it we just alias full to smart? it's a good question | 14:41 |
tristanC | mordred: having both works for me, but since this updates the cli/unix-socket api, we might want to avoid adding a new command | 14:44 |
*** chkumar|rover is now known as ignoreirc | 14:57 | |
*** ignoreirc is now known as chkumar|rover | 14:58 | |
openstackgerrit | Merged zuul/nodepool master: Dockerfile: create APP_DIR https://review.opendev.org/693646 | 14:59 |
avass | fungi: one of the tenants has a lot of branches in one of the projects, it's a bit annoying having to reload that one when we add a project on another tenant | 15:00 |
tobiash | tristanC: full reconfig is still needed i.e. to fix inconsistent cached things | 15:02 |
avass | tobiash: yeah, a full-reconfigure takes about 30-40 minutes for us | 15:02 |
tobiash | As it reloads all config while smart reconfig operates incrementally | 15:02 |
tristanC | tobiash: then perhaps full-reconfigure could be renamed full-reload, and the smart-reconfigure be renamed full-reconfigure ? | 15:05 |
avass | is the 'full' needed. how about 'reconfigure'? | 15:06 |
tristanC | avass: well most zuul operators must already be using the 'full-reconfigure' command | 15:07 |
tobiash | tristanC: a full-reconfigure does a full config reload while a smart-reconfigure does an incremental approach. I don't see the need to change the already existing full-reconfigure | 15:08 |
tristanC | and i guess most will want the new smart-reconfigure command, thus i'm suggesting we make it the default | 15:08 |
corvus | mordred: what did you think of my concern on 542160? | 15:08 |
tristanC | tobiash: then that's ok, it seems like we can just s/full/smart/ in our playbooks | 15:09 |
tobiash | There is no default, full-reconfigure is named like that in anticipation of further reconfig variants | 15:09 |
avass | tobiash: thanks for the link, looks good to me except a small spelling error :) | 15:10 |
tobiash | no, because the use cases are different | 15:12 |
tobiash | scratch my last sentence | 15:12 |
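A rough sketch of the distinction tobiash describes above: a full reconfiguration reloads every tenant, while a smart reconfiguration only touches tenants whose configuration changed. The function name and the dict-based tenant representation are hypothetical and chosen for illustration; this is not Zuul's implementation.

```python
def tenants_to_reload(old_config, new_config, full=False):
    """Return the sorted names of tenants that would need reloading.

    old_config / new_config are hypothetical dicts mapping a tenant name
    to that tenant's parsed configuration.
    """
    if full:
        # Full reconfiguration: reload everything, regardless of changes.
        return sorted(new_config)
    # Smart reconfiguration: only tenants that are new or whose config differs.
    return sorted(
        name for name, conf in new_config.items()
        if old_config.get(name) != conf
    )


old = {"tenant-a": {"projects": ["p1"]}, "tenant-b": {"projects": ["p2"]}}
new = {"tenant-a": {"projects": ["p1"]}, "tenant-b": {"projects": ["p2", "p3"]}}

print(tenants_to_reload(old, new))             # incremental ("smart")
print(tenants_to_reload(old, new, full=True))  # everything ("full")
```

With many tenants, skipping the unchanged ones is where the time saving would come from; the full path stays useful when cached state may be inconsistent.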
mordred | corvus: oh - that's a good point - I missed that back there | 15:13 |
corvus | i'll restate it with a -1 | 15:14 |
mordred | corvus: thanks - I agree, it's worth a discussion about user experience | 15:15 |
tobiash | corvus; mordred: my use case is that we want to be able to test changes to tenant config upfront | 15:16 |
tobiash | we spawn a second scheduler and mergers and check if the new config is valid | 15:16 |
tobiash | (for the whole tenant) | 15:17 |
corvus | tobiash: and with the smaller system, it's able to load the configuration within the timeout? | 15:18 |
tobiash | corvus: it worked until half a year ago where it tool ~20min for startup, now we filter the tenant list in the job based on the diff | 15:18 |
tobiash | s/tool/took | 15:19 |
tobiash | and with that it typically runs only 5 minutes | 15:20 |
tobiash | corvus: would you be ok to add a big exclamation mark to the docs about the use case and how it is expected to be used? | 15:20 |
corvus | tobiash: ok, i have a few thoughts: 1) the idea of being able to run "program --validate-config" is pretty universal, so users will be surprised when they try to use it that they also have to start daemons just to see if other daemons will start. for this, i think we should add a section to the docs about it, and even add a quick note to the cli args pointing there (like "caveat: see [doc section]"). 2) | 15:23 |
corvus | since even you can't use it as written, we might want to consider altering it to be "--validate-tenant" or something like that, so it takes an argument and filters for the one tenant. but that's a suggestion, not a -1. | 15:23 |
corvus | (#2 could be a followup) | 15:25 |
tobiash | validate-tenant actually sounds like the way to go | 15:25 |
tobiash | however this should accept a list, so --validate-tenants? | 15:25 |
corvus | sure, and maybe also accept * (and/or default to *)? | 15:26 |
tobiash | ++ | 15:26 |
tobiash | great, thanks | 15:26 |
corvus | sounds good; i'll copy/paste this conversation into review :) | 15:26 |
tobiash | :) | 15:27 |
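A minimal sketch of the option shape discussed above: a flag that takes a list of tenants and treats `*` (or no names at all) as "all tenants". The program name, the `select_tenants` helper, and the exact semantics are assumptions made for illustration, not the behaviour of any released Zuul command.

```python
import argparse


def build_parser():
    # Hypothetical parser; only the flag name comes from the conversation.
    parser = argparse.ArgumentParser(prog="scheduler-sketch")
    parser.add_argument(
        "--validate-tenants",
        nargs="*",
        default=None,
        metavar="TENANT",
        help="validate only the named tenants; no names or * means all",
    )
    return parser


def select_tenants(requested, known):
    """Map the parsed flag value to the tenants that should be validated."""
    if requested is None:
        return None  # flag not given: no validation requested
    if not requested or "*" in requested:
        return sorted(known)
    unknown = set(requested) - set(known)
    if unknown:
        raise ValueError("unknown tenants: " + ", ".join(sorted(unknown)))
    return sorted(set(requested))


args = build_parser().parse_args(["--validate-tenants", "tenant-a"])
print(select_tenants(args.validate_tenants, ["tenant-a", "tenant-b"]))
```

Filtering to the tenants touched by a change is what keeps the check short in the job tobiash describes.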
*** avass has quit IRC | 15:30 | |
corvus | tobiash: 652114 lgtm but avass found a typo | 15:43 |
*** bhavikdbavishi has quit IRC | 15:46 | |
*** themroc has quit IRC | 15:59 | |
AJaeger | zuul-jobs reviewers, https://review.opendev.org/#/c/696337/ has two +2s but a few questions and was not approved - anybody wants to +2A? Subject is "Add pypi_fqdn to differentiate it package mirrors" | 16:03 |
*** hashar has quit IRC | 16:05 | |
*** hashar has joined #zuul | 16:07 | |
*** jamesmcarthur has joined #zuul | 16:09 | |
*** chkumar|rover is now known as raukadah | 16:14 | |
*** saneax has quit IRC | 17:05 | |
*** hashar has quit IRC | 17:14 | |
pabelanger | if multiple jobs produce artifacts, and a child depends on both, zuul.artifacts should be updated correctly in this use case? | 17:14 |
pabelanger | testing it now to confirm, but figured I'd ask | 17:15 |
*** mattw4 has joined #zuul | 17:26 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Add support for smart reconfigurations https://review.opendev.org/652114 | 17:31 |
tobiash | corvus: thanks, fixed | 17:32 |
corvus | pabelanger: yep, and multiple changes too | 17:37 |
corvus | tristanC: are you happy with https://review.opendev.org/652114 (i think your questions were answered in irc, but i wanted to double check) | 17:56 |
*** jpena is now known as jpena|off | 18:08 | |
tristanC | corvus: i'm very happy with it, it's a much needed improvement | 18:21 |
corvus | tristanC: want to go ahead and +3 then? that way we have a record that you're happy :) (it also had a +2 from mordred before the typo fix) | 18:23 |
tristanC | corvus: done, thanks tobiash :) | 18:24 |
*** themroc has joined #zuul | 18:25 | |
mordred | that should make avass happy | 18:25 |
*** sshnaidm is now known as sshnaidm|afk | 18:26 | |
clarkb | opendev had 1053 job retries in the previous 24 hour period. 73 of these ended with all three attempts failing. A good chunk of these 73 are due to pre-run failures that are consistent. The good news here is that we also had a cloud outage in this period | 18:29 |
clarkb | Thought I'd share some data on the utility of job retries | 18:30 |
corvus | clarkb: wow, that sounds like things are working well | 18:30 |
corvus | i mean, the whole pre-run retry idea is working well | 18:30 |
clarkb | I think what this shows opendev is that we need the retries, but we don't want a very high limit as we do have a non trivial number of consistent failures | 18:30 |
clarkb | corvus: yup | 18:30 |
corvus | (the internet and stuff on it is *not* working well :) | 18:30 |
clarkb | I think this also means that the zuul default of 3 retries is a good one | 18:31 |
corvus | ++ | 18:31 |
clarkb | I also wrote up an email to openstack-discuss identifying cases of those 73 consistent failures for the related parties so that they can hopefully fix them (and in some cases they are already fixing them) | 18:32 |
clarkb | other zuul operators may want to keep track of these numbers too as real issues can hide in those retries | 18:33 |
clarkb | http://lists.openstack.org/pipermail/openstack-discuss/2019-December/011600.html if others are interested in the sorts of consistent retry failures we see | 18:34 |
clarkb | anyway this has been in the back of my head for a while and I finally got around to putting a bit more logging in place. From that I was able to dig in and identify that this is a useful feature and our default is good, as well as point specific projects at improvements they can make. And with that I'm going to context switch to the next thing | 18:39 |
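For operators who want to track similar numbers, a small sketch of the tally clarkb describes: how many job runs were retried at all, and how many used up every attempt and still failed. The input format (a dict of run id to ordered attempt outcomes) and the function name are hypothetical; the limit of 3 attempts is the default mentioned in the channel.

```python
def summarize_retries(runs, max_attempts=3):
    """Count retried runs and runs that failed on every allowed attempt.

    runs maps a (hypothetical) run id to its ordered attempt outcomes,
    each either "ok" or "fail".
    """
    retried = 0
    exhausted = 0
    for attempts in runs.values():
        if len(attempts) > 1:
            retried += 1  # at least one attempt had to be repeated
        if len(attempts) == max_attempts and attempts[-1] == "fail":
            exhausted += 1  # limit reached and the last attempt still failed
    return {"retried": retried, "exhausted": exhausted}


sample = {
    "job-1": ["ok"],
    "job-2": ["fail", "ok"],
    "job-3": ["fail", "fail", "fail"],
    "job-4": ["fail", "fail", "ok"],
}
print(summarize_retries(sample))
```

The "exhausted" bucket is the one worth reading closely, since consistent failures tend to hide there.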
clarkb | The next thing is the OSF Annual Report Zuul section. My plan is to get a draft going on an etherpad and share that with the channel | 18:39 |
mordred | clarkb: that's some awesome data | 18:40 |
clarkb | one thing we might want to consider is how to store retry data in the db | 18:42 |
clarkb | aiui we only report the last attempt to the database | 18:42 |
mordred | infra-root: it has come to my attention that I have some vacation days that I have to take or lose - so I'm going to be primarily AFK for the rest of the week. I'll probably still be lurking around since I'd intended to be working so we don't have a ton of extra things to do teed up | 18:42 |
clarkb | this means it might be difficult for zuul operators to track retries without having another system (like opendev's logstash/elasticsearch) | 18:42 |
*** themroc has quit IRC | 18:42 | |
corvus | mordred: vacationing by volunteering your time to an open source project? :) | 18:43 |
mordred | corvus: :) | 18:43 |
clarkb | My parents are flying into town thursday I expect my hours will become weird starting about then | 18:44 |
Shrews | mordred: be aware that you have until like mid february to use it | 18:44 |
corvus | clarkb: yeah; i think in general we're going to need to store non-final builds and buildsets. that would also let us have zuul-web build pages for completed builds before their buildset is complete (another common complaint). i think that can come as part of (or after) the conversion of the sql reporter from a reporter to a mandatory component. | 18:45 |
corvus | (but right now, it's structurally hard to do that) | 18:46 |
mordred | Shrews: yah - I think this is likely the best time to do it since stuff is slow anyway (and also I'm on the Mexican Riviera) | 18:46 |
clarkb | corvus: roger | 18:48 |
Shrews | mordred: that being said, it's a good reminder for me to examine which days to use my remaining time | 18:48 |
*** armstrongs has joined #zuul | 18:49 | |
*** rfolco has quit IRC | 19:09 | |
*** rfolco has joined #zuul | 19:10 | |
mordred | Shrews: ++ | 19:17 |
openstackgerrit | Merged zuul/zuul master: Add support for smart reconfigurations https://review.opendev.org/652114 | 19:21 |
fungi | mordred: glad to hear you've got some vacation time coming and so we'll be seeing more of you! | 19:21 |
*** rlandy is now known as rlandy|brb | 19:22 | |
fungi | i personally am being shanghaied on a boat which is not destined for shanghai, and will disappear from regcognizance the day after next | 19:25 |
*** jamesmcarthur has quit IRC | 19:27 | |
*** jamesmcarthur has joined #zuul | 19:28 | |
*** rlandy|brb is now known as rlandy | 19:45 | |
*** jamesmcarthur has quit IRC | 19:48 | |
*** mhu has quit IRC | 19:50 | |
*** themroc has joined #zuul | 19:51 | |
*** themroc has quit IRC | 19:51 | |
*** Goneri has quit IRC | 19:53 | |
tobiash | clarkb: re retries, check out https://review.opendev.org/#/c/633501 | 19:54 |
tobiash | That also helped us to analyse the reason of a retry (at least if log upload still worked) | 19:58 |
clarkb | tobiash: oh that would be great | 20:05 |
clarkb | corvus: ^ that may be a workaround for the db issues? | 20:05 |
clarkb | corvus: maybe you can review that to make sure it doesn't conflict with your plans stated earlier? | 20:05 |
clarkb | zuulians: a first draft of the annual report is at the bottom of fungi's data gathering etherpad. I put it there because I reference the data in the etherpad and this way it is easy for people to double check I got things correct. https://etherpad.openstack.org/p/zuul-2019-annual-report-data | 20:48 |
clarkb | we've been asked to have this ready by January 10 so not a huge rush but with holidays I figured it was better to start early than late | 20:48 |
*** jamesmcarthur has joined #zuul | 20:49 | |
clarkb | feel free to edit or make suggestions. | 20:56 |
*** jamesmcarthur_ has joined #zuul | 20:58 | |
*** gothicmindfood has joined #zuul | 20:59 | |
*** jamesmcarthur has quit IRC | 21:01 | |
corvus | clarkb, tobiash: in principle i think that could work and i don't think it would impact future db changes. but i think there are some problems with those patches with current code; i left some comments on 633501. | 21:06 |
*** Goneri has joined #zuul | 21:06 | |
corvus | also, i'm still actually a little confused about how that would work in practice since there are generally no logs for retried builds. | 21:06 |
corvus | like, you just get to see that the build was retried. it's pretty durned hard to find out why. | 21:07 |
clarkb | there are logs if it fails in pre run like devstack jobs or if the disk issues that ironic sees occur | 21:07 |
clarkb | but ya not in all cases | 21:07 |
*** jcapitao has quit IRC | 21:47 | |
*** pcaruana has quit IRC | 21:55 | |
*** rfolco is now known as rfolco|bbl | 21:58 | |
*** mattw4 has quit IRC | 22:04 | |
*** mattw4 has joined #zuul | 22:04 | |
*** mattw4 has quit IRC | 22:28 | |
*** mattw4 has joined #zuul | 22:28 | |
*** rlandy is now known as rlandy|bbl | 22:51 | |
*** jamesmcarthur_ has quit IRC | 23:26 | |
*** tosky has quit IRC | 23:43 |