*** openstackgerrit has quit IRC | 00:34 | |
*** openstackgerrit has joined #zuul | 00:54 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add jobs graph rendering https://review.openstack.org/537869 | 00:54 |
*** rlandy has quit IRC | 01:07 | |
*** bhavikdbavishi has joined #zuul | 01:49 | |
*** bhavikdbavishi has quit IRC | 01:59 | |
*** ruffian_sheep has joined #zuul | 02:07 | |
ruffian_sheep | Hi! I want to build a third-party CI! But right now I don't know how to set up the pipelines of Jenkins job builds used in Zuul | 02:10 |
ruffian_sheep | This is the link where I'm learning how to build it: https://wiki.openstack.org/wiki/Cinder/tested-3rdParty-drivers. I don't know the exact settings for layout.yaml and project.yaml. | 02:13 |
*** sshnaidm is now known as sshnaidm|off | 02:14 | |
fungi | ahh, i wish in #openstack-infra you'd mentioned this was for an openstack third-party ci system | 02:14 |
fungi | or i wouldn't have directed you to #zuul | 02:15 |
fungi | zuul hasn't supported jenkins for roughly a year now | 02:15 |
fungi | so if you want to run zuul with jenkins you're going to be stuck running a somewhat old and unmaintained release of zuul | 02:16 |
fungi | i think 2.6.0 is likely the last release which had jenkins integration | 02:17 |
ruffian_sheep | But... the link is given by the official guidance document | 02:25 |
fungi | that's guidance from the openstack cinder project. you should talk to people involved in that project to get recommendations on what they expect from you | 02:26 |
ruffian_sheep | I think so! But... this is my third week learning how to build it, and I can't find anyone doing the same thing to answer my questions T.T | 02:32 |
ruffian_sheep | <fungi>:Do you know the channel they discuss? | 02:33 |
fungi | there is a #openstack-cinder channel where the openstack cinder team tends to be present, and also an #openstack-third-party-ci channel where some third-party ci operators tend to be | 02:34 |
ruffian_sheep | <fungi>: Thanks! Does IRC have the ability to add permanent friends? | 02:36 |
ruffian_sheep | <fungi>: ;P | 02:36 |
fungi | not really, no (some irc clients might, i don't know) | 02:36 |
ruffian_sheep | <fungi>: Alright | 02:37 |
ruffian_sheep | Hi! I want to build a third-party CI! But right now I don't know how to set up the pipelines of Jenkins job builds used in Zuul. This is the link where I'm learning how to build it: https://wiki.openstack.org/wiki/Cinder/tested-3rdParty-drivers. I don't know the exact settings for layout.yaml and project.yaml. | 02:38 |
ruffian_sheep | Sorry,send the wrong channel.. | 02:39 |
ruffian_sheep | Please ignore the message | 02:39 |
*** jiapei has joined #zuul | 03:10 | |
*** _ari_ has quit IRC | 03:21 | |
*** rcarrillocruz has joined #zuul | 03:39 | |
*** bhavikdbavishi has joined #zuul | 04:51 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add version to info endpoint https://review.openstack.org/609571 | 05:35 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add about dropdown to display zuul version https://review.openstack.org/630027 | 05:35 |
*** chkumar|out is now known as chandankumar | 05:40 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: sql: add buildset uuid https://review.openstack.org/630034 | 06:45 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/buildsets route https://review.openstack.org/630035 | 06:45 |
*** hashar has joined #zuul | 06:54 | |
openstackgerrit | Rui Chen proposed openstack-infra/zuul master: Avoid using list branches with protected=1 in github driver https://review.openstack.org/630038 | 06:55 |
*** pcaruana has joined #zuul | 07:05 | |
*** bhavikdbavishi has quit IRC | 07:10 | |
*** quiquell|off is now known as quiquell | 07:16 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: sql: add buildset uuid column https://review.openstack.org/630034 | 07:17 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/buildsets route https://review.openstack.org/630035 | 07:17 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add buildsets page https://review.openstack.org/630041 | 07:17 |
*** hashar has quit IRC | 07:18 | |
*** jiapei has left #zuul | 07:20 | |
*** gtema has joined #zuul | 07:44 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: sql: add buildset uuid column https://review.openstack.org/630034 | 07:46 |
*** gtema has quit IRC | 08:01 | |
*** gtema has joined #zuul | 08:02 | |
*** rcarrillocruz has quit IRC | 08:03 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/buildsets route https://review.openstack.org/630035 | 08:17 |
*** jpena|off is now known as jpena | 08:54 | |
*** panda|off is now known as panda | 08:55 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/buildsets route https://review.openstack.org/630035 | 09:07 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add buildsets page https://review.openstack.org/630041 | 09:07 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/buildset/{uuid} route https://review.openstack.org/630078 | 09:07 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add buildset page https://review.openstack.org/630079 | 09:07 |
*** avass has joined #zuul | 09:38 | |
avass | any way to run a job on all nodes with a specific label? | 09:40 |
panda | avass: adding the label in the nodeset and then using hosts: <label> in the playbook doesn't work? | 09:42 |
panda | does anyone know whether the files: attribute in a child job completely overrides the same attribute in the parent, or whether the two lists are merged? | 09:42 |
avass | panda: i thought that only requests one node with the label? | 09:43 |
avass | panda: i mean all nodes with a specific label in nodepool sorry | 09:43 |
panda | avass: yes, nevermind, I always confuse the label with being an alias for a group | 09:45 |
panda | avass: in fact I think the only way is to use groups | 09:46 |
avass | yeah, but that would mean setting up different labels for all nodes and then adding them to the group, wouldn't it? | 09:46 |
ssbarnea|rover | found a bug in zuul, take a look at this message which translates into a table full of SUCCESS in gerrit and a negative vote. https://s3.sbarnea.com/ss/190111-Change_Ia5bcd556_GATE_CHECK_for_TripleO__review.openstack_Code_Review_.png | 09:48 |
ssbarnea|rover | or maybe this counts as a bug in the JS extension for gerrit made for zuul. | 09:49 |
panda | avass: two nodes can have the same label, and I think it's separated from the group definition. If you set two nodes with the same label in nodes: you will need to manually add those nodes to the same group to run the playbook on both | 09:51 |
avass | panda: yes but wouldn't that mean that nodeset only requests one of the nodes from nodepool? | 09:52 |
avass | panda: or nvm I see what you mean. But that wouldn't be the same as all nodes | 09:52 |
panda | avass: nodeset.nodes.label (required) | 09:53 |
panda | The Nodepool label for the node. Zuul will request a node with this label. | 09:53 |
panda | two nodes with the same label will be two nodes anyway | 09:53 |
avass | panda: yes, but I have static nodes with jobs that need to be run on all of them at the same time. only adding a label in nodepool when setting up another node is nicer than having to reconfigure the job | 09:55 |
avass | panda: meaning I don't want to run a job on a set number of nodes but all of the nodes with a nodepool label | 09:56 |
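A minimal sketch of the nodeset panda describes above: two nodes requested with the same Nodepool label, manually collected into a group so a playbook can address them together. This still requests a fixed number of nodes, which is why it does not cover avass's "every node carrying a label" case; all names here are illustrative.

```yaml
- nodeset:
    name: two-static-nodes          # illustrative name
    nodes:
      - name: node1
        label: my-static-label      # both nodes request the same Nodepool label
      - name: node2
        label: my-static-label
    groups:
      - name: static-pool           # playbooks can then use "hosts: static-pool"
        nodes:
          - node1
          - node2
```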
*** avass is now known as avass|lunch | 10:07 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix test_load_governor on large machines https://review.openstack.org/630118 | 10:08 |
tobiash | avass: that's not possible atm. You would need to remove them from nodepool and use add_host in the job. Then you can run stuff on all of them. But still you would need to find a way to query that list of nodes somewhere | 10:15 |
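A rough sketch of the add_host workaround tobiash mentions, assuming the static machines are kept out of Nodepool and the job obtains their addresses some other way; the my_static_hosts variable and the login user are invented for illustration, not an existing Zuul or Nodepool facility.

```yaml
- hosts: localhost
  tasks:
    - name: Add each externally-known static machine to an ad-hoc inventory group
      add_host:
        name: "{{ item }}"
        groups: all_static_nodes
        ansible_user: zuul            # assumed login user
      loop: "{{ my_static_hosts }}"   # hypothetical job variable listing the machines

- hosts: all_static_nodes
  tasks:
    - name: Run the actual work on every one of them at once
      command: uname -a
```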
openstackgerrit | Andriy Shevchenko proposed openstack/pbrx master: Updatae home-page https://review.openstack.org/630132 | 10:21 |
*** ruffian_sheep has quit IRC | 10:37 | |
*** electrofelix has joined #zuul | 10:48 | |
*** avass|lunch is now known as avass | 10:53 | |
avass | tobiash: alright | 10:53 |
*** hashar has joined #zuul | 10:55 | |
avass | tobiash: do you think it would be a lot of work to implement it? might work on it later | 10:56 |
jkt | my zuul/web/static/ contains just the .keep file after I `pip install`, what am I doing wrong? Do I need some additional packages to build the web dashboard? | 10:58 |
*** gtema has quit IRC | 11:01 | |
tobiash | avass: I'm not sure how this would fit into the current architecture of nodepool | 11:01 |
avass | ah, I see | 11:01 |
avass | would have been a nice thing to have to set up static nodes | 11:03 |
jkt | ah, right, if I'm installing from git directly, I need yarn. | 11:20 |
*** hashar has quit IRC | 11:29 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix test_load_governor on large machines https://review.openstack.org/630118 | 11:39 |
*** cmurphy is now known as cmorpheus | 12:05 | |
*** dkehn has quit IRC | 12:10 | |
*** rcarrillocruz has joined #zuul | 12:35 | |
*** hashar has joined #zuul | 12:35 | |
*** jpena is now known as jpena|lunch | 12:38 | |
*** rcarrillocruz has quit IRC | 12:40 | |
*** gtema has joined #zuul | 12:49 | |
*** rcarrillocruz has joined #zuul | 13:10 | |
*** rlandy has joined #zuul | 13:18 | |
openstackgerrit | Sorin Sbarnea proposed openstack-infra/zuul-jobs master: Avoid zuul DISK_FULL failure with too big logs https://review.openstack.org/630224 | 13:37 |
*** jpena|lunch is now known as jpena | 13:44 | |
openstackgerrit | Sorin Sbarnea proposed openstack-infra/zuul-jobs master: Avoid zuul DISK_FULL failure with too big logs https://review.openstack.org/630224 | 13:46 |
*** dkehn has joined #zuul | 13:55 | |
fungi | ssbarnea|rover: yeah, what you're seeing is a combination of 1. zuul not having an external log link to supply because the job was aborted prior to uploading its logs, and 2. the "hideci" javascript overlay in use on openstack's gerrit server not knowing how to deal with it | 13:57 |
fungi | i know we'd discussed in the past having a way to at least archive and report a log link for the build's console log even if it exceeded storage on the executor. the current implementation where the disk accountant as a separate thread aborts jobs out-of-band makes that a bit challenging to implement, i think | 13:58 |
dmsimard | fungi: perhaps the take away could be that the executor should not be storing the files in transit at all ? | 14:43 |
fungi | well, it does some various things to them while stored on the executor which would need to happen elsewhere i suppose | 14:44 |
fungi | dmsimard: also, the one safeguard we do have currently to curtail filling up logservers is the executor disk accountant thread, which operates on the transient build dir | 14:46 |
fungi | if we copy those files straight through to the destination, we likely need some other way to figure out how much we're about to copy | 14:47 |
dmsimard | what I had implemented a while back for RDO's jenkins things is that the logserver had a ssh key embedded in the image (not unlike the current infra root keys) and the log server would pull the logs instead | 14:47 |
fungi | yeah, we're also hoping to soon get away from having an actual logserver. i doubt swift containers have a mechanism to pull data from some other source on demand | 14:48 |
dmsimard | yeah, swift makes this somewhat of a no-go | 14:48 |
dmsimard | The ideal situation is if we're able to either pull once or push once, right now we're pulling and then pushing which costs us bandwidth, storage and performance | 14:51 |
fungi | the previous iteration of logs-on-swift under zuul v2 did have the job nodes upload their logs directly to the swift api, though we had to take extra care to only authorize them for access to their job's log subtree | 14:52 |
fungi | and much of the challenge was around generating and distributing credentials for that | 14:53 |
dmsimard | yeah, security is why we had the log server pull the logs instead of having the nodes push them | 14:54 |
fungi | also, aggregating artifacts/logs through the executor does, i think, provide increased flexibility in supporting multiple storage solutions | 14:55 |
dmsimard | pros and cons :) | 14:56 |
jkt | how does nodepool actually connect to the nodes when using the static provider? | 15:14 |
jkt | I see that I have to register the nodes' SSH server pubkeys, but the docs do not say anything about how to configure the nodes themselves | 15:15 |
jkt | presumably, I should somehow add some SSH pubkey to my target user's home dir, but the docs do not specify which key is used for this by nodepool/zuul | 15:16 |
jkt | is that the same zuul's SSH key as used for talking to Gerrit by chance? | 15:16 |
Shrews | jkt: you may find https://zuul-ci.org/docs/zuul/admin/nodepool_static.html useful | 15:18 |
Shrews | jkt: that's part of the Zuul From Scratch guide https://zuul-ci.org/docs/zuul/admin/zuul-from-scratch.html | 15:19 |
jkt | Shrews: thanks, I read that one, and I haven't found my answer in there | 15:21 |
jkt | Shrews: it is probably executor.private_key_file, right? | 15:21 |
*** quiquell is now known as quiquell|off | 15:22 | |
avass | it sets up temporary ssh keys during the pre-job using the master ssh-key | 15:23 |
avass | jkt: and yes it should be that file | 15:23 |
jkt | let me check that playbook file, thanks | 15:24 |
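For context, a hedged sketch of a nodepool.yaml static-driver entry of the kind jkt is setting up; the hostname, label, and key are placeholders, and the attribute names should be checked against the Nodepool version in use. Nodepool itself never logs into the machine; the Zuul executor does, using the key configured as executor.private_key_file in zuul.conf, so that key's public half must be in the node user's ~/.ssh/authorized_keys.

```yaml
labels:
  - name: my-static-label

providers:
  - name: static-provider
    driver: static
    pools:
      - name: main
        nodes:
          - name: static01.example.com        # placeholder hostname
            labels:
              - my-static-label
            username: zuul                     # account the executor logs in as
            host-key: "ssh-ed25519 AAAA..."    # the node's SSH host key, as jkt notes
```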
*** hashar has quit IRC | 15:28 | |
corvus | dmsimard, fungi: i think we can have the executor store a few lines of the ansible log in the database in these cases. | 15:48 |
*** avass has quit IRC | 15:49 | |
fungi | makes sense. at least knowing which task was in progress when the DISK_FULL result was determined would have helped speed up diagnosis | 15:50 |
fungi | granted that's still not a huge help in the particular case we ran into, with a job which normally produces 85mb of logs when it succeeds, but wanted to archive 7gb of logs during an infrequent failure case | 15:51 |
fungi | but mordred has suggested in #openstack-infra that perhaps running a remote du and then not collecting the logs from the node if over a set threshold would provide sufficient detail to diagnose the cause | 15:52 |
fungi | (and collect at least a summary of the du info, perhaps by just echoing in the console log) | 15:53 |
mordred | yeah - thinking that could go into fetch-output | 15:54 |
dmsimard | what was the failure scenario ? would it be something that could've been handled by an ansible rescue block ? | 15:55 |
dmsimard | like fetch-output fails, rescue -> do something helpful | 15:55 |
mordred | dmsimard: right now no - right now it's that the disk accounting protector on the zuul executor killed things because the job had used too much executor disk space | 15:55 |
mordred | dmsimard: but that's sort of what we're talking about as a potential mitigation so that we can avoid hitting the kill-job-too-much-disk safety net | 15:56 |
dmsimard | could the fetch output role look at the size of the artifacts it needs to pull before pulling them ? | 15:57 |
mordred | dmsimard: that's what we were just talking about doing - yeah | 15:57 |
dmsimard | oh, hadn't read the entire thing | 15:57 |
dmsimard | caffeine-- | 15:57 |
mordred | corvus: if we did go that direction - perhaps we could expose zuul's disk usage thresholds in a zuul variable so that the role could have the default disk threshold be set from the executor threshold? | 16:00 |
mordred | although - with all of that - I need to AFK for a bit ... y'all have fun | 16:00 |
tobiash | jkt: nodepool only gets the public keys, it doesn't login to the nodes | 16:00 |
corvus | fungi, mordred: yeah, the two approaches are not mutually exclusive. :) i don't see a problem with exposing the threshold variable | 16:02 |
fungi | mordred: the math still gets interesting, because we're pulling from multiple nodes but have one threshold to work with. i guess the role could tally up the du from all of them before pulling from any of them? | 16:05 |
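A sketch of the mitigation being discussed here, not the actual fetch-output role: measure the staged logs on each node before pulling them, record the measurement in the console log, and skip the copy if it exceeds a threshold. The max_log_size_bytes variable and the zuul-output/logs path are assumptions for illustration, and the per-build tally across multiple nodes that fungi mentions would need an extra aggregation step.

```yaml
- name: Measure the size of logs staged on the node
  command: du -sb {{ ansible_user_dir }}/zuul-output/logs
  register: log_du

- name: Record the measurement in the console log
  debug:
    msg: "Staged logs on {{ inventory_hostname }}: {{ log_du.stdout }}"

- name: Pull logs back to the executor only if they fit under the threshold
  synchronize:
    mode: pull
    src: "{{ ansible_user_dir }}/zuul-output/logs/"
    dest: "{{ zuul.executor.log_root }}/"
  when: (log_du.stdout.split()[0] | int) < (max_log_size_bytes | default(1073741824))
```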
jkt | tobiash: yup, I am now aware that it's actually zuul-executor, and its default config happens to reuse the "main" key as used for connecting to Gerrit | 16:06 |
*** evrardjp has quit IRC | 16:10 | |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: Add dco-license job https://review.openstack.org/630302 | 16:11 |
*** evrardjp has joined #zuul | 16:11 | |
pabelanger | fungi: ttx: ^ is the follow-up for pushing up the dco-license job; testing is now from the ansible-network zuul | 16:11 |
corvus | pabelanger: you were asking for a zuul release.... should we release 2fd6883 as 3.4.0? | 16:12 |
corvus | tobiash, clarkb: ^? | 16:13 |
pabelanger | corvus: yes, that looks fine to me | 16:13 |
tobiash | corvus: is that what you are running atm? | 16:13 |
corvus | tobiash: yeah | 16:14 |
tobiash | lgtm | 16:14 |
tobiash | nothing really important between that and current master | 16:14 |
corvus | there's a small behavior change for the archive url artifacts, but i think it's forward-compatible enough to have it span releases | 16:15 |
tobiash | corvus: while we're at release partying, does anything speak against a gear release? That would enable us to introduce the client side keepalive in zuul. | 16:16 |
corvus | tobiash: oh thanks, yeah we can do that too.... though since gear is a library, let's do that monday? :) | 16:17 |
tobiash | monday is fine | 16:17 |
tobiash | thanks | 16:17 |
clarkb | corvus: fine with me. The github improvements would be good to publish | 16:19 |
clarkb | on the nodepool side of things we may want to get Shrews's image build timeouts in and tested with openstack before releasing. I think that may be an important one given recent errors with image building in openstack land | 16:19 |
Shrews | yeah, i'm working on validating the dib log output is the same before i'm comfortable with it | 16:20 |
Shrews | should know shortly | 16:20 |
tobiash | clarkb, Shrews: regarding the build timeout, I think it should be configurable. We have one image that regularly already takes 8-10 hours... | 16:22 |
tobiash | (it's a 500gb image) | 16:22 |
clarkb | tobiash: ++ | 16:24 |
corvus | zuul 3.4.0 pushed | 16:24 |
tobiash | \o/ | 16:24 |
clarkb | tobiash: I haven't had a chance to review it yet, but is on my list of things to try and get to today | 16:26 |
*** EmilienM is now known as EvilienM | 16:26 | |
tobiash | clarkb: you mean the image build timeout? | 16:27 |
clarkb | yes | 16:27 |
tobiash | I just had a quick look at the first version yesterday when it was wip'ed and noticed the hard coded timeout | 16:28 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: Add dco-license job https://review.openstack.org/630302 | 16:30 |
*** pcaruana has quit IRC | 16:36 | |
*** pwhalen has joined #zuul | 16:51 | |
*** gtema has quit IRC | 16:52 | |
corvus | tobiash, pabelanger, tristanC: are you aware of this capability? it's not something we've really advertised... https://review.openstack.org/629983 | 16:53 |
tobiash | corvus: not yet, will look later | 16:54 |
corvus | i'm wondering whether we should continue to pretend that doesn't exist, or should we start to use it in some places (keeping in mind that using it will serve as an example to others). or should we jump into the deep end of the pool and make the 'parent' attribute a list and make this more convenient to use. | 16:58 |
pabelanger | corvus: ah, interesting. I don't think I've written a job like that before | 16:59 |
corvus | i don't think anyone has :) | 16:59 |
pabelanger | yah, having job with 2 parents, way cool! | 17:00 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Add a timeout for the image build https://review.openstack.org/629923 | 17:09 |
clarkb | Shrews: ^ one note on the flush location there | 17:11 |
Shrews | clarkb: that doesn't appear necessary since EOF in the normal case seems to handle that | 17:12 |
clarkb | Shrews: but the log.write(m.encode('utf8')) only happens in the error cases | 17:13 |
clarkb | er no the else is all cases | 17:13 |
clarkb | so just move the flush below the log.write in the exception handler? | 17:13 |
Shrews | clarkb: you've confused me. sorry | 17:14 |
Shrews | clarkb: i need the flush+fsync there to get that message at the end of the log file | 17:15 |
clarkb | Shrews: https://review.openstack.org/#/c/629923/2/nodepool/builder.py line 785. Is a write that happens after the flush and fsync. Shouldn't it come first? | 17:15 |
Shrews | clarkb: the normal case doesn't need that | 17:15 |
Shrews | no, it shouldn't be first because i want it as close to the end of file as possible | 17:16 |
clarkb | gotcha, it's ensuring the first process has finished writing before the nodepool builder writes that line | 17:16 |
Shrews | because in the timeout case, there can still be buffered data that is not yet written to the log | 17:16 |
Shrews | i guess technically it's not necessary since the dib process could still write to the log long after we've moved on, but it doesn't hurt either | 17:19 |
*** panda is now known as panda|off | 17:25 | |
clarkb | ya getting it close enough is probably fine. grep works and we shouldn't timeout often | 17:25 |
*** pwhalen has quit IRC | 17:28 | |
*** electrofelix has quit IRC | 17:31 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Add a timeout for the image build https://review.openstack.org/629923 | 17:32 |
Shrews | ok, so the requirements on that ^^ seem to be in flux. Do we want the timeout configurable per-image or just an overall timeout? | 17:34 |
Shrews | clarkb: tobiash: ^^^ | 17:34 |
Shrews | that last PS added an overall timeout | 17:35 |
Shrews | but I just saw the comments | 17:35 |
clarkb | Shrews: given tobiash's extreme builds we likely have to make it configurable. I don't think it has to be configurable per image if that is more work | 17:35 |
Shrews | ok, then the current PS should work | 17:36 |
clarkb | (that was more of an optimization so tobiash could say "this is the really long image build treat it differently") For openstack most of our builds are consistent and it won't matter | 17:36 |
*** rlandy is now known as rlandy|brb | 17:36 | |
clarkb | Shrews: ya that latest ps lgtm | 17:37 |
pabelanger | Hmm. I've updated my zuul-web to 3.4.0, but for some reason it isn't connecting properly to zookeeper now | 17:40 |
pabelanger | rolling back to 3.3.1 works as expected | 17:40 |
pabelanger | going to try another service, zuul-web has a lot of changes | 17:42 |
pabelanger | Oh, ha | 17:51 |
pabelanger | I think zuul-web now needs a connection to zookeeper | 17:51 |
pabelanger | no wonder it doesn't work, I have it blocked in the firewall | 17:52 |
pabelanger | let me confirm that is the fix, and I'll propose a reno change about upgrading | 17:53 |
pabelanger | Yup, that is it | 17:54 |
*** pwhalen has joined #zuul | 17:55 | |
timburke | fungi: thinking about limiting swift access to a particular subtree, you might want to look at prefix-based tempurls -- see https://docs.openstack.org/swift/latest/api/temporary_url_middleware.html | 18:00 |
fungi | timburke: i think that's what we did once upon a time | 18:01 |
timburke | wow, that's been around longer than i'd remembered... since swift 2.12... | 18:03 |
*** rlandy|brb is now known as rlandy | 18:07 | |
*** TheJulia is now known as needssleep | 18:10 | |
fungi | timburke: yeah, looks like we were using tmpurls with prefixes: https://git.zuul-ci.org/cgit/zuul/tree/zuul/configloader.py?h=9e78452#n125 | 18:10 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul master: Update docs since zuul-web requires zookeeper https://review.openstack.org/630365 | 18:12 |
pabelanger | corvus: tobiash: clarkb: ^documentation updates for zuul-web when upgrading to 3.4.0 | 18:12 |
corvus | when did that change? | 18:13 |
pabelanger | corvus: I have nodes now in web UI, let me find commit | 18:14 |
pabelanger | corvus: I think https://review.openstack.org/553979/ | 18:15 |
corvus | welp, we should have tagged zuul v4 then. | 18:16 |
corvus | that's an architectural deployment change | 18:16 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Add validate-dco-license role https://review.openstack.org/629565 | 18:17 |
pabelanger | Yah, I only caught it because zuul-web is on a different VM than the scheduler | 18:17 |
corvus | i think we should re-tag 3.3.1 and 3.4.1 | 18:17 |
corvus | er | 18:17 |
corvus | i think we should re-tag 3.3.1 as 3.4.1 | 18:17 |
pabelanger | ok | 18:18 |
corvus | because we should not be sneaking in deployment changes (like, if you need to update your firewall rules, it's a deployment change) as minor version changes | 18:18 |
pabelanger | ++ agree | 18:19 |
corvus | and then we can either revert the web->zk changes, or tag 3.4.0 as 4.0.0 | 18:19 |
corvus | (and i guess scale-out-scheduler becomes zuul v5 :) | 18:19 |
fungi | i'll have to stop talking about "zuul v3" | 18:19 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Add dco-license job https://review.openstack.org/630302 | 18:20 |
pabelanger | will defer to people, but would be okay with revert and 3.4.0 over 4.0.0 bump | 18:20 |
corvus | well, we lose the node+label stuff, unless we rework that to use gearman | 18:20 |
corvus | tristanC, tobiash, SpamapS: ^ thoughts? | 18:22 |
AJaeger | corvus, do we need https://review.openstack.org/#/c/625615/1/zuul/site-variables.yaml ? Happy to +2A, just want confirmation (or you can +2A yourself ;) | 18:22 |
*** hashar has joined #zuul | 18:23 | |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: Simplify dco-license job playbook https://review.openstack.org/630369 | 18:25 |
AJaeger | here's a zuul-jobs change for review - do we want to pin nodejs as mordred proposed? https://review.openstack.org/627823 | 18:25 |
SpamapS | 3.4 seems sufficient for an internal-only deploy change. | 18:26 |
SpamapS | I'd expect a 3.3.1 to be patches only. | 18:27 |
SpamapS | (fixes, non-changing features, etc) | 18:27 |
SpamapS | but we added some cool features, so it seems fine to say "to get them you need to update your firewalls" | 18:27 |
SpamapS | Honestly, I'd even be fine with scale-out scheduler being a minor 3.x+1 release, if it didn't change any of the external interfaces. | 18:28 |
fungi | SpamapS: well, even if you don't use those new features, you're going to need to update your firewalls if you want to continue running zuul-web, i expect? | 18:28 |
SpamapS | fungi: Yeah, that's fine with me. I'm not going to upgrade from 3.3 to 3.4 without planning and reading changelogs. | 18:29 |
fungi | fair | 18:29 |
corvus | i'm happy to be talked back from the edge... though up to this point it's generally been safe to update to 3.x without touching your deployment setup (modulo that one time where certain web configs had to change, sorry SpamapS). 3.x releases have generally been "new job features or behavior" not "new system configuration" | 18:29 |
SpamapS | (I might upgrade from 3.3.0 -> 3.3.1 without doing that, if it had an important fix) | 18:29 |
SpamapS | So, one thing might be to make the new zuul-web bits off by default. | 18:30 |
pabelanger | fungi: yah, in my case, my zuul.conf on zuul-web didn't have the zookeeper configuration either, so I needed a zuul.conf change as well | 18:30 |
pabelanger | but, that is only because I dynamically generate it with minimal content | 18:31 |
mrhillsman | let me just say the changes that have been going in from a UX perspective have been great, so kudos to all the folks doing the work ;) | 18:31 |
corvus | if i'm reading the change correctly, it is effectively required | 18:33 |
corvus | if you omit the zk hosts, it defaults to localhost | 18:33 |
corvus | so regardless, zuul-web will not start if it can't reach a zk | 18:33 |
pabelanger | yes, that is what happened to me | 18:34 |
corvus | (even though the connect invocation is wrapped in a conditional to test whether zk_hosts is set) | 18:35 |
corvus | (it's never going to fail) | 18:35 |
corvus | except... in tests? | 18:36 |
corvus | nope, even then it's used | 18:36 |
corvus | oh, ok. *some* tests use it, some don't | 18:37 |
corvus | so the only place it's possible to have a web server without a zk connection is in *some* unit tests. not in production. | 18:37 |
corvus | we usually try to run tests as in production, so we should probably change that. | 18:38 |
corvus | given that there's a default value for the zk host, i don't see an easy way to make this optional | 18:38 |
corvus | why do we have a default for the zk hosts value when we don't document that, and say that the hosts list is required? https://zuul-ci.org/docs/zuul/admin/components.html#attr-zookeeper | 18:40 |
corvus | (if we didn't have the default, i think there would be an easy way to implement what SpamapS suggested -- add the zk connection info if you want it, or don't) | 18:41 |
*** jpena is now known as jpena|off | 18:44 | |
corvus | clarkb, tobiash, mordred, pabelanger, SpamapS, fungi, Shrews, and anyone else who wants to weigh in: what should we do? A) tag 3.3.1 as 3.4.1, land the release note, then tag master as 4.0.0. B) tag 3.3.1 as 3.4.1, make the ZK stuff optional [somehow] then tag master as 3.4.2. C) land the release note, and send out the release announcement with a note about the additional requirement. | 18:45 |
clarkb | I like B if we can solve the somehow | 18:47 |
clarkb | that gives us future flexibility too | 18:47 |
fungi | would making the zk stuff optional and then tagging 3.3.2 be on the table? | 18:47 |
fungi | as in "oops, 3.3.1 was a regression, fixed in 3.3.2" | 18:48 |
corvus | fungi: 3.4.0 is already tagged, it has the new zk stuff | 18:48 |
fungi | oh, nevermind | 18:48 |
corvus | 3.3.1 is the last "safe" release | 18:48 |
clarkb | for solving the somehow, one approach could be to check if 2181 is listening on localhost; if so connect to it, else don't try to connect | 18:49 |
tobiash | I like B | 18:49 |
clarkb | (that might be too smart for its own good though) | 18:49 |
fungi | got it, so "tag 3.3.1 as 3.4.1" means "revert 3.4.0 temporarily" | 18:49 |
corvus | clarkb: i don't like an approach that fails quietly if zk is down | 18:49 |
corvus | clarkb: i think we need to know what the operator expects, and then fail loudly if they expect zk to exist but it doesn't | 18:49 |
pabelanger | there is also D, right: tag 3.3.1 as 3.4.1, revert the node+label web ui, tag 3.4.2? | 18:49 |
* tobiash is only partly around | 18:49 | |
*** dmellado has quit IRC | 18:51 | |
*** gouthamr has quit IRC | 18:51 | |
pabelanger | as much as I'd like C, I don't think that will be fair to users. So that is -1 from me | 18:51 |
corvus | B is nice too, but i need ideas on how to do it. only way i see would be to drop the undocumented default value. i'm not sure that needs a major release, but it might be nice to at least give notice and have a deprecation period. | 18:51 |
fungi | revisiting the proposed options, i like b though it makes me wonder if we have consensus/precedent on when to increment the major release version component | 18:52 |
pabelanger | +1 for B, if we can figure out how to do | 18:53 |
corvus | fungi: i'm not sure if we have consensus, but we do have precedent. it has only incremented on major architecture changes (jenkins api -> gearman -> zk/ansible) | 18:53 |
fungi | i agree version numbers are cheap/free, so if we want a new connection between two components to mean new major version, i'm okay with it | 18:53 |
corvus | though 2.5 is an asterisk :) | 18:54 |
fungi | heh | 18:54 |
fungi | the addition of a connection from zuul-web to zk doesn't feel like it's in the same class as our previous major version bumps, at least | 18:54 |
corvus | okay, so does anyone have a suggestion or how to implement B? | 18:54 |
pabelanger | if we go to 4.0.0, what else would be included feature-wise in that stream? | 18:54 |
corvus | pabelanger: i don't understand the question | 18:55 |
clarkb | corvus: one option is to make another breaking change and force people to list localhost if that is what they mean (sounds like docs already say this is how you would do it) | 18:55 |
fungi | i don't think we include anything else feature-wise in 4.0.0 aside from the node+label webui | 18:55 |
pabelanger | let me reask: would multiple ansible versions be 4.0 or 5.0? | 18:55 |
fungi | depends on when it lands and what lands before it | 18:56 |
Shrews | going to 4.0.0 seems... extreme, but i get it. if we could do B, that seems better | 18:56 |
corvus | clarkb: yeah. i'd like to do that, but it does mean breaking anyone relying on the undocumented default | 18:56 |
clarkb | does the docker-compose quickstart take advantage of that? | 18:57 |
clarkb | (trying to get a sense for how many people might actually be using it, I know openstack bmw and SpamapS all run remote zk clusters) | 18:57 |
fungi | i feel like changing an undocumented behavior (one which contradicts what the documentation claims) could be argued as a minor version bump with very thorough release note | 18:57 |
corvus | if we were planning this ahead of time, i would say "let's send an announcement that in the next release we're going to remove the undocumented default, and wait at least a week" | 18:57 |
corvus | fungi: yeah. you might be able to talk me into that. i hope so. :) | 18:58 |
SpamapS | How long has the undocumented default been out? | 18:58 |
SpamapS | Because, I'm inclined to ignore it if it's less than a few months. | 18:58 |
SpamapS | If it were documented, that'd be one thing. | 18:58 |
corvus | i think it's old. i'll check. | 18:58 |
SpamapS | But the accidental "oops this works" being replaced with "hey where'd my nodes/labels UI go?" is kind of fine with me. | 18:59 |
SpamapS | (assuming the answer is that the nodes/labels UI goes away when there's no ZK config) | 18:59 |
fungi | so amended proposal: re-tag 3.3.1 as 3.4.1, announce the coming removal of the default, then tag 3.4.0 as 3.5.0 with a clear release note explaining the situation | 19:00 |
SpamapS | I thought the nodes/labels UI bits only landed a few days ago. | 19:00 |
SpamapS | But maybe the API has been around longer. | 19:00 |
fungi | (or, you know, master as 3.5.0 really) | 19:00 |
corvus | on 2017-06-23 i changed the section it was under; but it's been there since the beginning, 2017-02-21 | 19:01 |
fungi | basically just a "oops, our bad, here's what we should have announced, we'll unwind it with another release in a week" | 19:01 |
pabelanger | fungi: so, that would only break people using localhost zookeeper with zuul scheduler, IIUC | 19:02 |
corvus | SpamapS: yeah, the nodes/labels are new. but having a zk default is old | 19:02 |
corvus | fungi: yeah, that's starting to sound like a plan | 19:02 |
fungi | i do feel like 4.0.0 is going to imply much larger architectural changes than we intend to convey, unless we start to set expectations of much more frequent major version bumps going forward | 19:03 |
corvus | clarkb: the quickstart does not rely on implicit zk hosts | 19:04 |
SpamapS | Wait, so zuul-web has been trying to connect to ZK since June? | 19:05 |
SpamapS | Or the config value was there, but unused? | 19:05 |
corvus | fungi: yeah, it sounds like we'd all probably like to be able to make this change as 3.X, but we just need to do it carefully, and reserve 4.0 for "you're going to start or stop running a new service" | 19:05 |
pabelanger | SpamapS: only after dec 29 https://review.openstack.org/553979/ | 19:06 |
corvus | SpamapS: no, zuul-web has only been trying to connect since 3.4.0 (or master a week ago). but the "[zookeeper]" section of the config file, which is valid on any host (for this reason) has had an implicit default of localhost since "forever" | 19:06 |
*** hashar has quit IRC | 19:06 | |
SpamapS | Ah | 19:06 |
SpamapS | well IMO the more important thing is whether the daemon started up and worked. | 19:06 |
corvus | heh, apparently that was 2 weeks ago, time flies | 19:06 |
fungi | i suppose we could find a way to make that default optional only for zuul-web, but that could get complicated | 19:07 |
corvus | fungi: yeah, it undercuts the work we did to unify that | 19:07 |
fungi | er, make that default nonexistent i mean | 19:07 |
fungi | i agree, it seems like a poor choice for consistency | 19:07 |
SpamapS | But I see the complication: the config file parsed fine right up until the time we started actually using the value. | 19:07 |
pabelanger | so, if zookeeper is nonexistent in zuul-web, the new nodes+label UI won't load? Sorry, getting lost mentally mapping everything | 19:08 |
SpamapS | Right now I believe if you don't set it, zuul-web will fail to start unless you have zk on localhost. | 19:09 |
corvus | pabelanger: the current behavior is that zuul-web will always attempt to connect to zk, regardless of your config, because of the implicit default. | 19:09 |
corvus | SpamapS: right | 19:09 |
SpamapS | We could change it to go ahead and catch that failure when the default is used, with a "We think you didn't mean to do this" warning in the log and release notes, and then 3.5.0 can be the first release that doesn't have the default? | 19:11 |
SpamapS | *We could change it -- in a 3.4.1 release | 19:11 |
pabelanger | corvus: ok, and it sounds like we want to remove that, which stops zuul-web from connecting to zookeeper; do we then also update the UI to not display labels / nodes if there's no zookeeper connection? Or do we get that by default (guess I should look) | 19:12 |
SpamapS | So that way 3.4.1 still starts where 3.4.0 doesn't, but I don't think it *breaks* anybody. | 19:12 |
SpamapS | Also, I'm sort of inclined to suggest that 4.0 drops any nodepool-specific API's from zuul-web, and we make a nodepool-web. | 19:12 |
SpamapS | (but that's jumping ahead) | 19:13 |
fungi | pabelanger: it seems like the simplest way forward is to not rework too much of that, roll back the release by re-tagging the old release, and then announce removal of the implicit zk config option, then tag a new release after a bit of a waiting period (so just roll forward with requiring zk for zuul-web) | 19:13 |
fungi | s/option/default/ | 19:14 |
clarkb | fungi: well if we don't rework that won't it also fail on not having zk? | 19:14 |
fungi | yes, but we will have announced that the configuration there is mandatory, like the documentation already states | 19:14 |
clarkb | ah right | 19:14 |
corvus | pabelanger: i'm not sure; we might just end up pushing the brokenness down to when someone clicks on the "labels" tab. in which case, maybe 3.4.0 with a release note saying "oops this is required" is best. or maybe combine the two approaches: re-tag 3.3.1 then release 3.5.0 in a week after an announcement that the new requirement is coming. | 19:15 |
fungi | at least for me, it seems like a reasonable balance of courtesy to users with a minimal amount of reworking the software we already have | 19:15 |
corvus | SpamapS: well, we didn't have any until 2 weeks ago.... i think a lot of folks feel like presenting a unified web view of the system is worthwhile (regardless of whether nodepool also has a web ui) | 19:15 |
pabelanger | Yah, I guess what I am trying to figure out: if we still expect 3.x to have the nodes+labels UI, that needs a firewall / config update, unless zuul-web is smart enough not to render it when there's no zookeeper connection. Otherwise, users still need to update firewalls today or when 3.x happens. | 19:17 |
pabelanger | and if so, then maybe just original option C) land reno note and ML is good enough with out changing defaults | 19:17 |
fungi | i think the firewall config update is suitable for a minor version bump with a clear release note, but that's just my opinion | 19:17 |
fungi | (an opinion SpamapS helped me to form, credit where credit is due) | 19:18 |
corvus | fungi: do you think we could do (C) -- just land a release note and announce 3.4.0 with it? | 19:18 |
fungi | have we not done a release announcement for 3.4.0 yet? | 19:19 |
corvus | nope | 19:19 |
corvus | i'm slow | 19:19 |
*** smyers has quit IRC | 19:19 | |
fungi | that's tempting. it won't be included in the repo state for that tag is my biggest worry, so visibility is mainly in the announcement | 19:19 |
pabelanger | fungi: https://review.openstack.org/630365/ should fix that, once we run reno publish again | 19:20 |
*** smyers has joined #zuul | 19:20 | |
corvus | yeah, though it will be in the website docs | 19:20 |
corvus | under the right section | 19:20 |
fungi | that's not bad | 19:20 |
tobiash | We could make zk optional and hide the labels tab just like we hide the builds tab without sql | 19:21 |
fungi | tobiash: yeah, that's what the code tries to do | 19:21 |
corvus | fungi: i don't think it tries to hide the tabs | 19:21 |
fungi | problem is we always supply a default zk server of localhost when it's not explicitly configured | 19:21 |
fungi | ahh | 19:21 |
pabelanger | yah, I think we'd need to add logic to hide tabs | 19:22 |
corvus | fungi: the main issue is that startup fails due to not connecting | 19:22 |
fungi | so the conditional there is currently more limited | 19:22 |
tobiash | We can add the has labels info in the info endpoint | 19:22 |
pabelanger | which is fine also, and removes the need for a firewall change, unless you want it | 19:22 |
corvus | if we caused it to get past that, then we'd be confronted with the fact that clicking the labels tab triggers a 500 | 19:22 |
corvus | but at least it's running :) | 19:22 |
SpamapS | zuul-web not running is, technically, hiding the tab. | 19:22 |
fungi | heh ;) | 19:22 |
pabelanger | SpamapS: touché | 19:23 |
corvus | tobiash: yes... though, tbh, i'm not sure that's something we really want to be optional. i think we'd only be considering it because of this issue; otherwise i don't think it's worth it | 19:23 |
corvus | i mean, presented with that choice, i would just say rework it to use gearman | 19:23 |
SpamapS | Does solve a lot of issues if it can just fetch via gearman. | 19:25 |
fungi | if that's the route we want to take, i'd say skip the re-tagging of 3.3.1 and just tag a revert of the feature | 19:26 |
fungi | as 3.4.1 | 19:26 |
corvus | at some point zuul-web will need to talk to zk, so it's not throwaway work. it's just surprisingly early is all. :) | 19:28 |
pabelanger | I'm okay with reverting the feature, I'm not really using it. But possible users on zuul.o.o are | 19:30 |
pabelanger | I need to step away for a few moments, but will also be EOD shortly, need to run some errands | 19:30 |
fungi | the distributed scheduler work... is the plan that it gets rid of gearman from the zuul/nodepool architecture entirely? | 19:30 |
pabelanger | I've reverted my local zuul to 3.3.1 for now | 19:31 |
corvus | okay, new poll: A) land the release note and send out an announcement with it. B) revert the feature and figure out later what to do about it (put it back with better notice, make it optional, use gearman). | 19:31 |
corvus | pabelanger: before you leave, ^ do you have a preference, and if it's B, do you strongly object to A? | 19:31 |
corvus | fungi: yes | 19:31 |
pabelanger | corvus: both are fine, B is more user friendly, but if you are okay with A, so am I | 19:32 |
corvus | fungi: so doing more stuff in zuul-web with gearman may be throwaway work. | 19:32 |
fungi | points in favor of a: it's a lot less work, it doesn't kick the can further down the road, and at least it was a minor version bump not a patchlevel increment | 19:32 |
fungi | corvus: thanks, that's why i asked, yes ;) | 19:32 |
corvus | okay, i think everyone thinks A is acceptable, if not ideal. so how about we go with that. | 19:33 |
pabelanger | sorry, should say deployer friendly | 19:33 |
fungi | i think option a isn't terrible if 3.5.0 (announced in advance) drops the implicit zk config default so that we don't fail to notice similar dependencies creeping into future releases | 19:34 |
corvus | yeah, i think we should do that. | 19:34 |
fungi | if 3.4.0 had already been announced i'd be pretty against option a, fwiw | 19:34 |
corvus | that may not force the issue, so i think we'll mostly need to be careful | 19:35 |
corvus | (because we actually recommend using the same zuul config file on all hosts) | 19:35 |
corvus | fungi: care to +3 https://review.openstack.org/630365 ? | 19:37 |
corvus | http://logs.openstack.org/65/630365/1/check/tox-docs/ff72c5d/html/releasenotes.html#relnotes-3-4-0 | 19:37 |
fungi | and done | 19:38 |
SpamapS | corvus: I mirror pabelanger's feelings. Fine with either, prefer B, but not going to do the work, so (A) is sufficient. | 19:39 |
corvus | how does this look? https://etherpad.openstack.org/p/6thJzljYMd | 19:41 |
corvus | i put it first in the list; should we call it out more strongly? | 19:42 |
*** openstackstatus has quit IRC | 19:43 | |
SpamapS | strong enough for me. | 19:43 |
fungi | yeah, i worry that doing too much else to it deviates from the notes published to the docs site | 19:43 |
fungi | if we do want to call it out more, we could mention it between the download url and the release notes introduction i guess | 19:44 |
corvus | k. when 630365 lands, i'll send that | 19:44 |
*** openstackstatus has joined #zuul | 19:45 | |
*** ChanServ sets mode: +v openstackstatus | 19:45 | |
fungi | like "be aware this release may require configuration and firewall changes (see below)" | 19:45 |
corvus | fungi: yeah, part of me wants to do that, then part of me says it would look like "please read 6 lines down" | 19:45 |
fungi | sure | 19:46 |
fungi | i think it's good | 19:46 |
fungi | i mean, if we expect people to notice it in the published release notes we should expect them to notice the same thing in the release announcement | 19:46 |
pabelanger | +1 | 19:47 |
pabelanger | on etherpad | 19:47 |
clarkb | ya lgtm | 19:47 |
*** kmalloc has joined #zuul | 19:50 | |
*** kmalloc has quit IRC | 19:51 | |
*** gouthamr has joined #zuul | 19:53 | |
pabelanger | okay, shutting down for a while. Thanks everybody for help this afternoon! | 19:55 |
*** dmellado has joined #zuul | 19:58 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Simplify dco-license job playbook https://review.openstack.org/630369 | 20:05 |
openstackgerrit | Merged openstack-infra/zuul master: Update docs since zuul-web requires zookeeper https://review.openstack.org/630365 | 20:05 |
*** openstack has joined #zuul | 20:18 | |
*** ChanServ sets mode: +o openstack | 20:18 | |
corvus | docs are published, release announcement sent; thanks everyone! | 20:42 |
*** hashar has joined #zuul | 20:43 | |
*** corvus is now known as thecount | 21:02 | |
*** thecount is now known as corvus | 21:02 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Add a timeout for the image build https://review.openstack.org/629923 | 21:14 |
*** rlandy has quit IRC | 21:31 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Add role to move docs and artifacts to log root https://review.openstack.org/629571 | 22:07 |
*** hashar has quit IRC | 22:10 | |
*** EvilienM is now known as EmilienM | 22:14 | |
*** pabelanger has quit IRC | 23:58 |