*** openstackgerrit has quit IRC | 00:34 | |
*** openstackgerrit has joined #zuul | 00:54 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add jobs graph rendering https://review.openstack.org/537869 | 00:54 |
*** rlandy has quit IRC | 01:07 | |
*** bhavikdbavishi has joined #zuul | 01:49 | |
*** bhavikdbavishi has quit IRC | 01:59 | |
*** ruffian_sheep has joined #zuul | 02:07 | |
ruffian_sheep | Hi! I want to build a third-party CI! But right now I don't know how to set up the pipelines of Jenkins job builds used in Zuul | 02:10 |
ruffian_sheep | This is the link where I'm learning how to build it: https://wiki.openstack.org/wiki/Cinder/tested-3rdParty-drivers. I don't know the exact settings for layout.yaml and project.yaml. | 02:13 |
*** sshnaidm is now known as sshnaidm|off | 02:14 | |
fungi | ahh, i wish in #openstack-infra you'd mentioned this was for an openstack third-party ci system | 02:14 |
fungi | or i wouldn't have directed you to #zuul | 02:15 |
fungi | zuul hasn't supported jenkins for roughly a year now | 02:15 |
fungi | so if you want to run zuul with jenkins you're going to be stuck running a somewhat old and unmaintained release of zuul | 02:16 |
fungi | i think 2.6.0 is likely the last release which had jenkins integration | 02:17 |
ruffian_sheep | But... the link is given by the official guidance document | 02:25 |
fungi | that's guidance from the openstack cinder project. you should talk to people involved in that project to get recommendations on what they expect from you | 02:26 |
ruffian_sheep | I think so! But... this is my third week learning how to build it, and I can't find anyone doing the same thing to answer my questions T.T | 02:32 |
ruffian_sheep | <fungi>:Do you know the channel they discuss? | 02:33 |
fungi | there is a #openstack-cinder channel where the openstack cinder team tends to be present, and also an #openstack-third-party-ci channel where some third-party ci operators tend to be | 02:34 |
ruffian_sheep | <fungi>: Thanks! Does IRC have the ability to add permanent friends? | 02:36 |
ruffian_sheep | <fungi>: ;P | 02:36 |
fungi | not really, no (some irc clients might, i don't know) | 02:36 |
ruffian_sheep | <fungi>: Alright | 02:37 |
ruffian_sheep | Hi! I want to build a third-party CI! But right now I don't know how to set up the pipelines of Jenkins job builds used in Zuul. This is the link where I'm learning how to build it: https://wiki.openstack.org/wiki/Cinder/tested-3rdParty-drivers. I don't know the exact settings for layout.yaml and project.yaml. | 02:38 |
ruffian_sheep | Sorry,send the wrong channel.. | 02:39 |
ruffian_sheep | Please ignore the message | 02:39 |
*** jiapei has joined #zuul | 03:10 | |
*** _ari_ has quit IRC | 03:21 | |
*** rcarrillocruz has joined #zuul | 03:39 | |
*** bhavikdbavishi has joined #zuul | 04:51 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add version to info endpoint https://review.openstack.org/609571 | 05:35 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add about dropdown to display zuul version https://review.openstack.org/630027 | 05:35 |
*** chkumar|out is now known as chandankumar | 05:40 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: sql: add buildset uuid https://review.openstack.org/630034 | 06:45 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/buildsets route https://review.openstack.org/630035 | 06:45 |
*** hashar has joined #zuul | 06:54 | |
openstackgerrit | Rui Chen proposed openstack-infra/zuul master: Avoid using list branches with protected=1 in github driver https://review.openstack.org/630038 | 06:55 |
*** pcaruana has joined #zuul | 07:05 | |
*** bhavikdbavishi has quit IRC | 07:10 | |
*** quiquell|off is now known as quiquell | 07:16 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: sql: add buildset uuid column https://review.openstack.org/630034 | 07:17 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/buildsets route https://review.openstack.org/630035 | 07:17 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add buildsets page https://review.openstack.org/630041 | 07:17 |
*** hashar has quit IRC | 07:18 | |
*** jiapei has left #zuul | 07:20 | |
*** gtema has joined #zuul | 07:44 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: sql: add buildset uuid column https://review.openstack.org/630034 | 07:46 |
*** gtema has quit IRC | 08:01 | |
*** gtema has joined #zuul | 08:02 | |
*** rcarrillocruz has quit IRC | 08:03 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/buildsets route https://review.openstack.org/630035 | 08:17 |
*** jpena|off is now known as jpena | 08:54 | |
*** panda|off is now known as panda | 08:55 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/buildsets route https://review.openstack.org/630035 | 09:07 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add buildsets page https://review.openstack.org/630041 | 09:07 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/buildset/{uuid} route https://review.openstack.org/630078 | 09:07 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add buildset page https://review.openstack.org/630079 | 09:07 |
*** avass has joined #zuul | 09:38 | |
avass | any way to run a job on all nodes with a specific label? | 09:40 |
panda | avass: adding the label in the nodeset and then using hosts: <label> in the playbook doesn't work? | 09:42 |
panda | does anyone know whether the files: attribute in a child job completely overrides the same attribute in the parent, or whether the two lists are merged? | 09:42 |
avass | panda: i thought that only requests one node with the label? | 09:43 |
avass | panda: i mean all nodes with a specific label in nodepool sorry | 09:43 |
panda | avass: yes, nevermind, I always confuse the label with being an alias for a group | 09:45 |
panda | avass: in fact I think the only way is to use groups | 09:46 |
avass | yeah, but that would mean setting up different labels for all nodes and then adding them to the group, wouldn't it? | 09:46 |
ssbarnea|rover | found a bug in zuul, take a look at this message which translates into a table full of SUCCESS in gerrit and a negative vote. https://s3.sbarnea.com/ss/190111-Change_Ia5bcd556_GATE_CHECK_for_TripleO__review.openstack_Code_Review_.png | 09:48 |
ssbarnea|rover | or maybe this counts as a bug in the JS extension for gerrit made for zuul. | 09:49 |
panda | avass: two nodes can have the same label, and I think it's separated from the group definition. If you set two nodes with the same label in nodes: you will need to manually add those nodes to the same group to run the playbook on both | 09:51 |
avass | panda: yes but wouldn't that mean that nodeset only requests one of the nodes from nodepool? | 09:52 |
avass | panda: or nvm I see what you mean. But that wouldn't be the same as all nodes | 09:52 |
panda | avass: nodeset.nodes.label (required) | 09:53 |
panda | The Nodepool label for the node. Zuul will request a node with this label. | 09:53 |
panda | two nodes with the same label will be two nodes anyway | 09:53 |
avass | panda: yes, but I have static nodes with jobs that need to be run on all of them at the same time. only adding a label in nodepool when setting up another node is nicer than having to reconfigure the job | 09:55 |
avass | panda: meaning I don't want to run a job on a set number of nodes but all of the nodes with a nodepool label | 09:56 |
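A minimal sketch of the nodeset panda describes above: two nodes requested with the same Nodepool label, manually collected into a group so a playbook can address them together. This still requests a fixed number of nodes, which is why it does not cover avass's "every node carrying a label" case; all names here are illustrative.

```yaml
- nodeset:
    name: two-static-nodes          # illustrative name
    nodes:
      - name: node1
        label: my-static-label      # both nodes request the same Nodepool label
      - name: node2
        label: my-static-label
    groups:
      - name: static-pool           # playbooks can then use "hosts: static-pool"
        nodes:
          - node1
          - node2
```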
*** avass is now known as avass|lunch | 10:07 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix test_load_governor on large machines https://review.openstack.org/630118 | 10:08 |
tobiash | avass: that's not possible atm. You would need to remove them from nodepool and use add_host in the job. Then you can run stuff on all of them. But still you would need to find a way to query that list of nodes somewhere | 10:15 |
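A rough sketch of the add_host workaround tobiash mentions, assuming the static machines are kept out of Nodepool and the job obtains their addresses some other way; the my_static_hosts variable and the login user are invented for illustration, not an existing Zuul or Nodepool facility.

```yaml
- hosts: localhost
  tasks:
    - name: Add each externally-known static machine to an ad-hoc inventory group
      add_host:
        name: "{{ item }}"
        groups: all_static_nodes
        ansible_user: zuul            # assumed login user
      loop: "{{ my_static_hosts }}"   # hypothetical job variable listing the machines

- hosts: all_static_nodes
  tasks:
    - name: Run the actual work on every one of them at once
      command: uname -a
```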
openstackgerrit | Andriy Shevchenko proposed openstack/pbrx master: Updatae home-page https://review.openstack.org/630132 | 10:21 |
*** ruffian_sheep has quit IRC | 10:37 | |
*** electrofelix has joined #zuul | 10:48 | |
*** avass|lunch is now known as avass | 10:53 | |
avass | tobiash: alright | 10:53 |
*** hashar has joined #zuul | 10:55 | |
avass | tobiash: do you think it would be a lot of work to implement it? might work on it later | 10:56 |
jkt | my zuul/web/static/ contains just the .keep file after I `pip install`, what am I doing wrong? Do I need some additional packages to build the web dashboard? | 10:58 |
*** gtema has quit IRC | 11:01 | |
tobiash | avass: I'm not sure how this would fit into the current architecture of nodepool | 11:01 |
avass | ah, I see | 11:01 |
avass | would have been a nice thing to have to set up static nodes | 11:03 |
jkt | ah, right, if I'm installing from git directly, I need yarn. | 11:20 |
*** hashar has quit IRC | 11:29 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix test_load_governor on large machines https://review.openstack.org/630118 | 11:39 |
*** cmurphy is now known as cmorpheus | 12:05 | |
*** dkehn has quit IRC | 12:10 | |
*** rcarrillocruz has joined #zuul | 12:35 | |
*** hashar has joined #zuul | 12:35 | |
*** jpena is now known as jpena|lunch | 12:38 | |
*** rcarrillocruz has quit IRC | 12:40 | |
*** gtema has joined #zuul | 12:49 | |
*** rcarrillocruz has joined #zuul | 13:10 | |
*** rlandy has joined #zuul | 13:18 | |
openstackgerrit | Sorin Sbarnea proposed openstack-infra/zuul-jobs master: Avoid zuul DISK_FULL failure with too big logs https://review.openstack.org/630224 | 13:37 |
*** jpena|lunch is now known as jpena | 13:44 | |
openstackgerrit | Sorin Sbarnea proposed openstack-infra/zuul-jobs master: Avoid zuul DISK_FULL failure with too big logs https://review.openstack.org/630224 | 13:46 |
*** dkehn has joined #zuul | 13:55 | |
fungi | ssbarnea|rover: yeah, what you're seeing is a combination of 1. zuul not having an external log link to supply because the job was aborted prior to uploading its logs, and 2. the "hideci" javascript overlay in use on openstack's gerrit server not knowing how to deal with it | 13:57 |
fungi | i know we'd discussed in the past having a way to at least archive and report a log link for the build's console log even if it exceeded storage on the executor. the current implementation where the disk accountant as a separate thread aborts jobs out-of-band makes that a bit challenging to implement, i think | 13:58 |
dmsimard | fungi: perhaps the take away could be that the executor should not be storing the files in transit at all ? | 14:43 |
fungi | well, it does some various things to them while stored on the executor which would need to happen elsewhere i suppose | 14:44 |
fungi | dmsimard: also, the one safeguard we do have currently to curtail filling up logservers is the executor disk accountant thread, which operates on the transient build dir | 14:46 |
fungi | if we copy those files straight through to the destination, we likely need some other way to figure out how much we're about to copy | 14:47 |
dmsimard | what I had implemented a while back for RDO's jenkins things is that the logserver had a ssh key embedded in the image (not unlike the current infra root keys) and the log server would pull the logs instead | 14:47 |
fungi | yeah, we're also hoping to soon get away from having an actual logserver. i doubt swift containers have a mechanism to pull data from some other source on demand | 14:48 |
dmsimard | yeah, swift makes this somewhat of a no-go | 14:48 |
dmsimard | The ideal situation is if we're able to either pull once or push once, right now we're pulling and then pushing which costs us bandwidth, storage and performance | 14:51 |
fungi | the previous iteration of logs-on-swift under zuul v2 did have the job nodes upload their logs directly to the swift api, though we had to take extra care to only authorize them for access to their job's log subtree | 14:52 |
fungi | and much of the challenge was around generating and distributing credentials for that | 14:53 |
dmsimard | yeah, security is why we had the log server pull the logs instead of having the nodes push them | 14:54 |
fungi | also, aggregating artifacts/logs through the executor does, i think, provide increased flexibility in supporting multiple storage solutions | 14:55 |
dmsimard | pros and cons :) | 14:56 |
jkt | how does nodepool actually connect to the nodes when using the static provider? | 15:14 |
jkt | I see that I have to register the nodes' SSH server pubkeys, but the docs do not say anything about how to configure the nodes themselves | 15:15 |
jkt | presumably, I should somehow add some SSH pubkey to my target user's home dir, but the docs do not specify which key is used for this by nodepool/zuul | 15:16 |
jkt | is that the same zuul's SSH key as used for talking to Gerrit by chance? | 15:16 |
Shrews | jkt: you may find https://zuul-ci.org/docs/zuul/admin/nodepool_static.html useful | 15:18 |
Shrews | jkt: that's part of the Zuul From Scratch guide https://zuul-ci.org/docs/zuul/admin/zuul-from-scratch.html | 15:19 |
jkt | Shrews: thanks, I read that one, and I haven't found my answer in there | 15:21 |
jkt | Shrews: it is probably executor.private_key_file, right? | 15:21 |
*** quiquell is now known as quiquell|off | 15:22 | |
avass | it sets up temporary ssh keys during the pre-job using the master ssh-key | 15:23 |
avass | jkt: and yes it should be that file | 15:23 |
jkt | let me check that playbook file, thanks | 15:24 |
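For context, a hedged sketch of a nodepool.yaml static-driver entry of the kind jkt is setting up; the hostname, label, and key are placeholders, and the attribute names should be checked against the Nodepool version in use. Nodepool itself never logs into the machine; the Zuul executor does, using the key configured as executor.private_key_file in zuul.conf, so that key's public half must be in the node user's ~/.ssh/authorized_keys.

```yaml
labels:
  - name: my-static-label

providers:
  - name: static-provider
    driver: static
    pools:
      - name: main
        nodes:
          - name: static01.example.com        # placeholder hostname
            labels:
              - my-static-label
            username: zuul                     # account the executor logs in as
            host-key: "ssh-ed25519 AAAA..."    # the node's SSH host key, as jkt notes
```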
*** hashar has quit IRC | 15:28 | |
corvus | dmsimard, fungi: i think we can have the executor store a few lines of the ansible log in the database in these cases. | 15:48 |
*** avass has quit IRC | 15:49 | |
fungi | makes sense. at least knowing which task was in progress when the DISK_FULL result was determined would have helped speed up diagnosis | 15:50 |
fungi | granted that's still not a huge help in the particular case we ran into, with a job which normally produces 85mb of logs when it succeeds, but wanted to archive 7gb of logs during an infrequent failure case | 15:51 |
fungi | but mordred has suggested in #openstack-infra that perhaps running a remote du and then not collecting the logs from the node if over a set threshold would provide sufficient detail to diagnose the cause | 15:52 |
fungi | (and collect at least a summary of the du info, perhaps by just echoing in the console log) | 15:53 |
mordred | yeah - thinking that could go into fetch-output | 15:54 |
dmsimard | what was the failure scenario ? would it be something that could've been handled by an ansible rescue block ? | 15:55 |
dmsimard | like fetch-output fails, rescue -> do something helpful | 15:55 |
mordred | dmsimard: right now no - right now it's that the disk accounting protector on the zuul executor killed things because the job had used too much executor disk space | 15:55 |
mordred | dmsimard: but that's sort of what we're talking about as a potential mitigation so that we can avoid hitting the kill-job-too-much-disk safety net | 15:56 |
dmsimard | could the fetch output role look at the size of the artifacts it needs to pull before pulling them ? | 15:57 |
mordred | dmsimard: that's what we were just talking about doing - yeah | 15:57 |
dmsimard | oh, hadn't read the entire thing | 15:57 |
dmsimard | caffeine-- | 15:57 |
mordred | corvus: if we did go that direction - perhaps we could expose zuul's disk usage thresholds in a zuul variable so that the role could have the default disk threshold be set from the executor threshold? | 16:00 |
mordred | although - with all of that - I need to AFK for a bit ... y'all have fun | 16:00 |
tobiash | jkt: nodepool only gets the public keys, it doesn't login to the nodes | 16:00 |
corvus | fungi, mordred: yeah, the two approaches are not mutually exclusive. :) i don't see a problem with exposing the threshold variable | 16:02 |
fungi | mordred: the math still gets interesting, because we're pulling from multiple nodes but have one threshold to work with. i guess the role could tally up the du from all of them before pulling from any of them? | 16:05 |
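A sketch of the mitigation being discussed here, not the actual fetch-output role: measure the staged logs on each node before pulling them, record the measurement in the console log, and skip the copy if it exceeds a threshold. The max_log_size_bytes variable and the zuul-output/logs path are assumptions for illustration, and the per-build tally across multiple nodes that fungi mentions would need an extra aggregation step.

```yaml
- name: Measure the size of logs staged on the node
  command: du -sb {{ ansible_user_dir }}/zuul-output/logs
  register: log_du

- name: Record the measurement in the console log
  debug:
    msg: "Staged logs on {{ inventory_hostname }}: {{ log_du.stdout }}"

- name: Pull logs back to the executor only if they fit under the threshold
  synchronize:
    mode: pull
    src: "{{ ansible_user_dir }}/zuul-output/logs/"
    dest: "{{ zuul.executor.log_root }}/"
  when: (log_du.stdout.split()[0] | int) < (max_log_size_bytes | default(1073741824))
```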
jkt | tobiash: yup, I am now aware that it's actually zuul-executor, and its default config happens to reuse the "main" key as used for connecting to Gerrit | 16:06 |
*** evrardjp has quit IRC | 16:10 | |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: Add dco-license job https://review.openstack.org/630302 | 16:11 |
*** evrardjp has joined #zuul | 16:11 | |
pabelanger | fungi: ttx: ^ is the follow-up for pushing up the dco-license job; testing is now from the ansible-network zuul | 16:11 |
corvus | pabelanger: you were asking for a zuul release.... should we release 2fd6883 as 3.4.0? | 16:12 |
corvus | tobiash, clarkb: ^? | 16:13 |
pabelanger | corvus: yes, that looks fine to me | 16:13 |
tobiash | corvus: is that what you are running atm? | 16:13 |
corvus | tobiash: yeah | 16:14 |
tobiash | lgtm | 16:14 |
tobiash | nothing really important between that and current master | 16:14 |
corvus | there's a small behavior change for the archive url artifacts, but i think it's forward-compatible enough to have it span releases | 16:15 |
tobiash | corvus: while we're at release partying, does anything speak against a gear release? That would enable us to introduce the client side keepalive in zuul. | 16:16 |
corvus | tobiash: oh thanks, yeah we can do that too.... though since gear is a library, let's do that monday? :) | 16:17 |
tobiash | monday is fine | 16:17 |
tobiash | thanks | 16:17 |
clarkb | corvus: fine with me. The github improvements would be good to publish | 16:19 |
clarkb | on the nodepool side of things we may want to get Shrews's image build timeouts in and tested with openstack before releasing. I think that may be an important one given recent errors with image building in openstack land | 16:19 |
Shrews | yeah, i'm working on validating the dib log output is the same before i'm comfortable with it | 16:20 |
Shrews | should know shortly | 16:20 |
tobiash | clarkb, Shrews: regarding the build timeout, I think it should be configurable. We have one image that regularly already takes 8-10 hours... | 16:22 |
tobiash | (it's a 500gb image) | 16:22 |
clarkb | tobiash: ++ | 16:24 |
corvus | zuul 3.4.0 pushed | 16:24 |
tobiash | \o/ | 16:24 |
clarkb | tobiash: I haven't had a chance to review it yet, but is on my list of things to try and get to today | 16:26 |
*** EmilienM is now known as EvilienM | 16:26 | |
tobiash | clarkb: you mean the image build timeout? | 16:27 |
clarkb | yes | 16:27 |
tobiash | I just had a quick look at the first version yesterday when it was wip'ed and noticed the hard coded timeout | 16:28 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: Add dco-license job https://review.openstack.org/630302 | 16:30 |
*** pcaruana has quit IRC | 16:36 | |
*** pwhalen has joined #zuul | 16:51 | |
*** gtema has quit IRC | 16:52 | |
corvus | tobiash, pabelanger, tristanC: are you aware of this capability? it's not something we've really advertised... https://review.openstack.org/629983 | 16:53 |
tobiash | corvus: not yet, will look later | 16:54 |
corvus | i'm wondering whether we should continue to pretend that doesn't exist, or should we start to use it in some places (keeping in mind that using it will serve as an example to others). or should we jump into the deep end of the pool and make the 'parent' attribute a list and make this more convenient to use. | 16:58 |
pabelanger | corvus: ah, interesting. I don't think I've written a job like that before | 16:59 |
corvus | i don't think anyone has :) | 16:59 |
pabelanger | yah, having job with 2 parents, way cool! | 17:00 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Add a timeout for the image build https://review.openstack.org/629923 | 17:09 |
clarkb | Shrews: ^ one note on the flush location there | 17:11 |
Shrews | clarkb: that doesn't appear necessary since EOF in the normal case seems to handle that | 17:12 |
clarkb | Shrews: but the log.write(m.encode('utf8')) only happens in the error cases | 17:13 |
clarkb | er no the else is all cases | 17:13 |
clarkb | so just move the flush below the log.write in the exception handler? | 17:13 |
Shrews | clarkb: you've confused me. sorry | 17:14 |
Shrews | clarkb: i need the flush+fsync there to get that message at the end of the log file | 17:15 |
clarkb | Shrews: https://review.openstack.org/#/c/629923/2/nodepool/builder.py line 785. Is a write that happens after the flush and fsync. Shouldn't it come first? | 17:15 |
Shrews | clarkb: the normal case doesn't need that | 17:15 |
Shrews | no, it shouldn't be first because i want it as close to the end of file as possible | 17:16 |
clarkb | gotcha, it's ensuring the first process has finished writing before the nodepool builder writes that line | 17:16 |
Shrews | because in the timeout case, there can still be buffered data that is not yet written to the log | 17:16 |
Shrews | i guess technically it's not necessary since the dib process could still write to the log long after we've moved on, but it doesn't hurt either | 17:19 |
*** panda is now known as panda|off | 17:25 | |
clarkb | ya getting it close enough is probably fine. grep works and we shouldn't timeout often | 17:25 |
*** pwhalen has quit IRC | 17:28 | |
*** electrofelix has quit IRC | 17:31 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Add a timeout for the image build https://review.openstack.org/629923 | 17:32 |
Shrews | ok, so the requirements on that ^^ seem to be in flux. Do we want the timeout configurable per-image or just an overall timeout? | 17:34 |
Shrews | clarkb: tobiash: ^^^ | 17:34 |
Shrews | that last PS added an overall timeout | 17:35 |
Shrews | but I just saw the comments | 17:35 |
clarkb | Shrews: given tobiash's extreme builds we likely have to make it configurable. I don't think it has to be configurable per image if that is more work | 17:35 |
Shrews | ok, then the current PS should work | 17:36 |
clarkb | (that was more of an optimization so tobiash could say "this is the really long image build treat it differently") For openstack most of our builds are consistent and it won't matter | 17:36 |
*** rlandy is now known as rlandy|brb | 17:36 | |
clarkb | Shrews: ya that latest ps lgtm | 17:37 |
pabelanger | Hmm. I've updated my zuul-web to 3.4.0, but for some reason it isn't connecting properly to zookeeper now | 17:40 |
pabelanger | rolling back to 3.3.1 works as expected | 17:40 |
pabelanger | going to try another service, zuul-web has a lot of changes | 17:42 |
pabelanger | Oh, ha | 17:51 |
pabelanger | I think zuul-web now needs a connection to zookeeper | 17:51 |
pabelanger | no wonder it doesn't work, I have it blocked in the firewall | 17:52 |
pabelanger | let me confirm that is the fix, and I'll propose a reno change about upgrading | 17:53 |
pabelanger | Yup, that is it | 17:54 |
*** pwhalen has joined #zuul | 17:55 | |
timburke | fungi: thinking about limiting swift access to a particular subtree, you might want to look at prefix-based tempurls -- see https://docs.openstack.org/swift/latest/api/temporary_url_middleware.html | 18:00 |
fungi | timburke: i think that's what we did once upon a time | 18:01 |
timburke | wow, that's been around longer than i'd remembered... since swift 2.12... | 18:03 |
*** rlandy|brb is now known as rlandy | 18:07 | |
*** TheJulia is now known as needssleep | 18:10 | |
fungi | timburke: yeah, looks like we were using tmpurls with prefixes: https://git.zuul-ci.org/cgit/zuul/tree/zuul/configloader.py?h=9e78452#n125 | 18:10 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul master: Update docs since zuul-web requires zookeeper https://review.openstack.org/630365 | 18:12 |
pabelanger | corvus: tobiash: clarkb: ^documentation updates for zuul-web when upgrading to 3.4.0 | 18:12 |
corvus | when did that change? | 18:13 |
pabelanger | corvus: I have nodes now in web UI, let me find commit | 18:14 |
pabelanger | corvus: I think https://review.openstack.org/553979/ | 18:15 |
corvus | welp, we should have tagged zuul v4 then. | 18:16 |
corvus | that's an architectural deployment change | 18:16 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Add validate-dco-license role https://review.openstack.org/629565 | 18:17 |
pabelanger | Yah, I only caught it because zuul-web is on a different VM than the scheduler | 18:17 |
corvus | i think we should re-tag 3.3.1 and 3.4.1 | 18:17 |
corvus | er | 18:17 |
corvus | i think we should re-tag 3.3.1 as 3.4.1 | 18:17 |
pabelanger | ok | 18:18 |
corvus | because we should not be sneaking in deployment changes (like, if you need to update your firewall rules, it's a deployment change) as minor version changes | 18:18 |
pabelanger | ++ agree | 18:19 |
corvus | and then we can either revert the web->zk changes, or tag 3.4.0 as 4.0.0 | 18:19 |
corvus | (and i guess scale-out-scheduler becomes zuul v5 :) | 18:19 |
fungi | i'll have to stop talking about "zuul v3" | 18:19 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Add dco-license job https://review.openstack.org/630302 | 18:20 |
pabelanger | will defer to people, but would be okay with revert and 3.4.0 over 4.0.0 bump | 18:20 |
corvus | well, we lose the node+label stuff, unless we rework that to use gearman | 18:20 |
corvus | tristanC, tobiash, SpamapS: ^ thoughts? | 18:22 |
AJaeger | corvus, do we need https://review.openstack.org/#/c/625615/1/zuul/site-variables.yaml ? Happy to +2A, just want confirmation (or you can +2A yourself ;) | 18:22 |
*** hashar has joined #zuul | 18:23 | |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: Simplify dco-license job playbook https://review.openstack.org/630369 | 18:25 |
AJaeger | here's a zuul-jobs change for review - do we want to pin nodejs as mordred proposed? https://review.openstack.org/627823 | 18:25 |
SpamapS | 3.4 seems sufficient for an internal-only deploy change. | 18:26 |
SpamapS | I'd expect a 3.3.1 to be patches only. | 18:27 |
SpamapS | (fixes, non-changing features, etc) | 18:27 |
SpamapS | but we added some cool features, so it seems fine to say "to get them you need to update your firewalls" | 18:27 |
SpamapS | Honestly, I'd even be fine with scale-out scheduler being a minor 3.x+1 release, if it didn't change any of the external interfaces. | 18:28 |
fungi | SpamapS: well, even if you don't use those new features, you're going to need to update your firewalls if you want to continue running zuul-web, i expect? | 18:28 |
SpamapS | fungi: Yeah, that's fine with me. I'm not going to upgrade from 3.3 to 3.4 without planning and reading changelogs. | 18:29 |
fungi | fair | 18:29 |
corvus | i'm happy to be talked back from the edge... though up to this point it's generally been safe to update to 3.x without touching your deployment setup (modulo that one time where certain web configs had to change, sorry SpamapS). 3.x releases have generally been "new job features or behavior" not "new system configuration" | 18:29 |
SpamapS | (I might upgrade from 3.3.0 -> 3.3.1 without doing that, if it had an important fix) | 18:29 |
SpamapS | So, one thing might be to make the new zuul-web bits off by default. | 18:30 |
pabelanger | fungi: yah, in my case, my zuul.conf on zuul-web didn't have the zookeeper configuration either, so I needed a zuul.conf change as well | 18:30 |
pabelanger | but, that is only because I dynamically generate it with minimal content | 18:31 |
mrhillsman | let me just say the changes that have been going in from a UX perspective have been great, so kudos to all the folks doing the work ;) | 18:31 |
corvus | if i'm reading the change correctly, it is effectively required | 18:33 |
corvus | if you omit the zk hosts, it defaults to localhost | 18:33 |
corvus | so regardless, zuul-web will not start if it can't reach a zk | 18:33 |
pabelanger | yes, that is what happened to me | 18:34 |
corvus | (even though the connect invocation is wrapped in a conditional to test whether zk_hosts is set) | 18:35 |
corvus | (it's never going to fail) | 18:35 |
corvus | except... in tests? | 18:36 |
corvus | nope, even then it's used | 18:36 |
corvus | oh, ok. *some* tests use it, some don't | 18:37 |
corvus | so the only place it's possible to have a web server without a zk connection is in *some* unit tests. not in production. | 18:37 |
corvus | we usually try to run tests as in production, so we should probably change that. | 18:38 |
corvus | given that there's a default value for the zk host, i don't see an easy way to make this optional | 18:38 |
corvus | why do we have a default for the zk hosts value when we don't document that, and say that the hosts list is required? https://zuul-ci.org/docs/zuul/admin/components.html#attr-zookeeper | 18:40 |
corvus | (if we didn't have the default, i think there would be an easy way to implement what SpamapS suggested -- add the zk connection info if you want it, or don't) | 18:41 |
*** jpena is now known as jpena|off | 18:44 | |
corvus | clarkb, tobiash, mordred, pabelanger, SpamapS, fungi, Shrews, and anyone else who wants to weigh in: what should we do? A) tag 3.3.1 as 3.4.1, land the release note, then tag master as 4.0.0. B) tag 3.3.1 as 3.4.1, make the ZK stuff optional [somehow] then tag master as 3.4.2. C) land the release note, and send out the release announcement with a note about the additional requirement. | 18:45 |
clarkb | I like B if we can solve the somehow | 18:47 |
clarkb | that gives us future flexibility too | 18:47 |
fungi | would making the zk stuff optional and then tagging 3.3.2 be on the table? | 18:47 |
fungi | as in "oops, 3.3.1 was a regression, fixed in 3.3.2" | 18:48 |
corvus | fungi: 3.4.0 is already tagged, it has the new zk stuff | 18:48 |
fungi | oh, nevermind | 18:48 |
corvus | 3.3.1 is the last "safe" release | 18:48 |
clarkb | for solving the somehow, one approach could be to check if 2181 is listening on localhost; if so connect to it, else don't try to connect | 18:49 |
tobiash | I like B | 18:49 |
clarkb | (that might be too smart for its own good though) | 18:49 |
fungi | got it, so "tag 3.3.1 as 3.4.1" means "revert 3.4.0 temporarily" | 18:49 |
corvus | clarkb: i don't like an approach that fails quietly if zk is down | 18:49 |
corvus | clarkb: i think we need to know what the operator expects, and then fail loudly if they expect zk to exist but it doesn't | 18:49 |
pabelanger | there is also D, right: tag 3.3.1 as 3.4.1, revert the node+label web ui, tag 3.4.2? | 18:49 |
* tobiash is only partly around | 18:49 | |
*** dmellado has quit IRC | 18:51 | |
*** gouthamr has quit IRC | 18:51 | |
pabelanger | as much as I'd like C, I don't think that will be fair to users. So that is -1 from me | 18:51 |
corvus | B is nice too, but i need ideas on how to do it. only way i see would be to drop the undocumented default value. i'm not sure that needs a major release, but it might be nice to at least give notice and have a deprecation period. | 18:51 |
fungi | revisiting the proposed options, i like b though it makes me wonder if we have consensus/precedent on when to increment the major release version component | 18:52 |
pabelanger | +1 for B, if we can figure out how to do | 18:53 |
corvus | fungi: i'm not sure if we have consensus, but we do have precedent. it has only incremented on major architecture changes (jenkins api -> gearman -> zk/ansible) | 18:53 |
fungi | i agree version numbers are cheap/free, so if we want a new connection between two components to mean new major version, i'm okay with it | 18:53 |
corvus | though 2.5 is an asterisk :) | 18:54 |
fungi | heh | 18:54 |
fungi | the addition of a connection from zuul-web to zk doesn't feel like it's in the same class as our previous major version bumps, at least | 18:54 |
corvus | okay, so does anyone have a suggestion or how to implement B? | 18:54 |
pabelanger | if we go to 4.0.0, what else would be included feature-wise in that stream? | 18:54 |
corvus | pabelanger: i don't understand the question | 18:55 |
clarkb | corvus: one option is to make another breaking change and force people to list localhost if that is what they mean (sounds like docs already say this is how you would do it) | 18:55 |
fungi | i don't think we include anything else feature-wise in 4.0.0 aside from the node+label webui | 18:55 |
pabelanger | let me reask: would multiple ansible versions be 4.0 or 5.0? | 18:55 |
fungi | depends on when it lands and what lands before it | 18:56 |
Shrews | going to 4.0.0 seems... extreme, but i get it. if we could do B, that seems better | 18:56 |
corvus | clarkb: yeah. i'd like to do that, but it does mean breaking anyone relying on the undocumented default | 18:56 |
clarkb | does the docker-compose quickstart take advantage of that? | 18:57 |
clarkb | (trying to get a sense for how many people might actually be using it, I know openstack bmw and SpamapS all run remote zk clusters) | 18:57 |
fungi | i feel like changing an undocumented behavior (one which contradicts what the documentation claims) could be argued as a minor version bump with very thorough release note | 18:57 |
corvus | if we were planning this ahead of time, i would say "let's send an announcement that in the next release we're going to remove the undocumented default, and wait at least a week" | 18:57 |
corvus | fungi: yeah. you might be able to talk me into that. i hope so. :) | 18:58 |
SpamapS | How long has the undocumented default been out? | 18:58 |
SpamapS | Because, I'm inclined to ignore it if it's less than a few months. | 18:58 |
SpamapS | If it were documented, that'd be one thing. | 18:58 |
corvus | i think it's old. i'll check. | 18:58 |
SpamapS | But the accidental "oops this works" being replaced with "hey where'd my nodes/labels UI go?" is kind of fine with me. | 18:59 |
SpamapS | (assuming the answer is that the nodes/labels UI goes away when there's no ZK config) | 18:59 |
fungi | so amended proposal: re-tag 3.3.1 as 3.4.1, announce the coming removal of the default, then tag 3.4.0 as 3.5.0 with a clear release note explaining the situation | 19:00 |
SpamapS | I thought the nodes/labels UI bits only landed a few days ago. | 19:00 |
SpamapS | But maybe the API has been around longer. | 19:00 |
fungi | (or, you know, master as 3.5.0 really) | 19:00 |
corvus | on 2017-06-23 i changed the section it was under; but it's been there since the beginning, 2017-02-21 | 19:01 |
fungi | basically just a "oops, our bad, here's what we should have announced, we'll unwind it with another release in a week" | 19:01 |
pabelanger | fungi: so, that would only break people using localhost zookeeper with zuul scheduler, IIUC | 19:02 |
corvus | SpamapS: yeah, the nodes/labels are new. but having a zk default is old | 19:02 |
corvus | fungi: yeah, that's starting to sound like a plan | 19:02 |
fungi | i do feel like 4.0.0 is going to imply much larger architectural changes than we intend to convey, unless we start to set expectations of much more frequent major version bumps going forward | 19:03 |
corvus | clarkb: the quickstart does not rely on implicit zk hosts | 19:04 |
SpamapS | Wait, so zuul-web has been trying to connect to ZK since June? | 19:05 |
SpamapS | Or the config value was there, but unused? | 19:05 |
corvus | fungi: yeah, it sounds like we'd all probably like to be able to make this change as 3.X, but we just need to do it carefully, and reserve 4.0 for "you're going to start or stop running a new service" | 19:05 |
pabelanger | SpamapS: only after dec 29 https://review.openstack.org/553979/ | 19:06 |
corvus | SpamapS: no, zuul-web has only been trying to connect since 3.4.0 (or master a week ago). but the "[zookeeper]" section of the config file, which is valid on any host (for this reason) has had an implicit default of localhost since "forever" | 19:06 |
*** hashar has quit IRC | 19:06 | |
SpamapS | Ah | 19:06 |
SpamapS | well IMO the more important thing is whether the daemon started up and worked. | 19:06 |
corvus | heh, apparently that was 2 weeks ago, time flies | 19:06 |
fungi | i suppose we could find a way to make that default optional only for zuul-web, but that could get complicated | 19:07 |
corvus | fungi: yeah, it undercuts the work we did to unify that | 19:07 |
fungi | er, make that default nonexistent i mean | 19:07 |
fungi | i agree, it seems like a poor choice for consistency | 19:07 |
SpamapS | But I see the complication: the config file parsed fine right up until the time we started actually using the value. | 19:07 |
pabelanger | so, if zookeeper is nonexistent in zuul-web, the new nodes+label UI won't load? Sorry, getting lost mentally mapping everything | 19:08 |
SpamapS | Right now I believe if you don't set it, zuul-web will fail to start unless you have zk on localhost. | 19:09 |
corvus | pabelanger: the current behavior is that zuul-web will always attempt to connect to zk, regardless of your config, because of the implicit default. | 19:09 |
corvus | SpamapS: right | 19:09 |
SpamapS | We could change it to go ahead and catch that failure when the default is used, with a "We think you didn't mean to do this" warning in the log and release notes, and then 3.5.0 can be the first release that doesn't have the default? | 19:11 |
SpamapS | *We could change it -- in a 3.4.1 release | 19:11 |
pabelanger | corvus: ok, and it sounds like we want to remove that, which stops zuul-web from connecting to zookeeper; do we then also update the UI to not display labels / nodes if there's no zookeeper connection? Or do we get that by default (guess I should look) | 19:12 |
SpamapS | So that way 3.4.1 still starts where 3.4.0 doesn't, but I don't think it *breaks* anybody. | 19:12 |
SpamapS | Also, I'm sort of inclined to suggest that 4.0 drops any nodepool-specific API's from zuul-web, and we make a nodepool-web. | 19:12 |
SpamapS | (but that's jumping ahead) | 19:13 |
fungi | pabelanger: it seems like the simplest way forward is to not rework too much of that, roll back the release by re-tagging the old release, and then announce removal of the implicit zk config option, then tag a new release after a bit of a waiting period (so just roll forward with requiring zk for zuul-web) | 19:13 |
fungi | s/option/default/ | 19:14 |
clarkb | fungi: well if we don't rework that won't it also fail on not having zk? | 19:14 |
fungi | yes, but we will have announced that the configuration there is mandatory, like the documentation already states | 19:14 |
clarkb | ah right | 19:14 |
corvus | pabelanger: i'm not sure; we might just end up pushing the brokenness down to when someone clicks on the "labels" tab. in which case, maybe 3.4.0 with a release note saying "oops this is required" is best. or maybe combine the two approaches: re-tag 3.3.1 then release 3.5.0 in a week after an announcement that the new requirement is coming. | 19:15 |
fungi | at least for me, it seems like a reasonable balance of courtesy to users with a minimal amount of reworking the software we already have | 19:15 |
corvus | SpamapS: well, we didn't have any until 2 weeks ago.... i think a lot of folks feel like presenting a unified web view of the system is worthwhile (regardless of whether nodepool also has a web ui) | 19:15 |
pabelanger | Yah, I guess what I am trying to figure out: if we still expect 3.x to have the nodes+labels UI, that needs a firewall / config update, unless zuul-web is smart enough not to render it when there's no zookeeper connection. Otherwise, users still need to update firewalls today or when 3.x happens. | 19:17 |
pabelanger | and if so, then maybe just original option C) land reno note and ML is good enough with out changing defaults | 19:17 |
fungi | i think the firewall config update is suitable for a minor version bump with a clear release note, but that's just my opinion | 19:17 |
fungi | (an opinion SpamapS helped me to form, credit where credit is due) | 19:18 |
corvus | fungi: do you think we could do (C) -- just land a release note and announce 3.4.0 with it? | 19:18 |
fungi | have we not done a release announcement for 3.4.0 yet? | 19:19 |
corvus | nope | 19:19 |
corvus | i'm slow | 19:19 |
*** smyers has quit IRC | 19:19 | |
fungi | that's tempting. it won't be included in the repo state for that tag is my biggest worry, so visibility is mainly in the announcement | 19:19 |
pabelanger | fungi: https://review.openstack.org/630365/ should fix that, once we run reno publish again | 19:20 |
*** smyers has joined #zuul | 19:20 | |
corvus | yeah, though it will be in the website docs | 19:20 |
corvus | under the right section | 19:20 |
fungi | that's not bad | 19:20 |
tobiash | We could make zk optional and hide the labels tab just like we hide the builds tab without sql | 19:21 |
fungi | tobiash: yeah, that's what the code tries to do | 19:21 |
corvus | fungi: i don't think it tries to hide the tabs | 19:21 |
fungi | problem is we always supply a default zk server of localhost when it's not explicitly configured | 19:21 |
fungi | ahh | 19:21 |
pabelanger | yah, I think we'd need to add logic to hide tabs | 19:22 |
corvus | fungi: the main issue is that startup fails due to not connecting | 19:22 |
fungi | so the conditional there is currently more limited | 19:22 |
tobiash | We can add the has labels info in the info endpoint | 19:22 |
pabelanger | which is fine also, and removes the need for a firewall change, unless you want it | 19:22 |
corvus | if we caused it to get past that, then we'd be confronted with the fact that clicking the labels tab triggers a 500 | 19:22 |
corvus | but at least it's running :) | 19:22 |
SpamapS | zuul-web not running is, technically, hiding the tab. | 19:22 |
fungi | heh ;) | 19:22 |
pabelanger | SpamapS: touché | 19:23 |
corvus | tobiash: yes... though, tbh, i'm not sure that's something we really want to be optional. i think we'd only be considering it because of this issue; otherwise i don't think it's worth it | 19:23 |
corvus | i mean, presented with that choice, i would just say rework it to use gearman | 19:23 |
SpamapS | Does solve a lot of issues if it can just fetch via gearman. | 19:25 |
fungi | if that's the route we want to take, i'd say skip the re-tagging of 3.3.1 and just tag a revert of the feature | 19:26 |
fungi | as 3.4.1 | 19:26 |
corvus | at some point zuul-web will need to talk to zk, so it's not throwaway work. it's just surprisingly early is all. :) | 19:28 |
pabelanger | I'm okay with reverting the feature, I'm not really using it. But possible users on zuul.o.o are | 19:30 |
pabelanger | I need to step away for a few moments, but will also be EOD shortly, need to run some errands | 19:30 |
fungi | the distributed scheduler work... is the plan that it gets rid of gearman from the zuul/nodepool architecture entirely? | 19:30 |
pabelanger | I've reverted my local zuul to 3.3.1 for now | 19:31 |
corvus | okay, new poll: A) land the release note and send out an announcement with it. B) revert the feature and figure out later what to do about it (put it back with better notice, make it optional, use gearman). | 19:31 |
corvus | pabelanger: before you leave, ^ do you have a preference, and if it's B, do you strongly object to A? | 19:31 |
corvus | fungi: yes | 19:31 |
pabelanger | corvus: both are fine, B is more user friendly, but if you are okay with A, so am I | 19:32 |
corvus | fungi: so doing more stuff in zuul-web with gearman may be throwaway work. | 19:32 |
fungi | points in favor of a: it's a lot less work, it doesn't kick the can further down the road, and at least it was a minor version bump not a patchlevel increment | 19:32 |
fungi | corvus: thanks, that's why i asked, yes ;) | 19:32 |
corvus | okay, i think everyone thinks A is acceptable, if not ideal. so how about we go with that. | 19:33 |
pabelanger | sorry, should say deployer friendly | 19:33 |
fungi | i think option a isn't terrible if 3.5.0 (announced in advance) drops the implicit zk config default so that we don't fail to notice similar dependencies creeping into future releases | 19:34 |
corvus | yeah, i think we should do that. | 19:34 |
fungi | if 3.4.0 had already been announced i'd be pretty against option a, fwiw | 19:34 |
corvus | that may not force the issue, so i think we'll mostly need to be careful | 19:35 |
corvus | (because we actually recommend using the same zuul config file on all hosts) | 19:35 |
corvus | fungi: care to +3 https://review.openstack.org/630365 ? | 19:37 |
corvus | http://logs.openstack.org/65/630365/1/check/tox-docs/ff72c5d/html/releasenotes.html#relnotes-3-4-0 | 19:37 |
fungi | and done | 19:38 |
SpamapS | corvus: I mirror pabelanger's feelings. Fine with either, prefer B, but not going to do the work, so (A) is sufficient. | 19:39 |
corvus | how does this look? https://etherpad.openstack.org/p/6thJzljYMd | 19:41 |
corvus | i put it first in the list; should we call it out more strongly? | 19:42 |
*** openstackstatus has quit IRC | 19:43 | |
SpamapS | strong enough for me. | 19:43 |
fungi | yeah, i worry that doing too much else to it deviates from the notes published to the docs site | 19:43 |
fungi | if we do want to call it out more, we could mention it between the download url and the release notes introduction i guess | 19:44 |
corvus | k. when 630365 lands, i'll send that | 19:44 |
*** openstackstatus has joined #zuul | 19:45 | |
*** ChanServ sets mode: +v openstackstatus | 19:45 | |
fungi | like "be aware this release may require configuration and firewall changes (see below)" | 19:45 |
corvus | fungi: yeah, part of me wants to do that, then part of me says it would look like "please read 6 lines down" | 19:45 |
fungi | sure | 19:46 |
fungi | i think it's good | 19:46 |
fungi | i mean, if we expect people to notice it in the published release notes we should expect them to notice the same thing in the release announcement | 19:46 |
pabelanger | +1 | 19:47 |
pabelanger | on etherpad | 19:47 |
clarkb | ya lgtm | 19:47 |
*** kmalloc has joined #zuul | 19:50 | |
*** kmalloc has quit IRC | 19:51 | |
*** gouthamr has joined #zuul | 19:53 | |
pabelanger | okay, shutting down for a while. Thanks everybody for help this afternoon! | 19:55 |
*** dmellado has joined #zuul | 19:58 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Simplify dco-license job playbook https://review.openstack.org/630369 | 20:05 |
openstackgerrit | Merged openstack-infra/zuul master: Update docs since zuul-web requires zookeeper https://review.openstack.org/630365 | 20:05 |
*** openstack has joined #zuul | 20:18 | |
*** ChanServ sets mode: +o openstack | 20:18 | |
corvus | docs are published, release announcement sent; thanks everyone! | 20:42 |
*** hashar has joined #zuul | 20:43 | |
*** corvus is now known as thecount | 21:02 | |
*** thecount is now known as corvus | 21:02 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Add a timeout for the image build https://review.openstack.org/629923 | 21:14 |
*** rlandy has quit IRC | 21:31 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Add role to move docs and artifacts to log root https://review.openstack.org/629571 | 22:07 |
*** hashar has quit IRC | 22:10 | |
*** EvilienM is now known as EmilienM | 22:14 | |
*** pabelanger has quit IRC | 23:58 |