*** elyezer has quit IRC | 00:35 | |
*** elyezer has joined #zuul | 00:37 | |
*** elyezer has quit IRC | 00:47 | |
*** elyezer has joined #zuul | 00:48 | |
*** elyezer has quit IRC | 01:39 | |
*** elyezer has joined #zuul | 01:41 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Reorgnize the output bundle split https://review.openstack.org/545706 | 02:04 |
mordred | SpamapS: is remaining az capacity available to an api user? | 02:08 |
mordred | SpamapS: ah- I understand now (read more scrollback) | 02:15 |
*** dkranz has quit IRC | 02:30 | |
*** rlandy has quit IRC | 02:52 | |
dmsimard | mordred, andreaf: yes, actually I did release ara 0.14.6 to fix just that: https://github.com/openstack/ara/commit/9b962c5dcb77dc3e7687f75030ea41de30a54746 | 04:50 |
dmsimard | I fixed it from the perspective of ara but I still feel it's a bug. I remember discussing it with #ansible-devel and it was supposed to be fixed in 2.4.3 iirc | 04:52 |
tobiash | SpamapS: we should probably make the az stickiness configurable per provider | 05:08 |
tobiash | In my cloud this also has these disadvantages | 05:09 |
tobiash | And in my cloud inter-az bandwidth (which probably was a reason for az stickiness) is a non-issue for me | 05:10 |
tobiash | Especially in smaller clouds this is even problematic as it can prevent nodes from being created regardless of quota | 05:12 |
tristanC | tobiash: shouldn't we make az stickiness part of the nodeset definition instead? | 05:12 |
tobiash | That depends strongly on the cloud and its inter az bandwidth | 05:13 |
tobiash | So I think this should be primarily set by nodepool as the user usually doesn't have the knowledge to judge that | 05:14 |
tristanC | tobiash: doesn't it change network configuration when nodes aren't in the same az? | 05:16 |
*** elyezer has quit IRC | 05:18 | |
*** elyezer has joined #zuul | 05:21 | |
tristanC | iiuc, a cross-az nodeset won't necessarily be in the same tenant network, and maybe some jobs explicitly need that | 05:27 |
tobiash | tristanC: no it doesn't necessarily | 05:34 |
tobiash | The networks used are also defined by the provider | 05:35 |
*** bhavik1 has joined #zuul | 05:52 | |
*** bhavik1 has quit IRC | 06:00 | |
*** threestrands has quit IRC | 06:10 | |
*** threestrands has joined #zuul | 06:23 | |
SpamapS | So, I'd say that cross-az-stickiness should be a boolean, defaulting to true. | 06:48 |
SpamapS | If you know your cloud doesn't need it, and you find yourself wishing it wasn't a thing, you can turn it off. | 06:48 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: angular controller for status and stream page https://review.openstack.org/545755 | 06:48 |
SpamapS | For me, cross-AZ bandwidth is the same as intra-AZ bandwidth when the HV's are in different racks. | 06:49 |
SpamapS | And the networks are all more or less the same.. cross-AZ may go through one extra router, but that may happen intra-AZ too if the nodes are on different subnets. | 06:49 |
*** threestrands has quit IRC | 07:00 | |
*** AJaeger has quit IRC | 07:00 | |
*** AJaeger has joined #zuul | 07:04 | |
tobiash | SpamapS: yeah, it should default to true to keep the current behavior | 07:06 |
tristanC | tobiash: thanks for the ansible-2.4 port, it seems to be working fine | 07:10 |
tobiash | tristanC: yeah, it worked in my testenv and I'm going to roll this out on production this week :) | 07:11 |
tobiash | This is essential for me to get real windows jobs running as with ansible 2.3 almost every win_* module is broken with py3... | 07:12 |
tristanC | we also have some playbooks that are >2.4 only, hopefully we'll deploy your patch soon too | 07:16 |
tobiash | tristanC: we'll land it shortly after 3.0 release | 07:16 |
SpamapS | which is hopefully.. shortly. :) | 07:18 |
* SpamapS has been missing meetings.. not sure what's left | 07:18 | |
tristanC | too bad, ansible-2.4 support is also blocking fedora packages | 07:19 |
SpamapS | What's the reasoning behind delaying the bump after release? | 07:19 |
SpamapS | 2.4 breaks old stuff? | 07:19 |
SpamapS | (Does kind of feel like 2.4 was overly aggressive, and introduced some new annoying, meaningless warnings) | 07:20 |
tobiash | SpamapS: the openstack release process is critical and roughly at the same time as the 3.0 release | 07:21 |
tobiash | ansible 2.4 is a high risk to disrupt the openstack release so it will be merged after that | 07:22 |
AJaeger | zuul and nodepool cores, ianw and myself fixed the integration tests, could you review the stacks starting at https://review.openstack.org/#/c/545163/ and https://review.openstack.org/#/c/545158/ , please? | 07:25 |
*** elyezer has quit IRC | 07:27 | |
*** elyezer has joined #zuul | 07:31 | |
SpamapS | tobiash: ah yeah that makes more sense actually. :) | 07:42 |
SpamapS | I kind of think we're going to have to support multiple ansible versions at some point. | 07:43 |
SpamapS | btw, my AZ problems show nicely here http://paste.openstack.org/show/677466/ | 07:43 |
SpamapS | need 10 nodes, have 11, building 6 (min-ready is 15) | 07:45 |
*** chrnils has joined #zuul | 07:47 | |
*** sshnaidm|off has quit IRC | 08:35 | |
*** electrofelix has joined #zuul | 08:35 | |
*** jpena|off is now known as jpena | 08:43 | |
*** sshnaidm has joined #zuul | 08:44 | |
openstackgerrit | Antoine Musso proposed openstack-infra/zuul master: WIP: ensure that Change.number is a string https://review.openstack.org/545768 | 09:04 |
*** sshnaidm has quit IRC | 09:07 | |
*** sshnaidm has joined #zuul | 09:08 | |
*** hashar_ has joined #zuul | 09:24 | |
openstackgerrit | Antoine Musso proposed openstack-infra/zuul master: WIP: ensure that Change.number is a string https://review.openstack.org/545768 | 09:46 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: Tenant config dynamic loading of project from external script https://review.openstack.org/535878 | 09:46 |
*** elyezer has quit IRC | 09:58 | |
*** elyezer has joined #zuul | 09:59 | |
*** kmalloc has joined #zuul | 10:14 | |
openstackgerrit | Matthieu Huin proposed openstack-infra/zuul master: zuul web: add admin endpoint, enqueue & autohold commands https://review.openstack.org/539004 | 10:19 |
*** tosky has joined #zuul | 10:29 | |
*** fbo has quit IRC | 10:40 | |
*** dmellado has quit IRC | 12:26 | |
*** elyezer has quit IRC | 12:30 | |
*** elyezer has joined #zuul | 12:35 | |
andreaf | corvus: I've done some tests in https://review.openstack.org/#/c/545696 with group_vars | 12:35 |
andreaf | corvus: from what I can see devstack_localrc is merged between controller group vars and generic vars (from parent job) | 12:37 |
andreaf | corvus: but not between subnode group vars and generic vars (from parent job) | 12:37 |
andreaf | corvus: http://logs.openstack.org/28/545728/4/check/devstack-multinode/f2a68a6/compute1/logs/_.localrc_auto.txt | 12:38 |
andreaf | corvus: could you have a look into that? | 12:38 |
*** jpena is now known as jpena|lunch | 12:39 | |
*** sshnaidm has quit IRC | 12:42 | |
*** dmellado has joined #zuul | 13:07 | |
*** sshnaidm has joined #zuul | 13:25 | |
*** jpena|lunch is now known as jpena | 13:31 | |
*** rlandy has joined #zuul | 13:32 | |
*** rlandy is now known as rlandy|rover | 13:33 | |
*** dmellado has quit IRC | 13:38 | |
*** JasonCL has joined #zuul | 13:41 | |
*** JasonCL has quit IRC | 13:46 | |
*** JasonCL has joined #zuul | 13:46 | |
*** JasonCL has quit IRC | 13:49 | |
*** JasonCL has joined #zuul | 13:50 | |
*** JasonCL has quit IRC | 14:10 | |
*** dmellado has joined #zuul | 14:44 | |
*** sshnaidm has quit IRC | 14:46 | |
*** sshnaidm has joined #zuul | 14:47 | |
*** dmellado has quit IRC | 15:02 | |
*** tosky_ has joined #zuul | 15:02 | |
*** tosky has quit IRC | 15:03 | |
*** elyezer has quit IRC | 15:11 | |
*** JasonCL has joined #zuul | 15:13 | |
-openstackstatus- NOTICE: Zuul has been restarted to pick up latest memory fixes. Queues were saved however patches uploaded after 14:40UTC may have been missed. Please recheck if needed. | 15:17 | |
*** swest has quit IRC | 15:17 | |
*** swest has joined #zuul | 15:19 | |
*** elyezer has joined #zuul | 15:24 | |
corvus | andreaf: that's strange, i wouldn't expect them to be merged at all -- i'd expect one to take precedence over the other | 15:25 |
andreaf | corvus: oh.. is it? so what I was trying to do would not work - i.e. define devstack_conf in vars and then extend it in group vars | 15:27 |
corvus | SpamapS, pabelanger: we shouldn't need a custom logging configuration to answer questions like SpamapS is asking, so let's figure out what log lines we need to switch to info in order to at least answer basic questions like "is nodepool doing anything" | 15:27 |
pabelanger | ++ | 15:28 |
andreaf | corvus i.e. https://review.openstack.org/#/c/545696/10/.zuul.yaml | 15:28 |
andreaf | corvus: yeah I think you are right they are basically overriding each other... the subnode only gets what's defined in group vars for the subnode | 15:29 |
andreaf | corvus: any idea on how to achieve what I'm trying to do? I don't really want to repeat the whole localrc settings for every node group, I only want to add what's relevant | 15:31 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Remove .json suffix from web routes https://review.openstack.org/537010 | 15:31 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Port per-change status to zuul-web https://review.openstack.org/535903 | 15:31 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Re-enable test of returning 404 on unknown tenant https://review.openstack.org/545644 | 15:31 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Add /info and /{tenant}/info route to zuul-web https://review.openstack.org/537011 | 15:31 |
corvus | so we could leave this the way it is, which basically means the job author would have to be careful about whether they put a given variable in 'vars', 'host_vars', or 'group_vars'. or we could merge variables inside zuul and only use host vars when writing the inventory. | 15:33 |
mordred | SpamapS: yes, I agree - and I have some thoughts on doing that but have been holding off on them until post zuul release | 15:33 |
AJaeger | zuul and nodepool cores, ianw and myself fixed the integration tests, could you review the stacks starting at https://review.openstack.org/#/c/545163/ and https://review.openstack.org/#/c/545158/ , please? | 15:34 |
mordred | corvus, tobiash, tristanC: when you get a chance, could you review the four patches above? (the first four of the javascript topic) I'd like to get them in and deployed (or get them rejected) so that the live draft changes later in the stack will work as we review those patches | 15:35 |
mordred | andreaf, corvus, SpamapS: also - I just got a patch landed to keystoneauth and released last week that allows us to tell keystoneauth to split its logging across four more specific loggers ... which was all in support of being able to have *some* keystoneauth debug logging in the default nodepool logging config but not *all* of it | 15:35 |
mordred | andreaf: sorry - I mentioned the wrong people | 15:36 |
mordred | pabelanger: ^^ | 15:36 |
corvus | andreaf: so i think if we kept this the way we have it now, you'd have to stop using vars at all, and only use group_vars, which means putting the devstack_localrc in both group_vars sections. | 15:37 |
mordred | pabelanger, corvus, SpamapS: https://docs.openstack.org/keystoneauth/latest/using-sessions.html#logging FWIW - I'll get us a patch up so that we can take advantage of that | 15:37 |
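For reference, a minimal sketch of what that logging split enables: dialing one of the new keystoneauth sub-loggers down while keeping the rest of the package's debug output. The sub-logger name used below is an assumption for illustration; the real names are listed on the keystoneauth sessions doc linked above.

```python
import logging

# Keep general keystoneauth debug output available in the debug log...
logging.getLogger('keystoneauth').setLevel(logging.DEBUG)

# ...but quiet the chattiest sub-logger. 'keystoneauth.session.body' is
# an illustrative name -- check the linked docs for the loggers actually
# introduced by the split.
logging.getLogger('keystoneauth.session.body').setLevel(logging.WARNING)
```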
*** JasonCL has quit IRC | 15:38 | |
*** JasonCL has joined #zuul | 15:38 | |
corvus | mordred: yeah, that will be nice to have in our debug logs, but i think it's too low-level for the default log (or what SpamapS is asking for) | 15:38 |
andreaf | corvus: I don't think that's very maintainable, unless we use vars to define a base set of devstack settings, and then extend the dictionaries in group vars | 15:38 |
andreaf | corvus not sure we can really do that though | 15:39 |
corvus | andreaf: right, so if that's the goal, we either need to rethink the whole devstack_localrc dictionary thing (maybe split it out so that not everything is in one dictionary), or merge vars, host_vars, and group_vars internal to zuul. | 15:40 |
corvus | mordred: thoughts ^? | 15:40 |
mordred | corvus: hrm. I'm not crazy about merging vars, host_vars and group_vars internal to zuul - that seems like starting to reimplement ansible internals | 15:41 |
corvus | mordred: well, ansible *doesn't* merge them, so... | 15:41 |
andreaf | corvus: we could have variables with different names and merge them into the write_devstack_conf module | 15:41 |
corvus | andreaf: yep, that's a possibility | 15:42 |
Shrews | mordred: oh, that logging split is nice | 15:42 |
*** JasonCL has quit IRC | 15:43 | |
mordred | corvus, andreaf: well, it *can*: http://docs.ansible.com/ansible/latest/intro_configuration.html#hash-behaviour | 15:43 |
andreaf | corvus, mordred: however I fear that would make a poor user interface | 15:43 |
andreaf | mordred: yeah it's configured not to by default right | 15:43 |
mordred | but that's potentially much more widely reaching | 15:43 |
mordred | and I think has a greater chance to confuse people | 15:44 |
corvus | mordred: oh neat, i didn't know that. that does change things. :) | 15:44 |
corvus | we could expose that option, and let the devstack job use it | 15:44 |
mordred | corvus: it does- but I think that might break people's roles or playbooks if they're not expecting it - so maybe doing some sort of merge specific to the zuul variables in zuul is fine here | 15:45 |
mordred | corvus: oh - that's an idea | 15:45 |
mordred | corvus: that way it doesn't mess up variables in roles or playbooks where people aren't expecting it, but devstack can opt-in | 15:46 |
corvus | for that matter, if we merged them internal to zuul, we could also make that a per-job flag i think. | 15:47 |
mordred | yah | 15:47 |
AJaeger | corvus: and that would apply to all used roles and playbooks? So, could still break some... | 15:49 |
AJaeger | tricky ;( | 15:49 |
corvus | to summarize: merging internally to zuul is probably the simplest in some ways (we don't add any new zuul options), but the reason not to is because we think someone might prefer the override behavior between host_vars and group_vars at some point, and it alters a standard behavior of ansible. | 15:49 |
corvus | while exposing the hash_behavior option is easy to implement and allows a normal and valid use of ansible. we just need to tell people about that option when this case comes up. | 15:50 |
andreaf | mordred, corvus: I think the best would be to be able to set the behaviour of specific variables | 15:51 |
corvus | AJaeger: yes it could -- but that's true anytime you use that option with ansible. so, as the docs say, don't use it unless you really need it. | 15:51 |
andreaf | mordred, corvus if we remove the test matrix we will have to set the list of services and that's going to be dicts that we don't want to merge | 15:52 |
corvus | AJaeger: i don't anticipate that would be a substantial problem in practice. much of the time, we don't even need hostvars. | 15:52 |
mordred | luckily, since it would be a job variable, we can try it and make sure none of the base roles will be borked | 15:52 |
corvus | andreaf: the services aren't lists, they are dict entries, specifically so that we can merge them. | 15:52 |
mordred | corvus: well, it affects all variable precedence - so it's possible it could introduce a merge where otherwise a replace would have happened | 15:52 |
mordred | even if host_vars/group_vars aren't present | 15:52 |
mordred | corvus: I agree, I doubt it'll be an issue for us - but it's still *possible* that something in one of the base roles would not be hash-merge clean | 15:53 |
corvus | mordred: oh. that might be a bummer for site vars. | 15:53 |
mordred | yah. I think we want to double-check that it doesn't allow site vars to have local values merged into them | 15:54 |
andreaf | corvus: yes I know they are dicts - I guess it depends if we want controllers and subnodes to disable what they don't need | 15:54 |
andreaf | corvus or to define which services they do need | 15:54 |
andreaf | corvus, mordred: well maybe I should just duplicate devstack_localrc and everything would be easier | 15:55 |
andreaf | corvus, mordred: would it be possible in the job definition to set the shared part of the dict in a shared_devstack_localrc variable, and then set devstack_localrc in group vars as the merge of shared plus a custom part? | 15:57 |
corvus | andreaf: yeah, we could merge them in the role | 15:59 |
andreaf | corvus, mordred: can we use jinja2 in the job definition? for instance devstack_localrc: "{{ shared_devstack_localrc |combine(my_devstack_localrc) }}" | 16:07 |
corvus | andreaf: i don't believe that will work -- zuul isn't going to interpolate it, and i don't think ansible does so for inventory files | 16:09 |
mordred | corvus: it doesn't - but it does for vars files in roles - so it's _possible_ it'll do late interpolation | 16:12 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Simplify launcher cleanup worker https://review.openstack.org/545867 | 16:12 |
mordred | corvus, andreaf: which is to say - I agree with corvus, I think it won't work ... BUT, it's totally worth trying :) | 16:14 |
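For reference, a rough sketch (not the actual devstack role code) of the kind of merge being discussed: combining a shared devstack_localrc dict with a group-specific one, with the group-specific entries winning on conflict. This is essentially what the Jinja2 combine filter andreaf suggests above would do in one expression. Keys shown are purely illustrative.

```python
def merge_localrc(shared, group_specific):
    """Merge two flat devstack_localrc dicts; group-specific entries win."""
    merged = dict(shared)
    merged.update(group_specific)
    return merged

# Illustrative values only.
shared = {'DATABASE_PASSWORD': 'secret', 'SERVICE_TIMEOUT': '90'}
subnode_only = {'SOME_SUBNODE_SETTING': 'True'}
print(merge_localrc(shared, subnode_only))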
andreaf | mordred corvus another question - will zuul merge dicts between group_vars in parent and group_vars in children jobs? | 16:15 |
corvus | andreaf: yes | 16:15 |
tobiash | mordred: am I right that your per change status change only uses the change number? | 16:16 |
tobiash | mordred: I'm asking because github has no global change numbers | 16:16 |
andreaf | corvus ok so if base multinode defines services for controller and services for subnode, a child job can selectively add/remove services on controller and on subnode | 16:16 |
corvus | andreaf: yep, that should work. | 16:17 |
tobiash | mordred: if I'm right this won't work for github | 16:18 |
mordred | tobiash: good point - yes, I believe you are correct about that | 16:23 |
mordred | tobiash: we probably should add support for filtering by project+change too, yeah? | 16:23 |
tobiash | mordred: let me check what the 'id' field is in github | 16:24 |
andreaf | mordred, corvus you never know, it may work https://review.openstack.org/#/c/545696/11/.zuul.yaml | 16:24 |
tobiash | mordred: ok, the id field is something like this: "id": "6,a672a19d197779e4d5410cb25775ee373cf47269" | 16:25 |
tobiash | and you're checking the whole string right? | 16:25 |
mordred | yah | 16:25 |
tobiash | assuming the git hash is unique enough this should work equally well in github :) | 16:26 |
mordred | cool. so there is a gh value that will work - it might not be immediately guessable by an end-user like the gerrit one is | 16:26 |
tobiash | not guessable but the intention is probably reporting this via status_url right? | 16:27 |
mordred | tobiash: but if we're mostly expecting this to get called as the result of links we're putting places, that should be good enough for now - and we can leave figuring out how to get someone from "ansible/ansible/pulls/6" to status/change/6,a672a19d197779e4d5410cb25775ee373cf47269 later | 16:27 |
*** JasonCL has joined #zuul | 16:27 | |
mordred | tobiash: yup | 16:27 |
mordred | however - I think at some point having a dashboard page for a change that doesn't change as the change is revised would be nice - perhaps something like 'change/review.openstack.org/501040' and '/change/github.com/ansible/ansible/6' - so that the key is similar to what's used in depends-on lines? | 16:30 |
corvus | ++ | 16:31 |
mordred | but that dashboard page could use the ansible/ansible/6 to look up 6,a672a19d197779e4d5410cb25775ee373cf47269 and then get the status from status/change/6,a672a19d197779e4d5410cb25775ee373cf47269 | 16:31 |
mordred | (and in fact could potentially look up older builds/status for that change too and show info about them from the build history while showing live status or completed build info for the current change) | 16:32 |
SpamapS | corvus: to be clear, I log at debug level. The only indication I get that nodes are spinning up is this: 2018-02-18 01:04:45,639 DEBUG nodepool.NodeLauncher-0000005071: Waiting for server ded9e08e-311c-4c98-8b1f-1d516148aed9 for node id: 0000005071 | 16:32 |
SpamapS | actually I also maybe just missed this one 2018-02-18 01:04:30,230 INFO nodepool.NodeLauncher-0000005071: Creating server with hostname k7-a-0000005071 in a from image centos7-kolla for node id: 0000005071 | 16:33 |
corvus | SpamapS: oh, what notification are you missing? | 16:33 |
SpamapS | But I'd swear sometimes I don't see those until just before the request completes | 16:33 |
*** JasonCL has quit IRC | 16:33 | |
SpamapS | and really, I think the logging is ok. I am just overwhelmed with logs | 16:34 |
SpamapS | (Info has the creating message, and that should be plenty) | 16:34 |
SpamapS | The real problem is the AZ allocation. | 16:34 |
SpamapS | I haven't checked my math, but from what I can tell, each node added to a nodeset makes it *far far* harder to have enough ready nodes in the chosen AZ ever. | 16:35 |
tobiash | mordred: +2 with comment on https://review.openstack.org/#/c/545644 | 16:36 |
SpamapS | Because everything is chosen at random, min-ready may not spin up any nodes in the AZ you need next, but if there are nodeset_size - 1 nodes in that AZ already, then your job queues while a node spins up. | 16:36 |
SpamapS | And then min-ready may come along and not get enough nodes in that AZ again.. and the cycle repeats until you get a request which happens to pull a node from the very-full AZ. | 16:37 |
SpamapS | So yeah.. I think the simplest thing to do is let people turn off the nodeset-in-one-AZ for a particular cloud. | 16:38 |
mordred | tobiash: that's a good point ... | 16:38 |
corvus | mordred: when did those 4 patches become the first in the series? i thought 538099 was the first? | 16:39 |
corvus | mordred: oh, i think you explained that in the commit msg for 537010... | 16:40 |
*** jpena is now known as jpena|brb | 16:42 | |
dmsimard | mordred: replied to your email on zuul-discuss with some additional thoughts | 16:43 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Use a status code to detect unknown vs. missing tenant https://review.openstack.org/545879 | 16:45 |
mordred | tobiash: ^^ how about if we did that? | 16:45 |
tobiash | mordred: looks good, but what do you think about just sending the complete payload with 404? | 16:50 |
SpamapS | corvus: so yeah just to reset.. I think my logging problem was actually that I should have been watching my INFO-only log and I'd have seen the activity just fine. | 16:50 |
tobiash | that way the response still would be json (consistent and also machine readable if you don't have the status code like in a shell pipe) | 16:50 |
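For reference, a minimal aiohttp sketch of what tobiash is suggesting: return the JSON payload with a 404 status so callers get the status code while shell pipelines still get machine-readable output. Handler name and tenant lookup are illustrative, not the actual zuul-web code.

```python
from aiohttp import web

async def tenant_status(request):
    tenant = request.match_info['tenant']
    if tenant not in request.app['known_tenants']:  # illustrative lookup
        # Body is still JSON, but the status code tells callers the
        # tenant is unknown rather than merely empty.
        return web.json_response(
            {'error': 'unknown tenant', 'tenant': tenant}, status=404)
    return web.json_response({'tenant': tenant, 'status': []})
```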
SpamapS | but my "why are my jobs always queueing for 5 minutes" problem is with AZ stickyness meeting large nodeset sizes. | 16:51 |
mordred | tobiash: wfm - one sec | 16:51 |
corvus | SpamapS: ok | 16:51 |
SpamapS | The latter I intend to submit a patch for. | 16:51 |
clarkb | fwiw I'm not a huge fan of the random az by default behavior in nodepool and think we should use the default az unless a list of az's is specified | 16:51 |
SpamapS | Which would allow turning it off per provider for those who know the cloud that is pointed at should be fine with cross-AZ jobs. | 16:52 |
clarkb | (and aiui default could be nova giving you whatever it wants to give you) | 16:52 |
dmsimard | mordred, andreaf: btw just in case you missed my reply for this issue http://eavesdrop.openstack.org/irclogs/%23zuul/%23zuul.2018-02-18.log.html#t2018-02-18T17:43:57 .. see my answer here: http://eavesdrop.openstack.org/irclogs/%23zuul/%23zuul.2018-02-19.log.html#t2018-02-19T04:50:41 | 16:52 |
SpamapS | clarkb: agreed. I didn't know we did that by default. | 16:52 |
SpamapS | Typically clouds will configure their default AZ to be the one that is most empty. | 16:52 |
clarkb | SpamapS: yes it surprised me when we broke citycloud due to this behavior | 16:52 |
SpamapS | In GoDaddy's case, we catually require explicit AZ because we did routed networks before neutron did them. ;) | 16:53 |
SpamapS | catually and actually. | 16:53 |
* SpamapS meows | 16:53 | |
clarkb | SpamapS: in that case you could use different providers for each az | 16:54 |
clarkb | except now we won't weight the requests by max-server size anymore right? | 16:54 |
corvus | clarkb: pool, not provider | 16:54 |
clarkb | so maybe that doesn't help | 16:54 |
clarkb | oh right pool. But still I'm not sure it helps unless we weigh the requests based on max-server size | 16:55 |
corvus | what is max-server size? | 16:55 |
clarkb | the max-servers allowed to be booted by each pool/provider | 16:56 |
corvus | clarkb: pool again | 16:56 |
clarkb | so that you could give the smaller az a smaller max-servers value and it would be weighed against the whole. This is how old nodepool worked but I think with the new request mechanism we don't do that anymore | 16:56 |
corvus | clarkb: why would that be necessary? | 16:56 |
clarkb | corvus: because aiui SpamapS' | 16:56 |
clarkb | er | 16:57 |
clarkb | my understanding of SpamapS' problem is that he has two az's, one smaller than the other. When the small one gets a large node request, say for 5 nodes, it will often block waiting for resources to clear up | 16:57 |
clarkb | if we weighed handling of requests the bigger az would be more likely to handle that request until they would both be in a situation to block and then it doesn't matter | 16:57 |
*** JasonCL has joined #zuul | 16:58 | |
mordred | tobiash: actually, the exception there doesn't take that parameter at all - and takes code and message explicitly | 16:58 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Use a status code to detect unknown vs. missing tenant https://review.openstack.org/545879 | 17:01 |
corvus | clarkb, SpamapS: yeah, there are a lot of improvements that can be made to the launchers if they are aware of each other. right now, the algorithm assumes no information sharing between them. i think it's worth evolving it in the future to share info, but it needs to be done carefully. | 17:01 |
clarkb | another way to approach this is to let the cloud put instances in whichever AZ is most appropriate. Though sounds like that won't entirely work for godaddy's cloud | 17:01 |
clarkb | I do think it is worth figuring out if we can remove the random az by default behavior in nodepool as it is surprising to users of openstack clouds imo | 17:01 |
corvus | clarkb: i believe the reason we always specify an az is because otherwise we can't guarantee that we'll get nodes in the same az. | 17:02 |
clarkb | yes, but it is also not how all the existing clients and tools work if you don't specify an az so you will get different potentially unexpected behavior | 17:02 |
clarkb | (I know I hadn't expected that when we ran into it) | 17:02 |
corvus | clarkb: sure, but, i mean, if this existed we wouldn't have had to write nodepool. :) | 17:03 |
corvus | let's make this argument less circular | 17:03 |
corvus | we need to (in some cases) create 2 nodes that are guaranteed to be in the same az | 17:03 |
corvus | is there another way to do that? | 17:03 |
clarkb | instruct the users that if they have more than one AZ to configure them? | 17:04 |
clarkb | I think this is a case where ^ is less harm than booting instances in AZs you shouldn't be booting in | 17:04 |
clarkb | I have yet to find a cloud where the default az is actually more than one az | 17:04 |
clarkb | even though it is theoretically possible | 17:04 |
corvus | clarkb: can you ask the cloud what the default az is? | 17:05 |
clarkb | I am not sure | 17:05 |
mordred | no | 17:05 |
corvus | that would be the simplest solution. perhaps we should start the very long process of adding that as an openstack api feature? | 17:06 |
mordred | well - it's not really ... it would be exceptionally hard to do - because an AZ isn't actually a thing | 17:06 |
mordred | so, with regions, you can query the keystone catalog - because region is an aspect of the endpoint registration record | 17:07 |
mordred | but azs exist independently in nova, neutron and cinder | 17:07 |
corvus | mordred: well, we only care about regions in nova, so it could be a nova api call | 17:07 |
corvus | er | 17:08 |
corvus | az not region | 17:08 |
clarkb | worth pointing out that az foo in nova may require volumes from az bar in cinder and a different combo of az's won't work. So you'll have to configure nodepool for cases like that anyways (random won't necessarily work) | 17:08 |
clarkb | (though we aren't doing cinder volumes with nodepool so mostly a non issue) | 17:09 |
mordred | right - I was actually just checking floating ips to make sure we weren't accidentally doing something incorrect there | 17:09 |
corvus | clarkb: perhaps a way to use the default az would be to have nodepool remember what value or values it has gotten back, and then discard errant nodes or specify that value going forward | 17:10 |
*** dmellado has joined #zuul | 17:10 | |
mordred | but I believe we're fine due to how we create floating ips directly on a port | 17:10 |
*** myoung is now known as myoung|bbl | 17:11 | |
corvus | clarkb: that would probably make things better in the citycloud case. i don't think anything short of information sharing between launchers would help SpamapS's case. | 17:11 |
clarkb | yes, I think SpamapS' case is exceptional in its corner case qualities :) | 17:12 |
*** JasonCL has quit IRC | 17:12 | |
mordred | clarkb: SpamapS is exceptional at producing corner cases | 17:12 |
clarkb | and ya, we could remember/check the az value of the first node that comes back using the default (no value) az request, then explicitly specify that going forward | 17:12 |
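For reference, a sketch of the idea corvus and clarkb are floating here: boot the first node without specifying an AZ, record whichever AZ the cloud picked, and pin the rest of the nodeset to it. This is pseudocode around a hypothetical launch() helper, not nodepool's actual launcher code.

```python
def launch_nodeset(launch, count):
    """launch(az=None) boots one server and returns the AZ nova placed it in."""
    # First node: let the cloud pick its default AZ.
    chosen_az = launch(az=None)
    azs = [chosen_az]
    # Remaining nodes: pin to the AZ the cloud chose, so the whole
    # nodeset stays co-located even though we never knew the default.
    for _ in range(count - 1):
        azs.append(launch(az=chosen_az))
    return azs
```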
*** JasonCL has joined #zuul | 17:21 | |
*** jpena|brb is now known as jpena | 17:22 | |
*** JasonCL has quit IRC | 17:25 | |
*** JasonCL has joined #zuul | 17:37 | |
corvus | tobiash: did you want to review https://review.openstack.org/537011 ? | 17:37 |
tobiash | corvus: oh yes, forgot about that between discussion with mordred and dinner ;) | 17:39 |
andreaf | dmsimard thanks | 17:40 |
corvus | tobiash, mordred: tristanC reviewed previous versions of all those changes positively, so i went ahead and approved all of them except 537011. | 17:40 |
mordred | corvus, clarkb SpamapS: I have verified with the nova team that there is no way to find out what the default az is | 17:41 |
*** JasonCL has quit IRC | 17:42 | |
tobiash | lgtm | 17:42 |
corvus | mordred: looks like you're all clear on those 4 patches | 17:43 |
mordred | corvus: woot! thanks! | 17:43 |
corvus | over in #openstack-infra, we found that my memory improvement is using a bit too much cpu, so i'm going to switch to working on a fix to that. | 17:44 |
mordred | corvus: ++ | 17:44 |
mordred | corvus: I'll wait to suggest a scheduler/zuul-web restart on your cpu fix | 17:44 |
corvus | sounds good | 17:45 |
mordred | anybody remember who it was that had a console app that shows zuul status? was that dansmith? | 17:53 |
mordred | clarkb: ^^? | 17:53 |
clarkb | I think harlowja and dansmith had their own | 17:54 |
*** sshnaidm is now known as sshnaidm|off | 17:54 | |
*** JasonCL has joined #zuul | 17:59 | |
*** tosky_ is now known as tosky | 18:04 | |
*** JasonCL has quit IRC | 18:05 | |
*** tosky has quit IRC | 18:07 | |
dmsimard | mordred: harlowja yeah | 18:12 |
dmsimard | I have a link for it somewhere... | 18:12 |
*** hashar_ is now known as DockerIsA_Sin | 18:12 | |
dmsimard | mordred: https://github.com/harlowja/gerrit_view/blob/master/README.rst#czuul | 18:14 |
*** openstackgerrit has quit IRC | 18:18 | |
*** jpena is now known as jpena|away | 18:19 | |
*** JasonCL has joined #zuul | 18:26 | |
mordred | dmsimard: thanks. pull request submitted | 18:28 |
dmsimard | \o/ | 18:29 |
*** chrnils has quit IRC | 18:31 | |
*** JasonCL has quit IRC | 18:32 | |
*** DockerIsA_Sin is now known as hashar | 18:33 | |
*** openstackgerrit has joined #zuul | 18:34 | |
openstackgerrit | Merged openstack-infra/zuul master: Remove .json suffix from web routes https://review.openstack.org/537010 | 18:34 |
openstackgerrit | Merged openstack-infra/zuul master: Port per-change status to zuul-web https://review.openstack.org/535903 | 18:40 |
openstackgerrit | Merged openstack-infra/zuul master: Re-enable test of returning 404 on unknown tenant https://review.openstack.org/545644 | 18:40 |
openstackgerrit | Merged openstack-infra/zuul master: Add /info and /{tenant}/info route to zuul-web https://review.openstack.org/537011 | 18:40 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Avoid creating extra project schemas https://review.openstack.org/545953 | 18:45 |
clarkb | corvus: is ^ the performance fix? | 18:46 |
corvus | clarkb: yep | 18:46 |
corvus | that was not what i was expecting, fwiw. | 18:46 |
clarkb | ok reviewing now, then breakfast | 18:46 |
corvus | but in my local tests, that restored (and actually slightly improved with the extra changes i put in there) the parse time of a large configuration to the same as before the whole rework stack | 18:47 |
corvus | the difference was about double (ie, https://review.openstack.org/545448 caused the time to double, and then https://review.openstack.org/545953 halved it again). that doesn't match the 6x difference we see in openstack-infra. i can't really account for that other than to say that my local config doesn't mirror it exactly. | 18:49 |
corvus | so i think we should try that in prod to see whether that's a complete or partial fix. | 18:49 |
corvus | i started a local test run when i pushed it up -- it looks like it's going to fail unit tests, but i think it may only be due to test infrastructure issues. i think it's worth reviewing it for substance now while i fix up the test failures. | 18:51 |
AJaeger | corvus: did you see mordred's change https://review.openstack.org/#/c/545610 ? That touches similar lines as yours... | 18:51 |
clarkb | corvus: substance wise it looks fine to me | 18:51 |
corvus | AJaeger: i like mordred's change -- but let's rebase it after this one since it's not as urgent | 18:52 |
fungi | yeah, i approved it before i saw you mention local test failures. curious to see where it breaks | 18:57 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Avoid creating extra project schemas https://review.openstack.org/545953 | 18:58 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Allow test_playbook to run long https://review.openstack.org/545955 | 18:58 |
dmsimard | Do we have a strategy for updating Ansible in Zuul without potentially breaking everyone's jobs ? | 18:59 |
corvus | fungi, clarkb, mordred: ^ local test failures fixed. i also stacked it on top of a test timeout bump. | 18:59 |
fungi | thanks, i was wondering whether the change in timeout handling was to blame for the rise in unit tests timeouts for zuul changes | 18:59 |
corvus | dmsimard: no, i think what we're going to do is land the 2.4 update, deal with breakage, and after that work out a way to support multiple versions. | 19:00 |
corvus | dmsimard: (then, of course, changing the default version will break everyone, but that's an easier fix) | 19:00 |
fungi | but i guess the tests themselves outgrew the timeout some too | 19:00 |
corvus | fungi: yeah, i think that particular test timeout is due to our adding new jobs to that test for the post-timeout and hostvars | 19:00 |
fungi | k | 19:01 |
fungi | i see, that's the per-test timeout, not the job/playbook timeout | 19:01 |
dmsimard | corvus: yeah Ansible unfortunately doesn't have a very good reputation at being "stable" (at least in the semver sense), typically breaking roles/playbooks. One has to be careful about using the word "regression" in #ansible-devel, often times it's a fix that broke an "unexpected behavior" or something like that :) | 19:01 |
dmsimard | Since we have hundreds of jobs, upgrading Ansible might be quite a sensitive task | 19:02 |
*** harlowja has joined #zuul | 19:03 | |
fungi | and no sane way to exercise them all against new ansible versions other than just taking the plunge and seeing what breaks | 19:04 |
clarkb | corvus: the two changes are approved now | 19:06 |
*** nhicher has quit IRC | 19:06 | |
*** myoung|bbl is now known as myoung | 19:07 | |
*** nhicher has joined #zuul | 19:07 | |
dmsimard | fungi: that's scary | 19:08 |
dmsimard | fungi: I wonder if there could be a job that'd run a job in another version of Ansible somehow | 19:08 |
clarkb | dmsimard: aiui the idea is that with zuul testing ansible we can avoid some of those issues, but yes. Upgrading zuul's ansible likely to be an adventure | 19:08 |
dmsimard | clarkb: I'm not worried so much about Zuul than I am about the users' jobs | 19:09 |
clarkb | dmsimard: sure thats what I mean. the ansible zuul runs | 19:09 |
corvus | dmsimard: well, i mean, if you do that, you've solved the problem of running multiple versions of ansible. which we plan to do. | 19:09 |
corvus | so i mean, i guess we could stay on 2.3 until we do that. | 19:09 |
clarkb | at the very least we need to publish what version of ansible a job ran under to aid in debugging (I think mordred was going to add that, not sure if it got added) | 19:09 |
corvus | but i feel like we're early enough in the process that we (and i mean the collective zuul's of the world here, not just openstack) can deal with a cutover to 2.4. | 19:10 |
Shrews | pretty sure the logging of ansible version is already landed | 19:10 |
corvus | then solve for multiple versions of ansible later | 19:10 |
dmsimard | clarkb: That's https://review.openstack.org/#/c/514489/ and https://review.openstack.org/#/c/532304/ which were blocked by a weird include_role issue, I think that's fixed now. I'll look. | 19:11 |
fungi | dmsimard: we could set up some system to run preliminary builds of select jobs against newer ansible or something, but my point was we have tons of jobs which run infrequently under specific situations and have side effects we can't fully exercise outside production without a lot of extra effort to do things like set up our own mock pypi and whatnot (and even that doesn't 100% guarantee it'll accurately | 19:11 |
fungi | represent a production run) | 19:11 |
dmsimard | OH, no, it was a random fedora26 failure I could not understand | 19:11 |
dmsimard | I'll debug that. | 19:11 |
Shrews | dmsimard: oh, that still hasn't landed. thought it had | 19:12 |
dmsimard | fungi: yeah I don't have any great ideas right now... at least as far as ARA is concerned, there are different integration jobs which tests that ara works with different versions of Ansible (2.2, 2.3, 2.4, devel). We don't really have the luxury of doing that with Zuul and the hundreds of jobs our users have | 19:13 |
dmsimard | and then again, it needs to be tested under *Zuul's* Ansible | 19:13 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul-jobs master: Revert "Revert "Add zuul.{pipeline,nodepool.provider,executor.hostname} to job header"" https://review.openstack.org/514489 | 19:14 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul-jobs master: Add Ansible version to job header https://review.openstack.org/532304 | 19:14 |
*** jpena|away is now known as jpena|off | 19:14 | |
clarkb | is the ansible version something we want the jobs to provide or something that zuul should just write out? | 19:17 |
clarkb | (I worry that that is a feature that everyone should have and if it is in job config many will miss it?) | 19:17 |
corvus | well, this is the way we provide it to everyone -- a role in zuul-jobs that anyone can (should) use | 19:18 |
clarkb | ya I guess the intent is for people to be using these roles. Maybe info about these roles can go into the bootstrapping doc when writing a base job | 19:20 |
*** JasonCL has joined #zuul | 19:20 | |
corvus | clarkb: yeah, that's sort of where the zuul-from-scratch doc leaves off right now. but i believe we're all agreed it should be included there. | 19:22 |
*** JasonCL has quit IRC | 19:25 | |
dmsimard | clarkb: I believe we discussed in the past that if users need advanced customization of their Ansible (ex: OSA might need a specific version of Ansible to test things.. different modules/callbacks, etc.), they would need to rely on installing a "nested" version of Ansible and running with that | 19:47 |
corvus | well, that's the story now. but just so we're clear: after we upgrade to ansible 2.4, we are going to figure out a way for zuul to support more than one version of ansible, and people will be able to specify which version of ansible to run in their jobs. | 19:51 |
dmsimard | corvus: that sounds amazing -- hard, but amazing | 19:51 |
corvus | it will not be easy, but it is required in order for us to achieve our objective of people using the same ansible in test and production. | 19:52 |
rcarrillocruz | wow | 19:52 |
rcarrillocruz | That would be amazing indeed | 19:52 |
*** JasonCL has joined #zuul | 19:53 | |
clarkb | seems like the biggest hurdle there is the security monkey patching? | 19:53 |
fungi | seems like we're going to become increasingly aware of the nuanced differences between each minor ansible release | 19:53 |
clarkb | otherwise you can just install a virtualenv for each ansible and fork ansible out of them? | 19:54 |
fungi | ansible seemed amenable to upstreaming some of the security improvements, right? | 19:54 |
fungi | i mean, after the got over the "what? you're going to run arbitrary user-supplied playbooks on your servers!?!" | 19:55 |
fungi | after they | 19:55 |
dmsimard | clarkb: yeah, we need a clean way/interface to patch the stuff that we do in Zuul (security is one, streaming is another, etc.) | 19:57 |
corvus | i actually think that's not going to take much more work. we already need to do work for each new version of ansible. so it's *mostly* a matter of not deleting the old versions, but with some minor-to-moderate extra maintenance work to deal with stable point releases. most of that can be automated (this stuff doesn't generally change a lot between point releases) | 20:00 |
corvus | the thing we have to sort of invent is how to have multiple versions of ansible installed that works with all the ways people want to deploy zuul | 20:01 |
*** JasonCL has quit IRC | 20:02 | |
dmsimard | I have some quite dirty code to handle forward/backward compat in ARA | 20:10 |
dmsimard | i.e, https://github.com/openstack/ara/blob/dd37d4566bb4a17871e55e4ba50e17b46079efc2/ara/config.py#L41 "if LooseVersion(ansible_version) >= LooseVersion('2.3.0')" etc | 20:10 |
clarkb | corvus: my naive assumption here is that zuul can sort of internally use the new python3 virtualenv stdlib tooling | 20:10 |
dmsimard | If we end up finding a better way to handle this sort of stuff in Zuul, I'm all ears | 20:11 |
clarkb | https://docs.python.org/3/library/venv.html#api | 20:11 |
dmsimard | clarkb: so the executor would bootstrap the appropriate version of Ansible after creating the bwrap or something like that ? | 20:12 |
clarkb | dmsimard: I think it can do it before bwrapping then bind mount into it and just keep a bunch of premade ansible installs | 20:12 |
clarkb | but ya something like that | 20:12 |
clarkb | probably make them on demand the first time | 20:12 |
dmsimard | ah I see what you mean.. yeah, that could make sense | 20:13 |
corvus | clarkb: ah yes, that sounds like just the thing. | 20:20 |
mordred | clarkb, corvus: yah - I like the venv idea and think that's a good way to move forward on that | 20:41 |
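For reference, a minimal sketch of the stdlib venv approach clarkb links above: build one virtualenv per supported Ansible version (on demand, the first time it is needed) and run that environment's ansible-playbook. Paths and the pip invocation are illustrative, not how zuul actually does (or will do) it.

```python
import os
import subprocess
import venv

def ensure_ansible_venv(version, base_dir='/var/lib/zuul/ansible'):
    """Create (once) and return the ansible-playbook for a given Ansible version."""
    path = os.path.join(base_dir, version)
    if not os.path.exists(path):
        venv.EnvBuilder(with_pip=True).create(path)
        subprocess.check_call(
            [os.path.join(path, 'bin', 'pip'), 'install', 'ansible==%s' % version])
    return os.path.join(path, 'bin', 'ansible-playbook')

# e.g. run a job's playbook under 2.4:
# subprocess.check_call([ensure_ansible_venv('2.4.3'), 'playbook.yaml'])
```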
corvus | dmsimard: what's the status of https://review.openstack.org/543633 ? | 20:43 |
corvus | dmsimard: (that's the fix for https://storyboard.openstack.org/#!/story/2001329 right?) | 20:44 |
mordred | corvus: so - I just noticed (because I looked at docs online and the docs lied to me) we have an upper-bounds cap on aiohttp of <3 ... but 3.0.1 is out now | 20:57 |
corvus | mordred: yeah, they broke something, commit should have info | 20:58 |
mordred | corvus: it's mostly the same, although there is at least one thing that would be nice: https://aiohttp.readthedocs.io/en/stable/web_advanced.html#graceful-shutdown | 20:58 |
corvus | mordred: https://review.openstack.org/543546 | 20:58 |
mordred | corvus: yah - that was what I was about to say - glad to see that was the reason for the cap | 20:59 |
mordred | so we can update to 3.0 when we've migrated to bionic | 20:59 |
corvus | mordred: oh, any chance 3.0.1 fixed the issue? | 20:59 |
Shrews | odd. i wonder what 3.5.3 has that they needed | 21:00 |
SpamapS | clarkb: "my understanding of SpamapS' problem is that he has two az's one is smaller than the other." -- that's a misunderstanding. I have 3 AZ's, and they are all relatively the same size. | 21:00 |
dmsimard | corvus: I rebased it today, it's on top of my todo list right now. I plan on reproducing it with https://review.openstack.org/#/c/530642/12/playbooks/devstack-tempest.yaml | 21:00 |
dmsimard | corvus: it's probably the fix, yes, but I want a test that is able to demonstrate that it actually does fix it | 21:01 |
dmsimard | corvus: er, I rebased something else that was *not* that -- I don't actually need to rebase that one. | 21:01 |
clarkb | SpamapS: ah ok, I'm not sure what nodepool would do differently then? maybe weight not based on max-servers but by available quota? | 21:05 |
corvus | dmsimard: can you add a test for it in zuul? | 21:09 |
dmsimard | corvus: That's what I attempted to do with the functional integration test https://review.openstack.org/#/c/543633/2/playbooks/zuul-stream/functional.yaml but it doesn't seem to reproduce the issue, I'm doing some testing locally to try and understand why | 21:11 |
mordred | clarkb: I think what I was hearing from SpamapS yesterday is that we do the az randomization before we check for ready nodes - so he had an az with ready nodes, but the az balancer chose a different az and proceeded to spin up the nodes there | 21:12 |
clarkb | ah | 21:13 |
corvus | er, that's not how that works | 21:14 |
corvus | if there's a ready node, we'll grab it and use its AZ to spin up any more nodes if necessary | 21:14 |
mordred | oh - yah. I just re-read | 21:15 |
pabelanger | I'm just looking at https://review.openstack.org/545953/ and py35 seems to have failed | 21:15 |
mordred | in his case it was that min-ready was also working on spinning up nodes and he was near capacity | 21:15 |
corvus | spectacularly.. i'll look into that | 21:16 |
mordred | pabelanger: oh - wow - that's a lot of red | 21:16 |
pabelanger | looks like threads didn't shutdown propelry | 21:17 |
pabelanger | properly* | 21:17 |
clarkb | but we also can't really fix that without having handler coordination like what corvus described I think | 21:18 |
clarkb | because it requires handlers to have a rough idea if any other handler is in a better spot to handle a request | 21:18 |
mordred | clarkb: yah | 21:18 |
corvus | those tests work in isolation... | 21:18 |
corvus | i'm wondering if one test went awry and messed up the state for other tests | 21:19 |
openstackgerrit | Merged openstack-infra/zuul master: Allow test_playbook to run long https://review.openstack.org/545955 | 21:19 |
corvus | oh i think i know which :) | 21:21 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Hack for test_delete_now rare failures https://review.openstack.org/545982 | 21:22 |
corvus | mordred: is there a way i can have ttrun do 2 different classes? | 21:22 |
Shrews | i have been pondering the test_delete_now failures today. i'd like to see the above hack go in until a real solution can be generated | 21:23 |
corvus | mordred: like i want to run: "ttrun -epy35 tests.unit.test_model" and "ttrun -epy35 tests.unit.test_requirements" in one solution? | 21:23 |
corvus | Shrews: oh is *that* the thing we added recursive locking for? | 21:23 |
corvus | Shrews: that merged in kazoo, we can probably use it if we can remember how it was going to solve it. | 21:23 |
Shrews | corvus: i spent a great deal today trying to see how that could help (i totally forgot the solution we discussed) | 21:24 |
Shrews | corvus: i did not see how it would help at all since ReadLock and WriteLock still create znodes under the thing we intend to delete | 21:24 |
corvus | Shrews: heh, i may have to ask eavesdrop to figure out what we were thinking | 21:24 |
mordred | corvus: not really, no - I think stestr can though | 21:25 |
clarkb | tox -epy36 -- '(TestClass1|TestClass2)' ? | 21:25 |
mordred | corvus: actually, I think testr itself can too ... ^^ like that | 21:26 |
mordred | you may also want to add --parallel=1 or whatever the concurrency arg is | 21:26 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Avoid creating extra project schemas https://review.openstack.org/545953 | 21:26 |
mordred | concurrency | 21:26 |
mordred | tox -epy35 -- --concurrency=1 'tests.unit.test_(model|requirements)' | 21:27 |
mordred | should do the trick | 21:27 |
corvus | oh ok, well i just started running the whole suite locally | 21:27 |
mordred | k | 21:27 |
corvus | though i'm pretty sure that should fix it | 21:27 |
mordred | corvus: should we go ahead and re-promote in gate? | 21:29 |
corvus | mordred: i should have a local test run in just a minute | 21:30 |
clarkb | does nothing have to call readConfig? | 21:30 |
corvus | clarkb: the scheduler now calls both read and load config | 21:30 |
clarkb | was that git added? | 21:31 |
clarkb | (I'm not seeing it but also could just be looking poorly) | 21:31 |
corvus | clarkb: i may need more context for your question | 21:31 |
clarkb | I'm not seeing where the scheduler calls read config. But I'm about to fetch the change locally and grep better | 21:31 |
mordred | corvus: gerrit is showing something strange in the interdiff | 21:31 |
corvus | clarkb: you're talking about what change? mordred's config changes? | 21:32 |
mordred | corvus, clarkb: that readConfig change is from here: https://review.openstack.org/#/c/545610/ ... but gerrit is showing it in the 'show me differences between patchset 2 and 3' view | 21:32 |
clarkb | 545953 I think I'm just not seeing an existing api due to the restricted scope of that change | 21:32 |
corvus | my latest patchset is a fix+rebase | 21:32 |
mordred | oh - it's a rebase - so it's just the weird gerrit rebase thing | 21:32 |
corvus | sorry, i don't normally like to do that, but i did so this time because the test failures were hard to repro locally and i thought it might be due to a change in the tree | 21:33 |
mordred | and sorry - it wasn't from that patch, it was from a different one | 21:33 |
corvus | i can un-rebase that if you'd like | 21:33 |
mordred | corvus: ++ | 21:33 |
mordred | corvus: nah - I was just confused because I thought it was content from an un-landed change :) | 21:33 |
corvus | i only made one change, and the rebase should (in retrospect) not be necessary. the change is to add a cleanup in test_model. | 21:33 |
clarkb | corvus: ya cleanup in test_model makes sense to me | 21:33 |
mordred | ++ | 21:34 |
corvus | (so probably easiest to just look at the actual diff from base and know that single line is the only change in the latest patchset) | 21:34 |
clarkb | ok I've approved it | 21:35 |
clarkb | git log -p straightened me out | 21:36 |
mordred | \o/ | 21:36 |
*** threestrands has joined #zuul | 21:37 | |
corvus | local tests pass, i'll re-enqueue in gate | 21:37 |
fungi | the 545956 update included a rebase i guess | 21:38 |
fungi | ahh, scrollback confirms | 21:39 |
corvus | dmsimard: 543633 has a unit test in it -- does it not fail without the fix? | 21:39 |
corvus | sorry, i should have undone that but forgot | 21:39 |
SpamapS | So .. sort of? | 21:39 |
* SpamapS is laggy today sorry | 21:40 | |
dmsimard | corvus: that's not unit, that's integration -- the reason I'm confused is that this exact same patch used to fail if you look at the prior history in https://review.openstack.org/#/c/504238/ | 21:40 |
corvus | dmsimard: oh, it doesn't have a unit :( | 21:40 |
dmsimard | corvus: it suddenly stopped failing and I was never able to understand why | 21:40 |
SpamapS | My issue is that sometimes I routinely have 10+ ready nodes in an AZ, and because requests randomly choose a ready node, instead of a ready set, it's entirely possible we never choose one of those 10, and thus, end up needing to spin up nodeset_size - 1 nodes on every single request. | 21:41 |
mordred | dmsimard: maybe point-release of ansible? | 21:41 |
dmsimard | I'm working on setting up a reproducer right now, mordred's test-logs ( https://github.com/openstack-infra/zuul/blob/master/tools/test-logs.sh ) is quite useful -- I'll send some improvements for it | 21:41 |
SpamapS | I have min-ready: 15, and a nodeset size of 5, and 3 AZ's. | 21:41 |
clarkb | SpamapS: so we probably want to make ready nodes a heap sorted by nodes per az | 21:41 |
clarkb | SpamapS: and pull off in that order | 21:41 |
mordred | dmsimard: \o/ I'm useful :) | 21:42 |
SpamapS | I had hoped that meant we'd have a high chance of picking an AZ with an entire set in it. | 21:42 |
SpamapS | What actually happens is, we pick the right one almost never. | 21:42 |
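For reference, a quick back-of-the-envelope simulation of the effect SpamapS describes, under the simplifying assumptions that min-ready nodes land in a uniformly random AZ and a request adopts the AZ of one randomly chosen ready node; in steady state, with ready nodes being consumed between min-ready top-ups, the odds are worse than this fresh-pool estimate. Purely illustrative numbers from the discussion above (3 AZs, min-ready 15, nodeset size 5).

```python
import random
from collections import Counter

AZS, MIN_READY, NODESET = 3, 15, 5

trials, hits = 100000, 0
for _ in range(trials):
    ready = [random.randrange(AZS) for _ in range(MIN_READY)]
    counts = Counter(ready)
    # A request adopts the AZ of one randomly chosen ready node...
    az = random.choice(ready)
    # ...and only avoids booting fresh nodes if that AZ already holds a full set.
    if counts[az] >= NODESET:
        hits += 1

print('chance a full nodeset was already ready: %.0f%%' % (100.0 * hits / trials))
```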
corvus | mordred, dmsimard: well part of mordred's work on logging is that we should not need test-logs anymore | 21:42 |
corvus | we need *actual* tests for this stuff, and i believe mordred is adding those | 21:42 |
dmsimard | mordred: The fix for the issue we're talking about landed in 2.4.3 so unless Zuul is suddenly using 2.4.3, something's up | 21:42 |
corvus | so let's not spend too much time on improving test-logs | 21:42 |
mordred | corvus: I don't think we're looking at improving it at all | 21:43 |
mordred | corvus: I think he's just using it to help figure out why things suddenly started working | 21:43 |
dmsimard | corvus: sure -- it's a way for me to run the test playbooks with the callbacks and stuff without having access to a sandbox/test zuul environment | 21:43 |
dmsimard | mordred: yeah that | 21:43 |
corvus | mordred: dmsimard said he was improving it | 21:43 |
SpamapS | It gets really bad when 2 or 3 req's come in at once. | 21:43 |
dmsimard | corvus: 2 minutes worth of improvement to make my life easier :P | 21:43 |
mordred | corvus: not related to that- do all of the parsers do self.schema = self.getSchema now? | 21:44 |
mordred | corvus: nm - they don't | 21:44 |
corvus | mordred: i think only tenantparser doesn't... but i think it probably could? | 21:44 |
SpamapS | clarkb: I think we'd also need to make min-ready's do the same in reverse. When a min-ready req comes in, choose the AZ with the least ready nodes. | 21:45 |
corvus | just wasn't important to me earlier because it's only called once either way | 21:45 |
corvus | dmsimard: so what happens if you add that test without the fix? | 21:45 |
SpamapS | Or, just allow disabling az-sticky | 21:45 |
mordred | corvus: I actually tried generalizing the schema bits when I made the base class patch - but it got too weird and I figured I'd do it as a follow up | 21:45 |
mordred | corvus: (there was one of the classes that called getSchema with arguments) | 21:45 |
SpamapS | Take nodeset_size nodes from the provider and ignore AZ. | 21:45 |
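For reference, a sketch of the selection policy clarkb and SpamapS outline above: satisfy node requests from the AZ with the most ready nodes, and point min-ready top-ups at the AZ with the fewest. Purely illustrative, not the nodepool allocation code.

```python
def pick_az_for_request(ready_nodes_by_az, nodeset_size):
    """Prefer the AZ most likely to already hold a full nodeset."""
    az = max(ready_nodes_by_az, key=lambda a: len(ready_nodes_by_az[a]))
    # Fall back to the current behaviour (boot fresh nodes) if even the
    # fullest AZ can't satisfy the whole nodeset.
    return az if len(ready_nodes_by_az[az]) >= nodeset_size else None

def pick_az_for_min_ready(ready_nodes_by_az):
    """Point min-ready top-ups at the emptiest AZ to keep pools balanced."""
    return min(ready_nodes_by_az, key=lambda a: len(ready_nodes_by_az[a]))
```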
dmsimard | corvus: I'll have an answer in a few minutes | 21:45 |
corvus | mordred: yeah, nodeset has an anonymous argument, so we store 2 variations | 21:46 |
mordred | yah. that was the one that made it seem less nice | 21:46 |
corvus | mordred: could throw a layer of indirection in there... self.createSchema() and override that in nodeset.... | 21:47 |
mordred | corvus: oh - wait - you've solved the issue I had already I think | 21:48 |
mordred | yup. | 21:49 |
mordred | corvus: thanks! your latest patches remove the issue I had putting that in the base :) | 21:49 |
corvus | yay | 21:50 |
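For reference, a rough sketch of the pattern corvus and mordred are converging on: build the (voluptuous) schema once per parser instance via a hook that subclasses with special needs can override, instead of rebuilding it for every object parsed. Class and method names are illustrative, not the actual zuul code.

```python
import voluptuous as vs

class Parser:
    def __init__(self):
        # Built once per parser, reused for every object parsed.
        self.schema = self.getSchema()

    def getSchema(self):
        raise NotImplementedError

class ProjectParser(Parser):
    def getSchema(self):
        return vs.Schema({'name': str, 'templates': [str]})

class NodeSetParser(Parser):
    """Takes an argument, so it builds and stores its variant up front."""
    def __init__(self, anonymous=False):
        self.schema = self.getSchema(anonymous)

    def getSchema(self, anonymous=False):
        schema = {'nodes': list}
        if not anonymous:
            schema['name'] = str
        return vs.Schema(schema)
```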
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Extract an abstract base Parser class https://review.openstack.org/545610 | 21:53 |
mordred | corvus: there ya go ^^ | 21:53 |
clarkb | SpamapS: in your config you have a single pool for all three AZs and nodepool is randomly assigning nodes to each AZ right? | 21:56 |
*** hashar has quit IRC | 21:56 | |
dmsimard | corvus: can't reproduce it locally again... with or without the patch. I'll send a dummy patch with a generic version of what andreaf reported last time it occurred. | 21:57 |
andreaf | mordred I started documenting a bit https://review.openstack.org/#/q/status:open+project:openstack-dev/devstack+branch:master+topic:ansible_docs | 21:58 |
corvus | andreaf: \o/ | 22:00 |
mordred | http://logs.openstack.org/52/545952/1/check/build-openstack-sphinx-docs/478cade/html/roles.html <-- looks great! | 22:00 |
clarkb | meeting time right? | 22:00 |
corvus | yep, meeting time in #openstack-meeting-alt | 22:00 |
mordred | andreaf: there's a similar thing for jobs too | 22:00 |
mordred | andreaf: .. zuul:autojobs:: IIRC | 22:01 |
mordred | andreaf: the second patch - I think you missed a git add | 22:02 |
andreaf | mordred: heh ok thanks | 22:14 |
*** JasonCL has joined #zuul | 22:16 | |
mordred | andreaf: exciting to see that though | 22:17 |
openstackgerrit | Merged openstack-infra/zuul master: Avoid creating extra project schemas https://review.openstack.org/545953 | 22:18 |
*** openstackgerrit has quit IRC | 22:18 | |
*** JasonCL has quit IRC | 22:20 | |
*** openstackgerrit has joined #zuul | 22:38 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Use a status code to detect unknown vs. missing tenant https://review.openstack.org/545879 | 22:38 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Remove Paste from the dependencies https://review.openstack.org/546000 | 22:38 |
mordred | corvus, pabelanger, clarkb: ^^ 545879 fixes an api mismatch - and the follow up just cleans up Paste - neither are urgent, but both should be easy +As | 22:54 |
*** JasonCL has joined #zuul | 22:54 | |
dmsimard | (╯°□°)╯︵ ┻━┻ | 22:57 |
dmsimard | corvus, mordred: https://review.openstack.org/#/c/545994/ didn't trigger the json issue | 22:57 |
mordred | dmsimard: :( | 22:59 |
mordred | dmsimard: maybe a 2.3 point release fixed the issue | 22:59 |
dmsimard | mordred: as far as I know, the fix is https://github.com/ansible/ansible/commit/c30ee42fe1f0a9666a90f4d63121780f2a186c54 and it didn't land until 2.4.3, it's not backported to 2.3.x which is what we're using | 23:02 |
mordred | dmsimard: yah | 23:02 |
dmsimard | mordred: yeah 2.3 has the bug: https://github.com/ansible/ansible/blob/stable-2.3/lib/ansible/executor/task_executor.py#L459 | 23:03 |
dmsimard | this is annoying | 23:03 |
dmsimard | andreaf: do you have some sort of reference on the failure from the include_role in https://review.openstack.org/#/c/530642/12/playbooks/devstack-tempest.yaml ? | 23:04 |
dmsimard | andreaf: which job ran that code ? | 23:05 |
andreaf | dmsimard devstack-tempest and tempest-full and tempest-full-py3 | 23:07 |
*** JasonCL has quit IRC | 23:07 | |
dmsimard | andreaf: I can't find any signs of the error I'm looking for, could you point it out to me ? | 23:10 |
corvus | dmsimard: well, according to the bug in storyboard, we saw the issue in build e4fdb1c004fe47478a5468bb6318ad66 | 23:23 |
corvus | dmsimard: http://logs.openstack.org/42/530642/11/check/tempest-full/e4fdb1c/job-output.txt.gz#_2018-02-09_18_10_40_314968 | 23:24 |
corvus | dmsimard: there is no output in the console log about that | 23:24 |
corvus | the behavior is just start followed by an immediate end | 23:25 |
corvus | dmsimard: and you'll note there's no entry in ara for the devstack-tempest playbook | 23:25 |
corvus | dmsimard: it's patchset 11 that hit that error, not the latest version. | 23:26 |
corvus | dmsimard: patchset 11 doesn't use include_role... it actually looks like it's invalid ansible, but that should have been reported. | 23:27 |
corvus | dmsimard: that's in executor-debug.log.10.gz on ze03 if more logs would help | 23:31 |
dmsimard | corvus: I'll check it out, thanks | 23:45 |