*** elyezer has quit IRC | 00:35 | |
*** elyezer has joined #zuul | 00:37 | |
*** elyezer has quit IRC | 00:47 | |
*** elyezer has joined #zuul | 00:48 | |
*** elyezer has quit IRC | 01:39 | |
*** elyezer has joined #zuul | 01:41 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Reorgnize the output bundle split https://review.openstack.org/545706 | 02:04 |
mordred | SpamapS: is remaining az capacity available to an api user? | 02:08 |
mordred | SpamapS: ah- I understand now (read more scrollback) | 02:15 |
*** dkranz has quit IRC | 02:30 | |
*** rlandy has quit IRC | 02:52 | |
dmsimard | mordred, andreaf: yes, actually I did release ara 0.14.6 to fix just that: https://github.com/openstack/ara/commit/9b962c5dcb77dc3e7687f75030ea41de30a54746 | 04:50 |
dmsimard | I fixed it from the perspective of ara but I still feel it's a bug. I remember discussing it with #ansible-devel and it was supposed to be fixed in 2.4.3 iirc | 04:52 |
tobiash | SpamapS: we should probably make the az stickiness configurable per provider | 05:08 |
tobiash | In my cloud this also has these disadvantages | 05:09 |
tobiash | And in my cloud inter-az bandwidth (which probably was a reason for az stickiness) is a non-issue for me | 05:10 |
tobiash | Especially in smaller clouds this is even problematic as it can prevent nodes from being created regardless of quota | 05:12 |
tristanC | tobiash: shouldn't we make az stickiness part of the nodeset definition instead? | 05:12 |
tobiash | That depends strongly on the cloud and its inter az bandwidth | 05:13 |
tobiash | So I think this should be primarily set by nodepool as the user usually doesn't have the knowledge to judge that | 05:14 |
tristanC | tobiash: doesn't it change network configuration when nodes aren't in the same az? | 05:16 |
*** elyezer has quit IRC | 05:18 | |
*** elyezer has joined #zuul | 05:21 | |
tristanC | iiuc, a cross-az nodeset won't necessarily be in the same tenant network, and maybe some jobs explicitly need that | 05:27 |
tobiash | tristanC: no it doesn't necessarily | 05:34 |
tobiash | The networks used are also defined by the provider | 05:35 |
*** bhavik1 has joined #zuul | 05:52 | |
*** bhavik1 has quit IRC | 06:00 | |
*** threestrands has quit IRC | 06:10 | |
*** threestrands has joined #zuul | 06:23 | |
SpamapS | So, I'd say that cross-az-stickiness should be a boolean, defaulting to true. | 06:48 |
SpamapS | If you know your cloud doesn't need it, and you find yourself wishing it wasn't a thing, you can turn it off. | 06:48 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: angular controller for status and stream page https://review.openstack.org/545755 | 06:48 |
SpamapS | For me, cross-AZ bandwidth is the same as intra-AZ bandwidth when the HV's are in different racks. | 06:49 |
SpamapS | And the networks are all more or less the same.. cross-AZ may go through one extra router, but that may happen intra-AZ too if the nodes are on different subnets. | 06:49 |
*** threestrands has quit IRC | 07:00 | |
*** AJaeger has quit IRC | 07:00 | |
*** AJaeger has joined #zuul | 07:04 | |
tobiash | SpamapS: yeah, it should default to true to keep the current behavior | 07:06 |
tristanC | tobiash: thanks for the ansible-2.4 port, it seems to be working fine | 07:10 |
tobiash | tristanC: yeah, it worked in my testenv and I'm going to roll this out on production this week :) | 07:11 |
tobiash | This is essential for me to get real windows jobs running as with ansible 2.3 almost every win_* module is broken with py3... | 07:12 |
tristanC | we also have some playbooks that are >2.4 only, hopefully we'll deploy your patch soon too | 07:16 |
tobiash | tristanC: we'll land it shortly after 3.0 release | 07:16 |
SpamapS | which is hopefully.. shortly. :) | 07:18 |
* SpamapS has been missing meetings.. not sure what's left | 07:18 | |
tristanC | too bad, ansible-2.4 support is also blocking fedora packages | 07:19 |
SpamapS | What's the reasoning behind delaying the bump after release? | 07:19 |
SpamapS | 2.4 breaks old stuff? | 07:19 |
SpamapS | (Does kind of feel like 2.4 was overly aggressive, and introduced some new annoying, meaningless warnings) | 07:20 |
tobiash | SpamapS: the openstack release process is critical and roughly at the same time as the 3.0 release | 07:21 |
tobiash | ansible 2.4 is a high risk to disrupt the openstack release so it will be merged after that | 07:22 |
AJaeger | zuul and nodepool cores, ianw and myself fixed the integration tests, could you review the stacks starting at https://review.openstack.org/#/c/545163/ and https://review.openstack.org/#/c/545158/ , please? | 07:25 |
*** elyezer has quit IRC | 07:27 | |
*** elyezer has joined #zuul | 07:31 | |
SpamapS | tobiash: ah yeah that makes more sense actually. :) | 07:42 |
SpamapS | I kind of think we're going to have to support multiple ansible versions at some point. | 07:43 |
SpamapS | btw, my AZ problems show nicely here http://paste.openstack.org/show/677466/ | 07:43 |
SpamapS | need 10 nodes, have 11, building 6 (min-ready is 15) | 07:45 |
*** chrnils has joined #zuul | 07:47 | |
*** sshnaidm|off has quit IRC | 08:35 | |
*** electrofelix has joined #zuul | 08:35 | |
*** jpena|off is now known as jpena | 08:43 | |
*** sshnaidm has joined #zuul | 08:44 | |
openstackgerrit | Antoine Musso proposed openstack-infra/zuul master: WIP: ensure that Change.number is a string https://review.openstack.org/545768 | 09:04 |
*** sshnaidm has quit IRC | 09:07 | |
*** sshnaidm has joined #zuul | 09:08 | |
*** hashar_ has joined #zuul | 09:24 | |
openstackgerrit | Antoine Musso proposed openstack-infra/zuul master: WIP: ensure that Change.number is a string https://review.openstack.org/545768 | 09:46 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: Tenant config dynamic loading of project from external script https://review.openstack.org/535878 | 09:46 |
*** elyezer has quit IRC | 09:58 | |
*** elyezer has joined #zuul | 09:59 | |
*** kmalloc has joined #zuul | 10:14 | |
openstackgerrit | Matthieu Huin proposed openstack-infra/zuul master: zuul web: add admin endpoint, enqueue & autohold commands https://review.openstack.org/539004 | 10:19 |
*** tosky has joined #zuul | 10:29 | |
*** fbo has quit IRC | 10:40 | |
*** dmellado has quit IRC | 12:26 | |
*** elyezer has quit IRC | 12:30 | |
*** elyezer has joined #zuul | 12:35 | |
andreaf | corvus: I've done some tests in https://review.openstack.org/#/c/545696 with group_vars | 12:35 |
andreaf | corvus: from what I can see devstack_localrc is merged between controller group vars and generic vars (from parent job) | 12:37 |
andreaf | corvus: but not between subnode group vars and generic vars (from parent job) | 12:37 |
andreaf | corvus: http://logs.openstack.org/28/545728/4/check/devstack-multinode/f2a68a6/compute1/logs/_.localrc_auto.txt | 12:38 |
andreaf | corvus: could you have a look into that? | 12:38 |
*** jpena is now known as jpena|lunch | 12:39 | |
*** sshnaidm has quit IRC | 12:42 | |
*** dmellado has joined #zuul | 13:07 | |
*** sshnaidm has joined #zuul | 13:25 | |
*** jpena|lunch is now known as jpena | 13:31 | |
*** rlandy has joined #zuul | 13:32 | |
*** rlandy is now known as rlandy|rover | 13:33 | |
*** dmellado has quit IRC | 13:38 | |
*** JasonCL has joined #zuul | 13:41 | |
*** JasonCL has quit IRC | 13:46 | |
*** JasonCL has joined #zuul | 13:46 | |
*** JasonCL has quit IRC | 13:49 | |
*** JasonCL has joined #zuul | 13:50 | |
*** JasonCL has quit IRC | 14:10 | |
*** dmellado has joined #zuul | 14:44 | |
*** sshnaidm has quit IRC | 14:46 | |
*** sshnaidm has joined #zuul | 14:47 | |
*** dmellado has quit IRC | 15:02 | |
*** tosky_ has joined #zuul | 15:02 | |
*** tosky has quit IRC | 15:03 | |
*** elyezer has quit IRC | 15:11 | |
*** JasonCL has joined #zuul | 15:13 | |
-openstackstatus- NOTICE: Zuul has been restarted to pick up latest memory fixes. Queues were saved however patches uploaded after 14:40UTC may have been missed. Please recheck if needed. | 15:17 | |
*** swest has quit IRC | 15:17 | |
*** swest has joined #zuul | 15:19 | |
*** elyezer has joined #zuul | 15:24 | |
corvus | andreaf: that's strange, i wouldn't expect them to be merged at all -- i'd expect one to take precedence over the other | 15:25 |
andreaf | corvus: oh.. is it? so what I was trying to do would not work - i.e. define devstack_conf in vars and then extend it in group vars | 15:27 |
corvus | SpamapS, pabelanger: we shouldn't need a custom logging configuration to answer questions like SpamapS is asking, so let's figure out what log lines we need to switch to info in order to at least answer basic questions like "is nodepool doing anything" | 15:27 |
pabelanger | ++ | 15:28 |
andreaf | corvus i.e. https://review.openstack.org/#/c/545696/10/.zuul.yaml | 15:28 |
andreaf | corvus: yeah I think you are right they are basically overriding each other... the subnode only gets what's defined in group vars for the subnode | 15:29 |
andreaf | corvus: any idea on how to achieve what I'm trying to do? I don't really want to repeat the whole localrc settings for every node group, I only want to add what's relevant | 15:31 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Remove .json suffix from web routes https://review.openstack.org/537010 | 15:31 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Port per-change status to zuul-web https://review.openstack.org/535903 | 15:31 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Re-enable test of returning 404 on unknown tenant https://review.openstack.org/545644 | 15:31 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Add /info and /{tenant}/info route to zuul-web https://review.openstack.org/537011 | 15:31 |
corvus | so we could leave this the way it is, which basically means the job author would have to be careful about whether they put a given variable in 'vars', 'host_vars', or 'group_vars'. or we could merge variables inside zuul and only use host vars when writing the inventory. | 15:33 |
mordred | SpamapS: yes, I agree - and I have some thoughts on doing that but have been holding off on them until post zuul release | 15:33 |
AJaeger | zuul and nodepool cores, ianw and myself fixed the integration tests, could you review the stacks starting at https://review.openstack.org/#/c/545163/ and https://review.openstack.org/#/c/545158/ , please? | 15:34 |
mordred | corvus, tobiash, tristanC: when you get a chance, could you review the four patches above? (the first four of the javascript topic) I'd like to get them in and deployed (or get them rejected) so that the live draft changes later in the stack will work as we review those patches | 15:35 |
mordred | andreaf, corvus, SpamapS: also - I just got a patch landed to keystoneauth and released last week that allows us to tell keystoneauth to split its logging across four more specific loggers ... which was all in support of being able to have *some* keystoneauth debug logging in the default nodepool logging config but not *all* of it | 15:35 |
mordred | andreaf: sorry - I mentioned the wrong people | 15:36 |
mordred | pabelanger: ^^ | 15:36 |
corvus | andreaf: so i think if we kept this the way we have it now, you'd have to stop using vars at all, and only use group_vars, which means putting the devstack_localrc in both group_vars sections. | 15:37 |
mordred | pabelanger, corvus, SpamapS: https://docs.openstack.org/keystoneauth/latest/using-sessions.html#logging FWIW - I'll get us a patch up so that we can take advantage of that | 15:37 |
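For reference, a minimal sketch of what that logging split enables: dialing one of the new keystoneauth sub-loggers down while keeping the rest of the package's debug output. The sub-logger name used below is an assumption for illustration; the real names are listed on the keystoneauth sessions doc linked above.

```python
import logging

# Keep general keystoneauth debug output available in the debug log...
logging.getLogger('keystoneauth').setLevel(logging.DEBUG)

# ...but quiet the chattiest sub-logger. 'keystoneauth.session.body' is
# an illustrative name -- check the linked docs for the loggers actually
# introduced by the split.
logging.getLogger('keystoneauth.session.body').setLevel(logging.WARNING)
```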
*** JasonCL has quit IRC | 15:38 | |
*** JasonCL has joined #zuul | 15:38 | |
corvus | mordred: yeah, that will be nice to have in our debug logs, but i think it's too low-level for the default log (or what SpamapS is asking for) | 15:38 |
andreaf | corvus: I don't think that's very maintainable, unless we use vars to define a base set of devstack settings, and then extend the dictionaries in group vars | 15:38 |
andreaf | corvus not sure we can really do that though | 15:39 |
corvus | andreaf: right, so if that's the goal, we either need to rethink the whole devstack_localrc dictionary thing (maybe split it out so that not everything is in one dictionary), or merge vars, host_vars, and group_vars internal to zuul. | 15:40 |
corvus | mordred: thoughts ^? | 15:40 |
mordred | corvus: hrm. I'm not crazy about merging vars, host_vars and group_vars internal to zuul - that seems like starting to reimplement ansible internals | 15:41 |
corvus | mordred: well, ansible *doesn't* merge them, so... | 15:41 |
andreaf | corvus: we could have variables with different names and merge them into the write_devstack_conf module | 15:41 |
corvus | andreaf: yep, that's a possibility | 15:42 |
Shrews | mordred: oh, that logging split is nice | 15:42 |
*** JasonCL has quit IRC | 15:43 | |
mordred | corvus, andreaf: well, it *can*: http://docs.ansible.com/ansible/latest/intro_configuration.html#hash-behaviour | 15:43 |
andreaf | corvus, mordred: however I fear that would make a poor user interface | 15:43 |
andreaf | mordred: yeah it's configured not to by default right | 15:43 |
mordred | but that's potentially much more widely reaching | 15:43 |
mordred | and I think has a greater chance to confuse people | 15:44 |
corvus | mordred: oh neat, i didn't know that. that does change things. :) | 15:44 |
corvus | we could expose that option, and let the devstack job use it | 15:44 |
mordred | corvus: it does- but I think that might break people's roles or playbooks if they're not expecting it - so maybe doing some sort of merge specific to the zuul variables in zuul is fine here | 15:45 |
mordred | corvus: oh - that's an idea | 15:45 |
mordred | corvus: that way it doesn't mess up variables in roles or playbooks where people aren't expecting it, but devstack can opt-in | 15:46 |
corvus | for that matter, if we merged them internal to zuul, we could also make that a per-job flag i think. | 15:47 |
mordred | yah | 15:47 |
AJaeger | corvus: and that would apply to all used roles and playbooks? So, could still break some... | 15:49 |
AJaeger | tricky ;( | 15:49 |
corvus | to summarize: merging internally to zuul is probably the simplest in some ways (we don't add any new zuul options), but the reason not to is because we think someone might prefer the override behavior between host_vars and group_vars at some point, and it alters a standard behavior of ansible. | 15:49 |
corvus | while exposing the hash_behavior option is easy to implement and allows a normal and valid use of ansible. we just need to tell people about that option when this case comes up. | 15:50 |
andreaf | mordred, corvus: I think the best would be to be able to set the behaviour of specific variables | 15:51 |
corvus | AJaeger: yes it could -- but that's true anytime you use that option with ansible. so, as the docs say, don't use it unless you really need it. | 15:51 |
andreaf | mordred, corvus if we remove the test matrix we will have to set the list of services and that's going to be dicts that we don't want to merge | 15:52 |
corvus | AJaeger: i don't anticipate that would be a substantial problem in practice. much of the time, we don't even need hostvars. | 15:52 |
mordred | luckily, since it would be a job variable, we can try it and make sure none of the base roles will be borked | 15:52 |
corvus | andreaf: the services aren't lists, they are dict entries, specifically so that we can merge them. | 15:52 |
mordred | corvus: well, it affects all variable precedence - so it's possible it could introduce a merge where otherwise a replace would have happened | 15:52 |
mordred | even if host_vars/group_vars aren't present | 15:52 |
mordred | corvus: I agree, I doubt it'll be an issue for us - but it's still *possible* that something in one of the base roles would not be hash-merge clean | 15:53 |
corvus | mordred: oh. that might be a bummer for site vars. | 15:53 |
mordred | yah. I think we want to double-check that it doesn't allow site vars to have local values merged into them | 15:54 |
andreaf | corvus: yes I know they are dicts - I guess it depends if we want controllers and subnodes to disable what they don't need | 15:54 |
andreaf | corvus or to define which services they do need | 15:54 |
andreaf | corvus, mordred: well maybe I should just duplicate devstack_localrc and everything would be easier | 15:55 |
andreaf | corvus, mordred: would it be possible in the job definition to set the shared part of the dict in a shared_devstack_localrc variable, and then set devstack_localrc in group vars as the merge of shared plus a custom part? | 15:57 |
corvus | andreaf: yeah, we could merge them in the role | 15:59 |
andreaf | corvus, mordred: can we use jinja2 in the job definition? for instance devstack_localrc: "{{ shared_devstack_localrc |combine(my_devstack_localrc) }}" | 16:07 |
corvus | andreaf: i don't believe that will work -- zuul isn't going to interpolate it, and i don't think ansible does so for inventory files | 16:09 |
mordred | corvus: it doesn't - but it does for vars files in roles - so it's _possible_ it'll do late interpolation | 16:12 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Simplify launcher cleanup worker https://review.openstack.org/545867 | 16:12 |
mordred | corvus, andreaf: which is to say - I agree with corvus, I think it won't work ... BUT, it's totally worth trying :) | 16:14 |
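For reference, a rough sketch (not the actual devstack role code) of the kind of merge being discussed: combining a shared devstack_localrc dict with a group-specific one, with the group-specific entries winning on conflict. This is essentially what the Jinja2 combine filter andreaf suggests above would do in one expression. Keys shown are purely illustrative.

```python
def merge_localrc(shared, group_specific):
    """Merge two flat devstack_localrc dicts; group-specific entries win."""
    merged = dict(shared)
    merged.update(group_specific)
    return merged

# Illustrative values only.
shared = {'DATABASE_PASSWORD': 'secret', 'SERVICE_TIMEOUT': '90'}
subnode_only = {'SOME_SUBNODE_SETTING': 'True'}
print(merge_localrc(shared, subnode_only))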
andreaf | mordred corvus another question - will zuul merge dicts between group_vars in parent and group_vars in children jobs? | 16:15 |
corvus | andreaf: yes | 16:15 |
tobiash | mordred: am I right that your per change status change only uses the change number? | 16:16 |
tobiash | mordred: I'm asking because github has no global change numbers | 16:16 |
andreaf | corvus ok so if base multinode defines services for controller and services for subnode, a child job can selectively add/remove services on controller and on subnode | 16:16 |
corvus | andreaf: yep, that should work. | 16:17 |
tobiash | mordred: if I'm right this won't work for github | 16:18 |
mordred | tobiash: good point - yes, I believe you are correct about that | 16:23 |
mordred | tobiash: we probably should add support for filtering by project+change too, yeah? | 16:23 |
tobiash | mordred: let me check what the 'id' field is in github | 16:24 |
andreaf | mordred, corvus you never know, it may work https://review.openstack.org/#/c/545696/11/.zuul.yaml | 16:24 |
tobiash | mordred: ok, the id field is something like this: "id": "6,a672a19d197779e4d5410cb25775ee373cf47269" | 16:25 |
tobiash | and you're checking the whole string right? | 16:25 |
mordred | yah | 16:25 |
tobiash | assuming the git hash is unique enough this should work equally well in github :) | 16:26 |
mordred | cool. so there is a gh value that will work - it might not be immediately guessable by an end-user like the gerrit one is | 16:26 |
tobiash | not guessable but the intention is probably reporting this via status_url right? | 16:27 |
mordred | tobiash: but if we're mostly expecting this to get called as the result of links we're putting places, that should be good enough for now - and we can leave figuring out how to get someone from "ansible/ansible/pulls/6" to status/change/6,a672a19d197779e4d5410cb25775ee373cf47269 later | 16:27 |
*** JasonCL has joined #zuul | 16:27 | |
mordred | tobiash: yup | 16:27 |
mordred | however - I think at some point having a dashboard page for a change that doesn't change as the change is revised would be nice - perhaps something like 'change/review.openstack.org/501040' and '/change/github.com/ansible/ansible/6' - so that the key is similar to what's used in depends-on lines? | 16:30 |
corvus | ++ | 16:31 |
mordred | but that dashboard page could use the ansible/ansible/6 to look up 6,a672a19d197779e4d5410cb25775ee373cf47269 and then get the status from status/change/6,a672a19d197779e4d5410cb25775ee373cf47269 | 16:31 |
mordred | (and in fact could potentially look up older builds/status for that change too and show info about them from the build history while showing live status or completed build info for the current change) | 16:32 |
SpamapS | corvus: to be clear, I log at debug level. The only indication I get that nodes are spinning up is this: 2018-02-18 01:04:45,639 DEBUG nodepool.NodeLauncher-0000005071: Waiting for server ded9e08e-311c-4c98-8b1f-1d516148aed9 for node id: 0000005071 | 16:32 |
SpamapS | actually I also maybe just missed this one 2018-02-18 01:04:30,230 INFO nodepool.NodeLauncher-0000005071: Creating server with hostname k7-a-0000005071 in a from image centos7-kolla for node id: 0000005071 | 16:33 |
corvus | SpamapS: oh, what notification are you missing? | 16:33 |
SpamapS | But I'd swear sometimes I don't see those until just before the request completes | 16:33 |
*** JasonCL has quit IRC | 16:33 | |
SpamapS | and really, I think the logging is ok. I am just overwhelmed with logs | 16:34 |
SpamapS | (Info has the creating message, and that should be plenty) | 16:34 |
SpamapS | The real problem is the AZ allocation. | 16:34 |
SpamapS | I haven't checked my math, but from what I can tell, each node added to a nodeset makes it *far far* harder to have enough ready nodes in the chosen AZ ever. | 16:35 |
tobiash | mordred: +2 with comment on https://review.openstack.org/#/c/545644 | 16:36 |
SpamapS | Because everything is chosen at random, min-ready may not spin up any nodes in the AZ you need next, but if there are nodeset_size - 1 nodes in that AZ already, then your job queues while a node spins up. | 16:36 |
SpamapS | And then min-ready may come along and not get enough nodes in that AZ again.. and the cycle repeats until you get a request which happens to pull a node from the very-full AZ. | 16:37 |
SpamapS | So yeah.. I think the simplest thing to do is let people turn off the nodeset-in-one-AZ for a particular cloud. | 16:38 |
mordred | tobiash: that's a good point ... | 16:38 |
corvus | mordred: when did those 4 patches become the first in the series? i thought 538099 was the first? | 16:39 |
corvus | mordred: oh, i think you explained that in the commit msg for 537010... | 16:40 |
*** jpena is now known as jpena|brb | 16:42 | |
dmsimard | mordred: replied to your email on zuul-discuss with some additional thoughts | 16:43 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Use a status code to detect unknown vs. missing tenant https://review.openstack.org/545879 | 16:45 |
mordred | tobiash: ^^ how about if we did that? | 16:45 |
tobiash | mordred: looks good, but what do you think about just sending the complete payload with 404? | 16:50 |
SpamapS | corvus: so yeah just to reset.. I think my logging problem was actually that I should have been watching my INFO-only log and I'd have seen the activity just fine. | 16:50 |
tobiash | that way the response still would be json (consistent and also machine readable if you don't have the status code like in a shell pipe) | 16:50 |
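For reference, a minimal aiohttp sketch of what tobiash is suggesting: return the JSON payload with a 404 status so callers get the status code while shell pipelines still get machine-readable output. Handler name and tenant lookup are illustrative, not the actual zuul-web code.

```python
from aiohttp import web

async def tenant_status(request):
    tenant = request.match_info['tenant']
    if tenant not in request.app['known_tenants']:  # illustrative lookup
        # Body is still JSON, but the status code tells callers the
        # tenant is unknown rather than merely empty.
        return web.json_response(
            {'error': 'unknown tenant', 'tenant': tenant}, status=404)
    return web.json_response({'tenant': tenant, 'status': []})
```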
SpamapS | but my "why are my jobs always queueing for 5 minutes" problem is with AZ stickyness meeting large nodeset sizes. | 16:51 |
mordred | tobiash: wfm - one sec | 16:51 |
corvus | SpamapS: ok | 16:51 |
SpamapS | The latter I intend to submit a patch for. | 16:51 |
clarkb | fwiw I'm not a huge fan of the random az by default behavior in nodepool and think we should use the default az unless a list of az's is specified | 16:51 |
SpamapS | Which would allow turning it off per provider for those who know the cloud that is pointed at should be fine with cross-AZ jobs. | 16:52 |
clarkb | (and aiui default could be nova giving you whatever it wants to give you) | 16:52 |
dmsimard | mordred, andreaf: btw just in case you missed my reply for this issue http://eavesdrop.openstack.org/irclogs/%23zuul/%23zuul.2018-02-18.log.html#t2018-02-18T17:43:57 .. see my answer here: http://eavesdrop.openstack.org/irclogs/%23zuul/%23zuul.2018-02-19.log.html#t2018-02-19T04:50:41 | 16:52 |
SpamapS | clarkb: agreed. I didn't know we did that by default. | 16:52 |
SpamapS | Typically clouds will configure their default AZ to be the one that is most empty. | 16:52 |
clarkb | SpamapS: yes it surprised me when we broke citycloud due to this behavior | 16:52 |
SpamapS | In GoDaddy's case, we catually require explicit AZ because we did routed networks before neutron did them. ;) | 16:53 |
SpamapS | catually and actually. | 16:53 |
* SpamapS meows | 16:53 | |
clarkb | SpamapS: in that case you could use different providers for each az | 16:54 |
clarkb | except now we won't weight the requests by max-server size anymore right? | 16:54 |
corvus | clarkb: pool, not provider | 16:54 |
clarkb | so maybe that doesn't help | 16:54 |
clarkb | oh right pool. But still I'm not sure it helps unless we weigh the requests based on max-server size | 16:55 |
corvus | what is max-server size? | 16:55 |
clarkb | the max-servers allowed to be booted by each pool/provider | 16:56 |
corvus | clarkb: pool again | 16:56 |
clarkb | so that you could give the smaller az a smaller max-servers value and it would be weighed against the whole. This is how old nodepool worked but I think with the new request mechanism we don't do that anymore | 16:56 |
corvus | clarkb: why would that be necessary? | 16:56 |
clarkb | corvus: because aiui SpamapS' | 16:56 |
clarkb | er | 16:57 |
clarkb | my understanding of SpamapS' problem is that he has two az's, one smaller than the other. When the small one gets a large node request, say for 5 nodes, it will often block waiting for resources to clear up | 16:57 |
clarkb | if we weighed handling of requests the bigger az would be more likely to handle that request until they would both be in a situation to block and then it doesn't matter | 16:57 |
*** JasonCL has joined #zuul | 16:58 | |
mordred | tobiash: actually, the exception there doesn't take that parameter at all - and takes code and message explicitly | 16:58 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Use a status code to detect unknown vs. missing tenant https://review.openstack.org/545879 | 17:01 |
corvus | clarkb, SpamapS: yeah, there are a lot of improvements that can be made to the launchers if they are aware of each other. right now, the algorithm assumes no information sharing between them. i think it's worth evolving it in the future to share info, but it needs to be done carefully. | 17:01 |
clarkb | another way to approach this is to let the cloud put instances in whichever AZ is most appropriate. Though sounds like that won't entirely work for godaddy's cloud | 17:01 |
clarkb | I do think it is worth figuring out if we can remove the random az by default behavior in nodepool as it is surprising to users of openstack clouds imo | 17:01 |
corvus | clarkb: i believe the reason we always specify an az is because otherwise we can't guarantee that we'll get nodes in the same az. | 17:02 |
clarkb | yes, but it is also not how all the existing clients and tools work if you don't specify an az so you will get different potentially unexpected behavior | 17:02 |
clarkb | (I know I hadn't expected that when we ran into it) | 17:02 |
corvus | clarkb: sure, but, i mean, if this existed we wouldn't have had to write nodepool. :) | 17:03 |
corvus | let's make this argument less circular | 17:03 |
corvus | we need to (in some cases) create 2 nodes that are guaranteed to be in the same az | 17:03 |
corvus | is there another way to do that? | 17:03 |
clarkb | instruct the users that if they have more than one AZ to configure them? | 17:04 |
clarkb | I think this is a case where ^ is less harm than booting instances in AZs you shouldn't be booting in | 17:04 |
clarkb | I have yet to find a cloud where the default az is actually more than one az | 17:04 |
clarkb | even though it is theoretically possible | 17:04 |
corvus | clarkb: can you ask the cloud what the default az is? | 17:05 |
clarkb | I am not sure | 17:05 |
mordred | no | 17:05 |
corvus | that would be the simplest solution. perhaps we should start the very long process of adding that as an openstack api feature? | 17:06 |
mordred | well - it's not really ... it would be exceptionally hard to do - because an AZ isn't actually a thing | 17:06 |
mordred | so, with regions, you can query the keystone catalog - because region is an aspect of the endpoint registration record | 17:07 |
mordred | but azs exist independently in nova, neutron and cinder | 17:07 |
corvus | mordred: well, we only care about regions in nova, so it could be a nova api call | 17:07 |
corvus | er | 17:08 |
corvus | az not region | 17:08 |
clarkb | worth pointing out that az foo in nova may require volumes from az bar in cinder and a different combo of az's won't work. So you'll have to configure nodepool for cases like that anyways (random won't necessarily work) | 17:08 |
clarkb | (though we aren't doing cinder volumes with nodepool so mostly a non issue) | 17:09 |
mordred | right - I was actually just checking floating ips to make sure we weren't accidentally doing something incorrect there | 17:09 |
corvus | clarkb: perhaps a way to use the default az would be to have nodepool remember what value or values it has gotten back, and then discard errant nodes or specify that value going forward | 17:10 |
*** dmellado has joined #zuul | 17:10 | |
mordred | but I believe we're fine due to how we create floating ips directly on a port | 17:10 |
*** myoung is now known as myoung|bbl | 17:11 | |
corvus | clarkb: that would probably make things better in the citycloud case. i don't think anything short of information sharing between launchers would help SpamapS's case. | 17:11 |
clarkb | yes, I think SpamapS' case is exceptional in its corner case qualities :) | 17:12 |
*** JasonCL has quit IRC | 17:12 | |
mordred | clarkb: SpamapS is exceptional at producing corner cases | 17:12 |
clarkb | and ya, we could remember/check the az value of the first node that comes back using the default (no value) az request, then explicitly specify that going forward | 17:12 |
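For reference, a sketch of the idea corvus and clarkb are floating here: boot the first node without specifying an AZ, record whichever AZ the cloud picked, and pin the rest of the nodeset to it. This is pseudocode around a hypothetical launch() helper, not nodepool's actual launcher code.

```python
def launch_nodeset(launch, count):
    """launch(az=None) boots one server and returns the AZ nova placed it in."""
    # First node: let the cloud pick its default AZ.
    chosen_az = launch(az=None)
    azs = [chosen_az]
    # Remaining nodes: pin to the AZ the cloud chose, so the whole
    # nodeset stays co-located even though we never knew the default.
    for _ in range(count - 1):
        azs.append(launch(az=chosen_az))
    return azs
```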
*** JasonCL has joined #zuul | 17:21 | |
*** jpena|brb is now known as jpena | 17:22 | |
*** JasonCL has quit IRC | 17:25 | |
*** JasonCL has joined #zuul | 17:37 | |
corvus | tobiash: did you want to review https://review.openstack.org/537011 ? | 17:37 |
tobiash | corvus: oh yes, forgot about that between discussion with mordred and dinner ;) | 17:39 |
andreaf | dmsimard thanks | 17:40 |
corvus | tobiash, mordred: tristanC reviewed previous versions of all those changes positively, so i went ahead and approved all of them except 537011. | 17:40 |
mordred | corvus, clarkb SpamapS: I have verified with the nova team that there is no way to find out what the default az is | 17:41 |
*** JasonCL has quit IRC | 17:42 | |
tobiash | lgtm | 17:42 |
corvus | mordred: looks like you're all clear on those 4 patches | 17:43 |
mordred | corvus: woot! thanks! | 17:43 |
corvus | over in #openstack-infra, we found that my memory improvement is using a bit too much cpu, so i'm going to switch to working on a fix to that. | 17:44 |
mordred | corvus: ++ | 17:44 |
mordred | corvus: I'll wait to suggest a scheduler/zuul-web restart on your cpu fix | 17:44 |
corvus | sounds good | 17:45 |
mordred | anybody remember who it was that had a console app that shows zuul status? was that dansmith? | 17:53 |
mordred | clarkb: ^^? | 17:53 |
clarkb | I think harlowja and dansmith had their own | 17:54 |
*** sshnaidm is now known as sshnaidm|off | 17:54 | |
*** JasonCL has joined #zuul | 17:59 | |
*** tosky_ is now known as tosky | 18:04 | |
*** JasonCL has quit IRC | 18:05 | |
*** tosky has quit IRC | 18:07 | |
dmsimard | mordred: harlowja yeah | 18:12 |
dmsimard | I have a link for it somewhere... | 18:12 |
*** hashar_ is now known as DockerIsA_Sin | 18:12 | |
dmsimard | mordred: https://github.com/harlowja/gerrit_view/blob/master/README.rst#czuul | 18:14 |
*** openstackgerrit has quit IRC | 18:18 | |
*** jpena is now known as jpena|away | 18:19 | |
*** JasonCL has joined #zuul | 18:26 | |
mordred | dmsimard: thanks. pull request submitted | 18:28 |
dmsimard | \o/ | 18:29 |
*** chrnils has quit IRC | 18:31 | |
*** JasonCL has quit IRC | 18:32 | |
*** DockerIsA_Sin is now known as hashar | 18:33 | |
*** openstackgerrit has joined #zuul | 18:34 | |
openstackgerrit | Merged openstack-infra/zuul master: Remove .json suffix from web routes https://review.openstack.org/537010 | 18:34 |
openstackgerrit | Merged openstack-infra/zuul master: Port per-change status to zuul-web https://review.openstack.org/535903 | 18:40 |
openstackgerrit | Merged openstack-infra/zuul master: Re-enable test of returning 404 on unknown tenant https://review.openstack.org/545644 | 18:40 |
openstackgerrit | Merged openstack-infra/zuul master: Add /info and /{tenant}/info route to zuul-web https://review.openstack.org/537011 | 18:40 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Avoid creating extra project schemas https://review.openstack.org/545953 | 18:45 |
clarkb | corvus: is ^ the performance fix? | 18:46 |
corvus | clarkb: yep | 18:46 |
corvus | that was not what i was expecting, fwiw. | 18:46 |
clarkb | ok reviewing now, then breakfast | 18:46 |
corvus | but in my local tests, that restored (and actually slightly improved with the extra changes i put in there) the parse time of a large configuration to the same as before the whole rework stack | 18:47 |
corvus | the difference was about double (ie, https://review.openstack.org/545448 caused the time to double, and then https://review.openstack.org/545953 halved it again). that doesn't match the 6x difference we see in openstack-infra. i can't really account for that other than to say that my local config doesn't mirror it exactly. | 18:49 |
corvus | so i think we should try that in prod to see whether that's a complete or partial fix. | 18:49 |
corvus | i started a local test run when i pushed it up -- it looks like it's going to fail unit tests, but i think it may only be due to test infrastructure issues. i think it's worth reviewing it for substance now while i fix up the test failures. | 18:51 |
AJaeger | corvus: did you see mordred's change https://review.openstack.org/#/c/545610 ? That touches similar lines as yours... | 18:51 |
clarkb | corvus: substance wise it looks fine to me | 18:51 |
corvus | AJaeger: i like mordred's change -- but let's rebase it after this one since it's not as urgent | 18:52 |
fungi | yeah, i approved it before i saw you mention local test failures. curious to see where it breaks | 18:57 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Avoid creating extra project schemas https://review.openstack.org/545953 | 18:58 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Allow test_playbook to run long https://review.openstack.org/545955 | 18:58 |
dmsimard | Do we have a strategy for updating Ansible in Zuul without potentially breaking everyone's jobs ? | 18:59 |
corvus | fungi, clarkb, mordred: ^ local test failures fixed. i also stacked it on top of a test timeout bump. | 18:59 |
fungi | thanks, i was wondering whether the change in timeout handling was to blame for the rise in unit tests timeouts for zuul changes | 18:59 |
corvus | dmsimard: no, i think what we're going to do is land the 2.4 update, deal with breakage, and after that work out a way to support multiple versions. | 19:00 |
corvus | dmsimard: (then, of course, changing the default version will break everyone, but that's an easier fix) | 19:00 |
fungi | but i guess the tests themselves outgrew the timeout some too | 19:00 |
corvus | fungi: yeah, i think that particular test timeout is due to our adding new jobs to that test for the post-timeout and hostvars | 19:00 |
fungi | k | 19:01 |
fungi | i see, that's the per-test timeout, not the job/playbook timeout | 19:01 |
dmsimard | corvus: yeah Ansible unfortunately doesn't have a very good reputation at being "stable" (at least in the semver sense), typically breaking roles/playbooks. One has to be careful about using the word "regression" in #ansible-devel, often times it's a fix that broke an "unexpected behavior" or something like that :) | 19:01 |
dmsimard | Since we have hundreds of jobs, upgrading Ansible might be quite a sensitive task | 19:02 |
*** harlowja has joined #zuul | 19:03 | |
fungi | and no sane way to exercise them all against new ansible versions other than just taking the plunge and seeing what breaks | 19:04 |
clarkb | corvus: the two changes are approved now | 19:06 |
*** nhicher has quit IRC | 19:06 | |
*** myoung|bbl is now known as myoung | 19:07 | |
*** nhicher has joined #zuul | 19:07 | |
dmsimard | fungi: that's scary | 19:08 |
dmsimard | fungi: I wonder if there could be a job that'd run a job in another version of Ansible somehow | 19:08 |
clarkb | dmsimard: aiui the idea is that with zuul testing ansible we can avoid some of those issues, but yes. Upgrading zuul's ansible likely to be an adventure | 19:08 |
dmsimard | clarkb: I'm not worried so much about Zuul than I am about the users' jobs | 19:09 |
clarkb | dmsimard: sure thats what I mean. the ansible zuul runs | 19:09 |
corvus | dmsimard: well, i mean, if you do that, you've solved the problem of running multiple versions of ansible. which we plan to do. | 19:09 |
corvus | so i mean, i guess we could stay on 2.3 until we do that. | 19:09 |
clarkb | at the very least we need to publish what version of ansible a job ran under to aid in debugging (I think mordred was going to add that, not sure if it got added) | 19:09 |
corvus | but i feel like we're early enough in the process that we (and i mean the collective zuul's of the world here, not just openstack) can deal with a cutover to 2.4. | 19:10 |
Shrews | pretty sure the logging of ansible version is already landed | 19:10 |
corvus | then solve for multiple versions of ansible later | 19:10 |
dmsimard | clarkb: That's https://review.openstack.org/#/c/514489/ and https://review.openstack.org/#/c/532304/ which were blocked by a weird include_role issue, I think that's fixed now. I'll look. | 19:11 |
fungi | dmsimard: we could set up some system to run preliminary builds of select jobs against newer ansible or something, but my point was we have tons of jobs which run infrequently under specific situations and have side effects we can't fully exercise outside production without a lot of extra effort to do things like set up our own mock pypi and whatnot (and even that doesn't 100% guarantee it'll accurately | 19:11 |
fungi | represent a production run) | 19:11 |
dmsimard | OH, no, it was a random fedora26 failure I could not understand | 19:11 |
dmsimard | I'll debug that. | 19:11 |
Shrews | dmsimard: oh, that still hasn't landed. thought it had | 19:12 |
dmsimard | fungi: yeah I don't have any great ideas right now... at least as far as ARA is concerned, there are different integration jobs which tests that ara works with different versions of Ansible (2.2, 2.3, 2.4, devel). We don't really have the luxury of doing that with Zuul and the hundreds of jobs our users have | 19:13 |
dmsimard | and then again, it needs to be tested under *Zuul's* Ansible | 19:13 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul-jobs master: Revert "Revert "Add zuul.{pipeline,nodepool.provider,executor.hostname} to job header"" https://review.openstack.org/514489 | 19:14 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul-jobs master: Add Ansible version to job header https://review.openstack.org/532304 | 19:14 |
*** jpena|away is now known as jpena|off | 19:14 | |
clarkb | is the ansible version something we want the jobs to provide or something that zuul should just write out? | 19:17 |
clarkb | (I worry that that is a feature that everyone should have and if it is in job config many will miss it?) | 19:17 |
corvus | well, this is the way we provide it to everyone -- a role in zuul-jobs that anyone can (should) use | 19:18 |
clarkb | ya I guess the intent is for people to be using these roles. Maybe info about these roles can go into the bootstrapping doc when writing a base job | 19:20 |
*** JasonCL has joined #zuul | 19:20 | |
corvus | clarkb: yeah, that's sort of where the zuul-from-scratch doc leaves off right now. but i believe we're all agreed it should be included there. | 19:22 |
*** JasonCL has quit IRC | 19:25 | |
dmsimard | clarkb: I believe we discussed in the past that if users need advanced customization of their Ansible (ex: OSA might need a specific version of Ansible to test things.. different modules/callbacks, etc.), they would need to rely on installing a "nested" version of Ansible and running with that | 19:47 |
corvus | well, that's the story now. but just so we're clear: after we upgrade to ansible 2.4, we are going to figure out a way for zuul to support more than one version of ansible, and people will be able to specify which version of ansible to run in their jobs. | 19:51 |
dmsimard | corvus: that sounds amazing -- hard, but amazing | 19:51 |
corvus | it will not be easy, but it is required in order for us to achieve our objective of people using the same ansible in test and production. | 19:52 |
rcarrillocruz | wow | 19:52 |
rcarrillocruz | That would be amazing indeed | 19:52 |
*** JasonCL has joined #zuul | 19:53 | |
clarkb | seems like the biggest hurdle there is the security monkey patching? | 19:53 |
fungi | seems like we're going to become increasingly aware of the nuanced differences between each minor ansible release | 19:53 |
clarkb | otherwise you can just install a virtualenv for each ansible and fork ansible out of them? | 19:54 |
fungi | ansible seemed amenable to upstreaming some of the security improvements, right? | 19:54 |
fungi | i mean, after the got over the "what? you're going to run arbitrary user-supplied playbooks on your servers!?!" | 19:55 |
fungi | after they | 19:55 |
dmsimard | clarkb: yeah, we need a clean way/interface to patch the stuff that we do in Zuul (security is one, streaming is another, etc.) | 19:57 |
corvus | i actually think that's not going to take much more work. we already need to do work for each new version of ansible. so it's *mostly* a matter of not deleting the old versions, but with some minor-to-moderate extra maintenance work to deal with stable point releases. most of that can be automated (this stuff doesn't generally change a lot between point releases) | 20:00 |
corvus | the thing we have to sort of invent is how to have multiple versions of ansible installed that works with all the ways people want to deploy zuul | 20:01 |
*** JasonCL has quit IRC | 20:02 | |
dmsimard | I have some quite dirty code to handle forward/backward compat in ARA | 20:10 |
dmsimard | i.e, https://github.com/openstack/ara/blob/dd37d4566bb4a17871e55e4ba50e17b46079efc2/ara/config.py#L41 "if LooseVersion(ansible_version) >= LooseVersion('2.3.0')" etc | 20:10 |
clarkb | corvus: my naive assumption here is that zuul can sort of internally use the new python3 virtualenv stdlib tooling | 20:10 |
dmsimard | If we end up finding a better way to handle this sort of stuff in Zuul, I'm all ears | 20:11 |
clarkb | https://docs.python.org/3/library/venv.html#api | 20:11 |
dmsimard | clarkb: so the executor would bootstrap the appropriate version of Ansible after creating the bwrap or something like that ? | 20:12 |
clarkb | dmsimard: I think it can do it before bwrapping then bind mount into it and just keep a bunch of premade ansible installs | 20:12 |
clarkb | but ya something like that | 20:12 |
clarkb | probably make them on demand the first time | 20:12 |
dmsimard | ah I see what you mean.. yeah, that could make sense | 20:13 |
corvus | clarkb: ah yes, that sounds like just the thing. | 20:20 |
mordred | clarkb, corvus: yah - I like the venv idea and think that's a good way to move forward on that | 20:41 |
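For reference, a minimal sketch of the stdlib venv approach clarkb links above: build one virtualenv per supported Ansible version (on demand, the first time it is needed) and run that environment's ansible-playbook. Paths and the pip invocation are illustrative, not how zuul actually does (or will do) it.

```python
import os
import subprocess
import venv

def ensure_ansible_venv(version, base_dir='/var/lib/zuul/ansible'):
    """Create (once) and return the ansible-playbook for a given Ansible version."""
    path = os.path.join(base_dir, version)
    if not os.path.exists(path):
        venv.EnvBuilder(with_pip=True).create(path)
        subprocess.check_call(
            [os.path.join(path, 'bin', 'pip'), 'install', 'ansible==%s' % version])
    return os.path.join(path, 'bin', 'ansible-playbook')

# e.g. run a job's playbook under 2.4:
# subprocess.check_call([ensure_ansible_venv('2.4.3'), 'playbook.yaml'])
```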
corvus | dmsimard: what's the status of https://review.openstack.org/543633 ? | 20:43 |
corvus | dmsimard: (that's the fix for https://storyboard.openstack.org/#!/story/2001329 right?) | 20:44 |
mordred | corvus: so - I just noticed (because I looked at docs online and the docs lied to me) we have an upper-bounds cap on aiohttp of <3 ... but 3.0.1 is out now | 20:57 |
corvus | mordred: yeah, they broke something, commit should have info | 20:58 |
mordred | corvus: it's mostly the same, although there is at least one thing that would be nice: https://aiohttp.readthedocs.io/en/stable/web_advanced.html#graceful-shutdown | 20:58 |
corvus | mordred: https://review.openstack.org/543546 | 20:58 |
mordred | corvus: yah - that was what I was about to say - glad to see that was the reason for the cap | 20:59 |
mordred | so we can update to 3.0 when we've migrated to bionic | 20:59 |
corvus | mordred: oh, any chance 3.0.1 fixed the issue? | 20:59 |
Shrews | odd. i wonder what 3.5.3 has that they needed | 21:00 |
SpamapS | clarkb: "my understanding of SpamapS' problem is that he has two az's one is smaller than the other." -- that's a misunderstanding. I have 3 AZ's, and they are all relatively the same size. | 21:00 |
dmsimard | corvus: I rebased it today, it's on top of my todo list right now. I plan on reproducing it with https://review.openstack.org/#/c/530642/12/playbooks/devstack-tempest.yaml | 21:00 |
dmsimard | corvus: it's probably the fix, yes, but I want a test that is able to demonstrate that it actually does fix it | 21:01 |
dmsimard | corvus: er, I rebased something else that was *not* that -- I don't actually need to rebase that one. | 21:01 |
clarkb | SpamapS: ah ok, I'm not sure what nodepool would do differently then? maybe weight not based on max-servers but by available quota? | 21:05 |
corvus | dmsimard: can you add a test for it in zuul? | 21:09 |
dmsimard | corvus: That's what I attempted to do with the functional integration test https://review.openstack.org/#/c/543633/2/playbooks/zuul-stream/functional.yaml but it doesn't seem to reproduce the issue, I'm doing some testing locally to try and understand why | 21:11 |
mordred | clarkb: I think what I was hearing from SpamapS yesterday is that we do the az randomization before we check for ready nodes - so he had an az with ready nodes, but the az balancer chose a different az and proceeded to spin up the nodes there | 21:12 |
clarkb | ah | 21:13 |
corvus | er, that's not how that works | 21:14 |
corvus | if there's a ready node, we'll grab it and use its AZ to spin up any more nodes if necessary | 21:14 |
mordred | oh - yah. I just re-read | 21:15 |
pabelanger | I'm just looking at https://review.openstack.org/545953/ and py35 seems to have failed | 21:15 |
mordred | in his case it was that min-ready was also working on spinning up nodes and he was near capacity | 21:15 |
corvus | spectacularly.. i'll look into that | 21:16 |
mordred | pabelanger: oh - wow - that's a lot of red | 21:16 |
pabelanger | looks like threads didn't shutdown propelry | 21:17 |
pabelanger | properly* | 21:17 |
clarkb | but we also can't really fix that without having handler coordination like what corvus described I think | 21:18 |
clarkb | because it requires handlers to have a rough idea if any other handler is in a better spot to handle a request | 21:18 |
mordred | clarkb: yah | 21:18 |
corvus | those tests work in isolation... | 21:18 |
corvus | i'm wondering if one test went awry and messed up the state for other tests | 21:19 |
openstackgerrit | Merged openstack-infra/zuul master: Allow test_playbook to run long https://review.openstack.org/545955 | 21:19 |
corvus | oh i think i know which :) | 21:21 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Hack for test_delete_now rare failures https://review.openstack.org/545982 | 21:22 |
corvus | mordred: is there a way i can have ttrun do 2 different classes? | 21:22 |
Shrews | i have been pondering the test_delete_now failures today. i'd like to see the above hack go in until a real solution can be generated | 21:23 |
corvus | mordred: like i want to run: "ttrun -epy35 tests.unit.test_model" and "ttrun -epy35 tests.unit.test_requirements" in one solution? | 21:23 |
corvus | Shrews: oh is *that* the thing we added recursive locking for? | 21:23 |
corvus | Shrews: that merged in kazoo, we can probably use it if we can remember how it was going to solve it. | 21:23 |
Shrews | corvus: i spent a great deal today trying to see how that could help (i totally forgot the solution we discussed) | 21:24 |
Shrews | corvus: i did not see how it would help at all since ReadLock and WriteLock still create znodes under the thing we intend to delete | 21:24 |
corvus | Shrews: heh, i may have to ask eavesdrop to figure out what we were thinking | 21:24 |
mordred | corvus: not really, no - I think stestr can though | 21:25 |
clarkb | tox -epy36 -- '(TestClass1|TestClass2)' ? | 21:25 |
mordred | corvus: actually, I think testr itself can too ... ^^ like that | 21:26 |
mordred | you may also want to add --parallel=1 or whatever the concurrency arg is | 21:26 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Avoid creating extra project schemas https://review.openstack.org/545953 | 21:26 |
mordred | concurrency | 21:26 |
mordred | tox -epy35 -- --concurrency=1 'tests.unit.test_(model|requirements)' | 21:27 |
mordred | should do the trick | 21:27 |
corvus | oh ok, well i just started running the whole suite locally | 21:27 |
mordred | k | 21:27 |
corvus | though i'm pretty sure that should fix it | 21:27 |
mordred | corvus: should we go ahead and re-promote in gate? | 21:29 |
corvus | mordred: i should have a local test run in just a minute | 21:30 |
clarkb | does nothing have to call readConfig? | 21:30 |
corvus | clarkb: the scheduler now calls both read and load config | 21:30 |
clarkb | was that git added? | 21:31 |
clarkb | (I'm not seeing it but also could just be looking poorly) | 21:31 |
corvus | clarkb: i may need more context for your question | 21:31 |
clarkb | I'm not seeing where the scheduler calls read config. But I'm about to fetch the change locally and grep better | 21:31 |
mordred | corvus: gerrit is showing something strange in the interdiff | 21:31 |
corvus | clarkb: you're talking about what change? mordred's config changes? | 21:32 |
mordred | corvus, clarkb: that readConfig change is from here: https://review.openstack.org/#/c/545610/ ... but gerrit is showing it in the 'show me differences between patchset 2 and 3' view | 21:32 |
clarkb | 545953 I think I'm just not seeing an existing api due to the restricted scope of that change | 21:32 |
corvus | my latest patchset is a fix+rebase | 21:32 |
mordred | oh - it's a rebase - so it's just the weird gerrit rebase thing | 21:32 |
corvus | sorry, i don't normally like to do that, but i did so this time because the test failures were hard to repro locally and i thought it might be due to a change in the tree | 21:33 |
mordred | and sorry - it wasn't from that patch, it was from a different one | 21:33 |
corvus | i can un-rebase that if you'd like | 21:33 |
mordred | corvus: ++ | 21:33 |
mordred | corvus: nah - I was just confused because I thought it was content from an un-landed change :) | 21:33 |
corvus | i only made one change, and the rebase should (in retrospect) not be necessary. the change is to add a cleanup in test_model. | 21:33 |
clarkb | corvus: ya cleanup in test_model makes sense to me | 21:33 |
mordred | ++ | 21:34 |
corvus | (so probably easiest to just look at the actual diff from base and know that single line is the only change in the latest patchset) | 21:34 |
clarkb | ok I've approved it | 21:35 |
clarkb | git log -p straightened me out | 21:36 |
mordred | \o/ | 21:36 |
*** threestrands has joined #zuul | 21:37 | |
corvus | local tests pass, i'll re-enqueue in gate | 21:37 |
fungi | the 545956 update included a rebase i guess | 21:38 |
fungi | ahh, scrollback confirms | 21:39 |
corvus | dmsimard: 543633 has a unit test in it -- does it not fail without the fix? | 21:39 |
corvus | sorry, i should have undone that but forgot | 21:39 |
SpamapS | So .. sort of? | 21:39 |
* SpamapS is laggy today sorry | 21:40 | |
dmsimard | corvus: that's not unit, that's integration -- the reason I'm confused is that this exact same patch used to fail if you look at the prior history in https://review.openstack.org/#/c/504238/ | 21:40 |
corvus | dmsimard: oh, it doesn't have a unit :( | 21:40 |
dmsimard | corvus: it suddenly stopped failing and I was never able to understand why | 21:40 |
SpamapS | My issue is that sometimes I routinely have 10+ ready nodes in an AZ, and because requests randomly choose a ready node, instead of a ready set, it's entirely possible we never choose one of those 10, and thus, end up needing to spin up nodeset_size - 1 nodes on every single request. | 21:41 |
mordred | dmsimard: maybe point-release of ansible? | 21:41 |
dmsimard | I'm working on setting up a reproducer right now, mordred's test-logs ( https://github.com/openstack-infra/zuul/blob/master/tools/test-logs.sh ) is quite useful -- I'll send some improvements for it | 21:41 |
SpamapS | I have min-ready: 15, and a nodeset size of 5, and 3 AZ's. | 21:41 |
clarkb | SpamapS: so we probably want to make ready nodes a heap sorted by nodes per az | 21:41 |
clarkb | SpamapS: and pull off in that order | 21:41 |
mordred | dmsimard: \o/ I'm useful :) | 21:42 |
SpamapS | I had hoped that meant we'd have a high chance of picking an AZ with an entire set in it. | 21:42 |
SpamapS | What actually happens is, we pick the right one almost never. | 21:42 |
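For reference, a quick back-of-the-envelope simulation of the effect SpamapS describes, under the simplifying assumptions that min-ready nodes land in a uniformly random AZ and a request adopts the AZ of one randomly chosen ready node; in steady state, with ready nodes being consumed between min-ready top-ups, the odds are worse than this fresh-pool estimate. Purely illustrative numbers from the discussion above (3 AZs, min-ready 15, nodeset size 5).

```python
import random
from collections import Counter

AZS, MIN_READY, NODESET = 3, 15, 5

trials, hits = 100000, 0
for _ in range(trials):
    ready = [random.randrange(AZS) for _ in range(MIN_READY)]
    counts = Counter(ready)
    # A request adopts the AZ of one randomly chosen ready node...
    az = random.choice(ready)
    # ...and only avoids booting fresh nodes if that AZ already holds a full set.
    if counts[az] >= NODESET:
        hits += 1

print('chance a full nodeset was already ready: %.0f%%' % (100.0 * hits / trials))
```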
corvus | mordred, dmsimard: well part of mordred's work on logging is that we should not need test-logs anymore | 21:42 |
corvus | we need *actual* tests for this stuff, and i believe mordred is adding those | 21:42 |
dmsimard | mordred: The fix for the issue we're talking about landed in 2.4.3 so unless Zuul is suddenly using 2.4.3, something's up | 21:42 |
corvus | so let's not spend too much time on improving test-logs | 21:42 |
mordred | corvus: I don't think we're looking at improving it at all | 21:43 |
mordred | corvus: I think he's just using it to help figure out why things suddenly started working | 21:43 |
dmsimard | corvus: sure -- it's a way for me to run the test playbooks with the callbacks and stuff without having access to a sandbox/test zuul environment | 21:43 |
dmsimard | mordred: yeah that | 21:43 |
corvus | mordred: dmsimard said he was improving it | 21:43 |
SpamapS | It gets really bad when 2 or 3 req's come in at once. | 21:43 |
dmsimard | corvus: 2 minutes worth of improvement to make my life easier :P | 21:43 |
mordred | corvus: not related to that- do all of the parsers do self.schema = self.getSchema now? | 21:44 |
mordred | corvus: nm - they don't | 21:44 |
corvus | mordred: i think only tenantparser doesn't... but i think it probably could? | 21:44 |
SpamapS | clarkb: I think we'd also need to make min-ready's do the same in reverse. When a min-ready req comes in, choose the AZ with the least ready nodes. | 21:45 |
corvus | just wasn't important to me earlier because it's only called once either way | 21:45 |
corvus | dmsimard: so what happens if you add that test without the fix? | 21:45 |
SpamapS | Or, just allow disabling az-sticky | 21:45 |
mordred | corvus: I actually tried generalizing the schema bits when I made the base class patch - but it got too weird and I figured I'd do it as a follow up | 21:45 |
mordred | corvus: (there was one of the classes that called getSchema with arguments) | 21:45 |
SpamapS | Take nodeset_size nodes from the provider and ignore AZ. | 21:45 |
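For reference, a sketch of the selection policy clarkb and SpamapS outline above: satisfy node requests from the AZ with the most ready nodes, and point min-ready top-ups at the AZ with the fewest. Purely illustrative, not the nodepool allocation code.

```python
def pick_az_for_request(ready_nodes_by_az, nodeset_size):
    """Prefer the AZ most likely to already hold a full nodeset."""
    az = max(ready_nodes_by_az, key=lambda a: len(ready_nodes_by_az[a]))
    # Fall back to the current behaviour (boot fresh nodes) if even the
    # fullest AZ can't satisfy the whole nodeset.
    return az if len(ready_nodes_by_az[az]) >= nodeset_size else None

def pick_az_for_min_ready(ready_nodes_by_az):
    """Point min-ready top-ups at the emptiest AZ to keep pools balanced."""
    return min(ready_nodes_by_az, key=lambda a: len(ready_nodes_by_az[a]))
```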
dmsimard | corvus: I'll have an answer in a few minutes | 21:45 |
corvus | mordred: yeah, nodeset has an anonymous argument, so we store 2 variations | 21:46 |
mordred | yah. that was the one that made it seem less nice | 21:46 |
corvus | mordred: could throw a layer of indirection in there... self.createSchema() and override that in nodeset.... | 21:47 |
mordred | corvus: oh - wait - you've solved the issue I had already I think | 21:48 |
mordred | yup. | 21:49 |
mordred | corvus: thanks! your latest patches remove the issue I had putting that in the base :) | 21:49 |
corvus | yay | 21:50 |
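For reference, a rough sketch of the pattern corvus and mordred are converging on: build the (voluptuous) schema once per parser instance via a hook that subclasses with special needs can override, instead of rebuilding it for every object parsed. Class and method names are illustrative, not the actual zuul code.

```python
import voluptuous as vs

class Parser:
    def __init__(self):
        # Built once per parser, reused for every object parsed.
        self.schema = self.getSchema()

    def getSchema(self):
        raise NotImplementedError

class ProjectParser(Parser):
    def getSchema(self):
        return vs.Schema({'name': str, 'templates': [str]})

class NodeSetParser(Parser):
    """Takes an argument, so it builds and stores its variant up front."""
    def __init__(self, anonymous=False):
        self.schema = self.getSchema(anonymous)

    def getSchema(self, anonymous=False):
        schema = {'nodes': list}
        if not anonymous:
            schema['name'] = str
        return vs.Schema(schema)
```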
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Extract an abstract base Parser class https://review.openstack.org/545610 | 21:53 |
mordred | corvus: there ya go ^^ | 21:53 |
clarkb | SpamapS: in your config you have a single pool for all three AZs and nodepool is randomly assigning nodes to each AZ right? | 21:56 |
*** hashar has quit IRC | 21:56 | |
dmsimard | corvus: can't reproduce it locally again... with or without the patch. I'll send a dummy patch with a generic version of what andreaf reported last time it occurred. | 21:57 |
andreaf | mordred I started documenting a bit https://review.openstack.org/#/q/status:open+project:openstack-dev/devstack+branch:master+topic:ansible_docs | 21:58 |
corvus | andreaf: \o/ | 22:00 |
mordred | http://logs.openstack.org/52/545952/1/check/build-openstack-sphinx-docs/478cade/html/roles.html <-- looks great! | 22:00 |
clarkb | meeting time right? | 22:00 |
corvus | yep, meeting time in #openstack-meeting-alt | 22:00 |
mordred | andreaf: there's a similar thing for jobs too | 22:00 |
mordred | andreaf: .. zuul:autojobs:: IIRC | 22:01 |
mordred | andreaf: the second patch - I think you missed a git add | 22:02 |
andreaf | mordred: heh ok thanks | 22:14 |
*** JasonCL has joined #zuul | 22:16 | |
mordred | andreaf: exciting to see that though | 22:17 |
openstackgerrit | Merged openstack-infra/zuul master: Avoid creating extra project schemas https://review.openstack.org/545953 | 22:18 |
*** openstackgerrit has quit IRC | 22:18 | |
*** JasonCL has quit IRC | 22:20 | |
*** openstackgerrit has joined #zuul | 22:38 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Use a status code to detect unknown vs. missing tenant https://review.openstack.org/545879 | 22:38 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Remove Paste from the dependencies https://review.openstack.org/546000 | 22:38 |
mordred | corvus, pabelanger, clarkb: ^^ 545879 fixes an api mismatch - and the follow up just cleans up Paste - neither are urgent, but both should be easy +As | 22:54 |
*** JasonCL has joined #zuul | 22:54 | |
dmsimard | (╯°□°)╯︵ ┻━┻ | 22:57 |
dmsimard | corvus, mordred: https://review.openstack.org/#/c/545994/ didn't trigger the json issue | 22:57 |
mordred | dmsimard: :( | 22:59 |
mordred | dmsimard: maybe a 2.3 point release fixed the issue | 22:59 |
dmsimard | mordred: as far as I know, the fix is https://github.com/ansible/ansible/commit/c30ee42fe1f0a9666a90f4d63121780f2a186c54 and it didn't land until 2.4.3, it's not backported to 2.3.x which is what we're using | 23:02 |
mordred | dmsimard: yah | 23:02 |
dmsimard | mordred: yeah 2.3 has the bug: https://github.com/ansible/ansible/blob/stable-2.3/lib/ansible/executor/task_executor.py#L459 | 23:03 |
dmsimard | this is annoying | 23:03 |
dmsimard | andreaf: do you have some sort of reference on the failure from the include_role in https://review.openstack.org/#/c/530642/12/playbooks/devstack-tempest.yaml ? | 23:04 |
dmsimard | andreaf: which job ran that code ? | 23:05 |
andreaf | dmsimard devstack-tempest and tempest-full and tempest-full-py3 | 23:07 |
*** JasonCL has quit IRC | 23:07 | |
dmsimard | andreaf: I can't find any signs of the error I'm looking for, could you point it out to me ? | 23:10 |
corvus | dmsimard: well, according to the bug in storyboard, we saw the issue in build e4fdb1c004fe47478a5468bb6318ad66 | 23:23 |
corvus | dmsimard: http://logs.openstack.org/42/530642/11/check/tempest-full/e4fdb1c/job-output.txt.gz#_2018-02-09_18_10_40_314968 | 23:24 |
corvus | dmsimard: there is no output in the console log about that | 23:24 |
corvus | the behavior is just start followed by an immediate end | 23:25 |
corvus | dmsimard: and you'll note there's no entry in ara for the devstack-tempest playbook | 23:25 |
corvus | dmsimard: it's patchset 11 that hit that error, not the latest version. | 23:26 |
corvus | dmsimard: patchset 11 doesn't use include_role... it actually looks like it's invalid ansible, but that should have been reported. | 23:27 |
corvus | dmsimard: that's in executor-debug.log.10.gz on ze03 if more logs would help | 23:31 |
dmsimard | corvus: I'll check it out, thanks | 23:45 |