*** caphrim007 has quit IRC | 00:03 | |
*** ssbarnea has quit IRC | 00:28 | |
jhesketh | corvus, mordred, dmsimard: Yep, the start of which for locally run stuff is here: https://review.openstack.org/#/q/status:open+project:openstack-infra/zuul+branch:master+topic:freeze_job (I took a long vacation last month, so I'm just catching up on things and hope to continue that "real soon now"(tm)) | 00:37 |
*** irclogbot_3 has quit IRC | 00:38 | |
dmsimard | jhesketh: ++ I know the feeling | 00:56 |
*** rlandy has quit IRC | 01:58 | |
*** threestrands has joined #zuul | 02:30 | |
*** bhavikdbavishi has joined #zuul | 03:52 | |
dmsimard | my zuul-fu is rusty, what's the best approach when you'd like to override just a single parameter from a job that is otherwise okay ? | 03:58 |
dmsimard | like, pretend in a totally hypothetical scenario that I'd like my tox-linters job to run on ubuntu-bionic instead of the current ubuntu-xenial | 03:58 |
dmsimard | it's not enough to warrant creating a child job that would inherit tox-linters | 03:59 |
dmsimard | it would just be a variant I guess ? | 04:01 |
clarkb | dmsimard: yes would just be a variant | 04:07 |
clarkb | can be as simple as setting nodeset where you choose to run the job | 04:07 |
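A minimal sketch of the variant clarkb describes, assuming the tenant already defines a `tox-linters` job and an `ubuntu-bionic` label; adding this to the project's own zuul config overrides only the nodeset and inherits everything else:

```yaml
# Hypothetical .zuul.yaml variant: same job name, only the nodeset changes.
- job:
    name: tox-linters
    nodeset:
      nodes:
        - name: linter-node        # placeholder node name
          label: ubuntu-bionic     # assumes this label exists in nodepool
```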
dmsimard | for some reason I may misremember variants as being across branches of a single project | 04:08 |
dmsimard | or perhaps that's just the most common use case.. I'm rusty T_T | 04:09 |
clarkb | yes that is one instance where it happens | 04:10 |
tristanC | dmsimard: you may also override parameter when adding the job to a pipeline | 04:11 |
tristanC | dmsimard: e.g. https://softwarefactory-project.io/r/#/c/13585/3/zuul.d/openshift-jobs.yaml | 04:11 |
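For comparison, a hedged sketch of the pipeline-level override tristanC is pointing at (the `check` pipeline and the label are assumptions); attributes set here apply only where the job is attached to this project:

```yaml
# Hypothetical project stanza: override job attributes at the point where
# the job is added to a pipeline, leaving the job definition untouched.
- project:
    check:
      jobs:
        - tox-linters:
            nodeset:
              nodes:
                - name: linter-node
                  label: ubuntu-bionic
```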
dmsimard | tristanC: oh, thanks | 04:12 |
*** bhavikdbavishi has quit IRC | 04:30 | |
*** pall has quit IRC | 05:28 | |
*** threestrands has quit IRC | 06:23 | |
tobiash | tristanC, corvus: what do you think about making config errors a bit more prominent in the web ui? | 06:47 |
tobiash | currently it's just a tiny little bell in the upper right corner. This can be overlooked easily by most users. I think having a real warning bar would be more appropriate as this should indicate something a project should fix. | 06:49 |
tobiash | spotting this at first glance is especially important with the github workflows and protected branches, as those give a lot of opportunity for introducing config errors which we don't have any chance to catch | 06:51 |
*** bhavikdbavishi has joined #zuul | 06:52 | |
tristanC | tobiash: i actually don't have an opinion :) | 06:52 |
tristanC | tobiash: though i meant to add a generic error_reducer to simplify api error handling: | 06:53 |
*** andreaf has quit IRC | 06:53 | |
tristanC | in the main app.jsx, we could have a warning banner hidden on all the pages | 06:53 |
tobiash | tristanC: for me as a multi-tenant operator this would be crucial as config errors are one of our recurring pain points | 06:54 |
tristanC | and an error reducer that could make it visible when api call fails, or in this case, when config-error list isn't empty | 06:54 |
jhesketh | tobiash: my opinion fwiw is that it's appropriate. The majority of visitors to the zuul dashboard are looking for their job status or results; they are likely less concerned with the configuration | 06:54 |
*** andreaf has joined #zuul | 06:55 | |
jhesketh | of course we should be logging the errors back to the patchsets introducing them, but we need to figure out a better way of testing config projects first | 06:55 |
tristanC | tobiash: my concern with multi-tenant operation is that you have to go through each tenant to get their status and config-errors. What do you think of having top-level endpoints to return aggregated data? | 06:56 |
tobiash | jhesketh: the problem is that with many workflows there are no patchsets that introduce the errors, but just things like repo renames, protecting a branch, etc, which are all things that can be done by any repo admin in github without any possibility for zuul to react or report | 06:56 |
tobiash | tristanC: I'm not interested in seeing all config errors on a global scale but in making them more visible to my users (who only care about their own tenant) | 06:57 |
tobiash | tristanC: each tenant is mainly responsible for its own config here | 06:59 |
tobiash | what we're doing atm to mitigate this as best as possible is to start a zuul to verify the global config on each main.yaml change that is proposed (e.g. to add/rename/remove repos) | 07:00 |
tobiash | that is not going to scale in the foreseeable future so I'd like to change that to only validate if the main.yaml itself is valid | 07:01 |
jhesketh | tristanC: it would be nice to have a top-level view for administrators, but I agree with tobiash about not displaying them all to all tenants | 07:01 |
tobiash | jhesketh: so currently the errors are already scoped by tenant so that's great | 07:01 |
jhesketh | tobiash: and yes, I see your point about where errors may come from... I'm still not sure if the visitors to the dashboard will care for larger warnings though? If we ever build an admin set of pages perhaps it could be more prominent there? | 07:01 |
jhesketh | :thumbsup: | 07:01 |
tobiash | jhesketh: I think this just needs to be a bit more visible so I don't get complaints several times a week asking why zuul is not acting on a specific repo and have to tell them 'look, there is a tiny little bell that tells you why' ;) | 07:03 |
jhesketh | that's fair and I'm not against making it a bit more obvious. I would be cautious of making it look like zuul is broken when an uninitiated developer visits it for the first time to check their patch's progress though | 07:05 |
tristanC | for a large tenant, like the one at zuul.openstack.org, wouldn't it be odd to display a single project's config errors as a very visible warning bar? | 07:05 |
jhesketh | perhaps it could be a large warning on the project's page (when that merges) | 07:05 |
tobiash | tristanC: yes, maybe that needs to be different/configurable for single and multi tenant setups? | 07:06 |
tobiash | I see that a warning bar could be annoying in large single tenant installations | 07:07 |
tobiash | ... while it could save me much time supporting projects in my large multi-tenant setup | 07:08 |
*** bhavikdbavishi has quit IRC | 07:10 | |
*** quiquell has joined #zuul | 07:12 | |
quiquell | Good morning | 07:12 |
*** bhavikdbavishi has joined #zuul | 07:13 | |
*** pcaruana has joined #zuul | 07:36 | |
*** bhavikdbavishi has quit IRC | 07:37 | |
*** quiquell is now known as quiquell|brb | 07:49 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: doc: fix typo in secret example https://review.openstack.org/616095 | 08:02 |
*** quiquell|brb is now known as quiquell | 08:04 | |
*** themroc has joined #zuul | 08:16 | |
*** bhavikdbavishi has joined #zuul | 08:23 | |
*** hashar has joined #zuul | 08:33 | |
*** bhavikdbavishi has quit IRC | 08:36 | |
*** jpena|off is now known as jpena | 08:50 | |
*** hashar has quit IRC | 08:55 | |
*** goern has joined #zuul | 08:56 | |
*** hashar has joined #zuul | 09:00 | |
*** ssbarnea has joined #zuul | 09:19 | |
*** electrofelix has joined #zuul | 09:29 | |
*** ttx has quit IRC | 09:37 | |
quiquell | Hello, I am using the zuul running in docker-compose as explained in the user guide to test stuff | 09:39 |
quiquell | I am pushing a project there but it takes a lot of time at "Processing changes" - is that normal? | 09:40 |
*** panda|off is now known as panda | 09:43 | |
*** ttx has joined #zuul | 09:50 | |
*** sshnaidm|afk is now known as sshnaidm|rover | 10:01 | |
*** rfolco|ruck has joined #zuul | 10:37 | |
*** hashar has quit IRC | 10:42 | |
quiquell | ianw: I am testing zuul's docker-compose thingy | 10:51 |
quiquell | ianw: But I have some issues pushing projects there | 10:51 |
*** bhavikdbavishi has joined #zuul | 10:51 | |
quiquell | ianw: It takes forever... do you know what it could be? | 10:51 |
*** bhavikdbavishi has quit IRC | 11:24 | |
*** ssbarnea has quit IRC | 11:34 | |
*** ssbarnea has joined #zuul | 11:53 | |
*** bhavikdbavishi has joined #zuul | 11:59 | |
*** snapiri has joined #zuul | 12:22 | |
*** sshnaidm|rover is now known as sshnaidm|afk | 12:50 | |
*** jpena is now known as jpena|lunch | 12:53 | |
*** rlandy has joined #zuul | 12:54 | |
*** jpena|lunch is now known as jpena | 13:18 | |
*** goern has quit IRC | 13:21 | |
*** bhavikdbavishi has quit IRC | 13:29 | |
*** JosefWells has joined #zuul | 13:33 | |
JosefWells | I'm having trouble getting zuul to run ansible playbooks.. | 13:45 |
JosefWells | in my logs I see: | 13:45 |
JosefWells | zuul-executor_1 | FileNotFoundError: [Errno 2] No such file or directory: 'ansible' | 13:45 |
*** pcaruana has quit IRC | 13:45 | |
JosefWells | I put together a docker compose for zuul a while back | 13:45 |
JosefWells | https://github.com/josefwells/zuul-docker | 13:45 |
JosefWells | In the docker build I see ansible being grabbed as part of the pip install zuul | 13:46 |
tobiash | JosefWells: is it in the path? | 13:46 |
JosefWells | Running setup.py bdist_wheel for ansible: started Running setup.py bdist_wheel for ansible: finished with status 'done' | 13:46 |
JosefWells | that is really my question I guess | 13:47 |
JosefWells | it is pip installed in the container that zuul-executor is running in | 13:47 |
JosefWells | I guess not.. I need to add the zuul home pip bin to path | 13:47 |
JosefWells | thanks tobiash, let me add that to my dockerfile | 13:48 |
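One hedged sketch of the kind of fix being discussed, shown here as a docker-compose override rather than the Dockerfile change JosefWells mentions; the install prefix is an assumption and depends on how zuul (and therefore ansible) was pip-installed in the image, so treat the paths as placeholders:

```yaml
# Hypothetical docker-compose override: make the pip-installed console
# scripts (including 'ansible') visible to the executor.
services:
  zuul-executor:
    environment:
      # /var/lib/zuul/.local/bin is an assumed install location; adjust it
      # to wherever pip placed the ansible entry point in your image.
      PATH: /var/lib/zuul/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
```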
JosefWells | I saw some discussion on the mailing list about an official docker setup for zuul | 13:48 |
tobiash | :) | 13:48 |
JosefWells | this was my first rodeo with docker, so it is probably sub-optimal | 13:48 |
tobiash | JosefWells: there is a new docker based quick start tutorial: https://zuul-ci.org/docs/zuul/admin/quick-start.html | 13:49 |
JosefWells | cool, I'll take a look | 13:49 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix reporting ansible errors in buildlog https://review.openstack.org/616206 | 13:57 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix reporting ansible errors in buildlog https://review.openstack.org/616206 | 13:59 |
*** pcaruana has joined #zuul | 14:00 | |
tobiash | corvus, mordred: ansible errors like missing roles are now missing from the buildlog. I assume since the ansible update, and we didn't notice because of a lack of testing for this. So here is a fix and a test ^ | 14:08 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix reporting ansible errors in buildlog https://review.openstack.org/616206 | 14:11 |
mordred | tobiash: whoops - and looks good to me | 14:12 |
*** smyers has quit IRC | 14:13 | |
tobiash | :) | 14:13 |
*** smyers has joined #zuul | 14:14 | |
openstackgerrit | Merged openstack-infra/nodepool master: Correct heading levels for Kubernetes config docs https://review.openstack.org/616007 | 14:20 |
*** pcaruana has quit IRC | 14:33 | |
*** pcaruana has joined #zuul | 14:34 | |
*** quiquell is now known as quiquell|off | 16:19 | |
*** hashar has joined #zuul | 16:30 | |
*** hashar has quit IRC | 16:35 | |
*** hashar has joined #zuul | 16:36 | |
*** pcaruana has quit IRC | 16:38 | |
*** hashar has quit IRC | 17:02 | |
*** irclogbot_3 has joined #zuul | 17:02 | |
openstackgerrit | Fabien Boucher proposed openstack-infra/zuul master: WIP - Pagure driver https://review.openstack.org/604404 | 17:25 |
*** themroc has quit IRC | 17:30 | |
*** jpena is now known as jpena|off | 17:46 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool master: Add resource metadata to nodes https://review.openstack.org/616262 | 18:04 |
openstackgerrit | Merged openstack-infra/zuul master: Fix reporting ansible errors in buildlog https://review.openstack.org/616206 | 18:04 |
openstackgerrit | Merged openstack-infra/zuul master: doc: fix typo in secret example https://review.openstack.org/616095 | 18:04 |
*** panda is now known as panda|off | 18:10 | |
Shrews | tobiash: i don't believe i really understand that metadata change | 18:12 |
tobiash | Shrews: zuul's side is wip, just had an idea how we could have per project/tenant compute resource statistics | 18:13 |
tobiash | Shrews: the idea is that nodepool could add the amount of resources (cores, ram, more in the future) to each node as metadata | 14:13 |
tobiash | and zuul could push usage data when locking/unlocking nodes | 18:14 |
tobiash | to statsd | 18:16 |
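To make the proposal concrete, a sketch (YAML for readability) of the kind of per-node resource metadata being discussed; the field names and units are assumptions, not the final schema from the review above:

```yaml
# Hypothetical resource metadata attached to a nodepool node record.
resources:
  cores: 8        # vCPUs of the provider flavor
  ram: 16384      # MiB
# Zuul could then increment/decrement per-tenant gauges when a nodeset is
# locked/unlocked and, on unlock, emit resources*duration usage counters
# to statsd.
```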
corvus | oh, i get it. i like that idea. | 18:21 |
Shrews | hrm, seems like the calculations might be expensive this way (depending on how often they're done) | 18:22 |
corvus | it would only change on node status changes, so in the scheme of things, we wouldn't have to calculate it that often. | 18:25 |
tobiash | the idea was to just increment/decrement gauges (maybe stored in the tenant object) | 18:27 |
tobiash | that would happen on every lock/unlock of a nodeset | 18:27 |
Shrews | oh i see. so usage as zuul sees it | 18:27 |
tobiash | yes | 18:27 |
tobiash | and on nodeset unlock I'd also like to emit a counter of resource*duration | 18:28 |
Shrews | i wonder if there's a better way we can de-duplicate that data, rather than having it in every node | 18:28 |
tobiash | Shrews: well, we could store that in the node request | 18:29 |
tobiash | but that's slightly more complicated as this needs to be updated on every node that is added | 18:29 |
tobiash | but that would be fine for me too | 18:30 |
Shrews | what you have is fine. i'm just thinking out loud right now | 18:30 |
corvus | plus, different nodes may be different sizes | 18:31 |
corvus | (we don't do that in openstack -- yet -- but the ability and intent is there) | 18:31 |
tobiash | if we have it in the request we would have to sum it up in nodepool | 18:31 |
tobiash | we have different sized nodes so supporting that is required for me anyway :) | 18:32 |
tobiash | do you prefer a defined format for this or the loose metadata-dict approach I took? | 18:33 |
tobiash | I'd be fine with both | 18:33 |
Shrews | i just mean that a particular flavor from a provider will always be the same. seems a bit wasteful to store that data in all the nodes matching that. but i'm just doing unnecessary pre-optimization that may not be necessary | 18:33 |
Shrews | thus my overuse of "unnecessary" which was totally unnecessary | 18:34 |
Shrews | and superfluous | 18:34 |
tobiash | if we want to optimize that we would need to store the flavor data in zk which is a new object and different by providers | 18:34 |
corvus | Shrews: oh, yes, you're right. it's not exactly third-normal form is it? :) but yeah, we don't (yet) have a zk record for provider-flavor so that's a little more difficult. | 18:34 |
Shrews | tobiash: exactly where my head was headed, then zuul could do the lookup | 18:35 |
tobiash | so I would prefer to store that little bit of extra data for the sake of simplicity | 18:35 |
Shrews | corvus: it's the db dev in me... i can't shake it :) | 18:36 |
tobiash | Shrews: we currently have 600kb of data in zk with 200 nodes so that really might be premature optimization ;) | 18:36 |
Shrews | tobiash: yeah, but it's your "more in the future" statement that we should probably think a bit about | 18:37 |
tobiash | Shrews: the more in the future might be volume, disk, but I cannot think about much more | 18:38 |
Shrews | i wouldn't like small metadata to grow to large metadata, then we'd be forced to redesign | 18:38 |
tobiash | Shrews: then we probably should go with a fixed schema | 18:39 |
corvus | i wonder if containers have any useful metadata? | 18:40 |
Shrews | hrm | 18:40 |
Shrews | exposed ports? volumes? base image? | 18:40 |
tobiash | that's why this should probably be optional, so zuul can use it or not depending on its existence | 18:40 |
corvus | yeah, ports, volumes, networks are countable. and containers themselves for the namespaces. | 18:40 |
corvus | i like the 'meta' dictionary for things like this. easy to put whatever makes sense for a given resource in there. | 18:41 |
corvus | (though i could also see having a "resource" dictionary with the same content. but either way, i do like the data-in-a-dict approach) | 18:42 |
* tobiash just wanted to suggest to rename that dict to resource | 18:42 | |
Shrews | seems like this metadata could be grouped by "label" if we determine we need to go that route | 18:42 |
tobiash | "label" can be different per provider | 18:43 |
Shrews | yep, so a /<provider>/<label> node hierarchy | 18:44 |
Shrews | not suggesting we do that. just wildly throwing out ideas | 18:44 |
Shrews | much like mordred does | 18:44 |
tobiash | lol | 18:45 |
Shrews | :) | 18:45 |
Shrews | oh wait... do we already store some metadata somewhere??? | 18:47 |
Shrews | yes! in the launcher node | 18:47 |
Shrews | we store supported labels when the launcher thread registers itself | 18:48 |
tobiash | I'd suggest taking the easy approach and adding the counters to a resource dict per node (or per nodeset as an optimization?). This is an easier, less error-prone approach and I don't see a scaling limit there in the foreseeable future. We can switch to a more sophisticated approach any time if this gets to be too much in the future. | 18:48 |
*** kmalloc is now known as needscoffee | 19:05 | |
*** electrofelix has quit IRC | 19:47 | |
*** hashar has joined #zuul | 20:11 | |
*** needscoffee is now known as kmalloc | 20:19 | |
*** rfolco|ruck is now known as rfolco|off | 20:40 | |
*** jtanner has quit IRC | 20:47 | |
*** mattclay has quit IRC | 20:47 | |
*** mrhillsman has quit IRC | 20:47 | |
*** jbryce has quit IRC | 20:47 | |
*** hogepodge has quit IRC | 20:47 | |
*** fdegir has quit IRC | 20:47 | |
*** jbryce has joined #zuul | 20:48 | |
*** mnaser has quit IRC | 20:48 | |
*** mattclay has joined #zuul | 20:48 | |
*** hogepodge has joined #zuul | 20:48 | |
*** fdegir has joined #zuul | 20:48 | |
*** jtanner has joined #zuul | 20:49 | |
*** mnaser has joined #zuul | 20:49 | |
*** andreaf has quit IRC | 20:51 | |
dkehn | tobiash: I guess this is a repetitive question: when the scheduler is retrieving ProtectedBranch data over the github connection, is it using the zuul user? The sshkey config variable is set, but I notice that there is no user/password combo? | 20:51 |
*** andreaf has joined #zuul | 20:52 | |
tobiash | dkehn: it is using the auth mechanism you configured in the github connection | 21:00 |
tobiash | so this can be a user or github app auth | 21:00 |
*** edleafe_ has joined #zuul | 21:03 | |
dkehn | ok, sshkey is configured, just wanted to make sure, | 21:06 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: WIP: Report tenant and project specific resource usage stats https://review.openstack.org/616306 | 21:07 |
dkehn | tobiash: would you have time to look at a pastebin of the error I’m seeing, https://pastebin.com/T9mDkwRM | 21:13 |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool master: Add resource metadata to nodes https://review.openstack.org/616262 | 21:13 |
tobiash | dkehn: I guess you have protected branches in that repo? | 21:15 |
dkehn | tobiash: actually I’m not sure | 21:16 |
tobiash | dkehn: oh, well it isn't even searching for protected branches: https://github.ibm.com/api/v3/repos/wrigley/zuul/experiment-conf/branches?per_page=100&protected=0 | 21:16 |
tobiash | dkehn: so make sure that zuul has access to that repo | 21:17 |
tobiash | to me it looks like zuul has no access to that repo at all which is the reason why github returns 404 | 21:17 |
tobiash | dkehn: do you have the zuul.conf at hand (remove credentials before upload) | 21:18 |
dkehn | tobiash: actually when you explain to someone else you see a glaring error in the config | 21:19 |
tobiash | :) | 21:19 |
clarkb | https://etherpad.openstack.org/p/BER-reusable-zuul-job-configurations etherpad prep for tuesday session in berlin on reusing zuul configs | 21:33 |
clarkb | please feel free to add topics you'd like to discuss or have others think about | 21:33 |
*** hashar has quit IRC | 21:49 | |
*** hashar_ has joined #zuul | 21:50 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: WIP: Set relative priority of node requests https://review.openstack.org/615356 | 21:58 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Remove unneeded nodepool test skips https://review.openstack.org/616358 | 21:58 |
*** hashar_ has quit IRC | 22:07 | |
corvus | tobiash: two questions on 604648: 1) what do you think about my idea of just updating the cache inside of lockNode and always using the cache? 2) regardless of that, why do we care about the races in stats? surely that's a place where it would be better to use cached data? | 22:12 |
corvus | tobiash: (my understanding is that this cache uses callbacks, so the updates should be relatively quick, meaning the stats should not be far out of date) | 22:13 |
tobiash | corvus: even slightly delayed stats can prevent the stats from dropping to zero in certain situations | 22:15 |
tobiash | corvus: but there is a followup that fixes some of that | 22:15 |
corvus | tobiash: if we make the stats cheap (by using the cache) we could just send them periodically. | 22:16 |
corvus | (so we could send them on every node update, or every 10 seconds if there are no node updates) | 22:16 |
corvus | that should clear things out. | 22:16 |
tobiash | corvus: this is the followup i meant: https://review.openstack.org/#/c/613680 | 22:17 |
tobiash | It does something similar | 22:18 |
tobiash | corvus: regarding 1) The current version already defaults to using the cache | 22:20 |
tobiash | The cache is only disabled in a few cases in order to work around some test races | 22:21 |
corvus | tobiash: i think i'm suggesting the opposite. rather than rate limit, keep the current behavior, but send periodic updates to clean up any lingering errors. | 22:21 |
tobiash | The current behavior isn't really cheap even with the cache | 22:21 |
corvus | tobiash: yeah, i'm wondering why you didn't think we needed to update the cache in the node lock. basically... i wrote a nice review comment and got no reply. :( | 22:21 |
tobiash | Sorry, was side tracked at that time | 22:22 |
corvus | tobiash: really? it seems like with the cache it should be very fast. | 22:22 |
tobiash | There is a lot of json parsing involved while iterating, even with the cache | 22:23 |
tobiash | So we save the network rtts but not json parsing | 22:23 |
tobiash | If we want to save that we need to cache the node objects, which would be another cache layer | 22:24 |
corvus | tobiash: yeah, i think we should look at doing that; there should be update events we get from the treecache that can do that for us. | 22:24 |
tobiash | Regarding update on lock, I think it's a good idea, then we could remove all the scattered safety updates after lock | 22:25 |
corvus | ++ | 22:25 |
tobiash | But I think that's independent of the cache, as the update after lock already exists everywhere | 22:26 |
corvus | i don't think we need to block on the stats thing -- we can live with rate limiting for now. just i think we should work toward a system that's fast enough that we can count all the nodes in a reasonable amount of time. :) | 22:26 |
corvus | tobiash: yes, though i think it should also give us the confidence to remove the cache flag. | 22:27 |
corvus | so we can hide that detail | 22:27 |
tobiash | Yes, with the additional cache layer that will be much faster | 22:27 |
corvus | (put another way -- i think it will be easier for us as nodepool developers to just remember that everything is cached, it could be slightly out of date, *unless* a node is locked -- then it's guaranteed to be up to date and immutable. that's easy to understand :) | 22:28 |
tobiash | But starting only with the treecache was easier, maybe it makes sense to do that in two steps | 22:28 |
corvus | ++ | 22:28 |
corvus | tobiash: one more question -- do you have any work in progress to cache the node request tree? or immediate plans to do so? | 22:29 |
tobiash | Not yet | 22:29 |
corvus | ok, i might do that very soon :) | 22:30 |
corvus | i want it for https://review.openstack.org/615356 | 22:30 |
tobiash | Sounds cool, we are very interested in your priority idea too :) | 22:31 |
tobiash | So if I can help... | 22:31 |
*** rfolco|off has quit IRC | 22:34 | |
*** ssbarnea has quit IRC | 22:37 | |
tobiash | corvus: one thought I have about that is that we might need to adapt the loop behavior of assignHandlers to make this work | 22:38 |
tobiash | I observed that this loop skips a lot of requests due to lock contention with several non-matching providers | 22:39 |
tobiash | So it runs through the whole list and skips every request that it couldn't lock | 22:40 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: WIP: Set relative priority of node requests https://review.openstack.org/615356 | 22:41 |
tobiash | Maybe we need some criteria to tell it to abort the loop and start again from the beginning | 22:42 |
corvus | tobiash: hrm, i don't understand the problem you're describing | 22:43 |
tobiash | The problem is that we have say two providers that can fulfill most of the requests and 8 pools with seldom used static nodes | 22:44 |
tobiash | All of these loop over the request list, lock and decline | 22:45 |
tobiash | So there is a high probability that a high prio request is locked by a static provider that will decline | 22:46 |
tobiash | But that will cause that request to be skipped by the provider that can fulfill this | 22:46 |
tobiash | So it takes the next request, and the now-skipped high prio request will only be served after the provider has looped over the complete list and starts again from the beginning | 22:48 |
tobiash | If that probability is high enough this circumvents the priority queue | 22:49 |
tobiash | We have times when this lock contention leads to more or less random orders of processing in the end | 22:50 |
clarkb | tobiash: maybe this is an indication the priority system should include a rewrite to use node subscriptions instead of polls? | 22:50 |
clarkb | or poll without locking? | 22:50 |
clarkb | read data, if actionable lock, if lock fails assume someone else has lock and move to next potential candidate? | 22:51 |
tobiash | I'm sure that this is partially an indication that we need to split launchers | 22:52 |
clarkb | oh you are talking about current priority system (not the proposed one) | 22:52 |
clarkb | might still be an indication that we should lock less aggressively and/or avoid polling | 22:53 |
tobiash | Yes, but that wouldn't matter | 22:53 |
clarkb | tobiash: reduced lock contention should make the ordering more deterministc? | 22:53 |
tobiash | A solution could be to start from the beginning after we accepted a request (or a few) | 22:54 |
tobiash | That would also fit into the dynamic prio | 22:55 |
tobiash | And this should be fast enough if we cache the requests | 22:57 |
tobiash | Yes, I guess in openstack land this is a non-issue because most providers can fulfill the same types? | 23:01 |
clarkb | tobiash: yes, though there are cases where providers fulfill specific types (specifically arm and kata test nodes) | 23:04 |
corvus | tobiash: ah i see. yes, restarting from the beginning would be a good fix for that i think, and low cost with request caching. | 23:06 |
SpamapS | did something happen to the add-build-sshkey role? | 23:06 |
corvus | SpamapS: i think i made a recent change to it that should be noop in most cases. | 23:07 |
SpamapS | http://paste.openstack.org/show/734380/ | 23:07 |
corvus | recent = months | 23:07 |
SpamapS | My zuul was idle for a few weeks and now every job goes to NODE_FAILURE with that fail | 23:07 |
corvus | SpamapS: oh, well, the role still exists | 23:08 |
SpamapS | http://paste.openstack.org/show/734381/ | 23:08 |
SpamapS | hm | 23:08 |
tobiash | SpamapS: node_failure normally happens before ansible | 23:09 |
tobiash | SpamapS: or did you mean retry_limit? | 23:09 |
corvus | SpamapS: the add-build-sshkey is in zuul-jobs, so you'll need that added to the roles for the job. i notice your second error says "git.openstack.org/openstack-infra/zuul-base-jobs" but that refers to "git.zuul-ci.org/zuul-jobs", so something may not be adding up in terms of your connections. | 23:12 |
corvus | (see http://git.zuul-ci.org/cgit/zuul-base-jobs/tree/zuul.yaml#n15 ) | 23:12 |
SpamapS | I had to use git.openstack.org because of {some reason I forget the details of} | 23:15 |
SpamapS | it's possible somebody has fixed that now | 23:15 |
*** rlandy is now known as rlandy|bbl | 23:29 |