*** threestrands has joined #zuul | 00:14 | |
SpamapS | why on earth are the drivers doing this btw? | 00:29 |
SpamapS | Just occurred to me that this seems like a generic problem for all providers. | 00:29 |
clarkb | SpamapS: my guess is because figuring out usage is driver specific | 00:30 |
SpamapS | But not at the max-servers level | 00:30 |
clarkb | we could push that into the drivers and then, ya, have that | 00:30 |
SpamapS | That's entirely nodepool's problem. | 00:30 |
clarkb | prior to actually checking quota it was entirely in the scheduler iirc | 00:31 |
SpamapS | There's "how many are running in the cloud vs. what you want to launch" and then there's "how many are in nodepool's database vs. what you want to run?" | 00:31 |
clarkb | nodepool used its local db to calculate usage | 00:31 |
SpamapS | Yeah, seems like both should happen. | 00:31 |
clarkb | but now it uses a combo of both | 00:31 |
SpamapS | seems like nodepool should handle max-servers itself. | 00:31 |
clarkb | SpamapS: I think this is largely just an overlooked item in the split out of drivers from only having an openstack driver | 00:31 |
SpamapS | makes sense | 00:32 |
*** pwhalen has quit IRC | 00:32 | |
*** tobiash has quit IRC | 00:32 | |
*** timburke has quit IRC | 00:32 | |
clarkb | but ya nodepool could ask the driver of a provider for its reported usage, grab usage from the db and compare the two at a higher level | 00:33 |
clarkb | then drivers would only need to implement providing the reported usage data | 00:33 |
*** tobiash has joined #zuul | 00:33 | |
*** timburke has joined #zuul | 00:34 | |
*** mgoddard has quit IRC | 00:34 | |
SpamapS | That's also good, but I need even less. | 00:35 |
SpamapS | Nodepool could look at the number of nodes a provider owns, and if it's >= max-servers, reject the request. | 00:35 |
SpamapS | No cloud request needed for that. | 00:35 |
clarkb | ya I actually think the openstack driver does that short cut | 00:35 |
SpamapS | I actually assumed it did that | 00:35 |
clarkb | but you'd still need the usage check for the general case | 00:36 |
SpamapS | and luckily I have a 30 instance limit.. or I might have eaten up my entire AWS budget yesterday. ;) | 00:36 |
clarkb | I have max-server headroom, but do I actually have quota? | 00:36 |
SpamapS | another broken assumption from the openstack driver that we accidentally carried into AWS is that this provider owns every instance it can see. | 00:37 |
*** mgoddard has joined #zuul | 00:37 | |
SpamapS | AWS definitely does not work like that. | 00:37 |
SpamapS | (well, it can, but only with a really carefully constructed IAM policy) | 00:37 |
SpamapS | so I'm adding a tag system so listNodes filters out nodes that don't belong to nodepool. | 00:38 |
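A rough sketch of the tag-based ownership filter being described, using boto3; the tag key (`nodepool-managed`) and region below are placeholders, not the names the driver actually uses:

```python
import boto3

ec2 = boto3.resource('ec2', region_name='us-west-2')  # region is illustrative

# Count only instances carrying the nodepool ownership tag, so instances
# launched by anything else in the same AWS account are ignored.
owned = ec2.instances.filter(
    Filters=[
        {'Name': 'tag:nodepool-managed', 'Values': ['true']},
        {'Name': 'instance-state-name', 'Values': ['pending', 'running']},
    ]
)
print(sum(1 for _ in owned))
```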
clarkb | oh it already does the thing I said via hasRemainingQuota | 00:38 |
clarkb | but aws driver doesn't implement that | 00:38 |
SpamapS | That shortcut should be moved to a concrete function in the parent class that gets called before hasRemainingQuota. | 00:39 |
SpamapS | something like if self.pool.max_servers is not None: ... | 00:39 |
SpamapS | anyway... for now.. just adding some filtering on tags and counting and it should work | 00:39 |
clarkb | and it doesn't shortcut like I thought | 00:40 |
clarkb | SpamapS: I think updating the base class hasRemainingQuota() to check local db against max* would get you what you want | 00:41 |
clarkb | then drivers can selectively do more by completely overriding the base class method or do both by supercalling it and then doing more | 00:41 |
SpamapS | clarkb: max*? Are there cpu counts hiding in there too? | 00:41 |
clarkb | SpamapS: ya ram, cpu, and instances | 00:41 |
clarkb | looks like | 00:41 |
clarkb | hrm I suppose you'd really only be able to do instances that way | 00:42 |
clarkb | because you won't know ram or cpu usage | 00:42 |
clarkb | but still an improvement | 00:42 |
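For context, a minimal sketch of the kind of base-handler check being discussed: compare instances recorded in nodepool's own database against max-servers, with no cloud call required. The attribute and method names here are illustrative rather than the real nodepool API:

```python
class PoolRequestHandler:
    """Hypothetical base handler; only the max-servers check is shown."""

    def hasRemainingQuota(self, node_types):
        max_servers = self.pool.max_servers
        if max_servers is None:
            return True
        # Count nodes this provider pool already owns according to the
        # local (ZooKeeper-backed) records; RAM/CPU quota cannot be
        # checked generically, so instances are the only dimension here.
        owned = [n for n in self.listPoolNodes()  # illustrative helper
                 if n.provider == self.provider.name]
        return len(owned) + len(node_types) <= max_servers
```

Drivers that can ask their cloud for more detail could override this, or call it via super() and layer RAM/CPU checks on top, as suggested above.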
SpamapS | ah ok, interesting.. AWS actually does have a price list API that has some of what the openstack flavors API has... so we could actually do this | 00:44 |
SpamapS | https://docs.aws.amazon.com/aws-cost-management/latest/APIReference/API_pricing_GetProducts.html | 00:44 |
SpamapS | In fact with that API, one could actually set up a run-rate as a limit | 00:47 |
SpamapS | but yeah, for now, I just want max-servers :-P | 00:47 |
* SpamapS working through just querying EC2. | 00:47 | |
openstackgerrit | Clark Boylan proposed openstack-infra/nodepool master: Add simple max-server sanity check to base handler class https://review.openstack.org/651676 | 01:01 |
clarkb | SpamapS: ^ I haven't tested that but it may be as simple as that for your needs | 01:01 |
*** jamesmcarthur has joined #zuul | 01:34 | |
*** jamesmcarthur has quit IRC | 01:48 | |
*** jamesmcarthur has joined #zuul | 01:49 | |
*** jamesmcarthur has quit IRC | 02:25 | |
*** bhavikdbavishi has joined #zuul | 02:42 | |
*** bhavikdbavishi has quit IRC | 03:05 | |
*** jamesmcarthur has joined #zuul | 03:25 | |
*** jamesmcarthur has quit IRC | 03:41 | |
*** bhavikdbavishi has joined #zuul | 03:44 | |
*** bhavikdbavishi1 has joined #zuul | 03:49 | |
*** bhavikdbavishi has quit IRC | 03:49 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 03:49 | |
*** jamesmcarthur has joined #zuul | 04:01 | |
*** jamesmcarthur has quit IRC | 04:19 | |
*** bjackman_ has joined #zuul | 04:25 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool master: Add python-path option to node https://review.openstack.org/637338 | 04:44 |
*** mhu has quit IRC | 04:45 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul-jobs master: Add ansible-lint job https://review.openstack.org/532083 | 04:46 |
*** quiquell|off is now known as quiquell|rover | 05:32 | |
*** gouthamr has quit IRC | 05:56 | |
*** gouthamr has joined #zuul | 06:00 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul-jobs master: Add ansible-lint job https://review.openstack.org/532083 | 06:12 |
*** quiquell|rover is now known as quique|rover|brb | 06:26 | |
*** gtema has joined #zuul | 06:57 | |
*** quique|rover|brb is now known as quiquell|rover | 07:06 | |
*** threestrands has quit IRC | 07:30 | |
*** jpena|off is now known as jpena | 07:36 | |
*** hashar has joined #zuul | 08:18 | |
*** yolanda_ has joined #zuul | 08:29 | |
*** mhu has joined #zuul | 09:26 | |
*** electrofelix has joined #zuul | 09:40 | |
*** bhavikdbavishi has quit IRC | 10:12 | |
*** yolanda_ has quit IRC | 10:40 | |
*** yolanda_ has joined #zuul | 10:52 | |
*** jpena is now known as jpena|lunch | 10:56 | |
quiquell|rover | tristanC: do you know if a job that depends on a non-voting job will start even if the dependency fails? | 11:01 |
quiquell|rover | nhicher: ^ do you know ? | 11:02 |
quiquell|rover | I am going to test that | 11:03 |
*** bhavikdbavishi has joined #zuul | 11:08 | |
AJaeger | I hope it does not ;) | 11:12 |
quiquell|rover | testing it but our pipeline looks like | 11:20 |
quiquell|rover | AJaeger: openstack-periodic at https://softwarefactory-project.io/zuul/t/rdoproject.org/status | 11:20 |
quiquell|rover | AJaeger: the dependency is the buildah job and it's failing | 11:20 |
quiquell|rover | AJaeger: testing it here https://review.rdoproject.org/r/#/c/20143/ | 11:21 |
quiquell|rover | AJaeger: nah, it's working as expected https://review.rdoproject.org/r/#/c/20143/ | 11:31 |
AJaeger | good ;) | 11:34 |
quiquell|rover | AJaeger: then I don't know why our jobs are running | 11:35 |
quiquell|rover | :-/ that's worse | 11:35 |
quiquell|rover | AJaeger: do you have some brain cycles for me ? | 11:35 |
quiquell|rover | AJaeger: this is the stuff I am talking about https://github.com/rdo-infra/review.rdoproject.org-config/blob/master/zuul.d/tripleo.yaml#L45-L47 | 11:40 |
AJaeger | sorry, not enough brain right now to dig into that ;( | 11:41 |
*** rlandy has joined #zuul | 11:57 | |
*** rlandy is now known as rlandy|ruck | 11:58 | |
*** jamesmcarthur has joined #zuul | 12:13 | |
*** jpena|lunch is now known as jpena | 12:26 | |
pabelanger | clarkb: mordred: tobiash: care to add https://review.openstack.org/163922 to your review queue, that should be a small improvement on zuul-merger with stacked commits | 12:29 |
*** jamesmcarthur has quit IRC | 12:34 | |
*** jamesmcarthur has joined #zuul | 12:47 | |
*** quiquell|rover is now known as quiquell|lunch | 12:59 | |
*** bjackman_ has quit IRC | 13:04 | |
*** jamesmcarthur has quit IRC | 13:04 | |
nhicher | quiquell|lunch: did you find the issue with your non-voting job ? | 13:06 |
quiquell|lunch | nhicher: there was no issue, just a brain fart on my part | 13:07 |
quiquell|lunch | thanks anyways | 13:07 |
nhicher | quiquell|lunch: ok =) | 13:08 |
*** webknjaz has joined #zuul | 13:09 | |
webknjaz | Hello everyone! | 13:10 |
* webknjaz wonders whether this is the right place to criticise the usage of CherryPy in Zuul... | 13:10 | |
*** bhavikdbavishi has quit IRC | 13:17 | |
Shrews | webknjaz: perhaps you mean "discuss" rather than "criticise"? but yes, this is the correct channel, though the person that switched us to that is on vacation at the moment | 13:17 |
*** pcaruana has quit IRC | 13:20 | |
webknjaz | @Shrews: yeah... The thing is that I've been exposed to the source code which made me a bit unhappy :) | 13:22 |
webknjaz | I'm a CherryPy maintainer (along with other things like aiohttp, ansible core). | 13:22 |
webknjaz | So I just wanted to point out how to do a few things cherrypy-way. I'm especially interested in GitHub Apps now. | 13:22 |
webknjaz | Here's a cleaner example of having event handlers from one of my PoCs, for example: https://github.com/webknjaz/ansiwatch-bot/blob/58246a8/ansiwatch_bot/apps/github_events.py#L136-L157 | 13:22 |
webknjaz | Oh, and I'm currently developing a framework for writing github apps and actions — https://tutorial.octomachinery.dev | 13:22 |
webknjaz | @Shrews: so I just wanted to say that if interested parties want some feedback, maybe you could tell them to ping me once they are back? | 13:23 |
Shrews | webknjaz: we'd LOVE feedback on how to do things better, or where we do things in not the best way. anything you can share to help us improve is much appreciated. No need to wait for that particular dev to return. | 13:30 |
webknjaz | Is this channel logged? Would it be better to share it in some more discoverable place? | 13:31 |
Shrews | webknjaz: yes, it is logged: http://eavesdrop.openstack.org/irclogs/%23zuul/ | 13:32 |
*** quiquell|lunch is now known as quiquell|rover | 13:37 | |
*** pcaruana has joined #zuul | 13:42 | |
webknjaz | I think that some parts of http://git.zuul-ci.org/cgit/zuul/tree/zuul/web/__init__.py#n647 could use plugins like from here: https://github.com/cherrypy/cherrypy/blob/b648977/cherrypy/process/plugins.py#L484-L580 | 13:51 |
webknjaz | This class also looks like it should be a plugin: http://git.zuul-ci.org/cgit/zuul/tree/zuul/driver/github/githubconnection.py#n101 | 13:53 |
webknjaz | people don't seem to realize that the core of CherryPy is a pubsub bus which they can actually use as well | 13:54 |
*** bjackman_ has joined #zuul | 13:56 | |
webknjaz | https://www.irccloud.com/pastebin/1QzyW2Aj/ | 13:59 |
*** bjackman_ has quit IRC | 14:01 | |
webknjaz | But I think that this is still a wrong approach. Architecturally, it's way nicer to customize the routing layer. CherryPy has pluggable and replaceable request dispatchers. | 14:02 |
webknjaz | For example, here's what I've done: https://github.com/webknjaz/ansiwatch-bot/blob/master/ansiwatch_bot/request_dispatcher.py | 14:02 |
webknjaz | This allows me to have per-event methods: https://github.com/webknjaz/ansiwatch-bot/blob/58246a8/ansiwatch_bot/apps/github_events.py#L136-L157 | 14:02 |
*** swest has quit IRC | 14:03 | |
webknjaz | You can have a decorator to unpack all webhook payload keys as handler method arguments: https://github.com/webknjaz/ansiwatch-bot/blob/master/ansiwatch_bot/tools.py | 14:03 |
*** swest has joined #zuul | 14:04 | |
webknjaz | Overall working with GitHub can be offloaded to plugins: https://github.com/webknjaz/ansiwatch-bot/blob/master/ansiwatch_bot/plugins.py | 14:05 |
Shrews | webknjaz: i'm not sure how much sense it makes for us to use the cherrypy dispatchers. we have other parts of the app (non-cherrypy based) that we are communicating with via gearman, which is what you see there in the payload() method | 14:13 |
webknjaz | what do you mean? does gearman act as a proxy? | 14:13 |
*** ianychoi has quit IRC | 14:13 | |
*** ianychoi has joined #zuul | 14:14 | |
Shrews | we have several pieces/daemons to "zuul" that all communicate with each other (submitting jobs/requests/etc) using gearman | 14:14 |
mordred | yeah. it's a network bus/ job worker system | 14:14 |
pabelanger | webknjaz: https://zuul-ci.org/docs/zuul/admin/components.html might help explain the different parts of zuul | 14:14 |
Shrews | pabelanger: ah yes, was just looking for that. thx | 14:15 |
pabelanger | np! | 14:15 |
mordred | so this http://git.zuul-ci.org/cgit/zuul/tree/zuul/driver/github/githubconnection.py#n1556 is the only bit running on the zuul-web server, the other bits are running in other processes probably on other machines that don't do anything with http requests | 14:15 |
webknjaz | so it's vice versa | 14:16 |
webknjaz | anyway, it still needs to have validation in a tool | 14:16 |
mordred | yeah- the web interaction is largely "receive payload, validate signature, put on gearman bus" - but yeah, I think doing the validate as a decorator like you mention could be potentially nice | 14:17 |
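As a sketch of the "validate as a decorator" idea mentioned above, using only the standard library: the header and digest follow GitHub's documented X-Hub-Signature scheme, while the decorator and attribute names are invented for illustration, not taken from zuul's code.

```python
import functools
import hashlib
import hmac

import cherrypy


def verified_github_payload(handler):
    """Reject deliveries whose HMAC signature does not match the secret."""
    @functools.wraps(handler)
    def wrapper(self, *args, **kwargs):
        body = cherrypy.request.body.read()
        signature = cherrypy.request.headers.get('X-Hub-Signature', '')
        expected = 'sha1=' + hmac.new(
            self.webhook_secret.encode('utf-8'), body, hashlib.sha1).hexdigest()
        if not hmac.compare_digest(signature, expected):
            raise cherrypy.HTTPError(401, 'Invalid webhook signature')
        return handler(self, *args, **kwargs)
    return wrapper
```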
webknjaz | for interfacing with jobs it may be better to have a CherryPy plugin | 14:17 |
webknjaz | because otherwise implementation details of the rpc leak into the web handler layer | 14:18 |
mordred | like a cherrypy plugin that does the submit job? that's an interesting idea | 14:18 |
webknjaz | yep | 14:18 |
webknjaz | you'd use pub-sub to interface with it | 14:19 |
webknjaz | https://www.irccloud.com/pastebin/MlVJDNcG/ | 14:20 |
mordred | nod. I could see that plugin being potentially nice - passing things onto the gearman bus is a common pattern | 14:20 |
webknjaz | or without that helper func it's: https://github.com/webknjaz/ansiwatch-bot/blob/master/ansiwatch_bot/apps/github_events.py#L93-L94 | 14:21 |
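A minimal sketch of that kind of plugin, following the SimplePlugin pattern from cherrypy.process.plugins; the channel name and the gearman client call are placeholders for whatever zuul-web actually uses:

```python
import json

from cherrypy.process import plugins


class GearmanSubmitPlugin(plugins.SimplePlugin):
    """Own the gearman client; handlers talk to it only via the bus."""

    def __init__(self, bus, client_factory):
        super().__init__(bus)
        self._client_factory = client_factory
        self._client = None

    def start(self):
        self._client = self._client_factory()
        self.bus.subscribe('gearman-submit', self.submit)

    def stop(self):
        self.bus.unsubscribe('gearman-submit', self.submit)
        self._client = None

    def submit(self, job_name, payload):
        # Placeholder call; the real method depends on the gearman library.
        return self._client.submitJob(job_name, json.dumps(payload))


# A handler would then do:
#   cherrypy.engine.publish('gearman-submit', 'zuul:github_event', data)
# instead of holding a gearman client itself.
```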
mordred | webknjaz: https://opendev.org/openstack-infra/zuul/src/branch/master/zuul/web/__init__.py#L263-L270 ... is an example of one of the simpler but common patterns of "turn this http request into a gearman call" - is there a better way to set the CSRF header more generally than just grabbing the response and setting the header directly like we're doing there? | 14:25 |
webknjaz | I believe you could use another tool | 14:26 |
webknjaz | http://docs.cherrypy.org/en/latest/extend.html#hook-point | 14:26 |
webknjaz | Try using `before_finalize` hook point | 14:27 |
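A sketch of the hook-point suggestion: a small Tool registered on before_finalize that stamps the response header, so individual handlers stop touching cherrypy.response directly. The tool name and header value are illustrative:

```python
import cherrypy


def _add_cors_header():
    # Runs just before the response is finalized, for every handler that
    # enables the tool.
    cherrypy.response.headers['Access-Control-Allow-Origin'] = '*'


cherrypy.tools.allow_origin = cherrypy.Tool('before_finalize', _add_cors_header)


class Info:
    @cherrypy.expose
    @cherrypy.tools.allow_origin()
    def index(self):
        return '{"info": {}}'
```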
webknjaz | After looking at http://git.zuul-ci.org/cgit/zuul/tree/zuul/driver/github/githubconnection.py#n241 I think that you really need a custom dispatcher... | 14:28 |
webknjaz | maybe you could even use a WSGI app interface "on the other end" of gearman | 14:29 |
*** quiquell|rover is now known as quiquell|off | 14:34 | |
*** hashar has quit IRC | 15:03 | |
*** bhavikdbavishi has joined #zuul | 15:13 | |
*** pcaruana has quit IRC | 15:42 | |
*** jamesmcarthur has joined #zuul | 15:49 | |
*** jamesmcarthur_ has joined #zuul | 15:50 | |
*** jamesmcarthur has quit IRC | 15:50 | |
openstackgerrit | Matthieu Huin proposed openstack-infra/zuul master: [DNM] admin REST API: docker-compose PoC, frontend https://review.openstack.org/643536 | 15:55 |
*** bhavikdbavishi has quit IRC | 16:01 | |
*** chandankumar is now known as raukadah | 16:01 | |
*** quiquell|off has quit IRC | 16:14 | |
*** ianychoi has quit IRC | 16:20 | |
SpamapS | Shrews:you around? I have a question about NodeRequestHandler.decline_request ... it fails if all launchers have declined.. but.. don't we want to retry it again at some point? | 16:26 |
SpamapS | Like, if all the providers are just busy... we want that request to queue, right? | 16:26 |
SpamapS | But it seems like that will just lead to NODE_FAILURE | 16:27 |
clarkb | when they are busy they stop processing requests | 16:29 |
clarkb | so it shouldn't decline unless it actually failed up to the retry count or the provider does not have the label | 16:30 |
SpamapS | clarkb:oh? how do they know that? | 16:30 |
clarkb | SpamapS: the hasRemainingQuota check | 16:30 |
SpamapS | Mine hits hasProviderQuota first, and fails the request. | 16:30 |
SpamapS | So there's a window right as you get busy, where requests fail. | 16:31 |
clarkb | it is possible this is another openstack specific behavior that should be at a higher level | 16:31 |
SpamapS | Or it's just a rare thing and you don't see it that often? | 16:32 |
SpamapS | Like, to make it super careful, you'd need to call hasRemainingQuota before accepting requests. | 16:32 |
SpamapS | And ideally that would reserve resources, so you don't accept two and then one gets failed. | 16:33 |
*** zbr has quit IRC | 16:34 | |
clarkb | yes I think it gets the request and checks remaining quota, and if there is none, blocks until resources are freed | 16:34 |
clarkb | that serialization prevents sending back inappropriate declines | 16:34 |
SpamapS | Well in the test suite what I have is one active request, a quota of 1, and when I send another request for 1, it's failed. | 16:39 |
SpamapS | I'll push up what I have now, and maybe you can spot the assumptions. | 16:39 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/nodepool master: Implement max-servers for AWS driver https://review.openstack.org/649474 | 16:40 |
*** zbr has joined #zuul | 16:40 | |
SpamapS | clarkb: I would very much appreciate your eyes when you have some time. Thanks ^^ | 16:40 |
clarkb | SpamapS: https://git.openstack.org/cgit/openstack-infra/nodepool/tree/nodepool/driver/__init__.py#n480 | 16:41 |
clarkb | that is the code that pauses when has remaining quota isn't true | 16:41 |
SpamapS | clarkb:yeah, in my test runs I never see that "Pausing" | 16:43 |
SpamapS | so there is at least a narrow window between accepting the request that puts you at quota and pausing, that results in a failed request. I wonder if I add a sleep if that will remain true | 16:46 |
SpamapS | In fact, the request gets declined, which is why we don't go to waitForNodeSet. | 16:49 |
SpamapS | Not sure how to get through to that next loop iteration that pauses things once we're at quota actually. Seems like everything will bounce off as declined. | 16:51 |
* Shrews catches up | 16:52 | |
SpamapS | Seems to me that you shouldn't ever decline an over quota request, you should just pause. | 16:52 |
*** jamesmcarthur has joined #zuul | 16:52 | |
Shrews | SpamapS: i suspect part of the problem may be that you are sharing resources across provider pools, and we don't really support that right now | 16:55 |
*** jamesmcarthur_ has quit IRC | 16:56 | |
Shrews | so your check for quota should never race with another pool thread trying to use that same quota pool | 16:56 |
Shrews | (if you follow that configuration rule, that is) | 16:56 |
Shrews | i realize that's probably not ideal with AWS in its current form | 16:56 |
SpamapS | This is a test | 16:56 |
SpamapS | 1 pool | 16:56 |
SpamapS | 1 provider | 16:56 |
SpamapS | Also no, the quota check I put in place is scoped to the pool. | 16:57 |
Shrews | hrm | 16:57 |
SpamapS | But in this case, the problem is the algorithm by which we pause. I believe it has a race condition in it, where you will decline a request if you are already exactly at quota. | 16:58 |
SpamapS | And that may not happen often in OpenStack because of the many quotas. | 16:58 |
SpamapS | Some min-ready comes along and unwedges things by slipping under the quotas. | 16:58 |
SpamapS | But with just max-servers quota.. it's always going to be the case that we walk right up to the quota. And then there's no path I can see through the code to self.paused = True | 16:59 |
SpamapS | Every subsequent request to that pool will be declined until the quota is released. | 16:59 |
SpamapS | The comment at the top of _waitForNodeSet I think calls out the race at the driver level, but it's actually deeper. | 17:01 |
*** gtema has quit IRC | 17:04 | |
SpamapS | I'm actually having trouble figuring out how the pause in _waitForNodeSet is ever reached. | 17:04 |
*** pcaruana has joined #zuul | 17:05 | |
Shrews | SpamapS: if you have to launch a node, but you are out of quota, then you reach that pause. | 17:05 |
SpamapS | Shrews:not in the test case here: https://review.openstack.org/649474 | 17:06 |
SpamapS | You can't even get to "need to launch a node" because you're already failing hasProviderQuota. | 17:06 |
SpamapS | Or maybe I misunderstood what hasProviderQuota is supposed to do. | 17:07 |
SpamapS | I wonder if OpenStack gets around this because of the estimatedNodepoolQuota .. it maybe passes hasProviderQuota in that next request instance. | 17:09 |
SpamapS | Yeah, the caching probably hides this. | 17:09 |
SpamapS | Keeping the window open just long enough to slip down to _waitForNodeSet. | 17:10 |
*** jpena is now known as jpena|off | 17:10 | |
Shrews | SpamapS: hasProviderQuota, iirc, is supposed to be "can this provider handle the nodes in the request, regardless of what is available now". hasRemainingQuota is the answer to "what is available now" | 17:12 |
SpamapS | Shrews: ok, so hasProviderQuota should *not* take running machines into account, but rather the total capacity of said provider? | 17:12 |
SpamapS | That does make sense. | 17:13 |
Shrews | SpamapS: i *think* so.... been a while | 17:13 |
Shrews | it was originally for things like "request wants 50 nodes, but provider only has 40 total" | 17:13 |
SpamapS | yeah that would make sense | 17:14 |
SpamapS | If that's the case, I think we should make the comment in the abstract class more clear. I'll take that up. | 17:14 |
Shrews | SpamapS: but that morphed with tobiash's quota changes, so that's harder to see just from looking at it to try to remember :) | 17:14 |
SpamapS | Yeah, I think in reading the openstack one I missed that there's an `estimatedNodepoolQuota` and `estimatedNodepoolQuotaUsed` ... | 17:14 |
clarkb | Shrews: semi related to this is https://review.openstack.org/#/c/651676/ | 17:14 |
SpamapS | I just saw them as the same call. | 17:14 |
Shrews | SpamapS: "Checks if a provider has enough quota to handle a list of nodes. This does not take our currently existing nodes into account." | 17:15 |
SpamapS | yep, that was it | 17:16 |
Shrews | SpamapS: i mean, that seems to say what i just said. what's unclear? | 17:16 |
SpamapS | http://paste.openstack.org/show/749204/ <-- this makes the test work the way I expected. | 17:17 |
SpamapS | Shrews:I think I missed the "does not" part. | 17:17 |
SpamapS | "psshhh... details." | 17:17 |
Shrews | SpamapS: happy to +2 any changes that make that more clearerer :) | 17:19 |
* Shrews wonders if <blink> works in rst.... | 17:19 | |
* SpamapS deserved that | 17:20 | |
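Restating the distinction in code, a hedged sketch of how a simple driver handler might split the two checks; the method names match the abstract handler methods discussed above, but the bodies (and the countOwnedNodes helper) are illustrative:

```python
class SimplePoolHandler:
    def hasProviderQuota(self, node_types):
        # "Could this provider ever satisfy the request?"  Ignores nodes
        # that are currently running, per the docstring quoted above.
        return len(node_types) <= self.pool.max_servers

    def hasRemainingQuota(self, node_types):
        # "Can it satisfy the request right now?"  Counts nodes the pool
        # already owns; when this returns False the handler should pause
        # rather than decline.
        in_use = self.countOwnedNodes()  # hypothetical helper
        return in_use + len(node_types) <= self.pool.max_servers
```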
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/nodepool master: Implement max-servers for AWS driver https://review.openstack.org/649474 | 17:21 |
SpamapS | durn pep8 | 17:23 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/nodepool master: Implement max-servers for AWS driver https://review.openstack.org/649474 | 17:23 |
SpamapS | Shrews:^ ok, this I think actually implements correctly. :) | 17:23 |
SpamapS | whoops, spotted a bug | 17:24 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/nodepool master: Implement max-servers for AWS driver https://review.openstack.org/649474 | 17:24 |
Shrews | clarkb: what's the impetus for that change? Not saying we shouldn't do it, just curious what brought that about | 17:30 |
clarkb | Shrews: if we had that then aws driver would've worked as is for SpamapS quota issues I think | 17:31 |
clarkb | Shrews: very few of our drivers implement hasRemainingQuota so they all ignore the common max-servers directive | 17:31 |
clarkb | this should enforce that directive accurately as long as the "cloud" doesn't leak instances | 17:32 |
*** bhavikdbavishi has joined #zuul | 17:32 | |
SpamapS | clarkb:+++ I like yours. :) | 17:33 |
*** jamesmcarthur has quit IRC | 17:35 | |
Shrews | clarkb: my only concern with that, atm, is the method used to count current nodes. that may not be accurate since it includes all node states. | 17:38 |
clarkb | Shrews: I believe you need to include all node states, as errored/ready/in-use and deleting-but-not-yet-deleted nodes all consume quota | 17:39 |
Shrews | clarkb: INIT and ABORTED do not | 17:40 |
clarkb | do nodes go aborted only prior to asking $cloud to make them? | 17:40 |
Shrews | it's the latter one that concerns me the most. i know it is transient, but don't remember how long that sticks around (or how many may result due to a single launch) | 17:40 |
clarkb | it should be easy enough to filter out by state if we can nail that down to states that never represent resources in a cloud | 17:42 |
clarkb | but I thought they all basically did. Maybe init doesn't | 17:42 |
Shrews | you'll need to at least filter those two states | 17:43 |
Shrews | and maybe FAILED | 17:44 |
clarkb | ya aborted is only set when we hit a quota error? so in theory that shouldn't count against your quota | 17:45 |
clarkb | failed does count against your quota in openstack | 17:45 |
clarkb | I don't know if it will in all clouds but conservative there seems fine? | 17:45 |
Shrews | failed doesn't always count, but being conservative seems logical | 17:46 |
Shrews | failing to launch a node after X retries will result in a FAILED state | 17:46 |
Shrews | but so can launching a node and losing the ZK connection before we could record it | 17:47 |
*** jamesmcarthur has joined #zuul | 17:47 | |
*** sshnaidm_ has joined #zuul | 17:52 | |
openstackgerrit | Clark Boylan proposed openstack-infra/nodepool master: Add simple max-server sanity check to base handler class https://review.openstack.org/651676 | 17:53 |
*** sshnaidm has quit IRC | 17:53 | |
clarkb | Shrews: ^ that look better? I added DELETED as well since that should mean the node is completely gone from the cloud and is now just accounted in our db | 17:53 |
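A sketch of the state filtering being discussed, assuming string state names similar to nodepool's; the exact set of excluded states (INIT, ABORTED, DELETED, possibly FAILED) is the point under review here, so treat the set below as illustrative:

```python
# States that should not count against cloud quota: the node either never
# reached the cloud (init, aborted) or is already gone from it (deleted).
UNCOUNTED_STATES = {'init', 'aborted', 'deleted'}


def countQuotaNodes(nodes, provider_name):
    return sum(
        1 for node in nodes
        if node.provider == provider_name
        and node.state not in UNCOUNTED_STATES
    )
```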
manjeets | Hi guys, has anything changed for upstream untrusted projects? | 17:54 |
*** jamesmcarthur has quit IRC | 17:54 | |
manjeets | we have a job pipeline that works fine on ci-sandbox | 17:54 |
manjeets | but the same jobs are not getting triggered on the actual project | 17:55 |
manjeets | https://review.openstack.org/#/c/647960/2 | 17:55 |
manjeets | look at comment Intel SriovTaas CI | 17:55 |
manjeets | but when I run the same job on ci-sandbox it gets triggered | 17:56 |
*** jamesmcarthur has joined #zuul | 17:56 | |
clarkb | manjeets: the ability to merge is repo specific | 17:56 |
manjeets | clarkb, we don't have merge in the pipeline actually | 17:56 |
manjeets | using zuul's docker-compose thing | 17:57 |
clarkb | zuul has to merge your commit against the target branch to test it | 17:57 |
clarkb | that is failing according to the error message | 17:57 |
clarkb | I would check your merger logs | 17:57 |
*** jamesmcarthur has quit IRC | 17:58 | |
manjeets | clarkb, merge where? it should never merge to the repo anyway | 17:58 |
*** jamesmcarthur has joined #zuul | 17:59 | |
clarkb | manjeets: one of the fundamental design choices of zuul is that it intends to test what your code would look like if it merged to the actual repo. This means before testing a change zuul merges it locally against the target branch and tests the resulting commit. All of this is in zuul managed git repos. If tests pass then later zuul can ask gerrit to merge them to the canonical repo | 17:59 |
clarkb | this way developers don't have to manually rebase every time a new commit merges to get accurate test results. Zuul does that for you and you know when you ask zuul to merge the code that it should work to a high degree of confidence | 18:00 |
clarkb | the error message on that change indicates zuul failed to do this local merge. I would check the zuul merger process's logs | 18:00 |
Shrews | clarkb: yah. good call on DELETED too | 18:01 |
*** jamesmcarthur has quit IRC | 18:01 | |
*** jamesmcarthur has joined #zuul | 18:02 | |
*** sshnaidm_ is now known as sshnaidm|off | 18:03 | |
openstackgerrit | Fabien Boucher proposed openstack-infra/zuul master: WIP - Pagure driver - https://pagure.io/pagure/ https://review.openstack.org/604404 | 18:03 |
*** electrofelix has quit IRC | 18:19 | |
*** jamesmcarthur has quit IRC | 18:26 | |
*** jamesmcarthur has joined #zuul | 18:28 | |
pabelanger | tobiash: I cannot remember, but maybe you were discussing the ability to support all forms of merge that github supports? (eg: merge / squash / rebase) | 18:29 |
tobiash | pabelanger: yes, I was part of the discussion | 18:31 |
pabelanger | tobiash: do you happen to know when that was again, so I can find the irclogs? | 18:31 |
pabelanger | I'd like to refresh myself on that topic | 18:32 |
manjeets | clarkb, I get it, do you mean it merges the patchset into a repo cloned locally for testing? | 18:32 |
clarkb | manjeets: yes | 18:32 |
tobiash | pabelanger: hrm, no idea, could be months | 18:33 |
pabelanger | kk | 18:33 |
*** hashar has joined #zuul | 18:57 | |
SpamapS | do we not run any coverage reports for nodepool tests? | 18:59 |
clarkb | hrm we did before the zuulv3 rewrite | 19:01 |
clarkb | I don't see it now | 19:01 |
Shrews | removed that long ago | 19:01 |
Shrews | https://review.openstack.org/#/c/608688/ | 19:02 |
*** pcaruana has quit IRC | 19:02 | |
clarkb | hrm fwiw I found it really useful when improving tests and debugging races | 19:05 |
clarkb | (I added it and the functional jobs way back when to tackle nodepool's frequent breakages) | 19:05 |
Shrews | you should still be able to run it on demand | 19:05 |
clarkb | ya or run it locally. I think we ended up stabilizing to the point where it wasn't as useful as often so not really objecting. Just pointing out that it can be valuable | 19:06 |
mordred | Shrews: zuul-preview seems to be running super slow | 19:06 |
clarkb | the functional tests go a long way for sanity checking | 19:06 |
mordred | Shrews: if you check out https://review.openstack.org/#/c/651219/ and click on inaugust-build-website - it'll just sit there spinning | 19:07 |
mordred | I started looking in to it - but haven't gotten very far | 19:07 |
mordred | obviously it's not _urgent_ as it's not really a fully production thing yet - but thought I'd mention it | 19:07 |
Shrews | mordred: we didn't merge my proposed changes yet, did we? | 19:08 |
mordred | Shrews: no - I don't think so | 19:09 |
Shrews | not that it would improve anything, but wondering if i broke something | 19:09 |
Shrews | no, we didn't | 19:09 |
mordred | Shrews: although I did +2 them | 19:09 |
mordred | I think we're getting crawled | 19:11 |
mordred | docker logs --since 30m -f 3310c96209ef on the host shows a bunch of activity - none of it useful | 19:12 |
Shrews | mordred: cache overload? | 19:12 |
mordred | maybe? although I think this: | 19:12 |
mordred | [Thu Apr 11 19:09:20.617020 2019] [proxy_http:error] [pid 2916:tid 139831896164096] (70007)The timeout specified has expired: [client 174.143.130.226:55828] AH01102: error reading status line from remote server 174.143.130.226:80 | 19:13 |
mordred | oh - nevermind - I thought it was timing out remotely | 19:13 |
Shrews | mordred: i don't even know where to access zuul-preview. it's a container then? what host? | 19:13 |
mordred | yeah - zuul-proxy.opendev.org | 19:14 |
Shrews | that does not resolve for me | 19:15 |
*** bhavikdbavishi has quit IRC | 19:20 | |
Shrews | mordred: zp makes requests to http://zuul.openstack.org/api/tenant which is not doing anything for me except displaying "Fetching info" | 19:21 |
Shrews | so maybe the issue is in zuul-web | 19:22 |
Shrews | something is hammering zuul-web for change 651910 | 19:27 |
Shrews | mordred: is it normal to see so many of the same GET requests for a change? That doesn't seem right | 19:31 |
Shrews | e.g., GET /api/tenant/openstack/status/change/651910,1 | 19:32 |
Shrews | happening multiple times per second | 19:33 |
pabelanger | that is a tripleoci patch | 19:33 |
Shrews | so is 651912 which pops up a lot too, but not as much as the other one | 19:34 |
pabelanger | I wonder if that is somebody running a zuul-runner-like tool | 19:34 |
pabelanger | I know tripleo is trying to do things like scrape API to run zuul jobs locally | 19:35 |
pabelanger | rlandy|ruck: ^ by chance, are you aware of any tooling that would scrape the zuul.o.o api? | 19:35 |
Shrews | pabelanger: so maybe something of theirs is behaving badly | 19:35 |
pabelanger | Shrews: do you have an IP where the traffic is coming from? | 19:37 |
Shrews | no, only 127.0.0.1 (guessing the real ip isn't known?). the client signature is "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:66.0) Gecko/20100101 Firefox/66.0" | 19:38 |
Shrews | so maybe not an automated tool | 19:38 |
Shrews | i'm very curious why we don't have the incoming IP now. some cherrypy-ism? | 19:40 |
webknjaz | ? | 19:41 |
*** weshay has joined #zuul | 19:41 | |
webknjaz | if it's behind a reverse proxy you should use a proxy tool | 19:41 |
rlandy|ruck | pabelanger: we looked into it but we never went that way | 19:41 |
rlandy|ruck | too complicated and error prone | 19:42 |
pabelanger | Oh | 19:42 |
pabelanger | there is a tripleo CI tool | 19:42 |
pabelanger | that generates reports in grafana | 19:42 |
pabelanger | for that, I think they scrape api from zuul-web | 19:42 |
pabelanger | rlandy|ruck: do you happen to know where that is done? | 19:43 |
clarkb | yes it is behind apache so need to check the apache logs | 19:44 |
rlandy|ruck | zbr: ^^ maybe asking about your work | 19:45 |
rlandy|ruck | this was not reproducer-related | 19:45 |
Shrews | clarkb: indeed. thx | 19:46 |
rlandy|ruck | pabelanger: or this: http://dashboard-ci.tripleo.org/d/cEEjGFFmz/cockpit?orgId=1? | 19:46 |
pabelanger | rlandy|ruck: yes, that is it, thank you | 19:47 |
pabelanger | talking in #tripleo now | 19:47 |
Shrews | source IP is 2a02:8010:61a9:33::1dd7 | 19:47 |
rlandy|ruck | pabelanger: k - sorry - thought you were after reproducer work | 19:47 |
pabelanger | Shrews: that isn't rdocloud, no ipv6, so that is good | 19:49 |
Shrews | we've been getting hit hard ever since that change was submitted, so maybe find the author? | 19:50 |
pabelanger | Shrews: yah, talking to him now | 19:51 |
clarkb | whois says broadband service in the uk | 19:51 |
pabelanger | tripleo is digging into it now | 19:54 |
pabelanger | Shrews: clarkb: just looking at apache logs, there might be a few tripleo script scraping the API, 38.145.34.111 is another and that is in rdo-cloud | 20:04 |
clarkb | any idea what tripleo is trying to learn? | 20:04 |
clarkb | (I wonder if this indicates some new api endpoints that we might want to add) | 20:05 |
pabelanger | clarkb: mostly monitoring the health of their jobs | 20:05 |
pabelanger | eg: http://dashboard-ci.tripleo.org/d/cEEjGFFmz/cockpit?orgId=1 is something they built | 20:05 |
Shrews | most likely indicates the need for rate limiting :) | 20:05 |
pabelanger | but weshay best to ask | 20:05 |
clarkb | hrm we have that data in graphite already? | 20:06 |
pabelanger | Shrews: ya, good idea | 20:06 |
pabelanger | IIRC, zbr also | 20:06 |
clarkb | eg we shouldn't need to scrape zuul's api and can pull that data from what should be cheaper quicker sources like graphite | 20:06 |
*** jamesmcarthur has quit IRC | 20:10 | |
weshay | clarkb ya.. monitoring all the things to keep tripleo upstream healthy | 20:11 |
clarkb | weshay: can we stop doing this https://review.openstack.org/#/c/567224/ ? | 20:11 |
clarkb | and set up a periodic pipeline instead? | 20:11 |
weshay | yup! it's on the to-do list | 20:12 |
mordred | Shrews: wow - I hop on a call for a bit and miss all the fun | 20:14 |
Shrews | mordred: how "convenient" :-P | 20:15 |
clarkb | weshay: thanks | 20:16 |
*** jamesmcarthur has joined #zuul | 22:01 | |
*** jamesmcarthur has quit IRC | 22:14 | |
*** hashar has quit IRC | 22:29 | |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: Add tox-py37 job https://review.openstack.org/651938 | 22:30 |
pabelanger | clarkb: fungi: mordred: tobiash: ianw: noticed we didn't have a tox-py37 job ^ | 22:30 |
pabelanger | doh | 22:30 |
pabelanger | see bug | 22:30 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: Add tox-py37 job https://review.openstack.org/651938 | 22:31 |
pabelanger | there | 22:31 |
pabelanger | also, spacex launch in 3mins | 22:32 |
fungi | which pad this time? | 22:33 |
clarkb | https://www.youtube.com/watch?v=TXMGu2d8c8g is the official youtube stream | 22:33 |
fungi | ahh, from kennedy | 22:34 |
fungi | wish they'd schedule more for wallops now that they've gotten the pad back together and cleaned up after the explosion | 22:35 |
clarkb | I think they need the saturn 5 pad for this particular configuration | 22:36 |
fungi | the announcer is a little enthusiastic | 22:36 |
clarkb | or maybe its the old shuttle pad? basically its huge so few options | 22:36 |
pabelanger | fungi: Yah, he is awesome | 22:36 |
fungi | rocket ballet | 22:41 |
clarkb | the size of large buildings | 22:42 |
clarkb | I love the kerbal style graphics | 22:43 |
pabelanger | Yay | 22:45 |
pabelanger | 3 for 3 | 22:45 |
fungi | payload orbital insertion and relanding all first stage boosters in 10 minutes flat | 22:46 |
clarkb | one better than last time | 22:46 |
pabelanger | indeed, so cool | 22:46 |
mordred | ++ | 22:46 |
clarkb | pabelanger: linting job doesn't like the py37 job addition (I haven't looked at why yet) | 22:46 |
pabelanger | yah, looking now | 22:46 |
fungi | my inner moon landing conspiracy theorist says the life feed cutting out for the third booster touchdown was strategic ;) | 22:47 |
fungi | er, live feed | 22:47 |
mordred | pabelanger: do we want to add it to zuul so that we run 37 tests too? | 22:47 |
clarkb | to follow up on the docker ipv6 stuff I haven't heard anything back on either the issue or the PR yet | 22:47 |
clarkb | fungi: ha | 22:47 |
fungi | "no really, we landed it, i swear" | 22:47 |
mordred | fungi: yeah. I'm torn on whether to go conspiracy or 'streaming video sucks' | 22:47 |
fungi | i'm pretty sure streaming video sucks | 22:47 |
mordred | yeah | 22:48 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: Add tox-py37 job https://review.openstack.org/651938 | 22:48 |
pabelanger | think that is our fix | 22:48 |
mordred | and I can imagine that broadcasting video WHILE a rocket lands on top of you might be even more suck | 22:48 |
fungi | on an unmanned drone, even | 22:48 |
mordred | pabelanger: oh - ignore me - my brain was somehow thinking you were adding this to the zuul repo :) | 22:48 |
pabelanger | mordred: can we pull python37 from bionic? I've been using fedora-29 myself | 22:48 |
mordred | pabelanger: dunno. I install python with pyenv myself | 22:49 |
pabelanger | but, we could if people want | 22:49 |
* mordred is such a terrible distro-company employee | 22:49 | |
clarkb | fungi: you can probably look outside with binoculars to check right? | 22:49 |
SpamapS | bionic has 3.7.1 and may not see many updates since it's in universe. | 22:49 |
clarkb | pabelanger: check out the work coreyb has done for openstack py37 testing | 22:50 |
clarkb | but ya its basically install it from universe and go | 22:50 |
pabelanger | clarkb: cool | 22:50 |
clarkb | SpamapS: I get some sense that canonical/coreyb are interested in it. To be seen how up to date it gets though | 22:51 |
fungi | SpamapS: though to be fair, it released with something like 3.7.0 beta 2 | 22:51 |
fungi | so they have at least updated it once already | 22:51 |
clarkb | pabelanger: the job should also work on fedora and tumbleweed | 22:54 |
clarkb | but tumbleweed will eventually stop having a python3.7 (when it gets 3.8) and fedora will eol in a few months | 22:54 |
pabelanger | clarkb: Yup, was mostly curious if we wanted to add another distro into the mix or stick with debuntu | 22:55 |
*** rlandy|ruck is now known as rlandy|ruck|bbl | 23:04 | |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: Don't run zuul_debug_info_enabled under python2.6 https://review.openstack.org/650880 | 23:12 |
pabelanger | dmsimard: ^update | 23:12 |
pabelanger | clarkb: mordred: ^might have thoughts too | 23:12 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Add tox-py37 job https://review.openstack.org/651938 | 23:13 |
SpamapS | if people are interested in it, 3.7 will stay up to date. | 23:13 |
clarkb | pabelanger: while ansible itself may support python2.6 I'm not sure zuul can (no testing) | 23:15 |
*** paladox has quit IRC | 23:16 | |
*** paladox has joined #zuul | 23:16 | |
clarkb | if we do that check we should log a warning when we skip the info | 23:16 |
pabelanger | clarkb: yah, agree. This is mostly on the remote side of the node from nodepool, I have network images that are still python26 :( | 23:16 |
clarkb | so that it doesn't silently disappear | 23:16 |
pabelanger | that was part of my reason for adding a switch, to avoid hiding the magic | 23:17 |
*** rlandy|ruck|bbl has quit IRC | 23:21 | |
*** tobiash has quit IRC | 23:21 | |
*** corvus has quit IRC | 23:21 | |
*** jlvillal has quit IRC | 23:21 | |
*** mgoddard has quit IRC | 23:24 | |
*** mgoddard has joined #zuul | 23:27 | |
*** rlandy|ruck|bbl has joined #zuul | 23:27 | |
*** tobiash has joined #zuul | 23:27 | |
*** corvus has joined #zuul | 23:27 | |
*** jlvillal has joined #zuul | 23:27 | |
fungi | alternative is to build a python 3.something in an alternate path in your images and tell ansible to use that for ansible things | 23:50 |