*** jamielennox|away is now known as jamielennox | 00:02 | |
jamielennox | is there a way i can get zuul to scan my project-config | 00:03 |
jamielennox | s/scan/validate | 00:03 |
SpamapS | mordred: I'm just poking at a thing that translates layout <-> layout and does not attempt to grok jjb | 00:13 |
SpamapS | My thinking is to keep it as simple and unambitious as possible. pipelines and projects to a central config is step 1, and then I want to try and have project job selection moved into git repos as .zuul.yaml | 00:14 |
SpamapS | jamielennox: v2 yes, v3, start it up. | 00:14 |
SpamapS | jamielennox: v3 has trouble with validation the way v2 did it so all we do is validate that the config can parse. | 00:15 |
jamielennox | SpamapS: yea, it's just leading to me debugging by push change to git, restart server, parse logs | 00:15 |
jamielennox | i'm not sure how exactly you could expose it with all the in-repo stuff, just asking | 00:15 |
SpamapS | jamielennox: I think we could definitely write a validator that loads config. | 00:48 |
SpamapS | jamielennox: it just hasn't been done. | 00:48 |
SpamapS | like, just read the zuul.conf and the full layout from every source, like you would at startup | 00:48 |
jamielennox | SpamapS: so realistically i don't even need that, just scan the file i give you and does it make sense | 00:49 |
jamielennox | oh, but you can't | 00:49 |
SpamapS | Oh yeah for that I think you can write a little cmdline entry point | 00:49 |
SpamapS | Well you can | 00:49 |
SpamapS | you can run it through the voluptuous schema checker | 00:49 |
jamielennox | yea, but it depends where things like pipelines and connections are defined | 00:49 |
jamielennox | because gerrit: is a key, and if that's defined elsewhere your schema will fail | 00:49 |
jamielennox | zuul (at least used to) validate reviews against its own project-config so once you have it working it's not so bad | 00:50 |
jamielennox | but if validation fails on startup it seems to take down a thread | 00:50 |
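For context, a rough sketch of the kind of standalone check SpamapS describes: load a YAML fragment and run it through a voluptuous schema. The schema and helper names here are illustrative assumptions, not Zuul's actual validator, and as jamielennox notes, keys that depend on connections defined elsewhere (e.g. `gerrit:`) can't be fully checked from a fragment alone.

```python
# Hedged sketch of a fragment validator using voluptuous; not Zuul's
# real schema, just an illustration of the approach discussed above.
import sys

import voluptuous as vs
import yaml

# Hypothetical, heavily simplified schema for a single job definition.
job_schema = vs.Schema({
    vs.Required('name'): str,
    'parent': str,
    'nodes': [{vs.Required('name'): str, vs.Required('label'): str}],
}, extra=vs.ALLOW_EXTRA)


def validate_fragment(path):
    """Parse a YAML file and schema-check any job entries in it."""
    with open(path) as f:
        data = yaml.safe_load(f)
    for item in data or []:
        if 'job' in item:
            try:
                job_schema(item['job'])
            except vs.Invalid as e:
                print("%s: %s" % (path, e))
                return 1
    return 0


if __name__ == '__main__':
    sys.exit(validate_fragment(sys.argv[1]))
```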
openstackgerrit | Jamie Lennox proposed openstack-infra/zuul feature/zuulv3: Store cache expiry times per status object https://review.openstack.org/461330 | 00:57 |
openstackgerrit | Jamie Lennox proposed openstack-infra/zuul feature/zuulv3: Use routes for URL handling in webapp https://review.openstack.org/461331 | 00:57 |
openstackgerrit | Jamie Lennox proposed openstack-infra/zuul feature/zuulv3: Use tenant_name to look up project key https://review.openstack.org/461332 | 00:57 |
SpamapS | jamielennox: I think you could write a cli version that validates the whole config. Not so sure about fragments. | 00:59 |
*** jamielennox is now known as jamielennox|away | 01:00 | |
SpamapS | jamielennox|away: that said, for fragments... submit review, wait for angry report of failed config load? | 01:00 |
*** jamielennox|away is now known as jamielennox | 01:10 | |
jamielennox | SpamapS: yea, i think once you're up and running you can rely on the reports | 01:11 |
jamielennox | just for now the error is happening during startup and causing problems | 01:11 |
SpamapS | on startup you can do the whole-config thing | 01:12 |
jamielennox | SpamapS: is there anything about bubblewrap that means the executor should run as root | 01:13 |
jamielennox | i caught something about it on IRC the other day but don't remember it exactly | 01:13 |
jamielennox | we're currently running executor as a zuul user and i have to change the finger port to make that happen, which is fine, i don't care about users being able to finger the executors directly | 01:14 |
jamielennox | but bubblewrap is the other thing i can think of that would be affected by this | 01:15 |
mordred | SpamapS: oh! right - the layout conversion. nod | 01:16 |
SpamapS | jamielennox: no, it should install setuid by default | 02:22 |
SpamapS | jamielennox: OR if you have a very new kernel, it can do its thing with USER_NS stuff without setuid | 02:22 |
jamielennox | SpamapS: so you're saying i have to run executor as root? | 02:43 |
tristanC | jamielennox: bwrap should be root setuid so that it can be used by a regular user | 03:25 |
*** dkranz has quit IRC | 03:26 | |
*** dkranz has joined #zuul | 03:31 | |
*** hashar has joined #zuul | 09:09 | |
*** hashar has quit IRC | 09:23 | |
*** hashar has joined #zuul | 09:40 | |
*** hashar has quit IRC | 10:18 | |
*** jkilpatr has joined #zuul | 10:59 | |
Shrews | tristanC: left you a comment on https://review.openstack.org/472128 along with the +2. was there a reason to avoid the 2 extra characters for 'list_address' ? | 12:37 |
Shrews | vs listen_address | 12:38 |
tristanC | Shrews: not at all, it's a mistake, should be listen_address | 12:40 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Add webapp port and listen_address configuration https://review.openstack.org/472128 | 12:41 |
Shrews | tfw the code you had semi-working yesterday no longer works at all today | 13:08 |
pabelanger | Shrews: tristanC: we should also make sure to update good.yaml fixtures too for 472128 | 13:24 |
tristanC | pabelanger: done | 13:56 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Add webapp port and listen_address configuration https://review.openstack.org/472128 | 13:56 |
pabelanger | tristanC: thanks | 13:56 |
tristanC | you're welcome, thanks for the review! | 14:25 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: WIP: Add reporter for Federated Message Bus (fedmsg) https://review.openstack.org/426861 | 15:00 |
Shrews | So, something has merged within the last few days to affect ansible jobs in unit tests | 15:07 |
Shrews | because what used to work, now hangs here: http://paste.openstack.org/show/612143/ | 15:08 |
pabelanger | doesn't look to be using bubblewrap | 15:10 |
Shrews | hrm. lemme rebuild the env. forgot to do that when i rebased | 15:12 |
Shrews | pabelanger: you run fedora, yes? what's the package for bubblewrap? | 15:21 |
pabelanger | Shrews: should be bubblewrap for package name | 15:21 |
pabelanger | there is also bwrap-oci, but haven't tried that | 15:22 |
Shrews | hrm, bubblewrap already installed, it seems | 15:23 |
Shrews | *sigh* | 15:23 |
Shrews | ok, well the pause was from my wait_for in my playbook. but now i'm not getting build.jobdir populated | 15:32 |
* Shrews takes a long lunch break to blow off frustration. bbl | 15:32 | |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: WIP: Add reporter for Federated Message Bus (fedmsg) https://review.openstack.org/426861 | 15:58 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul-jobs master: Add Sphinx module for Zuul jobs https://review.openstack.org/472743 | 16:04 |
jeblair | SpamapS, mordred: can you take a look at that change and its parents? ^ i'm working on establishing a documentation framework for the zuul stdlib. obviously that should be spun out into a new sphinx module, but incubating it there for the moment. | 16:09 |
SpamapS | mmmmmmmmmmmmmm doc framework | 16:19 |
mordred | jeblair, tristanC, pabelanger, jlk, SpamapS, Shrews: ok. I have sent email to the list on the REST question - sorry it took so long, there was a bunch of research needed doing | 16:22 |
mordred | clarkb, jamielennox: ^^ you too - sorry for lack of ping | 16:28 |
SpamapS | mordred: that's good readin | 16:47 |
pabelanger | jeblair: I have restarted ze01.o.o. We look to be running now | 16:48 |
jeblair | neat! | 16:49 |
jeblair | pabelanger: how about we stop zuulv3-dev now | 16:49 |
pabelanger | jeblair: yes | 16:49 |
jeblair | i'll do that | 16:49 |
pabelanger | http://zuulv3.openstack.org/ | 16:49 |
pabelanger | should also properly display things now | 16:49 |
jeblair | zuulv3-dev is stopped | 16:50 |
jeblair | i will recheck a change now | 16:50 |
jeblair | hrm, i'm going to restart the scheduler | 16:51 |
pabelanger | ack | 16:52 |
pabelanger | seen something that time | 16:52 |
SpamapS | how does zuulv3.openstack.org know what status.json to get? | 17:02 |
jeblair | SpamapS: it's in the apache config for now | 17:03 |
jeblair | RewriteRule ^/status.json$ http://127.0.0.1:8001/openstack/status.json [P] | 17:03 |
jeblair | RewriteRule ^/status/(.*) http://127.0.0.1:8001/openstack/status/$1 [P] | 17:03 |
jeblair | we should add a story about making the status page tenant aware, if there isn't one already | 17:04 |
mordred | jeblair, SpamapS: yah - and we'll probably want to figure out an auth story around access to tenant status too (I do see that jamielennox already has patches for tenant-aware logs which should also allow a similar amount of protection) | 17:06 |
jeblair | mordred: right. for now, the story is, if you have private tenants, build that into your web server infrastructure. | 17:07 |
jeblair | (like, use mod_auth_foo to restrict access to /private_tenant/....) | 17:08 |
jeblair | auth in zuul itself would be a significant distraction at this point | 17:08 |
mordred | jeblair: ++ | 17:08 |
jeblair | SpamapS: have you started any work regarding the new ssh context manager/wrapper thing in the merger? (this came up in the context of test_timer_sshkey) | 17:18 |
SpamapS | jeblair: none.. that's way down in the stack unfortunately. | 17:22 |
* jlk looks at the size of the scrollbar on mordred 's email, settles in for a long read. | 17:22 | |
jeblair | SpamapS: okay, i think it's about to pop onto the top of my stack | 17:22 |
jeblair | SpamapS: pabelanger and i are suspecting that change is not at all working in reality | 17:22 |
jeblair | i do not see how to use the context manager for an initial clone | 17:25 |
jeblair | SpamapS, pabelanger: okay, i've confirmed that's the problem with the merger. i will start working on a fix. it will take a little bit, but i think we can do it fairly easily and comprehensively. | 17:34 |
jeblair | pabelanger: i have to take care of a few things first; it will probably be a couple of hours until i'm ready with this change. | 17:38 |
jlk | mordred: just to screw with your noodle, what if we skipped REST and went straight to GraphQL? | 17:38 |
pabelanger | jeblair: understood | 17:39 |
pabelanger | jlk: is GraphQL the new hotness these days? | 17:40 |
jlk | pabelanger: seems like a buzzy word, but I'm not really clear on what problems it solves. I'm mostly aware of it because Github builds their website with graphql, and are now exposing it so that us app writers can get first-class access as they change the platform, instead of waiting for somebody to re-write everything for their REST API. | 17:42 |
jlk | at some point, we're going to have GraphQL in zuul, to deal with github, at least on a client level. | 17:42 |
jeblair | fwiw, i'm not sure 'rest api' actually describes what we'll end up doing with zuul anyway. more like 'http api'. | 17:42 |
pabelanger | Ya, I've only heard of it from here and githubv4 (I think). Not sure what else is using it. Looks like Facebook started it | 17:43 |
jlk | nod | 17:43 |
jeblair | (there are almost zero actual REST apis in the world) | 17:44 |
SpamapS | Because that's hard and mostly an exercise in correctness not pragmatism? :) | 17:46 |
SpamapS | Looks to me like GraphQL is more about being super expressive on the client side about what you want from the server and reducing round trips by allowing secondary lookups in the responses. | 17:47 |
SpamapS | I could see that being useful for a better status responder | 17:47 |
SpamapS | Like if no jobs are expanded you don't need the lists of jobs. | 17:48 |
jlk | yeah I'm not seriously suggesting it. | 17:48 |
jlk | although now that I re-read what you're saying | 17:48 |
SpamapS | jlk: I actually think it's for making what we do with status.json more efficient. | 17:48 |
SpamapS | I mean, status.json is 110 - 120 kB | 17:49 |
jlk | My basic read is that you can say "Go fetch me this info, and while you're there, fetch this adjacent info, and that info, and that over there too, and all the things attached to that" | 17:49 |
jlk | or you can just say "I only want this tiny bit" | 17:49 |
SpamapS | if you refresh that 1/s ... that's not a trivial amount of data if you have a lot of users watching. | 17:49 |
SpamapS | jlk: it's like SQL, for backend API queries. ;) | 17:49 |
jlk | It'd allow us to go from 4 or 5 round trips to the github API to a single call. | 17:50 |
SpamapS | jlk: yeah seems like something worth it for Github to push, since it will likely reduce their bandwidth usage and server side wasted overhead. | 17:50 |
SpamapS | so I think it's a super valid thing to push for when we revisit status.json | 17:51 |
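As an illustration of the single-round-trip idea jlk describes, a hedged sketch of a GitHub v4 (GraphQL) query made with `requests`; the owner, repo, PR number, and token are placeholders, and the field names are from memory of the v4 schema rather than taken from this discussion.

```python
# Sketch: ask for several pieces of PR data in one request instead of
# several REST calls. Placeholders: example-org/example-repo, PR #1,
# and YOUR_TOKEN_HERE.
import requests

QUERY = """
query {
  repository(owner: "example-org", name: "example-repo") {
    pullRequest(number: 1) {
      title
      state
      mergeable
    }
  }
}
"""

resp = requests.post(
    'https://api.github.com/graphql',
    json={'query': QUERY},
    headers={'Authorization': 'bearer YOUR_TOKEN_HERE'},
)
print(resp.json())
```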
SpamapS | also I didn't know we had some kind of frontend for nodepool? | 17:51 |
SpamapS | or is it just an HTTP backend | 17:51 |
jlk | I could see these things being useful for monitoring | 17:52 |
jlk | particularly in a containerized world, where we're not going to run zuul _and_ a monitoring daemon | 17:52 |
jlk | I want something external that can poke at it and tell me if it's healthy | 17:52 |
jlk | and later data mining for efficiency fixes. "where are you spinning your wheels" | 17:52 |
SpamapS | yeah those /health checks mordred mentioned | 17:53 |
SpamapS | jlk: that's more statsd's job tho | 17:53 |
SpamapS | just let zuul spit that stuff out and you can go get answers from influxdb or datadog | 17:53 |
jlk | how does statsd work? what collects it? Do you have to tell it where to send things to? | 17:54 |
jlk | brb, walking the furry four legger. | 17:54 |
mordred | SpamapS: /health checks are a thing k8s people want for prometheus | 18:00 |
mordred | SpamapS: I'm not saying we implement them today - but it's a thing that will come up in the future | 18:01 |
SpamapS | I like them for stuff too | 18:01 |
mordred | yay | 18:01 |
SpamapS | I used to have nagios ssh'ing into boxes and restarting our apache when /health timed out | 18:01 |
SpamapS | this isn't a new paradigm ;) | 18:01 |
SpamapS | jlk: statsd is a daemon that listens for a UDP protocol. You add little calls in your app when you want to increment counters and it sends them off to statsd. Zuul and nodepool already do that. | 18:02 |
SpamapS | jlk: and then statsd takes those increments and puts them where you want.. graphite/influxdb/etc. Datadog speaks statsd too. | 18:03 |
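A minimal sketch of the pattern SpamapS describes, using the Python `statsd` client; the host, port, prefix, and metric names are examples, not Zuul's actual configuration or metric names.

```python
# Fire-and-forget UDP metrics to a statsd daemon, which forwards them
# to graphite/influxdb/etc. All names below are illustrative.
import statsd

client = statsd.StatsClient('localhost', 8125, prefix='zuul')

# Increment a counter when something happens.
client.incr('event.github.push')

# Record how long an operation took, in milliseconds.
client.timing('merger.merge_time', 1250)

# Report a point-in-time value.
client.gauge('executor.running_builds', 7)
```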
SpamapS | mordred: does my response make sense at all? | 18:18 |
SpamapS | I'm kind of arguing that you have a nice clean "here's where the world talks to Zuul" spot, and then you can have any kind of messy backend weirdness you want behind that. | 18:19 |
mordred | SpamapS: I goes to read | 18:35 |
mordred | SpamapS: it does - although amusingly enough part of it (the zk async stream concern related to webhooks) is in the spec I wrote about ingestors that I told people to ignore that clarkb also suggested mqtt for. I don't want to actually get in to that debate currently (I think there are pros and cons to each and we should discuss those) - but I don't think we have to for now | 18:40 |
mordred | SpamapS: I would like to bikeshed on the one vs. two endpoints thought though | 18:41 |
mordred | lemme respond to the email first though | 18:41 |
*** yolanda has quit IRC | 18:43 | |
jlk | SpamapS: right, so okay. The app has to support shuffling things off to statsd itself, and has to have configuration for _which_ statsd to shuffle them off to. | 18:45 |
*** yolanda has joined #zuul | 18:45 | |
jlk | SpamapS: that's a little different in that you need prior knowledge of this stuff IN the config file to run the service, making moving an image to different environments a bit more difficult (but not really since we already have some of that stuff). | 18:46 |
mordred | jlk: it's actually usually configured via env vars though | 18:46 |
jlk | whereas if it were health things hanging off the API, it wouldn't matter where they're run, outside parties could poke at the health points. | 18:46 |
jlk | mordred: I can't tell you how much I hate configuration via ENV vars. | 18:46 |
mordred | me too - but you know it's the "right" way in cloud native, right? ;) | 18:46 |
mordred | https://12factor.net/config | 18:47 |
mordred | jlk: but in any case, in this particular instance it's a mechanism where communicating to the app where the statsd for your k8s is would be fairly easy to accomplish without the app having a-priori knowledge, no? | 18:47 |
jlk | so.. | 18:48 |
jlk | yeah, statsd daemon startup should advertise its location to k8s, which will toss it in etcd | 18:48 |
mordred | also - to provide metrics to a /status endpoint, the app has to _save_ the metrics somewhere | 18:48 |
mordred | jlk: ++ | 18:49 |
jlk | your service still has to be able to read from etcd to, at start up, know where to send statsd stuff | 18:49 |
jlk | (and within the cluster that place is going to be fairly static, due to k8s proxy things) | 18:49 |
clarkb | mordred: it was mostly a response to having a spec for a thing that already exists mostly. I was saying you don't need a spec for that you just need to have zuul read from mqtt | 18:49 |
mordred | clarkb: right. I'm not convinced that actually solves the use cases | 18:49 |
SpamapS | jlk: it's push vs. pull all over again. :) | 18:49 |
mordred | clarkb: which is why I think it'll be fun to talk about | 18:49 |
* SpamapS will ponder whilst at the gym | 18:49 | |
jlk | mordred: hrm, I hadn't thought about saving the metrics, just more of getting snapshots of state periodically. Hit the API point to get a snapshot of state, some external system does the time series analysis | 18:50 |
mordred | jlk: yes. I think polling urls for some state is a FINE idea | 18:50 |
mordred | like, polling the zuul status page, for instance - would be a fine way to get an amount of data | 18:50 |
jlk | I also think statsd is a fine idea. | 18:50 |
jlk | since we have ALREADY built statsd support into the app | 18:51 |
mordred | yah- I think having a /status AND also emitting to statsd is best of both worlds | 18:51 |
jlk | which reminds me, should probably take a pass through the github driver code and litter some statsd about | 18:51 |
mordred | I think there will be metrics we provide to statsd that would be hard to collect and provide to /status - but maybe that's fine | 18:51 |
jlk | yeah I can see a separation of concerns | 18:53 |
jlk | you don't want lengthy analysis to happen when hitting the status/ url | 18:53 |
jlk | not unless we do something like graphQL and allow taking a sip of the stats vs the firehose | 18:54 |
mordred | SpamapS: ok. responded with more words than are likely necessary | 19:12 |
mordred | jlk: agree | 19:13 |
pabelanger | lynx finger://ze01.openstack.org | 19:18 |
pabelanger | :D | 19:18 |
pabelanger | that is kinda cool | 19:18 |
pabelanger | cannot wait for jobs to be listed | 19:18 |
jeblair | mordred, SpamapS: i also responded, in an apparently complementary way to mordred. hopefully walked the right line between "let's decide the now stuff now, and talk about the later stuff later". | 19:19 |
jeblair | pabelanger: "finger @ze01.openstack.org" should return a nice error message | 19:21 |
jeblair | lacking a newline | 19:21 |
pabelanger | jeblair: ya, was playing with that already | 19:22 |
pabelanger | was trying to see if any client supports a port other than 79 | 19:23 |
pabelanger | GNU finger apparently does, but that is not shipped in fedora | 19:23 |
mordred | jeblair, Shrews: that finger command is the most exciting thing ever | 19:27 |
* Shrews very angry at finger things atm | 19:28 | |
* mordred hands Shrews a box of fingers he found laying around | 19:28 | |
jeblair | mordred: you're always bringing game of thrones into things | 19:29 |
* Shrews gives middle finger back to mordred | 19:29 | |
Shrews | did i win? i feel like i just won | 19:29 |
jeblair | Shrews: yes but you gave the prize to mordred | 19:30 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: add log streaming test https://review.openstack.org/471079 | 19:30 |
Shrews | if someone wants to explain why my ansible job never runs in that test (it did until i rebased yesterday), i would be terribly grateful | 19:31 |
Shrews | because i'm a bit fed up | 19:31 |
Shrews | test_log_streamer.py line 102 | 19:32 |
mordred | Shrews: when you say "don't seem to have real sync for ansible jobs" ... | 19:33 |
Shrews | we can't pause ansible jobs with hold_jobs_in_build | 19:33 |
mordred | Shrews: http://git.openstack.org/cgit/openstack-infra/zuul/tree/tests/unit/test_inventory.py?h=feature/zuulv3#n29 | 19:34 |
mordred | Shrews: self.executor_server.hold_jobs_in_build = True seems to work for me? | 19:34 |
Shrews | mordred: for fake builds | 19:34 |
mordred | OOOHOHHHHHHHH | 19:34 |
mordred | gotcha | 19:34 |
mordred | sorry - and thanks for clarifying for my dumb brainhole | 19:35 |
Shrews | the entire test attempts to keep the ansible log file around long enough so i can attempt to start the finger thingy and stream it | 19:35 |
Shrews | which i almost had working yesterday (was streaming and getting contents) | 19:36 |
Shrews | now nothing works | 19:36 |
clarkb | mordred: I have responded to the great api thread of june 2017 | 19:36 |
pabelanger | http://logs.openstack.org/79/471079/3/check/gate-zuul-python35/fd2756a/console.html#_2017-06-07_17_13_58_908303 seems to be a warning, but I don't see any other tasks after it | 19:36 |
Shrews | pabelanger: that's another thing i've been avoiding. i have no idea why that's there | 19:37 |
pabelanger | Ya, I don't see any of the other tasks after welcome | 19:38 |
pabelanger | possible that ansible is just not running them now? | 19:38 |
mordred | I think it's actually that something is failing in that method so no file is getting written | 19:38 |
* mordred looks | 19:38 | |
pabelanger | I'd actually expect http://logs.openstack.org/79/471079/3/check/gate-zuul-python35/fd2756a/console.html#_2017-06-07_17_13_58_909229 to be ok=4, since you have 4 tasks | 19:39 |
mordred | Shrews: mind if I push up an update with some added debugging? | 19:40 |
Shrews | mordred: go for it | 19:41 |
Shrews | i can't even get that locally. something missing from my local setup? | 19:42 |
jeblair | Shrews: remember the venv activation required when using ttrun on real ansible jobs? | 19:42 |
Shrews | jeblair: yup, got that | 19:42 |
jeblair | drat, out of ideas | 19:43 |
mordred | Shrews: I think you may have rebased on a commit that's in the middle of the updates to the callback plugin | 19:43 |
mordred | nothing that should be causing an issue here though | 19:44 |
Shrews | welp, going to just rebase on an older commit that works | 19:45 |
Shrews | mordred: fwiw, i was getting that v2_playback method error before rebasing, too | 19:45 |
mordred | Shrews: oh - I say that ... | 19:45 |
mordred | Shrews: nope. there is a bugfix that went in after your rebase that is probably the thing screwing you | 19:46 |
Shrews | i thought maybe something was wrong with my test ansible config setup causing that | 19:46 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: WIP: add log streaming test https://review.openstack.org/471079 | 19:46 |
mordred | Shrews: I missed an encode('utf-8') on the send to the socket part | 19:46 |
jeblair | py3 is the best! | 19:46 |
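For anyone following along, the py2/py3 difference behind that fix in a tiny self-contained example (a socketpair stands in for the real streamer socket; the message content is just an example):

```python
# str payloads must be encoded to bytes before socket.send() on py3.
import socket

# A pair of connected sockets stands in for the streamer connection.
sender, receiver = socket.socketpair()

message = 'Hello, World\n'

# Under py27, str is already bytes, so sender.send(message) works;
# under py35 it raises "TypeError: a bytes-like object is required".

# Encoding first works under both:
sender.send(message.encode('utf-8'))
print(receiver.recv(4096).decode('utf-8'))

sender.close()
receiver.close()
```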
mordred | Shrews: that adds a debugging and a rebase | 19:47 |
mordred | Shrews: so either it'll work now, or we'll hopefully see the error message | 19:47 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: WIP: add log streaming test https://review.openstack.org/471079 | 19:48 |
mordred | sorry - one more fix | 19:48 |
jlk | alright I'm out for the weekend. Would love some eyes on the github caching change object er, change. | 19:52 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix git ssh env in merger https://review.openstack.org/472788 | 19:55 |
jeblair | pabelanger, SpamapS, mordred: ^ that's to fix the current blocker on ze01 | 19:56 |
pabelanger | looking | 19:57 |
Shrews | mordred: the exception would seem to imply something wrong with my job yaml, but i don't see it | 20:00 |
* Shrews wishes for a separate utility for zuul-purposed yaml files | 20:06 | |
jeblair | Shrews: what exception? | 20:11 |
Shrews | jeblair: http://logs.openstack.org/79/471079/6/check/gate-zuul-python35/0744d62/testr_results.html.gz | 20:11 |
jeblair | Shrews: s/image/label/ | 20:11 |
jeblair | Shrews: https://review.openstack.org/472372 changed that | 20:12 |
Shrews | jeblair: other configs use image | 20:12 |
jeblair | Shrews: before yesterday, yes | 20:12 |
pabelanger | Ah, also didn't know we changed it | 20:13 |
jeblair | trying to get the breaking config changes in before we write too many real configs :) | 20:14 |
pabelanger | ya, I don't believe we are using that today in our jobs | 20:15 |
Shrews | ah, mordred rebased it | 20:15 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: add log streaming test https://review.openstack.org/471079 | 20:15 |
mordred | Shrews: oh - sorry - yah - rebased to get the utf-8 fix, forgot the label change | 20:16 |
Shrews | mordred: pushed up a PS to correct it | 20:16 |
mordred | Shrews: fingers crossed | 20:18 |
mordred | jeblair: your patch has the sads | 20:18 |
Shrews | still don't know why things won't run locally now. just going to have to depend on zuul, i guess | 20:19 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix git ssh env in merger https://review.openstack.org/472788 | 20:21 |
jeblair | mordred: thanks. let's try that ^. :) | 20:21 |
pabelanger | mordred: jeblair: On the topic of nodepool-drivers, would a heat implementation fall under the current openstack driver via shade today? Or some new dedicated driver? Was having some talks with tripleo-CI folks and the use case would be to have nodepool create their OVB resources directly when possible | 20:22 |
mordred | pabelanger: that's a great question. I think a heat driver could be a separate driver, but I'd want it to use shade not use heatclient directly | 20:24 |
jeblair | pabelanger: could that use case be covered by linch-pin? | 20:25 |
mordred | pabelanger: it's also possible that making it a different driver doesn't make sense and just adding it as an option to the current one would make more sense | 20:25 |
pabelanger | jeblair: yes, that is also possible. I am really starting to like the idea of a generic ansible driver or we just say that is a linch-pin thing | 20:25 |
pabelanger | mordred: understood | 20:26 |
mordred | like - I'm guessing it would be "boot this heat stack" rather than "boot this server" - so one could imagine having a "stack" option instead of an "image" option | 20:26 |
pabelanger | I get the feeling it is more about how people would like to support it long term | 20:26 |
mordred | but I don't know enough | 20:26 |
pabelanger | mordred: ya, that was my simplest use case atm | 20:26 |
mordred | pabelanger: it's also worth putting some thought into how we express that zuul-side and how the resources in question make it into the inventory | 20:26 |
jeblair | yes, it's 'nodepool' not 'stackpool' :) | 20:27 |
mordred | which is to say - I don't think I know enough to know whether it should be linch-pin, a modification to the current openstack driver or a new driver | 20:27 |
mordred | but - I can say that in all three of those cases the heat api interactions should all be via shade ;) | 20:27 |
mordred | also - the same question will need to be asked re: multi-node and linch-pin integration - how does that get expressed in zuul config and how does it map into inventories | 20:28 |
mordred | so there is a worthwhile question to explore regardless of impl details | 20:28 |
jeblair | yep. i don't think we have time to fully explore it now. i'd really like us to focus on getting v3 out the door. | 20:29 |
jeblair | these are important considerations, but let's add them to the backlog, not get distracted by them now. | 20:30 |
pabelanger | Ya, I don't want to distract on current efforts for sure | 20:30 |
mordred | yah. agree. but definitely think there is a thing that is super worth digging in to when the time is right | 20:30 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: add log streaming test https://review.openstack.org/471079 | 20:32 |
jeblair | mordred, pabelanger: on that note, how about https://review.openstack.org/472472 ? | 20:32 |
jeblair | pabelanger raised a good point there | 20:33 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Add initial license, docs, and other config https://review.openstack.org/472410 | 20:34 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Add revoke-sudo role https://review.openstack.org/472461 | 20:34 |
pabelanger | Ya, on the fence. If we wanted to enforce ansible, it would be a great thing to skip. But the opposite, it is an easy way to run a bash script | 20:34 |
jeblair | and i guess the answer is how much should the "standard library" python jobs attempt to do? after conversations with mordred, i'm encouraged to try to push it as far as possible. | 20:34 |
pabelanger | +2 however, don't want to block | 20:35 |
pabelanger | ya, if we want to support it cool | 20:35 |
jeblair | pabelanger: i think the idea for this job would be to make it as simple as possible for someone to "just run the python unit tests in a repo". we might even add things to it later to support nose as well as testr, to try to make it universally applicable. | 20:35 |
jeblair | pabelanger: so it would work as well for an openstack project as a random github project | 20:36 |
jeblair | pabelanger: i've been digging into all the stuff we do (in run-tox.sh), and i actually think most of it *can* be universally applied. which surprised me a bit. | 20:36 |
jeblair | (as long as we do it carefully) | 20:36 |
jeblair | (aside from the 50mb subunit limit) | 20:37 |
pabelanger | sure, wfm | 20:37 |
jeblair | ok, 1 test failure on the merger fix; going to refresh that now | 20:38 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix git ssh env in merger https://review.openstack.org/472788 | 20:39 |
jeblair | pabelanger: i think the next step in that series is to start decomposing run-tox.sh into roles | 20:39 |
pabelanger | Ya, that seems right | 20:40 |
Shrews | mordred: http://logs.openstack.org/79/471079/7/check/gate-zuul-python35/b0b14f4/console.html#_2017-06-09_20_20_30_429766 | 20:40 |
mordred | Shrews: ooh - that's fun | 20:42 |
Shrews | missed encoding there | 20:42 |
Shrews | unless it's fixed elsewhere? | 20:42 |
mordred | Shrews: oh for the love of ... yes, that's the bug | 20:43 |
* mordred feels bad - was testing this locally with py27 - should stop doing that | 20:43 | |
Shrews | mordred: we are at least now hitting my artificial failure point again. many many thanks | 20:44 |
mordred | Shrews: you got that encoding or want me to? | 20:46 |
Shrews | i got it | 20:46 |
mordred | kk. cool | 20:46 |
mordred | Shrews: is that working locally again too? | 20:48 |
mordred | by "working" I mean "breaking properly" | 20:48 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Fix zuul_streamer send() call for py35 https://review.openstack.org/472816 | 20:48 |
jeblair | yay passes tests now | 20:51 |
pabelanger | Yay tests | 20:52 |
mordred | jeblair: ^^ that change if you get a sec to bump it in | 20:52 |
jeblair | mordred, pabelanger: what do you think of the documentation-related bits i've started on in 472485 and 472743? | 20:52 |
jeblair | mordred: +3 | 20:53 |
mordred | jeblair: ++ | 20:53 |
pabelanger | jeblair: I like them, I was hoping to see the output | 20:53 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: add log streaming test https://review.openstack.org/471079 | 20:54 |
Shrews | mordred: i don't think so | 20:54 |
jeblair | pabelanger: yeah; that'll be easier with zuulv3.o.o up and running. :) | 20:54 |
pabelanger | ++ | 20:54 |
jeblair | pabelanger: you can run 'tox -e docs' with it checked out locally though and you'll get something | 20:54 |
pabelanger | Agree, I'm not sure I have them cloned locally yet :) | 20:56 |
pabelanger | will have to try that out in a bit, looks like heading out for some food momentarily | 20:56 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Fix git ssh env in merger https://review.openstack.org/472788 | 21:01 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Fix zuul_streamer send() call for py35 https://review.openstack.org/472816 | 21:06 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: add log streaming test https://review.openstack.org/471079 | 21:10 |
mordred | Shrews: zomg. patch 9 actually passed the test for py27! | 21:11 |
Shrews | mordred: that one has always passed | 21:11 |
mordred | oh | 21:11 |
mordred | oh right - because you skip if not py35 | 21:11 |
Shrews | right | 21:12 |
Shrews | the string exceptions are new. hoping that's some other random thing | 21:12 |
Shrews | mordred: locally, it seems to hang on the 'shell' task since i get no output in the ansible log for that task beyond the name. it seems to continue on zuul, but the current patchset should validate that | 21:14 |
Shrews | specifically: shell: echo 'Hello, World' | 21:15 |
mordred | Shrews: that sounds like the same issue I'm seeing just when I try to run a simple playbook not in the test suite ... but on the patch where I combined the two log streamers | 21:19 |
mordred | I wonder ... | 21:20 |
Shrews | ugh, mysterious StringException again | 21:20 |
Shrews | aaaaand nothing in the logs. wunderbar | 21:21 |
mordred | Shrews: http://logs.openstack.org/79/471079/10/check/gate-zuul-python35/3fb578e/console.html#_2017-06-09_21_17_20_206490 | 21:21 |
Shrews | wuh? | 21:22 |
mordred | hrm - I dont think that's for your test .. | 21:22 |
jeblair | oh yeah, that looks like the old "everything spews to the console" bug again | 21:22 |
mordred | yah | 21:22 |
jeblair | that's a test timeout | 21:23 |
jeblair | there's no way we're going to extract its output (if any) from the console log | 21:23 |
Shrews | so perhaps my test IS hanging as well? i removed the short-circuit to prevent the hang | 21:24 |
Shrews | hanging in zuul as well, i mean | 21:24 |
jeblair | yep | 21:24 |
Shrews | i wonder what happens if i remove the shell task... | 21:25 |
* Shrews tests | 21:25 | |
Shrews | hah! it proceeds as normal | 21:26 |
mordred | Shrews: I've got a thing I want to toss up to try | 21:26 |
Shrews | something funky with our shell module | 21:26 |
mordred | well - that's gonna be doing the stream-logs-from-remote-host code - which maybe isn't working so well in the test framework | 21:27 |
mordred | Shrews: oh- actually - you need to be running a zuul_console on the "remote" node for this to work at all | 21:27 |
* mordred feels really stupid | 21:27 | |
jeblair | oh yeah. | 21:28 |
mordred | one -sec - lemme make you a quick patch thing | 21:28 |
jeblair | this test is going to need a good long comment. :) | 21:28 |
mordred | yah | 21:28 |
*** jkilpatr has quit IRC | 21:28 | |
mordred | so - if we don't test with shell, but instead test with non-shell things | 21:28 |
mordred | it should be fine for testing the finger log streamer | 21:28 |
mordred | since non-shell things do not need zuul_console to produce output | 21:28 |
Shrews | k. that's fine. i really didn't need that shell thing anyway | 21:28 |
mordred | yah- you just need things to produce output so you can test that you stream that output | 21:29 |
Shrews | yup | 21:29 |
Shrews | jeblair: this test has been a... experience... for sure | 21:30 |
Shrews | jeblair: can i do this without defining the 'nodes' in the job? I'm seeing entries for 'localhost' and 'ubuntu-xenial' in the ansible log and they're sort of conflicting when i go to remove the flag file | 21:33 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Don't wait for forever to join streamer https://review.openstack.org/472839 | 21:34 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Use display.display for executor debug messages https://review.openstack.org/472840 | 21:34 |
mordred | Shrews, jeblair: two minor changes to zuul_stream to consider | 21:34 |
jeblair | Shrews: yes; i think we always add 'localhost' in the tests. | 21:35 |
Shrews | mordred: lgtm, except the 2nd one has an unintentional indent | 21:37 |
jeblair | mordred: 2 comments on the first one; a 0 and a -0.5. :) | 21:39 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: add log streaming test https://review.openstack.org/471079 | 21:41 |
jeblair | mordred: does 840 put the "starting to log" message into the main job log? | 21:41 |
jeblair | mordred: if so, i'm not sure i'm keen on that. maybe we could run with '-vvv' for the tests instead? | 21:42 |
mordred | jeblair: no - it does not | 21:46 |
SpamapS | quite the spirited discussion on HTTP stuffs | 21:46 |
mordred | jeblair: it's probably worth a comment somewhere | 21:46 |
jeblair | mordred: ok. i are confused. | 21:46 |
mordred | jeblair: there are essentially now two places things can go - the log file we defined, and stdout | 21:46 |
jeblair | mordred: got it, thanks. :) | 21:47 |
jeblair | mordred: then i'll be +2 on that after the fixup. | 21:47 |
mordred | jeblair: self._display.display will put things on stdout - which it seemed like our test suite was capturing in some manner at least in an earlier link from pabelanger | 21:47 |
mordred | cool | 21:47 |
jeblair | mordred: yeah, and it'll end up in the zuul executor log, which is fine | 21:47 |
mordred | jeblair: it's possible we may want to explore intentional uses of those two a little more | 21:47 |
jeblair | *nod* | 21:48 |
mordred | jeblair: re: waiting for 30 - tasks will not proceed any further if that is blocking | 21:48 |
mordred | we ok with that still? | 21:48 |
jeblair | mordred: it should only matter if something has gone wrong, or if we're really backlogged reading the log. so it seems okay to me....? | 21:49 |
mordred | ++ | 21:50 |
jeblair | pabelanger: sweet. zuulv3 is up sufficiently that it is complaining about config errors related to image/label | 21:50 |
SpamapS | also... nobody's biting on my Netflix/zuul reference? ;-) | 21:51 |
mordred | SpamapS: :) | 21:51 |
mordred | SpamapS: I almost did, but then decided not to | 21:51 |
SpamapS | good choice | 21:53 |
SpamapS | it was bad | 21:53 |
Shrews | mordred: ps11 fails as I expect now | 21:53 |
jeblair | 2017-06-09 21:54:36,390 INFO zuul.IndependentPipelineManager: Adding change <Change 0x7f1bfc2e4780 472483,1> to queue <ChangeQueue check: openstack-infra/zuul> in <Pipeline check> | 21:55 |
jeblair | that's progress | 21:55 |
jeblair | 2017-06-09 21:55:13,467 DEBUG zuul.AnsibleJob: [build: 77890c7afde348899105bccf5bcb71f3] Ansible output: b'fatal: [ubuntu-xenial]: UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to remote host \\"15.184.66.20\\". Make sure this host can be reached over ssh", "unreachable": true}' | 21:55 |
jeblair | that's unfortunate | 21:55 |
mordred | jeblair: do you remember why we're doing the log streaming in zuul_stream as a subprocess and not a thread? | 21:55 |
SpamapS | key problems? | 21:56 |
jeblair | mordred: i thought it was something about ansible modules, but i can't recall what. | 21:56 |
jeblair | er plugins | 21:56 |
jeblair | like, something about that forced our hand. but i have no idea. | 21:57 |
mordred | nod | 21:57 |
jeblair | mordred: should be easy to switch, since they have almost the same interface. | 21:57 |
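A small sketch of why that switch is mechanical: multiprocessing.Process and threading.Thread take the same constructor arguments and expose the same start()/join() calls. The target function and log path here are placeholders, not Zuul's streaming code.

```python
# Swapping a subprocess for a thread with minimal changes.
import multiprocessing
import threading


def follow_log(path):
    # placeholder for the log-following loop
    pass


# Current approach: a separate process.
p = multiprocessing.Process(target=follow_log, args=('/tmp/console.log',))
p.start()
p.join()

# Proposed approach: same call pattern, but a thread in-process.
t = threading.Thread(target=follow_log, args=('/tmp/console.log',))
t.daemon = True
t.start()
t.join()
```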
jeblair | SpamapS: highly likely | 21:57 |
* mordred is starting to worry about layers of subprocesses with things reading and writing sockets and files and whatnot | 21:57 | |
SpamapS | jeblair: is this trusted or untrusted? While I tested the ssh-agent stuff lightly locally... and the tests verify it works the way we hope it does.. there are a few new moving parts there... | 21:58 |
jeblair | SpamapS: amusingly, i think *all* of our config right now is untrusted | 21:58 |
SpamapS | fail closed FTW? | 21:58 |
jeblair | heh | 21:58 |
Shrews | mordred: it does seem like that could get fragile rather quickly | 21:59 |
SpamapS | how's that subprocess spawned btw? | 22:00 |
* SpamapS looks | 22:00 | |
jeblair | multiprocessing i think | 22:00 |
SpamapS | :q | 22:01 |
SpamapS | yay vim reflex | 22:01 |
jeblair | SpamapS: i think we have more local key problems on the host | 22:02 |
SpamapS | I've always thought multiprocessing was only for stepping around the GIL.. if you have other issues, subprocess is the cleaner path. | 22:03 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Use display.display for executor debug messages https://review.openstack.org/472840 | 22:04 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Don't wait for forever to join streamer https://review.openstack.org/472839 | 22:04 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Use threads instead of processes in zuul_stream https://review.openstack.org/472850 | 22:04 |
mordred | jeblair, Shrews: ^^ local testing shows that to work just fine | 22:04 |
mordred | SpamapS: you too | 22:04 |
SpamapS | mordred: is it possible we were worried about GIL contention? | 22:05 |
SpamapS | log streaming is going to be a constant load | 22:05 |
jeblair | this is inside of an ansible process | 22:05 |
SpamapS | or is that only inside ansible-playbook and thus not such a concern? | 22:05 |
mordred | ya | 22:06 |
mordred | that | 22:06 |
mordred | I think it was cargo-cult, not a specific decision | 22:06 |
SpamapS | so we already have our own per-job CPU eater | 22:06 |
SpamapS | ack | 22:06 |
* SpamapS is about 25 minutes from EOD'ing early to begin a 3 day LEGOland extravaganza in beautiful Carlsbad CA... | 22:06 | |
mordred | ooh fun | 22:06 |
SpamapS | so.. my focus is already drifting | 22:06 |
* SpamapS pulls it together for a last review push | 22:07 | |
mordred | SpamapS: also - I agree, I don't know that moving status.json out of the scheduler is of immediate concern and just shifting that to be aiohttp in place for now seems fine | 22:07 |
SpamapS | mordred: that stack is a little funny | 22:08 |
SpamapS | adds the terminate() call and then removes it | 22:08 |
SpamapS | mordred: I actually really love the idea of coalescing with ETags of the hashed status.json or something.. but.. yeah, KISS says if you already need to rework your HTTP, just rework it to be the way you want it. | 22:09 |
jeblair | SpamapS, mordred: you both +2d https://review.openstack.org/472485 which uses https://review.openstack.org/472483 -- does that mean you like the addition of that field? | 22:13 |
SpamapS | jeblair: I do! +2'd | 22:16 |
mordred | jeblair: yup | 22:19 |
clarkb | mordred: fwiw the api services that just talk to the other services in openstack are what are being wsgi'd; there is enough application logic in them talking to backends to make that desirable | 22:21 |
clarkb | but maybe I still misunderstand what you are saying? | 22:21 |
mordred | clarkb: yah - I'm saying that in either case we want to have the api services be separate services | 22:22 |
mordred | clarkb: so either WSGI or aiohttp it'll be a thing thats job is just to handle http requests | 22:22 |
clarkb | mordred: right, but the problem is not just that they are separate but that running the http server in python is sadness | 22:22 |
mordred | clarkb: right - aiohttp is apparently _much_ better at that | 22:23 |
mordred | one sec - lemme get you links | 22:23 |
SpamapS | right that's basically the point of aiohttp | 22:23 |
clarkb | so I see aiohttp as roughly equivalent to eventlet + whatever webserver that nova api (used to) give you | 22:23 |
SpamapS | I have zero problem with WSGI.. but if we're already going python3.5+ and asyncio for streaming... | 22:23 |
mordred | SpamapS: exactly | 22:23 |
SpamapS | yeah I don't think comparing it to eventlet is fair at all | 22:24 |
clarkb | they are very similar in both design and use (if you don't use eventlet monkeypatching and instead fully cooperate) | 22:24 |
clarkb | asyncio has the benefit of the syntax being nicer in new python though | 22:24 |
SpamapS | eventlet is trying very hard to hide complexity from you, and in so doing creates a debugging nightmare and a compatibility nightmare (basically invalidates half of pypy's advantages because of its trickery) | 22:24 |
clarkb | SpamapS: no that's not quite true, you can use eventlet fully explicitly without monkey patching iirc | 22:25 |
clarkb | now it happens that openstack doesn't | 22:25 |
SpamapS | Did any of the openstack services not use monkeypatching? | 22:25 |
mordred | clarkb: https://github.com/aio-libs/aiohttp/issues/234 - the bug about "add docs explaining how to use this in production" and the associated PR discuss that the situation is different | 22:25 |
mordred | https://github.com/aio-libs/aiohttp/pull/237 | 22:25 |
SpamapS | And it's not just the monkeypatching that breaks eventlet for pypi | 22:25 |
SpamapS | pypy | 22:25 |
clarkb | pypy works fine now | 22:25 |
clarkb | has for years (vishy got it going iirc) | 22:26 |
SpamapS | works, does not improve your performance | 22:26 |
SpamapS | (the way pypy should) | 22:26 |
clarkb | SpamapS: according to intel it does | 22:26 |
SpamapS | ah maybe intel fixed eventlet | 22:26 |
clarkb | I don't know how they made it happen but they did all kinds of testing and benchmarks | 22:26 |
mordred | given that we _currently_ are doing fine with a paste server in a thread, I think aiohttp is likely to be fine for us too :) | 22:26 |
clarkb | mordred: sort of we have to cache the response in apache for zuul | 22:26 |
clarkb | because paste in a thread doesn't cut it | 22:26 |
SpamapS | there were some things that were causing the jit to run over and over IIRC | 22:26 |
clarkb | (granted you could continue to do that) | 22:26 |
mordred | yah | 22:26 |
clarkb | specifically for status.json because its big | 22:27 |
SpamapS | I think we would definitely continue to do that. | 22:27 |
SpamapS | And then refactor to read from zk when we have a zk to read from. | 22:27 |
mordred | yup. but that'll be for later :) | 22:27 |
SpamapS | and maybe by then we'll have more than 3 things to have in the web tier that might make WSGI+Flask a more compelling choice. | 22:28 |
SpamapS | (like the admin requests) | 22:28 |
clarkb | SpamapS: the recent thread about dropping pypy testing for cinderclient brought up the pypy + openstack + intel stuff and I think they had rough numbers there | 22:28 |
clarkb | unfortunately they haven't done much upstream (at least not explicitly) that I have seen so it's somewhat hand-wavy still | 22:28 |
SpamapS | clarkb: I remember seeing those and thinking it was surprising. As usual, my info is outdated. :-P | 22:28 |
clarkb | SpamapS: I went to a local talk they gave on it using swift specifically and I want to say it was like 2x speedup but only after processes were running for ~5 minutes | 22:29 |
clarkb | so it won't make $clitool better but for long lived services even with eventlet it can be quite beneficial | 22:29 |
SpamapS | Sounds like a win to me. :) | 22:29 |
SpamapS | I figure aiohttp will finally give you twisted level performance without twisted level brain twisting | 22:31 |
mordred | yah | 22:32 |
mordred | my main thinking is that we have to run log streaming anyway - and that's websockets in python and isn't served by wsgi. so we _could_ do wsgi for the other things, but since we're not actually an API service in that way, it doesn't seem like any win to justify 2 http technologies when we can just use one and likely be fine | 22:32 |
mordred | especially since people report that aiohttp performs very well | 22:32 |
SpamapS | also the API for aiohttp looks simple enough | 22:33 |
SpamapS | it's not like we're saddling developers with something super weird | 22:33 |
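To give a feel for that claim, a minimal aiohttp sketch serving a status document; the port and hard-coded payload are placeholders, and the real service would return Zuul's actual (possibly cached) status.json.

```python
# Minimal async HTTP app serving a status endpoint with aiohttp.
from aiohttp import web


async def status(request):
    # In reality this would be the status.json blob from the scheduler
    # (or, later, from ZooKeeper).
    return web.json_response({'pipelines': []})


app = web.Application()
app.router.add_get('/status.json', status)

if __name__ == '__main__':
    web.run_app(app, host='127.0.0.1', port=8001)
```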
clarkb | mordred: ya I agree that keeping them similar is worthwhile | 22:33 |
clarkb | mordred: I'm just afeared of deciding in a year it all has to be rewritten because of average response time | 22:33 |
SpamapS | in a year it does all have to be rewritten | 22:34 |
SpamapS | so we can scale out schedulers ;) | 22:34 |
jeblair | finger 5b7d2c04bd0a4a229167e1a89d2fa2ce@ze01.openstack.org | 22:36 |
jeblair | everybody run that | 22:36 |
mordred | I am running that | 22:36 |
mordred | jeblair: mine is not streaming anything but is sitting at 2017-06-09 22:33:12.248107 | TASK [openstack-info : Display networking information about zuul worker.] | 22:36 |
jeblair | mordred: agreed | 22:36 |
mordred | cool | 22:36 |
jeblair | mordred: i am happy about the things it output up to that point | 22:37 |
clarkb | censored is a funny verb for "this writes too much data" | 22:37 |
clarkb | :) | 22:37 |
jeblair | i am less happy about the lack of things it output after that | 22:37 |
mordred | jeblair: I will confirm that that is all th efile on disk shows | 22:38 |
jeblair | i'll ssh into the worker | 22:38 |
jeblair | which, amusingly, would be easier if it had gotten around to printing its ip address | 22:38 |
mordred | jeblair: hah | 22:38 |
jeblair | the only thing running on the worker is zuul 839 0.0 0.1 115788 10068 ? Sl 22:33 0:00 /usr/bin/python /tmp/ansible_XtEykT/ansible_module_zuul_console.py | 22:41 |
mordred | jeblair: is there a /tmp/console file? | 22:41 |
pabelanger | I think ansible-playbook is defunct | 22:41 |
jeblair | mordred: /tmp/console-17a1464795d146bcb85c9802956a908f.log on the worker has the complete output from openstack-info | 22:41 |
jeblair | 2017-06-09 22:33:17.593676 | [Zuul] Task exit code: 0 | 22:42 |
jeblair | ends with that | 22:42 |
mordred | jeblair: ok. so somewhere the streaming borked | 22:42 |
mordred | jeblair: what's the IP? | 22:42 |
pabelanger | oh, we are also using python3 for ansible. Are we wanting that too? | 22:42 |
jeblair | ssh -i /var/lib/zuul/ssh/nodepool_id_rsa zuul@15.184.65.167 | 22:42 |
jeblair | ssh -i /var/lib/zuul/ssh/nodepool_id_rsa zuul@15.184.65.167 | 22:42 |
jeblair | mordred: ^ you'll want to run that from ze01 | 22:42 |
jeblair | mordred: (pabelanger is fixing up missing root ssh keys) | 22:43 |
mordred | jeblair: I was actually just trying to hit the console streamer | 22:43 |
jeblair | ah k | 22:43 |
pabelanger | https://review.openstack.org/#/c/472853/ and https://review.openstack.org/#/c/472854 are nl01.o.o updates we'll need I think | 22:43 |
pabelanger | I included flavor-name change too | 22:44 |
mordred | jeblair: telnet 15.184.65.167 19885 opens the connection, then I put 17a1464795d146bcb85c9802956a908f in and hit enter, but it did not start streaming | 22:44 |
jeblair | mordred: i also get that behavior | 22:45 |
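The manual telnet test above, written as a short script for reference: connect to the zuul_console port, send the build's log UUID, and print whatever streams back. The host and UUID are the ones from this conversation; that this is the exact handshake the streamer expects is assumed from the telnet description.

```python
# Minimal client for the zuul_console finger-style streamer.
import socket

HOST = '15.184.65.167'   # worker IP from the conversation
PORT = 19885             # zuul_console streaming port
LOG_UUID = '17a1464795d146bcb85c9802956a908f'

with socket.create_connection((HOST, PORT)) as sock:
    # The trailing newline mirrors hitting enter after pasting the UUID.
    sock.sendall((LOG_UUID + '\n').encode('utf-8'))
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        print(chunk.decode('utf-8', errors='replace'), end='')
```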
mordred | jeblair: maybe we should add some trace logging into zuul_console that we send to a file or something and see what it thinks is going on | 22:45 |
jeblair | mordred: yeah; i'll see if i can eke anything out of the running process | 22:46 |
jeblair | mordred: it's looping on this: | 22:47 |
jeblair | [pid 1322] open("/tmp/console.log", O_RDONLY) = -1 ENOENT (No such file or directory) | 22:47 |
jeblair | [pid 1322] select(0, NULL, NULL, NULL, {0, 500000} <unfinished ...> | 22:47 |
jeblair | [pid 1269] <... select resumed> ) = 0 (Timeout) | 22:47 |
mordred | jeblair: wow. it should never do that | 22:47 |
mordred | jeblair: out of sync copies of python things? | 22:47 |
mordred | jeblair: that's an old copy of zuul_console | 22:47 |
jeblair | weird. the version on ze01 looks current. | 22:48 |
mordred | jeblair: we copy to /opt/zuul and work from there, right? | 22:49 |
jeblair | we restarted it only a couple hours ago; that change merged yesterday, right? | 22:49 |
mordred | yah | 22:49 |
jeblair | mordred: /var/lib/zuul | 22:49 |
jeblair | so /var/lib/zuul/ansible/zuul/ansible/library/zuul_console.py should be the operative version | 22:49 |
mordred | jeblair: I agree - that looks like what I expect it to look like | 22:50 |
jeblair | /tmp/5b7d2c04bd0a4a229167e1a89d2fa2ce/ansible/untrusted.cfg says library = /var/lib/zuul/ansible/zuul/ansible/library | 22:51 |
mordred | and /var/lib/zuul/ansible/zuul/ansible/library is what's written to the untrusted.cfg | 22:51 |
jeblair | which is correct | 22:51 |
mordred | jeblair: it's like we have the same process in our heads | 22:51 |
jeblair | the start time of the zuul_console process corresponds with the start of the job (so it's not an old one somehow) | 22:54 |
mordred | jeblair: I ran an updatedb and then a locate | 22:54 |
mordred | jeblair: checking the contents of all of the zuul_console.py files, I see only one that doesn't reference console-{uuid}.log | 22:55 |
mordred | which is /var/lib/zuul/executor-git/git.openstack.org/openstack-infra/zuul/zuul/ansible/library/zuul_console.py | 22:55 |
mordred | but that also doesn't reference /tmp/console.log | 22:55 |
mordred | OH WAIT A SECOND | 22:55 |
jeblair | that's probably a master checkout | 22:55 |
mordred | yah | 22:56 |
mordred | jeblair: but zuul_console takes a filename as an argument | 22:56 |
jeblair | (that's the working directory of the merger, so its contents shouldn't matter) | 22:56 |
mordred | are we passing it a filename in the job content? | 22:56 |
pabelanger | we set it to /tmp/console.log in our prepare-workspace role | 22:56 |
mordred | that's the bug | 22:56 |
mordred | we need to stop doing that | 22:56 |
pabelanger | k, what should it be? | 22:57 |
mordred | nothing. just leave it out :) | 22:57 |
pabelanger | k, we should likely disable the override then :) | 22:57 |
pabelanger | I can patch, 1 sec | 22:57 |
mordred | yah. | 22:57 |
jeblair | oh, the uuid thing is the *default* | 22:57 |
mordred | you can also set it to '/tmp/console-{log_uuid}.log' | 22:58 |
pabelanger | port? can that be left or removed? | 22:58 |
mordred | yah - I think we need to rework that as a thing that has a parameter | 22:58 |
mordred | pabelanger: just remove it for now | 22:58 |
pabelanger | ack | 22:58 |
mordred | pabelanger: and we can rethink how we might want to allow this to be parameterized | 22:58 |
pabelanger | Hmm, how do you want to handle clean up on static nodes? | 22:58 |
mordred | pabelanger: I have some thoughts on that - I'll write them up for folkses | 22:59 |
jeblair | oh, wait, this has broken static node log streaming hasn't it? | 23:00 |
jeblair | or has it? | 23:00 |
mordred | shouldn't have - it will have broken cleaning up lurking log files | 23:00 |
jeblair | ah, gotcha | 23:00 |
pabelanger | ya, we just ensure /tmp/console.log was purged before | 23:00 |
pabelanger | we could wildcard it moving forward | 23:01 |
jeblair | pabelanger: not a bad idea | 23:01 |
mordred | ++ | 23:02 |
mordred | jeblair, pabelanger: we could also send a "we're done, cleanup after yourself" to the on-node console streamer | 23:03 |
jeblair | ya | 23:03 |
pabelanger | agree | 23:03 |
pabelanger | tmpreaper would also work I think | 23:04 |
mordred | I actually started poking at "cleanup" the other day, but then got lost in the "refactor these two to be the same code" | 23:04 |
jeblair | pabelanger: yeah, but tmpreaper is externalizing the cost of zuul ops onto the sysadmin | 23:04 |
pabelanger | finger is still working here :D | 23:04 |
jeblair | better for us to clean up ourselves | 23:04 |
pabelanger | agree | 23:04 |
jeblair | finger b82946f7a81a496bbfa45451606c34b2@ze01.openstack.org | 23:04 |
mordred | I'll pull that thought back up and see if I can't get y'all a patch on monday | 23:05 |
pabelanger | Oh, ah. we hit ffi missing dependency | 23:05 |
jeblair | Shrews: fingering is happening ^ :) | 23:05 |
mordred | and WORKING | 23:05 |
pabelanger | ya | 23:05 |
pabelanger | Nice, it is now logging our rsync attempts too | 23:06 |
clarkb | pabelanger: and censoring them | 23:06 |
pabelanger | Oh, this is just pulling logs to executor | 23:06 |
mordred | jeblair: what do you think of zuul itself doing a socket connection to each node on port 19885 and sending a cleanup command before it returns the nodes? | 23:06 |
mordred | jeblair: we could also make it a thing in the base job's post playbook | 23:07 |
jeblair | mordred: i think 19885 has to be read-only; so something in the playbook which logs into the host and sends a signal would be better | 23:07 |
clarkb | mordred: could you use an at exit type construct for the service instead? I assume we expect the process to die | 23:07 |
mordred | jeblair: kk | 23:07 |
mordred | clarkb: no - nothing kills the process currently | 23:07 |
pabelanger | guess we need to also publish properly to logs.o.o now too | 23:08 |
jeblair | (we need it to continue running at least until the executor has finished streaming from it) | 23:08 |
jeblair | (which is slightly *after* the thing it is running is finished) | 23:08 |
mordred | jeblair: ok - maybe zuul_console should return the PID of the child it spawns | 23:08 |
mordred | jeblair: and then we can run a post-playbook in the base job that signals that pid and tells it to shut down | 23:08 |
jeblair | mordred: considering we just said "we're only going to wait 30 seconds for the stream to catch up", we could probably have the streamer wait 40 seconds, then clean up and exit. | 23:09 |
mordred | jeblair: 40 seconds after what? | 23:09 |
mordred | it doesn't know when the last task will have been performed | 23:09 |
mordred | (there is a connect-disconnect per task now) | 23:09 |
clarkb | mordred: will it get a signal when the parent dies (parents get a signal when children die) | 23:10 |
jeblair | mordred: oh, right... hrm. | 23:10 |
mordred | clarkb: nope | 23:10 |
clarkb | maybe we can make parent explicitly send a signal? | 23:10 |
mordred | clarkb: the parent is LONG gone on purpose | 23:10 |
jeblair | mordred: pid/signal sounds best so far. | 23:10 |
clarkb | well whatever ends the job | 23:10 |
mordred | clarkb: the first task of the first pre-playbook is "run zuul_console" | 23:10 |
clarkb | can look up pid based on socket ownership or fd | 23:10 |
mordred | clarkb: ooh - TIL ... | 23:11 |
mordred | clarkb: what's the best way to look up a pid based on socket ownership? | 23:11 |
clarkb | that's a good question :) I know it's possible, but not sure of the best way | 23:11 |
mordred | heh :) | 23:12 |
clarkb | mordred: looks like psutil | 23:12 |
mordred | clarkb: sudo ss -lptn 'sport = :19885' | 23:13 |
clarkb | you can iterate over processes and for each process you can list the files | 23:13 |
mordred | or sudo netstat -nlp | grep :19885 | 23:13 |
clarkb | ya or lsof | 23:13 |
clarkb | but I am assuming you want to do it from python right? | 23:13 |
mordred | no - it'll be an ansible task | 23:13 |
clarkb | I think psutil may be easiest there | 23:13 |
clarkb | ah | 23:13 |
mordred | because the only thing that will have context to know things are done is the base job's post playbook | 23:14 |
mordred | clarkb: I don't have psutil installed anywhere | 23:14 |
clarkb | mordred: it's a python module | 23:14 |
mordred | oh - duh | 23:14 |
clarkb | I think it's from pypi | 23:14 |
mordred | yah - let's see if we can get this without needing to install stuff | 23:15 |
clarkb | https://pypi.python.org/pypi/psutil | 23:15 |
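For reference, a hedged sketch of the psutil route clarkb describes (iterate over processes and inspect each one's sockets). It assumes psutil is importable on the node, which is exactly the objection that pushes the conversation toward stock tools next; the function name and port constant are illustrative only:

    import psutil

    CONSOLE_PORT = 19885  # the zuul_console streaming port discussed above

    def find_console_pid(port=CONSOLE_PORT):
        # Walk all processes and look at their TCP sockets; without root
        # this only resolves processes owned by the same user, which is
        # the case that matters here.
        for proc in psutil.process_iter():
            try:
                for conn in proc.connections(kind='tcp'):
                    if conn.status == psutil.CONN_LISTEN and conn.laddr[1] == port:
                        return proc.pid
            except (psutil.AccessDenied, psutil.NoSuchProcess):
                continue
        return None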
clarkb | ya ss/netstat/lsof likely to work fine. Just not sure how consistently they are installed in various places | 23:15 |
clarkb | I seem to recall needing to install lsof on centos | 23:15 |
mordred | netstat is pretty much always around, right? | 23:16 |
clarkb | it is in /bin for me | 23:17 |
clarkb | which I think means yes at least for suse | 23:17 |
clarkb | mordred: looks like netstat is being killed as part of the switch to ip and ss is the new command | 23:17 |
clarkb | so your original one is likely best | 23:18 |
mordred | kill $(netstat -nlp | grep :19885 | awk '{print $7}' | cut -f1 -d/) | 23:18 |
mordred | lovely | 23:18 |
mordred | ss has the hardest output to parse | 23:18 |
clarkb | of course | 23:18 |
mordred | root@ubuntu-xenial-rax-ord-9191322:~# ss -lptn 'sport = :19885' | 23:18 |
mordred | State Recv-Q Send-Q Local Address:Port Peer Address:Port | 23:19 |
mordred | LISTEN 0 5 :::19885 :::* users:(("python",pid=3440,fd=3)) | 23:19 |
mordred | SIGH | 23:19 |
clarkb | oh that's not too bad, give me one sec | 23:19 |
jeblair | mordred: hah, i thought SIGH was a State. | 23:19 |
clarkb | | sed -n -e 's/.*,\(pid=[0-9]\+\),/\1/p' | 23:20 |
* jeblair renames nodepool states... SIGH. JUST_GET_IT_OVER_WITH. WHATS_THE_HOLDUP. | 23:20 | |
clarkb | SIGH has value 256 | 23:21 |
mordred | clarkb: that gets me pid=3440fd=3)) | 23:21 |
* clarkb tests more locally | 23:21 | |
clarkb | oh right you just want pid | 23:22 |
clarkb | | sed -n -e 's/.*,pid=\([0-9]\+\),.*/\1/p' | 23:22 |
mordred | yes | 23:22 |
mordred | thank you | 23:22 |
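Assembled, mordred's ss command and clarkb's sed expression amount to something like the following in Python, using only the stdlib. This is a hedged illustration of the kill path being discussed, not the actual change, and the helper name is made up; as the discussion that follows confirms, it works without root as long as the listener belongs to the same user:

    import os
    import re
    import signal
    import subprocess

    CONSOLE_PORT = 19885

    def kill_console(port=CONSOLE_PORT):
        # ss prints e.g. users:(("python",pid=3440,fd=3)) for the listener
        out = subprocess.check_output(
            ['ss', '-lptn', 'sport = :%d' % port]).decode()
        match = re.search(r'pid=(\d+)', out)
        if match:
            os.kill(int(match.group(1)), signal.SIGTERM)
            return True
        return False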
clarkb | not to derail anything but can you imagine sorting all this out for windows too? | 23:23 |
pabelanger | jeblair: are we expecting the current jobs to get killed once job timeout happens? | 23:24 |
mordred | oy | 23:25 |
mordred | clarkb: well - this isn't going to work actually - because you have to be root to look up process by port | 23:25 |
clarkb | mordred: even if that process is sharing a user with you? | 23:26 |
mordred | hrm. maybe? lemme try | 23:26 |
* clarkb checks /proc | 23:26 | |
jeblair | pabelanger: yes | 23:26 |
clarkb | mordred: things are owned by you in /proc so if the utility does best effort it should work | 23:26 |
jeblair | pabelanger: i'm inclined to just let that happen and check back later | 23:26 |
clarkb | but if it just bails out when not root then ya, it won't work | 23:26 |
mordred | clarkb: ok. cool. you're right | 23:26 |
mordred | it works | 23:26 |
clarkb | cool | 23:27 |
Shrews | Yay finger things being useful | 23:27 |
clarkb | mordred: you definitely won't be able to do it for arbitrary users as not root though | 23:27 |
pabelanger | jeblair: agree, happy to see what happens | 23:27 |
mordred | clarkb, jeblair, pabelanger: https://review.openstack.org/472866 | 23:28 |
jeblair | mordred: is that going to put us in a catch-22? it will create a log file that the callback plugin will want to stream? | 23:29 |
mordred | jeblair: I was just about to say that | 23:29 |
mordred | jeblair: I'll workon that next :) | 23:29 |
mordred | jeblair: can probably write it as an option to the zuul_console module actually - so that we can just call "zuul_console: state=absent" | 23:30 |
jeblair | mordred: interestingly, that's a DoS that your 30s timeout patch will prevent :) | 23:30 |
mordred | jeblair: and have the zuul_console python module that gets copied over (and that does not log to the shell log) do the kill in that case | 23:30 |
jeblair | mordred: ++ | 23:31 |
mordred | it'll also keep the logic in a zuul file and have the base job be easy and symmetrical for others | 23:31 |
clarkb | mordred: that would be python then? | 23:33 |
clarkb | probably still don't want to rely on psutil? | 23:33 |
jeblair | i think it could be problematic to use something not in the stdlib | 23:39 |
clarkb | in that case still possible to iterate through /proc/$piddirsownedbyouruser/fd | 23:43 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add shutdown option for zuul_console https://review.openstack.org/472867 | 23:43 |
mordred | clarkb, jeblair: ^^ how's that look? | 23:43 |
clarkb | that looks fine, I'm just trying to find a sane way to make it all python without the subprocess | 23:46 |
mordred | I welcome that | 23:46 |
mordred | I'm gonna push up a quick update adding a comment | 23:46 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add shutdown option for zuul_console https://review.openstack.org/472867 | 23:47 |
mordred | clarkb, jeblair: it is time for me to EOD - but I think that should at least put a placeholder in the approach we can use there | 23:51 |
jeblair | mordred: lgtm; ++ to proc if clarkb works it out. | 23:51 |
clarkb | mordred: https://github.com/giampaolo/psutil/blob/0a6953cfd59009b422b808b2c59e37077c0bdcb1/psutil/_pslinux.py#L1870 ya that does what I describe using /proc | 23:51 |
mordred | clarkb: so we could potentially copypasta some and be not 100% terrible | 23:51 |
jeblair | bsd licensed, should be ok | 23:52 |
clarkb | so I think the general process is: list /proc, filter by process dirs owned by us, for each process dir readlink fd/*, and if one matches the tcp port then return that pid | 23:52 |
clarkb | yup licensing should be fine and I think we can mostly use that implementation too | 23:52 |
mordred | cool | 23:52 |
mordred | clarkb: if you get a while desire to do that before I do and update the change, I will not be offended | 23:52 |
mordred | s/while/wild/ | 23:53 |
clarkb | though looking in my /proc we may need to do a second lookup because I get things like 3 -> socket:[11248846] in fd | 23:53 |
jeblair | clarkb: fdinfo? | 23:53 |
clarkb | that gives me pos, flags, and mnt_id. Apparently that number in [] is an inode | 23:54 |
pabelanger | we seem to be getting a deprecation warning about commas as lists, but not sure where that is coming from just yet | 23:54 |
jeblair | mordred: for monday: http://paste.openstack.org/show/612171/ | 23:56 |
clarkb | jeblair: mordred you read /proc/net/tcp which gives you all the connections in hex-encoded tabular form | 23:56 |
pabelanger | warning and error seem to go hand in hand: http://paste.openstack.org/show/612172/ | 23:57 |
jeblair | clarkb: is pid in there? | 23:57 |
clarkb | so I think what we actually do is read /proc/net/tcp first to find the inode for the socket on the correct port. Then look for that inode to find the process | 23:57 |
jeblair | clarkb: oh gotcha | 23:58 |
clarkb | jeblair: it doesn't look like pid is in there :) that would be too easy | 23:58 |
clarkb | sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode <- are the fields | 23:58 |
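Putting clarkb's two steps together, a rough stdlib-only sketch (no psutil): first pull the socket inode out of /proc/net/tcp, then scan our own /proc/<pid>/fd entries for it. The names are illustrative and this is not the implementation that ultimately landed:

    import os

    CONSOLE_PORT = 19885

    def console_socket_inode(port=CONSOLE_PORT):
        # The ss paste above shows the listener on the IPv6 wildcard
        # (":::19885"), so check the tcp6 table as well as tcp.
        for table in ('/proc/net/tcp', '/proc/net/tcp6'):
            with open(table) as f:
                next(f)  # skip the header line quoted above
                for line in f:
                    fields = line.split()
                    local_port = int(fields[1].rsplit(':', 1)[1], 16)
                    if fields[3] == '0A' and local_port == port:  # 0A == LISTEN
                        return fields[9]  # the inode column
        return None

    def pid_for_inode(inode):
        # fd symlinks are only readable for our own processes (or as root),
        # which matches the earlier discussion.
        target = 'socket:[%s]' % inode
        for pid in filter(str.isdigit, os.listdir('/proc')):
            fd_dir = '/proc/%s/fd' % pid
            try:
                for fd in os.listdir(fd_dir):
                    if os.readlink(os.path.join(fd_dir, fd)) == target:
                        return int(pid)
            except OSError:
                continue  # process exited or belongs to another user
        return None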
jeblair | pabelanger: are you sure those are related? | 23:58 |
jeblair | pabelanger: i mean, it's outputting the comma warning on every invocation. i would expect it to *also* show up in invocations with the log error. | 23:59 |