jeblair | jamielennox: when all the streaming stuff is in place, i expect us to have a finger multiplexer (probably built into the websocket multiplexer) so that we can have a single streaming server for all of the executors; the executors should not be end-user addressible. eg zuul.example.com/uuid | 00:01 |
---|---|---|
jamielennox | jeblair: yea, that was my understanding - but it means that there's really no reason to run the executor fingerd on 79, and therefore no reason for root on the executor | 00:02 |
jeblair | jamielennox: that service will either want to listen on 79, or it could listen on another port and you could port-forward to it. | 00:02 |
*** dmsimard has joined #zuul | 00:02 | |
jamielennox | the multiplexor might be 79 | 00:02 |
jeblair | jamielennox: yep, i think you've got it. | 00:02 |
jamielennox | jeblair: so would it be reasonable to change the default port of the executor fingerd? | 00:02 |
jeblair | jamielennox: yeah, i was thinking we'd do it when we had the multiplexer up and running, but if inverting that would be helpful now, we could probably go ahead and do it. | 00:03 |
jamielennox | jeblair: it's purely the run as root that threw me, i've got it changed in local config i just wasn't sure if there was something else i was missing | 00:05 |
mordred | tristanC: yes. zuul and nodepool both are fully py3 now | 00:11 |
jlk | well | 00:13 |
*** yolanda has quit IRC | 00:14 | |
mordred | jlk: no? | 00:14 |
jlk | I think there is an open change to actually _use_ that setting in some ways | 00:14 |
jlk | sorry that was meant for jamielennox and jeblair | 00:14 |
mordred | ah - nod | 00:15 |
jlk | https://review.openstack.org/473103 | 00:15 |
mordred | jlk: did you see the patch about github retries? | 00:15 |
jlk | jamielennox: that's part of some changes | 00:15 |
jlk | mordred: yeah, I -1d it | 00:15 |
mordred | ah - I see that now - cool | 00:15 |
jamielennox | jlk: ok, i can try and look at those - maybe the executor needs to register the finger port in zk when it picks up a job? | 00:16 |
jamielennox | cause at the moment it's configurable, but those changes don't appear to be | 00:17 |
jamielennox | oh, no, it's still configurable | 00:17 |
mordred | jamielennox: the port is supported at least in the finger url bit when I added them | 00:18 |
mordred | or, rather, here: https://review.openstack.org/#/c/473103/3 | 00:19 |
mordred | so they show up _somewhere_ | 00:19 |
jamielennox | mordred: yea, i'm trying to get back to the point where i can actually test that | 00:21 |
jamielennox | stuck on py3 and some ssl issues, maybe today, depends how badly the dentist goes | 00:22 |
pabelanger | jeblair: mordred https://review.openstack.org/#/c/473533/ and https://review.openstack.org/#/c/472861/ need to land to make zuul +1 again | 00:25 |
pabelanger | that should help zuul +1 all our open changes | 00:26 |
pabelanger | zuul (user) | 00:26 |
pabelanger | https://review.openstack.org/#/c/473613/ would also be nice to get a few more nodes on nl01.o.o | 00:26 |
*** yolanda has joined #zuul | 00:37 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Add log streaming test https://review.openstack.org/471079 | 00:48 |
Shrews | stupid pep8 | 00:48 |
Shrews | jamielennox: should only need to change the finger port to not run as root | 00:50 |
Shrews | that was the only reason | 00:50 |
openstackgerrit | Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Improve fake github commit status handling https://review.openstack.org/473633 | 01:02 |
jlk | blah, pep8. | 01:05 |
openstackgerrit | Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Improve fake github commit status handling https://review.openstack.org/473633 | 01:06 |
jamielennox | any zuul v3 outstanding patches? | 01:09 |
jamielennox | bah | 01:09 |
jamielennox | python3 | 01:09 |
jamielennox | getting some encoding issues | 01:09 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul feature/zuulv3: github: retry pull_request() https://review.openstack.org/473301 | 01:14 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul feature/zuulv3: zuul_stream: handle empty line https://review.openstack.org/473222 | 01:34 |
*** yolanda has quit IRC | 01:37 | |
*** jiaohaolin1 has left #zuul | 01:45 | |
openstackgerrit | Jamie Lennox proposed openstack-infra/zuul feature/zuulv3: Encode webhook_token secret https://review.openstack.org/473674 | 03:36 |
openstackgerrit | Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Improve fake github commit status handling https://review.openstack.org/473633 | 03:36 |
*** bhavik1 has joined #zuul | 04:37 | |
*** yolanda has joined #zuul | 05:54 | |
*** isaacb has joined #zuul | 06:28 | |
*** yolanda has quit IRC | 07:28 | |
*** bhavik1 has quit IRC | 08:19 | |
*** hashar has joined #zuul | 09:13 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul feature/zuulv3: RFC: zuul.d: configuration split https://review.openstack.org/473764 | 09:53 |
*** isaacb_ has joined #zuul | 11:07 | |
*** isaacb has quit IRC | 11:09 | |
*** jkilpatr has joined #zuul | 11:10 | |
*** isaacb_ has quit IRC | 11:41 | |
*** isaacb_ has joined #zuul | 11:42 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul feature/zuulv3: RFC: zuul.d: configuration split https://review.openstack.org/473764 | 11:56 |
*** openstackgerrit has quit IRC | 12:18 | |
*** openstackgerrit has joined #zuul | 12:23 | |
openstackgerrit | Ricardo Carrillo Cruz proposed openstack-infra/nodepool feature/zuulv3: Create group for label type https://review.openstack.org/473809 | 12:23 |
rcarrillocruz | jeblair: handwavy patch for adding per-label groups ^ | 12:24 |
rcarrillocruz | i need to spawn a nodepool VM to test that | 12:25 |
*** olaph has quit IRC | 12:27 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add a success-url for status.json test https://review.openstack.org/473604 | 12:29 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Move streaming url formatting to model https://review.openstack.org/473090 | 12:29 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Support finger ports in finger URL https://review.openstack.org/473103 | 12:29 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add build.started state flag https://review.openstack.org/473811 | 12:29 |
Shrews | mordred: https://review.openstack.org/471079 is really for sure ready now. sorry i left a pep8 problem in there | 13:17 |
*** _ari_|conf is now known as _ari_ | 13:18 | |
mordred | Shrews: +2 | 13:20 |
mordred | rcarrillocruz: lgtm | 13:21 |
mordred | rcarrillocruz: from a zuul perspective, check out https://review.openstack.org/#/c/467611/ and make sure it's good enough for what you want moving forward | 13:23 |
mordred | rcarrillocruz, jeblair, jlk: there is a question that I've been mulling related to that, btw - which is whether or not we should add the "openstack" host vars to our nodes that get added by the openstack ansible inventory ... | 13:24 |
mordred | argument for is that if someone has a playbook for a thing that expects to run against some nodes that are running in openstack and is using those variables, they won't be able to directly test their playbook against the nodes we launch for them | 13:25 |
mordred | argument against is that we actually dont' communicate anywhere "these are ubuntu-xenial nodes in openstack" - the openstack part is an impl detail | 13:25 |
mordred | perhaps we should consider including an optional node type in the nodeset spec - so that a user could say "I want an ubuntu-xenial node in an openstack cloud" and if they did, we'd attach the openstack hostvars | 13:26 |
mordred | but if they don't explicitly request that, we wouldn't do that | 13:26 |
mordred | (that might also be a thing we need anyway for other types of workloads - "give me ubuntu-xenial in a container" or "give me ubuntu-xenial in GCE") | 13:27 |
mordred | BUT - all of that also feeds in to a different thing I keep meaning on bringing up - which is "as a user, how do I get a list of available labels" - which seems like perhaps a good REST API call and dashboard page to add to tristanC's thing | 13:28 |
pabelanger | jeblair: mordred: Shrews: would it be much effort if I requested finger @ze01.o.o list all running build ID and job names? | 13:38 |
Shrews | pabelanger: the streamer doesn't have access to all available builds (not without processing on disk directories and trying to make a guess at which ones are valid) | 13:39 |
pabelanger | Shrews: we don't store that info in zk today? | 13:42 |
Shrews | pabelanger: not that i'm aware of | 13:42 |
pabelanger | k | 13:45 |
*** dkranz has joined #zuul | 13:47 | |
jlk | mordred: so I am thinking that we'd want extra details like that to come from the nodepool driver. If using the openstack nodepool driver, you get some openstack hints in your vars. If you're using the k8s driver, you get some k8s hints in inventory (if there are any), ditto GCP, AWS, and so on. | 13:52 |
*** dmsimard has quit IRC | 14:05 | |
mordred | jlk: yah - I definitley think htey should come from there - but if they're on by default is that muddying an abstraction? | 14:23 |
mordred | jlk: like, today, you get a node named "ubuntu-xenial" in your inventory with some zuul variables and an ip | 14:24 |
mordred | but if we started adding the openstack dict to the variables for all openstack-provided nodes, would that be good? or bad? | 14:24 |
mordred | pabelanger, Shrews: I think that's likely something we can/should add to the multi-plexor when we write it - because the multi-plexor would be in a good position to, at the very least, GET /status.json and then return a list | 14:26 |
mordred | but for the executor streamers that seems like the wrong layer | 14:26 |
jeblair | mordred: agreed re multiplexer | 14:28 |
mordred | jlk: I definitely agree re the k8s/aws/gce hints etc - just mostly not sure if we should add those hints if the user hasn't requested a certain delivery model- or whether we should expose/allow that request | 14:28 |
jeblair | mordred: is there any harm to supplying extra info like the openstack hints? as for requesting it -- i think the way you request something like that is via labels (ubuntu-xenial-container vs ubuntu-xenial-openstack) if that sort of thing is important to you. | 14:30 |
mordred | jeblair: any harm is the question - I can't decide whether it's leaking abstraction details or not | 14:36 |
mordred | jeblair: but for the labels thing - in my brain I was imagining that a deployer might have, for instance, an openstack provider, a k8s provider and an AWS provider - but similar to how we define an ubuntu-xenial across all of our clouds today and jobs don't care where it comes from, they define an ubuntu-xenial label across all three and jobs don't care | 14:38 |
jeblair | mordred: yep, if you don't care, just call it 'ubuntu-xenial'. if you do care, call it 'ubuntu-xenial-openstack'. if you both do and don't care, make both labels. | 14:39 |
mordred | jeblair: right. but if you make both labels, that would affect quota math | 14:39 |
jeblair | mordred: nodepool v3 is noticeably less concerned with that than v0. | 14:40 |
mordred | nod | 14:40 |
mordred | jeblair: I was thinking if you had an ubuntu-xenial node on both and could further restrict it like label: ubuntu-xential provider: openstack | 14:40 |
pabelanger | I'd appreciate a review on https://review.openstack.org/#/c/472853 please. I'd like to add our infra-root keys to nl01.o.o to make some debugging a little easier: https://review.openstack.org/#/c/472854 | 14:40 |
mordred | jeblair: but I can also see that being too much and not useful | 14:41 |
jeblair | mordred: the only quota related thing that would come into play is min-ready, and i think that still maps pretty well here. "i want 10 ubuntu-xenial-openstack nodes min-ready, and only 1 ubuntu-xenial-k8s" | 14:41 |
pabelanger | We should also consider adding auto-hold support back for zuulv3.o.o, its helpful when debugging job failures too | 14:41 |
jeblair | pabelanger: it's in storyboard. so we have considered it and accepted that it needs to be done. no one has picked up the task yet. | 14:42 |
mordred | jeblair: right- but "i want 10 ubuntu-xenial-openstack nodes min-ready" doesn't get you any min-ready of ubuntu-xenial nodes | 14:43 |
* Shrews recommends a more non-generic, understandable term than "multiplexor". What's a good term for the thing that gives you the finger? | 14:43 | |
* fungi refrains from obvious wisecrack opening | 14:43 | |
jeblair | mordred: sure. ubuntu-xenial would need its own min-ready, and there would be a minor loss of efficiency there. if we really cared about that, we could probably map them to each other in nodepool. | 14:43 |
Shrews | fungi: come on... you know you wanna | 14:44 |
mordred | jeblair: yah - that was mostly what I was thinking -some sort of map or even an alias or something | 14:44 |
jeblair | mordred: but "provider: openstack" inside of zuul seems like the abstraction violation. i'd very much prefer not to do that. | 14:44 |
jeblair | mordred: if any of this stuff is important, let's make it happen in nodepool. | 14:45 |
mordred | jeblair: yah - so, that's the question I had WRT exposing the openstack variables - is it an abstraction violation to do so if they're there | 14:45 |
pabelanger | jeblair: sure, it would be great if we could find a volunteer to maybe work on it. | 14:45 |
mordred | jeblair: and if it is a violation, is it a thing we should allow someone to opt-in to? | 14:46 |
mordred | Shrews: how about "finger gateway"? | 14:46 |
* rcarrillocruz back and reads scrollback | 14:46 | |
jeblair | mordred: i don't know what these variables are and what they are used for. but if they are useful, i don't see why we can't pass them over. and if we aren't sure we want to, we can make it an option on the nodepool side (add a "send openstack vars" option to the pool-label) | 14:46 |
Shrews | mordred: that is a good (but non-humorous) alternative | 14:47 |
Shrews | i like that | 14:47 |
mordred | jeblair: maybe let's start with passing them, then see if we think they're a bad idea and need an option? | 14:47 |
jeblair | mordred: do we have a need/use for them now? | 14:48 |
mordred | jeblair: well - the concept came to mind because rcarrillocruz was talking about adding nodepool label to the groups nova metadata. this affects how the openstack dynamic inventory puts nodes in to groups. that mainly made me think that if people have playbooks that are expecting to be run in the context of the openstack dynamic inventory, we are not actually currently exposing the information those playbooks | 14:50 |
mordred | are provided even though we have access to it | 14:50 |
mordred | jeblair: (it's come up a few times in my brain as I've been reviewing the code around the node exchange, but just never bubbled all the way up to "discuss") | 14:51 |
jeblair | mordred: rcarrillocruz's grouping issue is handled by zuul, right? | 14:52 |
jeblair | mordred: i guess i'm having trouble understanding what automatic grouping would happen from openstack vars in a nodepool+zuulv3 scenario. | 14:53 |
mordred | jeblair: oh! sorry - I understand the disconnect | 14:53 |
mordred | jeblair: I don't mean for automatic grouping purposes - one sec, lemme make a quick paste | 14:54 |
rcarrillocruz | yeah, groups in nodesets zuul is a superst of my patch | 14:54 |
rcarrillocruz | trying to think of a use case i could use with what mordred is describing | 14:54 |
rcarrillocruz | in just- nodepool scenarios | 14:54 |
jeblair | pabelanger: yeah, i'm concerned that we're all getting a bit distracted by post zuul-v3 things and people are not working on the things we need to get to zuul v3 itself. there is still a considerable backlog and almost no one is working it down. | 14:59 |
pabelanger | jeblair: agree. infra-root keys is the first step to helping | 15:04 |
jeblair | pabelanger: i +3d that change | 15:04 |
pabelanger | I think I'll also have to work on log publishing today too. It's been too difficult tracking all the open streams in terminals | 15:05 |
jeblair | mordred: can you +3 https://review.openstack.org/473222 ? | 15:12 |
jeblair | pabelanger: what do you we need to do regarding log publishing? | 15:13 |
pabelanger | jeblair: right now we are not publishing any logs to logs.o.o. This is because only set it up before to publish to zuulv3-dev.o.o/logs. So we need to address that | 15:15 |
jeblair | pabelanger: right. what's your plan? | 15:16 |
pabelanger | jeblair: I'd like to see what people would like to do. Specfically, have we solved our log url / log path issues we discussed at the PTG? Today we just dropped everything into a top level build uuid folder. I am not sure that will work well on logs.o.o | 15:18 |
jeblair | pabelanger: no, i think we should use the same log path system we have in production now. so any variables missing, we should add. first step then is probably to make a new post-playbook scp log copier that copies logs up to logs.o.o. | 15:20 |
jeblair | pabelanger: it will have to be in a trusted post playbook and use an ssh key on the executor to do the copying because we can't use secrets yet because we haven't turned on ssl support for gearman. | 15:20 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Improve fake github commit status handling https://review.openstack.org/473633 | 15:21 |
jeblair | (if we switch gearman to ssl, we can use secrets, then we can put the ssh key in a secret and do this in an untrusted repo) | 15:21 |
pabelanger | jeblair: okay, do we have a SSL cert for gearman or will it just be a self-signed | 15:21 |
jeblair | pabelanger: self-signed | 15:21 |
pabelanger | k | 15:21 |
pabelanger | we might as well do that first, let me see what needs to be done | 15:22 |
jeblair | Shrews: i left a comment on https://review.openstack.org/471079 | 15:23 |
Shrews | jeblair: regarding the file buffering, i quickly ran into a test run where i got less data from the file than what the recv() returned | 15:24 |
jeblair | Shrews, mordred: i know that zuul_console is unbuffered -- did we perhaps inadvertently make the callback plugin write buffered? or .... | 15:25 |
Shrews | so i assumed that was due to buffering | 15:25 |
jeblair | Shrews: recv() returned *more* data than the file? | 15:25 |
Shrews | jeblair: yes | 15:25 |
jeblair | Shrews: hrm... you strip newlines from the file... is it possible it was short by the number of newlines? | 15:26 |
Shrews | jeblair: no, there were visible lines missing | 15:26 |
jeblair | Shrews: are you thinking they were written to the file after you read the file but before you streamed it? | 15:27 |
Shrews | jeblair: it's possible there is some sort of race around that | 15:28 |
*** jkilpatr_ has joined #zuul | 15:29 | |
Shrews | jeblair: that's partly why i didn't also choose to keep calling recv() until the entire buffer size is returned (there is actually a recv flag to do that in a single call) | 15:29 |
Shrews | not being able to guarantee that the file contents returned would meet the minimum | 15:30 |
Shrews | i didn't want the test to have a possibility of hanging | 15:30 |
*** jkilpatr has quit IRC | 15:31 | |
jeblair | Shrews: how's this for a crazy idea: open the log file for reading in the test, open the streaming connection and spawn a thread to read the whole stream in the background and return when complete, remove the flag file, wait for job to complete, read entire file in (the file is deleted now, but we still have an open handle so we'll be able to read it), wait for stream read to complete, compare, close file. | 15:33 |
Shrews | jeblair: "return when complete" ... part of the problem is that we don't know when it is "complete" | 15:35 |
jeblair | Shrews: does the finger server disconnect when the job ends? | 15:36 |
Shrews | jeblair: in theory the server should break the connection on EOF | 15:37 |
jeblair | Shrews: if that's the case (and it seems like if it isn't, we probably should make it so), if we wait to the end of the job we'll get that signal | 15:38 |
*** isaacb_ has quit IRC | 15:39 | |
Shrews | jeblair: how do you propose coordinating that? ending the job while the streaming is ongoing doesn't guarantee we will get the entire file streamed, does it? the file could get removed from underneath the streamer | 15:41 |
jeblair | Shrews: if so, that's a bug. | 15:41 |
jeblair | Shrews: but i expect it to hold it open until it gets to the end of the file. | 15:42 |
Shrews | this test is really getting super complex. i'll try to add that new bit though | 15:43 |
jeblair | Shrews: i think it can be simpler this way | 15:43 |
Shrews | hah | 15:44 |
jeblair | Shrews: there's only one sync point in the job | 15:44 |
Shrews | 1? | 15:44 |
jeblair | sure, laugh, whatever | 15:44 |
jeblair | yes one sync point | 15:44 |
jeblair | the job only has to wait until the test has opened the log file and made the initial stream connection | 15:44 |
jeblair | then it can run to the end | 15:44 |
jeblair | and after that, the test can wait until the build completes in the usual manner | 15:45 |
Shrews | not a derisive laugh, btw. that came across poorly | 15:45 |
jeblair | 'sok :) | 15:45 |
jeblair | i'll reread it as an evil cackle for fun | 15:46 |
Shrews | more a maniacal, point-of-turning-evil laugh | 15:47 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Add region_name to zookeeper https://review.openstack.org/473887 | 15:50 |
pabelanger | jeblair: Shrews: ^ was something I noticed missing when we try to setup our mirrors in zuulv3.o.o. Would be interested in your feedback | 15:51 |
mordred | jeblair, jlk: this is what I'm talking about http://paste.openstack.org/show/612438/ | 15:51 |
Shrews | pabelanger: lgtm | 15:52 |
mordred | jeblair: the first thing is what we're emitting today | 15:53 |
mordred | jeblair: if we add the "openstack" vars (which are added to a node's hostvars if you use the openstack dynamic inventory) you get the second thing | 15:53 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Update to nodepool_region_name from zookeeper data https://review.openstack.org/473889 | 15:55 |
pabelanger | ^ is the zuul change too | 15:55 |
mordred | pabelanger: actually - region is already in the inventory | 15:55 |
jeblair | mordred: gotcha -- so just things like getting the ip info from there is an obvious usage (like, i have an ansible play that sets up some iptables stuff and reads openstack.networks.private to do that) | 15:55 |
mordred | pabelanger: should be in the "nodepool_region" variable in the node's hostvars | 15:55 |
mordred | jeblair: yah - I could also imagine a playbook looking at the volumes dict and deciding to do or not do something with that information | 15:56 |
jeblair | mordred: *nod* | 15:56 |
mordred | pabelanger: should be in the "nodepool_region" variable in the node's hostvars | 15:57 |
rcarrillocruz | i can imagine this would be useful for $provider testing | 15:57 |
rcarrillocruz | like | 15:57 |
mordred | pabelanger: maybe we should also start saving the inventory file when we save logs | 15:57 |
jeblair | mordred: i think pabelanger wants to transmit that to zuul though | 15:57 |
rcarrillocruz | give me a xenial-openstack cos i want to test specific openstack things on it | 15:57 |
pabelanger | mordred: yes, nodepool_region will always be null currently. The zuul patch problem pulls the data from zookeeper, and just renames it to nodepool_region_name, since we use region_name for zookeeper. However, if people okay with nodepool_region, I can omit that change | 15:58 |
mordred | jeblair: yes. zuul already writes that to the inventory | 15:58 |
rcarrillocruz | i see no harm on it, providing is opt-in | 15:58 |
mordred | pabelanger: ah - I'm fine putting it in the zuul vars and not as nodepool_region | 15:58 |
jeblair | mordred, rcarrillocruz: yeah, it's a big chunk of data... maybe we should have it disabled by default, but let folks opt in to sending it? | 15:58 |
rcarrillocruz | ++ from me for disabled by default | 15:59 |
pabelanger | mordred: okay, nodepool_region_name is okay for you? | 15:59 |
mordred | jeblair: yah. we can also put a pin in that and figure out the best mechanism to communicate it later | 15:59 |
jeblair | mordred: ya, not urgent | 15:59 |
mordred | pabelanger, jeblair: I'm fine either way: zuul.region_name or nodepool_region - but I think I lean towards zuul.region_name | 16:00 |
mordred | pabelanger, jeblair: if we do zuul.region_name I think we should remove nodepool_region from the hostvars | 16:00 |
jeblair | mordred, pabelanger: give me a minute, i'm *really* confused by these patches. | 16:00 |
mordred | kk | 16:00 |
pabelanger | now I am confused too | 16:00 |
jeblair | 473887 looks like it's *adding* something to the zk node data structure | 16:01 |
pabelanger | http://paste.openstack.org/show/612440/ is the issue we have today, nodepool_region will always be null | 16:02 |
jeblair | like, nodepool is not passing the region at all currently, regardless of whatever we name it. is that correct? | 16:02 |
mordred | yes | 16:02 |
pabelanger | jeblair: correct. We must have missed that before | 16:03 |
jeblair | ok. then the zuul change is needed because 473887 chose a different name than what zuul was expecting | 16:03 |
jeblair | okay, i think i understand these, and i think we should call the zk attribute "region". that's to match "provider" (which is not provider_name) | 16:04 |
pabelanger | okay, I can make that change | 16:04 |
mordred | jeblair, pabelanger: on 473889 - I put in a question on the review - but it's "should those variables be top-level or be moved to the zuul variables dict" | 16:05 |
* mordred votes for move - but is fine if we pick the other thing | 16:05 | |
jeblair | mordred: hrm, it's in as a host var... but i guess they all just share the same namespace, huh? | 16:06 |
jeblair | mordred: so yeah, i'm leaning toward move | 16:06 |
jeblair | mordred: how do we accomplish that though? | 16:06 |
mordred | it's easy - we just put them into the zuul dict - I can doa followup | 16:07 |
jeblair | mordred: a zuul dict for a single host? | 16:07 |
pabelanger | I'd be okay with a zuul.nodepool dict() if we went that roune | 16:07 |
pabelanger | rout* | 16:07 |
pabelanger | fetching coffee | 16:07 |
*** hashar has quit IRC | 16:07 | |
jeblair | mordred: would it be like this https://etherpad.openstack.org/p/mHTWbqUQ3M ? | 16:08 |
jeblair | (would those get merged?) | 16:09 |
mordred | yes. to both questions | 16:10 |
jeblair | cool, in that case, zuul.nodepool_region or zuul.nodepool.region both wfm. | 16:10 |
*** isaacb has joined #zuul | 16:11 | |
Shrews | let's be careful with that. i'm not 100% sure how that would affect current nodes with the old style metadata | 16:11 |
mordred | jeblair: nope. I take it all back. | 16:12 |
jeblair | mordred: it overwrites the whole thing? | 16:12 |
mordred | jeblair: we'd need to turn on variable dict merging to get merging, and we've been avoiding that on purpose so far | 16:13 |
mordred | jeblair: so I think ignoring me and keeping them top-level per-node as we have them is fine | 16:13 |
Shrews | oh, these aren't node metadata vars. nm | 16:13 |
jeblair | ok. so i think we just revise pabelanger's patch for my bikeshed, then no change to zuul. | 16:13 |
mordred | Shrews: cool - I was just about to ask which thing we should be careful about | 16:13 |
mordred | jeblair: yup | 16:13 |
pabelanger | Agree, I don't think we should enable dict merge setting | 16:14 |
mordred | jeblair: we _could_ make nodepool a dict to be consistent with zuul being a dict | 16:14 |
mordred | but I do not think it's essential that we do so | 16:14 |
jeblair | mordred: agreed, that seems like it would be fine too, but <shrug> | 16:14 |
jeblair | i guess we should decide rsn though. :) | 16:14 |
pabelanger | nodepool dict seems okay to me | 16:15 |
mordred | woot | 16:15 |
jeblair | wfm | 16:15 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Add region to zookeeper https://review.openstack.org/473887 | 16:15 |
pabelanger | okay, ^ should mean we can abandon 473889 | 16:16 |
jeblair | pabelanger: yep. but if you want to move them into a dict, now's the time. :) | 16:17 |
pabelanger | k, I can do that too | 16:17 |
pabelanger | jeblair: mordred: still as a host_vars right? | 16:20 |
mordred | pabelanger: yah - I think just right there in that same function | 16:20 |
pabelanger | k | 16:23 |
jlk | mordred: do you know of any consumers that would appreciate such variables? Or do you feel like "if you build it, they will come"? | 16:28 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Create nodepool dictionary for ansible inventory https://review.openstack.org/473889 | 16:29 |
mordred | jlk: yah. that's the real rub - I don't know of specific consumers- only that it's variables we provide for openstack inventory users ... and so far everything we've put upstream in ansible gets used way more than I expect :) | 16:30 |
jlk | hah. I see there has been some more discussion. I agree it's a lot of info, maybe way more than should be exposed by default. As much as I hate adding more config toggles, this does seem like one. | 16:31 |
*** isaacb has quit IRC | 16:35 | |
*** isaacb has joined #zuul | 16:36 | |
*** jkilpatr_ has quit IRC | 16:39 | |
*** jkilpatr has joined #zuul | 16:39 | |
rcarrillocruz | just tested https://review.openstack.org/#/c/473809/1 on my nodepool instance | 16:41 |
rcarrillocruz | deleted a node | 16:41 |
rcarrillocruz | confirm the per-label group is there | 16:41 |
rcarrillocruz | jeblair: are we good to go ^ ? | 16:41 |
jeblair | rcarrillocruz: almost -- we should call it 'label' and not type; left a comment on patch. | 16:48 |
rcarrillocruz | ah ok, i follwed the Node convention | 16:49 |
rcarrillocruz | thx, i'll amend | 16:49 |
jeblair | yeah, this way it'll be one less thing we need to change :) | 16:50 |
rcarrillocruz | so jeblair , i should wait for your patch to land or put a depends-on ? | 16:50 |
jeblair | rcarrillocruz: i haven't written it yet, so why don't you just go ahead and rework yours to use 'label' wherever you can... so it might look like "nodepool_node_label=self._node.type", then i'll fix that when i change the rest of the zk stuff | 16:51 |
rcarrillocruz | k | 16:51 |
jeblair | rcarrillocruz: but at least you won't have to deal with the metadata variable changing | 16:51 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Add region to zookeeper https://review.openstack.org/473887 | 16:54 |
openstackgerrit | Ricardo Carrillo Cruz proposed openstack-infra/nodepool feature/zuulv3: Create group for label type https://review.openstack.org/473809 | 16:55 |
*** jkilpatr has quit IRC | 17:02 | |
rcarrillocruz | thx folks | 17:03 |
*** jkilpatr has joined #zuul | 17:04 | |
* SpamapS sits back down at laptop, refreshed from 2 days of legolanding. | 17:04 | |
*** isaacb has quit IRC | 17:07 | |
jlk | good times! | 17:13 |
jlk | my boys were so bummed when we went that we were missing the opening of Ninjago by a couple weeks. | 17:13 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add ssl support to gearman / gearman_server https://review.openstack.org/473916 | 17:15 |
pabelanger | jeblair: okay, I think we first need ^ for gearman SSL. I didn't see anything in zuul today for it. I'd also like to setup a unit test for SSL, but want to make sure that is the right step first | 17:17 |
SpamapS | jlk: It was meh for me, but my boys did in fact enjoy it since it was the only video game they got to play all day. | 17:19 |
SpamapS | jlk: (re: Ninjago) | 17:19 |
jeblair | pabelanger: yep agreed -- though i think we need to use "has_option" because if we don't, doesn't configparser.get raise an exception if the value isn't there? | 17:30 |
pabelanger | jeblair: Ya, that is something I wanted to test. I thought it defaulted to None, but untested. | 17:31 |
pabelanger | working on generating our self-signed certs | 17:31 |
jeblair | pabelanger: yeah, we need has_option: http://paste.openstack.org/show/612449/ | 17:33 |
pabelanger | jeblair: thanks | 17:34 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Create group for label type https://review.openstack.org/473809 | 17:34 |
jeblair | jlk: cache change lgtm | 17:37 |
jlk | yay! | 17:37 |
jeblair | mordred: i commented on 473090 | 17:55 |
SpamapS | FYI, 7/11, CNCF would like another demo of Zuul, this time with v3. :) | 18:12 |
mordred | SpamapS: sweet | 18:14 |
mordred | jeblair: I have responded - thanks! | 18:34 |
*** hashar has joined #zuul | 18:36 | |
jlk | SpamapS: care to take a look at https://review.openstack.org/472468 and perhaps give it workflow? | 18:43 |
SpamapS | jlk: looking now | 19:02 |
SpamapS | zomg summer WFH is such a tire fire sometimes | 19:17 |
* SpamapS is going to have to escape to a coffee shop me thinks | 19:17 | |
jlk | :( | 19:18 |
SpamapS | jlk: review progressing :) | 19:21 |
Shrews | mordred: after the meeting, we might need to put our heads together on your "follow_log" streaming logic. It doesn't seem to stop streaming on EOF | 19:43 |
mordred | Shrews: kk. the logic for serving on the node itself? or the logic for reading it on the executor? | 19:45 |
Shrews | mordred: the serving code | 19:46 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Implement a cache of github change objects https://review.openstack.org/472468 | 19:46 |
Shrews | mordred: like, maybe if the file has NOT been truncated, is it ok to just return False since we've only gotten to that point (i think) if we've read all of the file? | 19:47 |
Shrews | returning False should terminate the streaming | 19:47 |
mordred | Shrews: well, that is on purpose. it's basically doing a tail -f of the log and doesn't know if more things wil be added | 19:47 |
mordred | Shrews: HOWEVER - now that we're doing logfile-per-command, we could potentially change that logic, since we don't need to support subsequent commands appending to the file | 19:48 |
Shrews | mordred: that might work for the ansible plugin, but i'm talking about in reference to the finger streamer (where i copied that code) | 19:49 |
mordred | gotcha | 19:49 |
Shrews | doesn't seem like we'll ever really know if we've reached the true EOF | 19:49 |
mordred | so - yah - let's rethink this part - because there have been some structural changes since it was written | 19:50 |
mordred | Shrews: on the node, we can definitely write an EOF to the log file, because we don't re-use log files across tasks on the node anymore | 19:51 |
mordred | Shrews: on the executor, we also know when we're done with the logfile - that's when we run zuul_console: state=absent | 19:52 |
mordred | Shrews: so the executor streamer can write a "yup, I'm done" | 19:52 |
mordred | Shrews: and then the finger streamer can notice that and stop | 19:53 |
mordred | Shrews: right? | 19:53 |
jeblair | mordred, Shrews: the executor streamer knows it's done when the file is *deleted* | 19:53 |
Shrews | hrm, yeah. maybe a presence check is what's needed there | 19:54 |
jeblair | so while the node streamer is doing a tail -f (and i agree, that can probably change now), the executor streamer can replace that "did the file get truncated?" check with a "did the file get deleted?" | 19:54 |
mordred | jeblair: yes. agree | 19:55 |
Shrews | right, we won't ever truncate that log | 19:55 |
jeblair | though, we'll have to think about startup -- what if someone requests a stream before the job starts? then the file won't exist -- yet. | 19:55 |
Shrews | the finger streamer returns "log file not found" in that case | 19:55 |
* SpamapS smells a lockfile coming | 19:55 | |
jeblair | however, the executor doesn't return the streaming information/url until after the job starts... so... maybe it's okay? | 19:55 |
jeblair | well, it sends it right before it runs ansible for the first time | 19:56 |
Shrews | or, if the build dir doesn't exist yet, "invalid build uuid" | 19:56 |
jeblair | so... i guess there's a tiny race there. | 19:56 |
jeblair | Shrews: aha! | 19:56 |
jeblair | Shrews: build dir exists, but log file doesn't means that the job hasn't started yet. so the streamer can wait for the log to show up (if we want to be nice). | 19:56 |
jeblair | Shrews: build dir doesn't exist means the job is finished. terminate connection. | 19:57 |
Shrews | jeblair: yeah, we could be nicer that way | 19:57 |
mordred | ++ to both | 19:58 |
mordred | btw - the executor streamer can stop streaming even before the file is deleted if we want - "[Zuul] Task exit code: %s" is the last line emitted by the command module | 20:00 |
mordred | OH - I say that - I actually see a logic flaw there | 20:00 |
jeblair | mordred: there may be more tasks, so the executor streamer can't stop, right? | 20:02 |
openstackgerrit | Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Implement pipeline requirement of github labels https://review.openstack.org/473962 | 20:02 |
mordred | jeblair: it can stop streaming that task | 20:02 |
jeblair | mordred: oh, as a client. yes. | 20:02 |
Shrews | jeblair: file exists check seems to work. good idea | 20:02 |
jeblair | (i thought you made it do that? i recall a check for that line) | 20:02 |
mordred | jeblair: it can also stop streaming that particular file | 20:02 |
mordred | jeblair: I do on the executor | 20:02 |
mordred | jeblair: but I was suggesting we could do the same check in the node streamer so that the thread streaming a given file could terminate | 20:03 |
jeblair | mordred: ah, you said 'executor streamer' above. okay, resetting -- | 20:03 |
mordred | sorry. we have many pieces | 20:04 |
jeblair | mordred: yeah, i think that's okay. just one thing we'll need to make sure of is that in both places (node and executor streamers), we flush all the data output before we close any open connections. | 20:05 |
mordred | ++ | 20:05 |
pabelanger | So, has anyboby tried using SSL with python gear recently? I keep getting SSLError: [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:661) back from my SSL certs I have generated | 20:06 |
Shrews | pabelanger: i added that code loooooong ago (4+ years?). worked at the time. :) | 20:06 |
jeblair | Shrews: i think you added tests too? :) | 20:07 |
pabelanger | Ya, that's what I see. I've been following https://dst.lbl.gov/~boverhof/openssl_certs.html for my server / client / ca certs | 20:07 |
Shrews | wow, /me just realizing how long he has been in this openstack stuff | 20:07 |
jeblair | Shrews: don't look down | 20:07 |
pabelanger | I am starting to thing ssl_version=ssl.PROTOCOL_TLSv1 might be part of the issue | 20:07 |
SpamapS | indeed | 20:08 |
jeblair | ah, we use fixed certs in the tests, so we're not generating new ones or anything | 20:08 |
jeblair | pabelanger: what emits that error? | 20:11 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Make sure we always log the exit line https://review.openstack.org/473966 | 20:11 |
pabelanger | jeblair: gear.Server: http://paste.openstack.org/show/612463/ | 20:13 |
pabelanger | afk for a few minutes, need to stretch legs | 20:13 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Stop streaming from server when log file removed https://review.openstack.org/473967 | 20:13 |
Shrews | jeblair: mordred: ^^^ this seems to work | 20:13 |
jeblair | pabelanger: are you sure the client is using ssl? | 20:14 |
*** jkilpatr has quit IRC | 20:15 | |
pabelanger | jeblair: I believe so, let me confirm again | 20:17 |
jeblair | pabelanger: there should be a "Using SSL" debug log line right after "Connecting to" | 20:18 |
SpamapS | you can always 'openssl s_client -connect foo:4730' and try some admin commands :) | 20:19 |
jeblair | ya | 20:19 |
pabelanger | that is a good idea | 20:20 |
pabelanger | jeblair: I don't see that, sorry was looking | 20:20 |
pabelanger | let me first debug with geard client | 20:20 |
*** jkilpatr has joined #zuul | 20:31 | |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Use threads instead of processes in zuul_stream https://review.openstack.org/472850 | 20:38 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Add log streaming test https://review.openstack.org/471079 | 20:39 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Stop streaming from server when log file removed https://review.openstack.org/473967 | 20:39 |
Shrews | jeblair: okie dokie, reworked that ^^^ to use the thread for streaming. Although I could remove a couple of sync points from the job itself, I had to add some artificial spinwaits to wait for things to get populated | 20:40 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Don't wait for forever to join streamer https://review.openstack.org/472839 | 20:40 |
Shrews | jeblair: oh, this is fun. ever seen this seeming non-related fail? http://paste.openstack.org/show/612464/ | 20:42 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Use display.display for executor debug messages https://review.openstack.org/472840 | 20:42 |
jeblair | Shrews: no, but i've never had a config/streamer/ directory -- did you lose that in some git operation somehow? | 20:43 |
Shrews | jeblair: nope. that's being added for that test | 20:44 |
Shrews | happened while running the test 100 times | 20:44 |
Shrews | failed on run 71 | 20:44 |
jeblair | Shrews: yeah, but it looks like it's not where it's expected on your local system | 20:45 |
jeblair | Shrews: did you perform any git operations while running those tests in the background? | 20:45 |
Shrews | hrm, maybe doing the 'git review' during the run caused issues | 20:45 |
Shrews | yeah | 20:45 |
Shrews | must be it | 20:45 |
jeblair | yes, it absolutely did as part of the rebase check. | 20:45 |
* Shrews makes mental note to not do that | 20:45 | |
jeblair | Shrews: or 'git review -R' if you really want to | 20:46 |
jeblair | clarkb: can you take a look at 472964 ? | 20:48 |
clarkb | jeblair: mordred my preference on the log indexing/parsing/etc side of things is to not make things conditionally different | 20:51 |
clarkb | it complicates the rules necessary to parse and index and query | 20:51 |
jeblair | clarkb: generally, i agree -- is it worth having the "hostname" (eg, ubuntu-xenial) in every log line on every single-node job though? or is that a parsing complexity we'd be willing to accept in order to simplify the most common logs? | 20:52 |
mordred | clarkb: ok- so the prepended hostname on every logline you don't think will bother folks? | 20:52 |
clarkb | it wouldn't bother me, we often have people confused about what jobs ran on and that would make it pretty explicit | 20:53 |
mordred | that's a good point | 20:53 |
jeblair | (yes, though i think even now our pre-playbook outputs that at the top of the job) | 20:53 |
clarkb | I don't feel strongly about it. The difference in logstash rules for that is an optional match group for hostname | 20:55 |
clarkb | so fairly straightforward | 20:56 |
mordred | jeblair: I can split the if len > 1 bit out into its own patch and put it at the end if we want to get the rest of that stack in but till talk about the hostname part | 20:57 |
mordred | oh - I guess there is nothing pressing on top of it | 20:57 |
jeblair | i also don't feel strongly. however, based on what i've seen, i think perhaps this is one of those cases where we can make users' lives better at a small cost to our logstash config complexity. so if it's not too hard to do, let's do that. if there's a compelling user story for always having it, we can always stick it back in. | 20:58 |
mordred | jeblair: kk | 20:58 |
clarkb | is there a problem we are trying to solve with it? just concern that too much info will make logs harder to read? | 20:59 |
clarkb | and if ^ is the case why are we not worried about it for multinode? | 20:59 |
jeblair | clarkb: well, it's required by multinode in order to have any hope of reading it | 21:00 |
mordred | clarkb: yah. but for single node it winds up being a change from the current log format | 21:00 |
clarkb | right, but if users are capable of that wouldn't they be capable of it on signle node too? | 21:00 |
mordred | so since it's a change, it made me want to think about whether the change would increase or decrease user happiness | 21:00 |
clarkb | I'm personally a fan of consistency | 21:01 |
clarkb | also journalctl outputs with hostname by default too | 21:01 |
jeblair | i think users will be able to handle it; i think it just might be annoying -- the same thing repeated on every line... though now that i say that... mordred, does the hostname perhaps make an easy way to distinguish between "ansible" log lines vs "console" log lines? | 21:02 |
clarkb | (so having hostname there would potentialyl make it easier on people used to journalctl output) | 21:02 |
jeblair | (now i wish i had one of these handy, but we're broken right now) | 21:02 |
mordred | clarkb: is there an example default journalctl output lying around anywhere? | 21:02 |
clarkb | mordred: just run journalctl -f on your laptop :) | 21:02 |
clarkb | Jun 13 08:49:10 nibbler systemd[2605]: Started XFCE notifications service. | 21:02 |
clarkb | for example | 21:02 |
clarkb | nibbler is the hostname here | 21:02 |
jeblair | http://zuulv3-dev.openstack.org/logs/0a06c318cf2340e7a99f08a7b6127164/ansible_log.txt | 21:03 |
mordred | jeblair: yah - but a bunch has changed there | 21:03 |
jeblair | mordred: hrm, it kind of does -- ignoring the stuff we've changed (like fixing dual timestamps, etc) -- the hostname does kind of make a nice indication that something is a console line vs an ansible line... | 21:03 |
mordred | jeblair: yah - it's not terrible. why don't we just keep it for a bit and after we read logs for a little while revisit? | 21:05 |
mordred | jeblair: so I'll pull just that bit out of the stack but leave the other bits so that it's trivial to re-add if we want | 21:05 |
jeblair | mordred: sgtm! | 21:05 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Change log streaming link to finger protocol https://review.openstack.org/437764 | 21:09 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Extract get_playhosts listing in zuul_stream to a method https://review.openstack.org/472964 | 21:09 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Only prepend hostname on multi-node plays https://review.openstack.org/473985 | 21:09 |
mordred | jeblair: kk. there is is reorganized a bit | 21:10 |
pabelanger | okay | 21:15 |
pabelanger | would love some body to validate my steps for SSL and gearman: http://paste.openstack.org/show/612468/ | 21:16 |
pabelanger | taking a break before I dive back into it | 21:20 |
*** dkranz has quit IRC | 21:27 | |
Shrews | pabelanger: it may be that self-signed certs won't work? i seem to recall some sort of issue around those, but i really don't remember | 21:32 |
SpamapS | pabelanger: I usually use CA.sh | 21:32 |
jeblair | Shrews: i think pabelanger's example signs the certs with a ca key, so they shouldn't be (only) self-signed | 21:33 |
Shrews | jeblair: k. like i said, my memory on that is fuzzy. 'self-signed' brought up bad feelings that i can't put a finger on atm | 21:35 |
Shrews | wish i could remember, but that would risk bring up memories of libra | 21:40 |
Shrews | *shudder* | 21:40 |
jeblair | pabelanger: i confirm your error | 21:47 |
jeblair | pabelanger: this process, however, works for me: http://paste.openstack.org/show/612472/ | 21:48 |
jeblair | pabelanger: i suspect the problem is the expiration -- if i look at the certs i generate with your instructions, their begin and end dates are the same | 21:57 |
jeblair | pabelanger: oops, i'm wrong about that, nevermind | 21:58 |
jeblair | pabelanger, Shrews: oh wait -- i think those might be self-signed! | 22:01 |
jeblair | i mean, there's a CA cert there and all | 22:01 |
jeblair | but i think if you enter the same values for the subject information, you get that error | 22:02 |
jeblair | pabelanger: i can't tell what you entered since you didn't include that in the paste, or use the -subj line | 22:02 |
jeblair | but that is something that's different about my procedure -- i use -subj and provide something different for each cert | 22:03 |
pabelanger | jeblair: I just used the default values provided by openssl, I just pressed enter for most prompts | 22:05 |
pabelanger | let me make sure client and server different values | 22:05 |
pabelanger | jeblair: yes, that was the issue. By using a different subject for client / server / ca, I no longer get errors. Thank you | 22:09 |
*** hashar has quit IRC | 22:12 | |
*** hashar has joined #zuul | 23:02 | |
openstackgerrit | Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Implement pipeline reject filter for github https://review.openstack.org/474001 | 23:10 |
* jlk hangs up his hat for the day. | 23:11 | |
*** hashar has quit IRC | 23:28 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!