pabelanger | Ya, zuulv3 does seem to be performing better | 00:20 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Fix Gearman UnknownJob handler https://review.openstack.org/508992 | 00:39 |
*** smyers has quit IRC | 00:54 | |
*** smyers has joined #zuul | 00:57 | |
*** smyers has quit IRC | 01:39 | |
*** smyers has joined #zuul | 01:45 | |
*** jhesketh has quit IRC | 01:51 | |
*** jhesketh has joined #zuul | 01:51 | |
*** jkilpatr has quit IRC | 02:46 | |
mordred | rcarrillocruz: I was thinking it was going to fail because you added a line to the test that was SUPER long - but if pep8 allows it, awesome | 03:21 |
*** jamielennox has quit IRC | 04:14 | |
*** jamielennox has joined #zuul | 04:18 | |
*** bhavik1 has joined #zuul | 04:44 | |
*** ricky_ has joined #zuul | 08:28 | |
openstackgerrit | Ricardo Carrillo Cruz proposed openstack-infra/nodepool feature/zuulv3: Bring back per label groups in Openstack https://review.openstack.org/509620 | 08:33 |
*** hashar has joined #zuul | 08:34 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool feature/zuulv3: Use same flake8 config as in zuul https://review.openstack.org/509715 | 08:34 |
ricky_ | tobiash: please re-review when you get a sec ^ | 08:34 |
ricky_ | thx | 08:34 |
tobiash | mordred, jeblair, rcarrillocruz: looks like pep8 config in nodepool is broken. ^ would sync it to the same settings as in zuul, but we would have to fix quite some stuff... | 08:35 |
tobiash | ricky_: lgtm | 08:35 |
ricky_ | thx | 08:37 |
*** bhavik1 has quit IRC | 09:23 | |
kklimonda | can I ship my own ansible action plugins with roles/playbooks? Or perhaps I can explain my usecase: I'd like to expose some additional variables to my tasks (for example I have a repo with debian packaging, I'd like to parse debian/changelog and expose version to other tasks as a variable). | 09:41 |
tobiash | kklimonda: I think you can ship your own modules, but not action plugins, as they are restricted by zuul in order to prevent unreviewed code from doing bad stuff | 09:46 |
kklimonda | tobiash: is this just matter of trusted vs untrusted projects? i.e. can I ship action plugin if it's part of a trusted project? | 09:47 |
tobiash | kklimonda: by looking into the code I think it's just restricted for untrusted projects: | 10:28 |
tobiash | kklimonda: http://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/executor/server.py?h=feature/zuulv3#n1533 | 10:28 |
tobiash | kklimonda: but I don't know what's the default search path of action plugins in ansible | 10:28 |
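Since custom modules are allowed even for untrusted projects, kklimonda's use case (exposing the debian/changelog version to later tasks) could plausibly be served by a small module shipped in a role's library/ directory. A hedged sketch of just the parsing core follows; all names here are illustrative, and a real module would wrap this with ansible.module_utils.basic.AnsibleModule and return the value via exit_json():

```python
import re

# First line of a debian/changelog entry looks like:
#   mypackage (1.2.3-1) unstable; urgency=medium
CHANGELOG_RE = re.compile(r"^(?P<source>\S+) \((?P<version>[^)]+)\)")

def changelog_version(first_line):
    """Return the version string from the first debian/changelog line."""
    m = CHANGELOG_RE.match(first_line)
    if not m:
        raise ValueError("unparseable changelog entry: %r" % first_line)
    return m.group("version")
```

A task could then register the module's output and use the version as a plain variable in subsequent tasks.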
*** ricky_ has quit IRC | 10:59 | |
*** jkilpatr has joined #zuul | 11:15 | |
*** jkilpatr has quit IRC | 11:24 | |
*** jkilpatr has joined #zuul | 11:37 | |
kklimonda | @tobiash thanks, I'll check it out | 12:22 |
*** SotK_ has joined #zuul | 12:41 | |
*** SotK_ has left #zuul | 12:48 | |
dmsimard | Are we planning on reloading the executors sometime soon ? I'd like to have https://review.openstack.org/#/c/509254/ in to properly test zuulv3 elastic-recheck changes | 13:29 |
*** dkranz has joined #zuul | 13:40 | |
fungi | memory utilization on zuulv3 is looking muuuuch better today: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=63979&rra_id=all | 14:17 |
jeblair | yeah, it's still *a lot* of memory, but not an ever increasing graph | 14:21 |
jeblair | btw, we do know that we leak cached change (and now pull-request) objects. there's not an easy solution to that right now, but they are small, and that's a slow leak. | 14:21 |
mordred | jeblair: with the current rate of change in the scheduler- and existing planned changes - I'm comfortable with a slow leak | 14:30 |
mordred | jeblair: (I mean, I'm fairly certain we'll have at least one change per week we'll want to restart to pick up between now and whenever we could fix the leak) | 14:30 |
jeblair | we have about 1k items in the pipelines right now; so more things in memory than we would even have while normally running -- though i don't know what our proportion of dynamic configs is | 14:33 |
pabelanger | ya, memory does look pretty flat this morning | 14:37 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Handle double node locking snafu https://review.openstack.org/509603 | 14:38 |
Shrews | jeblair: probably going to need your expertise for a test for that ^^^ | 14:38 |
Shrews | my gerrit "zuulv3" filter is now useless since everyone is using that :( | 14:40 |
jeblair | we should switch to "frank" | 14:42 |
rcarrillocruz | got post failure on zuulv3 for sphinx on https://review.openstack.org/#/c/509620/, but regardless, +1 from Jenkins zuulv2 | 14:44 |
rcarrillocruz | are we good to merge mordred ? | 14:44 |
rcarrillocruz | i made the assert line shorter | 14:44 |
dmellado | rcarrillocruz: I've been seeing that behavior quite a bit, sadly | 14:47 |
rcarrillocruz | thx jeblair | 14:47 |
pabelanger | rcarrillocruz: Docs should not be published for feature branches | 14:48 |
rcarrillocruz | so failure expected | 14:48 |
rcarrillocruz | ok | 14:48 |
pabelanger | ya | 14:48 |
pabelanger | we need to fix the job | 14:48 |
pabelanger | build-openstack-sphinx-docs should be using prepare-infra-docs-for-afs role | 14:48 |
jeblair | pabelanger: that's the doc *build* job | 14:50 |
jeblair | it should run on all changes | 14:50 |
jeblair | and it should publish to logs.o.o | 14:50 |
pabelanger | jeblair: right, but that's how openstack docs was set up, which didn't allow feature branches to be published back when it was zuulv2.5 JJB. | 14:51 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Bring back per label groups in Openstack https://review.openstack.org/509620 | 14:51 |
pabelanger | at one point we did have a build-infra-docs, but I am not sure atm | 14:52 |
pabelanger | looking | 14:52 |
jeblair | pabelanger: let's move this to -infra | 14:52 |
pabelanger | kk | 14:52 |
jeblair | pabelanger, Shrews: i don't see any zookeeper connection issues since i stopped doing objgraph stuff after the restart | 15:04 |
pabelanger | Ya, I think the high CPU load was causing them to drop | 15:06 |
clarkb | cacti reports significantly more sane cpu usage | 15:06 |
pabelanger | clarkb: jeblair: I am noticing we are consuming a lot of HDD space, I am not sure we have the 2nd drive mounted | 15:07 |
jeblair | Shrews: i think we can probably pause the nodepool provider to control when it fulfills requests, and we can probably manually close the zk connection... i don't think we have a facility to stop scheduler processing of zk events while we do that. i think tests for this could be very difficult. | 15:07 |
jeblair | pabelanger: the 2nd drive is all swap | 15:07 |
pabelanger | HA | 15:09 |
pabelanger | nice | 15:09 |
jeblair | Shrews: there are some more unit-test like tests in test_nodepool... maybe we could do it there | 15:09 |
jeblair | Shrews: the test class itself acts as a scheduler, so it has its own onnodesprovisioned event | 15:10 |
jeblair | Shrews: i think i'd give that a shot -- probably make a new test class because you'll want to control onnodesprovisioned and have it do something differently | 15:10 |
jeblair | Shrews: aha! there's even a test in there for disconnects | 15:11 |
jeblair | Shrews: so i think that has almost all the pieces | 15:11 |
pabelanger | jeblair: assuming having all of the 2nd drive as swap is wrong, on the next stoppage of zuulv3 should we rebuild the 2nd drive and set up fstab properly? | 15:13 |
pabelanger | I'm sorry, but if people are upset there was an outage for 5 days because CI was down or sucked, we should have added more people to zuulv3 effort. Its not like we've been asking for more help. | 15:15 |
pabelanger | wow | 15:15 |
pabelanger | that was the wrong window | 15:16 |
mnaser | its nice that the memory of zuul is relatively stable | 15:18 |
mnaser | and my browser cant even handle rendering how big the queue is :D | 15:19 |
dmsimard | mnaser: time to use zuultty | 15:21 |
mnaser | i need to learn gertty first but maybe ill look into that | 15:22 |
mnaser | :p | 15:22 |
dmsimard | I always forget where zuultty is hidden, it's like a subfolder in some other project | 15:22 |
mnaser | google search shows... a result of you in eavesdrop | 15:23 |
mnaser | :p | 15:23 |
Shrews | jeblair: k. i'll see if i can figure something out | 15:24 |
dmsimard | mnaser: lmao | 15:24 |
*** hashar is now known as hasharAway | 15:24 | |
mnaser | dmsimard https://gist.github.com/sileht/c342606a7ba64761936e | 15:24 |
dmsimard | mnaser: nah that's not it.. hang on, let me find it | 15:25 |
dmsimard | mnaser: https://github.com/harlowja/gerrit_view/blob/master/README.rst#czuul | 15:26 |
mnaser | dmsimard nice! | 15:27 |
fungi | jeblair: seeing a recent jump in memory utilization (not huge, but at least pronounced) in the past few minutes... wondering if any of this will drop as zuulv3 catches up on its backlog | 15:30 |
fungi | that is, once we're adding changes more slowly than we're reporting on them | 15:30 |
jeblair | fungi: i doubt it will ever drop due to python memory management.... | 15:30 |
jeblair | also, i don't expect zuulv3's queues to ever shrink in our current configuration | 15:31 |
fungi | "management" needs quotes there ;) | 15:31 |
jeblair | i'm going to poke at more memory things, it may cause disruption again | 15:38 |
*** bhavik1 has joined #zuul | 15:56 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Use normal docs build jobs https://review.openstack.org/509833 | 16:49 |
*** bhavik1 has quit IRC | 16:50 | |
pabelanger | jeblair: here is my first attempt at getting kill_after_timeout working locally in GitPython. https://github.com/pabelanger/GitPython/commit/2e78443444c3b836ba3bcd6e6dde62be77ce3779 Not that you are an expert, but any thing pop out as a potential issue? | 16:53 |
pabelanger | when the timeout happens, it will now raise the following: git.exc.GitCommandError: Cmd('git') failed due to: exit code(-9) | 16:53 |
pabelanger | which we can likely trap and then proceed to clean up the repo | 16:53 |
jeblair | pabelanger: i'm pretty sure the as_process option is tightly integrated with the progress output | 16:55 |
jeblair | pabelanger: so you may want to try setting as_process to false if progress is None | 16:56 |
jeblair | pabelanger: however, in parallel to working upstream, why don't you make a local method for us to use instead of that one, so we don't have to wait for a gitpython release? | 16:57 |
pabelanger | jeblair: Oh, I see. Good idea | 16:57 |
pabelanger | jeblair: Sure, I can try my hand at it | 16:57 |
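The "local method" jeblair suggests could look roughly like the sketch below. This is a hedged illustration using plain subprocess, not zuul's or GitPython's actual code: a generic command runner that kills the child on timeout, so a hung git fetch over HTTP(S) becomes a catchable error instead of blocking a merger forever.

```python
import subprocess
import sys

def run_with_timeout(cmd, timeout):
    """Run cmd (a list of strings), killing it if it has not exited
    within `timeout` seconds. A caller wrapping git would catch
    TimeoutExpired and then clean up the partially-fetched repo."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    try:
        out, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.communicate()  # reap the killed process
        raise
    if proc.returncode != 0:
        raise RuntimeError("command failed: %s"
                           % err.decode(errors="replace"))
    return out.decode(errors="replace")

# e.g. run_with_timeout(["git", "fetch", "origin"], timeout=300)
```

This sidesteps waiting on a GitPython release while the upstream kill_after_timeout fix lands.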
dmsimard | jeblair, mordred, fungi, clarkb, pabelanger: I don't know if we can make use of this or if it's relevant but it seemed awesome enough that it was at least worth sharing: https://github.com/nickjj/ansigenome | 16:59 |
jeblair | dmsimard: nice, thanks | 17:00 |
*** maxamillion has quit IRC | 17:01 | |
*** maxamillion has joined #zuul | 17:03 | |
dmsimard | There's some interesting features, like making sure we have READMEs, it can generate them, etc. I might poke at it out of personal curiosity to see what it does. | 17:04 |
odyssey4me | dmsimard nickjj also did https://github.com/nickjj/rolespec | 17:17 |
pabelanger | I saw a talk at AnsibleFest SF using the testinfra library too. https://pypi.python.org/pypi/testinfra | 17:22 |
jeblair | we're still using a bit more memory than we should; i'm currently looking at some layouts held in memory because some merger jobs have gotten stuck on the git timeout issue that pabelanger is working on. i'm going to think about whether we need to do anything about that other than just fix the git timeout thing. | 17:22 |
pabelanger | I might start using that for helping test roles | 17:22 |
dmsimard | odyssey4me: one day I would like to see something like serverspec but with ansible | 17:23 |
dmsimard | serverspec being ruby and all that | 17:23 |
dmsimard | https://github.com/larsks/ansible-assertive/ allows for doing non-failing asserts for example | 17:24 |
dmsimard | I had written this a long time ago inspired from stuff that EmilienM did back at Enovance | 17:25 |
dmsimard | https://github.com/dmsimard/openstack-serverspec/blob/master/spec/tests/swift_loadbalancer/swift_proxy_spec.rb | 17:25 |
dmsimard | pabelanger: ah so testinfra is basically like serverspec but in python | 17:26 |
dmsimard | never heard of it before | 17:26 |
* EmilienM hides | 17:27 | |
dmsimard | I guess I want to do the same thing as serverspec and testinfra but with ansible proper :D | 17:27 |
dmsimard | pabelanger: oh, but since testinfra is python, we could probably just easily wrap it inside ansible modules | 17:27 |
pabelanger | dmsimard: well, I'd want it to run outside of ansible. EG: have ansible do its thing, then run the python unit test to validate it worked | 17:28 |
pabelanger | otherwise, if ansible is broken, it will be hard to detect that if running inside | 17:28 |
pabelanger | I'll have to find the talk, but it was about molecule at ansiblefest SF | 17:28 |
dmsimard | are the talks online yet ? | 17:29 |
pabelanger | I think so | 17:29 |
dmsimard | I've heard about molecule too but never used it | 17:29 |
jlk | I know that guy | 17:29 |
jlk | who wrote it | 17:29 |
jlk | He used to work at Blue Box | 17:29 |
jlk | alas, I haven't really given molecule a spin yet :( | 17:29 |
pabelanger | Ya, i don't think we'd use it, since it works like beaker. Meant to setup your nodes into containers, then run ansible. But we have zuul / nodepool to do that | 17:29 |
pabelanger | but the testinfra was interesting | 17:30 |
pabelanger | assert file exists, server runs, etc | 17:30 |
pabelanger | service* | 17:30 |
jlk | yeah, there is need for things like that in the enterprise world | 17:30 |
jlk | we used ansible to set up serverspec | 17:31 |
jlk | to validate the ansible | 17:31 |
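The kind of post-run assertions being discussed (testinfra/serverspec-style "file exists, service runs" checks, executed outside Ansible so a broken Ansible can't mask a broken deployment) can be sketched in plain Python. These helper names are illustrative only, not testinfra's real API:

```python
import os
import shutil

def file_exists(path):
    """Assert-style check: is path a regular file?"""
    return os.path.isfile(path)

def command_available(name):
    """Is an executable with this name on PATH?"""
    return shutil.which(name) is not None

def service_running(name):
    """Crude stand-in for testinfra's host.service(name).is_running:
    scan /proc for a process with that command name (Linux only)."""
    if not os.path.isdir("/proc"):
        return False
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open("/proc/%s/comm" % pid) as f:
                if f.read().strip() == name:
                    return True
        except OSError:
            continue  # process exited while we were scanning
    return False
```

In practice testinfra already provides these primitives behind a pytest fixture; the point is that the validation runs as a separate step after the playbook.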
dmsimard | I almost went as far as getting monitoring checks to run serverspecs in prod on a regular basis | 17:31 |
dmsimard | but got sidetracked by far less fun things | 17:31 |
jlk | particularly because it wasn't a "continual deployment" environment, Ansible was run on-demand. So something like serverspec could catch something messing with the system | 17:31 |
jlk | dmsimard: that's exactly what we did | 17:32 |
jlk | sensu alerts for serverspec failures | 17:32 |
dmsimard | jlk: neat, that's what we wanted to do yeah | 17:32 |
dmsimard | but that was at $oldjob :) | 17:32 |
dmsimard | jlk: so the guy worked at metacloud first? then bluebox? does he have a flair for acquisitions or something ? :p | 17:33 |
jlk | bluebox then metacloud | 17:33 |
jlk | he left BB before acquisition | 17:33 |
jlk | (before I joined BB actually) | 17:34 |
Shrews | jeblair: i'm stumped on this test. if i push up the current iteration of it, would you mind showing me where i'm going wrong? | 17:38 |
pabelanger | okay, I think I fixed kill_after_timeout upstream: https://github.com/gitpython-developers/GitPython/pull/683 working on a zuul function now | 17:41 |
kklimonda | zuul doesn't seem to be doing anything to ensure that all jobs that are part of the dependency graph will run on the same cloud, right? | 17:42 |
pabelanger | nodes in the same nodeset should be on the same cloud | 17:43 |
kklimonda | but nodeset is per job, right? | 17:44 |
pabelanger | Right | 17:44 |
pabelanger | the only way to pin jobs to a cloud, would be to create a unique label for said cloud | 17:45 |
pabelanger | we do this today for tripleo-centos-7 images | 17:45 |
pabelanger | and they only run jobs on tripleo-test-cloud-rh1, for historical reasons | 17:45 |
kklimonda | yes, but I don't want to pin it to a specific cloud, just make sure that a given set of jobs will all run on a single cloud | 17:45 |
Shrews | kklimonda: what's the use case for that requirement? | 17:46 |
pabelanger | I don't think we support that currently | 17:46 |
kklimonda | Shrews: I need to build 1GB of packages and then install them for testing. | 17:46 |
pabelanger | sounds like artifact handling? | 17:47 |
kklimonda | yeah | 17:47 |
pabelanger | ya, so this is something we don't do too well atm | 17:47 |
pabelanger | how we worked around it was regional mirrors / proxies to help with that | 17:47 |
kklimonda | well, probably less than that - a lot of packages are dbg symbols, but I'll still end up with ~100MB of packages that have to actually be transferred per build. | 17:47 |
kklimonda | sure, but mirrors/proxies don't help me when it's all new artifacts each time | 17:48 |
pabelanger | right | 17:48 |
kklimonda | (not that I don't need those anyway) | 17:48 |
pabelanger | we basically have the same issue today with the kolla project. They upload large artifacts to tarballs.o.o, and jobs then download from it. | 17:49 |
clarkb | we've talked about possibly using shared cinder volumes for that which would imply scheduling to one cloud region. | 17:49 |
pabelanger | yah | 17:49 |
clarkb | but cinder volumes aren't currently multi-attachable so that has been possible future work | 17:49 |
dmsimard | nor can we guarantee that cinder will be available in every cloud ? | 17:52 |
clarkb | dmsimard: correct, though only infracloud doesn't among our current clouds iirc | 17:53 |
jeblair | pabelanger: thinking about it a bit more -- maybe we've only seen the git hangs on https? so maybe we should go with that solution, and consider the gitpython timeout as a backup solution we can implement later if needed. what do you think? | 18:04 |
jeblair | Shrews: of course! | 18:05 |
pabelanger | jeblair: actually, yah. Looking at the etherpad it was HTTPS. So sure, let me get some .gitconfig settings going | 18:09 |
pabelanger | jeblair: do you have recommendations on limits for ratelimit and time? | 18:09 |
pabelanger | jeblair: also, I'm having a hard time understanding if our WatchDog for zuulv3 executor is working. I don't see how we abort the process any more | 18:10 |
pabelanger | not important at the moment, maybe when you have spare time | 18:10 |
jeblair | pabelanger: maybe as close to "no data in 30s" as we can get? so probably 30s for the time and the lowest non-zero value you can do for rate? | 18:10 |
jeblair | pabelanger: i thought we time out builds all the time? :) | 18:11 |
pabelanger | we do, i think I'm just not understanding https://review.openstack.org/426306/ which is where we stopped passing the proc into Watchdog class | 18:12 |
pabelanger | but first, I'll do .gitconfig changes | 18:13 |
jeblair | pabelanger: we pass an instance method to the watchdog. the instance method aborts self.proc | 18:14 |
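The watchdog pattern jeblair describes can be sketched as follows. This is a minimal hedged illustration (class and method names are made up, not zuul's actual executor code): the watchdog is handed a zero-argument callable rather than the process itself, and that callable is an instance method which knows how to abort its own self.proc.

```python
import threading
import time

class Watchdog:
    """Fire `function` once if stop() is not called within `timeout`."""
    def __init__(self, timeout, function):
        self.timer = threading.Timer(timeout, function)

    def start(self):
        self.timer.start()

    def stop(self):
        self.timer.cancel()

class FakeBuild:
    def __init__(self):
        self.proc = None      # stands in for the subprocess.Popen handle
        self.aborted = False

    def abortProc(self):
        self.aborted = True   # real code would kill self.proc here

build = FakeBuild()
dog = Watchdog(0.05, build.abortProc)  # short timeout for the demo
dog.start()
time.sleep(0.2)                        # the "build" outlives the timeout
dog.stop()
```

Because only the bound method crosses into the watchdog, the process handle never needs to be passed around, which matches the refactor discussed in review 426306.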
mnaser | status.json for zuulv3 is almost 4.4M at this point heh | 18:16 |
jeblair | what i'd really like to do with that is send updates over websocket | 18:16 |
jeblair | of course, zuul itself barely knows when something changes at this point, so that's a ways down the road | 18:17 |
pabelanger | jeblair: doh, I see it now. Thank you | 18:18 |
*** electrofelix has quit IRC | 18:20 | |
pabelanger | jeblair: okay, settings work locally: http://paste.openstack.org/show/622785/ | 18:24 |
pabelanger | proposing patch for 1000 bytes/sec for 30 sec | 18:24 |
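For reference, git's http.lowSpeedLimit / http.lowSpeedTime settings abort an HTTP(S) transfer whose throughput stays below the limit (in bytes/sec) for the given number of consecutive seconds, turning a hung fetch into a clean failure. A small sketch rendering the ~/.gitconfig stanza for the values proposed above:

```python
def gitconfig_stall_stanza(limit_bps=1000, stall_seconds=30):
    """Render a ~/.gitconfig stanza that makes git abort HTTP(S)
    transfers slower than limit_bps bytes/sec for stall_seconds
    consecutive seconds (defaults are the values proposed above)."""
    return ("[http]\n"
            "\tlowSpeedLimit = %d\n"
            "\tlowSpeedTime = %d\n" % (limit_bps, stall_seconds))

print(gitconfig_stall_stanza())
```

The same values can of course be set with `git config --global http.lowSpeedLimit 1000` and `git config --global http.lowSpeedTime 30`.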
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Handle double node locking snafu https://review.openstack.org/509603 | 18:34 |
Shrews | jeblair: thx. see test_nodepool.py | 18:35 |
dmsimard | mordred, jeblair: https://www.anandtech.com/show/11902/thinkpad-anniversary-edition-25-limited-edition-thinkpad-goes-retro :) | 18:37 |
mordred | dmsimard: yes. it gives me great excitement | 18:39 |
dmsimard | I was almost excited until I saw the Geforce in it ? My W541 has an nvidia card and it has brought me nothing but trouble :( | 18:40 |
dmsimard | Starting at 1899$, ouch | 18:40 |
jeblair | Shrews: hey cool, you found a test for another issue on the etherpad! | 18:45 |
jeblair | that's line 148 -- kazoo callback error | 18:46 |
Shrews | jeblair: quite by accident | 18:46 |
Shrews | jeblair: oh, i just notice i missed setting fail_first_request in setup (used to be there, but then made the base class and accidentally removed it) | 18:49 |
jeblair | Shrews: ok, i think i understand the exception -- in the test, we're doing a zk operation in the callback we can't do -- we can't shut down zk from the zk callback | 18:51 |
jeblair | that means the error is something different from production.....oh.... i bet in production we somehow hit that inside the resubmit (which happens in the callback). | 18:51 |
jeblair | Shrews: oh, no, strike that. | 18:52 |
jeblair | Shrews: the production error is actually something we can ignore, i think. | 18:52 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add git timeout for HTTP(S) operations https://review.openstack.org/509876 | 18:53 |
jeblair | Shrews: the key to this is that this is happening inside of an exception handler, and the "TypeError: callback() takes 2 positional arguments but 3 were given" error is a red herring | 18:53 |
pabelanger | jeblair: how does that look^ | 18:53 |
jeblair | Shrews: that's a harmless exception, it's the one after that matters | 18:53 |
Shrews | jeblair: the callback is _updateNodeRequest(), yeah? i was having a devil of a time trying to get that to trigger | 18:53 |
jeblair | Shrews: yep | 18:53 |
jeblair | Shrews: how about this? in onNodesProvisioned, set an Event, and .wait() for it inside the main test method. and then kill zk in the main test method | 18:55 |
jeblair | Shrews: then it'll happen outside the callback thread; should work | 18:55 |
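The recipe jeblair outlines can be sketched like this (names illustrative, not the real zuul test code): the ZK callback thread only records what happened and sets a threading.Event; the main test thread wait()s on the event and then performs the disruptive work (killing the ZK connection) outside the callback thread, where it is safe.

```python
import threading

provisioned = threading.Event()
seen = []

def onNodesProvisioned(request):
    seen.append(request)   # record only -- no ZK operations in the callback!
    provisioned.set()

def fake_zk_callback_thread():
    # stands in for kazoo delivering the watch/callback on its own thread
    onNodesProvisioned("request-A")

t = threading.Thread(target=fake_zk_callback_thread)
t.start()
assert provisioned.wait(timeout=5)  # main test thread blocks here
t.join()
# ...the main thread may now kill the ZK connection and assert that the
# request is resubmitted, without racing the callback thread...
```

This keeps the shutdown out of the callback, which is exactly the constraint that made the original in-callback version fail.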
Shrews | jeblair: should I delete the request from onNodesProvisioned before setting the event? i'm not seeing how to invalidate the first request | 19:01 |
jeblair | Shrews: it should be deleted automatically after the zk client disconnects (it's ephemeral) | 19:03 |
Shrews | jeblair: yeah, but doing it in the main test method seems too late. we need it to trigger again by the time it gets back to the main thread. | 19:04 |
Shrews | i hate to admit that i'm totally lost by the sequencing here :( | 19:04 |
Shrews | i tried the Event thing and i'm not seeing it retrigger | 19:05 |
jeblair | Shrews: nevermind the test -- what's the sequence you need to have happen? | 19:07 |
jeblair | Shrews: maybe you can write that up on an etherpad, and i can take a look at it when i get back from lunch | 19:08 |
Shrews | 1) request A fulfilled 2) req A enters event queue 3) before the queue is processed, req A disappears, causing req A to resubmit 4) waitForRequests() returns | 19:09 |
Shrews | 0) submit request A | 19:10 |
Shrews | if anyone else more familiar with zuul testing wants to take a stab, by all means, please :) | 19:11 |
tobiash | Shrews: sounds like monkey patching might be able to help disappearing req in step 3 | 19:17 |
tobiash | I don't have the code at hand currently, but I could imagine that a disappearing req could be injected into the event processing like that | 19:20 |
Shrews | i think the error with my thinking is believing that there is an event queue in test-land. which now leaves me more confuzzled about how to test this | 19:32 |
* Shrews walks | 19:33 | |
*** hasharAway is now known as hashar | 19:50 | |
*** ianw|pto is now known as ianw | 20:00 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Handle double node locking snafu https://review.openstack.org/509603 | 20:01 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Handle double node locking snafu https://review.openstack.org/509603 | 20:02 |
Shrews | gah | 20:02 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Handle double node locking snafu https://review.openstack.org/509603 | 20:03 |
jeblair | Shrews: here's what i've got: https://etherpad.openstack.org/p/4cZGJDn02i | 20:03 |
Shrews | jeblair: i think i got it now | 20:03 |
*** jkilpatr has quit IRC | 20:03 | |
jeblair | Shrews: and yeah, i think we need to make our own "event loop" in the test, since the test is standing in for the scheduler | 20:03 |
Shrews | jeblair: took me a while to realize how to actually get to my codepath to test | 20:03 |
Shrews | i don't think we need to introduce the event loop | 20:04 |
jeblair | Shrews: well, we want something to call acceptNodes with outdated info, right? | 20:04 |
jeblair | Shrews: (ie, we should end up calling acceptNodes twice in that test) | 20:05 |
jeblair | hope that makes sense; the main test thread is on the left; other threads are on the right | 20:06 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Create git_http_low_speed_limit / git_http_low_speed_time https://review.openstack.org/509893 | 20:06 |
pabelanger | okay, git timeout patches uploaded | 20:07 |
Shrews | jeblair: you don't feel that simulating those conditions as i did in that new PS is sufficient? if not, then yeah, we'll have to add an event loop | 20:08 |
jeblair | Shrews: i hadn't looked. apparently i was working on the etherpad while you were updating the change. | 20:08 |
jeblair | Shrews: i think those are good tests as long as we've got the sequencing right. i think the only thing they don't do is actually exercise the zk disconnect callback. however, test_node_request_disconnect covers that separately, so we may be okay. | 20:11 |
jeblair | Shrews: if you're happy, i'm happy :) | 20:11 |
Shrews | jeblair: not even test_node_request is testing the event queue path. i really just needed a way to exercise acceptNodes(), which i think those do | 20:12 |
Shrews | and my head hurts. :) being dumb/hardheaded takes a lot of energy | 20:13 |
Shrews | must be all the beer from my younger years | 20:14 |
jeblair | Shrews: yeah -- the first thing the scheduler event processor does is acceptNodes; so all of these are doing a first-order approximation of that and assuming nothing interesting happens after | 20:14 |
pabelanger | etherpad also updated | 20:14 |
jeblair | i think that's okay for the scope of these tests | 20:14 |
jeblair | pabelanger: cool, thx. +3 on the first and small -1s on the second | 20:21 |
*** jkilpatr has joined #zuul | 20:22 | |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Create git_http_low_speed_limit / git_http_low_speed_time https://review.openstack.org/509893 | 20:33 |
jeblair | i have found a second memory leak triggered by these git timeouts. i have a test case and fix in progress. | 20:34 |
jeblair | (and i have concluded that we should fix this regardless of the git timeout issue) | 20:35 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Add git timeout for HTTP(S) operations https://review.openstack.org/509876 | 20:35 |
jeblair | pabelanger: do you want to update and restart executors with that ^ ? | 20:37 |
pabelanger | jeblair: great work! | 20:37 |
pabelanger | jeblair: sure, give me a moment to fetch a drink | 20:38 |
*** dkranz has quit IRC | 20:43 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Fix path exclusions https://review.openstack.org/509901 | 20:47 |
mordred | jeblair: ok - after thinking WAY too hard about that ^^ I think that should do us | 20:48 |
jeblair | mordred: heh, the last thing i remember from you on the subject was "let me write that real quick!" :) | 20:50 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove references to pipelines, queues, and layouts on dequeue https://review.openstack.org/509903 | 20:54 |
jeblair | there's memory leak #2 | 20:54 |
jeblair | i also realized there's another bug that helped uncover that; it's minor, but i'll try to fix it now too. causes us to run too many merge operations, and we could always do with fewer of those. | 20:55 |
mordred | jeblair: yah - 'real quick' got distracted and pushed down on the stack until I remembered that I was going to write it real quick | 20:56 |
jeblair | mordred: zomg the commit-message-longer-than-bugfix club needs a secret handshake. | 20:57 |
jeblair | i was settling in for a long review and now find myself unprepared | 20:58 |
mordred | jeblair: hehehe | 20:58 |
mordred | jeblair: your most recent patch isn't as easy to read as the previous memory leak - you had to touch WAY more than one line there | 20:59 |
jeblair | mordred: indeed; i also haven't run the full test suite against it; could have some lurking bugs still. | 21:00 |
mordred | jeblair: it'll run the full test suite against itself | 21:01 |
clarkb | mordred: does path fix work if user changes $HOME | 21:01 |
clarkb | is that even possible in bwrap? | 21:01 |
mordred | clarkb: unpossible | 21:01 |
clarkb | because passwd is ro? | 21:01 |
mordred | clarkb: the user doesn't have the execution context to change the environment in which ansible-playbook is executed | 21:02 |
pabelanger | okay, starting to restart executors. puppet has been run | 21:02 |
mordred | clarkb: they can set environment in tasks - but those are all executed by ansible-playbook, so are subshells of the shell where the env is checked | 21:03 |
clarkb | mordred: even via something like /proc? | 21:03 |
mordred | well - they can't write to /proc unless the path filter is already busted | 21:03 |
jeblair | (worth noting, this can be improved when ansible 2.4.1 is released and we can get this value from an ansible.cfg file) | 21:04 |
mordred | but it's all sequencing - zuul-executor executes ansible-playbook and passes an explicit environment to that subprocess - the action plugin that checks the path against HOME is in that process - and the user shouldn't have access to change the environment that exists there | 21:04 |
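The check mordred describes can be sketched roughly as below. This is a hedged illustration, not zuul's actual action-plugin code: resolve the candidate path and allow it only if it stays under the HOME that zuul-executor set in ansible-playbook's environment, which the untrusted playbook has no way to change.

```python
import os

def is_path_under_home(path, home):
    """Return True only if `path` resolves to somewhere inside `home`.
    Both sides are realpath()'d so `..` segments and symlinks cannot
    escape the sandboxed work directory."""
    resolved = os.path.realpath(os.path.abspath(os.path.expanduser(path)))
    home = os.path.realpath(home)
    return resolved == home or resolved.startswith(home + os.sep)
```

The startswith check uses `home + os.sep` so that a sibling directory like `/home/zuulx` is not mistaken for a child of `/home/zuul`.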
*** hashar has quit IRC | 21:04 | |
mordred | also what jeblair said | 21:04 |
clarkb | that was going to be my next question, can we supply it directly as config rather than potentially user-changeable env stuff | 21:05 |
mordred | yah - most definitey once 2.4.1 is out | 21:05 |
clarkb | sounds like later, ok | 21:05 |
mordred | but also - if the user is able to change HOME - that should be considered a SERIOUS issue | 21:05 |
mordred | as that would mean that the user was able to execute arbitrary code in a context that they should not be able to execute arbitrary code | 21:06 |
mordred | which is not to say it's unpossible - obviously- but if we find an instance of that we should drop everything and think about nothing but that | 21:06 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Fix doc typo that inverts logic https://review.openstack.org/509905 | 21:17 |
mordred | jeblair: doublecheck me on that ^^ but I just noticed that looking at the docs for something else | 21:18 |
pabelanger | so far, ze05 and ze06 were in a hung state on git clone. They've since been restarted with the fixes; moving to ze07 | 21:21 |
jeblair | mordred: good catch, but i think the fix is different; commented | 21:25 |
pabelanger | jeblair: all executors upgraded and restarted | 21:26 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Handle double node locking snafu https://review.openstack.org/509603 | 21:27 |
Shrews | noticed i missed a var for a %s ^^^ | 21:27 |
jeblair | dmsimard: ^ re executors | 21:28 |
mordred | jeblair: ah - good - I'll update that - and that tells me I want to set some of our publish jobs to be post-review: true as well | 21:28 |
dmsimard | Yay, thanks | 21:28 |
jeblair | mordred: do we have publish jobs defined outside of project-config? | 21:29 |
jeblair | mordred: or i can just wait for the change and review it :) | 21:29 |
mordred | jeblair: oh - no, we don't. nevermind. all good | 21:30 |
jeblair | kk | 21:30 |
pabelanger | jeblair: I'm going to switch to ze03.o.o stopped issue again | 21:35 |
pabelanger | unless something else you'd like me to do | 21:35 |
jeblair | pabelanger: thanks | 21:41 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix early processing of merge-pending items on reconfig https://review.openstack.org/509912 | 21:59 |
mordred | jeblair: wow. that's a fun one | 22:00 |
jeblair | mordred: yeah, that happened, and exposed the memory leak, and i was writing a test around it, and realised, "hey, maybe i shouldn't write a test that replicates the behavior of a bug; maybe i should fix the bug." | 22:07 |
mnaser | jeblair I'm looking at *a lot* of inefficient behaviour at the moment in the zuul status page | 22:15 |
mnaser | would it be okay to introduce 1 additional dependency (mustache) | 22:15 |
*** hogepodge has joined #zuul | 22:15 | |
mnaser | it's a very simple/small javascript templating language that will help the translation of status state <=> html | 22:16 |
jeblair | sounds pretty hipster. :) | 22:19 |
mordred | mnaser: I should probably talk to you about the 'rework how we deal with javascript and html' patches I'm going to get back to once the v3 rollout is done ... | 22:20 |
mnaser | jeblair or maybe angular? that would simplify the code base soooo much | 22:20 |
mnaser | mordred jeblair i could probably get the status page redone in angular.. tonight. maintaining the same look :> | 22:20 |
mordred | mnaser: I believe tristanC's dashboard work introduces angular - so maybe we should just put a pin in improvements here until the rollout is done and we can give some attention to how it's all being stitched together | 22:21 |
mordred | mnaser: we've been holding off on that work until post-rollout so that we don't get too distracted ... one sec though, I'll link you to a couple of patches | 22:21 |
mnaser | ok ill have a look | 22:21 |
mnaser | not like i can help much in the internals of zuul and finding memory leaks :-P | 22:22 |
tristanC | mordred: mnaser: indeed, the zuul-web patches for tenants, jobs and builds are using angular: https://review.openstack.org/#/q/topic:zuul-web | 22:22 |
jeblair | ya -- my only request for angular is that it be understandable by folks who have used web systems other than angular -- i'm able to follow the patterns that tristanC has used fairly easily | 22:23 |
fungi | for some reason i always confuse angularjs with reactjs (the facebook one with the patent license controversy) | 22:23 |
mnaser | jeblair i agree 100% -- i don't want to leave zuul with some complicated codebase no one knows how to fix if i'm not available | 22:24 |
mordred | mnaser: https://review.openstack.org/#/c/487538/ and https://review.openstack.org/#/c/503268/ are the two relevant pieces | 22:24 |
jeblair | mnaser: exactly! :) | 22:24 |
mnaser | mordred ++ to using webpack to manage dependencies | 22:24 |
mordred | mnaser: the first is some initial exploration I did around incorporating javascript toolchains - the second is the first patch from tristanC that adds angular and uses it within the current setup | 22:24 |
jeblair | so maybe building on mordred and tristanC's work is the best bet. i think the only caveat is that we won't really review+merge larger changes like that until after the dust settles (but probably *soon* after the dust settles) | 22:25 |
mordred | yah - I want the dashboard :) | 22:25 |
mnaser | the nice thing is the status page is really well/easily tested | 22:25 |
jeblair | so just know there will be a bit of a delay if we go that route. but in the long run, it's the best i think. | 22:26 |
mnaser | thanks to whoever wrote the ?demo= stuff | 22:26 |
mordred | that's one of the reasons I used status page as a trial balloon for the toolchain stuff :) | 22:26 |
mnaser | i'll work on angular-ifying the status page and then we can "integrate" it with the other angular-based pages later, it shouldn't be too much work (if i do it correctly) | 22:26 |
mnaser | because personally the status page is currently unusable for me, always crashes my browser :( | 22:27 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Fix doc typo that missed important words https://review.openstack.org/509905 | 22:27 |
mnaser | the v3 one that is, with the big status.json file | 22:27 |
jeblair | me too | 22:27 |
mordred | mnaser: I'm rebasing that patch of mine real quick - silly merge conflicts | 22:29 |
fungi | mnaser: is the v2 one at http://zuul.openstack.org/ (not the custom one we have on status.o.o) actually any better in that regard? | 22:31 |
fungi | it also is nigh unusable for me at relatively high queue sizes | 22:31 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Use yarn and webpack to manage status javascript https://review.openstack.org/487538 | 22:39 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Migrate console streaming to webpack/yarn https://review.openstack.org/487539 | 22:39 |
mordred | mnaser: in https://review.openstack.org/487538 if you do "yarn install ; npm run start:livev3" it'll spin it up in a local dev server pointed at zuulv3.openstack.org | 22:39 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: alien-list: use provider name https://review.openstack.org/508788 | 22:42 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Migrate console streaming to webpack/yarn https://review.openstack.org/487539 | 22:52 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: WIP Use yarn and webpack to manage status javascript https://review.openstack.org/487538 | 22:54 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: WIP Migrate console streaming to webpack/yarn https://review.openstack.org/487539 | 22:54 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove references to pipelines, queues, and layouts on dequeue https://review.openstack.org/509903 | 23:30 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix early processing of merge-pending items on reconfig https://review.openstack.org/509912 | 23:30 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!