corvus | SpamapS: remote: https://review.openstack.org/540184 Send a NOOP after registering a function | 00:12 |
---|---|---|
SpamapS | corvus: reading gearmand.. looks like it does not wake up workers on registration. | 00:19 |
SpamapS | Only SUBMIT's | 00:19 |
SpamapS | or PRE_SLEEP's | 00:19 |
SpamapS | (as in, it will respond to a PRE_SLEEP with a NOOP if there are jobs in the worker's functions) | 00:19 |
corvus | SpamapS: oh interesting, so geard currently has the the same behavior | 00:20 |
SpamapS | so geard appears to be doing the right thing. | 00:20 |
corvus | or the consistent thing, if not the right thing :) | 00:20 |
SpamapS | I think it makes sense too | 00:20 |
SpamapS | CAN_DO doesn't mean "and I have nothing to do", so one can reasonably expect either GRAB_JOB* or PRE_SLEEP after a CAN_DO. | 00:20 |
corvus | SpamapS: which, current behavior, or noop-after-cando? | 00:20 |
* SpamapS has reversed himself :-P | 00:20 | |
SpamapS | I think current behavior makes sense, now that I've thought about it. | 00:21 |
SpamapS | The protocol seems more or less silent on this, as usual. I don't see that it would hurt to NOOP after CAN_DO, but ... I think it's ok to wait for the PRE_SLEEP/GRAB_JOB. | 00:22 |
corvus | SpamapS: i still think noop-after-cando makes sense, for the same reason. they are independent. :) but that's probably because i see the state machine as persisting across a cando packet. ie, if you're 'sleeping' before cando, then you're still sleeping after cando. if you see a cando packet as indicating the connection is 'awake', then i agree, it makes sense for the connection to go back to sleep. | 00:24 |
corvus | or put another way, the question is -- once the client goes to sleep, can only the server wake it up, or can it wake itself up? | 00:26 |
SpamapS | I believe it can wake itself up. | 00:27 |
corvus | so 2 pre_sleeps in a row are okay? | 00:27 |
SpamapS | But they don't tend to. :) | 00:27 |
corvus | SpamapS: feel free to -1 that change, and i can make another so that when a client does registerFunction, it also does a pre_sleep to poke the server to poke it back | 00:28 |
SpamapS | the gearmand code doesn't mind if you sleep twice. it will just do the same thing. | 00:28 |
corvus | (i think that's the other thing that makes me favor noop-on-cando -- it's a bit more efficient, since the server has all the info) | 00:29 |
SpamapS | I mean, it's also worth noting, a sleeping worker is perfectly fine to just GRAB_JOB | 00:30 |
SpamapS | the point of the NOOP is to say "that might be successful now" | 00:30 |
SpamapS | Sending NOOP doesn't actually change the state of the worker to "not sleeping". | 00:31 |
SpamapS | in fact, GRAB_JOB is the only thing that changes to that state. | 00:31 |
corvus | that's a good point | 00:38 |
openstackgerrit | Merged openstack-infra/zuul master: Support the fragment form of Gerrit URLs https://review.openstack.org/539705 | 00:49 |
SpamapS | Gearman is like the game of go. Simple rules. Lifetime to master. | 00:51 |
SpamapS | so far so good with my 44-repo job.... | 00:58 |
SpamapS | corvus: ^^ however, creating zuul refs definitely makes the executor very busy when those jobs start. | 01:00 |
*** rlandy|biab is now known as rlandy | 01:05 | |
*** rlandy has quit IRC | 01:31 | |
*** yolanda_ has joined #zuul | 02:06 | |
*** yolanda has quit IRC | 02:07 | |
*** lennyb has quit IRC | 03:22 | |
openstackgerrit | ChangBo Guo(gcb) proposed openstack-infra/nodepool master: update ggupported python version in setup.cfg https://review.openstack.org/540231 | 03:26 |
*** harlowja_ has quit IRC | 04:09 | |
openstackgerrit | ChangBo Guo(gcb) proposed openstack-infra/nodepool master: update supported python version in setup.cfg https://review.openstack.org/540231 | 04:30 |
openstackgerrit | James E. Blair proposed openstack-infra/zone-zuul-ci.org master: Zuul: Remove project name https://review.openstack.org/540249 | 04:44 |
*** bhavik1 has joined #zuul | 04:51 | |
*** bhavik1 has quit IRC | 04:53 | |
*** harlowja has joined #zuul | 04:56 | |
openstackgerrit | Merged openstack-infra/zuul master: Don't use run_lock in executor's merger https://review.openstack.org/540112 | 05:36 |
*** xinliang has quit IRC | 05:52 | |
*** xinliang has joined #zuul | 06:04 | |
*** threestrands has quit IRC | 07:05 | |
*** yolanda_ has quit IRC | 07:23 | |
*** ngocng has joined #zuul | 08:36 | |
*** ngocng has quit IRC | 08:37 | |
*** jpena|off is now known as jpena | 08:50 | |
openstackgerrit | ChangBo Guo(gcb) proposed openstack-infra/nodepool master: Refactor method node_list in status.py https://review.openstack.org/540325 | 09:45 |
*** xinliang has quit IRC | 10:56 | |
*** AJaeger has quit IRC | 11:02 | |
*** AJaeger has joined #zuul | 11:03 | |
openstackgerrit | Merged openstack-infra/nodepool master: update supported python version in setup.cfg https://review.openstack.org/540231 | 11:08 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Make subunit processing more resilient https://review.openstack.org/534431 | 11:27 |
*** xinliang has joined #zuul | 11:30 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Change the list of extensions to a dict https://review.openstack.org/539683 | 11:50 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool master: zk: use kazoo retry facilities https://review.openstack.org/535537 | 12:18 |
openstackgerrit | Merged openstack-infra/zone-zuul-ci.org master: Zuul: Remove project name https://review.openstack.org/540249 | 12:20 |
*** jpena is now known as jpena|lunch | 12:40 | |
*** JasonCL has quit IRC | 13:23 | |
*** JasonCL has joined #zuul | 13:24 | |
*** JasonCL has quit IRC | 13:28 | |
*** rlandy has joined #zuul | 13:31 | |
Shrews | FWIW, I believe I have a fix for the provider wedge problem implemented. However, deleting an unused node at pause time affects soooo many tests, so trying to get that cleaned up. | 13:43 |
Shrews | it's not fun | 13:43 |
*** jpena|lunch is now known as jpena | 13:44 | |
*** jpena is now known as jpena|off | 13:50 | |
*** JasonCL has joined #zuul | 14:02 | |
*** jpena|off is now known as jpena | 14:05 | |
openstackgerrit | Andrea Frittoli proposed openstack-infra/zuul-jobs master: Add no_log to verbouse tasks https://review.openstack.org/540401 | 14:36 |
*** JasonCL has quit IRC | 14:45 | |
hughsaunders | Hey all, odyssey4me mentioned a few days ago that we are interested in integrating Jenkins with nodepool. I'm investigating this and my initial question is what are the operations that I need to submit to nodepool via zk in order to allocate, use and release nodes? | 14:47 |
odyssey4me | Shrews mordred electrofelix ^ :) | 14:49 |
hughsaunders | Looking at zuul's nodepool module there are functions for request, lock/unlock, return, use and accept. From that I'd guess the flow is request, accept, lock, use, unlock, return? | 14:49 |
*** JasonCL has joined #zuul | 14:51 | |
Shrews | wow, this must be "let's all code to the zuul-nodepool api week" | 14:56 |
*** JasonCL has quit IRC | 14:56 | |
Shrews | hughsaunders: it's a bit complicated, and not *really* documented yet because it's still not cemented fully, but a lot of it is described in http://specs.openstack.org/openstack-infra/infra-specs/specs/zuulv3.html | 14:57 |
Shrews | but from the zuul side, those seem to be the basic actions. corvus can provide more detail | 14:59 |
odyssey4me | Shrews The actions are easy to see, but the flow of a request is what hughsaunders is trying to ascertain | 15:00 |
odyssey4me | the lifecycle of the node, so to speak | 15:00 |
hughsaunders | Shrews: didn't you get the invite? | 15:04 |
Shrews | for... ? i guess not | 15:05 |
electrofelix | hughsaunders: https://etherpad.openstack.org/p/zuulv3-jenkins-integration might be a good place to start, think mordred wrote it? | 15:06 |
hughsaunders | for zuul-nodepool api week ;-) | 15:06 |
hughsaunders | electrofelix: thanks, will have a read | 15:07 |
electrofelix | hughsaunders: are you going to be over for the PTG? | 15:07 |
hughsaunders | electrofelix: sadly not | 15:07 |
electrofelix | what timezone do you work within, at the very least if I make it there and someone else planning on looking at this area, planning for some discussion around times that would suit would probably help | 15:08 |
hughsaunders | UTC | 15:08 |
electrofelix | that keeps it simple so ;-) | 15:08 |
hughsaunders | :) | 15:08 |
hughsaunders | I was thinking of a similar approach to the existing plugins idea in the etherpad. | 15:10 |
hughsaunders | It could be refined a little with https://github.com/jenkinsci/groovy-events-listener-plugin | 15:10 |
electrofelix | hughsaunders: ah, didn't know there was a proper plugin for events for that, that would certainly allow for a POC using groovy to be put together | 15:11 |
hughsaunders | yeah | 15:11 |
hughsaunders | just need to know the operations that np+zk require | 15:11 |
electrofelix | I think for optimal integration need to handle static slaves, dynamic ones from other plugins and nodepool provided slaves | 15:14 |
electrofelix | having a separate nodepool provider should make it a little easier to generically handle any dynamic slave (assuming it's possible to easily check if there is one available from any of them) | 15:15 |
hughsaunders | I would have thought nodepool only need to know about nodepool provided slaves.. | 15:16 |
electrofelix | I'm thinking the zuul trigger plugin part should regard the nodepool provider as just another slave provider, so need to keep some of the interactions separated | 15:17 |
hughsaunders | ah ok, I didn't have triggering jobs on my radar | 15:17 |
electrofelix | just thinking out loud, I think the interactions describe above are separate | 15:17 |
hughsaunders | I nodepool to maintain a pool of nodes which become jenkins slaves, then jenkins allocates jobs to them and when the job finishes the node is cleaned up. The build that uses the node is triggered by something else - could be cron or pull request webhook etc | 15:18 |
* hughsaunders should read and edit before submitting | 15:19 | |
hughsaunders | slack has made me even lazier than before | 15:19 |
electrofelix | ah, ok, so you're just coming from the perspective of using nodepool from Jenkins, not the integration with zuul part? | 15:19 |
hughsaunders | yeah | 15:20 |
hughsaunders | we're heavily invested in Jenkins at the moment, and nodepool could make node management easier. | 15:20 |
hughsaunders | Migration to zuul might happen but if it does, its a long way off. | 15:20 |
electrofelix | that could work well, because I think we want to solve the integration with zuul and as we only have static slaves at the moment | 15:20 |
electrofelix | but would like to use nodepool in the future as well | 15:21 |
hughsaunders | we're using dynamic slaves, but a mess of groovy and ansible allocates a slave within each job, which is slow and error prone. | 15:21 |
electrofelix | if someone else is worrying about the nodepool piece, means I've one less piece to worry about and just need to make sure that we can easily use the nodepool plugin :D | 15:22 |
hughsaunders | :) | 15:23 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Add no_log to verbouse tasks https://review.openstack.org/540401 | 15:23 |
hughsaunders | electrofelix: I was thinking of having a python daemon between zk and jenkins, so that the groovy parts can be very thin. Would that architecture work for you? | 15:29 |
hughsaunders | The jenkins groovy events plugin hits the python daemon's api, that then does all the needfuls with zk. The python daemon also listens to zk events and calls the Jenkins api to register nodes. | 15:30 |
openstackgerrit | Sam Betts proposed openstack-infra/zuul master: Add ZUUL_OVERRIDE_BRANCH support for legacy jobs https://review.openstack.org/540418 | 15:30 |
hughsaunders | Or maybe nodes self register via a DIB element, but that might expose keys. | 15:31 |
*** JasonCL has joined #zuul | 15:38 | |
*** JasonCL has quit IRC | 15:39 | |
*** JasonCL has joined #zuul | 15:43 | |
electrofelix | hughsaunders: as a POC I think that would be fine to prove that it works, but I think that for it to become a solution people would be willing to install and use in general that it probably should all be in a plugin in Jenkins that talks directly to zk | 15:52 |
hughsaunders | Yeah, it would be much easier for a user to just install a plugin | 15:52 |
electrofelix | For the most part it's really just that you want to keep the calls to zk simple? Could write a series of calls in groovy and place them into a shared library in Jenkins that the events groovy code just utilizes? https://jenkins.io/doc/book/pipeline/shared-libraries/#writing-libraries | 15:58 |
hughsaunders | I'm not sure how the scoping works with that, shared libraries are definitely available to pipeline-dsl jobs, but I'm not sure if they are available in the context of the groovy events plugin, will experiment. | 16:05 |
*** dmellado has quit IRC | 16:11 | |
*** dmellado has joined #zuul | 16:14 | |
tobiash | electrofelix: keep in mind that you need a persistent connention to zk otherwise you loose locks | 16:26 |
electrofelix | hughsaunders: groovy scripts run within a system context will have access to any libraries that jenkins can load, so I'd be reasonably certain that it should be possible to grab just about anything else within Jenkins to be able to load stuff | 16:33 |
electrofelix | I'm pretty sure the events plugin has to run the groovy within a system context to have access to the events in the first place | 16:33 |
hughsaunders | I'm guessing something that needs a background thread - to maintain the zk connection will have to be its own java plugin, rather than riding on the existing groovy events plugin. | 16:34 |
electrofelix | that does complicate it | 16:37 |
electrofelix | anyway, I'd go with whatever you're comfortable as a starting POC to confirm things, with a view that it will probably make sense to have it as a plugin in jenkins that talks direct to zookeeper rather than through an intermediary service | 16:39 |
electrofelix | hughsaunders: https://cdn.rawgit.com/jenkinsci/groovy-events-listener-plugin/master/src/main/resources/org/jenkinsci/plugins/globalEventsPlugin/GlobalEventsPlugin/help-onEventGroovyCode.html mentions a PLUGIN_STARTED event that could be used to launch a background thread | 16:45 |
electrofelix | hughsaunders: and https://github.com/jenkinsci/groovy-events-listener-plugin/blob/master/src/main/site/examples/2_IncludeExternalGroovyScript.groovy covers loading common groovy code from some location | 16:46 |
* SpamapS reading some fastcinating, relatable backscroll | 16:48 | |
SpamapS | hughsaunders: I'm in a similar boat. Company full of Jenkins jobs and just one team on Zuul.. the migration may never happen honestly, but we do need to do some cross-repo stuff like yesterday ;) | 16:48 |
hughsaunders | electrofelix: interesting, thanks and SpamapS I feel your pain | 16:49 |
SpamapS | And we have a similar "hmm, maybe nodepool for jenkins..." | 16:50 |
SpamapS | because we use the jenkins openstack plugin | 16:50 |
SpamapS | and... there are... issues. | 16:50 |
* SpamapS does like that this is coming full circle | 16:50 | |
corvus | i guess my question is -- why the focus on enabling jenkins, rather than converting to ansible? | 16:50 |
electrofelix | SpamapS: I knew I recalled something used to do this before ... ;-) | 16:50 |
SpamapS | corvus: we have thousands of lines of Groovy defining jobs | 16:51 |
SpamapS | it's just.. | 16:51 |
electrofelix | corvus: it's pretty difficult to convince management that it's worthwhile switching | 16:51 |
SpamapS | What I'd like to do instead is enable zuul to send things through jenkins. | 16:51 |
electrofelix | at least difficult to convince them we should rewrite everything to switch in one go | 16:51 |
SpamapS | One idea I've had is just to have a job with pre playbooks that sets up a node as a jenkins master, and feeds in the configuration. | 16:51 |
hughsaunders | I just want to rip out the node management code from my groovy libs and let nodepool do it.. | 16:52 |
corvus | yeah, rewriting everything is hard. but there must be work that can be done to automate conversion. we did some of it for openstack. | 16:52 |
electrofelix | If it's possible to get easy integration between Jenkins <--> zuulv3 <--> nodepool, then I think the advantages of using .zuul.yaml in projects will convince people to switch organically | 16:52 |
corvus | if we're talking about writing new code, why not do that? | 16:52 |
SpamapS | And then the job runs in that jenkins master, and produces whatever result, and then it's torn down. | 16:52 |
SpamapS | corvus: you know as well as anyone how awful jenkins config and job definition are intertwined if people aren't religious about using jjb or something like it. | 16:52 |
SpamapS | Also I believe some of these users, shockingly, actually like jenkins pipeline. | 16:53 |
SpamapS | Groovy isn't the worst language ever. The pipeline UI is pretty good, showing jobs progressing through steps and allowing retrying of steps that fail. It's not how *I* work, but I get why some aren't excited about watching ansible text fly by. | 16:54 |
hughsaunders | groovy isn't the worst lang, but jenkins cps groovy is close. | 16:56 |
hughsaunders | If it was actual groovy it would be okish | 16:56 |
SpamapS | IIRC, people are also using Kotlin in the same context now. | 16:56 |
SpamapS | And then there's the pipeline DSL | 16:56 |
electrofelix | corvus: I think where we are, more likely to switch completely to zuulv3, if I can get us to the point that we're on zuulv3 without needing to migrate all the jobs | 16:56 |
SpamapS | We've been doing an experiment where we just use jenkins-cli.jar to kick a job from a tiny node. | 16:57 |
SpamapS | It ran into Jenkins plumbing issues but that is maybe the simplest way to take advantage of jenkins jobs and then write zuul jobs based on them. | 16:57 |
SpamapS | Like, we have a job that builds RPMs and puts them in a yum repo. So we were thinking, kick that off from a zuul job's, and then you can run your zuul job can use the rpms. | 16:58 |
SpamapS | hey | 16:58 |
SpamapS | slack has ruined me too | 16:58 |
clarkb | as someone that only slacks via irc it looks like irc to me :) | 17:01 |
SpamapS | You're missing out, and I'm not being cheeky. Edit button++. | 17:02 |
mordred | SpamapS: s/edit button++// ... WFM | 17:02 |
SpamapS | I need mah gifs. | 17:02 |
clarkb | also when they have outages irc tends to stay up | 17:03 |
corvus | okay, this channel has gone really far off topic | 17:03 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Provider wedge fix https://review.openstack.org/539316 | 17:11 |
Shrews | clarkb: tobiash: I had to change some tests in that ^^ that I believe you two wrote. If you could carefully look at those changes to make sure they're still valid, I'd appreciate it. | 17:12 |
Shrews | i mean, the tests pass locally, but i think what they're testing is still right | 17:13 |
tobiash | Shrews: ok, looking in a minute | 17:13 |
Shrews | oh, i see one thing i need to change | 17:14 |
tobiash | Ok, I'll wait ;) | 17:15 |
Shrews | and of course, that change breaks a test... *sigh* | 17:17 |
Shrews | tobiash: ok, nm. it's good as is | 17:27 |
*** dmellado has quit IRC | 17:34 | |
Shrews | can review https://review.openstack.org/537932 get some love today? | 17:36 |
*** harlowja has quit IRC | 17:37 | |
*** jpena is now known as jpena|off | 17:38 | |
corvus | Shrews: is pass okay there, or should we log.exception it? | 17:41 |
Shrews | corvus: i could go either way on it. I don't think we care it's missing at that point since we intend to decline it | 17:42 |
Shrews | corvus: i can add that though. more info never hurts | 17:43 |
corvus | Shrews: yeah, i'm mostly thinking if that comes in handy debugging some weird situation.... is there a more narrow exception we can check for there -- a 'node not presest' exception? | 17:44 |
corvus | Shrews: cause maybe that could be just a log.debug line, and then the broader exception handler could do log.execption. | 17:45 |
Shrews | corvus: no, Exception is raised if it doesnt exist currently | 17:46 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Handle missing request during a decline. https://review.openstack.org/537932 | 17:46 |
openstackgerrit | Merged openstack-infra/nodepool master: Invalidate flavor and image cache on 400 errors https://review.openstack.org/441215 | 17:47 |
Shrews | pretty important bug fix though b/c we were continually declining the same req over and over in production, which made the log crazy large | 17:47 |
corvus | Shrews: ya | 17:48 |
corvus | Shrews: well, if this exception log annoys us, we can add a more specific exception and reduce it later | 17:48 |
corvus | tobiash: can you re-review https://review.openstack.org/537932 when you get a chance | 17:49 |
Shrews | i don't expect to see this one often, tbh | 17:49 |
tobiash | corvus: done | 17:50 |
Shrews | corvus: tobiash: thx | 17:51 |
tobiash | corvus: do you have some performance monitoring of zk in place? | 17:51 |
corvus | tobiash: no :( | 17:51 |
corvus | maybe | 17:51 |
corvus | hrm, i'm not sure how to link to a host in the new version of cacti | 17:52 |
corvus | gimme a sec | 17:52 |
tobiash | we're currently building up a monitoring for our deployment and I'm currently in process of adding zk metrics | 17:53 |
corvus | http://cacti.openstack.org/cacti/graph_view.php?action=tree&tree_id=1&leaf_id=94&nodeid=node1_94&host_group_data= | 17:54 |
corvus | tobiash: i believe zk is the only thing running on that host at the moment | 17:54 |
corvus | so we don't have any direct metrics of zk itself, only indirect metrics of the host | 17:54 |
tobiash | wow 20gb for zk | 17:55 |
corvus | there may be some leftover stuff on disk | 17:55 |
corvus | (i'm checking that) | 17:56 |
tobiash | oh, nm read 20gb and thought ram ;) | 17:56 |
corvus | heh, no the whole system is using about 900MB | 17:57 |
tobiash | the ram usage looks pretty much like ours | 17:57 |
corvus | the network traffic is not insubstantial. | 17:57 |
tobiash | a bit more than I suspected | 18:01 |
tobiash | but no problem for local networking | 18:01 |
corvus | ftr, /var/lib/zookeeper is 568M | 18:02 |
*** harlowja has joined #zuul | 18:21 | |
corvus | Shrews: the handler is going to keep running and therefore, delete one more node on each pass through, correct? so if a request requires 2 nodes, it will delete 2, after 2 passes? | 18:30 |
corvus | (re 539316) | 18:30 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/zuul-jobs master: Revert "Change the list of extensions to a dict" https://review.openstack.org/540480 | 18:33 |
*** dmellado has joined #zuul | 18:35 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Executor: Don't start too many jobs at once https://review.openstack.org/540162 | 18:39 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/zuul-jobs master: Change the list of extensions to a dict https://review.openstack.org/540485 | 18:43 |
openstackgerrit | Merged openstack-infra/nodepool master: Handle missing request during a decline. https://review.openstack.org/537932 | 18:51 |
Shrews | corvus: (back from lunch) it should pause, delete, unpause, create node and repeat those steps for each node it needs. | 18:54 |
Shrews | corvus: thanks for looking at it. it's a significant enough change, i want others to try to poke holes in it. so far, i've not found any, but i may be too close to it | 18:55 |
Shrews | corvus: of course, if it can't delete anything, it will continue as normal | 18:56 |
Shrews | waiting for quota to be freed through normal means | 18:57 |
corvus | Shrews: the return on line 496 suggests that it won't unpause immediately after deleting, only if, after the next pass through the loop, it determines it has quota, will it unpause | 19:02 |
Shrews | corvus: yeah, it's not on in the same loop | 19:04 |
corvus | Shrews: the return means it has to go back to the provider looping through all the request handlers again, right? | 19:04 |
Shrews | corvus: you mean looping through all of the nodes? | 19:05 |
corvus | Shrews: iiuc, the provider loops through all the request handlers, and the request handler loops through all the requested nodes. so the return sends it back to the provider request-handler loop | 19:06 |
Shrews | oh, yes | 19:06 |
Shrews | nothing changes with that | 19:06 |
corvus | Shrews: so i think it's (start node loop), pause, delete, (back to provider loop), (resume node loop), unpause, create, pause [repeat] | 19:07 |
Shrews | same as what i stated, just slightly more detailed :) | 19:08 |
Shrews | but yeah | 19:08 |
corvus | Shrews: the main thing i wanted to double check is -- there will be a delay after the node deletion before the unpause (i think that's fine -- delete is async anyway, this will give it time to happen), but despite being paused, we're going to get back to that point in the loop and try again? | 19:08 |
Shrews | i suppose it depends on how the quota code deals with nodes in DELETING state. i see what you're saying though... maybe we try to delete more than we really need, right? | 19:10 |
corvus | Shrews: that's a secondary concern i was about to bring up, yes. | 19:10 |
Shrews | hrm, that's a valid concern | 19:11 |
corvus | Shrews: i think the primary concern is: if we need to delete 2 ready nodes, will we? (i think the answer is yes because despite the return in 496, we come back to that point again later). then the secondary concern is: If it takes a long time (>provider loop interval) to delete 1 node, will we keep deleting ready nodes until one of them finally is deleted? i think the answer to that is yes too, because of | 19:11 |
corvus | the async delete. | 19:11 |
corvus | one might argue that's the best behavior, but i guess it depends on how well the provider loop time matches the expected node deletion time. | 19:12 |
Shrews | corvus: i agree. i didn't see that hole. glad you did. not sure how to address that yet | 19:12 |
Shrews | i don't want it to delete more unused nodes than it has to | 19:12 |
corvus | Shrews: maybe deletoldestunusednode could check for deleting nodes, and noop if there is a deleting node < $some_timeout old | 19:13 |
corvus | Shrews: well, that doesn't deal well with the multi-node request case | 19:13 |
corvus | Shrews: so i guess it'd have to be more like "delete the number of nodes needed for the request, unless there are nodes already deleting that haven't been in deleting state more than 5 minutes" | 19:14 |
Shrews | i think it will. the spotted DELETING node should disappear, then we can actually delete for any other nodes, yeah | 19:15 |
corvus | Shrews: downside of that is it makes the waitfornodeset loop a bit more complicated -- it can't just delete them one at a time, it has to calculate its deficit, right? | 19:15 |
Shrews | we could still treat it one-at-a-time i think | 19:15 |
corvus | if we did, it'd be slow though? a 5 node request would delete each of those nodes one at a time before fulfilling. | 19:16 |
corvus | maybe that's okay since this is kind of an edge case? | 19:16 |
Shrews | corvus: i think that's ok, imo | 19:16 |
corvus | yeah, i think we can live with it. it doesn't have to be optimized, it just has to eventually get itself out of the corner. | 19:16 |
Shrews | corvus: i'll WIP that and add that bit. thanks | 19:17 |
corvus | cool, thx | 19:17 |
corvus | SpamapS: 18:58 < openstackgerrit> James E. Blair proposed openstack-infra/gear master: Automatically send GRAB_JOB after CAN_DO https://review.openstack.org/540489 | 19:17 |
SpamapS | sweet | 19:17 |
SpamapS | corvus: will review later today | 19:18 |
corvus | SpamapS: how's that look as an alternative to yesterday's thing? | 19:18 |
corvus | SpamapS: thx | 19:18 |
corvus | i will have pep8 fixed by the time you look at it :) | 19:18 |
mordred | corvus, Shrews, SpamapS, jimi|ansible: you'll be happy to know that in addition to zuul correctly catching some bugs in https://github.com/ansible/ansible/pull/20974 - it has also now run successfully - which means we can land a PR I've been waiting to land for over 6 months because I wanted to see it integratoin tested first | 19:18 |
jimi|ansible | \o/ | 19:19 |
corvus | mordred: you play the long game :) | 19:19 |
AJaeger | congrats! | 19:19 |
Shrews | mordred: awesome! | 19:20 |
corvus | mordred: when i click on the yellow circle, it seems to say openstack-zuul hasn't completed checks yet? | 19:21 |
Shrews | We can now anticipate many questions from folks on how to parse our job ouput when their module change causes a failure. \o/ | 19:21 |
mordred | corvus: I just pushed a new patchset | 19:21 |
mordred | the last one passed integration tests but faile a unittest due to amissing method on a fake object | 19:22 |
corvus | mordred: ah, the missing info for me is that you pushed a new patchset... do amends not show up in the timeline? | 19:23 |
clarkb | corvus: thats one of the major grumps with github aiui. When you rebase you effectively lose old review history | 19:23 |
mordred | corvus: they do now actually | 19:24 |
mordred | if you look underneath the last comment from openstack-zuul | 19:24 |
clarkb | oh they've fixed it? | 19:24 |
mordred | you should see my face and a line "Add a module_utils OpenStack Cloud constructor …" | 19:24 |
* Shrews sees mordred's face everywhere | 19:25 | |
Shrews | mostly nightmares | 19:25 |
corvus | mordred: yeah, but i didn't see that anywhere else, like between the -20 and -4 hour comments, so i didn't think it meant that was chronological | 19:25 |
mordred | I think it gets replaced | 19:26 |
corvus | mordred: but i guess when checks are done, the newest comment will appear after the commit? | 19:26 |
mordred | like - the old versions of the commit don't show up any more | 19:26 |
mordred | corvus: yes | 19:26 |
corvus | ok. that of course makes perfect sense. it's just that things *disappearing* from a timeline undercut the value of a timeline as a timeline from a UI point of view. it makes me question the nature of time. | 19:27 |
mordred | corvus: as well you should | 19:27 |
corvus | but i think i grok the rules of this artificial quantum realm now | 19:28 |
mordred | of course, now Shrews is lucky and gets to review that monster ... and hopefully maybe rcarrillocruz too | 19:28 |
corvus | good thing there's no rush. ;) | 19:28 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Revert "Change the list of extensions to a dict" https://review.openstack.org/540480 | 19:28 |
*** electrofelix has quit IRC | 19:31 | |
tobiash | Shrews, corvus: the quota code treats nodes in DELETING state like normal READY nodes | 19:34 |
corvus | tobiash: ok. i think that confirms we should make the change we discussed | 19:35 |
Shrews | yeah. and another test to verify the delete behavior | 19:36 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Executor: Don't start too many jobs at once https://review.openstack.org/540162 | 19:36 |
corvus | tobiash: tobiash can you review my 2 comments on 535711? | 19:55 |
tobiash | looking | 19:57 |
tobiash | corvus: responded | 20:02 |
tobiash | regarding the path I can change that | 20:02 |
tobiash | regarding drivers hooking it mordred had a followup to refactor that: https://review.openstack.org/#/c/537014 | 20:03 |
tobiash | regarding the path I just thought this was discussed before I adopted the change so I left it as is | 20:04 |
tobiash | but I'm fine with maintaining the old path | 20:04 |
corvus | tobiash: ah, thanks. that followup is after the confusing .json change... mordred can we rebase 537014 earlier in the stack? i'd like to have a driver-based zuul-web asap and build off of that. | 20:05 |
corvus | tobiash: oh, i don't remember wanting to change the path... do you have a reference to jog my memory? | 20:05 |
tobiash | corvus: that was just my assumption as I didn't know about such a conversation ;) | 20:06 |
corvus | tobiash: oh, you mean it was in jlk's original change | 20:07 |
tobiash | yes | 20:07 |
tobiash | so my assumption was that should be more generic than 'connection' | 20:07 |
jlk | hrm, I remember some of this convo | 20:07 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Provider wedge fix https://review.openstack.org/539316 | 20:07 |
corvus | jlk: https://review.openstack.org/#/c/535711/8//COMMIT_MSG specifically | 20:08 |
jlk | and thought process. | 20:08 |
tobiash | but no idea about that, is 'connection' a synonym to 'driver' in zuul context? | 20:08 |
SpamapS | mordred: zomg, zuul made your day better. | 20:09 |
corvus | tobiash: no, but a given connection has exactly one driver | 20:09 |
SpamapS | that's awesome. | 20:09 |
corvus | tobiash: so connection implies driver | 20:09 |
SpamapS | to me, connections are definitions that are used for triggers and/or reporters. drivers are things that define connections. | 20:10 |
jlk | oh | 20:11 |
jlk | corvus: I thought maybe it was because we might have another connection driver with a similar name? | 20:11 |
jlk | maybe? | 20:11 |
* jlk thinks some more | 20:11 | |
tobiash | so the current path is /connection/<name_as_in_zuul_conf>/... | 20:12 |
tobiash | and the change currently replaces /connection/ by /driver/github/ | 20:12 |
mordred | a connection is a logical representation of a remote system. a driver is code that implements that.... | 20:12 |
jlk | Maybe I felt wrong takign the /connection/ path entirely | 20:12 |
tobiash | so stay with connection? | 20:13 |
jlk | either way, it should totally be discussed , reasoned about, etc.. | 20:13 |
mordred | so a zuul might have a connectoin to public github and a connection to a private github and a connection to a gerrit | 20:13 |
mordred | the first two woulduse the github driver, the second woulduse the gerrit driver | 20:13 |
jlk | it might have multiple connections to public github | 20:13 |
corvus | so /connection/my_private_github/ uniquely identifies both a connection (ie, remote system) and the driver which is used to interface with it. | 20:13 |
tobiash | the connection name is encoded in the path currently so the driver name is not necessary | 20:14 |
corvus | if you have multiple connections, it'd just be /connection/github1 and /connection/github2 | 20:14 |
mordred | agree. /connection/{connection} sohuld be sufficient | 20:14 |
mordred | and yes - I can rebase the confusing stack | 20:15 |
corvus | mordred: i was thinking if we can do the last change first -- 537014 -- that would be ideal | 20:16 |
mordred | corvus: let me see how bad that rebase is :) | 20:17 |
jlk | yeah I honestly can't remember why I went with that path, and I didn't comment in the code for it (go me!) so I have no argument here. | 20:17 |
mordred | tobiash: you ok with me rebasing this stack real quick? don't want to step on work you're doing | 20:17 |
tobiash | mordred: go ahead | 20:18 |
corvus | mordred: er | 20:18 |
tobiash | corvus: I think I know why the change currently has the driver in the route | 20:18 |
tobiash | it needs that information to query the gearman about github | 20:19 |
corvus | mordred: i was thinking rebase 537014 on 536780... so it shouldn't affect the 3 changes in tobias's stack | 20:19 |
tobiash | so maybe that gets easier with mordred's change | 20:19 |
tobiash | maybe these need to be squashed and reworked | 20:19 |
tobiash | have to look at mordred's change again | 20:19 |
corvus | hrm... ok... well, if it's easier, we can also land all these patches together -- ie, break the endpoint, then change it back with the refactor. | 20:20 |
tobiash | corvus: or put mordred's change (without github) in front of it | 20:21 |
corvus | and as long as we land them in rapid succesion, we don't have to deal with an api break | 20:21 |
corvus | i assumed mordred's change depended on 535711 | 20:22 |
tobiash | partly | 20:22 |
tobiash | it also changes the sql connection | 20:22 |
tobiash | and primarily adds the framework for that into zuul-web if I understood correctly | 20:22 |
corvus | right, i'm just thinking that if we land 535711, 536773, 536780, 537014, and a new change to change the route back to the old form, we should be able to do that with minimal updates. | 20:23 |
corvus | assuming that mordred comes back and says that rebase isn't too crazy. ;) | 20:24 |
tobiash | ok, I'm fine with that too ;) | 20:24 |
tobiash | the path change might be easy there: https://review.openstack.org/#/c/537014/3/zuul/web/handler.py | 20:25 |
corvus | yep, at that point it's just deleting a line :) | 20:25 |
tobiash | yeah, that order is probably the easiest | 20:26 |
mordred | corvus, tobiash: sorry - brainjuggling the rebase ... andyes, corvus, I agre with rebase 537014 on 536780 | 20:33 |
mordred | but there's some internal dependencies - so I'm doing it in a few passes | 20:34 |
tobiash | multipass rebase... nice strategy :) | 20:35 |
mordred | ok. I *think* I've got it | 20:56 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Add facility for plugins to register web routes https://review.openstack.org/537014 | 20:56 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Add /info and /{tenant}/info route to zuul-web https://review.openstack.org/537011 | 20:56 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Add support for configuring graphite_url https://review.openstack.org/537012 | 20:57 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Move WebInfo config processing into zuul.model https://review.openstack.org/537013 | 20:57 |
tobiash | mordred: ah, already with the correct path :) | 20:58 |
tobiash | mordred: the commit message doesn't fit the corrected path | 20:59 |
mordred | tobiash: whoops | 20:59 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Add facility for plugins to register web routes https://review.openstack.org/537014 | 21:00 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Add /info and /{tenant}/info route to zuul-web https://review.openstack.org/537011 | 21:00 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Add support for configuring graphite_url https://review.openstack.org/537012 | 21:00 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Move WebInfo config processing into zuul.model https://review.openstack.org/537013 | 21:00 |
mordred | tobiash: fixed :) | 21:00 |
corvus | sweet, i'll review that now | 21:01 |
mordred | I also marked 537011 WIP because it needs tests | 21:02 |
tobiash | mordred: commented on 537014 | 21:09 |
corvus | (i have comments in progress) | 21:09 |
tobiash | not sure if I'm correct there but that's what I think | 21:09 |
corvus | mordred: reviewed -- mostly some small things. i think tobiash is right about the path. also, we need tests for the rest api. that can come next. | 21:14 |
openstackgerrit | Andrea Frittoli proposed openstack-infra/zuul-jobs master: Change the list of extensions to a dict https://review.openstack.org/540485 | 21:16 |
mordred | corvus, tobiash: re: path - I can see where the confusion lies - but there's two different classes there ... | 21:21 |
mordred | class SqlWebHandler(BaseWebHandler) <-- the sql handler subclasses BaseWebHandler - whereas the github driver subclasses BaseDriverWebHandler (which should be renamed to BaseConnectoinWebHandler) | 21:21 |
mordred | the BaseDriverWebHandler pre-pends connection info always - the BaseWebHandler doesn't | 21:22 |
corvus | mordred: oh yep i fell into that trap | 21:22 |
tobiash | oh, yes | 21:23 |
tobiash | then ignore my comment | 21:23 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Add facility for plugins to register web routes https://review.openstack.org/537014 | 21:28 |
corvus | i'm +2 on the stack of 4 | 21:34 |
tobiash | \o/ | 21:35 |
tobiash | eod for me, cya | 21:41 |
fungi | anybody who enjoys reviewing massive amounts of whitespace will just love https://review.openstack.org/540082 | 21:41 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Provider wedge fix https://review.openstack.org/539316 | 21:42 |
*** myoung is now known as myoung|bbl | 21:42 | |
clarkb | fungi: looks like you have enough reviews for approval | 21:43 |
mordred | corvus: the commit message is incorrect on the first patch ... do we care? | 21:43 |
mordred | corvus: otherwise I think we can land the stack | 21:43 |
corvus | mordred: it's correct for that patch, right? it's just that we change it back 3 patches later? | 21:44 |
corvus | fungi: aprvd | 21:45 |
fungi | thanks corvus. i thought maybe the whitespace demanded greater care ;) | 21:46 |
corvus | fungi: heh you have tricked me into a whitespace review ;) | 21:46 |
fungi | and it's not even april 1 yet | 21:46 |
openstackgerrit | Merged openstack-infra/zuul-website master: Convert Arcana files from DOS to UNIX line endings https://review.openstack.org/540082 | 21:48 |
openstackgerrit | Merged openstack-infra/zuul-website master: Link page source from the index footer https://review.openstack.org/540083 | 21:48 |
*** rlandy has quit IRC | 21:51 | |
corvus | mordred: 537014 made zuul sad. i left a comment pointing out the error | 21:56 |
mordred | corvus: I make zuul sad all th etime | 22:01 |
* corvus warms up the hot cocoa | 22:02 | |
corvus | mordred: you gonna fix that or should i push up the fix? | 22:19 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Add facility for plugins to register web routes https://review.openstack.org/537014 | 22:20 |
corvus | nm | 22:20 |
mordred | corvus: oh - sorry - I made the snarky comment about how I break things - but then didn't actually, you know, fix it | 22:24 |
mordred | sigh | 22:24 |
corvus | mordred: we all have our special skills ;) | 22:24 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Look for subunit2html - not an empty variable https://review.openstack.org/540547 | 22:31 |
mordred | corvus: zuul unhappy again. I'll get it this time | 22:41 |
openstackgerrit | Andrea Frittoli proposed openstack-infra/zuul-jobs master: Change the list of extensions to a dict https://review.openstack.org/540485 | 22:42 |
mordred | hahaha | 22:47 |
mordred | rebase issue | 22:47 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Add facility for plugins to register web routes https://review.openstack.org/537014 | 22:48 |
corvus | mordred: i guess those tests work! | 22:48 |
mordred | corvus: indeed! | 22:54 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Look for subunit2html - not an empty variable https://review.openstack.org/540547 | 22:59 |
corvus | i'm going to carry over tobias's +2 on that and start merging these | 23:17 |
mordred | corvus: ++ | 23:45 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!