Thursday, 2017-08-10

fungiokay, gordc volunteered for telemetry ptl at the last moment, crisis averted00:46
*** emagana has quit IRC00:51
*** emagana has joined #openstack-tc00:52
dimsfungi :  w00t00:53
fungidims: well, i wouldn't cheer too much until you read http://git.openstack.org/cgit/openstack/election/tree/candidates/queens/Telemetry/gordc.txt00:55
fungisounds like the plan is to wind it down anyway00:55
*** emagana has quit IRC00:56
fungiand acknowledging that little work is happening there any longer00:57
dimsfungi :  :(00:57
dimsfungi : i was reading about this go-contributor-workshop to see what we could learn from them - https://blog.golang.org/contributor-workshop00:58
fungineat01:02
fungithey did a good job of including tweets from and photos of women participating at the event01:04
fungithough trucker hat swag seems like it's reinforcing the brogrammer stereotype a bit... wonder whether they thought that through01:04
*** RuiChen has joined #openstack-tc01:37
*** RuiChen has left #openstack-tc01:58
*** dklyle has quit IRC02:49
*** david-lyle has joined #openstack-tc03:22
*** rmcall has quit IRC04:07
*** rmcall has joined #openstack-tc04:40
*** david-lyle has quit IRC04:44
*** rmcall has quit IRC04:45
*** dklyle has joined #openstack-tc04:45
*** gcb has joined #openstack-tc07:13
*** cdent has joined #openstack-tc09:35
*** sdague has joined #openstack-tc09:50
*** gcb has quit IRC10:04
*** dtantsur|afk is now known as dtantsur10:35
dhellmannfungi, smcginnis, dims : it sounds like we should talk about adding status:maintenance-mode to those projects12:43
fungiquite possibly12:44
cdentI don’t know that gord’s message is all _that_ depressing. It basically says “it kinda works and isn’t broken enough for people to do anything about it”. That’s an interesting kind of success.12:51
cdentThe extraction of gnocchi to be something more generic than “just openstack” is also a kind of success.12:51
cdentWhat might be not a success is if the other services aren’t capable of producing info that generic tools can consume12:52
cdent(ttx: since you asked about different uses of the term “upstream”, there’s one in this message: http://lists.openstack.org/pipermail/openstack-infra/2017-August/005546.html )12:56
smcginnisIt may be the project is just in a stable state and needs someone to keep the lights on.13:02
smcginnisIf and until it gets superseded by something else or someone has a plan to expand it in some way.13:03
smcginnisBut status:maintenance-mode probably is an accurate tag for it at this point I suppose.13:03
cdentI’d like to see the day when projects are striving to get that tag13:05
cdentnot thinking of it as bad13:05
cdentbut yeah, I agree it is the right tag for at least telemetry (now that gnocchi is no longer included)13:05
ttxcdent: yeah, in that case we are the users and citycloud is upstream from us -- still a bit confusing I'll admit13:07
smcginnisDo we need to hold a TC meeting to select a storlets PTL? Or hash it out between office hours and the ML?13:10
cdentttx: the thing that is different/missing there is that there’s no commits13:11
cdentsmcginnis: start on ml?13:11
ttxI think trying to reach to the former PTL is the first step13:11
ttxthen maybe ask notmyname if he has an idea of the current status / users13:12
smcginniscdent, ttx: Makes sense to me.13:12
ttxTraditionally we waited for the election conclusion, but I guess we can speed up13:13
smcginnisYeah, doesn't seem to be anything gating this one.13:13
smcginnisSo only two projects with a PTL election. That seems lower than usual.13:14
ttxBut also only two projects without nominees, which is also lower than usual. Usually more than 2 miss the call13:15
ttxI think we had 3 last two elections13:15
smcginnisThat is good.13:15
ttxIn the storlets case it's getting slow but it's not completely dead either. I'll reach out to Eran Rom and Kota Tsuyuzaki13:17
cdentWhen do we start asking if the concept of PTL, as currently constructed, is sustainable?13:21
dhellmannhow do you see it as potentially unsustainable?13:24
ttxIf you can't get a contact point for a project, I think that project should be considered unmaintained13:25
dhellmannsure, but that's not the concept of PTL being flawed, is it?13:25
ttxbecause that's what the PTL is, once you remove all the paint13:25
ttxdhellmann: no it's not, I agree13:25
ttxOne issue is that the PTLs are traditionally bad at delegating13:26
cdentcouple things dhellmann: a) if not many people are wanting to do the job, then that suggests that the job is either hard to do, or not rewarding, or hard to justify to employers, or any of a variety of things. b) have you observed how much some people like mriedem work? we shouldn’t be encouraging that13:26
dhellmannthat has been a trend, yes13:26
dhellmanncdent : yes, see ttx's comment13:26
cdentthat covers b, but I don’t know that it covers a13:27
ttxi.e. everyone else is happy letting "the PTL" do things13:27
cdentand I think the lack of delegation is in many cases effect not cause13:27
dhellmannthe employer justification angle is important, now that employers have figured out that it doesn't mean they can control the project absolutely13:27
smcginnisI think for the case of telemetry, it's partly a matter of the perception of what PTL means.13:28
dtroyerdhellmann: ++   I have seen where 'PTL' is not the management-buzzword nearly like it once was13:28
dhellmannwe've defined more liaison roles, which should make at least some of the delegation lines clearer13:29
smcginnisFor maintenance or stable projects, I think there is an idea that PTL is a challenge and they are expected to somehow work to "revitalize" the project.13:29
cdentsmcginnis: I think it is hard to extrapolate about ptls in general from the example of telemetry13:29
smcginnisWhere we need to make it clear in those cases it just needs to be someone to be the point person.13:29
dhellmannsmcginnis : there's also definitely a general lack of interest in governance from many members of that team13:29
smcginniscdent: Right, not trying to say that applies in all cases. But in the case of so called "dying" projects, it's a different role than being Nova PTL.13:29
ttxI don't think any of this is linked to "the PTL" as a concept. It's more a crisis of strategic involvement in projects13:30
dhellmannmaybe teams with status:maintenance-mode don't need a ptl, but need a "point of contact" role13:30
ttxYou have less people doing janitor work in projects13:30
dhellmannttx: that has been a trend for a while, yes13:30
ttxthat makes the PTL, as the default janitor, more exposed13:30
smcginnisSo status:maintenance-mode can have a "janitor" election. :)13:30
ttxBut replacing the PTL concept won't solve the underlying issue13:31
cdentSorry, I didn’t mean to distract us from that point.13:31
ttxwhich is that it's hard to get people to work on stuff that benefits everyone13:31
cdentMy question was a much broader bigger picture thing, not related to the issue with some projects being in maintenance mode13:31
cdentit was more in reaction to the statement about the small number of elections. My reaction is “of course, who would want that job"13:32
ttxcdent: I'm not really concerned by lack of elections. PTL positions are now more inherited through succession planning, which is good13:32
ttxi.e. less hostile takeovers, more of a planned transition13:33
dhellmannwe evolved to that a loong time ago13:33
smcginnisYeah, I see it as a good thing. Less companies pushing employees to do it so they can say "we have X number of PTLs"13:33
ttxI would be more concerned if there were dozens of no-candidate projects13:33
dtroyerconsider too that we have a long history of re-electing folks running for most PTL & TC seats.  In the PTL case I think it has to do with avoidance of (perceived) confrontation as much as anything else.  Some projects have made a deliberate decision to 'rotate' the PTL so only one steps up to run.13:33
ttx1.613:34
ttx%13:34
cdentyes, I get that, but the axe I’m grinding right now is: We need to recognize that working on openstack is becoming increasingly challenging. It’s a problem that is not going away.13:34
dhellmannthe election process is there as a safety valve, and I don't think we want to eliminate it, but as long as we have people doing the work I'm not that concerned that we don't actually have votes13:34
smcginniscdent: Yeah, I agree it definitely has changed a bit.13:34
ttxI mean 3.2% of teams end up with no candidates, and for the second one we are not even sure it's not human error13:34
dhellmanncdent : absolutely. we should be looking for ways to make it easier. Do you think there's something we could change about the PTL role to improve things?13:36
ttxcdent: totally. 1/ The contributor base evolves, as we can't rely as much on large service providers employing dozens of devs, and 2/ as users get more involved we need to teach them the value of strategic contributions13:36
ttxwe are in the middle of that transition13:37
ttxThat said, I see nice signs, like mnaser taking over the PuppetOpenStack PTLship13:38
cdentdhellmann: I don’t really know. We’re definitely in a difficult transition right now in some projects. The “dozen of devs” aren’t there but the expectations that are on projects (that _may_ be put there by the projects themselves) remain.13:39
dhellmannI agree, we might need to loosen, if not standards, standard practices in some ways13:39
dhellmannperhaps emphasizing the communication duties, over the leadership duties, for the PTL role would help in some cases13:40
smcginnisWe might just need to adjust our own conceptions that things will not move as quickly as they used to.13:40
ttxHorizontal teams have been forced to adapt in the past, today vertical teams need to adapt as well13:40
smcginnisAnd that for some projects, it's just going to be reality that there are 2-3 active contributors that just don't get as much accomplished each cycle.13:40
mugsieyeah - that could help. I have found that (at least for the Designate team) PTL was more about herding cats, and making sure that people know what is going on than "tech leadership".13:41
mugsiethat leadership came from the cores and a few dedicated contributors13:41
smcginnismugsie: Yeah, I've found it to be more of a administrative position than anything else.13:41
mugsiethat said, we were in a bad place for succession for this election until very recently, so we may not be the gold standard13:41
ttxPTL used to be about keeping sanity while drinking from the firehose. Now it's about getting essential bases covered13:42
mugsiethe biggest issue I see for PTLs is the lack of quality reviews, while some contributors dump huge tactical patches. just co-ordinating who is going to review a massive patch chain is hard work nowadays13:44
smcginnisI've really been trying to encourage people to get involved that if nothing else, doing reviews has a huge impact.13:46
smcginnisI would love to see more non-core reviews happening.13:46
smcginnis(And more core reviews too)13:46
mugsiesmcginnis: ++13:48
mugsiethere are one or two non cores that I am very happy to see a review from, as they are good reviewers - I just wish we had more13:48
*** marst_ has joined #openstack-tc13:54
*** persia has joined #openstack-tc14:05
cdent[t v2a]14:15
purplerbot<dtantsur> ENOTMUCHTIME is a common error code nowadays [2017-08-10 14:14:41.151800] [n v2a]14:15
dtantsurwut?14:15
* dtantsur is afraid of cdent now14:15
cdentdtantsur: I was quoting you into here to support something I said above about people being overtasked14:16
cdentyou know about the transclusion powers of purplerbot?14:16
dtantsurno, no idea14:16
smcginnisThat's something new to me. :)14:17
cdenthttps://anticdent.org/purple-irc-bot.html14:17
cdentin the first set of bullet points14:18
*** lbragstad has quit IRC14:28
dhellmanncdent : is the line reference for the bot some sort of hash?14:59
*** lbragstad has joined #openstack-tc14:59
cdentdhellmann: it’s a base62 encoding of the first few bytes of a uuid15:00
fungimmm, office hour15:00
fungialso, not nearly enough people use base6215:01
* lbragstad meanders in with a full cup of coffee15:01
persiaDoes a change in the wording of an old resolution require a new resolution, or can patches be submitted against old resolutions?  Note that this would be a semantically meaningful change.15:01
cdentit’s trying to be unique but small, but the tooling checks for dupes15:01
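As a rough illustration of the scheme cdent describes (a base62 encoding of the first few bytes of a uuid, with a check for duplicates), here is a minimal sketch. It is not purplerbot's actual code: the function names, the alphabet ordering, the choice of three prefix bytes, and the in-memory set used for the duplicate check are all assumptions made for the example.

```python
import string
import uuid

# Assumed alphabet ordering: digits, then uppercase, then lowercase (62 symbols).
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def base62_encode(number: int) -> str:
    """Encode a non-negative integer as a base62 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def make_nid(seen: set, prefix_bytes: int = 3) -> str:
    """Return a short identifier not already in `seen` (hypothetical helper).

    Takes the first `prefix_bytes` bytes of a random UUID, encodes them in
    base62, and retries with a fresh UUID if the result is a duplicate.
    """
    while True:
        prefix = uuid.uuid4().bytes[:prefix_bytes]
        candidate = base62_encode(int.from_bytes(prefix, "big"))
        if candidate not in seen:
            seen.add(candidate)
            return candidate
```

With three prefix bytes the identifier is at most five characters long, which keeps it easy to type in a channel; the retry loop is what makes "unique but small" safe as the number of stored lines grows.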
dhellmanncdent : interesting. I guess it would be hard to express something like "that thing cdent said in $channel $n lines back"15:01
ttxFactoid: Nova current activity is not significantly lower than during Mitaka: http://imgur.com/a/gX9ox15:01
dhellmanncdent : ooo, though "that thing cdent said in $channel that matches $regex" would be useful15:01
cdentyes, it would, and the functionality could be added, but I never bothered because [t v2a]15:02
purplerbot<dtantsur> ENOTMUCHTIME is a common error code nowadays [2017-08-10 14:14:41.151800] [n v2a]15:02
dhellmannhaha15:02
cdentthe use of the numbers and the naming of things “nids” all goes back to doug engelbart stuff15:03
dhellmannpersia : I think in the past we've done a new resolution and explicitly marked the old one as deprecated. There should be examples of that in the repo.15:03
cdentthe “real” stuff that I used a few years ago integrated wikis, mailing list archives and irc15:04
dhellmannpersia : https://governance.openstack.org/tc/resolutions/superseded/index.html15:04
dhellmann"superseded" not "deprecated"15:04
persiadhellmann: Thanks.  So the appropriate procedure would be to raise the complaint to the TC, possibly assist with writing a new resolution, etc.?15:04
dhellmannpersia : that sounds like a good approach -- starting the conversation is always a good way to begin :-)15:05
dhellmannwould you like to raise a topic now?15:05
persiaYes :)15:05
* dtantsur is now a classic writer, I guess15:05
cdentdtantsur++15:05
persiaIn 20141128-elections-process-for-leaderless-programs, the phrase "As soon as possible" is used.  Having just been subject to this, I found every minute of the 112 it took worrisome, because of the time pressure.  I'd like to request that "promptly" or similar be considered as an alternative.15:06
persia(well, not every minute, because I didn't notice the phrase for the first 60 or 70)15:06
dhellmannhttps://governance.openstack.org/tc/resolutions/20141128-elections-process-for-leaderless-programs.html for reference15:07
dhellmannhmm, yes15:07
dhellmannI think the intent there is to not have you wait until the actual election is over to let us know15:07
dhellmannbut it's not like we were sitting around waiting for an email yesterday15:08
dhellmannrephrasing that seems like a reasonable change15:08
dhellmannand in this case, I don't think the spirit of the resolution changes so I wouldn't go through the superseded process15:08
cdentagreed15:08
persiaMy understanding was that as long as you knew before the next office hour after the nomination period concluded, nobody would really notice the time that may have passed.15:08
dhellmannsupersession? I don't know the right term there -- just patch the existing file15:08
dhellmannyeah, that seems about right15:09
dhellmannI mean, someone following more closely might have already figured it out or expressed concern (telemetry came up shortly before the deadline)15:09
* persia will prepare a change for "as soon as possible" -> "promptly" for consideration in gerrit, just patching the old resolution, rather than having a new one.15:09
dhellmannbut the "formal" notification doesn't have to be immediate15:09
dhellmannpersia : ++15:09
dhellmannthanks for raising that, and I'm sorry if any poor wording choices introduced unnecessary stress into your experience15:10
dhellmannthank you for acting as an election official :-)15:10
fungipersia: i would even, personally, be fine if it just said "before the conclusion of the election" or something like that15:10
dhellmannthat seems like a reasonable change, too15:10
fungishould be able to take your time, we just need to know before the current ptl's term is up, i think15:11
persiafungi: I'll use that instead.  Having a clear bound will probably be easier to understand for those needing to raise the issue in the future.15:11
cdenti think fungi’s suggestion is better than promptly as it is more concrete15:11
cdentjinx15:11
dhellmannI think the point is we want to try to find someone to do it before the end of the election, so waiting until the day before isn't really helpful, but being concrete is good.15:11
ttxcdent: IIRC you were asking for a version of my graph only including "main", more mature  projects: http://imgur.com/a/i6VJk15:11
smcginnisttx: good data15:12
ttxNot completely crazy curve for projects past their feature development peak15:12
fungidhellmann: well, probably a few days of no ptl (or of the previous ptl sticking it out) while we find a replacement or decide to drop it as an official team is not a huge issue eiter15:12
cdentyeah, that’s useful15:12
fungieither15:12
ttxAlso the peak is reached before the current "crisis"15:12
dhellmannfungi : true15:12
ttxi.e. peak at Mitaka, not Newton15:13
dhellmannit would be interesting to compare that to patchsets proposed per day15:13
persiaAlthough many folk may have been consuming Mitaka during Newton, causing delayed perception of slowing.15:14
cdentdhellmann++15:14
ttxhmm, let's see15:14
* ttx hacks something real quick15:14
ttxI expect a slightly translated similar curve, but let's see15:16
fungicdent: on your point about the meaning of "upstream" and "downstream" i think it also depends a lot on context, as your most recent example shows. it's mostly a producer vs consumer distinction (distros are downstream from our development work, operators are perhaps downstream from distros and api users like our infrastructure can be downstream from service providers donating resources to our cause)15:16
cdentWhile that’s going, I responded to alan clark’s request for agenda items by pointing him at the second section of https://anticdent.org/tc-report-32.html He agreed that some kind of “top of the mind” discussion about the state of the universe would be good, but it would be useful to have some guiding questions prepared. I said I’d check in with everyone here about that.15:16
fungiso when distros talk about "upstream" they typically mean producers of software they're packaging, which is the most common context a lot of us previously involved in distro work hear it in15:16
ttxdhellmann: http://imgur.com/a/QDnSD15:17
dhellmannttx, I don't suppose you can get those onto the same graph?15:18
cdentfungi: yeah, agreed. The part that was confusing for me was the idea that user committee could be “upstream” if it was an openstack committee (since openstack at large is “upstream”).15:18
ttxdhellmann: I can but it will take me more time :)15:18
dhellmannthe shapes look about the same except for the bump around havana15:18
ttxyes around havana we had a LOT of tactical contributions we just could not absorb15:18
ttxcdent: btw you made it to LWN quotes section today15:19
cdentoh dear, from saying what?15:19
ttxthat was about your retrospective proposal15:20
* ttx fetches quote15:20
ttxcdent: are you explicitly sending them a copy of your email ?15:20
cdentttx sending who what email?15:21
openstackgerritFelipe Monteiro proposed openstack/governance master: Mark Murano complete for Queens policy in code goal  https://review.openstack.org/49257315:21
ttxcdent: send LWN a copy of your TC report email ? They link to it every week15:21
ttxhttps://lwn.net/Articles/730191/15:21
cdentOh, no. I didn’t know that was happening.15:21
ttxis that accessible ^15:22
*** emagana has joined #openstack-tc15:22
ttxThey probably pick it up from the Planet then15:22
cdentyeah, that works, thanks15:22
ttxsince they link to your blog15:22
persiaOn "upstream": some projects find using "mainline" to describe themselves useful when they consider that there is no further "upstream" from which they are deriving.15:23
cdentI shall try to make sure the wider audience does not curtail my color commentary.15:23
cdentanyway back to [t 1xSU] if possible. Does anyone have anything to add?15:24
purplerbot<cdent> While that’s going, I responded to alan clark’s request for agenda items by pointing him at the second section of https://anticdent.org/tc-report-32.html He agreed that some kind of “top of the mind” discussion about the state of the universe would be good, but it would be useful to have some guiding questions prepared. I said I’d check in with everyone here about that. [2017-08-10 15:16:50.941642] [n 1xSU]15:24
dhellmannI thought the plan was to go through the working groups created in boston at the march meeting? or do we consider that "done"?15:26
cdentI think that remains on the agenda. Alan asked for additional agenda items.15:26
cdentAnd I was thinking that some more general talk might be useful in shaping the working group activity15:26
ttxpersia: we could add language about the TC having the power to appoint PTLs in case nobody nominated themselves directly in the charter, since it talks about election15:27
dhellmannit might. my impression is that we talk in generalities a *lot* and that more specifics would help15:27
cdentdhellmann: in that case, maybe you and the other tc have already done so, but I certainly have not15:28
cdentMy impression is that the issues I was talking about earlier (working on openstack is too damn hard) are not well understood despite being well known and not really on people’s agendas and I can’t carry on not doing something about it15:29
dhellmanncdent : sure. and that's an example of something where we could be more specific and less general15:29
persiattx: My fear with wider exposure of that power is that it may create the impression that one becomes PTL initially by arranging TC appointment.  While I don't want to reduce opportunities for free drinks and swag, I imagine the TC mostly wants projects to organically produce PTLs, and that appointment should only be an exceptional process (more the process of helping the project to self-select a PTL than that the TC is specifically controlling PTL selection).15:29
dhellmannless "it's hard" and more "it's hard because X"15:29
cdentpeople have to say out loud “it’s hard” before they can say why15:29
ttxpersia: yeah, that's fair15:29
cdentThe insistence on coming prepared with the reasons for everything is a hugely limiting factor in our discussions15:30
ttxdhellmann: yes, plan in September is to go through the various workstreams so that they can expose progress (or lack thereof)15:30
cdentThus the notion of a more retrospective oriented thing15:30
cdentEmilienM said he’d be willing to structure something like that, if people felt it was appropriate15:30
openstackgerritEmmet Hikory proposed openstack/governance master: Amend leaderless program resolution timeframe  https://review.openstack.org/49257815:31
dhellmanncdent : I would like to avoid a fruitless afternoon of complaining about random things, so if we can focus on a small number and actually talk about trying to address them instead of just listing them all out again, that may be more productive15:31
dhellmannstructure would be good in either case15:32
ttxyes, I feel like there was a lot of complaining about random things in the past, and that didn't really help much15:32
cdentyes, that’s why I’m bringing up the topic here so we can formulate the questions that would help shape the conversation15:32
ttxI don't feel like there was ever a shortage of people saying "it's hard" out loud15:33
cdentI think there are plenty of people who say that making openstack into what they want is hard15:33
cdentbut I’m less clear on the labor issues15:34
persiaThe "it's hard"s that I have heard often seem to be very specific to individual circumstances.  Each has a solution, but all together cannot be solved together.15:34
cdentAs I said to smcginnis earlier: If openstack was a single corporate thing, then labor would say “oi, we’re overtasked, get some people in here or give us less to do” but there are no mechanisms for that in our collaborative env15:34
fungithat's sort of why we did the exercise with the board, tc and uc in boston... to attempt to pin down predominant opinions on what's hard and where limited available effort should be focused to make the most impact15:34
dhellmanncdent : funny, I have a blog post draft ready to go up when I return from pto that talks about that exactly15:35
smcginnisfungi: And the top 5 help wanted was a result of that, right?15:35
fungia result15:35
cdentdhellmann: I look forward to it15:35
persiacdent: How are there no such mechanisms?  Can PTLs not report their teams understaffed to requirements in a forum that contributing organisations can consume when determining how to allocate contributed FTEs?15:35
cdentpersia: I think something like that is going on all the time and not working?15:36
cdentThings like the top 5 list are helpful and things like it will continue to be helpful.15:36
persiacdent: Yes.  But that it isn't working is different than that there are no mechanisms.  I don't believe it would work better if OpenStack was managed, rather than governed.15:36
cdentI’m not suggesting that openstack _should_ be managed15:36
fungifor example, one of the top 5 outcomes was that openstack is unnecessarily complex, so work is starting to remove less-used, incomplete or deprecated features, be more clear about what configurations we actually test/support, shed some projects which aren't bringing anything to the table to help the overall picture, et cetera15:37
cdentI’m simply trying to identify that there are issues15:37
persiacdent: Apologies.  Took "If openstack was a single corporate thing" out of context.15:37
* ttx drinks herbal tea from a legacy HP Cloud Services cup15:37
cdentI _like_ very much that openstack is a collaborative affair, but because it is at the same time economically driven, some of the functions for improvement and change are difficult15:38
fungittx: an antique!15:38
ttxThey pivoted at least 3 times since this one15:38
dhellmanncdent : that topic of finding resources to work on things that benefit everyone might lead to a useful conversation in the board meeting15:39
cdentIn truth, I think it’s the only topic worth talking about.15:40
*** emagana has quit IRC15:40
*** emagana has joined #openstack-tc15:41
persiacdent: The financial situation is indeed complex.  Because of the nature of what I do, two things I hear often from contributing orgs are "We can't hire anyone for openstack except at insane salaries" and "We do not believe that paying for openstack feature development is useful."15:41
* dims peeks15:41
dhellmannso, how do we frame that conversation constructively?15:41
fungipersia: feature development seems like the last thing we need. bug fixing and stabilization of current features on the other hand would be stellar ;)15:42
fungiwe could also, as i mentioned, use some help ripping out old/broken bits which bring more complexity than usefulness15:42
ttxdhellmann: I was thinking we could present the top-5 list and say "now what"15:43
persiafungi: Some of my principals define "feature" to be things like "something to mean we don't need to reboot all the machines in the substrate with cron to continue to provide services".  Not that these are necessarily mature users or anything :)15:43
ttxWe might want to add a couple more things to the list before we do that though15:43
dhellmannttx: there are 2 items on our top 5 list15:43
fungipersia: heh, nice!15:43
ttxHow about adding "project stewardship roles" to the top-5 ?15:43
ttxLike people taking on PTL, release liaison... roles15:43
dhellmannat the board meeting at the last summit I raised the point of asking prospective new gold members about their commitment to giving contributors time to become leaders within the community, and not just work on tactical tasks15:44
mtreinishttx: what is the target audience for the top 5 list?15:44
dhellmannperhaps we can extend that to existing members?15:44
ttxWe may not lack PTLs yet, but we definitely lack PTL-like stewardship roles the PTL could delegate to15:44
ttxmtreinish: contributing organizations15:44
persiadhellmann: For discussion regarding asking for more resources, I'd suggest asking the board for help in communicating to operators and vendors that contribution to maintenance is essential to continued availability of openstack, rather than asking the board for the resources directly.15:44
mtreinishttx: because saying that but getting a bunch of new contributors who've never worked in the community before may not be the most realistic15:44
dhellmannpersia : sure. operators and users are going to be an increasingly important source of contributions15:45
*** emagana has quit IRC15:45
ttxmtreinish: contributors from Asia in particular needed more guidance on where to contribute to be useful15:45
cdentyeah, there is a flip side which is the onramping and ongoing complexity, which is another well known problem that we need more strategies for15:45
ttxmtreinish: also stewardship can start with something as simple as bug czarring15:46
dhellmannbut my point was to discuss the issue with the board, because I think this is an area where the board can help by raising awareness of the problem and incentivizing companies to address it (to bring in cdent's point about economics)15:46
persiadhellmann: Don't discount vendors and software providers.  The former have been our backbone, but they will still continue for less advanced users.  The latter will probably become more important as the user/operator contribution level increases, as they provide ways to purchase fractional FTEs.15:46
ttxBug czars have traditionally been newcomers15:46
dhellmannpersia : it's like you're channeling me from 6 months ago :-)15:46
ttxand proved itself a great on-ramp to more involvement15:46
fungione (trivial) thing which was suggested as a topic at an earlier office hour... getting agreement from the board that it's okay to fix that extremely confusing typo in the technical committee member policy (a foundation bylaws appendix)15:47
mtreinishttx: perhaps, but I've seen many times people working on wrangling bugs who don't understand the process or the project well enough and make a mess of things15:47
persiadhellmann: I'm just repeating things said in Austin :)15:47
dhellmannmaybe I've been saying those things for more than 6 months :-)15:48
mtreinishttx: I'm just saying we can put things like we need X on the list, but we want to make sure we provide the tools for people with limited experience to be successful doing X15:48
ttxmtreinish: yes15:49
ttxStill I'd very much like to get to that Denver meeting with more than 2 things on the list15:49
ttxotherwise it's a bit counterproductive15:49
ttxfungi: how is infra doing these days ?15:50
*** emagana has joined #openstack-tc15:50
ttxShould we add it as a top-5 ?15:50
mtreinishttx: I always could use help on my gate data analysis tooling. Openstack-health hasn't had a real patch in months15:50
fungittx: infra's scrambling to keep afloat15:51
dhellmannthat sounds like a "yes"15:51
fungi(which is one of the reasons i'm not so active in office hour at the moment)15:51
ttxyes15:51
dims++ to add infra15:51
ttxI cited infra in my talks in Asia15:51
fungimuch appreciated15:51
ttxbut I think it's time to add it15:52
smcginnis++15:52
ttxfungi: would you be willing to draft something ?15:52
dimsfungi : are about cloud capacity?15:52
fungii expect this to get much better once the herculean push for zuul v3 is complete, but we're really worn thin at present15:52
dimss/are/how/15:52
fungidims: brief alignment of unfortunate events with different providers aside, i think we're generally okay on test resource capacity (not this week obviously, but in general)15:53
dhellmannmore cloud capacity is always good, but I think our #1 issue is somehow encouraging contributing companies to give resources to community needs beyond their own immediate concerns15:53
sdaguefungi: do we have more capacity coming online?15:54
fungittx: draft something for the top 5 most wanted list, or for the meeting agenda in denver, or what?15:54
sdaguebecause going back into june, we were very often going to the full 160015:54
ttxfungi: top-5 list15:54
sdagueand while we don't use that all the time, hitting the hard limit causes really large delays in result turn around time, including fixing the gate itself15:54
ttxdhellmann: part of it is not having any free time beyond firefighting to engage with potential donors15:55
smcginnisdhellmann: I see that issue - contributing beyond their own immediate concerns.15:55
fungisdague: we have a couple new donors in the wings, but current struggles are because our vouchers for ovh expired and they suspended our service just days after osic went offline, we discovered that we're apparently very bandwidth-constrained in infra-cloud for internet uplink limiting our effective utilization there, network issues in a region in citycloud have had it offline for us for a few weeks...15:55
fungisdague: so fixing at least some of those would probably get things back on track15:56
fungibut it's almost all i can do to keep up with the communication with different providers and trouble tickets sometimes15:56
sdaguettx: well, it's notable that the entire upgrade testing space is me in small windows, and definitely can't extend past the current boundaries. If upgrade testing remains important we definitely need more folks engaged there.15:57
sdaguefungi: ok, cool15:57
ttxsdague: should we add upgrade testing, or more generally QA ?15:58
fungittx: i'll come up with something for the hitlist. i may need to make it a little vague because our specific needs shift pretty constantly (so our actual need is for experienced and talented generalists)15:58
ttxsdague: mtreinish above said he would welcome help in the gate data analysis tooling maintenance15:58
mtreinishfungi: I'd gladly donate all my closet cloud resources to the gate. But it's probably too slow and limited capacity for it to be useful15:58
sdaguettx: I also need to be honest and say that shepherding unified limits is beyond my ability to commit at this point. I did look around for some other possible folks to drive that, but people were finding homes at the time.15:58
ttxmtreinish: where would we get our TV from?15:59
sdaguettx: I think the more specific you get, with how people could have an impact, the more likely we'll get engagement15:59
fungimtreinish: i wouldn't want it to keep you awake all night either, plus it's summer15:59
mtreinishttx: heh, there are other computers :)15:59
mtreinishfungi: I'd have to rig up some better cooling, but it would be doable if necessary16:00
ttx"Infra donors: OVH, CityCloud and Matt Treinish"16:01
dhellmannanother way to address the capacity problem would be to look for ways to run fewer jobs. I know nova does some creative path regex work. Maybe we can expand that to other project teams.16:01
mtreinishttx: haha, I like the way that sounds16:01
fungi"...and Matt Treinish's clothes closet" has a better ring to it16:01
dhellmannthere's also that ML thread that said something about trove running 20+ check jobs -- are all of those really necessary?16:02
dhellmannand are there other similar cases of excess we can look at?16:02
mtreinishdhellmann: none of them are, they're non-voting and don't even work16:02
dhellmanns/excess/potential excess/16:02
mtreinishdhellmann: there is a lot of trimming that can be done16:02
mtreinishbut it requires someone to dig into all the details16:02
dhellmannwell, they said they were working on getting them to be voting, so I don't want to just turn them all off, but can some be merged?16:02
mtreinishwhich is very time consuming16:02
*** dklyle is now known as david-lyle16:02
dhellmannmtreinish : perhaps we need to make it a requirement for queens?16:03
ttxfungi: re top-5 list -- mention geographic coverage ?16:03
fungidhellmann: i think the trove situation is mostly because they can't effectively check everything they need to in a single job due to the time it takes, so they've split it up into lots of shorter jobs (though that does come with setup overhead). something about the slowness of having to test applications in virtual machines running on other virtual machines16:03
fungittx: great point, definitely16:03
dhellmannfungi : ok, that's the sort of detail I was missing, thanks16:03
sdagueso, I think if you are going to address the issue of CI use time, the first step would be accounting for it16:03
sdagueper project, what is the CI use time per patch landed16:04
mtreinishback to gate data analysis tooling :)16:04
sdaguemtreinish: yeh, but that's a very specific problem16:04
dhellmannyeah, I don't want to design the whole solution here, just see if that's something worth looking into.16:04
sdaguedhellmann: yeh, the issue ends up being that a lot of human guessing work happens on projects that we think are outliers16:05
sdaguebut it's really at best recapturing 1% of resources16:05
dhellmannbecause if we say it's something we want done, we can recruit someone to do the work16:05
dhellmannthe outreachy program is starting up and looking for projects; we're going to ask the board to ask companies to give people time to do this sort of work; etc.16:05
fungiyeah, while the skip-if regexes may reduce job resource consumption for certain changes, it also adds a ton of cognitive complexity to the configuration and regularly breaks/drops jobs for unrelated projects if we're not hypervigilant about regex review and inheritance/precedence. so being able to measure how much it's really reducing job resource consumption would be useful to avoid wasting time and16:05
fungiincreasing risk there for no appreciable gain16:05
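The skip-if idea discussed above can be sketched in a few lines of Python. This is only an illustration of the general technique (skip a job when every changed file matches a path regex); the pattern, the function name and the file paths are hypothetical and are not taken from nova's or Zuul's actual configuration.

```python
import re

# Hypothetical pattern: paths whose changes should not need an integration run.
SKIP_IF_ALL_FILES_MATCH = re.compile(r"^(doc/.*|releasenotes/.*|.*\.rst)$")


def job_should_run(changed_files):
    """Run the job unless every changed file matches the skip pattern."""
    if not changed_files:
        # No file list available: err on the side of running the job.
        return True
    return not all(SKIP_IF_ALL_FILES_MATCH.match(path) for path in changed_files)


print(job_should_run(["doc/source/index.rst"]))                 # docs-only change
print(job_should_run(["doc/source/index.rst", "nova/api.py"]))  # also touches code
```

Even in this toy form, a small mistake in the pattern silently drops the job for changes that needed it, which is the review burden described above.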
sdaguefungi: right, and set norms16:06
sdaguewhich is like the acceptable number of CI hours classes of projects should be shooting for16:06
sdagueper patch16:06
dhellmannsure, like I said, I don't want to solve the problem right now, I want to agree that it's worth looking into16:06
sdaguedhellmann: I think that given our growth curve, it is worth looking into16:07
sdaguethe previous assumption that we'd always have headroom is not valid16:07
fungiwe ought to be able to measure that, though as with many things (and i'm tired of saying it), becomes easier in zuul v3. the database reporter will be a wealth of absolute timing data we can mine and analyze16:07
dhellmannoops, docs team meeting is starting, I need to head over there16:08
fungisort of like what we have for subunit2sql now, but for the outer layer16:08
sdaguefungi: sure, but I wouldn't wait to get started on it if there were volunteers16:08
fungiagreed16:08
fungiit's more a question of whether we have the relevant data. best case we might be able to mine it out of zuul debug logs today16:09
sdagueyeh16:09
fungiwe can get builds per change and their start/end times from there16:09
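A minimal sketch of the accounting proposed above (CI time per patch landed, per project), assuming build records with start and end times have already been extracted from some source such as the zuul debug logs. The record shape, the function name and the sample projects are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def ci_hours_per_merged_change(builds, merged_changes):
    """Return {project: total CI hours / number of changes that landed}.

    builds: iterable of (project, change_id, start, end) tuples (assumed shape).
    merged_changes: {project: count of changes merged in the same period}.
    """
    seconds = defaultdict(float)
    for project, _change_id, start, end in builds:
        # Every build counts, including those for changes that never merged.
        seconds[project] += (end - start).total_seconds()
    return {
        project: seconds[project] / 3600.0 / merged
        for project, merged in merged_changes.items()
        if merged
    }


builds = [
    ("projA", 1, datetime(2017, 8, 10, 12, 0), datetime(2017, 8, 10, 13, 0)),
    ("projA", 2, datetime(2017, 8, 10, 12, 0), datetime(2017, 8, 10, 12, 30)),
    ("projB", 3, datetime(2017, 8, 10, 9, 0), datetime(2017, 8, 10, 9, 15)),
]
print(ci_hours_per_merged_change(builds, {"projA": 1, "projB": 1}))
```

Dividing by merged changes rather than by all changes is a deliberate choice here: it charges the cost of abandoned or repeatedly rechecked patches to the patches that actually landed.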
mtreinishfungi: we have the data for the gate queue at least in subunit2sql already, I was just working on drafting a small graphing script to make some bar graphs by project16:10
sdaguemtreinish: you need the check queue16:10
sdaguethat's where most of the CI time is spent16:10
mtreinishsdague: we do, but at least for right now we don't have that data aggregated anywhere16:11
ttxOnce we are past RC1 I'm considering writing a "Beware of zombies" contribution metrics post16:11
fungiwell, also subunit2sql doesn't get all jobs, and for the ones which do provide subunit it often only covers the job payload so there's some (variable depending on the job) overhead to take into account16:11
sdaguefungi: ++16:11
sdagueI would caution extending subunit2sql here16:12
sdaguehonestly, you could get the data almost from scraping gerrit comments16:12
sdagueyou'd miss a few conditions16:12
mtreinishfungi: there is, but at least for dsvm jobs we also get the devstack portion. So all that's missing from there is the pre-devstack setup (which can take a lot of time in some situations)16:12
sdaguebut you'd have a 90% solution16:12
fungiso we may be able to get some prelim numbers out of the subunit2sql data and i'd be cool with it as a point of interest, but raw data from zuul will be more accurate (whether that's gleaned from debug logs or queried from the v3 reports db)16:12
mtreinishbut subunit2sql is an incomplete set of data, it was just easy to do and would give us a view16:13
fungiyeah, i certainly don't object and it may show us some things we aren't expecting16:13
sdagueI would have caution there without the check queue, you are getting a < 10% dataset16:14
sdagueso it's going to be really easy to make decisions on the wrong picture16:14
fungialso it's a rarefied set of changes which have already passed check and review16:16
fungiwho knows how much trash is being put in check which never sees the gate?16:16
sdaguefungi: yep, and all the -nv on check16:19
mtreinishthat trove case mriedem pointed out on the ML is a good example of that16:21
sdaguesure, but even neutron is running > 15 tempest jobs in check16:21
sdaguethere is also an interesting edge condition I was noticing this morning on the tripleo gate queue (that was quite large)16:22
sdaguethe tripleo ci nodes / jobs are pretty unstable (or at least have been)16:22
sdaguebut those patches, and the things they gate with, include a ton of standard ci jobs16:22
sdaguewhich means every time there is a gate reset because of the tripleo-ci16:23
sdagueit's snagging a ton of high priority nodes from the main CI16:23
sdagueall the puppet changes for instance and mistral are in that cogate16:24
sdagueI think that behavior wasn't noticed until we became so node constrained16:24
fungithe stuff running in the tripleo queue in the normal gate pipeline actually runs on our nodes (multi-node jobs i think?)16:25
fungithere's a separate check-tripleo or whatever pipeline which runs jobs on the tripleo test cloud16:26
sdaguesure16:26
sdaguebut it cogates with puppet16:26
fungibut the tripleo jobs running on our normal nodes are definitely also very unstable16:26
fungiand yes, it's getting queued with all the other repos whose teams have agreed to gate on tripleo jobs16:27
sdagueright, and the gate queue has priority over check16:27
fungicorrect (though at least we won't assign nodes to more than 20 changes in a gate queue if they're failing frequently)16:27
sdaguesure, but puppet-nova is 15 nodes (for instance)16:29
sdague6 of them tempest runs16:29
sdague20 * 15 == 30016:29
sdagueso worst case that's currently nearly half of our capacity16:29
sdaguereal world 20 node deep case (what was constant this week) my guess is basically locking up 100 nodes16:30
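The worst-case arithmetic above can be written out as a small sketch. The window of 20 changes and the 15 nodes per puppet-nova change come from the discussion; the function name and the queue of 25 changes are made up for illustration.

```python
def worst_case_nodes_held(nodes_per_change, window=20):
    """Sum node demand for the changes inside the active gate window.

    nodes_per_change: node counts for each queued change, head of queue first.
    window: how many changes at the head of the queue are given nodes.
    """
    return sum(nodes_per_change[:window])


# Hypothetical queue: 25 changes that each need 15 nodes.
queue = [15] * 25
print(worst_case_nodes_held(queue))  # 20 * 15
```

Only the first 20 changes count, so a deeper queue does not raise the worst case, but every gate reset makes those nodes start over.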
sdagueit's a new problem, don't want anyone to feel bad about it16:30
sdaguebut it's something that I don't think we ran in such a way in previous releases16:31
openstackgerritHongbin Lu proposed openstack/governance master: Zun completion of python35 goal  https://review.openstack.org/49261516:32
*** hongbin has joined #openstack-tc16:33
*** emagana has quit IRC16:49
*** marst_ has quit IRC16:54
*** marst_ has joined #openstack-tc16:54
mtreinishfungi: hah, well trying to graph all the projects in 1 go I encountered a first using matplotlib: "ValueError: Image size of 180000x2700 pixels is too large. It must be less than 2^16 in each direction."16:54
mtreinishthat's no fun16:55
dimsfungi : glad to hear that16:56
fungimtreinish: hah!16:57
*** dtantsur is now known as dtantsur|afk17:00
*** emagana has joined #openstack-tc17:09
*** emagana has quit IRC17:12
*** emagana has joined #openstack-tc17:13
*** marst_ has quit IRC17:13
mtreinishfungi: interesting I found a bug in the elastic-recheck tests running the graphs: http://logs.openstack.org/42/492342/1/gate/gate-elastic-recheck-python27-ubuntu-xenial/f42c792/console.html#_2017-08-10_15_24_52_02708917:14
mtreinishfungi: because http://i.imgur.com/inMQlBo.png clearly wasn't right17:14
mtreinish(limited it to the top 10 run time sums)17:15
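One way to keep such a graph within a plotting library's size limit is to aggregate first and plot only the largest totals, as described above. The sketch below shows only that aggregation step, in plain Python; it is not the script being discussed, and the function name and input shape are assumptions.

```python
from collections import defaultdict


def top_projects_by_run_time(runs, limit=10):
    """Return the `limit` projects with the largest summed run time.

    runs: iterable of (project, run_time_seconds) pairs (assumed shape).
    """
    totals = defaultdict(float)
    for project, run_time in runs:
        totals[project] += run_time
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


sample = [("projA", 120.0), ("projB", 30.0), ("projA", 60.0), ("projC", 90.0)]
print(top_projects_by_run_time(sample, limit=2))
```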
fungiindeed, that's odd17:15
mtreinishI think watcher has the same problem: http://paste.openstack.org/show/618078/17:17
*** emagana has quit IRC17:18
mtreinishthe run_time is done by taking the first start time and the last stop time for all the tests and using the delta17:18
mtreinishso if something is messing with time I could see that having unintended consequences17:18
mtreinishftr, watcher's unit tests take ~50sec to execute not 25M secs17:19
*** emagana has joined #openstack-tc17:19
mtreinishas amusing as that is :)17:19
*** emagana has quit IRC17:19
*** emagana has joined #openstack-tc17:20
mtreinishfungi: yep looks like both test suites mock datetime during the run. Which is probably messing with the subunit time reporting17:22
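A minimal sketch of the run_time calculation described above (first start time to last stop time), showing how a single timestamp taken from a mocked clock can inflate the result. The timestamps are invented for illustration and the function is not subunit2sql's actual code.

```python
from datetime import datetime, timedelta


def run_time_seconds(tests):
    """Delta between the earliest start and the latest stop of all tests.

    tests: list of (start, stop) datetime pairs.
    """
    first_start = min(start for start, _stop in tests)
    last_stop = max(stop for _start, stop in tests)
    return (last_stop - first_start).total_seconds()


base = datetime(2017, 8, 10, 15, 0, 0)
tests = [
    (base, base + timedelta(seconds=20)),
    (base + timedelta(seconds=20), base + timedelta(seconds=50)),
]
print(run_time_seconds(tests))

# Hypothetical: one test records a stop time from a mocked clock 300 days ahead.
skewed = tests + [(base + timedelta(seconds=10), base + timedelta(days=300))]
print(run_time_seconds(skewed))
```

Because only the two extreme timestamps matter, one bad value is enough to turn a run of under a minute into millions of seconds.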
mtreinishfungi: ignoring those 2 outliers it's a much more reasonable graph: http://i.imgur.com/DkwwTO9.png17:23
mtreinishthat's over the whole db (so 6 months)17:23
fungiwe probably do need someone to think through a datetime mocking solution which won't interfere with subunit reporting though17:25
fungijust as a longer-term thing to keep on the radar17:25
mtreinishyeah, starting that investigation was going to be my post lunch task17:26
*** emagana has quit IRC17:58
fungisdague: to follow up on the capacity questions, rackspace also just bumped our quota by an aggregate of a couple hundred instances across two regions, and the same time that took effect the zuul backlog (jobs waiting/pending) reversed direction from trending upward to trending downward. maybe coincidence, but i like to think we were just on the cusp of being able to keep up18:37
*** cdent has quit IRC18:41
sdaguefungi: we might have been18:44
sdaguewe're still 3 hours wait time on check nodes18:44
sdaguehopefully that burns down18:44
fungiit ought to catch up fairly quickly18:47
fungiwe're at about 2k jobs waiting at present18:48
fungiand running around 850 nodes in use after accounting for boot/delete overhead18:48
fungilooks like we've been dropping the backlog by a steady 500 jobs per hour since the quota bump took effect18:49
fungiso maybe another 4 hours to be fully caught up, if nothing major changes18:49
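The catch-up estimate above is a simple division of backlog by drain rate. A sketch, using the rough figures from the discussion (about 2k jobs waiting, dropping by about 500 per hour); the function name is made up.

```python
def hours_to_clear(backlog_jobs, net_drain_per_hour):
    """Estimate hours until the backlog is empty at a steady net drain rate."""
    if net_drain_per_hour <= 0:
        raise ValueError("backlog is not shrinking; no finite estimate")
    return backlog_jobs / net_drain_per_hour


print(hours_to_clear(2000, 500))
```

The estimate holds only while the net drain rate stays steady, which is the "if nothing major changes" caveat.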
*** openstackgerrit has quit IRC19:03
*** emagana has joined #openstack-tc19:15
*** emagana has quit IRC19:23
*** emagana has joined #openstack-tc19:24
*** emagana has quit IRC19:29
*** emagana has joined #openstack-tc19:36
*** thingee_ has joined #openstack-tc19:58
*** emagana has quit IRC20:46
*** emagana has joined #openstack-tc20:46
*** emagana_ has joined #openstack-tc20:47
*** emagana has quit IRC20:50
*** emagana_ has quit IRC21:07
fungialso, zuul beat the clock on my catch-up estimate. huzzah!23:03
*** hongbin has quit IRC23:21
*** sdague has quit IRC23:47
