fungi | okay, gordc volunteered for telemetry ptl at the last moment, crisis averted | 00:46 |
*** emagana has quit IRC | 00:51 | |
*** emagana has joined #openstack-tc | 00:52 | |
dims | fungi : w00t | 00:53 |
fungi | dims: well, i wouldn't cheer too much until you read http://git.openstack.org/cgit/openstack/election/tree/candidates/queens/Telemetry/gordc.txt | 00:55 |
fungi | sounds like the plan is to wind it down anyway | 00:55 |
*** emagana has quit IRC | 00:56 | |
fungi | and acknowledging that little work is happening there any longer | 00:57 |
dims | fungi : :( | 00:57 |
dims | fungi : i was reading about this go-contributor-workshop to see what we could learn from them - https://blog.golang.org/contributor-workshop | 00:58 |
fungi | neat | 01:02 |
fungi | they did a good job of including tweets from and photos of women participating at the event | 01:04 |
fungi | though trucker hat swag seems like it's reinforcing the brogrammer stereotype a bit... wonder whether they thought that through | 01:04 |
*** RuiChen has joined #openstack-tc | 01:37 | |
*** RuiChen has left #openstack-tc | 01:58 | |
*** dklyle has quit IRC | 02:49 | |
*** david-lyle has joined #openstack-tc | 03:22 | |
*** rmcall has quit IRC | 04:07 | |
*** rmcall has joined #openstack-tc | 04:40 | |
*** david-lyle has quit IRC | 04:44 | |
*** rmcall has quit IRC | 04:45 | |
*** dklyle has joined #openstack-tc | 04:45 | |
*** gcb has joined #openstack-tc | 07:13 | |
*** cdent has joined #openstack-tc | 09:35 | |
*** sdague has joined #openstack-tc | 09:50 | |
*** gcb has quit IRC | 10:04 | |
*** dtantsur|afk is now known as dtantsur | 10:35 | |
dhellmann | fungi, smcginnis, dims : it sounds like we should talk about adding status:maintenance-mode to those projects | 12:43 |
fungi | quite possibly | 12:44 |
cdent | I don’t know that gord’s message is all _that_ depressing. It basically says “it kinda works and isn’t broken enough for people to do anything about it”. That’s an interesting kind of success. | 12:51 |
cdent | The extraction of gnocchi to be something more generic than “just openstack” is also a kind of success. | 12:51 |
cdent | What might be not a success is if the other services aren’t capable of producing info that generic tools can consume | 12:52 |
cdent | (ttx: since you asked about different uses of the term “upstream”, there’s one in this message: http://lists.openstack.org/pipermail/openstack-infra/2017-August/005546.html ) | 12:56 |
smcginnis | It may be the project is just in a stable state and needs someone to keep the lights on. | 13:02 |
smcginnis | If and until it gets superseded by something else or someone has a plan to expand it in some way. | 13:03 |
smcginnis | But status:maintenance-mode probably is an accurate tag for it at this point I suppose. | 13:03 |
cdent | I’d like to see the day when projects are striving to get that tag | 13:05 |
cdent | not thinking of it as bad | 13:05 |
cdent | but yeah, I agree it is the right tag for at least telemetry (now that gnocchi is no longer included) | 13:05 |
ttx | cdent: yeah, in that case we are the users and citycloud is upstream from us -- still a bit confusing I'll admit | 13:07 |
smcginnis | Do we need to hold a TC meeting to select a storlets PTL? Or hash it out between office hours and the ML? | 13:10 |
cdent | ttx: the thing that is different/missing there is that there are no commits | 13:11 |
cdent | smcginnis: start on ml? | 13:11 |
ttx | I think trying to reach out to the former PTL is the first step | 13:11 |
ttx | then maybe ask notmyname if he has an idea of the current status / users | 13:12 |
smcginnis | cdent, ttx: Makes sense to me. | 13:12 |
ttx | Traditionally we waited for the election conclusion, but I guess we can speed up | 13:13 |
smcginnis | Yeah, doesn't seem to be anything gating this one. | 13:13 |
smcginnis | So only two projects with a PTL election. That seems lower than usual. | 13:14 |
ttx | But also only two projects without nominees, which is also lower than usual. Usually more than 2 miss the call | 13:15 |
ttx | I think we had 3 in the last two elections | 13:15 |
smcginnis | That is good. | 13:15 |
ttx | In the storlets case it's getting slow but it's not completely dead either. I'll reach out to Eran Rom and Kota Tsuyuzaki | 13:17 |
cdent | When do we start asking if the concept of PTL, as currently constructed, is sustainable? | 13:21 |
dhellmann | how do you see it as potentially unsustainable? | 13:24 |
ttx | If you can't get a contact point for a project, I think that project should be considered unmaintained | 13:25 |
dhellmann | sure, but that's not the concept of PTL being flawed, is it? | 13:25 |
ttx | because that's what the PTL is, once you remove all the paint | 13:25 |
ttx | dhellmann: no it's not, I agree | 13:25 |
ttx | One issue is that the PTLs are traditionally bad at delegating | 13:26 |
cdent | couple things dhellmann: a) if not many people are wanting to do the job, then that suggests that the job is either hard to do, or not rewarding, or hard to justify to employers, or any of a variety of things. b) have you observed how much some people like mriedem work? we shouldn’t be encouraging that | 13:26 |
dhellmann | that has been a trend, yes | 13:26 |
dhellmann | cdent : yes, see ttx's comment | 13:26 |
cdent | that covers b, but I don’t know that it covers a | 13:27 |
ttx | i.e. everyone else is happy letting "the PTL" do things | 13:27 |
cdent | and I think the lack of delegation is in many cases effect not cause | 13:27 |
dhellmann | the employer justification angle is important, now that employers have figured out that it doesn't mean they can control the project absolutely | 13:27 |
smcginnis | I think for the case of telemetry, it's partly a matter of the perception of what PTL means. | 13:28 |
dtroyer | dhellmann: ++ I have seen that 'PTL' is not nearly the management buzzword it once was | 13:28 |
dhellmann | we've defined more liaison roles, which should make at least some of the delegation lines clearer | 13:29 |
smcginnis | For maintenance or stable projects, I think there is an idea that PTL is a challenge and they are expected to somehow work to "revitalize" the project. | 13:29 |
cdent | smcginnis: I think it is hard to extrapolate about ptls in general from the example of telemetry | 13:29 |
smcginnis | Where we need to make it clear that in those cases it just needs someone to be the point person. | 13:29 |
dhellmann | smcginnis : there's also definitely a general lack of interest in governance from many members of that team | 13:29 |
smcginnis | cdent: Right, not trying to say that applies in all cases. But in the case of so called "dying" projects, it's a different role than being Nova PTL. | 13:29 |
ttx | I don't think any of this is linked to "the PTL" as a concept. It's more a crisis of strategic involvement in projects | 13:30 |
dhellmann | maybe teams with status:maintenance-mode don't need a ptl, but need a "point of contact" role | 13:30 |
ttx | You have fewer people doing janitor work in projects | 13:30 |
dhellmann | ttx: that has been a trend for a while, yes | 13:30 |
ttx | that makes the PTL, as the default janitor, more exposed | 13:30 |
smcginnis | So status:maintenance-mode can have a "janitor" election. :) | 13:30 |
ttx | But replacing the PTL concept won't solve the underlying issue | 13:31 |
cdent | Sorry, I didn’t mean to distract us from that point. | 13:31 |
ttx | which is that it's hard to get people to work on stuff that benefits everyone | 13:31 |
cdent | My question was a much broader bigger picture thing, not related to the issue with some projects being in maintenance mode | 13:31 |
cdent | it was more in reaction to the statement about the small number of elections. My reaction is “of course, who would want that job” | 13:32 |
ttx | cdent: I'm not really concerned by lack of elections. PTL positions are now more inherited through succession planning, which is good | 13:32 |
ttx | i.e. less hostile takeovers, more of a planned transition | 13:33 |
dhellmann | we evolved to that a loong time ago | 13:33 |
smcginnis | Yeah, I see it as a good thing. Fewer companies pushing employees to do it just so they can say "we have X number of PTLs" | 13:33 |
ttx | I would be more concerned if there were dozens of no-candidate projects | 13:33 |
dtroyer | consider too that we have a long history of re-electing folks running for most PTL & TC seats. In the PTL case I think it has to do with avoidance of (perceived) confrontation as much as anything else. Some projects have made a deliberate decision to 'rotate' the PTL so only one steps up to run. | 13:33 |
ttx | 1.6% | 13:34 |
cdent | yes, I get that, but the axe I’m grinding right now is: We need to recognize that working on openstack is becoming increasingly challenging. It’s a problem that is not going away. | 13:34 |
dhellmann | the election process is there as a safety valve, and I don't think we want to eliminate it, but as long as we have people doing the work I'm not that concerned that we don't actually have votes | 13:34 |
smcginnis | cdent: Yeah, I agree it definitely has changed a bit. | 13:34 |
ttx | I mean 3.2% of teams end up with no candidates, and for the second one we are not even sure it's not human error | 13:34 |
dhellmann | cdent : absolutely. we should be looking for ways to make it easier. Do you think there's something we could change about the PTL role to improve things? | 13:36 |
ttx | cdent: totally. 1/ The contributor base evolves, as we can't rely as much on large service providers employing dozens of devs, and 2/ as users get more involved we need to teach them the value of strategic contributions | 13:36 |
ttx | we are in the middle of that transition | 13:37 |
ttx | That said, I see nice signs, like mnaser taking over the Puppet OpenStack PTLship | 13:38 |
cdent | dhellmann: I don’t really know. We’re definitely in a difficult transition right now in some projects. The “dozens of devs” aren’t there but the expectations that are on projects (that _may_ be put there by the projects themselves) remain. | 13:39 |
dhellmann | I agree, we might need to loosen, if not standards, standard practices in some ways | 13:39 |
dhellmann | perhaps emphasizing the communication duties, over the leadership duties, for the PTL role would help in some cases | 13:40 |
smcginnis | We might just need to adjust our own expectations, accepting that things will not move as quickly as they used to. | 13:40 |
ttx | Horizontal teams have been forced to adapt in the past, today vertical teams need to adapt as well | 13:40 |
smcginnis | And that for some projects, it's just going to be reality that there are 2-3 active contributors that just don't get as much accomplished each cycle. | 13:40 |
mugsie | yeah - that could help. I have found that (at least for the Designate team) PTL was more about herding cats and making sure that people know what is going on than about "tech leadership". | 13:41 |
mugsie | that leadership came from the cores and a few dedicated contributors | 13:41 |
smcginnis | mugsie: Yeah, I've found it to be more of an administrative position than anything else. | 13:41 |
mugsie | that said, we were in a bad place for succession for this election until very recently, so we may not be the gold standard | 13:41 |
ttx | PTL used to be about keeping sanity while drinking from the firehose. Now it's about getting essential bases covered | 13:42 |
mugsie | the biggest issue I see for PTLs is the lack of quality reviews, while some contributors dump huge tactical patches. just co-ordinating who is going to review a massive patch chain is hard work nowadays | 13:44 |
smcginnis | I've really been trying to encourage people to get involved; if nothing else, doing reviews has a huge impact. | 13:46 |
smcginnis | I would love to see more non-core reviews happening. | 13:46 |
smcginnis | (And more core reviews too) | 13:46 |
mugsie | smcginnis: ++ | 13:48 |
mugsie | there are one or two non-cores that I am very happy to see a review from, as they are good reviewers - I just wish we had more | 13:48 |
*** marst_ has joined #openstack-tc | 13:54 | |
*** persia has joined #openstack-tc | 14:05 | |
cdent | [t v2a] | 14:15 |
purplerbot | <dtantsur> ENOTMUCHTIME is a common error code nowadays [2017-08-10 14:14:41.151800] [n v2a] | 14:15 |
dtantsur | wut? | 14:15 |
* dtantsur is afraid of cdent now | 14:15 | |
cdent | dtantsur: I was quoting you into here to support something I said above about people being overtasked | 14:16 |
cdent | you know about the transclusion powers of purplerbot? | 14:16 |
dtantsur | no, no idea | 14:16 |
smcginnis | That's something new to me. :) | 14:17 |
cdent | https://anticdent.org/purple-irc-bot.html | 14:17 |
cdent | in the first set of bullet points | 14:18 |
*** lbragstad has quit IRC | 14:28 | |
dhellmann | cdent : is the line reference for the bot some sort of hash? | 14:59 |
*** lbragstad has joined #openstack-tc | 14:59 | |
cdent | dhellmann: it’s a base62 encoding of the first few bytes of a uuid | 15:00 |
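(For illustration, a minimal Python sketch of the nid scheme cdent describes: base62-encoding the first few bytes of a fresh UUID. The alphabet ordering, byte count, and function name are assumptions for the sketch, not purplerbot's actual implementation, and as cdent notes just below, the real tooling also checks for duplicates.)

```python
import uuid

# Assumed digit ordering; base62 implementations vary on this.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def make_nid(n_bytes=3):
    """Base62-encode the first few bytes of a fresh UUID into a short id."""
    value = int.from_bytes(uuid.uuid4().bytes[:n_bytes], "big")
    digits = []
    while value:
        value, rem = divmod(value, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits)) or ALPHABET[0]
```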
fungi | mmm, office hour | 15:00 |
fungi | also, not nearly enough people use base62 | 15:01 |
* lbragstad meanders in with a full cup of coffee | 15:01 | |
persia | Does a change in the wording of an old resolution require a new resolution, or can patches be submitted against old resolutions? Note that this would be a semantically meaningful change. | 15:01 |
cdent | it’s trying to be unique but small, but the tooling checks for dupes | 15:01 |
dhellmann | cdent : interesting. I guess it would be hard to express something like "that thing cdent said in $channel $n lines back" | 15:01 |
ttx | Factoid: Nova current activity is not significantly lower than during Mitaka: http://imgur.com/a/gX9ox | 15:01 |
dhellmann | cdent : ooo, though "that thing cdent said in $channel that matches $regex" would be useful | 15:01 |
cdent | yes, it would, and the functionality could be added, but I never bothered because [t v2a] | 15:02 |
purplerbot | <dtantsur> ENOTMUCHTIME is a common error code nowadays [2017-08-10 14:14:41.151800] [n v2a] | 15:02 |
dhellmann | haha | 15:02 |
cdent | the use of the numbers and the naming of things “nids” all goes back to doug engelbart stuff | 15:03 |
dhellmann | persia : I think in the past we've done a new resolution and explicitly marked the old one as deprecated. There should be examples of that in the repo. | 15:03 |
cdent | the “real” stuff that I used a few years ago integrated wikis, mailing list archives and irc | 15:04 |
dhellmann | persia : https://governance.openstack.org/tc/resolutions/superseded/index.html | 15:04 |
dhellmann | "superseded" not "deprecated" | 15:04 |
persia | dhellmann: Thanks. So the appropriate procedure would be to raise the complaint to the TC, possibly assist with writing a new resolution, etc.? | 15:04 |
dhellmann | persia : that sounds like a good approach -- starting the conversation is always a good way to begin :-) | 15:05 |
dhellmann | would you like to raise a topic now? | 15:05 |
persia | Yes :) | 15:05 |
* dtantsur is now a classic writer, I guess | 15:05 | |
cdent | dtantsur++ | 15:05 |
persia | In 20141128-elections-process-for-leaderless-programs, the phrase "As soon as possible" is used. Having just been subject to this, I found every minute of the 112 it took worrisome, because of the time pressure. I'd like to request that "promptly" or similar be considered as an alternative. | 15:06 |
persia | (well, not every minute, because I didn't notice the phrase for the first 60 or 70) | 15:06 |
dhellmann | https://governance.openstack.org/tc/resolutions/20141128-elections-process-for-leaderless-programs.html for reference | 15:07 |
dhellmann | hmm, yes | 15:07 |
dhellmann | I think the intent there is to not have you wait until the actual election is over to let us know | 15:07 |
dhellmann | but it's not like we were sitting around waiting for an email yesterday | 15:08 |
dhellmann | rephrasing that seems like a reasonable change | 15:08 |
dhellmann | and in this case, I don't think the spirit of the resolution changes so I wouldn't go through the superseded process | 15:08 |
cdent | agreed | 15:08 |
persia | My understanding was that as long as you knew before the next office hour after the nomination period concluded, nobody would really notice the time that may have passed. | 15:08 |
dhellmann | supersession? I don't know the right term there -- just patch the existing file | 15:08 |
dhellmann | yeah, that seems about right | 15:09 |
dhellmann | I mean, someone following more closely might have already figured it out or expressed concern (telemetry came up shortly before the deadline) | 15:09 |
* persia will prepare a change for "as soon as possible" -> "promptly" for consideration in gerrit, just patching the old resolution, rather than having a new one. | 15:09 | |
dhellmann | but the "formal" notification doesn't have to be immediate | 15:09 |
dhellmann | persia : ++ | 15:09 |
dhellmann | thanks for raising that, and I'm sorry if any poor wording choices introduced unnecessary stress into your experience | 15:10 |
dhellmann | thank you for acting as an election official :-) | 15:10 |
fungi | persia: i would even, personally, be fine if it just said "before the conclusion of the election" or something like that | 15:10 |
dhellmann | that seems like a reasonable change, too | 15:10 |
fungi | should be able to take your time, we just need to know before the current ptl's term is up, i think | 15:11 |
persia | fungi: I'll use that instead. Having a clear bound will probably be easier to understand for those needing to raise the issue in the future. | 15:11 |
cdent | i think fungi’s suggestion is better than promptly as it is more concrete | 15:11 |
cdent | jinx | 15:11 |
dhellmann | I think the point is we want to try to find someone to do it before the end of the election, so waiting until the day before isn't really helpful, but being concrete is good. | 15:11 |
ttx | cdent: IIRC you were asking for a version of my graph only including "main", more mature projects: http://imgur.com/a/i6VJk | 15:11 |
smcginnis | ttx: good data | 15:12 |
ttx | Not completely crazy curve for projects past their feature development peak | 15:12 |
fungi | dhellmann: well, probably a few days of no ptl (or of the previous ptl sticking it out) while we find a replacement or decide to drop it as an official team is not a huge issue either | 15:12 |
cdent | yeah, that’s useful | 15:12 |
ttx | Also the peak is reached before the current "crisis" | 15:12 |
dhellmann | fungi : true | 15:12 |
ttx | i.e. peak at Mitaka, not Newton | 15:13 |
dhellmann | it would be interesting to compare that to patchsets proposed per day | 15:13 |
persia | Although many folk may have been consuming Mitaka during Newton, causing delayed perception of slowing. | 15:14 |
cdent | dhellmann++ | 15:14 |
ttx | hmm, let's see | 15:14 |
* ttx hacks something real quick | 15:14 | |
ttx | I expect a slightly translated similar curve, but let's see | 15:16 |
fungi | cdent: on your point about the meaning of "upstream" and "downstream" i think it also depends a lot on context, as your most recent example shows. it's mostly a producer vs consumer distinction (distros are downstream from our development work, operators are perhaps downstream from distros and api users like our infrastructure can be downstream from service providers donating resources to our cause) | 15:16 |
cdent | While that’s going, I responded to alan clark’s request for agenda items by pointing him at the second section of https://anticdent.org/tc-report-32.html He agreed that some kind of “top of the mind” discussion about the state of the universe would be good, but it would be useful to have some guiding questions prepared. I said I’d check in with everyone here about that. | 15:16 |
fungi | so when distros talk about "upstream" they typically mean producers of software they're packaging, which is the most common context a lot of us previously involved in distro work hear it in | 15:16 |
ttx | dhellmann: http://imgur.com/a/QDnSD | 15:17 |
dhellmann | ttx, I don't suppose you can get those onto the same graph? | 15:18 |
cdent | fungi: yeah, agreed. The part that was confusing for me was the idea that user committee could be “upstream” if it was an openstack committee (since openstack at large is “upstream”). | 15:18 |
ttx | dhellmann: I can but it will take me more time :) | 15:18 |
dhellmann | the shapes look about the same except for the bump around havana | 15:18 |
ttx | yes around havana we had a LOT of tactical contributions we just could not absorb | 15:18 |
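(As an aside: per-day activity counts like the ones in ttx's graphs can be approximated from a local clone. A rough sketch, assuming "activity" means commits landed per day; ttx's actual methodology isn't shown in the log.)

```python
import collections
import datetime
import subprocess

def commits_per_day(repo_path):
    """Count commits per day using committer timestamps from `git log`."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--pretty=%ct"],
        capture_output=True, text=True, check=True,
    ).stdout
    return collections.Counter(
        datetime.date.fromtimestamp(int(ts)) for ts in out.split()
    )
```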
ttx | cdent: btw you made it to LWN quotes section today | 15:19 |
cdent | oh dear, from saying what? | 15:19 |
ttx | that was about your retrospective proposal | 15:20 |
* ttx fetches quote | 15:20 | |
ttx | cdent: are you explicitly sending them a copy of your email ? | 15:20 |
cdent | ttx sending who what email? | 15:21 |
openstackgerrit | Felipe Monteiro proposed openstack/governance master: Mark Murano complete for Queens policy in code goal https://review.openstack.org/492573 | 15:21 |
ttx | cdent: send LWN a copy of your TC report email ? They link to it every week | 15:21 |
ttx | https://lwn.net/Articles/730191/ | 15:21 |
cdent | Oh, no. I didn’t know that was happening. | 15:21 |
ttx | is that accessible ^ | 15:22 |
*** emagana has joined #openstack-tc | 15:22 | |
ttx | They probably pick it up from the Planet then | 15:22 |
cdent | yeah, that works, thanks | 15:22 |
ttx | since they link to your blog | 15:22 |
persia | On "upstream": some projects find using "mainline" to describe themselves useful when they consider that there is no further "upstream" from which they are deriving. | 15:23 |
cdent | I shall try to make sure the wider audience does not curtail my color commentary. | 15:23 |
cdent | anyway back to [t 1xSU] if possible. Does anyone have anything to add? | 15:24 |
purplerbot | <cdent> While that’s going, I responded to alan clark’s request for agenda items by pointing him at the second section of https://anticdent.org/tc-report-32.html He agreed that some kind of “top of the mind” discussion about the state of the universe would be good, but it would be useful to have some guiding questions prepared. I said I’d check in with everyone here about that. [2017-08-10 15:16:50.941642] [n 1xSU] | 15:24 |
dhellmann | I thought the plan was to go through the working groups created in boston at the march meeting? or do we consider that "done"? | 15:26 |
cdent | I think that remains on the agenda. Alan asked for additional agenda items. | 15:26 |
cdent | And I was thinking that some more general talk might be useful in shaping the working group activity | 15:26 |
ttx | persia: we could add language directly in the charter about the TC having the power to appoint PTLs in case nobody nominates themselves, since the charter talks about elections | 15:27 |
dhellmann | it might. my impression is that we talk in generalities a *lot* and that more specifics would help | 15:27 |
cdent | dhellmann: in that case, maybe you and the rest of the TC have already done so, but I certainly have not | 15:28 |
cdent | My impression is that the issues I was talking about earlier (working on openstack is too damn hard) are not well understood despite being well known, not really on people’s agendas, and I can’t carry on not doing something about it | 15:29 |
dhellmann | cdent : sure. and that's an example of something where we could be more specific and less general | 15:29 |
persia | ttx: My fear with wider exposure of that power is that it may create the impression that one becomes PTL initially by arranging TC appointment. While I don't want to reduce opportunities for free drinks and swag, I imagine the TC mostly wants projects to organically produce PTLs, and that appointment should only be an exceptional process (more the process of helping the project to self-select a PTL than the TC specifically controlling PTL selection). | 15:29 |
dhellmann | less "it's hard" and more "it's hard because X" | 15:29 |
cdent | people have to say out loud “it’s hard” before they can say why | 15:29 |
ttx | persia: yeah, that's fair | 15:29 |
cdent | The insistence on coming prepared with the reasons for everything is a hugely limiting factor in our discussions | 15:30 |
ttx | dhellmann: yes, plan in September is to go through the various workstreams so that they can expose progress (or lack thereof) | 15:30 |
cdent | Thus the notion of a more retrospective oriented thing | 15:30 |
cdent | EmilienM said he’d be willing to structure something like that, if people felt it was appropriate | 15:30 |
openstackgerrit | Emmet Hikory proposed openstack/governance master: Amend leaderless program resolution timeframe https://review.openstack.org/492578 | 15:31 |
dhellmann | cdent : I would like to avoid a fruitless afternoon of complaining about random things, so if we can focus on a small number and actually talk about trying to address them instead of just listing them all out again, that may be more productive | 15:31 |
dhellmann | structure would be good in either case | 15:32 |
ttx | yes, I feel like there was a lot of complaining about random things in the past, and that didn't really help much | 15:32 |
cdent | yes, that’s why I’m bringing up the topic here so we can formulate the questions that would help shape the conversation | 15:32 |
ttx | I don't feel like there was ever a shortage of people saying "it's hard" out loud | 15:33 |
cdent | I think there are plenty of people who say that making openstack into what they want is hard | 15:33 |
cdent | but I’m less clear on the labor issues | 15:34 |
persia | The "it's hard"s that I have heard often seem to be very specific to individual circumstances. Each has a solution, but all together cannot be solved together. | 15:34 |
cdent | As I said to smcginnis earlier: If openstack was a single corporate thing, then labor would say “oi, we’re overtasked, get some people in here or give us less to do” but there are no mechanisms for that in our collaborative env | 15:34 |
fungi | that's sort of why we did the exercise with the board, tc and uc in boston... to attempt to pin down predominant opinions on what's hard and where limited available effort should be focused to make the most impact | 15:34 |
dhellmann | cdent : funny, I have a blog post draft ready to go up when I return from pto that talks about that exactly | 15:35 |
smcginnis | fungi: And the top 5 help wanted was a result of that, right? | 15:35 |
fungi | a result | 15:35 |
cdent | dhellmann: I look forward to it | 15:35 |
persia | cdent: How are there no such mechanisms? Can PTLs not report their teams as understaffed relative to requirements in a forum that contributing organisations can consume when determining how to allocate contributed FTEs? | 15:35 |
cdent | persia: I think something like that is going on all the time and not working? | 15:36 |
cdent | Things like the top 5 list are helpful, and things like it will continue to be. | 15:36 |
persia | cdent: Yes. But that it isn't working is different than that there are no mechanisms. I don't believe it would work better if OpenStack was managed, rather than governed. | 15:36 |
cdent | I’m not suggesting that openstack _should_ be managed | 15:36 |
fungi | for example, one of the top 5 outcomes was that openstack is unnecessarily complex, so work is starting to remove less-used, incomplete or deprecated features, be more clear about what configurations we actually test/support, shed some projects which aren't bringing anything to the table to help the overall picture, et cetera | 15:37 |
cdent | I’m simply trying to identify that there are issues | 15:37 |
persia | cdent: Apologies. Took "If openstack was a single corporate thing" out of context. | 15:37 |
* ttx drinks herbal tea from a legacy HP Cloud Services cup | 15:37 | |
cdent | I _like_ very much that openstack is a collaborative affair, but because it is at the same time economically driven, some of the functions for improvement and change are difficult | 15:38 |
fungi | ttx: an antique! | 15:38 |
ttx | They pivoted at least 3 times since this one | 15:38 |
dhellmann | cdent : that topic of finding resources to work on things that benefit everyone might lead to a useful conversation in the board meeting | 15:39 |
cdent | In truth, I think it’s the only topic worth talking about. | 15:40 |
*** emagana has quit IRC | 15:40 | |
*** emagana has joined #openstack-tc | 15:41 | |
persia | cdent: The financial situation is indeed complex. Because of the nature of what I do, two things I hear often from contributing orgs are "We can't hire anyone for openstack except at insane salaries" and "We do not believe that paying for openstack feature development is useful." | 15:41 |
* dims peeks | 15:41 | |
dhellmann | so, how do we frame that conversation constructively? | 15:41 |
fungi | persia: feature development seems like the last thing we need. bug fixing and stabilization of current features on the other hand would be stellar ;) | 15:42 |
fungi | we could also, as i mentioned, use some help ripping out old/broken bits which bring more complexity than usefulness | 15:42 |
ttx | dhellmann: I was thinking we could present the top-5 list and say "now what" | 15:43 |
persia | fungi: Some of my principals define "feature" to be things like "something to mean we don't need to reboot all the machines in the substrate with cron to continue to provide services". Not that these are necessarily mature users or anything :) | 15:43 |
ttx | We might want to add a couple more things to the list before we do that though | 15:43 |
dhellmann | ttx: there are 2 items on our top 5 list | 15:43 |
fungi | persia: heh, nice! | 15:43 |
ttx | How about adding "project stewardship roles" to the top-5 ? | 15:43 |
ttx | Like people taking on PTL, release liaison... roles | 15:43 |
dhellmann | at the board meeting at the last summit I raised the point of asking prospective new gold members about their commitment to giving contributors time to become leaders within the community, and not just work on tactical tasks | 15:44 |
mtreinish | ttx: what is the target audience for the top 5 list? | 15:44 |
dhellmann | perhaps we can extend that to existing members? | 15:44 |
ttx | We may not lack PTLs yet, but we definitely lack PTL-like stewardship roles the PTL could delegate to | 15:44 |
ttx | mtreinish: contributing organizations | 15:44 |
persia | dhellmann: For discussion regarding asking for more resources, I'd suggest asking the board for help in communicating to operators and vendors that contribution to maintenance is essential to continued availability of openstack, rather than asking the board for the resources directly. | 15:44 |
mtreinish | ttx: because saying that but getting a bunch of new contributors who've never worked in the community before may not be the most realistic | 15:44 |
dhellmann | persia : sure. operators and users are going to be an increasingly important source of contributions | 15:45 |
*** emagana has quit IRC | 15:45 | |
ttx | mtreinish: contributors from Asia in particular needed more guidance on where to contribute to be useful | 15:45 |
cdent | yeah, there is a flip side which is the onramping and ongoing complexity, which is another well known problem that we need more strategies for | 15:45 |
ttx | mtreinish: also stewardship can start with something as simple as bug czarring | 15:46 |
dhellmann | but my point was to discuss the issue with the board, because I think this is an area where the board can help by raising awareness of the problem and incentivizing companies to address it (to bring in cdent's point about economics) | 15:46 |
persia | dhellmann: Don't discount vendors and software providers. The former have been our backbone, but they will still continue for less advanced users. The latter will probably become more important as the user/operator contribution level increases, as they provide ways to purchase fractional FTEs. | 15:46 |
ttx | Bug czars have traditionally been newcomers | 15:46 |
dhellmann | persia : it's like you're channeling me from 6 months ago :-) | 15:46 |
ttx | and that has proved itself a great on-ramp to more involvement | 15:46 |
fungi | one (trivial) thing which was suggested as a topic at an earlier office hour... getting agreement from the board that it's okay to fix that extremely confusing typo in the technical committee member policy (a foundation bylaws appendix) | 15:47 |
mtreinish | ttx: perhaps, but I've seen many times people working on wrangling bugs who don't understand the process or the project well enough and make a mess of things | 15:47 |
persia | dhellmann: I'm just repeating things said in Austin :) | 15:47 |
dhellmann | maybe I've been saying those things for more than 6 months :-) | 15:48 |
mtreinish | ttx: I'm just saying we can put things like we need X on the list, but we want to make sure we provide the tools for people with limited experience to be successful doing X | 15:48 |
ttx | mtreinish: yes | 15:49 |
ttx | Still I'd very much like to get to that Denver meeting with more than 2 things on the list | 15:49 |
ttx | otherwise it's a bit counterproductive | 15:49 |
ttx | fungi: how is infra doing these days? | 15:50 |
*** emagana has joined #openstack-tc | 15:50 | |
ttx | Should we add it as a top-5 ? | 15:50 |
mtreinish | ttx: I always could use help on my gate data analysis tooling. Openstack-health hasn't had a real patch in months | 15:50 |
fungi | ttx: infra's scrambling to keep afloat | 15:51 |
dhellmann | that sounds like a "yes" | 15:51 |
fungi | (which is one of the reasons i'm not so active in office hour at the moment) | 15:51 |
ttx | yes | 15:51 |
dims | ++ to add infra | 15:51 |
ttx | I cited infra in my talks in Asia | 15:51 |
fungi | much appreciated | 15:51 |
ttx | but I think it's time to add it | 15:52 |
smcginnis | ++ | 15:52 |
ttx | fungi: would you be willing to draft something ? | 15:52 |
dims | fungi : how about cloud capacity? | 15:52 |
fungi | i expect this to get much better once the herculean push for zuul v3 is complete, but we're really worn thin at present | 15:52 |
fungi | dims: brief alignment of unfortunate events with different providers aside, i think we're generally okay on test resource capacity (not this week obviously, but in general) | 15:53 |
dhellmann | more cloud capacity is always good, but I think our #1 issue is somehow encouraging contributing companies to give resources to community needs beyond their own immediate concerns | 15:53 |
sdague | fungi: do we have more capacity coming online? | 15:54 |
fungi | ttx: draft something for the top 5 most wanted list, or for the meeting agenda in denver, or what? | 15:54 |
sdague | because going back into june, we were very often going to the full 1600 | 15:54 |
ttx | fungi: top-5 list | 15:54 |
sdague | and while we don't use that all the time, hitting the hard limit causes really large delays in result turn around time, including fixing the gate itself | 15:54 |
ttx | dhellmann: part of it is not having any free time beyond firefighting to engage with potential donors | 15:55 |
smcginnis | dhellmann: I see that issue - contributing beyond their own immediate concerns. | 15:55 |
fungi | sdague: we have a couple new donors in the wings, but current struggles are because our vouchers for ovh expired and they suspended our service just days after osic went offline, we discovered that we're apparently very bandwidth-constrained in infra-cloud for internet uplink limiting our effective utilization there, network issues in a region in citycloud have had it offline for us for a few weeks... | 15:55 |
fungi | sdague: so fixing at least some of those would probably get things back on track | 15:56 |
fungi | but it's almost all i can do to keep up with the communication with different providers and trouble tickets sometimes | 15:56 |
sdague | ttx: well, it's notable that the entire upgrade testing space is me in small windows, and definitely can't extend past the current boundaries. If upgrade testing remains important we definitely need more folks engaged there. | 15:57 |
sdague | fungi: ok, cool | 15:57 |
ttx | sdague: should we add upgrade testing, or more generally QA ? | 15:58 |
fungi | ttx: i'll come up with something for the hitlist. i may need to make it a little vague because our specific needs shift pretty constantly (so our actual need is for experienced and talented generalists) | 15:58 |
ttx | sdague: mtreinish above said he would welcome help in the gate data analysis tooling maintenance | 15:58 |
mtreinish | fungi: I'd gladly donate all my closet cloud resources to the gate. But it's probably too slow and limited capacity for it to be useful | 15:58 |
sdague | ttx: I also need to be honest and say that shepherding unified limits is beyond my ability to commit at this point. I did look around for some other possible folks to drive that, but people were finding homes at the time. | 15:58 |
ttx | mtreinish: where would we get our TV from? | 15:59 |
sdague | ttx: I think the more specific you get, with how people could have an impact, the more likely we'll get engagement | 15:59 |
fungi | mtreinish: i wouldn't want it to keep you awake all night either, plus it's summer | 15:59 |
mtreinish | ttx: heh, there are other computers :) | 15:59 |
mtreinish | fungi: I'd have to rig up some better cooling, but it would be doable if necessary | 16:00 |
ttx | "Infra donors: OVH, CityCloud and Matt Treinish" | 16:01 |
dhellmann | another way to address the capacity problem would be to look for ways to run fewer jobs. I know nova does some creative path regex work. Maybe we can expand that to other project teams. | 16:01 |
mtreinish | ttx: haha, I like the way that sounds | 16:01 |
fungi | "...and Matt Treinish's clothes closet" has a better ring to it | 16:01 |
dhellmann | there's also that ML thread that said something about trove running 20+ check jobs -- are all of those really necessary? | 16:02 |
dhellmann | and are there other similar cases of excess we can look at? | 16:02 |
mtreinish | dhellmann: none of them are, they're non-voting and don't even work | 16:02 |
dhellmann | s/excess/potential excess/ | 16:02 |
mtreinish | dhellmann: there is a lot of trimming that can be done | 16:02 |
mtreinish | but it requires some one to dig into all the details | 16:02 |
dhellmann | well, they said they were working on getting them to be voting, so I don't want to just turn them all off, but can some be merged? | 16:02 |
mtreinish | which is very time consuming | 16:02 |
*** dklyle is now known as david-lyle | 16:02 | |
dhellmann | mtreinish : perhaps we need to make it a requirement for queens? | 16:03 |
ttx | fungi: re top-5 list -- mention geographic coverage ? | 16:03 |
fungi | dhellmann: i think the trove situation is mostly because they can't effectively check everything they need to in a single job due to the time it takes, so they've split it up into lots of shorter jobs (though that does come with setup overhead). something about the slowness of having to test applications in virtual machines running on other virtual machines | 16:03 |
fungi | ttx: great point, definitely | 16:03 |
dhellmann | fungi : ok, that's the sort of detail I was missing, thanks | 16:03 |
sdague | so, I think if you are going to address the issue of CI use time, the first step would be accounting for it | 16:03 |
sdague | per project, what is the CI use time per patch landed | 16:04 |
mtreinish | back to gate data analysis tooling :) | 16:04 |
sdague | mtreinish: yeh, but that's a very specific problem | 16:04 |
dhellmann | yeah, I don't want to design the whole solution here, just see if that's something worth looking into. | 16:04 |
sdague | dhellmann: yeh, the issue ends up being that a lot of human guesswork happens on projects that we think are outliers | 16:05 |
sdague | but it's really at best recapturing 1% of resources | 16:05 |
dhellmann | because if we say it's something we want done, we can recruit someone to do the work | 16:05 |
dhellmann | the outreachy program is starting up and looking for projects; we're going to ask the board to ask companies to give people time to do this sort of work; etc. | 16:05 |
fungi | yeah, while the skip-if regexes may reduce job resource consumption for certain changes, it also adds a ton of cognitive complexity to the configuration and regularly breaks/drops jobs for unrelated projects if we're not hypervigilant about regex review and inheritance/precedence. so being able to measure how much it's really reducing job resource consumption would be useful to avoid wasting time and increasing risk there for no appreciable gain | 16:05 |
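(For context, the skip-if mechanism under discussion looked roughly like this in a Zuul v2 layout.yaml; the job name and file patterns below are invented for illustration.)

```yaml
jobs:
  - name: gate-example-dsvm-tempest
    skip-if:
      - project: ^openstack/example$
        all-files-match-any:
          - ^doc/.*$
          - ^.*\.rst$
          - ^releasenotes/.*$
```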
sdague | fungi: right, and set norms | 16:06 |
sdague | which is like the acceptable number of CI hours classes of projects should be shooting for | 16:06 |
sdague | per patch | 16:06 |
dhellmann | sure, like I said, I don't want to solve the problem right now, I want to agree that it's worth looking into | 16:06 |
sdague | dhellmann: I think that given our growth curve, it is worth looking into | 16:07 |
sdague | the previous assumption that we'd always have headroom is not valid | 16:07 |
fungi | we ought to be able to measure that, though as with many things (and i'm tired of saying it), becomes easier in zuul v3. the database reporter will be a wealth of absolute timing data we can mine and analyze | 16:07 |
dhellmann | oops, docs team meeting is starting, I need to head over there | 16:08 |
fungi | sort of like what we have for subunit2sql now, but for the outer layer | 16:08 |
sdague | fungi: sure, but I wouldn't wait to get started on it if there were volunteers | 16:08 |
fungi | agreed | 16:08 |
fungi | it's more a question of whether we have the relevant data. best case we might be able to mine it out of zuul debug logs today | 16:09 |
sdague | yeh | 16:09 |
fungi | we can get builds per change and their start/end times from there | 16:09 |
mtreinish | fungi: we have the data for the gate queue at least in subunit2sql already, I was just working on drafting a small graphing script to make some bar graphs by project | 16:10 |
sdague | mtreinish: you need the check queue | 16:10 |
sdague | that's where most of the CI time is spent | 16:10 |
mtreinish | sdague: we do, but at least for right now we don't have that data aggregated anywhere | 16:11 |
ttx | Once we are past RC1 I'm considering writing a "Beware of zombies" contribution metrics post | 16:11 |
fungi | well, also subunit2sql doesn't get all jobs, and for the ones which do provide subunit it often only covers the job payload so there's some (variable depending on the job) overhead to take into account | 16:11 |
sdague | fungi: ++ | 16:11 |
sdague | I would caution extending subunit2sql here | 16:12 |
sdague | honestly, you could get the data almost entirely from scraping gerrit comments | 16:12 |
sdague | you'd miss a few conditions | 16:12 |
mtreinish | fungi: there is, but at least for dsvm jobs we also get the devstack portion. So all that's missing from there is the pre-devstack setup (which can take a lot of time in some situations) | 16:12 |
sdague | but you'd have a 90% solution | 16:12 |
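(A sketch of the 90% approach sdague suggests: summing per-job durations from the result comments CI leaves on Gerrit. The "job-name URL : SUCCESS in 1h 02m 21s" comment format assumed here is typical of the period, but treat the exact regex as an assumption.)

```python
import re

# Matches the "... : SUCCESS in 1h 02m 21s" portion of CI result comments.
DURATION = re.compile(r": (?:SUCCESS|FAILURE) in (?:(\d+)h )?(?:(\d+)m )?(\d+)s")

def ci_seconds(comment_text):
    """Sum the job wall-clock time reported in one CI comment on a change."""
    total = 0
    for hours, minutes, seconds in DURATION.findall(comment_text):
        total += int(hours or 0) * 3600 + int(minutes or 0) * 60 + int(seconds)
    return total
```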
fungi | so we may be able to get some prelim numbers out of the subunit2sql data and i'd be cool with it as a point of interest, but raw data from zuul will be more accurate (whether that's gleaned from debug logs or queried from the v3 reports db) | 16:12 |
mtreinish | but subunit2sql is an incomplete set of data, it was just easy to do and would give us a view | 16:13 |
fungi | yeah, i certainly don't object and it may show us some things we aren't expecting | 16:13 |
sdague | I would have caution there without the check queue, you are getting a < 10% dataset | 16:14 |
sdague | so it's going to be really easy to make decisions on the wrong picture | 16:14 |
fungi | also it's a rarefied set of changes which have already passed check and review | 16:16 |
fungi | who knows how much trash is being put in check which never sees the gate? | 16:16 |
sdague | fungi: yep, and all the -nv on check | 16:19 |
mtreinish | that trove case mriedem pointed out on the ML is a good example of that | 16:21 |
sdague | sure, but even neutron is running > 15 tempest jobs in check | 16:21 |
sdague | there is also an interesting edge condition I was noticing this morning on the tripleo gate queue (that was quite large) | 16:22 |
sdague | the tripleo ci nodes / jobs are pretty unstable (or at least have been) | 16:22 |
sdague | but those patches, and the things they gate with, include a ton of standard ci jobs | 16:22 |
sdague | which means every time there is a gate reset because of the tripleo ci | 16:23 |
sdague | it's snagging a ton of high priority nodes from the main CI | 16:23 |
sdague | all the puppet changes, for instance, and mistral are in that cogate | 16:24 |
sdague | I think that behavior wasn't noticed until we became so node constrained | 16:24 |
fungi | the stuff running in the tripleo queue in the normal gate pipeline actually runs on our nodes (multi-node jobs i think?) | 16:25 |
fungi | there's a separate check-tripleo or whatever pipeline which runs jobs on the tripleo test cloud | 16:26 |
sdague | sure | 16:26 |
sdague | but it cogates with puppet | 16:26 |
fungi | but the tripleo jobs running on our normal nodes are definitely also very unstable | 16:26 |
fungi | and yes, it's getting queued with all the other repos whose teams have agreed to gate on tripleo jobs | 16:27 |
sdague | right, and gate queue is priority of check | 16:27 |
fungi | correct (though at least we won't assign nodes to more than 20 changes in a gate queue if they're failing frequently) | 16:27 |
sdague | sure, but puppet-nova is 15 nodes (for instance) | 16:29 |
sdague | 6 of them tempest runs | 16:29 |
sdague | 20 * 15 == 300 | 16:29 |
sdague | so worst case that's currently nearly half of our capacity | 16:29 |
sdague | real world 20 node deep case (what was constant this week) my guess is basically locking up 100 nodes | 16:30 |
sdague | it's a new problem, don't want anyone to feel bad about it | 16:30 |
sdague | but it's something that I don't think we ran in such a way in previous releases | 16:31 |
openstackgerrit | Hongbin Lu proposed openstack/governance master: Zun completion of python35 goal https://review.openstack.org/492615 | 16:32 |
*** hongbin has joined #openstack-tc | 16:33 | |
*** emagana has quit IRC | 16:49 | |
*** marst_ has quit IRC | 16:54 | |
*** marst_ has joined #openstack-tc | 16:54 | |
mtreinish | fungi: hah, well trying to graph all the projects in 1 go I encountered a first using matplotlib: "ValueError: Image size of 180000x2700 pixels is too large. It must be less than 2^16 in each direction." | 16:54 |
mtreinish | that's no fun | 16:55 |
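(The limit mtreinish hit is matplotlib's hard cap of 2^16 pixels per image dimension, where pixels = inches × dpi; the 180000x2700 figure corresponds to 1800x27 inches at 100 dpi. A sketch of staying under the cap, with sizes invented for illustration; in practice splitting into several figures, as mtreinish then did with a top-10 cut, is saner than ultra-wide bars.)

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

MAX_PIXELS = 2 ** 16   # matplotlib rejects images >= 2^16 px per side
DPI = 100
n_projects = 1800      # hypothetical number of bars to draw

# One inch per bar would be 180000 px wide at 100 dpi; clamp below the cap.
width_inches = min(n_projects * 1.0, (MAX_PIXELS - 1) / DPI)
fig, ax = plt.subplots(figsize=(width_inches, 27), dpi=DPI)
```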
dims | fungi : glad to hear that | 16:56 |
fungi | mtreinish: hah! | 16:57 |
*** dtantsur is now known as dtantsur|afk | 17:00 | |
*** emagana has joined #openstack-tc | 17:09 | |
*** emagana has quit IRC | 17:12 | |
*** emagana has joined #openstack-tc | 17:13 | |
*** marst_ has quit IRC | 17:13 | |
mtreinish | fungi: interesting I found a bug in the elastic-recheck tests running the graphs: http://logs.openstack.org/42/492342/1/gate/gate-elastic-recheck-python27-ubuntu-xenial/f42c792/console.html#_2017-08-10_15_24_52_027089 | 17:14 |
mtreinish | fungi: because http://i.imgur.com/inMQlBo.png clearly wasn't right | 17:14 |
mtreinish | (limited it to the top 10 run time sums) | 17:15 |
fungi | indeed, that's odd | 17:15 |
mtreinish | I think watcher has the same problem: http://paste.openstack.org/show/618078/ | 17:17 |
*** emagana has quit IRC | 17:18 | |
mtreinish | the run_time is done by taking the first start time and the last stop time for all the tests and using the delta | 17:18 |
mtreinish | so if something is messing with time I could see that having unintended consequences | 17:18 |
mtreinish | ftr, watcher's unit tests take ~50sec to execute not 25M secs | 17:19 |
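(A minimal sketch of the run_time computation mtreinish describes, and why a globally mocked datetime wrecks it; the dict field names here are assumptions, not subunit2sql's actual schema.)

```python
from datetime import datetime

def run_time(test_results):
    """Wall time as described above: earliest start to latest stop."""
    start = min(r["start_time"] for r in test_results)
    stop = max(r["stop_time"] for r in test_results)
    return (stop - start).total_seconds()

# A suite that mocks datetime globally records bogus timestamps, so the
# delta explodes (e.g. watcher's ~50s run reporting ~25M seconds).
results = [
    {"start_time": datetime(2017, 8, 10, 15, 0, 0),
     "stop_time": datetime(2017, 8, 10, 15, 0, 50)},  # sane test timing
    {"start_time": datetime(2017, 8, 10, 15, 0, 1),
     "stop_time": datetime(2018, 6, 1, 0, 0, 0)},     # leaked mocked clock
]
print(run_time(results))  # huge value driven by the one bad timestamp
```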
*** emagana has joined #openstack-tc | 17:19 | |
mtreinish | as amusing as that is :) | 17:19 |
*** emagana has quit IRC | 17:19 | |
*** emagana has joined #openstack-tc | 17:20 | |
mtreinish | fungi: yep looks like both test suites mock datetime during the run. Which is probably messing with the subunit time reporting | 17:22 |
mtreinish | fungi: ignoring those 2 outliers it's a much more reasonable graph: http://i.imgur.com/DkwwTO9.png | 17:23 |
mtreinish | that's over the whole db (so 6 months) | 17:23 |
fungi | we probably do need someone to think through a datetime mocking solution which won't interfere with subunit reporting though | 17:25 |
fungi | just as a longer-term thing to keep on the radar | 17:25 |
mtreinish | yeah, starting that investigation was going to be my post lunch task | 17:26 |
*** emagana has quit IRC | 17:58 | |
fungi | sdague: to follow up on the capacity questions, rackspace also just bumped our quota by an aggregate of a couple hundred instances across two regions, and at the same time that took effect the zuul backlog (jobs waiting/pending) reversed direction from trending upward to trending downward. maybe coincidence, but i like to think we were just on the cusp of being able to keep up | 18:37 |
*** cdent has quit IRC | 18:41 | |
sdague | fungi: we might have been | 18:44 |
sdague | we're still 3 hours wait time on check nodes | 18:44 |
sdague | hopefully that burns down | 18:44 |
fungi | it ought to catch up fairly quickly | 18:47 |
fungi | we're at about 2k jobs waiting at present | 18:48 |
fungi | and running around 850 nodes in use after accounting for boot/delete overhead | 18:48 |
fungi | looks like we've been dropping the backlog by a steady 500 jobs per hour since the quota bump took effect | 18:49 |
fungi | so maybe another 4 hours to be fully caught up, if nothing major changes | 18:49 |
*** openstackgerrit has quit IRC | 19:03 | |
*** emagana has joined #openstack-tc | 19:15 | |
*** emagana has quit IRC | 19:23 | |
*** emagana has joined #openstack-tc | 19:24 | |
*** emagana has quit IRC | 19:29 | |
*** emagana has joined #openstack-tc | 19:36 | |
*** thingee_ has joined #openstack-tc | 19:58 | |
*** emagana has quit IRC | 20:46 | |
*** emagana has joined #openstack-tc | 20:46 | |
*** emagana_ has joined #openstack-tc | 20:47 | |
*** emagana has quit IRC | 20:50 | |
*** emagana_ has quit IRC | 21:07 | |
fungi | also, zuul beat the clock on my catch-up estimate. huzzah! | 23:03 |
*** hongbin has quit IRC | 23:21 | |
*** sdague has quit IRC | 23:47 |