fungi | okay, gordc volunteered for telemetry ptl at the last moment, crisis averted | 00:46 |
*** emagana has quit IRC | 00:51 | |
*** emagana has joined #openstack-tc | 00:52 | |
dims | fungi : w00t | 00:53 |
fungi | dims: well, i wouldn't cheer too much until you read http://git.openstack.org/cgit/openstack/election/tree/candidates/queens/Telemetry/gordc.txt | 00:55 |
fungi | sounds like the plan is to wind it down anyway | 00:55 |
*** emagana has quit IRC | 00:56 | |
fungi | and acknowledging that little work is happening there any longer | 00:57 |
dims | fungi : :( | 00:57 |
dims | fungi : i was reading about this go-contributor-workshop to see what we could learn from them - https://blog.golang.org/contributor-workshop | 00:58 |
fungi | neat | 01:02 |
fungi | they did a good job of including tweets from and photos of women participating at the event | 01:04 |
fungi | though trucker hat swag seems like it's reinforcing the brogrammer stereotype a bit... wonder whether they thought that through | 01:04 |
*** RuiChen has joined #openstack-tc | 01:37 | |
*** RuiChen has left #openstack-tc | 01:58 | |
*** dklyle has quit IRC | 02:49 | |
*** david-lyle has joined #openstack-tc | 03:22 | |
*** rmcall has quit IRC | 04:07 | |
*** rmcall has joined #openstack-tc | 04:40 | |
*** david-lyle has quit IRC | 04:44 | |
*** rmcall has quit IRC | 04:45 | |
*** dklyle has joined #openstack-tc | 04:45 | |
*** gcb has joined #openstack-tc | 07:13 | |
*** cdent has joined #openstack-tc | 09:35 | |
*** sdague has joined #openstack-tc | 09:50 | |
*** gcb has quit IRC | 10:04 | |
*** dtantsur|afk is now known as dtantsur | 10:35 | |
dhellmann | fungi, smcginnis, dims : it sounds like we should talk about adding status:maintenance-mode to those projects | 12:43 |
fungi | quite possibly | 12:44 |
cdent | I don’t know that gord’s message is all _that_ depressing. It basically says “it kinda works and isn’t broken enough for people to do anything about it”. That’s an interesting kind of success. | 12:51 |
cdent | The extraction of gnocchi to be something more generic than “just openstack” is also a kind of success. | 12:51 |
cdent | What might be not a success is if the other services aren’t capable of producing info that generic tools can consume | 12:52 |
cdent | (ttx: since you asked about different uses of the term “upstream”, there’s one in this message: http://lists.openstack.org/pipermail/openstack-infra/2017-August/005546.html ) | 12:56 |
smcginnis | It may be the project is just in a stable state and needs someone to keep the lights on. | 13:02 |
smcginnis | If and until it gets superseded by something else or someone has a plan to expand it in some way. | 13:03 |
smcginnis | But status:maintenance-mode probably is an accurate tag for it at this point I suppose. | 13:03 |
cdent | I’d like to see the day when projects are striving to get that tag | 13:05 |
cdent | not thinking of it as bad | 13:05 |
cdent | but yeah, I agree it is the right tag for at least telemetry (now that gnocchi is no longer included) | 13:05 |
ttx | cdent: yeah, in that case we are the users and citycloud is upstream from us -- still a bit confusing I'll admit | 13:07 |
smcginnis | Do we need to hold a TC meeting to select a storlets PTL? Or hash it out between office hours and the ML? | 13:10 |
cdent | ttx: the thing that is different/missing there is that there are no commits | 13:11 |
cdent | smcginnis: start on ml? | 13:11 |
ttx | I think trying to reach out to the former PTL is the first step | 13:11 |
ttx | then maybe ask notmyname if he has an idea of the current status / users | 13:12 |
smcginnis | cdent, ttx: Makes sense to me. | 13:12 |
ttx | Traditionally we waited for the election conclusion, but I guess we can speed up | 13:13 |
smcginnis | Yeah, doesn't seem to be anything gating this one. | 13:13 |
smcginnis | So only two projects with a PTL election. That seems lower than usual. | 13:14 |
ttx | But also only two projects without nominees, which is also lower than usual. Usually more than 2 miss the call | 13:15 |
ttx | I think we had 3 in the last two elections | 13:15 |
smcginnis | That is good. | 13:15 |
ttx | In the storlets case it's getting slow but it's not completely dead either. I'll reach out to Eran Rom and Kota Tsuyuzaki | 13:17 |
cdent | When do we start asking if the concept of PTL, as currently constructed, is sustainable? | 13:21 |
dhellmann | how do you see it as potentially unsustainable? | 13:24 |
ttx | If you can't get a contact point for a project, I think that project should be considered unmaintained | 13:25 |
dhellmann | sure, but that's not the concept of PTL being flawed, is it? | 13:25 |
ttx | because that's what the PTL is, once you remove all the paint | 13:25 |
ttx | dhellmann: no it's not, I agree | 13:25 |
ttx | One issue is that the PTLs are traditionally bad at delegating | 13:26 |
cdent | couple things dhellmann: a) if not many people are wanting to do the job, then that suggests that the job is either hard to do, or not rewarding, or hard to justify to employers, or any of a variety of things. b) have you observed how much some people like mriedem work? we shouldn’t be encouraging that | 13:26 |
dhellmann | that has been a trend, yes | 13:26 |
dhellmann | cdent : yes, see ttx's comment | 13:26 |
cdent | that covers b, but I don’t know that it covers a | 13:27 |
ttx | i.e. everyone else is happy letting "the PTL" do things | 13:27 |
cdent | and I think the lack of delegation is in many cases effect not cause | 13:27 |
dhellmann | the employer justification angle is important, now that employers have figured out that it doesn't mean they can control the project absolutely | 13:27 |
smcginnis | I think for the case of telemetry, it's partly a matter of the perception of what PTL means. | 13:28 |
dtroyer | dhellmann: ++ I have seen that 'PTL' is not nearly the management buzzword it once was | 13:28 |
dhellmann | we've defined more liaison roles, which should make at least some of the delegation lines clearer | 13:29 |
smcginnis | For maintenance or stable projects, I think there is an idea that PTL is a challenge and they are expected to somehow work to "revitalize" the project. | 13:29 |
cdent | smcginnis: I think it is hard to extrapolate about ptls in general from the example of telemetry | 13:29 |
smcginnis | Where we need to make it clear that in those cases it just needs someone to be the point person. | 13:29 |
dhellmann | smcginnis : there's also definitely a general lack of interest in governance from many members of that team | 13:29 |
smcginnis | cdent: Right, not trying to say that applies in all cases. But in the case of so called "dying" projects, it's a different role than being Nova PTL. | 13:29 |
ttx | I don't think any of this is linked to "the PTL" as a concept. It's more a crisis of strategic involvement in projects | 13:30 |
dhellmann | maybe teams with status:maintenance-mode don't need a ptl, but need a "point of contact" role | 13:30 |
ttx | You have fewer people doing janitor work in projects | 13:30 |
dhellmann | ttx: that has been a trend for a while, yes | 13:30 |
ttx | that makes the PTL, as the default janitor, more exposed | 13:30 |
smcginnis | So status:maintenance-mode can have a "janitor" election. :) | 13:30 |
ttx | But replacing the PTL concept won't solve the underlying issue | 13:31 |
cdent | Sorry, I didn’t mean to distract us from that point. | 13:31 |
ttx | which is that it's hard to get people to work on stuff that benefits everyone | 13:31 |
cdent | My question was a much broader bigger picture thing, not related to the issue with some projects being in maintenance mode | 13:31 |
cdent | it was more in reaction to the statement about the small number of elections. My reaction is “of course, who would want that job” | 13:32 |
ttx | cdent: I'm not really concerned by lack of elections. PTL positions are now more inherited through succession planning, which is good | 13:32 |
ttx | i.e. less hostile takeovers, more of a planned transition | 13:33 |
dhellmann | we evolved to that a loong time ago | 13:33 |
smcginnis | Yeah, I see it as a good thing. Fewer companies pushing employees to do it just so they can say "we have X number of PTLs" | 13:33 |
ttx | I would be more concerned if there were dozens of no-candidate projects | 13:33 |
dtroyer | consider too that we have a long history of re-electing folks running for most PTL & TC seats. In the PTL case I think it has to do with avoidance of (perceived) confrontation as much as anything else. Some projects have made a deliberate decision to 'rotate' the PTL so only one steps up to run. | 13:33 |
ttx | 1.6% | 13:34 |
cdent | yes, I get that, but the axe I’m grinding right now is: We need to recognize that working on openstack is becoming increasingly challenging. It’s a problem that is not going away. | 13:34 |
dhellmann | the election process is there as a safety valve, and I don't think we want to eliminate it, but as long as we have people doing the work I'm not that concerned that we don't actually have votes | 13:34 |
smcginnis | cdent: Yeah, I agree it definitely has changed a bit. | 13:34 |
ttx | I mean 3.2% of teams end up with no candidates, and for the second one we are not even sure it's not human error | 13:34 |
dhellmann | cdent : absolutely. we should be looking for ways to make it easier. Do you think there's something we could change about the PTL role to improve things? | 13:36 |
ttx | cdent: totally. 1/ The contributor base evolves, as we can't rely as much on large service providers employing dozens of devs, and 2/ as users get more involved we need to teach them the value of strategic contributions | 13:36 |
ttx | we are in the middle of that transition | 13:37 |
ttx | That said, I see nice signs, like mnaser taking over the Puppet OpenStack PTLship | 13:38 |
cdent | dhellmann: I don’t really know. We’re definitely in a difficult transition right now in some projects. The “dozens of devs” aren’t there but the expectations that are on projects (that _may_ be put there by the projects themselves) remain. | 13:39 |
dhellmann | I agree, we might need to loosen, if not standards, standard practices in some ways | 13:39 |
dhellmann | perhaps emphasizing the communication duties, over the leadership duties, for the PTL role would help in some cases | 13:40 |
smcginnis | We might just need to adjust our own expectations, accepting that things will not move as quickly as they used to. | 13:40 |
ttx | Horizontal teams have been forced to adapt in the past, today vertical teams need to adapt as well | 13:40 |
smcginnis | And that for some projects, it's just going to be reality that there are 2-3 active contributors that just don't get as much accomplished each cycle. | 13:40 |
mugsie | yeah - that could help. I have found that (at least for the Designate team) PTL was more about herding cats and making sure that people know what is going on than about "tech leadership". | 13:41 |
mugsie | that leadership came from the cores and a few dedicated contributors | 13:41 |
smcginnis | mugsie: Yeah, I've found it to be more of an administrative position than anything else. | 13:41 |
mugsie | that said, we were in a bad place for succession for this election until very recently, so we may not be the gold standard | 13:41 |
ttx | PTL used to be about keeping sanity while drinking from the firehose. Now it's about getting essential bases covered | 13:42 |
mugsie | the biggest issue I see for PTLs is the lack of quality reviews, while some contributors dump huge tactical patches. just co-ordinating who is going to review a massive patch chain is hard work nowadays | 13:44 |
smcginnis | I've really been trying to encourage people to get involved; if nothing else, doing reviews has a huge impact. | 13:46 |
smcginnis | I would love to see more non-core reviews happening. | 13:46 |
smcginnis | (And more core reviews too) | 13:46 |
mugsie | smcginnis: ++ | 13:48 |
mugsie | there are one or two non-cores that I am very happy to see a review from, as they are good reviewers - I just wish we had more | 13:48 |
*** marst_ has joined #openstack-tc | 13:54 | |
*** persia has joined #openstack-tc | 14:05 | |
cdent | [t v2a] | 14:15 |
purplerbot | <dtantsur> ENOTMUCHTIME is a common error code nowadays [2017-08-10 14:14:41.151800] [n v2a] | 14:15 |
dtantsur | wut? | 14:15 |
* dtantsur is afraid of cdent now | 14:15 | |
cdent | dtantsur: I was quoting you into here to support something I said above about people being overtasked | 14:16 |
cdent | you know about the transclusion powers of purplerbot? | 14:16 |
dtantsur | no, no idea | 14:16 |
smcginnis | That's something new to me. :) | 14:17 |
cdent | https://anticdent.org/purple-irc-bot.html | 14:17 |
cdent | in the first set of bullet points | 14:18 |
*** lbragstad has quit IRC | 14:28 | |
dhellmann | cdent : is the line reference for the bot some sort of hash? | 14:59 |
*** lbragstad has joined #openstack-tc | 14:59 | |
cdent | dhellmann: it’s a base62 encoding of the first few bytes of a uuid | 15:00 |
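(For illustration, a minimal Python sketch of the nid scheme cdent describes: base62-encoding the first few bytes of a fresh UUID. The alphabet ordering, byte count, and function name are assumptions for the sketch, not purplerbot's actual implementation, and as cdent notes just below, the real tooling also checks for duplicates.)

```python
import uuid

# Assumed digit ordering; base62 implementations vary on this.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def make_nid(n_bytes=3):
    """Base62-encode the first few bytes of a fresh UUID into a short id."""
    value = int.from_bytes(uuid.uuid4().bytes[:n_bytes], "big")
    digits = []
    while value:
        value, rem = divmod(value, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits)) or ALPHABET[0]
```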
fungi | mmm, office hour | 15:00 |
fungi | also, not nearly enough people use base62 | 15:01 |
* lbragstad meanders in with a full cup of coffee | 15:01 | |
persia | Does a change in the wording of an old resolution require a new resolution, or can patches be submitted against old resolutions? Note that this would be a semantically meaningful change. | 15:01 |
cdent | it’s trying to be unique but small, but the tooling checks for dupes | 15:01 |
dhellmann | cdent : interesting. I guess it would be hard to express something like "that thing cdent said in $channel $n lines back" | 15:01 |
ttx | Factoid: Nova current activity is not significantly lower than during Mitaka: http://imgur.com/a/gX9ox | 15:01 |
dhellmann | cdent : ooo, though "that thing cdent said in $channel that matches $regex" would be useful | 15:01 |
cdent | yes, it would, and the functionality could be added, but I never bothered because [t v2a] | 15:02 |
purplerbot | <dtantsur> ENOTMUCHTIME is a common error code nowadays [2017-08-10 14:14:41.151800] [n v2a] | 15:02 |
dhellmann | haha | 15:02 |
cdent | the use of the numbers and the naming of things “nids” all goes back to doug engelbart stuff | 15:03 |
dhellmann | persia : I think in the past we've done a new resolution and explicitly marked the old one as deprecated. There should be examples of that in the repo. | 15:03 |
cdent | the “real” stuff that I used a few years ago integrated wikis, mailing list archives and irc | 15:04 |
dhellmann | persia : https://governance.openstack.org/tc/resolutions/superseded/index.html | 15:04 |
dhellmann | "superseded" not "deprecated" | 15:04 |
persia | dhellmann: Thanks. So the appropriate procedure would be to raise the complaint to the TC, possibly assist with writing a new resolution, etc.? | 15:04 |
dhellmann | persia : that sounds like a good approach -- starting the conversation is always a good way to begin :-) | 15:05 |
dhellmann | would you like to raise a topic now? | 15:05 |
persia | Yes :) | 15:05 |
* dtantsur is now a classic writer, I guess | 15:05 | |
cdent | dtantsur++ | 15:05 |
persia | In 20141128-elections-process-for-leaderless-programs, the phrase "As soon as possible" is used. Having just been subject to this, I found every minute of the 112 it took worrisome, because of the time pressure. I'd like to request that "promptly" or similar be considered as an alternative. | 15:06 |
persia | (well, not every minute, because I didn't notice the phrase for the first 60 or 70) | 15:06 |
dhellmann | https://governance.openstack.org/tc/resolutions/20141128-elections-process-for-leaderless-programs.html for reference | 15:07 |
dhellmann | hmm, yes | 15:07 |
dhellmann | I think the intent there is to not have you wait until the actual election is over to let us know | 15:07 |
dhellmann | but it's not like we were sitting around waiting for an email yesterday | 15:08 |
dhellmann | rephrasing that seems like a reasonable change | 15:08 |
dhellmann | and in this case, I don't think the spirit of the resolution changes so I wouldn't go through the superseded process | 15:08 |
cdent | agreed | 15:08 |
persia | My understanding was that as long as you knew before the next office hour after the nomination period concluded, nobody would really notice the time that may have passed. | 15:08 |
dhellmann | supersession? I don't know the right term there -- just patch the existing file | 15:08 |
dhellmann | yeah, that seems about right | 15:09 |
dhellmann | I mean, someone following more closely might have already figured it out or expressed concern (telemetry came up shortly before the deadline) | 15:09 |
* persia will prepare a change for "as soon as possible" -> "promptly" for consideration in gerrit, just patching the old resolution, rather than having a new one. | 15:09 | |
dhellmann | but the "formal" notification doesn't have to be immediate | 15:09 |
dhellmann | persia : ++ | 15:09 |
dhellmann | thanks for raising that, and I'm sorry if any poor wording choices introduced unnecessary stress into your experience | 15:10 |
dhellmann | thank you for acting as an election official :-) | 15:10 |
fungi | persia: i would even, personally, be fine if it just said "before the conclusion of the election" or something like that | 15:10 |
dhellmann | that seems like a reasonable change, too | 15:10 |
fungi | should be able to take your time, we just need to know before the current ptl's term is up, i think | 15:11 |
persia | fungi: I'll use that instead. Having a clear bound will probably be easier to understand for those needing to raise the issue in the future. | 15:11 |
cdent | i think fungi’s suggestion is better than promptly as it is more concrete | 15:11 |
cdent | jinx | 15:11 |
dhellmann | I think the point is we want to try to find someone to do it before the end of the election, so waiting until the day before isn't really helpful, but being concrete is good. | 15:11 |
ttx | cdent: IIRC you were asking for a version of my graph only including "main", more mature projects: http://imgur.com/a/i6VJk | 15:11 |
smcginnis | ttx: good data | 15:12 |
ttx | Not completely crazy curve for projects past their feature development peak | 15:12 |
fungi | dhellmann: well, probably a few days of no ptl (or of the previous ptl sticking it out) while we find a replacement or decide to drop it as an official team is not a huge issue either | 15:12 |
cdent | yeah, that’s useful | 15:12 |
ttx | Also the peak is reached before the current "crisis" | 15:12 |
dhellmann | fungi : true | 15:12 |
ttx | i.e. peak at Mitaka, not Newton | 15:13 |
dhellmann | it would be interesting to compare that to patchsets proposed per day | 15:13 |
persia | Although many folk may have been consuming Mitaka during Newton, causing delayed perception of slowing. | 15:14 |
cdent | dhellmann++ | 15:14 |
ttx | hmm, let's see | 15:14 |
* ttx hacks something real quick | 15:14 | |
ttx | I expect a slightly translated similar curve, but let's see | 15:16 |
fungi | cdent: on your point about the meaning of "upstream" and "downstream" i think it also depends a lot on context, as your most recent example shows. it's mostly a producer vs consumer distinction (distros are downstream from our development work, operators are perhaps downstream from distros and api users like our infrastructure can be downstream from service providers donating resources to our cause) | 15:16 |
cdent | While that’s going, I responded to alan clark’s request for agenda items by pointing him at the second section of https://anticdent.org/tc-report-32.html He agreed that some kind of “top of the mind” discussion about the state of the universe would be good, but it would be useful to have some guiding questions prepared. I said I’d check in with everyone here about that. | 15:16 |
fungi | so when distros talk about "upstream" they typically mean producers of software they're packaging, which is the most common context a lot of us previously involved in distro work hear it in | 15:16 |
ttx | dhellmann: http://imgur.com/a/QDnSD | 15:17 |
dhellmann | ttx, I don't suppose you can get those onto the same graph? | 15:18 |
cdent | fungi: yeah, agreed. The part that was confusing for me was the idea that user committee could be “upstream” if it was an openstack committee (since openstack at large is “upstream”). | 15:18 |
ttx | dhellmann: I can but it will take me more time :) | 15:18 |
dhellmann | the shapes look about the same except for the bump around havana | 15:18 |
ttx | yes around havana we had a LOT of tactical contributions we just could not absorb | 15:18 |
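(As an aside: per-day activity counts like the ones in ttx's graphs can be approximated from a local clone. A rough sketch, assuming "activity" means commits landed per day; ttx's actual methodology isn't shown in the log.)

```python
import collections
import datetime
import subprocess

def commits_per_day(repo_path):
    """Count commits per day using committer timestamps from `git log`."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--pretty=%ct"],
        capture_output=True, text=True, check=True,
    ).stdout
    return collections.Counter(
        datetime.date.fromtimestamp(int(ts)) for ts in out.split()
    )
```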
ttx | cdent: btw you made it to LWN quotes section today | 15:19 |
cdent | oh dear, from saying what? | 15:19 |
ttx | that was about your retrospective proposal | 15:20 |
* ttx fetches quote | 15:20 | |
ttx | cdent: are you explicitly sending them a copy of your email ? | 15:20 |
cdent | ttx sending who what email? | 15:21 |
openstackgerrit | Felipe Monteiro proposed openstack/governance master: Mark Murano complete for Queens policy in code goal https://review.openstack.org/492573 | 15:21 |
ttx | cdent: send LWN a copy of your TC report email ? They link to it every week | 15:21 |
ttx | https://lwn.net/Articles/730191/ | 15:21 |
cdent | Oh, no. I didn’t know that was happening. | 15:21 |
ttx | is that accessible ^ | 15:22 |
*** emagana has joined #openstack-tc | 15:22 | |
ttx | They probably pick it up from the Planet then | 15:22 |
cdent | yeah, that works, thanks | 15:22 |
ttx | since they link to your blog | 15:22 |
persia | On "upstream": some projects find using "mainline" to describe themselves useful when they consider that there is no further "upstream" from which they are deriving. | 15:23 |
cdent | I shall try to make sure the wider audience does not curtail my color commentary. | 15:23 |
cdent | anyway back to [t 1xSU] if possible. Does anyone have anything to add? | 15:24 |
purplerbot | <cdent> While that’s going, I responded to alan clark’s request for agenda items by pointing him at the second section of https://anticdent.org/tc-report-32.html He agreed that some kind of “top of the mind” discussion about the state of the universe would be good, but it would be useful to have some guiding questions prepared. I said I’d check in with everyone here about that. [2017-08-10 15:16:50.941642] [n 1xSU] | 15:24 |
dhellmann | I thought the plan was to go through the working groups created in boston at the march meeting? or do we consider that "done"? | 15:26 |
cdent | I think that remains on the agenda. Alan asked for additional agenda items. | 15:26 |
cdent | And I was thinking that some more general talk might be useful in shaping the working group activity | 15:26 |
ttx | persia: we could add language directly in the charter about the TC having the power to appoint PTLs in case nobody nominates themselves, since the charter talks about elections | 15:27 |
dhellmann | it might. my impression is that we talk in generalities a *lot* and that more specifics would help | 15:27 |
cdent | dhellmann: in that case, maybe you and the rest of the TC have already done so, but I certainly have not | 15:28 |
cdent | My impression is that the issues I was talking about earlier (working on openstack is too damn hard) are not well understood despite being well known, not really on people’s agendas, and I can’t carry on not doing something about it | 15:29 |
dhellmann | cdent : sure. and that's an example of something where we could be more specific and less general | 15:29 |
persia | ttx: My fear with wider exposure of that power is that it may create the impression that one becomes PTL initially by arranging TC appointment. While I don't want to reduce opportunities for free drinks and swag, I imagine the TC mostly wants projects to organically produce PTLs, and that appointment should only be an exceptional process (more the process of helping the project to self-select a PTL than the TC specifically controlling PTL selection). | 15:29 |
dhellmann | less "it's hard" and more "it's hard because X" | 15:29 |
cdent | people have to say out loud “it’s hard” before they can say why | 15:29 |
ttx | persia: yeah, that's fair | 15:29 |
cdent | The insistence on coming prepared with the reasons for everything is a hugely limiting factor in our discussions | 15:30 |
ttx | dhellmann: yes, plan in September is to go through the various workstreams so that they can expose progress (or lack thereof) | 15:30 |
cdent | Thus the notion of a more retrospective oriented thing | 15:30 |
cdent | EmilienM said he’d be willing to structure something like that, if people felt it was appropriate | 15:30 |
openstackgerrit | Emmet Hikory proposed openstack/governance master: Amend leaderless program resolution timeframe https://review.openstack.org/492578 | 15:31 |
dhellmann | cdent : I would like to avoid a fruitless afternoon of complaining about random things, so if we can focus on a small number and actually talk about trying to address them instead of just listing them all out again, that may be more productive | 15:31 |
dhellmann | structure would be good in either case | 15:32 |
ttx | yes, I feel like there was a lot of complaining about random things in the past, and that didn't really help much | 15:32 |
cdent | yes, that’s why I’m bringing up the topic here so we can formulate the questions that would help shape the conversation | 15:32 |
ttx | I don't feel like there was ever a shortage of people saying "it's hard" out loud | 15:33 |
cdent | I think there are plenty of people who say that making openstack into what they want is hard | 15:33 |
cdent | but I’m less clear on the labor issues | 15:34 |
persia | The "it's hard"s that I have heard often seem to be very specific to individual circumstances. Each has a solution, but all together cannot be solved together. | 15:34 |
cdent | As I said to smcginnis earlier: If openstack was a single corporate thing, then labor would say “oi, we’re overtasked, get some people in here or give us less to do” but there are no mechanisms for that in our collaborative env | 15:34 |
fungi | that's sort of why we did the exercise with the board, tc and uc in boston... to attempt to pin down predominant opinions on what's hard and where limited available effort should be focused to make the most impact | 15:34 |
dhellmann | cdent : funny, I have a blog post draft ready to go up when I return from pto that talks about that exactly | 15:35 |
smcginnis | fungi: And the top 5 help wanted was a result of that, right? | 15:35 |
fungi | a result | 15:35 |
cdent | dhellmann: I look forward to it | 15:35 |
persia | cdent: How are there no such mechanisms? Can PTLs not report their teams as understaffed relative to requirements in a forum that contributing organisations can consume when determining how to allocate contributed FTEs? | 15:35 |
cdent | persia: I think something like that is going on all the time and not working? | 15:36 |
cdent | Things like the top 5 list are helpful, and things like it will continue to be. | 15:36 |
persia | cdent: Yes. But that it isn't working is different than that there are no mechanisms. I don't believe it would work better if OpenStack was managed, rather than governed. | 15:36 |
cdent | I’m not suggesting that openstack _should_ be managed | 15:36 |
fungi | for example, one of the top 5 outcomes was that openstack is unnecessarily complex, so work is starting to remove less-used, incomplete or deprecated features, be more clear about what configurations we actually test/support, shed some projects which aren't bringing anything to the table to help the overall picture, et cetera | 15:37 |
cdent | I’m simply trying to identify that there are issues | 15:37 |
persia | cdent: Apologies. Took "If openstack was a single corporate thing" out of context. | 15:37 |
* ttx drinks herbal tea from a legacy HP Cloud Services cup | 15:37 | |
cdent | I _like_ very much that openstack is a collaborative affair, but because it is at the same time economically driven, some of the functions for improvement and change are difficult | 15:38 |
fungi | ttx: an antique! | 15:38 |
ttx | They pivoted at least 3 times since this one | 15:38 |
dhellmann | cdent : that topic of finding resources to work on things that benefit everyone might lead to a useful conversation in the board meeting | 15:39 |
cdent | In truth, I think it’s the only topic worth talking about. | 15:40 |
*** emagana has quit IRC | 15:40 | |
*** emagana has joined #openstack-tc | 15:41 | |
persia | cdent: The financial situation is indeed complex. Because of the nature of what I do, two things I hear often from contributing orgs are "We can't hire anyone for openstack except at insane salaries" and "We do not believe that paying for openstack feature development is useful." | 15:41 |
* dims peeks | 15:41 | |
dhellmann | so, how do we frame that conversation constructively? | 15:41 |
fungi | persia: feature development seems like the last thing we need. bug fixing and stabilization of current features on the other hand would be stellar ;) | 15:42 |
fungi | we could also, as i mentioned, use some help ripping out old/broken bits which bring more complexity than usefulness | 15:42 |
ttx | dhellmann: I was thinking we could present the top-5 list and say "now what" | 15:43 |
persia | fungi: Some of my principals define "feature" to be things like "something to mean we don't need to reboot all the machines in the substrate with cron to continue to provide services". Not that these are necessarily mature users or anything :) | 15:43 |
ttx | We might want to add a couple more things to the list before we do that though | 15:43 |
dhellmann | ttx: there are 2 items on our top 5 list | 15:43 |
fungi | persia: heh, nice! | 15:43 |
ttx | How about adding "project stewardship roles" to the top-5 ? | 15:43 |
ttx | Like people taking on PTL, release liaison... roles | 15:43 |
dhellmann | at the board meeting at the last summit I raised the point of asking prospective new gold members about their commitment to giving contributors time to become leaders within the community, and not just work on tactical tasks | 15:44 |
mtreinish | ttx: what is the target audience for the top 5 list? | 15:44 |
dhellmann | perhaps we can extend that to existing members? | 15:44 |
ttx | We may not lack PTLs yet, but we definitely lack PTL-like stewardship roles the PTL could delegate to | 15:44 |
ttx | mtreinish: contributing organizations | 15:44 |
persia | dhellmann: For discussion regarding asking for more resources, I'd suggest asking the board for help in communicating to operators and vendors that contribution to maintenance is essential to continued availability of openstack, rather than asking the board for the resources directly. | 15:44 |
mtreinish | ttx: because saying that but getting a bunch of new contributors who've never worked in the community before may not be the most realistic | 15:44 |
dhellmann | persia : sure. operators and users are going to be an increasingly important source of contributions | 15:45 |
*** emagana has quit IRC | 15:45 | |
ttx | mtreinish: contributors from Asia in particular needed more guidance on where to contribute to be useful | 15:45 |
cdent | yeah, there is a flip side which is the onramping and ongoing complexity, which is another well known problem that we need more strategies for | 15:45 |
ttx | mtreinish: also stewardship can start with something as simple as bug czarring | 15:46 |
dhellmann | but my point was to discuss the issue with the board, because I think this is an area where the board can help by raising awareness of the problem and incentivizing companies to address it (to bring in cdent's point about economics) | 15:46 |
persia | dhellmann: Don't discount vendors and software providers. The former have been our backbone, but they will still continue for less advanced users. The latter will probably become more important as the user/operator contribution level increases, as they provide ways to purchase fractional FTEs. | 15:46 |
ttx | Bug czars have traditionally been newcomers | 15:46 |
dhellmann | persia : it's like you're channeling me from 6 months ago :-) | 15:46 |
ttx | and that has proved itself a great on-ramp to more involvement | 15:46 |
fungi | one (trivial) thing which was suggested as a topic at an earlier office hour... getting agreement from the board that it's okay to fix that extremely confusing typo in the technical committee member policy (a foundation bylaws appendix) | 15:47 |
mtreinish | ttx: perhaps, but I've seen many times people working on wrangling bugs who don't understand the process or the project well enough and make a mess of things | 15:47 |
persia | dhellmann: I'm just repeating things said in Austin :) | 15:47 |
dhellmann | maybe I've been saying those things for more than 6 months :-) | 15:48 |
mtreinish | ttx: I'm just saying we can put things like we need X on the list, but we want to make sure we provide the tools for people with limited experience to be successful doing X | 15:48 |
ttx | mtreinish: yes | 15:49 |
ttx | Still I'd very much like to get to that Denver meeting with more than 2 things on the list | 15:49 |
ttx | otherwise it's a bit counterproductive | 15:49 |
ttx | fungi: how is infra doing these days? | 15:50 |
*** emagana has joined #openstack-tc | 15:50 | |
ttx | Should we add it as a top-5 ? | 15:50 |
mtreinish | ttx: I always could use help on my gate data analysis tooling. Openstack-health hasn't had a real patch in months | 15:50 |
fungi | ttx: infra's scrambling to keep afloat | 15:51 |
dhellmann | that sounds like a "yes" | 15:51 |
fungi | (which is one of the reasons i'm not so active in office hour at the moment) | 15:51 |
ttx | yes | 15:51 |
dims | ++ to add infra | 15:51 |
ttx | I cited infra in my talks in Asia | 15:51 |
fungi | much appreciated | 15:51 |
ttx | but I think it's time to add it | 15:52 |
smcginnis | ++ | 15:52 |
ttx | fungi: would you be willing to draft something ? | 15:52 |
dims | fungi : how about cloud capacity? | 15:52 |
fungi | i expect this to get much better once the herculean push for zuul v3 is complete, but we're really worn thin at present | 15:52 |
fungi | dims: brief alignment of unfortunate events with different providers aside, i think we're generally okay on test resource capacity (not this week obviously, but in general) | 15:53 |
dhellmann | more cloud capacity is always good, but I think our #1 issue is somehow encouraging contributing companies to give resources to community needs beyond their own immediate concerns | 15:53 |
sdague | fungi: do we have more capacity coming online? | 15:54 |
fungi | ttx: draft something for the top 5 most wanted list, or for the meeting agenda in denver, or what? | 15:54 |
sdague | because going back into june, we were very often going to the full 1600 | 15:54 |
ttx | fungi: top-5 list | 15:54 |
sdague | and while we don't use that all the time, hitting the hard limit causes really large delays in result turn around time, including fixing the gate itself | 15:54 |
ttx | dhellmann: part of it is not having any free time beyond firefighting to engage with potential donors | 15:55 |
smcginnis | dhellmann: I see that issue - contributing beyond their own immediate concerns. | 15:55 |
fungi | sdague: we have a couple new donors in the wings, but current struggles are because our vouchers for ovh expired and they suspended our service just days after osic went offline, we discovered that we're apparently very bandwidth-constrained in infra-cloud for internet uplink limiting our effective utilization there, network issues in a region in citycloud have had it offline for us for a few weeks... | 15:55 |
fungi | sdague: so fixing at least some of those would probably get things back on track | 15:56 |
fungi | but it's almost all i can do to keep up with the communication with different providers and trouble tickets sometimes | 15:56 |
sdague | ttx: well, it's notable that the entire upgrade testing space is me in small windows, and definitely can't extend past the current boundaries. If upgrade testing remains important we definitely need more folks engaged there. | 15:57 |
sdague | fungi: ok, cool | 15:57 |
ttx | sdague: should we add upgrade testing, or more generally QA ? | 15:58 |
fungi | ttx: i'll come up with something for the hitlist. i may need to make it a little vague because our specific needs shift pretty constantly (so our actual need is for experienced and talented generalists) | 15:58 |
ttx | sdague: mtreinish above said he would welcome help in the gate data analysis tooling maintenance | 15:58 |
mtreinish | fungi: I'd gladly donate all my closet cloud resources to the gate. But it's probably too slow and limited capacity for it to be useful | 15:58 |
sdague | ttx: I also need to be honest and say that shepherding unified limits is beyond my ability to commit at this point. I did look around for some other possible folks to drive that, but people were finding homes at the time. | 15:58 |
ttx | mtreinish: where would we get our TV from? | 15:59 |
sdague | ttx: I think the more specific you get, with how people could have an impact, the more likely we'll get engagement | 15:59 |
fungi | mtreinish: i wouldn't want it to keep you awake all night either, plus it's summer | 15:59 |
mtreinish | ttx: heh, there are other computers :) | 15:59 |
mtreinish | fungi: I'd have to rig up some better cooling, but it would be doable if necessary | 16:00 |
ttx | "Infra donors: OVH, CityCloud and Matt Treinish" | 16:01 |
dhellmann | another way to address the capacity problem would be to look for ways to run fewer jobs. I know nova does some creative path regex work. Maybe we can expand that to other project teams. | 16:01 |
mtreinish | ttx: haha, I like the way that sounds | 16:01 |
fungi | "...and Matt Treinish's clothes closet" has a better ring to it | 16:01 |
dhellmann | there's also that ML thread that said something about trove running 20+ check jobs -- are all of those really necessary? | 16:02 |
dhellmann | and are there other similar cases of excess we can look at? | 16:02 |
mtreinish | dhellmann: none of them are, they're non-voting and don't even work | 16:02 |
dhellmann | s/excess/potential excess/ | 16:02 |
mtreinish | dhellmann: there is a lot of trimming that can be done | 16:02 |
mtreinish | but it requires some one to dig into all the details | 16:02 |
dhellmann | well, they said they were working on getting them to be voting, so I don't want to just turn them all off, but can some be merged? | 16:02 |
mtreinish | which is very time consuming | 16:02 |
*** dklyle is now known as david-lyle | 16:02 | |
dhellmann | mtreinish : perhaps we need to make it a requirement for queens? | 16:03 |
ttx | fungi: re top-5 list -- mention geographic coverage ? | 16:03 |
fungi | dhellmann: i think the trove situation is mostly because they can't effectively check everything they need to in a single job due to the time it takes, so they've split it up into lots of shorter jobs (though that does come with setup overhead). something about the slowness of having to test applications in virtual machines running on other virtual machines | 16:03 |
fungi | ttx: great point, definitely | 16:03 |
dhellmann | fungi : ok, that's the sort of detail I was missing, thanks | 16:03 |
sdague | so, I think if you are going to address the issue of CI use time, the first step would be accounting for it | 16:03 |
sdague | per project, what is the CI use time per patch landed | 16:04 |
mtreinish | back to gate data analysis tooling :) | 16:04 |
sdague | mtreinish: yeh, but that's a very specific problem | 16:04 |
dhellmann | yeah, I don't want to design the whole solution here, just see if that's something worth looking into. | 16:04 |
sdague | dhellmann: yeh, the issue ends up being that a lot of human guesswork happens on projects that we think are outliers | 16:05 |
sdague | but it's really at best recapturing 1% of resources | 16:05 |
dhellmann | because if we say it's something we want done, we can recruit someone to do the work | 16:05 |
dhellmann | the outreachy program is starting up and looking for projects; we're going to ask the board to ask companies to give people time to do this sort of work; etc. | 16:05 |
fungi | yeah, while the skip-if regexes may reduce job resource consumption for certain changes, it also adds a ton of cognitive complexity to the configuration and regularly breaks/drops jobs for unrelated projects if we're not hypervigilant about regex review and inheritance/precedence. so being able to measure how much it's really reducing job resource consumption would be useful to avoid wasting time and increasing risk there for no appreciable gain | 16:05 |
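(For context, the skip-if mechanism under discussion looked roughly like this in a Zuul v2 layout.yaml; the job name and file patterns below are invented for illustration.)

```yaml
jobs:
  - name: gate-example-dsvm-tempest
    skip-if:
      - project: ^openstack/example$
        all-files-match-any:
          - ^doc/.*$
          - ^.*\.rst$
          - ^releasenotes/.*$
```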
sdague | fungi: right, and set norms | 16:06 |
sdague | which is like the acceptable number of CI hours classes of projects should be shooting for | 16:06 |
sdague | per patch | 16:06 |
dhellmann | sure, like I said, I don't want to solve the problem right now, I want to agree that it's worth looking into | 16:06 |
sdague | dhellmann: I think that given our growth curve, it is worth looking into | 16:07 |
sdague | the previous assumption that we'd always have headroom is not valid | 16:07 |
fungi | we ought to be able to measure that, though as with many things (and i'm tired of saying it), becomes easier in zuul v3. the database reporter will be a wealth of absolute timing data we can mine and analyze | 16:07 |
dhellmann | oops, docs team meeting is starting, I need to head over there | 16:08 |
fungi | sort of like what we have for subunit2sql now, but for the outer layer | 16:08 |
sdague | fungi: sure, but I wouldn't wait to get started on it if there were volunteers | 16:08 |
fungi | agreed | 16:08 |
fungi | it's more a question of whether we have the relevant data. best case we might be able to mine it out of zuul debug logs today | 16:09 |
sdague | yeh | 16:09 |
fungi | we can get builds per change and their start/end times from there | 16:09 |
mtreinish | fungi: we have the data for the gate queue at least in subunit2sql already, I was just working on drafting a small graphing script to make some bar graphs by project | 16:10 |
sdague | mtreinish: you need the check queue | 16:10 |
sdague | that's where most of the CI time is spent | 16:10 |
mtreinish | sdague: we do, but at least for right now we don't have that data aggregated anywhere | 16:11 |
ttx | Once we are past RC1 I'm considering writing a "Beware of zombies" contribution metrics post | 16:11 |
fungi | well, also subunit2sql doesn't get all jobs, and for the ones which do provide subunit it often only covers the job payload so there's some (variable depending on the job) overhead to take into account | 16:11 |
sdague | fungi: ++ | 16:11 |
sdague | I would caution extending subunit2sql here | 16:12 |
sdague | honestly, you could get the data almost entirely from scraping gerrit comments | 16:12 |
sdague | you'd miss a few conditions | 16:12 |
mtreinish | fungi: there is, but at least for dsvm jobs we also get the devstack portion. So all that's missing from there is the pre-devstack setup (which can take a lot of time in some situations) | 16:12 |
sdague | but you'd have a 90% solution | 16:12 |
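(A sketch of the 90% approach sdague suggests: summing per-job durations from the result comments CI leaves on Gerrit. The "job-name URL : SUCCESS in 1h 02m 21s" comment format assumed here is typical of the period, but treat the exact regex as an assumption.)

```python
import re

# Matches the "... : SUCCESS in 1h 02m 21s" portion of CI result comments.
DURATION = re.compile(r": (?:SUCCESS|FAILURE) in (?:(\d+)h )?(?:(\d+)m )?(\d+)s")

def ci_seconds(comment_text):
    """Sum the job wall-clock time reported in one CI comment on a change."""
    total = 0
    for hours, minutes, seconds in DURATION.findall(comment_text):
        total += int(hours or 0) * 3600 + int(minutes or 0) * 60 + int(seconds)
    return total
```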
fungi | so we may be able to get some prelim numbers out of the subunit2sql data and i'd be cool with it as a point of interest, but raw data from zuul will be more accurate (whether that's gleaned from debug logs or queried from the v3 reports db) | 16:12 |
mtreinish | but subunit2sql is an incomplete set of data, it was just easy to do and would give us a view | 16:13 |
fungi | yeah, i certainly don't object and it may show us some things we aren't expecting | 16:13 |
sdague | I would have caution there without the check queue, you are getting a < 10% dataset | 16:14 |
sdague | so it's going to be really easy to make decisions on the wrong picture | 16:14 |
fungi | also it's a rarefied set of changes which have already passed check and review | 16:16 |
fungi | who knows how much trash is being put in check which never sees the gate? | 16:16 |
sdague | fungi: yep, and all the -nv on check | 16:19 |
mtreinish | that trove case mriedem pointed out on the ML is a good example of that | 16:21 |
sdague | sure, but even neutron is running > 15 tempest jobs in check | 16:21 |
sdague | there is also an interesting edge condition I was noticing this morning on the tripleo gate queue (that was quite large) | 16:22 |
sdague | the tripleo ci nodes / jobs are pretty unstable (or at least have been) | 16:22 |
sdague | but those patches, and the things they gate with, include a ton of standard ci jobs | 16:22 |
sdague | which means every time there is a gate reset because of the tripleo ci | 16:23 |
sdague | it's snagging a ton of high priority nodes from the main CI | 16:23 |
sdague | all the puppet changes, for instance, and mistral are in that cogate | 16:24 |
sdague | I think that behavior wasn't noticed until we became so node constrained | 16:24 |
fungi | the stuff running in the tripleo queue in the normal gate pipeline actually runs on our nodes (multi-node jobs i think?) | 16:25 |
fungi | there's a separate check-tripleo or whatever pipeline which runs jobs on the tripleo test cloud | 16:26 |
sdague | sure | 16:26 |
sdague | but it cogates with puppet | 16:26 |
fungi | but the tripleo jobs running on our normal nodes are definitely also very unstable | 16:26 |
fungi | and yes, it's getting queued with all the other repos whose teams have agreed to gate on tripleo jobs | 16:27 |
sdague | right, and gate queue is priority of check | 16:27 |
fungi | correct (though at least we won't assign nodes to more than 20 changes in a gate queue if they're failing frequently) | 16:27 |
sdague | sure, but puppet-nova is 15 nodes (for instance) | 16:29 |
sdague | 6 of them tempest runs | 16:29 |
sdague | 20 * 15 == 300 | 16:29 |
sdague | so worst case that's currently nearly half of our capacity | 16:29 |
sdague | real world 20 node deep case (what was constant this week) my guess is basically locking up 100 nodes | 16:30 |
sdague | it's a new problem, don't want anyone to feel bad about it | 16:30 |
sdague | but it's something that I don't think we ran in such a way in previous releases | 16:31 |
openstackgerrit | Hongbin Lu proposed openstack/governance master: Zun completion of python35 goal https://review.openstack.org/492615 | 16:32 |
*** hongbin has joined #openstack-tc | 16:33 | |
*** emagana has quit IRC | 16:49 | |
*** marst_ has quit IRC | 16:54 | |
*** marst_ has joined #openstack-tc | 16:54 | |
mtreinish | fungi: hah, well trying to graph all the projects in 1 go I encountered a first using matplotlib: "ValueError: Image size of 180000x2700 pixels is too large. It must be less than 2^16 in each direction." | 16:54 |
mtreinish | that's no fun | 16:55 |
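(The limit mtreinish hit is matplotlib's hard cap of 2^16 pixels per image dimension, where pixels = inches × dpi; the 180000x2700 figure corresponds to 1800x27 inches at 100 dpi. A sketch of staying under the cap, with sizes invented for illustration; in practice splitting into several figures, as mtreinish then did with a top-10 cut, is saner than ultra-wide bars.)

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

MAX_PIXELS = 2 ** 16   # matplotlib rejects images >= 2^16 px per side
DPI = 100
n_projects = 1800      # hypothetical number of bars to draw

# One inch per bar would be 180000 px wide at 100 dpi; clamp below the cap.
width_inches = min(n_projects * 1.0, (MAX_PIXELS - 1) / DPI)
fig, ax = plt.subplots(figsize=(width_inches, 27), dpi=DPI)
```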
dims | fungi : glad to hear that | 16:56 |
fungi | mtreinish: hah! | 16:57 |
*** dtantsur is now known as dtantsur|afk | 17:00 | |
*** emagana has joined #openstack-tc | 17:09 | |
*** emagana has quit IRC | 17:12 | |
*** emagana has joined #openstack-tc | 17:13 | |
*** marst_ has quit IRC | 17:13 | |
mtreinish | fungi: interesting I found a bug in the elastic-recheck tests running the graphs: http://logs.openstack.org/42/492342/1/gate/gate-elastic-recheck-python27-ubuntu-xenial/f42c792/console.html#_2017-08-10_15_24_52_027089 | 17:14 |
mtreinish | fungi: because http://i.imgur.com/inMQlBo.png clearly wasn't right | 17:14 |
mtreinish | (limited it to the top 10 run time sums) | 17:15 |
fungi | indeed, that's odd | 17:15 |
mtreinish | I think watcher has the same problem: http://paste.openstack.org/show/618078/ | 17:17 |
*** emagana has quit IRC | 17:18 | |
mtreinish | the run_time is done by taking the first start time and the last stop time for all the tests and using the delta | 17:18 |
mtreinish | so if something is messing with time I could see that having unintended consequences | 17:18 |
mtreinish | ftr, watcher's unit tests take ~50sec to execute not 25M secs | 17:19 |
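(A minimal sketch of the run_time computation mtreinish describes, and why a globally mocked datetime wrecks it; the dict field names here are assumptions, not subunit2sql's actual schema.)

```python
from datetime import datetime

def run_time(test_results):
    """Wall time as described above: earliest start to latest stop."""
    start = min(r["start_time"] for r in test_results)
    stop = max(r["stop_time"] for r in test_results)
    return (stop - start).total_seconds()

# A suite that mocks datetime globally records bogus timestamps, so the
# delta explodes (e.g. watcher's ~50s run reporting ~25M seconds).
results = [
    {"start_time": datetime(2017, 8, 10, 15, 0, 0),
     "stop_time": datetime(2017, 8, 10, 15, 0, 50)},  # sane test timing
    {"start_time": datetime(2017, 8, 10, 15, 0, 1),
     "stop_time": datetime(2018, 6, 1, 0, 0, 0)},     # leaked mocked clock
]
print(run_time(results))  # huge value driven by the one bad timestamp
```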
*** emagana has joined #openstack-tc | 17:19 | |
mtreinish | as amusing as that is :) | 17:19 |
*** emagana has quit IRC | 17:19 | |
*** emagana has joined #openstack-tc | 17:20 | |
mtreinish | fungi: yep looks like both test suites mock datetime during the run. Which is probably messing with the subunit time reporting | 17:22 |
mtreinish | fungi: ignoring those 2 outliers it's a much more reasonable graph: http://i.imgur.com/DkwwTO9.png | 17:23 |
mtreinish | that's over the whole db (so 6 months) | 17:23 |
fungi | we probably do need someone to think through a datetime mocking solution which won't interfere with subunit reporting though | 17:25 |
fungi | just as a longer-term thing to keep on the radar | 17:25 |
mtreinish | yeah, starting that investigation was going to be my post lunch task | 17:26 |
*** emagana has quit IRC | 17:58 | |
fungi | sdague: to follow up on the capacity questions, rackspace also just bumped our quota by an aggregate of a couple hundred instances across two regions, and at the same time that took effect the zuul backlog (jobs waiting/pending) reversed direction from trending upward to trending downward. maybe coincidence, but i like to think we were just on the cusp of being able to keep up | 18:37 |
*** cdent has quit IRC | 18:41 | |
sdague | fungi: we might have been | 18:44 |
sdague | we're still 3 hours wait time on check nodes | 18:44 |
sdague | hopefully that burns down | 18:44 |
fungi | it ought to catch up fairly quickly | 18:47 |
fungi | we're at about 2k jobs waiting at present | 18:48 |
fungi | and running around 850 nodes in use after accounting for boot/delete overhead | 18:48 |
fungi | looks like we've been dropping the backlog by a steady 500 jobs per hour since the quota bump took effect | 18:49 |
fungi | so maybe another 4 hours to be fully caught up, if nothing major changes | 18:49 |
*** openstackgerrit has quit IRC | 19:03 | |
*** emagana has joined #openstack-tc | 19:15 | |
*** emagana has quit IRC | 19:23 | |
*** emagana has joined #openstack-tc | 19:24 | |
*** emagana has quit IRC | 19:29 | |
*** emagana has joined #openstack-tc | 19:36 | |
*** thingee_ has joined #openstack-tc | 19:58 | |
*** emagana has quit IRC | 20:46 | |
*** emagana has joined #openstack-tc | 20:46 | |
*** emagana_ has joined #openstack-tc | 20:47 | |
*** emagana has quit IRC | 20:50 | |
*** emagana_ has quit IRC | 21:07 | |
fungi | also, zuul beat the clock on my catch-up estimate. huzzah! | 23:03 |
*** hongbin has quit IRC | 23:21 | |
*** sdague has quit IRC | 23:47 |