21:03:34 #startmeeting crossproject
21:03:34 Meeting started Tue Feb 10 21:03:34 2015 UTC and is due to finish in 60 minutes. The chair is ttx. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:03:35 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:03:38 Our agenda for today:
21:03:38 The meeting name has been set to 'crossproject'
21:03:42 #link http://wiki.openstack.org/Meetings/CrossProjectMeeting
21:03:50 #topic Status update on novanet2neutron
21:03:55 o/
21:03:58 hi there
21:03:59 As promised, we scheduled a status update to keep track of how that effort was doing
21:04:00 o/
21:04:08 anteaya: floor is yours
21:04:10 so we have been working hard and making some progress
21:04:12 thanks
21:04:12 Ok.
21:04:23 we have meetings
21:04:26 #link https://wiki.openstack.org/wiki/Meetings/Nova-nettoNeutronMigration
21:04:33 the logs are attached to that page
21:04:43 we have merged part 1 of a two-part neutron spec
21:04:53 the first part sets expectations for operators
21:05:18 emagma is going to start work on docs to have something ready for sdague to present at the operators mid-cycle
21:05:29 * SergeyLukjanov is here, hotel wifi isn't working ;(
21:05:36 anteaya: much appreciated
21:05:44 now we are working on the second part, for which some folks have requested proof-of-concept code to evaluate
21:05:56 we have a WIP patch up for a db migration
21:06:09 #link https://review.openstack.org/148260
21:06:21 our group thinks this can be tested by itself
21:06:44 once we have a patchset that doesn't crash, jlibosva and an op from yahoo will test
21:07:02 we also have a WIP patch up for a proxy: https://review.openstack.org/#/c/150490/
21:07:09 which can use more eyes
21:07:15 so right now
21:07:33 we are looking for someone to work on docs with emagma who can attend the Tuesday 0900 UTC meetings
21:07:37 since he can't
21:07:42 and more reviewers
21:07:45 comments?
21:08:23 did I lose the room?
21:08:40 I'm here
21:08:43 o/
21:08:44 * morganfainberg is here
21:08:44 mikal: thanks
21:08:47 we're listening
21:08:47 anteaya, lots to look at
21:08:51 great okay so i'm done
21:08:54 asalkeld: great
21:09:02 unless there are comments or questions?
21:09:02 anteaya: have you identified any code which needs to land in nova?
21:09:07 is the neutron-proxy something that's likely to get into nova?
21:09:19 mikal: I have not yet seen any code changes that need to land in nova
21:09:30 anteaya: cool. please ping me if you do see any...
21:09:36 if someone from the migration group is here and can correct me, speak up
21:09:45 mikal: thank you, will keep my eye out
21:09:51 sdague: I'll be at the ops midcycle too, can help in presenting if needed
21:10:02 bknudson: my understanding is that no, it is a neutron thing that will be in neutron
21:10:20 ttx, sdague: emagma, the author of the docs, says he will also be there
21:10:39 bknudson: I stand corrected
21:10:52 bknudson: this proxy patch is proposed against nova: https://review.openstack.org/#/c/150490/
21:11:08 mikal: ^^ I'm wrong
21:11:20 anteaya: are you making it to the ops mid-cycle?
21:11:25 the proxy patch has been proposed against nova
21:11:25 anteaya: that might be useful if you can
21:11:30 * mikal looks
21:11:36 mikal: I am handing over the torch, I'm mid-cycled out
21:11:47 anteaya: fair enough
21:11:51 mikal: I volunteered to be proxy for this info
21:11:56 at the nova midcycle
21:12:07 mikal: will do all the prep work I am able to, to make this an effective experience
21:12:11 but my cat needs me
21:12:12 there's a reasonable chance I'll be there as well
21:12:13 as long as the team keeps me informed enough
21:12:18 markmcclain: good
21:12:26 sdague: more than you want hopefully
21:12:30 :)
21:12:38 anything more here?
21:12:44 or can I give the floor back?
21:13:04 that sounds good to me
21:13:05 * anteaya yields the floor back to ttx
21:13:08 great progress actually
21:13:08 thanks
21:13:10 hmmm, the "Sideways" test ... an interesting use of grenade
21:13:20 neat :)
21:13:22 next topic...
21:13:25 #topic API_Working_Group update
21:13:59 etoews is unfortunately unable to attend due to a scheduling conflict, so I'll paste his update here
21:14:03 #info WG has agreed to use 1 repo
21:14:08 #link http://lists.openstack.org/pipermail/openstack-dev/2015-January/055687.html
21:14:13 #info WG has agreed to use the api-wg repo
21:14:17 #link http://lists.openstack.org/pipermail/openstack-dev/2015-February/056227.html
21:14:25 #info Of course things can change in the future but, at this point in the API WG's life, we feel the above are the most appropriate. They were good suggestions that provoked some interesting discussion but we'll just have to be more diligent about engaging the CPLs
21:14:37 much like kaufer did in:
21:14:41 #link https://review.openstack.org/#/c/145579/
21:14:52 That is all... Comments on that update?
21:15:15 thumbs up
21:15:32 looks like these are the docs: http://specs.openstack.org/openstack/api-wg/
21:15:42 bknudson: yes
21:16:21 #topic EOL stable/icehouse (a.k.a. "fixing stable branches for good")
21:16:31 So there was a thread flagging the sad current state of stable branches and questioning our ability to support Icehouse for 6 more months
21:16:46 actually, more like 4 more
21:16:53 #link http://lists.openstack.org/pipermail/openstack-dev/2015-February/056366.html
21:16:59 yeah those threads were kind of all over
21:17:09 Personally I think we are in better shape than in recent years, thanks to the stable team restructure and the nomination of stable branch champions
21:17:20 But even if it's occurring less often, the branches still get broken, like today
21:17:35 One issue this thread revealed is the gap between the stable branch champions and the QA/Gate teams... I think those two groups should be working much more closely together
21:17:46 from https://wiki.openstack.org/wiki/StableBranchRelease#Planned_stable.2Ficehouse_releases - 2014.1.5 - last planned, July 2015
21:17:50 ttx: ++
21:17:55 Matt mentioned training more people to gate, I think those champions would be the first in line
21:18:00 s/would/should
21:18:12 ttx, we need better notifications + untangle stable branches from bleeding edge. and this is btw pretty unconnected to the whole EOL issue for icehouse, juno is affected as well.
21:18:23 sharing the same IRC channels and the same status etherpads would also help
21:18:36 so the disconnect seems to be that no one from stable maint is proactively improving stable infra
21:18:38 might I suggest that being on #openstack-qa is a strongly recommended activity for stable branch folks
21:18:44 sdague: ++
21:18:48 as this nearly always manifests as failing tempest tests
21:18:52 ttx, I was actually going to ask QA people to do some TOI for silly stable maintainers, cool
21:18:53 When we embarked on the 15-month support journey in Paris, that was with the promise that stable capping would solve all the illness in the world
21:19:01 or other projects the qa team is typically responsible for
21:19:34 TOI?
21:19:38 do we still think that stable capping would avoid 95% of the issues?
21:19:50 'transfer of information'
21:19:54 ttx: it should still improve the situation, but is proving harder to implement than we thought -- and there's not currently an owner
21:19:58 If yes, I wondered if we should not hold a virtual sprint to make fast progress on that
21:20:01 ttx: honestly hard to tell. I hope capping non transient deps will help
21:20:06 you're still going to get occasional breakage
21:20:12 I think capping will help a lot
21:20:16 but it should help a lot
21:20:17 since we can't seem to find an "owner" with the right skillset
21:20:43 let's collectively fix that by working on it on the same day, rather than all in bits and pieces
21:20:56 ttx: a sprint makes sense; jogo how much more do you think there is to do to get juno caps in place?
21:20:57 I am open to joining, but I'm not really into gate, so will be slow. jogo asked me to review https://review.openstack.org/#/c/147451/16 for the start, I'll start my day tomorrow with this one
21:21:01 (There are a lot of people involved, each with a piece of the puzzle, and we may benefit from having them all focused on the same issue on the same day)
21:21:02 At least the capping would keep the stable gates together when there is something released that is not backwards compatible
21:21:13 dhellmann: I *was* close until everything went kaboom
21:21:23 jogo: that's what I thought
21:21:33 the sprint can double as a good intro on stable gate knowledge
21:21:49 + for sprint and intro
21:21:51 ihrachyshka: that is the crux of the issue. "I am not really into gate"
21:21:55 ihrachyshka: so unless we're going to turn off tests for stable branches, it seems like being into the gate is kind of a key requirement for stable folks
21:22:04 jogo: or do you think a sprint wouldn't help? You were probably the closest to the end goal, so you probably know
21:22:16 sdague, could we also look at simplifying testing on stable: e.g. what do we gain with I->J grenade testing
21:22:20 there is no one in stable that is working on gating infra/testing harness etc
21:22:22 ihrachyshka: by "into" did you mean "interested in" or "already knowledgeable about"?
21:22:34 sdague, I repeat: I relate to it and am open to getting more involved (enough for stable stuff, don't ask me to go deeper)
21:22:42 dhellmann, 2nd, sorry
21:22:50 apevec: that's the basic upgrade test, right?
21:22:53 ttx: a sprint for capping juno won't help IMHO. just having a few people follow that patch and related patches is enough
21:22:57 ihrachyshka: that's what I thought, but wanted to clarify :-)
21:23:14 jogo: so maybe just a shared channel. #openstack-gate ?
21:23:20 dhellmann, yes, but pointless between stable branches
21:23:22 ttx: I prefer -qa
21:23:25 jogo: is there more work to change the test job implementations for installing tempest in a virtualenv, or other things like that?
21:23:29 we already are in there all the time anyway
21:23:30 yeah, grenade for I->J is not critical and its removal would benefit a lot, untangling branches when they start to fall apart
21:23:32 apevec: isn't that how people upgrade?
21:23:50 dhellmann: 1 patch in the gate, at the top
21:23:55 apevec: why is it pointless?
21:24:00 grenade I->J is probably one of the more important tests
21:24:01 dhellmann: they arguably have already upgraded
21:24:05 jogo: cool, so maybe a sprint isn't needed
21:24:19 ttx: if I'm running stable I, and I want to upgrade, I would probably choose stable J, right?
21:24:21 you are double-checking that backports don't break your ability to upgrade
21:24:22 apevec, i don't see why that would be unimportant. we want to make sure upgrades from i->j aren't broken
21:24:26 * dhellmann double checks his alphabet
21:24:29 which you want to do if on I
21:24:32 dhellmann: maybe a gate cruft crash course. on what the current state is and where we want it to go for stable
21:24:35 clarkb, ++
21:24:45 jogo: ++
21:25:08 jogo: having a good doc about how the devstack gate job tools work would be useful too (that might exist already?)
21:25:17 dhellmann: I would say that grenade is more useful to catch breaking changes in last stable -> master than between stable -2 and stable -1
21:25:19 dhellmann: ++
21:25:21 dhellmann: so someone from stable maint can work on moving all stable branches to pinning all dependencies (including transitive). that is a passive effort
21:25:42 jogo: good idea
21:25:44 dansmith, pointless on current stable, it is important I->J worked at GA
21:25:47 dhellmann: as in, I know it caught the first kind of issue in the past, not sure it ever caught such an issue in the latter
21:25:51 we should put a list of these things in an etherpad somewhere
21:25:55 dhellmann: hmm, I think there are a few docs, not sure if there is any one.
21:26:00 apevec: only if we don't allow backports to J
21:26:03 ttx, +, we don't even consider patches for backport that are 'scary' or do db migrations or anything non-obvious, so the chance smth slips in is negligible
21:26:07 dhellmann: so cost of maintenance might outweigh benefits
21:26:12 apevec: why? until I is EOL, we need to make sure I->J works, no?
21:26:36 sdague, backports do not include db schema changes by default, and if they do they must be backward compatible
21:26:49 apevec: what does that have to do with anything?
21:26:57 apevec: grenade tests a whooole lot more than db upgrades
21:27:02 apevec: db migrations are not the only thing involved in an upgrade
21:27:29 i would be very hesitant to remove i->j grenade
21:27:29 what else? config changes? also not allowed in backports
21:27:47 i'd rather defer that until we have the other fixes in
21:27:51 stable updates must work w/o updates
21:27:56 i.e. yum update and all still works
21:27:57 if the maintenance cost is still too high, we can look at removing it
21:28:19 apevec: that's exactly why those tests are important there
21:28:23 morganfainberg, looks like it is the main pain point afaict
21:28:39 morganfainberg, we can start by pinning everything and see whether it makes us generally happy to maintain it as-is
21:28:52 ihrachyshka, ack
21:28:53 ihrachyshka that is my point
21:29:01 ihrachyshka: cap, don't pin, to allow for point updates with bug fixes
21:29:10 apevec: so you're saying it's not worth testing because the reviewers will make sure that the backports don't break anything? why do we test anything then? :)
21:29:11 dhellmann, ++
21:29:21 dhellmann, https://review.openstack.org/#/c/147451/16 actually pins
21:29:33 jogo: ^^ is experimental, right?
21:29:38 dansmith, ok, let's not spend more time on this, was just an idea to float around :)
21:29:45 apevec: alright :)
21:29:47 but yes let's cap, we can evaluate the benefit of keeping grenade until after we do the stable cap
21:29:55 dhellmann: well it doesn't work yet ... but it's not experimental
21:29:57 So... how about we complete stable reqs capping, keep all hands on deck in #openstack-qa until that's achieved... and see how that flies.
21:29:59 what about support term?
21:30:17 jogo: it's not = though, it's ~= so those allow patch updates, right?
21:30:22 If it crashes again, we can look into more dramatic options like removing grenade between stables, or shortening support cycles
21:30:29 ttx, ++
21:30:35 dhellmann: yes
21:30:36 ttx: ++
21:30:37 ack
21:30:42 well we just bounced the last fix from the gate https://review.openstack.org/#/c/154216/
21:30:55 More generally, stable branch champions should get more education on gates (and anyone who wants to join them)
21:31:04 Unable to compute factors p and q from exponent d.
21:31:16 jogo: sounds like bruteforcing RSA
21:31:18 oh, haven't seen that one in a while...
21:31:20 wow, sounds computer sciency
21:31:57 paramiko
21:31:57 ttx, so what's the plan to get us educated?
21:32:01 so I see two action items from here
21:32:08 jogo, mtreinish: how does the short-term plan I summarized work for you?
21:32:24 * get some stable maint people to step up and help with stable-related tooling
21:32:51 * reevaluate 15-month support for Kilo
21:33:01 * oh and stable maint to use -qa
21:33:13 and set up that training
21:33:17 yes
21:33:22 who's going to do that?
21:33:40 I can take the reevaluate item
21:33:41 ttx: wait another few hours to unwedge stable/juno and be able to work on pinning juno reqs
21:34:01 ttx: yeah, but I'm still extremely skeptical about 15 months
21:34:04 dhellmann: training is the follow-up to the first bullet point
21:34:10 mtreinish: me too
21:34:26 We need a wider discussion on whether maintaining 3 branches for 3 months every 6 months is actually much more of a nightmare than 2 branches all the time
21:34:37 mtreinish, me too, supporting 3+ branches in parallel is hard and honestly pointless from my side
21:34:40 jogo: ok, but it would be good to know who is going to put together the training materials and docs
21:34:45 like... when branches are broken, how often is it branch-specific?
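For readers following the cap-versus-pin exchange above (dhellmann's "cap, don't pin" and the ~= specifiers in the juno requirements patch), here is a minimal illustration of how the three kinds of version specifier behave. The package version numbers are invented, and the snippet just uses the packaging library as a convenient way to evaluate PEP 440 specifiers:

    # Pin vs. cap vs. PEP 440 "compatible release" (~=), with made-up versions.
    from packaging.specifiers import SpecifierSet

    pin = SpecifierSet("==1.2.3")        # exact pin: only 1.2.3 is allowed
    cap = SpecifierSet(">=1.2.0,<1.3")   # cap: any 1.2.x from 1.2.0 onward
    compat = SpecifierSet("~=1.2.0")     # compatible release: >=1.2.0, ==1.2.*

    for version in ("1.2.3", "1.2.4", "1.3.0"):
        print(version,
              version in pin,      # True only for 1.2.3
              version in cap,      # True for 1.2.3 and 1.2.4, not 1.3.0
              version in compat)   # same result as the cap

The point being made in the meeting is that a cap (or ~=) still lets point releases with bug fixes through on the stable branches, while a hard == pin does not.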
21:35:04 We could easily reduce to 12 months and avoid the 3 in parallel
21:35:27 ttx, maybe pinning up versions and the recent untangling of tempest will make it effortless, we need some time and work before being sure
21:35:35 From a consumer standpoint it just feels weird to drop support at the release date without giving any time for the transition, hence the 15 months
21:35:40 ttx: I would support that
21:35:42 sorry if this is offtopicish, but do we do a postmortem [and then the remedial work to stop that class of failure occurring again in future, when we have one of these fails]?
21:35:46 ttx, ++ i agree
21:35:58 ihrachyshka: don't forget maintaining tooling. that takes effort as it changes over time
21:36:05 speaking to the cost
21:36:05 it would also be useful to know how the test job reconfiguration we've done affects how breaks in one branch impact other branches, esp. master
21:36:06 but if 3 branches is a lot more work than 2 (and I'm not convinced of that), then 12 months is definitely an option
21:36:15 lifeless: yes, and every time we start to go and fix it the same problem wedges us...
21:36:20 which is what is happening now
21:36:30 lifeless: most of the causes of the failures have been different instances of the same issue lately
21:36:42 ok, I'll dig into this with one of you after the meeting
21:36:59 lifeless: https://etherpad.openstack.org/p/wedged-stable-gate-feb-2015 should explain it
21:37:08 another stable-ish point I wanted to raise is the lack of ACLs for stable-maint to actually merge stuff
21:37:17 or at least some of it
21:37:23 no devstack, no tempest, no requirements/master
21:37:33 * ttx joins #openstack-qa to help where I can
21:37:33 no grenade either
21:37:51 ihrachyshka: that will come naturally when it's clear that the people doing the work have the right backgrounds in those areas, no?
21:37:56 ihrachyshka: yeah that is a fair point. This is a bit of a chicken-and-egg issue though
21:38:19 * jogo wonders who else here is an active stable maint member?
21:38:21 in the meantime, there are people able to do the needed reviews
21:38:23 ihrachyshka if "you aren't really into gate" i'd say the transfer of knowledge needs to come first on those fronts
21:38:27 dhellmann, I think we should trust people not to merge unreasonable stuff even while they catch up.
21:38:30 jogo: o/
21:38:30 ihrachyshka: also if there is more communication, +2 is not really that needed
21:38:44 morganfainberg, grenade - maybe, but devstack or requirements?
21:38:47 I don't see stable folks reviewing devstack or grenade? Honestly, I fast-approve fixes there when needed
21:38:59 it's blocking you currently because we don't communicate that much between groups
21:39:02 sdague: +1
21:39:07 sdague, +1
21:39:12 ttx, afaik most of the +2 owners are not in my or apevec's timezone
21:39:15 sdague: well we need to get out of the putting-out-fires mode and get in front of the issues
21:39:23 but we are trying to do that now (pinning deps etc.)
21:39:25 ihrachyshka: that's a fair point, TZ doesn't help
21:39:27 jogo: oh, agreed as well
21:39:43 part of "not communicating" is due to not being in the same TZ at all
21:39:50 not just us being dense
21:40:16 well we do have a devstack +2 in .eu with chmouel, so he should be reachable then
21:40:24 if stable maint wants to help put out fires, they need people available at all times.
21:40:34 jogo: stable-maint is a bit more diffuse now, we have project-specific teams and stable-maint-core to bind them all
21:40:52 so stable-maint-core needs good timezone coverage
21:41:11 yep!
21:41:12 so hard to tell. From -core, apevec, ihrachyshka are here
21:41:13 jogo, wanna relocate me? :D
21:41:26 ihrachyshka: talk to mordred
21:41:29 we might want other -core to step up as well
21:41:45 ok, I think we need to move on
21:41:49 * mtreinish wonders where adam_g is
21:41:53 we seem to have a path forward
21:41:56 ttx: yeah, not enough people involved is sorta the issue
21:41:56 I think we need some coverage matrix from stable maint, devstack, etc. Just to see whom you can expect to respond and when
21:42:03 we deal with the TZ issue for reviews in other parts of the project. If we can get to the point where having one stable branch broken doesn't also break master, the urgency to get fixes merged within minutes or hours should go away and we can deal with this more calmly.
21:42:06 still waiting for stable maint to sign up to help
21:42:21 dhellmann: ++
21:42:33 ok, let's move on
21:42:37 dhellmann, ++
21:42:38 dhellmann: that's only on some projects though. Tempest has to gate on all the stable branches too
21:42:52 jogo, I already stepped up to look into tooling
21:43:18 ihrachyshka: I think jogo meant "the other members". Obviously you and alan are here :)
21:43:26 mtreinish, adam_g is US west coast afaik
21:43:33 but there are 6 people in that team
21:43:36 mtreinish: we should think more about that, maybe when a branch is broken we disable those tests
21:43:46 so that's a good reason to be in #openstack-qa, because for 3 of those 4 projects the approvers are there
21:43:54 sdague: ++
21:44:05 dhellmann: then we just open ourselves to breaking it more
21:44:09 ttx: yup, 2 is a good start but not sure if it's enough
21:44:28 mtreinish: tradeoffs
21:44:33 is it easy to drop those tests to non-voting temporarily?
21:44:34 jogo: I can take the action to explain that stable-maint-core is not just to fast-track backports in all projects
21:44:56 #action ttx to ask stable-maint-core to get involved in stable gate maint
21:44:58 and if we need to grow review teams we should talk about that as well
21:45:09 dhellmann: heh, fair enough
21:45:16 ttx: thanks. And I will send out a follow-up to arrange for a time to do a knowledge transfer
21:45:20 alright, really moving on now
21:45:26 #topic openstack-specs discussion
21:45:35 * CLI Sorting Argument Guidelines (https://review.openstack.org/#/c/145544/)
21:45:40 This one looks ready to move to TC rubberstamping.
21:45:47 All affected PTLs +1ed it (nikhil +1ed an earlier patchset)
21:46:02 objections?
21:46:25 #action ttx to add https://review.openstack.org/#/c/145544/ to next week's TC agenda
21:46:32 * Add TRACE definition to log guidelines (https://review.openstack.org/#/c/145245/)
21:46:44 Looks like this one may need one more draft before lining up the +1s, but only to fix nits
21:47:03 So it's really close to rubberstamping
21:47:17 would be good to get the PTL content votes in there, then we can do a final typo scrub
21:47:21 review it now or forever hold your peace
21:47:34 * Cross-Project spec to eliminate SQL Schema Downgrades (https://review.openstack.org/#/c/152337/)
21:47:43 This one is the result of http://lists.openstack.org/pipermail/openstack-dev/2015-January/055586.html
21:47:45 omg really?
21:48:03 really wat
21:48:11 really going to ban sql downgrades
21:48:23 ban might be a bit strong
21:48:25 ong
21:48:35 But agree this one needs a lot of PTL attention, since so many projects currently provide downgrade
21:48:49 Also needs some operator feedback. The thread for that is there:
21:48:51 I'm just so in favor of making the downgrade method print "Seriously?"
21:48:51 grenade tests downgrade?
21:48:53 #link http://lists.openstack.org/pipermail/openstack-operators/2015-January/006082.html
21:49:01 bknudson: no
21:49:06 bknudson: nothing tests downgrades
21:49:12 AFAICT general feedback from -operators is that nobody relies on downgrades
21:49:16 dansmith, that's mostly what will happen.
21:49:18 the closest we have is unit tests that try to downgrade an empty database
21:49:22 which is kind of making the point
21:49:31 thank you sdague, you said it shorter than I would have
21:49:32 morganfainberg: can't wait :)
21:49:53 dansmith, "Seriously!?" sys.exit()
21:49:55 ;)
21:50:17 morganfainberg: or "Your license to run openstack has just been revoked. kthx"
21:50:24 dansmith, AHAahaha
21:50:41 "Nice try. Try restoring from backup instead"
21:51:15 ttx: restoring from backup doesn't work either though
21:51:17 You have entered the land of no return.
21:51:19 anyway, if you want it to happen, talk your fellow cores and local PTL into reviewing that one
21:51:22 looks like it'll be oslo.db that will reject the downgrade, so that should make it consistent.
21:51:22 I think the spec is currently overly simplistic
21:51:26 no
21:51:57 http://cdn.meme.am/instances/500x/59033410.jpg
21:51:57 I think a lot of operators also _think_ that downgrades work
21:52:05 So we're trying to solve an education problem with a policy
21:52:07 dhellmann: perfect
21:52:08 Which seems odd to me
21:52:14 mikal: really? because that's not what I've heard at any summit
21:52:16 mikal, i'd like to get more insight into the realities of downgrades on the spec, because downgrades don't really work.
21:52:25 Only operators with no db experience
21:52:27 sdague: nor on that -operators thread for that matter
21:52:38 mikal, even if people think they do
21:52:43 sdague: my recollection of the ops meetup last time was there was surprise
21:52:44 Let's query the operators at the midcycle
21:52:45 Rockyg, ++
21:52:53 Rockyg, please do!
21:52:57 i'd love that feedback
21:53:01 I'm not saying downgrades are great, but I am saying that dropping them doesn't solve our actual problem
21:53:11 even if some people are using it, I like the idea of deprecating with scary warnings first
21:53:15 if operators need it then we'd add an anti-grenade test?
21:53:18 mikal: your case where the APIs are left up and continuing to allow changes during the upgrade is valid, but I'm not sure we can code around that.
21:53:21 mikal: it solves the problem of not spending time on them and still having them not work :)
21:53:27 mikal: I wasn't at the last ops meetup, but at the ops tracks at the last 2 summits that wasn't the vibe I got
21:53:30 No. convincing developers that willy-nilly changing the schemas is what has to happen
21:53:36 mikal, well it does solve *a* problem. we support it and it could irrevocably destroy someone's install
21:53:57 Sorry, that it's bad to change them willy-nilly
21:54:09 dhellmann: well, what worries me is it might take the operator some time to realize they need to roll back
21:54:12 mikal, and there is very very little use/testing of the edge cases with them / any cases
21:54:17 mikal: fair
21:54:29 dhellmann: i.e. shut apis down, do upgrade, test and think everything is ok, turn stuff back on, run for a while, realize they're screwed
21:54:53 mikal: so if dropping the downgrade doesn't solve our problem, what do you think the actual problem is?
21:55:15 I think the spec needs to explore the state change issue more
21:55:22 i've never heard of a single operator actually using a downgrade in production / real deployment. please let's find those operators
21:55:29 And we need to have an alternative path proposed
21:55:47 it would definitely be useful to have a recommendation for dealing with this case
21:55:48 mikal, if something is so broken that it justifies a rollback, do you really trust the data from the new code?
21:55:53 morganfainberg: I've certainly met operators who have had to do manual db cleanups post-upgrade
21:56:01 But that's rolling forward, which I think is reasonable
21:56:04 morganfainberg: and educate them using testing + production?
21:56:05 mikal, that is a different thing.
21:56:11 since downgrades don't work today, removing them doesn't seem to change where we are, and improves it if anything. Then we can talk about other strategies, right?
21:56:29 dansmith: that's true
21:56:38 But think like an operator...
21:56:39 yeh, I'm with dansmith on this. The current path is so massively untested, and not really right
21:56:43 dansmith, it does lighten our test load and prevent irrevocable harm.
21:56:49 it's actually worse than downgrade being a no-op
21:56:58 If we throw away rollbacks (which I am ok with), we also need to provide clear guidance on what to do when you experience a problem
21:57:02 so it's like saying the first time you're going to test your HA env is when you get a blackout
21:57:06 er, I mean downgrade being a no-op is safer
21:57:09 I want to see that discussion happening now so that we actually do that bit
21:57:21 it sounds like we're saying that we don't provide guidance now, and that we need to.
21:57:22 mikal, sure. this is why the spec and threads are posted :)
21:57:32 but AIUI we already do provide guidance - we say to back up the system
21:57:40 lifeless: backups don't work either
21:57:42 sdague: right, I fail to see how preventing them from shooting themselves in the foot would be a bad idea. That doesn't mean we shouldn't work on educating them about safer weapons, but that feels like an orthogonal thing
21:57:54 ttx: exactly
21:57:54 mikal: I recognise that
21:57:56 providing the right guidance is part of the spec. but claiming we support something that we shouldn't should be addressed.
21:58:00 ttx: yes, I agree
21:58:00 Sounds like we should have a session at the midcycle + on the operators list
21:58:02 So, my problem is that the spec doesn't present a workable alternative
21:58:04 ttx, ++
21:58:11 mikal: ++
21:58:22 I'm not trying to be difficult
21:58:26 I feel I am explaining this poorly
21:58:32 mikal, i'm not sure what the alternative would be.
21:58:34 mikal: so until we have a robust rollback, we should leave code in that probably destroys their environment?
or how you'd tool that
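To make the "Seriously!?"/sys.exit() joke above concrete: the tooling change under discussion amounts to keeping upgrade() working while making downgrade() refuse to run. A rough sketch only, loosely in the shape of the sqlalchemy-migrate version scripts many projects carried at the time; the content and wording are illustrative, not taken from the spec, and bknudson's comment above suggests the rejection may end up living in oslo.db rather than in each script:

    # Hypothetical migration script: additive upgrade, downgrade refuses to run.

    def upgrade(migrate_engine):
        # the usual additive schema change would go here; omitted in this sketch
        pass

    def downgrade(migrate_engine):
        raise NotImplementedError(
            "Schema downgrades are not supported. Roll forward to a fixed "
            "release, or restore the database from the backup taken before "
            "the upgrade.")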
21:58:43 I want to use the desire to drop rollbacks as a lever to provide a better long-term solution
21:58:43 that... seems suboptimal
21:58:45 why does a workable alternative have to be a prereq for removing something that actively does harm?
21:58:54 this is our current guide - http://docs.openstack.org/openstack-ops/content/ops_upgrades-roll-back.html
21:59:00 dansmith: when you put it that way
21:59:06 sdague: no, not until we implement that, but until we actually provide guidance to operators who thought rollback might have been what they would use
21:59:13 dansmith: because we haven't solved this problem for several years, and this is how I force people to think it through
21:59:25 I think mikal can take an action to phrase his objection in a clearer manner on that spec
21:59:30 we need to close this meeting
21:59:31 mikal: by holding other people's data hostage? :)
21:59:32 dhellmann: like http://docs.openstack.org/openstack-ops/content/ops_upgrades-roll-back.html ?
21:59:39 So we don't have time anymore for the last one:
21:59:40 dansmith: be nice
21:59:43 * Testing guidelines (https://review.openstack.org/#/c/150653/)
21:59:44 thanks lifeless for the link :)
21:59:46 dansmith: we've had that data locked in a basement for ages
21:59:50 Please read it and provide feedback on it, it's a pretty fundamental piece
21:59:51 dansmith: I'm told it likes it there
21:59:57 sdague: maybe?
22:00:04 mikal: hah
22:00:05 #topic Open discussion & announcements
22:00:27 Will just paste the 1:1 syncs log, focused on the early kilo-3 plans
22:00:31 Logs at: http://eavesdrop.openstack.org/meetings/ptl_sync/2015/ptl_sync.2015-02-10-09.22.html
22:00:33 We'll skip 1:1s next week.
22:00:46 quick last words before I endmeeting?
22:01:37 Sounds like a no
22:01:40 #endmeeting