14:01:10 #startmeeting airship
14:01:11 Meeting started Tue Oct 2 14:01:10 2018 UTC and is due to finish in 60 minutes. The chair is mark-burnett. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:12 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:14 The meeting name has been set to 'airship'
14:01:15 Well played scott
14:01:39 o/
14:01:43 o/
14:01:44 So we have a light agenda again this week, please feel free to add to it: https://etherpad.openstack.org/p/airship-meeting-2018-10-02
14:01:47 o/
14:02:55 o/
14:03:24 o/
14:04:47 I'm trying to help someone get into IRC to talk about the wiki, sorry :)
14:04:51 o/
14:05:06 Should we talk about Armada retries first?
14:05:11 #topic Armada Retries
14:05:32 Can whoever added that give an overview of the issue?
14:07:59 sure
14:08:55 when running a site update or initial deploy, armada runs helm test; if this fails then armada will remove and redeploy the chart
14:09:22 Are you sure?
14:09:27 I don't think I've seen that
14:09:29 though sometimes we end up in a situation where armada moves on, as it sees the chart has been deployed on the 2nd run
14:09:51 if the install/upgrade times out, then helm marks the release as FAILED
14:10:02 I believe there is only a removal when there is a helm (tiller) timeout - not on test failure
14:10:09 in that case, on a retry armada will remove and reinstall the release
14:10:20 ah ok
14:10:20 but it shouldn't on a helm test failure
14:10:38 Yeah, I have definitely seen both install/upgrade failures resulting in removal and also armada "moving on" the 2nd run
14:10:43 though the end result is the same - effectively helm test is being bypassed
14:11:08 here's one idea for how armada can deal better with the retry: https://storyboard.openstack.org/#!/story/2003897
14:11:11 So I guess one obvious factor here is that we have historically relied on retries from prom & sy
14:11:42 Yeah, that's a nice idea seaneagan
14:11:59 Sean, should you also paste a link to your statefulness spec? Perhaps it's the link above
14:12:07 seaneagan: what's proposed above would improve things in one respect, though I worry that run time would increase a lot?
14:12:17 What do you do if something has installed correctly and passed, but then later fails?
14:12:22 means unnecessary test runs sometimes though, yes
14:12:42 I think rerunning the tests should be a solid default behavior, modified by supplied enabling and blacklisting/whitelisting
14:13:28 I mean the test.* section could be updated to include controlling configuration for when the test should be run
14:13:29 Right. If you are in a case where confirming software health is not needed, you can disable testing or whitelist only what you care about
14:13:39 So if there are charts with very slow tests then they can be controlled differently
14:14:27 I think in most site update scenarios, positive test results across all deployed charts would be a Good Thing(tm)
14:14:33 if armada had memory that a particular chart failed previously, then it could rerun tests/waits only in that case
14:14:42 I'm not sure I fully understand though - this seems like duct tape around not having state (e.g. a record of it previously passing)
14:15:36 I'd disagree, since tests are just point-in-time
14:15:51 Rerunning every test every time, while potentially something someone would want, sounds more like an edge case desire - state tracking seems like the right economical solution (in the currency of run time)
14:15:52 So rerunning them regularly seems like a good operational goal
14:16:21 Kaspars Skels proposed openstack/airship-treasuremap master: Uplift latest charts/images except ceph https://review.openstack.org/605331
14:16:30 o/
14:16:37 I think if we were to re-run them regularly, we would also need to improve how the logs from the tests are handled
14:16:42 That just seems like a mode armada should offer, to your point scott - not as a solution to this problem though
14:16:53 ideally storing them somewhere that would allow us to retrieve them
14:17:05 Though a second avenue of maintaining state of previously failed release tests is fine as well
14:17:08 Isn't that LMA?
14:17:14 rather than relying on the pod deletion as part of an upgrade hook currently
14:17:44 it is in some cases, but that relies on LMA being there and operational
14:17:53 which won't be the case for many users
14:18:41 yep - really depends when LMA's deployed, and cases where the LMA components are the releases with failed tests
14:19:08 this would slightly improve matters for deleting test pods at the right time: https://storyboard.openstack.org/#!/story/2003896
14:19:56 though I agree with the direction of using something like LMA
14:19:59 Seems that is the price you pay with a self-hosted monitoring solution. You have that issue across all logs prior to LMA being deployed. I'm not sure how re-running tests changes this issue significantly.
14:22:04 But to the original topic, it seems the solutions are test-result persistence or rerunning tests with every 'apply'
14:22:42 Does it make sense for Sean to take this issue and these possible solutions offline to write a spec that references other outstanding specs?
14:23:25 yes, I did make one for state management previously
14:24:06 https://review.openstack.org/#/c/586582/
14:24:24 so it could be an update to that one or a new one
14:25:35 Ok, maybe we can review this offline.
14:25:45 #topic New Wiki
14:25:57 So we have a new wiki: https://wiki.openstack.org/wiki/Airship
14:26:11 Unfortunately, the person who has put the most work into this is having issues getting into IRC today
14:26:37 I know he wanted to share some points about it, but perhaps it will have to wait until next time
14:27:44 Anyway, please take a look and contribute!
14:27:53 looks great
14:27:57 This looks nice - a great add.
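Editor's note: the test.* controls discussed under the Armada Retries topic refer to the per-chart `test` key in Armada's chart document schema. The following is a minimal illustrative sketch of such a document, loosely based on the armada/Chart/v1 format; the chart names and source location here are hypothetical, and the exact keys supported (e.g. `test.timeout`) should be verified against the Armada documentation for your release.

```yaml
---
schema: armada/Chart/v1
metadata:
  schema: metadata/Document/v1
  name: example-chart        # hypothetical chart document name
data:
  chart_name: example-chart
  release: example-release
  namespace: example
  wait:
    timeout: 600
  test:
    enabled: true            # run helm tests for this release
    timeout: 300             # test timeout in seconds (verify key support)
  source:
    type: git
    location: https://example.org/charts   # hypothetical repo
    subpath: example
```

Per the discussion, charts with very slow tests could set `test.enabled: false` (or a tighter timeout) while keeping testing on as the default elsewhere.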
14:28:05 Yeah, it really is a great start :)
14:28:32 Ok, looks like there is one more item
14:28:37 #topic Marketing 1 Pager
14:28:46 I may have to borrow Lesley to pimp the OSH one
14:29:04 :)
14:29:24 @hogepodge did you want to comment on the 1 pager?
14:29:49 yes
14:29:54 We have just two requests.
14:30:19 The first is clarifying some of the language about the underlying k8s/openstack layer and how that works with the openstack and other application delivery layer
14:30:39 There are comments in the working doc identifying the two confusing sections.
14:31:10 Can you link the working doc again please?
14:31:22 The second is that we'd like to have a small sidebar that talks about AT&T usage and the origin of the project, but that's a bit more work and we understand that would require AT&T approval.
14:31:52 link https://docs.google.com/document/d/1BHFZaKDCuoXKjbsRcAOtb2RFuaOThemXCoFMVxXkAMU/edit?usp=sharing Airship One Pager
14:31:56 #link https://docs.google.com/document/d/1BHFZaKDCuoXKjbsRcAOtb2RFuaOThemXCoFMVxXkAMU/edit?usp=sharing Airship One Pager
14:32:00 Thanks :)
14:32:33 The first is critical, the second is a nice-to-have if we can get it.
14:32:37 thank you!
14:33:03 Thank you, this is really shaping up :)
14:33:42 Yeah, everything is basically there. We're producing one for StarlingX also, which is what gives a bit of breathing room to tidy up the final content.
14:33:49 Thanks to everyone who's contributed to it.
14:34:02 Questions?
14:35:57 Ok, let's close out I guess
14:36:01 #topic Roundtable
14:36:02 Andrey Volkov proposed openstack/airship-in-a-bottle master: Bump Shipyard version https://review.openstack.org/607203
14:36:03 Is there an example of the use case sidebar for AT&T usage - e.g. like the Kata 1 pager
14:36:05 Any other last minute topics?
14:36:07 Ah
14:37:03 alanmeadows: the one mattmceuen did for the containers handout was pretty good
14:37:27 back then it made reference to UCP, but I think it could form the basis quite well?
14:38:03 sure, material isn't a problem; I just meant an example of what the foundation is looking for -- for this 1 pager there was a skeleton to clone
14:38:53 we can take this one offline, don't need to hold everyone up
14:41:36 Andrey Volkov proposed openstack/airship-treasuremap master: Bump Shipyard version https://review.openstack.org/607212
14:43:46 Sorry for cutting off the conversation. Anything else?
14:48:45 Ok, thanks for coming
14:48:47 #endmeeting
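Editor's note: the trade-off debated under the Armada Retries topic (test-result persistence vs. rerunning every test on each apply) can be sketched as a small decision function. This is illustrative only; the names `should_run_tests` and `test_state` are hypothetical and are not part of Armada's actual code.

```python
def should_run_tests(release, test_state, chart_changed, always_rerun=False):
    """Decide whether helm tests should be (re)run for a release.

    test_state maps release name -> last recorded result ("passed" or
    "failed"); a missing entry means the release was never tested.
    """
    if always_rerun:      # "rerun on every apply" mode from the discussion
        return True
    if chart_changed:     # new chart contents invalidate old results
        return True
    # persistence mode: only rerun when there is no recorded pass
    return test_state.get(release) != "passed"
```

With persisted results, a site update that changes nothing skips tests for previously passing releases, which addresses the run-time concern raised in the meeting while still retesting failed or changed charts.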