14:00:11 <shardy> #startmeeting tripleo
14:00:15 <openstack> Meeting started Tue Apr 12 14:00:11 2016 UTC and is due to finish in 60 minutes.  The chair is shardy. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:16 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:18 <openstack> The meeting name has been set to 'tripleo'
14:00:27 <shardy> #topic rollcall
14:00:29 <derekh> o/
14:00:31 <shardy> Hi all!
14:00:33 <jaosorior> o/
14:00:34 <dprince> hi
14:00:35 <trown> o/
14:00:42 * derekh lurks while working on rh1 cloud
14:01:09 <pradk> o/
14:01:14 <EmilienM> o/
14:01:17 <bandini> o/
14:01:51 <gfidente> o/
14:01:51 <shardy> Ok then, let's get started
14:01:53 <sanjay__u> o/
14:02:00 <shardy> #topic agenda
14:02:00 <shardy> * one off agenda items
14:02:00 <shardy> * bugs
14:02:00 <shardy> * Projects releases or stable backports
14:02:00 <shardy> * CI
14:02:03 <shardy> * Specs
14:02:15 <shardy> * Open discussion
14:02:20 <shardy> I made one minor change:
14:02:30 <shardy> #link https://wiki.openstack.org/wiki/Meetings/TripleO#One-off_agenda_items
14:02:40 <shardy> I moved one-off items to the start, as we kept running out of time
14:03:00 <dprince> shardy: good idea
14:03:03 <shardy> I propose we keep them time-boxed at say 5mins each max, and move them to open-discussion if they run over, sound reasonable?
14:03:15 <dprince> yep
14:03:31 <shardy> Cool, so there are two this week
14:03:37 <shardy> #topic one off agenda items
14:03:48 <shardy> trown: want to give us an update on tripleo-quickstart?
14:03:57 <shardy> I'm not sure if those are from last week or not
14:03:58 <trown> shardy: sure
14:04:07 <trown> they are from last week
14:04:23 <trown> tripleo-quickstart code is imported, and third-party CI jobs are running
14:04:47 <trown> still need to move over github issues and make some minor CI fixes
14:05:02 <shardy> trown: excellent, sounds like good progress :)
14:05:02 <dprince> shardy: yeah, so I usually clean up that wiki right before the meeting :). And this week I didn't...
14:05:25 <shardy> dprince: hehe, I'll do it right after we finish :)
14:05:37 <shardy> Ok, one other one-off item is summit topics:
14:05:40 <trown> documentation is a bit blocked by the lack of an image for upstream, but that is a pretty big topic
14:05:49 <shardy> #link https://etherpad.openstack.org/p/newton-tripleo-sessions
14:06:12 <shardy> trown: ack - is that something we can follow up on the ML, or does it need discussion now?
14:06:36 <trown> I think ML, plus maybe CI subteam meeting if/when it is an official thing
14:06:43 <shardy> trown: +1, thanks
14:06:56 <shardy> Ok so I refactored all the session proposals
14:07:12 <shardy> and other than the TLS one I managed to capture all the ideas, with some combined into sessions
14:07:18 <shardy> it looks a lot like last summit really
14:07:30 <shardy> anyone have any final comments or objections to those?
14:07:38 <shardy> I need to propose them to the schedule this week
14:08:13 <dprince> shardy: seems good to me
14:08:44 <shardy> Ok, any issues let me know otherwise I'll propose them later today or early tomorrow, thanks!
14:08:46 <EmilienM> it looks excellent
14:09:09 <shardy> #topic bugs
14:09:25 * beagles wanders in late
14:10:11 <shardy> Anyone have any specific bugs to mention?
14:10:33 <shardy> I see a number of CI related ones, and I'm aware we've got CI issues generally, but anything else to highlight?
14:10:41 <derekh> we got two bugs preventing us from moving the current-tripleo pin https://bugs.launchpad.net/oslo.config/+bug/1568820 and
14:10:42 <openstack> Launchpad bug 1568820 in OpenStack Compute (nova) "Duplicate sections in generated config" [High,Confirmed]
14:11:03 <shardy> derekh: aha, that was going to be my next question :)
14:11:35 <EmilienM> why don't we purge nova.conf before puppet run?
14:11:36 <derekh> I seem to have lost the other bug; anyways, it's a mistral thing
14:11:45 <EmilienM> I think puppet-nova has support for that
14:11:55 <trown> EmilienM: oh, that is a good idea
14:11:58 <derekh> EmilienM: we could, and it works for the undercloud,
14:12:14 <derekh> EmilienM: I've tried it for the overcloud, but I don't think I did it correctly
14:12:16 <dprince> EmilienM: we might check to make sure there isn't also a packaging issue
14:12:22 <derekh> dprince: its not
14:12:34 <dprince> EmilienM: purging via puppet would be fine too... but might cover over the actual bug
14:12:41 <dprince> derekh: okay
14:12:41 <derekh> dprince: oslo-config-generator is generating duplicate sections
14:12:49 <EmilienM> we do it in glance https://github.com/openstack/puppet-glance/blob/5d6e42356efb79e62bf3f1f464a444be39b2dca4/manifests/registry.pp#L116-L119
14:12:53 <EmilienM> I can submit a patch in nova
14:13:05 <EmilienM> #action EmilienM to patch puppet-nova to add support for nova.conf purging and patch tripleo to enable it
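For context, the purge pattern EmilienM points at in puppet-glance is driven by a boolean class parameter, which can be set from hieradata. A minimal sketch: the glance key below matches the linked registry.pp, while the nova key is hypothetical until the #action above lands.

```yaml
# Purge any config settings puppet doesn't manage, so stale or
# duplicated sections don't survive a puppet run.
glance::registry::purge_config: true   # real parameter in the linked puppet-glance code
nova::purge_config: true               # assumed key name, pending the puppet-nova patch
```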
14:13:09 <bnemec> derekh: Is there a bug open for that?
14:13:21 <derekh> bnemec: https://bugs.launchpad.net/oslo.config/+bug/1568820
14:13:22 <openstack> Launchpad bug 1568820 in OpenStack Compute (nova) "Duplicate sections in generated config" [High,Confirmed]
14:13:22 <dprince> derekh: okay, so oslo-config-generator is causing it to be packaged with duplicates then?
14:13:29 <bnemec> derekh: Thanks
14:13:45 <derekh> dprince: yup, and that seems to confuse the puppet-nova module
14:14:20 <dprince> derekh: cool. let's fix that then rather than turn on puppet-nova config purging
14:14:25 <derekh> bnemec: The fix I have up probably isn't suitable; been trying to work on a better solution but got dragged off to something else
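To make the failure mode concrete, the generator output being described looks roughly like the sketch below (section and option names are illustrative); since puppet-nova manages nova.conf through inifile-based providers, a section appearing twice plausibly leaves it ambiguous which copy owns a given option.

```ini
[DEFAULT]
# ...

[oslo_messaging_rabbit]
rabbit_host = localhost

[oslo_messaging_rabbit]
# the same section emitted a second time by oslo-config-generator
rabbit_port = 5672
```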
14:15:09 <derekh> anyways, that's all from me; those are the things in the way of moving the tripleo pin
14:15:21 <shardy> Ok, sounds like this will require further discussion after we get CI back up, thanks for the update derekh
14:15:43 <shardy> #topic Projects releases or stable backports
14:15:56 <EmilienM> dprince: ok so no need to patch puppet-nova?
14:16:12 <shardy> So a couple of updates - we branched puppet-tripleo yesterday as that missed the initial stable/mitaka branching
14:16:45 <shardy> and gfidente updated the wiki with a revised release process that uses openstack/releases to push the tags & announce the release to the ML
14:16:56 <dprince> EmilienM: don't think so
14:17:00 <shardy> #link https://github.com/openstack/releases
14:17:12 <shardy> #link https://review.openstack.org/#/c/303986/
14:17:27 <shardy> #link https://wiki.openstack.org/wiki/TripleO/ReleaseManagement#How_to_make_a_release
14:18:32 <shardy> That's the first step towards aligning with the release process changes discussed here (for all projects):
14:18:37 <shardy> #link http://lists.openstack.org/pipermail/openstack-dev/2016-March/090737.html
14:19:18 <shardy> So anyone can now propose a release, and we can potentially look at doing interim milestone releases too via a similar method
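For reference, proposing a release through openstack/releases amounts to pushing a review that adds or updates a deliverable file. A rough sketch, with an illustrative version number and a placeholder hash:

```yaml
# deliverables/mitaka/puppet-tripleo.yaml (sketch; layout follows the
# openstack/releases convention at the time, version/hash are placeholders)
launchpad: tripleo
releases:
  - version: 2.0.0
    projects:
      - repo: openstack/puppet-tripleo
        hash: <commit sha to tag>
```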
14:20:27 <shardy> Anyone have any comments on that, or other release/backport content?
14:21:38 <shardy> Ok then
14:21:40 <shardy> #topic CI
14:21:49 * shardy dons tin-foil hat
14:21:50 <derekh> rh1 is down as of this morning
14:22:01 <derekh> currently working on it
14:22:21 <shardy> derekh: any info to share re the cause yet?
14:22:23 <derekh> besides that, jobs are running too long and hitting timeouts
14:22:45 <sshnaidm> derekh, yes, we're encountering more infra issues than usual; I prepared some statistics for today:
14:22:45 <sshnaidm> #link https://etherpad.openstack.org/p/tripleo-issues-analysis
14:22:57 <weshay> sshnaidm, very nice!
14:23:13 <derekh> shardy: nope, this is the 3rd or 4th time this exact thing has happened (in 2 years): 100,000s of ARP requests flying around the network
14:23:36 <derekh> shardy: the only way I've ever figured out to deal with it has been a reboot of everything
14:24:21 <jdob> https://media.giphy.com/media/F7yLXA5fJ5sLC/giphy.gif
14:24:22 <shardy> sshnaidm: thanks, I guess everything will fail today, but when the cloud is back up this sort of analysis will be useful
14:24:30 <derekh> shardy: the caching work I've been doing is blocked on getting the tripleo pin moved
14:24:35 <bnemec> jdob: +1!
14:24:38 <shardy> derekh: ack, thanks
14:24:56 <derekh> sshnaidm: yup, we've got lots of problems ;-(
14:25:00 <derekh> one more thing
14:25:23 <gfidente> ... panic ...
14:25:35 <derekh> I *think* the dns errors we have seen in jobs a few times coincide with nodepool being restarted
14:25:59 <derekh> that should be ok, but it hits us with lots of new instance requests at the same time
14:26:06 <derekh> and kind of DOS's us
14:26:13 <derekh> I think that's what happens anyways
14:26:22 <shardy> derekh: interesting
14:26:39 <shardy> re the ram usage, I've been doing some profiling trying to figure out things we can do to reduce it
14:26:44 <trown> that seems like another argument for being totally third-party
14:27:00 <shardy> so far I've got a patch up which disables ceilometer/aodh because we don't use them, that saves over 300M
14:27:09 <derekh> trown: it would
14:27:12 <derekh> shardy: nice
14:27:53 <shardy> There's also a lot of processes spawning multiple workers on the undercloud which we may be able to trim down, like we have for the overcloud
14:28:00 <dprince> shardy: just make those composable roles?
14:28:14 <trown> dprince: composable undercloud?
14:28:19 <shardy> I'm also attempting to profile inside heat as that's one of the worst offenders (along with mariadb)
14:28:52 <bnemec> We probably have mariadb tuned for real deployments, which isn't ideal for a memory-limited CI environment.
14:28:53 <dprince> trown: that too, the idea of using Heat again for the undercloud is worthy of getting back to
14:28:55 <shardy> dprince: Yeah, well initially I guess it'll be some conditionals in the manifest, but we could look at using the same service profiles from puppet-tripleo perhaps?
14:29:10 <derekh> any ideas why the overcloud deploy takes > 1hr? lots of stops to sync up?
14:29:12 <shardy> bnemec: Yeah, I was wondering if we should add a "minimal" option to undercloud install
14:29:16 <dprince> trown: it was one of TripleO's foundational ideas... i.e. the feedback between the over- and undercloud
14:29:17 <derekh> for HA
14:29:23 <shardy> that turns on a different config tuned for minimal footprint
14:30:12 <shardy> derekh: the overcloud deploy takes 10mins for me locally, so I suspect it's performance of the platform
14:30:30 <gfidente> shardy +1
14:30:33 <trown> derekh: at least some of the time it is due to the nova-ironic race, which we have retries to deal with, but retries are time-expensive
14:30:33 <bnemec> shardy: Yeah, or we use slagle's custom hieradata to pass in a CI-specific config.  I feel a little weird adding user-facing options for CI only.
14:30:34 <derekh> shardy: for the HA overcloud ?
14:31:06 <slagle> a 10 minute deploy? is there a --turbo option? :)
14:31:13 <bnemec> :-)
14:31:13 <gfidente> overcloud only
14:31:24 <bnemec> I'm up to 20 minutes locally these days.
14:31:26 <gfidente> but yes that's it for me too
14:31:28 <shardy> bnemec: Yeah, I was thinking the same, but it'd be good to offer the option for developers too I think
14:31:45 <shardy> bnemec: This is for a 2-node nonha deployment on a box with an SSD
14:31:47 <bnemec> shardy: Maybe put something in tripleo.sh so dev environments get that by default.
14:32:00 <shardy> it does take 20mins on my other (slower non-SSD) box
14:32:02 <bnemec> shardy: Yeah, I'm SSD-backed too.  It's still slow.
14:32:25 <gfidente> honestly for me that is just about using 2 cores and 8G per node
14:32:29 <shardy> bnemec: sure, we can start with tripleo.sh and prove out what options we need there I guess
14:33:01 <bnemec> shardy: Yeah, we can figure something out.
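As a sketch of what such a CI/dev-specific config might contain, the hieradata below caps per-service worker counts, using parameters that exist in the respective puppet modules (values are illustrative; whether this lands in tripleo.sh or as an undercloud install option was left open above):

```yaml
# Minimal-footprint undercloud: one API worker per service instead
# of the default of one per core.
nova::api::osapi_compute_workers: 1
glance::api::workers: 1
heat::api::workers: 1
keystone::admin_workers: 1
keystone::public_workers: 1
```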
14:33:26 <slagle> couldn't we get rid of nova-conductor on the uc too?
14:33:42 <dprince> slagle: I tried. I don't think it is possible anymore
14:33:55 <dprince> slagle: nova requires it now I think :/
14:33:59 <EmilienM> how can we get rid of nova conductor?
14:34:01 <slagle> oh ok
14:34:01 <trown> :(
14:34:12 <shardy> slagle: we already have a custom OS::Nova::Server resource in tripleo-common, so I wonder if we actually need nova at all
14:34:27 <shardy> that's more of a long term discussion tho ;)
14:34:40 <trown> that would save some resources :)
14:35:04 <dprince> shardy: exactly, but I think we'd want Ironic heat resources too then
14:35:27 <shardy> dprince: Yeah, how to wire it up definitely needs more thought
14:35:40 <dprince> getting rid of nova in the UC is a worthy investigation
14:36:14 <dprince> shardy: I've been keen on dropping Nova in the UC for a while now. But we've actually added more features around Nova, not less :/
14:36:34 <shardy> dprince: Not really, we've got a custom nova resource and a custom nova scheduler
14:36:50 <dprince> shardy: like some of the scheduler related things in t-h-t
14:37:08 <shardy> the scheduler filter thing could be easily reimplemented by filtering an ironic node list
14:37:19 <shardy> dprince: that actually didn't work, bnemec had to write a custom filter
14:37:29 <dprince> shardy: perhaps so, and I'd like to see it work out
14:37:30 <shardy> so again, nova isn't actually buying us much there
14:37:46 <shardy> anyway, shall we move on and table this for the beer-track at summit? :)
14:38:02 <dprince> shardy: the reason for nova is multi-tenant use cases
14:38:08 <shardy> I'll try to carve out some time to revive my ironic resources and look at how it might be done
14:38:16 <dprince> shardy: like Magnum would require it for example
14:38:44 <dprince> shardy: if we've got no plans for multi-tenant undercloud cases or to use projects that require that then I think we can seriously consider dropping it
14:40:21 <shardy> dprince: Yeah, there's a bunch more complexity than nova around that tho right - I'm not sure we'd want to support Ironic in those environments anyway unless all the separation around baremetal-to-tenant is worked out?
14:41:13 <shardy> #topic specs
14:41:20 <dprince> shardy: last summit I think there was a lot of focus around multi-tenant baremetal clouds that use nova...
14:41:51 <shardy> dprince: ack, something to discuss further then I guess - I'm just wondering if we have to *always* require it
14:42:11 <shardy> e.g can we support a "tripleo lite" mode
14:42:40 <dprince> shardy: for me the key would be to get Heat to support OS::Nova::Server, similar to slagle's patch but without the extra work
14:43:02 <dprince> shardy: if that is even possible; just refining the interface there a bit so it seems cleaner
14:43:25 <dprince> all things to consider
14:43:28 <bnemec> I would not be a fan of trying to support both Nova and not-Nova.
14:44:31 <shardy> #link https://review.openstack.org/#/q/project:openstack/tripleo-specs+status:open
14:45:12 <shardy> So on the topic of specs - I'm planning to create some series in launchpad (like other projects) so we can track features on the roadmap for newton
14:45:39 <shardy> folks have started asking about it already, and it'll be easier to track if we can tag either spec-lite bugs or blueprints against a "newton" series
14:45:44 <shardy> is that OK with folks?
14:45:51 <trown> +1
14:46:04 <bandini> +1
14:46:20 <dprince> +1
14:46:31 <shardy> the only effort is ensuring you raise either a bug or blueprint and propose it for the series, then we know it's targeted to Newton
14:46:33 <beagles> +1
14:47:10 <shardy> I'd say specs should be optional particularly for simpler features, as they always get bogged down in implementation discussions
14:47:26 <shardy> obviously for more complex stuff specs may be posted too :)
14:48:40 <shardy> Relatedly, it'd be good to consider making some milestone releases this cycle (again like other projects), mostly so we give folks better visibility of the release cycle as it progresses
14:49:39 <shardy> Anyway, we can discuss that further on the ML, just getting the idea out there for consideration
14:50:00 <shardy> #topic Open Discussion
14:50:18 <shardy> Anyone have anything to discuss?
14:50:21 <sshnaidm> I'd like to get your opinions about tempest support patch: https://review.openstack.org/#/c/295844/
14:51:15 <shardy> sshnaidm: We simply don't have the time budget to run it on every commit - are you thinking of the periodic job?
14:51:19 <dprince> sshnaidm: idea seems fine to me. Just keep in mind that we aren't even close to a point where running that for all the CI jobs is appropriate
14:51:25 <trown> I am -1 to adding anything that increases job time
14:51:29 <sshnaidm> shardy, yes, it should go to periodic
14:51:34 <dprince> sshnaidm: as a general tool/feature for tripleo.sh it is fine I think
14:51:58 <sshnaidm> dprince, sure, I plan to start with periodic nonha, and then we'll see
14:52:04 <trown> periodic would be fine though
14:52:05 <EmilienM> is there any wip to move it in tempest itself?
14:52:23 <dprince> one new topic I'd like to mention is not landing any more t-h-t features that aren't in the composable roles format
14:52:40 <EmilienM> config_tempest.py sounds like a script not specific to tripleo, but one that could be used widely in OpenStack
14:52:50 <dprince> now that we have the keystone example, I think that should be sufficient to convert controller services over to the new format...
14:52:58 <shardy> dprince: +1, gnocchi was the last one because it was agreed as a backport exception for mitaka
14:52:59 <EmilienM> dprince: ++
14:53:12 <trown> nice
14:53:28 <slagle> are we still committed to testing stack-updates before landing composable roles?
14:53:32 <EmilienM> sshnaidm: see my question ^
14:53:40 <sshnaidm> EmilienM, dmellado is working on tempest configuration script in upstream, but not close to finish yet
14:53:54 <dprince> slagle: we have an upgrades job
14:54:07 <slagle> dprince: it doesn't test anything
14:54:07 <sshnaidm> EmilienM, he promises to present it in summit
14:54:11 <dprince> slagle: understood that isn't where we'd like it to be but that is the bar I think
14:54:24 <shardy> What is the status of that? It does an update, but not the full version-to-version upgrade, right?
14:54:28 <dprince> slagle: we can't wait on this any further I think
14:54:36 <slagle> ok, i'm just asking
14:54:39 <EmilienM> shardy: wait, won't we run tempest at each commit?
14:54:44 <shardy> EmilienM: No
14:54:47 <sshnaidm> example of tempest run is here: https://review.openstack.org/#/c/297038/16  - but just an example
14:55:04 <EmilienM> I think that's a mistake, we should run at least some basic tests. But that's my opinion
14:55:12 <sshnaidm> EmilienM, I plan to start with nonha periodic, because it's time consuming now
14:55:13 <shardy> EmilienM: we don't have time unless we can somehow reduce our CI runtime by ~20mins
14:55:24 <sshnaidm> yes, time is bottleneck now
14:55:29 <shardy> EmilienM: if we can reduce the runtime, we can consider adding it
14:55:30 <slagle> dprince: we had been in agreement on this, but I had the feeling we weren't any longer, so would just like to clarify
14:55:44 <EmilienM> shardy: but what if we drop pingtest and run tempest/smoke instead?
14:55:55 <EmilienM> tempest/smoke has 2 scenarios that spawn a VM and ssh to it
14:56:14 <dprince> EmilienM: tempest isn't our most important issue in our CI jobs, I think. Until the walltime comes down significantly I'd like the talk of tempest to go away
14:56:14 <shardy> EmilienM: we'll have to compare the coverage - IMO pingtest covers things not covered at all by tempest
14:56:26 <bnemec> EmilienM: ping test takes 3 minutes right now, not 20.
14:56:48 <bnemec> At least last I checked after the cirros change merged.
14:57:04 <trown> bnemec: yep it is back down to super speedy
14:57:09 <EmilienM> mhh ok, everyone seems happy with pingtest
14:57:11 <shardy> tempest also has zero functional coverage of heat, which is covered by pingtest
14:57:17 <trown> tempest smoke is taking 20+ minutes in RDO
14:57:32 <shardy> EmilienM: everyone wants to see better coverage, but half our CI jobs are getting killed by the infra timeout
14:57:32 * EmilienM stops arguing
14:57:39 <bnemec> shardy: +1
14:57:41 <shardy> we have to fix that problem first
14:57:42 <dprince> EmilienM: I wouldn't say we are happy with it. We just can't spare any extra time
14:58:01 <bnemec> It's not that I don't want Tempest, but we aren't in a place where it's practical yet.  Unfortunately.
14:58:12 <shardy> starting with the periodic jobs seems like the most workable compromise
14:58:16 <dprince> slagle: let's revisit the talk about how to improve the upgrades job in #tripleo perhaps
14:58:19 <bandini> is getting beefier HW an option for CI?
14:58:28 <trown> ya I am pro putting it on the periodic job
14:58:30 <bnemec> bandini: You buying? :-)
14:58:34 <dprince> slagle: everything, not just composability, would benefit from the upgrades job being better
14:58:40 <shardy> bandini: there are various options under discussion
14:58:46 <bandini> bnemec: erm I left my wallet at home :P
14:59:00 <shardy> 2mins - anything else before we wrap up?
14:59:08 <slagle> dprince: sure
14:59:20 <bandini> shardy: I hope something comes out of it, because it seems such an important problem atm
14:59:30 * EmilienM jumps in puppet meeting
14:59:39 <shardy> Ok, thanks all!
14:59:43 <shardy> #endmeeting