16:01:46 <mnaser> #startmeeting openstack_ansible_meeting
16:01:47 <openstack> Meeting started Tue Jul 23 16:01:46 2019 UTC and is due to finish in 60 minutes. The chair is mnaser. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:48 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:01:50 <openstack> The meeting name has been set to 'openstack_ansible_meeting'
16:01:50 <mnaser> #topic rollcall
16:01:52 <mnaser> o/
16:01:56 <noonedeadpunk> o/
16:01:57 <namrata> o/
16:02:05 <guilhermesp> o/
16:02:25 <mnaser> how's everyone?!
16:02:32 <jrosser> o/
16:02:41 * guilhermesp fixes deployments
16:02:56 <mnaser> fun :)
16:03:04 <mnaser> #topic office hours
16:03:53 <mnaser> personally
16:03:56 <mnaser> https://review.opendev.org/#/c/671783/
16:04:00 <mnaser> i love the -392 part :D
16:04:19 <guilhermesp> it is a lot :P
16:04:29 <noonedeadpunk> So I was thinking about dropping os-log-dir-setup.yml and the rsyslog client as part of this
16:04:45 <guilhermesp> seems that we've started the work of cleaning things up
16:05:19 <mnaser> yeah i think at this point rsyslog_client won't really do much if we never run it
16:05:26 <mnaser> i would even retire rsyslog_server unless someone wants to maintain it
16:05:31 <noonedeadpunk> but it's still used by ceph, unbound and things like tempest, rally and the utility container...
16:05:44 <mnaser> utility needs rsyslog? o_O
16:06:03 <noonedeadpunk> not rsyslog but os-log-dir-setup.yml
16:06:08 <mnaser> ahhh ok
16:06:16 <mnaser> this is the thing that does the bind mounts right?
16:06:17 <noonedeadpunk> I mixed things up a bit :(
16:06:26 <noonedeadpunk> yep
16:06:33 <mnaser> i was always wondering why we just didn't log things inside the container and kill all those complicated bind mounts
16:08:47 <jrosser> it's probably historical, so you can just collect up whatever is in /openstack/logs/* and not worry about which containers you have
16:08:54 <noonedeadpunk> probably for compressing without entering the container
16:08:57 <spotz> I thought it was so we could find everything in one place?
16:09:37 <noonedeadpunk> but since the integrated tests, which are actually metal ones, everything is already in one place
16:10:15 <evrardjp> o/
16:10:26 <spotz> evrardjp!!!!
16:10:46 <evrardjp> spotz: ! :)
16:11:02 <evrardjp> mnaser: love the - in -392
16:12:46 <cjloader> o/
16:12:48 <evrardjp> mnaser: yes, historical. I would prefer removing all the complex bind mounts too, as this was a pain to deal with. Cleaning this up would also simplify the code further
16:13:45 <jrosser> there is something that collects container journals on the host anyway isn't there?
16:13:47 <noonedeadpunk> so, the thing is that we don't need that almost anywhere, except 3 playbooks
16:14:01 <jrosser> so with everything moving to journals there is now little point in keeping the bind mounts
16:14:32 <mnaser> yeah i agree with all of this so maybe it would be a good clean up
16:14:45 <mnaser> btw also, knock on wood, our CI has been relatively stable recently
16:14:51 <mnaser> im pretty happy with where it is
16:15:06 <mnaser> the upgrade jobs need work unfortunately
16:15:37 <evrardjp> yeah.
16:15:44 <evrardjp> we've decreased coverage though :(
16:16:07 <evrardjp> if ppl want to help on increasing coverage, I have plenty of ideas, so little time.
16:16:17 <jrosser> evrardjp: ^ can you define what you mean by that, as there are several different things
16:18:20 <evrardjp> well I think the first thing to do is to match what we removed -- so define new "specific" jobs having a pre-run play configuring the o_u_c or user_variables, to get feature parity back
16:18:36 <evrardjp> then I guess the idea would be to implement multi-node jobs in periodics
16:20:22 <noonedeadpunk> Yeah, since we might be missing system packages in roles
16:20:50 <noonedeadpunk> we won't catch this if they are already installed by a previous role
16:21:01 <mnaser> would we get coverage again if we run both container and metal on every change?
16:21:12 <mnaser> i mean im not opposed to it but we need to figure out why centos takes stupidly long with lxc
16:21:16 <mnaser> its like 2h20 to run an aio
16:21:35 <jrosser> i would like to see haproxy back in some form
16:22:08 <mnaser> yes with the bind-to-mgmt stuff you're doing
16:22:10 <mnaser> it'll be running fine
16:22:19 <jrosser> the current metal jobs are fast but they are not sufficiently real life
16:22:42 <jrosser> i'm a bit stuck on the galera stuff there
16:25:12 <mnaser> jrosser: it's odd that it just works for us, i can try to help with looking at the failures
16:25:16 <mnaser> did you end up doing the my.cnf adjustments?
16:25:25 <jrosser> i did, and it's inconsistent
16:26:00 <jrosser> here's the change for the client my.cnf https://review.opendev.org/#/c/672101/
16:26:24 <mnaser> jrosser: lol
16:26:26 <mnaser> you're gonna hate me
16:26:40 <mnaser> jrosser: check the review :p
16:26:43 <jrosser> i figured i may have messed it up
16:27:13 <noonedeadpunk> yeah
16:27:14 <jrosser> aaaaaahhhhhh crap :)
16:27:42 <jrosser> thank you :)
16:27:44 * jrosser fixes
16:28:09 <openstackgerrit> Jonathan Rosser proposed openstack/openstack-ansible-galera_client master: Default to connecting clients via ip rather than local socket https://review.opendev.org/672101
16:28:22 <mnaser> so that should hopefully help
16:28:48 <evrardjp> mnaser: it won't be enough, and I didn't intend to run centos+lxc :)
16:29:07 <mnaser> evrardjp: well i figured we'd run all the operating systems we cover
16:29:14 <evrardjp> I just wanted to have like mariadb cluster testing + keystone and stop there
16:30:00 <mnaser> that probably can be wired up in logic
16:30:09 <evrardjp> we already have all we need
16:30:17 <mnaser> yes but writing the 'dependency' system
16:30:19 <evrardjp> just change affinity
16:30:35 <evrardjp> what do you mean?
16:31:08 <mnaser> when testing os_keystone then grab what services need os_keystone (i.e. galera, memcache, etc)
16:31:13 <evrardjp> the idea to not run centos+lxc was just to have scenarios (ubuntu+lxc is the most frequent one)
16:31:38 <mnaser> when testing os_nova then grab its dependencies which are os_neutron, os_keystone, os_glance
16:31:45 <mnaser> whose dependencies are.. etc
16:31:46 <mnaser> to reduce our run times
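(A rough illustration of the dependency resolution mnaser is describing: given the role under test, resolve the transitive set of services a CI scenario would need to deploy. This is a hypothetical Python sketch with a made-up dependency map, not the project's tooling; as evrardjp notes just below, tests/role/bootstrap already does something along these lines.)

#!/usr/bin/env python3
# Hypothetical sketch only: the dependency map below is illustrative and
# does not reflect any file that ships with openstack-ansible.

DEPENDENCIES = {
    "os_keystone": {"galera_server", "memcached_server", "rabbitmq_server"},
    "os_glance": {"os_keystone"},
    "os_neutron": {"os_keystone"},
    "os_nova": {"os_keystone", "os_glance", "os_neutron"},
}


def required_roles(role, deps=DEPENDENCIES):
    """Return the role plus everything it transitively depends on."""
    needed, stack = set(), [role]
    while stack:
        current = stack.pop()
        if current not in needed:
            needed.add(current)
            stack.extend(deps.get(current, ()))
    return needed


if __name__ == "__main__":
    # e.g. testing os_nova pulls in keystone, glance, neutron and the shared
    # infrastructure they need, and nothing else -- keeping job runtimes down.
    print(sorted(required_roles("os_nova")))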
16:31:47 <evrardjp> that's kinda what's done in tests/role/bootstrap
16:32:03 <evrardjp> but I understand it would be smarter if we do it that way
16:32:17 <evrardjp> I thought this could be done using a CLI :)
16:32:21 <mnaser> mhmm
16:32:31 <mnaser> i think we should have full coverage, if we do aio_lxc, we do it for all supported systems
16:32:38 <evrardjp> just encapsulate the logic there, instead of relying on so many conditionals in ansible
16:32:41 <mnaser> imho we either do it or drop it
16:32:56 <mnaser> or otherwise we'll get a change that breaks a role for centos that won't be caught there
16:33:02 <mnaser> and then itll be broken in integrated
16:33:45 <evrardjp> I meant to keep centos for baremetal, so it would still be tested. Just keeping the most common use cases
16:33:59 <evrardjp> but I understand your point
16:34:38 <mnaser> i feel like with a little bit more effort we'd understand the fundamental reason why centos is sooooo slow
16:34:47 <mnaser> i think we've regressed something
16:34:50 <mnaser> it never took this long before
16:35:22 <jrosser> the data is all there in the ARA reports / db
16:35:38 <jrosser> to decide if it's specific things that are slow, or it's just across the board
16:35:44 <mnaser> across the board
16:35:46 <mnaser> every operation is slower
16:35:49 <mnaser> like 3-4x slower
16:35:54 <mnaser> even simple things
16:36:09 <jrosser> does that still stand outside CI?
16:36:12 <noonedeadpunk> Could our connection module be affecting it?
16:36:36 <mnaser> i havent tried outside CI, i thought about our connection module but figured it would regress in both OSes?
16:36:54 <jamesdenton> o/
16:36:59 <mnaser> bonjour
16:37:39 <jamesdenton> i've been talking on IRC for like, at least a week, and just realized nothing was going thru
16:37:45 <mnaser> lolol
16:37:52 <jamesdenton> my feelings were hurt for a bit
16:37:55 <jamesdenton> lol
16:38:34 <mnaser> yeah i dont know
16:38:39 <mnaser> for the centos stuff
16:38:44 <mnaser> it deffo needs some profiling
16:39:01 <mnaser> it'd be nice to get to nspawn and not have to deal with that but
16:39:29 <jrosser> we bit off too much there in one go
16:39:43 <mnaser> yea macvlan+nspawn together is a lot
16:39:47 <jrosser> nspawn + macvlan is too much
16:39:48 <jrosser> yes
16:40:00 <mnaser> bridge+nspawn is easier to consume but i dunno if i currently have the cycles to help with it
16:40:25 <jrosser> i think there may be (was?) a limitation with nspawn and the number of interfaces you could create
16:41:30 <jrosser> evrardjp: did you do some work on ansible profiling?
16:41:39 <evrardjp> long ago
16:41:44 <evrardjp> dw is better :D
16:41:50 <jrosser> i was just looking for dw but he's not in #mitogen
16:42:07 <jrosser> don't want to waste a bunch of time learning 10 wrong tools when someone can just say "do this"
16:42:18 <evrardjp> that reminds me I need to connect to that channel since I reconfigured my bouncer
16:42:24 <evrardjp> jrosser: totally
16:42:35 <evrardjp> just wait for him, last time he was super helpful to me
16:42:53 <evrardjp> maybe ping him on twitter?
16:45:43 <jrosser> done
16:46:47 <chandankumar> cloudnull: jrosser needs +w on this https://review.opendev.org/#/c/672225/
16:46:47 <mnaser> so we'll keep improving and cleaning up things, the journald stuff seems to be cleaning things up well
16:47:18 <cloudnull> chandankumar done
16:47:25 <chandankumar> cloudnull: thanks :-)
16:47:59 <mnaser> OH
16:48:01 <mnaser> also
16:48:06 <mnaser> did y'all see my email to the ML
16:48:10 <mnaser> about openstack-ansible-collab
16:48:52 <guilhermesp> hectic days, just saw an email around, but I will take a look
16:50:24 <jamesdenton> Just a heads up, but the unicast flood issue that was brought up last week is related to a change in os-vif introduced in Stein. See: https://bugs.launchpad.net/os-vif/+bug/1837252.
16:50:24 <openstack> Launchpad bug 1837252 in neutron "IFLA_BR_AGEING_TIME of 0 causes flooding across bridges" [Undecided,Incomplete]
16:50:27 <jrosser> chandankumar: did you find a solution for your tempest undefined var?
16:51:15 <mnaser> jamesdenton: affecting lxb only?
16:51:31 <chandankumar> jrosser: the above changes worked here https://review.opendev.org/#/c/672231/ but I need to come up with a better solution
16:51:33 <jamesdenton> it affects the qbr bridges, too, with OVS. Just not sure what the overall effect is there
16:51:56 <chandankumar> it might be happening due to mixing of venv and distro stuff
16:52:37 <chandankumar> jrosser: sorry, wrong review
16:52:58 <chandankumar> jrosser: https://review.opendev.org/#/c/667219/
16:57:30 <jrosser> evrardjp: word according to dw "'perf record -g ansible-playbook ...' of the ansible run /and/ separately on the host using simply 'perf record' followed by 'perf report' might show something obvious"
16:57:51 <jrosser> namrata: did you want to ask about upgrades?
16:58:08 <namrata> yeah I was waiting for open discussion
16:58:22 <evrardjp> jrosser: oh yeah that rings a bell :)
16:58:31 <namrata> Hi, I would like to contribute to openstack-ansible and I can start with the issue I faced while upgrading R->S, i.e. upgrading the WSREP SST method from xtrabackup-v2 to mariabackup.
16:58:31 <jrosser> namrata: just ask :)
16:59:02 <evrardjp> jrosser: could it be the fact we just added all those plays, and maybe there is cruft in the inventory?
16:59:05 <evrardjp> I haven't checked tbh
16:59:29 <namrata> jrosser suggested to take this up in the meeting so we can discuss how to handle it
16:59:36 <jrosser> mnaser: ^ so for fixing up the R->S upgrade for galera, do we make patches to master? i wasn't totally clear where we do that
16:59:42 <jrosser> evrardjp: ^ maybe you can advise too
17:00:11 <evrardjp> well, is S->master broken too?
17:00:28 <jrosser> if you start on S you are already on the new replication method
17:00:33 <evrardjp> ok
17:00:55 <evrardjp> so it's the S upgrade only, so it's only implementable in stein
17:01:07 <evrardjp> you got your answer? :p
17:01:30 <jrosser> i guess it's made more complicated by not having a working R->S upgrade CI job :/
17:01:38 <jrosser> well, unless we do, of course
17:02:07 <mnaser> yeah, stable-only patch
17:02:09 <namrata> okay so I should push to stable/stein then
17:02:55 <jrosser> namrata: ok so that sounds like your answer, write something that goes onto stable/stein
17:03:08 <namrata> jrosser thanks
17:03:10 <jrosser> and thanks for taking the time to fix it up :)
17:03:28 <namrata> :]
17:15:41 <mnaser> #endmeeting
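(For reference on the galera upgrade namrata raised above: the R->S change moves the cluster's state snapshot transfer from Percona's xtrabackup-v2 to MariaDB's own mariabackup. A minimal sketch of the my.cnf settings involved is below; the actual variable names and templates used by the openstack-ansible galera_server role may differ.)

# illustrative only -- the galera_server role templates these settings,
# and the exact option names/values used by OSA may vary
[mysqld]
wsrep_sst_method = mariabackup        # was xtrabackup-v2 on Rocky
wsrep_sst_auth   = root:PASSWORD      # placeholder; mariabackup SST authenticates as a database user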