#openstack-ansible log

16:01:03 <noonedeadpunk> #startmeeting openstack_ansible_meeting
16:01:03 <openstack> Meeting started Tue Sep 29 16:01:03 2020 UTC and is due to finish in 60 minutes.  The chair is noonedeadpunk. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:04 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:01:07 <openstack> The meeting name has been set to 'openstack_ansible_meeting'
16:01:38 <fridtjof[m]> jrosser: found the issue i think
16:01:38 <noonedeadpunk> #topic office hours
16:01:49 <noonedeadpunk> \o/
16:01:59 <jrosser> o/ hello
16:02:39 <noonedeadpunk> Ok, so telemetry failure?
16:04:27 <noonedeadpunk> I'd say let's maybe try to merge aodh at least?
16:05:09 <jrosser> worst case we have to revert it
16:05:25 <jrosser> i've not had opportunity to test that locally yet
16:05:33 <noonedeadpunk> I think worst case we will have just another patch to fix it
16:05:45 <jrosser> right, that's fine
16:06:15 <jrosser> so that's this https://review.opendev.org/#/c/754791
16:06:29 <jrosser> followed by https://review.opendev.org/#/c/754720/
16:06:48 <noonedeadpunk> yep
16:07:29 <noonedeadpunk> ok, then next thing is galera....
16:07:48 <noonedeadpunk> I tried to look into it and it fails in so many different ways....
16:08:37 <noonedeadpunk> When I deployed it locally it was passing 3 or 4 times in a row when I decided that it's ok
16:09:04 <noonedeadpunk> in some cases there was smth weird with container, as service start was just hanging...
16:09:20 <noonedeadpunk> so at the moment, we have 2 scenarios
16:10:54 <noonedeadpunk> 1st is the old one, when one of the containers doesn't see the address of another partner. and it's the issue of this specific member, and it goes back to ok state in case of restart
16:11:03 <noonedeadpunk> while cluster is synced in this state
16:12:20 <noonedeadpunk> 2nd case is when one of the containers is really down and didn't come up. In this case we should restart not the containers which don't see the neighbor but the down member...
16:12:35 <noonedeadpunk> And I dunno how to make ogic to make it work
16:12:41 <noonedeadpunk> *logic
16:12:59 <noonedeadpunk> From other side, we can add serial and probably forget about the issue at once
16:13:54 <jrosser> for the first case do you think that the container networking is completely broken
16:15:00 <noonedeadpunk> no, for the first one 3 members are up and synced, but one of them shows only 2 addresses in wsrep_incoming_addresses
16:15:20 <noonedeadpunk> which doesn't affect anything functionally, except it's weird and our tests fail
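
The two failure scenarios described above could be told apart roughly as follows. This is only a minimal sketch, not the galera role's actual logic: the function names, the member names, and the idea of collecting each member's wsrep_incoming_addresses value into a dict (with None for a member that is down) are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set


def parse_incoming_addresses(value: str) -> Set[str]:
    """Split a wsrep_incoming_addresses value into a set of non-empty addresses."""
    return {addr.strip() for addr in value.split(",") if addr.strip()}


def classify_members(
    statuses: Dict[str, Optional[str]], expected_size: int
) -> Dict[str, List[str]]:
    """Classify cluster members (hypothetical helper, not part of any role).

    statuses maps a member name to its wsrep_incoming_addresses value,
    or to None when the member is down and could not be queried.
    """
    down = sorted(name for name, value in statuses.items() if value is None)
    partial_view = sorted(
        name
        for name, value in statuses.items()
        if value is not None
        and len(parse_incoming_addresses(value)) < expected_size
    )
    # Scenario 2: a member is really down, so restart that member and not
    # the healthy members that merely cannot see it.
    # Scenario 1: everyone is up, but some member has an incomplete view,
    # so restart only that member.
    restart = down if down else partial_view
    return {"down": down, "partial_view": partial_view, "restart": restart}
```

With three members up and one of them reporting only two addresses, the sketch picks that member for restart; with one member down, it picks the down member even though the other two report an incomplete view.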
16:17:06 <jrosser> the functional test is kind of tech debt somehow
16:17:17 <jrosser> we could have an integrated test with affinity=3 on the container
16:17:34 <jrosser> then expand the galera role to have cluster status checks
16:17:54 <noonedeadpunk> have no idea how to do the last part
16:19:33 <noonedeadpunk> or just do cluster checks by default?
16:19:44 <noonedeadpunk> or with some var passed?
16:19:54 <noonedeadpunk> hm, yeah, might be
16:22:02 <jrosser> see affinity on here https://github.com/openstack/openstack-ansible/blob/master/doc/source/admin/maintenance-tasks/containers.rst
16:22:26 <jrosser> i never used this though.... maybe works!?
16:22:28 <noonedeadpunk> yeah. I was not talking about affinity, but about how to extend the role with tests)
16:22:35 <noonedeadpunk> me too lol
16:23:22 <jrosser> yes so it would be optional sanity checks i guess, you don't want that interfering when trying to rescue a broken galera cluster
16:23:37 <jrosser> and some flag to make everything stop after setup-openstack
16:23:47 <jrosser> *setup-infrastructure
16:24:07 <openstackgerrit> James Gibson proposed openstack/openstack-ansible-ops master: Change ansible tests to prefer Python3 over Python2 in vitualenv  https://review.opendev.org/751773
16:24:51 <noonedeadpunk> hm, yeah, makes sense
16:25:06 <jrosser> noonedeadpunk: i have to head out for a bit but there is still a lot to go over for V release
16:25:22 <jrosser> i sort of took over the PTG etherpad to track all these patches
16:25:52 <jrosser> we need the linters fixed for at least openstack-ansible-tests to land the 2.10.1 patch there
16:26:03 <noonedeadpunk> And I think it's about time to freeze master bumps?
16:26:52 <noonedeadpunk> or at least switch master to victoria...
16:27:25 <noonedeadpunk> so we don't start fighting with W issues
16:33:20 <openstackgerrit> Merged openstack/openstack-ansible-os_aodh master: Remove CI jobs to allow db setup patch to merge  https://review.opendev.org/754791
16:33:27 <openstackgerrit> Merged openstack/openstack-ansible-os_aodh master: Use the utility host for db setup tasks  https://review.opendev.org/754720
16:36:24 <jrosser> yes though we also start fighting requirements changes too as they’re based off the branch name
16:37:39 <jrosser> noonedeadpunk: maybe with an extra keyword on the scenario, “infra”, we could just run the first part of the deploy
16:41:32 <noonedeadpunk> jrosser: we can actually just break here in case of some scenarios https://opendev.org/openstack/openstack-ansible/src/branch/master/scripts/gate-check-commit.sh#L188
16:42:20 <jrosser> right - perhaps a small step to getting rid of the functional tests
16:44:08 <noonedeadpunk> yeah, I think I will try doing that tomorrow instead of trying to revive functional tests as is
16:44:44 <jrosser> better value time I think
16:45:18 <noonedeadpunk> yeah
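
The idea of stopping the deploy early for an “infra” scenario keyword could look something like the sketch below. It is written in Python purely to illustrate the decision; gate-check-commit.sh itself is a shell script and this is not its code. The assumption that a scenario is an underscore-separated list of keywords, the "infra" keyword itself, and the function name are all hypothetical.

```python
from typing import List

# The three top-level playbooks, in the order a full deploy runs them.
PLAYBOOKS = [
    "setup-hosts.yml",
    "setup-infrastructure.yml",
    "setup-openstack.yml",
]


def playbooks_for_scenario(scenario: str) -> List[str]:
    """Return the playbooks to run for a scenario string such as "aio_lxc".

    Assumption: the scenario is a list of keywords joined by underscores,
    and a hypothetical "infra" keyword means "stop after
    setup-infrastructure.yml".
    """
    keywords = scenario.split("_")
    if "infra" in keywords:
        # Break before setup-openstack.yml, so only hosts and
        # infrastructure (galera, rabbitmq, ...) get deployed and tested.
        return PLAYBOOKS[:2]
    return list(PLAYBOOKS)
```

Matching on whole keywords rather than substrings keeps an unrelated keyword that merely contains "infra" from accidentally shortening the run.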
16:47:17 <openstackgerrit> Dmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible-tests master: Update ansible-lint==4.3.5, flake8==3.8.3, bashate==2.0.0  https://review.opendev.org/754982
16:48:27 <openstackgerrit> Dmitriy Rabotyagov (noonedeadpunk) proposed openstack/ansible-role-systemd_mount master: Install required packages for NFS/CephFS mounts  https://review.opendev.org/754978
16:49:45 <openstackgerrit> Dmitriy Rabotyagov (noonedeadpunk) proposed openstack/ansible-role-systemd_mount master: Install required packages for NFS/CephFS mounts  https://review.opendev.org/754978
16:57:10 * jrosser back
16:58:44 <noonedeadpunk> btw, mensis has completed fixing the monasca role at least for train
16:59:48 <jrosser> that's good - so long as we can keep on top of it
16:59:58 <jrosser> like senlin CI already seems completely broken :(
17:00:38 <noonedeadpunk> oh damn
17:00:39 * jrosser wish we had a better dashboard for periodic jobs
17:00:53 <jrosser> it's kind of easy if you're a one-repo project to look in zuul state
17:01:04 <jrosser> but with so many it's just really hard
17:01:07 <noonedeadpunk> yeah it is...
17:01:21 <noonedeadpunk> but what I was going to say about monasca - we have retired roles
17:01:34 <noonedeadpunk> and I was thinking about reviving it
17:01:52 <noonedeadpunk> the thing was, that monasca had 2 repos - for service and agent
17:02:02 <noonedeadpunk> and I was thinking if it's worth merging them now
17:02:09 <noonedeadpunk> like we did for galera
17:02:26 <jrosser> that makes sense, it's not unlike neutron or nova really
17:02:52 <noonedeadpunk> point in separation might be, that agent installation can be provided to customers who know nothing about osa
17:03:14 <jrosser> what does it create?
17:03:39 <noonedeadpunk> I think it grabs data from vms?
17:03:48 <noonedeadpunk> like prometheus exporter or smth...
17:04:49 <mensis> it's for grabbing metrics, and it has several plugins, including ones for gathering metrics from vms
17:05:08 <openstackgerrit> Jonathan Rosser proposed openstack/openstack-ansible master: Update ansible-lint==4.3.5, flake8==3.8.3, bashate==2.0.0  https://review.opendev.org/755065
17:06:26 <noonedeadpunk> but the thing is, that monasca can be left without PTL and not sure about project future because of that...
17:12:48 <noonedeadpunk> #endmeeting