16:01:03 <noonedeadpunk> #startmeeting openstack_ansible_meeting
16:01:03 <openstack> Meeting started Tue Sep 29 16:01:03 2020 UTC and is due to finish in 60 minutes. The chair is noonedeadpunk. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:04 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:01:07 <openstack> The meeting name has been set to 'openstack_ansible_meeting'
16:01:38 <fridtjof[m]> jrosser: found the issue i think
16:01:38 <noonedeadpunk> #topic office hours
16:01:49 <noonedeadpunk> \o/
16:01:59 <jrosser> o/ hello
16:02:39 <noonedeadpunk> Ok, so telemetry failure?
16:04:27 <noonedeadpunk> I'd say let's maybe trying to merge aodh at least?
16:05:09 <jrosser> worst case we have to revert it
16:05:25 <jrosser> i've not had opportunity to test that locally yet
16:05:33 <noonedeadpunk> I think worst case we will have just another patch to fix it
16:05:45 <jrosser> right, thats fine
16:06:15 <jrosser> so thats this https://review.opendev.org/#/c/754791
16:06:29 <jrosser> followed by https://review.opendev.org/#/c/754720/
16:06:48 <noonedeadpunk> yep
16:07:29 <noonedeadpunk> ok, then next thingis galera....
16:07:48 <noonedeadpunk> I tried to look into it and it fails in so many different ways....
16:08:37 <noonedeadpunk> When I deployed it locally it was passing 3 or 4 times in a row when I decided that it's ok
16:09:04 <noonedeadpunk> in some cases there was smth weird with container, as service start was just hanging...
16:09:20 <noonedeadpunk> so at the moment, we have 2 scenarios
16:10:54 <noonedeadpunk> 1st is old one, when one of the containers don't see address of another partner. and it's the issue of this specific member, and it goes back to ok state in case of restart
16:11:03 <noonedeadpunk> while cluster is synced in this state
16:12:20 <noonedeadpunk> 2nd case when one of the containers is really down and didn't get up. IN this case we should restart not containers which don't see neighboor but down member...
16:12:35 <noonedeadpunk> And I dunno how to make ogic to make it work
16:12:41 <noonedeadpunk> *logic
16:12:59 <noonedeadpunk> From other side, we can add serial and probably forget about the issue at once
16:13:54 <jrosser> for the first case do you think that the container networking is completely broken
16:15:00 <noonedeadpunk> no, for the first 3 members are up and synced, but one of them show only 2 addresses in wsrep_incoming_addresses
16:15:20 <noonedeadpunk> which doesn't affect anything functionally, except it's weird and our tests fail
16:17:06 <jrosser> the functional test is kind of tech debt somehow
16:17:17 <jrosser> we could have an integrated test with affinity=3 on the container
16:17:34 <jrosser> then expand the galera role to have cluster status checks
16:17:54 <noonedeadpunk> have no idea how to do the last part
16:19:33 <noonedeadpunk> or just do cluster checks by default?
16:19:44 <noonedeadpunk> or with some var passed?
16:19:54 <noonedeadpunk> hm, yeah, might be
16:22:02 <jrosser> see affinity on here https://github.com/openstack/openstack-ansible/blob/master/doc/source/admin/maintenance-tasks/containers.rst
16:22:26 <jrosser> i never used this though.... maybe works!?
16:22:28 <noonedeadpunk> yeah. I was not about affinity, but about how to extend role with tests)
16:22:35 <noonedeadpunk> me too lol
16:23:22 <jrosser> yes so it would be optional sanity checks i guess, you don't want that interfering when trying to rescue a broken galera cluster
16:23:37 <jrosser> and some flag to make everything stop after setup-openstack
16:23:47 <jrosser> *setup-infrastructure
16:24:07 <openstackgerrit> James Gibson proposed openstack/openstack-ansible-ops master: Change ansible tests to prefer Python3 over Python2 in vitualenv https://review.opendev.org/751773
16:24:51 <noonedeadpunk> hm, yeah, makes sense
16:25:06 <jrosser> noonedeadpunk: i have to head out for a bit but there is still a lot to go over for V release
16:25:22 <jrosser> i sort of took over the PTG etherpad to track all these patches
16:25:52 <jrosser> we needs the linters fixed for at least openstack-ansible-tests to land 2.10.1 patch there
16:26:03 <noonedeadpunk> And I think it's about time to freeze master bumps?
16:26:52 <noonedeadpunk> or at least switch master to victoria...
16:27:25 <noonedeadpunk> so we don't start figting with W issues
16:33:20 <openstackgerrit> Merged openstack/openstack-ansible-os_aodh master: Remove CI jobs to allow db setup patch to merge https://review.opendev.org/754791
16:33:27 <openstackgerrit> Merged openstack/openstack-ansible-os_aodh master: Use the utility host for db setup tasks https://review.opendev.org/754720
16:36:24 <jrosser> yes though we also start fighting requirements changes too as they’re based off the branch name
16:37:39 <jrosser> noonedeadpunk: maybe an extra keyword on the scenario “infra” we could just run the first part of the deploy
16:41:32 <noonedeadpunk> jrosser: we can actually just to break here in case of some scenarios https://opendev.org/openstack/openstack-ansible/src/branch/master/scripts/gate-check-commit.sh#L188
16:42:20 <jrosser> right - perhaps a small step to getting rid of the functional tests
16:44:08 <noonedeadpunk> yeah, I think I will try doing that tomorrow instead of trying to revive functional tests as is
16:44:44 <jrosser> better value time I think
16:45:18 <noonedeadpunk> yeah
16:47:17 <openstackgerrit> Dmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible-tests master: Update ansible-lint==4.3.5, flake8==3.8.3, bashate==2.0.0 https://review.opendev.org/754982
16:48:27 <openstackgerrit> Dmitriy Rabotyagov (noonedeadpunk) proposed openstack/ansible-role-systemd_mount master: Install required packages for NFS/CephFS mounts https://review.opendev.org/754978
16:49:45 <openstackgerrit> Dmitriy Rabotyagov (noonedeadpunk) proposed openstack/ansible-role-systemd_mount master: Install required packages for NFS/CephFS mounts https://review.opendev.org/754978
16:57:10 * jrosser back
16:58:44 <noonedeadpunk> btw, mensis have completed fixing monasca role at least for train
16:59:48 <jrosser> thats good - so long as we can keep on top of it
16:59:58 <jrosser> like senlin CI already seems completely broken :(
17:00:38 <noonedeadpunk> oh damn
17:00:39 * jrosser wish we had a better dashboard for periodic jobs
17:00:53 <jrosser> it's kind of easy if you're a one-repo project to look in zuul state
17:01:04 <jrosser> but with so many it's just really hard
17:01:07 <noonedeadpunk> yeah it is...
17:01:21 <noonedeadpunk> but what I was going to say about monasca - we have retired roles
17:01:34 <noonedeadpunk> and I was thinking about reviving it
17:01:52 <noonedeadpunk> the thing was, that monasca had 2 repos - for service and agent
17:02:02 <noonedeadpunk> and I was thinking if it's worth mmerging them now
17:02:09 <noonedeadpunk> like we did for galera
17:02:26 <jrosser> that makes sense, it's not unlike neutron or nova really
17:02:52 <noonedeadpunk> point in separation might be, that agent installation can be provided to customers who know nothing about osa
17:03:14 <jrosser> what does it create?
17:03:39 <noonedeadpunk> I think it grabs data from vms?
17:03:48 <noonedeadpunk> like prometheus expoter or smth...
17:04:49 <mensis> its for grabbing metrics, and it has several plugins which including gathering metrics from vms
17:05:08 <openstackgerrit> Jonathan Rosser proposed openstack/openstack-ansible master: Update ansible-lint==4.3.5, flake8==3.8.3, bashate==2.0.0 https://review.opendev.org/755065
17:06:26 <noonedeadpunk> but the thing is, that monasca can be left without PTL and not sure about project future because of that...
17:12:48 <noonedeadpunk> #endmeeting
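
Note on the galera discussion: the "cluster status checks" mentioned in the meeting were not specified further. The following is only a rough sketch of what such a check might look at, not anything taken from the galera role. It assumes the member's wsrep status has already been fetched (for example with `SHOW GLOBAL STATUS LIKE 'wsrep_%'`) into a plain dict; the function name `check_galera_member` and the `expected_members` parameter are hypothetical.

```python
def check_galera_member(status, expected_members):
    """Return a list of problems seen in one member's wsrep status.

    `status` is assumed to be a dict of wsrep status variables, e.g. the
    rows of SHOW GLOBAL STATUS LIKE 'wsrep_%' turned into name -> value.
    """
    problems = []

    # The cluster as a whole should contain every expected member.
    size = int(status.get("wsrep_cluster_size", 0))
    if size != expected_members:
        problems.append(
            "cluster size is %d, expected %d" % (size, expected_members)
        )

    # This member should report itself as synced.
    state = status.get("wsrep_local_state_comment", "")
    if state != "Synced":
        problems.append("local state is %r, expected 'Synced'" % state)

    # First scenario from the meeting: the cluster is synced, but this
    # member lists fewer peers in wsrep_incoming_addresses.
    raw = status.get("wsrep_incoming_addresses", "")
    addresses = [a.strip() for a in raw.split(",") if a.strip()]
    if len(addresses) != expected_members:
        problems.append(
            "sees %d incoming addresses, expected %d"
            % (len(addresses), expected_members)
        )

    return problems
```

A member in the first scenario (3 members up and synced, only 2 addresses listed) would come back with a single problem, while a healthy member would return an empty list. Telling the second scenario apart (a member that is really down) would need the status of every member, which this sketch does not attempt.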