| *** ysandeep|out is now known as ysandeep|rover | 04:42 | |
|---|---|---|
| noonedeadpunk | mornings | 07:12 |
| jrosser | good morning | 07:38 |
| foutatoro | hello jrosser | 07:57 |
| jrosser | hello | 07:57 |
| damiandabrowski[m] | morning folks! | 07:58 |
| *** ysandeep|rover is now known as ysandeep|rover|lunch | 08:49 | |
| admin1 | morning | 09:20 |
| *** ysandeep|rover|lunch is now known as ysandeep|rover | 09:47 | |
| mgariepy | good morning everyone | 11:23 |
| *** dviroel|afk is now known as dviroel | 11:28 | |
| opendevreview | Marc GariƩpy proposed openstack/openstack-ansible-os_tempest master: [DNM] testing if all the tests are still passing. https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/841257 | 12:22 |
| mgariepy | noonedeadpunk, uca doesnt have tempest plugin package. jammy does have some but i guess it's only in universe and won't stay up to date anyway. | 12:23 |
| noonedeadpunk | we were using source install for ubuntu tempest regardless of whether it's a source or distro install | 12:25 |
| noonedeadpunk | I thought we had something that prevented this from failing | 12:26 |
| noonedeadpunk | maybe we can disable building wheels on ubuntu, when it's distro install | 12:26 |
| mgariepy | i think the hard-coded "source" install does fix it | 12:26 |
| mgariepy | we will see. | 12:27 |
| noonedeadpunk | but we don't have repo container when rest is distro install? do we? | 12:27 |
| mgariepy | indeed we do not. | 12:27 |
| mgariepy | let's see if it passes. if not i'll debug it. | 12:28 |
| noonedeadpunk | then it should fail on attempt to get constraints file from repo container | 12:29 |
| mgariepy | tempest was installing from source on distro install test a couple weeks ago. | 12:29 |
| noonedeadpunk | well. I dropped some things maybe :D | 12:30 |
| mgariepy | lol. maybe that's why i'm re-testing the role haha | 12:30 |
| noonedeadpunk | like with https://review.opendev.org/c/openstack/openstack-ansible/+/837845 | 12:31 |
| noonedeadpunk | But I don't see what would result in the issues... | 12:31 |
| noonedeadpunk | Maybe we also fixed not deploying repo container for distro installs when were merging gluster | 12:32 |
| mgariepy | well me neither. | 12:32 |
| mgariepy | let's wait for the test result. | 12:32 |
| noonedeadpunk | but eventually this fails only for tempest role | 12:32 |
| noonedeadpunk | which is really interesting | 12:32 |
| mgariepy | we do have another role that doesn't support distro install. | 12:32 |
| mgariepy | gnocchi. | 12:33 |
| noonedeadpunk | I guess we should just drop distro support there? | 12:36 |
| noonedeadpunk | or whole telemetry does support it? | 12:36 |
| mgariepy | i have no idea if there is gnocchi in uca or not. it's getting really hard i think to have all our roles patched at the same time | 12:37 |
| mgariepy | there are always one or 2 or 4 that are left behind. | 12:37 |
| lowercase | I'm finally getting to a place where I feel comfortable uploading my work with fluentd, openstack and loki into a public repo. What's the repo where this would all go. The one that had the elk configurations and such. | 12:45 |
| jrosser | lowercase: openstack-ansible-ops is the repo for this sort of thing | 12:47 |
| mgariepy | foutatoro, did you recover your cluster? | 12:48 |
| foutatoro | mgariepy, good morning. | 12:56 |
| foutatoro | mgariepy: not yet, I have a really strange issue. previous VM disks seem to be in the ceph vms pool but I can't list them nor attach them to the appropriate VM | 12:58 |
| foutatoro | https://paste.opendev.org/show/b1Mhw44QtAMH8qlQVTBL/ | 12:58 |
| lowercase | 4 in (since 6M) | 12:59 |
| lowercase | did you just recover an osd? | 12:59 |
| mgariepy | 6M i guess it month./ | 12:59 |
| mgariepy | it's 6 months** | 13:00 |
| lowercase | then why are the pgs degraded? | 13:00 |
| lowercase | if he didn't lose an osd | 13:00 |
| foutatoro | lowercase: I've 4 osds, this is a pre-prod | 13:00 |
| mgariepy | was an osd out for a long time ? | 13:01 |
| foutatoro | mgariepy: no | 13:01 |
| lowercase | what happened 14 hours ago | 13:01 |
| foutatoro | lowercase: due to an incident all infra hosts were restarted | 13:02 |
| lowercase | do your infra hosts also host osds? | 13:02 |
| foutatoro | yes | 13:03 |
| lowercase | okay, so all osds were offline 14 hours ago | 13:03 |
| foutatoro | exactly | 13:03 |
| lowercase | which makes a 6 month uptime for that osd impossible. So what happened 6 minutes ago? | 13:04 |
| foutatoro | since the restart the cinder-volume service state is down also | 13:04 |
| jrosser | the backend is down, not the service | 13:04 |
| foutatoro | nothing happens 6 minutes ago | 13:04 |
| lowercase | your ceph health status says otherwise | 13:04 |
| foutatoro | lowercase mgariepy: is there a way to download rbd objects as qcow2? | 13:08 |
| mgariepy | foutatoro, you can copy the image from ceph yes. | 13:09 |
| lowercase | okay, i was wrong. mgariepy was correct. 12 osds: 11 up (since 2m), 12 in (since 10w) | 13:10 |
| lowercase | i restarted an osd in my dev cluster just to confirm. | 13:10 |
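The `N up (since …) / N in (since …)` counters quoted above come from `ceph -s`; a hedged sketch of the commands to pin down which OSD is affected (a live cluster and admin keyring are assumed):

```shell
# Overall cluster health, including the osd "up/in (since ...)" counters
ceph -s

# Show only OSDs currently down, with their host placement in the CRUSH tree
ceph osd tree down

# Compact OSD count summary, e.g. "12 osds: 11 up, 12 in"
ceph osd stat
```

`up` means the OSD daemon is running and talking to the monitors; `in` means it still holds data in the CRUSH map — an OSD restart resets the `up (since …)` timer without touching the `in` one, which is exactly what lowercase observed.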
| mgariepy | foutatoro, https://paste.openstack.org/show/bj72ZpWyVrbgBcSaswTU/ | 13:11 |
| mgariepy | simple command with a few args easy to remember by heart | 13:12 |
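Since the paste link may rot: a minimal sketch of pulling a VM disk out of Ceph as qcow2, assuming the pool is called `vms` (the image name here is a placeholder, not from the paste):

```shell
# Find the disk image backing the VM (pool name 'vms' is an assumption)
rbd ls vms

# Export the RBD image to a local raw file
rbd export vms/<image-name> /tmp/disk.raw

# Convert the raw export to qcow2 with qemu-img
qemu-img convert -f raw -O qcow2 /tmp/disk.raw /tmp/disk.qcow2
```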
| mgariepy | but you really should try to see why cinder is not starting. | 13:13 |
| jrosser | cinder volume backends can be down because of rabbitmq trouble | 13:13 |
| mgariepy | foutatoro, ok first thing, can you list projects and users (this will tell you if keystone works) | 13:14 |
| foutatoro | yes I can list projects, users, neutron networks, previous instance names ... | 13:14 |
| mgariepy | for rabbitmq, what does `rabbitmqctl cluster_status` tell you | 13:15 |
| foutatoro | `rabbitmqctl cluster_status`: https://paste.openstack.org/show/b3GkAwR515Mfs9unC40K/ | 13:17 |
| mgariepy | ok seems ok i guess. | 13:18 |
| mgariepy | now cinder did you restart it after you fixed the galera cluster ? | 13:18 |
| foutatoro | yes, I restarted the containers and all services with 'systemctl restart cinder*' | 13:19 |
| mgariepy | and the cinder api is online in your haproxy ? | 13:20 |
| mgariepy | hatop -s /var/run/haproxy.stat | 13:20 |
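If hatop isn't installed, the same stats socket can be read directly; a sketch assuming the socket path from the hatop command above:

```shell
# Dump backend/server status from the haproxy stats socket as CSV,
# keeping the proxy name, server name, and status (UP/DOWN) columns
echo "show stat" | socat stdio /var/run/haproxy.stat \
  | cut -d, -f1,2,18 | column -s, -t
```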
| foutatoro | https://paste.openstack.org/show/byhWiTt4sJ5FNdEegIry/ | 13:23 |
| foutatoro | cinder-api is not marked as UP | 13:23 |
| mgariepy | cinder_api-back seems UP. | 13:24 |
| mgariepy | in the cinder container | 13:25 |
| mgariepy | what does cinder log looks like ? | 13:25 |
| mgariepy | `journalctl -u cinder.slice -f` | 13:25 |
| mgariepy | `systemctl status cinder.slice` | 13:27 |
| foutatoro | https://paste.openstack.org/show/bFSZYN9I2hL5DkDAJLfo/ | 13:27 |
| jrosser | `cinder service-list` | 13:29 |
| lowercase | This might help narrow down: `journalctl -u cinder-volume -p 3` or `journalctl -u cinder.slice -f -p 3`, -p 3 only shows logs that are marked as errors. | 13:31 |
| mgariepy | nice about -p3 | 13:32 |
| mgariepy | i usually do `-n 10000|grep something` :D lol | 13:32 |
| mgariepy | or --since with some quick google haha | 13:32 |
| lowercase | another favorite is --no-pager, which makes journalctl not view in a bad ... well, pager. | 13:34 |
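Putting the flags from this thread together (unit names are assumed to match the deployment):

```shell
# Last 100 error-or-worse messages from everything in the cinder slice
journalctl -u cinder.slice -p 3 -n 100 --no-pager

# Errors since the incident window, printed straight to stdout
# (-p err is the symbolic equivalent of -p 3)
journalctl -u cinder-volume -p err --since "14 hours ago" --no-pager
```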
| foutatoro | cinder service-list returns a Bad Gateway; it tries to reach the server running on 8776 | 13:35 |
| mgariepy | `openstack volume service list` | 13:35 |
| foutatoro | https://paste.openstack.org/show/bc2YxdSQWJpCnle4a1xw/ | 13:35 |
| foutatoro | https://paste.openstack.org/show/b6BpHCcSaf8FfG0Uf8Eh/ | 13:36 |
| foutatoro | `openstack volume service list`: https://paste.openstack.org/show/b6BpHCcSaf8FfG0Uf8Eh/ | 13:36 |
| foutatoro | I'm restarting the scheduler | 13:37 |
| lowercase | cinder-api is prob offline | 13:37 |
| lowercase | you don't even have a cinder-api? | 13:37 |
| jrosser | i don't think it appears in that list anyway | 13:38 |
| mgariepy | indeed it doesnt | 13:39 |
| lowercase | it sure doesn't | 13:39 |
| lowercase | huh | 13:39 |
| mgariepy | systemctl restart cinder.slice | 13:39 |
| mgariepy | or the status before. | 13:40 |
| mgariepy | just to see. | 13:40 |
| jrosser | i keep saying that the up/down there isnt about the service running or not :) | 13:40 |
| jrosser | it's the backend | 13:40 |
| foutatoro | cinder.slice status: https://paste.openstack.org/show/bmHEid7kcMNSyWILsUU6/ | 13:41 |
| lowercase | `journalctl -n 100 -p 3 -u cinder.slice` command please | 13:43 |
| jrosser | i do not think it is correct to have both rbd:volumes@RBD and infra2-cinder-volumes-container-bdc12de4@RBD cinder-volume services listed | 13:44 |
| jrosser | that is a sign that there is something wrong with the active/active parts of the config | 13:44 |
| mgariepy | or it wasn't cleaned up ? | 13:45 |
| jrosser | yes, or that | 13:45 |
| mgariepy | if it was installed in the good old days and never cleaned up, it can still be there. | 13:46 |
| foutatoro | https://paste.openstack.org/show/b84ZTb019u7lU7cDudwI/ | 13:46 |
| jrosser | looks like at least rabbitmq trouble there | 13:49 |
| jrosser | you can use netstat or something to see if there are any actual connections working | 13:50 |
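A hedged example of the connection check jrosser suggests, assuming the default rabbitmq and mysql/galera ports:

```shell
# Established TCP connections to rabbitmq (5672) and mysql/galera (3306)
ss -tnp state established '( dport = :5672 or dport = :3306 )'

# netstat fallback if ss is not available in the container
netstat -tnp 2>/dev/null | grep -E ':(5672|3306)'
```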
| lowercase | pymysql.err.OperationalError: (2013, 'Lost connection to MySQL server during query') | 13:53 |
| lowercase | yeah, both rabbit and mysql issues. | 13:53 |
| lowercase | a whole log of mysql issues. | 13:54 |
| foutatoro | I see, but those errors were at 2022-05-10 08:03:24.397 | 13:59 |
| foutatoro | and I restarted services after that | 13:59 |
| lowercase | Is the time on the server the same time as your timezone? | 14:00 |
| lowercase | cause mine are set to UTC and that screws me up all the time lol | 14:00 |
| mgariepy | i prefer to have utc everywhere .. here we have some day light saving ( +1 / -1 every 6 months) | 14:01 |
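For reference, checking and pinning a host to UTC as mgariepy suggests:

```shell
# Show the current timezone and NTP sync state
timedatectl status

# Set the host to UTC so logs and DB timestamps line up across nodes
timedatectl set-timezone UTC
```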
| mgariepy | noonedeadpunk, tempest still passes with the static install_method. | 14:09 |
| noonedeadpunk | oh, ok | 14:09 |
| mgariepy | good enough for me :D haha | 14:11 |
| mgariepy | foutatoro, https://paste.openstack.org/show/bmHEid7kcMNSyWILsUU6/ seems to be missing the api service | 14:22 |
| mgariepy | unless i'm mistaken. | 14:22 |
| mgariepy | ha. it's not in the cinder slice :/ | 14:24 |
| mgariepy | it's in the uwsgi slice... | 14:25 |
| mgariepy | fun. | 14:25 |
| foutatoro | so I have to run 'openstack-ansible os-cinder-install.yml' ? | 14:26 |
| mgariepy | systemctl status cinder-api.service | 14:27 |
| mgariepy | `journalctl -u cinder-api.service -p 3 -n 100` | 14:29 |
| mgariepy | foutatoro, i don't think running playbook will help you there. | 14:33 |
| mgariepy | it's better to try to find the root cause | 14:33 |
| mgariepy | even more since this is a pre-prod system. you can take the time to debug it. it's not like when the production cluster has issues | 14:34 |
| foutatoro | cinder-api status: https://paste.openstack.org/show/bDXX5EswGiW5HqJ1Ietk/ | 14:39 |
| foutatoro | mgariepy: I don't know why I get this message "http://infra2-cinder-api-container-632dfbb4:8776/ returned with HTTP 300" but "wget http://infra2-cinder-api-container-632dfbb4:8776/" works fine on both containers | 14:43 |
| mgariepy | 300 is multiple choice from haproxy check i think. | 14:43 |
| lowercase | http 300 just means multiple choice, meaning that the url isn't a terminating url and there are multiple url paths that it can follow. i would try curl -L <url> and see if you get a 200 from that | 14:44 |
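A sketch of lowercase's suggestion, using the container URL from the log:

```shell
# Follow redirects and print only the final HTTP status code
curl -s -L -o /dev/null -w '%{http_code}\n' \
  http://infra2-cinder-api-container-632dfbb4:8776/

# Look at the 300 response headers themselves, without following
curl -sI http://infra2-cinder-api-container-632dfbb4:8776/
```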
| lowercase | but can you perform the same command with -p 3 appended to it | 14:45 |
| lowercase | your api server is clearly running, processing requests. However, since the issue is a rabbit or database issue, i want to see if the api service is complaining about either of those. | 14:46 |
| foutatoro | lowercase: right, curl -L works but adding -p 3 makes the request not terminate | 14:52 |
| lowercase | -p 3 to the journactl command lol | 14:52 |
| foutatoro | my bad | 14:52 |
| lowercase | `journalctl -u cinder-api.service -p 3 -n 100` | 14:52 |
| *** dviroel is now known as dviroel|lunch|afk | 14:53 | |
| foutatoro | lowercase: the journal only shows logs from yesterday | 14:55 |
| foutatoro | https://paste.openstack.org/show/bgQAuugtt5ERJVxobuIi/ | 14:55 |
| noonedeadpunk | #startmeeting openstack_ansible_meeting | 15:00 |
| opendevmeet | Meeting started Tue May 10 15:00:29 2022 UTC and is due to finish in 60 minutes. The chair is noonedeadpunk. Information about MeetBot at http://wiki.debian.org/MeetBot. | 15:00 |
| opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 15:00 |
| opendevmeet | The meeting name has been set to 'openstack_ansible_meeting' | 15:00 |
| noonedeadpunk | #topic rollcall | 15:00 |
| noonedeadpunk | o/ | 15:00 |
| noonedeadpunk | well, I'm actually semi-around | 15:01 |
| mgariepy | hey o/ | 15:01 |
| jrosser | hello o/ | 15:02 |
| noonedeadpunk | #topic office hours | 15:05 |
| ebbex | o/ | 15:05 |
| noonedeadpunk | I will be honest - I've done nothing. I can't even recall what I was doing all week... | 15:05 |
| noonedeadpunk | Likely side-effect after moving to the new place... | 15:06 |
| noonedeadpunk | jrosser: you had some issues with merging repo stuff - should we discuss it? | 15:06 |
| jrosser | oh yes, i left it alone for a few days | 15:06 |
| jrosser | but i think it's got all a bit circular | 15:07 |
| noonedeadpunk | We can always disable CI to land that... | 15:07 |
| jrosser | well maybe a couple of things to look at first | 15:07 |
| damiandabrowski[m] | hi! | 15:08 |
| jrosser | the glusterfs filesystem does not exist until we merge this https://review.opendev.org/c/openstack/openstack-ansible/+/837589/13/playbooks/repo-install.yml | 15:08 |
| jrosser | the repo_install playbook needs updating to create it, as the current use of serial: breaks the installation | 15:08 |
| jrosser | the tasks cannot be serial for those parts | 15:09 |
| jrosser | but then logically the next patch to merge (until I thought about it) was this https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/839411 | 15:09 |
| jrosser | but i don't think thats ever going to pass without the first one | 15:09 |
| jrosser | as the fs will not exist | 15:09 |
| jrosser | i think rather than circular patches, i mean it's very hard to get everything to pass in CI without making it circular | 15:11 |
| noonedeadpunk | I also left comment for https://review.opendev.org/c/openstack/openstack-ansible/+/837589/13/ansible-collection-requirements.yml#40 just in case :) | 15:12 |
| jrosser | ah yes i saw that | 15:12 |
| jrosser | i got kind of diverted by playing with skyline | 15:13 |
| jrosser | but we should try to get this gluster stuff merged because it is a big change and needs some testing for real | 15:13 |
| noonedeadpunk | yeah | 15:13 |
| jrosser | deleting / re-creating repo server containers has some subtleties now, for example | 15:13 |
| noonedeadpunk | Btw regarding the rbac topic - I guess there's no real need to do changes this release since cinder/heat are still not ready | 15:14 |
| noonedeadpunk | But I'd rather introduce the service role anyway, despite discussions about it still being ongoing | 15:14 |
| noonedeadpunk | we can suggest dropping all repo containers at once for example as well | 15:15 |
| noonedeadpunk | as that should be fine I guess? | 15:15 |
| jrosser | i mount /openstack/glusterfs in the current patches | 15:15 |
| noonedeadpunk | As there's nothing _really_ important anyway | 15:15 |
| jrosser | as there is a UUID that needs to be preserved, else you can't re-create/join the cluster properly | 15:15 |
| jrosser | so sometimes you need to keep that, sometimes you need to delete it | 15:15 |
| jrosser | depends if you want to destroy the fs and start again, or to keep it | 15:16 |
| noonedeadpunk | I actually thought it will get removed with force_containers_data_destroy ? | 15:16 |
| jrosser | that does not seem to understand whatever bind mounts get made | 15:16 |
| noonedeadpunk | But not sure | 15:16 |
| jrosser | however in this case, i think that preserving it is the right thing to do for multinode | 15:17 |
| jrosser | there is also an impact on re-deploying an infra node | 15:17 |
| jrosser | having said all this - i really would like other eyes / opinions on it | 15:18 |
| noonedeadpunk | yep, fair | 15:20 |
| noonedeadpunk | another thing - do we want to have a presentation about project updates? | 15:20 |
| noonedeadpunk | There's no dedicated event during the summit for that, but marketing still has some plan for how to promote these | 15:20 |
| noonedeadpunk | Basically they asked for a video, 10 mins tops, to talk about changes that were made lately | 15:21 |
| jrosser | i guess we would have to look back over the etherpads to see what we did / did not do | 15:22 |
| noonedeadpunk | yup, agree | 15:23 |
| noonedeadpunk | I will try to put something into the other etherpad so we can review topics next week | 15:23 |
| damiandabrowski[m] | okok, great | 15:24 |
| noonedeadpunk | ok, what else we have on plate? | 15:26 |
| noonedeadpunk | Except tons of stuff that needs to land? | 15:27 |
| jrosser | hmmm yes - reviews / merging of lots of things | 15:27 |
| jrosser | i should also say that i have done a proof-of-concept with the alternative dashboard, skyline | 15:28 |
| jrosser | and it's ummmm - interesting to deploy | 15:28 |
| noonedeadpunk | I can imagine, as it's nodejs iirc? | 15:29 |
| noonedeadpunk | at least frontend part of it | 15:29 |
| jrosser | there is a python part 'apiserver' then nodejs things for 'console' | 15:30 |
| damiandabrowski[m] | i will spend some time on reviews tomorrow | 15:30 |
| jrosser | and imho the code is very docker / kolla centric | 15:30 |
| jrosser | and also confuses the service code with deployment tooling, as there's an executable to generate the required nginx config /o\ | 15:30 |
| jrosser | but i think this is an opportunity to influence the skyline development to support wider tools and deployments | 15:32 |
| jrosser | debugging this is really on the edge of my understanding though, so if anyone is interested with web development skills then please help out :) | 15:33 |
| noonedeadpunk | tool to generate nginx config sounds as it sounds ofc... | 15:42 |
| noonedeadpunk | And also I saw there's no SSO support atm | 15:42 |
| noonedeadpunk | So I'd say they have plenty of gaps as of today | 15:42 |
| noonedeadpunk | but you're right, we'd better chime in earlier rather than later | 15:43 |
| jrosser | there's kind of two parts i think - ansible'ing up the deployment, pretty much whatever-it-takes to make it work | 15:44 |
| jrosser | then work on tidying that all up and making it all more OSA-like | 15:45 |
| noonedeadpunk | #endmeeting | 16:00 |
| opendevmeet | Meeting ended Tue May 10 16:00:34 2022 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 16:00 |
| opendevmeet | Minutes: https://meetings.opendev.org/meetings/openstack_ansible_meeting/2022/openstack_ansible_meeting.2022-05-10-15.00.html | 16:00 |
| opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/openstack_ansible_meeting/2022/openstack_ansible_meeting.2022-05-10-15.00.txt | 16:00 |
| opendevmeet | Log: https://meetings.opendev.org/meetings/openstack_ansible_meeting/2022/openstack_ansible_meeting.2022-05-10-15.00.log.html | 16:00 |
| *** ysandeep|rover is now known as ysandeep|out | 16:23 | |
| opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_octavia master: Make octavia_provider_network better configurable https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/787336 | 16:46 |
| *** dviroel|lunch|afk is now known as dviroel\ | 18:53 | |
| *** dviroel\ is now known as dviroel | 18:53 | |
| *** dviroel is now known as dviroel|out | 21:22 | |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!