Thursday, 2022-03-24

noonedeadpunk	A pity that fghaas is not around here, as I bet he uses k8s with octavia and magnum	09:33
Brace	So, we seem to have got to the bottom of our broken cluster, the l3 agent on c2 isn't working. So we've disabled all the neutron services on that controller and are able to bring up some instances now.	09:46
Brace	Which will give us a bit of time to try and find out what's actually wrong with c2.	09:46
jrosser	zigo: do i remember right that you had some insights into uwsgi / chunked transfer settings?	10:19
zigo	jrosser: I do: the swift team claims Swift works with uwsgi, but that's bullshit, it's broken in very subtle ways.	10:20
zigo	I reverted all of swift over uwsgi.	10:20
jrosser	how about for glance?	10:20
zigo	jrosser: Glance is said to be fine starting with Xena.	10:21
zigo	jrosser: FYI, here's the config I use in Debian for swift: https://review.opendev.org/c/openstack/swift/+/821192	10:21
zigo	I'd love upstream to adopt it, and start gating with it.	10:21
zigo	Until then, I'll stick on Eventlet.	10:21
zigo	We insisted for more than a year already, and we keep getting issues.	10:22
zigo	The last one was empty uploads, even though swift says it's ok ... :/	10:22
jrosser	mossblaser: is this any help? ^^	10:22
zigo	Note the:	10:24
zigo	route-run = chunked:	10:24
zigo	and:	10:24
zigo	route = .* addheader:Date: ${httptime[]}	10:24
zigo	in my proposed patch. While these options are making Swift pass all refstack tests, they are forcing "Transfer-Encoding: chunked" which is probably not what one wants.	10:24
zigo	Though we haven't found another way to get things approximately working.	10:24
zigo	The other thing, is that the Swift object server is COMPLETELY broken over uwsgi, because the exchanges between proxy <=> object servers aren't even HTTP compliant.	10:25
mossblaser	jrosser: I'm afraid I'm not familiar enough with either glance or uwsgi to know off hand if this is the same issue as I've been seeing. Though assuming we are on Xena(?) the issue does seem to persist...	10:25
zigo	All this is really a shame, because uwsgi provides a 2x performance improvement ...	10:25
zigo	jrosser: What issue are you seeing?	10:26
jrosser	mossblaser: can you paste something at paste.openstack.org from what we see with glance?	10:27
mossblaser	an intermittent failure during image upload from cinder to glance which looks very much like this bug: https://bugs.launchpad.net/glance/+bug/1916482 -- logs from our observed issue: https://paste.opendev.org/show/bLq9YXaH6ZdsBj57iWkL/	10:30
noonedeadpunk	fwiw I caught that recently as well, but in my case we had by mistake used different chunk sizes for cinder and glance	10:34
noonedeadpunk	by default cinder sets chunk size to 4, and glance to 8	10:34
noonedeadpunk	If you accidentally missed configuring that, you will have issues creating images from volumes for sure	10:34
mossblaser	noonedeadpunk: I presume that would lead to persistent failures, rather than intermittent? (In our case image creation succeeds the majority of the time)	10:35
noonedeadpunk	it's not persistent, no. but it depends on luck and volume size	10:36
noonedeadpunk	bigger ones almost always fail, smaller mostly work	10:39
noonedeadpunk	also for nova with local drives you might want to try out https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/828897 in case they have physical connectivity to ceph	10:41
mossblaser	(a quick check and it seems that the block size is left as the default for cinder and glance in our setup which looking at the docs I hope means they're the same! Thanks for the suggestion!)	10:42
noonedeadpunk	default means they are not the same:)	10:48
noonedeadpunk	default for glance is 8: https://docs.openstack.org/glance/latest/configuration/glance_api.html#glance.store.rbd.store.rbd_store_chunk_size	10:48
mossblaser	uh-ohh! -- I must have looked at the docs for an older version	10:49
noonedeadpunk	for cinder-volume rbd_store_chunk_size is 4 https://docs.openstack.org/cinder/latest/configuration/block-storage/samples/cinder.conf.html	10:49
noonedeadpunk	it was always like that)	10:49
mossblaser	evidently I need to start drinking coffee!	10:50
mossblaser	that is unfortunate	10:50
noonedeadpunk	If you check https://docs.openstack.org/openstack-ansible/latest/user/ceph/full-deploy.html#user-variables you will find that we define `rbd_store_chunk_size: 8` there	10:50
noonedeadpunk	for this exact reason	10:51
noonedeadpunk	jrosser: question - do you think we should add `galera_monitoring_user_password` to user_secrets?	10:52
noonedeadpunk	or there're cases when you don't want to have it covered with any password?	10:53
noonedeadpunk	or well, maybe it's question to andrewbonney :)	10:53
andrewbonney	I don't think I have a reason for it to be password-less	10:58
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Add galera monitoring user to secrets https://review.opendev.org/c/openstack/openstack-ansible/+/835038	11:03
noonedeadpunk	NeilHanlon: hey! around?	11:56
noonedeadpunk	pinging you as rocky expert:) We see that our patch fails now on rocky, as we assume that /etc/ssh/sshd_config.d exist and used by ssh	11:57
noonedeadpunk	things go smooth for CentOS 8, but fail for Rocky.	11:57
noonedeadpunk	So was wondering, if you know anything about that difference	11:57
noonedeadpunk	https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/827100 as example for logs	11:57
jrosser	noonedeadpunk: oh i think there was special handling for that on centos	12:21
jrosser	noonedeadpunk: argh yes https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/825113/16/roles/ssh_keypairs/tasks/standalone/install_ssh_ca.yml#52	12:22
jrosser	so i think we maybe dont test rocky for plugins?	12:23
noonedeadpunk	we don't indeed	12:24
* jrosser feels unit tests discussion coming up again :)		12:24
* noonedeadpunk doesn't stay in the same place more than 2-3 weeks in a row so life is a full mess, so can't focus on a thing for some time now...		12:25
jrosser	oh of course, i'm not complaining :)	12:26
spatel	any idea about this error - https://paste.opendev.org/show/b8OmbVOc1e6b1CypkNns/	12:59
spatel	using this doc to create octavia ingress controller	12:59
spatel	I would appreciate if anyone has any google example yaml to create octavia ingress controller for my k8s (because nothing working for me :()	13:00
noonedeadpunk	spatel: try finding way to reach fghaas - he likely can help you if in good mood	13:10
noonedeadpunk	but I'm not 100% sure if he runs octavia from k8s or just uses heat and magnum for that...	13:12
jrosser	spatel: people here who do k8s on openstack use the nginx ingress and then octavia in TCP LB across however many backends needed	13:33
jrosser	thats turned out simplest as you can have cert management (LE in this case) handled in the k8s side, not octavia	13:33
spatel	hmm! i thought everyone default using octavia?	13:35
spatel	If nginx is way to go and easy as hell then sure i would go with that way in production	13:39
spatel	I thought it was tightly coupled with octavia and we don't need to do anything, just request an LB and it will be available without extra steps, like this doc says - https://superuser.openstack.org/articles/guide-octavia-ingress-controller-for-kubernetes/	13:40
mossblaser	zigo: jrosser noonedeadpunk: so I tried out setting the block size and this did not seem to fix the problem but switching haproxy into tcp mode does appear to -- perhaps glance isn't set up right in Xena after all.	15:03
noonedeadpunk	mossblaser: another suggestion - don't use uwsgi for glance	15:04
zigo	mossblaser: Glance does work over uwsgi on any release, it's only broken when using Swift as a backend in some specific cases.	15:04
noonedeadpunk	zigo: and except you need interoperable import being used?	15:04
zigo	noonedeadpunk: I'm kind of tired to read this all the time, and would very much prefer if upstream was working on fixes.	15:05
zigo	:/	15:05
zigo	(not blaming anyone on this channel: don't take it personally)	15:05
zigo	Same thing with Swift.	15:05
jrosser	noonedeadpunk: seems that volume-to-image suffers as well as interoperable import	15:07
noonedeadpunk	So while I understand why tcp would work, I'm not really convinced it's root cause tbh	15:07
zigo	jrosser: The problem is always Transfer-Encoding: chunked related indeed...	15:07
noonedeadpunk	or well, proper way to fix	15:07
jrosser	mossblaser: did you try any of the uwsgi config things?	15:08
noonedeadpunk	jrosser: well, for me changing chunk size just worked tbh to fix volume-to-image	15:09
jrosser	hmm	15:09
noonedeadpunk	but I guess in this case mossblaser is trying to upload image from nova ephemeral that's on local drive?	15:10
zigo	Do you have "wsgi-manage-chunked-input = true" ?	15:10
zigo	What version of uwsgi is that btw?	15:10
zigo	>= 2.0.19 ?	15:10
noonedeadpunk	we have that by default zigo https://opendev.org/openstack/ansible-role-uwsgi/src/branch/master/templates/uwsgi.ini.j2#L33	15:10
zigo	Lower won't have the option...	15:10
mossblaser	jrosser: I did not yet (since this appears it may need more than a simple config change in OSA)	15:11
noonedeadpunk	oh, wait, you mentioned other option....	15:11
mossblaser	noonedeadpunk: I was uploading an image from a nova volume which lives in Ceph into Glance (also using Ceph for storage), nothing local involved	15:11
noonedeadpunk	oh, ok...	15:12
zigo	Also activate the transformation_chunked plugin !	15:12
jrosser	*cinder volume	15:12
zigo	plugins = python3,transformation_chunked	15:12
noonedeadpunk	as admin1 was referencing the same issue just 2 days ago, but was uploading from local	15:12
jrosser	mossblaser: you can hack this stuff into the uwsgi config by hand in the test lab	15:12
jrosser	then we can work on a patch if it fixes things	15:12
mossblaser	sorry cinder, of course (it has been a long day!)	15:14
mossblaser	I shall have a play re: uwsgi	15:14
noonedeadpunk	yes, that would be interesting	15:15
noonedeadpunk	and we should deploy uWSGI==2.0.20	15:16
zigo	IMO, it's kind of silly that you guys are just reimplementing all that's already done in packages...	15:19
zigo	That's twice the work for no valid reason.	15:19
noonedeadpunk	except to be sure that you can install any specific version anytime you want?	15:20
noonedeadpunk	without need to mirror repos?	15:20
zigo	Again, that's a packaging concern, to make sure all versions are fit together.	15:21
zigo	can	15:21
noonedeadpunk	I think it depends on what is meant under fitting	15:22
zigo	I don't ! :)	15:22
zigo	I don't think it depends on anything.	15:22
zigo	That's distro's work, end of the story.	15:22
noonedeadpunk	um, and what if regressions in code exist? As no secret that weird backports take place close to each release. And what you should do as cloud operator to revert things back, when only latest package versions is stored in repos?	15:24
noonedeadpunk	As I don't understand how I should ensure state of my cloud with packages, when deploying next week I just have new version of software without any options	15:25
zigo	Wrong package: fix the package.	15:25
zigo	Not wrong package -> use some weirdo overrides.	15:25
noonedeadpunk	My point was leading to ensuring exact same software being deployed not depending on time when it is deployed :)	15:26
mgariepy	or the os.	15:26
zigo	That's because you see the OS as working against you, instead of trying to modify it to do what you want.	15:27
zigo	If you want a specific snapshot of the OS so you don't get the latest point release... make such snapshot and be done with it! :)	15:28
noonedeadpunk	Well yes, I do agree here that it's likely point of perception being present:)	15:28
zigo	There's all the tooling you want for that.	15:28
zigo	As being the person behind all the Debian package since OpenStack exists, I'm probably completely biased ... :)	15:28
noonedeadpunk	sorry, what should I do with that snapshot then?:)	15:28
noonedeadpunk	deploy it in other region?	15:29
noonedeadpunk	I can imagine using it for CI testing...	15:29
noonedeadpunk	but not sure I see how it can be re-used anywhere else except that host	15:29
zigo	My way of doing things is to simply trust the package manager to do what's right, and provide only bugfixes with no regressions.	15:29
zigo	So I wouldn't do a snapshot, it's only you who claimed you don't want things to be fixed ... :)	15:30
noonedeadpunk	And I admit it makes sense for some usecases:)	15:30
admin1	my issue with glance was the haproxy was set to mode http and it was hitting some byte limit .. which was solved after i changed mode in haproxy for glance from http => tcp	15:31
zigo	admin1: Byte limits? Can you be more specific?	15:31
noonedeadpunk	it was exact same issue and reference to https://bugs.launchpad.net/glance/+bug/1916482	15:32
noonedeadpunk	and that's the bug created https://bugs.launchpad.net/openstack-ansible/+bug/1965986	15:33
zigo	Oh, that's a long standing issue in Glance, which is why everyone with some experience never chooses Ceph as a backend for it.	15:33
noonedeadpunk	that's why I thought that mossblaser issue is same one	15:33
zigo	Sad but true, Glance over RBD simply sux...	15:33
zigo	Though it's IIRC not related to haproxy.	15:34
noonedeadpunk	jrosser: sorry I didn't fully get your comment on https://review.opendev.org/c/openstack/openstack-ansible-galera_server/+/831550 - did you mean we should just place temp dir inside /tmp and get done with it?	15:45
jrosser	it was #tmp not /tmp ?	15:46
noonedeadpunk	`galera_tmp_dir: /var/lib/mysql/#tmp`	15:46
noonedeadpunk	and galera_ignore_db_dirs is relative to datadir	15:46
noonedeadpunk	we can set `galera_tmp_dir: /tmp` actually	15:46
jrosser	oh	15:47
jrosser	because galera_tmp_dir: /var/lib/mysql/#tmp	15:47
noonedeadpunk	but I wasn't sure if it's good since /var/lib/mysql can be separate mount point...	15:47
jrosser	i honestly thought the # was a typo :)	15:47
jrosser	is that a convention for mysql things	15:47
noonedeadpunk	ah, no, it was intended)) added # as otherwise ppl won't be able to create database with name `tmp`	15:48
noonedeadpunk	and I think that `#tmp` highly unlikely to be created :D	15:48
noonedeadpunk	but actually yes	15:48
noonedeadpunk	if directory is not set, maria tends to create smth like /var/lib/mysql/#mysql50#tmp.stLr46FBlt	15:49
noonedeadpunk	easy solution would be if `ignore_db_dirs` was supporting regexp, but it doesn't	15:49
noonedeadpunk	I even saw CI failures for upgrade jobs because of that	15:55
noonedeadpunk	and caught it in another region in production during upgrade	15:55
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Add mysql directory for logging https://review.opendev.org/c/openstack/openstack-ansible/+/835091	16:07
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible-galera_server master: Update MariDB version to 10.6.7 https://review.opendev.org/c/openstack/openstack-ansible-galera_server/+/833259	16:08
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible-galera_server master: Update MariaDB version to 10.6.7 https://review.opendev.org/c/openstack/openstack-ansible-galera_server/+/833259	16:08
noonedeadpunk	so that was original error like I saw in production https://zuul.opendev.org/t/openstack/build/fe6fd9e0341c4d4b80530cbe5e091cc3/log/logs/openstack/aio1_galera_container-4bf4bdaa/mariadb.service.journal-12-06-26.log.txt#460	16:09
noonedeadpunk	to be fair, I'm not sure if that's fixed with patch as another common weird error raised even with it....	16:10
noonedeadpunk	maybe just point that to /tmp indeed....	16:11
admin1	zigo, how it got solved in haproxy then ?	16:22
admin1	i mean i was able to create snapshots from local as well as remote after that	16:22
noonedeadpunk	I need really to reproduce that to play with it. As chunked plugin for uwsgi sounds promising	17:32
spatel	I am running this command - openstack-ansible setup-openstack.yml --tags common-mq --limit '!nova_compute'	19:34
spatel	got this error - https://paste.opendev.org/show/bzT7JMrwONnSza328XoO/	19:34
spatel	jrosser ^	19:38
spatel	Related to this play https://opendev.org/openstack/ansible-role-uwsgi/src/branch/master/tasks/main.yml#L16	19:41
spatel	I have changed include_vars: "{{ item }}" to include_vars: "{{ lookup('first_found', params) }}"	19:52
spatel	still same error, I am running 24.0.0 tag	19:56
*** dviroel is now known as dviroel|pto		20:45
opendevreview	Neil Hanlon proposed openstack/openstack-ansible-plugins master: Update ssh_keypairs role to fix module for Rocky Linux 8 https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/835152	21:55
NeilHanlon	noonedeadpunk / jrosser - i think that should do the trick	21:55
jrosser	NeilHanlon: one small issue but otherwise looks ok	22:12
NeilHanlon	jrosser: thank you.. I should look for WARNINGs :)	22:57
