noonedeadpunk | A pity that fghaas is not around here, as I bet he uses k8s with octavia and magnum | 09:33 |
---|---|---|
Brace | So, we seem to have got to the bottom of our broken cluster, the l3 agent on c2 isn't working. So we've disabled all the neutron services on that controller and are able to bring up some instances now. | 09:46 |
Brace | Which will give us a bit of time to try and find out what's actually wrong with c2. | 09:46 |
jrosser | zigo: do i remember right that you had some insights into uwsgi / chunked transfer settings? | 10:19 |
zigo | jrosser: I do: the swift team claims Swift works with uwsgi, but that's bullshit, it's broken in very subtle ways. | 10:20 |
zigo | I reverted all of swift over uwsgi. | 10:20 |
jrosser | how about for glance? | 10:20 |
zigo | jrosser: Glance is said to be fine starting with Xena. | 10:21 |
zigo | jrosser: FYI, here's the config I use in Debian for swift: https://review.opendev.org/c/openstack/swift/+/821192 | 10:21 |
zigo | I'd love upstream to adopt it, and start gating with it. | 10:21 |
zigo | Until then, I'll stick with Eventlet. | 10:21 |
zigo | We insisted for more than a year already, and we keep getting issues. | 10:22 |
zigo | The last one was empty uploads, even though swift says it's ok ... :/ | 10:22 |
jrosser | mossblaser: is this any help? ^^ | 10:22 |
zigo | Note the: | 10:24 |
zigo | route-run = chunked: | 10:24 |
zigo | and: | 10:24 |
zigo | route = .* addheader:Date: ${httptime[]} | 10:24 |
zigo | in my proposed patch. While these options are making Swift pass all refstack tests, they are forcing "Transfer-Encoding: chunked" which is probably not what one wants. | 10:24 |
zigo | Though we haven't found another way to get things *approximately* working. | 10:24 |
zigo | The other thing, is that the Swift object server is *COMPLETELY* broken over uwsgi, because the exchanges between proxy <=> object servers aren't even HTTP compliant. | 10:25 |
mossblaser | jrosser: I'm afraid I'm not familiar enough with either glance or uwsgi to know off hand if this is the same issue as I've been seeing. Though assuming we are on Xena(?) the issue does seem to persist... | 10:25 |
zigo | All this is really a shame, because uwsgi provides a 2x performance improvement ... | 10:25 |
zigo | jrosser: What issue are you seeing? | 10:26 |
jrosser | mossblaser: can you paste something at paste.openstack.org from what we see with glance? | 10:27 |
mossblaser | an intermittent failure during image upload from cinder to glance which looks very much like this bug: https://bugs.launchpad.net/glance/+bug/1916482 -- logs from our observed issue: https://paste.opendev.org/show/bLq9YXaH6ZdsBj57iWkL/ | 10:30 |
noonedeadpunk | fwiw I caught that recently as well, but in my case we had mistakenly used different chunk sizes for cinder and glance | 10:34 |
noonedeadpunk | by default cinder sets chunk size to 4, and glance to 8 | 10:34 |
noonedeadpunk | If you accidentally forgot to configure that, you will have issues creating images from volumes for sure | 10:34 |
mossblaser | noonedeadpunk: I presume that would lead to persistent failures, rather than intermittent? (In our case image creation succeeds the majority of the time) | 10:35 |
noonedeadpunk | it's not persistent, no, but it depends on luck and volume size | 10:36 |
noonedeadpunk | bigger ones almost always fail, smaller mostly work | 10:39 |
noonedeadpunk | also for nova with local drives you might want to try out https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/828897 in case they have physical connectivity to ceph | 10:41 |
mossblaser | (a quick check and it seems that the block size is left as the default for cinder and glance in our setup which looking at the docs I hope means they're the same! Thanks for the suggestion!) | 10:42 |
noonedeadpunk | default means they are not the same :) | 10:48 |
noonedeadpunk | default for glance is 8: https://docs.openstack.org/glance/latest/configuration/glance_api.html#glance.store.rbd.store.rbd_store_chunk_size | 10:48 |
mossblaser | uh-ohh! -- I must have looked at the docs for an older version | 10:49 |
noonedeadpunk | for cinder-volume rbd_store_chunk_size is 4 https://docs.openstack.org/cinder/latest/configuration/block-storage/samples/cinder.conf.html | 10:49 |
noonedeadpunk | it was always like that) | 10:49 |
mossblaser | evidently I need to start drinking coffee! | 10:50 |
mossblaser | that is unfortunate | 10:50 |
noonedeadpunk | If you check https://docs.openstack.org/openstack-ansible/latest/user/ceph/full-deploy.html#user-variables you will find that we define `rbd_store_chunk_size: 8` there | 10:50 |
noonedeadpunk | for this exact reason | 10:51 |
noonedeadpunk | jrosser: question - do you think we should add `galera_monitoring_user_password` to user_secrets? | 10:52 |
noonedeadpunk | or there're cases when you don't want to have it covered with any password? | 10:53 |
noonedeadpunk | or well, maybe it's question to andrewbonney :) | 10:53 |
andrewbonney | I don't think I have a reason for it to be password-less | 10:58 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Add galera monitoring user to secrets https://review.opendev.org/c/openstack/openstack-ansible/+/835038 | 11:03 |
noonedeadpunk | NeilHanlon: hey! around? | 11:56 |
noonedeadpunk | pinging you as rocky expert :) We see that our patch fails now on rocky, as we assume that /etc/ssh/sshd_config.d exists and is used by ssh | 11:57 |
noonedeadpunk | things go smooth for CentOS 8, but fail for Rocky. | 11:57 |
noonedeadpunk | So was wondering, if you know anything about that difference | 11:57 |
noonedeadpunk | https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/827100 as example for logs | 11:57 |
jrosser | noonedeadpunk: oh i think there was special handling for that on centos | 12:21 |
jrosser | noonedeadpunk: argh yes https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/825113/16/roles/ssh_keypairs/tasks/standalone/install_ssh_ca.yml#52 | 12:22 |
jrosser | so i think we maybe dont test rocky for plugins? | 12:23 |
noonedeadpunk | we don't indeed | 12:24 |
* jrosser feels unit tests discussion coming up again :) | 12:24 | |
* noonedeadpunk doesn't stay in the same place more than 2-3 weeks in a row so life is a full mess, so can't focus on a thing for some time now... | 12:25 | |
jrosser | oh of course, i'm not complaining :) | 12:26 |
spatel | any idea about this error - https://paste.opendev.org/show/b8OmbVOc1e6b1CypkNns/ | 12:59 |
spatel | using this doc to create octavia ingress controller | 12:59 |
spatel | I would appreciate if anyone has any google example yaml to create octavia ingress controller for my k8s (because nothing working for me :() | 13:00 |
noonedeadpunk | spatel: try finding way to reach fghaas - he likely can help you if in good mood | 13:10 |
noonedeadpunk | but I'm not 100% sure if he runs octavia from k8s or just uses heat and magnum for that... | 13:12 |
jrosser | spatel: people here who do k8s on openstack use the nginx ingress and then octavia in TCP LB across however many backends needed | 13:33 |
jrosser | thats turned out simplest as you can have cert management (LE in this case) handled in the k8s side, not octavia | 13:33 |
spatel | hmm! i thought everyone default using octavia? | 13:35 |
spatel | If nginx is way to go and easy as hell then sure i would go with that way in production | 13:39 |
spatel | I thought tightly couple with octavia and we don't need to do anything just request for LB and it will be available without extra steps like this doc saying - https://superuser.openstack.org/articles/guide-octavia-ingress-controller-for-kubernetes/ | 13:40 |
mossblaser | zigo: jrosser noonedeadpunk: so I tried out setting the block size and this did not seem to fix the problem but switching haproxy into tcp mode does appear to -- perhaps glance isn't set up right in Xena after all. | 15:03 |
noonedeadpunk | mossblaser: another suggestion - don't use uwsgi for glance | 15:04 |
zigo | mossblaser: Glance does work over uwsgi on *any* release, it's only broken when using Swift as a backend in some specific cases. | 15:04 |
noonedeadpunk | zigo: and except you need interoperable import being used? | 15:04 |
zigo | noonedeadpunk: I'm kind of tired to read this all the time, and would very much prefer if upstream was working on fixes. | 15:05 |
zigo | :/ | 15:05 |
zigo | (not blaming anyone on this channel: don't take it personally) | 15:05 |
zigo | Same thing with Swift. | 15:05 |
jrosser | noonedeadpunk: seems that volume-to-image suffers as well as interoperable import | 15:07 |
noonedeadpunk | So while I understand why tcp would work, I'm not really convinced it's root cause tbh | 15:07 |
zigo | jrosser: The problem is always Transfer-Encoding: chunked related indeed... | 15:07 |
noonedeadpunk | or well, proper way to fix | 15:07 |
jrosser | mossblaser: did you try any of the uwsgi config things? | 15:08 |
noonedeadpunk | jrosser: well, for me changing chunk size just worked tbh to fix volume-to-image | 15:09 |
jrosser | hmm | 15:09 |
noonedeadpunk | but I guess in this case mossblaser is trying to upload an image from nova ephemeral that's on a local drive? | 15:10 |
zigo | Do you have "wsgi-manage-chunked-input = true" ? | 15:10 |
zigo | What version of uwsgi is that btw? | 15:10 |
zigo | >= 2.0.19 ? | 15:10 |
noonedeadpunk | we have that by default zigo https://opendev.org/openstack/ansible-role-uwsgi/src/branch/master/templates/uwsgi.ini.j2#L33 | 15:10 |
zigo | Lower won't have the option... | 15:10 |
mossblaser | jrosser: I did not yet (since this appears it may need more than a simple config change in OSA) | 15:11 |
noonedeadpunk | oh, wait, you mentioned other option.... | 15:11 |
mossblaser | noonedeadpunk: I was uploading an image from a nova volume which lives in CEPH into Glance (also using CEPH for storage), nothing local involved | 15:11 |
noonedeadpunk | oh, ok... | 15:12 |
zigo | Also activate the transformation_chunked plugin ! | 15:12 |
jrosser | *cinder volume | 15:12 |
zigo | plugins = python3,transformation_chunked | 15:12 |
noonedeadpunk | as admin1 was referencing the same issue just 2 days ago, but was uploading from local | 15:12 |
jrosser | mossblaser: you can hack this stuff into the uwsgi config by hand in the test lab | 15:12 |
jrosser | then we can work on a patch if it fixes things | 15:12 |
mossblaser | sorry cinder, of course (it has been a long day!) | 15:14 |
mossblaser | I shall have a play re: uwsgi | 15:14 |
noonedeadpunk | yes, that would be interesting | 15:15 |
noonedeadpunk | and we should deploy uWSGI==2.0.20 | 15:16 |
zigo | IMO, it's kind of silly that you guys are just reimplementing everything that's already done in packages... | 15:19 |
zigo | That's twice the work for no valid reason. | 15:19 |
noonedeadpunk | except to be sure that you can install any specific version anytime you want? | 15:20 |
noonedeadpunk | without need to mirror repos? | 15:20 |
zigo | Again, that's a packaging concern, to make sure all versions are fit together. | 15:21 |
zigo | can | 15:21 |
noonedeadpunk | I think it depends on what is meant under fitting | 15:22 |
zigo | I don't ! :) | 15:22 |
zigo | I don't think it depends on anything. | 15:22 |
zigo | That's distro's work, end of the story. | 15:22 |
noonedeadpunk | um, and what if regressions in code exist? It's no secret that weird backports take place close to each release. And what should you do as a cloud operator to revert things back, when only the latest package versions are stored in repos? | 15:24 |
noonedeadpunk | As I don't understand how I should ensure state of my cloud with packages, when deploying next week I just have new version of software without any options | 15:25 |
zigo | Wrong package: fix the package. | 15:25 |
zigo | Not wrong package -> use some weirdo overrides. | 15:25 |
noonedeadpunk | My point was leading to ensuring exact same software being deployed not depending on time when it is deployed :) | 15:26 |
mgariepy | or the os. | 15:26 |
zigo | That's because you see the OS as working against you, instead of trying to modify it to do what you want. | 15:27 |
zigo | If you want a specific snapshot of the OS so you don't get the latest point release... make such snapshot and be done with it! :) | 15:28 |
noonedeadpunk | Well yes, I do agree here that it's likely point of perception being present:) | 15:28 |
zigo | There's all the tooling you want for that. | 15:28 |
zigo | As being the person behind all the Debian package since OpenStack exists, I'm probably completely biased ... :) | 15:28 |
noonedeadpunk | sorry, what should I do with that snapshot then?:) | 15:28 |
noonedeadpunk | deploy it in other region? | 15:29 |
noonedeadpunk | I can imagine using it for CI testing... | 15:29 |
noonedeadpunk | but not sure I see how it can be re-used anywhere else except that host | 15:29 |
zigo | My way of doing things is to simply trust the package manager to do what's right, and provide only bugfixes with no regressions. | 15:29 |
zigo | So I wouldn't do a snapshot, it's only you who claimed you don't want things to be fixed ... :) | 15:30 |
noonedeadpunk | And I admit it makes sense for some usecases:) | 15:30 |
admin1 | my issue with glance was that haproxy was set to mode http and it was hitting some byte limit .. which was solved after i changed mode in haproxy for glance from http => tcp | 15:31 |
zigo | admin1: Byte limits? Can you be more specific? | 15:31 |
noonedeadpunk | it was exact same issue and reference to https://bugs.launchpad.net/glance/+bug/1916482 | 15:32 |
noonedeadpunk | and that's the bug created https://bugs.launchpad.net/openstack-ansible/+bug/1965986 | 15:33 |
zigo | Oh, that's a long standing issue in Glance, which is why everyone with some experience never chooses Ceph as a backend for it. | 15:33 |
noonedeadpunk | that's why I thought that mossblaser issue is same one | 15:33 |
zigo | Sad but true, Glance over RBD simply sux... | 15:33 |
zigo | Though it's IIRC not related to haproxy. | 15:34 |
noonedeadpunk | jrosser: sorry I didn't fully get your comment on https://review.opendev.org/c/openstack/openstack-ansible-galera_server/+/831550 - did you mean we should just place temp dir inside /tmp and get done with it? | 15:45 |
jrosser | it was #tmp not /tmp ? | 15:46 |
noonedeadpunk | `galera_tmp_dir: /var/lib/mysql/#tmp` | 15:46 |
noonedeadpunk | and galera_ignore_db_dirs is relative to datadir | 15:46 |
noonedeadpunk | we can set `galera_tmp_dir: /tmp` actually | 15:46 |
jrosser | oh | 15:47 |
jrosser | because galera_tmp_dir: /var/lib/mysql/#tmp | 15:47 |
noonedeadpunk | but I wasn't sure if it's good since /var/lib/mysql can be separate mount point... | 15:47 |
jrosser | i honestly thought the # was a typo :) | 15:47 |
jrosser | is that a convention for mysql things | 15:47 |
noonedeadpunk | ah, no, it was intended)) added # as otherwise ppl won't be able to create database with name `tmp` | 15:48 |
noonedeadpunk | and I think that `#tmp` highly unlikely to be created :D | 15:48 |
noonedeadpunk | but actually yes | 15:48 |
noonedeadpunk | if directory is not set, maria tends to create smth like /var/lib/mysql/#mysql50#tmp.stLr46FBlt | 15:49 |
noonedeadpunk | easy solution would be if `ignore_db_dirs` was supporting regexp, but it doesn't | 15:49 |
noonedeadpunk | I even saw CI failures for upgrade jobs because of that | 15:55 |
noonedeadpunk | and caught it in another region in production during upgrade | 15:55 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Add mysql directory for logging https://review.opendev.org/c/openstack/openstack-ansible/+/835091 | 16:07 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-galera_server master: Update MariDB version to 10.6.7 https://review.opendev.org/c/openstack/openstack-ansible-galera_server/+/833259 | 16:08 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-galera_server master: Update MariaDB version to 10.6.7 https://review.opendev.org/c/openstack/openstack-ansible-galera_server/+/833259 | 16:08 |
noonedeadpunk | so that was original error like I saw in production https://zuul.opendev.org/t/openstack/build/fe6fd9e0341c4d4b80530cbe5e091cc3/log/logs/openstack/aio1_galera_container-4bf4bdaa/mariadb.service.journal-12-06-26.log.txt#460 | 16:09 |
noonedeadpunk | to be fair, I'm not sure if that's fixed with patch as another common weird error raised even with it.... | 16:10 |
noonedeadpunk | maybe just point that to /tmp indeed.... | 16:11 |
admin1 | zigo, how it got solved in haproxy then ? | 16:22 |
admin1 | i mean i was able to create snapshots from local as well as remote after that | 16:22 |
noonedeadpunk | I need really to reproduce that to play with it. As chunked plugin for uwsgi sounds promising | 17:32 |
spatel | I am running this command - openstack-ansible setup-openstack.yml --tags common-mq --limit '!nova_compute' | 19:34 |
spatel | got this error - https://paste.opendev.org/show/bzT7JMrwONnSza328XoO/ | 19:34 |
spatel | jrosser ^ | 19:38 |
spatel | Related to this play https://opendev.org/openstack/ansible-role-uwsgi/src/branch/master/tasks/main.yml#L16 | 19:41 |
spatel | I have changed include_vars: "{{ item }}" to include_vars: "{{ lookup('first_found', params) }}" | 19:52 |
spatel | still same error, I am running 24.0.0 tag | 19:56 |
*** dviroel is now known as dviroel|pto | 20:45 | |
opendevreview | Neil Hanlon proposed openstack/openstack-ansible-plugins master: Update ssh_keypairs role to fix module for Rocky Linux 8 https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/835152 | 21:55 |
NeilHanlon | noonedeadpunk / jrosser - i think that should do the trick | 21:55 |
jrosser | NeilHanlon: one small issue but otherwise looks ok | 22:12 |
NeilHanlon | jrosser: thank you.. I should look for WARNINGs :) | 22:57 |
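
The chunk size mismatch discussed at 10:34-10:50 (cinder defaulting to 4, glance to 8) can be checked mechanically. The following is only a rough sketch: the defaults are the ones quoted in the chat, while the `glance_store` section name, the use of `volume_driver` to detect RBD backends, and all function names are assumptions made for illustration and should be verified against your own release and config layout.

```python
import configparser

# Defaults as quoted in the discussion (cinder 4, glance 8). Treat them as
# assumptions and confirm against the docs for your release.
CINDER_DEFAULT_CHUNK_MB = 4
GLANCE_DEFAULT_CHUNK_MB = 8
OPTION = "rbd_store_chunk_size"


def _parse(text):
    # strict=False tolerates repeated options; interpolation is disabled so
    # that '%' characters in real config values do not raise errors.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(text)
    return parser


def glance_chunk_size(conf_text, section="glance_store"):
    """Effective glance chunk size; the section name is an assumption."""
    parser = _parse(conf_text)
    return parser.getint(section, OPTION, fallback=GLANCE_DEFAULT_CHUNK_MB)


def cinder_chunk_sizes(conf_text):
    """Map each backend section that names an RBD driver to its chunk size."""
    parser = _parse(conf_text)
    sizes = {}
    for section in parser.sections():
        driver = parser.get(section, "volume_driver", fallback="")
        if "rbd" in driver.lower():
            sizes[section] = parser.getint(
                section, OPTION, fallback=CINDER_DEFAULT_CHUNK_MB
            )
    return sizes


def mismatches(glance_text, cinder_text):
    """Return the cinder backends whose chunk size differs from glance's."""
    glance = glance_chunk_size(glance_text)
    return {
        name: size
        for name, size in cinder_chunk_sizes(cinder_text).items()
        if size != glance
    }
```

Feeding it the contents of `glance-api.conf` and `cinder.conf` returns an empty dict when every RBD backend agrees with glance. As mossblaser reported at 15:03, aligning the sizes did not fix every case, so an empty result does not rule out the uwsgi or haproxy side.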
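
For the uwsgi settings zigo listed (`wsgi-manage-chunked-input = true`, the `transformation_chunked` plugin, and `route-run = chunked:`), a small checker can report which ones a given ini file lacks. This is a sketch under assumptions: the parser is hand-rolled rather than uWSGI's own, the comment prefixes it skips are assumed, and the function names are hypothetical. Per zigo's own caveat, these options force "Transfer-Encoding: chunked", and nobody in the chat confirmed that they resolve the Glance failure.

```python
from collections import defaultdict


def parse_uwsgi_ini(text):
    """Collect every 'key = value' pair, keeping repeated keys as a list."""
    options = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, section headers, and (assumed) comment prefixes.
        if not line or line.startswith(("#", ";", "[")):
            continue
        key, _, value = line.partition("=")
        options[key.strip()].append(value.strip())
    return options


def missing_chunked_settings(text):
    """List the chunked-transfer settings from the chat that are absent."""
    opts = parse_uwsgi_ini(text)
    missing = []
    managed = [v.lower() for v in opts.get("wsgi-manage-chunked-input", [])]
    if "true" not in managed:
        missing.append("wsgi-manage-chunked-input = true")
    plugins = {p.strip() for v in opts.get("plugins", []) for p in v.split(",")}
    if "transformation_chunked" not in plugins:
        missing.append("plugins = ...,transformation_chunked")
    if not any(v.startswith("chunked:") for v in opts.get("route-run", [])):
        missing.append("route-run = chunked:")
    return missing
```

Running `missing_chunked_settings` on the rendered uwsgi ini for a service gives a quick list of what to hack in by hand in a test lab, as jrosser suggested, before turning anything into a patch.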