*** chkumar|rover is now known as chandankumar | 03:31 | |
noonedeadpunk | For those who use prometheus and the libvirt exporter - it might be useful to know that the project has changed owners to a quite controversial one (just my own opinion) - some details are in the kolla patch - https://review.opendev.org/c/openstack/kolla/+/868161 | 10:20
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-openstack_hosts master: Allow to manage extra services, mounts and networks https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/868534 | 10:23 |
*** dviroel_ is now known as dviroel | 11:15 | |
anskiy | question! I have an openstack installation with one region and a Ceph cluster. I'm trying to move Cinder to the control-plane nodes and use it in active-active mode. So, if I understood correctly: there would be only one cinder-volume service, which would be attached to one AZ. Suppose I want to add another AZ (which should represent another DC), should I create another Ceph cluster with a separate cinder-volume service on the exact sa | 13:25
noonedeadpunk | anskiy: it kind of depends on your AZ implementation | 13:26 |
noonedeadpunk | I'm doing an AZ deployment at the moment and was planning to publish some better docs about it (I also gave a talk on how to configure AZs with OSA in October) | 13:26
noonedeadpunk | But long story short - it does depend on your requirements. For AZs you can either share or separate storage. So if your DCs are less than 10km from each other and you're confident in the link between them - you might want to stretch the ceph cluster between AZs | 13:28
noonedeadpunk | But if you want separate ceph clusters - you can do that as well. But then I think you will need to spawn independent cinder-volumes, if you want to keep az1 from going to storage in az2 | 13:28
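A minimal sketch of the separate-cluster-per-AZ approach described above, as it might look in openstack_user_config.yml (host names, IPs, AZ names and pool names are hypothetical placeholders, and cinder_storage_availability_zone/cinder_backends are the usual OSA knobs assumed here):

    storage_hosts:
      az1-infra1:
        ip: 172.29.236.11
        container_vars:
          cinder_storage_availability_zone: az1
          cinder_backends:
            rbd-az1:
              volume_driver: cinder.volume.drivers.rbd.RBDDriver
              volume_backend_name: rbd-az1
              rbd_pool: volumes
              rbd_ceph_conf: /etc/ceph/az1/ceph.conf
              rbd_user: cinder
      az2-infra1:
        ip: 172.29.239.11
        container_vars:
          cinder_storage_availability_zone: az2
          cinder_backends:
            rbd-az2:
              volume_driver: cinder.volume.drivers.rbd.RBDDriver
              volume_backend_name: rbd-az2
              rbd_pool: volumes
              rbd_ceph_conf: /etc/ceph/az2/ceph.conf
              rbd_user: cinder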
anskiy | noonedeadpunk: that's gonna be a one-to-one relation between AZ and DC, but the control-plane nodes would be in one DC only. Is this reasonable (with your note on my trust in the cross-DC network link)? Or do people just span the control plane across DCs for multi-AZ setups? | 13:31
noonedeadpunk | Well, I personally spawn the control plane across DCs, but we're going to have 3 AZs | 13:35
noonedeadpunk | As then you can survive a complete AZ failure even API-wise | 13:35
noonedeadpunk | I did a pair of keepalived instances per AZ, so 3 public instances and 3 private, and then DNS RR | 13:36
noonedeadpunk | Also, haproxy targets only the backends local to the AZ, to reduce cross-AZ traffic | 13:36
noonedeadpunk | and the same can be done with the internal VIP, either through /etc/hosts or DNS - so services in containers will talk to the local haproxy and be pointed towards local backends (i.e. nova-cinder communication) | 13:39
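A minimal sketch of the VIP side of this, assuming the VIPs are published as DNS names (the names below are placeholders) and each AZ resolves them to its local keepalived/haproxy pair via DNS views or /etc/hosts:

    global_overrides:
      # resolved per AZ (DNS RR or /etc/hosts) to the AZ-local haproxy VIP
      external_lb_vip_address: cloud.example.com
      internal_lb_vip_address: internal.cloud.example.com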
noonedeadpunk | the only nasty thing is images in glance | 13:39
noonedeadpunk | as I wasn't able to find a proper way to satisfy everyone without using the swift backend instead of rbd. If you're fine with using interoperable import only - there's a way around it, I guess. | 13:40
noonedeadpunk | (so it depends on how much you can mandate user behaviour) | 13:41
anskiy | noonedeadpunk: thank you for the insights, gonna have to think more about this. | 13:50 |
noonedeadpunk | anskiy: that's actually the talk I was mentioning - it's far from being a good one, but it still might give some insights https://www.youtube.com/watch?v=wvTvfAR_4eM&list=PLuLHMFPfD_--LAMu7bBkCNAXfTy04iLPj There are also presentations from the event available publicly somewhere | 13:56
moha7 | To whoever is not on vacation (: | 14:11
moha7 | OSA was deployed successfully (I mean, without any errors during the deployment process), but now when I run the command `openstack network list`, I get this error: | 14:11
moha7 | HttpException: 503: Server Error for url: http://172.17.246.1:9696/v2.0/networks, 503 Service Unavailable: No server is available to handle this request. | 14:11 |
moha7 | while `telnet 172.17.246.1 9696` from the infra1-utility-container gets connected to the port 9696 | 14:12 |
moha7 | There's the same error here: https://bugzilla.redhat.com/show_bug.cgi?id=2045082#c9 | 14:12
noonedeadpunk | moha7: you should be telneting not to haproxy (which listens on 172.17.246.1, I guess), but to the haproxy backends, or to put it a better way - the mgmt address of the neutron-server container | 14:24
noonedeadpunk | the error you see most likely means that haproxy can't reach neutron-server for some reason, either because of a networking issue or because neutron-server died | 14:25
moha7 | 172.17.246.1 --> internal vip | 14:27 |
moha7 | 172.17.246.174 --> infra1-neutron-server-container-21189fcd | 14:28 |
moha7 | cannot telnet to infra1-neutron-server-container-21189fcd from infra1-utility-container-5cf19aed on port 9696 | 14:29
noonedeadpunk | but is anything listening inside the infra1-neutron-server-container-21189fcd container on that port? | 14:31
moha7 | There's a service there named "neutron.slice" with some errors. I've never seen this name before! Service status: http://ix.io/4jAM | 14:31
noonedeadpunk | so you're trying to use OVN as the networking driver? | 14:33
noonedeadpunk | Or do you not care and are just spawning the default option? | 14:33
moha7 | nothing is listening on 9696 in the neutron lxc container: http://ix.io/4jAN | 14:33
noonedeadpunk | mhm, yeah, I guess it's related to ovn init issue - `ValueError: :6642: bad peer name format` | 14:35 |
jamesdenton | that's a missing northd group | 14:35 |
moha7 | I didn't know OVN is the default; previously I was configuring it assuming it was on linuxbridge and trying to port it to OVS. But this time I deployed it with OVN, as it is the default option. | 14:35
noonedeadpunk | moha7: yes, we switched default to OVN in Zed | 14:36 |
noonedeadpunk | But you can still use lxb if you want to | 14:36 |
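For reference, the driver choice can be pinned explicitly in user_variables.yml; a minimal sketch, assuming the standard OSA neutron_plugin_type values:

    # OVN is the default from Zed onwards; set it explicitly to be unambiguous
    neutron_plugin_type: ml2.ovn
    # or keep linuxbridge:
    # neutron_plugin_type: ml2.lxb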
moha7 | I followed this post: https://satishdotpatel.github.io/openstack-ansible-multinode-ovn/ to configure the user_variables and openstack_user_config files | 14:36 |
noonedeadpunk | yeah, I think the northd group was introduced relatively recently | 14:37
noonedeadpunk | So you'd need to add a network-northd_hosts definition to your openstack_user_config.yml | 14:38
jamesdenton | that blog is likely a little outdated. That is the way ^^^ | 14:38 |
jamesdenton | Something like --> network-northd_hosts: *controller_hosts, if you have an alias setup | 14:39
noonedeadpunk | so we basically made the `env.d/neutron.yml` override the default behaviour | 14:39
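Roughly, the missing group definition in openstack_user_config.yml would look like this (IPs are placeholders; the *controller_hosts form works only if such a YAML anchor is already defined):

    network-northd_hosts:
      infra1:
        ip: 172.29.236.11
      infra2:
        ip: 172.29.236.12
      infra3:
        ip: 172.29.236.13
    # or, with an existing alias:
    # network-northd_hosts: *controller_hosts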
moha7 | Yeah, I have not set network-northd_hosts; does it need an OVN gateway and some network definitions too? | 14:42
moha7 | noonedeadpunk: So, the env.d/neutron.yml setting that is introduced in that blog is wrong? | 14:44
jamesdenton | moha7 those aren't really necessary anymore | 14:44 |
moha7 | Then, network-northd_hosts would be enough, right? | 14:44 |
jamesdenton | you will likely want: network-gateway_hosts: *compute_hosts | 14:44
jamesdenton | So, all computes are OVN controllers. You can decide whether you want computes to be gateway nodes with that ^^ | 14:45
jamesdenton | or, you can make controllers or dedicated network nodes the gateway nodes using the appropriate alias | 14:45 |
jamesdenton | moha7 the blog was correct as of early December. This is a very recent change, and docs are forthcoming | 14:46 |
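Following the suggestion above, the gateway group could be added alongside the northd group (again assuming a *compute_hosts anchor already exists in openstack_user_config.yml):

    # makes every compute an OVN gateway chassis, similar to a DVR-style layout
    network-gateway_hosts: *compute_hosts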
moha7 | jamesdenton: I'm not familiar enough with OVN to decide where I should put the gateway! Based on the picture in the post below, it seems compute hosts are a good option: | 14:48
moha7 | https://blog.russellbryant.net/2016/09/29/ovs-2-6-and-the-first-release-of-ovn/ | 14:48 |
jamesdenton | yes, i agree, the gateway on computes mirrors the OVS DVR arch | 14:49 |
jamesdenton | and i think that was the intention | 14:49 |
moha7 | Do you know of any recent documentation on OVN? I'm searching but can't find anything recent! | 14:49
jamesdenton | hmm, i don't really. sorry | 14:49 |
jamesdenton | https://docs.openstack.org/networking-ovn/latest/admin/refarch/refarch.html | 14:50 |
jamesdenton | that might help? | 14:50 |
moha7 | Thanks; with these changes, should I deploy from scratch, or would just os-neutron-install.yml be enough? | 14:51
moha7 | jamesdenton: Sure, thanks for the link; It seems OVN is an interesting backend with new concepts | 14:52 |
jamesdenton | just os-neutron-install should be enough | 14:52
moha7 | +1 | 14:53 |
*** dviroel is now known as dviroel|lunch | 15:08 | |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-lxc_hosts master: Allow to create OVS bridge for lxcbr0 https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/868603 | 15:32 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-lxc_container_create master: Add bridge_type to lxc_container_networks https://review.opendev.org/c/openstack/openstack-ansible-lxc_container_create/+/868604 | 15:40 |
moha7 | now, after adding network-northd_hosts and network-gateway_hosts (here: http://ix.io/4jB4), there's no more of this error: "ValueError: :6642: bad peer name format", but this warning is in the status output for neutron-server and neutron.slice: http://ix.io/4jB3 Is there any other option missing? The command `openstack network list` on the utility container returns "Gateway Timeout (HTTP 504)" after a long wait, and port 9696 | 15:54
moha7 | is not up on any of the neutron containers | 15:54
noonedeadpunk | And can you telnet to 172.17.246.1 3306 from neutron-server? | 15:56 |
moha7 | It connects, but the connection closes really fast! | 15:58
noonedeadpunk | that can also be result of the bug that should be fixed with https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/868415 | 15:58 |
noonedeadpunk | but I think that you should have neutron version installed that is not affected by it yet | 15:58 |
moha7 | "Connection closed by foreign host." | 15:58 |
noonedeadpunk | so sounds more like mariadb thingy | 15:58 |
noonedeadpunk | Ok, and can you run `mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'"` from utility or galera container? | 15:59 |
noonedeadpunk | the results from galera and utility may differ, though | 15:59
jamesdenton | it does seem like haproxy and/or galera are being a problem | 16:00 |
moha7 | from utility, the output of that mysql command is: ERROR 2013 (HY000): Lost connection to server at 'handshake: reading initial communication packet', system error: 1 | 16:00 |
jamesdenton | you might check the status of haproxy, it might be worth re-running haproxy playbook or simply restarting the service | 16:00 |
noonedeadpunk | that sounds like ssl | 16:01
noonedeadpunk | and from galera? | 16:01 |
moha7 | from galera container: ERROR 2002 (HY000): Can't connect to local server through socket '/var/run/mysqld/mysqld.sock' (111) | 16:01 |
noonedeadpunk | huh | 16:01 |
noonedeadpunk | and systemctl status mariadb? | 16:01 |
moha7 | failed: http://ix.io/4jBa | 16:03 |
moha7 | `galera_new_cluster` couldn't start it. | 16:04 |
noonedeadpunk | well, here you go... Do you have some strict firewall rules between the controllers? | 16:04
moha7 | not at all | 16:05
moha7 | Seems I should re-deploy it, right? | 16:05 |
jamesdenton | what is the status of the other 2 galera containers? | 16:05 |
moha7 | w8 | 16:05 |
noonedeadpunk | You can try re-running `openstack-ansible playbooks/galera-server.yml -e galera_ignore_cluster_state=true -e galera_force_bootstrap=true` if they're also down | 16:06 |
noonedeadpunk | it will either fail or succeed | 16:06
moha7 | jamesdenton: "Failed to start MariaDB" on all 3 galera containers. | 16:07 |
jamesdenton | ok, try what noonedeadpunk mentioned | 16:07 |
moha7 | +1 | 16:07 |
noonedeadpunk | I wonder why they all would fail though | 16:07 |
noonedeadpunk | doesn't sound too healthy that they did | 16:08 |
jamesdenton | also, if you can post the output of this from each container, that would be helpful: cat /var/lib/mysql/grastate.dat | 16:09 |
noonedeadpunk | I bet they're all -1 | 16:11
noonedeadpunk | I have the impression that grastate has been kinda broken for a while, as I haven't seen anything except -1 there for years now | 16:11
noonedeadpunk | Or maybe we were only failing in ways that aren't covered | 16:12
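For context on the grastate.dat check, a file from an unclean shutdown typically looks like this (the uuid below is a placeholder): seqno -1 means the node's commit position is unknown, and safe_to_bootstrap 0 means galera_new_cluster will refuse to bootstrap from that node without intervention.

    # GALERA saved state
    version: 2.1
    uuid:    6b2b3b1e-0000-0000-0000-000000000000
    seqno:   -1
    safe_to_bootstrap: 0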
moha7 | jamesdenton: http://ix.io/4jBc | 16:13 |
jamesdenton | interesting, i feel like i've seen this before | 16:14 |
jamesdenton | from within the ct3 container, can you use the mysql client? | 16:16 |
moha7 | I re-ran the galera cluster bootstrap from container3; there are some errors there, but now `mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'"` returns the tables on utility | 16:16
jamesdenton | ok, so on ct2 and ct1, it should just be a matter of "systemctl start mariadb" | 16:16 |
noonedeadpunk | hm | 16:19 |
noonedeadpunk | that's weird | 16:19 |
noonedeadpunk | these errors in log should have been covered with https://opendev.org/openstack/openstack-ansible-galera_server/src/branch/master/defaults/main.yml#L112-L114 | 16:20 |
moha7 | now it's started and running on all galera nodes, but returning `[Warning] Aborted connection 67 to db: 'neutron' user: 'neutron' host: 'ct1-neutron-server-container-21189fcd.openstack.local' (Got an error reading communication packets)` in the `systemctl status mariadb` output | 16:20
jamesdenton | ok - try rerunning neutron playbooks now that the DB is up | 16:21 |
moha7 | still no port 9696 on ct1-neutron-container | 16:21 |
moha7 | Ah, ok | 16:22 |
noonedeadpunk | I'd say that galera is unlikely to be in the desired state tbh | 16:22
jamesdenton | and that could be, too | 16:22
jamesdenton | maybe rerun setup-infra and setup-openstack? | 16:22 |
noonedeadpunk | as `FATAL ERROR: Upgrade failed` is not good tbh | 16:22 |
noonedeadpunk | and all these tmp tables shouldn't be there | 16:22 |
moha7 | I have snapshots. I'll roll back to the step where setup-hosts.yml was done | 16:23
moha7 | and start from setup-infra | 16:23 |
jamesdenton | ok, don't forget to add the groups, then | 16:24 |
moha7 | The deployment server is standalone, not on the nodes | 16:25 |
jamesdenton | gotcha | 16:26 |
*** dviroel|lunch is now known as dviroel | 16:32 | |
*** dviroel is now known as dviroel}out | 19:37 | |
*** dviroel}out is now known as dviroel|out | 19:37 |