NeilHanlon | prometheanfire: do you have a link to logs for that? | 00:44 |
---|---|---|
prometheanfire | sure, https://pastebin.com/raw/gF1sSJdA | 00:48 |
prometheanfire | don't have access to github gists from that laptop | 00:48 |
prometheanfire | the link to ovs commit's message sums it up fairly well though | 00:49 |
NeilHanlon | prometheanfire: just did a bit of digging.. from what I can tell, the packages from centos for openvswitch have the patch you mention for openvswitch 2.17, but *not* for openvswitch 2.15 | 01:26 |
prometheanfire | hmm, looked at it more, looks like it's the ovs version within the neutron venv | 01:39 |
prometheanfire | not the package | 01:39 |
NeilHanlon | i poked around at the os_neutron role and while I see it installing the centos-nfv release and installing `openvswitch`, i'm not actually sure where that dependency is being fulfilled... the repos only provide packages with the version in the name (e.g., openvswitch2.15 and openvswitch2.17). I wasn't able to see a version supplied anywhere. Perhaps | 05:25 |
NeilHanlon | jrosser may be able to shed some light | 05:25 |
noonedeadpunk | prometheanfire: is that for zed or ... ? As we do have OVN as default in gates, but we were missing https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/869042 quite recently | 08:28 |
admin1 | good morning .. i am trying to setup a new cluster .. tag 25.2.0 and stuck on openstack.osa.db_setup : Create database for service -> the output has been hidden due to the fact that 'no_log: true' was specified for this result ... how do I map this specific step to the exact file where I need to set no_log to false | 10:30 |
admin1 | or if there is an easy way to do no_log: false in command line for all | 10:30 |
admin1 | i tried -e no_log=false .. seems to not make a change | 10:30 |
noonedeadpunk | admin1: it's in /etc/ansible/ansible_collections/osa/roles/db_setup/tasks/main.yml | 10:34 |
noonedeadpunk | eventually we should make that easily configurable as you've suggested | 10:35 |
admin1 | did not work .. i made the changes in /etc/ansible/ansible_collections/openstack/osa/roles/db_setup/tasks/main.yml - as that is where it was on my ubuntu 22 | 10:38 |
noonedeadpunk | ah, yes, sorry, missed openstack folder (was typing from memory) | 10:39 |
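A rough sketch of the debugging hack discussed here, assuming the corrected path above and that the task uses a literal `no_log: true` (adjust the sed pattern to whatever the file really contains); remember to revert it afterwards, since the task handles DB credentials:

```shell
TASKS=/etc/ansible/ansible_collections/openstack/osa/roles/db_setup/tasks/main.yml
cp "$TASKS" "$TASKS.bak"                        # keep a backup to restore afterwards
sed -i 's/no_log: [Tt]rue/no_log: false/' "$TASKS"
# re-run the failing play, e.g. the keystone one from this conversation:
# openstack-ansible /opt/openstack-ansible/playbooks/os-keystone-install.yml
# and restore once the real error is visible:
# mv "$TASKS.bak" "$TASKS"
```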
noonedeadpunk | could you have overridden ANSIBLE_COLLECTIONS_PATHS or ANSIBLE_COLLECTIONS_PATH? | 10:42 |
admin1 | no . i always insisted on osa be able to do stuff out of the box | 10:43 |
admin1 | i never override anything | 10:43 |
admin1 | just follow the manual :) | 10:43 |
admin1 | grep -ri openstack.osa.db_setup /etc/ansible/ returns these files /etc/ansible/roles/os_keystone/tasks/main.yml: | 10:44 |
noonedeadpunk | Um, then I have no idea why it didn't work... | 10:44 |
noonedeadpunk | Yes, but there we just include the role | 10:44 |
noonedeadpunk | and it should come from $ANSIBLE_COLLECTIONS_PATH/openstack.osa.db_setup | 10:45 |
noonedeadpunk | well, you can check with debug if that's the path as well | 10:46 |
admin1 | checking if there is a one liner | 10:47 |
noonedeadpunk | eventually what can go wrong at db setup is either mariadb cluster is not healthy or my.cnf for client on utility container is wrong | 10:54 |
admin1 | yeah .. debugging those | 10:54 |
noonedeadpunk | So you can check these 2 things as well - maybe will be faster to solve these | 10:55 |
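A hedged pair of checks matching the two suspects above — cluster health from the galera container, and the client config on the utility container:

```shell
# on the galera container: expect wsrep_cluster_status=Primary and state 'Synced'
mysql -e "SHOW GLOBAL STATUS WHERE Variable_name IN ('wsrep_cluster_status','wsrep_cluster_size','wsrep_local_state_comment');"
# on the utility container: host should point at the internal VIP
cat /root/.my.cnf
mysql -e "SELECT 1;"   # quick end-to-end test through haproxy
```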
admin1 | suspect haproxy issue | 10:55 |
admin1 | strange .. my util container cannot ping any other container on mgmt even on the same server :D | 11:08 |
noonedeadpunk | is there IP configured on lxcbr0 on host? | 11:22 |
noonedeadpunk | but yeah, shouldn't be the reason | 11:22 |
noonedeadpunk | I'd suspect lxc-dnsmasq and try restarting it | 11:22 |
noonedeadpunk | ah, on mgmt, not eth0 | 11:23 |
admin1 | on br-mgmt | 11:25 |
noonedeadpunk | and forwarding is enabled for all interfaces on sysctl? | 11:26 |
admin1 | it is .. i think it has something to do with mtu | 11:40 |
admin1 | in the end, it's the mtu :D | 11:40 |
*** dviroel|ourt is now known as dviroel | 11:45 | |
noonedeadpunk | ah, ok, then it's not any of my faults at least :D | 12:33 |
noonedeadpunk | We need to merge this rabbitmq bump to recheck and pass Yoga upgrades https://review.opendev.org/c/openstack/openstack-ansible/+/869078 | 12:38 |
admin1 | hi noonedeadpunk, do you know what would cause this error: ? https://paste.openstack.org/raw/bvthX5495Y5KLkq6ipKU/ | 12:42 |
noonedeadpunk | I assume you don't have /etc/resolv.conf in controller? | 12:43 |
admin1 | its a softlink and its broken for some reason :D | 12:44 |
noonedeadpunk | systemd-resolved is dead? | 12:44 |
admin1 | Active: inactive (dead) | 12:44 |
noonedeadpunk | yeah. try restarting it | 12:44 |
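A small sketch of the check/repair for the broken symlink, assuming Ubuntu with systemd-resolved as in this deployment:

```shell
ls -l /etc/resolv.conf                 # normally a symlink into /run/systemd/resolve/
systemctl restart systemd-resolved
systemctl is-active systemd-resolved   # expect "active"
getent hosts opendev.org               # confirm name resolution works again
```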
admin1 | hmm.. thanks | 12:45 |
admin1 | removing ssl and ssl-verify-server-cert from .my.cnf and changing internal VIP to galera container IP works .. changing the IP to the VIP does not work .. i get ERROR 2013 (HY000): Lost connection to server at 'handshake: reading initial communication packet', system error: 115 | 13:24 |
admin1 | from utility container | 13:25 |
admin1 | this cluster has only 1 controller for now | 13:25 |
admin1 | entering the mysql ip directly in .my.cnf gives ERROR 2026 (HY000): TLS/SSL error: Validation of SSL server certificate failed when ssl and ssl-verify-server-cert are enabled in .my.cnf | 13:27 |
admin1 | what i recalled before was running setup hosts and infra, and then logging to util and hitting mysql ENTER .. if all was good, i was inside mysql and then i used to run setup-openstack | 13:27 |
admin1 | br-mgmt runs on top of unrouted network on its own dedi vlans .. so i think I am ok to not use extra ssl and tls for mysql connection .. how/where do I disable those ? | 13:30 |
noonedeadpunk | admin1: so the SSL cert is issued for the hostname and internal VIP (galera_address) only | 13:34 |
noonedeadpunk | Accessing through IP of galera container is supposed to fail validation | 13:34 |
noonedeadpunk | basically that's what's in cert: https://opendev.org/openstack/openstack-ansible-galera_server/src/branch/master/defaults/main.yml#L236 | 13:35 |
noonedeadpunk | If you want to omit ssl you can use galera_use_ssl or if you don't want to verify certs, you can also `galera_ssl_verify: false` | 13:36 |
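A minimal sketch of the two overrides mentioned above, placed in user_variables.yml as usual for OSA (pick one; the playbook name is just an example of what to re-run):

```shell
cat >> /etc/openstack_deploy/user_variables.yml <<'EOF'
# either drop SSL for galera entirely...
galera_use_ssl: false
# ...or keep SSL but skip client-side certificate verification:
# galera_ssl_verify: false
EOF
# openstack-ansible /opt/openstack-ansible/playbooks/galera-install.yml
```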
admin1 | my internal vip is an ip address | 14:57 |
admin1 | that is no longer supported ? | 14:57 |
admin1 | i don't see the point of having 1 extra dns query per connection from dns -> ip for internal vip ? | 14:58 |
noonedeadpunk | admin1: as you might see from condition - galera_address can be either IP or FQDN. And by default it's `galera_address: "{{ internal_lb_vip_address }}"` | 15:00 |
noonedeadpunk | tbh it would be interesting to see output of `openssl x509 -in /etc/ssl/certs/galera.pem -text -noout` | 15:01 |
* noonedeadpunk places internal IP to /etc/hosts to minimize time for resolving dns | 15:02 | |
admin1 | still getting ERROR 2013 (HY000): Lost connection to server at 'handshake: reading initial communication packet', system error: 115 when entering mysql from util | 15:03 |
admin1 | something did change | 15:04 |
admin1 | galera backend is listed as DOWN .. while its running and fine in galera container | 15:07 |
admin1 | noonedeadpunk, that openssl command is to be run from which node ? util ? | 15:09 |
admin1 | gives unable to load cert | 15:09 |
admin1 | from controller as well as util | 15:09 |
noonedeadpunk | from galera | 15:09 |
noonedeadpunk | oh, well | 15:10 |
admin1 | returns a cert .. Subject: CN = c1-galera-container-44ae38ac | 15:10 |
noonedeadpunk | ` galera backend is listed as DOWN .. while its running and fine in galera container ` -> it's a bit different | 15:10 |
noonedeadpunk | in the cert it's the SAN that's interesting | 15:10 |
noonedeadpunk | but anyway, for galera haproxy checks aliveness via another service that now runs under systemd | 15:10 |
noonedeadpunk | let me recall | 15:11 |
noonedeadpunk | Should be mariadbcheck or smth | 15:12 |
noonedeadpunk | this thing https://opendev.org/openstack/openstack-ansible-galera_server/src/branch/master/tasks/galera_server_post_install.yml#L47-L63 | 15:12 |
noonedeadpunk | so basically it's a socket that checks the return of /usr/local/bin/clustercheck | 15:13 |
noonedeadpunk | and based on it haproxy determines if galera backend is alive | 15:13 |
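A hedged way to poke that check by hand: clustercheck answers with a plain HTTP response on port 9200 through the systemd socket, so it can be hit with curl from a haproxy node (the IP below is a placeholder; the socket unit name is only guessed above, so verify it):

```shell
# from a haproxy node -- a healthy galera node returns HTTP 200 "... is synced."
curl -v http://<galera_container_ip>:9200/
# on the galera container itself:
/usr/local/bin/clustercheck
systemctl list-sockets | grep 9200       # find the exact socket unit name
```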
admin1 | running that command manually works fine | 15:16 |
spatel | I believe this could be a password issue, i had a similar issue where i changed the mysql monitoring password and the LB stopped sending traffic to mysql | 15:16 |
admin1 | brand new deployment .. usual checkout the current branch, run setup hosts and setup infra and nothing else | 15:17 |
admin1 | i will try to delete all containers, re-initialize the passwords and retry | 15:17 |
admin1 | if i do use a dns name for the internal vip, would it be added to /etc/hosts locally or would I have to add it to internal dns | 15:18 |
spatel | why redeploy.. i would say try to fix.. may be we have a bug | 15:18 |
admin1 | password is fine though .. if i change the .my.cnf to point to ip directly and not haproxy, it just works | 15:18 |
admin1 | the issue/bug is haproxy is seeing mysql as down when it is not | 15:19 |
admin1 | i think if that is fixed, i am unblocked | 15:19 |
spatel | what hatop saying? | 15:19 |
admin1 | i use the gui . ( never used hatop ) | 15:20 |
admin1 | says backend is down | 15:21 |
admin1 | galera-back DOWN | 15:21 |
spatel | Good! what is in allow_list in haproxy ? | 15:21 |
spatel | make sure 9200 is listening on galera nodes - https://paste.opendev.org/show/bAtJ6xJE7KpMUlrAfNgK/ | 15:22 |
spatel | 9200 is check script | 15:22 |
admin1 | 9200 is there .. but does not respond to queries from outside | 15:24 |
admin1 | hmm.. | 15:24 |
spatel | cool! now you know what to do | 15:24 |
noonedeadpunk | it only answers connections from a specific set of addresses | 15:24 |
spatel | check xinit.d | 15:24 |
noonedeadpunk | As socket has IPAddressAllow | 15:24 |
noonedeadpunk | spatel: it's not xinetd anymore | 15:24 |
noonedeadpunk | it's systemd socket | 15:24 |
spatel | oops!! | 15:25 |
noonedeadpunk | And allowed list defined here https://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/group_vars/galera_all.yml#L33-L39 | 15:25 |
spatel | it's just a systemd daemon.. but using xinetd (sorry, i misspelled it earlier) | 15:26 |
noonedeadpunk | but it should be reachable from any galera or haproxy host | 15:26 |
spatel | https://paste.opendev.org/show/bYp9w2EnPVtReHRcDsRD/ | 15:26 |
noonedeadpunk | nope, xinetd not used at all in Yoga+ | 15:26 |
spatel | only_from = 0.0.0.0/0 | 15:26 |
admin1 | let me install tcpdump and see what ip it comes as | 15:26 |
spatel | I am checking in wallaby (sorry) | 15:26 |
admin1 | i have multiple ips in the controller in the same range | 15:26 |
admin1 | it only has 1 listed | 15:26 |
noonedeadpunk | (or even Xena+) | 15:27 |
spatel | how does it handle upgrade? are we wiping out config during upgrade process? | 15:27 |
noonedeadpunk | admin1: well, you can override galera_monitoring_allowed_source if needed | 15:27 |
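A hedged sketch of that override (the CIDR matches the br-mgmt range in this deployment; check the linked group_vars default for the expected format and whether 127.0.0.1 should also stay in the list):

```shell
cat >> /etc/openstack_deploy/user_variables.yml <<'EOF'
galera_monitoring_allowed_source: "172.29.236.0/22"
EOF
# re-run the galera play so the socket unit gets re-templated, e.g.:
# openstack-ansible /opt/openstack-ansible/playbooks/galera-install.yml
```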
admin1 | my controller ip is .11, but .9 is also there .. now tcpdump shows IP 172.29.236.9.59070 > 172.29.239.67.9200 | 15:27 |
admin1 | so it was coming from a diff ip in the controller | 15:28 |
noonedeadpunk | spatel: yup. https://opendev.org/openstack/openstack-ansible-galera_server/src/branch/master/tasks/galera_server_post_install.yml#L16-L36 | 15:28 |
spatel | bravo!! | 15:28 |
admin1 | why not give it the br-mgmt range as default ? | 15:29 |
admin1 | if i put 172.29.236.0/22 there, does 127.0.0.1 also need to exist ? | 15:30 |
admin1 | will see | 15:30 |
spatel | it uses br-mgmt to poke 9200 (maybe you put your haproxy vip in a different range) | 15:30 |
admin1 | server base IP = 172.29.236.11 , VIP = 172.29.236.9 .. | 15:31 |
noonedeadpunk | admin1: that was always like that, like 5+ years | 15:31 |
admin1 | allow was 172.29.236.11 , while tcpdump showed the 9200 connection was from 236.9 | 15:31 |
admin1 | i now did galera_monitoring_allowed_source: 172.29.236.0/22 | 15:31 |
noonedeadpunk | we didn't touch that allowlist when switching from xinetd to systemd | 15:31 |
admin1 | IPAddressAllow=172.29.236.0/22, but still doesn't work | 15:33 |
admin1 | had to do a daemon-reload and stop/start manually | 15:33 |
admin1 | finally galera is UP from haproxy | 15:34 |
admin1 | i think i can finally run setup-openstack now :) | 15:34 |
admin1 | thanks guys | 15:34 |
noonedeadpunk | huh, wonder why daemon-reload and restart weren't performed | 15:34 |
noonedeadpunk | that could be a proper bug | 15:35 |
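For reference, a sketch of the manual workaround admin1 describes, run inside the galera container; the socket unit name is assumed, so confirm it first:

```shell
systemctl list-sockets | grep 9200                      # identify the exact unit name
systemctl daemon-reload
systemctl restart mariadbcheck.socket                   # name assumed per the chat above
systemctl show -p IPAddressAllow mariadbcheck.socket    # should now show the new range
```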
admin1 | can we allow br-mgmt range by default there ? as i am sure most people will have at least base and vip in the controller | 15:36 |
admin1 | and in my case, controller was listed but requests went via the VIP | 15:37 |
noonedeadpunk | admin1: are you adding the VIP as /32 or /24? | 15:37 |
admin1 | no | 15:38 |
admin1 | adding it as /22 .. same range as br-mgmt | 15:38 |
noonedeadpunk | well... it's worth being /32 | 15:38 |
noonedeadpunk | then you won't have issues like that | 15:38 |
noonedeadpunk | or well, should not, as the VIP won't be used as the source address, but simply as an alias | 15:39 |
admin1 | right .. | 15:40 |
noonedeadpunk | well, we can set it to the whole mgmt network as well, but I'm not sure it's the only place that can be problematic in the end | 15:47 |
noonedeadpunk | feel free to propose patch if you feel it's worth fixing | 15:49 |
*** dviroel is now known as dviroel|lunch | 16:01 | |
prometheanfire | noonedeadpunk: from what I can see it's a backtrace but not a crash, when I manually apply the patch to the venv I get no backtrace | 16:37 |
prometheanfire | and ya, zed | 16:38 |
prometheanfire | noonedeadpunk: you are saying that patch also fixes the error? | 16:39 |
noonedeadpunk | well, no, I don't know if it fixes your issue, but I know it fixes another bug in the case when northd is in the same place as neutron-server | 16:45 |
noonedeadpunk | as then they are both conflicting over the lock dir | 16:45 |
noonedeadpunk | but I decided to mention it just in case | 16:46 |
prometheanfire | ah, I don't think so, they are in separate containers from what I remember | 16:46 |
noonedeadpunk | I see that openvswitch2.17-2.17.0-57 is being installed in CI at least | 16:49 |
prometheanfire | ya, that's the package version, but is that package version also installing the python ovs package in the venv (that's the code I patched that fixed it)? | 16:51 |
prometheanfire | I imagine that it's ovs from pypi that's used in the venv | 16:52 |
prometheanfire | heh, fix committed in october or later iirc, last release in may https://pypi.org/project/ovs/#history | 16:53 |
admin1 | how is gluster used in these new releases ? | 17:05 |
admin1 | repo uses gluster now ? | 17:06 |
*** dviroel|lunch is now known as dviroel | 17:10 | |
admin1 | one more error .. https://paste.openstack.org/show/bNukeywgadvG2M9PvGsm/ -- not sure of this one as well | 17:18 |
admin1 | what would be the correct format for external ceph file | 17:23 |
admin1 | i think its a bug .. https://paste.openstack.org/show/bYPWEAipoL0VZaNfExwg/ on how the file is processed | 17:30 |
admin1 | oh | 17:30 |
admin1 | my bad | 17:30 |
admin1 | its = and not : | 17:31 |
noonedeadpunk | prometheanfire: that we take from upper constraints :) | 17:40 |
noonedeadpunk | admin1: yes, so for the repo container we dropped lsyncd. Now any shared filesystem can be mounted, like nfs, cephfs, s3fs, etc. | 17:41 |
noonedeadpunk | So you can pass mountpoint as a variable that will be consumed by systemd_mount role. | 17:41 |
noonedeadpunk | By default gluster is installed, but you can disable that in case you already have cephfs or smth at this point | 17:42 |
noonedeadpunk | prometheanfire: so yeah, for the venv we need a release of ovs on pypi and then a bump of the version in u-c | 17:43 |
spatel | noonedeadpunk i have question about billing :) you guys using ceilometer + gnocchi correct? | 17:44 |
noonedeadpunk | nah, we're not. I used that before though | 17:44 |
noonedeadpunk | like... 3 years ago? | 17:44 |
spatel | currently what you guys using for flavor based billing? | 17:44 |
spatel | consume notification and pass it to your home brew billing tool? | 17:45 |
noonedeadpunk | an in-house system that just polls APIs | 17:45 |
spatel | I am looking for just flavor based billing and not sure what tools i should use | 17:45 |
spatel | ceilometer/gnocchi would be overkill for that | 17:46 |
spatel | i hate cloudkitty.. | 17:46 |
noonedeadpunk | lol | 17:46 |
spatel | you too? | 17:46 |
noonedeadpunk | nah, never used it. | 17:46 |
spatel | Thinking i can just consume mysql table and run some python math to do monthly or per hour billing based on flavor name | 17:47 |
noonedeadpunk | or well, tried to adopt it, but was short on time, and then another department wrote a plugin for our billing system that calculated units based on gnocchi | 17:47 |
noonedeadpunk | the main problem is that you need to take into account when it was created / deleted and also resizes | 17:48 |
noonedeadpunk | as you might have 100 gb volume that was resized to 200 gb 1 min before your script run | 17:48 |
noonedeadpunk | so should you bill for 200gb or 100 gb.... | 17:49 |
spatel | we never resize (so i don't care about it), all i care about is creation time, and i just calculate hours based on that | 17:49 |
spatel | We have private cloud so billing is just to see numbers and costing.. just to show it to management | 17:50 |
spatel | nobody going to pay me :) | 17:50 |
spatel | We compare all 3 cloud and then decide which place we should run our service. | 17:50 |
spatel | If i have billing in my private cloud then i can tell costing etc.. | 17:51 |
noonedeadpunk | yeah, I dunno any good ready-made solution... maybe smth can be done just through prometheus and grafana | 17:51 |
noonedeadpunk | as using openstack exporter you can get flavors | 17:53 |
spatel | you mean prometheus exporter? | 17:55 |
noonedeadpunk | yeah.... but not sure tbh.... | 17:55 |
noonedeadpunk | at least I do recall having a grafana dashboard from gnocchi that could sum up usage per project | 17:55 |
noonedeadpunk | so I assume smth can be done with prometheus as well | 17:56 |
spatel | hmm.. why not just extract that info from mysql :) | 17:57 |
spatel | that info must be somewhere in db | 17:57 |
spatel | let me try or start from somewhere.. | 17:57 |
spatel | what you guys do for BW billing? | 17:58 |
admin1 | for one cluster, i am testing https://fleio.com/pricing | 18:05 |
admin1 | considering the amount of hours you have to spend doing it on your own, the price looks OK in that regard | 18:05 |
noonedeadpunk | ah, yes, I do remember these folks. But they would be quite costly given it's only for internal needs | 18:28 |
noonedeadpunk | fwiw fleio folks were hanging out here some time ago... But they also relied on ceilometer from what I know (at least were contributing to ceilometer role) | 18:29 |
noonedeadpunk | they were not active lately though :) | 18:30 |
noonedeadpunk | *:( | 18:30 |
jrosser | spatel: we used this for usage accounting (not billing) https://opendev.org/vexxhost/atmosphere | 18:53 |
jrosser | it consumes events as http from ceilometer and creates db entries you can then query | 18:54 |
spatel | jrosser Thanks, let me check.. | 18:54 |
spatel | how does it work.. there is no doc in repo :) | 18:54 |
jrosser | but like no documentation so you have to read the code | 18:54 |
spatel | why not consume events directly from rabbitMQ instead of ceilometer? maybe there are some extra metrics there like BW/IOPS etc | 18:55 |
jrosser | i didnt design it | 18:55 |
noonedeadpunk | Hm, I thought atmosphere now is their whole automation stack.... | 18:56 |
jrosser | that is also very confusing | 18:56 |
jrosser | anyway we have a combination of atmosphere, openstack exporter and then prometheus | 18:56 |
spatel | Yes atmosphere is openstack deployment tool using k8s :) | 18:56 |
spatel | https://vexxhost.com/private-cloud/atmosphere-openstack-depoyment/ | 18:57 |
noonedeadpunk | it's k8s AND ansible from what I know | 18:58 |
spatel | vexxhost was using OSA before, correct? | 18:58 |
noonedeadpunk | yup | 18:58 |
spatel | i talked to mnaser in Berlin about it.. and he told check it out in github and give it a shot :) | 18:59 |
spatel | They create rabbitMQ for each service so no single rabbit cluster for all | 19:00 |
spatel | everything is autoheal using k8s | 19:00 |
spatel | But the devil is in the details :) worth giving it a try for learning though | 19:01 |
noonedeadpunk | I'm suuuuuuper sceptical about auto-heal for everything | 19:04 |
spatel | Yes, it looks fancy but problems start when shit hits the fan :D | 19:13 |
noonedeadpunk | or well. maybe autoheal (which systemd does anyway out of the box), but totally not auto-scale. | 19:13 |
noonedeadpunk | and to auto-heal rabbit - you need to detect that it's not healthy first, which is the trickiest part, at least for me.... | 19:13 |
spatel | Anyway all openstack services are stateless (they basically heal themselves) | 19:14 |
spatel | Only rabbitMQ and Galera require some love. (I can tell you rabbitMQ is much more stable these days, maybe they fixed lots of bugs) | 19:14 |
spatel | If you create a dedicated rabbit instance for each service then there's much less chance that you will run into any issues. | 19:16 |
spatel | Biggest issue of rabbit is cluster and queue sync up. | 19:16 |
spatel | maybe we should try that in OSA :) an option to spin up rabbit for each service :) | 19:17 |
noonedeadpunk | You have that option | 19:18 |
noonedeadpunk | It's possible and we used that like 3 years ago | 19:18 |
spatel | Only nova/neutron use rabbit heavily, not the others | 19:18 |
spatel | I like that idea to have own rabbit instance for each service | 19:18 |
noonedeadpunk | Well, for nova you have cells | 19:18 |
noonedeadpunk | And for neutron - ovn :D | 19:18 |
noonedeadpunk | You would need either quite a powerful control plane or more than 3 of them | 19:19 |
noonedeadpunk | As running many rabbits is quite a heavy thing | 19:19 |
noonedeadpunk | rabbit per service is not very well documented though. It's only mentioned in trove role documentation | 19:21 |
noonedeadpunk | But idea is exactly the same. | 19:21 |
noonedeadpunk | and you can have galera per service as well | 19:22 |
spatel | Galera is very stable application | 19:23 |
noonedeadpunk | lol | 19:23 |
spatel | I had almost zero issue | 19:23 |
spatel | I don't know about you :D | 19:23 |
noonedeadpunk | you should have also said it has zero bugs | 19:24 |
noonedeadpunk | well, yeah, if it runs - it runs properly | 19:24 |
noonedeadpunk | until shit hits the fan | 19:24 |
spatel | I don't know what bug you encounter but it works well in my case.. | 19:24 |
spatel | May be you guys doing something which i am not aware. | 19:25 |
noonedeadpunk | broken IST, broken threading, and the time they broke root permissions during an upgrade because of weird bits | 19:25 |
noonedeadpunk | they fix everything, but it might be hard to find a stable version from time to time | 19:26 |
spatel | noonedeadpunk i agree on that. There was a bug in a newer version and the upgrade was not smooth | 19:26 |
spatel | May be we need noSQL db in future to not worry about it :) | 19:27 |
spatel | How do you perform housekeeping on mysql | 19:27 |
spatel | we have nova DB size of 150G :D | 19:27 |
noonedeadpunk | oh, that is smth we need to implement in OSA | 19:28 |
noonedeadpunk | as nova-manage allows trimming deleted records | 19:28 |
spatel | I was reading about nova-manage tool to purge but want to learn before i try | 19:28 |
noonedeadpunk | and I was thinking of making an optional systemd service for that, for nova and cinder at least | 19:28 |
spatel | do you have command handy to do cleanup? | 19:29 |
admin1 | heat always fails => https://paste.openstack.org/raw/bTxVwYrRztFQsaIt9eHn/ | 19:29 |
spatel | are there any outage during cleanup? | 19:29 |
noonedeadpunk | https://docs.openstack.org/nova/latest/cli/nova-manage.html#db-archive-deleted-rows | 19:29 |
noonedeadpunk | no-no, it just wipes records from the DB that are marked as deleted | 19:30 |
noonedeadpunk | so you first move to shadow, and then can purge from shadow | 19:30 |
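A hedged example of that archive-then-purge flow, run from a nova-api container/venv; the cutoff date is illustrative, and the flags should be checked against the nova-manage docs linked above for your release:

```shell
# move rows marked deleted before the cutoff into the shadow tables:
nova-manage db archive_deleted_rows --before "2022-12-01" --until-complete
# then drop them from the shadow tables:
nova-manage db purge --before "2022-12-01"
```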
noonedeadpunk | admin1: that usually means that wheels are not built | 19:31 |
spatel | noonedeadpunk tx | 19:31 |
admin1 | how to force rebuild a heat wheel | 19:31 |
spatel | venv_build=true (something like this in ansible command) | 19:32 |
noonedeadpunk | admin1: we also backported bugfix for that https://review.opendev.org/c/openstack/openstack-ansible-os_heat/+/865564 | 19:32 |
noonedeadpunk | maybe not released yet.... | 19:32 |
admin1 | right now, i did apt install git and its running again | 19:32 |
jrosser | but that is wrong | 19:32 |
noonedeadpunk | but that git binary is used only when it can't find repo container as target for wheels | 19:33 |
jrosser | admin1: it is so much more useful to understand why this happens | 19:33 |
jrosser | "git is missing" -> something is broken in your deploy that means wheels are not built properly | 19:33 |
admin1 | the playbooks finished | 19:34 |
admin1 | i can test heat if its all OK | 19:34 |
admin1 | or try to build the wheel again | 19:34 |
noonedeadpunk | admin1: they were not built in the first place | 19:34 |
jrosser | i think you're missing the point really | 19:34 |
admin1 | i know :) | 19:34 |
jrosser | something happened during your deploy that made the wheel build for heat not happen or break | 19:35 |
jrosser | then you probably tried again and it went further but then failed with "git not found" | 19:35 |
noonedeadpunk | So the thing is that when the role can't find a destination to build wheels on, it tries to just install into the venv, and needs git for that | 19:35 |
noonedeadpunk | oh.. well... maybe that as well... | 19:35 |
prometheanfire | noonedeadpunk: yep :D | 19:36 |
spatel | noonedeadpunk why don't we just stop there instead of building the wheel into the venv? | 19:37 |
noonedeadpunk | I kind of wonder if instead of https://opendev.org/openstack/ansible-role-python_venv_build/src/branch/master/vars/main.yml#L80 we should just define True and deal with consequences... | 19:37 |
noonedeadpunk | we are not building wheels at all then | 19:37 |
noonedeadpunk | we're just installing packages from pypi independently | 19:37 |
spatel | I would prefer to not do anything and go back and fix stuff instead of finding other ways out | 19:38 |
noonedeadpunk | spatel: it's a matter of whether we just fail the installation or try to proceed | 19:38 |
jrosser | when the wheel build fails it should delete the .txt files i think | 19:38 |
jrosser | that should make it repeatably try again | 19:38 |
noonedeadpunk | yep, we have block for that | 19:38 |
jrosser | so thats why it is interesting that for admin1 this seems not to happen | 19:39 |
jrosser | but why find root cause when a hack will do /o\ | 19:39 |
admin1 | no no .. i have to deliver this cluster by this evening .. .. i can create this same env again in dev tomorrow and go into root cause ) | 19:39 |
noonedeadpunk | I still think it's a matter of `venv_wheel_build_enable` being evaluated to false rather than anything else | 19:39 |
noonedeadpunk | which depends on quite complex logic to be fair | 19:40 |
jrosser | maybe we should make that a hard failure when there is no build target | 19:41 |
jrosser | rather than fall back to building in the venv | 19:41 |
jrosser | and only allow that if `venv_wheel_build_enable` is actually set to `False` | 19:41 |
jrosser | i think i am also suspicious about stale facts for this whole process | 19:43 |
jrosser | it relies on having architecture and OS facts available for the repo servers, else it just won't work | 19:43 |
jrosser | this should address that though https://github.com/openstack/ansible-role-python_venv_build/blob/4d766a1f9d9993e2bb3647fdcf19da23fffbae61/tasks/main.yml#L31-L39 | 19:44 |
jrosser | this is also pretty complicated logic https://github.com/openstack/ansible-role-python_venv_build/blob/4d766a1f9d9993e2bb3647fdcf19da23fffbae61/tasks/main.yml#L68 | 19:49 |
admin1 | i have checked the cluster using a heat file and it worked fine .. so it was just a matter of apt install git for me for this one .. should I build the wheels again and try ? or assume it works fine | 20:11 |
noonedeadpunk | admin1: it would be great if you could drop git from the container and paste the whole output of python_venv_build | 20:19 |
noonedeadpunk | and re-run with `-e venv_wheels_rebuild=true -e venv_rebuild=true` | 20:20 |
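Putting that suggestion together as a sketch (the container name is a placeholder; the playbook name assumes the standard heat play):

```shell
lxc-attach -n <heat_container_name> -- apt -y remove git
cd /opt/openstack-ansible/playbooks
openstack-ansible os-heat-install.yml \
  -e venv_wheels_rebuild=true -e venv_rebuild=true 2>&1 | tee /tmp/heat-venv-rebuild.log
```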
damiandabrowski | sorry, i just jumped in for a minute | 20:20 |
noonedeadpunk | yeah, facts shouldn't be the issue I guess... But I'm not sure what is.... | 20:20 |
damiandabrowski | venv_build issue reminds me of uwsgi issue we had in november | 20:21 |
damiandabrowski | so basically the current logic is: if there's only one container matching distro and architecture, then wheel won't be built on repo container | 20:22 |
noonedeadpunk | Well. It's same but different I guess.. | 20:22 |
damiandabrowski | so it's mainly the case for single controller deployment | 20:22 |
noonedeadpunk | ohhhhh | 20:22 |
noonedeadpunk | yes, that's true | 20:22 |
damiandabrowski | we tried to fix that for our CI previously: https://review.opendev.org/c/openstack/openstack-ansible/+/752311 | 20:23 |
noonedeadpunk | admin1: how many controllers/repo containers do you have ?:) | 20:23 |
damiandabrowski | but i think it would be good to set venv_wheel_build_enable: True in /opt/openstack-ansible/inventory/group_vars/all/all.yml | 20:24 |
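For a one-off run, a hedged alternative to editing the in-tree group_vars is to pass it as an extra var, which wins precedence regardless of where the default is defined:

```shell
# force wheel builds on the repo container for this run only
openstack-ansible /opt/openstack-ansible/playbooks/os-heat-install.yml \
  -e venv_wheel_build_enable=True
```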
noonedeadpunk | for some reason I assumed it's multiple controllers :D | 20:24 |
noonedeadpunk | fwiw 1 vote still needed for https://review.opendev.org/c/openstack/openstack-ansible/+/869078 | 20:24 |
admin1 | 1 controller , 1 repo | 20:25 |
noonedeadpunk | damiandabrowski: the thing is that in CI we see a quite serious performance penalty from using venv_wheel_build_enable: True, for AIO at least | 20:25 |
noonedeadpunk | admin1: then disregard all that | 20:25 |
noonedeadpunk | it's all good | 20:26 |
noonedeadpunk | except we need to release a new version with heat installing git for that specific usecase | 20:26 |
spatel | Quick question, can i add tags to a vm to identify it? | 20:29 |
spatel | like customer name as tags so i can query based on tags and list all vms | 20:29 |
spatel | Oh! nova server-tag-add vm1 cust1 | 20:33 |
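The openstackclient equivalents of that legacy nova command, hedged since tag support needs compute API microversion 2.26+ and a reasonably recent client:

```shell
openstack server set --tag cust1 vm1        # attach a tag to an existing server
openstack server list --tags cust1 --long   # list servers carrying that tag
```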
jrosser | damiandabrowski: where is the "if there's only one container matching distro and architecture, then wheel won't be built on repo container" in the code? | 20:47 |
jrosser | oh like here https://github.com/openstack/ansible-role-python_venv_build/blob/83998be6b81e756828edc723059e6a5405dd2da6/vars/main.yml#L80 | 20:49 |
jrosser | why do we do that? | 20:49 |
jrosser | hmm right i need to remember that | 20:51 |
jrosser | going to cause some unexpected outcomes when i add a single aarch64 repo node and one aarch64 compute node to my lab | 20:52 |
damiandabrowski | i guess we thought "if we have only one host, then there's no reason to build wheels" which is probably right as it won't optimize anything | 20:52 |
damiandabrowski | but on the other hand, more hosts can be added at any time and repo container is always deployed anyway | 20:53 |
damiandabrowski | so at the end it may make sense to always build wheels | 20:54 |
*** dviroel is now known as dviroel|afk | 21:19 | |
*** rgunasekaran_ is now known as rgunasekaran | 21:42 | |
prometheanfire | having problems creating ports with ovn, is ovn actually ready to be default? | 22:05 |
admin1 | how to tell osa to not treat one host as a network host ( so that it does not create network agents again when rerunning the playbooks ) | 22:12 |
admin1 | so say host h1 was specified as network node before, but not anymore in the user_config .. how to tell osa to not treat h1 as network node anymore ? | 22:12 |
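The question goes unanswered in the log; a hedged sketch of one common approach is to drop the host from the network group in openstack_user_config.yml, prune the stale inventory entries, and delete the orphaned agents (entry names are placeholders; verify the inventory-manage.py options on your release):

```shell
cd /opt/openstack-ansible
./scripts/inventory-manage.py -l | grep h1          # find h1's old network-node entries
./scripts/inventory-manage.py -r <entry_name>       # remove each stale entry
# then clean up the agents neutron still remembers:
# openstack network agent list --host h1
# openstack network agent delete <agent_id>
```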
opendevreview | Merged openstack/openstack-ansible stable/xena: Bump rabbitmq role SHA https://review.opendev.org/c/openstack/openstack-ansible/+/869078 | 23:05 |