*** zbr is now known as Guest2488 | 05:03 | |
*** raukadah is now known as chandankumar | 05:46 | |
snadge | TASK [os_nova : Install kvm pip packages .. is now failing "unable to execute 'gcc': No such file or directory" | 06:52 |
---|---|---|
snadge | so close .. i seem to be almost through the setup_openstack playbook | 06:58 |
noonedeadpunk | well issue you pasted is not about python3 | 07:00 |
noonedeadpunk | but yeah, it could be if ansible is not bootstrapped, as it would still use old roles | 07:03 |
noonedeadpunk | while using new playbooks | 07:03 |
jrosser | kvm pip should never need gcc either | 07:04 |
jrosser | this indicates it is trying to build the wheel locally on the compute host rather than in the repo server | 07:04 |
noonedeadpunk | but actually then compute would be treated as repo? Or wheel won't be built at all? | 07:05 |
jrosser | there wont be a C toolchain potentially | 07:05 |
jrosser | but imho this looks pretty suspect from the log above `changed: [infra1_repo_container-cf45d187] => (item={u'path': u'/var/www/repo/os-releases/20.0.1'})` | 07:06 |
jrosser | 20.0.1? srsly? | 07:06 |
jrosser | snadge: which release / branch / tag are you wanting? | 07:10 |
jrosser | Ussuri i think? | 07:11 |
noonedeadpunk | I think paste was before bootstrapping.... but dunno... | 07:12 |
jrosser | yeah even so for T thats a very very early tag, should ideally be starting from the most recent | 07:16 |
*** rpittau|afk is now known as rpittau | 07:22 | |
snadge | it is 21.2.6. latest ussuri yes | 07:42 |
snadge | the bootstrap fixed the above problem yes.. this latest one, im looking into | 07:44 |
snadge | im not sure why os_nova is trying to install kvm pip packages onto the compute nodes | 07:44 |
jrosser | is this an AIO or something more complicated? | 07:45 |
snadge | something more complicated but not by much, its based on a stripped down version of that pretty much | 07:46 |
jrosser | and you have an AIO build alongside as reference? | 07:46 |
snadge | i had installed it in a test setup that was similar just using vsphere vms, that was the reference | 07:47 |
snadge | the production setup is basically the same thing | 07:47 |
snadge | but of course it isn't exactly | 07:47 |
jrosser | well, if i was having this much trouble i'd maybe take a step back | 07:47 |
jrosser | first off, there are regular CI jobs which test this stuff for all the stable branches | 07:48 |
jrosser | so we could take a look at the latest one of those for Ussuri, which is here https://review.opendev.org/c/openstack/openstack-ansible/+/794999 | 07:48 |
jrosser | if you click "Zuul Summary" you can see the Centos-7 jobs were passing (on June 6th anyway) | 07:49 |
jrosser | you can also go look in the logs for those jobs | 07:49 |
jrosser | then it would be to try to reproduce exactly those same tests locally with the AIO config, which should be just a couple hours waiting for it to deploy in a VM | 07:50 |
snadge | the other thing i did different in testing was use 21.2.3 | 07:51 |
snadge | ok so if it built on jun 7.. when was 21.2.6 released | 07:52 |
jrosser | remember that these releases are pretty mechanically generated | 07:53 |
snadge | assuming i can git checkout stable/ussuri instead of eg. 21.2.6 | 07:53 |
jrosser | imho the level of change on these branches is really low | 07:54 |
jrosser | for something like Ussuri we really are not changing anything unless something really stupid happens (like Ubuntu change a load of repos or Centos have a minor release with breaking changes) - the point releases on the whole just pull in any bugfixes made to the actual services like nova/keystone etc | 07:56 |
jrosser | the ansible roles and openstack-ansible itself are staying pretty much the same | 07:56 |
*** raukadah is now known as chandankumar | 08:28 | |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-os_neutron master: Do not set Open vSwitch hostname https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/793009 | 09:10 |
noonedeadpunk | hm I actually wonder why https://review.opendev.org/c/openstack/openstack-ansible-os_senlin/+/754045 still fails with cert verification issue | 09:42 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Don't set keystone URI as unsecure https://review.opendev.org/c/openstack/openstack-ansible/+/796809 | 09:48 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Don't set keystone URI as unsecure https://review.opendev.org/c/openstack/openstack-ansible/+/796809 | 10:08 |
noonedeadpunk | I'd expect it to fail atm... | 10:08 |
noonedeadpunk | curl https://89.42.141.147:5000 results in `SSL certificate problem: unable to get local issuer certificate` :( | 10:26 |
noonedeadpunk | oh, well.. probably we should either put url or add IP to the certificate? | 10:28 |
noonedeadpunk | *put url as keystone endpoint | 10:28 |
noonedeadpunk | but asking https://aio1.openstack.local:5000 has exact same result :( | 10:32 |
opendevreview | Arx Cruz proposed openstack/openstack-ansible-os_tempest master: Create alternative tempest run command https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/796818 | 10:37 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Use openstack_repo_url for requirements_git_url https://review.opendev.org/c/openstack/openstack-ansible/+/796820 | 10:43 |
noonedeadpunk | jrosser: is it enough to copy ExampleCorpRoot.crt into /usr/local/share/ca-certificates/ or also intermediate should be copied there as well? | 11:09 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Don't set keystone URI as unsecure https://review.opendev.org/c/openstack/openstack-ansible/+/796809 | 11:13 |
snadge | im still unable to figure out why Install kvm pip packages is failing | 11:31 |
noonedeadpunk | oh, and you're running centos 7? | 11:37 |
noonedeadpunk | I think this really might be osa bug actually | 11:39 |
snadge | yes unfortunately | 11:44 |
snadge | we are in the process of sorting out a driver issue.. rhel 8.2/8.3 with some out of tree drivers appear to work | 11:44 |
snadge | but its too late to test and deploy with that now | 11:44 |
snadge | also speccing new hardware but this is also future plans | 11:45 |
snadge | i see centos 8 and centos 8 stream are listed as supported.. any plans to support rocky or alma linux, or basically a rhel clone? | 11:48 |
snadge | if we still haven't replaced the hardware and need to replace ussuri, i can look into centos 8 stream | 11:52 |
jrosser | noonedeadpunk: i think you can set up the CA however you need - the thing that makes the intermediate necessary is the signed_by https://opendev.org/openstack/ansible-role-pki/src/branch/master/defaults/main.yml#L92 | 12:11 |
jrosser | so in theory (not tried it though!) you could set it up with just a root and no intermediate at all | 12:12 |
jrosser | the most general case is your company has a root which you will never be able to access the key for, but they give you the CA cert | 12:13 |
jrosser | then they create an intermediate CA cert/key which you do have access to | 12:13 |
jrosser | in terms of copying the CA into /usr/local/share/ca-certificates you should only need to put the root CA there | 12:14 |
jrosser | if you also have to put the intermediate there, then something is broken | 12:15 |
jrosser | the services should be set up to present a cert chain of the service cert + the intermediate CA, which can then be validated by a client which only needs the root CA cert | 12:15 |
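The chain arrangement jrosser describes can be sketched as a toy model. This is only an illustration under loud assumptions: "signed by" is reduced to a matching issuer name with no cryptography at all, `ExampleCorpIntermediate` is a hypothetical name (only `ExampleCorpRoot` and `haproxy_aio1` appear in the log), and `Cert` and `chain_validates` are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str


def chain_validates(presented, trusted_roots):
    """Return True if the first presented cert links up to a trusted root.

    Toy model: 'signed by' is just a matching issuer name; no cryptography.
    """
    by_subject = {c.subject: c for c in presented}
    trusted = {c.subject for c in trusted_roots}
    current = presented[0]  # the service (leaf) certificate
    seen = set()
    while current.subject not in seen:
        seen.add(current.subject)
        if current.issuer in trusted:
            return True
        nxt = by_subject.get(current.issuer)
        if nxt is None:
            return False  # gap in the chain: the issuer was not presented
        current = nxt
    return False  # loop guard, e.g. an untrusted self-signed cert


root = Cert("ExampleCorpRoot", "ExampleCorpRoot")
intermediate = Cert("ExampleCorpIntermediate", "ExampleCorpRoot")  # hypothetical name
service = Cert("haproxy_aio1", "ExampleCorpIntermediate")

# Service presents cert + intermediate; the client only trusts the root.
print(chain_validates([service, intermediate], [root]))  # True
# Service presents only its own cert; the client cannot bridge the gap.
print(chain_validates([service], [root]))  # False
```

The second call mirrors the symptom discussed next in the log: if the intermediate is missing from what the service presents, a client holding only the root CA has nothing to link the service cert to.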
noonedeadpunk | well, smth really is broken with basic aio... | 12:18 |
noonedeadpunk | oh, well, intermediate is not part of ha proxy cert | 12:18 |
jrosser | it should be, just a moment | 12:19 |
jrosser | this should define what gets copied over to the haproxy certs dir https://github.com/openstack/openstack-ansible-haproxy_server/blob/master/defaults/main.yml#L152-L168 | 12:19 |
noonedeadpunk | /etc/openstack_deploy/pki/certs/certs/haproxy_aio1-chain.crt is present, but it's not part of /etc/ssl/private/haproxy.pem | 12:20 |
jrosser | and then this should concatenate the pieces https://github.com/openstack/openstack-ansible-haproxy_server/blob/master/handlers/main.yml#L16-L21 | 12:20 |
noonedeadpunk | oh well, and here's a mistake | 12:21 |
noonedeadpunk | `{{ haproxy_user_ssl_ca_cert is defined | ternary(haproxy_ssl_ca_cert,'') }}` | 12:21 |
noonedeadpunk | and we have `haproxy_user_ssl_ca_cert | default(haproxy_pki_intermediate_cert_path)` | 12:21 |
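A rough Python model of the two expressions quoted above may help show why the intermediate gets dropped. It assumes my reading of the exchange is right: the default resolves `haproxy_ssl_ca_cert` to the PKI intermediate path when the user variable is unset, but the handler expression only emits it when `haproxy_user_ssl_ca_cert` itself is defined. The function names and the placeholder path are hypothetical, and the dict lookup only approximates Ansible's `default` filter (which applies to undefined variables).

```python
def resolve_ca_cert(user_vars, pki_intermediate_path):
    """Model of: haproxy_user_ssl_ca_cert | default(haproxy_pki_intermediate_cert_path)"""
    return user_vars.get("haproxy_user_ssl_ca_cert", pki_intermediate_path)


def ca_piece_as_quoted(user_vars, pki_intermediate_path):
    """Model of: haproxy_user_ssl_ca_cert is defined | ternary(haproxy_ssl_ca_cert, '')"""
    haproxy_ssl_ca_cert = resolve_ca_cert(user_vars, pki_intermediate_path)
    if "haproxy_user_ssl_ca_cert" in user_vars:
        return haproxy_ssl_ca_cert
    return ""  # the resolved PKI default is discarded here


pki_path = "/hypothetical/pki/intermediate-chain.crt"  # placeholder, not a real path

# No user-supplied CA: the default resolves fine, but the handler piece is empty.
print(repr(resolve_ca_cert({}, pki_path)))
print(repr(ca_piece_as_quoted({}, pki_path)))

# User-supplied CA: both agree.
user = {"haproxy_user_ssl_ca_cert": "/hypothetical/user-ca.crt"}
print(repr(ca_piece_as_quoted(user, pki_path)))
```

In this model the empty string in the default case matches the observation that the chain file exists on disk but never ends up in `/etc/ssl/private/haproxy.pem`. How the role was actually fixed is not stated in the log.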
jrosser | oh hmm | 12:23 |
jrosser | like previously there was maybe not a CA (previous haproxy default self signed case?) but now there always is | 12:24 |
noonedeadpunk | yeah... | 12:24 |
noonedeadpunk | and another thing, that we probably need to add ip address to certificate subject | 12:24 |
noonedeadpunk | or at least add what we have defined as vip address | 12:25 |
noonedeadpunk | as once I added chain to haproxy, I got another issue - `SSL: no alternative certificate subject name matches target host name '172.29.236.101'` | 12:25 |
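The error above comes from subject alternative name matching: a client connecting by IP address needs an IP entry in the certificate, and a DNS entry does not satisfy it. A simplified sketch follows, assuming the certificate currently carries only a DNS name. The SAN contents are an assumption, `san_matches` is a hypothetical helper, and wildcard handling is deliberately left out.

```python
import ipaddress


def san_matches(target, dns_names=(), ip_addresses=()):
    """Simplified SAN check: IP targets match only IP entries, names only DNS entries."""
    try:
        target_ip = ipaddress.ip_address(target)
    except ValueError:
        # Not an IP literal: compare case-insensitively against DNS entries.
        return target.lower() in {name.lower() for name in dns_names}
    return target_ip in {ipaddress.ip_address(addr) for addr in ip_addresses}


# Assumed current state: only a DNS name in the certificate.
print(san_matches("172.29.236.101", dns_names=["aio1.openstack.local"]))  # False

# With the VIP address added as an IP entry, the same request matches.
print(san_matches(
    "172.29.236.101",
    dns_names=["aio1.openstack.local"],
    ip_addresses=["172.29.236.101"],
))  # True
```

This is why adding the VIP address to the certificate, or pointing the keystone endpoint at a name the certificate already covers, are the two options raised in the discussion.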
jrosser | right - theres an example of that in the rabbitmq role | 12:26 |
noonedeadpunk | but thanks for helping out here | 12:26 |
noonedeadpunk | I would spend so much time without these pointers... | 12:26 |
jrosser | https://github.com/openstack/openstack-ansible-rabbitmq_server/blob/master/defaults/main.yml#L157 | 12:26 |
noonedeadpunk | https://github.com/openstack/openstack-ansible-haproxy_server/blob/master/defaults/main.yml#L149 I think we just need to add internal here as well just in case? but not sure... | 12:27 |
noonedeadpunk | I think it's worth being another cert.... | 12:27 |
jrosser | yeah | 12:28 |
jrosser | we should look at the internal+external both being SSL as a separate thing | 12:28 |
jrosser | because you might want company cert on the outside | 12:28 |
jrosser | but still need to use internal CA on the inside | 12:28 |
noonedeadpunk | yep, or do let's encrypt outside... Which I guess we can't do right now | 12:29 |
noonedeadpunk | (not sure) | 12:29 |
jrosser | hmm well that is a good question actually | 12:31 |
jrosser | i am hoping that everything still works as before, but now not so sure if re-running the haproxy role after a deployment with LE will replace the LE certs ones with those from the PKI role | 12:31 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_nova master: Drop CentOS 7 specific task https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/796830 | 12:32 |
jrosser | i think it should be OK, as this should only happen once https://github.com/openstack/openstack-ansible-haproxy_server/blob/master/handlers/main.yml#L16-L21 | 12:33 |
noonedeadpunk | but I mean we get let's encrypt cert for both internal and external at the same time? | 12:33 |
noonedeadpunk | disregard - let me read code carefully ) | 12:33 |
jrosser | oh yes i imagine if you set ssl_all_endpoints or whatever it is then yes, it would be LE both sides | 12:34 |
mgariepy | the gate check for ovn patch is painful. | 12:34 |
mgariepy | 3 rechecks, 3 different failures | 12:34 |
noonedeadpunk | :( | 12:35 |
mgariepy | 1 calico, 1 buster, 1 centos8 iirc | 12:35 |
noonedeadpunk | and only in gates... | 12:35 |
mgariepy | that's annoying lol | 12:35 |
noonedeadpunk | btw calico fails pretty frequently imo | 12:36 |
mgariepy | is there a lot of users using it that you know of ? | 12:36 |
noonedeadpunk | I know logan- used to... | 12:36 |
noonedeadpunk | And who knows - maybe we will consider it as well for some reason one day (at least there were suggestions to try it out) | 12:37 |
noonedeadpunk | snadge: nope, not planning to try alma/rocky/etc at the moment | 12:39 |
jrosser | i guess we maybe have not bumped the calico version for a long time | 12:39 |
noonedeadpunk | snadge: but I think you can try installing gcc on computes as a workaround manually | 12:39 |
noonedeadpunk | jrosser: well, I kind of did recently. we're following 3.18 branch and current one is 3.19 | 12:40 |
noonedeadpunk | (and it has been released pretty recently) | 12:40 |
mgariepy | jamesdenton, are you around ? | 12:42 |
mgariepy | spatel, do you know how to reset the ovn sb db ? | 13:03 |
spatel | hmm what do you mean reset | 13:04 |
jamesdenton | yes? on a call at the moment | 13:04 |
jamesdenton | i do not know how to reset that | 13:04 |
mgariepy | i got a buggy aio install with ovn | 13:05 |
mgariepy | the issue is that the sb db is somewhat empty. | 13:05 |
spatel | i don't think there is any reset thing, you need to delete db file and re-create it i believe | 13:05 |
mgariepy | gateway chassis: [neutron-ovn-invalid-chassis] | 13:05 |
spatel | your compute node should push data to sb | 13:05 |
mgariepy | and it's not listening on all the ports the working one is.. | 13:06 |
mgariepy | i think the issue was that i did zap the neutron containers but didn't drop the DB. | 13:11 |
mgariepy | for redeployment. | 13:12 |
opendevreview | Arx Cruz proposed openstack/openstack-ansible-os_tempest master: Create alternative tempest run command https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/796818 | 13:52 |
opendevreview | Adrien Cunin proposed openstack/openstack-ansible-openstack_hosts master: Make sure tzdata is installed in containers https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/796850 | 14:33 |
opendevreview | Gaudenz Steinlin proposed openstack/openstack-ansible-os_nova master: Use version from repo_packages for SPICE HTML5 https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/796852 | 14:54 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Fix serialized playbook runs https://review.opendev.org/c/openstack/openstack-ansible/+/752040 | 15:05 |
CeeMac | anyone have any tops for troubleshooting rabbitmq message timeouts? | 15:36 |
CeeMac | we're having some problems with live migrations failing, at first look we're getting 504 gateway timeouts from cinder api, however when looking deeper in the logs we also see some message timeout events in the cinder api logs. so i think nova is requesting a volume operation, cinder api is dropping a message on the queue, then its not getting a response back within a particular time so its dropping the request, which i | 15:38 |
CeeMac | think is then generating the 504 | 15:38 |
CeeMac | tops==tips | 15:39 |
*** rpittau is now known as rpittau|afk | 16:09 | |
noonedeadpunk | I only have one thought, that some rabbitmq member is non functional and doesn't operate normally | 16:14 |
noonedeadpunk | or they're under really high load that they can't handle all messages in time | 16:15 |
noonedeadpunk | second point can be checked with statistics pretty easily | 16:15 |
*** sshnaidm is now known as sshnaidm|afk | 16:16 | |
noonedeadpunk | in case of first one I usually just run playbook with -e rabbitmq_upgrade=true, as downtime costs more than just running role... | 16:16 |
CeeMac | thanks noonedeadpunk in this case it looks like a couple of volumes were associated with cinder-volume pools which no longer exist, which would explain why the message wasn't getting processed in the queue | 16:41 |
noonedeadpunk | ah:) | 16:41 |
noonedeadpunk | I wouldn't guess that | 16:42 |
CeeMac | we've re-managed the volumes to a valid host/pool and the migration went straight through | 16:42 |
CeeMac | (after tidying up the inactive vif bindings - another thing we took ages to find) | 16:42 |
noonedeadpunk | yeah it's quite straightforward with cinder-manage | 16:42 |
CeeMac | yeah, it had us stumped for a fair while too, | 16:42 |
opendevreview | Dmitriy Rabotyagov proposed openstack/ansible-role-pki master: Allow to generate/install certificates conditionally https://review.opendev.org/c/openstack/ansible-role-pki/+/796895 | 16:54 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-haproxy_server master: WIP Generate self-signed SSL per listen IP https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/796940 | 18:32 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-haproxy_server master: WIP Generate self-signed SSL per listen IP https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/796940 | 18:37 |
mgariepy | (˚Õ˚)ر ~~~~╚╩╩╝ centos-8-stream | 18:42 |
mgariepy | the ovn patches seem to be failing randomly more reliably.. | 18:43 |
mgariepy | hoo. it's a non-voting one .. :S | 18:45 |
opendevreview | Merged openstack/openstack-ansible-os_magnum stable/victoria: Define region for Magnum trust https://review.opendev.org/c/openstack/openstack-ansible-os_magnum/+/795043 | 20:32 |
snadge | these timezone differences are rough, i feel like i need to work during the evenings but it's probably better this way | 21:31 |
snadge | installing gcc on the compute nodes is my latest act of desperation ;) | 21:32 |
snadge | is the python virtual environment a new dependency for the compute nodes? | 21:33 |
jrosser | snadge: it's been like that forever | 21:36 |
jrosser | but look here, for the AIO case the deploy host == compute host, which gets gcc https://github.com/openstack/openstack-ansible/blob/master/scripts/bootstrap-ansible.sh#L73-L81 | 21:36 |
snadge | -bash: dnf: command not found | 21:39 |
snadge | that could be a problem ;) | 21:39 |
jrosser | i gave you a link to the master branch | 21:39 |
snadge | thats the latest version of that script which is master branch yes | 21:39 |
jrosser | which is centos-8 only | 21:39 |
jrosser | ussuri would be https://github.com/openstack/openstack-ansible/blob/stable/ussuri/scripts/bootstrap-ansible.sh#L73-L77 | 21:40 |
snadge | ok that has $RHT_PKG_MGR -y install which should work.. i guess the question is why its not | 21:40 |
snadge | bootstrap also runs on the compute nodes? | 21:40 |
jrosser | no, so it feels like a bug maybe | 21:41 |
snadge | you only run it on the deployment host is my understanding | 21:41 |
snadge | yeah im not bothered and the easiest thing was to yum install gcc on the computes to see what happens i guess | 21:41 |
jrosser | sure | 21:41 |
snadge | thats running now | 21:41 |
jrosser | and actually here is the root cause https://opendev.org/openstack/openstack-ansible-os_nova/commit/e72835e5ac97f51665add4ade2a737eab12b3a9e | 21:41 |
jrosser | curse of centos again | 21:41 |
jrosser | whilst there is a python 3 interpreter for centos-7 nowadays, they don't package any useful python libraries | 21:42 |
jrosser | so the libvirt python lib is just completely unavailable and it looks like we have to build it from source | 21:43 |
snadge | ok that makes complete sense now.. nobody should be using ussuri or centos 7 anymore, my apologies | 21:43 |
snadge | this will be the last time i promise :P | 21:44 |
jrosser | no its fine - it's as much a bug in OSA that we don't special case installing gcc on computes for ussuri/centos-7 | 21:44 |
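Until a fix lands, a deployer could check up front whether a compute host has the toolchain that a source build of the libvirt Python bindings would need. A minimal sketch, assuming `gcc` on `PATH` is the only thing being checked (the log also mentions `libvirt-devel`, which is not covered here); `missing_build_tools` is a hypothetical helper, not part of OSA.

```python
import shutil


def missing_build_tools(tools=("gcc",), path=None):
    """Return the tools from `tools` that cannot be found on PATH (or on `path`)."""
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


if __name__ == "__main__":
    missing = missing_build_tools()
    if missing:
        print("missing build tools:", ", ".join(missing))
    else:
        print("C toolchain found")
```

Run on each compute host, this only reports what is absent. Installing the package is still a manual step, as snadge did with `yum install gcc`.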
snadge | im probably going to suggest moving onto centos 8 stream | 21:44 |
snadge | even though we can use RHEL for free | 21:44 |
jrosser | just unlucky that in the AIO/CI we get that as a side effect of all the functions being collapsed onto the same node | 21:44 |
jrosser | i'll see if i can make a patch | 21:45 |
snadge | how is the centos 8 stream support? are any other centos forks supported, like almalinux or rocky? | 21:48 |
snadge | everyone else laughs in ubuntu.. i know ;) | 21:52 |
jrosser | stream support will be in the upcoming Wallaby release | 21:58 |
jrosser | though i guess it needs some multinode testing properly in a lab, for reasons just like you found with ussuri/centos-7 | 21:59 |
jrosser | there will likely be a few small gremins | 21:59 |
jrosser | *gremlins | 21:59 |
snadge | as soon as we have deployed this ussuri release I can look into wallaby/stream | 21:59 |
snadge | who knows how long it will take them to buy new hardware, its a new bladecenter and san | 22:00 |
jrosser | and it remains to be seen what the stability is, though hopefully as OSA gets all the openstack stuff from source code you might be relatively decoupled from that | 22:00 |
snadge | its possible that rhel clone support could be added alongside stream.. for now, they are almost identical | 22:02 |
snadge | but how they diverge who knows, and that obviously doubles the amount of testing etc | 22:02 |
snadge | centos 8 stream just seems like a weird choice for a bladecenter.. but im sure i can get approval to at least try it :P | 22:05 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-os_nova stable/ussuri: Install gcc on any nova-compute hosts which need libvirt-devel https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/796957 | 22:13 |
jrosser | snadge: ^ i've not got anywhere to test that locally, the CI should verify that syntactically it's ok at least | 22:14 |
snadge | yeah it looks good, i could simply pull in that patch and run the playbook again.. maybe remove gcc first | 22:15 |
jrosser | yeah, if you can test it and leave a comment on the patch that would be super helpful | 22:15 |
snadge | which playbook will run that? | 22:15 |
jrosser | you can run the playbooks/os-nova-install.yml | 22:16 |
snadge | ahh okay even easier | 22:16 |
jrosser | have you applied a patch from gerrit before? | 22:16 |
snadge | not for this project, but cyanogenmod which also uses gerrit, but a while ago | 22:16 |
jrosser | hit the 3-dots thing top right on https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/796957 | 22:17 |
jrosser | select "download patch" and copy the "cherry pick" line to your clipboard | 22:18 |
jrosser | then cd to /etc/ansible/roles/os_nova, paste the command and it should apply it there for you | 22:18 |
jrosser | in terms of rhel support - as you say it is extremely similar to stream | 22:20 |
jrosser | and i would expect you could get that working reasonably OK - there are almost certainly some places that we match on the string 'centos' that would need to be made more generic | 22:21 |
jrosser | and the usual suspects would be getting systemd-networkd and lxc working | 22:22 |
snadge | ive installed the patch but waiting for setup_openstack to finish | 22:23 |
snadge | i just want to see the horizon web interface and don't actually care if it works ;) | 22:23 |
jrosser | let me just check what we had to do for horizon too..... | 22:23 |
jrosser | yes so theres a similar missing-python3-library situation for mod-wsgi which needed this workaround https://github.com/openstack/openstack-ansible-os_horizon/commit/075dcf9c7e7b13776848b08b092d901cc185b669 | 22:30 |
snadge | thats in master.. is it in ussuri/stable? | 22:41 |
snadge | if horizon installs correctly i will assume yes | 22:42 |
snadge | its my day off.. maybe i'll get paid an extra day this week hehe | 22:45 |
snadge | i owe you a carton of beers at this rate, by the time travel between australia and other countries is allowed again | 23:02 |
snadge | horizon installed 👍 | 23:07 |