*** ysandeep|out is now known as ysandeep | 01:23 | |
*** ysandeep is now known as ysandeep|afk | 02:44 | |
*** ysandeep|afk is now known as ysandeep | 04:57 | |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-ceph_client master: Provide opportunity to define cluster_name https://review.opendev.org/c/openstack/openstack-ansible-ceph_client/+/852588 | 05:07 |
*** prometheanfire is now known as Guest194 | 07:00 | |
*** ysandeep is now known as ysandeep|afk | 07:09 | |
*** Guest194 is now known as Guest200 | 07:28 | |
evrardjp | noonedeadpunk: I disagree; the fact is that the config should be radically different, IMO | 07:34 |
evrardjp | else I would not even try this | 07:35 |
evrardjp | and I don't think you need variable generation for your case. There are plenty of places in the OSA case where a template is a better choice | 07:36 |
evrardjp | but let's try the PoC and see how far it goes | 07:36 |
*** ysandeep|afk is now known as ysandeep | 07:37 | |
*** Adri2000_ is now known as Adri2000 | 08:29 | |
evrardjp | Ok I am at the end of the time I have for the PoC, and I see that this is a positive improvement, yet too marginal to be worth the risks of failed migrations. | 08:43 |
evrardjp | The result completely removed the variables from osa/inventory/group_vars/haproxy, reconfigured the role to use an external role, and put all the "desired state" onto the deployer node | 08:43 |
evrardjp | it would load the role in each playbook to reconfigure haproxy if necessary, using include_role with tasks_from to allow reconfiguring a frontend/backend live | 08:45 |
evrardjp | I need a quick patch on the upstream role for it | 08:45 |
evrardjp | I had two ways to reconfigure using my external role | 08:46 |
evrardjp | the first way was to generate a series of vars (using set_fact) that would give a proper config. Sadly this becomes very convoluted when configuring haproxy from a non-haproxy play | 08:47 |
evrardjp | the other way was to template directly from the existing OSA role and use my other role to reload the configuration/handle the state. This is relatively good in terms of code cleanup, but only brings marginal improvements over the whole configuration | 08:47 |
evrardjp | a mix of those two models could give great results, but at the cost of clarity and added complexity during an upgrade. | 08:48 |
evrardjp | I think that noonedeadpunk's patch on 'interface' and maybe future changes in the templating are "good enough" for a majority of OSA users. | 08:50 |
evrardjp | for people who want something different, I am sure my role can deliver it, if you approach it "from scratch". For now it's a tad late for OSA given the marginal improvements | 08:50 |
evrardjp | So there you go, 1 day flushed :) | 08:51 |
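A rough illustration of the include_role / tasks_from pattern evrardjp describes above, for reconfiguring one haproxy frontend/backend from another service's play. The role name, tasks file, and variables are hypothetical placeholders, not the actual PoC code:

```yaml
# Hypothetical sketch only: role, tasks file, and variable names are invented
# for illustration and are not the PoC's or OSA's real interface.
- name: Reconfigure the haproxy frontend/backend for one service
  ansible.builtin.include_role:
    name: haproxy_server          # external haproxy role (assumed name)
    tasks_from: service_config    # tasks file that renders a single service (assumed)
  vars:
    haproxy_service:
      name: glance_api
      backend_nodes: "{{ groups['glance_api'] | default([]) }}"
      port: 9292
```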
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-ceph_client master: Do not delegate facts when fetching keyrings https://review.opendev.org/c/openstack/openstack-ansible-ceph_client/+/852714 | 08:53 |
jrosser_ | urgh centos-9-stream jobs are broken | 09:05 |
*** ysandeep is now known as ysandeep|lunch | 09:34 | |
snadge | im running into an issue deploying a yoga install on ubuntu 20.04.4.. https://pastebin.com/24UuhBPX | 09:51 |
snadge | im aware of this issue: https://bugs.launchpad.net/openstack-ansible/+bug/1943978 .. and have installed that patch | 09:51 |
noonedeadpunk | snadge: from the error it seems that haproxy does not see any alive nginx on repo_containers | 09:54 |
noonedeadpunk | so have a feeling that repo-install.yml has failed previously | 09:54 |
snadge | oddly.. wgetting that file seems to work.. perhaps something is bouncing up and down | 09:55 |
noonedeadpunk | yeah, with centos it seems we got bad timing for repo updates... | 09:58 |
snadge | ive seen this message a few times "backend repo_all-back has no server available!" | 09:58 |
noonedeadpunk | *infra mirrors sync | 09:58 |
noonedeadpunk | snadge: well yes, that would explain 503 | 09:58 |
snadge | i wonder why it did that.. one of the playbooks must have made it go to lunch | 09:58 |
noonedeadpunk | or well, that's the reason for the 503 :) | 09:58 |
snadge | then haproxy has marked it nonresponsive or whatever | 09:59 |
opendevreview | Jean-Philippe Evrard proposed openstack/openstack-ansible master: Cleanup useless variables https://review.opendev.org/c/openstack/openstack-ansible/+/852563 | 10:06 |
snadge | yeah something is causing the repo server to drop out | 10:10 |
snadge | but the problem seems intermittent | 10:11 |
mrf | mmm what text editor do the containers have? | 10:12 |
mrf | nano? vi? | 10:12 |
snadge | cinder-volume is crashing in a loop saying access denied to user cinder .. using password yes.. it seems like a mysql error? | 10:20 |
snadge | this is on the controller which runs all the containers plus galera etc | 10:21 |
snadge | i wonder if thats whats loading the system up and causing the repo server to drop out | 10:21 |
mrf | why can't i edit /var/lib/mysql/grastate.dat from the host under the rootfs path /var/lib/lxc/controller1_galera_container-45ce6c70/rootfs/var/lib/mysql ?? | 10:43 |
mrf | solved, finally used sed... to replace the bootstrap flag | 10:46 |
noonedeadpunk | I have a problem in my sandbox. `internal endpoint for volumev3 service in az-poc region not found` https://paste.openstack.org/show/bC3upHmpHotyE5PyKhnG/ | 10:51 |
noonedeadpunk | wtf is that.... | 10:51 |
noonedeadpunk | mrf: /var/lib/mysql is a bind mount inside the container. So you should check for actual path on the host | 10:52 |
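For reference, one way to locate the host-side source of that bind mount and flip the Galera bootstrap flag there; the container name comes from the paste above, and the edit assumes the standard safe_to_bootstrap flag in grastate.dat:

```shell
# find the host-side source of the container's /var/lib/mysql bind mount
grep mount.entry /var/lib/lxc/controller1_galera_container-45ce6c70/config

# then flip the Galera bootstrap flag at the host-side path reported there, e.g.:
# sed -i 's/safe_to_bootstrap: 0/safe_to_bootstrap: 1/' <host_side_path>/grastate.dat
```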
noonedeadpunk | inside the token stanza the catalog does look weird to me indeed. It's somehow filtered, I would say | 10:53 |
noonedeadpunk | ok, wtf https://paste.openstack.org/show/bPjXX6jQFm0WWa0yh4Ka/ | 11:04 |
*** ysandeep|lunch is now known as ysandeep | 11:06 | |
jrosser_ | mrf: there won't be an editor in the containers, they are as minimal as practical. you can install vim or whatever if you need it | 11:09 |
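If an editor is needed, it can be installed from the host, for example (container name reused from the earlier paste; assumes an apt-based container image):

```shell
# attach to the container from the host and install an editor inside it
lxc-attach -n controller1_galera_container-45ce6c70 -- apt-get install -y vim
```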
jrosser_ | snadge: wget from the haproxy node, and checking the haproxy log will also be useful | 11:10 |
jrosser_ | all interaction with the repo hosts will be via the loadbalancer so it's important to find why that is unstable | 11:10 |
jrosser_ | mrf: if you are having database trouble then we have some docs here https://docs.openstack.org/openstack-ansible/latest/admin/maintenance-tasks.html#galera-cluster-maintenance | 11:13 |
mrf | yeah i already solved it with a sed... forcing one node to bootstrap | 11:15 |
*** dviroel|out is now known as dviroel | 11:24 | |
noonedeadpunk | so, basically the catalog is taken from your auth, and the endpoints from a separate API request | 11:39 |
noonedeadpunk | why isn't everything returned during token generation then... | 11:39 |
noonedeadpunk | ok, I know what that is | 12:07 |
snadge | maybe im running out of tcp ports or something stupid on the host? a whole bunch of servers went down at the same time this time | 12:58 |
snadge | it seems haproxy logs to /dev/log which is just the main journal | 13:01 |
snadge | it crashes during keystone setup in setup_openstack.. and now its just bailing saying it cant find "/var/www/repo/os-releases/25.0.0/ubuntu-20.04-x86_64/requirements/keystone-25.0.0-constraints.txt" | 13:05 |
snadge | so i have to blow away the keystone container and just reinstall that part? i got stuck in this loop last time | 13:06 |
snadge | i knew i shouldn't have used version 25 :( | 13:06 |
snadge | how do i rebuild that file? | 13:14 |
mrf | "Could not find the requested service aodh-api: host" mmm for aodh do we just need metering-alarm_hosts in the yml, no? | 13:29 |
*** ysandeep is now known as ysandeep|break | 13:30 | |
jrosser_ | snadge: that sounds like you still have problems with the repo server | 13:34 |
jrosser_ | i am not sure re-creating the keystone container is going to help | 13:35 |
snadge | yeah because i've done this once already, i need to try and find out why its happening | 13:35 |
jrosser_ | i think also you are installing 25.0.0 tag, which would not include any bugfixes that have been applied to yoga since the first release | 13:35 |
snadge | i will check | 13:35 |
mrf | stable/yoga git downloads the 25.0.0 tag | 13:36 |
mrf | same happens to me | 13:36 |
jrosser_ | no :) | 13:36 |
mrf | yes | 13:36 |
jrosser_ | stable/yoga is the head of the branch | 13:36 |
mrf | in my deploy it 100% reads 25.0.0 | 13:36 |
mrf | and i cloned stable/yoga with git | 13:36 |
snadge | it is set to 25.0.0 | 13:37 |
snadge | how do i change it to the latest yoga? | 13:37 |
mrf | im re-running the install of the aodh containers and will check the tag, but im 99% sure it shows 25.0.0 for stable/yoga | 13:38 |
snadge | there is b1, rc1 and rc2 | 13:38 |
jrosser_ | beta1, release candidate 1 and 2 | 13:38 |
snadge | they will be older then? .. oh you are suggesting trying the dev branch | 13:39 |
jrosser_ | i don't know what that means | 13:39 |
jrosser_ | stable/yoga is a branch | 13:39 |
jrosser_ | 25.x.x are tags that mark points in the history of that branch | 13:39 |
snadge | ah okay that makes sense now.. so if i want some fixes that have been done since 25.0.0 i can switch to stable/yoga | 13:40 |
NeilHanlon | snadge: does this visualization help, or no? https://drop1.neilhanlon.me/irc/uploads/ae91b2a8fb5663f5/image.png | 13:41 |
snadge | i need to figure out why the repo server crashes during keystone install.. but it gets jammed, and i have to blow away the keystone container to start again | 13:43 |
jrosser_ | the only thing to note is when you switch to checking out stable/yoga the installed version will become something like 25.1.0.dev33 | 13:43 |
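For snadge's case the switch would look roughly like this, using the paths and scripts mentioned elsewhere in this log:

```shell
cd /opt/openstack-ansible
git fetch origin                 # make sure the clone has the latest stable/yoga
git checkout stable/yoga
scripts/bootstrap-ansible.sh     # re-run the bootstrap after changing the checkout
```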
jrosser_ | snadge: can you paste some more debug about what is happening? | 13:43 |
jrosser_ | installing keystone should not affect the repo server, it is helpful if we can debug it | 13:44 |
snadge | well now im at the point where i have the second error that the constraint file is missing | 13:45 |
snadge | so i have to blow it all away to get it to crash the repo server.. and even then, i probably wont know why | 13:45 |
mrf | jrosser how do i check the installed version? | 13:45 |
mrf | does any file in the openstack_ git contain the version string? | 13:46 |
jrosser_ | mrf: it is templated into the top of /usr/local/bin/openstack-ansible | 13:46 |
jrosser_ | snadge: we can help debug if you like | 13:46 |
snadge | that would be great, its real late here but i wouldn't mind progressing past this block at least | 13:47 |
jrosser_ | there are standard debug things to try, like wget the same file several times | 13:47 |
jrosser_ | the loadbalancer will hit each repo server in turn so if you get 1-in-3 type succeed/fail then you know that the contents of the repo servers are not synchronised | 13:48 |
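For example, something along these lines, with the URL taken from the earlier error and the internal VIP/port left as placeholders for the actual environment:

```shell
# repeat the fetch via the loadbalancer; a 1-in-N failure pattern suggests the
# repo servers are out of sync, constant failure suggests the backend is down
for i in $(seq 1 6); do
  wget -q -O /dev/null \
    "http://<internal_vip>:<repo_port>/os-releases/25.0.0/ubuntu-20.04-x86_64/requirements/keystone-25.0.0-constraints.txt" \
    && echo "attempt $i: OK" || echo "attempt $i: FAILED"
done
```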
mrf | export OSA_VERSION="25.0.0" | 13:48 |
jrosser_ | mrf: that installation is the result of `git checkout 25.0.0` | 13:48 |
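So a quick check on the deploy host is:

```shell
# the deployed OSA version is templated into the wrapper script
grep OSA_VERSION /usr/local/bin/openstack-ansible
# e.g. export OSA_VERSION="25.0.0"
```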
snadge | there is only one repo container.. so it shouldn't even really need haproxy? | 13:49 |
snadge | this is a fairly small install | 13:49 |
mrf | from my cli history "575 git clone -b stable/yoga https://opendev.org/openstack/openstack-ansible /opt/openstack-ansible" | 13:49 |
jrosser_ | mrf: are you re-running `scripts/bootstrap-ansible.sh` each time you change the checkout of openstack-ansible to deploy? | 13:52 |
mrf | i never changed it :( this is the first time we use ansible to deploy openstack... | 13:52 |
mrf | it's just a virtual environment to test it | 13:53 |
jrosser_ | if you change from tag 25.0.0 to stable/yoga then you really should re-run the bootstrap script | 13:53 |
mrf | re-bootstrapped and it changed to export OSA_VERSION="25.0.1.dev3" | 13:57 |
jrosser_ | did you git fetch? | 14:00 |
snadge | can i just rebuild the repo container? | 14:12 |
jrosser_ | you can re-run the playbook for it, no problem | 14:16 |
jrosser_ | you can also delete/re-create it completely | 14:16 |
jrosser_ | but i will add that /var/www/repo/os-releases/25.0.0/ubuntu-20.04-x86_64/requirements/keystone-25.0.0-constraints.txt is a file created during the keystone playbook, not when the repo server is built | 14:17 |
snadge | okay i just need to figure out why the playbook isn't creating that file and putting it into the repo then | 14:19 |
snadge | i dont know which playbook it is, i can only assume it thinks it's already done and is skipping it or something | 14:33 |
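One way to force that step to run again, assuming the python_venv_build role's venv_rebuild flag is available in this release (worth verifying before relying on it):

```shell
# re-run the keystone playbook and force the wheel/constraints build to repeat
cd /opt/openstack-ansible/playbooks
openstack-ansible os-keystone-install.yml -e venv_rebuild=yes
```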
*** ysandeep|break is now known as ysandeep | 14:34 | |
snadge | it just happened again, and i couldn't figure out why.. i desperately tried turning the timeout for haproxy way up | 14:41 |
snadge | but it didnt help | 14:41 |
jrosser_ | it would really help to see pastes of the log | 14:41 |
jrosser_ | because i don't know if you are talking about a 404 or a 503 | 14:42 |
snadge | the ansible playbook log is about all i have to go on | 14:42 |
snadge | haproxy logs to the journal on the controller i think, and it doesnt say much other than down, i simply dont know where to look | 14:43 |
jrosser_ | ok | 14:43 |
jrosser_ | and is it down? | 14:44 |
jrosser_ | haproxy is checking for something very specific being present, not just that the socket accepts a connection | 14:44 |
jrosser_ | so "down" can mean network problems, nginx not running, the file it's checking being absent..... | 14:45 |
*** ysandeep is now known as ysandeep|out | 14:47 | |
mgariepy | snadge, look at the haproxy config to see what it's looking for for that service | 14:47 |
jrosser_ | `cat /etc/haproxy/conf.d/repo_all` | 14:48 |
snadge | yep so i can connect to it with wget and it works | 14:51 |
snadge | its up at the moment | 14:51 |
snadge | but the last error i got was this | 14:51 |
snadge | https://pastebin.com/MR0EtzRP | 14:51 |
snadge | and now if i run the keystone install again, it will just say that constraints file is missing.. instead of failing during that python_venv_build step as above | 14:56 |
jrosser_ | and the loadbalancer now has the repo server being down? | 14:57 |
snadge | how do i show the haproxy status | 14:59 |
jrosser_ | `hatop -s /var/run/haproxy.stat` | 15:00 |
jrosser_ | and `journalctl -u haproxy` for the log | 15:00 |
snadge | repo_all-back is up | 15:01 |
jrosser_ | you can also follow the log for a service with `journalctl -fu haproxy` to watch it in real-time | 15:01 |
jrosser_ | it has the feeling of ARP trouble where something else has the same IP tbh | 15:02 |
snadge | repo_all-back times out.. then comes back a few minutes later | 15:04 |
snadge | yeah this sounds like ip conflict, you're right | 15:04 |
jrosser_ | the container IPs are allocated randomly from the CIDR for the management network | 15:09 |
snadge | it will be the haproxy ip.. 172.29.236.101 .. the penny has dropped | 15:09 |
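A simple way to confirm a duplicate address on the management network is to arping the suspect IP from another host on br-mgmt (arping here is the iputils tool):

```shell
# replies from more than one distinct MAC mean two machines hold 172.29.236.101
arping -I br-mgmt -c 3 172.29.236.101
```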
jrosser_ | these are important settings https://github.com/openstack/openstack-ansible/blob/master/etc/openstack_deploy/openstack_user_config.yml.example#L91-L95 | 15:10 |
jrosser_ | places where you have your own routers, or where you take IPs for the mgmt bridges on hosts, need to be excluded from the range available to containers | 15:11 |
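In openstack_user_config.yml that exclusion looks roughly like this; the addresses are illustrative, based on the example 172.29.236.0/22 management network and the VIP mentioned above:

```yaml
# excerpt of /etc/openstack_deploy/openstack_user_config.yml
used_ips:
  - "172.29.236.1,172.29.236.50"   # hosts' br-mgmt addresses, gateways/routers
  - "172.29.236.101"               # the haproxy/internal VIP discussed above
```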
*** dviroel is now known as dviroel|lunch | 15:38 | |
snadge | ive shut off the vms that could have conflicted with that haproxy ip address.. i've added all of the br-mgmt static addresses to the used_ips list | 15:41 |
*** Guest200 is now known as prometheanfire | 15:59 | |
snadge | i customised haproxy.cfg, shouldn't that get overwritten? | 16:14 |
jrosser_ | it will get overwritten if you re-run the haproxy playbook | 16:16 |
jrosser_ | the sum total of the haproxy config file is made by glueing together all the generated parts in /etc/haproxy/haproxy.cfg | 16:17 |
jrosser_ | oops /etc/haproxy/conf.d i mean | 16:17 |
snadge | is that part of hosts, inf or openstack playbook | 16:17 |
snadge | im part way through inf now.. i started again after hopefully resolving any potential ip conflict issues | 16:18 |
jrosser_ | it's in infrastructure | 16:18 |
jrosser_ | setup-infrastructure.yml is just a list of other playbooks to call | 16:18 |
jrosser_ | you can do them by hand individually as/when you need | 16:18 |
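For example, to re-run just the loadbalancer and repo pieces (playbook names as used elsewhere in this log and in standard OSA):

```shell
cd /opt/openstack-ansible/playbooks
openstack-ansible haproxy-install.yml
openstack-ansible repo-install.yml
```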
*** dviroel|lunch is now known as dviroel | 16:47 | |
snadge | hmm, it never seemed to overwrite my customisation but apparently it doesn't matter and its going past the keystone setup now | 16:52 |
snadge | all i did was increase the timeout for the repo_all-back from 12000 to 120000 | 16:55 |
snadge | but of course all that did was make it take longer, and it was probably an arp conflict like you said | 16:55 |
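If those timeouts ever do need changing, the persistent route is an override in /etc/openstack_deploy/user_variables.yml rather than editing the rendered haproxy.cfg; the variable names below are assumed from the haproxy_server role defaults, so check them against the deployed release:

```yaml
# overrides here survive re-runs of the haproxy playbook, unlike manual edits
# to /etc/haproxy/haproxy.cfg
haproxy_client_timeout: 120s    # assumed variable names; verify them in the
haproxy_server_timeout: 120s    # haproxy_server role defaults for your release
```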
snadge | horizon is next.. im so excited to see the gui this time, even though I know at least cinder won't be working.. thats a minor technicality :P | 18:46 |
snadge | looks like its working, i'll get a few hours sleep and fix the storage and networking tomorrow.. thanks again jrosser | 19:21 |
*** dviroel is now known as dviroel|out | 21:48 | |
opendevreview | Merged openstack/openstack-ansible-lxc_hosts stable/xena: Prevent lxc.service from being restarted on package update https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/852498 | 22:19 |