*** ysandeep|out is now known as ysandeep | 01:23 | |
*** ysandeep is now known as ysandeep|afk | 02:44 | |
*** ysandeep|afk is now known as ysandeep | 04:57 | |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-ceph_client master: Provide opportunity to define cluster_name https://review.opendev.org/c/openstack/openstack-ansible-ceph_client/+/852588 | 05:07 |
*** prometheanfire is now known as Guest194 | 07:00 | |
*** ysandeep is now known as ysandeep|afk | 07:09 | |
*** Guest194 is now known as Guest200 | 07:28 | |
evrardjp | noonedeadpunk: I disagree; the fact is that the config should be radically different, IMO | 07:34 |
evrardjp | else I would not even try this | 07:35 |
evrardjp | and I don't think you need variable generation for your case. There are plenty of places in the OSA case where a template is a better choice | 07:36 |
evrardjp | but let's try the PoC and see how far it goes | 07:36 |
*** ysandeep|afk is now known as ysandeep | 07:37 | |
*** Adri2000_ is now known as Adri2000 | 08:29 | |
evrardjp | Ok I am at the end of the time I have for the PoC, and I see that this is a positive improvement, yet too marginal to be worth the risks of failed migrations. | 08:43 |
evrardjp | The result completely removed the variables from osa/inventory/group_vars/haproxy, reconfigured the role to use an external role, and put all the "desired state" onto the deployer node | 08:43 |
evrardjp | it would load the role in each playbook to reconfigure haproxy if necessary, using include_role with tasks_from to allow reconfiguring a frontend/backend live | 08:45 |
evrardjp | I need a quick patch on the upstream role for it | 08:45 |
evrardjp | I had two ways to reconfigure using my external role | 08:46 |
evrardjp | the first way was to generate a series of vars (using set_fact) that would give a proper config. Sadly this becomes very convoluted when configuring haproxy from a non-haproxy play | 08:47 |
evrardjp | the other way was to template directly from the existing OSA role and use my other role to reload the configuration/handle the state. This is relatively good in terms of code cleanup, but only brings marginal improvements over the whole configuration | 08:47 |
evrardjp | a mix of those two models could give great results, but at the cost of clarity and added complexity during an upgrade. | 08:48 |
evrardjp | I think that noonedeadpunk's patch on 'interface' and maybe future changes in the templating are "good enough" for a majority of OSA users. | 08:50 |
evrardjp | for people who want something different, I am sure my role can deliver it, if you approach it "from scratch". For now it's a tad late for OSA given the marginal improvements | 08:50 |
evrardjp | So there you go, 1 day flushed :) | 08:51 |
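A rough illustration of the include_role / tasks_from pattern evrardjp describes above, for reconfiguring one haproxy frontend/backend from another service's play. The role name, tasks file, and variables are hypothetical placeholders, not the actual PoC code:

```yaml
# Hypothetical sketch only: role, tasks file, and variable names are invented
# for illustration and are not the PoC's or OSA's real interface.
- name: Reconfigure the haproxy frontend/backend for one service
  ansible.builtin.include_role:
    name: haproxy_server          # external haproxy role (assumed name)
    tasks_from: service_config    # tasks file that renders a single service (assumed)
  vars:
    haproxy_service:
      name: glance_api
      backend_nodes: "{{ groups['glance_api'] | default([]) }}"
      port: 9292
```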
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-ceph_client master: Do not delegate facts when fetching keyrings https://review.opendev.org/c/openstack/openstack-ansible-ceph_client/+/852714 | 08:53 |
jrosser_ | urgh centos-9-stream jobs are broken | 09:05 |
*** ysandeep is now known as ysandeep|lunch | 09:34 | |
snadge | im running into an issue deploying a yoga install on ubuntu 20.04.4.. https://pastebin.com/24UuhBPX | 09:51 |
snadge | im aware of this issue: https://bugs.launchpad.net/openstack-ansible/+bug/1943978 .. and have installed that patch | 09:51 |
noonedeadpunk | snadge: from the error it seems that haproxy does not see any alive nginx on repo_containers | 09:54 |
noonedeadpunk | so have a feeling that repo-install.yml has failed previously | 09:54 |
snadge | oddly.. wgetting that file seems to work.. perhaps something is bouncing up and down | 09:55 |
noonedeadpunk | yeah, with centos it seems we got bad timing for repo updates... | 09:58 |
snadge | ive seen this message a few times "backend repo_all-back has no server available!" | 09:58 |
noonedeadpunk | *infra mirrors sync | 09:58 |
noonedeadpunk | snadge: well yes, that would explain 503 | 09:58 |
snadge | i wonder why it did that.. one of the playbooks must have made it go to lunch | 09:58 |
noonedeadpunk | or well, that's the reason for the 503 :) | 09:58 |
snadge | then haproxy has marked it nonresponsive or whatever | 09:59 |
opendevreview | Jean-Philippe Evrard proposed openstack/openstack-ansible master: Cleanup useless variables https://review.opendev.org/c/openstack/openstack-ansible/+/852563 | 10:06 |
snadge | yeah something is causing the repo server to drop out | 10:10 |
snadge | but the problem seems intermittent | 10:11 |
mrf | mmm what text editor do the containers have? | 10:12 |
mrf | nano? vi? | 10:12 |
snadge | cinder-volume is crashing in a loop saying access denied to user cinder .. using password yes.. it seems like a mysql error? | 10:20 |
snadge | this is on the controller which runs all the containers plus galera etc | 10:21 |
snadge | i wonder if thats whats loading the system up and causing the repo server to drop out | 10:21 |
mrf | why can't i edit /var/lib/mysql/grastate.dat from the host under the rootfs path /var/lib/lxc/controller1_galera_container-45ce6c70/rootfs/var/lib/mysql ?? | 10:43 |
mrf | solved, finally used sed... to replace the bootstrap flag | 10:46 |
noonedeadpunk | I have a problem in my sandbox. `internal endpoint for volumev3 service in az-poc region not found` https://paste.openstack.org/show/bC3upHmpHotyE5PyKhnG/ | 10:51 |
noonedeadpunk | wtf is that.... | 10:51 |
noonedeadpunk | mrf: /var/lib/mysql is a bind mount inside the container. So you should check for actual path on the host | 10:52 |
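For reference, one way to locate the host-side source of that bind mount and flip the Galera bootstrap flag there; the container name comes from the paste above, and the edit assumes the standard safe_to_bootstrap flag in grastate.dat:

```shell
# find the host-side source of the container's /var/lib/mysql bind mount
grep mount.entry /var/lib/lxc/controller1_galera_container-45ce6c70/config

# then flip the Galera bootstrap flag at the host-side path reported there, e.g.:
# sed -i 's/safe_to_bootstrap: 0/safe_to_bootstrap: 1/' <host_side_path>/grastate.dat
```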
noonedeadpunk | inside the token stanza the catalog does look weird to me indeed. It's somehow filtered, I would say | 10:53 |
noonedeadpunk | ok, wtf https://paste.openstack.org/show/bPjXX6jQFm0WWa0yh4Ka/ | 11:04 |
*** ysandeep|lunch is now known as ysandeep | 11:06 | |
jrosser_ | mrf: there won't be an editor in the containers, they are as minimal as practical. you can install vim or whatever if you need it | 11:09 |
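If an editor is needed, it can be installed from the host, for example (container name reused from the earlier paste; assumes an apt-based container image):

```shell
# attach to the container from the host and install an editor inside it
lxc-attach -n controller1_galera_container-45ce6c70 -- apt-get install -y vim
```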
jrosser_ | snadge: wget from the haproxy node, and checking the haproxy log will also be useful | 11:10 |
jrosser_ | all interaction with the repo hosts will be via the loadbalancer so it's important to find why that is unstable | 11:10 |
jrosser_ | mrf: if you are having database trouble then we have some docs here https://docs.openstack.org/openstack-ansible/latest/admin/maintenance-tasks.html#galera-cluster-maintenance | 11:13 |
mrf | yeah i already solved it with a sed... forcing one node to bootstrap | 11:15 |
*** dviroel|out is now known as dviroel | 11:24 | |
noonedeadpunk | so, basically the catalog is taken from your auth, and the endpoints from a separate API request | 11:39 |
noonedeadpunk | why isn't everything returned during token generation then... | 11:39 |
noonedeadpunk | ok, I know what that is | 12:07 |
snadge | maybe im running out of tcp ports or something stupid on the host? a whole bunch of servers went down at the same time this time | 12:58 |
snadge | it seems haproxy logs to /dev/log which is just the main journal | 13:01 |
snadge | it crashes during keystone setup in setup_openstack.. and now its just bailing saying it cant find "/var/www/repo/os-releases/25.0.0/ubuntu-20.04-x86_64/requirements/keystone-25.0.0-constraints.txt" | 13:05 |
snadge | so i have to blow away the keystone container and just reinstall that part? i got stuck in this loop last time | 13:06 |
snadge | i knew i shouldn't have used version 25 :( | 13:06 |
snadge | how do i rebuild that file? | 13:14 |
mrf | "Could not find the requested service aodh-api: host" mmm for aodh do we just need metering-alarm_hosts in the yml, no? | 13:29 |
*** ysandeep is now known as ysandeep|break | 13:30 | |
jrosser_ | snadge: that sounds like you still have problems with the repo server | 13:34 |
jrosser_ | i am not sure re-creating the keystone container is going to help | 13:35 |
snadge | yeah because i've done this once already, i need to try and find out why its happening | 13:35 |
jrosser_ | i think also you are installing 25.0.0 tag, which would not include any bugfixes that have been applied to yoga since the first release | 13:35 |
snadge | i will check | 13:35 |
mrf | stable/yoga git downloads the 25.0.0 tag | 13:36 |
mrf | same happens to me | 13:36 |
jrosser_ | no :) | 13:36 |
mrf | yes | 13:36 |
jrosser_ | stable/yoga is the head of the branch | 13:36 |
mrf | in my deploy it 100% reads 25.0.0 | 13:36 |
mrf | and i cloned stable/yoga with git | 13:36 |
snadge | it is set to 25.0.0 | 13:37 |
snadge | how do i change it to the latest yoga? | 13:37 |
mrf | im re-running the install of the aodh containers and will check the tag, but im 99% sure it shows 25.0.0 for stable/yoga | 13:38 |
snadge | there is b1, rc1 and rc2 | 13:38 |
jrosser_ | beta1, release candidate 1 and 2 | 13:38 |
snadge | they will be older then? .. oh you are suggesting trying the dev branch | 13:39 |
jrosser_ | i don't know what that means | 13:39 |
jrosser_ | stable/yoga is a branch | 13:39 |
jrosser_ | 25.x.x are tags that mark points in the history of that branch | 13:39 |
snadge | ah okay that makes sense now.. so if i want some fixes that have been done since 25.0.0 i can switch to stable/yoga | 13:40 |
NeilHanlon | snadge: does this visualization help, or no? https://drop1.neilhanlon.me/irc/uploads/ae91b2a8fb5663f5/image.png | 13:41 |
snadge | i need to figure out why the repo server crashes during keystone install.. but it gets jammed, and i have to blow away the keystone container to start again | 13:43 |
jrosser_ | the only thing to note is when you switch to checking out stable/yoga the installed version will become something like 25.1.0.dev33 | 13:43 |
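For snadge's case the switch would look roughly like this, using the paths and scripts mentioned elsewhere in this log:

```shell
cd /opt/openstack-ansible
git fetch origin                 # make sure the clone has the latest stable/yoga
git checkout stable/yoga
scripts/bootstrap-ansible.sh     # re-run the bootstrap after changing the checkout
```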
jrosser_ | snadge: can you paste some more debug about what is happening? | 13:43 |
jrosser_ | installing keystone should not affect the repo server, it is helpful if we can debug it | 13:44 |
snadge | well now im at the point where i have the second error that the constraint file is missing | 13:45 |
snadge | so i have to blow it all away to get it to crash the repo server.. and even then, i probably wont know why | 13:45 |
mrf | jrosser how do i check the installed version? | 13:45 |
mrf | does any file in the openstack_ git contain the version string? | 13:46 |
jrosser_ | mrf: it is templated into the top of /usr/local/bin/openstack-ansible | 13:46 |
jrosser_ | snadge: we can help debug if you like | 13:46 |
snadge | that would be great, its real late here but i wouldn't mind progressing past this block at least | 13:47 |
jrosser_ | there are standard debug things to try, like wget the same file several times | 13:47 |
jrosser_ | the loadbalancer will hit each repo server in turn so if you get 1-in-3 type succeed/fail then you know that the contents of the repo servers are not synchronised | 13:48 |
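For example, something along these lines, with the URL taken from the earlier error and the internal VIP/port left as placeholders for the actual environment:

```shell
# repeat the fetch via the loadbalancer; a 1-in-N failure pattern suggests the
# repo servers are out of sync, constant failure suggests the backend is down
for i in $(seq 1 6); do
  wget -q -O /dev/null \
    "http://<internal_vip>:<repo_port>/os-releases/25.0.0/ubuntu-20.04-x86_64/requirements/keystone-25.0.0-constraints.txt" \
    && echo "attempt $i: OK" || echo "attempt $i: FAILED"
done
```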
mrf | export OSA_VERSION="25.0.0" | 13:48 |
jrosser_ | mrf: that installation is the result of `git checkout 25.0.0` | 13:48 |
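So a quick check on the deploy host is:

```shell
# the deployed OSA version is templated into the wrapper script
grep OSA_VERSION /usr/local/bin/openstack-ansible
# e.g. export OSA_VERSION="25.0.0"
```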
snadge | there is only one repo container.. so it shouldn't even really need haproxy? | 13:49 |
snadge | this is a fairly small install | 13:49 |
mrf | from my cli history "575 git clone -b stable/yoga https://opendev.org/openstack/openstack-ansible /opt/openstack-ansible" | 13:49 |
jrosser_ | mrf: are you re-running `scripts/bootstrap-ansible.sh` each time you change the checkout of openstack-ansible to deploy? | 13:52 |
mrf | i never changed it :( this is the first time we use ansible to deploy openstack... | 13:52 |
mrf | it's just a virtual environment to test it | 13:53 |
jrosser_ | if you change from tag 25.0.0 to stable/yoga then you really should re-run the bootstrap script | 13:53 |
mrf | re-bootstrapped and it changed to export OSA_VERSION="25.0.1.dev3" | 13:57 |
jrosser_ | did you git fetch? | 14:00 |
snadge | can i just rebuild the repo container? | 14:12 |
jrosser_ | you can re-run the playbook for it, no problem | 14:16 |
jrosser_ | you can also delete/re-create it completely | 14:16 |
jrosser_ | but i will add that /var/www/repo/os-releases/25.0.0/ubuntu-20.04-x86_64/requirements/keystone-25.0.0-constraints.txt is a file created during the keystone playbook, not when the repo server is built | 14:17 |
snadge | okay i just need to figure out why the playbook isn't creating that file and putting it into the repo then | 14:19 |
snadge | i dont know which playbook it is, i can only assume it thinks it's already done and is skipping it or something | 14:33 |
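One way to force that step to run again, assuming the python_venv_build role's venv_rebuild flag is available in this release (worth verifying before relying on it):

```shell
# re-run the keystone playbook and force the wheel/constraints build to repeat
cd /opt/openstack-ansible/playbooks
openstack-ansible os-keystone-install.yml -e venv_rebuild=yes
```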
*** ysandeep|break is now known as ysandeep | 14:34 | |
snadge | it just happened again, and i couldn't figure out why.. i desperately tried turning the timeout for haproxy way up | 14:41 |
snadge | but it didnt help | 14:41 |
jrosser_ | it would really help to see pastes of the log | 14:41 |
jrosser_ | because i don't know if you are talking about a 404 or a 503 | 14:42 |
snadge | the ansible playbook log is about all i have to go on | 14:42 |
snadge | haproxy logs to the journal on the controller i think, and it doesnt say much other than down, i simply dont know where to look | 14:43 |
jrosser_ | ok | 14:43 |
jrosser_ | and is it down? | 14:44 |
jrosser_ | haproxy is checking for something very specific being present, not just that the socket accepts a connection | 14:44 |
jrosser_ | so "down" can mean network problems, nginx not running, the file it's checking being absent..... | 14:45 |
*** ysandeep is now known as ysandeep|out | 14:47 | |
mgariepy | snadge, look at the haproxy config to see what it's looking for for that service | 14:47 |
jrosser_ | `cat /etc/haproxy/conf.d/repo_all` | 14:48 |
snadge | yep so i can connect to it with wget and it works | 14:51 |
snadge | its up at the moment | 14:51 |
snadge | but the last error i got was this | 14:51 |
snadge | https://pastebin.com/MR0EtzRP | 14:51 |
snadge | and now if i run the keystone install again, it will just say that constraints file is missing.. instead of failing during that python_venv_build step as above | 14:56 |
jrosser_ | and the loadbalancer now has the repo server being down? | 14:57 |
snadge | how do i show the haproxy status | 14:59 |
jrosser_ | `hatop -s /var/run/haproxy.stat` | 15:00 |
jrosser_ | and `journalctl -u haproxy` for the log | 15:00 |
snadge | repo_all-back is up | 15:01 |
jrosser_ | you can also follow the log for a service with `journalctl -fu haproxy` to watch it in real-time | 15:01 |
jrosser_ | it has the feeling of ARP trouble where something else has the same IP tbh | 15:02 |
snadge | repo_all-back times out.. then comes back a few minutes later | 15:04 |
snadge | yeah this sounds like ip conflict, you're right | 15:04 |
jrosser_ | the container IPs are allocated randomly from the CIDR for the management network | 15:09 |
snadge | it will be the haproxy ip.. 172.29.236.101 .. the penny has dropped | 15:09 |
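A simple way to confirm a duplicate address on the management network is to arping the suspect IP from another host on br-mgmt (arping here is the iputils tool):

```shell
# replies from more than one distinct MAC mean two machines hold 172.29.236.101
arping -I br-mgmt -c 3 172.29.236.101
```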
jrosser_ | these are important settings https://github.com/openstack/openstack-ansible/blob/master/etc/openstack_deploy/openstack_user_config.yml.example#L91-L95 | 15:10 |
jrosser_ | places where you have your own routers, or where you take IPs for the mgmt bridges on hosts, need to be excluded from the range available to containers | 15:11 |
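In openstack_user_config.yml that exclusion looks roughly like this; the addresses are illustrative, based on the example 172.29.236.0/22 management network and the VIP mentioned above:

```yaml
# excerpt of /etc/openstack_deploy/openstack_user_config.yml
used_ips:
  - "172.29.236.1,172.29.236.50"   # hosts' br-mgmt addresses, gateways/routers
  - "172.29.236.101"               # the haproxy/internal VIP discussed above
```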
*** dviroel is now known as dviroel|lunch | 15:38 | |
snadge | ive shut off the vms that could have conflicted with that haproxy ip address.. i've added all of the br-mgmt static addresses to the used_ips list | 15:41 |
*** Guest200 is now known as prometheanfire | 15:59 | |
snadge | i customised haproxy.cfg, shouldn't that get overwritten? | 16:14 |
jrosser_ | it will get overwritten if you re-run the haproxy playbook | 16:16 |
jrosser_ | the sum total of the haproxy config file is made by glueing together all the generated parts in /etc/haproxy/haproxy.cfg | 16:17 |
jrosser_ | oops /etc/haproxy/conf.d i mean | 16:17 |
snadge | is that part of hosts, inf or openstack playbook | 16:17 |
snadge | im part way through inf now.. i started again after hopefully resolving any potential ip conflict issues | 16:18 |
jrosser_ | it's in infrastructure | 16:18 |
jrosser_ | setup-infrastructure.yml is just a list of other playbooks to call | 16:18 |
jrosser_ | you can do them by hand individually as/when you need | 16:18 |
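For example, to re-run just the loadbalancer and repo pieces (playbook names as used elsewhere in this log and in standard OSA):

```shell
cd /opt/openstack-ansible/playbooks
openstack-ansible haproxy-install.yml
openstack-ansible repo-install.yml
```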
*** dviroel|lunch is now known as dviroel | 16:47 | |
snadge | hmm, it never seemed to overwrite my customisation but apparently it doesn't matter and its going past the keystone setup now | 16:52 |
snadge | all i did was increase the timeout for the repo_all-back from 12000 to 120000 | 16:55 |
snadge | but of course all that did was make it take longer, and it was probably an arp conflict like you said | 16:55 |
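If those timeouts ever do need changing, the persistent route is an override in /etc/openstack_deploy/user_variables.yml rather than editing the rendered haproxy.cfg; the variable names below are assumed from the haproxy_server role defaults, so check them against the deployed release:

```yaml
# overrides here survive re-runs of the haproxy playbook, unlike manual edits
# to /etc/haproxy/haproxy.cfg
haproxy_client_timeout: 120s    # assumed variable names; verify them in the
haproxy_server_timeout: 120s    # haproxy_server role defaults for your release
```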
snadge | horizon is next.. im so excited to see the gui this time, even though I know at least cinder won't be working.. thats a minor technicality :P | 18:46 |
snadge | looks like its working, i'll get a few hours sleep and fix the storage and networking tomorrow.. thanks again jrosser | 19:21 |
*** dviroel is now known as dviroel|out | 21:48 | |
opendevreview | Merged openstack/openstack-ansible-lxc_hosts stable/xena: Prevent lxc.service from being restarted on package update https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/852498 | 22:19 |