*** akahat|ruck is now known as akahat | 05:38 | |
admin1 | \o | 08:11 |
opendevreview | Dmitriy Rabotyagov proposed openstack/ansible-role-python_venv_build master: Replace virtualenv with exacutable for pip https://review.opendev.org/c/openstack/ansible-role-python_venv_build/+/822998 | 08:17 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Update ansible-core to 2.12.1 https://review.opendev.org/c/openstack/openstack-ansible/+/822063 | 08:18 |
*** akahat is now known as akahat|PTO | 08:28 | |
kleini | https://opendev.org/openstack/openstack-ansible/src/branch/master/playbooks/healthcheck-hosts.yml#L122 <- how does this make sense? I configured very different IP addresses on the internal networks of my deployment. | 08:41 |
noonedeadpunk | it does in CI :D | 08:43 |
noonedeadpunk | but yeah, you're right, we need to adjust that | 08:44 |
noonedeadpunk | do you want to patch? | 08:44 |
kleini | So, do the healthcheck playbooks only make sense in CI not in prod? | 08:44 |
noonedeadpunk | I think it should be both ideally | 08:45 |
kleini | Okay, let me check how I can add the actual management network IP there | 08:45 |
noonedeadpunk | I'd say it should be smth like {{ management_address }} there | 08:45 |
kleini | digging in my memory for how ansible debugging works | 08:46 |
noonedeadpunk | that said, the next line is not valid either due to `openstack.local` | 08:47 |
admin1 | hi noonedeadpunk .. i tested the rocky xenial -> bionic upgrade twice in the lab .. there the repo was built and all worked fine .. yesterday i dropped one server in production and the repo is not being built .. is there a way to force a repo build ? | 08:48 |
noonedeadpunk | that should be {{ openstack_domain }} (or {{ container_domain }} actually) | 08:48 |
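A minimal sketch of what the corrected checks might look like, assuming the variables suggested above ({{ management_address }} and {{ openstack_domain }}); the actual tasks in healthcheck-hosts.yml differ in detail:

```yaml
# Hypothetical replacement for the hardcoded CI values in healthcheck-hosts.yml:
# use the deployment's own management address and domain instead.
- name: Check connectivity to the management address
  command: "ping -c 2 {{ management_address }}"
  changed_when: false

- name: Check name resolution within the container domain
  command: "getent hosts {{ ansible_hostname }}.{{ openstack_domain | default('openstack.local') }}"
  changed_when: false
```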
noonedeadpunk | admin1: I have super vague memories of the old repo_build stuff and it was always super painful tbh... | 08:49 |
noonedeadpunk | I would need to read through the code the same way you'd do that... | 08:49 |
noonedeadpunk | admin1: how does it fail, at least? | 08:50 |
admin1 | it does not fail .. it skips the build | 08:52 |
admin1 | let me gist one run | 08:52 |
noonedeadpunk | oh, there was some var for sure to trigger that.... | 08:52 |
noonedeadpunk | `repo_build_wheel_rebuild` | 08:53 |
noonedeadpunk | and `repo_build_venv_rebuild` | 08:53 |
noonedeadpunk | depending on what exactly you want | 08:53 |
noonedeadpunk | But I'd backup repo_servers before doing that | 08:53 |
admin1 | even if i limit to just the new container ? | 08:54 |
admin1 | on bionic ? | 08:54 |
noonedeadpunk | you can't do this | 08:54 |
noonedeadpunk | oh, well, repo container? | 08:54 |
admin1 | c3 and c2 are the old repo containers .... c1 is bionic .. maybe i can do openstack-ansible repo-install.yml -v -e repo_build_wheel_rebuild=true -e repo_build_venv_rebuild=true -l c1_repo_container-xxx | 08:55 |
noonedeadpunk | well, the question is also how lsyncd is configured, as at one point it had the --delete flag, so whatever you build could be dropped by lsyncd | 08:56 |
admin1 | c3 has the lsyncd ( master) ... i had stopped lsyncd there | 08:56 |
noonedeadpunk | but other then that it might work, yes | 08:56 |
admin1 | is a repo built on c1 under bionic overwritten by the lsyncd that runs on c3 ? | 08:56 |
noonedeadpunk | do you know how rsync with --delete works ?:) | 08:57 |
admin1 | what is the repo build location in repo containers .. just that i can check if the data is there and back it up .. | 08:57 |
admin1 | i do | 08:57 |
noonedeadpunk | lsyncd runs on the source, all others are destinations | 08:57 |
admin1 | i hope my data is still there | 08:57 |
noonedeadpunk | so it just triggers rsync from c3 with --delete | 08:57 |
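Roughly, lsyncd on the source ends up invoking something like the following toward each destination (hostname and path are illustrative, not taken from this deployment):

```bash
# Roughly what lsyncd on the source (c3) triggers toward each destination repo
# container; --delete removes anything on the destination that is absent on the
# source, so wheels freshly built on c1 would be wiped by the next sync.
rsync -avz --delete /var/www/repo/ c1_repo_container:/var/www/repo/
```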
admin1 | got it | 08:58 |
noonedeadpunk | you can check the nginx conf for that, but it's /var/www/repo/ | 08:59 |
noonedeadpunk | venv iirc | 08:59 |
noonedeadpunk | *./venvs | 08:59 |
admin1 | i see data in c2 .. | 08:59 |
admin1 | checking in c3 | 09:00 |
admin1 | its there | 09:00 |
admin1 | so if i set c2 and c3 (lsyncd.lua) to MAINT, disable lsyncd on c3, enable c1 (bionic) as READY in haproxy, and run openstack-ansible repo-install.yml -v -e repo_build_wheel_rebuild=true -e repo_build_venv_rebuild=true -l c1_repo_container_xxx , it should theoretically build the stuff in c1 ? | 09:03 |
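That sequence, spelled out as a hedged sketch (the haproxy backend/server names and stats socket path are assumptions and will differ per deployment):

```bash
# 0. Back up the existing repo contents on the lsyncd source (c3) first
tar czf /root/repo-backup.tgz /var/www/repo

# 1. Drain the old repo backends and enable the new one on each haproxy node
#    (backend/server names and socket path are illustrative - check haproxy.cfg)
echo "set server repo_all-back/c2_repo_container state maint" | socat stdio /var/run/haproxy.stat
echo "set server repo_all-back/c3_repo_container state maint" | socat stdio /var/run/haproxy.stat
echo "set server repo_all-back/c1_repo_container state ready" | socat stdio /var/run/haproxy.stat

# 2. Stop lsyncd inside the c3 repo container so a sync with --delete
#    cannot remove whatever gets built on c1
systemctl stop lsyncd

# 3. Force the wheel/venv rebuild, limited to the new repo container
openstack-ansible repo-install.yml -v \
  -e repo_build_wheel_rebuild=true \
  -e repo_build_venv_rebuild=true \
  -l c1_repo_container_xxx
```

As it turns out further down in the log, the -l limit interfered with the dynamic repo grouping, and the run that finally built the wheels dropped it.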
noonedeadpunk | hm, so from what I see, playbook itself decides which repo containers would be used as targets for build... https://opendev.org/openstack/openstack-ansible/src/branch/stable/rocky/playbooks/repo-build.yml#L33-L44 | 09:08 |
noonedeadpunk | So I'd really expect that stuff to be built for c1 just by default... | 09:08 |
noonedeadpunk | wait... | 09:09 |
noonedeadpunk | ok, gotcha, that is ridiculous... | 09:11 |
noonedeadpunk | or not) | 09:12 |
noonedeadpunk | so are you sure that you don't have anything related to bionic in c1 in /var/www/repo/pools ? | 09:12 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: [doc] Update infra node scaling documentation https://review.opendev.org/c/openstack/openstack-ansible/+/822912 | 09:16 |
noonedeadpunk | seems to solve our issue with failing lxc jobs. Eventually I believe only adding setuptools would help, but the virtualenv part is somewhat messy imo in ansible. it might be fine if we used it for creation, but we have a command for that anyway. https://review.opendev.org/c/openstack/ansible-role-python_venv_build/+/822998 | 09:19 |
kleini | healthcheck-hosts.yml is green. Will provide my fixes. | 09:19 |
noonedeadpunk | nice! | 09:19 |
admin1 | noonedeadpunk, i rm -rf'd /var/www , rebooted the container and am retrying .. | 09:21 |
admin1 | noonedeadpunk, its searching for some repo_master role .. .. https://gist.githubusercontent.com/a1git/8f4df96f5933d0db944267ac70f584ea/raw/f04f70263e66fe74337ff27931e40a863900eff7/repo-build2.log | 09:28 |
admin1 | maybe i should not disable c3 ( lsyncd.lua ) master | 09:31 |
admin1 | will enable that and retry .. without the limit | 09:31 |
admin1 | i guess there is no way to say rebuild only for 18.04 but skip the 16.04 stuff that is already there. | 09:38 |
noonedeadpunk | well, that's what I suspected kind of.... | 09:40 |
noonedeadpunk | but eventually I thought that limit might affect the way this dynamic group will be generated | 09:40 |
admin1 | well, i started on -e repo_build_wheel_rebuild=true -e repo_build_venv_rebuild=true without any limits .. i suspect 16.04 might fail ..if checksums are missing or something as its too old .. but 18 might be built .. .. if it fails , then i have backup of /var/www/ which i can restore and retry again | 09:41 |
noonedeadpunk | hm [WARNING]: Could not match supplied host pattern, ignoring: repo_masters | 09:41 |
noonedeadpunk | but you kind of have `{"add_group": "repo_servers_18.04_x86_64", "changed": false, "parent_groups": ["all"]}` | 09:43 |
noonedeadpunk | I'm kind of afraid about https://opendev.org/openstack/openstack-ansible/src/branch/stable/rocky/playbooks/repo-build.yml#L38 | 09:44 |
noonedeadpunk | but eventually this should add a host for each OS version | 09:44 |
noonedeadpunk | not sure why this does not happen | 09:44 |
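For context, a paraphrased sketch of the kind of dynamic grouping that rocky playbook performs (task and variable names here are illustrative, not copied from the playbook):

```yaml
# Paraphrased sketch of how repo-build.yml sorts repo containers into
# per-distribution groups such as repo_servers_18.04_x86_64; a later play
# builds wheels on a host from each such group, so anything that keeps a
# group from being populated (e.g. an aggressive --limit) means nothing
# gets built for that OS version.
- name: Group repo containers by OS version and architecture
  hosts: repo_all
  gather_facts: true
  tasks:
    - name: Add host to a per-distribution group
      group_by:
        key: "repo_servers_{{ ansible_distribution_version }}_{{ ansible_architecture }}"
```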
admin1 | the strange thing is this worked twice in the lab .. i used the same config and variables .. even kept the domain name and ips the same, and there it upgraded fine just like the documentation .. | 09:45 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: [doc] Update infra node scaling documentation https://review.opendev.org/c/openstack/openstack-ansible/+/822912 | 09:46 |
admin1 | even if 16.04 is gone, that is ok .. as we are not growing now in 16.04 .. so as long as it builds 18.04 i think its good enough | 09:46 |
noonedeadpunk | well, repo-build used to fail weirdly for me as well, even with exactly the same deployments that had been passing.. I'm actually glad we got rid of it.... | 09:46 |
noonedeadpunk | well, I can suggest a nasty thing then - edit the inventory (and openstack_user_config) to have a repo container only on c1 | 09:47 |
admin1 | "openstack-ansible repo-install.yml -vv -e repo_build_wheel_rebuild=true -e repo_build_venv_rebuild=true" is running now .. if this fails, then will try that one | 09:48 |
noonedeadpunk | but I think the question is not only about growing, but also about maintenance of the existing xenial | 09:48 |
noonedeadpunk | as I believe you will hit failures even when just trying to adjust some config | 09:48 |
noonedeadpunk | not repo-install.yml, repo-build.yml | 09:48 |
admin1 | i don't want to ctrl-C it .. but it does call repo-build also | 09:49 |
noonedeadpunk | yeah, just wasting time :) | 09:50 |
admin1 | its on the "repo_build : Create OpenStack-Ansible requirement wheels" task, so i thikn its working .. | 09:50 |
admin1 | think* | 09:50 |
admin1 | i see it building in c1 .. finally \o/ | 09:52 |
admin1 | wheel_build log | 09:52 |
admin1 | i know it's not relevant anymore, but out of curiosity .. if lsyncd is on c3, but the new bionic repo is on c1, does it copy from c1 -> c3 and then lsync it again from c3 ? | 09:57 |
noonedeadpunk | nope, it's not copied from c1 | 10:01 |
noonedeadpunk | we never managed to get this flow working really properly. | 10:01 |
admin1 | its done .. i see both 16 and 18 packages in c3 and only 18 in c1 | 10:07 |
noonedeadpunk | oh? | 10:07 |
admin1 | checking with keystone playbook if all is good .. | 10:07 |
admin1 | it's complaining that "/etc/keystone/fernet-keys does not contain keys, use keystone-manage fernet_setup to create Fernet keys" .. is it safe to log in inside the venv and issue the create command ? | 10:16 |
admin1 | glance went in ok .. | 10:23 |
admin1 | i will disable keystone and do the rest .. will check into keystone individually later | 10:23 |
admin1 | quick question .. when all of this is upgraded (hopefully), do i have to upgrade 1 version at a time ? or can i jump a few versions at once ? | 10:25 |
noonedeadpunk | well, I was jumping R->T and T->V | 10:41 |
admin1 | ok | 10:41 |
admin1 | except keystone complaining about fernet keys, all other services are almost installed .. no errors.. and i used the newly built repo server to ensure it has all the packages | 10:42 |
noonedeadpunk | and it went pretty well. But you might go your own way:) eventually no upgrades except version+1 are tested by any project | 10:42 |
noonedeadpunk | and nova now explicitly blocks such upgrades from W | 10:42 |
admin1 | this cluster is with integrated ceph .. .. i think i need to bump ceph version at some point as well | 10:42 |
admin1 | i will do it slow .. 1 version at a time .. | 10:43 |
*** chkumar|rover is now known as chandankumar | 10:50 | |
admin1 | can osa handle letsencrypt ssl automatically if the domain is pointed at the external vip ? | 10:54 |
admin1 | sorry .. ignore that question | 10:55 |
admin1 | except keystone all things look good :) | 10:56 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_keystone master: Drop keystone_default_role_name https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/823003 | 11:06 |
admin1 | noonedeadpunk, seen this error before ? /etc/keystone/fernet-keys does not contain keys, use keystone-manage fernet_setup to create Fernet keys | 11:38 |
admin1 | i did the setup command .. but it did not work | 11:38 |
admin1 | i did the setup command .. keystone-manage fernet_setup --keystone-user keystone --keystone-group service .. but it did not help | 11:39 |
noonedeadpunk | might be smth related to symlinking? | 11:39 |
noonedeadpunk | /etc/keystone is likely a symlink in R | 11:39 |
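A few hedged checks for that theory (paths assume the default OSA layout inside the keystone container):

```bash
# Inside the keystone container:
ls -ld /etc/keystone                    # real directory, or a symlink into the venv?
ls -l /etc/keystone/fernet-keys/        # fernet keys are plain numbered files (0, 1, ...)
stat -c '%U:%G %a' /etc/keystone/fernet-keys   # ownership/permissions matter here too
```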
admin1 | its a directory | 11:41 |
*** sshnaidm|afk is now known as sshnaidm | 11:42 | |
noonedeadpunk | hm... I think error must be logged anyway in /var/log/keystone? | 11:46 |
admin1 | this is all it has ...100s of lines .. | 11:47 |
admin1 | https://gist.githubusercontent.com/a1git/24a333b2976a798a502eb5201f651a60/raw/fcd005b718e8be8aebb6422c40c5083f99d31d61/gistfile1.txt | 11:47 |
admin1 | i will try to nuke this container and retry | 11:48 |
admin1 | i think it was because i was using it with a limit | 12:01 |
admin1 | i did it without limit and it just worked | 12:01 |
admin1 | doh ! | 12:01 |
noonedeadpunk | ah, just in case: keystone with a limit never works | 12:05 |
noonedeadpunk | I was just updating https://review.opendev.org/c/openstack/openstack-ansible/+/822912/5/doc/source/admin/scale-environment.rst to mention that) | 12:06 |
admin1 | how do i run only the mds and mon roles but not the osd role | 12:09 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Disable service_token requirement by default https://review.opendev.org/c/openstack/openstack-ansible/+/823005 | 12:25 |
noonedeadpunk | admin1: at least you can leverage a limit to just ceph_mons, for example | 13:01 |
noonedeadpunk | but there could also be tags that would allow doing that | 13:01 |
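Both approaches, as a hedged sketch (the group and tag names below are assumptions; check the deployment's generated inventory and the ceph playbooks for the real ones):

```bash
# Option 1: limit the ceph playbook to the mon and mds groups
# (group names are assumptions - verify them in the generated inventory)
openstack-ansible ceph-install.yml --limit 'ceph-mon_all:ceph-mds_all'

# Option 2: run only matching tags, if the roles expose them
openstack-ansible ceph-install.yml --list-tags          # see what is available first
openstack-ansible ceph-install.yml --tags ceph-mon,ceph-mds
```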
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_glance master: Support service tokens https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/823009 | 13:06 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_glance master: Support service tokens https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/823009 | 13:07 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_glance master: Support service tokens https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/823009 | 13:09 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_glance master: Support service tokens https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/823009 | 13:52 |
noonedeadpunk | so, zun now fails a single tempest test that is easily reproducible in an aio - test_run_container_with_cinder_volume_dynamic_created | 14:39 |
noonedeadpunk | I wonder if it should be run at all considering that test_run_container_with_cinder_volume is disabled because of bug https://bugs.launchpad.net/zun/+bug/1897497 | 14:40 |
noonedeadpunk | https://github.com/openstack/zun-tempest-plugin/blob/master/zun_tempest_plugin/tests/tempest/api/test_containers.py#L380 | 14:41 |
noonedeadpunk | ah, no: `No iscsi_target is presently exported for volume`. So it's just our CI that is broken, I guess | 14:58 |
admin1 | is ubuntu-esm-infra.list part of osa ? | 15:48 |
admin1 | what happened is xenial has version 13.0 of ceph (mimic) .. bionic got version 12.0 of ceph -- both point to the same mimic repo .. but i found this `deb https://esm.ubuntu.com/infra/ubuntu xenial-infra-security main extra` entry on the xenial host with the name | 15:49 |
admin1 | sources.list.d/ubuntu-esm-infra.list | 15:49 |
noonedeadpunk | no, I don't think it is | 16:02 |
noonedeadpunk | can't recall having that | 16:02 |
admin1 | i found out .. in the bionic ceph pinning it's `Pin: release o=Ubuntu` .. while in xenial it's `Pin: release o=ceph.com` | 16:09 |
admin1 | so one got pinned via ceph.com, the other via Ubuntu | 16:09 |
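What the working pin looks like if written out by hand (the package glob and priority are illustrative; only the `o=ceph.com` origin is the point here):

```bash
# Illustrative apt preference forcing the ceph.com origin on bionic
cat > /etc/apt/preferences.d/ceph_pin <<'EOF'
Package: ceph*
Pin: release o=ceph.com
Pin-Priority: 1001
EOF
apt-cache policy ceph-common   # confirm which origin/version now wins
```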
noonedeadpunk | yeah, but the pinning is not in sources | 16:13 |
admin1 | changing it manually to ceph.com and then apt upgrade reboot fixed it | 16:29 |
admin1 | one server is done .. 2 more controllers to go | 16:29 |
jrosser_ | on old releases like this there are variables to set for whether it takes the Ubuntu ceph packages, the UCA ones or the ones at ceph.com | 16:36 |
jrosser_ | it’s not automatic to choose the one you need/want | 16:36 |
admin1 | i changed to ceph.com, rebooted .. they are good to go .. then i ran the playbooks again and it set it back to Ubuntu , but since the packages were already upgraded, it did not downgrade them .. so i am good | 16:38 |
admin1 | one wishlist item from a long time ago is to run actual swift using osa .. | 16:48 |
admin1 | because it's eventual consistency rather than strong consistency, i can place the servers in two datacenters ( even with high latency ) and be sure that backups are protected | 16:48 |
noonedeadpunk | well, with ceph you can have rgw (with swift and s3 compatibility) relatively easily | 16:49 |
noonedeadpunk | and do cross-region backups as well | 16:49 |
opendevreview | James Denton proposed openstack/openstack-ansible-os_glance master: Define _glance_available_stores in variables https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/822899 | 16:53 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_glance master: Support service tokens https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/823009 | 17:21 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_glance master: Add boto3 module for s3 backend https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/822870 | 17:21 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_glance master: Support service tokens https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/823009 | 17:21 |
opendevreview | Marcus Klein proposed openstack/openstack-ansible master: fix healthcheck-hosts.yml for different configuration https://review.opendev.org/c/openstack/openstack-ansible/+/823023 | 17:31 |
admin1 | noonedeadpunk, jrosser_ .. thanks for all the help and support .. | 17:31 |
kleini | https://review.opendev.org/c/openstack/openstack-ansible/+/774472 <- this commit removed the openstacksdk which is used by healthcheck-openstack.yml. How does it work in CI, if it fails for me in prod? | 17:57 |
noonedeadpunk | kleini: we don't run healthcheck-openstack.yml in CI | 18:13 |
noonedeadpunk | eventually, I think it needs to either spawn its own venv with the clients or be delegated to the utility container (the second is easier) | 18:13 |
kleini | so only healthcheck-hosts.yml is run in CI? | 18:16 |
kleini | and healthcheck-openstack.yml works again when openstacksdk is added back to requirements.txt | 18:17 |
kleini | will try to delegate healthcheck-openstack.yml to the utility container. need to find some example of how to do that | 18:17 |
noonedeadpunk | and healthcheck-infrastructure.yml is also run | 18:23 |
noonedeadpunk | healthcheck-openstack.yml is not, as tempest or rally is a way better method to test openstack | 18:24 |
noonedeadpunk | as eventually this playbook needs to be maintained, while tempest is maintained by service developers | 18:25 |
kleini | okay, will skip it then and stick to tempest | 18:25 |
noonedeadpunk | kleini: eventually, I think you just need to replace `hosts: localhost` with `hosts: utility_all[0 | 18:25 |
noonedeadpunk | * `hosts: groups['utility_all'][0]` | 18:26 |
kleini | did that and it says again that openstacksdk is missing | 18:27 |
kleini | so maybe additionally the venv needs to be set | 18:27 |
noonedeadpunk | and set ansible_python_interpreter: "{{ utility_venv_bin/python }}" | 18:27 |
noonedeadpunk | * "{{ utility_venv_bin }}/python" | 18:28 |
noonedeadpunk | but tbh I'd rather drop that playbook in favor of tempest unless somebody really wants to maintain it and finds it useful | 18:32 |
kleini | works | 18:34 |
kleini | tempest is hard for me to configure with regard to which tests should be run. there is no list of tests, no list of suites | 18:35 |
kleini | and the "smoke" suite is hardly useful. it only tests the keystone API | 18:36 |
opendevreview | Marcus Klein proposed openstack/openstack-ansible master: fix healthcheck-hosts.yml for different configuration https://review.opendev.org/c/openstack/openstack-ansible/+/823023 | 18:47 |
jrosser_ | kleini: if you want to validate your install you should look at refstack https://refstack.openstack.org | 19:42 |
jrosser_ | that bundles tempest and a defined set of tests for validating interoperability | 19:42 |
admin1 | with the PKI certs in place, is the keystone url for ceph object storage still http://<internal-vip>:5000 or something else ? | 21:20 |
admin1 | to integrate osa managed openstack and ceph-ansible managed ceph | 21:20 |
admin1 | to add swift/s3 of ceph to openstack | 21:21 |
jrosser_ | admin1: although we have the PKI role in place now, the internal endpoint still defaults to http rather than https | 22:02 |
jrosser_ | there are instructions here if you want to switch that to https, which you could do on fresh deployments https://github.com/openstack/openstack-ansible/blob/master/doc/source/user/security/ssl-certificates.rst#tls-for-haproxy-internal-vip | 22:03 |
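As a very rough illustration of what that switch involves in user_variables.yml; the variable names below are from memory and should be treated as assumptions, with the linked ssl-certificates document as the authoritative reference:

```yaml
# user_variables.yml sketch (names are assumptions - verify against the linked doc)
haproxy_ssl_all_vips: true                   # terminate TLS on the internal VIP as well
openstack_service_internaluri_proto: https   # register internal endpoints as https
```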
admin1 | how recommended is it to use https:// for internal traffic .. i have tested and the internal network is well isolated ( i mean it does not leak to guests ) | 22:04 |
jrosser_ | for the internal keystone endpoint, when you enable https, the certificate should be valid for whatever fqdn or ip you have defined the internal vip as | 22:04 |
jrosser_ | we will switch to defaulting to https at some future release | 22:04 |
jrosser_ | currently there is no upgrade path for that so the default remains as http | 22:04 |
admin1 | the question is .. for those doing ceph-ansible + osa .. if we switch to pki, since it's a self-signed cert, do we need to copy the ca certs etc to the ceph mons as well ? | 22:05 |
jrosser_ | well, there is an upgrade path if you're happy that the control plane is broken for the period of doing an upgrade | 22:05 |
admin1 | its all in lab .. | 22:05 |
jrosser_ | it is not self signed | 22:05 |
jrosser_ | it creates a CA root, which is self signed | 22:05 |
jrosser_ | so there is a CA cert you can copy off the deploy node and install onto whatever else you want | 22:06 |
admin1 | so for the ceph-mons to connect to https:// keystone, that ca cert is the only thing that needs to be copied over .. | 22:06 |
jrosser_ | probably | 22:06 |
jrosser_ | different things tend to behave differently | 22:06 |
jrosser_ | libvirt has different needs than python code, for example | 22:07 |
jrosser_ | for where you put the CA, if it needs a copy of the intermediate CA, if it wants a cert chain blah blah blah | 22:07 |
jrosser_ | first thing to do is add the OSA PKI root to the system CA store of your ceph nodes and see if thats good enough | 22:08 |
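That step might look roughly like this; the CA file location under /etc/openstack_deploy/pki and the hostnames are assumptions, so check the PKI role's actual output layout on the deploy host:

```bash
# On the deploy host: copy the OSA PKI root CA to a ceph node's local trust store
# (the exact path/name of the root CA file is deployment-specific)
scp /etc/openstack_deploy/pki/roots/<root-ca-name>/certs/<root-ca-name>.crt \
    ceph-mon1:/usr/local/share/ca-certificates/osa-root-ca.crt

# On the ceph node: refresh the system CA store (Ubuntu/Debian)
ssh ceph-mon1 update-ca-certificates
```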
admin1 | yeah . no hurry now .. just doing some future roadmap planning . | 22:08 |
jrosser_ | if not, then dig into the ceph docs to find what it wants | 22:08 |
jrosser_ | there are some things we don't yet test really | 22:09 |
jrosser_ | like providing your own external cert, and also using the PKI role for internal | 22:09 |
admin1 | if we provide our own external cert, is that cert used instead of the pki one ? | 22:10 |
jrosser_ | or providing your own intermediate CA / key from an existing company CA for osa+PKI to use | 22:10 |
jrosser_ | for what? there are now lots of certs | 22:10 |
admin1 | :D | 22:10 |
jrosser_ | for external, you would use the vars in the haproxy role, which should be very similar/same as before | 22:11 |
admin1 | for example .. if we change the internal vip to cloud-int.domain.com and the external vip to cloud.domain.com and provide a san/wildcard cert that satisfies both cloud-int.domain.com and cloud.domain.com, can that cert be used for internal and external instead of the self-signed pki ? | 22:11 |
jrosser_ | that sort of misses the point | 22:13 |
jrosser_ | you need ssl on rabbitmq today regardless, and that is coming from the PKI role | 22:13 |
jrosser_ | so the internal VIP is really just one of very many places that certificates are required | 22:14 |
admin1 | that is true .. | 22:14 |
jrosser_ | and imho it is more important to have well designed trust on the internal SSL, more important than it being a certificate from a "real issuer" | 22:15 |
admin1 | some "customers" really insist on having everything "certified" :) | 22:16 |
jrosser_ | the trouble is you can't have certificates issued for rfc1918 ip addresses, or things which are not public | 22:16 |
jrosser_ | so thats basically broken thinking | 22:16 |
admin1 | yeah | 22:18 |
jrosser_ | having a private internal CA is more secure than a publicly trusted one | 22:18 |
jrosser_ | because the internal things will only authenticate with each other, not with an external hacker | 22:18 |