*** akahat|ruck is now known as akahat | 05:38 | |
admin1 | \o | 08:11 |
opendevreview | Dmitriy Rabotyagov proposed openstack/ansible-role-python_venv_build master: Replace virtualenv with exacutable for pip https://review.opendev.org/c/openstack/ansible-role-python_venv_build/+/822998 | 08:17 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Update ansible-core to 2.12.1 https://review.opendev.org/c/openstack/openstack-ansible/+/822063 | 08:18 |
*** akahat is now known as akahat|PTO | 08:28 | |
kleini | https://opendev.org/openstack/openstack-ansible/src/branch/master/playbooks/healthcheck-hosts.yml#L122 <- how does this make sense? I configured very different IP addresses on the internal networks of my deployment. | 08:41 |
noonedeadpunk | it does in CI :D | 08:43 |
noonedeadpunk | but yeah, you're right, we need to adjust that | 08:44 |
noonedeadpunk | do you want to patch? | 08:44 |
kleini | So, do the healthcheck playbooks only make sense in CI not in prod? | 08:44 |
noonedeadpunk | I think it should be both ideally | 08:45 |
kleini | Okay, let me check how I can add the actual management network IP there | 08:45 |
noonedeadpunk | I'd say it should be smth like {{ management_address }} there | 08:45 |
kleini | digging in my memory for how ansible debugging works | 08:46 |
noonedeadpunk | that said, the next line is not valid either due to `openstack.local` | 08:47 |
admin1 | hi noonedeadpunk .. i tested the rocky xenial -> bionic upgrade twice in the lab .. there the repo was built and all worked fine .. yesterday i dropped one server in production and the repo is not being built .. is there a way to force a repo build ? | 08:48 |
noonedeadpunk | that should be {{ openstack_domain }} (or {{ container_domain }} actually) | 08:48 |
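A minimal sketch of what the corrected checks might look like, assuming the variables suggested above ({{ management_address }} and {{ openstack_domain }}); the actual tasks in healthcheck-hosts.yml differ in detail:

```yaml
# Hypothetical replacement for the hardcoded CI values in healthcheck-hosts.yml:
# use the deployment's own management address and domain instead.
- name: Check connectivity to the management address
  command: "ping -c 2 {{ management_address }}"
  changed_when: false

- name: Check name resolution within the container domain
  command: "getent hosts {{ ansible_hostname }}.{{ openstack_domain | default('openstack.local') }}"
  changed_when: false
```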
noonedeadpunk | admin1: I have super vague memories of the old repo_build stuff and it was always super painful tbh... | 08:49 |
noonedeadpunk | I would need to read through the code the same way you'd do that... | 08:49 |
noonedeadpunk | admin1: how does it fail, at least? | 08:50 |
admin1 | it does not fail .. it skips the build | 08:52 |
admin1 | let me gist one run | 08:52 |
noonedeadpunk | oh, there was some var for sure to trigger that.... | 08:52 |
noonedeadpunk | `repo_build_wheel_rebuild` | 08:53 |
noonedeadpunk | and `repo_build_venv_rebuild` | 08:53 |
noonedeadpunk | depending on what exactly you want | 08:53 |
noonedeadpunk | But I'd backup repo_servers before doing that | 08:53 |
admin1 | even if i limit to just the new container ? | 08:54 |
admin1 | on bionic ? | 08:54 |
noonedeadpunk | you can't do this | 08:54 |
noonedeadpunk | oh, well, repo container? | 08:54 |
admin1 | c3 and c2 are the old repo containers .... c1 is bionic .. maybe i can do openstack-ansible repo-install.yml -v -e repo_build_wheel_rebuild=true -e repo_build_venv_rebuild=true -l c1_repo_container-xxx | 08:55 |
noonedeadpunk | well, the question is also how lsyncd is configured, as at one point it had the --delete flag, so whatever you build could be dropped by lsyncd | 08:56 |
admin1 | c3 has the lsyncd ( master) ... i had stopped lsyncd there | 08:56 |
noonedeadpunk | but other then that it might work, yes | 08:56 |
admin1 | is a repo built on c1 under bionic overwritten by the lsyncd that runs on c3 ? | 08:56 |
noonedeadpunk | do you know how rsync with --delete works ?:) | 08:57 |
admin1 | what is the repo build location in repo containers .. just that i can check if the data is there and back it up .. | 08:57 |
admin1 | i do | 08:57 |
noonedeadpunk | lsyncd runs on the source, all others are destinations | 08:57 |
admin1 | i hope my data is still there | 08:57 |
noonedeadpunk | so it just triggers rsync from c3 with --delete | 08:57 |
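Roughly, lsyncd on the source ends up invoking something like the following toward each destination (hostname and path are illustrative, not taken from this deployment):

```bash
# Roughly what lsyncd on the source (c3) triggers toward each destination repo
# container; --delete removes anything on the destination that is absent on the
# source, so wheels freshly built on c1 would be wiped by the next sync.
rsync -avz --delete /var/www/repo/ c1_repo_container:/var/www/repo/
```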
admin1 | got it | 08:58 |
noonedeadpunk | you can check the nginx conf for that, but it's /var/www/repo/ | 08:59 |
noonedeadpunk | venv iirc | 08:59 |
noonedeadpunk | *./venvs | 08:59 |
admin1 | i see data in c2 .. | 08:59 |
admin1 | checking in c3 | 09:00 |
admin1 | its there | 09:00 |
admin1 | so if i set c2 and c3 (lsyncd.lua) to MAINT, disable lsyncd on c3, enable c1 (bionic) as READY in haproxy, and run openstack-ansible repo-install.yml -v -e repo_build_wheel_rebuild=true -e repo_build_venv_rebuild=true -l c1_repo_container_xxx , it should theoretically build the stuff in c1 ? | 09:03 |
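That sequence, spelled out as a hedged sketch (the haproxy backend/server names and stats socket path are assumptions and will differ per deployment):

```bash
# 0. Back up the existing repo contents on the lsyncd source (c3) first
tar czf /root/repo-backup.tgz /var/www/repo

# 1. Drain the old repo backends and enable the new one on each haproxy node
#    (backend/server names and socket path are illustrative - check haproxy.cfg)
echo "set server repo_all-back/c2_repo_container state maint" | socat stdio /var/run/haproxy.stat
echo "set server repo_all-back/c3_repo_container state maint" | socat stdio /var/run/haproxy.stat
echo "set server repo_all-back/c1_repo_container state ready" | socat stdio /var/run/haproxy.stat

# 2. Stop lsyncd inside the c3 repo container so a sync with --delete
#    cannot remove whatever gets built on c1
systemctl stop lsyncd

# 3. Force the wheel/venv rebuild, limited to the new repo container
openstack-ansible repo-install.yml -v \
  -e repo_build_wheel_rebuild=true \
  -e repo_build_venv_rebuild=true \
  -l c1_repo_container_xxx
```

As it turns out further down in the log, the -l limit interfered with the dynamic repo grouping, and the run that finally built the wheels dropped it.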
noonedeadpunk | hm, so from what I see, playbook itself decides which repo containers would be used as targets for build... https://opendev.org/openstack/openstack-ansible/src/branch/stable/rocky/playbooks/repo-build.yml#L33-L44 | 09:08 |
noonedeadpunk | So I'd really expect that stuff to be built for c1 just by default... | 09:08 |
noonedeadpunk | wait... | 09:09 |
noonedeadpunk | ok, gotcha, that is ridiculous... | 09:11 |
noonedeadpunk | or not) | 09:12 |
noonedeadpunk | so are you sure that you don't have anything related to bionic in c1 in /var/www/repo/pools ? | 09:12 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: [doc] Update infra node scaling documentation https://review.opendev.org/c/openstack/openstack-ansible/+/822912 | 09:16 |
noonedeadpunk | seems to solve our issue with failing lxc jobs. Eventually I believe only adding setuptools would help, but the virtualenv part is somewhat messy imo in ansible. it might be fine if we used it for creation, but we have a command for that anyway. https://review.opendev.org/c/openstack/ansible-role-python_venv_build/+/822998 | 09:19 |
kleini | healthcheck-hosts.yml is green. Will provide my fixes. | 09:19 |
noonedeadpunk | nice! | 09:19 |
admin1 | noonedeadpunk, i rm -rf'd /var/www , rebooted the container and am retrying .. | 09:21 |
admin1 | noonedeadpunk, its searching for some repo_master role .. .. https://gist.githubusercontent.com/a1git/8f4df96f5933d0db944267ac70f584ea/raw/f04f70263e66fe74337ff27931e40a863900eff7/repo-build2.log | 09:28 |
admin1 | maybe i should not disable c3 ( lsyncd.lua ) master | 09:31 |
admin1 | will enable that and retry .. without the limit | 09:31 |
admin1 | i guess there is no way to say rebuild only for 18.04 but skip the 16.04 stuff that is already there. | 09:38 |
noonedeadpunk | well, that's what I suspected kind of.... | 09:40 |
noonedeadpunk | but eventually I thought that limit might affect the way this dynamic group will be generated | 09:40 |
admin1 | well, i started on -e repo_build_wheel_rebuild=true -e repo_build_venv_rebuild=true without any limits .. i suspect 16.04 might fail ..if checksums are missing or something as its too old .. but 18 might be built .. .. if it fails , then i have backup of /var/www/ which i can restore and retry again | 09:41 |
noonedeadpunk | hm [WARNING]: Could not match supplied host pattern, ignoring: repo_masters | 09:41 |
noonedeadpunk | but you kind of have `{"add_group": "repo_servers_18.04_x86_64", "changed": false, "parent_groups": ["all"]}` | 09:43 |
noonedeadpunk | I'm kind of afraid about https://opendev.org/openstack/openstack-ansible/src/branch/stable/rocky/playbooks/repo-build.yml#L38 | 09:44 |
noonedeadpunk | but eventually this should add a host for each OS version | 09:44 |
noonedeadpunk | not sure why this does not happen | 09:44 |
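For context, a paraphrased sketch of the kind of dynamic grouping that rocky playbook performs (task and variable names here are illustrative, not copied from the playbook):

```yaml
# Paraphrased sketch of how repo-build.yml sorts repo containers into
# per-distribution groups such as repo_servers_18.04_x86_64; a later play
# builds wheels on a host from each such group, so anything that keeps a
# group from being populated (e.g. an aggressive --limit) means nothing
# gets built for that OS version.
- name: Group repo containers by OS version and architecture
  hosts: repo_all
  gather_facts: true
  tasks:
    - name: Add host to a per-distribution group
      group_by:
        key: "repo_servers_{{ ansible_distribution_version }}_{{ ansible_architecture }}"
```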
admin1 | the strange thing is this worked twice in the lab .. i used the same config and variables .. even kept the domain name and ips the same, and there it upgraded fine just like the documentation .. | 09:45 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: [doc] Update infra node scaling documentation https://review.opendev.org/c/openstack/openstack-ansible/+/822912 | 09:46 |
admin1 | even if 16.04 is gone, that is ok .. as we are not growing now in 16.04 .. so as long as it builds 18.04 i think its good enough | 09:46 |
noonedeadpunk | well, repo-build used to fail weirdly for me as well, even with exactly the same deployments that had been passing.. I'm actually glad we got rid of it.... | 09:46 |
noonedeadpunk | well, I can suggest a nasty thing then - edit the inventory (and openstack_user_config) to have a repo container only on c1 | 09:47 |
admin1 | "openstack-ansible repo-install.yml -vv -e repo_build_wheel_rebuild=true -e repo_build_venv_rebuild=true" is running now .. if this fails, then will try that one | 09:48 |
noonedeadpunk | but I think the question is not only about growing, but also about maintenance of the existing xenial | 09:48 |
noonedeadpunk | as I believe you will hit failures even when just trying to adjust some config | 09:48 |
noonedeadpunk | not repo-install.yml, repo-build.yml | 09:48 |
admin1 | i don't want to ctrl-C it .. but it does call repo-build also | 09:49 |
noonedeadpunk | yeah, just wasting time :) | 09:50 |
admin1 | its on the "repo_build : Create OpenStack-Ansible requirement wheels" task, so i thikn its working .. | 09:50 |
admin1 | think* | 09:50 |
admin1 | i see it building in c1 .. finally \o/ | 09:52 |
admin1 | wheel_build log | 09:52 |
admin1 | i know it's not relevant anymore, but out of curiosity .. if lsyncd is on c3, but the new bionic repo is on c1, does it copy from c1 -> c3 and then lsync it again from c3 ? | 09:57 |
noonedeadpunk | nope, it's not copied from c1 | 10:01 |
noonedeadpunk | we never managed to get this flow working really properly. | 10:01 |
admin1 | its done .. i see both 16 and 18 packages in c3 and only 18 in c1 | 10:07 |
noonedeadpunk | oh? | 10:07 |
admin1 | checking with keystone playbook if all is good .. | 10:07 |
admin1 | it's complaining that "/etc/keystone/fernet-keys does not contain keys, use keystone-manage fernet_setup to create Fernet keys" .. is it safe to log in inside the venv and issue the create command ? | 10:16 |
admin1 | glance went in ok .. | 10:23 |
admin1 | i will disable keystone and do the rest .. will check into keystone individually later | 10:23 |
admin1 | quick question .. when all of this is upgraded (hopefully), do i have to upgrade 1 version at a time ? or can i jump a few versions at once ? | 10:25 |
noonedeadpunk | well, I was jumping R->T and T->V | 10:41 |
admin1 | ok | 10:41 |
admin1 | except keystone complaining about fernet keys, all other services are almost installed .. no errors.. and i used the newly built repo server to ensure it has all the packages | 10:42 |
noonedeadpunk | and it went pretty well. But you might go your own way:) eventually no upgrades except version+1 are tested by any project | 10:42 |
noonedeadpunk | and nova now explicitly blocks such upgrades from W | 10:42 |
admin1 | this cluster is with integrated ceph .. .. i think i need to bump ceph version at some point as well | 10:42 |
admin1 | i will do it slow .. 1 version at a time .. | 10:43 |
*** chkumar|rover is now known as chandankumar | 10:50 | |
admin1 | can osa handle letsencrypt ssl automatically if the domain is pointed at the external vip ? | 10:54 |
admin1 | sorry .. ignore that question | 10:55 |
admin1 | except keystone all things look good :) | 10:56 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_keystone master: Drop keystone_default_role_name https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/823003 | 11:06 |
admin1 | noonedeadpunk, seen this error before ? /etc/keystone/fernet-keys does not contain keys, use keystone-manage fernet_setup to create Fernet keys | 11:38 |
admin1 | i did the setup command .. but it did not work | 11:38 |
admin1 | i did the setup command .. keystone-manage fernet_setup --keystone-user keystone --keystone-group service .. but it did not help | 11:39 |
noonedeadpunk | might be smth related to symlinking? | 11:39 |
noonedeadpunk | /etc/keystone is likely a symlink in R | 11:39 |
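A few hedged checks for that theory (paths assume the default OSA layout inside the keystone container):

```bash
# Inside the keystone container:
ls -ld /etc/keystone                    # real directory, or a symlink into the venv?
ls -l /etc/keystone/fernet-keys/        # fernet keys are plain numbered files (0, 1, ...)
stat -c '%U:%G %a' /etc/keystone/fernet-keys   # ownership/permissions matter here too
```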
admin1 | its a directory | 11:41 |
*** sshnaidm|afk is now known as sshnaidm | 11:42 | |
noonedeadpunk | hm... I think error must be logged anyway in /var/log/keystone? | 11:46 |
admin1 | this is all it has ...100s of lines .. | 11:47 |
admin1 | https://gist.githubusercontent.com/a1git/24a333b2976a798a502eb5201f651a60/raw/fcd005b718e8be8aebb6422c40c5083f99d31d61/gistfile1.txt | 11:47 |
admin1 | i will try to nuke this container and retry | 11:48 |
admin1 | i think it was because i was using it with a limit | 12:01 |
admin1 | i did it without limit and it just worked | 12:01 |
admin1 | doh ! | 12:01 |
noonedeadpunk | ah, just in case: keystone with a limit never works | 12:05 |
noonedeadpunk | I was just updating https://review.opendev.org/c/openstack/openstack-ansible/+/822912/5/doc/source/admin/scale-environment.rst to mention that) | 12:06 |
admin1 | how do i run only the mds and mon roles but not the osd role | 12:09 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Disable service_token requirement by default https://review.opendev.org/c/openstack/openstack-ansible/+/823005 | 12:25 |
noonedeadpunk | admin1: at least you can leverage a limit to just ceph_mons, for example | 13:01 |
noonedeadpunk | but there could also be tags that would allow doing that | 13:01 |
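Both approaches, as a hedged sketch (the group and tag names below are assumptions; check the deployment's generated inventory and the ceph playbooks for the real ones):

```bash
# Option 1: limit the ceph playbook to the mon and mds groups
# (group names are assumptions - verify them in the generated inventory)
openstack-ansible ceph-install.yml --limit 'ceph-mon_all:ceph-mds_all'

# Option 2: run only matching tags, if the roles expose them
openstack-ansible ceph-install.yml --list-tags          # see what is available first
openstack-ansible ceph-install.yml --tags ceph-mon,ceph-mds
```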
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_glance master: Support service tokens https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/823009 | 13:06 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_glance master: Support service tokens https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/823009 | 13:07 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_glance master: Support service tokens https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/823009 | 13:09 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_glance master: Support service tokens https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/823009 | 13:52 |
noonedeadpunk | so, zun now fails a single tempest test that is easily reproducible in an aio - test_run_container_with_cinder_volume_dynamic_created | 14:39 |
noonedeadpunk | I wonder if it should be run at all considering that test_run_container_with_cinder_volume is disabled because of bug https://bugs.launchpad.net/zun/+bug/1897497 | 14:40 |
noonedeadpunk | https://github.com/openstack/zun-tempest-plugin/blob/master/zun_tempest_plugin/tests/tempest/api/test_containers.py#L380 | 14:41 |
noonedeadpunk | ah, no: `No iscsi_target is presently exported for volume`. So it's just our CI that is broken, I guess | 14:58 |
admin1 | is ubuntu-esm-infra.list part of osa ? | 15:48 |
admin1 | what happened is xenial has version 13.0 of ceph (mimic) .. bionic got version 12.0 of ceph -- both point to the same mimic repo .. but i found this `deb https://esm.ubuntu.com/infra/ubuntu xenial-infra-security main extra` entry on the xenial host with the name | 15:49 |
admin1 | sources.list.d/ubuntu-esm-infra.list | 15:49 |
noonedeadpunk | no, I don't think it is | 16:02 |
noonedeadpunk | can't recall having that | 16:02 |
admin1 | i found out .. in the bionic ceph pinning it's `Pin: release o=Ubuntu` .. while in xenial it's `Pin: release o=ceph.com` | 16:09 |
admin1 | so one got pinned via ceph.com, the other via Ubuntu | 16:09 |
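What the working pin looks like if written out by hand (the package glob and priority are illustrative; only the `o=ceph.com` origin is the point here):

```bash
# Illustrative apt preference forcing the ceph.com origin on bionic
cat > /etc/apt/preferences.d/ceph_pin <<'EOF'
Package: ceph*
Pin: release o=ceph.com
Pin-Priority: 1001
EOF
apt-cache policy ceph-common   # confirm which origin/version now wins
```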
noonedeadpunk | yeah, but the pinning is not in sources | 16:13 |
admin1 | changing it manually to ceph.com and then apt upgrade reboot fixed it | 16:29 |
admin1 | one server is done .. 2 more controllers to go | 16:29 |
jrosser_ | on old releases like this there are variables to set for whether it takes the Ubuntu ceph packages, the UCA ones or the ones at ceph.com | 16:36 |
jrosser_ | it’s not automatic to choose the one you need/want | 16:36 |
admin1 | i changed to ceph.com, rebooted .. they are good to go .. then i ran the playbooks again and it set it back to Ubuntu , but since the packages were already upgraded, it did not downgrade them .. so i am good | 16:38 |
admin1 | one wishlist item from a long time ago is to run actual swift using osa .. | 16:48 |
admin1 | because it's eventual consistency rather than strong consistency, i can place the servers in two datacenters ( even with high latency ) and be sure that backups are protected | 16:48 |
noonedeadpunk | well, with ceph you can have rgw (with swift and s3 compatibility) relatively easily | 16:49 |
noonedeadpunk | and do cross-region backups as well | 16:49 |
opendevreview | James Denton proposed openstack/openstack-ansible-os_glance master: Define _glance_available_stores in variables https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/822899 | 16:53 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_glance master: Support service tokens https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/823009 | 17:21 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_glance master: Add boto3 module for s3 backend https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/822870 | 17:21 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_glance master: Support service tokens https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/823009 | 17:21 |
opendevreview | Marcus Klein proposed openstack/openstack-ansible master: fix healthcheck-hosts.yml for different configuration https://review.opendev.org/c/openstack/openstack-ansible/+/823023 | 17:31 |
admin1 | noonedeadpunk, jrosser_ .. thanks for all the help and support .. | 17:31 |
kleini | https://review.opendev.org/c/openstack/openstack-ansible/+/774472 <- this commit removed the openstacksdk which is used by healthcheck-openstack.yml. How does it work in CI, if it fails for me in prod? | 17:57 |
noonedeadpunk | kleini: we don't run healthcheck-openstack.yml in CI | 18:13 |
noonedeadpunk | eventually, I think it needs to either spawn its own venv with the clients or be delegated to the utility container (the second is easier) | 18:13 |
kleini | so only healthcheck-hosts.yml is run in CI? | 18:16 |
kleini | and healthcheck-openstack.yml works again when openstacksdk is added back to requirements.txt | 18:17 |
kleini | will try to delegate healthcheck-openstack.yml to the utility container. need to find some example of how to do that | 18:17 |
noonedeadpunk | and healthcheck-infrastructure.yml is also run | 18:23 |
noonedeadpunk | healthcheck-openstack.yml is not, as tempest or rally is a way better method to test openstack | 18:24 |
noonedeadpunk | as eventually this playbook needs to be maintained, while tempest is maintained by service developers | 18:25 |
kleini | okay, will skip it then and stick to tempest | 18:25 |
noonedeadpunk | kleini: eventually, I think you just need to replace `hosts: localhost` with `hosts: utility_all[0 | 18:25 |
noonedeadpunk | * `hosts: groups['utility_all'][0]` | 18:26 |
kleini | did that and it says again that openstacksdk is missing | 18:27 |
kleini | so maybe additionally the venv needs to be set | 18:27 |
noonedeadpunk | and set ansible_python_interpreter: "{{ utility_venv_bin/python }}" | 18:27 |
noonedeadpunk | * "{{ utility_venv_bin }}/python" | 18:28 |
noonedeadpunk | but tbh I'd rather drop that playbook in favor of tempest unless somebody really wants to maintain it and finds it useful | 18:32 |
kleini | works | 18:34 |
kleini | tempest is hard for me to configure with regard to which tests should be run. there is no list of tests, no list of suites | 18:35 |
kleini | and the "smoke" suite is hardly useful. it only tests the keystone API | 18:36 |
opendevreview | Marcus Klein proposed openstack/openstack-ansible master: fix healthcheck-hosts.yml for different configuration https://review.opendev.org/c/openstack/openstack-ansible/+/823023 | 18:47 |
jrosser_ | kleini: if you want to validate your install you should look at refstack https://refstack.openstack.org | 19:42 |
jrosser_ | that bundles tempest and a defined set of tests for validating interoperability | 19:42 |
admin1 | with the PKI certs in place, is the keystone url for ceph object storage still http://<internal-vip>:5000 or something else ? | 21:20 |
admin1 | to integrate osa managed openstack and ceph-ansible managed ceph | 21:20 |
admin1 | to add swift/s3 of ceph to openstack | 21:21 |
jrosser_ | admin1: although we have the PKI role in place now, the internal endpoint still defaults to http rather than https | 22:02 |
jrosser_ | there are instructions here if you want to switch that to https, which you could do on fresh deployments https://github.com/openstack/openstack-ansible/blob/master/doc/source/user/security/ssl-certificates.rst#tls-for-haproxy-internal-vip | 22:03 |
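As a very rough illustration of what that switch involves in user_variables.yml; the variable names below are from memory and should be treated as assumptions, with the linked ssl-certificates document as the authoritative reference:

```yaml
# user_variables.yml sketch (names are assumptions - verify against the linked doc)
haproxy_ssl_all_vips: true                   # terminate TLS on the internal VIP as well
openstack_service_internaluri_proto: https   # register internal endpoints as https
```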
admin1 | how recommended is it to use https:// for internal traffic .. i have tested and the internal network is well isolated ( i mean it does not leak to guests ) | 22:04 |
jrosser_ | for the internal keystone endpoint, when you enable https, the certificate should be valid for whatever fqdn or ip you have defined the internal vip as | 22:04 |
jrosser_ | we will switch to defaulting to https at some future release | 22:04 |
jrosser_ | currently there is no upgrade path for that so the default remains as http | 22:04 |
admin1 | the question is .. for those doing ceph-ansible + osa .. if we switch to pki, since it's a self-signed cert, do we need to copy the ca certs etc to the ceph mons as well ? | 22:05 |
jrosser_ | well, there is an upgrade path if you're happy that the control plane is broken for the period of doing an upgrade | 22:05 |
admin1 | its all in lab .. | 22:05 |
jrosser_ | it is not self signed | 22:05 |
jrosser_ | it creates a CA root, which is self signed | 22:05 |
jrosser_ | so there is a CA cert you can copy off the deploy node and install onto whatever else you want | 22:06 |
admin1 | so for the ceph-mons to connect to https:// keystone, that ca cert is the only thing that needs to be copied over .. | 22:06 |
jrosser_ | probably | 22:06 |
jrosser_ | different things tend to behave differently | 22:06 |
jrosser_ | libvirt has different needs than python code, for example | 22:07 |
jrosser_ | for where you put the CA, if it needs a copy of the intermediate CA, if it wants a cert chain blah blah blah | 22:07 |
jrosser_ | first thing to do is add the OSA PKI root to the system CA store of your ceph nodes and see if thats good enough | 22:08 |
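That step might look roughly like this; the CA file location under /etc/openstack_deploy/pki and the hostnames are assumptions, so check the PKI role's actual output layout on the deploy host:

```bash
# On the deploy host: copy the OSA PKI root CA to a ceph node's local trust store
# (the exact path/name of the root CA file is deployment-specific)
scp /etc/openstack_deploy/pki/roots/<root-ca-name>/certs/<root-ca-name>.crt \
    ceph-mon1:/usr/local/share/ca-certificates/osa-root-ca.crt

# On the ceph node: refresh the system CA store (Ubuntu/Debian)
ssh ceph-mon1 update-ca-certificates
```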
admin1 | yeah . no hurry now .. just doing some future roadmap planning . | 22:08 |
jrosser_ | if not, then dig into the ceph docs to find what it wants | 22:08 |
jrosser_ | there are some things we don't yet test really | 22:09 |
jrosser_ | like providing your own external cert, and also using the PKI role for internal | 22:09 |
admin1 | if we provide our own external cert, is that cert used instead of the pki one ? | 22:10 |
jrosser_ | or providing your own intermediate CA / key from an existing company CA for osa+PKI to use | 22:10 |
jrosser_ | for what? there are now lots of certs | 22:10 |
admin1 | :D | 22:10 |
jrosser_ | for external, you would use the vars in the haproxy role, which should be very similar/same as before | 22:11 |
admin1 | for example .. if we change the internal vip to cloud-int.domain.com and the external vip to cloud.domain.com and provide a san/wildcard cert that satisfies both cloud-int.domain.com and cloud.domain.com, can that cert be used for internal and external instead of the self-signed pki ? | 22:11 |
jrosser_ | that sort of misses the point | 22:13 |
jrosser_ | you need ssl on rabbitmq today regardless, and that is coming from the PKI role | 22:13 |
jrosser_ | so the internal VIP is really just one of very many places that certificates are required | 22:14 |
admin1 | that is true .. | 22:14 |
jrosser_ | and imho it is more important to have well designed trust on the internal SSL, more important than it being a certificate from a "real issuer" | 22:15 |
admin1 | some "customers" really insist on having everything "certified" :) | 22:16 |
jrosser_ | the trouble is you can't have certificates issued for rfc1918 ip addresses, or things which are not public | 22:16 |
jrosser_ | so thats basically broken thinking | 22:16 |
admin1 | yeah | 22:18 |
jrosser_ | having a private internal CA is more secure than a publicly trusted one | 22:18 |
jrosser_ | because the internal things will only authenticate with each other, not with an external hacker | 22:18 |