*** ttsiouts has joined #openstack-containers | 00:07 | |
*** sdake has joined #openstack-containers | 00:08 | |
*** itlinux has joined #openstack-containers | 00:09 | |
*** sdake has quit IRC | 00:12 | |
*** itlinux_ has joined #openstack-containers | 00:13 | |
*** sdake has joined #openstack-containers | 00:14 | |
*** itlinux has quit IRC | 00:15 | |
*** ttsiouts has quit IRC | 00:29 | |
*** sdake has quit IRC | 00:41 | |
*** sapd1 has quit IRC | 00:42 | |
*** PagliaccisCloud has quit IRC | 00:50 | |
*** PagliaccisCloud has joined #openstack-containers | 00:52 | |
*** openstackgerrit has joined #openstack-containers | 00:57 | |
openstackgerrit | Jake Yip proposed openstack/magnum master: Update min tox version to 2.0 https://review.openstack.org/616412 | 00:57 |
*** ricolin has joined #openstack-containers | 01:03 | |
*** sdake has joined #openstack-containers | 01:08 | |
*** sapd1 has joined #openstack-containers | 01:52 | |
*** sdake has quit IRC | 02:05 | |
*** hongbin has joined #openstack-containers | 02:13 | |
*** sdake has joined #openstack-containers | 02:28 | |
*** sapd1 has quit IRC | 02:37 | |
*** itlinux_ has quit IRC | 03:06 | |
*** itlinux has joined #openstack-containers | 03:12 | |
*** itlinux has quit IRC | 03:21 | |
*** itlinux has joined #openstack-containers | 03:25 | |
*** sdake has quit IRC | 03:51 | |
*** ramishra has joined #openstack-containers | 04:09 | |
*** udesale has joined #openstack-containers | 04:18 | |
*** ykarel|away has joined #openstack-containers | 04:31 | |
*** ykarel|away is now known as ykarel | 04:31 | |
*** janki has joined #openstack-containers | 05:08 | |
*** hongbin has quit IRC | 05:09 | |
*** udesale has quit IRC | 05:22 | |
*** jhesketh has quit IRC | 05:47 | |
*** jhesketh has joined #openstack-containers | 05:48 | |
*** sdake has joined #openstack-containers | 05:48 | |
*** sdake has quit IRC | 05:50 | |
*** pcaruana has joined #openstack-containers | 05:52 | |
*** sdake has joined #openstack-containers | 05:58 | |
*** pcaruana has quit IRC | 06:07 | |
*** dims has quit IRC | 06:24 | |
*** dims has joined #openstack-containers | 06:26 | |
*** itlinux has quit IRC | 06:34 | |
*** dims has quit IRC | 06:36 | |
*** dims has joined #openstack-containers | 06:37 | |
*** mkuf has quit IRC | 06:48 | |
*** mkuf has joined #openstack-containers | 07:01 | |
*** udesale has joined #openstack-containers | 07:51 | |
*** udesale has quit IRC | 08:01 | |
*** flwang1 has joined #openstack-containers | 08:14 | |
*** sapd1 has joined #openstack-containers | 08:16 | |
*** pcaruana has joined #openstack-containers | 08:18 | |
*** pcaruana has quit IRC | 08:25 | |
*** yolanda has joined #openstack-containers | 08:25 | |
flwang1 | strigazi: around? | 08:33 |
*** sdake has quit IRC | 08:36 | |
*** pcaruana has joined #openstack-containers | 08:37 | |
flwang1 | strigazi: do you have time for a catch up? | 08:40 |
*** ykarel is now known as ykarel|lunch | 08:41 | |
*** pcaruana has quit IRC | 08:44 | |
*** ttsiouts has joined #openstack-containers | 08:52 | |
*** alisanhaji has joined #openstack-containers | 09:00 | |
*** pcaruana has joined #openstack-containers | 09:01 | |
*** ttsiouts has quit IRC | 09:05 | |
*** ttsiouts has joined #openstack-containers | 09:06 | |
*** ttsiouts has quit IRC | 09:10 | |
*** ttsiouts has joined #openstack-containers | 09:12 | |
*** ign0tus has joined #openstack-containers | 09:12 | |
*** alisanhaji has quit IRC | 09:30 | |
*** alisanhaji has joined #openstack-containers | 09:32 | |
*** ykarel|lunch is now known as ykarel | 09:39 | |
*** sdake has joined #openstack-containers | 09:55 | |
*** sdake has quit IRC | 10:51 | |
*** sdake has joined #openstack-containers | 10:55 | |
*** ttsiouts has quit IRC | 11:19 | |
*** ttsiouts has joined #openstack-containers | 11:20 | |
*** janki has quit IRC | 11:22 | |
*** ttsiouts has quit IRC | 11:24 | |
*** mkuf has quit IRC | 11:27 | |
*** mkuf has joined #openstack-containers | 11:28 | |
*** sapd1 has quit IRC | 11:29 | |
*** udesale has joined #openstack-containers | 12:00 | |
*** ttsiouts has joined #openstack-containers | 12:01 | |
*** dave-mccowan has joined #openstack-containers | 12:19 | |
*** sdake has quit IRC | 12:39 | |
*** janki has joined #openstack-containers | 13:17 | |
*** sdake has joined #openstack-containers | 13:19 | |
*** ivve has joined #openstack-containers | 13:38 | |
*** andrein has joined #openstack-containers | 13:41 | |
*** sapd1 has joined #openstack-containers | 13:43 | |
andrein | Hi guys, I'm trying to configure magnum on openstack rocky. I can launch the cluster, but the heat stack fails after creating the masters. I've logged in to the masters and every one of them is hanging when starting etcd because it can't find the certificates. I've noticed the make-certs.sh job failed on all of them because they're trying to hit the keystone API over the internal endpoint. How can I change this? | 13:47 |
*** sapd1 has quit IRC | 13:47 | |
*** sdake has quit IRC | 14:11 | |
*** janki has quit IRC | 14:12 | |
*** janki has joined #openstack-containers | 14:12 | |
*** ykarel is now known as ykarel|away | 14:14 | |
*** ykarel|away has quit IRC | 14:18 | |
*** sapd1 has joined #openstack-containers | 14:18 | |
*** ykarel|away has joined #openstack-containers | 14:19 | |
*** ttsiouts has quit IRC | 14:21 | |
*** sdake has joined #openstack-containers | 14:22 | |
*** ttsiouts has joined #openstack-containers | 14:22 | |
*** sdake has quit IRC | 14:23 | |
DimGR | strigazi hii :) | 14:23 |
*** ttsiouts has quit IRC | 14:25 | |
*** ttsiouts has joined #openstack-containers | 14:25 | |
*** sdake has joined #openstack-containers | 14:33 | |
brtknr | andrein: how was your openstack deployed? | 14:37 |
andrein | brtknr, I deployed it using kolla-ansible | 14:47 |
*** hongbin has joined #openstack-containers | 14:47 | |
brtknr | Which version of kolla-ansible? | 14:47 |
andrein | Version 7.0.1 | 14:48 |
*** sdake has quit IRC | 14:48 | |
*** sdake has joined #openstack-containers | 14:50 | |
brtknr | Hmm, can you check your heat-container-agent log in master? | 14:50 |
brtknr | also check /var/log/cloud-init.log | 14:51 |
brtknr | and /var/log/cloud-init-output.log | 14:51 |
brtknr | and grep -i for fail | 14:51 |
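The checks brtknr lists above amount to roughly the following on a master node. This is a minimal sketch; the exact unit names and log paths depend on the image, and make-cert.service is the unit name that appears later in this log:

    # inspect the heat agent and cloud-init output for failures (unit name and paths assumed)
    sudo journalctl -u heat-container-agent --no-pager | grep -i fail
    sudo grep -i fail /var/log/cloud-init.log /var/log/cloud-init-output.log
    sudo systemctl status make-cert.service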
andrein | On the Kubernetes master, right? | 14:51 |
* andrein is spawning another cluster | 14:53 | |
*** pcaruana has quit IRC | 14:57 | |
*** munimeha1 has joined #openstack-containers | 14:58 | |
andrein | brtknr, cloud init log shows make-cert.service failing | 14:59 |
andrein | That's the only error I see in cloud-init logs. I'm using coreos as a base image for this cluster. | 15:04 |
*** sdake has quit IRC | 15:04 | |
andrein | From what I notice in /etc/sysconfig/heat-params, MAGNUM_URL is set to the public endpoint, but AUTH_URL is private. | 15:04 |
andrein | Make-certs.sh is trying to hit the private auth endpoint and times out after a while, that causes etcd to fail etc. | 15:05 |
*** sdake has joined #openstack-containers | 15:07 | |
openstackgerrit | jacky06 proposed openstack/magnum-tempest-plugin master: Update json module to jsonutils https://review.openstack.org/638968 | 15:07 |
andrein | Hmmm, wait a second, in horizon under admin/system information I do have the wrong URL for the public endpoint. Seems something went south in kolla-ansible | 15:08 |
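A hedged sketch of checking, and if needed correcting, a wrong identity public endpoint with the openstack CLI. The endpoint ID and URL below are placeholders; in a kolla-ansible deployment the endpoints are normally managed by a reconfigure run rather than edited by hand:

    openstack endpoint list --service identity
    # point the public interface at the correct URL (placeholder values)
    openstack endpoint set --url https://public.example.com:5000 <endpoint-id>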
*** sapd1 has quit IRC | 15:25 | |
*** openstackgerrit has quit IRC | 15:28 | |
*** alisanhaji has quit IRC | 15:34 | |
*** pcaruana has joined #openstack-containers | 15:42 | |
*** alisanhaji has joined #openstack-containers | 15:44 | |
*** sdake has quit IRC | 15:49 | |
*** udesale has quit IRC | 15:50 | |
*** belmoreira has quit IRC | 16:00 | |
*** sdake has joined #openstack-containers | 16:01 | |
*** ricolin has quit IRC | 16:04 | |
*** Adri2000 has joined #openstack-containers | 16:21 | |
Adri2000 | hello | 16:21 |
Adri2000 | is there any existing discussion somewhere about using 8.8.8.8 as default dns server for magnum-created networks, instead of not specifying any dns server and therefore using neutron dns resolution? | 16:23 |
Adri2000 | at least in the k8s_fedora_atomic_v1 driver | 16:23 |
*** janki has quit IRC | 16:41 | |
*** ramishra has quit IRC | 16:41 | |
*** janki has joined #openstack-containers | 16:41 | |
*** ign0tus has quit IRC | 16:45 | |
*** ivve has quit IRC | 16:58 | |
*** andrein has quit IRC | 17:00 | |
-openstackstatus- NOTICE: Gerrit is being restarted for a configuration change, it will be briefly offline. | 17:09 | |
*** ykarel|away has quit IRC | 17:21 | |
*** itlinux has joined #openstack-containers | 17:35 | |
*** sdake has quit IRC | 17:35 | |
*** ttsiouts has quit IRC | 17:38 | |
flwang1 | Adri2000: we have seen this requirement before | 17:47 |
flwang1 | but no one working on that now | 17:47 |
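Until a default is agreed on, a DNS server can be set explicitly per cluster template. A minimal sketch; the --dns-nameserver flag is from the magnum CLI, the other values are placeholders:

    openstack coe cluster template create k8s-atomic \
        --image fedora-atomic-27 --coe kubernetes \
        --external-network public --flavor m1.small \
        --dns-nameserver 8.8.8.8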
*** itlinux has quit IRC | 18:42 | |
*** itlinux has joined #openstack-containers | 18:47 | |
*** andrein has joined #openstack-containers | 18:56 | |
brtknr | Meeting today? | 19:10 |
*** ivve has joined #openstack-containers | 19:12 | |
brtknr | andrein: if coreos is not essential, try using fedora-atomic driver | 19:12 |
brtknr | not sure many people here are testing coreos environment | 19:12 |
brtknr | although that might change soon with fedora-coreos? | 19:13 |
*** ttsiouts has joined #openstack-containers | 19:31 | |
*** sdake has joined #openstack-containers | 19:32 | |
*** dave-mccowan has quit IRC | 19:55 | |
*** itlinux has quit IRC | 19:56 | |
*** NobodyCam has joined #openstack-containers | 19:57 | |
NobodyCam | morning Magnum folks | 19:57 |
NobodyCam | anyone encountered Authorization failed. or token scope issues with OpenStack-Ansible installed magnum? | 19:58 |
*** dave-mccowan has joined #openstack-containers | 20:01 | |
*** itlinux has joined #openstack-containers | 20:08 | |
*** ttsiouts has quit IRC | 20:11 | |
*** ttsiouts has joined #openstack-containers | 20:11 | |
*** itlinux has quit IRC | 20:15 | |
*** ttsiouts has quit IRC | 20:15 | |
*** sdake has quit IRC | 20:17 | |
andrein | brtknr, I eventually got it working with CoreOS. Had to change the keystone public endpoint manually, no idea why kolla skipped reconfiguring it, the other endpoints were changed. | 20:18 |
*** sdake has joined #openstack-containers | 20:19 | |
*** ttsiouts has joined #openstack-containers | 20:32 | |
flwang1 | strigazi: do we have meeting today? | 20:36 |
flwang1 | brtknr: seems we don't have meeting today, strigazi is not online | 20:42 |
strigazi | We do have a meeting | 20:46 |
brtknr | Woot! | 20:49 |
brtknr | andrein: we have submitted kolla-ansible config to modify keystone endpoint in the past but maybe it wasnt for coreos | 20:49 |
strigazi | Dates for the next three Tuesdays https://wiki.openstack.org/wiki/Meetings/Containers | 20:50 |
colin- | hi | 20:51 |
*** andrein has quit IRC | 20:52 | |
flwang1 | strigazi: cool, good to see you | 20:53 |
*** andrein has joined #openstack-containers | 20:53 | |
*** mkuf has quit IRC | 20:57 | |
strigazi | #startmeeting containers | 21:00 |
openstack | Meeting started Tue Mar 5 21:00:05 2019 UTC and is due to finish in 60 minutes. The chair is strigazi. Information about MeetBot at http://wiki.debian.org/MeetBot. | 21:00 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 21:00 |
*** openstack changes topic to " (Meeting topic: containers)" | 21:00 | |
openstack | The meeting name has been set to 'containers' | 21:00 |
strigazi | #topic Roll Call | 21:00 |
*** openstack changes topic to "Roll Call (Meeting topic: containers)" | 21:00 | |
strigazi | o/ | 21:00 |
schaney | o/ | 21:00 |
jakeyip | o/ | 21:00 |
brtknr | o/ | 21:01 |
strigazi | Hello schaney jakeyip brtknr | 21:02 |
strigazi | #topic Stories/Tasks | 21:02 |
*** openstack changes topic to "Stories/Tasks (Meeting topic: containers)" | 21:02 | |
*** imdigitaljim has joined #openstack-containers | 21:02 | |
imdigitaljim | o/ | 21:02 |
strigazi | I want to mention three things quickly. | 21:03 |
strigazi | CI for swarm and kubernetes is not passing | 21:03 |
colin- | hello | 21:03 |
strigazi | Hello colin- imdigitaljim | 21:03 |
strigazi | I'm finding the error | 21:04 |
strigazi | for example for k8s http://logs.openstack.org/73/639873/3/check/magnum-functional-k8s/06f3638/logs/screen-h-eng.txt.gz?level=ERROR | 21:04 |
strigazi | The error is the same for swarm | 21:04 |
strigazi | If someone wants to take a look and then comment in https://review.openstack.org/#/c/640238/ or in a fix :) | 21:06 |
strigazi | 2. | 21:06 |
strigazi | small regression I have found for the etcd_volume_size label (persistent storage for etcd) https://storyboard.openstack.org/#!/story/2005143 | 21:06 |
strigazi | this fix is obvious | 21:07 |
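For reference, the label in question is set at template (or cluster) creation; a minimal sketch with placeholder names and sizing:

    # give etcd its own persistent cinder volume (size in GB)
    openstack coe cluster template create k8s-atomic-etcdvol \
        --image fedora-atomic-27 --coe kubernetes \
        --external-network public \
        --labels etcd_volume_size=10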
strigazi | 3. | 21:07 |
strigazi | imdigitaljim created "Cluster creators that leave WRT Keystone cause major error" https://storyboard.openstack.org/#!/story/2005145 | 21:07 |
imdigitaljim | yeah thats my 1 | 21:07 |
strigazi | it has been discussed many times. the keystone team says there is no fix | 21:07 |
strigazi | in our cloud we manually transfer the trustee user to another account. | 21:08 |
imdigitaljim | could we rework magnum to opt to poll heat based on a service account for 1 part | 21:08 |
imdigitaljim | instead of using trust cred to poll heat | 21:08 |
strigazi | imdigitaljim: some say this is a security issue, it was like this before. | 21:08 |
imdigitaljim | oh? | 21:09 |
strigazi | but this fixes part of the problem | 21:09 |
imdigitaljim | couldnt it be scoped to readonly/gets for heat | 21:09 |
imdigitaljim | the kubernetes side | 21:09 |
imdigitaljim | either might be trust transfer (like you suggest) | 21:09 |
imdigitaljim | or we have been opting for teams to use a bot account type approach for their tenant | 21:09 |
imdigitaljim | that will persist among users leaving | 21:09 |
strigazi | trusts transfer *won't* happen in keystone, ever | 21:10 |
imdigitaljim | yeah | 21:10 |
imdigitaljim | i doubt it would | 21:10 |
jakeyip | does this happen only if the user is deleted from keystone? | 21:10 |
strigazi | they were clear with this in the Dublin PTG | 21:10 |
*** pcaruana has quit IRC | 21:10 | |
strigazi | yes | 21:10 |
imdigitaljim | yeah | 21:10 |
strigazi | the trust powers die when the user is deleted | 21:11 |
strigazi | same for application creds | 21:11 |
imdigitaljim | to be honest even if we fix the "magnum to opt to poll heat based on a service account" | 21:11 |
imdigitaljim | that would be a huge improvement | 21:11 |
imdigitaljim | that would at least enable us to delete the clusters | 21:11 |
imdigitaljim | without db edits | 21:11 |
strigazi | admins can delete the cluster anyway | 21:11 |
imdigitaljim | we could not | 21:12 |
strigazi | ? | 21:12 |
imdigitaljim | with our admin accounts | 21:12 |
imdigitaljim | the codepaths bomb out with heat polling | 21:12 |
imdigitaljim | not sure where | 21:12 |
jakeyip | is this a heat issue instead? | 21:12 |
imdigitaljim | the occurrence was just yesterday | 21:12 |
strigazi | maybe you diverged in the code? | 21:12 |
imdigitaljim | no i had to delete the heat stack underneath with normal heat functionality | 21:12 |
imdigitaljim | and then manually remove the cluster via db | 21:13 |
strigazi | wrong policy? | 21:13 |
imdigitaljim | not with that regard | 21:13 |
colin- | +1 re: service account, fwiw | 21:13 |
imdigitaljim | nope | 21:14 |
imdigitaljim | AuthorizationFailure: unexpected keystone client error occurred: Could not find user: <deleted_user>. (HTTP 404) (Request-ID: req-370b414f-239a-4e13-b00d-a1d87184904b) | 21:15 |
strigazi | ok | 21:15 |
jakeyip | ok so figuring out why admin can't use magnum to delete a cluster but can use heat to delete a stack will be a way forward? | 21:15 |
jakeyip | I wonder what is the workflow for normal resources (e.g. nova instances) in case of people leaving? | 21:15 |
strigazi | the problem is magnum can't check the status of the stack | 21:16 |
brtknr | it would be nice if the trust was owned by a role+domain rather than a user, so anyone with the role+domain can act as that role+domain | 21:16 |
imdigitaljim | ^ | 21:16 |
imdigitaljim | +1 | 21:16 |
imdigitaljim | +1 | 21:16 |
brtknr | guess its too late to refactor things now... | 21:16 |
imdigitaljim | imo not really | 21:16 |
strigazi | it is a bit bad as well | 21:16 |
imdigitaljim | but it can be bad based on the use-case | 21:17 |
imdigitaljim | for us its fine | 21:17 |
strigazi | the trust creds are a leak | 21:17 |
imdigitaljim | yeah | 21:17 |
imdigitaljim | the trust creds on the server | 21:17 |
strigazi | userA takes trust creds from userb that they both own the cluster | 21:17 |
imdigitaljim | and you can get access to other clusters | 21:17 |
strigazi | userA is fired, can still access keystone | 21:17 |
brtknr | oh, because trust is still out in the wild? | 21:18 |
strigazi | the polling issue is different than the trust in the cluster | 21:18 |
imdigitaljim | yeah | 21:18 |
brtknr | change trust password *rolls eyes* | 21:18 |
imdigitaljim | different issues | 21:18 |
strigazi | we can do service account for polling again | 21:18 |
imdigitaljim | but an admin readonly scope | 21:19 |
imdigitaljim | ? | 21:19 |
strigazi | That is possible | 21:19 |
strigazi | since the magnum controller is managed by admins | 21:19 |
imdigitaljim | yeah | 21:19 |
imdigitaljim | i think that would a satisfactory solution | 21:19 |
imdigitaljim | the clusters we can figure out/delete/etc | 21:19 |
imdigitaljim | but magnums behavior is a bit unavoidable | 21:20 |
imdigitaljim | thanks strigazi! | 21:20 |
imdigitaljim | you going to denver? | 21:20 |
strigazi | https://github.com/openstack/magnum/commit/f895b2bd0922f29a9d6b08617cb60258fa101c68#diff-e004adac7f8cb91a28c210e2a8d08ee9 | 21:21 |
strigazi | I'm going yes | 21:21 |
imdigitaljim | lets meet up! | 21:21 |
strigazi | sure thing :) | 21:22 |
strigazi | Is anyone going to work on the polling thing? maybe a longer description first in storyboard? | 21:22 |
flwang1 | strigazi: re https://storyboard.openstack.org/#!/story/2005145 i think you and ricardo proposed this issue before in mailing list | 21:23 |
strigazi | yes, I mentioned this. I discussed it with the keystone team in Dublin | 21:24 |
flwang1 | and IIRC, we need support from keystone side? | 21:24 |
strigazi | there won't be help or change | 21:24 |
strigazi | from the keystone side | 21:24 |
strigazi | 22:11 < strigazi> trusts transfer *won't* happen in keystone, ever | 21:25 |
strigazi | nor for application credentials | 21:25 |
flwang1 | strigazi: so we have to fix it in magnum? | 21:25 |
strigazi | yes | 21:25 |
strigazi | two issues, one is the polling heat issue | 21:25 |
strigazi | 2nd, the cluster inside the cluster must be rotated | 21:26 |
imdigitaljim | creds inside* | 21:26 |
strigazi | we had a design for this in Dublin, but not man power | 21:26 |
strigazi | yes, creds :) | 21:26 |
imdigitaljim | yeah 1) trust on magnum, fixable and 2) trust on cluster, no clear path yet | 21:26 |
strigazi | 2) we have a rotate certificates api with noop | 21:27 |
strigazi | it can rotate the certs and the trust | 21:27 |
strigazi | that was the design | 21:27 |
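The rotate API strigazi refers to is already exposed through the CLI (a no-op in some drivers, as noted); the design from Dublin would extend it to rotate the trust as well as the certificates:

    # rotate the cluster CA/certificates for a cluster; trust rotation is the proposed extension
    openstack coe ca rotate <cluster-name-or-id>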
flwang1 | strigazi: ok, i think we need longer discussion for this one | 21:27 |
imdigitaljim | im more concerned about 1) for the moment which is smaller in scope | 21:27 |
imdigitaljim | 2) might be more challenging and needs more discussion/desing | 21:27 |
imdigitaljim | design | 21:27 |
strigazi | no :) we did it one year ago, someone can implement it :) | 21:27 |
strigazi | I'll bring up the pointer in storyboard | 21:28 |
*** janki has quit IRC | 21:29 | |
strigazi | For the autoscaler, are there any outstanding comments? Can we start pushing the maintainers to accept it? | 21:30 |
flwang1 | strigazi: i'm happy with current status. | 21:30 |
flwang1 | it passed my test | 21:30 |
schaney | strigazi: there are some future enhancements that I am hoping to work with you guys on | 21:31 |
flwang1 | strigazi: so we can/should start to push CA team to merge it | 21:31 |
strigazi | schaney: do you want to leave a comment you are happy with the current state? we can ping the CA team the {'k8s', 'sig', 'openstack'} in some order | 21:32 |
flwang1 | schaney: sure, the /resize api is coming | 21:32 |
schaney | I can leave a comment yeah | 21:34 |
schaney | Are you alright with me including some of the stipulations in the comment? | 21:35 |
schaney | for things like nodegroups, resize, and a couple bugs | 21:35 |
strigazi | schaney: I don't know how it will work for them | 21:35 |
schaney | same, not sure if it's better to get something out there and start iterating | 21:36 |
strigazi | +1 ^^ | 21:36 |
schaney | or try to get it perfect first | 21:36 |
flwang1 | schaney: i would suggest to track them in magnum or open separated issues later, but just my 2c | 21:36 |
imdigitaljim | we'll probably just do PRs against the first iteration | 21:37 |
schaney | track them in magnum vs the autoscaler? | 21:37 |
imdigitaljim | and use issues in autoscaler repo probably | 21:37 |
imdigitaljim | ./shrug | 21:37 |
schaney | yeah, us making PRs to the autoscaler will work for us going forward | 21:38 |
schaney | the current PR has so much going on already | 21:38 |
strigazi | We can focus on the things that work atm, and when it is in, PR in the CA repo are fine | 21:38 |
flwang1 | issues in autoscaler, but don't scare them :) | 21:38 |
flwang1 | strigazi: +1 | 21:39 |
schaney | one question: has tghartland looked into the TemplateNodeInfo interface method implementation? | 21:39 |
strigazi | as long as we agree on the direction | 21:39 |
schaney | I think the current implementation will cause a crash | 21:40 |
imdigitaljim | imho i think we're all heading the same direction | 21:40 |
strigazi | creash on what? | 21:40 |
strigazi | crash on what? why? | 21:40 |
schaney | the autoscaler | 21:40 |
strigazi | is it reproducible? | 21:41 |
schaney | Should be, I am curious as to if you guys have seen it | 21:41 |
strigazi | no | 21:42 |
schaney | I'll double check, but the current implementation should crash 100% of the time when it gets called | 21:42 |
strigazi | it is a specific call that is not implemented? | 21:42 |
schaney | yes | 21:42 |
strigazi | TemplateNodeInfo this > | 21:42 |
schaney | TemplateNodeInfo() | 21:42 |
strigazi | I'll discuss it with him tmr | 21:43 |
schaney | kk sounds good, I think for good faith for the upstream autoscaler guys, we might want to figure that part out | 21:43 |
schaney | before requesting merge | 21:44 |
strigazi | 100% probability of crash should be fixed first | 21:44 |
*** ivve has quit IRC | 21:44 | |
schaney | :) yeah | 21:44 |
strigazi | it is the vm flavor basically? | 21:45 |
schaney | yeah pretty much | 21:45 |
schaney | the autoscaler gets confused when there are no schedulable nodes | 21:46 |
*** alisanhaji has quit IRC | 21:46 | |
schaney | so TemplateNodeInfo() should generate a sample node for a given nodegroup | 21:46 |
strigazi | sounds easy | 21:47 |
schaney | Yeah shouldn't be too bad, just need to fully construct the template node | 21:47 |
strigazi | this however: 'the autoscaler gets confused when there are no schedulable nodes' sounds bad. | 21:48 |
schaney | it tries to run simulations before scaling up | 21:48 |
strigazi | so how it works now? | 21:48 |
schaney | if there are valid nodes, it will use their info in the simulation | 21:49 |
strigazi | it doesn't do any simulations? | 21:49 |
schaney | if there is no valid node, it needs the result of templateNodeInfo | 21:49 |
strigazi | if you can send us a scenario to reproduce, it would help | 21:50 |
schaney | cordon all nodes and put the cluster in a situation to scale up, should show the issue | 21:51 |
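A hedged sketch of the reproduction schaney describes (deployment name and image are arbitrary): cordon every node, then create pending pods so the autoscaler has to simulate a new node from TemplateNodeInfo():

    kubectl get nodes -o name | xargs -n1 kubectl cordon
    kubectl create deployment autoscaler-test --image=nginx
    kubectl scale deployment autoscaler-test --replicas=20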
strigazi | but, won't it create a new node? | 21:51 |
strigazi | I pinged him, he will try tmr | 21:52 |
flwang1 | strigazi: in my testing, it scaled up well | 21:52 |
strigazi | schaney: apart from that, anything else? | 21:52 |
strigazi | to request to merge | 21:52 |
strigazi | flwang1: for me as well | 21:53 |
schaney | I think that was the last crash that I was looking at, everything else will just be tweaking | 21:54 |
strigazi | nice | 21:54 |
schaney | flwang1: to be clear, this issue is only seen when effectively scaling up from 0 | 21:54 |
flwang1 | schaney: i see. i haven't tested that case | 21:55 |
schaney | rare case, but I was just bringing it up since it will cause a crash | 21:55 |
flwang1 | schaney: cool | 21:55 |
strigazi | we can address it | 21:55 |
schaney | awesome | 21:56 |
strigazi | we are almost out of time | 21:58 |
flwang1 | strigazi: rolling upgrade status? | 21:58 |
strigazi | I'll just ask one more time, Can someone look into the CI failures? | 21:58 |
flwang1 | strigazi: i did | 21:59 |
strigazi | flwang1: end the meeting first and then discuss it? | 21:59 |
flwang1 | the current ci failure is related to nested virt | 21:59 |
strigazi | how so? | 21:59 |
flwang1 | strigazi: sure | 21:59 |
flwang1 | i even brought it up in the infra channel | 21:59 |
strigazi | let's end the meeting first | 21:59 |
colin- | see you next time | 21:59 |
strigazi | thanks everyone | 22:00 |
flwang1 | and there is no good way now, seems infra recently upgraded their kernel | 22:00 |
flwang1 | manser may have more inputs | 22:00 |
strigazi | #endmeeting | 22:00 |
*** openstack changes topic to "OpenStack Containers Team" | 22:00 | |
openstack | Meeting ended Tue Mar 5 22:00:33 2019 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 22:00 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/containers/2019/containers.2019-03-05-21.00.html | 22:00 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/containers/2019/containers.2019-03-05-21.00.txt | 22:00 |
openstack | Log: http://eavesdrop.openstack.org/meetings/containers/2019/containers.2019-03-05-21.00.log.html | 22:00 |
strigazi | this thing? http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2019-03-03.log.html#t2019-03-03T20:17:29 | 22:02 |
flwang1 | strigazi: yes | 22:03 |
strigazi | that is why I have the CI non voting | 22:03 |
strigazi | feels more like an indication to me all these years. | 22:03 |
flwang1 | strigazi: yep, nest virt is still a pain | 22:04 |
strigazi | no problems with centos here | 22:04 |
strigazi | anyway, | 22:04 |
flwang1 | maybe we should migrate to FA 29 to try? | 22:04 |
flwang1 | did you get any luck on FA 29? it failed in my testing | 22:05 |
strigazi | at cern we use it | 22:05 |
flwang1 | which k8s version? | 22:05 |
strigazi | I didn't have time for testing it in devstack | 22:05 |
flwang1 | from community, no change? | 22:05 |
strigazi | no change | 22:05 |
flwang1 | ok | 22:05 |
strigazi | 1.13.3 and 1.12.4 | 22:06 |
flwang1 | cool | 22:06 |
strigazi | overlay2, no extra volumes | 22:06 |
flwang1 | ok | 22:06 |
flwang1 | btw, i have already proposed the patch for api ref of resize API https://review.openstack.org/639882 | 22:07 |
flwang1 | and the health_status patch in cluster listing patch is here https://review.openstack.org/640222 | 22:07 |
strigazi | i have seen them | 22:08 |
strigazi | missed the api-ref | 22:08 |
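For reference, the resize call documented in that api-ref patch looks roughly like the following; field names should be checked against https://review.openstack.org/639882, and the endpoint, token, and IDs are placeholders:

    curl -X POST "$MAGNUM_ENDPOINT/v1/clusters/<cluster-id>/actions/resize" \
        -H "X-Auth-Token: $OS_TOKEN" \
        -H "Content-Type: application/json" \
        -H "OpenStack-API-Version: container-infra latest" \
        -d '{"node_count": 4}'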
strigazi | For upgrades, I'm working in the driver code. | 22:09 |
strigazi | Do you want to take the api? | 22:09 |
flwang1 | yep, i can help polish the api patch, and the api ref | 22:09 |
strigazi | The only part that needs work in the api is: | 22:09 |
flwang1 | as for api, do you want to use the same way I'm using for resize api? | 22:10 |
strigazi | i think they are the same, no? | 22:10 |
strigazi | last time I checked it was | 22:10 |
flwang1 | they should be same, a little bit diff between your current one and mine | 22:10 |
strigazi | ok | 22:10 |
strigazi | the only part that needs some thought is | 22:11 |
strigazi | clusterA used CT-A to be created | 22:11 |
strigazi | CT-A had labels X Y and Z | 22:11 |
flwang1 | labels merging issue? | 22:11 |
strigazi | yes | 22:12 |
strigazi | I thought of a config option to check if some labels are going to be changed, and in such a case refuse to upgrade or even create a cluster | 22:13 |
flwang1 | can we do simple merge now like ricardo and i discussed? | 22:14 |
jakeyip | is this to do with https://review.openstack.org/#/c/621611/ ? | 22:15 |
strigazi | no | 22:16 |
strigazi | flwang1: were you discussed it? | 22:16 |
strigazi | flwang1: where did you discuss it? | 22:16 |
flwang1 | https://review.openstack.org/#/c/621611/ | 22:16 |
flwang1 | we discussed similar issue in above patch | 22:16 |
strigazi | this is for cluster create | 22:16 |
flwang1 | yes, but similar issue | 22:17 |
strigazi | almost but not | 22:17 |
flwang1 | i mean, we probably want to use same policy for merging to avoid confusing users | 22:17 |
flwang1 | yes i know | 22:17 |
strigazi | shall i explain it or not? | 22:17 |
strigazi | in one loine | 22:18 |
strigazi | in one line | 22:18 |
strigazi | user in cluster creation selected version 5, the new CT to upgrade has version 4, what do you do? | 22:18 |
strigazi | downgrade? | 22:18 |
flwang1 | for a version of an addon? | 22:19 |
strigazi | based on 621611 yes, downgrade | 22:19 |
strigazi | addon or k8s | 22:19 |
jakeyip | cluster creation label should override CT label? | 22:19 |
strigazi | jakeyip: yes, for creation. for upgrade? | 22:20 |
strigazi | as an admin, I don't want to support users that go rogue | 22:20 |
jakeyip | btw I was just reviewing 621611 last night and I felt quite uneasy about it, prob cos there are many ways like this where it's going to be weird | 22:20 |
flwang1 | we should support downgrade, but better not now? | 22:20 |
strigazi | support downgrade from user selected version, to admin suggested? | 22:21 |
strigazi | this is asking for trouble | 22:21 |
strigazi | the matrix of version explodes this way | 22:21 |
flwang1 | yes, so i think we don't have to do it now, maybe even in future | 22:22 |
jakeyip | sorry I am a bit lost where is the change for this functionality? (CT label update and cluster upgrade) | 22:23 |
strigazi | downgrading is bad | 22:23 |
strigazi | jakeyip: there is no change for this yet | 22:24 |
jakeyip | strigazi: I see. are we thinking of using update on CT to update clusters? | 22:24 |
strigazi | the problem is that users can select labels in cluster creation, then with a CT they will try to upgrade and there will be conflicts | 22:25 |
strigazi | jakeyip: yes | 22:25 |
flwang1 | strigazi: yep, labels is a pain, we can only support base image upgrade and k8s upgrade | 22:26 |
flwang1 | we need more discussion about labels, i mean i need more thinking | 22:26 |
strigazi | we need to support upgrading add-ons too. but even for k8s we should discourage users from picking the version in cluster creation | 22:27 |
strigazi | let's see next week about it | 22:27 |
flwang1 | strigazi: that's why i mentioned before, we probably need another attributes for template | 22:27 |
flwang1 | which can indicate the compatibility with new/old versions | 22:28 |
*** sdake has quit IRC | 22:28 | |
strigazi | this week you can pick the API patch and I continue with the driver | 22:28 |
flwang1 | for example, CT-1.11.2 has attribute can_upgrade_to ['1.12.4', '1.13.4'] | 22:28 |
flwang1 | strigazi: no problem | 22:28 |
strigazi | cool | 22:29 |
*** sdake has joined #openstack-containers | 22:29 | |
*** sdake has quit IRC | 22:29 | |
flwang1 | strigazi: thank you, my frined | 22:29 |
flwang1 | friend | 22:29 |
strigazi | jakeyip: do you need help with anything? | 22:29 |
strigazi | flwang1: thank you | 22:29 |
strigazi | flwang1: you do too much for the project | 22:30 |
jakeyip | yes, maybe https://review.openstack.org/#/c/638077/ ? | 22:30 |
jakeyip | Pardon me, I am a newbie, but I feel like the CT to Clusters relationship needs to be defined a bit better? | 22:31 |
flwang1 | strigazi: btw, as for https://review.openstack.org/640211 | 22:31 |
jakeyip | if they are tightly coupled then updating CT is going to be very scary, for both operators and users | 22:31 |
strigazi | jakeyip: ack for 638077 | 22:31 |
strigazi | jakeyip: we can limit the access to CTs for users | 22:32 |
flwang1 | i changed my mind, i would like to show the health_status_reason by default, cause the api has returned everything, we don't have to ask the user to add --detailed again to trigger another api call to see the health_status_reason, thoughts? | 22:32 |
strigazi | jakeyip: and give less freedom in labels in cluster creation | 22:32 |
jakeyip | I feel like having CT just acting like a template is good, it prefills fields that you can override | 22:33 |
strigazi | flwang1: hmm, i'm only concerned for very big clusters | 22:33 |
flwang1 | strigazi: that's rare case, no? | 22:33 |
strigazi | what is the limit of the field in the db? | 22:33 |
flwang1 | maybe common in cern | 22:34 |
strigazi | for us more than 500 nodes is a bit rare | 22:34 |
strigazi | but a few 100s is not | 22:34 |
strigazi | we can take it :) | 22:34 |
flwang1 | strigazi: you can download the patch and give it a try | 22:35 |
strigazi | since the info is in the db anyway | 22:35 |
strigazi | yes | 22:35 |
flwang1 | if it's really a pain, i'm open to support --detailed | 22:35 |
strigazi | cool | 22:35 |
*** sapd1 has joined #openstack-containers | 22:36 | |
jakeyip | strigazi: I, as an operator, will feel uneasy updating a CT that half the clusters in my cloud depend on. So maybe I won't do it and just create new CTs. Negating the whole benefit. | 22:36 |
flwang1 | i'm good, sorry for a lot of pushing | 22:36 |
strigazi | won't be needed probably, I'll add Ricardo to the review too | 22:36 |
strigazi | jakeyip it is not possible to update used CT and it won't be | 22:37 |
flwang1 | jakeyip: i can see your pain, but just like images, making CTs immutable also has good sides | 22:37 |
*** sdake has joined #openstack-containers | 22:37 | |
strigazi | jakeyip: did you think that CTs will become mutable? they will continue to be immutable | 22:38 |
jakeyip | strigazi: sorry I thought we are talking about a updating labels on a CT to trigger upgrade on a cluster? | 22:38 |
jakeyip | ok. phew | 22:38 |
strigazi | jakeyip: selecting a new CT triggers upgrade | 22:38 |
jakeyip | strigazi: thanks for the clarification! | 22:39 |
strigazi | cool, I have to go guys | 22:39 |
strigazi | see you around | 22:40 |
strigazi | thanks flwang1 jakeyip for all the work | 22:40 |
brtknr | Sorry, i enjoyed my observer role, like to stay in the loop! good night | 22:40 |
jakeyip | see you thanks strigazi as always | 22:40 |
flwang1 | strigazi: see you | 22:40 |
*** andrein has quit IRC | 22:40 | |
strigazi | brtknr: cheers | 22:41 |
strigazi | bye | 22:41 |
imdigitaljim | flwang1 we've moved long past having CT blocking issues for upgrades fwiw | 22:42 |
imdigitaljim | id be glad to share more recent updates to centos driver | 22:42 |
imdigitaljim | we want a few last things done and a few of my team is going to scour the code and ready it for upstreaming | 22:42 |
brtknr | imdigitaljim: are you using centos atomic or vanilla centos? | 22:42 |
imdigitaljim | centos 7.6 | 22:42 |
brtknr | atomic or not? | 22:43 |
imdigitaljim | nope | 22:43 |
brtknr | with magnum? | 22:43 |
imdigitaljim | yes | 22:43 |
imdigitaljim | this driver can be fairly easily adapted to ubuntu as well | 22:44 |
imdigitaljim | and the like | 22:44 |
brtknr | thats cool! i often get questions about whether that is possible | 22:44 |
imdigitaljim | oh for sure it is | 22:44 |
imdigitaljim | ill ping you when we start uploading the driver | 22:44 |
imdigitaljim | it works differently than fedoras | 22:44 |
imdigitaljim | but its still executed the same way if that makes sense | 22:44 |
brtknr | do you have kube* services running as containers? | 22:44 |
imdigitaljim | yes | 22:44 |
*** sdake has quit IRC | 22:45 | |
brtknr | cool! | 22:45 |
imdigitaljim | we also rely on github for versioning | 22:45 |
imdigitaljim | so when you have a cluster you know what git revision of the cluster it was in case you need to know how it was bootstrapped | 22:45 |
imdigitaljim | and providing newer versions is ezpz | 22:45 |
imdigitaljim | upgrades in place are done through an api call + heat agent + kubernetes deployment | 22:46 |
brtknr | thats one way of doing it! | 22:46 |
imdigitaljim | so in other words we have a repo for magnum and a repo for the bootstrapping kubernetes content | 22:46 |
imdigitaljim | that we handle separately | 22:46 |
imdigitaljim | we hardly ever update magnum's code | 22:46 |
imdigitaljim | ill be glad to provide documentation on it and how it works for everything | 22:47 |
jakeyip | how do you point k8s to the config repo ? | 22:47 |
brtknr | so you dont go via kubeadm ? | 22:47 |
imdigitaljim | when that time comes | 22:47 |
imdigitaljim | no but we've considered adapting that approach | 22:47 |
jakeyip | that's nice in a way. good for power users. | 22:47 |
imdigitaljim | we do it the hardway since we have more control | 22:48 |
imdigitaljim | (i.e similar to fedora atomics) | 22:48 |
imdigitaljim | btw we could use any git repo | 22:48 |
imdigitaljim | even github.com | 22:48 |
brtknr | can you upgrade docker in place too? inside containers? | 22:48 |
imdigitaljim | youd provide that bootstrapping endpoint in the config file | 22:48 |
brtknr | *inside vms | 22:48 |
imdigitaljim | when we need to upgrade docker | 22:48 |
imdigitaljim | we just do a rolling scale | 22:49 |
imdigitaljim | rolling/update | 22:49 |
imdigitaljim | but we actually have puppet connected | 22:49 |
imdigitaljim | which can do it too | 22:49 |
imdigitaljim | but thats not part of the requirement for upstream content | 22:49 |
imdigitaljim | puppet is not a dependency | 22:49 |
brtknr | do you replace the image with new docker version or update the package itself? | 22:49 |
imdigitaljim | so for example | 22:50 |
imdigitaljim | the image in the template is centos-magnum | 22:50 |
imdigitaljim | we update the image with a new centos-magnum with updated docker | 22:50 |
imdigitaljim | and when nodes are scaled in/out they come up with new version of centos | 22:50 |
imdigitaljim | and/or docker | 22:50 |
imdigitaljim | and any additional software upgrades | 22:50 |
*** sdake has joined #openstack-containers | 22:51 | |
imdigitaljim | based on a little git management and using the git api | 22:51 |
imdigitaljim | we can provide any versions of kubernetes | 22:51 |
imdigitaljim | (that we want to support) | 22:51 |
brtknr | so rolling scale = -1 old instance +1 new instance? | 22:51 |
imdigitaljim | so we support v1.12.1 -> v1.13.4 | 22:51 |
imdigitaljim | and slowly move up from there as people dont have older clusters | 22:52 |
imdigitaljim | yeah | 22:52 |
brtknr | doesnt that just replace the nths node | 22:52 |
brtknr | n_th node* | 22:52 |
*** munimeha1 has quit IRC | 22:52 | |
imdigitaljim | minions_to_remove=1...N | 22:52 |
imdigitaljim | masters_to_remove=1...N | 22:53 |
imdigitaljim | just make that call | 22:53 |
imdigitaljim | until all minions/masters are cycled | 22:53 |
imdigitaljim | thats how we execute the inplace upgrades mostly too | 22:53 |
jakeyip | this is via kubectl or ? | 22:53 |
imdigitaljim | automatic | 22:53 |
imdigitaljim | heat-api call essentially though | 22:53 |
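A hedged sketch of the kind of heat call described here, assuming the parameter names used by the stock magnum templates (minions_to_remove, number_of_minions); imdigitaljim's custom driver may differ:

    # remove one named/indexed minion and shrink the count by one, repeated until all nodes are cycled
    openstack stack update <stack-id> --existing \
        --parameter minions_to_remove=<minion-index> \
        --parameter number_of_minions=<current-count-minus-1>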
imdigitaljim | we have a kubernetes deployment that gets put on the cluster | 22:54 |
imdigitaljim | that cycles them | 22:54 |
imdigitaljim | the cluster manages itself | 22:54 |
imdigitaljim | it does a drain + kill | 22:55 |
imdigitaljim | so no loss of service | 22:55 |
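The drain + kill step on each node presumably looks something like this (flag names per kubectl of that era; the node name is a placeholder):

    kubectl drain <node-name> --ignore-daemonsets --delete-local-data
    # once drained, the node is removed and replaced via the stack update sketched above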
imdigitaljim | we might iterate and if you have enough capacity | 22:55 |
imdigitaljim | you could grow your cluster N -> 2N | 22:55 |
jakeyip | nice. one thing I'm confused when we were talking about a new image, is that a docker image or glance image? | 22:55 |
imdigitaljim | then back down to N canning the old nodes | 22:55 |
imdigitaljim | ah yeah | 22:56 |
imdigitaljim | glance image | 22:56 |
imdigitaljim | centos-magnum is the glance image defined in the template | 22:56 |
imdigitaljim | and we upgrade that | 22:56 |
jakeyip | ok so heat-api to do a rebuild, or is that a new nova instance ? | 22:56 |
imdigitaljim | so new nodes come up with the upgrades | 22:56 |
brtknr | so the glance image name is important | 22:56 |
jakeyip | I guess N -> 2N is a new nova instance | 22:56 |
brtknr | there cannot be duplicate centos-magnum images? | 22:56 |
imdigitaljim | we dont need multiple images | 22:57 |
imdigitaljim | for our case | 22:57 |
imdigitaljim | just 1 | 22:57 |
brtknr | what i mean is, old and new version keep the same name | 22:57 |
imdigitaljim | if youre creating a cluster or upgrading nodes | 22:57 |
brtknr | ? | 22:57 |
imdigitaljim | we delete old | 22:57 |
imdigitaljim | or rename old | 22:57 |
imdigitaljim | and provide new | 22:57 |
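A minimal sketch of that rename-then-replace flow with the openstack CLI (image names and file are placeholders):

    # keep the template's image name stable while rotating the underlying glance image
    openstack image set --name centos-magnum-old centos-magnum
    openstack image create --disk-format qcow2 --container-format bare \
        --file centos-magnum-new.qcow2 centos-magnum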
brtknr | ok cool | 22:57 |
imdigitaljim | theres only ever 1 by that name | 22:58 |
imdigitaljim | so its always okay | 22:58 |
imdigitaljim | :D | 22:58 |
brtknr | =D | 22:58 |
brtknr | i like the way docker solves this problem by using tag | 22:58 |
brtknr | image hash can be different but tag is always the same | 22:59 |
imdigitaljim | yeah | 22:59 |
brtknr | glance could benefit from something similar | 22:59 |
imdigitaljim | basically thats how we treat the glance image | 22:59 |
imdigitaljim | but yeah | 22:59 |
imdigitaljim | its not by default | 22:59 |
imdigitaljim | because when you provide same tag in docker | 22:59 |
imdigitaljim | it detags the old one | 23:00 |
imdigitaljim | which is what we do | 23:00 |
imdigitaljim | heh | 23:00 |
brtknr | are you also doing federations? | 23:00 |
imdigitaljim | we do keep it for a while | 23:00 |
imdigitaljim | not yet but thats to come | 23:00 |
imdigitaljim | well push upstream probably after inplace upgrades is fully completed | 23:00 |
brtknr | with a mix of gpu/non-gpu nodes in the same cluster? | 23:00 |
imdigitaljim | its mostly like a alpha/beta level maturity | 23:00 |
imdigitaljim | ah yeah we'd also need to probably get hte nodegroups in place for upstream since the fedora guys all want it | 23:01 |
imdigitaljim | we dont use it here | 23:01 |
imdigitaljim | but we know its important | 23:01 |
brtknr | hte? | 23:01 |
imdigitaljim | the* | 23:01 |
jakeyip | imdigitaljim: so are you still using magnum? | 23:02 |
imdigitaljim | yup | 23:02 |
brtknr | lol^ | 23:02 |
imdigitaljim | core magnum is almost the same (tiny changes that we'd upstream) | 23:02 |
jakeyip | lol I don't mean it that way | 23:02 |
*** sdake has quit IRC | 23:02 | |
jakeyip | just for 1st provisioning? | 23:02 |
imdigitaljim | and everything else is a driver change | 23:02 |
imdigitaljim | magnum =/= fedora atomic k8s for us | 23:02 |
imdigitaljim | if thats what you mean | 23:02 |
jakeyip | I am thinking everything like node-count / image is going to be out of sync with your approach | 23:03 |
imdigitaljim | magnum is basically just a CRUD service | 23:03 |
imdigitaljim | when flwang fixes the api | 23:03 |
imdigitaljim | it wont be | 23:03 |
imdigitaljim | but atm yes it only reflects create time | 23:03 |
* brtknr googles CRUD | 23:03 | |
brtknr | oh i see | 23:04 |
imdigitaljim | create read update delete | 23:04 |
imdigitaljim | the image doesnt get out of date | 23:04 |
imdigitaljim | the node-count does | 23:04 |
jakeyip | so it might be that magnum thinks node-count is 2 but heat actually says 4? then what happens when flwang's api updates node-count? | 23:04 |
imdigitaljim | we were also thinking of a better way than polling heat | 23:04 |
imdigitaljim | but see if we can interact with a rabbitmq or something | 23:05 |
imdigitaljim | heat <-> magnum relationship is pretty close anyways | 23:05 |
imdigitaljim | jakeyip | 23:05 |
imdigitaljim | when any scaling happens we'd use magnum's api instead of heats | 23:05 |
imdigitaljim | and then magnum will update itself as expected | 23:05 |
imdigitaljim | but mostly magnum just proxies requests to heat anyways | 23:06 |
jakeyip | ok but for current clusters they'll be out of sync and someone needs to fix them up again I think? | 23:06 |
imdigitaljim | how so? | 23:06 |
jakeyip | the upgrade workflow you were mentioning about adds new nodes? | 23:07 |
imdigitaljim | we dont get out-of-sync issues | 23:08 |
imdigitaljim | so maybe ive already solved that and hadnt explained it | 23:08 |
jakeyip | is node-count going to be eventually consistent with what's defined in magnum, after the upgrade ? | 23:09 |
imdigitaljim | oh you mean the N->2N thing | 23:13 |
imdigitaljim | we dont do that now | 23:13 |
imdigitaljim | that was just an idea for a more optimal upgrade | 23:13 |
imdigitaljim | but with the extra resource requirement | 23:13 |
jakeyip | I see. thanks for the clarification! | 23:13 |
jakeyip | would love to see your implementation when it is possible! | 23:14 |
brtknr | Me too | 23:31 |
*** sdake has joined #openstack-containers | 23:33 | |
imdigitaljim | yeah im looking forward to submitting it and would love to have some additional users | 23:56 |