openstackgerrit | Feilong Wang proposed openstack/magnum master: [k8s] Update cluster health status by native API https://review.openstack.org/572897 | 00:00 |
*** hongbin has joined #openstack-containers | 00:26 | |
*** livelace has quit IRC | 00:43 | |
*** livelace has joined #openstack-containers | 00:43 | |
*** janki has joined #openstack-containers | 01:28 | |
*** ricolin has joined #openstack-containers | 01:33 | |
*** Bhujay has joined #openstack-containers | 02:00 | |
openstackgerrit | Feilong Wang proposed openstack/magnum master: [k8s] Update cluster health status by native API https://review.openstack.org/572897 | 02:27 |
*** janki has quit IRC | 03:06 | |
*** Bhujay has quit IRC | 03:24 | |
*** Nel1x has quit IRC | 03:34 | |
*** ramishra has joined #openstack-containers | 03:40 | |
*** ykarel has joined #openstack-containers | 04:00 | |
*** udesale has joined #openstack-containers | 04:00 | |
*** ykarel has quit IRC | 04:06 | |
*** ykarel has joined #openstack-containers | 04:07 | |
*** hongbin has quit IRC | 04:10 | |
*** janki has joined #openstack-containers | 04:27 | |
*** Bhujay has joined #openstack-containers | 04:45 | |
*** Bhujay has quit IRC | 04:51 | |
*** Bhujay has joined #openstack-containers | 04:53 | |
*** flwang1 has quit IRC | 04:54 | |
*** ykarel has quit IRC | 05:07 | |
*** ykarel has joined #openstack-containers | 05:25 | |
*** janki has quit IRC | 06:06 | |
*** rcernin has quit IRC | 06:38 | |
*** rcernin has joined #openstack-containers | 06:41 | |
*** pcaruana has joined #openstack-containers | 06:42 | |
*** adrianc has joined #openstack-containers | 06:51 | |
*** rcernin has quit IRC | 06:51 | |
strigazi | ykarel: can you have a look: 592336 | 07:11 |
strigazi | imdigitaljim: http://paste.openstack.org/show/728455/ looks nice | 07:12 |
*** mattgo has joined #openstack-containers | 07:39 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack/magnum-ui master: Imported Translations from Zanata https://review.openstack.org/594054 | 07:47 |
ykarel | strigazi, okk will check | 07:49 |
ykarel | strigazi, there are two test cases failing, is that a known issue? | 07:52 |
ykarel | No API token found for service account "default", retry after the token is automatically created and added to the service account | 07:52 |
strigazi | ykarel in the functional tests? | 07:54 |
ykarel | strigazi, yes, and the other is TypeError: delete_namespaced_service() takes exactly 4 arguments, which seems related to the kubernetes client version | 07:54 |
strigazi | ykarel this is known ^^ | 07:54 |
strigazi | the other must be happening because it tries to create something too quickly | 07:55 |
ykarel | strigazi, okk, +2 +W, for the known issue is there a patch already? | 07:56 |
strigazi | ykarel: no. It needs a change in the params of the client. | 07:57 |
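That TypeError is the usual signature drift in the python kubernetes client: some releases require a `body` argument on namespaced delete calls and others don't. A hedged sketch of a call that satisfies the stricter signature (service name is hypothetical; this is not the actual functional-test code):

```python
from kubernetes import client, config

config.load_kube_config()  # assumes a local kubeconfig is available
core_v1 = client.CoreV1Api()

# passing DeleteOptions explicitly keeps the call valid on client releases
# where `body` is a required positional argument
core_v1.delete_namespaced_service(
    name='my-service',       # hypothetical service name
    namespace='default',
    body=client.V1DeleteOptions(),
)
```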
ykarel | strigazi, ack | 07:58 |
strigazi | ykarel: where do you see this? No API token found for service account "default", retry after the token is automatically created and added to the service account | 08:03 |
ykarel | http://logs.openstack.org/36/592336/3/check/magnum-functional-k8s/2c671f0/job-output.txt.gz#_2018-08-20_12_01_02_553214 | 08:03 |
ykarel | strigazi, ^^ | 08:03 |
*** yankcrime has joined #openstack-containers | 08:11 | |
*** suanand has joined #openstack-containers | 08:28 | |
*** olivenwk has joined #openstack-containers | 08:38 | |
*** flwang1 has joined #openstack-containers | 08:43 | |
openstackgerrit | Merged openstack/magnum master: [k8s] Add proxy to master and set cluster-cidr https://review.openstack.org/592336 | 08:46 |
flwang1 | strigazi: around for a quick sync? | 08:48 |
strigazi | flwang1: I'm going to a 1 hour meeting :( | 08:49 |
flwang1 | strigazi: no problem | 08:50 |
openstackgerrit | Feilong Wang proposed openstack/magnum master: [k8s] Update cluster health status by native API https://review.openstack.org/572897 | 08:50 |
flwang1 | strigazi: i can manage to get the versioned object working, but it doesn't play well with the wsme type system | 08:50 |
flwang1 | the wsme types can't support CoercedDict well | 08:51 |
flwang1 | as a result, in the api response, users can't see the 2nd layer dict of the health_status_reason | 08:53 |
*** adrianc has quit IRC | 10:00 | |
*** adrianc has joined #openstack-containers | 10:09 | |
openstackgerrit | suzhengwei proposed openstack/magnum master: service init or heartbeat without hostname https://review.openstack.org/594119 | 10:20 |
strigazi | flwang1: ping | 10:24 |
*** Bhujay has quit IRC | 10:26 | |
*** ykarel is now known as ykarel|lunch | 10:29 | |
flwang1 | strigazi: yes | 10:46 |
strigazi | flwang1: sync? | 10:46 |
flwang1 | strigazi: let's do it | 10:47 |
flwang1 | except rolling upgrade and health monitoring, anything else we want to get in rocky | 10:47 |
strigazi | I have a minor one, that came up today. It is for swarm-mode | 10:48 |
strigazi | Make the default overlay network CIDR of swarm-mode configurable | 10:49 |
strigazi | It conflicts with some of our public ips | 10:49 |
strigazi | The current default one | 10:49 |
strigazi | I guess you are not interested in swarm | 10:50 |
flwang1 | strigazi: yep, you're right | 10:50 |
flwang1 | so let's just focus on the upgrade one and cluster health monitoring? | 10:51 |
strigazi | yes | 10:51 |
flwang1 | currently, the health monitoring just works, but as i mentioned above, i'm dealing with the wsme types and oslo.versionedobjects | 10:51 |
strigazi | link to the code? | 10:52 |
flwang1 | https://review.openstack.org/572897 you mean patch link? | 10:52 |
strigazi | and file | 10:52 |
strigazi | https://review.openstack.org/#/c/572897/8/magnum/api/controllers/v1/cluster.py@141 | 10:52 |
flwang1 | https://review.openstack.org/#/c/572897/8/magnum/api/controllers/v1/cluster.py@141 | 10:52 |
flwang1 | yep | 10:53 |
flwang1 | i'm still testing to figure out the correct way to display the 2 layers nested dict | 10:53 |
strigazi | in the client? | 10:53 |
flwang1 | client does nothing, the error comes from server side | 10:54 |
strigazi | The api can return a valid json | 10:54 |
flwang1 | but we may need a client change to show the two new fields in the table | 10:54 |
flwang1 | api can't return a valid json | 10:54 |
strigazi | in a string | 10:55 |
strigazi | if it can't, right | 10:55 |
flwang1 | https://review.openstack.org/#/c/572897/8/magnum/drivers/common/k8s_monitor.py@196 | 10:56 |
flwang1 | this is current data structure of the health_status_reason | 10:56 |
flwang1 | with current structure, we can basically provide all the info we got from k8s to the cluster auto healer | 10:56 |
strigazi | looks good | 10:56 |
strigazi | Is there something that we're missing now? can we take this? https://review.openstack.org/#/c/570818/11/magnum/api/controllers/v1/cluster.py | 10:59 |
flwang1 | no, that patch will be updated in favor of the new health status reason structure | 11:00 |
flwang1 | never mind, i will figure out | 11:00 |
flwang1 | i just need your opinions about the whole workflow and the info we can provide with the health_status_reason | 11:01 |
flwang1 | and it would be nice if Ricardo can review it as well | 11:02 |
strigazi | I'm a little lost on the dependency of the patches. Do we need to decide on 572897 and then update 570818 ? | 11:02 |
strigazi | With the current state | 11:03 |
strigazi | we will have another periodic task, the one you created sync_cluster_health_status | 11:04 |
strigazi | that will check the _COMPLETE clusters | 11:05 |
strigazi | or even the ones with the statuses you listed | 11:05 |
strigazi | and it will update the status accordingly | 11:05 |
strigazi | if the api doesn't return ok or if any node is not ready the cluster will be unhealthy | 11:06 |
strigazi | makes sense? | 11:06 |
strigazi | flwang1: ^^ | 11:06 |
flwang1 | after figure out the working structure, i will update 570818 | 11:06 |
flwang1 | firstly | 11:06 |
flwang1 | yep, as i mentioned in the design policy, if any node or api is not in good status, then the overall cluster is unhealthy | 11:07 |
flwang1 | we can improve the algorithm later, for the first version, i'd like to make it strict | 11:07 |
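As a sketch of that strict first-version rule (a hypothetical helper, not the code under review): the cluster is healthy only when the API answers and every node reports Ready.

```python
def compute_health_status(api_ok, nodes_ready):
    """Strict rule: everything must be healthy, else the cluster isn't.

    api_ok: bool, whether the k8s API health check passed
    nodes_ready: dict mapping node name -> bool (the Ready condition)
    """
    if api_ok and all(nodes_ready.values()):
        return 'HEALTHY'
    return 'UNHEALTHY'
```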
strigazi | Isn't this strict enough? How can it be stricter? | 11:08 |
flwang1 | if you do have concern, we can hide the health_status and health_status_reason attributes for now | 11:08 |
strigazi | I don't have a concern about it | 11:09 |
flwang1 | i didn't verify, but for example, one of the nodes may have disk pressure, but it's still in ready status | 11:09 |
flwang1 | something like that | 11:09 |
strigazi | I think we can base the node status only on the Ready field | 11:09 |
flwang1 | because currently, i'm just using the 'Ready' to represent the health status of minion node | 11:10 |
*** ykarel|lunch is now known as ykarel | 11:10 | |
flwang1 | yep, that's my current design | 11:10 |
flwang1 | but i do want to provide all the conditions of the node for reference | 11:10 |
flwang1 | hence why i'm making the health_status_reason's data structure a little bit 'rich' | 11:11 |
strigazi | ok | 11:11 |
strigazi | not bad | 11:11 |
strigazi | what we can do | 11:11 |
strigazi | is leave the reason empty if it is healthy | 11:11 |
strigazi | and there are no 'issues' | 11:12 |
flwang1 | for the worst case, we can make the health_status_reason as a very simple dict | 11:13 |
*** Bhujay has joined #openstack-containers | 11:13 | |
flwang1 | e.g. if cluster is UNHEALTHY, then health_status_reason = {"node-0.Ready": False} | 11:14 |
flwang1 | or something like that | 11:14 |
strigazi | I think it is getting complicated like this. Complicated on the server side | 11:15 |
flwang1 | given it's a dict and only magnum internal auto healer will parse it, we should be OK | 11:15 |
flwang1 | yep | 11:15 |
flwang1 | i know | 11:15 |
flwang1 | we have to balance | 11:15 |
strigazi | As a first step, the simpler-to-implement-and-maintain solution sounds ideal to me | 11:16 |
flwang1 | sure, i will continue to investigate and discuss with you later | 11:18 |
strigazi | wait a moment | 11:18 |
strigazi | I'm still not sure what we haven't decided yet. It seems to me that only the health_status_reason field is not clear right? | 11:19 |
strigazi | flwang1: ^^ | 11:20 |
flwang1 | yes | 11:20 |
flwang1 | otherwise, it just works for me | 11:20 |
strigazi | So the missing part is nested vs not nested dict | 11:21 |
flwang1 | yep | 11:22 |
strigazi | Doesn't nested dict work? | 11:23 |
flwang1 | with health_status_reason = wtypes.DictType(str, o_fields.CoercedDict) | 11:23 |
flwang1 | the response will be "health_status_reason": {"k8scluster-wnd2jvqdmci3-master-0": {}, "api": {}, "k8scluster-wnd2jvqdmci3-minion-0": {}}, "user_id": "6bc2f37c4c424182967b51386270ec1c", "uuid": "cfd56d2b-73f1-4f27-a007-dc3473b681ee", "api_address": "https://172.24.4.19:6443", "master_addresses": ["172.24.4.19"], "node_count": 1, "project_id": "116161cb5f384bfa80c21b6ab0bff625", "status": "CREATE_COMPLETE", "docker_volume_si | 11:24 |
flwang1 | as you can see, the 2nd layer dict is {} | 11:25 |
*** udesale has quit IRC | 11:25 | |
flwang1 | but it's stored correctly in magnum db | 11:25 |
strigazi | So, wsme incompatibility | 11:25 |
flwang1 | we just need to figure out how to make wsme and oslo.versionedobjects work together as we want | 11:25 |
flwang1 | yep, if we can call it 'incompatibility' | 11:26 |
flwang1 | wsme just can't handle the CoercedDict type from oslo.versionedobject | 11:26 |
flwang1 | maybe we need a customized wsme type | 11:26 |
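One way such a custom type could look (a minimal sketch built on wsme's UserType conversion hooks; `NestedDictType` is hypothetical, not magnum code): round-trip the nested dict through JSON so wsme only ever handles text, matching the "in a string" idea above.

```python
from oslo_serialization import jsonutils
import wsme.types as wtypes

class NestedDictType(wtypes.UserType):
    """Hypothetical wsme user type serializing nested dicts as JSON text."""
    basetype = wtypes.text
    name = 'nesteddict'

    def tobasetype(self, value):
        # dict -> JSON text for the API response
        return jsonutils.dumps(value) if value is not None else None

    def frombasetype(self, value):
        # JSON text -> dict when reading input back in
        return jsonutils.loads(value) if value is not None else None

# usage sketch on the API controller:
# health_status_reason = wsme.wsattr(NestedDictType(), readonly=True)
```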
strigazi | that is one option | 11:27 |
strigazi | the other is to go for non-nested dict | 11:27 |
flwang1 | yep | 11:28 |
flwang1 | as we discussed above | 11:28 |
flwang1 | but a non-nested dict may make the data look weird | 11:28 |
flwang1 | will dig | 11:28 |
strigazi | IMO, it is not bad to go for the flat dict initially | 11:29 |
flwang1 | ok, that's my last resort | 11:30 |
flwang1 | are we on the same page now for the health monitoring? | 11:32 |
strigazi | yes | 11:32 |
strigazi | I would go for the flat option | 11:32 |
strigazi | fyi | 11:32 |
flwang1 | btw, i'd like to totally drop the https://github.com/openstack/magnum/blob/master/magnum/drivers/common/k8s_monitor.py#L41 in Stein | 11:33 |
flwang1 | it's useless | 11:33 |
strigazi | +1 | 11:33 |
strigazi | Should we also emit a notification when a cluster is not healthy? | 11:34 |
flwang1 | pls revisit https://review.openstack.org/572249 and let's merge it and backport to Rocky | 11:34 |
flwang1 | then we can drop the pull_data in Stein | 11:34 |
strigazi | ok | 11:34 |
flwang1 | strigazi: it's good to have but i'd like to wait until there is a real user requirement | 11:35 |
strigazi | You don't use notifications? | 11:35 |
flwang1 | we use, but don't consume it much TBH | 11:35 |
strigazi | We do, for magnum we are working on it | 11:36 |
strigazi | but for heat it looks great | 11:36 |
flwang1 | ok, cool, then we can have | 11:36 |
strigazi | eg | 11:36 |
flwang1 | use zaqar queue for status update? | 11:36 |
strigazi | in the weekend maybe some nodes went unhealthy and then healthy again | 11:37 |
flwang1 | ah, right | 11:37 |
strigazi | which status update? | 11:37 |
strigazi | We don't have zaqar | 11:37 |
flwang1 | nevermind then | 11:37 |
flwang1 | back to flat dict, then how do we define the key | 11:38 |
flwang1 | for api, we can just use "api": "ok", but how about minion nodes? | 11:38 |
strigazi | it needs to be one right? | 11:39 |
flwang1 | using something like "minion-node-1.Ready": False? | 11:39 |
strigazi | Can we have: {"api": ok, "minion-node-1.Ready": False, "minion-node-2.Ready": True} | 11:39 |
flwang1 | {"api": "ok", "node-0.Ready": True, "node-0.OutOfDisk": False, "node-1.Ready": True, "node-1.OutOfDisk": False, ... ...} | 11:40 |
flwang1 | that's what i'm suggesting above | 11:40 |
flwang1 | that's not perfect, but I think it's clean/clear enough | 11:40 |
strigazi | it is clear, very clear | 11:41 |
flwang1 | do you want to see the "node-0.OutOfDisk": False | 11:41 |
flwang1 | other conditions except Ready | 11:41 |
flwang1 | if yes, then we need the format "nodename.Ready" otherwise, just "nodename": 'ok' or "nodename": True | 11:42 |
strigazi | It is very good to see all, it just gets a bit heavy. | 11:42 |
strigazi | 5 per node, right? | 11:43 |
flwang1 | yes | 11:43 |
strigazi | Let's make a quick count of the data required though | 11:43 |
flwang1 | so how about just use 'Ready', but still keep the 'nodename.Ready' format for the future | 11:43 |
strigazi | ^^ better than just ok | 11:43 |
flwang1 | can you elaborate? | 11:44 |
strigazi | give me 5' | 11:45 |
flwang1 | still around? | 12:06 |
flwang1 | strigazi: ^ | 12:11 |
strigazi | here | 12:11 |
strigazi | for your question | 12:12 |
strigazi | nodename.Ready: True is better than nodename: ok | 12:12 |
flwang1 | ok, then next question, do we want to cover all the 5 for the first version? | 12:13 |
strigazi | how much space will we need for a 1000 node cluster? | 12:13 |
flwang1 | it could be long, but we're using Text | 12:14 |
strigazi | LONGTEXT is ~4GB i think | 12:15 |
flwang1 | that should be fine | 12:15 |
flwang1 | from another angle, can we fix issue like OutOfDisk? | 12:15 |
strigazi | replace the node, it is a 'fix' | 12:16 |
*** adrianc has left #openstack-containers | 12:16 | |
flwang1 | so how about just show the nodename.Ready for now and add more in the future after we figure out the whole picture | 12:16 |
strigazi | It sounds good to me | 12:17 |
flwang1 | deal | 12:17 |
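For reference, a sketch of building the agreed flat dict from the native API objects (a hypothetical helper using the kubernetes Python client; the real change lives in k8s_monitor.py):

```python
def build_health_status_reason(core_v1):
    """Hypothetical sketch of the agreed flat format:
    {"api": "ok", "<nodename>.Ready": True, ...}
    """
    reason = {}
    try:
        # a cheap call standing in for an API health check
        core_v1.get_api_resources()
        reason['api'] = 'ok'
    except Exception:
        reason['api'] = 'unhealthy'
    for node in core_v1.list_node().items:
        ready = any(c.type == 'Ready' and c.status == 'True'
                    for c in node.status.conditions or [])
        reason['%s.Ready' % node.metadata.name] = ready
    return reason
```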
strigazi | Incremental changes are better | 12:17 |
strigazi | long text is L + 4 bytes, where L < 2^32 | 12:17 |
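A back-of-the-envelope check of that size concern (illustrative arithmetic only): even the Ready flag for a 1000-node cluster is tiny next to the ~4 GB LONGTEXT ceiling.

```python
# ~45 bytes per serialized entry like '"k8scluster-abc-minion-999.Ready": false, '
per_node = 45
nodes = 1000
total = per_node * nodes          # 45,000 bytes, roughly 44 KB
longtext_limit = 2 ** 32 - 1      # ~4 GB
print(total, total / longtext_limit)  # a vanishing fraction of the limit
```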
strigazi | Are we good with health status? | 12:18 |
flwang1 | i feel very good | 12:19 |
flwang1 | question for the rolling upgrade | 12:20 |
strigazi | upgrades then, did you get the gist of it? Apart from some hypervisor reboot I'll make it fully functional today | 12:20 |
strigazi | tell me | 12:20 |
flwang1 | i need a series of ready to test patches | 12:20 |
flwang1 | i'm really keen to test it | 12:20 |
strigazi | The idea is to provide users with cluster templates and they just follow those | 12:23 |
flwang1 | yep, i understand that | 12:23 |
strigazi | Did you see my patch for adding the heat agent in the minions? | 12:23 |
flwang1 | yep | 12:23 |
flwang1 | i saw that | 12:23 |
flwang1 | is it ready to go? | 12:24 |
strigazi | The ssh part | 12:24 |
flwang1 | why do we need the ssh part? | 12:24 |
strigazi | to act as being in the host | 12:24 |
strigazi | some operations need to be in the same filesystem | 12:24 |
strigazi | eg https://review.openstack.org/#/c/561858/1/magnum/drivers/common/templates/kubernetes/fragments/configure-kubernetes-minion.sh@157 | 12:25 |
strigazi | eg https://review.openstack.org/#/c/561858/1/magnum/drivers/common/templates/kubernetes/fragments/configure-kubernetes-minion.sh@16 | 12:26 |
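As I read the ssh part (a hedged sketch of the idea, not the patch itself): the containerized heat agent executes each deployment fragment over ssh to the host, so host paths like the ones linked above resolve on the host filesystem rather than inside the container.

```python
import subprocess

def run_fragment_on_host(script_path):
    """Hedged sketch: run a software-deployment fragment on the host over
    ssh so it 'acts as being in the host'. The key path and user are
    assumptions, not the actual review's values.
    """
    subprocess.check_call([
        'ssh', '-i', '/var/lib/cloud/ssh-key',  # hypothetical key location
        'root@127.0.0.1', 'bash', script_path,
    ])
```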
flwang1 | ok, got it | 12:28 |
strigazi | We can continue in the meeting if you don't have a question now | 12:29 |
flwang1 | will you upload new patch set today? | 12:32 |
strigazi | yes | 12:32 |
flwang1 | cool, i just want to give it a try | 12:32 |
flwang1 | to understand it better | 12:32 |
strigazi | ok | 12:32 |
flwang1 | thanks for working on that, i know it's a hard one | 12:33 |
strigazi | It didn't go as well as I wanted | 12:33 |
strigazi | It took too much time | 12:34 |
flwang1 | i can imagine | 12:34 |
flwang1 | but it would be a great feature | 12:34 |
strigazi | I hope it is | 12:35 |
strigazi | I need to go, are you planning to sleep? :) | 12:36 |
flwang1 | yep | 12:36 |
flwang1 | will we have meeting after 9 hours? | 12:36 |
strigazi | Thanks for staying late Feilong, you have done great work! | 12:36 |
strigazi | yes | 12:37 |
flwang1 | cool, ttyl | 12:37 |
flwang1 | have a good one | 12:37 |
strigazi | good night | 12:37 |
strigazi | @all meeting: https://wiki.openstack.org/wiki/Meetings/Containers#Agenda_for_2018-08-21_2100_UTC | 12:38 |
strigazi | Tuesday, 21 August 2018 21:00UTC | 12:39 |
*** pbourke has quit IRC | 14:06 | |
*** pbourke has joined #openstack-containers | 14:07 | |
*** suanand has quit IRC | 14:53 | |
openstackgerrit | Spyros Trigazis proposed openstack/magnum stable/queens: [k8s] Add proxy to master and set cluster-cidr https://review.openstack.org/594264 | 14:55 |
*** hongbin has joined #openstack-containers | 14:56 | |
*** pcaruana has quit IRC | 15:09 | |
*** Bhujay has quit IRC | 15:27 | |
*** ykarel is now known as ykarel|away | 15:45 | |
*** strigazi has quit IRC | 15:46 | |
*** strigazi has joined #openstack-containers | 15:46 | |
*** ramishra has quit IRC | 15:54 | |
*** itlinux has joined #openstack-containers | 15:59 | |
*** ricolin has quit IRC | 16:06 | |
*** olivenwk has quit IRC | 16:31 | |
*** ykarel|away has quit IRC | 16:49 | |
*** mattgo has quit IRC | 16:54 | |
*** robertomls has joined #openstack-containers | 17:26 | |
*** cbrumm has quit IRC | 17:42 | |
*** dave-mccowan has quit IRC | 17:44 | |
*** portdirect has quit IRC | 17:54 | |
*** cbrumm has joined #openstack-containers | 17:55 | |
*** robertomls has quit IRC | 18:27 | |
*** dave-mccowan has joined #openstack-containers | 18:30 | |
*** dave-mccowan has quit IRC | 18:35 | |
*** vkmc has quit IRC | 18:37 | |
*** sahilsinha has quit IRC | 18:37 | |
*** fungi has quit IRC | 18:37 | |
*** vkmc has joined #openstack-containers | 18:40 | |
*** Chealion has quit IRC | 18:42 | |
*** tobberydberg has quit IRC | 18:42 | |
*** mnaser has quit IRC | 18:42 | |
*** fungi has joined #openstack-containers | 18:48 | |
*** mnaser has joined #openstack-containers | 19:08 | |
*** robertomls has joined #openstack-containers | 19:18 | |
*** spiette has quit IRC | 19:32 | |
*** sdake has quit IRC | 19:33 | |
*** sdake has joined #openstack-containers | 19:34 | |
*** spiette has joined #openstack-containers | 19:36 | |
*** ArchiFleKs has quit IRC | 19:39 | |
*** robertomls has quit IRC | 19:43 | |
*** flwang1 has quit IRC | 19:47 | |
*** robertomls has joined #openstack-containers | 19:47 | |
*** robertomls has quit IRC | 20:16 | |
*** robertomls has joined #openstack-containers | 20:16 | |
*** robertomls has quit IRC | 20:41 | |
strigazi | flwang: imdigitaljim are you here? | 20:58 |
*** canori02 has joined #openstack-containers | 20:59 | |
strigazi | I'll wait a bit before starting the meeting | 21:00 |
imdigitaljim | yeah | 21:00 |
imdigitaljim | sorry | 21:00 |
imdigitaljim | im available | 21:00 |
strigazi | I think flwang will join at some point | 21:01 |
strigazi | imdigitaljim: let's start then | 21:01 |
strigazi | #startmeeting containers | 21:01 |
openstack | Meeting started Tue Aug 21 21:01:48 2018 UTC and is due to finish in 60 minutes. The chair is strigazi. Information about MeetBot at http://wiki.debian.org/MeetBot. | 21:01 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 21:01 |
*** openstack changes topic to " (Meeting topic: containers)" | 21:01 | |
openstack | The meeting name has been set to 'containers' | 21:01 |
strigazi | #topic Roll Call | 21:01 |
*** openstack changes topic to "Roll Call (Meeting topic: containers)" | 21:01 | |
imdigitaljim | o/ | 21:02 |
colin- | hi | 21:02 |
strigazi | o/ | 21:02 |
*** harlowja has joined #openstack-containers | 21:02 | |
strigazi | #topic Announcements | 21:02 |
*** openstack changes topic to "Announcements (Meeting topic: containers)" | 21:02 | |
strigazi | kubernetes v1.11.2 is up: https://hub.docker.com/r/openstackmagnum/kubernetes-kubelet/tags/ | 21:03 |
canori02 | o/ | 21:03 |
strigazi | #topic Blueprints/Bugs/Ideas | 21:03 |
*** openstack changes topic to "Blueprints/Bugs/Ideas (Meeting topic: containers)" | 21:03 | |
strigazi | hello canori02 | 21:03 |
imdigitaljim | i saw your proxy merge. I'll rebase those other ones and we'll be good to go =] | 21:04 |
imdigitaljim | on a sidenote we recently switched over to DS proxy as well | 21:04 |
imdigitaljim | so if you want i can throw a PR for that at some point? | 21:04 |
strigazi | imdigitaljim: you changed to DS downstream? | 21:04 |
imdigitaljim | yeah | 21:05 |
imdigitaljim | making motion towards self-hosted | 21:05 |
imdigitaljim | to make in-place upgrades smoother | 21:05 |
strigazi | I think it is better to move to DS if we have a plan for most of the components | 21:05 |
imdigitaljim | well ill get a PR up for it soon | 21:06 |
strigazi | you have only proxy as a DS? | 21:06 |
imdigitaljim | yeah currently | 21:06 |
imdigitaljim | everything else is static still for now | 21:06 |
strigazi | and calico is a DS right? | 21:06 |
imdigitaljim | yes | 21:06 |
imdigitaljim | and keystone-auth plugin is DS | 21:06 |
imdigitaljim | for masters | 21:06 |
imdigitaljim | nodeSelector: node-role.kubernetes.io/master: "" | 21:07 |
imdigitaljim | in other words | 21:07 |
imdigitaljim | ccm is the same as well | 21:07 |
strigazi | node selector can be used for api, scheduler and controller-manager too I guess | 21:08 |
imdigitaljim | in most deployments that is exactly whats its for | 21:08 |
imdigitaljim | but yeah that'd be appropriate too | 21:08 |
imdigitaljim | and a few tolerations | 21:09 |
imdigitaljim | i have some concerns about the reliability on self-hosted | 21:09 |
imdigitaljim | in terms of node reboot | 21:09 |
imdigitaljim | so i think ill have to plan those out | 21:09 |
imdigitaljim | and see what our 'competition' does | 21:10 |
imdigitaljim | but in general i think our near term goals will be in-place upgrade | 21:10 |
strigazi | static pods shouldn't be a problem if kubelet starts | 21:10 |
imdigitaljim | yeah thats what i was thinking | 21:11 |
imdigitaljim | except we'd probably need to keep etcd static as well | 21:11 |
imdigitaljim | without etcd it cant join | 21:11 |
imdigitaljim | what kubeadm does is it throws up a static set for rejoining | 21:11 |
imdigitaljim | and then tears it down when reconnected to the cluster | 21:11 |
imdigitaljim | which i think would be preferable | 21:11 |
strigazi | What do you mean with a static set? | 21:12 |
strigazi | for single master that I have tried, it is just a static pod, isn't it? | 21:12 |
imdigitaljim | it ends up being one yes | 21:12 |
imdigitaljim | i think i'll have to investigate the workflow a little more | 21:13 |
imdigitaljim | make sure im also understanding it correctly | 21:13 |
strigazi | if the data of etcd are in place, rebooting the node isn't a problem | 21:13 |
imdigitaljim | but if etcd is self-hosted as well | 21:13 |
strigazi | using a static pod or not | 21:14 |
imdigitaljim | theres no apiserver online to interact with it | 21:14 |
imdigitaljim | no? | 21:14 |
strigazi | kubelet can't start pods without an api server | 21:14 |
imdigitaljim | (multimaster scenario) | 21:14 |
imdigitaljim | exactly | 21:14 |
strigazi | In multimaster I don't know what kubeadm does | 21:15 |
strigazi | if it converts the static pods to a deployment | 21:15 |
strigazi | or ds | 21:15 |
imdigitaljim | i think this problem is significantly easier in single master | 21:15 |
strigazi | well if, you use static pods, it is the "same" for multi and single master | 21:16 |
imdigitaljim | yeah | 21:16 |
strigazi | if reboot one by one | 21:16 |
imdigitaljim | are you assuming etcd is also self-hosted in this? | 21:17 |
strigazi | if you reboot all of them in one go maybe the result is different | 21:17 |
imdigitaljim | or not | 21:17 |
strigazi | I don't see a difference | 21:17 |
strigazi | kubelet for static pods is like systemd for processes :) | 21:18 |
flwang | sorry i'm late | 21:18 |
colin- | welcome | 21:18 |
imdigitaljim | https://github.com/kubernetes/kubeadm/blob/master/docs/design/design_v1.10.md#optional-and-alpha-in-v19-self-hosting | 21:18 |
imdigitaljim | i believe there are some unaccounted for situations | 21:19 |
strigazi | oh, you mean make even etcd a ds | 21:20 |
strigazi | I wouldn't do that :) | 21:20 |
strigazi | it sounds terrifying | 21:21 |
imdigitaljim | haha | 21:21 |
imdigitaljim | yeah just some concerns to look at | 21:21 |
imdigitaljim | definitely possible to overcome though | 21:21 |
strigazi | ok, since flwang is also in, | 21:22 |
flwang | ds for etcd? | 21:22 |
colin- | yes, mad science lab :) | 21:22 |
strigazi | I was rebasing the upgrades api and I spent a good three hours debugging. (ds for etcd, yes) | 21:23 |
strigazi | The new change for the svc account keys was "breaking" the functionality | 21:23 |
flwang | strigazi: breaking? | 21:24 |
strigazi | but I finally figured it out: when magnum creates the dict of params to pass to heat | 21:24 |
strigazi | it adds and generates the keys for the service account | 21:25 |
strigazi | and since it generates them every time magnum creates the params, the keys will be different | 21:25 |
strigazi | "breaking" not really breaking | 21:25 |
flwang | ah | 21:26 |
flwang | so do we need any change for the key generating? | 21:26 |
strigazi | maybe, but for now I just ignore those two params | 21:26 |
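A minimal sketch of that workaround (function and helper names are hypothetical; it assumes Heat's PATCH-style update reuses the existing values of parameters that are omitted):

```python
def extract_cluster_params(cluster, is_update=False):
    params = {
        'node_count': cluster.node_count,
        # ... other template parameters ...
    }
    if not is_update:
        # generate the service account keypair only on create; on update,
        # omitting these params lets Heat keep the values already in the stack
        pub, priv = generate_service_account_keys()  # hypothetical helper
        params['kube_service_account_key'] = pub
        params['kube_service_account_private_key'] = priv
    return params
```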
strigazi | I'll push after the meeting | 21:26 |
imdigitaljim | thats a crazy issue | 21:27 |
imdigitaljim | yeah | 21:27 |
imdigitaljim | maybe we should save in barbican | 21:27 |
imdigitaljim | or | 21:27 |
flwang | strigazi: ok, i will keep an eye on that part | 21:27 |
imdigitaljim | whatever secret backend | 21:27 |
imdigitaljim | and extract or create depending on update/create? | 21:27 |
strigazi | we should do the same thing we do for the ca | 21:27 |
imdigitaljim | so yeah^ | 21:27 |
strigazi | imdigitaljim we can do that too | 21:27 |
strigazi | the only thing I don't like is having the secret in barbican and then passing it as a heat parameter | 21:28 |
colin- | i think that would be useful without too much cost | 21:28 |
colin- | oh | 21:28 |
strigazi | It will stay in the heat db forever | 21:29 |
colin- | that's lame | 21:29 |
imdigitaljim | thats true | 21:29 |
strigazi | encrypted, but still | 21:29 |
imdigitaljim | can trustee be extended to allow cluster to interact with barbican? | 21:29 |
strigazi | yes | 21:29 |
imdigitaljim | (or secret backend) | 21:29 |
strigazi | we don't have to do anything actually, the trustee with the trust can talk to barbican | 21:30 |
imdigitaljim | yeah | 21:30 |
imdigitaljim | thats what i was hoping :] | 21:30 |
strigazi | vault or other is different | 21:30 |
imdigitaljim | i havent actually tried it | 21:30 |
imdigitaljim | people can add support for other backend as they need imho | 21:30 |
strigazi | Let's see in stein | 21:30 |
flwang | imdigitaljim: i'm interested in your keystone implementation btw | 21:31 |
imdigitaljim | oh sure | 21:31 |
flwang | imdigitaljim: especially if it needs api restart | 21:31 |
imdigitaljim | ive been working through some changes on new features here and we just accepted some alpha customers so we've been tied up | 21:31 |
imdigitaljim | but ill revisit those PR's and then some | 21:31 |
imdigitaljim | which api restart? | 21:32 |
flwang | k8s api server | 21:32 |
imdigitaljim | oh im not doing a restart, im confused? | 21:33 |
strigazi | why does it need an api restart? | 21:33 |
flwang | because based on testing, you have to restart k8s api server after you got the service URL of keystone auth service | 21:33 |
flwang | if it's deployed as DS | 21:33 |
imdigitaljim | oh yeah i havent needed to at all | 21:33 |
strigazi | flwang: when I tried I didn't restart the k8s-api | 21:33 |
strigazi | flwang you can use host-network | 21:33 |
imdigitaljim | ^ | 21:34 |
imdigitaljim | i dont know if you're using that | 21:34 |
flwang | then I really want to know how you did that if you didn't deploy it on master | 21:34 |
flwang | master's kubelet | 21:34 |
imdigitaljim | but we are doing hostNetwork: true | 21:34 |
imdigitaljim | i do | 21:34 |
imdigitaljim | tolerations: | 21:34 |
imdigitaljim | - key: dedicated | 21:34 |
imdigitaljim | value: master | 21:34 |
imdigitaljim | effect: NoSchedule | 21:34 |
imdigitaljim | - key: CriticalAddonsOnly | 21:34 |
imdigitaljim | value: "True" | 21:34 |
imdigitaljim | effect: NoSchedule | 21:34 |
imdigitaljim | nodeSelector: | 21:34 |
imdigitaljim | node-role.kubernetes.io/master: "" | 21:34 |
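Assembled, that paste corresponds to a pod spec like the following (a sketch via the kubernetes Python client; the container name and image are hypothetical): hostNetwork plus tolerations for the master taints and a nodeSelector on the master role label.

```python
from kubernetes import client

# master-pinned DS pod spec as pasted above
pod_spec = client.V1PodSpec(
    host_network=True,
    node_selector={'node-role.kubernetes.io/master': ''},
    tolerations=[
        client.V1Toleration(key='dedicated', value='master',
                            effect='NoSchedule'),
        client.V1Toleration(key='CriticalAddonsOnly', value='True',
                            effect='NoSchedule'),
    ],
    containers=[client.V1Container(
        name='k8s-keystone-auth',
        image='example/k8s-keystone-auth:latest',  # hypothetical image
    )],
)
```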
imdigitaljim | we're putting most kube-system resources on masters | 21:35 |
flwang | so your k8s-keystone-auth service is running as DS and not running on master, right? | 21:35 |
imdigitaljim | not like dashboards and whatnot | 21:35 |
imdigitaljim | it's running as a DS on master | 21:35 |
imdigitaljim | and not on minion | 21:35 |
flwang | ah, right | 21:35 |
flwang | then yes, for that case, it's much easier and could avoid the api restart | 21:36 |
imdigitaljim | oh okay yeah | 21:36 |
flwang | so, should we just get the kubelet back on master? | 21:36 |
strigazi | it seems too reasonable :) | 21:36 |
strigazi | to not get it | 21:37 |
imdigitaljim | o/ ill be glad to add it back | 21:37 |
imdigitaljim | also strigazi: could we make the flannel parts also a software deployment | 21:37 |
imdigitaljim | are those necessary to be a part of cloud-init? | 21:37 |
strigazi | we could, and no they don't need to be | 21:37 |
imdigitaljim | i've noticed we're nearing the capacity on cloud-init data | 21:38 |
flwang | if we all agree 'kubelet' back on master, then it's easy | 21:38 |
imdigitaljim | yeah | 21:38 |
flwang | we just need to drop some 'if' check for calico | 21:38 |
strigazi | that should be enough | 21:39 |
imdigitaljim | how do you all feel about leveraging helm for prometheus/dashboard/etc | 21:39 |
imdigitaljim | instead of using our scripts going forward? | 21:39 |
imdigitaljim | helm charts and such are much cleaner/easier to maintain | 21:39 |
strigazi | we were discussing this today | 21:39 |
strigazi | yes, we could | 21:39 |
flwang | to be more clear, should kubelet on master in Rocky? | 21:40 |
strigazi | the question is | 21:40 |
strigazi | if we don't put it in Rocky, how many users will cherry-pick downstream | 21:40 |
imdigitaljim | i think it would be appropriate for rocky | 21:40 |
imdigitaljim | stein is a bit far out for such a critical change | 21:41 |
flwang | Stein will be a very long release | 21:41 |
flwang | the longest so far IIRC | 21:41 |
strigazi | yes it will | 21:42 |
flwang | I think the risk is low and the benefit is big | 21:42 |
strigazi | +1 | 21:42 |
flwang | ok, then I will propose a patch this week | 21:43 |
flwang | I'm glad to see Magnum team is so productive | 21:43 |
imdigitaljim | \o/ | 21:43 |
strigazi | :) | 21:43 |
colin- | yeah i think that's worthwhile, it will provide a lot of benefit for consumers | 21:44 |
strigazi | The only area where we haven't pushed a lot is the stable -1 branch | 21:44 |
strigazi | usually stable and master are in very good shape and are up to date | 21:45 |
strigazi | but current-stable -1 is a little behind | 21:45 |
strigazi | I don't know if we can put a lot of effort into old branches | 21:46 |
strigazi | the ones of us that are present here run stable + patches | 21:47 |
strigazi | since we will push some more patches in rocky should we give it a timeline of two or three weeks? | 21:49 |
strigazi | the branch is cut, packagers will have everything in place, we can do as many releases as we want with non-breaking changes like these | 21:50 |
strigazi | makes sense? | 21:50 |
*** itlinux has quit IRC | 21:51 | |
strigazi | imdigitaljim: flwang colin- canori02 ^^ | 21:51 |
canori02 | Makes sense | 21:51 |
colin- | yeah that seems reasonable | 21:52 |
imdigitaljim | yeah | 21:52 |
imdigitaljim | that sounds great | 21:52 |
strigazi | imdigitaljim: colin- you use rpms? containers? ansible? | 21:52 |
imdigitaljim | for magnum? | 21:53 |
strigazi | flwang: canori02 you? | 21:53 |
strigazi | imdigitaljim: yes | 21:53 |
imdigitaljim | containers | 21:53 |
strigazi | kolla? | 21:53 |
imdigitaljim | puppet + containers | 21:54 |
canori02 | ansible here | 21:54 |
strigazi | interesting we have a spectrum | 21:54 |
strigazi | we use puppet + rpms | 21:54 |
flwang | sorry, was in standup meeting | 21:55 |
flwang | i'm reading the log | 21:55 |
strigazi | but we have a large koji infra for rpms | 21:55 |
flwang | we're using puppet+debian pkg | 21:56 |
strigazi | you deploy on debian sid? | 21:56 |
strigazi | it is stretch now | 21:57 |
strigazi | anything else folks? | 21:59 |
strigazi | see you next week or just around | 21:59 |
strigazi | #endmeeting | 22:00 |
*** openstack changes topic to "OpenStack Containers Team" | 22:00 | |
openstack | Meeting ended Tue Aug 21 22:00:06 2018 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 22:00 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/containers/2018/containers.2018-08-21-21.01.html | 22:00 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/containers/2018/containers.2018-08-21-21.01.txt | 22:00 |
openstack | Log: http://eavesdrop.openstack.org/meetings/containers/2018/containers.2018-08-21-21.01.log.html | 22:00 |
strigazi | canori02: For your coreos patch, it still doesn't work for me. Is it working with master? | 22:00 |
flwang | strigazi: thanks for hosting | 22:01 |
flwang | strigazi: what's our strategy for coreOS driver? | 22:02 |
flwang | should we try to do more catch up for that driver? | 22:02 |
openstackgerrit | Spyros Trigazis proposed openstack/magnum master: WIP: Add cluster upgrade to the API https://review.openstack.org/514959 | 22:03 |
canori02 | strigazi: I think I had it on 17.0.2. But I'll bring it up to master and fix accordingly. Was it just when providing the custom ca that it didn't work for you? | 22:03 |
strigazi | flwang: you are welcome. yes we should, we should invest in ignition | 22:04 |
strigazi | canori02: well the base64 way also needs decoding on the driver side | 22:05 |
strigazi | canori02: yes, the make-cert part is not working | 22:05 |
flwang | strigazi: is ignition the replacement for cloud-init on coreos? | 22:05 |
strigazi | flwang: yes, kind of | 22:05 |
flwang | got | 22:05 |
strigazi | ignition runs before the OS boots | 22:05 |
flwang | strigazi: thanks for the upgrade api patch ;) | 22:06 |
strigazi | but it can write config files | 22:06 |
canori02 | It's the replacement for coreos-cloudconfig | 22:06 |
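For illustration, a minimal Ignition config writing one file at first boot (expressed here as the Python dict it would be serialized from; the path and contents are placeholders, and the schema shown is Ignition spec 2.x):

```python
import json

# minimal Ignition v2 config: write a single file before the OS boots
ignition_config = {
    'ignition': {'version': '2.2.0'},
    'storage': {'files': [{
        'filesystem': 'root',
        'path': '/etc/kubernetes/ca.crt',          # hypothetical target
        'mode': 0o644,
        'contents': {'source': 'data:;base64,' + 'PEM_B64_HERE'},
    }]},
}
print(json.dumps(ignition_config))
```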
strigazi | flwang: it needs one more patch for separating the parameters for master and minion | 22:07 |
strigazi | canori02: maybe we can escape the \ in \n | 22:07 |
flwang | strigazi: when that patch will be pushed? | 22:07 |
strigazi | this or we encode in base64 and decode in the nodes | 22:07 |
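A sketch of that base64 round-trip (placeholder PEM content): the driver encodes the certificate so templating can't mangle the newlines, and the node decodes it back.

```python
import base64

ca_pem = "-----BEGIN CERTIFICATE-----\nMIIB...\n-----END CERTIFICATE-----\n"  # placeholder
# driver side: newline-safe value to pass through heat parameters
ca_b64 = base64.b64encode(ca_pem.encode()).decode()
# node side reverses it, e.g. `echo "$CA_B64" | base64 -d > ca.crt`
restored = base64.b64decode(ca_b64).decode()
assert restored == ca_pem
```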
strigazi | I'm trying to rebase | 22:07 |
flwang | strigazi: great, sorry for pushing | 22:09 |
strigazi | flwang: thanks for pushing! | 22:09 |
*** rcernin has joined #openstack-containers | 22:10 | |
flwang | strigazi: haha | 22:11 |
flwang | we're keen for that feature, so..... | 22:11 |
flwang | have a good night, i'm all good for today | 22:12 |
colin- | ttyl | 22:12 |
openstackgerrit | Spyros Trigazis proposed openstack/magnum master: k8s_atomic: Add upgrade node functionallity https://review.openstack.org/514960 | 22:18 |
strigazi | res | 22:19 |
flwang | imdigitaljim: still around? | 22:22 |
imdigitaljim | yeah | 22:22 |
flwang | imdigitaljim: when you rebase your tidy master patch, could you please consider that we will bring the kubelet back? | 22:23 |
imdigitaljim | yes absolutely | 22:23 |
flwang | to make all our life easier ;) | 22:23 |
imdigitaljim | i was planning to do so | 22:23 |
imdigitaljim | :] | 22:23 |
flwang | cool, i will add you as reviewer for the kubelet patch | 22:23 |
canori02 | strigazi: how can I pass a ca to a magnum cluster? I hadn't used that functionality before | 22:24 |
imdigitaljim | theres a few variables that exist for it | 22:24 |
imdigitaljim | if you mean a ca.crt | 22:25 |
strigazi | flwang: imdigitaljim I think it is cleaner to add kubelet first | 22:25 |
strigazi | canori02: the ca is passed already | 22:26 |
imdigitaljim | thats fine too | 22:26 |
imdigitaljim | i can make a cleanup pass after kubelet | 22:26 |
imdigitaljim | so flwang just do what you'd need and ill fix it up | 22:26 |
flwang | strigazi: yes, that's my plan | 22:28 |
flwang | imdigitaljim: awesome, thanks | 22:28 |
imdigitaljim | strigazi: what is the overall goal of this upgrade | 22:29 |
imdigitaljim | will you be upgrading api/scheduler/controller as well? | 22:30 |
strigazi | all components | 22:30 |
strigazi | that we have tags for | 22:30 |
imdigitaljim | awesome | 22:30 |
imdigitaljim | look forward to seeing it completed | 22:31 |
strigazi | :) | 22:32 |
*** hongbin has quit IRC | 22:44 | |
openstackgerrit | Merged openstack/magnum-ui master: Imported Translations from Zanata https://review.openstack.org/594054 | 23:40 |