21:01:58 #startmeeting containers
21:01:59 Meeting started Tue Apr 2 21:01:58 2019 UTC and is due to finish in 60 minutes. The chair is strigazi. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:02:00 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:02:02 The meeting name has been set to 'containers'
21:02:04 #topic Roll Call
21:02:05 yes, you're in charge of the stein release ;)
21:02:07 o/
21:02:09 o/
21:02:22 o/
21:03:24 #topic Stories/Tasks
21:03:48 For upgrades:
21:03:49 o/
21:04:07 flwang: I still don't understand https://review.openstack.org/#/c/649221/
21:04:13 hey jakeyip
21:04:25 hi strigazi
21:04:30 strigazi: let me show you
21:05:03 https://review.openstack.org/#/c/514960/4/magnum/drivers/common/templates/kubernetes/fragments/configure-kubernetes-minion.sh@161
21:05:29 with your patch, configure-kubernetes-minion will be run by the heat-container-agent
21:05:51 but the heat-container-agent can't access /usr/lib/systemd/system
21:05:55 ok
21:05:58 which will make cluster create fail
21:06:01 only for this?
21:06:23 after working around this
21:06:28 we should do all of this over ssh
21:07:06 ssh for checking and copying as well?
21:07:37 can you help me understand what the limitation is, why we have to use ssh?
21:09:11 for syscontainers we need to use ssh for the atomic command. Atomic install needs to be in the same fs. It creates hard links for the installed container image
21:09:21 this is the first reason ^^
21:09:26 the second is
21:10:21 the second is kind of weak. It is: not having to install all deps of the host system in the agent
21:10:34 eg the command hostname
21:10:37 also
21:10:54 for systemctl, it is better to do it over ssh IMO.
21:11:26 otherwise, we might see weird things if the systemd version != systemctl version
21:11:31 makes sense?
21:12:29 ok, if we have to do that, then i'm ok. but for a special case like this docker.service, i'm not sure which one is the good/simple way
21:12:44 so far, your patch doesn't work for me yet.
21:12:49 btw
21:12:50 ssh for sure
21:13:06 the agent should be minimal
21:13:13 and general purpose
21:13:18 where can i find the upgrade-kubernetes.sh on the minion node?
21:13:35 to test by hand?
21:13:46 no
21:14:02 it leaves for a limited period of time in /var/lib/cloud/
21:14:10 i mean will the script be shipped after the cluster is created
21:14:14 s/leaves/lives
21:14:21 no
21:14:34 it is shipped in stack update
21:14:34 or after issuing the upgrade command
21:14:44 this ^^
21:15:06 ok, in my testing, after issuing the upgrade, i can't find it on the minion node
21:15:12 https://review.openstack.org/#/c/514960/4/magnum/drivers/k8s_fedora_atomic_v1/templates/kubeminion.yaml@458
21:16:05 hi guys, I'm using magnum to deploy a k8s cluster, but when I call `coe config cluster` it gives me this error "a bytes-like object is required, not 'str'" does anyone have any hint or tip about it? thanks in advance
21:16:08 we can discuss details offline if you have some time today
21:16:46 guimaluf: we're in our weekly meeting, mind if we discuss it offline after 45 mins?
21:16:53 flwang: today for you tonight, I do have :)
21:17:00 strigazi: ok
21:17:25 strigazi: i will be around at UTC 10:00
21:17:38 flwang, oh, sorry. I thought meetings were held in #openstack-meetings! sorry! :)
21:17:54 guimaluf: i've PMed you with the answer ;)
21:18:13 flwang: maybe 09:00?
21:18:34 strigazi: sure, no problem
21:19:53 flwang: for the API, shall we take it?
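[editor's note: a minimal sketch of the ssh rationale discussed above. The key path, host address, and image tag are illustrative assumptions, not Magnum's actual wiring; the atomic invocation follows the general form used by the fedora-atomic driver fragments:

    # Inside the heat-container-agent, host paths such as
    # /usr/lib/systemd/system are not accessible, and `atomic install`
    # must create hard links on the same filesystem as the host image
    # store, so host-level steps are run on the host itself over ssh:
    ssh -i /path/to/host-key root@127.0.0.1 \
        "atomic install --storage ostree --system --system-package=no \
         --name kubelet docker.io/openstackmagnum/kubernetes-kubelet:v1.14.0"

This also keeps the agent minimal: host binaries like hostname and the host's own systemctl are used instead of copies inside the agent image.]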
21:20:53 strigazi: yes, i have tested it based on your patch
21:20:55 it works
21:21:11 I can give a final +2 tmr
21:21:15 and we can polish it along with your current functional patch if there is any small issue
21:21:28 getting it in will make your patch testing easier
21:21:37 i will propose an API ref patch soon
21:21:49 strigazi: thanks
21:23:02 Let's move one, I need to discuss to things
21:23:21 s/to things/two things/
21:23:32 move on :)
21:23:52 After two corrections it worked :)
21:23:59 1.14.0
21:24:12 sonobuoy passed?
21:24:18 we have the containers, did it work for anyone? flwang ?
21:24:28 in case we forget, I'd like to add 2 things to the agenda too, python-magnumclient 2.13.0 and why it is in an abandoned state: https://review.openstack.org/#/c/642609/ and the multi-nic patch https://review.openstack.org/#/c/648818/
21:24:30 I haven't tried, will do tmr
21:24:40 i just tested v1.13.5, i'm going to test it today or tomorrow
21:24:51 v1.13.5 can pass sonobuoy
21:25:11 i posted my sonobuoy e2e results here which failed with 12 failures: http://paste.openstack.org/show/748667/
21:25:22 192 passed
21:25:42 although i don't understand what the failures mean to be honest
21:25:49 o/
21:25:51 brtknr: i think it deserves a rerun
21:25:59 sorry i was late
21:26:02 will do
21:26:13 brtknr: thanks for the results
21:26:19 ttsiouts: welcome
21:26:28 brtknr: flannel or calico?
21:26:32 brtknr: master branch?
21:26:33 hello ttsiouts
21:27:15 strigazi: flannel, not master branch... queens :D
21:27:21 brtknr: ok
21:27:29 I will also check
21:27:33 i only upgraded the images
21:27:38 ok
21:28:06 I think and hope it will work for master
21:28:16 ok, so we are close for 1.14
21:28:25 strigazi: we have deployed stable stein
21:28:31 will test it soon
21:28:38 i am also happy to do a rerun both on master and queens
21:28:50 flwang: brtknr how did you run the e2e tests?
21:29:21 oh sonobuoy 0.14.0 is out
21:29:23 sonobuoy run
21:29:27 https://github.com/heptio/sonobuoy/releases/tag/v0.14.0
21:29:34 excellent
21:29:40 i'm using 0.13
21:29:42 I'll try in prod and devstack
21:29:50 will test with 0.14 for v1.14.0
21:29:56 flwang: congrats on stein :)
21:30:02 haha, thanks
21:30:05 strigazi: I followed the instructions at https://github.com/heptio/sonobuoy
21:30:15 we have to use stein since we need a lot of new features in stein
21:30:24 ok, let's finalize 1.14 tmr
21:30:49 brtknr: I +2'd your patch for the network config
21:31:12 strigazi: thanks :)
21:31:13 as for the client patch, i will restore and push
21:31:22 brtknr: for the client we will release again soon
21:31:25 i was distracted a bit
21:31:29 brtknr: what flwang said :)
21:31:33 flwang: no worries
21:31:50 technically we will release a train client
21:32:03 ok, the other, still two, things are:
21:32:17 NGs
21:32:53 flwang: brtknr, with ttsiouts we tested the migration on a copy of our prod DB
21:33:20 after a small fix, everything worked as expected.
21:33:27 strigazi: ++
21:33:41 nice, i have a question on that, is there a way to go back to the previous db state?
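[editor's note: as answered just below ("backup the DB first"), the usual rollback path for the nodegroup migration is a dump taken beforehand. A minimal sketch, assuming MySQL/MariaDB and a database named `magnum`; both are deployment-specific assumptions:

    # dump before migrating (the magnum db is tiny, as noted below)
    mysqldump --single-transaction magnum > magnum-pre-ng.sql
    # run the schema migration
    magnum-db-manage upgrade
    # roll back, if needed, by restoring the dump
    mysql magnum < magnum-pre-ng.sql

]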
21:33:45 probably we had some trash from old migrations, but it works fine
21:33:57 brtknr: yes
21:34:09 it is called "backup the DB first" :P
21:34:22 the magnum db is tiny
21:34:35 strigazi: with ng-3, a small issue i have mentioned in ng4 is that it makes cluster delete very slow
21:34:41 though the stack has been deleted
21:34:49 it's worth a dig
21:34:53 we have more than 500 clusters and it is less than one megabyte
21:34:53 is that through a magnum cli command or copying db files manually?
21:34:58 ttsiouts is aware i hope
21:35:11 flwang: yes, most probably it's the extra queries for fetching and deleting the NGs
21:35:26 ttsiouts: but it's more than that
21:35:36 flwang: what do you mean very slow?
21:35:36 would be nice to do a comparison of how much longer it actually takes
21:35:38 those extra queries shouldn't take that long
21:35:53 flwang: hmmm
21:36:20 strigazi: generally, after the stack is deleted, magnum will just take several seconds to remove the db record in the magnum db
21:36:35 flwang: I'm checking
21:36:40 but now it takes another 20-30 seconds to delete the record
21:36:50 i haven't seen a delete failure
21:36:55 so just a heads up
21:37:00 ok
21:37:30 small issue but it should and probably can be improved
21:38:05 yep, that's a huge job, well done ttsiouts
21:38:19 flwang: :D
21:38:20 I have tested the other two patches and they work fine for me
21:38:33 ttsiouts: ++ x10
21:38:44 or ++^2
21:39:01 flwang: a single query could delete them
21:39:13 I can propose a patch
21:39:48 yep amazing work!
21:40:02 :)
21:40:02 i didn't even imagine this would be a reality 1 month ago
21:40:30 it took some afternoon sessions but ttsiouts pulled this off
21:40:47 solid implementation
21:41:06 is the CRUD bit going to be tricky?
21:41:14 brtknr: it is
21:41:32 strigazi: i will review ng4 and ng5
21:41:58 thanks
21:42:11 the last item comes from this:
21:42:13 flwang: thanks!!
21:42:31 https://review.openstack.org/#/c/648317/
21:42:52 brtknr: I don't know how many times I've thanked you for your testing but it is not enough
21:43:18 Shall we implement an option to have BFV instead of extra attached volumes? it will be cleaner and easier to maintain and run
21:43:52 what do you mean by BFV?
21:44:20 Boot From Volume
21:45:20 do you mean boot from volume for the nodes?
21:45:28 master and node
21:45:35 I think it will be interesting for you
21:45:37 yes
21:45:44 ttsiouts: :) my pleasure
21:45:52 with NGs we can separate
21:45:54 brtknr: :D
21:46:10 but for "old" drivers, yes, master and node
21:46:40 thoughts?
21:46:52 I think mnaser would also be interested
21:47:01 strigazi: yep, i like it. but i don't understand why it can resolve the mount problem
21:47:41 because we won't mount a device to the vm and then into the kubelet container
21:47:49 the volume will be the root fs
21:47:56 ah
21:48:07 i see, you're talking about the specific case
21:48:30 now, we mount, we partition the fs
21:48:33 strigazi: is that interoperable with the baremetal provisioning case?
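[editor's note: a rough sketch of the contrast drawn above; the device name, mount point, flavor, image, and size below are illustrative assumptions, not the drivers' exact steps. Today a separate Cinder volume of docker_volume_size is attached to the node and prepared for container storage, roughly:

    # what the current drivers automate for the attached docker volume
    mkfs.xfs /dev/vdb
    mount /dev/vdb /var/lib/docker

With boot-from-volume the root filesystem itself is a Cinder volume, so nothing extra has to be partitioned, mounted, or passed into the kubelet container. In openstackclient terms, the equivalent of what the driver would request from Nova:

    openstack server create --image fedora-atomic-29 --flavor m1.small \
        --boot-from-volume 50 --network private k8s-minion-0

]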
21:48:43 strigazi: i understand what you're talking about now
21:48:57 brtknr: for BM it is not an issue
21:49:33 brtknr: for BM we don't mount a volume for the container storage (images and container overlays)
21:50:11 strigazi: but even if we have BFV, if we still allow use specify the docker_volume_size, user will still run into this issue
21:50:23 so we should have both of them
21:50:46 s/use/user
21:51:29 yes, it may be an issue, but we can advertise BFV for use cases where the flavor has a small disk
21:52:01 yep
21:52:31 I think the original problem is addressed better with BFV VMs
21:52:40 so let's take this and add BFV, and slowly drop the docker_volume_size in the long run?
21:52:41 we (cern) don't use this feature
21:52:55 I mean docker_volume_size
21:52:58 strigazi: yep, because you have a big image root size
21:53:03 yep i know
21:53:13 well, we do, but only for swarm, AFAIK
21:53:21 big-ish
21:53:22 cert dont use BFV or mount/partition?
21:53:30 s/cert/cern
21:53:45 FWIW we are a BFV-only cloud. I think I remember pushing up or working on something about this not long ago
21:53:47 brtknr: magnum doesn't support boot from volume now
21:54:06 mnaser: yep, i think you have a patch stalled somewhere
21:54:12 do you mind me picking it up?
21:54:18 hello mnaser
21:54:26 no, please go ahead. I'm a bit short on time with the release and stuff
21:54:32 Hi brtknr o/
21:54:39 mnaser: cool, thanks
21:54:47 strigazi: we have 5 mins
21:55:08 give me 2 mins for AS and AH?
21:55:46 https://review.openstack.org/#/c/621734/
21:56:08 auto scaling and auto healing are important features for k8s, now we're going to do them in magnum https://review.openstack.org/631378
21:56:24 strigazi: mind me publishing the autoscaler image on openstackmagnum?
21:56:57 flwang: how? copy from thomas's repo?
21:57:22 no
21:57:29 just download the source code and build
21:57:47 why do i have to use thomas's repo?
21:58:11 is thomas around?
21:58:12 you don't, it is just what every one of us has tested
21:58:32 you mean i don't have to?
21:58:58 if it's well tested, i can just copy
22:00:13 strigazi: brtknr: could you please also test the patch https://review.openstack.org/631378 to make sure you're happy with it?
22:00:51 flwang: yep i'll test it tomorrow
22:01:16 brtknr: thanks
22:01:34 flwang: as soon as I redeploy my devstack, as my magnum db was updated to support nodegroups and I didn't back up
22:01:35 http://paste.openstack.org/raw/748747/
22:02:40 flwang: I left some comments
22:02:48 strigazi: yep, i saw that
22:02:58 strigazi: thanks for the review
22:03:06 strigazi: are you happy generally?
22:03:34 generally yes
22:03:51 flwang: brtknr I pushed the image http://paste.openstack.org/raw/748747/
22:03:52 strigazi: cool
22:04:13 i will propose a new patch set today to address all comments
22:04:33 flwang: how do you normally test auto-healing?
22:04:57 flwang: did the CA team tell you that they will include magnum support in 1.14.1?
22:05:15 strigazi: yes, they told me they will include it in 1.14.1
22:05:22 it will take a couple of weeks
22:05:27 brtknr: we can talk offline
22:05:42 just put more load on your cluster, it's easy
22:06:05 flwang: isn't that auto-scaling?
22:06:11 sorry for the silly questions
22:06:27 yes we can talk later
22:06:33 brtknr: http://paste.openstack.org/raw/748748/
22:07:05 ~ 500mb pods
22:07:17 strigazi: did you get a chance to see this https://review.openstack.org/#/c/643225/ ?
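[editor's note: one way to "put more load on your cluster" with ~500mb pods, per the exchange above. A hypothetical sketch with kubectl; the deployment name, image, and replica count are illustrative, the idea is to request more memory than the nodes have free so pods go Pending and the cluster-autoscaler reacts:

    kubectl create deployment load-test --image=nginx
    kubectl set resources deployment load-test --requests=memory=500Mi
    # overcommit: pending pods should trigger a scale-up
    kubectl scale deployment load-test --replicas=20

Scaling the replicas back down (or deleting the deployment) should conversely leave nodes underutilized and eligible for scale-down.]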
22:07:41 no, will do tmr
22:09:03 strigazi: no problem, thanks
22:09:10 i'm a happy boy now
22:09:19 :)
22:09:44 Shall we end the meeting?
22:09:49 anything else?
22:09:53 Small thing
22:10:06 Have the functional tests started working again magically?
22:10:19 we can test
22:10:24 I don't know
22:10:26 Maybe an updated kernel in CI might have helped?
22:10:26 flwang: ?
22:10:40 strigazi: i don't think so, unless the nested virt has been fixed?
22:10:42 Spyros Trigazis proposed openstack/magnum master: Revert "ci: Disable functional tests" https://review.openstack.org/642873
22:10:45 mnaser: is it?
22:11:00 mnaser: ^^
22:11:01 I know kata works fine but they don't update their kernel. So maybe it was a bad kernel and Ubuntu released an update and it's ok now
22:11:15 Just something to try.
22:11:26 mnaser: thanks, good to know
22:11:44 If not, I can maybe try to reimplement those tests on centos. Our hosts run centos so it might be more stable
22:11:44 the quick boy strigazi has a patch already ^
22:11:46 the ci is running, let's see
22:11:59 Cool, I'll add myself to the review.
22:12:01 Thanks
22:12:14 mnaser: thanks for all your love for magnum
22:12:23 mnaser: ++
22:12:27 mnaser: yeaah!!
22:12:41 :D I try my best with the little time I have :P
22:13:19 thanks again :)
22:13:23 mnaser: cheers
22:13:29 let's end the meeting then
22:13:49 #endmeeting
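[editor's note: a generic way to check the nested-virt suspicion raised near the end of the meeting; this is a standard KVM check, not something from the discussion itself:

    # on an Intel CI host: Y or 1 means nested virtualization is enabled
    cat /sys/module/kvm_intel/parameters/nested
    # on AMD hosts the module is kvm_amd instead

]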