21:01:58 <strigazi> #startmeeting containers
21:01:59 <openstack> Meeting started Tue Apr 2 21:01:58 2019 UTC and is due to finish in 60 minutes. The chair is strigazi. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:02:00 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:02:02 <openstack> The meeting name has been set to 'containers'
21:02:04 <strigazi> #topic Roll Call
21:02:05 <flwang> yes, you're in charge of the stein release ;)
21:02:07 <strigazi> o/
21:02:09 <brtknr> o/
21:02:22 <flwang> o/
21:03:24 <strigazi> #topic Stories/Tasks
21:03:48 <strigazi> For upgrades:
21:03:49 <jakeyip> o/
21:04:07 <strigazi> flwang: I still don't understand https://review.openstack.org/#/c/649221/
21:04:13 <strigazi> hey jakeyip
21:04:25 <jakeyip> hi strigazi
21:04:30 <flwang> strigazi: let me show you
21:05:03 <flwang> https://review.openstack.org/#/c/514960/4/magnum/drivers/common/templates/kubernetes/fragments/configure-kubernetes-minion.sh@161
21:05:29 <flwang> with your patch, configure-kubernetes-minion will be run by the heat-container-agent
21:05:51 <flwang> but the heat-container-agent can't access /usr/lib/systemd/system
21:05:55 <strigazi> ok
21:05:58 <flwang> which makes cluster creation fail
21:06:01 <strigazi> only for this?
21:06:23 <flwang> after working around this
21:06:28 <strigazi> we should do all of this over ssh
21:07:06 <flwang> ssh for checking and copying as well?
21:07:37 <flwang> can you help me understand the limitation, i.e. why we have to use ssh?
21:09:11 <strigazi> for syscontainers we need to use ssh for the atomic command. atomic install needs to run on the same fs; it creates hard links for the installed container image
21:09:21 <strigazi> this is the first reason ^^
21:09:26 <strigazi> the second is
21:10:21 <strigazi> the second is kind of weak. It is: not having to install all the deps of the host system into the agent
21:10:34 <strigazi> eg the command hostname
21:10:37 <strigazi> also
21:10:54 <strigazi> for systemctl, it is better to do it over ssh IMO.
21:11:26 <strigazi> otherwise, we might see weird things if the systemd version != systemctl version
21:11:31 <strigazi> makes sense?
21:12:29 <flwang> ok, if we have to do that, then i'm ok. but for a special case like this docker.service, i'm not sure which way is good/simple
21:12:44 <flwang> so far, your patch doesn't work for me yet.
21:12:49 <flwang> btw
21:12:50 <strigazi> ssh for sure
21:13:06 <strigazi> the agent should be minimal
21:13:13 <strigazi> and general purpose
21:13:18 <flwang> where can i find upgrade-kubernetes.sh on the minion node?
21:13:35 <strigazi> to test by hand?
21:13:46 <flwang> no
21:14:02 <strigazi> it leaves for a limited period of time in /var/lib/cloud/<smth>
21:14:10 <flwang> i mean, will the script be shipped after the cluster is created
21:14:14 <strigazi> s/leaves/lives
21:14:21 <strigazi> no
21:14:34 <strigazi> it is shipped in a stack update
21:14:34 <flwang> or after the upgrade command is issued
21:14:44 <strigazi> this ^^
21:15:06 <flwang> ok, in my testing, after issuing the upgrade, i can't find it on the minion node
21:15:12 <strigazi> https://review.openstack.org/#/c/514960/4/magnum/drivers/k8s_fedora_atomic_v1/templates/kubeminion.yaml@458
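A rough sketch of the pattern strigazi describes above, with the heat-container-agent driving host-level commands over ssh instead of running them inside its own filesystem namespace. The image tag, unit name and ssh target are illustrative assumptions, not what Magnum's scripts literally contain:

    # run from inside the heat-container-agent, executed on the host over ssh;
    # `atomic install` hard-links the image into place, so it must run on the
    # same filesystem it installs to
    ssh root@localhost "atomic install --system --name kubelet \
        docker.io/openstackmagnum/kubernetes-kubelet:v1.13.5"
    # systemctl should match the host's systemd version, so run it there too
    ssh root@localhost "systemctl daemon-reload && systemctl restart kubelet"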
21:16:05 <guimaluf> hi guys, I'm using magnum to deploy a k8s cluster, but when I call `coe config cluster` it gives me this error: "a bytes-like object is required, not 'str'". does anyone have any hint or tip about it? thanks in advance
21:16:08 <flwang> we can discuss details offline if you have some time today
21:16:46 <flwang> guimaluf: we're in the weekly meeting, mind if we discuss it offline after 45 mins?
21:16:53 <strigazi> flwang: today for you is tonight for me; I do have time :)
21:17:00 <flwang> strigazi: ok
21:17:25 <flwang> strigazi: i will be around at 10:00 UTC
21:17:38 <guimaluf> flwang, oh, sorry. I thought meetings were held on #openstack-meetings! my bad! :)
21:17:54 <brtknr> guimaluf: i've PMed you with the answer ;)
21:18:13 <strigazi> flwang: maybe 09:00?
21:18:34 <flwang> strigazi: sure, no problem
21:19:53 <strigazi> flwang: for the API, shall we take it?
21:20:53 <flwang> strigazi: yes, i have tested it based on your patch
21:20:55 <flwang> it works
21:21:11 <strigazi> I can give a final +2 tmr
21:21:15 <flwang> and we can polish it along with your current functional patch if there is any small issue
21:21:28 <flwang> getting it in will make testing your patch easier
21:21:37 <flwang> i will propose an API ref patch soon
21:21:49 <flwang> strigazi: thanks
21:23:02 <strigazi> Let's move one, I need to discuss to things
21:23:21 <strigazi> s/to things/two things/
21:23:32 <strigazi> move on :)
21:23:52 <strigazi> After two corrections it worked :)
21:23:59 <strigazi> 1.14.0
21:24:12 <flwang> sonobuoy passed?
21:24:18 <strigazi> we have the containers, did it work for anyone? flwang?
21:24:28 <brtknr> in case we forget, I'd like to add 2 things to the agenda too: python-magnumclient 2.13.0 and why it is in an abandoned state: https://review.openstack.org/#/c/642609/ and the multi-nic patch https://review.openstack.org/#/c/648818/
21:24:30 <strigazi> I haven't tried, will do tmr
21:24:40 <flwang> i just tested v1.13.5, i'm going to test it today or tomorrow
21:24:51 <flwang> v1.13.5 can pass sonobuoy
21:25:11 <brtknr> i posted my sonobuoy e2e results here, which had 12 failures: http://paste.openstack.org/show/748667/
21:25:22 <brtknr> 192 passed
21:25:42 <brtknr> although i don't understand what the failures mean, to be honest
21:25:49 <ttsiouts> o/
21:25:51 <flwang> brtknr: i think it deserves a rerun
21:25:59 <ttsiouts> sorry i was late
21:26:02 <brtknr> will do
21:26:13 <strigazi> brtknr: thanks for the results
21:26:19 <strigazi> ttsiouts: welcome
21:26:28 <strigazi> brtknr: flannel or calico?
21:26:32 <strigazi> brtknr: master branch?
21:26:33 <brtknr> hello ttsiouts
21:27:15 <brtknr> strigazi: flannel, not master branch... queens :D
21:27:21 <strigazi> brtknr: ok
21:27:29 <strigazi> I will also check
21:27:33 <brtknr> i only upgraded the images
21:27:38 <strigazi> ok
21:28:06 <strigazi> I think and hope it will work for master
21:28:16 <strigazi> ok, so we are close for 1.14
21:28:25 <flwang> strigazi: we have deployed stable stein
21:28:31 <flwang> will test it soon
21:28:38 <brtknr> i am also happy to do a rerun on both master and queens
21:28:50 <strigazi> flwang: brtknr: how did you run the e2e tests?
21:29:21 <strigazi> oh 0.14.0 is out
21:29:23 <flwang> sonobuoy run
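flwang's "sonobuoy run" in slightly more detail: a hedged sketch of the e2e workflow, with the commands taken from the upstream README mentioned below rather than from this meeting, so treat the exact invocation as an assumption for the 0.13/0.14-era CLI:

    # launch the conformance e2e suite against the current kubeconfig context
    sonobuoy run
    # poll until the aggregator reports the run as complete
    sonobuoy status
    # download the results tarball for inspection
    sonobuoy retrieve .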
21:29:27 <strigazi> https://github.com/heptio/sonobuoy/releases/tag/v0.14.0
21:29:34 <strigazi> excellent
21:29:40 <flwang> i'm using 0.13
21:29:42 <strigazi> I'll try in prod and devstack
21:29:50 <flwang> will test with 0.14 for v1.14.0
21:29:56 <strigazi> flwang: congrats on stein :)
21:30:02 <flwang> haha, thanks
21:30:05 <brtknr> strigazi: I followed the instructions on https://github.com/heptio/sonobuoy
21:30:15 <flwang> we have to use stein since we need a lot of the new features in it
21:30:24 <strigazi> ok, let's finalize 1.14 tmr
21:30:49 <strigazi> brtknr: I +2'd your patch for the network config
21:31:12 <brtknr> strigazi: thanks :)
21:31:13 <flwang> as for the client patch, i will restore and push
21:31:22 <strigazi> brtknr: for the client we will release again soon
21:31:25 <flwang> i was distracted a bit
21:31:29 <strigazi> brtknr: what flwang said :)
21:31:33 <brtknr> flwang: no worries
21:31:50 <flwang> technically we will release a train client
21:32:03 <strigazi> ok, the other two things are:
21:32:17 <strigazi> NGs
21:32:53 <strigazi> flwang: brtknr: with ttsiouts we tested the migration on a copy of our prod DB
21:33:20 <strigazi> after a small fix, everything worked as expected.
21:33:27 <ttsiouts> strigazi: ++
21:33:41 <brtknr> nice, i have a question on that: is there a way to go back to the previous db state?
21:33:45 <strigazi> probably we had some trash from old migrations, but it works fine
21:33:57 <strigazi> brtknr: yes
21:34:09 <strigazi> it is called "backup the DB first" :P
21:34:22 <strigazi> the magnum db is tiny
21:34:35 <flwang> strigazi: with ng-3, a small issue i mentioned in ng4 is that it makes cluster delete very slow
21:34:41 <flwang> though the stack has been deleted
21:34:49 <flwang> it's worth a dig
21:34:53 <strigazi> we have more than 500 clusters and it is less than one megabyte
21:34:53 <brtknr> is that through a magnum cli command or copying db files manually?
21:34:58 <flwang> ttsiouts is aware i hope
21:35:11 <ttsiouts> flwang: yes, most probably it's the extra queries for fetching and deleting the NGs
21:35:26 <flwang> ttsiouts: but it's more than that
21:35:36 <strigazi> flwang: what do you mean very slow?
21:35:36 <brtknr> would be nice to do a comparison of how much longer it actually takes
21:35:38 <flwang> those extra queries shouldn't take that long
21:35:53 <ttsiouts> flwang: hmmm
21:36:20 <flwang> strigazi: generally, after the stack is deleted, magnum just takes several seconds to remove the db record from the magnum db
21:36:35 <ttsiouts> flwang: I'm checking
21:36:40 <flwang> but now it takes another 20-30 seconds to delete the record
21:36:50 <flwang> i haven't seen a delete failure
21:36:55 <flwang> so just a heads up
21:37:00 <strigazi> ok
21:37:30 <strigazi> small issue, but it should and probably can be improved
21:38:05 <flwang> yep, that's a huge job, well done ttsiouts
21:38:19 <ttsiouts> flwang: :D
21:38:20 <strigazi> I have tested the other two patches and they work fine for me
21:38:33 <strigazi> ttsiouts: ++ x10
21:38:44 <strigazi> or ++^2
21:39:01 <ttsiouts> flwang: a single query could delete them
21:39:13 <ttsiouts> I can propose a patch
21:39:48 <brtknr> yep, amazing work!
21:40:02 <ttsiouts> :)
21:40:02 <brtknr> i didn't even imagine this would be reality 1 month ago
21:40:30 <strigazi> it took some afternoon sessions but ttsiouts pulled this off
21:40:47 <strigazi> solid implementation
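Two sketches from this exchange, both assumptions rather than anything Magnum ships: strigazi's "backup the DB first" advice, and the single-statement delete ttsiouts proposes in place of per-row queries. The table and column names are guesses based on the NG work, not verified against the actual schema:

    # the magnum db is tiny, so a full dump before the NG migration is cheap
    mysqldump --single-transaction magnum > magnum-pre-ng-backup.sql

    # bulk-delete a cluster's nodegroup rows in one query instead of
    # fetching and deleting them one by one
    mysql magnum -e "DELETE FROM nodegroup WHERE cluster_id = 'CLUSTER_UUID';"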
21:41:06 <brtknr> is the CRUD bit going to be tricky?
21:41:14 <flwang> brtknr: it is
21:41:32 <flwang> strigazi: i will review ng4 and ng5
21:41:58 <strigazi> thanks
21:42:11 <strigazi> the last item comes from this:
21:42:13 <ttsiouts> flwang: thanks!!
21:42:31 <strigazi> https://review.openstack.org/#/c/648317/
21:42:52 <ttsiouts> brtknr: I don't know how many times I've thanked you for your testing, but it's not enough
21:43:18 <strigazi> Shall we implement an option to have BFV instead of extra attached volumes? it will be cleaner and easier to maintain and run
21:43:52 <flwang> what do you mean by BFV?
21:44:20 <strigazi> Boot From Volume
21:45:20 <flwang> do you mean boot from volume for the nodes?
21:45:28 <flwang> master and node
21:45:35 <strigazi> I think it will be interesting for you
21:45:37 <strigazi> yes
21:45:44 <brtknr> ttsiouts: :) my pleasure
21:45:52 <strigazi> with NGs we can separate
21:45:54 <ttsiouts> brtknr: :D
21:46:10 <strigazi> but for "old" drivers, yes, master and node
21:46:40 <strigazi> thoughts?
21:46:52 <strigazi> I think mnaser would also be interested
21:47:01 <flwang> strigazi: yep, i like it. but i don't understand how it can resolve the mount problem
21:47:41 <strigazi> because we won't mount a device to the vm and then into the kubelet container
21:47:49 <strigazi> the volume will be the root fs
21:47:56 <flwang> ah
21:48:07 <flwang> i see, you're talking about the specific case
21:48:30 <strigazi> now, we mount the device and we partition the fs
21:48:33 <brtknr> strigazi: is that interoperable with the baremetal provisioning case?
21:48:43 <flwang> strigazi: i understand what you're talking about now
21:48:57 <strigazi> brtknr: for BM it is not an issue
21:49:33 <strigazi> brtknr: for BM we don't mount a volume for the container storage (images and container overlays)
21:50:11 <flwang> strigazi: but even if we can have BFV, if we still allow use specify the docker_volume_size, user will still run into this issue
21:50:23 <flwang> so we should have both of them
21:50:46 <flwang> s/use/user
21:51:29 <strigazi> yes, it may be an issue, but we can advertise BFV for use cases where the flavor has a small disk
21:52:01 <flwang> yep
21:52:31 <strigazi> I think the original problem is addressed better with BFV VMs
21:52:40 <flwang> so let's take this and add BFV, and slowly drop docker_volume_size in the long run?
21:52:41 <strigazi> we (cern) don't use this feature
21:52:55 <strigazi> I mean docker_volume_size
21:52:58 <flwang> strigazi: yep, because you have a big image root size
21:53:03 <flwang> yep i know
21:53:13 <strigazi> well, we do, but only for swarm, AFAIK
21:53:21 <strigazi> big-ish
21:53:22 <brtknr> cert dont use BFV or mount/partition?
21:53:30 <brtknr> s/cert/cern
21:53:45 <mnaser> FWIW we are a BFV-only cloud. I think I remember pushing up or working on something about this not long ago
21:53:47 <flwang> brtknr: magnum doesn't support boot from volume now
21:54:06 <flwang> mnaser: yep, i think you have a patch stalled somewhere
21:54:12 <flwang> do you mind me picking it up?
21:54:18 <brtknr> hello mnaser
21:54:26 <mnaser> no, please go ahead. I'm a bit short on time with the release and stuff
21:54:32 <mnaser> Hi brtknr o/
21:54:39 <flwang> mnaser: cool, thanks
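What BFV means at the Nova level, as a hedged CLI illustration. Magnum would drive this through its Heat templates rather than this command, and the flavor, image, network and key names are placeholders:

    # boot the node from a 50 GB Cinder volume instead of the hypervisor's
    # local disk, so no extra docker volume needs to be attached, partitioned
    # and mounted into the kubelet container
    openstack server create --flavor m1.large --image fedora-atomic-latest \
        --boot-from-volume 50 --network private --key-name mykey k8s-minion-0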
21:54:47 <flwang> strigazi: we have 5 mins
21:55:08 <flwang> give me 2 mins for AS and AH?
21:55:46 <strigazi> https://review.openstack.org/#/c/621734/
21:56:08 <flwang> auto scaling and auto healing are important features for k8s, now we're going to do them in magnum https://review.openstack.org/631378
21:56:24 <flwang> strigazi: mind me publishing the autoscaler image on openstackmagnum?
21:56:57 <strigazi> flwang: how? copy from thomas's repo?
21:57:22 <flwang> no
21:57:29 <flwang> just download the source code and build
21:57:47 <flwang> why do i have to use thomas's repo?
21:58:11 <flwang> is thomas around?
21:58:12 <strigazi> you don't, it is just what every one of us has tested
21:58:32 <flwang> you mean i don't have to?
21:58:58 <flwang> if it's well tested, i can just copy
22:00:13 <flwang> strigazi: brtknr: could you please also test the patch https://review.openstack.org/631378 to make sure you're happy with it?
22:00:51 <brtknr> flwang: yep, i'll test it tomorrow
22:01:16 <flwang> brtknr: thanks
22:01:34 <brtknr> flwang: as soon as I redeploy my devstack, since my magnum db was migrated to support nodegroups and I didn't back up
22:01:35 <strigazi> http://paste.openstack.org/raw/748747/
22:02:40 <strigazi> flwang: I left some comments
22:02:48 <flwang> strigazi: yep, i saw that
22:02:58 <flwang> strigazi: thanks for the review
22:03:06 <flwang> strigazi: are you happy generally?
22:03:34 <strigazi> generally yes
22:03:51 <strigazi> flwang: brtknr: I pushed the image http://paste.openstack.org/raw/748747/
22:03:52 <flwang> strigazi: cool
22:04:13 <flwang> i will propose a new patch set today to address all comments
22:04:33 <brtknr> flwang: how do you normally test auto-healing?
22:04:57 <strigazi> flwang: did the CA team tell you that they will release magnum support in 1.14.1?
22:05:15 <flwang> strigazi: yes, they told me they will include it in 1.14.1
22:05:22 <flwang> it will take a couple of weeks
22:05:27 <flwang> brtknr: we can talk offline
22:05:42 <flwang> just put more load on your cluster, it's easy
22:06:05 <brtknr> flwang: isn't that auto-scaling?
22:06:11 <brtknr> sorry for the silly questions
22:06:27 <brtknr> yes, we can talk later
22:06:33 <strigazi> brtknr: http://paste.openstack.org/raw/748748/
22:07:05 <strigazi> ~ 500mb pods
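A hedged guess at what strigazi's paste shows: forcing a scale-up by scheduling pods whose memory requests the current workers cannot satisfy. The name, image and sizes are made up for illustration, and this relies on the `--replicas`/`--requests` flags that `kubectl run` still had in the 1.13/1.14 era:

    # each replica requests 500Mi of memory; raise --replicas until pods go
    # Pending, which is the signal the cluster-autoscaler reacts to by adding nodes
    kubectl run autoscale-load --image=k8s.gcr.io/pause:3.1 \
        --replicas=10 --requests='memory=500Mi'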
22:07:17 <flwang> strigazi: did you get a chance to see this https://review.openstack.org/#/c/643225/ ?
22:07:41 <strigazi> no, will do tmr
22:09:03 <flwang> strigazi: no problem, thanks
22:09:10 <flwang> i'm a happy boy now
22:09:19 <strigazi> :)
22:09:44 <strigazi> Shall we end the meeting?
22:09:49 <strigazi> anything else?
22:09:53 <mnaser> Small thing
22:10:06 <mnaser> Have the functional tests started working again magically?
22:10:19 <strigazi> we can test
22:10:24 <strigazi> I don't know
22:10:26 <mnaser> Maybe an updated kernel in CI might have helped?
22:10:26 <strigazi> flwang: ?
22:10:40 <flwang> strigazi: i don't think so, unless nested virt has been fixed?
22:10:42 <openstackgerrit> Spyros Trigazis proposed openstack/magnum master: Revert "ci: Disable functional tests" https://review.openstack.org/642873
22:10:45 <flwang> mnaser: is it?
22:11:00 <strigazi> mnaser: ^^
22:11:01 <mnaser> I know kata works fine but they don't update their kernel. So maybe it was a bad kernel and Ubuntu released an update and it's ok now
22:11:15 <mnaser> Just something to try.
22:11:26 <flwang> mnaser: thanks, good to know
22:11:44 <mnaser> If not, I can maybe try to reimplement those tests on centos. Our hosts run centos so it might be more stable
22:11:44 <flwang> the quick boy strigazi has a patch already ^
22:11:46 <strigazi> the ci is running, let's see
22:11:59 <mnaser> Cool, I'll add myself to the review.
22:12:01 <mnaser> Thanks
22:12:14 <flwang> mnaser: thanks for all your love for magnum
22:12:23 <strigazi> mnaser: ++
22:12:27 <brtknr> mnaser: yeaah!!
22:12:41 <mnaser> :D I try my best with the little time I have :P
22:13:19 <strigazi> thanks again :)
22:13:23 <flwang> mnaser: cheers
22:13:29 <strigazi> let's end the meeting then
22:13:49 <strigazi> #endmeeting