21:01:58 <strigazi> #startmeeting containers
21:01:59 <openstack> Meeting started Tue Apr 2 21:01:58 2019 UTC and is due to finish in 60 minutes. The chair is strigazi. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:02:00 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:02:02 <openstack> The meeting name has been set to 'containers'
21:02:04 <strigazi> #topic Roll Call
21:02:05 <flwang> yes, you're in charge of the stein release ;)
21:02:07 <strigazi> o/
21:02:09 <brtknr> o/
21:02:22 <flwang> o/
21:03:24 <strigazi> #topic Stories/Tasks
21:03:48 <strigazi> For upgrades:
21:03:49 <jakeyip> o/
21:04:07 <strigazi> flwang: I still don't understand https://review.openstack.org/#/c/649221/
21:04:13 <strigazi> hey jakeyip
21:04:25 <jakeyip> hi strigazi
21:04:30 <flwang> strigazi: let me show you
21:05:03 <flwang> https://review.openstack.org/#/c/514960/4/magnum/drivers/common/templates/kubernetes/fragments/configure-kubernetes-minion.sh@161
21:05:29 <flwang> with your patch, configure-kubernetes-minion will be run by the heat-container-agent
21:05:51 <flwang> but the heat-container-agent can't access /usr/lib/systemd/system
21:05:55 <strigazi> ok
21:05:58 <flwang> which makes cluster creation fail
21:06:01 <strigazi> only for this?
21:06:23 <flwang> after working around this
21:06:28 <strigazi> we should do all of this over ssh
21:07:06 <flwang> ssh for checking and copying as well?
21:07:37 <flwang> can you help me understand the limitation, i.e. why we have to use ssh?
21:09:11 <strigazi> for syscontainers we need to use ssh for the atomic command. atomic install needs to run on the same fs; it creates hard links for the installed container image
21:09:21 <strigazi> this is the first reason ^^
21:09:26 <strigazi> the second is
21:10:21 <strigazi> the second is kind of weak. It is: not having to install all the deps of the host system into the agent
21:10:34 <strigazi> eg the command hostname
21:10:37 <strigazi> also
21:10:54 <strigazi> for systemctl, it is better to do it over ssh IMO.
21:11:26 <strigazi> otherwise, we might see weird things if the systemd version != systemctl version
21:11:31 <strigazi> makes sense?
21:12:29 <flwang> ok, if we have to do that, then i'm ok. but for a special case like this docker.service, i'm not sure which way is good/simple
21:12:44 <flwang> so far, your patch doesn't work for me yet.
21:12:49 <flwang> btw
21:12:50 <strigazi> ssh for sure
21:13:06 <strigazi> the agent should be minimal
21:13:13 <strigazi> and general purpose
21:13:18 <flwang> where can i find upgrade-kubernetes.sh on the minion node?
21:13:35 <strigazi> to test by hand?
21:13:46 <flwang> no
21:14:02 <strigazi> it leaves for a limited period of time in /var/lib/cloud/<smth>
21:14:10 <flwang> i mean, will the script be shipped after the cluster is created
21:14:14 <strigazi> s/leaves/lives
21:14:21 <strigazi> no
21:14:34 <strigazi> it is shipped in a stack update
21:14:34 <flwang> or after the upgrade command is issued
21:14:44 <strigazi> this ^^
21:15:06 <flwang> ok, in my testing, after issuing the upgrade, i can't find it on the minion node
21:15:12 <strigazi> https://review.openstack.org/#/c/514960/4/magnum/drivers/k8s_fedora_atomic_v1/templates/kubeminion.yaml@458
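A rough sketch of the pattern strigazi describes above, with the heat-container-agent driving host-level commands over ssh instead of running them inside its own filesystem namespace. The image tag, unit name and ssh target are illustrative assumptions, not what Magnum's scripts literally contain:

    # run from inside the heat-container-agent, executed on the host over ssh;
    # `atomic install` hard-links the image into place, so it must run on the
    # same filesystem it installs to
    ssh root@localhost "atomic install --system --name kubelet \
        docker.io/openstackmagnum/kubernetes-kubelet:v1.13.5"
    # systemctl should match the host's systemd version, so run it there too
    ssh root@localhost "systemctl daemon-reload && systemctl restart kubelet"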
21:16:05 <guimaluf> hi guys, I'm using magnum to deploy a k8s cluster, but when I call `coe config cluster` it gives me this error: "a bytes-like object is required, not 'str'". does anyone have any hint or tip about it? thanks in advance
21:16:08 <flwang> we can discuss details offline if you have some time today
21:16:46 <flwang> guimaluf: we're in the weekly meeting, mind if we discuss it offline after 45 mins?
21:16:53 <strigazi> flwang: today for you is tonight for me; I do have time :)
21:17:00 <flwang> strigazi: ok
21:17:25 <flwang> strigazi: i will be around at 10:00 UTC
21:17:38 <guimaluf> flwang, oh, sorry. I thought meetings were held on #openstack-meetings! my bad! :)
21:17:54 <brtknr> guimaluf: i've PMed you with the answer ;)
21:18:13 <strigazi> flwang: maybe 09:00?
21:18:34 <flwang> strigazi: sure, no problem
21:19:53 <strigazi> flwang: for the API, shall we take it?
21:20:53 <flwang> strigazi: yes, i have tested it based on your patch
21:20:55 <flwang> it works
21:21:11 <strigazi> I can give a final +2 tmr
21:21:15 <flwang> and we can polish it along with your current functional patch if there is any small issue
21:21:28 <flwang> getting it in will make testing your patch easier
21:21:37 <flwang> i will propose an API ref patch soon
21:21:49 <flwang> strigazi: thanks
21:23:02 <strigazi> Let's move one, I need to discuss to things
21:23:21 <strigazi> s/to things/two things/
21:23:32 <strigazi> move on :)
21:23:52 <strigazi> After two corrections it worked :)
21:23:59 <strigazi> 1.14.0
21:24:12 <flwang> sonobuoy passed?
21:24:18 <strigazi> we have the containers, did it work for anyone? flwang?
21:24:28 <brtknr> in case we forget, I'd like to add 2 things to the agenda too: python-magnumclient 2.13.0 and why it is in an abandoned state: https://review.openstack.org/#/c/642609/ and the multi-nic patch https://review.openstack.org/#/c/648818/
21:24:30 <strigazi> I haven't tried, will do tmr
21:24:40 <flwang> i just tested v1.13.5, i'm going to test it today or tomorrow
21:24:51 <flwang> v1.13.5 can pass sonobuoy
21:25:11 <brtknr> i posted my sonobuoy e2e results here, which had 12 failures: http://paste.openstack.org/show/748667/
21:25:22 <brtknr> 192 passed
21:25:42 <brtknr> although i don't understand what the failures mean, to be honest
21:25:49 <ttsiouts> o/
21:25:51 <flwang> brtknr: i think it deserves a rerun
21:25:59 <ttsiouts> sorry i was late
21:26:02 <brtknr> will do
21:26:13 <strigazi> brtknr: thanks for the results
21:26:19 <strigazi> ttsiouts: welcome
21:26:28 <strigazi> brtknr: flannel or calico?
21:26:32 <strigazi> brtknr: master branch?
21:26:33 <brtknr> hello ttsiouts
21:27:15 <brtknr> strigazi: flannel, not master branch... queens :D
21:27:21 <strigazi> brtknr: ok
21:27:29 <strigazi> I will also check
21:27:33 <brtknr> i only upgraded the images
21:27:38 <strigazi> ok
21:28:06 <strigazi> I think and hope it will work for master
21:28:16 <strigazi> ok, so we are close for 1.14
21:28:25 <flwang> strigazi: we have deployed stable stein
21:28:31 <flwang> will test it soon
21:28:38 <brtknr> i am also happy to do a rerun on both master and queens
21:28:50 <strigazi> flwang: brtknr: how did you run the e2e tests?
21:29:21 <strigazi> oh 0.14.0 is out
21:29:23 <flwang> sonobuoy run
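flwang's "sonobuoy run" in slightly more detail: a hedged sketch of the e2e workflow, with the commands taken from the upstream README mentioned below rather than from this meeting, so treat the exact invocation as an assumption for the 0.13/0.14-era CLI:

    # launch the conformance e2e suite against the current kubeconfig context
    sonobuoy run
    # poll until the aggregator reports the run as complete
    sonobuoy status
    # download the results tarball for inspection
    sonobuoy retrieve .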
21:29:27 <strigazi> https://github.com/heptio/sonobuoy/releases/tag/v0.14.0
21:29:34 <strigazi> excellent
21:29:40 <flwang> i'm using 0.13
21:29:42 <strigazi> I'll try in prod and devstack
21:29:50 <flwang> will test with 0.14 for v1.14.0
21:29:56 <strigazi> flwang: congrats on stein :)
21:30:02 <flwang> haha, thanks
21:30:05 <brtknr> strigazi: I followed the instructions on https://github.com/heptio/sonobuoy
21:30:15 <flwang> we have to use stein since we need a lot of the new features in it
21:30:24 <strigazi> ok, let's finalize 1.14 tmr
21:30:49 <strigazi> brtknr: I +2'd your patch for the network config
21:31:12 <brtknr> strigazi: thanks :)
21:31:13 <flwang> as for the client patch, i will restore and push
21:31:22 <strigazi> brtknr: for the client we will release again soon
21:31:25 <flwang> i was distracted a bit
21:31:29 <strigazi> brtknr: what flwang said :)
21:31:33 <brtknr> flwang: no worries
21:31:50 <flwang> technically we will release a train client
21:32:03 <strigazi> ok, the other two things are:
21:32:17 <strigazi> NGs
21:32:53 <strigazi> flwang: brtknr: with ttsiouts we tested the migration on a copy of our prod DB
21:33:20 <strigazi> after a small fix, everything worked as expected.
21:33:27 <ttsiouts> strigazi: ++
21:33:41 <brtknr> nice, i have a question on that: is there a way to go back to the previous db state?
21:33:45 <strigazi> probably we had some trash from old migrations, but it works fine
21:33:57 <strigazi> brtknr: yes
21:34:09 <strigazi> it is called "backup the DB first" :P
21:34:22 <strigazi> the magnum db is tiny
21:34:35 <flwang> strigazi: with ng-3, a small issue i mentioned in ng4 is that it makes cluster delete very slow
21:34:41 <flwang> though the stack has been deleted
21:34:49 <flwang> it's worth a dig
21:34:53 <strigazi> we have more than 500 clusters and it is less than one megabyte
21:34:53 <brtknr> is that through a magnum cli command or copying db files manually?
21:34:58 <flwang> ttsiouts is aware i hope
21:35:11 <ttsiouts> flwang: yes, most probably it's the extra queries for fetching and deleting the NGs
21:35:26 <flwang> ttsiouts: but it's more than that
21:35:36 <strigazi> flwang: what do you mean very slow?
21:35:36 <brtknr> would be nice to do a comparison of how much longer it actually takes
21:35:38 <flwang> those extra queries shouldn't take that long
21:35:53 <ttsiouts> flwang: hmmm
21:36:20 <flwang> strigazi: generally, after the stack is deleted, magnum just takes several seconds to remove the db record from the magnum db
21:36:35 <ttsiouts> flwang: I'm checking
21:36:40 <flwang> but now it takes another 20-30 seconds to delete the record
21:36:50 <flwang> i haven't seen a delete failure
21:36:55 <flwang> so just a heads up
21:37:00 <strigazi> ok
21:37:30 <strigazi> small issue, but it should and probably can be improved
21:38:05 <flwang> yep, that's a huge job, well done ttsiouts
21:38:19 <ttsiouts> flwang: :D
21:38:20 <strigazi> I have tested the other two patches and they work fine for me
21:38:33 <strigazi> ttsiouts: ++ x10
21:38:44 <strigazi> or ++^2
21:39:01 <ttsiouts> flwang: a single query could delete them
21:39:13 <ttsiouts> I can propose a patch
21:39:48 <brtknr> yep, amazing work!
21:40:02 <ttsiouts> :)
21:40:02 <brtknr> i didn't even imagine this would be reality 1 month ago
21:40:30 <strigazi> it took some afternoon sessions but ttsiouts pulled this off
21:40:47 <strigazi> solid implementation
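Two sketches from this exchange, both assumptions rather than anything Magnum ships: strigazi's "backup the DB first" advice, and the single-statement delete ttsiouts proposes in place of per-row queries. The table and column names are guesses based on the NG work, not verified against the actual schema:

    # the magnum db is tiny, so a full dump before the NG migration is cheap
    mysqldump --single-transaction magnum > magnum-pre-ng-backup.sql

    # bulk-delete a cluster's nodegroup rows in one query instead of
    # fetching and deleting them one by one
    mysql magnum -e "DELETE FROM nodegroup WHERE cluster_id = 'CLUSTER_UUID';"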
21:41:06 <brtknr> is the CRUD bit going to be tricky?
21:41:14 <flwang> brtknr: it is
21:41:32 <flwang> strigazi: i will review ng4 and ng5
21:41:58 <strigazi> thanks
21:42:11 <strigazi> the last item comes from this:
21:42:13 <ttsiouts> flwang: thanks!!
21:42:31 <strigazi> https://review.openstack.org/#/c/648317/
21:42:52 <ttsiouts> brtknr: I don't know how many times I've thanked you for your testing, but it's not enough
21:43:18 <strigazi> Shall we implement an option to have BFV instead of extra attached volumes? it will be cleaner and easier to maintain and run
21:43:52 <flwang> what do you mean by BFV?
21:44:20 <strigazi> Boot From Volume
21:45:20 <flwang> do you mean boot from volume for the nodes?
21:45:28 <flwang> master and node
21:45:35 <strigazi> I think it will be interesting for you
21:45:37 <strigazi> yes
21:45:44 <brtknr> ttsiouts: :) my pleasure
21:45:52 <strigazi> with NGs we can separate
21:45:54 <ttsiouts> brtknr: :D
21:46:10 <strigazi> but for "old" drivers, yes, master and node
21:46:40 <strigazi> thoughts?
21:46:52 <strigazi> I think mnaser would also be interested
21:47:01 <flwang> strigazi: yep, i like it. but i don't understand how it can resolve the mount problem
21:47:41 <strigazi> because we won't mount a device to the vm and then into the kubelet container
21:47:49 <strigazi> the volume will be the root fs
21:47:56 <flwang> ah
21:48:07 <flwang> i see, you're talking about the specific case
21:48:30 <strigazi> now, we mount the device and we partition the fs
21:48:33 <brtknr> strigazi: is that interoperable with the baremetal provisioning case?
21:48:43 <flwang> strigazi: i understand what you're talking about now
21:48:57 <strigazi> brtknr: for BM it is not an issue
21:49:33 <strigazi> brtknr: for BM we don't mount a volume for the container storage (images and container overlays)
21:50:11 <flwang> strigazi: but even if we can have BFV, if we still allow use specify the docker_volume_size, user will still run into this issue
21:50:23 <flwang> so we should have both of them
21:50:46 <flwang> s/use/user
21:51:29 <strigazi> yes, it may be an issue, but we can advertise BFV for use cases where the flavor has a small disk
21:52:01 <flwang> yep
21:52:31 <strigazi> I think the original problem is addressed better with BFV VMs
21:52:40 <flwang> so let's take this and add BFV, and slowly drop docker_volume_size in the long run?
21:52:41 <strigazi> we (cern) don't use this feature
21:52:55 <strigazi> I mean docker_volume_size
21:52:58 <flwang> strigazi: yep, because you have a big image root size
21:53:03 <flwang> yep i know
21:53:13 <strigazi> well, we do, but only for swarm, AFAIK
21:53:21 <strigazi> big-ish
21:53:22 <brtknr> cert dont use BFV or mount/partition?
21:53:30 <brtknr> s/cert/cern
21:53:45 <mnaser> FWIW we are a BFV-only cloud. I think I remember pushing up or working on something about this not long ago
21:53:47 <flwang> brtknr: magnum doesn't support boot from volume now
21:54:06 <flwang> mnaser: yep, i think you have a patch stalled somewhere
21:54:12 <flwang> do you mind me picking it up?
21:54:18 <brtknr> hello mnaser
21:54:26 <mnaser> no, please go ahead. I'm a bit short on time with the release and stuff
21:54:32 <mnaser> Hi brtknr o/
21:54:39 <flwang> mnaser: cool, thanks
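What BFV means at the Nova level, as a hedged CLI illustration. Magnum would drive this through its Heat templates rather than this command, and the flavor, image, network and key names are placeholders:

    # boot the node from a 50 GB Cinder volume instead of the hypervisor's
    # local disk, so no extra docker volume needs to be attached, partitioned
    # and mounted into the kubelet container
    openstack server create --flavor m1.large --image fedora-atomic-latest \
        --boot-from-volume 50 --network private --key-name mykey k8s-minion-0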
21:54:47 <flwang> strigazi: we have 5 mins
21:55:08 <flwang> give me 2 mins for AS and AH?
21:55:46 <strigazi> https://review.openstack.org/#/c/621734/
21:56:08 <flwang> auto scaling and auto healing are important features for k8s, now we're going to do them in magnum https://review.openstack.org/631378
21:56:24 <flwang> strigazi: mind me publishing the autoscaler image on openstackmagnum?
21:56:57 <strigazi> flwang: how? copy from thomas's repo?
21:57:22 <flwang> no
21:57:29 <flwang> just download the source code and build
21:57:47 <flwang> why do i have to use thomas's repo?
21:58:11 <flwang> is thomas around?
21:58:12 <strigazi> you don't, it is just what every one of us has tested
21:58:32 <flwang> you mean i don't have to?
21:58:58 <flwang> if it's well tested, i can just copy
22:00:13 <flwang> strigazi: brtknr: could you please also test the patch https://review.openstack.org/631378 to make sure you're happy with it?
22:00:51 <brtknr> flwang: yep, i'll test it tomorrow
22:01:16 <flwang> brtknr: thanks
22:01:34 <brtknr> flwang: as soon as I redeploy my devstack, since my magnum db was migrated to support nodegroups and I didn't back up
22:01:35 <strigazi> http://paste.openstack.org/raw/748747/
22:02:40 <strigazi> flwang: I left some comments
22:02:48 <flwang> strigazi: yep, i saw that
22:02:58 <flwang> strigazi: thanks for the review
22:03:06 <flwang> strigazi: are you happy generally?
22:03:34 <strigazi> generally yes
22:03:51 <strigazi> flwang: brtknr: I pushed the image http://paste.openstack.org/raw/748747/
22:03:52 <flwang> strigazi: cool
22:04:13 <flwang> i will propose a new patch set today to address all comments
22:04:33 <brtknr> flwang: how do you normally test auto-healing?
22:04:57 <strigazi> flwang: did the CA team tell you that they will release magnum support in 1.14.1?
22:05:15 <flwang> strigazi: yes, they told me they will include it in 1.14.1
22:05:22 <flwang> it will take a couple of weeks
22:05:27 <flwang> brtknr: we can talk offline
22:05:42 <flwang> just put more load on your cluster, it's easy
22:06:05 <brtknr> flwang: isn't that auto-scaling?
22:06:11 <brtknr> sorry for the silly questions
22:06:27 <brtknr> yes, we can talk later
22:06:33 <strigazi> brtknr: http://paste.openstack.org/raw/748748/
22:07:05 <strigazi> ~ 500mb pods
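A hedged guess at what strigazi's paste shows: forcing a scale-up by scheduling pods whose memory requests the current workers cannot satisfy. The name, image and sizes are made up for illustration, and this relies on the `--replicas`/`--requests` flags that `kubectl run` still had in the 1.13/1.14 era:

    # each replica requests 500Mi of memory; raise --replicas until pods go
    # Pending, which is the signal the cluster-autoscaler reacts to by adding nodes
    kubectl run autoscale-load --image=k8s.gcr.io/pause:3.1 \
        --replicas=10 --requests='memory=500Mi'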
22:07:17 <flwang> strigazi: did you get a chance to see this https://review.openstack.org/#/c/643225/ ?
22:07:41 <strigazi> no, will do tmr
22:09:03 <flwang> strigazi: no problem, thanks
22:09:10 <flwang> i'm a happy boy now
22:09:19 <strigazi> :)
22:09:44 <strigazi> Shall we end the meeting?
22:09:49 <strigazi> anything else?
22:09:53 <mnaser> Small thing
22:10:06 <mnaser> Have the functional tests started working again magically?
22:10:19 <strigazi> we can test
22:10:24 <strigazi> I don't know
22:10:26 <mnaser> Maybe an updated kernel in CI might have helped?
22:10:26 <strigazi> flwang: ?
22:10:40 <flwang> strigazi: i don't think so, unless nested virt has been fixed?
22:10:42 <openstackgerrit> Spyros Trigazis proposed openstack/magnum master: Revert "ci: Disable functional tests" https://review.openstack.org/642873
22:10:45 <flwang> mnaser: is it?
22:11:00 <strigazi> mnaser: ^^
22:11:01 <mnaser> I know kata works fine but they don't update their kernel. So maybe it was a bad kernel and Ubuntu released an update and it's ok now
22:11:15 <mnaser> Just something to try.
22:11:26 <flwang> mnaser: thanks, good to know
22:11:44 <mnaser> If not, I can maybe try to reimplement those tests on centos. Our hosts run centos so it might be more stable
22:11:44 <flwang> the quick boy strigazi has a patch already ^
22:11:46 <strigazi> the ci is running, let's see
22:11:59 <mnaser> Cool, I'll add myself to the review.
22:12:01 <mnaser> Thanks
22:12:14 <flwang> mnaser: thanks for all your love for magnum
22:12:23 <strigazi> mnaser: ++
22:12:27 <brtknr> mnaser: yeaah!!
22:12:41 <mnaser> :D I try my best with the little time I have :P
22:13:19 <strigazi> thanks again :)
22:13:23 <flwang> mnaser: cheers
22:13:29 <strigazi> let's end the meeting then
22:13:49 <strigazi> #endmeeting