09:04:04 <flwang1> #startmeeting magnum
09:04:05 <openstack> Meeting started Wed Sep 23 09:04:04 2020 UTC and is due to finish in 60 minutes. The chair is flwang1. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:04:06 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:04:08 <openstack> The meeting name has been set to 'magnum'
09:04:18 <flwang1> brtknr: meeting?
09:04:24 <brtknr> sure
09:04:28 <brtknr> o/
09:04:41 <flwang1> o/
09:04:56 <flwang1> Spyros said he will try, but it seems he is not online
09:05:20 <flwang1> as for the hyperkube vs binary, i'm wondering if we can keep supporting both
09:05:21 <jakeyip> o/
09:05:26 <flwang1> hi jakeyip
09:05:34 <brtknr> o/ jakeyip
09:05:49 <brtknr> i am happy to support both
09:06:24 <strigazi> flwang1: o/
09:06:38 <flwang1> strigazi: hey
09:06:50 <flwang1> #topic hyperkube vs binary
09:07:11 <flwang1> strigazi: i'm wondering if we can support both, do you think it's a bad idea?
09:07:44 <strigazi> flwang1: no, i think we have to
09:07:59 <brtknr> o/ strigazi
09:08:06 <strigazi> flwang1: well it is bad, but we have to :)
09:08:11 <strigazi> brtknr: hello
09:08:29 <flwang1> when i say support both, i mean something like we did before for atomic system container and the hyperkube
09:09:08 <flwang1> however, that will make the upgrade script a bit messy
09:09:16 <brtknr> do we allow the possibility of going from hyperkube -> binary?
09:09:34 <strigazi> brtknr: for a running cluster we can't do it
09:09:59 <flwang1> strigazi: why can't we?
09:10:13 <brtknr> because the upgrade scripts are baked into the heat template?
09:10:49 <strigazi> brtknr: yes
09:11:08 <flwang1> strigazi: i see the point
09:11:31 <strigazi> flwang1: brtknr: what we can do is deploy a software_deployment from another template but we will lose track so much
09:11:51 <strigazi> the good part is that with NGs at least, new NGs in running clusters will get the binary
09:12:19 <flwang1> if that's the case, we will have to support hyperkube
09:12:40 <brtknr> i noticed it yesterday when i tried to bootstrap some new nodes with the virtio-scsi fix: https://review.opendev.org/#/c/753312/
09:12:44 <flwang1> and from an operations pov, the cloud provider can decide when to fully switch to binary
09:12:52 <brtknr> it didnt work with an existing cluster, only with a new cluster
09:13:40 <strigazi> flwang1: sounds reasonable, at least for new clusters we should avoid hyperkube
09:13:43 <flwang1> ok, so can we make an agreement that we will have to support both?
09:14:17 <flwang1> like atomic system container and hyperkube
09:14:23 <brtknr> can we allow the possibility of choosing hyperkube source? i dont mind defaulting to k8s.gcr.io
09:14:25 <flwang1> that's a bit funny for me
09:15:40 <flwang1> brtknr: i don't mind supporting that
09:15:42 <brtknr> it doesnt affect the function, just gives operators more flexibility
09:15:51 <strigazi> brtknr: flwang1: we can add a new label for rancher
09:15:52 <flwang1> brtknr: yep, i understand
09:16:10 <strigazi> for people with their own registry it is pointless
09:16:15 <flwang1> strigazi: yep, that's basically what brtknr proposed
09:16:30 <flwang1> strigazi: true
09:16:57 <strigazi> it is just that it doesn't work with the private registry override we have. like who wins?
09:17:15 <strigazi> brtknr: you will honestly rely on rancher?
09:18:17 <strigazi> brtknr: maybe do it like helm? repository and tag?
09:19:17 <brtknr> strigazi: do you see problems with using rancher?
09:20:24 <strigazi> brtknr: it is a competitor project and product, not sure about the license and how community driven it is. From our (magnum) side, we can make the option generic and not default to it
09:21:03 <strigazi> if someone opts in for it, they are on their own
09:21:11 <brtknr> strigazi: i understand, i will revert the change to default to it
09:21:35 <brtknr> what do you mean by > maybe do it like helm? repository and tag?
09:21:56 <strigazi> brtknr: instead of prefix, have the full thing as a label
09:22:13 <strigazi> repository: index.docker.io/rancher/foo
09:22:30 <strigazi> tag: vX.Y.Z-bar
09:22:52 <brtknr> and not limit to calling it hyperkube?
09:22:53 <strigazi> the tag we have it as kube_tah
09:22:58 <strigazi> the tag we have it as kube_tag
09:23:02 <flwang1> i think we can reuse our current kube_tag, just need a new label for the repo, isn't it?
09:23:05 <openstackgerrit> Merged openstack/magnum-ui stable/victoria: Update .gitreview for stable/victoria https://review.opendev.org/753179
09:23:06 <openstackgerrit> Merged openstack/magnum-ui stable/victoria: Update TOX_CONSTRAINTS_FILE for stable/victoria https://review.opendev.org/753180
09:23:38 <strigazi> brtknr: yeah, since it is a new label, why limit it? the community has a high rate of changing things
09:23:50 <strigazi> flwang1: yes
09:24:13 <brtknr> okay okay okay sdasdasd
09:24:17 <strigazi> smth like hyperkube_image? hyperkube_repo or _repository
09:24:24 <brtknr> sdasdasds
09:24:26 <brtknr> sdsds
09:24:28 <brtknr> qweqwewqeqe
09:24:34 <brtknr> oops sorry
09:24:38 <strigazi> cat?
09:24:38 <jakeyip> lol
09:24:49 <openstackgerrit> Merged openstack/magnum-ui master: Update master for stable/victoria https://review.opendev.org/753181
09:24:49 <openstackgerrit> Merged openstack/magnum-ui master: Add Python3 wallaby unit tests https://review.opendev.org/753182
09:24:51 <brtknr> my keyboard was unresponsive
09:25:03 <strigazi> I wanted to be cat :)
09:25:22 <brtknr> lol
09:25:37 <brtknr> okay i'll change it to hyperkube repo
09:25:44 <strigazi> So we agree on it right?
09:26:37 <flwang1> strigazi: +1 are you going to polish the patch?
09:26:37 <strigazi> Sorry team, I have time for one more quick subject, I need to leave in 10'. A family thing, everything is ok.
09:26:45 <strigazi> flwang1: yes
09:27:01 <flwang1> strigazi: cool, i will take a look at the upgrade part
09:27:11 <flwang1> strigazi: i have an interesting topic
09:27:16 <flwang1> the health monitor of heat-container-agent
09:27:30 <flwang1> i have seen several cases of dead heat-container-agent on our production
09:27:36 <flwang1> which caused upgrades to fail
09:27:57 <strigazi> flwang1: I can change to brtknr's suggestion to use the server tarball.
09:28:15 <flwang1> who will build the tarball?
09:28:19 <flwang1> and where to host it?
09:28:20 <strigazi> flwang1: will the monitor help?
09:28:37 <strigazi> flwang1: already there, built by the community
09:28:41 <flwang1> podman does support health-check command, but i have no idea what's the good command we should use
09:28:44 <brtknr> and use kube-* binaries from the server tarball?
09:29:05 <flwang1> strigazi: if there is a tarball from community, then i'm happy to use it
09:29:16 <brtknr> what about kube-proxy? you mentioned you had issues getting kube-proxy working?
09:29:30 <strigazi> brtknr: only kubelet binary, for the rest the images. They are included in the tarball. I will push, we can take it in gerrit
09:29:42 <flwang1> the tarball will include kubelet and the container images of the kube components, is it?
09:29:43 <strigazi> brtknr: works now
09:29:50 <strigazi> flwang1: yes
09:30:04 <flwang1> then i will be very happy
09:30:11 <brtknr> cool!
09:30:19 <flwang1> because it can resolve the digest issue
09:30:22 <flwang1> properly
09:30:25 <strigazi> flwang1: exactly
09:30:34 <flwang1> very good
09:30:39 <flwang1> move on?
09:30:44 <strigazi> yes
09:30:55 <flwang1> #topic health check of heat-container-agent
09:31:02 <strigazi> flwang1: you want smth to trigger the unit restart?
09:31:11 <flwang1> strigazi: exactly
09:31:18 <strigazi> sounds good to me
09:31:22 <brtknr> seems non-controversial
09:31:28 <brtknr> next topic? :P
09:31:29 <flwang1> because based on my experience, a restart just fixes it
09:31:55 <strigazi> "did you turn it off and on?"
09:32:10 <flwang1> turn what off and on?
09:32:25 <strigazi> yeap, the fix for everything
09:32:26 <brtknr> vm
09:32:41 <flwang1> no, i don't want to restart the vm
09:32:45 <flwang1> just heat-container-agent
09:32:57 <flwang1> because the node is working ok
09:33:15 <brtknr> please propose a patch, i dont know how this is health check?
09:33:32 <strigazi> flwang1: was the process dead?
09:33:44 <strigazi> the container was still running?
09:33:53 <brtknr> which heat container agent tag are you using?
09:34:02 <flwang1> i only have a rough idea, podman does support health-check command, but i don't have any idea how to check the health of heat-container-agent
09:34:14 <flwang1> the process is not dead
09:34:19 <strigazi> in theory we shouldn't
09:35:17 <flwang1> i can't remember the details, because it's a customer ticket, the only thing i can see is by 'journalctl -u heat-container-agent', there is nothing
09:35:27 <strigazi> i think improvements in the container or systemd are just welcome, right?
09:35:29 <flwang1> train-stable-1
09:35:43 <flwang1> strigazi: i think so
09:35:57 <brtknr> +1
09:36:03 <flwang1> as i said above, i just want to open my mind by getting some input from you guys
09:36:33 <brtknr> anything in the HCA logs before it dies?
09:37:04 <flwang1> brtknr: how to see the log from before it got stuck?
09:37:08 <flwang1> i don't think i can
09:37:17 <flwang1> it's rotated i think
09:37:20 <brtknr> inside /var/log/heat-config/
09:37:43 <flwang1> that's normal i would say
09:37:56 <strigazi> flwang1: just check if os-collect-config is running
09:38:06 <flwang1> for some reason, it just died after a long idle time
09:38:09 <strigazi> ps aux | grep os-collect-config
09:38:20 <flwang1> i can't do it now
09:38:25 <brtknr> it is not dying during script execution?
09:38:34 <flwang1> because the ticket has been resolved, it's a customer env
09:38:41 <flwang1> brtknr: it's not
09:38:54 <strigazi> flwang1: in a running cluster try to kill -9 the os-collect-config process
09:38:58 <flwang1> it's an old cluster, something like 4-5 months
09:39:19 <strigazi> see if the container/systemd unit will log smth
09:39:28 <strigazi> we can start there
09:39:36 <flwang1> strigazi: good point, i will try
09:39:51 <flwang1> i think i got something to start
09:40:08 <flwang1> brtknr: strigazi: anything else you want to discuss?
09:40:31 <brtknr> yes
09:40:39 <brtknr> can you guys take a look at this? https://review.opendev.org/#/c/743945/
09:41:01 <strigazi> flwang1: i'm good, I need to go guys, see you brtknr I will have a look
09:41:02 <brtknr> i am surprised you havent hit this, deleting cluster leaves dead trustee users behind
09:41:23 <brtknr> cya strigazi
09:41:30 <flwang1> strigazi: cya
09:41:37 <brtknr> strigazi: btw i will be off from next month until next year
09:41:50 <brtknr> on parental leave
09:41:59 <brtknr> so will be quiet for a while
09:42:02 <flwang1> brtknr: sorry, i will test it tomorrow
09:42:21 <brtknr> thanks flwang1
09:42:23 <flwang1> brtknr: so you won't do any upstream work in the next 4 months?
09:42:41 <strigazi> brtknr: oh, nice, family time :)
09:42:44 <brtknr> flwang1: i will try but will be looking after 2 babies full time :P
09:43:07 <strigazi> flwang1: brtknr: signing off, brtknr I will catch you before your leave :)
09:43:44 <flwang1> i see. no problem
09:44:38 <flwang1> it's a good idea to have more time with family during the covid19 time
09:46:27 <flwang1> anything else, team?
09:46:48 <jakeyip> I have a question
09:47:01 <jakeyip> does anybody run out of ulimits using coreos?
09:47:17 <jakeyip> https://github.com/coreos/fedora-coreos-tracker/issues/269#issuecomment-615202267
09:48:16 <flwang1> jakeyip: https://review.opendev.org/#/c/749169/
09:48:25 <flwang1> is this what you're looking for?
09:48:41 <jakeyip> lol
09:48:56 <flwang1> it's just merged today :)
09:49:09 <jakeyip> yeah I know, what a coincidence
09:49:23 <flwang1> jakeyip: putting some time into code review will save you more time :D
09:49:52 <jakeyip> good idea ;)
09:50:39 <flwang1> if there is no new topic, i'm going to close the meeting for today
09:50:50 <flwang1> thank you for joining, my friends
09:50:54 <jakeyip> do you plan on backporting? I can do it in our repo anyway so no big deal
09:51:09 <flwang1> jakeyip: we will do backport to Victoria
09:51:27 <jakeyip> ok
09:51:39 <openstackgerrit> Feilong Wang proposed openstack/magnum stable/victoria: Update default values for docker nofile and vm.max_map_count https://review.opendev.org/753559
09:51:56 <openstackgerrit> Feilong Wang proposed openstack/magnum stable/victoria: Remove cloud-config from k8s worker node https://review.opendev.org/753560
09:52:08 <flwang1> #endmeeting
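
Editor's sketch (not part of the meeting log): the health-check topic ended with the suggestion to check whether os-collect-config is running via `ps aux`. A minimal Python version of that check might look like the block below. The function names `process_listed` and `agent_healthy` are hypothetical, and whether this would run inside the heat-container-agent container or on the host is an assumption the meeting did not settle. It only detects a missing process; the log leaves open whether the stuck agent still had the process alive, so this may not catch the failure that was observed.

```python
import os
import subprocess


def process_listed(ps_output: str, name: str, skip_pid: int = -1) -> bool:
    """Return True if a COMMAND column in `ps aux` output contains `name`.

    `skip_pid` lets the caller ignore its own process, in case the checking
    script's command line happens to contain the same name.
    """
    for line in ps_output.splitlines()[1:]:  # first row is the column header
        # `ps aux` prints 11 columns; the last one (COMMAND) may contain spaces.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        if fields[1] == str(skip_pid):
            continue
        if name in fields[10]:
            return True
    return False


def agent_healthy(name: str = "os-collect-config") -> bool:
    """Run `ps aux` directly (no grep, so no self-match) and look for `name`."""
    result = subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    )
    return process_listed(result.stdout, name, skip_pid=os.getpid())
```

A wrapper could exit 0 when `agent_healthy()` returns True and non-zero otherwise, so that whichever mechanism is chosen (a podman health check or a systemd unit restart, as discussed above) can act on the result.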