16:01:54 <mihgen> #startmeeting fuel
16:01:55 <openstack> Meeting started Thu Aug 21 16:01:54 2014 UTC and is due to finish in 60 minutes. The chair is mihgen. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:56 <tatyana> hi
16:01:56 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:01:56 <akislitsky> hi
16:01:57 <meow-nofer> hi
16:01:59 <openstack> The meeting name has been set to 'fuel'
16:02:07 <mihgen> hey
16:02:22 <mihgen> today vkozhukalov is on vacation so I have to run all these commands )
16:02:33 <mihgen> who else is here?
16:02:37 <dpyzhov> hi
16:02:38 <angdraug> o/
16:02:40 <rmoe> hi
16:02:40 <sambork> hi
16:02:40 <ikalnitsky> o/
16:02:45 <christopheraedo> hi
16:02:45 * dilyin here
16:02:49 <msemenov> hi
16:02:59 <mihgen> good
16:03:03 <asyriy> hi
16:03:05 <vkramskikh> hi
16:03:09 <mihgen> let's go over the agenda
16:03:13 <mihgen> #link https://etherpad.openstack.org/p/fuel-weekly-meeting-agenda
16:03:20 <agordeev> hi
16:03:23 <mihgen> #topic 5.1 release status (Fuel, mos-openstack, mos-linux)
16:03:55 <mihgen> so folks, we still have bugs, yes, and still can't meet the HCF criteria
16:04:29 <mihgen> for Fuel we have Galera issues and many patching issues, as well as a few other things
16:05:08 <mihgen> I think I'll pass the floor to folks to talk about Fuel issues first, and then we switch to mos-openstack/mos-linux
16:05:19 <mihgen> #topic Galera issues
16:05:33 <mihgen> holser: please provide us with the latest status on this
16:05:49 <mihgen> #link https://bugs.launchpad.net/bugs/1354479
16:05:50 <uvirtbot> Launchpad bug 1354479 in fuel "Galera is not syncing on the slaves sometimes" [Critical,In progress]
16:06:12 <mihgen> #link https://bugs.launchpad.net/bugs/1355162
16:06:13 <holser> mihgen: Finally I found the issue with Galera
16:06:16 <uvirtbot> Launchpad bug 1355162 in fuel "[library] MySQL Galera is not operable after controllers hard reset" [High,Confirmed]
16:06:38 <holser> They were caused by high memory consumption, which caused heavy swap in/swap out
16:06:53 <mihgen> is it the only reason?
16:07:14 <holser> I made a review to switch from mysqldump to xtrabackup and slightly decreased RAM
16:07:19 <holser> and that helped
16:07:38 <holser> so now I am running BVT tests with my patch
16:07:59 <mihgen> holser: do we need to increase the ram size for BVT/Fuel CI jobs?
16:08:06 <holser> only that, but xtrabackup helps as it's much faster, so it's enough not to time out the deployment
16:08:13 <holser> mihgen: we do
16:08:33 <mihgen> teran_: did you agree on that? ^^
16:08:48 <holser> as I showed, complex deployments with neutron+gre consume up to 3GB RAM
16:09:28 <holser> ps ax -O rss | awk '{ sum += $2 } END {print "Total memory usage =", sum/1024, "MB"}'
16:09:29 <holser> Total memory usage = 2691.69 MB
16:09:44 <nurla> mihgen: we've already increased the ram size for tests to 2.5gb
16:09:51 <holser> That's after deployment; during deployment it was 3Gb
16:09:58 <angdraug> xarses also mentioned before that we need 2 vCPUs on test VMs
16:10:11 <angdraug> with one vCPU, there's too much context switching
16:10:16 <holser> 2 vCPU should help also
16:10:32 <mihgen> nurla: so should we get 3gb instead, and 2 vcpu?
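
[Editor's note: not part of the original log — a minimal shell sketch of how the memory/swap pressure and the SST-method switch discussed above could be checked on a controller node. The ps/awk one-liner is holser's from the log; vmstat and the wsrep_sst_method variable are standard procps/Galera tooling, and the expected value is an assumption based on the patch under review, not something stated in the meeting.]

    # swap in (si) / swap out (so) activity during deployment
    vmstat 5 3
    # total resident memory of all processes, as quoted in the meeting
    ps ax -O rss | awk '{ sum += $2 } END {print "Total memory usage =", sum/1024, "MB"}'
    # SST method should read 'xtrabackup' once the mysqldump -> xtrabackup patch is applied
    mysql -e "SHOW VARIABLES LIKE 'wsrep_sst_method'"
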
16:10:54 <mihgen> at least for now, before we improve consumption if possible
16:11:15 <teran_> mihgen: I'd prefer we start to use xtrabackup, I have experience with that tool, it's faster than mysqldump
16:11:29 <angdraug> one more thing from xarses (he's having trouble getting online): we should enable KSM on jenkins slaves
16:11:38 <mihgen> teran_: we will, but as I understood we still need 3gb
16:11:40 <holser> teran_: +1 https://review.openstack.org/#/c/109606/
16:12:04 <nurla> these requirements will affect vbox users
16:12:16 <mihgen> also there was a suggestion from xdeller in the mailing list about in-memory compression
16:12:24 <mihgen> we might need to consider that as well
16:12:43 <angdraug> mihgen: yup, that's what KSM is for, RAM deduplication
16:12:44 <mihgen> holser: it's actually great that you nailed that down
16:13:12 <tzn> KSM can be very CPU heavy
16:13:21 <mihgen> teran_: is it on your todo list to take a look at this?
16:13:29 <tzn> it might impact performance
16:13:55 <angdraug> we have it enabled on the dev server in MTV, it's not too bad wrt CPU
16:13:59 <holser> I'd use huge pages instead for our libvirt virtual instances
16:14:02 <mihgen> maybe the devops team can try it and see how it works
16:14:08 <teran_> mihgen: about 3g - currently it's possible, but it could make us close our eyes to some performance issues
16:14:09 <tzn> +1 hugepages
16:14:37 <mihgen> holser: what's the decision, 3gb, or is 2.5 ok?
16:14:44 <tzn> although we could have a better gain with KSM, worth checking
16:14:47 <holser> 2.5 should be ok
16:14:58 <holser> let's leave 2.5GB for now
16:15:03 <nurla> +1
16:15:06 <mihgen> ok. cool
16:15:22 <mihgen> let's move on
16:15:33 <mihgen> #topic Patching status
16:15:34 <holser> but the question about 2 vCPUs is still open
16:15:45 <mihgen> teran_: pls consider 2 vcpu
16:15:48 <teran_> mihgen: I saw the suggestion about RAM compression - that should help, so yeah, it's in my todo :)
16:15:57 <mihgen> kk :)
16:16:11 <mihgen> patching shows new areas of issues
16:16:16 <teran_> mihgen: ok
16:17:05 <mihgen> we've discovered that python deps of openstack are not going to be updated during patching
16:17:20 <mihgen> so basically oslo.messaging is not updated on the node
16:17:37 <mihgen> just because puppet doesn't know that it should update this package
16:17:43 <mihgen> and it's not updated via the deps tree
16:17:54 <mihgen> dilyin: what's our action plan on this?
16:18:24 <mihgen> ikalnitsky: any more news on other issues related to patching?
16:18:28 <dilyin> we have decided to add the missing package installation to Fuel together with a service notification
16:18:29 <angdraug> that's what we get for using puppet instead of apt/yum to manage dependencies
16:18:56 <ikalnitsky> mihgen: yep. here's the status
16:19:04 <dilyin> bogdando has made a good patch to integrate the oslo.messaging class into controller and compute
16:19:08 <mihgen> dilyin: provide a link pls
16:19:09 <ikalnitsky> The ceilometer issue:
16:19:09 <ikalnitsky> #link https://bugs.launchpad.net/fuel/+bug/1354494
16:19:09 <ikalnitsky> the fix is merged, but afaik QA still have some troubles with ceilometer. Need to investigate.
16:19:10 <uvirtbot> Launchpad bug 1354494 in fuel/5.1.x "Puppet fails during updating ceilometer node" [Critical,Fix committed]
16:19:17 <ikalnitsky> The murano dashboard issue:
16:19:18 <ikalnitsky> #link https://bugs.launchpad.net/fuel/+bug/1355180
16:19:18 <ikalnitsky> The patch is already done, but we haven't tested it yet.
16:19:19 <uvirtbot> Launchpad bug 1355180 in fuel/6.0.x "rollback will fail - migration rollback is impossible - need to backup databases" [High,Confirmed]
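
[Editor's note: not part of the original log — a minimal shell sketch of the kind of check QA mentions later ("verify if all deps are updated"): confirming that a python dependency such as oslo.messaging was actually upgraded during patching, since puppet does not pull it in through the dependency tree. The package names are the usual Ubuntu/CentOS ones and are assumptions, not taken from the meeting.]

    # Ubuntu node: compare Installed vs Candidate versions
    apt-cache policy python-oslo.messaging
    # CentOS node: exits non-zero and prints the package if an update is still pending
    yum check-update python-oslo-messaging
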
16:19:23 <ikalnitsky> We finally resolved the issue with Ubuntu HA.
16:19:23 <ikalnitsky> #link https://bugs.launchpad.net/fuel/+bug/1356873
16:19:23 <ikalnitsky> the fix isn't merged yet, but I made some tests and it works.
16:19:24 <uvirtbot> Launchpad bug 1356873 in fuel "[Update] Patching on Ubuntu nova ha failed with Unmet dependencies" [Critical,In progress]
16:19:33 <ikalnitsky> As for this
16:19:34 <ikalnitsky> #link https://bugs.launchpad.net/fuel/+bug/1355180
16:19:34 <ikalnitsky> it was confirmed from the mos side that there are no migrations between 5.0 and 5.0.2. So we can target it for 6.0 and not fix it in 5.1.
16:19:43 <dilyin> https://review.openstack.org/#/c/116011/
16:19:46 <ikalnitsky> that's all :)
16:20:02 <mihgen> ok, thanks
16:20:24 <mihgen> I'm wondering about the current testing coverage, how deep we went, what else to expect
16:20:29 <mihgen> nurla: tatyana ^^
16:21:05 <dilyin> angdraug, actually we use both puppet and apt/yum and collect troubles from both sides)
16:21:06 <tatyana> nurla: not as deep as we want - we need to improve our tests
16:21:21 <tatyana> to verify if all deps are updated
16:21:50 <nurla> also we should cover vcenter and cli
16:22:11 <tatyana> and measure downtimes
16:22:13 <mihgen> ok, what about other stuff, like nova-network, murano/sahara
16:22:34 <tatyana> covered
16:22:45 <mihgen> tatyana: oh yes, we need to know what happens under real load by rally and patching at the same time
16:22:48 <mihgen> tatyana: good
16:22:54 <tatyana> we need to run destructive tests as well on a patched ha
16:23:45 <nurla> today we're extending our swarm to run patching against all tests
16:23:45 <mihgen> ok if we turn off the primary controller and run patching )
16:24:01 <nurla> mihgen: O_0
16:24:20 <ikalnitsky> mihgen: don't do it, i'm scared
16:24:22 <nurla> and after cluster shutdown too)
16:24:35 <mihgen> ohh. too many cases
16:24:50 <mihgen> all right, anything else on patching?
16:24:58 <mihgen> dpyzhov: what about the size of the tarballs?
16:25:19 <mihgen> sorry, it was not me, but my IRC client )
16:25:28 <dpyzhov> Well, we can save 2Gb with lrzip
16:25:40 <dpyzhov> but it takes 15 minutes to unpack
16:26:09 <xarses> so, a wash
16:26:09 <ikalnitsky> dpyzhov: what about using lrzip only on the final tarball?
16:26:10 <christopheraedo> I would say better to make it smaller at the expense of time (consider you'll also save time downloading the upgrade package)
16:26:11 <dpyzhov> I'm playing with fdupes in order to add hardlinks into the tarball
16:26:13 <dilyin> it's possible to provide only incremental repo updates. it would make update tarballs many times smaller, but we have no facilities to generate such diff files yet
16:26:49 <dpyzhov> incremental updates are out of scope for 5.1
16:26:58 <dpyzhov> It's too dangerous
16:27:05 <dpyzhov> We will get new bugs
16:27:05 <dilyin> yes, fdupes can hardlink identical files very well. it will free a lot of space on the master node but will not help with the update tarball size
16:27:10 <mihgen> yep, let's be realistic for 5.1
16:27:24 <dpyzhov> dilyin: we have duplicates in the tarball
16:27:28 <mihgen> if we can do the hardlinks approach in 5.1, that would be awesome
16:27:36 <dpyzhov> because we have two centos repos and two ubuntu repos
16:28:04 <mihgen> ok, so dpyzhov - you are trying to get hardlinks working for us?
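
[Editor's note: not part of the original log — a minimal shell sketch of the two tarball-size approaches discussed above: hardlinking the packages duplicated between the two centos and two ubuntu repos, versus lrzip compression (roughly 2Gb smaller but about 15 minutes to unpack). Paths and file names are placeholders, not the real upgrade tarball layout.]

    # find identical packages duplicated across the repos shipped in the tarball
    fdupes -r /var/upgrade/repos/
    # or hardlink identical files in place (util-linux/RHEL 'hardlink' tool)
    hardlink -c /var/upgrade/repos/
    # the lrzip alternative: better compression ratio, slower unpack
    tar cf fuel-upgrade.tar /var/upgrade && lrzip fuel-upgrade.tar
    lrunzip fuel-upgrade.tar.lrz
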
16:28:19 <dpyzhov> But it looks like lrzip is still better than fdupes
16:28:30 <dpyzhov> mihgen: yes
16:28:34 <dilyin> they are better together
16:28:44 <dpyzhov> dilyin: looks like they're not
16:28:58 <mihgen> ok, let's move on folks
16:29:14 <mihgen> #topic remove public IPs from slave nodes
16:29:27 <akasatkin> Latest ISO is http://jenkins-product.srt.mirantis.net:8080/view/custom_iso/job/custom_master_iso/65/
16:29:34 <akasatkin> The "Assign public network to all nodes" option is on the Settings tab in the UI and it is disabled by default. I.e. only "controller" and "zabbix-server" nodes will have the Public network by default.
16:29:42 <akasatkin> Both HA and non-HA deployments are under testing.
16:29:46 <mihgen> there were a lot of complaints about having public IPs assigned to all nodes, even when not needed
16:29:53 <akasatkin> Current problems are: nodes' default gateways point to the master node's IP (Sergey V. to figure out what to do with that), issues with ubuntu/neutron-vlan/ceph deployments (https://bugs.launchpad.net/fuel/+bug/1359834 - not yet reproduced on the last ISO)
16:29:54 <uvirtbot> Launchpad bug 1359834 in fuel "[custom iso] Ceph volumes don't work on ubuntu/neutron-vlan" [High,Incomplete]
16:30:14 <mihgen> xenolog: did you figure out what to do with the default gateway?
16:30:36 <mihgen> akasatkin: is cinder fixed? it was using the public net (wrongly)
16:30:48 <akasatkin> yes, fixed
16:31:01 <mihgen> ok good
16:31:25 <angdraug> is this feature a risk for the 5.1 schedule?
16:31:28 <akasatkin> ceph also works on the last ISO. we just started with this configuration on 65
16:31:52 <mihgen> angdraug: it should actually be done
16:32:09 <akasatkin> but didn't check other cfgs on the last ISO yet
16:32:12 <angdraug> for Ceph we still need to merge https://review.openstack.org/115728
16:32:31 <angdraug> otherwise it will raise HEALTH_WARN on every deployment
16:32:57 <xenolog> I can propose two ways:
16:32:57 <xenolog> 1. simple — set the default gateway to the master node or to another node that the customer can define
16:32:57 <xenolog> 2. more powerful — make a virtual router based on the controllers and managed by pacemaker.
16:33:59 <xenolog> the 1st way requires a lot of changes in nailgun/fuelweb
16:34:30 <mihgen> why a lot of changes? should not be so
16:34:33 <mihgen> in the 1st
16:34:39 <xenolog> the 2nd way requires only puppet-part changes
16:35:20 <mihgen> the 1st way should be way simpler, but we still need a public gateway for nova-network
16:35:30 <mihgen> ok, let's take it out of the meeting
16:35:39 <mihgen> and discuss in #fuel-dev or ML if needed
16:35:54 <mihgen> I hope it can be done quickly, otherwise we will have to roll all of this back
16:36:03 <xarses> xenolog: working with multiple cluster networks, I found that l23network will set whatever it's passed, we just need to fix nailgun to pass the network you want for the default
16:36:09 <mihgen> ok, I think we should move on
16:36:17 <xarses> we also need to get a gateway router from the user on the other network
16:36:22 <xenolog> mihgen:
16:36:22 <xenolog> 1. a field for IP address definition in fuel-web
16:36:22 <xenolog> 2. a network checker that verifies that the defined router is actually a router
16:36:22 <xenolog> 3. new variables in astute.yaml
16:37:01 <mihgen> xenolog: ohh
16:37:03 <mihgen> ok
16:37:13 <mihgen> let's move on
16:37:23 <mihgen> #topic mos-openstack bugs
16:37:32 <mihgen> dmitryme: around to provide status?
16:37:46 <xenolog> the 2nd way looks more reliable
16:37:51 <dmitryme> mihgen: yep, I am here
16:38:10 <mihgen> xenolog: yeah, but it seems complicated
16:38:20 <xenolog> because it doesn't require anything from the customer infrastructure
16:38:24 <dmitryme> basically we have two critical bugs left: fixing the keystone and memcache interaction
16:38:46 <dmitryme> and fixing the oslo.messaging <-> rabbitmq interaction
16:38:59 <mihgen> any progress/estimates on both?
16:39:03 <dmitryme> the first bug is in the process of debugging
16:39:15 <dmitryme> I hope 1-2 days
16:39:20 <xenolog> mihgen: more complicated to implement, but simplest to use.
16:39:32 <dilyin> I guess the first way is better. Just set the route to the IP provided by the user. If the master node goes down, it's not a big problem. An Internet connection from computes is not strictly required
16:39:46 <dmitryme> as for the second one, I've already merged the old fix we used in 5.0.1 and right now we are testing a more proper fix
16:39:46 <mihgen> dilyin: +1
16:39:59 <mihgen> xenolog: dilyin: but it looks like we need to talk about it over email :)
16:40:11 <dmitryme> s/I've/we've/
16:40:14 <mihgen> dmitryme: ok thanks
16:40:18 <dmitryme> it wasn't me :-)
16:40:20 <mihgen> dmitryme: I hope it's gonna be soon too..
16:40:36 <mihgen> anything else?
16:40:49 <dmitryme> mihgen: I will discuss it, probably the old fix is good enough for 5.1
16:40:56 <xenolog> dilyin:
16:40:56 <xenolog> > If the master node goes down, it's not a big problem
16:40:56 <xenolog> the master node CAN go down — that's the biggest problem!!!
16:41:02 <dmitryme> mihgen: nope, nothing else
16:41:22 <xenolog> because we have min. 3 controllers and only one master node.
16:41:33 <dilyin> xenolog, why? what important services are left there?
16:41:44 <mihgen> xenolog: we have no deps on the master node, only NTP and partly DNS
16:41:46 <angdraug> topic is mos-openstack?
16:41:54 <mihgen> and that's what we must fix, hopefully in 6.0
16:41:58 <mihgen> should be easy
16:42:02 <mihgen> dmitryme: thanks
16:42:08 <mihgen> moving on
16:42:14 <dmitryme> oh, I think I should have mentioned the issue which should be fixed by the sqlalchemy upgrade; it has 'high' status, the fix is under review right now
16:42:26 <angdraug> link?
16:42:35 <mihgen> dmitryme: ok
16:42:36 <xenolog> if something can иу икщлут - it is when something breaks.
16:42:40 <mihgen> #topic mos-linux bugs
16:42:53 <xenolog> s/иу икщлут/be broken/g
16:43:00 <nurla> sed
16:43:31 <mihgen> msemenov: around?
16:43:51 <mihgen> any status on the rabbitmq upgrade, iptables.. ?
16:44:09 <msemenov> mihgen: here
16:44:10 <mihgen> #link https://bugs.launchpad.net/fuel/+bug/1359096
16:44:12 <uvirtbot> Launchpad bug 1359096 in fuel "Build iptables 1.4.11 for centos" [High,Confirmed]
16:44:27 <angdraug> we also still need yet another ceph update:
16:44:32 <angdraug> #link https://bugs.launchpad.net/fuel/+bug/1341009
16:44:35 <uvirtbot> Launchpad bug 1341009 in fuel/5.0.x "[osci] obsolete ceph package in fuel-repository for 5.0.1 and 5.1" [Critical,Fix released]
16:44:41 <xenolog> DNS — this is not enough.
16:44:42 <msemenov> request https://gerrit.mirantis.com/#/c/21088
16:44:53 <msemenov> we have +1 from Evgeny Li
16:45:00 <msemenov> so it seems we can merge
16:45:56 <mihgen> angdraug: is it that critical with ceph?
16:46:04 <xarses> was the python issue resolved?
16:46:21 <mihgen> I'm a bit afraid of any pkg upgrades if we can avoid them
16:46:28 <mihgen> xarses: what python issue?
16:46:30 <angdraug> mihgen: it's not critical but it's high priority
16:46:37 <msemenov> maybe this one? https://bugs.launchpad.net/mos/+bug/1342068
16:46:38 <uvirtbot> Launchpad bug 1342068 in mos "syslog logging to /dev/log race condition" [Critical,Fix committed]
16:46:39 <mihgen> angdraug: even high..
16:46:42 <angdraug> there are upstart and radosgw related fixes in that version
16:46:50 <xarses> msemenov: correct
16:47:07 <msemenov> xarses: not reproduced with the fix. So moved to fix committed
16:47:22 <angdraug> if we're close to HCF I guess we can stay with ceph 0.80.4, but upstream highly recommends an upgrade
16:47:34 <xarses> msemenov: ok, it should be in today's ISO?
16:47:52 <msemenov> xarses: sure
16:47:53 <angdraug> nurla: do we have confirmation that the python /dev/log problem is solved now?
16:48:14 <xarses> msemenov: ok, I will retest my case that caused it to occur all the time
16:48:14 <angdraug> xarses: yes, patched python packages hit the 5.1 mirrors yesterday
16:48:32 <nurla> angdraug: no, we haven't
16:48:54 <mihgen> angdraug: I would skip it if possible, too many things, and we have 3 QA engineers going on vacation next week
16:49:01 <nurla> because nova and neutron issues blocked us
16:49:08 <angdraug> mihgen: ok :(
16:49:26 <msemenov> as from the conversation with D.Borodaenko, there should be 100% cpu load for services writing to /dev/log
16:49:51 <msemenov> after restarting rsyslog many times (even during deployment)
16:49:51 <xarses> if you restart syslog while they are writing
16:50:00 <msemenov> and we don't see it
16:50:37 <angdraug> we don't have an ISO from today that would have passed centos bvt, why?
16:50:38 <xarses> We will retest today and mark it fix released if I can't see it anymore
16:51:47 <msemenov> xarses: and if the bug is still here, please provide detailed repro steps in the issue
16:51:58 <xarses> yes
16:52:25 <mihgen> nurla: any comment on centos bvt?
16:52:46 <mihgen> nurla: we need a bug about it. every build hangs
16:52:55 <nurla> ok
16:53:16 <mihgen> ok, anything else to bring up with mos-linux?
16:53:19 <nurla> at first look, an issue with galera
16:53:50 <msemenov> link?
16:54:07 <mihgen> #topic 6.0 plans and beyond
16:54:30 <mihgen> so we've discussed it a bit previously, just repeating that the main goal is to get Juno working
16:55:00 <mihgen> the fact that we still don't test openstack master with the current puppet manifests puts releasing before the design summit at risk
16:55:14 <mihgen> so we should start doing that ASAP and collaborate across teams
16:55:29 <mihgen> we can run a few things in parallel
16:55:47 <mihgen> for now though, anyone who can help with reaching HCF should do that.
16:56:10 <mihgen> that's it from my side. any questions/suggestions?
16:56:56 <mihgen> actually, I forgot about 5.0.2
16:57:04 <mihgen> #topic 5.0.2 milestone
16:57:07 <mihgen> #link https://launchpad.net/fuel/+milestone/5.0.2
16:57:27 <mihgen> there are a number of bugs over there. We must keep watching them too
16:57:49 <mihgen> it should be like 95% backports to the stable branch
16:58:13 <mihgen> #topic other questions
16:59:41 <christopheraedo> Last week I missed a chance to answer a question about an open blueprint's status.
16:59:44 <mihgen> looks like no questions, except some we ran in parallel in the office)
16:59:45 <christopheraedo> Added in-progress and todo work items to the blueprint (https://blueprints.launchpad.net/fuel/+spec/fuel-web-docs-dev-env-restructure)
16:59:48 <christopheraedo> Will change the Fuel pages on the OpenStack wiki today/tomorrow. First just re-ordering and improving the organization. Then I'll go through and add/update content over the next few days.
16:59:51 <mihgen> christopheraedo: oh yeah
17:00:12 <mihgen> christopheraedo: very good
17:00:13 <angdraug> time
17:00:13 <mihgen> thanks
17:00:20 <mihgen> ok guys thanks
17:00:26 <mihgen> see you next meeting
17:00:32 <tatyana> bb
17:00:36 <mihgen> #endmeeting