16:01:31 #startmeeting openstack_ansible_meeting
16:01:32 Meeting started Tue Oct 23 16:01:31 2018 UTC and is due to finish in 60 minutes. The chair is evrardjp. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:34 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:01:36 The meeting name has been set to 'openstack_ansible_meeting'
16:01:57 #topic rollcall
16:01:59 o/
16:02:04 o/
16:02:07 o/
16:02:25 o/
16:02:31 o/
16:02:46 Jesse Pretorius (odyssey4me) proposed openstack/openstack-ansible-ops master: MNAIO: Add legacy os-infra_hosts group back https://review.openstack.org/612737
16:02:50 o/
16:03:03 #topic Last week highlights
16:03:08 o/
16:03:22 hwoarang: added "Proposed distro job for ceph deployments and needs votes to get it in. Also restored lxc and ceph deployments for SUSE. Also needs votes :)"
16:03:48 any link to that review?
16:03:50 jrosser's highlight was "bionic timeouts - fixed by removing repo cache server"
16:04:32 my highlight is old, i've removed it from the wiki. i think it's all done but i am still catching up. iirc some opensuse jobs have been reverted and/or switched to non-voting
16:04:37 so trying to figure out what happened :(
16:05:19 ok sorry for that, it seems my browser might have caused a reference to an old highlight then
16:05:43 afaik we went nv after a lot of timeouts and failures, some might be infra related, some mirror related, and some just being some os thing
16:05:44 \o/
16:06:01 Merged openstack/openstack-ansible-os_swift master: RedHat: Use monolithic openstack-swift package https://review.openstack.org/612397
16:06:01 Merged openstack/openstack-ansible-os_swift master: zuul: Switch to distro package installation template https://review.openstack.org/606056
16:06:25 should we discuss this further in the open discussion, or is there a bug referencing this?
16:06:44 open discussion
16:06:48 great
16:06:59 let's move to bug triage first though
16:07:06 #topic bugtriage
16:07:17 Please see our usual etherpad
16:07:27 https://etherpad.openstack.org/p/osa-bugtriage
16:07:35 #link https://bugs.launchpad.net/openstack-ansible/+bug/1798079
16:07:36 Launchpad bug 1798079 in openstack-ansible "Test environment example in openstack-ansible" [Undecided,New]
16:08:33 yup
16:08:36 seems fair
16:08:45 https://docs.openstack.org/openstack-ansible/rocky/user/test/example.html doesn't talk about bounds
16:08:48 bonds*
16:08:52 looks like low hanging fruit
16:09:22 classification?
16:09:42 low/confirmed?
16:10:00 lgtm
16:10:27 anyone new wants to take this low hanging fruit?
16:10:54 I like low hanging fruit :)
16:11:00 I could give it a try
16:11:07 thanks!
16:11:10 #link https://bugs.launchpad.net/openstack-ansible/+bug/1797499
16:11:11 Launchpad bug 1797499 in openstack-ansible "keystone default deploy test uses http not https" [Undecided,New]
16:12:24 I'm pretty sure this was discussed at length last week in the channel.
16:13:02 not reflected in the bug, although the bug was updated 3 times. https://bugs.launchpad.net/openstack-ansible/+bug/1797499/+activity
16:13:02 Launchpad bug 1797499 in openstack-ansible "keystone default deploy test uses http not https" [Undecided,New]
16:13:02 The test mentioned is a localhost test against the local keystone service. It does not need to use https because the keystone container only listens on http by default.
16:13:14 Yeah, I didn't know there was a bug.
16:13:39 This is partially fixed by something that merged recently.
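[editor's aside: the exchange around bug 1797499 hinges on openstack-ansible splitting external and internal endpoints. A minimal sketch of the relevant settings, assuming the Rocky-era variable names; the VIP addresses are illustrative, and the proto overrides should be verified against the user_variables.yml example in your checkout:]

    # openstack_user_config.yml: separate VIPs, as suggested later in the meeting
    global_overrides:
      external_lb_vip_address: 203.0.113.10   # public endpoints, TLS terminated at haproxy
      internal_lb_vip_address: 172.29.236.9   # management network, plain http internally

    # user_variables.yml: endpoint protocol overrides (assumed names)
    openstack_external_ssl: true
    openstack_service_publicuri_proto: https
    openstack_service_internaluri_proto: http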
16:14:50 The actual issue the guy had was he was trying to use the same IP for the external and internal endpoints.
16:15:01 I think the title of the bug is very confusing and incorrect
16:15:47 The fact that http didn't work right when the settings were applied is now fixed in https://review.openstack.org/#/q/I823f2f949258157e306dbf80570abe53373da0c3
16:16:04 I remember said patch odyssey4me
16:16:05 good
16:16:13 so we can close this one then, as invalid?
16:16:26 if it's an incorrect classification we'll see this one open
16:16:40 there is still an issue in that if keystone is set to use client <-https-> haproxy <-https-> keystone then it will fail miserably
16:17:09 You should be able to use the same IP for internal and external endpoints, it's in the user_variables.yml.example
16:17:12 I think we may already have a bug or two for that condition
16:17:27 spotz: yes, but it never worked - but thanks to your patch it now will
16:17:36 spotz: I would not recommend it though
16:17:47 but that's another topic
16:17:56 My suggestion to the reporter was to use different IPs for the different endpoints.
16:18:05 There was a report of success after that.
16:18:08 ok
16:18:56 so do we agree on the invalid classification?
16:19:50 let's move to the open discussion
16:20:08 #topic open discussion
16:20:50 hwoarang: before going to your topic I will prioritize those already written there, as there are topics that might be skipped
16:20:57 let me know your email to add you to the hangout if coming to summit!
16:21:02 (due to their recurrent/unupdated nature)
16:21:15 that's indeed the first topic
16:21:25 #link https://etherpad.openstack.org/p/OSA-berlin-planning
16:21:55 anything else on that topic?
16:22:13 ok next
16:22:24 "Openvswitch configuration does not handle configuration properly on compute nodes. It should be configured with different interfaces on neutron agent container and compute hosts"
16:22:39 wasn't this one already present last week?
16:22:56 do you have the bug for that?
16:23:47 jamesdenton: all I can see on said topic is two links: http://eavesdrop.openstack.org/irclogs/%23openstack-ansible/%23openstack-ansible.2018-07-30.log.html#t2018-07-30T15:33:48 and https://drive.google.com/file/d/1ebmQFx4w7W6G9KJGLuj82VDH5xOwXEhW/view
16:24:03 thx
16:24:04 jamesdenton: but I think it's an old topic
16:24:12 i did OVS install recently and don't recall any issues
16:24:15 ok
16:24:21 thanks for the feedback there
16:24:42 Tahvok is not here to talk about that, so let's move towards another topic then!
16:24:50 Integration between os_tempest and tripleo validate-tempest
16:24:55 is there anything to say there?
16:25:16 anything new?
16:25:24 chandankumar and arxcruz?
16:25:46 evrardjp: I am working on https://review.openstack.org/591424
16:25:50 I think I saw a commit today, please review!
16:25:51 distro support
16:25:53 chandankumar: great!
16:25:54 evrardjp: well, i'll work to enable python-tempestconf
16:26:00 this week/sprint
16:26:03 that sounds very nice!
16:26:07 it is almost done, a few breaking changes
16:26:20 if you need help don't hesitate to ping
16:26:33 we are still reformulating how our internal sprint works, but now things are getting on track
16:26:53 arxcruz: agile!
16:26:57 evrardjp: I am in sync with odyssey4me, lots of stuff going on from those changes
16:26:57 :D
16:27:00 I should send the first wip this week
16:27:16 evrardjp: hehe, i wish :D
16:27:25 chandankumar: cool, you are in good hands!
other cores, don't hesitate to help there :)
16:27:43 i can do reviews if need be :)
16:27:45 arxcruz: hahah. Thanks for the first WIP then! :)
16:27:56 Jesse Pretorius (odyssey4me) proposed openstack/openstack-ansible-ops master: MNAIO: Add legacy os-infra_hosts group back https://review.openstack.org/612737
16:28:20 arxcruz: chandankumar do you mind if we keep this item on the agenda, so that I know to ping you, and we can track things next week?
16:28:30 spotz: mbuil opened a bug for AIO networking not working: https://bugs.launchpad.net/openstack-ansible/+bug/1799507
16:28:30 Launchpad bug 1799507 in openstack-ansible "AIO deployment instances do not have network connectivity" [Undecided,New]
16:28:31 evrardjp: sure
16:28:33 evrardjp: sure
16:28:39 great
16:29:00 evrardjp: https://etherpad.openstack.org/p/openstack-ansible-tempest is our plan
16:29:01 Merged openstack/openstack-ansible-tests stable/rocky: Update Ansible to 2.5.10 https://review.openstack.org/612405
16:29:02 so far
16:29:13 feel free to comment :)
16:29:27 I will have a look :)
16:29:40 anything else on that topic?
16:29:55 If not, I'd like to leave the mic to hwoarang
16:30:00 ok
16:30:44 so what i would like to talk about is this job revert/non-voting situation. what we normally do when a job breaks one day is move it to non-voting
16:30:59 however, nobody remembers to bring it back to voting, so the testing matrix is different every other day
16:31:09 and not sure what we can do about that
16:31:15 it all feels a bit random right now
16:31:38 hwoarang: typically I try to push an immediate patch to revert the non-voting change, and then recheck it from time to time
16:31:42 I find that works best
16:31:55 Zuul v3 brought us the ability to be more flexible, but indeed I don't like this kind of expectation issue personally
16:32:05 but, quite honestly, all the job changes are making it hard for me to get work done to switch the role to use the integrated build properly
16:32:21 we are fortunate enough to have distro people around so maybe we can reach out to them, give them like 2 days to fix stuff before we revert or something?
16:32:22 should we decide a rule?
16:32:31 because moving from non-voting to voting normally has lower prio
16:32:38 so things can stay non-voting for days or weeks
16:32:50 it seems like the mirror issues are better now, that's definitely been more stable since nicolasbock upped the mirror refresh frequency
16:32:54 hwoarang: which can lead to bad things
16:33:16 the only issue now is that broken packages hit the repositories relatively often on master - especially for the distro builds
16:33:31 but a broken master is expected from time to time :/
16:34:04 keeping jobs as voting actually puts pressure on upstream people for a quick fix
16:34:12 may I suggest we move towards non-voting on master, and keep jobs stable on stable branches, until 2 days have passed without improvements?
16:34:14 perhaps then distro builds should remain non-voting for master until after m3, then work gets done to make it all work right until the RC time frame?
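[editor's aside: the voting/non-voting flip under discussion is a single attribute on a Zuul v3 job definition. A minimal sketch; the job and parent names are illustrative, not real OSA job definitions:]

    # .zuul.yaml in the affected repository
    - job:
        name: openstack-ansible-deploy-aio_distro-opensuse-423
        parent: openstack-ansible-deploy-aio
        voting: false   # still runs and reports in check, but no longer blocks merges

[reverting the flip means deleting the voting line again, which is exactly the step the channel notes tends to be forgotten.]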
that sucks though because it puts tons of pressure on everyone working on that then
16:34:21 more ideally, the work should be spread out
16:34:34 so it'd be far nicer if we could use a more stable repo somehow
16:34:47 something that gets testing before promoting
16:35:00 it maybe doesn't need to wait for m3
16:35:06 that might work since packages are changing quite often before M points
16:35:27 i am fine with stabilizing distro jobs after branching too
16:36:10 could we perhaps rather use an infra-specific mirror that's updated only after the package updates are tested and validated?
16:36:13 what i don't like is this flip-flop because it's hard to keep track of it on all the repos
16:36:22 yep, definitely agreed for that
16:36:25 also the jobs should stay in check and we should all together not merge things blindly if a -nv fails -- really look at the failure
16:36:50 odyssey4me: hwoarang agreed on the no flip-flop
16:37:15 We currently don't have that for master odyssey4me
16:37:41 ok, but perhaps there'd be a way to implement it in openstack-infra?
16:37:51 so far it seems that distro jobs are causing the big trouble so we can make them non-voting. but the source based ones should remain voting and we should try to fix them instead of moving to non-voting. fixes normally arrive in less than 48h
16:38:21 some sort of periodic job to test the 'proposed' set, then if it passes copy the tested set into the infra mirrors
16:38:45 well ideally upstreams should CI their packages ;p
16:39:01 hwoarang: and publish them right? :p
16:39:03 that seems fair to me - we aim to switch the distro jobs to voting before the new release, and all stable branches have them voting
16:39:05 :)
16:39:22 so could we sum it up?
16:39:25 that sounds like the easier solution to keep 'master' happy
16:39:41 *easiest
16:39:59 master: source -- wait for 2 days, packages -- see what we can do with more stability + making them non-voting until milestone x // stable branches -- always wait for 2 days
16:40:45 more or less. at least check with the appropriate $distro channel for an ETA
16:40:54 maybe the problem is not known to them at all
16:41:26 well, centos and ubuntu don't update their packages without testing - so those distro jobs could perhaps remain voting the entire time... but it seems that opensuse is not testing the master packages after prepping them, so this may have to be suse specific?
16:41:30 hwoarang: indeed
16:41:47 right now the ubuntu distro installs are incomplete, so they'd need to be non-voting anyway
16:41:56 but I think centos is fine as far as I've seen
16:42:07 odyssey4me: and we are not running bleeding edge PPAs
16:42:08 odyssey4me: honestly i don't know how the suse cloud team is testing the packages
16:42:30 evrardjp: yep, if that changes then we'd likely have to apply the same rule
16:42:36 We don't hwoarang (IRC) ;)
16:42:39 * evrardjp whistles
16:42:54 ah ok then
16:42:58 We don't spend a lot of time on master unfortunately
16:43:07 we do but...
OMG you don't want to know
16:43:14 so stabilizing after branching seems the most sensible thing for suse
16:43:14 Once it's branched it's a different story
16:43:24 I think so
16:43:31 ok then so be it
16:43:37 Unless we can convince the right people to add some more vetting to master ;)
16:43:46 ok, so we're all happy for Ubuntu/SUSE distro jobs to remain non-voting until the RC period where work ramps up to get them working - any work done during the cycle is appreciated and advised, but it may break routinely
16:43:52 nicolasbock: isn't that what I suggested?
16:43:54 :p
16:43:58 odyssey4me: ok
16:44:15 hehe
16:44:16 mnaser: you happy with that?
16:44:35 It seems that CentOS is the model to follow for the rest. ;)
16:44:35 Yes, I don't think we need to convince you evrardjp (IRC) ;)
16:44:36 Merged openstack/openstack-ansible-tests master: Update ansible to latest stable 2.6.x https://review.openstack.org/612062
16:45:01 ok are we done on this topic? A new bug was raised
16:45:14 from jungleboyj
16:45:19 https://bugs.launchpad.net/openstack-ansible/+bug/1799507
16:45:19 Launchpad bug 1799507 in openstack-ansible "AIO deployment instances do not have network connectivity" [Undecided,New]
16:45:26 i mean, i don't think it's ideal, but i don't think we can do better.
16:45:44 hwoarang: FYI https://review.openstack.org/612391 is up, but isn't passing yet :/
16:45:53 Merged openstack/openstack-ansible-os_glance master: Make glance cache management cron task idempotent https://review.openstack.org/612065
16:46:05 odyssey4me: yes but it's not suse who is failing :)
16:46:10 but yes, packagers: please CI your stuff, that'd be awesomeeeEe
16:46:10 I can poke jungleboyj if we need him
16:46:13 so...
16:46:27 i am here
16:47:12 sorry, I might have pulled jungleboyj a little too early into the conversation if you are all still talking about that
16:47:16 :p
16:47:18 hwoarang: ah yes, I remember now :p
16:47:35 jungleboyj: can you give us your openstack_user_config.yml and eventual user variables?
16:48:08 i mean, we can't force people to do things but it would REALLY be nice if they cared about downstream users, because if they break us we can break them and vice versa
16:48:08 or did you use gate_check_commit?
16:48:13 but yeah. moving on.
16:48:32 sorry to keep the clock in there
16:48:34 :p
16:49:23 jungleboyj: I propose we continue discussing your bug after the meeting, would that be okay for you? It would give you time to publish said configuration or say how you reached said state (which process did you run, for example)
16:49:36 (it's only in 10 minutes)
16:49:54 so for the last 10 minutes: are there other topics?
16:50:18 evrardjp: Sure. Happy to provide any data you need. :-) Just let me know how I can help.
16:50:46 thanks jungleboyj
16:51:01 Jesse Pretorius (odyssey4me) proposed openstack/openstack-ansible-os_glance stable/rocky: Make glance cache management cron task idempotent https://review.openstack.org/612752
16:51:02 thank you
16:51:10 bionic/ceph is another -nv thing that might fester. I have it working but only by upgrading ceph to mimic
16:51:28 oh
16:51:34 that's quite something to track too
16:51:56 jrosser: do you have help there? like logan- ?
16:52:08 i want a second opinion really
16:52:24 uca and dl.ceph.com only seem to provide mimic for bionic anyway
16:52:27 o/
16:52:43 we should use mimic then
16:52:45 jrosser: sounds sensible to me then
16:52:54 and a discussion point might be if we prefer uca or ceph.com packages
16:52:56 jrosser.doit()
16:52:57 sounds reasonable to me
16:53:01 odyssey4me: :D
16:53:04 i prefer ceph.com because they work :)
16:53:16 i can't make the uca ones pass gate check
16:53:21 sounds fine to me, I just hate the extra moving part :p
16:53:30 jrosser: you mean the packages or the people?
16:53:34 ^ I've hit bugs in radosgw with uca packages because they were pending SRU in launchpad
16:53:39 so i always use ceph.com packages personally
16:53:50 isn't ceph mirrored in infra btw
16:53:58 ceph.com also provides debug symbol packages, which is a ++ good thing
16:53:59 mnaser: hammer is :p
16:54:03 http://mirror.sjc1.vexxhost.openstack.org/ceph-deb-mimic/
16:54:08 http://mirror.sjc1.vexxhost.openstack.org/ceph-deb-luminous/
16:54:10 http://mirror.sjc1.vexxhost.openstack.org/ceph-deb-jewel/
16:54:11 :D
16:54:14 problem solved?
16:54:18 mnaser: :D
16:54:22 mnaser: look at the env var passed into a job though
16:54:23 jrosser: agreed
16:54:29 that's not so useful
16:55:00 not sure, as in when to use uca and when not to?
16:55:01 ok, do we need to switch from mirroring to a reverse proxy instead? or perhaps expose the right env vars?
16:55:24 * jrosser school half term so not managed to follow this up
16:56:04 ok, so we keep using the ceph packages, but need to switch to using the right mirror - and figure out how we get the right mirror path
16:56:10 anyway - i have patches in for switching to mimic and also some general tidy-up of the ceph server install
16:56:16 odyssey4me: agreed
16:56:34 jrosser: thanks for that!
16:56:35 reviews please on those and i can chase them up when i'm back at work later in the week
16:56:47 Dmitriy Rabotjagov (noonedeadpunk) proposed openstack/openstack-ansible-os_masakari master: Basic implementation of masakari-monitors https://review.openstack.org/584629
16:57:17 FYI I've got a URL that may be useful for reviewers in a hurry: http://bit.ly/2NVPFCg
16:57:43 Those are mergeable, have passed CI, submitted by cores, and have no negative reviews.
16:57:49 thanks odyssey4me
16:57:59 only two minutes remaining for your last topics!
16:59:33 * mnaser can't wait for gerrit shared dashboards
16:59:34 ok thanks everyone!
16:59:37 odyssey4me: will be our go-to dashboard-ian
16:59:39 mnaser: agreed
16:59:46 haha
17:00:09 #endmeeting
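[editor's aside on the ceph discussion above: a hedged sketch of the agreed outcome, ceph.com-style packages at mimic pulled from the infra mirror, expressed as user_variables.yml overrides. The variable names follow the ceph-ansible conventions of that era and are assumptions to verify; the unresolved "right mirror path" question from the meeting is exactly why the URL below may not drop in cleanly:]

    # user_variables.yml: illustrative only, verify variable names in your checkout
    ceph_origin: repository
    ceph_repository: custom            # 'uca' was the alternative weighed above
    ceph_stable_release: mimic
    # infra mirror of the ceph.com debs listed at 16:54; its layout differs
    # from download.ceph.com, hence the open question about the mirror path
    ceph_custom_repo: "http://mirror.sjc1.vexxhost.openstack.org/ceph-deb-mimic"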