Tuesday, 2014-04-01

tchaypoLotus907efi: https://git.openstack.org/cgit/openstack/diskimage-builder/tree/elements/ubuntu/README.md suggests that the ubuntu element will grab the cloud image00:01
Lotus907efiok, I will go read that00:01
Lotus907efithanks00:01
*** Hefeweiz1n has quit IRC00:02
tchaypoit looks like https://git.openstack.org/cgit/openstack/diskimage-builder/tree/elements/ubuntu/root.d/10-cache-ubuntu-tarball does the downloading00:02
Lotus907efihmm, not really much there00:02
*** andreaf has quit IRC00:02
Lotus907efiit doesn't say for instance how those cloud images are used00:02
greghayneswoooo 'overcloud  | DELETE_FAILED'00:03
*** eghobo has quit IRC00:03
*** newell has quit IRC00:07
openstackgerritJames Polley proposed a change to openstack/tripleo-incubator: Standardise location of environment password/rc files.  https://review.openstack.org/8325000:08
tchaypo6 days, 12 revisions, I've settled into a comfortable habit of checking that each morning and doing a new update based on the new nits people found yesterday00:13
*** cwolferh_ has quit IRC00:13
tchaypoTo me this feels like a good thing - it's getting thorough review and when it lands it will be solid. I wonder if this would be disconcerting to someone who wasn't used to this review process though00:14
mordredtchaypo: almost certainly - it's a learning curve for new folks for sure00:15
tchaypoI can't think of anything we could do to make that less of a curve, beyond looking out for new people and giving them a bit of hand-holding through their first review or two00:17
tchaypoI'm guessing most people will already be familiar with doing thorough pre-commit reviews though, so it's probably not too big of an issue anyway00:18
mordredtchaypo: we do send people a welcome email on their first commit sent in00:19
mordred:)00:19
tchaypoyeah00:20
tchaypoI believe that started just *after* my first commit00:20
mordredhehe00:20
tchaypo*grumps*00:20
mordredyou want me to send you a copy of the text in an email?00:20
tchaypoI'd rather you sent me a pony00:21
clarkbtchaypo: really? seems like most people don't do pre commit reviews00:21
tchaypoI even have a nice patch of grass outside where it could graze00:21
mordredtchaypo: I will attempt to fit a pony into an email00:21
tchaypoclarkb: maybe my perspective is skewed; my second-last employer used it (and as far as i can tell the workflow I got used to there also inspired the gerrit workflow we use on openstack); my last employer used pre-commit reviews in some cases (I always tried to get pre-commit review on my own changes, but it wasn't mandatory on my team)00:24
tchaypomordred: Be careful with providing ponies, it just raises expectations00:26
tchaypoat second-last-job someone arranged a pony for our team for sysadmin appreciation day00:26
tchaypobut then Vint Cerf wandered past and one of my workmates got a photo with him *and* the pony00:26
*** matsuhashi has joined #tripleo00:27
tchaypohttps://www.flickr.com/photos/joanne-psi/482443475/?rb=100:27
tchayposo if you end up providing the pony and it doesn't come with a side-helping of Father of the Internet I'm just going to be disappointed00:28
mordredfair00:28
lifelessStevenK: what was the pony name @ Canonical ?00:30
StevenKWoody00:34
tchaypoI can't see any problems with requesting that someone give you that particular pony.00:34
tchayponone at all.00:34
*** giulivo has quit IRC00:47
*** CaptTofu has joined #tripleo00:49
*** UtahDave has left #tripleo00:53
*** CaptTofu has quit IRC00:54
*** zigo has quit IRC00:59
*** zigo has joined #tripleo01:01
*** morazi has quit IRC01:16
*** CaptTofu has joined #tripleo01:17
*** kiall has joined #tripleo01:19
*** nosnos has joined #tripleo01:26
tchaypolifeless: When you were investigating the dhcp issues yesterday, I think you mentioned you found some rules weren't being re-applied - can you tell me more about that?01:38
tchaypoI'm pretty sure you found that they weren't being set up even after restarting neutron-openvswitch-agent though, right?01:39
lifelesstchaypo: n-o-a was crashing on startup01:39
lifelesstchaypo: due to a missing lock path setting in the undercloud01:39
lifelesswoooo billions of slaves01:40
tchaypoyep, that's not my problem. thanks01:40
tchaypoi mean, not the problem I'm seeing. "not my problem" could be read other ways.01:41
lifelessindeed01:41
*** fandi has joined #tripleo01:42
*** kiall has quit IRC01:47
*** CaptTofu has quit IRC01:51
*** CaptTofu has joined #tripleo01:58
*** noslzzp has quit IRC02:05
*** weshay has quit IRC02:06
*** slagle has quit IRC02:13
*** slagle has joined #tripleo02:15
*** slagle has quit IRC02:28
*** lifeless changes topic to "tripleo-cd running preserve-ephemeral WIP patches and https://review.openstack.org/#/c/62042/ | Using OpenStack to deploy OpenStack;meetings Tuesday 1900 UTC in #openstack-meeting-alt"02:30
*** fandi has quit IRC02:30
*** spzala has quit IRC02:33
*** fandi has joined #tripleo02:43
lifelessStevenK: https://review.openstack.org/#/c/79043/ needs rebasing02:46
*** fandi has quit IRC02:51
*** untriaged-bot has joined #tripleo03:00
untriaged-botUntriaged bugs so far:03:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/129048803:00
uvirtbotLaunchpad bug 1290488 in tripleo "Baremetal: Invalid credentials" [Undecided,Incomplete]03:00
*** untriaged-bot has quit IRC03:00
openstackgerritlifeless proposed a change to openstack/tripleo-incubator: Make it possible to pass real hardware in.  https://review.openstack.org/8318803:04
*** fandi has joined #tripleo03:05
lifelessSpamapS: reviewed https://review.openstack.org/#/c/83614/03:08
*** killer_prince has quit IRC03:08
lifelesstchaypo: http://paste.openstack.org/show/74692/ is what I was seeing03:08
StevenKlifeless: derekh would like the incubator portion to land first03:11
lifelessStevenK: yes, but it still needs a rebase :)03:11
*** ramishra has joined #tripleo03:12
*** yamahata has joined #tripleo03:13
openstackgerritSteve Kowalik proposed a change to openstack-infra/tripleo-ci: ensure-test-env now uses devtest_testenv  https://review.openstack.org/8328503:14
openstackgerritSteve Kowalik proposed a change to openstack-infra/tripleo-ci: Populate seed.ip in the testenv JSON  https://review.openstack.org/7904303:15
*** ramishra_ has joined #tripleo03:18
openstackgerritJames Polley proposed a change to openstack/tripleo-incubator: Add some clarity to the first-time user experience  https://review.openstack.org/8329403:20
*** ramishra has quit IRC03:21
openstackgerritlifeless proposed a change to openstack/tripleo-image-elements: Fixup testenv config for interface names.  https://review.openstack.org/8432603:21
openstackgerritlifeless proposed a change to openstack/tripleo-image-elements: Fixup HP region testenv config.  https://review.openstack.org/8407503:21
openstackgerritlifeless proposed a change to openstack/tripleo-image-elements: Remove options not present in the heat template.  https://review.openstack.org/8407403:21
openstackgerritlifeless proposed a change to openstack/tripleo-image-elements: Performance tweaks for testenv deploy script.  https://review.openstack.org/8407303:21
openstackgerritlifeless proposed a change to openstack/tripleo-image-elements: Tune deploy-ci-overcloud a little.  https://review.openstack.org/8407603:21
*** rpodolyaka1 has joined #tripleo03:21
lifelessboom03:22
*** matsuhashi has quit IRC03:22
lifelessSpamapS: Ng: GheRivero: thoughts solicited: https://review.openstack.org/#/c/84049/03:24
lifelessSpamapS: https://review.openstack.org/#/c/83659/03:25
*** rpodolyaka1 has quit IRC03:26
lifelessNg: GheRivero: https://review.openstack.org/#/c/82741/03:29
*** fandi has quit IRC03:31
openstackgerritSteve Kowalik proposed a change to openstack/os-cloud-config: Add CLI scripts for init-{heat,keystone,swift}  https://review.openstack.org/8433003:35
lifelessjogo: can you re-eyeball https://review.openstack.org/#/c/74600/ ?03:36
*** nosnos has quit IRC03:37
lifelessdevananda: NobodyCam: might want to look into why the undercloud job on https://review.openstack.org/#/c/84043 failed03:38
lifelessalso we need a get-state-from-hosts fix for ironic03:39
*** CaptTofu has quit IRC03:41
StevenKBlah, the call with Paul is at 0300Z03:42
StevenKEr, 0300 AEST03:42
*** ramishra_ has quit IRC03:44
*** eghobo has joined #tripleo03:44
*** ramishra has joined #tripleo03:45
tchayporeally? i thought it was 9am.03:47
tchaypomaybe my OWA calendar doesn't know what timezone i'm in03:48
StevenKI was assuming it was 9am PST03:50
tchaypoi was assuming that OWA was showing me that part of the message in a localized format03:51
tchaypoespecially when it offers an "add this to my calendar" button which puts it in my calendar at 9am03:51
StevenKThat may be quite false :-)03:51
tchaypoi would be entirely unsurprised to be wrong03:55
tchaypoclarkb: still around? did you get the email about the meeting with paul from the ospo?03:56
clarkbtchaypo: did they arrive after 0000UTC?03:57
clarkbif so I didn't and I don't remember getting anything03:57
tchaypoi got it at 10:18am03:58
tchaypoI can't remember if I'm currently GMT+10 or GMT+1103:58
clarkbtchaypo: date -u for great good03:59
StevenK% date -R03:59
StevenKTue, 01 Apr 2014 14:58:59 +110003:59
clarkbor that03:59
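Both suggestions print the zone explicitly: `date -u` shows UTC and `date -R` shows an RFC 2822 timestamp with a numeric offset (here +1100). Going the other way, from a meeting time with a known offset to UTC or another zone, GNU date can parse the offset directly. A minimal sketch, assuming GNU coreutils `date` (BSD/macOS `date` does not treat `-d` this way); `to_zone` is a hypothetical helper name:

```shell
#!/bin/sh
# to_zone ZONE "YYYY-MM-DD HH:MM +HHMM"
# Print the given timestamp (which carries an explicit UTC offset)
# as wall-clock time in ZONE. Assumes GNU coreutils date.
to_zone() {
    TZ="$1" date -d "$2" '+%Y-%m-%d %H:%M %Z'
}
```

For example, `to_zone UTC '2014-04-02 09:00 +1100'` gives `2014-04-01 22:00 UTC`: 9am on 2 April at +1100 is the previous evening in UTC.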
tchayposo that'd be before 0000, but i think it was only sent to people who said they were interested in talking to him, so maybe you didn't get it04:01
tchaypooh, it looks like greghaynes got the message04:01
tchayposo he should be able to confirm whether it shows 9am for him as well or if it's in his afternoon04:01
clarkbthey tried to get me to commit to a thing later this month and do a bunch of paperwork for it today but ERELEASE, ESUMMIT, ENEXTRELEASECYCLE04:02
greghaynestchaypo: which one?04:02
greghaynestchaypo: the original calendar event?04:02
tchaypothe updated calendar event04:03
greghaynes3:15pm04:03
greghaynesin who knows what timexone04:03
greghaynestimezone04:03
tchaypoi think that's time sent04:03
tchaypobut what time does the meeting show as?04:03
tchaypofor me the original was 8am-9am and the update is 9am-10am04:03
greghaynes3pm04:03
greghaynes3-404:04
tchaypoexcellent04:04
StevenKgreghaynes: On which day?04:04
greghaynesoh, you AUSers04:04
tchaypofairly confident OWA is being nice and showing it to us in our local time04:04
tchaypothanks greghaynes04:04
greghayneswed the 2nd04:05
tchaypogreghaynes: the preferred term is "aussies"04:05
StevenKOrzies04:05
greghaynesAusome04:05
tchaypopronounced the same as "ozzies", there's nothing weirder than a USAnian calling us "osseys"04:05
greghayneswow, so ive been saying it wrong all along04:05
tchaypoy'all do :)04:06
*** rpodolyaka1 has joined #tripleo04:06
* tchaypo leaves cafe to check mail04:06
*** killer_prince has joined #tripleo04:11
*** matsuhashi has joined #tripleo04:14
openstackgerritSteve Kowalik proposed a change to openstack/os-cloud-config: Add CLI scripts for init-{heat,keystone,swift}  https://review.openstack.org/8433004:22
*** nosnos has joined #tripleo04:23
*** akuznetsov has joined #tripleo04:30
*** killer_prince2 has joined #tripleo04:35
*** akuznetsov has quit IRC04:42
*** akuznetsov has joined #tripleo04:43
openstackgerritSteve Kowalik proposed a change to openstack/tripleo-incubator: Store seed details in the JSON  https://review.openstack.org/7905104:54
StevenKlifeless: Feel like putting a +A on https://review.openstack.org/#/c/81691/ ?04:54
StevenKlifeless: I'm happy to recheck no bug if you'd like it go through CI again04:55
*** cody-somerville has quit IRC05:01
*** matsuhas_ has joined #tripleo05:02
*** matsuhas_ has quit IRC05:02
*** cody-somerville has joined #tripleo05:02
*** cody-somerville has joined #tripleo05:02
openstackgerritA change was merged to openstack/tripleo-incubator: Completely subsume POWER_MANAGER into testenv  https://review.openstack.org/8169105:03
*** matsuhashi has quit IRC05:05
*** ramishra has quit IRC05:05
*** matsuhashi has joined #tripleo05:05
*** ramishra has joined #tripleo05:06
*** rpodolyaka1 has quit IRC05:07
*** akuznetsov has quit IRC05:08
*** Rakesh5 has joined #tripleo05:09
*** rpodolyaka1 has joined #tripleo05:11
*** killer_prince has quit IRC05:14
*** akuznetsov has joined #tripleo05:19
StevenKCrumbs, that's an enormous backlog05:25
*** CaptTofu has joined #tripleo05:30
tchaypobacklog?05:30
lifelesstchaypo: I'm guessing at the CI queue05:30
tchaypooh right.05:31
tchaypohah. i imagine it is.05:31
StevenKcheck-tripleo [8 hour history of changes in pipeline] (74)05:32
StevenKThat we're only running 3 jobs at the moment is a worry05:32
tchaypooh myyyyy05:33
*** CaptTofu has quit IRC05:35
lifelesscap of 35 nodes05:36
lifelesswhich is 7 jobs05:36
lifelessand a bunch stalled deleting that I'm poking at05:36
*** akuznetsov has quit IRC05:46
tchaypoStevenK: what's your preferred method for creating an ubuntu mirror?05:49
StevenKtchaypo: apt-mirror05:50
StevenKtchaypo: Depending on what you're mirroring, the churn is ~3G/day05:50
tchaypothe fact that https://help.ubuntu.com/community/Rsyncmirror references "the new feisty" leads me to believe it's not entirely current05:50
StevenKHahaha05:50
* tchaypo tries to remember the ubuntu releases05:51
tchaypoi remember joking about feisty at an LCA, but can't remember if it was 07 or 0805:52
StevenKWarty, Hoary, Breezy, Dapper, Edgy, Feisty, Gutsy, Hardy, Intrepid, Jaunty, Karmic, Lucid, Maverick, Natty, Oneiric, Precise, Quantal, Raring, Saucy, Trusty05:52
StevenK(From memory)05:52
clarkbhardy was the best05:53
StevenKDapper 4 eva05:53
StevenKDapper was the first release I contributed to05:54
StevenKtchaypo: I can pastebin a config for apt-mirror if you wish05:55
StevenKThere's also an element for it in tie05:55
tchaypoi never object to things that save me thinking05:55
StevenK"Could not submit your paste because your paste contains spam."05:56
StevenKLIES05:56
StevenKtchaypo: http://paste.openstack.org/show/74713/05:56
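For orientation, a minimal apt-mirror `mirror.list` for Trusty might look like the following. This is an illustrative sketch, not the configuration from the paste: the base path, thread count and suite list are assumptions, and the mirror host is the Australian archive mentioned in-channel. The helper `write_mirror_list` is a hypothetical name that simply writes the file to the path you give it.

```shell
#!/bin/sh
# write_mirror_list PATH
# Write a minimal apt-mirror configuration to PATH.
# All values below are illustrative assumptions; adjust to taste.
write_mirror_list() {
    cat > "$1" <<'EOF'
set base_path    /var/spool/apt-mirror
set nthreads     20

deb http://au.archive.ubuntu.com/ubuntu trusty main restricted universe multiverse
deb http://au.archive.ubuntu.com/ubuntu trusty-updates main restricted universe multiverse
deb http://au.archive.ubuntu.com/ubuntu trusty-security main restricted universe multiverse

# Have apt-mirror generate a cleanup script for files no longer referenced
clean http://au.archive.ubuntu.com/ubuntu
EOF
}
```

apt-mirror reads `/etc/apt/mirror.list` by default, so `write_mirror_list /etc/apt/mirror.list` followed by `apt-mirror` is one way to use it.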
clarkbI just remember using hardy and thinking I didn't need to update immediately on the next release, which was a first for me05:57
greghayneslooks spammy to me05:57
StevenKgreghaynes: Yeah, I'm certain the URLs contain all sorts of horrible malware. :-D05:58
clarkband yet http://paste.openstack.org/show/74712/ gets through05:58
greghaynesIts the au. that gives it away05:58
StevenKclarkb: I joined Canonical during the Gutsy release cycle, I didn't stay running Hardy for long05:59
* clarkb can't wait for trusty and non segfaulty vim05:59
StevenKclarkb: Upgrade now?06:00
StevenKThe final beta is out, the churn between now and release is going to be small06:00
clarkbStevenK: last I checked xubuntu had a couple nasty issues06:00
* clarkb looks again06:00
clarkbbeta2 looks much better I might do that tomorrow06:02
StevenKIt takes my machines roughly 3 minutes to grab all the .debs for an upgrade. <3 local mirror06:03
tchayponon-segfaulty vim?06:04
StevenKtchaypo: One of the yaml files in -infra/config causes vim to segv06:05
tchayponice06:06
tchaypowhich one? i wanna try!06:06
*** rpodolyaka1 has quit IRC06:07
clarkbtchaypo: https://git.openstack.org/cgit/openstack-infra/config/tree/modules/openstack_project/templates/nodepool/nodepool.yaml.erb06:10
StevenK% vim modules/openstack_project/templates/nodepool/nodepool.yaml.erb06:12
StevenKVim: Caught deadly signal SEGV06:12
clarkbthough it was recently updated and may not cause segfaults anymore06:12
clarkboh good its still deadly06:12
StevenKYes, it has not lost its deadlyness06:12
tchaypofirst of all that's an erb not a yaml06:13
tchaypoand second, it loads for me06:14
StevenKclarkb: vim -u /dev/null modules/openstack_project/templates/nodepool/nodepool.yaml.erb does not segv, but then you have nothing helpful set.06:14
StevenKtchaypo: Are you running Saucy?06:14
tchaypolsb_release says 14.04 Trusty Tahr06:15
StevenKtchaypo: That's why -- vim on that file works on Trusty, SEGV's on Saucy06:15
tchaypooh, so you're not even on the trusty beta?06:16
StevenKNot yet06:16
* tchaypo relocates indoors06:16
StevenKI will probably upgrade my laptop this week06:16
lifelessI've just added trusty to my mirror06:18
lifelessof course universe sources hit the checksum race. FAIL.06:18
StevenKHeh, I've had trusty on my mirror since it hit au.archive06:19
*** e0ne has joined #tripleo06:23
tchaypookay, i have bandersnatch running in a cronjob, exciting.06:34
tchayponow for apt-mirror06:34
tchaypoaccording to http://apt-mirror.github.io/ this shouldn't take long06:35
*** rpodolyaka1 has joined #tripleo06:35
*** e0ne has quit IRC06:37
*** rdopieralski has joined #tripleo06:39
tchaypowhee, 168.9Gb to download06:43
* tchaypo puts that in the background and carries on working06:43
*** rpodolyaka1 has quit IRC06:45
*** nosnos has quit IRC06:45
*** nosnos has joined #tripleo06:46
*** jprovazn has joined #tripleo06:55
*** mrunge has joined #tripleo06:56
*** matsuhashi has quit IRC06:58
*** matsuhashi has joined #tripleo06:58
tchaypoI'm staring at https://git.openstack.org/cgit/openstack/tripleo-incubator/tree/scripts/devtest_overcloud.sh#n14006:59
tchaypoi can see that we're waiting for nova hypervisor-stats to show the right number of nodes available06:59
tchaypowhat I can't understand is why we would ever expect to end up with the right number of nodes06:59
*** jtomasek has joined #tripleo07:01
tchaypoah. I'm guessing this must be setup-baremetal at the end of _undercloud.sh07:01
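The devtest step being read here polls `nova hypervisor-stats` until the reported count matches the number of registered nodes. The general shape is a bounded retry loop. The sketch below is a hypothetical `retry_until` helper, not the helper devtest actually uses, and it makes no assumption about the hypervisor-stats output format: you pass in whatever check command compares the count.

```shell
#!/bin/bash
# retry_until ATTEMPTS DELAY COMMAND [ARGS...]
# Run COMMAND until it succeeds or ATTEMPTS runs are used up,
# sleeping DELAY seconds between failed tries.
# Returns 0 on success, 1 if every attempt failed.
retry_until() {
    local attempts=$1 delay=$2
    shift 2
    local i=1
    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$i" -ge "$attempts" ]; then
            return 1
        fi
        i=$((i + 1))
        sleep "$delay"
    done
}
```

Bounding the attempts matters for CI: if the expected nodes never register, the job fails with a clear timeout instead of hanging.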
*** jistr has joined #tripleo07:02
openstackgerritLadislav Smola proposed a change to openstack/tuskar-ui: Making overcloud init work  https://review.openstack.org/8334007:02
*** matsuhashi has quit IRC07:03
*** giulivo has joined #tripleo07:05
*** matsuhashi has joined #tripleo07:06
openstackgerritLadislav Smola proposed a change to openstack/tuskar-ui: Making overcloud init work  https://review.openstack.org/8334007:09
*** bauzas has joined #tripleo07:14
GheRiveromorning all07:15
*** jcoufal has joined #tripleo07:16
*** CaptTofu has joined #tripleo07:19
*** CaptTofu has quit IRC07:23
rpodolyakamorning07:27
openstackgerritNicholas Randon proposed a change to openstack/tripleo-incubator: SSH key and virtual_power_driver not used on H/W  https://review.openstack.org/8377007:29
*** xuhaiwei has joined #tripleo07:32
*** marun is now known as maru_afk07:36
openstackgerritNicholas Randon proposed a change to openstack/tripleo-incubator: Bridge physical interface to the seed.  https://review.openstack.org/8408307:37
*** ramishra has quit IRC07:38
*** hashar has joined #tripleo07:44
*** eghobo has quit IRC07:45
lifelesswheeeooo upgrades to trusty are fast with a local mirror07:46
GheRiverois trusty released?07:47
lifelessnah07:49
lifelessbut samba4.1 is in it07:49
lifelessgnarr lots of | nova-compute     | ci-overcloud-novacompute7-3my4s5pay2om | nova     | enabled | down  | 2014-04-01T06:56:05.000000 | -               |07:52
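To pull the affected hosts out of `nova service-list` rather than eyeballing the table, a small awk filter is enough. This sketch assumes the column order visible in the row pasted above (Binary, Host, Zone, Status, State, Updated_at, Disabled Reason); `down_compute_hosts` is a hypothetical name.

```shell
#!/bin/sh
# down_compute_hosts < service-list-output
# Print the Host of every nova-compute row whose State column is "down".
# Assumed columns: | Binary | Host | Zone | Status | State | Updated_at | Disabled Reason |
down_compute_hosts() {
    awk -F'|' '
        {
            # Trim padding from the cells we compare or print
            for (i = 2; i <= 6; i++) gsub(/^ +| +$/, "", $i)
        }
        $2 == "nova-compute" && $6 == "down" { print $3 }
    '
}
```

Usage: `nova service-list | down_compute_hosts`.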
openstackgerritGhe Rivero proposed a change to openstack/tripleo-incubator: Allow to set seed node cpus and memory  https://review.openstack.org/8407807:53
*** athomas has joined #tripleo07:54
openstackgerritGhe Rivero proposed a change to openstack/tripleo-incubator: Allow to use a cirros image as the end user image  https://review.openstack.org/8334708:01
*** lazy_prince has joined #tripleo08:02
*** lazy_prince is now known as killer_prince08:02
*** gcha has joined #tripleo08:03
*** matsuhashi has quit IRC08:03
*** matsuhas_ has joined #tripleo08:05
*** lucasagomes has joined #tripleo08:13
*** CaptTofu has joined #tripleo08:20
*** killer_prince has quit IRC08:23
lifelesswow08:24
lifelessI think somethings really unhappy here08:24
*** derekh has joined #tripleo08:24
*** CaptTofu has quit IRC08:24
lifelessderekh: so we had CI for a few hours08:24
lifelessderekh: now nova-compute is hung on the hypervisors08:24
lifelessderekh: and by hung, I mean service stop doesn't stop it :)08:25
derekhlifeless: one all of the compute nodes?08:25
derekh* on all of the compute nodes08:26
derekhbrb08:26
lifelessnova service-list :(08:26
Ngmorning08:27
xuhaiweiI have built the mirror, but how to use it? set a environment variable?08:30
lifelessxuhaiwei: yes, see the pypi element README.md08:31
xuhaiweiI can't understand it well, should I export PYPI_MIRROR_URL= ~/.cache/image-create/pypi/mirror08:32
lifelessif thats where your mirror is, you just need to include the pypi element08:32
lifelesse.g. export DIB_COMMON_ELEMENTS="pypi stackuser"08:32
openstackgerritGhe Rivero proposed a change to openstack/tripleo-incubator: Allow to pass uncompress option to boot-seed-vm  https://review.openstack.org/8277208:33
*** matsuhas_ has quit IRC08:33
openstackgerritA change was merged to openstack/tripleo-incubator: Update documentation.  https://review.openstack.org/8274108:35
*** andreaf has joined #tripleo08:37
*** ramishra has joined #tripleo08:38
*** matsuhashi has joined #tripleo08:38
openstackgerritLadislav Smola proposed a change to openstack/tuskar-ui: Adding overcloud keystone client  https://review.openstack.org/8437908:39
openstackgerritLadislav Smola proposed a change to openstack/tuskar-ui: Overcloud initialization  https://review.openstack.org/8334008:39
*** martyntaylor has left #tripleo08:39
derekhlifeless: bad version of nova maybe?08:40
lifelessderekh: I'm hoping not :)08:40
lifelesshttp://paste.openstack.org/show/74727/08:40
lifelessbut its not context switching08:41
lifelessderekh: oh that reminds me08:41
lifelessderekh: log files, we should grab /mnt/state/var/log as well08:42
lifelessI was going to whip up a patch but the day got away from me08:42
lifelesswow08:42
lifelessVmPeak:  1807004 kB08:42
lifelesswtf is nova-compute /doing/08:42
*** ramishra has quit IRC08:43
xuhaiweiI find this code in devtest_ramdisk.sh: DIB_COMMON_ELEMENTS=${DIB_COMMON_ELEMENTS:-'stackuser'}, when I run this shell, I should export DIB_COMMON_ELEMENTS="pypi stackuser", but when I run other shells I should export DIB_COMMON_ELEMENTS again if different element is required??08:43
lifelessIf I understand your question correctly. Yes.08:43
xuhaiweiOK, thanks08:44
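devtest_ramdisk.sh only falls back to `stackuser` when DIB_COMMON_ELEMENTS is unset or empty; that is what `${VAR:-default}` means. So a value exported in your shell carries over to every later devtest script started from that shell, and you re-export only when a step needs a different element list. A sketch of the expansion, with a hypothetical `common_elements` helper standing in for the script line:

```shell
#!/bin/sh
# Mirrors the pattern quoted from devtest_ramdisk.sh:
#   DIB_COMMON_ELEMENTS=${DIB_COMMON_ELEMENTS:-'stackuser'}
# ${VAR:-default} keeps VAR if it is set and non-empty, else uses default.
common_elements() {
    echo "${DIB_COMMON_ELEMENTS:-stackuser}"
}
```

With nothing exported this prints `stackuser`; after `export DIB_COMMON_ELEMENTS="pypi stackuser"` it prints `pypi stackuser`, which is how the pypi element gets included.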
openstackgerritGhe Rivero proposed a change to openstack/tripleo-incubator: Add elements to user image via TEST_IMAGE_DIB_EXTRA_ARGS  https://review.openstack.org/8273908:45
lifelessSpamapS: I've put in the review what I want to see08:50
lifelessderekh: hahaha tried to reboot the node... it hung08:51
lifelessderekh: there's something fundamentally wrong here08:51
lifeless08:51
lifeless    1 root      20   0 48828  24m 1436 R   100  0.0 119:07.00 init08:51
lifelesspipe(0x7fff5fddb900)                    = -1 EMFILE (Too many open files)08:52
lifelessthats from init08:52
lifelessahahahahahahahahahahahahahahahahahaha08:53
lifelessKEYBUK08:53
*** jp_at_hp has joined #tripleo08:53
*** d0ugal_ has quit IRC08:53
lifelessno upstart changes since oct 908:53
*** d0ugal_ has joined #tripleo08:53
*** d0ugal_ is now known as d0ugal08:54
*** d0ugal has quit IRC08:54
*** d0ugal has joined #tripleo08:54
*** d0ugal has quit IRC08:54
*** d0ugal has joined #tripleo08:54
Nglifeless: ugh, that's awful08:55
derekhlifeless: I'm not sure what you mean08:55
lifelessbug https://bugs.launchpad.net/tripleo/+bug/130066308:56
uvirtbotLaunchpad bug 1300663 in tripleo "upstart using 100% CPU" [Critical,Triaged]08:56
openstackgerritA change was merged to openstack/tuskar: Updates gettextutils module from oslo-incubator  https://review.openstack.org/8417108:56
* Ng is minded to land 84049 purely on the grounds that any repo called "-incubator" shouldn't have to worry too much about layerings and dependencies08:56
openstackgerritA change was merged to openstack/tuskar: Updates test module from oslo-incubator  https://review.openstack.org/8417208:56
openstackgerritA change was merged to openstack/tuskar: Updates local module from oslo-incubator  https://review.openstack.org/8417308:57
openstackgerritA change was merged to openstack/tuskar: Updates timeutils module from oslo-incubator  https://review.openstack.org/8417408:58
derekhlifeless: so init has 100's of /dev/ptmx open, are you thinking that the problem08:58
*** athomas has quit IRC08:59
lifelessderekh: 1023 fd's open for init08:59
*** untriaged-bot has joined #tripleo09:00
untriaged-botUntriaged bugs so far:09:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/129048809:00
uvirtbotLaunchpad bug 1290488 in tripleo "Baremetal: Invalid credentials" [Undecided,Incomplete]09:00
*** untriaged-bot has quit IRC09:00
derekh# ps -ef | grep flock | wc09:00
derekh    999    8992   6594009:00
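The first column of `wc` above is a line count, and `ps -ef | grep flock` also matches the `grep flock` process itself (and anything else with "flock" in its command line). Reading /proc gives exact counts, and the same approach counts init's open file descriptors; the 1023 lifeless reports sits right at the usual default soft limit of 1024, which fits the EMFILE from init's `pipe()` call. A Linux-only sketch; the helper names are hypothetical:

```shell
#!/bin/sh
# Linux-only helpers that read /proc directly.

# count_procs NAME: number of processes whose command name is exactly NAME.
# Unlike `ps -ef | grep flock | wc`, this never counts the grep itself.
count_procs() {
    cat /proc/[0-9]*/comm 2>/dev/null | grep -cx -- "$1"
}

# count_fds PID: number of open file descriptors held by PID.
# Run as root to inspect other users' processes (e.g. init, PID 1).
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}
```

On the affected node, `count_procs flock` and `count_fds 1` would give the two numbers discussed here.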
lifelessoh hoh ho dhcp-all-interfaces is spawning per plugged vm port ?09:01
lifelessand not cleaning up properly?09:02
lifelessderekh: this could be a symptom rather than a cause09:02
derekhlifeless: ok09:02
lifelesse..g the <defunct>09:02
lifelesswhy haven't they been reaped09:02
openstackgerritA change was merged to openstack/tuskar: Updates importutils module from oslo-incubator  https://review.openstack.org/8417509:03
openstackgerritA change was merged to openstack/tuskar: Updates jsonutils module from oslo-incubator  https://review.openstack.org/8417609:04
openstackgerritA change was merged to openstack/tuskar: Updates excutils module from oslo-incubator  https://review.openstack.org/8417709:04
openstackgerritA change was merged to openstack/tuskar: Updates fileutils module from oslo-incubator  https://review.openstack.org/8417809:04
openstackgerritA change was merged to openstack/tuskar: Updates log module from oslo-incubator  https://review.openstack.org/8417909:04
openstackgerritA change was merged to openstack/tuskar: Updates lockutils module from oslo-incubator  https://review.openstack.org/8418409:04
openstackgerritA change was merged to openstack/tuskar: Updates fixture module from oslo-incubator  https://review.openstack.org/8418509:04
openstackgerritLadislav Smola proposed a change to openstack/tuskar-ui: Adding overcloud keystone client  https://review.openstack.org/8437909:05
openstackgerritLadislav Smola proposed a change to openstack/tuskar-ui: Overcloud initialization  https://review.openstack.org/8334009:05
derekhlifeless: the first of them (on compute4), was started 1 hour before the compute node stopped checking in with controller, the 999th was started around the same time compute hit problems09:05
lifelessderekh: yeah09:05
derekhstill need to find cause though09:06
lifelesssee my comment 3 on the bug09:06
lifelessNg: what happens after start on starting network-interface - when does the service *stop* ?09:06
lifelessNg: or does exec say 'run till completion' ?09:06
Nglifeless: the upstart config we have for dhcp-all doesn't specify a stopping condition, so yeah, it's just saying "launch this and let it end when it ends"09:07
lifelessNg: so a few thoughts09:10
lifelessNg: this is meant to be skipping these interfaces09:12
lifelessNg: but clearly its not09:12
lifelessNg: ahha, I think I know why09:12
lifelessNg: when you exec and flock09:12
*** ramishra has joined #tripleo09:13
lifelessNg: you drop the specifier for the same interface09:13
lifelessNg: so it goes from 'tap foo' to 'check them all'09:13
lifelessNg: because INTERFACE is set to ${1:-}09:13
Nglifeless: good spot09:13
lifelesssecondly09:14
lifelessyou FLOCK before inspecting the interface09:14
lifelesslet me work up a patch09:14
lifeless:q09:14
lifeless:q09:14
NgI don't flock anything, dan added that ;)09:14
Ngbut it's entirely probable that I reviewed it and didn't spot the loss of $109:14
openstackgerritjan grant proposed a change to openstack/diskimage-builder: Hard-link multiple identical files.  https://review.openstack.org/8438409:15
Nga "$@" on the end of the flock command ought to be sufficient to fix the loss of $1, but I was just looking at this now and thinking that some of the inspection should be happening first too09:15
Ngno point flocking and re-execing if we're just going to immediately exit09:15
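The bug as described: the script sets `INTERFACE=${1:-}`, then re-execs itself under flock without passing its arguments, so the locked run sees no interface and falls back to handling all of them; it also takes the lock before checking whether the interface should be skipped. Ng's fix is to append `"$@"` to the flock command (the self-locking example in the flock(1) man page forwards `"$@"` for the same reason). An alternative that avoids re-exec entirely is to take the lock on a file descriptor inside a subshell, after the cheap checks. The sketch below uses that alternative. It is not the real dhcp-all-interfaces script; the skipped prefixes, lock path and function names are all assumptions.

```shell
#!/bin/bash
# Sketch only, NOT the diskimage-builder dhcp-all-interfaces script.
# Lock path and skipped interface prefixes are assumptions.
LOCK_FILE=${LOCK_FILE:-/tmp/dhcp-all-interfaces.lock}

# Interfaces that VMs plug in (tap devices and friends) are ignored.
should_skip_interface() {
    case "$1" in
        tap*|qbr*|qvb*|qvo*) return 0 ;;
        *) return 1 ;;
    esac
}

# Stand-in for the real work (writing interface config, starting dhclient).
configure_interface() {
    echo "configuring ${1:-all interfaces}"
}

handle_interface() {
    local interface=${1:-}   # empty means "all interfaces"
    # 1. Inspect first: a skipped interface never touches the lock.
    if [ -n "$interface" ] && should_skip_interface "$interface"; then
        return 0
    fi
    # 2. Then serialise the real work. The subshell holds the lock on fd 9
    #    and still sees $interface, so "tap foo" can't widen to "all".
    (
        flock 9
        configure_interface "$interface"
    ) 9>"$LOCK_FILE"
}
```

Because nothing is re-executed, there is no argument list to lose, and each upstart-spawned instance for a VM port exits immediately instead of queueing on the lock.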
janglifeless: as promised last night: turned my hand-cranked script into an element, I think. Note this is currently untested in that form, trying it now.09:16
openstackgerritlifeless proposed a change to openstack/diskimage-builder: Document a little the concerns for operators.  https://review.openstack.org/8438509:17
*** ramishra has quit IRC09:17
*** jcoufal has quit IRC09:18
*** tserong has quit IRC09:18
*** tserong has joined #tripleo09:19
*** tserong has joined #tripleo09:19
openstackgerritNicholas Randon proposed a change to openstack/tripleo-incubator: SSH key and virtual_power_driver not used on H/W  https://review.openstack.org/8377009:19
openstackgerritNicholas Randon proposed a change to openstack/tripleo-incubator: Make it possible to pass real hardware in.  https://review.openstack.org/8318809:19
openstackgerritlifeless proposed a change to openstack/diskimage-builder: Fix resource exhaustion with upstart.  https://review.openstack.org/8438609:20
*** ramishra has joined #tripleo09:24
openstackgerritA change was merged to openstack/tripleo-incubator: Add more tools.  https://review.openstack.org/8404909:24
Nglifeless: -1 on the resource exhaustion change I'm afraid09:24
*** martyntaylor has joined #tripleo09:26
jangagh09:28
jangI was running devtest _somewhere_09:28
jangI've lost the window.09:29
Ngalways run devtest in screen :)09:29
lifelessNg: do you hate freedom ?09:30
jangheh09:31
Nglifeless: passionately09:31
StevenKOf course he does, he runs OS X09:31
jangIs there a screen setting to not define a window bracket, but rather to let my scrollback buffer work?09:32
jangah yes. "Trashcan full? No big of a deal, just buy another."09:32
lifelessjang: screen *is* my scrollback buffer:)09:34
lifelessjang: some folk think I may be odd.09:34
openstackgerritlifeless proposed a change to openstack/diskimage-builder: Fix resource exhaustion with upstart.  https://review.openstack.org/8438609:34
lifelessNg: it is of course untested.09:35
lifelessthats next09:35
jangI know you can use ^A - something (? esc?) to scroll up. I was wondering if there was a method more in keeping with what the various xterm-alikes use.09:35
lifeless^A[09:36
lifelessat least for me09:36
lifelessmaybe tmux has mouse integration09:36
rdopieralskiI always do ^A [ to enter the copy-paste mode09:36
lifelessthat would be nice09:36
Ng[ and Esc are equivalent afair, it's just going into the copy mode09:36
lifelesssomeone should write a terminal multiplexer gui09:36
jangRight. I just prefer kde's (!) point-and-drool keyboard controls for most things. And yet I still use vi.09:37
rdopieralskiNg: actually esc is ^[09:37
lifelessok, trying a reboot of this machine09:38
Ngrdopieralski: I'm saying in terms of screen bindings. ^a [ and ^a Esc are bound to the same feature09:38
rdopieralskiNg: interesting09:38
Ngrdopieralski: but I admit I wasn't very clear about that :)09:39
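For jang's question about letting the terminal's own scrollback work, a commonly used .screenrc approach is to stop screen from switching xterm-like terminals to the alternate screen. A sketch that appends those settings to a file you name (default `~/.screenrc`); the 10000-line history is an arbitrary choice and `add_scrollback_settings` is a hypothetical name. One trade-off: the terminal scrollback then interleaves output from every screen window, so copy mode (`^A [` or `^A Esc`) is still the per-window view.

```shell
#!/bin/sh
# add_scrollback_settings [FILE]
# Append scrollback-friendly settings to a screenrc (default ~/.screenrc).
add_scrollback_settings() {
    cat >> "${1:-$HOME/.screenrc}" <<'EOF'
# Per-window history available in copy mode (^A [ or ^A Esc)
defscrollback 10000
# Don't use the alternate screen on xterm-like terminals, so the
# terminal's own scrollback (mouse wheel, Shift+PgUp) sees screen output
termcapinfo xterm* ti@:te@
EOF
}
```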
*** jistr has quit IRC09:39
*** gcha has quit IRC09:40
Nglifeless: once more I am stamping your freedom into the ground with my bead-blasted aluminium jackboot :/09:43
lifelesshhahahhahahahahahahahahah09:44
derekhNg: isn't that one ok, because the interface name is passed into enable_interface?09:45
openstackgerritlifeless proposed a change to openstack/diskimage-builder: Fix resource exhaustion with upstart.  https://review.openstack.org/8438609:45
lifelessderekh: $0 will be wrong09:46
lifelessderekh: I think09:46
derekhlifeless: no it isn't09:46
derekhlifeless: I checked befor I added a +209:46
lifelessah09:46
lifelesswell I pushed a new variant anyhow that is perhaps easier to reason about09:46
*** morganfainberg is now known as morganfainberg_Z09:46
Nghmm, derekh is right, but I still think this new rev is better09:46
Ngsneaking $@ through several layers is not exactly clear09:47
derekhlifeless: yup, it looks easier to the eye now09:47
openstackgerritA change was merged to openstack/diskimage-builder: Fix resource exhaustion with upstart.  https://review.openstack.org/8438609:50
*** athomas has joined #tripleo09:50
*** jcoufal has joined #tripleo09:50
openstackgerritjan grant proposed a change to openstack/diskimage-builder: Hard-link multiple identical files.  https://review.openstack.org/8438409:51
lifelessderekh: i'm inclined to fix in-situ given the headaches we've had09:52
derekhlifeless: sounds reasonable09:52
lifelessderekh: we also need to manually stable br-ctlplane to workaround the dhclient running on eth2 issue09:52
lifelessderekh: but let's get one machine copacetic and go from there09:52
lifelessifarkas: https://review.openstack.org/#/c/81200/ - https://bugs.launchpad.net/tripleo/+bug/130045809:54
uvirtbotLaunchpad bug 1300458 in tripleo "incorrect ram weighting on overclouds" [Critical,Triaged]09:54
lifelesslsmola_: ^ FYI too09:54
lsmola_lifeless: what is the problem here?09:56
*** athomas has quit IRC09:56
lifelesslsmola_: the negative weight should only be applied to bare metal clouds09:56
lifelesslsmola_: otherwise it makes vm clouds stack all the VMs onto one host09:57
lifelesslsmola_: which is rather bad for performance!09:57
ifarkaslifeless, right, that makes sense09:58
lsmola_lifeless: ah, we are using the same nova element for undercloud and overcloud, uh09:58
lifelessyes09:58
lifelessif you look down in the config09:58
lifelessthere are bits that are triggered by baremetal09:58
lsmola_lifeless: yes, we need to add a condition there, sorry, I didn't realize09:58
lsmola_lifeless: also list of used filters will be different09:58
lifelessjust need to either move the condition down, or make it have a similar guard09:58
lifelesslsmola_: exactly09:58
openstackgerritA change was merged to openstack/tripleo-heat-templates: Drop dnsmasq_range from the undercloud source.  https://review.openstack.org/8313009:59
*** athomas has joined #tripleo09:59
*** jistr has joined #tripleo09:59
openstackgerritA change was merged to openstack/tripleo-image-elements: Drop dnsmasq_range from the seeds config.json.  https://review.openstack.org/8312809:59
lifelessbah, eni typo10:00
lifelessthis should come up now10:00
lifelessseparately10:01
lifelessnot what you want to see10:01
lifelessErrors were encountered while processing:10:01
lifeless grub-efi-amd6410:01
lifeless grub-efi-amd64-signed10:01
lifeless shim-signed10:01
lifelessError in function:10:01
lifelessA fatal error occurred10:01
lsmola_lifeless: so you meant putting it under [baremetal] ?10:01
lifelesslsmola_: well no, I mean in the {{# guarded section10:01
lifelessor a similarly guarded section10:02
lifelessit has to be in [DEFAULT] doesn't it?10:02
lsmola_lifeless: yes10:02
lsmola_lifeless: ok, should I fix that?10:02
lifelessplease10:02
lsmola_lifeless: so this should be fine? {{#nova.baremetal}}10:03
ifarkaslsmola_, lifeless, btw: https://review.openstack.org/#/c/84131/10:03
*** e0ne has joined #tripleo10:03
lifelessifarkas: yeah10:06
lifelessso if we configure it via heat, cool10:07
lifelessbut let's unbreak things first10:07
lsmola_lifeless: ok, I will put a quick {{#nova.baremetal}} guard there10:07
ifarkaslifeless, ack10:08
openstackgerritA change was merged to openstack/tripleo-image-elements: Install libffi-dev in the glance element  https://review.openstack.org/8033710:08
*** CaptTofu has joined #tripleo10:09
lifelessblink10:09
lifeless-GRUB_CMDLINE_LINUX_DEFAULT=""10:09
lifeless+GRUB_CMDLINE_LINUX_DEFAULT="nomdmonddf nomdmonisw"10:09
lifelessEWTF10:09
tchayponom d mon nom10:13
*** e0ne has quit IRC10:13
*** CaptTofu has quit IRC10:13
*** matsuhashi has quit IRC10:14
openstackgerritLadislav Smola proposed a change to openstack/tripleo-image-elements: Adding baremetal guard to RamWeigher  https://review.openstack.org/8439710:16
*** akuznetsov has joined #tripleo10:16
lsmola_lifeless: https://review.openstack.org/#/c/84397/110:18
lsmola_lifeless: quick fix; making it configurable via heat will solve the rest of the issues10:18
lsmola_ifarkas: ^10:18
lifelesslsmola_: one minor tweak10:19
lifelessok yay that hypervisor checked in eventually10:19
jp_at_hplifeless: on https://review.openstack.org/#/c/83188 - I still hold the opinion that the documentation around what the nodes file should be is incomplete...10:20
lifelessjp_at_hp: yes, I understand now10:20
*** akuznetsov has quit IRC10:20
lsmola_lifeless: oh :-)10:20
lifelessjp_at_hp: I haven't had time to put an example in devtest_setup.sh, but I agree that's valuable to have.10:20
jp_at_hpI think it can be a separate patch, to improve that whole section documentation-wise...10:20
openstackgerritLadislav Smola proposed a change to openstack/tripleo-image-elements: Adding baremetal guard to RamWeigher  https://review.openstack.org/8439710:21
ifarkaslsmola_, one typo :-)10:22
openstackgerritLadislav Smola proposed a change to openstack/tripleo-image-elements: Adding baremetal guard to RamWeigher  https://review.openstack.org/8439710:23
lsmola_ifarkas: lifeless fixed10:23
*** yamahata has quit IRC10:24
*** e0ne has joined #tripleo10:24
openstackgerritNicholas Randon proposed a change to openstack/tripleo-incubator: Bridge physical interface to the seed.  https://review.openstack.org/8408310:26
*** e0ne_ has joined #tripleo10:30
openstackgerritNicholas Randon proposed a change to openstack/tripleo-incubator: Bridge physical interface to the seed.  https://review.openstack.org/8408310:30
openstackgerritNicholas Randon proposed a change to openstack/tripleo-incubator: Bridge physical interface to the seed.  https://review.openstack.org/8408310:31
*** e0ne has quit IRC10:33
*** hashar has quit IRC10:33
*** akuznetsov has joined #tripleo10:34
derekhlifeless: did that go ok? want me to update the other nodes?10:39
lifelessderekh: I'm just writing a script to do that10:39
derekhlifeless: ok10:40
*** nosnos has quit IRC10:40
*** markmc has joined #tripleo10:45
*** CaptTofu has joined #tripleo10:46
*** pblaho has joined #tripleo10:48
*** gcha has joined #tripleo10:49
lifelessderekh: so see /root/recovert10:51
lifelessderekh: once they all power on and are network reachable, running force-o-c-c should get us all sorted modulo cleaning up the state on any hung instances10:52
*** CaptTofu has quit IRC10:54
derekhlifeless: lgtm, we don't need to do anything on the controller?10:56
lifelessderekh: nova list --all-tenants | awk '$6=="ERROR" { print $2 }' | xargs -n1 nova reset-state10:57
lifelessderekh: after it's all back up10:57
lifelessbut - I think that's sorted already10:57
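A minimal bash sketch of the filtering step in lifeless's one-liner, assuming the classic `nova list` table layout (`| ID | Name | Status | ... |`). With awk's default whitespace splitting the pipe characters are fields of their own, so `$2` is the ID and `$6` is the Status column. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Print the IDs of rows whose Status column is ERROR.
# Assumes `nova list`-style table rows on stdin:
#   | <ID> | <Name> | <Status> | ...
# awk splits on whitespace, so the fields are
#   $1="|"  $2=ID  $3="|"  $4=Name  $5="|"  $6=Status
# Caveat: a Name containing spaces shifts the columns and breaks this.
error_instance_ids() {
    awk '$6 == "ERROR" { print $2 }'
}
```

Used the same way as the original pipeline: `nova list --all-tenants | error_instance_ids | xargs -n1 nova reset-state`.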
derekhlifeless: ok10:58
openstackgerritMichael Kerrin proposed a change to openstack/diskimage-builder: Add utility to mount and unmount images  https://review.openstack.org/7650910:59
openstackgerritMichael Kerrin proposed a change to openstack/diskimage-builder: WIP - Add script to replay elements in an existing image  https://review.openstack.org/8026711:01
openstackgerritMichael Kerrin proposed a change to openstack/diskimage-builder: WIP - enable elements to be re-playable  https://review.openstack.org/8149411:01
StevenKAdd utility to mount and unmount images -- didn't we have those already in dib? And that makes me sad, since it's like two lines to do so anyway11:02
jp_at_hplifeless: on https://review.openstack.org/#/c/83770, I want to understand your comment on patch set 7 better. I thought that the ssh details were only needed for the virtual power manager; is that wrong?11:02
openstackgerritjan grant proposed a change to openstack/diskimage-builder: Hard-link multiple identical files.  https://review.openstack.org/8438411:03
openstackgerritJan Provaznik proposed a change to openstack/tripleo-heat-templates: Add mysql innodb buffer pool size  https://review.openstack.org/8440511:03
lifelessjp_at_hp: I may have tied myself up in knots11:05
lifelessjp_at_hp: let me see11:05
openstackgerritLadislav Smola proposed a change to openstack/tuskar-ui: Removing unused testadata  https://review.openstack.org/8440611:11
lifelessjp_at_hp: try that on for size11:14
*** slagle has joined #tripleo11:14
xuhaiweiwhen running devtest_seed.sh, I got an error, the log says:"Storing debug log for failure in /root/.pip/pip.log", but the /root/.pip/pip.log does not exist11:15
lifelessderekh: I count 10 hypervisors up11:16
lifelessderekh: and active slaves11:17
openstackgerritA change was merged to openstack/tripleo-image-elements: Adding baremetal guard to RamWeigher  https://review.openstack.org/8439711:18
derekhlifeless: sweet11:18
openstackgerritRadomir Dopieralski proposed a change to openstack/tuskar-ui: Make run_tests.sh work with errexit  https://review.openstack.org/8440911:19
janglifeless: https://review.openstack.org/#/c/84384/ looks to be working now (hardlink aggregation). It about halved the size of /opt on my seed. Testing it across the board now.11:20
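The hard-link aggregation code from jang's change isn't shown here. Purely to illustrate the idea (this is not the patch's implementation), a bash sketch that collapses byte-identical regular files under one directory into hard links. It assumes everything lives on a single filesystem and ignores ownership and mode differences, which a real implementation would need to compare before linking.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: hard-link byte-identical regular files under a
# directory tree to a single copy. Requires bash 4+ (associative arrays).
# Assumes one filesystem; does not compare owner/mode before linking.
hardlink_duplicates() {
    local root=$1 path hash first
    local -A first_by_hash=()        # content hash -> first path seen
    while IFS= read -r -d '' path; do
        # Hash via stdin so odd file names cannot alter the output format.
        hash=$(sha256sum < "$path")
        hash=${hash%% *}
        first=${first_by_hash[$hash]:-}
        if [[ -z $first ]]; then
            first_by_hash[$hash]=$path
        elif [[ ! $first -ef $path ]] && cmp -s -- "$first" "$path"; then
            # Same content and not already linked: replace the copy.
            ln -f -- "$first" "$path"
        fi
    done < <(find "$root" -type f -print0)
}
```

The `cmp` check guards against treating a hash collision as a duplicate, and the `-ef` test keeps a second run from trying to relink files that already share an inode.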
jp_at_hpthanks lifeless - that's kinda what I was expecting, but it's good to get it so clearly.11:20
lifelessjang: cool. Is it a net win perf-wise?11:21
*** slagle_ has joined #tripleo11:21
*** slagle has quit IRC11:21
derekhlifeless: I can't ssh to the te-broker floating IP, gonna see if it's running on compute11:21
*** e0ne_ has quit IRC11:22
lifelessderekh: it's not11:23
openstackgerritNicholas Randon proposed a change to openstack/tripleo-incubator: Bridge physical interface to the seed.  https://review.openstack.org/8408311:23
derekhlifeless: ya, it's gone, not even defined in libvirt11:23
janglifeless: measuring now; it'll take me a while to get a number of runs that amount to any kind of statistical difference.11:24
lifelessderekh: that's odd :)11:24
lifeless`11:24
jangfor a dev cycle, it might be close. But for deploying finished images, where you don't care about the CI time as much, it might be a winner.11:24
lifelessderekh: I hope i haven't just wedged it:) - I'm going to tag and run11:25
lifelessderekh: I have one suggestion - if you rebuild the broker you can use neutron create-port to create a port with 192.168.1.1 in advance and supply that on the boot line11:25
lifelessderekh: rather than a net-id11:25
derekhlifeless: ok, will do11:25
lifelessderekh: then we don't need to wait for infra11:25
lifeless--net nic-id=asdad IIRC11:26
derekhlifeless: ahh, it's gone to ERROR state now, will rebuild it11:26
lifelessme -> sleep11:26
Ngnight lifeless11:26
jangnight11:27
*** hashar has joined #tripleo11:28
openstackgerritLadislav Smola proposed a change to openstack/tuskar-ui: Adding overcloud keystone client  https://review.openstack.org/8437911:31
openstackgerritLadislav Smola proposed a change to openstack/tuskar-ui: Overcloud initialization  https://review.openstack.org/8334011:31
*** lucasagomes is now known as lucas-hungry11:34
openstackgerritA change was merged to openstack/tuskar-ui: Make run_tests.sh work with errexit  https://review.openstack.org/8440911:36
*** kiall_ is now known as Kiall11:36
openstackgerritLadislav Smola proposed a change to openstack/tuskar-ui: Hide non working scaling  https://review.openstack.org/8441311:37
derekhNg: Hmm, that fix doesn't seem to have worked, still got a shed load of dhcp-all-interfaces.sh instances running11:41
*** weshay has joined #tripleo11:41
Ngderekh: huh. where is this actually happening?11:42
derekhNg: I'm looking on ci-overcloud-novacompute4-dqiq7436leuh11:42
derekhNg: so this is the upstart config http://git.openstack.org/cgit/openstack/diskimage-builder/tree/elements/dhcp-all-interfaces/install.d/dhcp-all-interfaces.conf11:44
derekhNg: so that would start dhcp-all-interfaces.sh on all interfaces, each time a single interface comes up11:45
Ngugh11:45
derekhe.g.11:46
derekh-rw-r----- 1 root root     276 Apr  1 11:31 dhcp-all-interfaces-qbr00dd03df-ca.log11:46
derekhand loads others like it11:46
Ngagreed11:46
derekhso we need it to ignore the instance interfaces somehow11:48
*** xuhaiwei has quit IRC11:48
Ngderekh: ideally at the upstart level, so we're not pointlessly spawning the script for each one of these interfaces11:50
Ngjust poking at an instance interface to see if it has any easy hints we can detect11:50
derekhNg: yup, ok11:50
lifelessderekh: lots of instances, or a growing backlog ?11:50
lifelessderekh: I'd expect lots of instances to have run, but to all exit very quickly.11:50
* lifeless failed at sleep11:51
*** d0ugal has quit IRC11:51
lifelessderekh: the /root/recovert scripts should help mass-apply whatever you and ng come up with11:51
derekhlifeless: the flock could doesn't seem to be changing holding steady at 502 for about 10 minutes so far11:51
*** d0ugal has joined #tripleo11:51
derekhlifeless: yup11:52
derekhlifeless: *flock count11:52
lifelessoh any flock count means we're still trying to enable now11:52
NgI am a bit surprised that flockers aren't dying off as each one goes through the pointless enumeration exercise11:52
lifelessassuming the fix *did* get copied off11:52
derekhlifeless: we have the updated script11:53
lifelessok, I must leave you to it11:53
* lifeless tries again11:54
derekhlifeless: night11:54
*** morazi has joined #tripleo11:55
*** ramishra has quit IRC11:56
*** mrunge has quit IRC11:59
Ngderekh: any objections if I murder the flock of flocks? I'd like to get things calmed down and test a modification to the upstart job11:59
derekhNg: go for it11:59
derekhNg: ok, so I think the change to dhcp-all-interfaces.sh is wrong12:02
Ngderekh: oh?12:02
derekhwe're calling it with no args in the upstart script12:02
*** e0ne has joined #tripleo12:02
derekhso ARGS="/usr/local/sbin/dhcp-all-interfaces.sh"12:02
Ngderekh: yes, that is the modification to the upstart job I wanted to test, adding $INTERFACE12:03
Ngthis whole thing is awful, it's still even called *all-interfaces* ;)12:03
*** e0ne has quit IRC12:04
*** e0ne has joined #tripleo12:04
derekhNg: when it's respawned in enable_interface for a specific interface, it gets recalled on all interfaces12:05
derekhas those were the original params12:05
Ngderekh: agreed, it's basically never ever operating in single-interface mode ever12:05
Ngat all12:05
derekhNg: what you're trying should fix this case, but dhcp-all-interfaces.sh should also be fixed for the case where it gets no param (or changed not to support that)12:06
derekhNg: yup12:06
derekhNg: will be back in a few minutes12:07
Ngk12:07
Nghmm, now I'm not sure if I subtly broke the upstart job or not. I just brought up a random manual TAP interface to try and get upstart to fire off the script, but I see nothing12:07
Ngalso not sure if upstart is so wedged that it can't do anything anymore12:08
Ngit doesn't seem to be reaping the dead flock12:08
*** CaptTofu has joined #tripleo12:08
*** e0ne has quit IRC12:09
Ngyeah I think upstart is toast, "initctl list" isn't even returning12:10
Nggoing by the original purpose of this script, we really shouldn't let it keep getting triggered indefinitely, it's supposed to get us DHCP'd on all physical interfaces at boot, because we don't know which one is the one we want12:11
Ngadding more and more chicanery to keep us tip-toeing around virtual interfaces that appear later, and avoiding flooding init daemons to death, all seems entirely pointless, so I'm thinking about useful ways to detect that our work is done, and stop running anymore12:13
*** CaptTofu has quit IRC12:16
Ngderekh: so I'm thinking a two-prong approach here. first, remove the ability for the script to detect all interfaces and just require that an interface name be passed in. I don't think we're ever calling it other than from upstart/udev with (at least the intention ;) of passing in an interface name12:19
*** e0ne has joined #tripleo12:19
Ngderekh: maybe three-prong, because with prong one complete, we should probably rename this element and everything inside it ;)12:20
Ngderekh: but the final prong would be to try and figure out a smart way to detect that we don't need to be doing this anymore (i.e. the machine has booted and is talking to the world)12:20
*** jprovazn has quit IRC12:21
NgI'm not sure the final prong is terribly reasonable, sadly, but we can at least attempt to do better at racing to exit12:21
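A minimal bash sketch of Ng's first prong, assuming the script is only ever invoked with one interface name (for example from an upstart `exec ... $INTERFACE` line): with no argument it fails loudly instead of falling back to walking every interface. `configure_dhcp` is a hypothetical placeholder for whatever per-interface work the real script does.

```bash
#!/usr/bin/env bash
# Hypothetical placeholder for the real per-interface DHCP setup.
configure_dhcp() {
    echo "would configure DHCP on $1"
}

# Entry point: require exactly one non-empty interface name.
# There is deliberately no "scan all interfaces" fallback.
main() {
    if [[ $# -ne 1 || -z $1 ]]; then
        echo "usage: ${0##*/} INTERFACE" >&2
        return 1
    fi
    configure_dhcp "$1"
}
```

In a standalone script the last line would be `main "$@"`, so a re-exec that drops its arguments fails instead of silently rescanning everything.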
openstackgerritAndrea Rosa proposed a change to openstack/tripleo-image-elements: Allow settings for Nova Scheduler  https://review.openstack.org/8413112:29
*** rlandy has joined #tripleo12:34
*** lucas-hungry is now known as lucasagomes12:36
*** CaptTofu has joined #tripleo12:37
*** ramishra has joined #tripleo12:37
*** jdob has joined #tripleo12:40
*** jprovazn has joined #tripleo12:45
*** geerdest has joined #tripleo12:46
*** lblanchard has joined #tripleo12:49
derekhNg: back, removing the ability to detect all interfaces seems reasonable, to get us over this hump we should be ok to just pass the interface name around correctly, and we can put the rest of the changes through gerrit?12:50
derekhlooks like it already tests if an interface is real12:56
derekhInspecting interface: qbr12932ae8-f5...Device has generated MAC, skipping.12:57
*** dprince has joined #tripleo12:57
*** killer_prince2 has quit IRC12:58
SpamapSgenerated MAC really means locally administered bit right?12:58
SpamapSSeems like we need to actually check that it is a NIC, and not a bridge/tap/etc.12:59
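SpamapS's reading of "generated MAC" corresponds to the locally administered (U/L) bit, bit 0x02 of the first octet. A bash sketch of that test (the function name is hypothetical). It is only a heuristic for "not a physical NIC", since software can give a virtual device a globally administered address and an admin can set a locally administered one on real hardware.

```bash
#!/usr/bin/env bash
# Succeeds if the MAC's first octet has the locally administered (U/L)
# bit, 0x02, set. Expects colon-separated hex, e.g. fe:54:00:09:a2:f1.
is_locally_administered() {
    local first_octet=${1%%:*}
    (( 16#$first_octet & 0x02 ))
}
```

For example, `52:54:00:...` and `fe:54:00:...` addresses have the bit set, while a vendor-assigned address such as `00:16:3e:...` does not.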
derekhSpamapS: yup, that's what we want13:01
*** sballe has joined #tripleo13:02
*** derekh changes topic to "FIREDRILL ci-overcloud down | tripleo-cd running preserve-ephemeral WIP patches and https://review.openstack.org/#/c/62042/ | Using OpenStack to deploy OpenStack;meetings Tuesday 1900 UTC in #openstack-meeting-alt"13:03
*** ccrouch has quit IRC13:04
openstackgerritClint "SpamapS" Byrum proposed a change to openstack/tripleo-heat-templates: WIP: Switch overcloud to software-config  https://review.openstack.org/8166613:05
*** pblaho has quit IRC13:08
SpamapShm13:11
SpamapSbridges at least don't have tun_flags in /sys/devices/*/net/$devname13:12
*** ccrouch has joined #tripleo13:13
SpamapSderekh: I wonder if the 'flags' files in /sys/devices/*/net/* have the information we need13:17
*** noslzzp has joined #tripleo13:17
SpamapShttp://paste.openstack.org/show/7474513:17
SpamapSdefinite patterns there13:17
* derekh goes searching for docs13:20
SpamapSderekh: I'm digging through the kernel source at the moment :-P13:21
derekhSpamapS: ok13:21
openstackgerritLadislav Smola proposed a change to openstack/tuskar-ui: Loading template paraemters from tuskar-api  https://review.openstack.org/8442713:22
*** CaptTofu has quit IRC13:23
SpamapSderekh: note that we're ~10 minutes from screaming hungry beasts needing their morning feeding.. so if you don't hear from me soon.. carry on ;)13:24
derekhSpamapS: ok np13:24
*** akuznetsov has quit IRC13:26
SpamapSderekh: ok so the flags are defined in /usr/include/linux/if.h13:27
SpamapSderekh: and it looks like IFF_EBRIDGE is a _private_ flag. :-/13:28
SpamapSmeaning not in sysfs13:28
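For reference, the public interface flags that sysfs does expose in `/sys/class/net/<dev>/flags` are the `IFF_*` bits from `<linux/if.h>`. This bash sketch decodes the low thirteen of them (the function name is hypothetical). Nothing in that set marks a bridge, which is consistent with `IFF_EBRIDGE` being a private flag.

```bash
#!/usr/bin/env bash
# Decode a flags value (e.g. the hex string read from
# /sys/class/net/<dev>/flags) into IFF_* names, bits 0..12 of
# <linux/if.h>. Bits above IFF_MULTICAST are ignored.
decode_iff_flags() {
    local value=$(( $1 ))
    local -a names=(UP BROADCAST DEBUG LOOPBACK POINTOPOINT NOTRAILERS
                    RUNNING NOARP PROMISC ALLMULTI MASTER SLAVE MULTICAST)
    local bit out=()
    for bit in "${!names[@]}"; do
        if (( value & (1 << bit) )); then
            out+=("${names[bit]}")
        fi
    done
    echo "${out[*]:-}"
}
```

Usage: `decode_iff_flags "$(cat /sys/class/net/eth0/flags)"`.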
derekh:-(13:28
SpamapSderekh: so we'll probably be reduced to filtering things out by digging in ovs13:28
SpamapSor whitelisting classes13:29
SpamapSlike ethX, emX, vnetX13:29
SpamapSbut that seems not awesome either13:29
*** akuznetsov has joined #tripleo13:29
derekhya would be nice if we could avoid it13:31
dprincederekh: what is the problem you are solving here?13:31
derekhbtw we're about to get CLOUDOUTRAGED; if people think it's annoying I can turn it off, just giving it a go13:32
derekhdprince: https://etherpad.openstack.org/p/cloud-outage13:32
*** CLOUDOUTAGE has joined #tripleo13:32
CLOUDOUTAGElifeless devananda Ng SpamapS jog0 GheRivero derekh dprince slagle  -- ci-overcloud currently down https://etherpad.openstack.org/p/cloud-outage13:32
*** CLOUDOUTAGE has quit IRC13:32
SpamapSoh heh I get those in my email already13:32
SpamapSanyway, time to go feed the beasts13:32
derekhk13:33
SpamapSderekh: I did notice that /sys/devices/*/net/xxx/bridge might be useful in finding bridges to filter out13:33
openstackgerritLadislav Smola proposed a change to openstack/tuskar-ui: Loading template parameters from tuskar-api  https://review.openstack.org/8442713:33
SpamapSderekh: clint@clint-HP:~$ cat /sys/devices/virtual/net/virbr0/bridge/bridge_id13:33
SpamapS8000.fe540009a2f113:33
derekhSpamapS: ok13:34
openstackgerritLadislav Smola proposed a change to openstack/tuskar-ui: Loading template parameters from tuskar-api  https://review.openstack.org/8442713:35
derekhNg: are you still doing anything with the upstart script? If not, I might just bouch compute4 with $INTERFACE added; it's not exactly what we want, but I think it might be good enough to get us past this problem in overcloud13:35
derekh*bounce13:35
openstackgerritLadislav Smola proposed a change to openstack/tuskar-ui: Loading template parameters from tuskar-api  https://review.openstack.org/8442713:36
*** julim has joined #tripleo13:38
*** mrunge has joined #tripleo13:38
dprinceSpamapS/derekh: The mac_addr_type bit should differentiate between real NICs vs. others right? (i.e. locally managed)13:39
SpamapSdprince: except that you just changed some things to have not locally managed macs.. ;)13:39
SpamapSbut I guess those things are things we want dhcp'd..13:40
dprinceSpamapS: oh, well why didn't you say something.13:40
SpamapSdprince: either way, that's not _really_ the right clue.13:41
SpamapSalso I wonder if even filtering bridges is right.. since once the bridge is up we do want the dhclient on the bridge not the iface, right?13:41
derekhI've changed the upstart script on novacompute4 to say13:42
dprinceSpamapS: dhcp-all-interfaces should not touch bridges. It is for bootstrapping only13:42
derekhexec /usr/local/sbin/dhcp-all-interfaces.sh $INTERFACE13:42
dprinceSpamapS: IMO it is a violation of its purpose...13:42
derekhgoing to reboot it to see if it works while we find a long term solution13:42
*** sballe has quit IRC13:43
*** Rakesh5 has quit IRC13:44
SpamapSdprince: that is a fair point.13:47
SpamapSdprince: given that, then yes, if /sys/devices/virtual/net/$interface/bridge/bridge_id exists, then we should not add it13:48
SpamapSand I suspect tap interfaces are fairly easy to detect13:48
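A bash sketch of the two sysfs probes mentioned here: a Linux bridge exposes `bridge/bridge_id` (as in SpamapS's virbr0 example), and tun/tap devices expose a `tun_flags` attribute. `SYSFS_NET` and the function names are hypothetical, the variable being an override so the checks can be exercised against a fake tree.

```bash
#!/usr/bin/env bash
# On a real host this is /sys/class/net; override it for testing.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

# Linux bridges have a bridge/ directory containing bridge_id.
is_bridge() { [[ -e $SYSFS_NET/$1/bridge/bridge_id ]]; }

# tun/tap devices expose a tun_flags attribute.
is_tap() { [[ -e $SYSFS_NET/$1/tun_flags ]]; }

# Skip anything that is a bridge or a tap.
should_skip_interface() {
    is_bridge "$1" || is_tap "$1"
}
```

Note that veth pairs match neither probe, so this filter alone would still let them through.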
openstackgerritLadislav Smola proposed a change to openstack/tuskar-ui: Fix test caching  https://review.openstack.org/8443613:51
dprinceSpamapS: Actually, I think excluding anything in just /sys/devices/virtual/net/ would do fine.13:52
derekhSpamapS: dprince: ya, I was about to say, the virt interfaces are in /sys/devices/virtual http://paste.openstack.org/show/74750/13:54
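dprince's broader rule can be sketched in bash by resolving the `/sys/class/net/<dev>` symlink and checking whether it lands under `devices/virtual/net/`, which should catch bridges, taps, veths and OVS ports with one test. `SYSFS_ROOT` and the function name are hypothetical; on a real host the root is `/sys`.

```bash
#!/usr/bin/env bash
# On a real host this is /sys; override it for testing.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# Succeeds if the interface's sysfs entry resolves into
# .../devices/virtual/net/ (i.e. there is no backing hardware device).
is_virtual_interface() {
    local target
    target=$(readlink -f -- "$SYSFS_ROOT/class/net/$1") || return 1
    [[ $target == */devices/virtual/net/* ]]
}
```

Physical NICs resolve under their bus path instead (e.g. `devices/pci0000:00/.../net/eth0`), so they fall through to the normal DHCP handling.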
dprincederekh/SpamapS: who is doing this? would you like me to test/throw up a patch?13:54
derekhdprince: go for it13:54
dprincederekh: okay. So it was just br-ctlplane that is causing us failures then?13:55
Ngderekh: sorry, got dragged away for lunch13:56
derekhdprince: no, the original problem was that dhcp-all-interfaces.sh was being recec'd with no params and also we were not passing an interface name into it from the upstart config, so they just kept looping over all interfaces13:58
derekh*reexec'ed13:58
dprincederekh: okay.13:58
dprinceSpamapS: re, the "original problem", could Ubuntu use udev rules to avoid that? i.e. instead of calling the script for 'all interfaces', could we just do it one interface at a time like we do on Fedora?13:59
derekhdprince: we were the ones calling it for all interfaces, not passing any param in http://git.openstack.org/cgit/openstack/diskimage-builder/tree/elements/dhcp-all-interfaces/install.d/dhcp-all-interfaces.conf?id=f10e61457977a22f531154bac66a532d4684f34614:01
derekhupstart was calling once for each interface14:02
NgI'm refactoring the element to stop ever trying to do every interface, and to then have a more appropriate name14:02
dprincederekh: Well, there were several iterations. I was trying to re-use what was already there (for Debian)14:02
*** CLOUDOUTAGE has joined #tripleo14:03
CLOUDOUTAGElifeless devananda Ng SpamapS jog0 GheRivero derekh dprince slagle  -- ci-overcloud currently down https://etherpad.openstack.org/p/cloud-outage14:03
*** CLOUDOUTAGE has quit IRC14:03
derekhdprince: yup14:03
dprincederekh: They were calling it multiple times (for each interface added)... but each time it would inspect all interfaces.14:03
dprincederekh: and I'm now questioning that since I added support to do just a single interface at a time...14:03
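A back-of-the-envelope count of why this blew up, assuming n interfaces appear one at a time, none are removed, and each per-interface event rescans every interface present at that moment (respawns would only add to this):

```latex
\text{inspections}(n) = \sum_{k=1}^{n} k = \frac{n(n+1)}{2},
\qquad
\text{inspections}(1000) = \frac{1000 \cdot 1001}{2} = 500\,500
```

Passing `$INTERFACE` through brings that down to exactly n inspections, one per event.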
dprinceSpamapS: can we do this instead: 'exec /usr/local/sbin/dhcp-all-interfaces.sh $INTERFACE'14:04
Ngthat's what derekh has done to the live system, aiui14:05
Ngand that certainly should work14:05
derekhNg: dprince yup, testing it at the moment14:05
dprincederekh: gotcha. I buy that. Now that dhcp-all-interfaces.sh supports adding interfaces one-at-a-time I think it is cleaner that way14:06
derekhdprince: agreed14:06
openstackgerritChris Jones proposed a change to openstack/diskimage-builder: Refactor dhcp-all-interfaces to restore sanity.  https://review.openstack.org/8444214:07
NgThis is not a pretty diff :(14:07
*** yassine has joined #tripleo14:09
lxsliEyes please? https://review.openstack.org/#/c/83596/14:10
*** bauzas has quit IRC14:13
dprinceNg: left you some comments. The main thing is I think we need to use /sys/devices/virtual/net and exclude devices there.14:17
dprinceNg: also several nits about various things too. I might suggest that you decouple the renaming from the functional changes too14:18
dprinceNg: from an element standpoint (top level) it will still effectively 'DHCP ALL INTERFACES' (the real ones)14:18
dprinceNg: noting that I already sort of wanted to change it too... but resisted because it seemed reasonable as is14:19
derekhok, compute4 back up with the change to pass $INTERFACE into dhcp-all-interfaces, looks better (for defunct flock processes) but still not correct, nova having problems creating networks14:22
dprincederekh: is this an undercloud/overcloud issue? Or an issue inside the test environment itself?14:24
derekhdprince: an issue on ci-overcloud compute nodes14:24
derekhdprince: the node I'm looking at is ci-overcloud-novacompute4-dqiq7436leuh if you want to take a look14:25
bnemecjprovazn: https://github.com/openstack/tripleo-incubator/blob/master/scripts/devtest_variables.sh#L7914:26
derekhwe now have over 1000 dhcp-all-interfaces-*.log log files, one for each device being created (real and virtual)14:26
*** lazy_prince has joined #tripleo14:26
openstackgerritChris Jones proposed a change to openstack/diskimage-builder: Rename dhcp-all-interfaces and fix it on Ubuntu.  https://review.openstack.org/8444214:27
*** lazy_prince is now known as killer_prince14:27
Ngdprince: I agreed with your comments on the commit message, but I'm not convinced about the other stuff yet14:27
Ngderekh: ugh, that's still a pretty crazy amount of work going on for no real reason14:27
derekhNg: yup14:28
*** shardy is now known as shardy_afk14:28
jprovaznbnemec, thanks, I use these vars too, for some reason I thought that there was also a check "if fedora then set qpid"14:28
Ngderekh: did you have any genius ideas for ways we can detect that things are working and the udev/upstart jobs can be disabled?14:29
bnemecjprovazn: Yeah, below that there is an if fedora then use selinux-permissive check that would be similar though.14:29
jprovaznbnemec, you are right14:29
dprinceNg: if os-collect-config has executed, dhcp-all-interfaces has done its job and can go away IMO14:30
Nginteresting14:30
dprinceNg: its only real purpose is for bootstrapping, That said, the udev rules (with systemd) will only exec this script once per interface.14:31
*** sballe has joined #tripleo14:31
Ngthat could work. we'd need to move the element to t-i-e14:31
Ngdprince: sure, but we are generating a *lot* of interfaces on some machines14:31
Ng[15:26:09] derekh: we now have over 1000 dhcp-all-interfaces-*.log log files14:31
NgI don't understand why we are making a thousand network interfaces, but I'm sure there is some kind of horrible neutron-y explanation ;)14:31
dprinceNg: Can we put an extra conditional in the upstart script itself? I added this to the systemd script to safeguard things on Fedora: ConditionPathExists=!/etc/sysconfig/network-scripts/ifcfg-%I14:32
derekhI was tempted to see what happens if I add "rm -f /etc/init/dhcp-all-interfaces.conf" to the end of dhcp-all-interfaces.sh but that would be a short term get this up and running fix14:32
dprinceNg: i.e, don't run this init script if networking is already configured...14:32
dprincederekh/Ng: it sounds to me like upstart just needs to be tamed here...14:33
derekhwe're now up to 1482 log files, I guess because zuul keeps retrying VMs that failed to start14:33
Ngit's not just upstart, you guys have one file per if which is pretty cheap to test14:33
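On Ubuntu the closest analogue to dprince's systemd `ConditionPathExists` guard would be to skip any interface that ifupdown already has a stanza for. A bash sketch, assuming Debian's usual `/etc/network/interfaces` plus an `interfaces.d` directory; `ENI_FILE`, `ENI_DIR` and the function name are hypothetical.

```bash
#!/usr/bin/env bash
# Assumed locations (Debian convention); override for testing.
ENI_FILE=${ENI_FILE:-/etc/network/interfaces}
ENI_DIR=${ENI_DIR:-/etc/network/interfaces.d}

# Succeeds if any ifupdown config file has an "iface <name> ..." stanza.
interface_already_configured() {
    local iface=$1 f
    for f in "$ENI_FILE" "$ENI_DIR"/*; do
        [[ -f $f ]] || continue
        # Exact match on the first two fields; a commented-out stanza
        # has "#" or "#iface" as its first field, never "iface".
        if awk -v want="$iface" '$1 == "iface" && $2 == want { found = 1 }
                                 END { exit !found }' "$f"; then
            return 0
        fi
    done
    return 1
}
```

A per-interface script could then `exit 0` early whenever this succeeds, much like the systemd unit's condition.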
*** killer_prince has quit IRC14:34
*** CLOUDOUTAGE has joined #tripleo14:34
CLOUDOUTAGElifeless devananda Ng SpamapS jog0 GheRivero derekh dprince slagle  -- ci-overcloud currently down https://etherpad.openstack.org/p/cloud-outage14:34
*** CLOUDOUTAGE has quit IRC14:34
*** killer_prince has joined #tripleo14:35
jangall, how much is 192.0.2.0/24 baked into the devtest config? If I want to pick another subnet across the board, is this still a mainly-manual process, or is it parameterised?14:35
*** jprovazn is now known as jprovazn_afk14:35
dprincejang: https://review.openstack.org/#/c/82327/14:35
derekhSo is it possible (only thinking out loud now) that the network-interface upstart job is failing all virtual interfaces being brought up because dhcp-all-interfaces doesn't drop a config file14:35
derekhroot@ci-overcloud-novacompute4-dqiq7436leuh:/var/log/upstart# ls -l *qbr8edfc22d-f6*14:35
derekh-rw-r----- 1 root root 76 Apr  1 14:27 dhcp-all-interfaces-qbr8edfc22d-f6.log14:35
derekh-rw-r----- 1 root root 49 Apr  1 14:28 network-interface-qbr8edfc22d-f6.log14:35
derekhroot@ci-overcloud-novacompute4-dqiq7436leuh:/var/log/upstart# cat dhcp-all-interfaces-qbr8edfc22d-f6.log14:35
derekhInspecting interface: qbr8edfc22d-f6...Device has generated MAC, skipping.14:35
derekhroot@ci-overcloud-novacompute4-dqiq7436leuh:/var/log/upstart# cat network-interface-qbr8edfc22d-f6.log14:35
derekhifdown: interface qbr8edfc22d-f6 not configured14:35
jangbonzer dprince !14:36
dprincejang: the fact that we hard code it should go away (hopefully soon)14:36
*** CaptTofu has joined #tripleo14:36
Ngderekh: hmm. they shouldn't have a static config though, right? neutron ought to be configuring them dynamically14:38
Ng(I'm guessing)14:38
Ngifup/ifdown rely on eni14:39
*** bauzas has joined #tripleo14:39
dprinceNg: I would very much like us to be using static configs (aka. eni): https://review.openstack.org/#/c/69918/14:40
dprinceNg: to me it is the normal thing to do :)14:40
derekhNg: yup they shouldn't,14:40
*** CaptTofu has quit IRC14:40
Ngderekh: with the $INTERFACE added to the upstart job, is the networking still broken?14:43
Ngif that works, let's at least land that and then have a wider (list?) discussion about how we refactor this stuff to be simple and robust14:44
dprinceNg: was going to push just that but bailed when I saw your branch... so are you going to push just this fix too?14:45
Ngdprince: if it works, I'm happy to push a change.14:45
derekhNg: with $INTERFACE added we have gotten rid of our respawn problem, but neutron/nova are having problems with the net devices it's creating. I've just rebooted it again to try and remove the upstart job after it first runs to see what happens14:45
dprinceNg: actually, I think it would be nice to exclude virtual devices too since that seems to be an immediate problem as well14:46
derekhNg: once its back if we still have a problem we should jump on the box and see if we can figure out what else is wrong14:46
Ngderekh: if we remove it after the first run we might not generate a dhcp config for the physical device that needs it14:47
*** rdopieralski has quit IRC14:47
*** spzala has joined #tripleo14:47
*** rpodolyaka has quit IRC14:48
derekhNg: good point..so that was a bad idea14:48
Ngwe could avoid spitting out so many logs by not echoing so much, but that seems like a secondary concern14:48
Ngthe problem with removing the job is knowing when it's safe to do. so far the only idea there that makes sense is a successful occ run, IMHO14:49
Ngand network-interface.conf will still run, since that's an upstream upstart thing, not one of ours, afair, so we will likely get lots of logs anyway14:50
*** rpodolyaka has joined #tripleo14:51
openstackgerritAlexis Lee proposed a change to openstack/os-apply-config: Add DO_NOT_CREATE option  https://review.openstack.org/8203814:51
jangdprince: wrt that eni patch - I've just seen a rebooted overcloud control (props on the name change incidentally, long overdue) come up without shifting its ip address from eth0 to br-ex. Should I expect to see code in there to ensure that that'll work properly with the patch?14:54
*** slagle_ is now known as slagle14:55
dprincejang: that is what this branch does. Making reboots work was one of the primary goals14:56
dprincejang: the ability to reboot a node (IMO) is a fundamental case we don't yet support :(14:56
openstackgerritChris Jones proposed a change to openstack/diskimage-builder: Fix dhcp-all-interfaces on Ubuntu/Debian.  https://review.openstack.org/8446614:58
dprincederekh: so you removed the dhcp-all-interfaces.conf script on ci-overcloud-novacompute4-dqiq7436leuh.novalocal ?14:59
Ngderekh: if we can get it working with just the $INTERFACE addition, 84466 is just that14:59
derekhdprince: yup, I did14:59
derekhNg: that didn't work but I'm not sure if the problem was unrelated14:59
Ngderekh: ok14:59
NgI need to disappear for 20-30 mins, will be happy to assist further when I return :)15:00
derekhNg: maybe try it again on compute5 and see what the problem is15:00
derekhNg: ok15:00
*** untriaged-bot has joined #tripleo15:01
untriaged-botUntriaged bugs so far:15:01
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/130069315:01
uvirtbotLaunchpad bug 1300693 in tripleo "devtest.sh run leaves 3 default security groups" [Undecided,New]15:01
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/129048815:01
uvirtbotLaunchpad bug 1290488 in tripleo "Baremetal: Invalid credentials" [Undecided,Incomplete]15:01
*** untriaged-bot has quit IRC15:01
*** matty_dubs|gone is now known as matty_dubs15:02
*** TravT has joined #tripleo15:04
*** CLOUDOUTAGE has joined #tripleo15:05
CLOUDOUTAGElifeless devananda Ng SpamapS jog0 GheRivero derekh dprince slagle  -- ci-overcloud currently down https://etherpad.openstack.org/p/cloud-outage15:05
*** CLOUDOUTAGE has quit IRC15:05
derekhwe also got lots of these tracebacks in the nova-compute log which may be a new reason nova isn't able to start an instance http://paste.openstack.org/show/74757/15:06
dprincederekh: oh, that looks not so good15:08
bnemecI think lifeless was looking at that yesterday too.15:08
bnemecHe posted a paste with the same traceback.15:08
openstackgerritgerry-drudy proposed a change to openstack/tripleo-heat-templates: Add SwiftSortingMethod parameter  https://review.openstack.org/8447015:10
derekhbnemec: any idea if he mentioned what the problem might be?15:11
dprincederekh: seems to be related to the recent neutron event change https://review.openstack.org/#/c/81120/15:12
bnemecderekh: Not that I can see.  He just posted a link to http://paste.openstack.org/show/74700/ but skimming the scrollback I don't see any followup.15:13
derekhdprince: ok, so maybe we always had that traceback and it's not affecting anything15:14
derekhdprince: checking15:14
*** ramishra has quit IRC15:16
derekhdprince: we were not getting that traceback in successful CI runs15:17
*** jcoufal has quit IRC15:18
dprincederekh: well, it could be a race where neutron wins sometimes right?15:19
*** hashar_ has joined #tripleo15:19
derekhdprince: ok15:20
*** hashar has quit IRC15:22
*** CaptTofu has joined #tripleo15:22
*** hashar_ has quit IRC15:26
*** ifarkas has quit IRC15:26
derekhApr  1 15:37:25 ci-overcloud-novacompute4-dqiq7436leuh kernel: [ 2467.029161] kvm [16383]: vcpu0 unhandled wrmsr: 0x682 data 015:29
derekhlots of these as well15:30
derekhgoogling, probably a non issue15:30
openstackgerritDan Prince proposed a change to openstack/tripleo-incubator: Load undercloud images with -d (delete duplicate)  https://review.openstack.org/8447815:32
*** hashar has joined #tripleo15:33
*** yassine has quit IRC15:40
dprincederekh: could we just add the $INTERFACE fix to the upstart script in the testenv worker images and then respin them?15:42
* dprince likes a clean slate sometimes15:42
derekhdprince: yes, we could but until we get the ci-overcloud up and running it won't help, I haven't even gotten as far as testing the testenvs to see if they have a problem15:43
derekhdprince: but I got no objection15:44
* dprince has to run out... will be back online in a bit...15:46
derekhalso got lots of these15:46
*** dprince has quit IRC15:46
derekh2014-04-01 15:45:58.938+0000: 8234: warning : virAuditSend:135 : Failed to send audit message virt=kvm op=stop reason=destroyed vm="instance-00003f52" uuid=610fc898-5cfb-4512-a4ad-3802dd962a75 vm-pid=-1: Operation not permitted15:46
*** mrunge has quit IRC15:47
*** jcoufal has joined #tripleo15:48
*** UtahDave has joined #tripleo15:48
*** akrivoka has quit IRC15:49
*** UtahDave has left #tripleo15:50
*** gcha has quit IRC15:52
openstackgerritNicholas Randon proposed a change to openstack/tripleo-incubator: SSH key and virtual_power_driver not used on H/W  https://review.openstack.org/8377015:56
*** matty_dubs is now known as matty_dubs|lunch15:57
*** akrivoka has joined #tripleo15:57
* Ng returns later than anticipated15:58
derekhNg: I meant to try just that  $INTERFACE change on compute5 while you were gone but got distracted, will do it now16:00
Ngk16:00
*** hashar has quit IRC16:00
*** hashar has joined #tripleo16:01
*** hashar has quit IRC16:02
derekhNg: done just changed line to "exec /usr/local/sbin/dhcp-all-interfaces.sh $INTERFACE" and rebooted16:03
*** ccorrigan has quit IRC16:03
Ngderekh: fingers crossed :)16:04
*** eghobo has joined #tripleo16:04
*** sballe_ has joined #tripleo16:05
*** sballe has quit IRC16:05
*** BadCub01 has quit IRC16:05
*** cwolferh has joined #tripleo16:05
*** e0ne has quit IRC16:09
Shrewsdoes it say something about my mental health that my mind insists on reading that as CLOUDOUTRAGE?16:12
derekhShrews: It may reflect something about your mental health but it's not just you, it's been commented on before16:13
Shrewsah, *phew*. good to hear16:13
*** morazi has quit IRC16:15
openstackgerritNicholas Randon proposed a change to openstack/tripleo-incubator: SSH key and virtual_power_driver not used on H/W  https://review.openstack.org/8377016:16
*** jistr has quit IRC16:17
*** morazi has joined #tripleo16:18
derekhNg: compute5 back up, wanna look at anything before I run "os-collect-config --force --one -v"?16:18
Ngderekh: peeking now16:19
Ngderekh: ok, I think everything seems sane so far16:23
derekhNg: ok, running o-c-c on compute516:24
*** CaptTofu has quit IRC16:24
derekhNg: it's finished, instances should start to get scheduled on it now16:25
Ngk16:25
derekhsame log entry in nova-compute.log http://paste.openstack.org/show/74757/16:29
*** maru_afk is now known as marun16:32
SpamapSNg: regarding thousands of interfaces, you need a tap interface to attach to each VM16:34
SpamapSderekh: anything I can do to help with the outage?16:34
*** sballe_ has quit IRC16:34
NgSpamapS: yeah, I was just curious why we were spinning up quite so many in such a short time16:34
derekhNg: I think it's because instances were failing to spawn (I'm guessing because of http://paste.openstack.org/show/74757/ ), then zuul just keeps retrying them, so we got lots of tap interfaces16:35
Ngah right16:36
Ngthat would make some sense I guess :)16:36
derekhSpamapS: ya, please, you will hopefully have better luck than I did, I've been keeping some highlights of what we did here https://etherpad.openstack.org/p/cloud-outage16:36
derekhSpamapS: I've gotta run in a few minutes so if you have any questions about what's happened so far before I go then fire them out16:37
derekhI've been more focused on trying to get the ci-overcloud running in the short term (by trying fixes in place), leaving the long-term solution to the gerrit process16:39
derekhI'll leave the CLOUDOUTAGE bot running but if ye need to suppress it you can just comment out the "ircmessage:" on the etherpad16:40
*** dprince has joined #tripleo16:41
*** athomas has quit IRC16:41
SpamapSNg: I think the right answer is to not print anything out in the network interface jobs16:42
SpamapSNg: no printing == no log file16:42
*** rpodolyaka1 has joined #tripleo16:42
SpamapSNg: another option is to turn off console logging.16:42
NgSpamapS: agreed, we don't really need to be echoing from dhcp-all-interfaces16:42
Ngbut network-interface.conf isn't ours, so we should think carefully before modifying it16:43
openstackgerritBen Nemec proposed a change to openstack/diskimage-builder: Add unit test for cache-url  https://review.openstack.org/8449316:43
SpamapSNg: also we could use 'logger' instead to send them to /var/log/syslog16:43
SpamapSNg: network-interface will only be echoing things if 'ifup $INTERFACE' echoes things16:43
SpamapSis that what is happening?16:43
NgSpamapS: yeah, all the squillions of tap devices have no eni entry, so ifup will whine16:44
Ngbut none of this is what is breaking us atm16:44
derekhok, gotta run16:44
openstackgerritBen Nemec proposed a change to openstack/diskimage-builder: Add unit test for cache-url  https://review.openstack.org/8449316:44
*** derekh has quit IRC16:44
*** rpodolyaka2 has joined #tripleo16:45
SpamapSNg: hmmmmmm16:45
SpamapSNg: we only --allow auto16:45
SpamapSNg: so ifup should be a noop for them16:45
SpamapSsomebody.. please... get derekh an irc bouncer16:45
NgSpamapS: oh, I'm wrong, it's not ifup, it's ifdown16:46
Ngso when the VM terminates, you get a single line log file with something like "ifdown: interface qbr716cf0b0-18 not configured"16:46
*** rpodolyaka1 has quit IRC16:47
NgI'll file an upstream bug about that16:47
Ngbut we need to figure out wtf is wrong with CI :)16:47
*** rpodolyaka2 has quit IRC16:49
Ng2014-04-01 16:47:54.756 6949 TRACE nova.compute.manager [instance: fe53754d-5cd0-4982-8369-a909e0a2dd29] Unauthorized: Unknown auth strategy16:49
Ngwat16:49
openstackgerritBen Nemec proposed a change to openstack/diskimage-builder: Add unit test for cache-url  https://review.openstack.org/8449316:49
openstackgerritgerry-drudy proposed a change to openstack/tripleo-heat-templates: Add SwiftSortingMethod parameter  https://review.openstack.org/8447016:51
*** mrunge has joined #tripleo16:56
*** BadCub01 has joined #tripleo16:57
*** matty_dubs|lunch is now known as matty_dubs16:57
SpamapSNg: yeah that's a bug for sure.16:59
*** stevehuang has joined #tripleo16:59
SpamapSNg: check to make sure cfn is working (os-collect-config --print cfn)16:59
NgSpamapS: looks ok, I see metadata with passwords and stuff17:00
openstackgerritAndrea Rosa proposed a change to openstack/tripleo-incubator: Adding single quote to a grep command  https://review.openstack.org/8449517:01
*** arosen has quit IRC17:02
*** mrunge has quit IRC17:02
NgSpamapS: I'm a bit stumped here, and I'm about to get called for dinner - can I hand off to you?17:03
*** jcoufal has quit IRC17:04
SpamapSNg: yeah, which box?17:04
Ng(ci-overcloud-novacompute5 is where we've been poking around. afaics it's configured properly and things have been restarted, but it's just super unhappy)17:04
SpamapSwait what's even the outage? I see nodepools running.. ??17:05
* SpamapS reads the etherpad17:05
*** CaptTofu has joined #tripleo17:05
NgSpamapS: hrm, they weren't working before. I did just do a desperate occ --force --one-time about 2 minutes ago17:06
SpamapSNg: ifdown17:10
*** UtahDave has joined #tripleo17:10
SpamapSdoh17:10
SpamapSn/m17:10
SpamapSNg: sorry, copy/paste/scrollback/stupid fail ;)17:11
*** UtahDave has left #tripleo17:11
SpamapSNg: ah yeah I see most are in ERROR17:11
openstackgerritClint "SpamapS" Byrum proposed a change to openstack/tripleo-image-elements: Add support for "signals" to os-refresh-config  https://review.openstack.org/8361417:12
*** slagle has quit IRC17:13
*** slagle has joined #tripleo17:15
SpamapSheh so we have ACTIVE vms on 4 and 517:20
*** morazi_ has joined #tripleo17:28
SpamapSwhoa..17:29
SpamapSupstart seems overwhelmed on ci-overcloud-novacompute617:29
SpamapSwith all the logging I think17:29
SpamapSroot         1 91.1  0.0  37772 13384 ?        Rs   11:02 359:42 /sbin/init17:30
SpamapSNg: hah! fixed in trusty. ;)17:30
*** morganfainberg_Z is now known as morganfainberg17:31
*** morazi has quit IRC17:32
*** CaptTofu has quit IRC17:32
*** CaptTofu has joined #tripleo17:32
*** morazi_ is now known as morazi17:33
*** CaptTofu has quit IRC17:37
*** TravT has quit IRC17:40
*** e0ne has joined #tripleo17:40
*** TravT has joined #tripleo17:44
*** rpodolyaka1 has joined #tripleo17:46
*** lucasagomes is now known as lucas-afk17:49
*** rpodolyaka1 has quit IRC17:50
*** jprovazn_afk is now known as jprovazn17:52
*** rpodolyaka1 has joined #tripleo17:58
clarkbhello #tripleo. Is the firedrill topic for ci-overcloud related to nodepool getting "ClientException: The server has either erred or is incapable of performing the requested operation. (HTTP 500)" from tripleo cloud?18:02
SpamapSclarkb: our compute nodes are having issues18:03
SpamapSclarkb: but I wouldn't expect 500 from the controller18:03
*** julim has quit IRC18:04
*** jang1 has joined #tripleo18:05
*** julim has joined #tripleo18:06
*** derekh has joined #tripleo18:07
SpamapSand now we seem to have lost NovaCompute618:11
derekh2014-04-01T17:29:49  <SpamapS> upstart seems overwhelmed on ci-overcloud-novacompute618:11
SpamapSderekh: yeah, so I unblocked upstart by raising max open files for pid 118:12
SpamapSderekh: but then networking went dead18:12
derekhSpamapS: this was the origional error, init hit its fd limit18:12
SpamapSderekh: I'm poking in via console now.18:12
SpamapSderekh: ah ok.. so that was already known18:12
derekhSpamapS: ps -ef | grep -i flock | wc18:12
SpamapSderekh: that's a definite bug in upstart18:12
SpamapSit shouldn't be responding to EMFILE by retrying forever.. it should just be logging in its main log "can't log for XXXX"18:12
derekhout dhcp-all-interfaces script was respawning itself as we we're passing in a interface18:13
derekh*our18:13
derekhSpamapS: https://bugs.launchpad.net/tripleo/+bug/130066318:13
uvirtbotLaunchpad bug 1300663 in tripleo "upstart using 100% CPU" [Critical,Triaged]18:13
SpamapSderekh: ok18:14
SpamapSderekh: I set 'console none' on network-interface on NovaCompute618:15
derekh*not passing in a interface18:15
SpamapSderekh: the logging is fixed in trusty btw (14.04)18:15
derekhSpamapS: ok18:15
SpamapSdoh18:15
SpamapSkernel paniced18:15
SpamapSsomething killed init18:15
* SpamapS reboots18:15
derekhso with the updated dhcp-all-interfaces on both compute4 and 5 zuul is now starting instances, but they quickly go into DELETE state, I'm guessing because of this error http://paste.openstack.org/show/74757/18:17
derekhSpamapS: ^ that's where I was when I left it18:17
derekhsorry ERROR state then DELETE18:18
SpamapSderekh: we have some ACTIVE's18:18
SpamapSbut just a few18:18
derekhSpamapS: but they don't stay active for long, they're then deleted once they go to ERROR18:20
derekhhmm one seems to be staying running18:20
*** panda has quit IRC18:21
*** panda has joined #tripleo18:21
SpamapSahh18:22
*** CaptTofu has joined #tripleo18:23
* derekh wonders if we have the old ci-overcloud images anywhere to give us some breathing room18:23
*** jang1 has quit IRC18:25
derekhforget that idea, they're gone18:25
SpamapSwell..18:26
SpamapSalso..18:26
SpamapSwe should like, make new stuff work :)18:26
*** rpodolyaka1 has quit IRC18:28
SpamapSok I see instances booting on NovaCompute618:29
dprinceif we build a new image with the $INTERFACE fix for dhcp-all-interfaces, For the other error I wonder if making the vif_plugging_timeout larger would help: https://review.openstack.org/#/c/81224/2/elements/nova/os-apply-config/etc/nova/nova.conf18:29
derekhSpamapS: yes, but ideally not on our running cloud, having a rollback strategy wouldn't do any harm18:29
SpamapSso I think we should disable logging of network-interface anyway..18:30
derekhdprince: no, that timeout was made small because we're expecting it to timeout18:30
SpamapSthat is just never a good idea on a nova compute host18:30
derekhdprince: because neutron doesn't have the creds to send the notify to nova18:31
dprincederekh: ack18:31
derekhSpamapS: dprince going to restart neutron-openvswitch-agent on the controller, it hasn't logged anything since yesterday and it's usually so chatty18:32
derekhSpamapS: dprince its last log entry was18:32
derekh2014-03-31 06:42:59.489 25453 CRITICAL neutron [-] Trying to re-send() an already-triggered event.18:32
SpamapSinteresting18:33
dprincederekh: sounds promising18:33
derekh2014-04-01 18:35:39.920 2479 ERROR neutron.agent.linux.ovsdb_monitor [-] Error received from ovsdb monitor: 2014-04-01T18:35:39Z|00001|fatal_signal|WARN|terminating with signal 15 (Terminated)18:33
* dprince thinks good monitoring would help w/ all this18:33
SpamapSdprince: and HA ;)18:34
SpamapSand CI gating of image replacement18:34
SpamapSall things we hold dear18:34
SpamapSthat we're not doing. :-P18:34
dprinceSpamapS: I actually think I would take monitoring first, but sure. All these things18:34
* dprince likes to see18:34
SpamapSwe have crap monitoring already18:34
SpamapSIf we had HA, we wouldn't have been compelled to replace the whole ci overcloud.18:35
SpamapSif we had CI gating of images, when we were ready to update it, we'd have found this problem18:35
dprinceSpamapS: yeah, well what if your HA eats its face off18:36
SpamapSdprince: ????18:36
SpamapSI don't really understand.. but o-k. :)18:36
dprincei.e. it dies because someone deployed wrong or something18:36
SpamapSso there really is no HA w/o monitoring18:37
SpamapSso yes you'd have to have monitoring before HA.. but my point is, we had monitoring, just crappy monitoring, which alerted us to "cloud not working"18:37
SpamapSwe responded with our only possible fix for a hardware issue: redeploy cloud18:37
SpamapShad we been using actual CD, we'd have deployed images that were already known to work18:38
SpamapSthats what I'm frustrated by: that we don't have CD.. we have hope and prayer mostly18:38
dprinceI think the problem in this case may have been it took a bit to notice the errors.18:38
SpamapSno18:38
SpamapSthe errors were the death of the machine18:39
*** e0ne has quit IRC18:39
SpamapSknowing it was dead 8 hours earlier would have had us in this situation 8 hours earlier.18:39
dprinceSo who actually deployed the new images then? And when?18:39
SpamapSlifeless IIRC.. and its what I would do too btw.18:39
SpamapSwith no backups.. not much else to do.18:39
dprincenot trying to blame. just understand the timing here18:40
*** e0ne has joined #tripleo18:40
*** e0ne has quit IRC18:40
SpamapSderekh: so is neutron failing on controller now?18:40
*** rpodolyaka1 has joined #tripleo18:40
*** e0ne has joined #tripleo18:40
SpamapS2014-04-01 18:43:14.555 10089 TRACE neutron.notifiers.nova BadRequest: The server could not comply with the request since it is either malformed or otherwise incorrect. (HTTP 400)18:41
SpamapSthats what neutron-server says18:41
derekhSpamapS: log file isn't logging anything but process is running, but I did notice something18:41
derekh2014-04-01 18:42:25.093 7737 TRACE nova.api.openstack NeutronClientException: 409-{u'NeutronError': {u'message': u"Quota exceeded for resources: ['floatingip']", u'type': u'OverQuota', u'detail': u''}}18:41
SpamapSoh!18:41
SpamapSwell that would explain why they're being deleted18:41
Ngdoh!18:42
derekh| floating_ips                | 100    |18:42
derekhquota seems high enough18:42
SpamapSderekh: quotas are likely being misreported18:43
SpamapSfloatingip-list shows 64 assigned to that tenant18:43
derekhhmm "neutron floatingip-list" shows a list of 63 floating IPs, only 2 seem to be in use18:44
SpamapSyou are charged if you own them18:44
derekhgonna floatingip-delete one of them18:45
SpamapS+----------+----------------------------------+18:45
SpamapS| count(*) | tenant_id                        |18:45
SpamapS+----------+----------------------------------+18:45
SpamapS|        1 | 94df8fe8a9b44a6498ac0c4a05238e7c |18:45
SpamapS|       60 | e01e473a9250498883955b80966a1e58 |18:45
SpamapS+----------+----------------------------------+18:45
SpamapSI see quotas of 60 for that tenant id18:46
SpamapSwhich should still be enough18:46
derekhSpamapS: ok, we have progress18:47
derekhafter deleting a floating IP18:47
derekhone of the instance got a floatingip (first time today)18:48
derekhand I can ping it18:48
derekh64 bytes from 138.35.77.49: icmp_seq=1 ttl=44 time=264 ms18:48
derekhSpamapS: ^18:48
SpamapSderekh: sounds good18:48
SpamapSderekh: so maybe nodepool doesn't know about the ips it already has18:48
derekhSpamapS: and zuul just popped a job off its pool http://status.openstack.org/zuul/18:48
*** newell_ has joined #tripleo18:49
derekhSpamapS: yup, in all the downage it must have lost track18:49
derekhSpamapS: gonna delete the rest18:49
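The by-hand audit above (comparing `neutron floatingip-list` against the quota) can be sketched in Python. It assumes the floating IPs have already been fetched as a list of dicts carrying Neutron's `tenant_id` and `port_id` fields, where an empty `port_id` means the address is allocated to the tenant but attached to nothing; as SpamapS notes, those still count against the quota. The function names and sample data are hypothetical.

```python
from collections import Counter


def floating_ip_usage(floating_ips):
    """Map tenant_id -> (allocated, associated) floating IP counts."""
    allocated = Counter(fip["tenant_id"] for fip in floating_ips)
    associated = Counter(fip["tenant_id"] for fip in floating_ips if fip.get("port_id"))
    # Counter returns 0 for tenants with no associated IPs.
    return {tenant: (allocated[tenant], associated[tenant]) for tenant in allocated}


def unassociated(floating_ips, tenant_id):
    """Floating IPs owned by tenant_id but bound to no port: candidates for deletion."""
    return [fip for fip in floating_ips
            if fip["tenant_id"] == tenant_id and not fip.get("port_id")]


if __name__ == "__main__":
    sample = [
        {"id": "a", "tenant_id": "ci", "port_id": None},
        {"id": "b", "tenant_id": "ci", "port_id": "port-1"},
        {"id": "c", "tenant_id": "ci", "port_id": None},
    ]
    print(floating_ip_usage(sample))  # {'ci': (3, 1)}
    print([fip["id"] for fip in unassociated(sample, "ci")])  # ['a', 'c']
```

In this incident, deleting a single unassociated address was enough for the next instance to get a floating IP.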
SpamapStripleo ci-overcloud: setting the standard for downtime since 201418:49
SpamapSderekh: so I didn't do any other "fixing" on NovaCompute6 .. just disabled console logging for network-interface ...18:52
derekhSpamapS: ok, so now we have 2 things todo (hopefully)18:53
derekh1. add $INTERFACE into the upstart config for dhcp-all-interfaces on all compute nodes18:53
SpamapSok that seems like a straightforward bug fix18:54
SpamapSnot sure how we missed that18:54
derekh2. zuul is now running jobs but they are all going to fail because our gear broker is dead, we need a new broker and it must have the IP tripleo-bm-test=192.168.1.118:54
SpamapSdprince: btw, an upstart job that is 'start on starting network-interface' is basically identical to a udev rule18:55
SpamapSam not at all averse to just using a udev rule only.18:55
tchaypoMorninge18:56
derekhI'll be sticking around for the meeting but gotta go then so wont be able to apply those fixes,18:56
tchaypoOr rather, late night! (IMHO morning begins at sunrise)18:56
SpamapSugh meetings18:57
derekhbrb18:57
*** jcoufal has joined #tripleo18:59
*** akrivoka has quit IRC19:00
*** akrivoka has joined #tripleo19:01
*** jistr has joined #tripleo19:01
*** jcoufal has quit IRC19:03
*** jcoufal has joined #tripleo19:03
mariosare we meeting tonight?19:04
rpodolyaka1we are19:04
*** tzumainn has joined #tripleo19:05
mariosrpodolyaka1: k tx19:05
rpodolyaka1np19:05
*** e0ne has quit IRC19:06
*** e0ne has joined #tripleo19:07
*** rlandy has quit IRC19:07
*** rlandy has joined #tripleo19:07
*** blamar has joined #tripleo19:08
*** eguz has joined #tripleo19:11
mariosrpodolyaka1: should be now right?19:11
* marios time change confused19:12
rpodolyaka1marios: it should. SpamapS has been waiting for lifeless to respond, but he's probably offline19:12
mariosrpodolyaka1: k thanks bud19:12
rpodolyaka1marios: np :) don't go anywhere, we are about to start :)19:13
*** eghobo has quit IRC19:15
lifelessderekh: I'll get C's breakfast - can we do a graceful handover after that? I see that most of the hypervisors are still down?19:24
*** akuznetsov has quit IRC19:28
*** jp_at_hp has quit IRC19:33
*** jp_at_hp has joined #tripleo19:40
*** TravT has quit IRC19:45
*** hashar has joined #tripleo19:46
* tchaypo notices the marconi/macaroni thread19:46
*** newell_ has quit IRC19:46
tchaypoit's so cute when the rest of the world finally joins the party a day late19:47
*** newell_ has joined #tripleo19:47
*** e0ne_ has joined #tripleo19:51
*** e0ne has quit IRC19:51
*** e0ne_ has quit IRC19:53
*** TravT has joined #tripleo19:54
*** e0ne has joined #tripleo19:54
*** sdake_ is now known as sdake__19:54
*** yamahata has joined #tripleo19:55
*** CaptTofu has quit IRC19:55
*** CaptTofu has joined #tripleo19:56
*** lsmola has joined #tripleo20:00
slaglelsmola: so, is there going to be any working version of tuskar that uses just icehouse of all the dependent projects?20:00
*** CaptTofu has quit IRC20:00
jistrlsmola: what features of heat did you mean?20:00
slaglelsmola: that would be a reason to create a stable icehouse branch for tuskar20:00
*** akuznetsov has joined #tripleo20:01
*** notmyname has joined #tripleo20:01
notmynamederekh: this is the right channel in which to talk?20:02
notmynamederekh: to you or lifeless?20:02
jistryeah i think what slagle says is correct. We're far from perfect with Tuskar, but it does deploy something :) i think it might be good to have a stable icehouse branch20:02
jprovaznSpamapS, "<SpamapS> next steps for HA is to have Heat inform nodes when they're about to be rebooted or deleted." - I don't understand why this is needed - for some graceful shutdown?20:02
lifelessnotmyname: hi20:02
jcoufallsmola: I think we need a stable release of icehouse20:02
greghaynesSpamapS: Not sure if you saw my message yesterday - was having an issue with the software-config patch in that the db-password cfn property was inside of a deployments array, causing os-collect-config --key db-password to fail20:02
lsmolaslagle, yes for icehouse we have almost prepared a stable version of tuskar and tuskar-ui20:02
derekhlifeless: I gotta run, are you happy enough you know what needs to happen20:02
lifelessnotmyname: sure, or #openstack-dev20:02
jcoufalslagle: there should be20:02
lifelessderekh: I believe I do - mass deploy the add of $INTERFACE and fixup the broker20:02
derekhlifeless: yup,20:02
lifelessderekh: is there a review up for that fix? Or should I add one?20:02
SpamapSgreghaynes: you have an old os-collect-config then. It will be exploded into a json per deployment20:03
jcoufalslagle: still fixing few stuff, but we will need one20:03
slaglelsmola: jcoufal jistr : ok, then i think it makes sense to create icehouse branches for tuskar, that will continue to work with icehouse branches from other projects20:03
greghaynesaha20:03
jistrslagle: +120:03
*** rbrady has quit IRC20:03
jcoufalslagle: +120:03
derekhlifeless: that was a short-term fix; Ng, SpamapS and dprince were talking about what the long-term fix should look like,20:03
jcoufaltha make sense20:03
lsmolaslagle, ok, as it should be packed as tech preview, we will need that20:03
derekhNg: may have that exact fix up20:03
jcoufallsmola: that's not accurate20:04
lifelessderekh: I think thats the one we can and should commit through20:04
lifeless*though*20:04
derekhlifeless: ok20:04
slaglelsmola: yea, i believe expectations have been set :) no one is expecting perfection20:04
tchaypomatty_dubs: your doge link was acceptable20:04
derekhnotmyname: yes, I'll let lifeless fill you in, I gotta run, will follow up tomorrow if anything is needed20:04
matty_dubstchaypo: Yay! :)20:05
notmynamelifeless: derekh just told me in the meeting that swift broke things for you. I hadn't seen your email until I searched the list history20:05
lsmolaslagle, well we have working creating and deleting of stack, that should be enough :-)20:05
derekhok bye all20:05
lifelessnotmyname: ah - did you find it?20:05
*** derekh has quit IRC20:05
lifelessnotmyname: I don't think (and didn't intend to suggest) that swift did anything wrong20:05
notmynamelifeless: I did, but there isn't a lot of detail there20:05
notmynamelifeless: subject "[openstack-dev] [TripleO][CI] all overcloud jobs failing"20:06
notmynamelifeless: I'm wondering if it was this commit https://github.com/openstack/swift/commit/c6cebb6e621a245c9c2d5bff0df59689b014037320:07
lifelessnotmyname: yes that was it20:07
notmynamelifeless: swift patch or email?20:07
lifelessnotmyname: it changes the mode from world readable to user readable only20:07
lifelessnotmyname: both :)20:07
*** CaptTofu has joined #tripleo20:08
lsmolaslagle, we can chat about more details tomorrow, I need to go now :-)20:09
lsmolagood night everybody20:09
*** rpodolyaka1 has quit IRC20:09
slaglelsmola: night20:10
jprovaznSpamapS, to follow next HA steps you mentioned: when a resource plugin which tells server it's being rebooted/deleted is done, what are next steps? you can do then a graceful shutdown - migrate services somewhere else. But this all sounds like a different way than having all services in HA mode - then you don't have to care about graceful shutdown. Or am I missing something?20:12
*** akuznetsov has quit IRC20:13
greghaynes:) I had basically the same question almost typed out in my irc buffer20:13
SpamapShah20:13
jprovazn:) great20:13
SpamapSso the reason this is needed is for the update case20:14
greghaynesTheres also the bit about deferring to the services on state of the cluster / keeping quorum during upgrade20:14
SpamapSit is as much about making Heat wait for a signal back that the node is o-k to take down as it is about quiescing.20:14
*** CaptTofu has quit IRC20:14
openstackgerritlifeless proposed a change to openstack/diskimage-builder: Fix dhcp-all-interfaces upstart job  https://review.openstack.org/8453920:14
lifelessSpamapS: slagle: ^20:14
SpamapSfor nova-compute, this is evacuate..20:14
*** CLOUDOUTAGE has joined #tripleo20:14
CLOUDOUTAGElifeless devananda Ng SpamapS jog0 GheRivero derekh dprince slagle  -- ci-overcloud currently down https://etherpad.openstack.org/p/cloud-outage20:14
*** CLOUDOUTAGE has quit IRC20:14
*** CaptTofu has joined #tripleo20:15
SpamapSfor rabbit and galera, we want to delay any downtime if it would break quorum.20:15
tchaypothat's marvellously spammy - and in weechat, it's extremely colorful too20:15
* tchaypo likes20:15
*** e0ne has quit IRC20:16
greghaynesI really love that the username for CLOUDOUTAGE ends up being cloudouta20:17
*** e0ne has joined #tripleo20:17
notmynamelifeless: so I take it from the tripleo patch that you're running swift as root?20:18
jprovaznSpamapS, I see, though for controller nodes it seems to me that you need all services in HA mode at first to have them in cluster, then you can take care of not breaking quorum20:18
SpamapSnotmyname: we're running the ring creation tools as root20:18
notmynameSpamapS: ok. why? (not attacking, just curious)20:18
lifelessnotmyname: we run the rest of swift as swift20:18
greghaynesSpamapS: So the resource you were describing would be what is responsible for asking the service if it's ok to remove that node from the cluster, and then reporting back to heat whether or not to proceed?20:19
SpamapSnotmyname: https://git.openstack.org/cgit/openstack/tripleo-image-elements/tree/elements/swift/os-refresh-config/configure.d/73-swift runs as root20:19
lifelessnotmyname: but config changes need root access if only to be able to sudo to the right users :)20:19
jprovaznSpamapS, IOW adding HA support for heat-engine, neutron would be prerequisities to run >1 controller node20:19
*** CaptTofu has quit IRC20:19
SpamapSjprovazn: yes, but all things in HA mode are already in the review pipeline, right?20:19
SpamapSjprovazn: heat-engine 'just works'20:19
greghayneshttps://etherpad.openstack.org/p/tripleo-icehouse-ha-production-configuration is relevant etherpad20:19
SpamapSjprovazn: neutron-l3-agent is the only one that won't really be HA IIRC20:19
SpamapSgreghaynes: correct20:20
jprovaznSpamapS, heat-engine is merged - lifeless was not sure about its stability (when we were talking about it in Sunnyvale), but I would give it a shot20:20
SpamapSheat-engine works multi-node20:21
lifeless(heat-engine) as long as you don't reboot it20:21
SpamapSlifeless: that's a MASSIVE architectural problem in Heat.20:21
SpamapSwe're not solving that.20:21
SpamapSWe need to, but not in our HA work.20:21
jprovaznSpamapS, then another thing is adding backend for cinder (gluster or ceph)20:21
lifelessSpamapS: Oh, I know :)20:22
SpamapSjprovazn: yes, not in the next step category though.20:22
lifelesswell soon20:22
SpamapSIn the important steps, yes.20:22
lifelesslet's get HA control plane bedded down20:22
lifelessthen have a discussion about what the next step should be20:22
SpamapSBut cinder will keep working as-is right now for volumes that are not on downed nodes, will it not?20:22
lifelessrather than saying yes or no right now20:22
SpamapSright cinder control plane will be fine is what I'm trying ot say20:23
SpamapSto20:23
jprovaznSpamapS, ok, I see what you mean then, I just don't considered most of services done :)20:23
jprovazn(in queue)20:23
notmynamelifeless: just to be explicit, you aren't waiting on swift for anything, right?20:23
lifelessnotmyname: nope - I would have screamed loudly if we thought that :)20:23
notmynamelifeless: ok, thanks20:24
*** notmyname has left #tripleo20:24
lifelessnotmyname: what you can do, if you're interested, is run 'check experimental' on your patches20:24
greghaynesI'm curious how the control flow for the new heat resource will go: does heat pick a node, the resource asks the node if it's ok, repeat until it's ok or fail? Or are there plans for asking down to the app level on which node would be best to update at a given point?20:24
SpamapSgreghaynes: in our current templates, Heat is going to pick all the nodes, and try to rebuild them all at once. I am a little worried about that actually. :-P20:25
greghaynesoh joy!20:25
greghaynesWCPGW20:25
*** julim has quit IRC20:25
greghaynesI think the let heat pick a node at random and beg it to become ok to upgrade in the near future is suboptimal but might be a good mvp...20:27
*** eguz has quit IRC20:27
*** eghobo has joined #tripleo20:28
greghaynesbut this is my super naive at heat idea :)20:28
SpamapSgreghaynes: there is a rolling upgrade story already in Heat..20:30
greghaynesah20:31
SpamapSbut we have issues using heat's scaling groups20:31
SpamapSI think we'll get there with Heat.20:31
SpamapSbut .. so much to do.. :-P20:31
ccrouch(3:23:11 PM) jprovazn: SpamapS, ok, I see what you mean then, I just don't considered most of services done :)20:31
ccrouchso which ones do you think are still pending?20:32
SpamapSI am not keeping track of that :-/20:32
SpamapSoy.. nor am I tracking food intake.. such hungry.. very food .. so eat..20:32
jprovaznSpamapS, ccrouch: well, some "smaller" tasks from here: https://etherpad.openstack.org/p/tripleo-icehouse-ha-production-configuration line 6620:33
lifelessSpamapS: go eat something then20:33
greghaynesmuch lunch20:33
lifelessSpamapS: you're no use to anyone with low brain energy20:34
lifelesswow signalling heat waitcondition completion is slow sometimes - 20seconds20:34
greghaynesjprovazn: Yep, I think we also tried to make trello cards for all the not done things in there20:34
lifelessok 10 hypervisors up20:36
jprovazngreghaynes, well, it might be that 2) and 5) are part of an existing card20:36
jprovazngreghaynes, others should be covered20:36
lifelessresetting state on ERRORd VMs20:36
*** jcoufal has quit IRC20:37
greghaynesjprovazn: By others should be covered you mean they all have trello cards?20:37
*** jistr has quit IRC20:37
jprovazngreghaynes, I mean they are part of an existing trello card20:37
greghaynesok. So do you know of any todos for it that you don't see in trello?20:38
* greghaynes puts note in etherpad that these should all be in trello20:38
jprovazngreghaynes, lifeless: this card is not clear to me - https://trello.com/c/DaIs1zxb/82-neutron-ha-redundant-environment - it mentions 2 different approaches:20:39
jprovaznl3 agent isn't currently able to run active-active for a network, so we either need to migrate networks on router quiesceing or run active-passive neutron agents (which as agents have uuids is possibly non-trivial)20:39
lifelessjprovazn: right, the card is the problem not the solution20:40
lifelessjprovazn: There is a third approach20:40
jprovaznlifeless, yes20:40
jprovaznthe new patch20:40
lifelessjprovazn: yeah20:40
jprovaznlifeless, so I would prefer a/a or a/p20:41
jprovaznbut not network migrations - it seems like waste of time20:41
lifelessjprovazn: all the options are a/a or a/p :P20:41
lifelessjprovazn: network migrations are a/p20:41
jprovaznlifeless, but graceful, right?20:41
lifelessjprovazn: yes, unless we're migrating because of a failure20:41
lifelessjprovazn: in which case they restore service20:41
jprovaznlifeless, so I understand it that we would do some special script which migrates networks when the resource plugin (mentioned by SpamapS before) is done20:42
lifelessjprovazn: or when pacemaker is failing a node over20:44
*** CLOUDOUTAGE has joined #tripleo20:46
CLOUDOUTAGElifeless devananda Ng SpamapS jog0 GheRivero derekh dprince slagle  -- ci-overcloud currently down https://etherpad.openstack.org/p/cloud-outage20:46
*** CLOUDOUTAGE has quit IRC20:46
jprovaznlifeless, hm, I would expect that pacemaker (+maybe drbd if needed) would do this for us - IOW with pacemaker we wouldn't have to do any migration script20:46
lifelessjprovazn: with pacemaker we need to manually override the uuid of the agent20:46
lifelessjprovazn: have you looked at the a/a status? having that would be most cool ....20:47
*** CaptTofu has joined #tripleo20:47
jprovaznlifeless, and this is the point where my neutron knowledge ends up (we need to manually override the uuid of the agent)20:47
lifelessjprovazn: ok so20:47
lifelessjprovazn: each neutron agent has a uuid20:47
jprovaznlifeless, I think it's still not merged20:47
jprovazn(the patch)20:47
*** CaptTofu has quit IRC20:48
*** CaptTofu_ has joined #tripleo20:48
lifelessjprovazn: if you run 3 l3 agents20:48
lifelessand create three networks20:48
lifelessthey will end up one per agent (probably)20:48
lifelessso networks are partitioned over agents20:48
lifelessif an agent fails20:49
lifelessand you want to start it up elsewhere20:49
lifelessyou have to override the agent id (basically hostname on the rabbit bus)20:49
lifelessor it won't run the networks it's meant to - their router will remain down20:49
*** sballe has joined #tripleo20:49
*** CaptTofu_ has quit IRC20:49
jprovaznlifeless, I see - though this should not be so difficult, RHOS folks already do this with a/p with pacemaker setup20:50
lifelessjprovazn: righto - so we'd need to import their logic20:50
lifelessjprovazn: the nice thing about the migration approach though is that we get some natural redundancy because it's a/a20:51
lifelessit's just not a/a for any *single* network20:51
dprincelifeless: hi, when you deploy a new overcloud where do you run devtest_overcloud from?20:51
lifelessdprince: me @ home? me on the HP undercloud box for cd-overcloud? me on the HP undercloud box for ci-overcloud ?20:52
dprincelifeless: was asking some questions earlier (to understand things... not question them)20:52
lifelessjprovazn: so markmc said us running the unmerged patch might be a good thing to help it merge20:53
dprincelifeless: on the HP cloud specifically, just trying to understand how we might keep our old images around... just in case20:53
markmclifeless, markmcclain ?20:53
lifelessdprince: deploying ci-overcloud?20:53
jprovaznlifeless, yes - we can at least test it20:53
lifelessbah20:53
*** CaptTofu_ has joined #tripleo20:53
lifelessjprovazn: markmcclain :P20:53
lifelessmarkmc: EFAIL :(20:53
dprincelifeless: Sure CI. that is a good start I think20:54
markmclifeless, yes, markmcclain did a tragically bad job of choosing an IRC nick20:54
lifelessdprince: well point is the answer is different :P20:54
lifelessmarkmc: and a name, right?20:54
lifelessdprince: anyhow, I normally sit in /opt/stack/tripleo-image-elements/tripleo-cd and run ./deploy-ci-overcloud20:54
markmclifeless, it's actually quite a nice name :)20:54
dprincelifeless: On the undercloud box?20:55
lifelessdprince: yes, ssh to heat-admin@cd-undercloud.tripleo.org; sudo su -; source undercloud creds20:55
dprincelifeless: gotcha. On the RH side I usually do everything from the seed host.20:56
*** jdob has quit IRC20:57
lifelessdprince: I find doing stuff from baremetal is faster :)20:57
dprincelifeless: And at this point I'm always keeping images at this point though. In fact I typically apply fixes by editing images so as not to break things.20:57
dprincelifeless: I said seed host (it is baremetal)20:57
lifelessdprince: oh right20:58
lifelessdprince: so yeah, the seed host has no tools etc on it on the hp cloud20:58
*** blamar has quit IRC20:58
lifelessdprince: whereas the undercloud comes pre loaded with a bunch of stuff20:58
lifelessdprince: but we should document these differences20:58
dprincelifeless: just wondering if we might adopt a common place to backup images.20:58
lifeless(or standardise)20:58
dprincelifeless: because we use load-images -d (and that doesn't keep things around in Glance)20:59
lifelessdprince: keep em in glance? we could change our scripts ..20:59
dprincelifeless: or... perhaps we need a new glance strategy20:59
dprincelifeless: either way is fine, we'll need to purge Glance from time to time...20:59
lifeless--keep-5 ?20:59
lifeless--keep=5 ? I meant20:59
dprincelifeless: but having them backed up, renamed is going to ward off frustrations I think21:00
*** untriaged-bot has joined #tripleo21:00
untriaged-botUntriaged bugs so far:21:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/129048821:00
*** untriaged-bot has quit IRC21:00
uvirtbotLaunchpad bug 1290488 in tripleo "Baremetal: Invalid credentials" [Undecided,Incomplete]21:00
*** blamar has joined #tripleo21:00
dprincelifeless: okay, I'd be fine with a --keep option (or something). Will file a ticket on it and ponder it because I think this would be super useful and perhaps take the pressure off sometimes.21:01
lifelessack21:01
openstackgerritBen Nemec proposed a change to openstack/diskimage-builder: Add unit test for cache-url  https://review.openstack.org/8449321:02
jprovaznlifeless, with network migrations strategy for l3 agent, in case of failure of one of nodes - networks provided by l3 agent on this node will not be available until heat spawns new instance of the node (I suppose that any l3-agent state would be restored from /mnt/state)?21:03
lifelessjprovazn: no the agent is stateless21:03
lifelessjprovazn: with the migration strategy, we'd have pacemaker detect node down and trigger a rebalance which would divide the networks assigned to that agent to the other agents21:03
jprovaznlifeless, ok, so until new instance with same uuid boots up21:03
lifelessno21:03
lifelessgimme a sec21:04
lifelesstrying to get CI back as a priority - 10 minutes?21:04
jprovaznsure21:04
*** blamar has quit IRC21:04
jprovaznlifeless, ok, I think I get it from your last comment21:06
*** blamar has joined #tripleo21:06
lifelessoh wow, nova --nic net-id and port combined gets the wrong order21:06
lifelessgnar21:06
*** CaptTofu_ has quit IRC21:07
*** CaptTofu has joined #tripleo21:08
*** lblanchard has quit IRC21:10
meenaI have tempest included in the overcloud not-compute. I see tempest in two locations 1.  under /opt/stack/venvs/ which contains only bin, local, share, lib and include directories. 2. under /opt/stack/tempest directories, which contains all the test source  config files.  When I run run_tempest from /opt/stack/venvs/tempest, does it pick up the tempest.conf from /opt/stack/tempests ?21:12
*** CaptTofu has quit IRC21:12
lifelessmeena: no21:12
lifelesstempest.conf will be written by o-a-c to /etc/tempest/21:13
jp_at_hplifeless: Can I get eyes on an early change please? https://review.openstack.org/84549  I want to see if there is anything that would be fundamentally unacceptable in it, or any other things you want to suggest.21:13
*** blamar has quit IRC21:13
lifelessjp_at_hp: queued up21:13
*** blamar has joined #tripleo21:14
*** dprince has quit IRC21:14
jp_at_hplifeless: ta, I'm heading offline now, but will look forward to comments first thing my morning.  It is supposed to be moving towards https://etherpad.openstack.org/p/tripleo-incubator-rationalise-ui and https://etherpad.openstack.org/p/tripleo-devtest.sh-refactoring-blueprint.  Thanks again.21:15
*** CLOUDOUTAGE has joined #tripleo21:17
CLOUDOUTAGElifeless devananda Ng SpamapS jog0 GheRivero derekh dprince slagle  -- ci-overcloud currently down https://etherpad.openstack.org/p/cloud-outage21:17
*** CLOUDOUTAGE has quit IRC21:17
*** rpodolyaka1 has joined #tripleo21:17
*** blamar has quit IRC21:18
*** blamar has joined #tripleo21:18
*** e0ne has quit IRC21:18
*** jp_at_hp has quit IRC21:19
*** jprovazn has quit IRC21:20
*** blamar_ has joined #tripleo21:20
*** rpodolyaka1 has quit IRC21:22
*** blamar has quit IRC21:23
meenalifeless: how do I run tempest from /opt/stack/venvs/tempest?21:25
*** blamar has joined #tripleo21:25
*** blamar_ has quit IRC21:25
lifelessmeena: /opt/stack/venvs/tempest/bin/tempest, I presume21:26
lifelessandreaf: you were working on tempest in CI - which strategy are you pursuing21:26
lifeless?21:26
openstackgerritTrent Geerdes proposed a change to openstack/tripleo-incubator: Adding horizon element to undercloud extras  https://review.openstack.org/8418921:29
*** markmc has quit IRC21:32
SpamapSok now my brain is back in working order21:35
openstackgerritClint "SpamapS" Byrum proposed a change to openstack/tripleo-image-elements: Prevent network interface logging w/ ovs agent  https://review.openstack.org/8456121:41
*** fungi has joined #tripleo21:41
lifelessoh for the love of..21:42
lifelesshttps://review.openstack.org/#/c/84208/21:43
lifelessthat's why I'm having trouble21:43
jeblairlifeless: fyi ram quota accounting seen by nodepool seems wrong (not sure if that's expected behavior at this point in the firedrill)21:43
lifelessjeblair: thanks21:43
SpamapSperserved21:43
lifelessjeblair: it probably is, as nova doesn't like having hypervisors hang21:44
jeblairpicky picky21:44
SpamapSlifeless: so I see more up now, but 6,0,1,9 are all down?21:46
*** hashar has quit IRC21:46
*** CLOUDOUTAGE has joined #tripleo21:48
CLOUDOUTAGElifeless devananda Ng SpamapS jog0 GheRivero derekh dprince slagle  -- ci-overcloud currently down https://etherpad.openstack.org/p/cloud-outage21:48
*** CLOUDOUTAGE has quit IRC21:48
*** e0ne has joined #tripleo21:49
*** matty_dubs is now known as matty_dubs|gone21:51
* fungi loves the bot21:51
*** CaptTofu has joined #tripleo21:51
*** e0ne_ has joined #tripleo21:51
*** e0ne has quit IRC21:51
SpamapSlifeless: so is 84208 causing 2014-04-01 21:57:41.763 10126 WARNING nova.virt.libvirt.driver [req-14ef5f5c-a3dd-4de1-8520-1b3d3fa32437 d5af62d2183d431796d74c5bb119ec9f e01e473a9250498883955b80966a1e58] Timeout waiting for vif plugging callback for instance 56242d03-1aee-4673-bcc0-7a5df107393021:52
SpamapSlifeless: and if so, should we deploy a new nova-compute image?21:53
*** dguerri_ has joined #tripleo21:53
lifelessthat's a 30 second delay on booting everything21:53
*** sdake_ has joined #tripleo21:53
*** sdake_ has quit IRC21:53
*** sdake_ has joined #tripleo21:53
*** mtaylor has joined #tripleo21:53
lifelessit's because of the unfixed callback thing from neutron21:53
*** sdake_ is now known as sdake_121:53
*** jpeeler1 has joined #tripleo21:53
*** dguerri has quit IRC21:54
*** jpeeler has quit IRC21:54
*** openstackgerrit has quit IRC21:54
*** jtomasek has quit IRC21:54
*** mordred has quit IRC21:54
*** sdake has quit IRC21:54
*** jtomasek has joined #tripleo21:55
SpamapSlifeless: ok, anything I can do?21:55
lifelessI'm just trying to monkey this neutron usage fix in21:55
lifelesshopefully I can just do it in the control plane21:55
lifelessbut I'm not sure of that21:55
*** e0ne_ has quit IRC21:56
SpamapSah so deleting all the floatingips didn't fix it?21:56
lifelessSpamapS: not a floatingip issue22:01
lifelessSpamapS: I needed to get te-broker rebuilt with ip 192.168.1.122:01
SpamapSlifeless: mmm k22:01
lifelessSpamapS: but when I passed that in we hit the bug that the review above I listed fixes22:01
lifelessso eth0 was on 192.168.1.122:01
lifelessand eth1 was on 10.x22:01
lifeless-> no metadata22:02
lifeless-> fail22:02
SpamapSAH22:04
*** weshay has quit IRC22:04
SpamapSoh what tangled webs22:05
*** morazi has quit IRC22:05
*** hashar has joined #tripleo22:11
*** sballe has quit IRC22:15
*** rpodolyaka1 has joined #tripleo22:18
*** CLOUDOUTAGE has joined #tripleo22:19
CLOUDOUTAGElifeless devananda Ng SpamapS jog0 GheRivero derekh dprince slagle  -- ci-overcloud currently down https://etherpad.openstack.org/p/cloud-outage22:19
*** CLOUDOUTAGE has quit IRC22:19
*** jp_at_hp has joined #tripleo22:20
*** rpodolyaka1 has quit IRC22:22
*** al has quit IRC22:22
*** giulivo has quit IRC22:27
jb11211Is there any work being done to get the tuskar-ui into tripleo?22:28
*** giulivo has joined #tripleo22:29
*** jp_at_hp has quit IRC22:31
*** lsmola has quit IRC22:31
lifelessjb11211: what do you mean ?22:34
*** CaptTofu has quit IRC22:34
jb11211with a bit of work (tuskar has crappy python dependencies) I can get tuskar and horizon installed on the undercloud host. Are there plans to add tuskar-ui as an element which can also be added22:35
*** CaptTofu has joined #tripleo22:35
jb11211to the undercloud image22:35
lifelessjb11211: certainly, that would be good22:35
*** TravT has quit IRC22:36
jb11211lifeless: are you aware of any work that's already been done to this end or does it need to be started from scratch?22:38
lifelessjb11211: I'm not aware of any outstanding patch sets - but check the tripleo-image-elements review queue22:39
*** CaptTofu has quit IRC22:39
*** lucas-afk has quit IRC22:41
jb11211I'm also thinking trying to add support for integrating keystone with LDAP for the under and over cloud images22:42
jb11211it would need being able to add/change lines on the /etc/keystone/keystone.conf files. should I try and add that to the image elements or the heat templates?22:43
lifelessconfig changes typically require template changes in tripleo-image-elements and heat-template changes in tripleo-heat-templates22:45
lifelessthe former cover the serialisation the latter the modelling22:45
lifelesscurrent status of ci - broker wasn't handing out te's properly / effectively - assuming te's have hung, am redeploying22:49
*** CLOUDOUTAGE has joined #tripleo22:50
CLOUDOUTAGElifeless devananda Ng SpamapS jog0 GheRivero derekh dprince slagle  -- ci-overcloud currently down https://etherpad.openstack.org/p/cloud-outage22:50
*** CLOUDOUTAGE has quit IRC22:50
*** e0ne has joined #tripleo22:52
lifeless./win 5122:53
*** al has joined #tripleo22:53
*** e0ne has quit IRC22:57
ccrouch(5:38:58 PM) jb11211: lifeless: are you aware of any work that's already been done to this end or does it need to be started from scratch?23:01
ccrouchthere has been some work done in this area under the "instack" umbrella23:01
ccrouchhttps://github.com/agroup/instack23:02
ccrouchand23:02
ccrouchhttps://github.com/agroup/instack-undercloud23:02
lifelessccrouch: huh23:02
lifelessI didn't know about that23:02
ccrouchwell you knew about instack right?23:03
ccrouchslagle: was meant to have talked about that at sunnyvale23:03
ccrouchmost of the stuff in instack-undercloud is just scripting to get the right packaged version of tripleo and tuskar installed23:04
*** michchap has quit IRC23:05
*** decede has quit IRC23:05
ccrouchthere is a very basic tuskar element, just there for expediency. The right solution is tuskar related elements in t-i-e, as jb11211 was mentioning23:05
*** decede has joined #tripleo23:05
ccrouchwe just haven't had the bandwidth to do that yet23:05
ccrouchthere is also a scripts to initialize the overcloud, which will go away when that feature is in tuskar (os-cloud-config ?)23:07
lifelessccrouch: yes - but why not use the current scripts in incubator  - so we don't have 3 copies of them ?23:08
*** yamahata has quit IRC23:08
lifelessccrouch: incubator -> os-cloud-config is the plan23:08
lifelessccrouch: if instack has a copy of them, we'll end up with a harder to track migration23:08
lifelessor I worry that we will anyway23:08
* SpamapS has no idea what instack is23:09
ccrouchlifeless: we use all the incubator scripts23:10
ccrouche.g. init-keystone23:10
ccrouchsetup-endpoints23:10
ccrouchsetup-neutron23:10
lifelessccrouch: ok; I must have misinterpreted what you said :)23:10
ccrouchno worries :-)23:11
ccrouchwe've tried hard not to reinvent stuff which was already done in incubator. As you say, useless duplication when those are available and being tested23:12
lifelesscool cool23:14
lifelessI read 12:07 < ccrouch> there is also a scripts to initialize the overcloud, which will go away when that feature is in tuskar (os-cloud-config ?)23:14
lifelessas meaning you had stuff that would be in os-cloud-config23:14
ccrouchSpamapS: apologies. sounds like slagle was too shy in sunnyvale, though i thought he'd mentioned this on a dev thread at some point23:14
ccrouchanyways instack is basically just a way to install t-i-e on an actual running machine, versus an image23:14
SpamapSccrouch: I sat with slagle the whole week. We were just trying to fix CI. ;)23:14
ccrouchdoh :-)23:15
ccrouchwell i'm sure he'd be happy to chat more about it23:15
ccrouchlifeless: ah ok, no just using the same stuff as incubator, which will eventually be replaced by os-cloud-config23:17
SpamapSccrouch: we've talked about it before. I've never much liked the idea, as I think it will bleed upward into the elements as overly complex install scripts. However, I understand why it is desired.23:18
*** rpodolyaka1 has joined #tripleo23:18
*** CLOUDOUTAGE has joined #tripleo23:21
CLOUDOUTAGElifeless devananda Ng SpamapS jog0 GheRivero derekh dprince slagle  -- ci-overcloud currently down https://etherpad.openstack.org/p/cloud-outage23:21
*** CLOUDOUTAGE has quit IRC23:21
*** rpodolyaka1 has quit IRC23:21
*** michchap has joined #tripleo23:21
ccrouchSpamapS: so far so good i think. Besides supporting the package based install of elements, which is orthogonal, I don't think we've had to change the elements to support this specifically. though slagle can keep me honest there23:22
lifelessSpamapS: I think 'run-once' stuff is ok, it's the 'evolve it' stuff I worry about bleeding23:23
*** derekh has joined #tripleo23:23
lifelessderekh: HI23:24
lifelessderekh: te's weren't responding23:24
derekhlifeless: hi ya23:24
derekhlifeless: oh, any idea why?23:25
lifelessderekh: so I think there's a bug in the te side agent or something, anyhow redeploying atm, 6 ACTIVE 4 spawning23:25
lifelessderekh: no23:25
derekhlifeless: ok23:25
lifelessderekh: but I imagine killing the broker and then making a new one could well tickle it23:25
derekhlifeless: ok, sounds like that's something we've done before when broker used to fall over but maybe it's a slightly different case23:26
SpamapSlifeless: one thing that I think we've learned, is that it is really hard to keep image build times fast23:27
SpamapSlifeless: still necessary, but when they slip.. users get impatient and want to do things in place.23:27
lifelessSpamapS: yeah23:28
lifelessok te's registering now23:28
lifelesswell machines atm23:28
lifelessok te's registered23:28
lifelessderekh: I think the te's weren't registered with the broker, I was misreading the status output23:29
lifelessyes - https://jenkins07.openstack.org/job/check-tripleo-undercloud-precise/373/console is away23:29
derekhwoot23:30
lifelesslunch then some reviews23:31
lifelessgreghaynes: what do you need?23:31
greghaynesall knowledge of heat via instant osmosis23:35
*** stevehuang has quit IRC23:36
greghaynesI did just push two patches, one which depends on a third, that could use reviewing23:36
greghayneshttps://review.openstack.org/#/c/83675/ is really the one that needs reviewing23:38
slaglelifeless: instack is just the "clever" name i gave the scripted undercloud installer we talked about in sunnyvale23:40
lifelessslagle: ack23:40
slaglei didn't spend enough time coming up with a cool name that starts with M*.23:40
lifelessmystack?23:40
lifeless:P23:40
slaglecould be :)23:40
slaglejb11211: but, no, we don't yet have elements for tuskar23:41
SpamapSslagle: Matzo23:41
lifelessok 40 test environments up and running23:41
lifelessSpamapS: Balls!23:41
SpamapSftw23:41
greghayneslifeless: I could probably use some advice on what to poke at next - I was thinking I'd get the software-config patch working for me (I think I have a mirror-related issue with it) and then try to get merge.py able to do CONTROLSCALE > 1... not sure if that's the best route to victory though?23:43
greghaynesAlternate idea was trying to figure out enough heat to work on the resource SpamapS was mentioning earlier23:45
greghayneshrm, I think openstack gerrit bot is MIA23:46
SpamapSgreghaynes: Testing the software-config patch and getting CONTROLSCALE > 1 would be very helpful23:46
greghaynessweet23:48
*** hashar has quit IRC23:50
*** e0ne has joined #tripleo23:52
*** xuhaiwei has joined #tripleo23:54
*** e0ne has quit IRC23:55

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!