*** casanch1_ has quit IRC | 00:01 | |
*** julim has quit IRC | 00:05 | |
*** dkehn_ has joined #tripleo | 00:18 | |
*** dkehn_ is now known as dkehnx | 00:19 | |
BadCub | greghaynes: Hi Greg | 00:21 |
BadCub | lifeless: thanks for the intro :-) | 00:22 |
* BadCub needs to ponder dinner and excedrin | 00:24 | |
*** matsuhashi has joined #tripleo | 00:27 | |
*** nosnos has joined #tripleo | 00:32 | |
*** yamahata has joined #tripleo | 00:33 | |
*** noslzzp has joined #tripleo | 00:41 | |
lifeless | StevenK: coverage - did you see? try setup.py test --coverage --coverage-package-name ... | 00:43 |
lifeless | StevenK: we'll get the testr alias added in a new release but that ^ should work right now | 00:43 |
lifeless | SpamapS: ^ | 00:44 |
lifeless | -> C's gym class | 00:44 |
SpamapS | lifeless: fix all the things. :) | 00:49 |
*** saurabhs has quit IRC | 00:57 | |
*** CaptTofu_ has quit IRC | 01:01 | |
*** CaptTofu_ has joined #tripleo | 01:02 | |
*** CaptTofu_ has quit IRC | 01:06 | |
tchaypo | So I contacted support for my keyboard a few days ago to ask if there was a firmware upgrade I could apply to try to make it not reset itself a couple of times a day | 01:21 |
tchaypo | after checking to see if it might just be low on charge they've very swiftly moved on to "it seems that it's not charging correctly, I've put you in touch with our RMA department so we can organise a replacement for you" | 01:22 |
tchaypo | that's more than I expected | 01:22 |
*** nati_ueno has quit IRC | 01:30 | |
mordred | lifeless: your patch still confuses me | 01:32 |
tchaypo | lolololol | 01:37 |
tchaypo | I've just realised that the review time stats refer to the median, the 3rd quartile... and the 1rd quartile. | 01:38 |
openstackgerrit | Steve Kowalik proposed a change to openstack/os-cloud-config: Wrap register-nodes CLI in try/except https://review.openstack.org/99530 | 01:41 |
*** rwsu has quit IRC | 01:42 | |
tchaypo | StevenK: are you just making our review backlog worse? | 01:43 |
StevenK | Sorry | 01:44 |
StevenK | Shall I stop coding? :-P | 01:44 |
tchaypo | don't stop coding | 01:48 |
tchaypo | just start reviewing | 01:48 |
vinsh | greghaynes, soooo I posted a comment in: https://review.openstack.org/#/c/61376/ | 01:58 |
vinsh | my changes to listen to glance-registry on localIP work. | 01:59 |
vinsh | Will have to see what Jan thinks. | 01:59 |
lifeless | mordred: ok, which one? | 02:14 |
*** noslzzp has quit IRC | 02:18 | |
*** CaptTofu_ has joined #tripleo | 02:23 | |
*** CaptTofu_ has quit IRC | 02:28 | |
*** eghobo has quit IRC | 02:36 | |
*** CaptTofu_ has joined #tripleo | 02:38 | |
openstackgerrit | Clint 'SpamapS' Byrum proposed a change to openstack/os-collect-config: Make heat the default collection method https://review.openstack.org/94527 | 02:46 |
openstackgerrit | Clint 'SpamapS' Byrum proposed a change to openstack/os-collect-config: Cache auth_ref from keystoneclient https://review.openstack.org/94526 | 02:46 |
openstackgerrit | Clint 'SpamapS' Byrum proposed a change to openstack/os-collect-config: Add dogpile cache to keystone abstraction layer https://review.openstack.org/94524 | 02:46 |
openstackgerrit | Clint 'SpamapS' Byrum proposed a change to openstack/os-collect-config: Split keystone away from heat collector https://review.openstack.org/94523 | 02:46 |
*** ramishra has joined #tripleo | 02:56 | |
*** noslzzp has joined #tripleo | 02:58 | |
*** untriaged-bot has joined #tripleo | 03:00 | |
untriaged-bot | No untriaged bugs so far! \o/ | 03:00 |
*** untriaged-bot has quit IRC | 03:00 | |
*** noslzzp has quit IRC | 03:03 | |
*** CaptTofu_ has quit IRC | 03:26 | |
*** CaptTofu_ has joined #tripleo | 03:27 | |
*** akuznetsov has joined #tripleo | 03:30 | |
*** CaptTofu_ has quit IRC | 03:31 | |
lifeless | tchaypo: StevenK: don't bank on it until we get an ack from cody, but I think we'll be set on tickets. | 03:33 |
*** ramishra has quit IRC | 03:33 | |
lifeless | marios: btw https://review.openstack.org/#/c/92509/2 is +2 ready I believe. | 03:34 |
lifeless | marios: note that because its a no-op rebase your -1 was sticky :( | 03:34 |
tchaypo | Thanks | 03:35 |
*** pcrews has quit IRC | 03:40 | |
lifeless | greghaynes: ok so whats the next blocker for getting a CI job with HA control plane (over or under) | 03:43 |
*** nosnos has quit IRC | 03:44 | |
greghaynes | I dont know of any other than patch reviews ATM. Have you been able to re-check the os-is-bootstrap-host review stack and the heat templating for galera cluster? | 03:46 |
lifeless | greghaynes: no, gimme numbers? | 03:46 |
*** eghobo has joined #tripleo | 03:47 | |
greghaynes | https://review.openstack.org/#/c/83883/ and https://review.openstack.org/#/c/93032/ + its dependencies | 03:47 |
lifeless | hmm | 03:48 |
greghaynes | and https://review.openstack.org/#/c/99196/ | 03:48 |
lifeless | I want to put the newest patch at the top in gertty | 03:48 |
*** ramishra has joined #tripleo | 03:49 | |
*** jml has quit IRC | 03:52 | |
*** jml has joined #tripleo | 03:53 | |
greghaynes | hrm, I should really update the commit msg on 83883 | 03:56 |
*** elynn_ has quit IRC | 03:57 | |
lifeless | greghaynes: the one I just +2d. Really ? :) | 03:57 |
greghaynes | heh, well that should carry over | 03:57 |
lifeless | we need SpamapS or some other +2 to land it ... | 03:58 |
greghaynes | the bit about it not solving scaling issues is no longer a problem | 03:58 |
lifeless | greghaynes: ah cool | 03:58 |
lifeless | the cluster address is cool with a trailing , ? | 03:59 |
greghaynes | I got a successful build with 3 control nodes, so I believe so | 04:00 |
lifeless | cool | 04:00 |
lifeless | ah, another gertty bug | 04:01 |
lifeless | please wait while I fix my tools. | 04:01 |
lifeless | once SpamapS is back from dinner we may get the other two patches landed | 04:01 |
greghaynes | w00t | 04:01 |
openstackgerrit | Steve Kowalik proposed a change to openstack/os-cloud-config: Check for relevant environment variables https://review.openstack.org/95396 | 04:03 |
*** akuznetsov has quit IRC | 04:09 | |
*** matsuhashi has quit IRC | 04:11 | |
*** lazy_prince has joined #tripleo | 04:13 | |
*** tzumainn has quit IRC | 04:18 | |
*** akuznetsov has joined #tripleo | 04:20 | |
lifeless | greghaynes: ok so lets talk 86435 | 04:20 |
lifeless | greghaynes: I must have managed to lose review comments somewhere | 04:21 |
lifeless | greghaynes: the test with || true is heinous. | 04:21 |
greghaynes | Yep. Got ideas for a less heinous way? | 04:22 |
greghaynes | I think it was mentioned maybe exit 0 in that case but echo 255 | 04:23 |
StevenK | tchaypo: Remember the docs question about os-cloud-config? | 04:23 |
lifeless | greghaynes: test $? | 04:23 |
StevenK | /home/steven/openstack/openstack/os-cloud-config/doc/source/index.rst:11: WARNING: toctree contains reference to document 'contributing' that doesn't have a title: no link will be generated | 04:23 |
greghaynes | lifeless: Would still have to || true due to set -e | 04:24 |
lifeless | oh I know where my comments went. I ratholed into man bash | 04:24 |
lifeless | greghaynes: set +e | 04:24 |
lifeless | os-is-bootstrap-host | 04:24 |
lifeless | val=$? | 04:24 |
lifeless | set -e | 04:24 |
greghaynes | hah, I guess the set -e doesnt really do much if youre || true'ing to get around it :p | 04:24 |
lifeless | its longer but obviously correct vs echoing which could come from anything | 04:24 |
lifeless | whereas the || true is obviously incorrect, until you read the source. | 04:25 |
greghaynes | yep, fair | 04:25 |
StevenK | You can't if ! os-is-bootstrap-host ; then ... ? | 04:25 |
StevenK | Which avoids the whole set +e rubbish | 04:26 |
greghaynes | StevenK: Theres 3 vals to check | 04:26 |
greghaynes | er, 3 possible values | 04:26 |
greghaynes | so need to either catch output or test exit val | 04:26 |
StevenK | greghaynes: if ! os-is-bootstrap-host will test the exit value | 04:27 |
lifeless | StevenK: its a three-value thing, one ok, and two different failure modes | 04:27 |
greghaynes | yep, for a boolean, but then youve lost the exit val | 04:27 |
lifeless | StevenK: is $? accessible after that expression? | 04:27 |
lifeless | greghaynes: possibly not | 04:28 |
lifeless | greghaynes: this is what I was digging into when I ran out of time the other day | 04:28 |
greghaynes | ah | 04:28 |
lifeless | greghaynes: if false; then true; else echo $?; fi | 04:28 |
lifeless | 1 | 04:28 |
lifeless | greghaynes: you'll want to test it obviously | 04:29 |
lifeless | greghaynes: while you're tweaking things, care to fixup the []'s and foo;bar's in that file? separate patch if you like but it was a bit hard to read | 04:30 |
lifeless | greghaynes: (specifically space between expression and ], and ] and ;, and ; and then. | 04:30 |
greghaynes | yep, np | 04:31 |
StevenK | You can only test once | 04:31 |
lifeless | yeah, you need to capture it, my example was an example | 04:31 |
StevenK | $? gets overridden by the first test | 04:31 |
StevenK | steven@undermined:~% ./foo.sh | 04:31 |
StevenK | + ./testscript | 04:31 |
StevenK | + '[' 255 -eq 1 ']' | 04:31 |
StevenK | + '[' 1 -eq 255 ']' | 04:31 |
*** CaptTofu_ has joined #tripleo | 04:31 | |
lifeless | StevenK: show the source luke | 04:31 |
StevenK | lifeless, greghaynes: http://paste.openstack.org/show/83743/ | 04:32 |
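StevenK's trace shows why testing `$?` twice fails: the first `[ ... ]` replaces `$?` with its own exit status, so the second comparison sees 1 instead of 255. A minimal sketch reproducing that and the capture-once fix; `fake_check` is a hypothetical stand-in for `os-is-bootstrap-host`, and the 255 exit code is taken from the trace rather than from the script's source:

```shell
#!/bin/bash

# Hypothetical stand-in for a command with several exit codes; always 255 here.
fake_check() { return 255; }

# Pitfall: each [ ... ] test overwrites $?. The first test compares 255 to 1,
# fails with status 1, and the elif then compares that 1 against 255.
naive() {
    fake_check
    if [ $? -eq 1 ]; then
        echo "saw 1"
    elif [ $? -eq 255 ]; then
        echo "saw 255"
    else
        echo "lost status"
    fi
}

# Fix: copy $? into a variable immediately, then test the copy as often as needed.
captured() {
    local rc
    fake_check
    rc=$?
    if [ "$rc" -eq 1 ]; then
        echo "saw 1"
    elif [ "$rc" -eq 255 ]; then
        echo "saw 255"
    else
        echo "lost status"
    fi
}

naive     # prints "lost status"
captured  # prints "saw 255"
```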
greghaynes | Just made a test that pretty much proved you can do it with test | 04:33 |
StevenK | \o/ | 04:33 |
greghaynes | so will fix that | 04:33 |
greghaynes | yep, did just that :) | 04:33 |
StevenK | +1 for lack of set +e | 04:33 |
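Under `set -e`, the status can be captured without `|| true` or a `set +e`/`set -e` pair by running the command as an `if` condition, in the spirit of lifeless's `if false; then true; else echo $?; fi` example. A minimal sketch; `fake_bootstrap_check` is a hypothetical stand-in, and the mapping of 0, 1 and 255 to one success and two failure modes is an assumption, not the documented behaviour of `os-is-bootstrap-host`:

```shell
#!/bin/bash
set -eu

# Hypothetical stand-in: exits with whatever code it is given.
fake_bootstrap_check() { return "$1"; }

classify() {
    local rc
    # A command used as an `if` condition does not trigger errexit, and in the
    # else branch $? still holds that command's exit status.
    if fake_bootstrap_check "$1"; then
        rc=0
    else
        rc=$?
    fi
    # Assumed meanings: one success code and two distinct failure modes.
    case "$rc" in
        0)   echo "ok" ;;
        1)   echo "fail-1" ;;
        255) echo "fail-255" ;;
        *)   echo "unexpected:$rc" ;;
    esac
}

classify 0
classify 1
classify 255
```

An equivalent shorter form is `rc=0; some_command || rc=$?`, since a command on the left of `||` does not trigger `set -e` either.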
tchaypo | yes, StevenK? | 04:35 |
StevenK | tchaypo: I pasted the warning a few lines after my prod | 04:35 |
*** CaptTofu_ has quit IRC | 04:35 | |
tchaypo | StevenK: yes, i remember that warning | 04:37 |
StevenK | tchaypo: You also couldn't get the docs to build locally | 04:37 |
tchaypo | yes. did you get that fixed? | 04:37 |
StevenK | It works for me | 04:38 |
StevenK | tchaypo: So, tox -epy27 ; . .tox/py27/bin/activate ; python setup.py build_sphinx | 04:38 |
tchaypo | nice and simple | 04:38 |
StevenK | I'm adding a docs venv in the patch to fix that warning | 04:39 |
* tchaypo looks forward to the review to add "tox -edocs" | 04:39 | |
cody-somerville | lifeless: StevenK: tchaypo: What are you waiting on ACK from me on? | 04:40 |
tchaypo | fwiw i fail on "tox -epy27" | 04:40 |
tchaypo | tox.ConfigError: ConfigError: substitution key 'posargs' not found | 04:40 |
StevenK | tchaypo: Pastebin the full output? | 04:40 |
StevenK | cody-somerville: From lifeless' mail about pycon au | 04:40 |
tchaypo | cody-somerville: confirmation that we have tickets to pycon-au | 04:40 |
tchaypo | StevenK: http://paste.openstack.org/show/83745/ | 04:42 |
tchaypo | i have vague memories that this is caused by a newer version of tox and can be fixed by downgrading, i think | 04:42 |
StevenK | % tox --version | 04:42 |
StevenK | 1.6.0 imported from /usr/lib/python2.7/dist-packages/tox/__init__.pyc | 04:43 |
StevenK | Yeah, are you running 1.7.0 ? | 04:43 |
tchaypo | silly me. I've just realised that the review stats don't just come from tripleo-incubator project, they're from all of our projects. | 04:47 |
*** matsuhashi has joined #tripleo | 04:48 | |
*** nosnos has joined #tripleo | 04:48 | |
tchaypo | yeah, 1.6.0 is working better | 04:49 |
lifeless | cody-somerville: (see corp email a couple days ago) | 04:50 |
lifeless | cody-somerville: I'm going to mail you shortly about the other stuff you asked me | 04:50 |
StevenK | Heh, python-tox is still 1.6.0 in utopic | 04:50 |
cody-somerville | I have like a two thousand unread e-mails :} | 04:50 |
openstackgerrit | A change was merged to openstack/tripleo-image-elements: Properly enabling and restarting snmpd https://review.openstack.org/95689 | 04:52 |
openstackgerrit | A change was merged to openstack/diskimage-builder: Parameterise PXE kernel and initrd selection https://review.openstack.org/83390 | 04:52 |
openstackgerrit | A change was merged to openstack/diskimage-builder: Tidy up SuSE kernel selection https://review.openstack.org/90748 | 04:52 |
lifeless | cody-somerville: ok so look for ones from me:) | 04:52 |
cody-somerville | :-D | 04:53 |
lifeless | theres less of those, I hope. | 04:53 |
openstackgerrit | OpenStack Proposal Bot proposed a change to openstack/diskimage-builder: Updated from global requirements https://review.openstack.org/99074 | 04:54 |
*** michchap_ has quit IRC | 04:55 | |
*** michchap has joined #tripleo | 04:55 | |
tchaypo | Okay, I've drafted http://paste.openstack.org/show/msSndC7ZypdvbssY8QRq/ as an email to send to the team as a reminder about needing to do reviews | 04:56 |
tchaypo | lifeless: StevenK: and anyone else who's around - feedback welcome before I hit send.. | 04:57 |
lifeless | greghaynes: -1 on 93032 | 04:57 |
lifeless | tchaypo: sec | 04:57 |
lifeless | tchaypo: looks great | 04:58 |
StevenK | "handy link" | 04:58 |
lifeless | you'll be review czar before you know it | 04:58 |
lifeless | bit.ly might be useful for the url | 04:58 |
StevenK | I agree it's a handy link, but it's still horrible | 04:58 |
lifeless | and or link to the wiki page if you didn't | 04:58 |
tchaypo | http://eavesdrop.openstack.org/meetings/tripleo/2014/tripleo.2014-06-11-07.01.html has way too much "tchaypo" | 04:59 |
openstackgerrit | OpenStack Proposal Bot proposed a change to openstack/os-cloud-config: Updated from global requirements https://review.openstack.org/93253 | 04:59 |
tchaypo | especially considering that most of it was just me parroting things other people said with #info prepended | 04:59 |
tchaypo | next time I'm going to encourage other people to use #info more liberally | 04:59 |
lifeless | tchaypo: I don't think they can | 04:59 |
lifeless | tchaypo: since they aren't running the meeting | 04:59 |
lifeless | tchaypo: IMBW | 05:00 |
lifeless | tchaypo: did the url you asked me to link get in there ? | 05:00 |
tchaypo | https://wiki.debian.org/MeetBot says that it's a "command for everyone" | 05:00 |
lifeless | greghaynes: reviewed the full set | 05:01 |
tchaypo | yep, your #link made it | 05:01 |
lifeless | greghaynes: where is the patch to change the default to 3-node control plane? | 05:01 |
lifeless | tchaypo: TIL | 05:01 |
tchaypo | but jcoufal's #agreed did not, as that's a chair-only command | 05:01 |
greghaynes | ooo I should make that patch | 05:02 |
* StevenK tries to undistract tchaypo | 05:02 | |
tchaypo | undistract me? | 05:03 |
StevenK | tchaypo: os-cloud-config docs warning | 05:03 |
tchaypo | i'm getting to it. as soon as i send this email. | 05:03 |
tchaypo | just to check - email goes to openstack-dev with [tripleo] in the subject, right? | 05:05 |
tchaypo | If there's a more specific list I'm either not on it or forgot about it | 05:06 |
lifeless | tchaypo: correct | 05:07 |
StevenK | tchaypo: Right, I've fixed the contributing thing, what was the change to turn warnings into errors? | 05:07 |
*** eghobo has quit IRC | 05:08 | |
*** eghobo has joined #tripleo | 05:09 | |
*** noslzzp has joined #tripleo | 05:09 | |
tchaypo | hrm | 05:12 |
tchaypo | https://review.openstack.org/#/c/97129/4/setup.cfg | 05:13 |
tchaypo | lifeless: the R1 work involves migrating from saucy to trusty, right? | 05:15 |
tchaypo | so updating the reame to say trusty would be correct? | 05:15 |
lifeless | which reame? | 05:16 |
tchaypo | *README.rst | 05:16 |
tchaypo | in tripleo-incubator | 05:16 |
lifeless | I think the trusty patch landed | 05:16 |
lifeless | and yes, we're running trusty in hp1 now | 05:16 |
openstackgerrit | James Polley proposed a change to openstack/tripleo-incubator: Add details of which OS releases are tested in CI https://review.openstack.org/92749 | 05:16 |
tchaypo | hrm. | 05:21 |
tchaypo | so I'm looking for old reviews, to follow the advice in my email | 05:21 |
tchaypo | and I run across https://review.openstack.org/#/c/93370/ | 05:22 |
*** rakesh_hs has joined #tripleo | 05:22 | |
*** noslzzp has quit IRC | 05:22 | |
lifeless | tchaypo: 97129 failed CI | 05:22 |
tchaypo | it's been a month since that got 2 +2s, but it's not going to get a +a until some indefinite future event. Is there anything we can do to hide that? Would marking it as WIP help? | 05:23 |
tchaypo | ? | 05:23 |
lifeless | tchaypo: indeed, done. | 05:24 |
tchaypo | lifeless: yes but it depends on 92749 and it needs a rebase; I'm waiting for 92749 to land before i rebase | 05:25 |
lifeless | tchaypo: why | 05:25 |
tchaypo | ubuntu has suddenly decided that alt-tab should follow the order of the icons down the panel rather than most-recently-used app | 05:25 |
lifeless | tchaypo: 'unity' has ... | 05:26 |
tchaypo | oh. | 05:27 |
tchaypo | my keyboard was in mac mode, so i was actually using super-tab not alt-tab | 05:28 |
tchaypo | lifeless: half of my reasons are invalid. the remaining reason is that I've already rebased it a few times | 05:29 |
tchaypo | and I'm lazy and don't want to do it again | 05:30 |
tchaypo | but actually it shouldn't be as much work as i think | 05:30 |
lifeless | so folk are going to spend neurons figuring out whats wrong for you | 05:30 |
lifeless | either WIP it as a signal | 05:30 |
lifeless | or fix it as a courtesy :) | 05:30 |
tchaypo | you win | 05:31 |
openstackgerrit | A change was merged to openstack/diskimage-builder: Debian: Support additional debootstrap arguments https://review.openstack.org/90535 | 05:31 |
openstackgerrit | James Polley proposed a change to openstack/tripleo-incubator: Clean up all outstanding ReST errors and warnings https://review.openstack.org/97129 | 05:31 |
tchaypo | i mean, you are correct. the project wins :) | 05:31 |
* tchaypo runs test locally | 05:32 | |
tchaypo | oh good, i can reproduce that locally now | 05:33 |
tchaypo | probably should have checked before the commit :( | 05:34 |
*** rbrady has quit IRC | 05:34 | |
*** akuznetsov has quit IRC | 05:40 | |
openstackgerrit | James Polley proposed a change to openstack/tripleo-incubator: Clean up all outstanding ReST errors and warnings https://review.openstack.org/97129 | 05:41 |
*** rdopieralski has joined #tripleo | 05:42 | |
openstackgerrit | Steve Kowalik proposed a change to openstack/os-cloud-config: More documentation fixes https://review.openstack.org/99570 | 05:53 |
*** lsmola has joined #tripleo | 05:54 | |
* tchaypo gives up | 05:57 | |
tchaypo | my wifi just isn't reliable this afternoon | 05:57 |
tchaypo | tune tio prepare for german class | 05:57 |
tchaypo | *time to | 05:57 |
*** akuznetsov has joined #tripleo | 06:11 | |
*** dshulyak_ has joined #tripleo | 06:12 | |
*** CaptTofu_ has joined #tripleo | 06:19 | |
*** CaptTofu_ has quit IRC | 06:24 | |
*** jprovazn has joined #tripleo | 06:24 | |
*** akuznetsov has quit IRC | 06:25 | |
openstackgerrit | Nikhil Manchanda proposed a change to openstack/diskimage-builder: Do not use DatasourceNone for precise cloud-init https://review.openstack.org/99505 | 06:42 |
openstackgerrit | Nikhil Manchanda proposed a change to openstack/diskimage-builder: Do not use DatasourceNone for precise cloud-init https://review.openstack.org/99505 | 06:44 |
*** akuznetsov has joined #tripleo | 06:48 | |
*** jtomasek has joined #tripleo | 06:51 | |
*** cody-somerville has quit IRC | 06:53 | |
* SpamapS returns from dinner and opens reviews | 06:58 | |
*** xuhaiwei has quit IRC | 07:01 | |
lifeless | \o/ | 07:02 |
SpamapS | lifeless: I read this, and it strikes me that we don't ever take into account disaster recovery | 07:02 |
*** akuznetsov has quit IRC | 07:02 | |
SpamapS | randomstrings are fine for things we can re-assert, but the cluster name is something we'll have to inherit if we're restoring the database. | 07:03 |
lifeless | SpamapS: wouldn't we be restoring the heat stack too then ? | 07:04 |
lifeless | SpamapS: and wouldn't restoring it restore the string ? | 07:05 |
*** cody-somerville has joined #tripleo | 07:05 | |
*** cody-somerville has joined #tripleo | 07:05 | |
SpamapS | lifeless: that's an interesting question | 07:06 |
SpamapS | lifeless: I think no, because we may be recovering from disaster in a separate data center. | 07:07 |
lifeless | SpamapS: let me rephrase | 07:07 |
SpamapS | lifeless: my thinking is simply that we should pass this in. | 07:07 |
SpamapS | instead of making it random | 07:07 |
lifeless | SpamapS: there doesn't seem like there is any reason that it shouldn't be owned by heat | 07:07 |
lifeless | SpamapS: restoring a heat stack with data heat generated seems like a heat problem, different data centre doesn't seem like an interesting distinction | 07:08 |
SpamapS | It is, because you'd be restoring addresses.. | 07:08 |
SpamapS | all kinds of things | 07:09 |
SpamapS | physical UUID's of servers | 07:09 |
SpamapS | thats assuming that you've somehow synced all those servers to the new DC | 07:09 |
lifeless | right, 'restore' is a complex thing to do to a cluster | 07:09 |
SpamapS | seems completely unlikely | 07:09 |
SpamapS | Right, so I'm saying, to keep it simple, we should parameterize anything that is immutable state rather than make it random. | 07:10 |
lifeless | I think you're saying that you expect to implement 'restore a stack' as 'deploy a new stack and do restores to servers within it' | 07:10 |
SpamapS | I also don't know why making it random would actually be a good idea. | 07:10 |
lifeless | hmmm | 07:10 |
SpamapS | Since we may want two stacks to share masters in a warm-standby DR scenario.. they'd need to share wsrep_cluster_name | 07:10 |
lifeless | I don't know where to pull on this one | 07:11 |
lifeless | obviously it can either be passed in | 07:11 |
lifeless | or random | 07:11 |
SpamapS | Let me back up from the doom and gloom. | 07:11 |
*** eguz has joined #tripleo | 07:11 | |
lifeless | but heat broke defaulting parameters to expressions | 07:11 |
SpamapS | This particular random string doesn't actually need to be random. It just needs to be identical among this cluster. That is all. | 07:11 |
lifeless | SpamapS: and unique, no ? | 07:11 |
SpamapS | no | 07:11 |
*** eguz has quit IRC | 07:11 | |
*** pblaho has joined #tripleo | 07:12 | |
SpamapS | lifeless: uniqueness _does_ protect against accidentally joining two clusters to each other that are not meant to be joined. | 07:12 |
lifeless | SpamapS: so, you know more about this than I | 07:12 |
lifeless | SpamapS: but there are many many DR things we don't support yet | 07:13 |
SpamapS | I think I want it to be stackname+resourcename | 07:13 |
lifeless | SpamapS: I think having HA >> not having HA | 07:13 |
SpamapS | but even stackname might be too rigid. | 07:13 |
*** jcoufal has joined #tripleo | 07:13 | |
lifeless | SpamapS: do we need to get this right, now ? | 07:14 |
SpamapS | lifeless: it even has a default value | 07:14 |
SpamapS | Default Value:my_wsrep_cluster | 07:14 |
*** eghobo has quit IRC | 07:15 | |
*** akuznetsov has joined #tripleo | 07:15 | |
lifeless | SpamapS: if we get the design wrong for this, AIUI, the consequence is that folk doing a restore may have to tweak the template they restore with | 07:15 |
SpamapS | lifeless: I'm trying to find out if it is something that ends up in the state stored on disk. If not, then I'm less concerned about long term ramifications as we can change it with just some downtime. | 07:15 |
SpamapS | If it is stored on disk, I think we _must_ think this through. | 07:16 |
lifeless | SpamapS: we can always bring up a new cluster, downtime copy, delete old cluster, no ? | 07:16 |
SpamapS | lifeless: that is what I'm looking at | 07:16 |
lifeless | SpamapS: or can one install of galera only participate in one cluster at a time ? | 07:16 |
SpamapS | lifeless: I don't think there should be pressure to push something this important through. | 07:16 |
SpamapS | lifeless: I'm not going to block it for a week. Just, let's think this through. | 07:17 |
lifeless | SpamapS: I'm just skeptical that we'll actually consider everything | 07:17 |
lifeless | SpamapS: my experience has been that we'll have to evolve it no matter what we decide today | 07:18 |
lifeless | SpamapS: so I'm not trying to say 'shove' | 07:18 |
lifeless | SpamapS: I'm trying say lets understand the consequences of getting it wrong | 07:19 |
lifeless | SpamapS: which I think is what you're looking at too, no? | 07:19 |
*** dshulyak_ has quit IRC | 07:21 | |
SpamapS | Ok, well I say -1 to this. It has no business being random. A good default value is {"Fn::Join": ["_", [ {Ref: "AWS::StackName"}, "controller" ] ] } | 07:22 |
greghaynes | SpamapS: Not sure what you mean by re-assert cluster name if we are restoring the database. Is the cluster name somehow tied with the data? | 07:23 |
lifeless | SpamapS: I don't follow your reasoning | 07:23 |
SpamapS | greghaynes: it appears it is not. | 07:23 |
greghaynes | Seems like if the whole cluster dies, its fine if we start it back up with a new name | 07:23 |
SpamapS | greghaynes: but I cannot find definitive assertions to say it does not stay with the on disk data | 07:23 |
lifeless | SpamapS: I understand you don't like it, but I don't understand the negative consequences you are concerned about | 07:23 |
SpamapS | greghaynes: What I can find are a lot of "this is just a safety precaution so you don't join two clusters that shouldn't be joined" | 07:24 |
*** e0ne has joined #tripleo | 07:24 | |
lifeless | SpamapS: we *expect* to be deploying (in CI) multiple stacks with the same name serially | 07:24 |
greghaynes | hrm | 07:24 |
lifeless | SpamapS: if we have a rogue node, having a predictable name is entirely likely to end up conflicting | 07:24 |
lifeless | SpamapS: AIUI your only concern is about DR? | 07:25 |
SpamapS | the same name, but a different heat, with the same network? | 07:25 |
lifeless | SpamapS: same heat, 20m later | 07:25 |
lifeless | SpamapS: do a stack-delete, ironic fails to power off the node | 07:25 |
SpamapS | Yeah I'm seeing that as a real possibility. | 07:25 |
SpamapS | we don't have a kvm process that we can definitely say "its dead" | 07:26 |
SpamapS | hrrmm | 07:26 |
lifeless | SpamapS: I can't tell if thats sarcasm or not; but we are systematically locking up BMCs at the moment | 07:26 |
SpamapS | No sarcasm this time. | 07:26 |
SpamapS | http://www.quickmeme.com/img/9f/9f08a434ecd88e718e22a3a1a0f1f198c369aa0835d64d11d033a89153e7ece4.jpg | 07:26 |
lifeless | SpamapS: Are there any other negative consequences other than having to have a way to re-inject the previous value in a new stack that is going to have data restored onto it ? | 07:27 |
SpamapS | Ok so I now have two opposing viewpoints, one of which is supported by the current implementation, and the other supported by conjecture and FUD from my brain. I release my hold. Let me read the third patch... | 07:27 |
lifeless | SpamapS: http://www.severalnines.com/blog/full-restore-mysql-galera-cluster-backup | 07:28 |
lifeless | SpamapS: does not mention the cluster name at all as a factor | 07:28 |
lifeless | SpamapS: for xtrabackup (vs dumps which are obviously not impacted) | 07:29 |
jprovazn | greghaynes: hi, https://review.openstack.org/#/c/99196/2/elements/mysql-common/os-config-applier/mnt/state/etc/mysql/conf.d/cluster.cnf - galera can deal with comma at the end of "gcomm://x.x.x.x,y.y.y.y," expression? | 07:29 |
lifeless | jprovazn: I asked that too; greg says yes :) | 07:29 |
SpamapS | lifeless: Right, and there's also garbd | 07:29 |
lifeless | SpamapS: but it might be oversight | 07:30 |
lifeless | mordred: around? happen to know? | 07:30 |
openstackgerrit | A change was merged to openstack-infra/tripleo-ci: Name logs tarball after the instance name https://review.openstack.org/94631 | 07:30 |
SpamapS | lifeless: which can be used to make a snapshot ... and then that is restorable without the cluster name set. | 07:30 |
lifeless | SpamapS: so its sounding more and more to me that the disk files don't embed the cluster id | 07:30 |
jprovazn | lifeless: interesting, IIRC in some *older* version of galera behavior of comma at the end was "start in standalone mode if joining to other nodes failed", but this might have changed already | 07:30 |
SpamapS | I'm quite certain if the backup was done w/ xtrabackup the restored data would not have wsrep_cluster_name in it | 07:30 |
greghaynes | jprovazn: Pretty sure, yes. More than one set of testing would be great though | 07:31 |
SpamapS | and garbd does indeed just result in a populated innodb database | 07:31 |
* jprovazn liked previous expression more :) | 07:31 | |
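jprovazn's question concerns an address of the form `gcomm://x.x.x.x,y.y.y.y,`. If such a list were assembled in a shell script rather than in the config template, joining it through `IFS` avoids the trailing comma altogether. A minimal sketch; the function names and the 192.0.2.x addresses (a documentation range) are hypothetical:

```shell
#!/bin/bash
set -eu

# "$*" joins the positional parameters using the first character of IFS,
# so no separator is emitted after the last element.
join_by_comma() {
    local IFS=,
    echo "$*"
}

# Build a Galera cluster address from a list of node addresses.
cluster_address() {
    echo "gcomm://$(join_by_comma "$@")"
}

cluster_address 192.0.2.10 192.0.2.11 192.0.2.12
# prints gcomm://192.0.2.10,192.0.2.11,192.0.2.12
```

Called with no arguments this yields a bare `gcomm://`, which Galera treats as a request to bootstrap a new cluster, so a real caller would want to reject an empty node list.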
SpamapS | I'm convinced now, that it is an ephemeral value | 07:31 |
SpamapS | just used for coordination "in the moment" | 07:31 |
*** jcoufal has quit IRC | 07:33 | |
SpamapS | greghaynes: https://review.openstack.org/#/c/83883/ <-- tiny commit message fix. I will +A without re-pass on tests. :) | 07:35 |
SpamapS | I left this one un-approved since lifeless had a -1 https://review.openstack.org/#/c/86435/ | 07:35 |
howleyt | lifeless: if you have a minute, have a question on https://review.openstack.org/#/c/81934/ | 07:35 |
*** mugsie has quit IRC | 07:36 | |
*** cody-somerville has quit IRC | 07:36 | |
openstackgerrit | Gregory Haynes proposed a change to openstack/tripleo-image-elements: Add os-is-bootstrap-host element and script https://review.openstack.org/86435 | 07:37 |
openstackgerrit | Gregory Haynes proposed a change to openstack/tripleo-heat-templates: Add galera clustering properties https://review.openstack.org/83883 | 07:39 |
openstackgerrit | Nikhil Manchanda proposed a change to openstack/diskimage-builder: Do not use DatasourceNone for precise cloud-init https://review.openstack.org/99505 | 07:39 |
*** jcoufal has joined #tripleo | 07:43 | |
*** e0ne has quit IRC | 07:44 | |
*** e0ne has joined #tripleo | 07:45 | |
openstackgerrit | Gonéri Le Bouder proposed a change to openstack/tripleo-image-elements: Drop some unnecessary lsb_release calls https://review.openstack.org/99161 | 07:48 |
*** e0ne has quit IRC | 07:49 | |
goneri | lifeless: do you have 5 minutes to speak about https://review.openstack.org/#/c/85272/ ? | 07:50 |
*** StevenK has quit IRC | 07:50 | |
*** cody-somerville has joined #tripleo | 07:50 | |
*** cody-somerville has joined #tripleo | 07:50 | |
lifeless | goneri: I will about 20 past | 07:50 |
lifeless | howleyt: I can in a little bit, sure | 07:51 |
*** jcoufal has quit IRC | 07:51 | |
*** StevenK has joined #tripleo | 07:51 | |
*** mrunge has joined #tripleo | 07:59 | |
*** jcoufal has joined #tripleo | 08:00 | |
*** jistr has joined #tripleo | 08:02 | |
openstackgerrit | Marios Andreou proposed a change to openstack/tripleo-image-elements: Refresh heat-cfntools element https://review.openstack.org/92036 | 08:02 |
openstackgerrit | Marios Andreou proposed a change to openstack/tripleo-image-elements: Prepare os-*-config for CI https://review.openstack.org/95403 | 08:02 |
marios | stevebaker: rebased ^^^ | 08:03 |
howleyt | lifeless: thanks | 08:04 |
*** derekh_ has joined #tripleo | 08:11 | |
*** lucasagomes has joined #tripleo | 08:12 | |
SpamapS | ok, sleepy time | 08:12 |
*** jcoufal has quit IRC | 08:14 | |
*** markmc has joined #tripleo | 08:19 | |
derekh_ | Today I would mostly like to merge | 08:20 |
derekh_ | Extract F20 log file from the journal - https://review.openstack.org/#/c/97947/ | 08:20 |
derekh_ | Increase our ci PIP timeout (again) - https://review.openstack.org/#/c/98232/ | 08:20 |
derekh_ | Small fix to decrease logs from wget - https://review.openstack.org/#/c/98331/ | 08:20 |
derekh_ | Delete keystone tokens on all distros - https://review.openstack.org/#/c/98196/ | 08:20 |
derekh_ | Log a warning if there is a delay getting a ci test env - https://review.openstack.org/#/c/99118/ | 08:20 |
openstackgerrit | A change was merged to openstack/tripleo-image-elements: Enable rsync daemon on swift-storage https://review.openstack.org/95173 | 08:25 |
*** pelix has joined #tripleo | 08:26 | |
*** giulivo has joined #tripleo | 08:28 | |
derekh_ | "Brocade OSS CI" commenting on tripleo-incubator patch? are we getting a 3rd party CI system ? https://review.openstack.org/#/c/92749/ | 08:34 |
derekh_ | lifeless: ^ | 08:34 |
lifeless | derekh_: they're probably testing the world | 08:35 |
lifeless | and they're reporting the wrong url | 08:35 |
derekh_ | yup | 08:35 |
lifeless | derekh_: ok hear you on those patches | 08:36 |
lifeless | derekh_: do they all pass CI ? | 08:36 |
derekh_ | lifeless: yup, | 08:37 |
openstackgerrit | A change was merged to openstack-infra/tripleo-ci: Write log file for each systemd unit https://review.openstack.org/97947 | 08:37 |
lifeless | 98232 needs an infra +A, ask there | 08:38 |
*** cody-somerville has quit IRC | 08:39 | |
derekh_ | lifeless: ? its in toci | 08:40 |
derekh_ | lifeless: added a comment to https://bugs.launchpad.net/openstack-ci/+bug/1328645 this morning, we are already extracting logs for ubuntu and stripping /var/log/upstart out of the pathname | 08:41 |
uvirtbot | Launchpad bug 1328645 in openstack-ci "ubuntu tripleo-ci jobs are not logstash indexable" [Undecided,In progress] | 08:41 |
derekh_ | lifeless: so we just need to handle /mnt... | 08:41 |
lifeless | derekh_: oh, it is too, oops. | 08:41 |
openstackgerrit | A change was merged to openstack-infra/tripleo-ci: Increase the pip default timeout to 60 https://review.openstack.org/98232 | 08:42 |
lifeless | derekh_: we need it to not be tarred up | 08:42 |
derekh_ | lifeless: its copied over then the logs are being untarred | 08:43 |
lifeless | derekh_: oh, ?! | 08:43 |
*** rlandy has joined #tripleo | 08:43 | |
*** jcoufal has joined #tripleo | 08:43 | |
derekh_ | see http://logs.openstack.org/95/99495/1/check-tripleo/check-tripleo-ironic-seed-precise/5924db9/logs/seed_logs/ | 08:43 |
derekh_ | let me find the patch | 08:43 |
lifeless | oh sweet | 08:43 |
derekh_ | lifeless: https://review.openstack.org/#/c/96308/ | 08:44 |
derekh_ | lifeless: my f20 patch was a follow on to provide systemd logs in the same way | 08:44 |
*** martyntaylor has joined #tripleo | 08:45 | |
openstackgerrit | A change was merged to openstack-infra/tripleo-ci: Switch to Mega progress reports https://review.openstack.org/98331 | 08:47 |
goneri | lifeless: for https://review.openstack.org/#/c/85272/, is it fine if I restore this change https://review.openstack.org/#/c/85272/14/lib/common-functions ? | 08:48 |
lifeless | howleyt: goneri: ok my phone meeting is over | 08:49 |
lifeless | let me review derekh's stuff | 08:49 |
lifeless | then i'm yours | 08:49 |
lifeless | derekh_: See the -infra logs from this morning my time, sdague and I had a heart to heart | 08:49 |
openstackgerrit | A change was merged to openstack/diskimage-builder: Add tar as an output type https://review.openstack.org/98360 | 08:50 |
derekh_ | lifeless: saw them already, so pcrews is going to look into the dashboard changes ? | 08:50 |
*** cody-somerville has joined #tripleo | 08:52 | |
openstackgerrit | A change was merged to openstack-infra/tripleo-ci: Log a warning if wait on a te worker is excessive https://review.openstack.org/99118 | 08:53 |
lifeless | derekh_: yes | 08:53 |
derekh_ | cool | 08:53 |
*** e0ne has joined #tripleo | 08:53 | |
derekh_ | lifeless: so we pretty much will have to wait for that to happen before getting comments from E-R | 08:56 |
lifeless | yes | 08:57 |
*** untriaged-bot has joined #tripleo | 09:00 | |
untriaged-bot | No untriaged bugs so far! \o/ | 09:00 |
*** untriaged-bot has quit IRC | 09:00 | |
lifeless | howleyt: goneri: ok, hit me | 09:05 |
* goneri hits lifeless | 09:05 | |
goneri | My understanding is that you don't think we can replace this check https://review.openstack.org/#/c/85272/14/lib/common-functions | 09:06 |
goneri | by the check on the meta-data done here https://review.openstack.org/#/c/85272/14/diskimage_builder/element_dependencies.py | 09:06 |
howleyt | lifeless: sorry, I meant I had replied with a question in that review - here it is: Ok, so you want to remove it from the heat template. If it is still left as configurable in the element, how do I make it default to True? Mustache will not differentiate between a variable being set to False or not being set at all. Is there a pattern for this in another element that you can point me to? | 09:07 |
howleyt | (you might need to check https://review.openstack.org/#/c/81934/ for context) | 09:08 |
goneri | so I see two options: either I keep the bash check and we do a similar test two times in a row, or I give up :) | 09:08 |
lifeless | goneri: thats right, they are totally different things | 09:08 |
lifeless | goneri: well, what bug are you fixing? | 09:08 |
goneri | none, the goal was to simplify the code | 09:09 |
lifeless | goneri: ok, so - I don't really hold a strong opinion about the metadata check, though it seems like it makes writing elements harder, so I'd really like to see a better reason for it | 09:11 |
lifeless | the 'did things work' check has to stay IMO | 09:11 |
*** ramishra has quit IRC | 09:11 | |
goneri | initially, I realized it was possible to include two incompatible root elements at the same time. The error was unclear and only started to happen after the debootstrap, so a long time after the beginning. | 09:12 |
openstackgerrit | Michael Kerrin proposed a change to openstack/tripleo-image-elements: Try and persist mac address for the bridge across reboots https://review.openstack.org/99604 | 09:12 |
goneri | so I started some patches to simplify this part of the code, that's the long story. | 09:13 |
lifeless | goneri: ok, so that might provide reason to do the metadata check; the issue with the functional check is that this makes sure the preconditions for the rest of the code are in good shape | 09:13 |
lifeless | goneri: specifically, what if debootstrap (for instance) fails silently | 09:14 |
goneri | lifeless: ok, we can keep the bash check as a sanity check | 09:14 |
goneri | I can adjust the error message this way: "Please include at least one distribution root element." → "Failed to deploy the root element." | 09:15 |
lifeless | sure | 09:16 |
lifeless | that makes sense to me | 09:16 |
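A rough Python sketch of the kind of "did things work" sanity check being kept here: after the root element step, verify the target tree is not effectively empty, so a silent debootstrap failure is caught early with the new error message. The function names, the ignored entries (`lost+found`, `tmp`) and checking a plain directory are assumptions for illustration; the real check is bash in `lib/common-functions`.

```python
import os
import sys


def root_element_deployed(target_dir, ignored=("lost+found", "tmp")):
    """Return True if target_dir contains anything besides the ignored entries.

    Hypothetical helper: an empty tree after the root element step suggests
    something (e.g. debootstrap) failed silently.
    """
    try:
        entries = os.listdir(target_dir)
    except FileNotFoundError:
        return False
    return any(name not in ignored for name in entries)


def check_root_element(target_dir):
    # Exit with the error message proposed in the discussion above.
    if not root_element_deployed(target_dir):
        sys.exit("Failed to deploy the root element.")
```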
goneri | ok then, let's go :) | 09:17 |
lifeless | howleyt: whats the review # for your patch adding keepalive_enabled to the elements? | 09:19 |
openstackgerrit | Gonéri Le Bouder proposed a change to openstack/diskimage-builder: fail at startup with no operating-system element https://review.openstack.org/85272 | 09:20 |
goneri | lifeless: ↑ | 09:21 |
lifeless | ooh unicode up arrow; fancy :) | 09:22 |
goneri | =) | 09:23 |
openstackgerrit | Cian O'Driscoll proposed a change to openstack/diskimage-builder: Explicitly name element enable-serial-console https://review.openstack.org/92805 | 09:26 |
openstackgerrit | Cian O'Driscoll proposed a change to openstack/diskimage-builder: Explicitly name element enable-serial-console https://review.openstack.org/92805 | 09:27 |
lifeless | howleyt: ? I'm heading to bed soon - wanted to look at the relevant in-instance code... | 09:28 |
goneri | lifeless: before you leave, if the change is fine for you, can you please put a +1 or +2? | 09:30 |
lifeless | goneri: I have | 09:30 |
goneri | lifeless: thanks, good night =) | 09:31 |
howleyt | lifeless: https://review.openstack.org/#/c/81934/ | 09:31 |
howleyt | and the corresponding element change: https://review.openstack.org/#/c/81920/ | 09:32 |
lifeless | howleyt: so, one thing you could do is have it be disable_keepalive | 09:34 |
lifeless | howleyt: if missing, leave it enabled, if set disable it | 09:35 |
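To illustrate why the inverted flag works: in Mustache a section `{{#key}}` renders only when the key is truthy, and an inverted section `{{^key}}` renders when the key is missing or falsy, so a template cannot tell "set to False" apart from "not set". The sketch below uses a tiny hypothetical regex-based renderer (not pystache) that only handles flat, non-nested sections, just to show the two behaviours; the template strings are made up.

```python
import re

# Matches a flat (non-nested) section: {{#name}}body{{/name}} or {{^name}}body{{/name}}
SECTION = re.compile(r"\{\{([#^])(\w+)\}\}(.*?)\{\{/\2\}\}", re.DOTALL)


def render_sections(template, context):
    """Render only Mustache-style boolean sections, ignoring everything else."""

    def replace(match):
        sigil, name, body = match.groups()
        # A missing key and a False/None/empty value are all simply "falsy".
        truthy = bool(context.get(name))
        show = truthy if sigil == "#" else not truthy
        return body if show else ""

    return SECTION.sub(replace, template)


# Opt-in flag: unset and explicit False render identically, so there is no
# way to make "enabled" the default.
OPT_IN = "{{#keepalive_enabled}}keepalive on{{/keepalive_enabled}}"

# Opt-out flag: unset means enabled; setting it disables keepalive.
OPT_OUT = "{{^disable_keepalive}}keepalive on{{/disable_keepalive}}"
```

With `OPT_OUT`, an empty context renders `keepalive on` and `{"disable_keepalive": True}` renders nothing, which is the "if missing, leave it enabled" behaviour.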
derekh_ | lifeless: pblaho jprovazn Joshua thanks for the reviews | 09:36 |
jprovazn | derekh_: np | 09:36 |
lifeless | derekh_: joshua==jhesketh on IRC | 09:36 |
*** jprovazn has quit IRC | 09:38 | |
derekh_ | lifeless: cool, was looking for that, thanks | 09:39 |
howleyt | lifeless: ok, fair enough. I wouldn't mind if mustache was a bit more expressive. | 09:39 |
lifeless | howleyt: I wouldn't mind either; we could look at going to handlebars (e.g. via pybars) at some point, or adding an extension to pystache | 09:42 |
derekh_ | lifeless: so any changes to R1 since I was last here ? | 09:44 |
lifeless | derekh_: nope, I suck | 09:44 |
lifeless | derekh_: I didn't even get to login to it | 09:44 |
*** e0ne_ has joined #tripleo | 09:44 | |
lifeless | derekh_: the regex patch was failing undercloud | 09:45 |
lifeless | derekh_: I started to poke at that | 09:45 |
derekh_ | lifeless: no prob, I'll jump in this morning and see what I can do, we got status==0 yesterday and could ssh to user image | 09:45 |
lifeless | and then its passed, yay | 09:45 |
pblaho | derekh_: np | 09:45 |
howleyt | lifeless: ya, was looking at handlebars yesterday, haven't tried it out, though. | 09:45 |
lifeless | howleyt: FWIW I wrote pybars, so if it gives you headaches, you can blame me | 09:46 |
derekh_ | lifeless: yup, so then I added some of the prepare stuff to set up what CI expects (flavors/images/networks etc...), so will continue on that today | 09:46 |
lifeless | derekh_: so, if I may suggest | 09:46 |
derekh_ | lifeless: yup, fire ahead | 09:46 |
lifeless | derekh_: let's get that vlan fixup change, and the associated config changes pushed up into gerrit for review | 09:46 |
lifeless | derekh_: so we have minimal state to lose | 09:46 |
lifeless | derekh_: secondly, I'd like to get a CI job that tests with the public IP on separate network via vlan up | 09:47 |
lifeless | derekh_: so that we can lock this in as a supported thing - it will fail initially | 09:47 |
*** e0ne has quit IRC | 09:48 | |
lifeless | derekh_: but - long as you document where you leave things, I will be poking at this next week | 09:48 |
derekh_ | lifeless: ok will get what I can up to gerrit, some of the vlan changes you should probably submit (you've a better understanding of what the vlan changes are) | 09:49 |
lifeless | derekh_: ack | 09:50 |
derekh_ | lifeless: either way I'll try and leave it so you know where everything is before I go | 09:50 |
lifeless | rocking, thanks | 09:51 |
lifeless | etherpad ftw :) | 09:51 |
derekh_ | yup | 09:51 |
openstackgerrit | Michael Kerrin proposed a change to openstack/tripleo-image-elements: Configure neutron-ovs-cleanup to run after openvswitch https://review.openstack.org/99609 | 09:53 |
openstackgerrit | Michael Kerrin proposed a change to openstack/tripleo-incubator: The default in heat-templates is to preserve ephemeral disks https://review.openstack.org/85353 | 10:03 |
*** akrivoka has joined #tripleo | 10:07 | |
*** jprovazn has joined #tripleo | 10:08 | |
openstackgerrit | Matthew Macdonald-Wallace proposed a change to openstack/tripleo-image-elements: Install the "classic" icinga interface https://review.openstack.org/87223 | 10:10 |
*** gcha has quit IRC | 10:13 | |
*** gcha has joined #tripleo | 10:19 | |
openstackgerrit | Derek Higgins proposed a change to openstack-infra/tripleo-ci: Log get_state_from_host to a file https://review.openstack.org/99620 | 10:20 |
*** matsuhashi has quit IRC | 10:20 | |
*** jml has quit IRC | 10:20 | |
*** matsuhashi has joined #tripleo | 10:20 | |
*** akuznetsov has quit IRC | 10:21 | |
*** yamahata has quit IRC | 10:21 | |
*** jml has joined #tripleo | 10:23 | |
lxsli | lifeless howleyt: if we move to handlebars, have you considered how e.g. an oac file might add a helper? | 10:24 |
*** matsuhashi has quit IRC | 10:25 | |
lxsli | This could make passthrough much more useful | 10:28 |
*** rakesh_hs has quit IRC | 10:32 | |
*** e0ne_ has quit IRC | 10:37 | |
*** e0ne has joined #tripleo | 10:37 | |
andrearosa | is anyone available to look at this easy fix for a bug in the nova.conf mustache template? https://review.openstack.org/96711 | 10:39 |
*** e0ne has quit IRC | 10:41 | |
*** martyntaylor has quit IRC | 10:43 | |
*** martyntaylor has joined #tripleo | 10:47 | |
*** gcha has quit IRC | 10:54 | |
*** gcha has joined #tripleo | 10:55 | |
*** lathiat has quit IRC | 11:04 | |
*** lathiat has joined #tripleo | 11:05 | |
openstackgerrit | Nicholas Randon proposed a change to openstack/tripleo-incubator: Wait for os-collect-config to complate on the seed https://review.openstack.org/96515 | 11:12 |
*** nosnos has quit IRC | 11:13 | |
*** e0ne has joined #tripleo | 11:15 | |
*** e0ne has quit IRC | 11:17 | |
*** rbrady has joined #tripleo | 11:17 | |
*** e0ne has joined #tripleo | 11:17 | |
derekh_ | lxsli: so your long line patch has a merge conflict now so needs to be rebased, I have rebased it to put another patch on top of it (to avoid another conflict), mind if I push a new rebased version of your patch up ? | 11:19 |
openstackgerrit | Nicholas Randon proposed a change to openstack/tripleo-incubator: Wait for os-collect-config to complate on the seed https://review.openstack.org/96515 | 11:22 |
*** e0ne has quit IRC | 11:22 | |
*** shakayumi has joined #tripleo | 11:23 | |
*** shakayumi has quit IRC | 11:23 | |
*** shakayumi has joined #tripleo | 11:24 | |
*** CaptTofu_ has joined #tripleo | 11:44 | |
*** e0ne has joined #tripleo | 11:45 | |
*** akrivoka has quit IRC | 11:45 | |
*** CaptTofu_ has quit IRC | 11:45 | |
*** CaptTofu_ has joined #tripleo | 11:45 | |
*** shakayumi has quit IRC | 11:50 | |
*** mrunge has quit IRC | 11:55 | |
openstackgerrit | Nicholas Randon proposed a change to openstack/tripleo-incubator: Wait for os-collect-config to complete on the seed https://review.openstack.org/96515 | 12:01 |
*** dprince has joined #tripleo | 12:03 | |
*** yamahata has joined #tripleo | 12:10 | |
*** morazi has joined #tripleo | 12:11 | |
*** jdob has joined #tripleo | 12:17 | |
*** akrivoka has joined #tripleo | 12:21 | |
*** e0ne_ has joined #tripleo | 12:24 | |
*** weshay has joined #tripleo | 12:25 | |
openstackgerrit | Derek Higgins proposed a change to openstack-infra/tripleo-ci: Split very long line in toci_devtest.sh https://review.openstack.org/97429 | 12:27 |
openstackgerrit | Derek Higgins proposed a change to openstack-infra/tripleo-ci: Add sbin to PATH when loging hostinfo https://review.openstack.org/99642 | 12:27 |
*** e0ne has quit IRC | 12:28 | |
*** tzumainn has joined #tripleo | 12:36 | |
*** akuznetsov has joined #tripleo | 12:38 | |
openstackgerrit | Nicholas Randon proposed a change to openstack/tripleo-incubator: Clean-up os-apply-config lines. https://review.openstack.org/99645 | 12:42 |
openstackgerrit | Nicholas Randon proposed a change to openstack/tripleo-incubator: Clean-up os-apply-config lines in devtest_seed.sh. https://review.openstack.org/99645 | 12:43 |
openstackgerrit | Nicholas Randon proposed a change to openstack/tripleo-incubator: Clean-up os-apply-config lines in devtest_seed.sh. https://review.openstack.org/99645 | 12:44 |
openstackgerrit | Andrea Frittoli proposed a change to openstack/tripleo-image-elements: Configurable tests2skip file https://review.openstack.org/98817 | 12:45 |
*** jml has quit IRC | 12:46 | |
*** openstackgerrit has quit IRC | 12:46 | |
*** rakesh_hs has joined #tripleo | 12:46 | |
*** jml has joined #tripleo | 12:46 | |
*** openstackgerrit has joined #tripleo | 12:48 | |
*** noslzzp has joined #tripleo | 12:50 | |
openstackgerrit | Gonéri Le Bouder proposed a change to openstack/tripleo-image-elements: Drop some unnecessary lsb_release calls https://review.openstack.org/99161 | 12:58 |
*** julim has joined #tripleo | 13:01 | |
giulivo | anyone who could merge https://review.openstack.org/#/c/93745/ ? | 13:02 |
*** yamahata has quit IRC | 13:03 | |
*** yamahata has joined #tripleo | 13:03 | |
*** matty_dubs|gone is now known as matty_dubs | 13:12 | |
*** ohadlevy_ is now known as ohadlevy | 13:13 | |
*** ohadlevy has quit IRC | 13:13 | |
*** ohadlevy has joined #tripleo | 13:13 | |
*** CaptTofu_ has quit IRC | 13:15 | |
*** CaptTofu_ has joined #tripleo | 13:16 | |
goneri | can a core reviewer have a look at https://review.openstack.org/#/c/97210/ ? It introduces a cache system for apt that speeds up image builds a lot. | 13:19 |
*** CaptTofu_ has quit IRC | 13:20 | |
d0ugal | tzumainn: I'd be up for going ahead with that plan now if you like | 13:21 |
tzumainn | d0ugal, sorry, which plan was it? | 13:21 |
d0ugal | tzumainn: What you said in here last night. /me looks for the wording | 13:22 |
tzumainn | I'm not sure I was speaking rationally yesterday afternoon after spending a few hours with a bunch of screaming seven-year old girls trying to sew | 13:22 |
d0ugal | tzumainn: I think you wanted to lock down the models | 13:22 |
tzumainn | d0ugal, oh, yeah | 13:22 |
tzumainn | so, assuming the spec is split into a) models and b) storage backends | 13:22 |
tzumainn | I think b) has a lot of open questions, but if we can get consensus on a), I think we could start coding on that | 13:22 |
d0ugal | Sounds good | 13:23 |
d0ugal | Then we would be in a position to try different b)'s | 13:23 |
*** rha has quit IRC | 13:23 | |
tzumainn | yep | 13:23 |
d0ugal | tzumainn: I'm not sure we are in a position yet to fully define the Role | 13:24 |
tzumainn | hm | 13:24 |
openstackgerrit | A change was merged to openstack/diskimage-builder: Correct the wrong rename in rhel element https://review.openstack.org/96114 | 13:24 |
d0ugal | tzumainn: or, at least, I wasn't sure yesterday and jdob didn't seem to be either when we spoke | 13:25 |
tzumainn | that's fair | 13:26 |
tzumainn | it seems like everything we model has random associated metadata that we're either not exactly sure how to store, or which we attempt to store by stuffing into a template or something | 13:27 |
d0ugal | Yup | 13:27 |
tzumainn | well, I guess the glance artifact store allows arbitrary metadata, but we're no longer sure that'll be ready for juno? | 13:28 |
d0ugal | I think it will be ready for juno, but that may not give us enough time to actually use it. | 13:28 |
*** rha has joined #tripleo | 13:29 | |
*** rha has quit IRC | 13:29 | |
*** rha has joined #tripleo | 13:29 | |
d0ugal | They seem confident that it will be ready, but yeah, I don't know how long we will then need to implement it. | 13:29 |
tzumainn | so, okay, we assume that each storage backend provides some way of letting us associate arbitrary metadata | 13:29 |
tzumainn | and for the ones that don't, we say that arbitrary metadata will be stored as, I dunno, a dictionary stored in a file? | 13:29 |
d0ugal | Yup, I could pop that into the requirements. | 13:29 |
d0ugal | Yeah, otherwise we need to store it as another file/object | 13:30 |
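For a storage backend without native metadata support, the fallback being discussed (arbitrary metadata kept as a dictionary in a separate file/object) could look roughly like this. The `.meta.json` suffix, the function names and the local-filesystem backend are assumptions for illustration, not part of the spec.

```python
import json
import os


def save_with_metadata(path, content, metadata):
    """Write content to path and the metadata dict to a JSON sidecar file."""
    with open(path, "w") as f:
        f.write(content)
    with open(path + ".meta.json", "w") as f:
        json.dump(metadata, f, sort_keys=True)


def load_with_metadata(path):
    """Return (content, metadata); metadata is {} if no sidecar file exists."""
    with open(path) as f:
        content = f.read()
    meta_path = path + ".meta.json"
    metadata = {}
    if os.path.exists(meta_path):
        with open(meta_path) as f:
            metadata = json.load(f)
    return content, metadata
```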
d0ugal | tzumainn: I think every backend we are considering has the ability to store metadata | 13:30 |
d0ugal | (including swift) | 13:31 |
tzumainn | oh, okay | 13:31 |
tzumainn | actually, this is all irrelevant to the role model, isn't it | 13:31 |
tzumainn | my mind is nowhere today | 13:31 |
d0ugal | :) | 13:31 |
d0ugal | That's fine, it was a good thing to check anyway | 13:31 |
tzumainn | so the question about the role model is what template(s) need to be associated: whether it's just the direct heat template, or also the ones indicated in the directed acyclic graph, as shadower says? | 13:32 |
d0ugal | tzumainn: yeah, I think so | 13:34 |
*** CaptTofu_ has joined #tripleo | 13:34 | |
d0ugal | tzumainn: but I'm a bit confused by it at the moment | 13:34 |
tzumainn | I say, just propose both options in the spec, and see what input we get | 13:36 |
d0ugal | tzumainn: good idea. | 13:37 |
d0ugal | tzumainn: I'll start that now, I'm just doing a proof read of my other updates first | 13:37 |
d0ugal | jdob: Is "master template" a Heat term? | 13:44 |
jdob | no | 13:44 |
d0ugal | Tuskar specific? | 13:44 |
jdob | ya, heat will just call it a template. for our purposes, that word feels too generic and I wanted to call it out explicitly | 13:45 |
jdob | we could use the term "plan template" instead | 13:45 |
jdob | i just kinda started using master template one day and it stuck, it wasn't a real conscious decision | 13:45 |
d0ugal | Makes sense, just wondering if I could clarify what it is | 13:45 |
d0ugal | heh, I think it makes sense but maybe it should be defined somewhere. | 13:46 |
jdob | i think i tried to define it in my spec, but at this point, its a friggin blur | 13:46 |
jdob | maybe i intended to and forgot | 13:46 |
d0ugal | :) | 13:46 |
d0ugal | I'll add a comment and ask before I forget | 13:46 |
openstackgerrit | Andrea Frittoli proposed a change to openstack/tripleo-image-elements: Configurable tests2skip file in tempest element https://review.openstack.org/98817 | 13:52 |
*** beekneemech is now known as bnemec | 13:54 | |
*** mrunge has joined #tripleo | 13:55 | |
d0ugal | tzumainn: https://etherpad.openstack.org/p/AoF98MwZNU | 13:57 |
openstackgerrit | Nicholas Randon proposed a change to openstack/tripleo-incubator: Wait for os-collect-config to complete on the seed https://review.openstack.org/96515 | 13:59 |
tzumainn | d0ugal, ah, so the other complication in B. is (iirc) the notion that some templates might be shared | 13:59 |
tzumainn | oh, wait | 13:59 |
tzumainn | that's an implementation detail | 13:59 |
tzumainn | nm | 13:59 |
tzumainn | yeah, that makes sense | 14:00 |
d0ugal | tzumainn: I don't think the structure is important to us, rather we just want to be able to find them all | 14:00 |
d0ugal | but I'm not sure about that. | 14:00 |
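If the Role model only needs to be able to find every template involved, not preserve the graph's structure, a depth-first walk over nested template references is enough. This sketch assumes templates are already parsed into dicts and that a nested template is referenced by a resource `type` naming a `.yaml` file (as Heat template resources can be); the helper and file names are hypothetical.

```python
def collect_templates(root, templates):
    """Return the set of template names reachable from root, including root.

    templates maps a template name to its parsed contents (a dict).
    """
    seen = set()
    stack = [root]
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # a shared template is only visited once
        seen.add(name)
        for resource in templates[name].get("resources", {}).values():
            rtype = resource.get("type", "")
            # Treat a resource type naming a YAML file as a nested template.
            if rtype.endswith(".yaml"):
                stack.append(rtype)
    return seen
```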
*** jistr has quit IRC | 14:00 | |
tzumainn | d0ugal, no, I think you're right | 14:01 |
tzumainn | at least, not important in terms of modeling : ) | 14:01 |
d0ugal | tzumainn: ha, sounds good. I'll propose 1 and 2a then :) | 14:01 |
openstackgerrit | Nicholas Randon proposed a change to openstack/tripleo-incubator: Clean-up os-apply-config lines in devtest_seed.sh. https://review.openstack.org/99645 | 14:01 |
d0ugal | (Well, 1 *OR* 2a, but I'll include both) | 14:02 |
*** jistr has joined #tripleo | 14:02 | |
*** pcrews has joined #tripleo | 14:10 | |
*** mrunge has quit IRC | 14:22 | |
*** jcoufal has quit IRC | 14:27 | |
*** ci-overcloud has joined #tripleo | 14:29 | |
ci-overcloud | ************** ci-overcloud complete status=1 ************ | 14:29 |
*** ci-overcloud has quit IRC | 14:29 | |
openstackgerrit | Dougal Matthews proposed a change to openstack/tripleo-specs: TripleO Template and Deployment Plan Storage https://review.openstack.org/97553 | 14:37 |
jang | hey guys - does anyone know what gerrit's rules are about wrapping long lines in its diff output? | 14:37 |
*** lazy_prince has quit IRC | 14:37 | |
*** rdopieralski has quit IRC | 14:38 | |
*** yamahata has quit IRC | 14:41 | |
*** mkerrin has quit IRC | 14:44 | |
jprovazn | pacemaker experts, any idea why pacemaker doesn't see ceilometer-agent-central.service (but haproxy.service does)? http://paste.openstack.org/show/83816/ | 14:46 |
jprovazn | do I have to do some extra magic? | 14:46 |
*** martyntaylor has quit IRC | 14:47 | |
*** martyntaylor has joined #tripleo | 14:49 | |
*** jistr has quit IRC | 14:56 | |
*** jistr has joined #tripleo | 14:56 | |
openstackgerrit | Nicholas Randon proposed a change to openstack/tripleo-incubator: Improve readability of long JQ expression https://review.openstack.org/99681 | 14:57 |
*** marun has joined #tripleo | 14:58 | |
openstackgerrit | Nicholas Randon proposed a change to openstack/tripleo-incubator: Improve readability of long JQ expression https://review.openstack.org/99681 | 14:59 |
*** untriaged-bot has joined #tripleo | 15:00 | |
untriaged-bot | Untriaged bugs so far: | 15:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1329238 | 15:00 |
uvirtbot | Launchpad bug 1329238 in tripleo "OVS isn't persisting mac addresses on OVS bridges" [Undecided,In progress] | 15:00 |
*** untriaged-bot has quit IRC | 15:00 | |
*** noslzzp has quit IRC | 15:01 | |
openstackgerrit | Nicholas Randon proposed a change to openstack/tripleo-incubator: Make a separation between --heat-env https://review.openstack.org/93091 | 15:01 |
openstackgerrit | Nicholas Randon proposed a change to openstack/tripleo-incubator: Separate Heat BM and VM configs for Nova-BM. https://review.openstack.org/90033 | 15:02 |
*** rakesh_hs has quit IRC | 15:03 | |
*** jprovazn is now known as jprovazn_afk | 15:04 | |
*** dprince has quit IRC | 15:06 | |
*** rollerj has joined #tripleo | 15:07 | |
rollerj | hello all! i attempted a stack update, increasing the computescale by 1. the new compute node deployed but os-collect-config on that node failed with a "Client Error: AccessDenied" | 15:09 |
rollerj | [2014-06-12 03:12:21,489] (os-refresh-config) [INFO] Completed phase migration | 15:09 |
rollerj | INFO:os-refresh-config:Completed phase migration | 15:09 |
rollerj | 2014-06-12 03:12:21.927 1015 WARNING os_collect_config.cfn [-] 403 Client Error: AccessDenied | 15:09 |
rollerj | i ran stack-update with the same stackrc credentials that the original stack was created with | 15:09 |
rollerj | thoughts? | 15:09 |
openstackgerrit | Cian O'Driscoll proposed a change to openstack/tripleo-image-elements: Store ssh host keys on ephemeral partition https://review.openstack.org/89529 | 15:10 |
*** yolanda has joined #tripleo | 15:11 | |
*** akuznetsov has quit IRC | 15:11 | |
yolanda | hi, having an issue with dib, when scheduling an image i created, i get this error: Stderr: "qemu-img: 'image' uses a qcow2 feature which is not supported by this qemu version: QCOW version 3\nqemu-img: Could not open '/var/lib/nova/instances/_base/94d0200a5bb90968e0e40f682f9e187025d84276.part': Operation not supported\n" | 15:11 |
yolanda | looks like an issue with qcow versions, has anyone seen that before? | 15:12 |
yolanda | i've been told to force --compat=1.0 when using qemu-img, but dib doesn't provide this option | 15:12 |
* mordred is curious about the answer ^^ | 15:13 | |
*** andreaf has quit IRC | 15:16 | |
yolanda | in the meantime i'll convert it manually and upload | 15:19 |
*** noslzzp has joined #tripleo | 15:19 | |
*** akuznetsov has joined #tripleo | 15:19 | |
*** dprince has joined #tripleo | 15:25 | |
*** CaptTofu_ has quit IRC | 15:25 | |
openstackgerrit | Nicholas Randon proposed a change to openstack/tripleo-incubator: Add Quotes to $NEW_JSON to preserve Json format. https://review.openstack.org/99693 | 15:31 |
openstackgerrit | A change was merged to openstack-infra/tripleo-ci: Log a message when we skip the temprevert https://review.openstack.org/98103 | 15:37 |
*** CaptTofu_ has joined #tripleo | 15:41 | |
*** eghobo has joined #tripleo | 15:45 | |
*** robsparker has joined #tripleo | 15:52 | |
*** matty_dubs is now known as matty_dubs|gone | 15:53 | |
*** bogdando has quit IRC | 15:55 | |
*** bogdando has joined #tripleo | 15:56 | |
*** rakesh_hs has joined #tripleo | 15:58 | |
*** e0ne_ has quit IRC | 15:59 | |
*** e0ne has joined #tripleo | 16:00 | |
*** akuznetsov has quit IRC | 16:03 | |
*** mestery has quit IRC | 16:03 | |
*** pblaho has quit IRC | 16:03 | |
*** mestery has joined #tripleo | 16:04 | |
*** e0ne has quit IRC | 16:04 | |
*** pblaho has joined #tripleo | 16:05 | |
yolanda | mordred, so i have something working forcing qemu-img to 0.10 version | 16:07 |
yolanda | but we don't have that option on dib, may i file a patch for it? | 16:07 |
yolanda | something like passing a string of options to be passed to qemu-img, and be able to send that from disk-image-create | 16:08 |
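A minimal sketch of the workaround described above: producing a qcow2 image in the older `compat=0.10` format so that older qemu versions can open it. `qemu-img convert -f <fmt> -O qcow2 -o compat=0.10 <src> <dst>` is standard qemu-img syntax; the Python wrapper and its `extra_options` parameter are hypothetical, standing in for the kind of pass-through option proposed for disk-image-create.

```python
import subprocess


def build_convert_cmd(src, dst, src_format="raw", extra_options=None):
    """Build a qemu-img command line converting src to a qcow2 image at dst.

    extra_options is a hypothetical pass-through string, e.g. "compat=0.10".
    """
    cmd = ["qemu-img", "convert", "-f", src_format, "-O", "qcow2"]
    if extra_options:
        cmd += ["-o", extra_options]
    cmd += [src, dst]
    return cmd


def convert(src, dst, **kwargs):
    # Raises subprocess.CalledProcessError if qemu-img exits non-zero.
    subprocess.run(build_convert_cmd(src, dst, **kwargs), check=True)
```

For an already-built v3 image, something like `convert("image.qcow2", "image-compat.qcow2", src_format="qcow2", extra_options="compat=0.10")` would rewrite it in the older format.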
*** pblaho has quit IRC | 16:10 | |
*** chuckC has quit IRC | 16:11 | |
openstackgerrit | Dougal Matthews proposed a change to openstack/tripleo-specs: TripleO Template and Deployment Plan Storage https://review.openstack.org/97553 | 16:12 |
*** chuckC has joined #tripleo | 16:13 | |
*** jistr has quit IRC | 16:19 | |
openstackgerrit | Derek Higgins proposed a change to openstack-infra/tripleo-ci: Add some docs descibing tripleo CI https://review.openstack.org/99708 | 16:28 |
*** akuznetsov has joined #tripleo | 16:34 | |
bnemec | yolanda: I would suggest opening a bug describing the problem you're hitting, including what version of operating system/OpenStack/qemu/anything-else-related you are trying to use. | 16:37 |
*** openstackgerrit has quit IRC | 16:38 | |
*** mkerrin has joined #tripleo | 16:53 | |
*** jcoufal has joined #tripleo | 16:58 | |
*** rwsu has joined #tripleo | 17:00 | |
*** bogdando has quit IRC | 17:00 | |
derekh_ | lifeless: only made a little progress on R1 today, was busy chasing down patches, then got sidetracked looking at gertty :-), the newly deployed ci-overcloud has gotten a little further in the process but there's more to do, will pick it back up in the morning | 17:01 |
*** eghobo has quit IRC | 17:04 | |
*** eghobo has joined #tripleo | 17:04 | |
*** dprince has quit IRC | 17:05 | |
*** nati_ueno has joined #tripleo | 17:06 | |
*** lucasagomes has quit IRC | 17:09 | |
*** dshulyak_ has joined #tripleo | 17:12 | |
*** markmc has quit IRC | 17:14 | |
*** derekh_ has quit IRC | 17:14 | |
*** akrivoka has quit IRC | 17:15 | |
*** dprince has joined #tripleo | 17:25 | |
*** rwsu has quit IRC | 17:30 | |
*** rwsu has joined #tripleo | 17:32 | |
*** martyntaylor has left #tripleo | 17:32 | |
greghaynes | hrm, are we still using trello? | 17:36 |
greghaynes | jprovazn_afk: wondering if we should add a card for ceilometer (if thats what youre working on?) to trello | 17:36 |
SpamapS | greghaynes: doh, probably not | 17:37 |
greghaynes | hrm | 17:37 |
greghaynes | its actually (surprisingly) not *that* out of date for what weve been doing on HAey stuff | 17:38 |
greghaynes | not sure thats a good or bad thing | 17:38 |
greghaynes | but yea, its been a while | 17:39 |
*** jprovazn_afk is now known as jprovazn | 17:41 | |
jprovazn | hm, I'm not sure what is current "trello status" | 17:42 |
greghaynes | yeaa | 17:43 |
greghaynes | sounds like no | 17:43 |
SpamapS | greghaynes: btw did you need more reviews to get the Galera stuff landed? | 17:46 |
* greghaynes checks | 17:46 | |
greghaynes | jprovazn: lifeless: https://review.openstack.org/#/c/83883/ | 17:46 |
jprovazn | greghaynes, ah, I thought this was already merged, looking now | 17:47 |
SpamapS | greghaynes: dunno why trivial rebase didn't carry the other +2 forward | 17:50 |
greghaynes | SpamapS: yea.. :/ | 17:50 |
greghaynes | https://review.openstack.org/#/c/86435/30 | 17:50 |
greghaynes | lucky number 30! | 17:50 |
SpamapS | greghaynes: gah.. I really think we need to make os-refresh-config dependency based. I dislike that we just have random things deciding to halt the whole process. | 17:53 |
greghaynes | SpamapS: Which script are you referring to? | 17:54 |
greghaynes | I tried to prevent the mysql one from doing that... | 17:54 |
SpamapS | greghaynes: So Kiall is right in that we probably have other scripts further in the linear chain that expect mysql to be working, and it is easier to debug if we just fail immediately as soon as we don't have a configuration for something that we expect to eventually have. | 17:55 |
greghaynes | hrm | 17:56 |
SpamapS | greghaynes: but I'd prefer that we take a page out of the convergence book, and try to do as much as we can that we know we can try to do, each time. | 17:56 |
Kiall | SpamapS: ++ And, it'll retry in 60 seconds | 17:56 |
SpamapS | So if we change it from ##-xxxx-xxx to a graph expression at the top (a-la systemd or lsb-init) .. then we can say "if we have a config for mysql, set "mysql is ready" and then configure mysql" and then anything that needs mysql expresses the same sort of dependency. | 17:57 |
greghaynes | I could see an argument for mysql (and maybe rmq) being the two special cases for that... | 17:57 |
greghaynes | hrm | 17:57 |
SpamapS | so IMO mysql isn't a special case, and things should fail gracefully if it isn't available, because it might not even be on the local server. | 17:57 |
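A rough sketch of the dependency-based approach SpamapS describes, in place of the linear `##-name` ordering: each script declares what it needs, scripts run in topological order, and a script whose prerequisites are not ready is skipped rather than halting the whole run, so the next refresh can pick it up. The `run_scripts` helper and the script names are hypothetical, not how os-refresh-config works today; it uses `graphlib`, available from Python 3.9.

```python
from graphlib import TopologicalSorter


def run_scripts(scripts):
    """Run scripts in dependency order, skipping any whose deps did not succeed.

    scripts maps a name to (dependencies, callable); the callable returns True
    on success and False if it could not do its job yet (e.g. no config).
    Returns a dict of name -> "ok", "failed" or "skipped".
    """
    # TopologicalSorter expects node -> predecessors, i.e. script -> its deps.
    graph = {name: set(deps) for name, (deps, _) in scripts.items()}
    status = {}
    for name in TopologicalSorter(graph).static_order():
        if name not in scripts:
            continue  # a dependency with no script of its own counts as not ready
        deps, action = scripts[name]
        if any(status.get(dep) != "ok" for dep in deps):
            status[name] = "skipped"  # retried on the next refresh
        elif action():
            status[name] = "ok"
        else:
            status[name] = "failed"
    return status
```

Here a missing mysql configuration only holds back the scripts that declared a dependency on mysql; unrelated scripts still run.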
*** dshulyak_ has quit IRC | 17:57 | |
Kiall | greghaynes: Therese has been working on getting a 3-node Percona cluster up inside TripleO - without exiting when there's no data available, she's ended up getting multiple single nodes | 17:58 |
greghaynes | Kiall: It does require some in-flight patches but ive been doing that just fine the past few days | 17:58 |
Kiall | Once she added an elif == 255, the cluster came up reliably every time (apparently - I've not seen it myself) | 17:58 |
greghaynes | ah | 17:59 |
greghaynes | thats...odd | 17:59 |
greghaynes | SpamapS: Your "change it from", can you elaborate? | 17:59 |
greghaynes | it being the o-r-c script running? | 17:59 |
Kiall | It's the same thing we saw until I added an "exit 1" when there's no metadata, but we use the elements outside of OOO itself etc | 17:59 |
*** edmund has joined #tripleo | 18:00 | |
*** morazi has quit IRC | 18:00 | |
greghaynes | Kiall: I wonder if there's already a master running on boot with her setup? | 18:00 |
*** morazi has joined #tripleo | 18:01 | |
Kiall | Based on the code snippets I saw today, she has your "don't start mysql on initial boot" thing included | 18:01 |
Kiall | (2nd hand info here BTW - Probably easier to ask the source :)) | 18:01 |
greghaynes | heh, yep | 18:02 |
*** pelix has quit IRC | 18:03 | |
jprovazn | greghaynes, SpamapS: do you remember if there was a discussion how to deal with monitoring/restarting failed services? E.g. if a mysql on one node goes down, cluster will still work, but the failed node should be fixed (e.g. try to restart service or restart the whole node...) | 18:06 |
greghaynes | eek | 18:07 |
greghaynes | so part of that is the heat convergence issue | 18:07 |
greghaynes | another part of that is our dopey leader election | 18:07 |
greghaynes | IMO restart the node | 18:08 |
greghaynes | and in this case hope its not the 'bootstrap host' | 18:08 |
jprovazn | :) | 18:09 |
*** shausy has joined #tripleo | 18:10 | |
*** jdob has quit IRC | 18:12 | |
jprovazn | thanks, /me dives into the heat convergence spec | 18:12 |
*** jdob has joined #tripleo | 18:13 | |
*** openstackgerrit has joined #tripleo | 18:13 | |
lifeless | o/ | 18:15 |
greghaynes | hrm, I thought for Go apps you dont actually need go.. they are all self contained binaries | 18:16 |
lifeless | SpamapS: I agree with you about random things; just want to note that convergence doesn't constrain the choice of linear or dag | 18:16 |
lifeless | greghaynes: I am | 18:17 |
lifeless | greghaynes: we've been on ha for a lllllong time | 18:17 |
greghaynes | fact | 18:17 |
*** jcoufal has quit IRC | 18:17 | |
lifeless | yolanda: never seen that before | 18:17 |
lifeless | greghaynes: I mean, I still care about trello | 18:18 |
greghaynes | BadCub: ^ sounds like trello is at least somewhat still used | 18:18 |
*** shausy has quit IRC | 18:19 | |
*** e0ne has joined #tripleo | 18:21 | |
Kiall | greghaynes: looking at your latest os-is-bootstrap-host patchset, you added an `else echo "Refusing to bootstrap mysql cluster"`, but that won't prevent any "future" scripts running, like the mysql-migration elements migration.d script, which will startup mysql as a cluster of 1 | 18:21 |
greghaynes | yes, I am still surprised we want to stop the show in that case | 18:22 |
*** jprovazn is now known as jprovazn_afk | 18:22 | |
SpamapS | greghaynes: The issue is that you have failed to configure the system.. so things should stop | 18:23 |
SpamapS | I actually missed that there was no exit 1 | 18:23 |
greghaynes | ah ok, this is why I was confused | 18:23 |
greghaynes | I think failed to configure the system is harsh, we just havent configured that part yet... | 18:24 |
greghaynes | there is a case for failure where we do exit 1 | 18:24 |
Kiall | greghaynes: think of it this way, if all remaining scripts complete with exit 0's after mysql refuses to bootstrap, then os-collect-config is going to go ahead and ping Heat to say "This host is fully configured" | 18:25 |
*** lsmola has quit IRC | 18:25 | |
Kiall | It's also not going to re-run until it detects a metadata change - which may never happen | 18:25 |
*** jdob has quit IRC | 18:25 | |
*** jdob has joined #tripleo | 18:25 | |
greghaynes | hrm | 18:26 |
lifeless | greghaynes: its basically turning off set -e | 18:26 |
lifeless | greghaynes: right ? | 18:26 |
greghaynes | yep | 18:26 |
lifeless | greghaynes: so, why do we want to do that ? | 18:27 |
greghaynes | oh ok, now I understand what SpamapS was saying | 18:27 |
greghaynes | which is why I didnt like the exit 1 | 18:27 |
SpamapS | lifeless: convergence works a heck of a lot better if we can converge branches of a dag rather than one linear process. | 18:28 |
lifeless | SpamapS: warning - this may be a 'beer' conversation | 18:28 |
SpamapS | Kiall: Note that what I'm suggesting is we should not kill the whole configuration process if something is _missing_ from metadata, because its reappearance would trigger another o-c-c run. But we can only continue partial bits reliably if we have expressed the required config metadata sections in each script | 18:29 |
Kiall | SpamapS: you mentioned you think having a DAG would be better ++, but, a short term fix might be to allow the scripts to exit with a known code - say 255 - that lets os-collect-config continue with future scripts, but prevents Heat from being pinged, and ensures os-collect-config will re-run without a metadata change. Simple stopgap that may just work.. | 18:29 |
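A minimal sketch of Kiall's stopgap, assuming a runner that treats exit code 255 as "not ready yet": keep going with later scripts, but withhold the success signal to Heat and ask for a re-run. This is a hypothetical illustration, not how os-collect-config or os-refresh-config behave today; any other non-zero exit still stops the run the way `set -e` would.

```python
import subprocess
import sys

# Hypothetical "not ready yet" exit code from the proposal above.
DEFERRED = 255


def run_hooks(commands):
    """Run hook commands in order and return (signal_heat, rerun).

    - exit 0: continue
    - exit DEFERRED: continue, but withhold the success signal and request a re-run
    - any other non-zero exit: stop immediately (like set -e); no signal,
      and the collector is expected to retry later anyway
    """
    deferred = False
    for cmd in commands:
        rc = subprocess.run(cmd).returncode
        if rc == DEFERRED:
            deferred = True
        elif rc != 0:
            return False, True
    return not deferred, deferred


def python_hook(source):
    """Illustration helper: build a hook command that runs a Python snippet."""
    return [sys.executable, "-c", source]
```

For example, `run_hooks([python_hook("pass"), python_hook("import sys; sys.exit(255)")])` returns `(False, True)`: nothing failed hard, but Heat is not told the host is configured.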
lifeless | SpamapS: but I truly think it's orthogonal. I'm not arguing that we stay linear per se, but I really don't see that a DAG affects anything other than parallelisation | 18:29 |
SpamapS | Kiall: I think that code is 0, and the signal is fine, because the signal is reset when deployments are exposed to the server. | 18:30 |
Kiall | SpamapS: you're right actually that a re-run will happen.. But Heat will still be pinged, allowing the next phase of servers that depend on mysql (in this example) to proceed | 18:30 |
openstackgerrit | A change was merged to openstack/tripleo-image-elements: Name 03-mariadb files uniquely https://review.openstack.org/97801 | 18:31 |
openstackgerrit | A change was merged to openstack/diskimage-builder: Name 99-setup-first-boot uniquely https://review.openstack.org/95808 | 18:31 |
*** vinsh has quit IRC | 18:31 | |
greghaynes | I was under the impression that we basically relied on the depending service detecting if its requirements weren't met, which (hopefully) gets us some of the convergence behavior | 18:32 |
greghaynes | so as for the fear of sending a signal to heat, nova or something else should fail if it needs mysql | 18:32 |
greghaynes | or maybe we can signal a soft fail | 18:36 |
* greghaynes stops overthinking and changes to exit 1 | 18:36 | |
Kiall | greghaynes: doesn't that just punt the problem out to 1+ other elements? While they clearly need some of that logic (e.g. can I reach MySQL), they don't necessarily need other bits, like "Is the MySQL node I've just connected to in the middle of a bootstrap" | 18:36 |
*** jang1 has joined #tripleo | 18:37 | |
Kiall | (which does actually happen - percona accepts connections during the bootstrap ;)) | 18:37 |
greghaynes | ugh, I was terrified of that second statement | 18:37 |
greghaynes | we should fix that | 18:37 |
greghaynes | for the first thing - it does only need to check for can I reach mysql | 18:37 |
greghaynes | simplifying the problem a lot - the next script is just going to fail because it's all setup db commands | 18:38 |
greghaynes | so its actually not a big deal here | 18:38 |
Kiall | Yea, but if it can reach a mid-bootstrap mysql .. :) Anyway - I've gotta run, I'll leave you with - My original patchsets build multi-node Percona clusters reliably every time ;) | 18:38 |
greghaynes | we should never leave a mid bootstrap mysql running the way thats coded... I hope | 18:39 |
*** rpodolyaka1 has joined #tripleo | 18:39 | |
Kiall | In my case - We're using the elements outside TripleO - which means the next script may be something else. And, you can't necessarily count on the next script remaining the same even in TripleO | 18:40 |
greghaynes | yes, I mean for adding an exit 1 - for us itll end up almost identical if we add it so may as well err on the safe side | 18:40 |
lifeless | SpamapS: the signal won't be reset if we race | 18:40 |
Kiall | Anyway - Gotta run | 18:41 |
openstackgerrit | Gregory Haynes proposed a change to openstack/tripleo-image-elements: Add os-is-bootstrap-host element and script https://review.openstack.org/86435 | 18:41 |
greghaynes | rejoice! the exit 1 has returned! | 18:41 |
lifeless | SpamapS: (I think - check my logic here : | 18:41 |
lifeless | - occ starts with missing mysql deploy | 18:42 |
lifeless | - 2m window while occ runs through stuff reach the pingback, and mysql exits 0 | 18:42 |
lifeless | - heat gets what it needs together to prep the mysql deployment and offers that to the node | 18:42 |
lifeless | - the node pings in with 'success' | 18:42 |
lifeless | SpamapS: how does heat tell that the success was not for the mysql thing (in the current structure where we have one success marker) ? | 18:43 |
lifeless | SpamapS: I think we need to be more sophisticated with our ping-backs to do what you're suggesting | 18:43 |
lifeless | SpamapS: and we'll need to not run hooks for deployments that are not ready | 18:44 |
lifeless | SpamapS: oo - I have stuff in my head I haven't expressed. Quick brain dump. | 18:44 |
lifeless | SpamapS: a) we need service level dependencies - an example of this is the 'don't try to use mysql until the cluster is fully initialised' issue Kiall points out above | 18:44 |
lifeless | SpamapS: b) that needs to be expressed in heat because its a cluster issue not just in-instance | 18:45 |
lifeless | SpamapS: c) a thought I had about implementation was to create subdirs in e.g. pre-configure.d called by the deployment name | 18:45 |
SpamapS | Kiall: to that point, we are misusing the signals | 18:45 |
SpamapS | Kiall: we should have a deployment specifically for mysql | 18:45 |
SpamapS | Kiall: and the downstream things should reference the deployment if they can't be started until mysql is up | 18:46 |
SpamapS | Kiall: though IMO, they should just try over and over until mysql is up because mysql isn't going to be up forever | 18:46 |
lifeless | SpamapS: this would give you linear scripts for responding to a deployment, and no attempt to run scripts for which dependencies are not ready | 18:46 |
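A sketch of lifeless's point (c), assuming hooks were laid out as `pre-configure.d/<deployment-name>/<script>` (a hypothetical layout, not the current one). Scripts stay linear within a deployment, and subdirectories for deployments that are not ready are never run. Which deployments count as "ready" is passed in here and is an assumption.

```python
import tempfile
from pathlib import Path


def hooks_to_run(phase_dir, ready_deployments):
    """Return "<deployment>/<script>" entries to run, for ready deployments only.

    Scripts inside one deployment's subdir stay linear (sorted by name);
    subdirs for deployments that are not ready yet are skipped entirely.
    """
    plan = []
    for sub in sorted(Path(phase_dir).iterdir()):
        if sub.is_dir() and sub.name in ready_deployments:
            plan.extend(f"{sub.name}/{p.name}" for p in sorted(sub.iterdir()) if p.is_file())
    return plan


def make_layout(root, layout):
    """Illustration helper: create root/<deployment>/<script> files."""
    for deployment, scripts in layout.items():
        d = Path(root) / deployment
        d.mkdir(parents=True)
        for script in scripts:
            (d / script).write_text("#!/bin/sh\n")


def demo():
    """Hypothetical layout where only the mysql deployment is ready."""
    with tempfile.TemporaryDirectory() as root:
        make_layout(root, {"mysql": ["10-configure", "20-start"], "keystone": ["10-db-sync"]})
        return hooks_to_run(root, {"mysql"})
```

Here `demo()` yields only the two mysql scripts; the keystone hooks wait until that deployment is offered to the node.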
SpamapS | lifeless: I'm in the mysql uds session and it is sort of interesting ATM .. hang on | 18:49 |
lifeless | zomg | 18:49 |
*** dshulyak_ has joined #tripleo | 18:50 | |
lifeless | SpamapS: the mysql alternatives ? | 18:50 |
lifeless | SpamapS: actually, shoot me a link ? | 18:50 |
SpamapS | lifeless: it's done now | 18:53 |
SpamapS | well except they're talking about MongoDB | 18:53 |
greghaynes | YESSS | 18:53 |
SpamapS | lifeless: http://pad.ubuntu.com/uos-1406-ubuntu-server-database-plans-for-1410 | 18:53 |
SpamapS | greghaynes: webscale! | 18:53 |
greghaynes | oh great, I need a special ubuntu one acct | 18:53 |
clarkb | didn't ubuntu one die recently? | 18:55 |
greghaynes | apparently not | 18:55 |
SpamapS | not in our hearts and minds | 18:56 |
*** rakesh_hs has quit IRC | 18:57 | |
SpamapS | I haven't actually uninstalled it from my 14.04 box yet | 18:58 |
lifeless | SpamapS: oh, nvm then :) | 18:59 |
SpamapS | lifeless: ok, now I'll read what you said. | 18:59 |
SpamapS | lifeless: OK brain dump is right on and I agree on the need. Have not thought too much about in-instance implementation.. | 19:00 |
SpamapS | lifeless: we already have service level deps in software deployments | 19:00 |
SpamapS | lifeless: one can send data back in the signal, and that data can be referenced by a downstream deployment.. | 19:01 |
lifeless | SpamapS: we're not expressing nearly enough, and we're not consuming it structurally | 19:01 |
lifeless | SpamapS: I'm sure software deployments has the capability | 19:01 |
SlickNik | Can I get some eyeballs on https://review.openstack.org/#/c/99505/ when folks get a chance? | 19:01 |
SpamapS | lifeless: so we can do {{result.of.mysql.configuration}} | 19:01 |
SlickNik | It fixes the DatasourceNone cloud-init issue for precise. | 19:01 |
SlickNik | Thanks! | 19:01 |
SpamapS | lifeless: right we only grew this ability recently, it is in our best interests to split everything up into its own config+deployment | 19:02 |
SpamapS | SlickNik: +2'd | 19:02 |
*** phschwartz has quit IRC | 19:02 | |
*** funzo has quit IRC | 19:02 | |
*** phschwartz_ has joined #tripleo | 19:02 | |
*** funzo has joined #tripleo | 19:03 | |
SpamapS | lifeless: on the o-r-c side, I like the idea of giving orc some visibility into configs so that it just naturally does not run whole branches until a variable is available. Thats sort of what I'm getting at with the dag. | 19:03 |
SlickNik | SpamapS: Thanks much! | 19:03 |
SpamapS | lifeless: that said, I think we should draw a hard line between "do something only after mysql is initialized" and "do something only if mysql is available" .. as I'm wary of trying to orchestrate around all possible states of all possible services.. I'd rather dependents be resilient in the face of an unavailable dependency | 19:05 |
*** eguz has joined #tripleo | 19:05 | |
SpamapS | lifeless: I saw this with upstart too.. people would try to use upstart to order boot because they thought that was the right way.. but then you move the mysql server to a dedicated host and you find that your service is incapable of coping with 5 minutes of mysql downtime and explodes violently. | 19:06 |
*** eguz has quit IRC | 19:06 | |
greghaynes | for the 'cluster is initialized' case, cant we just monitor the service and not have it report deployment success until there are enough members in the cluster (or $cluster_initialized_metric) | 19:08 |
*** eghobo has quit IRC | 19:08 | |
greghaynes | re: lifeless's point about heat having to know about the cluster | 19:09 |
*** phschwartz_ is now known as phschwartz | 19:11 | |
greghaynes | Im trying to reconcile what this deployment stuff means if we, say, want to prevent services from using mysql until we have a cluster of a certain size | 19:11 |
*** e0ne has quit IRC | 19:13 | |
lifeless | SpamapS: agreed that the goal has to be things are resilient | 19:13 |
*** e0ne has joined #tripleo | 19:14 | |
lifeless | SpamapS: I think the point about mysql connections during init is that the mysql server in question isn't allowed to be in the set of machines we point requests at until its checked in | 19:14 |
lifeless | SpamapS: which is different | 19:14 |
lifeless | SpamapS: consider a rebuild of a node, takes (say) 3 hours. Don't want all the rest of the world using it until its finished. | 19:14 |
*** sseago has quit IRC | 19:14 | |
lifeless | SpamapS: thats also not what I wrote about above, it just took me a bit to click. | 19:14 |
lifeless | so we need the expression 'nodes for mysql's haproxy rule' to be 'nodes that have checked in', not 'all mysql nodes' | 19:15 |
lifeless | greghaynes: ^ | 19:15 |
greghaynes | ah | 19:15 |
greghaynes | hrm | 19:15 |
greghaynes | is heat really where we want that information to live? | 19:16 |
lifeless | I might still not be synced with what Kiall is saying, but I think I'm correct in saying we want this anyway, because even if galera does transparently handle it, its going to be terribly slow while the IST is going on | 19:16 |
*** e0ne has quit IRC | 19:16 | |
lifeless | greghaynes: yes | 19:16 |
lifeless | greghaynes: heat is the cluster layer thing, no ? | 19:16 |
greghaynes | yep, just having to think through it a bit | 19:18 |
*** mugsie has joined #tripleo | 19:19 | |
lifeless | greghaynes: so, I think this is ok in 'future work' (but the future is now :)) - but lets capture a bug about the fact there is this problem | 19:19 |
lifeless | greghaynes: and we're going to want a spec to tease out the set of broad work we need to do to pull this all together | 19:19 |
*** rpodolyaka1 has quit IRC | 19:19 | |
greghaynes | what about service reports success, gets added to list, then crashes | 19:19 |
lifeless | I think its ok in future work because the current setup blocks local scripts until the local mysql is initialised | 19:19 |
lifeless | greghaynes: if its offline, haproxy won't route traffic to it | 19:19 |
greghaynes | sure, but it also needs to get removed from the list in a timely fashion | 19:20 |
lifeless | greghaynes: if its online and nonfunctional we need to check it back out again | 19:20 |
*** noslzzp has quit IRC | 19:20 | |
lifeless | greghaynes: I presume you can check cluster sync state remotely somehow? we could have an haproxy check script that checks that | 19:20 |
Kiall | greghaynes: percona clustercheck script + haproxy works great for removing dud nodes. | 19:21 |
lifeless | there is probably one out there we can use | 19:21 |
lifeless | ^ tada | 19:21 |
greghaynes | yep, so we're going to need to do that per service for heat | 19:21 |
lifeless | Kiall: will it remove a node that is bootstrapping ? can we use just that ? | 19:21 |
lifeless | greghaynes: yes | 19:21 |
Kiall | (it's an xinetd script that serves "HTTP 200" or "HTTP 500" for HAProxy to check) | 19:21 |
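A Python stand-in for the kind of check Kiall describes; the real Percona clustercheck is a shell script run from xinetd, and this is only a sketch of the idea. The node reports 200 when `wsrep_local_state` is 4 (Synced) and a 5xx status otherwise (503 here), so a bootstrapping node or one serving an SST (state 2, Donor/Desynced) drops out of HAProxy's rotation. `get_state` is a hypothetical callable returning that status value; port 9200 is the one such checks commonly use, assumed here.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

SYNCED = 4  # wsrep_local_state: Synced
DONOR = 2   # wsrep_local_state: Donor/Desynced (e.g. while serving an SST)


def health_status(wsrep_local_state, available_when_donor=False):
    """Map a node's wsrep_local_state to (HTTP status, body) for HAProxy's httpchk."""
    if wsrep_local_state == SYNCED:
        return 200, "Galera node is synced.\n"
    if available_when_donor and wsrep_local_state == DONOR:
        return 200, "Galera node is a donor but marked available.\n"
    return 503, "Galera node is not synced.\n"


def make_handler(get_state):
    """Build a request handler; get_state is a hypothetical callable returning
    the integer from SHOW STATUS LIKE 'wsrep_local_state'."""

    class ClusterCheckHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            code, body = health_status(get_state())
            data = body.encode()
            self.send_response(code)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    return ClusterCheckHandler


def serve(get_state, port=9200):
    """Serve the check forever. An HAProxy backend could then use e.g.:
         option httpchk
         server db1 192.0.2.10:3306 check port 9200
    (the address above is illustrative)."""
    HTTPServer(("0.0.0.0", port), make_handler(get_state)).serve_forever()
```

HAProxy's `httpchk` treats 2xx/3xx responses as healthy and anything else as down, which is what makes the 200/5xx split sufficient.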
Kiall | lifeless: yes | 19:21 |
Kiall | it will also remove nodes doing SST's etc | 19:21 |
lifeless | Kiall: care to throw up a patch for that ? | 19:22 |
Kiall | Sure, I'll have a quick look tomorrow.. It should be pretty trivial to add (famous last words and all that..) | 19:23 |
lifeless | hmm, patches, mysql status service on local ipv4; haproxy needs config glue for health check configuration, heat template needs to specify the script port for the mysql haproxy address | 19:23 |
*** noslzzp has joined #tripleo | 19:25 | |
*** sseago has joined #tripleo | 19:30 | |
greghaynes | SpamapS is not going to like me saying this, but that also essentially solves our leader election needs | 19:31 |
greghaynes | if heat is keeping a list of nodes that have checked in for a service | 19:31 |
*** ohadlevy has quit IRC | 19:32 | |
Kiall | greghaynes: does it? Two nodes can, at the same time, notice there is no leader and bootstrap themselves | 19:32 |
greghaynes | so pick the lowest row id in the database | 19:33 |
greghaynes | for that checkin event | 19:33 |
Kiall | If two percona nodes bootstrap, you would have to "wipe" one and re-join it to the cluster if it wasn't the lowest row id? | 19:34 |
*** ohadlevy has joined #tripleo | 19:34 | |
*** ohadlevy is now known as Guest6243 | 19:34 | |
greghaynes | im having to hand wave around heat internals, but assuming it can maintain a list of nodes that have checked in for service X, and each checkin gets a row id | 19:34 |
greghaynes | everyone could repeatedly checkin for starting_X, query for nodes that have checked in starting_X and the rowids, if im lowest rowid I should bootstrap | 19:36 |
greghaynes | obviously dont expose internals that badly... | 19:36 |
greghaynes | but in theory | 19:36 |
*** rpodolyaka1 has joined #tripleo | 19:37 | |
greghaynes | basically the same system we have now | 19:37 |
greghaynes | just not statically defined | 19:37 |
*** rpodolyaka1 has quit IRC | 19:42 | |
*** morazi has quit IRC | 19:45 | |
* SpamapS had to break for lunch sorry | 19:46 | |
*** nati_ueno has quit IRC | 19:47 | |
*** e0ne has joined #tripleo | 19:48 | |
SpamapS | greghaynes: So Heat would have a list of check-ins and row numbers.. that is fine.. but the question is.. how do you know you have the most recent list? | 19:49 |
SpamapS | greghaynes: w/ an etcd you can lock the list while you act on it. | 19:49 |
SpamapS | greghaynes: which can be used for other nodes to go "try: lock_or_fail(list); except FailLock: i_am_a_slave" | 19:51 |
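SpamapS's one-liner, fleshed out as a runnable sketch. `LockService` is a hypothetical in-process stand-in for a distributed lock in something like etcd; it does not use etcd's real API. The first node to take the lock becomes master, every other node gets `FailLock` and becomes a slave, and asking again with the same owner is idempotent.

```python
import threading


class FailLock(Exception):
    """Raised when the lock is already held by another owner."""


class LockService:
    """Hypothetical in-process stand-in for a distributed lock service."""

    def __init__(self):
        self._guard = threading.Lock()
        self._holders = {}

    def lock_or_fail(self, key, owner):
        with self._guard:
            holder = self._holders.setdefault(key, owner)
        if holder != owner:
            raise FailLock(f"{key} is held by {holder}")


def choose_role(locks, node):
    try:
        locks.lock_or_fail("mysql-bootstrap", node)
        return "master"
    except FailLock:
        return "slave"
```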
hewbrocca | I dunno guys | 19:52 |
hewbrocca | I feel like the pattern of an external thing doing master election is fraught with peril | 19:52 |
*** morazi has joined #tripleo | 19:52 | |
* hewbrocca channels Fabio | 19:52 | |
*** nati_ueno has joined #tripleo | 19:56 | |
*** akuznetsov has quit IRC | 19:56 | |
*** e0ne has quit IRC | 19:59 | |
SpamapS | hewbrocca: You are right, the closer leader election is to the state being managed, the less failure modes there will be. | 20:00 |
*** e0ne has joined #tripleo | 20:00 | |
SpamapS | hewbrocca: Ideally, Galera would use its own membership protocol and quorum tracking to elect leaders or refuse to continue. | 20:00 |
hewbrocca | Yeah | 20:00 |
hewbrocca | I mean, that's what we should strive for | 20:00 |
hewbrocca | then you have to figure out a way to deal with the various failure cases | 20:00 |
SpamapS | hewbrocca: but absent that capability in Galera, we are forced to tip-toe around and/or fence. | 20:01 |
lifeless | so | 20:01 |
lifeless | we should file a bug on galera saying | 20:01 |
lifeless | 'we'd like to be able to do zero-knowledge Just Go cluster init, where we list the nodes on every node, and it gets up and running' | 20:01 |
lifeless | s/nodes/intended nodes/ | 20:02 |
hewbrocca | Ryan O'Hara has a POC patch from a buddy of his at Percona | 20:02 |
hewbrocca | which is a Pacemaker resource for Galera | 20:02 |
greghaynes | SpamapS: You dont need to know you have the most recent list, you just need to have a lock on a choice once its made, ORDER BY id LIMIT 1 should do that since id's are increasing only | 20:02 |
hewbrocca | Intended to handle the above | 20:02 |
hewbrocca | But... not ready for primetime yet IIUC | 20:02 |
*** e0ne has quit IRC | 20:02 | |
lifeless | greghaynes: so they aren't increasing only. | 20:04 |
lifeless | greghaynes: say you have a three node galera cluster | 20:04 |
lifeless | greghaynes: an INSERT from node A will get 1, then the next 4, then the next 7 - there's a stride == cluster size. | 20:04 |
lifeless | greghaynes: node B gets 2,5,8 | 20:04 |
greghaynes | ack | 20:04 |
lifeless | greghaynes: node C gets 3,6,9 | 20:04 |
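The stride lifeless describes comes from Galera's `wsrep_auto_increment_control` (on by default), which sets `auto_increment_increment` to the cluster size and gives each node its own `auto_increment_offset`. The helper below is only arithmetic for the sequence one node would hand out if no other node inserted in between (an assumption). Once rows from other nodes replicate in, a node skips ahead past the table's current maximum, and concurrent inserts on different nodes can still produce ids that do not follow commit order.

```python
def galera_ids(node_offset, cluster_size, count):
    """IDs one node hands out with auto_increment_increment = cluster_size and
    auto_increment_offset = node_offset, assuming no other node inserts in between."""
    return [node_offset + k * cluster_size for k in range(count)]
```

For a three-node cluster, `galera_ids(1, 3, 3)`, `galera_ids(2, 3, 3)` and `galera_ids(3, 3, 3)` reproduce the A, B and C sequences above.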
greghaynes | rip | 20:05 |
*** dprince has quit IRC | 20:05 | |
SpamapS | hewbrocca: pacemaker is just as fraught with danger as etcd or zookeeper. | 20:05 |
SpamapS | hewbrocca: same problem, different tools | 20:05 |
hewbrocca | can't argue with you there... just happens to be the devil we know | 20:06 |
SpamapS | lifeless: I don't believe thats true for _row numbers_ | 20:06 |
SpamapS | lifeless: that is for PK's | 20:06 |
SpamapS | lifeless: row number is the agreed upon next ID for the entire data set. | 20:06 |
lifeless | SpamapS: whats a row number ? | 20:06 |
lifeless | got a url ref? Its exceeded my plumbing-of-mysql knowledge | 20:07 |
lifeless | SpamapS: but also clearly its not straight sql so we can't use it from e.g. postgresql | 20:08 |
SpamapS | seqno Is the actual term IIRC | 20:08 |
SpamapS | http://www.slideshare.net/Severalnines/galera-node-recovery slide 20 | 20:08 |
SpamapS | lifeless: this is entirely galera plumbing | 20:08 |
greghaynes | it does sound like this blasts past the cost benefit line of not using $other_service | 20:09 |
SpamapS | hm | 20:10 |
SpamapS | so if you know for a fact that the seqno's are not advancing | 20:10 |
SpamapS | then you can do it with just Heat, because you can build a quorum from the dead nodes' seqno's | 20:10 |
SpamapS | I guess loss of quorum always means seqno's won't advance.. | 20:11 |
SpamapS | and thats the only scenario not covered by just starting up all the galera's | 20:11 |
SpamapS | greghaynes: that sound right? | 20:11 |
*** rlandy has quit IRC | 20:12 | |
SpamapS | It seems like recovering is really hard to orchestrate | 20:12 |
greghaynes | unsure | 20:12 |
greghaynes | yerp | 20:12 |
greghaynes | what do you mean 'just starting up all the galeras'? | 20:13 |
lifeless | so I am not sure that we need to solve this in the short term | 20:13 |
lifeless | at all | 20:13 |
lifeless | we definitely do medium/long | 20:13 |
lifeless | but just treating first-node-as-initial-master seems like a pretty reasonable stopgap: I mean, until heat has convergence, if that node fails to come all the way up, we'll still have STACK_FAILED | 20:14 |
greghaynes | fair | 20:14 |
SpamapS | http://galeracluster.com/documentation-webpages/nodeprovisioning.html is the source of truth here | 20:14 |
SpamapS | And I agree, we can just live with loss of quorum leads to reading that manual page, and getting your dbs up and running | 20:15 |
hewbrocca | lifeless: agree | 20:15 |
SpamapS | lifeless: true, why am I here? I have a spec to get merged. ;) | 20:15 |
hewbrocca | it is the safest option anyway | 20:15 |
greghaynes | It occurred to me that a stack-update could result in a different node thinking it should be master given our current setup | 20:16 |
greghaynes | since we just alphasort and pick one, and that list could change... | 20:16 |
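A small illustration of the stack-update concern, assuming the current rule is "sort the node names and take the first". The node names are hypothetical. Adding a node whose name sorts earlier silently moves the bootstrap role.

```python
def bootstrap_host(node_names):
    """Assumed selection rule: sort node names alphabetically and take the first."""
    return sorted(node_names)[0]


# Hypothetical controller names before and after a stack-update adds a node.
before = ["overcloud-controller-kd2", "overcloud-controller-x7q"]
after = before + ["overcloud-controller-ab4"]
```

`bootstrap_host(before)` picks `overcloud-controller-kd2`, while `bootstrap_host(after)` picks the new `overcloud-controller-ab4`, even though the cluster is already running.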
hewbrocca | Once the cluster is up you want to make sure *nothing* thinks it's master, ever | 20:16 |
hewbrocca | unless you explicitly go in and tell it so | 20:16 |
greghaynes | yeaaa | 20:16 |
greghaynes | "going to have a bad time" | 20:17 |
*** boris-42 has quit IRC | 20:18 | |
*** morganfainberg has quit IRC | 20:18 | |
*** boris-42 has joined #tripleo | 20:18 | |
*** morganfainberg has joined #tripleo | 20:18 | |
lifeless | greghaynes: so stack-update is wholly different to init | 20:21 |
lifeless | lets not overengineer non-bottlenecks. | 20:21 |
lifeless | This isn't a bottleneck. | 20:21 |
greghaynes | speaking of: https://review.openstack.org/#/c/86435/ | 20:22 |
greghaynes | lifeless: ^ please to review | 20:23 |
openstackgerrit | A change was merged to openstack/tripleo-image-elements: Update mysql cluster.cnf to match heat templates https://review.openstack.org/99196 | 20:23 |
openstackgerrit | A change was merged to openstack/tripleo-image-elements: Fall back to keystone-manage if pt-archiver isn't available https://review.openstack.org/98196 | 20:24 |
openstackgerrit | A change was merged to openstack/tripleo-image-elements: Fix sed regex from deleting old configs https://review.openstack.org/98432 | 20:24 |
openstackgerrit | A change was merged to openstack/tripleo-image-elements: Ceilometer Config element for custom pipeline https://review.openstack.org/94634 | 20:24 |
lifeless | greghaynes: passed CI yet ? | 20:24 |
openstackgerrit | A change was merged to openstack/tripleo-image-elements: Enable neutron.conf passthrough configuration https://review.openstack.org/93451 | 20:24 |
greghaynes | lifeless: no | 20:24 |
openstackgerrit | A change was merged to openstack/tripleo-image-elements: Configure passthrough in swift config files https://review.openstack.org/92628 | 20:24 |
greghaynes | lifeless: ill ping when it does | 20:24 |
lifeless | thanks | 20:24 |
*** matty_dubs|gone is now known as matty_dubs | 20:27 | |
*** nati_ueno has quit IRC | 20:28 | |
*** noslzzp has quit IRC | 20:29 | |
*** TravT has joined #tripleo | 20:29 | |
lifeless | BadCub: and it begins ;) | 20:34 |
*** julim has quit IRC | 20:35 | |
*** openstackgerrit has quit IRC | 20:35 | |
*** openstackgerrit has joined #tripleo | 20:36 | |
*** jprovazn_afk has quit IRC | 20:37 | |
lifeless | greghaynes: got to it; -1 :) | 20:37 |
greghaynes | well, thats one way to beat the CI | 20:38 |
SpamapS | lifeless: are you driving pbr's fixes for coverage/testr, or can we let this one through until somebody does that? https://review.openstack.org/#/c/97583/ | 20:43 |
SpamapS | I keep having to cherry pick that fix in to run coverage :-P | 20:44 |
*** nati_ueno has joined #tripleo | 20:44 | |
lifeless | SpamapS: it works in pbr | 20:44 |
lifeless | SpamapS: we found that the command is test not testr, | 20:45 |
lifeless | at least, thats what mordred said | 20:45 |
lifeless | SpamapS: there is also a patch from stevenk to register testr properly | 20:45 |
lifeless | SpamapS: 715c59738e3643f579b913921e90cf3b6bfc66e3 which is in trunk | 20:46 |
lifeless | let me see about getting a release | 20:46 |
*** noslzzp has joined #tripleo | 20:48 | |
*** noslzzp has joined #tripleo | 20:48 | |
SpamapS | lifeless: ok, so I fund that the command was test too.. :-P | 20:49 |
*** noslzzp has quit IRC | 20:49 | |
SpamapS | s/fund/found/ | 20:49 |
lifeless | SpamapS: does it accept the --coverage --coverage-package-name option ? | 20:49 |
SpamapS | lifeless: yes.. the patch.. that I just linked.. go look.. reconsider. :) | 20:50 |
lifeless | SpamapS: oh, I totally misread the patch initially. | 20:51 |
lifeless | we should add a setup.cfg thing to define the package name though | 20:52 |
SpamapS | lifeless: In a separate patch? | 20:52 |
SpamapS | lifeless: and to what end? | 20:52 |
lifeless | SpamapS: as a pbr feature | 20:54 |
lifeless | SpamapS: so that anything which needs to know it - such as coverage and test discovery - can infer it rather than be hand maintained. | 20:54 |
lifeless | DRY | 20:54 |
SpamapS | isn't module-name that? | 20:54 |
lifeless | possibly | 20:55 |
* SpamapS has never understood "package" vs. "module" | 20:55 | |
lifeless | I don't want to think right now :) | 20:55 |
SpamapS | don't think.. I know just what you're saying.. and I don't need your reasons.. don't tell me cause it hurts | 20:55 |
*** jtomasek has quit IRC | 20:55 | |
* SpamapS un-channels Gwen Stefani | 20:56 | |
openstackgerrit | A change was merged to openstack/diskimage-builder: Correct source-repository comments https://review.openstack.org/92509 | 20:56 |
*** noslzzp has joined #tripleo | 20:56 | |
lifeless | SpamapS: heh. so a dir like os_apply_config with a __init__.py is a package. a file foo.py in os_apply_config is the module os_apply_config.foo | 20:57 |
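A quick way to see lifeless's distinction in Python itself: build a throwaway package (the name `demo_pkg` is hypothetical, standing in for something like os_apply_config) with an `__init__.py` and a `foo.py`, then import both. The package object carries a `__path__` attribute; the plain module `demo_pkg.foo` does not.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo():
    """Build demo_pkg/__init__.py and demo_pkg/foo.py in a temp dir and import both."""
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = Path(root) / "demo_pkg"
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text("")         # makes the dir a package
        (pkg_dir / "foo.py").write_text("VALUE = 42\n")  # becomes module demo_pkg.foo
        sys.path.insert(0, root)
        importlib.invalidate_caches()  # pick up files created at runtime
        try:
            package = importlib.import_module("demo_pkg")
            module = importlib.import_module("demo_pkg.foo")
        finally:
            sys.path.remove(root)
    return package, module
```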
*** e0ne has joined #tripleo | 20:57 | |
*** jdob has quit IRC | 20:59 | |
*** untriaged-bot has joined #tripleo | 21:00 | |
untriaged-bot | Untriaged bugs so far: | 21:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1329238 | 21:00 |
uvirtbot | Launchpad bug 1329238 in tripleo "OVS isn't persisting mac addresses on OVS bridges" [Undecided,In progress] | 21:00 |
*** untriaged-bot has quit IRC | 21:00 | |
openstackgerrit | Gregory Haynes proposed a change to openstack/tripleo-image-elements: Add os-is-bootstrap-host element and script https://review.openstack.org/86435 | 21:01 |
openstackgerrit | OpenStack Proposal Bot proposed a change to openstack/diskimage-builder: Updated from global requirements https://review.openstack.org/99074 | 21:01 |
*** noslzzp has joined #tripleo | 21:02 | |
*** vinsh has joined #tripleo | 21:05 | |
*** nati_ueno has quit IRC | 21:07 | |
lifeless | SpamapS: 86435 looks good to me now | 21:12 |
*** eghobo has joined #tripleo | 21:14 | |
SpamapS | lifeless: got it, thanks for the explanation, and the reviews. :) | 21:14 |
SpamapS | lifeless: speaking of reviews.. where are we at in resurrecting R1? | 21:15 |
lifeless | SpamapS: sec | 21:20 |
*** dshulyak_ has quit IRC | 21:23 | |
*** morazi has quit IRC | 21:25 | |
*** e0ne has quit IRC | 21:26 | |
*** e0ne has joined #tripleo | 21:26 | |
*** jang1 has quit IRC | 21:27 | |
*** petertoft has joined #tripleo | 21:28 | |
lifeless | SpamapS: ok | 21:28 |
*** TravT has quit IRC | 21:31 | |
*** e0ne has quit IRC | 21:31 | |
lifeless | SpamapS: http://logs.openstack.org/03/97703/3/check-tripleo/check-tripleo-overcloud-f20/00aebb8/logs/ should be realistically debuggable now | 21:34 |
lifeless | (relevant to deploying hp1 again, since it's a nontrivial cluster) | 21:35 |
lifeless | hmm | 21:35 |
lifeless | 2014-06-12 09:44:53.393 | Service ec2 created | 21:35 |
lifeless | 2014-06-12 09:44:55.624 | Authorization Failed: Unable to establish connection to http://192.0.2.3:5000/v2.0/tokens | 21:35 |
lifeless | 2014-06-12 09:44:55.895 | Authorization Failed: Unable to establish connection to http://192.0.2.3:5000/v2.0/tokens | 21:35 |
lifeless | 2014-06-12 09:44:56.123 | usage: keystone user-role-add --user <user> --role <role> [--tenant <tenant>] | 21:35 |
lifeless | 2014-06-12 09:44:56.123 | keystone user-role-add: error: argument --user/--user-id/--user_id: expected one argument | 21:35 |
lifeless | I think thats the race nicholas has been seeing | 21:35 |
lifeless | where we don't have signalling from the seed on ready | 21:35 |
morganfainberg | lifeless, whoa | 21:36 |
lifeless | morganfainberg: not a keystone issue | 21:37 |
morganfainberg | i know :) | 21:37 |
morganfainberg | does look racy though :) | 21:37 |
* morganfainberg continues lurking. | 21:37 | |
SpamapS | lifeless: oh lovely | 21:37 |
lifeless | SpamapS: I'll report a bug | 21:38 |
*** morazi has joined #tripleo | 21:38 | |
SpamapS | lifeless: the whole signal handling needs an overhaul | 21:39 |
lifeless | hmmm, no, that's not it, but I need to file this bug anyway. | 21:39 |
SpamapS | they should probably all be individually handled by each respective deployment<->element relationship | 21:39 |
lifeless | SpamapS: I've filed https://bugs.launchpad.net/tripleo/+bug/1329528 about the seed case | 21:41 |
uvirtbot | Launchpad bug 1329528 in tripleo "seed cloud cannot signal 'ready' - we guess at readiness and then race with os-collect-config" [High,Triaged] | 21:41 |
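Bug 1329528 says the seed guesses at readiness and then races os-collect-config. Until the seed can actually signal "ready", one stopgap (not the fix the bug asks for) is to poll the service before the first keystone call instead of assuming it is up. A minimal Python sketch, assuming a plain HTTP GET on the keystone endpoint seen in the log (http://192.0.2.3:5000/v2.0/) is an acceptable readiness probe; `keystone_probe` and `wait_until_ready` are hypothetical helpers, not part of tripleo-incubator or os-collect-config:

```python
import time
import urllib.request
from typing import Callable


def keystone_probe(url: str = "http://192.0.2.3:5000/v2.0/") -> bool:
    """Hypothetical probe: True if keystone answers its version URL."""
    # urlopen raises URLError (an OSError subclass) when the connection is
    # refused, and HTTPError (also an OSError) for non-2xx responses, so
    # reaching the return statement means we got a 2xx.
    with urllib.request.urlopen(url, timeout=5) as resp:
        return 200 <= resp.status < 300


def wait_until_ready(probe: Callable[[], bool],
                     timeout: float = 300.0,
                     interval: float = 5.0,
                     clock: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `probe` until it returns True or `timeout` seconds pass.

    Returns True as soon as the probe succeeds, False once the deadline is
    reached. `clock` and `sleep` are injectable so the loop can be tested
    without real waiting.
    """
    deadline = clock() + timeout
    while True:
        try:
            if probe():
                return True
        except OSError:
            # Connection refused/reset or an HTTP error: not ready yet.
            pass
        if clock() >= deadline:
            return False
        sleep(interval)
```

Usage would be along the lines of `if not wait_until_ready(keystone_probe, timeout=600): raise SystemExit("keystone never came up")`. A responding keystone still says nothing about whether os-collect-config has finished configuring it (the log below shows keystone being restarted at 09:42:19), so this narrows the window rather than closing it.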
lifeless | ohhh | 21:45 |
lifeless | do we have ntp between the hypervisors and testenvs? bet we have skew | 21:46 |
lifeless | http://logs.openstack.org/03/97703/3/check-tripleo/check-tripleo-overcloud-f20/00aebb8/logs/overcloud-controller0-ejdwnvryudch_logs/os-collect-config.txt.gz and http://logs.openstack.org/03/97703/3/check-tripleo/check-tripleo-overcloud-f20/00aebb8/console.html | 21:46 |
lifeless | don't really line up | 21:46 |
*** Lexis has quit IRC | 21:46 | |
lifeless | so | 21:48 |
lifeless | Jun 12 09:39:16 overcloud-controller0-ejdwnvryudch os-collect-config[780]: + os-svc-enable -n keystone | 21:48 |
lifeless | keystone shouldn't be accessible before that | 21:48 |
lifeless | and - this is the overcloud | 21:48 |
lifeless | so we deployed with heat and were in STACK_READY | 21:48 |
*** Lexis has joined #tripleo | 21:49 | |
lifeless | there | 21:49 |
lifeless | Jun 12 09:42:19 overcloud-controller0-ejdwnvryudch systemd[1]: Stopping keystone Service... | 21:49 |
lifeless | Jun 12 09:42:19 overcloud-controller0-ejdwnvryudch systemd[1]: Starting keystone Service... | 21:49 |
lifeless | the error occurs at | 21:50 |
lifeless | 2014-06-12 09:44:55.624 | Authorization Failed: Unable to establish connection to http://192.0.2.3:5000/v2.0/tokens | 21:50 |
lifeless | morganfainberg: | 21:51 |
lifeless | Jun 12 09:42:13 overcloud-controller0-ejdwnvryudch keystone-all[4199]: 2014-06-12 09:42:13.580 4199 WARNING keystone.common.wsgi [-] Could not find user, ec2. | 21:51 |
SpamapS | lifeless: at one point we were setting NTP for tripleo-cd .. but maybe we never made it default for everything everywhere. | 21:51 |
*** ccrouch has quit IRC | 21:51 | |
lifeless | morganfainberg: ^ that happens when we do keystone user-get | 21:52 |
lifeless | morganfainberg: having anyone in the world be able to trigger a WARNING: seems bad :) | 21:52 |
lifeless | so I think ssh initialisation happened here: | 21:52 |
lifeless | Jun 12 09:42:12 overcloud-controller0-ejdwnvryudch keystone-all[4199]: 2014-06-12 09:42:12.827 4199 WARNING keystone.common.wsgi [-] Authorization failed. The request you have made requires authentication. from 192.0.2.3 | 21:52 |
lifeless | Jun 12 09:42:13 overcloud-controller0-ejdwnvryudch keystone-all[4199]: 2014-06-12 09:42:13.580 4199 WARNING keystone.common.wsgi [-] Could not find user, ec2. | 21:52 |
lifeless | that's when keystone became able to authenticate requests | 21:53 |
lifeless | actually no. I'm thoroughly confused here | 21:53 |
lifeless | SpamapS: ^ halp :) | 21:53 |
lifeless | http://logs.openstack.org/03/97703/3/check-tripleo/check-tripleo-overcloud-f20/00aebb8/logs/overcloud-controller0-ejdwnvryudch_logs/keystone.txt.gz | 21:53 |
lifeless | we successfully use keystone | 21:53 |
lifeless | then we get authentication-required errors, from nowhere | 21:53 |
lifeless | we know other requests are working because user-get is being done against heat etc | 21:54 |
lifeless | morganfainberg: ^ any thoughts on what would cause a working keystone to stop transiently ? | 21:54 |
lifeless | oh, I know | 21:54 |
lifeless | nova neutron heat etc on the same machine polling it | 21:54 |
lifeless | their credentials aren't valid yet. Maybe? | 21:55 |
lifeless | 0.3 is the machine itself I expect, since it's not the jenkins slave | 21:55 |
lifeless | ok, I'm going to shelve this and go back to really answering SpamapS question | 21:55 |
SpamapS | lifeless: what can I halp with? | 22:00 |
*** nati_ueno has joined #tripleo | 22:03 | |
BadCub | lifeless: you jumping into meeting? | 22:03 |
*** ccrouch has joined #tripleo | 22:04 | |
lifeless | oh right | 22:04 |
lifeless | BadCub: coming | 22:04 |
lifeless | BadCub: I need 2m with spamaps first | 22:04 |
*** ccrouch has quit IRC | 22:04 | |
BadCub | lifeless: LOL okay | 22:04 |
lifeless | SpamapS: https://etherpad.openstack.org/p/tripleo-ci-hp1-rebuild is the etherpad as you know | 22:04 |
lifeless | SpamapS: my chores for today are: | 22:05 |
lifeless | - retest the bad nodes and update the tickets (see machine-information-tab etc) | 22:05 |
lifeless | - turn the remaining local edits in the code on the bastion into patches in gerrit | 22:05 |
lifeless | - add a jenkins job to test vlan configurations | 22:05 |
lifeless | - add incubator support for vlan configurations | 22:05 |
lifeless | - tie that all together | 22:05 |
lifeless | SpamapS: I think priority-wise we need the new CI job first, since that's going to block on infra folk reviewing it; get it up and nag ;) | 22:06 |
lifeless | SpamapS: then probably the next thing I would love it if you did would be to refresh the tickets on the 6 bad nodes | 22:06 |
SpamapS | lifeless: Ok. I saw that at least one just got summarily closed because we didn't respond, even though it was not us but DC that needed to respond. :-P | 22:07 |
lifeless | SpamapS: yes, so retest and reopen if it's still an issue, etc. | 22:08 |
SpamapS | lifeless: FTR, I'd like to do less of this, and more Heat core work. But.. I get that we still don't have anybody else up to speed on ops. | 22:08 |
lifeless | SpamapS: I'm not asking you to do this specifically, and tchaypo is now an admin too | 22:08 |
lifeless | SpamapS: most of the work is doable locally though | 22:09 |
*** andreaf has joined #tripleo | 22:09 | |
* tchaypo celebrates | 22:10 | |
*** ccrouch has joined #tripleo | 22:12 | |
tchaypo | I have to have breakfast and a driving lesson but in about 2.5 hours I'll be asking to be pointed in the direction of useful things to do with my exciting new cd-admin rights | 22:13 |
SpamapS | lifeless: cool, just need to keep the convergence train's engine full of spec-coal | 22:13 |
lifeless | SpamapS: I would like to help there too | 22:14 |
lifeless | SpamapS: but I can't track every iteration; could you perhaps poke me at relevant times? | 22:14 |
SpamapS | lifeless: yeah. Currently getting a lot of feedback and have just recently identified that there may be need to adopt garbage-collection semantics for deleted nodes that we hadn't considered before. | 22:17 |
*** andreaf has quit IRC | 22:18 | |
*** andreaf has joined #tripleo | 22:18 | |
openstackgerrit | A change was merged to openstack/diskimage-builder: Rename old image file instead of rewrite it https://review.openstack.org/93439 | 22:19 |
*** andreaf has quit IRC | 22:19 | |
*** andreaf has joined #tripleo | 22:20 | |
*** giulivo has quit IRC | 22:20 | |
*** morazi has quit IRC | 22:21 | |
*** matty_dubs is now known as matty_dubs|gone | 22:21 | |
*** morazi has joined #tripleo | 22:21 | |
*** morazi has quit IRC | 22:21 | |
openstackgerrit | Gregory Haynes proposed a change to openstack/tripleo-image-elements: Extract mysql reset files from reset-db https://review.openstack.org/93104 | 22:22 |
openstackgerrit | patrick-crews proposed a change to openstack-infra/tripleo-ci: Alter how we grab and store machine files to use logstash indexing. https://review.openstack.org/99250 | 22:25 |
*** noslzzp has quit IRC | 22:26 | |
*** petertoft has quit IRC | 22:26 | |
*** weshay has quit IRC | 22:28 | |
*** noslzzp has joined #tripleo | 22:31 | |
vinsh | greghaynes, can you share any tripleo-ci script changes you used to fire up multiple control nodes? | 22:31 |
vinsh | merge.py doesn't seem to be working its magic as I expected | 22:31 |
vinsh | likely a large dose of operator error here | 22:32 |
greghaynes | I think the makefile change in t-h-t for that was merged | 22:32 |
greghaynes | so likely need to rebase off master | 22:32 |
vinsh | orly | 22:32 |
vinsh | k | 22:32 |
*** rollerj has quit IRC | 22:32 | |
vinsh | O i see. Thankya. | 22:33 |
vinsh | dangerous living on our dev master branch we have over here sometimes. | 22:34 |
vinsh | it should be in sync next Monday I hear. | 22:34 |
greghaynes | yea, it always gets a bit tricky with all these cross repo deps too | 22:34 |
greghaynes | have to constantly rebase off master | 22:35 |
*** rollerj has joined #tripleo | 22:35 | |
vinsh | because clouds. | 22:35 |
*** andreaf has quit IRC | 22:36 | |
vinsh | hmm. I had made this same change locally already.. not any different | 22:38 |
openstackgerrit | Richard Su proposed a change to openstack/tripleo-heat-templates: Update ip_local_port_range through sysctl https://review.openstack.org/92952 | 22:38 |
vinsh | must be something in my template | 22:38 |
SpamapS | vinsh: two spaces is all it takes for yaml to ruin your day ;) | 22:40 |
vinsh | Wise words :) | 22:40 |
* SpamapS decides that 25 is enough reviews for one day and goes to dip brain in ice bath | 22:40 | |
vinsh | margarita flavored ice we hope. | 22:41 |
vinsh | oh neat. it worked. | 22:42 |
greghaynes | the margarita ice? | 22:42 |
vinsh | the scale ice. | 22:42 |
openstackgerrit | Richard Su proposed a change to openstack/tripleo-heat-templates: Update ip_local_port_range through sysctl https://review.openstack.org/92952 | 22:43 |
vinsh | now to pore over all 800 lines and sort out "ERROR: Arguments to "Fn::GetAtt" must be of the form [resource_name, attribute]" | 22:43 |
*** noslzzp has quit IRC | 22:44 | |
vinsh | it's nice how heat doesn't even give you a range of lines it's unhappy with. | 22:44 |
vinsh | "error in cloud" | 22:44 |
vinsh | k thx. | 22:44 |
SpamapS | vinsh: you get "in cloud" .. quit whining ;) | 22:45 |
vinsh | hah! | 22:45 |
* vinsh leaves the house plant alone then | 22:45 | |
SpamapS | It may be a symptom of review-brain-melt.. but I am dying laughing here | 22:47 |
vinsh | :) | 22:47 |
*** noslzzp has joined #tripleo | 22:49 | |
SpamapS | vinsh: just for you http://cdn.memegenerator.net/instances/500x/50964597.jpg | 22:49 |
vinsh | bah! | 22:49 |
vinsh | classic. | 22:50 |
greghaynes | vinsh: not sure if there isn't an easier way, but I've definitely stuck a pdb in the /opt/stack/... heat parser before | 22:53 |
greghaynes | actually, not pdb, just lots of prints | 22:53 |
vinsh | ah, I hadn't yet thought about digging into those guts, still been thinking of that as a blackbox. | 22:54 |
vinsh | thar be dragons. | 22:54 |
SpamapS | There's been talk of adding the dict path to exceptions | 22:58 |
greghaynes | ++ | 22:59 |
SpamapS | so have the recursive resolve function keep track of the breadcrumbs and any raised error would spit out "resources.NovaCompute0Deploy.config.foo.bar.baz: $ERROR" | 22:59 |
greghaynes | maybe that's a good breaking-into-heat patch I should do... | 23:00 |
SpamapS | which is really all you need | 23:00 |
vinsh | that would be enough of a ballpark to get going at least | 23:00 |
SpamapS | line numbers involve deep yaml foo | 23:00 |
SpamapS | have to supplant all the objects with line-number-aware objects | 23:00 |
SpamapS | or parse twice, once with line number aware objects, and still do the dict path thing | 23:01 |
greghaynes | ogod, compilers course flashbacks | 23:01 |
vinsh | welp, hope someone figures that out :) | 23:01 |
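SpamapS's dict-path idea above: have the recursive resolve function carry breadcrumbs so that any raised error reads like `resources.NovaCompute0Deploy.config.foo.bar.baz: $ERROR`, with no line-number-aware YAML needed. A toy Python sketch of that idea, assuming the template is already parsed into dicts and lists. It is not Heat's resolver: `ResolveError`, `check_get_att` and `resolve` are hypothetical names, and the `Fn::GetAtt` handler only checks the argument shape and leaves the node unresolved.

```python
from typing import Any, Tuple


class ResolveError(Exception):
    """Resolution failure annotated with the dict path where it happened."""

    def __init__(self, path: Tuple[str, ...], message: str):
        self.path = path
        super().__init__(f"{'.'.join(path)}: {message}")


def check_get_att(args: Any) -> None:
    # Toy stand-in for a real Fn::GetAtt handler: validate the shape only.
    if not (isinstance(args, list) and len(args) == 2):
        raise ValueError('Arguments to "Fn::GetAtt" must be of the form '
                         '[resource_name, attribute]')


def resolve(node: Any, path: Tuple[str, ...] = ()) -> Any:
    """Walk a parsed template, passing the breadcrumb path down each level."""
    if isinstance(node, dict):
        if list(node) == ["Fn::GetAtt"]:
            try:
                check_get_att(node["Fn::GetAtt"])
            except ValueError as exc:
                # Re-raise with the path of the offending node attached.
                raise ResolveError(path, str(exc)) from exc
            return node  # left unresolved in this toy
        return {k: resolve(v, path + (str(k),)) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve(v, path + (str(i),)) for i, v in enumerate(node)]
    return node
```

Resolving `{"resources": {"NovaCompute0Deploy": {"config": {"foo": {"Fn::GetAtt": ["NovaCompute0"]}}}}}` raises `resources.NovaCompute0Deploy.config.foo: Arguments to "Fn::GetAtt" must be of the form [resource_name, attribute]`, which points straight at the bad node in an 800-line template.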
greghaynes | vinsh: with your glance registry-host bind issue, was that an error where you just couldn't hit the glance API or was it actually causing o-r-c to fail? | 23:05 |
vinsh | It was an error where devtest_overcloud.sh would try to register an image in overcloud.. and just hit a 500. this was because the glance-api could not reach the glance-registry | 23:06 |
vinsh | as the glance-api.conf was trying to reach registry at 0.0.0.0 when registry was only on node-ipv4 | 23:07 |
greghaynes | ok | 23:07 |
vinsh | added a comment to the glance-api.conf on: https://review.openstack.org/#/c/61376/ | 23:08 |
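vinsh's diagnosis above: glance-api.conf pointed at the registry on 0.0.0.0 while glance-registry was bound only to the node's IPv4 address. 0.0.0.0 is the wildcard "listen on all interfaces" address; used as a connect target it does not reach a registry listening only on the node's own IP. A hypothetical Python helper (not part of tripleo-image-elements or glance) showing the rule the registry address in glance-api.conf needs to follow, assuming the registry's bind address is known as an IP literal rather than a hostname:

```python
import ipaddress


def registry_connect_host(registry_bind_host: str, node_ip: str) -> str:
    """Return the address glance-api should use to reach glance-registry.

    If the registry binds a wildcard address (0.0.0.0 or ::) it listens on
    every interface, so the node's own IP works; otherwise connect to exactly
    the address the registry is bound to.
    """
    # ip_address() raises ValueError for hostnames; this sketch assumes
    # IP literals only.
    if ipaddress.ip_address(registry_bind_host).is_unspecified:
        return node_ip
    return registry_bind_host
```

With the registry bound to the node's IPv4 address, the helper returns that address, never 0.0.0.0.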
* vinsh hp -> gym -> home. | 23:10 | |
*** vinsh has quit IRC | 23:15 | |
openstackgerrit | A change was merged to openstack/tripleo-incubator: Add tchaypo to tripleo-cd-admins https://review.openstack.org/95614 | 23:16 |
openstackgerrit | Richard Su proposed a change to openstack/tripleo-image-elements: Move rabbitmq-server cluster port https://review.openstack.org/91704 | 23:16 |
lifeless | SpamapS: cool, so shout at me when you want eyeballs | 23:24 |
openstackgerrit | lifeless proposed a change to openstack/tripleo-image-elements: Add a hp1 region configs. https://review.openstack.org/96666 | 23:33 |
openstackgerrit | lifeless proposed a change to openstack/tripleo-image-elements: Syntax fix the keepalived docs. https://review.openstack.org/99798 | 23:33 |
openstackgerrit | lifeless proposed a change to openstack/tripleo-image-elements: HP1 region deploy config fixups. https://review.openstack.org/99799 | 23:34 |
openstackgerrit | A change was merged to openstack/diskimage-builder: Yum: support pkg-map in bin/install-packages https://review.openstack.org/91600 | 23:40 |
*** edmund has quit IRC | 23:42 | |
*** ccrouch has quit IRC | 23:44 | |
openstackgerrit | lifeless proposed a change to openstack/tripleo-image-elements: Unbreak Ironic default logging. https://review.openstack.org/99802 | 23:44 |
openstackgerrit | lifeless proposed a change to openstack/tripleo-image-elements: Add debug and verbose log support for Ironic. https://review.openstack.org/99803 | 23:44 |
lifeless | SpamapS: did you poke at some ilo ? | 23:49 |
Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!