Wednesday, 2014-05-28

mordredso even though there was transparent connection re-use going on - the failures are not transparent00:00
mordredit's an option in how you use requests to disable such a thing - but I think we were thinking that requests itself should/could be more graceful00:00
jogomordred: you should yell at https://github.com/sigmavirus2400:01
jogomordred: he is apparently also the maintainer of flake800:01
*** matsuhashi has joined #tripleo00:01
jogomordred: which I am pushing to have automatic parallelism in the next version00:01
mordredjogo: when is he going to join us in the IRCs?00:02
jogomordred: not sure, he was on this weekend00:02
*** Penick has joined #tripleo00:10
*** Penick has quit IRC00:16
*** shakamunyi has quit IRC00:18
*** Penick has joined #tripleo00:19
*** Penick has quit IRC00:19
openstackgerritAdam Gandelman proposed a change to openstack/tripleo-incubator: Allow user-specified timeout for stack creation  https://review.openstack.org/9596800:21
*** lazy_prince has joined #tripleo00:24
*** chuckC has quit IRC00:24
*** ddieterly has joined #tripleo00:25
SpamapSadam_g: so perhaps what we need is a pool approach, rather than full serialization00:30
adam_gSpamapS, yeah, thats what im thinking00:31
adam_gSpamapS, or a >1 node undercloud, but pooling would still be generally useful for ironic00:31
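[Editor's sketch] The "pool approach, rather than full serialization" mentioned above could look roughly like the following. This is a minimal illustration only, not Ironic or TripleO code: `deploy_node`, `deploy_all` and `POOL_SIZE` are hypothetical names, and the sleep merely stands in for copying an image to a node.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 3  # hypothetical cap on simultaneous deployments

_lock = threading.Lock()
_active = 0
peak_active = 0  # highest number of deployments seen running at once


def deploy_node(name):
    """Stand-in for a real deployment; it only tracks concurrency."""
    global _active, peak_active
    with _lock:
        _active += 1
        peak_active = max(peak_active, _active)
    time.sleep(0.01)  # pretend to write an image to the node
    with _lock:
        _active -= 1
    return name


def deploy_all(names, pool_size=POOL_SIZE):
    # The executor never runs more than pool_size deployments at a time,
    # so the load is bounded without serializing everything.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        return list(pool.map(deploy_node, names))


results = deploy_all(["node%d" % i for i in range(10)])
```

With `pool_size=1` this degenerates to full serialization; larger values trade load on the deploying host for throughput.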
SpamapSadam_g: though it might be best to focus on even more optimized methods than dd over iscsi00:32
SpamapSadam_g: scatter gather would take the network load off the conductor quite a bit00:33
lifelessmulticast would be great00:37
lifelessbut I'm really having trouble modelling parallel dd causing conductor issues on a single hw node00:37
lifelesssince there's no off-node traffic to be interfered with00:37
mordredlifeless: I really want to track down the dude I was talking to who freaked out when I mentioned multicast00:38
lifelessmordred: please00:38
lifelessmordred: btw we've got some discussion now happening about IPMI plane access00:38
mordredawesome00:38
lifelessof course, the discussion is so far limited to a freak out about it00:38
lifelessrather than actual discussion of the policy goals etc etc00:38
mordredwell, that's step one00:38
lifelessI'm almost willing to suggest we buy PDU's and don't use IPMI for the folk in question00:39
*** yamahata has joined #tripleo00:39
mordredthe guy I talked to did that too - "PSHAW - OBVIOUSLY you can't do multicast in a REAL data center"00:39
mordredwith the look in his eyes which implied I was an idiot for bringing it up00:39
lifelessyeah00:39
lifelessthats the one00:39
mordredI forgot that the right response is to take his name and number and send him to lifeless00:40
SpamapSlifeless: tftp failing because we have saturated the network interface.00:40
mordredSpamapS: that's special00:40
openstackgerritA change was merged to openstack/diskimage-builder: add some missing \n at end of file  https://review.openstack.org/9182300:40
openstackgerritA change was merged to openstack/diskimage-builder: dib-lint: ensure file finish with a new line  https://review.openstack.org/9181800:41
lifelesshmm00:41
lifelessfatal: unable to access 'https://git.openstack.org/openstack-infra/tripleo-ci/': Failed to connect to git.openstack.org port 443: Network is unreachable00:41
lifelessSpamapS: I can imagine that but we didn't see that.00:41
lifelessSpamapS: adam_g reported *conductor* failures.00:41
lifelessSpamapS: and we saw *saucy host node* TFTP failures. Mellanox.00:42
greghaynesadam_g: did you ever see that occur on the 4xx rack?00:42
SpamapSHm, thats not what I recalled.00:42
lifelessSpamapS: nick upgraded the jump host to trusty and started deploying trusty end to end and the TFTP issue disappeared.00:43
adam_gthere were two tftp issues i saw00:43
SpamapSAhh, so just a slight disconnect I think.00:43
adam_gone was the lock up of tftp xfers at boot time00:43
adam_gwhich i believe is attributed to the driver update00:43
adam_gthe other was tftp connection failures from the ramdisk when trying to fetch keystone token-$foo file from conductor00:43
adam_gi observed the latter while other strange things were happening, like i/o errors writing to iscsi devices on the conductor00:44
lifelessadam_g: wait, what00:44
adam_gnode locking errors00:44
adam_grpc timeouts, etc00:44
lifelessadam_g: the conductor doesn't implement tftp does it ?00:44
adam_glifeless, it runs a tftp server and puts files there. once booted, the ramdisk does tftp get of a file containing a keystone token it uses to do a curl callback to the ironic-api00:45
adam_gs/runs/relies on00:45
lifelessadam_g: sure, but thats not tftping from the conductor :)00:45
lifelessadam_g: its tftping from the tftpd00:45
lifelessadam_g: what OS was the conductor node running ?00:46
adam_glifeless, trusty00:46
lifelessnuts00:46
adam_glifeless, im running one more overcloud deployment on the 405 rack and writing an update email00:47
adam_glifeless, if you'd like, you can remove the serialization and  start throwing large numbers of instances at ironic00:47
lifelessI'm going to do a spike at a 3-node undercloud00:47
lifelesslocally first00:48
adam_galso, all of the errors i was observing were throwing the instances back into reschedule and ending up in a NoValidHost ERROR on the nova side00:48
lifelesstimeout will do that00:48
openstackgerritAdam Gandelman proposed a change to openstack/tripleo-incubator: Allow user-specified timeout for stack creation  https://review.openstack.org/9596800:53
openstackgerritlifeless proposed a change to openstack/tripleo-incubator: Support --help as well as -h in devtest.sh.  https://review.openstack.org/9598200:56
*** noslzzp has joined #tripleo00:56
*** BadCub01_ has quit IRC00:57
SpamapSadam_g: odd.. why use tftp if you already have curl?01:08
adam_gSpamapS, asking the wrong person there01:08
adam_g:)01:08
SpamapScurl would have a hell of a lot better chance of working than tftp if we're overwhelming the network01:10
*** chuckC has joined #tripleo01:10
lifelessSpamapS: when the token thing was being added there was no http server guaranteed to exist01:21
lifelessSpamapS: and the token file is 400 bytes, if we overwhelm things, the kernels etc are more of an issue01:22
*** eguz has quit IRC01:27
*** weshay has quit IRC01:29
openstackgerritGregory Haynes proposed a change to openstack/tripleo-heat-templates: Add initial support for galera clustering  https://review.openstack.org/8388301:35
*** weshay has joined #tripleo01:37
openstackgerritA change was merged to openstack/tripleo-image-elements: Fix ironic api port in nova element  https://review.openstack.org/9307801:38
*** nosnos has joined #tripleo01:48
*** ddieterly has quit IRC01:52
*** ddieterly has joined #tripleo01:53
*** morazi has quit IRC01:57
*** martyntaylor has quit IRC02:08
*** lazy_prince has quit IRC02:13
*** nati_uen_ has quit IRC02:13
*** weshay has quit IRC02:24
*** funzo_ has joined #tripleo02:31
*** mkerrin1 has joined #tripleo02:32
*** funzo has quit IRC02:33
*** mkerrin has quit IRC02:33
*** olaph has quit IRC02:47
openstackgerritlifeless proposed a change to openstack/tripleo-incubator: Add -c support to devtest_seed.sh.  https://review.openstack.org/9601402:48
lifelessSpamapS: up ?02:52
*** rcarrill` has joined #tripleo02:55
*** rcarrillocruz has quit IRC02:58
*** untriaged-bot has joined #tripleo03:00
untriaged-botUntriaged bugs so far:03:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/132194303:00
uvirtbotLaunchpad bug 1321943 in tripleo "Ceilometer Swift polling on overcloud control node fails with a 403 forbidden error" [Undecided,In progress]03:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/131876703:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/131497803:00
uvirtbotLaunchpad bug 1318767 in tripleo "apache element SSL cert check fails" [Undecided,New]03:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/131535503:00
uvirtbotLaunchpad bug 1314978 in tripleo "Cloud vm not pingable after overcloud upgrade " [Undecided,New]03:00
uvirtbotLaunchpad bug 1315355 in tripleo "Upgrade of overcloud failed with "Connection to neutron failed: Maximum attempts reached"" [Undecided,New]03:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/132316703:00
uvirtbotLaunchpad bug 1323167 in tripleo "overcloud can't create an instance by neutron error" [Undecided,New]03:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/131947303:00
uvirtbotLaunchpad bug 1319473 in tripleo "devtest_testenv.sh doesn't honour overcloud_computescale or overcloud_controlscale" [Undecided,In progress]03:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/131667503:00
uvirtbotLaunchpad bug 1316675 in tripleo "Saving a devtest VM results in error" [Undecided,In progress]03:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/132009003:00
*** untriaged-bot has quit IRC03:00
uvirtbotLaunchpad bug 1320090 in tripleo "pxe_ephemeral_format': u'ext4 left on nodes after deploys deleted" [Undecided,New]03:00
openstackgerritlifeless proposed a change to openstack/tripleo-heat-templates: Add Controller scale param to merge.py  https://review.openstack.org/8808503:08
openstackgerritlifeless proposed a change to openstack/tripleo-image-elements: Add os-is-bootstrap-host element and script  https://review.openstack.org/8643503:14
greghayneslifeless: in case you run into it - im mid-saneifying https://review.openstack.org/#/c/91177/303:17
*** shakamunyi has joined #tripleo03:19
lifelessgreghaynes: I'm just looking at the bootstrap host stuff03:20
lifelessgreghaynes: which looks entirely broken right now ?03:20
lifelessgreghaynes: ahha, same stuff03:20
greghaynesYes :( I was fooled by seeing the green checks in my gerrit dashboard03:20
lifelessgreghaynes: so03:20
lifelessgreghaynes: it seems to me that the boot-stack element is the one to use for this03:20
lifelessgreghaynes: grep for bootstack in t-h-t03:20
greghaynesoh, to determine what IP is bootstrap?03:21
lifelessgreghaynes: no, as the namespace03:21
lifelessrather than boot-strap03:21
greghaynesah, that works03:22
greghayneswhat about the element, merge them also?03:22
lifelesssecondly, AIUI the issue with merge.py is getting the first element of a list of all the control nodes ?03:22
lifelessgreghaynes: I'm thinking merge the elements yes.03:22
*** noslzzp has quit IRC03:22
lifelessits not sufficiently different AFAICT03:22
greghaynesYep, and kind of puts our init magic in one place03:23
greghayneslifeless: Which issue with merge.py? (theres a couple)03:23
lifelessgreghaynes: having it spit out the one true host03:23
*** shakamunyi has quit IRC03:24
greghaynesoh. So the logic for picking one host was not going to be in merge.py or templates - it was just supplying "heres a list of everyone, and heres you"03:24
lifelessgreghaynes: yeah, I saw. let me quickly (har har) test something03:25
greghaynesOn a slightly related note - did you see the etcd discovery protocol I linked?03:26
greghaynesEven if we dont etcd, seems like their design might be worth imitating - use the seed to elect a master for our master-election system in the undercloud, and so on for overcloud03:28
lifelessI haven't read it no03:30
greghaynesbasically they have a system to use an etcd to elect an initial master for another etcd cluster03:31
*** eghobo has joined #tripleo03:33
tchayposo the seed functions as a cluster with a quorum of 1?03:33
tchaypothat seems reasonable to me - if the seed is deaded we have problems03:34
greghaynesWell it can die after, it just doesnt have the master-election issue since its size of 103:34
lifelessgreghaynes: http://paste.ubuntu.com/7533785/03:34
lifelessgreghaynes: I haven't convinced myself of its correctness by using it yet, I'm just about to do that03:34
greghaynesok, you can commit it if its good - thats different enough from mine I should just wash my changes03:35
greghaynesargh, gotta run for a few, bbiab03:35
lifelessgreghaynes: If there is an equality fn in cfn we can output a bool03:35
lifelessI think03:35
*** ramishra has joined #tripleo03:40
*** nosnos has quit IRC03:47
*** tzumainn has quit IRC03:49
*** shakamunyi has joined #tripleo03:54
lifelessgreghaynes: let me know when you are back03:55
tchaypolifeless: are you going to be around for the meeting tonight? I'm planning to be ready to drive it just in case you aren't..03:56
lifelessI hope to be03:57
lifelessbut am delighted if you want to run it03:57
StevenKOh, the first meeting I can actually attend03:57
*** akuznetsov has joined #tripleo03:57
tchaypoAs long as you have your feet on the spare pedals, I'd be happy to drive with L plates for the second time in a day...03:57
StevenKHeh heh03:58
lifelesstchaypo: cool03:58
*** rbrady has quit IRC04:03
lifelessinit-complete is the bane of my loife04:03
lifelessgreghaynes: adam_g: was there a patch to make that issue better that I've missed?04:03
*** ramishra has quit IRC04:03
*** rpodolyaka1 has joined #tripleo04:06
lifelessarggghhh04:12
openstackgerritSteve Kowalik proposed a change to openstack/os-cloud-config: Add logging to os_cloud_config/nodes  https://review.openstack.org/9605104:13
StevenKlifeless: Hmm?04:13
lifelesswho moved the control plane image id *out* of the heat environment file04:13
lifelessusability fail04:13
StevenKgit blame ?04:13
lifelessindeed04:13
lifelessDan04:13
*** matsuhashi has quit IRC04:15
lifelesssadface at Fn::Equals being *in* the heat delivered metadata04:16
lifelesshowever04:16
lifelessI have got04:16
lifelesscurrent id vs selected id in two variables04:16
lifelessgreat04:17
lifelessHeat never implemented Fn::Equals04:17
lifelessstevebaker: ^ !04:17
lifelessstevebaker: worth a bug ?04:17
lifelesshttp://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/intrinsic-function-reference-conditions.html#d0e3990504:17
stevebakerlifeless: we've discussed implementing those. We even have a blueprint https://blueprints.launchpad.net/heat/+spec/intrinsics04:20
lifelessnow there is something weird with merge.py and lists that I'm going to ignore for now04:20
lifelessbut I think this will work04:20
lifelessstevebaker: ah, ok.04:20
stevebakerlifeless: but I'm not sure it is on anybody's immediate radar04:20
lifelessthere we go04:25
lifelessgreghaynes:04:25
lifeless   "bootstrap_nodeid": "overcloud-controller0-f3k7jgpnh2bl",04:25
lifeless   "public_interface_ip": "",04:25
lifeless   "nodeid": "overcloud-controller2-h2arrn3qicjv"04:25
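[Editor's sketch] Given metadata carrying the two keys shown above, a node could decide whether it is the bootstrap host with a plain string comparison, which is what the missing Fn::Equals would otherwise have provided. This is a hedged illustration: the flat dictionary layout and the function name `is_bootstrap_host` are assumptions, and only the key names and values come from the log.

```python
import json

# Values copied from the log; the surrounding structure is assumed.
metadata_text = """
{
  "bootstrap_nodeid": "overcloud-controller0-f3k7jgpnh2bl",
  "public_interface_ip": "",
  "nodeid": "overcloud-controller2-h2arrn3qicjv"
}
"""


def is_bootstrap_host(metadata):
    """Return True when this node's id matches the selected bootstrap id."""
    return metadata["nodeid"] == metadata["bootstrap_nodeid"]


metadata = json.loads(metadata_text)
this_node_bootstraps = is_bootstrap_host(metadata)
```

For the values pasted above the ids differ, so this node (controller2) would not run the cluster initialization.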
*** matsuhashi has joined #tripleo04:26
*** nosnos has joined #tripleo04:27
*** shakamunyi has quit IRC04:27
openstackgerritlifeless proposed a change to openstack/tripleo-heat-templates: Export new bootstack keys to the overcloud.  https://review.openstack.org/9605204:27
lifelessI have to bounce for baby time, but I'll be back.04:29
SpamapSlifeless: I'm here now. About to be surrounded by new Heat devs though. Wassup?04:38
*** lazy_prince has joined #tripleo04:50
*** shakamunyi has joined #tripleo04:52
*** shakamunyi has quit IRC04:54
lifelessSpamapS: paging in04:55
lifelessSpamapS: oh yeah, was going to ask about your patch that moves the hosts calculations around04:56
lifelessand something else I don't recall right now05:03
lifelessSpamapS: hows the bootstrapping going ?05:03
openstackgerritAdam Vinsh proposed a change to openstack/tripleo-image-elements: Added dependencies to install ceilometer  https://review.openstack.org/9605505:06
*** shakamunyi has joined #tripleo05:11
*** lokesh184 has joined #tripleo05:12
*** dshulyak_ has joined #tripleo05:15
*** ddieterly has quit IRC05:22
*** ddieterly has joined #tripleo05:22
*** akuznetsov has quit IRC05:23
*** akuznetsov has joined #tripleo05:27
*** shakamunyi has quit IRC05:40
*** shakamunyi has joined #tripleo05:40
*** rpodolyaka1 has quit IRC05:43
greghayneslifeless: ah, nice05:46
greghayneslifeless: I think the hosts moving got replaced by nova api memoization05:47
*** rlandy has joined #tripleo05:47
*** rcarrillocruz has joined #tripleo05:49
*** rcarrill` has quit IRC05:51
greghayneslifeless: are you working on the element changes for that heat patch or should I do it?05:53
lifelessgreghaynes: we should move the hosts anyway05:56
lifelessgreghaynes: if you're still doing stuff tonight, that would be a good thing to do05:56
* StevenK frowns at the function he just wrote.05:56
lifelessgreghaynes: I'm just tweaking the heat patch anyhow05:56
greghaynesok, ill do the element bit05:56
openstackgerritlifeless proposed a change to openstack/tripleo-heat-templates: Export new bootstack keys to the overcloud.  https://review.openstack.org/9605205:59
lifelessgreghaynes: don't forget to add them to seed-config too05:59
greghaynesyerp, just saw that comment05:59
xuhaiweihi, has anyone met this situation? I build the devtest environment and ssh to seed, under, and overcloud are all OK, but after I quit under and overcloud, after a while (maybe a few hours) I can't ssh to undercloud and overcloud again06:00
lifelessxuhaiwei: what do you mean by 'quit under and overcloud'06:00
xuhaiweiexit them, don't ssh06:01
xuhaiweicome back to host06:01
xuhaiweii can still ssh seed cloud, and on seed run 'nova list', the undercloud is still active06:02
lifelessare they real hardware or emulated?06:02
xuhaiweiemulated06:02
lifelessdo you still have your route to the bm bridge?06:02
lifelesscan you ping the undercloud node?06:03
xuhaiweiI can't ping undercloud and overcloud06:03
xuhaiweiand in my route table, i can see 0.0.0.0         10.21.43.254    0.0.0.0         UG    100    0        0 brbm 10.21.40.0      0.0.0.0         255.255.252.0   U     0      0        0 brbm 192.0.2.0       192.168.122.197 255.255.255.0   UG    0      0        0 virbr0 192.168.122.0   0.0.0.0         255.255.255.0   U     0      0        0 virbr006:04
*** rpodolyaka1 has joined #tripleo06:04
*** rpodolyaka1 has quit IRC06:05
xuhaiweithis is not the first time I have met this situation; I have rebuilt the devtest environment several times, and every time it is the same06:05
*** eghobo has quit IRC06:05
lifelessif you tcpdump brbm and then try to ping, can you see the ICMP requests?06:06
xuhaiweibut if i keep ssh in undercloud( don't leave it since ssh it for the first time), i can always stay in it06:06
xuhaiwei$ tcpdump brbm tcpdump: no suitable device found06:07
xuhaiweidoes this mean brbm is gone?06:07
StevenKxuhaiwei: No, you're using tcpdump wrong :-)06:08
StevenKxuhaiwei: sudo tcpdump -i brbm -p -n06:08
xuhaiweiStevenK: thank you :(06:09
openstackgerritlifeless proposed a change to openstack/tripleo-heat-templates: Add initial support for galera clustering  https://review.openstack.org/8388306:15
lifelessgreghaynes: ^ old / bad deps removed06:15
xuhaiweiwhen running tcpdump i got this message full of the screen06:16
*** rakesh_hs has joined #tripleo06:16
SpamapSlifeless: bootstrapping is going very well06:16
xuhaiwei15:43:42.166034 IP 10.21.41.72.22 > 10.21.42.98.33591: Flags [P.], seq 1433824:1434016, ack 145, win 216, options [nop,nop,TS val 103173044 ecr 284062690], length 19206:16
SpamapSlifeless: regarding moving the inputs around, stevebaker has a patch into heat to memoize the calls which should result in a much better performance and we get to keep our nicely expressed templates.06:17
lifelessxuhaiwei: what ip address is your undercloud on ?06:17
xuhaiweilifeless: the host's ip?06:17
lifelessSpamapS: it seems nicer to me to express globally global things06:17
lifelessxuhaiwei: yes, from nova list06:17
SpamapSlifeless: we also discussed at the summit implementing variables as a separate recursive thing from software config so that we don't have to express the same thing many times06:17
lifelessgreat06:18
xuhaiweithe undercloud's ip is 192.0.2.306:18
SpamapSlifeless: basically it was a poor attempt at variables. We need actual variables.06:18
xuhaiwei10.21.41.72 is the host's ip06:18
lifelessxuhaiwei: ok so your tcpdump should be tcpdump -ni brbm host 192.0.2.306:19
lifelessyou need promiscuous on06:19
*** dshulyak_ has quit IRC06:19
lifelessSpamapS: we do, I agree, but even so06:19
StevenKlifeless: Blah, didn't know that. -i <foo> -p -n has been in my muscle memory for so long06:19
lifelessSpamapS: the config as a global expression seems like the right place to put global expressions06:19
lifelessStevenK: its appropriate when you're on the interface you want to dump, but we're not here :>06:20
lifelessStevenK: if you see what I mean06:20
StevenKHeh, right06:20
*** dkehn has quit IRC06:20
xuhaiweilifeless: running tcpdump and then ping 192.0.2.3, there is no reaction to tcpdump06:20
lifelessxuhaiwei: what is the output of 'ip route'06:21
xuhaiweidefault via 10.21.43.254 dev brbm  metric 100 10.21.40.0/22 dev brbm  proto kernel  scope link  src 10.21.41.72 192.0.2.0/24 via 192.168.122.197 dev virbr0 192.168.122.0/24 dev virbr0  proto kernel  scope link  src 192.168.122.106:21
lifelessxuhaiwei: use a pastebin please - thats unreadable06:21
xuhaiweiok06:22
lifelessalso the output of ip route get 192.0.2.306:22
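[Editor's sketch] What `ip route get` reports can be approximated by a longest-prefix match over the routes pasted above. The sketch below uses only those four routes and ignores metrics, policy routing and everything else the kernel considers, so treat it as an approximation; `ROUTES` and `lookup` are names made up for the illustration.

```python
import ipaddress

# (prefix, device, gateway) taken from the 'ip route' output in the log.
ROUTES = [
    ("0.0.0.0/0", "brbm", "10.21.43.254"),
    ("10.21.40.0/22", "brbm", None),
    ("192.0.2.0/24", "virbr0", "192.168.122.197"),
    ("192.168.122.0/24", "virbr0", None),
]


def lookup(dest):
    """Pick the matching route with the longest prefix for dest."""
    addr = ipaddress.ip_address(dest)
    matches = []
    for prefix, dev, gateway in ROUTES:
        network = ipaddress.ip_network(prefix)
        if addr in network:
            matches.append((network, dev, gateway))
    network, dev, gateway = max(matches, key=lambda m: m[0].prefixlen)
    return str(network), dev, gateway


undercloud_route = lookup("192.0.2.3")
```

Under these assumptions the undercloud address 192.0.2.3 matches 192.0.2.0/24 via 192.168.122.197 on virbr0 rather than brbm, which may be why a capture on brbm showed no ICMP traffic; the log itself does not confirm this.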
*** dkehn has joined #tripleo06:22
lifelessSpamapS: but it sounds like you're unhappy with that patch as it stands ?06:22
SpamapSlifeless: it is not global06:22
lifelessSpamapS: the config object?06:23
SpamapSlifeless: we are using it with the controller address today, but tomorrow it will be the haproxy endpoint06:23
SpamapSlifeless: and the next day it might be a separate mysql server06:23
xuhaiweilifeless:http://paste.openstack.org/06:23
lifelessSpamapS: I don't understand06:23
lifelessxuhaiwei: I need the actual url, not the website :)06:24
lifelessxuhaiwei: after you click 'paste'06:24
xuhaiweisorry06:24
SpamapSlifeless: basically to support reusing the config object in multiple topologies, we have to wait until deployment time to decide what values to lookup and inject into the config.06:24
xuhaiweihttp://paste.openstack.org/show/81824/06:24
SpamapSlifeless: Or, we have to go back to Type: FileInclude for reusability.06:25
lifelessSpamapS: right, so things that are per-node are inputs06:25
lifelessSpamapS: things that are the same for all nodes using the config are directly expressable06:25
lifelessSpamapS: or - I'm misunderstanding something ?06:25
lifelessSpamapS: the deployment binds node-local values into the global config06:26
SpamapSlifeless: Yes, you're misunderstanding that the config is meant to be reusable across multiple topologies. If we could compose a config, out of another config, then we would have that. But we can't. :-/06:26
lifelessSpamapS: at what point do the expressions in the config as-written get evaluated ?06:26
SpamapSlifeless: when they can be evaluated according to the graph06:27
SpamapSso as soon as all the parents are active06:27
SpamapSbut this is missing the point. If we only were ever going to have one topology.. agreed. But I'd think at some point we might want to be able to run rabbitmq and mysql on their own servers.06:27
lifelessare they evaluated once for all the deploys-of-the-config or just once?06:28
lifelessSpamapS: I know that, but thats a separate question AFAICT06:28
SpamapSif we're expressing the  compute node config with a reference to the controller IP for rabbitMQ .. then we cannot reuse that config definition.06:28
lifelessSpamapS: since we're *only* talking about the scatter-gather of *all hosts*06:28
SpamapSso anyway... the performance problem is addressed under the covers06:30
SpamapSthe expressability problem can be given more thought06:30
SpamapSanyway, back to bootstrapping06:31
lifelessSpamapS: see e.g. my latest patch in tht06:36
lifelessyay p.o.o.your 500's thrill me06:37
lifelessxuhaiwei: you did not include the 'ip route get 192.0.2.3' command output06:39
lifelessthough I would expect it to be unsurprising06:39
lifelessxuhaiwei: now, can you ping 192.0.2.1 ?06:39
xuhaiweii can ping 192.0.2.106:39
lifelessok06:40
lifelessnow log into the seed06:40
lifelessand from there try to ping 192.0.2.306:40
xuhaiweibut as i ping 192.0.2.1 tcpdump still got nothing06:40
lifelessalso check in virsh list that the nodes are still running06:40
xuhaiweion seed , can't ping 192.0.2.306:41
xuhaiweilifeless: yes, virsh list shows all the nodes are running (under and over)06:42
lifelessok06:42
lifelessso check the ip route from within the seed06:42
lifelessand connect to the console of e.g. the undercloud and login with stack / stack to debug there06:42
xuhaiweilifeless: the ip route http://paste.openstack.org/show/81825/06:44
*** e0ne has joined #tripleo06:44
*** boris-42 has quit IRC06:45
openstackgerritGregory Haynes proposed a change to openstack/tripleo-image-elements: Add os-is-bootstrap-host element and script  https://review.openstack.org/8643506:45
dshulyakgreghaynes: hi, i'm ok with it, are you going to use heat Parameters or some kind of StructuredConfig ?06:45
lifelessgreghaynes: oh, we'll need to add this to the undercloud too06:46
greghayneslifeless: yerp06:46
greghaynesdshulyak: I was thinking parameters06:46
lifelessgreghaynes: insta-review done06:47
greghaynesinsta-1 ;)06:47
*** e0ne has quit IRC06:48
*** e0ne has joined #tripleo06:49
* TheJulia wipes sleep from her eyes06:49
*** e0ne has quit IRC06:49
*** boris-42 has joined #tripleo06:49
greghaynesFor the midnight meeting?06:49
*** e0ne has joined #tripleo06:50
*** rdopiera has joined #tripleo06:50
lifelessTheJulia: *now* you need that caffeine06:50
TheJuliayeah, 3am06:50
greghaynesoh wow06:50
xuhaiweilifeless: you mean connect to the console of the undercloud? use nova get-vnc-console?06:50
TheJulialifeless: already got it06:50
StevenKTheJulia: Eek, you don't need to attend meetings if they're horrible times06:51
*** e0ne has quit IRC06:51
*** e0ne has joined #tripleo06:51
StevenKTheJulia: East coast of the US?06:52
TheJuliaIt's still a first for me, so might as well, besides my sleep cycle could use some reverse adjustment.  I was starting to wake up at 4:30-5 AM on my own anyway06:52
TheJuliaYeah, North Carolina06:52
openstackgerritlifeless proposed a change to openstack/tripleo-heat-templates: Export new bootstack keys for cluster init.  https://review.openstack.org/9605206:52
lifelessxuhaiwei: no, virsh or virt-manager directly06:52
StevenKIs the meeting in -alt or -meeting itself?06:53
lifelessgreghaynes: ^ should do it06:53
greghayneswah?06:53
StevenKTheJulia: This will be my first meeting, since all of the previous meetings were at 5am or so06:54
lifelessgreghaynes: we're going to be consulting this in every cluster, regardless of scale :)06:54
lifelesstchaypo: so - I will be putting C to bed for the first 10m or so06:54
lifelesstchaypo: I hope :).06:54
devanandaStevenK: hi! https://bugs.launchpad.net/ironic/+bug/1315224 was just pointed out to me, and it looks like you filed it, so i'll ask you :)06:54
uvirtbotLaunchpad bug 1315224 in ironic "Ironic should set the node power-state to off when registering a node" [Undecided,New]06:54
lifelessdevananda: you'll ask him why you aren't asleep ?06:55
StevenKBwahaha06:55
devanandathat too06:55
StevenKdevananda: Ask me why I filed it? That's easy, lifeless asked me to. :-P06:55
devanandahah06:55
*** e0ne has quit IRC06:56
lifelessdevananda: because in a rack of unknown state, having machines running that aren't undergoing maintenance or actively deployed is a bad idea06:56
devanandaStevenK: so you (or lifeless) expect a newly-registered node to be immediately powered off06:56
devanandalifeless: we covered that. there's a periodic task to ensure state06:56
lifelessdevananda: and we found that ironic takes quite some time to assert the power state post-registration - long enough for deploys to get broken by it06:56
devanandagotcha06:57
lifelessdevananda: so someone put a patch into tripleo-incubator to power off nodes post registration06:57
devanandathat was my assumption based on reading the bug06:57
lifelessdevananda: NobodyCam I believe in fact06:57
devanandayep06:57
lifelessdevananda: I believe that workarounds in incubator are a misfeature :)06:57
devanandalifeless: you don't want to be a special case? :)06:57
lifelessplease god no06:57
devananda*incubator to be06:57
*** eghobo has joined #tripleo06:58
devanandaok - thanks. i have what I wanted (an understanding of where this issue came from)06:58
devanandaenough to comment and triage it06:58
*** rcarrill` has joined #tripleo06:59
tchaypoMeeting time!07:00
rpodolyakamorning all07:01
StevenKtchaypo: Which is in which channel?07:01
tchaypoStevenK: the usual #openstack-meeting-alt07:01
*** rcarrillocruz has quit IRC07:01
*** lifeless changes topic to " https://etherpad.openstack.org/p/tripleo-ci-r1-trusty | tripleo-cd running preserve-ephemeral WIP patches and https://review.openstack.org/#/c/62042/ | Using OpenStack to deploy OpenStack;meetings Tuesday 1900//0700 UTC in #openstack-meeting-alt"07:02
*** jistr has joined #tripleo07:05
*** e0ne has joined #tripleo07:05
*** ddieterly has quit IRC07:06
*** mrunge has joined #tripleo07:06
*** ddieterly has joined #tripleo07:06
*** derekh_ has joined #tripleo07:08
lifelessgreghaynes: what made you say 'wah?'07:08
greghayneslifeless: confusion, I thought you were saying I should drive meeting07:09
lifelessSpamapS: is https://etherpad.openstack.org/p/tripleo-ci-r1-trusty up to date?07:09
lifelessgreghaynes: ah no; was saying that patch should make the uc work too07:09
*** e0ne has quit IRC07:10
*** e0ne has joined #tripleo07:11
*** mrunge has quit IRC07:11
*** mrunge has joined #tripleo07:12
*** e0ne has quit IRC07:13
*** pblaho has joined #tripleo07:13
*** ifarkas has joined #tripleo07:14
*** rcarrill` is now known as rcarrillocruz07:15
openstackgerritDmitry Shulyak proposed a change to openstack/tripleo-specs: Haproxy configuration options  https://review.openstack.org/9490707:19
SpamapSlifeless: I haven't done anything else so yes it should be up to date.07:19
*** e0ne has joined #tripleo07:20
*** jcoufal has joined #tripleo07:28
*** lokesh184 has quit IRC07:33
*** lazy_prince has quit IRC07:34
*** lazy_prince has joined #tripleo07:35
*** e0ne has quit IRC07:36
lifelessneutron subnet-show is barfing on the ext-net07:36
lifelesssubnet-list isn't listing it07:36
lifelessnet-list lists it07:36
*** e0ne has joined #tripleo07:36
*** giulivo has joined #tripleo07:38
*** e0ne has quit IRC07:39
*** eghobo has quit IRC07:44
*** jprovazn has joined #tripleo07:46
*** lokesh184 has joined #tripleo07:50
marioslifeless: what i mean is (less specific axes) the latest specs in advanced services (service chaining, external ports, traffic steering) are much more abstract and implementation agnostic07:55
lifelessmarios: yeah, been watching that07:58
lifelessmarios: we'll see if they get more cores and review bandwidth though :)07:58
openstackgerritNicholas Randon proposed a change to openstack/tripleo-incubator: Choose whether to deploy/update using heat.  https://review.openstack.org/9234407:58
lifelessderekh_: so07:58
marioslifeless: on one hand great/interesting/exciting. on other hand, there are so many other things to fix first so... as you say07:58
derekh_lifeless: SpamapS subnetshow works for me07:58
derekh_neutron subnet-show fc0673ec-b4a6-4802-8454-02d5937cd1a307:59
lifelessderekh_: there are 40 active nodes other than te-broker07:59
lifelessderekh_: none have a public IP07:59
derekh_lifeless: yup, which presumably is why nodepool isn't using them07:59
lifeless2014-05-28 08:00:10.727 2582 TRACE nova.api.openstack RuntimeError: maximum recursion depth exceeded while getting the str of an object08:00
lifelessis getting logged in the nova-api log08:00
lifelessin 2014-05-28 08:00:26.241 2582 TRACE nova.api.openstack   File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/compute/api.py", line 371, in _check_num_instances_quota08:00
lifeless2014-05-28 08:00:26.241 2582 TRACE nova.api.openstack     min_count, allowed)08:00
tchaypoTheJulia: where in the world are you?08:00
TheJuliaNorth Carolina08:01
tchaypoI'm guessing western europe?08:01
tchaypowell, I was a bit wrong08:01
tchaypoyou're handy for the mid-cycle though08:01
*** dtantsur|afk is now known as dtantsur08:01
tchaypookay, I'm in a moving car, so I'm going to stop looking at the screen now08:02
TheJuliaenjoy08:02
mariostchaypo: irc chairing like a boss08:03
derekh_lifeless: so that recursive _check_num_instances_quota call only happens inside the "except exception.OverQuota as exc:" branch08:04
derekh_lifeless: so it's over quota?08:04
derekh_or trying to go over quota08:06
derekh_:q08:06
* TheJulia goes back to sleep08:07
derekh_nodepool could have lost track of some of the instances it started08:08
derekh_brb, tethering off phone and want to get onto home network08:09
*** derekh_ has quit IRC08:09
*** derekh_ has joined #tripleo08:11
*** matsuhashi has quit IRC08:14
lifelessderekh_: could be08:14
*** matsuhashi has joined #tripleo08:15
*** viktors|afc is now known as viktors08:16
*** matsuhashi has quit IRC08:17
*** matsuhashi has joined #tripleo08:17
*** andreaf has joined #tripleo08:18
*** IvanBerezovskiy has joined #tripleo08:27
*** lucasagomes has joined #tripleo08:33
openstackgerritA change was merged to openstack/tripleo-incubator: Support --help as well as -h in devtest.sh.  https://review.openstack.org/9598208:42
*** shakamunyi has quit IRC08:43
*** ddieterly has quit IRC08:49
*** ddieterly has joined #tripleo08:50
*** e0ne has joined #tripleo08:58
openstackgerritDmitry Shulyak proposed a change to openstack/tripleo-image-elements: Change stunnel priority and binding addresses  https://review.openstack.org/9566308:59
*** untriaged-bot has joined #tripleo09:00
untriaged-botUntriaged bugs so far:09:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/132194309:00
uvirtbotLaunchpad bug 1321943 in tripleo "Ceilometer Swift polling on overcloud control node fails with a 403 forbidden error" [Undecided,In progress]09:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/131876709:00
uvirtbotLaunchpad bug 1318767 in tripleo "apache element SSL cert check fails" [Undecided,New]09:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/131497809:00
uvirtbotLaunchpad bug 1314978 in tripleo "Cloud vm not pingable after overcloud upgrade " [Undecided,New]09:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/131535509:00
uvirtbotLaunchpad bug 1315355 in tripleo "Upgrade of overcloud failed with "Connection to neutron failed: Maximum attempts reached"" [Undecided,New]09:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/132316709:00
uvirtbotLaunchpad bug 1323167 in tripleo "overcloud can't create an instance by neutron error" [Undecided,New]09:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/131947309:00
uvirtbotLaunchpad bug 1319473 in tripleo "devtest_testenv.sh doesn't honour overcloud_computescale or overcloud_controlscale" [Undecided,In progress]09:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/131667509:00
uvirtbotLaunchpad bug 1316675 in tripleo "Saving a devtest VM results in error" [Undecided,In progress]09:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/132009009:00
uvirtbotLaunchpad bug 1320090 in tripleo "pxe_ephemeral_format': u'ext4 left on nodes after deploys deleted" [Undecided,New]09:00
*** untriaged-bot has quit IRC09:00
openstackgerritLoganathan Parthipan proposed a change to openstack/tripleo-incubator: scripts to persist a running devtest to disk  https://review.openstack.org/9257509:01
*** e0ne_ has joined #tripleo09:03
*** pelix has joined #tripleo09:03
*** e0ne has quit IRC09:06
dshulyakgreghaynes: so are you still going to parametrize haproxy configuration :) ? i can add params, but i thought that it is not good09:12
*** jp_at_hp has joined #tripleo09:12
*** lazy_prince has quit IRC09:19
*** lokesh184 has quit IRC09:20
*** rlandy has quit IRC09:21
*** lokesh184 has joined #tripleo09:22
*** martyntaylor has joined #tripleo09:22
*** lazy_prince has joined #tripleo09:22
*** yamahata has quit IRC09:22
openstackgerritJiri Stransky proposed a change to openstack/tripleo-specs: Tuskar multiple clouds  https://review.openstack.org/9611209:27
openstackgerritLin Tan proposed a change to openstack/diskimage-builder: Correct the substitution of suffix from qcow2 to raw  https://review.openstack.org/9611409:31
openstackgerritLin Tan proposed a change to openstack/diskimage-builder: Correct the substitution of suffix from qcow2 to raw  https://review.openstack.org/9611409:33
openstackgerritLin Tan proposed a change to openstack/diskimage-builder: Correct the substitution of suffix from qcow2 to raw  https://review.openstack.org/9611409:34
*** shakamunyi has joined #tripleo09:40
*** shakamunyi has quit IRC09:45
openstackgerritNicholas Randon proposed a change to openstack/tripleo-heat-templates: Move to software-config for the undercloud.  https://review.openstack.org/9331909:51
*** boris-42 has quit IRC09:52
openstackgerritMartin Geisler proposed a change to openstack/tuskar: Remove unnecessary coding lines  https://review.openstack.org/9612309:52
openstackgerritMartin Geisler proposed a change to openstack/tuskar: Use Emacs-friendly file variable to set file encoding  https://review.openstack.org/9588609:53
openstackgerritDmitry Shulyak proposed a change to openstack/tripleo-heat-templates: Haproxy configuration  https://review.openstack.org/9355409:53
dshulyakgreghaynes: updated haproxy configuration with params )09:54
*** dtantsur is now known as dtantsur|lunch09:56
*** akrivoka has joined #tripleo10:07
*** lokesh184 has quit IRC10:12
*** boris-42 has joined #tripleo10:14
*** jtomasek has joined #tripleo10:20
*** e0ne_ has quit IRC10:21
*** e0ne has joined #tripleo10:22
*** e0ne has quit IRC10:26
*** matsuhashi has quit IRC10:48
*** markmc has joined #tripleo10:56
*** e0ne has joined #tripleo10:59
*** e0ne has quit IRC11:04
*** lokesh184 has joined #tripleo11:05
*** rakesh_hs has quit IRC11:15
*** lucasagomes is now known as lucas-hungry11:15
*** rakesh_hs has joined #tripleo11:15
giulivojprovazn, a question regarding https://bugs.launchpad.net/tripleo/+bug/122631011:19
uvirtbotLaunchpad bug 1226310 in tripleo "Nova bm operations fail when LIBVIRT_DEFAULT_URI not set" [Medium,Triaged]11:19
giulivoif I wanted to submit something small to fix it11:19
giulivowould you actually approach it as a config variable for our tripleo user, or would it make more sense to fix that in nova-bm itself?11:20
*** tzumainn has joined #tripleo11:21
jprovazngiulivo: ah, I forgot about this issue. Setting variable for tripleo user was NACKed before. A fix on nova side would be much better11:22
giulivojprovazn, ok, thanks for pointing that out :)11:22
jprovaznnp11:22
*** dtantsur|lunch is now known as dtantsur11:29
*** e0ne has joined #tripleo11:32
dshulyakjprovazn: hi, are you going to restore https://review.openstack.org/#/c/61376/ ?11:37
jprovazndshulyak: hi, yes11:37
jprovazndshulyak: looks like easiest solution11:38
jprovaznlet me reopen/update it now11:38
*** e0ne_ has joined #tripleo11:45
*** e0ne has quit IRC11:49
*** lokesh184 has quit IRC11:52
*** morazi has joined #tripleo11:55
openstackgerritLadislav Smola proposed a change to openstack/tripleo-image-elements: Properly enabling and restarting snmpd  https://review.openstack.org/9568911:56
*** rlandy has joined #tripleo11:57
*** dprince has joined #tripleo12:10
openstackgerritA change was merged to openstack/tripleo-image-elements: Adding -x to keystone orc scripts  https://review.openstack.org/9076012:10
*** gcha has joined #tripleo12:14
*** funzo_ is now known as funzo12:16
*** pblaho has quit IRC12:18
*** julim has joined #tripleo12:21
*** weshay has joined #tripleo12:23
*** lucas-hungry is now known as lucasagomes12:23
openstackgerritA change was merged to openstack/tripleo-image-elements: indent using 4 spaces (3/3)  https://review.openstack.org/9320812:25
Ngmorning12:26
slaglegood morning12:27
*** martyntaylor has left #tripleo12:30
*** rbrady has joined #tripleo12:31
openstackgerritRadomir Dopieralski proposed a change to openstack/tuskar-ui: Tests are broken since Horizon started using angular-cookies  https://review.openstack.org/9615212:33
*** rakesh_hs2 has joined #tripleo12:33
openstackgerritDan Prince proposed a change to openstack/diskimage-builder: dpkg: support pkg-map in bin/install-packages  https://review.openstack.org/9160112:34
openstackgerritDan Prince proposed a change to openstack/diskimage-builder: Yum: support pkg-map in bin/install-packages  https://review.openstack.org/9160012:34
openstackgerritDan Prince proposed a change to openstack/diskimage-builder: opensuse: support pkg-map in bin/install-packages  https://review.openstack.org/9160212:34
openstackgerritDan Prince proposed a change to openstack/diskimage-builder: Update base element to make use of pkg-map  https://review.openstack.org/9188012:34
openstackgerritDan Prince proposed a change to openstack/diskimage-builder: Add pkg-map element.  https://review.openstack.org/9159812:34
openstackgerritDan Prince proposed a change to openstack/diskimage-builder: Set DISTRO_NAME in OS environment.d  https://review.openstack.org/9159912:34
*** rakesh_hs has quit IRC12:35
openstackgerritDan Prince proposed a change to openstack/diskimage-builder: Name deploy-ironic and deploy-baremetal files uniq  https://review.openstack.org/9581212:36
openstackgerritDan Prince proposed a change to openstack/diskimage-builder: Add top level 'tests' dir for element testing  https://review.openstack.org/9581312:36
openstackgerritDan Prince proposed a change to openstack/diskimage-builder: Name 01-install-bin uniquely  https://review.openstack.org/9581012:36
openstackgerritDan Prince proposed a change to openstack/diskimage-builder: Name 10-rhel-cloud-image uniquely  https://review.openstack.org/9581112:36
openstackgerritDan Prince proposed a change to openstack/diskimage-builder: Name 99-setup-first-boot uniquely  https://review.openstack.org/9580812:36
openstackgerritDan Prince proposed a change to openstack/diskimage-builder: Delete redhat-common 00-usr-local-bin-secure-path  https://review.openstack.org/9580912:36
*** jdob has joined #tripleo12:37
*** lazy_prince has quit IRC12:37
*** lazy_prince has joined #tripleo12:38
*** andreaf has quit IRC12:41
lxsligreghaynes: I got a clean Jenkins on https://review.openstack.org/#/c/93041/ , turn your +2 into +2/+A please?12:48
*** nosnos has quit IRC12:50
*** jcoufal has quit IRC12:56
openstackgerritNicholas Randon proposed a change to openstack/tripleo-incubator: Ironic "lock already held" when powering off nodes  https://review.openstack.org/9615412:56
*** jcoufal has joined #tripleo12:57
*** lazy_prince has quit IRC13:05
*** edmund has quit IRC13:09
*** edmund has joined #tripleo13:09
*** ramishra has joined #tripleo13:15
*** mrunge has quit IRC13:17
*** ddieterly has quit IRC13:18
openstackgerritA change was merged to openstack/tuskar-ui: Separate out BareMetalNode from IronicNode  https://review.openstack.org/9588113:21
*** rakesh_hs2 has quit IRC13:23
*** rakesh_hs has joined #tripleo13:24
*** gcha has quit IRC13:26
openstackgerritDmitry Shulyak proposed a change to openstack/tripleo-heat-templates: WIP: Haproxy configuration  https://review.openstack.org/9355413:27
openstackgerritLadislav Smola proposed a change to openstack/tripleo-image-elements: Storing SNMPd credentials in Ceilometer  https://review.openstack.org/9037413:29
openstackgerritLadislav Smola proposed a change to openstack/tripleo-image-elements: Storing SNMPd credentials in Ceilometer  https://review.openstack.org/9037413:32
*** ifarkas_ has joined #tripleo13:34
*** ifarkas has quit IRC13:34
*** ddieterly has joined #tripleo13:49
*** andreaf has joined #tripleo13:52
*** jcoufal has quit IRC13:54
*** jcoufal has joined #tripleo13:55
*** jcoufal has quit IRC13:56
*** shakamunyi has joined #tripleo13:56
*** jcoufal has joined #tripleo13:56
openstackgerritJan Provaznik proposed a change to openstack/tripleo-image-elements: Update openstack services to listen on stunnel connect port  https://review.openstack.org/6137613:57
*** BadCub has joined #tripleo14:00
openstackgerritLadislav Smola proposed a change to openstack/tripleo-incubator: Adding Undercloud Ceilometer config element  https://review.openstack.org/9463714:05
openstackgerritLadislav Smola proposed a change to openstack/tripleo-incubator: Generating of password for SNMPd  https://review.openstack.org/9483814:05
*** matty_dubs|gone is now known as matty_dubs14:05
*** akuznetsov has quit IRC14:06
*** akuznetsov has joined #tripleo14:06
*** edmund has quit IRC14:19
*** yamahata has joined #tripleo14:20
*** e0ne_ has quit IRC14:21
*** e0ne has joined #tripleo14:21
openstackgerritDmitry Shulyak proposed a change to openstack/tripleo-image-elements: Change horizon binding address to local-ipv4 in haproxy case  https://review.openstack.org/9108914:22
openstackgerritStuart McLaren proposed a change to openstack/tripleo-incubator: Run the overcloud with an SSL enabled public IP  https://review.openstack.org/8509814:30
*** jprovazn has quit IRC14:31
*** jcoufal has quit IRC14:33
openstackgerritA change was merged to openstack/tripleo-incubator: Properly default MysqlInnodbBufferPoolSize (overcloud)  https://review.openstack.org/9304114:38
*** eghobo has joined #tripleo14:39
*** dtantsur is now known as dtantsur|afk14:39
openstackgerritKiall Mac Innes proposed a change to openstack/diskimage-builder: VM element: Enable serial console on Debian  https://review.openstack.org/9617714:40
*** eghobo has quit IRC14:42
*** edmund has joined #tripleo14:48
*** rakesh_hs has quit IRC14:50
openstackgerritAlexis Lee proposed a change to openstack/tripleo-incubator: Properly default MysqlInnodbBufferPoolSize (undercloud) v2  https://review.openstack.org/9485614:58
*** untriaged-bot has joined #tripleo15:00
untriaged-botUntriaged bugs so far:15:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/132194315:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/131876715:00
uvirtbotLaunchpad bug 1321943 in tripleo "Ceilometer Swift polling on overcloud control node fails with a 403 forbidden error" [Undecided,In progress]15:00
uvirtbotLaunchpad bug 1318767 in tripleo "apache element SSL cert check fails" [Undecided,New]15:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/131497815:00
uvirtbotLaunchpad bug 1314978 in tripleo "Cloud vm not pingable after overcloud upgrade " [Undecided,New]15:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/131535515:00
uvirtbotLaunchpad bug 1315355 in tripleo "Upgrade of overcloud failed with "Connection to neutron failed: Maximum attempts reached"" [Undecided,New]15:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/132316715:00
uvirtbotLaunchpad bug 1323167 in tripleo "overcloud can't create an instance by neutron error" [Undecided,New]15:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/131947315:00
uvirtbotLaunchpad bug 1319473 in tripleo "devtest_testenv.sh doesn't honour overcloud_computescale or overcloud_controlscale" [Undecided,In progress]15:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/131667515:00
uvirtbotLaunchpad bug 1316675 in tripleo "Saving a devtest VM results in error" [Undecided,In progress]15:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/132009015:00
uvirtbotLaunchpad bug 1320090 in tripleo "pxe_ephemeral_format': u'ext4 left on nodes after deploys deleted" [Undecided,New]15:00
*** untriaged-bot has quit IRC15:00
*** rdopiera has quit IRC15:03
openstackgerritNicholas Randon proposed a change to openstack/tripleo-incubator: Ironic "lock already held" when powering off nodes  https://review.openstack.org/9615415:12
*** shakamunyi has quit IRC15:16
*** shakamunyi has joined #tripleo15:18
*** noslzzp has joined #tripleo15:33
*** pblaho has joined #tripleo15:35
*** pblaho has quit IRC15:35
*** pblaho has joined #tripleo15:35
openstackgerritA change was merged to openstack/tripleo-incubator: Ironic "lock already held" when powering off nodes  https://review.openstack.org/9615415:47
*** e0ne has quit IRC15:47
*** akrivoka has quit IRC15:47
*** e0ne has joined #tripleo15:48
*** e0ne_ has joined #tripleo15:49
*** jistr has quit IRC15:50
*** akrivoka has joined #tripleo15:50
*** e0ne has quit IRC15:52
*** eghobo has joined #tripleo15:54
*** derekh_ has quit IRC16:00
*** derekh_ has joined #tripleo16:01
*** cwolferh has joined #tripleo16:03
openstackgerritBen Nemec proposed a change to openstack/diskimage-builder: Factor out error behavior in dib-lint  https://review.openstack.org/9585016:05
derekh_lifeless: R1 is back in use now, check back at http://goodsquishy.com/downloads/s_tripleo-jobs.html later for pass rates to see how it's doing16:07
*** ifarkas_ has quit IRC16:09
*** ifarkas has joined #tripleo16:13
*** IvanBerezovskiy has left #tripleo16:15
*** matty_dubs is now known as matty_dubs|lunch16:17
*** e0ne_ has quit IRC16:19
*** e0ne has joined #tripleo16:19
openstackgerritDan Prince proposed a change to openstack/diskimage-builder: dpkg: support pkg-map in bin/install-packages  https://review.openstack.org/9160116:22
openstackgerritDan Prince proposed a change to openstack/diskimage-builder: Yum: support pkg-map in bin/install-packages  https://review.openstack.org/9160016:22
openstackgerritDan Prince proposed a change to openstack/diskimage-builder: opensuse: support pkg-map in bin/install-packages  https://review.openstack.org/9160216:22
openstackgerritDan Prince proposed a change to openstack/diskimage-builder: Update base element to make use of pkg-map  https://review.openstack.org/9188016:22
openstackgerritDan Prince proposed a change to openstack/diskimage-builder: Add pkg-map element.  https://review.openstack.org/9159816:22
openstackgerritDan Prince proposed a change to openstack/diskimage-builder: Set DISTRO_NAME in OS environment.d  https://review.openstack.org/9159916:22
*** dprince has quit IRC16:22
*** derekh_ has quit IRC16:23
*** e0ne has quit IRC16:24
*** dshulyak_ has joined #tripleo16:25
*** viktors is now known as viktors|afk16:26
*** akrivoka has quit IRC16:30
*** hashar has joined #tripleo16:31
openstackgerritJames Slagle proposed a change to openstack/tripleo-specs: Varying the deploy cloud hypervisor type  https://review.openstack.org/9458616:31
*** jdob has quit IRC16:33
*** fandi has joined #tripleo16:34
*** noslzzp has quit IRC16:40
*** pblaho has quit IRC16:44
*** rpodolyaka1 has joined #tripleo16:47
*** rpodolyaka1 has quit IRC16:50
openstackgerritAlexis Lee proposed a change to openstack/tripleo-specs: Promote HEAT_ENV  https://review.openstack.org/9491016:52
*** noslzzp has joined #tripleo16:53
*** rpodolyaka1 has joined #tripleo16:53
*** eghobo has quit IRC16:56
*** nati_ueno has joined #tripleo16:57
*** eghobo has joined #tripleo16:59
*** BadCub has quit IRC17:00
*** matty_dubs|lunch is now known as matty_dubs17:01
*** rdopiera has joined #tripleo17:02
*** yamahata has quit IRC17:03
*** andreaf has quit IRC17:06
*** andreaf has joined #tripleo17:10
*** andreaf has quit IRC17:10
*** ifarkas has quit IRC17:10
*** eghobo has quit IRC17:11
*** hashar has quit IRC17:12
*** andreaf has joined #tripleo17:12
*** lucasagomes is now known as lucas-dinner17:21
*** lparth_ has quit IRC17:27
*** jdob has joined #tripleo17:28
*** lparth has joined #tripleo17:28
vinshmordred, I was watching this movie "the 5th estate" about wiki links... and the whole time, I kept thinking of you as Julian Assange.. some how.. :D17:28
vinsh*wikileaks17:28
vinshkinda sorta.. but not really.17:29
mordredvinsh: I may or may not be the same person17:33
vinshexplains alot.17:33
lifelesscan I get two +2's and a +A on https://review.openstack.org/#/c/93848/ please?17:45
lifelesscritical path for any non-trivial deploy.17:45
greghaynesoh right, that bug17:45
lifelessbnemec: / SpamapS: / greghaynes ...17:45
greghaynesonce I finish epic battle with EEM17:46
lifelesssee you in 201517:46
bnemeclifeless: Ick.  Do we really want 24 hour tokens in the overcloud too?17:51
bnemecMaybe a follow-up patch to make it configurable?17:51
greghaynesThis should be config-passthrough-able if you really want to configure it17:52
greghaynesso probably doesn't need special config magic17:52
greghaynesI can't find any documentation of what the default val is there :/17:54
bnemecMaybe I'm overthinking it and 24 hour tokens aren't a problem.  It's just that hard-coding it is going to set it in all Keystones, so I want to at least consider the implications.17:54
bnemecgreghaynes: https://github.com/openstack/keystone/blob/master/etc/keystone.conf.sample#L127017:55
bnemecWhich matches the behavior mentioned in the bug.17:55
greghaynesYeah, it's there... doesn't say what it is by default though17:55
bnemecgreghaynes: That value is the default.  The config generator includes the default value commented out.17:56
greghaynessweet17:56
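The point about the config generator can be illustrated with a small sketch. It assumes a fragment written in the style of keystone.conf.sample, where each option appears commented out with its default value, and it assumes the default is 3600 seconds (the 1 hour token lifetime this discussion refers to). The helper name token_expiration is hypothetical and is not part of Keystone.

```python
import configparser

# Assumed fragment in the style of keystone.conf.sample: the generator
# writes the option commented out, showing its default value.
SAMPLE = """
[token]
# Amount of time a token should remain valid (in seconds).
#expiration=3600
"""

# Assumption: 1 hour default, as referred to in the discussion.
ASSUMED_DEFAULT_EXPIRATION = 3600


def token_expiration(conf_text: str) -> int:
    """Return the effective token lifetime (seconds) for a config fragment."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # A commented-out option is simply not set, so the fallback applies.
    return parser.getint("token", "expiration",
                         fallback=ASSUMED_DEFAULT_EXPIRATION)


print(token_expiration(SAMPLE))                         # option commented out
print(token_expiration("[token]\nexpiration=86400\n"))  # explicit 24 hour override
```

Uncommenting the line and changing the value is the kind of override the patch under review makes; leaving it commented out keeps whatever the service's built-in default is.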
*** dtantsur|afk is now known as dtantsur17:56
lifelessbnemec: so this is a problem in the overcloud too17:58
lifelessbnemec: heat there will also fail in the same way if any stack operation takes > token time17:58
lifelessbnemec: so we have to fix the underlying bug IMO, this is a stopgap.17:58
greghaynesWe were having issues with the keystone db getting too large though, right?17:59
greghaynesand looks like this will make it grow by a factor of ~2017:59
lifelessAIUI the issues have been when the gc doesn't work18:00
bnemecYeah, although I think it was the expired token table that was causing issues so this wouldn't have much of an effect.18:00
greghaynesMaybe the thing to do is run this on one of our actual racks for a day18:01
lifelessthis used to be the keystone default, if that helps18:04
greghaynesah, that explains our odd commented out line18:04
greghaynesim +2 then18:04
* SpamapS awakens .. oh jet lag... you fun / infuriating character you18:06
greghaynesYouve almost made it to AUS time18:06
lifelessSpamapS: home again?18:06
SpamapSlifeless: no 2 more days here18:12
SpamapSlifeless: re 24 hour tokens.. that is... a bad idea.18:12
SpamapSlifeless: we'll completely destroy performance in a very short time.18:13
SpamapSlifeless: we should pile on to help shardy fix it.18:13
lifelessSpamapS: I agree!18:18
lifelessSpamapS: would that be a reasonable bootstrappy thing ?18:18
lifelessSpamapS: or is it unrelated to the big refactor?18:18
SpamapSlifeless: so with 1 hour tokens, we see the token table growing to GBs already. Selects to delete users blow up even a large sized buffer pool.18:19
SpamapSlifeless: making it 24x longer ... will make it 24x worse.18:19
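The "24x longer, 24x worse" reasoning can be sketched as simple arithmetic. This is a rough model, not a measurement: it assumes tokens are issued at a constant rate and that expired rows are flushed promptly, so the number of live rows is roughly issue rate times lifetime. The rate of 1000 tokens per hour is a made-up figure for illustration only.

```python
def live_tokens(issue_rate_per_hour: float, ttl_hours: float) -> float:
    """Steady-state count of unexpired tokens under a constant issue rate."""
    return issue_rate_per_hour * ttl_hours


HYPOTHETICAL_RATE = 1000  # tokens per hour; illustrative assumption only

baseline = live_tokens(HYPOTHETICAL_RATE, 1)
for ttl in (1, 2, 4, 5, 24):
    rows = live_tokens(HYPOTHETICAL_RATE, ttl)
    print(f"ttl={ttl:>2}h  live rows={rows:>7.0f}  growth={rows / baseline:.0f}x")
```

Under this model the table grows linearly with the lifetime, which is why intermediate values such as 2 to 5 hours come up as compromises. If the cleanup job falls behind, the real table would be larger than the model suggests.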
greghaynesah, I thought it was just a non gc issue when we had the db size explosion :/18:20
lifelessSpamapS: are you saying there is no short term solution and we just have to tell people 'you cannot use heat for stacks > 1 hour in deploy time' ?18:20
lifelessSpamapS: I had the impression from the summit sprint times that this was an ok interim step.18:20
lifelessSpamapS: until a) we get the dogpile keystone config ratified by morganfainberg and co, and / or heat fixed.18:21
SpamapSlifeless: it will snowball on itself.18:22
SpamapSlifeless: I think we can maybe do 2 hour expire times.18:22
SpamapSlifeless: but that will make stack deletes much worse.18:22
greghaynesHave we set this in the 405 rack? I thought our overcloud deploys there were > 1hr18:23
SpamapSYeah I think 2 hours is fine.18:23
lifelesswe set it in the 405 and 306 racks AFAIK18:23
*** rpodolyaka1 has quit IRC18:23
lifelessotherwise we wouldn't be able to deploy anything with the reduced performance of serialised ironic18:23
* morganfainberg apologizes to everyone for not getting token stuff better yet18:23
*** nati_ueno has quit IRC18:24
morganfainbergthe real win will be https://review.openstack.org/#/c/95976/ and we're talking about creating a session token (tgt style) that can be refreshed and reused to help w/ this type of issue (unscoped)18:24
SpamapSlifeless: but we didn't set 86400 right?18:25
*** ramishra has quit IRC18:25
lifelessSpamapS: we set 8640018:25
lifelessSpamapS: using this patch18:26
lifelessSpamapS: and its been used there for the last 1.5 weeks18:26
SpamapSmorganfainberg: yesssssssss non persistent tokens will be fantastic.18:26
SpamapSlifeless: but is that cloud used on any kind of ongoing basis, or torn down and re-deployed a lot?18:26
lifelessSpamapS: I don't know if adam_g and greghaynes have been doing end-to-end runs or repeated-oc-runs18:27
lifelessif the latter, then ongoing18:27
greghaynesWe can just leave one up for >24hr and do some stack-updates18:27
morganfainbergSpamapS, well, that is on the slate for Juno 100% even if i have to work nights and weekends through RC to get it done.18:27
SpamapSmorganfainberg: heh, a common thread for the magical tasks we have ;)18:28
SpamapSlifeless: well perhaps the urgency is not as high as I thought then. :p18:28
lifelessSpamapS: do you mean the severity of impact?18:28
lifelessSpamapS: I think its urgent to fix the ability to deploy a rack18:29
lifelessSpamapS: I think its urgent to fix heat to do $whatever shardy says is needed here18:29
*** nati_ueno has joined #tripleo18:30
SpamapSlifeless: I just mean that I am pretty sure 1 day expiring tokens will be a workaround with many other consequences.18:30
SpamapSBut perhaps I'm wrong, and the token table will remain manageable even in a moderately busy cloud.18:31
*** e0ne has joined #tripleo18:32
lifelessSpamapS: so here are our options today AIUI18:35
lifeless- do nothing: Ironic + adam's serialisation patch + steve's heat patch for API memoisation can deploy 5-10 nodes.18:35
lifeless- set a 2 hour limit: 20-30ish nodes18:36
lifeless- set a 3 hour limit: 30 reliably I suspect18:36
lifeless- set the limit only in the undercloud18:36
lifelessthe undercloud has a smaller or identical control plane compared to the overcloud at the moment so that doesn't make a huge amount of sense to me but - what do you think?18:37
SpamapSlifeless: I'd be interested to hear what size the token table ends up at after a deploy completes on the undercloud.18:38
adam_glifeless, FYI, im testing right now across both racks without serialization to see if i can reproduce the issues i was seeing. it doesn't make sense you guys didn't hit the same issues but i was plagued by it all last week.18:38
morganfainbergSpamapS, lifeless, i could help you run Redis (which iirc will work better than the memcache token backend) but it all depends on token churn.18:39
lifelessadam_g: / greghaynes: you happen to have an oc deploy that's completed/completing and not torn down? can we gather that data for SpamapS?18:40
morganfainbergif the token churn is low-enough, it shouldn't be too bad to do a db token cleanup on a cron as a stop-gap (keystone-manage something-something-flush i think)18:40
adam_glifeless, i have one up right now im testing concurrent dd's18:40
lifelessadam_g: ok, lets check token size post deploy18:40
adam_glifeless, post-deploy of the heat stack?18:40
lifelessadam_g: so we *did* see lots of flaky hardware. And it may be the serialisation takes the edge off of that18:40
lifelessadam_g: yeah, SpamapS wants to see how big the tokens table becomes18:41
adam_glifeless, yeah.. either way, configurable concurrency of deployments would be a valuable knob even if we're not hitting an i/o limit on 30 nodes18:41
lifelessadam_g: when we investigated why specific nodes failed to deploy, it was the same nodes repeatedly, which is why we decided 'its hardware'18:42
*** eghobo has joined #tripleo18:43
SpamapSmorganfainberg: we already do token cleanup in cron18:44
SpamapSadam_g: /mnt/state/var/lib/mysql/keystone/token.ibd should be the max size the token table has reached.18:46
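A minimal sketch of that check follows. It relies on the fact that an InnoDB per-table .ibd file does not shrink when rows are deleted, so its size on disk acts as a high-water mark. The path is the one quoted in the log; the helper names human_size and tablespace_size are hypothetical.

```python
import os

# Path quoted in the log; adjust for the host being inspected.
TOKEN_IBD = "/mnt/state/var/lib/mysql/keystone/token.ibd"


def human_size(num_bytes: int) -> str:
    """Format a byte count using binary units."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"


def tablespace_size(path: str = TOKEN_IBD):
    """Return the file size in bytes, or None if the file is not readable."""
    try:
        return os.path.getsize(path)
    except OSError:
        return None


size = tablespace_size()
if size is None:
    print(f"{TOKEN_IBD}: not found or not readable on this host")
else:
    print(f"{TOKEN_IBD}: {human_size(size)}")
```

Reading the file usually needs root or mysql-user privileges on the controller, so the sketch reports an unreadable file rather than raising.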
adam_gSpamapS, deploying the stack now, one min18:46
morganfainbergSpamapS, ok you're probably ok increasing token TTL as a work around for now. But, i wouldn't go over ~5h life.18:47
SpamapSmorganfainberg: yeah maybe 5h will be a happy medium.18:47
morganfainbergSpamapS, previous job had a customer churning tokens like mad (~100k / day) still took till about 20mil token rows to see real nasty nasty issues.18:48
*** cwolferh has quit IRC18:49
SpamapSmorganfainberg: thats assuming you have a nicely tuned db server. :)18:49
openstackgerritlifeless proposed a change to openstack/tripleo-image-elements: Workaround Heat token handling brain-damage.  https://review.openstack.org/9384818:49
morganfainbergeh, percona, 3 nodes, minimal tuning18:49
morganfainbergbut a glut of ram on those boxes18:50
openstackgerritJon-Paul Sullivan (jp_at_hp) proposed a change to openstack/tripleo-image-elements: Start the cloud-init nic before cloud-init  https://review.openstack.org/9622118:50
bnemeclifeless: Did you mean to make it only 4 hours?18:51
SpamapSmorganfainberg: yeah, in theory our current target for controllers would be able to have a ton of RAM allocated to the buffer pool.18:51
jerryzlifeless: what is the cpu_allocation_ratio in your test cloud? and ram_allocation_ratio?18:52
*** hashar has joined #tripleo18:52
morganfainbergSpamapS, i've had success running in that config18:52
openstackgerritlifeless proposed a change to openstack/tripleo-image-elements: Workaround Heat token handling brain-damage.  https://review.openstack.org/9384818:52
lifelessbnemec: yes, but I flumped the commit messages18:52
lifelessjerryz: which test cloud ?18:52
jerryzlifeless: tripleo-test-cloud or rh118:54
lifelessjerryz: ah, the ones the jenkins CI slaves run in18:54
bnemeclifeless: Okay, +2.18:54
lifelessjerryz: they are stock tripleo kvm clouds, with no tweaks18:54
*** cwolferh has joined #tripleo18:55
lifelessjerryz: so cpu_allocation_ratio unset, and ram_allocation_ratio set to 1.018:55
jerryzlifeless: let me ask in another way, the max servers vs physical cpu cores?18:56
jerryzlifeless: according to current usage18:57
*** shardy is now known as shardy_afk18:57
lifelessI think you are asking how many physical cores in use vs virtual cores in use18:57
jerryzlifeless: yes18:57
lifelesswe run one virtual core per physical core18:57
lifelesssince slow CI is not good18:58
jerryzlifeless: in current usage model, what is the peak number of used nodes? number of concurrent patches tested?19:00
*** jp_at_hp has quit IRC19:00
*** matty_dubs is now known as matty_dubs|gone19:01
jerryzlifeless: because now we have some difficulty finding enough resource to guarantee a 1:1 ratio for cpu. that's why i am asking19:01
lifelessjerryz: we're running 123 VMs with 456 vcpus at the moment, and 1.5TB of memory in use and 2.5TB of local disk storage.19:04
lifelessjerryz: in the HP1 region19:04
lifelessjerryz: its probably ok to run a smaller region at 1:1 rather than overcommitting.19:04
lifelessjerryz: note that those stats don't include the testenvs used to emulate baremetal; we have 10 of those machines at the moment, 24 cores each, fully deployed19:05
lifelessI think we overcommit there slightly because the usage pattern of emulated baremetal has a lot of quiet periods19:06
*** e0ne has quit IRC19:10
*** e0ne has joined #tripleo19:10
jerryzlifeless: i see. thanks for the info.  how many of 123 vms are visible to nodepool19:11
*** dtantsur is now known as dtantsur|afk19:12
lifelessactually those stats are out - nova bug :(.19:12
lifeless45 nodepool vms at the moment, I think19:12
openstackgerritlifeless proposed a change to openstack/tripleo-image-elements: Workaround Heat not handling token expiry  https://review.openstack.org/9384819:13
lifelessbetter message there ^19:14
*** e0ne has quit IRC19:14
*** e0ne_ has joined #tripleo19:14
jerryzlifeless: ok. we will start from a small region first. about ssh jump host, nodepool's sshclient needs some work to use openssh config. jenkins also does not honor it. the link you posted in the spec about prefix suffix command is no longer working.19:17
lifelessjerryz: well, we'll need to work through those issues :)19:18
*** chuckC has quit IRC19:19
openstackgerritOpenStack Proposal Bot proposed a change to openstack/os-apply-config: Updated from global requirements  https://review.openstack.org/9623319:19
openstackgerritOpenStack Proposal Bot proposed a change to openstack/os-cloud-config: Updated from global requirements  https://review.openstack.org/9325319:19
openstackgerritOpenStack Proposal Bot proposed a change to openstack/os-collect-config: Updated from global requirements  https://review.openstack.org/9623419:19
lifelessbnemec: ^ I made the commit message less derogatory; could I get the +2 back? Thanks! https://review.openstack.org/9384819:19
openstackgerritOpenStack Proposal Bot proposed a change to openstack/os-refresh-config: Updated from global requirements  https://review.openstack.org/9623519:19
*** akuznetsov has quit IRC19:20
lifelessgreghaynes: did I get a +2 from you as well ?19:20
jerryzlifeless: out to lunch, talk to you later. thanks!19:20
bnemeclifeless: It was a trivial rebase so my +2 stuck.19:20
lifelessbnemec: ah cool19:20
greghaynesadam_g: The build with the keystone timeout change is still going?19:21
*** e0ne_ has quit IRC19:21
greghaynesI think ill wait until we can poke at it19:22
lifelessgreghaynes: adam_g is building with parallel ironic; keystone unaltered (but all the builds this last week have had the keystone timeout change)19:22
adam_ggreghaynes, keystone timeout change?19:22
greghaynesoh19:22
*** e0ne has joined #tripleo19:22
greghaynesconversation overload19:22
greghaynesok, and the last week thing - was that at 24hr or 4?19:22
greghayneseither way actually19:23
greghaynesshould be fine - I know we had stacks up for over 4hr19:23
*** rpodolyaka1 has joined #tripleo19:23
greghayneswhat actually does our partitioning for use-ephemeral?19:25
lifelessthe way its meant to work is this19:25
lifelessflavour with ephemeral in it19:25
lifeless-> nova instance metadata19:25
lifeless-> nova ironic driver19:25
lifeless-> ironic api19:26
lifeless-> ironic conductor then does the partition table and mkfs for it19:26
greghaynesok, so theres supposed to be some logic in ironic about how to size the partitions?19:27
*** pelix has quit IRC19:28
greghaynesoh no, we pass that in19:28
greghaynesah ok, its just all set in the flavor19:29
lifelessright via the flavour19:29
greghaynes*lightbulb*19:29
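A minimal sketch of the point reached above: the partition sizes are set by the flavor, not computed inside Ironic. The `root_gb` and `ephemeral_gb` names follow the usual Nova flavor fields and `swap_mb` stands in for the flavor's swap size in MB. The `Flavor` class, the `plan_partitions` helper, and the ordering of the result are hypothetical and are not taken from the Ironic conductor.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Flavor:
    """Subset of flavor fields that matter for disk layout (illustrative)."""
    name: str
    root_gb: int
    ephemeral_gb: int = 0
    swap_mb: int = 0


def plan_partitions(flavor: Flavor) -> List[Tuple[str, int]]:
    """Return (label, size in MB) pairs derived purely from the flavor.

    Hypothetical helper: labels and ordering are for illustration only.
    """
    plan = []
    if flavor.ephemeral_gb > 0:
        plan.append(("ephemeral0", flavor.ephemeral_gb * 1024))
    if flavor.swap_mb > 0:
        plan.append(("swap", flavor.swap_mb))
    plan.append(("root", flavor.root_gb * 1024))
    return plan


if __name__ == "__main__":
    baremetal = Flavor(name="baremetal", root_gb=10, ephemeral_gb=30)
    for label, size_mb in plan_partitions(baremetal):
        print(f"{label}: {size_mb} MB")
```

A flavor with `ephemeral_gb=0` yields only a root entry, which is the single-partition case.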
lifelessflavours are cloud operator tools to describe available machine configurations to tenants19:30
lifelessone reason its done by the operator is to prevent bad-packing dos attacks19:31
lifelessthough whether thats really a risk is arguable IMO19:32
greghayneshah, ok19:32
*** nati_ueno has quit IRC19:32
greghaynesadam_g: Do you grok what Ng was saying about preserve_ephemeral in ironic not actually provisioning an ephemeral partition?19:36
lifelessSpamapS: so I think a question I had for you got lost19:36
lifelessSpamapS: the token thing in heat19:36
lifelessSpamapS: is that something the team you're bootstrapping would consider in-remit ?19:36
adam_ggreghaynes, i believe the gist is that we're ending up with a single partition (/dev/sda2 mounted at /). all stateful stuff ends up in /mnt/state/ on the same partition.19:38
greghaynesYep, but is this a feature change from nova-bm?19:38
lifelessits a bit of an annoying fallback mode19:38
lifelessgreghaynes: yes19:38
lifelessgreghaynes: in nova-bm we get a separate partition that can survive rebuilds19:39
greghaynesWhat do we get in ironic?19:39
SpamapSlifeless: Not sure.19:40
lifelessgreghaynes: well it works for me19:40
lifeless├─sda1   8:1    0     30G  0 part /mnt19:40
lifeless├─sda2   8:2    0 1023.5K  0 part19:40
lifeless└─sda3   8:3    0     10G  0 part /19:40
*** rpodolyaka1 has quit IRC19:40
lifelessoh19:40
lifelessI may have used nova-bm, nuts19:41
greghaynescached -env.json?19:41
lifelessgreghaynes: the -env doesn't affect the disk image19:41
greghaynestrue19:41
lifelesstime to build new images19:42
SpamapSdid we get a table size yet btw?19:42
adam_gSpamapS, on its way19:42
lifelessSpamapS: re (not sure)- so perhaps run it up the flagpole? It might be a good way to get known to the heat cores19:43
lifelessSpamapS: and its certainly critical19:43
lifelessSpamapS: as well as something the long term design needs IMO - can't react arbitrary times later without it fixed19:44
SpamapSlifeless: shardy is already 50% done.. so it may not be the best thing to tackle.19:44
lifelessSpamapS: didn't realise19:44
lifelessnobudy tells me nudding19:44
SpamapSlifeless: yeah, it's a straight forward issue. We just need tests.19:44
SpamapShttps://review.openstack.org/9622219:45
lifelesshow about a tempest test?19:45
lifelessset the keystone token expiry to 2 minutes19:45
lifelessthen do stuff19:45
lifelessI'm willing to bet a *lot* of openstack code will fall over in a heap :)19:45
SpamapSTrue19:46
morganfainberglifeless, never set below 5minutes :P accepted clock-skew window (but i don't disagree)19:46
morganfainbergin fact... we should probably make the minimum 300 seconds...19:47
SpamapSmorganfainberg: in a testing situation, you'd be on  the same box19:48
SpamapSmorganfainberg: well in devstack-gate anyway19:48
morganfainbergSpamapS, sure, but doesn't mean any real deployment could ever really support it, iirc19:48
SpamapSAnother option is to introduce time warping into devstack-gate and tempest by mucking with clocks.19:49
lifelessmorganfainberg: so the question is 'how can we do integration tests that make common operations which are occasionally slow... exceed token timeout time'19:49
lifelessmorganfainberg: in order to test that token renewal (where appropriate) is working19:50
morganfainberglifeless, right. sleep? :P i mean no no no.19:50
lifelessmorganfainberg: lol, in the gate man19:50
mordredoh yes. please GOD add a bunch of sleep calls into the tempest gate19:50
morganfainberglifeless, *snicker*19:50
morganfainberglifeless, see mordred  likes this plan! :P19:51
morganfainberg>.>19:51
mordredmorganfainberg: that should be your clue that it's very very dangerous19:51
morganfainbergmordred, oh good! i have a metric to go by!19:51
giulivoguys, wanted to ask for your opinion about this bug https://bugs.launchpad.net/tripleo/+bug/122631019:51
uvirtbotLaunchpad bug 1226310 in tripleo "Nova bm operations fail when LIBVIRT_DEFAULT_URI not set" [Medium,Triaged]19:51
morganfainberglifeless, i could see a benefit to allowing for requesting an initial token with less time-to-live than the maximum19:52
giulivobasically people @nova are so and so about introducing anything new into nova-bm, so even if I manage to land the default_uri config setting in ironic, we can't be sure it will be in nova-bm19:52
morganfainberglifeless, immediately, probably setting the value silly low and watching things explode :(19:52
*** markmc has quit IRC19:52
giulivoI wonder if that is still worth trying or if we should instead try to just set the env variable for the tripleo user in something like .bashrc ?19:53
lifelessgiulivo: we should fix things where the fix belongs19:54
morganfainberglifeless, oh i bet you could do this by issuing a delete on the active token19:54
lifelessgiulivo: ironic will have the same issue19:54
SpamapSmordred: so instead of sleep, what we need instead is just a way to warp time. I bet there's already an LD_PRELOAD that lets you do that.19:55
morganfainberglifeless, revoke the token. auth_token should respond the same way if the token is expired or revoked19:55
lifelessgiulivo: so I'd start by fixing it in Ironic then you can discuss 'backporting' the feature to nova-bm19:55
giulivolifeless, got it, thanks19:56
morganfainberglifeless, using v2.0 DELETE /v2.0/tokens/<token_id> v3 is ... DELETE /v3/auth with x-subject-token header as the token to be revoked iirc19:56
morganfainberglifeless, but that should solve the immediate need (slightly different workings under the hood)19:56
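A rough sketch of the revoke-instead-of-wait idea above: rather than waiting for a token to expire, a test could revoke it and then check that the service re-authenticates. The v2.0 path is the one quoted in the channel. For v3 the sketch assumes the Keystone route is `DELETE /v3/auth/tokens` with the token to revoke in `X-Subject-Token`. The Keystone URL and token values are placeholders, and `build_revoke_request` is a hypothetical helper that only builds the request and never sends it.

```python
from urllib.request import Request


def build_revoke_request(keystone_url, admin_token, subject_token, api_version="v3"):
    """Build (but do not send) an HTTP request that revokes subject_token.

    Hypothetical helper for illustration; the caller would pass the result
    to urllib.request.urlopen() against a real Keystone endpoint.
    """
    base = keystone_url.rstrip("/")
    if api_version == "v2.0":
        # v2.0: the token to revoke is part of the URL.
        url = f"{base}/v2.0/tokens/{subject_token}"
        headers = {"X-Auth-Token": admin_token}
    elif api_version == "v3":
        # v3 (assumed route): the token to revoke travels in a header.
        url = f"{base}/v3/auth/tokens"
        headers = {"X-Auth-Token": admin_token, "X-Subject-Token": subject_token}
    else:
        raise ValueError(f"unsupported API version: {api_version}")
    return Request(url, headers=headers, method="DELETE")


if __name__ == "__main__":
    req = build_revoke_request("http://keystone.example:5000", "ADMIN", "VICTIM")
    print(req.get_method(), req.full_url)
```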
*** shardy_afk is now known as shardy19:58
lifelessmorganfainberg: there we go :) nice20:00
lifelessSpamapS: ^ and thus we can get a tempest test20:00
lifelessgiulivo: really the bug is that Fedora has very strange defaults, but thats a different discussion :)20:00
lifelessgiulivo: the rh folk have already tried to get it addressed AIUI20:00
giulivolifeless, well no I don't agree there... why shall it default to qemu:///system when issued by a normal user?20:02
giulivoand also, why shall the virtual_power rely on a particular configuration of the libvirt client?20:02
lifelessgiulivo: it already relies on a particular configuration in that the vms have to be manually created20:03
*** chuckC has joined #tripleo20:03
lifelessgiulivo: but sure20:04
giulivolifeless, indeed the problem is exactly that we do not rely on the default setting when creating the VMs, we create them via our own DEFAULT_URI setting, but nova-bm doesn't have it set20:04
lifelessgiulivo: in terms of strange defaults and a case for that - fedora is primarily a desktop OS, but the default doesn't use hardware accelerated hypervisors20:05
lifelessgiulivo: this is analogous to not using hardware accelerated graphics even if the hardware has it20:05
lifelessgiulivo: Personally, I think thats very strange.20:05
openstackgerritGregory Haynes proposed a change to openstack/tripleo-image-elements: Add os-is-bootstrap-host element and script  https://review.openstack.org/8643520:05
*** nati_ueno has joined #tripleo20:06
giulivolifeless, wait if I was to follow up on that, it would be off topic ... I won't get into the trap :P20:06
lifelessI don't think we worry too much about off-topic here :)20:07
lifelessanyhoo20:07
lifelessironic first ;)20:07
giulivoso anyway, regarding the fedora default, it really isn't a fedora default either.... it is libvirt default, fedora just lets the developer pick their default20:07
giulivobut see, I don't question ubuntu decision of manually changing it to qemu:///system I'm just not sure this can be blamed on the distro really20:08
*** noslzzp has quit IRC20:08
lifelesshmm, Ubuntu (and I believe Debian) have it as qemu:///system20:08
giulivoyeah but it is manually set in the libvirt.conf file20:08
giulivothe global libvirt.conf20:09
*** rlandy has quit IRC20:09
lifelessgiulivo: not on my one20:09
lifelessgiulivo: /etc/libvirt/libvirt.conf is all comments for me20:09
giulivoso that means, unless you have DEFAULT_URI set, you are using qemu:///session I think20:09
*** rpodolyaka1 has joined #tripleo20:10
*** nati_ueno has quit IRC20:11
lifelesshowever it works, on Ubuntu, I end up making qemu:///system and it all just works20:11
lifelessvirt-manager with a connection to qemu:///system, for instance, shows the same vms20:11
openstackgerritJames Slagle proposed a change to openstack/tripleo-specs: Varying the deploy cloud hypervisor type  https://review.openstack.org/9458620:11
lifelessperhaps  session==system? here20:11
giulivohuh, well it shouldn't20:11
giulivobut at this point, it is worth investigating further on ubuntu/debian and see if that is behaving differently for some other reason20:12
giulivowill do :)20:12
*** dshulyak_ has quit IRC20:13
openstackgerritGregory Haynes proposed a change to openstack/tripleo-image-elements: Add os-is-bootstrap-host element and script  https://review.openstack.org/8643520:14
SpamapSlifeless: cool. I will lay it out as a low hanging fruit when we're in the office in 9 hours ;)20:14
*** rpodolyaka1 has quit IRC20:14
lifelessadam_g: did ironic fail ?20:15
adam_glifeless, on the concurrent image writeouts? not in the way i was seeing last week, but there has been a recurring iscsiadm error20:15
Nggreghaynes: a fresh deployment from completely blank hardware with ironic, will pass preserve_ephemeral and that causes ironic to skip even trying to mkfs the ephemeral partition20:16
adam_glifeless, see https://bugs.launchpad.net/ironic/+bug/132150420:16
uvirtbotLaunchpad bug 1321504 in ironic "Launching ~30 nova instances /w Ironic + pxe_ipmitool results in many ERRORs" [Medium,In progress]20:16
adam_gmy OC deploys have been delayed b/c a fix for the trusty cloud-init hang was not pulled in, should be up shortly20:16
lifelessNg: is there a bug for that in Ironic now ?20:17
Nggreghaynes: see around line 231 of ironic/drivers/modules/deploy_utils.py20:17
adam_goh actually, one is done20:17
Nglifeless: not yet, I wanted to discuss it a little. I'm not 100% convinced that it's a bug, we're asking it to not touch the ephemeral partition, it seems unreasonable to expect it to be magical given that constraint20:17
adam_gSpamapS, 188M May 28 20:18 token.ibd20:17
Ngif it does some dumb "is ephemeral and ext fs" it will happily blow away some weird fs someone chooses to use20:18
SpamapSbingo20:18
SpamapSadam_g: lovely20:18
*** akrivoka has joined #tripleo20:18
adam_gSpamapS, contains 9117  rows20:18
SpamapSadam_g: so I think 4h is a nice compromise while we work through the re-auth issue in Heat20:18
adam_gSpamapS, this is a slightly smaller 24 node clode20:18
adam_gclode?20:19
SpamapSadam_g: sure, but I'm ok with 10x the size20:19
SpamapSclode is a scottish cloud20:19
Nglifeless: my thinking was that we'd be better off having something like an o-r-c job that checks for a filesystem and makes it otherwise, since that's much easier for operators to discover and tune/override20:19
slaglelifeless: fwiw, qemu:///session != no kvm20:19
slaglelifeless: session vs system is a way to have per-user vm's, configurations, etc20:19
slaglethe crux of this issue has always been that sometimes the scripts use virsh, and sometimes they use sudo virsh20:20
slagleand sometimes they use hardcoded /var/lib/libvirt/... paths, etc, which of course assumes you're using qemu:///system when you later call virsh20:20
lifelessslagle: which *on redhat and fedora* are different, but they aren't on Ubuntu20:21
lifelessslagle: the hardcoded paths are of course bugs20:21
lifelessslagle: the session vs system both letting kvm be used is something I didn't know - thanks !20:21
lifelessNg: that happens way too late20:21
lifelessNg: it is a bug, because a new instance has nothing to preserve.20:22
lifelessNg: we have to do this in early userspace or else services that we've configured to start - like mysql - will write to /mnt/state *under* the filesystem mount point20:22
ShrewsNg: preserve_ephemeral is passed only on a rebuild20:22
Shrewsironic/nova/virt/ironic/driver.py, about line 65220:23
Shrews(if i've followed the bits of the conversation correctly)20:23
*** e0ne has quit IRC20:29
*** e0ne has joined #tripleo20:30
Nghrm, well that would deepen the mystery and definitely justify a bug20:30
*** andreaf has quit IRC20:32
*** e0ne has quit IRC20:33
*** noslzzp has joined #tripleo20:37
greghaynesNg: So is there just unpartitioned space left on the device?20:38
greghaynesoh20:38
Shrewsgreghaynes: i'm now curious, what's the tl;dr of the problem? too much scrollback20:39
*** mestery has quit IRC20:39
*** mestery has joined #tripleo20:40
greghaynesShrews: Deploying nodes via ironic with preserve ephemeral, basically we dont end up with an ephemeral partition (let alone fs) to mount20:40
Ngit seems like we get the partition, it's just not formatted20:40
Shrewsoh, hrm. not seen that at all in my devstack tests. i just added a tempest test for that (but not yet merged)20:42
Ngok, this isn't what we thought it was, there is a fs, it's just not being mounted20:43
Ngmaybe20:43
Shrewsi just looked at that element the other day, wondering where it was mounted20:45
Shrewshttps://github.com/openstack/tripleo-image-elements/blob/master/elements/use-ephemeral/os-refresh-config/pre-configure.d/00-fix-ephemeral-mount#L1820:45
Shrewsbut that's all blackmagic to me20:45
*** jdonalds has joined #tripleo20:46
lifelessso the idea is20:46
lifelessnova says 'ephemeral is over there XXX'20:47
lifelessif nova isn't saying that20:47
lifelessthat would be a problem20:47
lifelessNg: can you pastebin an os-collect-config --print, of a faulty setup20:47
Ngadam_g: ^^20:47
lifelessok, --build-only has regressed and now tries to do things. sadface.20:48
lifelesswe really need a test for it20:48
lifelessI was asking mordred about this the other day20:48
openstackgerritRobert Parker proposed a change to openstack/tripleo-heat-templates: Setup SSL for Ceilometer  https://review.openstack.org/9625720:49
adam_glifeless, http://paste.ubuntu.com/7539163/20:50
*** nati_ueno has joined #tripleo20:50
lifelessbnemec: hey you're around now :) - so I had a question20:51
Nghmm, no ephemeral0 in the block-device-mapping20:51
lifelessyou've looked into running tests in various places20:51
lifelessbnemec: if I wanted to test 'devtest.sh --build-only --trash-my-machine' in the gate, do you think you could point me at the right places to poke?20:52
*** jdob has quit IRC20:52
lifelessnot that trash-my-machine *should* be needed, but one thing at a time20:52
greghaynesDont change anything, but somehow find a way to trash my machine20:52
mordredlifeless: you were asking me about what?20:53
bnemeclifeless: I think you could add that as a tox target, right?20:53
lifelessgreghaynes: right so right now it still makes a new testenv, for instance.20:54
lifelessgreghaynes: unless you tell it not to.20:54
*** noslzzp has quit IRC20:55
mordredlifeless: just make a job similar to: {pipeline}-requirements-integration-dsvm20:56
mordredlifeless: and specify that it wants to run on a bare-precise node20:56
mordredthat will give you root on the box, and we'll throw the box away after anyway20:56
mordredthen you can skip devstack-checkout and friends20:56
mordredlifeless: actually - look at gate-openstack-chef-repo20:57
mordredjust do something like that except remove "revoke-sudo"20:57
lifelessmordred: devstack-checkout and friends are helpful perf stuff though20:57
mordredbecause you want to keep sudo20:57
mordredlifeless: ah - good point20:58
lifelessmordred: we'll want to consolidate some of the tripleo glue at the same time20:58
mordredso do those - in any case- bare-precise node is the thing you want - and from there you can do anything20:58
lifelessI really wish new jobs could get tested as part of gating20:59
mordred(and yes, we're working on getting trusty nodes)20:59
lifelessso we could see if it works before its reviewed20:59
mordred++20:59
lifelesslast time I raised this -infra folk were skeptical :)20:59
*** noslzzp has joined #tripleo20:59
mordredwell, I want that too - I think we're probably a few steps away from being able to do it20:59
*** untriaged-bot has joined #tripleo21:00
untriaged-botUntriaged bugs so far:21:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/132194321:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/131876721:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/131497821:00
mordredbut it should be easier to do when we're turbo-hipstered, because then we'll have a thing that can more directly run jobs on things21:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/131535521:00
uvirtbotLaunchpad bug 1321943 in tripleo "Ceilometer Swift polling on overcloud control node fails with a 403 forbidden error" [Undecided,In progress]21:00
uvirtbotLaunchpad bug 1318767 in tripleo "apache element SSL cert check fails" [Undecided,New]21:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/132316721:00
uvirtbotLaunchpad bug 1314978 in tripleo "Cloud vm not pingable after overcloud upgrade " [Undecided,New]21:00
uvirtbotLaunchpad bug 1315355 in tripleo "Upgrade of overcloud failed with "Connection to neutron failed: Maximum attempts reached"" [Undecided,New]21:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/131947321:00
uvirtbotLaunchpad bug 1323167 in tripleo "overcloud can't create an instance by neutron error" [Undecided,New]21:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/131667521:00
uvirtbotLaunchpad bug 1319473 in tripleo "devtest_testenv.sh doesn't honour overcloud_computescale or overcloud_controlscale" [Undecided,In progress]21:00
untriaged-bothttps://bugs.launchpad.net/tripleo/+bug/132009021:00
uvirtbotLaunchpad bug 1316675 in tripleo "Saving a devtest VM results in error" [Undecided,In progress]21:00
uvirtbotLaunchpad bug 1320090 in tripleo "pxe_ephemeral_format': u'ext4 left on nodes after deploys deleted" [Undecided,New]21:00
*** untriaged-bot has quit IRC21:00
mordredas opposed to now, when we'd have to blow the job out into xml and inject it into a jenkins21:00
mordredwhich is possible - but orchestrating that would be a bunch of work that we _know_ we'd be throwing away in a few months when jenkins diafs20:01
SpamapSmordred: jenkins can hear you, and is going to spit in your tea21:01
mordredSpamapS: what else is new?21:02
*** eghobo has quit IRC21:04
*** noslzzp has quit IRC21:06
*** eghobo has joined #tripleo21:07
Nglifeless: is racks.txt the sum total of everything we know about the hardware in our not-CI rack?21:10
lifelessI don't know what racks.txt is21:11
lifelessNg: do you mean the 405 rack?21:11
NgI do, I was trying to be oblique ;)21:11
Ngfiling jira tickets for stuck hosts and I'm being asked for machine serials, but I don't see them anywhere21:11
*** casanch1 has joined #tripleo21:12
*** BadCub has joined #tripleo21:18
*** cwolferh_ has joined #tripleo21:21
*** cwolferh has quit IRC21:24
lifelessNg: so there was an early email that had stuff21:27
lifelesslet me look it up21:27
*** eghobo has quit IRC21:27
lifelessNg: nope, ilo password and ip only21:28
lifelessNg: everything else is in ~ on the 405 rack jump host21:28
Nglifeless: ok thanks21:28
*** noslzzp has joined #tripleo21:29
*** jdonalds has quit IRC21:34
*** casanch1_ has joined #tripleo21:38
*** Penick has joined #tripleo21:39
*** Penick has quit IRC21:39
*** hashar has quit IRC21:39
*** akrivoka has quit IRC21:41
*** casanch1 has quit IRC21:41
*** TravT has joined #tripleo21:43
*** casanch1_ has quit IRC21:43
*** nati_ueno has quit IRC21:49
*** ddieterly has quit IRC21:51
*** bcrochet is now known as bcrochet|g0ne21:53
*** ddieterly has joined #tripleo21:55
lifelessjerryz: you might like to report back in the infra-specs spec for using jump host about what you've found out21:57
jerryzlifeless: sure.21:59
*** ekarlso has quit IRC22:01
lifelessNg: have you filed the bug on ironic ?22:01
lifelessNg: if not, please do.22:01
lifelessNg: or tell me to ;)22:01
Nglifeless: it's in an open tab, I'll finish it off shortly22:02
lifelessShrews: so ^ the bug is that the block device mapping for the ephemeral device is missing22:04
*** yamahata has joined #tripleo22:06
Nghttps://bugs.launchpad.net/ironic/+bug/132428622:07
uvirtbotLaunchpad bug 1324286 in ironic "ephemeral partition not being mounted" [Undecided,New]22:07
Ngterrible subject22:07
*** nati_ueno has joined #tripleo22:07
*** greghaynes has quit IRC22:07
lifelessNg: putting it in the etherpad ?22:07
Ngdone22:08
*** greghaynes has joined #tripleo22:09
*** lucas-dinner has quit IRC22:13
*** edmund1 has joined #tripleo22:23
*** edmund has quit IRC22:24
*** greghaynes has quit IRC22:27
*** ekarlso has joined #tripleo22:27
*** jdonalds has joined #tripleo22:28
*** greghaynes has joined #tripleo22:28
*** jml has quit IRC22:28
*** weshay has quit IRC22:29
*** jml has joined #tripleo22:30
ShrewsNg: where does the block-device-mapping data come from?22:30
*** jp_at_hp has joined #tripleo22:31
NgShrews: nova makes it available to the instance, via the ec2 metadata URL22:31
Ng(cloud-init then consumes it and does various things, including writing out /etc/fstab entries for ephemeralN entries)22:32
NgI'm afraid I don't know exactly how nova constructs it. I started poking around nova and the ironic nova driver code to see what I could find, but I didn't get anywhere conclusive22:32
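A small sketch of the check implied above: given the block-device-mapping that the instance sees in its metadata, list the `ephemeralN` entries that cloud-init could turn into fstab lines. The helper name and the example device values are hypothetical. The faulty case reported in the channel corresponds to an empty result.

```python
def ephemeral_devices(block_device_mapping):
    """Return {label: device} for every ephemeralN entry, ordered by N."""
    prefix = "ephemeral"
    found = {}
    for label, device in block_device_mapping.items():
        suffix = label[len(prefix):]
        if label.startswith(prefix) and suffix.isdigit():
            found[label] = device
    return dict(sorted(found.items(), key=lambda kv: int(kv[0][len(prefix):])))


if __name__ == "__main__":
    # Example values are made up for illustration.
    healthy = {"ami": "sda", "root": "/dev/sda", "ephemeral0": "/dev/sda1"}
    faulty = {"ami": "sda", "root": "/dev/sda"}
    print(ephemeral_devices(healthy))
    print(ephemeral_devices(faulty))
```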
*** greghaynes has quit IRC22:32
ShrewsNg: so it's ironic's responsibility to update the metadata? seems more like nova's responsibility, but i'm probably wrong22:33
NgShrews: I would suspect that Ironic would need to inform nova that it has created an ephemeral partition, and its label (i.e. ephemeral0)22:33
Ngbut I don't know that for fact22:33
*** greghaynes has joined #tripleo22:34
ShrewsNg: i'll poke around a bit tomorrow and see what I can find out for you, unless someone beats me to it22:34
*** eghobo has joined #tripleo22:34
*** greghaynes has quit IRC22:37
*** greghaynes has joined #tripleo22:38
*** greghaynes has quit IRC22:38
*** greghaynes has joined #tripleo22:39
openstackgerritlifeless proposed a change to openstack/tripleo-incubator: Add -c support to devtest_seed.sh.  https://review.openstack.org/9601422:42
openstackgerritlifeless proposed a change to openstack/tripleo-incubator: Fix --build-only.  https://review.openstack.org/9630322:42
jogolifeless: message:"tripleo" AND build_queue:"check-tripleo" AND build_name:"check-tripleo-overcloud-precise"22:51
jogolifeless: console logs for tripleo cloud22:53
lifelessjogo: thats for narrow selection right? simple searches will find it all regardless?22:53
jogoyeah, you just specify the build_name22:54
lifelessjogo: legendary22:54
jogolifeless: try build_name:"check-tripleo-overcloud-precise" AND message:"failed"22:54
jogoor build_name:"check-tripleo-overcloud-precise" AND build_status:"FAILURE"22:55
jogofor all failures22:55
jogoor to track the number of failed jobs: build_name:"check-tripleo-overcloud-precise" AND build_status:"FAILURE" AND message:"Finished: FAILURE"22:55
jogobut graphite can do that too22:55
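The queries above all share one shape: `field:"value"` terms joined with `AND`. A minimal sketch of composing them programmatically follows. The `lucene_query` helper is hypothetical, and the field names are the ones quoted in the channel.

```python
def lucene_query(**fields):
    """Join field:"value" terms with AND, in the order they were given."""
    terms = []
    for name, value in fields.items():
        # Escape backslashes and double quotes so the value stays one phrase.
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        terms.append(f'{name}:"{escaped}"')
    return " AND ".join(terms)


if __name__ == "__main__":
    print(lucene_query(build_name="check-tripleo-overcloud-precise",
                       build_status="FAILURE"))
```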
greghaynesargh, probably should stop giving servers the same IP as my IRC box22:55
lifelessgreghaynes: :>22:55
lifelessjogo: <3 seed log next yah? We can change the format we capture them in in toci easily if that will help22:56
lifelessjogo: e.g. subdir no tar, or whatever22:56
jogolifeless: yeah subdir would be great22:56
lifelessjogo: sanity checking - you enabled all the tripleo jobs, not just that one ?22:56
lifelessjogo: have a look in openstack-infra/tripleo-ci22:57
jogolifeless: I didn't do anything ^_^ infra already supported it22:57
lifelessjogo: huh blink22:57
lifelessjogo: in tripleo-ci in the root see toci_devtest.sh22:57
lifelessthere is a function in there22:57
lifelessget_state_from_host22:58
jogohttp://git.openstack.org/cgit/openstack-infra/config/tree/modules/openstack_project/files/logstash/jenkins-log-client.yaml#n1622:58
jogolifeless: ahh cool22:59
jogoso on the infra side what happens is because you already have logs in the infra log pipeline if you name them right or update that logstash yaml file, they get added to logstash.o.o automatically22:59
lifeless(teaching you how its done so you can change both sides at the same time)22:59
jogoand yes it will work for all tests22:59
jogoyou can specify the queue instead22:59
jogosuch as: build_queue:"check-tripleo"23:00
jogolifeless: for starters what do you think of just doing the upstart logs23:03
lifelessthat would be a great start23:03
jogoh23:04
*** BadCub has quit IRC23:04
openstackgerritMichael Tupitsyn proposed a change to openstack/tripleo-incubator: Do not create admin user if it exists already  https://review.openstack.org/9630723:07
*** rbrady has quit IRC23:09
*** yamahata has quit IRC23:09
*** ddieterly has quit IRC23:11
openstackgerritJoe Gordon proposed a change to openstack-infra/tripleo-ci: Extract upstart logs so they can be loaded into logstash  https://review.openstack.org/9630823:13
*** edmund1 has quit IRC23:17
jogohttps://review.openstack.org/#/c/96308/ isn't self gated?23:20
Shrewslifeless: just posted a comment on that eph device bug. not as simple as i first thought, so will need some discussion with lucas23:36
*** jtomasek has quit IRC23:36
*** jp_at_hp has quit IRC23:37
bnemecjogo: We don't actually gate on the tripleo queue jobs, but we do require them to pass before approving.23:37
bnemecYou can see that change on the zuul status page under the check-tripleo list.23:37
*** jp_at_hp has joined #tripleo23:38
lifelessjogo: infra won't let us gate yet23:39
lifelesswe haven't met e.g. the reliability requirement23:39
lifelessShrews: ack23:42
greghaynesDo we know what the cause is for tests that fail with  tar: /var/log/host_info.txt: file changed as we read it when grabbing logs after success23:45
lifelesspresumably something wrote to it...23:49
*** jtomasek has joined #tripleo23:49
greghaynesas it read it?23:49
greghaynesI just know ive seen it several times now, curious if anyone debugged yet23:50
lifelessguessing23:51
lifelesstar stats it23:51
lifelessoutputs the tar header23:51
lifelessopens and reads it23:51
lifelessand its reading too little or too much23:51
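The guess above (stat, write header, read, notice a mismatch) can be sketched as a stat-before and stat-after comparison. This illustrates the hypothesis only and is not a description of tar's actual code. `read_with_change_check` and its `on_chunk` hook are hypothetical names; the hook is there only to simulate another process writing to the file mid-read.

```python
import os
import tempfile


def read_with_change_check(path, chunk_size=65536, on_chunk=None):
    """Read a file and report whether its size or mtime moved meanwhile.

    Returns (data, changed). on_chunk, if given, is called after each chunk
    and is only a hook for simulating a concurrent writer.
    """
    before = os.stat(path)
    data = bytearray()
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            data.extend(chunk)
            if on_chunk is not None:
                on_chunk()
    after = os.stat(path)
    changed = (before.st_size != after.st_size
               or before.st_mtime_ns != after.st_mtime_ns)
    return bytes(data), changed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "host_info.txt")
        with open(path, "w") as fh:
            fh.write("uptime: 1 day\n")
        _, changed = read_with_change_check(path)
        print("changed while reading:", changed)
```

If something is still appending to `/var/log/host_info.txt` when the logs are collected, a check of this kind would trip even though the job itself succeeded.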
