*** sgotliv has quit IRC | 00:11 | |
*** sgotliv has joined #openstack-sahara | 00:11 | |
*** witlessb has quit IRC | 00:43 | |
*** rcernin has quit IRC | 00:46 | |
*** crobertsrh is now known as _crobertsrh | 01:41 | |
rickflare | I did a crap ton of reading and I now understand so much more! | 02:36 |
rickflare | I have a crap ton of questions for you guys tomorrow | 02:37 |
openstackgerrit | zhongshengping proposed openstack/puppet-sahara: Add api_paste type/provider for Sahara https://review.openstack.org/281613 | 02:50 |
*** Poornima has joined #openstack-sahara | 04:02 | |
rickflare | guys I am now seeing the following when running my spark job | 04:14 |
rickflare | http://pastebin.com/pqiRRSWT | 04:14 |
openstackgerrit | zhongshengping proposed openstack/puppet-sahara: Add api_paste type/provider for Sahara https://review.openstack.org/281613 | 05:32 |
*** dave-mccowan has quit IRC | 05:38 | |
openstackgerrit | Vitaly Gridnev proposed openstack/sahara: honor api_insecure parameters https://review.openstack.org/279996 | 06:07 |
*** apavlov has joined #openstack-sahara | 06:08 | |
*** apavlov has quit IRC | 06:39 | |
openstackgerrit | Jaxon Wang proposed openstack/sahara-tests: Add more infomation when create cluster failed for scenario test https://review.openstack.org/281095 | 06:53 |
*** nkrinner has joined #openstack-sahara | 06:58 | |
openstackgerrit | Jaxon Wang proposed openstack/sahara: Update CDH user doc for CDH 5.5.0 https://review.openstack.org/281670 | 07:16 |
openstackgerrit | Grigoriy Rozhkov proposed openstack/sahara: Remove unsupported MapR plugin versions https://review.openstack.org/266444 | 07:16 |
openstackgerrit | Vitaly Gridnev proposed openstack/sahara: CDH plugin config helper refactoring https://review.openstack.org/255825 | 07:18 |
openstackgerrit | Vitaly Gridnev proposed openstack/sahara: CDH plugin edp engine code refactoring https://review.openstack.org/257309 | 07:18 |
*** esikachev has joined #openstack-sahara | 07:18 | |
openstackgerrit | Vitaly Gridnev proposed openstack/sahara: Add CDH 5.5 support https://review.openstack.org/279964 | 07:28 |
openstackgerrit | Evgeny Sikachev proposed openstack/sahara-tests: Fix READMEs location for sahara_tests https://review.openstack.org/280730 | 07:31 |
openstackgerrit | Merged openstack/sahara-tests: Add CDH 5.5.0 scenario test https://review.openstack.org/281092 | 07:35 |
openstackgerrit | Jinxing Fang proposed openstack/sahara: Update the roadmap https://review.openstack.org/281678 | 07:40 |
*** rcernin has joined #openstack-sahara | 07:51 | |
openstackgerrit | Jaxon Wang proposed openstack/sahara-tests: Add more infomation when create cluster failed for scenario test https://review.openstack.org/281095 | 08:03 |
*** pcaruana has joined #openstack-sahara | 08:05 | |
openstackgerrit | Evgeny Sikachev proposed openstack/sahara-tests: Put input datasources to hdfs in Pig job https://review.openstack.org/280701 | 08:08 |
openstackgerrit | Evgeny Sikachev proposed openstack/sahara-tests: Put input datasources to hdfs in Pig job https://review.openstack.org/280701 | 08:11 |
openstackgerrit | Evgeny Sikachev proposed openstack/sahara-tests: Disable ssl_verify as default https://review.openstack.org/280762 | 08:14 |
*** esikachev has quit IRC | 08:23 | |
openstackgerrit | Jaxon Wang proposed openstack/sahara: Update CDH user doc for CDH 5.5.0 https://review.openstack.org/281670 | 08:29 |
openstackgerrit | Grigoriy Rozhkov proposed openstack/sahara: Remove unsupported MapR plugin versions https://review.openstack.org/266444 | 08:38 |
openstackgerrit | Merged openstack/sahara-specs: Remove unsupported versions of MapR plugin https://review.openstack.org/258620 | 08:44 |
*** vgridnev has joined #openstack-sahara | 08:48 | |
*** esikachev has joined #openstack-sahara | 09:11 | |
*** witlessb has joined #openstack-sahara | 09:18 | |
*** apavlov has joined #openstack-sahara | 09:23 | |
openstackgerrit | Jaxon Wang proposed openstack/sahara: CDH plugin edp engine code refactoring https://review.openstack.org/257309 | 09:57 |
*** openstackgerrit has quit IRC | 10:02 | |
*** openstackgerrit has joined #openstack-sahara | 10:03 | |
openstackgerrit | Evgeny Sikachev proposed openstack/sahara-tests: Adding ability use default templates https://review.openstack.org/280225 | 10:22 |
openstackgerrit | Evgeny Sikachev proposed openstack/sahara-tests: Put input datasources to hdfs in Pig job https://review.openstack.org/280701 | 10:27 |
*** apavlov has quit IRC | 10:40 | |
openstackgerrit | Evgeny Sikachev proposed openstack/sahara-tests: Add autoregistering of image https://review.openstack.org/281315 | 10:43 |
*** Poornima has quit IRC | 10:43 | |
openstackgerrit | zhongshengping proposed openstack/puppet-sahara: Add the capability to configure api-paste.ini with config.pp https://review.openstack.org/281756 | 10:47 |
*** apavlov has joined #openstack-sahara | 10:55 | |
*** _degorenko|afk is now known as degorenko | 11:01 | |
*** tellesnobrega is now known as tellesnobrega_af | 11:02 | |
*** vgridnev has quit IRC | 11:11 | |
*** vgridnev has joined #openstack-sahara | 11:21 | |
*** tellesnobrega_af is now known as tellesnobrega | 11:24 | |
*** vgridnev has quit IRC | 11:24 | |
*** vgridnev has joined #openstack-sahara | 11:26 | |
*** vgridnev has quit IRC | 11:28 | |
*** vgridnev has joined #openstack-sahara | 11:28 | |
openstackgerrit | zhongshengping proposed openstack/puppet-sahara: Add the capability to configure api-paste.ini with config.pp https://review.openstack.org/281756 | 11:34 |
openstackgerrit | zhongshengping proposed openstack/puppet-sahara: Add the capability to configure api-paste.ini with config.pp https://review.openstack.org/281756 | 11:37 |
openstackgerrit | zhongshengping proposed openstack/puppet-sahara: Add the capability to configure api-paste.ini with config.pp https://review.openstack.org/281756 | 11:38 |
openstackgerrit | Evgeny Sikachev proposed openstack/sahara-tests: Add check of scaling for CDH and Ambari https://review.openstack.org/274675 | 11:43 |
*** apavlov has quit IRC | 11:44 | |
*** apavlov has joined #openstack-sahara | 11:47 | |
*** raildo-afk is now known as raildo | 12:09 | |
*** apavlov has quit IRC | 12:09 | |
*** dave-mccowan has joined #openstack-sahara | 12:26 | |
openstackgerrit | Evgeny Sikachev proposed openstack/sahara-tests: Fix using proxy node for checks https://review.openstack.org/279447 | 12:41 |
*** esikachev_afk has joined #openstack-sahara | 12:49 | |
*** esikachev has quit IRC | 12:53 | |
*** esikachev has joined #openstack-sahara | 12:53 | |
*** esikachev has left #openstack-sahara | 12:53 | |
*** esikachev_afk is now known as esikachev | 12:58 | |
* esikachev is now away: Away from keyboard | 13:02 | |
*** esikachev is now known as esikachev_afk | 13:02 | |
*** esikachev_afk is now known as esikachev | 13:14 | |
*** thumpba has joined #openstack-sahara | 13:15 | |
*** apavlov has joined #openstack-sahara | 13:25 | |
*** nkrinner has quit IRC | 13:28 | |
*** nkrinner has joined #openstack-sahara | 13:28 | |
*** _crobertsrh is now known as crobertsrh | 13:31 | |
vgridnev | crobertsrh, is there an understanding of what's up with the integration tests in your dashboard changes? | 13:58 |
crobertsrh | I'm guessing that there will be some rewriting of integration tests still to come. | 13:58 |
crobertsrh | Hoping that those changes will be small, but I still need to look at them. | 13:59 |
*** thumpba has quit IRC | 14:09 | |
*** dave-mccowan has quit IRC | 14:09 | |
*** dave-mccowan has joined #openstack-sahara | 14:10 | |
*** tmckay has joined #openstack-sahara | 14:10 | |
*** vgridnev has quit IRC | 14:11 | |
*** vgridnev has joined #openstack-sahara | 14:12 | |
*** crobertsrh1 has joined #openstack-sahara | 14:27 | |
*** egafford has joined #openstack-sahara | 14:27 | |
rickflare | morning | 14:29 |
* rickflare finally read 90% of the sahara docs and is ashamed he didnt do it sooner | 14:29 | |
openstackgerrit | Merged openstack/sahara-dashboard: Change color of status field https://review.openstack.org/278241 | 14:31 |
*** tmckay has quit IRC | 14:34 | |
elmiko | rickflare: good bedtime reading ;) | 14:35 |
rickflare | so | 14:37 |
rickflare | elmiko | 14:37 |
rickflare | tell me what you think about this | 14:37 |
rickflare | few things in my reading that I have questions about | 14:37 |
elmiko | k | 14:37 |
*** vgridnev has quit IRC | 14:38 | |
rickflare | one is once sahara starts a cluster lets say you have a massive map reduce job failure or a cascading failure of name nodes | 14:38 |
rickflare | all of which I have seen | 14:39 |
rickflare | using horizon can you restart the hadoop services | 14:39 |
rickflare | or does one have to manually go into each node and restart the services? | 14:39 |
elmiko | hmm | 14:40 |
elmiko | well if the job fails, then i would expect the job execution in horizon to report as failed. at which point you could restart the job. | 14:40 |
rickflare | this is a huge huge issue for production clusters | 14:40 |
rickflare | not related to the job | 14:40 |
rickflare | ive seen ingestion or random things cause name nodes to die | 14:41 |
elmiko | do those issues end up cascading back to the master and causing a job failure? | 14:41 |
elmiko | or just node drop outs | 14:41 |
rickflare | I am not aware that sahara is using high availability for name nodes | 14:41 |
rickflare | nodes can just drop out | 14:42 |
elmiko | we do have some HA options | 14:42 |
elmiko | ok, so you might want to look at the health check patches that vgridnev is working on | 14:42 |
elmiko | we are implementing a system that will determine the health of the cluster | 14:42 |
rickflare | so HA on the name nodes is something that should be top priority | 14:42 |
rickflare | if its not already implemented | 14:42 |
elmiko | so, to your original question. currently, i don't think you can restart individual nodes in the cluster | 14:42 |
elmiko | you would need to login to the node and fix whatever is wrong, or restart manually | 14:43 |
rickflare | what happens if you have to reboot the entire cluster | 14:43 |
rickflare | ie kernel patches etc | 14:43 |
rickflare | you have to manually restart all the cluster services | 14:43 |
elmiko | in that case, you would need to rebuild the image you are using for the cluster and rebuild the cluster | 14:43 |
*** vgridnev has joined #openstack-sahara | 14:43 | |
elmiko | we don't have any option currently to update software packages per node | 14:43 |
rickflare | well not the cluster software but os related | 14:44 |
rickflare | and that approach wont work in most cases | 14:44 |
rickflare | esp if the data in hdfs can not be reproduced | 14:44 |
*** vgridnev has quit IRC | 14:44 | |
rickflare | ie pcap etc | 14:44 |
elmiko | i would imagine you would want to create an external hdfs store, and treat the cluster as something that can be dropped and respawned if needed | 14:45 |
rickflare | so in that case you would not be using hdfs at all | 14:46 |
rickflare | just the task trackers | 14:46 |
rickflare | which is not a compelling use | 14:46 |
rickflare | for hardware consolidation | 14:47 |
rickflare | so are you familiar with cloudera manager? | 14:47 |
elmiko | a little | 14:47 |
elmiko | wait, why wouldn't you be using hdfs? | 14:47 |
rickflare | so ensuring sahara has the basic functionality of cloudera manager should be high on the list | 14:47 |
elmiko | sahara includes the cloudera manager in the cdh plugin images | 14:48 |
rickflare | well if you are going to have an external hdfs store | 14:48 |
rickflare | you might as well use bare metal | 14:48 |
rickflare | at that point | 14:48 |
*** dave-mccowan has quit IRC | 14:48 | |
elmiko | why is that? | 14:48 |
rickflare | because you can't trust the data persistence of the openstack cluster | 14:48 |
rickflare | like if rebooting | 14:49 |
rickflare | requires a cluster rebuild | 14:49 |
rickflare | that is not effective | 14:49 |
elmiko | sure | 14:49 |
*** tellesnobrega is now known as tellesnobrega_af | 14:49 | |
elmiko | we are currently working on improving the cluster health options, but we don't have some of these options yet | 14:49 |
elmiko | we do have some HA support in the cluster, but this is nothing like rolling o/s upgrades or anything | 14:50 |
elmiko | i'm still curious about the external hdfs stuff, and why that is a poor choice | 14:50 |
*** vgridnev has joined #openstack-sahara | 14:50 | |
rickflare | ok | 14:50 |
rickflare | ill try to articulate that more | 14:51 |
elmiko | cool | 14:51 |
rickflare | I'll sync with my coworkers | 14:51 |
rickflare | and provide you with something in detail | 14:51 |
rickflare | as for the cluster health | 14:51 |
rickflare | that is helpful | 14:52 |
rickflare | but service control | 14:52 |
rickflare | is HUGE | 14:52 |
rickflare | I have written ruby scripts to manage hadoop clusters | 14:52 |
rickflare | ie formatting name nodes | 14:52 |
rickflare | and ensuring HA when starting and stopping name nodes | 14:53 |
elmiko | nice, sounds like a good improvement for sahara | 14:53 |
rickflare | these scripts would perform checks on the status of the name nodes prior to doing anything related to the datanodes | 14:53 |
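The pre-check rickflare describes (confirm the NameNodes are healthy before touching any DataNode) can be sketched in Python. This is a hypothetical illustration, not a Sahara feature and not his Ruby scripts: the `restart` and `is_healthy` callables and the host names are assumptions standing in for whatever SSH or service-manager calls a real script would make.

```python
from typing import Callable, List, Sequence


def rolling_restart(
    namenodes: Sequence[str],
    datanodes: Sequence[str],
    restart: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Restart NameNodes first, one at a time, then DataNodes.

    Each NameNode must report healthy after its restart; if one does not,
    the function raises before any DataNode is touched, so a NameNode
    problem cannot cascade into the data layer.
    Returns the hosts restarted, in order.
    """
    restarted: List[str] = []
    for nn in namenodes:
        restart(nn)
        restarted.append(nn)
        if not is_healthy(nn):
            raise RuntimeError(
                f"NameNode {nn} unhealthy after restart; DataNodes left untouched"
            )
    for dn in datanodes:
        restart(dn)
        restarted.append(dn)
    return restarted
```

Keeping the side effects in injected callables makes the ordering rule testable without a cluster.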
elmiko | imo, if you are free this afternoon (EST), you should bring these topics up at our meeting | 14:53 |
rickflare | ok great! | 14:53 |
elmiko | definitely look at the patches that vgridnev is working on for cluster health too | 14:54 |
elmiko | he is creating a system that will be very expandable for performing health checks | 14:54 |
rickflare | that's another thing I struggle with | 14:54 |
rickflare | i am not yet very knowledgeable about who is doing what within the project | 14:54 |
elmiko | totally understandable | 14:54 |
elmiko | rickflare: http://eavesdrop.openstack.org/#OpenStack_Data_Processing_%28Sahara%29_Team_Meeting | 14:55 |
elmiko | coming to our meetings is one of the best ways to stay in touch with what we are working on | 14:55 |
elmiko | and if you make it to summit, i'm sure you will find no shortage of folks who would like to talk about improvements to sahara | 14:56 |
rickflare | i am for sure coming | 14:56 |
elmiko | we could even make plans for future cycles about improvements to cluster health and rolling reboots, etc. | 14:56 |
rickflare | I am working tirelessly to get my guys | 14:56 |
rickflare | to start seriously doing work | 14:57 |
rickflare | for this | 14:57 |
elmiko | \o/ | 14:57 |
rickflare | and providing feedback | 14:57 |
elmiko | we love that | 14:57 |
elmiko | getting good user/operator feedback is something we really would like to get more of | 14:57 |
rickflare | you guys are going to get it | 14:57 |
rickflare | i am putting all my chips on openstack and sahara | 14:58 |
elmiko | going "all in" eh? | 14:58 |
elmiko | ;) | 14:58 |
rickflare | i see more power and control in this than AWS | 14:58 |
rickflare | also I just dont see running these services in containers making sense at this point | 14:58 |
elmiko | yea, that's still up in the air, i suppose | 14:59 |
elmiko | i've done a little work with spark in containers | 14:59 |
rickflare | yea | 14:59 |
rickflare | so the containers issue is interesting | 15:00 |
rickflare | its hard to know what will happen | 15:00 |
rickflare | i mean docker is powerful and kubernetes is sick | 15:00 |
rickflare | the complexity is sky high though | 15:00 |
elmiko | yea | 15:01 |
elmiko | especially the networking issues | 15:01 |
rickflare | what are your thoughts on it? | 15:01 |
rickflare | omg yes | 15:01 |
elmiko | well, i think it's interesting, and i do like how quickly the containers can spin up | 15:01 |
elmiko | getting a spark cluster running on containers, i've seen the clusters spin up way faster than vm | 15:02 |
rickflare | oh yea | 15:02 |
rickflare | because you essentially only need to deal with that process | 15:02 |
*** dave-mccowan has joined #openstack-sahara | 15:02 | |
elmiko | so, i imagine it would make more difference in situations where you care about elastic scaling speed | 15:02 |
elmiko | right | 15:02 |
rickflare | i just worry that the pace of development with containers is moving so fast | 15:03 |
rickflare | it may leave vm's in the dust before even leaving the starting line | 15:03 |
elmiko | hehe | 15:03 |
elmiko | a valid concern | 15:03 |
elmiko | i think the whole containers/vm debate needs to be taken with a grain of salt | 15:04 |
elmiko | there are some situations where having a vm is preferable, and sometimes the opposite | 15:04 |
rickflare | it worries me though | 15:04 |
rickflare | as a CEO | 15:04 |
elmiko | what worries you about it? | 15:04 |
rickflare | of a small company my choices of tech direction have huge implications | 15:04 |
elmiko | ah, right | 15:04 |
rickflare | esp since we can not be masters of every domain | 15:05 |
rickflare | its just not possible | 15:05 |
elmiko | right | 15:05 |
rickflare | so thats the biggest concern I have | 15:05 |
elmiko | totally valid | 15:06 |
rickflare | what time is the meeting | 15:07 |
rickflare | i have a lot I think I can provide | 15:07 |
elmiko | i think 1pm eastern today | 15:07 |
rickflare | ok cool | 15:07 |
rickflare | i am also going to work on this blueprint | 15:07 |
rickflare | today | 15:07 |
rickflare | and get that in | 15:07 |
elmiko | yea, sounds like you have some great ideas | 15:07 |
rickflare | on another note | 15:09 |
rickflare | I am still getting failures on my spark jobs | 15:09 |
elmiko | well that stinks =( | 15:10 |
rickflare | yea | 15:10 |
rickflare | http://pastebin.com/pqiRRSWT | 15:10 |
rickflare | is what im getting | 15:11 |
elmiko | weird, so, some connection issue to the keystone controller | 15:11 |
rickflare | yea | 15:12 |
rickflare | and I dont know why | 15:12 |
elmiko | might be worth running some curl commands to the keystone server to see if you can issue a token create from that node | 15:12 |
rickflare | enlighten me | 15:13 |
elmiko | http://docs.openstack.org/developer/keystone/api_curl_examples.html?highlight=curl#service-api-examples-using-curl | 15:13 |
elmiko | that shows some examples of running cli curl commands to a keystone api controller | 15:13 |
elmiko | something i might try to debug, is logging in to the node that is failing to make the token, and manually run the curl command to that endpoint to see if you can generate a token | 15:14 |
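For reference, the curl check elmiko suggests can also be done from Python on the failing node. This sketch assumes the Keystone v2.0 password-credentials format shown on the curl examples page linked above and a base URL such as `http://<keystone-host>:5000/v2.0`; the user, password, and tenant values are placeholders. `fetch_token_id` only succeeds if the node can reach Keystone and the credentials are accepted.

```python
import json
import urllib.request


def build_v2_token_body(username: str, password: str, tenant: str) -> dict:
    """Keystone v2.0 password-credentials body for POST <auth_url>/tokens."""
    return {
        "auth": {
            "tenantName": tenant,
            "passwordCredentials": {"username": username, "password": password},
        }
    }


def token_request(auth_url: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) the token POST against a v2.0 base URL."""
    return urllib.request.Request(
        auth_url.rstrip("/") + "/tokens",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_token_id(req: urllib.request.Request, timeout: float = 10.0) -> str:
    """Send the request and return the token id from the v2.0 response."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["access"]["token"]["id"]
```

Usage on a cluster node: `fetch_token_id(token_request("http://<keystone-host>:5000/v2.0", build_v2_token_body("demo", "secret", "demo")))`.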
elmiko | you *may* find that there is some networking issue affecting communication between the tenant network and the control plane network | 15:14 |
elmiko | i've seen issues like not allowing inbound traffic, etc... | 15:15 |
elmiko | (although your sec rules looked fine the other day) | 15:15 |
rickflare | worked just fine | 15:15 |
elmiko | ok, so it's something specific to when spark attempts to make the connection | 15:15 |
elmiko | this is where things get fuzzy, because you are dealing with spark using the hadoop-openstack.jar to do comms with keystone | 15:16 |
rickflare | humm | 15:17 |
elmiko | i'm guessing this is to access some swift object? | 15:17 |
rickflare | yes sir | 15:17 |
elmiko | it's weird that the curl would work, but spark fails | 15:18 |
elmiko | and socket timeout definitely indicates that it's a networking issue, not a bad request to the keystone controller | 15:18 |
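elmiko's distinction (a timeout points at the network, an HTTP error status points at the request) can be made explicit when probing Keystone from Python. A minimal sketch using only the standard library; the category names are invented for this example, and the Java-based hadoop-openstack connector raises its own exception types, so this only mirrors the reasoning.

```python
import socket
import urllib.error


def classify_failure(exc: BaseException) -> str:
    """Map an exception from urllib.request.urlopen to a coarse category."""
    if isinstance(exc, urllib.error.HTTPError):
        # Keystone answered: the network path works, the request was rejected.
        return f"http_{exc.code}"
    if isinstance(exc, urllib.error.URLError):
        # Connection-phase failures arrive wrapped; inspect the reason.
        if isinstance(exc.reason, socket.timeout):
            return "timeout"
        return "unreachable"
    if isinstance(exc, socket.timeout):
        # A read timeout after the connection was established is raised unwrapped.
        return "timeout"
    return "other"
```

`HTTPError` is checked first because it is a subclass of `URLError`.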
elmiko | we are now approaching the edge of my knowledge on debugging spark | 15:19 |
elmiko | just fyi... | 15:19 |
rickflare | ok | 15:20 |
elmiko | hmm, could it be that one of the nodes in the cluster is attempting this request and perhaps that node doesn't have good connectivity to the keystone server? | 15:21 |
rickflare | they all do | 15:24 |
elmiko | huh | 15:24 |
*** Akanksha08 has joined #openstack-sahara | 15:30 | |
rickflare | you know what | 15:32 |
rickflare | elmiko | 15:32 |
rickflare | you might be right | 15:32 |
elmiko | rickflare: sorry, i don't have any other specific advice. maybe write a small spark app to try and access the keystone controller, or if pyspark is on those images you could try running something from the pyspark repl | 15:32 |
rickflare | i might have had an IP overlap | 15:32 |
rickflare | im checking now | 15:32 |
elmiko | ah, interesting ;) | 15:32 |
rickflare | great catch | 15:33 |
rickflare | i actually think that may have been it | 15:33 |
rickflare | rebuilding the cluster in a different | 15:33 |
elmiko | \o/ | 15:33 |
rickflare | ip space | 15:35 |
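One way to check for the suspected IP overlap is to compare the tenant subnet against the networks the controllers live on. A sketch with Python's `ipaddress` module; the network names and CIDRs in the example are made up, and the real values would come from Neutron's subnet list and the controller's interface configuration.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple


def overlapping_networks(nets: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return pairs of network names whose CIDR ranges overlap."""
    parsed = {
        name: ipaddress.ip_network(cidr, strict=False) for name, cidr in nets.items()
    }
    clashes: List[Tuple[str, str]] = []
    for (a, net_a), (b, net_b) in combinations(parsed.items(), 2):
        # Only compare like with like; IPv4 and IPv6 ranges never overlap.
        if net_a.version == net_b.version and net_a.overlaps(net_b):
            clashes.append((a, b))
    return clashes
```

For example, `{"tenant": "172.24.4.0/24", "control": "172.24.4.0/22"}` reports `("tenant", "control")`, because the /22 contains the whole /24.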
rickflare | running the job now | 15:39 |
crobertsrh | vgridnev: I'm seeing what I think is strange output when I run the integration tests locally (different failures than what I think I'm seeing in the gate logs). Any tips on running the integration tests? | 15:40 |
*** tmckay has joined #openstack-sahara | 15:40 | |
* rickflare reading pep8 and realizing he knows nothing | 15:40 | |
rickflare | elmiko | 15:42 |
rickflare | nope same | 15:42 |
rickflare | time out error | 15:42 |
elmiko | huh... | 15:43 |
elmiko | rickflare: so, yea, at this point you might need to debug this from inside the spark app. maybe by writing a custom app to ping the keystone server. i might use pyspark if it's available on those nodes | 15:46 |
elmiko | it should be fairly easy to write a small pyspark app to test connectivity with the keystone controller | 15:46 |
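A small PySpark job along the lines elmiko suggests could run a connectivity probe inside Spark tasks, so it executes on the worker nodes rather than where the job is submitted. A sketch under the assumption that PySpark is installed on the cluster image; the host and port are placeholders (5000 is Keystone's usual public API port), and it only tests the TCP connection, not a token request. Spark does not guarantee that partitions land on every worker, so use more partitions than nodes.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures and socket timeouts.
        return False


def probe_from_executors(sc, host: str, port: int, partitions: int = 8,
                         timeout: float = 5.0):
    """Run tcp_reachable inside Spark tasks and collect (hostname, ok) pairs.

    `sc` is expected to be a pyspark SparkContext.
    """
    def probe(_):
        return (socket.gethostname(), tcp_reachable(host, port, timeout))

    results = sc.parallelize(range(partitions), partitions).map(probe).collect()
    return sorted(set(results))
```

Usage, assuming PySpark: create `sc = SparkContext(appName="keystone-probe")`, then print each pair from `probe_from_executors(sc, "<keystone-host>", 5000)`.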
rickflare | so I am seeing this in keystone | 15:48 |
rickflare | http://pastebin.com/XLEy3vUC | 15:48 |
elmiko | interesting.... | 15:48 |
* esikachev is now away: Away from keyboard | 15:48 | |
*** esikachev is now known as esikachev_afk | 15:49 | |
elmiko | so, it's not a connection issue but a data issue | 15:49 |
*** thumpba has joined #openstack-sahara | 15:49 | |
elmiko | i'm surprised that keystone doesn't return a 500 | 15:49 |
elmiko | or maybe it does and the hadoop-openstack connector mis-reports | 15:49 |
elmiko | rickflare: does it show the request that was made? | 15:50 |
elmiko | (also, you may want to set debug=true in your keystone conf to see more output) | 15:50 |
elmiko | you'll probably want to debug the actual body that is sent to the token POST | 15:51 |
*** vgridnev has quit IRC | 15:52 | |
rickflare | k | 15:56 |
*** krotscheck_dcm is now known as krotscheck | 16:00 | |
*** raildo is now known as raildo-afk | 16:02 | |
*** zigo has quit IRC | 16:03 | |
openstackgerrit | Grigoriy Rozhkov proposed openstack/sahara: Remove unsupported MapR plugin versions https://review.openstack.org/266444 | 16:04 |
*** zigo has joined #openstack-sahara | 16:05 | |
*** coolsvap|away has quit IRC | 16:06 | |
*** pcaruana has quit IRC | 16:15 | |
*** nkrinner has quit IRC | 16:16 | |
*** raildo-afk is now known as raildo | 16:17 | |
*** rcernin has quit IRC | 16:17 | |
*** tellesnobrega_af is now known as tellesnobrega | 16:19 | |
*** coolsvap|away has joined #openstack-sahara | 16:20 | |
openstackgerrit | Tim Kelsey proposed openstack/sahara: Fixes to make bandit integration tests work with sahara https://review.openstack.org/281940 | 16:35 |
openstackgerrit | Evgeny Sikachev proposed openstack/sahara-tests: Fix using proxy node for checks https://review.openstack.org/279447 | 16:39 |
*** tellesnobrega is now known as tellesnobrega_af | 16:44 | |
*** tellesnobrega_af is now known as tellesnobrega | 16:57 | |
openstackgerrit | Grigoriy Rozhkov proposed openstack/sahara-tests: Add MapR-FS support to sahara scenario framework https://review.openstack.org/281963 | 17:05 |
*** vgridnev has joined #openstack-sahara | 17:11 | |
*** vgridnev has quit IRC | 17:15 | |
*** vgridnev has joined #openstack-sahara | 17:18 | |
*** vgridnev has quit IRC | 17:19 | |
*** vgridnev has joined #openstack-sahara | 17:27 | |
*** krotscheck is now known as krotscheck_dr | 17:27 | |
*** vgridnev has quit IRC | 17:28 | |
*** vgridnev has joined #openstack-sahara | 17:34 | |
*** vgridnev has quit IRC | 17:35 | |
*** rcernin has joined #openstack-sahara | 17:36 | |
*** vgridnev has joined #openstack-sahara | 17:42 | |
*** vgridnev has quit IRC | 17:45 | |
rickflare | hey | 17:50 |
rickflare | what channel is the meeting in? | 17:50 |
elmiko | rickflare: openstack-meeting-alt (today) | 17:52 |
*** vgridnev has joined #openstack-sahara | 17:52 | |
*** esikachev_afk is now known as esikachev | 17:53 | |
*** degorenko is now known as _degorenko|afk | 18:02 | |
*** tellesnobrega is now known as tellesnobrega_af | 18:05 | |
*** tellesnobrega_af is now known as tellesnobrega | 18:05 | |
*** tellesnobrega is now known as tellesnobrega_af | 18:07 | |
*** raildo is now known as raildo-afk | 18:10 | |
*** tellesnobrega_af is now known as tellesnobrega | 18:11 | |
*** raildo-afk is now known as raildo | 18:22 | |
openstackgerrit | Grigoriy Rozhkov proposed openstack/sahara: Add Hue 3.9.0 to MapR plugin https://review.openstack.org/275217 | 18:28 |
*** apavlov has quit IRC | 18:51 | |
rickflare | tmckay you around? | 18:51 |
tmckay | yeah | 18:52 |
rickflare | sooo | 18:52 |
rickflare | got some new errors | 18:52 |
rickflare | i got the job to run | 18:52 |
rickflare | as long as I dont write to swift | 18:52 |
tmckay | had to fight with my cluster this morning | 18:52 |
*** vgridnev has quit IRC | 18:52 | |
openstackgerrit | lu huichun proposed openstack/sahara: [EDP] Add suspend_job() for sahara edp engine(oozie implementation) https://review.openstack.org/201448 | 18:53 |
tmckay | rickflare, ack. The swift issue is very strange -- I've got a Spark 1.3.1 cluster out of the box, with no special config, and it works | 18:53 |
rickflare | really | 18:53 |
rickflare | wth | 18:53 |
tmckay | rickflare, you could explore integrating with manila, for fun | 18:53 |
rickflare | manilla? | 18:53 |
*** witlessb has quit IRC | 18:53 | |
tmckay | rickflare, yeah, i've never seen the permission issue we ran into. For me, if you can read from swift, it's been fine | 18:54 |
tmckay | manila is an NFS share service in openstack | 18:54 |
tmckay | We did some work to integrate sahara with it | 18:54 |
tmckay | You can set up manila, and host your data on manila shares instead of swift, and then have sahara mount the shares on your cluster | 18:54 |
* esikachev is now away: Away from keyboard | 18:55 | |
*** esikachev is now known as esikachev_afk | 18:55 | |
tmckay | egafford and crobertsrh did most of the work on it, I did some | 18:55 |
rickflare | its failing on the create file | 18:55 |
rickflare | it's as if | 18:55 |
rickflare | I dont have permission to write to the container | 18:55 |
elmiko | ooh, you installed with packstack? | 18:55 |
tmckay | I wonder if you created the container using different credentials? | 18:55 |
rickflare | naw | 18:55 |
elmiko | you may want to check the swift acls to make sure that the user has permission to write in that project on swift | 18:56 |
tmckay | elmiko, some default swift acl junk? | 18:56 |
elmiko | right | 18:56 |
elmiko | could be | 18:56 |
elmiko | or | 18:56 |
*** witlessb has joined #openstack-sahara | 18:56 | |
elmiko | have you tried using those creds to just make a file in swift from the cli or something? | 18:56 |
rickflare | yea | 18:56 |
tmckay | elmiko, yeah, the initial upload | 18:56 |
rickflare | and it works | 18:56 |
rickflare | so check this out | 18:56 |
tmckay | wonder if it's an acl tied to an ip? | 18:57 |
rickflare | i cant seem to delete any of the objects in the containers | 18:57 |
tmckay | rickflare, anything in the swift logs? | 18:57 |
rickflare | some are listed as pseudo-folders | 18:57 |
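Swift has a flat namespace: a "pseudo-folder" is what a client shows when it lists a container with a `delimiter` (usually `/`) and the object names contain that delimiter. The sketch below emulates that listing in plain Python to show why `output2/` can appear as a folder without being an object you can delete; the object names are made up. Some clients also write zero-byte marker objects named like directories, and those do show up as ordinary objects.

```python
from typing import Iterable, List, Tuple


def list_with_delimiter(
    names: Iterable[str], prefix: str = "", delimiter: str = "/"
) -> List[Tuple[str, str]]:
    """Emulate a Swift container listing with prefix and delimiter.

    Returns ("object", name) or ("subdir", name) tuples. Names with the
    delimiter after the prefix are collapsed into one "subdir" entry.
    """
    seen_subdirs = set()
    entries: List[Tuple[str, str]] = []
    for name in sorted(names):
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            entries.append(("object", name))
        else:
            subdir = prefix + rest[: cut + len(delimiter)]
            if subdir not in seen_subdirs:
                seen_subdirs.add(subdir)
                entries.append(("subdir", subdir))
    return entries
```

Deleting a pseudo-folder therefore means deleting every object whose name starts with that prefix.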
elmiko | tmckay: doubtful | 18:57 |
rickflare | all puts in swift | 18:58 |
rickflare | no failures | 18:58 |
tmckay | rickflare, on a side note, I verified that the bug I reported is valid for swift and java jobs both, so any time you're ready I can point you to where it needs to be fixed and you can work on getting that ATC status | 18:58 |
rickflare | yea | 18:59 |
rickflare | pm me | 18:59 |
rickflare | lets do that | 18:59 |
tmckay | hmmm. java thinks it has a permission error somewhere, but where?? never seen it | 18:59 |
openstackgerrit | Chad Roberts proposed openstack/sahara-dashboard: Fixing up integration tests after UI reorganization https://review.openstack.org/282009 | 19:01 |
elmiko | rickflare: but it was a fail to POST in keystone though right? | 19:03 |
elmiko | i still think you need to track down the request that was made to keystone and figure out why it denied permission | 19:03 |
elmiko | look at user, creds, project, etc.. | 19:04 |
rickflare | its fine in keystone now | 19:09 |
rickflare | i had the wrong password in my spark.xml | 19:09 |
rickflare | now its a permission issue again | 19:09 |
tmckay | weirdest thing. read from swift, write to hdfs works on that job | 19:14 |
*** witlessb has quit IRC | 19:14 | |
rickflare | tmckay I just pm'd ya | 19:15 |
*** witlessb has joined #openstack-sahara | 19:48 | |
*** rcernin has quit IRC | 19:55 | |
*** crobertsrh is now known as _crobertsrh | 20:00 | |
*** pino|work_ has joined #openstack-sahara | 20:09 | |
*** pino|work has quit IRC | 20:12 | |
*** kgalanov has quit IRC | 20:15 | |
*** kgalanov has joined #openstack-sahara | 20:17 | |
elmiko | tmckay, rickflare is it a public container? | 20:17 |
rickflare | ? | 20:17 |
rickflare | yes | 20:17 |
rickflare | it is | 20:17 |
elmiko | so, anyone can read without perms | 20:17 |
tmckay | elmiko, yeah, read is working, write is failing | 20:18 |
elmiko | because it's public | 20:18 |
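On the public-container point: in Swift, a read ACL of `.r:*` grants anonymous object reads (and `.rlistings` adds anonymous listings), but the `.r:` referrer form applies only to read ACLs. Writes still need an authenticated user that the proxy treats as an operator of the owning project, or one named in the container's write ACL. A sketch that interprets the ACL strings as `swift stat <container>` reports them under "Read ACL" and "Write ACL"; the helper names are hypothetical.

```python
from typing import List


def parse_acl(header_value: str) -> List[str]:
    """Split a comma-separated Swift container ACL into trimmed items."""
    return [item.strip() for item in header_value.split(",") if item.strip()]


def allows_anonymous_read(read_acl: str) -> bool:
    """True if the read ACL grants object reads to any referrer."""
    return ".r:*" in parse_acl(read_acl)


def allows_anonymous_listing(read_acl: str) -> bool:
    """.rlistings only takes effect together with a referrer grant."""
    items = parse_acl(read_acl)
    return ".r:*" in items and ".rlistings" in items


def write_grantees(write_acl: str) -> List[str]:
    """Explicit write grants (e.g. 'project:user'); referrer items ignored."""
    return [item for item in parse_acl(write_acl) if not item.startswith(".r:")]
```

A container that is "public" in the read sense can therefore still reject every write from the job's user.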
tmckay | actually, we might check the core-site.xml on the node to see what tenant it's trying to use | 20:18 |
elmiko | you aren't by chance writing to an object that exists already? | 20:18 |
tmckay | possible it's in 2 different tenants ... | 20:18 |
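To check which tenant and user the connector is actually using, as tmckay suggests, the `fs.swift.service.sahara.*` properties can be pulled out of a node's core-site.xml. A sketch with `xml.etree.ElementTree`; it assumes the standard Hadoop `<configuration><property><name>/<value>` layout and the `sahara` service name seen in the `swift://...sahara/` URLs, and it masks the password so the output can be pasted safely.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def swift_properties(core_site_xml: str, service: str = "sahara") -> Dict[str, str]:
    """Extract fs.swift.service.<service>.* properties from Hadoop config XML.

    Keys are returned without the prefix (e.g. "tenant", "auth.url");
    any key containing "password" has its value masked.
    """
    prefix = f"fs.swift.service.{service}."
    props: Dict[str, str] = {}
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name.startswith(prefix):
            key = name[len(prefix):]
            value = (prop.findtext("value") or "").strip()
            props[key] = "****" if "password" in key else value
    return props
```

The location of core-site.xml varies by plugin and image; read the file on each node and pass its text in.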
tmckay | nope, we tried that | 20:18 |
elmiko | k | 20:18 |
elmiko | had to ask | 20:19 |
tmckay | crazy thing, it was writing to output2, right? so we watched it actually create swift://sparkstuff.sahara/output2 | 20:19 |
elmiko | oh weird... | 20:19 |
tmckay | and swift://sparkstuff.sahara/output2/_temporary_file | 20:19 |
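The paths tmckay watched follow the hadoop-openstack `swift://<container>.<service>/<object>` layout, so `sparkstuff` is the container whose ACLs matter and `sahara` selects the `fs.swift.service.sahara.*` credentials. A sketch of splitting such a URL; `parse_swift_url` is a hypothetical helper that splits the host at the first dot, so container names containing dots are not handled.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class SwiftPath(NamedTuple):
    container: str
    service: str
    obj: str


def parse_swift_url(url: str) -> SwiftPath:
    """Split swift://<container>.<service>/<object> into its parts."""
    parts = urlsplit(url)
    if parts.scheme != "swift":
        raise ValueError(f"not a swift:// URL: {url}")
    container, sep, service = parts.netloc.partition(".")
    if not sep or not container or not service:
        raise ValueError(f"expected swift://<container>.<service>/...: {url}")
    return SwiftPath(container, service, parts.path.lstrip("/"))
```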
rickflare | yup | 20:20 |
tmckay | then it dies with permission error with a java trace | 20:20 |
tmckay | almost like it was failing on staging an intermediate file | 20:20 |
elmiko | yea | 20:20 |
tmckay | could even be in hdfs, for all I know | 20:20 |
tmckay | a config issue between master and workers? | 20:20 |
elmiko | didn't you mention hacking the core-site.xml file? | 20:22 |
elmiko | does it need to be distributed to all nodes? | 20:22 |
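If a hand-edited core-site.xml exists only on the master, the workers will authenticate with different settings. Given each node's properties as a dict (collecting them, for example over SSH, is left out), a sketch like this reports every key whose value differs or is missing on some node; node names and values in the example are placeholders.

```python
from typing import Dict, Mapping, Set


def differing_keys(
    per_node: Mapping[str, Mapping[str, str]]
) -> Dict[str, Dict[str, str]]:
    """Return {key: {node: value}} for keys not identical on every node.

    A key absent on a node is reported with the value "<missing>".
    """
    all_keys: Set[str] = set()
    for props in per_node.values():
        all_keys.update(props)
    diffs: Dict[str, Dict[str, str]] = {}
    for key in sorted(all_keys):
        values = {node: props.get(key, "<missing>") for node, props in per_node.items()}
        if len(set(values.values())) > 1:
            diffs[key] = values
    return diffs
```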
openstackgerrit | Brandon James proposed openstack/sahara: Check that main-class value is not null in job execution validator https://review.openstack.org/282050 | 20:32 |
*** thumpba has quit IRC | 20:53 | |
*** Akanksha08 has quit IRC | 20:54 | |
*** Akanksha08 has joined #openstack-sahara | 20:58 | |
tmckay | tellesnobrega, ping | 21:02 |
tellesnobrega | tmckay, pong | 21:02 |
tmckay | tellesnobrega, hey. so this change right above, I tested for spark and java but not for storm. It ensures that main-class is present and not null | 21:03 |
tmckay | tellesnobrega, that should be the case for storm as well, correct? main class is required and can't be empty? | 21:03 |
tellesnobrega | yes | 21:04 |
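The rule being discussed for the validator change (Java, Spark, and Storm jobs need a non-empty main class) can be sketched as a standalone check. This is not the code under review: `edp.java.main_class` is assumed to be the config key Sahara uses for the main class, and the exception type is invented for the example.

```python
from typing import Any, Mapping, Optional

# Assumed config key; check against the Sahara EDP documentation.
MAIN_CLASS_KEY = "edp.java.main_class"
REQUIRES_MAIN_CLASS = {"Java", "Spark", "Storm"}


class ValidationError(ValueError):
    """Hypothetical error type for this sketch."""


def check_main_class(job_type: str, configs: Optional[Mapping[str, Any]]) -> None:
    """Raise if a job type that needs a main class lacks a non-empty one."""
    if job_type not in REQUIRES_MAIN_CLASS:
        return
    value = (configs or {}).get(MAIN_CLASS_KEY)
    # Treat None, empty and whitespace-only values as missing.
    if value is None or not str(value).strip():
        raise ValidationError(f"{job_type} job requires a non-empty '{MAIN_CLASS_KEY}'")
```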
tmckay | Someday we may want to support Manifest entries in certain cases that name the main class (but at least Oozie does not seem to be able to do that) | 21:04 |
tmckay | tellesnobrega, okay, thanks, wanted to be sure | 21:04 |
tmckay | I think from what I've read though that manifests could work with spark, without a main class config option | 21:05 |
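The manifest alternative tmckay mentions relies on the JAR specification's `Main-Class` attribute in the main section of `META-INF/MANIFEST.MF`. A sketch of reading it with `zipfile`, for example as a fallback when no main class is configured; per the discussion Sahara does not do this today, and the function name is hypothetical. It applies the manifest's continuation-line rule (a line starting with a single space extends the previous one) and stops at the first blank line, where per-entry sections begin.

```python
import io
import zipfile
from typing import Optional


def manifest_main_class(jar_bytes: bytes) -> Optional[str]:
    """Return the Main-Class attribute from a JAR's manifest, if present."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        try:
            raw = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
        except KeyError:
            return None  # no manifest in the archive
    # Unfold continuation lines before parsing attributes.
    unfolded = []
    for line in raw.splitlines():
        if line.startswith(" ") and unfolded:
            unfolded[-1] += line[1:]
        else:
            unfolded.append(line)
    for line in unfolded:
        if line == "":
            break  # end of the main section
        name, _, value = line.partition(":")
        # Attribute names are case-insensitive.
        if name.strip().lower() == "main-class":
            return value.strip()
    return None
```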
tellesnobrega | tmckay, np, the way the command line for storm is implemented we require a main class, if it is missing the command line won't be complete and fail the job launch | 21:06 |
*** raildo is now known as raildo-afk | 21:06 | |
tmckay | tellesnobrega, gotcha, yeah if it's modeled after spark that totally makes sense | 21:06 |
tellesnobrega | it is | 21:06 |
tmckay | I forgot that :) | 21:07 |
openstackgerrit | Julian proposed openstack/sahara: Add package installation methods to ssh_remote util https://review.openstack.org/282058 | 21:10 |
tellesnobrega | tmckay, :) | 21:12 |
*** chlong has quit IRC | 21:57 | |
*** chlong_ has joined #openstack-sahara | 21:58 | |
*** tmckay has left #openstack-sahara | 22:06 | |
*** Akanksha08 has quit IRC | 22:18 | |
*** dave-mccowan has quit IRC | 22:26 | |
*** egafford has quit IRC | 22:27 | |
*** witlessb has quit IRC | 22:39 | |
*** egafford has joined #openstack-sahara | 23:15 | |
*** jamielennox is now known as jamielennox|away | 23:20 | |
*** dave-mccowan has joined #openstack-sahara | 23:41 | |
*** openstackgerrit has quit IRC | 23:47 | |
*** openstackgerrit_ is now known as openstackgerrit | 23:47 | |
*** openstackgerrit_ has joined #openstack-sahara | 23:48 | |
*** openstackgerrit_ is now known as openstackgerrit | 23:48 | |
*** openstackgerrit_ has joined #openstack-sahara | 23:49 | |
*** chlong_ has quit IRC | 23:52 | |
*** openstackgerrit_ has quit IRC | 23:55 | |
*** egafford has quit IRC | 23:56 | |
*** openstackgerrit_ has joined #openstack-sahara | 23:57 | |
*** sgotliv has quit IRC | 23:57 |
Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!