14:00:22 <tellesnobrega> #startmeeting sahara
14:00:25 <openstack> Meeting started Thu Feb 21 14:00:22 2019 UTC and is due to finish in 60 minutes.  The chair is tellesnobrega. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:26 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:29 <openstack> The meeting name has been set to 'sahara'
14:00:37 <tosky> o/
14:01:19 <tellesnobrega> waiting to see if jeremyfreudberg joins
14:02:26 <tellesnobrega> apparently not
14:02:27 <tellesnobrega> #topic News/Updates
14:03:12 <tellesnobrega> I'm stuck on a python3 issue related to pickle
14:03:27 <tellesnobrega> I've reached out for some help but nothing so far has worked
14:03:44 <tosky> pickles from pickle
14:04:16 <tellesnobrega> I'm now trying to set up a python3-only image so I can test whether the error is specific to python2->python3, or whether it still shows up with both environments running on python3
14:04:56 <tosky> I've helped with tons of backports to the older branches
14:05:29 <tosky> we have a blocker for merging stuff to pike, the issue is a bit complex and this bug contains all the details: https://bugs.launchpad.net/tempest/+bug/1816022
14:05:29 <openstack> Launchpad bug 1816022 in tempest "Incompatible requirements break tempest (at least) on stable/pike" [Undecided,New]
14:07:05 <tosky> I still need to test the mapr fixes
14:07:44 <tosky> but we are in better shape, and the number of open reviews went down (thanks to a Friday-night review by Jeremy :)
14:08:30 <tellesnobrega> yes, I guess python 3 is the only critical stuff right now
14:11:19 <tellesnobrega> should I explain a bit of the python 3 issue here
14:11:23 <tellesnobrega> so we have it on record?
14:11:30 <tosky> yep, I'd say so
14:11:33 <tellesnobrega> #topic python3
14:11:44 <tosky> so we can point to these logs in emails
14:11:50 <tellesnobrega> So, the issue we are facing with python3 now is related to pickle
14:12:32 <tellesnobrega> Sahara uses pickle to serialize functions (the path of a function) to send them to be executed in a subprocess.Popen
14:12:47 <tellesnobrega> either locally or over ssh (depending on the command)
14:13:42 <tellesnobrega> At first, the error being returned was "Ran out of input", which we fixed by adding a buffer on the process and dumping from pickle into the subprocess buffer
14:14:15 <tellesnobrega> right now, the error being returned is an "invalid load key '<'", which we thought was an encoding problem, but that didn't pan out
14:14:41 <tosky> unless we encoded it in the wrong way, but it's difficult to say
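For reference, both messages quoted above are standard Python 3 pickle failures. The sketch below is not Sahara code, and the helper name `describe_unpickle_failure` is hypothetical. It only reproduces the two error classes: an empty stream raises `EOFError`, and a stream whose first byte is not a pickle opcode (here `<`) raises `pickle.UnpicklingError`. One reading, offered as an assumption rather than a diagnosis, is that the reader is receiving something other than the pickled payload, such as text starting with `<`.

```python
import pickle


def describe_unpickle_failure(data):
    """Try to unpickle raw bytes and report which failure class occurs."""
    try:
        pickle.loads(data)
    except EOFError as exc:
        # Raised when the stream ends before a complete pickle was read,
        # e.g. when nothing was written to the pipe at all.
        return "EOFError: %s" % exc
    except pickle.UnpicklingError as exc:
        # Raised when a byte is not a valid pickle opcode, e.g. when the
        # stream actually contains text such as "<...".
        return "UnpicklingError: %s" % exc
    return "ok"


if __name__ == "__main__":
    print(describe_unpickle_failure(b""))
    print(describe_unpickle_failure(b"<not a pickle>"))
    print(describe_unpickle_failure(pickle.dumps("fine", protocol=2)))
```

Printing the first few raw bytes that arrive on the reading side is a cheap way to tell these cases apart.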
14:15:37 <tellesnobrega> the most consistent suggestion from python experts was to replace pickle with json, for safety, but I'm not sure how to do the proper serialization with json
14:16:23 <tellesnobrega> did I miss anything tosky ?
14:16:58 <tosky> I don't think so
14:17:09 <tosky> I mean, nothing else to add
14:17:15 <tellesnobrega> cool
14:17:27 <tellesnobrega> maybe jeremy will read and come up with something to help
14:18:28 <jeremyfreudberg> sorry, forgot about the meeting...
14:18:32 <jeremyfreudberg> anything for me to weigh in on?
14:18:50 <fungi> replacing pickle with structured data serialization
14:18:56 * fungi is lurking
14:19:08 <tellesnobrega> fungi, we need help on that
14:19:19 <tellesnobrega> anyone who is willing to do so is more than welcome
14:19:32 <fungi> oh, i was answering jeremyfreudberg's question
14:20:00 <tellesnobrega> oh, ok
14:20:18 <tellesnobrega> jeremyfreudberg, we are kind of blocked on python3 pickle issue
14:20:21 <fungi> though i am interested in it from the vmt perspective simply because there are a number of potential risks with pickle use, i doubt i have time to implement anything there
14:21:04 <fungi> seeing more occurrences of pickle get squashed in favor of safer serialization would be nice
14:21:52 <tosky> fungi: do you know of examples of such replacements that already happened in openstack (or anywhere else)?
14:22:02 <tellesnobrega> fungi, we understand that, and I'm willing to replace it, if I can get some help understanding how to do it
14:22:54 <fungi> tosky: i don't have any specific examples off the top of my head though i think there have been at least a few over the years
14:23:40 <fungi> do you pass complex objects and/or functions over pickle, or just basic data types?
14:23:52 <tellesnobrega> functions
14:24:17 <tellesnobrega> but iiuc the serialization is just a path of the function
14:25:01 <fungi> in that case it may not be any safer security-wise if you want to be able to send arbitrary functions to a subprocess for execution (avoiding doing that "accidentally" is the main reason most people warn against using pickle, after all)
14:25:55 <tellesnobrega> fungi, here is an example of what happens in sahara
14:25:56 <tellesnobrega> >>> ser = pickle.dumps(identity)
14:25:56 <tellesnobrega> >>> ser
14:25:56 <tellesnobrega> 'c__main__\nidentity\np0\n.'
14:26:11 <tellesnobrega> but we do it on subprocess stdin
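A minimal sketch of the pattern described above, assuming Python 3 on both sides of the pipe: a function is pickled by reference (only its module path and name travel, as in the `c__main__\nidentity` output) and written as bytes to a child process's stdin. The names `CHILD_PROGRAM` and `run_in_subprocess`, and the choice to send one `(func, args, kwargs)` tuple, are illustrative assumptions, not Sahara's actual implementation.

```python
import os.path
import pickle
import subprocess
import sys

# Child program (hypothetical): read one pickled (func, args, kwargs) tuple
# from *binary* stdin, call it, and write the pickled result to binary stdout.
CHILD_PROGRAM = (
    "import pickle, sys\n"
    "func, args, kwargs = pickle.load(sys.stdin.buffer)\n"
    "result = func(*args, **kwargs)\n"
    "sys.stdout.buffer.write(pickle.dumps(result, protocol=2))\n"
)


def run_in_subprocess(func, *args, **kwargs):
    """Call an importable function in a child interpreter via pickle."""
    # The function is pickled by reference, so it must be importable in the
    # child; functions defined in __main__ would not resolve there.
    payload = pickle.dumps((func, args, kwargs), protocol=2)
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_PROGRAM],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    # communicate() writes the bytes, closes stdin, and reads all of stdout.
    out, _ = proc.communicate(payload)
    if proc.returncode != 0:
        raise RuntimeError("child exited with code %d" % proc.returncode)
    return pickle.loads(out)


if __name__ == "__main__":
    print(run_in_subprocess(os.path.basename, "/tmp/a.txt"))
```

The detail that matters on Python 3 is that pickle data is bytes, so both ends use the binary layer (`sys.stdin.buffer`, `sys.stdout.buffer`) rather than the text-mode streams.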
14:28:07 <fungi> yeah, i mean in this case the risk is low anyway because you're (hopefully) never going to have untrusted pickle data passed into these subprocesses anyway
14:28:42 <fungi> i expect you're fighting the use of binary sequences for file descriptors in python3
14:28:56 <tellesnobrega> there are only a few cases of that happening and none are controlled by the user
14:29:29 <tosky> our case is complicated by the fact that we transmit the python code from python3 to python2
14:29:45 <tosky> as most of the hadoop vendors still run (and will run for a while) on older distributions
14:30:11 <fungi> oh, in that case the pickle is likely just not valid at all. i don't know that python2 can be expected to even make use of a pickle payload from python3
14:30:38 <fungi> has anyone looked into whether that's supported?
14:30:38 <tellesnobrega> even if we specify the protocol?
14:31:04 <jeremyfreudberg> tosky, are we sure that's what's happening? i thought the pickling was just between sahara-engine and _sahara-subprocess , no depickling done on the instances
14:31:10 <jeremyfreudberg> but i might have missed something
14:31:19 <tosky> oh
14:31:31 <tosky> uhm, maybe
14:31:47 <tosky> better if I just follow the discussion :)
14:32:15 <tellesnobrega> jeremyfreudberg, when we call _execute_command("ls .ssh/authorized_keys") doesn't it run on the instances?
14:32:32 <fungi> ahh, yeah looks like perhaps if you write the pickle as data stream format #2 in python3 then python2.3 and later should be able to read it
14:33:25 <tellesnobrega> fungi, I have set pickle to use protocol=2, it solved some issues, but the ones we are seeing now came up
14:34:56 <jeremyfreudberg> tellesnobrega: with _execute_command i think that is after the depickling has already happened (we open an ssh session with paramiko and the command is just a string)
14:35:46 <fungi> yeah, pickle is pretty opaque without digging into pep 307 and earlier relevant protocol specifications it builds on
14:35:53 <fungi> so hard to say
14:36:27 <jeremyfreudberg> tellesnobrega, i will look a little closer, there may be some info that i am missing
14:36:37 <tellesnobrega> jeremyfreudberg, thanks, please do, I might be mistaken as well
14:36:47 <fungi> as far as the "invalid load key '<'" exceptions
14:37:18 <tellesnobrega> what I saw is the same pickling path for both _connect (when we open the ssh connection) and _execute_command
14:37:50 <jeremyfreudberg> tellesnobrega: what's "path"?
14:39:05 <tellesnobrega> jeremyfreudberg, I mean, it goes through the same logic, pickling the function and sending to subprocess stdin and same for args and kwargs
14:39:17 <jeremyfreudberg> oh, got it
14:44:02 <jeremyfreudberg> tosky, tellesnobrega, has anyone actually looked at the full traceback with the "Ran out of input" error?
14:44:11 <jeremyfreudberg> (the traceback is suppressed in the logs)
14:44:32 <tosky> I did not
14:44:52 <tellesnobrega> me neither
14:44:54 <tosky> uhm, if it's suppressed, where should it be visible?
14:45:25 <tellesnobrega> I think I saw it once, while running with debug, but not much was useful, but you can try as well
14:45:48 <jeremyfreudberg> https://github.com/openstack/sahara/blob/8659169c84c9a2198d5aee9e94c6d145a1f8d93c/sahara/service/engine.py#L125
14:47:37 <tosky> ah, removing that exception, or printing more details
14:48:42 <jeremyfreudberg> yup
14:49:03 <jeremyfreudberg> instead of the repr of the exception, the entire trace would be helpful, at least to me
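As a generic illustration of the difference between logging the repr of an exception and logging the whole trace, here is a sketch using only the standard `logging` module. It is not the code at the linked engine.py line. The logger name `sketch` and the helper `run_and_log` are hypothetical.

```python
import io
import logging
import pickle

LOG = logging.getLogger("sketch")


def run_and_log(func):
    """Run func(); on failure log the full traceback instead of just repr."""
    try:
        return func()
    except Exception as ex:
        # LOG.exception logs at ERROR level and appends the traceback of the
        # exception currently being handled; logging only repr(ex) would
        # lose the call stack.
        LOG.exception("operation failed: %r", ex)
        return None


if __name__ == "__main__":
    buf = io.StringIO()
    LOG.addHandler(logging.StreamHandler(buf))
    # Unpickling an empty byte string raises EOFError.
    run_and_log(lambda: pickle.loads(b""))
    print(buf.getvalue())
```

The same effect is available through `exc_info=True` on any logging call, which may fit better where the message should stay at a lower level than ERROR.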
14:50:05 <tellesnobrega> jeremyfreudberg, I will run again and get that exception
14:50:14 <tellesnobrega> and send by email
14:50:30 <jeremyfreudberg> tellesnobrega: great, thanks!
14:50:51 <jeremyfreudberg> (no need for it in the gate, it's just the one spot)
14:51:18 <tellesnobrega> ok
14:53:12 <jeremyfreudberg> any other topics for these last few minutes?
14:54:25 <tosky> there was an interesting discussion yesterday evening on #openstack-sahara about the dashboard, please check the logs
14:54:45 * jeremyfreudberg looking now
14:54:55 <tosky> I fixed the native integration tests again following your suggestions (now experimental), ready to be merged and backported to remove the legacy job
14:57:28 <jeremyfreudberg> right now every test fails, right? or are there a very small number which pass
14:57:40 <jeremyfreudberg> in any case, ivan is in enough agreement that we can merge the patch
14:58:03 <tosky> all of them fail, for different reasons :)
14:58:14 <tosky> but they are executed
14:58:38 <jeremyfreudberg> yup, i see. 8 tests run and 8 tests failed :)
14:58:38 <tellesnobrega> cool
14:58:48 <jeremyfreudberg> so let's merge that
14:58:52 <tellesnobrega> yes
14:59:29 <jeremyfreudberg> we're at the hour
14:59:33 <tellesnobrega> yes
14:59:37 <tellesnobrega> thanks jeremyfreudberg and tosky
14:59:43 <tellesnobrega> and thanks fungi as well
15:00:04 <tellesnobrega> let's see if we can figure out the python3 issue this week
15:00:08 <tellesnobrega> see you all next week
15:00:11 <tellesnobrega> #endmeeting