14:00:22 <tellesnobrega> #startmeeting sahara
14:00:25 <openstack> Meeting started Thu Feb 21 14:00:22 2019 UTC and is due to finish in 60 minutes. The chair is tellesnobrega. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:26 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:29 <openstack> The meeting name has been set to 'sahara'
14:00:37 <tosky> o/
14:01:19 <tellesnobrega> waiting to see if jeremyfreudberg joins
14:02:26 <tellesnobrega> apparently not
14:02:27 <tellesnobrega> #topic News/Updates
14:03:12 <tellesnobrega> I'm stuck on a python3 issue related to pickle
14:03:27 <tellesnobrega> I've reached out for some help but nothing so far has worked
14:03:44 <tosky> pickles from pickle
14:04:16 <tellesnobrega> I'm now trying to set up a python3-only image so I can test whether the error only happens python2->python3 or whether it still shows up when both environments run python3
14:04:56 <tosky> I've helped with tons of backports to the older branches
14:05:29 <tosky> we have a blocker for merging stuff to pike; the issue is a bit complex and this bug contains all the details: https://bugs.launchpad.net/tempest/+bug/1816022
14:05:29 <openstack> Launchpad bug 1816022 in tempest "Incompatible requirements break tempest (at least) on stable/pike" [Undecided,New]
14:07:05 <tosky> I still need to test the mapr fixes
14:07:44 <tosky> but we are in better shape, and the number of open reviews went down (thanks to a Friday-night review by Jeremy :)
14:08:30 <tellesnobrega> yes, I guess python 3 is the only critical item right now
14:11:19 <tellesnobrega> should I explain a bit of the python 3 issue here
14:11:23 <tellesnobrega> so we have it on record?
14:11:30 <tosky> yep, I'd say so
14:11:33 <tellesnobrega> #topic python3
14:11:44 <tosky> so we can point to these logs in emails
14:11:50 <tellesnobrega> So, the issue we are facing with python3 now is related to pickle
14:12:32 <tellesnobrega> Sahara uses pickle to serialize functions (the path of a function) and sends them to be executed in a subprocess started with subprocess.Popen
14:12:47 <tellesnobrega> either locally or over ssh (depending on the command)
14:13:42 <tellesnobrega> At first, the error being returned was "Ran out of input", which we fixed by adding a buffer on the process and dumping the pickle into the subprocess buffer
14:14:15 <tellesnobrega> right now, the error being returned is "invalid load key '<'", which we thought was an encoding problem, but that lead didn't pay off
14:14:41 <tosky> unless we encoded it in the wrong way, but it's difficult to say
14:15:37 <tellesnobrega> the most consistent suggestion from python experts was to replace pickle with json, for safety, but I'm not sure how to do the proper serialization with json
14:16:23 <tellesnobrega> did I miss anything tosky?
14:16:58 <tosky> I don't think so
14:17:09 <tosky> I mean, nothing else to add
14:17:15 <tellesnobrega> cool
14:17:27 <tellesnobrega> maybe jeremy will read this and come up with something to help
14:18:28 <jeremyfreudberg> sorry, forgot about the meeting...
14:18:32 <jeremyfreudberg> anything for me to weigh in on?
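A minimal, standard-library-only sketch of the mechanism described above: a function reference plus its arguments is pickled with protocol 2, written to a child Python process's stdin, and the pickled result is read back from its stdout. This is not Sahara's code: `run_in_subprocess` and the inline child program are hypothetical, and the child here is the same local Python 3 interpreter rather than a python2 process reached over ssh.

```python
import pickle
import subprocess
import sys

# Hypothetical child program (not Sahara's _sahara-subprocess). In Python 3,
# sys.stdin/sys.stdout are text streams, so pickle must use their binary .buffer.
CHILD = r"""
import pickle, sys
func, args, kwargs = pickle.load(sys.stdin.buffer)
result = func(*args, **kwargs)
pickle.dump(result, sys.stdout.buffer, protocol=2)
sys.stdout.buffer.flush()
"""


def run_in_subprocess(func, *args, **kwargs):
    """Run func(*args, **kwargs) in a fresh interpreter, talking over pipes."""
    # A module-level function is pickled only as "module + name", so the child
    # must be able to import it under that same name.
    payload = pickle.dumps((func, args, kwargs), protocol=2)
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # communicate() writes the whole payload, closes stdin and reads all output.
    # (An empty stream is what makes pickle.load raise
    # "EOFError: Ran out of input" on the reading side.)
    out, err = proc.communicate(payload)
    if proc.returncode != 0:
        raise RuntimeError(err.decode("utf-8", "replace"))
    return pickle.loads(out)
```

For example, `run_in_subprocess(sorted, [3, 1, 2], reverse=True)` returns `[3, 2, 1]`. Using `communicate()` instead of separate `write()`/`read()` calls gives the child a complete stream followed by EOF and avoids pipe deadlocks.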
14:18:50 <fungi> replacing pickle with structured data serialization
14:18:56 * fungi is lurking
14:19:08 <tellesnobrega> fungi, we need help with that
14:19:19 <tellesnobrega> anyone who is willing to do so is more than welcome
14:19:32 <fungi> oh, i was answering jeremyfreudberg's question
14:20:00 <tellesnobrega> oh, ok
14:20:18 <tellesnobrega> jeremyfreudberg, we are kind of blocked on the python3 pickle issue
14:20:21 <fungi> though i am interested in it from the vmt perspective simply because there are a number of potential risks with pickle use, i doubt i have time to implement anything there
14:21:04 <fungi> seeing more occurrences of pickle get squashed in favor of safer serialization would be nice
14:21:52 <tosky> fungi: do you know of examples of such replacements that already happened in openstack (or anywhere else)?
14:22:02 <tellesnobrega> fungi, we understand that, and I'm willing to replace it if I can get some help understanding how to do it
14:22:54 <fungi> tosky: i don't have any specific examples off the top of my head, though i think there have been at least a few over the years
14:23:40 <fungi> do you pass complex objects and/or functions over pickle, or just basic data types?
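One way to act on the "replace pickle with json" suggestion, sketched under assumptions this log does not confirm: that every argument passed to the subprocess is a plain JSON type (strings, numbers, booleans, lists, dicts), and that callers can accept tuples arriving as lists. The function travels as a `"module:qualname"` string, and the receiver resolves it with `importlib` only if it appears on an allow-list. As fungi points out, JSON by itself does not make "run an arbitrary function" safe; the allow-list is what limits it. `ALLOWED`, `encode_call` and `decode_and_run` are hypothetical names.

```python
import importlib
import json
import math

# Hypothetical allow-list of "module:qualname" targets the receiver may call.
ALLOWED = {"builtins:sorted", "math:hypot"}


def encode_call(func, *args, **kwargs):
    """Encode a call as JSON text; args and kwargs must be JSON-compatible."""
    target = "%s:%s" % (func.__module__, func.__qualname__)
    return json.dumps({"func": target, "args": list(args), "kwargs": kwargs})


def decode_and_run(message):
    """Decode JSON from encode_call and run it, but only for allowed targets."""
    data = json.loads(message)
    target = data["func"]
    if target not in ALLOWED:
        raise ValueError("function not allowed: %s" % target)
    module_name, _, attr_path = target.partition(":")
    obj = importlib.import_module(module_name)
    for attr in attr_path.split("."):  # walk dotted qualnames like "Cls.method"
        obj = getattr(obj, attr)
    return obj(*data["args"], **data["kwargs"])


if __name__ == "__main__":
    print(decode_and_run(encode_call(math.hypot, 3, 4)))  # 5.0
```

Unlike `pickle.loads`, `json.loads` never imports modules or calls anything, so the only code the receiver runs is what it explicitly looks up. The encoder relies on `__qualname__`, which exists only on Python 3.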
14:23:52 <tellesnobrega> functions
14:24:17 <tellesnobrega> but iiuc the serialization is just the path of the function
14:25:01 <fungi> in that case it may not be any safer security-wise if you want to be able to send arbitrary functions to a subprocess for execution (avoiding doing that "accidentally" is the main reason most people warn against using pickle, after all)
14:25:55 <tellesnobrega> fungi, here is an example of what happens in sahara
14:25:56 <tellesnobrega> >>> ser = pickle.dumps(identity)
14:25:56 <tellesnobrega> >>> ser
14:25:56 <tellesnobrega> 'c__main__\nidentity\np0\n.'
14:26:11 <tellesnobrega> but we do it over the subprocess stdin
14:28:07 <fungi> yeah, i mean in this case the risk is low anyway because you're (hopefully) never going to have untrusted pickle data passed into these subprocesses
14:28:42 <fungi> i expect you're fighting the use of binary sequences for file descriptors in python3
14:28:56 <tellesnobrega> there are only a few cases of that happening and none are controlled by the user
14:29:29 <tosky> our case is complicated by the fact that we transmit the python code from python3 to python2
14:29:45 <tosky> as most of the hadoop vendors still run (and will run for a while) on older distributions
14:30:11 <fungi> oh, in that case the pickle is likely just not valid at all. i don't know that python2 can be expected to even make use of a pickle payload from python3
14:30:38 <fungi> has anyone looked into whether that's supported?
14:30:38 <tellesnobrega> even if we specify the protocol?
14:31:04 <jeremyfreudberg> tosky, are we sure that's what's happening?
14:31:04 <jeremyfreudberg> i thought the pickling was just between sahara-engine and _sahara-subprocess, with no depickling done on the instances
14:31:10 <jeremyfreudberg> but i might have missed something
14:31:19 <tosky> oh
14:31:31 <tosky> uhm, maybe
14:31:47 <tosky> better if I just follow the discussion :)
14:32:15 <tellesnobrega> jeremyfreudberg, when we call _execute_command("ls .ssh/authorized_keys") doesn't it run on the instances?
14:32:32 <fungi> ahh, yeah looks like perhaps if you write the pickle as data stream format #2 in python3 then python2.3 and later should be able to read it
14:33:25 <tellesnobrega> fungi, I have set pickle to use protocol=2; it solved some issues, but then the ones we are seeing now came up
14:34:56 <jeremyfreudberg> tellesnobrega: with _execute_command i think that is after the depickling has already happened (we open an ssh session with paramiko and the command is just a string)
14:35:46 <fungi> yeah, pickle is pretty opaque without digging into pep 307 and the earlier protocol specifications it builds on
14:35:53 <fungi> so hard to say
14:36:27 <jeremyfreudberg> tellesnobrega, i will look a little closer, there may be some info that i am missing
14:36:37 <tellesnobrega> jeremyfreudberg, thanks, please do, I might be mistaken as well
14:36:47 <fungi> as far as the "invalid load key '<'" exceptions
14:37:18 <tellesnobrega> what I saw is the same pickling path for both _connect (when we open the ssh connection) and _execute_command
14:37:50 <jeremyfreudberg> tellesnobrega: what's "path"?
14:39:05 <tellesnobrega> jeremyfreudberg, I mean, it goes through the same logic: pickling the function and sending it to the subprocess stdin, and the same for args and kwargs
14:39:17 <jeremyfreudberg> oh, got it
14:44:02 <jeremyfreudberg> tosky, tellesnobrega, has anyone actually looked at the full traceback with the "Ran out of input" error?
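The points raised above (pickle stores a module-level function only as a module/name reference, and protocol 2 is the format Python 2 can read), plus the two error messages from this meeting, can be checked with a few lines of standard-library Python 3. The snippet also shows what those errors mean: "Ran out of input" is raised for an empty stream, and "invalid load key, '<'" for a stream whose first byte is not a pickle opcode, for example text that ended up on the pipe. Which bytes Sahara's subprocess actually received is not established in this log; `try_load` is a hypothetical helper.

```python
import pickle


def identity(x):
    """A module-level function, like the identity example in the log."""
    return x


# Protocol 0: the stream holds only a GLOBAL reference ("c" opcode, module
# name, function name); the function's code is not serialized at all.
ser0 = pickle.dumps(identity, protocol=0)

# Protocol 2 (PEP 307) starts with the PROTO opcode b"\x80" followed by the
# protocol number.
ser2 = pickle.dumps(identity, protocol=2)


def try_load(payload):
    """Unpickle payload; return the object, or "ErrorType: message" on failure."""
    try:
        return pickle.loads(payload)
    except (EOFError, pickle.UnpicklingError) as exc:
        return "%s: %s" % (type(exc).__name__, exc)


if __name__ == "__main__":
    print(ser0)                        # b'c__main__\nidentity\np0\n.'
    print(ser2[:2])                    # b'\x80\x02'
    print(try_load(ser2) is identity)  # True
    print(try_load(b""))               # EOFError: Ran out of input
    print(try_load(b"<html>"))         # UnpicklingError: invalid load key, '<'.
```

Protocol 2 fixes the stream format, but the reader still has to import the referenced module and name. For protocols below 3, `pickle.dumps` (with the default `fix_imports=True`) maps a few renamed Python 3 modules, such as `builtins`, back to their Python 2 names.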
14:44:11 <jeremyfreudberg> (the traceback is suppressed in the logs)
14:44:32 <tosky> I did not
14:44:52 <tellesnobrega> me neither
14:44:54 <tosky> uhm, if it's suppressed, where should it be visible?
14:45:25 <tellesnobrega> I think I saw it once while running with debug, but it wasn't very useful; you can try as well
14:45:48 <jeremyfreudberg> https://github.com/openstack/sahara/blob/8659169c84c9a2198d5aee9e94c6d145a1f8d93c/sahara/service/engine.py#L125
14:47:37 <tosky> ah, removing that exception, or printing more details
14:48:42 <jeremyfreudberg> yup
14:49:03 <jeremyfreudberg> instead of the repr of the exception, the entire trace would be helpful, at least to me
14:50:05 <tellesnobrega> jeremyfreudberg, I will run it again and get that exception
14:50:14 <tellesnobrega> and send it by email
14:50:30 <jeremyfreudberg> tellesnobrega: great, thanks!
14:50:51 <jeremyfreudberg> (no need for it in the gate, it's just the one spot)
14:51:18 <tellesnobrega> ok
14:53:12 <jeremyfreudberg> any other topics for these last few minutes?
14:54:25 <tosky> there was an interesting discussion yesterday evening on #openstack-sahara about the dashboard, please check the logs
14:54:45 * jeremyfreudberg looking now
14:54:55 <tosky> I fixed the native integration tests again following your suggestions (now experimental); they are ready to be merged and backported so we can remove the legacy job
14:57:28 <jeremyfreudberg> right now every test fails, right? or are there a very small number which pass
14:57:40 <jeremyfreudberg> in any case, ivan is in enough agreement that we can merge the patch
14:58:03 <tosky> all of them fail, for different reasons :)
14:58:14 <tosky> but they are executed
14:58:38 <jeremyfreudberg> yup, i see.
14:58:38 <jeremyfreudberg> 8 tests run and 8 tests failed :)
14:58:38 <tellesnobrega> cool
14:58:48 <jeremyfreudberg> so let's merge that
14:58:52 <tellesnobrega> yes
14:59:29 <jeremyfreudberg> we're at the hour
14:59:33 <tellesnobrega> yes
14:59:37 <tellesnobrega> thanks jeremyfreudberg and tosky
14:59:43 <tellesnobrega> and thanks fungi as well
15:00:04 <tellesnobrega> let's see if we can figure out the python3 issue this week
15:00:08 <tellesnobrega> see you all next week
15:00:11 <tellesnobrega> #endmeeting
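During the meeting jeremyfreudberg asked for the entire trace instead of the repr of the exception at the linked engine.py line. That line is not reproduced in this log, so the following is only a generic standard-library sketch of the difference between logging `repr(exc)` and logging the formatted traceback; `load_payload`, `full_trace` and the logger name are hypothetical.

```python
import logging
import pickle
import traceback

LOG = logging.getLogger("engine-sketch")  # hypothetical logger name


def full_trace(exc):
    """Return the complete formatted traceback of an exception (Python 3)."""
    return "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))


def load_payload(payload):
    """Unpickle payload; on failure log the whole trace, not just repr(exc).

    Hypothetical helper for illustration, not Sahara's engine code.
    """
    try:
        return pickle.loads(payload)
    except Exception as exc:
        # repr(exc) alone is just something like "EOFError('Ran out of input')";
        # the formatted trace also shows which call raised it.
        LOG.error("unpickling failed: %r\n%s", exc, full_trace(exc))
        raise


if __name__ == "__main__":
    logging.basicConfig(level=logging.ERROR)
    try:
        load_payload(b"")
    except EOFError:
        pass  # the full traceback has already been logged to stderr
```

Inside an `except` block, `LOG.exception(...)` is a shorter equivalent, since it logs at ERROR level with the current traceback attached.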