14:00:22 #startmeeting sahara
14:00:25 Meeting started Thu Feb 21 14:00:22 2019 UTC and is due to finish in 60 minutes. The chair is tellesnobrega. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:26 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:29 The meeting name has been set to 'sahara'
14:00:37 o/
14:01:19 waiting to see if jeremyfreudberg joins
14:02:26 apparently not
14:02:27 #topic News/Updates
14:03:12 I'm stuck on a python3 issue related to pickle
14:03:27 I've reached out for some help but nothing so far has worked
14:03:44 pickles from pickle
14:04:16 I'm now trying to set up a python3-only image so I can test whether the error only occurs going python2->python3, or whether it is still raised with both environments running on python3
14:04:56 I've helped with tons of backports to the older branches
14:05:29 we have a blocker for merging stuff to pike, the issue is a bit complex and this bug contains all the details: https://bugs.launchpad.net/tempest/+bug/1816022
14:05:29 Launchpad bug 1816022 in tempest "Incompatible requirements break tempest (at least) on stable/pike" [Undecided,New]
14:07:05 I still need to test the mapr fixes
14:07:44 but we are in better shape, also the number of open reviews dropped (thanks to a Friday-night review by Jeremy :)
14:08:30 yes, I guess python 3 is the only critical stuff right now
14:11:19 should I explain a bit of the python 3 issue here
14:11:23 so we have it on record?
14:11:30 yep, I'd say so
14:11:33 #topic python3
14:11:44 so we can point to these logs in emails
14:11:50 So, the issue we are facing with python3 now is related to pickle
14:12:32 Sahara uses pickle to serialize functions (the path of a function) to send them to be executed in a subprocess.Popen
14:12:47 either locally or over ssh (depending on the command)
14:13:42 At first, the error being returned was "Ran out of input", which we fixed by adding a buffer on the process and dumping from pickle into the subprocess buffer
14:14:15 right now, the error being returned is an "invalid load key '<'", which we thought was an encoding problem, but that didn't pay off
14:14:41 unless we encoded it in the wrong way, but it's difficult to say
14:15:37 the most consistent suggestion from python experts was to replace pickle with json, for safety, but I'm not sure how to do the proper serialization with json
14:16:23 did I miss anything tosky ?
14:16:58 I don't think so
14:17:09 I mean, nothing else to add
14:17:15 cool
14:17:27 maybe jeremy will read and come up with something to help
14:18:28 sorry, forgot about the meeting...
14:18:32 anything for me to weigh in on?
14:18:50 replacing pickle with structured data serialization
14:18:56 * fungi is lurking
14:19:08 fungi, we need help on that
14:19:19 anyone who is willing to do so is more than welcome
14:19:32 oh, i was answering jeremyfreudberg's question
14:20:00 oh, ok
14:20:18 jeremyfreudberg, we are kind of blocked on the python3 pickle issue
14:20:21 though i am interested in it from the vmt perspective simply because there are a number of potential risks with pickle use, i doubt i have time to implement anything there
14:21:04 seeing more occurrences of pickle get squashed in favor of safer serialization would be nice
14:21:52 fungi: do you know of examples of such replacements that already happened in openstack (or anywhere else)?
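Note on the JSON suggestion above: a minimal sketch, assuming the payload really is just a function's import path plus plain-data arguments. The names dump_call, run_in_subprocess and CHILD are hypothetical and are not Sahara code; only top-level functions with JSON-serializable arguments and results are handled.

```python
import json
import posixpath
import subprocess
import sys

# Hypothetical child program: reads one JSON line from stdin, resolves the
# function by import path, calls it, and writes the result back as JSON.
CHILD = r"""
import importlib, json, sys
msg = json.loads(sys.stdin.readline())
func = getattr(importlib.import_module(msg["module"]), msg["name"])
result = func(*msg["args"], **msg["kwargs"])
sys.stdout.write(json.dumps({"result": result}) + "\n")
"""


def dump_call(func, *args, **kwargs):
    """Encode a call as JSON text: module path, function name, arguments."""
    return json.dumps({
        "module": func.__module__,
        "name": func.__name__,  # top-level functions only
        "args": list(args),
        "kwargs": kwargs,
    })


def run_in_subprocess(func, *args, **kwargs):
    """Send the encoded call to a child interpreter and return its result."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text pipes, since JSON is text
    )
    out, _ = proc.communicate(dump_call(func, *args, **kwargs) + "\n")
    return json.loads(out)["result"]


print(run_in_subprocess(posixpath.join, "a", "b"))
```

Because the payload is text, the same bytes mean the same thing to a python2 and a python3 reader. The child still imports and calls whatever path it is given, so this changes the wire format, not who is allowed to call what.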
14:22:02 fungi, we understand that, and I'm willing to replace it, if I can get some help understanding how to do it
14:22:54 tosky: i don't have any specific examples off the top of my head though i think there have been at least a few over the years
14:23:40 do you pass complex objects and/or functions over pickle, or just basic data types?
14:23:52 functions
14:24:17 but iiuc the serialization is just a path of the function
14:25:01 in that case it may not be any safer security-wise if you want to be able to send arbitrary functions to a subprocess for execution (avoiding doing that "accidentally" is the main reason most people warn against using pickle, after all)
14:25:55 fungi, here is an example of what happens in sahara
14:25:56 >>> ser = pickle.dumps(identity)
14:25:56 >>> ser
14:25:56 'c__main__\nidentity\np0\n.'
14:26:11 but we do it on subprocess stdin
14:28:07 yeah, i mean in this case the risk is low because you're (hopefully) never going to have untrusted pickle data passed into these subprocesses anyway
14:28:42 i expect you're fighting the use of binary sequences for file descriptors in python3
14:28:56 there are only a few cases of that happening and none are controlled by the user
14:29:29 our case is complicated by the fact that we transmit the python code from python3 to python2
14:29:45 as most of the hadoop vendors still run (and will run for a while) on older distributions
14:30:11 oh, in that case the pickle is likely just not valid at all. i don't know that python2 can be expected to even make use of a pickle payload from python3
14:30:38 has anyone looked into whether that's supported?
14:30:38 even if we specify the protocol?
14:31:04 tosky, are we sure that's what's happening? i thought the pickling was just between sahara-engine and _sahara-subprocess, no depickling done on the instances
14:31:10 but i might have missed something
14:31:19 oh
14:31:31 uhm, maybe
14:31:47 better if I just follow the discussion :)
14:32:15 jeremyfreudberg, when we call _execute_command("ls .ssh/authorized_keys") doesn't it run on the instances?
14:32:32 ahh, yeah looks like perhaps if you write the pickle as data stream format #2 in python3 then python2.3 and later should be able to read it
14:33:25 fungi, I have set pickle to use protocol=2, it solved some issues, but then the ones we are seeing now came up
14:34:56 tellesnobrega: with _execute_command i think that is after the depickling has already happened (we open an ssh session with paramiko and the command is just a string)
14:35:46 yeah, pickle is pretty opaque without digging into pep 307 and earlier relevant protocol specifications it builds on
14:35:53 so hard to say
14:36:27 tellesnobrega, i will look a little closer, there may be some info that i am missing
14:36:37 jeremyfreudberg, thanks, please do, I might be mistaken as well
14:36:47 as far as the "invalid load key '<'" exceptions
14:37:18 what I saw is the same pickling path for both _connect (when we open the ssh connection) and the _execute_command
14:37:50 tellesnobrega: what's "path"?
14:39:05 jeremyfreudberg, I mean, it goes through the same logic, pickling the function and sending to subprocess stdin and same for args and kwargs
14:39:17 oh, got it
14:44:02 tosky, tellesnobrega, has anyone actually looked at the full traceback with the "Ran out of input" error?
14:44:11 (the traceback is suppressed in the logs)
14:44:32 I did not
14:44:52 me neither
14:44:54 uhm, if it's suppressed, where should it be visible?
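Note on the protocol=2 and binary stream points above: a minimal sketch, assuming a pickled (function, args) pair is written to a child's stdin as raw bytes. The names CHILD and call_via_pickle are hypothetical and this is not Sahara's implementation. In python3, pickle data is bytes, so the child reads from sys.stdin.buffer where it exists and falls back to sys.stdin on python2.

```python
import pickle
import posixpath
import subprocess
import sys

# Hypothetical child program: unpickles (func, args) from binary stdin,
# calls the function, and pickles the result to binary stdout.
CHILD = r"""
import pickle, sys
stream_in = getattr(sys.stdin, "buffer", sys.stdin)     # bytes on python3
stream_out = getattr(sys.stdout, "buffer", sys.stdout)
func, args = pickle.load(stream_in)
pickle.dump(func(*args), stream_out, protocol=2)
stream_out.flush()
"""


def call_via_pickle(func, *args):
    """Pickle the call with protocol 2 and run it in a child interpreter."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,   # binary pipes: no text decoding anywhere
        stdout=subprocess.PIPE,
    )
    payload = pickle.dumps((func, args), protocol=2)
    out, _ = proc.communicate(payload)
    return pickle.loads(out)


print(call_via_pickle(posixpath.join, "a", "b"))
```

The function is pickled by reference (module and name), so it must be importable in the child. Anything else written to the same pipe, such as a log line or an error page, would reach pickle.load as if it were pickle data.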
14:45:25 I think I saw it once, while running with debug, but not much in it was useful, you can try as well
14:45:48 https://github.com/openstack/sahara/blob/8659169c84c9a2198d5aee9e94c6d145a1f8d93c/sahara/service/engine.py#L125
14:47:37 ah, removing that exception, or printing more details
14:48:42 yup
14:49:03 instead of the repr of the exception, the entire trace would be helpful, at least to me
14:50:05 jeremyfreudberg, I will run again and get that exception
14:50:14 and send by email
14:50:30 tellesnobrega: great, thanks!
14:50:51 (no need for it in the gate, it's just the one spot)
14:51:18 ok
14:53:12 any other topics for these last few minutes?
14:54:25 there was an interesting discussion yesterday evening on #openstack-sahara about the dashboard, please check the logs
14:54:45 * jeremyfreudberg looking now
14:54:55 I fixed the native integration tests again following your suggestions (now experimental), ready to be merged and backported to remove the legacy job
14:57:28 right now every test fails, right? or are there a very small number which pass
14:57:40 in any case, ivan is in enough agreement that we can merge the patch
14:58:03 all of them fail, for different reasons :)
14:58:14 but they are executed
14:58:38 yup, i see. 8 tests run and 8 tests failed :)
14:58:38 cool
14:58:48 so let's merge that
14:58:52 yes
14:59:29 we're at the hour
14:59:33 yes
14:59:37 thanks jeremyfreudberg and tosky
14:59:43 and thanks fungi as well
15:00:04 let's see if we can figure out the python3 issue this week
15:00:08 see you all next week
15:00:11 #endmeeting
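Note on the traceback request above: a minimal sketch of logging the full trace next to the repr of an exception. The helper describe_failure and the logger name are hypothetical; this is not the code at the engine.py link. The two inputs are chosen only because they raise errors with the same wording as the ones discussed, and are not a diagnosis of the Sahara failure.

```python
import logging
import pickle
import traceback

LOG = logging.getLogger("sketch")


def describe_failure(func, *args):
    """Call func and, on failure, return (repr of the exception, full traceback)."""
    try:
        func(*args)
    except Exception as exc:
        short = repr(exc)              # what a repr-only log line would show
        full = traceback.format_exc()  # the complete stack, frame by frame
        LOG.error("call failed: %s\n%s", short, full)
        return short, full
    return None, None


# Empty input and bytes that are not pickle data at all.
for data in (b"", b"<not a pickle>"):
    short, full = describe_failure(pickle.loads, data)
    print(short)
```

Inside an except block, LOG.exception("call failed") records the same traceback without formatting it by hand.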