03:01:40 <Sundar> #startmeeting openstack-cyborg 03:01:41 <xinranwang> Hi Sundar 03:01:42 <openstack> Meeting started Thu Sep 5 03:01:40 2019 UTC and is due to finish in 60 minutes. The chair is Sundar. Information about MeetBot at http://wiki.debian.org/MeetBot. 03:01:43 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 03:01:44 <xinranwang> Hi chenke 03:01:45 <openstack> The meeting name has been set to 'openstack_cyborg' 03:01:56 <Sundar> #topic Who's here 03:02:05 <Sundar> o/ 03:02:06 <xinranwang> #info xinranwang 03:02:11 <s_shogo> #info s_shogo 03:02:12 <yikun> o/ 03:02:15 <chenke> #info chenke 03:02:18 <shaohe_feng> o/ 03:02:20 <chenke> o/ 03:02:25 <s_shogo> o/ 03:02:25 <Sundar> Hi all 03:02:31 <zhurong> \0/ 03:02:46 <Sundar> zhurong: Welcome 03:03:10 <Sundar> The main item on the agenda is, of course, the reviews 03:04:06 <Sundar> Please focus on the P5-P9 patches. They and Nova notifn patch need to merge soon, before we get reviews of Nova patches. Also, they are needed for tempest to work, and that is a prereq too. 03:05:14 <Sundar> yikun and all: Would you be able to review it? (Sorry for calling you out, yikun :)) 03:05:34 <yikun> Sure, I will review it today 03:05:40 <Sundar> Thanks :) 03:06:32 <yikun> and one detail question about db migrations on P5.5 03:06:34 <yikun> https://review.opendev.org/#/c/677115/12/cyborg/db/sqlalchemy/alembic/versions/c1b5abada09c_update_for_nova_integ.py 03:06:51 <Sundar> We had a bit of a wrinkle on the Nova side too. The Cyborg's notification event would come too soon after we start the bind and would get lost. I think we have a way to solve it which (a) should hopefully be acceptable to Nova and (b) does not involve change to Nova objects or db 03:06:53 <yikun> Can we append the change on existing migration script? 03:07:16 <yikun> and most of other patches are okay to me 03:08:01 <Sundar> yikun: There is one migration script for v2 API, so that we can show people exactly what we did. Once we releaseTrain, of course, previous migration scripts cannot be edited. 03:08:35 <Sundar> As long as we don't change the migration script for Stein, I think we should be ok. Any concerns? 03:10:41 <yikun> it will break upgrade from existing to now, but as you said if we still think the train relaase it's a start version, if other people have no question, I have no objection 03:12:01 <Sundar> I understand the existing practice but (a) too many migration scripts slows down upgrade for operators and (b) it is good to show onw script for all v2 API IMHO. Thanks for your understanding. 03:12:13 <wangzhh> Agree with yikun. It's better to append a migration script. And other patch is fine with me. 03:12:37 <wangzhh> *patches are 03:13:16 <Sundar> wangzhh: Given the reasons above, would you still ask for a separate migration script? 03:14:30 <Coco_gao_> #info Coco_gao 03:15:07 <Sundar> Hi Coco_gao 03:15:16 <Sundar> We are discussing the patch review 03:15:50 <wangzhh> So, we 'll consider T as a start version? 03:17:00 <Sundar> Stein is the starting version -- v1 API. We should not modify or change the Stein migration script. We introduced a new migration script in Train, for v2 API. Anything unrelated to it can be in a separate script. But, so far, everything we did has been for v2 and Nova integ, I think. 03:18:16 <Sundar> From operators' point of view, the migration scripts that are part of a release should not be changed. Within a release, we usually introduce separate scripts for developers' convenience, I think. 03:19:10 <wangzhh> Uh, Okay. I have no objection. 03:19:33 <Sundar> Anyways, that is just my view. Thank you, wangzhh, yikun and all. 03:19:43 <Sundar> Any other concerns or thoughts on P5-P9? 03:21:19 <chenke> why are you put -2 on P5? 03:23:06 <Sundar> chenke: It is only a procedure, so that the patch series does not merge piece by piece. Once we get +2 on all patches, I will remove -2, so that it will all merge together. Nova guys are following this practice for my patch series too: 03:23:09 <Sundar> https://review.opendev.org/#/c/631242/ 03:23:30 <Sundar> I should have done that right at the beginning. 03:24:05 <Sundar> So, even P5 can be reviewed fully. 03:24:05 <chenke> Okay. I understand it. 03:25:07 <Sundar> Can we hope it to merge them by tomorrow? I don't want to rush you :) but till P5-P9 and notificaiton patch merge, we cannot even ask Nova folks for review 03:25:31 <Sundar> And Sep 9 deadline is just few days away 03:26:52 <wangzhh> One question: https://review.opendev.org/#/c/678177/1..2/cyborg/common/policy.py Related comment in this patch. I did't find program in update ARQ. 03:28:02 <Sundar> update refers to bind/unbind here, and that would involve programming. 03:28:17 <Sundar> Are you saying the term 'update' should be changed? 03:29:13 <wangzhh> No. Just check if it is admin or not before program. As I replied. 03:30:18 <Sundar> Hmm, it could be owner or admin, right? Whoever created the ARQ should be allowed to bind it, otherwise they should not be able to create it. 03:32:23 <wangzhh> Yep. Owner or admin should be allowed to Bind and Unbind. But when involve programming, we should check if it is admin or not. 03:32:50 <wangzhh> Only admin can program? Right? 03:33:54 <Sundar> Binding involves programming for FPGAs. A tenant who has the privilege to create ARQs for an FPGA should also be able to program it, right? That's why I raised the question about 'create' privileges. 03:34:04 <shaohe_feng> program had better have sudo privilege no node 03:34:28 <Sundar> Yes, we do sudo fpgaconf today, and have a patch to move to oslo privsep. 03:34:28 <shaohe_feng> s/no/on 03:34:55 <shaohe_feng> yes. let's from simple 03:35:05 <shaohe_feng> Firstly it can work. 03:35:17 <shaohe_feng> and let nova can call cyborg 03:35:21 <shaohe_feng> then improve cyborg 03:35:28 <Sundar> wangzhh: I understand your concern. However, if only admins can create/bind ARQs for FPGAs, that will be very restrictive, right? 03:35:35 <chenke> Yes. Firstly make it work. 03:36:21 <wangzhh> Yep. Sundar. 03:36:44 <wangzhh> So, Just make it work first. 03:37:17 <Sundar> Ok, I agree with wangzhh that security is important and with others that we should first get it to work by Train-3. I'll bring RBAC on the agenda for the next IRC meeting. Hopefully, other things will be working by then. :) 03:37:35 <wangzhh> Cool. 03:37:36 <chenke> Agreee. 03:37:54 <Sundar> #topic Python 3 03:38:29 <shaohe_feng> the admin or tenant own the bit steam image, can download the image and to do program. 03:38:42 <Sundar> There is a Train goal to enable Py3 by Train-3 milestone, which is next week. Thanks to s_shogo (and Ikuo) for the patch. Hope we can merge that too by next week. 03:39:00 <shaohe_feng> we should challenge the tenant 03:39:46 <Sundar> shaohe_feng: Sure, let's take it up next week. 03:39:56 <s_shogo> I'll check again that soon, after P5-P9 merged. (latest patch seems to be no problem) 03:40:20 <Sundar> Thanks, s_shogo. 03:40:49 <Sundar> #topic AoB 03:41:08 <s_shogo> About cyborg client. 03:41:38 <xinranwang> Hi all, I have to drop now. Thanks for your reviews for placement patch, it is merged finally :) And please review https://review.opendev.org/#/c/667231/ tempest code and notify patch https://review.opendev.org/#/c/674520/ too. 03:41:41 <Sundar> Outside of patches and Py3, I don't have much for today. As i said, I think I have a way to avoid the race in Nova, and will work on getting it out. Anything else for today? 03:41:55 <Sundar> s_shogo: Yes 03:42:05 <Sundar> I saw your patch -- thanks. 03:42:09 <s_shogo> I have posted a patch for OpenStack SDK, and request review to SDK team. https://review.opendev.org/#/c/679914/ 03:42:37 <s_shogo> Subsequently ,I'll post cyborg-python client patch ,please review that > All 03:42:56 <chenke> Cool. 03:42:59 <Sundar> Great, sure, will do. 03:43:24 <shaohe_feng> one thing. 03:43:36 <shaohe_feng> we may start several asyn jobs 03:44:03 <shaohe_feng> only all jobs are finished, the bind is successful 03:44:18 <shaohe_feng> one of the jobs failed, then the bind should be failed 03:44:24 <shaohe_feng> right? 03:45:10 <Sundar> Yes. 03:45:44 <shaohe_feng> so how to sync all the state of jobs? 03:46:20 <Sundar> Wait for all of them to finish. I think eventlet has a wait() method, IIUC 03:46:35 <shaohe_feng> yes. 03:46:53 <shaohe_feng> but wait is block 03:47:04 <shaohe_feng> so it can not in main thread. 03:47:18 <shaohe_feng> also another, if one jobs failed, we should cancel others, right? 03:47:28 <shaohe_feng> where should we put the wait? 03:48:18 <Sundar> It can't be in main thread, I agree. We don't need to cancel the rest if one fails, at least in Train. As we discussed, if the job has already started flashing the card, we shouldn't cancel it then 03:49:03 <Sundar> Just a thought: May be we have one eventlet whichis a master and it launches other eventlets, and the master does the wait. 03:50:00 <shaohe_feng> yes, we need a such mechanism 03:50:14 <shaohe_feng> I'm thinking it. 03:50:44 <shaohe_feng> the master thread should not be from a thread pool, right? 03:50:58 <shaohe_feng> for example, there a 2 pools. 03:51:01 <Sundar> Sure. My suggestion is to do something super-simple for Train-3 milestone -- may be just one eventlet for all ARQs, because we won't have too many ARQs anyway 03:51:45 <shaohe_feng> we create 2 jobs, so no pool for the master thread to monitor these 2 jobs. 03:51:59 <Sundar> Simplcity and reliability are more important, IMHO. It will also reduce your work :) 03:52:20 <shaohe_feng> yes, agree. 03:52:34 <shaohe_feng> async is more complex than sync. 03:53:26 <Sundar> Yes. IMHO, we can even leave unbind as sync in the worst case, for now, because we don't do anything there today 03:53:46 <shaohe_feng> yes. agree 03:53:58 <Sundar> We just want some patch in the works by Sep 3, so that we can kickstart the review 03:54:18 <Sundar> I mean, Sep 9 03:55:43 <Sundar> Anything else, anybody? 03:55:59 <chenke> No 03:56:37 <Sundar> Cool. Thank you, all. Please review the patches. Have a good day! Bye. 03:56:41 <Sundar> #endmeeting