*** chenke has joined #openstack-cyborg | 01:12 | |
*** openstackgerrit has joined #openstack-cyborg | 01:23 | |
openstackgerrit | Shogo Saito proposed openstack/cyborg master: Fix ARQ delete API issue https://review.opendev.org/683013 | 01:23 |
---|---|---|
openstackgerrit | Xinran WANG proposed openstack/cyborg master: bug fixing: let FPGA driver report correct traits when no SRIOV enabled https://review.opendev.org/680953 | 01:57 |
*** luyao has joined #openstack-cyborg | 02:02 | |
*** chenke has quit IRC | 02:07 | |
*** chunxiu has joined #openstack-cyborg | 02:13 | |
*** TxGirlGeek has quit IRC | 02:38 | |
openstackgerrit | YumengBao proposed openstack/cyborg master: conductor writes device_profile update to db https://review.opendev.org/679406 | 02:42 |
*** shaohe_feng has joined #openstack-cyborg | 02:47 | |
*** s_shogo has joined #openstack-cyborg | 02:47 | |
*** chenke has joined #openstack-cyborg | 02:56 | |
*** xinranwang has joined #openstack-cyborg | 02:59 | |
*** Yumeng has joined #openstack-cyborg | 03:00 | |
chenke | Hi | 03:00 |
*** wangzhh has joined #openstack-cyborg | 03:00 | |
shaohe_feng | hi all | 03:01 |
chenke | hi maxiaoha | 03:01 |
*** changzhi has joined #openstack-cyborg | 03:01 | |
wangzhh | Hi all. Hi shaohe. | 03:01 |
Yumeng | hi all | 03:01 |
*** Sundar has joined #openstack-cyborg | 03:02 | |
Sundar | Hi all | 03:02 |
Sundar | #startmeeting openstack-cyborg | 03:02 |
openstack | Meeting started Thu Sep 19 03:02:43 2019 UTC and is due to finish in 60 minutes. The chair is Sundar. Information about MeetBot at http://wiki.debian.org/MeetBot. | 03:02 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 03:02 |
*** openstack changes topic to " (Meeting topic: openstack-cyborg)" | 03:02 | |
openstack | The meeting name has been set to 'openstack_cyborg' | 03:02 |
Sundar | #topic Who's here | 03:02 |
*** openstack changes topic to "Who's here (Meeting topic: openstack-cyborg)" | 03:03 | |
Sundar | o/ | 03:03 |
chenke | o/ | 03:03 |
Yumeng | #info Yumeng | 03:03 |
s_shogo | #info s_shogo | 03:03 |
wangzhh | #info wangzhh | 03:03 |
changzhi | #info changzhi | 03:03 |
chenke | #info chenke | 03:03 |
Sundar | Hi chenke, Yumeng, s_shogo, wangzhh. Welcome changzhi | 03:04 |
shaohe_feng | #info shaohe_feng | 03:04 |
Sundar | Hi shaohe | 03:04 |
Sundar | #topic Status | 03:04 |
*** openstack changes topic to "Status (Meeting topic: openstack-cyborg)" | 03:04 | |
Sundar | First, thank you all for an active Train cycle. We hit feature freeze a week ago | 03:04 |
Sundar | So did other projects. | 03:05 |
Sundar | The good news: Cyborg side of the Nova integration is pretty much done. We just need to clean up the way we invoke other services | 03:05 |
chenke | Great | 03:06 |
wangzhh | Cool | 03:06 |
Sundar | Not so good news: Our Nova patches did not get enough reviews from Nova developers, and so did not make the cut. | 03:06 |
Sundar | Part of the problem is that Cyborg patches were open for a long time, so Nova developers did not see them as ready, though we could bring up a VM with Cyborg + Nova patches | 03:07 |
Sundar | Also, there was a longstanding request to show tempest CI working. That completed exactly in the milestone week. That was too late to get sustained reviews. | 03:08 |
shaohe_feng | We know integration is a big effort | 03:08 |
shaohe_feng | Sundar: you put in a lot of effort. Thanks | 03:08 |
chenke | It is understandable that the patches in Nova are merged slowly. | 03:09 |
Sundar | NP, thanks Shaohe. I am optimistic about U because I think we are close, and I have re-proposed the Nova spec. This time, tempest and most things are merged. Things that attract cross-project attention, like tempest, privsep, sdk_adapter stuff, etc. are all done or making good progress | 03:09 |
Sundar | Hope to get the Nova patches in the runway very early in the cycle. The more we wait, the more things get bogged down among the tons of other reviews. | 03:10 |
Sundar | That said, we have a few more things to wrap up in Train :) | 03:11 |
Sundar | First, remove the hardcoding of 'devstack-admin'. Thanks, chenker and all for addressing that :) | 03:12 |
Sundar | Second, v1 API is deprecated but still supported in Train. But it is not working because we removed all v1 from devstack. I should re-enable it, I think | 03:12 |
xinranwang | #info xinranwang | 03:13 |
Sundar | Shaohe's async bind, privsep, rbac are important | 03:13 |
xinranwang | Hi all | 03:13 |
Sundar | I think all the pep8/flake fixes from chenker/zhurong are looking good and will probably merge this week | 03:14 |
Sundar | Can you all think of anything else? | 03:14 |
Yumeng | Sundar: and please don't forget the device_profile db update by conductor: https://review.opendev.org/#/c/679406/ | 03:14 |
Yumeng | just updated | 03:14 |
shaohe_feng | Sundar, please leave a slot for me to introduce the async jobs, so others can review it easily. | 03:15 |
shaohe_feng | Thanks | 03:15 |
Yumeng | and this GPU fix: https://review.opendev.org/#/c/675059/ I tested it in my devstack env, and it works | 03:15 |
Sundar | Ah yes, that too, Yumeng :) There are quite a few patches up there, including https://review.opendev.org/680953. | 03:15 |
Sundar | Sure, let's knock off as much as we can. Was just listing the ones critical to complete in Train | 03:16 |
openstackgerrit | Merged openstack/cyborg master: P5: Fix pep8 error in cyborg/accelerator https://review.opendev.org/679175 | 03:16 |
Sundar | shaohe_feng: Sure | 03:16 |
Sundar | Folks, anything else before we dive into Shaohe's async bind? | 03:17 |
s_shogo | I'm starting the test & validation task on a real machine, beginning with common functions that are independent of specific accelerators. | 03:18 |
s_shogo | If I find any bugs or errors, I'll report them or post patches before the Train release. | 03:18 |
Sundar | Sure, s_shogo. I think the client effort can be aimed early in U release, since the Train release milestone for clients is past | 03:19 |
Sundar | I have some questions on RBAC: https://review.opendev.org/#/c/678177/ . In https://review.opendev.org/#/c/678177/3/cyborg/common/policy.py@83, should it be an allow rule? Anybody can create an ARQ and thereby bind that ARQ, and so program an FPGA? | 03:19 |
s_shogo | Sundar: OK, I'll continue the client & SDK task into the U release. | 03:21 |
Sundar | wangzhh: What do you think? | 03:22 |
xinranwang | should we complete v2 API in T? | 03:22 |
wangzhh | Sundar, it should be allowed, and then rechecked in the method depending on whether it is a program action or not. | 03:23 |
Sundar | wangzhh: ok | 03:23 |
Sundar | xinranwang: Only devices API remains. We are supposed to merge only bug fixes, I think. So, it will probably go to U. Is anything else remaining? | 03:23 |
Sundar | OK, 35 min remaining. Let's move to async bind. | 03:25 |
Sundar | #topic Async bind | 03:25 |
*** openstack changes topic to "Async bind (Meeting topic: openstack-cyborg)" | 03:25 | |
Sundar | Shaohe, take it away! | 03:25 |
shaohe_feng | Now let's start the introduction to async bind. Questions can come after the introduction. | 03:26 |
shaohe_feng | Briefly put, bind is to find a suitable device (maybe PCI or MDEV) on the right host for a server instance to use. | 03:26 |
shaohe_feng | So what is a suitable device? We need a spec to describe it. | 03:26 |
shaohe_feng | In v1 we describe the device directly in the Nova flavor extra specs, and Cyborg parses the spec. Xinran implemented this work. | 03:26 |
shaohe_feng | In v2, after the PTG discussion, we define it in Cyborg's own Device Profile, and Sundar implemented it. | 03:26 |
shaohe_feng | I had no chance to attend the PTG discussion. For more details, please talk with Sundar. | 03:26 |
shaohe_feng | Thanks to Xinran and Sundar for their efforts. | 03:26 |
shaohe_feng | Before we introduce async bind, let's first go over some implementation rules in the current code. | 03:26 |
shaohe_feng | 1. The AtachHandler in ExtARQ is not a list, so there is only one AtachHandler (one device per ARQ) | 03:27 |
shaohe_feng | profile group in order to get the expected devices. | 03:27 |
shaohe_feng | Right now our Cyborg ARQ bind API is sync, but we define it as async, so we need to improve it. | 03:27 |
shaohe_feng | So what we changed: | 03:27 |
shaohe_feng | 1. Use a thread pool to start the async job. | 03:27 |
shaohe_feng | In the Cyborg spec, Sundar suggests using concurrent.futures; yes, it is in the Python standard library. See the official Python link: | 03:27 |
shaohe_feng | https://docs.python.org/3/library/concurrent.futures.html | 03:27 |
shaohe_feng | Also we can green it with greenlets, by patching it with eventlet. | 03:28 |
shaohe_feng | futures = eventlet.import_patched('concurrent.futures') # 'greening' futures, | 03:28 |
openstackgerrit | Merged openstack/cyborg master: P6: Fix pep8 error in cyborg/agent and cyborg/db https://review.opendev.org/679193 | 03:28 |
shaohe_feng | easy to green | 03:28 |
shaohe_feng | See the Python mailing list discussion. | 03:28 |
shaohe_feng | I tested it briefly and it works, but I did not test its performance, so greening is not enabled in the patch. | 03:28 |
shaohe_feng | 2. I moved the bind logic out of the ExtARQ object. | 03:29 |
shaohe_feng | Let ExtARQ keep its base functions, such as CRUD on its attributes. | 03:29 |
shaohe_feng | Moved it to cyborg/accelerator/common/handler.py (not sure this is a good place, this is an open question) | 03:29 |
shaohe_feng | Added a basic, general bind handler class named Accelerators. (not sure this is a good name, this is an open question) | 03:29 |
shaohe_feng | It supports the base _bind | 03:29 |
shaohe_feng | https://review.opendev.org/#/c/681005/16/cyborg/accelerator/common/handler.py | 03:29 |
shaohe_feng | If a new accelerator needs extra operations, you can derive from it and extend it as needed, as for FPGA | 03:29 |
shaohe_feng | line 386 at | 03:30 |
shaohe_feng | For FPGA it needs to get the image metadata, download the image, program the image and update placement. | 03:30 |
shaohe_feng | If _bind is time-consuming, use "wrap_job_tb" to wrap it. | 03:31 |
shaohe_feng | In this wrapper I tag it with "is_job", and it catches every Exception/traceback during the bind process, then logs it. | 03:31 |
openstackgerrit | Merged openstack/cyborg master: P7: Fix pep8 error in cyborg/objects and cyborg/image https://review.opendev.org/679526 | 03:31 |
openstackgerrit | Merged openstack/cyborg master: P8: Fix pep8 error in cyborg/tests and add post_mortem_debug.py https://review.opendev.org/679538 | 03:31 |
shaohe_feng | I also added a bind in the general class to start the jobs tagged with "is_job". | 03:31 |
shaohe_feng | I also added a master to monitor the jobs (as Sundar suggested) | 03:31 |
shaohe_feng | https://review.opendev.org/#/c/681005/16/cyborg/accelerator/common/handler.py | 03:31 |
shaohe_feng | It checks the job status and also gets the job's Exception/traceback. | 03:32 |
shaohe_feng | please add a SUPPORT_RESOURCES in | 03:32 |
shaohe_feng | 4. I added ARQ_STATES_TRANSFORM_MATRIX to sync the status. | 03:32 |
shaohe_feng | After talking with Sundar and Xinran, we added extra statuses: ARQ_DELETING and ARQ_BIND_STARTED | 03:32 |
shaohe_feng | line at 29 | 03:32 |
shaohe_feng | I just refactored Sundar's work. I did not change his logic at present, so no API definition exposed to users has changed. Thanks for Sundar's effort. | 03:33 |
shaohe_feng | I did not test multi/batch ARQs, for example, a request for 2 FPGAs, or 1 GPU and 1 FPGA. | 03:33 |
shaohe_feng | I have no real env. | 03:33 |
shaohe_feng | So I think we need to merge the patch, and let more developers test it. | 03:33 |
shaohe_feng | That's the difference from VM management. Ironic or Cyborg sometimes need hardware, so it is difficult to manage. | 03:34 |
shaohe_feng | the commit message shows you how to test this patch and | 03:34 |
shaohe_feng | analyze the process from the log: https://review.opendev.org/#/c/681005/16//COMMIT_MSG | 03:35 |
shaohe_feng | Also there's still a lot of work to do on it. We need to improve it continuously. Let it work first, then improve it. | 03:35 |
shaohe_feng | sorry | 03:36 |
shaohe_feng | any questions? | 03:37 |
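Editor's sketch of the job mechanism described above: a wrapper that tags a slow bind step with `is_job` and captures its exception/traceback, a thread pool from `concurrent.futures` that starts the jobs, and a monitor that collects the outcome. Only the names `wrap_job_tb` and `is_job` come from the discussion; the function bodies, `program_fpga`, `start_jobs`, `monitor`, `bind_all`, and the state strings are hypothetical and are not taken from the patch under review.

```python
import concurrent.futures
import functools
import logging
import traceback

LOG = logging.getLogger(__name__)


def wrap_job_tb(func):
    """Tag func as a job and capture any exception with its traceback.

    Assumed behaviour, reconstructed from the chat; the real wrapper may differ.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs), None
        except Exception:
            tb = traceback.format_exc()
            LOG.error("Job %s failed:\n%s", func.__name__, tb)
            return None, tb
    wrapper.is_job = True
    return wrapper


@wrap_job_tb
def program_fpga(arq_uuid, fail=False):
    # Hypothetical stand-in for the slow path
    # (fetch image metadata, download, program, update placement).
    if fail:
        raise RuntimeError("programming failed for %s" % arq_uuid)
    return "programmed:%s" % arq_uuid


def start_jobs(executor, arq_uuids, fail_uuids=()):
    """Submit one job per ARQ and return a {future: arq_uuid} map."""
    return {
        executor.submit(program_fpga, uuid, fail=uuid in fail_uuids): uuid
        for uuid in arq_uuids
    }


def monitor(jobs):
    """Wait for every job and map each ARQ to a (hypothetical) final state."""
    states = {}
    for future in concurrent.futures.as_completed(jobs):
        _result, tb = future.result()
        states[jobs[future]] = "BindFailed" if tb else "Bound"
    return states


def bind_all(arq_uuids, fail_uuids=()):
    """Run all bind jobs in a thread pool and return their final states."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        return monitor(start_jobs(executor, arq_uuids, fail_uuids))
```

Because the wrapper returns the traceback instead of re-raising, the monitor can record a failure per ARQ without one failed job aborting the others.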
Sundar | shaohe_feng: Thanks for all the time and hard work | 03:37 |
Sundar | For testing, hope people can use the fake driver. It supports FPGA resource class. Can we get it to take the programming patch but treat it as a no-op? | 03:38 |
Sundar | *programming code path | 03:38 |
shaohe_feng | Do you mean adding some mock so that it does not really program? | 03:39 |
Sundar | Yes | 03:39 |
shaohe_feng | Hardware support is really harder than VM | 03:39 |
Yumeng | shaohe_feng: that's really a comprehensive and deep research and very helpful introduction. | 03:39 |
shaohe_feng | Yumeng thanks. hope it is useful. | 03:40 |
s_shogo | Thanks, shaohe_feng : | 03:40 |
xinranwang | shaohe_feng: thanks Shaohe for your efforts | 03:41 |
shaohe_feng | Sundar let me give a method to mock it later. | 03:41 |
Sundar | Not everybody has hardware, as you said. But concurrent execution is not easy to test thoroughly. It may work in my env but fail in somebody else's. We can hopefully get more people to check it out using fake driver | 03:41 |
Sundar | Great, thanks | 03:41 |
shaohe_feng | Yes, will give a guide for how to mock it. | 03:41 |
chenke | Great job, thanks Shaohe. | 03:42 |
Sundar | Also: "Move it to cyborg/accelerator/common/handler.py". Bind is really an operation on an ExtARQ. It logically belongs with objects/ext_arq.py. If you want to split that into separate source file, that is OK. But it can be a mix-in rather than a separate object/class, IMHO | 03:42 |
Yumeng | shaohe_feng: great! looking forward to the mock guide | 03:43 |
shaohe_feng | I checked Nova's object code, then I made this change. | 03:44 |
wangzhh | shaohe_feng, Thx for your effort. | 03:44 |
shaohe_feng | Sundar any details for how to split it? | 03:44 |
Sundar | shaohe_feng: I found this blog useful: http://www.qtrac.eu/pyclassmulti.html | 03:45 |
Sundar | It considers many ways to split a Python class into different source files, and finally recommends mix-ins | 03:46 |
shaohe_feng | I glanced at it. It seems to be a big change. | 03:48 |
Sundar | Hmmm... only the last part is the mix-in. That could be a small change. You can move your chosen methods into a separate file, put it in a mix-in, and inherit that mix-in into the ExtARQ object class | 03:49 |
Sundar | I can help as much as I can. | 03:50 |
shaohe_feng | good, then I can write a mock env guide for test. | 03:51 |
Sundar | In that article, the last section "The Definitive Version?" alone is about mix-ins | 03:51 |
Sundar | OK, great | 03:51 |
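Editor's sketch of the mix-in split Sundar suggests: the bind methods live in their own class (in practice, their own source file) and the object class inherits them. `ExtARQ` is named in the discussion; `BindMixin`, the attribute names, and the state strings are hypothetical, and the real object is an oslo versioned object rather than a plain class.

```python
class BindMixin:
    """Bind-related methods; in a real split this would sit in its own module."""

    def bind(self, hostname, instance_uuid):
        # Relies on attributes that the inheriting class defines.
        self.hostname = hostname
        self.instance_uuid = instance_uuid
        self.state = "Bound"
        return self


class ExtARQ(BindMixin):
    """Core object: attributes and CRUD only; bind comes from the mix-in."""

    def __init__(self, uuid):
        self.uuid = uuid
        self.state = "Initial"
        self.hostname = None
        self.instance_uuid = None
```

With this layout, callers still write `ext_arq.bind(...)`, so bind stays an operation on the ExtARQ while the code lives in a separate file.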
Sundar | Anything else, Shaohe? | 03:52 |
shaohe_feng | no, that's all for me. | 03:52 |
Sundar | Thanks very much, once again. | 03:53 |
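Editor's sketch of a state transition matrix like the `ARQ_STATES_TRANSFORM_MATRIX` mentioned in this topic. The constant names `ARQ_DELETING` and `ARQ_BIND_STARTED` come from the discussion; the other states, the string values, the allowed transitions, and the helper functions are assumptions made for illustration and may not match the patch.

```python
ARQ_INITIAL = "Initial"            # hypothetical
ARQ_BIND_STARTED = "BindStarted"   # name from the discussion, value assumed
ARQ_BOUND = "Bound"                # hypothetical
ARQ_BIND_FAILED = "BindFailed"     # hypothetical
ARQ_DELETING = "Deleting"          # name from the discussion, value assumed

# Maps each target state to the set of states it may be entered from.
ARQ_STATES_TRANSFORM_MATRIX = {
    ARQ_BIND_STARTED: {ARQ_INITIAL},
    ARQ_BOUND: {ARQ_BIND_STARTED},
    ARQ_BIND_FAILED: {ARQ_BIND_STARTED},
    ARQ_DELETING: {ARQ_INITIAL, ARQ_BOUND, ARQ_BIND_FAILED},
}


def can_transition(current, target):
    """Return True if moving from current to target is allowed."""
    return current in ARQ_STATES_TRANSFORM_MATRIX.get(target, set())


def transition(current, target):
    """Return the new state, or raise ValueError for an illegal move."""
    if not can_transition(current, target):
        raise ValueError("illegal ARQ transition: %s -> %s" % (current, target))
    return target
```

Checking every status update against one table keeps concurrent jobs from, for example, marking an ARQ as bound while it is being deleted.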
Sundar | #topic AoB | 03:53 |
*** openstack changes topic to "AoB (Meeting topic: openstack-cyborg)" | 03:53 | |
shaohe_feng | let's move the patch on | 03:53 |
Sundar | Python IPv6 jobs: https://review.opendev.org/#/c/682517/ Please review | 03:53 |
Sundar | Many patches hit merge conflict after recent merges | 03:53 |
shaohe_feng | it does not matter. | 03:54 |
shaohe_feng | we just improve our git skill | 03:54 |
shaohe_feng | other active project | 03:54 |
Sundar | We need one more review for https://review.opendev.org/#/c/680953/ from outside Intel. | 03:55 |
shaohe_feng | conflict is very common | 03:55 |
Sundar | Sure | 03:55 |
Sundar | Train schedule: https://releases.openstack.org/train/schedule.html RC1 candidate is next week! | 03:56 |
Sundar | Hope to get the critical patches in by that time. | 03:56 |
Sundar | After that, even bug fixes are not assured | 03:56 |
Sundar | BTW, Cyborg will get packaged as an RPM as part of OpenStack release: https://opendev.org/openstack/rpm-packaging/src/branch/master/openstack/cyborg | 03:57 |
Sundar | Anything else, guys? | 03:58 |
shaohe_feng | no | 03:58 |
chenke | no | 03:58 |
Sundar | Have a good day! Bye | 03:58 |
Sundar | #endmeeting | 03:58 |
*** openstack changes topic to "Pending patches (Meeting topic: openstack-cyborg)" | 03:58 | |
openstack | Meeting ended Thu Sep 19 03:58:56 2019 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 03:58 |
Yumeng | bye | 03:58 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/openstack_cyborg/2019/openstack_cyborg.2019-09-19-03.02.html | 03:58 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/openstack_cyborg/2019/openstack_cyborg.2019-09-19-03.02.txt | 03:59 |
chenke | bye all. | 03:59 |
shaohe_feng | thank you | 03:59 |
openstack | Log: http://eavesdrop.openstack.org/meetings/openstack_cyborg/2019/openstack_cyborg.2019-09-19-03.02.log.html | 03:59 |
wangzhh | bye | 03:59 |
shaohe_feng | bye | 03:59 |
chenke | thanks all. | 03:59 |
s_shogo | bye | 03:59 |
xinranwang | bye | 03:59 |
*** Sundar has quit IRC | 03:59 | |
*** s_shogo has quit IRC | 04:06 | |
*** changzhi has quit IRC | 04:10 | |
openstackgerrit | Sundar Nadathur proposed openstack/cyborg master: Fix arq api errors in delete and unbind https://review.opendev.org/682913 | 04:12 |
openstackgerrit | Merged openstack/os-acc master: Removing project os-acc. https://review.opendev.org/682498 | 04:39 |
*** chunxiu has quit IRC | 06:01 | |
*** chenke has quit IRC | 06:07 | |
*** xinranwang has quit IRC | 06:08 | |
*** wangzhh has quit IRC | 06:08 | |
*** Yumeng has quit IRC | 06:26 | |
openstackgerrit | chenchunxiu proposed openstack/cyborg master: Fix arq api errors in delete and unbind https://review.opendev.org/683035 | 06:38 |
openstackgerrit | chenchunxiu proposed openstack/cyborg master: Fix arq api errors in delete and unbind https://review.opendev.org/683035 | 06:40 |
*** chenke has joined #openstack-cyborg | 07:29 | |
openstackgerrit | chenker proposed openstack/cyborg master: Fix the hardcoding of user role using sdk_adapter approach https://review.opendev.org/682565 | 08:52 |
*** tetsuro has joined #openstack-cyborg | 09:20 | |
*** tetsuro has quit IRC | 10:34 | |
*** chenke has quit IRC | 11:14 | |
*** shaohe_feng has quit IRC | 11:50 | |
openstackgerrit | YumengBao proposed openstack/cyborg master: conductor writes device_profile update to db https://review.opendev.org/679406 | 11:54 |
openstackgerrit | YumengBao proposed openstack/cyborg master: conductor writes device_profile update to db https://review.opendev.org/679406 | 11:55 |
*** tetsuro has joined #openstack-cyborg | 11:58 | |
openstackgerrit | chenker proposed openstack/cyborg master: Fix the hardcoding of user role using sdk_adapter approach https://review.opendev.org/682565 | 12:08 |
*** chenke has joined #openstack-cyborg | 12:09 | |
*** chenke has quit IRC | 12:24 | |
*** chenke has joined #openstack-cyborg | 13:33 | |
openstackgerrit | Merged openstack/cyborg master: bug fixing: let FPGA driver report correct traits when no SRIOV enabled https://review.opendev.org/680953 | 13:52 |
*** efried_pto is now known as efried | 13:56 | |
*** tetsuro has quit IRC | 14:00 | |
*** tetsuro has joined #openstack-cyborg | 14:06 | |
*** chenke has quit IRC | 14:44 | |
*** tetsuro has quit IRC | 14:52 | |
*** tetsuro has joined #openstack-cyborg | 14:55 | |
*** efried is now known as efried_pto | 14:57 | |
*** tetsuro has quit IRC | 15:13 | |
*** tetsuro has joined #openstack-cyborg | 15:14 | |
*** tetsuro has quit IRC | 15:15 | |
*** TxGirlGeek has joined #openstack-cyborg | 15:20 | |
*** openstackgerrit has quit IRC | 16:06 | |
*** gmann_afk is now known as gmann | 17:21 | |
*** efried_pto has quit IRC | 18:01 | |
*** efried has joined #openstack-cyborg | 18:03 | |
*** efried is now known as efried_pto | 18:03 | |
*** TxGirlGeek has quit IRC | 21:00 | |
*** TxGirlGeek has joined #openstack-cyborg | 21:00 | |
*** TxGirlGeek has quit IRC | 23:08 | |
*** TxGirlGeek has joined #openstack-cyborg | 23:08 |