*** Sundar has quit IRC | 00:12 | |
*** efried1 has joined #openstack-cyborg | 00:50 | |
*** efried has quit IRC | 00:51 | |
*** efried1 is now known as efried | 00:51 | |
*** efried has quit IRC | 00:58 | |
*** efried has joined #openstack-cyborg | 01:20 | |
*** jiapei has joined #openstack-cyborg | 01:23 | |
openstackgerrit | wangzhh proposed openstack/cyborg master: Add "Report device data to cyborg" https://review.openstack.org/596691 | 02:32 |
openstackgerrit | wangzhh proposed openstack/cyborg master: Add "Report device data to cyborg" https://review.openstack.org/596691 | 02:58 |
openstackgerrit | wangzhh proposed openstack/cyborg master: Add "Report device data to cyborg" https://review.openstack.org/596691 | 03:15 |
*** jiapei has quit IRC | 03:33 | |
openstackgerrit | Xinran WANG proposed openstack/cyborg master: Allocation/Deallocation API Specification https://review.openstack.org/597991 | 05:43 |
*** openstackgerrit has quit IRC | 06:07 | |
*** openstackgerrit has joined #openstack-cyborg | 06:16 | |
openstackgerrit | wangzhh proposed openstack/cyborg master: Add "Report device data to cyborg" https://review.openstack.org/596691 | 06:16 |
nguyenhai_ | Does the project openstack/os-acc belong to cyborg? | 07:02 |
nguyenhai_ | Are the cyborg core reviewers responsible for openstack/os-acc or not? Thanks. | 07:02 |
*** sahid has joined #openstack-cyborg | 07:48 | |
kosamara | Sundar: yes :) | 07:50 |
kosamara | Sundar: But this spec is derived from efried's nova-powervm spec, which you have already commented on. The core principles are the same, but a lot of things have changed. | 07:51 |
*** sahid has quit IRC | 08:23 | |
*** sahid has joined #openstack-cyborg | 08:25 | |
*** kosamara has quit IRC | 10:01 | |
*** kosamara has joined #openstack-cyborg | 10:03 | |
openstackgerrit | Merged openstack/os-acc master: import zuul job settings from project-config https://review.openstack.org/592836 | 14:09 |
openstackgerrit | Merged openstack/os-acc master: switch documentation job to new PTI https://review.openstack.org/592837 | 14:16 |
openstackgerrit | Merged openstack/os-acc master: add python 3.6 unit test job https://review.openstack.org/592838 | 14:16 |
openstackgerrit | Merged openstack/os-acc master: add lib-forward-testing-python3 test job https://review.openstack.org/592839 | 14:19 |
*** sahid has quit IRC | 16:55 | |
openstackgerrit | wangzhh proposed openstack/cyborg master: Add "Report device data to cyborg" https://review.openstack.org/596691 | 18:58 |
*** Sundar has joined #openstack-cyborg | 19:49 | |
Sundar | efried: Please ping me when you can | 19:50 |
efried | Sundar: Hi! | 19:50 |
Sundar | Hi, I got tons of feedback on the os-acc spec ;). Can we discuss them now? | 19:51 |
efried | okay, sure | 19:51 |
efried | Though I think you got most of it from sean | 19:51 |
Sundar | One feedback was that the attaching to the VM should be left to Nova virt driver, and os-acc should not do it. That is hypervisor-specific and that's what virt drivers are for. | 19:52 |
Sundar | Do you agree with that? | 19:52 |
efried | I... think so, yes. I think that's how neutron plugins operate. But maybe it's not how the os-vif model is set up. | 19:53 |
efried | anyway, it's certainly hypervisor-specific. No question there. | 19:54 |
efried | So when you say "os-acc" you really mean "the plugin at the behest of os-acc". | 19:54 |
Sundar | One main difference between os-vif and os-acc that I am struggling to communicate is this: with os-vif, a VIF gets allocated, port binding happens and then the plug() operation happens. | 19:55 |
Sundar | With os-acc, the equivalent of binding and plug cannot be separated cleanly. Example: A GPU may need to be reconfigured to create vGPUs of a certain type. Until that point, the accelerators of the right type don't even exist. So, the device may need to be configured to create the right accelerators, or change the inventory of accelerators, before we get the attach handle | 19:57 |
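A minimal sketch of the difference Sundar describes, in Python; the class and method names are purely illustrative, not real os-vif or os-acc interfaces. The point is that the accelerator (and hence its attach handle) may not exist until the device itself has been reshaped, so the equivalents of "binding" and "plug" collapse into one prepare step:

```python
# Hypothetical sketch only: none of these classes or methods are real
# os-vif or os-acc APIs.
from dataclasses import dataclass, field


@dataclass
class AccelRequest:
    accel_type: str                      # e.g. a particular vGPU or FPGA function type


@dataclass
class Device:
    name: str
    accel_types: set = field(default_factory=set)

    def reconfigure(self, accel_type):
        # Stand-in for vendor-specific work: program a bitstream,
        # carve a GPU into vGPUs of the requested type, etc.
        self.accel_types.add(accel_type)

    def create_attach_handle(self, request):
        # Only possible once an accelerator of the right type exists.
        return "%s/%s" % (self.name, request.accel_type)


def prepare(device, request):
    """os-acc-style prepare: 'bind' and 'plug' cannot be cleanly separated,
    because the device inventory may have to change first."""
    if request.accel_type not in device.accel_types:
        device.reconfigure(request.accel_type)        # inventory changes here
    return device.create_attach_handle(request)       # handle exists only now


if __name__ == "__main__":
    print(prepare(Device(name="gpu0"), AccelRequest(accel_type="vgpu-type-a")))
```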
efried | okay. That stuff is *also* hypervisor-specific. | 19:58 |
Sundar | Sure, it is also device-specific. The device driver has to do this in a hypervisor-specific way | 19:59 |
efried | May be the purview of the "driver" rather than the "plugin" but still. | 19:59 |
Sundar | Yes, I think we already agree that the driver would be hypervisor-specific | 19:59 |
efried | it may in fact be device *type* specific, not necessarily device specific. | 19:59 |
efried | Anyway, I think we're in violent agreement on the high points here. | 20:00 |
Sundar | Well, the actual act of writing to the device and manipulating it to create a new vGPU type or program a bitstream would in fact be specific to device models or vendors. | 20:00 |
Sundar | So, Cyborg and its driver(s) would have to handle the equivalent of port binding (device-specific) and plug. That gives us a VAN with the device end configured, ready to be attached to an instance | 20:02 |
Sundar | After that, Nova virt can take that VAN and do the needful for that hypervisor, in a device-independent way | 20:02 |
efried | and does that happen in a separate step prior to spawn? | 20:02 |
Sundar | Yes. I shared a flow diagram with you this morning. | 20:03 |
Sundar | Before we get to that :), | 20:03 |
Sundar | the role of the os-acc plugin is greatly diminished with these aspects considered. The driver does the device end, Nova virt does the instance end | 20:04 |
efried | "nova virt" by invoking the plugin through os-acc? | 20:04 |
Sundar | The possible role for os-acc extensions is to handle device-compute interactions, such as NUMA affinity for interrupt vectors. | 20:04 |
Sundar | Nova virt calls os-acc, and that in turn calls Cyborg in some way. When that call returns, Nova virt has to persist the instance-VAN association in Nova's db. Neither os-acc nor Cyborg can do that, so it has to go to Nova virt. The subsequent step of attaching to the VM is already in Nova virt. | 20:06 |
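A rough sketch of the sequence described above, with hypothetical names throughout (FakeVirtDriver, OsAcc, CyborgClient and their methods are illustrative, not the real Nova, os-acc, or Cyborg interfaces): the virt driver asks os-acc to prepare VANs, Cyborg does the device-side work, and the virt driver then records the instance-VAN association and performs the hypervisor-specific attach.

```python
# Illustrative only: these classes and method names are hypothetical, not the
# actual Nova, os-acc, or Cyborg interfaces.

class CyborgClient:
    """Stand-in for Cyborg's API/conductor: device-specific preparation."""
    def prepare_vans(self, instance_uuid, accel_requests):
        # In reality this would configure/program devices via Cyborg drivers.
        return ["van-%s-%d" % (instance_uuid, i) for i, _ in enumerate(accel_requests)]


class OsAcc:
    """Thin touchpoint/router between the virt driver and Cyborg."""
    def __init__(self, cyborg):
        self.cyborg = cyborg

    def prepare(self, instance_uuid, accel_requests):
        return self.cyborg.prepare_vans(instance_uuid, accel_requests)


class FakeVirtDriver:
    """Hypervisor-specific side: persist the association, then attach."""
    def __init__(self, os_acc, nova_db):
        self.os_acc = os_acc
        self.nova_db = nova_db          # instance_uuid -> list of VAN UUIDs

    def spawn_with_accelerators(self, instance_uuid, accel_requests):
        van_uuids = self.os_acc.prepare(instance_uuid, accel_requests)
        # Neither os-acc nor Cyborg can write to Nova's db, so the virt
        # driver records the instance <-> VAN association itself.
        self.nova_db[instance_uuid] = van_uuids
        for van in van_uuids:
            self.attach_to_guest(instance_uuid, van)   # hypervisor-specific

    def attach_to_guest(self, instance_uuid, van_uuid):
        print("attach %s to %s" % (van_uuid, instance_uuid))


if __name__ == "__main__":
    driver = FakeVirtDriver(OsAcc(CyborgClient()), nova_db={})
    driver.spawn_with_accelerators("inst-1", ["fpga", "gpu"])
```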
Sundar | So, what does the plugin do for attach? | 20:06 |
* efried shrugs | 20:07 | |
efried | no idea | 20:07 |
efried | if it's nothing, it's nothing. | 20:07 |
Sundar | I understand your need for keeping things hypervisor-specific, but Nova virt is already hypervisor-specific, right? | 20:08 |
efried | yup | 20:08 |
efried | I don't really have a stake in getting lots of code into these plugins. | 20:08 |
efried | I just want to make sure we don't end up with linux-isms in common code paths. | 20:08 |
efried | linux/libvirt | 20:08 |
Sundar | OK. Could you look at the flow diagram I shared this morning? | 20:09 |
efried | If I didn't have to write a separate driver/plugin at all, I would be pretty happy. But I'm not sure that is going to happen. | 20:09 |
Sundar | Not to spoil your afternoon, but I think you would need PowerVM drivers for GPUs, FPGAs, or whatever you support :) | 20:10 |
Sundar | Are you looking at Power+KVM hypervisor too? | 20:11 |
Sundar | brb in 5 min | 20:12 |
efried | I don't know anything about pkvm | 20:12 |
Sundar | back. ok | 20:14 |
Sundar | Back to os-acc plugins. The possible role for os-acc plugins is to handle device-compute interactions, such as NUMA affinity for interrupt vectors. But it may be too much to bring them in this spec. Shall we put os-acc plugins as a placeholder until we get to that? | 20:15 |
efried | "NUMA affinity for interrupt vectors" <== Greek to me. | 20:18 |
Sundar | OK, NUMA affinity for devices in general? | 20:18 |
efried | If you don't have an actual use case for os-acc plugins, then... | 20:19 |
efried | NUMA affinity is something we want to be able to handle via resource provider structure. | 20:19 |
efried | we can't yet | 20:19 |
efried | but it's something we should bring up (again) in Denver. | 20:19 |
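For context, a purely conceptual sketch of what "handling NUMA affinity via resource provider structure" could look like as a nested provider tree; the layout and the CUSTOM_FPGA_VF resource class are illustrative assumptions, and as efried says, placement could not express this for accelerators at the time.

```python
# Conceptual illustration only: a nested resource-provider layout where devices
# hang off the NUMA node they are affine to. Not something placement supported
# for accelerators at the time of this discussion.
provider_tree = {
    "compute-node-1": {
        "numa-node-0": {
            "inventory": {"VCPU": 16, "MEMORY_MB": 65536},
            "children": {"fpga-0": {"inventory": {"CUSTOM_FPGA_VF": 4}}},
        },
        "numa-node-1": {
            "inventory": {"VCPU": 16, "MEMORY_MB": 65536},
            "children": {"gpu-0": {"inventory": {"VGPU": 8}}},
        },
    },
}
# A request asking for VCPU and CUSTOM_FPGA_VF from the same subtree would then
# land the CPUs and the FPGA VF on the same NUMA node.
```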
Sundar | Sure. Shall we have a f2f chat at Denver? It'll be good to sync up and drive the specs to a close shortly thereafter | 20:20 |
efried | definitely | 20:21 |
efried | If I were you, I would ask for a nova/cyborg cross-project session. | 20:21 |
efried | Based on what I know of people's schedules, I would say Tuesday would be a good day for it. | 20:21 |
Sundar | I have entered a slot in the Nova etherpad. Maybe I should ping melwitt? | 20:22 |
efried | It might be better to schedule something during cyborg's time (M/T) where you can get the "placement cores" to visit the cyborg room and spend a couple of hours hashing stuff out. | 20:23 |
efried | And keep it on the nova schedule in case we a) don't get everything sorted; and/or b) need a wider audience. | 20:23 |
Sundar | OK | 20:23 |
efried | Send something to the dev ML, like what Blazar did, to organize the Tuesday thing. | 20:24 |
efried | oh | 20:24 |
efried | Sorry, I forgot, Blazar sent that directly to people. | 20:24 |
efried | not to the ML. | 20:24 |
Sundar | Meanwhile, please LMK if you are ok with the flow diagram. Did you get it? | 20:24 |
efried | I got it. | 20:26 |
Sundar | Great. I'll grab lunch and get back in 15 min. We can shake it out after that. | 20:27 |
efried | give me 20 | 20:29 |
*** ildikov has joined #openstack-cyborg | 20:53 | |
Sundar | efried: Ready any time | 21:07 |
efried | Sundar: Nova meeting in progress. Not sure how long it will last, but not beyond top of the hour. Will you be around? | 21:08 |
Sundar | Yes | 21:09 |
efried | Sundar: ō/ | 21:21 |
Sundar | That was quick. | 21:23 |
efried | now where did I put that diagram... | 21:24 |
efried | got it | 21:24 |
Sundar | The main thing to note in the flow is that there isn't much that os-acc is doing. | 21:25 |
Sundar | Most of it is Nova virt or Cyborg. I am still pondering what os-acc can do that is not device-specific (and so in Cyborg) or hypervisor-specific (and so in Nova virt) | 21:25 |
efried | Sundar: as a touchpoint/router and future extension point for those pieces of the requests, it may be useful. | 21:26 |
efried | but I'm really not sure. | 21:27 |
Sundar | Yes, I agree. That's what I was thinking | 21:27 |
efried | not sure it makes much sense for the calls from os-acc to cyborg API/conductor to be asynchronous if it's just going to poll for them to complete right away... | 21:28 |
Sundar | The calls may take milliseconds to possibly seconds, depending on whether Glance bitstreams need to be fetched, one or more FPGAs need to be programmed, etc. | 21:29 |
efried | so? | 21:30 |
Sundar | That is why it is async | 21:30 |
efried | The caller is blocking on their completion anyway | 21:30 |
efried | So why does it matter if they take "a long time"? | 21:30 |
Sundar | Ah, the n-cpu could do other things while this is blocked, right? | 21:30 |
efried | could it? | 21:30 |
efried | I guess. | 21:31 |
Sundar | The allocation of other resources -- networking, storage -- could go in parallel | 21:31 |
efried | Just sounds like it would make things pretty complicated. | 21:31 |
efried | If n-cpu wanted to parallelize, it could send that request off in its own thread. | 21:31 |
efried | But having it be async *forces* n-cpu to deal with that async-ness. | 21:31 |
efried | Anyway, this is really a nit. | 21:31 |
efried | not a substantive thing. | 21:31 |
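A small sketch of efried's point, assuming a synchronous os-acc call; the function names are hypothetical, and a real n-cpu would use eventlet greenthreads rather than the stdlib pool shown here. If n-cpu wants accelerator prep to overlap with other setup, it can run the blocking call in its own thread instead of the os-acc/Cyborg interface itself being asynchronous:

```python
# Hypothetical sketch: prepare_accelerators / setup_network are illustrative
# stand-ins, not real Nova or os-acc calls.
import time
from concurrent.futures import ThreadPoolExecutor


def prepare_accelerators(instance_uuid):
    time.sleep(2)           # may take seconds (bitstream fetch, FPGA programming, ...)
    return ["van-%s-0" % instance_uuid]


def setup_network(instance_uuid):
    time.sleep(1)
    return "port-1"


def spawn(instance_uuid):
    with ThreadPoolExecutor() as pool:
        # The os-acc call stays synchronous; n-cpu parallelizes it itself.
        accel_future = pool.submit(prepare_accelerators, instance_uuid)
        port = setup_network(instance_uuid)   # proceeds while accel prep runs
        vans = accel_future.result()          # block only when the result is needed
    return port, vans


if __name__ == "__main__":
    print(spawn("inst-1"))
```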
Sundar | The n-cpu to os-acc could be async too. | 21:32 |
Sundar | So, os-acc deals with the async weirdness. | 21:32 |
Sundar | Hmm, well, os-acc to Cyborg can be sync if os-acc itself is called in an async way | 21:32 |
Sundar | Just wondering if a REST API call can really block for seconds | 21:33 |
efried | absolutely. | 21:33 |
efried | I'm sure there's a connection timeout at the HTTP level. But seconds shouldn't be a problem. | 21:34 |
Sundar | Are there precedents in OpenStack where a REST API blocks for seconds? | 21:34 |
efried | I have no idea. | 21:35 |
efried | That might be something edleafe and/or cdent would know. | 21:36 |
Sundar | OK. Another note: The proposal calls for persisting VAN objects in the Cyborg db, but the Nova db maintains the association between the instance and its VAN UUIDs. That's because, on a VM suspend for example, Nova would need to call os-acc to detach VANs but not deallocate them | 21:37 |
Sundar | Never mind, the detach would be in Nova virt. | 21:37 |
Sundar | But you get the point | 21:38 |
Sundar | On a termination, Nova virt would detach each VAN and then call os-acc to release the resources | 21:38 |
efried | right, some kind of handle (i.e. the VAN UUID) would need to be associated with the instance in the nova db. | 21:38 |
Sundar | Yes | 21:38 |
efried | As long as we can query cyborg with the UUID to get the rest of the VAN info, the UUID should be the only thing we need to store, I would think. | 21:39 |
Sundar | Yes | 21:39 |
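A minimal illustration of that agreement, with hypothetical data and names (not the real Cyborg API or Nova schema): Nova persists only the VAN UUIDs against the instance, and everything else about a VAN is looked up from Cyborg by UUID when it is needed, e.g. at detach or terminate time.

```python
# Hypothetical sketch; not the real Cyborg API or Nova db schema.
CYBORG_DB = {
    "van-abc": {"device": "fpga0", "attach_handle": "0000:5e:00.0", "state": "attached"},
}
NOVA_DB = {
    "inst-1": ["van-abc"],   # Nova persists only the VAN UUIDs per instance
}


def get_van(van_uuid):
    """Stand-in for a Cyborg API lookup by UUID."""
    return CYBORG_DB[van_uuid]


def terminate(instance_uuid):
    for van_uuid in NOVA_DB.pop(instance_uuid, []):
        van = get_van(van_uuid)
        # Hypervisor-specific detach happens in the virt driver...
        print("detach %s from %s" % (van["attach_handle"], instance_uuid))
        # ...then Cyborg (via os-acc) is asked to release the resources.
        CYBORG_DB[van_uuid]["state"] = "freed"


if __name__ == "__main__":
    terminate("inst-1")
```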
Sundar | Good. Since we are in agreement, I'll write this up in the spec. | 21:42 |
efried | Sundar: Note that I'm only one person, and my stake in the details we've just discussed is not an especially strong one. | 21:43 |
Sundar | We need some path to get this converged. I am incorporating sean-mooney's comments and responding to them. But, if somebody else were to come along in a month, we can't keep waiting. | 21:45 |
Sundar | I hope to get this closed during the PTG or at most a week after | 21:45 |
Sundar | I asked melwitt for a PTG session. Could I ask for your help in ensuring that all Nova feedback is given by that time? | 21:46 |
efried | hah | 22:13 |
efried | You're asking me to move mountains. | 22:13 |
efried | There's no way to force all the stakeholders to review a thing. And there's no way to prevent taking weeks or months to revise something with the collaboration of many people only to have someone who should have been involved from the start come along and throw a huge monkey wrench in the works. | 22:14 |
efried | I think your best bet is to be ready to present an overview of the architecture at the PTG, counting on at least some key players in the audience to have *not* reviewed the specs, and talk out some of the issues there. | 22:16 |
efried | Sundar: ^ | 22:16 |