Tuesday, 2020-04-28

*** hyunsikyang has joined #openstack-fenix04:28
*** hyunsikyang__ has quit IRC04:31
tojuvoneHi JangwonLee, hyunsikyang. I made some comments. Please ask here if something is unclear, or ask in the review to show it to others there too.05:18
tojuvonelead_time is the maximum time Fenix waits before it takes action (the time Fenix waits for a reply). The VNFM should reply immediately when ready, but at latest within this time05:29
tojuvonerecover_time is how long it takes for a VNF VM to recover after it is migrated; after this time the VM is expected to be fully operational again. After migrating, Fenix still counts this VM against max_impacted_members for the time defined in recover_time. After that it is no longer counted there, and Fenix can again migrate another VM of the same instance_group, obeying that it does not go over max_impacted_members.05:35
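A minimal sketch of how the lead_time and recover_time constraints described above might look when written down (field names follow this discussion and are illustrative, not necessarily the exact Fenix schema):

```python
# Illustrative constraint values as discussed above (not the exact
# Fenix API schema). Times are in seconds.
instance_group_constraints = {
    "group_name": "instance_group_A",
    "max_impacted_members": 2,  # how many VMs may be impacted at once
    "recover_time": 240,        # VM still counts as impacted this long after migration
}

instance_constraints = {
    "instance_name": "VM1",
    "lead_time": 120,           # max time Fenix waits for the VNFM "ready" reply
}
```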
*** hyunsikyang__ has joined #openstack-fenix05:38
hyunsikyang__Hi tojuvone!05:39
*** hyunsikyang has quit IRC05:42
tojuvonehyunsikyang__: Good morning!06:29
-openstackstatus- NOTICE: Zuul is currently failing testing, please refrain from recheck and submitting of new changes until this is solved.09:02
*** ChanServ changes topic to "Zuul is currently failing testing, please refrain from recheck and submitting of new changes until this is solved."09:02
-openstackstatus- NOTICE: Zuul is currently failing all testing, please refrain from approving, rechecking or submitting of new changes until this is solved.09:13
*** ChanServ changes topic to "Zuul is currently failing all testing, please refrain from approving, rechecking or submitting of new changes until this is solved."09:13
hyunsikyang__Hi tojuvone, I had a meeting and an urgent job, so sorry for the late reply. We understand your point.10:13
hyunsikyang__We are also thinking about the scope of this.10:13
hyunsikyang__And we are not sure what the exact scope of this is, because if we just make a general procedure, it won't work.10:14
hyunsikyang__So, we are trying to make a real demo with these patches.10:14
tojuvoneYes, so looking at the review and considering the end solution cannot easily be done, what you have done is almost there. Just maybe the couple of review comments that address what can be done10:15
tojuvonelike "maintenance" service endpoint discovery...10:15
tojuvoneand if you have a server group in Nova, those details should be copied to "instance group"10:16
tojuvoneAnother thing would be to continue with some example VNF that could have full-blown functionality. Perhaps also with another patch set, if you are looking to make it.10:17
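The "maintenance" service endpoint discovery mentioned above could look roughly like this with keystoneauth1, assuming Fenix is registered in the Keystone catalog under the service type "maintenance" (credentials and auth_url are placeholders):

```python
# Sketch: discover the Fenix "maintenance" endpoint from the Keystone
# catalog. Credentials and auth_url are placeholders.
from keystoneauth1.identity import v3
from keystoneauth1 import session

auth = v3.Password(
    auth_url="http://controller:5000/v3",
    username="vnfm",
    password="secret",
    project_name="vnf-project",
    user_domain_name="Default",
    project_domain_name="Default",
)
sess = session.Session(auth=auth)
fenix_endpoint = sess.get_endpoint(service_type="maintenance",
                                   interface="public")
print(fenix_endpoint)
```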
hyunsikyang__Ok. we will update.10:18
tojuvoneThe ultimate test case at the end could then even have a different load for the VNF, so it needs to change the instance group and instance constraints on the fly.10:18
hyunsikyang__And10:18
tojuvonebut I think that is so huge, one surely would not aim for it now10:19
hyunsikyang__yes. So10:19
hyunsikyang__Now, what is the workaround solution for that?10:19
hyunsikyang__How about using metadata for the VNF?10:19
hyunsikyang__When we create the VNF, we can define metadata..10:20
tojuvoneThat sounds like a fast solution.10:21
tojuvoneI guess the metadata should then have the needed constraints for the different "groups" of VMs it has (by flavor) or for anti-affinity grouped VMs.10:22
tojuvoneIf metadata can be changed on the fly and we know it is not a busy hour for the VNF when we maintain it...10:23
tojuvonemetadata could be changed in a non-busy hour before maintenance starts, and then e.g. "max_impacted_members" can be bigger than it normally would be10:24
tojuvoneor the metadata expects the constraints to be used only for maintenance at a non-busy hour, and thus the constraints / metadata can be static10:25
tojuvonethen it is easy to use metadata for this? No need for fancy dynamically changing constraints in this implementation10:26
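A sketch of how such static constraints could be carried as Nova server metadata. The metadata keys are invented for illustration (Fenix does not define them), and `sess` is an authenticated keystoneauth1 session as in the earlier sketch:

```python
# Sketch: carry static maintenance constraints in Nova server metadata.
# Key names are invented for illustration; Nova metadata values are strings.
# `sess` is an authenticated keystoneauth1 session (see earlier sketch).
from novaclient import client as nova_client

nova = nova_client.Client("2", session=sess)
server = nova.servers.find(name="VM1")
nova.servers.set_meta(server, {
    "maintenance_group": "instance_group_A",
    "max_impacted_members": "2",  # static, sized for the non-busy hour
    "recover_time": "240",
})
```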
hyunsikyang__But we only set metadata when we create the VNF.10:27
hyunsikyang__Hmm..10:27
tojuvoneYou just write them once to Fenix, or if you change them, maybe remember the original metadata value before scaling; you have to change the "max_impacted_members" constraint in Fenix after scaling down, as there are fewer VMs at that time10:28
tojuvoneyes, one problem is that max_impacted_members depends on how many instances of a certain VM exist10:29
tojuvoneDo you follow what I mean with max_impacted_members?10:30
hyunsikyang__Not sure. As I understand it, it is a member that is affected by any maintenance action.10:31
hyunsikyang__But I am not sure how we decide the impacted members.10:31
hyunsikyang__Is it predefined? or10:31
hyunsikyang__can we define it when we want, such as when starting scaling or any maintenance?10:32
tojuvoneoh... let me explain10:32
*** dasp has quit IRC10:32
tojuvonewe have one type of VM, with many instances: VM1, VM2, VM3, VM410:32
tojuvoneVM1 and VM2 are on HOST1, VM3 and VM4 on HOST2, and HOST3 is empty10:33
tojuvoneFenix is executing the maintenance workflow10:33
tojuvoneThere is only one empty host, so it cannot work on 2 hosts in parallel10:34
hyunsikyang__yes.10:34
tojuvoneconstraints are saying that max_impacted_members=210:34
tojuvoneFenix decides to have maintenance on HOST110:35
tojuvoneaccording to the constraints it can do "migration" for VM1 and VM2 in parallel10:35
*** dasp has joined #openstack-fenix10:35
tojuvonefor both of these it sends a PLANNED_MAINTENANCE message separately to the VNFM10:36
tojuvoneand these 2 VMs are now impacted10:36
tojuvonethe VNFM has the lead_time within which it needs to reply back that it is ready10:36
tojuvonethat is defined in constraints10:36
tojuvoneafter the reply, Fenix makes the migration and waits the recover_time10:37
tojuvoneafter waiting that for both of these VMs, there are 0 impacted members10:37
tojuvoneso Fenix could again have 2 parallel migrations according to max_impacted_members10:38
tojuvoneAnd now, if we had scaled from 10 VMs: we had VM1 - VM910:38
tojuvoneprobably the max_impacted_members might have been 8 or something like that originally10:39
tojuvonebut Fenix surely can only do as many as there are empty target hosts10:39
tojuvoneBtw, when using the ETSI constraints, the nfv.py workflow also has only one VM instance in the PLANNED_MAINTENANCE event, as all VMs are done in parallel10:41
tojuvoneAll possible VMs; in the above it was 2 VMs at a time10:42
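Walking through that scenario as plain bookkeeping (pure illustration, not Fenix code):

```python
# The HOST1 round of the example above: 4 VMs in the group,
# max_impacted_members=2, VM1 and VM2 must move off HOST1.
max_impacted_members = 2
impacted = 0

impacted += 2  # PLANNED_MAINTENANCE sent for VM1 and VM2
assert impacted <= max_impacted_members  # both allowed in parallel

# ... VNFM replies within lead_time, both VMs are migrated ...
# Each VM still counts as impacted until its recover_time has passed.

impacted -= 2  # recover_time elapsed for both
assert impacted == 0  # two more migrations could start now
```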
hyunsikyang__BTW, do we need to change max_impacted_members before maintenance according to the number of current instances on the host?10:47
tojuvonemax_impacted_members has no relation to the host10:47
hyunsikyang__in your example,10:48
tojuvonemax_impacted_members relates to how many instances of a certain type of VM are in the VNF10:48
tojuvoneso how many instances can be impacted in the "instance group"10:48
tojuvoneSo that is just from the VNF perspective, to say how many instances (VMs) it needs for the service it provides10:49
tojuvoneoh... how many instances can be impacted so that it can still provide the service those instances offer10:50
tojuvoneand it comes from the ETSI definitions that it is max_impacted_members, while it might have been nicer the other way around10:50
tojuvone"least_number_of_instances_needed" instead of "max_impacted_members"10:51
tojuvoneNow, as it is max_impacted_members, it needs changing if the number of VNF instances is scaled. If it were defined the other way around, you might not have needed to change the value.10:52
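The relation between the two ways of expressing the constraint can be written directly (names are illustrative):

```python
# With a minimum ("least needed"), nothing changes on scaling; with a
# maximum ("max impacted"), the value must track the current group size.
def max_impacted_members(current_instances: int,
                         least_instances_needed: int) -> int:
    return current_instances - least_instances_needed

print(max_impacted_members(10, 2))  # 8, as in the scaled example above
print(max_impacted_members(4, 2))   # 2, as in the 4-VM example
```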
hyunsikyang__It means the number of VNF instances needed for the service as a minimum.10:57
hyunsikyang__right?10:57
tojuvoneyes, exactly10:58
hyunsikyang__So, according to your explanation, it is '2' because we should leave two VNF instances, which is half of the VNF, for the specific service.10:58
hyunsikyang__But why is it '0' after the recovery time?10:59
tojuvone"two vnf" -> "2 VMs belonging to same "instance group" in single VNF"11:00
tojuvoneoh, my bad. I think you misunderstood11:00
tojuvoneFenix needs to internally keep count how many VMs of certain instance_group are affected against the max_impacted_members11:01
tojuvoneso Fenix workflow has some variable that keeps the ipmacted_members count11:02
tojuvoneand comapares it to max_impacted_members11:02
tojuvoneimpacted_members inside Fenix is 0 after those VMs migrated and recover_time passed11:03
tojuvoneFEnix can try to migrate who konws how many VMs, but they are in gueue as if Fenxi internal count: ipmacted_members has reached max_impacted_members11:04
tojuvoneSo fenix has own thread to handle each and every migration. In the beginnign it increments ipmacted_members and when thread is done it reduced ipmacted_members11:05
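That per-migration bookkeeping maps naturally onto a bounded counter; a minimal sketch of the idea (illustration only, not Fenix code):

```python
# One thread per migration; a semaphore caps concurrent "impacted"
# VMs of an instance_group at max_impacted_members, and the slot is
# held until recover_time has passed.
import threading
import time

MAX_IMPACTED_MEMBERS = 2
RECOVER_TIME = 1  # seconds, shortened for the example

slots = threading.Semaphore(MAX_IMPACTED_MEMBERS)

def migrate(vm: str) -> None:
    with slots:                   # like incrementing impacted_members
        print(f"migrating {vm}")
        time.sleep(0.5)           # stand-in for the live migration
        time.sleep(RECOVER_TIME)  # VM stays "impacted" while recovering
    # leaving the block is like decrementing impacted_members

threads = [threading.Thread(target=migrate, args=(vm,))
           for vm in ("VM1", "VM2", "VM3", "VM4")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```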
JangwonLee_I think "max_impacted_members" is just an indicator that shows the max number of migrations when Fenix does maintenance on an instance group. Is this right?11:06
JangwonLee_max number of instances11:06
tojuvoneBasically yes11:06
tojuvoneand normally a VNF consists of different types of VMs that have their own instance_group and max_impacted_members according to those VMs11:07
hyunsikyang__So, when the infra is busy, we should raise the value to maintain the service.11:07
tojuvoneWhen the VNF is busy it needs more instances for all kinds of VMs it has11:08
tojuvonethen for all those different instance groups those instances belong to, the max_impacted_members should be smaller11:09
tojuvoneor if we now consider that we know maintenance is always during the night11:09
tojuvonewe could statically define max_impacted_members to a value that expects the service level to be lower at night11:10
tojuvoneThen we do not need to know the VNF load level, but thus we can do the maintenance only on the condition that it is done at night11:11
tojuvonethat is the usual case11:11
JangwonLee_yes11:11
tojuvoneIf we had a sophisticated manager that knows the load, and max_impacted_members could be dynamically changed, we could run maintenance 24/711:12
tojuvoneIt's just that you might be able to migrate only a single VM at a time or so11:12
tojuvoneI think some non-Telco users were interested in Fenix for this kind of case, where some host is always under maintenance in a huuuuge cloud11:13
hyunsikyang__I understand why max_impacted_members is changed.11:14
tojuvoneGreat, there are so many details in all this :)11:16
hyunsikyang__recheck. But one more question: if max_impacted_members = 1, does Fenix migrate one VNF at a time?11:16
hyunsikyang__or total - max?11:16
tojuvoneI do not understand this "one VNF"? To me a VNF is one application that is represented as one tenant/project in OpenStack11:17
tojuvoneand it has VMs of different flavors11:18
hyunsikyang__Ah. 1 instance.11:18
hyunsikyang__sorry.11:18
tojuvonelike VM1 and VM2 of flavor A, that belong to an instance_group, let's say with the same name A11:19
hyunsikyang__A VNF is a kind of application, you said.11:19
tojuvoneyes, an application consisting of different types of VMs11:19
tojuvoneso you could have VM3 and VM4 of flavor B and instance_group B11:21
tojuvonemax_impacted_members would then have a different value for instance_group A and B11:22
tojuvonelet's say instance_group A is active/standby and always just 2 instances11:22
tojuvoneinstance_group B has a different number of VMs, possibly according to load11:23
tojuvonemaybe 10 normally11:23
tojuvonemax_impacted_members would always be 1 for instance_group A11:23
tojuvonewhereas with instance_group B it could change when the load changes and the number of VMs in that group changes11:24
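The two instance groups just described, written out as data (field names follow the discussion and are illustrative, not the exact Fenix schema):

```python
# instance_group A: active/standby pair, at most 1 VM impacted at a time.
# instance_group B: scales with load, so max_impacted_members must track
# the current group size (here 10 VMs, at least 2 needed for service).
instance_groups = {
    "A": {
        "flavor": "A",
        "instances": ["VM1", "VM2"],
        "max_impacted_members": 1,
    },
    "B": {
        "flavor": "B",
        "instances": [f"VM{i}" for i in range(3, 13)],  # 10 VMs normally
        "max_impacted_members": 8,
    },
}
```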
hyunsikyang__YEs.11:24
hyunsikyang__Right.11:24
hyunsikyang__So11:24
hyunsikyang__now, we should find a way to change the max_impacted_members...11:24
hyunsikyang__When Fenix starts maintenance,11:25
tojuvonein instance_group A there is normally just a switchover of which VM is active11:25
hyunsikyang__in the case of group B,11:25
tojuvoneif VM1 is on the host to be maintained and is active, the floating IP will for example be moved to VM2, and only then does the VNFM reply back to Fenix to migrate VM111:25
tojuvoneyes, there are now 2 options11:27
tojuvone1. Find a way to change max_impacted_members dynamically11:27
tojuvone2. Assume the value of max_impacted_members is only for night time with a small load, and it can then have a static value. Surely the workflow then needs to know it is static if the VNF is scaled11:29
hyunsikyang__In the case of 2, how do we get the value11:29
hyunsikyang__?11:29
hyunsikyang__Just use a fixed value in the VNFM, like in the patch?11:30
tojuvoneyes, then the constraints only need to be written once for the instance_group when the VNF is created / maintenance is called.11:33
tojuvonefor a VM instance one always needs to write constraints for a new instance and remove them when the instance is removed11:34
tojuvonethen, if we see that the number of VMs is scaled during maintenance, the VNFM should perhaps read the static definition for max_impacted_members from the metadata when maintenance starts...11:37
tojuvoneand when the number of VMs is scaled, change the value accordingly11:37
tojuvoneThe other way is to expect the Fenix workflow to be designed for static values, so the same max_impacted_members would apply regardless of the scaling11:38
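A sketch of how a VNFM could push a recomputed group constraint to Fenix after scaling, using the relation from earlier. The endpoint path and payload fields here are assumptions based on this discussion, not verified against the Fenix API reference:

```python
# Hypothetical: update max_impacted_members in Fenix after a scaling
# operation. Path and payload fields are assumed, not from the Fenix
# API reference.
import requests

def update_group_constraint(fenix_endpoint: str, group_id: str,
                            current_instances: int,
                            least_instances_needed: int) -> None:
    payload = {
        "group_id": group_id,
        "max_impacted_members": current_instances - least_instances_needed,
    }
    resp = requests.put(f"{fenix_endpoint}/v1/instance_group/{group_id}",
                        json=payload)
    resp.raise_for_status()

# e.g. after scaling instance_group B down from 10 to 4 VMs:
# update_group_constraint(fenix_endpoint, "group-B-uuid", 4, 2)
```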
hyunsikyang__In the case where a VNF consists of several VDUs with the same function, everything is fine.11:38
hyunsikyang__But in the case where a VNF consists of multiple VDUs and they are not the same flavor, it is hard to configure the value for each VDU.11:39
tojuvoneor it needs to be metadata in the flavor11:40
hyunsikyang__OK, we will figure it out!11:41
tojuvoneyes11:41
tojuvoneWell, you also know the Tacker internals to figure this out ;)11:42
hyunsikyang__Thank you for your support. Now it is clearer.11:42
tojuvonegreat and no problem. We have a common goal :)11:43
tojuvoneWould be so great to have all this in Tacker11:43
tojuvoneThank you for all that you are doing11:43
*** ChanServ changes topic to "Welcome to Fenix: https://wiki.openstack.org/wiki/Fenix"12:25
-openstackstatus- NOTICE: Zuul has been restarted, all events are lost, recheck or re-approve any changes submitted since 9:50 UTC.12:25
*** JangwonLee_ has quit IRC15:05
*** JangwonLee_ has joined #openstack-fenix15:05
*** tojuvone has quit IRC21:09
*** tojuvone has joined #openstack-fenix21:13

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!