Tuesday, 2020-04-28

*** hyunsikyang has joined #openstack-fenix04:28
*** hyunsikyang__ has quit IRC04:31
tojuvoneHi JangwonLee, hyunsikyang. I made some comments. Please ask here if something is unclear, or ask in the review to show it to others there too.05:18
tojuvonelead_time is the maximum time Fenix waits before it takes action (the time Fenix waits for a reply). The VNFM should reply immediately when ready, but at latest within this time05:29
tojuvonerecover_time is how long it takes for a VNF VM to recover after it is migrated; after this time the VM is expected to be fully operational again. After migrating, Fenix still counts this VM against max_impacted_members for the time defined in recover_time. After that it is no longer counted there, and Fenix can again migrate another VM of the same instance_group, obeying that it does not go over max_impacted_members.05:35
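A minimal sketch of how the lead_time and recover_time constraints described above might look when written down (field names follow this discussion and are illustrative, not necessarily the exact Fenix schema):

```python
# Illustrative constraint values as discussed above (not the exact
# Fenix API schema). Times are in seconds.
instance_group_constraints = {
    "group_name": "instance_group_A",
    "max_impacted_members": 2,  # how many VMs may be impacted at once
    "recover_time": 240,        # VM still counts as impacted this long after migration
}

instance_constraints = {
    "instance_name": "VM1",
    "lead_time": 120,           # max time Fenix waits for the VNFM "ready" reply
}
```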
*** hyunsikyang__ has joined #openstack-fenix05:38
hyunsikyang__Hi tojuvone!05:39
*** hyunsikyang has quit IRC05:42
tojuvonehyunsikyang__: Good morning!06:29
-openstackstatus- NOTICE: Zuul is currently failing testing, please refrain from recheck and submitting of new changes until this is solved.09:02
*** ChanServ changes topic to "Zuul is currently failing testing, please refrain from recheck and submitting of new changes until this is solved."09:02
-openstackstatus- NOTICE: Zuul is currently failing all testing, please refrain from approving, rechecking or submitting of new changes until this is solved.09:13
*** ChanServ changes topic to "Zuul is currently failing all testing, please refrain from approving, rechecking or submitting of new changes until this is solved."09:13
hyunsikyang__Hi tojuvone, I had a meeting and an urgent job, so sorry for the late reply. We understand your point.10:13
hyunsikyang__We are also thinking about the scope of this.10:13
hyunsikyang__And we are not sure what the exact scope of this is, because if we just make a general procedure, it won't work.10:14
hyunsikyang__So, we are trying to make a real demo with these patches.10:14
tojuvoneYes, so looking at the review and considering the end solution cannot easily be done, what you have done is almost there. Just maybe the couple of review comments that address what can be done10:15
tojuvonelike "maintenance" service endpoint discovery...10:15
tojuvoneand if you have a server group in Nova, those details should be copied to "instance group"10:16
tojuvoneAnother thing would be to continue with some example VNF that could have full-blown functionality. Perhaps also with another patch set, if you are looking to make it.10:17
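The "maintenance" service endpoint discovery mentioned above could look roughly like this with keystoneauth1, assuming Fenix is registered in the Keystone catalog under the service type "maintenance" (credentials and auth_url are placeholders):

```python
# Sketch: discover the Fenix "maintenance" endpoint from the Keystone
# catalog. Credentials and auth_url are placeholders.
from keystoneauth1.identity import v3
from keystoneauth1 import session

auth = v3.Password(
    auth_url="http://controller:5000/v3",
    username="vnfm",
    password="secret",
    project_name="vnf-project",
    user_domain_name="Default",
    project_domain_name="Default",
)
sess = session.Session(auth=auth)
fenix_endpoint = sess.get_endpoint(service_type="maintenance",
                                   interface="public")
print(fenix_endpoint)
```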
hyunsikyang__Ok. we will update.10:18
tojuvoneThe ultimate test case at the end could then even have a different load for the VNF, so it needs to change the instance group and instance constraints on the fly.10:18
hyunsikyang__And10:18
tojuvonebut I think that is so huge, one surely would not aim for it now10:19
hyunsikyang__yes. So10:19
hyunsikyang__Now, what is the workaround solution for that?10:19
hyunsikyang__How about using metadata for the VNF?10:19
hyunsikyang__When we create the VNF, we can define metadata..10:20
tojuvoneThat sounds like a fast solution.10:21
tojuvoneI guess the metadata should then have the needed constraints for the different "groups" of VMs it has (by flavor) or for anti-affinity grouped VMs.10:22
tojuvoneIf metadata can be changed on the fly and we know it is not a busy hour for the VNF when we maintain it...10:23
tojuvonemetadata could be changed in a non-busy hour before maintenance starts, and then e.g. "max_impacted_members" can be bigger than it normally would be10:24
tojuvoneor the metadata expects the constraints to be used only for maintenance at a non-busy hour, and thus the constraints / metadata can be static10:25
tojuvonethen it is easy to use metadata for this? No need for fancy dynamically changing constraints in this implementation10:26
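A sketch of how such static constraints could be carried as Nova server metadata. The metadata keys are invented for illustration (Fenix does not define them), and `sess` is an authenticated keystoneauth1 session as in the earlier sketch:

```python
# Sketch: carry static maintenance constraints in Nova server metadata.
# Key names are invented for illustration; Nova metadata values are strings.
# `sess` is an authenticated keystoneauth1 session (see earlier sketch).
from novaclient import client as nova_client

nova = nova_client.Client("2", session=sess)
server = nova.servers.find(name="VM1")
nova.servers.set_meta(server, {
    "maintenance_group": "instance_group_A",
    "max_impacted_members": "2",  # static, sized for the non-busy hour
    "recover_time": "240",
})
```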
hyunsikyang__But we only set metadata when we create the VNF.10:27
hyunsikyang__Hmm..10:27
tojuvoneYou just write them once to Fenix, or if you change them, maybe remember the original metadata value before scaling; you have to change the "max_impacted_members" constraint in Fenix after scaling down, as there are fewer VMs at that time10:28
tojuvoneyes, one problem is that max_impacted_members depends on how many instances of a certain VM exist10:29
tojuvoneDo you follow what I mean with max_impacted_members?10:30
hyunsikyang__Not sure. As I understand it, it is a member that is affected by any maintenance action.10:31
hyunsikyang__But I am not sure how we decide the impacted members.10:31
hyunsikyang__Is it predefined? or10:31
hyunsikyang__can we define it when we want, such as when starting scaling or any maintenance?10:32
tojuvoneoh... let me explain10:32
*** dasp has quit IRC10:32
tojuvonewe have one type of VM, with many instances: VM1, VM2, VM3, VM410:32
tojuvoneVM1 and VM2 are on HOST1, VM3 and VM4 on HOST2, and HOST3 is empty10:33
tojuvoneFenix is executing the maintenance workflow10:33
tojuvoneThere is only one empty host, so it cannot work on 2 hosts in parallel10:34
hyunsikyang__yes.10:34
tojuvoneconstraints are saying that max_impacted_members=210:34
tojuvoneFenix decides to have maintenance on HOST110:35
tojuvoneaccording to the constraints it can do "migration" for VM1 and VM2 in parallel10:35
*** dasp has joined #openstack-fenix10:35
tojuvonefor both of these it sends a PLANNED_MAINTENANCE message separately to the VNFM10:36
tojuvoneand these 2 VMs are now impacted10:36
tojuvonethe VNFM has the lead_time within which it needs to reply back that it is ready10:36
tojuvonethat is defined in constraints10:36
tojuvoneafter the reply, Fenix makes the migration and waits the recover_time10:37
tojuvoneafter waiting that for both of these VMs, there are 0 impacted members10:37
tojuvoneso Fenix could again have 2 parallel migrations according to max_impacted_members10:38
tojuvoneAnd now, if we had scaled from 10 VMs: we had VM1 - VM910:38
tojuvoneprobably the max_impacted_members might have been 8 or something like that originally10:39
tojuvonebut Fenix surely can only do as many as there are empty target hosts10:39
tojuvoneBtw, when using the ETSI constraints, the nfv.py workflow also has only one VM instance in the PLANNED_MAINTENANCE event, as all VMs are done in parallel10:41
tojuvoneAll possible VMs; in the above it was 2 VMs at a time10:42
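Walking through that scenario as plain bookkeeping (pure illustration, not Fenix code):

```python
# The HOST1 round of the example above: 4 VMs in the group,
# max_impacted_members=2, VM1 and VM2 must move off HOST1.
max_impacted_members = 2
impacted = 0

impacted += 2  # PLANNED_MAINTENANCE sent for VM1 and VM2
assert impacted <= max_impacted_members  # both allowed in parallel

# ... VNFM replies within lead_time, both VMs are migrated ...
# Each VM still counts as impacted until its recover_time has passed.

impacted -= 2  # recover_time elapsed for both
assert impacted == 0  # two more migrations could start now
```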
hyunsikyang__BTW, do we need to change max_impacted_members before maintenance according to the number of current instances on the host?10:47
tojuvonemax_impacted_members has no relation to the host10:47
hyunsikyang__in your example,10:48
tojuvonemax_impacted_members relates to how many instances of a certain type of VM are in the VNF10:48
tojuvoneso how many instances can be impacted in the "instance group"10:48
tojuvoneSo that is just from the VNF perspective, to say how many instances (VMs) it needs for the service it provides10:49
tojuvoneoh... how many instances can be impacted so that it can still provide the service those instances offer10:50
tojuvoneand it comes from the ETSI definitions that it is max_impacted_members, while it might have been nicer the other way around10:50
tojuvone"least_number_of_instances_needed" instead of "max_impacted_members"10:51
tojuvoneNow, as it is max_impacted_members, it needs changing if the number of VNF instances is scaled. If it were defined the other way around, you might not have needed to change the value.10:52
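The relation between the two ways of expressing the constraint can be written directly (names are illustrative):

```python
# With a minimum ("least needed"), nothing changes on scaling; with a
# maximum ("max impacted"), the value must track the current group size.
def max_impacted_members(current_instances: int,
                         least_instances_needed: int) -> int:
    return current_instances - least_instances_needed

print(max_impacted_members(10, 2))  # 8, as in the scaled example above
print(max_impacted_members(4, 2))   # 2, as in the 4-VM example
```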
hyunsikyang__It means the number of VNF instances needed for the service as a minimum.10:57
hyunsikyang__right?10:57
tojuvoneyes, exactly10:58
hyunsikyang__So, according to your explanation, it is '2' because we should leave two VNF instances, which is half of the VNF, for the specific service.10:58
hyunsikyang__But why is it '0' after the recovery time?10:59
tojuvone"two vnf" -> "2 VMs belonging to same "instance group" in single VNF"11:00
tojuvoneoh, my bad. I think you misunderstood11:00
tojuvoneFenix needs to internally keep count how many VMs of certain instance_group are affected against the max_impacted_members11:01
tojuvoneso Fenix workflow has some variable that keeps the ipmacted_members count11:02
tojuvoneand comapares it to max_impacted_members11:02
tojuvoneimpacted_members inside Fenix is 0 after those VMs migrated and recover_time passed11:03
tojuvoneFEnix can try to migrate who konws how many VMs, but they are in gueue as if Fenxi internal count: ipmacted_members has reached max_impacted_members11:04
tojuvoneSo fenix has own thread to handle each and every migration. In the beginnign it increments ipmacted_members and when thread is done it reduced ipmacted_members11:05
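That per-migration bookkeeping maps naturally onto a bounded counter; a minimal sketch of the idea (illustration only, not Fenix code):

```python
# One thread per migration; a semaphore caps concurrent "impacted"
# VMs of an instance_group at max_impacted_members, and the slot is
# held until recover_time has passed.
import threading
import time

MAX_IMPACTED_MEMBERS = 2
RECOVER_TIME = 1  # seconds, shortened for the example

slots = threading.Semaphore(MAX_IMPACTED_MEMBERS)

def migrate(vm: str) -> None:
    with slots:                   # like incrementing impacted_members
        print(f"migrating {vm}")
        time.sleep(0.5)           # stand-in for the live migration
        time.sleep(RECOVER_TIME)  # VM stays "impacted" while recovering
    # leaving the block is like decrementing impacted_members

threads = [threading.Thread(target=migrate, args=(vm,))
           for vm in ("VM1", "VM2", "VM3", "VM4")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```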
JangwonLee_I think "max_impacted_members" is just an indicator that shows the max number of migrations when Fenix does maintenance on an instance group. Is this right?11:06
JangwonLee_max number of instances11:06
tojuvoneBasically yes11:06
tojuvoneand normally a VNF consists of different types of VMs that have their own instance_group and max_impacted_members according to those VMs11:07
hyunsikyang__So, when the infra is busy, we should raise the value to maintain the service.11:07
tojuvoneWhen the VNF is busy it needs more instances for all kinds of VMs it has11:08
tojuvonethen for all those different instance groups those instances belong to, the max_impacted_members should be smaller11:09
tojuvoneor if we now consider that we know maintenance is always during the night11:09
tojuvonewe could statically define max_impacted_members to a value that expects the service level to be lower at night11:10
tojuvoneThen we do not need to know the VNF load level, but thus we can do the maintenance only on the condition that it is done at night11:11
tojuvonethat is the usual case11:11
JangwonLee_yes11:11
tojuvoneIf we had a sophisticated manager that knows the load, and max_impacted_members could be dynamically changed, we could run maintenance 24/711:12
tojuvoneIt's just that you might be able to migrate only a single VM at a time or so11:12
tojuvoneI think some non-Telco users were interested in Fenix for this kind of case, where some host is always under maintenance in a huuuuge cloud11:13
hyunsikyang__I understand why max_impacted_members is changed.11:14
tojuvoneGreat, there are so many details in all this :)11:16
hyunsikyang__recheck. But one more question: if max_impacted_members = 1, does Fenix migrate one VNF at a time?11:16
hyunsikyang__or total - max?11:16
tojuvoneI do not understand this "one VNF"? To me a VNF is one application that is represented as one tenant/project in OpenStack11:17
tojuvoneand it has VMs of different flavors11:18
hyunsikyang__Ah. 1 instance.11:18
hyunsikyang__sorry.11:18
tojuvonelike VM1 and VM2 of flavor A, that belong to an instance_group, let's say with the same name A11:19
hyunsikyang__A VNF is a kind of application, you said.11:19
tojuvoneyes, an application consisting of different types of VMs11:19
tojuvoneso you could have VM3 and VM4 of flavor B and instance_group B11:21
tojuvonemax_impacted_members would then have a different value for instance_group A and B11:22
tojuvonelet's say instance_group A is active/standby and always just 2 instances11:22
tojuvoneinstance_group B has a different number of VMs, possibly according to load11:23
tojuvonemaybe 10 normally11:23
tojuvonemax_impacted_members would always be 1 for instance_group A11:23
tojuvonewhereas with instance_group B it could change when the load changes and the number of VMs in that group changes11:24
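The two instance groups just described, written out as data (field names follow the discussion and are illustrative, not the exact Fenix schema):

```python
# instance_group A: active/standby pair, at most 1 VM impacted at a time.
# instance_group B: scales with load, so max_impacted_members must track
# the current group size (here 10 VMs, at least 2 needed for service).
instance_groups = {
    "A": {
        "flavor": "A",
        "instances": ["VM1", "VM2"],
        "max_impacted_members": 1,
    },
    "B": {
        "flavor": "B",
        "instances": [f"VM{i}" for i in range(3, 13)],  # 10 VMs normally
        "max_impacted_members": 8,
    },
}
```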
hyunsikyang__YEs.11:24
hyunsikyang__Right.11:24
hyunsikyang__So11:24
hyunsikyang__now, we should find a way to change the max_impacted_members...11:24
hyunsikyang__When Fenix starts maintenance,11:25
tojuvonein instance_group A there is normally just a switchover of which VM is active11:25
hyunsikyang__in the case of group B,11:25
tojuvoneif VM1 is on the host to be maintained and is active, the floating IP will for example be moved to VM2, and only then does the VNFM reply back to Fenix to migrate VM111:25
tojuvoneyes, there are now 2 options11:27
tojuvone1. Find a way to change max_impacted_members dynamically11:27
tojuvone2. Assume the value of max_impacted_members is only for night time with a small load, and it can then have a static value. Surely the workflow then needs to know it is static if the VNF is scaled11:29
hyunsikyang__In the case of 2, how do we get the value11:29
hyunsikyang__?11:29
hyunsikyang__Just use a fixed value in the VNFM, like in the patch?11:30
tojuvoneyes, then the constraints only need to be written once for the instance_group when the VNF is created / maintenance is called.11:33
tojuvonefor a VM instance one always needs to write constraints for a new instance and remove them when the instance is removed11:34
tojuvonethen, if we see that the number of VMs is scaled during maintenance, the VNFM should perhaps read the static definition for max_impacted_members from the metadata when maintenance starts...11:37
tojuvoneand when the number of VMs is scaled, change the value accordingly11:37
tojuvoneThe other way is to expect the Fenix workflow to be designed for static values, so the same max_impacted_members would apply regardless of the scaling11:38
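A sketch of how a VNFM could push a recomputed group constraint to Fenix after scaling, using the relation from earlier. The endpoint path and payload fields here are assumptions based on this discussion, not verified against the Fenix API reference:

```python
# Hypothetical: update max_impacted_members in Fenix after a scaling
# operation. Path and payload fields are assumed, not from the Fenix
# API reference.
import requests

def update_group_constraint(fenix_endpoint: str, group_id: str,
                            current_instances: int,
                            least_instances_needed: int) -> None:
    payload = {
        "group_id": group_id,
        "max_impacted_members": current_instances - least_instances_needed,
    }
    resp = requests.put(f"{fenix_endpoint}/v1/instance_group/{group_id}",
                        json=payload)
    resp.raise_for_status()

# e.g. after scaling instance_group B down from 10 to 4 VMs:
# update_group_constraint(fenix_endpoint, "group-B-uuid", 4, 2)
```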
hyunsikyang__In the case where a VNF consists of several VDUs with the same function, everything is fine.11:38
hyunsikyang__But in the case where a VNF consists of multiple VDUs and they are not the same flavor, it is hard to configure the value for each VDU.11:39
tojuvoneor it needs to be metadata in the flavor11:40
hyunsikyang__OK, we will figure it out!11:41
tojuvoneyes11:41
tojuvoneWell, you also know the Tacker internals to figure this out ;)11:42
hyunsikyang__Thank you for your support. Now it is clearer.11:42
tojuvonegreat and no problem. We have a common goal :)11:43
tojuvoneWould be so great to have all this in Tacker11:43
tojuvoneThank you for all that you are doing11:43
*** ChanServ changes topic to "Welcome to Fenix: https://wiki.openstack.org/wiki/Fenix"12:25
-openstackstatus- NOTICE: Zuul has been restarted, all events are lost, recheck or re-approve any changes submitted since 9:50 UTC.12:25
*** JangwonLee_ has quit IRC15:05
*** JangwonLee_ has joined #openstack-fenix15:05
*** tojuvone has quit IRC21:09
*** tojuvone has joined #openstack-fenix21:13

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!