16:00:33 #startmeeting nova
16:00:33 Meeting started Thu Dec 3 16:00:33 2020 UTC and is due to finish in 60 minutes. The chair is gibi. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:35 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:38 The meeting name has been set to 'nova'
16:02:10 o/
16:02:18 o/
16:02:35 o/
16:02:44 o/
16:03:14 #topic Bugs (stuck/critical)
16:03:20 One critical bug
16:03:21 #link https://bugs.launchpad.net/nova/+bug/1906428 blocking the nova gate as nova-multi-cell job fails
16:03:22 Launchpad bug 1906428 in OpenStack Compute (nova) "test_cold_migrate_unshelved_instance failing with cat: can't open '/mnt/timestamp': No such file or directory" [Critical,In progress]
16:03:24 Patch is on the gate to skip the failing test until we find a solution #link https://review.opendev.org/c/openstack/nova/+/765141
16:03:45 I saw it bounced from the gate :/
16:03:54 ah, it failed again.
16:04:52 134 runs are already in the check pipeline, so I think it will not merge soon
16:04:54 lyarwood promised to continue looking into the actual problem next week
16:05:07 gmann: yeah, gate feels slow these days
16:05:15 \o
16:05:42 #link 14 new untriaged bugs (+0 since the last meeting): #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New
16:05:58 we have been hovering around this number the whole week
16:06:06 #link 75 bugs are in INPROGRESS state without any tag (+0 since the last meeting): #link https://bugs.launchpad.net/nova/+bugs?field.tag=-*&field.status%3Alist=INPROGRESS
16:06:12 these are potentially un-triaged bugs. Check if they are still valid
16:06:24 Is there any bug we need to discuss here?
16:07:08 #topic Gate status
16:07:14 Gate on master is blocked. Patch to unblock it is on the gate #link https://review.opendev.org/c/openstack/nova/+/765141
16:07:19 we discussed this already
16:07:24 Gate on stable/victoria is blocked.
Fix is on the gate #link https://review.opendev.org/c/openstack/nova/+/764432
16:07:41 this also bounced
16:07:47 :/
16:08:08 Classification rate 35% (+11 since the last meeting) #link http://status.openstack.org/elastic-recheck/data/integrated_gate.html
16:08:13 Please look at the gate failures, file a bug, and add an elastic-recheck signature in the opendev/elastic-recheck repo (example: #link https://review.opendev.org/#/c/759967)
16:08:28 I don't know how relevant the classification rate is as an absolute value
16:08:47 as it now shows better classification than last week but the gate feels in worse shape
16:09:08 maybe what changed is that we know why the gate fails but we haven't solved the failures yet
16:09:25 anyhow I will keep reporting / tracking this number for a while to see if it is relevant
16:09:34 any other gate issue we need to talk about?
16:10:51 #topic Release Planning
16:10:56 Wallaby Milestone 1 is today!
16:11:08 The second spec review day was a success. We now have 11 blueprints approved for Wallaby. #link https://blueprints.launchpad.net/nova/wallaby
16:11:27 By Milestone 1 we have finished 0 of the 11 approved blueprints.
16:11:51 M2 is January 22
16:12:11 considering the holiday season there is not much time until M2
16:12:35 M2 will be spec freeze so if you have an open spec please hurry up :)
16:12:56 any other release specific thing to discuss?
16:14:21 #topic Stable Branches
16:14:26 stable/victoria is blocked but patch to unblock is on the gate - https://review.opendev.org/764432
16:14:31 other stable branches seem to be OK, no outstanding issue
16:14:32 EOM
16:14:42 sorry for repeating o:)
16:14:55 no worry, thanks for consistently adding updates to the agenda
16:14:56 did not see that it's already listed at gate status
16:15:07 np
16:15:23 any other stable thing to discuss?
(lyarwood is on PTO today)
16:15:39 nothing that I'm aware of :)
16:16:01 #topic Sub/related team Highlights
16:16:05 Libvirt (bauzas)
16:16:15 nothing to say
16:16:38 #topic Open discussion
16:16:51 there are two on the agenda
16:16:51 (stephenfin): Stuck on what to do about invalid instance hostnames like 'ubuntu18.04'
16:16:59 #link https://review.opendev.org/c/openstack/nova/+/764482
16:17:17 stephenfin: could you summarize where we are?
16:17:29 I've brought this up on the mailing list
16:17:30 I was only able to follow the ML thread partially
16:17:44 tl;dr: people are using instance names that look like FQDNs
16:17:54 I haven't yet figured out if they're relying on these to be balid
16:17:55 *valid
16:18:38 In any case, I'm not sure if we're going to be able to just replace all periods in the name
16:19:14 so I'm still thinking the "if it's an invalid FQDN, munge the name, otherwise don't" approach is best
16:19:26 I would like to ask for guidance with a patch
16:19:26 but I know sean-k-mooney at least disagrees
16:19:37 I proposed this patch: https://review.opendev.org/c/openstack/nova/+/711113, but it has not received many reviews so far
16:19:46 should I open an RFE, and then a spec for it as well?
16:19:49 rafaelweingartne: i will ping you after stephen's topic
16:19:57 oops, sorry, sure
16:20:43 stephenfin: but sean is not here :)
16:21:08 quick - everyone review it while sean is distracted!
16:21:11 :)
16:21:41 stephenfin: you proposed the split approach to support two separate use cases?
16:22:01 use case a) server name is used as fqdn in the guest
16:22:18 but what is use case b)
16:22:43 use case a) is more that a FQDN is used as the server display name and therefore as the server host name
16:23:16 while use case b) is that a server display name with a period in it that is *not* a FQDN is used, so the server host name should be something else
16:23:48 i.e. 'test.domain.com' is okay.
'test.01' will be converted to 'Server-{serverUUID}'
16:24:33 if that makes sense?
16:24:46 and in case b) what will be the hostname in the guest?
16:24:57 'Server-{serverUUID}'
16:25:23 which is the fallback today if you end up with an empty string after all non-alphanumeric characters are removed
16:25:26 I assume test.01 is now causing a real failure somewhere down the line
16:25:43 if designate is deployed, you aren't able to boot an instance
16:25:56 because neutron will error out when creating/attaching a port
16:26:28 with proper documentation I'm OK to have this split behavior. I guess you need a backportable solution
16:26:52 hence not trying to disconnect the name and the hostname
16:26:58 yes, exactly
16:27:11 the proper solution is 'openstack server create --hostname FOO ...'
16:27:19 but that's not backportable (API change)
16:27:23 yeah
16:27:36 does sean have a counter proposal that is also backportable?
16:27:53 Not backportable fwict, no
16:27:57 I see
16:28:06 It's user error in his eyes
16:28:17 then I think we can say: do a backportable fix first, then do a proper fix on master later
16:28:22 o/
16:28:23 and we should close as WONTFIX, which is user hostile
16:28:38 sean-k-mooney: o/
16:28:52 we are just discussing the server name test.01 issue
16:29:06 ah ok
16:29:10 mmmm
16:29:31 * bauzas looks at the API docs to see what we tell about naming instances
16:29:55 "The server name."
16:29:58 wow
16:30:00 bauzas: it tells you nothing
16:30:02 sean-k-mooney: what is the reason you are against stephenfin's proposal to convert test.01 to server-{serverUUID} and not convert valid FQDNs
16:30:03 didn't see that coming
16:30:03 yep
16:30:33 gibi: it would change the hostname seen in the guest for one
16:30:59 the precedent is also based on a misunderstanding that unicode was invalid in a hostname
16:31:02 so, honestly, given we haven't said it's either the display name or the hostname, I think we are OK
16:31:16 because the semantics can change
16:31:36 sean-k-mooney: I guess we not just remove unicode characters but other non hostname compatible characters too
16:31:51 like /
16:31:57 so we should be allowing unicode hostnames
16:32:01 but that is a separate fature
16:32:04 definitely ^
16:32:06 *feature
16:32:10 agree ^^
16:32:17 so unicode aside
16:32:22 asséééééé
16:32:27 we also are not transforming the hostnames according to the relevant RFEs
16:32:34 *RFCs
16:32:49 we should be substituting all punctuation and other special symbols with _
16:32:54 sorry -
16:33:26 or, just consider that if you provide a ".", then you knew you are providing a FQDN
16:33:54 so, the hostname should only be the server name, not the TLD
16:34:05 so what we could do is in a new microversion add an fqdn field and take only what is before the . for the instance.hostname
16:34:15 ie.
if I wrote "bauzas.local", that meant to me that the name of my server is "bauzas"
16:34:23 yep
16:34:38 which is what actually happens today
16:34:41 and I leave my DNS telling me my own TLD
16:34:47 an API microversion isn't backportable though
16:34:47 but as far as I understand we need a backportable solution first, then a proper solution on master
16:34:55 but as i pointed out in the email thread the metadata is totally wrong in that case
16:35:15 I totally agree that what we do is rubbish, but we do it and people rely on it to some degree
16:35:18 i don't believe we need a backportable solution
16:35:24 or at least i'm not sold on it
16:35:30 stephenfin: can't we consider limiting the server name to be "server" and not the whole FQDN?
16:36:03 bauzas: i would be ok backporting that although i'm uncomfortable with the transformation in general
16:36:04 (speaking of "server.domain")
16:36:13 if we do, that's a change in behavior for users that were doing e.g. 'openstack server create instance.domain.com'
16:36:31 stephenfin: it's not from a cloud-init point of view
16:36:32 stephenfin: that's why I said I'm cool with explaining this behavioural change
16:36:45 their hostname will be instance in both cases
16:36:49 as we didn't promise anything with the servername
16:36:59 e.g. with or without designate
16:37:01 we're not breaking the contract)
16:37:03 hmm, okay, so I'd assumed that would be rejected as non-backportable
16:37:25 what that would change is the designate dns name
16:37:33 well, it says "The server name."
16:37:33 "
16:37:36 currently it's appending the designate default domain to the full server name
16:37:47 now it would do the sane thing and append the default domain to the hostname
16:37:57 yup
16:38:00 which would actually be resolvable via dns
16:38:04 yup
16:38:14 * gibi lost
16:38:17 and we could keep the display name to be the FQDN
16:38:23 so if you create a server with 'instance.domain.com' and designate's default domain is 'domain.com', what happens?
16:38:26 bauzas: sure
16:38:36 gibi: trying to rephrase
16:38:37 the display name could be that server name as it was passed in
16:38:57 gibi: bauzas and sean-k-mooney are suggesting we drop everything after the first period, and suggesting it's backportable because we never made a guarantee about what the instance's hostname would be
16:39:07 this ^
16:39:10 thanks
16:39:34 so 'test-instance.domain.com' would have a hostname of 'test-instance'
16:39:42 would this change the hostname of existing instances?
16:39:46 (with a big fat note explaining why we're so mean to the user)
16:39:48 and 'ubuntu18.04' would have a hostname of 'ubuntu18'
16:39:48 gibi: no
16:40:04 gibi: don't
16:40:06 gibi: it would only change the hostname for new instances
16:40:19 it shouldn't - that information is only calculated once on initial boot and stored in instance.hostname
16:40:26 yep
16:40:27 ok
16:40:29 mustn't is the word :)
16:40:39 did people see http://lists.openstack.org/pipermail/openstack-discuss/2020-November/019137.html by the way
16:40:40 I don't think we recalculate it if you e.g. change the instance name via 'openstack server set --name NAME server'
16:40:46 assuming that is a command...
16:40:53 where i went through how the info is actually presented to the guest
16:41:01 * stephenfin knows you can set the name when rebuilding but isn't sure about otherwise
16:41:04 then I'm OK to do this change as a backportable fix with a fat note
16:41:36 sean-k-mooney: yup, I saw your email
16:41:40 could one of you please summarize it back to the ML to see if others will be against it?
16:41:54 sean-k-mooney: and that's why I think that people using periods in their server names are either foolish or smart enough
16:42:13 sorry folks we have two other topics for today
16:42:17 so we should move on
16:42:20 yup
16:42:23 * stephenfin will summarize
16:42:27 thanks!
16:42:28 I think we have a reasonable consensus here
16:42:31 stephenfin++
16:42:42 rafaelweingartne: your turn
16:43:15 Sure. I have proposed this patch (https://review.opendev.org/c/openstack/nova/+/711113), it has some conflicts, but before resolving them
16:43:21 I would like to understand if we are missing something
16:43:30 such as an RFE, or a spec
16:44:28 rafaelweingartne: glancing at the patch and the commit message you plan to redefine what 'usage' currently means in the os-simple-tenant-usage API
16:45:09 yes, and no
16:45:30 we plan to externalise it.
So, the default behaviour is maintained, and if somebody wants to redefine it, they could do so
16:46:06 To us, for instance, we were expecting something totally different from the data we get there (in the API) right now
16:46:11 externalize it with a config option, I assume
16:46:13 well if you wanted to do it differently you can do so already
16:46:21 via consuming the instance notifications
16:46:38 gibi: exactly
16:46:39 and building a system to track the lifecycle of the servers as you see fit
16:46:43 that is what the API is doing
16:46:57 it feels like a config driver API
16:46:59 sean: we have other systems in-place that do that
16:47:01 driven
16:47:12 gibi: yes
16:47:46 we try to avoid config driven APIs as it makes different public clouds behave differently
16:47:47 when we saw that API, we just thought about using it to cross-check the data we already have in other monitoring and billing systems that we have in place
16:48:00 Is os-simple-tenant-usage admin only by default?
16:48:09 so this is one of the apis that i don't think really fits well in nova
16:48:25 long term i think it would live better in an external service
16:48:34 probably yes
16:48:48 it's one of the larger performance headaches for our customers
16:49:13 this is very slow to query and results in a slow horizon as it is used in the default overview page
16:49:26 but the current docs gave us the idea of providing the usage for a VM, but as I explain in the patch, it considers usage the time between when the instance was created and now, or when it was destroyed
16:49:28 so i'm concerned about adding more complexity to it
16:49:35 I see
16:50:03 Right now, the API does not provide usage data as it says
16:50:17 at least, it is not the same understanding of usage as we have
16:50:26 that is why we proposed the patch
16:50:48 rafaelweingartne: so it provides resource allocation usage but not runtime for the VM I guess
16:51:21 exactly
16:51:31 rafaelweingartne: well it does provide usage info
16:51:36 I tend
to agree with sean-k-mooney that this is not a good API for billing, and also rafaelweingartne you said that you have a different service anyhow for billing
16:51:40 but the documentation says usage, it does not differentiate between allocation and actual usage
16:51:44 but the definition of usage is different from what you are expecting
16:52:13 therefore, we tried to amend that
16:52:20 I don't really think we should develop os-simple-tenant-usage further (hence the name simple) but fix the doc to be precise instead
16:52:29 so amending that would be an api change and require a spec not a bugfix
16:52:35 well, ok that would help as well then
16:52:49 https://github.com/openstack/nova/blob/0e7cd9d1a95a30455e3c91916ece590454235e0e/doc/source/contributor/policies.rst#metrics-gathering
16:53:05 it's slightly tangential but we have declared metrics gathering as out of scope before
16:53:18 i thought we had a similar statement for billing but i don't see one
16:53:26 Ok, so no sense in creating an RFE then
16:53:47 well, I will create a patch to make the docs more clear then
16:54:00 rafaelweingartne: thank you!
16:54:07 (please file a doc bug for tracking)
16:54:42 there is one more topic from the agenda
16:54:43 (gibi): do we want to merge the backports for the placement-audit command? https://review.opendev.org/q/topic:%22placement-audit-backport%22
16:54:57 It was raised during the week on #openstack-nova
16:55:11 yes please
16:55:22 does somebody remember what was the reason not to merge it?
16:55:30 artom: ^ ?
16:55:53 I think the concern was that it's kind of feature'y, but it's not user visible and is a huge win for operators (and us, diagnosing problems)
16:55:57 Oh, it was just super messy
16:56:03 oh, even simpler than that
16:56:05 Past, like, 1 or 2 releases back
16:56:16 yup
16:56:20 this was the concern
16:56:21 it was merged in stable/ussuri, right?
16:56:30 Nope, we didn't bother
16:56:38 no, I mean initially
16:56:38 I used the upstream DNM backports for CI, essentially
16:56:46 Because our RH CI is... well, it is.
16:57:00 Ah, you'd have to ask bauzas about the initial landing.
16:57:18 when was this merged?
16:57:21 well, I'm old
16:57:29 ussuri IIRC
16:57:40 dansmith had an opinion on it and i believe it was in favor of merging based on the operator win but i also don't recall
16:57:48 merged in ussuri
16:57:53 https://review.opendev.org/c/openstack/nova/+/670112 => ussuri
16:58:08 sean-k-mooney: I think his opinion was meh
16:58:29 how risky is it to backport the mess?
16:58:37 bauzas: basically i'm remembering it was not a hell no
16:58:45 but honestly, audit is related to allocations recreate
16:58:59 from mriedem
16:59:10 I assume the effort to create the backport was already spent so only future efforts on stable due to these patches are in question
16:59:16 one is deleting orphaned, the other is recreating them
16:59:46 gibi: I'd say that the maintenance is low but the initial effort is worth it pre-Train
16:59:58 Train backport is easy
17:00:08 bauzas: but the initial effort is already spent as we have the patches proposed
17:00:11 but then artom sweated a lot with older releases
17:00:11 bauzas: is or is not?
17:00:25 technically, we QE'd it on Queens
17:00:37 QE?
17:00:39 bauzas, did we tho?
17:00:41 so the effort is already done and manually validated
17:00:47 against Queens
17:00:50 I'd have to double check the BZ
17:00:54 we ran out of time
17:01:02 let's move this to #openstack-nova
17:01:03 sorry
17:01:05 #endmeeting
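
Note on the hostname topic from Open discussion: the rule the meeting converged on (keep only the part of the server name before the first period, and fall back to 'Server-{serverUUID}' when nothing usable is left) can be sketched roughly as below. This is a minimal illustration, not nova's implementation. The function name derive_hostname is hypothetical, and the character handling (spaces and underscores become '-', anything else outside A-Z, a-z, 0-9 and '-' is dropped) and the 63-character label cap are assumptions, not something agreed in the meeting.

```python
import re


def derive_hostname(display_name: str, server_uuid: str) -> str:
    """Hypothetical sketch: derive a guest hostname from a server display name."""
    # Keep only what comes before the first period, as proposed in the meeting.
    label = display_name.split(".", 1)[0]
    # Assumption: spaces and underscores become hyphens.
    label = re.sub(r"[ _]", "-", label)
    # Assumption: every other character outside [A-Za-z0-9-] is dropped.
    label = re.sub(r"[^A-Za-z0-9-]", "", label)
    # Assumption: no leading/trailing hyphen, and at most 63 characters per label.
    label = label.strip("-")[:63].rstrip("-")
    # Fallback mentioned in the meeting for names that end up empty.
    if not label:
        return f"Server-{server_uuid}"
    return label


if __name__ == "__main__":
    for name in ("test-instance.domain.com", "ubuntu18.04", "..."):
        print(name, "->", derive_hostname(name, "1234"))
```

With these assumptions 'test-instance.domain.com' and 'ubuntu18.04' map to the hostnames quoted in the log ('test-instance' and 'ubuntu18'), and the display name itself is left untouched, which matches the suggestion to keep the display name as passed in.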