16:00:33 <gibi> #startmeeting nova
16:00:33 <openstack> Meeting started Thu Dec 3 16:00:33 2020 UTC and is due to finish in 60 minutes. The chair is gibi. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:35 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:38 <openstack> The meeting name has been set to 'nova'
16:02:10 <gibi> o/
16:02:18 <gmann> o/
16:02:35 <stephenfin> o/
16:02:44 <elod> o/
16:03:14 <gibi> #topic Bugs (stuck/critical)
16:03:20 <gibi> One Critical bugs
16:03:21 <gibi> #link https://bugs.launchpad.net/nova/+bug/1906428 blocking the nova gate as nova-multi-cell job fails
16:03:22 <openstack> Launchpad bug 1906428 in OpenStack Compute (nova) "test_cold_migrate_unshelved_instance failing with cat: can't open '/mnt/timestamp': No such file or directory" [Critical,In progress]
16:03:24 <gibi> Patch is on the gate to skip the failing test until we find a solution #link https://review.opendev.org/c/openstack/nova/+/765141
16:03:45 <gibi> I saw it bounced from the gate :/
16:03:54 <gmann> ah again failed.
16:04:52 <gmann> 134 run already in check pipeline I think it would not merge soon
16:04:54 <gibi> lyarwood promised to continue looking into the actual problem next week
16:05:07 <gibi> gmann: yeah, gate feels slow these days
16:05:15 <bauzas> \o
16:05:42 <gibi> #link 14 new untriaged bugs (+0 since the last meeting): #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New
16:05:58 <gibi> we are hovering around this number during the whole week
16:06:06 <gibi> #link 75 bugs are in INPROGRESS state without any tag (+0 since the last meeting): #link https://bugs.launchpad.net/nova/+bugs?field.tag=-*&field.status%3Alist=INPROGRESS
16:06:12 <gibi> these are potentially un-triaged bugs. Check if they are still valid
16:06:24 <gibi> Is there any bug we need to discuss here ?
16:07:08 <gibi> #topic Gate status
16:07:14 <gibi> Gate on master is blocked. Patch to unblock it is on the gate #link https://review.opendev.org/c/openstack/nova/+/765141
16:07:19 <gibi> we dicussed this already
16:07:24 <gibi> Gate on stable/victoria is blocked. Fix is on the gate #link https://review.opendev.org/c/openstack/nova/+/764432
16:07:41 <gibi> this also bounced
16:07:47 <gibi> :/
16:08:08 <gibi> Classification rate 35% (+11 since the last meeting) #link http://status.openstack.org/elastic-recheck/data/integrated_gate.html
16:08:13 <gibi> Please look at the gate failures, file a bug, and add an elastic-recheck signature in the opendev/elastic-recheck repo (example: #link https://review.opendev.org/#/c/759967)
16:08:28 <gibi> I don't know how relevant the classification rate as an absolute value
16:08:47 <gibi> as it is now show better classification than last week but the gate feels in worst shape
16:09:08 <gibi> maybe what changed that we know why the gate fails but we didn't solved the failures yet
16:09:25 <gibi> anyhow I will keep reporting / tracking this number for a while to see if it is relevant
16:09:34 <gibi> any other gate issue we need to talk about?
16:10:51 <gibi> #topic Release Planning
16:10:56 <gibi> Wallaby Milestone 1 is today!
16:11:08 <gibi> The second spec review day was a success. We now have 11 blueprints approved to Wallaby. #link https://blueprints.launchpad.net/nova/wallaby
16:11:27 <gibi> Until Milestone 1 we finished 0 blueprint out of the 11 approved blueprints.
16:11:51 <gibi> M2 is january 22
16:12:11 <gibi> considering the holiday season there is not much time until M2
16:12:35 <gibi> M2 will be spec freeze so if you have an open spec please hurry up :)
16:12:56 <gibi> any other release specific thing to disucss?
16:14:21 <gibi> #topic Stable Branches
16:14:26 <gibi> stable/victoria is blocked but patch to unblock is on the gate - https://review.opendev.org/764432
16:14:31 <gibi> other stable branches seems to be OK, no outstanding issue
16:14:32 <gibi> EOM
16:14:42 <elod> sorry for repeating o:)
16:14:55 <gibi> no worry, thanks for consistently adding update to the agenda
16:14:56 <elod> did not see that it's already listed at gate status
16:15:07 <elod> np
16:15:23 <gibi> any other stable thing to discuss? ( lyarwood is on PTO today)
16:15:39 <elod> nothing that I'm aware of :)
16:16:01 <gibi> #topic Sub/related team Highlights
16:16:05 <gibi> Libvirt (bauzas)
16:16:15 <bauzas> nothing to say
16:16:38 <gibi> #topic Open discussion
16:16:51 <gibi> there are two on the agenda
16:16:51 <gibi> (stephenfin): Stuck on what to do about invalid instance hostnames like 'ubuntu18.04'
16:16:59 <gibi> #link https://review.opendev.org/c/openstack/nova/+/764482
16:17:17 <gibi> stephenfin: could you summarize where we are?
16:17:29 <stephenfin> I've brought this up on the mailing list
16:17:30 <gibi> I was only able to follow the ML thread partially
16:17:44 <stephenfin> tl;dr: people are using instance names that look like FQDNs
16:17:54 <stephenfin> I haven't yet figured out if they're relying on these to be balid
16:17:55 <stephenfin> *valid
16:18:38 <stephenfin> In any case, I'm not sure if we're going to be able to just replace all periods is the name
16:19:14 <stephenfin> so I'm still thinking the "if it's an invalid FQDN, munge the name, otherwise don't" approach is best
16:19:26 <rafaelweingartne> I would like to ask for guidance with a patch
16:19:26 <stephenfin> but I know sean-k-mooney at least disagrees
16:19:37 <rafaelweingartne> I proposed this patch: https://review.opendev.org/c/openstack/nova/+/711113, but it has not received much reviews so far
16:19:46 <rafaelweingartne> should I open an RFE, and then a spec for it as well?
16:19:49 <gibi> rafaelweingartne: i will ping you after stephen's topic
16:19:57 <rafaelweingartne> ops, sorry, sure
16:20:43 <gibi> stephenfin: but sean is not here :)
16:21:08 <stephenfin> quick - everyone review it while sean is distracted!
16:21:11 <stephenfin> :)
16:21:41 <gibi> stephenfin: your proposed the split approach to support two separate use cases?
16:22:01 <gibi> use case a) server name is used as fqdn in the guest
16:22:18 <gibi> but what is use case b)
16:22:43 <stephenfin> use case a) is more a FQDN is used as the server display name and therefore the server host name
16:23:16 <stephenfin> while use case b) is a server display name with a period in it that is *not* a FQDN is used, so the server host name should be something else
16:23:48 <stephenfin> i.e. 'test.domain.com' is okay. 'test.01' will be converted to 'Server-{serverUUID}'
16:24:33 <stephenfin> if that makes sense?
16:24:46 <gibi> and in case b) what will be the hostname in the guest?
16:24:57 <stephenfin> 'Server-{serverUUID}'
16:25:23 <stephenfin> which is the fallback today if you end up with an empty string after all non-alphanumeric characters are removed
16:25:26 <gibi> I assum now test.01 causing a real failure somewhere down the line
16:25:43 <stephenfin> if designate is deployed, you aren't able to boot an instance
16:25:56 <stephenfin> because neutron will error out when creating/attaching a port
16:26:28 <gibi> with proper documentation I'm OK to have this split behavior. I guess you need a backportable solution
16:26:52 <gibi> hence not trying to disconnect the name and the hostname
16:26:58 <stephenfin> yes, exactly
16:27:11 <stephenfin> the proper solution is 'openstack server create --hostname FOO ...'
16:27:19 <stephenfin> but that's not backportable (API change)
16:27:23 <gibi> yeah
16:27:36 <gibi> does sean has a counter proposal that is also backportable?
16:27:53 <stephenfin> Not backportable fwict, no
16:27:57 <gibi> I see
16:28:06 <stephenfin> It's user error in his eyes
16:28:17 <gibi> then I think we can say that do a backportable fix first then do a proper fix on master later
16:28:22 <sean-k-mooney> o/
16:28:23 <stephenfin> and we should close as WONTFIX, which is user hostile
16:28:38 <gibi> sean-k-mooney: o/
16:28:52 <gibi> we are just discussing the server name test.01 issue
16:29:06 <sean-k-mooney> ah ok
16:29:10 <bauzas> mmmm
16:29:31 * bauzas looks at the API docs to see what we tell about naming instances
16:29:55 <bauzas> "The server name."
16:29:58 <bauzas> wow
16:30:00 <sean-k-mooney> bauzas: it tell you nothing
16:30:02 <gibi> sean-k-mooney: what is the reason you are against stephenfin's proposal to convert test.01 to server-{serverUUID} and not convert valid FQDNs
16:30:03 <bauzas> didn't see that coming
16:30:03 <sean-k-mooney> yep
16:30:33 <sean-k-mooney> gibi: it would change the hostname seen in the guest for one
16:30:59 <sean-k-mooney> the precendiet is also based on a missunder standing that unicode was invalid in a hostname
16:31:02 <bauzas> so, honestly, given we haven't told it's either the display name or the hostname, I think we are OK
16:31:16 <bauzas> because the semantics can change
16:31:36 <gibi> sean-k-mooney: I gues we not just remove unicode charachters but other non hostname compatible charachters too
16:31:51 <gibi> like /
16:31:57 <sean-k-mooney> so we should be allowing unicode hostnames
16:32:01 <sean-k-mooney> but ath is a seperete fature
16:32:04 <bauzas> definitelty ^
16:32:06 <sean-k-mooney> *feature
16:32:10 <gibi> agree ^^
16:32:17 <gibi> so unicode aside
16:32:22 <bauzas> asséééééé
16:32:27 <sean-k-mooney> we also are not transforming the hostnames acording to the relenvet RFEs
16:32:34 <sean-k-mooney> *RFCs
16:32:49 <sean-k-mooney> we shoudl be substituiing all punctianto and other special symble with _
16:32:54 <sean-k-mooney> sorry -
16:33:26 <bauzas> or, just consider that if you provide a ".", then you knew you are providing a FQDN
16:33:54 <bauzas> so, the hostname should only be the server name, not the TLD
16:34:05 <sean-k-mooney> so what we coudl do is in a new microversion add an fqdn filed and take only what is before the . for the instance.hostname
16:34:15 <bauzas> ie. if I wrote "bauzas.local", that meant to me that the name of my server is "bauzas"
16:34:23 <sean-k-mooney> yep
16:34:38 <sean-k-mooney> which is what actully happens todya
16:34:41 <bauzas> and I leave my DNS telling me my own TLD
16:34:47 <stephenfin> an API microversion isn't backportable though
16:34:47 <gibi> but as far as I understand we need a backportable solution first, then a proper solution on master
16:34:55 <sean-k-mooney> but as i pointed out in the email thread the metadat is totally wrong in that case
16:35:15 <stephenfin> I totally agree that what we do is rubbish, but we do it and people rely on it to some degree
16:35:18 <sean-k-mooney> i dont belive we need a backporable solution
16:35:24 <sean-k-mooney> or at lease im not sold on it
16:35:30 <bauzas> stephenfin: can't we consider to limit the server name to be "server" and not the whole FQDN ?
16:36:03 <sean-k-mooney> bauzas: i woudl be ok backproting that although im uncofrotabel with the transformation in general
16:36:04 <bauzas> (speaking of "server.domain")
16:36:13 <stephenfin> if we do, that's a change in behavior for users that were doing e.g. 'openstack server create instance.domain.com'
16:36:31 <sean-k-mooney> stephenfin: its not form a cloud init poitn of view
16:36:32 <bauzas> stephenfin: that's why I said I'm cool with explaning this behavioural change
16:36:45 <sean-k-mooney> there hostname will be instance in both cases
16:36:49 <bauzas> as we didn't promised anything with the servername
16:36:59 <sean-k-mooney> e.g. with or without designate
16:37:01 <bauzas> we're not breaking the contract)
16:37:03 <stephenfin> hmm, okay, so I'd assumed that would be rejected as non-backportable
16:37:25 <sean-k-mooney> what that would change is the designate dns name
16:37:33 <bauzas> well, it says "The server name."
16:37:33 <bauzas> "
16:37:36 <sean-k-mooney> currently it appending the designate default domain to the full sever name
16:37:47 <sean-k-mooney> now it would do the sane thing and append the default domain tothe hostname
16:37:57 <bauzas> yup
16:38:00 <sean-k-mooney> which woudl acutlly be resolveable via dns
16:38:04 <bauzas> yup
16:38:14 * gibi lost
16:38:17 <bauzas> and we could keep the display name to be the FQDN
16:38:23 <stephenfin> so if you create a server with 'instance.domain.com' and designate's default domain is 'domain.com', what happens?
16:38:26 <sean-k-mooney> bauzas: sure
16:38:36 <bauzas> gibi: trying to rephrase
16:38:37 <sean-k-mooney> the dispaly name could be that server name as it was passed in
16:38:57 <stephenfin> gibi: bauzas and sean-k-mooney are suggesting we drop everything after the first period, and suggesting it's backportable because we never made a guarantee about what the instance's hostname would be
16:39:07 <bauzas> this ^
16:39:10 <gibi> thanks
16:39:34 <stephenfin> so 'test-instance.domain.com' would have a hostname of 'test-instance'
16:39:42 <gibi> would this change the hostname of existing instances?
16:39:46 <bauzas> (with a big fat note explaining why we're so mean to the user)
16:39:48 <stephenfin> and 'ubuntu18.04' would have a hostname of 'ubuntu18'
16:39:48 <sean-k-mooney> gibi: no
16:40:04 <bauzas> gibi: don't
16:40:06 <sean-k-mooney> gibi: it would only change the hostname for new instances
16:40:19 <stephenfin> it shouldn't - that information is only calculated once on initial boot and stored in instance.hostname
16:40:26 <sean-k-mooney> yep
16:40:27 <gibi> ok
16:40:29 <bauzas> mustn't is the word :)
16:40:39 <sean-k-mooney> did peopel see http://lists.openstack.org/pipermail/openstack-discuss/2020-November/019137.html by the way
16:40:40 <stephenfin> I don't think we recalculate it if you e.g. change the instance name via 'openstack server set --name NAME server'
16:40:46 <stephenfin> assuming that is a command...
16:40:53 <sean-k-mooney> where i wen ther how the info is actully prented to the gust
16:41:01 * stephenfin knows you can set the name when rebuilding but isn't sure about otherwise
16:41:04 <gibi> then I'm OK to do this change as a backportable fix with a fat note
16:41:36 <bauzas> sean-k-mooney: yup, I saw your email
16:41:40 <gibi> could some of you please summarize it back to the ML to see if other will be against it?
16:41:54 <bauzas> sean-k-mooney: and that's why I think that people using periods in their server names are either foolish or smart enough
16:42:13 <gibi> sorry folks we have two other topics for today
16:42:17 <gibi> so we should move on
16:42:20 <stephenfin> yup
16:42:23 * stephenfin will summarize
16:42:27 <gibi> thanks!
16:42:28 <bauzas> I think we have a reasonable consensus here
16:42:31 <sean-k-mooney> stephenfin++
16:42:42 <gibi> rafaelweingartne: your turn
16:43:15 <rafaelweingartne> Sure. I have proposed this patch (https://review.opendev.org/c/openstack/nova/+/711113), it has some conflicts, but before resolving them
16:43:21 <rafaelweingartne> I would like to understand if we are missing something
16:43:30 <rafaelweingartne> such as an RFE, or a spec
16:44:28 <gibi> rafaelweingartne: glancing at the patch and the commit message you plan to redefine what 'usage' currntly means in the os-simple-tenant-usage API
16:45:09 <rafaelweingartne> yes, and no
16:45:30 <rafaelweingartne> we plan to externalise it. So, the default behaviour is maitained, and if somebody wants to redefine it, they could do so
16:46:06 <rafaelweingartne> To us, for instance, we were expecting something totally different from the data we get there (in the API) right now
16:46:11 <gibi> extrenalize is with a config option I assume
16:46:13 <sean-k-mooney> well if you wanted to do it differntly you can do so alredy
16:46:21 <sean-k-mooney> via consuming the instance notifocations
16:46:38 <rafaelweingartne> gibi: exactly
16:46:39 <sean-k-mooney> and building a system to track the lifecycle of the servers as you see fit
16:46:43 <rafaelweingartne> that is what the API is doing
16:46:57 <gibi> it feels like a config driver API
16:46:59 <rafaelweingartne> sean: we have other systems in-place that do that
16:47:01 <gibi> driven
16:47:12 <rafaelweingartne> gibi: yes
16:47:46 <gibi> we try to avoid config driven APIs as it makes differnt public coulds behave differently
16:47:47 <rafaelweingartne> when we saw that API, we just thought about using it to cross-check the data we already have in other monitoring and billing systems that we have in place
16:48:00 <gibi> Is os-simple-tenant-usage admin only by default?
16:48:09 <sean-k-mooney> so so this is one of the apis that i dont really fit well in nova
16:48:25 <sean-k-mooney> long term i think it would live better in an external service
16:48:34 <rafaelweingartne> probably yes
16:48:48 <sean-k-mooney> its one of the larger performance hedaces for our custoemr
16:49:13 <sean-k-mooney> this is very slow to query and result in a slow horizion as it used in the defautl overview page
16:49:26 <rafaelweingartne> but the current docs gave us the idea of providing the usage for a VM, but as I explain in the patch, it consider usage the time between the instance was created up until now or when it was destroyed
16:49:28 <sean-k-mooney> so im concerned about adding more complexity to it
16:49:35 <rafaelweingartne> I see
16:50:03 <rafaelweingartne> Right now, the API does not provide usage data as it says
16:50:17 <rafaelweingartne> at least, it is not the same understanding of usage as we have
16:50:26 <rafaelweingartne> that is why we proposed the patch
16:50:48 <gibi> rafaelweingartne: so it provides resource allocation usage but not runtime for the VM I guess
16:51:21 <rafaelweingartne> exactly
16:51:31 <sean-k-mooney> rafaelweingartne: well it does provide usage info
16:51:36 <gibi> I tend to agree with sean-k-mooney that this is not a good API for billing, and also rafaelweingartne you said that you have a different service anyhow for billing
16:51:40 <rafaelweingartne> but the documentation says usage, it does not differ between allocation and actual usage
16:51:44 <sean-k-mooney> but the defition of usage is differnt form what you are expecting
16:52:13 <rafaelweingartne> therefore, we tried to amend that
16:52:20 <gibi> I don't really think we shoudl develop os-simple-tenant-usage further (hence the name simple) but fix the doc to be precies instead
16:52:29 <sean-k-mooney> so amending that woudl be an api change and require a spec not a bugfix
16:52:35 <rafaelweingartne> well, ok that would help as well then
16:52:49 <sean-k-mooney> https://github.com/openstack/nova/blob/0e7cd9d1a95a30455e3c91916ece590454235e0e/doc/source/contributor/policies.rst#metrics-gathering
16:53:05 <sean-k-mooney> its slightly tangental but we have delcare metrics gathering as out of scope before
16:53:18 <sean-k-mooney> i tought we had a similar statement for billing but i dont see one
16:53:26 <rafaelweingartne> Ok, so no sense in creating an RFE then
16:53:47 <rafaelweingartne> well, I will create a patch to make the docs more clear then
16:54:00 <gibi> rafaelweingartne: thank you!
16:54:07 <gibi> (please file a doc bug for tracking)
16:54:42 <gibi> there is one more topic from the agenda
16:54:43 <gibi> (gibi): do we want to merge the backports for the placement-audit command? https://review.opendev.org/q/topic:%22placement-audit-backport%22
16:54:57 <gibi> It was raised during the week on #openstack-nova
16:55:11 <stephenfin> yes please
16:55:22 <gibi> does somebody remember what was the reason not to merge it?
16:55:30 <stephenfin> artom: ^ ?
16:55:53 <stephenfin> I think the concern was that it's kind of feature'y, but it's not user visible and is a huge win for operators (and us, diagnosing problems)
16:55:57 <artom> Oh, it was just super messy
16:56:03 <stephenfin> oh, even simpler than that
16:56:05 <artom> Past, like, 1 or 2 releases back
16:56:16 <bauzas> yup
16:56:20 <bauzas> this was the concern
16:56:21 <stephenfin> it was merged in stable/ussuri, right?
16:56:30 <artom> Nope, we didn't bother
16:56:38 <stephenfin> no, I mean initially
16:56:38 <artom> I used the upstrem DNM backports for CI, essentially
16:56:46 <artom> Because our RH CI is... well, it is.
16:57:00 <artom> Ah, you'd have to ask bauzas about the initial landing.
16:57:18 <bauzas> when this was merged ?
16:57:21 <bauzas> well, I'm old
16:57:29 <bauzas> ussuri IIRC
16:57:40 <sean-k-mooney> dansmith had an opion on it and i belive it was in favor of mergeing based on the operator win but i also dont recal
16:57:48 <gibi> merged in ussuri
16:57:53 <bauzas> https://review.opendev.org/c/openstack/nova/+/670112 => ussuri
16:58:08 <bauzas> sean-k-mooney: I think his opinion was meh
16:58:29 <gibi> how risky it is to backport the mess?
16:58:37 <sean-k-mooney> bauzas: basically im rembering it was not a hell no
16:58:45 <bauzas> but honestly, audit is related to allocations recreate
16:58:59 <bauzas> from mriedem
16:59:10 <gibi> I assume the effor to create the backport was already spent so only future efforts on stable due to these patches in question
16:59:16 <bauzas> one is deleting orphaned, the other is recreating them
16:59:46 <bauzas> gibi: I'd say that the maintainance is low but the initial effort is worth it pre-Train
16:59:58 <bauzas> Train backport is easy
17:00:08 <gibi> bauzas: but the initial effort is already spent as we have the patches proposed
17:00:11 <bauzas> but then artom sweated a lot with older releases
17:00:11 <stephenfin> bauzas: is or is not?
17:00:25 <bauzas> technically, we QE'd it on Queens
17:00:37 <gibi> QE?
17:00:39 <artom> bauzas, did we tho?
17:00:41 <bauzas> so the effort is already done and manually validated
17:00:47 <bauzas> against Queens
17:00:50 <artom> I'd have to double check the BZ
17:00:54 <gibi> we run out of time
17:01:02 <gibi> lets move this to #openstack-nova
17:01:03 <gibi> sorry
17:01:05 <gibi> #endmeeting
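
The two hostname strategies from the Open discussion topic can be sketched in Python as below. This is a minimal illustration under stated assumptions, not nova's code: the function names are hypothetical, the label check is a simplified reading of RFC 1123, and treating a name whose last label is all digits as "not an FQDN" is an assumption chosen so that 'test.01' and 'ubuntu18.04' behave as described in the meeting. Names without a period are returned unchanged here; nova's existing sanitisation is not modelled.

```python
import re

# Assumption: simplified RFC 1123 label rule (letters, digits, hyphens,
# 1-63 characters, no leading or trailing hyphen). Not nova's validation.
_LABEL_RE = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)", re.IGNORECASE)


def looks_like_fqdn(name: str) -> bool:
    """Hypothetical check: at least two valid labels, last one not all digits."""
    labels = name.split(".")
    if len(labels) < 2 or len(name) > 253:
        return False
    if not all(_LABEL_RE.fullmatch(label) for label in labels):
        return False
    # Assumption: an all-digit final label ('01', '04') is not a usable TLD.
    return not labels[-1].isdigit()


def hostname_split_approach(display_name: str, server_uuid: str) -> str:
    """Split behaviour: keep valid FQDNs, fall back for other dotted names."""
    if "." not in display_name:
        return display_name  # existing sanitisation is out of scope here
    if looks_like_fqdn(display_name):
        return display_name
    return f"Server-{server_uuid}"


def hostname_first_label_approach(display_name: str) -> str:
    """Alternative from the meeting: drop everything after the first period."""
    return display_name.split(".", 1)[0]


if __name__ == "__main__":
    for name in ("test.domain.com", "test.01", "ubuntu18.04"):
        print(name,
              hostname_split_approach(name, "<uuid>"),
              hostname_first_label_approach(name))
```

With these definitions, 'test.domain.com' is kept by the split approach while 'test.01' falls back to 'Server-{serverUUID}', and the first-label approach turns 'test-instance.domain.com' into 'test-instance' and 'ubuntu18.04' into 'ubuntu18', matching the examples given in the discussion.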