*** markvoelker has quit IRC | 00:00 | |
*** tosky has quit IRC | 00:01 | |
openstackgerrit | Lee Yarwood proposed openstack/nova master: compute: Use source_bdms to reset attachment_ids during LM rollback https://review.openstack.org/652800 | 00:02 |
*** lbragstad has joined #openstack-nova | 00:07 | |
*** tetsuro has joined #openstack-nova | 00:17 | |
*** jding1__ has quit IRC | 00:23 | |
*** erlon has joined #openstack-nova | 00:28 | |
openstackgerrit | zhufl proposed openstack/nova master: Remove conductor_api and _last_host_check from manager.py https://review.openstack.org/651059 | 00:53 |
openstackgerrit | Dustin Cowles proposed openstack/nova master: Introduces the openstacksdk to nova https://review.openstack.org/643664 | 01:05 |
*** eharney has quit IRC | 01:14 | |
*** keekz has joined #openstack-nova | 01:45 | |
*** threestrands has joined #openstack-nova | 01:47 | |
*** zhubx has quit IRC | 01:49 | |
*** zhubx has joined #openstack-nova | 01:49 | |
*** whoami-rajat has joined #openstack-nova | 01:53 | |
*** erlon has quit IRC | 01:58 | |
*** hongbin has joined #openstack-nova | 02:12 | |
*** lbragstad has quit IRC | 02:21 | |
*** nicolasbock has quit IRC | 02:29 | |
*** ricolin has joined #openstack-nova | 02:45 | |
*** weshay_pto is now known as weshay | 02:49 | |
*** zhubx has quit IRC | 02:49 | |
*** boxiang has joined #openstack-nova | 02:49 | |
*** cfriesen has quit IRC | 02:52 | |
*** lbragstad has joined #openstack-nova | 03:00 | |
openstackgerrit | Boxiang Zhu proposed openstack/nova-specs master: Add host and hypervisor_hostname flag to create server https://review.openstack.org/645458 | 03:01 |
openstackgerrit | Boxiang Zhu proposed openstack/nova master: Fix live migration break group policy simultaneously https://review.openstack.org/651969 | 04:05 |
*** _erlon_ has quit IRC | 04:05 | |
*** imacdonn has quit IRC | 04:06 | |
*** imacdonn has joined #openstack-nova | 04:07 | |
*** ratailor has joined #openstack-nova | 04:16 | |
*** hongbin has quit IRC | 04:40 | |
*** gyee has quit IRC | 05:08 | |
openstackgerrit | Merged openstack/nova master: Handle unsetting '[DEFAULT] dhcp_domain' https://review.openstack.org/652662 | 05:23 |
*** ratailor_ has joined #openstack-nova | 05:34 | |
*** ratailor has quit IRC | 05:36 | |
*** Luzi has joined #openstack-nova | 05:37 | |
*** ivve has joined #openstack-nova | 05:40 | |
*** lbragstad has quit IRC | 05:55 | |
*** sridharg has joined #openstack-nova | 05:58 | |
*** lpetrut has joined #openstack-nova | 06:02 | |
openstackgerrit | Boxiang Zhu proposed openstack/nova master: [WIP] Add host and hypervisor_hostname flag to create server https://review.openstack.org/645520 | 06:05 |
*** lpetrut has quit IRC | 06:16 | |
*** lpetrut has joined #openstack-nova | 06:16 | |
openstackgerrit | Merged openstack/nova master: Remove dead code https://review.openstack.org/650277 | 06:18 |
*** awestin1 has quit IRC | 06:26 | |
*** pcaruana has joined #openstack-nova | 06:26 | |
*** masayukig has quit IRC | 06:27 | |
*** seyeongkim has quit IRC | 06:27 | |
*** rpittau|afk has quit IRC | 06:28 | |
*** kmalloc has quit IRC | 06:28 | |
*** vdrok has quit IRC | 06:28 | |
*** TheJulia has quit IRC | 06:29 | |
*** kmalloc has joined #openstack-nova | 06:30 | |
*** seyeongkim has joined #openstack-nova | 06:30 | |
*** belmoreira has joined #openstack-nova | 06:31 | |
*** hogepodge has quit IRC | 06:32 | |
*** johnsom has quit IRC | 06:32 | |
*** seyeongkim has quit IRC | 06:35 | |
*** kmalloc has quit IRC | 06:40 | |
*** ralonsoh has joined #openstack-nova | 06:43 | |
*** TheJulia has joined #openstack-nova | 06:43 | |
*** markvoelker has joined #openstack-nova | 06:43 | |
*** NobodyCam has quit IRC | 06:46 | |
*** TheJulia has quit IRC | 06:47 | |
*** slaweq has joined #openstack-nova | 06:49 | |
*** TheJulia has joined #openstack-nova | 06:52 | |
*** seyeongkim has joined #openstack-nova | 06:52 | |
*** kmalloc has joined #openstack-nova | 06:53 | |
*** rpittau|afk has joined #openstack-nova | 06:53 | |
*** NobodyCam has joined #openstack-nova | 06:53 | |
*** luksky has joined #openstack-nova | 06:54 | |
*** johnsom has joined #openstack-nova | 06:55 | |
*** masayukig has joined #openstack-nova | 06:55 | |
*** awestin1 has joined #openstack-nova | 06:56 | |
*** hogepodge has joined #openstack-nova | 06:56 | |
*** ileixe has quit IRC | 06:56 | |
*** belmoreira has quit IRC | 06:56 | |
*** brinzhang has joined #openstack-nova | 06:57 | |
*** vdrok has joined #openstack-nova | 06:58 | |
*** ileixe has joined #openstack-nova | 06:59 | |
*** belmoreira has joined #openstack-nova | 07:00 | |
*** rpittau|afk is now known as rpittau | 07:06 | |
*** awalende has joined #openstack-nova | 07:08 | |
*** tesseract has joined #openstack-nova | 07:11 | |
*** ratailor__ has joined #openstack-nova | 07:20 | |
*** rcernin has quit IRC | 07:20 | |
*** ratailor_ has quit IRC | 07:22 | |
*** tosky has joined #openstack-nova | 07:23 | |
*** tssurya has joined #openstack-nova | 07:29 | |
*** helenafm has joined #openstack-nova | 07:29 | |
*** ccamacho has joined #openstack-nova | 07:48 | |
openstackgerrit | Boxiang Zhu proposed openstack/nova master: Fix live migration break group policy simultaneously https://review.openstack.org/651969 | 07:56 |
*** threestrands has quit IRC | 07:58 | |
*** johnsom has quit IRC | 08:04 | |
*** seyeongkim has quit IRC | 08:04 | |
*** johnsom has joined #openstack-nova | 08:05 | |
*** seyeongkim has joined #openstack-nova | 08:05 | |
*** rpittau_ has joined #openstack-nova | 08:05 | |
*** masayukig has quit IRC | 08:05 | |
*** masayukig_ has joined #openstack-nova | 08:05 | |
*** masayukig_ is now known as masayukig | 08:05 | |
*** rpittau_ is now known as rpittua | 08:05 | |
*** rpittua is now known as rpittau_ | 08:05 | |
*** awestin1_ has joined #openstack-nova | 08:06 | |
*** awestin1 has quit IRC | 08:06 | |
*** awestin1_ is now known as awestin1 | 08:06 | |
*** rpittau has quit IRC | 08:06 | |
*** rpittau_ is now known as rpittau | 08:06 | |
*** phasespace has joined #openstack-nova | 08:08 | |
*** priteau has joined #openstack-nova | 08:09 | |
*** ttsiouts has joined #openstack-nova | 08:21 | |
openstackgerrit | Merged openstack/nova master: Dropping the py35 testing https://review.openstack.org/643871 | 08:22 |
*** tkajinam has quit IRC | 08:24 | |
openstackgerrit | Merged openstack/nova master: Bump to hacking 1.1.0 https://review.openstack.org/651553 | 08:25 |
openstackgerrit | Merged openstack/nova master: Remove 'nova-cells' service https://review.openstack.org/651290 | 08:25 |
*** janki has joined #openstack-nova | 08:28 | |
*** ratailor__ has quit IRC | 08:30 | |
*** luksky has quit IRC | 08:36 | |
lyarwood | stephenfin: https://review.openstack.org/#/c/637224/ - would you mind taking a look at this today? | 08:37 |
lyarwood | ^ not a spec FWIW so feel free to ignore for now | 08:37 |
*** luksky has joined #openstack-nova | 08:39 | |
*** belmoreira has quit IRC | 08:41 | |
kashyap | lyarwood: Morning, hope this answers your question: https://review.openstack.org/#/c/506720/8/specs/train/approved/allow-secure-boot-for-qemu-kvm-guests.rst@287 | 08:43 |
kashyap | (Thanks for the review, so far.) | 08:43 |
*** belmoreira has joined #openstack-nova | 08:43 | |
stephenfin | lyarwood: added to the queue :) | 08:47 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove '/os-cells' REST APIs https://review.openstack.org/651291 | 08:52 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Stop handling cells v1 in '/os-hypervisors' API https://review.openstack.org/651292 | 08:52 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Stop handling cells v1 in '/os-servers' API https://review.openstack.org/651293 | 08:52 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove 'nova-manage cell' commands https://review.openstack.org/651294 | 08:52 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Stop handling cells v1 for console authentication https://review.openstack.org/651295 | 08:52 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove old-style cell v1 instance listing https://review.openstack.org/651296 | 08:52 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove 'bdm_(update_or_create|destroy)_at_top' https://review.openstack.org/651297 | 08:52 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove 'instance_fault_create_at_top' https://review.openstack.org/651298 | 08:52 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove 'instance_info_cache_update_at_top' https://review.openstack.org/651299 | 08:52 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove 'get_keypair_at_top' https://review.openstack.org/651300 | 08:52 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove 'instance_update_at_top', 'instance_destroy_at_top' https://review.openstack.org/651301 | 08:52 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove 'instance_update_from_api' https://review.openstack.org/651302 | 08:52 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Stop handling 'update_cells' on 'BandwidthUsage.create' https://review.openstack.org/651303 | 08:52 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Stop handling cells v1 for instance naming https://review.openstack.org/651304 | 08:52 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove cells code https://review.openstack.org/651306 | 08:52 |
lyarwood | kashyap: ack thanks, replied. I was more suggesting we generate the file once and make copies for each instance to use after that. | 08:55 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove 'instance_update_from_api' https://review.openstack.org/651302 | 08:56 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Stop handling 'update_cells' on 'BandwidthUsage.create' https://review.openstack.org/651303 | 08:56 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Stop handling cells v1 for instance naming https://review.openstack.org/651304 | 08:56 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove cells code https://review.openstack.org/651306 | 08:56 |
lyarwood | kashyap: and that was what led me to ask about where that template/master file could be stored. | 08:56 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Stop handling 'InstanceUnknownCell' exception https://review.openstack.org/651307 | 08:56 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove unnecessary wrapper https://review.openstack.org/651308 | 08:56 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: db: Remove cell APIs https://review.openstack.org/651309 | 08:56 |
lyarwood | kashyap: generating it for each instance is expensive IMHO, I'm just looking for ways to avoid that. | 08:56 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: conf: Remove cells v1 options, group https://review.openstack.org/651310 | 08:56 |
kashyap | lyarwood: Let me be sure I'm not misreading you: | 08:57 |
kashyap | lyarwood: You're saying to generate _once_ and reuse it? Or generate them ahead store them somewhere and _then_ use a unique VARS file per instance? | 08:57 |
kashyap | s/ahead store/ahead and store/ | 08:58 |
* kashyap reads the reply | 08:58 | |
kashyap | lyarwood: Okay, I see what you mean: generate a template once, make copies and use that when needed. | 09:00 |
lyarwood | kashyap: yup that's it | 09:00 |
kashyap | Let me see if there's any gotchas there. (And yeah, I'd like to avoid needless expensive creations, too) | 09:00 |
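A minimal Python sketch of the "generate the VARS file once, then copy it per instance" approach lyarwood and kashyap describe. The template path, the per-instance file name, and the `per_instance_vars` helper are all hypothetical. Nova does not ship this code.

```python
import shutil
from pathlib import Path

# Hypothetical location of a VARS template with the default Secure Boot keys
# already enrolled. An external tool generates it once per host; nova does not.
TEMPLATE_VARS = Path("/var/lib/nova/ovmf/OVMF_VARS.secboot.fd")


def per_instance_vars(template: Path, instance_dir: Path, instance_uuid: str) -> Path:
    """Give one instance its own writable NVRAM file by copying the template.

    The template itself is never modified. Each guest gets a private copy, so
    UEFI variable writes made by one guest cannot leak into another.
    """
    if not template.is_file():
        raise FileNotFoundError(f"VARS template missing: {template}")
    instance_dir.mkdir(parents=True, exist_ok=True)
    dest = instance_dir / f"{instance_uuid}_VARS.fd"
    if not dest.exists():
        # Copy only on first use, so the guest's own variable changes survive
        # later calls. Copying a few MiB is far cheaper than booting QEMU to
        # enroll keys for every instance.
        shutil.copyfile(template, dest)
    return dest
```

The expensive key enrollment happens once. Each instance then pays only for a file copy.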
openstackgerrit | Stephen Finucane proposed openstack/nova master: conf: Remove cells v1 options, group https://review.openstack.org/651310 | 09:06 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove conductor_api and _last_host_check from manager.py https://review.openstack.org/651059 | 09:06 |
*** luksky has quit IRC | 09:07 | |
*** bhagyashris has joined #openstack-nova | 09:09 | |
*** luksky has joined #openstack-nova | 09:19 | |
kashyap | lyarwood: Important bit - I didn't mention this at all in the spec: there's actually a config attribute in /etc/libvirt/qemu.conf that lets you specify the master 'nvram' template ("VARS" file) | 09:20 |
lyarwood | kashyap: and that can already have the keys enrolled? | 09:28 |
kashyap | lyarwood: So currently if you peek at your /etc/libvirt/qemu.conf, it has a mapping of which _CODE.fd file belongs to which _VARS.fd file | 09:30 |
kashyap | lyarwood: Checking with the libvirt folks for further usage | 09:30 |
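For reference, libvirt's /etc/libvirt/qemu.conf expresses that mapping through its `nvram` list. Each entry is a `LOADER:VARS_TEMPLATE` pair, for example `/usr/share/OVMF/OVMF_CODE.fd:/usr/share/OVMF/OVMF_VARS.fd`; exact paths vary by distribution.

The sketch below only illustrates how those entries pair a firmware image with its master VARS template. It assumes the list has already been read out of the config file, so it is not a qemu.conf parser. The function name is hypothetical.

```python
from typing import Dict, Iterable


def parse_nvram_entries(entries: Iterable[str]) -> Dict[str, str]:
    """Map each UEFI loader (CODE) image to its master VARS template.

    Each entry has the form "LOADER:VARS_TEMPLATE", as in the nvram list of
    libvirt's qemu.conf. Splitting on the first colon assumes the loader
    path itself contains no colon.
    """
    mapping: Dict[str, str] = {}
    for entry in entries:
        loader, sep, vars_template = entry.strip().partition(":")
        if not sep or not loader or not vars_template:
            raise ValueError(f"malformed nvram entry: {entry!r}")
        mapping[loader] = vars_template
    return mapping
```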
*** bhagyashris has quit IRC | 09:34 | |
*** dtantsur|afk is now known as dtantsur | 09:35 | |
*** avolkov has joined #openstack-nova | 09:53 | |
avolkov | kashyap: hi, have a couple of questions on https://review.openstack.org/#/c/506720/. Could you take a look? | 09:58 |
sean-k-mooney | ... i realised i just spent an hour reviewing patchset 8 of kashyap's spec | 09:58 |
sean-k-mooney | it's on 10 | 09:58 |
*** brinzhang has quit IRC | 10:00 | |
*** rcernin has joined #openstack-nova | 10:02 | |
kashyap | sean-k-mooney: Heya | 10:02 |
kashyap | Thanks for the review, looking | 10:02 |
kashyap | A quick point off the top of my head: the key enrollment script should _not_ be a libvirt feature | 10:02 |
kashyap | And we're _not_ "(Bash) shelling out" anything | 10:03 |
kashyap | If you read closely, I merely mentioned it to show what is going on behind the scenes. | 10:03 |
sean-k-mooney | the external tool is starting qemu and then executing commands on the instance, right? | 10:03 |
kashyap | No. | 10:03 |
*** luksky has quit IRC | 10:03 | |
kashyap | As noted in an earlier response on the change, the VARS file generator script is _not_ launching any instance. | 10:04 |
sean-k-mooney | ok but it is printing raw bytes and strings to something | 10:04 |
sean-k-mooney | https://github.com/puiterwijk/qemu-ovmf-secureboot/blob/master/ovmf-vars-generator#L104 | 10:05 |
openstackgerrit | Chris Dent proposed openstack/nova master: Delete the placement code https://review.openstack.org/618215 | 10:05 |
kashyap | sean-k-mooney: Have you ever tried Secure Boot? | 10:05 |
kashyap | sean-k-mooney: It is simply sending the escape char so that you can _run_ the tool | 10:06 |
sean-k-mooney | only on physical machines | 10:06 |
kashyap | Please rephrase. | 10:06 |
kashyap | It can work the same in a nested env. too. The script is merely launching the ISO and enrolling the default keys | 10:07 |
kashyap | The standard behaviour documented for all distribution wiki pages | 10:07 |
sean-k-mooney | i have used secure boot, but only on physical machines, and i have never replaced the keys | 10:07 |
kashyap | (Ah, okay) | 10:07 |
sean-k-mooney | kashyap: yes so its using qemu to launch a vm | 10:07 |
sean-k-mooney | it may not be the guest vm | 10:07 |
kashyap | No, it is launching a QEMU process. Not every QEMU process is a "VM" | 10:07 |
sean-k-mooney | but its still launching a vm | 10:07 |
kashyap | So what? | 10:08 |
kashyap | It is the correct thing to do. | 10:08 |
sean-k-mooney | so have you considered how this will work on a host that is configured to run guests with hugepages only | 10:08 |
sean-k-mooney | e.g. where the host has been tuned so that there is not a large amount of free memory, or on a realtime host | 10:09 |
kashyap | (The process for you to try on a guest: https://wiki.ubuntu.com/UEFI/OVMF) | 10:09 |
sean-k-mooney | what i'm concerned about is how much memory this will use and whether it can impact other guests | 10:09 |
kashyap | sean-k-mooney: That is not relevant. Because you only launch the process *once* with 256MB of RAM | 10:09 |
kashyap | And generate the VARS file, and that's it. | 10:09 |
sean-k-mooney | once per vm | 10:10 |
kashyap | sean-k-mooney: No need for concern. It's a one-time thing for a few seconds | 10:10 |
kashyap | No. | 10:10 |
kashyap | Just once. | 10:10 |
kashyap | We will use a template file. | 10:10 |
sean-k-mooney | ok so why is this not a step tripleo should be doing? | 10:10 |
sean-k-mooney | or why are we not just shipping a template file | 10:10 |
kashyap | That is a different topic. I don't want to derail the core thing at hand | 10:10 |
sean-k-mooney | or requiring the operator to do it manually as part of install. | 10:11 |
kashyap | (Two steps at a time.) | 10:11 |
kashyap | sean-k-mooney: Yes, that's what I discussed with Lee. | 10:11 |
kashyap | The first step is to document that it is an external step an operator must take | 10:11 |
kashyap | Please take a few minutes to read the conversation before. | 10:11 |
sean-k-mooney | well i don't think nova should be doing it. | 10:11 |
kashyap | (I'm repeating myself here) | 10:11 |
sean-k-mooney | ok i'll reread | 10:11 |
kashyap | sean-k-mooney: Nova won't do it. | 10:12 |
sean-k-mooney | ok then it should not be in the spec. we should have documentation for it, or at most mention it in the other deployer impact section if it's in the spec | 10:12 |
kashyap | Yes, will mention it as such | 10:13 |
sean-k-mooney | the current way the spec is laid out implies that it was a required change to nova. the actual required change will be knowing where the template file is | 10:13 |
sean-k-mooney | ok | 10:13 |
kashyap | No, it is _not_ required | 10:15 |
kashyap | I didn't say it as such | 10:15 |
sean-k-mooney | rather than copying over all my comments i'll leave them on patchset 8 | 10:15 |
kashyap | I said "consider" if we should. That is different than "required". | 10:15 |
sean-k-mooney | actually you said "Work out a way to integrate the external Python tool, | 10:18 |
sean-k-mooney | `ovmf-vars-generator` into Nova. | 10:18 |
sean-k-mooney | " | 10:18 |
sean-k-mooney | i was reviewing patchset 8 remember | 10:19 |
aspiers | hi folks | 10:19 |
sean-k-mooney | in patchset 10 you have changed it to document that the operator ... | 10:19 |
sean-k-mooney | aspiers: hi | 10:19 |
kashyap | Yes. PS 10 was there when you started reviewing :-) | 10:19 |
aspiers | Yesterday I updated the AMD SEV spec in case anyone has time to review today | 10:20 |
aspiers | Not sure how many other specs are in your queue ;-) | 10:20 |
sean-k-mooney | kashyap: yep but i clicked on it from a gerrit comment from lee and never checked it was the latest :) | 10:20 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: compute: Use source_bdms to reset attachment_ids during LM rollback https://review.openstack.org/652800 | 10:20 |
lyarwood | mdbooth: ^ you might be interested in this if you have time today | 10:20 |
aspiers | I'll be around in case there are things to discuss | 10:20 |
sean-k-mooney | aspiers: link? | 10:20 |
aspiers | sean-k-mooney: https://review.openstack.org/#/c/641994/ | 10:20 |
mdbooth | lyarwood: Shiny | 10:20 |
sean-k-mooney | yep just found it | 10:21 |
aspiers | I also put it in https://etherpad.openstack.org/p/nova-spec-review-day | 10:21 |
aspiers | sean-k-mooney: In particular I am wondering if hugepages can help with memory accounting. It might be a problem if it can only account for guest RAM and not the other QEMU memory chunks | 10:21 |
kashyap | sean-k-mooney: Can't parse this sentence: "-1 while i see if any of the comments i just left on patchset 8 have been adressed" | 10:22 |
sean-k-mooney | kashyap: i'll clear my head a bit and take it from the top later today. by the way having material content in the work items section makes it much harder to review IMO | 10:22 |
sean-k-mooney | kashyap: well i was going to -1 patchset 8 and i wanted to see if my comments were addressed in 10 | 10:23 |
sean-k-mooney | most weren't so my -1 would still stand | 10:23 |
*** ttsiouts has quit IRC | 10:23 | |
*** ttsiouts has joined #openstack-nova | 10:24 | |
kashyap | sean-k-mooney: After I read your review in full, I'll post a new iteration, and write a top-level summary addressing your comments (as needed). (Instead of more inline replies there.) | 10:24 |
kashyap | Thanks for looking. | 10:24 |
aspiers | sean-k-mooney: but TBH I am a bit confused because the QEMU memory usage breakdown in the table https://specs.openstack.org/openstack/nova-specs/specs/stein/approved/amd-sev-libvirt-support.html#proposed-change seems to contradict what Daniel Berrange has said more recently https://review.openstack.org/#/c/641994/2/specs/train/approved/amd-sev-libvirt-support.rst@167 where he suggests a fudge factor | 10:25 |
aspiers | of 1.5 | 10:25 |
sean-k-mooney | kashyap: cool sounds good. the main reason for the -1 is the implication that the nova libvirt driver ever supported secure boot before, and that it was "actually" insecure. | 10:27 |
kashyap | sean-k-mooney: Just responding to that point. Will fix it. | 10:27 |
kashyap | sean-k-mooney: I wrote that because some customers and users were _confused_ into thinking Nova supports "SB" | 10:28 |
sean-k-mooney | yes they were wrong | 10:28 |
sean-k-mooney | but the release notes were pretty clear | 10:28 |
*** ttsiouts has quit IRC | 10:28 | |
kashyap | Correct. I'll adjust the phrasing in spec | 10:28 |
sean-k-mooney | hyperv supports SB and libvirt only supports uefi boot | 10:28 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: docs: Remove references to nova-consoleauth https://review.openstack.org/652965 | 10:32 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: tests: Stop starting consoleauth in functional tests https://review.openstack.org/652966 | 10:32 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: xvp: Start using consoleauth tokens https://review.openstack.org/652967 | 10:32 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: nova-status: Remove consoleauth workaround check https://review.openstack.org/652968 | 10:32 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove nova-consoleauth https://review.openstack.org/652969 | 10:32 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: objects: Remove ConsoleAuthToken.to_dict https://review.openstack.org/652970 | 10:32 |
kashyap | sean-k-mooney: While removing "implications", I will definitely mention the point that some people can get confused. | 10:33 |
kashyap | (Wording in the new iteration coming soon.) | 10:33 |
*** luksky has joined #openstack-nova | 10:33 | |
sean-k-mooney | sure that is reasonable i guess | 10:34 |
sean-k-mooney | it should be part of the problem statement section right? | 10:35 |
*** belmoreira has quit IRC | 10:35 | |
kashyap | (Yeah) | 10:37 |
kashyap | sean-k-mooney: I see that you were commenting as you read the spec; as I mentioned the 'os_secure_boot' & 'os:secure_boot' in the Work Items | 10:37 |
kashyap | sean-k-mooney: How about I add a pointer to Work Items in the "Proposed change". | 10:37 |
sean-k-mooney | ya | 10:37 |
kashyap | (It is clearer than paragraphs of text.) | 10:38 |
sean-k-mooney | no, the work items section as i said really should be short, like 10 lines max | 10:38 |
sean-k-mooney | all that content in my opinion should be in the proposed change section | 10:38 |
*** cdent has joined #openstack-nova | 10:39 | |
sean-k-mooney | just grabbed a random spec but this is about the extent of what should be in work items https://github.com/openstack/nova-specs/blob/master/specs/rocky/approved/list-show-all-server-migration-types.rst#work-items | 10:39 |
kashyap | sean-k-mooney: Note: on your remark: about extending Hyper-V also to report traits -- I don't want to "boil the lake" (much less an ocean) with one spec | 10:43 |
kashyap | Few things at a time, with reasonable scope. | 10:43 |
kashyap | (That Hyper-V thing should be a separate item) | 10:43 |
kashyap | sean-k-mooney: Well, "10 lines max" for Work Items is a personal preference. | 10:44 |
kashyap | I'd like to follow this structure (which I think is far clearer): | 10:44 |
kashyap | - Describe the change at high-level in the "Proposed change" section | 10:44 |
kashyap | - Describe the "how" in _some_ detail in the Work Items section | 10:44 |
aspiers | In the SEV spec I've probably gone somewhere in between those two | 10:45 |
*** tbachman has quit IRC | 10:45 | |
cdent | kashyap: I agree that that format would probably make more sense, but it isn't the pattern that's been followed in the past | 10:45 |
aspiers | "Work Items" is not short, but "Proposed change" has more detail. | 10:46 |
kashyap | cdent: Nod | 10:46 |
aspiers | However that's because for SEV, the design part is much more difficult than the actual implementation | 10:46 |
* kashyap --> lunch; been plugging away non-stop at this | 10:46 | |
aspiers | We've had to debate the design for ages and consider all kinds of complex stuff, but once the design is decided, the actual coding reflected in Work Items is reasonably simple | 10:47 |
aspiers | I expect each spec will have a different ratio between design complexity and implementation complexity, therefore the two sections will differ in size accordingly | 10:47 |
lpetrut | hi guys, I work on maintaining the Hyper-v driver and I'd also be interested in reporting traits / provider inventory | 10:47 |
lpetrut | in fact, I was catching up on the specs / libvirt patches to see what needs to be done | 10:48 |
*** belmoreira has joined #openstack-nova | 10:51 | |
*** nicolasbock has joined #openstack-nova | 10:55 | |
aspiers | lpetrut: I'm far from an expert so this might be wrong, but IIUC the driver needs to implement update_provider_tree() | 10:55 |
lpetrut | thanks for the hint | 10:55 |
aspiers | lpetrut: see the libvirt driver for an example | 10:56 |
aspiers | actually this is also documented very nicely | 10:57 |
aspiers | https://docs.openstack.org/nova/latest/reference/update-provider-tree.html | 10:57 |
* aspiers feels slightly more confident that he is not spouting misinformation ;-) | 10:57 | |
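Here is a rough sketch of the `update_provider_tree()` pattern from that document: the driver reports inventory and traits for its compute node provider. `ToyProviderTree` is a hypothetical stand-in that exposes only the few calls used here; the real `nova.compute.provider_tree.ProviderTree` is richer. `SketchDriver`, the inventory numbers, and the trait are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyProviderData:
    inventory: dict = field(default_factory=dict)
    traits: set = field(default_factory=set)


class ToyProviderTree:
    """Hypothetical stand-in for nova's ProviderTree, just enough for the sketch."""

    def __init__(self, nodename):
        self._providers = {nodename: ToyProviderData()}

    def data(self, name):
        return self._providers[name]

    def update_inventory(self, name, inventory):
        self._providers[name].inventory = dict(inventory)

    def update_traits(self, name, traits):
        self._providers[name].traits = set(traits)


class SketchDriver:
    """Illustrative driver; the host facts are passed in, not discovered."""

    def __init__(self, vcpus, memory_mb, host_traits):
        self.vcpus = vcpus
        self.memory_mb = memory_mb
        self.host_traits = set(host_traits)

    def update_provider_tree(self, provider_tree, nodename, allocations=None):
        # Report what this host can offer (made-up reserved/ratio values).
        provider_tree.update_inventory(nodename, {
            "VCPU": {"total": self.vcpus, "reserved": 0, "min_unit": 1,
                     "max_unit": self.vcpus, "step_size": 1,
                     "allocation_ratio": 16.0},
            "MEMORY_MB": {"total": self.memory_mb, "reserved": 512,
                          "min_unit": 1, "max_unit": self.memory_mb,
                          "step_size": 1, "allocation_ratio": 1.5},
        })
        # Merge rather than overwrite, so traits set elsewhere (e.g. by an
        # operator) are kept and only the host's own traits are added.
        existing = provider_tree.data(nodename).traits
        provider_tree.update_traits(nodename, existing | self.host_traits)
```

The driver merges traits rather than replacing them, which mirrors the document's advice not to clobber traits that something else has set on the provider.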
lpetrut | that's great. I'm wondering how hypervisor-specific features are handled: standard traits via os-traits or custom traits | 10:59 |
aspiers | I think best practice is that drivers are supposed to only provide standard traits | 11:00 |
aspiers | So if hyperv wants to provide a new trait, you should probably try to submit that to os-traits first | 11:00 |
aspiers | That's what I did with HW_CPU_AMD_SEV | 11:00 |
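A small sketch of the naming rule behind that advice. Placement treats any trait outside the standard os-traits set as custom. Custom trait names must carry the `CUSTOM_` prefix and use only upper-case letters, digits and underscores. The helpers below are illustrative; they are not the os-traits API.

```python
import re

CUSTOM_PREFIX = "CUSTOM_"
# Custom trait names accepted by placement: CUSTOM_ plus [A-Z0-9_].
_CUSTOM_TRAIT_RE = re.compile(r"^CUSTOM_[A-Z0-9_]+$")


def to_custom_trait(label: str) -> str:
    """Turn an arbitrary label into a custom trait name (hypothetical helper)."""
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", label).upper()
    return CUSTOM_PREFIX + cleaned


def is_valid_custom_trait(trait: str) -> bool:
    """True only for names that follow the custom-trait convention."""
    return bool(_CUSTOM_TRAIT_RE.match(trait))
```

A standard trait such as `HW_CPU_AMD_SEV` has no `CUSTOM_` prefix. Getting a new one into os-traits, as aspiers did, is the route for anything meant to be shared across drivers.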
aspiers | lpetrut: You can also consider using capabilities since I managed to get that patch landed | 11:01 |
aspiers | https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#compute-capabilities-as-traits | 11:01 |
*** ttsiouts has joined #openstack-nova | 11:02 | |
aspiers | lpetrut: efried_pto and I drew a diagram which might help https://pasteboard.co/I3iqqNm.jpg | 11:02 |
aspiers | but again I knew *nothing* about any of this a few months ago, so take it with a slight pinch of salt | 11:03 |
sean-k-mooney | cdent: i find the format that kashyap proposed and uses to be much harder to read top to bottom, and while syntactically correct i don't think it semantically aligns with the spec template definitions | 11:03 |
kashyap | No one else complained so far. I don't want to rework it. | 11:03 |
kashyap | I spend a _lot_ of time carefully choosing words and maintaining a structure | 11:03 |
kashyap | I will also note in the summary comment I'm writing | 11:03 |
sean-k-mooney | kashyap: if you like i can rework it | 11:03 |
kashyap | Please no | 11:04 |
kashyap | Let's not make specs into sterile and overly rigid environments. There is already a high-level structure. Let the author's thoughts reflect in their writing | 11:04 |
sean-k-mooney | i really think this is a bad way to write specs. it's one of the few documentation forms i truly care about. | 11:04 |
kashyap | And if it is clear, it is fine. | 11:04 |
kashyap | sean-k-mooney: Again, "bad way" is an opinion | 11:04 |
kashyap | I've had several people read it, none of them had complaints | 11:04 |
sean-k-mooney | from the template "Work Items | 11:05 |
sean-k-mooney | Work items or tasks -- break the feature up into the things that need to be done to implement it. Those parts might end up being done by different people, but we're mostly trying to understand the timeline for implementation. | 11:05 |
sean-k-mooney | " | 11:05 |
*** panda is now known as panda|lunch | 11:05 | |
sean-k-mooney | the intention is to describe the order in which things need to be done to implement the feature | 11:06 |
kashyap | For now, I'm addressing the main points. | 11:06 |
kashyap | (Let's not lose sight of what is important.) | 11:06 |
kashyap | If many others complain, I will re-adjust the content a bit. | 11:06 |
kashyap | For now, I want to re-focus the attention on the actual design and content. | 11:06 |
lpetrut | aspiers: interesting, wasn't aware that driver capabilities are now reported through traits | 11:07 |
sean-k-mooney | well part of my main objection to the current formatting is you don't present your actual design at all in the proposed change section. | 11:07 |
sean-k-mooney | at least you didn't in the version i read. i'll read the latest version when it's ready | 11:07 |
aspiers | lpetrut: it's new, mriedemann originally prototyped it ~1 year ago and then I finished it off https://review.openstack.org/#/c/538498/ | 11:08 |
aspiers | lpetrut: https://docs.openstack.org/releasenotes/nova/stein.html#new-features | 11:09 |
openstackgerrit | Merged openstack/nova master: Change a log level for overwriting allocation https://review.openstack.org/649788 | 11:14 |
*** pcaruana has quit IRC | 11:17 | |
*** ccamacho has quit IRC | 11:34 | |
*** phasespace has quit IRC | 11:35 | |
*** bbowen has joined #openstack-nova | 11:36 | |
*** tetsuro has quit IRC | 11:37 | |
*** belmoreira has quit IRC | 11:41 | |
*** cdent has quit IRC | 11:42 | |
*** belmoreira has joined #openstack-nova | 11:43 | |
*** dtantsur is now known as dtantsur|brb | 11:45 | |
*** lpetrut has quit IRC | 11:52 | |
*** erlon has joined #openstack-nova | 11:59 | |
*** lpetrut has joined #openstack-nova | 12:00 | |
*** tbachman has joined #openstack-nova | 12:03 | |
*** pcaruana has joined #openstack-nova | 12:04 | |
*** NewBruce has joined #openstack-nova | 12:08 | |
NewBruce | mnaser / sean-k-mooney howdy… | 12:17 |
*** priteau has quit IRC | 12:17 | |
sean-k-mooney | NewBruce: o/ | 12:17 |
NewBruce | update on https://bugs.launchpad.net/nova/+bug/1822884 - we’ve finished upgrading all nodes on our site to rocky (service level 35), but the issue still exists... | 12:18 |
openstack | NewBruce: Error: malone bug 1822884 not found | 12:18 |
NewBruce | (we have 3 rows in the services table not on level 35, but they have deleted > 0, so i'm hoping they are ignored) | 12:18 |
NewBruce | so i'm officially out of ideas on this one…. | 12:19 |
sean-k-mooney | if deleted = id then it's a soft-deleted service record | 12:19 |
*** priteau has joined #openstack-nova | 12:19 | |
sean-k-mooney | that bug link does not work for me by the way but i remember the bug you were having | 12:19 |
NewBruce | Nova-compute deleted = 10 | 12:20 |
NewBruce | deleted = 60 | 12:20 |
NewBruce | deleted = 61 | 12:20 |
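This illustrates sean-k-mooney's point about soft deletes. A soft-deleted services row keeps its data but has `deleted` set to its own `id`, so a minimum-version check should only look at rows with `deleted == 0`. The plain-Python sketch below assumes rows are dictionaries with `binary`, `version` and `deleted` keys. It is a hypothetical helper, not nova's DB query.

```python
from typing import Iterable, Mapping, Optional


def min_live_service_version(rows: Iterable[Mapping],
                             binary: str = "nova-compute") -> Optional[int]:
    """Lowest service version among live (not soft-deleted) rows of a binary.

    Rows whose deleted column is non-zero (e.g. deleted = 10, 60, 61) are
    soft-deleted and are skipped, so stale versions cannot drag the minimum down.
    """
    versions = [row["version"] for row in rows
                if row["binary"] == binary and row["deleted"] == 0]
    return min(versions) if versions else None
```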
NewBruce | yeah i moved the bug to private for now; some logs got in there that i want to remove; but haven’t heard back from the nova admin on that yet | 12:20 |
*** gaoyan has joined #openstack-nova | 12:21 | |
*** panda|lunch is now known as panda | 12:22 | |
sean-k-mooney | ah ok | 12:22 |
NewBruce | cleaned up and public again; i can mail you the log files if you are interested | 12:22 |
NewBruce | https://bugs.launchpad.net/nova/+bug/1822884 | 12:22 |
openstack | Launchpad bug 1822884 in OpenStack Compute (nova) "live migration fails due to port binding duplicate key entry in post_live_migrate" [Undecided,New] | 12:22 |
*** cdent has joined #openstack-nova | 12:23 | |
NewBruce | posted some more to the launchpad now | 12:23 |
*** Luzi has quit IRC | 12:24 | |
sean-k-mooney | so at this point all compute nodes are running rocky. the neutron control plane is entirely on osa | 12:24 |
sean-k-mooney | and all compute services are using the same compute service version | 12:25 |
NewBruce | correct | 12:25 |
kashyap | lyarwood: sean-k-mooney: I forgot to mention: the key enrollment tool is already part of the EDK2 package (at least in Fedora), as a sub-RPM, look for: 'edk2-qosb' | 12:27 |
sean-k-mooney | honestly i don't know either. cold migration should work as a last resort, but i don't quite understand why this would happen | 12:27 |
kashyap | (Updating the spec) | 12:27 |
sean-k-mooney | ok yes that is good to add to the reference section | 12:28 |
kashyap | Yes, adding, actually | 12:28 |
kashyap | I totally forgot that we did it some 8 months ago. My memory is like a goldfish's | 12:28 |
*** belmoreira has quit IRC | 12:28 | |
sean-k-mooney | it will be useful for projects like kolla that need to figure out what to install | 12:28 |
*** belmoreira has joined #openstack-nova | 12:33 | |
*** Luzi has joined #openstack-nova | 12:46 | |
openstackgerrit | Merged openstack/nova master: conf: Undeprecate and move the 'dhcp_domain' option https://review.openstack.org/480616 | 12:48 |
stephenfin | sean-k-mooney: We're going to have to make a call on https://review.openstack.org/#/c/555081/23/specs/train/approved/cpu-resources.rst@180 | 12:49 |
bauzas | stephenfin: I still need to provide my comments | 12:49 |
stephenfin | sean-k-mooney: On one hand, I totally get where you're coming from and agree that it's a valid concern | 12:49 |
stephenfin | On the other though, if we don't use 'vcpu_pin_set' to populate this stuff, the user will be left in a state where they can no longer boot any instances with 'hw:cpu_policy' configured until they do additional configuration on each host | 12:50 |
sean-k-mooney | yes that is one of the bits I'm most uncomfortable with. | 12:50 |
stephenfin | That's as much a breaking change as anything else we've discussed | 12:50 |
*** liuyulong has joined #openstack-nova | 12:50 | |
stephenfin | bauzas: Ack. Just focussing on that one point atm | 12:51 |
sean-k-mooney | yes it's not simple. if we use the totally new config options i suggest, we could have deployers preset them and possibly provide an upgrade check in nova-status? | 12:51 |
sean-k-mooney | vcpu_pin_set was never intended to be "pinned cpus", although it was assumed to be that in some cases | 12:52 |
sean-k-mooney | so i don't know what the right way forward is | 12:52 |
kashyap | lyarwood: Okay, aside: the 'nvram' stanza in /etc/libvirt/qemu.conf will be deprecated. So we can ignore that | 12:52 |
stephenfin | Hmm, neither do I :\ | 12:52 |
stephenfin | I'm also thinking we need to kill 'hw:emulator_threads_policy=isolate' | 12:53 |
sean-k-mooney | i kind of feel we should require the dedicated_cpu_set to be in the config before enabling any of this | 12:53 |
sean-k-mooney | stephenfin: ya, that one i was originally not sure about, but i think i agree | 12:54 |
stephenfin | Because I can't think of any way to account for the extra core without mangling the request | 12:54 |
sean-k-mooney | hw:emulator_threads_policy=share should be sufficient and more efficient | 12:54 |
sean-k-mooney | well the accounting is trivial | 12:54 |
stephenfin | I really, really wish we'd overloaded 'isolate' instead of 'share' to do this offload to 'shared_cpu_set' | 12:54 |
stephenfin | because I don't see a reason why anyone would want to use 'isolate' as it's implemented | 12:55 |
sean-k-mooney | we already generate the resources:VCPU part of the placement request from flavor.vcpus and add 1 core to it for isolate | 12:55 |
sean-k-mooney | we can easily just add a PCPU instead | 12:55 |
sean-k-mooney | well personally i would like to kill the option entirely. | 12:56 |
stephenfin | True, but rewriting the request like that feels rotten | 12:56 |
stephenfin | So would I | 12:56 |
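The accounting sean-k-mooney calls trivial (keep adding one core for `isolate`, but request it as a PCPU rather than a VCPU) can be sketched as below. `build_cpu_resources` is a hypothetical helper, not nova's request-building code, and it assumes the PCPU/VCPU resource classes proposed in the cpu-resources spec.

```python
def build_cpu_resources(flavor_vcpus, cpu_policy="shared", emulator_threads_policy=None):
    """Hypothetical sketch: map a flavor's CPU settings to placement resource amounts.

    Not nova's actual code; the PCPU/VCPU split follows the cpu-resources
    spec under discussion.
    """
    if cpu_policy == "dedicated":
        pcpus = flavor_vcpus
        if emulator_threads_policy == "isolate":
            # Account for the isolated emulator-thread core as one extra PCPU
            # instead of padding the VCPU amount.
            pcpus += 1
        return {"PCPU": pcpus}
    # Floating (shared) instances only consume VCPU inventory.
    return {"VCPU": flavor_vcpus}


print(build_cpu_resources(4, "dedicated", "isolate"))  # {'PCPU': 5}
```

This is exactly the "rewriting the request" stephenfin objects to: the flavor says 4 vCPUs, but placement is asked for 5 PCPUs.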
sean-k-mooney | my preference would be make the emulator pinning work for floating cpus too | 12:56 |
sean-k-mooney | and have a separate pin set for it | 12:56 |
stephenfin | What would the alternative be? Always offload emulator threads if 'cpu_shared_set' is defined? | 12:57 |
stephenfin | Why? | 12:57 |
stephenfin | The reason you're offloading this stuff is performance, no? | 12:57 |
stephenfin | More specifically, real-time performance | 12:57 |
stephenfin | If you're using floating CPUs, you've already given up on that | 12:57 |
sean-k-mooney | yes but performance still matters for non-pinned instances | 12:57 |
sean-k-mooney | the main reason however is to simplify the code | 12:57 |
sean-k-mooney | and the config | 12:57 |
sean-k-mooney | basically if you define emulator_pin_set in the libvirt section we will always use it | 12:58 |
sean-k-mooney | for all vms, and if not then it overlaps with the vm cores | 12:58 |
stephenfin | I was thinking a similar thing but only for instances with PCPUs and only if 'cpu_shared_set' is defined | 12:58 |
stephenfin | (As opposed to all instances and only if 'emulator_pin_set' is defined) | 12:59 |
sean-k-mooney | in either case, if emulator_pin_set does not overlap with the dedicated and floating pin sets then you don't need any accounting in placement | 12:59 |
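A minimal sketch of the emulator-thread behaviour sean-k-mooney proposes: if an emulator pin set is configured, every instance's emulator threads use it; otherwise they float over the instance's own cores. `emulator_pin_set`, `pinned_cpu_set` and `floating_cpu_set` are the option names floated in the discussion, not existing nova options, and both functions are hypothetical.

```python
def emulator_thread_cpus(instance_cpus, emulator_pin_set=None):
    """Host CPUs the emulator threads may run on under the proposal.

    If the (proposed, hypothetical) emulator pin set is configured it is used
    for every instance; otherwise the emulator threads overlap the
    instance's own cores.
    """
    if emulator_pin_set:
        return set(emulator_pin_set)
    return set(instance_cpus)


def needs_placement_accounting(emulator_pin_set, pinned_cpu_set, floating_cpu_set):
    """Emulator cores need tracking only if they overlap CPUs reported to placement."""
    reported = set(pinned_cpu_set) | set(floating_cpu_set)
    return bool(set(emulator_pin_set) & reported)


print(emulator_thread_cpus({4, 5}, emulator_pin_set={0, 1}))  # {0, 1}
print(needs_placement_accounting({0, 1}, {4, 5, 6, 7}, {2, 3}))  # False
```

The second function captures the point of the last message: a disjoint emulator set never competes with inventory reported to placement, so nothing extra has to be requested.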
*** belmoreira has quit IRC | 12:59 | |
sean-k-mooney | i would do it for all instances as i would like to try and share more code between pinned and floating instances | 12:59 |
*** tbachman has quit IRC | 12:59 | |
stephenfin | that makes sense | 13:00 |
sean-k-mooney | like we do with numa, i would like to soft pin all vms to a floating_pin_set so you can properly reserve cores on the host for OS or vswitch use | 13:00 |
stephenfin | Well, I think we'll be doing that going forward regardless | 13:00 |
sean-k-mooney | we need to do that anyway when we mix pinned and floating vms on the same host | 13:00 |
stephenfin | Yeah | 13:01 |
sean-k-mooney | yep | 13:01 |
stephenfin | Adding another configuration option would be yet another breaking change though | 13:01 |
stephenfin | So many breaking changes | 13:01 |
sean-k-mooney | it does not need to be | 13:03 |
sean-k-mooney | https://review.openstack.org/#/c/555081/23/specs/train/approved/cpu-resources.rst@157 | 13:03 |
sean-k-mooney | [compute] | 13:03 |
sean-k-mooney | pinned_cpu_set | 13:03 |
sean-k-mooney | floating_cpu_set | 13:03 |
sean-k-mooney | [libvirt] | 13:03 |
sean-k-mooney | emulator_cpu_set | 13:03 |
sean-k-mooney | i suggested deprecating cpu_shared_set and renaming/moving it to [libvirt]/emulator_cpu_set | 13:04 |
sean-k-mooney | so it will not need a config change initially, although they will get a deprecation warning | 13:04 |
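The rename sean-k-mooney suggests (read `[libvirt]/emulator_cpu_set`, fall back to the deprecated `[compute]/cpu_shared_set` with a warning) can be sketched with the standard library as below. `emulator_cpu_set` is only a proposal here, and in nova itself this would more likely be expressed through oslo.config's `deprecated_name`/`deprecated_group` on the option; `parse_cpu_set` is a simplified hypothetical parser that ignores the `^N` exclusion syntax.

```python
import configparser
import warnings


def parse_cpu_set(spec):
    """Parse a simple CPU list such as '0-3,8' into a set of ints (no '^' exclusions)."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def get_emulator_cpu_set(conf):
    """Prefer the proposed new option; fall back to the old one with a warning."""
    if conf.has_option("libvirt", "emulator_cpu_set"):
        return parse_cpu_set(conf.get("libvirt", "emulator_cpu_set"))
    if conf.has_option("compute", "cpu_shared_set"):
        warnings.warn(
            "[compute]/cpu_shared_set is deprecated; use [libvirt]/emulator_cpu_set",
            DeprecationWarning,
            stacklevel=2,
        )
        return parse_cpu_set(conf.get("compute", "cpu_shared_set"))
    return None


conf = configparser.ConfigParser()
conf.read_string("""
[compute]
cpu_shared_set = 0-1,4
""")
emulator_cpus = get_emulator_cpu_set(conf)  # old option still honoured
```

An existing deployment keeps working unchanged and only sees the deprecation warning, which is the "no config change initially" property described above.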
stephenfin | Hmm, I need to think about this | 13:06 |
NewBruce | sean-k-mooney / mnaser … we seem to actually have made this worse - RDO - RDO which was working previously is now also broken | 13:07 |
*** pcaruana has quit IRC | 13:07 | |
sean-k-mooney | is it broken the same way? | 13:08 |
NewBruce | seems to be | 13:08 |
NewBruce | same 500 internal error; same duplicate port entries in ml2_port_bindings | 13:09 |
sean-k-mooney | and all RDO nodes have the same code deployed and config? | 13:09 |
*** lbragstad has joined #openstack-nova | 13:10 | |
sean-k-mooney | this is beginning to feel like it might be related to the specific configuration of some of the compute nodes rather than a global configuration issue, but that is a feeling rather than anything based on fact | 13:10 |
NewBruce | not identical (due to the time span taken to update), we have mostly 18.2.0, but some 18.1.0 and an 18.0.3 | 13:11 |
NewBruce | we did try rolling an 18.2.0 back to 18.1.0 - but that didn’t seem to help | 13:11 |
sean-k-mooney | can you try migrating between two nodes of the same version on the rdo side for each of the 3 versions you have deployed? | 13:12 |
NewBruce | yeah, i agree it's starting to feel like something odd (random)… but as you say, feeling - not fact | 13:12 |
NewBruce | Yeah, will do now | 13:12 |
sean-k-mooney | perhaps we can narrow it down to a version and then do a git bisect for the changes | 13:12 |
mnaser | hmm | 13:17 |
mnaser | this is strange | 13:17 |
mnaser | NewBruce: so that cloud is 100% rocky at this point? | 13:17 |
*** jmlowe has quit IRC | 13:19 | |
*** ccamacho has joined #openstack-nova | 13:20 | |
*** mriedem has joined #openstack-nova | 13:21 | |
*** eharney has joined #openstack-nova | 13:22 | |
*** yaawang has quit IRC | 13:22 | |
NewBruce | mnaser yeah, 100% rocky | 13:22 |
NewBruce | … in all senses of the word :D | 13:23 |
*** yaawang has joined #openstack-nova | 13:23 | |
*** dtantsur|brb is now known as dtantsur | 13:24 | |
sean-k-mooney | this is the delta in terms of patches https://github.com/openstack/nova/compare/18.0.3...18.1.0 | 13:25 |
mnaser | NewBruce: curl -H "X-Auth-Token: `openstack token issue -c id -f value`" http://network-sjc1.vexxhost.us/v2.0/extensions | python -mjson.tool | grep binding-extended | 13:25 |
mnaser | change network-sjc1 with your neutron endpoint | 13:26 |
mnaser | can you run that several times and see if there is ever a time that it doesn't return binding-extended? | 13:26 |
sean-k-mooney | i wonder if it could be related to https://github.com/openstack/nova/commit/4a12c9c298913f99570f2f8e93500db687e98dc9 | 13:26 |
sean-k-mooney | hum although that is only on revert | 13:28 |
NewBruce | so an RDO 18.1.0 -> RDO 18.1.0 does the same thing; fails on duplicate ports | 13:29 |
NewBruce | mnaser : curl -s -H "X-Auth-Token: $OS_TOKEN" https://kna1.citycloud.com:9696/v2.0/extensions/binding-extended | python -m json.tool | 13:30 |
NewBruce | { | 13:30 |
NewBruce | "extension": { | 13:30 |
NewBruce | "alias": "binding-extended", | 13:30 |
NewBruce | "description": "Expose port bindings of a virtual port to external application", | 13:30 |
NewBruce | "links": [], | 13:30 |
NewBruce | "name": "Port Bindings Extended", | 13:30 |
NewBruce | "updated": "2017-07-17T10:00:00-00:00" | 13:30 |
NewBruce | } | 13:30 |
NewBruce | } | 13:30 |
*** egonzalez has quit IRC | 13:30 | |
*** egonzalez has joined #openstack-nova | 13:30 | |
*** efried_pto is now known as efried | 13:31 | |
NewBruce | (#SoccerDad duties - back online in about 30 min) | 13:31 |
mnaser | NewBruce: but did you hit it many different times, and it always gave a response? | 13:31 |
mnaser | I'm wondering if one backend might be acting weird | 13:31 |
*** boxiang has quit IRC | 13:32 | |
*** pcaruana has joined #openstack-nova | 13:33 | |
*** boxiang has joined #openstack-nova | 13:33 | |
sean-k-mooney | mnaser: so you're suggesting running it with "watch -d -n 1 ..." and seeing if it changes? | 13:33 |
mnaser | sean-k-mooney: well just knowing that there is probably multiple network servers, maybe keep hitting it because *maybe* one of them is responding without that extension | 13:34 |
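mnaser's check (does every neutron API backend behind the load balancer advertise `binding-extended`?) can be scripted once the response bodies are collected from each backend. The sketch below only does the parsing; how the bodies are fetched is left out, and the backend names are made up. It assumes the list endpoint `GET /v2.0/extensions` shape `{"extensions": [{"alias": ...}, ...]}`, unlike the single-extension lookup NewBruce ran, which returns `{"extension": {...}}`.

```python
import json


def has_extension(body, alias):
    """True if a neutron GET /v2.0/extensions response body lists the alias."""
    data = json.loads(body)
    return any(ext.get("alias") == alias for ext in data.get("extensions", []))


def backends_missing(bodies_by_backend, alias="binding-extended"):
    """Names of backends whose extension list lacks the alias, sorted."""
    return sorted(
        name for name, body in bodies_by_backend.items()
        if not has_extension(body, alias)
    )


# Hypothetical responses collected from two API nodes, bypassing the load balancer.
responses = {
    "neutron-api-1": '{"extensions": [{"alias": "binding-extended"}]}',
    "neutron-api-2": '{"extensions": [{"alias": "router"}]}',
}
print(backends_missing(responses))  # ['neutron-api-2']
```

A single odd backend would explain nova intermittently falling back to the old port-binding flow, which is the theory being eliminated here.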
mnaser | https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/tasks/live_migrate.py#L282 just trying to eliminate all those things here | 13:34 |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/stein: Change a log level for overwriting allocation https://review.openstack.org/652994 | 13:34 |
sean-k-mooney | mnaser: i think i asked NewBruce to check that against each of the neutron api endpoints directly in the past, bypassing the load balancer | 13:35 |
*** belmoreira has joined #openstack-nova | 13:35 | |
sean-k-mooney | it would cause this issue, especially if there was a network partition or some other factor caused different nodes to prefer different api nodes | 13:35 |
sean-k-mooney | but if that was the case we would expect the same results for osa to osa too, right? | 13:36 |
mnaser | that is true as well | 13:38 |
*** jmlowe has joined #openstack-nova | 13:38 | |
mnaser | sean-k-mooney: but this is one of those issues where you're just desperately trying to do whatever works lol | 13:38 |
efried | dansmith: When you get a chance, would you please do the channel topic thing with https://etherpad.openstack.org/p/nova-spec-review-day ? | 13:38 |
openstackgerrit | Merged openstack/nova-specs master: Add host and hypervisor_hostname flag to create server https://review.openstack.org/645458 | 13:39 |
*** ChanServ sets mode: +o dansmith | 13:39 | |
sean-k-mooney | ya i know. from what we can tell it should be working, if it weren't for the evidence that it's not. that would be my response if someone asked "should this work?" | 13:39 |
*** dansmith changes topic to "Spec review day: https://etherpad.openstack.org/p/nova-spec-review-day -- Current runways: https://etherpad.openstack.org/p/nova-runways-train -- This channel is for Nova development. For support of Nova deployments, please use #openstack." | 13:39 | |
*** tbachman has joined #openstack-nova | 13:41 | |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/queens: Add missing libvirt exception during device detach https://review.openstack.org/651639 | 13:43 |
openstackgerrit | sean mooney proposed openstack/os-traits master: add libvirt image metadata traits https://review.openstack.org/652996 | 13:43 |
*** cfriesen has joined #openstack-nova | 13:47 | |
bauzas | mriedem: sorry, finished earlier yesterday | 13:52 |
bauzas | mriedem: could you please ping me again which stable changes I could review ? | 13:52 |
mriedem | bauzas: at this point, probably queens https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:stable/queens | 13:53 |
bauzas | ack | 13:53 |
mriedem | bauzas: and https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:stable/pike+topic:bug/1669054 for pike | 13:53 |
dansmith | efried: as a placement and request groups expert, I hope you can look at this one and provide a short path to being able to make request groups expressive enough that we don't have to keep adding legacy stuff like this: https://review.openstack.org/#/c/650963/2 | 13:57 |
efried | dansmith: on my list. I had suggested https://review.openstack.org/#/c/650476/ which is generating lots of discussion both on the spec and on the ML http://lists.openstack.org/pipermail/openstack-discuss/2019-April/thread.html#4782 | 13:59 |
dansmith | efried: ack, glad to see that | 13:59 |
efried | unfortunately it's not an easy problem. | 13:59 |
dansmith | stephenfin: bauzas: I trust you will be all over that ^ spec to make sure it will solve your problems | 13:59 |
efried | which is why we've been talking about it for a couple years | 14:00 |
*** awaugama has joined #openstack-nova | 14:00 | |
*** cdent has quit IRC | 14:01 | |
mriedem | melwitt: are consoles/console pools in the db model a xen only thing? | 14:02 |
mriedem | ah yes, "The nova-console service is deprecated as it is Xen specific" | 14:04 |
*** amodi has joined #openstack-nova | 14:05 | |
melwitt | cool. I don't recall the term "console pools" off the top of my head | 14:05 |
*** sapd1_x has joined #openstack-nova | 14:06 | |
NewBruce | mnaser have run it in a loop and no changes | 14:07 |
*** markvoelker has quit IRC | 14:07 | |
mriedem | melwitt: ok, reason being https://review.openstack.org/#/c/570202/7/nova/db/sqlalchemy/api.py@1794 | 14:08 |
*** Luzi has quit IRC | 14:09 | |
mnaser | hmm | 14:09 |
mriedem | looks like we likely leak consoles records when a server is deleted, | 14:09 |
mriedem | but that's super latent if so, and xen specific, | 14:09 |
mriedem | so i care little | 14:10 |
mriedem | Theo could open a bug for it but i wouldn't hold this up for it | 14:10 |
melwitt | haha, nice | 14:10 |
mnaser | is migrate_data stored in a db anywhere or is it thrown into the rpc request | 14:10 |
mriedem | mnaser: it's only rpc, not persisted | 14:10 |
melwitt | we deprecated the os-console service kind of recently https://review.openstack.org/610075 | 14:12 |
mriedem | stephenfin: the bottom of your cells v1 removal series needs to be fixed | 14:12 |
stephenfin | mriedem: I just saw :( | 14:13 |
melwitt | oh yeah, you already said that | 14:13 |
stephenfin | Looking at it now | 14:13 |
mriedem | melwitt: yeah i was looking at that - that deprecates the service, | 14:13 |
mriedem | but not the apis | 14:13 |
melwitt | ah | 14:13 |
*** belmoreira has quit IRC | 14:13 | |
mriedem | so if we were to remove the nova-console service the os-consoles api would be broken, | 14:13 |
mriedem | so we can't really deprecate the api with a microversion and expect it to still work on 2.1 w/o the service itself, | 14:13 |
mriedem | which would mean we either (1) leave the nova-console service forever or (2) obsolete the api and drop the service, like we're doing with nova-cells and nova-network | 14:14 |
mriedem | maybe ptg topic fodder - but it'd be nice to have xen people there to actually tell us if they need the thing anymore, BobBall seemed to suggest in the ML that we didn't | 14:14 |
mriedem | i don't know who works on or uses xenapi in nova anymore though | 14:15 |
NewBruce | hey mriedem seen the latest in our lovely migration issues ?! | 14:15 |
openstackgerrit | Stephen Finucane proposed openstack/nova-specs master: Standardize CPU resource tracking https://review.openstack.org/555081 | 14:15 |
mriedem | NewBruce: not really - can't get the neutron api extension to go away? | 14:15 |
mriedem | melwitt: i'll just add an item in the ptg etherpad | 14:16 |
*** belmoreira has joined #openstack-nova | 14:16 | |
NewBruce | we upgraded the entire site to rocky, and now RDO - RDO fails in addition to RDO - OSA! | 14:16 |
NewBruce | (updated the launchpad) | 14:16 |
*** awalende has quit IRC | 14:17 | |
*** awalende has joined #openstack-nova | 14:18 | |
mriedem | :/ | 14:18 |
*** priteau has quit IRC | 14:19 | |
mriedem | we definitely have multinode jobs running live migration in rocky that uses the new flow | 14:19 |
melwitt | yeah, I had thought BobBall gave the go ahead to remove it but agreed, I don't know of any people currently using xenapi. actually, I thought I've seen some patches proposed over there fairly recently. /me looks | 14:19 |
*** gaoyan has quit IRC | 14:20 | |
melwitt | oh, nvm, it was coreycb fixing an openssl handling thing https://review.openstack.org/635533 | 14:21 |
*** awalende_ has joined #openstack-nova | 14:21 | |
*** awalende has quit IRC | 14:22 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove '/os-cells' REST APIs https://review.openstack.org/651291 | 14:22 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Stop handling cells v1 in '/os-hypervisors' API https://review.openstack.org/651292 | 14:22 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Stop handling cells v1 in '/os-servers' API https://review.openstack.org/651293 | 14:22 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove 'nova-manage cell' commands https://review.openstack.org/651294 | 14:22 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Stop handling cells v1 for console authentication https://review.openstack.org/651295 | 14:22 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove old-style cell v1 instance listing https://review.openstack.org/651296 | 14:22 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove 'bdm_(update_or_create|destroy)_at_top' https://review.openstack.org/651297 | 14:23 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove 'instance_fault_create_at_top' https://review.openstack.org/651298 | 14:23 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove 'instance_info_cache_update_at_top' https://review.openstack.org/651299 | 14:23 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove 'get_keypair_at_top' https://review.openstack.org/651300 | 14:23 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove 'instance_update_at_top', 'instance_destroy_at_top' https://review.openstack.org/651301 | 14:23 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove 'instance_update_from_api' https://review.openstack.org/651302 | 14:23 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Stop handling 'update_cells' on 'BandwidthUsage.create' https://review.openstack.org/651303 | 14:23 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Stop handling cells v1 for instance naming https://review.openstack.org/651304 | 14:23 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove cells code https://review.openstack.org/651306 | 14:23 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Stop handling 'InstanceUnknownCell' exception https://review.openstack.org/651307 | 14:23 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove unnecessary wrapper https://review.openstack.org/651308 | 14:23 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: db: Remove cell APIs https://review.openstack.org/651309 | 14:23 |
*** ttsiouts has quit IRC | 14:23 | |
*** ttsiouts has joined #openstack-nova | 14:24 | |
mriedem | NewBruce: and you've checked to make sure there aren't any lingering nova-compute services table records in the rdo site database with an older version causing issues? | 14:25 |
stephenfin | melwitt: This is something we could do later this cycle, right? https://review.openstack.org/#/q/topic:bp/remove-consoleauth | 14:25 |
mriedem | NewBruce: and unpinned rpc versions for [upgrade_levels]/compute ? | 14:25 |
*** awalende_ has quit IRC | 14:25 | |
NewBruce | mriedem posted the contents of the services table to the launchpad; we have 3 entries which are not 35 but all have deleted status > 0 | 14:25 |
stephenfin | melwitt: Also, this has the feel of a bugfix to me. Thoughts? https://review.openstack.org/#/c/652967/ | 14:25 |
*** rcernin has quit IRC | 14:27 | |
mriedem | stephenfin: the consoleauth thing has been really confusing for people, in rocky at least, | 14:27 |
*** janki has quit IRC | 14:27 | |
mriedem | if there is some migration that people need to do and we can check that in some automated way, to say "you shouldn't upgrade to train" that would be good before ripping the service out | 14:27 |
*** dtantsur has quit IRC | 14:27 | |
*** ttsiouts has quit IRC | 14:27 | |
*** dpawlik has quit IRC | 14:28 | |
stephenfin | mriedem: You mean more than the nova-status check we already have? | 14:28 |
*** ttsiouts has joined #openstack-nova | 14:28 | |
mriedem | stephenfin: also note that there is a rest api that relies on consoleauth https://developer.openstack.org/api-ref/compute/?expanded=create-remote-console-detail#create-remote-console | 14:28 |
stephenfin | Oh, I thought that was able to use the DB tokens too? I must have misread it | 14:29 |
mriedem | maybe i'm thinking of https://developer.openstack.org/api-ref/compute/?expanded=create-remote-console-detail#show-console-connection-information | 14:29 |
mriedem | ok remote-consoles is something else, nvm | 14:30 |
stephenfin | mriedem: I had a look at that and thought it was also able to use the DB tokens | 14:30 |
mriedem | i think you are https://review.openstack.org/#/c/652969/1/nova/api/openstack/compute/console_auth_tokens.py | 14:30 |
*** mlavalle has joined #openstack-nova | 14:30 | |
bauzas | err, I was about to update my thoughts on the cpu-resources spec :) | 14:31 |
bauzas | but then I saw a new PS :) | 14:31 |
bauzas | dammit | 14:31 |
stephenfin | bauzas: Update them away :) I'll apply them retrospectively | 14:31 |
*** dtantsur has joined #openstack-nova | 14:31 | |
stephenfin | It hasn't changed significantly outside of changing how we use Flavour.vcpus | 14:31 |
bauzas | stephenfin: I'm trying to wrap my head around the upgrade impact | 14:31 |
melwitt | stephenfin: hm, yeah, I guess that xvp thing was missed.. so it feels more like a bug fix | 14:32 |
stephenfin | melwitt: Cool. I can drag that out so. I imagine no one has spotted it because no one is using it (It's Xen-specific and BobBall said we could kill it) | 14:32 |
melwitt | yeah, that's what I'm thinking too as far as it not being noticed | 14:33 |
stephenfin | Alas, that was only deprecated last cycle so I guess we can't kill that too this cycle | 14:33 |
stephenfin | (Removing nova-cells, nova-network, nova-consoleauth, nova-xvpvncproxy and the placement code in one fell swoop/cycle sure would make for interesting release note reading) | 14:34 |
NewBruce | mriedem we have upgrade_levels = auto across the site, but since everything is service level 35 that shouldn't be an issue, right? | 14:34 |
*** gaoyan has joined #openstack-nova | 14:35 | |
*** lpetrut has quit IRC | 14:36 | |
*** mdbooth has quit IRC | 14:37 | |
*** priteau has joined #openstack-nova | 14:37 | |
mriedem | NewBruce: on a call and i'd need to load all of this context back into my head | 14:38 |
mriedem | but you're talking about this check https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/tasks/live_migrate.py#L51 | 14:39 |
*** mdbooth has joined #openstack-nova | 14:40 | |
mriedem | NewBruce: "we have 3 entries which are not 35 but all have deleted status > 0" yeah those shouldn't be included in the min version check | 14:41 |
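Why the soft-deleted rows do not matter: the minimum-version lookup only considers rows with `deleted = 0`, and a soft-deleted row has `deleted` set to its own id. The SQLite sketch below roughly mirrors that filter; it is not nova's actual query (which is built with SQLAlchemy and may apply further filters), and the old version numbers on the deleted rows are invented for illustration.

```python
import sqlite3


def min_service_version(conn, binary="nova-compute"):
    # Soft-deleted rows have deleted = id (non-zero), so only deleted = 0 counts.
    row = conn.execute(
        'SELECT MIN(version) FROM services WHERE "binary" = ? AND deleted = 0',
        (binary,),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE services (id INTEGER PRIMARY KEY, "binary" TEXT, '
    "version INTEGER, deleted INTEGER)"
)
conn.executemany(
    'INSERT INTO services (id, "binary", version, deleted) VALUES (?, ?, ?, ?)',
    [
        (1, "nova-compute", 35, 0),
        (2, "nova-compute", 35, 0),
        (10, "nova-compute", 30, 10),  # soft-deleted; illustrative old version
        (60, "nova-compute", 30, 60),  # soft-deleted; illustrative old version
    ],
)
print(min_service_version(conn))  # 35
```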
NewBruce | mriedem ok - ive updated the launchpad, i might have also mailed you some logs at some point | 14:43 |
mnaser | oh hmm now that I think about it | 14:44 |
NewBruce | but ping me when you're off the call… i will test cold migrate tonight so that we have that as an option; the fact that it's failing on RDO - RDO after the rocky upgrade is at least encouraging that the problem seems to be on the rocky side | 14:44 |
mriedem | s/rocky/rdo/? | 14:44 |
mnaser | could it be possible that not all services have been restarted since everything is at rocky? | 14:45 |
mriedem | osa -> osa live migration is fine right? | 14:45 |
mnaser | and the max rpc version is not rocky? | 14:45 |
NewBruce | mnaser very possible | 14:45 |
NewBruce | haven't tested OSA - OSA yet, on my todo | 14:45 |
*** mdbooth_ has joined #openstack-nova | 14:45 | |
mnaser | aren't you supposed to "SIGHUP" (which is broken right now) to get new versions of rpc stuff? | 14:45 |
mriedem | yes | 14:45 |
mnaser | so really a restart | 14:45 |
mnaser | so is it possible the conductors haven't been restarted and are running older code? | 14:45 |
mriedem | does rdo sighup rather than full restart the services on upgrade? | 14:45 |
mnaser | OSA used to do sighup | 14:46 |
mnaser | till we found that bug | 14:46 |
mriedem | yeah so maybe rdo still does as well | 14:46 |
mriedem | i was never able to recreate one of the theories about the break either https://review.openstack.org/#/c/649464/ | 14:47 |
*** mdbooth has quit IRC | 14:48 | |
openstackgerrit | Merged openstack/nova stable/rocky: Temporarily mutate migration object in finish_revert_resize https://review.openstack.org/648691 | 14:48 |
*** dklyle has quit IRC | 14:49 | |
*** luksky has quit IRC | 14:50 | |
*** dklyle has joined #openstack-nova | 14:50 | |
*** mdbooth_ has quit IRC | 14:50 | |
mriedem | i.e. i'm not able to recreate the duplicate entry error in neutron when post_live_migration_at_destination updates the port's host binding | 14:50 |
mriedem | sean-k-mooney: ^ were you able to recreate that with a neutron functional test? | 14:50 |
efried | mriedem: Are you my "stable release liaison"? | 14:51 |
mriedem | sure | 14:51 |
efried | just added you to https://review.openstack.org/#/c/652868/ and https://review.openstack.org/#/c/652869/ | 14:51 |
efried | It is not clear to me how much of https://review.openstack.org/#/q/status:open+(project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/nova)+branch:stable/pike we need to merge before those ^ are a go. | 14:51 |
efried | and/or if more things need to be flushed from stein->rocky->queens->pike first | 14:52 |
sean-k-mooney | mriedem: no, although it kind of fell off my plate. i can try again however. | 14:53 |
mriedem | efried: i've -1ed with a comment | 14:53 |
efried | thank you sir | 14:53 |
sean-k-mooney | mriedem: the main issue was figuring out how to test that within neutron's existing test suite | 14:53 |
mnaser | NewBruce: how long has this system been up for? | 14:54 |
mnaser | esp the nova ctlplane processes | 14:54 |
NewBruce | we had a maintenance window maybe a month ago when everything was shutdown | 14:55 |
mnaser | there isn't a way to get the current "highest" level of detected rpc version eh | 14:55 |
NewBruce | so that would have been the latest - certainly quite some time before the rest of the computes were upgraded | 14:55 |
*** helenafm has quit IRC | 14:57 | |
mriedem | mnaser: i've thought about exposing the service version in the api and/or a nova-manage command, but the problem is the services will cache the version so an api/cli could tell you you're at the highest but a service could be running with a lower version in its cache | 14:58 |
NewBruce | is it worth restarting everything to remove this as a possibility? | 14:58 |
mriedem | it looks like if you sighup nova-conductor the only thing it will do is reset that cache | 14:58 |
mriedem | https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/manager.py#L188 | 14:59 |
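The failure mode mnaser and mriedem describe is a long-running process holding a stale cached minimum version: the database says every compute is at 35, but a conductor started before the upgrades still uses the value it cached at startup until the cache is reset. The class below is a hypothetical stand-in that illustrates the effect, not nova's implementation.

```python
class CachedMinVersion:
    """Hypothetical stand-in for a service that caches the minimum service version."""

    def __init__(self, lookup):
        self._lookup = lookup  # callable that would query the services table
        self._cached = None

    def get(self):
        # Only the first call hits the database; later calls reuse the value.
        if self._cached is None:
            self._cached = self._lookup()
        return self._cached

    def reset(self):
        # Roughly what a restart (or a working SIGHUP) achieves: forget the cache.
        self._cached = None


db_versions = [34]
cache = CachedMinVersion(lambda: db_versions[-1])
print(cache.get())  # 34
db_versions.append(35)  # every compute upgraded to service version 35
print(cache.get())  # 34, because the cached value is stale
cache.reset()
print(cache.get())  # 35
```

Given mnaser's warning that SIGHUP handling is currently broken, a full restart of nova-conductor is the safer way to get the equivalent of `reset()`.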
mordred | mriedem: you know everything ... | 15:00 |
mordred | mriedem: https://wiki.openstack.org/wiki/VirtDriverImageProperties - are those documented anywhere other than that wiki page? | 15:01 |
mriedem | https://docs.openstack.org/glance/latest/admin/useful-image-properties.html ? | 15:01 |
mriedem | stephenfin: the table format in ^ is bad now | 15:01 |
mriedem | is that a known issue? | 15:01 |
mriedem | that used to actually have grid lines like a .... table | 15:02 |
mordred | oh! thanks | 15:02 |
stephenfin | mriedem: kashyap spotted that a few days ago. It's a style thing done by openstackdocstheme | 15:02 |
sean-k-mooney | mordred: they are meant to be documented in the glance metadefs too, but several are missing | 15:03 |
kashyap | stephenfin: What is "that"? The "if you reference the same ref twice you'll see funny rendering"? | 15:03 |
mriedem | mordred: there are some missing from that docs page too - like sean-k-mooney said for metadefs | 15:03 |
stephenfin | kashyap: The lack of borders on tables | 15:03 |
sean-k-mooney | mordred: stephenfin is working on a better way to define and validate image properties and flavor extra specs | 15:03 |
mriedem | i try to report glance bugs when i find missing properties in nova code | 15:03 |
kashyap | stephenfin: Ah, I missed to read the context earlier | 15:03 |
mriedem | https://bugs.launchpad.net/glance/+bug/1811897 | 15:03 |
openstack | Launchpad bug 1811897 in Glance "Useful image properties in glance - hw_disk_bus is also used by the vmware driver" [Undecided,New] | 15:03 |
mriedem | https://bugs.launchpad.net/glance/+bug/1808868 | 15:03 |
openstack | Launchpad bug 1808868 in Glance "Useful image properties in glance - hw_cdrom_bus is not documented" [Medium,Confirmed] | 15:03 |
mordred | sean-k-mooney: I would support anything that provides a better way to define and validate image properties :) | 15:03 |
mriedem | https://bugs.launchpad.net/glance/+bugs?search=Search&field.bug_reporter=mriedem&orderby=-datecreated&start=0 etc | 15:04 |
sean-k-mooney | mordred: in theory they should all be defined here https://github.com/openstack/glance/tree/master/etc/metadefs | 15:04 |
mordred | trait:<trait_name> = required is a really special interface | 15:04 |
sean-k-mooney | but it has not been maintained for all new specs and across all drivers | 15:05 |
mordred | sean-k-mooney: of course it hasn't :) | 15:05 |
*** cdent has joined #openstack-nova | 15:05 | |
sean-k-mooney | mordred: well we never added any testing to enforce it so it never will be. | 15:05 |
mriedem | i found out the other day that azure has a completely undocumented templating rest api and i was pretty surprised and somehow happy that even a giant closed source thing like azure has poor documentation | 15:06 |
* mordred cries | 15:06 | |
mriedem | sean-k-mooney: core reviewers in nova can certainly say "i'm not going to approve your shiny nugget until i see the glance docs patch written" | 15:06 |
mordred | also - fwiw - in the docs, it says auto_disk_config should return true or false | 15:06 |
mordred | and on rackspace it returned "disabled" | 15:06 |
mordred | so - you know - there's that | 15:07 |
sean-k-mooney | mriedem: true, we have mentioned that for some of the recent windriver ones like vTPM | 15:07 |
sean-k-mooney | it's a relatively trivial change if you do it when you add the feature | 15:07 |
mriedem | mordred: auto_disk_config is a string field in the nova schema so it can be whatever as far as nova is concerned | 15:07 |
mordred | mriedem: awesome | 15:07 |
mnaser | mriedem: I'd probably avoid suggesting SIGHUP-ing things for now till we figure out the oslo.service stuff, but yeah, maybe worth restarting nova-conductor to reset its cache I guess | 15:08 |
mriedem | it does look like the xen driver tries to treat it as a bool though | 15:08 |
sean-k-mooney | mordred: you should review https://review.openstack.org/#/c/638734/ | 15:08 |
mriedem | like other booleans in the openstack rest api like 1/yes/true/True etc | 15:08 |
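The loose boolean handling mentioned here (1/yes/true/True) can be sketched as below. This is a hypothetical helper, not nova's or the xen driver's actual code; oslo.utils provides a similar `strutils.bool_from_string`, but the exact accepted strings below are an assumption. It shows why a value like "disabled" is a problem: a lenient parser silently maps it to the default, while a strict one rejects it.

```python
# Hypothetical accepted spellings; real OpenStack helpers may differ.
TRUE_STRINGS = {"1", "t", "true", "on", "y", "yes"}
FALSE_STRINGS = {"0", "f", "false", "off", "n", "no"}


def parse_bool_property(value, strict=False, default=False):
    """Interpret a string-typed image property (e.g. auto_disk_config) as a bool.

    Matching is case-insensitive. Unrecognized values such as "disabled"
    fall back to `default`, or raise ValueError when strict=True.
    """
    if isinstance(value, bool):
        return value
    normalized = str(value).strip().lower()
    if normalized in TRUE_STRINGS:
        return True
    if normalized in FALSE_STRINGS:
        return False
    if strict:
        raise ValueError(f"unrecognized boolean string: {value!r}")
    return default


print(parse_bool_property("True"))      # True
print(parse_bool_property("disabled"))  # False (silently defaulted)
```

Whether "disabled" should be an error or quietly mean False is exactly the ambiguity a string-typed schema field leaves open.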
mordred | are _all_ of the extra properties like that string fields? | 15:09 |
mriedem | mordred: btw, i think i kind of have a monopoly on depressing topics in this channel and i will fight you over territory | 15:09 |
*** seyeongkim has quit IRC | 15:09 | |
mordred | mriedem: I will not fight back - I concede your supremacy in depressing topics in this channel | 15:09 |
*** seyeongkim has joined #openstack-nova | 15:10 | |
mriedem | mordred: not all https://github.com/openstack/nova/blob/master/nova/objects/image_meta.py#L233 | 15:10 |
mordred | I claim that monopoly in #openstack-sdks | 15:10 |
mriedem | everything in ^ should be documented in https://docs.openstack.org/glance/latest/admin/useful-image-properties.html and in glance metadefs | 15:10 |
sean-k-mooney | mordred: where there is a finite set we restrict it but where there is not we don't | 15:10 |
mriedem | anything not in https://github.com/openstack/nova/blob/master/nova/objects/image_meta.py#L233 will fail in nova (unless you've forked nova) | 15:10 |
sean-k-mooney | mriedem: well if the key does not match anything in that we will allow it | 15:11 |
mordred | ok. so that nova file is essentially the ultimate truth | 15:11 |
sean-k-mooney | e.g. you can define my_randome_metadata_property=whatever | 15:11 |
*** mdbooth has joined #openstack-nova | 15:11 | |
mriedem | sean-k-mooney: ummm, are you sure? | 15:11 |
sean-k-mooney | yes, if we can't, we broke our api | 15:11 |
mriedem | i'm pretty sure we intentionally broke the api for this | 15:12 |
mriedem | and told people to upstream their forks | 15:12 |
mriedem | this is also why we haven't codified flavor extra specs | 15:12 |
dansmith | yeah, pretty sure you can have anything you want, we just enforce the format of the ones we know about | 15:12 |
dansmith | because otherwise you can't use some of the things like jsonfilter right? | 15:12 |
sean-k-mooney | operators use this for steering guests to specific hosts using the image properties filter | 15:12 |
mriedem | the image properties filter works on like 3-4 known properties | 15:13 |
mordred | have I mentioned how much it sucks that images in v2 just take the extra properties in the root of the image object? | 15:13 |
mriedem | you mean flat rather than a special 'properties' sub-dict? | 15:14 |
mordred | v1's subdict where user-defined metadata went is SO MUCH BETTER | 15:14 |
mordred | yeah | 15:14 |
mordred | flat is horrifically terrible | 15:14 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/scheduler/filters/aggregate_image_properties_isolation.py i think it supports all the image metadata keys | 15:14 |
mriedem | now you get to get the base properties and subtract them to figure out the extras | 15:14 |
mriedem | weeee | 15:14 |
mordred | yup | 15:14 |
mordred | especially since some of the base properties have data types that aren't string | 15:14 |
mordred | so if you _don't_ deal with all of the base properties, you can be really screwed | 15:15 |
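The "get the base properties and subtract" approach for flat v2 images can be sketched like this. The `BASE_PROPERTIES` set is a hypothetical, incomplete subset of the Image API v2 schema fields, and `split_image` is an illustrative helper, not an SDK or glance function. The sample also shows mordred's point: base values keep their native types, so code that assumes every property is a string breaks.

```python
# Hypothetical subset of the schema-defined (base) keys on an Image API v2
# image; the real schema defines more fields than are listed here.
BASE_PROPERTIES = frozenset({
    "id", "name", "status", "visibility", "protected", "tags",
    "size", "virtual_size", "min_disk", "min_ram",
    "disk_format", "container_format", "checksum",
    "owner", "created_at", "updated_at",
})


def split_image(image):
    """Split a flat v2-style image dict into (base, extra) dicts.

    Anything not in BASE_PROPERTIES is treated as a user-defined extra
    property. Base values keep their native types (bool, int, list).
    """
    base = {k: v for k, v in image.items() if k in BASE_PROPERTIES}
    extra = {k: v for k, v in image.items() if k not in BASE_PROPERTIES}
    return base, extra


image = {
    "id": "img-1",
    "name": "cirros",
    "protected": False,   # bool, not a string
    "min_ram": 512,       # int, not a string
    "tags": ["test"],     # list
    "hw_disk_bus": "scsi",
    "auto_disk_config": "disabled",
}
base, extra = split_image(image)
print(sorted(extra))  # ['auto_disk_config', 'hw_disk_bus']
```

If the base set is incomplete, a schema field silently lands in `extra`, which is why a separate user-metadata sub-dict (as in v1) is easier to consume.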
dansmith | mriedem: this is where we would enforce that all the ones in the dict have to be known right? https://github.com/openstack/nova/blob/master/nova/objects/image_meta.py#L580-L598 | 15:15 |
*** gtema has joined #openstack-nova | 15:15 | |
mriedem | dansmith: i believe so | 15:17 |
openstackgerrit | Merged openstack/nova stable/rocky: Error out migration when confirm_resize fails https://review.openstack.org/652127 | 15:17 |
dansmith | mriedem: so, I think we only process the ones we know about, ignore anything else you throw in there | 15:17 |
sean-k-mooney | ya that's what i'm seeing too | 15:17 |
mriedem | i remember people bringing this up in the ML but maybe they were just saying, "my special unicorn properties doesn't make it down to my forked compute manager code, why not?" | 15:18 |
sean-k-mooney | i don't remember a spec for removing this, so this is a regression as it's an api change | 15:18 |
dansmith | mriedem: that's a different thing | 15:18 |
mriedem | sean-k-mooney: this has been this way for *years* | 15:19 |
mriedem | danpb did all this work | 15:19 |
dansmith | mriedem: the unknown image props don't get put into the object, so the compute nodes never see them, that's definitely true | 15:19 |
sean-k-mooney | well the unknown ones are for the scheduler only | 15:19 |
mriedem | sean-k-mooney: https://github.com/openstack/nova/blob/master/nova/scheduler/filters/aggregate_image_properties_isolation.py#L45 is an ImageMetaProps object, | 15:20 |
mriedem | so if we don't know about the property, it won't be in that object and the filter can't filter on it | 15:20 |
mriedem | https://github.com/openstack/nova/blob/master/nova/scheduler/filters/aggregate_image_properties_isolation.py#L56 | 15:20 |
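Because the filter receives an ImageMetaProps object, only keys that object knows about are visible to it. A minimal sketch of that effect, assuming a small hypothetical subset of known keys and a simplified isolation check (this is not nova's actual filter or object code):

```python
# Hypothetical subset of the fields nova's ImageMetaProps object defines;
# the real object (nova/objects/image_meta.py) has many more.
KNOWN_IMAGE_PROPS = frozenset({
    "hw_architecture", "hw_cdrom_bus", "hw_cpu_policy", "hw_disk_bus",
})


def to_known_props(raw_properties):
    """Keep only recognized keys; unknown keys are silently dropped."""
    return {k: v for k, v in raw_properties.items() if k in KNOWN_IMAGE_PROPS}


def isolation_passes(image_props, aggregate_metadata):
    """Simplified aggregate/image isolation check.

    aggregate_metadata maps a key to the set of allowed string values.
    A key the image does not carry (including one dropped above) never
    causes a rejection, so it cannot be used to steer placement.
    """
    for key, allowed in aggregate_metadata.items():
        value = image_props.get(key)
        if value is not None and str(value) not in allowed:
            return False
    return True


raw = {"hw_disk_bus": "scsi", "my_custom_prop": "nfv"}
props = to_known_props(raw)
print(isolation_passes(props, {"my_custom_prop": {"other"}}))  # True: key was dropped
print(isolation_passes(props, {"hw_disk_bus": {"virtio"}}))    # False
```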
dansmith | same for image_props_filter | 15:22 |
*** hamzy has quit IRC | 15:22 | |
dansmith | I guess jsonfilter is only on host state | 15:22 |
mriedem | yeah the jsonfilter is a whole other unvalidated piece of gorp | 15:23 |
dansmith | yeah, but I thought it could operate on the image too, but no | 15:23 |
NewBruce | heading off line for a while; bbl | 15:23 |
*** gaoyan has quit IRC | 15:23 | |
dansmith | well, anyway, we definitely removed some functionality in the image props filter when we did that, but (a) it's been a long time with no real complaint that I know of and (b) I wouldn't call it an api breakage | 15:24 |
mriedem | liberty https://review.openstack.org/#/c/76234/ | 15:25 |
dansmith | er, hmm, maybe we didn't actually | 15:25 |
dansmith | I thought image props filter could do more generic matching, but it doesn't | 15:25 |
mriedem | right it's like 3 or 4 properties | 15:26 |
mriedem | hw version, type, arch something like that | 15:26 |
mriedem | very specific | 15:26 |
dansmith | which is why I was thinking about the jsonfilter | 15:26 |
dansmith | so if there's not some other filter I'm thinking of I guess we're good | 15:26 |
mriedem | it's AggregateImagePropertiesIsolation | 15:26 |
mriedem | that's the generic one | 15:26 |
sean-k-mooney | ok, do we have the same restriction now with flavor extra specs? | 15:26 |
mriedem | to tie aggregates to images | 15:26 |
dansmith | mriedem: ah, just a straight match to host meta I see | 15:27 |
sean-k-mooney | yes | 15:27 |
*** gtema has left #openstack-nova | 15:27 | |
sean-k-mooney | this was used for things like "run this image on the NFV aggregate" | 15:27 |
*** pcaruana has quit IRC | 15:27 | |
dansmith | sean-k-mooney: no such restriction with flavors | 15:27 |
dansmith | mriedem: mostly unrelated to this, I wanted to throw something out there just for maybe future use | 15:29 |
mriedem | totally unrelated to this, but there are 2 simple changes below https://review.openstack.org/#/c/570202/ (which i need for cross-cell resize as well as rebuild from cell0) that have a +2 and i'm looking for another core to hit those | 15:29 |
dansmith | mriedem: when I was reading the two numa specs today I was reminded of something I was thinking about earlier, related to cases where we have instances which store old-format data, like something stuck in their flavor which needs to be migrated | 15:30 |
sean-k-mooney | dansmith: ok, i had thought we still supported "bring your own" metadata keys for images too; i guess not | 15:30 |
*** pcaruana has joined #openstack-nova | 15:30 | |
dansmith | mriedem: things that we would need to online_migrate or something, and things that we would be tempted to resolve with "meh, just migrate all your instances left one rack to clean those up".. kinda like the recent discussion about reshape for that stuff | 15:31 |
dansmith | mriedem: we might benefit from a "needs upgrade" flag on the instance, that would be shown in detailed list, to admins only, | 15:31 |
dansmith | where I could list all tenants and see instances that have "needs upgrade" | 15:31 |
sean-k-mooney | so the old use case would be achieved with a member_of request in the flavor extra spec | 15:32 |
dansmith | anything could set that flag, like compute manager when it notices some legacy data, or even libvirt when it notices something like a disk format that needs to be upgraded or something | 15:32 |
mriedem | dansmith: if the thing setting the flag has to already calculate that it will set the flag, why not just do the online migration right then? | 15:32 |
*** ivve has quit IRC | 15:33 | |
dansmith | mriedem: well, because things like the resource topology thing would need to be done to all instances on the compute node at the same time, and after a config change | 15:33 |
dansmith | mriedem: referring to the thing stephenfin and jaypipes and bauzas and I were discussing last week | 15:33 |
bauzas | FWIW, on a 1.5h meeting atm | 15:34 |
bauzas | but listening | 15:34 |
mriedem | so instances a,b,c need an upgrade, but the admin doesn't know how to upgrade them, i.e. is it migrating them, running the online_data_migrations cli, restarting the compute they are on, etc | 15:34 |
mriedem | if the flag were an enum that's one way | 15:35 |
*** belmoreira has quit IRC | 15:35 | |
dansmith | mriedem: yeah, so it'd be nice if we also tagged "issues" to the instance that you whittle down until the upgrade flag goes away, but I think that's too heavy for the moment.. but if we LOG.warning() for each instance that we were setting the flag on, then you would at least have a record | 15:35 |
dansmith | mriedem: yeah, could be iterative, like service_version | 15:35 |
dansmith | if we always did those things in order | 15:36 |
sean-k-mooney | dansmith: could you tie it into a nova status check or something | 15:36 |
mriedem | i don't think i'd rely on someone catching that warning | 15:36 |
openstackgerrit | Merged openstack/nova stable/rocky: Delete allocations even if _confirm_resize raises https://review.openstack.org/652146 | 15:36 |
mriedem | before their logs wrap or something | 15:36 |
mriedem | depends on how long they retain stuff in es | 15:36 |
mriedem | we already blast the logs with warnings that aren't useful | 15:37 |
dansmith | mriedem: sure, it just makes it lighter for the first rev of the idea, but if we made it a "schema version" sort of iterative "level" like service_version that could be easy, as you would look up the reason in the decoder ring | 15:37 |
cdent | Is there something up with the gate today? queue seems long and slow | 15:38 |
dansmith | and we could translate it in the api to "is current or not" if the version is backlevel (and show the version I guess) | 15:38 |
mriedem | cdent: it was like that yesterday too | 15:38 |
mriedem | dansmith: i'm not sure what service_version you're referring to | 15:38 |
mriedem | just the Service.version? | 15:38 |
dansmith | no, | 15:39 |
openstackgerrit | Merged openstack/nova stable/rocky: Don't warn on network-vif-unplugged event during live migration https://review.openstack.org/651797 | 15:39 |
dansmith | SERVICE_VERSION is a global counter of things we've done to services, so we can tell if they're up to date or not | 15:39 |
dansmith | we could have a similar global "instance version" which was a little more like a schema version, where we record whether instance records had been modified for a specific transition | 15:40 |
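The proposed "instance version" could work like a schema version: a global counter plus a lookup table explaining each step. A minimal sketch; every name and version entry below is hypothetical, modeled only loosely on the SERVICE_VERSION idea described above.

```python
# Hypothetical global counter, bumped whenever a new instance-data transition
# is introduced (similar in spirit to nova's SERVICE_VERSION for services).
INSTANCE_DATA_VERSION = 3

# Hypothetical "decoder ring" describing what each step changed.
INSTANCE_VERSION_HISTORY = {
    1: "baseline",
    2: "example: legacy data stored in the embedded flavor migrated",
    3: "example: CPU resources reshaped into PCPU/VCPU",
}


def needs_upgrade(instance_version):
    """True if the record predates the current instance data version."""
    return instance_version < INSTANCE_DATA_VERSION


def pending_steps(instance_version):
    """Describe the transitions a record still needs, oldest first."""
    return [
        INSTANCE_VERSION_HISTORY[v]
        for v in range(instance_version + 1, INSTANCE_DATA_VERSION + 1)
    ]


# An admin-only detailed listing could expose needs_upgrade() per instance.
instances = {"inst-a": 1, "inst-b": 3}
print({uuid: needs_upgrade(v) for uuid, v in instances.items()})
# {'inst-a': True, 'inst-b': False}
```

An ordered integer only works if transitions are always applied in order, which is the caveat dansmith raises.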
*** pcaruana has quit IRC | 15:40 | |
dansmith | this is not important right now, I was thinking you'd latch onto this so we could expose more info to the operators about upgraded-ness, but I'll just bring it up the next time it's appropriate, like in those specs | 15:41 |
mriedem | ack | 15:41 |
bauzas | sean-k-mooney: so, about the upgrade impact of the cpu-resources thing, I had an idea | 15:42 |
bauzas | what we could do is potentially leave operators upgrading to Train without any impact | 15:42 |
bauzas | ie. config options act exactly like Stein | 15:42 |
bauzas | (for the existing ones) | 15:43 |
mriedem | cdent: http://grafana.openstack.org/d/T6vSHcSik/zuul-status?panelId=3&fullscreen&orgId=1 | 15:43 |
bauzas | but, before you would like to use new config options, you'd have to say "hey, reshape my stuff" | 15:43 |
sean-k-mooney | bauzas: yes. i suggested not enabling any of the new functionality if the new option was not set | 15:43 |
bauzas | for this host | 15:43 |
sean-k-mooney | yep | 15:43 |
sean-k-mooney | so they would upgrade | 15:43 |
bauzas | and in this case, we would model the new inventories | 15:43 |
cdent | mriedem: "data without context can never be information" | 15:43 |
sean-k-mooney | then move vms if needed and then enable the new functionality | 15:44 |
bauzas | from an operator pov, it would mean "upgrade your cloud, upgrade your hosts" | 15:44 |
sean-k-mooney | so they could do a rolling config update after the upgrade | 15:44 |
mriedem | cdent: just saying there is a spike in the check queue so things are maybe slow as a result | 15:44 |
bauzas | when you're done, then upgrade your config when you want | 15:44 |
sean-k-mooney | yep | 15:44 |
bauzas | a nova-status check could do the pre-flight check | 15:44 |
sean-k-mooney | i think we are on the same page for that workflow | 15:44 |
cdent | mriedem: sure, but we knew that already (or at least I did) | 15:44 |
sean-k-mooney | bauzas: it requires us to support both codepaths in train | 15:45 |
sean-k-mooney | then remove the old one in U | 15:45 |
bauzas | sean-k-mooney: yup, this was my concern | 15:45 |
bauzas | but | 15:45 |
sean-k-mooney | but i think that will help with upgrades significantly | 15:45 |
mriedem | cdent: well you didn't tell me that, so who's misinformed now?! | 15:45 |
bauzas | with the addition of: trigger a reshape by the operator | 15:45 |
bauzas | or | 15:45 |
bauzas | trigger a reshape by the config change | 15:46 |
cdent | heh. touche | 15:46 |
*** pcaruana has joined #openstack-nova | 15:46 | |
sean-k-mooney | bauzas: yep i was going to suggest that too. | 15:48 |
sean-k-mooney | stephenfin: ^ when you get a chance after this call does the above make sense | 15:48 |
mriedem | stephenfin: https://review.openstack.org/#/c/651291/3 | 15:49 |
mriedem | you've got a pep8 failure now in your bottom cells v1 removal series | 15:49 |
mriedem | i pulled it down since i didn't want to wait for zuul | 15:49 |
*** gyee has joined #openstack-nova | 15:50 | |
openstackgerrit | Merged openstack/nova master: Add minimum value in max_concurrent_live_migrations https://review.openstack.org/648302 | 15:50 |
openstackgerrit | Merged openstack/nova stable/rocky: libvirt: disconnect volume when encryption fails https://review.openstack.org/651796 | 15:50 |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/queens: libvirt: disconnect volume when encryption fails https://review.openstack.org/653033 | 15:53 |
melwitt | dansmith: the other day when you mentioned about how ideally quota usage counting from placement would take max(old, new) flavor for a resize, were you meaning only a same host resize? or also for a move? | 15:53 |
dansmith | melwitt: well, definitely for same-host, but I think it's probably reasonable for either too, depending on your view of it | 15:54 |
dansmith | melwitt: and depending on the resource type | 15:55 |
*** pcaruana has quit IRC | 15:55 | |
dansmith | melwitt: as a user, I would probably think it's kinda silly to consume sum(old, new) of my resources during a resize of any kind because I'm not able to use both resources simultaneously, and the fact that they're counted against my old instance for a while is just an artifact of how nova does the two-phase resize, you know? | 15:56 |
melwitt | dansmith: ok, thanks. just wanted to make sure I understood. I've been thinking about "how would we do that?" eventually and was thinking, would that be done on the placement side /usages API? when it detects allocations for the same 'instance' type consumer for the same resource class? or would this be something we do on the nova side somehow? | 15:56 |
dansmith | as an operator, I might want to "charge" the user for those resources until they confirm/revert, but that too is an artifact of how nova behaves | 15:57 |
dansmith | melwitt: that would be super nova-specific behavior, so I would not think that should ever go into placement | 15:57 |
mnaser | can someone point me to where network_info is updated from? | 15:57 |
melwitt | dansmith: yeah, I agree from a user perspective definitely sum(old, new) is weird. but I was feeling unsure because of how we really are holding resources in two places | 15:58 |
mnaser | I have unsuccessfully been digging the code, and I have a cloud here where a bunch of instances have network_info=[] | 15:58 |
mriedem | mnaser: for neutron, nova.network.neutronv2.api.API._get_instance_network_info or something like that | 15:58 |
dansmith | melwitt: right, that's the operator argument, but it still seems silly to me | 15:58 |
dansmith | melwitt: that's kinda overhead and transient | 15:58 |
melwitt | dansmith: ok. that was my assumption but like.... we wouldn't be able to just use the /usages API simply anymore. how would we get all the info about what /usages are part of a resize etc. I can't even think about it right now | 15:58 |
dansmith | melwitt: "cost of business" | 15:59 |
*** ttsiouts has quit IRC | 15:59 | |
mriedem | melwitt: i don't think you can unless placement has consumer types | 15:59 |
*** ttsiouts has joined #openstack-nova | 15:59 | |
melwitt | dansmith: yeah, I agree with that too, overhead and transient | 15:59 |
dansmith | melwitt: well, the reserved resources are owned by a migration and not an instance, but yeah, consumer types :) | 15:59 |
mriedem | mnaser: https://github.com/openstack/nova/blob/master/nova/network/neutronv2/api.py#L1824 | 15:59 |
melwitt | mriedem: yes, definitely true, need consumer types but even still, I'm not 100% sure that would be enough | 16:00 |
mriedem | mnaser: that's called from here https://github.com/openstack/nova/blob/master/nova/network/base_api.py#L253 | 16:00 |
melwitt | and what would the calls look like to work that out on the nova side | 16:00 |
mriedem | GET /usages?project_id=foo&consumer_type=instance | 16:01 |
mnaser | mriedem: it doesn't seem like there's a task that just calls get_instance_nw_info() from time to time, grr | 16:01 |
*** tesseract has quit IRC | 16:01 | |
mriedem | mnaser: oh but there is | 16:01 |
mriedem | mnaser: question is, which release is this cloud on? | 16:01 |
mnaser | rocky | 16:01 |
melwitt | mriedem: yeah, but then you get a sum of everything. how to pick apart and remove the min(old, new)? | 16:01 |
dansmith | melwitt: no | 16:01 |
*** pcaruana has joined #openstack-nova | 16:02 | |
mnaser | *something* has set the network info cache to empty.. dunno what/why yet | 16:02 |
dansmith | melwitt: that would exclude all the migration-held resources | 16:02 |
melwitt | no :) | 16:02 |
mriedem | mnaser: i've got the bug for you and patch, sec | 16:02 |
melwitt | oh it would? I guess I didn't know that | 16:02 |
melwitt | oh, because the consumer type would be 'migration'? | 16:02 |
dansmith | melwitt: if you only show instance-held resources? | 16:02 |
dansmith | melwitt: right, what else would you use consumer types for in this case? :) | 16:02 |
mriedem | right if you want to know new flavor usage, you'd filter on consumer_type=instance, | 16:02 |
mriedem | if you want to know old flavor usage, you'd filter on consumer_type=migration | 16:03 |
dansmith | the only limitation there would be that you'd get current not max(current, old) but I think that's okay | 16:03 |
mriedem | unless, | 16:03 |
dansmith | max() would be nice to charge them for the most they're potentially going to use at any given point to avoid a revert-to-bigger making them go over quota | 16:03 |
mriedem | you then create some more servers filling up your quota and then you can't revert the resize | 16:03 |
melwitt | ok, so you'd have to have two queries, one for 'instance' consumer type and one for 'migration' consumer type in order to take max(old, new) | 16:03 |
mriedem | dansmith: jinx | 16:04 |
dansmith | melwitt: or get back them grouped by type in one query | 16:04 |
*** ttsiouts has quit IRC | 16:04 | |
melwitt | ok, I see | 16:04 |
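The max(old, new) counting discussed above can be sketched as follows, assuming placement allocations could be tagged with a consumer type ('instance' or 'migration'), which, as noted, placement does not have yet, and that each migration consumer can be mapped back to its instance (nova migration records carry the instance UUID). Function names and data shapes are hypothetical.

```python
from collections import defaultdict


def count_usage(allocations, migration_to_instance):
    """Sum project usage, counting a resizing instance at max(old, new).

    allocations: iterable of (consumer_type, consumer_uuid, {rc: amount}).
    migration_to_instance: maps a migration consumer uuid to its instance.
    """
    new = defaultdict(lambda: defaultdict(int))  # held by the instance (new flavor)
    old = defaultdict(lambda: defaultdict(int))  # held by the migration (old flavor)
    for consumer_type, consumer_uuid, resources in allocations:
        if consumer_type == "instance":
            bucket = new[consumer_uuid]
        elif consumer_type == "migration":
            bucket = old[migration_to_instance[consumer_uuid]]
        else:
            continue  # ignore consumer types we don't count
        for rc, amount in resources.items():
            bucket[rc] += amount

    totals = defaultdict(int)
    for instance_uuid in set(new) | set(old):
        n = new.get(instance_uuid, {})
        o = old.get(instance_uuid, {})
        for rc in set(n) | set(o):
            totals[rc] += max(n.get(rc, 0), o.get(rc, 0))
    return dict(totals)


allocations = [
    # inst-a is resizing from 2 VCPU / 2048 MB to 4 VCPU / 1024 MB.
    ("instance", "inst-a", {"VCPU": 4, "MEMORY_MB": 1024}),
    ("migration", "mig-1", {"VCPU": 2, "MEMORY_MB": 2048}),
    ("instance", "inst-b", {"VCPU": 1, "MEMORY_MB": 512}),
]
usage = count_usage(allocations, {"mig-1": "inst-a"})
# VCPU: max(4, 2) + 1 = 5; MEMORY_MB: max(1024, 2048) + 512 = 2560
```

Summing instead would report 7 VCPU and 3584 MB for the same data. Taking the max per resource class also covers the revert-to-bigger case: the user is charged for the larger footprint until the resize is confirmed or reverted.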
mriedem | mnaser: https://review.openstack.org/#/c/591607/ | 16:05 |
mriedem | https://bugs.launchpad.net/nova/+bug/1751923 | 16:05 |
openstack | Launchpad bug 1751923 in OpenStack Compute (nova) "_heal_instance_info_cache periodic task bases on port list from nova db, not from neutron server" [Medium,Fix released] - Assigned to Maciej Jozefczyk (maciej.jozefczyk) | 16:05 |
mriedem | mnaser: that was something from the public cloud wg (ovh worked the fix i started), and our public cloud ops team needed it as well | 16:05 |
*** mdbooth has quit IRC | 16:05 | |
mriedem | b/c of the same thing you said - network_info gets wiped out and the heal task wouldn't refresh from neutron, but from the cache itself, which is ... dumb | 16:05 |
mriedem | note there is a pretty beefy data migration patch before that in the series | 16:06 |
mriedem | which is why we haven't backported it | 16:06 |
mnaser | ah this is tein | 16:06 |
mnaser | stein | 16:06 |
mnaser | poop | 16:06 |
mriedem | mnaser: i think you also mentioned something like this a few weeks ago which prompted me to write this https://review.openstack.org/#/c/640516/ | 16:07 |
mriedem | ^ seems obvious to me, but when i dug into history there were previous attempts to do the same which were reverted because of "potential race issues" or something | 16:07 |
mriedem | those were also many years ago so idk if they'd still exist | 16:07 |
mnaser | I dunno how this kinda just appeared out of nowhere | 16:08 |
mnaser | queens cloud upgraded to rocky and poof | 16:08 |
mriedem | mnaser: that bug has some scenarios where people hit it | 16:08 |
*** pcaruana has quit IRC | 16:08 | |
mriedem | mnaser: https://bugs.launchpad.net/nova/+bug/1751923/comments/4 is the case our ops team hit - changing policy | 16:09 |
openstack | Launchpad bug 1751923 in OpenStack Compute (nova) "_heal_instance_info_cache periodic task bases on port list from nova db, not from neutron server" [Medium,Fix released] - Assigned to Maciej Jozefczyk (maciej.jozefczyk) | 16:09 |
mnaser | mriedem: ok so I assume _get_ordered_port_list() will not work in rocky | 16:09 |
mnaser | because of the lack of index | 16:09 |
mriedem | mnaser: it depends on if the instances were created after mitaka | 16:10 |
mriedem | because it relies on the virtual interface record and we didn't start creating those for servers until newton | 16:10 |
mriedem | hence the online data migration https://review.openstack.org/#/c/614167/ | 16:10 |
mriedem | a bug was reported last week for that online data migration as well https://bugs.launchpad.net/nova/+bug/1824435 but so far we don't have a reproducer | 16:14 |
openstack | Launchpad bug 1824435 in OpenStack Compute (nova) stein "fill_virtual_interface_list migration fails on second attempt" [High,Triaged] | 16:14 |
mnaser | oh well | 16:15 |
mnaser | this cloud has been running since queens | 16:15 |
mnaser | so I guess I could get away with running this once? | 16:15 |
bauzas | dansmith: thanks for your comments on https://review.openstack.org/#/c/650963/ | 16:15 |
mriedem | if all the instances on that cloud were created since at least queens you should be ok | 16:15 |
bauzas | dansmith: I don't disagree with you and I understand your concerns | 16:16 |
mriedem | mnaser: you could find out by comparing the instances table count to the virtual_interfaces table or something, i.e. is there at least one vif per instance? | 16:16 |
bauzas | dansmith: I guess we could just have a 'preferred' policy | 16:16 |
bauzas | dansmith: would that be okay for you ? | 16:16 |
dansmith | bauzas: meaning, not add the knob, and just make the behavior always "preferred" right? | 16:17 |
stephenfin | mriedem: Sorry, I rushed that. I assume if you've pulled it down I should leave it alone | 16:17 |
bauzas | yup | 16:17 |
bauzas | no UX | 16:17 |
sean-k-mooney | preferred in the context of? | 16:17 |
sean-k-mooney | vgpu numa? | 16:17 |
bauzas | I just need to think about the upgrade tho | 16:17 |
bauzas | but I don't think it would be a problem | 16:18 |
sean-k-mooney | wasn't that the suggested default in the spec? | 16:18 |
bauzas | sean-k-mooney: correct | 16:18 |
stephenfin | It's dumb but running tox cripples Bluejeans and any day that I've loads of meetings (like today) means I can't kick off anything locally that's going to run in the background | 16:18 |
bauzas | sean-k-mooney: nope, the default was 'nothing changes' | 16:18 |
dansmith | bauzas: yes I think that'd be ideal | 16:18 |
sean-k-mooney | oh well, preferred was just going to be implemented by a weigher, right? | 16:18 |
bauzas | dansmith: okay, I'll -W the spec and work on a new PS | 16:19 |
dansmith | cool | 16:19 |
sean-k-mooney | we will prefer hosts that can provide numa affinity but not filter out any | 16:19 |
sean-k-mooney | like the pci weigher | 16:19 |
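A "preferred" policy as described (best effort, like the PCI weigher) ranks hosts instead of filtering them. A minimal sketch in plain Python, assuming a hypothetical per-host boolean; it does not use nova's actual weigher classes or interfaces.

```python
from dataclasses import dataclass


@dataclass
class HostCandidate:
    name: str
    # Hypothetical: can the vGPU be placed on the same NUMA node as the
    # guest's CPUs on this host?
    numa_affinity_possible: bool


def numa_affinity_weight(host):
    """Higher is better; hosts without affinity are kept, just ranked lower."""
    return 1.0 if host.numa_affinity_possible else 0.0


def rank_hosts(hosts):
    # sorted() is stable even with reverse=True, so hosts with equal
    # weights keep their incoming order. Nothing is filtered out.
    return sorted(hosts, key=numa_affinity_weight, reverse=True)


hosts = [
    HostCandidate("host1", False),
    HostCandidate("host2", True),
    HostCandidate("host3", False),
]
print([h.name for h in rank_hosts(hosts)])  # ['host2', 'host1', 'host3']
```

Because no host is dropped, a cloud with no NUMA-affine capacity still schedules, which is what makes this a safe default with no new knob.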
mnaser | mriedem: SELECT instances.uuid, COUNT(virtual_interfaces.uuid) FROM instances LEFT JOIN virtual_interfaces ON virtual_interfaces.instance_uuid = instances.uuid WHERE instances.deleted=0 GROUP BY instances.uuid; | 16:19 |
mnaser | shows 1/2 but no 0's | 16:19 |
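mriedem's check (at least one VIF per non-deleted instance) can be turned into a query that lists only the offending instances by adding a HAVING clause to the query above. A self-contained sketch using sqlite3 with a minimal stand-in schema (an assumption, not nova's real table definitions). In a real nova database the VIF rows are also soft-deleted, so you may additionally want to filter on the virtual_interfaces deleted column in the ON clause.

```python
import sqlite3

# Minimal stand-in schema (assumption): only the columns the query touches.
SCHEMA = """
CREATE TABLE instances (uuid TEXT PRIMARY KEY, deleted INTEGER NOT NULL);
CREATE TABLE virtual_interfaces (uuid TEXT PRIMARY KEY, instance_uuid TEXT);
"""

# Same LEFT JOIN as above, plus HAVING to keep only instances with no VIFs.
QUERY = """
SELECT instances.uuid
FROM instances
LEFT JOIN virtual_interfaces
    ON virtual_interfaces.instance_uuid = instances.uuid
WHERE instances.deleted = 0
GROUP BY instances.uuid
HAVING COUNT(virtual_interfaces.uuid) = 0
ORDER BY instances.uuid
"""


def instances_without_vifs(conn):
    return [row[0] for row in conn.execute(QUERY)]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO instances VALUES (?, ?)",
                 [("inst-a", 0), ("inst-b", 0), ("inst-c", 0), ("inst-d", 1)])
conn.executemany("INSERT INTO virtual_interfaces VALUES (?, ?)",
                 [("vif-1", "inst-a"), ("vif-2", "inst-b"), ("vif-3", "inst-b")])
print(instances_without_vifs(conn))  # ['inst-c']
```

An empty result corresponds to mnaser's "no 0's" observation.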
bauzas | sean-k-mooney: yup, just a way to see whether we can have *some affinity* | 16:19 |
bauzas | if no affinity possible, fine | 16:19 |
mnaser | so I guess I can cherry-pick that code and run it once.. | 16:19 |
sean-k-mooney | yep | 16:20 |
sean-k-mooney | that is why i assumed it would be the default | 16:20 |
sean-k-mooney | because it's best effort | 16:20 |
mriedem | mnaser: cool, let me know how it goes | 16:20 |
bauzas | sean-k-mooney: the concern wasn't really the default | 16:20 |
bauzas | value* | 16:20 |
bauzas | sean-k-mooney: the concern is more whether we want to introduce more knobs, and 'required' needed those | 16:20 |
openstackgerrit | Mohammed Naser proposed openstack/nova stable/rocky: Force refresh instance info_cache during heal https://review.openstack.org/653040 | 16:21 |
bauzas | anyway, I'll just write a new revision, and people could chime on it | 16:21 |
stephenfin | bauzas, sean-k-mooney: RE: the cpu-resources discussion above, it's not nova-compute that makes the call to placement | 16:22 |
bauzas | stephenfin: which call are we talking about ? | 16:22 |
stephenfin | the claim | 16:23 |
bauzas | yup, I don't disagree | 16:23 |
bauzas | but then I don't get your point | 16:23 |
mnaser | at least it applies cleanly | 16:24 |
* mnaser needs to run to an appointment quickly | 16:24 | |
bauzas | my upgrade concerns are about when and how we should transform inventories | 16:24 |
bauzas | (and move allocations accordingly) | 16:24 |
stephenfin | bauzas: We'd either have to request a certain amount of PCPU resources, which couldn't be fulfilled since we're not reporting any inventory (because the config values haven't been set) | 16:25 |
stephenfin | Or we'd have to keep requesting VCPU resources for everything, which borks the whole idea | 16:25 |
bauzas | I lean toward the latter | 16:25 |
bauzas | if people start asking for PCPU resources, they necessarily have to be fulfilled by hosts ready to accept them, I don't disagree | 16:26 |
bauzas | but then, Train is necessarily a mitigation release | 16:27 |
bauzas | because of computes | 16:27 |
bauzas | we had the same problem with rolling upgrades | 16:27 |
bauzas | you can't really make use of a feature unless all computes are up to date in general, in particular when it comes to resources usage | 16:27 |
stephenfin | bauzas: hmm, that removes our ability to transform 'hw:cpu_policy=dedicated' under the hood though | 16:29 |
bauzas | that doesn't mean operators can't use PCPU requests with Train | 16:29 |
stephenfin | and if a CPU is part of the PCPU pool, it can't be part of the VCPU pool | 16:29 |
bauzas | but they absolutely need to converge all their inventories *before* they use the PCPU request | 16:29 |
stephenfin | without the transform in place, any existing instances can't be migrated (they wouldn't be using PCPUs but rather the legacy 'hw:cpu_policy=dedicated' extra spec) plus flavours and images would not be requesting the correct stuff | 16:31 |
stephenfin | The point is I think we need that rewriting in place, and if we need that then we need some initial PCPU inventories in place as soon as the upgrade is in place, otherwise the ability to move existing instances or create new ones is gone :( | 16:32 |
stephenfin | This could really do with a call, I think | 16:32 |
sean-k-mooney | probably | 16:34 |
*** pcaruana has joined #openstack-nova | 16:36 | |
bauzas | stephenfin: I have to leave in a few, but I'll summarize my thoughts | 16:36 |
bauzas | we could just leave existing options as they are and leave inventories be VCPU | 16:37 |
bauzas | but | 16:37 |
bauzas | once operator sets config options (and then we can discuss on this specific trigger), then we do a reshape for this host and split VCPU inventory into VCPU and PCPU inventories | 16:38 |
bauzas | and we accordingly move allocations | 16:38 |
bauzas | with the slight detail that we verify resources *before* doing the reshape so we can raise an exception | 16:38 |
bauzas | at compute startup | 16:38 |
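bauzas's proposal (split the VCPU inventory into VCPU and PCPU once the operator sets the new options, and verify usage before the reshape so compute startup can fail early) could look roughly like this. This is a hedged sketch: the option semantics, the 16.0 allocation ratio default, and the data shapes are assumptions, and a real implementation would go through nova's update_provider_tree/reshape machinery and placement's reshaper API rather than return plain dicts.

```python
class ReshapeNotPossible(Exception):
    """Raised at startup when existing usage cannot fit the new split."""


def plan_cpu_reshape(total_cpus, dedicated_cpus, instances,
                     cpu_allocation_ratio=16.0):
    """Plan splitting a host's VCPU inventory into VCPU + PCPU.

    total_cpus: number of host CPUs, numbered 0..total_cpus-1.
    dedicated_cpus: host CPU ids the (hypothetical) new option dedicates.
    instances: iterable of (uuid, vcpus, is_dedicated) for existing guests.
    Returns (inventories, allocations) or raises before anything changes.
    """
    dedicated = set(dedicated_cpus)
    if not dedicated <= set(range(total_cpus)):
        raise ReshapeNotPossible("dedicated set names CPUs the host lacks")
    instances = list(instances)
    shared = total_cpus - len(dedicated)
    pcpu_needed = sum(v for _, v, is_ded in instances if is_ded)
    vcpu_needed = sum(v for _, v, is_ded in instances if not is_ded)
    if pcpu_needed > len(dedicated):
        raise ReshapeNotPossible(
            f"{pcpu_needed} pinned vCPUs in use but only {len(dedicated)} PCPUs")
    if vcpu_needed > shared * cpu_allocation_ratio:
        raise ReshapeNotPossible(
            f"{vcpu_needed} shared vCPUs in use exceed VCPU capacity")

    # Only report resource classes that actually have capacity.
    inventories = {}
    if dedicated:
        inventories["PCPU"] = {"total": len(dedicated), "allocation_ratio": 1.0}
    if shared:
        inventories["VCPU"] = {"total": shared,
                               "allocation_ratio": cpu_allocation_ratio}
    # Move each guest's CPU allocation to the matching resource class.
    allocations = {uuid: {"PCPU" if is_ded else "VCPU": v}
                   for uuid, v, is_ded in instances}
    return inventories, allocations


# The failure case raised later in the discussion: an empty PCPU set while
# pinned guests exist must refuse to start rather than reshape.
try:
    plan_cpu_reshape(8, set(), [("pinned-vm", 2, True)])
except ReshapeNotPossible as exc:
    print("refusing to start:", exc)
```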
bauzas | for requests, we somehow need to make sure that we transform requests by using PCPU based on a specific point in time | 16:39 |
bauzas | either by providing a flag | 16:40 |
bauzas | or by verifying the PCPU inventories | 16:40 |
bauzas | stephenfin: thoughts on that ? | 16:40 |
bauzas | others: too | 16:40 |
*** rpittau is now known as rpittau|afk | 16:43 | |
openstackgerrit | Lee Yarwood proposed openstack/nova-specs master: Re-propose stable device rescue for Train https://review.openstack.org/651151 | 16:43 |
stephenfin | bauzas: If that happens though, then everything has to be done at once. | 16:45 |
bauzas | stephenfin: that happens what ? | 16:46 |
bauzas | asking for PCPU ? | 16:46 |
stephenfin | if we wait until some point in time to start reporting PCPU | 16:46 |
stephenfin | so if we say the operator setting this configuration option is that point in time, then we must also ensure the operator also twists the knob that says "transform all my legacy extra specs/image meta to PCPU requests" | 16:47 |
*** tssurya has quit IRC | 16:47 | |
efried | We said we weren't going to allow reshapes except on upgrade boundaries. When "we" said that, I was a dissenting vote. There are a number of specs we're discussing in the current cycle where that is going to need to be re-evaluated. | 16:48 |
stephenfin | (that knob has to exist because there is a point in time where we can have no PCPU resources, so we can't always transform) | 16:48 |
efried | We must either allow reshapes "any time" (conceivably with a compute service restart), or stick to the above guns and require a host to be cleared out before configuration tweaks are done. The latter means we're not reshaping, just shuffling inventory around in a regular update_provider_tree code path, because we don't have allocations to dork with. | 16:49 |
bauzas | efried: we said earlier that the latter is highly terrible for ops | 16:50 |
*** bryan_stephenson has joined #openstack-nova | 16:50 | |
bauzas | earlier being last week | 16:50 |
bauzas | efried: so that's why I'm considering a reshape at compute startup | 16:50 |
bauzas | based on config flags modification | 16:50 |
dansmith | efried: what you mean is make reshapes obligatory and not contingent on a config value or something right? | 16:50 |
efried | I'm in favor of that. | 16:50 |
bauzas | and the request knob be ops-driven, I like this | 16:51 |
dansmith | efried: because I think the two are not mutually exclusive, if the shape of the reshape would be defined by something in config (i.e. how many cpus are dedicated vs. shared) | 16:51 |
efried | dansmith: I mean allow reshapes to happen when they need to happen, rather than restricting them to upgrade boundaries. That's what bauzas is talking about as well. | 16:51 |
efried | Yes, dansmith and bauzas we still need to fail the reshape if it entails moving allocations in an impossible way. | 16:52 |
dansmith | efried: right I know, but I think there's subtlety here | 16:52 |
*** markvoelker has joined #openstack-nova | 16:52 | |
bauzas | efried: to be fair, VGPU reshapes are already done on compute startup, not at upgrade :) | 16:52 |
bauzas | of course, it will in theory run once, after upgrading | 16:52 |
dansmith | efried: we have to maintain config compatibility, but if we don't have information in the N-1 config to do the reshape, then we have to be able to punt the reshape (triggered by an upgrade) until after the config is updated | 16:52 |
efried | like if you suddenly specify your PCPU pinset to be empty, but have instances running with dedicated CPUs, that's a fail. | 16:52 |
dansmith | and, I agree that if you have to change how many pcpus are dedicated, we have to reshape again | 16:53 |
efried | I think we're on the same page | 16:53 |
bauzas | efried: that's why I proposed to check the allocations and inventories *before* providing the reshape | 16:53 |
*** ccamacho has quit IRC | 16:53 | |
bauzas | if the operator changes the config, but placement says "sorry but you can't", then the compute will fail to restart | 16:53 |
dansmith | efried: I think the thing I don't want, which I expressed as "only at upgrade time" is something like we reshape every time we restart compute because we decide we can arrange things better, or some state in the db has changed, but reshape due to a config/structural change makes sense | 16:54 |
bauzas | if the operator changes the config, and placement resources are okay, then the driver returns a ReshapeNeeded | 16:54 |
efried | dansmith: wfm | 16:54 |
bauzas | and then the new inventories and allocations | 16:54 |
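bauzas's ordering (validate the new config against existing usage first, then signal a reshape) might look like the minimal sketch below. It assumes usage can be expressed as the set of host CPU ids pinned by running guests. `ReshapeNeeded` here is a local stand-in for nova's exception of that name; `ReshapeImpossible` and `check_cpu_reshape` are hypothetical.

```python
class ReshapeNeeded(Exception):
    """Local stand-in for nova's ReshapeNeeded exception."""


class ReshapeImpossible(Exception):
    """Hypothetical: raised so the compute service refuses to start."""


def check_cpu_reshape(dedicated_set, pinned_in_use, pcpu_reported):
    """Validate a config-driven VCPU -> PCPU reshape at compute startup.

    dedicated_set: CPU ids from cpu_dedicated_set, or None if the option is unset.
    pinned_in_use: CPU ids currently pinned by running guests.
    pcpu_reported: whether PCPU inventory is already reported to placement.
    """
    if dedicated_set is None:
        # Option unset: old world, keep reporting everything as VCPU.
        return None
    missing = set(pinned_in_use) - set(dedicated_set)
    if missing:
        # e.g. an empty pinset while dedicated guests are running
        raise ReshapeImpossible(
            "pinned CPUs outside cpu_dedicated_set: %s" % sorted(missing))
    if not pcpu_reported:
        # Inventory and allocations both need moving from VCPU to PCPU.
        raise ReshapeNeeded()
    return None
```

Checking membership rather than just counts also covers efried's concern that a mismatched pinset would mean re-pinning guests, not only moving allocations.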
bauzas | okay, so dansmith, efried and I are on the same page | 16:54 |
bauzas | there is one last concern from stephenfin about the config knob for the PCPU request | 16:55 |
efried | So e.g. in the PCPU spec, we're inferring the counts and pinsets of VCPUs vs PCPUs based on existing conf options. | 16:55 |
bauzas | but I think it's okay to make the request transformation to be "config-driven" | 16:55 |
efried | So the operator needs to change to the new config in such a way that it *exactly* matches what we inferred, right? | 16:56 |
bauzas | so, the operator would basically tell us when they're okay to count PCPUs (i.e. probably after the config change on all nodes) | 16:56 |
*** markvoelker has quit IRC | 16:56 | |
efried | Otherwise we don't just need a reshape (move allocations) - we would also possibly need to re-pin guests to different physical processors and such. | 16:56 |
bauzas | efried: no, I'm saying that existing config will report VCPUs anyway | 16:57 |
bauzas | (including options that were asking for pinned CPUs) | 16:57 |
bauzas | efried: only the new config option (explicitly cpu_dedicated_set) will trigger a reshape | 16:57 |
efried | yes, I get that bauzas, what I'm saying is, we're going to *infer* VCPU/PCPU counts and pinsets based on legacy conf options; but then the operator wants to cut over to using the new conf options. | 16:57 |
bauzas | no, I don't want us to infer VCPU and PCPU based on those options because they are error-prone | 16:58 |
bauzas | efried: ^ | 16:58 |
bauzas | those options being the legacy ones | 16:58 |
efried | oh, that's the basis for the PCPU spec as written at PS24 anyway. Haven't checked since then... | 16:58 |
bauzas | that's exactly why I'm saying "don't touch anything until the operator explicitly says 'I want cpu_dedicated_set'" | 16:59 |
*** itlinux has joined #openstack-nova | 16:59 | |
bauzas | old world = VCPU | 16:59 |
bauzas | new world = VCPU and PCPU | 17:00 |
bauzas | for the request, trigger the request option when you consider having enough hosts to sustain PCPU requests | 17:00 |
bauzas | anyway, I need to bail out | 17:00 |
bauzas | kids aren't in town, and I promised some evening to my spouse | 17:01 |
efried | I think I see. Did you comment accordingly on the spec? | 17:01 |
bauzas | efried: I think so | 17:01 |
efried | okay. | 17:01 |
sean-k-mooney | efried: yes the cpu spec currently does infer but i agree with bauzas that we should not | 17:03 |
sean-k-mooney | efried: i think not supporting inference from the old config values would remove much of the upgrade impact, or at least help reduce it. | 17:04 |
efried | I'm good with that. Let's get it reflected in the spec | 17:05 |
bauzas | I just provided a comment trying to summarize my thoughts | 17:05 |
bauzas | this said, calling it a day | 17:05 |
sean-k-mooney | o/ | 17:05 |
efried | imacdonn: I agree with your assessment in https://bugs.launchpad.net/nova/+bug/1824435 | 17:06 |
openstack | Launchpad bug 1824435 in OpenStack Compute (nova) stein "fill_virtual_interface_list migration fails on second attempt" [High,Triaged] | 17:06 |
efried | (I wanted to say that, but not pollute the bug with it) | 17:06 |
mriedem | dansmith: you want to check my rolling upgrade validation logic that i dumped in gibi's spec https://review.openstack.org/#/c/652608/4/specs/train/approved/server-move-operations-with-ports-having-resource-request.rst@190 ? | 17:06 |
dansmith | mriedem: I probably have to read the whole spec to make sense of that huh? | 17:10 |
imacdonn | efried, which one? i.e. do you think we need to address (2) or would fixing (1) obviate that ? | 17:11 |
*** munimeha1 has joined #openstack-nova | 17:11 | |
*** priteau has quit IRC | 17:11 | |
efried | imacdonn: fixing (1) would obviate. But that assumes we can do so. | 17:11 |
efried | imacdonn: IMO we should fix (1) and change (2) to raise an explicit exception to "guarantee" it. | 17:12 |
imacdonn | efried, right ... so now I'm trying to understand why the row is being created in the first place | 17:12 |
efried | ++ | 17:12 |
efried | imacdonn: Is it possible for the rows to differ in any material way? | 17:12 |
efried | (i.e. a way that makes a difference to the outcome) | 17:13 |
mriedem | dansmith: not really, it's just the usual "how could this fail during an upgrade" | 17:13 |
mriedem | dansmith: he needs to pass new parameters to compute rpc api methods, | 17:13 |
dansmith | well, I read it and seemed like I needed to understand more, so I'm reading the wholething now | 17:13 |
mriedem | which could be (1) stein computes that don't handle those or (2) rpc pinned so we pop those parameters | 17:13 |
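Case (2), an RPC pin that forces the new parameters to be dropped, can be sketched as below. In nova this check is normally made with oslo.messaging's `can_send_version()`; this standalone version compares dotted version strings directly, and `compatible_call`, the version numbers, and the `request_spec` argument used in the checks are made up for illustration.

```python
def _version_tuple(version):
    """'5.10' -> (5, 10), so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def compatible_call(version_cap, new_version, kwargs, new_params):
    """Return (version, kwargs) that are safe to send under an RPC version cap.

    version_cap: the pinned compute RPC version during a rolling upgrade.
    new_version: the version that introduced new_params.
    """
    if _version_tuple(new_version) <= _version_tuple(version_cap):
        return new_version, dict(kwargs)
    # Older computes may be listening: strip arguments they don't understand
    # and send at the capped version instead.
    trimmed = {k: v for k, v in kwargs.items() if k not in new_params}
    return version_cap, trimmed
```

The upgrade question is then what the caller does when the parameters get trimmed: either the operation must still be correct without them, or the API has to reject it until the pin is lifted.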
mriedem | ok | 17:13 |
imacdonn | efried, I'm assuming that _security_group_get_by_names() is used elsewhere (or at least intended to be reusable), so probably should consider all possible use-cases, if we tackle that one | 17:13 |
efried | used in two places | 17:14 |
mriedem | imacdonn: efried: honestly i'm not sure how much relevance that code even has anymore if you're using neutron | 17:15 |
imacdonn | mriedem, I was wondering about that | 17:15 |
mriedem | i think it at least means if you're using neutron, every project that ever created an instance in nova has a 'default' security_groups table record that is never used or cleaned up | 17:16 |
mriedem | vestigial | 17:16 |
imacdonn | that seems plausible | 17:17 |
dansmith | mriedem: see if my words help at all | 17:22 |
*** ricolin has quit IRC | 17:25 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove '/os-cells' REST APIs https://review.openstack.org/651291 | 17:26 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Stop handling cells v1 in '/os-hypervisors' API https://review.openstack.org/651292 | 17:26 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Stop handling cells v1 in '/os-servers' API https://review.openstack.org/651293 | 17:26 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove 'nova-manage cell' commands https://review.openstack.org/651294 | 17:26 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Stop handling cells v1 for console authentication https://review.openstack.org/651295 | 17:26 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove old-style cell v1 instance listing https://review.openstack.org/651296 | 17:26 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove 'bdm_(update_or_create|destroy)_at_top' https://review.openstack.org/651297 | 17:26 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove 'instance_fault_create_at_top' https://review.openstack.org/651298 | 17:26 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove 'instance_info_cache_update_at_top' https://review.openstack.org/651299 | 17:26 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove 'get_keypair_at_top' https://review.openstack.org/651300 | 17:26 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove 'instance_update_at_top', 'instance_destroy_at_top' https://review.openstack.org/651301 | 17:26 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove 'instance_update_from_api' https://review.openstack.org/651302 | 17:26 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Stop handling 'update_cells' on 'BandwidthUsage.create' https://review.openstack.org/651303 | 17:26 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Stop handling cells v1 for instance naming https://review.openstack.org/651304 | 17:26 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove cells code https://review.openstack.org/651306 | 17:26 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Stop handling 'InstanceUnknownCell' exception https://review.openstack.org/651307 | 17:26 |
*** sapd1_x has quit IRC | 17:26 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove unnecessary wrapper https://review.openstack.org/651308 | 17:26 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: db: Remove cell APIs https://review.openstack.org/651309 | 17:26 |
mriedem | dansmith: replied but yeah i think so | 17:27 |
*** hamzy has joined #openstack-nova | 17:27 | |
dansmith | cool | 17:27 |
sean-k-mooney | mriedem: so, related but separate from gibi's spec: should we just use the live migration multiple port binding workflow for all move operations? | 17:28 |
sean-k-mooney | mriedem: i mentioned it in gibi's spec but it would simplify things to have a single common code path | 17:29 |
mriedem | given the issues NewBruce is hitting idk | 17:29 |
mriedem | moving all move operations to that model would be a big change i think, and likely not something we should block gibi's spec on | 17:30 |
* mriedem lunches | 17:30 | |
sean-k-mooney | oh im not suggesting we should block on it | 17:30 |
sean-k-mooney | it's just that some/a lot of the complexity would be reduced | 17:31 |
sean-k-mooney | but NewBruce's issue is concerning, i'll grant that | 17:31 |
*** ralonsoh has quit IRC | 17:34 | |
francoisp_ | alex_xu, hi, would you have time to look at https://review.openstack.org/#/c/648123/6 - thanks! | 17:35 |
*** dtantsur is now known as dtantsur|afk | 17:36 | |
*** dklyle has quit IRC | 17:45 | |
*** Sundar has joined #openstack-nova | 17:45 | |
*** priteau has joined #openstack-nova | 17:50 | |
*** igordc has joined #openstack-nova | 17:50 | |
*** erlon has quit IRC | 17:54 | |
*** erlon has joined #openstack-nova | 17:56 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Do not create default security group during instance create if using Neutron https://review.openstack.org/653065 | 18:05 |
mriedem | imacdonn: let's see what blows up ^ | 18:05 |
*** erlon has quit IRC | 18:07 | |
imacdonn | mriedem: hmm .. wouldn't use_neutron be true in most cases, when the migration is being run ? | 18:09 |
mriedem | yes use_neutron is the default and what 99% of deployments are probably using at this point | 18:10 |
imacdonn | mriedem: I haz the dumb ... how would this solve the problem? | 18:11 |
mriedem | we don't hit the problem code if you're using neutron | 18:12 |
imacdonn | but .. I am using neutron, and I do hit the problem | 18:12 |
mriedem | with this patch | 18:12 |
mriedem | ? | 18:12 |
imacdonn | no, but the patch only makes a difference if you're not using neutron | 18:12 |
imacdonn | (?) | 18:12 |
mriedem | if not NEUTRON: create default sec group | 18:13 |
imacdonn | oh wait, I had it upside-down | 18:13 |
imacdonn | yeah OK ... I was about to propose making _security_group_ensure_default() return None if the context has no project_id (which does make the migration work for me) | 18:14 |
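Combining mriedem's `if not NEUTRON` condition with imacdonn's project_id guard, a minimal sketch could look like this. `maybe_ensure_default_security_group` and its `ensure_default` callback are hypothetical stand-ins for the real DB API call, not its actual signature.

```python
def maybe_ensure_default_security_group(context, use_neutron, ensure_default):
    """Create nova's 'default' security group only when it can be meaningful.

    ensure_default is a hypothetical callback standing in for the DB API call
    that creates (or fetches) the project's default group.
    """
    if use_neutron:
        # Neutron owns security groups; a nova DB row would never be used.
        return None
    if getattr(context, "project_id", None) is None:
        # An admin context (e.g. from an online data migration) has no project,
        # so creating a 'default' group would insert a row with NULL project_id.
        return None
    return ensure_default(context)
```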
imacdonn | wonder if the migration hits https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1740 | 18:16 |
imacdonn | mriedem: ^ I think it will | 18:19 |
*** luksky has joined #openstack-nova | 18:21 | |
imacdonn | mriedem: confirmed that I still hit the problem with your change, for above reason | 18:22 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Do not create default security group during instance create if using Neutron https://review.openstack.org/653065 | 18:24 |
mriedem | try this ^ | 18:24 |
imacdonn | mriedem: that seems to work (at least allow the migration to work) | 18:28 |
*** priteau has quit IRC | 18:33 | |
mriedem | i still don't know how to recreate your issue | 18:35 |
imacdonn | you'd have to delete the marker instance (the one with uuid 00000000-0000-0000-0000-000000000000) | 18:35 |
mriedem | ok so like, | 18:36 |
mriedem | 1. run data migration, | 18:36 |
mriedem | 2. archive/purge deleted records | 18:36 |
mriedem | 3. run data migration | 18:36 |
imacdonn | I think you have to have at least one project with at least one instance | 18:36 |
imacdonn | (that's not deleted, I assume) | 18:37 |
*** irclogbot_2 has quit IRC | 18:39 | |
*** irclogbot_1 has joined #openstack-nova | 18:41 | |
aspiers | efried: haven't been following the channel but I'm around for the next few hours in case you get to reviewing the SEV spec | 18:42 |
efried | aspiers: ack | 18:43 |
imacdonn | mriedem: I'm not sure of the exact sequence that gets me into the bad state to begin with, but it has happened repeatedly (after both upgrade and fresh install) ... to force it, you may have to (delete marker instance, run the migration) twice | 18:44 |
mriedem | on a fresh install you wouldn't have any instances to migrate so i'm not sure how the marker is getting created | 18:45 |
mriedem | unless you mean: 1. create a test server, 2. run the migration, 3. delete the marker record 4. run the migration again | 18:46 |
imacdonn | mriedem: by fresh install, I mean that it was not an upgrade ... so like 1) install, 2) create a test instance ... some time later; 3) run the migration | 18:48 |
mriedem | yeah ok | 18:48 |
melwitt | random question: does a cold migration (no change in flavor) resize need to be resize confirmed like a flavor changing resize does? | 18:54 |
mriedem | yes | 18:55 |
imacdonn | Last time I tried, it was required... whether or not it *should* ...... | 18:55 |
melwitt | thanks y'all | 18:56 |
mriedem | imacdonn: well i'm unable to recreate your issue in a functional test but i found a new regression, 500 in the api | 19:09 |
*** eharney has quit IRC | 19:17 | |
cdent | mriedem: you're so good at that | 19:18 |
imacdonn | mriedem: hmm, I was just pondering if maybe the marker instance gets deleted when the last "real" instance for a project is deleted ... in testing that, I just got a "ClientException: Unknown Error (HTTP 504)" - not sure if related | 19:20 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add regression test for bug 1825034 https://review.openstack.org/653098 | 19:20 |
openstack | bug 1825034 in OpenStack Compute (nova) "listing deleted servers from the API fails after running fill_virtual_interface_list online data migration" [High,Confirmed] https://launchpad.net/bugs/1825034 | 19:20 |
mriedem | imacdonn: would be interested to know why my recreate steps for *your* bug don't hit here ^ | 19:20 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add regression test for bug 1825034 https://review.openstack.org/653098 | 19:21 |
openstack | bug 1825034 in OpenStack Compute (nova) "listing deleted servers from the API fails after running fill_virtual_interface_list online data migration" [High,Confirmed] https://launchpad.net/bugs/1825034 | 19:21 |
mriedem | imacdonn: the marker instance is soft deleted as soon as it's created | 19:21 |
mriedem | https://github.com/openstack/nova/blob/master/nova/objects/virtual_interface.py#L308 | 19:22 |
imacdonn | mriedem: ah, right, so it probably requires archival to have happened to reproduce my original problem | 19:23 |
mriedem | that's what my functional test does | 19:23 |
mriedem | but it doesn't hit your issue | 19:23 |
imacdonn | does it create the two null rows in security_groups ? | 19:24 |
gmann | looking for review on these 2 specs - https://review.openstack.org/#/c/603969/ https://review.openstack.org/#/c/547850/ | 19:24 |
gmann | mriedem: would you like to give second round review on this (API cleanup) - https://review.openstack.org/#/c/603969 | 19:24 |
mriedem | imacdonn: that i don't know - this is also sqlite so i'm not sure if sqlite is more strict about null values in constraints than mysql (that would be funny if it is) | 19:24 |
mriedem | gmann: it's in the queue somewhere | 19:25 |
gmann | ok, thanks | 19:25 |
imacdonn | mriedem: I wouldn't be surprised if sqlite observes null when checking for unique constraint (which we already established that mysql does not) | 19:26 |
imacdonn | mriedem: OTOH https://sqlite.org/faq.html#q26 seems to suggest otherwise | 19:29 |
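The FAQ's claim can be checked directly with Python's built-in sqlite3 module. This minimal sketch assumes a simplified table with a unique constraint over (project_id, name, deleted), loosely modeled on nova's security_groups table rather than copied from it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE security_groups ("
    " id INTEGER PRIMARY KEY, name TEXT, project_id TEXT, deleted INTEGER,"
    " UNIQUE (project_id, name, deleted))"
)
insert = "INSERT INTO security_groups (name, project_id, deleted) VALUES (?, ?, ?)"

# Two 'default' groups with a NULL project_id: NULLs are treated as distinct,
# so the UNIQUE constraint accepts both rows.
conn.execute(insert, ("default", None, 0))
conn.execute(insert, ("default", None, 0))
null_rows = conn.execute(
    "SELECT COUNT(*) FROM security_groups WHERE project_id IS NULL"
).fetchone()[0]

# The same pair with a real project_id is rejected as a duplicate.
conn.execute(insert, ("default", "p1", 0))
try:
    conn.execute(insert, ("default", "p1", 0))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(null_rows, duplicate_rejected)  # 2 True
```

MySQL's InnoDB unique indexes also allow multiple NULLs, which is consistent with reading the FAQ as saying the two backends behave the same here.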
*** itlinux has quit IRC | 19:30 | |
mriedem | imacdonn: i added this to the test and the test passes http://paste.openstack.org/show/749386/ | 19:31 |
*** jmlowe has quit IRC | 19:33 | |
imacdonn | mriedem: is this test running with the admin context ? | 19:33 |
imacdonn | mriedem: i.e. the one that has no project_id | 19:34 |
*** itlinux has joined #openstack-nova | 19:34 | |
mriedem | ctxt.project_id is an admin context with no project_id, | 19:34 |
mriedem | self.api.project_id is a non-null value | 19:34 |
*** awaugama has quit IRC | 19:34 | |
melwitt | gmann: those are in my queue too | 19:35 |
*** sridharg has quit IRC | 19:36 | |
gmann | melwitt: thanks. | 19:39 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add post-test wrinkle to list deleted servers before archive https://review.openstack.org/653131 | 19:39 |
imacdonn | mriedem: FWIW, I can reproduce my original problem with this sequence: 1) create an instance 2) run migrations 3) archive 4) run migrations | 19:44 |
mriedem | imacdonn: that's what my functional test does though | 19:46 |
mriedem | but let me try in devstack | 19:46 |
mriedem | the neutron fixture in the functional test is likely not creating any virtual interface records | 19:46 |
imacdonn | mriedem: yeah, except sqlite vs. mysql .... or .... ? | 19:46 |
mriedem | sure | 19:46 |
mriedem | i'll spin up a devstack | 19:46 |
mriedem | also need to get up and stretch the legs and get some coffee | 19:46 |
mriedem | sean-k-mooney: dansmith: btw that mitaka->newton regression mentioned yesterday is a thing, i reported a bug https://bugs.launchpad.net/nova/+bug/1825018 | 19:48 |
openstack | Launchpad bug 1825018 in OpenStack Compute (nova) "security group driver gets loaded way too much in the api" [Low,Triaged] | 19:48 |
*** dklyle has joined #openstack-nova | 19:58 | |
*** jmlowe has joined #openstack-nova | 20:01 | |
*** bbowen has quit IRC | 20:03 | |
*** igordc has quit IRC | 20:05 | |
*** eharney has joined #openstack-nova | 20:14 | |
*** weshay has quit IRC | 20:14 | |
*** weshay has joined #openstack-nova | 20:15 | |
*** pcaruana has quit IRC | 20:19 | |
*** priteau has joined #openstack-nova | 20:24 | |
*** priteau has quit IRC | 20:27 | |
mnaser | mriedem: backporting fixed it cleanly. | 20:28 |
mnaser | I have an abandoned backport if anyone wants to cherry pick cause it applies cleanly right now | 20:28 |
mnaser | Left a comment too so if someone finds the bug and sees the proposed but then abandoned patch, they’ll see some useful info there | 20:29 |
openstackgerrit | melanie witt proposed openstack/nova master: Add get_counts() to InstanceMappingList https://review.openstack.org/638072 | 20:30 |
openstackgerrit | melanie witt proposed openstack/nova master: Count instances from mappings and cores/ram from placement https://review.openstack.org/638073 | 20:30 |
openstackgerrit | melanie witt proposed openstack/nova master: Use instance mappings to count server group members https://review.openstack.org/638324 | 20:30 |
openstackgerrit | melanie witt proposed openstack/nova master: Add get_usages_counts_for_quota to SchedulerReportClient https://review.openstack.org/653145 | 20:30 |
openstackgerrit | melanie witt proposed openstack/nova master: Set [quota]count_usage_from_placement = True in nova-next https://review.openstack.org/653146 | 20:30 |
mriedem | mnaser: ah so the heal task fixed up the network info cache? | 20:30 |
mriedem | on that rocky cloud | 20:30 |
mnaser | Yep. Just watched them slowly get repopulated | 20:30 |
mriedem | nice | 20:30 |
mnaser | Had to apply on all computes tho which was annoying but yeah | 20:30 |
mriedem | btw, another reason to not backport that data migration, i just found this https://bugs.launchpad.net/nova/+bug/1824435 | 20:31 |
openstack | Launchpad bug 1824435 in OpenStack Compute (nova) stein "fill_virtual_interface_list migration fails on second attempt" [High,Triaged] | 20:31 |
mnaser | I probably will redeploy again without it.. if they disappear again.. will raise question | 20:31 |
mnaser | Yeah I saw that too as well. Never ran into that though.. yet :X | 20:31 |
mriedem | mnaser: run this as admin on your stein cloud: openstack server list --all-projects --deleted | 20:31 |
mnaser | o should I be worried about that | 20:32 |
mriedem | running the command or the bug? | 20:32 |
mnaser | The command | 20:32 |
mriedem | it's read-only | 20:32 |
mnaser | As in is there some magical bug we’re about to discover | 20:32 |
* mnaser doesn’t feel like more work :p | 20:32 | |
mriedem | heh i've already discovered the bug | 20:32 |
mriedem | i think you already have it | 20:33 |
mnaser | But I can do that later, I also can do it both on a cloud that has db archive enabled and disabled too | 20:33 |
mriedem | sure the workaround is to archive, but you have to do it after running that migration every time | 20:33 |
efried | aspiers: Still hanging around? | 20:39 |
mriedem | imacdonn: efried: yup, recreated http://paste.openstack.org/show/749391/ | 20:41 |
efried | woot, in a func test? | 20:41 |
mriedem | no, devstack | 20:41 |
openstackgerrit | melanie witt proposed openstack/nova master: Count instances from mappings and cores/ram from placement https://review.openstack.org/638073 | 20:41 |
openstackgerrit | melanie witt proposed openstack/nova master: Set [quota]count_usage_from_placement = True in nova-next https://review.openstack.org/653146 | 20:41 |
openstackgerrit | melanie witt proposed openstack/nova master: Use instance mappings to count server group members https://review.openstack.org/638324 | 20:41 |
*** ttsiouts has joined #openstack-nova | 20:44 | |
efried | mriedem: Well, if you can do it in devstack, you can at least mock it in a func test I guess. | 20:48 |
mriedem | not necessarily - func test is using sqlite | 20:49 |
mriedem | devstack is using mysql | 20:49 |
*** cdent has quit IRC | 20:50 | |
mriedem | but https://sqlite.org/faq.html#q26 suggests sqlite and mysql have the same behavior about how nulls are handled in unique constraints | 20:51 |
mriedem | efried: but my functional test is doing the same steps https://review.openstack.org/#/c/653098/2/nova/tests/functional/regressions/test_bug_1825034.py | 20:51 |
*** hamzy has quit IRC | 20:54 | |
*** wwriverrat has joined #openstack-nova | 20:56 | |
imacdonn | mriedem: I think it'd be interesting to see if your sqlite db has the duplicate rows after the migration is run the first time ... is it possible to query that? Not sure what conditions the tests run under.... | 20:59 |
mriedem | i've added an assertion for that locally and it's just returning 1 security group for the null project_id | 21:00 |
imacdonn | OK, so that FAQ is wrong .. or we're misinterpreting it | 21:01 |
openstackgerrit | Dakshina Ilangovan proposed openstack/nova-specs master: Resource Management Daemon - Last Level Cache https://review.openstack.org/651233 | 21:04 |
openstackgerrit | Dustin Cowles proposed openstack/nova master: Introduces the openstacksdk to nova https://review.openstack.org/643664 | 21:05 |
openstackgerrit | Dakshina Ilangovan proposed openstack/nova-specs master: Resource Management Daemon - Last Level Cache https://review.openstack.org/651233 | 21:12 |
*** wwriverrat has quit IRC | 21:13 | |
aspiers | efried: back | 21:22 |
aspiers | although it's getting late-ish here | 21:22 |
*** mchlumsky_ has quit IRC | 21:23 | |
cfriesen | anyone here know OVMF? Looks like centos 7.6 has modified the OVMF-20180508-3 rpm to no longer contain the file /usr/share/OVMF/OVMF_CODE.fd that nova looks for in nova/virt/libvirt/driver.py. Instead it now seems to be named /usr/share/OVMF/OVMF_CODE.secboot.fd | 21:24 |
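One possible direction is to probe a list of candidate loader paths instead of a single hard-coded file. A minimal sketch: the two paths are the ones named above, but `find_uefi_loader` and the idea of treating them as ordered fallbacks are assumptions, not nova's current behavior, and whether the secboot build is a drop-in replacement is not settled here.

```python
import os

# The first path is the one nova is said to look for; the second is the name
# the CentOS 7.6 OVMF rpm now ships. Ordered fallback is an assumption.
OVMF_CANDIDATES = (
    "/usr/share/OVMF/OVMF_CODE.fd",
    "/usr/share/OVMF/OVMF_CODE.secboot.fd",
)


def find_uefi_loader(candidates=OVMF_CANDIDATES):
    """Return the first existing loader path, or None if none is installed."""
    for path in candidates:
        if os.path.exists(path):
            return path
    return None
```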
openstackgerrit | Dakshina Ilangovan proposed openstack/nova-specs master: Resource Management Daemon - Last Level Cache https://review.openstack.org/651233 | 21:24 |
mnaser | mriedem: is there a patch/fix for the `openstack server list --all-projects --deleted` thing? | 21:28 |
imacdonn | mnaser, discussion at https://bugs.launchpad.net/nova/+bug/1825034 , I suppose | 21:35 |
openstack | Launchpad bug 1825034 in OpenStack Compute (nova) stein "listing deleted servers from the API fails after running fill_virtual_interface_list online data migration" [High,Confirmed] | 21:35 |
mriedem | mnaser: i don't have a fix no | 21:38 |
mriedem | the workaround is to archive | 21:38 |
mriedem | i put some thoughts into the bug report but they kind of all suck | 21:38 |
mnaser | yeah, I went through it, none are really ideal | 21:38 |
mnaser | is --deleted ever supposed to actually return data? | 21:39 |
mriedem | yeah | 21:39 |
mriedem | until you archive | 21:39 |
mnaser | I don't remember the openstack api returning deleted records | 21:39 |
mnaser | but TIL I guess | 21:39 |
melwitt | I think that might be the only case when it does | 21:39 |
mriedem | there is no guarantee that you'll get results because it depends on how the cloud is setup to archive | 21:39 |
mnaser | of course | 21:39 |
mnaser | I just didn't know there was an actual api way | 21:39 |
melwitt | and I agree all of the options suck. I kind of liked the last option mriedem put on the bug but that doesn't help if the migration was only run once (and completed). or maybe the migration could hard delete the marker instance itself if it completed. still, if multiple runs are needed it sucks | 21:40 |
mriedem | right, so the only way to hit this i think is to filter on all_tenants and deleted, which at least thankfully is admin-only | 21:40 |
mriedem | could break some internal tools | 21:40 |
mriedem | but shouldn't break external users | 21:40 |
mnaser | tbh | 21:43 |
mnaser | all_tenants+deleted will probably hurt a lot in a bigger environment anyways | 21:43 |
mriedem | as in spanking your dbs? | 21:43 |
imacdonn | or just making a lot of output | 21:44 |
mnaser | both | 21:45 |
imacdonn | up to api.max_limit, I guess | 21:45 |
mnaser | hits the db hard, hits tons of apis hard too | 21:45 |
melwitt | yeah, was gonna say max_limit saves the day | 21:45 |
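For reference, a minimal sketch of how a list API can cap page size, assuming the `[api]max_limit` default of 1000. `effective_limit` is a hypothetical helper; nova's real code may differ in details such as how it validates negative or non-integer limits.

```python
def effective_limit(requested, max_limit=1000):
    """Clamp a requested page size to the configured maximum.

    With no explicit limit the caller still gets at most max_limit rows, so an
    all_tenants + deleted listing is bounded per request (callers page with
    markers to see more).
    """
    if requested is None:
        return max_limit
    return min(requested, max_limit)
```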
mnaser | yeah we have "don't do it™" rule but that might be a good stop gap | 21:45 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Exclude fake marker instance when listing servers https://review.openstack.org/653158 | 21:51 |
mriedem | well here is one option ^ | 21:51 |
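The idea behind that option can be sketched as a filter over the listing. This is a hypothetical illustration with instances as plain dicts, not the content of the linked review, which may instead filter in SQL.

```python
# Assumption: the marker's value is the all-zeros UUID quoted earlier in the
# discussion; in nova the constant lives in the virtual_interface module.
FAKE_MARKER_UUID = "00000000-0000-0000-0000-000000000000"


def exclude_marker(instances):
    """Return the listing without the online data migration's marker record."""
    return [inst for inst in instances if inst.get("uuid") != FAKE_MARKER_UUID]
```

Filtering in the query rather than afterwards would keep page sizes and pagination markers consistent with `max_limit`; a post-query filter can return one row fewer than the requested limit.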
imacdonn | the fact that the fake UUID is defined in virtual_interface makes it slightly more icky :/ | 21:52 |
mriedem | mnaser: btw, remind me to bring this up the next time you ask that the online data migrations use markers to be more efficient :) | 21:53 |
*** itlinux has quit IRC | 21:53 | |
mnaser | mriedem: bahaha | 21:53 |
mnaser | I think I liked the idea jaypipes proposed of using different migration repos like keystone does | 21:54 |
mnaser | but I think that means you can't batch them | 21:54 |
melwitt | lol, touche | 21:54 |
dansmith | mnaser: the point of that suggestion was to *not* batch them | 21:54 |
dansmith | mnaser: it also doesn't really work well for translations we have to do in python, which is a lot of them | 21:54 |
mnaser | ah yes I see | 21:55 |
mnaser | its a lot of rebuilding data | 21:55 |
dansmith | it just helps us avoid re-running them on presumed idempotency | 21:55 |
mnaser | rather than drop column for add column | 21:55 |
dansmith | and we could solve that pretty easily with something else | 21:55 |
mnaser | a .... marker? | 21:55 |
mnaser | :-P | 21:55 |
dansmith | doesn't have to be in-band | 21:55 |
mriedem | speaking of online data migrations, want to drop this old one now? https://review.openstack.org/#/c/651001/ | 21:55 |
mnaser | yeah | 21:56 |
dansmith | just a feature flag sort of "I've converted all the flavors, stop asking me" | 21:56 |
dansmith | problem is, | 21:56 |
mriedem | we could also be better about following up and cleaning these things up | 21:56 |
dansmith | if you end up with some older services by accident, you create old data, and stop running the migrations afterwards with no way of cleaning them up | 21:56 |
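A minimal sketch of the out-of-band "stop asking me" flag, assuming migrations follow the `(found, done)` return convention of nova's online data migrations. `MigrationTracker` and the fake migration are hypothetical; nova has no such registry.

```python
class MigrationTracker:
    """Hypothetical out-of-band record of finished online data migrations."""

    def __init__(self):
        self.completed = set()

    def run(self, name, migrate, batch_size=50):
        """Run migrate(batch_size) in batches; return the number of rows migrated.

        Once a pass finds nothing, the migration is recorded as complete and
        later calls skip it entirely.
        """
        if name in self.completed:
            return 0
        total = 0
        while True:
            found, done = migrate(batch_size)
            total += done
            if found == 0:
                self.completed.add(name)
                break
            if done == 0:
                # Rows remain but none could be migrated; do not mark complete.
                break
        return total


# Example: 120 legacy rows, migrated 50 at a time.
_remaining = [120]


def _fake_migration(batch_size):
    found = min(batch_size, _remaining[0])
    _remaining[0] -= found
    return found, found


tracker = MigrationTracker()
first = tracker.run("flavors", _fake_migration)   # 50 + 50 + 20, then found == 0
second = tracker.run("flavors", _fake_migration)  # skipped: already complete
```

The weakness dansmith describes is visible here: if an older service writes legacy rows after "flavors" is marked complete, `run()` never looks again, so such a flag would need an explicit reset or a final full re-check.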
dansmith | mriedem: yup | 21:56 |
mnaser | I guess something that could be neat is opt-in soft delete | 21:57 |
mnaser | or opt-out | 21:57 |
mnaser | that'd probably make life a lot simpler and those online data migrations won't hurt as much, as imho you rarely find a cloud with a million (running) instances, but you're much more likely to have a million VMs (in total) | 21:58 |
dansmith | opt-out from ever showing deleted would be okay, because it's equivalent (api-wise) to running archive in a tight loop in the background | 21:58 |
dansmith | but opt-out from the soft deleting at all is harder to do | 21:58 |
efried | aspiers: Sorry, I missed you again. Left comments on the SEV review. | 21:59 |
aspiers | thanks! | 21:59 |
*** itlinux has joined #openstack-nova | 22:07 | |
*** ttsiouts has quit IRC | 22:10 | |
*** ttsiouts has joined #openstack-nova | 22:11 | |
*** munimeha1 has quit IRC | 22:12 | |
*** ttsiouts has quit IRC | 22:12 | |
mriedem | huh this is fun https://docs.openstack.org/python-openstackclient/latest/cli/command-objects/server.html#server-create | 22:22 |
mriedem | --config-drive <config-drive-volume>|True | 22:22 |
mriedem | Use specified volume as the config drive, or ‘True’ to use an ephemeral drive | 22:22 |
mriedem | as far as i know, that's not at all how that parameter works in the API | 22:22 |
mriedem | dansmith: has nova ever supported passing a volume id for the config drive to server create? | 22:24 |
dansmith | nfi | 22:24 |
dansmith | but no, doesn't sound familiar to me | 22:24 |
mriedem | maybe some special sauce from 2012 https://review.openstack.org/#/c/3994/ | 22:29 |
*** luksky has quit IRC | 22:35 | |
*** rcernin has joined #openstack-nova | 22:38 | |
imacdonn | mriedem: speaking of cleaning up after migrations .... I wonder if something should be done to handle those null security_groups rows that would have been created for anyone who has already run the migrations while instances exist ... | 22:40 |
mriedem | definitely maybe | 22:43 |
imacdonn | seems like the sort of thing that could come back to bite later :/ | 22:43 |
eandersson | Yo | 22:49 |
eandersson | https://github.com/openstack/nova/commit/35f49f403534e174578dcd1b9ab33daf6f14c3e8 | 22:50 |
eandersson | We need this in stable/rocky | 22:50 |
eandersson | ironic_url does not actually do anything in the ironicclient for Rocky | 22:50 |
eandersson | So ironic does not respect regions at all | 22:50 |
eandersson | TheJulia, ^ | 22:50 |
*** tkajinam has joined #openstack-nova | 22:54 | |
mriedem | eandersson: and that's due to https://review.openstack.org/#/c/359061/ in python-ironicclient in rocky? | 22:56 |
eandersson | actually it's odd | 22:57 |
eandersson | convert_keystoneauth_opts should fix that | 22:57 |
mriedem | i'm not sure how easy it is to backport that given it's dependent on ironicclient >= 2.4.0 | 22:58 |
eandersson | Yea - let me do a bit more research | 22:59 |
TheJulia | eandersson: I was just about to link tot he discussion http://eavesdrop.openstack.org/irclogs/%23openstack-ironic/%23openstack-ironic.2019-04-16.log.html#t2019-04-16T22:39:49 | 23:00 |
TheJulia | to the | 23:00 |
*** whoami-rajat has quit IRC | 23:02 | |
melwitt | mriedem: fyi, I got some more reviews on https://review.openstack.org/611974, even got a +1 from melissaml. should be good to go | 23:14 |
mriedem | i can't handle that at this late hour | 23:17 |
mriedem | but will star it | 23:17 |
openstackgerrit | Adam Spiers proposed openstack/nova-specs master: Re-approve AMD SEV support for Train https://review.openstack.org/641994 | 23:18 |
*** tosky has quit IRC | 23:19 | |
openstackgerrit | melanie witt proposed openstack/nova master: Fix assert in test_libvirt_info_scsi_with_unit https://review.openstack.org/653168 | 23:21 |
melwitt | mriedem: thanks. I got antsy with some of the pike changes going to the gate | 23:21 |
*** bbowen has joined #openstack-nova | 23:26 | |
*** mlavalle has quit IRC | 23:33 | |
*** itlinux has quit IRC | 23:35 | |
aspiers | kashyap: definitely need your input on https://review.openstack.org/#/c/641994/6/specs/train/approved/amd-sev-libvirt-support.rst@123 :) | 23:37 |
* aspiers goes to bed | 23:37 | |
*** avolkov has quit IRC | 23:52 | |