*** artom has joined #openstack-nova | 00:00 | |
*** xek has joined #openstack-nova | 00:01 | |
*** whoami-rajat has quit IRC | 00:01 | |
*** mlavalle has quit IRC | 00:04 | |
*** jistr has quit IRC | 00:15 | |
*** jistr has joined #openstack-nova | 00:15 | |
*** gyee has quit IRC | 00:49 | |
*** ricolin has joined #openstack-nova | 00:55 | |
*** igordc has quit IRC | 01:04 | |
*** slaweq has joined #openstack-nova | 01:11 | |
*** slaweq has quit IRC | 01:15 | |
*** KeithMnemonic has quit IRC | 01:16 | |
*** brinzhang has joined #openstack-nova | 01:28 | |
*** mvkr_ has quit IRC | 01:41 | |
*** mriedem has quit IRC | 01:49 | |
*** mvkr_ has joined #openstack-nova | 01:54 | |
*** whoami-rajat has joined #openstack-nova | 03:06 | |
*** slaweq has joined #openstack-nova | 03:11 | |
*** slaweq has quit IRC | 03:16 | |
*** psachin has joined #openstack-nova | 03:38 | |
*** rcernin has quit IRC | 03:55 | |
openstackgerrit | ZHOU YAO proposed openstack/nova master: Preserve UEFI NVRAM variable store https://review.opendev.org/621646 | 04:01 |
---|---|---|
*** udesale has joined #openstack-nova | 04:06 | |
*** etp has joined #openstack-nova | 04:07 | |
*** pcaruana has joined #openstack-nova | 04:44 | |
*** boxiang has quit IRC | 04:48 | |
*** boxiang has joined #openstack-nova | 04:48 | |
*** amodi has quit IRC | 04:49 | |
*** pcaruana has quit IRC | 04:56 | |
*** Luzi has joined #openstack-nova | 05:03 | |
openstackgerrit | melanie witt proposed openstack/nova-specs master: Propose policy rule for host status UNKNOWN https://review.opendev.org/666181 | 05:05 |
*** slaweq has joined #openstack-nova | 05:11 | |
*** slaweq has quit IRC | 05:16 | |
*** ratailor has joined #openstack-nova | 05:30 | |
*** rcernin has joined #openstack-nova | 05:33 | |
*** brault has quit IRC | 05:35 | |
*** tetsuro has joined #openstack-nova | 05:56 | |
*** jaosorior has quit IRC | 06:03 | |
*** rcernin has quit IRC | 06:03 | |
*** tetsuro has quit IRC | 06:04 | |
*** shilpasd has joined #openstack-nova | 06:06 | |
*** belmoreira has joined #openstack-nova | 06:08 | |
*** brinzhang has quit IRC | 06:10 | |
*** brinzhang has joined #openstack-nova | 06:11 | |
*** slaweq has joined #openstack-nova | 06:11 | |
*** mkrai has joined #openstack-nova | 06:13 | |
*** slaweq has quit IRC | 06:15 | |
*** rcernin has joined #openstack-nova | 06:18 | |
*** jaosorior has joined #openstack-nova | 06:20 | |
*** pcaruana has joined #openstack-nova | 06:21 | |
*** rcernin has quit IRC | 06:21 | |
*** rcernin has joined #openstack-nova | 06:21 | |
*** dpawlik has joined #openstack-nova | 06:31 | |
*** maciejjozefczyk has joined #openstack-nova | 06:32 | |
*** slaweq has joined #openstack-nova | 06:33 | |
*** udesale has quit IRC | 06:33 | |
*** udesale has joined #openstack-nova | 06:34 | |
*** luksky123 has joined #openstack-nova | 06:48 | |
-openstackstatus- NOTICE: The git service on opendev.org is currently down. | 06:50 | |
*** ChanServ changes topic to "The git service on opendev.org is currently down." | 06:50 | |
*** dpawlik has quit IRC | 06:50 | |
*** yaawang has quit IRC | 06:55 | |
*** yaawang has joined #openstack-nova | 06:57 | |
*** ChipOManiac has joined #openstack-nova | 06:58 | |
openstackgerrit | ZHOU YAO proposed openstack/nova master: Preserve UEFI NVRAM variable store https://review.opendev.org/621646 | 07:04 |
*** rcernin has quit IRC | 07:06 | |
*** tesseract has joined #openstack-nova | 07:15 | |
*** udesale has quit IRC | 07:16 | |
*** adriant has quit IRC | 07:17 | |
*** dpawlik has joined #openstack-nova | 07:17 | |
*** udesale has joined #openstack-nova | 07:18 | |
*** adriant has joined #openstack-nova | 07:18 | |
*** rpittau|afk is now known as rpittau | 07:22 | |
ChipOManiac | Hi guys. We have an openstack cluster with three KVM compute hosts that we setup via openstack-ansible. We've set up a single Nova-LXD compute unit and then imported a tgz ubuntu cloud image into our images list. | 07:25 |
ChipOManiac | Our problem seems to be with launching any LXD instances on this new Nova-LXD compute host. | 07:26 |
ChipOManiac | If we try starting an LXD instance with this image. Nova creates a KVM instance and tries to boot the tgz with it. | 07:27 |
ChipOManiac | Obviously that won't work. | 07:27 |
ChipOManiac | Is there any way to make the LXD instance work here? | 07:28 |
ChipOManiac | I've seen Ubuntu charms deployments have a 'root-tar' image format. I don't see that here in our openstack-ansible. Is there any way for me to add this disk format? | 07:29 |
*** boxiang has quit IRC | 07:30 | |
*** boxiang_ has joined #openstack-nova | 07:30 | |
*** tssurya has joined #openstack-nova | 07:33 | |
*** bhagyashris has joined #openstack-nova | 07:38 | |
*** priteau has joined #openstack-nova | 07:45 | |
*** tkajinam has quit IRC | 07:53 | |
*** tkajinam has joined #openstack-nova | 07:53 | |
*** jaosorior has quit IRC | 07:57 | |
*** ralonsoh has joined #openstack-nova | 08:13 | |
*** ttsiouts has joined #openstack-nova | 08:18 | |
kashyap | aspiers: Catching up with the long scroll here...buried in several things | 08:20 |
*** cdent has joined #openstack-nova | 08:22 | |
*** belmoreira has quit IRC | 08:28 | |
*** panda has quit IRC | 08:28 | |
*** belmoreira has joined #openstack-nova | 08:29 | |
*** belmoreira has quit IRC | 08:29 | |
*** panda has joined #openstack-nova | 08:31 | |
*** mrch_ has joined #openstack-nova | 08:33 | |
-openstackstatus- NOTICE: Services at opendev.org like our git server and at openstack.org are currently down, looks like an outage in one of our cloud providers. | 08:33 | |
*** ChanServ changes topic to "Services at opendev.org like our git server and at openstack.org are currently down, looks like an outage in one of our cloud providers." | 08:33 | |
*** tkajinam has quit IRC | 08:36 | |
*** tetsuro has joined #openstack-nova | 08:37 | |
*** ChanServ changes topic to "Current runways: https://etherpad.openstack.org/p/nova-runways-train -- This channel is for Nova development. For support of Nova deployments, please use #openstack." | 08:40 | |
-openstackstatus- NOTICE: The problem in our cloud provider has been fixed, services should be working again | 08:40 | |
*** luksky123 has quit IRC | 08:54 | |
openstackgerrit | Merged openstack/nova master: Correct project/user id descriptions for os-instance-actions https://review.opendev.org/670027 | 08:58 |
stephenfin | alex_xu: Would you be okay with me fixing this up in a follow-up? | 09:01 |
stephenfin | <mschuppert> kashyap: yes. when you'd reserve one host per supported version and can not get multiple versions on one host. | 09:01 |
stephenfin | <mschuppert> kashyap: like I mentioned yesterday . the devnest pool is a shared pool for multiple DFGs. there are only 2 systems or so which are really exclusive for compute. | 09:01 |
stephenfin | <kashyap> mschuppert: Hmm, didn't realize that fully | 09:01 |
stephenfin | whoops | 09:01 |
stephenfin | https://review.opendev.org/#/c/551026/ | 09:01 |
stephenfin | HexChat's copy-paste behaviour is FUBAR | 09:01 |
alex_xu | stephenfin: ok, i'm cool with that | 09:02 |
*** luksky123 has joined #openstack-nova | 09:08 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Follow-up for I2936ce8cb293dc80e1a426094fdae6e675461470 https://review.opendev.org/672669 | 09:08 |
stephenfin | alex_xu: ^ | 09:08 |
*** ivve has joined #openstack-nova | 09:08 | |
alex_xu | thanks | 09:08 |
*** lpetrut has joined #openstack-nova | 09:15 | |
*** lpetrut has quit IRC | 09:16 | |
*** lennyb has joined #openstack-nova | 09:16 | |
*** lpetrut has joined #openstack-nova | 09:16 | |
kashyap | aspiers: On "what's wrong with option 1 -- do we need to pass virt_type?" -- if libvirt "knows" that KVM is available, then if you pass virt_type as None to getDomainCapabilities(), it defaults to KVM | 09:17 |
kashyap | I've done a bunch of quick tests | 09:17 |
*** tetsuro has quit IRC | 09:28 | |
*** tetsuro has joined #openstack-nova | 09:29 | |
aspiers | kashyap, sean-k-mooney: I got a reply from our libvirt guy | 09:29 |
aspiers | he said "domcapabilities should be the same with or without virttype" | 09:29 |
aspiers | so I'm not sure why it's a parameter in the API call | 09:29 |
aspiers | "default on a kvm host would be virttype=kvm" | 09:30 |
*** luksky123 has quit IRC | 09:31 | |
kashyap | aspiers: I'm getting you some diffs and results. 1 sec, uploading the files | 09:31 |
kashyap | https://kashyapc.fedorapeople.org/domCapabilities/domCapabilities_without_virt_type.txt | 09:32 |
kashyap | https://kashyapc.fedorapeople.org/domCapabilities/domCapabilities_with_virt_type_kvm.txt | 09:32 |
kashyap | aspiers: If you `diff` them, you'd see no `diff` (besides a one-line unrelated noise) | 09:32 |
aspiers | ok | 09:33 |
kashyap | aspiers: But ... as you guessed, if you explicitly supply virt_type as 'qemu', you would see significant differences | 09:33 |
kashyap | Here is the 'diff': https://kashyapc.fedorapeople.org/domCapabilities/diff_domCapabilities_of_virt_type_kvm_and_qemu.txt | 09:33 |
*** takashin has left #openstack-nova | 09:34 | |
*** luksky123 has joined #openstack-nova | 09:35 | |
kashyap | aspiers: In other words, "your guy" is of course correct :-) | 09:35 |
aspiers | kashyap: thanks | 09:52 |
kashyap | aspiers: I'm still finishing something; but yes, option-4 is what I'd lean towards (at 'debug' level) | 09:53 |
aspiers | I think that's what sean-k-mooney's new PS implemented | 09:53 |
kashyap | And also yes to sean-k-mooney's we _do_ want to use the 'virt_type' when we know. As that will ensure the right CPU features are reported. | 09:54 |
kashyap | aspiers: Ah, okay. I'm lagging behind, as I was basing my comments on your IRC exchange linked in the review. | 09:54 |
kashyap | (Didn't refresh) | 09:54 |
aspiers | kashyap: https://review.opendev.org/#/c/670189/9..10/nova/virt/libvirt/host.py | 09:54 |
kashyap | Thank you | 09:54 |
bhagyashris | stephenfin: I have gone through all your patches, applied them in my environment and did some testing on them. I have a few observations | 09:58 |
stephenfin | shoot | 09:58 |
bhagyashris | stephenfin: 1. I am able to create the pinned instance the old way on your patches. You can see the steps I followed to create the instance here http://paste.openstack.org/show/754840/ | 09:59 |
bhagyashris | stephenfin: 2. I have also checked a few scenarios and saw some issues there. You can see them here http://paste.openstack.org/show/754841/ | 09:59 |
stephenfin | Yeah, I'd expect it to consume both VCPU and PCPU because I don't have the handling code you do. Your implementation is better in that regard | 10:01 |
stephenfin | The other two scenarios are interesting. I wonder what I'm hitting there | 10:01 |
bhagyashris | stephenfin: yeah it's handling this case .... | 10:01 |
bhagyashris | stephenfin: I know the point that's reporting the wrong inventory | 10:02 |
stephenfin | Oh yeah? | 10:04 |
bhagyashris | Here https://review.opendev.org/#/c/671793/4/nova/virt/libvirt/driver.py@6852 if I set [compute] cpu_dedicated_set then it reports VCPU resources as well | 10:04 |
bhagyashris | stephenfin: because in the self._get_vcpu_total() method you are checking if vcpu_pin_set, elif CONF.compute.cpu_shared_set, else all the host_cpus | 10:05 |
stephenfin | Ah, yes. So before that fallthrough case I need a final conditional to check if cpu_dedicated_set is set and return nothing if so | 10:06 |
stephenfin | And ditto for the '_get_vcpu_total' method | 10:06 |
bhagyashris | yes | 10:06 |
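The conditional chain discussed above can be sketched as follows. This is a minimal illustration, not the code under review: the function name `shared_cpus` and its plain-set arguments are hypothetical stand-ins for the config options named in the conversation (`vcpu_pin_set`, `cpu_shared_set`, `cpu_dedicated_set`).

```python
def shared_cpus(host_cpus, vcpu_pin_set=None, cpu_shared_set=None,
                cpu_dedicated_set=None):
    """Return the set of host CPUs to report as VCPU inventory.

    Hypothetical sketch: all arguments are plain sets of CPU IDs
    (or None when the corresponding option is unset).
    """
    if vcpu_pin_set:
        return set(vcpu_pin_set)
    if cpu_shared_set:
        return set(cpu_shared_set)
    if cpu_dedicated_set:
        # The extra check proposed in the discussion: when only dedicated
        # CPUs are configured, report no VCPU inventory at all instead of
        # falling through to every host CPU.
        return set()
    # Fallthrough: nothing configured, so every host CPU is shared.
    return set(host_cpus)
```

Without the third branch, a host with only `cpu_dedicated_set` configured would fall through to the last line and report all host CPUs as VCPU, which matches the behaviour observed in the test run.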
bhagyashris | stephenfin: I will keep testing and review your patches | 10:07 |
stephenfin | bhagyashris: The one I'm most interested in your thoughts on is https://review.opendev.org/#/c/671800/7/nova/objects/numa.py | 10:07 |
stephenfin | Because I think that and the changes to InstanceNUMACell are the biggest differences we have | 10:08 |
stephenfin | I don't know what we do with old NUMACell objects. For those, 'cpu_usage' can contain usage of either pinned (PCPU) or unpinned (VCPU) instance vCPUs | 10:09 |
bhagyashris | stephenfin: yeah, the same question was in my mind | 10:09 |
stephenfin | and we don't ever rebuild the objects from scratch | 10:09 |
stephenfin | instead, we use that numa_usage_from_instances function to add or subtract usage based on a provided instance NUMA topology | 10:10 |
stephenfin | I'm thinking it might make sense to start retrieving all instances associated with a host and building the host NUMA topology object from scratch each time | 10:11 |
stephenfin | but that would involve a join on the instance extra table in some places | 10:11 |
*** mkrai has quit IRC | 10:12 | |
bhagyashris | stephenfin: yeah that is one option | 10:14 |
bhagyashris | stephenfin: what's your opinion about my change? I mean, I made changes in both the InstanceNUMATopology and host NUMATopology | 10:15 |
stephenfin | Yeah, as noted I'm not sure if it's necessary yet. We have the 'cpu_policy' field on that object so we're already able to tell if 'cpuset' describes VCPUs or PCPUs | 10:16 |
stephenfin | That will change when we support both types in the same instance, but we're doing that separately | 10:17 |
stephenfin | Speaking of which, I need to review that spec again today | 10:17 |
*** ttsiouts has quit IRC | 10:17 | |
stephenfin | So I don't mind having it, but I think it might be unnecessary for now and possibly make things a little more complicated than necessary | 10:17 |
*** ttsiouts has joined #openstack-nova | 10:18 | |
bhagyashris | stephenfin: okay, but what I thought is that we are going to support both VCPU and PCPU in future anyway, so it will not cause any problem even if we keep it now, and I have added the api and scheduler check that doesn't allow both PCPU and VCPU in one request | 10:19 |
bhagyashris | stephenfin: so in future it will just be a matter of removing that check | 10:20 |
stephenfin | Yup, I get that. Maybe it makes sense. I haven't really parsed how much complexity it adds so maybe it's not an issue | 10:20 |
stephenfin | I just wanted to highlight that it wasn't 100% necessary yet, if that makes sense | 10:20 |
sean-k-mooney | well we will need to add a field to store the mask of pinned cores | 10:20 |
sean-k-mooney | which we should not add until we need it | 10:20 |
sean-k-mooney | e.g. we should not make object changes that would only be required when we allow mixed instances | 10:21 |
sean-k-mooney | until we support that | 10:21 |
stephenfin | Yeah, that's my gut feeling too | 10:21 |
stephenfin | YAGNI or something like that | 10:21 |
bhagyashris | stephenfin, sean-k-mooney: ok. | 10:22 |
*** ttsiouts has quit IRC | 10:22 | |
stephenfin | bhagyashris: On the plus side, it should be very easy to reuse it if/when that spec to allow mixed instances gets merged, so that's a win :) | 10:23 |
bhagyashris | stephenfin, sean-k-mooney: means for now we will consider the cpuset only | 10:24 |
bhagyashris | stephenfin: ok | 10:24 |
bhagyashris | stephenfin: and one more point we are going to allow the new syntax of flavor extra specs like "resources:PCPU=<no of cpus>" right? | 10:26 |
stephenfin | I think we decided we would have to, yes | 10:26 |
stephenfin | Though it wouldn't be the preferred option, of course | 10:26 |
*** brtknr has quit IRC | 10:27 | |
bhagyashris | stephenfin: and I saw that this is not yet implemented in your series of patches ... and I have implemented that so may be we can use that code | 10:27 |
stephenfin | bhagyashris: I'm working on rebasing your stuff into my series (keeping authorship, of course) to do just that now :) | 10:28 |
*** cdent has quit IRC | 10:29 | |
bhagyashris | stephenfin: also the upgrade and reshape part is not in your series of patches and I have submitted the patch https://review.opendev.org/#/c/672224/1 to do so. I will address all your review comments on it and will upload the patch (this is priority work for me) | 10:30 |
bhagyashris | stephenfin: now only the part remaining is which one will be the better option this one https://review.opendev.org/#/c/672223/1 or this one https://review.opendev.org/#/c/671801/7 | 10:32 |
*** brtknr has joined #openstack-nova | 10:33 | |
stephenfin | Yeah, pretty much | 10:34 |
bhagyashris | stephenfin: I guess this one will be the better option https://review.opendev.org/#/c/672223/1 because it's simple and also, as mentioned in the spec, we will use the scheduler profiler https://review.opendev.org/#/c/555081/28/specs/train/approved/cpu-resources.rst@451 | 10:34 |
stephenfin | I'm still on the fence, personally | 10:34 |
stephenfin | Yeah, definitely a lot less work there | 10:34 |
stephenfin | The only thing is that rewriting extra specs on the user feels a little wrong. I'm not sure why | 10:35 |
stephenfin | Assuming these are persisted somewhere | 10:35 |
bhagyashris | stephenfin: yeah, but it looks simple that and less work and there is no overhead of conf option as well | 10:36 |
stephenfin | I think we still need the config option, no? | 10:37 |
stephenfin | Otherwise how do we say "don't start converting these extra specs yet because I don't have enough hosts reporting PCPU inventory" ? | 10:37 |
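The gating stephenfin describes can be sketched as below. This is an assumption-laden illustration rather than either patch under review: `translate_cpu_policy` and its `translate_enabled` flag are hypothetical names, the flag stands in for whatever config option is eventually chosen, and the sketch returns a copy rather than rewriting the user's extra specs in place.

```python
def translate_cpu_policy(vcpus, extra_specs, translate_enabled):
    """Return a copy of extra_specs with a PCPU request added if needed.

    Hypothetical sketch: 'translate_enabled' stands in for an operator
    controlled option that stays off until enough hosts report PCPU
    inventory.
    """
    specs = dict(extra_specs)  # never mutate the caller's dict
    if not translate_enabled:
        return specs
    if (specs.get("hw:cpu_policy") == "dedicated"
            and "resources:PCPU" not in specs):
        # Alias the legacy pinning request to the new resource class.
        specs["resources:PCPU"] = str(vcpus)
    return specs
```

With the flag off, a flavor using `hw:cpu_policy=dedicated` is passed through untouched, so scheduling keeps working while hosts are still being upgraded.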
bhagyashris | stephenfin: ohh ok | 10:39 |
stephenfin | I'm not crazy, right? We do need an option for that, yeah? | 10:40 |
bhagyashris | stephenfin: will wait for others opinion then | 10:40 |
bhagyashris | stephenfin: TODO: 1. the aliasing of flavor extra specs, 2. the upgrade-related work. For the first one I will wait for others' opinions, and I will take the upgrade part as high priority | 10:45 |
bhagyashris | stephenfin: what's your opinion? | 10:45 |
*** cdent has joined #openstack-nova | 10:46 | |
*** etp has quit IRC | 10:46 | |
*** jaosorior has joined #openstack-nova | 10:47 | |
openstackgerrit | Maksim Malchuk proposed openstack/nova stable/queens: fix cellv2 delete_host https://review.opendev.org/672690 | 10:49 |
*** bhagyashris has quit IRC | 10:50 | |
*** brtknr has quit IRC | 10:56 | |
*** brtknr has joined #openstack-nova | 10:56 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Follow-up for I2936ce8cb293dc80e1a426094fdae6e675461470 https://review.opendev.org/672669 | 11:01 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: libvirt: Start reporting PCPU inventory to placement https://review.opendev.org/671793 | 11:01 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: trivial: Rename exception argument https://review.opendev.org/671795 | 11:01 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: trivial: Remove unused function parameter https://review.opendev.org/671796 | 11:01 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove 'hardware.get_host_numa_usage_from_instance' https://review.opendev.org/671797 | 11:01 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove 'hardware.host_topology_and_format_from_host' https://review.opendev.org/671798 | 11:01 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove 'hardware.instance_topology_from_instance' https://review.opendev.org/671799 | 11:01 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Rework 'hardware.numa_usage_from_instances' https://review.opendev.org/672565 | 11:01 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: tests: Split NUMA object tests https://review.opendev.org/672336 | 11:01 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: WIP: hardware: Differentiate between shared and dedicated CPUs https://review.opendev.org/671800 | 11:01 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Add support translating CPU policy extra specs, image meta https://review.opendev.org/671801 | 11:01 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: libvirt: '_get_(v|p)cpu_total' to '_get_(v|p)cpu_available' https://review.opendev.org/672693 | 11:01 |
*** adriant has quit IRC | 11:07 | |
*** adriant has joined #openstack-nova | 11:07 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: objects: Remove ConsoleAuthToken.to_dict https://review.opendev.org/652970 | 11:08 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: WIP! docs: Rework nova console diagram https://review.opendev.org/660147 | 11:08 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: docs: Integrate 'sphinx.ext.imgconverter' https://review.opendev.org/665693 | 11:08 |
*** jaosorior has quit IRC | 11:08 | |
*** udesale has quit IRC | 11:13 | |
*** ChipOManiac has quit IRC | 11:19 | |
slaweq | sean-k-mooney: hi again | 11:20 |
sean-k-mooney | o/ | 11:20 |
slaweq | sean-k-mooney: again about https://bugs.launchpad.net/neutron/+bug/1836642 - I replied to Your last comment | 11:20 |
openstack | Launchpad bug 1836642 in neutron "Metadata responses are very slow sometimes" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq) | 11:20 |
slaweq | can You check it? | 11:20 |
sean-k-mooney | ya i saw | 11:21 |
sean-k-mooney | what i think happened there was the first call took 13 seconds to complete, a second call was received while the first call was completing, and that took 2 seconds since the first call had not completed | 11:22 |
slaweq | sean-k-mooney: no, first call took 2 seconds | 11:23 |
slaweq | and second took 13 | 11:23 |
slaweq | first call is always for /instance-id | 11:23 |
sean-k-mooney | ok it's probably a cache miss then | 11:23 |
slaweq | and that would be retried if it took more than 10 seconds - it's how this ec2-metadata script from cirros works | 11:23 |
slaweq | and second call (after first was completed fine) was for public-keys and this took 13 seconds | 11:24 |
slaweq | and this isn't retried by ec2-metadata script so it failed | 11:24 |
*** tetsuro has quit IRC | 11:24 | |
sean-k-mooney | right ok | 11:25 |
slaweq | sean-k-mooney: so I think that it's like that because each of those requests is processed by a different worker, thus for each worker the data isn't cached | 11:26 |
slaweq | sean-k-mooney: could it be like that? | 11:26 |
sean-k-mooney | i think we are using memcache to cache it, but if it's in-process, yes | 11:26 |
sean-k-mooney | if its memcache it still takes a while to propagate | 11:27 |
sean-k-mooney | those request are ~500 ms apart | 11:27 |
slaweq | can You maybe check it to be sure? | 11:27 |
slaweq | 500ms when? usually when it's fine, right? | 11:28 |
sean-k-mooney | _Jul_09_17_23_47_872908 - _Jul_09_17_23_47_351149 ~ 500ms | 11:28 |
sean-k-mooney | actually i guess those are the log messages for the completions? | 11:29 |
slaweq | yes | 11:29 |
slaweq | and there is time in each line | 11:29 |
slaweq | "time: 13.2546282" | 11:29 |
slaweq | this is how long this took to process this request and send response | 11:29 |
sean-k-mooney | anyway ill check | 11:30 |
slaweq | sean-k-mooney: thx | 11:30 |
openstackgerrit | Merged openstack/nova master: Remove deprecated CPU, RAM, disk claiming in resource tracker https://review.opendev.org/551026 | 11:36 |
openstackgerrit | Merged openstack/nova master: Pass extra_specs to flavor in vif tests https://review.opendev.org/662556 | 11:36 |
sean-k-mooney | so it checks the cache here https://opendev.org/openstack/nova/src/branch/master/nova/api/metadata/handler.py#L77 | 11:36 |
sean-k-mooney | which is initalised here https://opendev.org/openstack/nova/src/branch/master/nova/api/metadata/handler.py#L47 | 11:37 |
sean-k-mooney | i think this delegates to oslo cache | 11:37 |
sean-k-mooney | so we call this https://github.com/openstack/nova/blob/master/nova/cache_utils.py#L47 | 11:38 |
*** stakeda has quit IRC | 11:39 | |
sean-k-mooney | there is no cache section defined in the nova.conf http://logs.openstack.org/35/521035/8/check/tempest-full/031b0b9/controller/logs/etc/nova/nova_conf.txt.gz | 11:40 |
sean-k-mooney | meaning we fall back to oslo.cache's default, which is an in-memory dogpile dict cache | 11:41 |
*** pcaruana has quit IRC | 11:42 | |
sean-k-mooney | slaweq: so looking at the metadata api's uwsgi config we are running 1 thread per process and 2 processes | 11:42 |
sean-k-mooney | so there are two workers, each of which has a separate in-memory dict cache | 11:43 |
*** igordc has joined #openstack-nova | 11:43 | |
sean-k-mooney | slaweq: so this is likely just because oslo cache is not configured to use memcache in that job | 11:43 |
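The effect of per-process caches described above can be shown with a small simulation. It is only a sketch: the `Worker` class, the `count_slow_lookups` helper and the instance ID are hypothetical, and a plain dict stands in for both the in-process cache and a shared backend such as memcached.

```python
class Worker:
    """Hypothetical stand-in for one metadata API worker process."""

    def __init__(self, cache):
        self.cache = cache
        self.slow_lookups = 0

    def handle(self, instance_id):
        if instance_id not in self.cache:
            # Cache miss: this is where the expensive lookup would happen.
            self.slow_lookups += 1
            self.cache[instance_id] = {"id": instance_id}
        return self.cache[instance_id]


def count_slow_lookups(shared):
    """Send two requests for one instance to two different workers."""
    shared_cache = {}
    workers = [Worker(shared_cache if shared else {}) for _ in range(2)]
    workers[0].handle("vm-1")  # e.g. the /instance-id request
    workers[1].handle("vm-1")  # e.g. the /public-keys request
    return sum(w.slow_lookups for w in workers)
```

With separate caches, `count_slow_lookups(False)` performs the expensive lookup twice; with a shared cache, `count_slow_lookups(True)` performs it once, because the second worker finds the entry the first one stored.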
slaweq | sean-k-mooney: so do You think that enabling memcache "globally" in all tempest based jobs would be good and could help to solve/workaround this issue? | 11:44 |
sean-k-mooney | i think it would not only solve this issue but significantly speed up the gate | 11:44 |
sean-k-mooney | well maybe not | 11:44 |
slaweq | then always during /public-keys/ call we would already have cached data for instance | 11:44 |
sean-k-mooney | it is still doing an in memory dict cache | 11:45 |
sean-k-mooney | but it probably would resolve this issue | 11:45 |
sean-k-mooney | yep | 11:45 |
slaweq | sean-k-mooney: ok, that's something | 11:45 |
slaweq | can You maybe write this in comment in launchpad? | 11:45 |
slaweq | I will try to propose patch to tempest repo probably | 11:45 |
slaweq | or devstack | 11:45 |
sean-k-mooney | sure. keystone is already configured to use memcache in that job | 11:47 |
sean-k-mooney | http://logs.openstack.org/35/521035/8/check/tempest-full/031b0b9/controller/logs/etc/keystone/keystone_conf.txt.gz | 11:47 |
sean-k-mooney | so it should be trivial to copy the exact same caching logic to nova; the config options are identical because they just come from oslo | 11:47 |
slaweq | sean-k-mooney: thx a lot | 11:47 |
slaweq | I will propose patch for this today | 11:47 |
slaweq | and we will see how it will work :) | 11:48 |
slaweq | but that is some idea which may solve this problem in gate and improve gate stability for all projects in fact :) | 11:48 |
sean-k-mooney | actually i think oslo.cache's default is the null cache | 11:48 |
sean-k-mooney | so i would expect this to speed up the gate jobs | 11:48 |
sean-k-mooney | the null cache never caches anything | 11:49 |
openstackgerrit | Adam Spiers proposed openstack/nova master: libvirt: harden Host.get_domain_capabilities() https://review.opendev.org/670189 | 11:49 |
*** ksdean has joined #openstack-nova | 11:49 | |
aspiers | sean-k-mooney: ^^^ fixed a few minor typos/grammar issues in the comments but there is still one issue | 11:49 |
*** ttsiouts has joined #openstack-nova | 11:50 | |
sean-k-mooney | aspiers: :) i think there always will be | 11:50 |
sean-k-mooney | aspiers: what is the latest one? | 11:50 |
aspiers | sean-k-mooney: just posted on the review | 11:50 |
*** mriedem has joined #openstack-nova | 11:51 | |
sean-k-mooney | we don't have an easy way to check the exception unless we parse the error message text | 11:54 |
aspiers | yes that's what I was suggesting | 11:54 |
aspiers | it's not great but better than nothing | 11:55 |
sean-k-mooney | yes and i think that is a bad idea | 11:55 |
sean-k-mooney | it makes it rather fragile | 11:55 |
aspiers | well OK then the debug message should be more honest and not assume what it doesn't know | 11:55 |
aspiers | it's fine if it says it's guessing the issue | 11:55 |
sean-k-mooney | the debug message just states that we are skipping the arch because it's incompatible with the virt type and machine type | 11:55 |
sean-k-mooney | we don't state which is the incompatibility | 11:55 |
aspiers | but it doesn't know that | 11:56 |
aspiers | it could fail due to libvirtd issues | 11:56 |
sean-k-mooney | it does | 11:56 |
aspiers | e.g. libvirtd crashes 1usec beforehand | 11:56 |
aspiers | and then you get a misleading debug message | 11:56 |
sean-k-mooney | so i originally was going to not have a debug message at all | 11:56 |
aspiers | no it's good to have one | 11:56 |
sean-k-mooney | so we can delete it if we want too | 11:56 |
aspiers | it should just not risk being wrong | 11:56 |
aspiers | it's fine if it says "this is *probably* what happened" | 11:57 |
aspiers | if that's the most likely thing | 11:57 |
sean-k-mooney | then we can simply state we are skipping the arch and not say why | 11:57 |
aspiers | if that's the most likely thing it's better to expose the guess | 11:57 |
aspiers | since that's potentially more helpful to the operator/dev than not guessing | 11:57 |
aspiers | as long as it's not misleading | 11:57 |
sean-k-mooney | e.g. Skipping arch: because libvirt raised an error, check your libvirt logs for more info | 11:57 |
aspiers | think of it from the operator perspective | 11:57 |
aspiers | no the libvirt logs might not reveal anything | 11:58 |
sean-k-mooney | aspiers: this is never meant to be read by operators | 11:58 |
aspiers | if it was incompatible | 11:58 |
sean-k-mooney | that is why its at debug level | 11:58 |
aspiers | if you don't believe operators read DEBUG you live in a different universe ... | 11:58 |
aspiers | :) | 11:58 |
sean-k-mooney | they might but this is not intended for them | 11:58 |
aspiers | anyway it doesn't matter who is reading it | 11:58 |
sean-k-mooney | are you ok with the message i suggested above | 11:58 |
aspiers | the point is that the message needs to be a) not misleading b) as helpful as possible | 11:59 |
*** sapd1_ has joined #openstack-nova | 11:59 | |
aspiers | OK I will paste a suggested message here, 1 sec | 11:59 |
*** sapd1 has quit IRC | 11:59 | |
sean-k-mooney | "Skipping arch: %s because libvirt raised an error, check your libvirt logs for more info." | 11:59 |
aspiers | nope | 12:00 |
aspiers | like I said libvirt logs might not help | 12:00 |
aspiers | and in this case we know we might be able to help by guessing the likely cause | 12:00 |
sean-k-mooney | you said you don't want it to be misleading | 12:00 |
aspiers | yes that is a) | 12:00 |
aspiers | but also b) | 12:01 |
sean-k-mooney | the current error message is our best guess at why the error was raised | 12:01 |
aspiers | Yes but it's not honest that it's a guess | 12:01 |
aspiers | This is better: "Failed to retrieve domain caps from libvirt for arch %s; maybe incompatible with virt_type %s / machine_type %s?" | 12:01 |
sean-k-mooney | it is honest it was a summary of the error message | 12:02 |
sean-k-mooney | we could just print the error message we get back from libvirt | 12:02 |
aspiers | Yes good idea | 12:02 |
aspiers | This is better: "Failed to retrieve domain caps from libvirt for arch %(arch)s (%(error)s); maybe incompatible with virt_type %(virt_type)s / machine_type %(mach_type)s?" | 12:02 |
sean-k-mooney | i wanted to avoid the stack trace but we should be able to just get the message | 12:02 |
*** dpawlik has quit IRC | 12:02 | |
aspiers | that last one includes the libvirt error message ^^^ | 12:03 |
sean-k-mooney | i would not put the error in the middle | 12:03 |
sean-k-mooney | i would put it at the end | 12:03 |
sean-k-mooney | i guess its not that long | 12:04 |
sean-k-mooney | http://paste.openstack.org/show/754776/ | 12:04 |
sean-k-mooney | its invalid argument: KVM is not supported by '/usr/bin/qemu-system-alpha' on this host | 12:04 |
sean-k-mooney | at least in the case where the virt type is the issue | 12:04 |
aspiers | OK good point | 12:04 |
*** ratailor has quit IRC | 12:05 | |
aspiers | "Failed to retrieve domain caps from libvirt for arch %(arch)s; maybe incompatible with virt_type %(virt_type)s / machine_type %(mach_type)s? libvirt error was: %(error)s" | 12:05 |
sean-k-mooney | ya im fine with that | 12:05 |
aspiers | or actually | 12:05 |
aspiers | since the libvirt error is already helpful enough | 12:05 |
aspiers | "Failed to retrieve domain caps from libvirt for arch %(arch)s / virt_type %(virt_type)s / machine_type %(mach_type)s; libvirt error was: %(error)s" | 12:06 |
sean-k-mooney | sure we dont need to make it a question | 12:06 |
sean-k-mooney | since the libvirt error states what the issue was | 12:06 |
* kashyap follows the discussion | 12:06 | |
kashyap | aspiers: Can't we do a check to determine the arch to 'virt_type' compatibility? | 12:07 |
sean-k-mooney | no | 12:07 |
aspiers | sean-k-mooney: exactly, I removed the question mark | 12:07 |
sean-k-mooney | that is libvirt's job | 12:07 |
* kashyap still reading | 12:07 | |
aspiers | kashyap: I agree with sean-k-mooney here | 12:07 |
aspiers | nova shouldn't know about that | 12:07 |
kashyap | sean-k-mooney: I meant, _using_ libvirt's reported results, of course | 12:07 |
kashyap | aspiers: Yeah, just thinking out loud | 12:07 |
kashyap | I agree that Nova shouldn't know about it | 12:08 |
sean-k-mooney | right if kvm ever adds support for acceleration of non-native instructions i dont want to need to modify nova | 12:08 |
aspiers | so nova is not checking compatibility, it's just trying to get dom caps | 12:08 |
kashyap | aspiers: typo: "caps" --> "capabilities" | 12:08 |
aspiers | if that fails we report the error from libvirt plus details of the API parameters | 12:08 |
aspiers | kashyap: my wrists hurt so I am trying to reduce typing :-p | 12:08 |
aspiers | typos in IRC are allowed! | 12:08 |
aspiers | just not in code | 12:09 |
kashyap | aspiers: No-no, in the _error_ message I mean | 12:09 |
aspiers | oh :) | 12:09 |
aspiers | ok sure | 12:09 |
kashyap | Of course, it's fine here ;-) I'm not _that_ pedantic | 12:09 |
aspiers | "Failed to retrieve domain capabilities from libvirt for arch %(arch)s / virt_type %(virt_type)s / machine_type %(mach_type)s; libvirt error was: %(error)s" | 12:09 |
kashyap | aspiers: Yes, sounds clear and truthful | 12:09 |
aspiers | kashyap: haha well I am sometimes so I wasn't ruling out that you might be too ;-) | 12:09 |
sean-k-mooney | ya that looks fine | 12:09 |
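The message agreed above uses named %-style placeholders. A minimal sketch of how such a message could be filled in, assuming a hypothetical helper `format_domain_caps_error` and a plain `Exception` standing in for the libvirt error (neither is taken from the actual nova patch):

```python
# Hypothetical helper for illustration only; not the code from the nova patch.
DOMAIN_CAPS_ERROR = (
    "Failed to retrieve domain capabilities from libvirt for arch %(arch)s / "
    "virt_type %(virt_type)s / machine_type %(mach_type)s; "
    "libvirt error was: %(error)s"
)


def format_domain_caps_error(arch, virt_type, mach_type, error):
    """Fill the named placeholders.

    str(error) keeps only the error message, so no stack trace ends up
    in the resulting text.
    """
    return DOMAIN_CAPS_ERROR % {
        "arch": arch,
        "virt_type": virt_type,
        "mach_type": mach_type,
        "error": str(error),
    }


if __name__ == "__main__":
    # The error text is the libvirt message quoted earlier in the discussion;
    # the Exception class is a stand-in, not the real libvirt error type.
    err = Exception(
        "invalid argument: KVM is not supported by "
        "'/usr/bin/qemu-system-alpha' on this host"
    )
    print(format_domain_caps_error("alpha", "kvm", None, err))
```

Named placeholders keep the three API parameters and the libvirt error clearly separated, which is the point of the final wording: report what was asked for and what libvirt said, without guessing at the cause.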
aspiers | sean-k-mooney: OK want me to change it or you? | 12:09 |
sean-k-mooney | sure | 12:10 |
sean-k-mooney | i am about to grab lunch | 12:10 |
aspiers | sure == me? :) | 12:10 |
aspiers | OK | 12:10 |
aspiers | before you go | 12:10 |
sean-k-mooney | if you havent by the time i get back ill do it | 12:10 |
aspiers | I have some semi-exciting news | 12:10 |
aspiers | check this out | 12:10 |
sean-k-mooney | oh? | 12:10 |
aspiers | Jul 25 01:29:37 della5s17 nova-compute[6543]: DEBUG nova.scheduler.client.report [None req-ce6afb17-2d5f-489a-bfda-667053513883 None None] Inventory has not changed for provider 54d4029e-c36b-4bd3-b922-ab4cdefba128 based on inventory data: {u'VCPU': {u'allocation_ratio': 16.0, u'total': 128, u'reserved': 0, u'step_size': 1, u'min_unit': 1, u'max_unit': 128}, u'MEMORY_MB': {u'allocation_ratio': 1.5, | 12:10 |
aspiers | u'total': 128452, u'reserved': 512, u'step_size': 1, u'min_unit': 1, u'max_unit': 128452}, u'DISK_GB': {u'allocation_ratio': 1.0, u'total': 95, u'reserved': 0, u'step_size': 1, u'min_unit': 1, u'max_unit': 95}, u'MEM_ENCRYPTION_CONTEXT': {u'allocation_ratio': 1.0, u'total': 2147483647, u'reserved': 0, u'step_size': 1, u'min_unit': 1, u'max_unit': 2147483647}} {{(pid=6543) set_inventory_for_provider | 12:10 |
aspiers | /opt/stack/nova/nova/scheduler/client/report.py:912}} | 12:10 |
aspiers | oops, sorry for linebreaks | 12:10 |
aspiers | that's from a real SEV system | 12:10 |
aspiers | total is "infinite" cos I didn't configure the nova.conf option yet | 12:11 |
aspiers | I'm gonna set to 16 now | 12:11 |
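The "infinite" total in the log above is 2147483647, that is 2**31 - 1. As a rough sketch, assuming placement's usual capacity formula of (total - reserved) * allocation_ratio, the schedulable capacity per resource class in that inventory could be worked out like this (values copied from the log; `capacity` is a hypothetical helper, not a nova or placement function):

```python
# Inventory values copied from the nova-compute DEBUG line above
# (only the fields needed for the capacity calculation are kept).
inventory = {
    "VCPU": {"allocation_ratio": 16.0, "total": 128, "reserved": 0},
    "MEMORY_MB": {"allocation_ratio": 1.5, "total": 128452, "reserved": 512},
    "DISK_GB": {"allocation_ratio": 1.0, "total": 95, "reserved": 0},
    "MEM_ENCRYPTION_CONTEXT": {
        "allocation_ratio": 1.0,
        "total": 2147483647,
        "reserved": 0,
    },
}


def capacity(inv):
    """Assumed formula: (total - reserved) * allocation_ratio."""
    return int((inv["total"] - inv["reserved"]) * inv["allocation_ratio"])


for resource_class, inv in inventory.items():
    print(resource_class, capacity(inv))
```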
sean-k-mooney | cool | 12:11 |
kashyap | aspiers: Hehe | 12:11 |
sean-k-mooney | i also reached a milestone yesterday where i fully tested the vPMU feature and image metadata prefilter stuff i have been working on | 12:11 |
aspiers | nice! | 12:12 |
aspiers | also | 12:12 |
aspiers | Jul 25 01:24:29 della5s17 nova-compute[6543]: DEBUG nova.scheduler.client.report [None req-ce6afb17-2d5f-489a-bfda-667053513883 None None] Refreshing trait associations for resource provider 54d4029e-c36b-4bd3-b922-ab4cdefba128, traits: | 12:12 |
aspiers | HW_CPU_X86_SSE,COMPUTE_IMAGE_TYPE_ISO,COMPUTE_NET_ATTACH_INTERFACE,COMPUTE_NET_ATTACH_INTERFACE_WITH_TAG,HW_CPU_X86_AMD_SEV,COMPUTE_IMAGE_TYPE_AKI,COMPUTE_IMAGE_TYPE_ARI,COMPUTE_IMAGE_TYPE_QCOW2,COMPUTE_TRUSTED_CERTS,HW_CPU_X86_SVM,COMPUTE_DEVICE_TAGGING,COMPUTE_VOLUME_ATTACH_WITH_TAG,COMPUTE_VOLUME_MULTI_ATTACH,HW_CPU_X86_SSE2,COMPUTE_IMAGE_TYPE_AMI,COMPUTE_VOLUME_EXTEND,HW_CPU_X86_MMX,COMPUTE_IMAGE_ | 12:12 |
sean-k-mooney | it was really nice to see the traits showing up in placement for the prefilter and the transform happening | 12:12 |
aspiers | TYPE_RAW {{(pid=6543) _refresh_associations /opt/stack/nova/nova/scheduler/client/report.py:796}} | 12:12 |
aspiers | oh, line breaks are from IRC max line limit | 12:12 |
aspiers | Jul 25 01:13:25 della5s17 nova-compute[6543]: {{(pid=6543) _get_domain_capabilities /opt/stack/nova/nova/virt/libvirt/host.py:831}} | 12:12 |
aspiers | Jul 25 01:13:25 della5s17 nova-compute[6543]: DEBUG nova.virt.libvirt.host [-] Checking SEV support for arch x86_64 and machine type pc {{(pid=6543) _set_amd_sev_support /opt/stack/nova/nova/virt/libvirt/host.py: | 12:12 |
aspiers | 1120}} | 12:12 |
aspiers | Jul 25 01:13:25 della5s17 nova-compute[6543]: INFO nova.virt.libvirt.host [-] AMD SEV support detected | 12:12 |
aspiers | so I'm finally ready to test booting a real SEV VM through nova :-O | 12:13 |
*** takashin has joined #openstack-nova | 12:13 | |
sean-k-mooney | nice | 12:14 |
sean-k-mooney | i have added both my features to the runway queue so hopefully we will see sev on that list soon too | 12:14 |
aspiers | it's already in the queue | 12:15 |
aspiers | I thought the series was already ready before I went on vacation so I added it then | 12:15 |
sean-k-mooney | i dont see it | 12:16 |
aspiers | of course I was wrong | 12:16 |
aspiers | ah, got removed again | 12:16 |
sean-k-mooney | ya because it was in merge conflict | 12:16 |
aspiers | not when I added it | 12:16 |
sean-k-mooney | sure but it went into merge conflict when you were on vacation | 12:17 |
aspiers | yes | 12:17 |
aspiers | at least that bit wasn't my fault ;-) | 12:17 |
sean-k-mooney | :) damn cores merging code | 12:17 |
sean-k-mooney | its almost as if its part of their jobs | 12:17 |
aspiers | I know, disgraceful behaviour | 12:18 |
*** mrch_ has quit IRC | 12:20 | |
sean-k-mooney | ok, shower, lunch and then back before a meeting at 2. that should be totally doable... | 12:21 |
*** dpawlik has joined #openstack-nova | 12:21 | |
aspiers | good luck | 12:21 |
*** pcaruana has joined #openstack-nova | 12:22 | |
mriedem | brinzhang: https://review.opendev.org/#/c/612949/10 | 12:24 |
mriedem | alex_xu: ^ | 12:24 |
mriedem | i think what i've described as the main proposed change (in my comments) makes sense, but the spec overall is a bit confusing since it seems to mix in lots of different things | 12:24 |
mriedem | if the spec can be cleaned up in time for the spec freeze i'll probably be +2 on it | 12:24 |
mriedem | i do appreciate that the entire dev team from inspur has +1ed the spec without leaving any comments though :) | 12:25 |
aspiers | mriedem: since I'm back from vacation and have got the SEV series back to (AFAICS) 100% health, I've readded to the runway | 12:26 |
mriedem | within like 5 minutes of each other :) | 12:26 |
mriedem | aspiers: ok | 12:26 |
openstackgerrit | Daniel Speichert proposed openstack/nova-specs master: Directly download and upload images to RBD https://review.opendev.org/658903 | 12:27 |
*** tbachman has quit IRC | 12:40 | |
aspiers | sean-k-mooney: almost got the PS ready | 12:55 |
efried | aspiers: also, it's generally discouraged to put a series in a runway slot if the devs are going to be on vacation or otherwise unavailable during the window to address comments (and merge conflicts). | 12:56 |
efried | see last bullet under "Requirements for being eligible..." | 12:57 |
aspiers | efried: ack. IIRC it was far back in the queue so didn't seem to have much chance of getting addressed while I was away | 12:57 |
aspiers | but maybe I miscalculated | 12:57 |
aspiers | also, I was expecting a colleague to be available while I was away | 12:57 |
aspiers | but I think he got busy with other commitments | 12:58 |
efried | aspiers: What should have happened is whoever actually moved it from the queue to a slot should have asked about dev availability. | 12:58 |
aspiers | life happens ... | 12:58 |
efried | anyway, nbd | 12:58 |
aspiers | I'm here now anyway :) | 12:58 |
aspiers | and it's all looking pretty good | 12:58 |
efried | it's not like nuclear missile launches will or will not happen based on how tightly we run our runway queue. | 12:59 |
aspiers | they won't? :-o | 12:59 |
aspiers | damn, I'm in the wrong business | 12:59 |
cdent | efried: speak for yourself efried, I run my small republic's military on the comings and goings of openstack etherpads | 13:01 |
cdent | and we have da bomb | 13:01 |
aspiers | I was only in OpenStack because I thought I was helping hurry the apocalypse along | 13:01 |
cdent | _exactly_ | 13:01 |
*** gryf has quit IRC | 13:01 | |
aspiers | cdent: is it hotter than hell where you are too? | 13:01 |
aspiers | maybe the apocalypse has already started | 13:02 |
cdent | aspiers: no. it's hotter than normal, but since I'm in cornwall by the sea, it's a balmy and breezy 24 | 13:02 |
efried | It's unseasonably cool here in central texas. | 13:02 |
cdent | so apparently the apocalypse is upcountry, which makes sense | 13:02 |
aspiers | bah | 13:02 |
* cdent blames boris | 13:02 | |
aspiers | I guess London deserves the wrath of satan more than the rest of the country | 13:02 |
cdent | yes | 13:02 |
aspiers | s/guess/know/ | 13:03 |
*** tbachman has joined #openstack-nova | 13:07 | |
*** udesale has joined #openstack-nova | 13:16 | |
*** mvkr_ has quit IRC | 13:17 | |
*** jhesketh has quit IRC | 13:22 | |
openstackgerrit | Adam Spiers proposed openstack/nova master: libvirt: harden Host.get_domain_capabilities() https://review.opendev.org/670189 | 13:23 |
*** jaosorior has joined #openstack-nova | 13:23 | |
*** marta_lais has joined #openstack-nova | 13:24 | |
efried | stephenfin: I'm not sure what cores are able to review --^ but I thought you might be one of them? | 13:24 |
efried | I know it's not me :( | 13:24 |
aspiers | sean-k-mooney, kashyap: new version up ^^^ | 13:25 |
*** jhesketh has joined #openstack-nova | 13:26 | |
openstackgerrit | Adam Spiers proposed openstack/nova master: libvirt: harden Host.get_domain_capabilities() https://review.opendev.org/670189 | 13:29 |
aspiers | kashyap: ah, just saw your nits - addressed | 13:29 |
*** _hemna has joined #openstack-nova | 13:30 | |
mriedem | efried: fyi i'll be running my kid to a class thing during the meeting so won't be around | 13:30 |
*** mkrai has joined #openstack-nova | 13:31 | |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/stein: Revert "[libvirt] Filter hypervisor_type by virt_type" https://review.opendev.org/672723 | 13:31 |
*** Luzi has quit IRC | 13:33 | |
*** _hemna has quit IRC | 13:35 | |
kashyap | aspiers: Thanks :-) | 13:35 |
efried | meeting, right, meeting. | 13:35 |
*** ricolin has quit IRC | 13:36 | |
stephenfin | efried: It's on my list. Got to get to the combined VCPU/PCPU instances spec first though | 13:42 |
* kashyap reads the interesting revert above | 13:42 | |
openstackgerrit | Merged openstack/nova master: Remove super old unnecessary TODO from API start() method https://review.opendev.org/672330 | 13:43 |
openstackgerrit | Merged openstack/nova master: Completely remove fake_libvirt_utils. https://review.opendev.org/643897 | 13:43 |
*** jaosorior has quit IRC | 13:43 | |
openstackgerrit | Merged openstack/nova master: Remove usused umask argument to virt.libvirt.utils.write_to_file https://review.opendev.org/645086 | 13:43 |
*** amodi has joined #openstack-nova | 13:45 | |
efried | thanks stephenfin, specs are much more important rn. | 13:53 |
*** mvkr_ has joined #openstack-nova | 13:58 | |
sean-k-mooney | efried: cob today is spec freeze right | 13:58 |
efried | sean-k-mooney: Yes. http://lists.openstack.org/pipermail/openstack-discuss/2019-July/008019.html | 13:59 |
*** mkrai has quit IRC | 14:00 | |
sean-k-mooney | yep read that yesterday but its been a busy few days | 14:00 |
dansmith | sean-k-mooney: it's been a few days since yesterday? | 14:00 |
efried | nova meeting now in -meeting | 14:00 |
openstackgerrit | Merged openstack/nova master: Revert "[libvirt] Filter hypervisor_type by virt_type" https://review.opendev.org/672559 | 14:01 |
sean-k-mooney | no just in general, i was travelling from cashel to shannon after spending a few days at my mother's since her car broke down | 14:01 |
openstackgerrit | Merged openstack/nova master: Consts for need_healing https://review.opendev.org/672284 | 14:01 |
openstackgerrit | Merged openstack/nova master: Fix cleaning up console tokens https://review.opendev.org/637716 | 14:01 |
openstackgerrit | Merged openstack/nova master: Disambiguate logs in delete_allocation_for_instance https://review.opendev.org/671869 | 14:01 |
openstackgerrit | Merged openstack/nova master: Remove old TODO about forced_host policy check https://review.opendev.org/669474 | 14:01 |
dansmith | sean-k-mooney: heh, it was just a funny mind-o highlighting the business (i.e. two days felt like three) | 14:01 |
openstackgerrit | Merged openstack/nova master: Avoid logging traceback when detach device not found https://review.opendev.org/671640 | 14:01 |
dansmith | er, busy-ness | 14:01 |
*** pots has joined #openstack-nova | 14:02 | |
sean-k-mooney | ha ya | 14:02 |
sean-k-mooney | speaking of specs i agree that https://review.opendev.org/#/c/608696/ and https://review.opendev.org/#/c/602201/ are the closest of the remaining set | 14:05 |
*** luksky123 has quit IRC | 14:07 | |
*** ttsiouts has quit IRC | 14:09 | |
*** ttsiouts has joined #openstack-nova | 14:09 | |
*** ttsiouts has quit IRC | 14:14 | |
*** dpawlik has quit IRC | 14:28 | |
*** ccamacho has joined #openstack-nova | 14:32 | |
*** ttsiouts has joined #openstack-nova | 14:34 | |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/rocky: Avoid crashing while getting libvirt capabilities with unknown arch names https://review.opendev.org/672746 | 14:35 |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/rocky: Revert "[libvirt] Filter hypervisor_type by virt_type" https://review.opendev.org/672747 | 14:35 |
artom | Did something change with placement in functional tests recently? I swear my NUMA LM func test was getting to updating the XML last night, but this morning it's failing on placement not giving any allocations | 14:40 |
*** yikun has quit IRC | 14:40 | |
*** jmlowe has quit IRC | 14:42 | |
cdent | artom you have an example of the query that's being made, that might clarify things | 14:42 |
cdent | just today some cache adjustments were merged. when did things last work? | 14:43 |
artom | Last night (EDT) | 14:44 |
artom | 27.0.0.1 "GET /placement/allocation_candidates?limit=1000&required=%21COMPUTE_STATUS_DISABLED&resources=DISK_GB%3A20%2CMEMORY_MB%3A2048%2CVCPU%3A3" status: 200 len: 53 microversion: 1.31 | 14:44 |
artom | Got no allocation candidates from the Placement API. This could be due to insufficient resources or a temporary occurrence as compute nodes start up. | 14:44 |
* artom tries recreating the tox venv | 14:45 | |
artom | (Which I guess makes even less sense - if it was the same as before, what changed?) | 14:45 |
cdent | artom: got a patch up so I can play with it myself? | 14:47 |
cdent | when did compute status disabled support merge in nova? | 14:48 |
artom | cdent, https://review.opendev.org/#/c/672595, but I had to rebase it locally, so hang on | 14:48 |
cdent | aye aye | 14:48 |
mriedem | cdent: last week i think | 14:49 |
mriedem | or 2 weeks ago | 14:49 |
*** mlavalle has joined #openstack-nova | 14:50 | |
*** lbragstad has joined #openstack-nova | 14:51 | |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: [WIP-until-series-is-ready] Introduce live_migration_claim() https://review.opendev.org/635669 | 14:51 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: New objects for NUMA live migration https://review.opendev.org/634827 | 14:51 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: LM: add support for sending NUMAMigrateData to the source https://review.opendev.org/634828 | 14:51 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: LM: add support for updating NUMA-related XML on the source https://review.opendev.org/635229 | 14:51 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: RPC changes to prepare for NUMA live migration https://review.opendev.org/634605 | 14:51 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: NUMA live migration support https://review.opendev.org/634606 | 14:51 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: Deprecate CONF.workarounds.enable_numa_live_migration https://review.opendev.org/640021 | 14:51 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: [WIP] Functional tests for NUMA live migration https://review.opendev.org/672595 | 14:51 |
artom | ^^ There, fixed the merge conflicts | 14:51 |
artom | For all I know it's not even placement... | 14:52 |
cdent | thanks artom will try it from my side of the world | 14:52 |
artom | cdent, appreciated, much thanks :) | 14:52 |
*** ccamacho has quit IRC | 14:53 | |
efried | artom: https://review.opendev.org/#/c/668752/ | 14:55 |
efried | ~2w ago | 14:55 |
mriedem | i'm pretty sure required=%21COMPUTE_STATUS_DISABLED is saying !disabled | 14:57 |
mriedem | i.e. forbidden trait | 14:57 |
cdent | yes | 14:57 |
artom | Yeah, that's ! | 14:57 |
sean-k-mooney | mriedem: yep it is | 14:59 |
sean-k-mooney | although ! does not technically have to be url encoded | 15:00 |
sean-k-mooney | but %21 is the encoding for ! | 15:00 |
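To make the encoding point concrete, here is a small sketch that decodes the query string from the request quoted above using only Python's standard library. The variable names are illustrative, and splitting out a leading `!` as a "forbidden" trait simply follows the interpretation given in the discussion:

```python
from urllib.parse import parse_qs, quote

# Query string copied from the allocation_candidates request above.
QUERY = (
    "limit=1000&required=%21COMPUTE_STATUS_DISABLED"
    "&resources=DISK_GB%3A20%2CMEMORY_MB%3A2048%2CVCPU%3A3"
)

# parse_qs percent-decodes the values: %21 -> !, %3A -> :, %2C -> ,
params = parse_qs(QUERY)

# Traits prefixed with "!" are the forbidden ones.
required = params["required"][0].split(",")
forbidden_traits = [t[1:] for t in required if t.startswith("!")]

# resources is a comma-separated list of CLASS:amount pairs.
resources = {
    name: int(amount)
    for name, amount in (
        item.split(":") for item in params["resources"][0].split(",")
    )
}

print(forbidden_traits)
print(resources)

# quote() encodes "!" by default, but it can be marked as safe and left as-is.
print(quote("!"), quote("!", safe="!"))
```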
*** takashin has left #openstack-nova | 15:01 | |
*** brault has joined #openstack-nova | 15:03 | |
*** artom has quit IRC | 15:06 | |
*** TxGirlGeek has joined #openstack-nova | 15:07 | |
*** jmlowe has joined #openstack-nova | 15:07 | |
*** mdbooth has quit IRC | 15:10 | |
*** artom has joined #openstack-nova | 15:14 | |
*** ratailor has joined #openstack-nova | 15:16 | |
*** wwriverrat has joined #openstack-nova | 15:16 | |
*** dklyle has quit IRC | 15:17 | |
*** _erlon_ has joined #openstack-nova | 15:18 | |
*** dklyle has joined #openstack-nova | 15:18 | |
*** mkrai has joined #openstack-nova | 15:18 | |
*** ricolin has joined #openstack-nova | 15:22 | |
*** gryf has joined #openstack-nova | 15:27 | |
efried | sean-k-mooney: A requirement for cycle-with-intermediary projects is a m-2 release. os-vif qualifies. | 15:28 |
efried | It looks like there have been half a dozen or so commits since the last release, of which only one looks like it has any meat (https://review.opendev.org/#/c/658786/) | 15:28 |
efried | Can we do a release now? | 15:28 |
efried | jangutter: ^ | 15:28 |
*** maciejjozefczyk has quit IRC | 15:32 | |
sean-k-mooney | efried: actually we just need to have 1 release, we dont need one at each milestone | 15:33 |
sean-k-mooney | that was cycles-with-milestones | 15:33 |
sean-k-mooney | but ill check and get back to you later | 15:33 |
sean-k-mooney | just on a meeting | 15:33 |
*** ricolin_ has joined #openstack-nova | 15:33 | |
stephenfin | efried, mriedem: Can I have pre-commit? Pretty please? https://review.opendev.org/#/c/665518/ | 15:34 |
stephenfin | I've to respin ~8 patches because I forgot to run pep8 :'( https://review.opendev.org/#/c/671797/ | 15:35 |
stephenfin | Not that I'm going to bother yet. I'll fix it when I need to respin | 15:35 |
*** ricolin has quit IRC | 15:36 | |
*** pchavva has joined #openstack-nova | 15:37 | |
*** pchavva has left #openstack-nova | 15:37 | |
jangutter | efried: regarding the meat, I'm happy for it to get barbecued into a release. note that there has been some follow-on stuff (unmerged) that happened afterwards too. | 15:37 |
*** ricolin_ is now known as ricolin | 15:38 | |
jangutter | efried: specifically https://review.opendev.org/#/c/665965/ | 15:39 |
cdent | artom: I responded on your thing. I can see where things go wrong, but not why | 15:40 |
cdent | what I mean is I can answer the "why" but not the "why of the why" | 15:40 |
jangutter | efried: my view (stated in the os-vif review) is that I don't think the follow-on is necessary, but I don't feel strongly enough to oppose it. | 15:40 |
artom | cdent, aha, that's already very helpful | 15:41 |
cdent | artom: good. i'll be curious to here what the missing piece is | 15:41 |
cdent | and hear too | 15:41 |
artom | I know some stuff changed recently around fakelibvirt | 15:41 |
artom | Maybe a side effect of that was changing the default flavor and/or compute disk size? | 15:42 |
mriedem | stephenfin: i'll defer your pre-commit request to dansmith | 15:42 |
cdent | sounds likely | 15:42 |
artom | I shall dig, right after this meeting | 15:42 |
artom | Which is done, thank god | 15:42 |
stephenfin | ugh, but he's the worst | 15:42 |
mriedem | i'm an old bugbear so i don't care about pre-commit | 15:42 |
stephenfin | Um, I mean...the best <3 | 15:42 |
mriedem | and will bitch to no end if it makes me do extra things | 15:42 |
*** ratailor has quit IRC | 15:43 | |
artom | mriedem's artistic left brain half is actually a tox venv for running pep8 | 15:43 |
sean-k-mooney | mriedem: well if you dont install it it wont do anything | 15:43 |
stephenfin | It should make you do less things, since you won't need to remember to run fast8 | 15:43 |
stephenfin | but it is different things | 15:43 |
stephenfin | ...unless we backport | 15:43 |
stephenfin | ...which I would be game to do | 15:44 |
*** gyee has joined #openstack-nova | 15:44 | |
mriedem | i would not | 15:44 |
*** avolkov has joined #openstack-nova | 15:45 | |
sean-k-mooney | anyway the important thing about the pre-commit stuff is it should not impact anyone that does not want to use it | 15:45 |
*** cdent has left #openstack-nova | 15:45 | |
sean-k-mooney | they can continue to use tox and the gate will continue to use tox | 15:45 |
*** cdent has joined #openstack-nova | 15:45 | |
stephenfin | 'zactly. It's purely opt-in | 15:45 |
sean-k-mooney | for those that want to use it they can install the tool and let it do its thing | 15:45 |
stephenfin | sean-k-mooney: I added the tab to spaces converter thing too, if that sweetens the deal for you | 15:46 |
*** artom has quit IRC | 15:46 | |
sean-k-mooney | :) | 15:46 |
stephenfin | some pre-commit >>> no pre-commit | 15:46 |
sean-k-mooney | yes yes it does | 15:46 |
sean-k-mooney | i already liked it however | 15:46 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: api-ref: touch up the os-services docs https://review.opendev.org/672571 | 15:47 |
*** altlogbot_1 has quit IRC | 15:48 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: api-ref: touch up the os-services docs https://review.opendev.org/672571 | 15:49 |
*** altlogbot_3 has joined #openstack-nova | 15:50 | |
*** tesseract has quit IRC | 15:50 | |
sean-k-mooney | efried: alex_xu stephenfin i have summarised where i stand on the mixed cpu spec in my last top level comment https://review.opendev.org/#/c/668656/5 | 15:55 |
sean-k-mooney | if that seems fair i think we could proceed with it conditionally on that restricted timetable | 15:56 |
sean-k-mooney | otherwise i would move this to backlog/U | 15:56 |
jangutter | stephenfin: tab to space is low hanging fruit. If you really want a holy war, enforce "one space after period." | 15:57 |
sean-k-mooney | jangutter: the tab to space thing is because we manually enforce no tabs | 15:57 |
jangutter | sean-k-mooney: yep, pep8 will just do a late fail for you. | 15:58 |
sean-k-mooney | so the pre-commit hook would detect it for you | 15:58 |
sean-k-mooney | jangutter: only on python 3 i think | 15:58 |
sean-k-mooney | on 2 i think it will allow it as long as you dont mix | 15:58 |
mriedem | oooooo yeahhh https://www.youtube.com/watch?v=Lrle0x_DHBM | 15:59 |
openstackgerrit | Surya Seetharaman proposed openstack/nova master: API microversion 2.75: Add 'power-update' external event https://review.opendev.org/645611 | 15:59 |
*** ttsiouts has quit IRC | 16:00 | |
*** shilpasd has quit IRC | 16:03 | |
*** brault has quit IRC | 16:04 | |
*** JamesBenson has joined #openstack-nova | 16:04 | |
*** tssurya has quit IRC | 16:04 | |
*** jmlowe has quit IRC | 16:04 | |
*** brault has joined #openstack-nova | 16:05 | |
*** mkrai has quit IRC | 16:05 | |
efried | stephenfin: Isn't pre-commit something you can carry locally at will? | 16:07 |
stephenfin | I'd need to add it to '.gitignore' but otherwise yes | 16:08 |
efried | I mean, given the level of meh about putting it in the codebase, that just seems like the better option, doesn't it? | 16:09 |
efried | It would be, like, two of you using it? | 16:09 |
*** brault has quit IRC | 16:09 | |
stephenfin | Perhaps, but it just rubs me the wrong way if we're being honest | 16:11 |
stephenfin | I'm not sure why we should be so averse to trying new things, especially when those things are opt-in and don't affect the end product in any way | 16:11 |
Nick_A | what is the correct way to enable maintenance mode on a hypervisor to prevent new instances from spawning on it? https://docs.openstack.org/python-openstackclient/rocky/cli/command-objects/host.html doesn't seem to work - "Not Implemented" error | 16:11 |
*** artom has joined #openstack-nova | 16:12 | |
Nick_A | Never mind - we found it | 16:12 |
artom | cdent, https://review.opendev.org/#/c/644793/12/nova/tests/functional/libvirt/test_numa_servers.py culprit found | 16:12 |
cdent | woot | 16:12 |
artom | cdent, thanks again for your prompt help! | 16:12 |
cdent | you're welcome. was there a line number associated with that link? | 16:13 |
cdent | or just the general concept | 16:13 |
artom | cdent, in general - more specifically, it removed a monkey patch in setup: https://review.opendev.org/#/c/644793/12/nova/tests/functional/libvirt/base.py | 16:14 |
artom | So then had to mock a bunch of stuff for each test | 16:14 |
cdent | ah ha | 16:14 |
artom | Which I wasn't mocking, as I wrote the test before that landed | 16:14 |
artom | And thus was relying on the setUp monkeypatch, which got pulled from under me | 16:15 |
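The failure mode described here, a test silently depending on a patch applied in a shared setUp(), can be sketched with plain `unittest.mock`. Everything below is hypothetical and generic (the class names and the choice of `os.cpu_count` as the patched target are assumptions for illustration); it is not nova's functional test code:

```python
import os
import unittest
from unittest import mock


class BaseTestWithPatch(unittest.TestCase):
    """Base class that patches a dependency for every test in setUp()."""

    def setUp(self):
        super().setUp()
        patcher = mock.patch("os.cpu_count", return_value=8)
        self.mock_cpu_count = patcher.start()
        # Undo the patch after each test, even if the test fails.
        self.addCleanup(patcher.stop)


class RelyingOnSetUp(BaseTestWithPatch):
    def test_cpu_count(self):
        # Passes only while the base class keeps its patch in setUp();
        # if that patch is removed, this sees the real host value instead.
        self.assertEqual(8, os.cpu_count())


class ExplicitPerTest(unittest.TestCase):
    @mock.patch("os.cpu_count", return_value=8)
    def test_cpu_count(self, mock_cpu_count):
        # Declares its own patch, so it does not depend on a shared setUp().
        self.assertEqual(8, os.cpu_count())
        mock_cpu_count.assert_called_once_with()
```

Saved to a file, both test classes can be run with `python -m unittest`. The second style is more verbose but makes each test's assumptions visible in the test itself.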
cdent | yowsa | 16:15 |
artom | TBH, the commit message doesn't do a great job of explaining *why* it was necessary to remove that monkeypatch | 16:15 |
stephenfin | idk. I do a lot of reviews. I write some code. I'm a decent community member, in general. Why do I have to pull teeth to get something in that I'm saying helps my productivity and doesn't hamper anyone else. It's frustrating. | 16:15 |
artom | But I assume there's a larger context that I'm not aware of | 16:16 |
efried | stephenfin: +2 on the basis of you really wanting it. | 16:16 |
stephenfin | Cheers | 16:16 |
efried | stephenfin: but to answer your question, because it's trivial for you to do it locally without "polluting" the nova codebase with something that's irrelevant to nova. | 16:17 |
cdent | stephenfin++ | 16:17 |
stephenfin | My counterpoint to that is that I've done, and continue to do, a lot of unpolluting | 16:17 |
efried | be like me putting a pycharm config | 16:17 |
artom | Sounds like we need the equivalent of a carbon tax | 16:18 |
efried | big difference between obsolete nova code and something that was never relevant to nova. | 16:18 |
*** brinzhang_ has quit IRC | 16:18 | |
*** brinzhang has quit IRC | 16:18 | |
stephenfin | relevant to nova developers though | 16:19 |
stephenfin | who are as important, if not more important, than the code | 16:20 |
efried | as relevant to any project's developers, nah? | 16:20 |
efried | are you going on a crusade to propose this same thing to all projects you work on? | 16:20 |
artom | cdent, woot, I'm back to last night's failure | 16:21 |
* artom starts tacking fakelibvirt's broken getXML() | 16:21 | |
cdent | something reasonable now | 16:21 |
artom | *tackling | 16:21 |
stephenfin | I'll probably add it to one or two of my personal projects, maybe Sphinx too, but I wouldn't be touching oslo and the likes, no | 16:21 |
kashyap | artom: Hehe, one letter changes the meaning, doesn't it :D | 16:21 |
stephenfin | Basically anywhere where I'm likely to be undertaking large feature work consisting of many patches | 16:21 |
artom | kashyap, at least I wasn't tickling it | 16:22 |
kashyap | mriedem: Try this, not sure if that's your glass of (root?) beer -- https://www.youtube.com/watch?v=jBo870lVUyc | 16:23 |
kashyap | [Preferably with a good quality headset / speaker] | 16:23 |
cdent | oh. that's nice. | 16:27 |
*** lpetrut has quit IRC | 16:29 | |
kashyap | Jimmy Smith++ | 16:29 |
*** rpittau is now known as rpittau|afk | 16:36 | |
*** ricolin has quit IRC | 16:39 | |
*** cdent has quit IRC | 16:39 | |
*** igordc has quit IRC | 16:58 | |
*** igordc has joined #openstack-nova | 16:58 | |
*** ivve has quit IRC | 17:01 | |
mriedem | what in tarnations, created devstack from master today, create a server, n-cpu logs say the guest was created in the hypervisor, and then things just hang - and virsh list doesn't show anything | 17:02 |
mriedem | wtf | 17:02 |
sean-k-mooney | mriedem: im guessing libvirt crashed | 17:05 |
sean-k-mooney | either that or you need to run virsh list with either sudo or --all | 17:05 |
sean-k-mooney | actually if it hung then ignore the last bit | 17:06 |
mriedem | oh right sudo virsh list | 17:06 |
mriedem | libvirtd is green | 17:06 |
mriedem | the domain is just hung in paused state | 17:06 |
sean-k-mooney | the domain or the nova compute agent | 17:07 |
mriedem | the domain | 17:07 |
mriedem | $ sudo virsh list --all | 17:07 |
mriedem | Id Name State | 17:07 |
mriedem | ---------------------------------------------------- | 17:07 |
mriedem | 3 instance-00000003 paused | 17:07 |
sean-k-mooney | and in nova its active | 17:08 |
mriedem | no | 17:08 |
mriedem | it's building b/c the libvirt driver is waiting for the power state to change from paused to running | 17:08 |
sean-k-mooney | oh and it was already told to unpause? we start the domain in the paused state. i wonder if the qemu monitor has hung | 17:09 |
*** jmlowe has joined #openstack-nova | 17:13 | |
mriedem | fun | 17:14 |
mriedem | Jul 25 17:13:36 devstack libvirt-guests.sh[18879]: Timeout expired while shutting down domains | 17:14 |
mriedem | Jul 25 17:13:36 devstack systemd[1]: libvirt-guests.service: Control process exited, code=exited status=1 | 17:14 |
mriedem | trying to restart libvirt-guests | 17:14 |
sean-k-mooney | is this a clean install of ubuntu 18.04? | 17:15 |
mriedem | well from a vexxhost image of 18.04 but yeah | 17:15 |
sean-k-mooney | strange i personally havent had any issue with 18.04, i did an install friday | 17:16 |
mriedem | me neither | 17:16 |
sean-k-mooney | are the vexhost image available for download | 17:17 |
mriedem | idk, i'm trashing this vm | 17:18 |
*** jaypipes has quit IRC | 17:18 | |
mnaser | the vexxhost images are straight up the ones shipped by ubuntu | 17:19 |
sean-k-mooney | ya i would just start over too to be honest. i suspect its something to do with libvirt/qemu or maybe apparmor but i would start clean | 17:19 |
sean-k-mooney | mnaser: the cloud images | 17:19 |
mnaser | yep | 17:19 |
mnaser | only thing we do is convert from qemu to raw | 17:19 |
mnaser | and upload | 17:19 |
sean-k-mooney | im guessing ye are using ceph as a backend then | 17:19 |
*** betherly has joined #openstack-nova | 17:19 | |
mnaser | indeed :) | 17:20 |
kashyap | On Fedora, the 'libvirt-guests' thing isn't even enabled: | 17:20 |
kashyap | $> systemctl status libvirt-guests | 17:20 |
kashyap | ● libvirt-guests.service - Suspend/Resume Running libvirt Guests | 17:20 |
kashyap | Loaded: loaded (/usr/lib/systemd/system/libvirt-guests.service; disabled; vendor preset: disabled) | 17:20 |
kashyap | Active: inactive (dead) | 17:20 |
kashyap | ... | 17:20 |
kashyap | But yeah, that timeout of 'libvirt-guests' looks spurious enough, might as well start over. | 17:20 |
sean-k-mooney | kashyap: would that not cause filesystem corruption if you did not suspend them on rebooting the host? | 17:21 |
kashyap | (Also, not sure if that paused instance's QEMU process went 'defunct') | 17:21 |
efried | following up re os-vif and python-novaclient releases: Libs are required to do one release per milestone. os-vif was last released at m1, so we can expect the release team to propose that one. python-novaclient was released a couple weeks ago, so we're probably good on that one. | 17:21 |
sean-k-mooney | kashyap: i'm guessing the qemu monitor process stopped processing messages from libvirt | 17:21 |
sean-k-mooney | efried: ok, there is one thing i would like to fix soonish but i'm only starting on it today | 17:22 |
kashyap | Yeah, but that doesn't tell us why. It could be any no. of reasons | 17:22 |
sean-k-mooney | kashyap: yep, it's probably quicker to kill it and spin up a clean vm | 17:23 |
sean-k-mooney | if mriedem hits it again we can take another look | 17:23 |
kashyap | sean-k-mooney: Yeah, on FS corruption, possibly "enterprise distros" would enable it | 17:24 |
*** ralonsoh has quit IRC | 17:24 | |
*** betherly has quit IRC | 17:24 | |
sean-k-mooney | it looks like ubuntu just enables it by default to be safe | 17:24 |
*** JamesBenson has quit IRC | 17:25 | |
*** igordc has quit IRC | 17:28 | |
*** vishwanathj has quit IRC | 17:30 | |
kashyap | sean-k-mooney: RHEL doesn't either, BTW. And one can configure what action 'libvirt-guests' can take on host shutdown | 17:31 |
sean-k-mooney | ok | 17:31 |
sean-k-mooney | well that is not related to the issue mriedem was having | 17:32 |
sean-k-mooney | the issue he was having was that the vm hung | 17:32 |
sean-k-mooney | and then the libvirt-guests script also hung on shutdown for the same reason | 17:32 |
sean-k-mooney | there is a timeout in the service file. if i remember correctly it waits for up to 2 minutes | 17:33 |
sean-k-mooney | and it continues with the system shutdown if it takes longer than that | 17:33 |
*** udesale has quit IRC | 17:34 | |
kashyap | I wasn't saying it is related. On your FS corruption: no, it is the admin's / higher-level tool's responsibility to ensure your guests quiesce their FS. | 17:34 |
* kashyap --> needs to run shortly | 17:34 | |
*** vishwanathj has joined #openstack-nova | 17:34 | |
kashyap | (And yes, there is a timeout: check SHUTDOWN_TIMEOUT in /etc/sysconfig/libvirt-guests) | 17:35 |
sean-k-mooney | ack | 17:35 |
kashyap | Default is 5 minutes. | 17:35 |
*** mvkr_ has quit IRC | 17:35 | |
sean-k-mooney | ya, i have seen it in the console output when i have rebooted systems in the past. i just noticed it had one but never really looked that closely | 17:35 |
*** marta_lais has quit IRC | 17:38 | |
melwitt | dansmith, mriedem: would like to have your review on a change to remove the "last context manager" from the CellDatabases fixture https://review.opendev.org/672604. this came up again while I was working on adding a func test to Kevin_Zheng's multi-cell nova-manage db archive_deleted_rows patch https://review.opendev.org/507486, which has been of high priority interest downstream lately | 17:39 |
openstackgerrit | Eric Fried proposed openstack/nova master: WIP: Process [compute] in $NOVA_CPU_CONF in nova-next https://review.opendev.org/672800 | 17:39 |
sean-k-mooney | melwitt: by the way has anyone reviewed the unified limits spec for the api subteam? not sure who that would be | 17:42 |
efried | sean-k-mooney: Isn't gmann "the api subteam"? | 17:44 |
sean-k-mooney | efried: i guess so? i wasn't sure who was on it, but i didn't want to see that spec slip through the cracks if they were not about to review it | 17:45 |
efried | Agree. | 17:46 |
melwitt | sean-k-mooney: no, not yet. people I usually ask about api stuff are alex_xu, gmann | 17:46 |
efried | I think I added gmann to that spec for that reason, but not sure if he looked. | 17:46 |
sean-k-mooney | similarly with the image encryption spec. | 17:46 |
efried | I'm sort of delegating, like "encouraging" the spec owners to track down whoever is needed. | 17:46 |
sean-k-mooney | oh the provider yaml spec merged | 17:48 |
sean-k-mooney | cool i should read the final version | 17:48 |
sean-k-mooney | johnthetubaguy: melwitt: if ye feel like reviewing a spec that is close https://review.opendev.org/#/c/608696/ im happy to do a little reaching out on behalf of the spec owner :) | 17:49 |
*** psachin has quit IRC | 17:50 | |
*** dklyle has quit IRC | 18:11 | |
*** dklyle has joined #openstack-nova | 18:12 | |
*** priteau has quit IRC | 18:13 | |
dansmith | melwitt: okay, I'm generally pretty wary of changing that stuff (or even trying to load enough context to review that). I'm not sure I'll get to that point before I leave tomorrow, but...ack :) | 18:14 |
openstackgerrit | Merged openstack/nova master: Remove 'nova.virt.driver.ComputeDriver.estimate_instance_overhead' https://review.opendev.org/672106 | 18:16 |
melwitt | dansmith: ok, thanks for letting me know. it was tough for me to load the context myself, so I understand. I wanted to ideally have you review since the patch involves ripping out the stuff that you had to add with the _cell_lock | 18:18 |
melwitt | I think it makes the fixture much simpler, but definitely want to run it by you in case I missed something | 18:20 |
dansmith | yeah I'm just afraid of it breaking something subtle which we don't find for a couple months and then think we need to fix it by changing the real code when in fact the fixture is too relaxed or something | 18:22 |
dansmith | but that's just because of how hard it was to get it right in the first place, of course | 18:22 |
*** igordc has joined #openstack-nova | 18:22 | |
*** brault has joined #openstack-nova | 18:23 | |
melwitt | oh, you mean something to do with racing tests appearing like real bugs when there's really just an issue with the fixture? yeah, I can understand that concern. as far as I can tell, my proposed patch removes all changing of global state, so I'd think there won't be an issue. but those are famous last words, I know | 18:27 |
*** brault has quit IRC | 18:27 | |
*** brault has joined #openstack-nova | 18:27 | |
melwitt | if we're too afraid to change the fixture, then we will hopefully be able to accept the multi-cell nova-manage patch's func test not being full coverage because of the faking that the CellDatabases fixture does. my primary objective is to get the multi-cell nova-manage db archive_deleted_rows done | 18:29 |
*** JamesBenson has joined #openstack-nova | 18:29 | |
dansmith | okay, I'm not sure why that is harder than other cell iteration things we do in tests, but I'd be much more inclined to accept more mockery (since that's really a trivial operation) vs. blocking that on rearchitecting the fixture. But, I haven't looked enough into why that's a problem to say really | 18:31 |
melwitt | and while working on that, its func test was not failing when it should have been (bug in a patchset), and I found it wasn't failing properly because of the "last context manager" faking in the fixture | 18:31 |
dansmith | maybe I can try to do that before tomorrow at least | 18:31 |
melwitt | dansmith: tl;dr is the func test is written correctly and is good, but it did _not_ catch a bug in the proposed multi-cell archive impl because of the faking in the fixture. the fixture auto-targets untargeted database access to the last targeted database or the default database. the former hid the bug in the impl because the fixture auto-targeted something that was not targeted in real life and needed to be targeted in real life | 18:34 |
melwitt | I hope that makes sense | 18:34 |
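The auto-targeting problem melwitt describes can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyCellRouter` and `buggy_archive` are hypothetical names invented for illustration and are not nova's CellDatabases fixture or the archive implementation. It shows how falling back to the "last targeted" database can make an untargeted call look correct in a test.

```python
class ToyCellRouter:
    """Toy stand-in for a test fixture that routes DB access (hypothetical)."""

    def __init__(self, default_db, fallback_to_last_target):
        self.default_db = default_db
        self.fallback_to_last_target = fallback_to_last_target
        self.last_target = None

    def resolve(self, target=None):
        """Return the database a call would hit."""
        if target is not None:
            # Targeted access: remember it, as the "last context manager" did.
            self.last_target = target
            return target
        if self.fallback_to_last_target and self.last_target is not None:
            # Untargeted access silently reuses the previous target.
            return self.last_target
        # Untargeted access goes to the configured default database.
        return self.default_db


def buggy_archive(router, cells):
    """Pretend to archive every cell, but forget to target the real work."""
    touched = []
    for cell in cells:
        router.resolve(cell)               # a targeted lookup happens first
        touched.append(router.resolve())   # bug: the archive call is untargeted
    return touched
```

With the fallback enabled, `buggy_archive` appears to visit every cell, so a test passes; with the fallback disabled, every call lands on the default database and the bug becomes visible.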
mriedem | i'd also rather figure out why the func tests on the archive patch don't work rather than block on redoing the fixture, but i don't know what the issues were, | 18:35 |
mriedem | having said that, | 18:35 |
*** ivve has joined #openstack-nova | 18:35 | |
mriedem | i have a func test in my cross-cell resize series that does db archive on all 3 cells in the test https://review.opendev.org/#/c/651650/22/nova/tests/functional/test_cross_cell_migrate.py | 18:35 |
dansmith | but the real code sends untargeted stuff to the default db in the config, which is why the fixture does | 18:35 |
melwitt | mriedem: I did figure it out, it was because of the auto-targeting by "last targeted database" if untargeted | 18:35 |
dansmith | mriedem: yeah, even still, I'd be happy with just a unit test to make sure that archive is calling archive on all the cell mappings.. it | 18:36 |
dansmith | is such a trivial op I don't really know that we need much more than that, | 18:36 |
mriedem | ok i guess i mean "i'd rather figure out an easier way to make the new tests work with the existing fixture" | 18:36 |
melwitt | dansmith: but the fixture also sends untargeted stuff to the "last targeted db" first. I think it should only send untargeted stuff to the default db | 18:36 |
dansmith | and we run archive in tempest jobs, which should hit cell0 and cell1 if we make it run all cells | 18:36 |
mriedem | also, | 18:36 |
melwitt | (which is what my patch is doing) | 18:36 |
mriedem | i was going to say - nova-next runs archive_deleted_rows, we can and should make it run on all cells | 18:36 |
mriedem | which will hit cell0 and cell1 as dansmith said | 18:36 |
mriedem | so....then we'd have real integration test coverage | 18:37 |
dansmith | melwitt: everything in the compute node is untargeted though, which is why I don't really know how you can change that globally and have it match the real world | 18:37 |
dansmith | but.. I haven't read it so I dunno | 18:37 |
mriedem | of the cli, which is better than the functional stuff anyway | 18:37 |
dansmith | mriedem: agreed | 18:37 |
*** brinzhang has joined #openstack-nova | 18:37 | |
*** brinzhang_ has joined #openstack-nova | 18:37 | |
melwitt | dansmith: I think maybe things are getting confused. in the fixture we have two ways of targeting untargeted stuff. one is sending it to the default db (good) and one is sending it to the last db that was targeted (bad IMHO) | 18:38 |
*** mrch_ has joined #openstack-nova | 18:38 | |
mriedem | i have run afoul of that latter behavior | 18:38 |
dansmith | I understand | 18:38 |
mriedem | agree it's not fun | 18:38 |
mriedem | but i think i've found ways around that in my multi-cell func testing | 18:38 |
melwitt | so y'all actually like the last targeted db thing? I'm open to that, just didn't think anyone would think they wanted to keep it | 18:39 |
mriedem | this one https://review.opendev.org/#/c/641179/ | 18:39 |
*** eharney has quit IRC | 18:39 | |
mriedem | i remember i was hitting weirdness because of that 'last targeted context' thing as well | 18:39 |
mriedem | i'm not saying i like it | 18:39 |
mriedem | but i also don't like redoing the whole thing per se | 18:39 |
mriedem | when there are maybe other ways | 18:39 |
melwitt | I'm +1 on the real integration testing, that's fine by me. but I just thought it would make the fixture a lot simpler and less bug hiding to remove that bit about "last targeted db" | 18:40 |
mriedem | it's like touching the old quotas code - i can, but don't want to if i can help it | 18:40 |
mriedem | w/o looking deep into your change idk | 18:40 |
mriedem | i wouldn't abandon it, | 18:40 |
mriedem | but i wouldn't block the other thing on it either | 18:40 |
mriedem | i'd get the integration testing in nova-next working | 18:40 |
*** betherly has joined #openstack-nova | 18:41 | |
melwitt | that's fair. it's less like redoing and more like "delete all the self._last_ctxt_mgr" but yeah, when you get around to it, take a look and see if you hate it | 18:41 |
mriedem | which should be like, 1 line | 18:41 |
melwitt | don't get me wrong, I'm totally fine with that. as long as it gets tested, I'm happy. I was honed in on trying to make the func test work 100% | 18:42 |
*** brinzhang_ has quit IRC | 18:42 | |
*** brinzhang has quit IRC | 18:42 | |
*** brinzhang has joined #openstack-nova | 18:43 | |
*** brinzhang_ has joined #openstack-nova | 18:43 | |
melwitt | and thought people might be happy to see all the _last_ctxt_mgr stuff deleted from the fixture, no more global state changing, much simpler | 18:43 |
melwitt | I'll rebase the multi-cell archive patch on top of a different change to add --all-cells to nova-next | 18:45 |
*** betherly has quit IRC | 18:45 | |
*** lbragstad has quit IRC | 18:51 | |
*** mriedem has quit IRC | 18:54 | |
*** BjoernT has joined #openstack-nova | 18:59 | |
*** betherly has joined #openstack-nova | 19:01 | |
*** mriedem has joined #openstack-nova | 19:03 | |
mriedem | efried: ha http://lists.openstack.org/pipermail/openstack-discuss/2019-July/008037.html | 19:04 |
*** betherly has quit IRC | 19:05 | |
*** igordc has quit IRC | 19:09 | |
*** xek_ has joined #openstack-nova | 19:15 | |
*** jaypipes has joined #openstack-nova | 19:16 | |
*** xek has quit IRC | 19:17 | |
*** TxGirlGeek has quit IRC | 19:21 | |
openstackgerrit | Merged openstack/nova master: api-ref: touch up the os-services docs https://review.opendev.org/672571 | 19:25 |
*** igordc has joined #openstack-nova | 19:25 | |
*** igordc has quit IRC | 19:32 | |
artom | That actually went pretty well. | 19:33 |
* artom has minimal scaffolding in place to pass the NUMA LM func test | 19:34 | |
*** bbowen has quit IRC | 19:41 | |
*** vishwanathj has quit IRC | 19:45 | |
openstackgerrit | Matt Riedemann proposed openstack/nova-specs master: Support delete_on_termination in volume attach api https://review.opendev.org/612949 | 19:45 |
mriedem | i cleaned this up ^ it's pretty straight-forward | 19:45 |
*** dasp has quit IRC | 19:51 | |
*** dasp has joined #openstack-nova | 19:54 | |
efried | mriedem: approved that. Seems like an easy win. | 20:02 |
*** betherly has joined #openstack-nova | 20:02 | |
mriedem | ack, i'm cleaning up https://review.opendev.org/#/c/667894/ now | 20:03 |
efried | mriedem: make all the names match up too if you please | 20:04 |
efried | mriedem: the bp is at https://blueprints.launchpad.net/nova/+spec/add-user-id-field-to-the-migrations-table | 20:05 |
efried | (so the path at the top of the spec is right, but the file path, commit message, and topic are wrong) | 20:06 |
*** betherly has quit IRC | 20:07 | |
*** igordc has joined #openstack-nova | 20:08 | |
openstackgerrit | Merged openstack/nova-specs master: Support delete_on_termination in volume attach api https://review.opendev.org/612949 | 20:09 |
efried | sean-k-mooney, stephenfin: Did we ever figure out whether/how you could tell from within a guest which / how many CPUs are pinned? Was that going to be via "metadata"? | 20:10 |
sean-k-mooney | efried: that is what the latest spec say yes | 20:10 |
efried | okay, I was about to dig into it. Unfortunately, I don't see me being competent to update it such as to get it approved today. | 20:11 |
sean-k-mooney | https://review.opendev.org/#/c/668656/5/specs/train/approved/use-pcpu-vcpu-in-one-instance.rst@173 | 20:12 |
sean-k-mooney | The metadata API will be extended to dedicated cpu info with new version. | 20:12 |
sean-k-mooney | The new field will be added to the `meta_data.json`:: | 20:12 |
sean-k-mooney | dedicated_cpu=<cpuset string> | 20:12 |
sean-k-mooney | The ``cpuset string`` indicated the instance cpus which are running on | 20:12 |
efried | ah nice | 20:12 |
sean-k-mooney | dedicated pCPU. | 20:12 |
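A guest that reads the proposed `dedicated_cpu` field would need to expand the cpuset string into CPU IDs. The sketch below assumes the string is a comma-separated list of IDs and inclusive ranges such as `0-2,5`; the spec excerpt above does not define the exact syntax, so that format and the helper name `parse_cpuset` are assumptions for illustration only.

```python
def parse_cpuset(spec):
    """Expand a cpuset string like '0-2,5' into a set of CPU IDs.

    Assumes comma-separated IDs and inclusive 'start-end' ranges;
    exclusion syntax is deliberately not handled in this sketch.
    """
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty fields such as a trailing comma
        if "-" in part:
            start, end = part.split("-", 1)
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus
```

For example, `parse_cpuset("0-2,5")` yields the four CPU IDs 0, 1, 2 and 5.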
sean-k-mooney | you could also use that new numa topology api if we approved that i guess | 20:13 |
sean-k-mooney | although that is not really from within the guest | 20:13 |
efried | I would want stephenfin to be a +2 on that spec anyway, so I guess we'll see if alex_xu et al can polish it up and request sfe, since we seem to have general agreement with caveats as noted. | 20:14 |
efried | I'll send a note on sfe process (which apparently I'll be making up) either tomorrow or Monday. | 20:15 |
sean-k-mooney | does the timeline i set out make sense to you? | 20:15 |
efried | oh, totally. | 20:15 |
*** gyee has quit IRC | 20:22 | |
*** wwriverrat has quit IRC | 20:26 | |
openstackgerrit | Matt Riedemann proposed openstack/nova-specs master: Add user_id to the migrations https://review.opendev.org/667894 | 20:28 |
openstackgerrit | Matt Riedemann proposed openstack/nova-specs master: Add user_id to the migrations https://review.opendev.org/667894 | 20:29 |
mriedem | first is the diff, second is the file rename | 20:29 |
mriedem | only open question on ^ is if we should add user_id/project_id as request filter params to GET /os-migrations | 20:31 |
mriedem | i think we might as well | 20:31 |
*** zbr_ has quit IRC | 20:35 | |
efried | sean-k-mooney: please vet that I represented you correctly in the whiteboard https://blueprints.launchpad.net/nova/+spec/use-pcpu-and-vcpu-in-one-instance | 20:35 |
sean-k-mooney | sure | 20:37 |
* sean-k-mooney clicks | 20:37 | |
*** zbr has joined #openstack-nova | 20:37 | |
sean-k-mooney | yep | 20:38 |
efried | thx | 20:38 |
sean-k-mooney | without all the typos in my original comment :) | 20:38 |
gmann | efried: melwitt sean-k-mooney ack I will check the spec. it was in my list but did not get time to review. | 20:38 |
efried | thanks gmann | 20:38 |
melwitt | cool gmann | 20:39 |
efried | mriedem: where are request filter params for GET /os-migrations documented in the api-ref? | 20:41 |
efried | oh, are they the top-level fields hidden, host, instance_uuid etc? | 20:42 |
efried | uck | 20:42 |
efried | mriedem: but yeah, I think it makes sense to add this in there. | 20:43 |
mriedem | yeah | 20:43 |
mriedem | i frequently filter on migration_type and instance_uuid in functional tests since GET /servers/{server_id}/migrations is hard-coded to only be in-progress live migrations | 20:44 |
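Since the filters discussed here travel in the query string of GET /os-migrations, building such a request URL can be sketched as follows. The helper name and the endpoint value are hypothetical; `instance_uuid` and `migration_type` are the filters mentioned above, while `user_id` and `project_id` are only proposed in the spec under review and may not exist in any released API version.

```python
from urllib.parse import urlencode


def os_migrations_url(endpoint, **filters):
    """Build a GET /os-migrations URL with filters in the query string."""
    # Drop unset filters and sort the rest for a stable, readable URL.
    pairs = sorted((k, v) for k, v in filters.items() if v is not None)
    query = urlencode(pairs)
    url = endpoint.rstrip("/") + "/os-migrations"
    return url + "?" + query if query else url
```

A functional test could then request, say, only resize migrations for one instance by passing `instance_uuid` and `migration_type` as keyword arguments.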
*** boxiang_ has quit IRC | 20:48 | |
*** boxiang_ has joined #openstack-nova | 20:48 | |
*** mriedem has quit IRC | 20:52 | |
*** mriedem has joined #openstack-nova | 20:53 | |
efried | mriedem: chuck that filter field in there and I'm +2 | 20:54 |
*** BjoernT has quit IRC | 20:55 | |
mriedem | alright then | 20:55 |
efried | not sure if you're still comfortable being the other +2 | 20:55 |
mriedem | o | 20:55 |
mriedem | *i'm like a 1.5 | 20:55 |
efried | sean-k-mooney: want to throw a +1 on there to push us over the edge? | 20:56 |
efried | ...once mriedem has updated | 20:56 |
sean-k-mooney | efried: link? | 20:56 |
efried | https://review.opendev.org/#/c/667894/ -- stand by for PS4 | 20:56 |
sean-k-mooney | oh i have looked at that before, sure i'll re-review | 20:57 |
*** gyee has joined #openstack-nova | 20:59 | |
*** bbowen has joined #openstack-nova | 21:02 | |
*** betherly has joined #openstack-nova | 21:03 | |
mriedem | build specs docs takes awhile | 21:05 |
mriedem | *building | 21:05 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: [WIP-until-series-is-ready] Introduce live_migration_claim() https://review.opendev.org/635669 | 21:05 |
sean-k-mooney | i'm reviewing v3 in the meantime and then i'll review the delta | 21:05 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: New objects for NUMA live migration https://review.opendev.org/634827 | 21:05 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: LM: add support for sending NUMAMigrateData to the source https://review.opendev.org/634828 | 21:05 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: LM: add support for updating NUMA-related XML on the source https://review.opendev.org/635229 | 21:05 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: RPC changes to prepare for NUMA live migration https://review.opendev.org/634605 | 21:05 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: NUMA live migration support https://review.opendev.org/634606 | 21:05 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: Deprecate CONF.workarounds.enable_numa_live_migration https://review.opendev.org/640021 | 21:05 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: [WIP] Functional test for NUMA live migration https://review.opendev.org/672595 | 21:05 |
efried | mriedem: I use this: http://paste.openstack.org/raw/754874/ | 21:06 |
*** betherly has quit IRC | 21:08 | |
openstackgerrit | Matt Riedemann proposed openstack/nova-specs master: Add user_id to the migrations https://review.opendev.org/667894 | 21:08 |
efried | oh, I retract my 'uck' from earlier. Didn't pick up that these were in the querystring. That's dandy. | 21:10 |
*** zbr has quit IRC | 21:11 | |
sean-k-mooney | ya that is how we normally do filtering | 21:11 |
mriedem | yeah i wasn't sure why you were ucking | 21:12 |
sean-k-mooney | efried: has intel not given you a server with 70 billion cores to run cirros vms on yet? | 21:12 |
efried | pshht, what do you think? | 21:13 |
sean-k-mooney | they are great for building docs or running unit tests | 21:13 |
sean-k-mooney | i had 3 racks worth of servers in my name when i left so yes? | 21:13 |
efried | even if I had 70 billion cores, I would still want to build only the spec I modified. | 21:13 |
sean-k-mooney | you could add that script to the tools directory and add it as a tox target | 21:14 |
mriedem | eric is in HMC withdrawals | 21:14 |
sean-k-mooney | like fast8 | 21:14 |
efried | that's a good idea. | 21:15 |
*** slaweq has quit IRC | 21:15 | |
efried | where do those tools live? | 21:16 |
mriedem | they are just scripts in the repo | 21:17 |
sean-k-mooney | here https://github.com/openstack/nova-specs/tree/master/tools | 21:17 |
mriedem | the tox target calls them and passes through the args | 21:17 |
efried | oh, I thought there was a central repo | 21:17 |
sean-k-mooney | no | 21:17 |
sean-k-mooney | we copy paste all the things | 21:17 |
sean-k-mooney | there is a cookiecutter template somewhere | 21:18 |
mriedem | oslo-specs-incubator duh | 21:18 |
sean-k-mooney | but this would likely only be for tests, although i guess you could use it in nova | 21:18 |
sean-k-mooney | i still find the oslo incubator graduation script to be magical | 21:19 |
sean-k-mooney | we should have used it for placement extraction actually, but hindsight | 21:19 |
mriedem | i was joking | 21:20 |
mriedem | i just like to make oslo-incubator jokes to feel worldly | 21:20 |
mriedem | feel like a BIG MAN | 21:20 |
sean-k-mooney | sure but i really liked this script https://github.com/openstack/oslo-incubator/blob/stable/kilo/tools/filter_git_history.sh | 21:20 |
sean-k-mooney | i also really hated it after the 4th or 5th time i imported changes from neutron into networking-ovs-dpdk | 21:22 |
sean-k-mooney | but it was nice to be able to maintain the history | 21:22 |
openstackgerrit | Eric Fried proposed openstack/nova master: Process [compute] in $NOVA_CPU_CONF in nova-next https://review.opendev.org/672800 | 21:24 |
efried | mriedem: Should nova-cpu.conf need [[api_]database]/connection ? | 21:26 |
sean-k-mooney | no... | 21:27 |
*** zbr has joined #openstack-nova | 21:27 | |
sean-k-mooney | i dont think it should but mriedem or dansmith would know | 21:27 |
mriedem | efried: no | 21:28 |
*** whoami-rajat has quit IRC | 21:28 | |
*** pcaruana has quit IRC | 21:28 | |
mriedem | whether or not the cell conductor service needs to hit the api db depends on if you're allowing up-calls | 21:29 |
mriedem | which most people are | 21:29 |
mriedem | because we haven't closed all of those gaps | 21:29 |
mriedem | https://docs.openstack.org/nova/latest/user/cellsv2-layout.html#operations-requiring-upcalls | 21:29 |
*** ivve has quit IRC | 21:30 | |
sean-k-mooney | by the way i was talking to slaweq this morning about https://bugs.launchpad.net/neutron/+bug/1836642 and i'm pretty sure we root caused this to the fact that nova is not configured to use memcache in the gate and we are falling back to using the dogpile.null cache implementation | 21:30 |
openstack | Launchpad bug 1836642 in neutron "Metadata responses are very slow sometimes" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq) | 21:30 |
efried | red herring | 21:31 |
efried | (db conns from compute) | 21:31 |
mriedem | sean-k-mooney: coincidentally i just saw he added me to https://review.opendev.org/#/c/672715/1 | 21:32 |
sean-k-mooney | we are hitting ^ downstream too, so in at least the downstream case i don't think it's a duplicate of https://bugs.launchpad.net/openstack-gate/+bug/1808010 | 21:32 |
openstack | Launchpad bug 1808010 in OpenStack Compute (nova) "Tempest cirros ssh setup fails due to lack of disk space causing config-drive setup to fail forcing fallback to metadata server which fails due to hitting 10 second timeout." [Medium,Confirmed] | 21:32 |
*** zbr has quit IRC | 21:32 | |
sean-k-mooney | mriedem: oh cool ya just wanted to give ye an fyi as he said he was working on a fix in tempest/devstack | 21:33 |
*** panda has quit IRC | 21:34 | |
*** panda has joined #openstack-nova | 21:34 | |
*** brault has quit IRC | 21:36 | |
*** JamesBenson has quit IRC | 21:57 | |
*** JamesBenson has joined #openstack-nova | 22:00 | |
*** TxGirlGeek has joined #openstack-nova | 22:01 | |
*** JamesBenson has quit IRC | 22:04 | |
*** slaweq has joined #openstack-nova | 22:11 | |
*** eandersson has joined #openstack-nova | 22:15 | |
openstackgerrit | melanie witt proposed openstack/nova stable/stein: Avoid logging traceback when detach device not found https://review.opendev.org/672833 | 22:15 |
*** slaweq has quit IRC | 22:16 | |
*** betherly has joined #openstack-nova | 22:16 | |
*** rcernin has joined #openstack-nova | 22:16 | |
*** betherly has quit IRC | 22:21 | |
openstackgerrit | sean mooney proposed openstack/os-vif master: only disable mac ageing for ovs hybrid plug https://review.opendev.org/672834 | 22:22 |
eandersson | Anyone seen VMs failed scheduling getting stuck in building/scheduling indefinitely? | 22:28 |
eandersson | > There was a conflict when trying to complete your request.\n\n Unable to allocate inventory: Unable to create allocation for | 22:29 |
eandersson | > MaxRetriesExceeded: Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance | 22:29 |
eandersson | Hit a scheduling race condition which is fine, but then just got stuck (never into an error'd state) | 22:29 |
sean-k-mooney | it should not get stuck indefinitely; it should go to an error state | 22:29 |
eandersson | Yea - exactly | 22:29 |
eandersson | We have seen this a few times when something like Senlin is aggressively scaling up | 22:30 |
eandersson | And of course Senlin is unhappy because it just sits there waiting for it to go into ACTIVE (or ERROR'd) state. | 22:31 |
sean-k-mooney | if you are using any numa related resources like hugepages or cpu pinning, or if you are using sriov/pci passthrough, these are not tracked in placement, so if you have more than 1 scheduler with more than 1 worker they can race | 22:31 |
eandersson | Yea - that is exactly it. | 22:31 |
eandersson | We are fine with it failing due to the race condition. | 22:32 |
sean-k-mooney | if we make it to 2020 then this should be fixed in U when all that stuff is in placement. but back to your current problem: are there any errors in the conductor that could indicate why the instance was not put into error state? | 22:33 |
eandersson | Nope just the above errors. | 22:33 |
eandersson | Starts with the expected | 22:34 |
eandersson | > Free vcpu 0.00 VCPU < requested 20 VCPU | 22:34 |
sean-k-mooney | what release are you running? | 22:34 |
sean-k-mooney | this is where that error is being raised by the way https://github.com/openstack/nova/blob/3370f0f03ce17aaf3a7ebaa95d497f62bef238c0/nova/conductor/manager.py#L630 | 22:35 |
eandersson | Rocky | 22:35 |
sean-k-mooney | have you disabled the core, ram and disk filters? | 22:35 |
eandersson | We have not | 22:36 |
sean-k-mooney | http://lists.openstack.org/pipermail/openstack-dev/2018-January/126283.html | 22:36 |
sean-k-mooney | they have been deprecated and should not be enabled after ocata | 22:36 |
*** ccstone has joined #openstack-nova | 22:36 | |
sean-k-mooney | we stopped reporting the info they used, i think in rocky or stein | 22:37 |
sean-k-mooney | so it might not have been a race; we could have eliminated all the hosts because the filters did not work | 22:38 |
eandersson | I feel like that would have been a more visible problem though | 22:40 |
eandersson | We are seeing this happen in %0.1> | 22:40 |
sean-k-mooney | ya i think if we got to this part of the code it's not the filters | 22:41 |
sean-k-mooney | you should turn them off however as they will break when you upgrade to stein | 22:41 |
sean-k-mooney | so after we log that message we should raise here https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/manager.py#L601 | 22:42 |
sean-k-mooney | and end up just below it here https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/manager.py#L615 | 22:42 |
sean-k-mooney | at which point we should notify that the vm state should be error | 22:43 |
sean-k-mooney | https://github.com/openstack/nova/blob/stable/rocky/nova/scheduler/utils.py#L573 is where the exception is being logged as a warning | 22:47 |
sean-k-mooney | and we do an instance.save right after it | 22:47 |
sean-k-mooney | that save should have updated the instance to error | 22:48 |
eandersson | Does overcommit no longer work in aggregates? | 22:48 |
sean-k-mooney | we broke that | 22:49 |
sean-k-mooney | in ocata | 22:49 |
sean-k-mooney | from ocata on you need to set the overcommit per node | 22:51 |
sean-k-mooney | in stein we implemented https://github.com/openstack/nova-specs/blob/master/specs/stein/implemented/initial-allocation-ratios.rst | 22:51 |
sean-k-mooney | which allows you to manage allocation ratios directly via placement | 22:51 |
sean-k-mooney | and only specify an initial option in the config | 22:52 |
sean-k-mooney | eandersson: but yes that is why melwitt sent http://lists.openstack.org/pipermail/openstack-dev/2018-January/126283.html and why we deprecated the core, ram and disk filters and their aggregate* counterparts | 22:53 |
eandersson | We might have placement slightly misconfigured | 22:56 |
mriedem | https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#allocation-ratios on the scheduler allocation ratio + placement config stuff | 22:57 |
mriedem | the initial* options are only new in stein | 22:57 |
mriedem | so that doesn't help you in rocky | 22:57 |
sean-k-mooney | eandersson: in rocky the nova compute agent will continuously set the ratios back to whatever is in the compute node config | 22:58 |
mriedem | you can either override allocation ratios per compute or override the providers in placement...but i think compute will overwrite anything you set out of band | 22:58 |
melwitt | eandersson: >= ocata there's no notion of a per aggregate allocation ratio. so you have to set them separately per compute host nova.conf | 22:58 |
sean-k-mooney | or use the default in code if not set | 22:58 |
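The effect of a per-compute allocation ratio on scheduling can be sketched with placement's capacity arithmetic, where usable capacity is (total - reserved) * allocation_ratio. The function names and the numbers below are illustrative assumptions, not values from eandersson's deployment.

```python
def placement_capacity(total, reserved, allocation_ratio):
    """Usable capacity of one inventory: (total - reserved) * ratio."""
    return int((total - reserved) * allocation_ratio)


def fits(total, reserved, allocation_ratio, used, requested):
    """Return True if a request fits within the remaining capacity."""
    capacity = placement_capacity(total, reserved, allocation_ratio)
    return used + requested <= capacity
```

For a host with 20 VCPU that are all in use, a request for 20 more is rejected at a ratio of 1.0 but fits at 16.0, which is why a wrongly set ratio on a compute node only shows up once hosts start to fill.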
eandersson | We set the computes wrong for non-overcommited at the moment | 22:59 |
eandersson | That could be causing the issue | 22:59 |
mriedem | note that you might not even be getting this far https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/manager.py#L619 | 22:59 |
sean-k-mooney | it's tricky because it will initially appear to work fine until you start filling your hosts | 22:59 |
mriedem | ^ is only if you hit a primary host, it fails and you reschedule | 22:59 |
mriedem | if initial scheduling fails, you should get NoValidHost and the instances should be put into ERROR status | 23:00 |
mriedem | if initial scheduling fails, you should go here https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/manager.py#L1240 | 23:00 |
mriedem | and the instances go into cell0 with ERROR status | 23:01 |
mriedem | if your nova_cell0 db fell over then you're missing some updates... | 23:01 |
eandersson | > | OS-EXT-STS:task_state | scheduling | 23:01 |
eandersson | > | OS-EXT-STS:vm_state | building | 23:01 |
eandersson | It was stuck like this until we deleted btw | 23:01 |
eandersson | 12 hours later | 23:01 |
mriedem | was the instance ever reported as being on a host? | 23:01 |
mriedem | or in cell0? | 23:01 |
sean-k-mooney | mriedem: for some reason that is not happening and the error eandersson is seeing is logged from here https://github.com/openstack/nova/blob/stable/rocky/nova/scheduler/utils.py#L573 | 23:01 |
eandersson | It tried to schedule, but the moment it did it failed with | 23:01 |
eandersson | > Free vcpu 0.00 VCPU < requested 20 VCPU | 23:01 |
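The overcommit point above can be made concrete with the capacity formula placement documents for a resource provider inventory, (total - reserved) * allocation_ratio. The following is a minimal sketch, not nova or placement code: the function name and the host numbers (40 cores, 40 VCPU already used) are hypothetical, chosen only to show how a ratio of 1.0 leaves nothing free for a 20 VCPU request on a host that looks healthy at a higher ratio.

```python
def free_capacity(total, reserved, allocation_ratio, used):
    """Units still allocatable: (total - reserved) * allocation_ratio - used."""
    return (total - reserved) * allocation_ratio - used


# Hypothetical host: 40 physical cores, none reserved, 40 VCPU already allocated.
requested = 20
no_overcommit = free_capacity(total=40, reserved=0, allocation_ratio=1.0, used=40)
with_overcommit = free_capacity(total=40, reserved=0, allocation_ratio=16.0, used=40)

# With a ratio of 1.0 the host is full; with 16.0 the same request fits easily.
print(f"ratio 1.0:  free={no_overcommit:.2f} fits={no_overcommit >= requested}")
print(f"ratio 16.0: free={with_overcommit:.2f} fits={with_overcommit >= requested}")
```

Since the ratio only matters once usage approaches the physical total, a wrongly low ratio can go unnoticed until hosts start to fill.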
mriedem | if something fell over in conductor, like the db insert/update, you should have had error logs | 23:02 |
mriedem | sean-k-mooney: that utils code is called from multiple places in conductor | 23:02 |
eandersson | Only other time I have seen this happen was when we moved nova-conductor to a new host | 23:02 |
eandersson | and forgot that the db is configured in mysql | 23:02 |
sean-k-mooney | yes | 23:02 |
mriedem | sean-k-mooney: https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/manager.py#L1205 | 23:02 |
sean-k-mooney | i tracked it from here https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/manager.py#L601 | 23:02 |
sean-k-mooney | based on the message that was logged | 23:02 |
mriedem | ok i think i've reported a bug that we could be failing to set instances to ERROR in build_instances if something fails, i remember talking with gibi about it | 23:03 |
*** dtruong has joined #openstack-nova | 23:03 | |
melwitt | +1 look for db-related errors in the log. that is how I've seen other situations internally where instance got stuck in building/scheduling state | 23:03 |
sean-k-mooney | eandersson are you seeing the "'Setting instance to %s state.'" message | 23:04 |
eandersson | Does that have the instance uuid? | 23:04 |
sean-k-mooney | yes | 23:04 |
mriedem | rpc could have also fallen over | 23:05 |
eandersson | Then no | 23:05 |
sean-k-mooney | mriedem: right that is the only thing between those two lines that could fail | 23:05 |
mriedem | in that case you'd probably have MessagingTimeouts for the db save rpc calls | 23:05 |
eandersson | I have 1-2 MySQL server has gone away in the logs, but nothing near the time that happened | 23:06 |
eandersson | (also those only failed on select 1) | 23:07 |
sean-k-mooney | well i was wondering if the notifier = rpc.get_notifier(service) line is where it stopped | 23:07 |
sean-k-mooney | so it might not be related to the db | 23:07 |
sean-k-mooney | but to rabbit | 23:07 |
*** _erlon_ has quit IRC | 23:07 | |
eandersson | hmm does placement do rpc? | 23:09 |
eandersson | or would this be within nova only? | 23:09 |
sean-k-mooney | this is in nova | 23:09 |
sean-k-mooney | in the conductor | 23:09 |
sean-k-mooney | and no placement does not do any rpc as far as i am aware | 23:09 |
sean-k-mooney | it's just a wsgi app in front of a db | 23:10 |
eandersson | One thing I don't like with oslo.messaging is that it ack's the message before it gets processed | 23:10 |
eandersson | oh | 23:14 |
eandersson | > Exception during message handling | 23:14 |
eandersson | > Exception during message handling: MaxRetriesExceeded: Exceeded maximum number of retries. | 23:14 |
eandersson | That is the error I posted above | 23:15 |
eandersson | And this is from a normal failure | 23:15 |
eandersson | > Setting instance to ERROR state.: MaxRetriesExceeded: Exceeded maximum number of retries. | 23:15 |
mriedem | yeah that's this https://opendev.org/openstack/nova/src/branch/master/nova/scheduler/utils.py#L730 | 23:17 |
mriedem | called from https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/manager.py#L619 | 23:17 |
mriedem | the instance change should be saved here https://opendev.org/openstack/nova/src/branch/master/nova/scheduler/utils.py#L736 | 23:17 |
*** mriedem has quit IRC | 23:18 | |
*** betherly has joined #openstack-nova | 23:18 | |
sean-k-mooney | right but in the case where it remains in building we get Exception during message handling:... instead of Setting instance to ERROR state.:... | 23:19 |
sean-k-mooney | eandersson: is ^ correct | 23:19 |
eandersson | https://opendev.org/openstack/oslo.messaging/src/branch/master/oslo_messaging/rpc/server.py#L174 | 23:19 |
eandersson | This could be anything :'( | 23:20 |
*** JamesBenson has joined #openstack-nova | 23:20 | |
*** betherly has quit IRC | 23:23 | |
sean-k-mooney | eandersson: it might be good to check your rabbitmq server logs and see if there are any errors | 23:23 |
sean-k-mooney | although i honestly dont really know how that code works | 23:23 |
eandersson | http://paste.openstack.org/show/754875/ | 23:24 |
eandersson | sean-k-mooney, I honestly don't think it's a rmq issue directly | 23:24 |
*** JamesBenson has quit IRC | 23:24 | |
*** brinzhang has quit IRC | 23:24 | |
sean-k-mooney | ok it's unlikely that it's related to that placement error | 23:25 |
sean-k-mooney | that should happen before we try to build the instance | 23:25 |
*** brinzhang has joined #openstack-nova | 23:25 | |
eandersson | That log was from the same millisecond | 23:25 |
eandersson | I found two Messaging errors and both are the same issue | 23:25 |
sean-k-mooney | it's possible that we only have 1 candidate host | 23:26 |
sean-k-mooney | and that we raced | 23:26 |
eandersson | Other than that no other oslo messaging issues | 23:26 |
sean-k-mooney | as a result of the fact overcommit is not working | 23:26 |
eandersson | If there was a rmq issue I am sure other services or request would have failed | 23:26 |
sean-k-mooney | neutron would be the first | 23:26 |
eandersson | We have a pretty massive deployment so always a lot of things going on | 23:26 |
sean-k-mooney | although they have reduced the rpc traffic a lot in the last release or two | 23:27 |
sean-k-mooney | oh the same message is also logged from here https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/manager.py#L676 | 23:29 |
artom | Heh, for what it's worth, that per-compute libvirt connection mocking has issues: | 23:30 |
artom | " 2019-07-25 19:29:16,889 ERROR [nova.virt.libvirt.host] Hostname has changed from test_compute0 to test_compute1. A restart is required to take effect." | 23:30 |
sean-k-mooney | the other place was the cellv1 version | 23:30 |
sean-k-mooney | actually never mind, it's the same function | 23:31 |
*** tjgresha has quit IRC | 23:31 | |
sean-k-mooney | so if this is a retry then we are executing https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/manager.py#L642-L676 | 23:33 |
sean-k-mooney | and on line 655 we are trying to claim the alternate hosts which is failing with http://paste.openstack.org/show/754875/ because overcommit is not working | 23:34 |
eandersson | Yea pretty sure the error is within oslo.messaging | 23:34 |
sean-k-mooney | it might not be | 23:35 |
sean-k-mooney | when we raise the exception from here https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/manager.py#L676 | 23:35 |
eandersson | oh | 23:35 |
sean-k-mooney | it is not caught locally in this function | 23:35 |
*** brault has joined #openstack-nova | 23:36 | |
sean-k-mooney | i should probably go to bed, but there is likely a bug there; i'm too tired to track it down this evening | 23:39 |
mnaser | efried: i think you were working on moving nova to use openstacksdk ? do you have some of the commit you did in nova that did that? trying to do something similar for another project | 23:40 |
*** brault has quit IRC | 23:40 | |
eandersson | Thanks for the help sean-k-mooney | 23:42 |
eandersson | I'll create a bug report | 23:42 |
gmann | johnthetubaguy: melwitt sean-k-mooney efried added my comment/query on unified limit spec. I am not very clear about how we will handle GET for limits which are not going to move to new unified limits(for example server_groups). | 23:42 |
gmann | I am almost ok for proxy APIs (as per operators interest) if HTTPGone on those APIs is not acceptable. | 23:43 |
melwitt | gmann: thanks for reviewing. I am not 100% sure operators would be opposed to having to use an older microversion to use the APIs *but* I think the thing that makes it weird is that unified limits would be opt-in. so that is where I'm uneasy with removing proxy API in new microversion. wanted your opinion on that aspect as well | 23:45 |
gmann | melwitt: 'removing proxy API in new microversion' while keeping them working for older ones seems to bring no benefit and even more maintenance. I was thinking we say those APIs are gone (410 response code HTTPGone) because the nova quota system is gone, without a microversion bump. similar approach to the nova-network & n-cert case. | 23:50 |
gmann | and before we do that we can trigger the notification to users via deprecating those APIs | 23:51 |
melwitt | gmann: oh, sorry, I didn't know HTTPGone is different. my bad | 23:51 |
melwitt | yeah, I'd like to be able to do that but wasn't sure about the API perspective of deprecating something when the new thing is opt-in and not on by default | 23:52 |
*** brinzhang_ has quit IRC | 23:52 | |
gmann | melwitt: https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/floating_ip_dns.py#L25 | 23:52 |
eandersson | sean-k-mooney, https://bugs.launchpad.net/nova/+bug/1837955 | 23:53 |
openstack | Launchpad bug 1837955 in OpenStack Compute (nova) "MaxRetriesExceeded sometime fails with messaging exception" [Undecided,New] | 23:53 |
*** brinzhang_ has joined #openstack-nova | 23:53 | |
gmann | melwitt: the only thing that makes me uncomfortable about doing that is the comment johnthetubaguy added in the alternate section about the Forum discussion with operators on keeping the old tooling. But i hope that is only for the transition period, not permanently | 23:54 |
*** smcginnis has quit IRC | 23:56 | |
gmann | melwitt: and second point is about GET quotas API to get the limits which are not going on new system. keep existing GET quotas APIs for them ? | 23:57 |
melwitt | gmann: ok. the treatment of /os-quota-sets and /os-quota-class-sets is definitely temporary for the transition. the /limits API we're not completely sure because if we HTTPGone that one, there will be no more ability for users to show limits + usage in one API. but if the community could be OK with having to go to two different APIs (keystone for limits and placement for usage) then we could 410 /limits too in the future when unified | 23:57 |
melwitt | limits mode is on by default | 23:57 |
gmann | melwitt: +1 we can decide the limit thing later once hierarchy unified limits are there. | 23:59 |
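The "410 without a microversion bump" idea discussed above can be illustrated with a toy request handler. This is only a sketch under stated assumptions, not nova's API code: `respond` and `RETIRED_PATHS` are hypothetical names, and listing the two quota paths as retired reflects the proposal under discussion, not current behaviour. The point it shows is that a retired endpoint answers 410 Gone no matter which microversion the client requests.

```python
from http import HTTPStatus

# Hypothetical set of retired endpoints; the paths are the ones named in the
# discussion, but marking them as removed here is purely illustrative.
RETIRED_PATHS = {"/os-quota-sets", "/os-quota-class-sets"}


def respond(path, microversion):
    """Return (status_code, body); a toy stand-in for a real API layer."""
    if path in RETIRED_PATHS:
        # Same answer for every microversion: no version bump is involved.
        return HTTPStatus.GONE.value, "This API has been removed."
    return HTTPStatus.OK.value, "ok"


for version in ("2.1", "2.75"):
    print(version, respond("/os-quota-sets", version))
```

The `microversion` argument is deliberately unused: that is what distinguishes this approach from removing an API only in a newer microversion.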