auniyal | finally https://review.opendev.org/c/openstack/nova/+/839922 merged, feels like a good morning :D | 03:13 |
---|---|---|
opendevreview | Tobias Urdin proposed openstack/nova master: Fix wrong nova-manage command in upgrade check https://review.opendev.org/c/openstack/nova/+/880819 | 06:12 |
opendevreview | Amit Uniyal proposed openstack/nova master: WIP: Reproducer for dangling volumes https://review.opendev.org/c/openstack/nova/+/881457 | 06:49 |
opendevreview | Amit Uniyal proposed openstack/nova master: WIP: Delete dangling volumes https://review.opendev.org/c/openstack/nova/+/882284 | 06:49 |
* bauzas facepalms : I forgot to say 'recheck' when I wrote my gerrit comment | 07:22 | |
opendevreview | yatin proposed openstack/nova master: Add config option to configure TB cache size https://review.opendev.org/c/openstack/nova/+/868419 | 07:23 |
sahid | o/ | 07:30 |
opendevreview | Danylo Vodopianov proposed openstack/nova-specs master: Add support for Napatech LinkVirt SmartNICs https://review.opendev.org/c/openstack/nova-specs/+/859290 | 08:27 |
gibi | bauzas: thanks for the +2 on https://review.opendev.org/c/openstack/nova/+/862687 , could you please check the test patch below that as well? | 08:34 |
* bauzas clicks | 08:34 | |
bauzas | gibi: ah I forgot to click 'submit' | 08:35 |
bauzas | shitty today | 08:35 |
bauzas | I could be off for 15 mins, the Tesla garage (changing my windshield due to a rock) will lend me a TMX for 10 mins :) | 08:36 |
gibi | enjoy :) | 08:57 |
opendevreview | Merged openstack/nova master: Reproduce asym NUMA mixed CPU policy bug https://review.opendev.org/c/openstack/nova/+/862686 | 10:07 |
sean-k-mooney | I'm off today, but while I remember it: I found an issue with the nova unit tests yesterday when trying to set up my new laptop | 11:43 |
sean-k-mooney | some of our unit tests are calling systemctl because of a lack of mocks | 11:43 |
sean-k-mooney | I noticed this by running the tests in an ubuntu:22.04 container that did not have it | 11:43 |
sean-k-mooney | I'll try to reproduce on Monday and file a bug or fix it, depending on how much time I have to look at it | 11:44 |
sean-k-mooney | gibi: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_0ae/862687/3/gate/nova-tox-functional-py38/0aea18b/testr_results.html there is an odd db issue in the func test results on your patch | 11:46 |
sean-k-mooney | that often means there is sharing of global state somewhere | 11:47 |
sean-k-mooney | it passed in other runs so we can probably just recheck it and see if it's persistent, but that test might be flaky so we should keep an eye on it | 11:47 |
sean-k-mooney | the failure is not related to your change | 11:48 |
gibi | sean-k-mooney: it is tracked in https://bugs.launchpad.net/nova/+bug/2002782 | 11:54 |
sean-k-mooney | oh cool | 11:54 |
sean-k-mooney | i have seen it once or twice but not often | 11:55 |
sean-k-mooney | o/ | 11:55 |
gibi | o/ | 11:56 |
opendevreview | Franciszek Przewoźny proposed openstack/placement master: Changed /tmp/migrate-db.rc to /root/migrate-db.rc https://review.opendev.org/c/openstack/placement/+/882436 | 12:14 |
opendevreview | Franciszek Przewoźny proposed openstack/placement master: Changed /tmp/migrate-db.rc to /root/migrate-db.rc https://review.opendev.org/c/openstack/placement/+/882436 | 12:26 |
opendevreview | Dan Smith proposed openstack/nova master: Populate ComputeNode.service_id https://review.opendev.org/c/openstack/nova/+/879904 | 14:11 |
opendevreview | Dan Smith proposed openstack/nova master: Add compute_id columns to instances, migrations https://review.opendev.org/c/openstack/nova/+/879499 | 14:11 |
opendevreview | Dan Smith proposed openstack/nova master: Add dest_compute_id to Migration object https://review.opendev.org/c/openstack/nova/+/879682 | 14:11 |
opendevreview | Dan Smith proposed openstack/nova master: Add compute_id to Instance object https://review.opendev.org/c/openstack/nova/+/879500 | 14:11 |
opendevreview | Dan Smith proposed openstack/nova master: Online migrate missing Instance.compute_id fields https://review.opendev.org/c/openstack/nova/+/879905 | 14:11 |
opendevreview | Artom Lifshitz proposed openstack/nova stable/wallaby: Reproduce bug 1995153 https://review.opendev.org/c/openstack/nova/+/882321 | 14:12 |
opendevreview | Artom Lifshitz proposed openstack/nova stable/wallaby: Save cell socket correctly when updating host NUMA topology https://review.opendev.org/c/openstack/nova/+/882322 | 14:12 |
dansmith | kashyap: another kernel crash this morning: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_3b1/881764/9/check/cinder-tempest-plugin-basic-zed/3b198f9/testr_results.html | 15:38 |
dansmith | again, in a test that is attaching volumes, but a very different scenario than the original one.. the former was ceph backed and a full tempest run, this one is non-ceph, small set of tests for just the cinder plugin | 15:39 |
dansmith | I wonder if it would make sense to just open a kernel bug instead of hitting up the virt team? | 15:39 |
kashyap | dansmith: Hmm, so it is "fairly intermittently reproducible". I'm assuming you mean a distro kernel bug? (I've filed those in the past - depending on the subsystem, they get looked at; or it just rots) | 15:47 |
dansmith | kashyap: well, starting with ubuntu kernel might be reasonable, yeah | 15:47 |
kashyap | dansmith: Since the host (L1) is Ubuntu, how about we (I can do it first thing Monday) start by filing an Ubuntu kernel bug? | 15:47 |
kashyap | Bingo | 15:48 |
dansmith | https://bugs.launchpad.net/nova/+bug/2018612 | 15:48 |
dansmith | I just filed this for nova ^ | 15:48 |
dansmith | so you can reference that if you want | 15:48 |
dansmith | kashyap: yeah that'd be great if you can do that | 15:49 |
dansmith | sean-k-mooney was also going to get an alpine guest to try to see if it ever sees the same | 15:49 |
dansmith | I wish we had a cirros-bug-rhel-kernel image so we could make it a RHEL thing, but.. not sure we can | 15:49 |
dansmith | but since this is ubuntu host, ubuntu guest, that's probably a good place to start | 15:49 |
kashyap | dansmith: Aaaah, nice. Can we do an "affects kernel" reference here? /me looks | 15:50 |
dansmith | kashyap: idk | 15:50 |
kashyap | Don't worry; I'll do it | 15:50 |
dansmith | thanks | 15:50 |
kashyap | By "cirros-bug-rhel-kernel" image you mean CirrOS with the RHEL kernel on it, right? | 15:51 |
kashyap | (Yeah, I agree; as there's more kernel-team capacity there to look at it) | 15:51 |
dansmith | ah, I typo'd | 15:51 |
dansmith | I meant "cirros-but-with-the-rhel-kernel" :) | 15:51 |
dansmith | i.e. something that can run in 128MB of RAM so we could use it to repro (or not) the problem :) | 15:51 |
kashyap | Aah, okay, I parsed it right then | 15:51 |
kashyap | Yeah, I see what you mean. And _still_ the RHEL folks might decline to debug - RHEL doesn't support TCG (emulation). | 15:52 |
kashyap | I need to go for a doc appointment; be back later. And point noted. I hope this is not grinding the CI env to a halt | 15:52 |
dansmith | no, but I'm just one guy and have seen this a few times this week | 15:53 |
dansmith | even CI impact aside, if it can happen to real instances, that's something we should care about | 15:53 |
kashyap | Fair enough. "Other guys might see it more times" | 15:53 |
dansmith | ah, noted about TCG | 15:54 |
kashyap | What makes this really tricky is the QEMU-on-KVM setup :-( "KVM on KVM" would make it so much more tractable to get the RHEL kernel/KVM folks to look at it | 15:54 |
dansmith | yeah, well, hard to fix that really | 15:55 |
dansmith | if it's really TCG related that'd be interesting I guess | 15:55 |
kashyap | We used to get some nodes with a KVM-on-KVM setup; I forget which cloud it is | 15:55 |
dansmith | I understand about not debugging issues in unsupported environments, but unless it matters for this, it'd suck to ignore it because of that | 15:55 |
kashyap | Yeah, if we can reproduce it in an env w/ KVM-on-KVM then it's a "more real issue" (for lack of a better term) | 15:56 |
kashyap | (I agree) | 15:56 |
* kashyap back later | 15:57 | |
clarkb | we have nested-virt flavors that should schedule to the clouds that do kvm on kvm | 15:59 |
clarkb | you could push a change that ran the workload a bunch of times on those flavors to see if it occurs with kvm on kvm | 15:59 |
clarkb | re emulation not being supported: I always find it interesting when the ability to test your software in the first place is deprioritized. It feels like openshift does this as well, since you can't just run the services in a few containers and test our software's integration with it anymore (you could with v3, but v4 effectively killed that; some people are working on it again) | 16:00 |
dansmith | clarkb: the cinder tempest plugin jobs are pretty volume and compute heavy, so just running those on that flavor might be a thing | 16:00 |
dansmith | clarkb: can you easy button show me how to do that? | 16:00 |
clarkb | dansmith: ya let me pull up the necessary info | 16:03 |
clarkb | I think devstack's special nodesets may make this slightly more complicated than we would hope but still doable | 16:03 |
clarkb | dansmith: you would define a new nodeset like https://opendev.org/openstack/devstack/src/branch/master/.zuul.yaml#L11-L19 changing its name to something unique and modifying its label value(s) to nested-virt labels (full list of labels can be seen at https://zuul.opendev.org/t/openstack/labels) Then modify your tempest job definitions to use that new nodeset. | 16:06 |
clarkb | https://opendev.org/openstack/tempest/src/branch/master/zuul.d/integrated-gate.yaml#L254 | 16:06 |
clarkb | then if you make a lot of copies of the job definition, each with a unique name, you can add them to your pipeline definition https://opendev.org/openstack/tempest/src/branch/master/zuul.d/project.yaml#L10 and run the same thing a bunch of times in parallel | 16:07 |
dansmith | so I depends-on that devstack change from wherever I need to do that? | 16:07 |
clarkb | yes, or you can just define the new nodeset directly where you do it | 16:09 |
clarkb | since you are making a copy it can live in the same change that you write for this (in tempest or nova or cinder etc) | 16:09 |
clarkb | It just needs to follow the same basic format as the devstack nodesets, because devstack has expectations about groups. You can change the nodeset name and node labels though | 16:10 |
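A minimal sketch of the copied nodeset and job clarkb describes - the names and the nested-virt label here are illustrative assumptions modeled on the devstack .zuul.yaml linked above, not the exact definitions:

```yaml
# Hypothetical nodeset copied from devstack's single-node definition,
# renamed and pointed at a nested-virt capable label.
- nodeset:
    name: devstack-single-node-nested-virt
    nodes:
      - name: controller
        label: nested-virt-ubuntu-jammy
    groups:
      # devstack expects its group names (e.g. tempest) to be present
      - name: tempest
        nodes:
          - controller

# Hypothetical job copy pinned to that nodeset; several uniquely named
# copies like this could be added to the pipeline to run the same
# workload many times in parallel.
- job:
    name: cinder-tempest-plugin-basic-nested-virt
    parent: cinder-tempest-plugin-basic
    nodeset: devstack-single-node-nested-virt
```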
dansmith | ah, okay | 16:10 |
dansmith | would it not be easier to just redefine all or most of our nodesets to use the nested label? then I could run regular jobs with a Depends-On on that devstack change without needing to modify the nodeset everywhere | 16:11 |
opendevreview | Artom Lifshitz proposed openstack/nova stable/wallaby: Reproduce bug 1995153 https://review.opendev.org/c/openstack/nova/+/882321 | 16:12 |
opendevreview | Artom Lifshitz proposed openstack/nova stable/wallaby: Save cell socket correctly when updating host NUMA topology https://review.opendev.org/c/openstack/nova/+/882322 | 16:12 |
clarkb | yes you could do that too | 16:16 |
dansmith | okay, lemme try that | 16:16 |
clarkb | it would probably be worth a note in the commit message that you shouldn't merge that change, because it will severely limit how many nodes are available for running all devstack-based jobs | 16:20 |
clarkb | I think it is a single cloud currently | 16:20 |
dansmith | yep, marked as DNM | 16:20 |
clarkb | dansmith: oh and you need to update devstack to not force emulation by default | 16:20 |
dansmith | yeah I saw that | 16:20 |
* clarkb looks for where that is done | 16:20 | |
dansmith | I guess I better do that in the devstack patch too | 16:20 |
dansmith | I found it already | 16:21 |
clarkb | looks like it's in devstack/.zuul.yaml; grep for LIBVIRT_TYPE and set it to kvm | 16:21 |
clarkb | but you found it | 16:21 |
clarkb | (it is set twice I think both need modification) | 16:22 |
dansmith | yup | 16:22 |
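For context, the variable clarkb means lives in the devstack job's localrc vars; a hedged sketch of the flip (the surrounding structure is from memory, not verbatim):

```yaml
- job:
    name: devstack
    vars:
      devstack_localrc:
        # default is qemu (TCG emulation); flip to kvm on nested-virt nodes
        LIBVIRT_TYPE: kvm
```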
dansmith | clarkb: https://review.opendev.org/c/openstack/devstack/+/882457 | 16:26 |
dansmith | "does not match definition on master" ? | 16:26 |
clarkb | bah | 16:27 |
clarkb | I guess you will need to add definitions then with unique names | 16:27 |
dansmith | how would one ever change them then? | 16:30 |
dansmith | define new and delete the old or something? | 16:30 |
clarkb | yes | 16:30 |
clarkb | there are alternatives as well. You can define anonymous nodesets in jobs directly (without names, they just apply when that job runs), or in a central unbranched repo so there is a single copy of them (project-config could serve this purpose) | 16:31 |
clarkb | I believe they are defined directly in devstack to make it easier for third party ci systems to use though | 16:31 |
dansmith | okay so I should be able to just do this in c-t-p's .zuul then | 16:31 |
clarkb | yes that would work | 16:31 |
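A sketch of the anonymous-nodeset variant clarkb mentions, defined inline on the job itself so no shared name can conflict with devstack's branched definitions (names and label are again illustrative assumptions):

```yaml
- job:
    name: cinder-tempest-plugin-basic-nested-virt
    parent: cinder-tempest-plugin-basic
    # anonymous nodeset: no name, it applies only when this job runs
    nodeset:
      nodes:
        - name: controller
          label: nested-virt-ubuntu-jammy
      groups:
        - name: tempest
          nodes:
            - controller
```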
spatel | sean-k-mooney afternoon! | 16:31 |
clarkb | and then you can flip the LIBVIRT_TYPE var there too I think | 16:32 |
spatel | If you're around, I have a question related to nova DB cleanup: my machine OOMed, some VMs are stuck in the nova DB, and I'm not sure how to clean them out. | 16:32 |
dansmith | clarkb: I'm messing up something with the nodeset definition: https://review.opendev.org/c/openstack/cinder-tempest-plugin/+/882458 | 16:48 |
dansmith | the error message is less helpful this time | 16:48 |
dansmith | oh wait | 16:50 |
dansmith | is it because name isn't indented? must be it | 16:51 |
dansmith | i believe it's running as expected now, thanks clarkb | 17:03 |
opendevreview | Balazs Gibizer proposed openstack/nova master: [doc]Clarify devname support in pci.device_spec https://review.opendev.org/c/openstack/nova/+/882464 | 17:34 |
clarkb | sorry I stepped out for a bit | 18:03 |
dansmith | clarkb: no problem I think I'm good now | 18:07 |
dansmith | clarkb: semi-related, am I just stupid or is it impossible to get two conditions in the opensearch query? | 18:07 |
dansmith | any two-condition search I do always returns no results | 18:08 |
dansmith | and I get a syntax error | 18:08 |
dansmith | oh I guess I need an "and" operator | 18:08 |
clarkb | I don't know, I'm not really involved in it | 18:11 |
dansmith | it seems like I used to be able to stumble my way into a useful query, and since the upgrade I can only get really basic stuff to work | 18:12 |
dansmith | clarkb: so, I guess something is still wrong because lib/nova switches my requested kvm to qemu because /dev/kvm is not accessible | 18:44 |
dansmith | https://github.com/openstack/devstack/blob/master/lib/nova#L269 | 18:44 |
dansmith | oh, because that job didn't select the nested-jammy label, hrm | 18:44 |
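The lib/nova check dansmith links boils down to a fallback like this (a paraphrase, not the verbatim devstack code):

```bash
# If kvm was requested but /dev/kvm is missing or unreadable on the node,
# devstack downgrades the requested virt type to qemu (TCG emulation).
if [[ "$LIBVIRT_TYPE" == "kvm" ]] && ! sudo test -r /dev/kvm; then
    echo "WARNING: /dev/kvm not accessible, falling back to LIBVIRT_TYPE=qemu"
    LIBVIRT_TYPE="qemu"
fi
```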
spatel | Any idea how to clean up orphan VM entries in the nova DB? | 19:36 |
spatel | I used the virsh destroy command to delete VMs, and now the DB has entries for them but the VMs don't exist | 19:36 |
dansmith | spatel: virsh destroy does nothing for nova vms, nova will just try to recreate them | 19:40 |
spatel | Hmm, I did delete them in openstack too, using the nova delete command | 19:41 |
dansmith | that's the only way, but they remain in the database until you archive (as you noted in your mailing list post) | 19:42 |
dansmith | archive will only remove them if they're marked as deleted | 19:42 |
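For reference, the archive step dansmith refers to is the standard nova-manage command that moves soft-deleted rows into the shadow tables:

```console
nova-manage db archive_deleted_rows --until-complete
```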
spatel | They don't exist: https://paste.opendev.org/show/b0Caj0S65vx4hBFhAyML/ | 19:43 |
spatel | openstack hypervisor stats shows 91 VMs running, but openstack server list shows only a single VM | 19:43 |
spatel | The nova DB is definitely out of sync | 19:43 |
spatel | I looked into the nova instances DB table and there is only a single entry | 19:44 |
dansmith | then what's the problem? | 19:45 |
dansmith | just the running_vms count? | 19:45 |
spatel | Yes... | 19:45 |
spatel | How do I get everything back in sync? | 19:45 |
spatel | Just curious, where is the openstack hypervisor command finding 91 VMs? | 19:46 |
dansmith | you need to look at compute_nodes.running_vms to see which one is still reporting instances | 19:47 |
spatel | let me take a look at that table | 19:47 |
spatel | I am not able to find that table in the DB | 19:51 |
spatel | it should be inside nova/instances, correct? | 19:51 |
dansmith | instance (assuming you meant instances) is a table, compute_nodes is a table, running_vms is a column in the compute_nodes table | 19:52 |
spatel | found it | 19:53 |
spatel | Yes, I can see them there on node1 - https://paste.opendev.org/show/bnhl6ZeHXIA3Tjk1ucV6/ | 19:56 |
spatel | do you think just updating those numbers in the table is enough? | 19:56 |
dansmith | that's three nodes | 19:56 |
dansmith | no | 19:56 |
dansmith | I mean, that will make the number change, but it's not the right fix | 19:57 |
dansmith | you need to select the hostname along with the count to know which is which | 19:57 |
dansmith | nova-compute should be updating those numbers | 19:57 |
dansmith | select host,hypervisor_hostname,running_vms from compute_nodes; | 19:58 |
spatel | https://paste.opendev.org/show/bwohGVNFtteg3J4x5qUY/ | 19:58 |
spatel | at present there are no VMs running on the ctrl1 and ctrl3 nodes.. | 19:59 |
spatel | Those entries should technically be zero | 19:59 |
dansmith | is nova-compute running on each of those three nodes? | 19:59 |
dansmith | because it should be updating that number every few minutes | 19:59 |
spatel | yes, it's running | 20:00 |
spatel | all services are showing fine.. I have restarted them | 20:00 |
spatel | no nasty logs or errors anywhere | 20:00 |
dansmith | they should all be iterating over their instances regularly and updating those numbers | 20:02 |
dansmith | perhaps it's not doing that if there are no instances (although you said there was one, so at least that one should be correct) | 20:05 |
spatel | out of the 3 nodes, only node2 has 1 VM running and the rest are empty | 20:06 |
dansmith | yeah, so that node should show 1 in the database and doesn't, which to me means something is wrong (unless it hasn't run update_available_resource yet) | 20:07 |
spatel | Thinking of rebooting all 3 nodes to start fresh troubleshooting | 20:07 |
spatel | who runs the update_available_resource task? the compute nodes, correct? | 20:08 |
dansmith | nova-compute does it | 20:08 |
spatel | maybe RabbitMQ is in a zombie state... I have checked cluster_status and it's showing all good, but who knows.. | 20:09 |
dansmith | there should be errors in nova-compute if so, but hard to say after something like an oom | 20:09 |
spatel | Let me check.. | 20:11 |
spatel | I found this line in nova-compute - AMQP server on 192.168.1.11:5672 is unreachable: timed out. Trying again in 0 seconds.: socket.timeout: timed out | 20:12 |
spatel | Looks like the issue is related to rabbit.. hmm | 20:13 |
spatel | but cluster status is green | 20:13 |
spatel | Let me destroy rabbit and rebuild it to see if it comes up clean | 20:13 |
spatel | dansmith, looks like it was a rabbit issue; after rebuilding rabbit I can see the correct count in hypervisor stats :) | 20:27 |
dansmith | spatel: good, that's why I was recommending you not just fix it manually because it's an indication of something else | 20:28 |
spatel | Thank you for staying with me :) | 20:28 |
spatel | I was about to blow up the DB.. haha | 20:28 |
spatel | dansmith, I have one last question: how do I tell nova to limit the number of VMs per KVM host? | 20:29 |
spatel | I have 3 nodes and just want to stick with a limit of 10 VMs per compute node so I don't blow up again | 20:30 |
dansmith | I don't know that you can, easily.. you might be able to hack that up with placement, a custom resource class, and flavor extra specs, but it would be complicated | 20:31 |
dansmith | better to set memory overcommit to 1.0 and reserve enough memory to run your host services; then it will limit to whatever fits without creating too much memory pressure | 20:31 |
spatel | In my case I have controller and compute on the same node. | 20:32 |
spatel | This is a very small environment on a small budget. | 20:32 |
spatel | I like the idea of memory overcommit to 1.0 | 20:33 |
spatel | I thought nova had a per-compute-node config setting for the number of VMs allowed to run. | 20:34 |
dansmith | not that I know of.. generally such a number would make no sense.. one 32G instance might fit where 16 2GB instances would fit.. "number of instances" is not a very useful number for most people | 20:34 |
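A minimal sketch of the settings dansmith suggests, assuming they go in nova.conf on each compute node (the values are illustrative examples, not recommendations):

```ini
[DEFAULT]
# No memory overcommit: only pack in VMs whose RAM actually fits
ram_allocation_ratio = 1.0
# Hold back enough memory for the co-located controller/host services
reserved_host_memory_mb = 8192
```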
opendevreview | Dan Smith proposed openstack/nova master: DNM: Test new ceph job configuration with nova https://review.opendev.org/c/openstack/nova/+/881585 | 20:57 |