openstackgerrit | Brin Zhang proposed openstack/nova master: WIP: Cyborg suspend/resume support https://review.opendev.org/729945 | 00:00 |
---|---|---|
openstackgerrit | Brin Zhang proposed openstack/nova master: [Trivial] Rename host/node to hostname/nodename in conductor manager https://review.opendev.org/762499 | 00:00 |
*** martinkennelly has joined #openstack-nova | 00:13 | |
*** jangutter has quit IRC | 00:29 | |
*** LinPeiWen96 has joined #openstack-nova | 00:30 | |
*** jangutter has joined #openstack-nova | 00:32 | |
*** rcernin has quit IRC | 00:42 | |
*** rcernin has joined #openstack-nova | 00:43 | |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: [WIP] Migrate nova-grenade-multinode job to zuulv3 native https://review.opendev.org/742056 | 01:03 |
*** martinkennelly has quit IRC | 01:26 | |
*** _mlavalle_1 has quit IRC | 01:31 | |
*** spatel has joined #openstack-nova | 01:37 | |
openstackgerrit | chengsheng proposed openstack/nova master: Add hypervisor CPU feature check during live migration https://review.opendev.org/762330 | 01:38 |
*** spatel has quit IRC | 01:42 | |
*** songwenping_ has joined #openstack-nova | 01:53 | |
*** k_mouza has joined #openstack-nova | 01:54 | |
*** LinPeiWen96 has quit IRC | 01:54 | |
*** LinPeiWen13 has joined #openstack-nova | 01:55 | |
*** rcernin has quit IRC | 01:58 | |
*** k_mouza has quit IRC | 01:58 | |
*** macz_ has quit IRC | 02:09 | |
*** tinwood has quit IRC | 02:10 | |
*** tinwood has joined #openstack-nova | 02:13 | |
openstackgerrit | chengsheng proposed openstack/nova master: Modify the default value of the force parameter in live migration https://review.opendev.org/762458 | 02:18 |
*** sapd1 has joined #openstack-nova | 02:31 | |
*** macz_ has joined #openstack-nova | 02:39 | |
*** macz_ has quit IRC | 02:45 | |
openstackgerrit | Brin Zhang proposed openstack/nova master: [Trivial] Rename host/node to hostname/nodename in conductor manager https://review.opendev.org/762499 | 02:49 |
*** rcernin has joined #openstack-nova | 02:57 | |
*** jangutter has quit IRC | 03:08 | |
*** jangutter has joined #openstack-nova | 03:09 | |
*** songwenping_ has quit IRC | 03:22 | |
*** rcernin has quit IRC | 03:38 | |
*** nweinber has joined #openstack-nova | 03:38 | |
*** nweinber has quit IRC | 03:43 | |
*** rcernin has joined #openstack-nova | 04:05 | |
openstackgerrit | Merged openstack/nova stable/ussuri: Change default num_retries for glance to 3 https://review.opendev.org/748936 | 04:05 |
openstackgerrit | Keigo Noha proposed openstack/nova stable/train: Change default num_retries for glance to 3 https://review.opendev.org/762610 | 04:11 |
openstackgerrit | chengsheng proposed openstack/nova master: Add hypervisor CPU feature check during live migration https://review.opendev.org/762330 | 04:13 |
*** mkrai has joined #openstack-nova | 04:25 | |
*** psachin has joined #openstack-nova | 04:53 | |
*** rm_work has quit IRC | 04:58 | |
*** rm_work has joined #openstack-nova | 04:58 | |
*** gyee has quit IRC | 05:24 | |
*** evrardjp has quit IRC | 05:33 | |
*** evrardjp has joined #openstack-nova | 05:33 | |
*** zzzeek has quit IRC | 05:34 | |
*** zzzeek has joined #openstack-nova | 05:36 | |
*** rcernin has quit IRC | 05:39 | |
*** ociuhandu has joined #openstack-nova | 05:40 | |
*** rcernin has joined #openstack-nova | 05:42 | |
brinzhang_ | gibi: hi, I added cyborg shelve/unshelve support patch to the wallaby runway slot, https://etherpad.opendev.org/p/nova-runways-wallaby | 05:48 |
*** ociuhandu has quit IRC | 06:00 | |
*** JamesBenson has quit IRC | 06:04 | |
openstackgerrit | YumengBao proposed openstack/nova-specs master: libvirt supports composing cyborg owned vGPU accelerator into domain XML https://review.opendev.org/750116 | 06:19 |
*** tosky has joined #openstack-nova | 06:24 | |
*** elod has joined #openstack-nova | 06:45 | |
*** rcernin_ has joined #openstack-nova | 06:53 | |
*** mkrai has quit IRC | 06:53 | |
*** rcernin has quit IRC | 06:54 | |
*** rcernin_ has quit IRC | 06:59 | |
*** ociuhandu has joined #openstack-nova | 07:13 | |
*** ociuhandu has quit IRC | 07:13 | |
*** ralonsoh has joined #openstack-nova | 07:14 | |
*** ociuhandu has joined #openstack-nova | 07:14 | |
*** ociuhandu has quit IRC | 07:17 | |
*** rpittau|afk is now known as rpittau | 07:20 | |
*** slaweq has joined #openstack-nova | 07:26 | |
*** jangutter_ has joined #openstack-nova | 07:35 | |
*** jangutter has quit IRC | 07:36 | |
*** mkrai has joined #openstack-nova | 07:40 | |
gibi | stephenfin: ack | 08:07 |
gibi | brinzhang_: hi, OK thanks | 08:07 |
*** tkajinam has quit IRC | 08:08 | |
*** andrewbonney has joined #openstack-nova | 08:10 | |
*** tkajinam has joined #openstack-nova | 08:12 | |
*** tesseract has joined #openstack-nova | 08:14 | |
brinzhang_ | gibi: I updated https://review.opendev.org/#/c/729563/17, but zuul is not stable | 08:25 |
brinzhang_ | gibi: I saw you said nova-live-migration sometimes times out; it also happened on https://review.opendev.org/#/c/762499/5 | 08:26 |
*** tesseract has quit IRC | 08:29 | |
*** tesseract has joined #openstack-nova | 08:31 | |
*** david-lyle has joined #openstack-nova | 08:47 | |
*** dklyle has quit IRC | 08:47 | |
*** mkrai has quit IRC | 08:54 | |
*** CeeMac has joined #openstack-nova | 08:55 | |
*** ociuhandu has joined #openstack-nova | 08:58 | |
*** david-lyle has quit IRC | 08:58 | |
*** ociuhandu has quit IRC | 09:08 | |
bauzas | gibi: got tons of failures on the RPC API change, could you please tell me whether we still have CI issues? | 09:08 |
bauzas | context: https://review.opendev.org/#/c/761452/ | 09:09 |
openstackgerrit | chengsheng proposed openstack/nova master: Add hypervisor CPU feature check during live migration https://review.opendev.org/762330 | 09:15 |
*** nightmare_unreal has quit IRC | 09:17 | |
*** k_mouza has joined #openstack-nova | 09:18 | |
*** jawad_axd has joined #openstack-nova | 09:22 | |
lyarwood | is there anyone running focal locally who could help me understand why libvirtd restarts even when I've killed the service and associated sockets in systemd? related to the nova-live-migration gate failure https://bugs.launchpad.net/nova/+bug/1903979 | 09:23 |
openstack | Launchpad bug 1903979 in OpenStack Compute (nova) "nova-live-migration job fails during evacuate negative test" [High,Confirmed] - Assigned to Lee Yarwood (lyarwood) | 09:23 |
lyarwood | seems something has changed in focal, likely another socket/service that wakes libvirtd back up after 5 seconds | 09:24 |
*** kaisers has joined #openstack-nova | 09:27 | |
*** jangutter has joined #openstack-nova | 09:31 | |
openstackgerrit | Lee Yarwood proposed openstack/nova master: WIP nova-live-migration: Disable *all* virt services during negative tests https://review.opendev.org/762623 | 09:31 |
lyarwood | ^ blind attempt while I try to get a local focal env to play with | 09:31 |
*** jangutter_ has quit IRC | 09:33 | |
*** derekh has joined #openstack-nova | 09:39 | |
jawad_axd | Hi folks, facing this http://paste.openstack.org/show/799983/ . Any pointers on what might be wrong? The instance gets created in the end but it takes a while; it seems the rabbitmq connection is no longer stable after rebooting the controllers for some reason. rabbitmq cluster status is fine. | 09:39 |
*** martinkennelly has joined #openstack-nova | 09:39 | |
jawad_axd | using stable/stein. | 09:41 |
bauzas | eek, just saw http://status.openstack.org/elastic-recheck/#1686542 | 09:41 |
bauzas | the gate is f*** flakey | 09:42 |
* bauzas needs to take the red pill | 09:42 | |
bauzas | hah, nevermind the above sentence, I apologize | 09:43 |
bauzas | way more context than just the Matrix movie, unfortunately | 09:44 |
kashyap | bauzas: "There is no Gate" | 09:45 |
*** ociuhandu has joined #openstack-nova | 09:48 | |
*** whoami-rajat__ has quit IRC | 09:48 | |
openstackgerrit | Brin Zhang proposed openstack/nova-specs master: Filter instances by tenant_id https://review.opendev.org/737241 | 09:50 |
gibi | bauzas: I use this to see if there is any total brokenness on the gate https://zuul.opendev.org/t/openstack/builds?project=openstack%2Fnova&pipeline=gate | 09:55 |
gibi | bauzas: also we have 83 unclassified failures on the gate by http://status.openstack.org/elastic-recheck/data/integrated_gate.html | 09:56 |
gibi | so the answer is we don't know | 09:57 |
*** ociuhandu has quit IRC | 09:58 | |
gibi | bauzas: I filed https://bugs.launchpad.net/nova/+bug/1903979 and lyarwood is trying to find the root of it | 09:58 |
openstack | Launchpad bug 1903979 in OpenStack Compute (nova) "nova-live-migration job fails during evacuate negative test" [High,In progress] - Assigned to Lee Yarwood (lyarwood) | 09:58 |
gibi | dansmith, stephenfin, gmann: filed a doc bug to document the minimal config for each nova service https://bugs.launchpad.net/nova/+bug/1904179 | 10:01 |
openstack | Launchpad bug 1904179 in OpenStack Compute (nova) "[doc] define the minimal mandatory configuration for each nova service" [Medium,Confirmed] | 10:01 |
openstackgerrit | Balazs Gibizer proposed openstack/nova stable/victoria: Warn when starting services with older than N-1 computes https://review.opendev.org/761923 | 10:12 |
bauzas | gibi: ack thanks | 10:13 |
gibi | bauzas: this is also new for me on master https://1d1ac8c4d7a38514d020-4dcc482f43713c13ecd75f64a0eb3df3.ssl.cf1.rackcdn.com/762319/1/check/tempest-integrated-compute/0b7fa72/controller/logs/screen-n-cpu.txt | 10:14 |
gibi | bauzas: filing a bug now for that too | 10:14 |
bauzas | gibi: http://status.openstack.org/elastic-recheck/#1686542 | 10:14 |
bauzas | looks like we have problems with the queues | 10:15 |
bauzas | anyway, /me needs to taxi his kids | 10:15 |
gibi | then we have multiple problems :) | 10:15 |
gibi | yey | 10:15 |
bauzas | looks like a super fancy Friday | 10:15 |
bauzas | \o/ | 10:15 |
gibi | it is 13th so I'm not surprised | 10:16 |
*** brinzhang_ has quit IRC | 10:16 | |
*** janno has quit IRC | 10:19 | |
*** xinranwang has joined #openstack-nova | 10:23 | |
xinranwang | gibi: Hi gibi, I have updated the smartnic spec, please review it when you have time :) | 10:24 |
gibi | xinranwang: thanks | 10:24 |
gibi | I will try to jump on reviews after I put out the fire on the gate | 10:24 |
*** efried has quit IRC | 10:24 | |
xinranwang | gibi: Ah ok, it's friday evening in my timezone, I will check it next monday. Take your time. | 10:26 |
xinranwang | Thanks in advance | 10:26 |
gibi | xinranwang: there is a good chance that I will not reach your spec today. Have a nice weekend. | 10:27 |
*** janno has joined #openstack-nova | 10:27 | |
gibi | FYI another gate failure https://bugs.launchpad.net/nova/+bug/1904181 | 10:30 |
openstack | Launchpad bug 1904181 in OpenStack Compute (nova) "nova-compute fails to start is cell conductor is not running." [High,Confirmed] - Assigned to Balazs Gibizer (balazs-gibizer) | 10:30 |
*** efried has joined #openstack-nova | 10:31 | |
xinranwang | gibi: thanks :) | 10:32 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Restore retrying the RCP connection to conductor https://review.opendev.org/762633 | 10:53 |
gibi | bauzas: when you are back, there are two nova gate-fixing patches. One from me ^^ and one from stephenfin https://review.opendev.org/#/c/762543 | 10:54 |
bauzas | I'm here | 10:55 |
bauzas | ack, looking then | 10:55 |
gibi | bauzas: look at stephenfin's first, I will try to add a test to mine | 10:55 |
gibi | and thanks | 10:56 |
*** jangutter_ has joined #openstack-nova | 10:56 | |
bauzas | gibi: +W'd stephenfin's one | 10:59 |
*** jangutter has quit IRC | 10:59 | |
gibi | awesome, thanks | 10:59 |
bauzas | gibi: for your own change, I have a concern, where are we looking at the conductor ? | 10:59 |
gibi | bauzas: https://github.com/openstack/nova/blob/eb279e9a5676f4142cce4700c3097ecc14161895/nova/service.py#L115 | 11:00 |
gibi | it is a well-hidden gem of the Service object | 11:00 |
*** jawad_axd has quit IRC | 11:01 | |
bauzas | thanks | 11:01 |
bauzas | looking | 11:01 |
bauzas | a-ha, now I remember | 11:01 |
bauzas | https://github.com/openstack/nova/blob/eb279e9a5676f4142cce4700c3097ecc14161895/nova/service.py#L113 is for the computes | 11:02 |
gibi | yes | 11:02 |
*** ociuhandu has joined #openstack-nova | 11:03 | |
bauzas | gibi: okay, then waiting for your test | 11:03 |
bauzas | gibi: just do a functional test honestly | 11:03 |
gibi | OK, working on it | 11:03 |
gibi | yes, it will be a functional one | 11:03 |
bauzas | would be simpler and better | 11:03 |
bauzas | cool | 11:04 |
bauzas | gibi: fwiw, I'll be lunching, but ping me when you're done and I'll look at it when I'm back | 11:04 |
gibi | sure. have a nice one! | 11:04 |
*** jangutter has joined #openstack-nova | 11:24 | |
gibi | bauzas: bah, in the func test env the compute service does not need a running conductor to start up, I guess something is mocked in the indirection API | 11:24 |
*** ociuhandu has quit IRC | 11:27 | |
*** jangutter_ has quit IRC | 11:27 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Restore retrying the RCP connection to conductor https://review.opendev.org/762633 | 11:31 |
gibi | with a less good but still covering unit test ^^ | 11:32 |
*** takamatsu has quit IRC | 11:33 | |
*** takamatsu has joined #openstack-nova | 11:33 | |
*** ociuhandu has joined #openstack-nova | 11:44 | |
*** JamesBenson has joined #openstack-nova | 12:01 | |
*** tbachman has quit IRC | 12:02 | |
*** JamesBenson has quit IRC | 12:04 | |
*** JamesBenson has joined #openstack-nova | 12:04 | |
openstackgerrit | Lee Yarwood proposed openstack/nova master: nova-live-migration: Disable *all* virt services during negative tests https://review.opendev.org/762623 | 12:09 |
openstackgerrit | chengsheng proposed openstack/nova master: Add hypervisor CPU feature check during live migration https://review.opendev.org/762330 | 12:15 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Doc that [database]connection is not for nova-compute https://review.opendev.org/762647 | 12:25 |
*** xinranwang has quit IRC | 12:33 | |
kashyap | Anyone else with CPU and live migration interest, have a look at this review, from above, chengsheng ... I'm not comfortable parsing /proc/cpuinfo straight-up (and a couple of other issues in there): https://review.opendev.org/#/c/762330/6 | 12:35 |
kashyap | Err, wrong link | 12:35 |
kashyap | Correct one: https://review.opendev.org/#/c/762330/ | 12:35 |
*** k_mouza has quit IRC | 12:38 | |
sean-k-mooney | we have said no to that in the past | 12:52 |
sean-k-mooney | and required the info to be exposed in libvirt | 12:53 |
sean-k-mooney | kashyap: as written, that patch will break upgrades and prevent migrating a vm to newer hardware, so it's a nonstarter | 13:06 |
*** k_mouza has joined #openstack-nova | 13:06 | |
sean-k-mooney | it's also not filtering out non-virtualisable features that don't affect the cpu flags exposed to the vm | 13:07 |
*** artom has joined #openstack-nova | 13:15 | |
*** ociuhandu_ has joined #openstack-nova | 13:21 | |
*** raildo has joined #openstack-nova | 13:24 | |
*** ociuhandu has quit IRC | 13:25 | |
*** tbachman has joined #openstack-nova | 13:27 | |
openstackgerrit | Merged openstack/nova master: functional: Wait for revert resize to complete https://review.opendev.org/762543 | 13:40 |
*** martinkennelly has quit IRC | 13:41 | |
f0o | I'm a bit confused... More and more linux guests can't reboot. Their instances go from running to paused for some reason and need to be hard-rebooted from horizon/api to become alive again. Is a guest no longer allowed to reboot by itself? Am I missing a setting somewhere? (using kvm/qemu) | 13:44 |
sean-k-mooney | f0o: they are allowed to, yes | 13:44 |
f0o | why are they ending up in "paused" then? | 13:45 |
sean-k-mooney | they would only be moved to paused if the state in the db for the vm was paused and you had the sync power state option set | 13:45 |
f0o | what would cause the db to think it's paused? | 13:45 |
sean-k-mooney | only an api request | 13:45 |
sean-k-mooney | did you check the instance event log | 13:46 |
f0o | the way I can reproduce it is: create instance (ubuntu lts for instance), log in and do sudo reboot, vm is now stuck in paused | 13:46 |
f0o | no api-calls made | 13:46 |
sean-k-mooney | I'm not directly aware of anything that would cause that | 13:47 |
sean-k-mooney | I have not seen it on ussuri at least | 13:47 |
sean-k-mooney | what release are you using? | 13:47 |
f0o | it becomes very annoying for OS that run updates as part of their first boot (like coreos/flatcar) then the vm never becomes alive... | 13:47 |
f0o | I think I'm still on stein | 13:47 |
kashyap | sean-k-mooney: Yeah; it's a non-starter for other reasons too | 13:48 |
kashyap | sean-k-mooney: I've only given it a cursory look :-) Please comment there | 13:48 |
sean-k-mooney | kashyap: i did | 13:49 |
sean-k-mooney | i was summarising for you :) | 13:49 |
kashyap | Hehe; thanks | 13:49 |
*** derekh has quit IRC | 13:54 | |
f0o | how would I go about debugging this weird paused issue? | 13:56 |
f0o | where to start looking? | 13:56 |
sean-k-mooney | f0o: you should be looking at the nova-compute logs for the instance in question and seeing what actions cause the instance to be paused | 13:57 |
sean-k-mooney | e.g. did the compute agent receive an event from libvirt saying the guest is now paused | 13:57 |
sean-k-mooney | or did it call libvirt to pause it | 13:57 |
f0o | ok, will do | 13:58 |
*** mlavalle has joined #openstack-nova | 13:58 | |
f0o | thnkas :) | 13:58 |
f0o | thanks* | 13:58 |
*** nweinber has joined #openstack-nova | 14:01 | |
f0o | I have a few entries that have VM Paused (Lifecycle Event) in them. just gonna spawn a new instance and reboot it to see all the logs it generates | 14:02 |
*** macz_ has joined #openstack-nova | 14:17 | |
f0o | [instance: 4d131088-4ca6-47bd-ab5c-47b9e0a7c996] VM Paused (Lifecycle Event) & During _sync_instance_power_state the DB power_state (1) does not match the vm_power_state from the hypervisor (3). Updating power_state in the DB to match the hypervisor. | 14:19 |
f0o | [instance: 4d131088-4ca6-47bd-ab5c-47b9e0a7c996] Instance is paused unexpectedly. Ignore. | 14:19 |
f0o | so those 3 lines happen when I issue reboot inside the instance | 14:19 |
f0o | and now horizon shows it as Active/Paused and there's no resume action. I need to issue Hard-Reboot to kick it back alive | 14:20 |
sean-k-mooney | that looks like libvirt is moving it to paused | 14:20 |
sean-k-mooney | and then the compute agent is just updating the db to reflect that | 14:20 |
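The numbers in the log lines f0o pasted are nova power state values: 1 is RUNNING and 3 is PAUSED. A minimal Python sketch for decoding such a message, assuming the mapping below (copied by hand from `nova.compute.power_state`) and a helper name, `describe_mismatch`, that is made up for illustration and is not part of nova:

```python
# Power state values as defined in nova.compute.power_state (copied by hand).
POWER_STATES = {
    0: "NOSTATE",
    1: "RUNNING",
    3: "PAUSED",
    4: "SHUTDOWN",
    6: "CRASHED",
    7: "SUSPENDED",
}


def describe_mismatch(db_state: int, hypervisor_state: int) -> str:
    """Hypothetical helper: render a DB/hypervisor power-state mismatch in words."""
    db_name = POWER_STATES.get(db_state, "UNKNOWN")
    hv_name = POWER_STATES.get(hypervisor_state, "UNKNOWN")
    return f"DB says {db_name}, hypervisor says {hv_name}"


# The values from the _sync_instance_power_state message above.
print(describe_mismatch(1, 3))
```

Read this way, the message says the database still believed the guest was running while the hypervisor reported it as paused.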
*** macz_ has quit IRC | 14:22 | |
f0o | well /var/log/libvirt/qemu/instance-00000115.log sure is useless lol | 14:22 |
openstackgerrit | Radosław Piliszek proposed openstack/nova master: [docs] Fix a placement client's command https://review.opendev.org/762663 | 14:26 |
f0o | not sure where to go from here | 14:29 |
sean-k-mooney | can you paste the xml for the instance somewhere | 14:30 |
f0o | sure | 14:30 |
sean-k-mooney | there is an option to control the reboot action, but we don't set it; I'm just wondering if anything else is in there that was generated by libvirt | 14:30 |
sean-k-mooney | libvirt modifies the xml we give it and fills in things like pci addresses automatically | 14:31 |
f0o | http://paste.openstack.org/show/oGhApPwoy0lNe5cKmV0G/ | 14:31 |
sean-k-mooney | <on_poweroff>destroy</on_poweroff> | 14:32 |
sean-k-mooney | <on_reboot>restart</on_reboot> | 14:32 |
sean-k-mooney | <on_crash>destroy</on_crash> | 14:32 |
sean-k-mooney | so those are what i expect | 14:32 |
sean-k-mooney | <on_reboot>restart</on_reboot> is what i was wondering about | 14:33 |
f0o | so that seems alright then? | 14:33 |
f0o | nova-compute-kvm version 2:20.0.0~rc1-0ubuntu3~cloud0 and qemu-kvm version 1:4.0+dfsg-0ubuntu9~cloud0 (just in case) | 14:33 |
sean-k-mooney | yes, so what could be happening is a race between the periodic task and the vm reboot | 14:33 |
sean-k-mooney | is it happening every time | 14:34 |
sean-k-mooney | or just sometimes | 14:34 |
f0o | it's happening the majority of times | 14:34 |
f0o | there are some exceptions to it, but by now I'd wager that most times it gets stuck in paused | 14:34 |
sean-k-mooney | hum ok the interval is 10 minutes for the update by default | 14:35 |
sean-k-mooney | https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.sync_power_state_interval | 14:35 |
sean-k-mooney | i assume you have not made that run faster | 14:35 |
f0o | nope | 14:35 |
f0o | config is very trivial, most are left as defaults | 14:35 |
sean-k-mooney | https://docs.openstack.org/nova/latest/configuration/config.html#workarounds.handle_virt_lifecycle_events | 14:36 |
sean-k-mooney | so you might want to set that to false | 14:36 |
sean-k-mooney | it looks like this is a known race | 14:36 |
f0o | ok let's give it a shot | 14:37 |
sean-k-mooney | if you have the interval at its default then setting that to false should be fine | 14:37 |
sean-k-mooney | that would have to be set on the compute nodes fyi | 14:38 |
f0o | set and restarting nova-compute | 14:38 |
sean-k-mooney | cool | 14:38 |
f0o | let's give it a shot | 14:38 |
sean-k-mooney | f0o: by the way, the reason we default to handling events is, I believe, for ironic | 14:41 |
*** k_mouza has quit IRC | 14:42 | |
*** ociuhandu_ has quit IRC | 14:42 | |
*** ociuhandu has joined #openstack-nova | 14:43 | |
f0o | makes sense | 14:43 |
f0o | at first I thought I was going insane but now that it became more frequent I figured I'd ask | 14:44 |
sean-k-mooney | hopefully that option will help | 14:47 |
sean-k-mooney | the main side effect is that if you do poweroff in the guest then it won't be reflected in the api/db until the interval expires | 14:47 |
sean-k-mooney | e.g. up to 10 mins from power off by default | 14:48 |
f0o | that should be fine | 14:48 |
sean-k-mooney | that is generally an ok tradeoff and you can adjust the interval if you want to | 14:48 |
f0o | worst case I lower the update interval | 14:48 |
sean-k-mooney | yep | 14:48 |
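The two options discussed above are `[workarounds] handle_virt_lifecycle_events` and `[DEFAULT] sync_power_state_interval`, both set in nova.conf on the compute nodes. As a rough sketch only, the fragment could be rendered with Python's standard `configparser`; the function name `render_nova_conf_fragment` is hypothetical, and 600 seconds reflects the 10 minute default mentioned in the conversation:

```python
import configparser
import io


def render_nova_conf_fragment(handle_events: bool, interval_seconds: int) -> str:
    """Hypothetical helper: build the nova.conf fragment discussed in the log."""
    conf = configparser.ConfigParser()
    # Periodic power-state sync interval, in seconds.
    conf["DEFAULT"]["sync_power_state_interval"] = str(interval_seconds)
    # Whether nova-compute reacts to hypervisor lifecycle events.
    conf["workarounds"] = {
        "handle_virt_lifecycle_events": str(handle_events).lower(),
    }
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()


print(render_nova_conf_fragment(handle_events=False, interval_seconds=600))
```

With event handling disabled, a power-off issued inside the guest is only picked up by the periodic sync, so lowering the interval shortens that delay at the cost of more frequent polling.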
*** lpetrut has joined #openstack-nova | 14:49 | |
*** rpittau is now known as rpittau|afk | 14:49 | |
*** hemna has quit IRC | 14:50 | |
*** k_mouza has joined #openstack-nova | 14:50 | |
*** hemna has joined #openstack-nova | 14:51 | |
f0o | [ 652.025665] reboot: Restarting system .... let's hope it comes back up :D | 14:53 |
f0o | I can see the qemu process running and consuming 30% cpu but nothing seems alive in it | 14:56 |
f0o | vnc shows guest hasnt initialized display and view logs is only showing that reboot message | 14:56 |
f0o | During _sync_instance_power_state the DB power_state (1) does not match the vm_power_state from the hypervisor (3). Updating power_state in the DB to match the hypervisor / Instance is paused unexpectedly. Ignore. | 15:00 |
f0o | again :< | 15:00 |
*** k_mouza has quit IRC | 15:01 | |
f0o | process is still running and eating up 30% cpu tho, not sure what it's actually doing there | 15:02 |
*** ociuhandu has quit IRC | 15:06 | |
*** ociuhandu has joined #openstack-nova | 15:08 | |
*** lpetrut has quit IRC | 15:12 | |
*** dopereira has joined #openstack-nova | 15:12 | |
*** ociuhandu has quit IRC | 15:24 | |
*** LinPeiWen13 has quit IRC | 15:29 | |
*** dklyle has joined #openstack-nova | 15:48 | |
sean-k-mooney | f0o: that is sounding more like a qemu bug than an openstack one | 15:49 |
sean-k-mooney | it sounds like it's not actually restarting properly | 15:49 |
f0o | cool :D | 15:50 |
*** tacco has joined #openstack-nova | 15:51 | |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: DNM: Testing system scope in tempest https://review.opendev.org/740124 | 15:51 |
f0o | just my luck | 15:51 |
tacco | hey everyone. Anyone know why I only get 64 vCPUs on a HV with 256 CPU cores? Overcommit ratio is 1.0 :( | 15:52 |
tacco | AMD EPYC 7742 64-Core Processor | 15:52 |
*** mlavalle has quit IRC | 15:53 | |
gibi | dansmith: hi! Do I understand correctly that the service version check at https://review.opendev.org/#/c/729563/17/nova/compute/api.py@4162 can see version 54 and allow the shelve call with accelerators while the RPC can still be manually pinned to < 5.13, so the compute will not get the accel_uuids param and therefore will not handle the accelerators properly? | 15:53 |
dansmith | I'll have to look at that decorator, I think that just got added, right? | 15:54 |
tacco | i see processor: 255 and cpu cores: 64 in /proc/cpuinfo. is this something like HT on Intel CPUs, where only the "real" cores can be provided to the VM? | 15:54 |
tacco | cause i have 250CPUs in my flavor | 15:54 |
*** mlavalle has joined #openstack-nova | 15:55 | |
tacco | the VM then spawns with 64CPus | 15:55 |
f0o | Uhm | 15:55 |
f0o | 7742 shows as 64 cores | 15:55 |
f0o | HT stuff doesnt really count afaik | 15:55 |
dansmith | gibi: yikes, that makes an uncached cross-cell db lookup for every single call of that method :( | 15:55 |
gibi | dansmith: not too long ago, but it basically calls get_minimum_version_all_cells | 15:56 |
f0o | tacco: inside the vm you dont see the HT threads? | 15:56 |
tacco | in the VM at proc/cpuinfo i see only 64 but the vm was spawned with a flavor with 250 vcpus | 15:57 |
tacco | thats kinda strange and cli hypervisor list also shows 256 CPUs | 15:57 |
dansmith | gibi: commented on that patch, see if that helps | 15:59 |
gibi | thanks | 15:59 |
*** ociuhandu has joined #openstack-nova | 16:02 | |
*** k_mouza has joined #openstack-nova | 16:03 | |
tacco | i see this seems to be two physical CPUs with 64 cores and 128 threads. that makes sense. But no clue why the VM only got 64Cpus if in the flavor are 250 specified. anyway. will have to dive deeper :D | 16:05 |
f0o | I'm happy to swap issues with you tacoc :D | 16:08 |
f0o | other than my dyslexia today lol | 16:08 |
*** ociuhandu has quit IRC | 16:09 | |
tacco | always nice to be a useful person :D | 16:09 |
gibi | dansmith: you confirmed my fears, thanks | 16:12 |
dansmith | gibi: ack | 16:12 |
dansmith | gibi: honestly that decorator seems like a bad idea to me.. that's a lot of overhead hidden in a decorator | 16:12 |
dansmith | it should at least use a cached value, but I'd prefer we did things like I described, which is rely on the rpc version and a raise from rpcapi | 16:13 |
sean-k-mooney | tacco: there was a libvirt bug related to numa reporting and amd but perhaps there are others | 16:13 |
sean-k-mooney | tacco: what do you see on the host if you do nproc | 16:13 |
gibi | dansmith: agree. I missed the heavy weightness of that decorator impl in previous reviews. | 16:14 |
sean-k-mooney | tacco: can you provide the output of virsh capabilities for me in a paste and ill quickly take a look | 16:14 |
dansmith | gibi: oh actually, I forgot that we cache in the service object itself, so I guess not as bad, but still, it leaks the rpc problem so it'd be better to use that | 16:14 |
tacco | nproc shows me 256 as expected. | 16:14 |
*** takamatsu has quit IRC | 16:15 | |
sean-k-mooney | tacco: also the vm xml, if possible. i know some kernels have a cpu limit built in so just want to check if the xml has 64 or 250 cores | 16:15 |
tacco | one sec. will do so. | 16:15 |
gibi | dansmith: ack | 16:16 |
tacco | sean-k-mooney: http://paste.openstack.org/show/xs49lPLHXUPh43AeceZs/ | 16:19 |
tacco | thats the dumpxml from the VM | 16:20 |
sean-k-mooney | <nova:vcpus>250</nova:vcpus> | 16:20 |
tacco | virsh capas on the way. :) | 16:20 |
sean-k-mooney | <vcpu placement='static'>250</vcpu> | 16:20 |
sean-k-mooney | so nova is telling libvirt/qemu to use 250 | 16:20 |
tacco | ok, but inside the VM i only see 64 when i do cat /proc/cpuinfo hm.. maybe some problems with the image | 16:20 |
sean-k-mooney | <topology sockets='250' cores='1' threads='1'/> in a really dumb way but this is the default | 16:20 |
tacco | yes would enable numa pinning later | 16:21 |
sean-k-mooney | try adding hw_cpu_sockets=2 hw_cpu_threads=2 to the image | 16:21 |
tacco | would keep this in the flavor, because i would like to keep this as a separate aggregate only available for "some" users | 16:21 |
sean-k-mooney | actually with 250 that won't work | 16:22 |
*** k_mouza has quit IRC | 16:22 | |
sean-k-mooney | tacco: try 248 cores | 16:22 |
sean-k-mooney | with those two set | 16:22 |
tacco | will do so. one sec. | 16:22 |
*** k_mouza has joined #openstack-nova | 16:22 | |
sean-k-mooney | I'm guessing the kernel is compiled to only support 64 sockets | 16:22 |
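The reason 250 vCPUs does not fit sockets=2 and threads=2 while 248 does is plain arithmetic: the guest topology has to satisfy vcpus = sockets × cores × threads with a whole number of cores. A small Python sketch of that check (the function name `cores_per_socket` is hypothetical, and nova's real topology selection is more involved than this):

```python
from typing import Optional


def cores_per_socket(vcpus: int, sockets: int, threads: int) -> Optional[int]:
    """Return the cores per socket needed so that
    vcpus == sockets * cores * threads, or None if no whole number fits."""
    slots = sockets * threads
    if vcpus % slots != 0:
        return None
    return vcpus // slots


# 250 is not divisible by 2 * 2, so no exact topology exists.
print(cores_per_socket(250, sockets=2, threads=2))
# 248 splits evenly: 2 sockets x 62 cores x 2 threads.
print(cores_per_socket(248, sockets=2, threads=2))
```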
gibi | dansmith: I feel we have other places in the code where we rely purely on the service version to see if something is supported and we ignore the fact that rpc might be pinned. | 16:23 |
dansmith | gibi: those *should* be places where there isn't a specific rpc version in play, | 16:23 |
dansmith | in which case the service version is specifically what we want | 16:23 |
gibi | dansmith: yeah, that would be the proper usage of the service version only checks | 16:23 |
*** k_mouza has quit IRC | 16:23 | |
dansmith | service version alone should be used for cases where "I'm not saying anything new over rpc, but I depend on some action being done on the compute" | 16:23 |
*** k_mouza has joined #openstack-nova | 16:24 | |
gibi | dansmith: I will look through these checks to be sure | 16:24 |
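The approach dansmith describes, relying on the RPC version and raising from the rpcapi layer rather than trusting the service version alone, can be sketched as follows. Everything here is a stand-in written for illustration: `FakeRpcClient`, `AcceleratorsNotSupported` and this `shelve_instance` are hypothetical and are not nova's code; only the 5.13 version and the `accel_uuids` parameter come from the conversation.

```python
class AcceleratorsNotSupported(Exception):
    """Hypothetical error raised when the pinned RPC version is too old."""


class FakeRpcClient:
    """Stand-in for an RPC client whose version may be pinned by the operator."""

    def __init__(self, version_cap: str):
        self.version_cap = version_cap

    def can_send_version(self, version: str) -> bool:
        def as_tuple(v: str):
            return tuple(int(part) for part in v.split("."))
        return as_tuple(version) <= as_tuple(self.version_cap)


def shelve_instance(client, instance_uuid, accel_uuids=None):
    """Build the message to send, honouring the RPC pin (illustrative only)."""
    if client.can_send_version("5.13"):
        return {"version": "5.13", "instance": instance_uuid,
                "accel_uuids": accel_uuids or []}
    if accel_uuids:
        # The compute would never receive accel_uuids, so refuse up front.
        raise AcceleratorsNotSupported("RPC pinned below 5.13")
    return {"version": "5.0", "instance": instance_uuid}


print(shelve_instance(FakeRpcClient("5.13"), "some-uuid", ["arq-1"]))
```

The point of the design is that a pinned RPC version is visible at the place where the message is built, so the API can fail cleanly instead of silently dropping the new parameter.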
tacco | sean-k-mooney: i guess this should be debian-cloud-image, so most kernel-related things should be default | 16:27 |
tacco | ok flavor has now 248vcpus and | properties | hw:cpu_sockets='2', hw:cpu_theads='2' | | 16:28 |
sean-k-mooney | tacco: ya, I'm not sure what the default sockets is upstream but 64 sounds like a number people would choose as a default | 16:29 |
*** mgoddard has joined #openstack-nova | 16:29 | |
tacco | sean-k-mooney: yes, that was also my first thought :D | 16:29 |
tacco | thats why i asked here.. because if this is known.. you should know it. :) | 16:30 |
tacco | this is the first time i have a HV with so many CPUs | 16:30 |
tacco | and i know some people here should have way larger setups and way more experience than i have. :) | 16:30 |
sean-k-mooney | i haven't gone over 128 but i always make my flavor mirror the host topology in terms of threads and sockets | 16:31 |
tacco | anyway. Thanks for your initial help. Now I know this could be related to the image. Will dig around and see what i can find. | 16:31 |
gibi | dansmith: does an ovo always travel through RPC with all its data and only get backlevelled on the receiving side? So if a new field is added to an o.vo that is sent via RPC, then we don't need to bump the RPC version, and the receiving side gets the new field if the code on the receiving side has the new field definition in its own ovo class, independently of the RPC api version? | 16:31 |
sean-k-mooney | tacco: did that work by the way | 16:31 |
sean-k-mooney | the updated flavor | 16:31 |
tacco | usually you also don't want such huge flavors. | 16:31 |
tacco | nope updated flavor also has only 64cpus in the cm | 16:31 |
tacco | s/cm/vm/ | 16:31 |
sean-k-mooney | really | 16:32 |
sean-k-mooney | that is odd | 16:32 |
sean-k-mooney | um, can you quickly check the qemu string just to triple check | 16:32 |
tacco | and i double checked the xml if that change affected the xml and reflects my change to the flavor | 16:32 |
dansmith | gibi: we always send the version of the object we have. If the receiving side determines it's too new, it calls to conductor and asks for conductor to backlevel it. Conductor can either return an object that has an older version (i.e. if it _can_ backlevel it) or it can refuse | 16:32 |
sean-k-mooney | it's likely a qemu or guest kernel limitation however | 16:32 |
tacco | yes. Thanks, that's where i would like to dig deeper | 16:33 |
gibi | dansmith: good to know, that in this case there is an extra call back to the conductor | 16:33 |
dansmith | gibi: but that's just for the object(s), not the rpc signature itself, and the idea was that during an upgrade you have extra conductor load to handle all the backports, but as you upgrade everything that just disappears | 16:34 |
gibi | dansmith: yes, it is clear that this is only possible for the o.vos themselves, not for the whole RPC method signature. | 16:35 |
*** macz_ has joined #openstack-nova | 16:36 | |
gibi | dansmith: but then in a new-field-in-an-ovo case the service version check is enough | 16:36 |
gibi | as if the compute service version is new enough then it will understand a new ovo version with the extra field | 16:36 |
sean-k-mooney | tacco: https://github.com/torvalds/linux/blob/master/arch/x86/Kconfig#L994-L1005 | 16:36 |
dansmith | yeah, and in some cases, it's possible to backlevel the object so we can just deal with it on the receiving end, but not if you require specific behavior | 16:36 |
sean-k-mooney | tacco: so it should be 512 so probably a qemu issue | 16:37 |
*** jdillaman has quit IRC | 16:38 | |
tacco | i see. Thanks. | 16:38 |
gibi | dansmith: thanks again, this makes sense now | 16:39 |
dansmith | cool | 16:39 |
sean-k-mooney | tacco: it looks like the max cpus depends on the machine type you enable | 16:40 |
gibi | I think this is a good time to finish my week and let the new understanding solidify :) | 16:41 |
gibi | have a nice weekend folks o/ | 16:41 |
lyarwood | \o | 16:41 |
tacco | sean-k-mooney: here is also the capa list of virsh http://paste.openstack.org/show/nxpSZUUrICfvvRJBgosL/ | 16:44 |
tacco | this machine type? <type arch='x86_64' machine='pc-i440fx-4.0'>hvm</type> | 16:45 |
sean-k-mooney | ya you are using the pc machine type but it should in theory support up to 256 | 16:45 |
*** takamatsu has joined #openstack-nova | 16:46 | |
sean-k-mooney | tacco: if you look in the output it has the limits | 16:47 |
sean-k-mooney | line 1193 | 16:47 |
sean-k-mooney | <machine maxCpus='255'>pc-i440fx-4.0</machine> | 16:47 |
tacco | yes i also found this in there. | 16:47 |
tacco | ok. so no "real" limitations, more of a misconfiguration or bug. :) | 16:47 |
sean-k-mooney | so this is looking like a guest issue or a bug ya | 16:48 |
sean-k-mooney | you could try with another image | 16:48 |
tacco | i already struggled with that limitation that you can only have 8 disks inside a qemu vm. :) | 16:48 |
tacco | yes will do so but in general this was the debian cloud image with only minimal changes. but will test next week. | 16:48 |
sean-k-mooney | like a fedora or tumbleweed image | 16:48 |
tacco | ok. will do so. but for today im done. | 16:49 |
sean-k-mooney | cool | 16:49 |
sean-k-mooney | enjoy your weekend | 16:49 |
tacco | thanks a lot for your help so far. Will get back to you next week. have a nice weekend as well | 16:49 |
*** ociuhandu has joined #openstack-nova | 16:51 | |
*** hemna has quit IRC | 16:58 | |
*** hemna has joined #openstack-nova | 16:59 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Restore retrying the RPC connection to conductor https://review.opendev.org/762633 | 17:04 |
*** hemna has quit IRC | 17:08 | |
*** hemna has joined #openstack-nova | 17:09 | |
*** tesseract has quit IRC | 17:14 | |
*** ociuhandu_ has joined #openstack-nova | 17:15 | |
dopereira | Hi, everyone. I'm a new contributor and just joined the OpenStack community. | 17:17 |
*** ociuhandu has quit IRC | 17:17 | |
dopereira | I recently started to work on a project that develops an OpenStack distribution with commercial support. Hopefully I will be contributing to upstream OpenStack as well. | 17:18 |
dopereira | As part of my ramp up process, I'm learning how to contribute to OpenStack, and I was asked to take care of a low-hanging-fruit bug. | 17:18 |
dopereira | I chose this one: https://bugs.launchpad.net/nova/+bug/1888927 and already have a patch for it: https://review.opendev.org/#/c/762433/4 | 17:18 |
openstack | Launchpad bug 1888927 in OpenStack Compute (nova) "cell_v2 update_cell cell0 get transport_url from config file" [Low,In progress] - Assigned to Daniel de Oliveira Pereira (danielpereira01) | 17:18 |
dopereira | Could you guys please take a look and help to review it? | 17:19 |
*** ociuhandu_ has quit IRC | 17:19 | |
dopereira | Also, the VMware NSX CI check failed. How can I start a recheck for it? It does not provide this information | 17:21 |
sean-k-mooney | dopereira: have you submitted the patch to gerrit | 17:23 |
sean-k-mooney | ah you have | 17:23 |
sean-k-mooney | dopereira: when a third party ci fails it leaves a comment telling you how to recheck that specific ci | 17:24 |
sean-k-mooney | dopereira: you have to click the toggle extra ci button to see it | 17:24 |
sean-k-mooney | oh but they dont | 17:24 |
sean-k-mooney | it should be listed here | 17:25 |
dopereira | it seems that's not the case for VMware NSX CI | 17:25 |
sean-k-mooney | https://wiki.openstack.org/w/index.php?title=ThirdPartySystems | 17:25 |
sean-k-mooney | vmware-recheck-patch | 17:25 |
sean-k-mooney | according to https://wiki.openstack.org/wiki/ThirdPartySystems/VMware_CI | 17:26 |
*** ociuhandu has joined #openstack-nova | 17:26 | |
dopereira | I saw, thanks | 17:27 |
sean-k-mooney | so i havent done a full review but i think you need to add a release note for the bug fix | 17:27 |
sean-k-mooney | looks like you have added tests | 17:28 |
sean-k-mooney | so ya the main thing i think is needed is a release note but i'm not that familiar with that part of the code, otherwise it looks fine | 17:29 |
dopereira | could you point me to some documentation about release notes? | 17:30 |
*** ociuhandu has quit IRC | 17:31 | |
gmann | dopereira: here - https://docs.openstack.org/reno/latest/user/usage.html | 17:31 |
sean-k-mooney | basically you create a new one from the template via tox. so tox -e venv -- reno new cell_0-transport-url | 17:31 |
sean-k-mooney | then you edit the file it creates | 17:32 |
*** jangutter has quit IRC | 17:32 | |
sean-k-mooney | you can check it's correct with tox -e releasenotes | 17:32 |
sean-k-mooney | you just need to fill out the fixes section in this case and remove the rest | 17:32 |
dopereira | thanks, will take a look | 17:34 |
*** bnemec is now known as beekneemech | 17:37 | |
*** k_mouza has quit IRC | 17:43 | |
*** psachin has quit IRC | 17:44 | |
*** mgoddard has quit IRC | 17:49 | |
tacco | sean-k-mooney: couldn't hold myself back and tested with fedora. same result.. 64 vcpus in /proc/cpuinfo | 17:54 |
tacco | nproc also 64. | 17:54 |
*** mlavalle has quit IRC | 17:58 | |
sean-k-mooney | tacco: you could try a dmidecode in the vm | 17:59 |
sean-k-mooney | or check dmesg to see if it prints anything on kernel start | 17:59 |
sean-k-mooney | but ya nova seems to be doing the right thing so it's a qemu or guest issue | 18:00 |
sean-k-mooney | looking more like qemu since it happens on multiple distros | 18:00 |
*** mlavalle has joined #openstack-nova | 18:05 | |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: [WIP] Migrate nova-grenade-multinode job to zuulv3 native https://review.opendev.org/742056 | 18:12 |
*** ralonsoh has quit IRC | 18:34 | |
*** andrewbonney has quit IRC | 18:36 | |
*** nweinber has quit IRC | 18:36 | |
*** nweinber has joined #openstack-nova | 18:36 | |
*** sean-k-mooney1 has joined #openstack-nova | 18:41 | |
*** sean-k-mooney has quit IRC | 18:42 | |
*** gyee has joined #openstack-nova | 18:54 | |
openstackgerrit | Daniel de Oliveira Pereira proposed openstack/nova master: Avoid changing transport_url when updating Cell0 https://review.opendev.org/762433 | 19:05 |
*** nweinber has quit IRC | 19:18 | |
*** nweinber has joined #openstack-nova | 19:19 | |
*** lifeless has quit IRC | 19:31 | |
*** k_mouza has joined #openstack-nova | 19:43 | |
*** lifeless has joined #openstack-nova | 19:47 | |
*** k_mouza has quit IRC | 19:48 | |
*** dopereira has quit IRC | 19:50 | |
*** slaweq has quit IRC | 20:03 | |
openstackgerrit | Merged openstack/nova stable/train: add [libvirt]/max_queues config option https://review.opendev.org/740064 | 20:06 |
openstackgerrit | Merged openstack/nova stable/victoria: Handle disabled CPU features to fix live migration failures https://review.opendev.org/758760 | 20:52 |
*** ociuhandu has joined #openstack-nova | 21:19 | |
*** rcernin has joined #openstack-nova | 21:19 | |
openstackgerrit | Merged openstack/nova stable/ussuri: docs: Resolve issue with deprecated extra specs https://review.opendev.org/748386 | 21:27 |
openstackgerrit | Merged openstack/nova stable/ussuri: replace the "hide_hypervisor_id" to "hw:hide_hypervisor_id" https://review.opendev.org/747189 | 21:27 |
*** ociuhandu has quit IRC | 21:45 | |
*** jmlowe has quit IRC | 21:49 | |
*** sean-k-mooney2 has joined #openstack-nova | 21:54 | |
*** sean-k-mooney1 has quit IRC | 21:56 | |
*** hamalq has joined #openstack-nova | 22:30 | |
*** nweinber has quit IRC | 22:46 | |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: Fix config option default value for sample config file https://review.opendev.org/762721 | 23:15 |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: Fix config option default value for sample config file https://review.opendev.org/762721 | 23:16 |
*** hemna has quit IRC | 23:19 | |
*** hemna has joined #openstack-nova | 23:20 | |
*** corvus has quit IRC | 23:22 | |
*** hamalq has quit IRC | 23:37 | |
*** hamalq has joined #openstack-nova | 23:38 | |
*** tosky has quit IRC | 23:43 |
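
Note on the o.vo exchange between gibi and dansmith (16:31 to 16:36): the flow dansmith describes, where the sender always sends the object version it has and a receiver that finds it too new asks conductor for a backlevelled copy, can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not oslo.versionedobjects or nova code: `parse_version`, `FIELDS_ADDED_IN`, `backlevel`, `receive`, and the `new_field` name are all hypothetical, and the round trip to conductor is modelled as a direct function call.

```python
def parse_version(version):
    """Turn '1.10' into (1, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# Hypothetical field history: object version 1.1 added "new_field".
# Fields not listed here are assumed to exist since 1.0.
FIELDS_ADDED_IN = {"new_field": "1.1"}


def backlevel(primitive, target_version):
    """Stand-in for conductor: drop fields newer than target_version."""
    target = parse_version(target_version)
    data = {
        name: value
        for name, value in primitive["data"].items()
        if parse_version(FIELDS_ADDED_IN.get(name, "1.0")) <= target
    }
    return {"version": target_version, "data": data}


def receive(primitive, local_version):
    """Receiving side: keep the object as sent, or request a backport."""
    if parse_version(primitive["version"]) > parse_version(local_version):
        # In the real flow this is the extra call back to conductor.
        return backlevel(primitive, local_version)
    return primitive


# The sender always sends the version of the object it has.
sent = {"version": "1.1", "data": {"host": "cmp1", "new_field": 42}}

old_receiver = receive(sent, "1.0")  # too new: backlevelled, field dropped
new_receiver = receive(sent, "1.1")  # understood as-is, field kept
```

In this sketch an upgraded receiver gets `new_field` without any change to the call itself, which matches gibi's conclusion that a service version check is enough for the new-field case; the backport path only matters while older services are still running.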
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!
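
Note on the vCPU limit lookup (16:44 to 16:47): sean-k-mooney found the per-machine-type limit by reading the `maxCpus` attribute in the `virsh capabilities` output tacco pasted. A small sketch of pulling those limits out with Python's standard library follows. The `SAMPLE` document is a hand-trimmed assumption: only the `<machine maxCpus='255'>pc-i440fx-4.0</machine>` line comes from the log, the surrounding elements are filled in for illustration, and `max_cpus_by_machine` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-trimmed sample; only the <machine> line is taken from the log.
SAMPLE = """
<capabilities>
  <guest>
    <os_type>hvm</os_type>
    <arch name='x86_64'>
      <machine maxCpus='255'>pc-i440fx-4.0</machine>
    </arch>
  </guest>
</capabilities>
"""


def max_cpus_by_machine(capabilities_xml):
    """Map each machine type name to its advertised maxCpus value."""
    root = ET.fromstring(capabilities_xml)
    limits = {}
    for machine in root.iter("machine"):
        max_cpus = machine.get("maxCpus")
        if max_cpus is not None and machine.text:
            limits[machine.text.strip()] = int(max_cpus)
    return limits


limits = max_cpus_by_machine(SAMPLE)
```

Comparing the value for the guest's machine type with the vCPU count the guest actually reports (64 in tacco's case) is one way to confirm, as the discussion concluded, that the advertised machine-type limit is not what caps the guest.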