*** irclogbot_1 has joined #openstack-nova | 00:23 | |
*** irclogbot_1 has quit IRC | 00:40 | |
*** tbachman has joined #openstack-nova | 00:45 | |
*** Liang__ has joined #openstack-nova | 01:20 | |
*** irclogbot_0 has joined #openstack-nova | 01:27 | |
*** zhanglong has joined #openstack-nova | 01:39 | |
*** irclogbot_0 has quit IRC | 01:48 | |
*** irclogbot_1 has joined #openstack-nova | 02:47 | |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: ksa auth conf and client for Cyborg access https://review.opendev.org/631242 | 02:52 |
---|---|---|
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Add Cyborg device profile groups to request spec. https://review.opendev.org/631243 | 02:52 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Define Cyborg ARQ binding notification event. https://review.opendev.org/692707 | 02:52 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Create and bind Cyborg ARQs. https://review.opendev.org/631244 | 02:52 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Compose accelerator PCI devices into VM's domain XML. https://review.opendev.org/631245 | 02:52 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Delete ARQs for an instance when the instance is deleted. https://review.opendev.org/673735 | 02:52 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Add cyborg tempest job. https://review.opendev.org/670999 | 02:52 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Enable hard reboot with accelerators. https://review.opendev.org/697940 | 02:52 |
*** slaweq has joined #openstack-nova | 02:54 | |
*** slaweq has quit IRC | 03:02 | |
*** brinzhang has joined #openstack-nova | 03:06 | |
*** brinzhang has quit IRC | 03:08 | |
*** brinzhang has joined #openstack-nova | 03:08 | |
*** slaweq has joined #openstack-nova | 03:09 | |
*** slaweq has quit IRC | 03:15 | |
*** brault has quit IRC | 03:16 | |
*** brault has joined #openstack-nova | 03:22 | |
*** chenhaw has joined #openstack-nova | 03:25 | |
*** lvbin01 has quit IRC | 03:25 | |
*** lvbin01 has joined #openstack-nova | 03:25 | |
*** brinzhang_ has joined #openstack-nova | 03:25 | |
*** brinzhang has quit IRC | 03:29 | |
*** slaweq has joined #openstack-nova | 03:32 | |
*** slaweq has quit IRC | 03:37 | |
*** slaweq has joined #openstack-nova | 03:49 | |
*** mkrai has joined #openstack-nova | 03:52 | |
*** slaweq has quit IRC | 03:54 | |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Delete ARQs for an instance when the instance is deleted. https://review.opendev.org/673735 | 04:02 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Enable hard reboot with accelerators. https://review.opendev.org/697940 | 04:02 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Add cyborg tempest job. https://review.opendev.org/670999 | 04:02 |
*** slaweq has joined #openstack-nova | 04:09 | |
*** udesale has joined #openstack-nova | 04:25 | |
*** bhagyashris has joined #openstack-nova | 04:29 | |
*** factor has quit IRC | 04:31 | |
*** factor has joined #openstack-nova | 04:32 | |
*** slaweq has quit IRC | 04:45 | |
*** sapd1 has quit IRC | 05:09 | |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Define Cyborg ARQ binding notification event. https://review.opendev.org/692707 | 05:30 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Create and bind Cyborg ARQs. https://review.opendev.org/631244 | 05:30 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Compose accelerator PCI devices into VM's domain XML. https://review.opendev.org/631245 | 05:30 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Delete ARQs for an instance when the instance is deleted. https://review.opendev.org/673735 | 05:30 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Enable hard reboot with accelerators. https://review.opendev.org/697940 | 05:30 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Add cyborg tempest job. https://review.opendev.org/670999 | 05:30 |
*** janki has joined #openstack-nova | 05:39 | |
*** zhanglong has quit IRC | 05:43 | |
*** slaweq has joined #openstack-nova | 05:59 | |
*** zhanglong has joined #openstack-nova | 06:01 | |
*** slaweq has quit IRC | 06:04 | |
*** ircuser-1 has quit IRC | 06:33 | |
*** damien_r has joined #openstack-nova | 06:34 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack/nova master: Imported Translations from Zanata https://review.opendev.org/694717 | 06:44 |
*** brinzhang has joined #openstack-nova | 06:53 | |
*** awalende has joined #openstack-nova | 06:53 | |
*** brinzhang_ has quit IRC | 06:56 | |
*** awalende has quit IRC | 06:58 | |
*** jchhatbar has joined #openstack-nova | 07:01 | |
*** janki has quit IRC | 07:01 | |
*** jchhatbar has quit IRC | 07:01 | |
*** brinzhang_ has joined #openstack-nova | 07:06 | |
*** brinzhang has quit IRC | 07:10 | |
*** rcernin has quit IRC | 07:12 | |
*** zhanglong has quit IRC | 07:13 | |
*** brinzhang has joined #openstack-nova | 07:19 | |
*** brinzhang_ has quit IRC | 07:23 | |
*** slaweq has joined #openstack-nova | 07:24 | |
*** brinzhang has quit IRC | 07:29 | |
*** brinzhang has joined #openstack-nova | 07:29 | |
*** brinzhang has quit IRC | 07:31 | |
*** brinzhang has joined #openstack-nova | 07:31 | |
*** damien_r has quit IRC | 07:32 | |
*** belmoreira has joined #openstack-nova | 07:35 | |
*** brinzhang_ has joined #openstack-nova | 07:39 | |
*** links has joined #openstack-nova | 07:42 | |
*** brinzhang has quit IRC | 07:42 | |
*** gibi is now known as gibi_off | 07:42 | |
*** adriant has quit IRC | 07:52 | |
*** adriant has joined #openstack-nova | 07:52 | |
*** brinzhang has joined #openstack-nova | 07:55 | |
*** brinzhang has quit IRC | 07:56 | |
*** brinzhang has joined #openstack-nova | 07:56 | |
*** brinzhang_ has quit IRC | 07:58 | |
*** damien_r has joined #openstack-nova | 07:59 | |
*** igordc has joined #openstack-nova | 08:01 | |
*** brinzhang has quit IRC | 08:05 | |
*** brinzhang has joined #openstack-nova | 08:05 | |
*** brinzhang_ has joined #openstack-nova | 08:07 | |
*** tkajinam has quit IRC | 08:07 | |
*** awalende has joined #openstack-nova | 08:09 | |
*** brinzhang has quit IRC | 08:10 | |
*** jangutter has joined #openstack-nova | 08:11 | |
*** awalende has quit IRC | 08:14 | |
*** jangutter has quit IRC | 08:16 | |
*** jangutter has joined #openstack-nova | 08:16 | |
*** brinzhang_ has quit IRC | 08:25 | |
*** tosky has joined #openstack-nova | 08:25 | |
*** brinzhang_ has joined #openstack-nova | 08:25 | |
*** avolkov has joined #openstack-nova | 08:29 | |
*** brinzhang has joined #openstack-nova | 08:29 | |
*** tesseract has joined #openstack-nova | 08:30 | |
*** tesseract has quit IRC | 08:31 | |
*** tesseract has joined #openstack-nova | 08:31 | |
*** brinzhang_ has quit IRC | 08:32 | |
*** igordc has quit IRC | 08:36 | |
*** brinzhang has quit IRC | 08:38 | |
*** maciejjozefczyk has joined #openstack-nova | 08:39 | |
*** tetsuro has joined #openstack-nova | 08:48 | |
*** zhanglong has joined #openstack-nova | 08:53 | |
*** abaindur has joined #openstack-nova | 08:57 | |
*** martinkennelly has joined #openstack-nova | 08:57 | |
*** ralonsoh has joined #openstack-nova | 08:57 | |
*** abaindur_ has joined #openstack-nova | 08:58 | |
*** abaindur has quit IRC | 08:58 | |
*** ebbex has quit IRC | 09:00 | |
*** udesale has quit IRC | 09:01 | |
*** awalende has joined #openstack-nova | 09:05 | |
*** abaindur_ has quit IRC | 09:08 | |
*** rcernin has joined #openstack-nova | 09:11 | |
*** rpittau|afk is now known as rpittau | 09:15 | |
openstackgerrit | Brin Zhang proposed openstack/nova master: PoC: Support re-configure the delete_on_termination in server https://review.opendev.org/693828 | 09:16 |
*** derekh has joined #openstack-nova | 09:38 | |
*** mgoddard has joined #openstack-nova | 09:46 | |
*** Liang__ has quit IRC | 09:49 | |
*** belmoreira has quit IRC | 09:53 | |
*** belmoreira has joined #openstack-nova | 09:57 | |
*** mkrai has quit IRC | 09:58 | |
*** mkrai has joined #openstack-nova | 09:59 | |
*** yan0s has joined #openstack-nova | 09:59 | |
*** dtantsur|afk is now known as dtantsur | 10:02 | |
*** mkrai has quit IRC | 10:04 | |
*** udesale has joined #openstack-nova | 10:08 | |
*** rcernin has quit IRC | 10:14 | |
*** davidsha has joined #openstack-nova | 10:18 | |
*** belmoreira has quit IRC | 10:18 | |
*** belmoreira has joined #openstack-nova | 10:21 | |
*** udesale has quit IRC | 10:31 | |
*** salmankhan has joined #openstack-nova | 10:38 | |
*** salmankhan has quit IRC | 10:45 | |
*** salmankhan has joined #openstack-nova | 10:45 | |
*** salmankhan has joined #openstack-nova | 10:45 | |
*** mdbooth has quit IRC | 10:57 | |
*** zhanglong has quit IRC | 11:14 | |
*** mkrai has joined #openstack-nova | 11:17 | |
slaweq | efried: hi | 11:18 |
slaweq | efried: recently I saw quite often grenade jobs failing with errors like https://02475c780c6fc32e71dc-f63b2c309fc0040fbb4a377b77794f40.ssl.cf1.rackcdn.com/697035/1/check/neutron-grenade-dvr-multinode/baba7e2/logs/testr_results.html.gz | 11:18 |
slaweq | basically it is a "No valid host was found. There are not enough hosts available." error from nova in some tests, and the tests are failing | 11:19 |
slaweq | efried: did You see something similar? and is there any bug reported for that or should I open new one? | 11:19 |
*** sapd1 has joined #openstack-nova | 11:20 | |
slaweq | efried: and in the scheduler log I see something like here: https://zuul.opendev.org/t/openstack/build/baba7e2f78994deabbc3230b3f9acc80/log/logs/screen-n-sch.txt.gz#2505 | 11:23 |
slaweq | and before that there is log like: https://zuul.opendev.org/t/openstack/build/baba7e2f78994deabbc3230b3f9acc80/log/logs/screen-n-sch.txt.gz#2501 | 11:23 |
slaweq | "Timed out waiting for response from cell 258a967d-07ef-43a6-b81e-2ac433a583ef" | 11:23 |
slaweq | efried: so now I think it's the same issue as https://bugs.launchpad.net/nova/+bug/1844929 - is that correct? | 11:24 |
openstack | Launchpad bug 1844929 in OpenStack Compute (nova) "grenade jobs failing due to "Timed out waiting for response from cell" in scheduler" [High,Confirmed] | 11:24 |
sean-k-mooney | the availability zone filter is a red herring; that iteration started with 0 hosts https://zuul.opendev.org/t/openstack/build/baba7e2f78994deabbc3230b3f9acc80/log/logs/screen-n-sch.txt.gz#2503 | 11:34 |
*** dviroel has joined #openstack-nova | 11:36 | |
*** sapd1 has quit IRC | 11:36 | |
*** mkrai has quit IRC | 11:39 | |
sean-k-mooney | slaweq: we do see the same messaging timeout in the n-cpu log on the sub node https://02475c780c6fc32e71dc-f63b2c309fc0040fbb4a377b77794f40.ssl.cf1.rackcdn.com/697035/1/check/neutron-grenade-dvr-multinode/baba7e2/logs/subnode-2/screen-n-cpu.txt.gz | 11:41 |
*** tbachman has quit IRC | 11:42 | |
sean-k-mooney | slaweq: im also seeing similar error in the agent log | 11:44 |
*** brinzhang has joined #openstack-nova | 11:45 | |
sean-k-mooney | it kind of looks like a rabbitmq issue | 11:45 |
*** pcaruana has joined #openstack-nova | 11:46 | |
slaweq | sean-k-mooney: the rabbitmq timeout first happened around 19:57 in nova-compute on subnode-2 | 11:54 |
slaweq | in rabbitmq logs it seems that during this time there were only closing connection logs: https://02475c780c6fc32e71dc-f63b2c309fc0040fbb4a377b77794f40.ssl.cf1.rackcdn.com/697035/1/check/neutron-grenade-dvr-multinode/baba7e2/logs/rabbitmq/rabbit%40ubuntu-bionic-ovh-gra1-0013217908.txt.gz | 11:55 |
openstackgerrit | Boris Bobrov proposed openstack/nova master: Create a controller for qga when SEV is used https://review.opendev.org/693072 | 11:57 |
*** brinzhang_ has joined #openstack-nova | 12:05 | |
*** brinzhang has quit IRC | 12:09 | |
*** mkrai has joined #openstack-nova | 12:14 | |
*** dave-mccowan has joined #openstack-nova | 12:20 | |
*** bbowen has joined #openstack-nova | 12:20 | |
*** belmoreira has quit IRC | 12:22 | |
*** dave-mccowan has quit IRC | 12:25 | |
*** artom has joined #openstack-nova | 12:26 | |
*** udesale has joined #openstack-nova | 12:33 | |
*** Luzi has joined #openstack-nova | 12:57 | |
*** brinzhang has joined #openstack-nova | 13:04 | |
*** brinzhang has quit IRC | 13:06 | |
*** brinzhang has joined #openstack-nova | 13:07 | |
openstackgerrit | Merged openstack/nova master: Imported Translations from Zanata https://review.opendev.org/694717 | 13:07 |
*** brinzhang_ has quit IRC | 13:07 | |
*** tbachman has joined #openstack-nova | 13:08 | |
*** huaqiang has joined #openstack-nova | 13:10 | |
*** aarents has joined #openstack-nova | 13:16 | |
mgariepy | has anyone had issues with ephemeral storage when upgrading to 18.04? | 13:18 |
mgariepy | the _base image for the ephemeral part is formatted with different options, so the backing file for disk.eph0 is not quite right and will not work for the VM being migrated. | 13:19 |
*** zbr has quit IRC | 13:27 | |
sean-k-mooney | dansmith: i spent some time over the weekend playing with the pci endpoint test driver and the netdevsim module. | 13:29 |
sean-k-mooney | dansmith: they will not allow us to fake PCI devices, unfortunately | 13:29 |
sean-k-mooney | dansmith: the endpoint driver needs a PCI endpoint controller to be present, and there is no software implementation of that | 13:30 |
sean-k-mooney | and the netdevsim module still has the limitation that it just simulates the netdevs, not the PCI endpoint. so even though you can create PFs and VFs, it does not create PCIe endpoints, just the netdevs | 13:31 |
sean-k-mooney | so it's the same as when I last looked at this in February | 13:32 |
*** zbr has joined #openstack-nova | 13:32 | |
*** jamesden_ has joined #openstack-nova | 13:35 | |
*** nweinber has joined #openstack-nova | 13:35 | |
*** bhagyashris has quit IRC | 13:36 | |
*** jamesdenton has quit IRC | 13:36 | |
huaqiang | sean-k-mooney: In Train release, you have reviewed my proposal for using PCPU and VCPU in same instance, can you review the updated Ussuri version once you have spare time? | 13:49 |
huaqiang | sean-k-mooney: the spec's link is https://review.opendev.org/#/c/668656/ | 13:49 |
sean-k-mooney | this one https://review.opendev.org/#/c/668656/ | 13:49 |
sean-k-mooney | :) | 13:49 |
kashyap | melwitt: Hiya, will look | 13:49 |
huaqiang | thanks | 13:50 |
sean-k-mooney | huaqiang: I'll finish the email I'm typing and review it then | 13:50 |
huaqiang | I prepared the POC code, under topic: bp/mixed-cpu-instance-set4 | 13:51 |
*** ociuhandu has joined #openstack-nova | 13:51 | |
huaqiang | not good enough but it works somehow | 13:51 |
sean-k-mooney | huaqiang: ideally the topic should match the blueprint name, so it should be bp/use-pcpu-vcpu-in-one-instance | 13:51 |
huaqiang | sean-k-mooney: I'll make the name consistent in the next update | 13:52 |
sean-k-mooney | ok, it just helps when trying to find all the related patches if they have the same topic as the spec and it matches the blueprint | 13:53 |
kashyap | melwitt: Ah, never mind, I see it's merged - the CPU comparison check on AArch64 | 13:53 |
*** kashyap has left #openstack-nova | 13:55 | |
huaqiang | I saw that, appreciate that. | 13:55 |
*** mriedem has joined #openstack-nova | 13:58 | |
*** kashyap has joined #openstack-nova | 14:04 | |
aarents | hi, | 14:04 |
aarents | mriedem: I don't know if you got some news from Matt Booth about https://review.opendev.org/#/c/696084/ , he's probably off or busy? | 14:04 |
kashyap | aarents: He normally goes by 'mdbooth' in this channel | 14:05 |
*** mdbooth has joined #openstack-nova | 14:06 | |
*** brinzhang_ has joined #openstack-nova | 14:07 | |
mdbooth | kashyap: Did you look at the arguments involved https://review.opendev.org/#/c/696084/2/nova/virt/libvirt/imagebackend.py ? | 14:08 |
mdbooth | Is that guaranteed to flatten the qcow2? | 14:08 |
kashyap | mdbooth: Afraid, not yet; was addressing something else. /me goes to look... | 14:09 |
*** brinzhang_ has quit IRC | 14:09 | |
*** brinzhang_ has joined #openstack-nova | 14:09 | |
*** brinzhang has quit IRC | 14:10 | |
*** brinzhang_ has quit IRC | 14:11 | |
*** brinzhang_ has joined #openstack-nova | 14:12 | |
kashyap | mdbooth: To flatten a chain *offline* shouldn't one require `qemu-img commit`? (Online is 'block-commit') | 14:12 |
* kashyap is still reading... | 14:12 | |
mdbooth | kashyap: Right. I haven't checked the exact args used there, or refreshed my memory on qcow2 flattening, but I seemed to recall there was more involved | 14:12 |
*** brinzhang_ has quit IRC | 14:13 | |
kashyap | mdbooth: Definitely commit is required. | 14:13 |
kashyap | So says my 2012 "handout" even :D -- https://kashyapc.fedorapeople.org/virt/lc-2012/snapshots-handout.html | 14:13 |
*** brinzhang_ has joined #openstack-nova | 14:14 | |
kashyap | If you have: [base] <-- [overlay1] | 14:14 |
kashyap | To "flatten" it, qemu-img commit sn2.qcow2 | 14:14 |
kashyap | [Err, bad copy/paste.] | 14:14 |
kashyap | To "flatten" it, `qemu-img commit overlay1.qcow2`. | 14:14 |
*** brinzhang_ has quit IRC | 14:15 | |
kashyap | [If you have more than two files in a chain — which is it in our case — then you'd also have to update the backing file pointer.] | 14:15 |
*** brinzhang_ has joined #openstack-nova | 14:16 | |
lyarwood | I don't think you need to do that anymore kashyap, I've only ever seen just a normal qcow2 to qcow2 convert used to flatten qcow2 disks. | 14:16 |
kashyap | lyarwood: Sorry, what is not required anymore? | 14:16 |
kashyap | lyarwood: Ah, the 'commit'? | 14:16 |
kashyap | Right, 'qemu-img convert' is another way. Which raises the question of which method is preferred over the other, and why | 14:17 |
lyarwood | yeah, http://paste.openstack.org/show/787313/ | 14:17 |
*** brinzhang_ has quit IRC | 14:17 | |
*** brinzhang_ has joined #openstack-nova | 14:18 | |
kashyap | lyarwood: I _think_ the advantage of using 'convert' is that it retains some qcow2 properties... | 14:19 |
*** brinzhang_ has quit IRC | 14:19 | |
* kashyap checks | 14:19 | |
*** brinzhang_ has joined #openstack-nova | 14:20 | |
*** brinzhang_ has quit IRC | 14:21 | |
*** brinzhang_ has joined #openstack-nova | 14:21 | |
*** brinzhang_ has quit IRC | 14:22 | |
*** nweinber has quit IRC | 14:23 | |
lyarwood | kashyap: yeah I think so, I've commented on the change anyway, for qcow2 I think we need to rebase on the original cached image otherwise each unshelve is going to eat up a whole load of disk with each call to flatten. | 14:26 |
lyarwood | mdbooth: ^ not sure if you agree | 14:26 |
mriedem | gibi_off: i replied to your questions in https://review.opendev.org/#/c/637070/ - let me know if you need anything else | 14:26 |
mriedem | elod: do you know when gibi is back? | 14:27 |
kashyap | lyarwood: There are _three_ ways in total, including 'rebase' -- I was remembering it as only changing the backing file pointer. | 14:28 |
elod | mriedem: wednesday, if I'm not mistaken | 14:28 |
kashyap | lyarwood: Will update the change with the three ways, and pros/cons | 14:28 |
mriedem | elod: ok thanks | 14:28 |
*** tbachman has quit IRC | 14:29 | |
lyarwood | kashyap: ah I wasn't aware that you could rebase disks between backing files, that would be super useful here. | 14:30 |
lyarwood | kashyap: thanks! | 14:30 |
kashyap | lyarwood: Yes, indeed. By default 'qemu-img rebase' does a "real rebase" operation. And 'qemu-img rebase -u' -- "unsafe" -- will only update the backing file pointer | 14:31 |
kashyap | The man page explains it. /me recalls documenting that part of 'qemu-img' many moons ago :-) | 14:31 |
*** tbachman has joined #openstack-nova | 14:32 | |
kashyap | (Read the Safe mode vs. Unsafe mode discussion) | 14:32 |
aarents | mdbooth: kashyap Hi, basically by using convert, I reused what is done when we extract a snapshot to glance (we need to flatten before upload) | 14:36 |
*** tbachman has quit IRC | 14:37 | |
*** eharney has joined #openstack-nova | 14:41 | |
*** lpetrut has joined #openstack-nova | 14:41 | |
*** tbachman has joined #openstack-nova | 14:41 | |
kashyap | aarents: Hi, there are a couple of trade-offs here, based on the method we're using | 14:42 |
kashyap | 'convert' has the advantage of also handling sparsification (as guessed earlier); but has the disadvantage of being relatively slow _and_ requires double the space | 14:43 |
kashyap | aarents: Slow because, 'convert' copies both base and overlay into a new image; and thus double the space. (Instead of copying only into base or the overlay, in case of 'commit' or 'rebase') | 14:44 |
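The three approaches discussed above can be sketched as command lines. This is a hedged illustration, not what the change under review does: the helper name `flatten_commands` and the file names are hypothetical, and the sketch only builds argument lists rather than running `qemu-img`.

```python
def flatten_commands(overlay, flat):
    """Return qemu-img argv lists for three ways to flatten [base] <-- [overlay].

    Hypothetical helper for illustration only; nothing is executed here.
    """
    return {
        # Merge the overlay's data down into its backing file (the base is modified).
        "commit": ["qemu-img", "commit", overlay],
        # Copy base + overlay into a new standalone image (needs extra disk space).
        "convert": ["qemu-img", "convert", "-O", "qcow2", overlay, flat],
        # Safe-mode rebase onto an empty backing file: base data is pulled up
        # into the overlay, which then no longer depends on any backing file.
        "rebase": ["qemu-img", "rebase", "-b", "", overlay],
    }


if __name__ == "__main__":
    for name, argv in flatten_commands("overlay1.qcow2", "flat.qcow2").items():
        print(name, argv)
```

Which of the three fits depends on whether the shared base image may be modified ('commit' writes into it) and on how much scratch space is available ('convert' writes a full new copy).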
kashyap | aarents: Writing a further comment in the change, once we have the options laid out, then we can make a decision | 14:45 |
*** huaqiang is now known as huaqiang_ | 14:46 | |
*** tbachman has quit IRC | 14:56 | |
*** beekneemech is now known as bnemec | 15:00 | |
aarents | kashyap: ok great | 15:00 |
*** Luzi has quit IRC | 15:00 | |
*** ociuhandu has quit IRC | 15:11 | |
*** jamesden_ is now known as jamesdenton | 15:15 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Follow up to I5b9d41ef34385689d8da9b3962a1eac759eddf6a https://review.opendev.org/698028 | 15:18 |
kashyap | mdbooth: aarents: lyarwood: Added comparison notes of the three possible ways we can take: https://review.opendev.org/#/c/696084/ | 15:18 |
*** nweinber has joined #openstack-nova | 15:19 | |
efried | slaweq: catching up... | 15:20 |
efried | slaweq: Any time I see spurious grenade fails in the last couple months, I attribute it to oversubscribed CI nodes, per mriedem's "State of the Gate" thread, started here http://lists.openstack.org/pipermail/openstack-discuss/2019-October/thread.html#10484 and continued here http://lists.openstack.org/pipermail/openstack-discuss/2019-November/thread.html#10502 | 15:23 |
efried | slaweq: and per the first note in that thread, yes, the bug you identified (bug 1844929) is the one we're "tracking" the issue with. | 15:24 |
openstack | bug 1844929 in OpenStack Compute (nova) "grenade jobs failing due to "Timed out waiting for response from cell" in scheduler" [High,Confirmed] https://launchpad.net/bugs/1844929 | 15:24 |
efried | ...and the right solution is to get the CI providers to tweak their environments accordingly. We would rather have lower job throughput and lower failure rates. | 15:26 |
efried | But so far there has been no reaction from them. | 15:26 |
mriedem | note that for that particular bug the vast majority of fails are on ovh nodes | 15:26 |
mriedem | i don't know why it's mostly on grenade jobs | 15:27 |
*** tbachman has joined #openstack-nova | 15:27 | |
mriedem | must have something to do with restarting mysql a few times, but idk | 15:27 |
mriedem | note that it also started with train, i don't know why though | 15:29 |
mriedem | tl;dr i don't really know much of anything | 15:29 |
sean-k-mooney | do we also restart rabbitmq during the grenade upgrade? i assume so but i have not checked | 15:33 |
sean-k-mooney | it is probably restarted at least once by devstack on the second stacking | 15:33 |
sean-k-mooney | i noticed that the subnode had messaging timeouts in both the n-cpu and q-agt services | 15:35 |
sean-k-mooney | so it looked like we lost messages or something | 15:35 |
*** udesale has quit IRC | 15:36 | |
*** awalende has quit IRC | 15:38 | |
*** awalende has joined #openstack-nova | 15:38 | |
*** awalende_ has joined #openstack-nova | 15:42 | |
*** awalende has quit IRC | 15:43 | |
*** awalende_ has quit IRC | 15:43 | |
*** awalende has joined #openstack-nova | 15:44 | |
slaweq | mriedem: efried thx for confirmation that this is the bug which we hit most of the times now | 15:45 |
efried | dansmith: it would seem that the 422 event code is masked when there's only one event (as is the case with cyborg) | 15:45 |
slaweq | sean-k-mooney: and thx for info about rabbitmq too | 15:45 |
efried | slaweq: It would help to get some more voices complaining at the node providers. | 15:45 |
dansmith | efried: really? I didn't see that condition | 15:45 |
efried | dansmith: when all events are dropped, the handler turns the whole thing into a 404 | 15:45 |
efried | https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/server_external_events.py#L146 | 15:46 |
efried | which is arguably a bug | 15:46 |
dansmith | efried: ah, yeah, that code isn't really right.. "no instances found for any event" isn't true, just no hosts.. | 15:46 |
dansmith | yeah | 15:46 |
*** lpetrut has quit IRC | 15:46 | |
efried | to "fix" the bug would be a microversion? | 15:47 |
dansmith | unless mriedem feels that's a microversion problem, I can change that | 15:47 |
dansmith | I dunno, it's correcting the error code, which I thought was allowable in some situations | 15:47 |
efried | what would you correct it to? 207? | 15:47 |
*** johanssone has quit IRC | 15:48 | |
*** awalende has quit IRC | 15:48 | |
efried | Meaning there's actually no way the API ever returns an error. | 15:48 |
dansmith | that's what we return per-event in that case right? | 15:48 |
dansmith | no, there is | 15:48 |
dansmith | if the instance itself is not found | 15:48 |
efried | instances plural, right? | 15:48 |
*** johanssone has joined #openstack-nova | 15:49 | |
dansmith | yeah | 15:49 |
efried | so if some of the instances are not found, that's a 207 with some 404s | 15:49 |
efried | but if none of the instances are found, that's an overall 404? | 15:49 |
efried | or, | 15:50 |
efried | If all events tank, 400 with the granular payload | 15:50 |
efried | If some events succeed, 207 with the granular payload | 15:50 |
efried | or, | 15:50 |
efried | If all events tank for the same reason, make that the overall status code | 15:50 |
*** ociuhandu has joined #openstack-nova | 15:51 | |
efried | otherwise use 400 or 207 | 15:51 |
dansmith | it really should be that if everything was a 404, then you get overall 404. otherwise it's 207 with granular statuses right? | 15:51 |
dansmith | or maybe if everything is 200, you return 200 as well, I dunno | 15:51 |
efried | I'm not offended by the idea that you always return 207 | 15:52 |
efried | the definition of 207 allows for it to be full failure situations | 15:52 |
efried | "The response MAY be used in success, partial success and also in failure situations." | 15:53 |
dansmith | yeah, I think the only reason not to do that is just that something like response.raise_for_status() isn't usable for a dumb client | 15:54 |
efried | If clients are properly coded, that should be an acceptable behavior. But I suspect existing clients are written against how the code currently behaves. | 15:54 |
*** Sundar has joined #openstack-nova | 15:54 | |
efried | well, a client coded to pass blindly on 207 is buggy, period. | 15:54 |
efried | Sundar: o/ We're just discussing the bug you identified in the server_external_events algo, and how to fix it. | 15:55 |
*** tbachman has quit IRC | 15:55 | |
dansmith | well, sure, but lots of code does "if s<300: continue" | 15:55 |
dansmith | efried: anyway, I'm really fine with 207 across the board if others are, | 15:55 |
*** ociuhandu has quit IRC | 15:55 | |
dansmith | I'm just arguing what I think some people would argue | 15:55 |
efried | yes, I know, I'm saying code that does that with a response from this particular API is wrong, before or after the fix. | 15:55 |
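For illustration, a client-side check along the lines discussed above might look like this. It is a minimal sketch: the function name `failed_events` is hypothetical, and it assumes each event in the response body carries its own integer `code` field.

```python
def failed_events(status_code, body):
    """Return the events a caller should treat as failed.

    Hypothetical helper; assumes body looks like {"events": [{"code": 200, ...}, ...]}.
    """
    if status_code >= 400:
        # The request as a whole was rejected; there is nothing per-event to inspect.
        raise RuntimeError(f"request failed with status {status_code}")
    # A 2xx overall status (including 207) does not mean every event succeeded,
    # so each event's own code still has to be checked.
    return [event for event in body.get("events", []) if event["code"] >= 300]


if __name__ == "__main__":
    body = {"events": [{"name": "network-changed", "code": 200},
                       {"name": "network-changed", "code": 422}]}
    print(failed_events(207, body))
```

A caller that only tests the overall status for "< 300" would miss the second event here.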
dansmith | very few other things use this interface and they're really all openstack projects | 15:55 |
dansmith | sure | 15:56 |
efried | so, 200 if all are green, 207 if any/all fail. I'm happy with that. | 15:56 |
efried | which basically just means removing https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/server_external_events.py#L149-L151 | 15:56 |
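The proposal (200 if all events are green, 207 otherwise) can be sketched as follows. This is not the nova handler code; `overall_status` is a hypothetical helper that assumes the per-event codes have already been computed.

```python
def overall_status(event_codes):
    """Pick the overall HTTP status from the per-event codes.

    Hypothetical sketch of the rule discussed above: 200 when every event
    succeeded, otherwise 207 (Multi-Status) so callers inspect each event.
    """
    if all(code == 200 for code in event_codes):
        return 200
    return 207


if __name__ == "__main__":
    print(overall_status([200, 200]))  # every event accepted
    print(overall_status([200, 422]))  # partial failure
    print(overall_status([404, 404]))  # total failure, still Multi-Status
```

Under this rule there is no overall 404 any more: a total failure is reported through the per-event codes.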
*** mkrai has quit IRC | 15:56 | |
efried | dansmith: I can work that up. Unless you're already doing it. | 15:57 |
dansmith | efried: I'm not, but I'd also like to hear mriedem say he's okay with it | 15:57 |
efried | okay. I'll put it up so we can talk about it around the patch. | 15:58 |
dansmith | sure | 15:58 |
* mriedem reads back | 15:58 | |
mriedem | doesn't sound crazy to me, and it's an admin-only api by default | 16:04 |
efried | cool, forthcoming... | 16:04 |
mriedem | gmann and alex_xu can pounce on you | 16:04 |
mriedem | you'll want a bug either way i think | 16:04 |
efried | Sundar: care to open that, since you uncovered this? | 16:05 |
mriedem | https://docs.openstack.org/nova/latest/contributor/microversions.html#when-do-i-need-a-new-microversion will be noted by someone (surely not me just now) | 16:05 |
mriedem | i think it probably falls into this category a bit "Fixing a bug so that a 400+ code is returned rather than a 500 or 503 does not require a microversion change. It’s assumed that clients are not expected to handle a 500 or 503 response and therefore should not need to opt-in to microversion changes that fixes a 500 or 503 response from happening." | 16:06 |
mriedem | 207 is returned rather than 404 - and this is likely the only api we have that returns 207 | 16:06 |
mriedem | this probably hasn't come up before either since we don't have events coming in on instances that don't have a host (i don't think anyway - maybe shelve offload?) | 16:07 |
openstackgerrit | Eric Fried proposed openstack/nova master: WIP: Nix os-server-external-events 404 condition https://review.opendev.org/698037 | 16:09 |
dansmith | well, in those cases, they're likely firing and ignoring the status or getting a different error code than they're really expecting, | 16:09 |
efried | Ima let zuul tell me which tests to fix --^ | 16:09 |
dansmith | so it doesn't seem likely to affect anyone | 16:09 |
sean-k-mooney | if you did a neutron port update on a shelve offloaded instance it would trigger a network-changed event, so ya that might be one case | 16:10 |
openstackgerrit | Stephen Finucane proposed openstack/os-traits master: Add COMPUTE_SAME_HOST_COLD_MIGRATE trait https://review.opendev.org/666604 | 16:10 |
Sundar | efried: I will get back in ~30min - 1 hour, since I have some personal matters to attend now. Sorry. | 16:11 |
mriedem | yeah 404 isn't really appropriate - that's if we didn't find any instances, but clearly we can but they don't have hosts | 16:11 |
*** mkrai has joined #openstack-nova | 16:12 | |
efried | mriedem: as noted above, we could arguably keep 404 for "none of the instances were found" | 16:12 |
mriedem | yeah that's fine | 16:12 |
efried | but even that's kind of weird. | 16:12 |
efried | would we then do an overall 422 if none of the instances were mapped to hosts? | 16:12 |
sean-k-mooney | I'm not sure if we have added rebuild from cell0; I think that was planned for Stein, but it would also have no host set | 16:12 |
mriedem | sean-k-mooney: we haven't | 16:12 |
sean-k-mooney | ok so ya then shelve offloaded is likely the only case then prior to cyborg integration | 16:15 |
openstackgerrit | Mykola Yakovliev proposed openstack/nova master: Fix boot_roles in InstanceSystemMetadata https://review.opendev.org/698040 | 16:18 |
*** mmethot has quit IRC | 16:22 | |
*** mmethot has joined #openstack-nova | 16:23 | |
gmann | mriedem: dansmith efried the 404 case (the else part of `if accepted_events`) can be due to a different reason for each event. For example, some events may be 400, some 404 and some 422. Maybe the overall status in that case can be 400, with an error message like "none of the events were accepted, check the individual event failure reasons"? | 16:27 |
efried | gmann: Thought about that too | 16:27 |
efried | The definition of 207 allows for total failure, so that seemed simpler | 16:27 |
dansmith | 207 means "check the individual statuses" which can always be correct | 16:27 |
dansmith | yeah | 16:27 |
efried | gmann: thoughts? | 16:28 |
*** links has quit IRC | 16:28 | |
gmann | yeah, by the RFC definition it can be an all-failure case: "The response MAY be used in success, partial success and also in failure situations." - https://tools.ietf.org/html/rfc4918#section-13 | 16:31 |
mriedem | how would you return a 400 with a json response body with the individual failure reasons? | 16:32 |
gmann | considering that as success code (2xx) is not correct usage. I agree to change 404->207. | 16:32 |
mriedem | that's what the 207 is for as dan noted | 16:32 |
gmann | yeah | 16:32 |
efried | cool | 16:32 |
gmann | it needs to always build the response body | 16:32 |
*** damien_r has quit IRC | 16:35 | |
*** maciejjozefczyk has quit IRC | 16:37 | |
*** ociuhandu has joined #openstack-nova | 16:41 | |
stephenfin | mdbooth: comments on https://review.opendev.org/#/c/631294/ if you're looking for something to do | 16:42 |
*** awalende has joined #openstack-nova | 16:44 | |
*** rpittau is now known as rpittau|afk | 16:46 | |
*** awalende has quit IRC | 16:49 | |
artom | stephenfin, if you're reviewing stuff - https://review.opendev.org/#/c/672595/ is still a thing. It's become like this poisoned thing that no one wants to touch :( | 16:51 |
artom | Not sure how I can help get it out of that status | 16:52 |
stephenfin | I'll take a look | 16:53 |
artom | stephenfin, it's the NUMA func test patch, if you hadn't guessed | 16:54 |
stephenfin | I'd guessed :) | 16:54 |
Sundar | efried: "care to open that, since you uncovered this?" -- sure, I'll open a bug. | 16:54 |
Sundar | Good to see the discussion. Is it the final conclusion that 'No host found for instance' for one instance is a 207, and rest are 404? If we hypothetically have N instances, with 'No host found' for each of them, will it still be 207? | 16:56 |
*** ociuhandu has quit IRC | 16:58 | |
efried | Sundar: the result code will always be either 200 (all ok) or 207 (some/all not ok) | 17:02 |
Sundar | Never mind: the description of https://review.opendev.org/#/c/698037/ says it is always 207, and 404 is never returned. | 17:02 |
Sundar | efried: Ok | 17:03 |
*** priteau has joined #openstack-nova | 17:05 | |
*** mlavalle has joined #openstack-nova | 17:12 | |
efried | kashyap or sean-k-mooney: would one of you mind giving the SEV fix a quick sanity check please? https://review.opendev.org/#/c/693072/9 | 17:15 |
*** gyee has joined #openstack-nova | 17:16 | |
kashyap | efried: Hi | 17:21 |
efried | o/ | 17:21 |
kashyap | I'm just about to head out, as I have a table-tennis "competition"; so I'm trying to switch from the sit-like-a-vegetable-in-front-of-the-screen to get-some-oxygen-going mode run :D | 17:22 |
* kashyap clicks | 17:22 | |
efried | The code looks fine to me, but I don't really know what a virtio-serial is, or whether I should have one. | 17:22 |
sean-k-mooney | ill take a look at it | 17:23 |
efried | kashyap: thanks for responding to my ping. I don't want to send this one over the net and find out later I've dropped the ball. | 17:23 |
kashyap | efried: Yes, I just read the commit message, it makes sense | 17:23 |
sean-k-mooney | ... such a bad joke | 17:23 |
sean-k-mooney | i love it :) | 17:23 |
Sundar | Bug report for the HTTP 207/404 thingy in server external events: https://bugs.launchpad.net/nova/+bug/1855752 | 17:24 |
openstack | Launchpad bug 1855752 in OpenStack Compute (nova) "Inappropriate HTTP error status from os-server-external-events" [Undecided,New] | 17:24 |
openstackgerrit | Eric Fried proposed openstack/nova master: Nix os-server-external-events 404 condition https://review.opendev.org/698037 | 17:24 |
kashyap | efried: BTW, 'virtio-serial' is a not-so-stellar-but-works-alright thing that exposes a character dev for simple I/O between host and guest. | 17:24 |
openstackgerrit | Merged openstack/nova stable/stein: Do not update root_device_name during guest config https://review.opendev.org/696351 | 17:25 |
kashyap | efried: Nah, you're being your diligent self | 17:25 |
efried | gmann, dansmith, mriedem, Sundar: I think that's ready now --^ | 17:25 |
efried | kashyap: I was making table tennis jokes | 17:25 |
kashyap | efried: I'm being a dense potatoe | 17:25 |
kashyap | sean-k-mooney: Ah, I already see you made the correction of generic "virtio controller" --> "virtio serial controller" | 17:26 |
sean-k-mooney | yes since there are lots of virt controllers :) | 17:27 |
* kashyap going to disappear starting Wed until end-of-year | 17:28 | |
kashyap | s/going/is going/ | 17:28 |
*** damien_r has joined #openstack-nova | 17:28 | |
*** pcaruana has quit IRC | 17:31 | |
kashyap | efried: Yes, the reason for the change makes sense to me, FWIW. And I know the "IOMMU" driver is a requirement for SEV. (From its Nova spec) | 17:32 |
* kashyap --> AFK, back later | 17:32 | |
kashyap | (I'll let sean-k-mooney take a deeper look, as he was also looking at it before.) | 17:32 |
*** dtantsur is now known as dtantsur|afk | 17:33 | |
sean-k-mooney | i have reviewed the bottom two patches, almost done with the final one | 17:33 |
openstackgerrit | Merged openstack/os-traits master: Add COMPUTE_SAME_HOST_COLD_MIGRATE trait https://review.opendev.org/666604 | 17:34 |
*** salmankhan has quit IRC | 17:35 | |
*** mmethot has quit IRC | 17:35 | |
*** damien_r has quit IRC | 17:36 | |
stephenfin | artom: loads of comments left on that | 17:37 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add confirm_snapshot_based_resize conductor RPC method https://review.opendev.org/637075 | 17:38 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Confirm cross-cell resize from the API https://review.opendev.org/637316 | 17:38 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add revert_snapshot_based_resize_at_dest compute method https://review.opendev.org/637630 | 17:38 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Deal with cross-cell resize in _remove_deleted_instances_allocations https://review.opendev.org/639453 | 17:38 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add finish_revert_snapshot_based_resize_at_source compute method https://review.opendev.org/637647 | 17:38 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add RevertResizeTask https://review.opendev.org/638046 | 17:38 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Flesh out RevertResizeTask.rollback https://review.opendev.org/695334 | 17:38 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add revert_snapshot_based_resize conductor RPC method https://review.opendev.org/638047 | 17:38 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Revert cross-cell resize from the API https://review.opendev.org/638048 | 17:38 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add functional cross-cell revert test with detached volume https://review.opendev.org/695335 | 17:38 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Confirm cross-cell resize while deleting a server https://review.opendev.org/638268 | 17:38 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add archive_deleted_rows wrinkle to cross-cell functional test https://review.opendev.org/651650 | 17:38 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add CrossCellWeigher https://review.opendev.org/614353 | 17:38 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add test_resize_cross_cell_weigher_filtered_to_target_cell_by_spec https://review.opendev.org/695336 | 17:38 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add functional test for anti-affinity cross-cell migration https://review.opendev.org/661859 | 17:38 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Support cross-cell moves in external_instance_event https://review.opendev.org/658478 | 17:38 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: libvirt: flatten rbd image during cross-cell move spawn at dest https://review.opendev.org/691991 | 17:38 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add cross-cell resize policy rule and enable in API https://review.opendev.org/638269 | 17:38 |
mriedem | oh gdi that wasn't meant to be a full rebase | 17:38 |
mriedem | but now that it is, i might as well do a full rebase on master | 17:38 |
sean-k-mooney | efried: +1 on the qemu guest agent series | 17:40 |
efried | thanks sean-k-mooney. I'll send it to the gate. | 17:40 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add ConfirmResizeTask https://review.opendev.org/637070 | 17:41 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Follow up to I5b9d41ef34385689d8da9b3962a1eac759eddf6a https://review.opendev.org/698028 | 17:41 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add confirm_snapshot_based_resize conductor RPC method https://review.opendev.org/637075 | 17:41 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Confirm cross-cell resize from the API https://review.opendev.org/637316 | 17:41 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add revert_snapshot_based_resize_at_dest compute method https://review.opendev.org/637630 | 17:41 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Deal with cross-cell resize in _remove_deleted_instances_allocations https://review.opendev.org/639453 | 17:41 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add finish_revert_snapshot_based_resize_at_source compute method https://review.opendev.org/637647 | 17:41 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add RevertResizeTask https://review.opendev.org/638046 | 17:41 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Flesh out RevertResizeTask.rollback https://review.opendev.org/695334 | 17:41 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add revert_snapshot_based_resize conductor RPC method https://review.opendev.org/638047 | 17:41 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Revert cross-cell resize from the API https://review.opendev.org/638048 | 17:41 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add functional cross-cell revert test with detached volume https://review.opendev.org/695335 | 17:41 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Confirm cross-cell resize while deleting a server https://review.opendev.org/638268 | 17:41 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add archive_deleted_rows wrinkle to cross-cell functional test https://review.opendev.org/651650 | 17:41 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add CrossCellWeigher https://review.opendev.org/614353 | 17:41 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add test_resize_cross_cell_weigher_filtered_to_target_cell_by_spec https://review.opendev.org/695336 | 17:41 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add functional test for anti-affinity cross-cell migration https://review.opendev.org/661859 | 17:41 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Support cross-cell moves in external_instance_event https://review.opendev.org/658478 | 17:41 |
artom | stephenfin, thanks! | 17:46 |
openstackgerrit | Vladyslav Drok proposed openstack/nova master: Minor improvements to cell commands https://review.opendev.org/698053 | 17:47 |
*** igordc has joined #openstack-nova | 17:49 | |
openstackgerrit | Eric Fried proposed openstack/nova master: Nix os-server-external-events 404 condition https://review.opendev.org/698037 | 17:52 |
efried | gmann, mriedem: done ^ | 17:53 |
*** priteau has quit IRC | 17:54 | |
*** davidsha has quit IRC | 17:55 | |
*** priteau has joined #openstack-nova | 17:55 | |
*** priteau has quit IRC | 17:58 | |
*** mkrai has quit IRC | 18:00 | |
*** derekh has quit IRC | 18:01 | |
efried | stephenfin: are you working your way up to https://review.opendev.org/#/c/696992/ eventually? (aka: "would you please, and thank you?") | 18:05 |
openstackgerrit | sean mooney proposed openstack/nova master: Disable NUMATopologyFilter on rebuild https://review.opendev.org/689861 | 18:09 |
openstackgerrit | sean mooney proposed openstack/nova master: support pci numa affinity policies in flavor and image https://review.opendev.org/674072 | 18:09 |
sean-k-mooney | efried: do you have time this week to review https://review.opendev.org/#/c/687957/11 and the two patches that follow. i would like to get those merged before i go on PTO | 18:10 |
efried | is that the numa affinity thing? | 18:10 |
sean-k-mooney | the first two are rebuild of numa instances and the last is the numa affinity policy thing | 18:11 |
sean-k-mooney | so those 3 patch fix 2 bugs and close two blueprints/specs | 18:12 |
openstackgerrit | Merged openstack/nova master: Fup for I63c1109dcdb9132cdbc41010654c5fdb31a4fe31 https://review.opendev.org/697678 | 18:12 |
sean-k-mooney | efried: also i think you just missed stephenfin. he was just heading home for the day | 18:13 |
openstackgerrit | Eric Fried proposed openstack/nova master: FUP: Remove noqa and tone down an exception https://review.opendev.org/698054 | 18:13 |
efried | ack. | 18:14 |
efried | I'll look at those now sean-k-mooney... | 18:14 |
*** maciejjozefczyk has joined #openstack-nova | 18:19 | |
*** lbragstad has quit IRC | 18:19 | |
*** pcaruana has joined #openstack-nova | 18:23 | |
*** jbernard has quit IRC | 18:24 | |
openstackgerrit | Mykola Yakovliev proposed openstack/nova master: Fix boot_roles in InstanceSystemMetadata https://review.opendev.org/698040 | 18:24 |
mnaser | has anyone seen any race conditions in starting up instances in stein? :X | 18:25 |
*** jbernard has joined #openstack-nova | 18:26 | |
openstackgerrit | Mykola Yakovliev proposed openstack/nova master: Fix boot_roles in InstanceSystemMetadata https://review.opendev.org/698040 | 18:26 |
mnaser | https://www.irccloud.com/pastebin/8hYnM4Lx/ | 18:27 |
mnaser | i mean this is a grep on the instance uuid but the instance destroyed successfully twice 15 seconds apart? | 18:27 |
*** maciejjozefczyk has quit IRC | 18:31 | |
*** ralonsoh has quit IRC | 18:32 | |
sean-k-mooney | it could be an interaction between the periodic task and the actual instance action that tried to shut down the instance | 18:36 |
*** igordc has quit IRC | 18:37 | |
sean-k-mooney | mnaser: but no i have not seen that specifically | 18:37 |
mnaser | sean-k-mooney: well instance destroyed successfully is called by `_wait_for_destroy` | 18:37 |
mnaser | https://github.com/openstack/nova/blob/stable/stein/nova/virt/libvirt/driver.py#L1023-L1025 | 18:38 |
mnaser | which has the timer here https://github.com/openstack/nova/blob/stable/stein/nova/virt/libvirt/driver.py#L1038-L1039 | 18:38 |
mnaser | which is `_destroy()` | 18:38 |
mnaser | hmm, we dont have a 'powering-off' state in nova do we | 18:39 |
mnaser | we do | 18:40 |
*** martinkennelly has quit IRC | 18:40 | |
mnaser | task state gets updated inside nova.compute.api -- does that mean that technically there could be 2 concurrent api requests? | 18:41 |
*** lbragstad has joined #openstack-nova | 18:41 | |
sean-k-mooney | you can have two concurrent request to destroy yes at the api level | 18:42 |
mnaser | so i guess in that case if two api requests came at the same time, we can end up with two requests in the compute level | 18:42 |
sean-k-mooney | but the window would be quite short | 18:42 |
sean-k-mooney | yes | 18:42 |
sean-k-mooney | although we normally do a check of the task state whenever we update it, so one would fail | 18:43 |
*** avolkov has quit IRC | 18:45 | |
mordred | mriedem, efried: sdk release was cut including your ironic fix | 18:49 |
efried | ack, thx | 18:51 |
mnaser | sean-k-mooney: _record_action_start is called, ill check logs | 18:51 |
efried | sean-k-mooney: theoretically... | 18:54 |
efried | if a rebuild is done with the same image... | 18:54 |
efried | couldn't I have changed the metadata on that image since my instance was originally booted? | 18:54 |
mnaser | sean-k-mooney: sigh, i wonder if we call _destroy before starting up an instance which is why it says "Instance destroyed successfully." | 18:57 |
mnaser | sean-k-mooney: yes thats exactly what happens, we call destroy for power on because we call hard reboot | 18:58 |
*** tbachman has joined #openstack-nova | 19:00 | |
dustinc | dansmith: I was working on docs/etc for [1] and was thinking that it might be a good idea to have a master on/off switch for the feature with a default of off...the reason I was thinking that is because as it is specced right now, the directory is the only config option and a cautious operator would probably want to make sure the default directory is present with permissions set..it would maybe be easier to just | 19:05 |
dustinc | disable the entire feature by default instead | 19:05 |
dustinc | [1] https://blueprints.launchpad.net/nova/+spec/provider-config-file | 19:05 |
dustinc | what do you think? | 19:05 |
*** artom has quit IRC | 19:05 | |
dansmith | dustinc: how is a conf knob turned off different from the directory/file(s) being missing? | 19:05 |
*** artom has joined #openstack-nova | 19:06 | |
dustinc | in theory if not present someone else could create the directory and place files | 19:06 |
efried | I was also thinking maybe instead of having the directory conf option default to something sane, have it default to None, and that's the "off" switch. You have to turn it on explicitly by setting it to something. | 19:07 |
dansmith | dustinc: in /etc/nova? not anyone without sufficient privileges | 19:07 |
dustinc | dansmith: if that's sufficient then maybe leave it as it is then.. | 19:09 |
*** ociuhandu has joined #openstack-nova | 19:10 | |
dustinc | thanks | 19:11 |
mnaser | sean-k-mooney: off the top of your head? i could swear there was a nova option which skipped waiting for port plugs when starting up an instance | 19:14 |
*** igordc has joined #openstack-nova | 19:14 | |
mnaser | am i imagining things | 19:14 |
*** ociuhandu has quit IRC | 19:15 | |
dansmith | mnaser: vif_plugging_timeout=0 and vif_plugging_fatal=False | 19:15 |
*** ociuhandu has joined #openstack-nova | 19:15 | |
dansmith | mnaser: but don't do it except for debugging | 19:15 |
mnaser | dansmith: ah yes, ok that makes sense | 19:16 |
mnaser | dansmith: yep, i don't want to either, we're just trying to figure out why rdocloud can bring instances up so quickly (yet this new cloud is taking so long the tripleo-ci bits are timing out) | 19:16 |
*** ociuhandu has quit IRC | 19:16 | |
mnaser | so given its taking ~1-2 minutes to turn up an instance, because a lot of it is unplugging (in cleanup) and plugging (in startup) | 19:16 |
dustinc | dansmith: dont_be_slow=true | 19:17 |
*** gmann is now known as gmann_afk | 19:17 | |
openstackgerrit | Alexandre arents proposed openstack/nova stable/rocky: Do not update root_device_name during guest config https://review.opendev.org/696353 | 19:36 |
*** tesseract has quit IRC | 19:37 | |
*** awalende has joined #openstack-nova | 19:39 | |
*** martinkennelly has joined #openstack-nova | 19:41 | |
*** awalende has quit IRC | 19:43 | |
*** tbachman has quit IRC | 19:48 | |
*** eharney has quit IRC | 20:04 | |
*** abaindur has joined #openstack-nova | 20:05 | |
sean-k-mooney | mnaser: i see dansmith already answered your question. are the new ci and rdocloud both using the same network backend? | 20:21 |
mnaser | sean-k-mooney: afaik that runs ovs and so do we, dont have too much details except ocata vs stein | 20:22 |
sean-k-mooney | i think rdocloud is running ml2/ovs too | 20:22 |
sean-k-mooney | efried: if a rebuild is done with the same image we do not go to the scheduler and use the cached copy of the image metadata | 20:23 |
sean-k-mooney | efried: we only go to the scheduler if the image changes | 20:24 |
efried | interesting | 20:34 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Flesh out docs for cross-cell resize/cold migrate https://review.opendev.org/696212 | 20:38 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add sequence diagrams for cross-cell-resize https://review.opendev.org/698051 | 20:38 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Simplify FinishResizeAtDestTask event handling https://review.opendev.org/695337 | 20:38 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Implement cleanup_instance_network_on_host for neutron API https://review.opendev.org/697162 | 20:38 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: WIP: Add negative test to delete server during cross-cell resize claim https://review.opendev.org/688832 | 20:38 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: WIP: Implement reschedule logic for cross-cell resize/migrate https://review.opendev.org/696213 | 20:38 |
sean-k-mooney | resource usage is not meant to change on a rebuild, so since the flavor does not change and the image does not change you don't need to check. | 20:38 |
mriedem | except for silly ol numa | 20:40 |
sean-k-mooney | yep | 20:40 |
mriedem | which is why eric finds himself in this situation | 20:40 |
*** mdbooth has quit IRC | 20:40 | |
sean-k-mooney | well even in the numa case if you don't change the image then you're fine, but ya people want to do that for some reason :) | 20:41 |
mordred | changing the image doesn't seem like a rebuild to me - it seems like creating a new server | 20:41 |
mordred | but - you know - what do I know? | 20:41 |
*** mdbooth has joined #openstack-nova | 20:42 | |
sean-k-mooney | :) well we are stuck with the terms we have | 20:42 |
sean-k-mooney | at least it's less confusing than evacuate | 20:43 |
*** ociuhandu has joined #openstack-nova | 20:43 | |
*** ociuhandu has quit IRC | 20:48 | |
*** abaindur has quit IRC | 20:50 | |
*** abaindur has joined #openstack-nova | 20:50 | |
*** damien_r has joined #openstack-nova | 20:53 | |
*** eharney has joined #openstack-nova | 21:01 | |
*** maciejjozefczyk has joined #openstack-nova | 21:17 | |
*** ociuhandu has joined #openstack-nova | 21:21 | |
*** maciejjozefczyk has quit IRC | 21:23 | |
*** ociuhandu has quit IRC | 21:26 | |
*** pcaruana has quit IRC | 21:31 | |
*** tbachman has joined #openstack-nova | 21:33 | |
openstackgerrit | Mykola Yakovliev proposed openstack/nova master: Validate aggregate IDs before querying database https://review.opendev.org/698094 | 21:36 |
openstackgerrit | Merged openstack/nova master: Handle ServiceNotFound in DbDriver._report_state https://review.opendev.org/697301 | 21:38 |
*** nicolasbock has joined #openstack-nova | 21:41 | |
*** gmann_afk is now known as gmann | 21:48 | |
*** brault has quit IRC | 21:59 | |
*** ociuhandu has joined #openstack-nova | 22:04 | |
*** slaweq has quit IRC | 22:05 | |
openstackgerrit | Merged openstack/nova master: VMware: disk_io_limits settings are not reflected when resize https://review.opendev.org/680296 | 22:08 |
*** damien_r has quit IRC | 22:09 | |
*** ociuhandu has quit IRC | 22:10 | |
*** damien_r has joined #openstack-nova | 22:10 | |
*** cgoncalves has quit IRC | 22:16 | |
*** rcernin has joined #openstack-nova | 22:16 | |
openstackgerrit | Merged openstack/nova stable/stein: Add functional recreate test for bug 1829479 and bug 1817833 https://review.opendev.org/695932 | 22:21 |
openstack | bug 1829479 in OpenStack Compute (nova) "The allocation table has residual records when instance is evacuated and the source physical node is removed" [Medium,In progress] https://launchpad.net/bugs/1829479 - Assigned to Matt Riedemann (mriedem) | 22:21 |
openstack | bug 1817833 in OpenStack Compute (nova) "Check compute_id existence when nova-compute reports info to placement" [Medium,In progress] https://launchpad.net/bugs/1817833 - Assigned to Matt Riedemann (mriedem) | 22:21 |
openstackgerrit | Merged openstack/nova stable/stein: Add functional recreate test for bug 1852610 https://review.opendev.org/695935 | 22:21 |
openstack | bug 1852610 in OpenStack Compute (nova) stein "API allows source compute service/node deletion while instances are pending a resize confirm/revert" [Undecided,In progress] https://launchpad.net/bugs/1852610 - Assigned to Matt Riedemann (mriedem) | 22:21 |
openstackgerrit | Merged openstack/nova stable/stein: Add functional recreate revert resize test for bug 1852610 https://review.opendev.org/695938 | 22:21 |
*** cgoncalves has joined #openstack-nova | 22:23 | |
*** cgoncalves has quit IRC | 22:26 | |
*** cgoncalves has joined #openstack-nova | 22:27 | |
*** cgoncalves has quit IRC | 22:27 | |
*** cgoncalves has joined #openstack-nova | 22:28 | |
*** nweinber has quit IRC | 22:32 | |
*** rcernin has quit IRC | 22:36 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: WIP: Add NodeOwnerFilter https://review.opendev.org/697331 | 22:59 |
efried | sean-k-mooney: (I hope you don't see this until tomorrow) Reviewed the series, have a question on the affinity patch: why do we need the notification payload change? | 23:00 |
efried | and with that, I'm outta here o/ | 23:02 |
openstackgerrit | Merged openstack/nova stable/stein: Block deleting compute services with in-progress migrations https://review.opendev.org/695940 | 23:12 |
*** martinkennelly has quit IRC | 23:12 | |
*** tkajinam has joined #openstack-nova | 23:12 | |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/rocky: Add functional recreate test for bug 1829479 and bug 1817833 https://review.opendev.org/698106 | 23:13 |
openstack | bug 1829479 in OpenStack Compute (nova) "The allocation table has residual records when instance is evacuated and the source physical node is removed" [Medium,In progress] https://launchpad.net/bugs/1829479 - Assigned to Matt Riedemann (mriedem) | 23:13 |
openstack | bug 1817833 in OpenStack Compute (nova) "Check compute_id existence when nova-compute reports info to placement" [Medium,In progress] https://launchpad.net/bugs/1817833 - Assigned to Matt Riedemann (mriedem) | 23:13 |
*** ociuhandu has joined #openstack-nova | 23:15 | |
*** rcernin has joined #openstack-nova | 23:17 | |
*** ociuhandu has quit IRC | 23:20 | |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/rocky: Add functional recreate test for bug 1852610 https://review.opendev.org/698108 | 23:21 |
openstack | bug 1852610 in OpenStack Compute (nova) rocky "API allows source compute service/node deletion while instances are pending a resize confirm/revert" [Undecided,New] https://launchpad.net/bugs/1852610 | 23:21 |
*** mlavalle has quit IRC | 23:22 | |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/rocky: Add functional recreate test for bug 1852610 https://review.opendev.org/698108 | 23:22 |
openstack | bug 1852610 in OpenStack Compute (nova) rocky "API allows source compute service/node deletion while instances are pending a resize confirm/revert" [Undecided,New] https://launchpad.net/bugs/1852610 | 23:22 |
*** ociuhandu has joined #openstack-nova | 23:24 | |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/rocky: Add functional recreate revert resize test for bug 1852610 https://review.opendev.org/698110 | 23:31 |
openstack | bug 1852610 in OpenStack Compute (nova) rocky "API allows source compute service/node deletion while instances are pending a resize confirm/revert" [Undecided,New] https://launchpad.net/bugs/1852610 | 23:31 |
*** ociuhandu has quit IRC | 23:33 | |
*** ociuhandu has joined #openstack-nova | 23:34 | |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/rocky: Block deleting compute services with in-progress migrations https://review.opendev.org/698113 | 23:35 |
*** mriedem has quit IRC | 23:37 | |
*** ociuhandu has quit IRC | 23:39 | |
*** ociuhandu has joined #openstack-nova | 23:44 | |
openstackgerrit | Merged openstack/nova master: Block rebuild when NUMA topology changed https://review.opendev.org/687957 | 23:49 |
*** ociuhandu has quit IRC | 23:49 | |
*** artom has quit IRC | 23:51 | |
*** ociuhandu has joined #openstack-nova | 23:54 | |
*** tosky has quit IRC | 23:56 |
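
Editorial sketch, not part of the channel discussion: the rule efried states at 17:02 ("the result code will always be either 200 (all ok) or 207 (some/all not ok)") could be expressed roughly as below. The function name `overall_status` and the idea of passing a plain list of per-event codes are hypothetical and chosen only for illustration; this is not the nova implementation, and the sketch assumes the request carried at least one event.

```python
from http import HTTPStatus


def overall_status(event_codes):
    """Pick the overall response code from per-event status codes.

    Hypothetical helper: returns 200 only when every event was accepted
    with 200; any other mix, including total failure, yields 207 so the
    caller checks the individual event statuses in the response body.
    """
    if all(code == HTTPStatus.OK for code in event_codes):
        return HTTPStatus.OK
    return HTTPStatus.MULTI_STATUS


# Mixed and all-failure results both map to 207, as discussed above.
print(int(overall_status([200, 200])))
print(int(overall_status([200, 422])))
print(int(overall_status([404, 422])))
```

Returning 207 even for total failure follows the RFC 4918 wording gmann quotes at 16:31, which is why no separate 404 or 400 branch appears in the sketch.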