artom | Wait, Blizzard genuinely run OpenStack? | 00:00 |
---|---|---|
mriedem | artom: yes | 00:00 |
artom | Explains the WoW server queues | 00:00 |
artom | </joke from 15 years ago> | 00:00 |
mriedem | https://www.openstack.org/summit/denver-2019/summit-schedule/events/23379/how-blizzard-entertainment-uses-autoscaling-with-overwatch | 00:00 |
mriedem | eandersson will cut you | 00:00 |
artom | I probably deserve that. | 00:01 |
*** gyee has quit IRC | 00:01 | |
artom | Oh, Designate. | 00:03 |
artom | My intro to OpenStack was on that, as an intern at eNovance | 00:03 |
mriedem | it's great that when post live migration fails we just, you know, act like it was good http://logs.openstack.org/64/649464/3/check/nova-live-migration/a0fdcc9/logs/screen-n-cpu.txt.gz?level=TRACE#_May_09_23_00_39_272825 | 00:03 |
mriedem | https://review.opendev.org/#/c/649464/3/nova/compute/manager.py@6889 | 00:04 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: DNM: Test theory behind bug 1822884 https://review.opendev.org/649464 | 00:04 |
openstack | bug 1822884 in OpenStack Compute (nova) "live migration fails due to port binding duplicate key entry in post_live_migrate" [Medium,In progress] https://launchpad.net/bugs/1822884 - Assigned to sean mooney (sean-k-mooney) | 00:04 |
artom | *snerk* my goat is still in there: https://github.com/openstack/designate/blob/master/designate/tests/resources/zonefiles/malformed_example.com.zone | 00:04 |
mriedem | i was supposed to have started making dinner and hour ago so i'm out of here | 00:05 |
mriedem | o/ | 00:05 |
mriedem | *an | 00:06 |
*** mriedem has quit IRC | 00:06 | |
artom | Good man | 00:06 |
*** slaweq has joined #openstack-nova | 00:11 | |
*** slaweq has quit IRC | 00:15 | |
*** hongbin has quit IRC | 00:21 | |
*** sapd1_x has quit IRC | 00:24 | |
*** lbragstad has joined #openstack-nova | 00:26 | |
*** ricolin has joined #openstack-nova | 00:43 | |
*** igordc has quit IRC | 00:51 | |
*** tbachman has joined #openstack-nova | 01:02 | |
*** tssurya has quit IRC | 01:03 | |
*** cdent has quit IRC | 01:16 | |
*** bhagyashris_ has joined #openstack-nova | 01:16 | |
*** whoami-rajat has joined #openstack-nova | 01:18 | |
openstackgerrit | Merged openstack/nova stable/queens: Use migration_status during volume migrating and retyping https://review.opendev.org/657579 | 01:22 |
*** _hemna has joined #openstack-nova | 01:24 | |
*** slaweq has joined #openstack-nova | 01:42 | |
*** slaweq has quit IRC | 01:46 | |
bhagyashris_ | Artom: Hi | 01:49 |
bhagyashris_ | artom: Hi | 01:49 |
*** _hemna has quit IRC | 01:58 | |
*** ttsiouts has joined #openstack-nova | 01:59 | |
*** dasp has joined #openstack-nova | 02:11 | |
*** lbragstad has quit IRC | 02:12 | |
*** ttsiouts has quit IRC | 02:33 | |
openstackgerrit | Merged openstack/nova master: Remove macs kwarg from allocate_for_instance https://review.opendev.org/652749 | 02:37 |
*** brinzhang has joined #openstack-nova | 02:37 | |
*** ileixe has quit IRC | 02:48 | |
*** udesale has joined #openstack-nova | 03:03 | |
*** wwriverrat has quit IRC | 03:09 | |
*** slaweq has joined #openstack-nova | 03:11 | |
*** slaweq has quit IRC | 03:16 | |
*** jobewan has joined #openstack-nova | 03:21 | |
*** JamesBenson has joined #openstack-nova | 03:25 | |
*** ileixe has joined #openstack-nova | 03:32 | |
*** _hemna has joined #openstack-nova | 03:54 | |
*** takashin has quit IRC | 04:08 | |
*** dasp has quit IRC | 04:08 | |
*** slaweq has joined #openstack-nova | 04:11 | |
*** takashin has joined #openstack-nova | 04:15 | |
*** slaweq has quit IRC | 04:16 | |
*** ratailor has joined #openstack-nova | 04:24 | |
*** _hemna has quit IRC | 04:27 | |
*** cdent has joined #openstack-nova | 04:29 | |
*** ttsiouts has joined #openstack-nova | 04:30 | |
*** dasp has joined #openstack-nova | 04:30 | |
*** ileixe has quit IRC | 04:34 | |
*** ileixe has joined #openstack-nova | 04:38 | |
*** tkajinam has quit IRC | 05:01 | |
*** JamesBenson has quit IRC | 05:01 | |
*** ttsiouts has quit IRC | 05:02 | |
*** cdent has quit IRC | 05:06 | |
*** slaweq has joined #openstack-nova | 05:11 | |
*** ivve has quit IRC | 05:14 | |
*** slaweq has quit IRC | 05:15 | |
*** JamesBenson has joined #openstack-nova | 05:33 | |
*** tkajinam has joined #openstack-nova | 05:34 | |
*** JamesBenson has quit IRC | 05:38 | |
*** udesale has quit IRC | 05:44 | |
*** udesale has joined #openstack-nova | 05:45 | |
*** ivve has joined #openstack-nova | 06:14 | |
*** janki has joined #openstack-nova | 06:17 | |
*** Luzi has joined #openstack-nova | 06:19 | |
*** slaweq has joined #openstack-nova | 06:23 | |
*** _hemna has joined #openstack-nova | 06:24 | |
*** dpawlik has joined #openstack-nova | 06:27 | |
*** rpittau|afk is now known as rpittau | 06:41 | |
*** maciejjozefczyk has joined #openstack-nova | 06:48 | |
openstackgerrit | Merged openstack/python-novaclient master: Use SHA256 instead of MD5 in completion cache https://review.opendev.org/658181 | 06:50 |
*** _hemna has quit IRC | 06:58 | |
*** ttsiouts has joined #openstack-nova | 07:00 | |
*** boxiang has quit IRC | 07:07 | |
*** boxiang has joined #openstack-nova | 07:07 | |
*** _hemna has joined #openstack-nova | 07:11 | |
*** _hemna has quit IRC | 07:16 | |
*** brault has joined #openstack-nova | 07:19 | |
*** tesseract has joined #openstack-nova | 07:19 | |
tobberydberg | Definitely mriedem - thanks! | 07:34 |
*** jobewan has quit IRC | 07:44 | |
*** jaosorior has joined #openstack-nova | 07:49 | |
*** mcgigglier has joined #openstack-nova | 07:52 | |
*** ralonsoh has joined #openstack-nova | 07:53 | |
*** udesale has quit IRC | 07:57 | |
*** udesale has joined #openstack-nova | 07:58 | |
*** ttsiouts has quit IRC | 08:03 | |
*** takashin has left #openstack-nova | 08:03 | |
*** udesale has quit IRC | 08:05 | |
*** udesale has joined #openstack-nova | 08:08 | |
*** tkajinam has quit IRC | 08:12 | |
*** udesale has quit IRC | 08:22 | |
*** tssurya has joined #openstack-nova | 08:23 | |
*** ricolin has quit IRC | 08:24 | |
*** sapd1_x has joined #openstack-nova | 08:29 | |
*** sapd1_x has quit IRC | 08:56 | |
*** jchhatbar has joined #openstack-nova | 08:57 | |
*** jchhatbar has quit IRC | 08:58 | |
openstackgerrit | zhufl proposed openstack/nova master: Fix broken url links https://review.opendev.org/658312 | 08:58 |
*** jchhatbar has joined #openstack-nova | 08:58 | |
*** janki has quit IRC | 09:00 | |
openstackgerrit | zhufl proposed openstack/nova master: Fix broken url links https://review.opendev.org/658312 | 09:09 |
*** _hemna has joined #openstack-nova | 09:12 | |
*** sapd1_x has joined #openstack-nova | 09:19 | |
*** udesale has joined #openstack-nova | 09:27 | |
*** tbachman has quit IRC | 09:38 | |
*** tbachman has joined #openstack-nova | 09:40 | |
*** _hemna has quit IRC | 09:46 | |
*** bhagyashris_ has quit IRC | 09:50 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: pull out functions from _heal_allocations_for_instance https://review.opendev.org/655457 | 09:57 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: reorder conditions in _heal_allocations_for_instance https://review.opendev.org/655458 | 09:57 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Prepare _heal_allocations_for_instance for nested allocations https://review.opendev.org/637954 | 09:57 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: pull out put_allocation call from _heal_* https://review.opendev.org/655459 | 09:57 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: nova-manage: heal port allocations https://review.opendev.org/637955 | 09:57 |
*** ttsiouts has joined #openstack-nova | 10:00 | |
*** tbachman has quit IRC | 10:03 | |
*** rtjure has joined #openstack-nova | 10:16 | |
*** ttsiouts has quit IRC | 10:33 | |
*** lpetrut has joined #openstack-nova | 10:35 | |
*** brinzhang has quit IRC | 10:45 | |
*** vdrok has quit IRC | 10:48 | |
*** vdrok has joined #openstack-nova | 10:49 | |
*** Luzi has quit IRC | 11:00 | |
*** rtjure has quit IRC | 11:00 | |
*** panda is now known as panda|lunch | 11:07 | |
*** sapd1_x has quit IRC | 11:09 | |
kashyap | ildikov: Hi, when you're about, do you have a minute to re-read this error message you've added in _set_multiattach_support()? | 11:17 |
kashyap | LOG.debug('Volume multiattach is not supported based on current ' | 11:17 |
kashyap | 'versions of QEMU and libvirt. QEMU must be less than ' | 11:17 |
kashyap | '2.10 or libvirt must be greater than or equal to 3.10.') | 11:17 |
kashyap | I think the "or" there should be "and", isn't it? | 11:17 |
kashyap | Otherwise, it doesn't quite resolve. | 11:17 |
*** ralonsoh has quit IRC | 11:28 | |
ildikov | kashyap: I think the 'or' was intentional | 11:39 |
*** samueldmq has joined #openstack-nova | 11:39 | |
kashyap | ildikov: Hmm, in that case, I've seen someone from HP report a bug with the above error: where they have 2.10.1 QEMU and libvirt as 3.6.0. | 11:41 |
openstackgerrit | Balazs Gibizer proposed openstack/nova-specs master: Server move operations with ports having resource request https://review.opendev.org/652608 | 11:41 |
ildikov | kashyap: yeah, that should lead to an error | 11:41 |
ildikov | kashyap: as neither the QEMU version is lower nor the libvirt version is higher than suggested | 11:42 |
ildikov | kashyap: it's due to how they handle caching or smth | 11:42 |
*** _hemna has joined #openstack-nova | 11:42 | |
ildikov | as we needed to turn that off for volumes being attached to multiple instances | 11:43 |
kashyap | Yeah, I was reading the related code in the source. Let me go recheck | 11:43 |
ildikov | and there were some changes in how that's handled with the flags, etc | 11:43 |
ildikov | in QEMU and libvirt | 11:43 |
kashyap | (Hmm, but that error message is somewhat confusing.) | 11:43 |
kashyap | ildikov: Right, I saw the bugzilla linked which describes the mess in libvirt | 11:44 |
kashyap | ildikov: Oh, bad me -- I mistook 3.6.0 as higher than 3.10.0. Can I be blinder than that... | 11:45 |
ildikov | well, the error message only says that if the QEMU version is low enough then nothing else matters and the same thing for the libvirt one if it's high enough | 11:45 |
ildikov | kashyap: happens to everyone :) | 11:45 |
kashyap | ildikov: Thanks for the clarification. Just someone reading it out loud is just what you need sometimes :-) | 11:49 |
ildikov | kashyap: np, always happy to help :) | 11:50 |
*** redrobot has joined #openstack-nova | 11:55 | |
*** cgoncalves has quit IRC | 11:56 | |
*** panda|lunch is now known as panda | 12:01 | |
*** tbachman has joined #openstack-nova | 12:06 | |
*** tbachman_ has joined #openstack-nova | 12:08 | |
*** tbachman has quit IRC | 12:11 | |
*** tbachman_ is now known as tbachman | 12:11 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Remove unused param from _fill_provider_mapping https://review.opendev.org/655107 | 12:12 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Move _fill_provider_mapping to the scheduler_utils https://review.opendev.org/655108 | 12:12 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: prepare func test env for moving servers with bandwidth https://review.opendev.org/655109 | 12:12 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: allow getting resource request of every bound ports of an instance https://review.opendev.org/655110 | 12:12 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Pass network API to the conducor's MigrationTask https://review.opendev.org/655111 | 12:12 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Add request_spec to server move RPC calls https://review.opendev.org/655721 | 12:12 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: re-calculate provider mapping during migration https://review.opendev.org/655112 | 12:12 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: update allocation in binding profile during migrate https://review.opendev.org/656422 | 12:12 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Extend NeutronFixture to handle migrations https://review.opendev.org/655114 | 12:12 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: func test for migrate server with ports having resource request https://review.opendev.org/655113 | 12:12 |
*** _hemna has quit IRC | 12:17 | |
*** nowster has quit IRC | 12:20 | |
*** cgoncalves has joined #openstack-nova | 12:20 | |
*** nowster has joined #openstack-nova | 12:24 | |
*** ttsiouts has joined #openstack-nova | 12:30 | |
*** mchlumsky has joined #openstack-nova | 12:35 | |
*** lbragstad has joined #openstack-nova | 12:39 | |
*** ratailor has quit IRC | 12:43 | |
artom | Does placement understand AZs? Or I guess a better question would be, if we make the right request to placement (whatever that request may look like), are we guaranteed to get at least some hosts in the correct AZ? | 12:47 |
amodi | artom: yes, they do, https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#availability-zones-with-placement | 12:48 |
artom | francoisp, ^^ | 12:49 |
francoisp | yes thanks amodi, artom | 12:49 |
artom | mdbooth, by the way, are we going to start including francoisp in our triage call? | 12:51 |
artom | Doh, wrong channel, answer downstream plz :) | 12:51 |
*** ttsiouts has quit IRC | 13:03 | |
*** mriedem has joined #openstack-nova | 13:05 | |
openstackgerrit | Merged openstack/nova master: Expose Hyper-V supported image types https://review.opendev.org/655137 | 13:10 |
*** mcgigglier has quit IRC | 13:14 | |
*** mcgiggler has joined #openstack-nova | 13:14 | |
*** tbachman has quit IRC | 13:15 | |
*** tbachman has joined #openstack-nova | 13:19 | |
*** sapd1_x has joined #openstack-nova | 13:27 | |
*** jaypipes has joined #openstack-nova | 13:32 | |
openstackgerrit | Arnaud Morin proposed openstack/nova master: Always Set dhcp_server in network_info https://review.opendev.org/658362 | 13:37 |
openstackgerrit | Merged openstack/nova master: Make libvirt expose supported image types https://review.opendev.org/653454 | 13:46 |
*** shilpasd has quit IRC | 13:49 | |
*** sapd1_x has quit IRC | 13:56 | |
*** mcgiggler has quit IRC | 13:58 | |
*** zbr has joined #openstack-nova | 14:00 | |
*** _hemna has joined #openstack-nova | 14:13 | |
*** bnemec is now known as beekneemech | 14:23 | |
*** jchhatbar has quit IRC | 14:30 | |
*** mlavalle has joined #openstack-nova | 14:33 | |
*** JamesBenson has joined #openstack-nova | 14:36 | |
*** JamesBenson has quit IRC | 14:38 | |
*** JamesBenson has joined #openstack-nova | 14:38 | |
*** dpawlik has quit IRC | 14:40 | |
*** ivve has quit IRC | 14:41 | |
*** lpetrut has quit IRC | 14:43 | |
*** _hemna has quit IRC | 14:46 | |
*** lpetrut has joined #openstack-nova | 14:47 | |
*** jaosorior has quit IRC | 14:49 | |
efried | tssurya: Quick look at https://review.opendev.org/#/c/648662/ if you please, sanity check my comment. | 14:50 |
tssurya | efried: checking | 14:51 |
*** hongbin has joined #openstack-nova | 14:51 | |
tssurya | in a meeting, will answer asap | 14:53 |
efried | tssurya: thanks. If I'm wrong about that wrinkle, I'll happily +2 (will let mriedem approve though) | 14:53 |
mriedem | i think you're right, and likely need to use oneOf | 14:56 |
mriedem | where the options are null with no locked_reason or object with a required locked_reason | 14:57 |
*** lpetrut has quit IRC | 14:57 | |
mriedem | efried: i think it just copied how 2.56 works for cold migrate where you can specify null or a dict with a host | 14:59 |
mriedem | and it looks like that schema has the same issue where you can pass null or {} | 14:59 |
efried | woot | 14:59 |
*** ttsiouts has joined #openstack-nova | 15:00 | |
tssurya | efried, mriedem: yea I basically did what 2.56 did for migrate host | 15:10 |
tssurya | I thought it was more of a feature thing than a bug :) | 15:10 |
tssurya | but maybe you are right, empty dict shouldn't be allowed ? | 15:11 |
*** imacdonn has quit IRC | 15:12 | |
*** imacdonn has joined #openstack-nova | 15:12 | |
efried | tssurya, mriedem: IMO there's no reason to allow it; just increases the test surface for no value. | 15:12 |
efried | And I guess we ought to have a test regardless. | 15:13 |
tssurya | efried: ack, I'll write a unit test like mriedem said and fix this then | 15:13 |
efried | ++ | 15:13 |
efried | thanks tssurya | 15:13 |
tssurya | thanks for the detailed eyeing :) | 15:13 |
efried | handful of nits you can fix up while you're in there if you feel like it | 15:14 |
tssurya | yep | 15:14 |
*** gyee has joined #openstack-nova | 15:17 | |
*** samueldmq has quit IRC | 15:20 | |
openstackgerrit | Dongcan Ye proposed openstack/nova master: Raise BuildAbortException while updating instance task_state conflict https://review.opendev.org/633160 | 15:21 |
*** maciejjozefczyk has quit IRC | 15:25 | |
*** mkrai1 has joined #openstack-nova | 15:27 | |
*** macza has joined #openstack-nova | 15:31 | |
*** ttsiouts has quit IRC | 15:34 | |
*** BjoernT has joined #openstack-nova | 15:40 | |
*** BjoernT has quit IRC | 15:43 | |
*** rpittau is now known as rpittau|afk | 15:45 | |
*** ivve has joined #openstack-nova | 15:47 | |
*** jangutter has quit IRC | 15:51 | |
*** brault has quit IRC | 15:52 | |
*** jobewan has joined #openstack-nova | 15:53 | |
*** tbachman has quit IRC | 15:55 | |
*** liuyulong has joined #openstack-nova | 16:03 | |
*** wwriverrat has joined #openstack-nova | 16:05 | |
*** lpetrut has joined #openstack-nova | 16:06 | |
*** xek has joined #openstack-nova | 16:10 | |
*** cdent has joined #openstack-nova | 16:11 | |
*** udesale has quit IRC | 16:12 | |
*** udesale has joined #openstack-nova | 16:12 | |
*** brault has joined #openstack-nova | 16:15 | |
*** brault has quit IRC | 16:20 | |
*** efried is now known as fried_rolls | 16:28 | |
*** _hemna has joined #openstack-nova | 16:43 | |
mriedem | dansmith: seems our external event routing isn't working across multiple cells, the api figures out the instance is migrating and gets the proper hosts but for whatever reason only the source host gets the event...anyway, that's the current reason why the multi-cell resize stuff is failing in the gate, will have to dig a bit after lunch. | 16:44 |
mriedem | maybe i can assert that somehow in my functional test, not sure how though | 16:44 |
dansmith | hmm, maybe because of the batching to the compute api that it does | 16:46 |
dansmith | it tries to collate multiple events per host where possible | 16:46 |
*** udesale has quit IRC | 16:49 | |
tssurya | efried, mriedem: https://review.opendev.org/#/c/648662/11/nova/api/openstack/compute/schemas/lock_server.py I am not sure if it's a good idea anymore because a lot of the actions don't have schema validation at all, that means they allow empty dicts and what not. | 16:53 |
tssurya | let me know what you guys thing when you have the time | 16:54 |
tssurya | think* | 16:55 |
*** whoami-rajat has quit IRC | 16:58 | |
*** _hemna has quit IRC | 17:16 | |
*** lpetrut has quit IRC | 17:17 | |
*** tssurya has quit IRC | 17:37 | |
*** mdbooth_ has joined #openstack-nova | 17:41 | |
*** mdbooth has quit IRC | 17:44 | |
*** boxiang has quit IRC | 17:44 | |
*** boxiang has joined #openstack-nova | 17:45 | |
*** psyton has joined #openstack-nova | 17:49 | |
*** Swami has joined #openstack-nova | 17:59 | |
mriedem | i guess i can't really do the same thing in the functional test because of the fake driver and neutron fixture | 18:02 |
mriedem | fried_rolls: replied to surya's comment but haven't gone through the rest of that change yet since the last time - if you are +2 please withhold the +W so i can take a look through it again | 18:04 |
*** whoami-rajat has joined #openstack-nova | 18:09 | |
*** hamzy_ has quit IRC | 18:17 | |
mriedem | dansmith: ah yes, | 18:18 |
mriedem | if host not in cell_contexts_by_host: | 18:18 |
mriedem | cell_contexts_by_host[host] = instance._context | 18:18 |
mriedem | that assumes all of the hosts are in the same cell | 18:19 |
dansmith | you mean, it assumes that for a given instance, the src/dst host will be in the same cell | 18:19 |
dansmith | yeah? | 18:19 |
mriedem | yup | 18:19 |
dansmith | is there no comment above saying "we assume..." ? | 18:20 |
mriedem | i can probably recreate/test that in my functional multi-cell test by just sending an event with a migration that has hosts in different cells | 18:20 |
mriedem | not really, but https://github.com/openstack/nova/blob/0cb1544106346664b4a53114458417ea62474b8c/nova/compute/api.py#L4853 | 18:20 |
mriedem | i wouldn't really expect it to either | 18:20 |
mriedem | nothing has needed it yet | 18:20 |
mriedem | https://github.com/openstack/nova/blob/0cb1544106346664b4a53114458417ea62474b8c/nova/compute/api.py#L4869 | 18:21 |
*** tssurya has joined #openstack-nova | 18:21 | |
dansmith | right, but I was thinking we had discussed this before, | 18:21 |
mriedem | ah yes | 18:21 |
mriedem | Consequently we can currently assume that the context for # both the source and destination hosts of a migration is the # same. | 18:21 |
dansmith | and putting a comment like that seems like something you would have made me do :) | 18:21 |
mriedem | didn't read far enough | 18:21 |
mriedem | it was mdbooth | 18:21 |
openstackgerrit | Merged openstack/os-traits master: Update SEV trait docs to avoid misleading people https://review.opendev.org/655671 | 18:22 |
mriedem | so if the migration record has cross_cell_move=true i'll have to tell it to pull the dest host mapping via the host mapping | 18:22 |
mriedem | so still optimized for the non-cross-cell case | 18:23 |
dansmith | cool | 18:23 |
*** hamzy has joined #openstack-nova | 18:24 | |
openstackgerrit | Surya Seetharaman proposed openstack/nova master: Microversion 2.73: Support adding the reason behind a server lock https://review.opendev.org/648662 | 18:35 |
*** jamesdenton has quit IRC | 18:37 | |
eandersson | lol mriedem | 18:50 |
*** jobewan has quit IRC | 18:51 | |
*** fried_rolls is now known as efried | 18:52 | |
zzzeek | jaypipes: ping | 18:54 |
mriedem | hmm my genius test must not be so great http://paste.openstack.org/show/751238/ | 18:56 |
mriedem | since it should fail, but doesn't | 18:56 |
mriedem | was hoping to avoid stubbing things out in the functional test to test the multi-cell event routing | 18:56 |
mriedem | dansmith: is there maybe something in our rpc fixture stuff that wouldn't make this easy to test in a functional test because we're using fake rpc? | 18:57 |
dansmith | mriedem: er, I wouldn't think so.. you're using the multi cell fixture? | 18:58 |
mriedem | yeah, my functional tests work fine for multi-cell | 18:58 |
mriedem | could be my context manager yield pattern in ^ isn't actually asserting anything | 18:58 |
*** dklyle_ has joined #openstack-nova | 19:02 | |
mriedem | no that routing seems to be working at least on one host | 19:04 |
mriedem | b'2019-05-10 15:02:18,969 DEBUG [nova.compute.manager] Processing event network-vif-plugged-60c6c823-ed75-46a0-a0c6-30aadb90e78c' | 19:04 |
mriedem | b'2019-05-10 15:02:18,970 DEBUG [nova.compute.manager] Received event network-vif-plugged-60c6c823-ed75-46a0-a0c6-30aadb90e78c' | 19:04 |
*** david-lyle has quit IRC | 19:04 | |
*** tesseract has quit IRC | 19:12 | |
*** _hemna has joined #openstack-nova | 19:13 | |
*** tssurya_ has joined #openstack-nova | 19:23 | |
*** jistr has quit IRC | 19:28 | |
mriedem | i think it works in functional testing because we're not using multiple rpc transports https://review.opendev.org/#/c/396417/ | 19:28 |
*** jistr has joined #openstack-nova | 19:28 | |
*** jistr has quit IRC | 19:29 | |
mriedem | so i'm guessing in a multi-cell test we'd need to use per-cell rpc fixtures? | 19:32 |
*** jistr has joined #openstack-nova | 19:33 | |
efried | mriedem, tssurya: I'm +2 on https://review.opendev.org/#/c/648662/ - leaving for mriedem to +W. | 19:39 |
*** jistr has quit IRC | 19:40 | |
*** jistr has joined #openstack-nova | 19:41 | |
mriedem | oh the pressure is on | 19:45 |
*** _hemna has quit IRC | 19:47 | |
ganso | mriedem: hey Matt. I am looking at https://review.opendev.org/#/c/658136/ ... it seems it needs https://github.com/openstack/nova/commit/94e620e87cb9349f799007f418ce94978bc33be1 | 19:47 |
ganso | mriedem: Rocky also doesn't have the methods assertFlavorMatchesUsage and assertRequestMatchesUsage | 19:47 |
ganso | mriedem: do you think it makes sense to backport the refactor? | 19:48 |
mriedem | no | 19:48 |
mriedem | there should be assertion methods you can use in queens | 19:48 |
ganso | mriedem: only assertFlavorMatchesAllocation | 19:51 |
ganso | mriedem: so the solution would be to cut down on some asserts | 19:51 |
*** imacdonn has quit IRC | 19:51 | |
mriedem | without looking at the test patch, you could rig up your own usage assertion using https://github.com/openstack/nova/blob/stable/queens/nova/tests/functional/test_servers.py#L1514 | 19:57 |
mriedem | not sure if you're asserting provider usage or consumer usage | 19:57 |
mriedem | i'm assuming the former | 19:57 |
ganso | mriedem: yup, doing that now. Basically re-coding the test to perform checks as it was done in queens | 19:57 |
ganso | mriedem: yea provider usage | 19:58 |
*** bbowen has joined #openstack-nova | 19:58 | |
openstackgerrit | Rodrigo Barbieri proposed openstack/nova stable/queens: [DEBUG] Add functional confirm_migration_error test https://review.opendev.org/658136 | 20:13 |
jaypipes | zzzeek: pong | 20:22 |
zzzeek | hey | 20:22 |
jaypipes | heya :) | 20:22 |
zzzeek | jaypipes: time warp back to http://www.joinfu.com/2015/01/understanding-reservations-concurrency-locking-in-nova/ | 20:22 |
zzzeek | so given that galera can have a little bit of read latency: https://www.percona.com/blog/2013/03/03/investigating-replication-latency-in-percona-xtradb-cluster/ can that impact an UPDATE that says, "UPDATE ... SET x='bar' WHERE x = 'foo'" ? if two transactions both update, but write latency means transaction 2 still sees "foo" ? | 20:24 |
*** slaweq has quit IRC | 20:24 | |
zzzeek | OR, do the UPDATE statements in a write-set get replayed such that they are serialized and it will detect this ? | 20:24 |
zzzeek | AND, if the latter, what if the two UPDATE statements are changing "x" to the *same* value? this is ultimately a nova question | 20:25 |
zzzeek | b.c. the issue is observed in the instance shelving logic | 20:25 |
*** bbowen has quit IRC | 20:30 | |
*** ccamacho has quit IRC | 20:32 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add cross-cell resize policy rule and enable in API https://review.opendev.org/638269 | 20:35 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: WIP: Enable cross-cell resize in the nova-multi-cell job https://review.opendev.org/656656 | 20:35 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Support cross-cell moves in external_instance_event https://review.opendev.org/658478 | 20:35 |
mriedem | zzzeek: a la this bug? https://bugs.launchpad.net/nova/+bug/1821373 | 20:36 |
openstack | Launchpad bug 1821373 in OpenStack Compute (nova) "Most instance actions can be called concurrently" [Undecided,New] | 20:36 |
zzzeek | mriedem: yes | 20:37 |
zzzeek | mriedem: mdbooth_ has updated me that his proposed solution won't work | 20:37 |
*** bbowen has joined #openstack-nova | 20:38 | |
mriedem | i think he mentioned something to that effect in irc awhile back for this bug but i don't remember what or why | 20:39 |
mriedem | gotta run | 20:44 |
*** mriedem has quit IRC | 20:44 | |
cdent | so many things | 20:48 |
cdent | zzzeek: since you're sort of around, I wanted to warn you that at some point we might be hassling you for some advice on some performance improvements in placement. | 20:51 |
cdent | but not now. later. | 20:51 |
zzzeek | cdent: that should be good since placement looks to be straightforward | 20:51 |
cdent | it _was_ | 20:52 |
cdent | but the nested stuff being added this cycle are going to be mind bending | 20:52 |
zzzeek | cdent: what kind of nesting | 20:55 |
*** JamesBenson has quit IRC | 20:57 | |
*** JamesBenson has joined #openstack-nova | 20:59 | |
*** JamesBenson has quit IRC | 21:03 | |
*** mchlumsky has quit IRC | 21:04 | |
edleafe | zzzeek: representing compute nodes containing NUMA nodes containing memory, for example | 21:07 |
cdent | zzzeek: sorry was away. the nesting is the "nested providers" concept. The new features are described (in brief) at https://storyboard.openstack.org/#!/story/2005575 | 21:07 |
cdent | zzzeek: edleafe has done some interesting work on switching to a graph database (which is a better fit for some of these things) but we're in the standard position of "we've got this other stuff already" | 21:07 |
zzzeek | cdent: so for now are you looking at...adjacency list schema ? | 21:09 |
cdent | zzzeek: no explicit decisions yet. the linked email thread has some ideas. But the lack of decisions is why I was saying we'll probably want some input, but not quite yet. It's more of a heads up for a later discussion rather than a now discussion | 21:10 |
cdent | on top of that: solutions for the existing features are going to need some work to scale into the 100s of thousands of resource providers | 21:11 |
*** slaweq has joined #openstack-nova | 21:11 | |
cdent | but we need to do some accurate measuring first | 21:11 |
zzzeek | edleafe is dying to get off SQL :P | 21:19 |
zzzeek | :) | 21:19 |
edleafe | zzzeek: Just for this case. | 21:19 |
zzzeek | edleafe: yeah graph DBs are awesome | 21:19 |
zzzeek | edleafe: huge crazy new dependencies not as much :) | 21:19 |
edleafe | zzzeek: I got nested providers working in a couple of days at the summit last week | 21:20 |
zzzeek | edleafe: might be one of those cases where you have multiple backends | 21:20 |
edleafe | I just use Neo4j in docker :) | 21:20 |
zzzeek | edleafe: we still have to have RPMs that build it out, we have memory requirements that java VM adds some weight towards, etc | 21:22 |
zzzeek | edleafe: by "we" I mean red hat | 21:22 |
edleafe | zzzeek: Sure, I understand all that. I just want to show it working, and solving the problems that we've spent years trying to solve using SQL. If that gets done and enough people feel that this is the way to go, then we can worry about what is needed for build/deploy | 21:23 |
*** jaypipes has quit IRC | 21:24 | |
zzzeek | edleafe: for small graphs, I use adjacency list. if you need a huge deep graph every time, then that won't work | 21:24 |
*** slaweq has quit IRC | 21:24 | |
*** jaypipes has joined #openstack-nova | 21:24 | |
jaypipes | zzzeek: apologies, having net issues... reading back | 21:25 |
cdent | I think our model is many trees, each <= 7 levels deep, not very broad | 21:25 |
edleafe | cdent: for nested, agree. Shared providers is the opposite | 21:26 |
cdent | depends on how we model the shared association. in a graph db, yes | 21:27 |
cdent | I tend to like shared as group, not tree | 21:28 |
openstackgerrit | Eric Fried proposed openstack/os-resource-classes master: Propose ACCELERATOR_{FPGA|GPU} resource classes https://review.opendev.org/657464 | 21:35 |
edleafe | cdent: I was just commenting on the trees reference. In the graph, sharing is just another relation. With Many:1 sharing, it kind of looks like a flower, not a tree: https://bit.ly/2VUI52U | 21:36 |
jaypipes | zzzeek: set global wsrep_causal_reads=1; <-- do that if you want synchronous replication behaviour. | 21:38 |
zzzeek | jaypipes: OK so you can confirm that compare and swap can fail for multimaster if that's not set ? | 21:38 |
zzzeek | jaypipes: this might be what's needed for https://bugs.launchpad.net/nova/+bug/1821373 | 21:39 |
openstack | Launchpad bug 1821373 in OpenStack Compute (nova) "Most instance actions can be called concurrently" [Undecided,New] | 21:39 |
jaypipes | zzzeek: I'm not 100% sure, but I believe so. the issue, however, is that the compare-and-swap technique should be done to increment a field and check that field value is at a previous read-view in the WHERE clause. in other words, doing things like UPDATE instances SET status = 'active' WHERE status IN (<list of statuses>) AND instances.uuid = ? is inherently not as safe/efficient/retryable-with-confidence as UPDATE instances SET generation = ? + 1 WHERE generation = ? AND uuid = ? | 21:43 |
*** _hemna has joined #openstack-nova | 21:44 | |
jaypipes | zzzeek: that's the fundamental problem IMHO with the whole "check my instance status is in this list of states and set the status to X" checks that nova does. | 21:44 |
zzzeek | jaypipes: if the UPDATE included a unique version or timestamp of some kind does that help? if galera sees two UPDATE statements setting it to a different value ? | 21:45 |
jaypipes | zzzeek: maybe? :) galera already sends around essentially a generation for the innodb records affected by a transaction writeset, AFAIK. I just think doing the "compare" part of the compare-and-swap functionality with a "loose match" like "status IN <....>" isn't as good as a comparator that was specifically designed to inform the caller that "yes, someone else changed this record since you last read a view of it". hope that makes sense. | 21:48 |
jaypipes | zzzeek: this is why, in placement land, we always do the compare-and-swap using the `UPDATE tbl SET generation = ($LAST_READ_GENERATION + 1) WHERE pk = $PK AND generation = $LAST_READ_GENERATION` strategy | 21:50 |
zzzeek | jaypipes: the specific case in that issue we are looking for an existing status of NULL | 21:50 |
jaypipes | zzzeek: yeah, but AFAIK, the code you're describing isn't really a compare-and-swap. it's more just a "hey, check that I'm not, say, in the process of deleting this instance when I try to unshelve it" | 21:51 |
zzzeek | jaypipes: well yes I was saying, it's more reliable if we are setting it to a new value, however, if two transactions on different nodes hit it at the same time they will see the same LAST_READ_GENERATION value | 21:51 |
jaypipes | zzzeek: yes, they will. | 21:52 |
jaypipes | zzzeek: and if both attempt to update, one will fail of course. | 21:52 |
zzzeek | jaypipes: one fails because of the SET clause specifically ? | 21:53 |
jaypipes | zzzeek: no, one will fail because WHERE generation = $LAST_READ_GENERATION will fail. | 21:54 |
jaypipes | to return a row. | 21:54 |
jaypipes | zzzeek: so, the SQL won't fail, per se, it's just the transaction will return 0 rows affected. | 21:54 |
jaypipes | zzzeek: which is the thing we look for to trigger a rollback of the entire transaction. | 21:55 |
zzzeek | jaypipes: but what if one UPDATE processes, sends out the writeset which includes the new value, however the other node gets an UPDATE, due to replication latency it also sees the same value, also emits an UPDATE, no failure | 21:55 |
zzzeek | but updates the row | 21:55 |
zzzeek | because the new value wasn't there yet b.c. no wsrep_causal_reads | 21:55 |
zzzeek | jaypipes: this gets into, I have no idea how the galera writeset certification works | 21:56 |
zzzeek | jaypipes: I would think that certification should be, transaction modified this row, this other transaction is modifying the same generation of that row, so it fails | 21:56 |
zzzeek | which means this issue is non-existent | 21:56 |
zzzeek | e.g. mvcc generation | 21:56 |
jaypipes | that is essentially how it works, yes. | 21:57 |
jaypipes | https://github.com/openstack/placement/blob/master/placement/objects/resource_provider.py#L960 | 21:57 |
*** rcernin has quit IRC | 21:58 | |
zzzeek | jaypipes: OK. So, if the SHELVE thing is looking explicitly for NULL and changes to SHELVING as described in the launchpad, no failure ? how does it fail ? | 21:58 |
jaypipes | the wsrep_causal_reads is about reads only. writes that attempt to update the same record when a diff trx changed that record are never allowed. | 21:58 |
zzzeek | jaypipes: right that's what I sort of thought | 21:59 |
jaypipes | zzzeek: apologies, I haven't read the bug report yet. | 21:59 |
jaypipes | lemme do that now. one minute. | 21:59 |
jaypipes | zzzeek: I would take issue with mdbooth_'s statement "This is intended to act as a robust gate against 2 instance actions happening concurrently." :) | 22:00 |
jaypipes | it's not a robust gate at all. | 22:00 |
jaypipes | it's a super coarse-grained check | 22:01 |
zzzeek | jaypipes: OK he is stumped on this and I dont really know the details of this system, I just wrote the UPDATE statement four years ago | 22:01 |
zzzeek | do you have anything you can add to that launchpad? | 22:01 |
jaypipes | in fact, it's not a gate at all. it's nothing more than a very simple sanity check that exists outside of any transactional context AFAIK | 22:01 |
jaypipes | zzzeek: yeah, I will add a note to it. | 22:02 |
zzzeek | jaypipes: really? it doesnt seem that way to me, I assume this is on enginefacade and there should be a tx | 22:02 |
jaypipes | zzzeek: one would assume that. I have no real way of verifying it's in the same trx though. | 22:02 |
*** tssurya has quit IRC | 22:03 | |
*** tssurya_ is now known as tssurya | 22:03 | |
* zzzeek has to go do friday stuff | 22:04 | |
zzzeek | thanks for the chat jaypipes | 22:04 |
*** bbowen has quit IRC | 22:06 | |
*** slaweq has joined #openstack-nova | 22:11 | |
*** _hemna has quit IRC | 22:17 | |
*** cdent has quit IRC | 22:20 | |
*** mlavalle has quit IRC | 22:24 | |
*** slaweq has quit IRC | 22:24 | |
openstackgerrit | Eric Fried proposed openstack/os-resource-classes master: Propose ACCELERATOR_{FPGA|GPU} resource classes https://review.opendev.org/657464 | 22:33 |
*** whoami-rajat has quit IRC | 22:39 | |
*** _hemna has joined #openstack-nova | 22:44 | |
*** macza has quit IRC | 22:45 | |
*** macza has joined #openstack-nova | 22:47 | |
openstackgerrit | Sundar Nadathur proposed openstack/nova-specs master: Nova Cyborg interaction specification. https://review.opendev.org/603955 | 22:54 |
*** lbragstad has quit IRC | 22:58 | |
*** jaypipes has quit IRC | 23:01 | |
*** macza has quit IRC | 23:11 | |
*** slaweq has joined #openstack-nova | 23:11 | |
*** _hemna has quit IRC | 23:18 | |
openstackgerrit | Merged openstack/nova master: Add ironic driver image type capabilities https://review.opendev.org/655729 | 23:22 |
openstackgerrit | Merged openstack/nova master: Add vmware driver image type capabilities https://review.opendev.org/655730 | 23:22 |
*** mlavalle has joined #openstack-nova | 23:24 | |
*** slaweq has quit IRC | 23:24 | |
*** gyee has quit IRC | 23:26 | |
*** Swami has quit IRC | 23:32 | |
*** xek has quit IRC | 23:36 |
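
Note on the compare-and-swap discussion above (21:43 to 21:58): the difference between the generation-based update and the "status IN (...)" check can be sketched roughly as follows. This is only an illustration under stated assumptions, not nova or placement code: it uses an in-memory SQLite database as a single-node stand-in (so it says nothing about Galera replication or wsrep_causal_reads), and the table layout and the helper names `make_db`, `read_generation`, `cas_update`, `loose_update` and `demo` are hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory table with one row (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE instances ("
        "uuid TEXT PRIMARY KEY, status TEXT, generation INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO instances VALUES ('abc', 'active', 0)")
    conn.commit()
    return conn


def read_generation(conn, uuid):
    """Return the generation the caller last saw for this row."""
    row = conn.execute(
        "SELECT generation FROM instances WHERE uuid = ?", (uuid,)
    ).fetchone()
    return row[0]


def cas_update(conn, uuid, new_status, last_read_generation):
    """Generation-based compare-and-swap.

    The WHERE clause matches only if nobody bumped the generation since
    the caller read it. Zero affected rows means the swap lost the race,
    so the transaction is rolled back and False is returned.
    """
    cur = conn.execute(
        "UPDATE instances SET status = ?, generation = ? + 1 "
        "WHERE uuid = ? AND generation = ?",
        (new_status, last_read_generation, uuid, last_read_generation),
    )
    if cur.rowcount == 0:
        conn.rollback()
        return False
    conn.commit()
    return True


def loose_update(conn, uuid, new_status, allowed_statuses):
    """The 'loose match': only checks that status is in an allowed list."""
    placeholders = ", ".join("?" for _ in allowed_statuses)
    cur = conn.execute(
        f"UPDATE instances SET status = ? "
        f"WHERE uuid = ? AND status IN ({placeholders})",
        (new_status, uuid, *allowed_statuses),
    )
    conn.commit()
    return cur.rowcount == 1


def demo():
    conn = make_db()
    # Two callers read the same view of the row.
    gen_a = read_generation(conn, "abc")
    gen_b = read_generation(conn, "abc")
    first = cas_update(conn, "abc", "stopped", gen_a)    # wins the race
    second = cas_update(conn, "abc", "shelving", gen_b)  # stale generation
    # The loose check still passes, even though the row changed since the
    # second caller read it, because 'stopped' is in the allowed list.
    loose = loose_update(conn, "abc", "shelving", ["active", "stopped"])
    return first, second, loose


if __name__ == "__main__":
    print(demo())
```

In this sketch the second generation-based update is rejected because the generation it read is stale, while the status-list update goes through; that is the sense in which the "status IN (...)" check is coarse-grained.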