*** prometheanfire has joined #openstack-nova | 00:10 | |
prometheanfire | http://logs.openstack.org/34/618834/1/check/cross-nova-py27/c3128e0/testr_results.html.gz new oslo.service causes failures | 00:10 |
prometheanfire | happy post-summit :D | 00:10 |
*** tetsuro has joined #openstack-nova | 00:11 | |
jroll | efried: done | 00:14 |
*** hoonetorg has quit IRC | 00:35 | |
*** hoonetorg has joined #openstack-nova | 00:37 | |
*** hoonetorg has quit IRC | 00:40 | |
*** hoonetorg has joined #openstack-nova | 00:40 | |
openstackgerrit | zhufl proposed openstack/nova master: Add missing ws seperator between words https://review.openstack.org/618491 | 01:21 |
*** Swami has quit IRC | 01:34 | |
*** Dinesh_Bhor has joined #openstack-nova | 01:49 | |
*** Dinesh_Bhor has quit IRC | 01:55 | |
*** bhagyashris has joined #openstack-nova | 01:56 | |
*** Dinesh_Bhor has joined #openstack-nova | 02:04 | |
openstackgerrit | Tetsuro Nakamura proposed openstack/nova master: Consider root id is None in the database case https://review.openstack.org/613305 | 02:18 |
*** mrsoul has quit IRC | 02:33 | |
*** mhen has quit IRC | 02:51 | |
*** mhen has joined #openstack-nova | 02:52 | |
*** hongbin has joined #openstack-nova | 02:58 | |
*** cfriesen has quit IRC | 03:24 | |
*** Dinesh_Bhor has quit IRC | 03:42 | |
*** dklyle has quit IRC | 03:45 | |
*** david-lyle has joined #openstack-nova | 03:45 | |
openstackgerrit | Jie Li proposed openstack/nova-specs master: Support volume-backed server rebuild https://review.openstack.org/532407 | 03:46 |
*** cfriesen has joined #openstack-nova | 03:53 | |
*** diga has joined #openstack-nova | 04:00 | |
*** Dinesh_Bhor has joined #openstack-nova | 04:01 | |
*** tetsuro has quit IRC | 04:04 | |
*** udesale has joined #openstack-nova | 04:09 | |
*** sridharg has joined #openstack-nova | 04:28 | |
*** janki has joined #openstack-nova | 04:30 | |
*** tbachman_ has joined #openstack-nova | 04:39 | |
*** ivve has joined #openstack-nova | 04:41 | |
*** tbachman has quit IRC | 04:42 | |
*** tbachman_ is now known as tbachman | 04:42 | |
openstackgerrit | Merged openstack/nova master: Skip double word hacking test https://review.openstack.org/618843 | 04:57 |
openstackgerrit | Merged openstack/nova master: Fix server query examples https://review.openstack.org/616834 | 04:57 |
*** bhagyashris has quit IRC | 05:04 | |
*** hongbin has quit IRC | 05:07 | |
*** tetsuro has joined #openstack-nova | 05:11 | |
*** cfriesen has quit IRC | 05:38 | |
*** betherly has quit IRC | 05:43 | |
*** k_mouza has joined #openstack-nova | 05:43 | |
*** betherly has joined #openstack-nova | 05:44 | |
*** ratailor has joined #openstack-nova | 05:46 | |
*** k_mouza has quit IRC | 05:48 | |
*** betherly has quit IRC | 05:49 | |
*** brinzhang has joined #openstack-nova | 05:51 | |
*** links has joined #openstack-nova | 05:57 | |
openstackgerrit | Jie Li proposed openstack/nova-specs master: Support volume-backed server rebuild https://review.openstack.org/532407 | 06:04 |
*** sapd1 has joined #openstack-nova | 06:11 | |
*** moshele has joined #openstack-nova | 06:12 | |
*** ccamacho has quit IRC | 06:24 | |
*** Dinesh_Bhor has quit IRC | 06:25 | |
*** Dinesh_Bhor has joined #openstack-nova | 06:28 | |
*** spsurya has joined #openstack-nova | 06:39 | |
*** sapd1 has quit IRC | 06:39 | |
*** Luzi has joined #openstack-nova | 06:42 | |
*** takashin has left #openstack-nova | 07:01 | |
*** sapd1 has joined #openstack-nova | 07:04 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack/nova stable/rocky: Imported Translations from Zanata https://review.openstack.org/614757 | 07:10 |
openstackgerrit | Radoslav Gerganov proposed openstack/nova master: VMware: implement trigger crash dump https://review.openstack.org/618736 | 07:18 |
*** pcaruana has joined #openstack-nova | 07:20 | |
*** moshele has quit IRC | 07:26 | |
*** pcaruana has quit IRC | 07:34 | |
*** psachin has joined #openstack-nova | 07:37 | |
*** pcaruana has joined #openstack-nova | 07:40 | |
*** tssurya has joined #openstack-nova | 07:46 | |
*** ccamacho has joined #openstack-nova | 07:54 | |
*** bhagyashris__ has joined #openstack-nova | 07:54 | |
*** sahid has joined #openstack-nova | 07:59 | |
*** adrianc has joined #openstack-nova | 08:04 | |
*** ralonsoh has joined #openstack-nova | 08:24 | |
openstackgerrit | Surya Seetharaman proposed openstack/nova master: Add os_compute_api:servers:create:cell_down policy https://review.openstack.org/614783 | 08:29 |
*** sapd1 has quit IRC | 08:37 | |
*** sapd1 has joined #openstack-nova | 08:37 | |
*** rodolof has joined #openstack-nova | 08:44 | |
*** do3meli has joined #openstack-nova | 08:46 | |
*** jpena|off is now known as jpena | 08:53 | |
*** priteau has joined #openstack-nova | 08:54 | |
*** lbragstad has joined #openstack-nova | 08:54 | |
*** jpena has left #openstack-nova | 08:55 | |
*** dpawlik has joined #openstack-nova | 08:56 | |
*** ShilpaSD has joined #openstack-nova | 09:08 | |
*** s10 has joined #openstack-nova | 09:17 | |
*** k_mouza has joined #openstack-nova | 09:19 | |
*** k_mouza has quit IRC | 09:20 | |
*** k_mouza has joined #openstack-nova | 09:20 | |
*** alexchadin has joined #openstack-nova | 09:22 | |
*** sapd1 has quit IRC | 09:22 | |
*** links has quit IRC | 09:26 | |
*** pooja_jadhav has joined #openstack-nova | 09:27 | |
*** cdent has joined #openstack-nova | 09:31 | |
*** derekh has joined #openstack-nova | 09:37 | |
*** bhagyashris__ has quit IRC | 09:48 | |
*** slaweq__ has quit IRC | 09:48 | |
*** sapd1 has joined #openstack-nova | 09:49 | |
*** nehaalhat_ has joined #openstack-nova | 09:50 | |
*** sapd1 has quit IRC | 09:54 | |
*** dtantsur|afk is now known as dtantsur | 09:58 | |
*** rodolof has quit IRC | 10:02 | |
*** adrianc has quit IRC | 10:05 | |
*** adrianc has joined #openstack-nova | 10:10 | |
*** rodolof has joined #openstack-nova | 10:16 | |
*** moshele has joined #openstack-nova | 10:23 | |
*** sapd1 has joined #openstack-nova | 10:26 | |
*** k_mouza has quit IRC | 10:27 | |
*** k_mouza has joined #openstack-nova | 10:28 | |
sean-k-mooney | o/ | 10:30 |
*** sapd1 has quit IRC | 10:31 | |
*** k_mouza has quit IRC | 10:33 | |
openstackgerrit | Chris Dent proposed openstack/nova master: Use external placement in functional tests https://review.openstack.org/617941 | 10:33 |
openstackgerrit | Chris Dent proposed openstack/nova master: WIP: Delete the placement code https://review.openstack.org/618215 | 10:33 |
*** slaweq__ has joined #openstack-nova | 10:40 | |
*** k_mouza has joined #openstack-nova | 10:51 | |
*** Dinesh_Bhor has quit IRC | 10:56 | |
*** Dinesh_Bhor has joined #openstack-nova | 10:58 | |
*** Dinesh_Bhor has quit IRC | 10:59 | |
*** sahid has quit IRC | 11:00 | |
*** sahid has joined #openstack-nova | 11:00 | |
*** tbachman_ has joined #openstack-nova | 11:02 | |
*** tbachman has quit IRC | 11:03 | |
*** tbachman_ is now known as tbachman | 11:03 | |
*** udesale has quit IRC | 11:11 | |
*** alexchadin has quit IRC | 11:11 | |
*** adrianc has quit IRC | 11:12 | |
*** diga has quit IRC | 11:16 | |
*** pooja_jadhav has quit IRC | 11:22 | |
*** adrianc has joined #openstack-nova | 11:23 | |
*** adrianc_ has joined #openstack-nova | 11:25 | |
*** k_mouza has quit IRC | 11:26 | |
*** adrianc has quit IRC | 11:29 | |
*** k_mouza has joined #openstack-nova | 11:29 | |
openstackgerrit | Merged openstack/nova master: Nix refs to ResourceProvider obj from libvirt UT https://review.openstack.org/618786 | 11:29 |
cdent | huzzah | 11:32 |
sean-k-mooney | was that the last usage of placement objects in the nova unit tests? | 11:33 |
cdent | sean-k-mooney: not quite | 11:34 |
*** k_mouza has quit IRC | 11:34 | |
cdent | well, strictly speaking, yes | 11:34 |
cdent | it was the last usage of the objects | 11:34 |
*** k_mouza has joined #openstack-nova | 11:34 | |
cdent | but there are some other tests which are testing things that want to use the placement db | 11:34 |
sean-k-mooney | but not the last usage of placement | 11:34 |
sean-k-mooney | ah ok | 11:34 |
cdent | in my wip to remove the placement code I've had to remove some other tests/code | 11:34 |
cdent | when I get mat's suggestions on https://review.openstack.org/#/c/600161/ done, I'm going to go back to https://review.openstack.org/#/c/618215/ to fix up the merge conflict and tidy up whatever else is broken | 11:35 |
prometheanfire | http://logs.openstack.org/34/618834/1/check/cross-nova-py27/c3128e0/testr_results.html.gz new oslo.service causes failures | 11:40 |
prometheanfire | since people seem here :D | 11:40 |
openstackgerrit | Martin Midolesov proposed openstack/nova master: Allow driver to specify switch&port for faster lookup https://review.openstack.org/617695 | 11:40 |
*** adrianc_ has quit IRC | 11:43 | |
cdent | prometheanfire: oh joy. I seem to recall efried and melwitt being vaguely aware of that | 11:45 |
sean-k-mooney | we were aware of that because we backported a fix downstream that we should not have | 11:46 |
sean-k-mooney | prometheanfire: what branch is this happening on | 11:46 |
sean-k-mooney | prometheanfire: ya rocky nova is not compatible with that version of oslo.service | 11:47 |
sean-k-mooney | we should not be increasing the upper constraint on oslo.service in stable rocky anyway | 11:47 |
*** tbachman has quit IRC | 11:48 | |
sean-k-mooney | oh it got backported into 1.31.6 ... | 11:48 |
*** pooja_jadhav has joined #openstack-nova | 11:49 | |
*** pooja_jadhav has quit IRC | 11:50 | |
*** alexchadin has joined #openstack-nova | 11:50 | |
prometheanfire | yep :| | 11:54 |
sean-k-mooney | https://review.openstack.org/#/c/616505/3 is the issue | 11:54 |
sean-k-mooney | or rather https://review.openstack.org/#/q/I62e9f1a7cde8846be368fbec58b8e0825ce02079 | 11:55 |
sean-k-mooney | well i guess its the same | 11:55 |
*** alexchadin has quit IRC | 11:56 | |
*** s10 has quit IRC | 11:57 | |
*** dtantsur is now known as dtantsur|brb | 11:59 | |
*** do3meli has left #openstack-nova | 12:02 | |
*** s10 has joined #openstack-nova | 12:04 | |
*** xek_ has joined #openstack-nova | 12:06 | |
*** xek_ is now known as xek | 12:06 | |
*** Luzi has quit IRC | 12:07 | |
*** slaweq__ has quit IRC | 12:07 | |
*** adrianc_ has joined #openstack-nova | 12:09 | |
*** janki has quit IRC | 12:10 | |
xek | did anyone try to use nova notifications lately? | 12:10 |
xek | I have an issue that when an instance is deleted, I only get an update notification with task state "deleting", but no instance.delete.end notification... | 12:10 |
xek | I tried both versioned and unversioned notifications | 12:11 |
jaosorior | gibi: are you around? | 12:15 |
jaosorior | gibi: I remember you had around some document where you had the new supported versioned notifications. | 12:16 |
xek | jaosorior, I think it's this one: https://docs.openstack.org/nova/latest/reference/notifications.html | 12:16 |
jaosorior | xek: instance.delete.end is there in the list | 12:17 |
jaosorior | so, if the notification is not being emitted, it's a bug. | 12:17 |
sean-k-mooney | xek: going forward i believe we are dropping support for unversioned notifications. i don't believe we intend to remove them but we had discussed freezing the code and not fixing new buts | 12:18 |
sean-k-mooney | *bugs | 12:18 |
jaosorior | sean-k-mooney: thanks, we're aware of the unversioned notifications deprecation. That's why we started testing the versioned ones. instance.delete.end should be in the versioned notifications too (according to the list in the doc xek pointed at). So... if it's not being emitted, I think it's a bug. We'll file it. | 12:19 |
jaosorior | unless that doc is outdated :/ | 12:19 |
sean-k-mooney | jaosorior: if the versioned notification is not being emitted it's a bug yes | 12:20 |
sean-k-mooney | that said the notification may have been lost in rabbitmq if you don't have persistence | 12:20 |
sean-k-mooney | i assume its repeatable | 12:20 |
jaosorior | sean-k-mooney: can you elaborate on that? | 12:21 |
*** moshele has quit IRC | 12:21 | |
sean-k-mooney | jaosorior: you can configure rabbitmq to persist each message to disk until it's dequeued, or it can keep them just in memory | 12:21 |
*** Luzi has joined #openstack-nova | 12:22 | |
sean-k-mooney | using durable queues is a significant performance hit but if rabbit is restarted messages are not lost | 12:22 |
sean-k-mooney | i do not believe the queues used for notifications are durable | 12:22 |
*** ratailor has quit IRC | 12:23 | |
jaosorior | sean-k-mooney: so, a bit of context: xek is working on getting functional tests for novajoin (a vendordata plugin for nova). These functional tests are being set up on top of devstack.... do you happen to know if this would be an issue that would hit a small devstack deployment? | 12:23 |
xek | sean-k-mooney, we saw some heartbeat misses and thus connectivity issues in the logs, but we didn't restart rabbitmq | 12:23 |
jaosorior | sean-k-mooney: we ultimately use it in TripleO, however, we haven't actually had issues with the notifications (yet). So I guess the rabbitmq settings are OK :D | 12:23 |
xek | sean-k-mooney, and it's pretty consistent | 12:23 |
sean-k-mooney | no idea unfortunately, i don't know what the default is | 12:24 |
xek | the default is non-persistent | 12:24 |
sean-k-mooney | oslo.messaging and rabbit by default do not guarantee message delivery as far as i am aware | 12:24 |
sean-k-mooney | if there was a temporary network issue the notification could have been lost | 12:24 |
sean-k-mooney | my understanding is that oslo.messaging will not buffer the notification and retry sending it to rabbit if a send fails, unless you code that on top yourself | 12:26 |
*** slaweq__ has joined #openstack-nova | 12:27 | |
sean-k-mooney | i could be wrong about that however as this is a part of nova i have rarely dealt with | 12:27 |
*** priteau has quit IRC | 12:28 | |
jaosorior | sean-k-mooney: got it, I guess the next step is to investigate how rabbitmq is being configured in devstack, and how to modify that configuration if necessary | 12:28 |
jaosorior | sean-k-mooney: thanks for the guidance! | 12:28 |
*** moshele has joined #openstack-nova | 12:28 | |
openstackgerrit | Chris Dent proposed openstack/nova master: Use external placement in functional tests https://review.openstack.org/617941 | 12:35 |
openstackgerrit | Chris Dent proposed openstack/nova master: WIP: Delete the placement code https://review.openstack.org/618215 | 12:35 |
openstackgerrit | Chris Dent proposed openstack/nova master: WIP: Delete the placement code https://review.openstack.org/618215 | 12:36 |
sean-k-mooney | jaosorior: xek for what it's worth it looks like you can enable durable queues in the nova.conf and it defaults to false | 12:36 |
sean-k-mooney | https://docs.openstack.org/oslo.messaging/ocata/opts.html#oslo_messaging_rabbit.amqp_durable_queues | 12:36 |
sean-k-mooney | i should probably check the latest doc actually to see if that is still a thing | 12:37 |
jangutter | sean-k-mooney: would you perhaps be able to donate some of your copious free time towards helping with the naming of things? | 12:38 |
jaosorior | sean-k-mooney: thanks! we'll check that out. | 12:39 |
*** moshele has quit IRC | 12:39 | |
sean-k-mooney | jangutter: hah i could stop procrastinating by looking at messaging stuff and look at naming instead | 12:39 |
sean-k-mooney | jaosorior: https://docs.openstack.org/oslo.messaging/latest/configuration/opts.html#oslo_messaging_rabbit.amqp_durable_queues its there on latest too | 12:40 |
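For reference, a minimal sketch of the nova.conf setting discussed above, assuming only what the linked oslo.messaging docs state: the option is `amqp_durable_queues` in the `[oslo_messaging_rabbit]` section and it defaults to false. The fragment is embedded in a small Python check so it can be parsed; the helper name `durable_queues_enabled` is hypothetical and is not part of nova or oslo.messaging. Whether enabling it resolves the missing instance.delete.end notification is not established in this discussion.

```python
import configparser

# Illustrative nova.conf fragment. Section and option names come from the
# oslo.messaging docs linked in the channel; the value shown is an example.
NOVA_CONF_FRAGMENT = """
[oslo_messaging_rabbit]
# Defaults to false per the linked docs; true asks for durable AMQP queues.
amqp_durable_queues = true
"""


def durable_queues_enabled(conf_text):
    """Return True if the fragment turns on amqp_durable_queues.

    Hypothetical helper for illustration only. Falls back to False, the
    documented default, when the section or option is absent.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return parser.getboolean(
        "oslo_messaging_rabbit", "amqp_durable_queues", fallback=False
    )
```

In a devstack setup the same check could be pointed at the contents of the generated nova.conf to confirm which value is actually in effect.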
jangutter | sean-k-mooney: bring your bikeshed, it can be any colour as long as it's red. | 12:40 |
sean-k-mooney | haha | 12:40 |
sean-k-mooney | not pink | 12:41 |
jangutter | sean-k-mooney: I think we're close to peak confusion with what the heck "network offloads" mean. | 12:41 |
sean-k-mooney | jangutter: is this related to the os-vif spec | 12:41 |
sean-k-mooney | ah yes | 12:41 |
*** psachin has quit IRC | 12:41 | |
jangutter | sean-k-mooney: yaas.... anyone wanting to frighten their young'uns: https://review.openstack.org/#/c/607610/ | 12:42 |
*** slaweq__ is now known as slaweq | 12:42 | |
sean-k-mooney | jay highlighted it to me yesterday | 12:42 |
jangutter | sean-k-mooney: googling NIC offloads leads me to: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/performance_tuning_guide/network-nic-offloads | 12:43 |
sean-k-mooney | yes when i think nic offload i think of the things ethtool says the nic can offload | 12:43 |
sean-k-mooney | not hardware offloaded ovs | 12:44 |
jangutter | sean-k-mooney: that makes sense from that point of view. | 12:44 |
jangutter | sean-k-mooney: so what would a nice, intuitive, unambiguous way be to describe a packet processing pipeline that is run on a coprocessor? | 12:45 |
sean-k-mooney | it can also include things like ipsec offload too which does not show up in ethtool but is in the same vein as vxlan tunnel encap/decap offload | 12:45 |
jangutter | sean-k-mooney: it's a metric shedload of confusion. | 12:45 |
sean-k-mooney | well you could call it vswitch offload/acceleration | 12:46 |
gibi | xek, jaosorior: please file a bug. I will go and try to reproduce it | 12:47 |
jangutter | sean-k-mooney: In my mind, I have broadly two classifications: things that happen "at the endpoint" (like TSO/ rx/tx checksum offload) and things that happen "before it hits the other endpoint". | 12:47 |
gibi | xek, jaosorior: I'm a bit on and off today | 12:47 |
jangutter | sean-k-mooney: yeah 'vswitch' is acceptable, I guess, but you don't necessarily need a vswitch. In theory, things like IPSEC/VXLAN would fit in my "second case" too. | 12:48 |
sean-k-mooney | jangutter: there is the offload phase that modifies the packet and the classification/switching phases that decide what modification to apply and where to send it | 12:48 |
jangutter | sean-k-mooney: if IPSEC is regarded as a tunnel... | 12:48 |
*** panda|rover is now known as panda|rover|lch | 12:48 | |
jangutter | sean-k-mooney: yeah, "more associated with the guest" and "more associated with the host". | 12:49 |
jangutter | sean-k-mooney: it would be awesome if we could classify them as "guest offloads" and "host offloads". | 12:49 |
*** brinzhang has quit IRC | 12:50 | |
sean-k-mooney | jangutter: i would be ok with that split | 12:51 |
jangutter | sean-k-mooney: only problem is that it's a bit ambiguous.... since obviously endpoints on the host can also make use of "guest offloads". | 12:51 |
*** rodolof has quit IRC | 12:52 | |
sean-k-mooney | we could call them protocol offloads and switching offloads instead | 12:53 |
jangutter | sean-k-mooney: hence why I thought "endpoint offloads" and "datapath offloads" are also good bikesheds.... | 12:53 |
jangutter | sean-k-mooney: "protocol offload" and VXLAN/IPSEC will cause a bit of confusion, but I'm fine with "switching/vswitch/eswitch offloads" | 12:54 |
sean-k-mooney | then network offloads and switching offloads? | 12:55 |
sean-k-mooney | i agree the tunnels are kind of both | 12:55 |
sean-k-mooney | they are weird | 12:55 |
jangutter | sean-k-mooney: tell me about it. | 12:55 |
jangutter | sean-k-mooney: I almost want to call 'em "socket offloads". | 12:57 |
sean-k-mooney | so while we some of them don't work at l4 however | 12:57 |
sean-k-mooney | ignore the so while we | 12:58 |
sean-k-mooney | i need to fully clear my buffer when i change what i was going to type | 12:58 |
sean-k-mooney | jangutter: since we are talking about naming how do you feel about my naming comments in https://review.openstack.org/#/c/572081/9/os_vif/objects/host_info.py@205 | 12:59 |
sean-k-mooney | e.g. the fact that the representor netdevs are really part of the control plane not data plane since they do not transmit packets | 13:00 |
*** dtantsur|brb is now known as dtantsur | 13:00 | |
sean-k-mooney | they are kind of like the vhost-user sockets in that respect | 13:00 |
jangutter | sean-k-mooney: representors absolutely transmit packets. | 13:00 |
sean-k-mooney | the difference being if you implemented them correctly and ran tcpdump the kernel would actually allow you to sniff the packets | 13:01 |
sean-k-mooney | jangutter: the VF does but the netdev you add to ovs should not | 13:01 |
jangutter | sean-k-mooney: mainly, the first packet of a new flow, i.e. something that doesn't have a rule. | 13:01 |
sean-k-mooney | right the learning packet but not the rest | 13:01 |
sean-k-mooney | i forgot they are used for the exception path | 13:02 |
jangutter | sean-k-mooney: yep, and it's vital to things like tunnels... | 13:02 |
sean-k-mooney | networking is hard. | 13:02 |
sean-k-mooney | i need to respond to a downstream bug but ill be back in a while | 13:02 |
sean-k-mooney | did this help? | 13:02 |
jangutter | sean-k-mooney: thanks very much, this was very therapeutic! | 13:03 |
sean-k-mooney | i'll try and review the spec today or at least this week. i have it open on my monitor in any case | 13:03 |
jangutter | sean-k-mooney: thanks, will be respinning to try to clarify that some "offloads" are more "off" than others. | 13:04 |
*** s10 has quit IRC | 13:05 | |
*** priteau has joined #openstack-nova | 13:11 | |
*** moshele has joined #openstack-nova | 13:11 | |
openstackgerrit | do3meli proposed openstack/nova master: Allow VMs to use unaddressed ports https://review.openstack.org/533249 | 13:12 |
*** k_mouza has quit IRC | 13:19 | |
*** k_mouza has joined #openstack-nova | 13:19 | |
*** moshele has quit IRC | 13:21 | |
*** s10 has joined #openstack-nova | 13:21 | |
*** diga has joined #openstack-nova | 13:22 | |
*** k_mouza has quit IRC | 13:23 | |
*** k_mouza has joined #openstack-nova | 13:26 | |
*** k_mouza_ has joined #openstack-nova | 13:30 | |
*** k_mouza has quit IRC | 13:32 | |
dtantsur | hi folks! is there a high-level description of Placement API? /cc cdent | 13:33 |
bauzas | dtantsur: https://developer.openstack.org/api-ref/placement/ ? | 13:34 |
dtantsur | bauzas: this is low-level, it says how to use specific endpoints. I'm more interested in high-level flow. | 13:35 |
dtantsur | I want to make ironic optionally report to/consume placement | 13:36 |
dtantsur | I need to understand 1. what reporting actually means, 2. how a node can be reserved via Placement. | 13:36 |
bauzas | dtantsur: we also have https://docs.openstack.org/nova/latest/contributor/placement.html | 13:37 |
*** panda|rover|lch is now known as panda|rover | 13:37 | |
belmoreira | dtantsur are you tracking this work somewhere? | 13:37 |
dtantsur | belmoreira: mostly in my head for now.. the API design without placement bits is https://review.openstack.org/617953 | 13:38 |
dtantsur | let me try a specific question | 13:39 |
dtantsur | given that any Ironic node is represented by exactly one instance of a custom resource class | 13:40 |
dtantsur | to reserve a node I need: 1. GET /resource_providers?resources=CUSTOM_BAREMETAL:1&required=list,of,traits | 13:40 |
dtantsur | 2. POST /allocations with suitable UUID? | 13:40 |
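The first step dtantsur describes can be sketched as a small query builder. This only reproduces the URL shape quoted above; the helper name `build_provider_query` and the example trait names are placeholders, and which Placement microversion accepts `resources` and `required` on this endpoint is not stated in the discussion. The allocation step is left out because its request body is not described here.

```python
from urllib.parse import urlencode


def build_provider_query(resource_class, traits, amount=1):
    """Build the GET /resource_providers query string quoted in the log.

    Hypothetical helper: it only formats the URL and does not contact
    the Placement service.
    """
    params = {"resources": f"{resource_class}:{amount}"}
    if traits:
        # Traits are passed as one comma-separated value.
        params["required"] = ",".join(traits)
    # Keep ':' and ',' unescaped so the result matches the form in the log.
    return "/resource_providers?" + urlencode(params, safe=":,")


# Example with placeholder trait names.
print(build_provider_query("CUSTOM_BAREMETAL", ["CUSTOM_GOLD", "CUSTOM_RAID"]))
```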
*** mriedem has joined #openstack-nova | 13:41 | |
*** mlavalle has joined #openstack-nova | 13:43 | |
*** lpetrut has joined #openstack-nova | 13:44 | |
*** tbachman has joined #openstack-nova | 13:45 | |
*** whoami-rajat has quit IRC | 14:00 | |
*** artom has quit IRC | 14:02 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Use long_rpc_timeout in select_destinations RPC call https://review.openstack.org/607735 | 14:04 |
mriedem | dansmith: a couple of questions in https://review.openstack.org/#/c/617898/ | 14:11 |
*** mmethot has quit IRC | 14:13 | |
*** mmethot has joined #openstack-nova | 14:17 | |
*** k_mouza has joined #openstack-nova | 14:20 | |
dansmith | jaypipes: you had feelings on this in the past, if you want to chime in ^ | 14:20 |
*** k_mouza_ has quit IRC | 14:23 | |
*** awaugama has joined #openstack-nova | 14:25 | |
cdent | dtantsur: join us in #openstack-placement | 14:27 |
*** jdillaman has joined #openstack-nova | 14:30 | |
*** liuyulong has joined #openstack-nova | 14:30 | |
mriedem | dansmith: replied. i'd be +2 on that now unless you are going to update the little CRUD APIs thing | 14:37 |
*** s10 has quit IRC | 14:37 | |
mriedem | i could see value in trying to document your replies about alternatives to filtering on cell etc, but that might be more work than it's worth right now | 14:38 |
dansmith | mriedem: yep, I'll fix the crud wording first | 14:39 |
efried | prometheanfire, sean-k-mooney: If we're backporting that oslo.service change, we need to backport the nova fixage to those mocks. This was a pretty big PITA when we did it on master, requiring a weird lockstep of patches in nova and requirements. Let me find it quick... | 14:40 |
efried | prometheanfire, sean-k-mooney: Okay, so I think it was, in this order: | 14:42 |
efried | Remove the mocks from nova: https://review.openstack.org/#/c/616697/ | 14:42 |
efried | Update the requirements: https://review.openstack.org/#/c/616371/ | 14:42 |
efried | Update nova to use the mocks and require the new release: https://review.openstack.org/#/c/615724/ | 14:42 |
sean-k-mooney | efried: yes we would or we could not backport the oslo.service change at all | 14:43 |
efried | sean-k-mooney: Or we could just backport "remove the mocks". The only thing it affects is wallclock time for tox. The mocks are just avoiding real sleeps. | 14:44 |
sean-k-mooney | i would personally prefer to revert the oslo change from the 1.31.x branch but that said it only breaks the unit tests and does pass functional and tempest tests | 14:44 |
sean-k-mooney | efried: ya that is an option | 14:45 |
efried | yes, it's UT only. And it's because nova is mocking private things from oslo.service, and those private things are re/moved with that fix. | 14:45 |
efried | (and that was my bad, mocking the privates) | 14:45 |
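The point about the mocks only avoiding real sleeps can be illustrated generically. This is not the nova or oslo.service code: `wait_until` is a hypothetical polling helper, and the test patches the public `time.sleep` rather than any private member of a library, which is the kind of coupling that broke when the private names moved.

```python
import time
from unittest import mock


def wait_until(predicate, interval=1.0, attempts=5):
    """Poll predicate, sleeping between attempts (hypothetical helper)."""
    for _ in range(attempts):
        if predicate():
            return True
        time.sleep(interval)
    return False


def run_without_real_sleep():
    """Exercise wait_until with time.sleep patched out.

    The predicate fails twice and then succeeds, so the patched sleep is
    called twice and no wall-clock time is spent waiting.
    """
    results = iter([False, False, True])
    with mock.patch("time.sleep") as fake_sleep:
        ok = wait_until(lambda: next(results), interval=30.0)
    return ok, fake_sleep.call_count
```

Removing such a patch changes only how long the test run takes, which matches efried's remark that backporting "remove the mocks" would affect only wallclock time for tox.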
openstackgerrit | Dan Smith proposed openstack/nova master: Add CellsV2 FAQ about API design decisions https://review.openstack.org/617898 | 14:45 |
sean-k-mooney | efried: yes but they were removed in a release of oslo.service that was above the max allowed by the upper-constraints for that release | 14:46 |
sean-k-mooney | efried: redhat has backported this internally and it broke everything so i know it will make lyarwood happy if we fixed nova upstream to work with that backport | 14:47 |
sean-k-mooney | efried: i guess https://review.openstack.org/#/c/616697/ is relatively small | 14:48 |
efried | sean-k-mooney: I'm going to take the morning off. If you and/or dhellmann and/or prometheanfire want to fix it up, cool, or bug me about it later and I can propose whatever. | 14:48 |
efried | sean-k-mooney: Yes, it's trivial. | 14:48 |
sean-k-mooney | ok i can propose the backport for https://review.openstack.org/#/c/616697/2 | 14:49 |
sean-k-mooney | we cant bump to 1.33 on stable however | 14:50 |
sean-k-mooney | so we will need to get them to backport the sleep fixture. | 14:51 |
openstackgerrit | Eric Fried proposed openstack/nova master: Remove v1 check in Cinder client version lookup https://review.openstack.org/617927 | 14:52 |
*** munimeha1 has joined #openstack-nova | 14:53 | |
openstackgerrit | Eric Fried proposed openstack/nova master: Consider root id is None in the database case https://review.openstack.org/613305 | 14:54 |
*** sahid has left #openstack-nova | 14:59 | |
efried | sean-k-mooney: It looks like that's proposed anyway: https://review.openstack.org/#/c/617989/ | 14:59 |
*** jcosmao has joined #openstack-nova | 14:59 | |
sean-k-mooney | efried: yes, chatting to them on the oslo channel | 15:01 |
*** efried is now known as efried_pto | 15:02 | |
sean-k-mooney | i'll propose the backport for the 2 patches you suggested | 15:02 |
efried_pto | thanks sean-k-mooney | 15:02 |
sean-k-mooney | actually hberaud is going to do it but i'll keep an eye on it. enjoy your morning off | 15:02 |
*** udesale has joined #openstack-nova | 15:05 | |
*** tbachman has quit IRC | 15:08 | |
*** tidwellr_ has joined #openstack-nova | 15:09 | |
*** cfriesen has joined #openstack-nova | 15:12 | |
jaypipes | dansmith: done | 15:13 |
jaypipes | dansmith: thx for the heads up on that. | 15:13 |
*** edmondsw has joined #openstack-nova | 15:14 | |
*** Luzi has quit IRC | 15:14 | |
dansmith | jaypipes: thanks | 15:19 |
openstackgerrit | Chris Dent proposed openstack/nova master: WIP: Delete the placement code https://review.openstack.org/618215 | 15:19 |
mriedem | if someone is looking to update the resurrected ops guide docs about cells https://bugs.launchpad.net/openstack-manuals/+bug/1804253 | 15:21 |
openstack | Launchpad bug 1804253 in openstack-manuals "Capacity planning and scaling in Operations Guide - cells information is out of date" [Undecided,New] | 15:21 |
mriedem | ^ is still all cells v1 | 15:21 |
*** BjoernT has joined #openstack-nova | 15:25 | |
*** k_mouza_ has joined #openstack-nova | 15:25 | |
openstackgerrit | Hervé Beraud proposed openstack/nova stable/rocky: remove mocks of oslo.service private members https://review.openstack.org/619019 | 15:26 |
BjoernT | Hello, is anyone here familiar with the implementation of ComputeManager._run_image_cache_manag? We are running into performance issues on an NFS-mounted /var/lib/nova/instances directory and have had to increase the rpc response timeout. | 15:26 |
*** k_mouza has quit IRC | 15:29 | |
*** tbachman has joined #openstack-nova | 15:31 | |
sean-k-mooney | that ^ sounds like an mdbooth kind of question but he does not seem to be about currently | 15:32 |
*** tbachman has quit IRC | 15:37 | |
*** tbachman has joined #openstack-nova | 15:37 | |
openstackgerrit | Hervé Beraud proposed openstack/nova stable/rocky: Use SleepFixture instead of mocking _ThreadingEvent.wait https://review.openstack.org/619022 | 15:38 |
*** k_mouza_ has quit IRC | 15:39 | |
*** k_mouza has joined #openstack-nova | 15:40 | |
*** lpetrut has quit IRC | 15:40 | |
*** dave-mccowan has joined #openstack-nova | 15:41 | |
openstackgerrit | Jack Ding proposed openstack/nova master: Add I/O Semaphore to limit concurrent disk ops https://review.openstack.org/609180 | 15:42 |
*** k_mouza has quit IRC | 15:44 | |
*** k_mouza has joined #openstack-nova | 15:45 | |
*** maciejjozefczyk has quit IRC | 15:46 | |
*** maciejjozefczyk has joined #openstack-nova | 15:47 | |
prometheanfire | efried_pto: sean-k-mooney I'd say we are fine for now, nova may want to add an exclusion to its reqs.txt, or not | 15:47 |
*** liuyulong has quit IRC | 15:47 | |
*** maciejjozefczyk has quit IRC | 15:47 | |
prometheanfire | the update is being held back atm by reqs cross gating | 15:48 |
*** devep has joined #openstack-nova | 15:49 | |
prometheanfire | question is, are old versions of nova going to work with the oslo.service change (18.0.2 and the like), it sounds like not, which means packagers should be made aware | 15:49 |
sean-k-mooney | prometheanfire: https://review.openstack.org/#/c/619019/1 and https://review.openstack.org/#/c/619022/1 will fix the nova compatibility | 15:49 |
sean-k-mooney | prometheanfire: old versions of nova would work but the unites would not which may break packager build systems | 15:50 |
prometheanfire | unites / unit tests? | 15:52 |
*** tssurya has quit IRC | 15:55 | |
openstackgerrit | Merged openstack/nova master: Add description of custom resource classes https://review.openstack.org/616721 | 15:55 |
openstackgerrit | Merged openstack/nova master: Add CellsV2 FAQ about API design decisions https://review.openstack.org/617898 | 15:55 |
*** itlinux has quit IRC | 15:57 | |
*** Sundar has joined #openstack-nova | 16:06 | |
Sundar | jaypipes, dansmith, sean-k-mooney, cdent: Thanks for discussing the Nova-Cyborg spec in IRC y'day. I caught up with that. Will remove the Cyborg API signatures. and | 16:09 |
*** bhdn has joined #openstack-nova | 16:10 | |
Sundar | I still have some questions on what jaypipes expects. The os-acc is not going to handle devices by itself. It needs access to Cyborg db and drivers, which means the majority of work will happen in Cyborg. | 16:10 |
*** udesale has quit IRC | 16:11 | |
*** k_mouza has quit IRC | 16:11 | |
Sundar | sean-k-mooney: Re. request groups in device profiles, it is still not clear to me how we would handle co-location without them, i.e., we want 2 accelerators from 2 different RPs in the same device. | 16:13 |
*** hamzy has quit IRC | 16:14 | |
mriedem | os-acc is going to have direct db access to cyborg? | 16:17 |
*** k_mouza has joined #openstack-nova | 16:18 | |
*** artom has joined #openstack-nova | 16:19 | |
*** ccamacho has quit IRC | 16:20 | |
Sundar | mriedem: No. os-acc needs to call Cyborg REST APIs, and those calls do the bulk of the work. | 16:23 |
jaypipes | Sundar: currently on a call with sean-k-mooney and jangutter about os-vif. give me a little while to respond. | 16:23 |
mriedem | Sundar: ok, if os-acc were like os-brick and os-vif, i would expect it to deal with the physical devices on the host | 16:24 |
mriedem | and something like python-cyborgclient would be used for dealing with the cyborg rest API | 16:24 |
mriedem | or the openstacksdk | 16:24 |
mriedem | at least that's the model nova has for dealing with volumes and ports | 16:24 |
Sundar | jaypipes, Sure, NP | 16:24 |
Sundar | mriedem: I understand. os-acc is not an exact clone of os-vif or os-brick. For example, to bind an ARQ, a device may need to be configured or re-programmed. That requires a Cyborg driver which knows the details of that device. | 16:26 |
Sundar | mriedem: That is more like what a Neutron mechanism driver does. | 16:27 |
mriedem | hmm, | 16:27 |
mriedem | os-brick and os-vif definitely have plugins/drivers that do things based on the 'type' of device | 16:27 |
mriedem | but i'm just sitting in the peanut gallery here so ignore me | 16:28 |
Sundar | If we were to have separate drivers for os-acc and Cyborg, it would be cumbersome -- for example, tasks needed for device discovery/initialization (handled by Cyborg drivers) and tasks required for ARQ binding (initiated via os-acc) will have many commonalities. For instance, both may need ways to reset the device (or some part of it). | 16:33 |
Sundar | Apart from having two different driver installs/configures etc. | 16:34 |
*** devep_ has joined #openstack-nova | 16:34 | |
*** devep has quit IRC | 16:36 | |
Sundar | The os-vif plugins, from what I have seen, are handling Linux bridges, OVS, etc., not hardware per se. | 16:38 |
mriedem | i believe cinder (the service) uses os-brick | 16:41 |
mriedem | to avoid doing the same things in both places | 16:41 |
mriedem | jungleboyj: ^? | 16:41 |
* jungleboyj tries to catch up. | 16:42 | |
*** itlinux has joined #openstack-nova | 16:43 | |
dansmith | mriedem: Sundar I think it's entirely legit to think that not all device programming can be contained within os-acc | 16:43 |
dansmith | it's a lot more complicated of a thing than configuring an initiator or a bridge | 16:44 |
Sundar | dansmith: ^ +1 | 16:44 |
jungleboyj | mriedem: Your understanding is correct and we have different drivers in there depending on the type of device. | 16:44 |
mriedem | ok, again, peanut gallery | 16:45 |
jungleboyj | Both Cinder and Nova use os-brick so that we aren't duplicating code. | 16:45 |
sean-k-mooney | o/ | 16:45 |
jungleboyj | That does the work locally on the compute node and then anything that required work from the volume driver is done through the Cinder-API. | 16:46 |
dansmith | if we use brick as the analogy, | 16:47 |
dansmith | it would be like putting all the stuff that talks to the backend volume providers into os-brick | 16:47 |
sean-k-mooney | dansmith: the programming of the device is not really done by cyborg either, however | 16:49 |
sean-k-mooney | it will be delegating the fpga programming in the intel case to opae | 16:49 |
Sundar | sean-k-mooney: Not quite true. | 16:49 |
dansmith | sean-k-mooney: sure, just like cinder-volume doesn't actually sort the bits on the disk, but asks whatever api it has for the backend to do it | 16:49 |
Sundar | Cyborg will indeed call a Cyborg driver, which can call into device/vendor-specific drivers, such as OPAE or i915 (for GPUs), etc. | 16:50 |
sean-k-mooney | so i'm confused: are we all happy with the statement that nova's only interaction point with cyborg should be via os-acc and that os-acc should only interact with cyborg via its rest api? | 16:53 |
dansmith | I guess I'm not sure what is confusing.. we interact with cinder via the cinderclient/brick | 16:55 |
dansmith | I think we're hoping that os-acc can serve both purposes, right? | 16:55 |
mriedem | that's what i'm hearing be described | 16:55 |
mriedem | os-acc is both rest api client and low-level host device thing | 16:55 |
sean-k-mooney | yes and that os-acc can hold the definitions of any data structure that nova and cyborg have to share | 16:56 |
sean-k-mooney | mriedem: yes to the former, no to the latter | 16:56 |
mriedem | what? | 16:56 |
dansmith | huh? | 16:56 |
dansmith | jinx | 16:56 |
sean-k-mooney | os-acc would only be a rest client | 16:57 |
mriedem | i got jinxed by a couple of 7 year old girls the other night, couldn't talk for 10 minutes, it was....not hard | 16:57 |
sean-k-mooney | the low level programming is handled by the cyborg drivers and the tools they invoke | 16:57 |
Sundar | sean-k-mooney: True. But I think what is being said is that Nova views os-acc as performing the low-level tasks, i.e, it calls os-acc and leaves the details to it | 16:58 |
jaypipes | I really don't see why os-acc can't start off being a plug-the-device-into-the-VM library. | 16:58 |
dansmith | sean-k-mooney: I don't know why you're so intent on declaring that os-acc is only one thing or another | 16:58 |
dansmith | jaypipes: that's not a separate action | 16:58 |
jaypipes | dansmith: what do you mean? | 16:58 |
dansmith | jaypipes: plugging involves writing pci attachment into the virt xml for boot, right? | 16:58 |
dansmith | jaypipes: if there's any massaging of the device needed, like writing to sys to discover a new pci endpoint or something, then that seems like it could/should be in os-acc | 16:59 |
sean-k-mooney | jaypipes: would you be happy saying os-acc could start as being a program-the-device-for-the-vm lib? | 16:59 |
mriedem | in other news, we broke postgresql http://logs.openstack.org/periodic/git.openstack.org/openstack/neutron/master/neutron-tempest-postgres-full/a52bcf9/logs/screen-n-api.txt.gz?level=ERROR | 16:59 |
jaypipes | dansmith: yes, I agree with you. I just don't believe os-acc should be a REST API client to cyborg. | 17:00 |
dansmith | jaypipes: so we need a cyborgclient? | 17:00 |
dansmith | jaypipes: I'm not sure what os-acc would be doing if it's not talking to cyborg | 17:00 |
sean-k-mooney | dansmith: i would like to clearly scope what os-acc is so that i can understand what components perform what actions | 17:02 |
sean-k-mooney | Sundar: the reason i care about whether os-acc is just interacting with the rest api is because if it is not, e.g. it uses the RPC bus or cyborg db directly, then it has deployment impact | 17:03 |
sean-k-mooney | e.g. the credentials and connection info | 17:03 |
dansmith | sean-k-mooney: nobody is arguing for that are they? | 17:04 |
sean-k-mooney | if it interacts with the devices directly it has packaging impacts | 17:04 |
Sundar | sean-k-mooney: I agree. os-acc will indeed talk to Cyborg API, not the agents or drivers directly. | 17:04 |
slaweq | mriedem: bug reported: https://bugs.launchpad.net/nova/+bug/1804271 | 17:04 |
openstack | Launchpad bug 1804271 in OpenStack Compute (nova) "nova-api is broken in postgresql jobs" [Undecided,New] | 17:04 |
mriedem | slaweq: thanks again | 17:04 |
dansmith | sean-k-mooney: crossing over to the db/mq between projects would be a major issue I think | 17:04 |
jaypipes | (sorry, folks, I'm in another meeting...) | 17:04 |
slaweq | yw :) | 17:05 |
sean-k-mooney | dansmith: in past versions of the spec it was allowed to | 17:05 |
Sundar | Taking a step back: pretty much everything that needs to be done either requires Cyborg db access or device poking via Cyborg drivers. | 17:05 |
dansmith | Sundar: you understand you can't poke the cyborg db from os-acc or nova though right? | 17:06 |
*** fanzhang has quit IRC | 17:06 | |
Sundar | dansmith: Yea, that's why I am saying os-acc needs to call Cyborg API | 17:06 |
dansmith | yeah | 17:07 |
dansmith | I definitely never saw where in the spec it said that os-acc would talk directly to the cyborg db | 17:07 |
sean-k-mooney | Sundar: and similarly, do you want os-acc to be able to interact with the device directly? i assume no | 17:07 |
Sundar | Yup, no direct access | 17:07 |
mriedem | i think the confusion was because of this question / statement earlier, which prompted me to ask about direct db access: | 17:08 |
mriedem | (10:10:21 AM) Sundar: I still have some questions on what jaypipes expects. The os-acc is not going to handle devices by itself. It needs access to Cyborg db and drivers, which means the majority of work will happen in Cyborg. | 17:08 |
Sundar | sean-k-mooney: Even in past specs, os-acc wouldn't talk to Cyborg drivers -- it was talking to Cyborg agent on the same compute node -- and that was all prior to the Stein PTG | 17:08 |
dansmith | right, which is saying "it can't, because .. access to db" | 17:08 |
dansmith | right? | 17:08 |
mriedem | yes i realize it meant, "access to the db and drivers via the cyborg rest api" | 17:09 |
sean-k-mooney | Sundar: that was going to be my next question | 17:09 |
dansmith | well, I think it means "it has to ask cyborg via api to do that, because only cyborg has access to the drivers and db" but.. same difference | 17:09 |
sean-k-mooney | yes at the ptg we said os-acc would not talk to the agent directly either | 17:09 |
sean-k-mooney | so does os-acc talk to anything other than the cyborg api? | 17:09 |
Sundar | If you look at the os-acc API notes in the Nova spec, I have even identified which Cyborg API will be called in each scenario | 17:10 |
Sundar | sean-k-mooney: No | 17:11 |
sean-k-mooney | ok so the statement i made earlier that os-acc will only talk to the rest-api and is not the low-level device lib was correct | 17:11 |
mriedem | in that case, why not just use openstacksdk? | 17:12 |
mriedem | i realize this is bike shedding a bit, | 17:12 |
mriedem | but os-acc makes me think of os-vif and os-brick which definitely do not call REST APIs in cinder or neutron, | 17:12 |
sean-k-mooney | mriedem: there is no sdk support yet but that would also be a valid approach | 17:12 |
Sundar | mriedem: That's a good point. There is still a need for os-acc. | 17:12 |
mriedem | and if there is no python-cyborgclient, like there is no python-placementclient, we should just use openstacksdk | 17:12 |
Sundar | For example, os-acc associates device RPs with individual accelerator requests from the device profiles, because Nova/Placement don't do that | 17:13 |
dansmith | mriedem: well, I think the linkage to os-vif was around defining pluggable data types, so that things other than PCI would be doable | 17:14 |
mriedem | "For example, os-acc associates device RPs with individual accelerator requests from the device profiles, because Nova/Placement don't do that", | 17:14 |
mriedem | meaning it's going to be doing things like neutron agent for bandwidth provider inventory/allocations? | 17:14 |
sean-k-mooney | mriedem: meaning that when nova gets the allocation candidate from placement in the scheduler/conductor, os-acc will parse it, try and figure out which RP maps to each device profile, and tell cyborg | 17:15 |
gibi | mriedem: as a side note, I was pushed to the direction to do the mapping in a more generic way, between RequestGroup ovos in the RequestSpec and RPs in the allocation | 17:15 |
gibi | mriedem: which means the core of that code will be generic enough for cyborg use as well | 17:16 |
*** alexchadin has joined #openstack-nova | 17:16 | |
Sundar | Folks, there is a Cyborg client: https://github.com/openstack/python-cyborgclient | 17:17 |
sean-k-mooney | Sundar: yes there is but that is really the commandline client right | 17:17 |
Sundar | Yes ^ | 17:17 |
dansmith | but still, | 17:18 |
Sundar | I am just pointing out that it exists and is different from os-acc | 17:18 |
dansmith | no different than cinderclient or neutronclient right? | 17:18 |
sean-k-mooney | the openstack sdk in theory is meant to replace the python-client | 17:18 |
sean-k-mooney | dansmith: true | 17:18 |
dansmith | isn't the only reason we're going to have os-acc over cyborgclient is for the object definitions? | 17:19 |
dansmith | a la os-vif | 17:19 |
sean-k-mooney | if gibi's rp mapping code is generic enough nova could reuse it before calling into the sdk or cyborg client to update cyborg | 17:19 |
*** dtantsur is now known as dtantsur|afk | 17:19 | |
sean-k-mooney | dansmith: that is one of the main reasons yes | 17:19 |
Sundar | dansmith: yes, and for conversions to/from such objects. For example, if a Cyborg API returns a JSON, os-acc would convert that to an OVO. | 17:20 |
*** devep_ has quit IRC | 17:21 | |
dansmith | Sundar: that doesn't make any sense | 17:21 |
dansmith | or.. I hope you're just planning to return the serialized object.. :) | 17:21 |
*** alexchadin has quit IRC | 17:21 | |
Sundar | dansmith: The return values from Cyborg API are not necessarily the OVOs defined in os-acc. They are meant to be neutral, used even in scenarios where Cyborg may be used stand-alone | 17:23 |
dansmith | Sundar: well there's probably not much point in the OVOs then at that point | 17:23 |
sean-k-mooney | if we just use the python client or sdk we can just use python classes | 17:24 |
Sundar | dansmith: if you are OK with providing a versioned JSON to Nova as a return value from Cyborg API, that is fine too | 17:24 |
sean-k-mooney | and if we have json schema definitions for the cyborg api we can validate them | 17:24 |
openstackgerrit | Jack Ding proposed openstack/nova master: Add HPET timer support for x86 guests https://review.openstack.org/605902 | 17:24 |
*** lpetrut has joined #openstack-nova | 17:24 | |
Sundar | dansmith: sean-k-mooney: Are we now saying os-acc is not adding much value, and Nova can directly call Cyborg APIs? | 17:26 |
dansmith | Sundar: nova is going to use some sort of client regardless | 17:27 |
Sundar | Yes, agreed ^ | 17:27 |
dansmith | Sundar: honestly, I'm not sure where we're at now.. I thought os-acc was going to be the only clienty thing and that it was going to define object models to be passed over the api between the services | 17:27 |
dansmith | sounds like that's not clear | 17:27 |
dansmith | and there is another cyborg api client already | 17:27 |
*** tobias-urdin is now known as tobias-urdin_afk | 17:27 | |
dansmith | so I dunno | 17:27 |
Sundar | dansmith: I agree with that! That's what I thought too | 17:28 |
*** sridharg has quit IRC | 17:28 | |
Sundar | If Cyborg returns a JSON value, os-acc could subject that to JSON schema validation, like what Sean said | 17:28 |
mriedem | i assume sean-k-mooney wanted os-acc to do ovo translation stuff ala the long-held dream of ovo version negotiation for nova/neutron and os-vif | 17:28 |
sean-k-mooney | Sundar: you raise tho in that case the python classes that define the cyborg api resources before they are converted to json would live in os-acc in that case | 17:28 |
sean-k-mooney | mriedem: that would be nice but not required | 17:29 |
sean-k-mooney | mriedem: i was hoping it would work like neutron-lib which contains the api definitions | 17:29 |
mriedem | sounds like we're building in more complexity than we really need right now | 17:29 |
mriedem | if nova just needs to call apis, then cyborgclient is the way to go | 17:29 |
mriedem | adding in ovo sugar later via an os-acc library could be done when needed | 17:30 |
dansmith | Sundar: any client can/would/should do schema validation | 17:30 |
sean-k-mooney | mriedem: one thing i would like, however, is that we do not need to make regular commits to nova to account for changes to the cyborg api | 17:31 |
dansmith | Sundar: not sure why that means os-acc should be separate | 17:31 |
*** sapd1 has joined #openstack-nova | 17:31 | |
dansmith | sean-k-mooney: that's the case for any client | 17:31 |
sean-k-mooney | sure | 17:31 |
Sundar | dansmith: The current Cyborg client is not designed to convert the return values of APIs to what Nova expects. We would need something on top of that, and that could be os-acc | 17:32 |
sean-k-mooney | i think os-acc originally came about because of the desire to be able to talk to the cyborg agent on the same host as the nova agent | 17:32 |
sean-k-mooney | that is not part of the spec anymore | 17:32 |
mriedem | most of the existing python-*client projects in openstack don't do response validation, they just take the response payload and throw it into a dict-like object | 17:32 |
dansmith | Sundar: nova can digest whatever the client returns | 17:33 |
sean-k-mooney | Sundar: or we define a set of classes that represent the public api and nova will just use that | 17:33 |
dansmith | Sundar: we don't need a whole library to turn one dict into another | 17:33 |
mriedem | the caller needs to understand what is in that object, field-wise, based on version | 17:33 |
sean-k-mooney | mriedem: yes that is true, which is totally fine for stable apis | 17:34 |
*** sapd1 has quit IRC | 17:35 | |
sean-k-mooney | the cyborg api is not mature and stable yet and that might lead to a non-zero amount of churn on the nova side to handle unless the python cyborg client can provide a semi-stable subset we can use | 17:35 |
Sundar | mriedem: sean-k-mooney: Cyborg API is versioned. In the event of changes, we would move to v2 | 17:36 |
sean-k-mooney | Sundar: sure but nova would have to be adapted to V2 | 17:36 |
mriedem | Sundar: does cyborg use microversions? | 17:36 |
Sundar | mriedem: Cyborg API return values are documented -- now in the Nova spec but eventually in a Cyborg spec | 17:36 |
Sundar | mriedem: No, not today. | 17:37 |
mriedem | are there plans to? or just bump major versions whenever there is an api change? | 17:37 |
mriedem | GET /v65/accelerators | 17:38 |
*** lbragstad has quit IRC | 17:38 | |
mriedem | anyway, again, probably not really necessary for this conversation for what nova needs | 17:38 |
Sundar | mriedem: Currently, I haven't planned on microversions. I think you mean a microversion per API call? | 17:38 |
mriedem | microversions are per request yes | 17:38 |
mriedem | if not specified, there is a minimum default | 17:39 |
mriedem | 2.1 for nova, but i'd expect 1.0 or something for cyborg | 17:39 |
Sundar | If we go with major versions alone, would moving to microversions later cause upgrade issues? Not if we bump major version *and* introduce microversions with that, I suppose? | 17:40 |
mriedem | nova had v2.0 and then microversions were added in 2.1, which was backward compatible with v2.0 | 17:40 |
sean-k-mooney | Sundar: the difference is usually you don't run multiple major versions at the same time | 17:40 |
mriedem | nova was going to have a v3 but that became v2.1 | 17:41 |
Sundar | Sounds good | 17:41 |
*** lbragstad has joined #openstack-nova | 17:41 | |
mriedem | cinder had v1 and v2, | 17:41 |
mriedem | and they added microversions in v3.0 | 17:41 |
mriedem | so it sounds like you'd be following the cinder model | 17:41 |
sean-k-mooney | Sundar: in a single boot request we could call cyborg with multiple versions if it supported microversions | 17:41 |
mriedem | anywho | 17:42 |
sean-k-mooney | e.g bind with 1.1, program with 1.5 | 17:42 |
mriedem | the advantages of microversions aren't really necessary for what nova needs initially with cyborg | 17:42 |
Sundar | Hmm, ok. I need to think about that. Given the long task list for Cyborg in Stein, maybe we can introduce microversions later, as needed | 17:42 |
mriedem | and it sounds like we don't need something translating JSON responses to OVOs | 17:43 |
sean-k-mooney | Sundar: what is the status of the deployable api endpoint currently? | 17:43 |
Sundar | dansmith: sean-k-mooney: mriedem: What I am gathering is, we need a client for Nova to call Cyborg. That can be just the Cyborg client, as os-acc is not adding much value. Is that correct? | 17:43 |
mriedem | so what i'm hearing is nova just needs cyborgclient | 17:43 |
mriedem | Sundar: i think so | 17:43 |
mriedem | before we make this all so complicated that we never get anything done, we should probably start with that | 17:44 |
sean-k-mooney | if we can reuse gibi's resource provider mapping code for cyborg then yes | 17:44 |
Sundar | sean-k-mooney: is that a long haul? Can we reasonably expect that to get in by, say, Jan? | 17:45 |
gibi | Sundar: the non-generic mapping code is up in gerrit, I just got the comment yesterday that it can be done a lot more generic way so I'm working on that right now. I think this week I can publish the generic code | 17:46 |
gibi | Sundar: https://review.openstack.org/#/c/616239 | 17:46 |
Sundar | gibi: Excellent, thanks! | 17:46 |
dansmith | Sundar: as we have said, serializing all of this work is definitely going to blow all your timelines | 17:46 |
*** adrianc__ has joined #openstack-nova | 17:46 | |
*** adrianc_ has quit IRC | 17:47 | |
sean-k-mooney | Sundar: has any work progressed on the ci front | 17:47 |
Sundar | dansmith: sean-k-mooney: Can you please state 'for the record' that you are ok with dispensing os-acc and having Nova call Cyborg directly? | 17:47 |
Sundar | sean-k-mooney: The concept of Deployables is being reworked to map to RPs. What do you mean by 'CI front'? Zuul checking for Cyborg? | 17:48 |
sean-k-mooney | Sundar: yes we can call directly without os-acc via the python clients. | 17:48 |
*** adrianc_ has joined #openstack-nova | 17:48 | |
dansmith | Sundar: once again, nova will use a client library and not call directly. I do not care what the name of that thing is | 17:48 |
sean-k-mooney | i'm agreeing with dan by the way even though we said it differently | 17:48 |
dansmith | yes | 17:49 |
Sundar | dansmith: If it is the standard Cyborg client, Nova will have to do the necessary conversions to/from JSON. Hence the question | 17:49 |
sean-k-mooney | what i meant for the ci front is we talk about the need to do some basic integration testing | 17:49 |
dansmith | Sundar: when nova uses a client library, it has to deal with the output of that client | 17:49 |
dansmith | Sundar: no client library returns exactly what we need with no massaging, of course | 17:49 |
Sundar | dansmith: Great. I will update the spec to skip os-acc. Thanks a lot to you, sean-k-mooney and mriedem. | 17:50 |
Sundar | I still have the question on colocation | 17:50 |
Sundar | Re. request groups in device profiles, it is still not clear to me how we would handle co-location without them, i.e., we want 2 accelerators from 2 different RPs in the same device | 17:51 |
Sundar | sean-k-mooney: You mentioned NUMA affinity in your replies on the spec. I am looking at co-location within a device | 17:51 |
*** adrianc__ has quit IRC | 17:51 | |
* sean-k-mooney reading one sec | 17:52 | |
sean-k-mooney | Sundar why would you have 2 different RP on the same device instead of 2 inventories in 1 device | 17:53 |
Sundar | Because they have different traits | 17:53 |
sean-k-mooney | * 2 inventories in 1 rp | 17:53 |
sean-k-mooney | ok then have 1 RP for the device and two nested RPs for the two accelerators | 17:54 |
Sundar | Say an FPGA with 2 regions: one has compression, other has encryption and we want to gang them up together | 17:54 |
sean-k-mooney | then we can use in_tree=<device rp uuid> for the colocation | 17:54 |
dansmith | Sundar: those aren't traits, right? | 17:55 |
Sundar | sean-k-mooney: Would the traits be applied to the parent RP or the children RPs? And, more importantly, is that scenario working today? | 17:55 |
dansmith | Sundar: those are inventories on a single provider, no? | 17:55 |
Sundar | dansmith: No, they are in 2 different RPs, but on the same device (PCI card for e.g.) | 17:55 |
sean-k-mooney | Sundar: i was thinking the child RPs | 17:55 |
dansmith | Sundar: in that case, what would the inventories be? | 17:55 |
Sundar | Because they represent different functions, and functions are traits, they would show up as 2 RPs | 17:56 |
sean-k-mooney | they are different resource classes also | 17:56 |
dansmith | sean-k-mooney: right which is why they can be inventories on the same provider | 17:56 |
*** adrianc_ has quit IRC | 17:56 | |
Sundar | Each RP would contain resources of the class CUSTOM_ACCELERATOR_FPGA | 17:56 |
sean-k-mooney | one is COMPRESS_MBs and the other is CRYPTO_MBs | 17:56 |
dansmith | I thought the whole point of this was to expose functions as consumable things? | 17:57 |
Sundar | sean-k-mooney: No, we agreed both in rocky and Stein PTGs that the RCs reflect the device type (e.g. GPU, FPGA), not device details or functions | 17:57 |
dansmith | omg | 17:57 |
dansmith | that is not what I thought we agreed | 17:57 |
sean-k-mooney | in any case i think nested RPs can handle this use case | 17:57 |
Sundar | So, a request may look like: resources:CUSTOM_ACCELERATOR_FPGA=1; trait:CUSTOM_FUNCTION_A=required | 17:58 |
dansmith | because in that case, all cyborg is ever going to expose is a thousand RPs with the same =1 inventories, decorated with super complex traits to describe what is in them at any given point right? | 17:58 |
Sundar | If the RCs reflect functions, the inventories of RPs will change all the time, as devices get reconfigured | 17:59 |
dansmith | so will the traits right? | 17:59 |
sean-k-mooney | the hack here is we can't delete or recreate inventories if there are allocations against them but we can change the traits | 17:59 |
Sundar | dansmith: Why super-complex traits? I documented a small handful (4 or 5 at the most), plus whatever custom traits that PowerVM guys asked for | 18:00 |
*** derekh has quit IRC | 18:00 | |
sean-k-mooney | Sundar: there are more than 5 traits just to describe crypto functions | 18:00 |
dansmith | sean-k-mooney: right, but we can atomically update multiple inventory things at the same time | 18:00 |
dansmith | I mean, we can atomically update traits too I guess, but.. I totally thought this was going the direction of inventory being functions, and traits being actual, you know, traits about the device like brand, model capabilities, etc | 18:01 |
*** sapd1 has joined #openstack-nova | 18:01 | |
Sundar | sean-k-mooney: My point is they all have the same structure: CUSTOM_FUNCTION_foo, CUSTOM_DEVICE_MODEL_bar, etc. | 18:02 |
sean-k-mooney | ok lets try to get the most basic version of integration working first | 18:03 |
*** spatel has joined #openstack-nova | 18:03 | |
Sundar | dansmith: It would be simpler if all variations happened in traits, while RCs are more or less static -- a GPU accelerator will never become an FPGA accelerator. | 18:03 |
sean-k-mooney | Sundar: the expectation from a placement point of view is that both traits and resource classes would be largely static but could change over time | 18:04 |
Sundar | dansmith: This is documented in the specs -- both before and after the Stein PTG. But I am *not* trying to guilt-trip you :) | 18:04 |
openstackgerrit | Elod Illes proposed openstack/nova master: Transform scheduler.select_destinations notification https://review.openstack.org/508506 | 18:05 |
sean-k-mooney | is there a way today to use cyborg to deploy something without special hardware that we could use as a pilot to test the workflow? | 18:06 |
dansmith | Sundar: " a GPU will never be an FPGA" is not an argument that means anything to me in this context | 18:06 |
Sundar | sean-k-mooney: Makes sense. A device model trait is not expected to change much -- except perhaps on firmware updates. A function trait will change only when orchestration programs it (not if the VM programs it, which is the Device as a Service use case) | 18:06 |
*** sapd1 has quit IRC | 18:06 | |
dansmith | Sundar: I thought the discussion had previously gone that a user says "I want two TLS offload devices, and they need to be able to support crypto $foo" | 18:07 |
dansmith | Sundar: but what you're saying is that they will need to say "I need two FPGAs and they need to have traits TLS_OFFLOAD and TLS_MECH_FOO" | 18:07 |
*** k_mouza_ has joined #openstack-nova | 18:07 | |
dansmith | which means cyborg isn't providing us much in the way of abstraction | 18:07 |
dansmith | anyway, I'm about out of energy for discussing this at this point, so I'll leave it to the others that are more invested | 18:08 |
*** moshele has joined #openstack-nova | 18:08 | |
Sundar | dansmith: Are you ok if sean-k-mooney and I continue the discussion? And are you ok with whatever conclusion we reach? :) | 18:09 |
*** k_mouza has quit IRC | 18:10 | |
*** moshele has quit IRC | 18:10 | |
Sundar | sean-k-mooney: You have been closely following my specs (and thanks for that). Are you in alignment with this representation? | 18:10 |
dansmith | Sundar: you can of course discuss anything you want, and no I'm not signing off on something I haven't read | 18:11 |
sean-k-mooney | Sundar: i am honestly getting quite tired also, can we pick this up later in the week? | 18:12 |
Sundar | dansmith: Ok, tried my luck there. Can we talk tomorrow same time? | 18:12 |
*** k_mouza_ has quit IRC | 18:12 | |
sean-k-mooney | perhaps we should do it on the cyborg channel not to flood nova | 18:12 |
dansmith | Sundar: honestly, I'm not sure we're making progress here | 18:13 |
dansmith | Sundar: and no, I can't be involved in every discussion, I'm just saying I reserve the right to be unhappy with the next round of the spec | 18:13 |
Sundar | sean-k-mooney: Sure. Same time tomorrow? | 18:13 |
dansmith | Sundar: you need more than just sean-k-mooney in agreement on this | 18:13 |
sean-k-mooney | i would honestly love to just prototype something end to end that works and see what it looked like | 18:13 |
dansmith | and getting everyone into a single irc channel at the same time is just not going to happen repetitively | 18:14 |
dansmith | sean-k-mooney: ++ | 18:14 |
*** moshele has joined #openstack-nova | 18:14 | |
Sundar | dansmith: Sorry to hear that. I thought we made progress by agreeing to skip os-acc. | 18:14 |
dansmith | sean-k-mooney: I'm getting spec fatigue I think.. a series of patches on both sides that actually does something we can evaluate might be a better stepping stone | 18:15 |
Sundar | sean-k-mooney: I am with you, but Cyborg folks are reluctant to move till Nova spec converges, so I am facing a catch-22 | 18:15 |
dansmith | sean-k-mooney: it'll be trivial to look at that and evaluate how things are being done | 18:15 |
sean-k-mooney | so lets create a feature branch | 18:15 |
dansmith | Sundar: no, that's not a legit argument | 18:15 |
dansmith | Sundar: you can put up patches against nova and cyborg and test them together without merging anything | 18:15 |
dansmith | we do it all the time for big complex things like this | 18:15 |
dansmith | if there isn't already, cyborg should have a fake driver that can just pretend to offer up devices and program them, | 18:16 |
*** moshele has quit IRC | 18:17 | |
dansmith | and that should be enough to do some interaction testing between the two services, even if nothing actually gets attached at the final step to the vm | 18:17 |
dansmith | sean-k-mooney: agree with that ^ ? | 18:17 |
Sundar | A feature branch upstream? | 18:17 |
sean-k-mooney | yes | 18:17 |
spatel | sean-k-mooney: do you have experience with rabbitmq ? | 18:17 |
sean-k-mooney | spatel: not much sorry | 18:17 |
spatel | no worry!! | 18:18 |
sean-k-mooney | dansmith: i would love to take that approach | 18:18 |
Sundar | dansmith: sean-k-mooney: OK, thanks for your time. | 18:20 |
openstackgerrit | Artom Lifshitz proposed openstack/nova-specs master: Re-propose numa-aware-live-migration spec https://review.openstack.org/599587 | 18:22 |
jaypipes | holy crap, I missed a bunch... :( sorry, reading back up... | 18:23 |
sean-k-mooney | mriedem: i found a relatively simple and reliable way to reproduce https://bugs.launchpad.net/nova/+bug/1751923 by the way | 18:25 |
openstack | Launchpad bug 1751923 in OpenStack Compute (nova) "_heal_instance_info_cache periodic task bases on port list from nova db, not from neutron server" [Medium,In progress] - Assigned to Maciej Jozefczyk (maciej.jozefczyk) | 18:25 |
*** efried_pto is now known as efried | 18:28 | |
*** Sundar has quit IRC | 18:31 | |
*** tobias-urdin_afk is now known as tobias-urdin | 18:38 | |
*** devep has joined #openstack-nova | 18:42 | |
*** pcaruana has quit IRC | 18:47 | |
*** sapd1 has joined #openstack-nova | 18:50 | |
mriedem | sean-k-mooney: how is that? take down the neutron agent and reboot the vm or something? | 18:50 |
sean-k-mooney | mriedem: i added a script to the bug | 18:52 |
sean-k-mooney | you can cause it or a similar effect via the api | 18:53 |
sean-k-mooney | basically if you send the api request to detach a port to neutron and reboot the vm you end up with it broken | 18:54 |
sean-k-mooney | and you can't use openstack server add port or remove port to fix it | 18:54 |
sean-k-mooney | mriedem: https://bugs.launchpad.net/nova/+bug/1751923/comments/10 | 18:55 |
openstack | Launchpad bug 1751923 in OpenStack Compute (nova) "_heal_instance_info_cache periodic task bases on port list from nova db, not from neutron server" [Medium,In progress] - Assigned to Maciej Jozefczyk (maciej.jozefczyk) | 18:55 |
sean-k-mooney | mriedem: i was debating if i could make this into some kind of functional regression test but not sure how yet | 18:55 |
*** sapd1 has quit IRC | 18:56 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Apply DISTINCT clause to CellMapping.get_by_project_id for postgres https://review.openstack.org/619061 | 18:58 |
*** ralonsoh has quit IRC | 19:00 | |
mriedem | slaweq: hopefully this ^ does the ojb | 19:01 |
mriedem | *job | 19:01 |
mriedem | i'm no pg expert though | 19:02 |
mriedem | my socks have officially been rocked off | 19:12 |
sean-k-mooney | find something interesting? | 19:16 |
*** zul has quit IRC | 19:22 | |
*** hamzy has joined #openstack-nova | 19:25 | |
prometheanfire | I'm not one either, but do use it for openstack, pg question? | 19:31 |
prometheanfire | ok, above my head too :D | 19:32 |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/rocky: Add recreate test for bug 1799892 https://review.openstack.org/619075 | 19:39 |
openstack | bug 1799892 in OpenStack Compute (nova) "Placement API crashes with 500s in Rocky upgrade with downed compute nodes" [Medium,In progress] https://launchpad.net/bugs/1799892 - Assigned to Eric Fried (efried) | 19:39 |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/rocky: Consider root id is None in the database case https://review.openstack.org/619076 | 19:39 |
mriedem | upgrade issue in rocky so would be good to get that backport series moving ^ | 19:50 |
mriedem | the xenserver CI seems to be busted | 19:53 |
openstackgerrit | Jack Ding proposed openstack/nova-specs master: Flavor Extra Spec and Image Properties Validation https://review.openstack.org/618542 | 19:54 |
dansmith | mriedem: not merged in master yet right? | 19:54 |
mriedem | just rechecked it in the gate | 19:58 |
efried | mriedem: should we assign both branches of that bug back to tetsuro? | 20:00 |
efried | I at least put the master side back. | 20:00 |
mriedem | in launchpad? | 20:01 |
mriedem | yes | 20:01 |
dansmith | mriedem: so I've been half-assedly working on trying to do the manual metadata fill on single-instance-get | 20:12 |
dansmith | lots of weird things break just within db_api if we return a dict instead of a model | 20:12 |
dansmith | so I can go chase and fix all those things, | 20:12 |
dansmith | but I think jaypipes once flexed his db muscle and argued there was some way to change the way we do the joining to avoid the rowsplosion | 20:13 |
dansmith | so maybe we should challenge him on that before we get too far | 20:13 |
*** sapd1 has joined #openstack-nova | 20:13 | |
mriedem | you know how i get when jay flexes his muscles | 20:14 |
mriedem | i melt | 20:14 |
dansmith | yep | 20:14 |
dansmith | it's grotesque yet oddly satisfying | 20:14 |
jaypipes | ewww. | 20:14 |
jaypipes | dansmith: are you referring to the eagerload thing? | 20:15 |
dansmith | no | 20:15 |
dansmith | jaypipes: in the oldentimes, | 20:15 |
dansmith | we would load an instance, joined with metadata, system_metadata, etc | 20:15 |
dansmith | which would end up returning X*Y*Z rows for X instances, with Y rows of metadata and Z rows of sysmeta | 20:16 |
jaypipes | right. | 20:16 |
dansmith | which was the reason RAX failed to deploy icehouse after we moved flavor data to sysmeta | 20:16 |
dansmith | now we query metadata separately from the actual instance load | 20:16 |
dansmith | but I thought a convo with you a long time ago yielded you saying that we could do an inner-outer-blue-unicorn join to avoid that somehow | 20:17 |
jaypipes | yes, we can do a single query to get all instance metadata (and sysmeta) for all selected instances. | 20:17 |
dansmith | to be clear, | 20:18 |
jaypipes | a query that would just yield (instance_uuid, key, value) tuples. | 20:18 |
dansmith | no, that's not what I'm asking | 20:18 |
*** sapd1 has quit IRC | 20:18 | |
dansmith | the old query was returning the instance data itself, and the metadata key,value and the sysmeta key,value | 20:19 |
dansmith | so the instance data was repeated for every row in meta, sysmeta | 20:19 |
dansmith | that was a single query for the instance itself, and the metadata(s) | 20:19 |
dansmith | I know we can query the instance, and then query for the metadatas efficiently | 20:20 |
dansmith | but it's two queries | 20:20 |
dansmith | that's basically what we do now | 20:20 |
jaypipes | ok. well, that's the most efficient way to solve this particular problem. | 20:20 |
dansmith | two queries? | 20:20 |
jaypipes | yup. | 20:20 |
dansmith | okay | 20:21 |
dansmith | I'm pretty sure you called me a stupid ignoramus for doing that back in the icehouse days, | 20:21 |
jaypipes | I thought we were doing >1 query for grabbing instance metadata and system metadata | 20:21 |
dansmith | well, we are, but only for convenience | 20:21 |
jaypipes | no, I never called you anything. | 20:21 |
dansmith | YOU DID | 20:21 |
dansmith | we're doing three queries now, instance, meta, sysmeta | 20:22 |
jaypipes | if I ever said anything about performance it would have been because we were doing a query on instances, then *for each instance* issuing a query to get some instance metadata. | 20:22 |
dansmith | we're not doing that | 20:22 |
jaypipes | ok, coools. | 20:22 |
jaypipes | I can reduce the meta + sysmeta to a single query. | 20:22 |
dansmith | so I guess that means I have to keep plugging at this | 20:22 |
dansmith | jaypipes: yea, feel free, but that's separate from my other work here | 20:22 |
jaypipes | k. is there anything I can help you with on your work here? | 20:23 |
dansmith | jaypipes: you just did | 20:23 |
dansmith | thanks | 20:23 |
jaypipes | :) | 20:23 |
jaypipes | well, at least that gives me something to smile about today. | 20:23 |
jaypipes | thanks. | 20:23 |
* dansmith moves on to different excuses | 20:24 | |
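The "rowsplosion" discussed above can be reproduced with a toy schema. The sketch below uses an in-memory SQLite database and simplified, assumed table layouts (not nova's real models) to compare a single joined query against separate queries. The `UNION ALL` form is only one possible way to fold metadata and system metadata into a single query of (instance_uuid, key, value) tuples; the log does not say how jaypipes would actually write it.

```python
import sqlite3


def build_db(n_meta, n_sysmeta):
    """Create one instance with n_meta metadata and n_sysmeta sysmeta rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE instances (uuid TEXT PRIMARY KEY, hostname TEXT);
        CREATE TABLE instance_metadata
            (instance_uuid TEXT, key TEXT, value TEXT);
        CREATE TABLE instance_system_metadata
            (instance_uuid TEXT, key TEXT, value TEXT);
    """)
    conn.execute("INSERT INTO instances VALUES ('inst-1', 'vm1')")
    conn.executemany(
        "INSERT INTO instance_metadata VALUES ('inst-1', ?, 'v')",
        [("m%d" % i,) for i in range(n_meta)])
    conn.executemany(
        "INSERT INTO instance_system_metadata VALUES ('inst-1', ?, 'v')",
        [("s%d" % i,) for i in range(n_sysmeta)])
    return conn


def joined_rows(conn):
    """Old style: one query, instance columns repeated on every row."""
    return conn.execute("""
        SELECT i.uuid, i.hostname, m.key, m.value, s.key, s.value
        FROM instances i
        LEFT OUTER JOIN instance_metadata m ON m.instance_uuid = i.uuid
        LEFT OUTER JOIN instance_system_metadata s ON s.instance_uuid = i.uuid
    """).fetchall()


def separate_rows(conn):
    """Three queries: instance, metadata, system metadata."""
    inst = conn.execute("SELECT uuid, hostname FROM instances").fetchall()
    meta = conn.execute(
        "SELECT instance_uuid, key, value FROM instance_metadata").fetchall()
    sysmeta = conn.execute(
        "SELECT instance_uuid, key, value "
        "FROM instance_system_metadata").fetchall()
    return inst, meta, sysmeta


def combined_meta_rows(conn):
    """Assumed approach: metadata and sysmeta in one query via UNION ALL."""
    return conn.execute("""
        SELECT instance_uuid, 'meta' AS source, key, value
        FROM instance_metadata
        UNION ALL
        SELECT instance_uuid, 'sysmeta', key, value
        FROM instance_system_metadata
    """).fetchall()
```

With 4 metadata rows and 5 system metadata rows on one instance, the joined query returns 4 * 5 = 20 rows that each repeat the instance columns, while the separate queries return 1 + 4 + 5 = 10 rows in total.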
jaypipes | FTR, on the cyborg thing, I *also* believe that cyborg should be modeling inventories, not RPs with tons of traits masquerading as resource classes. | 20:25 |
jaypipes | dansmith: ^ | 20:25 |
dansmith | jaypipes: yay. | 20:25 |
* dansmith relishes in the small victories these days | 20:25 | |
jaypipes | indeed. | 20:25 |
mriedem | dansmith: on top of the improved join on what you're doing, i think it's a 2-part change in that the metadata api doesn't need to be pre-loading on system_metadata - at least not anymore | 20:26 |
dansmith | mriedem: yeah, so that will address the acute issue right? | 20:26 |
mriedem | last i looked the only thing in meta-api that would use sysmeta is a vendor data provider if configured | 20:26 |
dansmith | maybe I should punt this until we have a better reason to do this work | 20:26 |
mriedem | i believe so, and i think that's what the workday ops guy said he did in the ML | 20:27 |
mriedem | heh | 20:27 |
mriedem | see, i started trying to do what you said and sparks flew immediately | 20:27 |
mriedem | and i gave up | 20:27 |
dansmith | oh did you | 20:27 |
dansmith | ? | 20:27 |
mriedem | locally | 20:27 |
dansmith | maybe that should be my impetus to fix it | 20:27 |
mriedem | never pushed it up b/c tests failed horribly | 20:27 |
mriedem | to show me up? | 20:27 |
dansmith | yeah | 20:27 |
mriedem | by all means | 20:27 |
dansmith | nah, sounds hard. | 20:28 |
mriedem | next you can fix the nova/cinder cross az attach mess | 20:28 |
mriedem | which i have a fix for, but it's fugly as all get out | 20:28 |
dansmith | I like it already | 20:29 |
*** sapd1 has joined #openstack-nova | 20:34 | |
*** sapd1 has quit IRC | 20:38 | |
*** itlinux has quit IRC | 20:39 | |
efried | dansmith, jaypipes: Modeling accelerators via specific resource classes, so like CUSTOM_FPGA_GZIP rather than rc=FPGA + traits=[GZIP] ? | 20:40 |
dansmith | GZIP isn't a trait, IMHO | 20:41 |
dansmith | like, I don't ask for SOME_SILICON=1024, trait=RAM | 20:41 |
efried | um. The FPGA is capable of processing gzips? | 20:41 |
jaypipes | the resource class is a context to a GZIP program flashed to a device. | 20:42 |
dansmith | right, but there's a difference between asking for an FPGA and asking for GZIP offload to me | 20:42 |
jaypipes | dansmith++ | 20:42 |
dansmith | if I want an FPGA that I can program myself, I want an FPGA=1.. if I want a GZIP handler, I want GZIP=1, | 20:42 |
dansmith | which might be an FPGA in the back end | 20:42 |
dansmith | or it might be an ASIC | 20:42 |
dansmith | or whatever | 20:42 |
efried | But then a resource provider representing a blank (as-yet-unprogrammed) FPGA would have to show inventories of multiple resource classes, and then when one of those is consumed, we would have to nix the other resource classes (or do the reserved=total trick for them). | 20:44 |
efried | which is racy, as well as being ew. | 20:44 |
dansmith | same for the trait right? | 20:44 |
dansmith | you say FPGA=1, traits=GZIP,TLS,BITCOIN | 20:44 |
efried | no. GZIP_CAPABLE stays. Not sure we have to retrait every time we reprogram. | 20:44 |
efried | but if we do, the GZIP_IS_ON_THIS_THING_AT_THE_MOMENT trait would be separate, and have separate meaning, than the GZIP_CAPABLE. | 20:45 |
dansmith | you're just providing no abstraction there | 20:45 |
efried | the former would be used only as an optimization, if/when we have "preferred traits", to avoid reprogramming if there's one that's already set up. | 20:46 |
*** david-lyle is now known as dklyle | 20:46 | |
jaypipes | premature optimization... | 20:46 |
efried | I'd be happy if we skipped that whole bit for the first pass and just used the *_CAPABLE traits. | 20:47 |
efried | Programming gets done after the claim, if and as necessary. | 20:47 |
jaypipes | I'd be happy if we just skipped everything other than just using custom resource classes. | 20:47 |
efried | jaypipes: So preprogram everything? | 20:47 |
dansmith | pretty sure we said the first step was assuming everything was static, no? | 20:48 |
dansmith | except for the "user will program it themselves" case of course | 20:48 |
jaypipes | dansmith++ again. | 20:48 |
dansmith | when we talked about this in (denver I think?) I think the overwhelming majority of cases where this really applies is the pre-programmed case, | 20:49 |
dansmith | because it provides for locality in certain FPGAs that have one code region and multiple execution contexts, | 20:49 |
dansmith | such that if you co-locate a GZIP and a TLS, they both can't use the same FPGA, but if you get two GZIP tenants on the same box, they can | 20:50 |
dansmith | and I thought we agreed to avoid boiling the ocean with "everything is completely dynamic all the time forever" until we could do, you know, fucking anything :) | 20:50 |
efried | I can buy it for a first pass. Long-term, that seems like not very cloudy. Though I suppose if the "pre"programming is done by a higher orchestrator, it could fly. | 20:50 |
dansmith | maybe that's just me missing something, but.. | 20:50 |
efried | okay, thanks for the refresher. | 20:51 |
dansmith | efried: well, if you do it with inventories, you can actually count usage of those things, | 20:51 |
dansmith | and then your pre-programming workflow can ensure that X% of GZIP is available based on current demand | 20:51 |
dansmith | but if you do it with traits that seems a lot messier | 20:51 |
dansmith | usage and capacity I mean | 20:52 |
slaweq | mriedem: thx for taking care of this issue | 20:52 |
*** priteau has quit IRC | 20:52 | |
dansmith | efried: the orchestrator that maintains a certain amount of available inventory of stuff, I mean | 20:52 |
jaypipes | I was under the impression Cyborg was gonna contain that "pre-orchestrator/pre-programming" thing... | 20:52 |
* dansmith is tripping all over himself | 20:52 | |
dansmith | jaypipes: yeah I dunno, could be, or could be a cron job you run once an hour that makes sure the capacity is within limits | 20:53 |
dansmith | meaning I dunno if that level of cron is something cyborg was going to do or not, | 20:53 |
efried | "load balancing" your accelerators | 20:53 |
efried | guess that implies moving them around, which isn't what we're talking about | 20:53 |
efried | but yeah, I get the idea. | 20:53 |
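dansmith's point that inventories let you count usage and capacity, while traits only say whether something qualifies, can be sketched as below. This is a toy model, not the placement API: the `Inventory` and `Provider` classes, the helper functions, and the `CUSTOM_GZIP_CAPABLE` trait name are all hypothetical; only `CUSTOM_FPGA_GZIP` comes from the discussion above.

```python
from dataclasses import dataclass, field


@dataclass
class Inventory:
    total: int
    used: int = 0

    @property
    def available(self) -> int:
        return self.total - self.used


@dataclass
class Provider:
    name: str
    inventories: dict = field(default_factory=dict)  # resource class -> Inventory
    traits: set = field(default_factory=set)


def claim(providers, resource_class, amount=1):
    """Consume `amount` of `resource_class` from the first provider with room."""
    for provider in providers:
        inv = provider.inventories.get(resource_class)
        if inv is not None and inv.available >= amount:
            inv.used += amount
            return provider.name
    return None


def usage(providers, resource_class):
    """Return (used, total) across all providers for one resource class."""
    invs = [p.inventories[resource_class] for p in providers
            if resource_class in p.inventories]
    return sum(i.used for i in invs), sum(i.total for i in invs)


def trait_matches(providers, trait):
    """Trait-only view: who qualifies, with no notion of how much is left."""
    return [p.name for p in providers if trait in p.traits]
```

A pre-programming job like the hourly cron dansmith mentions could poll `usage(providers, "CUSTOM_FPGA_GZIP")` and program more devices when the free share drops below a threshold. The trait-only view returns the same list of providers regardless of how many claims have been made.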
*** itlinux has joined #openstack-nova | 21:13 | |
*** dave-mccowan has quit IRC | 21:28 | |
*** rcernin has joined #openstack-nova | 21:38 | |
*** lpetrut has quit IRC | 21:41 | |
*** hamzy has quit IRC | 21:42 | |
jaypipes | sean-k-mooney: questions for you on https://review.openstack.org/#/c/602384/ please | 21:46 |
*** k_mouza has joined #openstack-nova | 21:57 | |
*** mchlumsky has quit IRC | 22:00 | |
*** k_mouza has quit IRC | 22:01 | |
*** devep has quit IRC | 22:04 | |
*** awaugama has quit IRC | 22:04 | |
openstackgerrit | Chris Dent proposed openstack/nova master: Use external placement in functional tests https://review.openstack.org/617941 | 22:09 |
*** BjoernT has quit IRC | 22:09 | |
openstackgerrit | Chris Dent proposed openstack/nova master: WIP: Delete the placement code https://review.openstack.org/618215 | 22:09 |
*** mchlumsky has joined #openstack-nova | 22:11 | |
mriedem | efried: turns out https://review.openstack.org/#/c/619061/ does fix the pg thing | 22:13 |
mriedem | also, wee lots of red http://logs.openstack.org/05/613305/7/check/tempest-full/999ec9f/controller/logs/screen-n-api.txt.gz?level=ERROR | 22:14 |
mriedem | unrelated regression | 22:14 |
efried | ack x2 | 22:14 |
efried | mriedem: What were the func test failures? Actual differences in results? | 22:15 |
mriedem | dansmith: looks like an unintended side effect of using the scatter_gather_single_cell for nova show is the scatter thing logs errors from the query^ even for things we expect | 22:15 |
mriedem | efried: yeah, got 5 rows when 1 expected | 22:16 |
mriedem | efried: pull it down and try it out | 22:16 |
efried | okay, must be aggregate functions | 22:16 |
efried | nah, higher priorities. | 22:16 |
dansmith | mriedem: oh yeah, I think I called that out initially and then totally forgot :( | 22:16 |
dansmith | it's spewing errors to the logs right? | 22:16 |
mriedem | yes | 22:16 |
mriedem | i'll open a bug | 22:16 |
efried | If it fixes the problem, let's roll with it. | 22:16 |
*** tbachman has quit IRC | 22:18 | |
mriedem | https://bugs.launchpad.net/nova/+bug/1804325 | 22:19 |
openstack | Launchpad bug 1804325 in OpenStack Compute (nova) "InstanceNotFound traceback errors in n-api logs while polling for server delete" [High,Triaged] | 22:19 |
mriedem | so, i think we just remove that exception line since the caller can get the actual exception type now | 22:20 |
mriedem | and decide if it needs to log | 22:20 |
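The fix mriedem outlines (stop logging inside the scatter helper, hand the exception back, and let the caller decide) can be sketched generically as below. This is not nova's `scatter_gather_cells` code: the function names, the thread pool, and the local `InstanceNotFound` stand-in are assumptions made for illustration.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

LOG = logging.getLogger(__name__)


class InstanceNotFound(Exception):
    """Local stand-in for an expected 'not found' error (hypothetical)."""


def scatter_gather(cells, fn):
    """Run fn(cell) for each cell; return {cell: result or exception object}.

    The helper deliberately does not log: it cannot know which failures
    the caller considers normal.
    """
    def run(cell):
        try:
            return fn(cell)
        except Exception as exc:
            return exc

    with ThreadPoolExecutor(max_workers=max(1, len(cells))) as pool:
        return dict(zip(cells, pool.map(run, cells)))


def show_server(cells, lookup):
    """Caller-side policy: stay quiet on expected misses, log the rest."""
    for cell, result in scatter_gather(cells, lookup).items():
        if isinstance(result, InstanceNotFound):
            continue  # expected, e.g. while polling for a server delete
        if isinstance(result, Exception):
            LOG.error("Unexpected error querying %s: %r", cell, result)
            continue
        return result
    return None
```

Because the caller sees the actual exception type, an expected miss no longer produces a traceback in the API log, while unexpected failures are still reported.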
mriedem | dansmith: you want it or shall i? | 22:20 |
dansmith | I don't want it | 22:20 |
dansmith | I'm "finishing one email" away from disappearing | 22:21 |
mriedem | are you around tomorrow? | 22:21 |
mriedem | air quotes is acceptable | 22:21 |
dansmith | heh | 22:23 |
dansmith | I am but I have a few things going on | 22:23 |
dansmith | but if you lay on the guilt extra thick I might do something productive | 22:23 |
*** owalsh has quit IRC | 22:23 | |
mriedem | i'm going to be working hard at trying to figure out this espresso maker i bought | 22:30 |
mriedem | it's like if dr seuss tried to make coffee | 22:30 |
*** spatel has quit IRC | 22:30 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Remove exception logging from scatter_gather_cells https://review.openstack.org/619110 | 22:32 |
*** owalsh has joined #openstack-nova | 22:39 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add HPET timer support for x86 guests https://review.openstack.org/605902 | 22:39 |
*** imacdonn has quit IRC | 22:42 | |
*** imacdonn has joined #openstack-nova | 22:42 | |
*** efried is now known as efried_back_mond | 22:43 | |
*** efried_back_mond is now known as efried_back_mon | 22:43 | |
*** itlinux has quit IRC | 22:44 | |
*** owalsh has quit IRC | 22:49 | |
*** munimeha1 has quit IRC | 22:49 | |
*** owalsh has joined #openstack-nova | 23:02 | |
*** tbachman has joined #openstack-nova | 23:05 | |
*** mugsie has quit IRC | 23:06 | |
*** ivve has quit IRC | 23:08 | |
*** takashin has joined #openstack-nova | 23:34 | |
openstackgerrit | Takashi NATSUME proposed openstack/nova stable/rocky: Add description of custom resource classes https://review.openstack.org/619122 | 23:44 |
mriedem | hello friends, could use some core reviews on this pretty simple straight forward spec https://review.openstack.org/#/c/612531/ | 23:57 |
*** spatel has joined #openstack-nova | 23:59 | |
*** cdent has quit IRC | 23:59 |