*** ociuhandu has joined #openstack-nova | 00:04 | |
*** ociuhandu has quit IRC | 00:12 | |
*** mlavalle has quit IRC | 00:23 | |
*** JamesBenson has joined #openstack-nova | 00:25 | |
*** JamesBenson has quit IRC | 00:30 | |
openstackgerrit | Brin Zhang proposed openstack/nova master: Add nested resource providers limit for multi create https://review.opendev.org/723884 | 00:36 |
brinzhang_ | bauzas: I have updated https://review.opendev.org/723884; as you suggested in bug 1874664, I copied and modified it. | 00:41 |
openstack | bug 1874664 in OpenStack Compute (nova) "Instance multi-create doesn't support available resources spread between children RPs" [Medium,Confirmed] https://launchpad.net/bugs/1874664 - Assigned to Wenping Song (wenping1) | 00:41 |
*** threestrands has joined #openstack-nova | 00:45 | |
*** Liang__ has joined #openstack-nova | 00:59 | |
openstackgerrit | sean mooney proposed openstack/nova master: silence amqp heartbeat warning https://review.opendev.org/724188 | 01:04 |
*** abaindur_ has joined #openstack-nova | 01:09 | |
*** abaindur has quit IRC | 01:12 | |
*** abaindur_ has quit IRC | 01:14 | |
melwitt | sean-k-mooney: re: that log message, I dunno. we suggested it in the past to the oslo.messaging ppl and they thought it's something that should be logged, at info. iirc they didn't want to downgrade it to debug either | 01:16 |
melwitt | I recognize that maybe there's a way we could hide it on our side but I guess I dunno what to think about that | 01:16 |
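(Aside: melwitt's "hide it on our side" idea could, in principle, be done with a plain Python logging filter. The sketch below only illustrates that idea and is not what the patch linked above does; the logger name and the "heartbeat" substring are assumptions.)

    import logging


    class DropHeartbeatNoise(logging.Filter):
        """Drop log records whose message mentions heartbeats."""

        def filter(self, record):
            # logging keeps a record when filter() returns True, drops it on False.
            return "heartbeat" not in record.getMessage().lower()


    # Attach the filter from the consuming service; the logger name is an
    # assumption about where the message originates.
    logging.getLogger("oslo_messaging._drivers.impl_rabbit").addFilter(DropHeartbeatNoise())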
*** JamesBenson has joined #openstack-nova | 01:47 | |
*** ociuhandu has joined #openstack-nova | 01:50 | |
*** JamesBenson has quit IRC | 01:52 | |
*** ociuhandu has quit IRC | 02:03 | |
*** ircuser-1 has joined #openstack-nova | 02:05 | |
openstackgerrit | xuyuanhao proposed openstack/nova master: fix bug/1875624 https://review.opendev.org/724213 | 02:17 |
*** gyee has quit IRC | 02:52 | |
*** mkrai has joined #openstack-nova | 02:55 | |
*** sapd1_x has joined #openstack-nova | 03:02 | |
*** psachin has joined #openstack-nova | 03:22 | |
*** JamesBenson has joined #openstack-nova | 03:27 | |
*** JamesBenson has quit IRC | 03:32 | |
*** ociuhandu has joined #openstack-nova | 03:39 | |
*** ircuser-1 has quit IRC | 03:40 | |
*** JamesBenson has joined #openstack-nova | 03:48 | |
*** JamesBenson has quit IRC | 03:52 | |
*** ociuhandu has quit IRC | 03:53 | |
*** threestrands has quit IRC | 04:17 | |
*** JamesBenson has joined #openstack-nova | 04:29 | |
*** mkrai has quit IRC | 04:31 | |
*** mkrai has joined #openstack-nova | 04:32 | |
*** JamesBenson has quit IRC | 04:33 | |
*** evrardjp has quit IRC | 04:35 | |
*** evrardjp has joined #openstack-nova | 04:35 | |
*** ociuhandu has joined #openstack-nova | 04:39 | |
*** ociuhandu has quit IRC | 04:48 | |
*** ratailor has joined #openstack-nova | 05:00 | |
*** dklyle has quit IRC | 05:02 | |
*** gryf has joined #openstack-nova | 05:07 | |
*** bnemec has quit IRC | 05:15 | |
*** slaweq has joined #openstack-nova | 05:19 | |
*** tetsuro has joined #openstack-nova | 05:28 | |
*** tetsuro has quit IRC | 05:32 | |
*** mkrai_ has joined #openstack-nova | 05:37 | |
*** mkrai has quit IRC | 05:39 | |
*** links has joined #openstack-nova | 05:40 | |
*** mkrai_ has quit IRC | 05:51 | |
*** mkrai has joined #openstack-nova | 05:51 | |
*** mkrai has quit IRC | 05:53 | |
*** mkrai has joined #openstack-nova | 05:53 | |
*** ociuhandu has joined #openstack-nova | 06:01 | |
*** slaweq has quit IRC | 06:06 | |
*** udesale has joined #openstack-nova | 06:10 | |
*** ociuhandu has quit IRC | 06:15 | |
*** maciejjozefczyk has joined #openstack-nova | 06:28 | |
*** rpittau|afk is now known as rpittau | 06:32 | |
bauzas | gibi: on PTO this morning only FYI | 06:35 |
*** ociuhandu has joined #openstack-nova | 06:36 | |
*** slaweq has joined #openstack-nova | 06:41 | |
gibi | bauzas: hi. ack. | 06:48 |
*** ttsiouts has joined #openstack-nova | 06:51 | |
*** brinzhang has quit IRC | 06:53 | |
*** dustinc has quit IRC | 06:53 | |
*** brinzhang has joined #openstack-nova | 06:53 | |
*** brinzhang has quit IRC | 06:56 | |
*** brinzhang has joined #openstack-nova | 06:56 | |
*** belmoreira has joined #openstack-nova | 06:57 | |
*** ociuhandu has quit IRC | 06:59 | |
*** dpawlik has joined #openstack-nova | 06:59 | |
*** bbowen has quit IRC | 07:00 | |
*** bbowen has joined #openstack-nova | 07:00 | |
*** larainema has joined #openstack-nova | 07:02 | |
*** links has quit IRC | 07:03 | |
*** links has joined #openstack-nova | 07:04 | |
*** nightmare_unreal has joined #openstack-nova | 07:06 | |
*** ircuser-1 has joined #openstack-nova | 07:10 | |
*** tesseract has joined #openstack-nova | 07:11 | |
*** brinzhang has quit IRC | 07:13 | |
*** brinzhang has joined #openstack-nova | 07:13 | |
*** takamatsu has joined #openstack-nova | 07:16 | |
*** mkrai has quit IRC | 07:26 | |
*** mkrai has joined #openstack-nova | 07:27 | |
openstackgerrit | Kevin Zhao proposed openstack/nova master: [WIP] CI: add tempest-integrated-compute-aarch64 job https://review.opendev.org/714439 | 07:27 |
*** rcernin has quit IRC | 07:27 | |
*** tosky has joined #openstack-nova | 07:29 | |
*** ociuhandu has joined #openstack-nova | 07:29 | |
*** mkrai has quit IRC | 07:42 | |
*** mkrai has joined #openstack-nova | 07:45 | |
*** ralonsoh has joined #openstack-nova | 07:48 | |
openstackgerrit | Jiri Suchomel proposed openstack/nova-specs master: Add spec for downloading images via RBD https://review.opendev.org/572805 | 07:58 |
*** mkrai has quit IRC | 08:01 | |
*** mkrai_ has joined #openstack-nova | 08:01 | |
*** ccamacho has joined #openstack-nova | 08:05 | |
*** martinkennelly has joined #openstack-nova | 08:18 | |
brinzhang_ | gibi: hi, https://review.opendev.org/#/c/723884/2/api-guide/source/accelerator-support.rst@69 what do you mean? | 08:20 |
brinzhang_ | gibi: Or in other words, I don't seem to understand this sentence: "I would make this cyborg specific in this doc.", and Line 76 | 08:22 |
*** avolkov has joined #openstack-nova | 08:33 | |
*** jaosorior has quit IRC | 08:45 | |
*** ociuhandu has quit IRC | 08:50 | |
*** ttsiouts has quit IRC | 08:50 | |
*** ociuhandu has joined #openstack-nova | 08:50 | |
*** ociuhandu has quit IRC | 08:50 | |
*** ttsiouts has joined #openstack-nova | 08:51 | |
brinzhang_ | gibi: I think I get your comment, will update | 08:51 |
*** ociuhandu has joined #openstack-nova | 08:53 | |
openstackgerrit | Jiri Suchomel proposed openstack/nova-specs master: Add spec for downloading images via RBD https://review.opendev.org/572805 | 08:53 |
*** mkrai has joined #openstack-nova | 09:07 | |
*** artom has quit IRC | 09:08 | |
openstackgerrit | xuyuanhao proposed openstack/nova master: the vms can not be force deleted when vm_status is soft-delete and task-state=deleting https://review.opendev.org/724260 | 09:08 |
*** mkrai_ has quit IRC | 09:08 | |
*** artom has joined #openstack-nova | 09:08 | |
*** mkrai has quit IRC | 09:15 | |
*** mkrai_ has joined #openstack-nova | 09:15 | |
openstackgerrit | Brin Zhang proposed openstack/nova master: Add nested resource providers limit for multi create https://review.opendev.org/723884 | 09:17 |
brinzhang_ | gibi: updated; although the sentence is somewhat redundant, it is easier to understand. please review again, thanks | 09:17 |
*** dtantsur|afk is now known as dtantsur | 09:24 | |
*** efried has quit IRC | 09:27 | |
stephenfin | gibi, bauzas: Could I get you folks to take a look at these patches for me, one of which has been around for a loooong time https://review.opendev.org/#/c/706013/ https://review.opendev.org/#/c/530905/ | 09:33 |
*** xek has joined #openstack-nova | 09:40 | |
*** ociuhandu has quit IRC | 09:47 | |
*** ociuhandu has joined #openstack-nova | 09:48 | |
*** ociuhandu has quit IRC | 09:53 | |
*** udesale has quit IRC | 10:03 | |
*** udesale has joined #openstack-nova | 10:03 | |
*** brinzhang_ has quit IRC | 10:11 | |
*** ociuhandu has joined #openstack-nova | 10:12 | |
openstackgerrit | Brin Zhang proposed openstack/nova master: Add nested resource providers limit for multi create https://review.opendev.org/723884 | 10:15 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Fix 500 error while passing 4-byte unicode data https://review.opendev.org/407514 | 10:16 |
*** Liang__ has quit IRC | 10:25 | |
*** brinzhang has quit IRC | 10:29 | |
*** rpittau is now known as rpittau|bbl | 10:55 | |
openstackgerrit | Merged openstack/nova master: libvirt:driver:Disallow AIO=native when 'O_DIRECT' is not available https://review.opendev.org/682772 | 10:59 |
openstackgerrit | Merged openstack/nova master: Feature matrix: update AArch64 information https://review.opendev.org/715979 | 10:59 |
*** ociuhandu has quit IRC | 11:06 | |
*** ociuhandu has joined #openstack-nova | 11:07 | |
*** ociuhandu has quit IRC | 11:13 | |
*** tetsuro has joined #openstack-nova | 11:16 | |
*** ttsiouts has quit IRC | 11:25 | |
*** ttsiouts has joined #openstack-nova | 11:27 | |
openstackgerrit | jayaditya gupta proposed openstack/nova master: Support for --force flag for nova-manage placement heal_allocations command use this flag to forcefully call heal allocation for a specific instance https://review.opendev.org/715395 | 11:28 |
*** mkrai_ has quit IRC | 11:29 | |
*** mkrai has joined #openstack-nova | 11:29 | |
*** derekh has joined #openstack-nova | 11:30 | |
*** raildo has joined #openstack-nova | 11:39 | |
*** JamesBenson has joined #openstack-nova | 11:40 | |
*** ttsiouts has quit IRC | 11:43 | |
*** raildo has quit IRC | 11:49 | |
*** raildo has joined #openstack-nova | 11:50 | |
*** tetsuro has quit IRC | 11:52 | |
*** tetsuro has joined #openstack-nova | 11:53 | |
*** tetsuro has quit IRC | 11:54 | |
*** martinkennelly has quit IRC | 11:59 | |
*** sapd1_x has quit IRC | 12:07 | |
*** ociuhandu has joined #openstack-nova | 12:08 | |
*** ttsiouts has joined #openstack-nova | 12:16 | |
*** ttsiouts has quit IRC | 12:21 | |
gibi | stephenfin: did a review on both | 12:22 |
stephenfin | gibi++ ta | 12:23 |
*** derekh has quit IRC | 12:26 | |
* bauzas is back around | 12:29 | |
bauzas | (from PTO morning) | 12:29 |
*** martinkennelly has joined #openstack-nova | 12:31 | |
*** rpittau|bbl is now known as rpittau | 12:31 | |
gibi | gmann: I left feedback on https://review.opendev.org/#/c/723645 | 12:36 |
*** ociuhandu has quit IRC | 12:44 | |
*** ociuhandu has joined #openstack-nova | 12:45 | |
*** ociuhandu has quit IRC | 12:49 | |
*** nweinber has joined #openstack-nova | 12:53 | |
*** ttsiouts has joined #openstack-nova | 12:55 | |
*** links has quit IRC | 12:59 | |
*** ociuhandu has joined #openstack-nova | 12:59 | |
*** bbobrov has joined #openstack-nova | 13:00 | |
*** jsuchome has joined #openstack-nova | 13:01 | |
openstackgerrit | Jiri Suchomel proposed openstack/nova master: Add ability to download Glance images into the libvirt image cache via RBD https://review.opendev.org/574301 | 13:09 |
*** derekh has joined #openstack-nova | 13:12 | |
bauzas | gibi: gmann: fwiw, we need to merge https://review.opendev.org/#/c/723645/ by a RC2 :( | 13:15 |
bauzas | I mean, by merging it back to Ussuri for a next RC | 13:16 |
bauzas | because if not, nova-status for Ussuri won't check it | 13:16 |
*** ratailor has quit IRC | 13:16 | |
gibi | bauzas: yes, we have to do that | 13:17 |
bauzas | gibi: then I'm adding an ussuri-rc-potential tag to the bug | 13:18 |
gibi | bauzas: good point, thanks | 13:20 |
gibi | gmann: have you tried the new policy upgrade check in a devstack? I'm trying it but I see that the enforcer is None here https://review.opendev.org/#/c/723645/7/nova/cmd/status.py@378 | 13:24 |
*** udesale_ has joined #openstack-nova | 13:24 | |
*** udesale has quit IRC | 13:27 | |
*** mkrai has quit IRC | 13:28 | |
*** ttsiouts has quit IRC | 13:29 | |
gibi | gmann: linked printouts in the review | 13:29 |
sean-k-mooney | gibi: printouts as in paper? | 13:31 |
gibi | sean-k-mooney: :) | 13:31 |
gibi | sean-k-mooney: printouts as the stuff my debugger printed | 13:31 |
sean-k-mooney | ah | 13:31 |
sean-k-mooney | i have actually done some spec reviews in paper form when i needed to compare and contrast 3 interrelated specs and ran out of monitor space on 3 monitors.. | 13:32 |
sean-k-mooney | it works but it's a pain and should be avoided unless you hate trees | 13:32 |
bauzas | gibi: I need to work on a next devstack :) | 13:34 |
bauzas | I have some hardware, I should try to use it | 13:34 |
gibi | sean-k-mooney: I did that last time when reading heavy telco specifications | 13:35 |
*** psachin has quit IRC | 13:36 | |
*** avolkov has quit IRC | 13:42 | |
artom | belmoreira, hello again, have more time this morning (well, afternoon, for you) | 13:42 |
belmoreira | hi artom, tell me | 13:42 |
artom | belmoreira, so, stephenfin has proposed an online data migration here: https://review.opendev.org/#/c/537414/26/nova/objects/compute_node.py@533 to get rid of really old dict JSON blobs in instance_extra.numa_topology and replace them with ovo JSON blobs | 13:43 |
*** mkrai has joined #openstack-nova | 13:44 | |
artom | The proposal is to use SQL string filtering, which is slow - so for deployments with a large number of instances (anything over 1000, based on what zzzeek was saying yesterday), this might be a painful migration to run | 13:45 |
artom | belmoreira, CERN's probably the largest operator (that we know of), so I was wondering if you could have any input on that | 13:46 |
artom | Like, maybe online data migrations taking forever isn't such a big deal? | 13:46 |
belmoreira | let me have a look | 13:46 |
artom | Thanks :) | 13:47 |
belmoreira | from what I can see it should generate something like 'select * from compute_nodes where deleted = 0 and numa_topology like "%nova_object.name%";' | 13:52 |
belmoreira | since this is the "compute_nodes" table, this will be done per cell | 13:52 |
*** Liang__ has joined #openstack-nova | 13:53 | |
stephenfin | belmoreira: pretty much | 13:53 |
dansmith | belmoreira: he linked you to the wrong one.. we're doing compute nodes, but we're also doing all instances in the next file | 13:53 |
stephenfin | though | 13:53 |
artom | belmoreira, oh, sorry, yeah, the compute nodes one isn't the big deal, there are rarely over a thousand of those | 13:54 |
stephenfin | 'select * from instance_extra where deleted = 0 and numa_topology is not null and numa_topology like "%nova_object.name%";' | 13:54 |
artom | The instances are the open question | 13:54 |
belmoreira | We have ~200 nodes per cell. If that is the correct query (I'm expecting that SQLAlchemy adds much more to it), executing it in a cell of ~200 nodes is fast enough (39 ms) | 13:54 |
artom | stephenfin, did you find out about the filtering order? Like, if we put deleted=0 first, does it reduce the number of instances the string filter has to work over? | 13:55 |
stephenfin | artom: the docs that I read said that the SQL engine would do that optimization itself | 13:55 |
belmoreira | stephenfin artom are we talking about instance_extra as well? | 13:56 |
artom | stephenfin, ok, that's reassuring | 13:56 |
artom | belmoreira, we are | 13:56 |
stephenfin | belmoreira: yes, both | 13:56 |
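(Aside: for anyone wanting to confirm stephenfin's point above that the engine evaluates the cheap predicates before the LIKE, a quick EXPLAIN against a cell DB shows the plan. A rough sketch, assuming a MySQL cell database and the table/column names quoted in the chat; the connection URL is a placeholder.)

    from sqlalchemy import create_engine, text

    # Placeholder connection URL; point it at a real cell DB to try this.
    engine = create_engine("mysql+pymysql://nova:secret@localhost/nova_cell1")

    query = text(
        "EXPLAIN SELECT * FROM instance_extra "
        "WHERE deleted = 0 AND numa_topology IS NOT NULL "
        "AND numa_topology LIKE :pattern"
    )

    with engine.connect() as conn:
        for row in conn.execute(query, {"pattern": "%nova_object.name%"}):
            print(row)  # each plan row shows which index/predicates MySQL uses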
*** bnemec has joined #openstack-nova | 13:56 | |
*** dklyle has joined #openstack-nova | 13:57 | |
belmoreira | ok, I started with the compute_node.py :) | 13:57 |
*** Liang__ is now known as LiangFang | 13:57 | |
artom | belmoreira, that was my bad, I linked the wrong thing | 13:58 |
*** mriedem has joined #openstack-nova | 13:58 | |
belmoreira | I'm checking how much time the query that stephenfin mentioned takes | 13:59 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: nova-audit: Use cliff instead of homegrown argparse bleh https://review.opendev.org/724332 | 14:00 |
*** ttsiouts has joined #openstack-nova | 14:00 | |
artom | I have to say, this shows the power of operator involvement upstream | 14:00 |
dansmith | artom: man, I really wish this had just been done as migrate-on-load | 14:00 |
artom | Us devs get "access" to an actual large deployment, and the operators get devs not being stupid ;) | 14:01 |
artom | dansmith, I guess stephenfin's response to that would be "but then we'll never know for sure that we can remove the compat routines" | 14:01 |
stephenfin | 'zactly | 14:02 |
dansmith | artom: I think we'd load this whenever we do update available resource, so it'd be migrated within one cycle | 14:02 |
artom | dansmith, for all instances? | 14:02 |
dansmith | and if not, we could have just added it to the preloads and forced it to happen | 14:02 |
gmann | bauzas: gibi oh, i checked locally, but i might need to init the policy explicitly depending on when the cmd is run. let me test those scenarios also. I will do it after my internal meeting. | 14:02 |
dansmith | artom: yes, parallelized by compute node | 14:03 |
stephenfin | dansmith: have we an in-tree example of this migrate-on-load pattern? | 14:03 |
dansmith | I can point you at one for sure, but let me see if we have any in current master | 14:04 |
artom | The fact that belmoreira appears to have RIP'ed running that query is worrisome ;) | 14:04 |
sean-k-mooney | stephenfin: one thing we could do if this was expensive is to not do it as a normal online data migration in the sense of a global change | 14:06 |
dansmith | stephenfin: https://github.com/openstack/nova/blob/58aaffade9f78c5fdc6b0d26ec26b70908b7a6aa/nova/objects/migration.py#L89-L89 | 14:06 |
sean-k-mooney | but instead have a nova-manage command that would allow you to do it per host | 14:06 |
dansmith | stephenfin: we generate uuid for migration on load if not present | 14:06 |
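(Aside: the "migrate-on-load" pattern dansmith points at boils down to fixing up a record the first time it is loaded as an object. The sketch below is a heavily simplified illustration of that pattern, not the actual nova code; the class, field names and save call are all made up.)

    # Simplified illustration of migrate-on-load: when a row is turned into an
    # object and a value is missing or in a legacy format, fix it up and persist
    # the fix once, so the data migrates itself as it is touched.
    import uuid


    class FakeMigration:
        def __init__(self, db_row):
            self.id = db_row["id"]
            self.uuid = db_row.get("uuid")

        @classmethod
        def from_db_object(cls, db_row):
            obj = cls(db_row)
            if obj.uuid is None:
                # Legacy row: generate the missing value now and write it back.
                obj.uuid = str(uuid.uuid4())
                obj.save()
            return obj

        def save(self):
            # Stand-in for the real DB update.
            print(f"UPDATE migrations SET uuid={self.uuid!r} WHERE id={self.id}")


    migration = FakeMigration.from_db_object({"id": 42, "uuid": None})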
sean-k-mooney | so you could slowly space out the migration host by host until you are done | 14:07 |
dansmith | sean-k-mooney: we already have batch limits in place, so that doesn't help | 14:07 |
dansmith | other than create more debt in the form of a new command | 14:07 |
belmoreira | I was running these in our DBs. Again, this is per cell. Some have much more than 1000 instances, but for those we don't have numa topology defined. The cells that have numa_topology defined only run batch processing, so they are large instances, meaning we have roughly 4x or 8x as many instances as compute nodes. | 14:07 |
belmoreira | in both cases queries take few ms | 14:08 |
sean-k-mooney | well yeah, that's basically what i was thinking of, just limiting the rows it touches, but if we can already do that then sure | 14:08 |
stephenfin | belmoreira++ excellent, thanks for checking that up | 14:08 |
artom | belmoreira, yep, that's very helpful, thanks! | 14:08 |
belmoreira | artom stephenfin thanks for pinging me on this | 14:09 |
sean-k-mooney | well there are two parts: first the query to get all the instances that need to be migrated, then we need to lock the db and update all records | 14:09 |
sean-k-mooney | you can actually do that in one update query | 14:10 |
sean-k-mooney | in general, but not in this case | 14:10 |
artom | belmoreira, just to make sure I understand, you're saying that your numa_topology-having instances are only about 4 to 8 instances per compute host, right? | 14:10 |
gibi | gmann: ack, thanks | 14:10 |
dansmith | belmoreira: I'm not sure I got what you said.. you don't have a db with tons of instances with a numa topology to test this on right? | 14:10 |
sean-k-mooney | if i parsed it correctly the cells with numa instances only run large vms, so they have fewer instances than the other cells | 14:11 |
dansmith | right | 14:12 |
belmoreira | artom yes | 14:12 |
*** ttsiouts has quit IRC | 14:12 | |
artom | belmoreira, aha, ok - well, it tells us that the sql engine (or sqlalchemy itself?) is smart and checks the more restrictive stuff first - iow having a numa_topology at all | 14:12 |
artom | But it doesn't tell us what happens if there are thousands of numa_topology-having instances... :( | 14:13 |
sean-k-mooney | to be fair i would suspect that is likely common in many deployments, e.g. most numa instances will be large flavors and therefore there will be fewer of them than standard instances | 14:13 |
artom | sean-k-mooney, that does seem very likely... | 14:13 |
dansmith | stephenfin: without the migrate on load, any idea what I can do to tickle those instances to rewrite them? | 14:13 |
belmoreira | dansmith you're right. I don't have a cell with a lot of instances. <2000 instances only | 14:13 |
dansmith | belmoreira: gotcha, thanks | 14:13 |
sean-k-mooney | belmoreira: could you do a count(*) on instances that have a numa topology so we can get a feel for how many you have | 14:14 |
belmoreira | sean-k-mooney yes, give a sec | 14:14 |
sean-k-mooney | just wondering if it's in the 100s or roughly what you were measuring | 14:14 |
sean-k-mooney | artom: i mean we can always fake it if we need to | 14:15 |
artom | sean-k-mooney, fake what? | 14:16 |
sean-k-mooney | a db with 1000s of numa instances | 14:16 |
artom | True | 14:16 |
belmoreira | our typical use case: | 14:16 |
belmoreira | select count(*) from instance_extra where deleted = 0; #711 | 14:16 |
belmoreira | select count(*) from compute_nodes nodes; #176 | 14:16 |
belmoreira | select * from instance_extra where deleted = 0 and numa_topology is not null and numa_topology like "%nova_object.name%"; #took 38.2 ms | 14:16 |
sean-k-mooney | ok so 700 ish is not bad | 14:17 |
sean-k-mooney | it's not 2000 but it's not 10 | 14:17 |
artom | So based on those numbers it might actually be OK to go ahead as is... | 14:18 |
sean-k-mooney | if we were to assume this was linear then i think it would be acceptable | 14:18 |
dansmith | so, if we just convert to migrate-on-load, we don't take any overhead for string searching at all, migrate only the instances that need it, and are assured that they have migrated all instances by the end of the cycle, right? | 14:19 |
dansmith | even with nova-manage, we can't make them run it without a blocker migration (which is also expensive) | 14:19 |
belmoreira | in this case numa_topology is not defined, but it may help with the analysis: | 14:19 |
belmoreira | select count(*) from instance_extra where deleted = 0; #1573 | 14:19 |
belmoreira | select count(*) from compute_nodes nodes; #88 | 14:19 |
belmoreira | select * from instance_extra where deleted = 0 and numa_topology is not null and numa_topology like "%nova_object.name%"; #took 106 ms | 14:19 |
*** tkajinam has joined #openstack-nova | 14:19 | |
dansmith | if we did that, stephenfin could just remove his nova-manage bits entirely, no impact for anyone that doesn't have old instances | 14:19 |
stephenfin | Yeah, I can do that too. Let me see if I can wrangle something up | 14:20 |
artom | dansmith, the end of cycle stuff is complicated by FFUs and operators skipping releases, no? | 14:20 |
sean-k-mooney | yeah, if we migrate on load that also works, but the only real implication is keeping that for a release or two for FFU | 14:20 |
dansmith | artom: ah, yeah, good point... | 14:21 |
artom | So 2 or 3 cycles, I guess | 14:21 |
dansmith | artom: if we had migrate-on-load for a while, then it'd be fine, but that's the problem trying to do the switch quickly | 14:21 |
sean-k-mooney | we could do both, e.g. do the migrate on load and then, when we drop that, provide a nova-manage command with a blocker migration | 14:21 |
dansmith | sean-k-mooney: stephenfin has been trying to get this done for a long time so we're trying not to make this a career-long arc | 14:22 |
artom | dansmith, it's a thing he'll bequeath to his grandchildren | 14:22 |
sean-k-mooney | yep, i know. if this was a few weeks ago i would have suggested doing the migrate on load in ussuri and then the blocker/migrate command in victoria | 14:23 |
stephenfin | 826 days | 14:23 |
*** priteau has joined #openstack-nova | 14:23 | |
stephenfin | oh my, I appear to have broken zuul https://review.opendev.org/#/c/724332/ | 14:23 |
artom | Dayyyyuuuuum | 14:24 |
sean-k-mooney | hehe let me check with infra | 14:24 |
* artom screenshots | 14:25 | |
sean-k-mooney | zuul is being restarted | 14:27 |
sean-k-mooney | it hit an out of memory issue and they are currently trying to fix it | 14:28 |
*** dpawlik has quit IRC | 14:28 | |
sean-k-mooney | so hold rechecks for a few minutes while they sort this out | 14:29 |
* artom prefers to blame stephenfin | 14:29 | |
sean-k-mooney | well clearly his patch ate all the memory | 14:29 |
sean-k-mooney | ok, zuul is back up. changes running before 14:00 UTC have been requeued; anything uploaded or approved between 14:00 and 14:30 needs to be rechecked | 14:35 |
sean-k-mooney | infra are going to send a status update for the same shortly | 14:35 |
-openstackstatus- NOTICE: Zuul had to be restarted, all changes submitted or approved between 14:00 UTC to 14:30 need to be rechecked, we queued already those running at 14:00 | 14:35 | |
*** efried has joined #openstack-nova | 14:35 | |
*** mlavalle has joined #openstack-nova | 14:41 | |
kashyap | sean-k-mooney: Since you've reviewed an older version (PS-5) and a newer one (PS-7), I'll just address the PS-7 bits here: https://review.opendev.org/#/c/631154/ | 14:52 |
kashyap | sean-k-mooney: That okay? | 14:52 |
*** ociuhandu has quit IRC | 14:52 | |
* kashyap goes ahead with that plan to respond | 14:55 | |
sean-k-mooney | kashyap: sure | 14:55 |
sean-k-mooney | i think i copied most of the relevant bits, although if you read both and just respond on 7 that is fine with me | 14:56 |
sean-k-mooney | or, well, update it in version 8 | 14:56 |
kashyap | sean-k-mooney: While I comment in the spec, on the "increased memory usage" bit -- I knew that thing, but haven't explicitly mentioned it because it requires precise tests, in what scenarios, etc | 14:56 |
kashyap | You can't just put a generic: "in all cases memory is increased" | 14:57 |
sean-k-mooney | kashyap: not really; even with 1 pci root port dan found it used more memory than pc | 14:57 |
kashyap | It requires more testing; so somebody ought to do the "performance testing guy's job"... | 14:57 |
sean-k-mooney | so i think in all configurations it has more memory overhead | 14:57 |
kashyap | sean-k-mooney: Right; I'll mention that, but need to carefully write it in context and with a config example | 14:58 |
sean-k-mooney | or we can just say we expect that q35 will use more memory, as we have never seen a case where it uses less | 14:58 |
kashyap | sean-k-mooney: Also I don't want us to get sucked into that black-hole and get derailed... | 14:58 |
kashyap | But that begs the question: "how much more memory than before" | 14:58 |
sean-k-mooney | sure, and that we can leave to the operator | 14:59 |
kashyap | Which requires a clear example benchmark | 14:59 |
sean-k-mooney | or perfomance guy | 14:59 |
kashyap | Okido; I actually first mentioned it locally and then removed it, as I was still thinking of it | 14:59 |
* kashyap goes to respond... | 14:59 | |
sean-k-mooney | i mainly just want them to be aware that they should consider it when upgrading so they can factor it into their host memory reservation and capacity planning | 15:00 |
kashyap | Yeah, definitely. Thanks for taking the time to respond. | 15:01 |
*** trident has quit IRC | 15:04 | |
*** trident has joined #openstack-nova | 15:05 | |
*** LiangFang has quit IRC | 15:05 | |
*** ociuhandu has joined #openstack-nova | 15:15 | |
*** mkrai has quit IRC | 15:25 | |
*** spatel has joined #openstack-nova | 15:32 | |
spatel | sean-k-mooney: do you know how much CPU would be enough to reserve for the hypervisor, using the isolcpus option? | 15:35 |
*** jaosorior has joined #openstack-nova | 15:44 | |
sean-k-mooney | i don't advise using isolcpus | 15:45 |
sean-k-mooney | generally 1 physical core is more than enough on a compute node, or 1 per numa node if you want to do affinity of interrupts | 15:46 |
spatel | sean-k-mooney: but mostly for NFV they suggest using isolcpus for isolation | 15:47 |
sean-k-mooney | spatel: i generally recommend you use vcpu_pin_set, or in train+ cpu_dedicated_set and cpu_shared_set, to do the reservation | 15:48 |
spatel | you are saying 2 cpu cores would be more than enough for the hypervisor (per NUMA)? | 15:48 |
sean-k-mooney | spatel: you should only use isolcpus on realtime hosts and only on the cpus used for pinned vms | 15:48 |
spatel | I am using the cpu pinning (dedicated) option for all my workloads | 15:49 |
spatel | we need performance not quantity. | 15:49 |
spatel | I am planning to use isolcpus + vcpu_pin_set (both options, to allocate dedicated CPUs) | 15:51 |
*** gyee has joined #openstack-nova | 15:53 | |
openstackgerrit | Jiri Suchomel proposed openstack/nova master: Add ability to download Glance images into the libvirt image cache via RBD https://review.opendev.org/574301 | 15:55 |
spatel | sean-k-mooney: ^^ | 15:55 |
sean-k-mooney | 1 physical core (2 hyperthreads) is normally enough for a compute node if you are not using heavy telemetry | 15:56 |
sean-k-mooney | spatel: you can use isolcpus + vcpu_pin_set but only if the vm is pinned | 15:56 |
sean-k-mooney | when you use isolcpus it disables the linux kernel scheduler for those cores | 15:57 |
sean-k-mooney | so if you have floating vms then they won't float | 15:58 |
spatel | sean-k-mooney: sounds good. yes, we do pinned VMs (currently i have assigned 8 cores but wanted to see what people mostly recommend) | 15:58 |
sean-k-mooney | in general isolcpus is only a good idea if you are running realtime workloads | 15:58 |
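(Aside: a minimal nova.conf sketch of the Train+ options sean-k-mooney refers to; the core ranges are illustrative only and depend entirely on the host topology.)

    [compute]
    # cores handed to pinned (dedicated) guests
    cpu_dedicated_set = 2-15,18-31
    # cores left for the host itself and any floating/shared guests
    cpu_shared_set = 0-1,16-17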
sean-k-mooney | spatel: you might want to look into tuned by the way | 15:58 |
sean-k-mooney | it supports configuring this via userspace/sysfs | 15:59 |
spatel | tuned profile? | 15:59 |
sean-k-mooney | https://github.com/redhat-performance/tuned/tree/master/profiles/cpu-partitioning | 15:59 |
*** kberger_ has joined #openstack-nova | 15:59 | |
sean-k-mooney | isolcpus is a deprecated kernel argument | 16:00 |
sean-k-mooney | https://github.com/redhat-performance/tuned/blob/master/profiles/cpu-partitioning/cpu-partitioning-variables.conf | 16:00 |
*** kberger_ has quit IRC | 16:00 | |
melwitt | gmann: I'm trying to understand a bit about how/why two grenade jobs would run on stable/ussuri (we don't have an example yet). and I looked at the tempest change and realized I don't understand why it ran two grenade jobs on the name change patch https://review.opendev.org/722551 could you please explain why two jobs run on openstack/tempest? I thought it would have been only one | 16:00 |
*** kberger_ has joined #openstack-nova | 16:00 | |
sean-k-mooney | tuned uses the sysfs/cgroups interface to achieve the same effect without the drawbacks | 16:01 |
spatel | sean-k-mooney: ohh good to know :) | 16:01 |
spatel | will look into that | 16:01 |
sean-k-mooney | normally i would just set isolated_cores=2,4-7 and not set no_balance_cores=5-10 | 16:01 |
sean-k-mooney | although no_balance_cores=5-10 would be useful for realtime hosts or ovs-dpdk | 16:02 |
spatel | Do i need to restart the machine to set these values? | 16:02 |
*** KeithMnemonic has quit IRC | 16:03 | |
gmann | melwitt: sure. for nova stable/ussuri, both jobs will run if you recheck any ussuri backport (or testing patch) until 724189 is merged. this is because the compute template in Tempest switched to the new job (https://review.opendev.org/#/c/722551/3/.zuul.yaml@543) and nova stable/ussuri's .zuul.yaml also has the old job listed with irrelevant-files | 16:03 |
sean-k-mooney | spatel: hmm, i don't think so | 16:04 |
sean-k-mooney | you would with the kernel command line, but not with tuned | 16:04 |
gmann | melwitt: https://github.com/openstack/nova/blob/stable/ussuri/.zuul.yaml#L400 | 16:04 |
*** noonedeadpunk has left #openstack-nova | 16:04 | |
spatel | sean-k-mooney: yes, the kernel option does require a reboot, but let me test with tuned | 16:04 |
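(Aside: putting sean-k-mooney's tuned suggestion together as a sketch, reusing his illustrative core lists from above; the file path and command are the ones the cpu-partitioning profile documents, but treat the exact values as assumptions and check the linked repo.)

    # /etc/tuned/cpu-partitioning-variables.conf
    isolated_cores=2,4-7
    # no_balance_cores=5-10   # mainly useful for realtime or ovs-dpdk hosts

    # then activate the profile (no reboot needed, per the discussion above):
    # tuned-adm profile cpu-partitioning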
melwitt | gmann: sorry I mean as an aside, why did two jobs run on https://review.opendev.org/722551 ? I realized I didn't understand that | 16:05 |
gmann | melwitt: on the Tempest side it is still running because the Tempest gate runs the 'integrated-gate-py3' template, which runs all service tests. that template is on the openstack-zuul-jobs side, so once i update that template Tempest will also have the single new job | 16:05 |
melwitt | thanks | 16:06 |
gmann | the compute and service specific templates are taken care of by 722551, but the integrated-gate-py3 template is not yet. | 16:06 |
*** ociuhandu has quit IRC | 16:07 | |
gmann | the best thing we did on the grenade side is we aliased grenade-py3 to the new zuulv3 native job, to avoid running legacy + new jobs during this migration. It is the same zuulv3 job running twice with different names, so it will not cause an issue. | 16:07 |
*** ociuhandu has joined #openstack-nova | 16:08 | |
*** ociuhandu has quit IRC | 16:13 | |
*** nightmare_unreal has quit IRC | 16:16 | |
*** jsuchome has quit IRC | 16:25 | |
*** jamesden_ has joined #openstack-nova | 16:31 | |
*** jamesdenton has quit IRC | 16:31 | |
*** evrardjp has quit IRC | 16:35 | |
*** evrardjp has joined #openstack-nova | 16:35 | |
*** tesseract has quit IRC | 16:36 | |
*** dtantsur is now known as dtantsur|afk | 16:36 | |
*** rpittau is now known as rpittau|afk | 16:37 | |
*** tkajinam has quit IRC | 16:57 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: WIP: objects: Add migrate-on-load behavior for legacy NUMA objects https://review.opendev.org/724381 | 17:09 |
stephenfin | dansmith: That's not complete, but when you've a chance can you sanity check to see if that's what you're after? ^ | 17:10 |
* stephenfin -> 🏃 | 17:10 | |
openstackgerrit | sean mooney proposed openstack/nova master: silence amqp heartbeat warning https://review.opendev.org/724188 | 17:10 |
*** ociuhandu has joined #openstack-nova | 17:12 | |
artom | That just looks like stephenfin harpooned a dude coming at him | 17:12 |
sean-k-mooney | hehe, you don't know what stephenfin gets up to on his runs | 17:14 |
sean-k-mooney | it's one way to keep others away | 17:15 |
sean-k-mooney | i mean, would you approach a person running with a harpoon | 17:15 |
artom | Depends | 17:16 |
artom | Am I a masochistic whale? | 17:16 |
*** ociuhandu has quit IRC | 17:17 | |
* sean-k-mooney that feels like a trap | 17:17 | |
* sean-k-mooney like does this make my butt look big | 17:17 | |
*** avolkov has joined #openstack-nova | 17:21 | |
*** maciejjozefczyk has quit IRC | 17:22 | |
*** priteau has quit IRC | 17:32 | |
*** ralonsoh has quit IRC | 17:41 | |
openstackgerrit | sean mooney proposed openstack/nova master: [WIP] add workaround to disable multiple port bindings https://review.opendev.org/724386 | 17:42 |
openstackgerrit | sean mooney proposed openstack/nova master: [DNM] testing with force_legacy_port_binding workaround https://review.opendev.org/724387 | 17:42 |
*** udesale_ has quit IRC | 17:42 | |
sean-k-mooney | by the way, how do people feel about ^ as a temporary workaround for this long-standing issue while we figure out how to fully fix it? | 17:49 |
sean-k-mooney | i think my other patches are still the right approach https://review.opendev.org/#/c/602432/ and https://review.opendev.org/#/c/640258 but i have not looked at this in a year and was not planning to, but i guess i can try to find time to look at this again | 17:51 |
*** ociuhandu has joined #openstack-nova | 18:00 | |
*** ociuhandu has quit IRC | 18:31 | |
*** ociuhandu has joined #openstack-nova | 18:33 | |
*** derekh has quit IRC | 18:36 | |
*** ociuhandu has quit IRC | 18:37 | |
*** iurygregory has quit IRC | 18:38 | |
*** martinkennelly has quit IRC | 18:53 | |
*** ociuhandu has joined #openstack-nova | 19:12 | |
*** ociuhandu has quit IRC | 19:25 | |
*** ociuhandu has joined #openstack-nova | 19:26 | |
*** ociuhandu has quit IRC | 19:32 | |
*** jangutter has quit IRC | 19:33 | |
*** jangutter_ has joined #openstack-nova | 19:34 | |
*** damien_r has quit IRC | 19:34 | |
*** priteau has joined #openstack-nova | 19:45 | |
*** avolkov has quit IRC | 19:45 | |
*** avolkov has joined #openstack-nova | 19:50 | |
*** priteau has quit IRC | 19:51 | |
*** amodi has joined #openstack-nova | 19:51 | |
*** belmoreira has quit IRC | 19:58 | |
*** maciejjozefczyk has joined #openstack-nova | 20:27 | |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: Add nova-status upgrade check and reno for policy new defaults https://review.opendev.org/723645 | 20:43 |
*** nweinber has quit IRC | 20:45 | |
gmann | gibi: i updated the upgrade check with a safer way to cover all the cases of how policy is initialized (or not). I have also added a test for that and am waiting for the grenade job result, which runs the upgrade check. also, i am preparing a new devstack but hit some issues; please let me know if it runs fine now in your env - https://review.opendev.org/723645 | 20:55 |
gmann | i mean with the policy file you showed in http://paste.openstack.org/show/792881/ | 20:56 |
*** ccamacho has quit IRC | 20:59 | |
*** ociuhandu has joined #openstack-nova | 21:01 | |
*** slaweq has quit IRC | 21:01 | |
*** xek has quit IRC | 21:02 | |
JamesBenson | All: Does anyone know if Nova supports the Tesla M2050 or M2070 for vGPU? | 21:03 |
*** ociuhandu has quit IRC | 21:06 | |
melwitt | JamesBenson: I don't know but here's the doc we have if you didn't already find https://docs.openstack.org/nova/latest/admin/virtual-gpu.html | 21:08 |
JamesBenson | melwitt: Yes, thanks, I saw that resource too. I have some older m1000e's that I got and I'm just not sure if I should bother installing openstack or just keep them as one-offs for testing code. | 21:09 |
*** slaweq has joined #openstack-nova | 21:09 | |
melwitt | ok. bauzas would know but he's off by now today. sean-k-mooney might know ^ | 21:10 |
*** slaweq has quit IRC | 21:14 | |
*** raildo has quit IRC | 21:18 | |
*** rcernin has joined #openstack-nova | 21:32 | |
*** rcernin has quit IRC | 21:33 | |
*** rcernin has joined #openstack-nova | 21:34 | |
*** spatel has quit IRC | 21:40 | |
*** mriedem has left #openstack-nova | 21:48 | |
openstackgerrit | melanie witt proposed openstack/nova-specs master: Re-propose nova-audit spec for Victoria https://review.opendev.org/724430 | 21:48 |
*** mlavalle has quit IRC | 22:09 | |
*** hamzy_ has joined #openstack-nova | 22:11 | |
*** mlavalle has joined #openstack-nova | 22:12 | |
*** hamzy has quit IRC | 22:13 | |
*** mlavalle has quit IRC | 22:20 | |
*** mlavalle has joined #openstack-nova | 22:26 | |
*** ociuhandu has joined #openstack-nova | 22:36 | |
melwitt | gmann: heya, would you mind looking over this review? it looks sane to me but could use your api validation expertise https://review.opendev.org/407514 | 22:37 |
melwitt | I'm not clear on whether it could possibly cause any backward compat issues. it means to only target the 500 error case | 22:38 |
*** tkajinam has joined #openstack-nova | 22:51 | |
*** ociuhandu has quit IRC | 22:58 | |
*** spatel has joined #openstack-nova | 23:03 | |
*** spatel has quit IRC | 23:08 | |
*** tosky has quit IRC | 23:11 | |
*** ociuhandu has joined #openstack-nova | 23:33 | |
*** ociuhandu has quit IRC | 23:53 |