noonedeadpunk | mornings! can we merge https://review.opendev.org/c/openstack/ansible-role-python_venv_build/+/830570 cherry-picks ? | 11:34 |
---|---|---|
*** frenzyfriday|rover is now known as frenzyfriday | 12:49 | |
mgariepy | morning anyone here have configured vga passthrough with nvlink between 2 gpus ? | 14:03 |
opendevreview | Merged openstack/ansible-role-python_venv_build stable/wallaby: Slurp constraints.txt separately for each host in a batch https://review.opendev.org/c/openstack/ansible-role-python_venv_build/+/830570 | 14:14 |
noonedeadpunk | nah, I haven't; we ordered GPUs without nvlink in the end, as the info about nvlink was a bit confusing, for me at least | 14:19 |
mgariepy | yep, in my case i do have 4 gpus 2 with nvlink and 2 without.. | 14:19 |
mgariepy | really not sure how it should work for the nvlink as i'm not 100% sure nova has something to tag the 2 that are nvlink linked. | 14:20 |
noonedeadpunk | I was actually looking at nvlink in terms of vgpu and how splitting and scheduling would work then... | 14:20 |
noonedeadpunk | I guess with nvlink it would be represented as a single pci device, or? | 14:21 |
mgariepy | in my case the gpus are gaming ones.. | 14:21 |
mgariepy | cheap for research :) haha | 14:21 |
mgariepy | i guess i'll create a flavor with all the gpus in it... | 14:21 |
mgariepy | i wasn't involved in the purchase on that one. i only manage openstack for them. | 14:22 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/wallaby: Bump SHAs for Wallaby https://review.opendev.org/c/openstack/openstack-ansible/+/830406 | 14:35 |
JamesGibo | Not going to help with your NVLink issue, but something to be aware of is that mdevs for GPUs are not recreated on reboot https://bugs.launchpad.net/nova/+bug/1900800?comments=all | 14:50 |
mgariepy | thanks for the input JamesGibo | 14:54 |
noonedeadpunk | oh, yes, there's nvidia-smi that needs to be launched for mdev | 14:55 |
*** dviroel is now known as dviroel|lunch | 14:57 | |
mgariepy | in my case i'm not splitting gpus. | 15:07 |
opendevreview | Merged openstack/ansible-role-python_venv_build stable/victoria: Slurp constraints.txt separately for each host in a batch https://review.opendev.org/c/openstack/ansible-role-python_venv_build/+/830571 | 15:12 |
* damiandabrowski[m] leaving for a vacation, will be back on March 7th | 15:25 | |
mgariepy | take care damiandabrowski[m] | 15:36 |
damiandabrowski[m] | thank You! | 15:51 |
*** dviroel|lunch is now known as dviroel | 16:04 | |
spatel | mgariepy here is my GPU passthrough blog - https://satishdotpatel.github.io/gpu-passthrough-for-openstack/ | 17:04 |
mgariepy | spatel, yep, i already have that. do you have nvlink between gpus? | 17:08 |
mgariepy | but anyway i'll have all the gpus inside a single vm. | 17:09 |
spatel | no i don't have that config.. | 17:09 |
spatel | does it require a license? | 17:09 |
spatel | how does nvlink work? do they communicate over ethernet? | 17:10 |
spatel | or is this a link between two GPU cards within a single system (like NUMA)? | 17:11 |
mgariepy | link between 2 gpus | 17:12 |
mgariepy | i don't have fancy hardware that requires licence :) | 17:12 |
mgariepy | only doing passthrough 1:1 | 17:12 |
spatel | a link between 2 GPUs in two different systems or a single one? | 17:12 |
mgariepy | a single | 17:12 |
mgariepy | it's a physical link between 2 gpus | 17:13 |
spatel | I have two GPU cards in a single system but they work independently | 17:13 |
spatel | What are the advantages of linking two GPUs vs two independent cards? | 17:14 |
spatel | If there is a significant advantage then i may link them :) | 17:15 |
spatel | https://www.tomshardware.com/news/nvidia-nvlink-boosts-performance,28989.html | 17:17 |
noonedeadpunk | I thought they act as single computing core tbh... | 17:45 |
noonedeadpunk | like 2 socket motherboard with cpu... | 17:45 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/victoria: Bump SHAs for Victoria https://review.opendev.org/c/openstack/openstack-ansible/+/830416 | 18:03 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/pike: EOL Pike branch https://review.opendev.org/c/openstack/openstack-ansible/+/826956 | 18:05 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/xena: Add test of used SHAs https://review.opendev.org/c/openstack/openstack-ansible/+/831031 | 18:06 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/wallaby: Add test of used SHAs https://review.opendev.org/c/openstack/openstack-ansible/+/831032 | 18:06 |
mgariepy | i think it's just faster to push data over the nvlink between the gpus than to go through the pci lanes. | 18:20 |
spatel | I saw the doc. PCIe supports 16 GB/s vs nvlink 150 GB/s with the Tesla V100S card | 18:40 |
lowercase | noonedeadpunk: After testing some more i've hit a wall with the fluent-logger + oslo.log. I've posted a question in #openstack-oslo to see if anyone has some guidance. | 18:40 |
opendevreview | Merged openstack/ansible-role-python_venv_build stable/xena: Slurp constraints.txt separately for each host in a batch https://review.opendev.org/c/openstack/ansible-role-python_venv_build/+/830369 | 18:42 |
spatel | what are we trying to achieve with fluent? | 18:47 |
lowercase | This: https://docs.openstack.org/oslo.log/latest/admin/example_nova.html | 18:48 |
lowercase | Enable a fluent logger within openstack to log directly to fluent annndddd hopefully with fluent_formatter it will be automatically in fluent parsable form. | 18:49 |
lowercase | cause I try to say no to regex when i can lol | 18:49 |
noonedeadpunk | damn... sorry I don't really have much time to dig into this | 18:51 |
noonedeadpunk | if any at all | 18:52 |
spatel | sorry, maybe i am new to fluent so asking an odd question. You are saying openstack will send logs to fluent and then it will automatically format the logs.. ? | 18:54 |
lowercase | That's the goal. | 18:55 |
lowercase | With one minor difference. Openstack -> creates logger -> formats logs -> sends to fluent. | 18:55 |
spatel | the current OSA deployment uses journalctl | 18:56 |
spatel | will fluent keep logs in a simple text file? | 18:56 |
lowercase | By pushing the burden onto openstack oslo, I don't need special regex formatters in a fluent config to parse openstack logs. They are already ready to go, and future proof and the burden is on the openstack oslo team to keep the log format compliant. | 18:57 |
lowercase | OSA pushes all openstack application logs to journalctl, yes. Where fluent pushes the logs is up to you. My fluentd pushes the logs to Loki. | 18:58 |
spatel | kolla-ansible uses text files, which is very handy sometimes.. | 18:59 |
lowercase | Correct, but what I gain with journalctl is a lot of extra metadata that is not available by tailing a text log file. | 19:00 |
lowercase | Also, I would not be required to maintain the regex matching that is provided in kolla-ansible. | 19:01 |
spatel | I like clear text files :) because they are easy to handle. journal is total overhead for sure. | 19:15 |
opendevreview | Merged openstack/openstack-ansible-os_zun master: Update Zun api-paste https://review.opendev.org/c/openstack/openstack-ansible-os_zun/+/822847 | 19:20 |
opendevreview | Jonathan Rosser proposed openstack/ansible-role-pki master: Refactor conditional generation of CA and certificates https://review.opendev.org/c/openstack/ansible-role-pki/+/830794 | 19:24 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-os_keystone master: Migrate ssl certificate generation to the PKI role https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/830179 | 19:26 |
spatel | how long does it take to create a k8s cluster using magnum? | 19:48 |
spatel | my cluster has been stuck in CREATE_IN_PROGRESS for the last 10 min and more | 19:49 |
spatel | 1 master, 2 worker nodes | 19:49 |
admin1 | more than 10 mins spatel | 19:59 |
admin1 | sometimes it might last 15-20mins and then it errors out : | 19:59 |
spatel | hmm now it's 20 min | 19:59 |
admin1 | you can try to ssh and then ps a x | 19:59 |
admin1 | and journalctl and see if it's actually working | 19:59 |
spatel | it's been stuck here for a long time - https://paste.opendev.org/show/bPAZRp1lLHuUylhSoqSA/ | 20:00 |
admin1 | i will help you tomorrow :) | 20:03 |
admin1 | coz i have to run now :) | 20:03 |
admin1 | have a great weekend .. | 20:03 |
spatel | np.. | 20:03 |
spatel | have a good weekend | 20:03 |
*** dviroel is now known as dviroel|afk | 20:21 | |
spatel | I am using the fedora-coreos-35.20220213.2.0-openstack.x86_64.qcow2 image and somehow my ssh key is not working.. looks like cloud-init isn't working in that image | 21:44 |
spatel | the same ssh key works on all other images but not with fedora-coreos | 21:44 |
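
The logging flow lowercase described earlier (OpenStack -> creates logger -> formats logs -> sends to fluent) can be illustrated with a minimal sketch using only Python's standard `logging` module. This is not oslo.log's fluent formatter or the fluent-logger API; `StructuredFormatter`, `ListHandler`, and `build_logger` are hypothetical names, and the in-memory list merely stands in for a fluent sender. The point is only that when the application emits structured events itself, the collector has nothing to parse with regex.

```python
import json
import logging


class StructuredFormatter(logging.Formatter):
    """Hypothetical formatter: serialises each record as a JSON object."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


class ListHandler(logging.Handler):
    """Hypothetical stand-in for a fluent sender: keeps events in memory."""

    def __init__(self):
        super().__init__()
        self.events = []

    def emit(self, record):
        # A real sender would forward the event over the network instead.
        self.events.append(self.format(record))


def build_logger(name="demo.compute"):
    """Wire logger -> formatter -> handler, mirroring the flow above."""
    handler = ListHandler()
    handler.setFormatter(StructuredFormatter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the demo output out of the root logger
    logger.handlers = [handler]
    return logger, handler


logger, handler = build_logger()
logger.info("instance %s spawned", "abc123")

# The receiving side reads fields directly; no regex is involved.
event = json.loads(handler.events[0])
print(event["level"], event["message"])
```

In a real deployment the handler and formatter would come from the logging libraries under discussion and be selected through configuration; the sketch only shows where the formatting burden sits.
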