*** tetsuro has joined #openstack-infra | 00:25 | |
*** tetsuro_ has joined #openstack-infra | 00:28 | |
*** tetsuro has quit IRC | 00:31 | |
*** armax has joined #openstack-infra | 00:34 | |
*** dangtrinhnt has joined #openstack-infra | 00:41 | |
*** armax has quit IRC | 00:57 | |
*** ociuhandu has joined #openstack-infra | 01:06 | |
*** ociuhandu has quit IRC | 01:09 | |
*** ociuhandu has joined #openstack-infra | 01:10 | |
*** yamamoto has joined #openstack-infra | 01:16 | |
*** ociuhandu has quit IRC | 01:20 | |
*** ociuhandu has joined #openstack-infra | 01:21 | |
*** ociuhandu has quit IRC | 01:25 | |
*** yamamoto has quit IRC | 01:41 | |
*** yamamoto has joined #openstack-infra | 01:41 | |
*** Goneri has quit IRC | 01:48 | |
*** rfolco has quit IRC | 01:48 | |
*** dangtrinhnt has quit IRC | 01:55 | |
*** dangtrinhnt has joined #openstack-infra | 01:56 | |
*** larainema has joined #openstack-infra | 01:57 | |
*** dangtrinhnt has quit IRC | 01:57 | |
*** dangtrinhnt has joined #openstack-infra | 02:04 | |
*** dangtrinhnt has quit IRC | 02:07 | |
*** dangtrinhnt_ has joined #openstack-infra | 02:07 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: test-upload-logs-swift: revert download script https://review.opendev.org/715755 | 02:11 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: bulk-download : role with script to download all log files https://review.opendev.org/715756 | 02:11 |
*** ricolin_ has joined #openstack-infra | 02:22 | |
*** ricolin_ has quit IRC | 02:31 | |
*** yamamoto has quit IRC | 02:32 | |
*** ociuhandu has joined #openstack-infra | 02:34 | |
*** ociuhandu has quit IRC | 02:39 | |
*** yamamoto has joined #openstack-infra | 03:01 | |
*** ramishra has joined #openstack-infra | 03:07 | |
*** psachin has joined #openstack-infra | 03:09 | |
*** smarcet has joined #openstack-infra | 03:15 | |
*** rosmaita has left #openstack-infra | 03:22 | |
kevinz | ping ianw: Hi | 03:28 |
kevinz | Recently there have been some node failures in Linaro US: http://zuul.openstack.org/builds?job_name=kolla-build-debian-source-aarch64&job_name=kolla-publish-debian-source-aarch64&job_name=kolla-ansible-debian-source-aarch64 | 03:28 |
ianw | kevinz: ok, give me a sec and i can poke at some logs | 03:29 |
*** dangtrinhnt_ has quit IRC | 03:37 | |
*** yamamoto has quit IRC | 03:38 | |
*** yamamoto has joined #openstack-infra | 03:41 | |
ianw | kevinz: here's one of the failures -> http://paste.openstack.org/show/791301/ | 03:44 |
ianw | kevinz: looks like that's pretty consistent, floating ip failures | 03:44 |
*** dangtrinhnt has joined #openstack-infra | 03:50 | |
*** smarcet has quit IRC | 04:03 | |
*** evrardjp has quit IRC | 04:03 | |
*** dave-mccowan has quit IRC | 04:06 | |
*** dangtrinhnt has quit IRC | 04:08 | |
*** evrardjp has joined #openstack-infra | 04:10 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: local-log-download : role with script to download all log files https://review.opendev.org/715756 | 04:13 |
*** ociuhandu has joined #openstack-infra | 04:14 | |
*** dangtrinhnt has joined #openstack-infra | 04:24 | |
*** ociuhandu has quit IRC | 04:24 | |
*** ociuhandu has joined #openstack-infra | 04:24 | |
*** ociuhandu has quit IRC | 04:30 | |
*** evrardjp has quit IRC | 04:36 | |
*** evrardjp has joined #openstack-infra | 04:36 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: local-log-download : role with script to download all log files https://review.opendev.org/715756 | 04:44 |
*** ykarel|away is now known as ykarel | 04:50 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: local-log-download : role with script to download all log files https://review.opendev.org/715756 | 04:51 |
chandankumar | ianw, Hello | 04:52 |
chandankumar | ianw, we have created openstack-tempest-skiplist repo yesterday https://review.opendev.org/#/c/713809/, please add me to the reviewer group for this project https://review.opendev.org/#/admin/groups/2083,members | 04:53 |
ianw | chandankumar: np, should be done | 04:55 |
chandankumar | ianw, thanks :-) | 04:55 |
*** dangtrinhnt has quit IRC | 04:58 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: local-log-download : role with script to download all log files https://review.opendev.org/715756 | 05:00 |
*** yamamoto has quit IRC | 05:03 | |
*** yamamoto has joined #openstack-infra | 05:04 | |
*** ykarel is now known as ykarel|afk | 05:21 | |
*** ociuhandu has joined #openstack-infra | 05:24 | |
*** udesale has joined #openstack-infra | 05:28 | |
*** ijw has joined #openstack-infra | 05:31 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: local-log-download : role with script to download all log files https://review.opendev.org/715756 | 05:31 |
*** ijw_ has quit IRC | 05:32 | |
*** ociuhandu has quit IRC | 05:34 | |
*** ociuhandu has joined #openstack-infra | 05:35 | |
*** ociuhandu has quit IRC | 05:40 | |
*** ykarel|afk is now known as ykarel | 05:40 | |
*** ociuhandu has joined #openstack-infra | 05:44 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: local-log-download : role with script to download all log files https://review.opendev.org/715756 | 05:49 |
*** ociuhandu has quit IRC | 05:54 | |
*** ociuhandu has joined #openstack-infra | 05:56 | |
*** ociuhandu has quit IRC | 06:00 | |
kevinz | ianw: thanks a lot! Is the floating IP essential? I supposed it would use the IPv6 public IP only | 06:03 |
kevinz | since we don't have enough floating ips actually | 06:04 |
kevinz | ianw: could we set the Linaro US not to use floating ip? | 06:06 |
openstackgerrit | OpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml https://review.opendev.org/715772 | 06:08 |
*** udesale_ has joined #openstack-infra | 06:23 | |
*** dpawlik has joined #openstack-infra | 06:23 | |
*** udesale has quit IRC | 06:25 | |
*** udesale_ has quit IRC | 06:27 | |
*** xek has joined #openstack-infra | 06:29 | |
*** xek_ has joined #openstack-infra | 06:33 | |
*** xek has quit IRC | 06:33 | |
*** ociuhandu has joined #openstack-infra | 06:35 | |
openstackgerrit | Merged openstack/project-config master: Normalize projects.yaml https://review.opendev.org/715772 | 06:37 |
*** smarcet has joined #openstack-infra | 06:39 | |
*** ociuhandu has quit IRC | 06:40 | |
*** yamamoto has quit IRC | 06:41 | |
*** yamamoto has joined #openstack-infra | 06:42 | |
*** smarcet has quit IRC | 06:44 | |
*** xek_ has quit IRC | 06:49 | |
*** xek_ has joined #openstack-infra | 06:50 | |
*** ociuhandu has joined #openstack-infra | 07:05 | |
*** ociuhandu has quit IRC | 07:09 | |
*** yamamoto has quit IRC | 07:10 | |
*** yamamoto has joined #openstack-infra | 07:12 | |
*** jcapitao has joined #openstack-infra | 07:17 | |
*** ociuhandu has joined #openstack-infra | 07:19 | |
*** tosky has joined #openstack-infra | 07:25 | |
*** ysandeep|rover is now known as ysandeep|rover|l | 07:25 | |
*** ociuhandu has quit IRC | 07:25 | |
*** pgaxatte has joined #openstack-infra | 07:28 | |
*** jcapitao has quit IRC | 07:29 | |
*** jcapitao has joined #openstack-infra | 07:31 | |
*** rpittau|afk is now known as rpittau | 07:34 | |
*** arxcruz|off is now known as arxcruz | 07:35 | |
*** yamamoto has quit IRC | 07:44 | |
*** ociuhandu has joined #openstack-infra | 07:52 | |
*** jpena|off is now known as jpena | 07:53 | |
*** ralonsoh has joined #openstack-infra | 07:53 | |
*** ociuhandu has quit IRC | 07:57 | |
*** yamamoto has joined #openstack-infra | 07:58 | |
*** ociuhandu has joined #openstack-infra | 08:01 | |
*** smarcet has joined #openstack-infra | 08:11 | |
*** yamamoto has quit IRC | 08:14 | |
*** yamamoto has joined #openstack-infra | 08:15 | |
*** smarcet has quit IRC | 08:16 | |
*** tkajinam has quit IRC | 08:30 | |
*** derekh has joined #openstack-infra | 08:33 | |
*** dtantsur|afk is now known as dtantsur | 08:34 | |
*** ysandeep|rover|l is now known as ysandeep|rover | 08:34 | |
*** nightmare_unreal has joined #openstack-infra | 08:37 | |
*** udesale has joined #openstack-infra | 08:38 | |
dtantsur | AJaeger: morning! thanks for your suggestion on the ironic hacking 2.0 patch, totally missed it. I wonder if somebody has to update https://docs.openstack.org/hacking/latest/user/usage.html#local-checks | 08:53 |
*** kevko_ has joined #openstack-infra | 08:59 | |
*** rcernin has quit IRC | 09:00 | |
*** pkopec has joined #openstack-infra | 09:03 | |
*** ociuhandu has quit IRC | 09:03 | |
*** smarcet has joined #openstack-infra | 09:06 | |
*** ociuhandu has joined #openstack-infra | 09:09 | |
*** smarcet has quit IRC | 09:10 | |
*** ociuhandu has quit IRC | 09:13 | |
AJaeger | dtantsur: yes, let me do the update... Check topic:update-hacking for my weekend fun ;) | 09:28 |
dtantsur | AJaeger: oh, you did have a lot of fun :) | 09:29 |
dtantsur | AJaeger: FYI we're handling ironic projects now, no need to bother with them (unless we miss something) | 09:29 |
AJaeger | dtantsur: Great! One less on my plate! | 09:30 |
dtantsur | I can assure you, you don't want to fix W504 all over the ironic codebase :D | 09:31 |
AJaeger | dtantsur: I disabled W504 everywhere ;) | 09:31 |
dtantsur | I quite like to have either 503 or 504 enabled for consistency | 09:31 |
dtantsur | (and 504 seems preferred apparently) | 09:31 |
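For context on the two pycodestyle checks discussed above: W503 flags a line break before a binary operator and W504 flags one after it, so enabling exactly one of them enforces a single wrapping style across a codebase. A minimal Python sketch of the two styles follows; the function names are made up for illustration and are not taken from the ironic code.

```python
def total_break_before(a, b, c):
    # Line breaks come BEFORE each operator.
    # This style is reported by W503 and accepted by W504.
    return (a
            + b
            + c)


def total_break_after(a, b, c):
    # Line breaks come AFTER each operator.
    # This style is reported by W504 and accepted by W503.
    return (a +
            b +
            c)
```

Both functions compute the same value; only the wrapping differs. With W504 enabled (and W503 ignored), the second form is the one that would need rewriting.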
AJaeger | Oh, there's hacking 3.0 out??? | 09:31 |
dtantsur | WUT | 09:31 |
dtantsur | rpittau: we're too slow ^^^ | 09:32 |
*** ykarel is now known as ykarel|lunch | 09:32 | |
rpittau | what the........... | 09:32 |
* rpittau flips table | 09:32 | |
dtantsur | by the time we update ironic, hacking will have as many versions as firefox | 09:32 |
rpittau | lol | 09:32 |
AJaeger | just minimal changes, shouldn't hurt us. | 09:32 |
dtantsur | ooookay, lemme update my patches while they're not yet numerous | 09:32 |
rpittau | famous last words? :P | 09:32 |
rpittau | so we go for 3.0 ? | 09:33 |
dtantsur | rpittau: let's try? | 09:33 |
rpittau | let's | 09:33 |
dtantsur | AJaeger: do they have a changelog other than git log? | 09:33 |
*** admcleod has quit IRC | 09:36 | |
*** sgw has quit IRC | 09:39 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: Allow configure-mirrors to enable extra repos https://review.opendev.org/693887 | 09:40 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: Allow configure-mirrors to enable extra repos https://review.opendev.org/693887 | 09:41 |
*** tkajinam has joined #openstack-infra | 09:41 | |
openstackgerrit | Andreas Jaeger proposed openstack/hacking master: Document new way of registering local plugins https://review.opendev.org/715894 | 09:42 |
AJaeger | dtantsur: I'm not aware of anything besides git log/review.opendev.org for hacking | 09:43 |
AJaeger | dtantsur: please review 715894 to address the point you raised | 09:44 |
dtantsur | thx! | 09:44 |
*** gshippey has joined #openstack-infra | 09:47 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: Improve job and node information banner https://review.opendev.org/677971 | 09:47 |
openstackgerrit | Andreas Jaeger proposed openstack/hacking master: Document new way of registering local plugins https://review.opendev.org/715894 | 09:48 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: Avoid confusing rsync errors when source folders are missing https://review.opendev.org/670044 | 09:52 |
*** smarcet has joined #openstack-infra | 10:00 | |
*** smarcet has quit IRC | 10:04 | |
*** tetsuro_ has quit IRC | 10:04 | |
*** ykarel|lunch is now known as ykarel | 10:15 | |
AJaeger | dtantsur: I've done bifrost, see https://review.opendev.org/715617 | 10:17 |
dtantsur | thanks AJaeger | 10:18 |
AJaeger | dtantsur: I think that's the only ironic one I did - I leave the rest to you and rpittau ;). Now updating for hacking 3.0 | 10:18 |
rpittau | AJaeger: sounds good :) | 10:20 |
*** ociuhandu has joined #openstack-infra | 10:21 | |
*** admcleod has joined #openstack-infra | 10:22 | |
*** yamamoto has quit IRC | 10:47 | |
*** dmellado has quit IRC | 10:50 | |
*** smarcet has joined #openstack-infra | 10:53 | |
*** yamamoto has joined #openstack-infra | 10:54 | |
*** rpittau is now known as rpittau|bbl | 10:56 | |
*** smarcet has quit IRC | 10:58 | |
*** jcapitao is now known as jcapitao_lunch | 11:02 | |
*** dklyle has quit IRC | 11:15 | |
*** gfidente has quit IRC | 11:21 | |
openstackgerrit | Matthew Treinish proposed openstack/pbr master: Update python requires packaging metadata for package https://review.opendev.org/715917 | 11:23 |
*** rfolco has joined #openstack-infra | 11:24 | |
*** jpena is now known as jpena|lunch | 11:34 | |
*** rosmaita has joined #openstack-infra | 11:36 | |
*** artom has joined #openstack-infra | 11:38 | |
*** smarcet has joined #openstack-infra | 11:48 | |
*** ysandeep|rover is now known as ysandeep|rover|b | 11:51 | |
*** smarcet has quit IRC | 11:52 | |
chandankumar | AJaeger, Hello | 12:00 |
chandankumar | AJaeger, is it possible to enable noop zuul jobs for openstack-tempest-skiplist? | 12:00 |
chandankumar | to merge a few patches there for example https://review.opendev.org/#/c/715871/ | 12:00 |
*** rlandy has joined #openstack-infra | 12:01 | |
*** lpetrut has joined #openstack-infra | 12:02 | |
*** rh-jelabarre has joined #openstack-infra | 12:04 | |
AJaeger | chandankumar: sure, just merge a change in your repo to add it ;) | 12:05 |
*** yamamoto has quit IRC | 12:05 | |
*** dmellado has joined #openstack-infra | 12:06 | |
AJaeger | chandankumar: amend the change to add .zuul.yaml file with the noop-jobs and you should be good | 12:07 |
*** yamamoto has joined #openstack-infra | 12:08 | |
chandankumar | AJaeger, ah, will do that thanks :-) | 12:08 |
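As a rough sketch of what AJaeger describes: the change would add a `.zuul.yaml` at the top of the repository that attaches the `noop-jobs` project template. The Python helper below just writes such a file; the exact YAML body is an assumption based on the usual Zuul project stanza, and `write_zuul_config` is a hypothetical name, not part of any tool.

```python
from pathlib import Path

# Assumed file content: a project stanza that attaches the shared
# "noop-jobs" template, so changes can merge without running real jobs.
ZUUL_YAML = """\
- project:
    templates:
      - noop-jobs
"""


def write_zuul_config(repo_root):
    """Write .zuul.yaml into the given checkout and return its path."""
    target = Path(repo_root) / ".zuul.yaml"
    target.write_text(ZUUL_YAML)
    return target
```

Calling `write_zuul_config(".")` from the top of a checkout and amending the pending change with the new file would match the suggestion above.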
*** ysandeep|rover|b is now known as ysandeep|rover | 12:13 | |
openstackgerrit | Mohammed Naser proposed openstack/openstack-zuul-jobs master: DNM: test inline pep8 (should not fail) https://review.opendev.org/715928 | 12:14 |
*** jcapitao_lunch is now known as jcapitao | 12:24 | |
*** jpena|lunch is now known as jpena | 12:30 | |
*** andrewbonney has joined #openstack-infra | 12:31 | |
openstackgerrit | Mohammed Naser proposed openstack/openstack-zuul-jobs master: DNM: this _should_ be a failing change https://review.opendev.org/715930 | 12:36 |
openstackgerrit | Merged openstack/project-config master: Add Shrews to alumni https://review.opendev.org/715373 | 12:36 |
openstackgerrit | Merged openstack/project-config master: Replace python-charm-jobs to py3 job https://review.opendev.org/714796 | 12:38 |
*** rpittau|bbl is now known as rpittau | 12:46 | |
*** ociuhandu has quit IRC | 12:48 | |
openstackgerrit | Grzegorz Grasza proposed openstack/project-config master: Add ability to push signed tags to tripleo-ipa https://review.opendev.org/715932 | 12:52 |
*** redrobot has quit IRC | 12:55 | |
*** Guest43440 has joined #openstack-infra | 12:56 | |
fungi | chandankumar: ianw: just now catching up on scrollback in here but the project creation didn't complete. we're still working through failures in jeepyb but i fell asleep | 12:57 |
*** Guest43440 is now known as redrobot | 12:58 | |
*** dave-mccowan has joined #openstack-infra | 12:59 | |
mordred | fungi: SLEEP | 12:59 |
*** ociuhandu has joined #openstack-infra | 13:00 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: tox: allow tox to be upgraded https://review.opendev.org/690057 | 13:03 |
*** ociuhandu has quit IRC | 13:05 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: install-docker: allow removal of conflicting packages https://review.opendev.org/702304 | 13:05 |
fungi | chandankumar: ianw: oh. except that somehow that acl eventually got applied | 13:07 |
*** gfidente has joined #openstack-infra | 13:07 | |
*** ociuhandu has joined #openstack-infra | 13:10 | |
*** rh-jlabarre has joined #openstack-infra | 13:12 | |
*** irclogbot_3 has quit IRC | 13:13 | |
*** mrmartin has quit IRC | 13:15 | |
*** rlandy has quit IRC | 13:15 | |
*** rh-jelabarre has quit IRC | 13:15 | |
*** auristor has quit IRC | 13:15 | |
*** tinwood has quit IRC | 13:15 | |
*** rlandy has joined #openstack-infra | 13:16 | |
*** tinwood has joined #openstack-infra | 13:17 | |
*** irclogbot_3 has joined #openstack-infra | 13:18 | |
*** smarcet has joined #openstack-infra | 13:22 | |
*** cdearborn has joined #openstack-infra | 13:24 | |
*** ociuhandu has quit IRC | 13:27 | |
*** auristor has joined #openstack-infra | 13:28 | |
fungi | on closer inspection, it took two passes of manage-projects to fully provision that project. discussion in #opendev but fix for that has been approved minutes ago | 13:31 |
*** yamamoto has quit IRC | 13:32 | |
*** Goneri has joined #openstack-infra | 13:40 | |
*** yamamoto has joined #openstack-infra | 13:41 | |
*** yamamoto has quit IRC | 13:43 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: install-docker: allow removal of conflicting packages https://review.opendev.org/702304 | 13:45 |
openstackgerrit | Monty Taylor proposed openstack/project-config master: Run manage-projects on gerrit related changes https://review.opendev.org/715945 | 13:46 |
Tengu | hello there! not sure this is the right place to ask, but: how may I get a new "stable/branch" in a new code repository? it's for tripleo/validations-libs and tripleo/validations-common - they currently have "only" master, and we'd need to get stable/train in addition... | 13:47 |
AJaeger | Tengu: the release team will create branches for you. | 13:54 |
Tengu | AJaeger: do I need to make a ticket/request somewhere? | 13:54 |
Tengu | or is it linked to rdo directly? | 13:55 |
AJaeger | Tengu: https://docs.openstack.org/project-team-guide/stable-branches.html | 13:55 |
*** xek has joined #openstack-infra | 13:56 | |
Tengu | þanks | 13:56 |
AJaeger | Tengu: so, create a request in releases repo | 13:56 |
*** lseki has joined #openstack-infra | 13:57 | |
Tengu | AJaeger: ok :). Will check that after my current call | 13:57 |
*** xek_ has quit IRC | 13:57 | |
*** sgw has joined #openstack-infra | 13:57 | |
*** smarcet has quit IRC | 14:03 | |
*** dmellado has quit IRC | 14:04 | |
*** yamamoto has joined #openstack-infra | 14:04 | |
*** dmellado has joined #openstack-infra | 14:05 | |
artom | donnyd, o/ OpenEdge (That's FN's new name, right?) doing OK? I got a couple of NODE_FAILUREs | 14:09 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Adds roles to install and run hashicorp packer https://review.opendev.org/709292 | 14:11 |
*** Goneri has quit IRC | 14:11 | |
*** smarcet has joined #openstack-infra | 14:11 | |
*** ysandeep|rover is now known as ysandeep|away | 14:12 | |
*** lpetrut has quit IRC | 14:13 | |
donnyd | artom: Checking now | 14:13 |
donnyd | Well that is quite a bit more than a couple | 14:13 |
*** Goneri has joined #openstack-infra | 14:14 | |
*** lpetrut has joined #openstack-infra | 14:15 | |
*** beekneemech is now known as bnemec | 14:15 | |
artom | donnyd, I'm a selfish prick, I speak only for my patches ;) | 14:16 |
donnyd | Oh well that was easy to figure out... Had a jenkins server run away with all my resources | 14:16 |
donnyd | seriously thank you for the heads up. | 14:16 |
* artom imagines a British butler running away with rackmounts | 14:16 | |
artom | Laughing maniacally | 14:16 |
artom | donnyd, thank you for providing the resources :) | 14:17 |
artom | donnyd, btw, would you consider adding the nested-virt label/flavor? We (whitebox) will probably move some of our tests to a job that runs on those, to avoid being 100% dependent on Open Edge (not that we don't trust you, but single point of failure and all that) | 14:17 |
donnyd | I will add any flavor you want - but I do believe it's enabled by default | 14:18 |
artom | Oh right, it's there already. Ignore me :) | 14:19 |
donnyd | But if you need something special - always ask because if I can do it - I will | 14:20 |
artom | :) | 14:20 |
*** jackedin has joined #openstack-infra | 14:21 | |
*** ykarel is now known as ykarel|away | 14:25 | |
donnyd | artom: keep me in the loop if you have any more issues getting to OE | 14:30 |
artom | donnyd, will do - thanks again, it's appreciated | 14:32 |
artom | donnyd, hrmm, so actually | 14:32 |
artom | donnyd, do your machines have SRIOV-capable network cards? | 14:32 |
*** ociuhandu has joined #openstack-infra | 14:34 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: Allow configure-mirrors to enable extra repos https://review.opendev.org/693887 | 14:37 |
donnyd | yes - but it's not set up ATM | 14:37 |
artom | donnyd, interesting :) Good to know, I don't have a specific ask for now, I need to mull it over some more | 14:39 |
*** dklyle has joined #openstack-infra | 14:40 | |
donnyd | artom: I have Intel X520 nics | 14:42 |
donnyd | there are 8 per hypervisor with 4 currently in use | 14:42 |
artom | donnyd, so, the context is - Red Hat (my employer) is working on a bunch of Nova features involving what I like to call "exotic hardware" | 14:42 |
artom | VGPUs, FPGAs, that kind of stuff | 14:43 |
artom | Unless RH ponies up the hardware, we'll never have upstream CI for that | 14:43 |
fungi | well, or unless it can be effectively emulated | 14:43 |
artom | So I've been kinda half-assedly pushing for us to pony up the hardware and set up that CI | 14:43 |
fungi | and you could certainly have upstream ci performing unit testing on all the bits of the driver even if it's not an integrated functional test with the hardware | 14:44 |
artom | half-assedly because our internal hardware situation is a mess, and because I don't want to do devops for a CI cloud | 14:44 |
artom | fungi, yeah, we did that in the past for NUMA and PCI | 14:44 |
artom | And while it's better than nothing, real hardware integration tests are definitely a massive advantage | 14:44 |
donnyd | artom: IMO - 3P CI is not as good as 1P CI | 14:45 |
artom | donnyd, we don't even have 3p | 14:45 |
fungi | i would argue that you at least want thorough unit testing with as much code coverage of the driver as possible, regardless of whether or not you also have functional tests with representative hardware | 14:45 |
artom | fungi, right, I'm not disagreeing with you | 14:45 |
artom | Just saying there's a testing gap there :) | 14:45 |
artom | So I figured I could start with something smaller in scope and simpler, namely SRIOV CI | 14:46 |
artom | Which we also don't really have - I guess Mellanox have a 3rd party one? | 14:46 |
artom | So... if donnyd enabled SRIOV on OE, *and* Nova does the work to enable 2-level passthrough of PFs, we would cover the SRIOV CI bit | 14:47 |
artom | *But* we'd still have the GPU, FPGA gap | 14:47 |
artom | Whereas if I managed to get that RH 3P CI up and running | 14:47 |
clarkb | note we have gpus available in small quantities | 14:47 |
artom | We'd have the groundwork to just add cards/machines to that CI for any future needs | 14:48 |
clarkb | it's like 1 gpu-enabled test node at a time iirc, but >0 | 14:48 |
artom | So I think the RH CI is the more "correct" idea, as it lays the groundwork for future exotic hardware testing | 14:49 |
artom | Whereas SRIOV testing on OE is limited in scope | 14:49 |
artom | Despite being easier (for me, at least) to get started with | 14:49 |
donnyd | It would be better to grab the right hardware and send it to a trusted 1P CI provider | 14:49 |
artom | We do have a deal with Vexxhost... | 14:50 |
donnyd | 3P CIs only work so well and are only so useful | 14:50 |
clarkb | iirc the reason nova wasn't super interested in the gpus we have is they can't do virtualization of gpu resources to split them up; it's all 1:1 pci passthrough | 14:50 |
artom | It's just... very much outside my scope as a dev engineer | 14:50 |
donnyd | mnaser: is pretty trustworthy - maybe some agreement to load "exotic" hw into 1P CI is a more scalable and reliable method to success for such a thing | 14:51 |
artom | donnyd, I don't doubt mnaser's quality as a person and business :) | 14:51 |
donnyd | I would think... If we look at how well the 3P CI thing has worked over time it appears to be hit or miss to me | 14:51 |
*** ociuhandu has quit IRC | 14:52 | |
artom | TripleO seem to have made it work | 14:52 |
donnyd | Yes they have - with a significant amount of kinetic effort | 14:52 |
artom | Like everything else TripleO... | 14:53 |
donnyd | and when someone (company A) wants to get a special test env setup - they have to build the whole thing instead of just plugging it into something that already runs and drives | 14:53 |
clarkb | and is reliable | 14:53 |
clarkb | that seems to be the big thing third party ci underestimates | 14:53 |
donnyd | As our resources grow tighter - I would think the model would work better... but there has to be a level of trust between the 1P CI providers | 14:53 |
artom | donnyd, it's a very valid point... | 14:53 |
*** iurygregory has quit IRC | 14:54 | |
artom | donnyd, I could try and bubble that up the chain | 14:54 |
artom | (My internal RH management chain) | 14:54 |
donnyd | and if someone wants to go it like that.. well you are a 1P provider - or you're not.. It's possible that some may see the value in contributing | 14:54 |
donnyd | It makes sense to me - but also I sit on the 1P provider side of the fence.. If I need to get a feature in Openstack - it would be super duper easy for me to do so - because I already contribute direct resources that are already plugged in and proven | 14:56 |
*** iurygregory has joined #openstack-infra | 14:56 | |
artom | donnyd, so out of curiosity - and I want to emphasize, this is really just me asking questions - if RH came to you with a proposal to add hardware to OE | 14:58 |
artom | Presumably with some amount of money being exchanged in some way | 14:58 |
donnyd | Call me crazy - but that is my thought. If RH or Intel, or Nvidia needed something special (for example - not an exclusive list) - I don't think it's too much to ask to find a partner who already does CI... or contribute to the general pool themselves with a trusted proven build system. Also in the event we ever needed an audit or something like that - we won't have a bunch of build systems behind some nebulous wall. | 14:59 |
artom | Would you be open to that? | 14:59 |
artom | "contribute to the general pool themselves with a trusted proven build system" | 14:59 |
artom | We don't have that :/ | 14:59 |
artom | Not for running external CI workloads, at any rate | 14:59 |
artom | We're a software-first kind of shop, running clouds is new to us | 15:00 |
artom | We're getting there, but it's still WIP | 15:00 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: install-docker: allow removal of conflicting packages https://review.opendev.org/702304 | 15:00 |
donnyd | I would happily host them likely for free - but if I am being honest - I think vexxhost / limestone / rackspace (etc) are better candidates for this type of proposal. I run my cloud to gain experience and because it is just fun for me. I don't have a fancy degree, so I have to learn through doing to back up my skill set in the market. All the other 1P providers make their living from Openstack and already have real data centers and prod workloads running. I am pretty sure they would happily take on such a project. | 15:04 |
mordred | mnaser: ^^ | 15:06 |
artom | donnyd, since you're running for TC, wouldn't that be a project that you could champion? Make 1P CI "pluggable" in an official way ;) | 15:07 |
*** yoctozepto has quit IRC | 15:07 | |
*** yoctozepto has joined #openstack-infra | 15:08 | |
*** edausq has quit IRC | 15:08 | |
*** toabctl has quit IRC | 15:08 | |
*** osmanlicilegi has quit IRC | 15:08 | |
*** ab-a has quit IRC | 15:08 | |
*** arxcruz has quit IRC | 15:09 | |
*** rledisez has quit IRC | 15:10 | |
*** osmanlicilegi has joined #openstack-infra | 15:10 | |
*** aarents has quit IRC | 15:10 | |
*** EmilienM has quit IRC | 15:10 | |
*** armax has joined #openstack-infra | 15:10 | |
*** arxcruz has joined #openstack-infra | 15:10 | |
*** rakhmerov has quit IRC | 15:10 | |
*** rledisez has joined #openstack-infra | 15:10 | |
*** EmilienM has joined #openstack-infra | 15:11 | |
*** ab-a has joined #openstack-infra | 15:11 | |
donnyd | artom: I would ask why the TripleO CI is 3P to begin with | 15:11 |
*** cmurphy has quit IRC | 15:13 | |
donnyd | I would not ask them to fund the whole CI... I can tell you from experience how much it costs each month (about 600-1K dollars for me). I would ask why 3P is easier to maintain, easier to wire up, ... just easier.. Also I have a pretty myopic view of the world.. it's through my tiny little lens of personal learning. So of course take all that in context. | 15:13 |
mordred | artom: 1P CI is pluggable already | 15:13 |
mordred | we have an extensible system that accepts resources from multiple parties | 15:13 |
artom | mordred, but only in the sense that each party spins up their own cloud, and then plugs into nodepool, right? | 15:13 |
donnyd | It's on my hit list - we need resources.. and I am willing to bet all these 3P CIs could really make a discernible difference in how efficient and effective our CI really is. If we could get all the 3P to convert to 1P - and just get a small amount of general purpose workloads.. we could make our CI do more with the already dwindling resources... also schedule jobs where they belong | 15:13 |
mordred | artom: that's right - or hands resources to someone already running a cloud. | 15:14 |
artom | mordred, I was thinking more of the following use case: I want to test a specific piece of hardware, but don't have the capacity/will/resource to actually run a cloud. But I'm willing to buy the hardware itself. Can I send it to an existing 1P provider? | 15:14 |
mordred | it is not required that someone run a cloud themselves - as donnyd pointed out - there are plenty of vendors out there already doing that | 15:14 |
artom | mordred, ah, so how would that work? | 15:14 |
mordred | artom: in theory that should be possible - but so far nobody has decided to do it :) | 15:15 |
mordred | artom: that said ... | 15:15 |
mordred | it's worth noting there is another facet that makes some things more 3PCI than 1PCI - and that's ability of the general population of openstack to debug/reproduce/fix issues | 15:15 |
donnyd | Vendor A signs partnering agreement with Provider B - Provider B creates appropriate provisions and labels in nodepool - vendor A schedules jobs against that | 15:15 |
donnyd | We literally already do this right now with OE(FN) | 15:16 |
mordred | so if the special hardware is a million dollar SAN - it's still unlikely to go into 1PCI because actually gating on it working potentially presents an undue burden on the developers if something goes wrong - the number of people able to fix the issue is ... low | 15:16 |
mordred | donnyd: and yup - that | 15:16 |
*** edausq has joined #openstack-infra | 15:16 | |
fungi | a big hurdle for many of the third-party ci systems is that they're run in corporate labs with draconian firewall rules which preclude things like our zuul servers making connections to api endpoints they might be hosting | 15:16 |
artom | mordred, the "general population" thing is a different problem. We (RH) have massive customers that use all that stuff (think telcos), but don't participate upstream 1 iota | 15:17 |
mordred | so - there's two axes - one is general purpose things that we just don't have access to but if someone wanted to fund it would make sense - GPUs probably fit into this category | 15:17 |
artom | So they're there, but the only way they have to influence/improve the community is through us (RH) | 15:17 |
mordred | artom: that's not what I mean | 15:17 |
fungi | and getting their companies' network security overseers to okay external access for lab environments or to create entirely separate networks to put these resources on is more than they want to deal with | 15:17 |
artom | mordred, ah, you mean in terms of the infra team being able to manage it? | 15:18 |
mordred | what I mean is that causing all of the openstack developers to be bound by the health of gating on a piece of gear that most of the openstack developers do not have the ability to interact with or fix when it goes wrong is a fundamental issue - and those sorts of things are organized into 3PCI quite on purpose - it's not just a matter of availability or willingness of a vendor to manage it | 15:18 |
donnyd | There are many ways to slice this up - but really it boils down to making 1P CI the preferred method. Something like a partnering agreement can be done direct between vendor dev shops and existing 1P CI providers.. no need to involve corporate IT in this | 15:19 |
mordred | artom: I mean that openstack developers in general should have a reasonable expectation of being able to debug an issue if it's something we're going to gate on | 15:19 |
donnyd | but for the million dollar SAN example - 3P makes sense | 15:19 |
mordred | has very little to do with infra | 15:19 |
*** cmurphy has joined #openstack-infra | 15:19 | |
artom | mordred, I see | 15:20 |
mordred | for general purpose specialized hardware that just isn't in our clouds - like ARM or GPUs - that's a thing where working with providers to make sure it's provided into some clouds is a great way forward to allow people to work on it | 15:20 |
mordred | because those things aren't crazy for normal devs to be expected to debug if the gate goes south | 15:20 |
mordred | so there's some things not in the 1PCI system just because nobody has provided them but would be very happy additions - and some things where it's still likely not a good idea for the entire community to be gated by it | 15:21 |
donnyd | The existing providers can already get devs that direct debugging access. We all already have public facing clouds that are designed with that intent in mind. | 15:21 |
mordred | yeah | 15:21 |
artom | mordred, ok, I see what you mean | 15:21 |
donnyd | it's not a one size fits all answer.. it's more about lowering the barriers to getting devs access to the things they *need* to be successful .. and lowering the burden for contribution | 15:22 |
mordred | artom: but back to your original question - I bet there are still a bunch of ways in which RH could choose to send money to some of the existing cloud providers to add capabilities for specific things | 15:23 |
donnyd | For example wouldn't it be cheaper (and easier) for RH to just buy CI resources from an existing CI provider? | 15:23 |
mordred | both for 1PCI and 3PCI use cases - because even if it's a 3PCI use case - it still might be ... yeah ^^ | 15:23 |
mordred | that | 15:23 |
artom | mordred, I'd agree - it's just way outside my usual scope | 15:23 |
mordred | artom: same :) | 15:23 |
sean-k-mooney | mordred: well part of the issue is if we want to set up public ci we basically need to disconnect from the redhat network | 15:24 |
sean-k-mooney | the current provider we have for upshift is connected to the redhat network by it | 15:24 |
clarkb | one thing I've seen over the years is that devs will demand X, we provide it, then the actual uptick on use of that feature is really slow. That then feeds back into the system as "this wasn't important after all". We've seen that with multi node testing, the gpu resources mentioned above, and more. I mention this because one thing to be wary of is doing a bunch of work then having the result sit idle | 15:24 |
clarkb | sometimes starting small and iteratively is a good thing | 15:24 |
clarkb | now multinode testing is common and many people make use of it but for about a year no one would touch it | 15:25 |
clarkb | which can be very discouraging | 15:25 |
mordred | sean-k-mooney: yah. that's why just paying mnaser (or someone) to run the capacity you need instead of trying to run your own cloud *might* be a more cost-effective choice | 15:25 |
artom | clarkb, I guess it's chicken and egg a bit? It appears, but not necessarily super stable, devs are wary of using it, so it doesn't improve, etc etc | 15:25 |
*** ociuhandu has joined #openstack-infra | 15:26 | |
*** ab-a has quit IRC | 15:26 | |
*** edausq has quit IRC | 15:26 | |
clarkb | artom: I guess? In the case of multinode testing it was super unstable because openstack was unstable :) | 15:26 |
donnyd | I think it really falls back to why you should buy it instead of build it for *most* cases. Clouds are hard.. if they were easy... well everybody would already have one | 15:26 |
clarkb | I think the correct reaction is to realize "we demanded this now it is up to us to make our software work with it" | 15:26 |
artom | donnyd, they do. It's called AWS ;) | 15:26 |
donnyd | where is the barf emoji when you really need it | 15:27 |
donnyd | LOL | 15:27 |
clarkb | artom: fwiw literally nothing is stopping you from testing with gpus today aiui | 15:27 |
clarkb | the only thing is the assumption it's not possible and the annoyance that it isn't new enough gpu hardware to do the virtualized gpu stuff | 15:28 |
artom | clarkb, except my own ignorance ;) | 15:28 |
donnyd | doesn't vexxhost already have those labels clarkb ? | 15:28 |
mnaser | i don't think our gpus support vgpus | 15:28 |
clarkb | but if you start with pci passthrough testing and that all works it's so much easier to say "we can apply this to the newer thing if we had it" | 15:28 |
clarkb | mnaser: correct | 15:28 |
*** rledisez has quit IRC | 15:28 | |
clarkb | mnaser: I'm suggesting we not worry about that to start | 15:28 |
clarkb | but instead zero progress has been made | 15:28 |
clarkb | if instead we start by doing what we can then we make some progress and have an argument for the future in order to make more progress | 15:29 |
clarkb | but as far as I know the whole thing has been DOA because no vgpu | 15:29 |
artom | clarkb, so in essence you're saying "create CI jobs that use GPU PCI passthrough as an argument to add vGPU capability"? | 15:29 |
sean-k-mooney | mnaser: they do not | 15:29 |
*** pgaxatte has quit IRC | 15:29 | |
sean-k-mooney | mnaser: i confirmed that with you shortly after you added them to the ci pool | 15:30 |
*** ab-a has joined #openstack-infra | 15:30 | |
clarkb | artom: right because it shows there is actual interest and something is working. What we are showing today is no one cares enough to do the basic thing | 15:30 |
donnyd | right - wouldn't it be easier for the vendor wanting to test VGPUS to just send you the right gear mnaser ? | 15:30 |
fungi | artom: if the goal is to test gpus, then have jobs which make use of pci passthrough to interact with the gpus. if the goal is not actually using gpus then the "need" for gpu-enabled test nodes may have been a mischaracterization | 15:30 |
sean-k-mooney | donnyd: said vendor being nvidia means that is not going to happen | 15:30 |
sean-k-mooney | also licensing would still be a pain | 15:30 |
artom | fungi, I guess the latter then. I wasn't involved in those discussions, but I'm assuming GPUs were requested for vGPU stuff, not plain old PCI passthrough. | 15:31 |
sean-k-mooney | artom: correct it was for the vgpu testing | 15:31 |
clarkb | except we don't have first party pci passthrough testing either... | 15:31 |
sean-k-mooney | which needs specific sku | 15:32 |
fungi | in which case do you really need gpus to test gpu virtualization features? | 15:32 |
clarkb | literally this kills two birds with one stone but devs are not interested unless they get extra features | 15:32 |
donnyd | sean-k-mooney: hence the partnering agreement | 15:32 |
*** dmellado has quit IRC | 15:32 | |
clarkb | I get it, we can't test the extra features, but we can test all the other bits | 15:32 |
artom | fungi, yeah, because the physical card provides the virtual GPUs via the mdev mechanism | 15:32 |
artom | Can't test the latter if you don't have the actual card :) | 15:32 |
fungi | seems like if you don't need an actual gpu to handle some workload, you could just mock the vgpu interface | 15:32 |
clarkb | and by not doing what we can it takes the wind out of the sails for doing more | 15:32 |
sean-k-mooney | fungi: qemu can't virtualise fake gpus capable of testing mdev based vgpu | 15:32 |
*** rledisez has joined #openstack-infra | 15:32 | |
sean-k-mooney | fungi: so the only way to test it is on baremetal ironic nodes | 15:33 |
fungi | sean-k-mooney: because nobody has written the software yet? do we know anyone who writes software? ;) | 15:33 |
artom | fungi, I don't know much about that stuff, but that would be an entirely new kernel module IIUC :P | 15:33 |
fungi | i hear those are software too | 15:33 |
*** jgwentworth is now known as melwitt | 15:33 | |
artom | You're not genuinely suggesting we start writing kernel-level mocks | 15:33 |
clarkb | artom: sean-k-mooney yes, I think we've all accepted that. What I'm suggesting is that the pci passthrough case is a literal example artom made above and we can test that as far as I know but no one is willing to | 15:33 |
sean-k-mooney | fungi: for what it's worth i have looked at faking this before but it won't really help improve testing | 15:34 |
fungi | artom: why not mock the interactions in the driver? | 15:34 |
*** ociuhandu has quit IRC | 15:34 | |
clarkb | and by not doing that work we've taken all the momentum behind getting to the point of doing what you really want and thrown it away | 15:34 |
sean-k-mooney | for example we can create mdevs and check that that logic works using the fake tty mdev driver | 15:34 |
artom | fungi, those are functional tests, and we have those | 15:34 |
artom | fungi, but functional tests are only as good as the mocks they use | 15:34 |
*** iurygregory has quit IRC | 15:35 | |
artom | And we've been burned by that before | 15:35 |
*** ociuhandu has joined #openstack-infra | 15:35 | |
fungi | sure, and integration tests are only as good as the integrations they test | 15:35 |
sean-k-mooney | yep | 15:35 |
*** edausq has joined #openstack-infra | 15:35 | |
*** iurygregory has joined #openstack-infra | 15:35 | |
sean-k-mooney | fungi: nova has tried to require third-party ci before merging some of those features in the past | 15:35 |
sean-k-mooney | but we have not always been successful in getting the vendor to set it up | 15:35 |
*** nightmare_unreal has quit IRC | 15:36 | |
*** yamamoto has quit IRC | 15:36 | |
*** smarcet has quit IRC | 15:36 | |
sean-k-mooney | we were able to get intel to provide a hardware based third party ci for virtual persistent memory but now that they have pulled back that is not really maintained and the cyborg ci is not going to be put in place either | 15:37 |
artom | clarkb, so, let's say I get a job merged that uses the GPUs we do have | 15:38 |
artom | clarkb, tbh, I don't see how that's an argument to then go out and buy different GPUs | 15:38 |
fungi | yep, there's probably also a limit to what nova should expect to have to test, if what's actually happening is interactions with some hypervisor and not directly with the hardware. can't the hypervisor maintainers test those hardware interactions? | 15:38 |
clarkb | artom: it supports the idea that people actually care about that testing | 15:38 |
clarkb | artom: right now the message we are giving is that no one cares enough to do a simple test | 15:38 |
artom | Because they're completely different features. It's like saying "we have NUMA CI jobs, now please buy some FPGAs" | 15:39 |
clarkb | artom: also from experience we tend to learn a lot setting up the simple case | 15:39 |
*** andreykurilin has quit IRC | 15:39 | |
*** dmellado has joined #openstack-infra | 15:39 | |
clarkb | artom: from an implementation detail perspective maybe, but for end users they are very related | 15:39 |
artom | fungi, that's kind of different issue - libvirt isn't really a hypervisor | 15:40 |
clarkb | artom: if I know that gpu pci passthrough is working and well tested I'm more likely to use that feature | 15:40 |
clarkb | then I might want to enable vgpus and find oh that isn't tested due to some technical issue lets help there | 15:40 |
clarkb | right now we've put the car in park and have given up before leaving the driveway | 15:40 |
artom | clarkb, ah, I see | 15:40 |
artom | clarkb, well, I was seeing it as two cars | 15:40 |
fungi | artom: i was using "hypervisor" as short hand for nova backend, my point was does libvirt test that its support for those things works? | 15:41 |
artom | Except one is missing, and the one we have is actually a bike | 15:41 |
mordred | bikes provide good exercise | 15:41 |
artom | fungi, I have no idea | 15:41 |
artom | (about the libvirt testing) | 15:41 |
fungi | like, is nova's job to test libvirt for the libvirt maintainers? | 15:41 |
artom | kashyapc would not | 15:41 |
artom | *know | 15:41 |
fungi | or can nova just assume that libvirt tests that its support for these things works, and so only test its libvirt api compliance? | 15:42 |
artom | fungi, well, libvirt doesn't fully encapsulate/abstract the hw | 15:42 |
clarkb | from experience most of the big leaps in ci capability that have been made started small on a bike | 15:42 |
clarkb | multi node testing is a major example of this | 15:42 |
fungi | so at what point does nova talk directly to the kernel modules for this stuff? are those kernel modules tested, and can nova just test its compliance with the kernel module's api? | 15:43 |
artom | fungi, I think we're just debating the value of full stack integration tests at this point, no? :) | 15:43 |
artom | Or at least, whether it's OpenStack CI's role to perform them | 15:44 |
clarkb | artom: I think fungi is saying a similar thing to what I'm saying | 15:44 |
clarkb | if we start by doing what we can, that shows interest and motivation in the space | 15:44 |
clarkb | and you can often turn that into further development | 15:44 |
artom | clarkb, fair enough - snowball effect and all that | 15:44 |
*** rledisez has quit IRC | 15:45 | |
*** ab-a has quit IRC | 15:45 | |
*** Ng has quit IRC | 15:45 | |
*** tdasilva has quit IRC | 15:45 | |
*** rajinir has quit IRC | 15:45 | |
*** evgenyl has quit IRC | 15:45 | |
*** donnyd has quit IRC | 15:45 | |
*** jdelaros1 has quit IRC | 15:45 | |
*** hrybacki has quit IRC | 15:45 | |
*** dougwig has quit IRC | 15:45 | |
*** ttx has quit IRC | 15:45 | |
*** vdrok has quit IRC | 15:45 | |
*** jamespage has quit IRC | 15:45 | |
*** rpittau has quit IRC | 15:45 | |
*** lathiat has quit IRC | 15:45 | |
*** kota_ has quit IRC | 15:45 | |
*** Anticimex has quit IRC | 15:45 | |
clarkb | I'm talking about it from a test resource perspective and fungi is talking about it from a mocks/fakes perspective | 15:45 |
clarkb | both push the problem space forward in different ways | 15:45 |
fungi | artom: take it the other way, shouldn't you then want to test different bios loadouts on the servers too, because you can't trust that the bios is thoroughly tested by its maintainers? there's always going to be a point at which you say "this is the scope of what we feel is reasonable to test, and we trust that the people who build the things we're interacting with test what they're responsible for" | 15:45 |
*** rledisez has joined #openstack-infra | 15:45 | |
*** ab-a has joined #openstack-infra | 15:45 | |
*** Ng has joined #openstack-infra | 15:45 | |
*** tdasilva has joined #openstack-infra | 15:45 | |
*** rajinir has joined #openstack-infra | 15:45 | |
*** evgenyl has joined #openstack-infra | 15:45 | |
*** donnyd has joined #openstack-infra | 15:45 | |
*** jdelaros1 has joined #openstack-infra | 15:45 | |
*** hrybacki has joined #openstack-infra | 15:45 | |
*** dougwig has joined #openstack-infra | 15:45 | |
*** ttx has joined #openstack-infra | 15:45 | |
*** vdrok has joined #openstack-infra | 15:45 | |
*** jamespage has joined #openstack-infra | 15:45 | |
*** rpittau has joined #openstack-infra | 15:45 | |
*** lathiat has joined #openstack-infra | 15:45 | |
*** kota_ has joined #openstack-infra | 15:45 | |
*** Anticimex has joined #openstack-infra | 15:45 | |
mnaser | I think there is a massive difference between testing pci passthrough and vgpus | 15:46 |
*** andreykurilin has joined #openstack-infra | 15:46 | |
artom | fungi, right, it's definitely a big grey area | 15:46 |
artom | fungi, tending towards pitch black when you're at BIOS level ;) | 15:46 |
clarkb | mnaser: from a technical perspective, yes | 15:46 |
mnaser | one is straight up just an extra thing we pass to libvirt, the other is a big combination of interactions with the kernel, placement service and external software all together | 15:46 |
clarkb | mnaser: but from users driving use cases they are more closely related | 15:46 |
artom | fungi, my understanding as a Nova dev is - functional tests have lied to us in the past | 15:47 |
artom | And integration tests on real hardware are better | 15:47 |
*** rledisez has left #openstack-infra | 15:47 | |
artom | Obviously taking into account cost of resources, etc | 15:47 |
mnaser | If we want to test pci passthrough, we can probably do it in other ways that don’t need a GPU | 15:48 |
mnaser | but we would be pretty much testing libvirt at that point | 15:48 |
artom | mnaser, I agree, but I also get clarkb's point - if a user wants GPUs in their instance, there's not much difference between PCI passthrough and VGPUs | 15:48 |
fungi | but vgpu provisioning doesn't go through libvirt? | 15:48 |
artom | The operator will make that distinction, but not the user | 15:48 |
artom | fungi, no, kernel IIRC | 15:49 |
*** aarents has joined #openstack-infra | 15:49 | |
fungi | got it, so nova's host agent talks to the kernel driver? | 15:49 |
artom | fungi, I'm not the expert on this - but I believe it's the deployer/operator's job to set it up, and Nova then reads from /sys and stuff | 15:50 |
fungi | okay, so sysfs interactions | 15:51 |
fungi | that does seem like something which could be recorded and played back, at least | 15:51 |
*** hashar has joined #openstack-infra | 15:51 | |
fungi | (in absence of having actual hardware representatives) | 15:52 |
fungi | and then devs could even run those tests locally without needing a fancy gpu too | 15:52 |
artom | I'm pretty sure we have func tests that do something like that | 15:53 |
fungi | as long as the jobs don't go so far as to try to put a workload on the gpu from a guest | 15:53 |
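A minimal sketch of the record-and-replay idea fungi describes above, assuming the mediated-device sysfs layout `/sys/class/mdev_bus/<device>/mdev_supported_types/<type>/available_instances`. The helper names, the PCI address and the type name in `SNAPSHOT` are hypothetical illustrations, and this is not how Nova itself is implemented; it only shows that code which takes the sysfs root as a parameter can be exercised against a replayed tree without GPU hardware.

```python
import os
import tempfile

# Hypothetical "recording" of a few sysfs attributes, keyed by path
# relative to the sysfs root. Values are made up for illustration.
SNAPSHOT = {
    "class/mdev_bus/0000:00:00.0/mdev_supported_types/fake-type-1/name": "fake type\n",
    "class/mdev_bus/0000:00:00.0/mdev_supported_types/fake-type-1/available_instances": "4\n",
}


def replay_tree(snapshot, root):
    """Write a recorded {relative path: text} mapping out under root."""
    for rel, text in snapshot.items():
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as fh:
            fh.write(text)


def list_mdev_types(sysfs_root):
    """Return {(parent device, mdev type): available instance count}."""
    result = {}
    bus = os.path.join(sysfs_root, "class", "mdev_bus")
    if not os.path.isdir(bus):
        return result
    for dev in sorted(os.listdir(bus)):
        types_dir = os.path.join(bus, dev, "mdev_supported_types")
        if not os.path.isdir(types_dir):
            continue
        for mtype in sorted(os.listdir(types_dir)):
            avail = os.path.join(types_dir, mtype, "available_instances")
            with open(avail) as fh:
                result[(dev, mtype)] = int(fh.read().strip())
    return result


if __name__ == "__main__":
    # Replay the snapshot into a scratch directory and query it as if it
    # were /sys; on a real host the caller would pass "/sys" instead.
    with tempfile.TemporaryDirectory() as fake_sys:
        replay_tree(SNAPSHOT, fake_sys)
        print(list_mdev_types(fake_sys))
```

As artom notes in the discussion, a test like this is only as good as the recorded data, so it complements rather than replaces jobs on real hardware.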
artom | In any case, I guess I need to put my money where my mouth is and start using the GPUs we have | 15:53 |
artom | ... and then ask for more money for VGPU-capable hw :) | 15:53 |
donnyd | artom: or ask the vendor that makes them to collaborate with someone who can lower the burden | 15:54 |
donnyd | :) | 15:54 |
artom | donnyd, yeah, I believe NVIDIA were giving away GPUs left and right... but to the wrong company ;) | 15:54 |
fungi | worth talking to knikolla as he may have some info on what the gpu situation is like in moc/cloudlab | 15:55 |
artom | I need to go feed the kiddos, before they explode the house | 15:55 |
fungi | i saw lots of fancy presentations at the open cloud workshop earlier this month about testing gpu and fpga enabled systems | 15:56 |
donnyd | fungi: links? | 15:56 |
* fungi checks to see if the recordings have gone up yet | 15:57 | |
artom | https://massopen.cloud/events/2020-open-cloud-workshop/ I'm guessing | 15:57 |
fungi | yeah, there | 15:57 |
fungi | i especially liked the talk on 100% free/libre open source toolchain for configuring fpgas | 15:57 |
* fungi breaks himself of the habit of saying "programming" where fpgas are concerned | 15:58 | |
knikolla | for openstack, i believe we're using PCI passthrough. https://github.com/CCI-MOC/rhosp-director-config this is our full tripleo config in case anyone finds it helpful. | 15:58 |
fungi | knikolla: okay, so no vgpu availability in there yet, i guess | 15:59 |
knikolla | I don't think that is available in queens yet, which is what we're still on. | 16:00 |
*** smarcet has joined #openstack-infra | 16:02 | |
clarkb | it also requires specific hardware | 16:03 |
clarkb | (which you may or may not have) | 16:03 |
sean-k-mooney | knikolla: the issue is that without some change to nova and maybe qemu we can't do double passthrough | 16:03 |
sean-k-mooney | knikolla: so even if you had the correct gpus available unless the ci could spin up the host as an ironic node it would not allow us to test with them | 16:04 |
knikolla | yeah, we've created a VM flavor which fully eats up the entire GPU node | 16:04 |
knikolla | but for us that's usually less of an issue, since we also allow people to reserve bare metal nodes | 16:05 |
knikolla | and usually people who need specialized hardware do that | 16:05 |
*** smarcet has quit IRC | 16:05 | |
knikolla | our PCI passthrough is mostly for OpenShift running on top of OpenStack to see the GPUs and run containers on those. | 16:06 |
sean-k-mooney | knikolla: right but unless you modified the xml to have a virtual iommu, placed the gpu into a separate iommu by adding it to a different pcie bridge and configured the q35 machine type and used a uefi image | 16:06 |
sean-k-mooney | then i don't think we would be able to create VFs in the vm to then create the mdevs for the l2 guest to use | 16:06 |
sean-k-mooney | also we would need nested virt | 16:06 |
sean-k-mooney | so one of the issues is while there are ways to test specific hardware in a vm it's not supported by nova so the first layer vm would have to be created by something that is not openstack at the moment | 16:08 |
sean-k-mooney | i have done some testing like this manually using libvirt but just enough to know what the gaps are in nova and how much work it is to close them | 16:08 |
sean-k-mooney | for what it's worth amd's vGPU technology, called MxGPU, just uses sriov | 16:09 |
sean-k-mooney | and requires no licensing | 16:09 |
*** rpittau is now known as rpittau|afk | 16:12 | |
knikolla | sean-k-mooney: we haven't needed double pci passthrough so haven't experimented with that. | 16:16 |
knikolla | but your last message gives me quite a bit to look up and learn about :) | 16:17 |
sean-k-mooney | knikolla: you can't do it with openstack but https://www.berrange.com/posts/2017/02/16/setting-up-a-nested-kvm-guest-for-developing-testing-pci-device-assignment-with-numa/ actually does work | 16:17 |
sean-k-mooney | the issue is normally if you do that you can't create VFs in the first level vm to assign to the second level guest | 16:18 |
sean-k-mooney | i believe it's possible to get it to work but it's non trivial | 16:19 |
sean-k-mooney | i have done a double passthrough of a full nice using that method | 16:19 |
sean-k-mooney | * full nic | 16:19 |
fungi | and just to clarify, the reason i suggested knikolla is that he's one of the folks i can think of who might have access to the hardware necessary to work out how to test such features (and mordred has chatted with him in the past about us possibly getting some small nodepool quota in their environment) | 16:22 |
sean-k-mooney | cool | 16:22 |
*** kevinz has quit IRC | 16:22 | |
sean-k-mooney | it would have to be an ironic node really to test given the current limitations | 16:23 |
knikolla | i could see us again offering hardware to help test, especially in the context of #openinfralabs | 16:23 |
sean-k-mooney | that said i would probably prioritise normal pci passthrough and sriov testing over vGPU testing as that is more generally useful and requires less specific hardware | 16:24 |
sean-k-mooney | fungi: does infra currently run its own cloud by the way. i think it did/does | 16:25 |
clarkb | we did, but that hardware went away | 16:26 |
sean-k-mooney | ah | 16:26 |
clarkb | I'm not sure we're interested in taking that on again | 16:26 |
sean-k-mooney | understandable | 16:26 |
clarkb | running openstack is easy compared to dealing with "datacenter flooded", "we lost your rails", "the network switch mysteriously became a bridge" | 16:26 |
clarkb | and so on | 16:26 |
sean-k-mooney | ya | 16:27 |
sean-k-mooney | or "the datacenter cooling system is dead and so is the backup, rooms are hitting 70C, shut off your workload NOW! we are powering off the racks" | 16:28 |
sean-k-mooney | that happened when i was an intern | 16:28 |
*** udesale has quit IRC | 16:29 | |
sean-k-mooney | actually i think it was more like 40c in the room; the servers were at 70 and were starting to trip their overheat protection | 16:30 |
fungi | yes, operating openstack, even remotely, was not that challenging for us. dealing with the inevitable hands-on tasks in data centers for hardware which was "donated" to us because it was too old for the cloud provider to reliably use, that was the issue | 16:30 |
artom | I read that as "the network switch mysteriously became a fridge" and was really confused | 16:30 |
fungi | iot | 16:30 |
fungi | in the case in question, the tor switch was shared with some other users on different vlans, and i think the switch admin (we didn't have config/management access to it) didn't set a large enough memory allocation for the bridge tables | 16:31 |
knikolla | obligatory reference to "when sysadmins ruled the world" | 16:31 |
knikolla | earth* | 16:31 |
fungi | itym "systems reliability engineers" ;) | 16:32 |
* fungi still misses being called a "sysop" | 16:32 | |
clarkb | "datacenter flooded" isn't made up if anyone is wondering :) | 16:32 |
fungi | welcome to texas! | 16:33 |
clarkb | turns out having datacenters in a swamp that gets hit by hurricanes leads to that | 16:33 |
sean-k-mooney | wait flooded and texas? isn't most of it a desert | 16:33 |
fungi | that's what they want you to believe | 16:33 |
sean-k-mooney | ah | 16:33 |
sean-k-mooney | hurricanes ya | 16:33 |
fungi | hewlett packard made some interesting facilities choices, we'll just leave it at that | 16:34 |
*** evrardjp has quit IRC | 16:36 | |
*** evrardjp has joined #openstack-infra | 16:36 | |
*** kevko_ has quit IRC | 16:37 | |
*** jcapitao has quit IRC | 16:40 | |
*** Anticimex has quit IRC | 16:43 | |
*** Anticimex has joined #openstack-infra | 16:44 | |
*** prometheanfire has quit IRC | 16:45 | |
sean-k-mooney | artom: just to conclude this we might want to see if we can fake out the mdevs using the tty driver upstream to add some additional testing but for now i think we have to just make do with what we have unless we can get REXCI running | 16:46 |
donnyd | artom: If you are interested in SR-IOV things that can be done with OE - maybe with a small amount of assistance I can get you something to support the effort | 16:51 |
sean-k-mooney | donnyd: we are missing openstack support so the only way to do it would be via ironic or statically provisioned nodes in nodepool right now | 16:52 |
*** yamamoto has joined #openstack-infra | 16:53 | |
*** psachin has quit IRC | 16:53 | |
donnyd | Ironic isn't hard - I am thinking that is in the realm of possibilities - any specific hw requirements if that is the way? | 16:53 |
donnyd | I have some xeon-d based boxes I have no issues with inserting into the mix | 16:54 |
sean-k-mooney | just some nics that support sriov. often the onboard nics if they are intel will be just fine for that | 16:54 |
sean-k-mooney | we would need to create a new label in upstream nodepool to then consume them but after that is done a test job could be created. | 16:55 |
sean-k-mooney | for the vgpu testing we might be able to cheat and use https://github.com/torvalds/linux/tree/f97c81dc6ca5996560b3944064f63fc87eb18d00/samples/vfio-mdev | 16:55 |
donnyd | pretty sure this is the board I have in stock - https://www.supermicro.com/en/products/motherboard/X10SDV-4C-TLN4F | 16:56 |
donnyd | will that work? | 16:56 |
sean-k-mooney | that's a sample mdev driver that creates a virtual serial port | 16:56 |
sean-k-mooney | from a nova point of view it would be effectively the same as a vgpu | 16:56 |
sean-k-mooney | donnyd: let me check | 16:56 |
sean-k-mooney | oh xeon-d ya it should work i'll just check the nic | 16:57 |
sean-k-mooney | Dual LAN with Intel® Ethernet Controller I350-AM2 | 16:57 |
sean-k-mooney | Dual LAN with 10Gbase-T | 16:57 |
sean-k-mooney | so yes the i350 supports sriov | 16:57 |
sean-k-mooney | the 10G nics probably also do | 16:58 |
*** yamamoto has quit IRC | 17:00 | |
*** xek has quit IRC | 17:02 | |
*** xek has joined #openstack-infra | 17:03 | |
artom | sean-k-mooney, donnyd, so, with RHEx CI (the internal Red Hat 3P CI I wanted), the plan was always to use SRIOV as a gateway to a larger scope (vGPUs, FPGAs) | 17:04 |
artom | donnyd's argument that it's better to centralize in 1P than every vendor reproducing their own entire cloud for their own bit of hardware in their own 3P CI speaks to me a lot | 17:04 |
artom | RH already has a contract with mnaser, and donnyd seems open to a similar arrangement | 17:05 |
artom | So I'm wondering whether we shouldn't pivot to that approach | 17:05 |
artom | I'll talk to Eoghan about it | 17:05 |
*** dtantsur is now known as dtantsur|afk | 17:06 | |
donnyd | sean-k-mooney: I will get ironic up and running - it was already on my hit list anyways | 17:06 |
sean-k-mooney | artom: this is also interesting https://github.com/torvalds/linux/commit/d61fc96f47fdac1f031ed4eafa9106fe10cdaa37#diff-a85d93c1c9bb0ede2e7ef1beaa2534af apparently there is a sample display driver. | 17:07 |
sean-k-mooney | i have used the mtty serial driver for testing before | 17:07 |
sean-k-mooney | but we might be able to use the sample display driver for testing without vgpu hardware | 17:08 |
*** smarcet has joined #openstack-infra | 17:08 | |
sean-k-mooney | looks like it supports multiple types too | 17:08 |
sean-k-mooney | https://github.com/torvalds/linux/blob/f97c81dc6ca5996560b3944064f63fc87eb18d00/samples/vfio-mdev/mdpy.c#L51-L53 | 17:09 |
*** derekh has quit IRC | 17:09 | |
*** zxiiro has joined #openstack-infra | 17:09 | |
sean-k-mooney | it was only added in kernel 4.18 however | 17:09 |
sean-k-mooney | still ubuntu 20.04 will certainly work for that and maybe 18.04 if we are using the hardware enablement kernel in the vms | 17:10 |
sean-k-mooney | i think 18.04 originally shipped with 4.15 but you can get much later kernels just not sure what the gate is using | 17:10 |
clarkb | sean-k-mooney: we install the default kernel by default (which is the older one), but you can install and reboot on a newer kernel if you need it | 17:11 |
sean-k-mooney | clarkb: we would just do that as a pre playbook right | 17:12 |
clarkb | ya | 17:12 |
sean-k-mooney | adding the appropriate wait and reconnect logic | 17:12 |
clarkb | if you codesearch .yaml files for reboots you'll probably find a small number of examples | 17:12 |
sean-k-mooney | i have compiled that before and used the mtty driver. i might try and play with it later in the week | 17:13 |
sean-k-mooney | if i can get it to work locally then i can try it in the ci | 17:13 |
sean-k-mooney | clarkb: would you have any issues with uploading a second copy of one of the sample image with different device metadata? | 17:21 |
sean-k-mooney | if i am able to make the mdpy driver work i think we would have to set hw_machine_type=q35 in the image | 17:21 |
sean-k-mooney | i know nodepool can do this but not sure if that would be ok | 17:22 |
*** jpena is now known as jpena|off | 17:22 | |
clarkb | sean-k-mooney: will nova fallback gracefully if it can't provide a q35 machine? | 17:23 |
clarkb | if so maybe we can just set it and not worry? | 17:23 |
*** prometheanfire has joined #openstack-infra | 17:23 | |
clarkb | note rax is not kvm so I think that won't apply there either | 17:23 |
clarkb | but q35 machine type is old enough I would expect the other clouds to work | 17:24 |
sean-k-mooney | clarkb: it will give a no valid host. it also would only work for qemu based clouds, not sure if that includes xen or not | 17:24 |
*** ociuhandu has quit IRC | 17:24 | |
sean-k-mooney | ya it's almost 10 years old | 17:24 |
clarkb | ya pretty sure it won't work on rax but would elsewhere | 17:24 |
clarkb | sean-k-mooney: it's done per provider too so we could turn it on a provider at a time to be conservative | 17:24 |
sean-k-mooney | ya. let me see if this is feasible locally first. if it is i might propose some patches | 17:25 |
donnyd | seems like a federated placement service would be pretty slick for this use case | 17:27 |
sean-k-mooney | nodepool fills in the gap via the labels | 17:30 |
sean-k-mooney | but it could be | 17:30 |
donnyd | oh yes nodepool is the only existing mechanism we have for defining a resource.. just a random though | 17:33 |
donnyd | thought * | 17:33 |
donnyd | artom: I have been watching OE for a while now and all seems to be good to go. Much appreciated heads up. I pretty much depend on it because it's a one man show over here | 17:34 |
artom | donnyd, thank you again, it's appreciated :) | 17:35 |
*** diablo_rojo has joined #openstack-infra | 17:38 | |
*** lpetrut has quit IRC | 17:40 | |
*** andrewbonney has quit IRC | 17:41 | |
*** diablo_rojo has quit IRC | 17:55 | |
*** diablo_rojo has joined #openstack-infra | 17:55 | |
*** ociuhandu has joined #openstack-infra | 17:59 | |
openstackgerrit | Jeremy Stanley proposed openstack/project-config master: Replace incident channel with opendev-meeting https://review.opendev.org/716038 | 18:10 |
openstackgerrit | Mohammed Naser proposed openstack/project-config master: vexxhost: add repos for exporters https://review.opendev.org/714965 | 18:14 |
openstackgerrit | Merged zuul/zuul-jobs master: test-upload-logs-swift: revert download script https://review.opendev.org/715755 | 18:14 |
*** ralonsoh has quit IRC | 18:17 | |
*** rlandy is now known as rlandy|brb | 18:29 | |
*** rledisez has joined #openstack-infra | 18:44 | |
openstackgerrit | Merged openstack/project-config master: Add nginx-ingress-controller armada app to StarlinX https://review.opendev.org/714686 | 18:47 |
*** ociuhandu has quit IRC | 18:48 | |
*** rlandy|brb is now known as rlandy | 18:54 | |
openstackgerrit | Andreas Jaeger proposed openstack/pbr master: Update hacking for Python3 https://review.opendev.org/716059 | 19:14 |
openstackgerrit | Merged openstack/project-config master: vexxhost: add repos for exporters https://review.opendev.org/714965 | 19:15 |
openstackgerrit | Andreas Jaeger proposed openstack/pbr master: Update hacking for Python3 https://review.opendev.org/716059 | 19:17 |
*** pkopec has quit IRC | 19:21 | |
*** ociuhandu has joined #openstack-infra | 19:25 | |
*** ociuhandu has quit IRC | 19:29 | |
*** larainema has quit IRC | 19:41 | |
openstackgerrit | Andreas Jaeger proposed openstack/cookiecutter master: Update hacking version of new repo https://review.opendev.org/716065 | 19:42 |
*** gfidente is now known as gfidente|afk | 19:49 | |
*** mugsie has quit IRC | 19:51 | |
*** mugsie has joined #openstack-infra | 19:54 | |
*** dpawlik has quit IRC | 19:59 | |
*** sshnaidm is now known as sshnaidm|afk | 20:07 | |
*** jackedin has quit IRC | 20:07 | |
*** dpawlik has joined #openstack-infra | 20:08 | |
*** dpawlik has quit IRC | 20:13 | |
*** xek has quit IRC | 20:35 | |
*** njohnston is now known as njohnston_ | 20:39 | |
*** njohnston_ has quit IRC | 20:47 | |
*** smarcet has quit IRC | 20:48 | |
*** smarcet has joined #openstack-infra | 20:48 | |
*** hashar has quit IRC | 20:49 | |
*** njohnston has joined #openstack-infra | 20:51 | |
*** gshippey has quit IRC | 20:52 | |
*** cdearborn has quit IRC | 20:54 | |
openstackgerrit | Merged openstack/project-config master: Add xstatic-** projects for vitrage-dashboard https://review.opendev.org/704133 | 20:56 |
openstackgerrit | Merged openstack/project-config master: Add Rook to StarlingX https://review.opendev.org/713650 | 21:13 |
openstackgerrit | Merged openstack/project-config master: Add Cert-Manager Armada app to StarlingX https://review.opendev.org/714689 | 21:13 |
openstackgerrit | Merged zuul/zuul-jobs master: Improve job and node information banner https://review.opendev.org/677971 | 21:37 |
*** rcernin has joined #openstack-infra | 21:56 | |
*** slaweq has quit IRC | 22:08 | |
*** tosky has quit IRC | 22:20 | |
*** todd-inmotion has joined #openstack-infra | 22:23 | |
*** rfolco has quit IRC | 22:57 | |
*** todd-inmotion has quit IRC | 22:57 | |
*** rfolco has joined #openstack-infra | 22:58 | |
*** rfolco has quit IRC | 22:59 | |
*** gfidente|afk has quit IRC | 23:14 | |
*** arif-ali has quit IRC | 23:25 | |
*** arif-ali has joined #openstack-infra | 23:34 | |
*** smarcet has quit IRC | 23:39 | |
*** rh-jlabarre has quit IRC | 23:53 | |
*** tetsuro has joined #openstack-infra | 23:58 |