lucasagomes | Hi, quick question. Is there a place where I can download the OpenStack documentation (e.g. the admin guides, https://docs.openstack.org/2024.2/admin/) as a tarball instead of having to crawl the website or build it for each project? The idea is to create a RAG out of it | 09:11 |
---|---|---|
frickler | lucasagomes: what's a RAG? you could use an AFS client and copy the data you need from /afs/openstack.org/docs but I'm not sure that that would be faster than running "wget -r" | 09:48 |
lucasagomes | frickler: retrieval-augmented generation for AI chatbots. Thanks, I will take a look at my options | 10:27 |
*** ykarel_ is now known as ykarel | 13:43 | |
fungi | lucasagomes: each docs build in zuul does generate a packed tarball of the web content, though that's attached to the build result as an artifact and each version only retained for a month | 14:09 |
fungi | but also, it seems like the rst in git repos would be cleaner for training an llm on? it's not full of html tag noise that way | 14:09 |
lucasagomes | fungi: yeah, I think you're right about the .rst, that does seem to be the case. I also created a new tox target to generate the docs in plain text with Sphinx and created a RAG with it | 14:11 |
lucasagomes | seems to work well too, this is all experimentation at the moment | 14:11 |
clarkb | ya I would use the source or build locally for an appropriate output target | 15:50 |
-opendevstatus- NOTICE: The paste service at paste.opendev.org will have a short (15-20 minute) outage momentarily to replace the underlying server. | 17:08 |
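
The suggestion above (use the .rst sources from the git repos, or plain text built locally, rather than crawled HTML) could be sketched roughly as follows. This is only an illustration, not what anyone in the channel actually ran: the function names, the `max_chars` limit, and the record layout are all hypothetical, and it assumes the documentation tree has already been cloned or built into a local directory. Sphinx's `text` builder output could be fed through the same chunking by matching `*.txt` instead of `*.rst`.

```python
from pathlib import Path


def iter_rst_files(root):
    """Yield every .rst file under root, in sorted order."""
    yield from sorted(Path(root).rglob("*.rst"))


def chunk_text(text, max_chars=1000):
    """Split text on blank lines and pack paragraphs into chunks.

    Each chunk holds at most max_chars characters, except that a single
    paragraph longer than max_chars is kept whole rather than cut.
    """
    chunks, current = [], ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow: close the current chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def build_corpus(root, max_chars=1000):
    """Return a list of {"source", "text"} records (hypothetical layout)."""
    records = []
    for path in iter_rst_files(root):
        text = path.read_text(encoding="utf-8", errors="replace")
        for chunk in chunk_text(text, max_chars):
            records.append(
                {"source": str(path.relative_to(root)), "text": chunk}
            )
    return records
```

Calling `build_corpus("doc/source")` on a checked-out project (the path here is an assumption about the repo layout) would give a list of small text records tagged with their source file, which an embedding or indexing step of your choice could then consume.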