Friday, 2026-04-10

opendevreview: Grzegorz Grasza proposed openstack/contributor-guide master: docs: add guidelines for tool-generated contributions https://review.opendev.org/c/openstack/contributor-guide/+/984061 [15:57]
*** vhari_ is now known as vhari [16:59]
gouthamr: https://www.kubernetes.dev/docs/guide/pull-requests/#ai-guidance was updated today.. some interesting things to note, and ponder upon [19:21]
gouthamr: https://groups.google.com/a/kubernetes.io/g/dev/c/7Y9016gdFZw/m/nUUmxFPzAQAJ?utm_medium=email&utm_source=footer [19:21]
fungi: interesting that they bring up dependabot in the discussion there, i made a similar point on xek's documentation proposal [19:32]
clarkb: it's also interesting that putting the info in the commit message is bad because the machines can't agree to the things (which has been part of my concern), but then it's ok to use them anyway and note them in the PR outside of git itself [19:33]
fungi: in particular, "signed-off-by" has whatever meaning the project's leadership wants to assign to it. if the tc wants to say it means one thing for commits pushed by people and another thing (or nothing at all) for commits pushed from agreed-upon automated accounts, that's perfectly fine and sensible [19:35]
clarkb: right, though in the k8s case the email explicitly says `Kubernetes cannot and will not allow co-authoring PRs or co-signing commits with AI for the simple reason that the AI cannot sign a CLA; this is a legal requirement.` but it's ok to actually co-author the PR and note it outside of git [19:36]
gouthamr: clarkb: the commit message tagging has been quite inconsistent in our space, so i also wonder about the usefulness.. maybe we should evaluate this part.. but, we don't have any other way to quickly signal to reviewers that AI was used - PR descriptions on GitHub/GitLab are easy metadata [19:36]
clarkb: to me that's an indication you cannot accept any ai assisted contributions under your legal agreement (the CLA), but they must see it differently [19:37]
clarkb: gouthamr: right I'm trying to get to the lawyer below the human UX [19:37]
clarkb: "is this even legal" [19:37]
fungi: lawyer or layer? [19:37]
clarkb: and I think if you do an honest read of the DCO and likely their CLA the answer you come up with is "no" [19:37]
clarkb: but no one wants to accept that so instead we hand wave around it and say things like the machine can't sign off on this so you do it and we'll pretend no one cares [19:38]
clarkb: fungi: layer but I like the typo [19:38]
clarkb: for me personally it makes me question why we have the rules and policies in the first place if we can ignore them the instant it inconveniences the right/wrong people [19:38]
* gouthamr replays JayF's accountable.computer mention [19:38]
gouthamr: i think we need the foundation to help us with legal review again, things have changed since the last version of this [19:39]
fungi: i'll just note that if openstack *does* decide they're going to require a "responsible human" to sign-off on every change pushed by the proposal and release bot accounts, i'm not volunteering to be that human [19:39]
clarkb: (and as I've noted in other contexts I think challenging that stuff and changing the rules is healthy. So I'm not opposed to that. What I do have concerns with is keeping the rules as they are and pretending that we can ignore them) [19:39]
gouthamr: fungi: we'll gather wider feedback on the tooling thing, maybe we start with a resolution that we want to allow DCO sign-offs for OpenDev automation patches with no LLM usage explicitly. This is our explicit acknowledgement that we have technical constraints to do anything else, and would rather not spend the time to evolve a process [19:41]
fungi: expecting to bring our longstanding automation into compliance with new policies designed to gatekeep people who send in ai slop is essentially an obstructionist move straight out of the simple sabotage field manual [19:41]
fungi: which, by the way, is an interesting historical read for anyone not familiar with it: https://www.cia.gov/static/5c875f3ec660e092cf893f60b4a288df/SimpleSabotage.pdf [19:43]
gouthamr: omg [19:44]
fungi: you'll recognize a lot of common tactics in there that crop up regularly in open source communities [19:45]
gouthamr: i only knew of a probably much less nefarious one in our community :P the queen's duck strategy [19:48]
* gouthamr context: https://blog.codinghorror.com/new-programming-jargon/ [19:51]
clarkb: re the DCO specifically I've thought about this a lot and if you read it the intent seems to be that the content that is in the commit is either written by you and you have permission and give permission to contribute it under the license or it is written by someone else who has done so. Within that context what our existing bots do is completely fine imo. For example with [19:53]
clarkb: translations we're taking translations from people who have signed off on the contributions similarly but using a different transport layer and are merely including their work within the software [19:53]
clarkb: where I trip up with the LLM stuff is the LLM services and the non open models in particular (but even using an open model on a hosted service) is typically providing you with content under different license terms. That is at odds with clauses b and c of the DCO [19:54]
clarkb: 'The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate open source license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same open source license (unless I am permitted to submit under a different license), as indicated in the file;' <- [19:55]
clarkb: this is b [19:55]
clarkb: claude for example does not output content that is licensed with an appropriate open source license [19:55]
gouthamr: yeah lets you copyright it, license it however you prefer [19:56]
clarkb: no there are a bunch of stipulations [19:56]
clarkb: which are at odds with open source licensing in particular [19:56]
fungi: similarly https://openinfra.org/legal/ai-policy calls out "Source training data, and thus resulting material, may come from materials which have unclear or incompatible copyrights and/or licenses. In other cases, copyright of any generated code may be explicitly retained by the vendor operating the AI technology, which is incompatible with contribution to projects. Furthermore, [19:57]
clarkb: there are like 5 things you are forbidden from doing [19:57]
fungi: some tools have demonstrated the ability to source context from the contents of a project being worked upon. Ultimately this requires awareness of the End-User License Agreement by the contributor." [19:57]
clarkb: my argument would be that if we feel these tools are so valuable and that copyright is so weak anyway that we can simply ignore the rules we have for ensuring proper open source licensing then we should do away with the DCO entirely [19:58]
clarkb: then we can focus on the UX of how to keep developers and reviewers sane [19:58]
fungi: the policy also states: "Contributors need to verify they have the right to contribute output from AI tools, just like they do for their own original work, work owned by their employer, work copied or modified from another open source project, or work submitted on behalf of a third party." [19:58]
gouthamr: honestly that’s a big bar [19:58]
fungi: i read it as "you shouldn't push changes written in part or in whole by an llm unless you're really, really, really sure it's okay, and even then you probably shouldn't" [19:59]
gouthamr: there isn’t any verifiable way to know for sure; you ask Claude this question and it’ll make up some stuff like “nope I didn’t copy any code from anywhere” [19:59]
clarkb: gouthamr: there is: you build a model that was built only from compatibly licensed content [20:00]
clarkb: for some value of you [20:00]
clarkb: I think we would accept a reasonable assertion that someone else did so and distributed the resulting weights too [20:00]
gouthamr: which is true, it synthesized it from its training, which was probably all the stuff we have produced under open source licenses for decades [20:00]
clarkb: but yes I agree the bar is high and I don't think anyone is actually clearing it in their current use of the tools within openstack [20:00]
fungi: it's getting discussed again in the cpython community as well, with a similar interpretation that the seemingly permissive ai contribution policy there is really mostly an impossible hurdle to clear in most popular cases, so anyone who is using an llm for their contributions is basically just yolo [20:01]
clarkb: so we either stop using the tools and possibly find tools that clear the bar. Or we need different rules. [20:01]
clarkb: but I feel more and more strongly that until we answer these more fundamental questions then worrying about how to keep code reviewers sane is impossible [20:02]
clarkb: because if I am asked to review something the only response I can currently give is "I'm sorry I believe this isn't acceptable under the terms of the DCO and I won't be able to accept it" [20:02]
fungi: the two sides of the discussion in cpython are "we should ban use of llms for contributions because you can't know for sure they're legally safe" and "we should welcome safe use of llms for contributions in case someone eventually creates a popular llm that is safe for this purpose, even though probably none are now" [20:03]
fungi: but yes, people continue to use them under the expectation that if they're determined to be a problem eventually someone else will be on the hook to deal with the resulting fallout [20:04]
clarkb: fungi: I'm ready to start having nightmares about git filter-branch in every openstack repo and force pushing the result [20:05]
gouthamr: if they’ve declared it [20:05]
clarkb: or if it is an obvious copy pasta reproduction as the models have been shown to do [20:06]
fungi: filter-branch probably isn't enough even if they did declare it, because subsequent commits are likely derivative works of the commits you're filtering out too [20:06]
fungi: the counterargument of course is that there are likely illegally-supplied commits already present in most large open source projects from long before llms came into existence, so this isn't an llm-specific problem [20:07]
gouthamr: some community focused short term measures may be required; maybe we consider a blanket ban on “sloppy” contributions.. because there are two sections in our community too - those that have access to these tools and are willing to use them, and those that don’t and are genuinely continuing their work without the need for this… [20:07]
clarkb: as a reviewer of code that gets included in these projects I think ideally there would be a concrete set of guidance to go along with whatever policy there is. "For example qwen next coder is ok due to its licensing terms. Claude is not." Or "Claude is fine have at it" [20:08]
clarkb: because unfortunately our existing policy is hopeful and vague and hands off a lot of responsibility to individuals who really shouldn't be making these decisions (and I include myself in that bucket) [20:08]
fungi: i have a feeling it's going to end up getting ignored because so many people did it that undoing it now will be intractable and the affected ecosystems/industries are "too big to fail" [20:08]
fungi: another way to look at it is that llms call the very nature of copyright into question, and maybe it won't exist in the near future [20:09]
clarkb: gouthamr: there is a third group: those of us with access who feel they cannot use them due to the rules [20:09]
clarkb: (it me) [20:09]
fungi: yeah, i have personally never used an llm in my professional work or even personal hobby projects [20:10]
fungi: nor am i particularly interested in doing so [20:10]
gouthamr: ack; this is good fodder for the discussion at the PTG, I’ve heard murmurs that project teams have specific feedback [20:10]
gouthamr: maybe you’ll see this in the maintainers survey too, if folks are planning to take it [20:10]
fungi: yes, we added some questions about it in the maintainer and contributor surveys because we know it's on a lot of people's minds [20:11]
clarkb: another way of looking at it imo is would we accept BSL code? No so why is this ok? Mostly because hype? or is it actually sound in terms of contract agreements? [20:12]
clarkb: I mean we don't even allow gplv3 code because it is viral [20:12]
clarkb: but gplv3 is compatible with apache2 and it is open source and it preserves your rights as a user [20:13]
clarkb: we're more afraid of GPLv3 than we are of undefined nebulous llm output [20:13]
sean-k-mooney: well not really. apache2 does not allow relicensing and gpl would require it for the combined work [20:13]
fungi: yes but, horror of horrors, if someone makes a modified copy of the software they're going to be required to supply the source code to anyone they provide binary builds to [20:13]
clarkb: sean-k-mooney: right that's what I meant by viral. It forces the combo into gplv3 [20:14]
sean-k-mooney: right which you cannot do [20:14]
sean-k-mooney: that would violate the apache2 requirements [20:14]
clarkb: no gplv3 is generally considered apache2 compatible iirc [20:14]
clarkb: gplv2 is not [20:14]
fungi: apache2 doesn't say that a derivative work can't be distributed under a license which preserves the same set of terms as a minimum [20:14]
clarkb: the only reason we don't allow gplv3 is that we don't want to be bound by the additional requirements of gplv3 [20:15]
clarkb: but it is doable aiui and is open source and preserves user rights [20:15]
sean-k-mooney: i think it prevents relicensing of the apache2 content [20:15]
sean-k-mooney: but maybe you're right on the v2 vs v3 case [20:16]
fungi: there is no legal requirement not to relicense a derivative, the license of the derivative simply has to preserve the requirements of all the original works from which it was derived (which might be equivalent to one of the two licenses if one is a strict subset of the other, or may be a wholly new composite license that preserves the terms of both original licenses) [20:16]
sean-k-mooney: """Apache 2 software can therefore be included in GPLv3 projects, because the GPLv3 license accepts our software into GPLv3 works. However, GPLv3 software cannot be included in Apache projects. """ [20:16]
sean-k-mooney: https://www.apache.org/licenses/GPL-compatibility.html [20:16]
clarkb: ya so the openstack sdk into ansible is ok [20:16]
clarkb: but you couldn't copy ansible code out of ansible and put it in nova [20:17]
fungi: right, so the resulting derivative is distributed under the gplv3 because apache license v2 is a strict subset of the terms of gpl v3 [20:17]
clarkb: BUT you can release openstack-ansible-modules as part of openstack as a separate deliverable [20:17]
sean-k-mooney: right i was thinking of the case last year or the year before where we removed an embedded ansible script from neutron [20:17]
clarkb: it's just that that portion of the openstack release would be gplv3 not apache2 [20:17]
sean-k-mooney: because the ansible was potentially gpl licensed i believe [20:17]
fungi: you *could* copy gpl v3 code into nova and distribute the resulting derivative work under the gpl v3 [20:17]
sean-k-mooney: and we just wanted to not get anywhere near that [20:17]
sean-k-mooney: but never upstream it [20:18]
clarkb: sean-k-mooney: yup and I think not getting anywhere near that is a fine stance. until everyone suddenly becomes ok with the current situation with llms then I'm a bit confused [20:18]
fungi: depends on who "upstream" is. from that perspective the person distributing the resulting derivative may be the "upstream" maintainer of the derivative [20:18]
sean-k-mooney: well while that is evolving the current situation in the US is LLM work is considered transformative even if trained on copyrighted material and llm content that is explicitly not guided by a human cannot have copyright [20:19]
clarkb: which is why I go back to if the rules don't make sense anymore then we should talk about it. Figure out why the rules existed in the first place, determine if there is still value to having these rules, and if not change them. This applies to more than just the current discussion [20:19]
clarkb: but more and more it seems we've just decided to ignore the rules, do what we want, and worry about it later if ever [20:20]
sean-k-mooney: what's not ruled on is content that has been generated based directly on human input [20:20]
fungi: but yes, the nova maintainers would rightfully refuse contributions ported from the "tainted" gplv3 derivative for license reasons. doesn't mean the licenses are incompatible, just that nova (and openstack) requires contributions be licensed under apache license v2 [20:20]
clarkb: (part of my concern is probably the fact that I tend to be the person dealing with the fallout if/when there are issues with things like code history) [20:20]
clarkb: whereas most others change jobs every couple of years and completely leave all this behind them [20:20]
sean-k-mooney: fungi: in that specific case it would also be a dco issue in that you can't attest to contribute it under the license of the project [20:21]
clarkb: so the risk to them as the individual is so low to not matter [20:21]
sean-k-mooney: because you can't grant the gplv3 to apache2 rights (well unless it's entirely your code and you dual-license or relicense it) [20:21]
fungi: sean-k-mooney: right, that's essentially how the project enforces its license expectation [20:21]
sean-k-mooney: ok dinner arrived so i'm going to call it a day o/ [20:22]
fungi: on the other hand, mit/expat licensed code can be accepted into nova, because that license is a strict subset of the terms of the apache license v2 and so can effectively be relicensed to apache license v2 [20:22]
fungi: enjoy dinner! [20:22]
gouthamr: yeah. light Friday discussion this :P [20:23]
fungi: it's more fun than fixing the apt-puppetlabs mirroring config [20:23]
fungi: which i should get back to [20:23]
clarkb: I've got a lodgeit change proposed where I wholesale copy a werkzeug helper library to fix a bug with new python because the lib is basically unmaintained. And ya aiui it's fine because three clause bsd is compatible that direction and I preserved attribution: https://review.opendev.org/c/opendev/lodgeit/+/982311/8/lodgeit/vendor/secure_cookie/_compat.py [20:24]
fungi: (but if it were 4-clause bsd it would be a problem!) [20:25]
clarkb: which would be an example of clause b) where I have done the due diligence and it should be ok [20:25]
clarkb: (I had to modify the code to fix the bug so it's like 90% theirs 10% mine but acceptable in the end) [20:25]
fungi: though if memory serves, the university did announce anything they hold copyright on can officially be relicensed from 4-clause to 3-clause [20:25]
