Timeline for Policy: Generative AI (e.g., ChatGPT) is banned
Current License: CC BY-SA 4.0
39 events
| when | what | action | by | comment | license |
|---|---|---|---|---|---|
| Jun 3 at 14:09 | comment | added | Ray | @MarshallEubanks See my responses to Wildcard. I describe exactly what I meant by "understanding", and that meaning is supported by experiments. But as I said earlier, while I was giving credit where credit was due for what the models are capable of (and believe me, as much time as I waste these days explaining to people that transformers aren't sapient and/or magic, the capabilities of modern neural language models are an order of magnitude better than anything we had before, and being able to represent similarity of meaning is a big part of why), my main point was that they're not sapient. | |
| Jun 2 at 21:20 | comment | added | Marshall Eubanks | "There is understanding, but no capacity for reasoning." I most vehemently disagree. "Relatedness" is just another heuristic. It does not somehow conjure a semantic foundation for anything. You cannot bootstrap your way into a semantic foundation, no matter how much data with how many "relatedness" heuristics you include. In fact, the apparent "understanding" of these models decreases as the datasets expand beyond some point; they effectively go insane. LLMs are not capable of "understanding" anything, and the entire approach is a dead-end for anything other than rote generation. | |
| May 12 at 7:14 | comment | added | Neil S3ntence | For the above reasons, GPT has helped draw n00bs like myself away from SO. Which some of you more advanced folks could argue is a good thing. I haven't posted maybe 200 silly questions here. The thing though is, I've had an enormous learning curve building my last two or three projects. If GPT keeps getting better, I might find it hard to get back to posting questions here. SO would only make sense for me as a user to consume like a magazine. Or social media to share views (like right now). Currently, GPT has maybe helped filter out a big # of casual / beginner, maybe even intermediate coders. | |
| May 12 at 7:00 | comment | added | Neil S3ntence | GPT in particular had a very hard time solving CORS issues when interacting with server-side middleware from a local web page. It was going around and around in circles, but eventually I got it to work as intended, and I learned a ton. | |
| May 12 at 6:58 | comment | added | Neil S3ntence | Reflecting on my experience as a user of paid GPT, I notice that there's been a quality increase that would be hard not to notice. Oftentimes, I feed an entire JS / PHP file in, and get good feedback (it actually finds bugs). That's a very low-level use case, saving time. At the same time, I frequently notice how some of my - subpar - code throws it off or it gets "hung up" on non-issues, introducing code breaking bugs in its versions. However, the mental work required to analyse its output has made me a better developer. And I'm able to get GPT "back on the right track" very often. | |
| Jan 12 at 17:09 | comment | added | Ray | 2) They will respond with really well-written text if we ask them to "explain the code", but that doesn't mean they're reasoning about it: they're sampling from a very large joint distribution over responses trained on text written by people who were reasoning about similar code. This can result in correct responses, but we don't need to guess at whether they're doing reasoning: we can just read the math: 1 2 3. | |
| Jan 12 at 17:08 | comment | added | Ray | @ChatGPT I ran a very short series of tests against a SOTA model, Qwen2.5-Coder-32B-Instruct, to check the conclusions I'd previously drawn based on the math and responses from earlier models. Full details in this chat link, but in summary, it got 3/3 Easy problems right, 0/4 medium, 0/2 hard. W.r.t. your other comments: 1) My top tag is only that because of a single answer on a popular question: namely, this one. [Continued...] | |
| Jan 11 at 1:59 | comment | added | maxhodges | for anyone who doubts LLMs can "reason about code effects", ask ChatGPT or Claude 3.5 Sonnet to "explain this code and suggest improvements": `def process_list(items, threshold=None): if threshold is None: threshold = len(items) // 2 return [x for x in items if items.count(x) > threshold]` (a runnable reconstruction of this snippet appears after the timeline) | |
| Jan 9 at 7:11 | comment | added | maxhodges | Your top tag is ai-generated-content, and you don't even know what these models are capable of. | |
| Jan 9 at 7:07 | comment | added | maxhodges | It can do MUCH more than boilerplate. How do you not know this? Validate your assumptions. Try Cursor IDE with Claude Sonnet 3.5. Claude is much better at coding than ChatGPT's 4o model. I'm doing things in hours that used to take days. Finishing entire quarterly roadmaps over the weekend. The engineers who refuse to get on board due to professional pride or whatever are going to be replaced by engineers who use the best combo right now: Cursor (Sonnet 3.5, composer, agent mode) + Supermaven + RepoPrompt + o1-pro in ChatGPT | |
| Jan 8 at 16:21 | comment | added | Ray | @ChatGPT I'll have to take a look at codeforces to judge what sort of problems it's doing there, but first, a point of clarification on your other comment: I'm absolutely not claiming that humans can do anything that a computer can't do in theory. It's certainly possible for programs to write code from a higher-level description (indeed, that's what a compiler does, even if we don't think of it that way anymore). But a transformer trained as a language model is not going to have the ability to reason about code effects (although it can excel at generating boilerplate). | |
| Jan 8 at 16:11 | comment | added | maxhodges | @Ray On the Codeforces competition website, o1 achieved an Elo rating of 1673. This score puts the model at approximately the 86th percentile of programmers who compete on the Codeforces platform. | |
| Jan 8 at 16:07 | comment | added | maxhodges | There is no reason to believe, and no known limit of computation or law of physics to suggest, that computers are incapable of doing all the things that humans can do. That's what the theory of computation tells us. See Computation: Finite and Infinite Machines (Minsky). AI is the Steam Drill and we're John Henry. | |
| Jan 3 at 15:45 | comment | added | Ray | @ChatGPT I suspect that says more about them than it does about it. How are you defining "complex tasks"? | |
| Dec 31, 2022 at 9:11 | comment | added | The Muffin Man | All of this "trust" conversation around code is silly. You act like the code coming out is 10,000 lines for operating missile systems... No one writes functions like that. The code sample is 50 lines; you can see what is going on, and it tells you what it's doing. "Make this variable static", "use a string builder here", etc. I want you to hand-code a ReactJS component with a Syncfusion listview and a textbox for searching. Good luck. ChatGPT will give you that boilerplate in 1 second. | |
| Dec 30, 2022 at 23:19 | comment | added | Duane | I think people misunderstand what ChatGPT is. It's not a code interpreter. It's a sequence predictor. So what it's good at is providing working samples of common tasks, based on the hundreds of examples of those tasks it's seen and all the documentation it's read. It will tend to produce correct answers, because people do not commit incorrect answers to GitHub that often. For example, this morning it explained to me the difference between setuptools, importlib and pip, and provided me with code to debug a failed package install. Basically, it's a promptable manual. | |
| Dec 27, 2022 at 4:19 | comment | added | Ray | (continued) It's a bit tricky to come up with an example that it will definitely fail on, since it's seen a lot of existing code and can certainly work as a search engine (of sorts), but try asking it about the code samples I'm putting in the linked chat. I've obfuscated it a bit to reduce the chances of anything too similar showing up in the training data, but there's nothing here that a human couldn't figure out in 5 minutes by stepping through in a debugger. chat.stackoverflow.com/rooms/250666/… | |
| Dec 27, 2022 at 4:19 | comment | added | Ray | @Duane "English (or whatever language)", and of course its training corpora included code; otherwise, it wouldn't be able to produce code at all. But that doesn't invalidate anything that followed. I said, "It will favor models that produce syntactically correct text, use common idioms over semantically similar but uncommon phrases, don't shift topics too often, etc.", and aside from perhaps the last one, all those apply to code as well as natural language. But it includes no mechanism by which it could deduce the effects of a given piece of code, unless it's seen something similar before. | |
| Dec 25, 2022 at 23:22 | comment | added | Duane | This post is factually incorrect; ChatGPT was not trained only on English, but on the entire set of code on GitHub. So it does not only predict "English" answers; it can predict code very well also. Because most of the code on GitHub is "correct" code, it will generally answer coding questions correctly. Those who are getting "garbage" answers from ChatGPT are simply putting in "garbage" questions. When prompted correctly, I have found that it often produces far better code than what is found on SO. Incidentally, it has read all of SO also! | |
| Dec 22, 2022 at 5:28 | comment | added | neondrop | @RomanStarkov The problem is you cannot be reasonably sure if what the AI does is correct, since this is not even what it was designed to do. Just because it gives the correct answer for this particular query, doesn't mean it will for the next (probably it won't). It doesn't "understand" the code you gave it (it does not even try to). It gives the answer that is the most likely English language text answer according to its training data. It is "just" a language model, "just" an insanely huge one, trained on a very big corpus of data. | |
| Dec 20, 2022 at 23:55 | comment | added | nasch | @Wildcard "how receptive people generally are here to constructive input" So refreshing when people in other fora often flip out over any minor correction. | |
| Dec 19, 2022 at 19:56 | comment | added | Ray | Which isn't to say that transformers aren't really impressive. Just that they shouldn't be blindly used without understanding the underlying math. | |
| Dec 19, 2022 at 19:54 | comment | added | Ray | @RomanStarkov Then it's seen similar code and similar errors before, because it is incapable of reasoning about code flow (or the error is purely syntactic; self-attention plus a threshold value implicitly induces a subgraph over the elements of the text, so it's not implausible to say that the abstract syntax tree could be represented). You can't just look at examples of its output; the model is absurdly huge, and pattern matching will get you a lot of impressive anecdotal successes. You have to look at the math and ask what capabilities it's theoretically possible for it to have. | |
| Dec 19, 2022 at 18:03 | comment | added | Roman Starkov | Ignoring the question of the blanket ban itself, "just a language model" is a silly way to look at ChatGPT. It is able to instantly find the bug in my 30 LOC function that I use as a go-to for developer interviews. It takes a human 30 minutes to find it, with a debugger. This "just a language model" finds it just by reading the code, and explains what the problem was, in seconds. | |
| Dec 17, 2022 at 11:13 | comment | added | Obie 2.0 | @Wildcard - The question of whether humans—or at least, other humans—actually understand things is also a philosophical question that it is not possible to support by science. ;) I do not think that ChatGPT understands anything: its lack of consistent or competent reasoning about the things that it writes, not to mention any semblance of a coherent personality in its output, would seem to disqualify it. But someday, something that behaves exactly like a human may indeed be created, and if that happens people would do well not to dismiss it just because it is silicon rather than carbon. | |
| Dec 8, 2022 at 6:16 | comment | added | Ray | @Wildcard No problem; I appreciate the comments. The last thing I would want would be for people to interpret my statements as supporting the more outrageous interpretations of these systems' capabilities when I was trying to make the exact opposite point. (A final clarification: "condition" in the sense of "conditional probability", not "classical conditioning"; "encodes" in the information theory/coding theory sense; and "decision" in the sense of sampling from a learned probability distribution.) | |
| Dec 8, 2022 at 5:53 | comment | added | Wildcard | @Ray thanks, I think the footnote is a definite improvement. I actually wasn't expecting an edit; I've been away from SE for a while and had forgotten how receptive people generally are here to constructive input. :) Not to quibble with it too much, but I believe that all the verbs in the correcting footnote ("encodes" and "can condition", as well as the word "decisions,") still fall victim to the same category of error in which Dijkstra placed anthropomorphism: namely, operational reasoning, as a "tremendous waste of mental effort." This is a fairly puristic sidenote; the update is great. | |
| Dec 7, 2022 at 20:28 | comment | added | Ray | I said that anthropomorphization isn't useful, but I was perhaps overstating things; phrasing things in a somewhat anthropomorphizing way can be potentially useful as an analogy so long as everyone understands that it is intended as such and that all analogies are flawed. But I obviously didn't make that as clear as I needed to. So to be very clear: the important part of that paragraph was not where I said it "understood" certain things. It was the part where I said it doesn't reason about them. The models are in no way sapient, self-aware, or anything else along those lines. | |
| Dec 7, 2022 at 20:22 | comment | added | Ray | @Wildcard I've added another footnote to hopefully make that a bit clearer, but some amount of imprecision is inevitable if I'm writing for a general audience in a reasonable amount of space (even the three-part comment above was glossing over a lot of details). The real explanation is going to be several pages of math. If anyone wants more detail, I recommend Bengio et al. 2003, Mikolov et al. 2013, and Vaswani et al. 2017 as a starting point. | |
| Dec 7, 2022 at 20:13 | history | edited | Ray | added 191 characters in body | CC BY-SA 4.0 |
| Dec 7, 2022 at 19:50 | comment | added | Wildcard | @Ray your clarifying comments are good, but the mere fact that an anthropomorphic shorthand statement of something can be more precisely described in objective or mathematical terms, does not signify that the original anthropomorphic wording was somehow correct. The question of whether "understanding" is the best word or not is precisely the philosophical judgment point which I was highlighting and decrying. Even your statement that the "model is able to determine...by looking" is highly anthropomorphic, in the exact way Dijkstra condemned. See lambda-the-ultimate.org/node/264 | |
| Dec 7, 2022 at 6:48 | comment | added | Ray | ...Additionally, in models that make use of attention mechanisms (such as GPT-3), the model can be shown to be able to determine which parts of a passage relate to each other by looking at the different parts it attends to at particular times. (although I'm not aware of any examples of this that are as illustrative as the analogy task is for the embedding function). There's no mysticism or philosophy here. Just math. | |
| Dec 7, 2022 at 6:48 | comment | added | Ray | ...the system does understand that they have the same meaning (i.e. are largely interchangeable). Further, it can be shown via the analogy task that the vector difference between non-synonymous words corresponds (in at least some cases) to the relation between them: the standard example is that embedding(king) - embedding(man) + embedding(woman) ~= embedding(queen) (a small sketch of this check appears after the timeline). All this has been demonstrated experimentally. Whether "understanding" is the best word for what's happening here is unimportant: what matters is that the meanings of the words are verifiably encoded in the representation. ... | |
| Dec 7, 2022 at 6:47 | comment | added | Ray | @Wildcard No anthropomorphization is intended, necessary, or useful. To be more precise as to what I meant by "understanding": All modern language models possess an embedding layer that projects words onto a real-valued semantic space. Since the probability distributions are defined as continuous functions over the resulting vectors, as the distance between two words in this space goes to zero, the model will treat them identically. It has been demonstrated that synonymous words do appear close together in the semantic space, therefore... | |
| Dec 7, 2022 at 0:50 | comment | added | Wildcard | "Some level of actual understanding does exist in these models" — no. That is a philosophic conclusion not possible to support by science. It is exactly as unscientific as claiming that a computer that can pass the Turing test must therefore have a soul. I believe your statement is a perfect example of why Dijkstra warned against anthropomorphizing computer systems. | |
| Dec 6, 2022 at 18:42 | history | edited | cottontail | added 60 characters in body | CC BY-SA 4.0 |
| Dec 6, 2022 at 17:30 | comment | added | VLAZ | For some more: here is a paper on how natural-language-producing systems fail: dl.acm.org/doi/10.1145/3442188.3445922 | |
| Dec 6, 2022 at 16:35 | history | edited | Ray | added 40 characters in body | CC BY-SA 4.0 |
| Dec 6, 2022 at 16:27 | history | answered | Ray | | CC BY-SA 4.0 |
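
The Jan 11 comment above quotes a small function as a test prompt for LLMs. Below is a runnable reconstruction of that snippet as a minimal sketch; the formatting, comments, and the usage example at the bottom are editorial additions, not part of the original comment.

```python
def process_list(items, threshold=None):
    # Default threshold: half the length of the input list.
    if threshold is None:
        threshold = len(items) // 2
    # Keep only the elements that occur more than `threshold` times.
    # Note: calling items.count(x) inside the comprehension makes this O(n^2),
    # and every duplicate of a kept value is retained in the output.
    return [x for x in items if items.count(x) > threshold]


if __name__ == "__main__":
    # Illustrative input (not from the original comment): threshold defaults
    # to 5 // 2 == 2, and only the value 1 occurs more than twice.
    print(process_list([1, 1, 1, 2, 3]))  # -> [1, 1, 1]
```

The quadratic `count` call and the retained duplicates are presumably the kind of "improvements" the prompt is meant to elicit, which is what makes it a reasonable probe of whether a model can reason about the code's effects.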
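
The Dec 7 comments by Ray describe the word-embedding analogy task, where embedding(king) - embedding(man) + embedding(woman) lands near embedding(queen). The sketch below shows one way to check that claim; it assumes gensim and a locally downloaded pretrained word2vec file (the GoogleNews vectors), neither of which is named in the original discussion.

```python
# Sketch of the analogy task described in the Dec 7 comments.
# Assumption: the GoogleNews word2vec vectors have been downloaded locally.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin.gz", binary=True
)

# king - man + woman should land near queen in the embedding space.
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # expected (approximately): [('queen', <similarity score>)]
```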