-490

Update – April 29, 2025:

The Answer Assistant experiment has concluded.


Answer Assistant is an experiment in which AI-generated answers are verified, edited, and curated by the community before becoming publicly visible. We want to test if this feature could help improve the answer experience and encourage knowledge sharing by helping users get unstuck or get a jump-start on content curation while maintaining quality.

As we kick off the experiment, we want the Stack Exchange community to know that the team is:

  • Committed to the Stack Exchange network being a place for human-curated knowledge and information. This experiment explores how that can remain the case in a world with GenAI, while investigating possible new workflows that could benefit existing community members and the next generation of users.

  • Committed to building solutions that add value for users on the platform. LLMs are part of the world now, and any potential integration must be explored responsibly, in ways that not only provide value (task completion, closing knowledge gaps, etc) but also create transparency, keep humans in the loop, and encourage human contributions.

  • Not interested in any outcomes that might dilute the value of the platform. It is not a goal of this experiment to get GenAI content into public view. The goal is to see how users interact with the clearly labeled LLM-originated content and assess it for potential inclusion in the public knowledge base.

At this time we do not plan to expand the experiment to other sites on the network, unless other sites are open to volunteering. The goal is learning and taking any next steps cautiously, and we will share learnings as this moves forward.

Overview of Answer Assistant experiment

The Answer Assistant experiment will appear on several participating Stack Exchange sites whose moderators agreed to participate in the initial test. Only site moderators and logged-in users with a certain amount of rep (which can vary per site) will be able to see and verify private AI-generated answers. This curation process ensures that answers are verified by humans, and edited when appropriate, before they can become public to all users viewing the Stack Exchange site.

A private answer will be generated by an LLM if the question meets site-specific criteria. The answer will be visually different from human-authored answers and it will be clearly labeled that the AI-generated answer is private, may be incorrect, and needs verification by members of the community. If the answer becomes public, it will be attributed to an account labeled “Answer Bot.”

[Diagram] Human verification determines whether a private AI-suggested answer becomes public or not

[Screenshot] A private AI-suggested answer as it would appear to users eligible to view it

Flexible settings for each participating Stack Exchange site

Each community in the Stack Exchange network is unique. The ability to customize the experiment settings — such as what questions might get an AI-generated answer, who can see/evaluate the private answers, and requirements for an answer to become public — allows room to leverage differences between communities and try variations in a controlled way. Limiting the visibility of the private answers helps prevent exposing the answer to users unfamiliar with the topic or community norms and ensures the intended community members make the judgments on answer quality.

The settings below are the defaults for question and answer visibility; they can be customized per community over the course of the experiment.

Questions that meet the following criteria may receive an AI-generated answer:

  • Older than 72 hours, to leave time for human curation

  • Posted in 2024 or 2025

  • Net non-negative score (0 or higher)

  • Unanswered, defined as having no upvoted or accepted answer
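
For illustration only, here is a minimal sketch of how these default eligibility rules might be expressed in code. The field names (created_at, score, answers, accepted) are assumptions made for the sketch, not the actual data model, and the real check runs on Stack Exchange's side.

```python
from datetime import datetime, timedelta, timezone

def is_eligible_for_ai_answer(question, now=None):
    """Default eligibility rules described above; every value is customizable per site."""
    now = now or datetime.now(timezone.utc)
    old_enough = now - question["created_at"] >= timedelta(hours=72)   # older than 72 hours
    recent_enough = question["created_at"].year in (2024, 2025)        # posted in 2024 or 2025
    non_negative = question["score"] >= 0                              # net score of zero or higher
    # "Unanswered" means no answer that is accepted or upvoted (positive score).
    unanswered = not any(a["accepted"] or a["score"] > 0 for a in question["answers"])
    return old_enough and recent_enough and non_negative and unanswered
```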

Users with at least 50+ rep (default value) on the specific Stack Exchange site will be able to see and evaluate the private answers. This reputation requirement can also be customized per site.

A private answer becomes public if multiple users mark it as “correct”. A private answer moves to a deleted state, visible only to site mods, if multiple users mark it as “incorrect”. To handle mixed results, a net “score” must also be reached for an answer to become public or deleted. These specific thresholds (the number of user votes needed, and the net score) can be set differently based on the level of site activity. If a user marks an answer as “partially correct”, that assessment does not count toward any outcome.
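
As a rough sketch of the resolution logic just described: the thresholds below (MIN_ASSESSMENTS, MIN_NET) are placeholders, since the post does not disclose the actual values and they can differ per site based on activity level.

```python
MIN_ASSESSMENTS = 3   # assumed: how many "correct"/"incorrect" assessments are required
MIN_NET = 2           # assumed: net margin (correct minus incorrect) needed to resolve mixed results

def resolve_private_answer(correct_votes, incorrect_votes):
    """Decide whether a private AI-suggested answer becomes public, is deleted, or stays private.

    "Partially correct" assessments are ignored, as described above.
    """
    net = correct_votes - incorrect_votes
    if correct_votes >= MIN_ASSESSMENTS and net >= MIN_NET:
        return "public"       # attributed to the "Answer Bot" account
    if incorrect_votes >= MIN_ASSESSMENTS and -net >= MIN_NET:
        return "deleted"      # deleted state remains visible to site moderators
    return "private"          # not enough signal yet; keep waiting for assessments
```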

Answers can be edited while in the private state. If an answer becomes public, editing history on the private version is visible only to moderators and those users who made edits, to prevent the original AI-generated draft from being indexed/crawled. Comments left on the private answer are not displayed if an answer becomes public.

Further details can be found in the help center article.

A careful and cautious approach

You might wonder why we’re moving forward with the experiment, in the face of concerns and sensitivity around AI-generated content across the network.  Put simply – in this time of foundational change, we must prepare for many possible futures. User expectations around seeking and contributing knowledge are rapidly shifting, and it’s important to both understand the nuances of that shift and have plans for how to address it. Knowing more about the pitfalls and opportunities related to human/AI collaboration is the key to making informed decisions as the technology evolves. And we learn by experimenting.

We conducted research and facilitated discussions with Stack Exchange community members and moderators about this concept throughout the latter half of 2024, to both get initial feedback from various types of users, and to identify communities that saw value in testing it out. It’s true to say that many of the people involved in those conversations expressed concerns about moving forward with this experiment. We also heard from people cautiously optimistic about the concept. The feedback we heard was instrumental in shaping how the experiment is designed today.

A common concern expressed was about AI-generated answer quality and the impact these answers could have on the platform. We share many of these concerns. While LLMs currently produce mixed results in terms of quality and accuracy, they are continually improving. We feel that it’s vital to begin experimenting now with ways to safely and responsibly offer this functionality, so Stack Exchange can be better prepared for a time when AI-generated answer quality and accuracy may be more reliable.

Others, particularly Stack Exchange site moderators, were more broadly concerned about the precedent such an experiment might set for the future of the network. To those of you who share these concerns, please know that the moderators of your communities represented them very well. We recognize the need to move slowly and judiciously into any new path for answer creation and validation, with the priority of ensuring that the human-curated nature of Stack Exchange remains intact.

We are conducting this experiment in a manner that we believe is respectful to the concerns expressed. Several Stack Exchange communities have volunteered to be part of the initial test group, and they’re interested in seeing how this could impact goals like reducing unanswered questions and increasing engagement within their community. If there are encouraging signals and results, we can look at possible next steps and other potential goals.

Goals and metrics

These are the primary metrics we’ll be tracking in the Stack Exchange sites participating in the experiment:

  • % of unanswered questions

  • % of AI-suggested answers that were voted on (private and public)

  • # of AI-suggested answers that were deleted/closed or became public

  • % of public AI-suggested answers with a positive score

  • # of secondary engagement interactions (votes, edits, comments, views)

  • # of users performing those interactions

As with the Question Assistant experiment within Staging Ground on Stack Overflow, the high-level goals are to increase user success and maintain content quality by leveraging AI/ML assistance in contribution.

Next steps

The initial phase of the experiment was made visible to moderators of participating Stack Exchange sites in December. The goals there were initial answer quality assessment and testing for feature bugs.

For the current expansion, which makes the private answers visible to some community members on participating test sites based on reputation, the goal is to monitor engagement and answer quality, as detailed in the previous section, as well as any fraud or abuse signals. During this stage, we plan to review engagement and quality metrics and may consider adjusting the settings to expand visibility and/or eligible questions.

Initial learnings against the goals and metrics outlined in this post will help shape decisions on any changes to, or future expansion of, this experiment. We’re taking it one step at a time, and we look forward to understanding the various values or benefits each participating community experiences during this stage. We will continue to keep this post updated with findings and any details of next steps as we have them.

As we have stated in other recent communications, the company remains committed to testing AI/ML thoughtfully and purposefully to support the core values of Stack Overflow: human connection, collaboration, and knowledge sharing. The goal is to build and support a healthy ecosystem of active users and community contributors. This experiment will be conducted in a safe, controlled, and transparent manner where humans are always in the loop. We remain open to concluding the experiment early if we find the results unfavorable for any reason.

What would need to be in place for you to feel comfortable seeing Answer Assistant implemented as a controlled experiment in your Stack Exchange community?


FAQ

Which Stack Exchange sites are participating in the experiment?

Arts & Crafts, Raspberry Pi and User Experience (UX) are currently participating in the experiment. Web Apps was a participant in an earlier stage.

Which LLMs are being used to generate the private AI-suggested answers?

The current experiment is integrated with an existing data partner; however, the feature is designed to work with any LLM in the future. We are not able to disclose the specifics during this phase of the experiment.

Will Answer Bot answers be subject to the same human oversight once 'approved'? Could other community members add comments, downvote, flag as spam, vote to delete etc, as with human answers?

Absolutely. If a private AI-suggested answer were to become public, it is subject to all of the same actions and processes that a human answer would be. Even in the private state, an answer can be flagged for moderator review and moderators can delete it directly if they see fit.

Will the Answer Bot also respond to questions or feedback, potentially editing or deleting its answer if it's convinced that it's wrong?

At this time, the private AI-suggested answer is fixed and is not subject to further revision based on updates to the question, or any comments on the question or private answer. That is something that could be explored if the initial stages of the experiment point us in that direction.

What about attribution/citation/sourcing on the answers that are suggested?

Right now it’s not included since the GenAI output does not consistently provide that. We’ve stated that attribution is non-negotiable and that goes both ways.  We are determining what sourcing data will be delivered from LLM providers along with the private AI-suggested answers. For the purposes of this limited experiment, we feel it’s still worthwhile to test out the concept and user interactions.

  • 152
    Attribution/citation/sourcing is not negotiable, but it is not included in the current tests. That seems like it's going to skew the results quite a bit. Commented Feb 4 at 17:54
  • 44
    If you can push on major AI providers to improve the attribution capabilities of their output, that would be a wonderful thing for the common good. I’m skeptical, but if it is possible, it will probably require a significant change to the type of results they’re generating that would potentially skew the results of your experiment. In my experience, when you’re using a system that tries to provide attribution, it hallucinates answers and then slaps on a tangential attribution that doesn’t actually say what’s claimed, unless it’s operating in a very narrow scope like an internal knowledge base. Commented Feb 4 at 18:01
  • 73
    Does seem a bit disingenuous to claim so much intent to have properly cited answers... when stack hasn't even proven it's possible, while moving forward with actually publishing ai generated content without it. Given how long it takes for stack to iterate on things nowadays, it could be years before this ever gains any form of citation feature, long after far too much of it has been posted to reasonably clean up the mess. Commented Feb 4 at 18:21
  • 58
    We need fewer, higher-quality Q&As - AI could be useful as a personal assistant, to provide personalised responses, but it doesn't belong in public posts. But I guess that idea isn't as profitable as just scaling up volume at the expense of quality. Commented Feb 4 at 18:39
  • 77
    I don't quite understand the name of this experiment... This isn't an answer assistant, it isn't assisting anyone at answering a question... instead it's an attempt at filling in for the lack of answerers. Commented Feb 4 at 18:43
  • 77
    "Users with at least 50+ rep (default value) on the specific Stack Exchange site will be able to see and evaluate the private answers." - Currently, the association bonus is 100 rep, which means that I could join Arts & Crafts, Raspberry Pi or User Experience (UX), earn the bonus and immediately start reviewing answers without any prior participation on those sites. Wouldn't it be better to increase this threshold? Probably the same rep required to access the review queues (which at least requires some participation on my part before I can review anything). Commented Feb 4 at 19:11
  • 152
    "Committed to the Stack Exchange network being a place for human-curated knowledge and information." No. It is a place for human created and curated knowledge. You will have hard time finding experts willing to curate AI nonsense. Commented Feb 4 at 19:39
  • 27
    "The current experiment is integrated with an existing data partner...We are not able to disclose the specifics during this phase of the experiment." - Sounds legit. Commented Feb 4 at 22:13
  • 51
    I think this undermines the GenAI ban on some SE sites. As long as for example the GenAI ban lasts on SO, this answer generator would have no chance there. But when it is tested now, it may be introduced nevertheless. Wouldn't this again result in a moderator strike? If we do not pay attention, we might end up with more AI answers than human answers. A competition between AI-generated and human-generated answers will only result in alienating human answerers even more. This might not end well. AI and human content must be extremely well separated if AI content is to be used at all. Commented Feb 4 at 22:30
  • 67
    What's the point of Stack Exchange if AI-generated answers are permissible? Querents could just type the question into Google and get the same (theoretically) result. Humans have an intuition that AI may never have, making the humans the only valuable contributors to Stack Exchange. ... Unless your goal is to be rid of the pesky humans.... Commented Feb 4 at 23:33
  • 36
    While a lot of people are unhappy about this, I do appreciate you taking the time to openly let us know what is happening, even if we get little say in it. Commented Feb 4 at 23:49
  • 52
    Just call a spade a spade. The last time you did an experiment, it magically just became a new feature with some random unexplained marginal growth metric Commented Feb 5 at 9:39
  • 51
    @ꓢPArcheon I just went through the metas on the 3 sites that "volunteered" and found no prior discussion on any of the sites, just announcements: "we are going to do this experiment now". I don't think there was any public discussion at all. Getting the majority of all moderators to agree on a small site with just 1 or 2 active moderators won't be that hard. Commented Feb 5 at 13:43
  • 26
    Generally, the people qualified to properly proofread an LLM generated answer are already the ones answering questions. This seems like it will just encourage people who have a moderate understanding of the topic to mark the AI as correct if the output "looks about right" (which it usually does; creating text that seems relevant is what LLMs do best). TL;DR: I am willing to help humans solve problems; I am not willing to proofread LLM guesswork. Commented Feb 6 at 23:16
  • 23
    To me, the most frustrating thing about seeing AI answers on Stack sites is the thought "If the asker wanted an AI-generated answer, they would have asked an AI." Adding this "Answer Assistant" will change that thought to "Why bother with asking a question on <StackSite> at all if an AI is just going to answer it anyway? Just ask an AI, skip the middleman." I realize you want these answers "curated by the community", but I have negative interest in doing that... I like helping people, not AI 🫠 Commented Feb 7 at 14:15

57 Answers

244

This remains troubling.

  • Committed to the Stack Exchange network being a place for human-curated knowledge and information. This experiment explores how that can remain the case in a world with GenAI, while investigating possible new workflows that could benefit existing community members and the next generation of users.

  • Committed to building solutions that add value for users on the platform. LLMs are part of the world now, and any potential integration must be explored responsibly, in ways that not only provide value (task completion, closing knowledge gaps, etc) but also create transparency, keep humans in the loop, and encourage human contributions.

I fail to see how adding generative AI to the platform demonstrates a commitment to the "network being a place for human-curated knowledge and information" or the company's commitment to "building solutions that add value".

I don't see evidence that expert humans are interested or willing to edit or curate AI slop. If you have information from people - and not just regular or high reputation users, but people who have truly demonstrated deep subject matter expertise who are qualified to curate generated content for completeness and correctness - that says otherwise, I (and likely the broader community) would be interested in seeing that.

This also seems to be a deviation from the SE mission. The premise seems to feed the idea that the SE network is a place for people to get answers to their questions. That isn't the purpose. The network is a knowledge base of questions and answers, and we don't exist to answer every question that a person may have. Generating an AI answer whenever humans haven't written and vetted one within 72 hours caters to the idea of getting fast(ish) answers to a person's specific question rather than building a knowledge base of human-curated content.

Just because you're limiting the view to moderators and high-reputation users doesn't mean that those people can effectively validate the content.

It’s true to say that many of the people involved in those conversations expressed concerns about moving forward with this experiment. We also heard from people cautiously optimistic about the concept. The feedback we heard was instrumental in shaping how the experiment is designed today.

I would recommend breaking this down a little more. Who were the people who were concerned? Who were the people who were cautiously optimistic? If my suspicions are correct, the knowledgeable curators are the ones who are concerned and the people tending more toward answer seeking were cautiously optimistic. The weights of these two groups of people should not be the same, with greater preference toward the curators.

Right now it’s not included since the GenAI output does not consistently provide that. We’ve stated that attribution is non-negotiable and that goes both ways. We are determining what sourcing data will be delivered from LLM providers along with the private AI-suggested answers. For the purposes of this limited experiment, we feel it’s still worthwhile to test out the concept and user interactions.

If attribution is non-negotiable, why are you investing anything until that non-negotiable thing has been demonstrated? It feels like attribution isn't non-negotiable if you are proceeding with testing a tool that doesn't (and likely can't) provide the level of attribution expected from SE network answers.

  • 13
    the sad part is... they're already providing ai answers within SOfT with what they consider acceptable citations. Commented Feb 4 at 18:37
  • 121
    It's like you read my mind. Adding AI answers for humans to curate instead of just cultivating human experts to write answers is a huge step toward finally killing off SE as a repository of knowledge curated by experts and using AI to animate its rotting corpse. Next they'll seed questions with AI to try to generate the engagement they pissed away by neglecting and flat out abusing their volunteer community. It's so depressing to watch the company driving directly into the abyss and not be able to do anything about it. Commented Feb 4 at 18:47
  • 7
    I tried to find where the company explains the point of Stack Overflow to a new user and the closest thing I could find to a mission statement was in the blurb advertising their careers: "Join our mission to help empower the world to develop technology through collective knowledge." Interesting how there's nothing about humans or community in there... Commented Feb 4 at 21:44
  • @ColleenV Not unsurprising given the majority of their workforce is built around serving SaaS which expressly isn't public plat/community. The closest you'll get is probably the tour, if you're looking for something specific to public plat. Commented Feb 4 at 21:49
  • 35
    @KevinB It's just another indication of how the company misses the point when trying to get people engaged. They'll twist your arm to try to force you to sign up for an account, and plaster the home screen with badges and welcome back clutter, or order you to vote on a post, but won't present a simple paragraph explaining what the site is about to inspire someone to donate their time to a worthy goal. Commented Feb 4 at 22:00
  • 2
    @KevinB The largest potential customer base for misc SaaS stuff are the existing users of the Q&A network. There's some truly strange assumption that some big company will just at a whim pop up and start paying for all those services out of the blue. That's not going to happen unless staff of that company is already familiar with the various products, maybe participating in the Q&A etc. A marketing department which doesn't understand this but treats the Q&A as something separate and unrelated is gravely incompetent. Commented Feb 5 at 10:12
  • 5
    "I don't see evidence that expert humans are interested or willing to edit or curate AI slop." I am reasonably active at CrossValidated. Far too many questions there could be answered with a combination of two or three existing answers, possibly with some explanation of how these existing answers apply to the question at hand. I would absolutely be willing to review an AI's stab at writing this up. (And yes, after twenty instances I may conclude that this is a waste of my time. But so far, I am keeping an open mind.) Commented Feb 5 at 12:54
  • 20
    I agree that “LLMs are part of the world now, and any potential integration must be explored responsibly” is troubling. For the record: decently-cheap camera drones are also part of the world now, should any potential integration for those still be explored? As I wish I could say to every corporation on the planet right now: you don’t need LLMs for absolutely everything, and if you think you need an AI somewhere, it’s probably already there. Commented Feb 5 at 13:48
  • 5
    But but but... my good friend.... "human-curated knowledge and information" doesn't mean it has to come from a human, just that a human reviewed it at some point. If your goal is selling training material, including curated quality scores with it is actually good. Remember: LLM training often WANTS bad source material: the important part is that it has to be flagged as such. How would "AI" recognize poor content or Stable Diffusion "worst quality" negative prompts work otherwise?? Commented Feb 5 at 15:53
  • 1
    It varies with the subject matter, but there are many kinds of Q&A where we just need to find a working method to accomplish x. In such cases, whether a human came up with the answer or an AI did is not the main concern. The main concern is: is the information good/accurate/usable? Commented Feb 6 at 2:15
  • 29
    "I don't see evidence that expert humans are interested or willing to edit or curate AI slop." Indeed, most experts won't even do this for $40+/hr on sites like DataAnnotation, let alone unpaid. My experience has been that it takes more time and effort to fact-check and correct an AI answer than it would take me to write my own answer from scratch, so adding AI into the loop is simply a drain on resources. And of course, since it will take so long to vote accurately on AI slop answers, the vast proportion of votes will be by people who are not checking as carefully as is needed. Commented Feb 6 at 17:34
  • 1
    "attribution is non-neogitiable" doesn't mean what you think it means. They say that they going forward and not even ready to negotiate their position. Commented Feb 7 at 14:26
227

I just saw this after talking to a recently resigned mod...and this makes me want to leave too, honestly. I could ask an AI myself, I don't need Stack Exchange for that. You are making this site useless.

If attribution is really non-negotiable, then why have you decided to ignore it (for the time being) and publish this without it?

Do you have any way planned to prevent robo-reviewers who mark all AI answers correct? And a way to prevent just plain wrong reviewers? 50 rep is enough that someone who has never visited the site before is somehow trusted to review AI answers. Why is it that it takes 350/500 rep to review in some queues, up to 1k/3k in others, but you trust 50 rep users to review this?

Why do you feel it okay for you guys to post "checked" AI generated answers while it is suspendable for users to post checked AI generated answers (thinking about this incident specifically)?

Most importantly, why should I stay here? Tell me, why shouldn't we all leave and never come back? What value is Stack Exchange now providing that I couldn't get myself by asking an AI? Why should I put effort into asking a question to get the same thing I would get if I asked an LLM?

Oh, and why won't you tell us what AI you are using? That's pretty weird and I can't imagine why you wouldn't be able to share that.

Stack Exchange is...Committed to the Stack Exchange network being a place for human-curated knowledge and information

In that case, why are you adding AI answers which will inevitably replace at least some of the human-curated knowledge on this platform?

Not interested in any outcomes that might dilute the value of the platform.

AI answers, even if correct, would still dilute the value of the platform.

Currently, Stack Exchange is useful because it provides value I can't get somewhere else. Codidact, while great, is quite small, and there is really nowhere else where there are humans answering questions in a formal, curated, high-quality manner. That's what makes this place unique, that's why I'm here.

If you make this site so it provides nothing I can't get from asking an LLM, you've destroyed this site, not just for the community, but for you guys as well. You make no money if users don't go to your website, do you?

Questions that meet the following criteria...Posted in 2024 or 2025

Why? These are actually much less likely to receive value from an AI answer (as it is very likely the user who posted it was aware of the option of asking an AI and either decided against it or the AI wasn't able to solve their problem).

We want to test if this feature could help improve the answer experience

This won't improve the answer experience, it will just make it totally non-existent in some cases. Do you even understand what you wrote there?

We remain open to concluding the experiment early if we find the results unfavorable for any reason.

Who is we who "find[s] the results unfavorable" here? The company? Mods? The community? ChatGPT?

If it is the company, as I suspect, is "everyone is really mad at you and thinks it's terrible" considered an unfavorable result?

  • 7
    "And who will be able to see these and mark them correct anyway?" anyone with 50 or more rep, unless otherwise defined on the given site. so you and I will be able to see and potentially approve these on sites we've never participated on. Commented Feb 4 at 18:27
  • 1
    @KevinB Source? Commented Feb 4 at 18:28
  • 7
    "Users with at least 50+ rep (default value) on the specific Stack Exchange site will be able to see and evaluate the private answers. This reputation requirement can also be customized per-site." Commented Feb 4 at 18:28
  • 14
    "If it is the company, as I suspect, is "everyone is really mad at you and thinks its terrible" considered an unfavorable result?" isn't "everyone is really mad at you" the standard state now days? Commented Feb 4 at 20:42
  • 1
    @KevinB Hence why I asked. Wishful thinking. But also sometimes "everyone is really mad at you" does lead to some changes. 5-10% of the time, but it does occur Commented Feb 4 at 20:44
  • 21
    The only "unfavourable" is "risk of loss of profit" for the owners of the company. So I guess that happens when the partnered AI providers are unhappy with the services of the flesh slaves. Commented Feb 4 at 20:54
  • "Posted in 2024 or 2025 - Why" - no-one, especially not the asker, cares any more about questions from years ago that no-one paid much attention to (a lot of those questions would've been autodeleted at 365 days). If lots of people paid attention to it, but no-one managed to answer it, AI is most likely just going to produce garbage. For newer posts no-one paid much attention to, most likely only the asker is going to benefit from an answer that anyone can get from ChatGPT (which isn't great for providing future value to anyone else). Commented Feb 5 at 0:35
  • 2
    @NotThatGuy Posted in 2024 could be 13 months old... that's not that different from a question posted 3-4 years ago Commented Feb 5 at 0:39
  • I didn't see anything in the post or help center page that would make me concerned about robo-reviewers- unless I missed it, there's no mention of rep or badge rewards for reviewing these. so my question has been and still is... why would anyone spend their time reviewing this? if anything, I expect the company to run into a problem on their end of lack of interest in reviewing this stuff. fingers crossed that they don't add rep as a reward for reviewing, because that would attract robo reviewers like mad. Commented Feb 5 at 5:46
  • 2
    "Why do you feel it okay for you guys to post "checked" AI generated answers while it is suspend able for users to post checked AI generated answers" The mentioned user posted unchecked and incorrect AI generated answers. Commented Feb 5 at 7:32
  • 4
    SE can be a community which coalesces expertise of those eager to teach & communicate, OR it can be a database of AI "content". You cannot have both; one cultivates a community and the other minmaxes answers to questions, which will push out the most valuable authors of the site. I fear SE doesn't care about the community. Commented Feb 5 at 9:20
  • 8
    Keep in mind, too: “a database of AI content” is not functionally much different from “ChatGPT with a search bar”. SE’s usefulness stems from the fact that it’s better than a database of generation, not that it could be one. Commented Feb 5 at 13:49
  • One Q. I can answer based on my MS in CS. "Why do you feel it okay for you guys to post "checked" AI generated answers while it is suspendable for users to post checked AI generated answers" The idea is, they're trying to build an LLM specifically off the SE users. Using another AI for answers would introduce another AI's data into SE AI's, increasing the required space without adding value. That's probably also why they are concealing the source of the AI -- it's probably a partisan figure like Musk trying to absorb SE's quality into an AI he's building, and revealing would turn users off. Commented Feb 7 at 6:15
  • @starball Not all robo-reviewers are doing it for rep/badges. For example, I've seen some mods complain of robo-flaggers who are just trying to increase their helpful flags count, even after the Marshal badge Commented Feb 7 at 14:55
  • 2
    @Starship If it were OpenAI, then they would have announced it, because that's already out in the open. Whoever this is, it almost seems like they aren't seeking a long-term relationship-- just an experiment, to collect as much AI training as people are willing to give-- and once the AI quality stops increasing, the experiment will be terminated, and this entity will walk away never having been identified. If they were seeking to make the results in the public domain, then they would require a much higher rep threshold for vetting answers. But when the goal is AI training, any layman can do it. Commented Feb 7 at 15:54
184

If you are asking us to edit and review AI written answers, then why shouldn't I just write an answer of my own and take the reputation for myself? If I can review that the answer is correct, then it means I can also write an answer myself.

  • 60
    Seriously. The LLM agents we have running at home are far superior to anything I've seen generated here. I could just generate an answer with my own AI and fix it up and post it instead of reinforcement training someone else's AI for free. Commented Feb 4 at 18:58
  • 70
    If you do this, you also have ownership of and better attribution for the work. There are far more pros to writing your own answer than reviewing AI slop. Commented Feb 4 at 19:04
  • 1
    Or just have the AI write the answer and fact check it... Commented Feb 4 at 20:33
  • 42
    @Starship But you can't post such an answer because that's against the rules. <irony> Commented Feb 4 at 21:31
  • 7
    @Dharman Good luck catching me. If the post is indistinguishable from an entirely human-written post, it shouldn't be an issue. Commented Feb 4 at 21:39
  • 3
    "why shouldn't I just write an answer of my own and take the reputation for myself" -> You are not being told you cannot answer a question just because the AI already posted one. If your point is to persuade people to answer questions instead of reviewing AI answers or doing nothing, then great, at the very least having AI answers will help motivate real people to display their human expertise and really answer questions! Commented Feb 4 at 22:18
  • 9
    @goldilocks that completely misses the point - if a user posts an answer, the AI generated answer...still stays around. What motivation is there to improve it? In many very many cases we won't be able to remove it, even if it's wrong. Your "but it helps provide answers" isn't actually going anywhere. It's a non-point. Commented Feb 4 at 22:23
  • 4
    That's ridiculous. If I or you or anyone posts an incredibly stupid answer somewhere, people downvote it and at -X points (can't remember what the threshold is) it disappears for most users. Further, if it is truly atrocious or just plain NAA people will flag it and mods will delete it -- that is part of what mods do everyday. This is no different. Just because an AI answer is approved for public viewing does not mean it is immune to downvotes, criticism, or deletion on some special level . Commented Feb 4 at 22:47
  • 15
    @goldilocks no, what you said is ridiculous. Downvotes don't delete content. Never have. With a low enough score, the answer is greyed out. That's it, it's still visible for everybody, though. And extremely wrong answers have lived on the sites for years. Not just as a concept: there are plenty of 5+ year old answers. Some can't even get the 3 del votes to be removed, others can't be delete-voted because they are positively scored. But you still just prattle on without touching the point the answer makes. Commented Feb 4 at 22:55
  • Okay "greyed out" then. On sites that have competent moderation, though, truly horrendous stuff that's flagged (or even not flagged) can still be deleted. I delete tish every day, and I do not need anyone else's help or approval to do it. But the issue of how we deal with massively downvoted answers is (or, logically should be) distinct from who wrote those answers. Commented Feb 4 at 23:23
  • 3
    @Adamant I can’t fully correct that sort of misunderstanding of how this stuff works in the space of a comment. We have a fairly beefy computer, but that’s more for image generation. LLM agents require a lot less umph and we run those on the M4 Mac. You don’t have to create a new model from scratch to create specialized AI tools a human can use to do something faster or better. It’s only when you try to make AI do something on its own that is better done by a human that things get hard/expensive. Commented Feb 5 at 2:50
  • 4
    I am against the AI answer bot idea - still I think this argument "If I can review that the answer is correct, then it means I can also write an answer myself." is flawed. Most people's passive knowledge is far bigger than the knowledge they can write down into a good answer. Moreover, just reviewing an answer takes a lot less time than writing one - so even if I think I could write an answer for a topic, I may only have time to review someone else's answer, but not to write my own. Commented Feb 5 at 12:23
  • 4
    There was once a great employee that said "Don't waste your time polishing turds". Sadly, he apparently was too good and annoying for what the company aimed to become so he had to go, on a Monday morning, without a warning... Commented Feb 5 at 15:48
  • 1
    @VLAZ "Downvotes don't delete content. Never have." That's not true for questions. Roomba autodeletes negatively scored questions under certain circumstances. Commented Feb 9 at 22:24
  • 1
    @NoDataDumpNoContribution but it is true for the context that I said it. Commented Feb 10 at 5:28
155

(Trying to ignore the severely ironic banner above the edit box as I write this, which says "Reminder: Answers generated by artificial intelligence tools are not allowed on Meta Stack Exchange. Learn more"...)

At the risk of making this all sound like a non sequitur, I would like to point out that the way these "experiments" (i.e. soon-to-be-features-network-wide) are managed shows lots of parallels with the situation of content creators facing today's AI apocalypsoid.

There's this viral tweet that succinctly says

You know what the biggest problem with pushing all-things-AI is? Wrong direction. I want AI to do my laundry and dishes so that I can do art and writing, not for AI to do my art and writing so that I can do my laundry and dishes.

I've also come across this more verbose blog post by comic artist group War and Peas. It explains one of many instances when they were contacted by some AI grifter company, offering to have their creative work outsourced to an AI slop factory.

I won't quote the whole piece (it's not long, and worth reading in my opinion), but here's the artists' reasoning from their polite middle-finger reply:

[...] we must politely decline. Here are some reasons why:

  • The surge in AI has been built on the backs of creative people like us. Artists’ work has been harvested in order to train large language models and they have not been informed or compensated.
  • Social Media platforms cannot grow without engaging and original content, made by people. This is the basis for any user to join. Creators have – in large part – not been privy to the rewards that these platforms have generated over the years even though they are the reason for platform growth. Any business model that does not acknowledge this and does not seek proper compensation for artists is not of interest to us.
  • While being creative is surely not always fun, most artists love their work and do not want to outsource their passion to a machine that does this for them. They would much rather be properly compensated for their work in order to continue to be able to do said work. Frankly, storytelling “without the need for personal content creation” sounds horrible. Storytelling is personal. It is a connection between the writer and the reader. Without this personal connection, storytelling loses its purpose.
  • Artists would much rather have an AI that actually helps with grunt chores, such as writing invoices, or helping with taxes in order to focus more on fulfilling creative tasks. Such an AI-tool would be of much more value to the artist community.

Why am I pointing to the above? Most of that also applies to the Stack Exchange network, and Stack Overflow where I'm mostly active. It baffles me how tone deaf the company is regarding this matter. I'm way past assuming good faith, all that could be said to raise our concerns has been repeated ad nauseam already. You the company keep pushing AI garbage onto the network, blatantly disregarding both how much harm it does to the platform and how detrimental this is to the communities that make the platform (your product) viable. All in the name of generating more profit, because frankly, there can be no other reason. And yes, I still don't think the company is just trying to pay the bills (link to comment thread about hypothesized intentions of a company that acts like only profit matters; sorry if the comment thread gets deleted eventually, you can find an archived version of the page here, look for the comments under the answer from TylerH).

To make my point perhaps a bit more obvious, let me paraphrase the artists' points above, applied to your network:

  • The surge in AI has been built on the backs of experts (and creative people) like us. Contributors' work has been harvested in order to train large language models and we have not been informed or compensated.
  • The network cannot grow without original and correct content, made by people. This is the basis for any user to stick around and keep the network alive. Contributors have not been privy to the rewards that the AI platforms have generated over the years even though we are in large part the reason for their success (especially Stack Overflow for chatbots trying to solve programming problems). Any business model that does not acknowledge this and does not seek sustainable collaboration for contributors is not of interest to us.
  • While solving other people's problems is surely not always fun, contributors love this hobby and do not want to outsource their passion to a machine that does this for them (especially when it does it much worse). They would much rather be empowered to curate the knowledge base effectively in order to continue to be able to do said work. Frankly, solving problems "without the need for personal content creation" sounds horrible. Answering questions is personal. It is a connection between a problem and an answerer. Without this personal connection, problem solving loses its purpose.
  • Contributors would much rather have an AI that actually helps with grunt chores, such as educating new users, triaging new questions or finding duplicates in order to focus on more fulfilling problem solving tasks. Such an AI tool would be of much more value to the Stack Exchange community (honestly, it needn't be AI, please don't make it AI).

Expecting that the contributors will start curating AI garbage, willingly and for free, when we're already wary of any content written with suspiciously elaborate English, is an affront to the intelligence and passion of the people who keep this network afloat (but rapidly sinking). It completely chooses to miss the point of all the pushback you've been seeing due to the mindless/unscrupulous plugging of AI garbage on the network. It ignores all the reasons why AI was banned by the community in the first place. I hope the inevitable graduation of this "experiment" to a feature that defies all objective feedback you will have received (and its eventual implementation on Stack Overflow) will finally put the network out of its misery.

  • 56
    I don't agree with every jot and tittle, but I do agree that it's weird to automate the part of the site that is the most rewarding for humans: answering questions. Couldn't we train the AI to flag unfriendly comments and off-topic posts first? I'll probably never come back, but reducing the pain points that burn people out could only help the network. Commented Feb 4 at 21:52
  • @ColleenV Queen uses some AI to help detect unfriendly comments Commented Feb 4 at 23:21
  • 14
    As always, the problem is not really "AI". The problem is generative AI writing text that needs to be semantically correct. We've been using many kinds of machine learning systems for a very long time without any crises. The problem today is the kind of AI and what it's being used for. Commented Feb 5 at 5:35
  • 2
    @ColleenV that bot to flag comments has been in place for a decade. Commented Feb 5 at 8:31
  • 1
    @Adriaan I've been making my own NLN comment bot and from what I can see I'm almost certain Andy's bot has been inactive since 2021 or so Commented Feb 5 at 14:05
  • @Starship there was a discussion on whether the bot would continue running since Andy's election to moderator in 2017, as that would grant him infinite flags without any control by another mod. Commented Feb 5 at 14:11
  • @Adriaan And... Commented Feb 5 at 14:13
  • "All in the name of generating more profit,.." I agree with many things here but not with the apparent capitalism critic. The pursuit of profit has brought forward lots of great things and if it's true that the experiment will inevitably fail because the company totally misses to understand user motivation (which very likely it does), then profits will drop and this will be extremely strong motivation for the company to change. For example, I honestly believe the company only started thinking about the staging ground seriously after revenue dropped in 2023. I don't see profit as inherently bad Commented Feb 8 at 13:23
    @NoDataDumpNoContribution you can see the linked comment thread for a bit more detailed explanation of my stance. I didn't say "hurr durr capitalism". But the kind of astronomical money that venture capital spent on the network won't return without the kind of myopic KPIs that the company is chasing. The company is just not worth that much investment so they need to make terrible (for the platform) decisions to try and recover. Commented Feb 8 at 16:48
  • 3
    Machines that do our laundry and dishes are in our homes for almost 100 years. Starting with a quote THAT oblivious to reality makes it hard to take the other part seriously. Commented Feb 11 at 15:49
  • 1
    @Agent_L OP doesn't want a dishwasher, he wants Rosy (from the Jetsons). Commented Jun 2 at 22:45
125

I think it's worth asking, why do users answer questions? Reputation points aren't that valuable of a commodity. I'm not getting paid to answer. I'm doing it because I want to share my expertise and help people learn. I enjoy writing up explanations and helping people with their problems, and interacting with a community of other human beings who answer my questions in turn when I'm confused about something.

Now StackExchange wants me to spend my free time curating AI slop instead. My question is, why should I? Why should I spend my free time reading these answers, which nobody cared enough about to bother writing them in the first place? What do I get out of this that's worth my time and effort?

  • 4
    "What do I get out of this that's worth my time and effort?" that's another way to share your expertise and help people learn. Commented Feb 4 at 21:49
  • 55
    @FranckDernoncourt except it's not... there's no expertise in rubberstamping slop. As it is nearly all of the examples thus far have been fairly low quality... yet the majority of people capable of approving these posts can't edit them into a better format or even indicate quality at all, short of leaving a soon to be deleted comment. Commented Feb 4 at 22:28
  • 1
    "Now StackExchange wants me to spend my free time curating AI slop instead." -> Where did you sense that anyone wants you to curate AI stuff instead of actually writing an answer? If the proprosal was "In order to answer a question, you will first have to review a previous answer by an AI", then this would make sense. I think if you read a question and feel you as a human expert can provide a decent answer, no one is suggesting there is anything better for you to do. Commented Feb 4 at 22:34
  • 9
    @goldilocks Who goes "oh look, there's this perfectly fine answer already here waiting for review, let me go duplicate it instead of reviewing it"? I'll tell you, no one but rep farmers Commented Feb 4 at 23:22
  • @KevinB "Any user able to view a private answer is able to make edits." (help center) .... (edit: and then I just saw your comment about not being able to edit even when you should :P) Commented Feb 5 at 6:02
  • Reminds me of these FaceBook ads I see suggesting I can help improve AI because of my native language knowledge (duh!). Why would I waste time on that? Commented Feb 5 at 10:44
  • 1
    Who goes "oh look theres this perfectly fine already here waiting for review, let me go duplicate it" -> Hoperfully no one. If you think the AI answer is perfectly fine, then there is no problem -- make it public. For people that really want to beat it to the punch, you have 72 hours. If someone writes an answer and it is accepted or upvoted, the AI will not trigger. Commented Feb 5 at 16:37
112

If anyone wants an answer from some GenAI, they'll ask a GenAI. This service is already available all over the web, by approximately 999 other companies.

First of all, many people come to SE explicitly because they want an answer from a human expert and not from an AI.

As one version of that, SE has in recent years turned into the place where people come after they have already tried to ask an AI but only got nonsense as an answer. It has become common for people to write things in their questions like "I tried to ask ChatGPT but it didn't work".

And the reason that didn't work is that GenAI is rather useless for moderately advanced questions and topics, for example about programming. It cannot be trusted because it is simply too bad. The state of GenAI is not what you think it is.

You get back garbage, lies and hallucinations mixed in with portions of correct information - aka the best kind of lies. Such a messy AI reply often can't be salvaged by editing because what's written simply doesn't make sense. Why edit things that hold no value?

Furthermore, the GenAI often blatantly contradicts itself in many places, so even when editing there's a chance of some of that madness slipping through. Ask it a somewhat advanced question and you might find statements in the answer along the lines of "A equals B... B does not equal C... A equals C." The GenAI isn't even capable of proof-reading such basic things, because it just generates text, which doesn't necessarily connect with previously written text in the same answer. This happens more often than not - what the AI is lacking specifically is intelligence.

The subjective sales argument made by many is something like: there are a few lost souls out there who have never heard of GenAI but will be pleasantly surprised when a low-quality answer is provided from one at SE. Do you remember that moment when you wanted to ask tech support why your Internet wasn't working and you were pleasantly surprised to get to talk to a chat bot instead of a human? No? Me neither.

From there, common sense leaves us with the following conclusions for this new feature:

  • Potential target audience: nobody.
  • Purpose of salvaging AI replies: pointless busy-work and likely more work than just writing a completely new answer.

We can't have intelligence, artificial or otherwise, if we don't even apply basic human common sense during project idea evaluation.

  • 50
    "Do you remember that moment when you want to ask tech support why your Internet isn't working and you got pleasantly surprised that you get to talk to a chat bot instead of a human? No? Me neither." WELL SAID! Commented Feb 5 at 11:11
  • 11
    @AJM I had to recover an account recently. I went to the forgot password page, filled in anything and it gave me some error code with a recommendation to reach out to support. I went to the support chat which had the site's chatbot waiting to pounce. I had to do a whole song and dance with explaining that I tried the forgot login but got an error. Cheerfully, the chatbot advised me to try the forgot login. Which I had to explain a few times didn't work. In the end it finally transferred me to a human. I must say, the chatbot experience was far from pleasant. Commented Feb 5 at 15:05
  • 8
    @VLAZ As was always the case with every single use of such chat bots. The only ones who believe in them are the AI hype cult who sell these to very incompetent managers, with promises of less staff and quick support handling. Not to mention the best argument: it is AI! Now how these work in practice is that customers get pissed and don't get in touch with support at all - so you can verify that there are fewer support errands, handled quicker. With the slight little side effect that you are rapidly losing customers because they no longer get support. Commented Feb 5 at 15:27
  • 6
    Which is exactly the same thing as selling GenAI to SE. Nobody wants it and it serves no purpose save for scaring customers away. It's even more ridiculous here since those who ask and answer questions aren't paid staff but unpaid volunteers. Yet we must have it and the main reason why is: because AI. Commented Feb 5 at 15:27
  • You can't be more correct. Commented Feb 7 at 16:35
  • AI — artificial ignorance. Commented Feb 11 at 8:42
68

I don't support this experiment at all.

I get the point of experimenting and verifying whether some idea can pass in real life. The problem is when you start with a bad idea (and you get feedback about the idea itself) and then you still pursue the bad idea with the hope that some magic will happen during the experiment. Sorry, but running experiments based on bad ideas is only a colossal waste of time and money, and in this case, of the good will of your users.

The following is my response from the Moderators SOfT. Not much has changed in the meantime. This experiment was a bad idea then, and it is still a bad idea now. Reading some of the other answers, it is clear that we share similar concerns.


The first problem with this experiment is the difference in starting points. You (the company) are starting from the position "We want to do this. How can we make it work?" and we are at the position "There is no way this will work, even as an experiment".

It will be very hard to find common ground in such circumstances. You are standing on one cliff, we on the other and all we can do is yell at each other over the abyss.

The issues with AI answers are multifold and even conducting the experiment itself will be highly problematic.

What do you currently have?

The purpose of the SE network is to create a large repository of (human) knowledge where people can easily find answers to their questions. While I cannot say how well that goal is achieved in other areas, your technical sites, most notably Stack Overflow, have been successful in achieving it. You have the largest community of experts in their respective fields in the world. People can interact and get their answers directly from the very people who are developing the technologies they work with. You have employees from the largest tech companies like Microsoft, Google, and Apple; book writers and educators; MVPs and tech gurus; professionals with decades of experience.

In two words: you have knowledgeable PEOPLE, whom nobody else has in such concentration in one single place.

What will you have with AI answers?

If you start posting AI answers on sites, you will be just another place on the Internet with AI slapped on.

What would be the difference between such a Stack Overflow and using Copilot directly in an IDE? Or chatting with an AI in the browser?

Why would anyone come to Stack Overflow and other sites in the network to get AI-generated content when they can get it more easily and faster anywhere else?

Why would someone type a question into Stack Overflow and then wait a few days for an AI answer that may not even be correct, when they can just ask AI directly?

If you put AI answers on SE sites, you will no longer have the single best selling point: answers written by people. You will become just another AI dumpster fire.

What is the purpose of this experiment, and what are the expectations?

Every problem with this experiment has three aspects:

  1. What AI companies want
  2. What SE as a company wants
  3. What SE communities want

When it comes to SE communities, we also need to look separately at sites of different sizes and topics, and we need to take into account different categories of users, as the issues will vary.

  • Sites based on size

    • Stack Overflow
    • larger SE sites with enough traffic
    • smaller SE sites with poor traffic
  • Users (communities)

    • moderators
    • curators
    • answerers
    • askers

1. What AI companies want

Note: This was written with the presumption that the AI companies with which SE is in partnership are also somehow involved in this experiment, even though this was not explicitly stated.

This is simple: AI companies probably want some additional input and feedback process that will help them assess the correctness of AI responses, which they could then use to further improve and tweak their models and algorithms.

But the results they get will depend highly on two things: traffic and user engagement with AI answers.

Which immediately surfaces several problems: the only site with significant question traffic is Stack Overflow. On all other sites, traffic is too low to run any viable analysis. Another thing that makes this worse is that AI answers will be posted on unanswered questions, which reduces the numbers even further.

This also means that the primary target for this experiment (and its further application) is Stack Overflow. Which is highly problematic, as Stack Overflow is probably the last site in the network where this experiment would be welcomed.

And I suspect that Stack Overflow is not the only site where the community has such a stance, and it is likely that even talking about these plans in public will have immediate negative consequences for users' participation.

2. What SE as a company wants

The answer here is probably not clear-cut. SE probably wants to get some money from AI companies and, at the same time, to recover the engagement and traffic that the sites have lost, which also translates into lost money.

There is also the question of how much money is in play, and for how long. In other words, are AI companies willing to pay only for a short period (during some preliminary experiment), or is there some long-term interest?

It is very likely that taking money for this experiment will be the equivalent of drilling into and destroying the foundations of your house because someone will pay next winter's heating bill. You may be warm for one season, but soon after, when the house collapses, you will no longer have a roof over your head.

How that can happen becomes clear once we analyze what the communities want and how they would be impacted by this experiment and a subsequent permanent implementation.

3. What SE communities want

Moderators

We absolutely don't need additional work, especially on Stack Overflow, where moderators are still clearing up pending AI flags. The AI answers experiment will certainly increase our workload, as more users will take it as permission to post AI answers of their own. No matter in how many places you write that AI answers are not allowed, once you post them yourself, people will start posting them too.

The argument that AI is forbidden because it is not vetted will not hold water, because plenty of AI posters claim they have vetted their answers (even when this is blatantly false). We will also have a hard time explaining why the community can validate AI answers posted by a bot but is not able or willing to verify those posted by users.

The scale argument will also not mean much: if posted on unanswered questions, there will be significantly more AI bot answers than AI answers posted by people. If you want to say that those will not do much harm because they won't be visible, then what is the purpose of having them if most people will not be able to see them (and get answers to their questions)?

Curators and Experts

Curators, like moderators, will have similar issues. While they will not be drowned in flags, they will have more posts to flag and spend their time on. Considering that most curators are also experts in their fields, that means they will spend less time posting valuable content.

Overall, moderators' and curators' stance against AI answers, regardless of their origin, is well known. Even announcing this experiment clearly shows that the company does not want input from us about AI features implemented on the sites. This feature goes against everything we have been doing in the last two years, since LLMs came into existence. It directly goes against our wishes.

Pushing this experiment can seriously backfire. People are tired of fighting the company to no avail. Time after time, the company has shown that it does not have the community and our interests at heart. You (the company) only want to go forward with your own vision, regardless of the impact it may have.

There will be no next strike. People will just leave for good. We have already lost moderators and valuable members of the community over previous issues, but nothing will compare to this. It will be the last drop (more like a flood) that makes them leave. It might not happen immediately, but with time, participation from the most active members will certainly die out.

Answerers

When it comes to people posting answers, there are very different kinds. For some, who are concerned only with their own reputation, this may not mean much. But those are not really concerned with quality, and they are also the ones most likely to post AI answers of their own.

However, the most valuable answerers are the ones who are not here just for the imaginary Internet points, but also for the teaching and learning experience; the ones who really have something to offer, either now or in the future when they become experts themselves.

But if the other experts and curators start leaving, they will no longer be around to provide guidance to the new answerers: users who are just starting out on the sites, who might not yet have enough knowledge to write the best answers, but who will be able to improve after receiving comments and suggestions.

Askers

While it seems that having answers posted on unanswered questions may be a good thing for askers who have received none, I doubt that this feature will be well received among them either.

Many people who come to SE sites have already tried to solve their problem with AI, failed, and turned to the experts. They don't want another AI answer that does not solve anything. You can find many comments from low-reputation users who are just starting out on the sites, angry when someone posts some AI nonsense answer on their question, because they have already tried AI and it hasn't helped.

Also, receiving a possibly incorrect AI answer after a few days is too little, too late. If they wanted to use AI, they would have already used it directly, and with much greater success, as they can interact with it and refine its responses.

If the question is on topic but a hard one, AI will probably not be of much help. If the question is off topic or low quality, an AI answer will just bring more attention to it, leading to closure and downvotes, and there will certainly be users who notice that correlation and are less than happy about it.

The chance that someone will get genuine help from such an AI answer is minimal, and it will not increase participation.

The experiment

Now let's focus on the experiment itself and potential problems.

Numbers

As previously mentioned, except for Stack Overflow, the sites have very low traffic. Taking into account that answers will only be posted on unanswered questions, the potential number of AI answers will be low enough that it will be hard to get any relevant results out of this experiment.

My further focus is on Stack Overflow, as it is the only site with significant traffic.

Unless there is an AI answer queue, visibility of those answers will be low. Most of those questions will be unanswered because they have flown under the radar or are not worth answering. I doubt you will get more than a single piece of feedback, from the OP, on most of them.

This will not be enough feedback, either for the AI companies or for bringing the answer into public view, where it might one day be helpful to some passer-by.

Overall, the gains of this experiment are extremely low.

And when it comes to the sites with low participation, I cannot see how this would work at all. There are not enough people there in the first place, let alone people who would review AI answers. I don't see how this can help bring those sites to life, especially when you consider that some users will inevitably leave as a result of this experiment.

Another important thing here is that the long-term goal is presumably adding AI answers across sites, as I don't see the justification for doing this "just as an experiment" that will be scrapped if it fails. With this there is also the question of interpreting the results and bending them as much as possible to show that the experiment is successful. I don't think the results would be intentionally changed, but rather that not all relevant factors would be taken into account.

Answer visibility

Using a very minimal 15 (50) reputation as the baseline for users who can give feedback is one such thing that can have a negative impact on the experiment. The majority of users with such low reputation are nowhere near qualified to judge and validate AI-generated answers.

Since low-reputation users are generally more willing to upvote just about anything, and there are far more of them than qualified experts, this alone can significantly alter the results of the experiment.

Even if the reputation requirement is increased, there are other issues. To get relevant results, this experiment would need to be a long-running one. Sometimes it takes months or years for answers to accumulate votes. This will only get worse for AI answers posted on unanswered questions: the chance that they will be seen at all is extremely low (unless there is a queue).

If you want to get more accurate results, this "experiment" would have to run for so long that it could hardly be called an experiment any more.

AI answer queue

To solve some of the issues with the longevity of the experiment and the number of reviewers, you may want to introduce an AI answer queue. However, this has issues of its own.

First, on Stack Overflow, we don't need more review queues. We already have problems handling the existing ones. Any new queue that takes attention away from the core queues needed to maintain the site's health will not be well received.

Second, even if you attract users to review, they will most likely fall into two categories: robo-reviewers who will accept just about anything, and seriously annoyed curators who will decline just about anything. Both will ruin the results.

I suspect that the number of users who are genuinely willing to review AI answers and can actually do so correctly is extremely low; I would be tempted to say almost non-existent.

Correctness of the answers

You have to start with the premise that AI answers will be incorrect, and you have to take into account the damage they can do if posted publicly on the sites. Even if they are visible only to reviewers, that doesn't mean they cannot cause harm.

Human answers can also be wrong, but those commonly show other signs by which you can more easily detect a possibly bad or lower-quality answer that requires more caution.

AI answers sound very convincing and plausible, and for some of them it may be hard even for experts to determine whether they contain incorrect information.

There will be differences between sites in what kind of damage incorrect answers can cause. For instance, an incorrect answer on Role-playing Games or Worldbuilding will not cause much harm, while incorrect and misleading answers on technical and science-based sites can cause plenty. Even sites that deal with social sciences, religion, and history would want correct facts instead of potential AI hallucinations.

The ability to verify facts, and how much time is needed to do so, can also vary significantly. If correctness is hard to verify, you may get skewed results through feedback, because people might mark answers as correct simply because they sound plausible, without any means to truly verify them.

Correctness may seem easier to verify on some technical sites; for instance, you can more easily check whether a piece of code solves the problem or works as expected. But even then, the explanation of its inner workings can be wrong, and code that "works" may have other serious flaws.

For instance, if someone posts code susceptible to SQL injection and the AI corrects the non-working part but does nothing about the security flaw, such an answer can easily be marked as correct while still suffering from critical flaws. And some flaws may not be so obvious.
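
To make that concrete, here is a minimal, hypothetical sketch (the table, data, and function names are made up for illustration): the "AI-fixed" version runs and returns a result, so a reviewer doing a quick test could mark the answer as correct, yet it still builds the query by string interpolation and remains injectable.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 'admin'), ('bob', 'user')")

    def find_user_ai_fixed(username):
        # "Works" for normal input, so it looks correct in a quick test...
        query = "SELECT * FROM users WHERE name = '%s'" % username
        return conn.execute(query).fetchall()

    def find_user_safe(username):
        # ...but the real fix is to let the driver handle the value.
        return conn.execute(
            "SELECT * FROM users WHERE name = ?", (username,)
        ).fetchall()

    print(find_user_ai_fixed("alice"))        # [('alice', 'admin')] -- looks fine
    print(find_user_ai_fixed("' OR '1'='1"))  # returns every row: still injectable
    print(find_user_safe("' OR '1'='1"))      # [] -- the parameterized query is safe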

Except in the most trivial cases, where AI might accidentally get the correct answer on the first try, AI answers tend to be mostly incorrect. It takes several iterations between a person and the AI to get something usable and correct (if correctness can even be easily verified).

This can be seen in AI answers posted on Stack Overflow, even by high-reputation users, which often require several revisions before they are correct. And this is also the experience of users already using AI off-site: AI is mostly useful when they are stuck and need some idea of direction and hints, but the raw AI output will often not be usable as such.

Editing AI answers

Just as you will have a problem getting enough users to validate AI answers, editing will also be a problem. Most of the time, you will need experts to edit such answers, and I don't think many of them will be willing to edit an AI answer instead of posting their own.

Human participation

Even if we ignore the issue of users who might leave or decrease their participation as a form of protest, there may be an additional decrease in participation, as posting an answer after an AI-posted one may look like some sort of plagiarism.

Currently, I can see some reluctance from people (and I have been in such a situation myself) to post an answer to a question that has previously received an AI answer, as parts of their answer may look as though they were inspired by the AI one.

An additional concern is that seeing AI answers posted on the site may encourage other users to post AI answers themselves, which is not allowed, and this will only create more work for moderators and curators.

Impact on curation

Since plenty of the questions that might receive an AI answer will be off topic or low quality, there is a question of what impact AI-posted answers will have on curation, and more specifically on the roomba. Will having those answers prevent automatic deletion of such questions, and under which circumstances?

Conclusion

Overall, the potential gains from this experiment are so low that I don't see how proceeding further is justified. You have nothing or very little to gain, and everything to lose.

You will alienate the core of your communities (moderators, curators, experts) even further, while at the same time adding no new value for the people who need solutions.

I can only see further decline of participation and traffic as the result.

8
  • "are AI companies willing to pay" - I imagine SE is paying an AI company (possibly having some in-house staff customising established tools), not the other way around, for increase volume. Or there's a partnership with some future profit goals in mind. I can't imagine AI companies would be this desperate for training data, when there's so much data available elsewhere (even if plenty of that is morally and/or legally questionable). Commented Feb 5 at 15:55
  • 2
    @NotThatGuy This wouldn't be training data, but more verification of how good their AI is when applied. Commented Feb 5 at 16:52
  • A lot of this seems reasonable. The claim that Stack Overflow is the only site with major traffic is not obvious. Is there data supporting this? Commented Feb 5 at 17:48
  • 4
    @JoshuaZ stackexchange.com/sites#questionsperday eyeballing, easily more questions per day than all the other sites combined. Commented Feb 5 at 18:04
  • 6
    @JoshuaZ For historical records you can find some information here meta.stackoverflow.com/q/333743 Stack Overflow used to be a lot busier place with over 7000 daily questions asked. Commented Feb 5 at 18:32
  • Yeah, ok that's pretty compelling evidence that it is much larger than any other site in terms of activity. Commented Feb 5 at 18:55
  • 4
    It feels this reply has more insight and thought in it than the AI experiment. Commented Feb 6 at 10:05
  • 1
    "I can only see further decline of participation and traffic as the result." To be honest, I see that also without this experiment. Commented Feb 7 at 15:29
65

I will not repeat all the excellent responses you've gotten to this post. I will only give you my opinion as an "expert in my field" who has been active on SE for about a decade.

Why on God's Green Earth do you think I might want to voluntarily fact-check an AI generated answer? If the question interests me (and isn't asked with a gimme attitude), I'm happy to answer myself, mostly because I like to help people, and it keeps me involved in the field post-retirement in a way that I alone get to determine.

I'm completely uninterested in helping LLMs to learn to give better answers. If experts are what is needed to check the answers, I imagine many share my feelings.

What a disappointment this is. Again.

63

Vote To Close: How to Master the Art of Coffee Brewing: Techniques and Tips?

Staff finds this part unnecessarily rude and thus blanked it out. The gist is: The question is not a suitable or representative testcase for StackExchange questions and belies a fundamental misunderstanding of what questions should be asked on StackExchange.

Flag as Not an Answer: AI answer

Credit to the AI for giving a terse answer. However, it decided to answer a different question about how to operate a drip coffee machine, as opposed to explaining the differences between drip, french press, espresso, etc.


It is wild to me not only that you wouldn't just demonstrate the output on a real question from one of the participating sites, but particularly that this would be your example, given how obviously flawed both the answer and question are.

(Finally, please make the transcript of the generated answer available to users with a screenreader. See: please do not upload images of code/data/errors.)

11
  • 7
    This is an excellent point. The question is too vague and opinion-based (e.g. what is the "best" brewing method?) If I tell you that I know a great way to brew coffee that takes two days, is that the "best"? The question is unclear. Commented Feb 5 at 1:31
  • 24
    And the answer...goodness gracious, the answer. It does not address the question, because it does not explain why or if that brewing method is the best. It does not explain how this affects the flavor compared to other methods. It does not explain the nuances between methods. It does not explain whether or if it is a "lesser-known technique" (it isn't). Its advice is generic at best (use cold water, turn the coffee maker on, and add whatever you want to it???) It says you should use a paper filter and that some coffee makers come with a reusable one. Should I take it out or not use it? Etc. Commented Feb 5 at 1:34
  • 19
    And most important, it gives wrong advice. Six fluid ounces of water and two tablespoons of grounds will not give you a cup of coffee as suggested, because, losses of water in the process aside, most of the time, you will not be consuming the grounds and will get 3/4 cups at best. Even worse: 3/4 cups = 12 tablespoons, so even if you drink the coffee with the grounds, you are still at best two tablespoons short of a cup. Commented Feb 5 at 1:42
  • 8
    The bottom line is: how can we have artificial intelligence if we don't have human intelligence first? Any somewhat experienced user of the SE network can immediately tell that this was a question which is much too broad and essentially asks for an answer the size of a book. It should be closed, not answered - not by humans, not by AI. If the humans don't understand this first then AI won't remedy the situation, but will only turn the already low quality post super-low quality. Commented Feb 5 at 8:57
  • 6
    Perhaps we can start small and first train an AI classifier that automatically pushes a question like this into the VTC queue as "unclear/too broad". Commented Feb 5 at 9:42
  • 1
    You know what the best part of all this is? We always claimed that "we should not use flags to point out wrong answers" and "mods can't evaluate the correctness of an answer", so if said answer about a completely unrelated question was posted by a human you couldn't even flag it as NAN in the first place. So, yes, while your claim is perfectly right... this is just pointing out that we never removed users' answers that were hallucinating different questions from the one posted and instead just resorted to downvoting them... Go figure HOW TOTALLY FINE adding AI ones will be Commented Feb 5 at 14:45
  • 2
    @Adamant Actually it's not wrong. A "cup" of coffee is typically defined as five or six fluid ounces (not eight fluid ounces, one "cup" in volume). mrcoffee.com/coffee-makers/5-cup-coffee-makers/… Commented Feb 6 at 2:50
  • @nobody - Different coffee machines use different definitions. Even the link you provided mixes different sizes, saying "It makes up to 25 oz. of coffee, perfect for two 12 oz. cups" (12.5, but who is counting?). Despite the name, it actually seems to be assuming a 12 fluid ounce cup of coffee (standard for a larger latte, not unusual for the kind of mugs people often use in their house). In any case, coffee "cup" sizes vary substantially enough that I am willing to characterize as wrong an answer that uses "cup" to mean six fluid ounces... Commented Feb 6 at 3:09
  • ...without explanation, particularly in the context of more precise "tablespoons" and "ounces." But of course, a "cup" can theoretically be any size. Commented Feb 6 at 3:10
  • 3
    @Adamant Completely correct. I am not a coffee fanatic, as can be seen by the 16 Nespresso capsule boxes in my coffee cupboard. But EVEN I can see that this AI generated slop is downright horrible. It's just a generic "how do I brew coffee" explanation that just is completely unsuitable for the platform. It's like if I asked Gaming "what's the best way to beat Alatreon" and someone would give a generic RPG adventure about how I should upgrade my gear and get a range of consumables and choose the best weapon without actually mentioning how Toppling the boss reduces the damage of his big attack. Commented Feb 6 at 7:09
  • 1
    "It is wild to me not only that you wouldn't just demonstrate the output on a real question from one of the participating sites, but particularly that this would be your example, given how obviously flawed both the answer and question are." - obvious to people who don't want generative AI on the site. Not obvious to those who do, apparently. Commented Feb 8 at 13:54
59

After this "experiment" (and let's be honest it's not, it's a feature you are going to impose regardless of whether we like it or not) is introduced, the last piece of value that the Stack Exchange network had over AI slop will be gone. This is the killing cut to the goose that lays the golden eggs.

Farewell my fellow curators, my fellow pursuers of knowledge, my fellow librarians. We had something unique and good, and like so many good things it was destroyed by incompetence, greed, and refusal to listen to the people who actually understood and cared about it.

4
  • 6
    AI response: Farewell, and thanks for all the fish! gasp (j/k - no AI used to generate this comment) Commented Feb 5 at 15:15
  • Another site will replace SE's AI-animated corpse. Let's meet there. Or create it. Commented Feb 7 at 11:35
  • I agree, we need more training data to train better AI, for this we need humans, not AI! Commented Feb 10 at 20:52
  • Codidact exists. All it lacks is (a large number of) users. Commented Feb 11 at 8:50
59

So, if I may summarize this wall-of-evasive-text more succinctly:

As an "experiment" you're going to allow a few "volunteer" SE sites be flooded with the garbage output of a glorified autocomplete, in the hopes that the network's volunteers will curate the mess for you for free, after which your still-unnamed-because-privacy-is-important-for-people-who-give-us-money business partners hoover up the curated data to refine their model.

I knew you were planning to screw us all over but you could at least have offered to buy us a drink first.

3
  • 18
    garbage output of a glorified autocomplete Most accurate description of AI I’ve seen to date, bravo Commented Feb 5 at 19:30
  • 1
    Latest edit is a nicer sentiment. Thank you for doing that. Commented Feb 9 at 17:32
  • 3
    @Spevacus Rest assured that there is nothing "nice" about my sentiments regarding this attempt to push AI slop on us whether we want to or not. Commented Feb 10 at 13:02
52

What about attribution/citation/sourcing on the answers that are suggested?

Right now it’s not included since the GenAI output does not consistently provide that. We’ve stated that attribution is non-negotiable and that goes both ways. We are determining what sourcing data will be delivered from LLM providers along with the private AI-suggested answers. For the purposes of this limited experiment, we feel it’s still worthwhile to test out the concept and user interactions.

Sorry, but that is extremely difficult to understand. If you must test a concept, you could do so in private, not in public. If attribution is truly non-negotiable, you cannot make exceptions just because the output does not provide it; you must instead make the output provide it. If you don't, it's very much negotiable (obviously). Are you trying to cheat yourselves here?

I think that publishing AI-generated content without attribution on this platform is a breach of the CC BY license, which requires attribution. All the creative people who provided content to the network are deprived of their reward, which is to be attributed. If I were one of them, I would feel very unhappy about it.

Provide attribution or wait until you can.

A little reminder of your words from February 2024:

All products based on models that consume public Stack Overflow data are required to provide attribution back to the highest relevance posts that influenced the summary given by the model. With the lack of trust being felt in AI-generated content, it is critical to give credit to the author/subject matter expert and the larger community who created and curated the content being shared by an LLM. This also ensures LLMs use the most relevant and up-to-date information and content, ultimately presenting the Rosetta Stone needed by a model to build trust in sources and resulting decisions. Sharing credit ties to attribution and with attribution—trust is at the heart.

You see the irony?


P.S.: What license are the AI-generated answers published under? If they are to be edited, what is the license of the human-edited AI-generated answers? I think there might be some legal problems coming which have not been thought through well enough.


P.P.S.: Now, if you had proposed a Duplicate Finder bot that automatically comments about possible duplicates, I might actually have had some sympathy. But unfortunately, the Answer Bot will do the exact opposite: it will happily answer every question even if it has been asked before, especially if it has been asked before.

6
  • 23
    Attribution is non-negotiable tomorrow. Commented Feb 4 at 22:24
  • 14
    Either the models need to provide attribution, or StackExchange needs to ignore its "non-negotiable" requirement. The models are fundamentally, at a basic level, incapable of providing attribution. Therefore there's only one possible outcome. Commented Feb 4 at 23:02
  • 6
    Well, if they just unilaterally decree it, it’s not a negotiation. 🫣 Commented Feb 5 at 6:03
  • 1
    @Draconis I agree with you that we are far away from attribution in AI currently, although I don't think it's completely impossible, just very difficult. But on the other hand, it's fun to show them their old words. They made a strong impression last year with the non-negotiability. The fall this year is all the bigger for it. Commented Feb 5 at 7:51
  • 5
    It's simple really. "Attribution is non-negotiable" means "you cannot negotiate to have attribution" Commented Feb 5 at 14:09
  • @NoDataDumpNoContribution It's not that it's impossible, although I do suspect it'd be extremely difficult at best, it's that the AI techbros really, really, really don't want to do it because if it turns out they can it means they could also do it with AI generated "art", "music" and "stories", which in turn would mean that they can know how much they owe to a given artist or author for the use of their IP, and they really don't want people to believe they can do that. Commented Feb 9 at 15:19
51

At a minimum, the reputation requirement should either be higher, or the association bonus shouldn't be factored into it. As it currently stands, a user can join a site with this feature enabled and mark suggestions as correct even if they have limited interaction with that site.

For example, I can vote on suggestions on Raspberry Pi even though I have yet to do anything on the site other than join and talk in chat. This seems wrong, as I don't have the knowledge needed to vote on these AI answers.

Update 2025-02-06: It now appears that I am no longer able to see these answers on Raspberry Pi with just 101 rep from the association bonus. The main post also has been updated to reflect 150 rep as the new minimum.

16
  • Basically what I already said in my answer Commented Feb 4 at 20:34
  • 2
    @Starship I wanted to be more explicit that the association bonus was the issue rather than the rep total being low. Even if it is minor, I do think there is a difference between having 51 rep from posting a question/answer and getting upvotes and having 101 rep because you signed up and have 200+ rep on another site. Commented Feb 4 at 20:53
  • 2
    @JoeW Also see my response above. Commented Feb 4 at 22:19
  • @Berthold That explains why I don’t see it on the UX site. Commented Feb 4 at 22:22
  • 3
    "As it currently stands, a user can join a site with this feature enabled and mark suggestions as correct even if they have limited interaction with that site." -> True, but ever since whenever they have been allowed to do exactly the same thing to up/down vote answers and questions (which has much more significance, since approving an AI answer as "correct" simply means it can now be up or down voted publicly). At this point in history, I do not think the association bonus (however right or wrong headed) has proven to be a serious cross-site problem, so why would it suddenly be so? Commented Feb 4 at 22:28
  • 6
    @goldilocks I disagree that voting on an answer has more significance than approving an AI answer. I think that approving the AI answer has more significance and impact. People are always going to upvote bad answers, but it is more important to stop them from getting posted in the first place Commented Feb 4 at 22:44
  • 2
    "People are always going to upvote bad answers but it is more important to stop them from getting posted in the first place" -> Setting aside the AI issue how is it we are to do that? Make everyone have to have their answer similarly reviewed before they are seen publicly? I think we should stick with what we have, whereby if you think you have an answer you can post it, and other users will up/down vote that to indicate what they think of it. There's nothing here which changes that. The difference is that some AI answers won't even get the chance if they are pre-emptively rejected. Commented Feb 4 at 22:55
  • 1
    "I can vote on suggestions on Raspberry PI even though I have yet to do anything on the site other than join and talk in chat" -> As you are I am sure already 100% fully aware, this is just not true. You can head over to Rpi SE right now and, since you will get the 100 point association bonus, start up or down voting questions and answers and anything else with a plus or minus icon on it willy nilly -- you don't even have to bother reading them if that is what you want to do. So again, there is nothing new here in that sense. Commented Feb 4 at 23:00
  • 4
    @goldilocks Again, I consider being able to approve bad content on the site much worse than being able to vote on bad content on the site. I don't see any issue with a user that just has the association bonus up/down voting on answers, but I do see a problem with them approving AI answers. Someone with thousands of rep from a couple of good questions but bad knowledge can vote on all the answers they want as well. Rep will never prevent all actions, but for approving AI answers it should be higher. Commented Feb 4 at 23:02
  • 2
    So it is fine if I head over to whatever.SE and just randomly up and down vote things, but if I take the time to evaluate a non-public AI answer to decide if it is or is not acceptable, then that is going to ruin everything? Commented Feb 4 at 23:07
  • 4
    @goldilocks What is your problem here? This answer is addressing the new AI answer feature and not anything else that you can currently do based on your reputation. I am simply stating that not accounting for the association bonus in this feature is something that needs to be addressed. I am not making any comments on the ability to vote on answers which is something low rep users need to do in order to vote on answers to their questions. You don't need to discuss your opinions on reputation requirements for voting on this answer. Commented Feb 4 at 23:09
  • 3
    @goldilocks currently, the attribution bonus isn't enough to cast downvotes. fyi. Unless rasberry pi has a specific lower threshold? doesn't seem to... so the avg user couldn't indicate a post isn't useful anyway. Commented Feb 4 at 23:32
  • @KevinB Association bonus, not attribution bonus. Commented Feb 5 at 14:06
  • @goldilocks While I generally agree that the network has quite the tendency to assume no one will vote on whim but then claim they are enlisted to whenever someone post about uncommented votes... at least downvotes have a price on the users (for answers at least) Commented Feb 5 at 15:45
  • 5
    I would rather we not get lost in this kind of details. The entire genAI proposal is sufficiently stupid beyond any possible salvage point. Commented Feb 6 at 5:42
48

Opt out

Please make a toggle so that users do not see these at all. I do not want to review them. I do not want to see them on my questions. I do not want to downvote them when they are invariably wrong.

8
  • 3
    You may want to make it clear regarding whether your opt out request is for when the AI generated answers are still in the review process and/or for the ones after they have already passed (i.e., are then shown to everyone). Perhaps there should be 2 separate opt out options available, for both of these possibilities. Commented Feb 4 at 20:29
  • 12
    @JohnOmielan "do not see these at all"; I think that's clear enough. They do not wish to see and interact with these during the review process, when they are deleted (this is a mod-only thing anyway), or when they are approved and posted. Commented Feb 5 at 1:02
  • @M-- I'm not interested in getting into a long discussion, but note there are the 2 aspects of scope (i.e., private and public answers) and range (e.g., in their own questions, all questions, etc.). The "at all" does not necessarily refer to both aspects. When I first read the answer, it wasn't clear to me (and, I suspect it might also not be for others) that this was referring to the scope as well (although the "... not want to review them" implies the scope is at least the private review queue). This potential confusion can be quite easily rectified with a few added or changed words. Commented Feb 5 at 2:29
  • 6
    @JohnOmielan I will keep this short. Not wanting to review refers to the private part, and not wanting to downvote them refers to when they become public. These combined with "at all" comes to what I said in my first comment. Anyway, it's fairly obvious to me what the answer entails, and you've asked your question. Let's wait for the OP to clarify for you. Commented Feb 5 at 2:52
  • 4
    @M Yeah, I don't wanna see anything. If somebody wants to be a moderate about this, there can be two toggles, whatever, but I want no AI slop, ever Commented Feb 5 at 4:37
  • @Kaia as these answers are written by a system user, it should be simple enough to remove them with a user script or ublock origin filter. An opt out would also be nice, but tbh I'd rather have a (default-disabled) opt in. Commented Feb 5 at 9:30
  • 1
    @l4mpi What about regular users who don't have some fancy userscript at the ready Commented Feb 5 at 14:07
  • 1
    @Starship I'm not arguing against a toggle. If you read the rest of my comment, I'm saying I want the toggle in the "don't show AI trash" position by default. But if SE does not deliver (and I doubt they will) then it should be easily blockable, and you probably don't need anything more fancy than ublock origin. Now you'll ask, "what about people who browse without an ad blocker" - well, these users have either already gone blind from the crappy ads all over the net, or have learned to ignore random ad trash on websites and should be able to easily apply this skill to the SE AI trash. Commented Feb 5 at 16:11
42

Approval of such AI-generated answers should be limited, at minimum, to users with a silver badge in the question's main language tag. If it seems difficult to code for that, let this be your signal to implement parent language tags as a system feature (something we've been asking for for 5+ years, at least), then it will be trivial.

Otherwise it will just be the same level of garbage that generative AI tools today put out.

3
  • 3
    Tag/badge-specific visibility is something we discussed and the experiment is built to accommodate that in the future. Agreed that it's the best way to get something in front of users well-versed in a question's topic. The plug for parent tags is appreciated! Commented Feb 4 at 20:09
  • 5
    Having a badge is better than rep alone. It boils down to votes earned in a topic. But this requires relying on the question having the correct tags, enough people with badges in those tags (and wanting to review AI slop), and that having a tag equates to sufficient knowledge in the topic to carefully dig into the output to validate it. I'm not sure it's good enough. Commented Feb 4 at 20:14
  • 10
    This seems to be about SO. The few confused sites who apparently opted in for this ("Arts & Crafts, Raspberry Pi and User Experience") don't have enough activity to get a lot of users with relevant silver badges. Personally I would immediately stop using a site which has AI generated answers and there has to be other users like me, so these sites are up for a drop in active users caused by the experiment, meaning even less badges. That on top of the fact that the network only has some 20% remaining of the activity it had a couple of years ago. This might be the final nail in the coffin. Commented Feb 5 at 8:43
41

This honestly sounds like you want the community to help some undisclosed AI company refine their model, for free.

People don’t want to do this for money; why do you think they will do it for free?

3
  • 7
    They haven't even thought about rewarding the users who train this AI... they only think about their own pocketbook... imagine if a user who trains the AI by interacting with it got 20 rep points for it... I don't think so... Commented Feb 6 at 2:42
  • 2
    None of the many other controversies have had much influence on my participation. Not even the fact that they are selling the data to OpenAI matters so much to me: after all, OpenAI was already taking it without attribution and compensation in any case. But the day that AI-generated answers start showing up as an integral part of Stack Exchange sites that I use is the day that I stop using it. Regardless of quality, though right now they still leave much to be desired Commented Feb 6 at 5:11
  • 1
    Regardless of quality and regardless of which model or company they may use now or in the future. That is because this site is a social network, a place to exchange knowledge with people. If whatever AI they may use is not a person, it will detract from that purpose. If it is a person, then detracting from the site will be the least of the moral questions that SE and its partners should be considering. Either way, it's not something that I want this site to be a part of, and it is not something that I will be a part of. Commented Feb 6 at 5:12
37

What about attribution/citation/sourcing on the answers that are suggested?

Right now it’s not included since the GenAI output does not consistently provide that. We’ve stated that attribution is non-negotiable and that goes both ways. We are determining what sourcing data will be delivered from LLM providers along with the private AI-suggested answers. For the purposes of this limited experiment, we feel it’s still worthwhile to test out the concept and user interactions.

The training process used by LLMs based on the GPT architecture does not and cannot remember its information sources. It's operating on the level of tokens (words, word fragments, and special symbols like emojis), not on the level of concepts.

When a human reads, we attempt to build some internal model of the facts we are learning. We create mental structures that somehow mirror the concepts we're reading about, and how they all fit together. We also retain some information on our sources.

In contrast, a LLM assimilates its training data to create a weighted neural network which enables it to emit pseudorandom sequences of tokens that are statistically consistent with the current prompt and that training data. It doesn't remember its sources, and it doesn't explicitly operate on the level of concepts, it's just operating on tokens. All token pair transition probabilities get consumed by the training process, regardless of their source, and regardless of what concepts they happen to be associated with. An LLM can give the impression that it understands concepts, but that's just a side-effect of statistical correlations in token sequences.

Now some of the latest GPT-based LLMs can appear to give attributions for their utterances, but these are fakes. The GenAI system simply generates an utterance and then does a search on a relevant data pool (eg, the whole Internet, or the Stack Exchange network), looking for close matches to the utterance, and then claims that those matches are its sources.
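
To make the described mechanism concrete, here is a minimal sketch under the assumption that such "attribution" amounts to nothing more than a nearest-match search over a document pool after the text has already been generated. The pool, the generated text, and the use of difflib as a stand-in similarity measure are purely illustrative; this does not describe any particular vendor's pipeline.

    from difflib import SequenceMatcher

    # A made-up pool of documents to "attribute" against.
    document_pool = {
        "post/101": "Use a paper filter and six ounces of water per cup.",
        "post/102": "Espresso is brewed by forcing hot water through fine grounds.",
        "post/103": "A French press steeps coarse grounds for about four minutes.",
    }

    def post_hoc_attribution(generated_text, pool, top_k=2):
        # Score every document purely by surface similarity to the already
        # generated answer; nothing here traces which sources actually
        # influenced the model during training.
        scored = sorted(
            pool.items(),
            key=lambda item: SequenceMatcher(None, generated_text, item[1]).ratio(),
            reverse=True,
        )
        return [doc_id for doc_id, _ in scored[:top_k]]

    # The closest-matching post IDs are then presented as the "sources".
    print(post_hoc_attribution("Force hot water through finely ground coffee.", document_pool))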

So it's simply not possible for current GenAI models to provide genuine attribution for their utterances. We need a paradigm shift to a more advanced architecture for that.

That's a big problem for the Stack Exchange network, since we demand that answers provide legitimate attribution for all substantial claims, especially on scientific and technical sites. It is fundamental to academic integrity that you always cite your sources and references. If you attempt to just spout unsupported opinion you are considered to be a charlatan or a crank.

Don't get me wrong. Current GPT-based LLMs which have undergone extensive RLHF fine-tuning are very impressive, and can be very useful tools, if you're aware of their limitations, and you know what you're doing. But they are no substitute for a human expert who actually knows what they're talking about, and who can cite their sources.

9
  • 1
    "LLMs can appear to give attributions for their utterances, but these are fakes" - LLMs can also summarise text, so they could very well be summarising what's said on some page, rather than finding one that matches after the fact. AI agent architecture already allows for this. Although I wouldn't be able to say how any particular LLM generates any particular response (it's fair to say most of e.g. ChatGPT isn't doing this). Commented Feb 5 at 3:20
  • 8
    @NotThatGuy When ChatGPT summarises, it actually does nothing of the kind - Gerben Wierda, May 27 2024. Commented Feb 5 at 3:29
  • 3
    A bit of a misleading title, that. The article says ChatGPT shortens the text instead of summarising it. That's different, sure, but it's far from "nothing of the kind". The article doesn't support your claim that its attributions are "fake" - closer to the opposite. Also, "an LLM is bad at this thing" is not much of a story. LLMs are/were bad at lots of things, but they keep being made better and better. It's questionable to call shortening based on length vs importance a "fundamental difference" - the mention of "understanding" sounds like an appeal to human exceptionalism. Commented Feb 5 at 3:46
  • 8
    @NotThatGuy My point is that a human student can maintain a chain of trust with the sources they study, building on a network of authoritative expertise. But current GenAI systems cannot maintain those chains. OK, a GenAI program may be able to use pattern matching to find sources that look relevant to some utterance it has synthesised, which is better than nothing, I guess. But it's not the same as carefully building on top of a foundation of trusted works. Commented Feb 5 at 4:12
  • If that's the closest match from the sources the AI was actually trained on, it seems good enough. The attribution would not need to be better than the output. Commented Feb 6 at 6:59
  • 2
    My cynical prediction is that SE staff will continue to assure us that attribution is "non-negotiable" and "very important" and "a high priority" and "something we really want to maintain" until they finally get around to admitting that their AI "partner" has no idea how to implement attribution and less than zero interest in bothering because giving credit to the people curating its dataset isn't going to make them money, at which point SE will quietly drop the requirement and tell us they really wanted to, but... and we just have to take it. Commented Feb 7 at 11:38
  • 1
    @Shadur-don't-feed-the-AI why even be cynical about it? It fits the SE playbook to a T. They start something, deliver a half-finished project, then abandon it. But since it's in production, it stays there. Remember the notifications rework, for example? The last major change they did was to break what wasn't working even further. So, the same thing will happen with the answer bot - they'll push it to production, with many reassurances that it's an initial version and they'll work on it. Then stop after fixing a few inconsequential issues. Commented Feb 7 at 11:41
  • 2
    The GenAI system simply generates an utterance and then does a search on a relevant data pool (eg, the whole Internet, or the Stack Exchange network), looking for close matches to the utterance, and then claims that those matches are its sources why does that remind me of grad students in a hurry to finish their paper draft? Commented Feb 11 at 8:56
  • @SmallSoft Well, I guess it's better than nothing. ;) It is good to cite trustworthy references that support your claims. But that's no substitute for saying "here are the previous works that were the actual foundations of my new work". Commented Feb 11 at 9:44
32

I fail to see the point of this.

If the answer is posted after being modified by users, then what value is the site integration, the whole feature that you are announcing here, actually adding? Right now, most sites here do not allow AI-generated answers, for good reason, but if they did, users are perfectly capable of copying the question into some chat interface, copying the answer that it returns, and adding their own modifications before publishing it here. Likely more capable, since they can add additional context to the question to try to get a better answer. If that is the case, then the whole team responsible for the integration might as well resign, because they will have spent a lot of work to produce something that adds no value.

If an answer created by some sort of generative model instead is posted without any modification, then what is the value of Stack Exchange as a company or a business? People are very much capable of copying and pasting their questions into ChatGPT, DeepSeek, or whatever the latest interface will be when you are reading this answer. As such, you would be doing nothing but returning the same result that a user would have gotten anyway with extra steps and a delay. If that is the case, then the entire company might as well disappear, because it would be providing the same service with more difficulty.

If, instead, your goal is to market upvotes and downvotes here as a cheap input to improve the quality of the models, I would suggest that you may overestimate the willingness of your users to do so, their skill at actually identifying the bad answers (bad generated answers tend to be written very professionally, which I have seen mislead voters), and the long-term interest of companies like OpenAI to contract out something that they already have integrated into their systems.

2
  • 7
    pretty sure your last point is the closest to the truth. It is pretty clear that the company behind Stack Exchange (that being Prosus) think of the network as a factory of curated training content to sell. Everything points to this. From ensuring that no one can use the data dump without paying them first to exclusive agreements with specific vendors, the goal is to have the users curate the content for them so it can be sold at higher price. After all, what would go for more $$$? Unreviewed quora questions or a curated SO answer some "user resource" wasted time curating? Commented Feb 5 at 14:34
  • 2
    The reason they feel it's okay for them to post "checked" AI generated answers while it is suspendable for users to post checked AI generated answers, is because they're trying to build an LLM specifically off the SE users. Using another AI for answers would introduce another AI's data into SE AI's, increasing the required space without adding value. That's probably also why they are concealing the source of the AI -- it's probably a partisan figure like Musk trying to absorb SE's quality into an AI he's building, and revealing his identity would discourage many quality users from editing. Commented Feb 7 at 6:33
30

This experiment is indicative of the profound misunderstanding about what LLMs do which infests the entire industry.

LLMs are designed to produce output that looks like English. The models do not "know" the content of the answer being produced. They are accessing and manipulating tokens that do not have any assigned semantic values. E.g. when LLMs are called on to "summarize" long papers, they merely shorten the papers. They can't summarize. They cannot pick out the key topics, follow arguments, or guarantee that the conclusion of the paper will be accurately represented in the output.

It astounds me that a site such as Stack Exchange, which encourages users to cite sources for their answers, would do an about-face and run experiments using tools which deliberately obscure the source of the (largely pirated) training set materials which are being blended and regurgitated for the end user. Do you want answers with real citations, or not?

"Yes, but you can challenge ChatGPT to write a citation" and what you do you think it does? It writes something that looks like a citation. It is extremely time-consuming to chase down (to choose an example relevant to GenealogySE) a citation for a newspaper article from a newspaper that never existed.

You don't seem to realize why people are willing to contribute to SE sites. Count me in among the many people who have already said they have no desire to clean up AI slop. If someone can't be bothered to write an answer, why should I or any of my community members waste valuable time to edit it?

The AI companies are taking advantage of end-users' human nature to believe that the underlying algorithm must "know what it is talking about" in order to produce the output it does. That doesn't mean we need to fall for the con.

Please don't waste resources on this rubbish.

29

Which Stack Exchange sites are participating in the experiment? Arts & Crafts, Raspberry Pi and User Experience (UX) are currently participating in the experiment. Web Apps was a participant in an earlier stage.

Why did Web Apps withdraw from the experiment? E.g., was it due to poor answer quality? I see Answer Bot received a few downvotes:

[Screenshot: Answer Bot's answers showing a few downvotes]


Update: the question has been posted on WebApps meta and one of the mods addressed it (thanks Berthold for the pointer to it!).

2
  • 9
    Answer Bot realized it was a "web application", experienced cognitive dissonance and an identity crisis... and then stopped producing useful content. Eventually it answered every question with: "All work and no play makes Jack a dull boy." It was the most peculiar thing... (Some or all of this claim may be rumor and/or complete fabrication. 😜) Commented Feb 6 at 4:29
  • 14
    One of the mods posted about this on their Meta. Commented Feb 6 at 17:40
28

What would need to be in place for you to feel comfortable seeing Answer Assistant implemented as a controlled experiment in your Stack Exchange community?

For me to be OK with this "answer assistant" experiment, it would need to actually be an assistant and not some repackaged LLM pretending like it knows things.

It wouldn't generate an answer; it would help me quickly write a good answer based on my expert knowledge. Some random ideas of how it could do that:

  • Help me find related questions on the network
  • Summarize all of the comments on the question and its answers
  • Help me find good supporting sources for my answer
  • Format the answer according to the community norms of the site
  • Suggest ways to make the phrasing of the answer clearer, expand thoughts into full sentences and otherwise wordsmith my content
  • Warn me that the question may be off-topic or a duplicate

A true answer assistant would encourage humans to write answers by making it easier to write them well. This AI generated answer experiment is pushing humans into the tedious work of reinforcement training an AI while taking away the more rewarding activity of helping people by answering their questions.

Unanswered questions have traditionally been a way for new users to start getting involved with the site. Having AI steal that opportunity seems counter to the goal "... to build and support a healthy ecosystem of active users and community contributors." Unless of course the community the company is trying to build is one willing to curate data for and reinforcement train an AI without compensation.

11
  • 1
    This makes sense. Use an LLM to assist with language-oriented tasks. There's still a possibility that the LLM changes the meaning of the text, and that could cause problems, especially for non-native speakers. But that's much better than allowing an LLM to generate answers that human experts then need to validate. Commented Feb 5 at 18:49
  • 6
    @PM2Ring I would be much happier helping a human fix up their answer if AI steered them wrong than reviewing a computer-generated answer. Commented Feb 5 at 18:54
  • 2
    Using AI to fix and improve, or even merely reword, human-written answers would also mean that we would lose the ability to moderate AI-generated answers. We cannot efficiently do that based on correctness. Commented Feb 5 at 22:27
  • 4
    @ResistanceIsFutile You already can’t distinguish answers written with the help of AI from completely human-written answers, and it is only going to get harder. I would rather have more people writing better answers than have mods hunting answers that are only bad because they have some taint of AI on them. The point is to create a library of knowledge, not to fight a losing battle against any adoption of AI. As with any tool there are bad uses, like letting it generate garbage answers, and good uses, like helping people engage with SE without the intimidating learning curve. Commented Feb 5 at 23:10
  • @ColleenV Maybe we cannot detect all of them, but we can still significantly reduce the flood of bad AI answers. And while we may lose the fight some day, there will always be plenty of AI answers that can be easily detected. I would rather that people focus on posting accurate answers than adding AI fluff on top. Commented Feb 6 at 7:10
  • 2
    @ResistanceIsFutile I’m not suggesting that AI add fluff. I’m suggesting something more along the lines of a smart linter/grammar checker under the guidance of a person who knows what the content of the answer should be. Having an AI tool available that is designed with site guidelines in mind would reduce AI slop. People are going to use AI; it’s far too useful to ignore. It’s foolish to try to ban anything that may have been touched by AI without taking into account how it was used in creating the content. Commented Feb 6 at 12:00
  • @ColleenV If I had a dime for every time someone who posted a literal copy-paste from AI used the excuse that they only used AI for grammar... This would not work. The obvious ones would not be that much of a problem, but those that are not so obvious often turn out to be AI garbage once you scratch the surface. Stack Overflow moderators would not be able to keep pace with that. And it would also be much harder for curators to flag in the first place. Commented Feb 6 at 12:10
  • 2
    @ResistanceIsFutile I feel like you’re missing my point. The tool would be created by SE for SE. Hell, it could even mark the post to indicate the tool was used. Changing the environment to make it easier for people to create answers that align with site guidelines reduces moderation workload. If the company is determined to inflict AI on us, it should be in a way that encourages human participation, not in a way that replaces it with AI slop. Commented Feb 6 at 12:30
  • I am not missing your point. I know that this would be an AI tool for SE. But answers edited with such a tool would still have common AI traits. Also, use of such a tool would not prevent users from pasting AI-generated content into it and then claiming the pasted content was written by them. So even if we know the SE AI tool was used, we cannot guarantee that the original content was not AI-generated. Commented Feb 6 at 12:40
  • No one is missing your point, it's just that your point is wrong. If they really intended for this "AI" to be something specific to SE to benefit SE and augment its capabilities they would already explicitly have said so. They haven't and they won't because that isn't the plan and it wouldn't make the AI company any money anyway. The plan is to get us to curate the LLM's output so that the company can then profit off the dataset we provide. Commented Feb 6 at 18:33
  • 1
    @Shadur-don't-feed-the-AI All the people missing my point keep insisting they aren't lol. Because I know what my point was, and your response is completely irrelevant to it, I have obviously done a poor job communicating it. But frankly, I don't care. I have spent all the energy and focus I'm willing to spend on this. Commented Feb 6 at 19:04
27

If an AI can answer a question, I can just ask an AI myself. If I come to a Stack Exchange site in 2025, chances are that an AI is not able to answer my question properly, or that I don't want to risk getting an answer that seems correct on the surface but contains factual errors (e.g., invented yet plausible Win32 APIs, as happened to me once). Having people "moderate" answers generated by AI, instead of empowering them to write content, makes the site more user-hostile in some ways and doesn't bring anything to end users.

So, in my opinion, this is not a feature that has a place here. It would just ruin the reputation of the sites.

26

Questions that meet the following criteria may receive an AI-generated answer:

  • Older than 72 hours, to leave time for human curation

  • Posted in 2024 or 2025

  • Net positive score (0+)

  • Unanswered, defined as having no upvoted or accepted answer
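For concreteness, here is a minimal sketch of how I read those eligibility criteria; the field names (`created_at`, `score`, `answers`, `accepted`) are my own assumptions for illustration, not the actual implementation:

```python
from datetime import datetime, timedelta, timezone

def eligible_for_ai_answer(question) -> bool:
    """Sketch of the stated criteria; not the real Stack Exchange code."""
    now = datetime.now(timezone.utc)
    return (
        # Older than 72 hours, to leave time for human curation
        now - question.created_at > timedelta(hours=72)
        # Posted in 2024 or 2025
        and question.created_at.year in (2024, 2025)
        # Net positive score (0+)
        and question.score >= 0
        # "Unanswered": no upvoted or accepted answer
        and not any(a.score > 0 or a.accepted for a in question.answers)
    )
```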

I can think of a few reasons a question might fit these criteria:

  • it is badly tagged and almost no human is seeing it
  • it is borderline in terms of quality / interest (not bad enough to be downvoted, but also not good enough to get upvotes)
  • it is (potentially very) difficult to answer

The Answer Assistant is not going to help with any of these.

  • low visibility of question: no human will see the answer either. It will never be curated, and never be posted.
  • low-ish quality: will likely lead to a low-ish quality answer. This will flood the site with mediocrity in the best case.
  • difficult to answer: if experts can't answer it, why would a language model suddenly be able to do so?

The Answer Assistant is not going to be useful for experts writing answers. An expert is not going to need AI assistance to write a good answer. It might lower the "barrier of entry" to answer a question due to saving some amount of time (reviewing a suggestion is probably faster than writing a full answer from scratch), but I feel like if a question is interesting/high-quality and unanswered then an expert is not going to need that kind of "push".

All I see this accomplishing is driving existing users away, and preventing more experts from joining, due to the network getting the reputation of being filled with AI slop.

5
  • 5
    One more possible reason for not having an answer: the question is an almost-duplicate of one or multiple existing questions, which have been debated extensively. As a human, digging up the duplicate(s) and possibly explaining why they are duplicates is very tedious. I could absolutely see an AI helping here. But judging from a few experiments I did (and the illustration here; seriously, couldn't SE find a better Lorem Ipsum?), it's nowhere near that quality. Sigh. Commented Feb 5 at 9:40
  • 1
    Out of curiosity I went to a few of the exchanges and checked out some of the generated answers. Most of the questions were vague and there were comments asking for clarification, yet the AI wrote multiple paragraphs with numbered lists and scripts dozens of lines long. Ignoring the fact that nobody is going to review this slop, is it now ethical to downvote unclear questions to give them a negative score and prevent the AI from wasting resources? We can always reverse our downvote if the OP edits their question, so it seems to be how the site is meant to work. Commented Feb 5 at 15:55
  • 6
    @GammaGames "Is it now ethical to downvote unclear questions?" It always has been. Unclear questions aren't useful. A negative score also contributes to the unclear question being automatically deleted. I regularly go through old questions that survived the roomba only because nobody downvoted a question that has images of code, no clear problem statement, is confusing and unanswerable, or several of these at once. And I know it's not lack of attention, because people left comments asking for details and improvements. But didn't downvote. Commented Feb 5 at 16:02
  • 1
    @VLAZ makes total sense, I guess people that care about the environment now have more incentive to downvote early! Commented Feb 5 at 16:04
  • This is almost exactly what I was going to answer. Now I don't have to. Especially on high-traffic sites such as SO, but I think also on most less-trafficked ones, questions that go unanswered are usually bad questions, even if they have a non-negative score. Occasionally they are very specialized or difficult ones. One would not expect an LLM to fare well with either of those, and some of the bad ones simply should not be addressed at all. I think the tagged-into-invisibility ones are rare, but of course this answer is quite right about LLM answers being moot for those. Commented Feb 7 at 19:28
26

LLMs are part of the world now,… that not only provide value but also… keep humans in the loop, and encourage human contributions….

What a detached and oddly dystopian phrase. It offers nonetheless an invaluable glimpse into the future and how the Stack Exchange company will view the flesh and blood members of its communities.

AI, the new pioneer, continually evolving and improving itself, now sits in the council of knowledge, whereas humans, who will increasingly depend on these sterile learning machines for their livelihood, are still, somehow, required to clean up the muck left behind by these compassionless and predictable so-called models of intelligence.

A private answer [i.e. generated by AI] becomes public if multiple users mark it as ‘correct’. A private answer moves to a deleted state, visible only to site mods, if multiple users mark it as ‘incorrect’
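Taken literally, the quoted rule is a tiny state machine. A minimal sketch, assuming a placeholder threshold of three votes per outcome (the real thresholds, and how "partially correct" votes are weighed, are not stated in the announcement):

```python
THRESHOLD = 3  # placeholder assumption; the actual number of required votes is not public

def next_state(correct_votes: int, incorrect_votes: int) -> str:
    """Sketch of the quoted verification flow, not the documented behaviour."""
    if correct_votes >= THRESHOLD:
        return "public"    # multiple users marked the private answer 'correct'
    if incorrect_votes >= THRESHOLD:
        return "deleted"   # multiple users marked it 'incorrect'; only site mods still see it
    return "private"       # awaiting further human verification
```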

Welcome to the upgraded "Let the humans eat cake" era.

25

We remain open to concluding the experiment early if we find the results unfavorable for any reason.

Is the second "we" meant to be the same as the first "we", which I believe refers to the company? If so, then to what extent, if any, would the community's reactions be considered if they find the results unfavorable, e.g., many (such as a majority) of the site mods are against it, a meta post indicates the strong dislike from the members of the site, etc.?

Also, should you stop an experiment on a site, what sort of cleanup, if any, would be available, e.g., delete all of the AI bot answers, just remove the ones that are currently being processed, etc.? In addition, how would the type(s) of cleanup be determined?


Note that Berthold's comment below states

The second "we" is communal, since feedback from the test site communities and the broader community is a big factor. ...

However, as requested in Starship's comment, the "communal" aspect should be more clearly specified, e.g.,

... Communal means what exactly? Mods agree? General users agree? How is this measured? Who gets to vote? What percent of a vote is enough to turn it off? How would one start just a request to remove this? Details please.

Finally, regarding Berthold's comment that feedback from the "broader community" is also a big factor, then with so much negative feedback (e.g., with this question currently having only 25 upvotes compared to 335 downvotes, as well as 43 answers which are almost all very much against this idea), I hope and trust this is sufficient feedback to show the "broader community" does NOT want this "answer assistant".

10
  • 4
    The second "we" is communal, since feedback from the test site communities and the broader community is a big factor. There is an "off switch" (so to speak) for the experiment which would remove all private answers. Public answers would remain, but could be cleaned up in a variety of ways, depending on the situation. Commented Feb 4 at 22:15
  • @Berthold Thank you for the clarifications. Please consider adding at least some of those details to your question. Commented Feb 4 at 22:24
  • 11
    @Berthold Please be specific. Communal means what exactly? Mods agree? General users agree? How is this measured? Who gets to vote? What percent of a vote is enough to turn it off? How would one start just a request to remove this? Details please. Commented Feb 4 at 23:23
  • 1
    @Berthold So am I to understand from your comment that if an AI generated answer without attribution gets through the review process it will be a permanent fixture on the site even though the experiment fails? Commented Feb 5 at 11:47
  • @Starship consider this. General users weren't even asked if they wanted to join the test; asking the mods was deemed good enough. Therefore, I assume that any request would have to come from the mods. Which is obviously a skewed process since those mods already demonstrated their oath to the Holy AI of Caerbannog by joining in the first place. Commented Feb 5 at 15:33
  • 1
    @ꓢPArcheon I'm all for calling out the company, but let's not pretend all or even most mods are pro-AI. Look at the strike letter for a long list of such mods. Commented Feb 5 at 15:37
  • @Starship Sorry, probably poor writing on my side, you got me wrong. I meant the mods on the sites that joined the test. I HOPE that to join the test half+1 the mods of the site had to agree. Can't edit it but read that line as "Which is obviously a skewed process since (hopefully) at least half+1 of said participating site mods already demonstrated their oath to the Holy AI of Caerbannog by agreeing to join in the first place." Commented Feb 5 at 15:38
  • @ꓢPArcheon Things are a bit more complicated webapps.meta.stackexchange.com/q/5281 Commented Feb 7 at 9:14
  • 1
    @ResistanceIsFutile saw that, and I also saw the company saying "This validates many of the assumptions that the company and the community had" there. Yet, while the site mods coming to the conclusion that the experiment didn't work apparently "validated" their assumptions... the experiment was still taken forward on other sites. [cont] Commented Feb 7 at 9:44
  • 1
    @ResistanceIsFutile [cont] I reserve my final thoughts on this for the end of the project, when we will finally see what "validated" assumption the company will follow: the one of the mods of webapps that stepped back or the one from the mods of other sites that even came here on meta quite... vocally defending the experiment? Will the webapps mod concerns really have a role into the decision making or will the second phase be deemed a success anyway and the bot pushed to all sites in the network? Will the results from other sites be used to justify pushing it to SO without testing? Commented Feb 7 at 9:47
25

an experiment in which AI-generated answers are verified, edited, and curated by the community before becoming publicly visible. We want to test if this feature could help improve the answer experience and encourage knowledge sharing by helping users get unstuck or get a jump-start on content curation while maintaining quality.

This adds no value. Let me add one more assenting voice: if I thought that "verifying, editing and curating" an AI-generated answer could produce something better or faster than what I do on my own, I could just use any existing generative AI (subject to site regulations). But I don't believe that anyway.

we want the Stack Exchange community to know that the team is: committed to the Stack Exchange network being a place for human-curated knowledge and information; committed to building solutions that add value for users on the platform; and not interested in any outcomes that might dilute the value of the platform.

The team's actions suggest otherwise. The community has repeatedly told you as clearly as possible that these kinds of generative AI integrations are not wanted: especially because attribution is non-negotiable, it doesn't add value to have automatically generated text that requires manual verification. It would take at least as long to ensure the answer meets requirements as to write a new answer from scratch - it's far easier to figure out attribution when you were the one who looked up the information in the first place.

If the goal is simply to inspire the community to collaborate on a canonical answer, that is what the "community wiki" feature is for. Everyone can mark their questions and answers to lower the bar required for editing and to disclaim personal interest in the content.

If the goal is to engage with the community and collaborate with human curators to find a legitimate use for generative AI, that would start with a) asking the community for suggestions instead of having a steady stream of proposals shut down; b) being willing to take no for an answer. Quite a few of us, myself included, can't see any legitimate use for any kind of generative AI integration on the site, full stop - and improvements to the quality of the output of such systems are entirely beside the point.

What about attribution/citation/sourcing on the answers that are suggested?

Right now it’s not included since the GenAI output does not consistently provide that. We’ve stated that attribution is non-negotiable and that goes both ways.

In other words: by your own admission, attribution must be included, and generative AI cannot be trusted to attribute its own content correctly. Therefore, this isn't just a question of editing the content; you would be imposing upon the community to come up with correct attribution for content that didn't originally come from those community members. This is what's so destructive about the idea.

No amount of improvement to generative AI systems could fix this fundamental problem. You wouldn't ask one human Stack Exchange user to attribute another's content, either. You would flag the unattributed content for plagiarism instead.

You might wonder why we’re moving forward with the experiment, in the face of concerns and sensitivity around AI-generated content across the network. Put simply – in this time of foundational change, we must prepare for many possible futures.

Avoiding corporate speak like "in this time of foundational change" (when were we ever not in such a time?) would go a long way towards improving the appearance of sincerity. But it wouldn't change my personal assessment that this is yet one more attempt to get a foot in the door with generative AI. How many more times must we slam that door in the company's face before the message is clear?

User expectations around seeking and contributing knowledge are rapidly shifting

No, they aren't. What is shifting is the expectation of ordinary people who have experienced these technologies personally. That has no bearing, and shall not have any bearing, on how Stack Exchange works. Phrasing like this is exactly why the team continues to lose the community's trust.

You know that our expectations are not shifting, because community reaction to any suggestion with even the faintest whiff of AI about it remains just as negative as ever. This post contains many paragraphs that seem aimed at addressing the community's concerns about AI, yet it still misses the point completely.

1
  • 3
    " if I thought that "verifying, editing and curating" an AI-generated answer could produce something better or faster than what I do on my own, I could just use any existing generative AI (subject to site regulations)" There is this one case of a high rep user on SO who did exactly this for 1000-2000 posts in 2023/24 if I remember correctly. Maybe this person thought he can actually produce something better or faster. And maybe with this feature he could do it even a bit faster (saves the copy and paste action, however ties you to a certain GenAI provider). Not a big advantage though. Commented Feb 8 at 18:36
24

I'm not going to repeat, in this answer, arguments other people have already made. Yes, this new feature seems more cumbersome than anything else, and it feels disrespectful towards the community to keep trying to push AI when you are aware of the negative reception of your previous attempts. Sadly, this post confirms that this is going to continue no matter what:

Committed to building solutions that add value for users on the platform. LLMs are part of the world now, and any potential integration must be explored responsibly, in ways that not only provide value (task completion, closing knowledge gaps, etc) but also create transparency, keep humans in the loop, and encourage human contributions.

Why must new ways to integrate AI be explored? I'm not necessarily against AI tools in general, and this is probably also the case for most users of this platform, but the Company seems to think that exploring AIs should be the top priority, and nobody understands why. Most users want this website to be a reliable way to obtain answers from trustworthy individuals, and they don't see why AIs should be used here.

Anyway, I wanted to talk about the three Stack Exchange sites that are a part of this experiment: Arts & Crafts, Raspberry Pi and User Experience. It seems to me that some of these websites would not benefit at all from using an AI as a default answer provider. For Raspberry Pi, this is at least somewhat reasonable: if a user asks how to do X with their Raspberry Pi, for example, an AI could provide a reasonable answer if the task is common enough (although in that case the question would probably already have been answered, as many people have pointed out). This is a bit blurrier for Arts & Crafts, because while the AI can answer basic questions about the crafts themselves, in most cases it cannot really guide you when you ask a more complex question. However, I don't understand how your AI is going to improve anything for User Experience. I'm not going to pretend that I'm any kind of expert on this topic, but most questions asked on this site are not "How can I ...?" questions, but rather "How should I ...?".

This distinction is quite important. When you ask an AI how to do something, it will generally give you a straight answer. However, if you instead ask whether you should do that thing, the AI will generally give a vague answer that only lists considerations to take into account before doing it. This is what I like to call a "shopping list" answer: the AI just lists a bunch of stuff in order to avoid answering the question definitively. Why does it do that? Because it would be a very bad idea if the AI started to form its own opinions, especially if it is hallucinating. I need a good example for this... I know, let's ask an AI about this topic:

Should Stack Exchange publish answers generated by AI when no user has published an answer and when these answers are double-checked by other users willingly?

Generated by GPT-4o mini

The decision for Stack Exchange to publish AI-generated answers, even when double-checked by users, involves several considerations:

1. Quality and Accuracy [...] 2. Community Standards [...] 3. Transparency [...] 4. Attribution and Licensing [...] 5. Encouraging Human Participation [...] 6. Use Cases [...].

In summary, while there could be potential benefits to publishing AI-generated answers under certain conditions, it would require careful consideration of quality, community standards, transparency, and the overall impact on user engagement.

As you can see, this is not really an answer to my question. After reading it, have I really obtained a definitive answer that satisfies me? And this is a good thing, because answering "Oh yeah, sure, this is a great idea" would have been the worst possible answer. Also note that this is the kind of answer that cannot be easily edited. How are you supposed to edit this so that it actually answers the question? You can't. So I hope some kind of system was implemented to avoid these pitfalls, but only time will tell, I guess. One of the many advantages this website has is that two users can post different and sometimes contradictory answers, and those answers can complement each other, because the reader can then choose between the solutions. This is not really the kind of thing AI does well. This is not to say that the Raspberry Pi site won't have these problems, quite the contrary actually, but it's obvious that this system is not effective at all for some topics.

The last problem I want to talk about is the fact that the AI will not be able to read or write comments. Because, of course, I want every site to be filled with bots 24/7. Comments are more important than you may think. If a question is a bit too broad, I can just ask the user to modify their question so that I can provide a better answer. I can do that; an AI can't, and will instead just answer the question no matter what. An AI also can't read comments, so it's impossible for the asker to request further clarification. I hope some users are ready to babysit the AI, because that is currently the only way to provide more information once an answer is posted. This last point can potentially be "fixed", but even then it would not be a great solution to the problem.

1
  • 2
    Thanks for the thoughtful answer. Yes, the answer bot is not an agent like a human user. Potentially one could try to build an algorithm that also comments or votes. But that would really be very much in the future. Commented Feb 6 at 22:53
24

Brilliant plan.

You haven't got enough people actively creating content, so you turn to AI-generated content, for which you also haven't got enough people to do the checking.

Flood the stack system with BS AI "answers" and flush the system down the toilet even faster.

I hope the owners are getting a good paycheck out of this because it sure isn't doing anything for people who need the stack system for information.

24

I've been going through the questions with LLM answers on Raspberry Pi. These tend to be troubleshooting questions, and the LLM fairly consistently takes the "shotgun" approach to answering them: throw out a whole bunch of suggestions, and hope that one of them hits the target.

Take the answer to this question: it's got five "steps" and two "tips". Of them, I know from reading the question that both "tips" and "step 5" are irrelevant, and I'm nearly sure that "step 1", "step 2", and "step 4" won't fix it. "Step 3" might work, but I don't have a Pi 5 with Ubuntu to test it on, and if it's wrong, it might open up a security hole.

Honestly, the whole answer reads like a condensed version of what happens when a troubleshooting question hits Hot Network Questions, and is unlikely to actually be of use to anybody.

1
  • 2
    Yes. I'm concerned about asking users to rate or edit how "correct" that sort of "shotgun blast" is. What is "correct" even supposed to mean? The one good thing I've seen come out of this so far is that it's taught me to recognize that posts like that, when posted by "real" users, are likely AI plagiarism. Commented Feb 7 at 19:16
23

AI answers should always look different

Before I saw the image captions, I thought the red-background answer was how AI answers would always look, and that felt like a good compromise.

Honestly, as a user, I don't hate the idea that alongside the human answers there is one, clearly distinguishable, AI generated answer. I kind of don't even mind if it isn't quality controlled first. It's another perspective, another option.

But having AI generated answers blending in amongst human answers...no thank you very much. "AnswerBot" is nowhere near enough of a signal.

1
  • 2
    With current planned changes, adding the keyword site:stackoverflow.com to my Google searches would stop being the silver bullet it is now to hide low-level web sites inspired by AI. As a result, I will stop visiting SE web sites altogether. Please modify your "experiment" so that a Google silver bullet would still exist in some form. Commented Feb 6 at 17:17
