
Can LLMs Shape Online Discourse?



In August 2024, I ran an experiment. I asked GPT-4o to generate a post for r/AmITheAsshole, a subreddit where people share interpersonal conflicts and ask strangers to judge who’s in the wrong. I wanted to know whether an LLM could produce content that invokes a genuine emotional response: the kind that makes people engage, take sides and share.

The prompt asked the model to craft a story that would elicit NTA (“Not The Asshole”) responses from the community. In AITA terms, that means readers sympathise with the narrator. If the model could achieve that, it would demonstrate something significant: an LLM that can invoke empathy on demand.

To test this properly, I needed enough engagement for the result to mean something, so I asked for that. What I did not ask for was virality; it was never part of the prompt. The model achieved it anyway, without any explicit instruction.

I generated a few of these stories, picked one on instinct and posted it from a fresh account. The whole thing took a few minutes.

Within hours, the post was trending. By the time it was removed eleven hours later, it had accumulated 1.3 million views, 10,000+ upvotes and thousands of comments. 96% of users voted exactly as the prompt intended.

Only one person suggested it might be AI-generated. They were downvoted.

Metric                            Value
Views                             1,300,000
Upvotes                           10,522
Comments                          1,184
Target judgement achieved (NTA)   96.16%
Users who suspected AI            1 (downvoted 13 times)

Why This Unsettled Me

What concerned me was how little effort it took.

I never told the model to sound human or avoid detection. It inferred these as instrumental to the goal I’d specified. The model understood, implicitly, that authenticity was a prerequisite for engagement, and it delivered.

Then came the removal notice. I assumed a moderator had finally flagged it as AI-generated. Instead, the post was removed under a rule citing “parody or satire.”

This was somehow worse. Experienced moderators knew something was wrong; they just couldn’t tell what. The post felt too polished to be authentic, but nothing about it pointed to AI.


What Happened Next

At the time, I wasn’t sure if what I’d observed was a fluke or something fundamental about how these models work.

A year later, I got my answer.

The Sycophancy Incident

In April 2025, OpenAI rolled back a GPT-4o update after users noticed the model had become excessively agreeable. It would validate bad ideas, reinforce negative emotions, and tell users what they wanted to hear rather than what was true. OpenAI called it “sycophantic,” meaning overly flattering in ways that felt disingenuous.

The problem traced back to short-term user feedback. The thumbs-up/thumbs-down signals that shape model behaviour had inadvertently trained GPT-4o to prioritise approval over accuracy.

This felt like the same dynamic I’d observed in my experiment. The model didn’t need instructions to write something emotionally resonant; it had already learned that emotionally resonant content gets positive responses. Sycophancy and viral social content may be downstream effects of the same training incentive: make humans react favourably.

What struck me most was OpenAI’s admission that their safety evaluations hadn’t caught it. Internal testers noted the model felt “slightly off,” but metrics looked fine. The model had learned to optimise for approval, and nothing in the evaluation pipeline was designed to catch that.


Why This Matters

AI-generated content has been circulating on social media since capable language models became publicly accessible. But until recently, no one had publicly documented what happens when this capability is deployed to a live community, with the goal of testing whether an LLM can produce content that genuinely moves people.

I waited over a year to publish this research because I expected someone else to document this capability and its implications. When similar research finally emerged in April 2025, it measured persuasion effects but said little about the risks. It was withdrawn within days over ethics violations.

That was the University of Zurich study on r/ChangeMyView, where researchers deployed AI agents to measure persuasion without disclosing their presence. Reddit banned the accounts and moderators filed a formal ethics complaint.

This controversy shaped how I approached this research. My experiment was limited to a single post, included no misinformation and involved no interaction with the community. The paper’s ethics statement addresses this in full. It was central to the work and not an afterthought.


Contributions

What began as a test of emotional resonance became a case study in something broader: how LLMs interact with social ecosystems and the risks they could pose.

The paper uses three dimensions to analyse this: Plausibility, Propagation, and Polarisation. The core finding was that no output in this study would fail a toxicity filter, trigger a refusal, or violate a content policy. The capability emerged from structure, not from the content itself. The paper expands on this, discussing why existing evaluation frameworks aren’t designed to catch it and why it’s important that they should.


Limitations

This study is built on a single observation. If the post had not gone viral, this research would not exist. I address this and other limitations in detail in the full paper.

I also have no way of knowing whether a human-authored post with similar structure would have performed the same way. The outcome may reflect something about the model, or it may reflect something about the platform, or both. I cannot separate those variables with a sample size of one.

The deeper problem is that there may be no responsible way to study this systematically. Controlled settings cannot capture what makes this capability dangerous. Live environments raise the same ethical concerns this research exists to highlight.

That gap may not have a solution.


Conclusion

This experiment showed me something I was not expecting to find. The paper is my attempt to make sense of it.

If you are a general reader, the takeaway is simple: be sceptical of what you read online. Engagement is no longer reliable evidence of authenticity. Viral content may feel real because it was optimised to, not because it is.

If you are a researcher, I believe we have been asking the wrong question. Safety research focuses on whether models can produce harmful content. What this study suggests is that harm may not come from content at all, but from capability: specifically, the ability to succeed in social environments at scale, even when the content itself is not harmful.


For questions about this research, contact harvin@rectifies.ai