Building an LLM Discord Bot to Impersonate My Friend

My friend group has a Discord server that has been active for nearly 10 years. A while ago I wrote a Discord bot for the server that stores and backfills all messages into a database. After hearing so much AI hype over the last year I finally caved and decided to run an experiment. Could I feed all that history into an LLM and get it to convincingly respond as one of my friends? So I tried it.

The Data

The Discord server’s message history is stored in a PostgreSQL database. Each message has an author, channel, content, and timestamp. Getting the data out is straightforward enough with a simple join to resolve usernames.

SELECT
    m.author_id,
    u.username,
    m.channel_name,
    m.content,
    m.timestamp
FROM messages m
LEFT JOIN users u
    ON m.author_id = u.discord_user_id
WHERE m.content IS NOT NULL
ORDER BY m.timestamp ASC

Before doing anything useful with the messages I had to clean them up. Discord messages are full of mention tags like <@123456789> and <#987654321> that are meaningless out of context, along with the associated empty messages. These we drop using a simple regex and pandas.

Embedding

A raw list of messages isn’t particularly useful on its own. To get the LLM to respond as a specific user, I needed to restructure the data into conversation threads where the target user’s messages are paired with the context that prompted them.

My solution was to segment the log into conversation threads using time gaps. If two consecutive messages in the same channel are more than 5 minutes apart, they’re treated as the start of a new conversation. Messages within that window are treated as a single coherent exchange. For every message friend sent, the preceding messages in that same session become the context.

SESSION_GAP = timedelta(minutes=5)

for channel, group in df.groupby("channel_name"):
    session = []
    last_time = None

    for i in range(len(group)):
        row = group.iloc[i]

        if row["timestamp"] - last_time > SESSION_GAP:
            session = []  # start fresh session

        session.append(row)
        last_time = row["timestamp"]

        if row["username"] != TARGET_USER:
            continue

        context = session[:-1]
        if len(context) < 2:
            continue

        examples.append({
            "context": "\n".join(f"{r['username']}: {r['content']}" for r in context),
            "response": row["content"],
            "channel": channel
        })

The context string is formatted as a simple chat log username: message per line so it’s human-readable and maps naturally to what the LLM sees in its prompt. Each example pairs that context block with the actual reply friend sent, which is what the model is trying to learn to reproduce.

The 5-minute gap was chosen somewhat arbitrarily. Too short and you miss multi-part messages, and too long and you start lumping unrelated conversations together.

A few years of server history produced about 4,000 usable examples after filtering out anything with fewer than 2 messages of context.

Retrieval-Augmented Generation (RAG)

With the example threads built, I needed a way to find the most relevant ones given a new query. This is the core of the RAG approach.

I used sentence-transformers/all-mpnet-base-v2 to embed all the context strings into 768-dimensional vectors. These vectors encode the semantic content of each conversation thread. Two conversations about the same topic with similar energy will land near each other in the embedding space.

from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

def retrieve(query, k=5):

    q_emb = model.encode([query], normalize_embeddings=True)

    sims = cosine_similarity(q_emb, embeddings)[0]

    idx = sims.argsort()[-k:][::-1]

    return [examples[i] for i in idx]

retrieve("CS?")

At query time, I embed the incoming conversation the same way and compute cosine similarity against all stored embeddings. Cosine similarity measures how closely two vectors point in the same direction, a score of 1 means identical, 0 means unrelated. Setting k=5 returns the five most semantically similar conversation threads.

{
    "context": "user_a: Any gamers today? \n user_b: Hell Divers ?\n user_c: CS ?\n ser_a: Im game for both",
    "response": "A return to CS?",
    "channel": "gaming_general"
}

Prompting

Using the retrieved results I can take the 5 most related threads, format those into a prompt, and send it to a local LLM via Ollama. I’m using qwen2.5:7b since it’s fast and small enough to run locally without any GPU.

def build_system_prompt(user_name):
    return f"""
You are {user_name}, a Discord user.

Write natural Discord messages.

Rules:
- Be concise
- Match tone and slang from examples
- Do not explain anything
- Do not mention AI
"""

The user prompt includes the retrieved examples and the new conversation to continue:

Conversation:
ruse: Anyone wanna play CS?
friend: yeah give me 15 mins
friend: also I am only playing comp no premier lol
...

Showing the model real examples of how he actually responds to similar situations does most of the heavy lifting. The LLM just needs to replicate the tone and style.

Results

I ran a few prompts against the bot and sampled the outputs a few times each to get a sense of the variance. Each call is independent since there is no session state, so the same prompt can produce very different responses depending on which examples get retrieved.

generate_response("Anyone wanna play CS?")

# → ayyo get in if u r down"
# → ayo lemme know 🖖ELYF vibes🙏"
# → ayooo, game up 🚀'

generate_response("How is the rank up gaming going?")

# → "pretty solid so far, just gotta keep playing and try not to overthink it o7"
# → "pretty solid been on a winning streak so far o7"
# → "Pretty good, just had a small dip but recovering fast"

generate_response("What is your name?")

# → "Sur福特CLUDING the emoji since it doesn't translate well here."

The emoji usage and the short clipped replies do feel like him in spots. But a lot of the outputs were just off. Incoherent responses, weird topic jumps, replies that didn’t really track with what was being asked. It reflects some of his character in certain moments but not enough to hold up across a real conversation.

Putting some of the generated prompts including the rag data into GPT-5.5 does yeild more convincing results, where the tone and use of the server specifci emojios are correct.

generate_response("How is the rank up gaming going?")

# → Won but it was way harder than it should’ve been <:emojiA:11111111111>

Thoughts

The original idea was to sneak this into the Discord server and run it as a prank, introducing friend as a bot without anyone noticing. That’s not happening. The outputs aren’t coherent enough to fool anyone who actually knows him for more than one message. It would get spotted immediately.

That said, it was a genuinely interesting experiment. Seeing the emoji patterns and certain phrases come through reminded me that some of his communication style is in the data.

Another day I’d like to try fine-tuning. A fine-tune on a small base model using his messages as training data would bake the style directly into the weights rather than relying on retrieval to pull the right vibes at inference time.

For now it lives as a notebook on my machine, maybe to be revisited another day.