Steering and Mimicry
I was interested in how to make LLMs write better. Since great artists steal, I planned to invoke better writing by stealing styles or personalities. This article is not intended to unravel the weave of style and personality; when I mention one or the other, I am gesturing at the same phenomenon. Ideally, we can steer an LLM to mimic it.
You can explore context, steering vectors, or sampling to determine what is optimal. There is an argument that sampling and other logit-level modifications would require something like power sampling (https://arxiv.org/abs/2510.14901) to counter entropy effects. I did not employ this here, as it would bias the tests in terms of compute. Instead, I implemented naive versions of each intervention without regard to entropy.
To be explicit, I explore and compare steering vectors, an n-gram logit bias, and ICL examples to determine which best improves the writing mimicry of LLMs. Future work in this area will look to improve the writing style of LLMs.
Setup
We begin with a classifier. I train a linear classifier head on top of the Potion-base-8M static embedding model. It shows strong discrimination between base-LLM and human-written text. I use a collection of Sam Altman essays for the experiments. The model correctly determines whether a sentence was written by Sam or by our base LLM 90.5% of the time. I classify sentence by sentence to increase the sample size for training.
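A minimal sketch of the sentence-level classifier. Random clusters stand in for the real sentence embeddings (the actual setup encodes each sentence with Potion-base-8M first), and the linear head is a plain logistic regression trained by gradient descent:

```python
import numpy as np

# Placeholder embeddings: two synthetic clusters standing in for
# Potion-base-8M encodings of Altman vs. base-LLM sentences.
rng = np.random.default_rng(0)
dim = 64
X = np.vstack([
    rng.normal(0.3, 1.0, size=(400, dim)),   # "human" sentences
    rng.normal(-0.3, 1.0, size=(400, dim)),  # "LLM" sentences
])
y = np.array([1] * 400 + [0] * 400)          # 1 = human, 0 = LLM

def sigmoid(z):
    # Clip to keep np.exp from overflowing on large margins.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

# Linear classifier head on frozen embeddings, trained with gradient descent.
w, b = np.zeros(dim), 0.0
for _ in range(500):
    p = sigmoid(X @ w + b)
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * float(np.mean(p - y))

preds = (sigmoid(X @ w + b) > 0.5).astype(int)
accuracy = float(np.mean(preds == y))
```

The real pipeline reports held-out accuracy (90.5%); here the synthetic clusters are cleanly separable, so training accuracy alone illustrates the head.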
I wrote and used a setup similar to repeng (https://github.com/vgel/repeng/?tab=readme-ov-file) to create the steering vectors. Repeng uses a single-component PCA; I also test a multi-component PCA. ICL examples are prompts containing sample essays. For the n-gram logit bias, I create a trigram model from a collection of target essays. At test time, I give the LLM an essay title and prompt it to generate an essay, producing base-model, rank 1 PCA, multi-rank PCA, n-gram biased, and ICL essays. I attempted to use both Qwen and Llama models, but Qwen resisted steering vectors (this is common in the literature). The reported values are from Llama 3.2 1B Instruct. I also add GPT-OSS 120B via together.ai as an example of increasing LLM size with ICL.
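The repeng-style vector extraction can be sketched with synthetic data. In the real setup, the paired activations come from the model's hidden states on contrastive prompts (target-style vs. neutral); here a known planted direction stands in so the recovery is visible. Repeng alternates the sign of each pair so the mean is centered out and PCA's first component captures the contrast direction:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 128

# A planted "style" direction we hope PCA recovers.
true_direction = rng.normal(size=d_model)
true_direction /= np.linalg.norm(true_direction)

# Synthetic hidden states: styled prompts shifted +direction, neutral -direction.
H_pos = rng.normal(scale=0.5, size=(200, d_model)) + true_direction
H_neg = rng.normal(scale=0.5, size=(200, d_model)) - true_direction

# Randomize the sign of each pair difference (as repeng does) so the
# centered PCA sees the contrast as variance, not as a mean offset.
signs = rng.choice([-1.0, 1.0], size=(200, 1))
diffs = signs * (H_pos - H_neg)
centered = diffs - diffs.mean(axis=0)

# First right singular vector = rank 1 steering vector;
# vt[1:] would give the extra components of the multi-rank PCA variant.
_, _, vt = np.linalg.svd(centered, full_matrices=False)
steering_vector = vt[0]

alignment = abs(float(np.dot(steering_vector, true_direction)))

# At inference the vector is added to the residual stream at some layer:
#   hidden += strength * steering_vector
```

With a clean planted signal the recovered component aligns closely with the true direction; the article's point is that real personality signals are far noisier than this.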
Results
The overview results are as follows:
| Intervention | N | Mean Altman Score |
|---|---|---|
| Trigram @ 0.5 | 20 | 0.2770 |
| Trigram @ 1 | 20 | 0.5026 |
| Rank 1 Vector | 20 | 0.1346 |
| PCA Vector | 20 | 0.3485 |
| Style Prompt | 20 | 0.2666 |
| Together | 20 | 0.6978 |
| Base | 20 | 0.2044 |
Mean is the mean Altman score and N is the number of essays the LLM wrote. @0.5 and @1 denote the strength of the bias: a linear multiplier on how much the specific logits are upweighted. Vector additions used only a single weighting, which did not degenerate.
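A sketch of how the trigram bias and its strength multiplier could work. The exact bias form used in the experiment is not specified; here, as an illustrative assumption, each token the trigram model predicts after the last two generated tokens gets a bonus of strength × its trigram probability. Token IDs and the tiny corpus are made up:

```python
from collections import defaultdict

def build_trigram_counts(token_ids):
    """Count follower tokens for each (a, b) bigram context in the corpus."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b, c in zip(token_ids, token_ids[1:], token_ids[2:]):
        counts[(a, b)][c] += 1
    return counts

def apply_trigram_bias(logits, context, counts, strength):
    """logits: dict token_id -> logit; context: generated token ids so far.

    Assumed bias form: upweight each predicted follower by
    strength * P_trigram(token | last two tokens).
    """
    followers = counts.get(tuple(context[-2:]), {})
    total = sum(followers.values())
    biased = dict(logits)
    for tok, n in followers.items():
        if tok in biased:
            biased[tok] += strength * (n / total)
    return biased

# Toy corpus: (1, 2) is followed by 3 twice and by 4 once.
counts = build_trigram_counts([1, 2, 3, 1, 2, 3, 1, 2, 4])
biased = apply_trigram_bias({3: 0.0, 4: 0.0, 5: 0.0}, [1, 2], counts, strength=1.0)
```

Doubling `strength` from 0.5 to 1 doubles every bonus, which matches the observed behavior: stronger pull toward the target n-grams, at the cost of looping.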
| Intervention | N | Entropy |
|---|---|---|
| Trigram @ 0.5 | 20 | 0.4865 |
| Trigram @ 1 | 20 | 0.2614 |
| Rank 1 Vector | 20 | 1.3421 |
| PCA Vector | 20 | 0.9047 |
| Style Prompt | 20 | 0.6347 |
| Together | 20 | n/a |
| Base | 20 | 0.5372 |
Here I tracked entropy using the subset of logits returned by my vLLM instance. Since this is only a subset, it provides a floor for the true entropy rather than an exact value, but I found it a decent approximation.
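Why a top-k subset gives a floor: Shannon entropy is a sum of nonnegative terms (-p log p ≥ 0 for p ≤ 1), so summing only the observed top-k terms can never exceed the full-distribution value. A small sketch with a made-up distribution:

```python
import math

def partial_entropy(probs):
    """Entropy contribution of an observed subset of a distribution (nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Toy full distribution over six tokens; in practice only the top few
# logprobs come back from the inference server.
full = [0.5, 0.2, 0.1, 0.1, 0.05, 0.05]
top3 = full[:3]

floor = partial_entropy(top3)          # what the top-k subset gives us
true_entropy = partial_entropy(full)   # what we would get with all logits
```

The gap between the two is the mass of the truncated tail; for peaked LLM distributions the tail terms are small, which is why the floor tracks the true entropy reasonably well.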
| Intervention | N | Degradation % | P50 Coherence |
|---|---|---|---|
| Trigram @ 0.5 | 20 | 5 | 34.17 |
| Trigram @ 1 | 20 | 100 | 6.83 |
| Rank 1 Vector | 20 | 0 | 56.83 |
| PCA Vector | 20 | 15 | 26.98 |
| Style Prompt | 20 | 0 | 67.63 |
| Together | 20 | 0 | 93.17 |
| Base | 20 | 0 | 56.47 |
Coherence is reported as the median (P50) across essays. Degradation % is the percentage of essays that looped or were nonsensical.
Above we see our classifier statistics and coherence. I had GPT-5.2 Pro evaluate the essays for coherency. There are a few key trends. Our Altman classifier is poor at analyzing coherence. Our Rank 1 vector performs worse than baseline! This is surprising considering the positive steering-vector results on simpler emotions and on Golden Gate Claude. However, it doesn't harm coherency (and may improve it). All other sampling- or vector-level changes lead to reduced coherency.
I did measure entropy per method, but only as a sample across the top logits, so it provides just a floor. Sampling decreases entropy while every other change increases it. In future experiments, I would love to keep entropy stable using power sampling. The entropy measurements support the other observations: rank 1 vectors were noisy (increased overall entropy -> not Altman-directed) and n-gram sampling leads to decoherence (decreased overall entropy -> looping).
Discussion
The goal of this experiment was to get a sense of what influences LLM personality and which levers are easiest to pull. An LLM trained to take in text is best at using text. Vector changes are highly susceptible to noise. n-gram-based sampling helps directionally but leads to decoherence at high strength.
All of these observations make sense intuitively. The most interesting result to me was the vectors; it influenced how I think about training latents and weights. The core idea is that you need a high signal for whatever you are trying to change. In prior literature, rank 1 updates tend to revolve around specific traits or emotions: happiness, sadness, etc. You can go as far as updates like Golden Gate Claude, but you require A LOT of data (https://transformer-circuits.pub/2023/monosemantic-features/index.html - 8B samples for 4096 features) to denoise the signal. On the other hand, the models are already trained to denoise text, so ICL needs less data because it is in-domain.
Intuitively, we also get higher Altman scores by simply increasing the size of the model. ICL is a very powerful technique for data-constrained personality changes. Going forward, distilling ICL-generated personalities from a stronger model into a weaker model makes the most sense. This mirrors what is seen in general LLM and personality research: you create an expert model, then distill into your inference model. It's an iterative process of dataset denoising.
Noise also explains why our PCA was less coherent than our rank 1 vectors. PCA preserves much more signal but also more noise. Since our PCA decohered, this is strong evidence that our personality vectors were too noisy; preserving the noise at this scale is detrimental. At the scale where a PCA-style system would outcompete a rank 1 vector, we likely reach the data and compute threshold where we should train the weights instead. Rank 1 vectors still have their place in steering studies because of the signal/noise tradeoff.
n-gram sampling performed much better than I anticipated. There is something here, especially if a more performant sampling system were incorporated to entropy-match the base model. However, I have yet to test this idea. It would be straightforward to use a power-sampling approach to entropy match. The benefit is that we could produce n-grams cheaply and upgrade personalities at the potential cost of test-time compute.
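A naive sketch of what entropy matching via power sampling could look like: raise each probability to a power alpha and renormalize, then bisect on alpha until the reshaped distribution hits a target entropy. This is a stand-in illustration, not the cited paper's actual method, and the distribution and target are made up:

```python
import math

def power_transform(probs, alpha):
    """Sharpen (alpha > 1) or flatten (alpha < 1) a distribution."""
    powered = [p ** alpha for p in probs]
    z = sum(powered)
    return [p / z for p in powered]

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def match_entropy(probs, target, lo=0.1, hi=10.0, iters=60):
    """Bisect on alpha; entropy decreases monotonically as alpha grows."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if entropy(power_transform(probs, mid)) > target:
            lo = mid   # still too flat; sharpen more
        else:
            hi = mid
    return (lo + hi) / 2

base = [0.5, 0.25, 0.15, 0.1]
target = entropy(base) * 0.8   # e.g. undo a 25% entropy inflation from a bias
alpha = match_entropy(base, target)
matched = power_transform(base, alpha)
```

In the n-gram case the bias raises some logits and lowers entropy, so the matching would instead run with alpha < 1 to flatten back toward the base model's entropy; the same bisection handles both directions.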
Overall, my takeaways mirror what is seen at an abstract level in ML discourse. Dataset and weight updates are limited by their noise, so distilling from experts is ideal. We require large enough datasets to compensate for noise, and at that point the compute limits favor weight updates. ICL is a powerful technique. There does not currently exist a technique between ICL and weight tuning that justifies its complexity tradeoffs; if anything, you would need to train the model to take in personality vectors, which again drives you toward weight updates.
I do think there is a future where we RL a model to take writing samples as ICL and then style-match. This is similar to how voice cloning works in audio models: you embed your desired voice, and the model outputs similarly. However, instead of using an embedding, we exploit the fact that the LLM is trained with ICL abilities, whereas the audio models are trained for the embedding input. This project has been on my docket for a while.