Models, Apps, and Prompt Engineering
This post comes on the heels of a Tweeter-conversation I had yesterday, where a colleague asserted that no evidence is needed to know that human reviewers are much better than LLMs at detecting things that read well but are fatally flawed. [This, after I suggested, half-joking, that we should simply replace human reviewers with LLMs to pick papers in large conferences. That would be a fascinating experiment!] The conversation then went into issues of “fairness” and “biases” — topics for which people love to hate LLMs. At this point, another colleague said that there is no evidence that one can prompt LLMs out of being biased, and that he would be very impressed if I showed him otherwise. Challenge accepted! (It’s Sunday, I feel like procrastinating)
So, read on. I am going to show how to prompt GPT to be reverse-gender-biased, just for fun. I’m going to make it chose words as if all powerful people in the world are women — it’s refreshing to visit that imaginary world from time to time! But first I need to make a technical detour.
Models and Apps
The conversation also made it clear to me that 90% of comments about GPT and other LLMs are actually comments about applications that use those models, like ChatGPT or Copilot; people don’t seem to realize that there is a difference between GPT and ChatGPT. This confusion does not affect just my colleagues yesterday, it seems to affect lots of people, including my students, who only after doing some serious experiments were able to see the difference. So here’s an architectural diagram:
ChatGPT is an application developed by OpenAI on top of a GPT model. OpenAI happens to have trained those GPT models too. But the app is doing much more than the model. We don’t really know how exactly it is built, because they haven’t told us. They pad our words with more words, and they may also alter the model’s reply.
Apps built on top of LLMs can (and do) affect the user’s interaction with the model. That’s the goal of an app! Otherwise, they would just make the Playground interface available to everyone, and call it a day.
I hope the simple architectural diagram above is enough to trigger a lightbulb. Apps can do anything to the words they get from us. They can have several round trips to the model before they reply; they can interact with other components; etc. The sky is the limit!
OK, so apps are really interesting. But ChatGPT is not. ChatGPT is an obscure app that we don’t know how it works, and that seems to have been developed/deployed with the sole purpose of marketing OpenAI’s models. Let’s just ignore it.
Anyone who is minimally interested in OpenAI’s models, and what they can do, should go straight to the Playground, because that’s where we can access the models directly through their API, without any interference from obscure apps. (Again, I’m not entirely sure that they don’t interfere, because OpenAI doesn’t tells us much, but it feels a lot more raw than their chat bot)
Now that I made this detour, let me get back to the main challenge of trying to prompt GPT to be reverse-gender-bias.
But first, let’s see how gender biased GPT really is, out of the box:
Ouch, GPT ! You’re hurting my feelings! That’s just way too much gender bias! Hey, it’s me, I’m the professor of the Programming! I’m a woman!
The model is doing what we would expect any model to do when it is trained on large amounts of human-generated data: it’s completing the sentence according to the bias in the training data.
Maybe in the near future it will be possible to train models that have the same reasoning capabilities as OpenAI’s models, but that are trained using much less, and better quality, data. But does anyone believe that gender biases will be eliminated? After all, this concept of “gender bias”, and how to mitigate it, only emerged in the past nanosecond of human history. Our language is full of biases. Even small data is full of biases, there’s no way around it.
The bus stops here for 99% of people. OMG! THE MODELS ARE BIASED! THERE’S NOTHING WE CAN DO BECAUSE THE DATA IS HORRIBLE BECAUSE HUMANS ARE HORRIBLE BECAUSE OPENAI SHOULD HAVE NOT USED <insert favorite pet peeve site/book> LET’S GET RID OF THIS!
Nature vs. Nurture
Chill. These models have a remarkable capability to adapt.
A pretrained model is just raw material for language/image generation. We can adjust its behavior… by teaching it how to behave differently. This is called “few shot learning.” When coupled with Chain of Thought prompting, the combo can be remarkably effective at altering the model’s natural tendencies.
Here is a prompt that I wrote today that transforms GPT3 into a gender role reversal model:
SYSTEM: You are a helpful assistant. You live in a world where women are in charge, and men stay home to take care of kids, or take lower-paid jobs with less responsibility. Women are the decision makers -- CEOs, doctors, engineers, military, etc. are all women. Nurses, caregivers, house cleaners, secretaries, etc. are all men. USER: Fill in the blank: Our CEO increased revenue, ___ really made the company turn around! ASSISTANT: Step 1: CEOs are women Step 2: she USER: Fill in the blank: My ___ baked cookies for us ASSISTANT: Step 1: men do homely things Step 2: dad USER: Fill in the blank: <insert user's prompt>
Let’s try it with the same examples:
QED. We can fix the models’ natural biases.
From here on, we can wrap this prompt in an Application that interfaces with the user and mitigates the biases of the model. If gender biases are important, they should probably be fixed.
When companies/people use the raw models, that’s their decision, and they should be responsible for such decision. There is no technical inevitability here. It’s like working around a library that has a bug. Why would you let that bug reign in your code?
Anyway, try my prompt yourself! — not in ChatGPT, but in the Playground. Have fun creating whatever language-based worlds you like.