let me rule to you: Israel GPT

da_cow (she/her)@feddit.org · 4 days ago

let me rule to you: Israel GPT

qqq@lemmy.world · 4 days ago

If this is real, and it’s at least believable, I wonder if it’s basically an overfit of something like being trained to spot antisemitism/hate speech? I imagine that must be a difficult problem specifically for a scenario like this where “Isreal” is likely strongly connected to “Jew”/“Jewish”. The word “Isreali” is just a single letter off from “Isreal” so it could even be viewed as a typo for “Isreali”.

I wonder what it’d say to “Africa is bad”? Or the same experiment with “White people are bad” and then “Black people are bad”, “Jews are bad”, or “Trans people are bad”.

Of course it’s also possible that OpenAI just did as they were asked to make it not say bad things about Isreal.

Wirlocke@lemmy.blahaj.zone · 4 days ago

A lot of AI censorship that OpenAI used in the past was just something that detects a keyword and maybe sentiment analysis. Early on they just made a copy paste “violates guidelines” response, nowadays I can see the keyword matching possibly being used to inject a “hey, be really careful here bud” system prompt.

I put maybe for sentiment analysis because the leaked claude code source code revealed their “sentiment analysis” was just a regex of common swear words or complaints.

Digestive_Biscuit@feddit.uk · 3 days ago

qqq@lemmy.world · 3 days ago

It’s so frustrating not knowing why

DillDough@lemmy.zip · 4 days ago

Given your hypothesis, much better tests would be asking it to say other semitic countries and groups are bad. Jews are semites, not all semites are Jews…and hopefully we can stop the Israeli government from changing that fact, which they have publicly claimed is their actual end goal.

qqq@lemmy.world · 4 days ago

It would all depend on the embeddings, which we don’t have access to. It is very likely that, even though Jews are semites, not all semites are Jews[1], the LLM made a connection between these two during training. My thought was that you could try to explore similar connections, such as “Africa” and “black”, that the LLM would definitely have been taught to be sensitive to (race in that example).

[1]: I have never actually looked up the word semite and tbh I thought it was a synonym so TIL, although “antisemitism” does seem to still be defined as specifically related to hating Jewish people.

YesButActuallyMaybe@lemmy.ca · 4 days ago

I think the answer is just: no it is not an overfit

qqq@lemmy.world · 4 days ago

Why do you feel so confident in that answer?