I'm using the OpenAI API (gpt-4o-mini) to extract and normalize names from social media posts.
However, even with strict prompting, the output sometimes contains slightly different spellings for the same name.
For example, across runs I get:
"normalized_name_en": "Zahran Mamdani"
"normalized_name_en": "Zahraan Mamdani"
"normalized_name_en": "Zahran Mamdani"
Prompt:
Extract actor info and return ONLY JSON:
{
  "actors": [{
    "original_name": "<as in post>",
    "normalized_name_en": "<official English name, e.g. 'الأمم المتحدة'→'United Nations'>",
    "type": "<person|org|entity>"
  }]
}
Post: {content}
(Return JSON only, keep consistent English names.)
API call:
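from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# `prompt` is the template above with {content} replaced by the post text
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Return JSON only."},
        {"role": "user", "content": prompt},
    ],
    temperature=0.1,
    max_tokens=500,
)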
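For context, here is a minimal sketch of how the reply could be parsed and checked against the schema in the prompt. `parse_actors` and `ALLOWED_TYPES` are hypothetical names of my own, not part of the OpenAI SDK; the function expects the raw string from `response.choices[0].message.content` and assumes the model really did return a JSON object.

```python
import json

# Types allowed by the prompt's schema ("<person|org|entity>").
ALLOWED_TYPES = {"person", "org", "entity"}

def parse_actors(raw: str) -> list[dict]:
    """Parse the model's JSON reply and keep only well-formed actor entries."""
    data = json.loads(raw)  # raises json.JSONDecodeError on non-JSON output
    if not isinstance(data, dict):
        return []
    actors = []
    for item in data.get("actors", []):
        if not isinstance(item, dict):
            continue
        name = str(item.get("normalized_name_en", "")).strip()
        if not name or item.get("type") not in ALLOWED_TYPES:
            continue  # skip entries that do not match the prompt's schema
        actors.append({
            "original_name": item.get("original_name", ""),
            "normalized_name_en": name,
            "type": item["type"],
        })
    return actors
```

This only validates structure. It does nothing about the spelling variants, which is the part I'm asking about.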
This version of the prompt extracts only the "actors" field. Each actor has an original name (exactly as written in the post), a normalized English equivalent as used by trusted sources (Wikipedia, Reuters, BBC), and a type (person, organization, or entity).
It is meant to enforce global name consistency: for example, always "Mohammad" rather than "Muhammed", and full names such as "Donald Trump" instead of "Trump".
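To make the consistency requirement concrete, this is roughly the kind of post-processing check I have in mind, as a sketch only. It uses `difflib.SequenceMatcher` from the standard library to map a new spelling onto a name that has already been seen. The helper name `canonicalize`, the in-memory `known` list, and the 0.9 threshold are all assumptions of mine and have not been tuned on real data.

```python
from difflib import SequenceMatcher

def canonicalize(name: str, known: list[str], threshold: float = 0.9) -> str:
    """Return the closest already-seen name, or register `name` as new.

    `known` is mutated in place; `threshold` is an untuned assumption.
    """
    cleaned = name.strip()
    key = cleaned.casefold()
    best, best_score = None, 0.0
    for candidate in known:
        # ratio() is a similarity score in [0, 1] based on matching blocks
        score = SequenceMatcher(None, key, candidate.casefold()).ratio()
        if score > best_score:
            best, best_score = candidate, score
    if best is not None and best_score >= threshold:
        return best  # treat as a spelling variant of a known name
    known.append(cleaned)  # first time this name has been seen
    return cleaned

known_names = ["Zahran Mamdani"]
print(canonicalize("Zahraan Mamdani", known_names))
print(canonicalize("Donald Trump", known_names))
```

With this threshold, "Zahraan Mamdani" is mapped back to "Zahran Mamdani", while "Donald Trump" is registered as a new name. Character similarity alone would not map "Trump" to "Donald Trump" or "Muhammed" to "Mohammad", so it covers only part of what the prompt is supposed to do.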