I'm using the OpenAI API (gpt-4o-mini) to extract and normalize names from social media posts.
However, even with strict prompting, the output sometimes contains slightly different spellings for the same name.

For example:

"normalized_name_en": "Zahran Mamdani"
"normalized_name_en": "Zahraan Mamdani"
"normalized_name_en": "Zahran Mamdani"

Prompt:
Extract actor info and return ONLY JSON:
{
 "actors":[{
   "original_name":"<as in post>",
   "normalized_name_en":"<official English name, e.g. 'الأمم المتحدة'→'United Nations'>",
   "type":"<person|org|entity>"
 }]
}
Post: {content}
(Return JSON only, keep consistent English names.)
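The question does not show how `{content}` is filled in. As a minimal sketch, assuming plain Python string substitution: the template contains literal JSON braces, so `str.format` would try to read them as replacement fields. Replacing only the placeholder avoids that. The names `PROMPT_TEMPLATE` and `build_prompt` are hypothetical, and the template is shortened here.

```python
# Hypothetical helper names; the template is an abbreviated copy of the prompt above.
PROMPT_TEMPLATE = """Extract actor info and return ONLY JSON:
{
 "actors":[{
   "original_name":"<as in post>",
   "normalized_name_en":"<official English name>",
   "type":"<person|org|entity>"
 }]
}
Post: {content}
(Return JSON only, keep consistent English names.)"""


def build_prompt(content: str) -> str:
    """Insert the post text without touching the literal JSON braces."""
    # str.replace only swaps the exact "{content}" token, so the braces
    # of the JSON skeleton are left alone.
    return PROMPT_TEMPLATE.replace("{content}", content)


prompt = build_prompt("Zahran Mamdani spoke at the rally.")
```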

API call:
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {"role": "system", "content": "Return JSON only."},
    {"role": "user", "content": prompt}
  ],
  temperature=0.1,
  max_tokens=500
)
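For completeness, here is a sketch of how the reply might be parsed before the names are compared. It assumes the model returns the JSON shape requested in the prompt; `parse_actors` is a hypothetical helper, not part of the OpenAI SDK.

```python
import json
from typing import List, Optional


def parse_actors(raw: Optional[str]) -> List[dict]:
    """Return the actor dicts from the model's reply, or [] if it is unusable."""
    try:
        data = json.loads(raw or "")
    except json.JSONDecodeError:
        # The reply was empty or not valid JSON.
        return []
    if not isinstance(data, dict):
        return []
    actors = data.get("actors", [])
    if not isinstance(actors, list):
        return []
    # Keep only well-formed entries that carry the normalized name.
    return [a for a in actors if isinstance(a, dict) and "normalized_name_en" in a]
```

With the call above, this would be used as `actors = parse_actors(response.choices[0].message.content)`.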

This version focuses only on the “actors” field. Each actor includes the original name (exactly as written in the post), a normalized English equivalent as used by trusted sources (Wikipedia, Reuters, BBC), and a type (person, organization, or entity).
It is meant to enforce globally consistent names: for example, always “Mohammad” rather than “Muhammed”, and full names such as “Donald Trump” instead of “Trump”.
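To make the consistency requirement concrete, here is a rough sketch of a post-processing step that collapses near-identical spellings onto the first spelling seen. It uses `difflib.get_close_matches` from the standard library. The `NameRegistry` class and the 0.85 cutoff are assumptions chosen for illustration, not something the prompt or the API provides.

```python
from difflib import get_close_matches
from typing import List


class NameRegistry:
    """Hypothetical helper: maps spelling variants to the first spelling seen."""

    def __init__(self, cutoff: float = 0.85) -> None:
        self.cutoff = cutoff              # similarity threshold (assumed value)
        self.canonical: List[str] = []    # spellings accepted so far

    def resolve(self, name: str) -> str:
        cleaned = " ".join(name.split())  # collapse stray whitespace
        match = get_close_matches(cleaned, self.canonical, n=1, cutoff=self.cutoff)
        if match:
            return match[0]               # reuse the known spelling
        self.canonical.append(cleaned)    # first occurrence becomes canonical
        return cleaned


registry = NameRegistry()
names = ["Zahran Mamdani", "Zahraan Mamdani", "Donald Trump"]
resolved = [registry.resolve(n) for n in names]
```

In this sketch the first spelling encountered wins, and a cutoff that is too low could merge genuinely different people, so the threshold would need tuning against real data.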
