I have two variables called entity and label. The entity variable stores a list of words, and each element of that list contains lists as well, so it is a list of lists. These inner lists are bi-gram features, so I need to keep them.

I am trying to train a classifier using these two variables. This is my code so far:

from sklearn import svm
from sklearn.feature_extraction.text import TfidfVectorizer

entity = [[['Prabowo Subianto']], [['Muhtar Ependi']], [['Nina Zatulini']], [['Partai Gerindra']], [['Persiba']], [['Partai Kebangkitan Bangsa (PKB)'], ['Partai Kebangkitan'], ['Kebangkitan Bangsa'], ['Bangsa ('], ['( PKB'], ['PKB )']], [['Sman 3 Kabupaten Tangerang'], ['Sman 3'], ['3 Kabupaten'], ['Kabupaten Tangerang']], [['Bandara Changi Singapura'], ['Bandara Changi'], ['Changi Singapura']], [['Warung Kopi Kita'], ['Warung Kopi'], ['Kopi Kita']]]
label = ['PERSON', 'PERSON', 'PERSON', 'ORGANIZATION', 'ORGANIZATION', 'ORGANIZATION', 'LOCATION', 'LOCATION', 'LOCATION']

vectorizer = TfidfVectorizer(min_df=1)
train_vector_entity = vectorizer.fit_transform(entity)
train_vector_label = label

classifier = svm.SVC()
classifier_word = classifier.fit(train_vector_entity,train_vector_label)

The resulting error:

AttributeError: 'list' object has no attribute 'lower'

What is the best way to train the classifier? Thanks

  • What does the list of lists represent? Is there any significance to keeping it nested? Can't you iterate through it to build a single flat list that can then be passed to TfidfVectorizer? Commented Aug 15, 2017 at 7:03

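1 Answer

TfidfVectorizer expects each document to be a string and calls .lower() on it, which is why a nested list raises the AttributeError. Just change this line so that the vectorizer receives a flat list of strings (note that this keeps only the first string of each entity and drops the remaining bi-grams):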

train_vector_entity = vectorizer.fit_transform([i[0][0] for i in entity])
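If the bi-grams need to be kept, one possible approach is to give TfidfVectorizer a callable analyzer that turns each nested entry into a list of features, so no string preprocessing is attempted on the lists. The following is only a sketch under assumptions: the helper name ngram_analyzer is hypothetical, the lowercasing is added to mimic the vectorizer's default behaviour, and the linear kernel is an illustrative choice rather than something taken from the question.

```python
from sklearn import svm
from sklearn.feature_extraction.text import TfidfVectorizer

entity = [
    [['Prabowo Subianto']], [['Muhtar Ependi']], [['Nina Zatulini']],
    [['Partai Gerindra']], [['Persiba']],
    [['Partai Kebangkitan Bangsa (PKB)'], ['Partai Kebangkitan'],
     ['Kebangkitan Bangsa'], ['Bangsa ('], ['( PKB'], ['PKB )']],
    [['Sman 3 Kabupaten Tangerang'], ['Sman 3'], ['3 Kabupaten'],
     ['Kabupaten Tangerang']],
    [['Bandara Changi Singapura'], ['Bandara Changi'], ['Changi Singapura']],
    [['Warung Kopi Kita'], ['Warung Kopi'], ['Kopi Kita']],
]
label = ['PERSON', 'PERSON', 'PERSON',
         'ORGANIZATION', 'ORGANIZATION', 'ORGANIZATION',
         'LOCATION', 'LOCATION', 'LOCATION']


def ngram_analyzer(doc):
    # Hypothetical helper: each document is a list of one-element lists,
    # so unwrap every inner list and use its string as one feature.
    return [item[0].lower() for item in doc]


# A callable analyzer receives the raw document, so the nested lists
# never reach the default string preprocessing that calls .lower().
vectorizer = TfidfVectorizer(analyzer=ngram_analyzer)
train_vector_entity = vectorizer.fit_transform(entity)

# Assumption: a linear kernel, chosen here only for illustration.
classifier = svm.SVC(kernel='linear')
classifier.fit(train_vector_entity, label)
```

New entities would have to be passed to vectorizer.transform in the same nested shape before calling classifier.predict. With only nine training examples and almost no shared features between them, the resulting model is a toy, so treat this as a demonstration of the input format rather than a usable classifier.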