I have two variables called entity and label. The entity variable stores a list of entities, and each element of this list is itself a list (of one-element lists holding the full entity name and its word bigrams), so it is a list of lists. These inner items are actually bigram features, so I need to keep them.
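To make the structure concrete, here is one entity taken from the data below and a small helper that flattens it into a plain list of feature strings. `flatten_features` is a hypothetical helper name, not part of scikit-learn. It only concatenates the inner lists, so each bigram string stays whole instead of being split into single words.

```python
# One entity from the question's data: the full entity string followed by
# its word bigrams, each wrapped in its own one-element list.
entity_sample = [['Sman 3 Kabupaten Tangerang'], ['Sman 3'],
                 ['3 Kabupaten'], ['Kabupaten Tangerang']]


def flatten_features(nested):
    """Flatten [['a'], ['b c']] into ['a', 'b c'], keeping each string whole."""
    return [feature for group in nested for feature in group]


features = flatten_features(entity_sample)
print(features)
# ['Sman 3 Kabupaten Tangerang', 'Sman 3', '3 Kabupaten', 'Kabupaten Tangerang']
```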
I am trying to train a classifier using these two variables. This is my code so far:
from sklearn import svm
from sklearn.feature_extraction.text import TfidfVectorizer
entity = [[['Prabowo Subianto']], [['Muhtar Ependi']], [['Nina Zatulini']], [['Partai Gerindra']], [['Persiba']], [['Partai Kebangkitan Bangsa (PKB)'], ['Partai Kebangkitan'], ['Kebangkitan Bangsa'], ['Bangsa ('], ['( PKB'], ['PKB )']], [['Sman 3 Kabupaten Tangerang'], ['Sman 3'], ['3 Kabupaten'], ['Kabupaten Tangerang']], [['Bandara Changi Singapura'], ['Bandara Changi'], ['Changi Singapura']], [['Warung Kopi Kita'], ['Warung Kopi'], ['Kopi Kita']]]
label = ['PERSON', 'PERSON', 'PERSON', 'ORGANIZATION', 'ORGANIZATION', 'ORGANIZATION', 'LOCATION', 'LOCATION', 'LOCATION']
vectorizer = TfidfVectorizer(min_df=1)
train_vector_entity = vectorizer.fit_transform(entity)  # fails here: each entity is a list, not a string
train_vector_label = label
classifier = svm.SVC()
classifier_word = classifier.fit(train_vector_entity, train_vector_label)
Running this raises the following error:
AttributeError: 'list' object has no attribute 'lower'
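The error comes from how `TfidfVectorizer` treats its input: with the default `analyzer='word'`, each document is expected to be a string, and the default preprocessing lowercases it by calling `.lower()` on it. A nested list has no `.lower()`, so the first document fails. The sketch below reproduces this with two documents in the same nested format and shows that the same call works on plain strings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Two documents in the question's nested format (lists, not strings).
nested_docs = [[['Prabowo Subianto']], [['Persiba']]]

error_message = None
try:
    TfidfVectorizer().fit_transform(nested_docs)
except AttributeError as exc:
    # The default preprocessor calls doc.lower() on each document.
    error_message = str(exc)

print(error_message)  # 'list' object has no attribute 'lower'

# The same call succeeds once each document is a plain string.
string_docs = ['Prabowo Subianto', 'Persiba']
matrix = TfidfVectorizer().fit_transform(string_docs)
print(matrix.shape)  # (2, 3)
```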
What is the best way to train the classifier on this data while keeping the bigram features? Thanks.
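One possible approach, sketched here under the assumption that the pre-extracted bigrams should be used as-is: pass a callable as `analyzer`. `TfidfVectorizer` then calls that function on each raw document and uses the returned list as the features, skipping its own lowercasing and tokenization. The function name `bigram_analyzer` is hypothetical; the data and labels are copied from the question.

```python
from sklearn import svm
from sklearn.feature_extraction.text import TfidfVectorizer

entity = [[['Prabowo Subianto']], [['Muhtar Ependi']], [['Nina Zatulini']],
          [['Partai Gerindra']], [['Persiba']],
          [['Partai Kebangkitan Bangsa (PKB)'], ['Partai Kebangkitan'],
           ['Kebangkitan Bangsa'], ['Bangsa ('], ['( PKB'], ['PKB )']],
          [['Sman 3 Kabupaten Tangerang'], ['Sman 3'], ['3 Kabupaten'],
           ['Kabupaten Tangerang']],
          [['Bandara Changi Singapura'], ['Bandara Changi'], ['Changi Singapura']],
          [['Warung Kopi Kita'], ['Warung Kopi'], ['Kopi Kita']]]
label = ['PERSON', 'PERSON', 'PERSON', 'ORGANIZATION', 'ORGANIZATION',
         'ORGANIZATION', 'LOCATION', 'LOCATION', 'LOCATION']


def bigram_analyzer(doc):
    """Return the pre-extracted features of one entity as whole tokens.

    doc is a list of one-element lists, e.g. [['Sman 3'], ['3 Kabupaten']].
    Each string becomes one feature, so bigrams are not split or lowercased.
    """
    return [feature for group in doc for feature in group]


vectorizer = TfidfVectorizer(analyzer=bigram_analyzer)
train_vector_entity = vectorizer.fit_transform(entity)  # sparse (9, n_features)

classifier = svm.SVC()
classifier.fit(train_vector_entity, label)

# New entities must be given in the same nested structure before transform().
new_vector = vectorizer.transform([[['Partai Gerindra']]])
prediction = classifier.predict(new_vector)
print(prediction)
```

A named function is used instead of a lambda so the fitted vectorizer can still be pickled. Passing joined strings to the default word analyzer would not preserve these features exactly, because its default `token_pattern` drops one-character tokens such as `3` and `(`. Note that in this toy data no feature occurs in more than one entity, so the example only shows that the pipeline runs; it says nothing about accuracy.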