
I am new to Python pandas and am working on a small application in which I want to read an Excel file containing data in Hindi.

The issue I am facing is that pandas cannot read the Hindi words and shows arbitrary '?' symbols in their place.

I have tried setting the encoding to UTF-8, but that doesn't work either.

My Excel data:

[Screenshot of the Excel data; image not available]

Python code:

import pandas as pd

df = pd.read_csv("Vegaretable_List.csv", encoding='utf-8')

Output:

['?? ' '??? ' '???? ' '????? ' '????']

Any help would be appreciated. Thanks in advance.

  • You need to find out the encoding of your input file; it may be something other than UTF-8. You can also use this tool: r12a.github.io/app-conversion Commented Jan 6, 2021 at 6:27
  • You need a converter such as the codecs module; refer to docs.python.org/3/library/codecs.html Commented Jan 6, 2021 at 6:38
  • Try opening the file and saving it as CSV UTF-8 (Comma delimited) (*.csv). Commented Jan 6, 2021 at 6:42

3 Answers


The problem shouldn't occur if the file is read in using the same encoding it was created with.

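If you get "???", the CSV (or Excel) file was most likely saved with an encoding other than UTF-8. For example, Excel's plain "CSV (Comma delimited)" format typically uses the system's ANSI code page, which has no Devanagari characters, so the Hindi text may already have been replaced by '?' in the file itself.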
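A minimal sketch of how the '?' characters can appear, assuming the file was written in a single-byte Windows code page such as cp1252 (the actual code page depends on the system locale). `save_and_reload` is a hypothetical helper that only simulates saving and reopening; every character the code page cannot represent is replaced by '?', so the text is lost before pandas ever reads it:

```python
# Sample Hindi words (taken from the example data in this thread)
words = ["मिशल", "बहादुर", "जेन"]


def save_and_reload(text, encoding):
    """Hypothetical helper: simulate saving text in `encoding` and reading it back.

    errors="replace" mimics a program that silently substitutes '?'
    for characters the target encoding cannot represent.
    """
    raw = text.encode(encoding, errors="replace")
    return raw.decode(encoding)


# cp1252 has no Devanagari characters: every code point becomes '?'
lossy = [save_and_reload(w, "cp1252") for w in words]
print(lossy)  # ['????', '??????', '???']

# UTF-8 can represent every Unicode character, so the text survives
print([save_and_reload(w, "utf-8") for w in words] == words)  # True
```

Each Devanagari vowel sign is a separate code point, which is why "मिशल" turns into four '?' characters rather than two.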

Python's documentation has a table of the standard encodings (the "Standard Encodings" section of the codecs module documentation).

Alternatively, open the file in a suitable program and save it as UTF-8 (in Excel, choose "CSV UTF-8 (Comma delimited) (*.csv)"), then read it with your code.
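The same re-save can be sketched in Python, assuming you know the encoding the file was actually saved in. `convert_to_utf8` is a hypothetical helper (not a pandas function), and the sample file is UTF-16 purely for illustration. This cannot recover text that was already replaced by '?'; the source file must still contain the Hindi characters.

```python
import os
import tempfile


def convert_to_utf8(src_path, dst_path, src_encoding):
    """Hypothetical helper: re-save a text/CSV file as UTF-8.

    src_encoding must be the encoding the file was really saved in
    (for example "utf-16" or "cp1252"); a wrong guess garbles the text.
    """
    # newline="" keeps the original line endings unchanged
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


# Demo with a throwaway UTF-16 file (UTF-16 can hold Hindi text)
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "list_utf16.csv")
    dst = os.path.join(tmp, "list_utf8.csv")
    with open(src, "w", encoding="utf-16", newline="") as f:
        f.write("name\nमिशल\nबहादुर\n")
    convert_to_utf8(src, dst, "utf-16")
    with open(dst, "r", encoding="utf-8") as f:
        lines = f.read().splitlines()

print(lines)  # ['name', 'मिशल', 'बहादुर']
```

After conversion, `pd.read_csv(path, encoding='utf-8')` should be able to read the new file.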


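Don't save the data as a CSV file; keep it as an Excel workbook in .xlsx format instead. An .xlsx file stores text as Unicode, so pandas reads the Hindi text correctly. I did this and it worked.

import pandas as pd

# Reading .xlsx files requires the openpyxl package to be installed
dataset = pd.read_excel("Data.xlsx")

Here Data.xlsx contains all the Hindi text from your question.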

Best of luck


Assuming that your Excel/CSV file has content similar to this:

मिशल
बहादुर
मेरी
जेन
जॉन
स्मिथ

The encoding is correct; you just have to iterate through the data to get the text back.

For .csv:

import csv

# newline='' lets the csv module handle line endings inside quoted fields
with open('customers.csv', 'r', encoding='utf-8', newline='') as file:
    data = csv.reader(file)
    for row in data:
        print(row)
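A self-contained way to check that the Hindi text survives a UTF-8 round trip is to write the sample names above to a temporary CSV and read them back. The sketch uses "utf-8-sig" because Excel's "CSV UTF-8" option is commonly reported to add a byte-order mark; with plain "utf-8" the csv module would keep that mark as '\ufeff' at the start of the first field. The file name is hypothetical.

```python
import csv
import os
import tempfile

# Sample names from the data above
names = ["मिशल", "बहादुर", "मेरी", "जेन", "जॉन", "स्मिथ"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "customers.csv")  # hypothetical file name

    # "utf-8-sig" writes UTF-8 preceded by a byte-order mark, as Excel may do
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        writer = csv.writer(f)
        for name in names:
            writer.writerow([name])

    # "utf-8-sig" strips the BOM if present and otherwise behaves like "utf-8"
    with open(path, "r", encoding="utf-8-sig", newline="") as f:
        rows = [row[0] for row in csv.reader(f)]

print(rows == names)  # True
```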

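For .xlsx (an .xlsx workbook is a zipped XML package, not a text file, so it must be opened with a spreadsheet library such as openpyxl rather than open()):

import openpyxl

# read_only mode streams rows instead of loading the whole workbook
workbook = openpyxl.load_workbook('customers.xlsx', read_only=True)
sheet = workbook.active
for row in sheet.iter_rows(values_only=True):
    print(row)
workbook.close()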
