19

I have a UTF-8 CSV file stored on a web server. When I download the file put it on my hard drive and I then import it into an Excel sheet with this macro (from the macro recorder) :

Sub Macro2()
Workbooks.OpenText Filename:= _
    "C:/myFile.csv", Origin _
    :=65001, StartRow:=1, DataType:=xlDelimited, TextQualifier:= _
    xlDoubleQuote, ConsecutiveDelimiter:=False, Tab:=False, Semicolon:=False _
    , Comma:=True, Space:=False, Other:=False
End Sub

All of the characters (vietnamese characters) are displayed correctly.

When I try the same macro but instead of giving the local address of the file ("C:/myFile.csv") I pass the URL of the file ("http://myserver.com/myFile.csv") the CSV is correctly imported into my Excel sheet but the vietnamese characters are not displayed correctly anymore.

I have also tried using the Data tab but the encoding seems be ignored by Excel:

With ActiveSheet.QueryTables.Add(Connection:= _
                "TEXT;C:/myFile.csv" _
                , Destination:=Range("$A$1"))
                .Name = "myFile.csv"
                .FieldNames = True
                .RowNumbers = False
                .FillAdjacentFormulas = False
                .PreserveFormatting = True
                .RefreshOnFileOpen = False
                .RefreshStyle = xlInsertDeleteCells
                .SavePassword = False
                .SaveData = True
                .AdjustColumnWidth = True
                .RefreshPeriod = 0
                .TextFilePromptOnRefresh = False
                .TextFilePlatform = 65001
                .TextFileStartRow = 1
                .TextFileParseType = xlDelimited
                .TextFileTextQualifier = xlTextQualifierDoubleQuote
                .TextFileConsecutiveDelimiter = False
                .TextFileTabDelimiter = True
                .TextFileSemicolonDelimiter = False
                .TextFileCommaDelimiter = False
                .TextFileSpaceDelimiter = False
                .TextFileOtherDelimiter = "~"
                .TextFileColumnDataTypes = Array(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
                .TextFileTrailingMinusNumbers = True
                .Refresh BackgroundQuery:=False
       End With

Sample data: „; Â; ˜; Â1/4; ‰; ™,™

which Excel reads wrongly as: „; Â; ˜; Â1/4; ‰; ™,™;

7
  • There is usually an encoding declaration in HTTP headers. Maybe in your case it is set to another character set and overwrite the Origin parameter Commented May 13, 2014 at 9:04
  • Thanks, and do you know how i can change those HTTP headers ? Commented May 13, 2014 at 9:16
  • Depends on the server. You can easily check if that is the case with a web browser: in firefox you can activate the web console, type the URL in the address bar and look at the response header Content-type Commented May 13, 2014 at 9:26
  • here is the content-type : text/html; charset=UTF-8 So it shouldn't be a server issue no ? thanks Commented May 13, 2014 at 10:10
  • 1
    My testing shows that Excel gets confused when there is no Unicode byte order mark in the file. When there is one, it opens it properly from a URL, and in that case it does not matter whether the server supplies charset=UTF-8 in the headers. Commented Sep 1, 2015 at 10:46

3 Answers 3

8

I have been looking at a similar problem where we import utf-8 encoded csv files in to a worksheet. I am not pulling the data from a web server but this might help.

My solution is to read the utf-8 file to a local variable then insert it into a sheet. I tried saving the data to a temp file with ansi encoding but doing this caused all the characters to lose their accents.

Function ReadUTF8CSVToSheet(file As String)
    Dim ws As Worksheet
    Dim strText As String

    ' read utf-8 file to strText variable
   With CreateObject("ADODB.Stream")
        .Open
        .Type = 1  ' Private Const adTypeBinary = 1
        .LoadFromFile file
        .Type = 2  ' Private Const adTypeText = 2
        .Charset = "utf-8"
        strText = .ReadText(-1)  ' Private Const adReadAll = -1
    End With

    ' parse strText data to a sheet
    Set ws = Sheets.Add()
    intRow = 1
    For Each strLine In Split(strText, chr(10))
        If strLine <> "" Then
            With ws
                .Cells(intRow, 1) = strLine
                .Cells(intRow, 1).TextToColumns Destination:=Cells(intRow, 1), DataType:=xlDelimited, _
                    TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=False, Tab:=False, _
                    Semicolon:=False, Comma:=True, Space:=False, Other:=False
            End With

            intRow = intRow + 1
        End If
    Next strLine

    ReadUTF8CSVToSheet = ws.Name

End Function

' to run
strSheetName = ReadUTF8CSVToSheet("C:\temp\utf8file.csv")
Sign up to request clarification or add additional context in comments.

Comments

7
+25

If the characters are displayed correctly when you download the csv file yourself, I'd divide the process to 2 stages:

Downloading

Sub DownloadFile(ByVal url As String, ByVal local As String)

Dim WinHttpReq As Object
Set WinHttpReq = CreateObject("Microsoft.XMLHTTP")
WinHttpReq.Open "GET", url, False, "username", "password"
WinHttpReq.send

myURL = WinHttpReq.responseBody
If WinHttpReq.Status = 200 Then
    Set oStream = CreateObject("ADODB.Stream")
    oStream.Open
    oStream.Type = 1
    oStream.Write WinHttpReq.responseBody
    oStream.SaveToFile local, 2 
    oStream.Close
End If

End Sub

Loading CSV

Sub OpenCsv(ByVal csvfile As String)
Workbooks.OpenText Filename:= _ 
csvfile,Local:=True,StartRow:=1, DataType:=xlDelimited, TextQualifier:= _
xlDoubleQuote, ConsecutiveDelimiter:=False, Tab:=False, Semicolon:=False _
, Comma:=True, Space:=False, Other:=False
End Sub

Note That: The Local parameter is the key here,it makes VBA use your excel's local configuration (vietnamese), which is by default set to False.

Putting it all together

Sub DownloadAndLoad
  DownloadFile "http://myserver.com/myFile.csv","C:\myFile.csv"
  OpenCsv "C:\myFile.csv"
End Sub

2 Comments

Hi Uri. Thanks for your reply. In my case, the Local:=True doesn't change a thing (is it cause my locals are UK? Changing my locale isn't really an option as I will be processing files of many languages...). For example, the csv file has a ™ in it, when I use the Workbooks.Open the character is still loaded as „¢. I have noticed that when I do this manually (not via macro) the csv is loaded properly. I have recorded a macro from the manual steps but the macro behaviour is different. I have read that csv files may be missing UTF-8 heading - are you familiar with headings at all?
Local should not be used or have an effect when reading UTF-8. @Meehow Yes, putting a UTF-8 BOM in the file fixes it.
4

IMO, there seems to be a bug/conflict in Excel when opening UTF-8/UTF-8-BOM files using the recorded macro code, specifically when the Origin parameter is set to 65001 which is supposed be UTF-8.

I have found two workarounds to this issue:

  1. Remove the Origin parameter from the function call and see if the file loads properly Workbooks.OpenText Filename:="C:\file.csv".

    MSDN says:

    If this argument is omitted, the method uses the current setting of the File Origin option in the Text Import Wizard.

    I would think that as soon as you link the file with Excel, it should try to read the header of the file and select the correct Country Code automatically (well, assuming the header is not missing).

  2. I have tried different Country Codes and found that in my specific scenario setting Origin:=1252 (1252 - windows-1252 - ANSI Latin 1; Western European (Windows)) loads the file in Excel just fine.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.