
I am rewriting a small python script in node.js. The original script worked like this:

# -*- coding: utf-8 -*-
import urllib
import httplib
import json

def rpc(url, args = { }):
  try:
    post_data = json.dumps({'args': args})
    f = urllib.urlopen(url, post_data)
    if not f or f.code != 200:
      return { 'result': 1, 'error': 'urlopen returned error' }
    data = f.read()
    js_data = json.loads(data)
  except Exception, e:
    return { 'result': 2, 'error': e }
  else:
    return { 'result': 0, 'data': js_data }

print rpc('http://server.local/rpc', {'x': u'тест'})

I use request to do the same in node.js:

var request = require('request')

request.post('http://server.local/rpc', {
    json: {'x': 'тест'}
}, function(err, result) {
    console.log(err, result.body)
})

It works, but the unicode data is garbled, so that I get ÑеÑÑ instead of тест when querying the data back. It seems strange, given that both python and node.js should be sending utf8-encoded data.
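For reference, output of this kind is what typically appears when UTF-8 bytes are decoded as Latin-1 somewhere along the way. The sketch below reproduces that effect locally. It assumes a Node.js version that provides Buffer.from, and it is only a guess at what happens on the server, whose internals are unknown here.

```javascript
// Assumption: the bytes are misread as Latin-1; what the server really does is unknown.
var original = 'тест';
var utf8Bytes = Buffer.from(original, 'utf8');  // 8 bytes: d1 82 d0 b5 d1 81 d1 82
var garbled = utf8Bytes.toString('latin1');     // each byte becomes one character
var restored = Buffer.from(garbled, 'latin1').toString('utf8');

console.log(utf8Bytes.length);      // 8
console.log(garbled.length);        // 8
console.log(restored === original); // true
```

If the round trip restores the text, as it does here, the data was never lost; it was only interpreted with the wrong encoding at some step.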

Btw, the server is written in perl, I think, but that's all I know about it :(

Also, server returns unicode data on other queries, so it is able to do that.

Upd. my console prints unicode characters fine.

Upd. Rewrote my code to use node.js http module:

var http = require('http')

var options = {
  hostname : 'server.local',
  path     : '/rpc',
  method   : 'POST'
}    
var req = http.request(options, function (res) {
  res.setEncoding('utf8');
  res.on('data', function (chunk) {
    console.log('BODY: ' + chunk);
  });
});    
var body = JSON.stringify({'x': 'тест'})    
req.setHeader('Content-length', body.length)    
// python sends data with this header
req.setHeader('Content-type', 'application/x-www-form-urlencoded')

req.on('error', function (e) {
  console.log('problem with request: ' + e);
});    
req.end(body, 'utf8');

The results are sadly the same. Also same behavior on two different installations (my personal MBA and production Debian server). So it does seem to be something with the way node.js represents unicode data.

9
  • Is your console unicode aware? Can you print a hardcoded тест in node? Commented Feb 7, 2014 at 11:56
  • Yes, console is unicode aware. Commented Feb 7, 2014 at 12:21
  • I have a feeling this could be the UCS-2 curse. Can you check the length of the body without setting any encoding (the default is a buffer)? Or, better, print the entire buffer. Commented Feb 10, 2014 at 10:13
  • To read more about it see mathiasbynens.be/notes/javascript-encoding Commented Feb 10, 2014 at 10:27
  • Well, yeah, my first thought was that something was wrong with character conversion on my side, but the characters in тест are within the BMP, and the escape sequences from both python and node.js seem to be the same (\u0442\u0435\u0441\u0442). Commented Feb 10, 2014 at 11:16

4 Answers

2
+200

Here is the request made by a python script:

POST / HTTP/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 43
Host: localhost:1234
User-Agent: Python-urllib/1.17

{"args": {"x": "\u0442\u0435\u0441\u0442"}}

Here is the request made by the node.js script:

POST /rpc HTTP/1.1
Host: localhost:1234
Content-length: 12
Content-type: application/x-www-form-urlencoded
Connection: keep-alive

{"x":"тест"}

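Do you see the issue? Both bodies are valid JSON, but JSON.stringify leaves non-ASCII characters as they are, so they go over the wire as raw UTF-8 bytes, while python's json.dumps escapes them as \uXXXX sequences by default and produces a pure-ASCII body. (The node.js request above also declares Content-length: 12, which is the number of characters; the body is 16 bytes in UTF-8.)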

If your rpc server doesn't understand utf8, you can encode json using external libraries. For example, this would work:

var request = require('request');
var jju = require('jju');

request.post({
   uri: 'http://localhost:8080/rpc',
   body: jju.stringify({args: {x: "тест"}}, {
       mode: 'json',
       indent: false,
       ascii: true,
   }),
}, function(err, res, body) {
    console.log(body);
});

With the code above the request would look like this:

POST /rpc HTTP/1.1
host: localhost:8080
content-length: 41
Connection: keep-alive

{"args":{"x":"\u0442\u0435\u0441\u0442"}}

Which is similar to what python is doing.
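If pulling in a library is not an option, the same ASCII-only output can be approximated with a small helper. This is a minimal sketch, not a description of how jju works internally: stringifyAscii is a hypothetical name, and it relies on the fact that non-ASCII characters in JSON.stringify output can only occur inside string literals, where \uXXXX escapes are legal.

```javascript
// Hypothetical helper: JSON.stringify, then escape every non-ASCII code unit.
function stringifyAscii(value) {
  return JSON.stringify(value).replace(/[\u0080-\uffff]/g, function (ch) {
    // Pad the hex code to four digits, e.g. "т" -> "\u0442".
    return '\\u' + ('0000' + ch.charCodeAt(0).toString(16)).slice(-4);
  });
}

var body = stringifyAscii({ args: { x: 'тест' } });
console.log(body);                            // {"args":{"x":"\u0442\u0435\u0441\u0442"}}
console.log(body.length);                     // 41
console.log(Buffer.byteLength(body, 'utf8')); // 41
```

Because the result is pure ASCII, the character count and the byte count agree, so body.length happens to be a safe Content-length for this body.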


3 Comments

Wow... Thank you very much! Also jju is ridiculously difficult to find with google, so here's a link for future generations : jju
it's on github and installable from npm under the same name, I didn't think it's that difficult
Yeah, it's just that google can't seem to find it :) Npm did, so not that big a problem, but still.
0

Well, try eliminating variables.

Use the native http.request instead of the request module, even if it's more complicated, it'll eliminate request as a possible culprit.

When you send your data, make the utf8 encoding explicit.

I don't know enough about the internals of the request module to figure out where the breakdown might be occurring or if there are options you need to pass, but this would at least give you a way to figure out if the default node http.request could get you unstuck, or if there appears to be a deeper issue with your install.
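A rough sketch of what "make the utf8 encoding explicit" could look like with the native http module: encode the body into a Buffer once, then derive the headers from those bytes. buildJsonPost and send are hypothetical helper names, the host and path are taken from the question, and the content type is an assumption about what the server accepts. Buffer.from requires a reasonably recent Node.js.

```javascript
var http = require('http');

// Hypothetical helper: encode the payload once and derive the headers from the bytes.
function buildJsonPost(hostname, path, payload) {
  var body = Buffer.from(JSON.stringify(payload), 'utf8');
  return {
    options: {
      hostname: hostname,
      path: path,
      method: 'POST',
      headers: {
        // Assumption: the server accepts this content type.
        'Content-Type': 'application/json; charset=utf-8',
        'Content-Length': body.length // byte count, not character count
      }
    },
    body: body
  };
}

// Hypothetical sender: nothing goes over the network until this is called.
function send(post, callback) {
  var req = http.request(post.options, function (res) {
    var chunks = [];
    res.on('data', function (chunk) { chunks.push(chunk); });
    res.on('end', function () {
      callback(null, Buffer.concat(chunks).toString('utf8'));
    });
  });
  req.on('error', callback);
  req.end(post.body);
}

var post = buildJsonPost('server.local', '/rpc', { x: 'тест' });
console.log(post.options.headers['Content-Length']); // 16
```

Calling send(post, function (err, text) { console.log(err, text); }) would then perform the request. Collecting the response as Buffers and decoding once at the end avoids splitting a multi-byte character across chunks.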

1 Comment

hrmph. I'll try it on my machine on the off chance that I get something different.
0

The real problem is that you are telling the server you will send less data than you actually send. So when the server tries to decode the truncated data, it gets corrupted.

body.length gives you the number of UTF-16 code units in the string body, not the number of bytes. For US-ASCII characters one code unit is one byte in UTF-8, but that does not hold for non-US-ASCII characters.

With the logic used here, every non-US-ASCII character adds one or more extra bytes that you have to count in the Content-length header.

http://en.wikipedia.org/wiki/UTF-8

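Change line 15 (the Content-length line) to:

// count the UTF-8 continuation bytes; match() returns null when there are none
var utf8overLoad = (encodeURIComponent(body).match(/%[89ABab]/g) || []).length;
var bodylength = body.length + utf8overLoad;
req.setHeader('Content-length', bodylength);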
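As an alternative sketch, Node.js can report the byte count directly through Buffer.byteLength, which avoids counting escapes by hand. The body below is the one from the question.

```javascript
var body = JSON.stringify({ x: 'тест' });

var charCount = body.length;                     // UTF-16 code units
var byteCount = Buffer.byteLength(body, 'utf8'); // bytes actually sent

console.log(charCount); // 12
console.log(byteCount); // 16

// In the question's code the header line would then read:
// req.setHeader('Content-length', Buffer.byteLength(body, 'utf8'));
```

Buffer.byteLength also stays correct for characters outside the BMP, where one character is two code units in the string but four bytes in UTF-8.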

Comments

0

I think this might come from your server or your console.

I've just tested both a client and a server written in NodeJS, in a console with utf8 capabilities (font 'Lucida Console'), and it works. Server code:

var express = require('express');

var app = express()
  .use(express.methodOverride())
  .use(express.bodyParser())
  .post('/rpc', function(req, res) {
    console.log(JSON.stringify(req.body, null, 2));
    res.send(req.body.x);
  });

app.listen(8080);

(using [email protected])

Console output on request:

{
   "x": "тест"
}

Client code:

var request = require('request');

request.post({
  uri:'http://localhost:8080/rpc',
  json:{"x": "тест"}
}, function(err, res, body) { 
   console.log(body); 
});

(using [email protected])

Console output:

тест

The whole thing runs on NodeJS 0.10.4.

It works also by using an HTTP client like curl or the "Advanced REST Client" chrome extension.

But the important thing is that the request is sent with the "application/json" content type (not the classical x-www-form-urlencoded), and that the bodyParser() middleware on the server performs the JSON deserialization.

Comments
