
I'm getting a MapperParsingException while trying to upload a large JSON file. Here is the full error that I get back from Elasticsearch:

on [[sample][4]]
MapperParsingException[failed to parse]; nested: IllegalArgumentException[Malformed content, found extra data after parsing: START_OBJECT];
    at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:156)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:309)
    at org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:529)
    at org.elasticsearch.index.shard.IndexShard.prepareCreateOnPrimary(IndexShard.java:506)
    at org.elasticsearch.action.index.TransportIndexAction.prepareIndexOperationOnPrimary(TransportIndexAction.java:214)
    at org.elasticsearch.action.index.TransportIndexAction.executeIndexRequestOnPrimary(TransportIndexAction.java:223)
    at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:157)
    at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:66)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.doRun(TransportReplicationAction.java:657)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:287)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:279)
    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:77)
    at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: Malformed content, found extra data after parsing: START_OBJECT
    at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:141)
    ... 17 more

I'm trying to better understand why exactly the data I'm trying to feed in is malformed, and what I could do to better debug this situation.

EDIT: This is a massive file with 200 million entries, but here is an example data point:

{"company":"E-Corp","title":"Sith lord","people":[{"id":"12345","name":"Darth Vader","title":"The Sith Lord"}]}

  • Could you give us some more information? A snippet of the JSON would be useful. Commented Sep 30, 2016 at 23:30
  • @SimonLudwig This file has 200 million entries, and not all the entries have all the data filled out, but I can show a few examples. Commented Sep 30, 2016 at 23:32

3 Answers


Make sure every odd line is the action line (which carries the metadata, including any unique id):

{ "index": {}}

And that every even line is the document source:

{"company":"E-Corp","title":"Sith lord","people":[{"id":"12345","name":"Darth Vader","title":"The Sith Lord"}]}

And use the _bulk endpoint when adding to Elasticsearch:

POST /index/type/_bulk
{ "index": {}}
{"company":"E-Corp","title":"Sith lord","people":[{"id":"12345","name":"Darth Vader","title":"The Sith Lord"}]}
{ "index": {}}
{"company":"E-Corp","title":"Sith lord","people":[{"id":"12345","name":"Darth Vader","title":"The Sith Lord"}]}
{ "index": {}}
{"company":"E-Corp","title":"Sith lord","people":[{"id":"12345","name":"Darth Vader","title":"The Sith Lord"}]}

Just guessing, based on the error message found extra data after parsing: START_OBJECT from your log.
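The alternating action/source layout above can be sketched as a small helper. This is a minimal illustration, not Elasticsearch client code; the function name `to_bulk_body` is made up for this example:

```python
import json

def to_bulk_body(docs):
    """Build an Elasticsearch _bulk request body: an action line before
    each document, newline-delimited, with a trailing newline."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))  # odd line: action metadata
        lines.append(json.dumps(doc))            # even line: document source
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline

docs = [
    {"company": "E-Corp", "title": "Sith lord",
     "people": [{"id": "12345", "name": "Darth Vader", "title": "The Sith Lord"}]},
]
body = to_bulk_body(docs)
```

Note that the bulk body is newline-delimited JSON, not one big JSON array; feeding a plain JSON array to the index endpoint is exactly the kind of input that produces "found extra data after parsing".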


1 Comment

Yes, that is the error message. Here is how my index looks: localhost:9200/sample. Would that mean my curl statement should look like curl -XPOST localhost:9200/sample/_bulk --data-binary @output.json?

Are you specifying a mapping? If you are not, then Elasticsearch will create a mapping based on the first document. If any of the other documents then have values which do not fit those particular fields, you might get an error.

https://www.elastic.co/guide/en/elasticsearch/guide/current/custom-dynamic-mapping.html

For example, company is probably going to be mapped as string, but if a document comes along with a number or date in that field, then an error might be thrown.

You also have nested documents (people) - I would look into that as well. Can you try taking a few sample documents, say the first 10, and see if you can index them using the bulk API?

Or you can create your own mapping for each of these fields, since you do not seem to have a lot of fields per document.
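One way to spot this kind of dynamic-mapping conflict before indexing is to scan a sample of documents for fields whose JSON type varies. This is a rough sketch, not part of any Elasticsearch API; `field_type_conflicts` is a hypothetical helper and only checks top-level fields:

```python
def field_type_conflicts(docs):
    """Report top-level fields whose JSON types differ between documents,
    a common way to trip a dynamically created Elasticsearch mapping."""
    seen = {}       # field -> first type name seen
    conflicts = {}  # field -> set of conflicting type names
    for doc in docs:
        for field, value in doc.items():
            t = type(value).__name__
            if field in seen and seen[field] != t:
                conflicts.setdefault(field, {seen[field]}).add(t)
            else:
                seen[field] = t
    return conflicts

sample = [
    {"company": "E-Corp", "people": [{"id": "12345"}]},
    {"company": 42, "people": [{"id": "67890"}]},  # number where a string was first seen
]
```

Running this over the first few thousand entries of the file would flag fields like company above, where a string and a number collide.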



You can get this error:

Malformed content, found extra data after parsing: START_OBJECT

sent back by Elasticsearch when your URL does not end with /_bulk.

Elasticsearch then does not expect to find a linefeed and extra data after the last correctly closed curly bracket, and discards the extra data. This matters in particular when issuing the call through libcurl, namely if you use

curl_easy_setopt(curl, CURLOPT_URL, str);

str should be well formed: for example, str should be equal to 'http://localhost:9200/_bulk' and not 'http://localhost:9200'.
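A defensive way to avoid this mistake is to normalize the URL before handing it to the HTTP client. A minimal sketch, with the hypothetical helper name `bulk_url`:

```python
def bulk_url(base):
    """Hypothetical helper: ensure the request URL ends in /_bulk before
    passing it to the HTTP client (e.g. libcurl's CURLOPT_URL)."""
    base = base.rstrip("/")
    return base if base.endswith("/_bulk") else base + "/_bulk"
```

For instance, bulk_url("http://localhost:9200") yields the /_bulk endpoint, and an already-correct URL is left unchanged.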
