
I'm getting a MapperParsingException while trying to upload a large JSON file. Here is the full error that I get back from Elasticsearch:

on [[sample][4]]
MapperParsingException[failed to parse]; nested: IllegalArgumentException[Malformed content, found extra data after parsing: START_OBJECT];
    at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:156)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:309)
    at org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:529)
    at org.elasticsearch.index.shard.IndexShard.prepareCreateOnPrimary(IndexShard.java:506)
    at org.elasticsearch.action.index.TransportIndexAction.prepareIndexOperationOnPrimary(TransportIndexAction.java:214)
    at org.elasticsearch.action.index.TransportIndexAction.executeIndexRequestOnPrimary(TransportIndexAction.java:223)
    at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:157)
    at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:66)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.doRun(TransportReplicationAction.java:657)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:287)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:279)
    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:77)
    at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: Malformed content, found extra data after parsing: START_OBJECT
    at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:141)
    ... 17 more

I'm trying to better understand why exactly the data I'm trying to feed in is malformed, and what I could do to better debug this situation.

EDIT: This is a massive file with 200 million entries, but here is an example data point:

{"company":"E-Corp","title":"Sith lord","people":[{"id":"12345","name":"Darth Vader","title":"The Sith Lord"}]}

  • Could you give us some more information? A snippet of the JSON would be useful. Commented Sep 30, 2016 at 23:30
  • @SimonLudwig This file has 200 million entries, and not all the entries have all the data filled out, but I can show a few examples. Commented Sep 30, 2016 at 23:32

3 Answers


Make sure every odd line is the action line (which carries the metadata, including any unique id):

{ "index": {}}

And that every even line is the document source:

{"company":"E-Corp","title":"Sith lord","people":[{"id":"12345","name":"Darth Vader","title":"The Sith Lord"}]}

And use the _bulk endpoint when adding to Elasticsearch:

POST /index/type/_bulk
{ "index": {}}
{"company":"E-Corp","title":"Sith lord","people":[{"id":"12345","name":"Darth Vader","title":"The Sith Lord"}]}
{ "index": {}}
{"company":"E-Corp","title":"Sith lord","people":[{"id":"12345","name":"Darth Vader","title":"The Sith Lord"}]}
{ "index": {}}
{"company":"E-Corp","title":"Sith lord","people":[{"id":"12345","name":"Darth Vader","title":"The Sith Lord"}]}

Just guessing, based on the error message found extra data after parsing: START_OBJECT from your log.
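The alternating action/source layout above can be sketched as a small helper. This is a minimal illustration, not Elasticsearch client code; the function name `to_bulk_body` is made up for this example:

```python
import json

def to_bulk_body(docs):
    """Build an Elasticsearch _bulk request body: an action line before
    each document, newline-delimited, with a trailing newline."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))  # odd line: action metadata
        lines.append(json.dumps(doc))            # even line: document source
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline

docs = [
    {"company": "E-Corp", "title": "Sith lord",
     "people": [{"id": "12345", "name": "Darth Vader", "title": "The Sith Lord"}]},
]
body = to_bulk_body(docs)
```

Note that the bulk body is newline-delimited JSON, not one big JSON array; feeding a plain JSON array to the index endpoint is exactly the kind of input that produces "found extra data after parsing".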


1 Comment

Yes, that is the error message. Here is how my index looks: localhost:9200/sample. Would that mean my curl statement should look like curl -XPOST localhost:9200/sample/_bulk --data-binary @output.json?

Are you specifying a mapping? If you are not, then Elasticsearch will create a mapping based on the first document. If any of the other documents then have values which do not fit those particular fields, you might get an error.

https://www.elastic.co/guide/en/elasticsearch/guide/current/custom-dynamic-mapping.html

For example, company is probably going to be mapped as string, but if a document comes along with a number or date in that field, then an error might be thrown.

You also have nested documents (people) - I would look into that as well. Can you try taking a few sample documents, say the first 10, and see if you can index them using the bulk API?

Or you can create your own mapping for each of these fields, since you do not seem to have a lot of fields per document.
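One way to spot this kind of dynamic-mapping conflict before indexing is to scan a sample of documents for fields whose JSON type varies. This is a rough sketch, not part of any Elasticsearch API; `field_type_conflicts` is a hypothetical helper and only checks top-level fields:

```python
def field_type_conflicts(docs):
    """Report top-level fields whose JSON types differ between documents,
    a common way to trip a dynamically created Elasticsearch mapping."""
    seen = {}       # field -> first type name seen
    conflicts = {}  # field -> set of conflicting type names
    for doc in docs:
        for field, value in doc.items():
            t = type(value).__name__
            if field in seen and seen[field] != t:
                conflicts.setdefault(field, {seen[field]}).add(t)
            else:
                seen[field] = t
    return conflicts

sample = [
    {"company": "E-Corp", "people": [{"id": "12345"}]},
    {"company": 42, "people": [{"id": "67890"}]},  # number where a string was first seen
]
```

Running this over the first few thousand entries of the file would flag fields like company above, where a string and a number collide.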



You can get this error:

Malformed content, found extra data after parsing: START_OBJECT

sent back by Elasticsearch when your URL does not end with /_bulk.

Elasticsearch then does not expect to find a linefeed and extra data after the last correctly closed curly bracket, and discards the extra data. This matters in particular when issuing the call through libcurl, namely if you use

curl_easy_setopt(curl, CURLOPT_URL, str);

str should be well formed: for example, str should be equal to 'http://localhost:9200/_bulk' and not 'http://localhost:9200'.
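A defensive way to avoid this mistake is to normalize the URL before handing it to the HTTP client. A minimal sketch, with the hypothetical helper name `bulk_url`:

```python
def bulk_url(base):
    """Hypothetical helper: ensure the request URL ends in /_bulk before
    passing it to the HTTP client (e.g. libcurl's CURLOPT_URL)."""
    base = base.rstrip("/")
    return base if base.endswith("/_bulk") else base + "/_bulk"
```

For instance, bulk_url("http://localhost:9200") yields the /_bulk endpoint, and an already-correct URL is left unchanged.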
