
I have to insert a JSON array into Elasticsearch. The accepted answer in the link suggests inserting a header line before each JSON entry. That answer is two years old; is there a better solution available now? Do I have to edit my JSON file manually?

Is there any way to import a JSON file (containing 100 documents) into an Elasticsearch server?

[
  {
    "id":9,
    "status":"This is cool."
  },
  ...
]
  • How do you read your JSON file? i.e. what client language are you using? Commented Nov 29, 2015 at 7:55
  • It's on my machine. I'm starting out with Elasticsearch, using curl from the command line. Commented Nov 29, 2015 at 8:00
  • Can you show an excerpt of your JSON file? Commented Nov 29, 2015 at 8:22
  • @Val updated question. Commented Nov 29, 2015 at 8:26
  • @Val yes, Thanks. I modified it a bit to suit my case. Commented Nov 30, 2015 at 5:13

1 Answer


OK, then there's something pretty simple you can do with a small shell script (see below). The idea is not to edit your file manually, but to let Python do it and create another file whose format complies with what the _bulk endpoint expects (an example of that format is shown right after the steps below). It does the following:

  1. First, we declare a little Python script that reads your JSON file and creates a new one with the required file format to be sent to the _bulk endpoint.
  2. Then, we run that Python script and store the bulk file
  3. Finally, we send the file created in step 2 to the _bulk endpoint using a simple curl command
  4. There you go, you now have a new ES index containing your documents
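
For reference, the generated bulk file would look something like this for the sample documents in the question: each document is preceded by an action line (a bare {"index": {}} is enough here, since the index and type are already given in the URL in step 3), with one JSON object per line:

{"index": {}}
{"id": 9, "status": "This is cool."}
{"index": {}}
...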

bulk.sh:

#!/bin/sh

# 0. Some constants to re-define to match your environment
ES_HOST=localhost:9200
JSON_FILE_IN=/path/to/your/file.json
JSON_FILE_OUT=/path/to/your/bulk.json

# 1. Python code to transform your JSON file
PYTHON="import json,sys;
out = open('$JSON_FILE_OUT', 'w');
with open('$JSON_FILE_IN') as json_in:
    docs = json.loads(json_in.read());
    for doc in docs:
        out.write('%s\n' % json.dumps({'index': {}}));
        out.write('%s\n' % json.dumps(doc, indent=0).replace('\n', ''));
"

# 2. run the Python script from step 1
python -c "$PYTHON"

# 3. use the output file from step 2 in the curl command
curl -s -XPOST $ES_HOST/index/type/_bulk --data-binary @$JSON_FILE_OUT

You need to:

  1. save the above script in the bulk.sh file and chmod it (i.e. chmod u+x bulk.sh)
  2. modify the three variables at the top (step 0) in order to match your environment
  3. run it using ./bulk.sh
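
If you are on a recent Elasticsearch version, note that 6.x and later require an explicit content type on the _bulk request (as also mentioned in the comments below), so the curl command in step 3 would become something like:

curl -s -XPOST $ES_HOST/index/type/_bulk -H 'Content-Type: application/x-ndjson' --data-binary @$JSON_FILE_OUT

On 7.x and later, where mapping types are gone, you would also drop the type segment and post to $ES_HOST/index/_bulk instead.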

4 Comments

For recent versions of Elasticsearch, you need to add the content-type to the curl request with -H 'Content-Type: application/x-ndjson'
I know this is a fairly old thread at this point, but a question. Whenever I use this on my JSON file it adds an Index string at every character. Anybody else have this problem and how did you solve it?
@ChristopherAdkins feel free to create a new question referencing this one and illustrate your exact issue.
@val opened a new question, stackoverflow.com/questions/61580963/…
