unnest a nested map using miller

Question

I'm trying to convert this CSV

dir1/key1=val1/,MOVED_TO,123.pq
dir2/key1=val2/key2=val3/,MOVED_TO,456.pq

into this desired output (line-oriented json):

{"type":"quux","top"="dir1","key1":"val1","event":"MOVED_TO"}
{"type":"quux","top"="dir2","key1":"val2","key2":"val3","event":"MOVED_TO"}

So far I've gotten close, but I still have $key nested.

echo -e 'dir1/key1=val1/,MOVED_TO,123.pq
dir2/key1=val2/key2=val3/,MOVED_TO,456.pq' | \
  mlr --hi --records-per-batch 1 --csv --allow-ragged-csv-input --hi --no-jlistwrap --ojsonl \
  label dirname,event,filename \
  then put '$type = "quux"; $top = sub($dirname,"/.*",""); $key = splitkv(sub(sub($dirname, "^[^/]*/", ""), "/$", ""), "=", "/")' \
  then cut -f type,top,key,event

{"event": "MOVED_TO", "type": "quux", "top": "dir1", "key": {"key1": "val1"}}
{"event": "MOVED_TO", "type": "quux", "top": "dir2", "key": {"key1": "val2", "key2": "val3"}}

I've been trying to use concat(.) but that seems to work on arrays and not maps as I have there.

The "type":"quux" is a static string I need to assign during the process, it is not present in the raw CSV.

How can I unnest $key?

This process is done in various places where "quux" will change based on location. It is not in the raw data, ergo why I'm adding it as a string literal in the mlr code. — r2evans
– r2evans, Commented Nov 2 at 14:24

aborruso · Accepted Answer · 2025-11-03 10:31:39Z

1

I managed to solve this! The key was using Miller's nest --explode with the right combination of separators:

mlr --icsv --hi --ojsonl --from input.csv \
  cat \
  then put '$top=regextract($1,"dir\d+");$1=sub($1,"dir\d+/","");$1=sub($1,"/$","")' \
  then cat -n \
  then rename 1,keys,2,event \
  then nest --explode --pairs --across-fields --nested-fs "/" --nested-ps "=" -f keys \
  then cut -x -f 3

Output:

{"n": 1, "key1": "val1", "event": "MOVED_TO", "top": "dir1"}
{"n": 2, "key1": "val2", "key2": "val3", "event": "MOVED_TO", "top": "dir2"}

The key steps are:

Extract the top directory prefix using regextract()
Clean up the first field by removing the prefix and trailing slash
Add a record number with cat -n
Rename fields to prepare for nesting
Use nest --explode --pairs --across-fields with --nested-fs "/" (field separator) and --nested-ps "=" (pair separator) to properly split and unnest the key-value pairs
Remove the 3 column with cut -x -f 3

The crucial part was using --across-fields instead of --across-records, and specifying both --nested-fs for splitting the pairs and --nested-ps for splitting each key from its value. This properly flattens the nested structure into the parent record.

edited Nov 3 at 10:31

answered Nov 3 at 7:51

aborruso

5,8723 gold badges27 silver badges49 bronze badges

Sign up to request clarification or add additional context in comments.

5 Comments

r2evans Nov 3 at 11:58

adding then put '$type="quux" seals the deal, thanks aborruso!

r2evans Nov 3 at 12:04

Does adding a record number with cat -n enable anything or guard against anything? It isn't in the expected output and not something I need.

aborruso Nov 3 at 12:11

I prefer to add it to debug. You can remove it at the end

r2evans Nov 3 at 17:16

mlr is a little new to me, it's good to see good dev principles as well, and that makes a lot of sense. Thank you again!

aborruso Nov 3 at 17:52

I think it's a very great product. For other questions please write an issue in the official repo

Collectives™ on Stack Overflow

unnest a nested map using miller

1 Answer 1

5 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

5 Comments

Your Answer

Sign up or log in

Post as a guest

Related