2

I have a csv like this:

     Art          Category  LEVEL 2       LEVEL 3  LEVEL 4      LEVEL 5   Location
0    PRINTMAKING  VISUAL    CONTEMPORARY  2D       NaN          NaN       NaN
1    PAINTING     VISUAL    CONTEMPORARY  2D       NaN          NaN       NaN
2    AERIAL       VISUAL    CONTEMPORARY  2D       PHOTOGRAPHY  AERIAL    NaN
3    WILDLIFE     VISUAL    CONTEMPORARY  2D       PHOTOGRAPHY  WILDLIFE  NaN
4    NATURE       VISUAL    CONTEMPORARY  2D       PHOTOGRAPHY  NATURE    NaN

The art and category will always be present, but the levels from l1 to l6 can be null. What I want to achieve is this:

art: PRINTMAKING
category: VISUAL
tags: [CONTEMPORARY, 2D]

The levels are basically tags for a particular art, and they should be stored in an array.

I am new to Python, and so far I have written the following code. How can I achieve this?

import pandas as pd
import json
data = pd.read_excel("C:\\Users\\Desktop\\visual.xlsx")
rec = {}
rec['art'] = data['Art']
rec['category'] = data['Category']
rec['tags'] = data['LEVEL 2'] + ',' + data['LEVEL 3'] + ',' + data['LEVEL 4'] + ',' + data['LEVEL 5']

I guess this is not the correct way to do it.

5
  • It's unclear what you're asking Commented Feb 28, 2019 at 6:05
  • You can understand it this way: every art is in a category and has tags. The tags are in columns, which need to be stored as an array. Commented Feb 28, 2019 at 6:09
  • It would be easier to understand if you showed what your dataframe looks like at the moment and what your expected output should be. Commented Feb 28, 2019 at 6:11
  • What problems are you having? Commented Feb 28, 2019 at 6:18
  • I don't know how to proceed with the current code. I have edited the question for better understanding. Commented Feb 28, 2019 at 6:21

4 Answers

2

To convert the values of the LEVEL columns into lists of tags without NaNs, use:

df['tags'] = df.filter(like='LEVEL').apply(lambda x: x.dropna().tolist(), axis=1)
#alternative, should be faster
#df['tags'] = [[y for y in x if isinstance(y, str)] for x in
#                 df.filter(like='LEVEL').values]

d = df[['Art','Category','tags']].to_dict(orient='records')

[{
    'Art': 'PRINTMAKING',
    'Category': 'VISUAL',
    'tags': ['CONTEMPORARY', '2D']
}, {
    'Art': 'PAINTING',
    'Category': 'VISUAL',
    'tags': ['CONTEMPORARY', '2D']
}, {
    'Art': 'AERIAL',
    'Category': 'VISUAL',
    'tags': ['CONTEMPORARY', '2D', 'PHOTOGRAPHY', 'AERIAL']
}, {
    'Art': 'WILDLIFE',
    'Category': 'VISUAL',
    'tags': ['CONTEMPORARY', '2D', 'PHOTOGRAPHY', 'WILDLIFE']
}, {
    'Art': 'NATURE',
    'Category': 'VISUAL',
    'tags': ['CONTEMPORARY', '2D', 'PHOTOGRAPHY', 'NATURE']
}]
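
For reference, here is a self-contained sketch of the same idea that ends with JSON output. The DataFrame is rebuilt by hand from the sample in the question (an assumption, since the real data comes from visual.xlsx), and the lowercase key names are chosen only to match the output the question asks for.

```python
import json

import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the question (assumption: the real
# file uses the same column names).
df = pd.DataFrame({
    "Art": ["PRINTMAKING", "PAINTING", "AERIAL"],
    "Category": ["VISUAL", "VISUAL", "VISUAL"],
    "LEVEL 2": ["CONTEMPORARY"] * 3,
    "LEVEL 3": ["2D"] * 3,
    "LEVEL 4": [np.nan, np.nan, "PHOTOGRAPHY"],
    "LEVEL 5": [np.nan, np.nan, "AERIAL"],
    "Location": [np.nan] * 3,
})

# Collect the non-null LEVEL values of each row into a list.
level_values = df.filter(like="LEVEL").to_numpy()
tags = [[v for v in row if pd.notna(v)] for row in level_values]

# Build one dict per row with the key names used in the question.
records = [
    {"art": art, "category": category, "tags": row_tags}
    for art, category, row_tags in zip(df["Art"], df["Category"], tags)
]

print(json.dumps(records, indent=2))
```

Building the records with a plain comprehension avoids a rename step after to_dict and keeps the Location column out of the result.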

9 Comments

is there a way to convert all into lower case?
@KaranGupta - sure, change .apply(lambda x: x.dropna().tolist(), axis=1) to .apply(lambda x: x.str.lower().dropna().tolist(), axis=1)
@KaranGupta - But this only works if all values are strings or NaN (None).
Not all values are set; some can be null. I didn't follow you.
@KaranGupta - OK, what I mean is that it will not work if some of the values are numeric.
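
A minimal sketch of a lowercasing variant that also tolerates numeric values, as discussed in the comments above. The sample frame and the numeric 3 in it are hypothetical, added only to illustrate the mixed-type case; each remaining value is cast to str before lowering.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one LEVEL value is numeric, some are missing.
df = pd.DataFrame({
    "LEVEL 2": ["CONTEMPORARY", "CONTEMPORARY"],
    "LEVEL 3": ["2D", 3],
    "LEVEL 4": [np.nan, "PHOTOGRAPHY"],
})

# Skip nulls, then cast to str so that .lower() is always available.
level_values = df.filter(like="LEVEL").to_numpy()
df["tags"] = [
    [str(v).lower() for v in row if pd.notna(v)]
    for row in level_values
]

print(df["tags"].tolist())
```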
1

df

   Art     Category   LEVEL             2 LEVEL.1            3   LEVEL.2   4  \
0    0  PRINTMAKING  VISUAL  CONTEMPORARY      2D          NaN       NaN NaN   
1    1     PAINTING  VISUAL  CONTEMPORARY      2D          NaN       NaN NaN   
2    2       AERIAL  VISUAL  CONTEMPORARY      2D  PHOTOGRAPHY    AERIAL NaN   
3    3     WILDLIFE  VISUAL  CONTEMPORARY      2D  PHOTOGRAPHY  WILDLIFE NaN   
4    4       NATURE  VISUAL  CONTEMPORARY      2D  PHOTOGRAPHY    NATURE NaN   

   LEVEL.3   5  Location  
0      NaN NaN       NaN  
1      NaN NaN       NaN  
2      NaN NaN       NaN  
3      NaN NaN       NaN  
4      NaN NaN       NaN  

df = df.set_index(['Art','Category']).apply(lambda x: [','.join([str(a) for a in x.values if str(a) != 'nan'])], axis=1)

print(df.reset_index(name='tags'))

   Art     Category                                           tags
0    0  PRINTMAKING                       [VISUAL,CONTEMPORARY,2D]
1    1     PAINTING                       [VISUAL,CONTEMPORARY,2D]
2    2       AERIAL    [VISUAL,CONTEMPORARY,2D,PHOTOGRAPHY,AERIAL]
3    3     WILDLIFE  [VISUAL,CONTEMPORARY,2D,PHOTOGRAPHY,WILDLIFE]
4    4       NATURE    [VISUAL,CONTEMPORARY,2D,PHOTOGRAPHY,NATURE]

To dict

df.reset_index(name='tags').to_dict(orient='records')

Output

[{'Art': 0, 'Category': 'PRINTMAKING', 'tags': ['VISUAL,CONTEMPORARY,2D']},
 {'Art': 1, 'Category': 'PAINTING', 'tags': ['VISUAL,CONTEMPORARY,2D']},
 {'Art': 2,
  'Category': 'AERIAL',
  'tags': ['VISUAL,CONTEMPORARY,2D,PHOTOGRAPHY,AERIAL']},
 {'Art': 3,
  'Category': 'WILDLIFE',
  'tags': ['VISUAL,CONTEMPORARY,2D,PHOTOGRAPHY,WILDLIFE']},
 {'Art': 4,
  'Category': 'NATURE',
  'tags': ['VISUAL,CONTEMPORARY,2D,PHOTOGRAPHY,NATURE']}]

2 Comments

I need a dictionary as described in the question
AttributeError: 'Series' object has no attribute 'set_index'
0

This might solve your problem:

from io import StringIO
import csv

categories = """art,category,l1,l2,l3,l4,l5,l6
a1,c1,abc,def
a2,c2,,,,xyz,pqr,
a3,c3,lmn,,,qwe,rtg,
"""

f = StringIO(categories)
rows = csv.DictReader(f, delimiter=',')
data = []
for row in rows:
    # take art and category out of the row; what remains are the levels
    d = {
        "category": row.pop("category", ''),
        "art": row.pop("art", '')
    }
    # keep only non-empty levels (short rows yield None, blank cells yield '')
    d["levels"] = [v for v in row.values() if v]
    data.append(d)
    print(d)

Sample output:

{'category': 'c1', 'art': 'a1', 'levels': ['abc', 'def']}
{'category': 'c2', 'art': 'a2', 'levels': ['xyz', 'pqr']}
{'category': 'c3', 'art': 'a3', 'levels': ['lmn', 'qwe', 'rtg']}

0

You should use pd.Series.str.cat combined with functools.reduce to concatenate all of the tags:

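import pandas as pd
from functools import reduce

df = pd.DataFrame({
    'art': ['a1', 'a2', 'a3'],
    'category': ['c1', 'c2', 'c3'],
    'l1': ['abc', '', 'lmn'],
    'l2': ['def', 'xyz', 'qwe'],
})

tag_cols = [x for x in df.columns if x not in ['art', 'category']]
# Join the tag columns pairwise into one comma-separated string per row
# (na_rep='' treats NaN as an empty tag), then split and drop empty tags.
joined = reduce(lambda a, b: a.str.cat(b, sep=',', na_rep=''),
                [df[c] for c in tag_cols])
df['tags'] = joined.apply(lambda x: [t for t in x.split(',') if t != ''])
d = df.to_dict(orient='records')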

Output

  [{'art': 'a1',
  'category': 'c1',
  'l1': 'abc',
  'l2': 'def',
  'tags': ['abc', 'def']},
 {'art': 'a2', 'category': 'c2', 'l1': '', 'l2': 'xyz', 'tags': ['xyz']},
 {'art': 'a3',
  'category': 'c3',
  'l1': 'lmn',
  'l2': 'qwe',
  'tags': ['lmn', 'qwe']}]

2 Comments

I need a dictionary bro
Yeah I missed that. Edited the post for you with a dict.
