6

I have a df with 300 columns, but there is one column, ID, that I want to encrypt so that anyone else with a key can decrypt it when I give them the df as a CSV.

Is this possible?

I know how to hash a column, but as far as I have read, a hash cannot be reversed, and there is no key I could give someone to recover the original values.

Thank you in advance.

edit:

df

id
1
2
3

@Wen, is this a good example?

(1:2), (2:3), (3:4)

new df

id
2
3
4
4
  • You'll have to use a third-party library. I'd recommend pycryptodome. Commented Aug 31, 2018 at 13:23
  • @t.m.adam thank you for the suggestion. I will try out some examples. Commented Aug 31, 2018 at 13:26
  • Why not replace it with random numbers and only provide the mapping dict to those people? Commented Aug 31, 2018 at 13:38
  • @Wen I made an edit, is the example I provided what you are looking for? Commented Aug 31, 2018 at 13:40

4 Answers

7

I'd recommend the Python itsdangerous library. Here is a quick example:

from itsdangerous import URLSafeSerializer

s = URLSafeSerializer('secret-key')

print(s.dumps([1, 2, 3, 4]))

# 'WzEsMiwzLDRd.wSPHqC0gR7VUqivlSukJ0IeTDgo'

print(s.loads('WzEsMiwzLDRd.wSPHqC0gR7VUqivlSukJ0IeTDgo'))

# [1, 2, 3, 4]

The secret key can be shared between you and the other trusted party so that they can load the strings or columns and verify that they were not modified.

This does rely on serialization, however, and some Python data types aren't easily serialized. If you just need a column name or something like that, this could work well.

I would like to add a qualification here that this process only obfuscates the data, but does not actually encrypt it. I did not fully understand that when I originally answered this question. This obfuscation may be enough for your needs, but please be aware! From the docs:

The receiver can decode the contents and look into the package, but they can not modify the contents unless they also have your secret key. Docs
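To make the qualification above concrete, here is a minimal sketch using only the standard library. It takes the token printed in the example and reads the payload without knowing 'secret-key'. It assumes the token layout is a URL-safe base64 payload, a dot, then the signature, which matches the output shown above; the variable names are only illustrative.

```python
import base64
import json

# Token copied from the example output above.
token = "WzEsMiwzLDRd.wSPHqC0gR7VUqivlSukJ0IeTDgo"

# Everything before the first dot is the payload; the rest is the signature.
payload_b64, signature = token.split(".", 1)

# The payload is URL-safe base64 with the "=" padding stripped, so restore it.
padded = payload_b64 + "=" * (-len(payload_b64) % 4)

# No secret key is involved in this step.
payload = json.loads(base64.urlsafe_b64decode(padded))
print(payload)  # [1, 2, 3, 4]
```

The key only protects against tampering, so this approach is not suitable if the ID values themselves must stay hidden.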


2 Comments

thank you for the suggestion. If I encrypt the column and save it as a csv, would the other party simply read in the csv, and apply the decryption key?
They would have to iterate over each row in the CSV and decrypt the data in that column. Another approach is just to use this library on the entire CSV output and have them decrypt the whole thing once. Or you could write a simple decryption script that the user can run on the CSV so these steps occur automatically.
7

You can use cryptpandas.

As an example, if you have a pandas dataframe

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3],
                   'B': ['one', 'one', 'four']})

you can encrypt it as

import cryptpandas as crp

crp.to_encrypted(df, password='mypassword123', path='file.crypt')

and decrypt it as

decrypted_df = crp.read_encrypted(path='file.crypt', password='mypassword123')

P.S. More info here.


1

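I think you can do it this way:

import numpy as np
key = dict(zip(np.arange(len(df)), df.id))  # row position -> original id; this dict is the "key"
df.id = np.arange(len(df))  # replace the ids with row positions

# For the person who does not have the key: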

df
Out[640]:
   id
0   0
1   1
2   2


# For the person who has the key:

df.id=df.id.map(key.get)

df
Out[642]: 
   id
0   1
1   2
2   3
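A variation on the same idea, following the comment under the question about random numbers: replace each ID with a random surrogate instead of its row position, so the masked values reveal nothing about row order. This is a rough sketch using only the standard library. The list `ids` stands in for `df.id.tolist()`, and `make_surrogates` is a hypothetical helper, not part of any library.

```python
import secrets

def make_surrogates(values):
    """Return a dict mapping each distinct value to a unique random hex token."""
    mapping = {}
    used = set()
    for value in values:
        if value in mapping:
            continue  # repeated ids reuse the same surrogate
        token = secrets.token_hex(8)
        while token in used:  # regenerate in the unlikely case of a collision
            token = secrets.token_hex(8)
        used.add(token)
        mapping[value] = token
    return mapping

ids = [1, 2, 3]  # stand-in for df.id.tolist()

surrogate_for = make_surrogates(ids)
masked_ids = [surrogate_for[i] for i in ids]  # what goes into the shared CSV

# The reverse mapping is the "key" to hand to trusted recipients.
original_for = {token: original for original, token in surrogate_for.items()}
restored_ids = [original_for[token] for token in masked_ids]
```

With pandas, either dict can be passed directly to `Series.map`, for example `df.id.map(surrogate_for)`. The mapping has to be stored and shared separately from the CSV, and it grows with the number of distinct IDs.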

1 Comment

This works as well and is really simple; however, I am looking for an out-of-the-box solution. Thank you for the hard work regardless.
1

You could use AES from the Crypto.Cipher package (provided by PyCryptodome). I wrote some helper functions to encrypt sets of columns in a pandas dataframe. Examples here if helpful: https://github.com/bennywij/junk-drawer/blob/master/secret_pandas.py
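For readers who want something self-contained, here is a minimal sketch of encrypting a single column. Note that it swaps in Fernet from the third-party `cryptography` package rather than Crypto.Cipher, and it is not the code from the linked repository. Fernet uses AES together with an HMAC, so the sketch avoids handling modes, IVs, and padding by hand. The column names and the helpers `encrypt_value` and `decrypt_value` are made up for illustration.

```python
import pandas as pd
from cryptography.fernet import Fernet

# Generate a key once and share it with trusted recipients over a secure channel.
key = Fernet.generate_key()
fernet = Fernet(key)

# Toy frame standing in for the real 300-column df.
df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})

def encrypt_value(value):
    """Encrypt one cell and return a text token that is safe to write to CSV."""
    return fernet.encrypt(str(value).encode("utf-8")).decode("ascii")

def decrypt_value(token):
    """Reverse encrypt_value; raises InvalidToken if the key is wrong."""
    return fernet.decrypt(token.encode("ascii")).decode("utf-8")

# Sender: encrypt only the id column, then write the CSV as usual,
# e.g. encrypted.to_csv("shared.csv", index=False)
encrypted = df.copy()
encrypted["id"] = encrypted["id"].map(encrypt_value)

# Recipient: read the CSV, rebuild Fernet(key), and decrypt the column.
restored = encrypted.copy()
restored["id"] = restored["id"].map(decrypt_value).astype(int)
```

Fernet tokens are randomized, so the same ID encrypts to a different string each time. That hides repeated values, but it also means the encrypted column cannot be used for joins or grouping until it is decrypted.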
