I need to encrypt and decrypt (AES) data arriving through Pub/Sub or a GCS bucket. The volume is large (terabytes of records). I have Apache Beam code running on Dataflow that applies some transformations (group by, etc.). While processing the data I need to encrypt a few PII fields, and I will also need to decrypt these records in the future. The processed data is written to BigQuery.

My decryption request in BigQuery would look something like this:

select firstname, lastname from table where id=1234

In this example, I have previously encrypted the first name, last name, and id (deterministically), because they contain PII. The encryption should be based on the id (1234).

The encrypted first name and last name for id 567 should differ from those for id 1234.

When I run a query with where id=1234, the 1234 is the id in clear text (unencrypted form).

Is there any way to implement this kind of encryption/decryption mechanism in GCP / Apache Beam / Dataflow? I don't want to use DLP, as it has some limitations.

1 Answer

I don't quite understand what you are trying to do on the querying side: if you store encrypted IDs, you would have to send the encrypted IDs when querying.
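
One way to reconcile a clear-text id=1234 in the query with a protected ID column, offered as a minimal sketch rather than a definitive design: store a deterministic keyed hash (HMAC-SHA256) of the ID and compute the same hash at query time. `MASTER_KEY`, `my_table`, and the `id_token` column are hypothetical names introduced here; a real key would come from a secret manager, not from source code.

```python
import hashlib
import hmac
from typing import Dict, Tuple

# Hypothetical secret, for illustration only; load a real one from a secret manager.
MASTER_KEY = b"example-master-key-do-not-use"


def id_token(master_key: bytes, clear_id: str) -> str:
    """Return a deterministic keyed hash of the ID (same ID, same token)."""
    message = b"id-token:" + clear_id.encode("utf-8")
    return hmac.new(master_key, message, hashlib.sha256).hexdigest()


def build_query(master_key: bytes, clear_id: str) -> Tuple[str, Dict[str, str]]:
    """Translate a clear-text ID lookup into a lookup on the stored token.

    The table and column names are assumptions, not part of the question.
    """
    sql = "SELECT firstname, lastname FROM my_table WHERE id_token = @id_token"
    return sql, {"id_token": id_token(master_key, clear_id)}


if __name__ == "__main__":
    sql, params = build_query(MASTER_KEY, "1234")
    print(sql)
    print(params)
```

A keyed hash is one-way, so this supports lookup only. If the original ID has to be recovered from the table alone, an encrypted copy of it would need to be stored as well.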

As for Apache Beam / Dataflow: yes, you can have your job wait for Pub/Sub or Cloud Storage data and apply encryption before writing it somewhere else (e.g., BigQuery).

If you can encrypt/hash the data using JavaScript, you may even be able to use one of the Google-provided templates:

  • Pub/Sub Subscription to BigQuery
  • Pub/Sub Topic to BigQuery
  • Pub/Sub Avro to BigQuery
  • Pub/Sub Proto to BigQuery
  • Cloud Storage Text to BigQuery (Stream)

If they do not fit, the code is open-source at https://github.com/GoogleCloudPlatform/DataflowTemplates and you can change / build your own pipeline.


3 Comments

Yes, that makes sense. However, if I encrypt the data/record, how do I get it back? For example, if I run the query select firstname, lastname from table where id=1234, how do I retrieve the decrypted records? Yes, we can use Dataflow to fetch this data from BQ and process it. But how do we know that we are applying decryption for that particular ID? I might have to decrypt these IDs to send the records to other downstream systems.
You can use BigQueryIO to read and write, and use your own ParDo/DoFn to encrypt/decrypt the data as you want.
But how do I know my original id value while doing decryption? I mean, my first and last name would have been encrypted using the id, and the id itself is encrypted using the id. So when I decrypt, I would need to apply the decryption based on the id, and only then can I retrieve my first and last name.
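
The per-element logic that a ParDo/DoFn from the comments above would carry might look like the following sketch. It assumes the caller always knows the clear-text ID at decryption time (it is the value in the where clause), so a per-ID key can be re-derived from it. The cipher is injected as a callable because AES is not in the Python standard library; a deterministic AES mode from a vetted library would be plugged in there. All function and field names other than id, firstname, and lastname are hypothetical.

```python
import hashlib
import hmac
from typing import Callable, Dict, Sequence

# (key, data) -> data; stands in for a real deterministic AES encrypt/decrypt.
Cipher = Callable[[bytes, bytes], bytes]

PII_FIELDS = ("firstname", "lastname")


def derive_key(master_key: bytes, clear_id: str) -> bytes:
    """Derive a 32-byte per-ID key, so ID 567 and ID 1234 get different keys."""
    message = b"record-key:" + clear_id.encode("utf-8")
    return hmac.new(master_key, message, hashlib.sha256).digest()


def id_token(master_key: bytes, clear_id: str) -> str:
    """Deterministic lookup token stored in place of the clear-text ID."""
    message = b"id-token:" + clear_id.encode("utf-8")
    return hmac.new(master_key, message, hashlib.sha256).hexdigest()


def protect_record(record: Dict[str, str], master_key: bytes, encrypt: Cipher,
                   pii_fields: Sequence[str] = PII_FIELDS) -> Dict[str, object]:
    """Replace the ID with its token and encrypt PII fields under the per-ID key."""
    clear_id = record["id"]
    key = derive_key(master_key, clear_id)
    out: Dict[str, object] = {
        name: value for name, value in record.items()
        if name != "id" and name not in pii_fields
    }
    out["id_token"] = id_token(master_key, clear_id)
    for name in pii_fields:
        out[name] = encrypt(key, record[name].encode("utf-8"))
    return out


def reveal_record(row: Dict[str, object], clear_id: str, master_key: bytes,
                  decrypt: Cipher,
                  pii_fields: Sequence[str] = PII_FIELDS) -> Dict[str, object]:
    """Decrypt a stored row, re-deriving the key from the clear-text ID in the query."""
    key = derive_key(master_key, clear_id)
    out = dict(row)
    out.pop("id_token", None)
    out["id"] = clear_id
    for name in pii_fields:
        out[name] = decrypt(key, row[name]).decode("utf-8")
    return out
```

In a Beam pipeline, `protect_record` could be applied per element (for example through `beam.Map` or from a DoFn's `process` method) before the write to BigQuery, and `reveal_record` after reading the row whose `id_token` matches the requested ID.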
