
I have some JSON input whose shape I cannot predict, and I have to transform it so that certain fields are not logged. For instance, if I have this JSON:

{
    "id": 5,
    "name": "Peter",
    "password": "some pwd"
}

then after the transformation it should look like this:

{
    "id": 5,
    "name": "Peter"
}  

The sample above is trivial, but the real case is not so easy. I will have a set of regular expressions, and any field in the input JSON whose name matches one of them must be left out of the result. I will also have to work recursively, since there may be nested objects. I've looked at LINQ to JSON, but I haven't found anything that meets my needs.

Is there a way of doing this?

Note: This is part of a logging library. I can work with the JSON string if that is necessary or easier. The point is that at some stage of my logging pipeline I receive the object (or the string, if required), and I then need to strip sensitive data from it, such as passwords, but also any other data the client specifies.
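Both inputs mentioned in the note can be turned into the same Json.NET `JToken` tree, so the redaction logic only has to be written once. A minimal sketch, assuming the Newtonsoft.Json package is referenced; the class and method names are hypothetical:

```csharp
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Hypothetical helper: normalizes the two possible pipeline inputs into a JToken.
public static class TokenEntryPoints
{
    // Input already serialized to a JSON string.
    public static JToken FromJsonString(string json) => JToken.Parse(json);

    // Input still available as an object: serialize it straight into a token tree,
    // which avoids a serialize-to-string-then-parse round trip.
    public static JToken FromLoggedObject(object value) => JToken.FromObject(value);

    // Compact, single-line output, convenient for a log entry.
    public static string ToLogString(JToken token) => token.ToString(Formatting.None);
}
```

Whichever entry point is used, the redaction step then works on a `JToken` and never needs to know where the data came from.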

  • Parse it with JObject.Parse so you will get a JObject. Then you can just use the Remove(key) method. Commented Oct 18, 2016 at 19:07
  • Can I search the JObject using a regular expression? Commented Oct 18, 2016 at 19:09
  • You can directly use key-indexed syntax like obj["password"]. Commented Oct 18, 2016 at 19:12
  • This is simply too broad. When do you need to make the transformation? Can it be made to the raw string? Can it be made during deserialization? What are the constraints for the change? There isn't enough here to help you and the question should be closed as it is currently written. Commented Oct 18, 2016 at 19:13
  • I want to do something like parsed.Children().Where(myRegex.Matches), then remove the items that match my regex, and if any child has children of its own, do the same on them. Commented Oct 18, 2016 at 19:15
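For a flat object, the suggestions in these comments come down to a few lines: parse with `JObject.Parse`, then remove properties either by key or, as asked in the follow-up comment, by testing a regex against `Properties()`. A minimal, non-recursive sketch (class and method names are hypothetical); note that `JObject.Parse` throws if the root of the JSON is an array:

```csharp
using System.Linq;
using System.Text.RegularExpressions;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Hypothetical helpers illustrating the comment thread; top level only, no recursion.
public static class FlatRedaction
{
    // Remove one known key; JObject.Remove(string) returns false if the key is absent.
    public static string RemoveKey(string json, string key)
    {
        JObject obj = JObject.Parse(json);
        obj.Remove(key);
        return obj.ToString(Formatting.None);
    }

    // Remove every top-level property whose name matches the regex.
    public static string RemoveMatching(string json, Regex nameRegex)
    {
        JObject obj = JObject.Parse(json);
        // ToList() takes a snapshot so properties can be removed while iterating.
        foreach (JProperty prop in obj.Properties().Where(p => nameRegex.IsMatch(p.Name)).ToList())
            prop.Remove();
        return obj.ToString(Formatting.None);
    }
}
```

Handling nested objects and arrays needs recursion (or a descendant search), which is what the answers below add.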

2 Answers

Answer 1 (score 11)

You can parse your JSON into a JToken, then use a recursive helper method to match property names to your regexes. Wherever there's a match, you can remove the property from its parent object. After all sensitive info has been removed, just use JToken.ToString() to get the redacted JSON.

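Here is what the helper methods might look like:

using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using Newtonsoft.Json.Linq;

public static string RemoveSensitiveProperties(string json, IEnumerable<Regex> regexes)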
{
    JToken token = JToken.Parse(json);
    RemoveSensitiveProperties(token, regexes);
    return token.ToString();
}

public static void RemoveSensitiveProperties(JToken token, IEnumerable<Regex> regexes)
{
    if (token.Type == JTokenType.Object)
    {
        foreach (JProperty prop in token.Children<JProperty>().ToList())
        {
            bool removed = false;
            foreach (Regex regex in regexes)
            {
                if (regex.IsMatch(prop.Name))
                {
                    prop.Remove();
                    removed = true;
                    break;
                }
            }
            if (!removed)
            {
                RemoveSensitiveProperties(prop.Value, regexes);
            }
        }
    }
    else if (token.Type == JTokenType.Array)
    {
        foreach (JToken child in token.Children())
        {
            RemoveSensitiveProperties(child, regexes);
        }
    }
}

And here is a short demo of its use:

public static void Test()
{
    string json = @"
    {
      ""users"": [
        {
          ""id"": 5,
          ""name"": ""Peter Gibbons"",
          ""company"": ""Initech"",
          ""login"": ""pgibbons"",
          ""password"": ""Sup3rS3cr3tP@ssw0rd!"",
          ""financialDetails"": {
            ""creditCards"": [
              {
                ""vendor"": ""Viza"",
                ""cardNumber"": ""1000200030004000"",
                ""expDate"": ""2017-10-18"",
                ""securityCode"": 123,
                ""lastUse"": ""2016-10-15""
              },
              {
                ""vendor"": ""MasterCharge"",
                ""cardNumber"": ""1001200230034004"",
                ""expDate"": ""2018-05-21"",
                ""securityCode"": 789,
                ""lastUse"": ""2016-10-02""
              }
            ],
            ""bankAccounts"": [
              {
                ""accountType"": ""checking"",
                ""accountNumber"": ""12345678901"",
                ""financialInsitution"": ""1st Bank of USA"",
                ""routingNumber"": ""012345670""
              }
            ]
          },
          ""securityAnswers"":
          [
              ""Constantinople"",
              ""Goldfinkle"",
              ""Poppykosh""
          ],
          ""interests"": ""Computer security, numbers and passwords""
        }
      ]
    }";

    Regex[] regexes = new Regex[]
    {
        new Regex("^.*password.*$", RegexOptions.IgnoreCase),
        new Regex("^.*number$", RegexOptions.IgnoreCase),
        new Regex("^expDate$", RegexOptions.IgnoreCase),
        new Regex("^security.*$", RegexOptions.IgnoreCase),
    };

    string redactedJson = RemoveSensitiveProperties(json, regexes);
    Console.WriteLine(redactedJson);
}

Here is the resulting output:

{
  "users": [
    {
      "id": 5,
      "name": "Peter Gibbons",
      "company": "Initech",
      "login": "pgibbons",
      "financialDetails": {
        "creditCards": [
          {
            "vendor": "Viza",
            "lastUse": "2016-10-15"
          },
          {
            "vendor": "MasterCharge",
            "lastUse": "2016-10-02"
          }
        ],
        "bankAccounts": [
          {
            "accountType": "checking",
            "financialInsitution": "1st Bank of USA"
          }
        ]
      },
      "interests": "Computer security, numbers and passwords"
    }
  ]
}

Fiddle: https://dotnetfiddle.net/KcSuDt
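If you prefer, the inner loop over the regexes in `RemoveSensitiveProperties(JToken, …)` can be collapsed with `Enumerable.Any`, and C# type patterns can replace the `JTokenType` checks. This is a sketch of an equivalent formulation with the same behavior, not a different algorithm; the class name is hypothetical and C# 7 or later is assumed:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using Newtonsoft.Json.Linq;

// Hypothetical class name; same traversal as the helper above.
public static class CompactRedactor
{
    public static void RemoveSensitiveProperties(JToken token, IEnumerable<Regex> regexes)
    {
        if (token is JObject obj)
        {
            // Snapshot the properties so removal does not disturb enumeration.
            foreach (JProperty prop in obj.Properties().ToList())
            {
                if (regexes.Any(r => r.IsMatch(prop.Name)))
                    prop.Remove();                                  // sensitive: drop it
                else
                    RemoveSensitiveProperties(prop.Value, regexes); // otherwise recurse
            }
        }
        else if (token is JArray array)
        {
            // Array elements are never removed here, only searched.
            foreach (JToken child in array)
                RemoveSensitiveProperties(child, regexes);
        }
    }
}
```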


Answer 2 (score 3)

You can parse your JSON to a JContainer (which is either an object or an array), then search the JSON hierarchy using DescendantsAndSelf() for properties whose names match a Regex, or for string values that match a Regex, and remove those items with JToken.Remove().

For instance, given the following JSON:

{
  "Items": [
    {
      "id": 5,
      "name": "Peter",
      "password": "some pwd"
    },
    {
      "id": 5,
      "name": "Peter",
      "password": "some pwd"
    }
  ],
  "RootPasswrd2": "some pwd",
  "SecretData": "This data is secret",
  "StringArray": [
    "I am public",
    "This is also secret"
  ]
}

You can remove all properties whose names match the pattern "pass.*w.*r.*d" (case-insensitively) as follows:

using System.Linq;
using System.Text.RegularExpressions;
using Newtonsoft.Json.Linq;

var root = (JContainer)JToken.Parse(jsonString);

var nameRegex = new Regex(".*pass.*w.*r.*d.*", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
var query = root.DescendantsAndSelf()
    .OfType<JProperty>()
    .Where(p => nameRegex.IsMatch(p.Name));
query.RemoveFromLowestPossibleParents();

Which results in:

{
  "Items": [
    {
      "id": 5,
      "name": "Peter"
    },
    {
      "id": 5,
      "name": "Peter"
    }
  ],
  "SecretData": "This data is secret",
  "StringArray": [
    "I am public",
    "This is also secret"
  ]
}
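`Regex.IsMatch` succeeds if the pattern matches anywhere in the input, so the leading and trailing `.*` in `.*pass.*w.*r.*d.*` (and in `^.*password.*$` from the other answer) are not needed. A small check, using a hypothetical helper name; the equivalence assumes property names without line breaks, because `.` does not match `\n` by default:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: compares a ".*"-wrapped pattern with its unwrapped core.
public static class RegexWrapperCheck
{
    public static bool SameResult(string wrapped, string core, string input)
    {
        const RegexOptions opts = RegexOptions.IgnoreCase | RegexOptions.CultureInvariant;
        // IsMatch searches the whole input, so an unanchored core pattern is enough.
        return Regex.IsMatch(input, wrapped, opts) == Regex.IsMatch(input, core, opts);
    }
}
```

The unwrapped form is shorter and easier to read; anchors such as `^` and `$` are only needed when a match must start or end at a specific position.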

And you can remove all string values that contain the substring "secret" (case-insensitively) by doing:

var valueRegex = new Regex(".*secret.*", RegexOptions.IgnoreCase);
var query2 = root.DescendantsAndSelf()
    .OfType<JValue>()
    .Where(v => v.Type == JTokenType.String && valueRegex.IsMatch((string)v));
query2.RemoveFromLowestPossibleParents();

var finalJsonString = root.ToString();

Applied after the first transform, this results in:

{
  "Items": [
    {
      "id": 5,
      "name": "Peter"
    },
    {
      "id": 5,
      "name": "Peter"
    }
  ],
  "StringArray": [
    "I am public"
  ]
}

For convenience, I am using the following extension methods:

using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json.Linq;

public static partial class JsonExtensions
{
    public static TJToken RemoveFromLowestPossibleParent<TJToken>(this TJToken node) where TJToken : JToken
    {
        if (node == null)
            return null;
        JToken toRemove;
        var property = node.Parent as JProperty;
        if (property != null)
        {
            // Also detach the node from its immediate containing property -- Remove() does not do this even though it seems like it should
            toRemove = property;
            property.Value = null;
        }
        else
        {
            toRemove = node;
        }
        if (toRemove.Parent != null)
            toRemove.Remove();
        return node;
    }

    public static IEnumerable<TJToken> RemoveFromLowestPossibleParents<TJToken>(this IEnumerable<TJToken> nodes) where TJToken : JToken
    {
        var list = nodes.ToList();
        foreach (var node in list)
            node.RemoveFromLowestPossibleParent();
        return list;
    }
}

Demo fiddle here.
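The `property.Value = null` line in `RemoveFromLowestPossibleParent()` is what detaches the node: Json.NET stores a JSON null token in the property, and the old value's `Parent` becomes `null`, so the returned node is no longer attached to the tree. A small sketch of that behavior (class and method names are hypothetical):

```csharp
using Newtonsoft.Json.Linq;

// Hypothetical helper isolating the detach step used by the extension method.
public static class DetachDemo
{
    // Returns the original value token after replacing it in its property with null.
    public static JToken DetachValue(JObject obj, string propertyName)
    {
        JProperty property = obj.Property(propertyName);
        JToken value = property.Value;
        property.Value = null; // Json.NET substitutes a JSON null token here
        return value;
    }
}
```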

2 Comments

About your extension methods, in what situation would you need to go up more than one extra level to find a container that's not a JProperty? A JProperty cannot have a JProperty as its parent, right?
@KyleDelaney - I wrote this quite a long time ago. My current version of RemoveFromLowestPossibleParent() is much less pseudo-general; I've updated the answer to include it, plus a demo fiddle.
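The point raised in the first comment can be checked directly: walking `Parent` upward from a value shows that a `JProperty` always sits directly inside a `JObject`, so going up one level from a property-held value is enough. A minimal sketch (the helper name is hypothetical):

```csharp
using System.Collections.Generic;
using Newtonsoft.Json.Linq;

// Hypothetical helper: lists the JTokenType of a token and of each ancestor, innermost first.
public static class ParentChain
{
    public static string Describe(JToken root, string path)
    {
        var types = new List<string>();
        for (JToken t = root.SelectToken(path); t != null; t = t.Parent)
            types.Add(t.Type.ToString());
        return string.Join(" -> ", types);
    }
}
```

For `{"Items":[{"password":"some pwd"}]}` and the path `Items[0].password`, the chain is `String -> Property -> Object -> Array -> Property -> Object`; an array element such as `StringArray[0]` has a `JArray` as its direct parent instead, which is why the extension only special-cases `JProperty`.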
