
I need to group the entries of a JSON array by one of their values and build a new array from the result. Specifically, I need to collect the repositories that belong to each team and store each team's repositories in a separate array.

Input array:

{
    "repo_list": [
      {
        "repo_name": "MaticCorporation/Sample-Repo-1",
        "team_name": "AFIN",
        "tlt_member": "Sample-TLT-Member-1",
        "matix.properties": "Valid"
      },
      {
        "repo_name": "MaticCorporation/Sample-Repo-2",
        "team_name": "AFIN",
        "tlt_member": "Sample-TLT-Member-1",
        "matix.properties": "Valid"
      },
      {
        "repo_name": "MaticCorporation/Sample-Repo-3",
        "team_name": "-",
        "tlt_member": "Sample-TLT-Member-2",
        "matix.properties": "Invalid"
      },
      {
        "repo_name": "MaticCorporation/Sample-Repo-4",
        "team_name": "RETIX",
        "tlt_member": "-",
        "matix.properties": "Invalid"
      },
      {
        "repo_name": "MaticCorporation/Sample-Repo-5",
        "team_name": "-",
        "tlt_member": "-",
        "matix.properties": "No"
      }
    ]
  }

Output:

 {
  "repo_by_team": [
    {
      "team": "AFIN",
      "repo_count": 2,
      "repo_list": [
        "MaticCorporation/Sample-Repo-1",
        "MaticCorporation/Sample-Repo-2"
      ]
    },
    {
      "team": "RETIX",
      "repo_count": 1,
      "repo_list": [
        "MaticCorporation/Sample-Repo-4"
      ]
    }
  ]
}

I've implemented a function that extracts all team names into an array, but I'm having difficulty producing a result shaped like the output above.

Here is my code for extracting team names:

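def get_team_names(repo_list):
    # valid_repos is the asker's own helper; it is not shown in the question
    repos = valid_repos(repo_list)
    # The input entries store the team under the 'team_name' key
    team_names = [item.get('team_name') for item in repos]
    return team_names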
  • Welcome to Stack Overflow. Please read How to Ask. "but I'm having difficulty how to get the result like output array." Okay, so what is the question? What do you imagine are the remaining logical steps, and what part do you need help with? This is not a code-writing service. Commented Feb 15, 2022 at 15:50

2 Answers


You can use a dict[str, list[str]] to map between a team and its repositories, and you can use the json module to transform data between Python dictionaries and a JSON representation.

import json

with open('input.json') as input_file, open('output.json', 'w') as output_file:
    repo_data = json.load(input_file)['repo_list']
    team_repos = {}
    for repo in repo_data:
        if repo['team_name'] != '-':
            if repo['team_name'] not in team_repos:
                team_repos[repo['team_name']] = []
            team_repos[repo['team_name']].append(repo['repo_name'])

    result = []
    for team, repo_list in team_repos.items():
        result.append({
            "team": team,
            "repo_count": len(repo_list),
            "repo_list": repo_list
        })

    json.dump({'repo_by_team': result}, output_file, indent=4)

3 Comments

Thank you so much! This worked! Your support means a lot! Appreciate you!
Hi, if there are duplicate repos in the repo_list array, how can we keep only the distinct ones and have repo_count reflect the distinct repos? @BrokenBenchmark
Use a set rather than a list, so the type of the data structure you maintain is a dict[str, set[str]]. You can then use list() to turn the sets back into lists for serialization.
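
A minimal sketch of the set-based approach described in the comment above. The function name group_distinct_repos and the sample data are hypothetical, and sorted() is used instead of list() purely so the output order is predictable, since sets are unordered.

```python
def group_distinct_repos(repo_list):
    """Group repository names by team, counting each repository once."""
    team_repos = {}  # maps team name -> set of repository names
    for repo in repo_list:
        team = repo["team_name"]
        if team == "-":  # "-" marks a repository without a team
            continue
        # A set silently ignores names that were already added
        team_repos.setdefault(team, set()).add(repo["repo_name"])
    return {
        "repo_by_team": [
            # sorted() turns each set into a list that json can serialize
            {"team": team, "repo_count": len(names), "repo_list": sorted(names)}
            for team, names in team_repos.items()
        ]
    }


# Hypothetical sample with one duplicated repository
sample = [
    {"repo_name": "MaticCorporation/Sample-Repo-1", "team_name": "AFIN"},
    {"repo_name": "MaticCorporation/Sample-Repo-1", "team_name": "AFIN"},
    {"repo_name": "MaticCorporation/Sample-Repo-2", "team_name": "AFIN"},
    {"repo_name": "MaticCorporation/Sample-Repo-3", "team_name": "-"},
]
result = group_distinct_repos(sample)
```

Here repo_count is taken from the size of the set, so it only counts distinct repositories.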

The following works. The function may perform slowly on large input, because it rescans the result list for every repository, but it uses no more than the necessary amount of space. Note that it accepts a list of Python dictionaries (the value under "repo_list") and returns a Python dictionary, not JSON text. To convert to and from JSON, use the Python json module.

def sort_by_team(repo_list: list) -> dict:
    ans = {"repo_by_team": []}
    for repo in repo_list:
        # First time this team is seen: create its entry
        if repo["team_name"] != "-" and repo["team_name"] not in [r["team"] for r in ans["repo_by_team"]]:
            ans["repo_by_team"].append({"team": repo["team_name"], "repo_count": 1, "repo_list": [repo["repo_name"]]})
        else:
            # Known team: update its entry. Repos with team "-" match no entry and are skipped.
            for r in ans["repo_by_team"]:
                if r["team"] != repo["team_name"]:
                    continue
                r["repo_count"] += 1
                r["repo_list"].append(repo["repo_name"])
                break
    return ans
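
As a rough sketch of the JSON round trip mentioned above, the snippet below parses JSON text with json.loads, groups the repositories in a single pass, and serializes the result with json.dumps. The names group_by_team, raw_json, grouped and output_text are hypothetical, and the grouping uses collections.defaultdict as one possible single-pass alternative, not the function from this answer.

```python
import json
from collections import defaultdict


def group_by_team(repo_list):
    """Group repository names by team in one pass over the input list."""
    team_repos = defaultdict(list)  # missing teams start with an empty list
    for repo in repo_list:
        if repo["team_name"] != "-":  # skip repositories without a team
            team_repos[repo["team_name"]].append(repo["repo_name"])
    return {
        "repo_by_team": [
            {"team": team, "repo_count": len(repos), "repo_list": repos}
            for team, repos in team_repos.items()
        ]
    }


# Hypothetical input text, shortened from the question's example
raw_json = """
{
  "repo_list": [
    {"repo_name": "MaticCorporation/Sample-Repo-1", "team_name": "AFIN"},
    {"repo_name": "MaticCorporation/Sample-Repo-2", "team_name": "AFIN"},
    {"repo_name": "MaticCorporation/Sample-Repo-3", "team_name": "-"},
    {"repo_name": "MaticCorporation/Sample-Repo-4", "team_name": "RETIX"}
  ]
}
"""

data = json.loads(raw_json)                   # JSON text -> Python dict
grouped = group_by_team(data["repo_list"])    # pass the list, not the whole dict
output_text = json.dumps(grouped, indent=2)   # Python dict -> JSON text
print(output_text)
```

Because a dictionary lookup replaces the rescan of the result list, each repository is handled in roughly constant time.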

4 Comments

Thank you so much! This worked! Your support means a lot! Appreciate you!
@Crest: I'm actually going to use this on a large object, around 3000 repositories. Which approach would give faster iterations, yours or the answer above? :) I would like to know your suggestion. I'm very new to Python and programming.
3000 repositories? Both should run without any noticeable performance difference on that input size.
@BrokenBenchmark Yes 3000 repositories, Thank you!
