I have two PySpark DataFrames like the following:
DataFrame A:
+-----+------+
|nodes|counts|
+-----+------+
|  [0]|     1|
|  [1]|     0|
|  [2]|     1|
|  [3]|     0|
|  [4]|     0|
|  [5]|     0|
|  [6]|     1|
|  [7]|     0|
|  [8]|     0|
|  [9]|     0|
| [10]|     0|
+-----+------+
And DataFrame B:
+-----+------+
|nodes|counts|
+-----+------+
|  [0]|     1|
|  [1]|     0|
|  [2]|     3|
|  [6]|     0|
|  [8]|     2|
+-----+------+
I would like to create a new DataFrame C in which the "counts" values from DataFrame A are summed with the "counts" values from DataFrame B wherever the "nodes" values are equal. Nodes that appear only in A should still appear in C. DataFrame C should look like this:
+-----+------+
|nodes|counts|
+-----+------+
|  [0]|     2|
|  [1]|     0|
|  [2]|     4|
|  [3]|     0|
|  [4]|     0|
|  [5]|     0|
|  [6]|     1|
|  [7]|     0|
|  [8]|     2|
|  [9]|     0|
| [10]|     0|
+-----+------+
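To pin down the semantics, here is a minimal plain-Python sketch of the result I expect. It is not PySpark, only a model of the computation: the dict names `a` and `b` and the helper `sum_counts` are made up for illustration, and nodes are written as plain ints rather than the single-element arrays in the real DataFrames.

```python
# Plain-Python model of the desired result (NOT PySpark).
# Assumption: node keys are plain ints here; the real "nodes" column holds arrays like [0].
a = {0: 1, 1: 0, 2: 1, 3: 0, 4: 0, 5: 0, 6: 1, 7: 0, 8: 0, 9: 0, 10: 0}
b = {0: 1, 1: 0, 2: 3, 6: 0, 8: 2}


def sum_counts(a, b):
    """Keep every node from A and add B's count when the node also appears in B."""
    # Nodes missing from B contribute 0, so their count from A is unchanged.
    return {node: count + b.get(node, 0) for node, count in a.items()}


c = sum_counts(a, b)
print(c)
```

What I am looking for is the PySpark equivalent of `sum_counts`, applied to the two DataFrames above.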
I appreciate the help! I've tried a few different tricks using lambda functions and SQL statements, but I keep coming up short on a solution.