Use a UDF:
import org.apache.spark.sql.functions.udf
val mergeArrays = udf((a: Seq[Int], b: Seq[Int]) => (a ++ b).toSet.toSeq)
Then, assuming your input is
import spark.implicits._
val df = Seq((Seq(1, 2), Seq(2, 3))).toDF("col1", "col2")
you can merge the arrays with
df.withColumn("col3", mergeArrays($"col1", $"col2"))
resulting in
+------+------+---------+
| col1| col2| col3|
+------+------+---------+
|[1, 2]|[2, 3]|[1, 2, 3]|
+------+------+---------+
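If you also want to call the UDF from SQL, you can register it instead of (or in addition to) using it directly. A minimal sketch, assuming the same df as above; the registered name mergeArrays and the view name t are arbitrary:
spark.udf.register("mergeArrays", (a: Seq[Int], b: Seq[Int]) => (a ++ b).distinct)
df.createOrReplaceTempView("t")
spark.sql("SELECT col1, col2, mergeArrays(col1, col2) AS col3 FROM t").show()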
EDIT: Java version. As expected, it's way uglier, so if you can use Scala, use that instead.
import org.apache.spark.sql.*;
import org.apache.spark.sql.api.java.UDF2;
import scala.collection.Seq;
import java.util.*;
import static org.apache.spark.sql.types.DataTypes.*;
import static scala.collection.JavaConverters.*;
Dataset<Row> data = spark.createDataFrame(
    Collections.singletonList(RowFactory.create(Arrays.asList(1, 2), Arrays.asList(2, 3))),
    createStructType(Arrays.asList(
        createStructField("col1", createArrayType(IntegerType), true),
        createStructField("col2", createArrayType(IntegerType), true))));
// Collect both input sequences into a HashSet to drop duplicates,
// then convert the result back to a Scala Seq for the return value.
spark.udf().register("udfMerge", (UDF2<Seq<Integer>, Seq<Integer>, Seq<Integer>>) (s1, s2) -> {
    Set<Integer> s = new HashSet<>();
    s.addAll(asJavaCollectionConverter(s1).asJavaCollection());
    s.addAll(asJavaCollectionConverter(s2).asJavaCollection());
    return collectionAsScalaIterableConverter(s).asScala().toSeq();
}, createArrayType(IntegerType));
data.withColumn("col3", functions.callUDF("udfMerge", functions.col("col1"), functions.col("col2"))).show();
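For reference, this should print the same table as the Scala version, though the element order inside col3 may vary because a HashSet does not preserve insertion order:
+------+------+---------+
|  col1|  col2|     col3|
+------+------+---------+
|[1, 2]|[2, 3]|[1, 2, 3]|
+------+------+---------+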