I am working with Spark 1.3.0 (in Python).

I have a DataFrame:

DF.show(3)

ID          Date       Hour     TimeInCluster Cluster Xcluster Ycluster
25342438156 2012-11-30 15:00:00 26            T       130270   165620
25342438156 2012-11-30 16:00:00 86            D       136850   177070
25342438156 2012-11-30 17:00:00 35            D       136850   177070

I am trying to save that DataFrame into a Hive table that does not exist yet.

How can I do that?

Thank you.

I changed my code to:

sqlContext = HiveContext(sc)

FinalDf.write().mode(SaveMode.Overwrite).saveAsTable("myDB.sixuserstablediary")

but I got this error:

py4j.protocol.Py4JJavaError: An error occurred while calling o280.apply.
: org.apache.spark.sql.AnalysisException: Cannot resolve column name "write" among (IMSI, Date, Hour, TimeInCluster, Cluster, Xcluster, Ycluster);
        at org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:162)
        at org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:162)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.sql.DataFrame.resolve(DataFrame.scala:161)
        at org.apache.spark.sql.DataFrame.col(DataFrame.scala:436)
        at org.apache.spark.sql.DataFrame.apply(DataFrame.scala:426)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:207)
        at java.lang.Thread.run(Thread.java:745)
  • Please refer to this link: stackoverflow.com/questions/30664008/… Commented Jan 20, 2017 at 11:43
  • I tried that but got an error. My code: FinalDf.registerTempTable("mytempTable"); sqlContext.sql("create table sixuserstablediary as select * from mytempTable") Commented Jan 20, 2017 at 12:58
  • The error: sqlContext.sql("create table sixuserstablediary as select * from mytempTable"); File "/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/spark/python/pyspark/sql/context.py", line 528, in sql: return DataFrame(self._ssql_ctx.sql(sqlQuery), self); File "/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__; File "/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value Commented Jan 20, 2017 at 13:01

1 Answer

You need to use Spark's HiveContext.

Import HiveContext and create the context:

from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)

Register the DataFrame as a temporary table, then create the Hive table by selecting the data from that temporary table.

# Register the DataFrame as a temporary table
df.registerTempTable("tbl_tmp")

# Create the Hive table from the temporary table
sqlContext.sql("create table default.tbl_hive_data as select * from tbl_tmp")
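
The two steps above can be wrapped in a small helper. This is only a sketch: the names `build_ctas` and `save_df_to_hive` are hypothetical (they are not part of PySpark), and the helper assumes the DataFrame was created through the same HiveContext that runs the query, since a temporary table is only visible to the context it was registered in.

```python
import re

# Hypothetical helpers for illustration; only registerTempTable() and
# sql() are PySpark calls, everything else is plain Python.

# Accepts "table" or "database.table" made of letters, digits, underscores.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def build_ctas(target_table, temp_table):
    """Build a CREATE TABLE ... AS SELECT statement for the given names."""
    for name in (target_table, temp_table):
        if not _IDENTIFIER.match(name):
            raise ValueError("not a plain table name: %r" % (name,))
    return "CREATE TABLE {0} AS SELECT * FROM {1}".format(target_table, temp_table)


def save_df_to_hive(df, sql_context, target_table, temp_table="tbl_tmp"):
    """Register df as a temporary table, then create target_table from it.

    sql_context is assumed to be the HiveContext that df belongs to.
    """
    df.registerTempTable(temp_table)
    return sql_context.sql(build_ctas(target_table, temp_table))


# Usage on a cluster (not executed here, needs a running SparkContext `sc`):
#   from pyspark.sql import HiveContext
#   sqlContext = HiveContext(sc)
#   save_df_to_hive(FinalDf, sqlContext, "myDB.sixuserstablediary")
```

As for the error in the question: the DataFrame in Spark 1.3.0 has no `write` attribute, so `FinalDf.write` is looked up as a column name, which is what the `Cannot resolve column name "write"` message reports. From Spark 1.4 onward, `write` exists in Python as a property rather than a method, and the save mode is passed as a string, so `FinalDf.write.mode("overwrite").saveAsTable("myDB.sixuserstablediary")` would be the equivalent call there.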