I have created the table below in SQL Server using the following:

CREATE TABLE [dbo].[Validation](
    [RuleId] [int] IDENTITY(1,1) NOT NULL,
    [AppId] [varchar](255) NOT NULL,
    [Date] [date] NOT NULL,
    [RuleName] [varchar](255) NOT NULL,
    [Value] [nvarchar](4000) NOT NULL
)

Note the identity column (RuleId).

When inserting values into the table in SQL as below, it works:

Note: I am not inserting the primary key (RuleId), as the IDENTITY column is filled in automatically and increments with each insert.

INSERT INTO dbo.Validation VALUES ('TestApp','2020-05-15','MemoryUsageAnomaly','2300MB')

However, when I create a temp view on Databricks and run the same insert through PySpark as below:

%python

driver = <Driver>
url = "jdbc:sqlserver:<URL>"
database = "<db>"
table = "dbo.Validation"
user = "<user>"
password = "<pass>"

# import the data
remote_table = spark.read.format("jdbc")\
    .option("driver", driver)\
    .option("url", url)\
    .option("database", database)\
    .option("dbtable", table)\
    .option("user", user)\
    .option("password", password)\
    .load()

remote_table.createOrReplaceTempView("YOUR_TEMP_VIEW_NAMES")

sqlContext.sql("INSERT INTO YOUR_TEMP_VIEW_NAMES VALUES ('TestApp','2020-05-15','MemoryUsageAnomaly','2300MB')")

I get the error below:

AnalysisException: 'unknown requires that the data to be inserted have the same number of columns as the target table: target table has 5 column(s) but the inserted data has 4 column(s), including 0 partition column(s) having constant value(s).;'
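
The temp view does pick up all five columns of the table, including the RuleId identity column, as a quick schema check shows (a minimal sketch against the remote_table DataFrame loaded above):

%python

# printSchema should list all five columns read over JDBC:
# RuleId, AppId, Date, RuleName, Value.
remote_table.printSchema()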

Why does it work in SQL but not when passing the query through Databricks? How can I insert through PySpark without getting this error?

  • @DaleK, I tried sqlContext.sql("INSERT INTO YOUR_TEMP_VIEW_NAMES (Appid,Date,RuleName,Value) VALUES (1,'2020-05-15','MemoryUsageAnomaly','2300MB')"), but I get a parse exception:

    ParseException: mismatched input 'Appid' expecting {'(', 'SELECT', 'FROM', 'DESC', 'VALUES', 'TABLE', 'INSERT', 'DESCRIBE', 'MAP', 'MERGE', 'UPDATE', 'REDUCE'} (line 1, pos 34)

    == SQL ==
    INSERT INTO YOUR_TEMP_VIEW_NAMES (Appid,Date,RuleName,Value) VALUES (1,'2020-05-15','MemoryUsageAnomaly','2300MB')
    ----------------------------------^^^

1 Answer


The most straightforward solution here is to use JDBC from a Scala cell, e.g.:

%scala

import java.util.Properties
import java.sql.DriverManager

val jdbcUsername = dbutils.secrets.get(scope = "kv", key = "sqluser")
val jdbcPassword = dbutils.secrets.get(scope = "kv", key = "sqlpassword")
val driverClass = "com.microsoft.sqlserver.jdbc.SQLServerDriver"

// Create the JDBC URL without passing in the user and password parameters.
val jdbcUrl = s"jdbc:sqlserver://xxxx.database.windows.net:1433;database=AdventureWorks;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;"

// Create a Properties() object to hold the parameters.

val connectionProperties = new Properties()

connectionProperties.put("user", s"${jdbcUsername}")
connectionProperties.put("password", s"${jdbcPassword}")
connectionProperties.setProperty("Driver", driverClass)

val connection = DriverManager.getConnection(jdbcUrl, jdbcUsername, jdbcPassword)
val stmt = connection.createStatement()
val sql = "INSERT INTO dbo.Validation VALUES ('TestApp','2020-05-15','MemoryUsageAnomaly','2300MB')"

stmt.execute(sql)
connection.close()

You could use pyodbc too, but the SQL Server ODBC drivers aren't installed by default, and the JDBC drivers are.
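
For reference, a pyodbc version would look roughly like this (a sketch, assuming an ODBC driver such as "ODBC Driver 17 for SQL Server" has been installed on the cluster; the server, database, and credentials are placeholders):

%python

import pyodbc

# Connect with a hypothetical ODBC connection string; the named
# driver must already be installed on the cluster nodes.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=xxxx.database.windows.net,1433;"
    "DATABASE=AdventureWorks;"
    "UID=<user>;PWD=<pass>"
)
cursor = conn.cursor()
cursor.execute("INSERT INTO dbo.Validation VALUES ('TestApp','2020-05-15','MemoryUsageAnomaly','2300MB')")
conn.commit()
conn.close()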

A Spark solution would be to create a view in SQL Server without the identity column and insert against that, e.g.:

create view Validation2 as
select AppId,Date,RuleName,Value
from Validation

then

%python

tableName = "Validation2"
# In Python the JDBC properties are a plain dict, not java.util.Properties.
connectionProperties = {"user": jdbcUsername, "password": jdbcPassword, "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver"}
df = spark.read.jdbc(url=jdbcUrl, table=tableName, properties=connectionProperties)
df.createOrReplaceTempView(tableName)
sqlContext.sql("INSERT INTO Validation2 VALUES ('TestApp','2020-05-15','MemoryUsageAnomaly','2300MB')")
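
Alternatively, you can skip the temp view and append a four-column DataFrame with the DataFrame writer, letting SQL Server assign RuleId. A minimal sketch, reusing the jdbcUrl and connectionProperties above:

%python

from pyspark.sql import Row

# One row matching the four non-identity columns; SQL Server
# converts the date string and fills in RuleId on insert.
new_rows = spark.createDataFrame([
    Row(AppId="TestApp", Date="2020-05-15",
        RuleName="MemoryUsageAnomaly", Value="2300MB")
])
new_rows.write.jdbc(url=jdbcUrl, table="Validation2", mode="append", properties=connectionProperties)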

If you want to encapsulate the Scala and call it from another language (like Python), you can use a Scala package cell, e.g.:

%scala

package example

import java.util.Properties
import java.sql.DriverManager

object JDBCFacade 
{
  def runStatement(url : String, sql : String, userName : String, password: String): Unit = 
  {
    val connection = DriverManager.getConnection(url, userName, password)
    val stmt = connection.createStatement()
    try
    {
      stmt.execute(sql)  
    }
    finally
    {
      connection.close()  
    }
  }
}

and then you can call it like this:

jdbcUsername = dbutils.secrets.get(scope = "kv", key = "sqluser")
jdbcPassword = dbutils.secrets.get(scope = "kv", key = "sqlpassword")

jdbcUrl = "jdbc:sqlserver://xxxx.database.windows.net:1433;database=AdventureWorks;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;"

sql = "select 1 a into #foo from sys.objects"

sc._jvm.example.JDBCFacade.runStatement(jdbcUrl,sql, jdbcUsername, jdbcPassword)
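
If you call it repeatedly, a small Python wrapper keeps things tidy (a sketch; JDBCFacade.runStatement is the Scala object defined above):

%python

def run_sql(sql):
    """Execute a T-SQL statement on the remote server via the Scala facade."""
    sc._jvm.example.JDBCFacade.runStatement(jdbcUrl, sql, jdbcUsername, jdbcPassword)

run_sql("INSERT INTO dbo.Validation VALUES ('TestApp','2020-05-15','MemoryUsageAnomaly','2300MB')")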

5 Comments

Because I use sqlContext.sql in my Python, wouldn't I get the same parse exception as in my comment above?

ParseException: mismatched input 'Appid' expecting {'(', 'SELECT', 'FROM', 'DESC', 'VALUES', 'TABLE', 'INSERT', 'DESCRIBE', 'MAP', 'MERGE', 'UPDATE', 'REDUCE'} (line 1, pos 34)

== SQL ==
INSERT INTO YOUR_TEMP_VIEW_NAMES (Appid,Date,RuleName,Value) VALUES (1,'2020-05-15','MemoryUsageAnomaly','2300MB')
----------------------------------^^^
Spark SQL doesn't support specifying the target columns in an INSERT; see docs.databricks.com/spark/latest/spark-sql/language-manual/…. In the Scala example, that's T-SQL, not Spark SQL, so you can either specify the input columns or let SQL Server automatically skip the IDENTITY column.
Browne, any suggestions on how I can run Scala within a Python environment? I need to essentially use Scala within a Python function.
Great question. Yes! I just learned about Scala package cells, and they work here. See update.
Works well. Thanks for the assistance!
