I have created the table below in SQL Server using the following:

CREATE TABLE [dbo].[Validation](
    [RuleId] [int] IDENTITY(1,1) NOT NULL,
    [AppId] [varchar](255) NOT NULL,
    [Date] [date] NOT NULL,
    [RuleName] [varchar](255) NOT NULL,
    [Value] [nvarchar](4000) NOT NULL
)

Note the identity column (RuleId).

When inserting values into the table in SQL as below, it works:

Note: I am not inserting the primary key (RuleId), as the IDENTITY column is filled in automatically and increments with each insert.

INSERT INTO dbo.Validation VALUES ('TestApp','2020-05-15','MemoryUsageAnomaly','2300MB')

However, when I create a temp view on Databricks and run the same insert through PySpark as below:

%python

driver = <Driver>
url = "jdbc:sqlserver:<URL>"
database = "<db>"
table = "dbo.Validation"
user = "<user>"
password = "<pass>"

# import the data
remote_table = spark.read.format("jdbc")\
    .option("driver", driver)\
    .option("url", url)\
    .option("database", database)\
    .option("dbtable", table)\
    .option("user", user)\
    .option("password", password)\
    .load()

remote_table.createOrReplaceTempView("YOUR_TEMP_VIEW_NAMES")

sqlContext.sql("INSERT INTO YOUR_TEMP_VIEW_NAMES VALUES ('TestApp','2020-05-15','MemoryUsageAnomaly','2300MB')")

I get the error below:

AnalysisException: 'unknown requires that the data to be inserted have the same number of columns as the target table: target table has 5 column(s) but the inserted data has 4 column(s), including 0 partition column(s) having constant value(s).;'
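
The temp view does pick up all five columns of the table, including the RuleId identity column, as a quick schema check shows (a minimal sketch against the remote_table DataFrame loaded above):

%python

# printSchema should list all five columns read over JDBC:
# RuleId, AppId, Date, RuleName, Value.
remote_table.printSchema()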

Why does it work in SQL but not when passing the query through Databricks? How can I insert through PySpark without getting this error?

  • @DaleK, I tried sqlContext.sql("INSERT INTO YOUR_TEMP_VIEW_NAMES (Appid,Date,RuleName,Value) VALUES (1,'2020-05-15','MemoryUsageAnomaly','2300MB')"), but I get a parse exception:

    ParseException: mismatched input 'Appid' expecting {'(', 'SELECT', 'FROM', 'DESC', 'VALUES', 'TABLE', 'INSERT', 'DESCRIBE', 'MAP', 'MERGE', 'UPDATE', 'REDUCE'} (line 1, pos 34)

    == SQL ==
    INSERT INTO YOUR_TEMP_VIEW_NAMES (Appid,Date,RuleName,Value) VALUES (1,'2020-05-15','MemoryUsageAnomaly','2300MB')
    ----------------------------------^^^

1 Answer


The most straightforward solution here is to use JDBC from a Scala cell, e.g.:

%scala

import java.util.Properties
import java.sql.DriverManager

val jdbcUsername = dbutils.secrets.get(scope = "kv", key = "sqluser")
val jdbcPassword = dbutils.secrets.get(scope = "kv", key = "sqlpassword")
val driverClass = "com.microsoft.sqlserver.jdbc.SQLServerDriver"

// Create the JDBC URL without passing in the user and password parameters.
val jdbcUrl = s"jdbc:sqlserver://xxxx.database.windows.net:1433;database=AdventureWorks;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;"

// Create a Properties() object to hold the parameters.

val connectionProperties = new Properties()

connectionProperties.put("user", s"${jdbcUsername}")
connectionProperties.put("password", s"${jdbcPassword}")
connectionProperties.setProperty("Driver", driverClass)

val connection = DriverManager.getConnection(jdbcUrl, jdbcUsername, jdbcPassword)
val stmt = connection.createStatement()
val sql = "INSERT INTO dbo.Validation VALUES ('TestApp','2020-05-15','MemoryUsageAnomaly','2300MB')"

stmt.execute(sql)
connection.close()

You could use pyodbc too, but the SQL Server ODBC drivers aren't installed by default, and the JDBC drivers are.
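
For reference, a pyodbc version would look roughly like this (a sketch, assuming an ODBC driver such as "ODBC Driver 17 for SQL Server" has been installed on the cluster; the server, database, and credentials are placeholders):

%python

import pyodbc

# Connect with a hypothetical ODBC connection string; the named
# driver must already be installed on the cluster nodes.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=xxxx.database.windows.net,1433;"
    "DATABASE=AdventureWorks;"
    "UID=<user>;PWD=<pass>"
)
cursor = conn.cursor()
cursor.execute("INSERT INTO dbo.Validation VALUES ('TestApp','2020-05-15','MemoryUsageAnomaly','2300MB')")
conn.commit()
conn.close()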

A Spark solution would be to create a view in SQL Server without the identity column and insert against that, e.g.:

create view Validation2 as
select AppId,Date,RuleName,Value
from Validation

then

%python

tableName = "Validation2"
# In Python the JDBC properties are a plain dict, not java.util.Properties.
connectionProperties = {"user": jdbcUsername, "password": jdbcPassword, "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver"}
df = spark.read.jdbc(url=jdbcUrl, table=tableName, properties=connectionProperties)
df.createOrReplaceTempView(tableName)
sqlContext.sql("INSERT INTO Validation2 VALUES ('TestApp','2020-05-15','MemoryUsageAnomaly','2300MB')")
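
Alternatively, you can skip the temp view and append a four-column DataFrame with the DataFrame writer, letting SQL Server assign RuleId. A minimal sketch, reusing the jdbcUrl and connectionProperties above:

%python

from pyspark.sql import Row

# One row matching the four non-identity columns; SQL Server
# converts the date string and fills in RuleId on insert.
new_rows = spark.createDataFrame([
    Row(AppId="TestApp", Date="2020-05-15",
        RuleName="MemoryUsageAnomaly", Value="2300MB")
])
new_rows.write.jdbc(url=jdbcUrl, table="Validation2", mode="append", properties=connectionProperties)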

If you want to encapsulate the Scala and call it from another language (like Python), you can use a Scala package cell, e.g.:

%scala

package example

import java.util.Properties
import java.sql.DriverManager

object JDBCFacade 
{
  def runStatement(url : String, sql : String, userName : String, password: String): Unit = 
  {
    val connection = DriverManager.getConnection(url, userName, password)
    val stmt = connection.createStatement()
    try
    {
      stmt.execute(sql)  
    }
    finally
    {
      connection.close()  
    }
  }
}

and then you can call it like this:

jdbcUsername = dbutils.secrets.get(scope = "kv", key = "sqluser")
jdbcPassword = dbutils.secrets.get(scope = "kv", key = "sqlpassword")

jdbcUrl = "jdbc:sqlserver://xxxx.database.windows.net:1433;database=AdventureWorks;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;"

sql = "select 1 a into #foo from sys.objects"

sc._jvm.example.JDBCFacade.runStatement(jdbcUrl,sql, jdbcUsername, jdbcPassword)
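
If you call it repeatedly, a small Python wrapper keeps things tidy (a sketch; JDBCFacade.runStatement is the Scala object defined above):

%python

def run_sql(sql):
    """Execute a T-SQL statement on the remote server via the Scala facade."""
    sc._jvm.example.JDBCFacade.runStatement(jdbcUrl, sql, jdbcUsername, jdbcPassword)

run_sql("INSERT INTO dbo.Validation VALUES ('TestApp','2020-05-15','MemoryUsageAnomaly','2300MB')")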

5 Comments

Because I use sqlContext.sql in my Python, wouldn't I get the same parse exception as in my comment above?

ParseException: mismatched input 'Appid' expecting {'(', 'SELECT', 'FROM', 'DESC', 'VALUES', 'TABLE', 'INSERT', 'DESCRIBE', 'MAP', 'MERGE', 'UPDATE', 'REDUCE'} (line 1, pos 34)

== SQL ==
INSERT INTO YOUR_TEMP_VIEW_NAMES (Appid,Date,RuleName,Value) VALUES (1,'2020-05-15','MemoryUsageAnomaly','2300MB')
----------------------------------^^^
Spark SQL doesn't support specifying the target columns in an INSERT; see docs.databricks.com/spark/latest/spark-sql/language-manual/…. In the Scala example, that's T-SQL, not Spark SQL, so you can either specify the input columns or let SQL Server automatically skip the IDENTITY column.
Browne, any suggestions on how I can run Scala within a Python environment? I need to essentially use Scala within a Python function.
Great question. Yes! I just learned about Scala package cells, and they work here. See update.
Works well. Thanks for the assistance!
