I am currently using the Google Address Validation API in a PySpark (Databricks) pipeline to validate addresses from a table. Each row contains an address in a column called 'Address', and I send a request to the API for validation.

However, when I test the process with just a single record, two requests are sent to the API instead of one, resulting in higher usage costs. I have verified that only one transformation is triggered, and there is no loop or retry logic on our end.

```python
import requests
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# Define a user-defined function (UDF) to validate an address
def validate_address_udf(street, city, state, postal_code):
    # Combine the address parts into a single line for the API
    addr = f"{street}, {city}, {state} {postal_code}"
    api_key = "API KEY"  # placeholder for the real key
    url = f"https://addressvalidation.googleapis.com/v1:validateAddress?key={api_key}"

    payload = {
        "address": {
            "addressLines": [addr]
        }
    }

    headers = {
        "Content-Type": "application/json"
    }

    # One POST per invocation of this function; no retry logic here
    response = requests.post(url, json=payload, headers=headers)

    if response.status_code == 200:
        result_data = response.json().get("result", {})
        formatted_address = result_data.get("address", {}).get("formattedAddress", "")
        return formatted_address
    else:
        return f"Error: {response.status_code}, {response.text}"

# Register the UDF
validate_address = udf(validate_address_udf, StringType())

# Apply the UDF to the DataFrame to get validated addresses
# (distinct_df and required_columns are defined elsewhere in the pipeline)
result_df = distinct_df.withColumn("ValidatedAddress", validate_address(*required_columns))
```


I've checked that:

* Only one row is being processed.

* No retries or duplicate calls are being made from our code explicitly.

* Disabling eager evaluation or caching does not change the outcome.
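
To tell apart "the UDF body runs twice" from "one run produces two HTTP requests", one option is to record every invocation of the function body. The following is only a minimal sketch: `log_calls` is a hypothetical helper that is not part of the pipeline above, and `fake_validate` stands in for the real request so the snippet runs without Spark or network access.

```python
import functools
import os
import time

def log_calls(func):
    """Hypothetical helper: record each invocation of func with the process ID and a timestamp."""
    calls = []

    @functools.wraps(func)
    def wrapper(*args):
        # Store enough detail to see who called the function and when
        calls.append({"pid": os.getpid(), "time": time.time(), "args": args})
        return func(*args)

    wrapper.calls = calls  # expose the log for inspection
    return wrapper

@log_calls
def fake_validate(street, city, state, postal_code):
    # Stand-in for the real HTTP request (assumption for illustration)
    return f"{street}, {city}, {state} {postal_code}"

fake_validate("1600 Amphitheatre Pkwy", "Mountain View", "CA", "94043")
print(len(fake_validate.calls))  # number of times the body actually ran
```

Inside a real Spark job the list would live in the executor's Python worker rather than on the driver, so there it would probably be more practical to write the same information to stderr and read it from the executor logs.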

Has anyone else experienced this issue, or is there a known cause (e.g., Databricks execution behavior or UDF behavior) that could explain the double API calls?

Any guidance on how to prevent the duplicate request would be appreciated.
