Databricks SQL Stored Procedures

Hello and welcome back to the blog!

In today’s post, we’re diving into one of the newest features in Databricks: SQL Stored Procedures. This feature was released just a few weeks ago, and it’s a game-changer for anyone who’s been waiting for more SQL Server–like functionality inside Databricks.

We’ll walk through:

  • What stored procedures are in general
  • How Databricks implements them (including IN, OUT, and INOUT parameters)
  • How permissions work with SQL SECURITY INVOKER
  • A demo with sample data and reusable procedures
  • Extra commands (SHOW, DESCRIBE, DROP) you can use
  • How stored procedures show up in the Databricks catalog

🔹 What Are Stored Procedures?

If you’ve used SQL Server (or any relational database), you’ve likely worked with stored procedures before.

A stored procedure is essentially reusable SQL code that you can call again and again, rather than rewriting the same query.

Some key benefits:

  • You can parameterize procedures (e.g., pass in a customer_id)
  • They help with cleaner, reusable SQL workloads
  • Great for ETL pipelines (data cleaning, logging, transformations, etc.)
  • Now available natively in Databricks! 🎉

🔹 IN, OUT, and INOUT Parameters

Databricks stored procedures support three types of parameters:

  • IN: Passes a value into the procedure (e.g., customer_id)
  • OUT: Returns a value from the procedure (e.g., a computed total)
  • INOUT: Both accepts an input and returns an updated value

Example (simplified):

CREATE OR REPLACE PROCEDURE area_of_rectangle
  (
    IN x INT,       -- length
    IN y INT,       -- width
    OUT area INT,   -- returns calculated area
    INOUT acc INT   -- accumulator across calls
  )
LANGUAGE SQL
SQL SECURITY INVOKER
AS
BEGIN
  SET area = x * y;
  SET acc = acc + area;
END;
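
To call a procedure with OUT or INOUT parameters, you pass session variables for Databricks to write the results back into. Here's a quick sketch, assuming the procedure above exists in your current schema:

-- Session variables to receive the OUT and INOUT values
DECLARE OR REPLACE VARIABLE rect_area INT;
DECLARE OR REPLACE VARIABLE running_total INT DEFAULT 0;

-- First call: area = 3 * 4 = 12; accumulator becomes 0 + 12 = 12
CALL area_of_rectangle(3, 4, rect_area, running_total);

-- Second call: area = 5 * 2 = 10; accumulator becomes 12 + 10 = 22
CALL area_of_rectangle(5, 2, rect_area, running_total);

SELECT rect_area, running_total;  -- 10, 22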

🔹 What is SQL SECURITY INVOKER?

When you create a stored procedure in Databricks, you’ll notice this clause:

SQL SECURITY INVOKER

This means the procedure runs with the privileges of the caller, not those of the creator (the opposite of the SQL-standard DEFINER behavior).

  • Anyone who runs your procedure does so with their own privileges, so they also need access to the underlying tables it reads and writes (see the sketch below).
  • That's good for security, but worth keeping in mind: you don't always want everyone running procedures that modify production data.
  • Expect more granular security controls from Databricks in future updates.
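
Because of invoker rights, granting someone access to the procedure isn't enough on its own; callers also need privileges on the objects it touches. A minimal Unity Catalog sketch, assuming a group named `analysts` and placeholder schema/table names:

-- Callers run with their own privileges, so grant them access to the
-- tables the procedure reads and writes (names here are placeholders):
GRANT SELECT ON TABLE my_schema.source_table TO `analysts`;
GRANT MODIFY ON TABLE my_schema.target_table TO `analysts`;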

🔹 Demo: Building a Data Cleaning Stored Procedure

Let’s go through a simple example:

1. Create Raw, Clean, and Log Tables

-- Raw table (Bronze layer in Medallion architecture)
CREATE OR REPLACE TABLE sales_raw (
    sale_id INT, sale_date_str STRING, customer_id STRING,
    amount DECIMAL(10,2), origin STRING
);

-- Clean table (Silver layer)
CREATE OR REPLACE TABLE sales_clean (
    sale_id INT, sale_date DATE, customer_id STRING,
    amount DECIMAL(10,2), origin STRING
);

-- ETL log table
CREATE OR REPLACE TABLE etl_log (
    etl_timestamp TIMESTAMP, date_from DATE, date_to DATE, origin STRING
);

-- Insert dummy data (note the stray whitespace the procedure will trim)
INSERT INTO sales_raw VALUES
(1, '2025-07-29 ', ' C001 ', 100.50, ' mobileApp '),
(2, '2025-07-30', 'C002', 200.75, 'web'),
(3, '2025-08-01', 'C003 ', 50.00, 'mobileApp');

2. Create the Stored Procedure

CREATE OR REPLACE PROCEDURE etl_load_sales_by_range_and_source (
    IN p_start_date DATE, IN p_end_date DATE, IN p_origin STRING
)
LANGUAGE SQL
SQL SECURITY INVOKER
AS
BEGIN
    DECLARE run_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP();

    -- Clean and insert into sales_clean
    INSERT INTO sales_clean
    SELECT
        sale_id,
        CAST(TRIM(sale_date_str) AS DATE),
        TRIM(customer_id),
        amount,
        TRIM(origin)
    FROM sales_raw
    WHERE CAST(TRIM(sale_date_str) AS DATE) BETWEEN p_start_date AND p_end_date
      AND TRIM(origin) = p_origin;

    -- Log ETL run
    INSERT INTO etl_log VALUES (run_timestamp, p_start_date, p_end_date, p_origin);
END;

3. Call the Procedure

CALL etl_load_sales_by_range_and_source(date('2025-07-28'), date('2025-08-01'), 'mobileApp');

4. Verify Results

-- Raw Data
SELECT * FROM sales_raw;
-- Cleaned Data
SELECT * FROM sales_clean;
-- Log 
SELECT * FROM etl_log;
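
Given the dummy data, only sale_id 1 and 3 should land in sales_clean: both have a trimmed origin of 'mobileApp' and fall inside the date range, while the 'web' row is filtered out. A quick sanity check:

-- Expect 2 rows (sale_ids 1 and 3)
SELECT COUNT(*) AS clean_rows FROM sales_clean;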

🔹 Other Commands You Can Use

Databricks has also introduced supporting commands for managing stored procedures:

-- Show all procedures in schema
SHOW PROCEDURES;

-- Describe a specific procedure
DESCRIBE PROCEDURE etl_load_sales_by_range_and_source;

-- Drop a procedure
DROP PROCEDURE etl_load_sales_by_range_and_source;
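
SHOW PROCEDURES appears to follow the same pattern as other SHOW commands and can be scoped to a schema; the catalog and schema names below are placeholders:

-- List procedures in a specific schema rather than the current one
SHOW PROCEDURES IN my_catalog.my_schema;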

🔹 Stored Procedures in the Catalog

Once created, your stored procedures also show up in the Databricks Catalog. Interestingly, they’re currently grouped under Functions rather than a separate “Stored Procedures” section.

This is slightly different from SQL Server conventions, but still a nice way to manage and lock them down with Unity Catalog permissions.


✅ Conclusion

The introduction of SQL Stored Procedures in Databricks is a huge step forward:

  • Reusable, parameterized SQL code
  • Built-in support for IN, OUT, and INOUT parameters
  • Logging and ETL-style use cases now much simpler
  • Manageable with SHOW, DESCRIBE, and DROP
  • Integrated into the Databricks Catalog

While the security model (SQL SECURITY INVOKER) may need some refinement, this is a fantastic addition — especially for teams migrating workloads from SQL Server or simplifying SQL-heavy pipelines.

👉 Give them a try in your Databricks environment today and see how they can streamline your workflows!
