Databricks SQL Stored Procedures

Hello and welcome back to the blog!

In today’s post, we’re diving into one of the newest features in Databricks: SQL Stored Procedures. This feature was released just a few weeks ago, and it’s a game-changer for anyone who’s been waiting for more SQL Server–like functionality inside Databricks.

We’ll walk through:

  • What stored procedures are in general
  • How Databricks implements them (including IN, OUT, and INOUT parameters)
  • How permissions work with SQL SECURITY INVOKER
  • A demo with sample data and reusable procedures
  • Extra commands (SHOW, DESCRIBE, DROP) you can use
  • How stored procedures show up in the Databricks catalog

🔹 What Are Stored Procedures?

If you’ve used SQL Server (or any relational database), you’ve likely worked with stored procedures before.

A stored procedure is essentially reusable SQL code that you can call again and again, rather than rewriting the same query.

Some key benefits:

  • You can parameterize procedures (e.g., pass in a customer_id)
  • They help with cleaner, reusable SQL workloads
  • Great for ETL pipelines (data cleaning, logging, transformations, etc.)
  • Now available natively in Databricks! 🎉

🔹 IN, OUT, and INOUT Parameters

Databricks stored procedures support three types of parameters:

  • IN: Passes a value into the procedure (e.g., customer_id)
  • OUT: Returns a value from the procedure (e.g., a computed total)
  • INOUT: Both accepts an input and returns an updated value

Example (simplified):

CREATE OR REPLACE PROCEDURE area_of_rectangle
  (
    IN x INT,       -- length
    IN y INT,       -- width
    OUT area INT,   -- returns calculated area
    INOUT acc INT   -- accumulator across calls
  )
LANGUAGE SQL
SQL SECURITY INVOKER
AS
BEGIN
  SET area = x * y;
  SET acc = acc + area;
END;
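
To call a procedure with OUT or INOUT parameters, you pass session variables for Databricks to write the results back into. Here's a quick sketch, assuming the procedure above exists in your current schema:

-- Session variables to receive the OUT and INOUT values
DECLARE OR REPLACE VARIABLE rect_area INT;
DECLARE OR REPLACE VARIABLE running_total INT DEFAULT 0;

-- First call: area = 3 * 4 = 12; accumulator becomes 0 + 12 = 12
CALL area_of_rectangle(3, 4, rect_area, running_total);

-- Second call: area = 5 * 2 = 10; accumulator becomes 12 + 10 = 22
CALL area_of_rectangle(5, 2, rect_area, running_total);

SELECT rect_area, running_total;  -- 10, 22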

🔹 What is SQL SECURITY INVOKER?

When you create a stored procedure in Databricks, you’ll notice this clause:

SQL SECURITY INVOKER

This means the procedure runs with the privileges of the caller, not those of the creator (the opposite of the SQL-standard DEFINER behavior).

  • Anyone who runs your procedure does so with their own privileges, so they also need access to the underlying tables it reads and writes (see the sketch below).
  • That's good for security, but worth keeping in mind: you don't always want everyone running procedures that modify production data.
  • Expect more granular security controls from Databricks in future updates.
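
Because of invoker rights, granting someone access to the procedure isn't enough on its own; callers also need privileges on the objects it touches. A minimal Unity Catalog sketch, assuming a group named `analysts` and placeholder schema/table names:

-- Callers run with their own privileges, so grant them access to the
-- tables the procedure reads and writes (names here are placeholders):
GRANT SELECT ON TABLE my_schema.source_table TO `analysts`;
GRANT MODIFY ON TABLE my_schema.target_table TO `analysts`;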

🔹 Demo: Building a Data Cleaning Stored Procedure

Let’s go through a simple example:

1. Create Raw, Clean, and Log Tables

-- Raw table (Bronze layer in Medallion architecture)
CREATE OR REPLACE TABLE sales_raw (
    sale_id INT, sale_date_str STRING, customer_id STRING,
    amount DECIMAL(10,2), origin STRING
);

-- Clean table (Silver layer)
CREATE OR REPLACE TABLE sales_clean (
    sale_id INT, sale_date DATE, customer_id STRING,
    amount DECIMAL(10,2), origin STRING
);

-- ETL log table
CREATE OR REPLACE TABLE etl_log (
    etl_timestamp TIMESTAMP, date_from DATE, date_to DATE, origin STRING
);

-- Insert dummy data (note the stray whitespace the procedure will trim)
INSERT INTO sales_raw VALUES
(1, '2025-07-29 ', ' C001 ', 100.50, ' mobileApp '),
(2, '2025-07-30', 'C002', 200.75, 'web'),
(3, '2025-08-01', 'C003 ', 50.00, 'mobileApp');

2. Create the Stored Procedure

CREATE OR REPLACE PROCEDURE etl_load_sales_by_range_and_source (
    IN p_start_date DATE, IN p_end_date DATE, IN p_origin STRING
)
LANGUAGE SQL
SQL SECURITY INVOKER
AS
BEGIN
    DECLARE run_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP();

    -- Clean and insert into sales_clean
    INSERT INTO sales_clean
    SELECT
        sale_id,
        CAST(TRIM(sale_date_str) AS DATE),
        TRIM(customer_id),
        amount,
        TRIM(origin)
    FROM sales_raw
    WHERE CAST(TRIM(sale_date_str) AS DATE) BETWEEN p_start_date AND p_end_date
      AND TRIM(origin) = p_origin;

    -- Log ETL run
    INSERT INTO etl_log VALUES (run_timestamp, p_start_date, p_end_date, p_origin);
END;

3. Call the Procedure

CALL etl_load_sales_by_range_and_source(date('2025-07-28'), date('2025-08-01'), 'mobileApp');

4. Verify Results

-- Raw Data
SELECT * FROM sales_raw;
-- Cleaned Data
SELECT * FROM sales_clean;
-- Log 
SELECT * FROM etl_log;
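
Given the dummy data, only sale_id 1 and 3 should land in sales_clean: both have a trimmed origin of 'mobileApp' and fall inside the date range, while the 'web' row is filtered out. A quick sanity check:

-- Expect 2 rows (sale_ids 1 and 3)
SELECT COUNT(*) AS clean_rows FROM sales_clean;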

🔹 Other Commands You Can Use

Databricks has also introduced supporting commands for managing stored procedures:

-- Show all procedures in schema
SHOW PROCEDURES;

-- Describe a specific procedure
DESCRIBE PROCEDURE etl_load_sales_by_range_and_source;

-- Drop a procedure
DROP PROCEDURE etl_load_sales_by_range_and_source;
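
SHOW PROCEDURES appears to follow the same pattern as other SHOW commands and can be scoped to a schema; the catalog and schema names below are placeholders:

-- List procedures in a specific schema rather than the current one
SHOW PROCEDURES IN my_catalog.my_schema;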

🔹 Stored Procedures in the Catalog

Once created, your stored procedures also show up in the Databricks Catalog. Interestingly, they’re currently grouped under Functions rather than a separate “Stored Procedures” section.

This is slightly different from SQL Server conventions, but still a nice way to manage and lock them down with Unity Catalog permissions.


✅ Conclusion

The introduction of SQL Stored Procedures in Databricks is a huge step forward:

  • Reusable, parameterized SQL code
  • Built-in support for IN, OUT, and INOUT parameters
  • Logging and ETL-style use cases now much simpler
  • Manageable with SHOW, DESCRIBE, and DROP
  • Integrated into the Databricks Catalog

While the security model (SQL SECURITY INVOKER) may need some refinement, this is a fantastic addition — especially for teams migrating workloads from SQL Server or simplifying SQL-heavy pipelines.

👉 Give them a try in your Databricks environment today and see how they can streamline your workflows!
