Emil Sadek
 GitHub  LinkedIn

Testing SQL with Python and SQLGlot

August 27, 2025

Testing production code is essential for ensuring correctness, but big data pipelines are challenging to test because many query engines are closed-source and cannot be run locally. What if you could test your SQL queries without connecting to a cloud data platform?

SQLGlot is a dependency-free SQL parser, transpiler, optimizer, and engine written in Python. It supports 27 SQL dialects, including Snowflake, BigQuery, Spark, Trino, DuckDB, and Postgres. SQLGlot can interpret SQL queries using Python dictionaries to represent tables, making it ideal for testing SQL. You can take a query written for an engine like BigQuery and execute it against mock Python data locally or in a continuous integration pipeline.

To get started, install SQLGlot (version 26.18+) in your Python environment:

pip install "sqlglot[rs]"

The rs extra adds a Rust-based tokenizer for improved performance.

Next, create a new Python file with the following code:

from sqlglot.executor import execute

sql = "SELECT avg(price) AS average_price FROM items"
tables = {
    "items": [
        {"id": 1, "price": 1.0},
        {"id": 2, "price": 2.0},
    ]
}
expected_result = [{"average_price": 1.5}]
result = execute(sql, dialect="bigquery", tables=tables).to_pylist()
assert result == expected_result

The code is quite simple. It imports the execute function from SQLGlot, calls the function with a SQL query, dialect, and table as arguments, converts the resulting table to a list of rows/dictionaries (using the to_pylist method), and then compares the result with the expected result.

Since it’s just Python, tests can be run with a Python testing framework such as pytest:

from pytest import main
from sqlglot.executor import execute


def test_average_price():
    sql = "SELECT avg(price) AS average_price FROM items"
    tables = {
        "items": [
            {"id": 1, "price": 1.0},
            {"id": 2, "price": 2.0},
        ]
    }
    expected_result = [{"average_price": 1.5}]
    result = execute(sql, dialect="bigquery", tables=tables).to_pylist()
    assert result == expected_result


if __name__ == "__main__":
    main()

To make test cases concise and focused, define a base/default row that provides values for all columns. For each test case, override only the relevant values using dictionary merging:

default_item = {
    "id": 145,
    "sku": "CW2288-111",
    "name": "Nike Air Force 1 '07",
    "category": "Men's Shoes",
    "price": 115.00,
    "in_stock": True,
}

tables = {
    "items": [
        default_item | {"price": 104.99, "in_stock": True},
        default_item | {"price": 249.99, "in_stock": False},
    ]
}

The merge (|) operator merges the right operand (dictionary) with the left operand (dictionary). If a key is present in both operands, the value from the right operand wins.

Now you’re all set to test your SQL like a pro, no cloud required. Happy querying!