Content

1 Key Takeaways
2 What Are Pandas and DuckDB?
3 Architectural Differences Between Pandas and DuckDB
4 Benchmarks
5 Integrating DuckDB and Pandas
6 When DuckDB Can Replace Pandas and When It Should Not
7 Limitations and Considerations
8 Conclusion

Vijona

1 hour ago

Pandas vs DuckDB: A Practical Comparison for Python Data Workflows

Pandas has been the go-to tool for data manipulation in Python for well over ten years. Whether you have cleaned features for a machine learning model, inspected a CSV file in a notebook, or built a small ETL workflow, there is a good chance your code has centered around the Pandas DataFrame. Pandas has played a major role in the Python data ecosystem for a long time.

However, as datasets have grown and analytical workloads have become more demanding, the in-memory and mostly single-threaded nature of Pandas has increasingly become a limitation. DuckDB addresses many of these challenges as an in-process SQL OLAP, or Online Analytical Processing, engine. Its common description as “SQLite for analytics” reflects its promise of:

Fast and scalable analytical queries
Native support for Parquet and Arrow
Parallel execution
The ability to process datasets larger than available memory

This article compares both tools from a practical, developer-focused perspective. It looks at how they differ in architecture, performance, scalability, and developer experience. It also explains where DuckDB can meaningfully take over tasks from Pandas and where Pandas remains the better option.

Key Takeaways

DuckDB should be seen as a complement to Pandas, not as a full replacement. DuckDB is designed for large analytical workloads, while Pandas remains very useful for small to medium-sized datasets, feature engineering, and general Python-native data manipulation.
Their architectures are fundamentally different. Pandas is a Python library built around an in-memory DataFrame object. DuckDB is a vectorized, SQL-first, column-oriented OLAP engine that supports parallel execution and can query data beyond memory limits.
Performance and scalability often favor DuckDB, especially when working with large Parquet or Arrow datasets. Its query execution can be much faster than Pandas because of projection pushdown, filter pushdown, multi-threading, and disk-backed execution.
A major strength of DuckDB is its ability to fit into hybrid workflows. It can query Pandas, Polars, and Arrow DataFrames directly, combine them with large external Parquet, CSV, or SQLite files, and return results in different formats. This makes it possible to use SQL directly inside a notebook workflow.
The best choice depends on the task. DuckDB is well suited for OLAP-style queries, large joins, and big datasets. Pandas is still excellent for interactive data cleaning, custom Python logic, close integration with machine learning and visualization libraries, and workloads that fit comfortably into RAM.

What Are Pandas and DuckDB?

To compare their performance and workflow capabilities properly, it helps to understand what each tool is designed to be. Pandas is a DataFrame library that is deeply integrated with Python. DuckDB, by contrast, is an embedded SQL analytics engine that runs directly inside your Python process.

Pandas as a Python DataFrame Library

Pandas is widely regarded as the standard library for data wrangling in Python. It introduced the DataFrame concept to Python in 2008 and has since become one of the most important tools in the data science ecosystem. A Pandas DataFrame is a two-dimensional, in-memory table structure with labeled rows and columns. It provides fast indexed access through a Python API. Operations such as slicing, filtering, and transforming data are supported by C and NumPy operations for vectorized execution.

Key Characteristics of Pandas

In-memory operation: Pandas DataFrames are stored in RAM. This allows fast random access for smaller datasets, but it also means that the complete dataset must fit into memory. When working with several million rows, memory errors and noticeable slowdowns can appear quickly if the system runs out of available memory.
Rich API and ecosystem: Pandas provides a broad, Python-friendly API for filtering, aggregating, merging, time series handling, and many other data tasks. It is highly flexible and also allows custom Python logic inside DataFrame workflows. Pandas also works very well with other libraries, including visualization tools and scikit-learn, making it easy to pass a DataFrame into charts or machine learning models.
Ease of use: Many data scientists find Pandas intuitive. Since the code is written in normal Python, it is usually straightforward to debug and iterate on. Basic Pandas workflows also have a relatively gentle learning curve.

DuckDB as an In-Process SQL Engine

DuckDB is often described as the “SQLite for analytics”. Like Pandas, it can query data inside your Python process. Under the hood, however, DuckDB works very differently:

Columnar, vectorized engine: DuckDB stores data in a column-oriented format and uses vectorized execution, processing chunks of column values, or vectors, at a time.
SQL-centered interface: DuckDB is mainly used through SQL queries such as SELECT, JOIN, and GROUP BY. It is a complete SQL engine with a rich dialect that is similar to PostgreSQL SQL. Workflows that require many lines of Pandas code can often be written as one SQL query in DuckDB.
Disk-based and memory-conscious: DuckDB can query disk-based columnar formats such as Parquet, CSV, and JSON directly without first loading everything into memory. In practice, you can point DuckDB to a 10 GB Parquet file and run a limited SELECT query almost immediately with low memory overhead.
Embedded OLAP database functionality: Because DuckDB is a database engine, it includes features expected from databases, including concurrent query support, transactions, window functions, common table expressions, parallel execution, and other advanced SQL capabilities.
Lightweight and dependency-free: DuckDB is a small package of around 10 MB with no external server requirement. After installing it with pip, it is ready to use. Everything it needs is statically linked, so no separate database service is required.

Architectural Differences Between Pandas and DuckDB

A comparison between DuckDB and Pandas shows several important architectural distinctions:

Feature	Pandas DataFrame Library	DuckDB In-Process SQL Engine
Core abstraction	Python DataFrame and Series objects with an imperative API.	Relational tables queried through declarative SQL and exposed through a Python API.
Storage format	In practice, DataFrames are row-oriented from a workflow perspective. Columns are backed by NumPy or Arrow arrays, but many operations move through rows in Python.	Columnar storage with data held in contiguous column blocks, optimized for scans and aggregations.
Execution model	Runs through Python loops and vectorized NumPy or Arrow operations. It is single-threaded by default, with multi-threading only in certain cases.	Uses a vectorized query engine that processes batches of values per operator through a C++ engine and can use multiple CPU cores.
Memory and scale	The full DataFrame, or grouped data during group-by operations, must fit into RAM. Out-of-core workflows need manual workarounds or other tools.	Built for larger-than-memory analytics, with transparent disk spilling and streaming from columnar files such as Parquet.
Concurrency and transactions	No ACID transaction layer. Operations modify in-memory objects inside one Python process.	Supports ACID transactions with MVCC, allowing safe transactional updates and consistent reads, including when using a persistent .duckdb file.
Query language and ergonomics	Python DataFrame API with method chains such as .groupby(), .agg(), and .merge(). Very flexible, but complex pipelines can become verbose.	Standard SQL with joins, window functions, common table expressions, and macros. For users comfortable with SQL, complex analytics can often be clearer and shorter.
Integration with DataFrames	Works directly with in-memory DataFrames, and much of the ecosystem is built around the Pandas API.	Can query Pandas and Polars DataFrames through replacement scans, using zero-copy where possible, and can return results as Pandas, Polars, Arrow, or NumPy.
File and external data access	Often depends on separate Python libraries, such as pyarrow or storage-specific packages, for Parquet, cloud/object storage, and other formats. Input and output logic is usually written explicitly in Python.	Includes connectors and extensions that can query CSV, Parquet, JSON, and many other formats, including object storage and HTTP sources, directly from SQL without additional Python packages.
Ecosystem and tooling	Large and mature ecosystem with strong support in visualization and machine learning libraries such as Matplotlib, Seaborn, scikit-learn, and statsmodels.	Growing analytics-focused ecosystem. Integrates through Arrow and connectors and can serve as a fast local engine for BI tools, dbt, or custom applications.
Typical sweet spot	Interactive exploration, feature engineering, and small to medium-sized datasets that fit easily into memory. Well suited for fast experimentation in notebooks.	Heavy analytical queries such as joins, aggregations, and window functions on large local datasets. Useful for SQL-first workflows, local OLAP, and moving expensive work away from Pandas.

DuckDB is built as a columnar database, so an aggregation such as SELECT SUM(value) can read only the columns needed instead of entire rows. With vectorized execution, operations are performed on large blocks of values at once, reducing overhead per operation. Pandas, by default, often processes data row by row or in smaller slices. As a result, DuckDB is frequently faster and more memory-efficient for aggregations, filters, and joins, especially as datasets become larger.

Benchmarks

The following benchmark compares DuckDB, Polars, and Pandas. It was run on a MacBook Pro M2 with 16 GB of RAM. The synthetic benchmark dataset represents 1 million people and 10 million position records. The performance results showed DuckDB as the fastest solution. On average, DuckDB finished in 3.84 seconds. Polars followed closely with an average runtime of 5.77 seconds. Pandas was slower than both DuckDB and Polars, with an average runtime of 19.57 seconds, which is more than five times slower than DuckDB.

This result is useful when evaluating the strengths and limits of Pandas, especially for large datasets or performance-sensitive data processing. It also shows why newer and more specialized data processing tools such as DuckDB and Polars are becoming increasingly important. Depending on the workload, they may offer substantial performance advantages over Pandas.

Integrating DuckDB and Pandas

One of DuckDB’s biggest strengths is that it does not force you to choose between SQL and Pandas. Instead, it fits into existing Python workflows:

Run SQL queries directly against in-memory Pandas DataFrames.
Return query results as Pandas, Polars, or Arrow objects.
Combine external files such as CSV, Parquet, or JSON with DataFrames in one query.
Use the same integration style with Polars and Apache Arrow.

DuckDB becomes a SQL engine inside your notebook rather than a separate system that has to be managed.

1. Querying Pandas DataFrames with SQL Using Replacement Scans

With DuckDB’s replacement scan mechanism, a Pandas DataFrame can be treated like a SQL table by referencing the variable name in a query. No explicit registration is required.

Example: Filtering and Transforming a Pandas DataFrame with SQL

Copy Code


pip install duckdb
import duckdb
import pandas as pd

# Create a small DataFrame in Pandas
df = pd.DataFrame(
    {
        "city": ["Paris", "Berlin", "Lagos", "Delhi"],
        "temp_c": [21, 19, 30, 35],
    }
)

# DuckDB detects 'df' as a table through replacement scans
result = duckdb.sql("""
    SELECT
        city,
        temp_c,
        temp_c * 9.0/5.0 + 32 AS temp_f
    FROM df
    WHERE temp_c >= 25
""").df()

print(result)

Key Points

df should be a regular Pandas DataFrame.
duckdb.sql(…) automatically reads df as a SQL table named df.
No copy operation is required because DuckDB accesses the existing DataFrame in memory.
The final result is materialized as another Pandas DataFrame.

You can also use an explicit connection and registration when you need more control:

Copy Code


con = duckdb.connect()
con.register("cities", df)

hot_cities = con.sql("""
    SELECT city, temp_c
    FROM cities
    WHERE temp_c >= 25
""").df()

2. Returning Results as Pandas, Polars, or Arrow

After executing a SQL query, you usually continue working in the Python data tool that best fits your workflow. DuckDB’s Python API provides convenient conversion methods for this:

.df() returns a Pandas DataFrame
.pl() returns a Polars DataFrame
.arrow() returns an Apache Arrow Table
.fetchnumpy() returns NumPy arrays

Example: One SQL Query with Multiple Result Formats

Copy Code


import duckdb
import pandas as pd
import polars as pl
import pyarrow as pa

# Simple example dataset
df = pd.DataFrame(
    {
        "product": ["A", "A", "B", "B"],
        "amount":  [100, 150,  80, 120],
    }
)

rel = duckdb.sql("""
    SELECT
        product,
        SUM(amount) AS total_amount
    FROM df
    GROUP BY product
    ORDER BY total_amount DESC
""")

# 1) Return results as Pandas
pandas_result = rel.df()

# 2) Return results as Polars
polars_result = rel.pl()

# 3) Return results as an Arrow table
arrow_result = rel.arrow()

print("Pandas:\n", pandas_result)
print("Polars:\n", polars_result)
print("Arrow schema:\n", arrow_result.schema)

In a real project, you might:

Use .df() when passing the result to scikit-learn or Python plotting libraries.
Use .pl() when the next processing steps are written in Polars.
Use .arrow() when data needs to move into systems that support Arrow or when zero-copy interchange is useful.

3. Hybrid Workflows: Joining Files and DataFrames in One Query

DuckDB is especially strong in hybrid workflows that combine:

Large datasets stored in files, such as Parquet on disk or in object storage.
Smaller reference tables held in memory through Pandas.

These can be joined in one SQL statement without writing custom ETL logic.

Example: Joining a Large Parquet Fact Table with a Small Pandas Dimension Table

For example, you might have:

data/orders.parquet, containing tens of millions of orders and too large to handle comfortably in Pandas.
A small Pandas DataFrame with customer segments calculated from a CRM export.

Copy Code


import os
import duckdb
import pandas as pd
from pathlib import Path

# 1. Make sure the 'data' directory exists
data_dir = Path("data")
data_dir.mkdir(exist_ok=True)

# 2. Build a small demo orders dataset in Pandas
orders = pd.DataFrame(
    {
        "order_id":    [101, 102, 103, 104, 105],
        "customer_id": [1,   2,   1,   3,   2],
        "order_date":  pd.to_datetime(
            ["2025-01-02", "2025-01-03", "2025-01-05", "2025-02-01", "2025-02-10"]
        ),
        "order_amount": [200.0, 150.0, 300.0, 80.0, 120.0],
    }
)

parquet_path = data_dir / "orders.parquet"

# 3. Store the data as Parquet, using pyarrow or fastparquet
orders.to_parquet(parquet_path, index=False)

print(f"Saved demo file at: {parquet_path.resolve()}")

# 4. Create a small customer segment DataFrame
customer_segments = pd.DataFrame(
    {
        "customer_id": [1, 2, 3],
        "segment":     ["Enterprise", "SMB", "Consumer"],
    }
)

# 5. Connect DuckDB and register the dimension table
con = duckdb.connect()
con.register("customer_segments", customer_segments)

# 6. Execute a hybrid query that joins Parquet data with Pandas data
query = f"""
    SELECT
        o.customer_id,
        s.segment,
        SUM(o.order_amount) AS total_spend
    FROM '{parquet_path.as_posix()}' AS o
    JOIN customer_segments AS s
      ON o.customer_id = s.customer_id
    WHERE o.order_date >= DATE '2025-01-01'
    GROUP BY o.customer_id, s.segment
    ORDER BY total_spend DESC
"""

top_customers = con.sql(query).df()
print(top_customers)

This pattern demonstrates a best-of-both-worlds approach:

Pandas can be used for ad-hoc data preparation, feature engineering, or creating lookup tables.
DuckDB can efficiently scan and aggregate large columnar files.
Everything happens inside one Python process or notebook, without needing Spark, Dask, or a separate data warehouse.

4. Replacement Scans with Polars and Arrow

DuckDB works well with Pandas, but its integration does not stop there. It also works smoothly with Polars and Apache Arrow. This is important when using a mixed data stack.

There are two common scenarios:

Use a Polars DataFrame or Arrow table as input for DuckDB queries.
Return DuckDB query results as Polars or Arrow objects.

Example A: Querying a Polars DataFrame with SQL

Copy Code


import duckdb
import polars as pl

# Create a Polars DataFrame
pl_df = pl.DataFrame(
    {
        "id":    [1, 2, 3, 4],
        "value": [10, 20, 30, 40],
    }
)

# DuckDB can detect 'pl_df' similarly to Pandas through replacement scans
rel = duckdb.sql("""
    SELECT
        id,
        value,
        value * 2 AS value_x2
    FROM pl_df
    WHERE value >= 20
""")

# Return the result as Polars again
result_pl = rel.pl()
print(result_pl)

Example B: Working with Arrow Tables

Copy Code


import duckdb
import pyarrow as pa

# Create a small Arrow table
arrow_table = pa.table(
    {
        "event_type": ["click", "view", "click", "purchase"],
        "user_id":    [10, 10, 11, 10],
        "amount":     [None, None, None, 99.0],
    }
)

# Register the Arrow table explicitly for clarity
con = duckdb.connect()
con.register("events", arrow_table)

# Run SQL on the Arrow table
agg = con.sql("""
    SELECT
        user_id,
        COUNT(*)                    AS num_events,
        SUM(CASE WHEN event_type = 'purchase' THEN amount ELSE 0 END) AS revenue
    FROM events
    GROUP BY user_id
""").arrow()

print(agg)

In both examples:

Unnecessary copies are avoided by staying close to Arrow or Polars memory layouts.
DuckDB acts as a fast SQL layer over existing DataFrame or Arrow data.
You can convert between Pandas, Polars, and Arrow at the boundaries of your pipeline, depending on what each step requires.

When DuckDB Can Replace Pandas and When It Should Not

Should Pandas be replaced completely by DuckDB? In some situations, yes, but not in every case. The following overview shows which types of tasks DuckDB can handle well and which are usually better suited to Pandas:

Tool	Use Case	When to Prefer It / Description
DuckDB	Very large datasets or big data on a laptop	Useful for analyzing datasets that are too large for Pandas, such as tens of millions of rows or multi-gigabyte files. It is suitable for log files, telemetry, and other large tables where Pandas would require sampling, downcasting, or might fail. DuckDB can often work with more than 100 million rows on a typical laptop with 16 GB RAM.
DuckDB	Analytical queries and OLAP-style reporting	Best for workflows that resemble SQL, including grouping, joining, filtering, aggregations, and window functions. DuckDB can replace chains of df.groupby().agg() and merge operations with compact SQL. It is often used for reports, exploratory analysis, and notebook-based transformations.
DuckDB	Columnar file formats such as Parquet and Arrow	Useful when data is mainly stored as Parquet or Arrow. DuckDB can query these files directly without loading full DataFrames into memory and can also write Parquet data. It works as a local query engine over data lake-style files and is often faster and more memory-efficient.
DuckDB	Interactive exploration with SQL	Helpful for users who prefer exploring data with SQL statements such as SELECT, WHERE, and LIMIT. DuckDB supports iterative querying in notebooks and scripts, similar to a local SQL IDE that can run offline. It can replace many Pandas-based exploration steps for SQL-oriented users.
DuckDB	Memory-bound computations	Well suited to operations that are very memory-heavy in Pandas, including large group-bys with many categories, multiple big joins, or wide tables. DuckDB uses streaming aggregation and disk-spilling join strategies, making workloads possible that might exceed Pandas memory limits.
DuckDB	Production and data engineering pipelines	Useful for recurring analytics in scripts or applications that process large data. It can be more stable and predictable at scale than Pandas, which may fail as data volumes grow. DuckDB fits nightly aggregation jobs, batch ETL, and embedded analytics, and it can be used from Python, R, Java, and other environments.
Pandas	Small datasets and quick scripts	Best for tiny or small datasets, such as a CSV with 5,000 rows, and quick one-off analyses. Loading and manipulating data in Pandas is immediate, and writing a short Python snippet is often faster than writing SQL, especially for users fluent in Python.
Pandas	Complex data manipulation and feature engineering	Best for tasks that need arbitrary Python logic, external libraries, or iterative algorithms. It is easy to create new columns using loops, vectorized operations, or apply. Pandas is useful for custom text processing, advanced ranking logic, specialized pivots, and detailed feature engineering.
Pandas	Ecosystem and library compatibility	Many tools expect Pandas DataFrames or NumPy arrays, including scikit-learn, Matplotlib, and Seaborn. Even if DuckDB prepares the data, the workflow often ends in Pandas for modeling or visualization. Staying in Pandas can be simpler when every step depends on it.
Pandas	Real-time interactive notebook edits	Strong for interactive data cleaning and inspection with commands such as df.head(), boolean filters, and manual edits like df.iloc[5, 3] = None. Because the data is directly editable in memory, ad-hoc adjustments can be more convenient than re-running SQL queries.
Pandas	Team habits and existing codebases	Useful when there is already a large Pandas codebase or when a team is comfortable with Pandas but not SQL. Rewriting everything in DuckDB has a cost, so gradual adoption is usually better. Pandas remains valuable because of its maturity and familiarity.

Pandas remains strong in terms of flexibility and for the broad range of small and medium-sized workloads where its overhead is not noticeable and its native Python integration is helpful. DuckDB is not meant to replace that completely. Instead, it is designed for situations where Pandas struggles, such as large datasets, demanding analytical queries, and SQL-centered workflows.

Limitations and Considerations

Although DuckDB has clear advantages, it is not a universal solution. Important limitations include:

Concurrency: DuckDB uses a single-writer model. It supports multiple concurrent readers, but only one write transaction at a time. For multi-writer workloads or heavy OLTP use cases, a server-based database such as PostgreSQL is a better fit.
Ecosystem maturity: Pandas has existed since 2008 and has a mature ecosystem of visualization and machine learning libraries around it. DuckDB’s ecosystem is growing quickly, but some higher-level convenience features available in Pandas are still missing.
Learning curve: Pandas syntax may feel more natural to analysts who do not know SQL. On the other hand, SQL offers long-term stability, portability, and broad familiarity across many systems.
No built-in plotting: DuckDB is a query engine. For visualizations, results still need to be used with Pandas, Matplotlib, or another plotting library.
Experimental functionality: DuckDB is under active development, and some features may still be experimental, including certain extensions or support for newer file formats. These features can change between versions.

Conclusion

DuckDB is not intended to eliminate Pandas. Instead, it extends what Pandas can do. As data workloads grow beyond a single in-memory DataFrame, DuckDB brings a modern, high-performance analytical engine directly into the Python workflow. It can read Parquet files, push filters down to storage, use all available CPU cores, and work with datasets much larger than RAM. This is possible without deploying a server, changing the working environment, or dealing with the overhead of distributed databases.

Pandas continues to be a strong choice for daily exploration, prototyping, and fast feature engineering that benefits from Python’s flexibility. But when a workload involves scanning millions of rows, joining large Parquet files, or working with Arrow-backed datasets, DuckDB provides speed and scalability that Pandas alone does not offer.

DuckDB gives data scientists the performance of a local analytical database inside a notebook while extending workflows built around Pandas, Polars, Arrow, and dbt.

In practice, the future is not “DuckDB versus Pandas.” It is DuckDB and Pandas together. Each tool has a clear role in the modern analytics stack. Combined, they create a workflow that is fast, flexible, and suitable for the scale of today’s data.

Source: digitalocean.com

Create a Free Account

Try now

Posts you might be interested in:

Moderne Hosting Services mit Cloud Server, Managed Server und skalierbarem Cloud Hosting für professionelle IT-Infrastrukturen

Kimi Linear: Efficient Long-Context AI Inference

AI/ML, Tutorial

14 minutes ago

Vijona14 minutes ago Kimi Linear: A Hardware-Aware Architecture for Efficient Long-Context AI Inference Moonshot AI has introduced another notable release. After the strong impression created by Kimi-K2 and its post-training…

Apache Airflow: Workflow Orchestration Guide

Python, Tutorial

37 minutes ago

Vijona37 minutes ago Apache Airflow: Workflow Orchestration for Data Pipelines Modern organizations that work with data depend on pipelines that collect, transform, enhance, and transfer information from one place to…

Build Faster Agentic LLM Workflows with Python

AI/ML, Tutorial

1 hour ago

Vijona1 hour ago Build Faster Agentic LLM Workflows with Asynchronous Python Calls Large language models can be difficult to run reliably in production because they may introduce inaccurate answers, inconsistent…

FEATURED PRODUCTS

Kubernetes

ccloud³

Managed Server

Cloud GPU

S3 Object Storage

COMPUTE

MANAGED

STORAGE

NETWORKING

MANAGEMENT TOOLS

BACKUPS & SNAPSHOTS

WEBSITE HOSTING

HOUSING

FEATURED INDUSTRIES

Enterprise

Saas-Hosting

Startup

INDUSTRIES

MORE INDUSTRIES

FEATURED USE CASES

Linux-Hosting

VMware Migration

Docker Hosting

USE CASES

MORE USE CASES

RESSOURCES

Help Center

Trust Center

Glossar

Tutorials

MORE CENTRON

MORE INFOS

FEATURED PRODUCTS

Kubernetes

ccloud³

Managed Server

Cloud GPU

S3 Object Storage

COMPUTE

MANAGED

STORAGE

NETWORKING

MANAGEMENT TOOLS

BACKUPS & SNAPSHOTS

WEBSITE HOSTING

HOUSING

FEATURED INDUSTRIES

Enterprise

Saas-Hosting

Startup

INDUSTRIES

MORE INDUSTRIES

FEATURED USE CASES

Linux-Hosting

VMware Migration

Docker Hosting

USE CASES

MORE USE CASES

RESSOURCES

Help Center

Trust Center

Glossar

Tutorials

MORE CENTRON

MORE INFOS

Pandas vs DuckDB: A Practical Comparison for Python Data Workflows

Key Takeaways

What Are Pandas and DuckDB?

Pandas as a Python DataFrame Library

Key Characteristics of Pandas

DuckDB as an In-Process SQL Engine

Architectural Differences Between Pandas and DuckDB

Benchmarks

Integrating DuckDB and Pandas

1. Querying Pandas DataFrames with SQL Using Replacement Scans

Example: Filtering and Transforming a Pandas DataFrame with SQL

Key Points

2. Returning Results as Pandas, Polars, or Arrow

Example: One SQL Query with Multiple Result Formats