

Easy Understanding of Pandas and SQL
1. Data Shape / Count
Python (Pandas)
df.shape
Returns a tuple: (rows, columns).
SQL
SELECT COUNT(*) FROM table;
Returns number of rows.
This helps you inspect dataset size. Pandas gives rows + columns; SQL gives only the row count unless you query metadata.
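A quick runnable sketch with a small made-up DataFrame (names and values here are illustrative only):

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"name": ["Ann", "Bob", "Cam"], "score": [90, 85, 88]})

# shape is a (rows, columns) tuple, unlike SQL's single COUNT(*)
rows, cols = df.shape
```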
2. Data Types / Schema
Python
df.dtypes
df.info()
SQL
DESCRIBE table;
SHOW COLUMNS FROM table;
Understanding the schema is essential for debugging, joins, and transformations.
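For example, on a tiny made-up table, dtypes shows each column's inferred type and info() adds non-null counts:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"id": [1, 2], "city": ["NY", "LA"], "temp": [21.5, 25.0]})

# Per-column types, roughly what DESCRIBE gives you in SQL
print(df.dtypes)
# info() also reports non-null counts and memory usage
df.info()
```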
3. Select Columns
Python
df[['col1', 'col2']]
SQL
SELECT col1, col2 FROM table;
Basic projection: selecting only the columns you need improves performance and readability.
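In Pandas the double brackets hold a list of column names and return a new DataFrame, e.g. (illustrative data):

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"col1": [1, 2], "col2": ["a", "b"], "col3": [True, False]})

# Double brackets return a DataFrame containing only the listed columns
projected = df[["col1", "col2"]]
```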
4. Filter Rows
Python
df[df['col'] > value]
df.query('col > value')
SQL
SELECT * FROM table WHERE col > value;
Filtering is the foundation of data analysis. Pandas uses boolean masks; SQL uses WHERE.
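Both Pandas forms return the same rows; a small sketch with made-up values:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"col": [5, 12, 8, 20]})

mask_result = df[df["col"] > 10]      # boolean mask
query_result = df.query("col > 10")   # query string, same rows
```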
5. Filter with Multiple Conditions
Python
df[(df['col1'] > 10) & (df['col2'] == "A")]
df.query('col1 > 10 & col2 == "A"')
SQL
SELECT * FROM table WHERE col1 > 10 AND col2 = 'A';
Combining conditions is routine: Pandas uses & and | (wrap each condition in parentheses); SQL uses AND and OR.
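A minimal sketch showing that the mask form and the query form agree (data is made up):

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"col1": [5, 15, 20], "col2": ["A", "A", "B"]})

# Boolean-mask form: each condition needs its own parentheses
masked = df[(df["col1"] > 10) & (df["col2"] == "A")]
# query() expresses the same logic as a string
queried = df.query('col1 > 10 & col2 == "A"')
```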
6. Sort Data
Python
df.sort_values('col', ascending=False)
SQL
SELECT * FROM table ORDER BY col DESC;
Sorting is used for ranking, top‑N queries, and reporting.
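ascending=False mirrors SQL's DESC; a quick sketch on made-up numbers:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"col": [3, 1, 2]})

# Equivalent to: SELECT * FROM table ORDER BY col DESC
ordered = df.sort_values("col", ascending=False)
```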
7. Group By and Aggregate
Python
df.groupby('col').agg({'col2': 'sum'})
SQL
SELECT col, SUM(col2) FROM table GROUP BY col;
Grouping is a core skill for analytics. SQL is more declarative; Pandas is more flexible with multiple aggregations.
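A runnable sketch of the sum-per-group pattern (illustrative data):

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"col": ["A", "A", "B"], "col2": [10, 5, 7]})

# Equivalent to: SELECT col, SUM(col2) FROM table GROUP BY col
totals = df.groupby("col").agg({"col2": "sum"})
```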
8. Count Unique Values
Python
df['col'].nunique()
SQL
SELECT COUNT(DISTINCT col) FROM table;
Mainly used for cardinality checks, data quality, and feature engineering.
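For instance, on a made-up column with one repeated value:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"col": ["x", "y", "x", "z"]})

# Like COUNT(DISTINCT col): "x" is counted once
distinct_count = df["col"].nunique()
```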
9. Find Unique Values
Python
df['col'].unique()
SQL
SELECT DISTINCT col FROM table;
Useful for exploring categories, labels, or dimension values.
10. Check for Missing Values
Python
df.isnull().sum()
SQL
SELECT COUNT(*) FROM table WHERE col IS NULL;
Data quality is a major interview topic. Pandas gives column‑wise counts; SQL requires per‑column queries.
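A sketch of the column-wise count Pandas gives you in one call (made-up data with one gap per column):

```python
import pandas as pd
import numpy as np

# Hypothetical example data with deliberate gaps
df = pd.DataFrame({"a": [1, np.nan, 3], "b": [None, "x", "y"]})

# One missing-value count per column; in SQL you'd write one query per column
missing = df.isnull().sum()
```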
11. Handle Missing Values
Python
df.fillna(value)
df.dropna()
SQL
SELECT COALESCE(col, value) FROM table;
SELECT * FROM table WHERE col IS NOT NULL;
Cleaning data is essential. SQL doesn't "fill" a table the way fillna does: COALESCE(col, value) substitutes a default for NULL in a query, and IS NOT NULL filters those rows out.
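Both Pandas options side by side, on a made-up column with one gap:

```python
import pandas as pd
import numpy as np

# Hypothetical example data
df = pd.DataFrame({"col": [1.0, np.nan, 3.0]})

filled = df.fillna(0)   # like COALESCE(col, 0)
dropped = df.dropna()   # like WHERE col IS NOT NULL
```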
12. Join Tables
Python
pd.merge(df1, df2, on='key')
SQL
SELECT * FROM t1 JOIN t2 ON t1.key = t2.key;
Joins are the heart of relational data. Pandas merge is extremely powerful and supports many join types.
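merge defaults to an inner join, matching the plain SQL JOIN above; a sketch with made-up keys:

```python
import pandas as pd

# Hypothetical example data: keys 2 and 3 appear in both tables
df1 = pd.DataFrame({"key": [1, 2, 3], "name": ["a", "b", "c"]})
df2 = pd.DataFrame({"key": [2, 3, 4], "val": [20, 30, 40]})

# how="inner" is the default; "left", "right", and "outer" are also supported
joined = pd.merge(df1, df2, on="key")
```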
13. Remove Duplicates
Python
df.drop_duplicates()
SQL
SELECT DISTINCT * FROM table;
Used for deduplication, data cleaning, and ensuring unique records.
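A quick sketch: a duplicated row disappears, just like with SELECT DISTINCT * (data is made up):

```python
import pandas as pd

# Hypothetical example data: first two rows are identical
df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "y"]})

# Drops fully duplicated rows; use subset=[...] to dedupe on specific columns
deduped = df.drop_duplicates()
```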
14. Add New Column
Python
df['new_col'] = df['col1'] + df['col2']
SQL
Two steps:
Add column:
ALTER TABLE table ADD COLUMN new_col datatype;
Populate column:
UPDATE table SET new_col = col1 + col2;
Pandas is dynamic; SQL requires schema changes.
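In Pandas the single assignment does both SQL steps at once; a sketch with made-up values:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"col1": [1, 2], "col2": [10, 20]})

# No ALTER TABLE needed: assignment creates and populates the column
df["new_col"] = df["col1"] + df["col2"]
```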
15. Rename Columns
Python
df.rename(columns={'old': 'new'})
SQL
ALTER TABLE table RENAME COLUMN old TO new;
Renaming is common during cleaning and standardization.
16. Limit Number of Rows
Python
df.head(n)
SQL
SELECT * FROM table LIMIT n;
Used for sampling, debugging, and quick checks.
17. Apply Function to Column
Pandas is extremely flexible with Python functions; SQL is more rigid and requires defined functions.
Pandas
- You can apply any Python function—even complex logic, regex, NLP, ML models.
- No need to register functions.
- Great for exploratory data analysis and feature engineering.
Ex:
df['col'].apply(func)
SQL
- Functions must be predefined, typed, and stored in the database.
- Logic must follow SQL’s procedural or declarative rules.
- More rigid but ensures consistency, security, and performance inside the database engine.
Ex (PostgreSQL):
CREATE PROCEDURE update_values()
LANGUAGE plpgsql
AS $$
BEGIN
UPDATE my_table
SET new_col = col * 2;
END;
$$;
Then call it:
CALL update_values();
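By contrast, the Pandas side needs no registration at all; any Python callable works (illustrative data):

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"col": [1, 2, 3]})

# apply() runs an arbitrary Python function over each value;
# for simple arithmetic like this, the vectorized df["col"] * 2 is faster
df["doubled"] = df["col"].apply(lambda x: x * 2)
```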
