Sorry, but Notd.io is not available without javascript Easy Understanding of Pandas and SQL - notd.io

Read more about Easy Understanding of Pandas and SQL
Read more about Easy Understanding of Pandas and SQL
Easy Understanding of Pandas and SQL

free notepinned

1. Data Shape / Count

Python (Pandas)

df.shape

Returns a tuple: (rows, columns).

SQL

SELECT COUNT(*) FROM table;

Returns number of rows.

This helps to inspect dataset size. Pandas gives rows + columns; SQL gives only row count unless you query metadata.

2. Data Types / Schema

Python

df.dtypes

df.info()

SQL

DESCRIBE table;

SHOW COLUMNS FROM table;

This Schema understanding is essential for debugging, joins, and transformations.

3. Select Columns

Python

df[['col1', 'col2']]

SQL

SELECT col1, col2 FROM table;

Purpose is , Basic projection — selecting only what you need improves performance and readability.

4. Filter Rows

Python

df[df['col'] > value]

df.query('col > value')

SQL

SELECT * FROM table WHERE col > value;

Intent is , Filtering is the foundation of data analysis. Pandas uses boolean masks; SQL uses WHERE.

5. Filter with Multiple Conditions

Python

df.query('col1 > 10 & col2 == "A"')

SQL

SELECT * FROM table WHERE col1 > 10 AND col2 = 'A';

This Shows your ability to combine conditions. Pandas uses & and |; SQL uses AND and OR.

6. Sort Data

Python

df.sort_values('col', ascending=False)

SQL

SELECT * FROM table ORDER BY col DESC;

Sorting is used for ranking, top‑N queries, and reporting.

7. Group By and Aggregate

Python

df.groupby('col').agg({'col2': 'sum'})

SQL

SELECT col, SUM(col2) FROM table GROUP BY col;

Grouping is the Core skill for analytics. SQL is more declarative; Pandas is more flexible with multiple aggregations.

8. Count Unique Values

Python

df['col'].nunique()

SQL

SELECT COUNT(DISTINCT col) FROM table;

This is mainly Used for cardinality checks, data quality, and feature engineering.

9. Find Unique Values

Python

df['col'].unique()

SQL

SELECT DISTINCT col FROM table;

Useful for exploring categories, labels, or dimension values.

10. Check for Missing Values

Python

df.isnull().sum()

SQL

SELECT COUNT(*) FROM table WHERE col IS NULL;

Data quality is a major interview topic. Pandas gives column‑wise counts; SQL requires per‑column queries.

11. Handle Missing Values

Python

df.fillna(value)

df.dropna()

SQL

SELECT * FROM table WHERE col IS NOT NULL; (SQL doesn’t “fill”; you use COALESCE)

Cleaning data is essential. Handling NULL values → COALESCE(col, value)

12. Join Tables

Python

pd.merge(df1, df2, on='key')

SQL

SELECT * FROM t1 JOIN t2 ON t1.key = t2.key;

Joins are the heart of relational data. Pandas merge is extremely powerful and supports many join types.

13. Remove Duplicates

Python

df.drop_duplicates()

SQL

SELECT DISTINCT * FROM table;

Mainly Used for deduplication, data cleaning, and ensuring unique records.

14. Add New Column

Python

df['new_col'] = df['col1'] + df['col2']

SQL

Two steps:

Add column:

ALTER TABLE table ADD COLUMN new_col datatype;

Populate column:

UPDATE table SET new_col = col1 + col2;

Pandas is dynamic; SQL requires schema changes.

15. Rename Columns

Python

df.rename(columns={'old': 'new'})

SQL

ALTER TABLE table RENAME COLUMN old TO new;

Renaming is common during cleaning and standardization.

16. Limit Number of Rows

Python

df.head(n)

SQL

SELECT * FROM table LIMIT n;

Used for sampling, debugging, and quick checks.

17. Apply Function to Column

Pandas is extremely flexible with Python functions; SQL is more rigid and requires defined functions.

Pandas

  • You can apply any Python function—even complex logic, regex, NLP, ML models.
  • No need to register functions.
  • Great for exploratory data analysis and feature engineering.

Ex:

df['col'].apply(func)

SQL

  • Functions must be predefined, typed, and stored in the database.
  • Logic must follow SQL’s procedural or declarative rules.
  • More rigid but ensures consistency, security, and performance inside the database engine.

CREATE PROCEDURE update_values()

LANGUAGE plpgsql

AS $$

BEGIN

UPDATE my_table

SET new_col = col * 2;

END;

$$;

Then call it:

CALL update_values();

You can publish here, too - it's easy and free.