

Easy Understanding of Pandas and SQL
1. Data Shape / Count
Python (Pandas)
df.shape
Returns a tuple: (rows, columns).
SQL
SELECT COUNT(*) FROM table;
Returns number of rows.
This helps you inspect dataset size. Pandas gives rows + columns; SQL gives only the row count unless you query metadata.
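A quick runnable sketch with a small made-up DataFrame (names and values here are illustrative only):

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"name": ["Ann", "Bob", "Cam"], "score": [90, 85, 88]})

# shape is a (rows, columns) tuple, unlike SQL's single COUNT(*)
rows, cols = df.shape
```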
2. Data Types / Schema
Python
df.dtypes
df.info()
SQL
DESCRIBE table;
SHOW COLUMNS FROM table;
Understanding the schema is essential for debugging, joins, and transformations.
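For example, on a tiny made-up table, dtypes shows each column's inferred type and info() adds non-null counts:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"id": [1, 2], "city": ["NY", "LA"], "temp": [21.5, 25.0]})

# Per-column types, roughly what DESCRIBE gives you in SQL
print(df.dtypes)
# info() also reports non-null counts and memory usage
df.info()
```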
3. Select Columns
Python
df[['col1', 'col2']]
SQL
SELECT col1, col2 FROM table;
Basic projection: selecting only the columns you need improves performance and readability.
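In Pandas the double brackets hold a list of column names and return a new DataFrame, e.g. (illustrative data):

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"col1": [1, 2], "col2": ["a", "b"], "col3": [True, False]})

# Double brackets return a DataFrame containing only the listed columns
projected = df[["col1", "col2"]]
```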
4. Filter Rows
Python
df[df['col'] > value]
df.query('col > value')
SQL
SELECT * FROM table WHERE col > value;
Filtering is the foundation of data analysis. Pandas uses boolean masks; SQL uses WHERE.
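Both Pandas forms return the same rows; a small sketch with made-up values:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"col": [5, 12, 8, 20]})

mask_result = df[df["col"] > 10]      # boolean mask
query_result = df.query("col > 10")   # query string, same rows
```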
5. Filter with Multiple Conditions
Python
df[(df['col1'] > 10) & (df['col2'] == "A")]
df.query('col1 > 10 & col2 == "A"')
SQL
SELECT * FROM table WHERE col1 > 10 AND col2 = 'A';
Combining conditions is routine: Pandas uses & and | (wrap each condition in parentheses); SQL uses AND and OR.
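A minimal sketch showing that the mask form and the query form agree (data is made up):

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"col1": [5, 15, 20], "col2": ["A", "A", "B"]})

# Boolean-mask form: each condition needs its own parentheses
masked = df[(df["col1"] > 10) & (df["col2"] == "A")]
# query() expresses the same logic as a string
queried = df.query('col1 > 10 & col2 == "A"')
```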
6. Sort Data
Python
df.sort_values('col', ascending=False)
SQL
SELECT * FROM table ORDER BY col DESC;
Sorting is used for ranking, top‑N queries, and reporting.
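ascending=False mirrors SQL's DESC; a quick sketch on made-up numbers:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"col": [3, 1, 2]})

# Equivalent to: SELECT * FROM table ORDER BY col DESC
ordered = df.sort_values("col", ascending=False)
```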
7. Group By and Aggregate
Python
df.groupby('col').agg({'col2': 'sum'})
SQL
SELECT col, SUM(col2) FROM table GROUP BY col;
Grouping is a core skill for analytics. SQL is more declarative; Pandas is more flexible with multiple aggregations.
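A runnable sketch of the sum-per-group pattern (illustrative data):

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"col": ["A", "A", "B"], "col2": [10, 5, 7]})

# Equivalent to: SELECT col, SUM(col2) FROM table GROUP BY col
totals = df.groupby("col").agg({"col2": "sum"})
```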
8. Count Unique Values
Python
df['col'].nunique()
SQL
SELECT COUNT(DISTINCT col) FROM table;
Mainly used for cardinality checks, data quality, and feature engineering.
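For instance, on a made-up column with one repeated value:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"col": ["x", "y", "x", "z"]})

# Like COUNT(DISTINCT col): "x" is counted once
distinct_count = df["col"].nunique()
```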
9. Find Unique Values
Python
df['col'].unique()
SQL
SELECT DISTINCT col FROM table;
Useful for exploring categories, labels, or dimension values.
10. Check for Missing Values
Python
df.isnull().sum()
SQL
SELECT COUNT(*) FROM table WHERE col IS NULL;
Data quality is a major interview topic. Pandas gives column‑wise counts; SQL requires per‑column queries.
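A sketch of the column-wise count Pandas gives you in one call (made-up data with one gap per column):

```python
import pandas as pd
import numpy as np

# Hypothetical example data with deliberate gaps
df = pd.DataFrame({"a": [1, np.nan, 3], "b": [None, "x", "y"]})

# One missing-value count per column; in SQL you'd write one query per column
missing = df.isnull().sum()
```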
11. Handle Missing Values
Python
df.fillna(value)
df.dropna()
SQL
SELECT COALESCE(col, value) FROM table;
SELECT * FROM table WHERE col IS NOT NULL;
Cleaning data is essential. SQL doesn't "fill" a table the way fillna does: COALESCE(col, value) substitutes a default for NULL in a query, and IS NOT NULL filters those rows out.
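Both Pandas options side by side, on a made-up column with one gap:

```python
import pandas as pd
import numpy as np

# Hypothetical example data
df = pd.DataFrame({"col": [1.0, np.nan, 3.0]})

filled = df.fillna(0)   # like COALESCE(col, 0)
dropped = df.dropna()   # like WHERE col IS NOT NULL
```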
12. Join Tables
Python
pd.merge(df1, df2, on='key')
SQL
SELECT * FROM t1 JOIN t2 ON t1.key = t2.key;
Joins are the heart of relational data. Pandas merge is extremely powerful and supports many join types.
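merge defaults to an inner join, matching the plain SQL JOIN above; a sketch with made-up keys:

```python
import pandas as pd

# Hypothetical example data: keys 2 and 3 appear in both tables
df1 = pd.DataFrame({"key": [1, 2, 3], "name": ["a", "b", "c"]})
df2 = pd.DataFrame({"key": [2, 3, 4], "val": [20, 30, 40]})

# how="inner" is the default; "left", "right", and "outer" are also supported
joined = pd.merge(df1, df2, on="key")
```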
13. Remove Duplicates
Python
df.drop_duplicates()
SQL
SELECT DISTINCT * FROM table;
Used for deduplication, data cleaning, and ensuring unique records.
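A quick sketch: a duplicated row disappears, just like with SELECT DISTINCT * (data is made up):

```python
import pandas as pd

# Hypothetical example data: first two rows are identical
df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "y"]})

# Drops fully duplicated rows; use subset=[...] to dedupe on specific columns
deduped = df.drop_duplicates()
```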
14. Add New Column
Python
df['new_col'] = df['col1'] + df['col2']
SQL
Two steps:
Add column:
ALTER TABLE table ADD COLUMN new_col datatype;
Populate column:
UPDATE table SET new_col = col1 + col2;
Pandas is dynamic; SQL requires schema changes.
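In Pandas the single assignment does both SQL steps at once; a sketch with made-up values:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"col1": [1, 2], "col2": [10, 20]})

# No ALTER TABLE needed: assignment creates and populates the column
df["new_col"] = df["col1"] + df["col2"]
```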
15. Rename Columns
Python
df.rename(columns={'old': 'new'})
SQL
ALTER TABLE table RENAME COLUMN old TO new;
Renaming is common during cleaning and standardization.
16. Limit Number of Rows
Python
df.head(n)
SQL
SELECT * FROM table LIMIT n;
Used for sampling, debugging, and quick checks.
17. Apply Function to Column
Pandas is extremely flexible with Python functions; SQL is more rigid and requires defined functions.
Pandas
- You can apply any Python function—even complex logic, regex, NLP, ML models.
- No need to register functions.
- Great for exploratory data analysis and feature engineering.
Ex:
df['col'].apply(func)
SQL
- Functions must be predefined, typed, and stored in the database.
- Logic must follow SQL’s procedural or declarative rules.
- More rigid but ensures consistency, security, and performance inside the database engine.
Ex (PostgreSQL):
CREATE PROCEDURE update_values()
LANGUAGE plpgsql
AS $$
BEGIN
UPDATE my_table
SET new_col = col * 2;
END;
$$;
Then call it:
CALL update_values();
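By contrast, the Pandas side needs no registration at all; any Python callable works (illustrative data):

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"col": [1, 2, 3]})

# apply() runs an arbitrary Python function over each value;
# for simple arithmetic like this, the vectorized df["col"] * 2 is faster
df["doubled"] = df["col"].apply(lambda x: x * 2)
```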
