Machine Learning Interview Questions 2025 - Series2

11. 🧠 What is Dimensionality Reduction in Machine Learning?

Dimensionality reduction refers to the process of reducing the number of input variables or features in a dataset. High-dimensional data can be complex, slow to process, and prone to overfitting. Techniques like Principal Component Analysis (PCA) and t-SNE help simplify datasets while preserving their essential patterns.

Benefits:

  • Improves model performance
  • Reduces training time
  • Helps visualize high-dimensional data
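As a minimal sketch of PCA with scikit-learn: synthetic 10-feature data is generated from 2 latent directions (the data and dimensions here are made up for illustration), then compressed back down to 2 components.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# 200 samples with 10 features, but almost all variance lives in 2 directions
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 10))

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # 200 samples, now only 2 features
print(pca.explained_variance_ratio_.sum())  # fraction of variance preserved
```

Because the data was built from 2 latent directions, 2 components retain nearly all of the variance, which is exactly the "preserve essential patterns" property described above.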

12. ⚙️ What Are Hyperparameters? How Are They Different from Parameters?

Hyperparameters are configuration values chosen before training begins, such as the learning rate, the number of trees in a Random Forest, or the batch size in a neural network. Parameters, on the other hand, are values the model learns from the data during training, such as the weights in linear regression.

Hyperparameter tuning is crucial for optimizing model performance and can be done using Grid Search or Random Search.
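A minimal Grid Search sketch with scikit-learn (the dataset and candidate values are arbitrary examples): the grid holds hyperparameters, while the fitted trees' split thresholds are the learned parameters.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Hyperparameters: chosen before training, searched over here
param_grid = {"n_estimators": [10, 50], "max_depth": [3, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)  # tries every combination with 3-fold cross validation

print(search.best_params_)  # the winning hyperparameter combination
```

Swapping `GridSearchCV` for `RandomizedSearchCV` samples the grid instead of exhaustively trying every combination, which scales better to large search spaces.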

13. 🧪 What is Model Validation?

Model validation checks how well your machine learning model performs on unseen data. It’s essential to ensure your model isn’t just memorizing the training data.

Common Validation Techniques:

  • Hold-out Validation
  • k-Fold Cross Validation
  • Leave-One-Out Cross Validation (LOOCV)
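The k-fold technique from the list can be sketched in a few lines of scikit-learn (using the Iris dataset purely as an example):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold cross validation: each fold is held out once as unseen data
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores)         # one accuracy score per fold
print(scores.mean())  # averaged estimate of performance on unseen data
```

Hold-out validation corresponds to a single train/test split, and LOOCV is the extreme case `cv=n_samples`, where every fold contains exactly one example.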

14. 📉 What is Gradient Descent?

Gradient Descent is an optimization algorithm used to minimize the cost or error in a model by adjusting parameters like weights. It works by computing the gradient (slope) of the cost function and moving in the direction of steepest descent.

Types: Batch, Stochastic, and Mini-batch Gradient Descent.
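A from-scratch sketch of batch gradient descent on simple linear regression (the true slope 3.0 and intercept 2.0 are made-up values for the demo): each step computes the gradient of the mean squared error and moves the weights against it.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + 0.1 * rng.normal(size=100)  # true w=3, b=2

w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    pred = w * X[:, 0] + b
    grad_w = 2 * np.mean((pred - y) * X[:, 0])  # d(MSE)/dw
    grad_b = 2 * np.mean(pred - y)              # d(MSE)/db
    w -= lr * grad_w  # step in the direction of steepest descent
    b -= lr * grad_b

print(round(w, 1), round(b, 1))  # ≈ 3.0 2.0
```

Using all 100 samples per gradient makes this batch gradient descent; subsampling one point per step would give the stochastic variant, and a small subset per step the mini-batch variant.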

15. 🔁 What is the Difference Between Bagging and Boosting?

Both are ensemble learning techniques, but they work differently:

  • Bagging (Bootstrap Aggregating): Builds multiple models independently and combines their output (e.g., Random Forest).
  • Boosting: Builds models sequentially, where each model corrects the errors of the previous one (e.g., XGBoost, AdaBoost).

16. 📊 What is ROC Curve and AUC Score?

The ROC Curve (Receiver Operating Characteristic) shows the trade-off between true positive rate and false positive rate across different thresholds. The AUC (Area Under the Curve) gives a single score to evaluate how well a model can distinguish between classes. AUC closer to 1 means better performance.
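Computing both on a synthetic binary problem, as a minimal scikit-learn sketch:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
probs = clf.predict_proba(X_te)[:, 1]  # scores for the positive class

# One (FPR, TPR) point per threshold; AUC summarizes the whole curve
fpr, tpr, thresholds = roc_curve(y_te, probs)
auc = roc_auc_score(y_te, probs)
print(auc)  # 0.5 = random guessing, 1.0 = perfect separation
```

Note that ROC/AUC is computed from the model's scores or probabilities, not from its hard class labels, which is why thresholds can be varied at all.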

17. 🧮 What is the Curse of Dimensionality?

As the number of features or dimensions in a dataset increases, the data becomes sparse and difficult to interpret or visualize. This phenomenon is known as the Curse of Dimensionality.

Impacts:

  • Models become more complex and prone to overfitting
  • Computational cost increases
  • Distance metrics lose meaning
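The last point, distances losing meaning, can be demonstrated directly: as dimensionality grows, the nearest and farthest points become almost equally far away. A small NumPy sketch (the sample sizes and dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
spread = {}
for d in (2, 1000):
    X = rng.uniform(size=(500, d))
    # distances from the first point to all the others
    dists = np.linalg.norm(X[1:] - X[0], axis=1)
    # relative gap between nearest and farthest neighbour
    spread[d] = (dists.max() - dists.min()) / dists.mean()
    print(d, round(spread[d], 2))
```

In 2 dimensions the spread is large, but in 1000 dimensions it collapses toward zero, so "nearest neighbour" carries far less information.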

18. 🧬 What Are Confounding Variables?

Confounding variables are external factors that influence both the independent and dependent variables, creating a false association. In machine learning, failing to control for these can lead to misleading model predictions or biased results.
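A simulated sketch of the classic ice-cream/drownings example (all quantities are invented for the demo): temperature drives both variables, creating a strong correlation between two things that do not cause each other, which vanishes once the confounder is controlled for.

```python
import numpy as np

rng = np.random.default_rng(0)
temperature = rng.normal(25, 5, size=1000)            # the confounder
ice_cream = temperature + rng.normal(0, 1, size=1000)  # driven by temperature
drownings = temperature + rng.normal(0, 1, size=1000)  # also driven by it

# Spurious association: strong correlation with no causal link
r_raw = np.corrcoef(ice_cream, drownings)[0, 1]
print(round(r_raw, 2))  # close to 1

# Control for the confounder: correlate the residuals after
# regressing temperature out of each variable
ic_res = ice_cream - np.polyval(np.polyfit(temperature, ice_cream, 1), temperature)
dr_res = drownings - np.polyval(np.polyfit(temperature, drownings, 1), temperature)
r_adj = np.corrcoef(ic_res, dr_res)[0, 1]
print(round(r_adj, 2))  # near 0 once temperature is accounted for
```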

19. 🔄 What is the Difference Between Batch and Online Learning?

Batch Learning trains the model using the entire dataset at once. It's efficient for static datasets.
Online Learning updates the model incrementally as new data comes in—perfect for real-time or streaming data scenarios.
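Online learning can be sketched with scikit-learn's `partial_fit`, which updates the model one chunk at a time instead of refitting on everything (the chunk size of 100 is an arbitrary choice):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=1000, random_state=0)
clf = SGDClassifier(random_state=0)

# Feed the data in chunks, as if it were arriving from a live stream
for start in range(0, 1000, 100):
    Xb, yb = X[start:start + 100], y[start:start + 100]
    clf.partial_fit(Xb, yb, classes=np.array([0, 1]))  # incremental update

print(clf.score(X, y))
```

By contrast, batch learning would be a single `clf.fit(X, y)` call over the full dataset, which must be repeated from scratch whenever new data arrives.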

20. 🧱 What is One-Hot Encoding?

One-Hot Encoding is a technique used to convert categorical variables into a numerical format so they can be used in ML models. Each category becomes a new binary column (1 if present, 0 otherwise).

Example: Color = [Red, Blue, Green] becomes:

Red   Blue   Green
1      0      0
0      1      0
0      0      1
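The table above can be reproduced with one pandas call (note that `get_dummies` orders the new columns alphabetically):

```python
import pandas as pd

df = pd.DataFrame({"Color": ["Red", "Blue", "Green"]})
# Each category becomes its own binary column, prefixed with the original name
encoded = pd.get_dummies(df, columns=["Color"])
print(encoded)
```

scikit-learn's `OneHotEncoder` does the same job inside a pipeline, and its `handle_unknown="ignore"` option is useful when unseen categories may appear at prediction time.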
