Machine Learning Interview Questions - Series 3

April 19, 2025

21. 🌟 What is a Cost Function in Machine Learning?

A cost function measures how well a model is performing. It calculates the difference between predicted values and actual outcomes. In simpler terms, it tells you how "wrong" your model is. Lower cost means better performance. Common cost functions include Mean Squared Error (MSE) for regression and Cross-Entropy for classification.
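
To make the idea concrete, here is a minimal sketch of the two cost functions named above, written in plain Python. The function names and the sample numbers are made up for illustration, and the cross-entropy shown is the binary (0/1 label) form only.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average of squared differences; a common regression cost."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average negative log-likelihood for 0/1 labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so we never take log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical values, chosen only to show that lower cost means a better fit.
mse = mean_squared_error([3.0, 5.0], [2.0, 7.0])      # (1 + 4) / 2
bce_good = binary_cross_entropy([1, 0], [0.9, 0.1])   # confident and right
bce_bad = binary_cross_entropy([1, 0], [0.1, 0.9])    # confident and wrong
```

The confident-and-wrong predictions get a much higher cross-entropy than the confident-and-right ones, which is exactly the "how wrong is the model" signal a cost function is meant to give.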

22. 🎯 What is Overfitting and How Can You Prevent It?

Overfitting happens when your model learns the training data too well, noise included, so it performs well on the training set but poorly on new data. To detect and prevent it:

Use cross-validation to spot the gap between training and validation performance
Apply regularization techniques like L1 or L2
Use simpler models or prune decision trees
Collect more data if possible

23. 🧾 What is Underfitting?

Underfitting occurs when a model is too simple to capture the underlying patterns in the data. It performs poorly on both training and testing data. The fix? Use a more complex model, add more features, or reduce regularization.

24. 🗂️ What is Feature Engineering?

Feature engineering is the art of selecting, modifying, or creating new input variables to improve model performance. For example, combining “date” and “time” into a “timestamp” or extracting text length from reviews can help models learn better.
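
The two examples above can be sketched in a few lines of Python. The record layout, the field names, and the date format are assumptions made for this illustration, not part of any particular dataset.

```python
from datetime import datetime

def engineer_features(record):
    """Return a copy of `record` with two derived features added."""
    features = dict(record)
    # Combine the separate date and time strings into one timestamp.
    features["timestamp"] = datetime.strptime(
        f"{record['date']} {record['time']}", "%Y-%m-%d %H:%M"
    )
    # Extract the text length of the review as a numeric feature.
    features["review_length"] = len(record["review"])
    return features

# Hypothetical input row.
row = {"date": "2025-04-19", "time": "14:30", "review": "Great product"}
result = engineer_features(row)
```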

25. 📦 What is Feature Selection and Why Is It Important?

Feature selection involves choosing the most relevant features for your model. This reduces noise, speeds up training, and lowers the risk of overfitting. Methods include filter-based selection, wrapper methods, and embedded techniques like Lasso regression.

26. 🧩 What Is Cross-Validation?

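Cross-validation helps test your model’s ability to generalize. In k-Fold Cross-Validation, data is split into k parts: the model trains on k-1 parts and tests on the remaining part. This cycle repeats k times so that every part serves as the test set exactly once, and the k scores are averaged to give a more robust evaluation than a single train/test split.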
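
Here is a rough sketch of how the k-fold split itself might be generated. The helper `k_fold_indices` is a made-up name, and it uses contiguous folds without shuffling to keep the example short; in practice the data is usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k roughly equal, contiguous folds."""
    # Spread any remainder over the first few folds.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop

# 10 samples, 5 folds: each fold holds out 2 samples for testing.
folds = list(k_fold_indices(n_samples=10, k=5))
```

In a real evaluation you would train on `train_idx`, score on `test_idx` for each pair, and average the scores.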

27. 🚧 What is Regularization?

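Regularization adds a penalty to the loss function to discourage complex models. It helps prevent overfitting. L1 (Lasso) penalizes the sum of absolute coefficient values and can shrink some coefficients to exactly zero, effectively performing feature selection. L2 (Ridge) penalizes the sum of squared coefficients, shrinking all of them toward zero without usually eliminating any.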
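
A small sketch of what "adding a penalty to the loss" looks like in code. The names `lam` (the penalty strength) and `regularized_loss`, and the sample weights, are assumptions for illustration.

```python
def l1_penalty(weights, lam):
    """Lasso-style penalty: lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """Ridge-style penalty: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)

def regularized_loss(base_loss, weights, lam, kind="l2"):
    """Base loss (e.g. MSE) plus the chosen penalty term."""
    penalty = l1_penalty(weights, lam) if kind == "l1" else l2_penalty(weights, lam)
    return base_loss + penalty

# Hypothetical model weights and an unpenalized loss of 1.0.
weights = [0.5, -2.0, 0.0]
loss_l1 = regularized_loss(1.0, weights, lam=0.1, kind="l1")  # 1.0 + 0.1 * 2.5
loss_l2 = regularized_loss(1.0, weights, lam=0.1, kind="l2")  # 1.0 + 0.1 * 4.25
```

Note how the large weight (-2.0) dominates the L2 penalty because it is squared, while the L1 penalty grows only linearly with it.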

28. 🔄 What Is the Difference Between Lazy and Eager Learning?

Lazy Learning (like k-NN) stores data and waits until a query to make predictions. Eager Learning (like Decision Trees) builds a model immediately during training. Lazy learners are slower at prediction but quick to train.
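
To see why k-NN counts as lazy, here is a bare-bones sketch. The class name `KNNClassifier` and the toy points are made up for this example; notice that `fit` does nothing but store the data, and all the real work happens at prediction time.

```python
from collections import Counter

class KNNClassifier:
    """Minimal k-nearest-neighbours classifier using squared Euclidean distance."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" only memorizes the data: this is what makes it lazy.
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, x):
        # Every query scans the whole stored dataset, so prediction is the slow part.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(row, x)), label)
            for row, label in zip(self.X, self.y)
        )
        votes = Counter(label for _, label in dists[: self.k])
        return votes.most_common(1)[0][0]

# Hypothetical two-cluster toy data.
model = KNNClassifier(k=3).fit(
    [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)],
    ["a", "a", "a", "b", "b", "b"],
)
prediction = model.predict_one((0.5, 0.5))
```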

29. ⚡ What Is the Difference Between Supervised and Unsupervised Learning?

Supervised Learning: Learns from labeled data (e.g., predicting house prices)
Unsupervised Learning: Finds hidden patterns in unlabeled data (e.g., customer segmentation)

30. 🤖 What is Reinforcement Learning?

Reinforcement learning is a feedback-based learning technique where an agent learns by interacting with an environment. It receives rewards or penalties and updates its actions to maximize cumulative reward. Think of it like training a dog using treats!

31. 💡 What Are Activation Functions in Neural Networks?

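An activation function transforms a neuron's weighted input into its output, deciding how strongly the neuron "fires". Activation functions add non-linearity to neural networks; without them, stacked layers would collapse into a single linear model. Common types include: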

Sigmoid
Tanh
ReLU (Rectified Linear Unit)
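
The three functions listed above are simple enough to sketch in plain Python. This is a scalar-only illustration; real frameworks apply them element-wise to whole arrays.

```python
import math

def sigmoid(x):
    """Squashes any real number into (0, 1); written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def tanh(x):
    """Squashes any real number into (-1, 1), centered on zero."""
    return math.tanh(x)

def relu(x):
    """Passes positive values through unchanged and clips negatives to zero."""
    return max(0.0, x)

outputs = {name: f(0.0) for name, f in [("sigmoid", sigmoid), ("tanh", tanh), ("relu", relu)]}
```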

32. 🧬 What is a Confusion Matrix?

A confusion matrix shows the performance of a classification model by comparing predicted vs actual outcomes. It consists of True Positives, True Negatives, False Positives, and False Negatives. It helps compute accuracy, precision, recall, and F1 score.
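
As a quick illustration, the four cells can be counted directly from the labels. The function name `confusion_counts` and the sample labels below are invented for this sketch, which handles the binary case only.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count the four confusion-matrix cells for a binary classifier."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1          # predicted positive, actually positive
        elif predicted == positive:
            fp += 1          # predicted positive, actually negative
        elif actual == positive:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1          # predicted negative, actually negative
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn}

# Hypothetical labels for eight samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
accuracy = (counts["tp"] + counts["tn"]) / len(y_true)
```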

33. 🔍 What Are Precision and Recall?

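Precision measures how many of the predicted positives are actually positive: TP / (TP + FP). Recall measures how many of the actual positives were correctly predicted: TP / (TP + FN). They matter most when classes are imbalanced or when the two kinds of error have very different costs, as in medical diagnosis, where a missed case (low recall) can be far worse than a false alarm.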
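
Here is a small sketch that computes both metrics, plus the F1 score (their harmonic mean), from raw counts. The counts are hypothetical, and returning 0.0 when a denominator is zero is just one common convention.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical screening test: 8 true positives, 2 false alarms, 12 missed cases.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=12)
```

In this made-up case precision is high (most flagged cases are real) while recall is low (most real cases are missed), which is why looking at only one of the two can be misleading.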

34. 📈 What is the Bias-Variance Tradeoff?

High bias means the model is too simple (underfitting), while high variance means it's too complex (overfitting). The goal is to find a balance—just enough complexity to generalize well on unseen data.

35. 🧠 What is the Difference Between Deep Learning and Machine Learning?

Machine Learning includes algorithms that learn from data (like Decision Trees, SVM). Deep Learning is a subset that uses neural networks with multiple layers, especially useful in image, speech, and natural language tasks.

36. 📚 What Is Transfer Learning?

Transfer Learning uses knowledge from a pre-trained model on a large dataset and applies it to a new but related problem. It’s especially useful in deep learning, like using ImageNet-trained models for medical imaging.

37. 📊 What Are Ensemble Models?

Ensemble models combine predictions from multiple models to improve accuracy and robustness. Popular techniques include:

Bagging: Random Forest
Boosting: XGBoost, AdaBoost
Stacking: Training a meta-model on the predictions of several base models
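
The simplest way to see "combining predictions" in action is a plain majority vote, which is the combining step bagging-style classifiers such as Random Forest use. The sketch below is not boosting or stacking, and the three "models" are just hard-coded hypothetical prediction lists.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine class predictions from several models by per-sample majority vote."""
    n_samples = len(predictions_per_model[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(model_preds[i] for model_preds in predictions_per_model)
        combined.append(votes.most_common(1)[0][0])
    return combined

# Hypothetical predictions from three models on four samples.
# Suppose the true labels are [1, 0, 1, 1]: each model gets one sample wrong.
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]
ensemble = majority_vote([model_a, model_b, model_c])
```

Because the three models make their mistakes on different samples, the vote can outperform any single one of them, which is the core intuition behind ensembles.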

38. 🔒 What is Model Interpretability?

Model interpretability refers to how well humans can understand a model’s decisions. Simple models like decision trees are interpretable, while deep neural networks are often black boxes. Tools like SHAP and LIME help explain complex models.

39. 🧮 What is a Confounding Variable?

A confounding variable influences both the independent and dependent variables, potentially distorting the true relationship between them. It's crucial to identify and control for confounders; otherwise a model or analysis may attribute an effect to the wrong variable.

40. 🖇️ What is Multicollinearity?

Multicollinearity occurs when two or more features are highly correlated. It makes coefficient estimates unstable and hard to interpret, even when overall predictions remain accurate. Detect it using correlation matrices or VIF (Variance Inflation Factor), and resolve it by removing or combining features.
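
Here is a small sketch of both detection methods for the simplest case of exactly two features, where VIF reduces to 1 / (1 - r²) with r the Pearson correlation between them. The feature values are invented, and a general VIF would instead regress each feature on all the others.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def vif_two_features(x, y):
    """VIF for a model with exactly two features: 1 / (1 - r^2)."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)

# Hypothetical features: height in cm, a noisy copy in inches, and an unrelated score.
height_cm = [150, 160, 170, 180, 190]
height_in = [59.1, 62.9, 67.0, 70.8, 74.9]
score = [3, 9, 1, 8, 4]

r_redundant = pearson_r(height_cm, height_in)
vif_redundant = vif_two_features(height_cm, height_in)
vif_unrelated = vif_two_features(height_cm, score)
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a warning sign; the two height columns land far above that, while the unrelated score stays close to 1.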

Radhika Nanda