Question 1. Explain How We Can Capture The Correlation Between Continuous And Categorical Variables?
This is possible with the ANCOVA technique, which stands for Analysis of Covariance.
It is used to measure the association between a continuous variable and a categorical variable.
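A minimal sketch of a one-way ANCOVA, assuming NumPy is available. The idea is to compare the continuous outcome across the levels of the categorical variable while adjusting for a continuous covariate. This is done with an F-test between a reduced model (intercept plus covariate) and a full model that also contains dummy columns for the groups. The function name `ancova_group_f_test` and the toy data are hypothetical and only for illustration.

```python
import numpy as np


def ancova_group_f_test(outcome, groups, covariate):
    """F-test for the effect of a categorical variable on a continuous outcome,
    adjusting for one continuous covariate (hypothetical helper, one-way ANCOVA)."""
    y = np.asarray(outcome, dtype=float)
    x = np.asarray(covariate, dtype=float)
    levels = sorted(set(groups))
    n = len(y)

    # Reduced model: intercept + covariate only.
    x_reduced = np.column_stack([np.ones(n), x])
    # Full model: one dummy column per level, except the first (reference) level.
    dummies = [np.array([g == level for g in groups], dtype=float) for level in levels[1:]]
    x_full = np.column_stack([x_reduced, *dummies])

    def residual_sum_of_squares(design):
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        residuals = y - design @ coef
        return float(residuals @ residuals)

    rss_reduced = residual_sum_of_squares(x_reduced)
    rss_full = residual_sum_of_squares(x_full)
    df_between = len(levels) - 1
    df_within = n - x_full.shape[1]
    # How much the group dummies reduce the error, relative to the remaining error.
    f_stat = ((rss_reduced - rss_full) / df_between) / (rss_full / df_within)
    return f_stat, df_between, df_within


# Hypothetical toy data: group "b" is shifted up by 5 after accounting for the covariate.
covariate = [1, 2, 3, 4, 5, 6]
groups = ["a", "a", "a", "b", "b", "b"]
noise = [0.1, -0.1, 0.0, 0.1, -0.1, 0.0]
outcome = [2 * c + (5 if g == "b" else 0) + e for c, g, e in zip(covariate, groups, noise)]

f_stat, df1, df2 = ancova_group_f_test(outcome, groups, covariate)
print(f"F({df1}, {df2}) = {f_stat:.1f}")
```

A large F value indicates a strong association between the categorical variable and the continuous outcome. A p-value can then be read from the F distribution with these degrees of freedom (for example with `scipy.stats.f.sf`).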
Question 2. How To Handle Missing Or Corrupted Data In A Dataset?
Missing or corrupted data in a dataset can be handled in two ways: you can drop the affected rows or columns, or you can replace the missing values with another value.
In Pandas, two very useful methods for this are isnull(), which identifies missing values, and dropna(), which drops the rows or columns that contain them. To replace missing values instead of dropping them, fillna() can be used.
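A minimal sketch of both strategies with Pandas, assuming a small hypothetical DataFrame with one numeric and one text column. Filling the numeric column with its mean and the text column with a placeholder is just one possible choice.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value in each column.
df = pd.DataFrame({
    "age": [25, np.nan, 31, 40],
    "city": ["Oslo", "Lima", None, "Pune"],
})

# Identify: isnull() returns True wherever a value is missing.
missing_per_column = df.isnull().sum()

# Drop: remove every row (default) or column (axis=1) that contains a missing value.
dropped_rows = df.dropna()
dropped_cols = df.dropna(axis=1)

# Replace: fill missing values per column (mean for numbers, a placeholder for text).
filled = df.fillna({"age": df["age"].mean(), "city": "unknown"})

print(missing_per_column)
print(filled)
```

Dropping is simple but discards information; replacing keeps every row but introduces assumed values, so the right choice depends on how much data is missing.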
Question 3. Define The Fourier Transform In A Single Sentence?
The Fourier transform decomposes a generic function into a superposition of symmetric functions (sines and cosines), expressing it in terms of its frequency components.
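To make the decomposition concrete, here is a minimal sketch of the discrete Fourier transform (DFT) using only the standard library. It is a naive O(n²) version for illustration; real code would typically use an FFT such as `numpy.fft.fft`. The signal is a hypothetical cosine that completes 2 cycles over 8 samples.

```python
import cmath
import math


def dft(samples):
    """Naive DFT: coefficient k measures how much of frequency k is in the signal."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# Hypothetical signal: a cosine with 2 cycles across 8 samples.
signal = [math.cos(2 * math.pi * 2 * t / 8) for t in range(8)]
magnitudes = [abs(c) for c in dft(signal)]
print([round(m, 6) for m in magnitudes])
```

All of the signal's energy appears at frequency bin 2 (and its mirror bin 6 for a real-valued signal), while every other bin is zero.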
Question 4. What Is Deep Learning?
Deep learning is a subset of machine learning that uses neural networks with many layers to learn from data.
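Question 5. What Is The Difference Between An Array And A Linked List?
An array is an ordered collection of elements stored in contiguous memory, so any element can be accessed directly by its index. A linked list is a series of nodes, each holding a reference to the next one, so its elements must be processed in sequential order; in exchange, inserting or removing a node does not require shifting the other elements.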
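A minimal sketch of the difference, assuming a hypothetical singly linked `Node` class. Python's built-in list is backed by a dynamic array, so it supports direct indexing, whereas the linked list has to be walked node by node.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Node:
    value: Any
    next: Optional["Node"] = None  # reference to the following node, or None at the end


def from_values(values):
    """Build a linked list from a Python list, returning the head node."""
    head = None
    for value in reversed(values):
        head = Node(value, head)
    return head


def get_at(head, index):
    """Sequential access: follow `next` references index times."""
    node = head
    for _ in range(index):
        if node is None:
            raise IndexError(index)
        node = node.next
    if node is None:
        raise IndexError(index)
    return node.value


def insert_after(node, value):
    """Insert without shifting any other element: just relink two references."""
    node.next = Node(value, node.next)


def to_values(head):
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values


array = [10, 20, 30]            # direct access by index
linked = from_values([10, 20, 30])
insert_after(linked, 15)        # linked list is now 10 -> 15 -> 20 -> 30
```

Here `array[2]` jumps straight to the element, while `get_at(linked, 2)` follows two references to reach it.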
Question 6. Define A Hash Table?
A hash table is a data structure that implements an associative array: it uses a hash function to map each key to a bucket where the corresponding value is stored.
Hash tables are commonly used for database indexing.
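A minimal sketch of a hash table that uses separate chaining (each bucket holds a list of key/value pairs). The class name, the fixed bucket count, and the lack of resizing are simplifying assumptions; Python's own `dict` is a far more sophisticated hash table.

```python
class HashTable:
    """Toy hash table with separate chaining and a fixed number of buckets."""

    def __init__(self, num_buckets=8):
        self._buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # The hash function picks the bucket; equal keys always land in the same one.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))       # new key: add to the chain

    def get(self, key):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        raise KeyError(key)


table = HashTable()
table.put("apple", 1)
table.put("banana", 2)
table.put("apple", 3)
```

Lookups stay fast on average because only one short bucket is searched instead of the whole collection.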
Question 7. Mention Any One Of The Data Visualization Tools That You Are Familiar With?
This is another question where you need to be completely honest, and sharing your personal experience with these types of tools is really important. Some popular data visualization tools are Tableau, Plot.ly, and matplotlib.
Question 8. Is Rotation Necessary In PCA?
Yes. Rotation (an orthogonal rotation of the axes) is necessary because it maximizes the difference between the variances captured by the components, which makes the components easier to interpret.
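A minimal sketch of PCA as an orthogonal rotation, assuming NumPy and hypothetical 2-D data that lies close to the line y = 2x. The eigenvectors of the covariance matrix form the rotation; after rotating, the total variance is unchanged but it is concentrated as much as possible in the first component.

```python
import numpy as np


def pca_rotation(data):
    """Return the rotation matrix, the rotated data, and the variance per component."""
    centered = np.asarray(data, dtype=float)
    centered = centered - centered.mean(axis=0)
    covariance = np.cov(centered, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)  # eigh returns ascending order
    order = np.argsort(eigenvalues)[::-1]                   # largest variance first
    rotation = eigenvectors[:, order]   # orthogonal matrix: columns are the principal axes
    scores = centered @ rotation        # data expressed in the rotated coordinates
    return rotation, scores, eigenvalues[order]


# Hypothetical data: the second feature is roughly twice the first.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
data = np.column_stack([x, 2 * x + 0.3 * rng.normal(size=200)])

rotation, scores, variances = pca_rotation(data)
print("variance per component:", variances)
```

Because the rotation is orthogonal, no information is lost; it only redistributes the variance so that the first component carries almost all of it.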
Question 9. How Is The F1 Score Used?
The F1 score is the harmonic mean of a model's precision and recall: F1 = 2 × (Precision × Recall) / (Precision + Recall). It ranges from 0 to 1, where 1 is the best score and 0 the worst.
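A minimal sketch computing the F1 score from confusion-matrix counts. The counts below (8 true positives, 2 false positives, 4 false negatives) are hypothetical.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, computed from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: precision = 8/10 = 0.8, recall = 8/12 ≈ 0.667.
score = f1_score(tp=8, fp=2, fn=4)
print(round(score, 4))
```

Here the arithmetic mean of precision and recall would be about 0.733, while the harmonic mean gives 8/11 ≈ 0.727. The harmonic mean is pulled toward the weaker of the two values, which is why F1 punishes a model that is good at only one of them.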
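Question 10. How Are Recall And True Positive Rate Related?
They are the same quantity:
True Positive Rate = Recall = TP / (TP + FN), where TP is the number of true positives and FN the number of false negatives.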
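A minimal sketch computing recall directly from hypothetical true and predicted labels, where 1 marks the positive class. Since the true positive rate uses the same formula, TP / (TP + FN), it is literally the same function.

```python
def recall(y_true, y_pred, positive=1):
    """Share of actual positives that the model predicted as positive."""
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    tp = sum(1 for p in predictions_for_positives if p == positive)
    fn = len(predictions_for_positives) - tp
    return tp / (tp + fn)


# The true positive rate is defined by the same formula, so it is the same function.
true_positive_rate = recall

# Hypothetical labels: 4 actual positives, 3 of them predicted correctly.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0]
print(recall(y_true, y_pred), true_positive_rate(y_true, y_pred))
```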
Question 11. Assume That You Are Working On A Data Set. Explain How You Would Select Important Variables?
The following are a few methods that can be used to select important variables:
- Use Lasso regression.
- Use a random forest and plot the variable importance chart.
- Use linear regression and select variables based on their p-values.
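A minimal sketch of the Lasso approach from the list above, assuming NumPy and using cyclic coordinate descent. The L1 penalty drives the weights of unimportant variables to exactly zero, so the variables with non-zero weights are the selected ones. The helper names, the penalty `alpha=0.1`, and the synthetic data (only the first two of five features matter) are hypothetical.

```python
import numpy as np


def soft_threshold(value, threshold):
    # Shrinks value toward zero by threshold; returns exactly 0.0 inside [-threshold, threshold].
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def lasso_coordinate_descent(features, target, alpha, n_iters=200):
    """Minimizes (1 / 2n) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent."""
    x = np.asarray(features, dtype=float)
    y = np.asarray(target, dtype=float)
    x = (x - x.mean(axis=0)) / x.std(axis=0)  # standardize so the penalty treats features equally
    y = y - y.mean()                          # centering removes the need for an intercept
    n, p = x.shape
    weights = np.zeros(p)
    for _ in range(n_iters):
        for j in range(p):
            # Partial residual: prediction error with feature j's contribution removed.
            partial_residual = y - x @ weights + x[:, j] * weights[j]
            rho = x[:, j] @ partial_residual / n
            weights[j] = soft_threshold(rho, alpha) / (x[:, j] @ x[:, j] / n)
    return weights


# Hypothetical data: only features 0 and 1 influence the target.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 5))
target = 3 * features[:, 0] - 2 * features[:, 1] + 0.1 * rng.normal(size=200)

weights = lasso_coordinate_descent(features, target, alpha=0.1)
selected = [j for j, w in enumerate(weights) if w != 0]
print("weights:", np.round(weights, 3), "selected:", selected)
```

A larger `alpha` removes more variables; in practice it is usually chosen by cross-validation.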
Question 12. Explain The Concept Of Machine Learning And Assume That You Are Explaining This To A 5-year-old Baby?
Machine learning works much the same way babies learn their day-to-day activities, such as walking or sleeping. Babies cannot walk straight away: they fall, get up, and try again. Machine learning is similar: the algorithm works on the data and refines itself with every attempt so that the end result is as accurate as possible.
Question 13. What Is The Difference Between Machine Learning And Data Mining?
Data mining is the process of working on unstructured data and extracting interesting, previously unknown patterns from it.
Machine learning is the study of the design and development of algorithms that give machines the ability to learn from data.
Question 14. Please State A Few Popular Machine Learning Algorithms?
- Nearest Neighbour
- Neural Networks
- Decision Trees
- Support Vector Machines, etc.
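As an illustration of the first algorithm in the list, here is a minimal sketch of a k-nearest-neighbour classifier using only the standard library. The function name, `k=3`, and the toy 2-D points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point (math.dist needs Python 3.8+).
    distances = sorted(
        (math.dist(point, query), label) for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups of 2-D points.
train_points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, (0.5, 0.2)))
print(knn_predict(train_points, train_labels, (4.8, 5.1)))
```

There is no separate training step: the model simply stores the labeled points and compares new inputs against them at prediction time.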
Question 15. What Are The Three Stages To Build The Model In Machine Learning?
- Model building
- Model testing
- Applying the model
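The three stages above can be sketched with a simple least-squares line, using only the standard library. The function names, the exact linear toy data, and the split into the first 6 points for building and the last 2 for testing are hypothetical choices for illustration.

```python
def build_model(xs, ys):
    """Stage 1 (model building): fit y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def evaluate_model(model, xs, ys):
    """Stage 2 (model testing): mean squared error on data the model has not seen."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def apply_model(model, x):
    """Stage 3 (applying the model): predict the output for a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical data following y = 2x + 1, split into build and test sets.
xs = list(range(8))
ys = [2 * x + 1 for x in xs]
train_x, test_x = xs[:6], xs[6:]
train_y, test_y = ys[:6], ys[6:]

model = build_model(train_x, train_y)
mse = evaluate_model(model, test_x, test_y)
prediction = apply_model(model, 10)
print(model, mse, prediction)
```

Testing on data held out from the building stage is what makes the error estimate trustworthy before the model is applied to new inputs.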
Question 16. What Is The Difference Between Supervised And Unsupervised Machine Learning?
Supervised learning requires labeled training data. Unsupervised learning does not require labeled data; it finds structure in the data on its own.
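A minimal sketch contrasting the two settings on the same hypothetical 1-D values. The supervised nearest-centroid classifier is given labels. The unsupervised 1-D k-means receives only the values and has to discover the two groups itself. All function names and data are illustrative assumptions.

```python
def fit_nearest_centroid(values, labels):
    """Supervised: the labels say which class each training value belongs to."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return {label: sum(vs) / len(vs) for label, vs in groups.items()}


def predict_nearest_centroid(centroids, value):
    # Assign the class whose learned centroid is closest to the new value.
    return min(centroids, key=lambda label: abs(value - centroids[label]))


def kmeans_1d(values, k=2, n_iters=20):
    """Unsupervised: no labels; k groups are discovered from the values alone."""
    if len(values) < k:
        raise ValueError("need at least k values")
    # Simple deterministic initialization: k evenly spaced values from the sorted data.
    ordered = sorted(values)
    centroids = ordered[:: max(1, len(ordered) // k)][:k]
    for _ in range(n_iters):
        clusters = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return sorted(centroids)


values = [1.0, 1.2, 0.8, 9.0, 9.2, 8.8]
labels = ["low", "low", "low", "high", "high", "high"]  # only the supervised model sees these

centroids = fit_nearest_centroid(values, labels)
clusters = kmeans_1d(values)
print(centroids, clusters)
```

Both end up with group centers near 1.0 and 9.0, but only the supervised model can name them ("low", "high"), because only it was given labels.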
Question 17. What Is The Difference Between Bias And Variance?
Bias: Bias is the error introduced by the simplifying assumptions made in the learning algorithm; a high-bias model tends to underfit the data.
Variance: Variance is the error caused by the complexity of the algorithm used to analyze the data; a high-variance model is overly sensitive to small fluctuations in the training data and tends to overfit.
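A minimal sketch of the two error sources, assuming NumPy and hypothetical noisy samples of a sine wave. A degree-1 polynomial is too simple (high bias), so even its training error stays high. A degree-11 polynomial through 12 points is flexible enough to pass through every noisy point (high variance), so its training error is near zero while its error on new points is typically much larger. A moderate degree sits between the two.

```python
import numpy as np
from numpy.polynomial import Polynomial


def fit_and_score(degree, x_train, y_train, x_test, y_test):
    """Fit a polynomial of the given degree and return (train MSE, test MSE)."""
    model = Polynomial.fit(x_train, y_train, degree)
    train_mse = float(np.mean((model(x_train) - y_train) ** 2))
    test_mse = float(np.mean((model(x_test) - y_test) ** 2))
    return train_mse, test_mse


# Hypothetical data: noisy samples of one period of a sine wave.
rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 12)
y_train = np.sin(2 * np.pi * x_train) + 0.2 * rng.normal(size=x_train.size)
x_test = np.linspace(0.02, 0.98, 50)
y_test = np.sin(2 * np.pi * x_test) + 0.2 * rng.normal(size=x_test.size)

results = {d: fit_and_score(d, x_train, y_train, x_test, y_test) for d in (1, 3, 11)}
for degree, (train_mse, test_mse) in results.items():
    print(f"degree {degree:2d}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```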