Hadoop Distributed File System (HDFS) Interview Questions & Answers

  1. Question 1. Who Is The Provider Of Hadoop?

    Answer :

    Hadoop is an Apache project provided by the Apache Software Foundation.

  2. Question 2. What Is Meant By Big Data?

    Answer :

    Big Data refers to an assortment of data so large that it is difficult to capture, store, process, or retrieve. Traditional database management tools cannot handle it, but Hadoop can.

  4. Question 3. What Are The Operating Systems On Which Hadoop Works?

    Answer :

    Linux is the preferred operating system for Hadoop, though it can also run on Windows, OS X, and BSD.

  5. Question 4. What Is The Use Of Hadoop?

    Answer :

    With Hadoop, users can run applications on systems with thousands of nodes spanning many terabytes of data. Rapid data processing and transfer among nodes keeps operation uninterrupted even when an individual node fails, preventing whole-system failure.

  7. Question 5. What Is The Use Of Big Data Analysis For An Enterprise?

    Answer :

    Analysis of Big Data identifies problems and focus points in an enterprise. It can prevent big losses and increase profits by helping entrepreneurs take informed decisions.

  9. Question 6. What Are Major Characteristics Of Big Data?

    Answer :

    The three classic characteristics of Big Data are volume, velocity, and variety. Data volume was once assessed in megabytes and gigabytes, but it is now assessed in terabytes and beyond.

  10. Question 7. Can You Indicate Big Data Examples?

    Answer :

    Facebook alone generates more than 500 terabytes of data daily, while organizations such as Jet Air and stock exchanges generate more than a terabyte of data every hour. These are examples of Big Data.

  12. Question 8. What Is A Rack In Hdfs?

    Answer :

    Rack is the storage location where all the data nodes are put together. Thus it is a physical collection of data nodes stored in a single location.
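
    Rack placement matters because HDFS's default policy for a block's three replicas is rack-aware: the first copy goes to the writer's node and the other two go to nodes on a different rack. A minimal sketch of that policy (node and rack names are invented for illustration):

```python
import random

def place_replicas(writer_node, nodes_by_rack):
    """Sketch of HDFS's default rack-aware placement for 3 replicas:
    first on the writer's node, the other two on a different rack."""
    local_rack = next(r for r, ns in nodes_by_rack.items() if writer_node in ns)
    remote_rack = random.choice([r for r in nodes_by_rack if r != local_rack])
    second, third = random.sample(nodes_by_rack[remote_rack], 2)
    return [writer_node, second, third]

# Hypothetical two-rack topology.
topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4", "dn5"]}
replicas = place_replicas("dn1", topology)
```

    Spreading replicas across racks means a whole-rack failure (for example, a switch outage) still leaves a live copy.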

  13. Question 9. How The Client Communicates With Name Node And Data Node In Hdfs?

    Answer :

    Clients communicate with the name node and data nodes in HDFS using RPC (remote procedure calls); actual block data is streamed to and from the data nodes over TCP.

  15. Question 10. Who Is The ‘user’ In Hdfs?

    Answer :

    Anyone who tries to retrieve data from the database using HDFS is a user. The client is not the end user but an application that uses the job tracker and task trackers to retrieve data.

  17. Question 11. How Name Node Determines Which Data Node To Write On?

    Answer :

    The name node holds metadata about all the data nodes, and it uses this information to decide which data node should be used for storing the data.
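
    That decision can be pictured as a lookup over the metadata. A toy sketch (node names and free-space figures are invented; the real name node also weighs rack locality and node load):

```python
def choose_data_node(metadata):
    """Toy name-node decision: pick the data node with the most free
    space (real HDFS also considers rack locality and node load)."""
    candidates = {node: free for node, free in metadata.items() if free > 0}
    if not candidates:
        raise RuntimeError("all data nodes are saturated")
    return max(candidates, key=candidates.get)

# Hypothetical metadata: free space in GB per data node.
metadata = {"dn1": 20, "dn2": 75, "dn3": 0}
target = choose_data_node(metadata)
```

    A node reporting zero free space is excluded, which is also how a saturated data node drops out of consideration.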

  19. Question 12. What Type Of Data Is Processed By Hadoop?

    Answer :

    Hadoop processes only digital data.

  21. Question 13. How A Data Node Is Identified As Saturated?

    Answer :

    When a data node is full and has no space left, the name node identifies it as saturated from the metadata and block reports it maintains.

  23. Question 14. What Is The Process Of Indexing In Hdfs?

    Answer :

    Once data is stored, HDFS depends on the last part of the data to find out where the next part of the data is stored; it does not maintain a traditional index.

  24. Question 15. Can Blocks Be Broken Down By Hdfs If A Machine Does Not Have The Capacity To Copy As Many Blocks As The User Wants?

    Answer :

    Blocks in HDFS cannot be broken down further. The master node calculates the required space and decides how the data should be transferred to a machine with less space.

  26. Question 16. What Is Meant By ‘block’ In Hdfs?

    Answer :

    A block in HDFS is the minimum quantum of data for reading or writing. The default block size in HDFS is 64 MB. If a file is 52 MB, HDFS stores it in a single block that occupies only 52 MB on disk; unlike fixed disk blocks, the remaining 12 MB is not wasted.
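
    The block arithmetic can be sketched as follows; only the final block of a file is smaller than the block size (the 200 MB file size here is just for illustration):

```python
def split_into_blocks(file_size_mb, block_size_mb=64):
    """Split a file into HDFS-style blocks: every block is full-sized
    except possibly the last, which occupies only the remaining data."""
    blocks = []
    remaining = file_size_mb
    while remaining > 0:
        blocks.append(min(block_size_mb, remaining))
        remaining -= block_size_mb
    return blocks

blocks = split_into_blocks(200)   # → [64, 64, 64, 8]
```

    A 52 MB file therefore yields a single 52 MB block, not a 64 MB one.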

  28. Question 17. Is It Necessary That Name Node And Job Tracker Should Be On The Same Host?

    Answer :

    No! They can be on different hosts.

  30. Question 18. What Is Meant By Heartbeat In Hdfs?

    Answer :

    Data nodes and task trackers send heartbeat signals to Name node and Job tracker respectively to inform that they are alive. If the signal is not received it would indicate problems with the node or task tracker.
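
    The detection logic amounts to a timeout check. A minimal sketch (the 30-second timeout and the timestamps are invented, not HDFS's actual defaults):

```python
def find_dead_nodes(last_heartbeat, now, timeout=30.0):
    """Report every node whose latest heartbeat is older than `timeout`
    seconds; such a node is presumed dead and its work is rescheduled."""
    return sorted(node for node, t in last_heartbeat.items() if now - t > timeout)

# Hypothetical timestamps (seconds) of the last heartbeat per node.
beats = {"dn1": 100.0, "dn2": 58.0, "tt1": 95.0}
dead = find_dead_nodes(beats, now=100.0)   # → ["dn2"]
```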

  32. Question 19. What Is The Role Played By Task Trackers?

    Answer :

    Task trackers are daemons that run on the data nodes; they take care of the individual tasks on the slave nodes as assigned to them by the job tracker.

  34. Question 20. What Is The Function Of ‘job Tracker’?

    Answer :

    The job tracker is a daemon that runs on the name node; it submits and tracks the MapReduce tasks in Hadoop. There is only one job tracker, which distributes tasks to the various task trackers. When it goes down, all running jobs come to a halt.
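
    The distribution role can be sketched as a simple dispatcher (round-robin here for brevity; the real job tracker also weighs data locality and free task slots):

```python
from itertools import cycle

def distribute_tasks(tasks, task_trackers):
    """Toy job tracker: hand map/reduce tasks to task trackers in
    round-robin order."""
    assignment = {tt: [] for tt in task_trackers}
    for task, tracker in zip(tasks, cycle(task_trackers)):
        assignment[tracker].append(task)
    return assignment

plan = distribute_tasks(["m1", "m2", "m3"], ["tt1", "tt2"])
```

    Because there is a single dispatcher, losing it stalls every in-flight job, which is exactly the single-point-of-failure noted above.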

  36. Question 21. What Is Daemon?

    Answer :

    A daemon is a process that runs in the background in the UNIX environment. The Windows equivalent is a ‘service’, and in DOS it was a ‘TSR’ (terminate-and-stay-resident program).

  37. Question 22. What Is Meant By Data Node?

    Answer :

    The data node is the slave daemon deployed on each system; it provides the actual storage and serves read and write requests from clients.

  39. Question 23. Which One Is The Master Node In Hdfs? Can It Be Commodity?

    Answer :

    The name node is the master node in HDFS, and the job tracker runs on it. The node contains metadata, and since it is a single point of failure in HDFS it must be a highly available machine. It cannot be commodity hardware, because the entire HDFS depends on it.

  41. Question 24. What Is Meant By ‘commodity Hardware’? Can Hadoop Work On Them?

    Answer :

    Commodity hardware refers to average, inexpensive systems, and Hadoop can be installed on any of them. Hadoop does not require high-end hardware to function.

  43. Question 25. What Is Meant By Streaming Access?

    Answer :

    HDFS works on the principle of “write once, read many”, and the focus is on fast and accurate retrieval of data. Streaming access refers to reading the complete data set from start to finish instead of retrieving a single record from the database.
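
    The contrast with record lookup can be sketched as reading a file front to back in fixed-size chunks (the 4-byte chunk size is only for illustration; real HDFS readers stream much larger buffers):

```python
import io

def stream_read(f, chunk_size=4):
    """Streaming access: consume the whole file sequentially in
    fixed-size chunks instead of seeking to a single record."""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        yield chunk

data = io.BytesIO(b"write once, read many")
chunks = list(stream_read(data))
```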

  45. Question 26. Would The Calculations Made On One Node Be Replicated To Others In Hdfs?

    Answer :

    No! The calculation is made on the original node only. If that node fails, the master node replicates the calculation on a second node.

  47. Question 27. Why Replication Is Pursued In Hdfs Though It May Cause Data Redundancy?

    Answer :

    Systems with average configuration are vulnerable to crashing at any time. HDFS replicates each block and, by default, stores it at three different locations, which makes the system highly fault tolerant. If the data at one location becomes corrupt or inaccessible, it can be retrieved from another location.

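    The retrieval side of that fault tolerance can be sketched with checksums: try each replica in turn and skip corrupt copies (SHA-256 stands in here for HDFS's per-block checksum mechanism):

```python
import hashlib

def read_with_fallback(replicas, expected_digest):
    """Return the first replica whose checksum matches; skip corrupt
    copies, mimicking how a reader falls back to another replica."""
    for data in replicas:
        if hashlib.sha256(data).hexdigest() == expected_digest:
            return data
    raise IOError("no healthy replica found")

good = b"block-contents"
digest = hashlib.sha256(good).hexdigest()
result = read_with_fallback([b"corrupted!", good, good], digest)
```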
  49. Question 28. What Are The Main Features Of Hdfs?

    Answer :

    Great fault tolerance, high throughput, suitability for handling large data sets, and streaming access to file system data are the main features of HDFS. It can be built with commodity hardware.

  51. Question 29. What Is Hdfs?

    Answer :

    HDFS (Hadoop Distributed File System) is the file system used to store very large data files. It handles streaming data and runs on clusters of commodity hardware.

  53. Question 30. What Are The Main Components Of Hadoop?

    Answer :

    The main components of Hadoop are HDFS, used to store large data sets, and MapReduce, used to analyze them.
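
    How the two components divide the work can be illustrated with a tiny in-memory word count: HDFS would hold the input lines, and MapReduce would run the map and reduce phases sketched below (pure Python, no Hadoop required):

```python
from collections import Counter

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reduce_phase(pairs):
    """Reduce: sum the counts for each word."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

counts = reduce_phase(map_phase(["hdfs stores data", "mapreduce processes data"]))
```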

  54. Question 31. How Is Hadoop Different From Traditional Rdbms?

    Answer :

    An RDBMS is useful for structured data of modest size, whereas Hadoop is designed to handle Big Data in one shot.

  55. Question 32. Which Are The Major Players On The Web That Uses Hadoop?

    Answer :

    Started in 2002 by Doug Cutting, Hadoop drew on Google's MapReduce paper in 2004 and became the HDFS project by 2006. Yahoo and Facebook adopted it in 2008 and 2009 respectively. Major commercial enterprises using Hadoop include EMC, Hortonworks, Cloudera, MapR, Twitter, eBay, and Amazon, among others.

  56. Question 33. What Are The Basic Characteristics Of Hadoop?

    Answer :

    Written in Java, the Hadoop framework is capable of solving problems involving Big Data analysis. Its programming model is based on Google MapReduce, and its infrastructure is based on Google's BigTable and distributed file system (GFS). Hadoop is scalable, and more nodes can be added to it.

  58. Question 34. What Are The Characteristics Of Data Scientists?

    Answer :

    Data scientists analyze data and provide solutions for business problems. They are gradually replacing business and data analysts.