Sqoop Interview Questions & Answers

  1. Question 1. What Is The Role Of Jdbc Driver In A Sqoop Set Up?

    Answer :

    To connect to different relational databases Sqoop needs a connector. Almost every DB vendor makes this connector available as a JDBC driver specific to that database. So Sqoop needs the JDBC driver of each database it has to interact with.

  2. Question 2. Is Jdbc Driver Enough To Connect Sqoop To The Databases?

    Answer :

    No. Sqoop needs both the JDBC driver and a connector to connect to a database.

  4. Question 3. When To Use Target-dir And When To Use Warehouse-dir While Importing Data?

    Answer :

    To specify a particular directory in HDFS use --target-dir, but to specify a common parent directory for all Sqoop jobs use --warehouse-dir. In the latter case, Sqoop will create a directory under the parent with the same name as the table.
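
    A hedged sketch of both options, assuming a MySQL source; the connection string, database, and table names are illustrative, not from the original text:

    ```shell
    # Import into an explicit HDFS directory
    sqoop import --connect jdbc:mysql://db.example.com/sales \
      --username sqoop --table orders \
      --target-dir /data/orders

    # Import under a common parent; Sqoop creates /data/warehouse/orders
    sqoop import --connect jdbc:mysql://db.example.com/sales \
      --username sqoop --table orders \
      --warehouse-dir /data/warehouse
    ```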

  5. Question 4. How Can You Import Only A Subset Of Rows Form A Table?

    Answer :

    By using the --where argument in the sqoop import statement we can import only a subset of rows.
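
    A minimal sketch; the table, column, and filter value are illustrative:

    ```shell
    # Import only the rows matching the --where predicate
    sqoop import --connect jdbc:mysql://db.example.com/sales \
      --username sqoop --table orders \
      --where "status = 'SHIPPED'"
    ```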

  7. Question 5. How Can We Import A Subset Of Rows From A Table Without Using The Where Clause?

    Answer :

    We can run a filtering query on the database and save the result to a temporary table in the database, then use the sqoop import command on that temporary table without a where clause.

  9. Question 6. What Is The Advantage Of Using Password-file Rather Than -p Option While Preventing The Display Of Password In The Sqoop Import Statement?

    Answer :

    The --password-file option can be used inside a Sqoop script, while the -P option reads the password from standard input, which prevents automation.
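
    A hedged sketch; the file path is illustrative. The password file should be readable only by its owner and contain no trailing newline:

    ```shell
    # Reference a password file instead of prompting on standard input
    sqoop import --connect jdbc:mysql://db.example.com/sales \
      --username sqoop \
      --password-file /user/sqoop/.password \
      --table orders
    ```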

  10. Question 7. What Is The Default Extension Of The Files Produced From A Sqoop Import Using The –compress Parameter?

    Answer :

    The files get the .gz extension, since --compress uses gzip compression by default.
  12. Question 8. What Is The Significance Of Using Compress-codec Parameter?

    Answer :

    To get the output files of a Sqoop import in formats other than .gz, such as .bz2, we use the --compression-codec parameter.
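
    For example, to produce .bz2 output (connection details are illustrative):

    ```shell
    # Compress with bzip2 instead of the default gzip codec
    sqoop import --connect jdbc:mysql://db.example.com/sales \
      --username sqoop --table orders \
      --compress \
      --compression-codec org.apache.hadoop.io.compress.BZip2Codec
    ```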

  13. Question 9. What Is A Disadvantage Of Using Direct Parameter For Faster Data Load By Sqoop?

    Answer :

    The native utilities used by databases for faster load (for example, mysqldump for MySQL) do not work with binary data formats like SequenceFile.

  15. Question 10. How Can You Control The Number Of Mappers Used By The Sqoop Command?

    Answer :

    The --num-mappers parameter is used to control the number of mappers executed by a sqoop command. We should start with a small number of map tasks and gradually scale up, as choosing a high number of mappers initially may slow down performance on the database side.
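
    A minimal sketch with an illustrative connection string:

    ```shell
    # Start conservatively with 4 parallel map tasks
    sqoop import --connect jdbc:mysql://db.example.com/sales \
      --username sqoop --table orders \
      --num-mappers 4
    ```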

  17. Question 11. How Can You Avoid Importing Tables One-by-one When Importing A Large Number Of Tables From A Database?

    Answer :

    Using the sqoop import-all-tables command with the following arguments:

    sqoop import-all-tables
    --connect
    --username
    --password
    --exclude-tables table1,table2 ..

    This will import all the tables except the ones mentioned in the --exclude-tables clause.

  19. Question 12. When The Source Data Keeps Getting Updated Frequently, What Is The Approach To Keep It In Sync With The Data In Hdfs Imported By Sqoop?

    Answer :

    Sqoop offers two approaches:

    1. Use the --incremental parameter with the append option, where a strictly increasing check column is examined and only rows whose value is greater than the previously recorded --last-value are imported as new rows.
    2. Use the --incremental parameter with the lastmodified option, where a date column in the source is checked for records which have been updated after the last import.
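
    Both modes can be sketched as follows; the connection string, table, and column names are illustrative, not from the original text:

    ```shell
    # Append mode: only rows with id greater than the last recorded value are imported
    sqoop import \
      --connect jdbc:mysql://db.example.com/sales \
      --username sqoop --password-file /user/sqoop/.password \
      --table orders \
      --incremental append --check-column id --last-value 10000

    # Lastmodified mode: only rows updated after the saved timestamp are imported,
    # and --merge-key lets Sqoop merge updated rows with previously imported ones
    sqoop import \
      --connect jdbc:mysql://db.example.com/sales \
      --username sqoop --password-file /user/sqoop/.password \
      --table orders \
      --incremental lastmodified --check-column updated_at \
      --last-value "2017-03-31 00:00:00" --merge-key id
    ```
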
  21. Question 13. What Is The Usefulness Of The Options File In Sqoop?

    Answer :

    The options file is used in Sqoop to store command-line values in a file and reuse them in sqoop commands.

    For example, the --connect parameter's value and --username value can be stored in a file and used again and again with different sqoop commands.
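
    A minimal sketch; the file name and connection string are illustrative:

    ```shell
    # conn.opts holds the shared arguments, one token per line:
    #   import
    #   --connect
    #   jdbc:mysql://db.example.com/sales
    #   --username
    #   sqoop

    # Reuse it across commands; remaining arguments are appended as usual
    sqoop --options-file conn.opts --table orders
    sqoop --options-file conn.opts --table customers
    ```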

  23. Question 14. Is It Possible To Add A Parameter While Running A Saved Job?

    Answer :

    Yes, we can add an argument to a saved job at runtime by using the --exec option:

    sqoop job --exec jobname -- --newparameter

  24. Question 15. How Do You Fetch Data Which Is The Result Of Join Between Two Tables?how Can We Slice The Data To Be Imported To Multiple Parallel Tasks?

    Answer :

    To import the result of a join we use a free-form query import with the --query parameter. Using the --split-by parameter we specify the column name based on which Sqoop will divide the data to be imported into multiple chunks to be run in parallel.

  26. Question 16. How Can You Choose A Name For The Mapreduce Job Which Is Created On Submitting A Free-form Query Import?

    Answer :

    By using the --mapreduce-job-name parameter. Below is an example of the command (note that a free-form query must include the WHERE $CONDITIONS placeholder):

    sqoop import \
      --connect jdbc:mysql://mysql.example.com/sqoop \
      --username sqoop \
      --password sqoop \
      --query 'SELECT normcities.id, countries.country, normcities.city FROM normcities JOIN countries USING(country_id) WHERE $CONDITIONS' \
      --split-by id \
      --target-dir cities \
      --mapreduce-job-name normcities

  28. Question 17. Before Starting The Data Transfer Using Mapreduce Job, Sqoop Takes A Long Time To Retrieve The Minimum And Maximum Values Of Columns Mentioned In –split-by Parameter. How Can We Make It Efficient?

    Answer :

    We can use the --boundary-query parameter, in which we specify a query returning the min and max values for the column on which the split will happen across multiple MapReduce tasks. This makes it faster, as the query inside the --boundary-query parameter is executed first and the job is ready with the information on how to split the data before executing the main query.
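
    A hedged sketch; the table, column, and boundary values are illustrative:

    ```shell
    # Supply split boundaries directly so Sqoop skips its own min/max query
    sqoop import \
      --connect jdbc:mysql://db.example.com/sales \
      --username sqoop --table orders --split-by id \
      --boundary-query "SELECT 1, 100000" \
      --target-dir /data/orders
    ```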

  30. Question 18. What Is The Difference Between The Parameters sqoop.export.records.per.statement And sqoop.export.statements.per.transaction?

    Answer :

    The parameter sqoop.export.records.per.statement specifies the number of records that will be batched into each INSERT statement.

    The parameter sqoop.export.statements.per.transaction specifies how many INSERT statements are executed within a single transaction.


  32. Question 19. How Will You Implement All-or-nothing Load Using Sqoop?

    Answer :

    Using the --staging-table option we first load the data into a staging table, and then load it into the final target table only if the staging load succeeds.
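
    A hedged sketch; the table and directory names are illustrative:

    ```shell
    # Rows land in staging_cities first; they are moved to cities only if the load succeeds
    sqoop export --connect jdbc:mysql://db.example.com/sales \
      --username sqoop --table cities \
      --staging-table staging_cities \
      --export-dir /data/cities
    ```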

  34. Question 20. How Do You Clear The Data In A Staging Table Before Loading It By Sqoop?

    Answer :

    By specifying the --clear-staging-table option we can have the staging table cleared before the load. This can be repeated until we get proper data in staging.

  36. Question 21. How Will You Update The Rows That Are Already Exported?

    Answer :

    The --update-key parameter can be used to update existing rows. It takes a comma-separated list of columns which uniquely identifies a row. All of these columns are used in the WHERE clause of the generated UPDATE query, while all other table columns are used in the SET part of the query.
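
    A minimal sketch; the table, key column, and directory are illustrative:

    ```shell
    # Generate UPDATE statements keyed on the id column instead of INSERTs
    sqoop export --connect jdbc:mysql://db.example.com/sales \
      --username sqoop --table cities \
      --update-key id \
      --export-dir /data/cities
    ```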

  37. Question 22. How Can You Sync An Exported Table With Hdfs Data In Which Some Rows Are Deleted?

    Answer :

    Truncate the target table and load it again.

  39. Question 23. How Can You Export Only A Subset Of Columns To A Relational Table Using Sqoop?

    Answer :

    By using the --columns parameter, in which we mention the required column names as a comma-separated list of values.

  41. Question 24. How Can We Load To A Column In A Relational Table Which Is Not Null But The Incoming Value From Hdfs Has A Null Value?

    Answer :

    By using the --input-null-string parameter we can specify a default value, which will allow the row to be inserted into the target table.
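
    A hedged sketch; the substitute values are illustrative:

    ```shell
    # Replace null markers from HDFS with defaults acceptable to NOT NULL columns
    sqoop export --connect jdbc:mysql://db.example.com/sales \
      --username sqoop --table cities \
      --input-null-string "unknown" \
      --input-null-non-string "0" \
      --export-dir /data/cities
    ```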

  43. Question 25. How Can You Schedule A Sqoop Job Using Oozie?

    Answer :

    Oozie has a built-in sqoop action inside which we can specify the sqoop commands to be executed.

  44. Question 26. Sqoop Imported A Table Successfully To Hbase But It Is Found That The Number Of Rows Is Fewer Than Expected. What Can Be The Cause?

    Answer :

    Some of the imported records might have null values in all the columns. As HBase does not allow all null values in a row, those rows get dropped.

  45. Question 27. Give A Sqoop Command To Show All The Databases In A Mysql Server.?

    Answer :

    $ sqoop list-databases --connect jdbc:mysql://database.example.com/

  47. Question 28. What Do You Mean By Free Form Import In Sqoop?

    Answer :

    Sqoop can import data from a relational database using any SQL query rather than only using table and column name parameters.

  48. Question 29. How Can You Force Sqoop To Execute A Free Form Sql Query Only Once And Import The Rows Serially?

    Answer :

    By using the -m 1 clause in the import command, Sqoop creates only one MapReduce task, which imports the rows sequentially.

  49. Question 30. In A Sqoop Import Command You Have Mentioned To Run 8 Parallel Mapreduce Task But Sqoop Runs Only 4. What Can Be The Reason?

    Answer :

    The MapReduce cluster is configured to run 4 parallel tasks, so the sqoop command can have at most as many parallel tasks as the cluster allows.

  50. Question 31. What Is The Importance Of –split-by Clause In Running Parallel Import Tasks In Sqoop?

    Answer :

    The --split-by clause specifies the column based on whose values the data will be divided into groups of records. These groups of records will be read in parallel by the MapReduce tasks.

  51. Question 32. What Does This Sqoop Command Achieve?

    Answer :

    $ sqoop import --connect --table foo --target-dir /dest

    It imports the table foo from the database into the HDFS directory /dest.

  52. Question 33. What Happens When A Table Is Imported Into A Hdfs Directory Which Already Exists Using The --append Parameter?

    Answer :

    Using the --append argument, Sqoop will import data to a temporary directory and then rename the files into the normal target directory in a manner that does not conflict with existing filenames in that directory.

  54. Question 34. How Can You Control The Mapping Between Sql Data Types And Java Types?

    Answer :

    By using the --map-column-java property we can configure the mapping between SQL types and Java types.

    Below is an example: $ sqoop import ... --map-column-java id=String,value=Integer

  55. Question 35. How To Import Only The Updated Rows Form A Table Into Hdfs Using Sqoop Assuming The Source Has Last Update Timestamp Details For Each Row?

    Answer :

    By using the lastmodified mode. Rows where the check column holds a timestamp more recent than the timestamp specified with --last-value are imported.

  56. Question 36. What Are The Two File Formats Supported By Sqoop For Import?

    Answer :

    Delimited text and SequenceFiles.

  58. Question 37. Give A Sqoop Command To Import The Columns Employee_id,first_name,last_name From The Mysql Table Employee?

    Answer :

    $ sqoop import --connect jdbc:mysql://host/dbname --table EMPLOYEES \
        --columns "employee_id,first_name,last_name"

  59. Question 38. Give A Sqoop Command To Run Only 8 Mapreduce Tasks In Parallel?

    Answer :

    $ sqoop import --connect jdbc:mysql://host/dbname --table table_name \
        -m 8

  60. Question 39. What Does The Following Query Do?
    $ sqoop import --connect jdbc:mysql://host/dbname --table EMPLOYEES \
        --where "start_date > '2017-03-31'"

    Answer :

    It imports the employees who have joined after 31-Mar-2017.

  61. Question 40. Give A Sqoop Command To Import All The Records From Employee Table Divided Into Groups Of Records By The Values In The Column Department_id.?

    Answer :

    $ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
       --split-by dept_id

  63. Question 41. What Does The Following Query Do?

    $ sqoop import --connect jdbc:mysql://db.foo.com/somedb --table sometable \
        --where "id > 1000" --target-dir /incremental_dataset --append

    Answer :

    It performs an incremental import of new data, after having already imported the first 1000 rows of the table.

  64. Question 42. Give A Sqoop Command To Import Data From All Tables In The Mysql Db Db1.?

    Answer :

    sqoop import-all-tables --connect jdbc:mysql://host/DB1

  66. Question 43. Give A Command To Call A Stored Procedure Named Proc1 Which Exports Data From A Hdfs Directory Named Dir1 Into The Mysql Db Named Db1?

    Answer :

    $ sqoop export --connect jdbc:mysql://host/DB1 --call proc1 \
          --export-dir /Dir1

  67. Question 44. What Is A Sqoop Metastore?

    Answer :

    It is a tool using which Sqoop hosts a shared metadata repository. Multiple users and/or remote users can define and execute saved jobs (created with sqoop job) defined in this metastore.

    Clients must be configured to connect to the metastore in sqoop-site.xml or with the --meta-connect argument.

  68. Question 45. What Is The Purpose Of Sqoop-merge?

    Answer :

    The merge tool combines two datasets, where entries in the newer dataset overwrite entries in the older dataset, preserving only the newest version of each record between the two data sets.
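
    A minimal sketch; the paths, class name, and key column are illustrative:

    ```shell
    # Overlay the newer dataset onto the older one, keeping the newest row per id
    sqoop merge --new-data /data/orders_new --onto /data/orders_old \
      --target-dir /data/orders_merged \
      --jar-file orders.jar --class-name orders \
      --merge-key id
    ```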

  69. Question 46. How Can You See The List Of Stored Jobs In Sqoop Metastore?

    Answer :

    sqoop job --list

  70. Question 47. Give The Sqoop Command To See The Content Of The Job Named Myjob?

    Answer :

    sqoop job --show myjob

  71. Question 48. Which Database The Sqoop Metastore Runs On?

    Answer :

    Running sqoop-metastore launches a shared HSQLDB database instance on the current machine.

  72. Question 49. Where Can The Metastore Database Be Hosted?

    Answer :

    The metastore database can be hosted anywhere within or outside of the Hadoop cluster.