VMware HA Interview Questions & Answers

  1. Question 1. What Is VMware HA?

    Answer :

    As per VMware's definition:

    VMware® High Availability (HA) provides easy-to-use, cost-effective high availability for applications running in virtual machines. In the event of a server failure, the affected virtual machines are automatically restarted on other production servers with spare capacity.

  2. Question 2. What Is AAM In HA?

    Answer :

    AAM is Legato's Automated Availability Manager. Up to and including vSphere 4.1, VMware HA was built on a version of Legato's AAM software that was re-engineered to work with VMs. The vCenter agent (vpxa) on each host interfaces with the VMware HA agent, which acts as an intermediary to the AAM software. From vSphere 5.0 onward, HA uses a new agent called “FDM” (Fault Domain Manager).

  4. Question 3. What Are The Prerequisites For HA To Work?

    Answer :

    1. Shared storage for the VMs running in the HA cluster.
    2. An Essentials Plus, Standard, Advanced, Enterprise, or Enterprise Plus license.
    3. A cluster with VMware HA enabled.
    4. Management network redundancy, to avoid unnecessary isolation responses during temporary network issues (recommended, not required).

  5. Question 4. What Is The Maximum Number Of Primary HA Hosts In vSphere 4.1?

    Answer :

    The maximum number of primary HA hosts is 5. A VMware HA cluster designates the first 5 hosts that join the cluster as primary nodes, and all other hosts automatically become secondary nodes.
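
    The rule above can be expressed as a short Python sketch. It assumes the hosts are listed in the order they joined the cluster and models only the initial assignment. It does not model how a secondary is later promoted when a primary leaves. The host names and the function name are hypothetical.

```python
# Simplified model of the initial HA role assignment in vSphere 4.1 and earlier.
MAX_PRIMARY_HOSTS = 5  # maximum number of primary HA hosts, per the answer above


def assign_ha_roles(hosts_in_join_order):
    """Map each host name to 'primary' or 'secondary' based on join order."""
    return {
        host: "primary" if index < MAX_PRIMARY_HOSTS else "secondary"
        for index, host in enumerate(hosts_in_join_order)
    }


roles = assign_ha_roles([f"esx{n:02d}" for n in range(1, 8)])
print(roles)  # esx01..esx05 are primary, esx06 and esx07 are secondary
```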

  6. Question 5. What Is The Command To Restart/Start/Stop The HA Agent On An ESX Host?

    Answer :

    service vmware-aam restart

    service vmware-aam stop

    service vmware-aam start

  8. Question 6. Where Are HA-Related Logs Located For Troubleshooting?

    Answer :

    /var/log/vmware/aam

  9. Question 7. What Are The Basic Troubleshooting Steps If The HA Agent Install Fails On Hosts In An HA Cluster?

    Answer :

    The steps below are taken from my blog post on troubleshooting HA:

    • Check for network issues.
    • Check that DNS is configured properly.
    • Check the VMware HA agent status on the ESX host with the command: service vmware-aam status
    • Check that the networks are properly configured and named exactly as on the other hosts in the cluster. Otherwise, you will get errors while installing or reconfiguring the HA agent.
    • Check that the HA-related ports are open in the firewall to allow communication:
    1. Incoming port: TCP/UDP 8042-8045
    2. Outgoing port: TCP/UDP 2050-2250
    • Try restarting the VMware HA agent on the affected host, or stopping and starting it, with the commands below. You can also try restarting vpxa and the management agent on the host.
    1. service vmware-aam restart
    2. service vmware-aam stop
    3. service vmware-aam start
    • Right-click the affected host and click “Reconfigure for VMware HA” to reinstall the HA agent on that host.
    • Remove the affected host from the cluster. A host cannot be removed from the cluster until it is put into maintenance mode.
    • Alternatively, go to the cluster settings and uncheck VMware HA to turn HA off for the cluster, then re-enable VMware HA to get the agent reinstalled.
    • For further troubleshooting, review the HA logs in the /var/log/vmware/aam directory.
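
    For the firewall check, a quick Python sketch can probe whether the incoming HA TCP ports accept connections on a host. It covers TCP only, because UDP cannot be verified this way. A closed port can also mean the AAM agent is not running, not only that a firewall is blocking it. The function names are hypothetical helpers, not VMware tools.

```python
import socket

# Incoming HA ports from the list above (TCP side only).
HA_INCOMING_TCP_PORTS = range(8042, 8046)  # 8042-8045 inclusive


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, unreachable, name lookup failure
        return False


def check_ha_tcp_ports(host):
    """Probe each incoming HA TCP port on host and report which ones accept connections."""
    return {port: tcp_port_open(host, port) for port in HA_INCOMING_TCP_PORTS}
```

    Run it from another host on the management network, for example `check_ha_tcp_ports("<esx-host-ip>")`, substituting the address of the affected host.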

  11. Question 8. What Is The Maximum Number Of Hosts Per HA Cluster?

    Answer :

    The maximum number of hosts in an HA cluster is 32 (vSphere 4.x and 5.x).

  12. Question 9. What Is Host Isolation?

    Answer :

    VMware HA has a mechanism to detect when a host is isolated from the rest of the hosts in the cluster. When an ESX host loses its ability to exchange heartbeats with the other hosts in the HA cluster over the management network, that host is considered isolated.

  14. Question 10. How Is Host Isolation Detected?

    Answer :

    In an HA cluster, ESX hosts use heartbeats to communicate with the other hosts in the cluster. By default, a heartbeat is sent every second.

    If an ESX host in the cluster does not receive a heartbeat from any other host in the cluster for 13 seconds, it considers itself isolated and pings the configured isolation address (the default gateway, by default). If the ping fails, VMware HA executes the host isolation response.
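
    The decision described above can be sketched in Python. The sketch assumes the 13-second threshold from the text. The ping is modeled as a callable so the logic can be tested without a network. The function name is hypothetical and not part of any VMware API.

```python
from typing import Callable

HEARTBEAT_TIMEOUT_S = 13  # seconds without any heartbeat before the isolation check, per the text


def is_isolated(seconds_since_last_heartbeat: float,
                ping_isolation_address: Callable[[], bool]) -> bool:
    """Decide whether the host should execute its isolation response.

    ping_isolation_address stands in for pinging the configured isolation
    address (the default gateway by default); it returns True on a reply.
    """
    if seconds_since_last_heartbeat < HEARTBEAT_TIMEOUT_S:
        return False  # heartbeats are still arriving often enough
    # No heartbeat for 13 s: isolated only if the isolation address is unreachable too.
    return not ping_isolation_address()
```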

  15. Question 11. What Are The Different Isolation Responses Available In HA?

    Answer :

    Power off – All VMs on the isolated host are powered off when HA detects network isolation.

    Shut down – All VMs running on the isolated host are shut down gracefully through VMware Tools when HA detects network isolation. If a VM has not shut down through VMware Tools within 5 minutes, it is powered off. This behavior can be changed with HA advanced options; see my post on HA advanced configuration.

    Leave powered on – The VMs remain powered on, with their state unchanged, when HA detects network isolation.
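
    The following Python sketch shows what happens to a single VM under each of the three isolation responses. It includes the 5-minute fallback from a guest shutdown to a power off. The enum, the function, and the `guest_shutdown_s` input (how long the guest would need to shut down) are hypothetical modeling choices.

```python
from enum import Enum
from typing import Optional


class IsolationResponse(Enum):
    POWER_OFF = "power off"
    SHUT_DOWN = "shut down"
    LEAVE_POWERED_ON = "leave powered on"


SHUTDOWN_TIMEOUT_S = 300  # 5 minutes before a guest shutdown falls back to power off


def outcome(response: IsolationResponse, guest_shutdown_s: Optional[float]) -> str:
    """Final state of one VM on an isolated host.

    guest_shutdown_s is how long the guest OS needs to shut down through
    VMware Tools, or None if the shutdown never completes (e.g. no Tools).
    """
    if response is IsolationResponse.LEAVE_POWERED_ON:
        return "powered on"
    if response is IsolationResponse.SHUT_DOWN:
        if guest_shutdown_s is not None and guest_shutdown_s <= SHUTDOWN_TIMEOUT_S:
            return "shut down"
    # POWER_OFF, or a SHUT_DOWN that did not finish in time
    return "powered off"
```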

  17. Question 12. How Do You Add An Additional Isolation Address For Redundancy?

    Answer :

    By default, VMware HA pings the default gateway as the isolation address when it stops receiving heartbeats. If you use redundant service console networks on different subnets, you can add additional isolation addresses. For example, you can add the default gateway of SC1 as the first value and the gateway of SC2 as the additional one:

    1. Right-click your HA cluster.
    2. Go to the HA advanced options.
    3. Add the line “das.isolationaddress1 = 192.168.0.1”.
    4. Add the line “das.isolationaddress2 = 192.168.1.1” as the additional isolation address.

  19. Question 13. What Is HA Admission Control?

    Answer :

    As per the “vSphere Availability Guide”:

    vCenter Server uses admission control to ensure that sufficient resources are available in a cluster to provide failover protection and to ensure that virtual machine resource reservations are respected.

  20. Question 14. What Are The 2 Types Of Settings Available For Admission Control?

    Answer :

    Enable: Do not power on VMs that violate availability constraints.

    Disable: Power on VMs that violate availability constraints.

  21. Question 15. What Are The Different Admission Control Policies Available With VMware HA?

    Answer :

    There are three admission control policies available:

    1. Host failures cluster tolerates.
    2. Percentage of cluster resources reserved as failover spare capacity.
    3. Specify a failover host.

  23. Question 16. How Does The “Host Failures Cluster Tolerates” Admission Control Policy Work?

    Answer :

    You select the maximum number of host failures for which failover must be guaranteed. In vSphere 4.1 and earlier, the minimum is 1 and the maximum is 4.

    With the “Host failures cluster tolerates” admission control policy, you define the number of hosts that can fail in the cluster, and HA ensures that sufficient resources remain to fail over all the virtual machines from the failed hosts to the other hosts in the cluster. VMware High Availability (HA) uses a mechanism called slots to calculate both the available and the required resources in the cluster for failing over virtual machines from a failed host to the other hosts in the cluster.

  24. Question 17. What Is A Slot?

    Answer :

    As per VMware’s definition:

    “A slot is a logical representation of the memory and CPU resources that satisfy the requirements for any powered-on virtual machine in the cluster.”

    Reservations configured at the VM level influence the HA slot calculation. The highest CPU reservation and the highest memory reservation among the VMs in the cluster determine the slot size for the cluster. These two maximums may come from different VMs.

  26. Question 18. How Are HA Slots Calculated?

    Answer :

    I have written a post about how the HA slots are calculated.
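
    The post itself is not reproduced here, so the following is a minimal Python sketch of the slot logic as this document describes it. It is not VMware's implementation. The rules it follows are:

    • The CPU slot is the largest CPU reservation, with 256 MHz used for VMs without one.
    • The memory slot is the largest memory reservation plus overhead.
    • Each host holds as many slots as fit in both dimensions.
    • The “Host failures cluster tolerates” check removes the hosts with the most slots first, as a worst-case assumption.

    Host capacities are assumed to be the resources available to VMs. The class and function names are hypothetical.

```python
from dataclasses import dataclass

DEFAULT_CPU_SLOT_MHZ = 256  # used for VMs with no CPU reservation (value from the text)


@dataclass
class VM:
    cpu_reservation_mhz: int
    mem_reservation_mb: int
    mem_overhead_mb: int


@dataclass
class Host:
    cpu_capacity_mhz: int  # capacity available to VMs (assumption)
    mem_capacity_mb: int


def slot_size(vms):
    """(CPU MHz, memory MB) slot size: the largest per-VM requirement in each dimension."""
    cpu = max((vm.cpu_reservation_mhz or DEFAULT_CPU_SLOT_MHZ for vm in vms),
              default=DEFAULT_CPU_SLOT_MHZ)
    mem = max((vm.mem_reservation_mb + vm.mem_overhead_mb for vm in vms), default=0)
    return cpu, mem


def slots_on_host(host, cpu_slot, mem_slot):
    """Number of whole slots that fit on a host in both CPU and memory."""
    by_cpu = host.cpu_capacity_mhz // cpu_slot
    by_mem = host.mem_capacity_mb // mem_slot if mem_slot else by_cpu
    return min(by_cpu, by_mem)


def tolerates(hosts, powered_on_vms, host_failures):
    """True if enough slots remain after losing the `host_failures` largest hosts."""
    cpu_slot, mem_slot = slot_size(powered_on_vms)
    counts = sorted((slots_on_host(h, cpu_slot, mem_slot) for h in hosts), reverse=True)
    remaining = sum(counts[host_failures:])
    return len(powered_on_vms) <= remaining  # each powered-on VM uses one slot


vms = [VM(0, 1024, 100), VM(500, 2048, 150)]
hosts = [Host(4000, 16384) for _ in range(3)]
print(slot_size(vms))            # (500, 2198)
print(tolerates(hosts, vms, 1))  # True: 7 slots per host, 14 left after one failure
```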

  28. Question 19. What Is The Use Of Host Monitoring Status In An HA Cluster?

    Answer :

    Let’s take an example: you are performing network maintenance on the switches that connect one of the ESX hosts in your HA cluster.

  29. Question 20. What Will Happen If The Switch Connected To An ESX Host In An HA Cluster Goes Down?

    Answer :

    The host will not receive heartbeats, and its ping to the isolation address will also fail. The host therefore considers itself isolated, and HA will restart its virtual machines on other hosts in the cluster. You do not want this to happen during a scheduled maintenance window.

    To avoid this during scheduled activities that may cause an ESX host to become isolated, clear the “Enable Host Monitoring” check box until the network maintenance is complete.

  31. Question 21. How Do You Manually Define The HA Slot Size?

    Answer :

    By default, the HA slot size is determined by the highest CPU and memory reservations among the virtual machines. If no reservation is specified at the VM level, the default slot size is 256 MHz for CPU and 0 MB plus memory overhead for RAM. You can control the HA slot size manually with the following values.

    There are four HA advanced options related to slot size:

    1. das.slotMemInMB – Maximum bound value for the HA memory slot size
    2. das.slotCpuInMHz – Maximum bound value for the HA CPU slot size
    3. das.vmMemoryMinMB – Minimum bound value for the HA memory slot size
    4. das.vmCpuMinMHz – Minimum bound value for the HA CPU slot size
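
    The effect of these options on one slot dimension can be sketched in Python as follows. This is a simplified model under stated assumptions, not VMware's code:

    • The minimum options (das.vmCpuMinMHz, das.vmMemoryMinMB) act as the value applied to VMs whose reservation is zero.
    • The maximum options (das.slotCpuInMHz, das.slotMemInMB) cap the resulting slot size.
    • Memory overhead is left out.

    The function name is hypothetical.

```python
from typing import Iterable, Optional


def slot_dimension(reservations: Iterable[int], vm_min: int,
                   slot_max: Optional[int] = None) -> int:
    """One slot dimension (CPU in MHz or memory in MB).

    vm_min models das.vmCpuMinMHz / das.vmMemoryMinMB: it is used for VMs
    whose reservation is zero. slot_max models das.slotCpuInMHz /
    das.slotMemInMB: it caps the resulting slot size.
    """
    effective = [r if r > 0 else vm_min for r in reservations]
    slot = max(effective, default=vm_min)
    return min(slot, slot_max) if slot_max is not None else slot


# One VM reserves 4000 MHz; capping the CPU slot at 1000 MHz keeps that single
# VM from inflating the slot size for every other VM in the cluster.
print(slot_dimension([0, 300, 4000], vm_min=256, slot_max=1000))  # 1000
print(slot_dimension([0, 0], vm_min=256))                        # 256
```

    With a cap in place, a VM whose reservation exceeds the slot size is counted as needing more than one slot, which is the trade-off to keep in mind when setting the maximum bounds.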
  32. Question 22. How Does The “Percentage Of Cluster Resources Reserved As Failover Spare Capacity” Admission Control Policy Work?

    Answer :

    With the “Percentage of cluster resources reserved as failover spare capacity” admission control policy, you define a specific percentage of total cluster resources to be reserved for failover. In contrast to the “Host failures cluster tolerates” policy, it does not use slots. Instead, it works as follows:

    1. It calculates the total resource requirement of all powered-on virtual machines in the cluster, and the total host resources available for virtual machines.
    2. From these, it calculates the current CPU and memory failover capacity of the cluster.
    3. It compares the current CPU and memory failover capacity with the configured failover capacity (for example, 25%).
    4. If either current value is lower than the configured value, admission control does not allow virtual machines that would violate the availability constraints to be powered on.
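
    The steps above can be sketched in Python for a single power-on request. Failover capacity is computed as (total minus required) divided by total. The sketch assumes the check is made on the capacity that would remain after the new VM is powered on. The function and parameter names are hypothetical.

```python
def failover_ok(total_mhz, total_mb, used_mhz, used_mb,
                new_vm_mhz, new_vm_mb, reserved_pct):
    """Percentage-policy check for powering on one more VM.

    used_* are the summed requirements (reservations plus memory overhead)
    of the powered-on VMs; total_* is the host capacity available to VMs.
    Integer arithmetic avoids floating-point edge cases at the threshold.
    """
    cpu_free = total_mhz - (used_mhz + new_vm_mhz)
    mem_free = total_mb - (used_mb + new_vm_mb)
    # failover capacity = free / total; both must stay >= reserved_pct percent
    return (cpu_free * 100 >= reserved_pct * total_mhz and
            mem_free * 100 >= reserved_pct * total_mb)


print(failover_ok(20000, 65536, 12000, 40000, 2000, 8000, 25))  # True: 30% CPU left
print(failover_ok(20000, 65536, 12000, 40000, 4000, 8000, 25))  # False: 20% CPU left
```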
  33. Question 23. How Does The “Specify A Failover Host” Admission Control Policy Work?

    Answer :

    With the “Specify a failover host” admission control policy, you designate a specific host as a dedicated failover host. When a host failure is detected, HA attempts to restart the affected virtual machines on the specified failover host. In this approach, the dedicated failover host sits idle and does not participate in DRS load balancing. DRS will neither migrate virtual machines to the failover host nor use it for power-on placement.

  34. Question 24. What Is VM Monitoring Status?

    Answer :

    HA normally monitors ESX hosts and restarts the virtual machines of a failed or isolated host on the other hosts in the cluster. If you also need HA to monitor for virtual machine failures, use the feature called VM Monitoring in the HA settings. VM Monitoring restarts a virtual machine if its VMware Tools heartbeat is not received within the time set by the monitoring sensitivity.
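
    A minimal Python sketch of this decision follows. The failure intervals are assumptions based on the vSphere 5.x defaults: 30, 60 and 120 seconds for high, medium and low sensitivity. Verify them for your version. The I/O flag is a simplification of the fact that VM Monitoring also looks at recent disk and network activity before resetting a VM. The names are hypothetical.

```python
# Assumed failure intervals per monitoring sensitivity (vSphere 5.x defaults;
# verify against the documentation for your version).
FAILURE_INTERVAL_S = {"high": 30, "medium": 60, "low": 120}


def should_reset_vm(seconds_since_tools_heartbeat: float, sensitivity: str,
                    recent_disk_or_network_io: bool = False) -> bool:
    """Decide whether VM Monitoring would restart (reset) the VM."""
    if seconds_since_tools_heartbeat <= FAILURE_INTERVAL_S[sensitivity]:
        return False  # heartbeat is recent enough
    # Heartbeats stopped, but ongoing I/O suggests the guest is still alive.
    return not recent_disk_or_network_io
```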

  36. Question 25. Explain How The Restart Of VMs Is Handled By HA In Case Of A Master ESXi Host Failure?

    Answer :

    HA restarts VMs after an ESXi host fails, but the time HA takes to restart them differs between a slave failure and a master failure. Here we discuss the case where the master ESXi host has failed.

    When the master fails, VM restarts are delayed until a new master is elected, because only a master can restart VMs.

    The timeline is explained as follows:

    • T0 – Master failure.
    • T10s – Master election process initiated.
    • T25s – New master elected and reads the protected list.
    • T35s – New master initiates restarts for all virtual machines on the protected list which are not running.

    The master fails at T0. The slave hosts initiate the election process 10 seconds later, at T10s. At T25s the newly elected master reads the protected list file to find out which VMs were protected by HA and are not currently running. At T35s the master initiates the VM restarts.
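
    The timeline can be rebuilt from its phase durations in a small Python sketch. The durations are read off the timeline above: the election starts after 10 seconds, a new master is in place after 15 more seconds, and restarts begin 10 seconds after that. The constant and function names are hypothetical.

```python
ELECTION_START_DELAY_S = 10  # slaves start the election 10 s after the master fails
ELECTION_DURATION_S = 15     # a new master is in place at T25s
RESTART_DELAY_S = 10         # restarts are initiated at T35s


def master_failure_timeline(t_fail=0):
    """(seconds, event) pairs for a master failure at t_fail, per the timeline above."""
    t_election = t_fail + ELECTION_START_DELAY_S
    t_elected = t_election + ELECTION_DURATION_S
    t_restart = t_elected + RESTART_DELAY_S
    return [
        (t_fail, "master fails"),
        (t_election, "slaves start the master election"),
        (t_elected, "new master elected, reads the protected list"),
        (t_restart, "new master initiates restarts of protected VMs that are not running"),
    ]


for seconds, event in master_failure_timeline():
    print(f"T{seconds}s: {event}")
```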

  37. Question 26. Explain How The Restart Of VMs Is Handled By HA In Case Of A Slave ESXi Host Failure?

    Answer :

    There are two different scenarios for restarting VMs after a slave ESXi host failure: one where heartbeat datastores are configured and one where they are not.

    The timeline is as follows:

    • T0 – Slave failure
    • T3s – Master begins monitoring datastore heartbeats for 15 seconds.
    • T10s – The host is declared unreachable and the master will ping the management network of the failed host. This is a continuous ping for 5 seconds.
    • T15s – If no heartbeat datastores are configured, the host will be declared dead.
    • T18s – If heartbeat datastores are configured, the host will be declared dead.

    The master monitors the network heartbeats of a slave. When the slave fails, the master stops receiving these heartbeats; this moment is defined as T0. After 3 seconds (T3s), the master starts monitoring for datastore heartbeats and continues for 15 seconds. At the 10th second (T10s), when no network or datastore heartbeats have been detected, the host is declared “unreachable”.

    The master also starts pinging the management network of the failed host at the 10th second and continues for 5 seconds. If no heartbeat datastores are configured, the host is declared “dead” at the 15th second (T15s), and the master initiates VM restarts.

    If heartbeat datastores are configured, the host is declared dead at the 18th second (T18s), and restarts are initiated.
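
    The two scenarios can be captured in a short Python sketch. The timings come from the timeline above. The sketch also assumes that if datastore heartbeats are still seen, the master treats the host as alive (isolated or partitioned) rather than dead. The constant and function names are hypothetical.

```python
from typing import Optional

UNREACHABLE_AT_S = 10    # no network heartbeats: host declared unreachable
PING_WINDOW_S = 5        # master pings the failed host's management network for 5 s
DS_MONITOR_START_S = 3   # master starts watching datastore heartbeats at T3s
DS_MONITOR_LEN_S = 15    # ...and watches them for 15 s


def declared_dead_at(heartbeat_datastores: bool,
                     datastore_heartbeat_seen: bool = False) -> Optional[int]:
    """Seconds after a slave fails (T0) when the master declares it dead.

    Returns None when datastore heartbeats are still seen: the host is then
    alive (isolated or partitioned) rather than dead.
    """
    if not heartbeat_datastores:
        return UNREACHABLE_AT_S + PING_WINDOW_S       # T15s
    if datastore_heartbeat_seen:
        return None
    return DS_MONITOR_START_S + DS_MONITOR_LEN_S      # T18s
```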
  38. Question 27. Explain The VM Restart Retries Timeline?

    Answer :

    HA responds when the state of a host changes or when the state of one or more virtual machines changes. There are multiple scenarios in which HA will attempt to restart a virtual machine; the most common are:

    • Failed host
    • Isolated host
    • Failed guest Operating System

    Prior to vSphere 5.0, the actual number of restart attempts was 6, because the count excluded the initial attempt. With vSphere 5.0, the default is 5, including the initial attempt. Each attempt is made at a specific time, as the following list shows; “m” stands for minutes.

    • T0 – Initial Restart
    • T2m – Restart retry 1
    • T6m – Restart retry 2
    • T14m – Restart retry 3
    • T30m – Restart retry 4

    In case of a host failure, HA tries to restart the virtual machine on another host in the affected cluster. If the restart on that host is unsuccessful, the restart count is increased by 1.

    Let’s say the first restart attempt is made at T0, when the host failure occurs. (In reality, the restart is not performed as soon as the host fails, because HA takes some time before declaring a host failure; see the two failure scenarios described above.)

    If the first restart attempt fails, the restart counter is increased by one and the next restart is attempted 2 minutes later (T2m). In the same way, HA keeps trying to restart the VM until a power-on attempt is reported as “completed”.

    A successful restart might never occur: if all five restart attempts are unsuccessful, the maximum restart count is reached and HA stops trying.
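
    The gaps between the attempts listed above double each time: 2, 4, 8 and 16 minutes. The Python sketch below reproduces the schedule from that observation. Treat the doubling as a description of the listed times, not as a documented formula. The function name is hypothetical.

```python
def restart_schedule(attempts=5, first_gap_min=2):
    """Minutes after T0 at which each restart attempt is made.

    The gap doubles after every attempt (2, 4, 8, 16 minutes), which
    reproduces T0, T2m, T6m, T14m and T30m for the default of 5 attempts.
    """
    times, t, gap = [], 0, first_gap_min
    for _ in range(attempts):
        times.append(t)
        t += gap
        gap *= 2
    return times


print(restart_schedule())  # [0, 2, 6, 14, 30]
```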

  40. Question 28. Explain How HA Determines That A Slave ESXi Host Is Isolated?

    Answer :

    HA determines whether an ESXi host is isolated on the basis of heartbeats. The timeline for declaring isolation differs for slave and master hosts; here we discuss the isolation of a slave ESXi host.

    HA triggers a master election process before it declares a slave ESXi host isolated. In this timeline, “s” refers to seconds:

    • T0 – Isolation of the host (slave)
    • T10s – Slave enters “election state”
    • T25s – Slave elects itself as master
    • T25s – Slave pings “isolation addresses”
    • T30s – Slave declares itself isolated
    • T60s – Slave “triggers” isolation response

    When an ESXi host is isolated, the isolation value in its “power on” file is set to 1, and HA reads this file to confirm that the host has been isolated. There is one “power on” file per ESXi host, and it contains entries for all the VMs currently running on that host.

  41. Question 29. Explain How HA Determines That A Master ESXi Host Is Isolated?

    Answer :

    In the case of the isolation of a master, this timeline is a bit less complicated because there is no need to go through an election process. In this timeline, “s” refers to seconds.

    • T0 – Isolation of the host (master).
    • T0 – Master pings “isolation addresses”.
    • T5s – Master declares itself isolated and “triggers” isolation response.
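
    The two isolation timelines can be put side by side in Python. The comparison shows why a slave needs 60 seconds to trigger its isolation response while a master needs 5: the slave first has to go through an election. The data is copied from the timelines above, and the names are hypothetical.

```python
# (seconds after T0, event) pairs copied from the two timelines above.
ISOLATION_TIMELINES = {
    "slave": [
        (0, "host isolated"),
        (10, "enters election state"),
        (25, "elects itself master, pings isolation addresses"),
        (30, "declares itself isolated"),
        (60, "triggers isolation response"),
    ],
    "master": [
        (0, "host isolated, pings isolation addresses"),
        (5, "declares itself isolated, triggers isolation response"),
    ],
}


def isolation_response_delay(role: str) -> int:
    """Seconds from isolation until the isolation response is triggered."""
    return ISOLATION_TIMELINES[role][-1][0]


for role in ISOLATION_TIMELINES:
    print(role, isolation_response_delay(role))  # slave 60, master 5
```
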
  42. Question 30. Is Admission Control Dependent On vCenter Server, And Will Admission Control Work If vCenter Server Is Not Available?

    Answer :

    Yes. Admission control depends on vCenter Server, even though it is part of HA and HA itself works independently of vCenter Server. If vCenter Server is not available at the time an ESXi host fails, admission control policies are not enforced. This does not mean that the VMs that were running on the failed host will not be restarted; it means that whichever policy you have chosen will not be applied.

    For example, suppose you have chosen the “Specify a failover host” policy and dedicated one ESXi host to handling failover. Normally, when a host fails, HA restarts the failed VMs only on this dedicated host and not on any other host in the cluster. But if vCenter Server is not available when a host fails, HA might also restart your VMs on other hosts when the specified failover host does not have sufficient resources.

  43. Question 31. How Does HA Determine That ESXi Hosts Are Network Partitioned?

    Answer :

    There is a slight difference between host isolation and a network partition. When multiple slave ESXi hosts lose contact with the master together but can still communicate with each other, the condition is known as a network partition.

    For example, if the subnet mask of five ESXi hosts is changed, they will be unable to talk to the master (being on a different subnet) but can still communicate with each other (being on the same subnet).

    When a network partition occurs in a cluster, an election takes place among the partitioned slave hosts and a new master is elected from among them. In this case there will be two masters in the cluster.
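
    The partitions in the example can be found by grouping hosts that can still reach each other, which is a connected-components search over the hosts. The Python sketch below models only reachability. Each resulting group would hold its own election, so the example yields two masters. The host names and the function name are hypothetical.

```python
from collections import deque


def find_partitions(hosts, links):
    """Group hosts into partitions: sets of hosts that can reach each other.

    hosts: host names; links: (host_a, host_b) pairs that can communicate.
    """
    neighbours = {host: set() for host in hosts}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, groups = set(), []
    for start in hosts:
        if start in seen:
            continue
        seen.add(start)
        group, queue = [], deque([start])
        while queue:  # breadth-first search over reachable hosts
            host = queue.popleft()
            group.append(host)
            for nxt in neighbours[host]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        groups.append(sorted(group))
    return groups


# esx1 is the original master; esx2..esx6 had their subnet mask changed.
hosts = ["esx1", "esx2", "esx3", "esx4", "esx5", "esx6"]
links = [("esx2", "esx3"), ("esx3", "esx4"), ("esx4", "esx5"), ("esx5", "esx6")]
print(find_partitions(hosts, links))
# [['esx1'], ['esx2', 'esx3', 'esx4', 'esx5', 'esx6']]
```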

  44. Question 32. How Does HA Determine Which VMs It Needs To Restart After They Were Powered Off Or Shut Down By A Triggered Isolation Response?

    Answer :

    If the isolation response is set to “shut down” or “power off”, then when an ESXi host is isolated, its VMs are shut down or powered off as a result of the isolation response. The question is how HA keeps track of which VMs were shut down or powered off by this trigger.

    The answer is as follows. When a VM is shut down or powered off by the isolation response, the isolated host removes that VM’s entry from its “power on” file and creates a per-virtual-machine file inside a directory called “powered off”. HA reads these files to identify the state change of the VMs and, based on them, decides to restart those VMs.

    This is necessary because someone may manually shut down or power off a VM while its host is isolated, and HA must not restart such a VM. Because that VM was shut down manually, the isolated host does not create a file for it.

  45. Question 33. How Does HA Keep Track Of Which VMs Need To Be Restarted In Case Of An ESXi Host Failure?

    Answer :

    When an ESXi host fails, the VMs that were running on it are restarted on the remaining nodes in the cluster. But how does HA know which VMs were running on the host before it failed? The answer is:

    HA relies on two files, the “power on” file and the “protected list” file. The “power on” file is maintained by each ESXi host individually and contains entries for the VMs currently running on that host. The “protected list” file is maintained at the datastore level and tells HA which VMs were protected before the failure. HA decides which VMs to restart based on the contents of these two files.

    When a VM is powered off manually, its entry is removed from the “protected list” file so that HA does not accidentally restart it.
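
    The comparison of the two files reduces to a set difference: protected VMs that are not currently running are restart candidates. The Python sketch below shows this. It assumes both files have already been parsed into lists of VM names, and that manually powered-off VMs are already missing from the protected list, as described above. The names are hypothetical.

```python
def vms_to_restart(protected_list, running_vms):
    """VMs the master should restart after a host failure.

    protected_list: VMs HA protects (manually powered-off VMs are already
    removed from it). running_vms: VMs that the surviving hosts' "power on"
    files report as running.
    """
    return sorted(set(protected_list) - set(running_vms))


print(vms_to_restart(["web01", "db01", "app01"], ["app01"]))  # ['db01', 'web01']
```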

  47. Question 34. Which Parameter Needs To Be Configured To Increase The Time Allowed For The Isolation Response Shutdown?

    Answer :

    You can configure the HA advanced option “das.isolationShutdownTimeout”. Its value is specified in seconds (the default is 300 seconds, i.e. 5 minutes). It is the time HA allows for a graceful VM shutdown when the isolation response is set to “Shut down” and is triggered; after that time, the VM is powered off.

  48. Question 35. In Which Cases Does A Master Election Take Place In A Cluster?

    Answer :

    A master is elected by a set of HA agents whenever the agents are not in network contact with a master. A master election thus occurs when HA is first enabled on a cluster and when the host on which the master is running:

    • fails,
    • becomes network partitioned or isolated,
    • is disconnected from vCenter Server,
    • is put into maintenance or standby mode,
    • or when HA is reconfigured on the host.

    Note: Removing a slave ESXi host from a cluster has no effect on the election process. If a slave host is removed, shut down, or put into maintenance mode, no election takes place.

  49. Question 36. What Will Happen If A Slave ESXi Host Fails While A Master Election Is In Progress? How Will This Failure Be Handled, Since There Is No Master At The Time Of The Failure?

    Answer :

    A master must be present in the cluster to restart VMs. When an election is in progress, it takes 15 seconds to complete. If a slave ESXi host also fails during that time, the restart of its VMs has to wait until the election process is complete.

    The newly elected master first reads the “protected list” file to find the VMs whose power state has changed. From that file it determines which VMs failed during the election and then restarts them.

  51. Question 37. What Does HA Take Into Account Before Restarting VMs?

    Answer :

    HA has to take many things into consideration before restarting VMs after an ESXi host failure. These include:

    1. CPU and memory reservations, including memory overhead.
    2. Unreserved capacity of the hosts in the cluster.
    3. Restart priority of the VM.
    4. VM-to-host compatibility.
    5. The number of dvPorts required by the VM and the number of dvPorts available.
    6. The maximum number of vCPUs that can run on a given host.
    7. Restart latency.

  52. Question 38. What Will Happen If A VM Fails While A Storage vMotion Of That VM Is Still In Progress? How Will HA Handle This Failure?

    Answer :

    If a virtual machine fails while it is being migrated with Storage vMotion, HA does not start the restart process until vCenter Server informs the master that the Storage vMotion task has completed or has been rolled back.

  53. Question 39. Will A Master Election Happen If A New ESXi Host That Can See More Datastores Than The Existing Master Is Added To The Cluster?

    Answer :

    No. An election will not happen even if the newly added ESXi host can see more datastores than the master. However, if you reconfigure HA on the cluster, the newly added host will become the master because it is connected to more datastores.
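
    The election rule behind this answer can be sketched in Python. The sketch assumes the vSphere 5.x criteria: the host connected to the most datastores wins, and ties go to the lexically highest managed object ID. Because the comparison is lexical, "host-99" beats "host-100". The function name and host IDs are hypothetical.

```python
def elect_master(datastores_per_host):
    """Pick the master from {managed_object_id: number_of_connected_datastores}.

    Most connected datastores wins; ties go to the lexically highest managed
    object ID (plain string comparison).
    """
    return max(datastores_per_host, key=lambda moid: (datastores_per_host[moid], moid))


print(elect_master({"host-99": 4, "host-100": 4, "host-12": 3}))  # host-99
print(elect_master({"host-10": 4, "host-11": 6}))                 # host-11
```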

  54. Question 40. If A Slave ESXi Host Is Removed From A Cluster, Will An Election Be Triggered Again?

    Answer :

    No. Removing a slave ESXi host from the cluster has no impact on the master, so no election takes place in this case.

  56. Question 41. Does HA Seek Assistance From DRS Before Starting Failover Of Failed VMs?

    Answer :

    Yes, HA sometimes asks DRS for assistance before starting the failover of failed VMs. A cluster may use admission control with either the “Host failures cluster tolerates” or the “Percentage of cluster resources reserved as failover spare capacity” policy. In that case, enough resources may exist in the cluster as a whole but be scattered across hosts, so that no single host has enough free resources for a VM. When this happens, HA waits before failing over the VMs and asks DRS to defragment the resources.
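
    The fragmentation condition described above can be expressed in a few lines of Python. The cluster has enough free resources in total for the VM, but no single host has enough on its own. The sketch looks at one resource dimension at a time, and the function name is hypothetical.

```python
def needs_drs_defragmentation(vm_requirement, free_per_host):
    """True when the cluster as a whole has room for the VM but no single host does."""
    return (sum(free_per_host) >= vm_requirement
            and max(free_per_host, default=0) < vm_requirement)


print(needs_drs_defragmentation(4096, [2048, 2048, 1024]))  # True: 5120 MB free in total
print(needs_drs_defragmentation(4096, [8192, 0]))           # False: one host fits the VM
```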