Author Archives: chouse

Troubleshooting vCenter database connection

We are testing out the latest VMware View release, 4.5, and during an uninstall of an older release, somehow the vCenter service lost its connection to the SQL Server 2005 Express database that we are using and it would not start.

After some poking around, the vCenter log (C:\Documents and Settings\All Users\Application Data\VMware\VMware VirtualCenter\Logs\vpxd-*.log) contained these telling lines:

[2010-09-15 09:14:55.354 03728 info 'App'] [Vpxd::ServerApp::Init:759] Calling: VpxdVdb::Init(Vdb::GetInstance(), false, false)
[2010-09-15 09:14:55.354 03728 error 'App'] ODBC error: (IM002) - [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified
[2010-09-15 09:14:55.354 03728 error 'App'] Error getting configuration info from the database
[2010-09-15 09:14:55.354 03728 error 'App'] [Vpxd::ServerApp::Init] Init failed: VpxdVdb::Init(Vdb::GetInstance(), false, false)
[2010-09-15 09:14:55.354 03728 error 'App'] Failed to intialize VMware VirtualCenter. Shutting down...
[2010-09-15 09:14:55.354 03728 info 'App'] Forcing shutdown of VMware VirtualCenter now
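
If you would rather not eyeball the log, a few lines of Python can pull the ODBC errors out for you. This is only a rough sketch: the function name find_odbc_errors is my own, the regular expression is based solely on the lines shown above, and other vCenter versions may format their logs differently.

```python
import re

# Matches lines such as "ODBC error: (IM002) - [Microsoft]..." and captures
# the SQLSTATE code in parentheses plus whatever follows it.
ODBC_ERROR = re.compile(r"ODBC error: \((?P<sqlstate>\w+)\)(?P<rest>.*)")


def find_odbc_errors(lines):
    """Return (sqlstate, message) pairs for every ODBC error line."""
    found = []
    for line in lines:
        match = ODBC_ERROR.search(line)
        if match:
            # Drop the separator (hyphen or en dash) and surrounding spaces.
            message = match.group("rest").strip(" -\u2013")
            found.append((match.group("sqlstate"), message))
    return found


# Two of the log lines quoted above, used as sample input.
sample = [
    "[2010-09-15 09:14:55.354 03728 info 'App'] [Vpxd::ServerApp::Init:759] "
    "Calling: VpxdVdb::Init(Vdb::GetInstance(), false, false)",
    "[2010-09-15 09:14:55.354 03728 error 'App'] ODBC error: (IM002) - "
    "[Microsoft][ODBC Driver Manager] Data source name not found and no "
    "default driver specified",
]

for sqlstate, message in find_odbc_errors(sample):
    print(sqlstate, message)
```

To run it against a real log, pass an open file object instead of the sample list, since iterating over a file yields its lines.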

Searching the VMware Knowledge Base, I found article 1003928, “Troubleshooting the database data source used by VirtualCenter Server”, which mentions the registry key “HKEY_LOCAL_MACHINE\SOFTWARE\VMware, Inc.\VMware VirtualCenter\DB”. The registry value named “1” under that key was blank, meaning no DSN was specified. I set it to the name of our System DSN, “vcenter”.

Next I opened odbcad32 to look at the “vcenter” System DSN itself, but could not open it because the SQL Native Client 10.0 driver was missing! I downloaded the driver from the Microsoft SQL Server 2008 Feature Pack, August 2008, installed it, and was then able to open the DSN. The “Server” field was simply set to “localhost”, and when I tested the connection, it timed out. The old ViewComposer DSN had both a server and an instance specified: “VIEW45-VC\SQLEXP_VIM”. I put this in the “vcenter” DSN and, using integrated Windows authentication, was able to change the default database to “VIM_VCDB” and test the connection successfully. I saved the DSN and started the vCenter service. This time it started and stayed running, and the logs looked good.

VMware KB wins again!

Planning to evaluate Veeam Backup & Replication

At work, we have been running a virtualized server environment for about 5 years. Currently we have 8 ESXi 4.1 HP blade servers hosting about 165 guest VMs.

All our servers, including our VMs, are backed up using Tivoli Storage Manager (TSM), which does an incremental backup every night to a disk pool; the data is then offloaded to a tape pool during the day.

We can use TSM to restore individual files inside a guest VM, but we can’t easily restore (or at least have never tried to restore) an entire VM if it were to be lost.

A few months ago during a fresh installation of ESXi 4.0 (prior to the release of 4.1), I accidentally selected the wrong SCSI disk to install ESXi on and ended up nuking an entire VMFS datastore. Luckily it was only hosting about 25 virtual desktops and those were easily recreated. But it got me thinking about what would happen if we accidentally lost an entire VMFS datastore hosting server VMs, or even just lost one server VM for any reason.

Therefore I decided to look into backing up the entire VM and not just the files inside it. I asked around and everyone told me the best solution was the Veeam Backup & Replication suite. My only experience so far with Veeam has been their FastSCP product, and that was a nice piece of software, though I did not need to use it that much. I don’t think I even have it installed.

I plan to ask Veeam for a 16-socket trial license so I can get an idea of what backing up my entire environment will be like. I will use changed-block tracking to enable quick incremental backups after the large initial one.

My backup target will be an old EVA8000 array that we purchased, which has about 15TB of FC and FATA disk. I only need about 6TB, so I plan to carve out a large Vdisk and present it to a Windows Server 2008 R2 Standard (64-bit) HP blade server where I will install Veeam Backup & Replication. I will also present the existing VMFS datastores to the Veeam Backup & Replication server in order to do LAN-free (SAN-only) backups going from the VMFS datastores, through the backup server, to the destination Vdisk.

I am very excited about this and hope the license-purchase quote that I also asked for will not be prohibitively expensive. There are several local resellers as well as CDW that I could purchase it from, so I will have to shop around and see who can offer the best deal.

I think it’s great what Veeam has put together and I look forward to checking it out.

Updating VM tools without rebooting

Whenever the ESX hosts in a cluster are upgraded, a newer version of VMware Tools becomes available which should be applied to each VM in the cluster. The Tools are updated drivers and services that enable the VM OS to run more efficiently in a virtualized environment. A normal “automatic” tools upgrade via the vCenter client will cause the VM to reboot when the tools upgrade is finished. If this is OK, then go for it. Otherwise, use this process to suppress the reboot and let things take effect during the next scheduled reboot.

Note: most things take effect right after the update, such as newer services and any drivers that can be safely unloaded and reloaded without requiring a reboot.

Note: the NIC driver may be updated, causing a brief drop in traffic.

To upgrade the VMware Tools on a Windows guest without rebooting, use this bit of PowerShell code with the vSphere PowerCLI.

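First run this to set things up. It connects to your vCenter server using your current Windows credentials and stores the installation parameters to pass to the installer (replace <vcenter-server> with the name of your vCenter server):

# Connect to vCenter with the current Windows credentials
Connect-VIServer <vcenter-server>
# Installer arguments: silent install, reboot suppressed
$insParm = '/S /v"/qn REBOOT=ReallySuppress"'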

Then run this for each VM, replacing <vm-name> with the name of the VM whose Tools you want to update without rebooting:

$vmView = Get-VM <vm-name> | Get-View; $vmView.UpgradeTools_Task($insParm)

Example:

$vmView = Get-VM Babyruth | Get-View; $vmView.UpgradeTools_Task($insParm)

A task will start in vCenter and the tools should be updated on the VM. It takes about 2-3 minutes. If it seems to be taking longer than that to complete, check the VM console to make sure no errors have popped up that require attention, and also check if setup.exe and one or more msiexec.exe processes are running.

The task should finish successfully and the VMware Tools Status for the VM should change to “OK”.

There is plenty of discussion online about various ways to accomplish this (“How to install VMware tools without a reboot?”), but what I described above is what works for me. It is definitely possible to write a PowerShell script to traverse a folder or cluster and upgrade Tools on all the VMs contained within, but I prefer to take it one at a time so the process doesn’t get out of hand. Tools upgrades are generally painless but sometimes there are issues.

Upgrading to vCenter 4.1

Like a lot of folks, I was excited about the release of vCenter 4.1 and ESX(i) 4.1. Lots of great new features, some bugfixes, and the overall feeling of “ooh, new shiny things to play with!”

Every new release of vCenter/ESX is like Christmas morning as there are plenty of new features to be discovered and bugs fixed that streamline previous operations.

I was dismayed at first to read that vCenter required a 64-bit OS as not too long ago I had built up a nice pair of vCenter 4.0 servers on Windows Server 2003 Standard (32-bit) and liked the way they were working. I was running SRM 4.0 along with Update Manager and everything was working great.

I knew that with the 64-bit requirement, I might as well go to Windows Server 2008 R2 64-bit for my two vCenter servers, but I had some hurdles to jump over. Specifically, we aren’t really running any Server 2008 machines in-house, save for those that we have to experiment and kick the tires with. So I decided to build a Server 2008 template and then deploy a pair of new vCenter servers from that.

The process of building a new template VM has been documented widely across the virtualization blogosphere, but I shall add my two cents with a post describing my own win2k8 template. As usual, though, that is for a later time.

The other hurdle to leap over, in addition to getting Server 2008 set up for production use, was the fact that I was going to have to relocate my vCenter servers to brand new machines. I tried this once a while ago with VirtualCenter 2.5 and the results were disastrous. So I was not encouraged by the fact that I was going to have to try to move again.

Luckily, the VMware team understood that a lot of people would be in the same boat due to their 64-bit-only requirement, so they put together a set of Python scripts to aid in moving vCenter to a new host.

The guides that I referred to during this upgrade are:

I started out by stopping my vCenter services, backing up the Microsoft SQL 2008 Standard databases for VC and SRM, and then detaching and copying them to the new vCenter servers, where I already had Microsoft SQL 2008 Standard 64-bit installed. I attached the DBs and made sure my SQL “vpxuser” account was set to owner.

Then I created a 64-bit System DSN for the vCenter database by running “odbcad32” at the Run command. I also created a 32-bit System DSN for SRM by running c:\windows\sysWOW64\odbcad32 at the Run command (thanks to boche.net for that tip).

A side note: my VUM database is part of my vCenter database, so I didn’t have to create a DSN for that – it will just piggyback on top of the vCenter DSN as it did on my Server 2003 vCenter servers.

After my DSNs were created, I followed the instructions in the Upgrade Guide to copy the datamigration folder off the vCenter 4.1 ISO onto my 32-bit server and ran “backup.bat” to back up my vCenter certificates and other configuration that lives outside the database. It also backed up my VUM and Orchestrator configuration (even though I don’t use Orchestrator).

I copied the resulting datamigration folder to the new host. It now has “data” and “log” folders that hold the backed-up configuration. I ran “install.bat” on the new host and pointed it to my vCenter 4.1 media. One thing that got me at first was that it prompted for the Update Manager media path. I gave it what I thought was the path, the Update Manager folder inside the vCenter installer, but that was incorrect and I ended up having to give it the same path that I gave for vCenter. It would be nice if it would just check there first for the Update Manager installer files and only prompt if it can’t find them.

Then the datamigration install.bat file launched the vCenter installer and I stepped through that, changing the install path to the E:\ drive that I had set up for it and pointing it to the DSN that I had configured. I also allowed it to automatically update the ESXi/vCenter host agents. Soon the installer finished and then also launched the VUM one. I stepped through that as well, again changing the path to E: and pointing it to the DSN that it needed.

The VUM install was also successful, and then the datamigration install.bat utility ended. I didn’t even get to see any of the host agents being updated, because by the time I got the client installed and logged in, everything looked great. I had a few servers that needed to be reconfigured for HA, but other than that, the install went great.

I repeated this process on my 2nd vCenter host and it went well too.

I did have to break up my linked-mode config as it sort of stopped working: I could log in to one server and see both, but if I logged in to the other one, it would only show that one and an “authentication failed” message for the other. I had to “force” it to isolate itself from linked mode, but I think that was because the vCenter service hadn’t stopped in time. The second time was a charm, though, and after joining them back together, all was well.

I then installed SRM 4.1, paying careful attention to the release notes (linked earlier) describing how to relocate SRM to a new server with a new name. I installed the HP EVA Storage adapter and visited my SRM config to make sure it was OK. It was but I did have to reconfigure the EVA credentials. I don’t know if they were lost or what but it was no big deal. I ran a test recovery plan and that worked perfectly.

I then reconfigured VUM to not download VM patches for Windows or Linux and none for ESX3. I also added the ESX4.0 to 4.1 upgrade zip and created a baseline for upgrading all my ESXi 4.0 hosts to 4.1.

Finally, I remembered to copy the sysprep files off my 2k3 vCenter servers so that we can continue deploying and customizing 2k3 server and XP VMs.

At this point, everything looks great and I’m heading home. ESXi upgrade next week!

Determining IOPS per VM

We are researching a refresh of our virtual desktop environment (which I still have yet to describe here) and the vendor we are working with had some questions around our current VDI storage utilization, especially how many IOPS per concurrent XP desktop as well as the Read/Write ratio.

I had never actually computed these since we haven’t had any storage issues lately and the storage that we’re using is more than adequate (actually overkill).

Anyway, since vSphere Client doesn’t exactly have a line on the VM Resources area indicating current IOPS (would be nice), we have to figure them out. If we look at the Performance tab for a VM, we can look at the disk rate usage but at least for me, choosing any of the “Summation” counters such as “Disk Read Requests”, “Disk Write Requests”, or “Disk Commands Issued” just resulted in a performance graph that never actually loaded. But that’s okay because we don’t want to take one VM’s storage profile and assume all the other VMs are behaving the same way. Maybe the user assigned to this VM is a power user who is always installing apps, or maybe they never log in to their desktop at all.

Therefore we have to go up to the host level. I have between 9 and 10 hosts across my VDI clusters, so I picked one of my clusters, went to the Hosts tab, and sorted by % Memory to find the host that is using the most memory. This indicates to me that it is hosting the most virtual machines in the cluster. All our VDI VMs have the same amount of memory assigned (512 MB), so I knew this would be the easiest way to find out which host in the cluster had the most VMs (it would be nice to have a column in the Hosts tab that indicated how many VMs are on the host!).

Now that I know which host has the most desktops, I can look at the host’s disk performance and divide that by the number of running VMs. With a lot of VMs on this host, I can be confident that the numbers I calculate will be a truer average than choosing a lightly loaded host.

So on my heavily loaded host, I went to the Performance tab and clicked on “Chart Options”. In there I chose Disk real-time stats (since the counters I’m looking for are only recorded in real time and not kept longer than an hour). I chose all my naa.* objects, which are my datastore LUNs (one LUN per datastore), and ignored both the mpx.* object, which is the local SCSI controller, and the host object itself.

For Counters, I chose “Disk Commands Issued”. When the graph rendered, I saved it as an Excel spreadsheet. I repeated this for “Disk Read Requests” and then also “Disk Write Requests”. I could have had all 3 counters in one graph/spreadsheet but I wanted to make things easy on myself when calculating the averages.

I should point out that it would be a good idea to collect this data during an hour where normal activity is taking place – not first thing in the morning when people are logging in and firing up various applications, and not in the evening when fewer people are logged in. Late in the morning (but not too close to lunch) would be a good time. But to play devil’s advocate, it may be a good idea to do this during the busiest time for the environment because the future storage needs to be able to handle that load. Look at the host graph for Disk Usage over a week and try to spot a trend for the busiest day/time. Possibly export to Excel to get a more granular look at the data.

Now in the spreadsheets, each row is one sample of performance data. The first column is the timestamp of the sample; samples are 20 seconds apart (the collection interval). The remaining columns hold the counter in question for each LUN. To get the average for a LUN, I used AVERAGE on its column, which gave me the average number of commands issued per 20-second collection interval over the past hour. Dividing this number by 20 gave the average commands per second for each LUN. Adding up the per-LUN averages gives the average commands per second for the server over the past hour.
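
The same averaging can be done outside Excel. Here is a minimal Python sketch of it, assuming the exported sheet has been saved as CSV with a "Time" column followed by one column per LUN. The column names, LUN names, and sample values below are made up for illustration; they are not my real data.

```python
import csv
import io

INTERVAL_SECONDS = 20  # real-time collection interval

# Hypothetical export: one row per 20-second sample, one column per LUN.
SAMPLE_CSV = """Time,naa.lun1,naa.lun2
09:00:00,400,200
09:00:20,600,100
09:00:40,500,300
"""


def average_commands_per_second(csv_text, interval=INTERVAL_SECONDS):
    """Return {lun: average commands per second} for each LUN column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    luns = [name for name in rows[0] if name != "Time"]
    per_lun = {}
    for lun in luns:
        # Same as Excel's AVERAGE over the column: mean count per interval.
        mean_per_interval = sum(float(row[lun]) for row in rows) / len(rows)
        # Each sample covers `interval` seconds, so divide to get a rate.
        per_lun[lun] = mean_per_interval / interval
    return per_lun


per_lun = average_commands_per_second(SAMPLE_CSV)
host_total = sum(per_lun.values())  # average commands per second for the host
print(per_lun)
print(host_total)
```

Dividing host_total by the number of running VMs on the host then gives the per-VM figure.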

With the average commands per second for the server over the hour, I then divide by the number of VMs on the server, 43, and I get the following results:

  • average read/write requests per vm per second: 6.69
  • average read requests per vm per second: 5.30
  • average write requests per vm per second: 1.39

Taking the individual read or write request numbers and dividing by the read/write sum, I find the following percentages:

  • read: 79.23%
  • write: 20.76%
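
For anyone who wants to redo the split with their own numbers, here is the arithmetic as a short Python sketch. It starts from the rounded per-VM figures listed above, so its output can differ from my percentages by a few hundredths of a percent; I assume that gap comes from my spreadsheet working with unrounded values. The function name read_write_split is just for illustration.

```python
VM_COUNT = 43  # running VMs on the host that was measured

# Rounded per-VM averages from the list above (requests per second).
reads_per_vm = 5.30
writes_per_vm = 1.39


def read_write_split(reads, writes):
    """Return (read %, write %) of the combined request rate."""
    total = reads + writes
    return 100 * reads / total, 100 * writes / total


read_pct, write_pct = read_write_split(reads_per_vm, writes_per_vm)
print(f"read: {read_pct:.2f}%  write: {write_pct:.2f}%")

# Working backwards: the host-level rate these per-VM figures imply.
host_rate = (reads_per_vm + writes_per_vm) * VM_COUNT
print(f"host requests per second: {host_rate:.2f}")
```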

This ratio is pretty much identical to the common assumption that Windows XP virtual desktops have an 80:20 read:write ratio.

Armed with this information, I’m sure the vendor can put together a storage solution which will adequately host our new virtual desktop infrastructure.

It would be nice if VMware could add these types of calculations in somewhere so that this manual math is not needed, but I’m glad I could at least export the chart data and manipulate it in Excel.