Monday 8 February 2010

Finding issue with third party COM component

The day started with reading mails that came overnight. There was a mail from tech support escalating a customer issue. A webex with the customer was scheduled after an hour. We requested for logs from the tech support. On looking at the logs the first guess was that the error was thrown by a third party component which we used in our product. The strange thing was that the customer had two environments: one a test network and the other a production network. The product was installed on both. However, it was working on test network and not on production network.

From the logs it was almost clear that the third party COM component was returning E_UNEXPECTED error code. But before the third party COM component call, there were few lines from our product which could have returned same error. So to make sure it was third party component issue we prepared a debug binary with more logging. We postponed the webex to 4 in the evening so that we have the debug binary ready.

In the evening we did webex with the customer and got the required logs. These logs made sure that the error was returned by the third party COM component. Now, we had only one option, that of contacting the third party vendor. But since the component was working properly in one scenario we knew that the issue had to be related with that particular installation on that particular machine. The third party component also had the option of enabling the logging. We had tried that also but no logs were generated. The call that was failing was an Initialize() call on that component. So, it was clear that the component was failing even before basic initialization like enabling logging. Contacting the third party vendor was last option as that would have been a lengthy process.

What next? We could not look into the COM component. Had it been a .Net component then reflection would have come to the rescue. Wait a minute. We wanted to look into the component. It had to be something related to the machine configuration. Some permissions or something. We took help of Procmon (Process Monitor) tool from Sysinternals. This tool captures the registry and file accesses. So after looking at the procmon logs you can tell which registry or file operations the component is performing and whether it succeeded or failed.

Running procmon directly on the product would have generated lots of logs. So we prepared a simple native exe which used the third party COM component. This served two purposes.
  • One if this exe ran successfully then it meant there was problem with our product configuration and not with machine.
  • Second if it failed then the problem was with the COM component due to machine configuration and also less procmon logs to analyze.
We did another webex with the customer and took the procmon logs. To be on safer side we took the procmon logs by running the test exe we created as well as the product. Half the work was done. Now, we had to analyze the logs.

We looked at the logs from webex. But it did not show any error like could not access registry due to permissions etc. Now, other option was to compare it with the logs when the COM component ran successfully. We did that and realized that there was some issue with MSXML. The logs on customer machine showed that MSXML4 was being used. But there were some read logs which were missing i.e. the component was not able to parse the XML using MSXML4. Following was the procmon log:

"ReadFile","C:\WINDOWS\system32\msxml4.dll","SUCCESS","Offset: 99,328, Length: 32,768, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O"

We suspected issue with MSXML. Even the Initialize() method on the COM component took XML string as input and option for enabling logging was also part of this XML string only. So the component must have been failing to parse the input XML string and hence no logs generated.

MSXML6 being the latest we asked the customer to verify if MSXML6 was installed on the machine and if not then it should be installed. Bingo, we hit the bull’s-eye. Customer found that MSXML6 was not installed and after installing it the product worked without any issues.

Note: There are many tools available from sysinternals which are very helpful in troubleshooting issues.

3 comments:

  1. U rock.
    Can we have a simple utility which logs all the COM component calls from a process. With the return code - I am not sure if we can do this.

    ReplyDelete
  2. Are you refering to calls from managed code or unmanaged code?

    ReplyDelete