Different Methods of Performing Data Analysis and Doing it the LitePoint Way

Tobias Christensen
Tobias is a Cloud Engineer at LitePoint Corporation. He works with building data
Dec 09 in LitePoint

As Joseph Sfeir wrote in his last blog post, data management and analysis are a huge challenge, one that we also face at LitePoint. After executing extensive wireless tests, we need to sort through the massive amount of information they generate and turn it into meaningful data for analysis.

At LitePoint, we use three simple steps to analyze the data:

Step 1: Verify that the test actually ran and that the results conform to the predefined limits.

Step 2: Examine the statistical characteristics to see whether they meet our expectations or, for a regression test, determine how much two or more datasets deviate from each other. We may also want a report that can serve as a certificate to customers that a bug was fixed.

Step 3: If we spot any unexpected behavior, dig deeper into the data, for example by comparing two types of results to see whether we can find dependencies that help in the debugging process.
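
As a rough illustration of Step 1, the check below marks each expected result as missing, failing, or passing against its limits. It is a minimal sketch, not how IQreport works: the result names, the limit values, and the `check_limits` helper are all made up for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical limits table: result name -> (lower limit, upper limit).
Limits = Dict[str, Tuple[float, float]]


def check_limits(results: Dict[str, List[float]], limits: Limits) -> Dict[str, str]:
    """Return 'missing', 'fail', or 'pass' for every result that has limits."""
    status = {}
    for name, (lower, upper) in limits.items():
        values = results.get(name, [])
        if not values:
            # The test never produced this result, so it did not actually run.
            status[name] = "missing"
        elif all(lower <= value <= upper for value in values):
            status[name] = "pass"
        else:
            status[name] = "fail"
    return status


# Made-up measurements and limits, for illustration only.
example_results = {
    "tx_power_dbm": [15.1, 14.9, 15.3],
    "evm_db": [-31.0, -24.5, -30.2],
}
example_limits = {
    "tx_power_dbm": (14.0, 16.0),
    "evm_db": (-40.0, -28.0),
    "freq_error_ppm": (-20.0, 20.0),
}

print(check_limits(example_results, example_limits))
```

Treating "missing" as its own status keeps a test that silently never ran from being counted as a pass.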

We use a variety of tools internally to perform this process. The general-purpose tools we use are Excel, Matlab, and LabView, but we have also developed our own solution, IQreport, in part to make the typical initial data analysis fast and easy to perform, and to quickly create certification reports that we can hand to our customers.

The illustration below shows, at a high level, how we at LitePoint typically handle data analysis:


Basic Statistics is, as the name implies, fairly standard statistics such as standard deviation, Cpk, mean, histograms, and trends. We use basic statistics heavily; this is how we “get to know” the data. We can do this analysis in three different modes: manual, automated, or intelligent.

• Manual mode uses Excel to explore the data by hand

• Automated mode involves creating reports that contain basic statistics for the entire dataset

• Intelligent mode looks at the data created by the automated mode and pulls out the important
   parts, so that the user has to look at only a fraction of the entire dataset
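
To make the automated and intelligent modes concrete, here is a small sketch that computes mean, standard deviation, and Cpk per result and then lists the results with the lowest Cpk first. The ranking rule, the function names, and the sample data are assumptions for illustration; the post does not describe how IQreport actually sorts results.

```python
import statistics
from typing import Dict, List, Tuple


def cpk(values: List[float], lower: float, upper: float) -> float:
    """Process capability index: distance from the mean to the nearest limit, in units of 3 sigma."""
    mean = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample standard deviation, needs >= 2 values
    if sigma == 0:
        return float("inf")  # no spread at all; treated here as perfectly capable
    return min(upper - mean, mean - lower) / (3 * sigma)


def summarize(results: Dict[str, List[float]],
              limits: Dict[str, Tuple[float, float]]) -> List[dict]:
    """'Automated mode': one row of basic statistics per result type."""
    rows = []
    for name, values in results.items():
        lower, upper = limits[name]
        rows.append({
            "name": name,
            "mean": statistics.mean(values),
            "stdev": statistics.stdev(values),
            "cpk": cpk(values, lower, upper),
        })
    return rows


def worst_first(rows: List[dict], count: int) -> List[dict]:
    """'Intelligent mode' (assumed rule): show only the results with the lowest Cpk."""
    return sorted(rows, key=lambda row: row["cpk"])[:count]


# Made-up data: result "b" is spread across most of its limit window.
results = {"a": [10.0, 10.2, 9.8, 10.1, 9.9], "b": [5.0, 5.8, 4.1, 5.9, 4.2]}
limits = {"a": (9.0, 11.0), "b": (4.0, 6.0)}

for row in worst_first(summarize(results, limits), count=1):
    print(row["name"], round(row["cpk"], 2))
```

With thousands of result types, sorting by a single capability number like this is one simple way to reduce what the engineer has to read to a page or two.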


Data Mining, in this context, is the ability to find dependencies or trends among results, for example that a result that fails test A typically has a very low B value, or to find out exactly when a process started to deviate.

• The manual mode (or OLAP mode) is to dynamically browse the data and spot dependencies and
   trends among results

• In the automated mode, a report is created with a large set of comparisons and views for
  multi-dimensional dependencies. It also looks for trends within the data

• The intelligent mode selects the dependencies and trends that have the highest impact and are of
   most interest to the test engineer
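
The "fails test A, has a low B value" kind of dependency can be checked with a very small comparison: split the records by the outcome of test A and compare the mean of B in each group. This is only a sketch under assumed names (`a_pass`, `b`, `compare_groups`) and made-up records.

```python
import statistics
from typing import Dict, List, Tuple


def compare_groups(records: List[Dict], pass_key: str, value_key: str) -> Tuple[float, float]:
    """Return (mean of value among failures, mean of value among passes).

    Both groups must be non-empty, otherwise statistics.mean raises StatisticsError.
    """
    failed = [record[value_key] for record in records if not record[pass_key]]
    passed = [record[value_key] for record in records if record[pass_key]]
    return statistics.mean(failed), statistics.mean(passed)


# Made-up records: one row per tested device.
records = [
    {"a_pass": True, "b": 5.1},
    {"a_pass": True, "b": 4.9},
    {"a_pass": False, "b": 1.2},
    {"a_pass": True, "b": 5.0},
    {"a_pass": False, "b": 0.8},
]

mean_b_failed, mean_b_passed = compare_groups(records, "a_pass", "b")
print(f"mean B when A fails: {mean_b_failed:.2f}, when A passes: {mean_b_passed:.2f}")
```

A large gap between the two means is a hint, not proof, that B is related to the failures in A; an automated mode would run many such comparisons across pairs of results.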


Machine Learning is the process of creating predictors or classifiers that can help explain the data.

• In the manual mode, the user guides the discovery of classifiers and predictors (c&p)

• In the automated mode, a large number of c&p's are created automatically

• In the intelligent mode, the best c&p's are found and presented to the test engineer
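
As an example of what a very simple classifier could look like, the sketch below fits a one-threshold rule (a "decision stump") that predicts a failure whenever a measured value is below the threshold. The choice of technique, the `fit_stump` name, and the data are illustrative assumptions, not a description of LitePoint's trials.

```python
from typing import List, Optional, Tuple


def fit_stump(values: List[float], failed: List[bool]) -> Tuple[Optional[float], int]:
    """Find the threshold t for which the rule 'value < t means fail' makes the fewest errors.

    Returns (threshold, error count). The threshold is None when there are
    fewer than two distinct values, because no split point exists.
    """
    best_threshold, best_errors = None, len(values) + 1
    candidates = sorted(set(values))
    # Try the midpoint between each pair of neighbouring distinct values.
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2
        errors = sum((value < threshold) != did_fail
                     for value, did_fail in zip(values, failed))
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors


# Made-up data: measured B value per device and whether test A failed.
b_values = [5.1, 4.9, 1.2, 5.0, 0.8]
a_failed = [False, False, True, False, True]

threshold, errors = fit_stump(b_values, a_failed)
print(f"predict 'A fails' when B < {threshold:.2f} ({errors} misclassified)")
```

An automated mode would fit many such rules over different result pairs, and an intelligent mode would keep only those with the fewest errors.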

We use basic statistics in all three modes (manual, automated, and intelligent). IQreport has made a great impact for us: it can intelligently sort the results so that, even with more than 10K result types, the test engineer rarely has to look at more than a few pages to get the overall picture. We use data mining only in the manual mode, primarily as a debug tool. We have made only some small trials in machine learning so far, but we see great future potential in this area.

As the list above shows, few of these ways of performing data analysis are covered by existing tools. It is of course possible to develop your own tools on top of general-purpose tools such as Excel, Matlab, or LabView, or to write a tool or script from scratch. The drawback of creating your own tool or script is that it may take a long time before you get to the actual data analysis, and you end up spending a large amount of time and effort simply handling the data.

Another issue that we have run into ourselves, and have noticed with our customers, is that data analysis can become fragmented: the analysis is performed on only part of the data, or it is performed differently by different people, so that they disagree on the actual result. Even though the statistical results should be the same, choices such as outlier removal or enforcing stricter pass criteria can make them differ. This was another reason for us to create IQreport: to have a single, uniform way of looking at data, a sort of benchmark.
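
The effect of outlier removal on the reported statistics is easy to reproduce. In this sketch, two engineers summarize the same made-up measurements, one on the raw data and one after dropping points more than two standard deviations from the mean. The two-sigma rule and the function names are assumptions chosen for the example.

```python
import statistics
from typing import List, Tuple


def summarize(values: List[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation)."""
    return statistics.mean(values), statistics.stdev(values)


def drop_outliers(values: List[float], max_sigmas: float = 2.0) -> List[float]:
    """Keep only the values within max_sigmas standard deviations of the mean."""
    mean, sigma = summarize(values)
    return [value for value in values if abs(value - mean) <= max_sigmas * sigma]


# Made-up measurements with one suspicious reading.
measurements = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 14.0]

raw_mean, raw_sigma = summarize(measurements)
clean_mean, clean_sigma = summarize(drop_outliers(measurements))

print(f"raw:     mean={raw_mean:.2f} stdev={raw_sigma:.2f}")
print(f"cleaned: mean={clean_mean:.2f} stdev={clean_sigma:.2f}")
```

Both summaries are valid for the rule that produced them, which is exactly why an agreed, shared procedure matters more than either individual choice.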
