Data plots provide a means to view one variable as a function of another. StatMat allows you to select any variable read from the data files as the independent (X) or dependent (Y) variable. StatMat calculates the correlation coefficient between the variables and displays it in the lower left corner. If a regression is performed, the Standard Error of the Estimate, normalized by the standard deviation of the dependent variable, is also calculated and displayed in the lower left corner. If a first order regression is performed, the regression equation is displayed in the lower right corner. Higher order regression and interpolation parameters may be viewed by clicking the text in the lower right corner.
C code may be generated automatically for any regression or interpolation model by clicking the Ccode button whenever a regression or interpolation trace is displayed.
Note: It is not necessary to sort your data in the data file read by StatMat. StatMat automatically sorts all data internally so that the chosen independent variable is continuously increasing.
Geometric data, such as frequency, may produce inaccurate or unacceptable results if regression or interpolation is performed on it directly. It is better to use StatMat's "Alter Data" function to take the logarithm of geometric data before doing regression or interpolation analysis. Simply select your variable and type "log" in the Alter Data edit box.
You may download the data files used in the examples on this page: data1_example.slf, data2_example.slf, and gaus99.slf.
Regression modeling provides a means to extrapolate data or estimate the value of a dependent variable. Regression has an advantage over interpolation in that undesired noise is easily filtered out, and it is easier to compute.
A polynomial of any order from first up to one less than the number of data points is fitted so that the sum of the squares of the errors between the polynomial and the actual data is minimized.
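The least-squares fit described above can be sketched in C as follows. This is a hypothetical illustration (`polyfit` is not a StatMat routine). It builds the normal equations and solves them by Gaussian elimination with partial pivoting. StatMat's actual solver is not documented here. Normal equations lose precision at high orders, so an orthogonal (QR) method would be the safer choice near the 20th order limit.

```c
#include <assert.h>
#include <math.h>

#define MAX_ORDER 20  /* StatMat's documented upper limit on polynomial order */

/* Hypothetical sketch: fit y ~ c[0] + c[1]*x + ... + c[order]*x^order by
   least squares. Builds the normal equations (A^T A) c = A^T y and solves
   them by Gaussian elimination with partial pivoting.
   Returns 0 on success, -1 if the system is singular. */
int polyfit(const double *x, const double *y, int n, int order, double *c)
{
    double m[MAX_ORDER + 1][MAX_ORDER + 2];  /* augmented matrix [A^T A | A^T y] */
    int k = order + 1;                        /* number of coefficients */
    assert(order >= 1 && order <= MAX_ORDER && n > order);

    for (int r = 0; r < k; r++)
        for (int s = 0; s <= k; s++) m[r][s] = 0.0;

    /* Accumulate sums of x^(r+s) and y*x^r. */
    for (int i = 0; i < n; i++) {
        double pr = 1.0;                      /* x_i^r */
        for (int r = 0; r < k; r++) {
            double ps = 1.0;                  /* x_i^s */
            for (int s = 0; s < k; s++) {
                m[r][s] += pr * ps;
                ps *= x[i];
            }
            m[r][k] += pr * y[i];
            pr *= x[i];
        }
    }

    /* Forward elimination with partial pivoting. */
    for (int col = 0; col < k; col++) {
        int piv = col;
        for (int r = col + 1; r < k; r++)
            if (fabs(m[r][col]) > fabs(m[piv][col])) piv = r;
        if (fabs(m[piv][col]) < 1e-300) return -1;
        if (piv != col)
            for (int s = 0; s <= k; s++) {
                double t = m[col][s]; m[col][s] = m[piv][s]; m[piv][s] = t;
            }
        for (int r = col + 1; r < k; r++) {
            double f = m[r][col] / m[col][col];
            for (int s = col; s <= k; s++) m[r][s] -= f * m[col][s];
        }
    }

    /* Back substitution. */
    for (int r = k - 1; r >= 0; r--) {
        double sum = m[r][k];
        for (int s = r + 1; s < k; s++) sum -= m[r][s] * c[s];
        c[r] = sum / m[r][r];
    }
    return 0;
}
```

For example, the four points (0, 1), (1, 6), (2, 17), (3, 34) lie exactly on 1 + 2x + 3x², so a second order fit recovers c = {1, 2, 3}.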
Here is an example of a large quantity of data being modeled as a second order polynomial. The regression curve effectively filters out the noise that appears in the data:

Regression Example
The most common regression is linear regression, or regression of a first order polynomial. Here is an example:

Linear Regression
Note that holding down the left mouse button displays the X,Y position of the cursor. The terms on the plot are defined below:
Standard Error:
The Standard Error, as used by StatMat, is a measure of the error between the points and the regression line. Specifically, it is the standard error of the estimate normalized by the standard deviation of the dependent variable. Totally random data will have a Standard Error close to one, and a perfect fit between the data and the regression line will have a Standard Error of zero. As you increase the order of regression, the Standard Error always goes down, until it reaches zero when the regression order equals one less than the number of points on the plot.
Correlation Coefficient:
The Correlation Coefficient measures how well the dependent and independent variables correlate linearly with each other. Totally random data has a Correlation Coefficient close to zero. Completely linear data has a Correlation Coefficient with a magnitude of one.
Y=4.564X - 27
This is the arithmetic expression of the regression line.
If you want to use only a portion of the data to extrapolate linearly, change the graph minimum and maximum scales so that only the data of interest is displayed. StatMat calculates the linear regression, Correlation Coefficient, and Standard Error only on the data that is displayed, as shown below:

Linear Extrapolation
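Restricting a linear regression to the displayed range amounts to ignoring points outside the X scale limits. The C sketch below shows the idea. `linfit_window` and its `xmin`/`xmax` parameters are hypothetical stand-ins for the plot's minimum and maximum scales. The function returns the slope `a` and intercept `b` of Y = aX + b, the same form as the equation shown on the plot.

```c
#include <assert.h>

/* Hypothetical sketch: least-squares line y = a*x + b using only the points
   with xmin <= x <= xmax, mimicking a fit restricted to the visible range.
   Returns the number of points used, or -1 if fewer than two distinct x
   values fall inside the window. */
int linfit_window(const double *x, const double *y, int n,
                  double xmin, double xmax, double *a, double *b)
{
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    int m = 0;                                   /* points inside the window */
    assert(xmin <= xmax);
    for (int i = 0; i < n; i++) {
        if (x[i] < xmin || x[i] > xmax) continue; /* not displayed: skip */
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
        m++;
    }
    double den = m * sxx - sx * sx;
    if (m < 2 || den == 0.0) return -1;
    *a = (m * sxy - sx * sy) / den;              /* slope */
    *b = (sy - *a * sx) / m;                     /* intercept */
    return m;
}
```

With points (0,0), (1,1), (2,2), (3,3), (4,10), (5,20) and a window of 0 to 3, only the first four points are used, giving Y = 1X + 0.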
StatMat allows curve fits for up to 20th order polynomials. Unlike linear regressions, StatMat always uses all the data for curve fitting and for Standard Error and Correlation Coefficient computations. Below is an example of a 5th order polynomial curve fit:

5th Order Regression
Note the significant reduction in the Standard Error for the higher order curve. Also note that the fifth order polynomial wanders off outside the bounds of the data. This wandering limits the usefulness of higher order regressions for extrapolating data. Higher order regressions are useful for modeling a system when only a finite number of data points are known, such as aerodynamic data from a wind tunnel.
Interpolation provides a means to estimate the value of a dependent variable between known points. Its advantage over regression is that it passes exactly through the known points, which makes it more accurate; its disadvantage is that it may retain undesired noise and is harder to compute. StatMat provides first, second, and third order interpolation. C code may be generated automatically for any of these interpolation orders. The individual definitions are as follows:
First Order Interpolation:
Also known as linear interpolation, first order interpolation models a straight line between each pair of adjacent data points. The value of the line at the desired independent variable value is calculated and returned.
Second Order Interpolation:
Also known as quadratic interpolation, second order interpolation models a second order polynomial between each pair of adjacent data points. The polynomials on each side of any given data point have equal first derivatives at that point. To obtain a unique solution, the first derivative of the curve at one of the end points must be known. Since this value is generally unknown, StatMat estimates a reasonable derivative for one of the end points.
Third Order Interpolation:
Also known as cubic interpolation or a cubic spline, third order interpolation models a third order polynomial between each pair of adjacent data points. The polynomials on each side of any given data point have equal first and second derivatives at that point. To obtain a unique solution, two additional end conditions are needed, such as the first derivative of the curve at both end points. Since this information is generally unknown, StatMat uses an algorithm known as the natural cubic spline, which sets the second derivative to zero at both end points and thereby produces reasonable first derivatives there.
Interpolation parameters consist of a set of polynomial coefficients for every data point except the last. Each polynomial is in terms of (X-Xi), where Xi is the independent data value immediately to the left of X. These parameters may be viewed by clicking the text in the lower right corner of the plot.
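A natural cubic spline, with per-interval coefficients expressed in powers of (X-Xi), can be sketched in C as follows. This is the textbook construction, not StatMat's code, and `natural_spline` is a hypothetical name. It solves a tridiagonal system for the second derivatives at the interior points, holding them at zero at both ends, then converts them to coefficients.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch of a natural cubic spline (second derivative zero at
   both ends). On [xs[i], xs[i+1]], with d = x - xs[i],
       S_i(x) = ys[i] + b[i]*d + c[i]*d*d + e[i]*d*d*d.
   xs must be strictly increasing; b, c, e must hold n - 1 entries.
   Returns 0 on success, -1 on allocation failure. */
int natural_spline(const double *xs, const double *ys, int n,
                   double *b, double *c, double *e)
{
    assert(n >= 2);
    double *m  = calloc((size_t)n, sizeof *m);  /* second derivatives; m[0] = m[n-1] = 0 */
    double *cp = calloc((size_t)n, sizeof *cp); /* Thomas algorithm scratch */
    double *dp = calloc((size_t)n, sizeof *dp);
    if (!m || !cp || !dp) { free(m); free(cp); free(dp); return -1; }

    /* Forward sweep over interior rows i = 1..n-2 of
       h0*m[i-1] + 2*(h0+h1)*m[i] + h1*m[i+1] = rhs. */
    for (int i = 1; i < n - 1; i++) {
        double h0 = xs[i] - xs[i - 1], h1 = xs[i + 1] - xs[i];
        double rhs = 6.0 * ((ys[i + 1] - ys[i]) / h1 - (ys[i] - ys[i - 1]) / h0);
        double diag = 2.0 * (h0 + h1) - h0 * cp[i - 1];
        cp[i] = h1 / diag;
        dp[i] = (rhs - h0 * dp[i - 1]) / diag;
    }
    /* Back substitution; m[n-1] stays 0 (natural end condition). */
    for (int i = n - 2; i >= 1; i--)
        m[i] = dp[i] - cp[i] * m[i + 1];

    /* Convert second derivatives to coefficients in powers of (x - xs[i]). */
    for (int i = 0; i < n - 1; i++) {
        double h = xs[i + 1] - xs[i];
        b[i] = (ys[i + 1] - ys[i]) / h - h * (2.0 * m[i] + m[i + 1]) / 6.0;
        c[i] = m[i] / 2.0;
        e[i] = (m[i + 1] - m[i]) / (6.0 * h);
    }
    free(m); free(cp); free(dp);
    return 0;
}
```

For the points (0,0), (1,1), (2,0) this gives S_0 = 1.5d - 0.5d³ and S_1 = 1 - 1.5d² + 0.5d³. Both pieces have slope zero at x = 1, as the matching-derivative condition requires.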
Linear interpolation is the most commonly used because of its simplicity and ease of calculation. An example of first order interpolation is shown below:

First Order Interpolation
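First order interpolation can be sketched in C with a binary search for the bracketing segment. `lin_interp` is a hypothetical name. The sketch assumes the X values are strictly increasing, which StatMat guarantees by sorting. It also extends the end segments for inputs outside the data range; that end behavior is an assumption, not something this page specifies.

```c
#include <assert.h>

/* Hypothetical sketch: linear interpolation in a table with strictly
   increasing xs. Inputs outside the table use the end segments. */
double lin_interp(const double *xs, const double *ys, int n, double x)
{
    int lo = 0, hi = n - 1;
    assert(n >= 2);
    /* Binary search so that xs[lo] <= x < xs[hi] with hi == lo + 1. */
    while (hi - lo > 1) {
        int mid = (lo + hi) / 2;
        if (x < xs[mid]) hi = mid; else lo = mid;
    }
    double t = (x - xs[lo]) / (xs[lo + 1] - xs[lo]);  /* fraction along segment */
    return ys[lo] + t * (ys[lo + 1] - ys[lo]);
}
```

With points (0,0), (1,10), (3,30), an input of 2 falls in the second segment and returns 20.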
Second order interpolation is not commonly used; an example is shown below. The curves tend to be wavy and can produce significant errors. At times, though, it may be very accurate and less costly than third order interpolation. StatMat can help you determine whether second order interpolation is useful for your data.

Second Order Interpolation
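The second order (quadratic) interpolation described earlier can be sketched in C as follows. Once the slope at the first point is fixed, each piece is determined in turn: it must hit the next data point, and its end slope becomes the next piece's starting slope. StatMat's method for estimating that first slope is not documented here. This hypothetical `quad_spline` therefore takes it as the parameter `b0`. One plausible choice is the slope of the first chord, which makes the first piece a straight line.

```c
#include <assert.h>

/* Hypothetical sketch of quadratic spline interpolation. On
   [xs[i], xs[i+1]], with d = x - xs[i],
       S_i(x) = ys[i] + b[i]*d + c[i]*d*d.
   Adjacent pieces match in value and first derivative at each data point.
   b0 is the assumed slope at the first point; b and c hold n - 1 entries. */
void quad_spline(const double *xs, const double *ys, int n, double b0,
                 double *b, double *c)
{
    assert(n >= 2);
    b[0] = b0;
    for (int i = 0; i < n - 1; i++) {
        double h = xs[i + 1] - xs[i];
        double chord = (ys[i + 1] - ys[i]) / h;  /* secant slope of the interval */
        c[i] = (chord - b[i]) / h;               /* forces S_i(xs[i+1]) = ys[i+1] */
        if (i + 1 < n - 1)
            b[i + 1] = 2.0 * chord - b[i];       /* S_i'(xs[i+1]) carried forward */
    }
}
```

Each piece's end slope feeds the next, so an error in `b0` propagates down the whole table. This is consistent with the wavy curves described above.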
Third order interpolation is commonly used due to its relative accuracy. The example below shows minimal error for all points.

Third Order Interpolation
Once a regression or interpolation model is made, StatMat generates C code for the model that can be compiled by any C or C++ compiler. The C code for regression models is optimal in that only one multiply per order of regression is required. The C code for interpolation is optimal in the same sense, with one multiply per order of interpolation. It also keeps search time minimal for most applications by remembering the last search position and starting the next search from that point.
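The two optimizations described above can be sketched in C as follows. This is not StatMat's generated output. The function names, coefficient tables, and data values are all hypothetical. `reg_model` uses Horner's rule, which needs one multiply per order. `interp_model` stores one precomputed slope per segment, giving one multiply for first order, and walks from the segment used on the previous call.

```c
#include <assert.h>

/* Hypothetical regression model: y = 1 - 2x + 0.5x^2 + 0.25x^3. */
static const double reg_c[] = { 1.0, -2.0, 0.5, 0.25 };
#define REG_ORDER 3

/* Horner's rule: one multiply (and one add) per order of regression. */
double reg_model(double x)
{
    double y = reg_c[REG_ORDER];
    for (int k = REG_ORDER - 1; k >= 0; k--)
        y = y * x + reg_c[k];
    return y;
}

/* Hypothetical first order interpolation table with precomputed slopes. */
static const double tab_x[] = { 0.0, 1.0, 2.0, 4.0, 8.0 };
static const double tab_y[] = { 0.0, 2.0, 3.0, 3.5, 3.75 };
static const double tab_b[] = { 2.0, 1.0, 0.25, 0.0625 };  /* segment slopes */
#define TAB_N 5

/* Linear interpolation that starts searching from the previous segment,
   so slowly varying inputs need only a step or two. */
double interp_model(double x)
{
    static int last = 0;                             /* segment from previous call */
    int i = last;
    while (i > 0 && x < tab_x[i]) i--;               /* walk left  */
    while (i < TAB_N - 2 && x >= tab_x[i + 1]) i++;  /* walk right */
    assert(i >= 0 && i <= TAB_N - 2);
    last = i;
    return tab_y[i] + tab_b[i] * (x - tab_x[i]);     /* polynomial in (x - xi) */
}
```

The `static int last` makes `interp_model` cheap for sequential inputs, such as a simulation stepping through time. As written, it is not safe to call from multiple threads at once.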
The function generated by StatMat takes a type double input and returns a type double output. Regression and interpolation parameters are stored as static variables in the C code. The C function is given the same name as the C code file. Your calling code should look like: y = statfunction(x); where x and y are of type double, and statfunction is the name assigned to the C function. It is frequently necessary to take the logarithm of data that is geometric in nature, such as frequency data, before doing a regression or interpolation. You can take the logarithm of a variable's data with StatMat's Alter Data function by entering "LOG" and selecting OK.
If you generate C code for geometric data after taking its logarithm with StatMat's Alter Data function, you must apply the same transformation when you call the regression or interpolation function. For example, if you took the logarithm of the independent variable, your calling code should look like: y = statfunction(log(x));. If you took the logarithm of the dependent variable, your calling code should look like: y = exp(statfunction(x));.
Occasionally it is desirable to view two random variables on an XY plane, such as the touchdown positions of an aircraft over numerous autolands. StatMat provides scatter plot capability for this purpose. The example below turns off the interpolation and regression line by unchecking the "Line" box. The Correlation Coefficient indicates how strongly the variables are linearly related.

Scatter Plot