This PHP 5 class accepts an array of data as input (either (x, y) pairs or a plain time series), performs linear regression analysis, and returns the trend line. The result is the same as in Excel (Tools/Data Analysis/Regression).
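For reference, the least-squares coefficients of the trend line $y = ax + b$ can be written out as below. This is a restatement of the sums used in the code, where $n$ is the number of data points and, for a time series, $x_i$ takes the values $0, 1, \dots, n-1$:

```latex
a = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2},
\qquad
b = \frac{\sum y_i - a\sum x_i}{n}
```

When every $x_i$ is identical the denominator of $a$ is zero; the code returns a slope of 0 in that case.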
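<?php
class CRegressionLinear {
    /** perform regression analysis on the input data, make the trend line y=ax+b */
    private $mDatas; // input data, array of (x1,y1);(x2,y2);… pairs, or could just be a time-series (x1,x2,x3,…)
    /** constructor */
    function __construct($pDatas) {
        $this->mDatas = $pDatas;
    }
    /** compute the coeff, equation source: http://people.hofstra.edu/faculty/Stefan_Waner/RealWorld/calctopic1/regression.html */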
    function calculate() {
        $n = count($this->mDatas);
        $vSumXX = $vSumXY = $vSumX = $vSumY = 0;
        //var_dump($this->mDatas);
        $vCnt = 0; // for time-series, start at t=0
        foreach ($this->mDatas AS $vOne) {
            if (is_array($vOne)) { // x,y pair
                list($x, $y) = $vOne;
            } else { // time-series
                $x = $vCnt; $y = $vOne;
            } // fi
            $vSumXY += $x*$y;
            $vSumXX += $x*$x;
            $vSumX += $x;
            $vSumY += $y;
            $vCnt++;
        } // rof
        $vTop = ($n*$vSumXY - $vSumX*$vSumY);
        $vBottom = ($n*$vSumXX - $vSumX*$vSumX);
        $a = $vBottom != 0 ? $vTop/$vBottom : 0; // slope; 0 when all x values are identical
        $b = ($vSumY - $a*$vSumX)/$n; // y-intercept
        //var_dump($a,$b);
        return array($a, $b);
    }
    /** given x, return the prediction y */
    function predict($x) {
        list($a, $b) = $this->calculate();
        $y = $a*$x + $b;
        return $y;
    }
}
?>
Sample Usage
// sales data for the last 30 quarters
$vSales = array(
637381,700986,641305,660285,604474,565316,598734,688690,723406,697358,
669910,605636,526655,555165,625800,579405,588317,634433,628443,570597,
662584,763516,742150,703209,669883,586504,489240,648875,692212,586509
);
$vRegression = new CRegressionLinear($vSales);
$vNextQuarter = $vRegression->predict(count($vSales)); // forecast for the next period: x = 30, because the time series starts at t = 0
This example assumes a very simple case, a straight-line trend. Many better forecasting methods exist, including a multiplicative forecasting method that takes into account seasonality, irregularity, and the growth trend.
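The sample above uses a plain time series. As a rough sketch of the other input form, (x, y) pairs, and of how trend-line points for the historical data could be produced, the snippet below fits explicit pairs. The functions `fitLine` and `trendPoints` are hypothetical helpers written for this illustration and are not part of the class. `fitLine` applies the same sums as `calculate()`, and the data is made up so that it lies exactly on y = 2x + 1.

```php
<?php
// Hypothetical helper (not part of CRegressionLinear): least-squares fit
// over (x, y) pairs, using the same sums as calculate().
function fitLine(array $pairs) {
    $n = count($pairs);
    $sumX = $sumY = $sumXY = $sumXX = 0;
    foreach ($pairs as $pair) {
        list($x, $y) = $pair;
        $sumX  += $x;
        $sumY  += $y;
        $sumXY += $x * $y;
        $sumXX += $x * $x;
    }
    $bottom = $n * $sumXX - $sumX * $sumX;
    $a = $bottom != 0 ? ($n * $sumXY - $sumX * $sumY) / $bottom : 0; // slope
    $b = $n > 0 ? ($sumY - $a * $sumX) / $n : 0;                     // y-intercept
    return array($a, $b);
}

// Hypothetical helper: evaluate the fitted line at every historical x,
// giving the points needed to draw the trend line over the data.
function trendPoints(array $pairs) {
    list($a, $b) = fitLine($pairs);
    $points = array();
    foreach ($pairs as $pair) {
        $x = $pair[0];
        $points[] = array($x, $a * $x + $b);
    }
    return $points;
}

// Made-up data lying exactly on y = 2x + 1.
$vPairs = array(array(1, 3), array(2, 5), array(3, 7), array(4, 9));
list($vSlope, $vIntercept) = fitLine($vPairs);
$vTrend = trendPoints($vPairs);
```

With the class itself, the equivalent call would be to pass `$vPairs` to the constructor and call `predict($x)` once for each historical x value.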
9:11 pm, January 22, 2006 Andras /
Hey, your code rocks, except for Line 21, where it says $vSumXY += $x*$x; instead of $vSumXX += $x*$x;. Please correct it; it took me quite a lot of time to figure out why the results were wrong.
Thanks anyway, I really appreciated it 🙂
10:55 pm, January 22, 2006 trungson /
Correction has been made. It’s weird how I had it correct in the codebase but somehow pasted it incorrectly into the blog. 🙂 Thanks Andras.
3:46 pm, July 7, 2006 Anonymous /
So where does the data go? Where and how do you enter data streams into this code to run it?
3:10 pm, August 22, 2007 Anonymous /
For those who are interested, I have compared the values of slope and y-intercept given by this class with those given by Excel. They produce exactly equivalent responses, at least out to 10 decimal places. Nice work.
9:42 pm, June 25, 2008 Anonymous /
I would love to see an example of how this class works…
7:41 am, September 12, 2008 Amy Ramesh /
Thank you very much for the code. But I would like to know how the functions should be called. It would be great if you could provide an example.
Thank you
^Amy
9:07 pm, July 21, 2009 Anonymous /
How would you create a list of points for the trend line over the historical dataset? This just predicts the forward set.
Thanks
12:16 pm, August 8, 2009 Anonymous /
Many thanks!!
Also, congrats on providing easy-to-read, ready-to-use code.
2:22 am, January 15, 2010 Anonymous /
this script is great and it works without any issues on http://www.pacifichost.com good host
6:21 am, February 1, 2010 Anonymous /
Hi – thanks for the great script. It's been a long time since I've seen y=mx+b! Slope worked fine for me as time series, but the y-intercept wasn't working. (The reverse occurred when using pairs). I modified the script so it just takes key => value pairs and does both correctly. Predict needs the next X-value as an argument to work. Here's the code:
class CRegressionLinear {
    /** perform regression analysis on the input data, make the trend line y=ax+b
     * http://blog.trungson.com/2005/11/linear-regression-php-class.html
     */
    private $mDatas; // input data, array of x => y (key => value) pairs
    /** constructor */
    function __construct($pDatas) {
        $this->mDatas = $pDatas;
    }
    /** compute the coeff, equation source: http://people.hofstra.edu/faculty/Stefan_Waner/RealWorld/calctopic1/regression.html */
    function calculate() {
        $n = count($this->mDatas);
        $vSumXX = $vSumXY = $vSumX = $vSumY = 0;
        foreach ($this->mDatas AS $vCnt => $vOne) {
            $x = $vCnt; $y = $vOne; // the array key is x, the value is y
            $vSumXY += $x*$y;
            $vSumXX += $x*$x;
            $vSumX += $x;
            $vSumY += $y;
        } // rof
        $vTop = ($n*$vSumXY - $vSumX*$vSumY);
        $vBottom = ($n*$vSumXX - $vSumX*$vSumX);
        $a = $vBottom != 0 ? $vTop/$vBottom : 0;
        $b = ($vSumY - $a*$vSumX)/$n;
        return array($a, $b);
    }
    /** given x, return the prediction y */
    function predict($x) {
        list($a, $b) = $this->calculate();
        $y = ($a*$x) + $b;
        return $y;
    }
} // class CRegressionLinear
//Thanks again
2:09 pm, March 8, 2010 Santhosh /
Simple code, crisp and easy to learn.
5:40 am, December 30, 2011 Alfredo Covaleda Vélez /
Son, thanks for your code. I added a few lines to calculate the coefficient of correlation and the coefficient of determination inside your class. I’d like to use this class, with the additional code I have written, as one of the classes included in my thesis work. Regards.
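As a rough illustration of the kind of addition described in this comment, the sketch below computes the Pearson correlation coefficient r from the same running sums plus the sum of y squared. For a simple linear regression the coefficient of determination is r squared. This is not the commenter's code: the function name `correlation` and the sample data are hypothetical.

```php
<?php
// Hypothetical helper, not the commenter's actual code: Pearson correlation
// coefficient r for an array of (x, y) pairs.
function correlation(array $pairs) {
    $n = count($pairs);
    $sumX = $sumY = $sumXY = $sumXX = $sumYY = 0;
    foreach ($pairs as $pair) {
        list($x, $y) = $pair;
        $sumX  += $x;
        $sumY  += $y;
        $sumXY += $x * $y;
        $sumXX += $x * $x;
        $sumYY += $y * $y;
    }
    $top = $n * $sumXY - $sumX * $sumY;
    $bottom = sqrt(($n * $sumXX - $sumX * $sumX) * ($n * $sumYY - $sumY * $sumY));
    // r is undefined when x or y has no variance; return 0 in that case,
    // mirroring how calculate() handles a zero denominator.
    return $bottom != 0 ? $top / $bottom : 0;
}

// Made-up sample data.
$vPairs = array(array(1, 2), array(2, 4), array(3, 5), array(4, 4), array(5, 5));
$vR = correlation($vPairs);   // coefficient of correlation
$vRSquared = $vR * $vR;       // coefficient of determination
```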