Here’s a quick implementation, I think I got the algorithm right, I just glanced over it.
<?php
// Random cartesian coordinates (x, y) and labels
$data = array(
array(1, 2, 'red'), // 0 =>
array(5, 3, 'blue'), // 1 =>
array(-1, 2, 'blue'), // 2 =>
array(2, 5, 'red'), // 3 =>
array(3, 3, 'red'), // 4 =>
array(-4, 5, 'blue'), // 5 =>
array(2, 2, 'blue'), // 6 =>
array(5, -2, 'red'), // 7 =>
array(-1, -2, 'blue'),// 8 =>
);
// Build distance matrix
$distances = $data;
array_walk($distances, 'euclideanDistance', $data);
// Example, target = datapoint 5, getting 3 nearest neighbors
$neighbors = getNearestNeighbors($distances, 5, 3);
echo getLabel($data, $neighbors) . "\
"; // red
/**
* Calculates eucilean distances for an array dataset
*
* @param array $sourceCoords In format array(x, y)
* @param array $sourceKey Associated array key
* @param array $data
* @return array Of distances to the rest of the data set
*/
function euclideanDistance(&$sourceCoords, $sourceKey, $data)
{
$distances = array();
list ($x1, $y1) = $sourceCoords;
foreach ($data as $destinationKey => $destinationCoords) {
// Same point, ignore
if ($sourceKey == $destinationKey) {
continue;
}
list ($x2, $y2) = $destinationCoords;
$distances[$destinationKey] = sqrt(pow($x1 - $x2, 2) + pow($y1 - $y2, 2));
}
asort($distances);
$sourceCoords = $distances;
}
/**
* Returns n-nearest neighbors
*
* @param array $distances Distances generated above ^
* @param mixed $key Array key of source location
* @param int $num Of neighbors to fetch
* @return array Of nearest neighbors
*/
function getNearestNeighbors($distances, $key, $num)
{
return array_slice($distances[$key], 0, $num, true);
}
/**
* Gets result label from associated data
*
* @param array $data
* @param array $neighbors Result from getNearestNeighbors()
* @return string label
*/
function getLabel($data, $neighbors)
{
$results = array();
$neighbors = array_keys($neighbors);
foreach ($neighbors as $neighbor) {
$results[] = $data[$neighbor][2];
}
$values = array_count_values($results);
$values = array_flip($values);
ksort($values);
return array_pop($values);
}
Value of $distances
:
Array
(
[0] => Array
(
[6] => 1
[2] => 2
[4] => 2.2360679775
[3] => 3.16227766017
[1] => 4.12310562562
[8] => 4.472135955
[7] => 5.65685424949
[5] => 5.83095189485
)
[1] => Array
(
[4] => 2
[6] => 3.16227766017
[3] => 3.60555127546
[0] => 4.12310562562
[7] => 5
[2] => 6.0827625303
[8] => 7.81024967591
[5] => 9.21954445729
)
[2] => Array
(
[0] => 2
[6] => 3
[8] => 4
[4] => 4.12310562562
[5] => 4.24264068712
[3] => 4.24264068712
[1] => 6.0827625303
[7] => 7.21110255093
)
[3] => Array
(
[4] => 2.2360679775
[6] => 3
[0] => 3.16227766017
[1] => 3.60555127546
[2] => 4.24264068712
[5] => 6
[8] => 7.61577310586
[7] => 7.61577310586
)
[4] => Array
(
[6] => 1.41421356237
[1] => 2
[0] => 2.2360679775
[3] => 2.2360679775
[2] => 4.12310562562
[7] => 5.38516480713
[8] => 6.40312423743
[5] => 7.28010988928
)
[5] => Array
(
[2] => 4.24264068712
[0] => 5.83095189485
[3] => 6
[6] => 6.7082039325
[4] => 7.28010988928
[8] => 7.61577310586
[1] => 9.21954445729
[7] => 11.401754251
)
[6] => Array
(
[0] => 1
[4] => 1.41421356237
[3] => 3
[2] => 3
[1] => 3.16227766017
[7] => 5
[8] => 5
[5] => 6.7082039325
)
[7] => Array
(
[1] => 5
[6] => 5
[4] => 5.38516480713
[0] => 5.65685424949
[8] => 6
[2] => 7.21110255093
[3] => 7.61577310586
[5] => 11.401754251
)
[8] => Array
(
[2] => 4
[0] => 4.472135955
[6] => 5
[7] => 6
[4] => 6.40312423743
[3] => 7.61577310586
[5] => 7.61577310586
[1] => 7.81024967591
)
)
Value of $neighbors
of point 5 (-4, 5), 3 nearest:
Array
(
[2] => 4.24264068712
[0] => 5.83095189485
[3] => 6
)
2 is blue, 0 is red, and 3 is red, which makes point 5 red.
In this example we ignore the source point’s existing label and reclassify it, but it’s just an example.
HTH
Edit:
Oh, and it doesn’t do the recursive dive if an equal number of labels exist for a value k, it instead returns one arbitrarily