Struct cogset::Dbscan [-] [+] [src]

pub struct Dbscan<P: RegionQuery + ListPoints> where P: Hash + Eq + Clone {
    // some fields omitted
}

Clustering via the DBSCAN algorithm[1].

[DBSCAN] is a density-based clustering algorithm: given a set of points in some space, it groups together points that are closely packed together (points with many nearby neighbors), marking as outliers points that lie alone in low-density regions (whose nearest neighbors are too far away).wikipedia

An instance of Dbscan is an iterator over clusters of P. Points classified as noise once all clusters are found are available via noise_points.

This uses the P::Point yielded by the iterators provided by ListPoints and RegionQuery as a unique identifier for each point. The algorithm will behave strangely if the identifier is not unique or not stable within a given execution of DBSCAN. The identifier is cloned several times in the course of execution, so it should be cheap to duplicate (e.g. a usize index, or a &T reference).

[1]: Ester, Martin; Kriegel, Hans-Peter; Sander, Jörg; Xu, Xiaowei (1996). Simoudis, Evangelos; Han, Jiawei; Fayyad, Usama M., eds. A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96). AAAI Press. pp. 226–231.

Examples

A basic example:

use cogset::{Dbscan, BruteScan, Euclid};

let points = [Euclid([0.1]), Euclid([0.2]), Euclid([1.0])];

let scanner = BruteScan::new(&points);
let mut dbscan = Dbscan::new(scanner, 0.2, 2);

// get the clusters themselves
let clusters = dbscan.by_ref().collect::<Vec<_>>();
// the first two points are the only cluster
assert_eq!(clusters, &[&[0, 1]]);

// now the noise
let noise = dbscan.noise_points();
// which is just the last point
assert_eq!(noise.iter().cloned().collect::<Vec<_>>(),
           &[2]);

A more complicated example that renders the output nicely:

use std::str;
use cogset::{Dbscan, BruteScan, Euclid};

fn write_points<I>(output: &mut [u8; 76], byte: u8, it: I)
    where I: Iterator<Item = Euclid<[f64; 1]>>
{
    for p in it { output[(p.0[0] * 30.0) as usize] = byte; }
}

// the points we're going to cluster, considered as points in ℝ
// with the conventional distance.
let points = [Euclid([0.25]), Euclid([0.9]), Euclid([2.0]), Euclid([1.2]),
              Euclid([1.9]), Euclid([1.1]),  Euclid([1.35]), Euclid([1.85]),
              Euclid([1.05]), Euclid([0.1]), Euclid([2.5]), Euclid([0.05]),
              Euclid([0.6]), Euclid([0.55]), Euclid([1.6])];

// print the points before clustering
let mut original = [b' '; 76];
write_points(&mut original, b'x', points.iter().cloned());
println!("{}", str::from_utf8(&original).unwrap());

// set-up the data structure that will manage the queries that
// Dbscan needs to do.
let scanner = BruteScan::new(&points);

// create the clusterer: we need 3 points to consider a group a
// cluster, and we're only looking at points 0.2 units apart.
let min_points = 3;
let epsilon = 0.2;
let mut dbscan = Dbscan::new(scanner, epsilon, min_points);

let mut clustered = [b' '; 76];

// run over all the clusters, writing each to the output
for (i, cluster) in dbscan.by_ref().enumerate() {
    // since we used `BruteScan`, `cluster` is a vector of indices
    // into `points`, not the points themselves, so lets map back
    // to the points.
    let actual_points = cluster.iter().map(|idx| points[*idx]);

    write_points(&mut clustered, b'0' + i as u8,
                 actual_points)
}
// now run over the noise points, i.e. points that aren't close
// enough to others to be in a cluster.
let noise = dbscan.noise_points();
write_points(&mut clustered, b'.',
             noise.iter().map(|idx| points[*idx]));

// print the numbered clusters
println!("{}", str::from_utf8(&clustered).unwrap());

Output:

 x x   x        x x        x   x x  x   x       x      x x  x              x
 0 0   0        . .        2   2 2  2   2       .      1 1  1              .

Methods

impl<P: RegionQuery + ListPoints> Dbscan<P> where P: Hash + Eq + Clone

fn new(points: P, eps: f64, min_points: usize) -> Dbscan<P>

Create a new DBSCAN instance, with the given eps and min_points.

eps is the maximum distance between points when creating neighbours to construct clusters. min_points is the minimum of points for a cluster.

This does not perform any significant computation immediately; clusters are found on the fly via the Iterator instance.

fn noise_points(&self) -> &HashSet<P>

Points that have been classified as noise once the algorithm finishes.

This only makes sense to call once the iterator is exhausted, and will give unspecified nonsense if called earlier.

Trait Implementations

impl<P: RegionQuery + ListPoints> Iterator for Dbscan<P> where P: Hash + Eq + Clone

type Item = Vec<P>

fn next(&mut self) -> Option<Vec<P>>

fn size_hint(&self) -> (usize, Option<usize>)

fn count(self) -> usize

fn last(self) -> Option<Self::Item>

fn nth(&mut self, n: usize) -> Option<Self::Item>

fn chain<U>(self, other: U) -> Chain<Self, U::IntoIter> where U: IntoIterator<Item=Self::Item>

fn zip<U>(self, other: U) -> Zip<Self, U::IntoIter> where U: IntoIterator

fn map<B, F>(self, f: F) -> Map<Self, F> where F: FnMut(Self::Item) -> B

fn filter<P>(self, predicate: P) -> Filter<Self, P> where P: FnMut(&Self::Item) -> bool

fn filter_map<B, F>(self, f: F) -> FilterMap<Self, F> where F: FnMut(Self::Item) -> Option<B>

fn enumerate(self) -> Enumerate<Self>

fn peekable(self) -> Peekable<Self>

fn skip_while<P>(self, predicate: P) -> SkipWhile<Self, P> where P: FnMut(&Self::Item) -> bool

fn take_while<P>(self, predicate: P) -> TakeWhile<Self, P> where P: FnMut(&Self::Item) -> bool

fn skip(self, n: usize) -> Skip<Self>

fn take(self, n: usize) -> Take<Self>

fn scan<St, B, F>(self, initial_state: St, f: F) -> Scan<Self, St, F> where F: FnMut(&mut St, Self::Item) -> Option<B>

fn flat_map<U, F>(self, f: F) -> FlatMap<Self, U, F> where F: FnMut(Self::Item) -> U, U: IntoIterator

fn fuse(self) -> Fuse<Self>

fn inspect<F>(self, f: F) -> Inspect<Self, F> where F: FnMut(&Self::Item) -> ()

fn by_ref(&mut self) -> &mut Self

fn collect<B>(self) -> B where B: FromIterator<Self::Item>

fn partition<B, F>(self, f: F) -> (B, B) where F: FnMut(&Self::Item) -> bool, B: Default + Extend<Self::Item>

fn fold<B, F>(self, init: B, f: F) -> B where F: FnMut(B, Self::Item) -> B

fn all<F>(&mut self, f: F) -> bool where F: FnMut(Self::Item) -> bool

fn any<F>(&mut self, f: F) -> bool where F: FnMut(Self::Item) -> bool

fn find<P>(&mut self, predicate: P) -> Option<Self::Item> where P: FnMut(&Self::Item) -> bool

fn position<P>(&mut self, predicate: P) -> Option<usize> where P: FnMut(Self::Item) -> bool

fn rposition<P>(&mut self, predicate: P) -> Option<usize> where P: FnMut(Self::Item) -> bool, Self: ExactSizeIterator + DoubleEndedIterator

fn max(self) -> Option<Self::Item> where Self::Item: Ord

fn min(self) -> Option<Self::Item> where Self::Item: Ord

fn min_max(self) -> MinMaxResult<Self::Item> where Self::Item: Ord

fn max_by<B, F>(self, f: F) -> Option<Self::Item> where F: FnMut(&Self::Item) -> B, B: Ord

fn min_by<B, F>(self, f: F) -> Option<Self::Item> where F: FnMut(&Self::Item) -> B, B: Ord

fn rev(self) -> Rev<Self> where Self: DoubleEndedIterator

fn unzip<A, B, FromA, FromB>(self) -> (FromA, FromB) where FromB: Default + Extend<B>, FromA: Default + Extend<A>, Self: Iterator<Item=(A, B)>

fn cloned<'a, T>(self) -> Cloned<Self> where Self: Iterator<Item=&'a T>, T: 'a + Clone

fn cycle(self) -> Cycle<Self> where Self: Clone

fn reverse_in_place<'a, T>(&mut self) where T: 'a, Self: Iterator<Item=&'a mut T> + DoubleEndedIterator

fn sum<S = Self::Item>(self) -> S where S: Add<Self::Item, Output=S> + Zero

fn product<P = Self::Item>(self) -> P where P: Mul<Self::Item, Output=P> + One