SuffixArray

The suffix array is a data structure that can be used to test quickly the presence of a value in a list. It's also adapted to get completion proposals for a prefix.

The performance gain over a canFind() is excellent (98%). The gain over the built-in associative array is slight (3%) and the gain over EMSI hash map is good (40%). Despite of its speed, it always wastes a lot of memory because each byte in an entry is represented by an array of 256 pointers. For example count 800 MB for /usr/share/dict/words (300K words) on X86_64.

This implementation only works with arrays of character made of single byte units (char[], string, const(char)[], etc) but is compatible with multi byte characters.

Disabled Default Constructor

A disabled default is present on this object. To use it, use one of the other constructors or a factory function.

Constructors

this
this(E entries)

Constructs the array from a range of elements.

Destructor

A destructor is present on this object, but not explicitly documented in the source.

Postblit

Copying this object is disabled.

A postblit is present on this object, but not explicitly documented in the source.

Members

Aliases

Fun
alias Fun(A...) = void function(const(Node)* node, ref const ubyte[] path, A a)

Prototype for the function passed in visitAll and Node.visit.

find
alias find = opBinaryRight!"in"

Determines wether a full value is in the array.

Functions

clear
void clear()

Empties the array.

findPrefix
const(Node)* findPrefix(const T value)

Determines wether an entry starts with value.

memoryUsage
size_t memoryUsage()

Indicates the amount of memory used by the array.

opBinaryRight
const(Node)* opBinaryRight(const T value)

Determines wether a full value is in the array.

sort
T[] sort(bool descending = false)

Sorts the entries.

visitAll
void visitAll(auto ref A a)

Visits all the nodes with a function.

Structs

Node
struct Node

A node in the array.

Templates

isValidVisitor
template isValidVisitor(alias fun)

Indicates wether a function is suitable for visitAll() or Node.visit()

Examples

1 string[] source = ["Cairo", "Calcutta", "Calgary", "Cali", "Campinas",
2     "Cape Town", "Caracas", "Casablanca", "Changchun", "Changde"];
3 
4 auto cities = SuffixArray!string(source);
5 
6 // test for presence
7 assert("Cairo" in cities);
8 assert("Calcutta" in cities);
9 assert("Calgary" in cities);
10 assert("Chicago" !in cities);
11 assert("" !in cities);
12 
13 // get completion list
14 auto pre = cities.findPrefix("Cal");
15 assert(pre);
16 assert(!pre.terminates);
17 assert(pre.entries == ["Calcutta"[3..$], "Calgary"[3..$], "Cali"[3..$]]);
18 
19 // sorting
20 auto desc = cities.sort(true);
21 import std.algorithm.sorting;
22 assert(desc == sort!"a > b"(source).array);
23 auto asc = cities.sort(false);
24 assert(asc == sort!"a < b"(source).array);
25 
26 // adds from a prefix
27 auto ch = cast(cities.Node*) cities.findPrefix("Ch");
28 assert(ch);
29 ch.addSuffix("icago");
30 assert("Chicago" in cities);
31 
32 // memory usage
33 size_t count;
34 foreach(s; source) foreach(i; 0..s.length)
35     ++count;
36 // actual usage is more close to count * 256 * size_t.sizeof
37 assert(cities.memoryUsage >= count);
38 
39 // clearing is global
40 cities.clear;
41 assert("Cairo" !in cities);
42 assert("Calcutta" !in cities);

Meta