As in any tree-based data structure, the M-tree is composed of nodes and leaves. In each node there is a data object that identifies it uniquely and a pointer to a sub-tree where its children reside. Every leaf has several data objects. For each node there is a radius r {\displaystyle r} that defines a Ball in the desired metric space. Thus, every node n {\displaystyle n} and leaf l {\displaystyle l} residing in a particular node N {\displaystyle N} is at most distance r {\displaystyle r} from N {\displaystyle N} , and every node n {\displaystyle n} and leaf l {\displaystyle l} with node parent N {\displaystyle N} keep the distance from it.
An M-tree has these components and sub-components:
The main idea is first to find a leaf node N where the new object O belongs. If N is not full then just attach it to N. If N is full then invoke a method to split N. The algorithm is as follows:
If the split method arrives to the root of the tree, then it choose two routing objects from N, and creates two new nodes containing all the objects in original N, and store them into the new root. If split methods arrives to a node N that is not the root of the tree, the method choose two new routing objects from N, re-arrange every routing object in N in two new nodes N 1 {\displaystyle N_{1}} and N 2 {\displaystyle N_{2}} , and store these new nodes in the parent node N p {\displaystyle N_{p}} of original N. The split must be repeated if N p {\displaystyle N_{p}} has not enough capacity to store N 2 {\displaystyle N_{2}} . The algorithm is as follow:
A range query is where a minimum similarity/maximum distance value is specified. For a given query object Q ∈ D {\displaystyle Q\in D} and a maximum search distance r ( Q ) {\displaystyle r(Q)} , the range query range(Q, r(Q)) selects all the indexed objects O j {\displaystyle O_{j}} such that d ( O j , Q ) ≤ r ( Q ) {\displaystyle d(O_{j},Q)\leq r(Q)} .2
Algorithm RangeSearch starts from the root node and recursively traverses all the paths which cannot be excluded from leading to qualifying objects.
k-nearest neighbor (k-NN) query takes the cardinality of the input set as an input parameter. For a given query object Q ∈ D and an integer k ≥ 1, the k-NN query NN(Q, k) selects the k indexed objects which have the shortest distance from Q, according to the distance function d.3
Ciaccia, Paolo; Patella, Marco; Zezula, Pavel (1997). "M-tree An Efficient Access Method for Similarity Search in Metric Spaces" (PDF). Proceedings of the 23rd VLDB Conference Athens, Greece, 1997. IBM Almaden Research Center: Very Large Databases Endowment Inc. pp. 426–435. p426. Retrieved 2010-09-07. http://www.vldb.org/conf/1997/P426.PDF ↩
P. Ciaccia; M. Patella; F. Rabitti; P. Zezula. "Indexing Metric Spaces with M-tree" (PDF). Department of Computer Science and Engineering. University of Bologna. p. 3. Retrieved 19 November 2013. http://www-db.deis.unibo.it/research/papers/SEBD97.pdf ↩