The margin-infused relaxed algorithm (MIRA) is an online machine learning algorithm for multiclass classification problems. It learns a set of parameters (a vector or matrix) by processing the given training examples one at a time and updating the parameters after each one, so that the current training example is classified correctly with a margin over each incorrect classification that is at least as large as that classification's loss. The change to the parameters is kept as small as possible.
A two-class version called binary MIRA simplifies the algorithm by not requiring the solution of a quadratic programming problem (see below). When used in a one-vs-all configuration, binary MIRA can be extended to a multiclass learner that approximates full MIRA, but may be faster to train.
The flow of the algorithm looks as follows:
Algorithm MIRA
  Input: Training examples $T=\{x_{i},y_{i}\}$
  Output: Set of parameters $w$

  $i$ ← 0, $w^{(0)}$ ← 0
  for $n$ ← 1 to $N$
    for $t$ ← 1 to $|T|$
      $w^{(i+1)}$ ← update $w^{(i)}$ according to $\{x_{t},y_{t}\}$
      $i$ ← $i+1$
    end for
  end for
  return $\frac{\sum_{j=1}^{N\times |T|}w^{(j)}}{N\times |T|}$

- "←" denotes assignment. For instance, "largest ← item" means that the value of largest changes to the value of item.
- "return" terminates the algorithm and outputs the following value.
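The loop above can be sketched in Python as follows. This is a minimal sketch rather than a reference implementation: the function name `train_averaged`, the `(n_classes, n_features)` weight matrix shape, and the `update` callable (a stand-in for the per-example update step) are all assumptions made for illustration.

```python
import numpy as np

def train_averaged(examples, update, n_classes, n_features, n_epochs):
    """Make n_epochs passes over the examples and return the average of
    the parameter values w^(1) ... w^(N*|T|) produced along the way.

    `update(w, x, y)` is a hypothetical callable that returns the new
    parameters for one training example without modifying `w` in place.
    """
    examples = list(examples)
    if not examples or n_epochs < 1:
        raise ValueError("need at least one example and one epoch")

    w = np.zeros((n_classes, n_features))  # w^(0) <- 0
    w_sum = np.zeros_like(w)               # running sum of w^(1), w^(2), ...
    steps = 0
    for _ in range(n_epochs):              # n <- 1 to N
        for x, y in examples:              # t <- 1 to |T|
            w = update(w, x, y)            # w^(i+1) <- update w^(i)
            w_sum += w
            steps += 1
    return w_sum / steps                   # average over N * |T| steps
```

Returning the average instead of the final `w` mirrors the `return` line of the pseudocode; the running sum avoids storing every intermediate parameter value.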
The update step is then formalized as a quadratic programming problem: find $\min \|w^{(i+1)}-w^{(i)}\|$ such that $\operatorname{score}(x_{t},y_{t})-\operatorname{score}(x_{t},y')\geq L(y_{t},y')\ \forall y'$, i.e. the score of the correct label $y$ of the current training example must be greater than the score of any other possible $y'$ by at least the loss (number of errors) of that $y'$ in comparison to $y$.
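To make the constraint concrete, the following Python sketch performs a simplified update that enforces only one constraint: the one for the incorrect label that currently violates its margin the most. This is not the full quadratic program, which handles all $y'$ at once, but with a single linear constraint the minimal-change solution has a closed form. The linear scoring function (`w[c] @ x`), the default 0/1 loss, and the name `mira_update_1best` are assumptions made for illustration.

```python
import numpy as np

def mira_update_1best(w, x, y, loss=None):
    """Single-constraint MIRA-style update (a simplification of the full QP).

    w: (n_classes, n_features) weight matrix, n_classes >= 2 (assumed)
    x: (n_features,) feature vector; y: index of the correct label
    Assumes score(x, c) = w[c] @ x and, by default, a 0/1 loss.
    """
    if loss is None:
        loss = lambda y_true, y_other: 0.0 if y_true == y_other else 1.0

    scores = w @ x
    # Violation of each constraint: L(y, c) - (score(x, y) - score(x, c)).
    violations = [(loss(y, c) - (scores[y] - scores[c]), c)
                  for c in range(w.shape[0]) if c != y]
    worst, c = max(violations)

    # Raising row y and lowering row c by tau * x changes the margin
    # by tau * 2 * ||x||^2.
    sq_norm = 2.0 * float(x @ x)
    if worst <= 0.0 or sq_norm == 0.0:
        return w  # constraint already satisfied, no change needed

    tau = worst / sq_norm  # smallest step that meets the margin exactly
    w_new = w.copy()
    w_new[y] += tau * x
    w_new[c] -= tau * x
    return w_new
```

After one call, the margin between the correct label and the selected incorrect label equals the loss exactly; constraints for other labels may still be violated and are handled only when later calls select them.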
External links
- adMIRAble – MIRA implementation in C++
- Miralium – MIRA implementation in Java
- MIRA implementation for Mahout in Hadoop
References
Crammer, Koby; Singer, Yoram (2003). "Ultraconservative Online Algorithms for Multiclass Problems". Journal of Machine Learning Research. 3: 951–991. http://jmlr.csail.mit.edu/papers/v3/crammer03a.html
McDonald, Ryan; Crammer, Koby; Pereira, Fernando (2005). "Online Large-Margin Training of Dependency Parsers" (PDF). Proceedings of the 43rd Annual Meeting of the ACL. Association for Computational Linguistics. pp. 91–98. http://aclweb.org/anthology-new/P/P05/P05-1012.pdf
Watanabe, T.; et al. (2007). "Online Large Margin Training for Statistical Machine Translation". Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. pp. 764–773.
Bohnet, B. (2009). "Efficient Parsing of Syntactic and Semantic Dependency Structures". Proceedings of the Conference on Natural Language Learning (CoNLL). Boulder. pp. 67–72.