Multidimensional scaling

<h2 id="types">Types</h2>
MDS algorithms fall into a <a href="/facts/Taxonomy_(general)/QRvFI267">taxonomy</a>, depending on the meaning of the input matrix:

<h3>Classical multidimensional scaling</h3>
It is also known as Principal Coordinates Analysis (PCoA), Torgerson Scaling or Torgerson–Gower scaling. It takes an input matrix giving dissimilarities between pairs of items and outputs a coordinate matrix whose configuration minimizes a <a href="/facts/Loss_function/xv5ozuhl">loss function</a> called strain,<a class="footnote-ref" id="fnref:4" href="#fn:4">4</a> which is given by

Strain
 
 
 D
 
 
 (
 
 x
 
 1
 
 
 ,
 
 x
 
 2
 
 
 ,
 .
 .
 .
 ,
 
 x
 
 n
 
 
 )
 =
 
 
 (
 
 
 
 
 
 
 ∑
 
 i
 ,
 j
 
 
 
 
 (
 
 
 
 b
 
 i
 j
 
 
 −
 
 x
 
 i
 
 
 T
 
 
 
 x
 
 j
 
 
 
 
 
 )
 
 
 
 2
 
 
 
 
 
 ∑
 
 i
 ,
 j
 
 
 
 b
 
 i
 j
 
 
 2
 
 
 
 
 
 
 
 
 )
 
 
 
 1
 
 /
 
 2
 
 
 ,
 
 
 {\displaystyle {\text{Strain}}_{D}(x_{1},x_{2},...,x_{n})={\Biggl (}{\frac {\sum _{i,j}{\bigl (}b_{ij}-x_{i}^{T}x_{j}{\bigr )}^{2}}{\sum _{i,j}b_{ij}^{2}}}{\Biggr )}^{1/2},}
 
 
where 
 
 
 
 
 x
 
 i
 
 
 
 
 {\displaystyle x_{i}}
 
 denote vectors in N-dimensional space, 
 
 
 
 
 x
 
 i
 
 
 T
 
 
 
 x
 
 j
 
 
 
 
 {\displaystyle x_{i}^{T}x_{j}}
 
 denotes the scalar product between 
 
 
 
 
 x
 
 i
 
 
 
 
 {\displaystyle x_{i}}
 
 and 
 
 
 
 
 x
 
 j
 
 
 
 
 {\displaystyle x_{j}}
 
, and 
 
 
 
 
 b
 
 i
 j
 
 
 
 
 {\displaystyle b_{ij}}
 
 are the elements of the matrix 
 
 
 
 B
 
 
 {\displaystyle B}
 
 defined on step 2 of the following algorithm, which are computed from the distances.

Steps of a Classical MDS algorithm:
Classical MDS uses the fact that the coordinate matrix 
 
 
 
 X
 
 
 {\displaystyle X}
 
 can be derived by <a href="/facts/Eigendecomposition_of_a_matrix/a2nfF7hJ">eigenvalue decomposition</a> from 
 
 
 
 B
 =
 X
 
 X
 ′
 
 
 
 {\textstyle B=XX'}
 
. And the matrix 
 
 
 
 B
 
 
 {\textstyle B}
 
 can be computed from proximity matrix 
 
 
 
 D
 
 
 {\textstyle D}
 
 by using double centering.<a class="footnote-ref" id="fnref:5" href="#fn:5">5</a>
<ol><li>Set up the squared proximity matrix 
 
 
 
 
 D
 
 (
 2
 )
 
 
 =
 [
 
 d
 
 i
 j
 
 
 2
 
 
 ]
 
 
 {\textstyle D^{(2)}=[d_{ij}^{2}]}
 
</li>
<li>Apply double centering: 
 
 
 
 B
 =
 −
 
 
 1
 2
 
 
 C
 
 D
 
 (
 2
 )
 
 
 C
 
 
 {\textstyle B=-{\frac {1}{2}}CD^{(2)}C}
 
 using the <a href="/facts/Centering_matrix/sRlKytJB">centering matrix</a> 
 
 
 
 C
 =
 I
 −
 
 
 1
 n
 
 
 
 J
 
 n
 
 
 
 
 {\textstyle C=I-{\frac {1}{n}}J_{n}}
 
, where 
 
 
 
 n
 
 
 {\textstyle n}
 
 is the number of objects, 
 
 
 
 I
 
 
 {\textstyle I}
 
 is the 
 
 
 
 n
 ×
 n
 
 
 {\textstyle n\times n}
 
 identity matrix, and 
 
 
 
 
 J
 
 n
 
 
 
 
 {\textstyle J_{n}}
 
 is an 
 
 
 
 n
 ×
 n
 
 
 {\textstyle n\times n}
 
 matrix of all ones.</li>
<li>Determine the 
 
 
 
 m
 
 
 {\textstyle m}
 
 largest <a href="/facts/Eigenvalues_and_eigenvectors/8TjEoT8u">eigenvalues</a> 
 
 
 
 
 λ
 
 1
 
 
 ,
 
 λ
 
 2
 
 
 ,
 .
 .
 .
 ,
 
 λ
 
 m
 
 
 
 
 {\textstyle \lambda _{1},\lambda _{2},...,\lambda _{m}}
 
 and corresponding <a href="/facts/Eigenvalues_and_eigenvectors/8TjEoT8u">eigenvectors</a> 
 
 
 
 
 e
 
 1
 
 
 ,
 
 e
 
 2
 
 
 ,
 .
 .
 .
 ,
 
 e
 
 m
 
 
 
 
 {\textstyle e_{1},e_{2},...,e_{m}}
 
 of 
 
 
 
 B
 
 
 {\textstyle B}
 
 (where 
 
 
 
 m
 
 
 {\textstyle m}
 
 is the number of dimensions desired for the output).</li>
<li>Now, 
 
 
 
 X
 =
 
 E
 
 m
 
 
 
 Λ
 
 m
 
 
 1
 
 /
 
 2
 
 
 
 
 {\textstyle X=E_{m}\Lambda _{m}^{1/2}}
 
 , where 
 
 
 
 
 E
 
 m
 
 
 
 
 {\textstyle E_{m}}
 
 is the matrix of 
 
 
 
 m
 
 
 {\textstyle m}
 
 eigenvectors and 
 
 
 
 
 Λ
 
 m
 
 
 
 
 {\textstyle \Lambda _{m}}
 
 is the <a href="/facts/Diagonal_matrix/tjIAHrE6">diagonal matrix</a> of 
 
 
 
 m
 
 
 {\textstyle m}
 
 eigenvalues of 
 
 
 
 B
 
 
 {\textstyle B}
 
.</li></ol>
Classical MDS assumes metric distances. So this is not applicable for direct dissimilarity ratings.
<h3>Metric multidimensional scaling (mMDS)</h3>
It is a superset of classical MDS that generalizes the optimization procedure to a variety of loss functions and input matrices of known distances with weights and so on. A useful loss function in this context is called stress, which is often minimized using a procedure called <a href="/facts/Stress_majorization/D4D6xcPC">stress majorization</a>. Metric MDS minimizes the cost function called “stress” which is a residual sum of squares:<blockquote>
 
 
 
 
 
 Stress
 
 
 D
 
 
 (
 
 x
 
 1
 
 
 ,
 
 x
 
 2
 
 
 ,
 .
 .
 .
 ,
 
 x
 
 n
 
 
 )
 =
 
 
 
 ∑
 
 i
 ≠
 j
 =
 1
 ,
 .
 .
 .
 ,
 n
 
 
 
 
 (
 
 
 
 d
 
 i
 j
 
 
 −
 ‖
 
 x
 
 i
 
 
 −
 
 x
 
 j
 
 
 ‖
 
 
 
 )
 
 
 
 2
 
 
 
 
 .
 
 
 {\displaystyle {\text{Stress}}_{D}(x_{1},x_{2},...,x_{n})={\sqrt {\sum _{i\neq j=1,...,n}{\bigl (}d_{ij}-\|x_{i}-x_{j}\|{\bigr )}^{2}}}.}
 
</blockquote>
Metric scaling uses a power transformation with a user-controlled exponent 
 
 
 
 p
 
 
 {\textstyle p}
 
: 
 
 
 
 
 d
 
 i
 j
 
 
 p
 
 
 
 
 {\textstyle d_{ij}^{p}}
 
 and 
 
 
 
 −
 
 d
 
 i
 j
 
 
 2
 p
 
 
 
 
 {\textstyle -d_{ij}^{2p}}
 
 for distance. In classical scaling 
 
 
 
 p
 =
 1.
 
 
 {\textstyle p=1.}
 
 Non-metric scaling is defined by the use of isotonic regression to nonparametrically estimate a transformation of the dissimilarities.

<h3>Non-metric multidimensional scaling (NMDS)</h3>
In contrast to metric MDS, non-metric MDS finds both a <a href="/facts/Non-parametric/svTDJHTb">non-parametric</a> <a href="/facts/Monotonic/MjQK0YL2">monotonic</a> relationship between the dissimilarities in the item-item matrix and the Euclidean distances between items, and the location of each item in the low-dimensional space. 
Let 
 
 
 
 
 d
 
 i
 j
 
 
 
 
 {\displaystyle d_{ij}}
 
 be the dissimilarity between points 
 
 
 
 i
 ,
 j
 
 
 {\displaystyle i,j}
 
. Let 
 
 
 
 
 
 
 
 d
 ^
 
 
 
 
 i
 j
 
 
 =
 ‖
 
 x
 
 i
 
 
 −
 
 x
 
 j
 
 
 ‖
 
 
 {\displaystyle {\hat {d}}_{ij}=\|x_{i}-x_{j}\|}
 
 be the Euclidean distance between embedded points 
 
 
 
 
 x
 
 i
 
 
 ,
 
 x
 
 j
 
 
 
 
 {\displaystyle x_{i},x_{j}}
 
.
Now, for each choice of the embedded points 
 
 
 
 
 x
 
 i
 
 
 
 
 {\displaystyle x_{i}}
 
 and is a monotonically increasing function 
 
 
 
 f
 
 
 {\displaystyle f}
 
, define the "stress" function:

<blockquote>
 
 
 
 S
 (
 
 x
 
 1
 
 
 ,
 .
 .
 .
 ,
 
 x
 
 n
 
 
 ;
 f
 )
 =
 
 
 
 
 
 ∑
 
 i
 <
 j
 
 
 
 
 (
 
 
 f
 (
 
 d
 
 i
 j
 
 
 )
 −
 
 
 
 
 d
 ^
 
 
 
 
 i
 j
 
 
 
 
 
 )
 
 
 
 2
 
 
 
 
 
 ∑
 
 i
 <
 j
 
 
 
 
 
 
 d
 ^
 
 
 
 
 i
 j
 
 
 2
 
 
 
 
 
 
 .
 
 
 {\displaystyle S(x_{1},...,x_{n};f)={\sqrt {\frac {\sum _{i<j}{\bigl (}f(d_{ij})-{\hat {d}}_{ij}{\bigr )}^{2}}{\sum _{i<j}{\hat {d}}_{ij}^{2}}}}.}
 
</blockquote>
The factor of 
 
 
 
 
 ∑
 
 i
 <
 j
 
 
 
 
 
 
 d
 ^
 
 
 
 
 i
 j
 
 
 2
 
 
 
 
 {\displaystyle \sum _{i<j}{\hat {d}}_{ij}^{2}}
 
 in the denominator is necessary to prevent a "collapse". Suppose we define instead 
 
 
 
 S
 =
 
 
 
 ∑
 
 i
 <
 j
 
 
 
 
 (
 
 
 f
 (
 
 d
 
 i
 j
 
 
 )
 −
 
 
 
 
 d
 ^
 
 
 
 
 i
 j
 
 
 
 )
 
 2
 
 
 
 
 
 
 {\displaystyle S={\sqrt {\sum _{i<j}{\bigl (}f(d_{ij})-{\hat {d}}_{ij})^{2}}}}
 
, then it can be trivially minimized by setting 
 
 
 
 f
 =
 0
 
 
 {\displaystyle f=0}
 
, then collapse every point to the same point.
A few variants of this cost function exist. MDS programs automatically minimize stress in order to obtain the MDS solution.
The core of a non-metric MDS algorithm is a twofold optimization process. First the optimal monotonic transformation of the proximities has to be found. Secondly, the points of a configuration have to be optimally arranged, so that their distances match the scaled proximities as closely as possible. 
NMDS needs to optimize two objectives simultaneously. This is usually done iteratively:

<ol><li>Initialize 
 
 
 
 
 x
 
 i
 
 
 
 
 {\displaystyle x_{i}}
 
 randomly, e. g. by sampling from a normal distribution.</li>
<li>Do until a stopping criterion (for example, 
 
 
 
 S
 <
 ϵ
 
 
 {\displaystyle S<\epsilon }
 
)
<ol><li>Solve for 
 
 
 
 f
 =
 arg
 ⁡
 
 min
 
 f
 
 
 S
 (
 
 x
 
 1
 
 
 ,
 .
 .
 .
 ,
 
 x
 
 n
 
 
 ;
 f
 )
 
 
 {\displaystyle f=\arg \min _{f}S(x_{1},...,x_{n};f)}
 
 by <a href="/facts/Isotonic_regression/qewkFozY">isotonic regression</a>.</li>
<li>Solve for 
 
 
 
 
 x
 
 1
 
 
 ,
 .
 .
 .
 ,
 
 x
 
 n
 
 
 =
 arg
 ⁡
 
 min
 
 
 x
 
 1
 
 
 ,
 .
 .
 .
 ,
 
 x
 
 n
 
 
 
 
 S
 (
 
 x
 
 1
 
 
 ,
 .
 .
 .
 ,
 
 x
 
 n
 
 
 ;
 f
 )
 
 
 {\displaystyle x_{1},...,x_{n}=\arg \min _{x_{1},...,x_{n}}S(x_{1},...,x_{n};f)}
 
 by gradient descent or other methods.</li></ol></li>
<li>Return 
 
 
 
 
 x
 
 i
 
 
 
 
 {\displaystyle x_{i}}
 
 and 
 
 
 
 f
 
 
 {\displaystyle f}
 
</li></ol>
<a href="/facts/Louis_Guttman/zk9rJLNu">Louis Guttman</a>'s smallest space analysis (SSA) is an example of a non-metric MDS procedure.

<h3>Generalized multidimensional scaling (GMD)</h3>
Main article: <a href="/facts/Generalized_multidimensional_scaling/8H2kvkPc">Generalized multidimensional scaling</a>
An extension of metric multidimensional scaling, in which the target space is an arbitrary smooth non-Euclidean space. In cases where the dissimilarities are distances on a surface and the target space is another surface, GMDS allows finding the minimum-distortion embedding of one surface into another.<a class="footnote-ref" id="fnref:6" href="#fn:6">6</a>

<h3>Super multidimensional scaling (SMDS)</h3>
An extension of MDS, known as Super MDS, incorporates both distance and angle information for improved source localization. Unlike traditional MDS, which uses only distance measurements, Super MDS processes both distance and angle-of-arrival (AOA) data algebraically (without iteration) to achieve better accuracy.<a class="footnote-ref" id="fnref:7" href="#fn:7">7</a>
The method proceeds in the following steps:

<ol><li>Construct the Reduced Edge Gram Kernel: For a network of 
 
 
 
 N
 
 
 {\displaystyle N}
 
 sources in an 
 
 
 
 η
 
 
 {\displaystyle \eta }
 
-dimensional space, define the edge vectors as 
 
 
 
 
 v
 
 i
 
 
 =
 
 x
 
 m
 
 
 −
 
 x
 
 n
 
 
 
 
 {\displaystyle v_{i}=x_{m}-x_{n}}
 
. The dissimilarity is given by 
 
 
 
 
 k
 
 i
 ,
 j
 
 
 =
 ⟨
 
 v
 
 i
 
 
 ,
 
 v
 
 j
 
 
 ⟩
 
 
 {\displaystyle k_{i,j}=\langle v_{i},v_{j}\rangle }
 
. Assemble these into the full kernel 
 
 
 
 K
 =
 V
 
 V
 
 T
 
 
 
 
 {\displaystyle K=VV^{T}}
 
, and then form the reduced kernel using the 
 
 
 
 N
 −
 1
 
 
 {\displaystyle N-1}
 
 independent vectors: 
 
 
 
 
 
 
 K
 ¯
 
 
 
 =
 [
 V
 
 ]
 
 (
 N
 −
 1
 )
 ×
 η
 
 
  
 [
 V
 
 ]
 
 (
 N
 −
 1
 )
 ×
 η
 
 
 T
 
 
 
 
 {\displaystyle {\bar {K}}=[V]_{(N-1)\times \eta }\ [V]_{(N-1)\times \eta }^{T}}
 
,</li>
<li>Eigen-Decomposition: Compute the eigen-decomposition of 
 
 
 
 
 
 
 K
 ¯
 
 
 
 
 
 {\displaystyle {\bar {K}}}
 
,</li>
<li>Estimate Edge Vectors: Recover the edge vectors as 
 
 
 
 
 
 
 V
 ^
 
 
 
 =
 
 
 (
 
 
 
 U
 
 M
 ×
 η
 
 
 
 
 Λ
 
 η
 ×
 η
 
 
 ⊙
 
 
 1
 2
 
 
 
 
 
 
 
 )
 
 
 
 T
 
 
 
 
 {\displaystyle {\hat {V}}={\Bigl (}U_{M\times \eta }\,\Lambda _{\eta \times \eta }^{\odot {\frac {1}{2}}}{\Bigr )}^{T}}
 
,</li>
<li>Procrustes Alignment: Retrieve 
 
 
 
 
 
 
 V
 ^
 
 
 
 
 
 {\displaystyle {\hat {V}}}
 
 from 
 
 
 
 V
 
 
 {\displaystyle V}
 
 via Procrustes Transformation,</li>
<li>Compute Coordinates: Solve the following linear equations to compute the coordinate estimates 
 
 
 
 
 
 (
 
 
 
 1
 
 
 
 |
 
 
 
 
 
 0
 
 
 1
 ×
 N
 −
 1
 
 
 
 
 
 
 
 
 [
 C
 ]
 
 
 N
 −
 1
 ×
 N
 
 
 
 
 
 )
 
 
 ⋅
 
 
 (
 
 
 
 
 
 x
 
 
 1
 
 
 
 
 
 
 [
 
 X
 
 
 ]
 
 N
 −
 1
 ×
 η
 
 
 
 
 
 )
 
 
 =
 
 
 (
 
 
 
 
 
 x
 
 
 1
 
 
 
 
 
 
 [
 
 V
 
 
 ]
 
 N
 −
 1
 ×
 η
 
 
 
 
 
 )
 
 
 ,
 
 
 {\displaystyle {\begin{pmatrix}1\vline \mathbf {0} _{1\times N-1}\\\hline \mathbf {[C]} _{N-1\times N}\end{pmatrix}}\cdot {\begin{pmatrix}\mathbf {x} _{1}\\\hline [\mathbf {X} ]_{N-1\times \eta }\end{pmatrix}}={\begin{pmatrix}\mathbf {x} _{1}\\\hline [\mathbf {V} ]_{N-1\times \eta }\end{pmatrix}},}
 
</li></ol>
This concise approach reduces the need for multiple anchors and enhances localization precision by leveraging angle constraints.

<h2 id="details">Details</h2>
The data to be analyzed is a collection of 
 
 
 
 M
 
 
 {\displaystyle M}
 
 objects (colors, faces, stocks, . . .) on which a distance function is defined,

d
 
 i
 ,
 j
 
 
 :=
 
 
 {\displaystyle d_{i,j}:=}
 
 distance between 
 
 
 
 i
 
 
 {\displaystyle i}
 
-th and 
 
 
 
 j
 
 
 {\displaystyle j}
 
-th objects.
These distances are the entries of the dissimilarity matrix

D
        :=
        
          
            (
            
              
                
                  
                    d
                    
                      1
                      ,
                      1
                    
                  
                
                
                  
                    d
                    
                      1
                      ,
                      2
                    
                  
                
                
                  ⋯
                
                
                  
                    d
                    
                      1
                      ,
                      M
                    
                  
                
              
              
                
                  
                    d
                    
                      2
                      ,
                      1
                    
                  
                
                
                  
                    d
                    
                      2
                      ,
                      2
                    
                  
                
                
                  ⋯
                
                
                  
                    d
                    
                      2
                      ,
                      M
                    
                  
                
              
              
                
                  ⋮
                
                
                  ⋮
                
                
                
                  ⋮
                
              
              
                
                  
                    d
                    
                      M
                      ,
                      1
                    
                  
                
                
                  
                    d
                    
                      M
                      ,
                      2
                    
                  
                
                
                  ⋯
                
                
                  
                    d
                    
                      M
                      ,
                      M
                    
                  
                
              
            
            )
          
        
        .
      
    
    {\displaystyle D:={\begin{pmatrix}d_{1,1}&d_{1,2}&\cdots &d_{1,M}\\d_{2,1}&d_{2,2}&\cdots &d_{2,M}\\\vdots &\vdots &&\vdots \\d_{M,1}&d_{M,2}&\cdots &d_{M,M}\end{pmatrix}}.}

The goal of MDS is, given 
 
 
 
 D
 
 
 {\displaystyle D}
 
, to find 
 
 
 
 M
 
 
 {\displaystyle M}
 
 vectors

x
 
 1
 
 
 ,
 …
 ,
 
 x
 
 M
 
 
 ∈
 
 
 R
 
 
 N
 
 
 
 
 {\displaystyle x_{1},\ldots ,x_{M}\in \mathbb {R} ^{N}}
 
 such that

‖
 
 x
 
 i
 
 
 −
 
 x
 
 j
 
 
 ‖
 ≈
 
 d
 
 i
 ,
 j
 
 
 
 
 {\displaystyle \|x_{i}-x_{j}\|\approx d_{i,j}}
 
 for all 
 
 
 
 i
 ,
 j
 ∈
 
 1
 ,
 …
 ,
 M
 
 
 
 {\displaystyle i,j\in {1,\dots ,M}}
 
,
where 
 
 
 
 ‖
 ⋅
 ‖
 
 
 {\displaystyle \|\cdot \|}
 
 is a <a href="/facts/Norm_(mathematics)/xIbR4uE1">vector norm</a>. In classical MDS, this norm is the <a href="/facts/Euclidean_distance/9qDoQKQe">Euclidean distance</a>, but, in a broader sense, it may be a <a href="/facts/Metric_(mathematics)/RkUptSo8">metric</a> or arbitrary distance function.<a class="footnote-ref" id="fnref:8" href="#fn:8">8</a> For example, when dealing with mixed-type data that contain numerical as well as categorical descriptors, <a href="/facts/Gower%2527s_distance/HpnqMCNE">Gower's distance</a> is a common alternative.
In other words, MDS attempts to find a mapping from the 
 
 
 
 M
 
 
 {\displaystyle M}
 
 objects into 
 
 
 
 
 
 R
 
 
 N
 
 
 
 
 {\displaystyle \mathbb {R} ^{N}}
 
 such that distances are preserved. If the dimension 
 
 
 
 N
 
 
 {\displaystyle N}
 
 is chosen to be 2 or 3, we may plot the vectors 
 
 
 
 
 x
 
 i
 
 
 
 
 {\displaystyle x_{i}}
 
 to obtain a visualization of the similarities between the 
 
 
 
 M
 
 
 {\displaystyle M}
 
 objects. Note that the vectors 
 
 
 
 
 x
 
 i
 
 
 
 
 {\displaystyle x_{i}}
 
 are not unique: With the Euclidean distance, they may be arbitrarily translated, rotated, and reflected, since these transformations do not change the pairwise distances 
 
 
 
 ‖
 
 x
 
 i
 
 
 −
 
 x
 
 j
 
 
 ‖
 
 
 {\displaystyle \|x_{i}-x_{j}\|}
 
.
(Note: The symbol 
 
 
 
 
 R
 
 
 
 {\displaystyle \mathbb {R} }
 
 indicates the set of <a href="/facts/Real_numbers/R02gw5Pb">real numbers</a>, and the notation 
 
 
 
 
 
 R
 
 
 N
 
 
 
 
 {\displaystyle \mathbb {R} ^{N}}
 
 refers to the Cartesian product of 
 
 
 
 N
 
 
 {\displaystyle N}
 
 copies of 
 
 
 
 
 R
 
 
 
 {\displaystyle \mathbb {R} }
 
, which is an 
 
 
 
 N
 
 
 {\displaystyle N}
 
-dimensional vector space over the field of the real numbers.)
There are various approaches to determining the vectors 
 
 
 
 
 x
 
 i
 
 
 
 
 {\displaystyle x_{i}}
 
. Usually, MDS is formulated as an <a href="/facts/Optimization_(mathematics)/oRn8Iv5I">optimization problem</a>, where 
 
 
 
 (
 
 x
 
 1
 
 
 ,
 …
 ,
 
 x
 
 M
 
 
 )
 
 
 {\displaystyle (x_{1},\ldots ,x_{M})}
 
 is found as a minimizer of some cost function, for example,

a
 r
 g
 m
 i
 n
 
 
 
 x
 
 1
 
 
 ,
 …
 ,
 
 x
 
 M
 
 
 
 
 
 
 ∑
 
 i
 <
 j
 
 
 (
 ‖
 
 x
 
 i
 
 
 −
 
 x
 
 j
 
 
 ‖
 −
 
 d
 
 i
 ,
 j
 
 
 
 )
 
 2
 
 
 .
 
 
 
 {\displaystyle {\underset {x_{1},\ldots ,x_{M}}{\mathrm {argmin} }}\sum _{i<j}(\|x_{i}-x_{j}\|-d_{i,j})^{2}.\,}

A solution may then be found by numerical optimization techniques. For some particularly chosen cost functions, minimizers can be stated analytically in terms of matrix <a href="/facts/Eigendecomposition_of_a_matrix/a2nfF7hJ">eigendecompositions</a>.<a class="footnote-ref" id="fnref:9" href="#fn:9">9</a>

<h2 id="procedure">Procedure</h2>
There are several steps in conducting MDS research:

<ol><li>Formulating the problem – What variables do you want to compare? How many variables do you want to compare? What purpose is the study to be used for?</li>
<li>Obtaining input data – For example, :- Respondents are asked a series of questions. For each product pair, they are asked to rate similarity (usually on a 7-point <a href="/facts/Likert_scale/rNtKleRF">Likert scale</a> from very similar to very dissimilar). The first question could be for Coke/Pepsi for example, the next for Coke/Hires rootbeer, the next for Pepsi/Dr Pepper, the next for Dr Pepper/Hires rootbeer, etc. The number of questions is a function of the number of brands and can be calculated as 
 
 
 
 Q
 =
 N
 (
 N
 −
 1
 )
 
 /
 
 2
 
 
 {\displaystyle Q=N(N-1)/2}
 
 where Q is the number of questions and N is the number of brands. This approach is referred to as the “Perception data : direct approach”. There are two other approaches. There is the “Perception data : derived approach” in which products are decomposed into attributes that are rated on a <a href="/facts/Semantic_differential/j3mXQ1ah">semantic differential</a> scale. The other is the “Preference data approach” in which respondents are asked their preference rather than similarity.</li>
<li>Running the MDS statistical program – Software for running the procedure is available in many statistical software packages. Often there is a choice between Metric MDS (which deals with interval or ratio level data), and Nonmetric MDS<a class="footnote-ref" id="fnref:10" href="#fn:10">10</a> (which deals with ordinal data).</li>
<li>Decide number of dimensions – The researcher must decide on the number of dimensions they want the computer to create. Interpretability of the MDS solution is often important, and lower dimensional solutions will typically be easier to interpret and visualize. However, dimension selection is also an issue of balancing underfitting and overfitting. Lower dimensional solutions may underfit by leaving out important dimensions of the dissimilarity data. Higher dimensional solutions may overfit to noise in the dissimilarity measurements. Model selection tools like <a href="/facts/Akaike_information_criterion/RjKKgcnF">AIC</a>, <a href="/facts/Bayesian_information_criterion/NHspnLrM">BIC</a>, <a href="/facts/Bayes_factors/Cu4QH8hv">Bayes factors</a>, or <a href="/facts/Cross-validation_(statistics)/pDWg8s0e">cross-validation</a> can thus be useful to select the dimensionality that balances underfitting and overfitting.</li>
<li>Mapping the results and defining the dimensions – The statistical program (or a related module) will map the results. The map will plot each product (usually in two-dimensional space). The proximity of products to each other indicate either how similar they are or how preferred they are, depending on which approach was used. How the dimensions of the embedding actually correspond to dimensions of system behavior, however, are not necessarily obvious. Here, a subjective judgment about the correspondence can be made (see <a href="/facts/Perceptual_mapping/4UKHQoXT">perceptual mapping</a>).</li>
<li>Test the results for reliability and validity – Compute <a href="/facts/R-squared/acBlgnrv">R-squared</a> to determine what proportion of variance of the scaled data can be accounted for by the MDS procedure. An R-square of 0.6 is considered the minimum acceptable level. An R-square of 0.8 is considered good for metric scaling and .9 is considered good for non-metric scaling. Other possible tests are Kruskal’s Stress, split data tests, data stability tests (i.e., eliminating one brand), and test-retest reliability.</li>
<li>Report the results comprehensively – Along with the mapping, at least distance measure (e.g., <a href="/facts/Sorenson_index/rwLufgya">Sorenson index</a>, <a href="/facts/Jaccard_index/dUurmTet">Jaccard index</a>) and reliability (e.g., stress value) should be given. It is also very advisable to give the algorithm (e.g., Kruskal, Mather), which is often defined by the program used (sometimes replacing the algorithm report), if you have given a start configuration or had a random choice, the number of runs, the assessment of dimensionality, the <a href="/facts/Monte_Carlo_method/AkHjY7jc">Monte Carlo method</a> results, the number of iterations, the assessment of stability, and the proportional variance of each axis (r-square).</li></ol>
<h2 id="implementations">Implementations</h2>
<ul><li><a href="/facts/ELKI/C2LWqW13">ELKI</a> includes two MDS implementations.</li>
<li><a href="/facts/MATLAB/qPjLISCk">MATLAB</a> includes two MDS implementations (for classical (cmdscale) and non-classical (mdscale) MDS respectively).</li>
<li>The <a href="/facts/R_(programming_language)/LSrkr8K8">R programming language</a> offers several MDS implementations, e.g. base cmdscale function, packages <a href="https://CRAN.R-project.org/package=smacof">smacof</a><a class="footnote-ref" id="fnref:11" href="#fn:11">11</a> (mMDS and nMDS), and <a href="https://CRAN.R-project.org/package=vegan">vegan</a> (weighted MDS).</li>
<li><a href="/facts/Scikit-learn/HqxDHRMJ">scikit-learn</a> contains function <a href="http://scikit-learn.org/stable/modules/generated/sklearn.manifold.MDS.html">sklearn.manifold.MDS</a>.</li></ul>
<h2 id="see-also">See also</h2>

Wikimedia Commons has media related to Multidimensional scaling.

<ul><li><a href="/facts/Data_clustering/HUeDnWPt">Data clustering</a></li>
<li><a href="/facts/T-distributed_stochastic_neighbor_embedding/voxbCyzQ">t-distributed stochastic neighbor embedding</a></li>
<li><a href="/facts/Factor_analysis/LT6B9I7D">Factor analysis</a></li>
<li><a href="/facts/Discriminant_analysis/Mj6KQhRN">Discriminant analysis</a></li>
<li><a href="/facts/Dimensionality_reduction/XoRpcrB0">Dimensionality reduction</a></li>
<li><a href="/facts/Distance_geometry/RYWhDzgo">Distance geometry</a></li>
<li><a href="/facts/Cayley%25E2%2580%2593Menger_determinant/vp0eusc1">Cayley–Menger determinant</a></li>
<li><a href="/facts/Sammon_mapping/wmazhtYP">Sammon mapping</a></li>
<li><a href="/facts/Iconography_of_correlations/zfc14hTC">Iconography of correlations</a></li></ul>

<h2 id="bibliography">Bibliography</h2>

<ul><li>Cox, T.F.; Cox, M.A.A. (2001). <a href="https://doi.org/10.1007/978-3-540-33037-0_14">"Multidimensional Scaling"</a>. In Unwin, A; Chen, C; Hardle, W. K. (eds.). Handbook of Data Visualization. Springer. <a href="/facts/Doi_(identifier)/muM9Etpq">doi</a>:<a href="https://doi.org/10.1007%2F978-3-540-33037-0_14">10.1007/978-3-540-33037-0_14</a>. <a href="/facts/ISBN_(identifier)/15AdSPa9">ISBN</a> 978-3-540-33037-0.</li>
<li>Coxon, Anthony P.M. (1982). The User's Guide to Multidimensional Scaling. With special reference to the MDS(X) library of Computer Programs. London: Heinemann Educational Books.</li>
<li>Green, P. (January 1975). "Marketing applications of MDS: Assessment and outlook". Journal of Marketing. 39 (1): 24–31. <a href="/facts/Doi_(identifier)/muM9Etpq">doi</a>:<a href="https://doi.org/10.2307%2F1250799">10.2307/1250799</a>. <a href="/facts/JSTOR_(identifier)/YTeVmaJ7">JSTOR</a> <a href="https://www.jstor.org/stable/1250799">1250799</a>.</li>
<li>McCune, B. & Grace, J.B. (2002). Analysis of Ecological Communities. Oregon, Gleneden Beach: MjM Software Design. <a href="/facts/ISBN_(identifier)/15AdSPa9">ISBN</a> 978-0-9721290-0-8.</li>
<li>Young, Forrest W. (1987). Multidimensional scaling: History, theory, and applications. Lawrence Erlbaum Associates. <a href="/facts/ISBN_(identifier)/15AdSPa9">ISBN</a> 978-0898596632.</li>
<li>Torgerson, Warren S. (1958). Theory & Methods of Scaling. New York: Wiley. <a href="/facts/ISBN_(identifier)/15AdSPa9">ISBN</a> 978-0-89874-722-5. {{cite book}}: ISBN / Date incompatibility (help)</li></ul>

<h2 id="references">References</h2>

<ol>
<li id="fn:1">Mead, A (1992). "Review of the Development of Multidimensional Scaling Methods". Journal of the Royal Statistical Society. Series D (The Statistician). 41 (1): 27–39. doi:10.2307/2348634. JSTOR 2348634. Abstract. Multidimensional scaling methods are now a common statistical tool in psychophysics and sensory analysis. The development of these methods is charted, from the original research of Torgerson (metric scaling), Shepard and Kruskal (non-metric scaling) through individual differences scaling and the maximum likelihood methods proposed by Ramsay. <a href="/wiki/Doi_(identifier)" target="_blank">/wiki/Doi_(identifier)</a> <a href="#fnref:1" class="footnote-back-ref">↩</a></li>
<li id="fn:2">Borg, I.; Groenen, P. (2005). Modern Multidimensional Scaling: theory and applications (2nd ed.). New York: Springer-Verlag. pp. 207–212. ISBN 978-0-387-94845-4. <a href="978-0-387-94845-4" target="_blank">978-0-387-94845-4</a> <a href="#fnref:2" class="footnote-back-ref">↩</a></li>
<li id="fn:3">Genest, Christian; Nešlehová, Johanna G.; Ramsay, James O. (2014). "A Conversation with James O. Ramsay". International Statistical Review / Revue Internationale de Statistique. 82 (2): 161–183. JSTOR 43299752. Retrieved 30 June 2021. <a href="https://www.jstor.org/stable/43299752" target="_blank">https://www.jstor.org/stable/43299752</a> <a href="#fnref:3" class="footnote-back-ref">↩</a></li>
<li id="fn:4">Borg, I.; Groenen, P. (2005). Modern Multidimensional Scaling: theory and applications (2nd ed.). New York: Springer-Verlag. pp. 207–212. ISBN 978-0-387-94845-4. <a href="978-0-387-94845-4" target="_blank">978-0-387-94845-4</a> <a href="#fnref:4" class="footnote-back-ref">↩</a></li>
<li id="fn:5">Wickelmaier, Florian. "An introduction to MDS." Sound Quality Research Unit, Aalborg University, Denmark (2003): 46 <a href="#fnref:5" class="footnote-back-ref">↩</a></li>
<li id="fn:6">Bronstein AM, Bronstein MM, Kimmel R (January 2006). "Generalized multidimensional scaling: a framework for isometry-invariant partial surface matching". Proc. Natl. Acad. Sci. U.S.A. 103 (5): 1168–72. Bibcode:2006PNAS..103.1168B. doi:10.1073/pnas.0508601103. PMC 1360551. PMID 16432211. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1360551" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1360551</a> <a href="#fnref:6" class="footnote-back-ref">↩</a></li>
<li id="fn:7">de Abreu, G. T. F.; Destino, G. (2007). Super MDS: Source Location from Distance and Angle Information. 2007 IEEE Wireless Communications and Networking Conference. Hong Kong, China. pp. 4430–4434. doi:10.1109/WCNC.2007.807. <a href="/wiki/Doi_(identifier)" target="_blank">/wiki/Doi_(identifier)</a> <a href="#fnref:7" class="footnote-back-ref">↩</a></li>
<li id="fn:8">Kruskal, J. B., and Wish, M. (1978), Multidimensional Scaling, Sage University Paper series on Quantitative Application in the Social Sciences, 07-011. Beverly Hills and London: Sage Publications. <a href="/wiki/Joseph_Kruskal" target="_blank">/wiki/Joseph_Kruskal</a> <a href="#fnref:8" class="footnote-back-ref">↩</a></li>
<li id="fn:9">Borg, I.; Groenen, P. (2005). Modern Multidimensional Scaling: theory and applications (2nd ed.). New York: Springer-Verlag. pp. 207–212. ISBN 978-0-387-94845-4. <a href="978-0-387-94845-4" target="_blank">978-0-387-94845-4</a> <a href="#fnref:9" class="footnote-back-ref">↩</a></li>
<li id="fn:10">Kruskal, J. B. (1964). "Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis". Psychometrika. 29 (1): 1–27. doi:10.1007/BF02289565. S2CID 48165675. <a href="/wiki/Joseph_Kruskal" target="_blank">/wiki/Joseph_Kruskal</a> <a href="#fnref:10" class="footnote-back-ref">↩</a></li>
<li id="fn:11">Leeuw, Jan de; Mair, Patrick (2009). "Multidimensional Scaling Using Majorization: SMACOF in R". Journal of Statistical Software. 31 (3). doi:10.18637/jss.v031.i03. ISSN 1548-7660. <a href="http://www.jstatsoft.org/v31/i03/" target="_blank">http://www.jstatsoft.org/v31/i03/</a> <a href="#fnref:11" class="footnote-back-ref">↩</a></li>
</ol>

Multidimensional scaling open-in-new

Multidimensional scaling