MapR FS

<h2 id="history">History</h2>
<p>MapR FS was developed in 2009 by <a href="/facts/MapR/ux6joHSa">MapR</a> Technologies to extend the capabilities of
<a href="/facts/Apache_Hadoop/oUS1G2R7">Apache Hadoop</a> by providing a more performant and stable platform. The design of MapR FS is
influenced by various other systems such as the <a href="/facts/Andrew_File_System/qQ07XKk6">Andrew File System</a> (AFS). Volumes in
MapR FS strongly resemble AFS volumes from the user's point of view, although the
implementation in MapR FS is completely different. One major difference between AFS and MapR FS is
that the latter uses a strong consistency model while AFS provides only weak consistency.
</p><p>To meet the original goals of supporting Hadoop programs, MapR FS supports the HDFS API by
translating HDFS function calls into an internal API based on a custom <a href="/facts/Remote_procedure_call/uMMb9CQw">remote procedure call</a> (RPC) mechanism. The normal write-once model of HDFS is replaced in
MapR FS by a fully mutable file system even when using the HDFS API. The ability to support file
mutation allows the implementation of an NFS server that translates NFS operations into internal
MapR RPC calls. Similar mechanisms are used to allow a <a href="/facts/Filesystem_in_Userspace/RnSLIf1H">Filesystem in Userspace</a> (FUSE) interface
and an approximate emulation of the <a href="/facts/Apache_HBase/D4XU1dqK">Apache HBase</a> API.
</p>
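<p>Because files are mutable, ordinary POSIX-style random writes can be issued against a file reached through the NFS or FUSE interface. The following is a minimal sketch of such an in-place update, not MapR code: it runs against a local temporary file as a stand-in for a path on a mounted cluster, and the helper names <code>overwrite_at</code> and <code>demo</code> are hypothetical.</p>

```python
import os
import tempfile


def overwrite_at(path, offset, data):
    """Overwrite bytes in the middle of an existing file without rewriting it."""
    # "r+b" opens the file for reading and writing without truncating it.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


def demo():
    # Assumption: a local temporary file stands in for a path on an NFS or
    # FUSE mount; no MapR-specific API is involved.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"hello world")
        # Replace the five bytes starting at offset 6.
        overwrite_at(path, 6, b"MapR!")
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

<p>Under a strictly write-once model the same change would require writing a new copy of the file; here only the five affected bytes are rewritten, and <code>demo()</code> returns <code>b"hello MapR!"</code>.</p>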
<h2 id="architecture">Architecture</h2>
<p>Files in MapR FS are internally implemented by splitting the file contents into chunks,
typically 256 MB each, although the chunk size is set per file. Chunks are written to
containers, which are the unit of replication in the cluster. Containers are replicated either
in a linear fashion, in which each replica forwards write operations to
the next replica in line, or in a <a href="/facts/Star_network/HPQDmFkI">star pattern</a>, in which the master replica forwards write operations
to all other replicas at the same time. The master replica acknowledges a write only when the write
has completed on all replicas. Internally, containers implement <a href="/facts/B-tree/IHE2l8GR">B-trees</a>, which are used at multiple
levels, such as to map a file offset to a chunk within a file or to map a file offset to the correct 8 KB
block within a chunk.
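</p>
<p>The two levels of offset mapping described above can be illustrated with plain arithmetic. This is only a sketch of the logical address decomposition, assuming fixed-size chunks and blocks; MapR FS performs these lookups through B-trees, and the function name <code>locate</code> and the constant names are hypothetical.</p>

```python
CHUNK_SIZE = 256 * 1024 * 1024  # typical chunk size from the text; set per file
BLOCK_SIZE = 8 * 1024           # 8 KB block


def locate(offset, chunk_size=CHUNK_SIZE, block_size=BLOCK_SIZE):
    """Return (chunk index, block index within the chunk, byte offset within the block)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    # First level: which chunk of the file holds this offset.
    chunk_index, offset_in_chunk = divmod(offset, chunk_size)
    # Second level: which block inside that chunk holds it.
    block_index, offset_in_block = divmod(offset_in_chunk, block_size)
    return chunk_index, block_index, offset_in_block
```

<p>With these default sizes, <code>locate(CHUNK_SIZE + BLOCK_SIZE + 5)</code> returns <code>(1, 1, 5)</code>: the second chunk, its second block, and byte 5 within that block.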
</p><p>These B-trees are also used to implement directories. Within a directory, a long hash of each
child's name is used to find the child file or directory table.
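</p>
<p>A hash-keyed directory lookup of this kind can be sketched as follows. The sketch makes several assumptions that do not come from MapR FS itself: a sorted in-memory list stands in for the B-tree, the hash is the first 64 bits of SHA-256, and the names <code>name_hash</code>, <code>Directory</code>, and the string targets are hypothetical.</p>

```python
import bisect
import hashlib


def name_hash(name):
    """64-bit hash of a name. SHA-256 is an assumption, not the real hash function."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Directory:
    """Entries kept sorted by (hash, name), standing in for a B-tree index."""

    def __init__(self):
        self._keys = []     # sorted list of (hash, name) pairs
        self._targets = []  # opaque identifier of the child, parallel to _keys

    def add(self, name, target):
        key = (name_hash(name), name)
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._targets[i] = target  # replace an existing entry
        else:
            self._keys.insert(i, key)
            self._targets.insert(i, target)

    def lookup(self, name):
        # Keeping the name in the key resolves hash collisions between names.
        key = (name_hash(name), name)
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._targets[i]
        return None
```

<p>A lookup hashes the requested name and binary-searches the sorted keys, so its cost grows logarithmically with the number of entries, which is the property a B-tree provides for very large directories.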
</p><p>A volume is a special data structure similar to a directory in many ways, except that it allows
additional access control and management operations. A notable capability of volumes is that the
nodes on which a volume may reside within a cluster can be restricted to control performance,
particularly in heavily contended multi-tenant systems that are running a wide variety of
workloads.
</p><p>Proprietary technology is used in MapR FS to implement transactions in containers and to achieve
consistent crash recovery.
</p><p>Other features of the filesystem include:<a class="footnote-ref" id="fnref:5" href="#fn:5"><sup>5</sup></a>
</p>
<ul><li>Distributed cluster metadata, including the location of all containers and their arrangement into replication chains.</li>
<li>Distributed metadata, including the directory tree. All directories are fully replicated and no single node contains all of the metadata for the cluster.</li>
<li>Efficient use of B-trees to achieve high performance even with very large directories.</li>
<li>Partition tolerance. A cluster can be partitioned without loss of consistency, although availability may be compromised. Restricted-consistency replication across multiple clusters is also supported through volume mirrors and near real-time replication of tables and streams.</li>
<li>Consistent multi-threaded update. Files can be updated or read by many threads simultaneously without requiring global locking structures.</li>
<li>Rolling upgrades and online filesystem maintenance. Almost all maintenance including major version upgrades can be performed while the cluster continues to operate at nearly full speed.</li></ul>
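<p>The replication chains mentioned in the list above, together with the linear and star forwarding patterns described earlier in this section, can be sketched with in-memory objects. This is an illustrative model only: the class <code>Replica</code> and the functions <code>write_linear</code> and <code>write_star</code> are hypothetical, there is no networking or failure handling, and the star pattern is simulated sequentially even though the real system forwards to all replicas at the same time.</p>

```python
class Replica:
    """In-memory stand-in for one replica of a container."""

    def __init__(self, name):
        self.name = name
        self.log = []     # writes applied on this replica, in order
        self.next = None  # next replica in a linear chain, if any

    def apply(self, data):
        self.log.append(data)
        return True

    def write_chain(self, data):
        """Apply locally, then forward down the chain; True once every downstream replica is done."""
        ok = self.apply(data)
        if self.next is not None:
            ok = self.next.write_chain(data) and ok
        return ok


def write_linear(replicas, data):
    """Linear pattern: each replica forwards the write to the next one in line."""
    for current, following in zip(replicas, replicas[1:]):
        current.next = following
    replicas[-1].next = None
    # The first replica acts as master and acknowledges for the whole chain.
    return replicas[0].write_chain(data)


def write_star(master, others, data):
    """Star pattern: the master itself forwards the write to every other replica."""
    acks = [master.apply(data)]
    acks.extend(r.apply(data) for r in others)
    # Acknowledge only when the write has completed on all replicas.
    return all(acks)
```

<p>In both functions the return value plays the role of the master's acknowledgement: it is true only after every replica has applied the write, matching the rule that writes are acknowledged when all replicas complete.</p>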
<h2 id="see-also">See also</h2>
<ul><li><a href="/facts/GFS2/2SMVeGrv">GFS2</a></li>
<li><a href="/facts/Gluster/sVzz6ERY">Gluster</a></li>
<li><a href="/facts/Google_File_System/mDeBYrAb">Google File System</a></li>
<li><a href="/facts/List_of_file_systems/GMUaz18t">List of file systems</a></li>
<li><a href="/facts/Lustre_(file_system)/byz4hsWW">Lustre (file system)</a></li>
<li><a href="/facts/Moose_File_System/gkcmPYQP">MooseFS</a></li>
<li><a href="/facts/OCFS2/Hh7UcTvl">OCFS2</a></li>
<li><a href="/facts/QFS/bja86C50">QFS</a></li>
<li><a href="/facts/RozoFS/lLtGyBaU">RozoFS</a></li>
<li><a href="/facts/Shared_disk_file_system/Bz61f7Ce">Shared disk file system</a></li>
<li><a href="/facts/ZFS/cmh5BHlp">ZFS</a></li></ul>

<h2 id="references">References</h2>

<ol>
<li id="fn:1"><p>Brennan, Bob. "Flash Memory Summit". youtube. Samsung. Retrieved June 21, 2016. <a href="https://www.youtube.com/watch?v=fOT63zR7PvU&t=1682" target="_blank">https://www.youtube.com/watch?v=fOT63zR7PvU&t=1682</a> <a href="#fnref:1" class="footnote-back-ref">↩</a></p></li>
<li id="fn:2"><p>Dunning, Ted; Friedman, Ellen (January 2015). "Chapter 3: Understanding the MapR Distribution for Apache Hadoop". Real World Hadoop (First ed.). Sebastopol, CA: O'Reilly Media, Inc. pp. 23–28. ISBN 978-1-491-92395-5. Retrieved June 21, 2016. <a href="#fnref:2" class="footnote-back-ref">↩</a></p></li>
<li id="fn:3"><p>Perez, Nicolas. "How MapR improves our productivity and simplifies our design". Medium. Medium. Retrieved June 21, 2016. <a href="https://medium.com/@anicolaspp/how-mapr-improves-our-productivity-and-simplify-our-design-2d777ab53120#.b29t2p25x" target="_blank">https://medium.com/@anicolaspp/how-mapr-improves-our-productivity-and-simplify-our-design-2d777ab53120#.b29t2p25x</a> <a href="#fnref:3" class="footnote-back-ref">↩</a></p></li>
<li id="fn:4"><p>"MapR 1.0 Release Notes". MapR Documentation. MapR. Retrieved June 21, 2016. <a href="http://doc.mapr.com/display/RelNotes/Version+1.0+Release+Notes" target="_blank">http://doc.mapr.com/display/RelNotes/Version+1.0+Release+Notes</a> <a href="#fnref:4" class="footnote-back-ref">↩</a></p></li>
<li id="fn:5"><p>Srivas, MC. "MapR File System". Hadoop Summit 2011. Hortonworks. Retrieved June 21, 2016. <a href="https://www.youtube.com/watch?v=fP4HnvZmpZI" target="_blank">https://www.youtube.com/watch?v=fP4HnvZmpZI</a> <a href="#fnref:5" class="footnote-back-ref">↩</a></p></li>
</ol>
