In computer science, an e-graph is a data structure that stores an equivalence relation over terms of some language.
Definition and operations
Let Σ {\displaystyle \Sigma } be a set of uninterpreted functions, where Σ n {\displaystyle \Sigma _{n}} is the subset of Σ {\displaystyle \Sigma } consisting of functions of arity n {\displaystyle n} . Let i d {\displaystyle \mathbb {id} } be a countable set of opaque identifiers that may be compared for equality, called e-class IDs. The application of f ∈ Σ n {\displaystyle f\in \Sigma _{n}} to e-class IDs i 1 , i 2 , … , i n ∈ i d {\displaystyle i_{1},i_{2},\ldots ,i_{n}\in \mathbb {id} } is denoted f ( i 1 , i 2 , … , i n ) {\displaystyle f(i_{1},i_{2},\ldots ,i_{n})} and called an e-node.
The e-graph then represents equivalence classes of e-nodes, using the following data structures:1
- A union-find structure U {\displaystyle U} representing equivalence classes of e-class IDs, with the usual operations f i n d {\displaystyle \mathrm {find} } , a d d {\displaystyle \mathrm {add} } and m e r g e {\displaystyle \mathrm {merge} } . An e-class ID e {\displaystyle e} is canonical if f i n d ( U , e ) = e {\displaystyle \mathrm {find} (U,e)=e} ; an e-node f ( i 1 , … , i n ) {\displaystyle f(i_{1},\ldots ,i_{n})} is canonical if each i j {\displaystyle i_{j}} is canonical ( j {\displaystyle j} in 1 , … , n {\displaystyle 1,\ldots ,n} ).
- An association of e-class IDs with sets of e-nodes, called e-classes. This consists of
- a hashcons H {\displaystyle H} (i.e. a mapping) from canonical e-nodes to e-class IDs, and
- an e-class map M {\displaystyle M} that maps e-class IDs to e-classes, such that M {\displaystyle M} maps equivalent IDs to the same set of e-nodes: ∀ i , j ∈ i d , M [ i ] = M [ j ] ⇔ f i n d ( U , i ) = f i n d ( U , j ) {\displaystyle \forall i,j\in \mathbb {id} ,M[i]=M[j]\Leftrightarrow \mathrm {find} (U,i)=\mathrm {find} (U,j)}
Invariants
In addition to the above structure, a valid e-graph conforms to several data structure invariants.2 Two e-nodes are equivalent if they are in the same e-class. The congruence invariant states that an e-graph must ensure that equivalence is closed under congruence, where two e-nodes f ( i 1 , … , i n ) , f ( j 1 , … , j n ) {\displaystyle f(i_{1},\ldots ,i_{n}),f(j_{1},\ldots ,j_{n})} are congruent when f i n d ( U , i k ) = f i n d ( U , j k ) , k ∈ { 1 , … , n } {\displaystyle \mathrm {find} (U,i_{k})=\mathrm {find} (U,j_{k}),k\in \{1,\ldots ,n\}} . The hashcons invariant states that the hashcons maps canonical e-nodes to their e-class ID.
Operations
E-graphs expose wrappers around the a d d {\displaystyle \mathrm {add} } , f i n d {\displaystyle \mathrm {find} } , and m e r g e {\displaystyle \mathrm {merge} } operations from the union-find that preserve the e-graph invariants. The last operation, e-matching, is described below.
Equivalent formulations
An e-graph can also be formulated as a bipartite graph G = ( N ⊎ i d , E ) {\displaystyle G=(N\uplus \mathrm {id} ,E)} where
- i d {\displaystyle \mathrm {id} } is the set of e-class IDs (as above),
- N {\displaystyle N} is the set of e-nodes, and
- E ⊆ ( i d × N ) ∪ ( N × i d ) {\displaystyle E\subseteq (\mathrm {id} \times N)\cup (N\times \mathrm {id} )} is a set of directed edges.
There is a directed edge from each e-class to each of its members, and from each e-node to each of its children.3
E-matching
Let V {\displaystyle V} be a set of variables and let T e r m ( Σ , V ) {\displaystyle \mathrm {Term} (\Sigma ,V)} be the smallest set that includes the 0-arity function symbols (also called constants), includes the variables, and is closed under application of the function symbols. In other words, T e r m ( Σ , V ) {\displaystyle \mathrm {Term} (\Sigma ,V)} is the smallest set such that V ⊂ T e r m ( Σ , V ) {\displaystyle V\subset \mathrm {Term} (\Sigma ,V)} , Σ 0 ⊂ T e r m ( Σ , V ) {\displaystyle \Sigma _{0}\subset \mathrm {Term} (\Sigma ,V)} , and when x 1 , … , x n ∈ T e r m ( Σ , V ) {\displaystyle x_{1},\ldots ,x_{n}\in \mathrm {Term} (\Sigma ,V)} and f ∈ Σ n {\displaystyle f\in \Sigma _{n}} , then f ( x 1 , … , x n ) ∈ T e r m ( Σ , V ) {\displaystyle f(x_{1},\ldots ,x_{n})\in \mathrm {Term} (\Sigma ,V)} . A term containing variables is called a pattern, a term without variables is called ground.
An e-graph E {\displaystyle E} represents a ground term t ∈ T e r m ( Σ , ∅ ) {\displaystyle t\in \mathrm {Term} (\Sigma ,\emptyset )} if one of its e-classes represents t {\displaystyle t} . An e-class C {\displaystyle C} represents t {\displaystyle t} if some e-node f ( i 1 , … , i n ) ∈ C {\displaystyle f(i_{1},\ldots ,i_{n})\in C} does. An e-node f ( i 1 , … , i n ) ∈ C {\displaystyle f(i_{1},\ldots ,i_{n})\in C} represents a term g ( j 1 , … , j n ) {\displaystyle g(j_{1},\ldots ,j_{n})} if f = g {\displaystyle f=g} and each e-class M [ i k ] {\displaystyle M[i_{k}]} represents the term j k {\displaystyle j_{k}} ( k {\displaystyle k} in 1 , … , n {\displaystyle 1,\ldots ,n} ).
e-matching is an operation that takes a pattern p ∈ T e r m ( Σ , V ) {\displaystyle p\in \mathrm {Term} (\Sigma ,V)} and an e-graph E {\displaystyle E} , and yields all pairs ( σ , C ) {\displaystyle (\sigma ,C)} where σ ⊂ V × i d {\displaystyle \sigma \subset V\times \mathbb {id} } is a substitution mapping the variables in p {\displaystyle p} to e-class IDs and C ∈ i d {\displaystyle C\in \mathbb {id} } is an e-class ID such that the term σ ( p ) {\displaystyle \sigma (p)} is represented by C {\displaystyle C} . There are several known algorithms for e-matching,45 the relational e-matching algorithm is based on worst-case optimal joins and is worst-case optimal.6
Extraction
Given an e-class and a cost function that maps each function symbol in Σ {\displaystyle \Sigma } to a natural number, the extraction problem is to find a ground term with minimal total cost that is represented by the given e-class. This problem is NP-hard.7 There is also no constant-factor approximation algorithm for this problem, which can be shown by reduction from the set cover problem. However, for graphs with bounded treewidth, there is a linear-time, fixed-parameter tractable algorithm.8
Complexity
- An e-graph with n equalities can be constructed in O(n log n) time.9
Equality saturation
Equality saturation is a technique for building optimizing compilers using e-graphs.10 It operates by applying a set of rewrites using e-matching until the e-graph is saturated, a timeout is reached, an e-graph size limit is reached, a fixed number of iterations is exceeded, or some other halting condition is reached. After rewriting, an optimal term is extracted from the e-graph according to some cost function, usually related to AST size or performance considerations.
Applications
E-graphs are used in automated theorem proving. They are a crucial part of modern SMT solvers such as Z311 and CVC4, where they are used to decide the empty theory by computing the congruence closure of a set of equalities, and e-matching is used to instantiate quantifiers.12 In DPLL(T)-based solvers that use conflict-driven clause learning (also known as non-chronological backtracking), e-graphs are extended to produce proof certificates.13 E-graphs are also used in the Simplify theorem prover of ESC/Java.14
Equality saturation is used in specialized optimizing compilers,15 e.g. for deep learning16 and linear algebra.17 Equality saturation has also been used for translation validation applied to the LLVM toolchain.18
E-graphs have been applied to several problems in program analysis, including fuzzing,19 abstract interpretation,20 and library learning.21
- de Moura, Leonardo; Bjørner, Nikolaj (2007). "Efficient E-Matching for SMT Solvers". In Pfenning, Frank (ed.). Automated Deduction – CADE-21. Lecture Notes in Computer Science. Vol. 4603. Berlin, Heidelberg: Springer. pp. 183–198. doi:10.1007/978-3-540-73595-3_13. ISBN 978-3-540-73595-3.
- Willsey, Max; Nandi, Chandrakana; Wang, Yisu Remy; Flatt, Oliver; Tatlock, Zachary; Panchekha, Pavel (2021-01-04). "egg: Fast and extensible equality saturation". Proceedings of the ACM on Programming Languages. 5 (POPL): 23:1–23:29. arXiv:2004.03082. doi:10.1145/3434304. S2CID 226282597.
- Tate, Ross; Stepp, Michael; Tatlock, Zachary; Lerner, Sorin (2009-01-21). "Equality saturation". Proceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages. POPL '09. Savannah, GA, USA: Association for Computing Machinery. pp. 264–276. doi:10.1145/1480881.1480915. ISBN 978-1-60558-379-2. S2CID 2138086.
- Flatt, Oliver; Coward, Samuel; Willsey, Max; Tatlock, Zachary; Panchekha, Pavel (October 2022). "Small Proofs from Congruence Closure". In A. Griggio; N. Rungta (eds.). Proceedings of the 22nd Conference on Formal Methods in Computer-Aided Design – FMCAD 2022. TU Wien Academic Press. pp. 75–83. doi:10.34727/2022/isbn.978-3-85448-053-2_13. ISBN 978-3-85448-053-2. S2CID 252118847.
- Goharshady, Amir Kafshdar; Lam, Chun Kit; Parreaux, Lionel (2024-10-08). "Fast and Optimal Extraction for Sparse Equality Graphs". Proceedings of the ACM on Programming Languages. 8 (OOPSLA2): 361:2551–361:2577. doi:10.1145/3689801.
External links
References
(Willsey et al. 2021) - Willsey, Max; Nandi, Chandrakana; Wang, Yisu Remy; Flatt, Oliver; Tatlock, Zachary; Panchekha, Pavel (2021-01-04). "egg: Fast and extensible equality saturation". Proceedings of the ACM on Programming Languages. 5 (POPL): 23:1–23:29. arXiv:2004.03082. doi:10.1145/3434304. S2CID 226282597. https://doi.org/10.1145/3434304 ↩
(Willsey et al. 2021) - Willsey, Max; Nandi, Chandrakana; Wang, Yisu Remy; Flatt, Oliver; Tatlock, Zachary; Panchekha, Pavel (2021-01-04). "egg: Fast and extensible equality saturation". Proceedings of the ACM on Programming Languages. 5 (POPL): 23:1–23:29. arXiv:2004.03082. doi:10.1145/3434304. S2CID 226282597. https://doi.org/10.1145/3434304 ↩
(Goharshady, Lam & Parreaux 2024) - Goharshady, Amir Kafshdar; Lam, Chun Kit; Parreaux, Lionel (2024-10-08). "Fast and Optimal Extraction for Sparse Equality Graphs". Proceedings of the ACM on Programming Languages. 8 (OOPSLA2): 361:2551–361:2577. doi:10.1145/3689801. https://dl.acm.org/doi/abs/10.1145/3689801 ↩
(de Moura & Bjørner 2007) - de Moura, Leonardo; Bjørner, Nikolaj (2007). "Efficient E-Matching for SMT Solvers". In Pfenning, Frank (ed.). Automated Deduction – CADE-21. Lecture Notes in Computer Science. Vol. 4603. Berlin, Heidelberg: Springer. pp. 183–198. doi:10.1007/978-3-540-73595-3_13. ISBN 978-3-540-73595-3. https://link.springer.com/chapter/10.1007/978-3-540-73595-3_13 ↩
Moskal, Michał; Łopuszański, Jakub; Kiniry, Joseph R. (2008-05-06). "E-matching for Fun and Profit". Electronic Notes in Theoretical Computer Science. Proceedings of the 5th International Workshop on Satisfiability Modulo Theories (SMT 2007). 198 (2): 19–35. doi:10.1016/j.entcs.2008.04.078. ISSN 1571-0661. https://doi.org/10.1016%2Fj.entcs.2008.04.078 ↩
Zhang, Yihong; Wang, Yisu Remy; Willsey, Max; Tatlock, Zachary (2022-01-12). "Relational e-matching". Proceedings of the ACM on Programming Languages. 6 (POPL): 35:1–35:22. doi:10.1145/3498696. S2CID 236924583. https://doi.org/10.1145%2F3498696 ↩
Stepp, Michael Benjamin (2011). Equality saturation: engineering challenges and applications (PhD thesis). USA: University of California at San Diego. ISBN 978-1-267-03827-2. 978-1-267-03827-2 ↩
(Goharshady, Lam & Parreaux 2024) - Goharshady, Amir Kafshdar; Lam, Chun Kit; Parreaux, Lionel (2024-10-08). "Fast and Optimal Extraction for Sparse Equality Graphs". Proceedings of the ACM on Programming Languages. 8 (OOPSLA2): 361:2551–361:2577. doi:10.1145/3689801. https://dl.acm.org/doi/abs/10.1145/3689801 ↩
(Flatt et al. 2022, p. 2) - Flatt, Oliver; Coward, Samuel; Willsey, Max; Tatlock, Zachary; Panchekha, Pavel (October 2022). "Small Proofs from Congruence Closure". In A. Griggio; N. Rungta (eds.). Proceedings of the 22nd Conference on Formal Methods in Computer-Aided Design – FMCAD 2022. TU Wien Academic Press. pp. 75–83. doi:10.34727/2022/isbn.978-3-85448-053-2_13. ISBN 978-3-85448-053-2. S2CID 252118847. https://repositum.tuwien.at/handle/20.500.12708/81325 ↩
(Tate et al. 2009) - Tate, Ross; Stepp, Michael; Tatlock, Zachary; Lerner, Sorin (2009-01-21). "Equality saturation". Proceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages. POPL '09. Savannah, GA, USA: Association for Computing Machinery. pp. 264–276. doi:10.1145/1480881.1480915. ISBN 978-1-60558-379-2. S2CID 2138086. https://doi.org/10.1145/1480881.1480915 ↩
de Moura, Leonardo; Bjørner, Nikolaj (2008). "Z3: An Efficient SMT Solver". In Ramakrishnan, C. R.; Rehof, Jakob (eds.). Tools and Algorithms for the Construction and Analysis of Systems. Lecture Notes in Computer Science. Vol. 4963. Berlin, Heidelberg: Springer. pp. 337–340. doi:10.1007/978-3-540-78800-3_24. ISBN 978-3-540-78800-3. 978-3-540-78800-3 ↩
Rümmer, Philipp (2012). "E-Matching with Free Variables". In Bjørner, Nikolaj; Voronkov, Andrei (eds.). Logic for Programming, Artificial Intelligence, and Reasoning. Proceedings. 18th International Conference, LPAR-18, Merida, Venezuela, March 11–15, 2012. Lecture Notes in Computer Science. Vol. 7180. Berlin, Heidelberg: Springer. pp. 359–374. doi:10.1007/978-3-642-28717-6_28. ISBN 978-3-642-28717-6. 978-3-642-28717-6 ↩
(Flatt et al. 2022, p. 2) - Flatt, Oliver; Coward, Samuel; Willsey, Max; Tatlock, Zachary; Panchekha, Pavel (October 2022). "Small Proofs from Congruence Closure". In A. Griggio; N. Rungta (eds.). Proceedings of the 22nd Conference on Formal Methods in Computer-Aided Design – FMCAD 2022. TU Wien Academic Press. pp. 75–83. doi:10.34727/2022/isbn.978-3-85448-053-2_13. ISBN 978-3-85448-053-2. S2CID 252118847. https://repositum.tuwien.at/handle/20.500.12708/81325 ↩
Detlefs, David; Nelson, Greg; Saxe, James B. (May 2005). "Simplify: a theorem prover for program checking". Journal of the ACM. 52 (3): 365–473. doi:10.1145/1066100.1066102. ISSN 0004-5411. S2CID 9613854. /wiki/Doi_(identifier) ↩
Joshi, Rajeev; Nelson, Greg; Randall, Keith (2002-05-17). "Denali: a goal-directed superoptimizer". ACM SIGPLAN Notices. 37 (5): 304–314. doi:10.1145/543552.512566. ISSN 0362-1340. /wiki/Doi_(identifier) ↩
Yang, Yichen; Phothilimtha, Phitchaya Mangpo; Wang, Yisu Remy; Willsey, Max; Roy, Sudip; Pienaar, Jacques (2021-03-17). "Equality Saturation for Tensor Graph Superoptimization". arXiv:2101.01332 [cs.AI]. /wiki/ArXiv_(identifier) ↩
Wang, Yisu Remy; Hutchison, Shana; Leang, Jonathan; Howe, Bill; Suciu, Dan (2020-12-22). "SPORES: Sum-Product Optimization via Relational Equality Saturation for Large Scale Linear Algebra". arXiv:2002.07951 [cs.DB]. /wiki/ArXiv_(identifier) ↩
Stepp, Michael; Tate, Ross; Lerner, Sorin (2011). "Equality-Based Translation Validator for LLVM". In Gopalakrishnan, Ganesh; Qadeer, Shaz (eds.). Computer Aided Verification. Lecture Notes in Computer Science. Vol. 6806. Berlin, Heidelberg: Springer. pp. 737–742. doi:10.1007/978-3-642-22110-1_59. ISBN 978-3-642-22110-1. 978-3-642-22110-1 ↩
"Wasm-mutate: Fuzzing WebAssembly Compilers with E-Graphs (EGRAPHS 2022) - PLDI 2022". pldi22.sigplan.org. Retrieved 2023-02-03. https://pldi22.sigplan.org/details/egraphs-2022-papers/3/Wasm-mutate-Fuzzing-WebAssembly-Compilers-with-E-Graphs ↩
Coward, Samuel; Constantinides, George A.; Drane, Theo (2022-03-17). "Abstract Interpretation on E-Graphs". arXiv:2203.09191 [cs.LO].Coward, Samuel; Constantinides, George A.; Drane, Theo (2022-05-30). "Combining E-Graphs with Abstract Interpretation". arXiv:2205.14989 [cs.DS]. /wiki/ArXiv_(identifier) ↩
Cao, David; Kunkel, Rose; Nandi, Chandrakana; Willsey, Max; Tatlock, Zachary; Polikarpova, Nadia (2023-01-09). "babble: Learning Better Abstractions with E-Graphs and Anti-Unification". Proceedings of the ACM on Programming Languages. 7 (POPL): 396–424. arXiv:2212.04596. doi:10.1145/3571207. ISSN 2475-1421. S2CID 254536022. /wiki/ArXiv_(identifier) ↩