After the standardization of knowledge representation languages such as RDF and OWL, much research has been conducted in the area, especially regarding the transformation of relational databases into RDF, identity resolution, knowledge discovery, and ontology learning. The general process uses traditional methods from information extraction and extract, transform, load (ETL), which transform the data from the sources into structured formats.
The following criteria can be used to categorize approaches in this topic (some of them only account for extraction from relational databases):2
For example, an extraction system might process a news sentence such as: "President Obama called Wednesday on Congress to extend a tax break for students included in last year's economic stimulus package, arguing that the policy provides more generous assistance."
When building an RDB representation of a problem domain, the starting point is frequently an entity-relationship diagram (ERD). Typically, each entity is represented as a database table, each attribute of the entity becomes a column in that table, and relationships between entities are indicated by foreign keys. Each table typically defines a particular class of entity, each column one of its attributes. Each row in the table describes an entity instance, uniquely identified by a primary key. The table rows collectively describe an entity set. In an equivalent RDF representation of the same entity set:
So, to render an equivalent view based on RDF semantics, the basic mapping algorithm would be as follows:
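The spirit of this basic mapping can be sketched in a few lines of Python; the base IRI `EX`, the sample table, and the IRI naming scheme are illustrative assumptions, not prescribed by any standard:

```python
# A minimal sketch of the basic ("direct") 1:1 mapping from a relational
# table to RDF triples. Each row becomes a subject IRI derived from its
# primary key, each column a predicate, each cell value an object.
# The base IRI EX and the sample table are illustrative assumptions.

EX = "http://example.org/"  # hypothetical base namespace
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

def direct_mapping(table_name, primary_key, rows):
    triples = []
    for row in rows:
        subject = f"<{EX}{table_name}/{row[primary_key]}>"
        # each row is typed as an instance of a class named after the table
        triples.append((subject, RDF_TYPE, f"<{EX}{table_name}>"))
        for column, value in row.items():
            if column != primary_key:
                triples.append((subject, f"<{EX}{table_name}#{column}>", f'"{value}"'))
    return triples

employees = [
    {"id": 1, "name": "Alice", "dept": "R&D"},
    {"id": 2, "name": "Bob", "dept": "Sales"},
]
for s, p, o in direct_mapping("Employee", "id", employees):
    print(s, p, o, ".")
```

Note how the primary key, not the row's position, determines the subject IRI; this keeps the generated identifiers stable across exports.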
An early mention of this basic, or direct, mapping can be found in Tim Berners-Lee's comparison of the ER model to the RDF model.5
The 1:1 mapping mentioned above exposes the legacy data as RDF in a straightforward way; additional refinements can be employed to improve the usefulness of the RDF output with respect to the given use cases. Normally, information is lost during the transformation of an entity-relationship diagram (ERD) to relational tables (details can be found in object-relational impedance mismatch) and has to be reverse engineered. From a conceptual view, approaches for extraction can come from two directions. The first direction tries to extract or learn an OWL schema from the given database schema. Early approaches used a fixed set of manually created mapping rules to refine the 1:1 mapping.678 More elaborate methods employ heuristics or learning algorithms to induce schematic information (these methods overlap with ontology learning). While some approaches try to extract the information from the structure inherent in the SQL schema9 (analysing, e.g., foreign keys), others analyse the content and the values in the tables to create conceptual hierarchies10 (e.g. columns with few values are candidates for becoming categories). The second direction tries to map the schema and its contents to a pre-existing domain ontology (see also: ontology alignment). Often, however, a suitable domain ontology does not exist and has to be created first.
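The content-based heuristic mentioned above, treating columns with few distinct values as category candidates, can be sketched as follows; the sample rows and the distinct-value threshold are illustrative assumptions:

```python
def category_candidates(rows, max_distinct=5):
    """Content-based schema-learning heuristic: columns whose distinct-value
    count is small (and smaller than the row count) are candidates for
    becoming categories in a conceptual hierarchy rather than plain literals."""
    columns = rows[0].keys() if rows else []
    candidates = {}
    for col in columns:
        distinct = {row[col] for row in rows}
        if len(distinct) <= max_distinct and len(distinct) < len(rows):
            candidates[col] = sorted(distinct)
    return candidates

rows = [
    {"id": 1, "title": "Dune", "genre": "sci-fi"},
    {"id": 2, "title": "Neuromancer", "genre": "sci-fi"},
    {"id": 3, "title": "Dracula", "genre": "horror"},
]
print(category_candidates(rows))  # only 'genre' repeats values
```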
As XML is structured as a tree, any data can be easily represented in RDF, which is structured as a graph. XML2RDF is one example of an approach that uses RDF blank nodes and transforms XML elements and attributes to RDF properties. The topic, however, is more complex than in the case of relational databases. In a relational table the primary key is an ideal candidate for becoming the subject of the extracted triples. An XML element, however, can be transformed, depending on the context, into the subject, the predicate, or the object of a triple. XSLT, a standard transformation language, can be used to manually convert XML to RDF.
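A minimal sketch in the spirit of the XML2RDF approach, mapping every XML element to an RDF blank node and element and attribute names to predicates; the sample document and the naming conventions are illustrative assumptions:

```python
import xml.etree.ElementTree as ET
from itertools import count

def xml_to_triples(xml_text):
    """Walk an XML tree: each element becomes a blank node, each
    attribute and text value a literal, each child a linked blank node."""
    counter = count(1)
    triples = []

    def walk(element):
        node = f"_:b{next(counter)}"  # fresh blank node for this element
        for name, value in element.attrib.items():
            triples.append((node, name, f'"{value}"'))
        text = (element.text or "").strip()
        if text:
            triples.append((node, "value", f'"{text}"'))
        for child in element:
            triples.append((node, child.tag, walk(child)))
        return node

    walk(ET.fromstring(xml_text))
    return triples

doc = "<book id='b1'><title>Dune</title><author>Herbert</author></book>"
for s, p, o in xml_to_triples(doc):
    print(s, p, o)
```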
The largest portion of information contained in business documents (about 80%11) is encoded in natural language and therefore unstructured. Because unstructured data poses a particular challenge for knowledge extraction, more sophisticated methods are required, and these generally tend to supply worse results than those obtained from structured data. The potential for a massive acquisition of extracted knowledge, however, should compensate for the increased complexity and decreased quality of extraction. In the following, natural language sources are understood as sources of information where the data is given in an unstructured fashion as plain text. If the given text is additionally embedded in a markup document (e.g. an HTML document), the systems mentioned normally remove the markup elements automatically.
Main articles: Natural language processing and Linguistic Annotation
As a preprocessing step to knowledge extraction, it can be necessary to perform linguistic annotation by one or multiple NLP tools. Individual modules in an NLP workflow normally build on tool-specific formats for input and output, but in the context of knowledge extraction, structured formats for representing linguistic annotations have been applied.
Typical NLP tasks relevant to knowledge extraction include:
In NLP, such data is typically represented in TSV formats (CSV formats with tab characters as separators), often referred to as CoNLL formats. For knowledge extraction workflows, RDF views on such data have been created in accordance with the following community standards:
Other, platform-specific formats include
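Whatever the concrete standard, the underlying token-per-line layout is easy to consume. A minimal sketch of parsing a CoNLL-style TSV fragment follows; the three-column layout (word form, part of speech, named-entity tag) is an illustrative assumption, as CoNLL formats vary:

```python
# One token per line, tab-separated columns; blank lines separate
# sentences. Column layout here is an illustrative assumption.
conll = """\
IBM\tNNP\tB-ORG
acquired\tVBD\tO
Red\tNNP\tB-ORG
Hat\tNNP\tI-ORG
"""

def parse_conll(text):
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():          # blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        form, pos, ner = line.split("\t")
        current.append({"form": form, "pos": pos, "ner": ner})
    if current:
        sentences.append(current)
    return sentences

tokens = parse_conll(conll)[0]
print([t["form"] for t in tokens if t["ner"] != "O"])  # entity tokens only
```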
Traditional information extraction21 is a natural language processing technology that extracts information, typically from natural-language texts, and structures it in a suitable manner. The kinds of information to be identified must be specified in a model before beginning the process, which is why the whole process of traditional information extraction is domain dependent. IE is split into the following five subtasks.
The task of named entity recognition (NER) is to recognize and categorize all named entities contained in a text, i.e., to assign each named entity to a predefined category. This works by the application of grammar-based methods or statistical models.
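A toy sketch of the gazetteer flavour of grammar-based NER; the entity list and categories are illustrative assumptions, and statistical models are the more common choice in practice:

```python
import re

# Hypothetical gazetteer mapping surface forms to categories.
GAZETTEER = {
    "IBM": "Organization",
    "IBM Europe": "Organization",
    "Obama": "Person",
    "Congress": "Organization",
}

def recognize(text):
    entities, taken = [], set()
    # try longer names first so "IBM Europe" wins over the embedded "IBM"
    for name in sorted(GAZETTEER, key=len, reverse=True):
        for match in re.finditer(re.escape(name), text):
            span = range(match.start(), match.end())
            if taken.isdisjoint(span):  # skip overlaps with longer matches
                taken.update(span)
                entities.append((name, match.start(), GAZETTEER[name]))
    return sorted(entities, key=lambda e: e[1])

print(recognize("IBM Europe is a subsidiary of IBM."))
```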
Coreference resolution (CO) identifies equivalent entities, recognized by NER, within a text. There are two relevant kinds of equivalence relationship: the first holds between two different representations of an entity (e.g. IBM Europe and IBM), the second between an entity and its anaphoric references (e.g. it and IBM). Both kinds can be recognized by coreference resolution.
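The two kinds of equivalence can be illustrated with a toy resolver; the alias rule and pronoun rule below are crude illustrative assumptions, far simpler than real coreference systems:

```python
PRONOUNS = {"it", "he", "she", "they"}

def resolve(mentions):
    """Group (position, text) mentions into clusters, one per entity.
    Rule 1 (aliases): one name contains the other ("IBM" / "IBM Europe").
    Rule 2 (anaphora): a pronoun joins the most recently seen entity."""
    clusters = []
    for i, (pos, text) in enumerate(mentions):
        if text.lower() in PRONOUNS:
            if clusters:
                clusters[-1].append(i)  # attach pronoun to last entity
            continue
        for cluster in clusters:
            head = mentions[cluster[0]][1]
            if text in head or head in text:  # alias rule
                cluster.append(i)
                break
        else:
            clusters.append([i])  # a brand-new entity
    return clusters

mentions = [(0, "IBM"), (25, "IBM Europe"), (60, "it")]
print(resolve(mentions))  # all three mentions end up in one cluster
```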
During template element construction, the IE system identifies descriptive properties of entities recognized by NER and CO. These properties correspond to ordinary qualities such as red or big.
Template relation construction (TR) identifies relations that exist between the template elements. These relations can be of several kinds, such as works-for or located-in, with the restriction that both domain and range correspond to entities.
During template scenario production, events described in the text are identified and structured with respect to the entities recognized by NER and CO and the relations identified by TR.
Ontology-based information extraction (OBIE)22 is a subfield of information extraction in which at least one ontology is used to guide the process of information extraction from natural-language text. An OBIE system uses methods of traditional information extraction to identify concepts, instances, and relations of the used ontologies in the text, which are then structured into an ontology after the process. Thus, the input ontologies constitute the model of the information to be extracted.23
Main article: Ontology learning
Ontology learning is the automatic or semi-automatic creation of ontologies, including the extraction of the corresponding domain's terms from natural-language text. As building ontologies manually is extremely labor-intensive and time-consuming, there is great motivation to automate the process.
During semantic annotation,24 natural-language text is augmented with metadata (often represented in RDFa), which should make the semantics of the contained terms machine-understandable. In this process, which is generally semi-automatic, knowledge is extracted in the sense that a link between lexical terms and, for example, concepts from ontologies is established. Thus it becomes known which meaning of a term was intended in the processed context, and the meaning of the text is therefore grounded in machine-readable data with the ability to draw inferences. Semantic annotation is typically split into the following two subtasks.
At the terminology extraction level, lexical terms are extracted from the text. For this purpose, a tokenizer first determines the word boundaries and resolves abbreviations. Afterwards, terms from the text that correspond to a concept are extracted with the help of a domain-specific lexicon, to be linked during entity linking.
In entity linking,25 a link is established between the extracted lexical terms from the source text and concepts from an ontology or knowledge base such as DBpedia. For this, candidate concepts matching the several meanings of a term are detected with the help of a lexicon. Finally, the context of the terms is analyzed to determine the most appropriate disambiguation and to assign the term to the correct concept.
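Both steps, candidate detection via a lexicon and context-based disambiguation, can be sketched as follows; the lexicon entries and the context descriptions are illustrative assumptions, not real DBpedia data:

```python
# Hypothetical lexicon: each surface form maps to candidate concepts,
# each described by a bag of context words.
LEXICON = {
    "Jaguar": {
        "dbpedia:Jaguar_Cars": {"car", "vehicle", "british", "company"},
        "dbpedia:Jaguar": {"animal", "cat", "rainforest", "predator"},
    },
}

def link(term, sentence):
    """Pick the candidate concept whose description overlaps the
    sentence context most (a crude word-overlap disambiguation)."""
    candidates = LEXICON.get(term, {})
    context = set(sentence.lower().split())
    return max(candidates, key=lambda c: len(candidates[c] & context))

print(link("Jaguar", "the jaguar stalked its prey through the rainforest"))
```

Real linkers replace the word-overlap score with richer features (popularity priors, embeddings, graph coherence), but the two-stage shape is the same.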
Note that "semantic annotation" in the context of knowledge extraction is not to be confused with semantic parsing as understood in natural language processing (also referred to as "semantic annotation"): semantic parsing aims at a complete, machine-readable representation of natural language, whereas semantic annotation in the sense of knowledge extraction tackles only a very elementary aspect of it.
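The output of semantic annotation can be sketched as RDFa-enriched HTML produced by a naive term matcher; the term-to-IRI mapping and the schema:Place type are illustrative assumptions, and a real document would also declare the schema prefix:

```python
# Hypothetical mapping from lexical terms to ontology concepts.
CONCEPTS = {
    "Berlin": "http://dbpedia.org/resource/Berlin",
    "Germany": "http://dbpedia.org/resource/Germany",
}

def annotate(text):
    """Wrap each known term in an RDFa span: 'about' names the resource,
    'typeof' states its class. Naive string replacement for brevity."""
    for term, iri in CONCEPTS.items():
        markup = f'<span about="{iri}" typeof="schema:Place">{term}</span>'
        text = text.replace(term, markup)
    return text

print(annotate("Berlin is the capital of Germany."))
```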
The following criteria can be used to categorize tools that extract knowledge from natural-language text.
The following table characterizes some tools for Knowledge Extraction from natural language sources.
Knowledge discovery describes the process of automatically searching large volumes of data for patterns that can be considered knowledge about the data.46 It is often described as deriving knowledge from the input data. Knowledge discovery developed out of the data mining domain and is closely related to it in terms of both methodology and terminology.47
The most well-known branch of data mining is knowledge discovery, also known as knowledge discovery in databases (KDD). Like many other forms of knowledge discovery, it creates abstractions of the input data. The knowledge obtained through the process may become additional data that can be used for further processing and discovery. Often the outcomes of knowledge discovery are not actionable; techniques such as domain-driven data mining48 aim to discover and deliver actionable knowledge and insights.
Another promising application of knowledge discovery is in the area of software modernization, weakness discovery, and compliance, which involves understanding existing software artifacts. This process is related to the concept of reverse engineering. Usually the knowledge obtained from existing software is presented in the form of models to which specific queries can be made when necessary. An entity-relationship diagram is a frequent format for representing knowledge obtained from existing software. The Object Management Group (OMG) developed the Knowledge Discovery Metamodel (KDM) specification, which defines an ontology for software assets and their relationships for the purpose of performing knowledge discovery in existing code. Knowledge discovery from existing software systems, also known as software mining, is closely related to data mining, since existing software artifacts carry enormous value for risk management and business value, key for the evaluation and evolution of software systems. Instead of mining individual data sets, software mining focuses on metadata, such as process flows (e.g. data flows, control flows, and call maps), architecture, database schemas, and business rules/terms/processes.
RDB2RDF Working Group, Website: http://www.w3.org/2001/sw/rdb2rdf/, charter: http://www.w3.org/2009/08/rdb2rdf-charter, R2RML: RDB to RDF Mapping Language: http://www.w3.org/TR/r2rml/
LOD2 EU Deliverable 3.1.1 Knowledge Extraction from Structured Sources. http://static.lod2.eu/Deliverables/deliverable-3.1.1.pdf Archived 2011-08-27 at the Wayback Machine
"Life in the Linked Data Cloud". www.opencalais.com. Archived from the original on 2009-11-24. Retrieved 2009-11-10. "Wikipedia has a Linked Data twin called DBpedia. DBpedia has the same structured information as Wikipedia – but translated into a machine-readable format." https://web.archive.org/web/20091124182935/http://www.opencalais.com/node/9501
Tim Berners-Lee (1998), "Relational Databases on the Semantic Web". Retrieved: February 20, 2011. http://www.w3.org/DesignIssues/RDB-RDF.html
Hu et al. (2007), "Discovering Simple Mappings Between Relational Database Schemas and Ontologies", in Proc. of 6th International Semantic Web Conference (ISWC 2007), 2nd Asian Semantic Web Conference (ASWC 2007), LNCS 4825, pages 225–238, Busan, Korea, 11–15 November 2007. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.97.6934&rep=rep1&type=pdf
R. Ghawi and N. Cullot (2007), "Database-to-Ontology Mapping Generation for Semantic Interoperability", in Third International Workshop on Database Interoperability (InterDB 2007). http://le2i.cnrs.fr/IMG/publications/InterDB07-Ghawi.pdf
Li et al. (2005), "A Semi-automatic Ontology Acquisition Method for the Semantic Web", WAIM, volume 3739 of Lecture Notes in Computer Science, pages 209–220. Springer. doi:10.1007/11563952_19
Tirmizi et al. (2008), "Translating SQL Applications to the Semantic Web", Lecture Notes in Computer Science, Volume 5181/2008 (Database and Expert Systems Applications). http://citeseer.ist.psu.edu/viewdoc/download;jsessionid=15E8AB2A37BD06DAE59255A1AC3095F0?doi=10.1.1.140.3169&rep=rep1&type=pdf
Farid Cerbah (2008), "Learning Highly Structured Semantic Repositories from Relational Databases", The Semantic Web: Research and Applications, volume 5021 of Lecture Notes in Computer Science, Springer, Berlin / Heidelberg. http://www.tao-project.eu/resources/publications/cerbah-learning-highly-structured-semantic-repositories-from-relational-databases.pdf Archived 2011-07-20 at the Wayback Machine
Wimalasuriya, Daya C.; Dou, Dejing (2010). "Ontology-based information extraction: An introduction and a survey of current approaches", Journal of Information Science, 36(3), pp. 306–323. http://ix.cs.uoregon.edu/~dou/research/papers/jis09.pdf (retrieved: 18.06.2012).
"NLP Interchange Format (NIF) 2.0 - Overview and Documentation". persistence.uni-leipzig.org. Retrieved 2020-06-05. https://persistence.uni-leipzig.org/nlp2rdf/
Hellmann, Sebastian; Lehmann, Jens; Auer, Sören; Brümmer, Martin (2013). "Integrating NLP Using Linked Data". In Alani, Harith; Kagal, Lalana; Fokoue, Achille; Groth, Paul; Biemann, Chris; Parreira, Josiane Xavier; Aroyo, Lora; Noy, Natasha; Welty, Chris (eds.). The Semantic Web – ISWC 2013. Lecture Notes in Computer Science. Vol. 7908. Berlin, Heidelberg: Springer. pp. 98–113. doi:10.1007/978-3-642-41338-4_7. ISBN 978-3-642-41338-4.
Verspoor, Karin; Livingston, Kevin (July 2012). "Towards Adaptation of Linguistic Annotations to Scholarly Annotation Formalisms on the Semantic Web". Proceedings of the Sixth Linguistic Annotation Workshop. Jeju, Republic of Korea: Association for Computational Linguistics: 75–84. https://www.aclweb.org/anthology/W12-3610
acoli-repo/conll-rdf, ACoLi, 2020-05-27, retrieved 2020-06-05. https://github.com/acoli-repo/conll-rdf
Chiarcos, Christian; Fäth, Christian (2017). "CoNLL-RDF: Linked Corpora Done in an NLP-Friendly Way". In Gracia, Jorge; Bond, Francis; McCrae, John P.; Buitelaar, Paul; Chiarcos, Christian; Hellmann, Sebastian (eds.). Language, Data, and Knowledge. Lecture Notes in Computer Science. Vol. 10318. Cham: Springer International Publishing. pp. 74–88. doi:10.1007/978-3-319-59888-8_6. ISBN 978-3-319-59888-8.
Verhagen, Marc; Suderman, Keith; Wang, Di; Ide, Nancy; Shi, Chunqi; Wright, Jonathan; Pustejovsky, James (2016). "The LAPPS Interchange Format". In Murakami, Yohei; Lin, Donghui (eds.). Worldwide Language Service Infrastructure. Lecture Notes in Computer Science. Vol. 9442. Cham: Springer International Publishing. pp. 33–47. doi:10.1007/978-3-319-31468-6_3. ISBN 978-3-319-31468-6.
"The Language Application Grid | A web service platform for natural language processing development and research". Retrieved 2020-06-05. http://www.lappsgrid.org/
newsreader/NAF, NewsReader, 2020-05-25, retrieved 2020-06-05. https://github.com/newsreader/NAF
Vossen, Piek; Agerri, Rodrigo; Aldabe, Itziar; Cybulska, Agata; van Erp, Marieke; Fokkens, Antske; Laparra, Egoitz; Minard, Anne-Lyse; Palmero Aprosio, Alessio; Rigau, German; Rospocher, Marco (2016-10-15). "NewsReader: Using knowledge resources in a cross-lingual reading machine to generate more knowledge from massive streams of news". Knowledge-Based Systems. 110: 60–85. doi:10.1016/j.knosys.2016.07.013. ISSN 0950-7051.
Cunningham, Hamish (2005). "Information Extraction, Automatic", Encyclopedia of Language and Linguistics, 2, pp. 665–677. http://gate.ac.uk/sale/ell2/ie/main.pdf (retrieved: 18.06.2012).
Chicco, D; Masseroli, M (2016). "Ontology-based prediction and prioritization of gene functional annotations". IEEE/ACM Transactions on Computational Biology and Bioinformatics. 13 (2): 248–260. doi:10.1109/TCBB.2015.2459694. PMID 27045825. S2CID 2795344.
Erdmann, M.; Maedche, Alexander; Schnurr, H.-P.; Staab, Steffen (2000). "From Manual to Semi-automatic Semantic Annotation: About Ontology-based Text Annotation Tools", Proceedings of the COLING. http://www.ida.liu.se/ext/epa/cis/2001/002/paper.pdf (retrieved: 18.06.2012).
Rao, Delip; McNamee, Paul; Dredze, Mark (2011). "Entity Linking: Finding Extracted Entities in a Knowledge Base", Multi-source, Multi-lingual Information Extraction and Summarization. http://www.cs.jhu.edu/~delip/entity-linking.pdf (permanent dead link; retrieved: 18.06.2012).
Rocket Software, Inc. (2012). "Technology for extracting intelligence from text". http://www.rocketsoftware.com/products/aerotext Archived 2013-06-21 at the Wayback Machine (retrieved: 18.06.2012).
Orchestr8 (2012). "AlchemyAPI Overview". http://www.alchemyapi.com/api Archived 2016-05-13 at the Wayback Machine (retrieved: 18.06.2012).
The University of Sheffield (2011). "ANNIE: a Nearly-New Information Extraction System". http://gate.ac.uk/sale/tao/splitch6.html#chap:annie (retrieved: 18.06.2012).
ILP Network of Excellence. "ASIUM (LRI)". http://www-ai.ijs.si/~ilpnet2/systems/asium.html (retrieved: 18.06.2012).
Attensity (2012). "Exhaustive Extraction". http://www.attensity.com/products/technology/semantic-server/exhaustive-extraction/ Archived 2012-07-11 at the Wayback Machine (retrieved: 18.06.2012).
Mendes, Pablo N.; Jakob, Max; García-Silva, Andrés; Bizer, Christian (2011). "DBpedia Spotlight: Shedding Light on the Web of Documents", Proceedings of the 7th International Conference on Semantic Systems, pp. 1–8. http://www.wiwiss.fu-berlin.de/en/institute/pwo/bizer/research/publications/Mendes-Jakob-GarciaSilva-Bizer-DBpediaSpotlight-ISEM2011.pdf Archived 2012-04-05 at the Wayback Machine (retrieved: 18.06.2012).
Gangemi, Aldo; Presutti, Valentina; Reforgiato Recupero, Diego; Nuzzolese, Andrea Giovanni; Draicchio, Francesco; Mongiovì, Misael (2016). "Semantic Web Machine Reading with FRED", Semantic Web Journal. doi:10.3233/SW-160240. http://www.semantic-web-journal.net/system/files/swj1379.pdf
Adrian, Benjamin; Maus, Heiko; Dengel, Andreas (2009). "iDocument: Using Ontologies for Extracting Information from Text". http://www.dfki.uni-kl.de/~maus/dok/AdrianMausDengel09.pdf (retrieved: 18.06.2012).
SRA International, Inc. (2012). "NetOwl Extractor". http://www.sra.com/netowl/entity-extraction/ Archived 2012-09-24 at the Wayback Machine (retrieved: 18.06.2012).
Fortuna, Blaz; Grobelnik, Marko; Mladenic, Dunja (2007). "OntoGen: Semi-automatic Ontology Editor", Proceedings of the 2007 Conference on Human Interface, Part 2, pp. 309–318. http://analytics.ijs.si/~blazf/papers/OntoGen2_HCII2007.pdf (retrieved: 18.06.2012).
Missikoff, Michele; Navigli, Roberto; Velardi, Paola (2002). "Integrated Approach to Web Ontology Learning and Engineering", Computer, 35(11), pp. 60–63. http://wwwusers.di.uniroma1.it/~velardi/IEEE_C.pdf (retrieved: 18.06.2012).
McDowell, Luke K.; Cafarella, Michael (2006). "Ontology-driven Information Extraction with OntoSyphon", Proceedings of the 5th International Conference on The Semantic Web, pp. 428–444. http://turing.cs.washington.edu/papers/iswc2006McDowell-final.pdf (retrieved: 18.06.2012).
Yildiz, Burcu; Miksch, Silvia (2007). "ontoX - A Method for Ontology-Driven Information Extraction", Proceedings of the 2007 International Conference on Computational Science and Its Applications, 3, pp. 660–673. http://publik.tuwien.ac.at/files/pub-inf_4769.pdf (retrieved: 18.06.2012).
semanticweb.org (2011). "PoolParty Extractor". http://semanticweb.org/wiki/PoolParty_Extractor Archived 2016-03-04 at the Wayback Machine (retrieved: 18.06.2012).
Dill, Stephen; Eiron, Nadav; Gibson, David; Gruhl, Daniel; Guha, R.; Jhingran, Anant; Kanungo, Tapas; Rajagopalan, Sridhar; Tomkins, Andrew; Tomlin, John A.; Zien, Jason Y. (2003). "SemTag and Seeker: Bootstrapping the Semantic Web via Automated Semantic Annotation", Proceedings of the 12th International Conference on World Wide Web, pp. 178–186. http://www2003.org/cdrom/papers/refereed/p831/p831-dill.html (retrieved: 18.06.2012).
Uren, Victoria; Cimiano, Philipp; Iria, José; Handschuh, Siegfried; Vargas-Vera, Maria; Motta, Enrico; Ciravegna, Fabio (2006). "Semantic annotation for knowledge management: Requirements and a survey of the state of the art", Web Semantics: Science, Services and Agents on the World Wide Web, 4(1), pp. 14–28. http://staffwww.dcs.shef.ac.uk/people/J.Iria/iria_jws06.pdf (permanent dead link; retrieved: 18.06.2012).
Cimiano, Philipp; Völker, Johanna (2005). "Text2Onto - A Framework for Ontology Learning and Data-Driven Change Discovery", Proceedings of the 10th International Conference on Applications of Natural Language to Information Systems, 3513, pp. 227–238. http://www.cimiano.de/Publications/2005/nldb05/nldb05.pdf (retrieved: 18.06.2012).
Maedche, Alexander; Volz, Raphael (2001). "The Ontology Extraction & Maintenance Framework Text-To-Onto", Proceedings of the IEEE International Conference on Data Mining. http://users.csc.calpoly.edu/~fkurfess/Events/DM-KM-01/Volz.pdf (retrieved: 18.06.2012).
Machine Linking. "We connect to the Linked Open Data cloud". http://thewikimachine.fbk.eu/html/index.html Archived 2012-07-19 at the Wayback Machine (retrieved: 18.06.2012).
Inxight Federal Systems (2008). "Inxight ThingFinder and ThingFinder Professional". http://inxightfedsys.com/products/sdks/tf/ Archived 2012-06-29 at the Wayback Machine (retrieved: 18.06.2012).
Frawley, William F.; et al. (1992). "Knowledge Discovery in Databases: An Overview", AI Magazine, 13(3), pp. 57–70. http://www.aaai.org/ojs/index.php/aimagazine/article/viewArticle/1011 Archived 2016-03-04 at the Wayback Machine
Fayyad, U.; et al. (1996). "From Data Mining to Knowledge Discovery in Databases", AI Magazine, 17(3), pp. 37–54. http://www.aaai.org/ojs/index.php/aimagazine/article/viewArticle/1230 Archived 2016-05-04 at the Wayback Machine
Cao, L. (2010). "Domain driven data mining: challenges and prospects". IEEE Transactions on Knowledge and Data Engineering. 22 (6): 755–769. CiteSeerX 10.1.1.190.8427. doi:10.1109/tkde.2010.32. S2CID 17904603.