Soumen Chakrabarti, Focused Web Crawling, in the Encyclopedia of Database Systems. http://www.springerreference.com/docs/html/chapterdbid/63300.html
Controversial topics https://www.semanticjuice.com/controversial-topics/
Improving the Performance of Focused Web Crawlers, Sotiris Batsakis, Euripides G. M. Petrakis, Evangelos Milios, 2012-04-09 http://www.intelligence.tuc.gr/~petrakis/publications/BaPeMi09.pdf
Pinkerton, B. (1994). Finding what people want: Experiences with the WebCrawler. In Proceedings of the First World Wide Web Conference, Geneva, Switzerland. http://www.thinkpink.com/bp/WebCrawler/WWW94.html
Menczer, F. (1997). ARACHNID: Adaptive Retrieval Agents Choosing Heuristic Neighborhoods for Information Discovery. In D. Fisher, ed., Proceedings of the 14th International Conference on Machine Learning (ICML97). Morgan Kaufmann. http://informatics.indiana.edu/fil/Papers/ICML.ps
Menczer, F. and Belew, R.K. (1998). Adaptive Information Agents in Distributed Textual Environments. In K. Sycara and M. Wooldridge (eds.) Proceedings of the 2nd International Conference on Autonomous Agents (Agents '98). ACM Press. http://informatics.indiana.edu/fil/Papers/AA98.ps
Focused crawling: a new approach to topic-specific Web resource discovery, Soumen Chakrabarti, Martin van den Berg and Byron Dom, WWW 1999. http://www8.org/w8-papers/5a-search-query/crawling/index.html
A machine learning approach to building domain-specific search engines, Andrew McCallum, Kamal Nigam, Jason Rennie, and Kristie Seymore, IJCAI 1999. http://dl.acm.org/citation.cfm?id=1624313
Using Reinforcement Learning to Spider the Web Efficiently, Jason Rennie and Andrew McCallum, ICML 1999. http://dl.acm.org/citation.cfm?id=657633
Diligenti, M., Coetzee, F., Lawrence, S., Giles, C. L., and Gori, M. (2000). Focused crawling using context graphs. In Proceedings of the 26th International Conference on Very Large Databases (VLDB), pages 527-534, Cairo, Egypt. http://nautilus.dii.unisi.it/pubblicazioni/files/conference/2000-Diligenti-VLDB.pdf
Accelerated focused crawling through online relevance feedback, Soumen Chakrabarti, Kunal Punera, and Mallela Subramanyam, WWW 2002. http://dl.acm.org/citation.cfm?id=511466
Menczer, F., Pant, G., and Srinivasan, P. (2004). Topical Web Crawlers: Evaluating Adaptive Algorithms. ACM Trans. on Internet Technology 4(4): 378–419. http://doi.acm.org/10.1145/1031114.1031117
Recognition of common areas in a Web page using visual information: a possible application in a page classification, Milos Kovacevic, Michelangelo Diligenti, Marco Gori, Veljko Milutinovic, IEEE International Conference on Data Mining (ICDM), 2002. https://ieeexplore.ieee.org/document/1183910/
Dong, H., Hussain, F.K., Chang, E.: State of the art in semantic focused crawlers. Computational Science and Its Applications – ICCSA 2009. Springer-Verlag, Seoul, Korea (July 2009) pp. 910-924 https://www.researchgate.net/publication/44241179_State_of_the_Art_in_Semantic_Focused_Crawlers
Dong, H., Hussain, F.K.: SOF: A semi-supervised ontology-learning-based focused crawler. Concurrency and Computation: Practice and Experience. 25(12) (August 2013) pp. 1623-1812 https://www.researchgate.net/publication/264620349_SOF_A_semi-supervised_ontology-learning-based_focused_crawler
Junghoo Cho, Hector Garcia-Molina, Lawrence Page: Efficient Crawling Through URL Ordering. Computer Networks 30(1-7): 161-172 (1998) http://dl.acm.org/citation.cfm?id=297835
Marc Najork, Janet L. Wiener: Breadth-first crawling yields high-quality pages. WWW 2001: 114-118 http://dl.acm.org/citation.cfm?id=371965
Nadav Eiron, Kevin S. McCurley, John A. Tomlin: Ranking the web frontier. WWW 2004: 309-318. http://dl.acm.org/citation.cfm?id=988714
Meusel R., Mika P., Blanco R. (2014). Focused Crawling for Structured Data. ACM International Conference on Information and Knowledge Management, Pages 1039-1048. http://dl.acm.org/citation.cfm?doid=2661829.2661902
Brian D. Davison: Topical locality in the Web. SIGIR 2000: 272-279. http://dl.acm.org/citation.cfm?doid=345508.345597
Soumen Chakrabarti, Mukul Joshi, Kunal Punera, David M. Pennock: The structure of broad topics on the Web. WWW 2002: 251-262. http://dl.acm.org/citation.cfm?id=511480
Jian Wu, Pradeep Teregowda, Juan Pablo Fernández Ramírez, Prasenjit Mitra, Shuyi Zheng, C. Lee Giles: The evolution of a crawling strategy for an academic document search engine: whitelists and blacklists. In Proceedings of the 3rd Annual ACM Web Science Conference, pages 340-343, Evanston, IL, USA, June 2012. http://dl.acm.org/citation.cfm?id=2380718.2380762