Hybrid techniques for knowledge-based NLP
Knowledge graphs meet machine learning and all their friends
Many different artificial intelligence techniques can be used to explore and exploit the large document corpora available inside organizations and on the Web. While natural language is symbolic in nature and the first approaches were symbolic and rule-based (e.g., ontologies and knowledge bases), the most widely used methods today are statistical (e.g., linear methods such as support vector machines, probabilistic topic models, and non-linear methods such as neural networks). Both approaches, knowledge-based and statistical, have their strengths and limitations, and there is an increasing trend toward combining them to get the best of both worlds. This tutorial covers the foundations and modern practical applications of knowledge-based and statistical methods for exploiting large document corpora, as well as their combination. It first focuses on the foundations of the main techniques involved, including knowledge graphs, word embeddings, neural network methods and probabilistic topic models, and then describes how combinations of these techniques are being used in practical applications and commercial projects in which the instructors are currently involved.
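To make the idea of combining the two families of methods concrete, here is a minimal, hypothetical sketch of retrofitting (one of the techniques in the outline below): pre-trained word vectors are iteratively nudged toward the vectors of their neighbours in a lexical knowledge graph, so that words the graph declares related end up closer in vector space. The function name, toy vectors and toy graph are our own illustrative choices, not material from the tutorial itself.

```python
# Illustrative sketch of retrofitting in the style of Faruqui et al. (2015):
# word vectors are pulled toward their knowledge-graph neighbours.

def retrofit(vectors, graph, alpha=1.0, beta=1.0, iterations=10):
    """Return new vectors where each word is pulled toward its graph neighbours.

    vectors: dict word -> list of floats (pre-trained embeddings, left untouched)
    graph:   dict word -> list of neighbour words (knowledge-graph edges)
    alpha:   weight of the original (distributional) vector
    beta:    weight of each graph neighbour
    """
    new = {w: list(v) for w, v in vectors.items()}
    for _ in range(iterations):
        for word, neighbours in graph.items():
            neighbours = [n for n in neighbours if n in new]
            if not neighbours:
                continue
            for d in range(len(vectors[word])):
                # Weighted average of the original vector and the
                # current estimates of the neighbours' vectors.
                num = alpha * vectors[word][d] + beta * sum(new[n][d] for n in neighbours)
                new[word][d] = num / (alpha + beta * len(neighbours))
    return new

# Toy data: the graph says "mad" and "angry" are related, so their
# retrofitted vectors should be closer than the original ones.
vecs = {"mad": [1.0, 0.0], "angry": [0.0, 1.0], "table": [0.5, 0.5]}
graph = {"mad": ["angry"], "angry": ["mad"]}
retro = retrofit(vecs, graph)
```

Words without graph edges (here, "table") keep their original vectors, which is the appeal of the method: external knowledge refines the embedding space only where the knowledge graph has something to say.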
We welcome researchers and practitioners from both industry and academia, as well as anyone else interested in hybrid approaches to knowledge-based natural language processing. We plan an interactive day in which instructors and participants can engage in rich discussion on the topic. Some familiarity with the subject is expected, but lacking it should not prevent you from attending if you are interested.
The tutorial will comprise two main blocks, each consisting of slides and hands-on exercises, and will close with time for discussion. Throughout both blocks we will work through different examples and applications.
1. Challenges of text and natural language processing
2. Modern natural language processing methods, technologies and common tasks
3. Distributed word and feature representations, embeddings
4. Knowledge graph embeddings
5. Extending word embeddings with external knowledge: retrofitting and projection
6. Towards a vecsigrafo, bringing meaning from text into knowledge graphs
7. Evaluating vecsigrafos, visual inspection and quality assurance methods
8. Knowledge graph generation from text corpora: curation, interlinking and multilingual reuse
9. Probabilistic topic models
10. Topic-based semantic similarity
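As a taste of the last two items in the outline, here is a minimal sketch, in our own words rather than the tutorial's reference implementation, of topic-based semantic similarity. It assumes a probabilistic topic model has already mapped each document to a probability distribution over topics; documents are then compared via the Jensen-Shannon divergence between those distributions, a symmetric, bounded variant of the Kullback-Leibler divergence.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q), skipping zero entries of p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon(p, q):
    """Jensen-Shannon divergence between two topic distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def similarity(p, q):
    """Map the divergence to a similarity in [0, 1].

    With natural logarithms the Jensen-Shannon divergence is bounded
    by ln(2), so dividing by ln(2) normalizes it before inverting.
    """
    return 1.0 - jensen_shannon(p, q) / math.log(2)

# Toy topic distributions over 3 topics for three documents.
doc_a = [0.8, 0.1, 0.1]
doc_b = [0.7, 0.2, 0.1]   # mostly the same dominant topic as doc_a
doc_c = [0.1, 0.1, 0.8]   # dominated by a different topic
```

Under these toy distributions, `similarity(doc_a, doc_b)` comes out higher than `similarity(doc_a, doc_c)`, which is the behaviour a topic-based similarity measure should exhibit regardless of the exact words the documents use.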
This tutorial is offered by the following instructors, from the Research Lab at Expert System, Recogn.ai and Universidad Politecnica de Madrid.
Jose Manuel Gomez-Perez works at the intersection of several areas of Artificial Intelligence, including Natural Language Processing, Knowledge Discovery, and Knowledge Representation and Reasoning. His long-term vision is to enable machines to understand text in a way similar to how humans read, bridging the gap between the two through semantically rich knowledge representations and user interfaces. At Expert System, Jose Manuel leads the Research Lab in Madrid, formed by researchers, software engineers and linguists, in the belief that such a vision is best served by a combination of structured knowledge graphs and probabilistic approaches. Before Expert System, he worked at Intelligent Software Components, one of the first European companies to deliver Semantic and Natural Language Processing solutions on the Web. He also consults for companies such as Coca-Cola, British Telecom, Volkswagen, HAVAS and ING. Also active as an entrepreneur, he co-founded a startup and advised another. An ACM member and former Marie Curie fellow, Jose Manuel holds a Ph.D. in Computer Science and Artificial Intelligence and regularly publishes in the top scientific conferences and journals in the area. His views on AI and its applications have appeared in magazines such as Nature and Scientific American. In 2015, Jose Manuel was program chair of K-CAP, the International Conference on Knowledge Capture.
Ronald Denaux is a senior researcher at Expert System Iberia. He obtained his MSc in Computer Science from the Technical University Eindhoven, The Netherlands. After a couple of years in industry as a software developer for a large IT company in The Netherlands, he decided to return to academia and obtained a PhD, again in Computer Science, from the University of Leeds, UK. His research interests have revolved around making semantic web technologies more usable for end users, which has required research into (and resulted in various publications on) Ontology Authoring and Reasoning, Natural Language Interfaces, Dialogue Systems, Intelligent User Interfaces and User Modelling. Besides research, Ronald has recently been involved in knowledge transfer and in developing products from research prototypes.
Daniel Vila is a co-founder of recogn.ai, a Madrid-based startup and spin-off of Universidad Politecnica de Madrid that builds next-generation solutions for text analytics and content management using AI methods. Daniel holds a PhD in Artificial Intelligence from Universidad Politecnica de Madrid (2016), where he worked at the Ontology Engineering Group and developed the solution supporting a large knowledge graph combining NLP and semantic technologies: the datos.bne.es data service of the National Library of Spain.
Carlos Badenes: After more than 8 years working in the M2M world, Carlos began researching text mining in the context of the semantic web. Since then, he has moved more deeply into the study of topic modeling techniques for analyzing large collections of documents, incorporating semantic resources and working on multilingual domains. He currently works as an associate researcher at the Ontology Engineering Group (OEG) while pursuing a PhD at Universidad Politecnica de Madrid (UPM).