9th European Summer School in
Information Retrieval
2-6 September 2013 - Granada, Spain
This lecture will give an overview of Information Retrieval (IR), including the important concepts, terminology, relationships to other fields of Computer Science, and key research issues. I will then focus on the properties of text and text documents and how structure in a text corpus can be discovered through techniques such as text analysis, link analysis, information extraction, clustering, and classification. Finally, I will describe the crucial role of queries and information needs.
In the second lecture on Foundations, I will give an overview of retrieval models, which provide the formal framework for ranking algorithms. I will start with some general observations about different types of models and will then give more details based loosely on a chronological perspective. Models that will be covered include BM25, language models and LDA, graphical models (including MRF), feature-based models with an emphasis on proximity, and learning-to-rank approaches.
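As a rough illustration of the first model named above, BM25 can be sketched in a few lines. This is a minimal sketch, not the lecture's own material: the function name `bm25_scores`, the whitespace tokenization, the parameter defaults `k1=1.2` and `b=0.75`, and the smoothed IDF variant are all assumptions chosen for brevity, and the collection is assumed to be non-empty.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in `docs` against `query` with a common BM25 variant.

    Assumptions (not from the lecture): whitespace tokenization,
    a non-empty collection, and a smoothed IDF that is never negative.
    """
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in tokenized for t in set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for t in query.lower().split():
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            # Length normalization: longer documents are penalized via b.
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "ranking models for retrieval",
    "social media analysis",
    "retrieval retrieval models",
]
scores = bm25_scores("retrieval", docs)
```

In this toy collection the second document does not contain the query term and scores zero, while the third, which is shorter and repeats the term, scores higher than the first.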
In specific domains (e.g. patent, medical, scientific literature) and industries (e.g. pharmaceuticals), search systems have been an important method of information access for more than 40 years. However, as public general-purpose search technologies are increasingly used in the workplace, and professionals become more knowledgeable about search technologies, many more demands are placed upon professional search systems, demands which clearly differentiate them from typical "ten blue links" web search. The tasks performed by professional searchers are complex: they usually include not only retrieval but also information analysis, and they usually require association, pipelining and possibly integration of information, as well as synchronization and coordination of multiple and potentially concurrent search views produced from different datasets, search tools and user interfaces. Many facets of search technology (e.g. exploratory search, aggregated search, federated search, task-based search, Information Retrieval (IR) over query sessions, cognitive IR approaches, Human Computer IR) aim to at least partially address some of these demands.
This talk will present MUMIA, a research networking activity that brings together various facets of state-of-the-art search technology research which can contribute to the development of search tools for next generation professional search systems. Additionally, it presents a general framework which provides a useful topology for better understanding the design space of professional search systems and how different IR/NLP technologies can be integrated to enable rich information seeking environments where different tools can support specific objectives within a typically lengthy search process. The main motivation of the talk is to present the key challenges, a framework and the best practices which will influence the design and development of next generation professional integrated search systems.
FDIA Keynote speech
A set of tips for PhD students, particularly those that work on computer science and information retrieval.
Social media sites (referred to by some as the web 2.0) allow their users to interact with each other, for example in collecting and sharing so-called user-generated content - these can be just bookmarks, but also blogs, images, and videos. Social media support co-creation: processes where customers (or users, if you prefer) do not just consume but play an active role in defining and shaping the end product. Famous examples include Six Degrees, LiveJournal, Digg, Epinions, Myspace, Flickr, YouTube, LinkedIn, and Pinterest. Of course, today's internet giants Facebook and Twitter are key new developments. Finally, Wikipedia should not be overlooked - a major resource in many language technologies including information retrieval!
The second part of the lecture looks into the opportunities for information retrieval research. Social media platforms tend to provide access to user profiles, connections between users, the content these users publish or share, and how they react to each other's content through commenting and rating. Also, the large majority of social media platforms allow their users to categorize content by means of tags (or, in direct communication, through hash-tags), resulting in collaborative ways of information organization known as folksonomies. However, these social media also form a challenge for information retrieval research: the many platforms vary in functionalities, and we have only very little understanding of clearly desirable features like combining tag usage and ratings in content recommendation! A unifying approach based on random walks will be discussed to illustrate how we can answer some of these questions [1], but clearly the area has ample opportunity to leave your own marks.
In the final part of the lecture I will briefly touch upon an even wider range of opportunities, where data derived from social media form a key component to enable new research and insights. I will review a few important results from research centered on Wikipedia, Facebook and Twitter data, as well as a diverse range of new information sources including the geo- and temporal information derived from images and tweets, product reviews and comments on YouTube videos, and how URL shorteners may give a view on what is popular on the web.
[1] Maarten Clements, Arjen P. De Vries, and Marcel J. T. Reinders. 2010. The task-dependent effect of tags and ratings on social media access. ACM Trans. Inf. Syst. 28, 4, Article 21 (November 2010), 42 pages. http://doi.acm.org/10.1145/1852102.1852107
Improving IR models (and search technology in general) with Natural Language Processing has proven remarkably hard. We will discuss some of the (few) successes and (numerous) failures in this area, including many examples from my own work in ranking models and web search applications.
Patent Searching and Landscaping presents many challenges to the research community. Patents are complex technical documents, whose content appears in many languages and contains images, chemical and genomic structures and other forms of data, intermixed and cross-referring with the text material. Further, much patent work involves integrating other forms of technical information (scientific papers, open linked data and so on) with the patent data. Finally, the realistic presentation of search and analysis results to often non-technical and time-poor audiences for the purpose of strategic decision making presents particular challenges.
The patent-related business is worth billions, so all these problems are solved on a daily basis by the patent community, often in somewhat inadequate, labor-intensive ways. The challenge for the research community is to provide better solutions without increasing the already heavy burden on relevant technical, legal and language experts.
The course will review the state of the art and point out where the key challenges are, especially for early stage researchers in Multi-media Information Retrieval.
Objectives:
“We are living in a multilingual world and the diversity in languages which are used to interact with information access systems has generated a wide variety of challenges to be addressed by computer and information scientists” [1]. In this session we will consider how to adapt IR methods to multilingual settings (referred to as MLIR and CLIR). The session will be structured as follows: (1) the motivation for multilingual retrieval; (2) MLIR and CLIR techniques; and (3) the challenges of providing multilingual retrieval. Some of the topics covered will include: implementing MLIR and CLIR systems, designing multilingual user interfaces, localisation, translation issues, and practical applications of MLIR and CLIR technologies. The session is based on the book “Multilingual Information Retrieval” [1].
[1] Peters, C., Braschler, M. and Clough, P. (2012) Multilingual Information Retrieval: From Research to Practice, Springer: Heidelberg, Germany, ISBN 978-3-642-23007-3, 217 pages. Available from Springer.
This course will focus on three aspects of interactive retrieval, namely quantitative modeling, cognitive models, and user interface design. For the quantitative models, the interactive probability ranking principle will be introduced along with methods for estimating the required parameters and for constructing Markov models of the user's interaction with the system. Cognitive models will cover a variety of approaches describing information seeking and searching. These models will then be used as a basis for user interface designs that go beyond the current query-result list paradigm.
To overcome the “one size fits all” behaviour of most search engines, in recent years a great deal of research has addressed the problem of defining techniques aimed at tailoring the search outcome to the user context in order to improve the quality of search. The main idea is to produce context-dependent and user-tailored search results. Search tasks are subjective and often complex. The user-system interaction, based on keyword-based querying and on the presentation of search results as a list of web pages ordered according to their estimated relevance, is often unsatisfactory. This lecture will present an overview of the main issues related to contextual search.
The proliferation of blogs, forums, social networking sites and other online means of communication has created a digital landscape where people are able to publicly express their thoughts and opinions through a variety of means and applications. In order for this type of online content to be harnessed and utilized, two conditions must be met. First, the available information must be filtered so that unrelated and redundant content is removed and only relevant, high-quality information is retrieved. Second, the retrieved content must be efficiently and effectively analysed in order to identify and extract any opinionated information within it and have its nature and disposition assessed and characterized.
Opinion Retrieval provides a solution to the aforementioned issues by combining approaches and theories from two distinct but related areas of research: Information Retrieval and Sentiment Analysis. The former is necessary to retrieve content which is relevant to the user’s information need and the latter to detect and analyse any affective content within it. Within the course, the students will become familiar with all aspects relating to effectively and efficiently distilling opinions from social media, with a particular focus on the current state-of-the-art approaches to sentiment analysis and social information retrieval.
Search is not just a box and ten blue links. Search is a journey: an exploration where what we encounter along the way changes what we seek. But in order to guide people along this journey, we must understand both the art and science of search usability.
In this session we will review the basic principles of search usability, with a focus on practical solutions that integrate information-seeking theory with UI design best practice. We will explore the fundamental concepts of user-centred design for information search and discovery and learn how to differentiate between various types of search behaviour: known-item, exploratory, lookup, learning, investigation, etc. We’ll review the primary ‘dimensions’ of search user experience and how to apply them to different contexts. We conclude by exploring design patterns and other key resources and their role in solving practical design problems.
The session will include both presentations and group work to enable delegates to analyse, evaluate and improve the effectiveness of search applications within their own organisation.
For a given query, a search engine should respond with a ranked list that reflects the breadth of available information, the range of possible user needs, and any ambiguity inherent in the query. To satisfy these requirements, researchers have proposed ranking algorithms and evaluation methodologies for novelty and diversity, which we will cover in this lecture.
We will start with motivating examples and a discussion of key concepts and terminology. We will present some of the most successful diversification algorithms, including MMR and xQuAD, comparing implicit and explicit diversification approaches. We will outline key properties required from evaluation measures, discuss widely used measures, and review international evaluation experiments conducted at TREC and NTCIR. Finally, we will suggest directions for future research.
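To make the idea of implicit diversification concrete, the following is a minimal sketch of greedy re-ranking in the style of Maximal Marginal Relevance (MMR). It is an illustration under stated assumptions rather than the lecture's own formulation: the function name `mmr`, the trade-off parameter `lam`, and the toy relevance and similarity values are all hypothetical.

```python
def mmr(relevance, similarity, k, lam=0.5):
    """Greedily pick up to k documents, trading relevance against redundancy.

    relevance:  dict mapping document id -> relevance score (assumed given)
    similarity: function (doc_a, doc_b) -> similarity in [0, 1] (assumed given)
    lam:        weight on relevance; 1 - lam is the weight on novelty
    """
    selected = []
    candidates = set(relevance)
    while candidates and len(selected) < k:

        def score(d):
            # Redundancy is the similarity to the closest already-selected document.
            max_sim = max((similarity(d, s) for s in selected), default=0.0)
            return lam * relevance[d] - (1 - lam) * max_sim

        # Sorting makes tie-breaking deterministic.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: "a" and "b" are near-duplicates, "c" covers something different.
relevance = {"a": 0.9, "b": 0.85, "c": 0.5}
pair_sims = {
    frozenset(("a", "b")): 0.95,
    frozenset(("a", "c")): 0.1,
    frozenset(("b", "c")): 0.1,
}


def sim(x, y):
    return pair_sims[frozenset((x, y))]


ranking = mmr(relevance, sim, k=3)
```

With these toy values the less relevant but novel document "c" is ranked ahead of the near-duplicate "b". Explicit approaches such as xQuAD differ in that they model the possible query aspects directly instead of relying only on inter-document similarity.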
In his influential book "The Long Tail", Chris Anderson stated that we were leaving the age of information to enter the age of recommendation. In this age of abundant information, searching is not the ultimate user need; finding is. Recommender Systems have bloomed in an attempt to serve users' information needs without requiring the user to enter an explicit query. These systems are built on engines that use many of the general IR constructs, but have personalization at their core, and do not require any text processing component. The 2006 Netflix $1M Prize put the focus on predicting user ratings as a proxy for the recommendation problem. However, the recommendation field has since evolved to include many different approaches, ranging from tensor factorization to personalized learning-to-rank models.
In this session, I will give an overview of the Recommendation problem. I will describe the different basic approaches with references to the literature as well as practical examples of their use in industry.
Indexing is the process of representing the documents in an information retrieval system in such a way that they can be efficiently retrieved and scored to answer a user's query. For web search engines, due to the large amount of data available on the web, it is necessary to generate the index of the crawled documents using distributed processing techniques. Amongst these, one of the most commonly used paradigms is MapReduce, which provides a robust, powerful and conceptually simple way of implementing distributed processes. This class will provide an introduction to indexing in Information Retrieval systems and to MapReduce for distributed programming. It will include examples of how an inverted index can be built and how some ranking signals such as PageRank can be computed using MapReduce.
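The map and reduce steps of inverted index construction can be sketched in a single process as follows. This is a simplified illustration, not a real MapReduce job: the function names `map_phase`, `shuffle`, `reduce_phase` and `build_inverted_index` are hypothetical, tokenization is plain whitespace splitting, and the shuffle that a framework would perform across machines is simulated with an in-memory dictionary.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit one (term, doc_id) pair per token in the document."""
    for term in text.lower().split():
        yield term, doc_id


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(term, doc_ids):
    """Reduce: turn all doc ids seen for a term into a sorted posting list."""
    return term, sorted(set(doc_ids))


def build_inverted_index(docs):
    pairs = [
        pair
        for doc_id, text in docs.items()
        for pair in map_phase(doc_id, text)
    ]
    grouped = shuffle(pairs)
    return dict(reduce_phase(term, ids) for term, ids in grouped.items())


docs = {
    1: "information retrieval",
    2: "distributed information processing",
}
index = build_inverted_index(docs)
```

Here `index["information"]` holds the posting list `[1, 2]`. In an actual distributed setting each mapper would process a separate shard of the crawl and each reducer would receive the pairs for a subset of the terms.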