Web Information Retrieval

Basic Course

Lecturer:
Stefano Mizzaro (University of Udine)

Board Contact:
Stefano Mizzaro

SSD: ING-INF/05

CFU: 6

Period: First Semester

Program:

Information Retrieval (IR) is a discipline of long-standing importance that has received even more attention since the advent of the Web. The course presents the main conceptual issues underlying IR systems, with particular emphasis on Web search engines.

Detailed contents:

* Classical IR:

– formal IR models (Boolean, vector space, probabilistic and variants such as BM25, language models);

– structure of the inverted index (basics, compression);

– user interfaces for IR (classification, survey);

– classification (definition, naive Bayes classifiers);

– clustering (hierarchical and approximate algorithms);

– evaluation (foundations, methodologies, metrics; research topics).

* Web IR:

– Web graph (size and shape: small world and scale-free networks, bow-tie shape);

– link analysis for ranking and other applications (PageRank, HITS, variants);

– crawling (concepts and architecture);

– spam (short account);

– search engine architecture (short account).

* Case studies and specific issues.

Verification: Oral exam plus a small additional term project (talk, homework, etc.) on a specific topic. The course is taught in English, and the exam can be taken in English as well. Alternative programs for Erasmus students are possible in principle and must be discussed with the instructor.

Prerequisites: Basic knowledge of programming, algorithms and data structures, Web technologies, linear algebra, and probability.