Discovery of usage patterns in digital library web logs using Markov modeling - Equipe Sociologie, Information-Communication Design
Preprint, Working Paper. Year: 2019

Discovery of usage patterns in digital library web logs using Markov modeling

Abstract

This paper proposes a family of tools based on Markov modeling to quantitatively analyze how people access the digital collections of the Bibliothèque nationale de France (BnF, the national library of France) through its web platform, Gallica. The aim is to give the BnF relevant information about the various usage patterns so that it can better understand its users and improve both its mediation efforts and the design of the website, in order to increase the general public's use of the 4M-document collection. To that end, the study focuses on access logs retrieved from the Apache HTTP servers of Gallica, which are converted into sequences of actions. To study user navigation behaviors, we propose to model the access log data using Markov models: Markov chains when considering sequences of actions without duration, and Markov processes when taking duration into account. Our models are used either to capture an average behavior through meaningful statistics or to cluster the data to exhibit various interpretable types of usage. The numerical results bring new insights into the way users interact with the platform, highlighting the mean duration of some actions such as interaction with the search engine or consultation of documents. Although our approach requires additional information to properly interpret the models and the correlations they highlight, it is able to discover all types of behaviors, including the stealthiest and the most difficult to capture in traditional surveys, giving them their fair weight in terms of audience. We also show how this approach fits into a broader body of work combining data mining and ethnography.
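To make the duration-free case concrete, here is a minimal sketch of how a first-order Markov chain could be estimated from sessions already converted into sequences of actions. It is an illustration under stated assumptions, not the authors' implementation: the action labels (`home`, `search`, `view_document`, `download`), the toy sessions, and the function name `estimate_transition_matrix` are all hypothetical, and the estimator shown is the plain maximum-likelihood count ratio.

```python
from collections import Counter, defaultdict


def estimate_transition_matrix(sessions):
    """Estimate first-order Markov chain transition probabilities.

    `sessions` is a list of action sequences. The result maps each
    observed source action to a dict of {next_action: probability},
    computed as count(current -> next) / count(current -> anything).
    """
    counts = defaultdict(Counter)
    for actions in sessions:
        # Pair every action with the one that immediately follows it.
        for current, following in zip(actions, actions[1:]):
            counts[current][following] += 1

    matrix = {}
    for state, followers in counts.items():
        total = sum(followers.values())
        matrix[state] = {nxt: n / total for nxt, n in followers.items()}
    return matrix


# Hypothetical sessions; real ones would be derived from the access logs.
sessions = [
    ["home", "search", "view_document", "search", "view_document"],
    ["home", "search", "search", "view_document"],
    ["home", "view_document", "download"],
]

transitions = estimate_transition_matrix(sessions)
for state, row in transitions.items():
    print(state, row)
```

Each row of the estimated matrix sums to 1, and actions that only ever end a session (here `download`) have no outgoing row. Handling durations, as in the Markov process case mentioned in the abstract, would additionally require the time spent in each action, which this sketch leaves out.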
Main file
nouvellet-etal-2018.pdf (1.02 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-02182244, version 1 (12-07-2019)

Identifiers

  • HAL Id: hal-02182244, version 1

Cite

Adrien Nouvellet, Valérie Beaudouin, Florence d'Alché-Buc, Christophe Prieur, François Roueff. Discovery of usage patterns in digital library web logs using Markov modeling. 2019. ⟨hal-02182244⟩
438 views
155 downloads
