L’hétérogénéité des données provenant du web ; des étapes pour la constitution du corpus complexe

Camila Pérez Lagos

doi:10.1051/shsconf/20152001018

Open Access

Issue: SHS Web of Conferences, Volume 20, 2015. ICODOC 2015 : Colloque Jeunes Chercheurs du Laboratoire ICAR
Article Number: 01018
Number of pages: 10
Section: Papers from oral presentations
DOI: https://doi.org/10.1051/shsconf/20152001018
Published online: 30 November 2015

SHS Web of Conferences 20, 01018 (2015)

Heterogeneity of datasets from the Web: stages for the constitution of a complex corpus

Camila Pérez Lagos^a

Sorbonne Nouvelle-Paris 3, EA1484 CIM-ERCOMES

^a Corresponding author

Summary

Corpora drawn from the Internet raise new questions for information and communication sciences as well as for discourse analysis. When handling multiform data, we risk adapting them to existing tools and sidestepping the aspects those tools cannot capture, such as the volatility of content and the multiplicity of signs. A single web page may present photographs, videos, hyperlinks, and so on, all constantly updated according to the content. In this article we offer reflections on the notion of corpus, understood as a construction of complex data owing to two types of heterogeneity: enunciative and technical. This aspect is examined in connection with a first analysis of a corpus of six websites of theaters in Chile, France, and Spain. This approach allowed us to draw initial conclusions about data from the Internet: the dissemination of content that originates on websites and also spreads across social networks amplifies the role of the addressee, who becomes a producer of content as well as a disseminator and critic of the theater shows currently playing.

Abstract

Corpora derived from the Internet raise new difficulties for information and communication sciences, as well as for discourse analysis. When analyzing multiform data, there is a risk of adapting them to already existing tools and thereby bypassing aspects that those tools cannot take into account, such as content volatility and sign multiplicity. For example, on a single web page we can be confronted with pictures, videos, and hyperlinks that are constantly updated according to the content. This paper formulates ideas on the concept of corpus, understood as a construction of complex data due to two types of heterogeneity: enunciative and technical. These ideas are discussed in light of a preliminary analysis of a corpus of six websites belonging to theaters in Chile, France, and Spain. Initial findings regarding data from the Internet reveal how the diffusion of content from websites and its spread to social networks amplify the role of receivers, who become content producers as well as diffusers and critics of the theater shows on the bill.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
