Major Minors

Published on September 2, 2021

Paulo Martins

  • data-science
  • entidades
  • entities
  • knowledge-graph
  • knwoledge-base
  • minorias
  • minorities
  • ontology
  • owl
  • rdf
  • sparql
  • ttl
  • turtle

Information

Type
Application
Topic
Other
ID
6130f636078190517fcab5aa

Description

Major Minors is a project that collects press clippings from Portuguese newspapers (currently from 1996 to 2019) that refer to subjects related to minorities. Its data source is Arquivo.pt (a repository of the past Portuguese World Wide Web). This data was used to generate ontologies (RDF triplestores composing a semantic database) and interfaces to interact with them (SPARQL APIs following W3C standards for the Semantic Web). We enriched this base with new ramifications by identifying and cross-referencing 19 entities, turning the base data into a knowledge graph.

This project was born in the Department of Informatics of the University of Minho by:

We underline the multidisciplinary collaborations later established with other research centers, namely humanities research groups from CEHUM and R&D projects (e.g. NetLang): the chosen categorizations reflect the fields of study of these partner collaborations. Research is ongoing, and new minorities and themes could be integrated in the future, according to new partnerships. The most recent scientific output can be found at: http://minors.ilch.uminho.pt/science

Main endpoints:
Website & Interfaces: http://minors.ilch.uminho.pt
DB & API: http://sparql.ilch.uminho.pt
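
As an illustration of how a SPARQL API of this kind is typically queried, here is a minimal sketch that follows the W3C SPARQL 1.1 Protocol ("query via GET") and the SPARQL JSON results format. The "/sparql" path appended to the host is an assumption, since the exact service path is not stated here, and the query is deliberately generic because it does not rely on any class or property name from the project's ontologies.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Host taken from the text; the "/sparql" path is an ASSUMPTION, because the
# exact path of the query service is not given in the description.
ENDPOINT = "http://sparql.ilch.uminho.pt/sparql"

# Generic query that works on any RDF store: it assumes nothing about the ontology.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol 'query via GET' request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows(results):
    """Flatten a SPARQL JSON results document into a list of {variable: value} dicts."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def fetch_rows(endpoint, query, timeout=30):
    """Send the query over the network and return the flattened rows."""
    with urlopen(build_request(endpoint, query), timeout=timeout) as response:
        return rows(json.load(response))
```

Calling fetch_rows(ENDPOINT, QUERY) would perform the actual network request; if the service lives under a different path, only the ENDPOINT constant needs to change.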

Outputs:

  • Software (open source on GitHub)
  • Websites & Interfaces for visually navigating the data
  • Open-source datasets (2 ontologies & 19 entities)

Here we share some of these datasets. The main ones are two ontologies ("minors.ttl", "publico.ttl"): a) only the corpus referring to minorities; b) the entire newspaper corpus. Both contain data from the "Público" newspaper (from ~1996 to 2019) cross-referenced with thousands of real-world entities, and can be accessed through Reactive Interfaces, SPARQL queries and static Galleries at the main website. Additionally, 19 datasets of real-world entities were created. Some are incomplete works in progress. Most are in a Python dictionary/array format.
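
For the entity datasets in Python dictionary/array format, one way to read them without executing arbitrary code is ast.literal_eval from the standard library. The sketch below assumes each file contains a bare dictionary or list literal (not an assignment such as `name = {...}`); the sample keys and values are invented for illustration, since the actual structure of the 19 datasets is not described here.

```python
import ast

# HYPOTHETICAL excerpt: the real keys, values and file names of the entity
# datasets are not described in the text, so this sample is made up.
SAMPLE_TEXT = """
{
    "entity_a": ["alias one", "alias two"],
    "entity_b": ["alias three"],
}
"""


def load_entities(text):
    """Parse a Python dict/list literal safely (no code is executed)."""
    data = ast.literal_eval(text.strip())
    if not isinstance(data, (dict, list)):
        raise ValueError("expected a dictionary or a list literal")
    return data


entities = load_entities(SAMPLE_TEXT)
```

With a downloaded file, the same function could be applied to the file's contents read as text, provided the assumption about a bare literal holds.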

This work was one of those awarded the Arquivo.pt 1st Prize, for its innovative use of historical information preserved by Arquivo.pt, demonstrating the usefulness of this public service and the importance of preserving information published on the web. Press clippings and videos about the project are available at: http://minors.ilch.uminho.pt/press

Datasets used: 2
