Major Minors

Published on 2 September 2021


Paulo Martins

  • data-science
  • entidades
  • entities
  • knowledge-graph
  • knwoledge-base
  • minorias
  • minorities
  • ontology
  • owl
  • rdf
  • sparql
  • ttl
  • turtle







Major Minors is a project that collects press clippings from Portuguese newspapers (currently covering 1996 to 2019) that refer to subjects related to minorities. Its data source is Arquivo.pt (a repository of the past Portuguese World Wide Web). This data was used to generate ontologies (RDF triplestores composing a semantic database) and interfaces to interact with them (SPARQL APIs following W3C standards for the Semantic Web). We enriched this base with new ramifications by identifying and cross-referencing 19 entities, extending the base data into a Knowledge Graph.
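As a minimal sketch of how a client might talk to a SPARQL API like the ones described above, the Python snippet below builds a GET request following the SPARQL 1.1 Protocol and flattens a response in the W3C SPARQL JSON results format. The endpoint URL, the query, and the sample response are placeholders invented for illustration; they are not the project's actual endpoints, vocabulary, or data.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint: replace with a real SPARQL endpoint URL.
ENDPOINT = "https://example.org/sparql"

# Generic query that works on any RDF store; no project vocabulary is assumed.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def flatten_bindings(payload):
    """Turn SPARQL JSON results into a list of {variable: value} rows."""
    variables = payload["head"]["vars"]
    rows = []
    for binding in payload["results"]["bindings"]:
        # A variable may be unbound in a given row, so skip missing keys.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows


# Invented sample response in the SPARQL JSON results format.
SAMPLE = json.loads("""
{
  "head": {"vars": ["s", "p", "o"]},
  "results": {"bindings": [
    {"s": {"type": "uri", "value": "http://example.org/article/1"},
     "p": {"type": "uri", "value": "http://example.org/mentions"},
     "o": {"type": "literal", "value": "example entity"}}
  ]}
}
""")

request = build_request(ENDPOINT, QUERY)
rows = flatten_bindings(SAMPLE)
print(request.full_url)
print(rows)
```

To query a live endpoint, the request object could be passed to `urllib.request.urlopen` and the response body parsed with `json.load` before calling `flatten_bindings`; the sample payload here only stands in for that network call.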

This project originated in the Department of Informatics of the University of Minho.

We highlight the multidisciplinary collaborations later established with other research centers, namely humanities research groups from CEHUM and R&D projects (e.g. NetLang): the chosen categorizations reflect the fields of study of the collaborators partnered with the project. Research is ongoing, and new minorities and themes could be integrated in the future, according to new partnerships. The most recent scientific output can be found at:

Main endpoints:
Website & Interfaces:


  • Software (open-source at GitHub)
  • Websites & Interfaces for visually navigating the data
  • Open-source datasets (2 ontologies & 19 entities)

Here we share some of these datasets. The main ones are two ontologies ("minors.ttl" and "publico.ttl"): a) only the corpus referring to minorities; b) the entire newspaper corpus. Both contain data from the "Público" newspaper (from roughly 1996 to 2019) cross-referenced with thousands of real-world entities, and can be accessed through Reactive Interfaces, SPARQL queries and static Galleries at the main website. In addition, 19 datasets of real-world entities were created. Some are incomplete works in progress. Most are in a Python dictionary/array format.
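Since most of the entity datasets are described as Python dictionaries or arrays, the following is a minimal sketch of how such a dictionary could be matched against article text to produce cross-references. The dictionary contents, its name-to-aliases layout, and the function name `find_entities` are hypothetical; the project's actual files may be structured differently.

```python
import re

# Hypothetical entity dataset: canonical name -> list of aliases.
ENTITIES = {
    "University of Minho": ["University of Minho", "UMinho"],
    "Arquivo.pt": ["Arquivo.pt"],
}


def find_entities(text, entities):
    """Return the sorted canonical names whose aliases occur in the text."""
    found = set()
    for name, aliases in entities.items():
        for alias in aliases:
            # Lookarounds keep an alias from matching inside a longer word.
            pattern = r"(?<!\w)" + re.escape(alias) + r"(?!\w)"
            if re.search(pattern, text, flags=re.IGNORECASE):
                found.add(name)
                break  # One matching alias is enough for this entity.
    return sorted(found)


clipping = "The project started at UMinho and uses data preserved by Arquivo.pt."
print(find_entities(clipping, ENTITIES))
```

Plain alias matching like this is only a starting point: it ignores ambiguity between entities that share a name, which a real cross-referencing step would likely need to handle.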

This work was awarded a 1st Prize for its innovative use of the historical information preserved by Arquivo.pt, demonstrating the usefulness of this public service and the importance of preserving the information published on the web. Press clippings about the project and videos at:
