Datasets

Search among 1 dataset on dados.gov.pt - Portal de dados abertos da Administração Pública (the Public Administration's open data portal)

A n-grams collection extracted from the Portuguese Web

From Arquivo.pt - pesquise páginas do passado ("search pages from the past")

The n-gram collection was extracted from the collected documents whose identified language was Portuguese. We extracted word n-grams up to the fifth order (5-grams). A set of regular expressions was applied to tokenize the text. After the extraction, all n-grams with tokens having more than 32…

Updated on August 29, 2023
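
The extraction summarized in the dataset description (regex tokenization, then word n-grams of order 1 to 5) might look roughly like the sketch below. The pattern `TOKEN_RE` and the helper names `tokenize` and `extract_ngrams` are hypothetical: the description only says that "a set of regular expressions" was used, so this single pattern is an assumption, not the rules Arquivo.pt applied. The post-extraction filter on token length is left out because the description is truncated at that point.

```python
import re
from collections import Counter

# Hypothetical tokenizer (assumption): words, optionally joined by hyphens or
# apostrophes, or single punctuation marks. The dataset's real regular
# expressions are not given in the description.
TOKEN_RE = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")

MAX_ORDER = 5  # the description states n-grams up to the fifth order


def tokenize(text):
    """Split text into tokens using the illustrative pattern above."""
    return TOKEN_RE.findall(text)


def extract_ngrams(tokens, max_order=MAX_ORDER):
    """Count every contiguous n-gram of order 1..max_order."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


if __name__ == "__main__":
    tokens = tokenize("pesquise páginas do passado")
    for ngram, freq in extract_ngrams(tokens).items():
        print(" ".join(ngram), freq)
```

In Python 3, `\w` matches Unicode word characters for `str` patterns, so accented Portuguese words such as "páginas" stay intact as single tokens.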
