The n-gram collection was extracted from the collected documents whose identified language was Portuguese. We extracted word n-grams up to the fifth order (5-grams). A set of regular expressions was applied to tokenize the text. After the extraction, all n-grams with tokens having more than 32…
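As a rough illustration of the extraction step described above, the following minimal sketch tokenizes text with a regular expression and counts word n-grams from order 1 to 5. The pattern `TOKEN_RE` and the function names `tokenize` and `extract_ngrams` are assumptions made for this example and are not the expressions or tools used to build the collection. The filtering applied after extraction is not shown.

```python
import re
from collections import Counter

# Hypothetical tokenizer pattern (an assumption, not the collection's own rules):
# a word, optionally hyphenated, or a single punctuation character.
TOKEN_RE = re.compile(r"\w+(?:-\w+)*|[^\w\s]")


def tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def extract_ngrams(tokens, max_order=5):
    """Count every contiguous n-gram of order 1..max_order."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


if __name__ == "__main__":
    sample = "O gato viu o gato."
    for ngram, freq in extract_ngrams(tokenize(sample)).most_common(3):
        print(" ".join(ngram), freq)
```

Storing each n-gram as a tuple of tokens keeps the counts hashable and avoids ambiguity if a token itself contains a space-like separator. The sketch is case-sensitive, so "O gato" and "o gato" are counted separately.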
Metadata quality: 1.0/1
—
Updated on 29 August 2023