One book that I have found very helpful in my language learning is A Frequency Dictionary of Portuguese, published by Routledge. Of the other Romance languages, Routledge has also published frequency dictionaries of French and Spanish. They seem quite hard to find in bookstores, at least here in Australia. Going by the list Routledge has provided on this web page, only 12 books have been published in the series, and I am a bit surprised that Italian has not been tackled (Italy is a popular tourist destination and there is a big demand for short-term courses in Italian), whereas Czech and Contemporary American English have. But that’s by the by. Most of the books in the series seem to be available in paperback and Kindle format from online sellers for about $30 to $40 (probably American dollars), but the hardback editions can be expensive at more than $100. My paperback, I note, cost me $58 (Australian dollars) brand new at a specialist language book centre in Sydney a few years ago. All the dictionaries in the series list the 5000 most frequently used words in the language, both in writing and in speech, and compiling them must have been painstaking work for the researchers involved.
Why are they so helpful?
Well, it is logical that if you can familiarise yourself with, if not master, the 5000 most frequently used words in a language, you should be able to get by pretty well wherever that language is spoken. In the preface to the whole series, the publisher notes that in English, the 4000 to 5000 most frequent words account for 95 per cent of a written text, and the 1000 most frequent words account for 85 per cent of speech. The figures are not available for other languages, but presumably much the same applies.
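For the curious, here is a rough sketch of how a coverage figure like that could be worked out for any text. It is only an illustration and not the method the publisher used: the function name `coverage` and the tiny sample sentence are my own inventions, and a real calculation would need a large corpus and proper handling of punctuation and word forms.

```python
from collections import Counter


def coverage(tokens, top_n):
    """Return the share of all tokens accounted for by the top_n most frequent words."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    # Add up the occurrences of the top_n most frequent distinct words.
    covered = sum(count for _, count in counts.most_common(top_n))
    return covered / len(tokens)


# A made-up sample sentence, purely for illustration.
sample = "the cat sat on the mat and the dog sat on the cat".split()

for n in (1, 4, 7):
    print(f"top {n} words cover {coverage(sample, n):.0%} of the sample")
```

Even in this toy sentence, a handful of words does most of the work, which is the whole idea behind a frequency dictionary.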
In the introduction to the Portuguese dictionary, the authors (Mark Davies and Ana Maria Raposo Preto-Bay!) make what I feel is a valid point, based on my own experience not only of trying to teach myself Portuguese but also of teaching English to speakers of other languages. Sometimes, when you are studying a text or article, you have to look up words that aren’t particularly common or useful. In my teaching time I have had to explain some obscure words, such as “jiffy”.
“Although a typical textbook provides some thematically-related vocabulary in each chapter (foods, illnesses, transportation, clothing, etc.) there is almost never any indication of which of these words the student is most likely to encounter in actual conversation or texts. In fact, sometimes the words are so infrequent in actual texts that the student may never encounter them again in the “real world”, outside of the test for that particular chapter,” the authors note. They go on to say the situation “can be equally as frustrating for independent learners. These people may pick up a work of fiction or a newspaper and begin to work through the text word for word, as they look up unfamiliar words in a dictionary. Yet there is often the uncomfortable suspicion on the part of such learners that their time could be maximized if they could simply begin with the most common words in Portuguese, and work progressively through the list.”
I very much agree with what they are saying. Most of the little language books aimed at holidaymakers are full of words that in reality one will rarely use, such as “tweezers”.
But let’s not forget that words outside the top 5000 have a role to play. They give us variety and are testimony to human beings’ creativity. Plus, there are some great words lurking in there too. My “Quirky Vocabulary” series on this blog often delights in funny and/or unusual words.
Should you invest in a frequency dictionary?
I think the frequency dictionary is a great learning tool because, apart from listing the top 5000 words, it gives a sample sentence for each word and a translation of that sentence, so in the process you are learning a lot more words than the 5000, and you are learning sentence construction. However, I wouldn’t recommend using a frequency dictionary as an introduction to a language, or as your first textbook. It helps if you have some prior knowledge of vocabulary, grammar and verb conjugation before you get stuck into a frequency dictionary; otherwise you won’t really understand what’s going on in the sample sentences. Get past the “beginner” stage of the language first, and then you will enjoy perusing the frequency dictionary. The books are not just one long list: there are sidebars grouping words by subject matter, such as the most mentioned parts of the body and the most mentioned food terms.
European Portuguese or Brazilian Portuguese? Or both?
Ranking the top 5000 words in a language is quite complicated. For the Portuguese dictionary, both Brazilian and Portuguese texts were used. The starting point was O Corpus do Português (website link here), which contains 45 million words (!) from texts dating from the 1300s to the 1900s, but obviously, to reflect modern usage, the 1900s section was the main focus. This then became a 20 million-word corpus, half from Brazilian sources and half from continental Portuguese ones. That 20 million is broken down thus:
- Spoken words: 1 million from Brazil and 1 million from Portugal.
- Fiction: 3 million words from 95 novels and short stories from Brazil, and 3 million words from 175 novels and short stories from Portugal. (I would love to know which novels, and whether Brazilian writers are more wordy than Portuguese ones or vice versa).
- News: 3 million words from thousands of articles on different topics in seven newspapers in Brazil (from São Paulo, Bahia, Curitiba, Porto Alegre, Recife and Santa Catarina) and 3 million words from five newspapers in Portugal (Público, Expresso, Jornal [Lisbon], Beira and Leiria).
- Academic: 3 million words each from various encyclopedias and academic websites in Brazil and Portugal. (I pity the poor people who had to trawl through so much academic text!)
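As a quick sanity check on the breakdown above, the figures can be tallied in a few lines. This is just my own arithmetic on the numbers quoted from the introduction; the variable names are mine.

```python
# Corpus make-up in millions of words, taken from the breakdown listed above.
corpus = {
    "spoken": {"Brazil": 1, "Portugal": 1},
    "fiction": {"Brazil": 3, "Portugal": 3},
    "news": {"Brazil": 3, "Portugal": 3},
    "academic": {"Brazil": 3, "Portugal": 3},
}

# Grand total across all registers and both countries.
total = sum(sum(by_country.values()) for by_country in corpus.values())

# Total per country, to confirm the half-and-half split.
per_country = {
    country: sum(by_country[country] for by_country in corpus.values())
    for country in ("Brazil", "Portugal")
}

# Share of the whole corpus taken up by each register.
shares = {
    register: sum(by_country.values()) / total
    for register, by_country in corpus.items()
}

print(f"total: {total} million words")
print(f"per country: {per_country}")
for register, share in shares.items():
    print(f"{register}: {share:.0%}")
```

So the numbers do add up to 20 million, split evenly between the two countries, with speech making up only a tenth of the total and the three written registers sharing the rest equally.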
On top of this, the researchers had to deal with such matters as the differences in spelling between continental Portuguese and Brazilian Portuguese (the dictionary was published in 2008, before the “Acordo Ortográfico” took effect; Portugal’s six-year adaptation period for the spelling reforms ends this year), how to count nouns and adjectives that have only minor syntactic and semantic differences between them, how to link different forms of a verb back to the base form, how to “disambiguate between the [passive/verbal] and [adjectival/resultative] senses of the past participle” and so on and so on. Who’d want to be a language researcher! The point is, they’ve done all the hard work. Mastering the 5000 words is the easy bit.
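To give a flavour of what linking verb forms back to the base form involves, here is a toy sketch. It is emphatically not how the researchers did it: the `LEMMAS` lookup table is a hypothetical, hand-made one covering five verb forms, whereas real lemmatisation needs a full morphological analysis and has to cope with ambiguous forms.

```python
from collections import Counter

# Hypothetical, hand-made lookup table: inflected verb form -> base form.
LEMMAS = {
    "falo": "falar",
    "falou": "falar",
    "falamos": "falar",
    "comeu": "comer",
    "comemos": "comer",
}


def lemma_counts(tokens):
    """Count tokens by base form; words not in the table count as themselves."""
    return Counter(LEMMAS.get(token.lower(), token.lower()) for token in tokens)


# A made-up sentence, purely for illustration.
tokens = "Ela falou e nós falamos depois ele comeu e nós comemos".split()
counts = lemma_counts(tokens)

for word, count in counts.most_common():
    print(word, count)
```

Without the lookup step, “falou” and “falamos” would each be counted once as separate words; with it, they are pooled under “falar”, which is what lets a verb earn its proper place in the ranking.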
So, what are the most common words in Portuguese?
Have a guess. And what do you think the 10 most used verbs are? I’ll save that and whatever surprises I can find for future posts.
I can’t help wondering what the most frequently used words are in Contemporary American English. I’m thinking “yeah, like, whatever” or “bitch”!