This is a very hard question to answer, mostly because the definition of “word” is really, really iffy. I’ll start with Arabic, because I know it’ll come up at some point or another.
Arabic is a Semitic language, and like other Semitic languages, it has a neat system for making words. There are “roots” made of several consonants (usually three), with each root having a general definition that is then modified by inserting patterns of vowels between the consonants.
The root √ktb, for example, has a general meaning of “write” or “writing”. If you insert the vowels i and ā, you get kitāb, which means “book”. If you swap them around, you get kātib, meaning “writer” (masc.); for a feminine writer, you add the fem. suffix -a to get kātiba, “female writer”. Maktaba is “library”, iktitāb is “registration”, muktatab is “subscription”, and so on. A full list is available here.
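To make the root-and-pattern idea concrete, here is a minimal sketch in Python. The slot notation (digits 1, 2, 3 standing for the first, second and third consonant of the root) and the function name apply_pattern are my own illustrative choices, not a standard linguistic notation or an existing library, and the patterns are only the ones mentioned above.

```python
def apply_pattern(root: str, pattern: str) -> str:
    """Fill the numbered slots of a pattern with the consonants of a root.

    Hypothetical notation: '1', '2', '3' mark where the first, second and
    third root consonants go; every other character is copied unchanged.
    """
    consonants = dict(zip("123", root))
    return "".join(consonants.get(ch, ch) for ch in pattern)


# Patterns taken from the examples in the text, with their glosses for the root ktb.
KTB_EXAMPLES = [
    ("1i2ā3", "book"),
    ("1ā2i3", "writer (masc.)"),
    ("1ā2i3a", "writer (fem.)"),
    ("ma12a3a", "library"),
]

for pattern, gloss in KTB_EXAMPLES:
    print(apply_pattern("ktb", pattern), "-", gloss)
```

The point of the sketch is only that one three-consonant root plus a handful of vowel patterns yields several distinct words; real Arabic morphology has many more patterns and plenty of irregularities that this ignores.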
Someone one day had the bright idea of calculating all the possible combinations of letters to get the total number of theoretically possible roots. The number he came up with was about twelve million. He never said that Arabic had twelve million words, just that there were twelve million possible roots according to Arabic’s word-building system, whether or not they even meant anything. Other people misunderstood this and shared it, and a myth that Arabic had twelve million words quickly developed.
This root calculation also disregarded Arabic’s word-building capacities: in this estimate, kitāb, kātib, kātiba, maktaba, iktitāb, muktatib, etc., would all be counted as √ktb.
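I don’t know exactly how that calculation was done, but here is one way a figure of about twelve million can fall out of pure combinatorics. The assumptions are mine: a 28-letter alphabet, roots of two to five letters, and no letter repeated within a root. The function name possible_roots is just illustrative.

```python
from math import perm

ALPHABET_SIZE = 28          # assumption: 28 letters in the Arabic alphabet
ROOT_LENGTHS = range(2, 6)  # assumption: roots of 2, 3, 4 or 5 letters


def possible_roots(alphabet_size: int, lengths) -> int:
    """Count ordered letter sequences with no repeated letter, for each length."""
    return sum(perm(alphabet_size, k) for k in lengths)


total = possible_roots(ALPHABET_SIZE, ROOT_LENGTHS)
print(f"{total:,}")  # 12,305,412
```

Note what this counts: every ordering of letters, meaningful or not, and nothing about the words derived from each root. Under the same assumptions, three-letter roots alone come to only 19,656, so nearly all of the twelve million are four- and five-letter sequences.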
So how many words are there in Arabic? This is where we run into a problem.
In English, “eat”, “ate”, “eating”, and “eaten” would all be counted as different forms of the same word, “eat”. Following this logic, shouldn’t those Arabic words be counted as different forms of the word √ktb? If not, why not? Which words would or would not be counted as separate? On top of this, Arabic’s system allows for a lot of possible words, but that says nothing about how many are actually in use. Similarly, English’s word-building means that you can make words like “anticow” or “haver” (i.e. “one who has”), but those aren’t formally recognized by any dictionary I could find.
This issue with Arabic is related to the overarching problem of what a word is. I’ve covered this here, and the short answer is that a word (more specifically, a lexeme) is a single unit of meaning, including derivational but not inflectional morphemes.
And you can’t easily count the number of lexemes, either. Something like the full chemical name for “titin” might be counted as a word. You could exclude it on the grounds that it’s part of a shared scientific vocabulary that any language could easily adopt, if it hasn’t already, but that same criterion would disqualify things like “triceratops” or “uranium”, which are definitely words.
Dictionaries are usually where you’d go for a word count. The issue with this method is that dictionaries aren’t some authority of which words are real or fake or what the words mean. They’re reference books. Saying a word isn’t real because it isn’t in the dictionary makes about as much sense as saying something didn’t happen because it isn’t in the encyclopedia.
Obviously, the reason some things aren’t in encyclopedias is that there’s simply not enough room for everything, and including less important events is unnecessary. Where that line is drawn is arbitrary. It’s the same with dictionaries.
According to Wikipedia’s list of dictionaries by number of words, the largest dictionary is Korean’s Woori Mal Saem, a Wiki-style open dictionary, with just over one million total entries. Following this is the Swedish Svenska Akademiens Ordbok, with an estimated 600 000 words upon its completion. After this is Icelandic’s Orðabók Háskólans, which is composed mostly of incredibly rare compound words.
Then, in descending order, you have Japanese, Lithuanian, Norwegian, Dutch, German, and French. It’s not until the tenth spot that you find the Oxford English Dictionary, sitting at an inventory of 230 000 words.
Then there’s a bigger issue of agglutinative and polysynthetic languages, for which I’d like to introduce the Eskimo-Aleut family, spoken from Alaska to Greenland.
One of the languages in the family is called Yupik, spoken in Alaska. Its most famous sentence is Tuntussuqatarniksaitengqiggtuq – and yes, that’s one word. It means “He hadn’t yet said again that he was going to hunt reindeer.” (See here).
Yupik, as with the other members of its family, is known for smushing a lot of morphemes (word bits) together to make long and complex words that can act as full sentences on their own, as in the above example.
Now you have another problem. You could discount the wordhood of “anticow” and “haver” on the basis that they’re nonce words, but in polysynthetic and highly agglutinative languages (i.e., those that shove lots of word-bits together) such nonce words are commonplace, so these languages theoretically have infinite vocabularies. At what point, then, do you decide that something should or shouldn’t be a word?
So to answer your question, we can’t count words without first deciding what should or shouldn’t be a word. If you want to go by the number of words in dictionaries, then Korean is at the top and any of the thousands of languages without dedicated dictionaries are at the bottom. But if you want a perfectly non-arbitrary answer, then I’m sorry to say I can’t really offer you one.