
AP: English Language Hits 1 Billion Words

Home » Discuss » Latest Breaking News
 
Tesha Donating Member (1000+ posts) Wed Apr-26-06 10:53 AM
Original message
AP: English Language Hits 1 Billion Words
Edited on Wed Apr-26-06 10:55 AM by Tesha
http://www.sfgate.com/cgi-bin/article.cgi?file=/news/archive/2006/04/26/international/i063924D41.DTL

English Language Hits 1 Billion Words

By SUEVON LEE, Associated Press Writer

A massive language research database responsible for
bringing words such as "podcast" and "celebutante" to
the pages of the Oxford dictionaries has officially hit
a total of 1 billion words, researchers said Wednesday.

Drawing on sources such as weblogs, chatrooms, newspapers,
magazines and fiction, the Oxford English Corpus spots
emerging trends in language usage to help guide
lexicographers when composing the most recent editions of
dictionaries.

The press [Oxford University Press] publishes the Oxford English Dictionary,
considered the most comprehensive dictionary of the
language, which in its most recent August 2005 edition
added words such as "supersize," "wiki" and "retail
politics" to its pages.

Oxford University Press lexicographer Catherine Soanes
said the database is not a collection of 1 billion
different words, but of sentences and other examples
of the usage and spelling.

"The corpus is purely 21st century English," said Judy
Pearsall, publishing manager of English dictionaries.
"You're looking at current English and seeing what's
happening right now. That's language at the cutting edge."

<more>
Bernardo de La Paz Donating Member (1000+ posts) Wed Apr-26-06 10:55 AM
Response to Original message
1. One Million. Off by a factor of 1,000.
 
DRoseDARs Donating Member (1000+ posts) Wed Apr-26-06 11:13 AM
Response to Reply #1
6. Hmm, according to this...
http://www.answers.com/topic/billion

Meaning #1: (in Britain) the number that is represented as a one followed by 12 zeros
Synonyms: one million million, 1000000000000

Meaning #2: (in the United States) the number that is represented as a one followed by 9 zeros
Synonyms: one thousand million, 1000000000

The adjective billion has one meaning:

Meaning #1: (U.S.) denoting a quantity consisting of one thousand million items or units; (Britain) denoting a quantity consisting of one million million items or units
Synonym: a billion
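The two senses differ only in the power of ten: the US (short-scale) billion is 10^9 and the traditional British (long-scale) billion is 10^12. A minimal Python sketch of that arithmetic; the constant names are made up for illustration. It also checks the "factor of 1,000" in reply #1 and the "factor of a million" in reply #8, both measured against a baseline of one million words.

```python
# Illustrative constants for the two readings of "billion" quoted above.
MILLION = 10**6
SHORT_SCALE_BILLION = 10**9    # US sense: one thousand million
LONG_SCALE_BILLION = 10**12    # traditional British sense: one million million

def factor(larger: int, smaller: int) -> int:
    """How many times larger one exact power of ten is than another."""
    return larger // smaller

print(factor(SHORT_SCALE_BILLION, MILLION))              # 1000
print(factor(LONG_SCALE_BILLION, MILLION))               # 1000000
print(factor(LONG_SCALE_BILLION, SHORT_SCALE_BILLION))   # 1000
```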
 
Bernardo de La Paz Donating Member (1000+ posts) Wed Apr-26-06 11:25 AM
Response to Reply #6
8. A million million words would be off by a factor of a million.
Average vocabulary used by people is about 2,000 to 5,000 words, and people understand about 10,000 or 15,000.

Shakespeare had a vocabulary of 25,000 words.
 
muriel_volestrangler Donating Member (1000+ posts) Wed Apr-26-06 11:37 AM
Response to Reply #8
10. "not a collection of 1 billion different words"
Yes, they do mean "1 billion", and it means 1,000,000,000. The source says: "If all the words in the Oxford English Corpus were laid out end to end (measuring on average 1cm), the total would stretch from London to New York, around 10,000 km." There are 1,000,000,000 cm in 10,000 km, though that's about twice the London-New York distance; perhaps they left out "and back".
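The unit conversion behind that check, as a quick Python sketch. The 1 cm average comes from the Oxford quote; the London-New York figure of about 5,570 km (great-circle) is an approximate value assumed here for comparison.

```python
# Oxford's "laid end to end" illustration: 1 billion words at ~1 cm each.
WORDS = 1_000_000_000
CM_PER_WORD = 1            # average width per word, per the Oxford quote
CM_PER_KM = 100 * 1000     # 100 cm per metre, 1000 metres per km

total_km = WORDS * CM_PER_WORD / CM_PER_KM
print(total_km)            # 10000.0

# Assumed approximate great-circle distance, London to New York.
LONDON_NYC_KM = 5570
print(round(total_km / LONDON_NYC_KM, 2))   # 1.8
```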

As that link explains, they mean they've got indexed examples of current English that total 1 billion words. It doesn't specify the number of distinct English words.
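The distinction here is what corpus linguists call tokens (running words) versus types (distinct words). A rough Python sketch; the tokenizer is a naive assumption for illustration (lower-case, runs of letters only), and real corpus tools treat punctuation, hyphens and numbers far more carefully.

```python
import re
from collections import Counter

def token_type_counts(text: str) -> tuple[int, int, Counter]:
    """Return (running words, distinct words, per-word frequencies).

    Assumed, deliberately crude tokenizer: lower-case the text and treat
    each run of letters (optionally with internal apostrophes) as a word.
    """
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    freq = Counter(tokens)
    return len(tokens), len(freq), freq

sample = "The cat saw the dog and the dog saw the cat."
n_tokens, n_types, freq = token_type_counts(sample)
print(n_tokens, n_types, freq["the"])   # 11 5 4
```

A "1 billion word" corpus is a count of tokens; the number of types is far smaller, because common words such as "the" repeat constantly.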
 
Tesha Donating Member (1000+ posts) Wed Apr-26-06 11:30 AM
Response to Reply #1
9. No, the Corpus page definitely claims one billion (apparently, 10^9) words
Edited on Wed Apr-26-06 11:31 AM by Tesha
No, the Corpus page definitely claims one billion (apparently, 10^9)
words. They don't mean one billion different words, though; see
the second Q&A below. It's apparently the AP writer who
misinterpreted what was meant. (What a surprise, eh?)

http://www.askoxford.com/oec/mainpage/?view=uk :

The corpus reaches new heights

In Spring 2006 a milestone is reached: the corpus now
contains over 1 billion words of real 21st century English.
It is not only size that matters, though: it is the size of
the corpus coupled with the careful selection and development
of its contents which means that it is a resource unlike any
other anywhere in the world.


One billion words?

If all the words in the Oxford English Corpus were laid out
end to end (measuring on average 1cm), the total would stretch
from London to New York, around 10,000 km. Because the corpus
is a collection of texts, there are not one billion different
words: the humble word 'the', the commonest in the written
language, accounts for 50 million of all the words in the
corpus!


Tesha
 
Bernardo de La Paz Donating Member (1000+ posts) Wed Apr-26-06 11:37 AM
Response to Reply #9
11. OK, different. The Corpus replicates the vocabulary millions of times over.
Edited on Wed Apr-26-06 11:38 AM by Bernardo de La Paz
Ten or twenty years ago the vocabulary of the English language was 450,000 words.

The Corpus (or "Body") is the sum total of all the written words in English, like Shakespeare's plays and Tony Snow's posts that were scrubbed from Free Republic but can still be read by clicking on the "Cached" links and then looking for the highlighted "by Tony Snow" posts: http://www.google.com/search?q=%22by+tony+snow%22+site%3Afreerepublic.com&num=30&meta=site%3Dsearch
 
shain from kane Donating Member (1000+ posts) Wed Apr-26-06 11:40 AM
Response to Reply #9
14. I read somewhere, probably Ripley's Believe It or Not, that if Adam
counted from one to one billion at, I think, a rate of one number per second, he would still be counting today.

A billion is a big number.
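A quick check of that counting claim in Python, assuming a 365.25-day year and nonstop counting with no breaks. At one number per second a billion takes roughly 32 years while a trillion takes roughly 31,700 years, so the "still counting today" punchline only works for the larger number.

```python
# Julian year: 365.25 days of 86,400 seconds = 31,557,600 s.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

def years_to_count(n: int, per_second: float = 1.0) -> float:
    """Years needed to count from 1 to n at a steady rate, with no breaks."""
    return n / per_second / SECONDS_PER_YEAR

print(round(years_to_count(10**9), 1))   # 31.7
print(round(years_to_count(10**12)))     # 31688
```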

And when we talk about a trillion, we are talking about the sum of many worker ants' production. There is no one person counting to a billion, or a trillion, dollar by dollar.

Does anyone have a calculation of the permutations and combinations of 26 letters?


From the above paragraph, "Because the corpus is a collection of texts, there are not one billion different words"
 
melm00se Donating Member (1000+ posts) Thu Apr-27-06 07:59 AM
Response to Reply #14
31. I think
the number is in the range of 4 x 10^26 (calculate 26!).
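26! is indeed about 4.03 x 10^26, but note what it counts: the orderings of all 26 letters with each letter used exactly once, i.e. 26-letter strings with no repeats, not words of ordinary length. A check with Python's standard library:

```python
import math

# 26! = number of ways to arrange all 26 letters, each used exactly once.
orderings = math.factorial(26)
print(orderings)             # 403291461126605635584000000
print(f"{orderings:.2e}")    # 4.03e+26
```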
 
Tesha Donating Member (1000+ posts) Thu Apr-27-06 08:03 AM
Response to Reply #14
32. 'Depends on the number of letters-per-word.
> Does anyone have a calculation of the permutations
> and combinations of 26 letters?

It depends on the number of letters per word that you want
to consider. If you'll allow words like "supercalifragilisticexpialidocious",
then there are far too many permutations to bother counting, even
if we throw in some restrictions on combinations that are clearly
implausible English words (for example, "qqqqq").
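With repeated letters allowed, there are 26^n strings of exactly n letters, so the total explodes as the maximum length grows. A minimal Python sketch of that count; it is pure combinatorics and ignores whether a string is a plausible English word.

```python
def strings_up_to(max_len: int, alphabet: int = 26) -> int:
    """Count all letter strings of length 1..max_len (repeats allowed)."""
    return sum(alphabet ** n for n in range(1, max_len + 1))

print(strings_up_to(5))                              # 12356630
print(len("supercalifragilisticexpialidocious"))     # 34
print(f"{strings_up_to(34):.2e}")                    # about 1.34e+48
```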

Tesha
 
Name removed Donating Member (0 posts) Wed Apr-26-06 10:55 AM
Response to Original message
2. Deleted message
Message removed by moderator.
 
TechBear_Seattle Donating Member (1000+ posts) Wed Apr-26-06 10:58 AM
Response to Original message
3. And * knows MAYBE 1000. n/t
 
Monk06 Donating Member (1000+ posts) Wed Apr-26-06 11:14 AM
Response to Reply #3
7. "And * knows MAYBE 1000." Actually the average competent....
English speaker has a vocabulary of 600 words.
So Bush maybe has a vocabulary of 200.

On the other hand, at the rate he is slaughtering
the English language by turning verbs into nouns
and nouns into verbs, we'll be up to 2 billion by the
time he's gone.

Just look what he's done to the deficit. He's a
natural-born INFLATIONIST.

Oops, I think I just created a Bushism.
 
Tesha Donating Member (1000+ posts) Thu Apr-27-06 06:23 AM
Response to Reply #7
30. 600 words? Nonsense.
> Actually the average competent English speaker
> has a vocabulary of 600 words.

You *KNOW* that's not true. If you sat down with
a dictionary, turned to "Aardvark", and started
ticking off all the words you know, you'd have far
more than 600, probably before you reached "Ear".

It may be that the average competent English speaker
selects from a more-restricted vocabulary of 600 words
when speaking colloquially, but I think you'll find
the number of words "known" to the average speaker is
north of 5,000 and often, 10,000.

(By the way, this post of mine contains 73 unique words,
so more than 12% of your proposed vocabulary.)

Tesha
 
Tummler Donating Member (836 posts) Wed Apr-26-06 11:01 AM
Response to Original message
4. And half are euphemisms for sexual or excretory functions!
:hurts:
 
shain from kane Donating Member (1000+ posts) Wed Apr-26-06 11:11 AM
Response to Reply #4
5. With unique usages. Remember "turd blossom"? Now conferring with his
Edited on Wed Apr-26-06 11:12 AM by shain from kane
lawyers.
What is the meaning of this phrase in the dictionary?
I don't have a clue. But, of course, I don't hang around the blueblood crowd, his base, like the fearful leader. Most people that I know don't talk like that.
 
fshrink Donating Member (1000+ posts) Wed Apr-26-06 01:48 PM
Response to Reply #5
25. Like "what a bush!"
 
fshrink Donating Member (1000+ posts) Wed Apr-26-06 01:50 PM
Response to Reply #25
26. Or "go bush yourself".
 
mark11727 Donating Member (1000+ posts) Wed Apr-26-06 11:39 AM
Response to Reply #4
13. Like when I caught my finger in the door...
... I saw a million stars, and named every one of them.

:evilgrin:
 
Haole Girl Donating Member (1000+ posts) Wed Apr-26-06 11:37 AM
Response to Original message
12. Then why do most 15-year-olds only use one:
whatever :rofl:
 
DavidDvorkin Donating Member (1000+ posts) Wed Apr-26-06 11:45 AM
Response to Reply #12
16. "Whatever" is an excellent addition to the language
Used the way the kids use it, I mean. You have to add the eyes looking at the ceiling, the expression of disgust and disinterest, the shrug. Adults should adopt that and use it constantly. The world would be much improved.
 
One Honest Guy Donating Member (228 posts) Wed Apr-26-06 11:44 AM
Response to Original message
15. Beautiful!
Quantity over quality, eh? Mine is bigger than yours? Tsk, tsk. Anglo superiority complex in all its glory.

I've got another one for 'em: shamabang, which is short for: Western civilization is a decomposing, rotting mess which, at the current rate, will not survive the 21st century in its current form.

Put that in your lexicon and smoke it.

Praise the queen and pass the tea!
 
rucky Donating Member (1000+ posts) Wed Apr-26-06 12:10 PM
Response to Original message
17. Is Brazillion one of them? n/t
 
AngryAmish Donating Member (1000+ posts) Thu Apr-27-06 09:40 AM
Response to Reply #17
37. Post of the week.
 
Buns_of_Fire Donating Member (1000+ posts) Wed Apr-26-06 12:13 PM
Response to Original message
18. Deleted because both my brain and my fingers work three minutes
Edited on Wed Apr-26-06 12:33 PM by Buns_of_Fire
slower than rucky's...
 
Igel Donating Member (1000+ posts) Wed Apr-26-06 12:13 PM
Response to Original message
19. What an absurd title.
The OED folk maintain a corpus. A body of texts. They're doing corpus linguistics on it. The corpus contains texts, the word count for the texts is a billion.

There are Russian corpora (and yes, they use the Latin plural), Polish corpora, Serbian ... some are huge, many are not. Some are balanced (with a range of linguistic styles represented), some only colloquial, some are only literary. Some are tagged, many are not.

Stupid editor <-- stupid title.
 
Inland Donating Member (1000+ posts) Wed Apr-26-06 12:55 PM
Response to Original message
20. "Cutting edge"? I don't know about that.
Edited on Wed Apr-26-06 12:56 PM by Inland
English has two things that cause a multiplicity of words:

1) Because of Celtic, Germanic, Danish, and Romance influences, it has at least two words for every meaning, or similar meanings: doctor, physician; lawyer, attorney; pork, swine. Other languages can't do crossword puzzles, because other languages have a shortage of "another word for X, five letters."

2) Because Americans and the other former colonies are the largest users of English, there are no arbiters of "real" words. So new vocabulary is created for new technologies, and for the fun of it. It doesn't bother anybody that a word like "celebutante" has practically no use except for a snarky advice column.

While a large vocabulary MAY provide a richness of expression, in practice, it doesn't.
 
Bridget Burke Donating Member (1000+ posts) Wed Apr-26-06 01:03 PM
Response to Reply #20
21. A large vocabulary provides richness of expression....
To writers with skill.

("Celtic" languages had almost no influence on English. I was taught that "crag" was the only survival. Of course, other words were adapted later.)
 
Odin2005 Donating Member (1000+ posts) Wed Apr-26-06 03:43 PM
Response to Reply #20
27. Yep, English has tons of synonyms because of Latin and Norman French.
 
Burma Jones Donating Member (1000+ posts) Wed Apr-26-06 01:26 PM
Response to Original message
22. Being English, we just appropriate any words we want
It's a British thing: what's yours is mine... The USA has just taken over the responsibility these days...
 
Retrograde Donating Member (1000+ posts) Thu Apr-27-06 01:50 PM
Response to Reply #22
38. English never met a vocabulary it couldn't steal
Being a quasi-pidgin, English doesn't have to worry about case endings and stuff like that when it comes to, um, borrowing words. And since it doesn't have the equivalent of the Académie française to decide what's a word and what's not, anybody can join in the fun.

And it makes some sense to borrow words already in use for things that the English weren't familiar with: "What do you call that big hopping thing with the pouch?" "Kangaroo." "Well, that's easier to say than 'big hopping thing with the pouch' - we'll call it that too"

I wonder, though, how many of these words are actually in use. IIRC, the original OED listed every word it could find that occurred in writing at least once after about 1400: I suspect the current count comes about the same way. A billion words, maybe, but who uses words like "forsooth" and "sockdologizing" these days?

I wonder if the count includes variants like gr8 (great)?
 
Left Coast Lynn Donating Member (185 posts) Wed Apr-26-06 01:33 PM
Response to Original message
23. How many does B*sh know?
Any guesses?
 
arikara Donating Member (1000+ posts) Wed Apr-26-06 01:38 PM
Response to Original message
24. "Decider" being the one billionth...
:rofl:
 
Book Lover Donating Member (1000+ posts) Wed Apr-26-06 03:53 PM
Response to Reply #24
28. Don't you mean
brazillionth?
 
maxsolomon Donating Member (1000+ posts) Wed Apr-26-06 04:30 PM
Response to Original message
29. consider "bad"
1. bad = not good.
2. bad = very good, in the Michael Jackson sense.

is that 2 words or one?

that's how you get to a billion.

my daily vocabulary is WAY over 600 words. I used 'bailiwick' & 'hornswoggled' today.
 
slackmaster Donating Member (1000+ posts) Thu Apr-27-06 08:22 AM
Response to Original message
33. I'll bet "lolocaust" would make it 1,000,000,001
:argh:
 
robcon Donating Member (1000+ posts) Thu Apr-27-06 09:13 AM
Response to Reply #33
34. I think Bush has added a word
Edited on Thu Apr-27-06 09:13 AM by robcon
"overmisunderestimated"
 
AlCzervik Donating Member (1000+ posts) Thu Apr-27-06 09:30 AM
Response to Original message
35. And Bush knows about 8 of them.
 
AngryAmish Donating Member (1000+ posts) Thu Apr-27-06 09:39 AM
Response to Original message
36. It is perfectly cromulent
The strength of the language enbiggens everybody.
 
Important Notices: By participating on this discussion board, visitors agree to abide by the rules outlined on our Rules page. Messages posted on the Democratic Underground Discussion Forums are the opinions of the individuals who post them, and do not necessarily represent the opinions of Democratic Underground, LLC.

© 2001 - 2011 Democratic Underground, LLC