Wiktionary:frequency lists
This page was copied from here. The aim is to add the simple words in these lists to the Tamil Wiktionary as a priority.
Most common words (TV and movie scripts)
Here are frequency lists comparable to the Gutenberg ones, but based on 29,213,800 words from TV and movie scripts and transcripts.
Here's a fuller explanation of how the list was generated and its limitations: Wiktionary:Frequency lists/TV/2006/explanation.
Here are the top 100 words (from TV scripts) in alphabetical order:
- and · are · as · at · a · back · because · been · be · but · can't · can · come · could · didn't · did · don't · do · for · from · get · going · good · got · go · had · have · he's · here · her · hey · he · him · his · how · I'll · I'm · if · in · is · it's · it · I · just · know · like · look · mean · me · my · not · now · no · of · oh · okay · ok · one · on · or · out · really · right · say · see · she · something · some · so · tell · that's · that · then · there · they · the · think · this · time · to · up · want · was · well · were · we · what · when · who · why · will · with · would · yeah · yes · you're · your · you
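A list like the one above comes from a plain frequency count: tally every word, keep the 100 most frequent, then re-sort those alphabetically. The sketch below is a hypothetical illustration, not the script used for these lists. The tokenizer (letters optionally joined by apostrophes, so "don't" stays one token) and the lowercasing are assumptions. The list above keeps "I" capitalized and orders "a" after "at", so its normalization and sort order evidently differed.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of letters, optionally joined by apostrophes,
# so "don't" and "I'm" stay single tokens.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")

def count_words(text: str) -> Counter:
    """Straight frequency count; every token is lowercased (an assumption)."""
    counts = Counter()
    for tok in TOKEN_RE.findall(text):
        counts[tok.lower()] += 1
    return counts

def top_n_alphabetical(counts: Counter, n: int = 100) -> list[str]:
    """Take the n most frequent words, then sort them alphabetically."""
    top = [word for word, _ in counts.most_common(n)]
    return sorted(top)
```

For example, `top_n_alphabetical(count_words("You know, you don't know. You do!"), 2)` keeps the two most frequent words, "you" and "know", and returns them in alphabetical order.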
Here they are in frequency order:
- 1-1000 · 1001-2000 · 2001-3000 · 3001-4000 · 4001-5000 · 5001-6000 · 6001-7000 · 7001-8000 · 8001-9000 · 9001-10000
Bigger chunks now (getting lazy):
- 10001-12000 · 12001-14000 · 14001-16000 · 16001-18000 · 18001-20000 · 20001-22000 · 22001-24000 · 24001-26000 · 26001-28000 · 28001-30000 · 30001-32000 · 32001-34000 · 34001-36000 · 36001-38000 · 38001-40000
- 40001-41284 (the dregs that were tied for 40,000th place)
That'll probably be it. These 41,284 words are a third of all the unique words; each of the rest was used five or fewer times.
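The cutoff described above (ranks 40001-41284 holding words tied for 40,000th place) and the division into rank ranges can be sketched as follows. This is a hypothetical illustration: ordering tied words alphabetically is an assumption, because the original does not say how ties were ranked.

```python
from collections import Counter

def ranked_with_ties(counts: Counter, n: int) -> list[tuple[str, int]]:
    """Return the top n words by frequency, extended to include every word
    whose count equals that of the n-th ranked word."""
    if n <= 0:
        return []
    # Highest count first; ties broken alphabetically (assumption).
    ranked = sorted(counts.items(), key=lambda wc: (-wc[1], wc[0]))
    if len(ranked) <= n:
        return ranked
    threshold = ranked[n - 1][1]  # count of the n-th ranked word
    end = n
    while end < len(ranked) and ranked[end][1] == threshold:
        end += 1
    return ranked[:end]

def chunk_ranges(total: int, size: int, start: int = 1) -> list[tuple[int, int]]:
    """Split ranks start..total into inclusive (first, last) ranges of `size`."""
    ranges = []
    first = start
    while first <= total:
        last = min(first + size - 1, total)
        ranges.append((first, last))
        first = last + 1
    return ranges
```

With these helpers, `chunk_ranges(10000, 1000)` yields the ten pages 1-1000 through 9001-10000, and `chunk_ranges(41284, 2000, start=10001)` yields 10001-12000 through 40001-41284, matching the page split above.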
Most common words (Gutenberg)
These lists give the most frequent words found by a simple, straight (obvious) frequency count of all the books on Project Gutenberg. The collection of books was copied with "rsync" in July 2005. The words are mostly English, with other languages represented to a lesser extent. Note also that the Project Gutenberg boilerplate warning appears in every one of the 24,000+ books, which inflates the counts of the words it contains.
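The straight count described above, along with the corpus statistics reported for it (number of files, total words, average words per file, unique words), can be sketched roughly as follows. This is a hypothetical illustration, not the script used to build these lists: the tokenizer, the lowercasing, and the `*** START OF ...` / `*** END OF ...` marker patterns are assumptions, and Gutenberg files do not all use the same markers. The note above suggests the boilerplate was counted along with the books; the optional `strip` flag only shows one way it could be excluded.

```python
import re
from collections import Counter

# Assumed tokenizer: letters optionally joined by apostrophes.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")

# Assumed marker lines; real Gutenberg texts may need more patterns.
START_RE = re.compile(r"^\*\*\* ?START OF (THIS|THE) PROJECT GUTENBERG", re.M)
END_RE = re.compile(r"^\*\*\* ?END OF (THIS|THE) PROJECT GUTENBERG", re.M)

def strip_boilerplate(text: str) -> str:
    """Keep only the text between the START and END marker lines, if both exist."""
    start, end = START_RE.search(text), END_RE.search(text)
    if start and end and start.end() < end.start():
        # END matches at a line start, so a newline always follows the START line.
        body_start = text.index("\n", start.end()) + 1
        return text[body_start:end.start()]
    return text

def count_corpus(texts, strip: bool = False):
    """Return (word counts, number of files, total words, average words/file)."""
    counts = Counter()
    files = 0
    for text in texts:
        if strip:
            text = strip_boilerplate(text)
        counts.update(tok.lower() for tok in TOKEN_RE.findall(text))
        files += 1
    total = sum(counts.values())
    avg = total / files if files else 0.0
    return counts, files, total, avg
```

The number of unique "words" is then `len(counts)`. To run it over a hypothetical local directory `gutenberg/`, one could pass `(p.read_text(encoding="utf-8", errors="replace") for p in Path("gutenberg").rglob("*.txt"))` as `texts`, after `from pathlib import Path`.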
Here are the top 100 words (from Project Gutenberg texts) in alphabetical order:
- about · after · all · and · any · an · are · as · at · a · been · before · be · but · by · can · could · did · down · do · first · for · from · good · great · had · has · have · her · he · him · his · if · into · in · is · its · it · I · know · like · little · made · man · may · men · me · more · mr · much · must · my · not · now · no · of · one · only · on · or · other · our · out · over · said · see · she · should · some · so · such · than · that · their · them · then · there · these · they · the · this · time · to · two · upon · up · us · very · was · were · we · what · when · which · who · will · with · would · your · you
- I would appreciate it if someone subdivided these lists into separate pages.
- These wikified terms can be copied to other-language Wiktionaries; that is what they are intended for. If you do copy them, please add an interwiki link to this page. I am interested in seeing how other-language Wiktionaries fared.
- New list as of 4/16/2006:
- Wiktionary:Frequency lists/PG/2006/04/1-10000
- Wiktionary:Frequency lists/PG/2006/04/10001-20000
- Wiktionary:Frequency lists/PG/2006/04/20001-30000
- Wiktionary:Frequency lists/PG/2006/04/30001-40000
- New list as of 10/10/2005:
- The same list divided into blocks of 1,000 words:
- 1-1000 · 1001-2000 · 2001-3000 · 3001-4000 · 4001-5000 · 5001-6000 · 6001-7000 · 7001-8000 · 8001-9000 · 9001-10000
- more to come...
- Older lists
- Most common words, in order of rank:
- Wiktionary:Frequency lists/Project Gutenberg 1-10000
- Wiktionary:Frequency lists/Project Gutenberg 10001-20000
- Wiktionary:Frequency lists/Project Gutenberg 20001-30000
- Wiktionary:Frequency lists/Project Gutenberg 30001-40000
- Wiktionary:Frequency lists/Project Gutenberg 40001-50000
- Wiktionary:Frequency lists/Project Gutenberg 50001-60000
- Wiktionary:Frequency lists/Project Gutenberg 60001-70000
- Wiktionary:Frequency lists/Project Gutenberg 70001-80000
- Wiktionary:Frequency lists/Project Gutenberg 80001-90000
- Wiktionary:Frequency lists/Project Gutenberg 90001-100000
- Approximately 24,197 files and 1,712,082,956 words (an average of 70,756.0 words per file), from which about 9,053,310 unique "words" were gleaned.
- Words already present in the then-current copy of Wiktionary were then removed from the straight frequency count, including words whose entries consisted only of a redirect.
- With somewhat different filtering/selection criteria: