English Frequency Lists

From here, you can download token and document frequency lists extracted from the “classic” Brown and LOB corpora of written English.

These are relatively small corpora by today's standards, but that is not always a bad thing (e.g., I find that they sometimes work better than very large corpora as “reference” corpora for corpus comparison purposes).

The lists (gzipped):

Here is a log of how I created these lists. It can be useful for checking, e.g., that you are comparing them with lists that were tokenized in a similar way.
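
If you want to load one of the gzipped lists for such a comparison, something along these lines may help. This is only a minimal sketch: it assumes each line holds one token and one integer count separated by whitespace (in either order), which may not match the actual layout of the files, so check the log first. The function names `load_freq_list` and `per_million` are made up for this illustration.

```python
import gzip
from collections import Counter


def load_freq_list(source):
    """Read a gzipped frequency list into a Counter.

    ASSUMPTION: one entry per line, a token and an integer count separated
    by whitespace, in either order. Adjust to the real file layout.
    `source` may be a file name or a binary file object.
    """
    counts = Counter()
    with gzip.open(source, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            fields = line.split()
            if len(fields) != 2:
                continue  # skip blank or malformed lines
            first, second = fields
            if second.isdigit():
                # "token count" order (also chosen when both fields are digits)
                counts[first] += int(second)
            elif first.isdigit():
                # "count token" order
                counts[second] += int(first)
    return counts


def per_million(counts, token):
    """Relative frequency of `token` per million tokens in `counts`."""
    total = sum(counts.values())
    return 1_000_000 * counts[token] / total if total else 0.0
```

Normalizing to a per-million rate makes counts from a small reference list comparable with counts from a corpus of a different size, provided both were tokenized in a similar way.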
