I added the following extra programs and Perl modules to the basic Knoppix 3.3 distribution:

famous lexical database for the English language.

- Natural Language
A suite of Python libraries and programs for natural language processing, with an impressive collection of sample data that can also be used with the other programs.

collection of Part-of-Speech Taggers, with pre-trained models to tag Italian text.

tagger and lemmatizer with pre-trained models to tag English, German, Italian and French.

Transformation-Based Learning toolkit, pre-trained to perform English POS tagging and NP and text chunking.

library to perform tokenization, sentence splitting, morphological analysis, NE detection and POS tagging, which comes with a simple command line interface and pre-trained models for English, Spanish and

Tool and dictionary to perform tokenization and morphological analysis of Japanese text.

programs to extract specialized corpora and terms from the web.

of the K-vec algorithm to extract candidate translations from parallel

Statistics Package (NSP)
Perl programs to extract n-grams from corpora and evaluate their association strength.

- UCS (Utilities for
A toolkit of Perl and R programs for the analysis of cooccurrence statistics.

complete unsupervised word sense discrimination system.

- The Bow
Toolkit to perform document classification, retrieval and clustering, and other statistical text analysis

Jan Daciuk's tools to build and use finite state automata and transducers.

Whitelock's simple Perl concordancer.

powerful statistical analysis environment.

Here is a list of all the Perl modules installed in Knorpora (the ones listed above, the ones that are already part of the standard Knoppix distribution, and the ones that I installed to satisfy dependencies).

I had originally planned to include more corpora, but I then
realized that it makes more sense for users to download corpora and
other resources in the languages they are interested in than for me to
pick an arbitrary set of languages. There should be enough (English)
data to get started with in the NLTK directory. For pointers to more
freely available data, please visit my NLP
data link list.