ddpl_dataset

The following text files contain the two word lists used in the surveys described in my paper "Distribution-driven morpheme discovery: A computational/experimental study". For each word, the files also give a code for the parse assigned by the DDPL model, the average morphological complexity rating assigned by native English speakers, and other statistics.

The first line of each file is a header with labels for each field; see below for an explanation of these labels.

First Survey
Second Survey

Fields:

form: the word to be rated
prefix: prefix of word
stem: potential stem of word
form_length: length of word in characters
prefix_length: length of prefix in characters
stem_length: length of potential stem in characters
form_fq: frequency of word in corpus
stem_fq: frequency of potential stem (as an independent string) in corpus
ddpl_parse: parse assigned by the DDPL model (0 = simple; 1 = complex)
avg_rating: average of speakers' ratings (on 1-to-5 scale)
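
As a rough illustration of how the fields above might be read, here is a minimal Python sketch. It assumes the files are whitespace-delimited with one word per line, which is not stated on this page, so adjust the splitting if the actual delimiter differs. The function names (read_survey, mean_rating_by_parse) are hypothetical, and the three sample rows are invented placeholders, not values from the surveys.

```python
import io
from collections import defaultdict

# Fields holding integer values, per the field list above.
INT_FIELDS = ("form_length", "prefix_length", "stem_length",
              "form_fq", "stem_fq", "ddpl_parse")


def read_survey(handle):
    """Read a survey file: a header line, then one word per line.

    Assumption: fields are separated by whitespace.
    """
    header = handle.readline().split()
    rows = []
    for line in handle:
        parts = line.split()
        if not parts:  # skip blank lines
            continue
        row = dict(zip(header, parts))
        for name in INT_FIELDS:
            row[name] = int(row[name])
        row["avg_rating"] = float(row["avg_rating"])
        rows.append(row)
    return rows


def mean_rating_by_parse(rows):
    """Average avg_rating separately for each ddpl_parse value (0 or 1)."""
    totals = defaultdict(lambda: [0.0, 0])  # parse -> [sum, count]
    for row in rows:
        totals[row["ddpl_parse"]][0] += row["avg_rating"]
        totals[row["ddpl_parse"]][1] += 1
    return {parse: total / count for parse, (total, count) in totals.items()}


# Invented placeholder rows, for illustration only (NOT from the dataset).
sample = io.StringIO(
    "form prefix stem form_length prefix_length stem_length "
    "form_fq stem_fq ddpl_parse avg_rating\n"
    "redo re do 4 2 2 10 500 1 4.5\n"
    "unfair un fair 6 2 4 20 300 1 3.5\n"
    "record re cord 6 2 4 50 40 0 2.0\n"
)
rows = read_survey(sample)
means = mean_rating_by_parse(rows)
```

With a real file, the same function could be called on an open file handle, for example read_survey(open(path)) where path points to a downloaded survey file. Comparing the two group means gives a quick first look at whether words parsed as complex by DDPL also tend to receive higher complexity ratings.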
