In this file I keep a detailed log of my Knoppix-to-Knorpora remastering attempt.

Mainly, I will follow these Knoppix remastering howtos:

http://www.knoppix.net/docs/index.php/KnoppixRemasteringHowto
http://gnubox.dyndns.org:8080/~sunil/knoppix.php

In the future, if I find the time to release new versions of Knorpora, I must seriously consider the possibility of using checkinstall so that I don't have to recompile everything every time!

I am going to work on a Pentium 3 870MHz machine with 512MB of RAM. Normally this machine runs Windows XP, but we created an ext2 partition of 12GB and a swap partition of 568MB (according to the howtos, I'll need 1GB of memory in total, RAM plus swap).

I boot the machine from the Knoppix CD and at the prompt I type:

knoppix dma

I immediately set up the network: I select Network/Internet, then Network card configuration from the Knoppix menu (the penguin icon to the right of the K icon in the panel, at least in Knoppix 3.3), and I enter the relevant data.

In order to be able to use X11 once I chroot, I open a non-root terminal and I type:

knoppix@ttyp1[knoppix]$ xhost +

Now I start a root shell from the Knoppix menu.

I mount hda5, which is the ext2 partition where I will work (first I umount it, just in case):

root@ttyp1[knoppix]# umount /mnt/hda5
root@ttyp1[knoppix]# mount -o rw /dev/hda5 /mnt/hda5

I create a general directory for remastering work:

root@ttyp1[knoppix]# mkdir /mnt/hda5/source

I copy the contents of the /KNOPPIX directory into a KNOPPIX directory inside the source directory:

root@ttyp1[knoppix]# cp -Rp /KNOPPIX /mnt/hda5/source

This takes about 10 minutes on this machine.

I chroot so that from now on /mnt/hda5/source/KNOPPIX will be my /:

root@ttyp1[knoppix]# chroot /mnt/hda5/source/KNOPPIX

Size of KNOPPIX before adding anything (and before mounting proc):

root@ttyp1[/]# du -hs /
2.0G /

(This took a few minutes.)

In order to use X applications, I do:

root@ttyp1[/]# export DISPLAY=MYIP:0.0

(MYIP means the IP of the machine I am using).
In order to be able to connect to the outside world, I do:

root@ttyp1[/]# mount -t proc /proc proc

and I add the following line to the file /etc/dhcpc/resolv.conf:

nameserver MYNAMESERVER

(MYNAMESERVER stands for the IP address of the name server).

At this point, I use ifconfig to check that the IP address is the same as before chrooting, and I ping google to check that I am online. Good, it works.

Let's start getting rid of things. I have already prepared a list of packages that I know I will get rid of, named knorpora_kicklist.txt (you can find it in the knorpora documentation).

[ One can construct or update such a list by issuing the following command and looking for huge packages:

root@ttyp1[/]# dpkg-query -W --showformat='${Installed-Size} ${Package}\n' | sort -nr | less
]

First, I run apt-get update:

root@ttyp1[/]# apt-get update

Good, now it's mayhem time:

root@ttyp1[/]# apt-get --purge remove `cat knorpora_kicklist.txt`

This takes a while, and from the warnings it looks like there are some non-empty directories that I will have to remove manually:

root@ttyp1[/]# rm -fr /etc/falconseye /var/lib/wine /var/games /etc/gimp

Let's kill the orphans and the orphans of the orphans:

root@ttyp1[/]# deborphan > orphanlist
root@ttyp1[/]# apt-get --purge remove `cat orphanlist`
root@ttyp1[/]# deborphan > secondorphanlist
root@ttyp1[/]# apt-get --purge remove `cat secondorphanlist`
root@ttyp1[/]# deborphan > thirdorphanlist
root@ttyp1[/]# more thirdorphanlist
root@ttyp1[/]#

Good, no orphan is left, and we can clean up:

root@ttyp1[/]# rm -f knorpora_kicklist.txt orphanlist secondorphanlist thirdorphanlist

In previous tests, I noticed that apt-get --purge does not really get rid of all the links on the menus and icons on the kicker, so I'll try to do that manually.

I remove all the URL_Button_5 info from the following file:

root@ttyp1[/]# emacs /etc/skel/.kde/share/config/kickerrc &
root@ttyp1[config]# rm -f kickerrc~

Hopefully this got rid of the OpenOffice link on the kicker.
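The three deborphan passes above (list the orphans, purge them, list again until the list comes back empty) could be written as a loop. What follows is only a minimal sketch under assumptions: purge_orphans, LIST_CMD and PURGE_CMD are names made up for illustration and are not part of deborphan or apt-get; by default the two variables simply hold the commands used in this log.

```shell
#!/bin/sh
# Sketch only: repeat "list orphans, purge them" until nothing is left.
# LIST_CMD and PURGE_CMD are hypothetical hooks (not part of deborphan or
# apt-get); by default they hold the two commands used in the log.
: "${LIST_CMD:=deborphan}"
: "${PURGE_CMD:=apt-get --purge remove}"

purge_orphans() {
    passes=0
    while :; do
        orphans=$($LIST_CMD)
        # Stop as soon as the listing comes back empty.
        [ -n "$orphans" ] || break
        # Unquoted on purpose: one argument per package name.
        # Give up if the purge fails or is refused, to avoid looping forever.
        $PURGE_CMD $orphans || return 1
        passes=$((passes + 1))
    done
    echo "orphan passes: $passes"
}
```

Inside the chroot one would just call purge_orphans as root; apt-get still asks for confirmation at each pass, exactly as in the manual session.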
Also, somewhat more pessimistically, I do:

root@ttyp1[knoppix]# cd /usr/share/applnk/
root@ttyp1[applnk]# rm -fr Office
root@ttyp1[applnk]# cd Applications
root@ttyp1[Applications]# rm -fr Editors Educational Emulators Viewers Graphics Programming Security Sound Technical
root@ttyp1[Applications]# cd ..
root@ttyp1[applnk]# rm -fr Apps/ Edutainment/ Games/ Toys/
root@ttyp1[/]# cd /etc/skel/.kde/share/applnk/
root@ttyp1[applnk]# rm Multimedia/Viewers/Acrobat_Reader.desktop

OK, we are done with the pars destruens, let's move on to more positive things, like installation of cool software.

My policy regarding where things will go: if something is automatically installed by make files or similar, we'll let it go where it would naturally install itself. The other things will go to /usr/local/knorpora:

root@ttyp1[/]# mkdir usr/local/knorpora

I prepared an archive with all the stuff I plan to install (except what I can get via package managers and such), and now I download it to the knorpora directory.

I unpack it and I move everything inside the newly expanded directory up to the knorpora level:

root@ttyp1[knorpora]# tar xvzf knorpora_materials.tar.gz
root@ttyp1[knorpora]# mv knorpora_materials/* .
root@ttyp1[knorpora]# rm -fr knorpora_materials*

Here is what I've got:

root@ttyp1[knorpora]# ls
BootCaT-0.1.2.tar.gz         fsa_current.tar.gz
FreeLing-1.0.1.tar.gz        kwic
ItaACOPOSTModels.tar.gz      libcfg+-0.6.2.tar.gz
K-vec++.v03.tar.gz           nltk-1.3-docs.tar.gz
SenseClusters-v0.49.tar.gz   nltk-1.3.tar.gz
SenseTools-0.3.tar.gz        nltk-data-0.2.zip
UCS-0.3.1.tar.gz             nsp-v0.69.tar.gz
WordNet-2.0.tar.gz           pcre-4.3.tar.gz
acopost-1.8.4.tar.gz         regexp_tokenizer-0.01.tar.gz
bow-20020213.tar.gz          treetagger
db-4.2.52.NC.tar.gz          utr_current.tar.gz
fnTBL-1.1.linux.tar.gz

The treetagger directory contains:

root@ttyp1[knorpora]# ls treetagger/
english-chunker-par-linux-3.1.bin.gz  info
english-par-linux-3.1.bin.gz          install-tagger.sh
french-par-linux-3.1.bin.gz           italian-par-linux-3.1.bin.gz
german-chunker-par-linux-3.1.bin.gz   tagger-scripts.tar.gz
german-par-linux-3.1.bin.gz           tree-tagger-linux-3.1.tar.gz

The treetagger/info directory contains:

root@ttyp1[knorpora]# ls treetagger/info/
Penn-Treebank-Tagset.ps  french-tagset.html      italian-tagset.txt
french-par-linux.info    italian-par-linux.info  stts_guide.ps.gz

OK, I have everything I need.

I will start by installing the Berkeley DB, which will be needed by WordNet::Similarity and FreeLing (with C++ support).

root@ttyp1[knorpora]# tar xvzf db-4.2.52.NC.tar.gz
root@ttyp1[knorpora]# cd db-4.2.52.NC/build_unix/
root@ttyp1[build_unix]# ../dist/configure --enable-cxx
root@ttyp1[build_unix]# make
root@ttyp1[build_unix]# make install

Now I install what I can install via apt-get (R, chasen and related files). Then I will install the perl modules, which I can do via CPAN. However, before installing the perl modules I will install WordNet, since some modules depend on that.
In order to install R, I first modify the file:

root@ttyp1[knorpora]# emacs /etc/apt/sources.list &

by adding the following lines at the bottom:

# added by me in order to install r
# right now following is down, so I try alternative below
#deb http://cran.r-project.org/bin/linux/debian woody main
deb http://cran.get-software.com/bin/linux/debian woody main

root@ttyp1[knorpora]# rm -f /etc/apt/sources.list~

Then I run apt-get update again, and I install R.

root@ttyp1[knorpora]# apt-get update
root@ttyp1[knorpora]# apt-get install r-base
root@ttyp1[knorpora]# apt-get install r-recommended
root@ttyp1[knorpora]# apt-get install r-doc-html

I install chasen:

root@ttyp1[knorpora]# apt-get install chasen

Now I install WordNet:

root@ttyp1[knorpora]# tar xvzf WordNet-2.0.tar.gz
root@ttyp1[knorpora]# cd WordNet-2.0/

I edit the Makefile and I make:

root@ttyp1[WordNet-2.0]# emacs Makefile &
root@ttyp1[WordNet-2.0]# make BinWorld

I add the relevant paths to .bashrc in /etc/skel (this .bashrc will be copied to the knoppix home when knoppix/knorpora is launched, so please take a look at .bashrc in the knoppix home to see what I did).
Also, I export/update the relevant variables in the current session, because some of the perl modules will try to access WordNet:

root@ttyp1[knorpora]# export WNHOME=/usr/local/WordNet-2.0;
root@ttyp1[knorpora]# MANPATH=$MANPATH:/usr/local/WordNet-2.0/man;
root@ttyp1[knorpora]# PATH=$PATH:$WNHOME/bin
root@ttyp1[knorpora]# rm -fr WordNet-2.0*

Before moving on to perl, we do the following, which seems like a good idea, given that many scripts invoke perl from /usr/local/bin/perl rather than from /usr/bin/perl:

root@ttyp1[knorpora]# ln -s /usr/bin/perl /usr/local/bin/perl
root@ttyp1[knorpora]# rm -fr db-4.2.52*

OK, now I install a number of perl modules, which are interesting by themselves and/or are used by some of the tools I am about to install (recently, Net::Google has had a problem with its spelling test, so we have to force install):

root@ttyp1[knorpora]# perl -MCPAN -e shell;
cpan> install S/SB/SBURKE/Pod-Perldoc-3.12.tar.gz
cpan> install HTML::FormatText
cpan> force install Net::Google
cpan> install XML::Twig
cpan> install WordNet::QueryData
cpan> install BerkeleyDB
cpan> install WordNet::Similarity
cpan> install PDL
cpan> install Expect
cpan> install Term::ReadKey
cpan> install Bit::Vector
cpan> install Sparse
cpan> install Set::Scalar
cpan> quit

We go on with one of the most ambitious installs, nltk.

We need the python-dev package, which is not in knoppix by default:

root@ttyp1[nltk-1.2]# apt-get install python-dev

Nltk also requires the Numeric Python package, so I will install that one as well.
I download the Numeric archive to knorpora, and I do:

root@ttyp1[knorpora]# tar xvzf Numeric-23.1.tar.gz
root@ttyp1[knorpora]# cd Numeric-23.1/
root@ttyp1[Numeric-23.1]# python setup.py install
root@ttyp1[knorpora]# rm -fr Numeric-23.1*

Now I install nltk:

root@ttyp1[knorpora]# tar xvzf nltk-1.3.tar.gz
root@ttyp1[knorpora]# cd nltk-1.3
root@ttyp1[nltk-1.3]# python setup.py install

I remove everything that was inside nltk-1.3, I move the docs and data archives over there and I unpack them:

root@ttyp1[nltk-1.3]# rm -fr *
root@ttyp1[nltk-1.3]# mkdir nltk-1.3-docs
root@ttyp1[nltk-1.3]# mv ../nltk-1.3-docs.tar.gz nltk-1.3-docs
root@ttyp1[nltk-1.3]# cd nltk-1.3-docs/
root@ttyp1[nltk-1.3-docs]# tar xvzf nltk-1.3-docs.tar.gz
root@ttyp1[nltk-1.3-docs]# rm -f nltk-1.3-docs.tar.gz
root@ttyp1[nltk-1.3]# mv ../nltk-data-0.2.zip .
root@ttyp1[nltk-1.3]# unzip nltk-data-0.2.zip
root@ttyp1[nltk-1.3]# rm -f nltk-data-0.2.zip

I also add a mini-readme:

root@ttyp1[nltk-1.2]# emacs README.first &
root@ttyp1[knorpora]# rm -f nltk-1.3.tar.gz

Let's do Bow:

root@ttyp1[knorpora]# tar xvzf bow-20020213.tar.gz
root@ttyp1[knorpora]# cd bow-20020213/
root@ttyp1[bow-20020213]# ./configure
root@ttyp1[bow-20020213]# make
root@ttyp1[bow-20020213]# make install
root@ttyp1[knorpora]# rm -fr bow-20020213*

Let's do ACOPOST:

root@ttyp1[knorpora]# tar xvzf acopost-1.8.4.tar.gz
root@ttyp1[knorpora]# cd acopost-1.8.4/src
root@ttyp1[src]# make
root@ttyp1[src]# make install
root@ttyp1[src]# rm -f *.o et met t3 tbt
root@ttyp1[src]# cd ../doc/ug/
root@ttyp1[ug]# make
root@ttyp1[ug]# rm -f Makefile ug.aux ug.bbl ug.bib ug.blg ug.dvi ug.log ug.tex ug.toc

I include the bin directory of acopost in the path in the /etc/skel/.bashrc file.
Adding the Italian ACOPOST models:

root@ttyp1[knorpora]# mv ItaACOPOSTModels.tar.gz acopost-1.8.4/
root@ttyp1[knorpora]# cd acopost-1.8.4/
root@ttyp1[acopost-1.8.4]# tar xvzf ItaACOPOSTModels.tar.gz
root@ttyp1[knorpora]# rm -f acopost-1.8.4.tar.gz acopost-1.8.4/ItaACOPOSTModels.tar.gz

Installing Daciuk's FSA toolkit:

root@ttyp1[knorpora]# tar xvzf fsa_current.tar.gz
root@ttyp1[knorpora]# cd s_fsa/
root@ttyp1[s_fsa]# make
root@ttyp1[s_fsa]# make install
root@ttyp1[knorpora]# rm -fr s_fsa fsa_current.tar.gz
root@ttyp1[knorpora]# tar xvzf utr_current.tar.gz
root@ttyp1[knorpora]# cd u_tr/
root@ttyp1[u_tr]# make
root@ttyp1[u_tr]# make install
root@ttyp1[knorpora]# rm -fr u_tr utr_current.tar.gz

(I get errors about the installation of dictionaries. This is probably because I decided not to install the dictionaries available on Daciuk's site, since they seem to be in obsolete formats.)

Now I install BootCaT:

root@ttyp1[knorpora]# tar xvzf BootCaT-0.1.2.tar.gz
root@ttyp1[knorpora]# rm -f BootCaT-0.1.2.tar.gz

I add BootCaT to the /etc/skel/.bashrc path.

NSP:

root@ttyp1[knorpora]# tar xvzf nsp-v0.69.tar.gz
root@ttyp1[nsp-v0.69]# perl Makefile.PL
root@ttyp1[nsp-v0.69]# make
root@ttyp1[nsp-v0.69]# make install
root@ttyp1[nsp-v0.69]# cd Testing/
root@ttyp1[Testing]# csh ALL-TESTS.sh

(This prints a lot of "testXX.sh command not found" warnings... I don't know if I should worry, but on superficial testing the package seems to work...)

We leave only the documentation, everything else is somewhere in the system (the docs are also somewhere in the system, but they are hard to find...)

root@ttyp1[nsp-v0.69]# rm -fr CHANGES FDL GPL INSTALL MANIFEST Makefile.PL Measures NSP.pm Testing Utils README count.pl statistic.p
root@ttyp1[knorpora]# rm -f nsp-v0.69.tar.

Now I do K-vec++:

root@ttyp1[knorpora]# tar xvzf K-vec++.v03.tar.gz
root@ttyp1[knorpora]# rm -f K-vec++.v03.tar.gz

I add the K-vec++ directory to the path in the /etc/skel/.bashrc file.
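The repeated "add X to the path in /etc/skel/.bashrc" step is done by hand in an editor throughout this log, and the actual lines are not shown here. As a hedged illustration only, the step could be scripted so that each directory is appended once; add_to_skel_path is a hypothetical helper name, and the "export PATH=$PATH:DIR" line format is an assumption, not a copy of the real .bashrc.

```shell
#!/bin/sh
# Sketch only: append "export PATH=$PATH:DIR" to a bashrc file, once.
# add_to_skel_path is a hypothetical helper name; the exact lines in the
# real /etc/skel/.bashrc are not shown in the log.
add_to_skel_path() {
    dir=$1
    bashrc=${2:-/etc/skel/.bashrc}
    line="export PATH=\$PATH:$dir"
    # -F fixed string, -x whole-line match, -q quiet: skip if already there.
    if ! grep -qxF -- "$line" "$bashrc" 2>/dev/null; then
        printf '%s\n' "$line" >> "$bashrc"
    fi
}
```

For example, add_to_skel_path /usr/local/knorpora would cover the kwic script; running it a second time leaves the file unchanged.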
The kwic script:

root@ttyp1[knorpora]# chmod +x kwic

And I add knorpora to the path.

The fnTBL toolkit:

root@ttyp1[knorpora]# tar xvzf fnTBL-1.1.linux.tar.gz
root@ttyp1[knorpora]# cd fnTBL-1.1/
root@ttyp1[fnTBL-1.1]# make all

Testing suggested in the README:

root@ttyp1[pos-tagging]# ../../exec/pos-train.prl -F tbl.lexical.train.params,tbl.context.pos.params -r 0.4 -R lexical.rls,context.rls -S 11.part1,11.part2 -t NN,NNP -T 2,2 -f 10 -v 11
root@ttyp1[pos-tagging]# ../../exec/pos-apply.prl -F tbl.lexical.train.params,tbl.context.pos.params -R lexical.rls,context.rls -o 22.res -t NN,NNP -v 22

The files 22.res, lexical.rls and context.rls do look similar to their ``official'' counterparts (as they should), at least on superficial inspection. Notice that to make the test work I had to change -f 5 to -f 10 in the first command (with respect to what the README suggests).

root@ttyp1[knorpora]# rm -f fnTBL-1.1.linux.tar.gz

On to the TreeTagger:

root@ttyp1[knorpora]# cd treetagger/
root@ttyp1[treetagger]# chmod +x install-tagger.sh
root@ttyp1[treetagger]# ./install-tagger.sh

I noticed in previous trials that there are permission issues, so I do:

root@ttyp1[treetagger]# chmod +xr cmd
root@ttyp1[treetagger]# chmod +xr lib
root@ttyp1[treetagger]# cd lib/
root@ttyp1[lib]# chmod +rx *

I then edit the usual .bashrc file.

Cleaning up:

root@ttyp1[treetagger]# rm -f english-chunker-par-linux-3.1.bin.gz english-par-linux-3.1.bin.gz french-par-linux-3.1.bin.gz german-chunker-par-linux-3.1.bin.gz german-par-linux-3.1.bin.gz italian-par-linux-3.1.bin.gz tagger-scripts.tar.gz tree-tagger-linux-3.1.tar.gz

Now, I'll do SenseClusters, which first requires installation of SenseTools:

root@ttyp1[knorpora]# tar xvzf SenseTools-0.3.tar.gz
root@ttyp1[knorpora]# rm -f SenseTools-0.3.tar.gz

I edit .bashrc.
No CLUTO (licensing issues) and no SVDPACK, for the Knorpora version of SenseClusters :-(

On to SenseClusters:

root@ttyp1[knorpora]# tar xvzf SenseClusters-v0.49.tar.gz
root@ttyp1[SenseClusters-v0.49]# perl Makefile.PL
root@ttyp1[SenseClusters-v0.49]# make
root@ttyp1[SenseClusters-v0.49]# make install
root@ttyp1[SenseClusters-v0.49]# Testing/testall.sh

It looks like it's doing fine...

I do not throw away the directory, since it looks like it has some useful stuff that was not installed elsewhere (e.g., the Demos).

root@ttyp1[knorpora]# rm -fr SenseClusters-v0.49.tar.gz

On to Stefan Evert's mighty UCS toolkit!

root@ttyp1[knorpora]# tar xvzf UCS-0.3.1.tar.gz
root@ttyp1[knorpora]# cd UCS/
root@ttyp1[UCS]# perl System/Install.perl
Warning: one or more recommended modules are not installed on your system.
UCS/Perl works without these modules, but some functions may not be available.
[missing module(s): Tk Tk::Pod]
Continue without these modules? (yes/no) yes
...
Installation successful.
root@ttyp1[UCS]# rm System/Perl/bin/*~

I update .bashrc with the UCS path.

root@ttyp1[knorpora]# rm -f UCS-0.3.1.tar.gz

The regexp_tokenizer:

root@ttyp1[knorpora]# tar xvzf regexp_tokenizer-0.01.tar.gz
root@ttyp1[knorpora]# rm -f regexp_tokenizer-0.01.tar.gz

I update the .bashrc file.

Finally, we install FreeLing.

First, we will need to install pcre and libcfg+.
root@ttyp1[knorpora]# tar xvzf pcre-4.3.tar.gz
root@ttyp1[knorpora]# cd pcre-4.3/
root@ttyp1[pcre-4.3]# ./configure
root@ttyp1[pcre-4.3]# make
root@ttyp1[pcre-4.3]# make install
root@ttyp1[knorpora]# rm -fr pcre-4.3*
root@ttyp1[knorpora]# tar xvzf libcfg+-0.6.2.tar.gz
root@ttyp1[libcfg+-0.6.2]# ./configure
root@ttyp1[libcfg+-0.6.2]# make
root@ttyp1[libcfg+-0.6.2]# make install
root@ttyp1[knorpora]# rm -fr libcfg+-0.6.2*
root@ttyp1[knorpora]# tar xvzf FreeLing-1.0.1.tar.gz

A bit of hacking is needed to successfully install FreeLing:

root@ttyp1[FreeLing-1.0.1]# export LD_LIBRARY_PATH=/usr/local/BerkeleyDB.4.2/lib/
root@ttyp1[FreeLing-1.0.1]# ln -s /usr/local/BerkeleyDB.4.2/include/db.h /usr/local/include/
root@ttyp1[FreeLing-1.0.1]# ln -s /usr/local/BerkeleyDB.4.2/include/db_cxx.h /usr/local/include/
root@ttyp1[FreeLing-1.0.1]# ln -s /usr/local/BerkeleyDB.4.2/lib/libdb_cxx-4.2.a /usr/local/lib/
root@ttyp1[FreeLing-1.0.1]# ln -s /usr/local/BerkeleyDB.4.2/lib/libdb_cxx-4.2.la /usr/local/lib/
root@ttyp1[FreeLing-1.0.1]# ln -s /usr/local/BerkeleyDB.4.2/lib/libdb_cxx-4.2.so /usr/local/lib/
root@ttyp1[FreeLing-1.0.1]# ln -s /usr/local/BerkeleyDB.4.2/lib/libdb_cxx-4.so /usr/local/lib/
root@ttyp1[FreeLing-1.0.1]# ln -s /usr/local/BerkeleyDB.4.2/lib/libdb_cxx.a /usr/local/lib/
root@ttyp1[FreeLing-1.0.1]# ln -s /usr/local/BerkeleyDB.4.2/lib/libdb_cxx.so /usr/local/lib/

Now we are ready:

root@ttyp1[FreeLing-1.0.1]# ./configure
root@ttyp1[FreeLing-1.0.1]# make
root@ttyp1[FreeLing-1.0.1]# make install
root@ttyp1[knorpora]# rm -fr FreeLing-1.0.1*

OK -- Done with the installations!

I comment out the following lines from /etc/X11/Xsession.d/45xsession to stop Konqueror autostart:

#if [ -e "$INDEXFILE" ]; then
#cat >> $HOME/Desktop/KNOPPIX.desktop <
[...]
> KNOPPIX/KNOPPIX

This takes a while.

Now we create a boot floppy:

root@ttyp1[KNOPPIX]# dd if=/mnt/hda5/KNOPPIX/boot.img of=/dev/fd0

We reboot from the floppy, and we test knorpora a bit...
OK, time to prepare the iso (I booted from CD again and I mounted hda5 like I did above).

I create a master directory and I move KNOPPIX there:

root@ttyp1[hda5]# mkdir master
root@ttyp1[hda5]# mv KNOPPIX master

Updating md5sums:

root@ttyp1[master]# find -type f -not -name md5sums -not -name boot.cat -exec md5sum {} \; >> KNOPPIX/md5sums

OK, let's create the iso:

root@ttyp1[master]# mkisofs -r -J -b KNOPPIX/boot.img -c KNOPPIX/boot.cat -o /mnt/hda5/knorpora.iso /mnt/hda5/master

DONE!!!
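As a sanity check on the md5sums step above, the list could be rebuilt and then verified before running mkisofs. This is only a sketch under assumptions: write_md5sums and verify_md5sums are hypothetical helper names, and, unlike the logged command (which appends with >>), the sketch truncates the list first so that no stale entries survive from the original CD.

```shell
#!/bin/sh
# Sketch only: rebuild KNOPPIX/md5sums from scratch, then verify it.
# write_md5sums and verify_md5sums are hypothetical helper names.
write_md5sums() {
    # $1 is the master directory (the one that contains KNOPPIX/).
    (
        cd "$1" || exit 1
        # ">" truncates any stale list; the log's ">>" would append to it.
        find . -type f ! -name md5sums ! -name boot.cat \
            -exec md5sum {} \; > KNOPPIX/md5sums
    )
}

verify_md5sums() {
    # Succeeds only if every listed file still matches its checksum.
    ( cd "$1" && md5sum -c KNOPPIX/md5sums > /dev/null 2>&1 )
}
```

With the layout used in this log, that would be write_md5sums /mnt/hda5/master followed by verify_md5sums /mnt/hda5/master; a non-zero exit status from the second call means at least one file changed after the list was written.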