|
|
|
|
T-Codes
- T-Codes are variable-length codes. They can be used in a similar fashion as Huffman codes. I.e., you can use short codewords from a T-Code set to encode the frequently occurring symbols of an information source, and the longer codewords to encode the less frequent symbols.
- T-Codes are highly self-synchronizing. That is, if the transmitted stream of T-Code codewords is subjected to noise or other forms of corruption, a T-Code decoder will usually regain synchronization automatically (i.e., find out where the codeword boundaries are). This process is very well explained. The decoder can even derive its own synchronization status if it knows how many of the previous bits were received correctly.
- By the way - you can also use other things than just bits - T-Codes can be constructed from any finite alphabet.
- The construction of T-Codes is done via a recursive copy-and-append process called T-augmentation.
The animation below shows an example: every alphabet is a T-Code set. T-Code sets are derived from other T-Code sets by choosing a T-prefix p and a positive integer k called the T-expansion parameter (or copy-exponent). The tree of the old T-Code set is copied k and the copies are linked to the original and each other through node p. This process may be repeated an arbitrary number of times to generate arbitrarily large sets.
- T-Codes were invented by Mark Titchener, who was my PhD supervisor.
- Want to know more? Download my PhD thesis as gzipped PS or PDF.

The animation above shows (if you've got JavaScript enabled) three successive T-augmentations of the binary alphabet to yield a T-Code set with 13 codewords.
|
|
|
T-Information Theory
- T-information theory is based on a duality between T-Code sets and finite strings
- The longest codewords in each T-Code set are unique, i.e., there is no other T-Code set that has the same strings as longest codewords. This was discovered experimentally by Mark Titchener. The proof was furnished by Radu Nicolescu.
- For every finite string, there is exactly one T-Code set for which it is a longest codeword.
- The longest codewords in a particular T-Code set are identical except for the last symbol. They contain, in order, all the k copies of each T-prefix p.
- Given a string, it is always possible to derive the associated T-Code set. This is done in an algorithm called T-decomposition. T-decomposition decodes a string at multiple levels of T-augmentation and thus extracts the T-prefixes.
- Thus, T-augmentation can be seen as a string production algorithm as well as a code set production algorithm.
In the string view, T-decomposition may be seen as an algorithm that removes codeword boundaries in a hierarchical order. At each level, the codeword boundaries following the T-prefix for that level are removed. At the top level, no boundaries are left - the string decodes as a single codeword. This is demonstrated on the right.
- The T-expansion parameters extracted from the string yield the T-complexity CT :
CT = log2(k1 + 1) + log2(k2 + 1) + ...
- The T-complexity permits the derivation of a T-information measure (basically, the inverse logarithmic integral of the T-complexity), which in turn gives rise to a T-entropy (first derivative of the T-information with respect to the string length). These seem to be sensitive instruments for the measurement of information in strings.
|
The red dots between the bits in the string below represent codeword boundaries between the codewords of the pass indicated. The second-to-last codeword on the right is always the next T-prefix. The length of its run in that position determines the next T-augmentation parameter. Finally, the T-expansion parameters are used to compute the T-complexity for the string.

|
|
|
My Current Research Projects in T-Information Theory
- I've created an online lab for the analysis of data for T-complexity, T-information, and T-entropy. The page is a bit dusty and there's one sub-feature (radioactivity-generated random files) that doesn't work because the experimental setup for it got discontinued. But you can upload your own files and get them plotted or retrieve the raw data in text format for processing in your system. Results for larger files get e-mailed to you (you'll need an e-mail client that's capable of reading HTML mail with embedded images).
There are a few security features, though: you'll need to ask me for a login, and there's a limit on how many files you can process at any one time. Similarly, there's a limit to the maximum file size you can process. If you'd like to process larger files, let me know. Files can be processed forwards and backwards, and as ASCII or binary.
- The T-information measure is a great tool - to prove the pudding, I've written a little application that uses it in a similarity search engine. You'll find it here. I'm still experimenting a bit with it because the current evaluation algorithm (described in "Similarity Searches Using a Recursive String Parsing Algorithm", Ulrich
Günther, in: Supplemental Papers for the 2nd International Conference on
Unconventional Models of Computation, UMC2K, Brussel, December 13 16,
2000, published as CDMTCS Research Report #147, page 54) still isn't sharp enough. Have a play with it - if there are any HTML files you'd like me to add so you can search in them, contact me.
- An introductory paper on T-information theory can be found here
|
|
|
JavaScript, CGI, and other Internet Technologies in a Web-Based Conference Registration System
My involvement here was really born out of annoyance. I think it's nice to be able to register for a scientific conference via the conference's web site - no messy faxes etc. So I set up a number of CGI-based online participant registration systems for various conferences we held here, including DMTCS'96, UMC'98, IVCNZ'98, and ACSW'99. Problem was - each conference looked different, so each system was different, too. The information we were collecting was different, the rules didn't match, and the conference structures weren't alike. About a week's worth of work to write a new site each time, considerably more for ACSW'99... But one learns lessons along the way, too - valuable ones.
After ACSW'99, Radu Nicolescu suggested that we get two students to work on a more universal system. They soon found out that this wasn't a simple task. The two hardly got beyond a case study of what would be involved in the setup of a site and the rudimentary database structure. One of them, Sudhir Reddy, stayed on as a MSc student and developed a system that has since served two further conferences, UMC2K in Brussels and ROBVIS01 in Auckland. It's still somewhat cumbersome to drive and doesn't yet have all the nice features we want. But the benefits are already visible - re-rigging a copy of the UMC2K site to ROBVIS01 took less than two days, and most of that was spent finding out what information and options were needed. Currently, the site is used for DMCTS01, a conference organised jointly between CDMTCS in Auckland and the Ovidius University of Constantza, Romania.
|
So what are the features of the system at present?
- Frame-based web site with a navigation menu on the left, a central content frame, and a title frame at the top, plus a small control frame at the bottom. No, it doesn't look quite as ugly as some of the frame-based sites you'll see elsewhere.
- Prospective participants can browse the web site to find out what the conference is all about. They can also optionally subscribe to a mailing list, if there is one.
- There's an online paper submission facility which we've never been asked for, so we don't know whether it really works...
- As registration time comes, the prospective participants can register via an easy-to-use registration wizard - my pride and joy.
|
|
A screenshot of the personal info page of the registration wizard we used for UMC2K.
|
It's much faster than a lot of the e-commerce stuff you see out there, because it's largely client driven and minimizes the number of interactions with the remote web server. Registration may involve all sorts of information, from contact info via enrolment for optional workshops or excursions to the purchase of dinner tickets for accompanying spouses. For our conferences, we were running a secure server, so we could also take people's credit card data.
Registrants are e-mailed a login, so they can change address details etc. if the need arises.
There's a login for organisers to view and edit registration info of all kind, and an account-type view for the financial admin that permits to keep track of what people have paid for and what still needs to be charged to their credit cards. All of this can go via secure https connections and is protected by login and password.
At conference time, it's possible to print name tags off the web site - very useful to deal with participants who keep losing their name tags (OK, I confess, I'm a repeat offender here...).
We use PHP on the server side, and HTML and JavaScript on the client side. The database back end is MySQL and the name tag printing is a legacy CGI script I once wrote, which also requires Perl, LaTeX and a few associated utilities to be installed on the server it's running on. However, don't despair - I'm happy to let you use my installation of the script if it fits your requirements.
What else is planned for the future?
- Shift more of the guts to the client side - it speeds things up if you don't have to go back to the server each time. Did you know that every HTTP request to a server on the other side of the world takes about half a second to complete, even if the server responds instantaneously?
- Give the files of the project more meaningful names. Probably the most urgent project.
- Assemble a proper distribution with documentation, so we don't have to set it up for all and sundry.
- Make the organiser's view more userfriendly and provide more avenues for tailored import/export.
- Add additional modules for accommodation, paper submission and refereeing, arrival information, etc.
|
|
|
My JavaScript Gallery
It's no secret that I am a great fan of JavaScript Applications. I even translated an entire book on the topic from English to German, for O'Reilly Associates (the publishers that put all the cute animals on their book covers).
The book is by Jerry Bradenbaugh, and it's given me a lot of inspiration. There are
plenty of JavaScript resources on his site (www.serve.com/hotsyte).
I've adapted some of his applications from the book, and added a new twist or two. I have put them into my gallery to the right.
|
JavaScript Application Gallery
The study apps are, in parts, a bit rough around the edges - they mereley serve the purpose of showing that a particular thing is possible. Best viewed with a late generation Netscape or MSIE browser. If anything gets stuck, reload.
|
|
|
|
Community Informatics in Tourism
I'm involved with this through the New Zealand Tourism Research Institute (TRI). The basic idea here is to promote the use of IT in communities through the provision of applications to which the communities involved can contribute as a community effort. The aim is to give communities tangible rewards (e.g., an increased influx of tourists) in order to entice their IT development. More soon.
|
|
|
My publications
OK, so you're looking for my PhD thesis or a paper of mine? No worries, they're right here.
|
|
|
Student Projects
I'm happy to talk to you about supervising your undergraduate, graduate, BTech, MSc, PhD, etc. project. If you already have a project idea, just e-mail me and we'll see how we can get it underway. If you're not quite sure yet, there's a number of suggested project topics right here. Have a look and see whether there's one that interests you.
|