Patrick Craig > Other Stuff > ICFP 2001

ICFP 2001

I submitted a lightning entry to the ICFP 2001 contest. I am fairly confident of my entry being first. First to be eliminated that is! As long as you have at least one redundant unformatted whitespace character at the end of your SML file, don't use plain or underline tags and don't have too many color, number or emphasis tags occurring together, my program works fine. It's also very fast and never gives a result bigger than the input. So if you just reverse the judgement criteria:

1. Speed of optimisation.
2. Output size.
3. Stupidity.
4. Correctness.

My program will do quite well, but then a one line program (int main() {return 0;}) will do even better! It could all have been so different...

Timetable (Japanese time)

Friday, July 27 0:00

Task description published. Read it at home and thought there must be some mistake as it sounded far too easy. Went to bed and thought about how I would solve the problem in a few hours.

Friday, July 27 10:00

Reread the task description and realised it was a bit more complicated than I first thought.

Friday, July 27 12:00

Made a start on a parser in my lunch hour.

Friday, July 27 17:00

Work over, started coding up the simplification algorithm. The idea was to record the current format associated with each text character during the initial pass. You then recreate the document using the change in format between characters. Had to give up at 22:00pm. Submitted what I had written by then. source (1,600 lines of code) README

Friday night

While lying in bed, realised there was a major problem with the algorithm when dealing with plain tags.

Saturday

Spent the day with the family and came up with an idea to solve the plain problem.

Sunday

Managed to pursaude my wife to let me work on my entry (ooh err missus). Realised there were several problems with the code apart from dealing with plain tags. Two simple errors in the code meant that the final character from the input and any closing tags associated with this character were missing in the output and the underline tags were not handled properly. There were two problems with the EM tags, first they were treated as on/off tags and not switch tags. The second was my assumption about when they were redundant was wrong, i.e. all the EM tags in this sequence are redundant:

bar <EM> <EM> foo </EM> </EM>.

but only the first two are in this sequence.

bar <EM> <EM> foo </EM>x</EM>.

Fixed the errors in my code relating to the missing last character, underline tag and EM tags.

Next realised that unlike other tags, the ordering of color and number tags that occur together is important, e.g.

<0><1>one</1></0>

is obviously not the same as

<1><0>one</0></1>

Fixed this by having buffers to record the ordering of color and number tags. At the same time added color and number stacks to record the current color and number to remove redundant color and number tags.

Started working on the plain fix. The idea is that you have a plain level that is incremented every time you encounter an opening plain tag and decremented every time you encounter a closing plain tag. In the first pass, you ignore (i.e. just treat as text) all S, EM, I, B, U and TT tags that occur in parts of the text where the plain level is not equal to zero. In the second pass you you ignore all these tags where the plain count is not one etc. If you record the positions of the plain tags along with the plain level, in effect you are doing less than two passes through the text.

Wrote a random SML file generator and started testing my program properly. Found and fixed several bugs but couldn't fix them all by the deadline (still needed to impement the color nesting and plain shortcut simplifications too).

After reading other entrants pages I think if I had got my entry working it would be one of the fastest implementations (I doubt it would ever require more than three minutes for a 5 Mb file) and I think it would have done quite well on the output file size (especially on large file sizes where other entrants were time constrained). Unlike a few years ago I now have a life (of sorts). If I could have spent the three days working on a fast machine maybe I could have completed my entry. (Yeah right - that's what they all say).

Maybe next year...