Book Review - Data Munging with Perl

Author	David Cross
Publisher	Manning
ISBN	1-930110-00-6
Date	2001
Pages	283
Price	£33.50 (paper)
Price	$16.50 (eBook in PDF)
Reviewer	Rory Macdonald

Having lifted a definition of data munging from the Jargon File, Cross summarises the term simply as "taking data that is in one format and converting it into another." The definition is simple, but the book offers more than a walk-through of data formats and transformations. It pitches itself as one that will help the Perl programmer write more efficient data-munging code while "introducing new techniques, as well as novel uses for familiar methods."

From the word go, the author clearly sets out the basics of data munging and explains why Perl is a good choice for the task. Before Part 1 is over, the reader has had a brush-up on topics such as decoupling, filtering, logging, complex sorting, DBI, benchmarking and regexen.

Covering so much in just 78 pages may at first sound like a tall order. However, each chapter is topped off by an honest 'further information' section that points to more in-depth coverage. It is reassuring to see that Manning have allowed those sections to cite books from competing publishers. The assumed prior knowledge is set out in the preface, and anyone with some Perl 5 experience should have no trouble keeping pace, helped by the healthy supply of brief but well-targeted examples that support the subject matter throughout.

Part 2 takes the reader through the munging of unstructured and record-oriented data. The formats covered range from ASCII text through CSV to MP3, and the examples make ample use of appropriate CPAN modules.

After these formats, Part 3 turns to complex data formats and the basics of parsing them. Specifically, Cross looks at the practicalities of parsing HTML and XML, building up to an example that converts a module's POD into an XML document. Part 3 closes usefully by showing the reader how to build a parser with Damian Conway's Parse::RecDescent module.

The remainder of the book is given over to a modules reference and a potted summary of Perl features and syntax. This lets the non-Perl programmer make some headway with the examples in the book, while keeping explanations of assumed knowledge out of the main text.

Judging by the publisher's online errata, most corrections are minor typos in the code samples.

Summary

While "Data Munging with Perl" is not presented as a traditional reference text, it tackles data munging problems cleanly enough to serve as one for the issues it touches upon.

The rear cover blurb suggests that "this book will save you time", and given its sensible suggestions for the design of data and code, I can't argue too much with that.

For newcomers to Perl, I would suggest "Data Munging with Perl" as perhaps a third or fourth Perl text. Unless you are already confident of your munging skills, this book deserves a place on your bookshelf.

Table of contents

PART 1 : FOUNDATIONS
	 Ch1 : Data, data munging, and Perl
	 Ch2 : General munging practices
	 Ch3 : Useful Perl idioms
	 Ch4 : Pattern matching

PART 2 : DATA MUNGING
	 Ch5 : Unstructured data
	 Ch6 : Record-oriented data
	 Ch7 : Fixed-width and binary data

PART 3 : SIMPLE STRING PARSING
	 Ch8 : Complex data formats
	 Ch9 : HTML
	 Ch10 : XML
	 Ch11 : Building your own parsers

PART 4 : THE BIG PICTURE
	 Ch12 : Looking back - and ahead
	 App A : Perl modules guide
	 App B : Intro to Perl