Mastering Regular Expressions, 3rd Edition
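July 26, 2007 · Original, Good Books, Translation
The Chinese translation of Mastering Regular Expressions, 3rd Edition (hereafter MRE3) is about to go on sale. Its Chinese title is 《精通正则表达式》, and it is published by Broadview (博文视点), an imprint of the Publishing House of Electronics Industry. Beginning Regular Expressions (hereafter BRE), by contrast, will probably not appear for at least another four months.
This post compares the imported editions of these two books, each billed as the "first in China" technical book on regular expressions.
First, here is the English table of contents of MRE3 (PDF download):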
Preface
1 Introduction to Regular Expressions
Solving Real Problems
Regular Expressions as a Language
The Filename Analogy
The Language Analogy
The Regular-Expression Frame of Mind
If You Have Some Regular-Expression Experience
Searching Text Files: Egrep
Egrep Metacharacters
Start and End of the Line
Character Classes
Matching Any Character with Dot
Alternation
Ignoring Differences in Capitalization
Word Boundaries
In a Nutshell
Optional Items
Other Quantifiers: Repetition
Parentheses and Backreferences
The Great Escape
Expanding the Foundation
Linguistic Diversification
The Goal of a Regular Expression
A Few More Examples
Regular Expression Nomenclature
Improving on the Status Quo
Summary
Personal Glimpses
2 Extended Introductory Examples
About the Examples
A Short Introduction to Perl
Matching Text with Regular Expressions
Toward a More Real-World Example
Side Effects of a Successful Match
Intertwined Regular Expressions
Intermission
Modifying Text with Regular Expressions
Example: Form Letter
Example: Prettifying a Stock Price
Automated Editing
A Small Mail Utility
Adding Commas to a Number with Lookaround
Text-to-HTML Conversion
That Doubled-Word Thing
3 Overview of Regular Expression Features and Flavors
A Casual Stroll Across the Regex Landscape
The Origins of Regular Expressions
At a Glance
Care and Handling of Regular Expressions
Integrated Handling
Procedural and Object-Oriented Handling
A Search-and-Replace Example
Search and Replace in Other Languages
Care and Handling: Summary
Strings, Character Encodings, and Modes
Strings as Regular Expressions
Character-Encoding Issues
Unicode
Regex Modes and Match Modes
Common Metacharacters and Features
Character Representations
Character Classes and Class-Like Constructs
Anchors and Other “Zero-Width Assertions”
Comments and Mode Modifiers
Grouping, Capturing, Conditionals, and Control
Guide to the Advanced Chapters
4 The Mechanics of Expression Processing
Start Your Engines!
Two Kinds of Engines
New Standards
Regex Engine Types
From the Department of Redundancy Department
Testing the Engine Type
Match Basics
About the Examples
Rule 1: The Match That Begins Earliest Wins
Engine Pieces and Parts
Rule 2: The Standard Quantifiers Are Greedy
Regex-Directed Versus Text-Directed
NFA Engine: Regex-Directed
DFA Engine: Text-Directed
First Thoughts: NFA and DFA in Comparison
Backtracking
A Really Crummy Analogy
Two Important Points on Backtracking
Saved States
Backtracking and Greediness
More About Greediness and Backtracking
Problems of Greediness
Multi-Character “Quotes”
Using Lazy Quantifiers
Greediness and Laziness Always Favor a Match
The Essence of Greediness, Laziness, and Backtracking
Possessive Quantifiers and Atomic Grouping
Possessive Quantifiers, ?+, *+, ++, and {m,n}+
The Backtracking of Lookaround
Is Alternation Greedy?
Taking Advantage of Ordered Alternation
NFA, DFA, and POSIX
“The Longest-Leftmost”
POSIX and the Longest-Leftmost Rule
Speed and Efficiency
Summary: NFA and DFA in Comparison
Summary
5 Practical Regex Techniques
Regex Balancing Act
A Few Short Examples
Continuing with Continuation Lines
Matching an IP Address
Working with Filenames
Matching Balanced Sets of Parentheses
Watching Out for Unwanted Matches
Matching Delimited Text
Knowing Your Data and Making Assumptions
Stripping Leading and Trailing Whitespace
HTML-Related Examples
Matching an HTML Tag
Matching an HTML Link
Examining an HTTP URL
Validating a Hostname
Plucking Out a URL in the Real World
Extended Examples
Keeping in Sync with Your Data
Parsing CSV Files
6 Crafting an Efficient Expression
A Sobering Example
A Simple Change—Placing Your Best Foot Forward
Efficiency Versus Correctness
Advancing Further—Localizing the Greediness
Reality Check
A Global View of Backtracking
More Work for a POSIX NFA
Work Required During a Non-Match
Being More Specific
Alternation Can Be Expensive
Benchmarking
Know What You’re Measuring
Benchmarking with PHP
Benchmarking with Java
Benchmarking with VB.NET
Benchmarking with Ruby
Benchmarking with Python
Benchmarking with Tcl
Common Optimizations
No Free Lunch
Everyone’s Lunch is Different
The Mechanics of Regex Application
Pre-Application Optimizations
Optimizations with the Transmission
Optimizations of the Regex Itself
Techniques for Faster Expressions
Common Sense Techniques
Expose Literal Text
Expose Anchors
Lazy Versus Greedy: Be Specific
Split Into Multiple Regular Expressions
Mimic Initial-Character Discrimination
Use Atomic Grouping and Possessive Quantifiers
Lead the Engine to a Match
Unrolling the Loop
Method 1: Building a Regex From Past Experiences
The Real “Unrolling-the-Loop” Pattern
Method 2: A Top-Down View
Method 3: An Internet Hostname
Observations
Using Atomic Grouping and Possessive Quantifiers
Short Unrolling Examples
Unrolling C Comments
The Freeflowing Regex
A Helping Hand to Guide the Match
A Well-Guided Regex is a Fast Regex
Wrapup
In Summary: Think!
7 Perl
Regular Expressions as a Language Component
Perl’s Greatest Strength
Perl’s Greatest Weakness
Perl’s Regex Flavor
Regex Operands and Regex Literals
How Regex Literals Are Parsed
Regex Modifiers
Regex-Related Perlisms
Expression Context
Dynamic Scope and Regex Match Effects
Special Variables Modified by a Match
The qr/…/ Operator and Regex Objects
Building and Using Regex Objects
Viewing Regex Objects
Using Regex Objects for Efficiency
The Match Operator
Match’s Regex Operand
Specifying the Match Target Operand
Different Uses of the Match Operator
Iterative Matching: Scalar Context, with /g
The Match Operator’s Environmental Relations
The Substitution Operator
The Replacement Operand
The /e Modifier
Context and Return Value
The Split Operator
Basic Split
Returning Empty Elements
Split’s Special Regex Operands
Split’s Match Operand with Capturing Parentheses
Fun with Perl Enhancements
Using a Dynamic Regex to Match Nested Pairs
Using the Embedded-Code Construct
Using local in an Embedded-Code Construct
A Warning About Embedded Code and my Variables
Matching Nested Constructs with Embedded Code
Overloading Regex Literals
Problems with Regex-Literal Overloading
Mimicking Named Capture
Perl Efficiency Issues
“There’s More Than One Way to Do It”
Regex Compilation, the /o Modifier, qr/…/, and Efficiency
Understanding the “Pre-Match” Copy
The Study Function
Benchmarking
Regex Debugging Information
Final Comments
8 Java
Java’s Regex Flavor
Java Support for \p{…} and \P{…}
Unicode Line Terminators
Using java.util.regex
The Pattern.compile() Factory
Pattern’s matcher method
The Matcher Object
Applying the Regex
Querying Match Results
Simple Search and Replace
Advanced Search and Replace
In-Place Search and Replace
The Matcher’s Region
Method Chaining
Methods for Building a Scanner
Other Matcher Methods
Other Pattern Methods
Pattern’s split Method, with One Argument
Pattern’s split Method, with Two Arguments
Additional Examples
Adding Width and Height Attributes to Image Tags
Validating HTML with Multiple Patterns Per Matcher
Parsing Comma-Separated Values (CSV) Text
Java Version Differences
Differences Between 1.4.2 and 1.5.0
Differences Between 1.5.0 and 1.6
9 .NET
.NET’s Regex Flavor
Additional Comments on the Flavor
Using .NET Regular Expressions
Regex Quickstart
Package Overview
Core Object Overview
Core Object Details
Creating Regex Objects
Using Regex Objects
Using Match Objects
Using Group Objects
Static “Convenience” Functions
Regex Caching
Support Functions
Advanced .NET
Regex Assemblies
Matching Nested Constructs
Capture Objects
10 PHP
PHP’s Regex Flavor
The Preg Function Interface
“Pattern” Arguments
The Preg Functions
preg_match
preg_match_all
preg_replace
preg_replace_callback
preg_split
preg_grep
preg_quote
“Missing” Preg Functions
preg_regex_to_pattern
Syntax-Checking an Unknown Pattern Argument
Syntax-Checking an Unknown Regex
Recursive Expressions
Matching Text with Nested Parentheses
No Backtracking Into Recursion
Matching a Set of Nested Parentheses
PHP Efficiency Issues
The S Pattern Modifier: “Study”
Extended Examples
CSV Parsing with PHP
Checking Tagged Data for Proper Nesting
Index