perltrap - Perl traps for the unwary
The biggest trap of all is forgetting to use the -w switch; see the
perlrun manpage. Making your entire program runnable under use strict;
can help make your program more bullet-proof, but sometimes it's too
annoying for quick throw-away programs.
Accustomed awk users should take special note of the following:
-
*
-
The English module, loaded via use English; allows you to refer to
special variables (like $RS) as though they were in awk; see the
perlvar manpage for details.
-
*
-
Semicolons are required after all simple statements in Perl (except
at the end of a block). Newline is not a statement delimiter.
-
*
-
Curly brackets are required on ifs and whiles.
-
*
-
Variables begin with "$" or "@" in Perl.
-
*
-
Arrays index from 0. Likewise string positions in substr() and
index().
-
*
-
You have to decide whether your array has numeric or string indices.
-
*
-
Associative array values do not spring into existence upon mere
reference.
-
*
-
You have to decide whether you want to use string or numeric
comparisons.
-
*
-
Reading an input line does not split it for you. You get to split it
yourself to an array. And the split() operator has different
arguments.
-
*
-
The current input line is normally in $_, not $0. It generally does
not have the newline stripped. ($0 is the name of the program
executed.) See the perlvar manpage.
-
*
-
$<digit> does not refer to fields--it refers to substrings matched by
the last match pattern.
-
*
-
The print() statement does not add field and record separators unless
you set $, and $\. You can set $OFS and $ORS if you're using
the English module.
-
*
-
You must open your files before you print to them.
-
*
-
The range operator is "..", not comma. The comma operator works as in
C.
-
*
-
The match operator is "=~", not "~". ("~" is the one's complement
operator, as in C.)
-
*
-
The exponentiation operator is "**", not "^". "^" is the XOR
operator, as in C. (You know, one could get the feeling that awk is
basically incompatible with C.)
-
*
-
The concatenation operator is ".", not the null string. (Using the
null string would render /pat/ /pat/ unparsable, since the third slash
would be interpreted as a division operator--the tokener is in fact
slightly context sensitive for operators like "/", "?", and ">".
And in fact, "." itself can be the beginning of a number.)
-
*
-
The next, exit, and continue keywords work differently.
-
*
-
The following variables work differently:
    Awk         Perl
    ARGC        $#ARGV or scalar @ARGV
    ARGV[0]     $0
    FILENAME    $ARGV
    FNR         $. - something
    FS          (whatever you like)
    NF          $#Fld, or some such
    NR          $.
    OFMT        $#
    OFS         $,
    ORS         $\
    RLENGTH     length($&)
    RS          $/
    RSTART      length($`)
    SUBSEP      $;
-
*
-
You cannot set $RS to a pattern, only a string.
-
*
-
When in doubt, run the awk construct through a2p and see what it
gives you.
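Several of the awk traps above can be seen side by side in a short script. The following is only a sketch: the input record and every variable name in it are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input record; awk would have split it into $1..$NF already.
my $line = "alpha beta gamma\n";

chomp $line;                         # the trailing newline is not stripped for you
my @fields = split ' ', $line;       # explicit split; ' ' splits on runs of whitespace

my $first = $fields[0];              # arrays index from 0, so this is awk's $1
my $nf    = scalar @fields;          # the field count, awk's NF
my $pos   = index($line, "beta");    # string positions are 0-based as well

my $joined = $fields[0] . "-" . $fields[1];   # "." concatenates
my $power  = 2 ** 10;                         # "**" exponentiates ("^" is XOR)
my @range  = (1 .. 3);                        # ".." is the range operator

my $str_eq = ("1.0" eq "1") ? 1 : 0;          # string comparison: the two differ
my $num_eq = ("1.0" == "1") ? 1 : 0;          # numeric comparison: the two are equal

my %count;                                    # an associative array
my $probe   = $count{missing};                # merely referring to a value...
my $created = exists $count{missing} ? 1 : 0; # ...does not bring the key into existence
```

Choosing eq or == is what "deciding whether you want string or numeric comparisons" amounts to in practice.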
Cerebral C programmers should take note of the following:
-
*
-
Curly brackets are required on ifs and whiles.
-
*
-
You must use elsif rather than else if.
-
*
-
The break and continue keywords from C become in Perl last and next,
respectively. Unlike in C, these do NOT work within a do { } while
construct.
-
*
-
There's no switch statement. (But it's easy to build one on the fly.)
-
*
-
Variables begin with "$" or "@" in Perl.
-
*
-
printf() does not implement the "*" format for interpolating
field widths, but it's trivial to use interpolation of double-quoted
strings to achieve the same effect.
-
*
-
Comments begin with "#", not "/*".
-
*
-
You can't take the address of anything, although a similar operator
in Perl 5 is the backslash, which creates a reference.
-
*
-
ARGV
must be capitalized.
-
*
-
System calls such as link(), unlink(), rename(), etc. return nonzero
for success, not 0.
-
*
-
Signal handlers deal with signal names, not numbers. Use kill -l
to find their names on your system.
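A few of the C traps above, gathered into one sketch. The labeled bare block is just one possible way to build a switch on the fly, and all names and values here are hypothetical.

```perl
use strict;
use warnings;

my $size = 7;                        # hypothetical value to classify
my $label;
if    ($size < 5)  { $label = "small"  }   # braces are mandatory, even for one statement
elsif ($size < 10) { $label = "medium" }   # "elsif", not "else if"
else               { $label = "large"  }

my @log;
for my $n (1 .. 6) {
    next if $n == 2;                 # C's "continue"
    last if $n == 5;                 # C's "break"

    # One way to build a "switch": a labeled bare block left with "last".
    SWITCH: {
        if ($n == 1) { push @log, "one"; last SWITCH; }
        if ($n % 2)  { push @log, "odd"; last SWITCH; }
        push @log, "even";           # default case
    }
}

my $width  = 6;
my $padded = sprintf("%${width}s", "ab");   # field width interpolated into the format

my $ref   = \$size;                  # backslash creates a reference, not an address
my $deref = $$ref;                   # dereference to get the value back
```

Here "%${width}s" becomes "%6s" before sprintf() ever sees it, which is the double-quote interpolation trick mentioned above.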
Seasoned sed programmers should take note of the following:
-
*
-
Backreferences in substitutions use "$" rather than "\".
-
*
-
The pattern matching metacharacters "(", ")", and "|" do not have backslashes
in front.
-
*
-
The range operator is ..., rather than comma.
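The three sed traps above might look like this in practice. This is a sketch with made-up sample data; the sed equivalents in the comments are given only for comparison.

```perl
use strict;
use warnings;

my $name = "John Smith";                          # hypothetical sample data
(my $swapped = $name) =~ s/(\w+) (\w+)/$2, $1/;   # sed would write \2, \1

my $is_pet = ("dog" =~ /^(cat|dog)$/) ? 1 : 0;    # no backslashes on ( ) |

my @lines = ("a", "BEGIN", "b", "END", "c");
my @picked;
for (@lines) {
    # sed's /BEGIN/,/END/ address range is written with "..." in Perl
    push @picked, $_ if /BEGIN/ ... /END/;
}
```

After the loop, @picked holds the lines from BEGIN through END inclusive.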
Sharp shell programmers should take note of the following:
-
*
-
The backtick operator does variable interpretation without regard to
the presence of single quotes in the command.
-
*
-
The backtick operator does no translation of the return value, unlike csh.
-
*
-
Shells (especially csh) do several levels of substitution on each
command line. Perl does substitution only in certain constructs
such as double quotes, backticks, angle brackets, and search patterns.
-
*
-
Shells interpret scripts a little bit at a time. Perl compiles the
entire program before executing it (except for BEGIN blocks, which
execute at compile time).
-
*
-
The arguments are available via @ARGV, not $1, $2, etc.
-
*
-
The environment is not automatically made available as separate scalar
variables.
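A sketch of how some of the shell traps above play out. The command line, the environment variable DEMO_DIR, and the other names are invented for the example.

```perl
use strict;
use warnings;

our @order;
push @order, "run time";
BEGIN { push @order, "compile time" }    # runs first, while the file is still being compiled

local @ARGV = ("in.txt", "out.txt");     # hypothetical command line
my $first_arg = $ARGV[0];                # what a shell script would call $1

local $ENV{DEMO_DIR} = "/tmp/demo";      # hypothetical environment variable
my $dir = $ENV{DEMO_DIR};                # read through %ENV; no $DEMO_DIR scalar appears

my $user   = "world";
my $double = "hello $user";              # double quotes interpolate
my $single = 'hello $user';              # single quotes do not
```

Although the BEGIN block appears later in the file, its entry ends up first in @order.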
Practicing Perl Programmers should take note of the following:
-
*
-
Remember that many operations behave differently in a list
context than they do in a scalar one. See the perldata manpage
for details.
-
*
-
Avoid barewords if you can, especially all lower-case ones.
You can't tell just by looking at it whether a bareword is
a function or a string. By using quotes on strings and
parens on function calls, you won't ever get them confused.
-
*
-
You cannot discern from mere inspection which built-ins
are unary operators (like chop() and chdir())
and which are list operators (like print() and unlink()).
(User-defined subroutines can only be list operators, never
unary ones.) See the perlop manpage.
-
*
-
People have a hard time remembering that some functions
default to $_, or @ARGV, or whatever, but that others which
you might expect to do not.
-
*
-
The <FH> construct is not the name of the filehandle, it is a readline
operation on that handle. The data read is only assigned to $_ if the
file read is the sole condition in a while loop:
    while (<FH>)      { }
    while ($_ = <FH>) { }..
    <FH>;  # data discarded!
-
*
-
Remember not to use "=" when you need "=~";
these two constructs are quite different:
    $x =  /foo/;
    $x =~ /foo/;
-
*
-
The do {} construct isn't a real loop that you can use
loop control on.
-
*
-
Use my() for local variables whenever you can get away with
it (but see the perlform manpage for where you can't).
Using local() actually gives a local value to a global
variable, which leaves you open to unforeseen side-effects
of dynamic scoping.
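The following sketch pulls together four of the traps above: context, "=" versus "=~", my() versus local(), and the <FH> read. The subroutine names are hypothetical, and an in-memory filehandle stands in for a real file.

```perl
use strict;
use warnings;

my @items = (10, 20, 30);
my $count   = @items;                # scalar context: the number of elements
my ($first) = @items;                # list context: the first element

$_ = "foo";
my $x = "bar";
my $bound = ($x =~ /foo/) ? 1 : 0;   # "=~" tests $x, which does not match
$x = /foo/;                          # "=" tests $_ and overwrites $x with the result

our $level = "global";
sub show       { return $level }
sub with_local { local $level = "dynamic"; return show() }  # show() sees the change
sub with_my    { my    $level = "lexical"; return show() }  # show() does not
my $via_local = with_local();
my $via_my    = with_my();

# In-memory filehandle standing in for a real file.
open(my $fh, '<', \"one\ntwo\n") or die "cannot open in-memory file: $!";
$_ = "untouched";
<$fh>;                               # the line "one" is read and thrown away
my $after_bare = $_;                 # $_ was not assigned by the bare read
my @rest;
while (<$fh>) { chomp; push @rest, $_ }   # sole condition: each line lands in $_
close $fh;
```

The with_local() result is the "unforeseen side-effect" in miniature: a subroutine that never mentions the change still observes it.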
Penitent Perl 4 Programmers should take note of the following
incompatible changes that occurred between release 4 and release 5:
-
*
-
@ now always interpolates an array in double-quotish strings. Some
programs may now need to use backslash to protect any @ that shouldn't
interpolate.
-
*
-
Barewords that used to look like strings to Perl will now look like subroutine
calls if a subroutine by that name is defined before the compiler sees them.
For example:
sub SeeYa { die "Hasta la vista, baby!" }
$SIG{'QUIT'} = SeeYa;
In Perl 4, that set the signal handler; in Perl 5, it actually calls the
function! You may use the -w switch to find such places.
-
*
-
Symbols starting with _ are no longer forced into package main, except
for $_ itself (and @_, etc.).
-
*
-
s'$lhs'$rhs' now does no interpolation on either side. It used to
interpolate $lhs but not $rhs.
-
*
-
The second and third arguments of splice() are now evaluated in scalar
context (as the book says) rather than list context.
-
*
-
These are now semantic errors because of precedence:
shift @list + 20;
$n = keys %map + 20;
Because if that were to work, then this couldn't:
sleep $dormancy + 20;
-
*
-
open FOO || die is now incorrect. You need parens around the filehandle.
While temporarily supported, using such a construct will
generate a non-fatal (but non-suppressible) warning.
-
*
-
The elements of argument lists for formats are now evaluated in list
context. This means you can interpolate list values now.
-
*
-
You can't do a goto into a block that is optimized away. Darn.
-
*
-
It is no longer syntactically legal to use whitespace as the name
of a variable, or as a delimiter for any kind of quote construct.
Double darn.
-
*
-
The caller() function now returns a false value in a scalar context if
there is no caller. This lets library files determine if they're being
required.
-
*
-
m//g now attaches its state to the searched string rather than the
regular expression.
-
*
-
reverse is no longer allowed as the name of a sort subroutine.
-
*
-
taintperl is no longer a separate executable. There is now a -T
switch to turn on tainting when it isn't turned on automatically.
-
*
-
Double-quoted strings may no longer end with an unescaped $ or @.
-
*
-
The archaic while/if BLOCK BLOCK syntax is no longer supported.
-
*
-
Negative array subscripts now count from the end of the array.
-
*
-
The comma operator in a scalar context is now guaranteed to give a
scalar context to its arguments.
-
*
-
The ** operator now binds more tightly than unary minus.
It was documented to work this way before, but didn't.
-
*
-
Setting $#array lower now discards array elements.
-
*
-
delete() is not guaranteed to return the old value for tie()d arrays,
since this capability may be onerous for some modules to implement.
-
*
-
The construct "this is $$x" used to interpolate the pid at that
point, but now tries to dereference $x. $$ by itself still
works fine, however.
-
*
-
Some error messages will be different.
-
*
-
Some bugs may have been inadvertently removed.
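A handful of the Perl 5 behaviors listed above can be checked with a short script. This is a sketch only: the variable names are made up, and taking a reference with \& is one assumed way to avoid the bareword trap, not the only one.

```perl
use strict;
use warnings;

my @names  = ("a", "b");
my $interp = "got @names";             # @names interpolates into the string
my $email  = "user\@example.com";      # the backslash keeps this @ literal

sub SeeYa { die "Hasta la vista, baby!" }
my $handler = \&SeeYa;                 # a code reference; the bareword SeeYa would call the sub

my %map = (a => 1, b => 2);
my $n   = keys(%map) + 20;             # parentheses keep "+ 20" out of the argument

my @list = (1, 2, 3, 4);
my $last = $list[-1];                  # negative subscripts count from the end
$#list   = 1;                          # lowering $#list discards elements
my $left = scalar @list;

my $neg = -2 ** 2;                     # "**" binds more tightly than unary minus

my $s = "aXbXc";
$s =~ /X/g;                            # the match position is kept with $s
my $p = pos($s);                       # offset just past the first X
```

Assigning $handler to a %SIG entry would install the handler without calling it, which is what the Perl 4 bareword form used to do.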