Tuesday, September 20, 2011

Python - Strings

Besides numbers, Python can also manipulate strings, which can be expressed in several ways. They can be enclosed in single quotes or double quotes:

>>> 'spam eggs'
'spam eggs'
>>> 'doesn\'t'
"doesn't"
>>> "doesn't"
"doesn't"
>>> '"Yes," he said.'
'"Yes," he said.'
>>> "\"Yes,\" he said."
'"Yes," he said.'
>>> '"Isn\'t," she said.'
'"Isn\'t," she said.'

The interpreter prints the result of string operations in the same way as they are typed for input: inside quotes, and with quotes and other funny characters escaped by backslashes, to show the precise value. The string is enclosed in double quotes if the string contains a single quote and no double quotes, else it’s enclosed in single quotes. The print() function produces a more readable output for such input strings.

String literals can span multiple lines in several ways. Continuation lines can be used, with a backslash as the last character on the line indicating that the next line is a logical continuation of the line:

hello = "This is a rather long string containing\n\
several lines of text just as you would do in C.\n\
Note that whitespace at the beginning of the line is\
significant."

print(hello)

Note that newlines still need to be embedded in the string using \n – the newline following the trailing backslash is discarded. This example would print the following:

This is a rather long string containing
several lines of text just as you would do in C.
Note that whitespace at the beginning of the line is significant.

Or, strings can be surrounded in a pair of matching triple-quotes: """ or '''. End of lines do not need to be escaped when using triple-quotes, but they will be included in the string. So the following uses one escape to avoid an unwanted initial blank line.

print("""\
Usage: thingy [OPTIONS]
    -h Display this usage message
    -H hostname Hostname to connect to
""")

produces the following output:

Usage: thingy [OPTIONS]
    -h Display this usage message
    -H hostname Hostname to connect to

If we make the string literal a “raw” string, \n sequences are not converted to newlines, but the backslash at the end of the line, and the newline character in the source, are both included in the string as data. Thus, the example:

hello = r"This is a rather long string containing\n\
several lines of text much as you would do in C."

print(hello)

would print:

This is a rather long string containing\n\
several lines of text much as you would do in C.

Strings can be concatenated (glued together) with the + operator, and repeated with *:

>>> word = 'Help' + 'A'
>>> word
'HelpA'
>>> '<' + word*5 + '>'
'<HelpAHelpAHelpAHelpAHelpA>'

Two string literals next to each other are automatically concatenated; the first line above could also have been written word = 'Help' 'A'; this only works with two literals, not with arbitrary string expressions:

>>> 'str' 'ing' # <- This is ok 'string'
>>> 'str'.strip() + 'ing' # <- This is ok 'string'
>>> 'str'.strip() 'ing' # <- This is invalid
  File "<stdin>", line 1, in ?
    'str'.strip() 'ing'
                     ^
SyntaxError: invalid syntax

Strings can be subscripted (indexed); like in C, the first character of a string has subscript (index) 0. There is no separate character type; a character is simply a string of size one. As in the Icon programming language, substrings can be specified with the slice notation: two indices separated by a colon.

>>> word[4]
'A'
>>> word[0:2]
'He'
>>> word[2:4]
'lp'

Slice indices have useful defaults; an omitted first index defaults to zero, an omitted second index defaults to the size of the string being sliced.

>>> word[:2] # The first two characters
'He'
>>> word[2:] # Everything except the first two characters
'lpA'

Unlike a C string, Python strings cannot be changed. Assigning to an indexed position in the string results in an error:

>>> word[0] = 'x'
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: 'str' object does not support item assignment
>>> word[:1] = 'Splat'
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: 'str' object does not support slice assignment

However, creating a new string with the combined content is easy and efficient:

>>> 'x' + word[1:]
'xelpA'
>>> 'Splat' + word[4]
'SplatA'

Here’s a useful invariant of slice operations: 's[:i] + s[i:]' equals 's'.

>>> word[:2] + word[2:]
'HelpA'
>>> word[:3] + word[3:]
'HelpA'

Degenerate slice indices are handled gracefully: an index that is too large is replaced by the string size, an upper bound smaller than the lower bound returns an empty string.

>>> word[1:100]
'elpA'
>>> word[10:]
''
>>> word[2:1]
''

Indices may be negative numbers, to start counting from the right. For example:

>>> word[-1] # The last character
'A'
>>> word[-2] # The last-but-one character
'p'
>>> word[-2:] # The last two characters
'pA'
>>> word[:-2] # Everything except the last two characters
'Hel'

But note that -0 is really the same as 0, so it does not count from the right!

>>> word[-0] # (since -0 equals 0)
'H'

Out-of-range negative slice indices are truncated, but don’t try this for single-element (non-slice) indices:

>>> word[-100:]
'HelpA'
>>> word[-10] # error
Traceback (most recent call last):
File "<stdin>", line 1, in ?
IndexError: string index out of range

One way to remember how slices work is to think of the indices as pointing between characters, with the left edge of the first character numbered 0. Then the right edge of the last character of a string of n characters has index n, for example:


 +---+---+---+---+---+                                             
 | H | e | l | p | A |                                              
 +---+---+---+---+---+                                              
 0   1   2   3   4   5              
-5  -4  -3  -2  -1                  

The first row of numbers gives the position of the indices 0...5 in the string; the second row gives the corresponding negative indices. The slice from i to j consists of all characters between the edges labeled i and j, respectively.

For non-negative indices, the length of a slice is the difference of the indices, if both are within bounds. For example, the length of word[1:3] is 2.

The built-in function len() returns the length of a string:

>>> s = 'supercalifragilisticexpialidocious'
>>> len(s)
34

0 comments:

Post a Comment

Contact Form

Name

Email *

Message *