Andres Baravalle
Strings are data structures composed of a sequence of characters (letters, digits, punctuation, whitespace and so on).
Strings are widely used in PHP; string functions can be used to:
PHP has just under 100 functions (as of 2013) to interact with strings.
You should become familiar with all the ones listed in the next few pages (and all the ones included in your text book, of course).
trim() | Remove leading and trailing whitespace from a string. You can also use ltrim() (leading only) and rtrim() (trailing only) |
nl2br() | Insert '<br />' or '<br>' before every newline in the string |
strtoupper() | Return uppercase string |
strtolower() | Return lowercase string |
ucfirst() | Return the string with an uppercase first character |
ucwords() | Capitalise the first character in each word in the string |
printf() | Output a formatted string |
sprintf() | Return a formatted string |
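A minimal sketch of the simpler functions in the table above. The input string and variable names are made up for illustration.

```php
<?php
// Illustrative input only: two lines of text with stray spaces around them.
$raw = "  hello world\nsecond line  ";

// trim() removes the leading and trailing whitespace.
$trimmed = trim($raw);

$upper   = strtoupper($trimmed); // every letter in uppercase
$first   = ucfirst($trimmed);    // only the very first character in uppercase
$words   = ucwords($trimmed);    // first character of each word in uppercase
$with_br = nl2br($trimmed);      // adds <br /> before the newline

echo $upper, "\n";
echo $first, "\n";
echo $words, "\n";
echo $with_br, "\n";
```

Note that nl2br() keeps the original newline and only adds the HTML line break in front of it.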
<?php
$num = 5;
$location = "tree";
$format = 'There are %d monkeys in the %s';
printf($format, $num, $location);
?>
<?php
$num = 5;
$location = "tree";
$format = 'The %2$s contains %1$d monkeys. <br>That\'s a nice %2$s full of %1$d monkeys.';
printf($format, $num, $location);
?>
<?php
$s = 'monkey';
$t = 'many monkeys';
printf("[%s]<br>\n", $s); // standard string output
printf("[%10s]<br>\n", $s); // right-justification with spaces
printf("[%-10s]<br>\n", $s); // left-justification with spaces
printf("[%010s]<br>\n", $s); // zero-padding works on strings too
printf("[%'#10s]<br>\n", $s); // use the custom padding character '#'
printf("[%10.10s]<br>\n", $t); // minimum width of 10 with a cutoff (precision) of 10 characters
?>
See the expected output here. Please note: a special font is used in the live example: why?
<?php
$output = "";
$start_page = array(1, 20, 35, 70, 90, 123, 156, 190, 210, 230, 256);
for ($i = 0; $i < count($start_page); $i++) {
    $output .= sprintf("Chapter %'.-20d%'.4d\n", $i+1, $start_page[$i]);
}
echo $output;
?>
See the expected output here.
Let's analyse the sprintf() format parameter in detail:
sprintf("Chapter %'.-20d%'.4d\n", $i+1, $start_page[$i]);
For the first conversion specification (%'.-20d):
% starts the conversion specification.
'. sets . as the padding character.
- left-justifies the value within the field (the default is right-justification).
20 is the minimum width of the result; as we have 11 chapters, the numbers of chapters 1-9 will be followed by 19 dots (.) and those of chapters 10-11 by 18.
d formats the argument as a decimal integer (in this example %s would give the same output, as the arguments are already integers).
Use the printf() function and:
Using the following array of chapters titles and pages:
<?php
$array = array(
array("Intro", 1),
array("Random chapter name", 6),
array("Another random chapter name", 13),
array("More random chapter name", 2),
array("Again a random chapter name", 33),
array("Blah blah random chapter name", 39),
array("Beh random chapter name", 45),
array("Atch! random chapter name", 61),
array("This is a random chapter name", 81),
array("Final random chapter name", 89)
);
?>
Use printf() to print the chapter number, title and page number of each chapter, formatted like the following lines:
Chapter 1: Intro....................1
Chapter 2: Random chapter name......6
Any content generated by users should be sanitised before passing it to the functions that will process it (e.g. to interact with a database).
When a user submits a form, the user's selections (e.g. checkbox/radio button selections, text in textarea/input fields) are sent to the server and stored in the $_GET or $_POST variable.
The next step - in any page like the SourceForge log-in in the previous slide - will be to compare the content submitted by the user (user name and password) against the database.
If the user content is not filtered, an attacker can try to inject SQL code in the query (rather than simple text).
A number of techniques are possible, but as you have no experience with SQL we will not explore them further.
Make sure that you sanitise any user content before using it. Always.
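As a minimal sketch of what sanitising can look like when user content is echoed back into an HTML page, the example below uses htmlspecialchars() and strip_tags(). The $comment value is a made-up stand-in for something that would normally arrive through $_POST or $_GET.

```php
<?php
// Hypothetical user input; in a real page this would come from $_POST or $_GET.
$comment = "<script>alert('hi')</script> Tom & \"Jerry\"";

// Convert the characters that are special in HTML into entities,
// so the browser displays them instead of interpreting them.
$safe_html = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

// Alternatively, remove the tags altogether (the text between them is kept).
$no_tags = strip_tags($comment);

echo $safe_html, "\n";
echo $no_tags, "\n";
```

Which function is appropriate depends on where the content ends up: escaping for HTML output is not the same as escaping for a database query.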
addslashes() (reverse: stripslashes()) | Return a string with backslashes before single quote ('), double quote ("), backslash (\) and NUL. When the content to be escaped will be used in a database query, use the database's native escaping function instead, e.g. mysqli_real_escape_string() |
htmlspecialchars() | Converts some special characters to HTML entities |
htmlentities() | Convert all applicable characters to HTML entities |
strip_tags() | Strip HTML and PHP tags from a string |
filter_var() | Use to sanitise variables (use FILTER_SANITIZE_STRING as filter to strip tags and sanitise strings; note that this filter is deprecated as of PHP 8.1) |
implode() | Join array elements with a string |
explode() | Split a string by string; returns an array |
<?php
$array = array('lastname', 'email', 'phone');
$comma_separated = implode(",", $array);
echo $comma_separated; // lastname,email,phone
?>
<?php
$ingredients = "tomato mozzarella basil artichokes mushrooms ham olives";
$ingredients_array = explode(" ", $ingredients);
?>
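The explode() example above produces no output, so here is a short sketch that inspects the resulting array and joins it back together with a different separator. The $list variable is introduced only for this illustration.

```php
<?php
$ingredients = "tomato mozzarella basil artichokes mushrooms ham olives";

// Split on single spaces: one array element per word.
$ingredients_array = explode(" ", $ingredients);

echo count($ingredients_array), "\n"; // 7
echo $ingredients_array[1], "\n";     // mozzarella

// implode() reverses the operation; here with ", " as the glue.
$list = implode(", ", $ingredients_array);
echo $list, "\n";
```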
Use this page to find the list of the 100 most popular male names in 2012 in the US and:
strtr() | Translate characters or replace substrings |
substr() | Return part of a string |
str_replace() | Replace search string with the replacement string |
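A small sketch of the three functions in the table above, applied to an arbitrary sample string chosen for this illustration.

```php
<?php
$text = "Hello, World";

// substr(): start at offset 7 and take 5 characters.
$part = substr($text, 7, 5);                      // "World"

// A negative offset counts from the end of the string.
$tail = substr($text, -5);                        // "World"

// str_replace(): replace every occurrence of a substring.
$replaced = str_replace("World", "PHP", $text);   // "Hello, PHP"

// strtr() with two strings translates character by character: l -> 0, o -> 1.
$translated = strtr($text, "lo", "01");           // "He001, W1r0d"

// strtr() with an array replaces whole substrings.
$swapped = strtr("Hi all", array("Hi" => "Hello", "all" => "everyone"));

echo $part, "\n", $tail, "\n", $replaced, "\n", $translated, "\n", $swapped, "\n";
```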
Building on top of activity #3, now help Rosemary find her perfect baby name:
Sonnet 116 by Shakespeare is stored in the code below with its rhyme scheme at the end of each line (ab ab cd cd ef ef gg):
<?php
$s116 = "Let me not to the marriage of true minds (a)
Admit impediments, love is not love (b)
Which alters when it alteration finds, (a)
Or bends with the remover to remove. (b)
O no, it is an ever fixèd mark (c)
That looks on tempests and is never shaken; (d)
It is the star to every wand'ring bark, (c)
Whose worth's unknown although his height be taken. (d)
Love's not time's fool, though rosy lips and cheeks (e)
Within his bending sickle's compass come, (f)
Love alters not with his brief hours and weeks, (e)
But bears it out even to the edge of doom: (f)
If this be error and upon me proved, (g)
I never writ, nor no man ever loved. (g)";
?>
Use a set of replace functions to remove the rhyme indicators at the end of each line (e.g. (d)).
The opposite of implode() can be used to convert the string into an array with as many elements as there are lines in the string. Read the documentation and apply the function to the sonnet used in the previous activity.
Regular expressions (reg exps) provide a special syntax for searching for patterns of text within strings.
Regular expressions are enclosed in delimiters (usually slashes). For example, this simple regular expression:
/word/
searches for the text "word" anywhere within the target string.
Regular expressions as a concept arose in the 1950s and are in common use in Unix tools such as grep, ed and vi.
The next slides will focus first on syntax of regular expressions, and then on their use in PHP.
PHP's main pattern-matching function is preg_match(). The main pattern-replacing function is preg_replace().
<?php
// ~ is the delimiter here; you can use any symbol that does not appear in your pattern
if(preg_match('~word~','In linguistics, a word is the smallest
element that may be uttered in isolation
with semantic or pragmatic content.', $matches)) {
echo "Pattern found.";
}
?>
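For completeness, a comparable sketch for preg_replace(). The sample sentence is made up, and the optional fourth and fifth parameters (limit and count) are used only to show how many replacements were made.

```php
<?php
$text = 'The colour red and the colour blue.';

// Replace every match of the pattern; -1 means "no limit".
// $count receives the number of replacements performed.
$result = preg_replace('~colour~', 'color', $text, -1, $count);

echo $result, "\n"; // The color red and the color blue.
echo $count, "\n";  // 2
```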
Replace the word "fox" with "cat" in "The quick brown fox jumps over the lazy dog":
str_replace()
preg_replace()
Extra challenge: measure execution time for both approaches (hint: use microtime()).
Replace the word "fox" with "cat" and "dog" with "mouse" in "The quick brown fox jumps over the lazy dog":
str_replace()
preg_replace()
Extra challenge: measure execution time for both approaches (hint: use microtime()).
Character classes (or sets) are used to match one of several characters:
E.g.:
/[A-Z0-9]/
Will match any single uppercase letter or digit.
PHP includes a number of predefined classes, including:
[[:alnum:]] | Alphanumeric characters |
[[:alpha:]] | Alphabetic characters |
[[:lower:]] | Lowercase letters |
[[:upper:]] | Uppercase letters |
[[:digit:]] | Decimal digits |
[[:punct:]] | Punctuation |
[[:blank:]] | Space and tab |
Refer to the full list for more expressivity.
/[[:alpha:][:space:][:punct:]]/
Will match any letter, space or punctuation sign.
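The character classes above can be tried out with preg_match(), which returns 1 when the pattern matches and 0 when it does not. The sample characters below are arbitrary.

```php
<?php
// Explicit range class: uppercase letters and digits only.
$upper_or_digit = '/[A-Z0-9]/';

$r_A    = preg_match($upper_or_digit, 'A'); // 1
$r_7    = preg_match($upper_or_digit, '7'); // 1
$r_b    = preg_match($upper_or_digit, 'b'); // 0 (lowercase)
$r_bang = preg_match($upper_or_digit, '!'); // 0 (punctuation)

// Predefined classes go inside an outer pair of square brackets.
$r_punct = preg_match('/[[:punct:]]/', '!');  // 1
$r_digit = preg_match('/[[:digit:]]/', 'b');  // 0

echo "$r_A $r_7 $r_b $r_bang $r_punct $r_digit\n";
```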
Regular expressions allow you to perform more precise and complex searches. For example, if I'm looking for my name in a string, I might want to look for both my Italian (Andrea Baravalle) and Spanish (Andres Baravalle) names at the same time.
Or I might also want to look for my full name, including my second name (Nicola/Nicolas).
This is how this is represented using regular expressions:
/Andre[as]( Nicolas?)? Baravalle/
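A quick way to check a pattern like this is to run it against a few candidate names. The sketch below uses the pattern /Andre[as]( Nicolas?)? Baravalle/, in which the optional group carries its own leading space, so that exactly one space separates the names whether or not the second name is present. The test names are illustrative.

```php
<?php
$pattern = '/Andre[as]( Nicolas?)? Baravalle/';

$names = array(
    'Andrea Baravalle',
    'Andres Baravalle',
    'Andrea Nicola Baravalle',
    'Andres Nicolas Baravalle',
    'Andrew Baravalle', // should not match: w is not in [as]
);

$results = array();
foreach ($names as $name) {
    // preg_match() returns 1 on a match and 0 otherwise.
    $results[$name] = preg_match($pattern, $name);
    echo $name, ': ', $results[$name], "\n";
}
```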
. | Matches any single character. |
? | The preceding item is optional and will be matched, at most, once. |
* | The preceding item will be matched zero or more times. |
+ | The preceding item will be matched one or more times. |
{N} | The preceding item is matched exactly N times. |
{N,} | The preceding item is matched N or more times. |
{N,M} | The preceding item is matched at least N times, but not more than M times. |
- | Inside a character class, denotes a range (e.g. a-z). |
^ | Matches the empty string at the beginning of a line; as the first character inside a character class, it negates the class (matches any character not listed). |
.{9} | Any 9 characters |
(az){3} | azazaz |
(az){2,3} | azaz or azazaz |
[a-c]{2} | any 2 character combination of a, b and c |
^Abcd | Matches Abcd only at the beginning of the line |
(az)?c | Matches c and azc |
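The examples in the table above can be verified by looking at the text each pattern actually matched, which preg_match() stores in the first element of its third argument. The subject strings are arbitrary samples.

```php
<?php
// .{9}: the first nine characters, whatever they are.
preg_match('/.{9}/', 'regular expressions', $m_any);

// (az){2,3}: greedy, so it takes three repetitions when available.
preg_match('/(az){2,3}/', 'azazazaz', $m_rep);

// [a-c]{2}: the first pair of characters drawn from a, b, c.
preg_match('/[a-c]{2}/', 'xxcbxx', $m_class);

// ^Abcd: only matches when the subject starts with Abcd.
$at_start  = preg_match('/^Abcd/', 'Abcdef');
$not_start = preg_match('/^Abcd/', 'xAbcd');

// (az)?c: the az prefix is optional.
preg_match('/(az)?c/', 'azc', $m_opt);

echo $m_any[0], "\n", $m_rep[0], "\n", $m_class[0], "\n";
echo $at_start, ' ', $not_start, "\n", $m_opt[0], "\n";
```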
<?php
$url = "http://www.eztvproxy.org/shows/23/the-big-bang-theory/";
// we need to download the page at this URL
$html = file_get_contents($url);
if ($html) {
    // uncomment the next line to check if you are downloading the original page
    // echo $html;
    // sample
    // <a href="http://torrent.zoink.it/The.Big.Bang.Theory.S02E06.HDTV.XviD-LOL.[eztv].torrent" class="download_1" title="Download Mirror #1"></a>
    // getting all episodes
    // $pattern = '#http://[a-z\.]+/The.Big.Bang.Theory.S[0-9]+E[0-9]+.*?\.torrent#';
    // getting series 5 only
    $pattern = '#http://[a-z\.]+/The.Big.Bang.Theory.S05E[0-9]+.*?\.torrent#';
    if (preg_match_all($pattern, $html, $matches)) {
        // $matches[0] holds the full matches; sort them in natural order
        natcasesort($matches[0]);
        echo "<pre>";
        print_r(array_reverse($matches[0]));
        echo "</pre>";
    }
}
?>
When you use quantifiers to match multiple characters, the quantifiers are by default greedy: they match as many characters as possible.
You can change a quantifier to be non-greedy. This causes it to match the smallest number of characters possible. To make a quantifier non-greedy, place a question mark (?) after the quantifier.
<?php
preg_match("/P.*?r/", "Peter Piper", $matches);
echo $matches[0]; // Displays "Peter"
preg_match("/P.*r/", "Peter Piper", $matches);
echo $matches[0]; // Displays "Peter Piper"
?>
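The difference matters most when matching delimited text such as HTML tags. The following sketch uses a made-up HTML fragment to show the two behaviours side by side.

```php
<?php
$html = '<b>bold</b> and <i>italic</i>';

// Greedy: .+ runs to the end and backtracks to the LAST >,
// so the match spans the whole fragment.
preg_match('/<.+>/', $html, $greedy);

// Non-greedy: .+? stops at the FIRST > it can reach.
preg_match('/<.+?>/', $html, $lazy);

echo $greedy[0], "\n"; // <b>bold</b> and <i>italic</i>
echo $lazy[0], "\n";   // <b>
```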
Solve the exercises on this page. If unsure, test them with a PHP script or in Notepad++.
By placing a portion of your regular expression's rules in parentheses, you can group those rules into a subpattern. You can now use quantifiers (such as * and ? ) to match the whole subpattern a certain number of times.
A side-effect of using subpatterns is that you can retrieve the individual subpattern matches in the matches array passed to preg_match(). The first element of the array contains the entire matched text as usual, and each subsequent element contains any matched subpatterns:
<?php
preg_match( "/(\d+\/\d+\/\d+) (\d+\:\d+.+)/", "7/18/2004 9:34AM", $matches );
echo $matches[0] . "<br>"; // Displays "7/18/2004 9:34AM"
echo $matches[1] . "<br>"; // Displays "7/18/2004"
echo $matches[2] . "<br>"; // Displays "9:34AM"
?>
Regular expressions let you combine patterns (and subpatterns) with the | (vertical bar) character to create alternatives.
<?php
$day = "wed";
echo preg_match( "/mon|tue|wed|thu|fri|sat|sun/", $day ); // Displays "1"
?>
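Alternatives can also be placed inside a subpattern, so that only part of the expression varies. The sketch below is an illustration built on the same idea: it accepts full English day names by combining a group of alternative prefixes with the fixed suffix "day". The ^ and $ anchors (start and end of the string) are added here so that nothing else may surround the name.

```php
<?php
// Only the prefix varies; "day" is common to all alternatives.
$pattern = '/^(mon|tues|wednes|thurs|fri|satur|sun)day$/';

$is_day  = preg_match($pattern, 'wednesday', $matches); // 1
$not_day = preg_match($pattern, 'someday');             // 0

echo $is_day, "\n";
echo $matches[1], "\n"; // the alternative that matched: wednes
echo $not_day, "\n";
```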
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License