Page 591 - Beginning PHP 5.3
Chapter 18: String Matching with Regular Expressions
Try It Out: Find All Links in a Web Page
In this example you use preg_match_all() with a regular expression to extract and display all links in
an HTML Web page. Save the following script as find_links.php in your document root folder:
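Before the full listing, it may help to see how preg_match_all() fills its results array. The following is a minimal sketch, not part of find_links.php: the $html string is a hypothetical stand-in for a fetched page, and the pattern is the same one the script uses.

```php
<?php
// Hypothetical sample markup (an assumption for illustration only);
// the real script reads its HTML from the URL entered in the form.
$html = '<p><a href="http://www.example.com/">Home</a> and '
      . "<a href='about.html' class='nav'>About</a></p>";

// Capture the quoted value of each href attribute.
// (.+?) is non-greedy, so it stops at the first closing quote.
preg_match_all( "/<a\s*href=['\"](.+?)['\"].*?>/i", $html, $matches );

// With the default ordering, $matches[0] holds each full match and
// $matches[1] holds the text captured by the first subpattern (the URLs).
foreach ( $matches[1] as $link ) {
  echo htmlspecialchars( $link ) . "\n";
}
```

Because the pattern expects href to come directly after <a, links that place another attribute first would presumably be missed. The find_links.php script itself follows: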
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Find Linked URLs in a Web Page</title>
    <link rel="stylesheet" type="text/css" href="common.css" />
  </head>
  <body>
    <h1>Find Linked URLs in a Web Page</h1>
<?php
displayForm();
if ( isset( $_POST["submitted"] ) ) {
  processForm();
}
function displayForm() {
?>
    <h2>Enter a URL to scan:</h2>
    <form action="" method="post" style="width: 30em;">
      <div>
        <input type="hidden" name="submitted" value="1" />
        <label for="url">URL:</label>
        <input type="text" name="url" id="url" value="" />
        <label> </label>
        <input type="submit" name="submitButton" value="Find Links" />
      </div>
    </form>
<?php
}
function processForm() {
  $url = $_POST["url"];
  if ( !preg_match( '|^http(s)?\://|', $url ) ) $url = "http://$url";
  $html = file_get_contents( $url );
  preg_match_all( "/<a\s*href=['\"](.+?)['\"].*?>/i", $html, $matches );
  echo '<div style="clear: both;"> </div>';
  echo "<h2>Linked URLs found at " . htmlspecialchars( $url ) . ":</h2>";
  echo "<ul>";
  for ( $i = 0; $i < count( $matches[1] ); $i++ ) {
    echo "<li>" . htmlspecialchars( $matches[1][$i] ) . "</li>";
  }
  echo "</ul>";