Page 591 - Beginning PHP 5.3

Chapter 18: String Matching with Regular Expressions
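
Before working through the Try It Out that follows, it may help to see how preg_match_all() fills its $matches array when the pattern contains a single capturing group. The snippet below is a minimal sketch rather than part of the book's script: the $html sample string is made up for illustration, and the pattern is simply the one used in find_links.php.

```php
<?php
// Hypothetical sample markup (an assumption for illustration, not from the chapter)
$html = '<p><a href="http://example.com/">Home</a> and '
      . "<a href='about.html' class=\"nav\">About</a></p>";

// Same pattern as find_links.php: the parentheses capture the href value
$count = preg_match_all( "/<a\s*href=['\"](.+?)['\"].*?>/i", $html, $matches );

// With the default PREG_PATTERN_ORDER flag, $matches[0] holds the full
// matched tags and $matches[1] holds the text captured by the first group
foreach ( $matches[1] as $link ) {
  echo htmlspecialchars( $link ) . "\n";
}
```

Note that this pattern only finds links whose href attribute comes immediately after the tag name, so it is a simple approach, not a general HTML parser.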

                       Try It Out: Find All Links in a Web Page
                         In this example you use preg_match_all() with a regular expression to extract and display all links in
                         an HTML Web page. Save the following script as find_links.php in your document root folder:
                             <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
                               "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
                             <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
                               <head>
                                 <title>Find Linked URLs in a Web Page</title>
                                 <link rel="stylesheet" type="text/css" href="common.css" />
                               </head>
                               <body>

                                 <h1>Find Linked URLs in a Web Page</h1>

                             <?php

                             displayForm();

                             if ( isset( $_POST["submitted"] ) ) {
                               processForm();
                             }

                             function displayForm() {
                             ?>
                                 <h2>Enter a URL to scan:</h2>
                                 <form action="" method="post" style="width: 30em;">
                                   <div>
                                     <input type="hidden" name="submitted" value="1" />
                                     <label for="url">URL:</label>
                                     <input type="text" name="url" id="url" value="" />
                                     <label> </label>
                                     <input type="submit" name="submitButton" value="Find Links" />
                                   </div>
                                 </form>
                             <?php
                             }

                             function processForm() {
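                               $url = $_POST["url"];
                               // Prepend "http://" if the URL has no http:// or https:// scheme
                               if ( !preg_match( '|^http(s)?\://|', $url ) ) $url = "http://$url";
                               $html = file_get_contents( $url );
                               // Capture each href value; the captured text ends up in $matches[1]
                               preg_match_all( "/<a\s*href=['\"](.+?)['\"].*?>/i", $html, $matches );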

                               echo '<div style="clear: both;"> </div>';
                               echo "<h2>Linked URLs found at " . htmlspecialchars( $url ) . ":</h2>";
                               echo "<ul>";

                               for ( $i = 0; $i < count( $matches[1] ); $i++ ) {
                                 echo "<li>" . htmlspecialchars( $matches[1][$i] ) . "</li>";
                               }

                               echo "</ul>";

