09-03-2006, 03:12 PM   #1
Idan
[How-To] php: fetch specific info from remote + intro to regexp
In this tutorial I'll explain a basic approach to using PHP code to process info that is located on a remote server.
This can be quite useful when you want to fetch a cover picture or a specific piece of text from a website whose pages all follow the same pattern. We'll implement this by matching a regular expression pattern against the page.
(For example, covers from IMDb / GameSpot / TV sites, or anything else that comes to mind.)

Small disclaimer before we continue: this tutorial is for educational purposes only. Be aware that most website content (text or pictures) is protected by copyright, and using it without the owner's permission is usually forbidden by law! So in this tutorial I'm assuming you have the proper permissions.

OK, let's start:

First I'll introduce some basic code/"algo" (one solution of many possible) that lets you open a remote file & process it line by line:
PHP Code:
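$page = file("http://www.my_target_website.com/path_to_file");
if ($page === false) {
     die("could not open remote file");
}
$my_data = null; //stays null if no line matches
foreach ($page as $line)
{
     //match for some pattern we are looking for
     //(eregi() exists only in PHP below 7; on newer versions use preg_match() with the i modifier)
     if (eregi('regexp pattern', $line, $dump)) {
            $my_data = $dump[1];
            break;  //you can stop loop - page not needed anymore
     }
}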
As you can see, this code uses the PHP function file() to open the remote file (this works for URLs only when allow_url_fopen is enabled in php.ini). file() returns the page as an array with one cell per line, and the foreach loop walks through that array line by line. Once the pattern we need has matched, the break command stops the loop, so no time is wasted on the rest of the page. Also, please note that the pattern match uses eregi(), which is case insensitive. If you need a case-sensitive match, use ereg() instead.
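A note for newer PHP versions: ereg() and eregi() were removed in PHP 7, so here is a rough sketch of the same loop using preg_match() with the i (case insensitive) modifier instead. The helper name find_first_match(), the sample lines and the title pattern are all made up for illustration; in real use the array would come from file().
PHP Code:

```php
<?php
// Hypothetical helper: return the first () group from the first line
// that matches $pattern, or null when no line matches.
function find_first_match($lines, $pattern)
{
    foreach ($lines as $line) {
        // preg_match() returns 1 on a match and fills $dump, much like eregi() did
        if (preg_match($pattern, $line, $dump) === 1) {
            return $dump[1]; // stop at the first hit
        }
    }
    return null;
}

// Sample data standing in for the array returned by file() (assumption).
$page = array(
    "<html>\n",
    "<TITLE>My Page</TITLE>\n",
    "<body>hello</body>\n",
);

// "#" is the pattern delimiter, the trailing "i" means case insensitive.
$my_data = find_first_match($page, '#<title>(.*)</title>#i');
echo $my_data, "\n";
```

With preg_match() you pick the delimiter yourself, so choosing # means the slash in </title> needs no escaping.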

Now for a regular expression "crash course" to pick up the basics:
1. Read the tutorials & guides at http://www.regular-expressions.info/ (a very good website with good regexp resources).
2. Very common regexp chars/symbols, with examples to make them clear:
^ - matches the start of the string
$ - matches the end of the string
. - matches any single char except a line break (.* adds repetition: zero or more of them)
[] - a list or range of chars; matches any one of them
() - a group; captures what matched so you can refer to it later on (see examples)
{} - the number of repetitions
Examples:
Start/end/all chars examples:
^a.* - any line that starts with the letter "a"
^a.*t$ - any line that starts with the letter "a" & ends with "t"
Use \ (backslash) to "escape" any special regexp char when you need to match that char itself.
Range & repetition examples:
[0-9] - one digit between 0 and 9
[a-z] - one lower case letter from a to z
[a-zA-Z] - one lower or upper case letter (a to z)
[0-9]{1,6} - a run of 1 to 6 digits, each between 0 and 9
[4-9]{5} - exactly 5 digits, each between 4 and 9
Escaping chars example:
\(hello\) - matches "(hello)" (without the quotes)
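To try these symbols yourself, something like this should do. It's only a sketch: matches() is a made-up helper and the test strings are invented. It uses preg_match() (ereg() is gone in PHP 7 and later), and the ^ and $ anchors were added to the digit patterns because without them a pattern can match anywhere inside a longer string.
PHP Code:

```php
<?php
// Hypothetical helper: true when $pattern matches somewhere in $text.
function matches($pattern, $text)
{
    return preg_match($pattern, $text) === 1;
}

var_dump(matches('/^a.*/', 'a cat'));            // true: starts with "a"
var_dump(matches('/^a.*t$/', 'a cat'));          // true: starts with "a", ends with "t"
var_dump(matches('/^a.*t$/', 'a dog'));          // false: does not end with "t"
var_dump(matches('/^[0-9]{1,6}$/', '2006'));     // true: 1 to 6 digits
var_dump(matches('/^[0-9]{1,6}$/', '1234567'));  // false: 7 digits is too long
var_dump(matches('/^[4-9]{5}$/', '45678'));      // true: five digits, each 4-9
var_dump(matches('/^[4-9]{5}$/', '12345'));      // false: 1, 2 and 3 are outside 4-9
var_dump(matches('/\(hello\)/', 'say (hello)')); // true: escaped parentheses
```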
Using () example:
When you expect one string to contain several pieces you want, use () to separate each result. Later on, when you use a PHP function like eregi(), each () becomes a new array cell in your dump var.
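A small sketch of the () idea, assuming an invented sample string and using preg_match() in place of eregi() (which no longer exists in PHP 7 and later). The variable names are just for illustration.
PHP Code:

```php
<?php
// Made-up sample text; each () in the pattern becomes one cell in $dump.
$text = 'posted 09-03-2006 by Idan';

if (preg_match('/([0-9]{2})-([0-9]{2})-([0-9]{4})/', $text, $dump)) {
    $first  = $dump[1]; // first group:  "09"
    $second = $dump[2]; // second group: "03"
    $year   = $dump[3]; // third group:  "2006"
    echo "$first / $second / $year\n";
}
```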
3. Putting it all together:

Let's assume I want to extract something from an HTML code line like:
HTML Code:
<a name="poster" href="photogallery" title="The Matrix"><img border="0" alt="The Matrix" title="The Matrix" src="http://ia.imdb.com/media/imdb/01/I/38/48/31m.jpg" height="140" width="99"></a> 
(A code line from IMDb showing the Matrix cover picture link.)

Let's assume I want to extract the cover picture link from it, so a possible match pattern would be something like this:
PHP Code:
<a name="poster" href="photogallery" title=".*"><img border="0" alt=".*" title=".*" src="(http:\/\/.*\.jpg)" height="[0-9]{2,3}" width="[0-9]{2,3}"></a> 
Explanation of what was done in the example:
Logic says the name can vary, so every copy of "The Matrix" was replaced with .* to match any chars (including spaces).
Around the info I need to fetch (the url itself) I added ().
Notice I've escaped some special chars like / (slash) and . (dot).
And notice the use of number ranges.
Also note that the more accurate the regexp is, the more certain you are to get what you need.
For the above HTML code line, the following regexp would match as well:
PHP Code:
.* src="(http:\/\/.*\.jpg)".* 
However, this one is "bad", since it would match any picture url with a jpg extension, and not necessarily our wanted cover link.
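To see the difference between the two patterns, here is a sketch that runs both over a tiny made-up page where a banner image comes before the poster line. The banner line, the example.com url and the helper name first_capture() are assumptions for the demo. Both patterns are written for preg_match() with # as the delimiter, so the slashes don't need escaping.
PHP Code:

```php
<?php
// Made-up page: a banner image first, then the poster line from the tutorial.
$lines = array(
    '<img alt="ad" src="http://example.com/banner.jpg">',
    '<a name="poster" href="photogallery" title="The Matrix"><img border="0" alt="The Matrix" title="The Matrix" src="http://ia.imdb.com/media/imdb/01/I/38/48/31m.jpg" height="140" width="99"></a>',
);

$loose  = '#.* src="(http://.*\.jpg)".*#i';
$strict = '#<a name="poster" href="photogallery" title=".*"><img border="0" alt=".*" title=".*" src="(http://.*\.jpg)" height="[0-9]{2,3}" width="[0-9]{2,3}"></a>#i';

// Hypothetical helper: first () group of the first matching line, or null.
function first_capture($lines, $pattern)
{
    foreach ($lines as $line) {
        if (preg_match($pattern, $line, $dump) === 1) {
            return $dump[1];
        }
    }
    return null;
}

echo first_capture($lines, $loose), "\n";  // stops at the banner, not the cover
echo first_capture($lines, $strict), "\n"; // skips the banner, finds the cover
```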

Going back to the code example from the beginning of this tutorial:
PHP Code:
...
     //match for cover pattern we are looking for
     if (eregi('<a name="poster" href="photogallery" title=".*"><img border="0" alt=".*" title=".*" src="(http:\/\/.*\.jpg)" height="[0-9]{2,3}" width="[0-9]{2,3}"></a>',$line, $dump)) {
            $my_pic_url = $dump[1];
            break;  //you can stop loop - page not needed anymore
     }
...
Here the url we needed is stored in $my_pic_url.

One more note about the eregi()/ereg() functions mentioned above: if your pattern has more than one () group, each captured match is returned as a new array cell, in the order the groups appear in the pattern. Note that the first group lands in cell #1.
Cell #0 = the entire text that matched your pattern.
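As a sketch of the cell numbering, here is the cover pattern again with a second () added around the first title, run against the sample line with preg_match() (which numbers its cells the same way). The extra group and the variable names are my own additions for the demo.
PHP Code:

```php
<?php
$line = '<a name="poster" href="photogallery" title="The Matrix"><img border="0" alt="The Matrix" title="The Matrix" src="http://ia.imdb.com/media/imdb/01/I/38/48/31m.jpg" height="140" width="99"></a>';

// Group 1 = the title, group 2 = the picture url.
$pattern = '#<a name="poster" href="photogallery" title="(.*)"><img border="0" alt=".*" title=".*" src="(http://.*\.jpg)" height="[0-9]{2,3}" width="[0-9]{2,3}"></a>#i';

if (preg_match($pattern, $line, $dump)) {
    $whole = $dump[0]; // everything the pattern matched (here: the whole line)
    $title = $dump[1]; // first ()
    $url   = $dump[2]; // second ()
    echo $title, "\n", $url, "\n";
}
```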

Also, a few pointers for those hard-to-match cases:
When you need to match a line with nothing "special" about it (just plain text), you can't use the .* approach, as it will match every line.
Instead, try using a var as a "flag" that gets set on the previous line.
For example, if I have 2 lines of HTML code:
HTML Code:
name:
any text goes here
and let's assume you want to match only the "any text goes here" line: write a pattern to match "name:", and inside that IF do something like
PHP Code:
$flag=1;
continue; //jump to next line on loop
and above it, add another IF inside the foreach loop to check:
PHP Code:
//set $flag=0; once before the foreach loop starts
if ($flag) {
    $my_text = $line;
    $flag=0; //reset flag back to zero
    break;
}
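Put together, the two IFs could sit in the loop roughly like this. It's a sketch under a few assumptions: the sample lines are invented, preg_match() replaces eregi(), and trim() is my own addition to drop the line break that file() leaves at the end of each line.
PHP Code:

```php
<?php
// Made-up lines standing in for the array returned by file().
$page = array("<p>intro</p>\n", "name:\n", "any text goes here\n", "footer\n");

$flag = 0;       // 1 = the previous line was the "name:" marker
$my_text = null; // stays null if the marker never shows up

foreach ($page as $line) {
    // this check comes first, so it sees the flag set on the previous line
    if ($flag) {
        $my_text = trim($line); // drop the trailing line break
        $flag = 0;              // reset flag back to zero
        break;
    }
    if (preg_match('/^name:/i', $line)) {
        $flag = 1;
        continue; // jump to next line on loop
    }
}

echo $my_text, "\n";
```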
That's it. I hope this gave you the basics & a way to approach any task like this. Looked at this way, there isn't any website that is "too big" to handle; it's all a matter of practice & close examination of the HTML code output.

Another tip once passed on to me by ShavedApe (who is a great coder) is to use an app like "RegexBuddy". Though it's not freeware, it's a very handy tool that highlights what your regexp matches in your text, making your life much easier when a more complex regexp is required.

Hope you find this useful.
greets.
Regards,
Idan.

09-03-2006, 03:18 PM   #2
Nick R
interesting, thanks

09-03-2006, 07:14 PM   #3
Senna
Nice tutorial, it explains the basics very well

09-03-2006, 07:28 PM   #4
Michael Biddle
great tutorial