In this tutorial I'll explain a basic approach to processing, via PHP code, info that is located on a remote server.
This can be quite useful when you want to fetch a cover picture or specific text data from a website whose pages all follow the same pattern (for example covers from imdb / gamespot / tv, or anything else that comes to your mind).
We'll implement this by matching a regular expression pattern against the lines of the page.
Small disclaimer before we continue: this tutorial is for educational purposes only. You should be aware that most website content (either text or pictures) is protected by copyright, and using it without the owner's permission is usually forbidden by law! So in this tutorial I'm assuming you have the proper permissions.
OK, let's start:
First I'll introduce the basic code / "algo" (one solution of many possible) that lets you open a remote file and process it line by line.
What this code basically does is use the PHP function file() to open the remote file (this needs allow_url_fopen enabled in php.ini), then take the array it returns, one element per line, and walk through it with a foreach loop. Once the needed pattern has matched, a break command stops the loop, to avoid further lag. Also please note that the pattern match was originally done with eregi() (case insensitive) or ereg() (case sensitive). Both were deprecated in PHP 5.3 and removed in PHP 7, so on a current PHP use preg_match() instead: add the i modifier after the closing delimiter for a case-insensitive match, and leave it off for a case-sensitive one.
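Here is a minimal sketch of the loop described above. It is not the original listing: the helper name find_first_match() and the example.com URL are made up for illustration, and it uses preg_match() in place of eregi().

```php
<?php
// Hypothetical helper: scans lines (as returned by file()) and returns the
// first () group of the first line that matches $pattern, or null if no
// line matches.
function find_first_match(array $lines, string $pattern): ?string
{
    $found = null;
    foreach ($lines as $line) {
        // preg_match() returns 1 when the pattern matched this line.
        if (preg_match($pattern, $line, $m) === 1) {
            // Fall back to the whole match if the pattern has no () group.
            $found = $m[1] ?? $m[0];
            break; // stop looping once we have what we need
        }
    }
    return $found;
}

// Usage with a remote page (placeholder URL, needs allow_url_fopen):
// $lines = file('http://www.example.com/page.html');
// if ($lines !== false) {
//     $my_pic_url = find_first_match($lines, '/src="([^"]+\.jpg)"/i');
// }
```

Keeping the loop in a function that takes the array of lines means you can try your pattern on a few hand-written lines before pointing it at a real page.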
Now for a regular expression "crash course" to pick up the basics:
1. Read the tutorials & info guides at
http://www.regular-expressions.info/ (this is a very good website with good regexp resources)
2. Very common regexp chars/symbols & examples to make them clear:
^ - matches the start of the string
$ - matches the end of the string
. - matches any char except a line break (.* repeats it: any number of such chars)
[] - a range or list of chars
() - a group; what it matched can be referred to later on (see examples)
{} - the number of repetitions
Examples: start/end/all chars
^a.* - any sentence that starts with the letter "a"
^a.*t$ - any sentence that starts with the letter "a" & ends with "t"
Use \ (backslash) to "escape" any special char used in regexp, in case you need to match it literally.
Range & repetition examples:
[0-9] - a digit between 0 and 9
[a-z] - a lower case letter from a to z
[a-zA-Z] - a lower or upper case letter (a to z)
[0-9]{1,6} - a number made of the digits 0-9 that is 1 to 6 digits long
[4-9]{5} - a 5-digit number where each digit is between 4 and 9
Escaping chars example:
\(hello\) - will match "(hello)" (without the quotes)
Using () example:
When you expect 1 string to match several queries, use () to separate each result. Later on, when using a PHP function like preg_match() (or the old eregi()), each () becomes a new array field in your dump var.
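The anchor, range, repetition and escaping examples above can be tried out with a quick sketch like this one. The test strings are made up, and I've wrapped the range patterns in ^ and $ here, since without them the pattern may also match just a part of a longer number.

```php
<?php
// Anchors: ^ = start of string, $ = end of string.
$starts_with_a = preg_match('/^a.*/', 'apple pie');       // 1
$a_to_t        = preg_match('/^a.*t$/', 'a fine start');  // 1
$not_a_to_t    = preg_match('/^a.*t$/', 'a fine day');    // 0

// Ranges and repetition, anchored so the WHOLE string must fit.
$one_to_six = preg_match('/^[0-9]{1,6}$/', '2024');       // 1
$too_long   = preg_match('/^[0-9]{1,6}$/', '1234567');    // 0
$five_high  = preg_match('/^[4-9]{5}$/', '45678');        // 1
$has_low    = preg_match('/^[4-9]{5}$/', '12345');        // 0

// Escaping: \( and \) match literal parentheses.
$literal    = preg_match('/\(hello\)/', 'say (hello) now'); // 1
```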
3. Putting it all together:
Let's assume I have a line of HTML code from imdb that shows the Matrix cover picture link, and I want to extract the cover picture link from it. A possible match line for it would be built like this:
Logic says the name could vary, so every copy of "The Matrix" is replaced with .* to match any chars (including spaces).
The info I need to fetch (the URL itself) gets () around it.
Notice that some special chars like / (slash) and . (dot) are escaped.
And notice the usage of a number range.
Also note that the more accurate the regexp is, the more certain you are to get exactly what you need.
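To make those points concrete, here is a sketch on a made-up HTML line. The markup and the img.example.com URL are my own assumptions for illustration, not the real imdb code, and the pattern is written for preg_match().

```php
<?php
// Made-up HTML line, for illustration only (not real imdb markup).
$line = '<img border="0" alt="The Matrix" title="The Matrix" '
      . 'src="http://img.example.com/covers/12/34/56m.jpg" height="140" width="99">';

// .* stands in for the movie name, () captures the URL,
// / and . are escaped, and [0-9]{2} is the number range.
$pattern = '/alt=".*" title=".*" '
         . 'src="(http:\/\/img\.example\.com\/covers\/[0-9]{2}\/[0-9]{2}\/[0-9]{2}m\.jpg)"/i';

$my_pic_url = null;
if (preg_match($pattern, $line, $m) === 1) {
    $my_pic_url = $m[1]; // the part inside ()
}
```

The slashes have to be escaped here because / is also used as the pattern delimiter. Picking another delimiter such as # would let you write them unescaped.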
For the above HTML line, a much looser regexp that simply grabs any URL ending in .jpg could work as well.
However, this is "bad", since it would match any picture URL with a jpg extension, and not necessarily our wanted cover link.
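A small sketch of why the loose version is risky, again on a made-up line (the banner and cover URLs are assumptions for illustration). When the page has another jpg before the cover, the loose pattern grabs that one instead.

```php
<?php
// Made-up HTML line with two pictures: a banner first, then the cover.
$line = '<img src="http://img.example.com/ads/banner.jpg"> '
      . '<img alt="The Matrix" src="http://img.example.com/covers/12/34/56m.jpg">';

// Loose pattern: anything from "http" up to the next ".jpg".
preg_match('/(http[^"]*\.jpg)/i', $line, $loose);

// Tighter pattern: only a .jpg under /covers/ counts.
preg_match('/(http:\/\/img\.example\.com\/covers\/[0-9\/]+m\.jpg)/i', $line, $tight);

// $loose[1] holds the banner URL, $tight[1] holds the cover URL.
```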
Going back to the code example from the beginning of this tutorial: with this pattern plugged into the loop, the URL we needed ends up stored in $my_pic_url.
Also note, regarding the preg_match() function (and the old eregi()/ereg()) mentioned above: if you used multiple () to gather each match you want, each match is returned as a new array cell, in the order the () appear in the pattern. Note that the first match is in cell #1.
Cell #0 = the entire text that matched your pattern, assuming something matched.
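A quick sketch of the cell order, using a made-up HTML line and a pattern with three () groups:

```php
<?php
// Made-up HTML line, for illustration only.
$line = '<img src="cover.jpg" height="140" width="99">';

// Three () groups: file name, height, width (in that order).
$pattern = '/src="([^"]+)" height="([0-9]+)" width="([0-9]+)"/i';

if (preg_match($pattern, $line, $m) === 1) {
    $whole  = $m[0]; // everything the pattern matched
    $file   = $m[1]; // first ()
    $height = $m[2]; // second ()
    $width  = $m[3]; // third ()
}
```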
Also a few pointers for hard-to-match stuff:
In case you need to match a line with nothing "special" about it (just plain text), you can't use the .* approach, as it will match every line.
Instead, try to use a variable as a "flag", set from the previous line.
For example, say I have 2 lines of HTML code: the first one contains "name:" and the second one is just "any text goes here".
Let's assume you want to match only the "any text goes here". Write a pattern to match "name:", and inside its IF set the flag variable.
Then, above it, add another IF inside the foreach loop that checks whether the flag is already set. If it is, the current line is the one you want.
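The flag approach could look roughly like this. The HTML lines and the variable names $flag and $wanted are made up for illustration, so treat it as a sketch and not as the original listing.

```php
<?php
// Made-up lines, shaped like what file() returns for the two-line case.
$lines = [
    "<td>some other row</td>\n",
    "<td>name:</td>\n",
    "<td>any text goes here</td>\n",
    "<td>more rows</td>\n",
];

$flag   = false; // becomes true once a line containing "name:" was seen
$wanted = null;

foreach ($lines as $line) {
    // This check comes first, so it only fires on the line AFTER "name:".
    if ($flag) {
        $wanted = trim(strip_tags($line));
        break;
    }
    if (preg_match('/name:/i', $line) === 1) {
        $flag = true;
    }
}
```

The order of the two IFs matters: if the flag check came after the "name:" check, it would fire on the "name:" line itself.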
That's it. I hope this gave you the basics & a way to approach any task like this. When you look at it like that, there isn't any website that is "too big" to handle. It's all a matter of practice & close examination of the HTML code output.
Another tip, once passed on to me by ShavedApe (who is a great coder), is to use an app like "regexbuddy". Though it's not freeware, it's a very handy tool that highlights what your regexp matches in your text, making your life much easier when a more complex regexp is required.
Hope you find this useful.
Greets.