php search for string, then find another [duplicate] - php

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How to parse and process HTML with PHP?
I am brand new to PHP, only a couple of hours in, and I'm trying to understand searching and finding. Let's say I want to extract the rank of Diablo 3 from Amazon's top seller list. There I can search for the string "Diablo III" or similar to find the following block (sorry about the formatting):
http://www.amazon.com/Diablo-III-Standard-Edition-Pc/dp/B00178630A/ref=zg_bs_4924894011_1
"><img src="http://ecx.images-amazon.com/images/I/41kXCp%2BUyeL._SL160_SL160_.jpg" alt="Diablo III: Standard Edition" title="Diablo III: Standard Edition" onload="if (typeof uet == 'function') { uet('af'); }"/></a></div></div><div class="zg_itemRightDiv_normal"><div class="zg_rankLine"><span class="zg_rankNumber">1.</span><span class="zg_rankMeta"></span></div><div class="zg_title"><a href="
http://www.amazon.com/Diablo-III-Standard-Edition-Pc/dp/B00178630A/ref=zg_bs_4924894011_1
">Diablo III: Standard Edition</a></div><div class="zg_byline">by Blizzard Entertainment
Now, I want to extract the rank, which is defined in this part, <span class="zg_rankNumber">1.</span>, and is currently 1.
Could someone please advise on the best way of extracting that number, so that if it falls to second, third or whatever place (up to 20) I will still be able to extract it?
I have looked a bit into preg_match and regex, but I couldn't quite understand how to use them.

You can start by using the Simple HTML DOM parser.
So, if you want to find this:
<span class="zg_rankNumber">
you can do it like this ($str contains the HTML data):
$html = str_get_html($str);
echo $html->find("span[class='zg_rankNumber']",0)->innertext;
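EDITED:
If you want to get the rank of a specific game (Diablo III), then based on the formatting in the question, you first find its image and then walk up to the enclosing image block, whose next sibling holds the rank (an img element has no children, so the span cannot be searched for inside it directly):
echo $html->find("img[title^='Diablo III']",0)->parent()->parent()->parent()->next_sibling()->find("span[class='zg_rankNumber']",0)->innertext;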
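If you would rather not depend on a third-party library, the same lookup can be sketched with PHP's built-in DOMDocument and DOMXPath. This is only a sketch under assumptions: the function name extractRank is made up for illustration, and the XPath relies on the markup shown in the question (a div.zg_rankLine sitting next to the div.zg_title of the same item), which Amazon may change at any time.

```php
<?php
// Hypothetical helper: returns the rank of the first item whose title starts
// with $titlePrefix, or null when no such item is found.
function extractRank(string $html, string $titlePrefix): ?int
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query("//div[@class='zg_title']/a") as $link) {
        // Only consider titles that begin with the wanted prefix
        if (strpos(trim($link->textContent), $titlePrefix) !== 0) {
            continue;
        }
        // The rank line is a preceding sibling of the title div
        $rank = $xpath->query(
            "../preceding-sibling::div[@class='zg_rankLine']/span[@class='zg_rankNumber']",
            $link
        )->item(0);

        // "1." becomes 1 once the trailing dot is removed
        return $rank ? (int) rtrim(trim($rank->textContent), '.') : null;
    }
    return null;
}
```

With the page source in $str, extractRank($str, 'Diablo III') would give the current rank as an integer, or null if the game has dropped off the list.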

preg_match_all('/<span class="zg_rankNumber">(.*?)<\/span>/is', $string, $matches);
print_r($matches);
It would take a couple of hours to write the exact code, but I can tell you the logic:
Extract all "" from the HTML and store them in an array.
Loop through the array and check for the title.
If you find the title, extract the rank from that array element.
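The logic above can be sketched with a single preg_match_all call that captures each rank together with the title that follows it. This is a rough illustration, not the exact code: the function name ranksByTitle is hypothetical, and the pattern assumes the markup from the question (the rank span comes before the zg_title link of the same item), so it will break if the page layout changes.

```php
<?php
// Hypothetical helper: maps each product title to its rank.
function ranksByTitle(string $html): array
{
    // Group 1: the rank digits, group 2: the link text inside div.zg_title.
    // The "s" modifier lets ".*?" run across line breaks.
    $pattern = '~<span class="zg_rankNumber">(\d+)\.</span>.*?'
             . '<div class="zg_title"><a href="[^"]*">([^<]+)</a>~s';

    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $ranks = [];
    foreach ($matches as $match) {
        $ranks[trim($match[2])] = (int) $match[1];
    }
    return $ranks;
}
```

Looking up a game is then a matter of looping over the returned array and checking whether a key starts with "Diablo III".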

Related

How to get data from other website using php?

This is my first question on this site, so sorry if it is not clear enough.
My problem is that I would like to get all of the product IDs from a webshop that has no API.
A product ID looks like: xy-000000
I know that I need a web scraper, but I don't know how to find a specific string like xy- 000000 with it. I tried many web scrapers, but the only things I could find with them were HTML elements like the title or keywords.
I searched a lot on Google and found some web scrapers, but they are not working well for me.
As I mentioned, I would like to get all of the product IDs from a different webshop using PHP, to find products that I am not selling. (My webshop has the same product IDs as the other.)
Can anyone please help me find a PHP script that is similar to what I need?
This is the code that I am trying to use:
<?php
$data = file_get_contents('https://www.mesemix.hu/hu/superman-ruhanemuk/11292-szuperhosoek-mintas-zokni.html');
error_reporting(0);
preg_match('/<title>([^<]+)<\/title>/i', $data, $matches);
$title = $matches[1];
preg_match('/[0-9]{6}/', $data, $matches);
$number = $matches[1];
preg_match('/<img[^>]*src=[\'"]([^\'"]+)[\'"][^>]*>/i', $data, $matches);
$img = $matches[1];
echo $title."<br>\n";
echo $img."<br>\n";
echo $number;
echo $data;
?>
The problem is that I cannot find the 6-digit number with it ($number).
In the webshop's source code it looks like this:
var productReference = 'SP- 418070';
If there is anything wrong with my question, please let me know.
The term you are looking for is "web scraper".
You can do it in a couple of different ways, for example with one of these two PHP libraries:
http://simplehtmldom.sourceforge.net/
or
https://github.com/FriendsOfPHP/Goutte
Both are very simple to use, and both are documented.
They work much like jQuery (JavaScript): you target the data that you need with CSS selectors.
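As for the snippet in the question itself, the 6-digit number is probably missing because the pattern /[0-9]{6}/ has no capture group, so $matches[1] is never set (the whole match lands in $matches[0]), and error_reporting(0) hides the resulting notice. Below is a minimal sketch of one way around that, assuming the page always contains a line like var productReference = 'SP- 418070'; as shown in the question. The function name extractProductNumber is made up for illustration.

```php
<?php
// Hypothetical helper: pulls the numeric part of the product reference out of
// the page source, or returns null when the variable is not present.
function extractProductNumber(string $pageSource): ?string
{
    // Letters, a dash, optional whitespace, then exactly six digits (captured)
    $pattern = '/productReference\s*=\s*\'[A-Za-z]+-\s*([0-9]{6})\'/';

    if (preg_match($pattern, $pageSource, $matches) === 1) {
        return $matches[1];
    }
    return null;
}
```

Anchoring the pattern to productReference also avoids picking up some unrelated 6-digit number that happens to appear earlier in the page.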

Ordering and Selecting frequently used tags

I have looked on Stack Overflow for a solution, but couldn't find a good answer that addressed the issue I'm having. Essentially, what I'm trying to achieve is to list the 15 most frequently used tags across all my users' subjects.
This is how I currently select the data:
$sql = mysql_query("SELECT subject FROM `users`");
$row = mysql_fetch_array($sql);
I do apologise for the code looking nothing like what I'm trying to achieve; I really don't have any clue where to begin, and came here for a possible solution. The query works fine and I'd be able to list the subjects, but my problem is that the subjects contain ordinary words along with the hashtags. An example room subject would look like: hey my name is example #follow me. How would I grab only the #follow, and once I've grabbed all the hashtags from all of the subjects, echo the 15 most frequent?
Again, I apologise for the code looking nothing like what I'm trying to achieve, and I appreciate anyone's help. This was the closest post I found to solving my issue, but it was not useful.
Example
Here are three room subjects:
`Hello welcome to my room #awesome #wishlist`
`Hey hows everyone doing? #friday #awesome`
`Check out my #wishlist looking #awesome`
This is how I'm trying to display them:
[3] #awesome [2] #wishlist [1] #friday
What you want to achieve here is pretty complex for an SQL query, and you are likely to run into efficiency problems if you parse every subject each time you run this code.
The best solution is probably to have a table that associates tags with users. You can update this table every time a user changes their subject. Getting the number of times each tag is used then becomes trivial with COUNT(*) and GROUP BY tag.
One way would be to parse the result set in PHP. Once you have queried the subject lines from the database, let's say you have them as strings in the array $results; then you can build a frequency distribution of words like this:
$freqDist = [];
foreach ($results as $row)
{
    $words = explode(" ", $row);
    foreach ($words as $w)
    {
        if (array_key_exists($w, $freqDist))
            $freqDist[$w]++;
        else
            $freqDist[$w] = 1;
    }
}
You can then sort in descending order and display the distribution of words like this:
arsort($freqDist);
foreach ($freqDist as $word => $count)
{
    if (strpos($word, '#') !== FALSE)
        echo "$word: $count\n";
    else
        echo "$word: does not contain hashtag, DROPPED\n";
}
You could also use preg_match() to do fancier matching if you want but I've used a naive approach with strpos() to assume that if the word has '#' (anywhere) it's a hashtag.
Other functions of possible use to you:
str_word_count(): Return information about words used in a string.
array_count_values(): Counts all the values of an array.
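Putting these pieces together, here is a minimal sketch that swaps the explode()/strpos() approach for preg_match_all() combined with array_count_values(). The function name topHashtags and the rule that a hashtag is "#" followed by word characters are assumptions made for illustration; tags are lowercased so that #Awesome and #awesome count as the same tag.

```php
<?php
// Hypothetical helper: returns the $limit most frequent hashtags as
// [tag => count], most frequent first.
function topHashtags(array $subjects, int $limit = 15): array
{
    $tags = [];
    foreach ($subjects as $subject) {
        // Collect every "#word" occurrence in this subject
        if (preg_match_all('/#\w+/', $subject, $matches)) {
            foreach ($matches[0] as $tag) {
                $tags[] = strtolower($tag);
            }
        }
    }

    $counts = array_count_values($tags);  // tag => number of occurrences
    arsort($counts);                      // highest count first, keys preserved

    return array_slice($counts, 0, $limit, true);
}
```

For the three example subjects in the question this yields #awesome with 3, #wishlist with 2 and #friday with 1; tags with equal counts may come out in any order.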

Generating an HTML color code for event category in ai1ec

Although this question relates to a particular WordPress plugin, the All-in-One Event Calendar by Time.ly, it can also be treated as a general PHP question.
I am trying to modify a theme to use event colours selected in the ai1ec back end and would like to produce a simple HTML colour code - ie "#f2f2f2"
The plugin itself has loads of functions and php shortcodes to pull a wealth of information off each event such as the one listed below.
<?php echo $event->get_category_text_color(); ?> which will print style="color: #f2a011;"
Can also change to print style="background-color: #f2a011;" with the use of $event->get_category_bg_color();
Now the real meat of the question
All I want to do is get that HTML colour so I can also code it into buttons and other visual elements. I have scoured through blogs and code to try and find something that does it to no avail.
What I was wondering is if you could write a filter of some sort to just take the information within the "#f2f2f2" quotation marks. I know it's not called a filter as searches for php filter return information about something completely different - I'm a self taught PHP programmer and so searching for a lot of terms I don't know can be pretty tough!
As pointed out in the other answer, substr would work, but it wouldn't handle the case where the color code is expressed in this format:
#FFF;
instead of:
#FFFFFF;
Therefore, this regex should do the job quite well:
'/(?<=#).*?(?=;)/'
Example:
$matches = array();
preg_match('/(?<=#).*?(?=;)/', $event->get_category_text_color(), $matches);
$colorCode = "#{$matches[0]};";
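You could use the substr() function like so:
echo substr($event->get_category_text_color(),14,-2);
Which, in the example, would return #f2a011. Note that the offset 14 only fits the style="color: ...;" output; the background-color variant needs a different offset.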
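A slightly more defensive variant is sketched below. It matches the hex code itself rather than relying on character offsets or the trailing semicolon, so it should work for both the color and the background-color output, and for 3-digit as well as 6-digit codes. The function name extractHexColor is an assumption made for this example and is not part of the plugin.

```php
<?php
// Hypothetical helper: returns the first hex colour code (e.g. "#f2a011" or
// "#FFF") found in a style attribute string, or null if there is none.
function extractHexColor(string $styleAttribute): ?string
{
    // Try six hex digits first, then fall back to the three-digit shorthand
    $pattern = '/#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b/';

    if (preg_match($pattern, $styleAttribute, $matches) === 1) {
        return $matches[0];
    }
    return null;
}
```

It could then be used as extractHexColor($event->get_category_bg_color()) wherever the bare colour code is needed.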

How to pull specific content from HTML using PHP? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
How to parse and process HTML with PHP?
How do I go about pulling specific content from a given live online HTML page?
For example: http://www.gumtree.com/p/for-sale/ovation-semi-acoustic-guitar/93991967
I want to retrieve the text description, the path to the main image and the price only. So basically, I want to retrieve content which is inside specific divs with maybe specific IDs or classes inside a html page.
Pseudo code
$page = load_html_contents('http://www.gumtr..');
$price = getPrice($page);
$description = getDescription($page);
$title = getTitle($page);
Please note I do not intend to steal any content from gumtree, or anywhere else for that matter, I am just providing an example.
First of all, what you want to do is called web scraping.
Basically, you load the HTML content into a variable, and then you use regexps to search for specific IDs, etc.
Search for "web scraping".
HERE is a basic tutorial
THIS book should be useful too.
Something like this would be a good starting point if you wanted tabular output:
$raw = file_get_contents($url) or die('could not select');
// Strip whitespace noise and <br/> tags so the string offsets below are predictable
$newlines = array("\t", "\n", "\r", "\x20\x20", "\0", "\x0B", "<br/>");
$content = str_replace($newlines, "", html_entity_decode($raw));
// '<some id> ' and '</ending id>' are placeholders for the markers around the table
$start = strpos($content, '<some id> ');
$end = strpos($content, '</ending id>');
$table = substr($content, $start, $end - $start);
preg_match_all("|<tr(.*)</tr>|U", $table, $rows);
foreach ($rows[0] as $row) {
    // Skip header rows
    if (strpos($row, '<th') === false) {
        // array to vars
        preg_match_all("|<td(.*)</td>|U", $row, $cells);
        $var1 = strip_tags($cells[0][0]);
        $var2 = strip_tags($cells[0][1]);
        // ...and so on for the remaining cells
    }
}
The tutorial Easy web scraping with PHP recommended by robotrobert is a good place to start; I have made several comments on it. For better performance, use curl, which among other things handles HTTP headers, SSL, cookies, proxies, etc. Cookies are something you must pay attention to.
I just found HTML Parsing and Screen Scraping with the Simple HTML DOM Library. It is more advanced, and it simplifies and speeds up page parsing by using a DOM parser instead of regular expressions, which are hard to master and resource-consuming. I recommend this last one 100%.
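To make the DOM-parser route concrete, here is a small sketch using PHP's built-in DOMDocument and DOMXPath rather than the Simple HTML DOM library. Everything specific in it is hypothetical: the ids price, description and main-image are placeholders, not Gumtree's real markup, and the function names are invented for the example. You would replace the XPath expressions with ones that match the page you are actually reading.

```php
<?php
// Returns the trimmed text of the first node matching $query, or null.
function firstText(DOMXPath $xpath, string $query): ?string
{
    $node = $xpath->query($query)->item(0);
    return $node ? trim($node->textContent) : null;
}

// Hypothetical scraper: the ids used below are placeholders for whatever
// the target page really uses.
function scrapeListing(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate sloppy real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);

    return [
        'title'       => firstText($xpath, '//h1'),
        'price'       => firstText($xpath, "//*[@id='price']"),
        'description' => firstText($xpath, "//*[@id='description']"),
        'image'       => firstText($xpath, "//img[@id='main-image']/@src"),
    ];
}
```

The page source can come from file_get_contents() or curl, as in the answers above; scrapeListing() only deals with the parsing step.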

PHP: lookup for string in tags and embed it to another tag [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
A simple program to CRUD node and node values of xml file
I have a string (for example $output) with HTML code containing multiple tags and text. I need to look up all ID tags:
<id>123456</id>
and change them to:
<id>123456</id><url>www.webpage.com/somepage.php?id=123456</url>
As you can see, a new tag is created and added after the ID tag.
The webpage URL is stored in the string $url_adr. The ID number is different in each ID tag, and there are multiple ID tags in the string, so they need to be changed one by one.
I need this code in PHP. Thanks a lot.
In simple cases like this, you can do it with a regular expression replace:
$output = '<id>123456</id>';
$url_adr = 'www.webpage.com/somepage.php?id=';
// preg_replace() returns the modified string, so assign the result back
$output = preg_replace('~<id>(.*?)</id>~i', '<id>$1</id><url>'.$url_adr.'$1</url>', $output);
But if you need anything more complicated or full featured, you'd better take a look at proper XML parsers.
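For the parser route, a minimal sketch with PHP's built-in DOMDocument might look like the following. It assumes the input is well-formed XML with a single root element, which may not hold for the string in the question; the function name addUrlTags is hypothetical.

```php
<?php
// Hypothetical helper: inserts a <url> element after every <id> element.
// Assumes $xml is well-formed and has exactly one root element.
function addUrlTags(string $xml, string $urlPrefix): string
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);

    // Copy the live node list into an array before modifying the tree
    $ids = iterator_to_array($doc->getElementsByTagName('id'));

    foreach ($ids as $id) {
        $url = $doc->createElement('url');
        $url->appendChild($doc->createTextNode($urlPrefix . $id->textContent));
        // With a null reference node, insertBefore() appends at the end
        $id->parentNode->insertBefore($url, $id->nextSibling);
    }

    // Serialise the root element only, without the XML declaration
    return $doc->saveXML($doc->documentElement);
}
```

Unlike the regex version, this keeps working when the id elements carry attributes or when the document contains characters that need escaping in the generated url text.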
