I have a problem loading a specific div element and showing it on my page using PHP. My code right now is as follows:
<?php
$page = file_get_contents("http://www.bbc.co.uk/sport/football/results");
preg_match('/<div id="results-data" class="fixtures-table full-table-medium">(.*)<\/div>/is', $page, $matches);
var_dump($matches);
?>
I want it to load id="results-data" and show it on my page.
You won't be able to manipulate the URL to get only a portion of the page, so you'll want to grab the page contents via the server-side language of your choice and then parse the HTML. From there you can pull out the specific DIV you are looking for, strip out any unwanted content, and print the result to your screen.
With PHP you could use file_get_contents() to read the file you want to parse and then use DOMDocument to parse it and grab the DIV you want.
Here's the basic idea. This is untested but should point you in the right direction:
$page = file_get_contents('http://www.bbc.co.uk/sport/football/results');
$doc = new DOMDocument();
$doc->loadHTML($page);
$divs = $doc->getElementsByTagName('div');
foreach ($divs as $div) {
    // Loop through the DIVs looking for one with an id of "results-data"
    // Then echo out its contents (pardon the pun)
    if ($div->getAttribute('id') === 'results-data') {
        echo $div->nodeValue;
    }
}
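Note that nodeValue gives you only the text inside the div, not its markup. If you want the div's full HTML, a minimal sketch (untested, and assuming PHP 5.3.6+ where saveHTML() accepts a node argument) would be:
$page = file_get_contents('http://www.bbc.co.uk/sport/football/results');
$doc = new DOMDocument();
@$doc->loadHTML($page); // suppress warnings from imperfect real-world markup
foreach ($doc->getElementsByTagName('div') as $div) {
    if ($div->getAttribute('id') === 'results-data') {
        echo $doc->saveHTML($div); // prints the div with its markup intact
    }
}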
You should use an HTML parser. Take a look at phpQuery; here is how you can do it:
require_once('phpQuery/phpQuery.php');
$html = file_get_contents('http://www.bbc.co.uk/sport/football/results');
phpQuery::newDocumentHTML($html);
$resultData = pq('div#results-data');
echo $resultData;
Check it out here:
http://code.google.com/p/phpquery
Also see their selectors' documentation.
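As a follow-up sketch (untested): phpQuery mirrors much of the jQuery API, so once the document is loaded you can also pull out just the text or the inner HTML of the match:
echo pq('div#results-data')->text(); // plain text content only
echo pq('div#results-data')->html(); // inner HTML of the div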
Currently I am working on a project which requires me to parse some data from another website, and I'm having some issues (note: I am very new to PHP coding).
Here's the code I am using below + the content it returns.
$dl = $html2->find('ol.tracklist', 0);
print $dl->outertext;
The above code returns the data we're trying to get, though the output is extremely messy.
However, when I put this in a foreach, it only returns one of the a href elements at a time.
foreach ($html2->find('ol.tracklist') as $li) {
    $title = $li->find('a', 0);
    print $title;
}
What can I do so that it returns all of the a href elements from the example code above?
NOTE: I am using simple_html_dom.php for this.
Based on the markup, just point directly to the list items, then grab each one's anchor:
foreach ($html2->find('ol.tracklist li') as $li) {
    $anchor = $li->find('ul li a', 0);
    echo $anchor->href; // and other attributes
}
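If you need every link rather than just the first one per list item, a small variation (untested sketch) is to loop over everything find() returns when called without an index:
foreach ($html2->find('ol.tracklist li') as $li) {
    foreach ($li->find('a') as $anchor) { // all anchors in this item, not just index 0
        echo $anchor->href . '<br>';
    }
}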
I have a problem with my website. I want to get all the scheduled flight data from another website. I looked at its source code and found the URL that provides the data. Can somebody tell me how to get the data from that URL and then display it on our website with PHP?
You can do it using the file_get_contents() function, which returns the HTML of the provided URL. Then use an HTML parser to get the required data.
$html = file_get_contents("http://website.com");
$dom = new DOMDocument();
$dom->loadHTML($html);
$nodes = $dom->getElementsByTagName('h3');
foreach ($nodes as $node) {
    echo $node->nodeValue."<br>"; // print the text content of each <h3> tag
}
Another way is to extract the data using preg_match_all():
$html = file_get_contents($_REQUEST['url']);
preg_match_all('/<div class="swrapper">(.*?)<\/div>/s', $html, $matches);
// specify the class to get the data of that class
foreach ($matches[1] as $node) {
    echo $node."<br><br><br>";
}
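One caveat worth knowing: a regex like this cannot handle nesting, because the non-greedy match stops at the first closing tag it finds. An illustrative snippet:
$html = '<div class="swrapper">outer <div>inner</div> tail</div>';
preg_match_all('/<div class="swrapper">(.*?)<\/div>/s', $html, $matches);
echo $matches[1][0]; // prints "outer <div>inner" -- the " tail" part is cut off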
Use file_get_contents().
Sample code:
<?php
$homepage = file_get_contents('http://www.google.com/');
echo $homepage;
?>
Yes, sure. Use the file_get_contents($url) function to get the source code of the target page, or use cURL if you prefer, and scrape all the data you need with the preg_match_all() function.
Note: if the target URL uses https://, file_get_contents() only works when the openssl extension is enabled; otherwise use cURL to get the source code.
Example
http://stackoverflow.com/questions/2838253/php-curl-preg-match-extract-text-from-xhtml
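For reference, a minimal cURL fetch might look like this sketch (the URL is a placeholder; requires the curl extension):
$ch = curl_init('https://example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow any redirects
$html = curl_exec($ch);
curl_close($ch);
// $html now holds the page source, ready for preg_match_all() or a DOM parser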
I've recently been playing with DOMXPath in PHP and have had success with it. Trying to get more experience, I've been practicing grabbing certain elements from different sites. I am having trouble getting the weather marker off of http://www.theweathernetwork.com/weather/cape0005.
Specifically I want
//*[@id='theTemperature']
Here is what I have
$url = file_get_contents('http://www.theweathernetwork.com/weather/cape0005');
$dom = new DOMDocument();
@$dom->loadHTML($url);
$xpath = new DOMXPath($dom);
$tags = $xpath->query("//*[@id='theTemperature']");
foreach ($tags as $tag) {
    echo $tag->nodeValue;
}
Is there something I am doing wrong here? I am able to produce actual results on other tags on the page but specifically not this one.
Thanks in advance.
You might want to improve your DOMDocument debugging skills; here are some hints (demo):
<?php
header('Content-Type: text/plain;');
$url = file_get_contents('http://www.theweathernetwork.com/weather/cape0005');
$dom = new DOMDocument();
@$dom->loadHTML($url);
$xpath = new DOMXPath($dom);
$tags = $xpath->query("//*[@id='theTemperature']");
foreach ($tags as $i => $tag) {
    echo $i, ': ', var_dump($tag->nodeValue), ' HTML: ', $dom->saveHTML($tag), "\n";
}
Output the number of the found node; I do it here with $i in the foreach.
var_dump() the ->nodeValue; it helps to show what exactly it is.
Output the HTML by making use of the saveHTML() function, which gives a better picture.
The actual output:
0: string(0) ""
HTML: <p id="theTemperature"></p>
You can easily spot that the element is empty, so the temperature must go in from somewhere else, e.g. via javascript. Check the Network tools of your browser.
What happens is straightforward: the page contains an empty id="theTemperature" element, which is a placeholder to be populated with JavaScript. file_get_contents() will just download the page without executing any JavaScript, so the element remains empty. Try loading the page in a browser with JavaScript disabled to see it yourself.
The element you're trying to select is indeed empty. The page loads the temperature into that id through ajax. Specifically this script:
http://www.theweathernetwork.com/common/js/master/citypage_ajax.js?cb=201301231338
But when you do a file_get_contents(), those scripts obviously don't get executed. I'd go with guido's solution of using the RSS feed.
I load an HTML page with PHP's DOMDocument:
$doc = new DOMDocument();
@$doc->loadHTMLFile($url);
I search my page for all "a" elements, and if they match my condition I need to replace, for example, <a href="...">My link is beautiful</a> with just the text My link is beautiful.
Here is my loop:
$liens = $div->getElementsByTagName('a');
foreach ($liens as $lien) {
    if ($lien->hasAttribute('href')) {
        if (preg_match("/metz2/i", $lien->getAttribute('href'))) {
            // HERE I NEED TO REPLACE THE <a> ELEMENT WITH JUST ITS TEXT
        }
        $cpt++;
    }
}
Do you have any ideas? Suggestions? Thanks :)
Every time I need to manipulate the DOM with PHP, I use a library called PHP Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net/).
It's very easy to use, something like this might work for you:
// Create DOM from URL or file
$html = file_get_html('http://www.page.com/');
// Find all links
foreach ($html->find('a') as $element) {
    // Do your custom logic here if you need it; this extracts the inner contents
    // of the a-tag and puts them in place of the whole element
    $inner = $element->innertext;
    $element->outertext = $inner; // outertext is a property, so assign to it
}
//To echo modified html again:
echo $html;
Could be done with preg_replace as well:
$sText = '<a href="http://stackoverflow.com/">Stackoverflow</a>';
$sText = preg_replace('/<a[^>]*>(.*?)<\/a>/', '$1', $sText);
echo $sText; // prints just "Stackoverflow"
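For completeness, here is an untested sketch of doing the replacement with DOMDocument itself, staying with the question's original loop (assuming $doc is the DOMDocument loaded earlier):
$liens = $doc->getElementsByTagName('a');
// Copy the live DOMNodeList first: replacing nodes while iterating it skips elements
foreach (iterator_to_array($liens) as $lien) {
    if ($lien->hasAttribute('href') && preg_match('/metz2/i', $lien->getAttribute('href'))) {
        // Swap the <a> element for a plain text node holding its link text
        $text = $doc->createTextNode($lien->nodeValue);
        $lien->parentNode->replaceChild($text, $lien);
    }
}
echo $doc->saveHTML();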
I'm using a crawler to retrieve the HTML content of certain pages on the web. I currently have the entire HTML stored in a single PHP variable:
$string = "<PRE>".htmlspecialchars($crawler->results)."</PRE>\n";
What I want to do is select all "p" tags (for example) and store their contents in an array. What is the proper way to do that?
I've tried the following, using XPath, but it doesn't show anything (most probably because the document itself isn't XML; I just copy-pasted the example given in the documentation).
$xml = new SimpleXMLElement($string);
$result = $xml->xpath('/p');
foreach ($result as $node) {
    echo '/p: ', $node, "\n";
}
Hopefully someone with (a lot) more experience in PHP will be able to help me out :D
Try using DOMDocument along with DOMDocument::getElementsByTagName(). The workflow should be quite simple. Something like:
$doc = new DOMDocument();
$doc->loadHTML($crawler->results); // parse the raw HTML, not the htmlspecialchars()-escaped copy
$pNodes = $doc->getElementsByTagName('p');
Which will return a DOMNodeList.
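To cover the "store their contents in an array" part of the question, a short follow-up sketch:
$paragraphs = array();
foreach ($pNodes as $pNode) {
    $paragraphs[] = $pNode->nodeValue; // text content of each <p>
}
print_r($paragraphs);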
I vote for using a regexp. For the p tag:
preg_match_all('/<p>(.*?)<\/p>/', '<p>foo</p><p>foo 1</p><p>foo 2</p>', $arr, PREG_PATTERN_ORDER);
foreach ($arr[1] as $value) {
    echo $value."<br>";
}
Check out Simple HTML DOM. It will grab external pages and process them with fairly accurate detail.
http://simplehtmldom.sourceforge.net/
It can be used like this:
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Find all images
foreach($html->find('img') as $element)
echo $element->src . '<br>';