I have a problem with my website. I want to get all the flight schedule data from another website. I looked at its source code and found the URL that serves the data. Can somebody tell me how to get the data from that URL and then display it on our website with PHP?
You can do it using the file_get_contents() function, which returns the HTML of the given URL. Then use an HTML parser to extract the data you need.
$html = file_get_contents("http://website.com");
$dom = new DOMDocument();
$dom->loadHTML($html);
$nodes = $dom->getElementsByTagName('h3');
foreach ($nodes as $node) {
echo $node->nodeValue."<br>"; // return <h3> tag data
}
Another way to extract the data is preg_match_all():
$html = file_get_contents($_REQUEST['url']);
preg_match_all('/<div class="swrapper">(.*?)<\/div>/s', $html, $matches);
// specify the class to get the data of that class
foreach ($matches[1] as $node) {
echo $node."<br><br><br>";
}
Use file_get_contents
Sample code
<?php
$homepage = file_get_contents('http://www.google.com/');
echo $homepage;
?>
Yes, sure. Use file_get_contents($url) to get the source code of the target page (or cURL, if you prefer), then scrape the data you need with preg_match_all().
Note: file_get_contents() can fetch https:// URLs only when PHP's openssl extension is enabled; if it is not, use cURL to get the source code.
Example
http://stackoverflow.com/questions/2838253/php-curl-preg-match-extract-text-from-xhtml
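The cURL route mentioned above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: fetchHtml() is a hypothetical helper name, and the timeout and user-agent string are placeholder assumptions.

```php
<?php
// Hypothetical helper: fetch a page body with cURL, or return null on failure.
function fetchHtml(string $url, int $timeoutSeconds = 10): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,             // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,             // follow HTTP redirects
        CURLOPT_TIMEOUT        => $timeoutSeconds,  // give up after this many seconds
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)', // placeholder value
    ]);

    $body   = curl_exec($ch);                           // false on transport errors
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);    // e.g. 200, 404

    if ($body === false || $status >= 400) {
        return null;
    }
    return $body;
}
```

It would be called as $html = fetchHtml('http://website.com'); and the result, when not null, can be passed to DOMDocument::loadHTML() or preg_match_all() as in the answers above.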
Related
I'm looking for a solution to extract only one URL from a specific webpage using PHP.
Here's a simple example of what I need:
I have a URL with many links (https://apkpure.com/mi-home/com.xiaomi.smarthome/download?from=details)
I want to scrape the link under the anchor click here from the current page.
Then the code must return this result https://download.apkpure.com/b/XAPK/Y29tLnhpYW9taS5zbWFydGhvbWVfNjMwNjdfYWU1M2FmOWU?_fn=TWkgSG9tZV92NS44LjdfYXBrcHVyZS5jb20ueGFwaw&as=4c5e64f6f957edac834f3631fe4e09715f2e35f6&ai=-1070628217&at=1596863870&_sa=ai%2Cat&k=24cb20f95fbf333deb01c145ce7b982b5f30d87e&_p=Y29tLnhpYW9taS5zbWFydGhvbWU&c=1%7CLIFESTYLE%7CZGV2PVhpYW9taSUyMEluYy4mdD14YXBrJnM9MTI5OTAzMTM4JnZuPTUuOC43JnZjPTYzMDY3.
I tried this:
$sourceURL="https://apkpure.com/mi-home/com.xiaomi.smarthome/download?from=details";
$htmlSource=htmlentities(file_get_contents($sourceURL));
echo strip_tags($htmlSource, "<a>");
I get the result with all links including the one I need
I need your help to extract the href value of the link I want.
Thanks in advance.
If you look at the required URL, you can see that every "Click Here" URL starts with https://download.apkpure.com, so we can use a regex to find it.
preg_match_all() fills an array with every string that matches the regex. I then used implode() to join the full matches ($match[0]) into a single string.
Here is the complete working code:
$sourceURL="https://apkpure.com/mi-home/com.xiaomi.smarthome/download?from=details";
$content=file_get_contents($sourceURL);
$content = strip_tags($content,"<a>");
preg_match_all('#\bhttps?://download\.apkpure\.com[^,\s()<>]+(?:\([\w\d]+\)|([^,[:punct:]\s]|/))#', $content, $match);
echo implode(', ', $match[0]);
The most elegant way is to use a DOM parser:
Iterate through the anchors.
Check whether the anchor's id is 'download_link' (the id of the 'click here' link).
Extract the href attribute value.
$html = file_get_contents('https://apkpure.com/mi-home/com.xiaomi.smarthome/download?from=details');
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$href = '';
foreach($doc->getElementsByTagName('a') as $item) {
if($item->getAttribute('id') == 'download_link') {
$href = $item->getAttribute('href');
break;
}
}
echo $href;
https://download.apkpure.com/b/XAPK/Y29tLnhpYW9taS5zbWFydGhvbWVfNjMwNjdfYWU1M2FmOWU?_fn=TWkgSG9tZV92NS44LjdfYXBrcHVyZS5jb20ueGFwaw&as=6a7de2cb660007a32e4b3d61a0d3c41e5f2e7102&ai=1946881098&at=1596878986&_sa=ai%2Cat&k=9e912b1007d50d2be9af8e78bcdea86c5f31138a&_p=Y29tLnhpYW9taS5zbWFydGhvbWU&c=1%7CLIFESTYLE%7CZGV2PVhpYW9taSUyMEluYy4mdD14YXBrJnM9MTI5OTAzMTM4JnZuPTUuOC43JnZjPTYzMDY3
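The same lookup can also be written as a single XPath query instead of a loop. The sketch below is only illustrative: findHrefById() is a hypothetical helper, and the sample markup and download.example.com URL are made up so the snippet runs without network access.

```php
<?php
// Hypothetical helper: return the href of the first <a> with the given id, or null.
function findHrefById(string $html, string $id): ?string
{
    $previous = libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // The id is placed into the expression as-is, so it must not contain quotes.
    $nodes = $xpath->query("//a[@id='" . $id . "']/@href");

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0)->nodeValue;
}

// Made-up markup standing in for the downloaded page.
$sample = '<html><body><p><a href="/other">other</a> '
        . '<a id="download_link" href="https://download.example.com/file.apk">click here</a></p></body></html>';

echo findHrefById($sample, 'download_link'), "\n";
```

For the real page, the string returned by file_get_contents() would be passed in place of $sample.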
I've recently been playing with DOMXPath in PHP and had success with it. To get more experience, I've been grabbing certain elements from different sites. I am having trouble getting the weather marker from this website: http://www.theweathernetwork.com/weather/cape0005
Specifically I want
//*[@id='theTemperature']
Here is what I have
$url = file_get_contents('http://www.theweathernetwork.com/weather/cape0005');
$dom = new DOMDocument();
@$dom->loadHTML($url);
$xpath = new DOMXPath($dom);
$tags = $xpath->query("//*[@id='theTemperature']");
foreach ($tags as $tag){
echo $tag->nodeValue;
}
Is there something I am doing wrong here? I am able to produce actual results on other tags on the page but specifically not this one.
Thanks in advance.
You might want to improve your DOMDocument debugging skills. Here are some hints:
<?php
header('Content-Type: text/plain;');
$url = file_get_contents('http://www.theweathernetwork.com/weather/cape0005');
$dom = new DOMDocument();
@$dom->loadHTML($url);
$xpath = new DOMXPath($dom);
$tags = $xpath->query("//*[@id='theTemperature']");
foreach ($tags as $i => $tag){
echo $i, ': ', var_dump($tag->nodeValue), ' HTML: ', $dom->saveHTML($tag), "\n";
}
Output the number of the found node, I do it here with $i in the foreach.
var_dump the ->nodeValue, it helps to show what exactly it is.
Output the HTML by making use of the saveHTML function which shows a better picture.
The actual output:
0: string(0) ""
HTML: <p id="theTemperature"></p>
You can easily spot that the element is empty, so the temperature must be filled in from somewhere else, e.g. via JavaScript. Check the Network tools of your browser.
What happens is straightforward: the page contains an empty id="theTemperature" element, which is a placeholder to be populated by JavaScript. file_get_contents() just downloads the page without executing any JavaScript, so the element remains empty. Try loading the page in a browser with JavaScript disabled to see it for yourself.
The element you're trying to select is indeed empty. The page loads the temperature into that id through ajax. Specifically this script:
http://www.theweathernetwork.com/common/js/master/citypage_ajax.js?cb=201301231338
but when you do a file_get_contents() those scripts obviously don't get executed. I'd go with guido's solution of using the RSS.
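To tell a wrong selector apart from a placeholder that is filled in later by JavaScript, it can help to check whether the node exists and whether it has any text. A rough sketch, where describeElementById() is a hypothetical helper and the sample markup mimics the empty element shown in the output above:

```php
<?php
// Hypothetical helper: reports 'missing', 'empty', or 'filled' for the element with the given id.
function describeElementById(string $html, string $id): string
{
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // The id is placed into the expression as-is, so it must not contain quotes.
    $nodes = $xpath->query("//*[@id='" . $id . "']");
    if ($nodes === false || $nodes->length === 0) {
        return 'missing'; // the selector matched nothing
    }
    // The node exists; check whether the static HTML gave it any text.
    return trim($nodes->item(0)->textContent) === '' ? 'empty' : 'filled';
}

// Static HTML as a server-side download would see it: the placeholder has no content yet.
$static = '<html><body><p id="theTemperature"></p></body></html>';
echo describeElementById($static, 'theTemperature'), "\n";
```

An 'empty' result for an element that shows a value in the browser suggests the value is loaded by a script, so the data has to come from the script's own source (or a feed) rather than from the page HTML.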
I have a problem loading a specific div element and showing it on my page using PHP. My code right now is as follows:
<?php
$page = file_get_contents("http://www.bbc.co.uk/sport/football/results");
preg_match('/<div id="results-data" class="fixtures-table full-table-medium">(.*)<\/div>/is', $page, $matches);
var_dump($matches);
?>
I want it to load id="results-data" and show it on my page.
You won't be able to manipulate the URL to get only a portion of the page. So what you'll want to do is grab the page contents via the server-side language of your choice and then parse the HTML. From there you can grab the specific DIV you are looking for and then print that out to your screen. You could also use to remove unwanted content.
With PHP you could use file_get_contents() to read the file you want to parse and then use DOMDocument to parse it and grab the DIV you want.
Here's the basic idea. This is untested but should point you in the right direction:
$page = file_get_contents('http://www.bbc.co.uk/sport/football/results');
$doc = new DOMDocument();
$doc->loadHTML($page);
$divs = $doc->getElementsByTagName('div');
foreach($divs as $div) {
// Loop through the DIVs looking for one with an id of "content"
// Then echo out its contents (pardon the pun)
if ($div->getAttribute('id') === 'content') {
echo $div->nodeValue;
}
}
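For the id from the question, a variant that keeps the div's markup rather than only its text might look like the sketch below. It is an assumption-laden illustration: extractOuterHtmlById() is a hypothetical helper, and the sample markup is invented so the snippet runs offline.

```php
<?php
// Hypothetical helper: return the markup of the element with the given id, or null if absent.
function extractOuterHtmlById(string $html, string $id): ?string
{
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // The id is placed into the expression as-is, so it must not contain quotes.
    $nodes = $xpath->query("//*[@id='" . $id . "']");
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    // saveHTML() with a node argument serializes that node and its children only.
    return $doc->saveHTML($nodes->item(0));
}

// Invented markup standing in for the downloaded results page.
$sample = '<html><body><div id="header">nav</div>'
        . '<div id="results-data" class="fixtures-table"><span>Team A 1-0 Team B</span></div>'
        . '<div id="footer">footer</div></body></html>';

echo extractOuterHtmlById($sample, 'results-data'), "\n";
```

Unlike nodeValue, which flattens everything to text, this keeps the tags inside the div so the table layout survives when it is echoed into another page.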
You should use some html parser. Take a look at PHPQuery, here is how you can do it:
require_once('phpQuery/phpQuery.php');
$html = file_get_contents('http://www.bbc.co.uk/sport/football/results');
phpQuery::newDocumentHTML($html);
$resultData = pq('div#results-data');
echo $resultData;
Check it out here:
http://code.google.com/p/phpquery
Also see their selectors' documentation.
I'm using a crawler to retrieve the HTML content of certain pages on the web. I currently have the entire HTML stored in a single PHP variable:
$string = "<PRE>".htmlspecialchars($crawler->results)."</PRE>\n";
What I want to do is select all "p" tags (for example) and store their contents in an array. What is the proper way to do that?
I've tried the following, by using xpath, but it doesn't show anything (most probably because the document itself isn't an XML, I just copy-pasted the example given in its documentation).
$xml = new SimpleXMLElement ($string);
$result=$xml->xpath('/p');
while(list( , $node)=each($result)){
echo '/p: ' , $node, "\n";
}
Hopefully someone with (a lot) more experience in PHP will be able to help me out :D
Try using DOMDocument along with DOMDocument::getElementsByTagName. The workflow should be quite simple. Something like:
// loadHTML() is an instance method, and it needs the raw HTML (not the htmlspecialchars() output)
$doc = new DOMDocument();
$doc->loadHTML($crawler->results);
$pNodes = $doc->getElementsByTagName('p');
Which will return a DOMNodeList.
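To end up with a plain array, as the question asks, the DOMNodeList can be looped over and each node's text collected. A minimal sketch, assuming the raw HTML is available as a string; paragraphTexts() and the sample markup are hypothetical:

```php
<?php
// Hypothetical helper: return the trimmed text of every <p> element as an array of strings.
function paragraphTexts(string $html): array
{
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $texts = [];
    foreach ($doc->getElementsByTagName('p') as $p) {
        // textContent includes the text of nested tags such as <b>
        $texts[] = trim($p->textContent);
    }
    return $texts;
}

// Invented markup standing in for the crawler result.
$sample = '<html><body><p>foo</p><div><p>foo <b>1</b></p></div><p> foo 2 </p></body></html>';
print_r(paragraphTexts($sample));
```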
I vote for using a regexp. For the p tag:
preg_match_all('/<p>(.*?)<\/p>/s', '<p>foo</p><p>foo 1</p><p>foo 2</p>', $arr, PREG_PATTERN_ORDER);
// $arr[1] holds the text captured between each <p>...</p> pair
foreach ($arr[1] as $value)
{
    echo $value."<br>";
}
Check out Simple HTML Dom. It will grab external pages and process them with fairly accurate detail.
http://simplehtmldom.sourceforge.net/
It can be used like this:
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Find all images
foreach($html->find('img') as $element)
echo $element->src . '<br>';
I am trying to read all the links within a given URL.
Here is the code I am using:
$dom = new DomDocument();
@$dom->loadHTMLFile($url);
$urls = $dom->getElementsByTagName('a');
foreach ($urls as $url) {
echo $url->innertext ." => ".$url->getAttribute('href');
}
The script prints all the links of the given URL.
The problem is that I am not able to get image links (an image inside an anchor tag).
First I tried
$url->nodeValue
but it only returned the anchor text of links that contain text.
I want to read both image links and text links.
I want the output in the format below.
Input :
first link
<img src="imageone.jpg">
Current Output:
first link => link1.php
=>link2.php with warning (Undefined property: DOMElement::$innertext )
Required Output :
first link => link1.php
<img src="imageone.jpg">=>link2.php
innerText doesn't exist in PHP; it's a non-standard JavaScript extension to the DOM.
I think what you want is effectively an innerHTML property. There isn't a native way of achieving this. You can use the saveXML or, from PHP 5.3.6, saveHTML methods to export the HTML of each of the child nodes:
function innerHTML($node) {
$ret = '';
foreach ($node->childNodes as $child) {
$ret .= $node->ownerDocument->saveHTML($child);
}
return $ret;
}
Note that you'll need to use saveXML before PHP 5.3.6
You could then call it as so:
echo innerHTML($url) ." => ".$url->getAttribute('href');
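Put together, a self-contained version of this approach could look like the sketch below. It is illustrative only: childMarkup() and listLinks() are hypothetical names, the markup is modeled on the input in the question, and it assumes PHP 5.3.6 or later for saveHTML() with a node argument.

```php
<?php
// Hypothetical helper: serialize the children of a node (an "inner HTML" equivalent).
function childMarkup(DOMNode $node): string
{
    $markup = '';
    foreach ($node->childNodes as $child) {
        $markup .= $node->ownerDocument->saveHTML($child);
    }
    return $markup;
}

// Hypothetical helper: return one "content => href" line per anchor in the HTML.
function listLinks(string $html): array
{
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $links[] = trim(childMarkup($a)) . ' => ' . $a->getAttribute('href');
    }
    return $links;
}

// Markup modeled on the question: one text link and one image link.
$sample = '<html><body><a href="link1.php">first link</a>'
        . '<a href="link2.php"><img src="imageone.jpg"></a></body></html>';

echo implode("\n", listLinks($sample)), "\n";
```

When the result is echoed into an HTML page rather than a plain-text response, the image markup will render as an image; wrap the string in htmlspecialchars() if the tag itself should be visible.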