This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 9 years ago.
I've just started with PHP and I want to scrape a small page, but I can't. I tried preg_match_all, but it doesn't give me the result I want. Basically, I want to scrape only the YouTube video links from here: https://gdata.youtube.com/feeds/api/standardfeeds/most_shared, collect all of them, and use them later.
I tried the following code, which failed:
<?php
$data = file_get_contents('https://gdata.youtube.com/feeds/api/standardfeeds/most_shared');
preg_match_all("/src='(.+?)'>/", $data, $links);
$link_out = $links[0][0];
echo $link_out;
?>
I'm new to PHP, so a little help would be appreciated.
Thanks
As the feed is XML, you can use PHP's SimpleXMLElement to obtain the data.
<?php
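$xml = new SimpleXMLElement(
'https://gdata.youtube.com/feeds/api/standardfeeds/most_shared',
0,    // no extra libxml options (passing null here is deprecated as of PHP 8.1)
true  // the first argument is a URL, not an XML string
);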
foreach($xml->entry as $entry) {
echo $entry->content['src'], PHP_EOL;
}
/*
https://www.youtube.com/v/IjWc43FCYlg?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/Xw1C5T-fH2Y?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/Kq0_dGKx4Os?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/gbcBYs0ljI0?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/78juOpTM3tE?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/OOiZ-5DqwYI?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/zjz614QVyfQ?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/h15m87WsCHQ?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/SXKOTdyOUBg?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/BRAM8MpqIeA?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/5yB3n9fu-rM?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/NAOo9SnzRH8?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/0KtILkzC-1g?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/kWSIFh8ICaA?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/Mi6AhogZCeg?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/kWuIGAZ1x2I?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/lKY5fmDGVLs?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/C94PaCtqOk4?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/V-fL8zopddI?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/UWlzMIl7E48?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/mcw6j-QWGMo?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/-RSDaRttpzk?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/8_RDx4skTp4?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/7YDWdv9kR0M?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/m96tYpEk1Ao?version=3&f=standard&app=youtube_gdata
*/
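To try the same SimpleXML technique without a network call, here is a minimal sketch that parses an inline string instead of the feed URL. The sample entries and video IDs are hypothetical placeholders shaped like the feed's self-closing `<content src='...'/>` elements; only the Atom namespace URI is real.

```php
<?php
// Hypothetical Atom-like sample; the video IDs are made up for illustration.
$atom = <<<XML
<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns='http://www.w3.org/2005/Atom'>
  <entry><content type='application/x-shockwave-flash' src='https://www.youtube.com/v/AAAAAAAAAAA'/></entry>
  <entry><content type='application/x-shockwave-flash' src='https://www.youtube.com/v/BBBBBBBBBBB'/></entry>
</feed>
XML;

// Parse from a string: the third constructor argument (dataIsURL) defaults to false.
$xml = new SimpleXMLElement($atom);

$links = [];
foreach ($xml->entry as $entry) {
    // Attributes are read with array syntax; cast to string to get the plain value.
    $links[] = (string) $entry->content['src'];
}

echo implode(PHP_EOL, $links), PHP_EOL;
```

Elements in the feed's default (unprefixed) namespace can be reached directly with `->entry`; prefixed elements such as `media:...` would need `children()` with the namespace instead.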
Anthony.
Try this preg_match_all pattern:
preg_match_all("/src='([^']+)'/si", $data, $links);
and print the results:
echo "<pre>";
print_r($links);
<?php
$data = file_get_contents('https://gdata.youtube.com/feeds/api/standardfeeds/most_shared');
preg_match_all("/src='(.+?)'\/>/", $data, $links);
print_r($links[1]);
You forgot to match the closing / of the self-closing tags. Note also that $links[1] holds the captured URLs, while $links[0] holds the full matches.
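Here is a small sketch, run on a hypothetical inline fragment rather than the live feed, showing what preg_match_all() stores in `$links[0]` versus `$links[1]` with its default PREG_PATTERN_ORDER.

```php
<?php
// Hypothetical fragment mimicking the feed's self-closing <content .../> tags.
$data = "<entry><content type='x' src='https://www.youtube.com/v/AAA'/></entry>"
      . "<entry><content type='x' src='https://www.youtube.com/v/BBB'/></entry>";

// ([^']+) captures everything up to the closing quote, so the pattern does not
// depend on what follows the attribute ('/>' versus '>').
preg_match_all("/src='([^']+)'/", $data, $links);

print_r($links[0]); // full matches, e.g. src='https://www.youtube.com/v/AAA'
print_r($links[1]); // captured URLs only
```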
This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 3 years ago.
As the title suggests, I would like to retrieve links that have a specific class.
I have code that connects to the pages, and with preg_match I would like to extract only the URL inside href="url".
The links I want are in a table. Besides href, they can have other attributes, but never an id, only the class view.
<a title="viwe" class="view" href="link">blablabla</a>
This is the code I wrote:
$curl = curl_init('http://prove/prove/pag/test.php');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
$page = curl_exec($curl);
if(curl_errno($curl)) // check for execution errors
{
echo 'Scraper error: ' . curl_error($curl);
exit;
}
curl_close($curl);
$regex = '/<a.*?>(.*?)<\/a>/';
if ( preg_match($regex, $page, $list) )
echo $list[0];
else
print "Not found";
Parsing HTML with regular expressions is not recommended, but if you must, this expression may be closer to what you have in mind:
<a\s.*?\sclass="\s*view\s*"[^>]*>.*?<\/a>
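As an alternative to the regex, here is a minimal sketch that swaps in PHP's built-in DOMDocument and DOMXPath to collect the href of every `<a>` whose class list contains `view`. The table markup is a hypothetical sample modelled on the question; in practice you would pass the `$page` string returned by curl_exec().

```php
<?php
// Hypothetical table markup modelled on the question's <a class="view"> links.
$page = '<table><tr>'
      . '<td><a title="viwe" class="view" href="link1">blablabla</a></td>'
      . '<td><a class="other" href="skip-me">no</a></td>'
      . '<td><a class="view" data-x="1" href="link2">more</a></td>'
      . '</tr></table>';

$dom = new DOMDocument();
// Real-world HTML is rarely valid; keep libxml warnings out of the output.
libxml_use_internal_errors(true);
$dom->loadHTML($page);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Match class="view" exactly or as one token in a space-separated class list.
$query = "//a[contains(concat(' ', normalize-space(@class), ' '), ' view ')]";

$hrefs = [];
foreach ($xpath->query($query) as $a) {
    $hrefs[] = $a->getAttribute('href');
}

print_r($hrefs);
```

Unlike the regex, this does not care about attribute order or extra attributes on the tag.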
This question already has an answer here:
How to extract and access data from JSON with PHP?
(1 answer)
Closed 6 years ago.
I have some relatively simple JSON that I'm trying to display with PHP, and I'm stuck. I think I may not be using json_decode or json_encode correctly, or maybe I simply overlooked something.
Here's the JSON...
{
"numFound": 43640,
"start": 0,
"maxScore": 0.7847167,
"docs": [],
"facets": {}
}
Here's my PHP...
<?php
$json_returned = file_get_contents("URL_OF_JSON_SOURCE");
$decoded_results = json_decode($array, true);
{
foreach($decoded_results as $results){
echo "Number Found:".$results['numFound'].";
echo "Start:".$results['start'].";
}
}
?>
I'm primarily just trying to get "numFound", "start", and "maxScore" to display. Thanks for any help, or even taking the time to read this post.
Here's the source JSON:
https://api.data.gov/gsa/fbopen/v0/opps?q=technology&data_source=FBO&limit=1&show_closed=true&api_key=CTrs3pcYimTdR4WKn50aI1GcUxyL9M4s1fyBbSer
There is no JSON array at the top level of this data, so you don't need to loop over the decoded result. Also, the double-quoted strings at the end of your echo lines are never closed, and that trailing concatenation isn't needed anyway.
Here is the solution:
<?php
$result = json_decode( file_get_contents("sth"), true );
echo 'Number Found :'.$result["numFound"].'<br/>';
echo 'Start :'.$result["start"].'<br/>';
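Your PHP code is messed up: it passes the undefined $array to json_decode() instead of $json_returned, and it loops over a result that is a single object. Here is a corrected version: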
<?php
$json_returned = file_get_contents("https://api.data.gov/gsa/fbopen/v0/opps?q=technology&data_source=FBO&limit=1&show_closed=true&api_key=CTrs3pcYimTdR4WKn50aI1GcUxyL9M4s1fyBbSer");
$decoded_results = json_decode($json_returned, true);
echo "Number Found:".$decoded_results['numFound']." ";
echo "Start:".$decoded_results['start'];
?>
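For reference, here is a minimal sketch that decodes the sample JSON from the question inline (so no network call is needed), checks that decoding succeeded, and also prints maxScore, which the question asked for.

```php
<?php
// The sample response from the question, inlined so no network call is needed.
$json = '{"numFound": 43640, "start": 0, "maxScore": 0.7847167, "docs": [], "facets": {}}';

// true => decode JSON objects as associative arrays.
$result = json_decode($json, true);
if (!is_array($result)) {
    exit('Invalid JSON: ' . json_last_error_msg());
}

// The top level is a single object, so read the keys directly; no loop needed.
echo 'Number Found: ', $result['numFound'], PHP_EOL;
echo 'Start: ', $result['start'], PHP_EOL;
echo 'Max Score: ', $result['maxScore'], PHP_EOL;
```

With a live URL, replace `$json` with the string returned by file_get_contents() and keep the is_array() check, since json_decode() returns null on malformed input.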
This question already has answers here:
PHP Web scraping of Javascript generated contents [duplicate]
(2 answers)
Closed 7 years ago.
I just want to get the table data from an HTML page. For example, the URL is:
$url="https://www.centralbank.org.bz/rates-statistics/exchange-rates";
From this page, I need to get the currency rate table and strip out all the unwanted data.
Please help me.
Many thanks
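Try this code:
$url = 'https://www.centralbank.org.bz/rates-statistics/exchange-rates';
$content = file_get_contents($url);
// Everything after the opening tag of the currency table
$first_step = explode('<table id="currencyTable">', $content);
if (!isset($first_step[1])) {
exit('Table not found');
}
// Everything before the first closing </table> after that point
$second_step = explode('</table>', $first_step[1]);
echo $second_step[0];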
You should use Simple HTML DOM.
Here is an example that may help:
<?php
include('simple_html_dom.php');
$url = 'https://www.phpbb.com/community/viewtopic.php?f=46&t=543171';
$html = file_get_html($url);
$links = array();
foreach($html->find('a[class="postlink"]') as $a) {
$links[] = $a->href;
}
print_r($links);
?>
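If you'd rather avoid a third-party library, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath that turns the rows of `<table id="currencyTable">` into a plain array of cell texts, which also strips the markup. The sample HTML and its values are hypothetical; the real page's layout (and whether the table is present in the server-side HTML at all) is an assumption.

```php
<?php
// Hypothetical markup with placeholder values; the real page layout may differ.
$html = '<html><body><table id="currencyTable">'
      . '<tr><th>Currency</th><th>Buy</th><th>Sell</th></tr>'
      . '<tr><td>USD</td><td> 2.0 </td><td>2.0</td></tr>'
      . '<tr><td>CAD</td><td>1.4</td><td>1.5</td></tr>'
      . '</table></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // tolerate imperfect real-world HTML
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$rows = [];
foreach ($xpath->query('//table[@id="currencyTable"]//tr') as $tr) {
    $cells = [];
    // Header and data cells of this row, in document order.
    foreach ($xpath->query('./th|./td', $tr) as $cell) {
        // textContent drops the tags; trim removes surrounding whitespace.
        $cells[] = trim($cell->textContent);
    }
    $rows[] = $cells;
}

print_r($rows);
```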
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How to parse and process HTML with PHP?
Let's say I want to extract a certain number/text from a table from here: http://www.fifa.com/associations/association=chn/ranking/gender=m/index.html
I want to get the first number in a td of the right-hand table, under "FIFA Ranking position". That would be 88 right now. Upon inspection, it is <td class="c">88</td>.
How would I use PHP to extract the info from said webpage?
Edit: I am told jQuery/JavaScript is better suited for this...
This could probably be prettier, but it'd go something like:
<?php
$page = file_get_contents("http://www.fifa.com/associations/association=chn/ranking/gender=m/index.html");
// '#' is used as the delimiter so the '/' in </td> needs no escaping;
// the parentheses capture just the digits
if (preg_match('#<td class="c">([0-9]+)</td>#', $page, $matches)) {
echo $matches[1];
}
?>
I've never done anything like this before with PHP, so it may not work.
If you can do the work after the page loads, you can use JavaScript/jQuery:
<script type='text/javascript'>
// Collect the HTML of every td.c cell on the current page
var arr = [];
jQuery('table td.c').each(function () {
arr.push(jQuery(this).html());
});
</script>
Also, sorry for deleting my comment. You weren't specific about what needed to be done, so I initially thought jQuery would better fit your needs, but then I thought, "Maybe you want to get the page content before an HTML page is loaded."
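Try Simple HTML DOM (http://simplehtmldom.sourceforge.net/):
include('simple_html_dom.php');
$html = file_get_html('http://www.fifa.com/associations/association=chn/ranking/gender=m/index.html');
echo $html->find('div.rankings', 0)->find('table', 0)->find('tr',0)->find('td.c',0)->plaintext;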
This is untested; I wrote it just by looking at the page source. I'm sure you could target the cell more directly.
In fact,
echo $html->find('div.rankings', 0)->find('td.c',0)->plaintext;
should work.
Using DOMDocument, which ships with PHP's dom extension (enabled in most installations):
$dom = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings about malformed real-world HTML
$dom->loadHTML(file_get_contents("http://www.example.com/file.html"));
$xpath = new DOMXPath($dom);
// In XPath, attributes are selected with '@', not '#'
$cell = $xpath->query("//td[@class='c']")->item(0);
if ($cell) {
$number = intval(trim($cell->textContent));
// do stuff
}
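The same DOMXPath lookup can be wrapped in a function and exercised on an inline string, which makes it easy to test without fetching the page. In this sketch, the function name `firstRankingCell` and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: returns the integer in the first <td class="c"> cell,
// or null if there is no such cell.
function firstRankingCell(string $html): ?int
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // '@' selects an attribute in XPath.
    $cell = $xpath->query("//td[@class='c']")->item(0);
    if ($cell === null) {
        return null;
    }
    return intval(trim($cell->textContent));
}

// Made-up sample shaped like the ranking table described in the question.
$sample = '<div class="rankings"><table><tr><td>FIFA Ranking position</td>'
        . '<td class="c">88</td></tr></table></div>';
var_dump(firstRankingCell($sample)); // int(88)
```

Note that `@class='c'` matches only when the attribute is exactly `c`; a cell with several classes would need the `contains(concat(...))` idiom instead.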
This question already has answers here:
Get content from a url using php
(3 answers)
Closed 7 years ago.
In PHP I need to get the contents (source) of a URL, search it for the string "maybe baby love you", and if it does not contain it, do X.
Just read the contents of the page as you would read a file. PHP does the connection stuff for you. Then just look for the string via regex or simple string comparison.
$url = 'http://my.url.com/';
$data = file_get_contents( $url );
if ( strpos( 'maybe baby love you', $data ) === false )
{
// do something
}
The answer above is good, but it has a small mistake in the strpos() call: the arguments are swapped. The haystack comes first and the needle second. I have corrected the code below.
$url = 'http://my.url.com/';
$data = file_get_contents( $url );
if ( strpos($data,'maybe baby love you' ) === false )
{
// do something
}
Assuming fopen URL wrappers are enabled (allow_url_fopen = On):
$string = file_get_contents('http://example.com/file.html');
// strpos() takes the haystack first, then the needle
if (strpos($string, 'maybe baby love you') === false) {
//do X
}
If fopen URL wrappers are not enabled, you may be able to use the cURL extension (see http://www.php.net/curl).
cURL also lets you deal with authenticated pages, redirects, etc.
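Putting both answers together, here is a minimal sketch of a cURL-based fetch plus a strict strpos() check. Both function names (`fetchPage`, `lacksPhrase`) are hypothetical, and the fetch is shown only as a commented usage line so the checks below run without network access.

```php
<?php
// Hypothetical helper: fetch a page with cURL, following redirects.
// Returns the body as a string, or null on failure.
function fetchPage(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}

// Haystack first, needle second; strict comparison because strpos() returns 0
// (falsy) when the needle sits at the very start of the haystack.
function lacksPhrase(string $haystack, string $needle): bool
{
    return strpos($haystack, $needle) === false;
}

// Usage (network call not made here):
// $page = fetchPage('http://my.url.com/');
// if ($page !== null && lacksPhrase($page, 'maybe baby love you')) { /* do X */ }

var_dump(lacksPhrase('maybe baby love you forever', 'maybe baby love you')); // bool(false)
var_dump(lacksPhrase('something else', 'maybe baby love you'));              // bool(true)
```

On PHP 8 and later, `!str_contains($haystack, $needle)` performs the same check without the `=== false` pitfall.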