This question already has answers here:
PHP Web scraping of Javascript generated contents [duplicate]
(2 answers)
Closed 7 years ago.
I want to extract the details of a table from an HTML page. For example, the URL is:
$url="https://www.centralbank.org.bz/rates-statistics/exchange-rates";
From this page, I need to get the currency rate table and strip out all the surrounding markup and other unwanted data.
Please help me.
Many thanks
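Try this code:
$url = 'https://www.centralbank.org.bz/rates-statistics/exchange-rates';
$content = file_get_contents($url);
// Everything after the opening tag of the table ends up in index 1
$first_step = explode('<table id="currencyTable">', $content);
if (isset($first_step[1])) {
    // Cut at the first closing tag to keep only the table's inner HTML
    $second_step = explode('</table>', $first_step[1]);
    echo $second_step[0];
}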
You should use Simple HTML DOM.
An example that may be helpful to you:
<?php
include('simple_html_dom.php');
$url = 'https://www.phpbb.com/community/viewtopic.php?f=46&t=543171';
$html = file_get_html($url);
$links = array();
foreach($html->find('a[class="postlink"]') as $a) {
    $links[] = $a->href;
}
print_r($links);
?>
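If you would rather not pull in a third-party library, PHP's bundled DOM extension can do the same job. Below is a minimal sketch, not a tested scraper for that site: the HTML is an inline sample with made-up rates, and the table id `currencyTable` is simply taken from the answer above. If the real page builds the table with JavaScript, the fetched HTML will not contain it and none of these approaches will find it.

```php
<?php
// Inline sample standing in for the downloaded page (values are invented).
$html = <<<HTML
<html><body>
<table id="currencyTable">
  <tr><th>Currency</th><th>Rate</th></tr>
  <tr><td> USD </td><td>2.00</td></tr>
  <tr><td>EUR</td><td> 2.15 </td></tr>
</table>
</body></html>
HTML;

// Real-world HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$rows = [];
// Walk every row of the table with the assumed id.
foreach ($xpath->query('//table[@id="currencyTable"]//tr') as $tr) {
    $cells = [];
    // Header and data cells, with surrounding whitespace trimmed away.
    foreach ($xpath->query('./th|./td', $tr) as $cell) {
        $cells[] = trim($cell->textContent);
    }
    $rows[] = $cells;
}
print_r($rows);
```

Each entry of `$rows` is one table row as a plain array of cell texts, so the "dirty data" (tags, attributes, padding) is gone by construction.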
This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 3 years ago.
As the title suggests, I would like to retrieve links that have a specific class.
I already have the code that fetches the page, and with the preg_match function I would like to extract only the URL inside href="url".
The link I want looks like the one below. It sits in a table and may have other attributes, but it has no id, only the view class.
<a title="viwe" class="view" href="link">blablabla</a>
This is the code I have written so far:
$curl = curl_init('http://prove/prove/pag/test.php');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
$page = curl_exec($curl);
if (curl_errno($curl)) // check for execution errors
{
    echo 'Scraper error: ' . curl_error($curl);
    exit;
}
curl_close($curl);
$regex = '/<a.*?>(.*?)<\/a>/';
if ( preg_match($regex, $page, $list) )
    echo $list[0];
else
    print "Not found";
Parsing HTML with a regular expression is not recommended. If you have to do it anyway, this expression is probably closer to what you have in mind:
<a\s.*?\sclass="\s*view\s*"[^>]*>.*?<\/a>
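For comparison, here is a minimal sketch of the parser-based route using PHP's DOM extension and XPath. The markup and URLs are an invented inline sample standing in for the cURL response in `$page`; only the `view` class comes from the question.

```php
<?php
// Inline sample standing in for the page fetched with cURL.
$page = '<table><tr>'
      . '<td><a title="view" class="view" href="http://example.com/item/1">first</a></td>'
      . '<td><a class="other" href="http://example.com/skip">skip</a></td>'
      . '<td><a class="big view" href="http://example.com/item/2">second</a></td>'
      . '</tr></table>';

libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
$dom = new DOMDocument();
$dom->loadHTML($page);
$xpath = new DOMXPath($dom);

// Match <a> elements whose class list contains the token "view",
// even when other classes are present, and select their href attribute.
$query = '//a[contains(concat(" ", normalize-space(@class), " "), " view ")]/@href';

$links = [];
foreach ($xpath->query($query) as $href) {
    $links[] = $href->nodeValue;
}
print_r($links);
```

Unlike the regex, this does not depend on attribute order or spacing inside the tag.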
This question already has answers here:
Parse Wordpress like Shortcode
(7 answers)
Closed 4 years ago.
I'm using str_replace to replace a simple shortcode which works fine:
$content = "[old_shortcode]";
$old_shortcode = "[old_shortcode]";
$new_shortcode = "[new_shortcode]";
echo str_replace($old_shortcode, $new_shortcode, $content);
However I want to also replace attributes inside the shortcode without affecting any text content, for example change this:
[old_shortcode old_option_1="Text Content" old_option_2="Text Content"]
To this:
[new_shortcode new_option_1="Text Content" new_option_2="Text Content"]
Much appreciated if anyone could advise on how to do this.
To clarify, this question is not about parsing a shortcode (although it has been marked as a duplicate). It is about replacing one shortcode with another, which the linked duplicate question does not answer.
Edit:
I figured it out myself, but it is probably not a very elegant solution. Can anyone suggest something better?
$pattern1 = '#\[shortcode(.*)attribute1="([^"]*)"(.*)\]#i';
$replace1 = '[shortcode$1attribute1_new="$2"$3]';
$pattern2 = '#\[shortcode(.*)attribute2="([^"]*)"(.*)\]#i';
$replace2 = '[shortcode$1attribute2_new="$2"$3]';
$pattern3 = '#\[shortcode(.*)(.*?)\[/shortcode\]#i';
$replace3 = '[new_shortcode$1[/new_shortcode]';
$content = '[shortcode attribute2="yes" attribute1="whatever"]Test[/shortcode]';
echo preg_replace(array($pattern1,$pattern2,$pattern3), array($replace1,$replace2,$replace3), $content);
Use preg_replace() instead, so that a regex selects only the part of the string you want to change.
$newContent = preg_replace("/[a-zA-Z]+(_[^\s]+)/", "new$1", $content);
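A broad pattern like the one above can also rewrite any word containing an underscore elsewhere in the content. One possible alternative is to rename the tag and the attributes from an explicit map. The sketch below is only an illustration: `rename_shortcode()` is a hypothetical helper, not a WordPress function, and it assumes attribute values are double-quoted and contain no `]`.

```php
<?php
// Hypothetical helper: renames a shortcode tag and selected attributes.
function rename_shortcode(string $content, string $oldTag, string $newTag, array $attrMap): string
{
    // Matches [old ...] and [/old]; \b stops "old" from matching "old_extra".
    $pattern = '/\[(\/?)' . preg_quote($oldTag, '/') . '\b([^\]]*)\]/';

    $result = preg_replace_callback($pattern, function (array $m) use ($newTag, $attrMap): string {
        // Rename only names that are directly followed by =" (i.e. attribute names),
        // so the quoted text content is left untouched.
        $attrs = preg_replace_callback('/([\w-]+)(?==")/', function (array $a) use ($attrMap): string {
            return $attrMap[$a[1]] ?? $a[1];
        }, $m[2]);

        return '[' . $m[1] . $newTag . $attrs . ']';
    }, $content);

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $content;
}

$content = '[old_shortcode old_option_1="Text Content" old_option_2="Text Content"]';
$map = ['old_option_1' => 'new_option_1', 'old_option_2' => 'new_option_2'];
$converted = rename_shortcode($content, 'old_shortcode', 'new_shortcode', $map);
echo $converted, PHP_EOL;
```

Attributes that are not listed in the map are copied through unchanged, and a closing `[/old_shortcode]` tag is renamed by the same pattern.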
This question already has answers here:
Reference - What does this error mean in PHP?
(38 answers)
Closed 7 years ago.
I am trying to extract the title text from an HTML page and store it in an object. I am using Symfony and PHP. The result of filterXPath() does not seem to be plain text; instead I get the entire HTML page and an error is thrown. I don't know why.
My code is:
$html = $this->file_get_contents_curl("http://www.google.com/");
$urlData = [];
$crawler = new Crawler($html);
$urlData->title = $crawler->filterXPath('//title')->extract('_text');
I see the title text if I do:
return $crawler->filterXPath('//title')->extract('_text');
Try this:
libxml_use_internal_errors(true);
$html = file_get_contents("http://www.google.com/");
$dom1 = new DOMDocument;
$dom1->preserveWhiteSpace = false;
$dom1->loadHTML($html);
$xp = new DOMXPath($dom1);
$xp->registerNamespace("php", "http://php.net/xpath");
$urlData = $xp->query('//title');
foreach($urlData as $title) {
    echo $title->textContent;
}
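One likely cause of the original error is that `$urlData` is created as an array (`[]`) and then used as an object with `->title`; on recent PHP versions that assignment fails. A minimal sketch of storing the title as a plain string on an object follows. The inline HTML is an invented sample used in place of the downloaded page.

```php
<?php
// Inline sample standing in for the fetched page.
$html = '<html><head><title>Example Page</title></head><body><p>Hi</p></body></html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Use an object, not an array, if you want to assign properties with ->
$urlData = new stdClass();

// string(...) makes XPath return the text of the first <title> as a PHP string
// (an empty string if there is no <title>).
$urlData->title = $xpath->evaluate('string(//title)');

echo $urlData->title, PHP_EOL;
```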
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 8 years ago.
I am trying to scrape the post content of this forum thread, https://forum.lowyat.net/topic/3424996, using the code below.
$rows = $html->find('.post_table');
$array = array();
foreach($rows as $go){
    $post_text = $go->find('.post_td_right > .post_text')->innertext;
    $array[]= array(
        'content'=> $post_text
    );
}
echo json_encode($array);
I ran var_dump($rows) and it is an object, so I really don't know where the mistake is. I need your help!
Forums usually have an RSS feed to help with this sort of requirement. Turns out, the site you're scraping supplies this for you: http://rss.forum.lowyat.net/topic/3424996
We can now use an XML parser instead of a DOM scraper, which is simpler and less fragile than scraping the forum's HTML. For example:
<?php
$rss = file_get_contents('http://rss.forum.lowyat.net/topic/3424996'); //Or use cURL
$xml = simplexml_load_string($rss);
$array = array();
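foreach($xml->channel->item as $item) {
    // Cast the <description> element to a string, then escape it for display
    $array[] = htmlentities((string) $item->description);
}
echo "<pre>";
print_r($array); // print_r() already prints; echoing its return value would append a stray "1"
echo "</pre>";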
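To try the same idea without hitting the network, here is a small self-contained sketch that parses an inline RSS string and produces the JSON structure the question was after. The feed content is invented; the element names follow the usual RSS 2.0 layout (`channel`, `item`, `description`), which is assumed to match the real feed.

```php
<?php
// Invented RSS sample; a real feed would be fetched with file_get_contents() or cURL.
$rss = <<<XML
<rss version="2.0">
  <channel>
    <title>Sample topic</title>
    <item><title>Post 1</title><description>First &lt;b&gt;post&lt;/b&gt;</description></item>
    <item><title>Post 2</title><description>Second post</description></item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($rss);

$posts = [];
foreach ($xml->channel->item as $item) {
    // The string cast turns the SimpleXMLElement into its (entity-decoded) text.
    $posts[] = ['content' => (string) $item->description];
}

echo json_encode($posts), PHP_EOL;
```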
This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 9 years ago.
I've just started with PHP and I'm trying to scrape a small page, without success. I tried preg_match_all, but it doesn't return the result I want. Basically, I want to scrape only the YouTube video links from https://gdata.youtube.com/feeds/api/standardfeeds/most_shared, collect all of them, and use them later.
I tried the following code, which failed:
<?php
$data = file_get_contents('https://gdata.youtube.com/feeds/api/standardfeeds/most_shared');
preg_match_all("/src='(.+?)'>/", $data, $links);
$link_out = $links[0][0];
echo $link_out;
?>
I'm new to PHP, so a little help would be appreciated.
Thanks
As the feed is XML, you can use PHP's SimpleXMLElement to obtain the data.
<?php
$xml = new SimpleXMLElement(
'https://gdata.youtube.com/feeds/api/standardfeeds/most_shared',
null,
true
);
foreach($xml->entry as $entry) {
    echo $entry->content['src'], PHP_EOL;
}
/*
https://www.youtube.com/v/IjWc43FCYlg?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/Xw1C5T-fH2Y?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/Kq0_dGKx4Os?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/gbcBYs0ljI0?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/78juOpTM3tE?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/OOiZ-5DqwYI?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/zjz614QVyfQ?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/h15m87WsCHQ?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/SXKOTdyOUBg?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/BRAM8MpqIeA?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/5yB3n9fu-rM?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/NAOo9SnzRH8?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/0KtILkzC-1g?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/kWSIFh8ICaA?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/Mi6AhogZCeg?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/kWuIGAZ1x2I?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/lKY5fmDGVLs?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/C94PaCtqOk4?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/V-fL8zopddI?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/UWlzMIl7E48?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/mcw6j-QWGMo?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/-RSDaRttpzk?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/8_RDx4skTp4?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/7YDWdv9kR0M?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/m96tYpEk1Ao?version=3&f=standard&app=youtube_gdata
*/
Anthony.
Try this preg_match_all call:
preg_match_all("/src='([^']+)'/si", $data, $links);
and display the results:
echo "<pre>";
print_r($links);
<?php
$data = file_get_contents('https://gdata.youtube.com/feeds/api/standardfeeds/most_shared');
preg_match_all("/src='(.+?)'\/>/", $data, $links);
print_r($links[1]);
You forgot to match the closing / of the self-closing tags that carry the src attribute.
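A point that is easy to miss with preg_match_all is which index holds what: `$links[0]` contains the full matches, while `$links[1]` contains the first capture group, which is the bare URL here. A minimal sketch with an invented inline snippet follows; the URLs are placeholders and the markup only imitates the self-closing `content` elements of the feed.

```php
<?php
// Invented stand-in for the feed body.
$data = "<entry><content type='application/x-shockwave-flash' src='https://example.com/v/abc'/></entry>"
      . "<entry><content type='application/x-shockwave-flash' src='https://example.com/v/def'/></entry>";

// Capture everything between the single quotes of each src attribute.
preg_match_all("/src='([^']+)'/i", $data, $links);

// $links[0] holds whole matches such as src='...'; $links[1] holds only the URLs.
foreach ($links[1] as $url) {
    echo $url, PHP_EOL;
}
```

Using `[^']+` instead of `.+?` also means the pattern does not care what follows the closing quote.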