I have the following code to read an XML file which works well when the URL is available:
$url = 'http://www1.blahblah.com'."param1"."param2";
$xml = file_get_contents($url);
$obj = simplexml_load_string($xml);
How can I change the above code to cycle through a number of different URLs if the first one is unavailable for any reason? I have a list of 4 URLs, all containing the same file, but I'm unsure how to go about it.
Replace your code with, for example, this:
// Instead of a single variable, use an array of links
$urls = [
    'http://www1.blahblah.com'."param1"."param2",
    'http://www1.anotherblahblah.com'."param1"."param2",
    'http://www1.andanotherblahblah.com'."param1"."param2",
    'http://www1.andthelastblahblah.com'."param1"."param2",
];

// Try each link in turn until one returns content
foreach ($urls as $url) {
    $xml = file_get_contents($url);
    // Do your thing if the content was read without failure, then break the loop
    if ($xml !== false) {
        $obj = simplexml_load_string($xml);
        break;
    }
}
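If one of the hosts hangs instead of failing quickly, a stream context with a timeout keeps the loop moving. A minimal sketch of the same loop with that added (the 5-second value is just an example):
// Fail fast on hosts that hang; 5 seconds is an arbitrary choice
$context = stream_context_create(['http' => ['timeout' => 5]]);

foreach ($urls as $url) {
    // @ silences the warning PHP emits when a URL cannot be reached
    $xml = @file_get_contents($url, false, $context);
    if ($xml !== false) {
        $obj = simplexml_load_string($xml);
        break;
    }
}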
Hello, I've got a bunch of divs I'm trying to scrape the content values from, and I've managed to successfully pull out one of the values. Result! However, I've hit a brick wall: I now want to pull out the one after it using my current code. I'd appreciate any help.
Here is the bit of code I'm currently using:
foreach ($arr as &$value) {
    $file = $DOCUMENT_ROOT . $value;
    $doc = new DOMDocument();
    $doc->loadHTMLFile($file);
    $xpath = new DOMXpath($doc);
    $elements = $xpath->query("//*[contains(@class, 'covGroupBoxContent')]//div[3]//div[2]");
    if (!is_null($elements)) {
        foreach ($elements as $element) {
            $nodes = $element->childNodes;
            foreach ($nodes as $node) {
                $maps = $node->nodeValue;
                echo $maps;
            }
        }
    }
}
I simply want them all to have separate outputs that I can echo out.
I recommend you use Simple HTML DOM. Beyond that, I need to see a sample of the HTML you are scraping.
If you are scraping a website outside your domain, I'd recommend saving the source HTML to a file for review and testing. Some websites combat scraping, so what you see in the browser is not what your scraper would see.
Also, I'd recommend setting a random user agent via ini_set(). If you need a function for this, I have one (a rough sketch of the idea follows the snippet below).
<?php
$html = file_get_html($url);
if ($html) {
    $myfile = fopen("testing.html", "w") or die("Unable to open file!");
    fwrite($myfile, $html);
    fclose($myfile);
}
?>
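For the random user agent mentioned above, a rough sketch of the idea (the UA strings here are placeholders, not real browser strings):
<?php
// Pick a random user agent for outgoing HTTP requests made via the http stream wrapper
$userAgents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15) ExampleBrowser/2.0',
    'Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/3.0',
];
ini_set('user_agent', $userAgents[array_rand($userAgents)]);
?>
file_get_html() ultimately calls file_get_contents(), so it should pick this setting up.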
include('simple_html_dom.php');

function curl_set($url){
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $result = curl_exec($ch);
    return $result;
}

$curl_scraped_page = curl_set('http://www.belmontwine.com/site-map.html');
$html = new simple_html_dom();
$html->load($curl_scraped_page, true, false);
$i = 0;
$ab = array();
$files = array();
foreach($html->find('td[class=site-map]') as $td) {
    foreach($td->find('li a') as $a) {
        if($i<=2){
            $ab = 'http://www.belmontwine.com'.$a->href;
            $html = file_get_html($ab);
            foreach($html->find('td[class=pageheader]') as $file) {
                $files[] = $file->innertext;
            }
        }
        else{
            //exit();
        }
        $i++;
    }
    $html->clear();
}
print_r($files);
Above is my code; I need help scraping a site with PHP.
The $ab variable contains the URLs that are scraped from the site, and I want to scrape data from those URLs. I don't know what's wrong with the script.
The desired output should be the URL passed by $ab, but it is not returning anything, just a continuous loop.
Any help appreciated.
You have a runaway program because once you are inside the if ($i <= 2) section you never increment the $i variable. Right now your $i++ is in the wrong place. I don't know why you want to limit the finds to 3 or fewer, but you also need to remember to reset $i to 0, which you are not doing at all.
EDIT:
I don't use the 'simple_html_dom.php' class, so I don't know it very well, I don't know what you want to do with each link found, and I can't do the work for you. I came up with this sample PHP script that grabs all the links from your site-map page. It creates an array consisting of the link titles and href paths. The last foreach loop just prints the array for now, but you could use that loop to process each path found.
include('simple_html_dom.php');

$files = array();
$html = file_get_html('http://www.belmontwine.com/site-map.html');

foreach($html->find('td[class=site-map]') as $td)
{
    foreach($td->find('li a') as $a)
    {
        if($a->plaintext != '')
        {
            $files["$a->plaintext"] = "http://www.belmontwine.com/$a->href";
        }
    }
}

// To print the $files array or to process each link found
foreach($files as $title => $path)
{
    echo('Title: ' . $title . ' - Path: ' . $path . '<br>' . PHP_EOL);
}
Also, not every link found is an HTML file (at least one is a PDF), so be sure to test for that in your code.
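One way to do that check, a minimal sketch that assumes the pages you want all carry an .html/.htm extension in the URL:
// Rough sketch: skip links that do not point at .html/.htm pages (e.g. the PDF)
foreach($files as $title => $path)
{
    $ext = strtolower(pathinfo(parse_url($path, PHP_URL_PATH), PATHINFO_EXTENSION));
    if($ext != 'html' && $ext != 'htm')
    {
        continue;
    }
    echo('Title: ' . $title . ' - Path: ' . $path . '<br>' . PHP_EOL);
}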
Hello, I have tried this and nothing happens.
I want to count the children from an XML file via PHP.
Everything else is OK, but I can't get this stupid XML file to load correctly into my page.
Here's the script:
$url123 = 'http://steamcommunity.com/id/ProJaCore/stats/GarrysMod/?xml=1';
$data123 = file_get_contents($url123);
$xml = simplexml_load_string($data123);
$elem = new SimpleXMLElement($xml);
foreach ($elem as $achievements) {
    print $achievements->count().'<br>';
}
Do this:
$url123 = 'http://steamcommunity.com/id/ProJaCore/stats/GarrysMod/?xml=1';
$data123 = file_get_contents($url123);
$elem = new SimpleXMLElement($data123);
foreach ($elem as $achievements) {
    print $achievements->count().'<br>';
}
In your code you're creating a SimpleXMLElement object in $xml, then trying to create another one in $elem from the $xml object; the SimpleXMLElement constructor expects an XML string, not an already-parsed object.
See the complete reference: http://www.php.net/manual/en/book.simplexml.php
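Equivalently, you could keep simplexml_load_string() and drop the extra constructor call instead; either one on its own returns a usable SimpleXMLElement. A minimal sketch:
$data123 = file_get_contents($url123);
// simplexml_load_string() already returns a SimpleXMLElement
$elem = simplexml_load_string($data123);
foreach ($elem as $achievements) {
    print $achievements->count().'<br>';
}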
I am using server-side code to get photos from an external URL, using the Simple PHP DOM library as per an SO suggestion. But I am lacking performance with this: for some sites I am not able to get all the photos.
$url below is an example external site which is not giving me all the images.
$url = "http://www.target.com/c/baby-baby-bath-bath-safety/-/N-5xtji#?lnk=nav_t_spc_3_inc_1_1";
$html = file_get_contents($url);
$doc = new DOMDocument();
@$doc->loadHTML($html); // @ suppresses warnings from malformed HTML
$tags = $doc->getElementsByTagName('img');
foreach ($tags as $tag) {
    echo $imageUrl = $tag->getAttribute('src');
    echo "<br />";
}
Is it possible to have functionality/accuracy similar to Firefox's option under
Firefox -> Tools -> Page Info -> Media?
I just want this to be more accurate, as the existing library is not fetching all the images. I also tried file_get_contents(), which is also not fetching all the images.
You need to use regular expressions to get the images' src attributes. DOMDocument builds the whole DOM structure in memory, which you don't need here. Once you have the URLs, use file_get_contents() and write the data to files. Also increase max_execution_time if you'll parse many pages.
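A rough sketch of that regex approach (the pattern is deliberately simple and will miss unquoted or otherwise unusual src attributes):
$html = file_get_contents($url);
// Naive pattern: grab the src attribute of every <img> tag
preg_match_all('/<img[^>]+src=["\']([^"\']+)["\']/i', $html, $matches);
foreach ($matches[1] as $imageUrl) {
    echo $imageUrl . "<br />";
}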
Download images from remote server
function save_image($sourcePath,$targetPath)
{
$in = fopen($sourcePath, "rb");
$out = fopen($targetPath, "wb");
while ($chunk = fread($in,8192))
{
fwrite($out, $chunk, 8192);
}
fclose($in);
fclose($out);
}
$src = "http://www.example.com/thumbs/thumbs-t2/1/ts_11083.jpg"; //image source
$target = dirname(__FILE__)."/images/pic.jpg"; //where to save image with new name
save_image($src,$target);
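If allow_url_fopen is disabled on the server, roughly the same download can be done with cURL; a sketch (the function name here is just for illustration):
function save_image_curl($sourcePath, $targetPath)
{
    $out = fopen($targetPath, "wb");
    $ch = curl_init($sourcePath);
    curl_setopt($ch, CURLOPT_FILE, $out);           // stream the response straight into the file
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects to the actual image
    curl_exec($ch);
    curl_close($ch);
    fclose($out);
}
save_image_curl($src, $target);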
I use the code below to save the contents of the XML addresses I have in an array. However, only one XML file is saved, specifically the last one. What am I missing here?
$filenames = array('xml url','xml url','xml url');
foreach( $filenames as $filename) {
    $xml = simplexml_load_file( $filename );
    $xml->asXML("test.xml");
}
You appear to be opening each XML file, then saving them in the same location. File 1 is written, then File 2 overwrites it, then File 3... In short, the last file will overwrite the previous ones, and therefore "only the last one is saved".
What exactly are you trying to do here?
You save them all under the same name, so of course the earlier ones will be lost.
Try this:
$filenames = array('xml url','xml url','xml url');
foreach( $filenames as $key => $filename) {
    $xml = simplexml_load_file( $filename );
    $xml->asXML('test' . $key . '.xml');
}
That should save the files sequentially as test0.xml, test1.xml, test2.xml and so on.
If you want all your loaded XML URLs appended into a single file, you can do something like this. Note that an XML document can have only one root element, so the pieces need a wrapper before the combined string will parse:
$filenames = array('xml url','xml url','xml url');
$fullXml = array();
foreach( $filenames as $filename) {
    $xml = simplexml_load_file( $filename );
    // Convert the SimpleXML object back into a string, strip its XML declaration,
    // and add it to an array
    $fullXml[] = preg_replace('/^<\?xml[^>]*\?>\s*/', '', $xml->asXML());
}
// Wrap the pieces in a single root element so the combined string is valid XML
$fullXml = '<combined>' . implode("\n", $fullXml) . '</combined>';
// Load the new combined XML string into a simplexml object
$xml = simplexml_load_string($fullXml);
// Now we can save the entire XML as your file
$xml->asXML('test.xml');