SimpleXMLElement out of memory with many files - PHP

I built a script that uses multi cURL to extract the source code of multiple URLs. It works fine with a few links, but when I add as many as I want, say 1000 URLs, the script crashes after about 100 parsed URLs because it runs out of memory (RAM).
my code:
...
    try {
        $xml = new SimpleXMLElement($data);
        foreach ($xml->url as $url_list) {
            $url = $url_list->loc;
            $newurls[] = $url;
            unset($xml);
        }
    } catch (Exception $e) {
        echo "invalid";
    }
}
My $data variable holds the source code of each URL, fetched with multi cURL.
Is there a way to clear the memory or something?
I already tried increasing the memory limit in php.ini; the problem is not there.
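One way to keep memory flat (a sketch of my own, not from the thread): parse each response with XMLReader, which streams the XML instead of building the whole tree, and drop each raw body as soon as it has been processed. The $responses array below is an assumed name for the bodies collected by multi cURL.
<?php
// Sketch only: stream-parse each sitemap with XMLReader instead of
// building a full SimpleXMLElement tree. $responses is assumed to hold
// the bodies returned by multi cURL.
$newurls = [];
foreach ($responses as $key => $data) {
    $reader = new XMLReader();
    if (!$reader->XML($data)) {
        echo "invalid";
        continue;
    }
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'loc') {
            $newurls[] = $reader->readString(); // only the URL string is kept
        }
    }
    $reader->close();
    unset($responses[$key]); // free the raw body once it has been parsed
}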

Related

Fatal error: Out of memory (allocated 377487360) (tried to allocate 371200000 bytes)

I want to scrape data from a URL that contains other URLs, each of which contains the details of an item, using simple_html_dom.php.
<?php
include 'simple_html_dom.php';

// Create DOM from URL or file
$url = 'www.example.com';
$count = 0;
$Links_Array = array();
$ArrayOfDomHtml = array();

// Find all links in the first page
if (!empty($url)) {
    $html = file_get_html($url);
    foreach ($html->find('.li_subject .item_link') as $element) {
        $Links_Array[$count] = $element->href;
        $count++;
    }
}

// Get details information from every item
// Create DOM from URLS
if (!empty($Links_Array)) {
    $count = 0;
    foreach ($Links_Array as $element) {
        $ArrayOfDomHtml[$count] = file_get_html($element);
        $count++;
    }
}

// Get the title
if (!empty($ArrayOfDomHtml)) {
    $count = 0;
    foreach ($ArrayOfDomHtml as $value) {
        $array2[$count] = array('title' => $value->find('.item_subject'));
        $count++;
    }
}

foreach ($array2 as $value) {
    print_r($value);
}
?>
I am using a XAMPP server and I want to print the value of $array2.
I have a memory problem. I looked into it and found several suggested php.ini changes:
set memory_limit=-1
uncomment realpath_cache_size = 4096k
uncomment realpath_cache_ttl = 120
I have made all these changes but it still doesn't work.
Line 49 is print_r( $value);
Edit
I have edited the code like this to minimise memory usage, but it still doesn't work:
<?php
include 'simple_html_dom.php';

// Create DOM from URL or file
// Find all links in the first page
if (!empty($url)) {
    $html = file_get_html($url);
    foreach ($html->find('.li_subject .item_link') as $element) {
        $Links_Array[$count] = $element->href;
        $count++;
    }
}

// Get details information from every item
// Create DOM from URLS
if (!empty($Links_Array)) {
    $count = 0;
    foreach ($Links_Array as $url) {
        $html = file_get_html($url);
        $DetailItem[$count] = array('title' => $html->find('.item_subject'));
        $count++;
    }
}

print_r($DetailItem);
?>
What you should be doing first is trying to make your program use less memory.
Instead of scraping all of the HTML on the planet into memory, and then parsing out the one specific bit of info you want, combine those and only store the bit you actually want.
if (!empty($Links_Array)) {
    $count = 0;
    foreach ($Links_Array as $element) {
        $html = file_get_html($element);
        $array2[$count] = array('title' => $html->find('.item_subject'));
        $count++;
    }
}
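A further note on memory (my own addition, assuming the standard simple_html_dom library is in use): its DOM objects hold circular references, so it is usually advised to call clear() and unset the object once you have pulled out what you need, for example:
foreach ($Links_Array as $element) {
    $html = file_get_html($element);
    $array2[$count] = array('title' => $html->find('.item_subject'));
    // Release the DOM before the next iteration; simple_html_dom tends to
    // leak memory otherwise because of its circular references.
    $html->clear();
    unset($html);
    $count++;
}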
What I'd do is find the php.ini file, then find the line where it says memory_limit and set it to 2048M, just to figure out whether memory really is the cause.
Then save this change and restart the server (Apache or nginx, depending on which one you're using).

PHP file_get_contents from a different URL if the first one is not available

I have the following code to read an XML file which works well when the URL is available:
$url = 'http://www1.blahblah.com'."param1"."param2";
$xml = file_get_contents($url);
$obj = SimpleXML_Load_String($xml);
How can I change the above code to cycle through a number of different URLs if the first one is unavailable for any reason? I have a list of 4 URLs, all containing the same file, but I'm unsure how to go about it.
Replace your code with, for example, this:
// instead of a simple variable, use an array of links
$urls = ['http://www1.blahblah.com'."param1"."param2",
         'http://www1.anotherblahblah.com'."param1"."param2",
         'http://www1.andanotherblahblah.com'."param1"."param2",
         'http://www1.andthelastblahblah.com'."param1"."param2"];

// for each of your links, try to get the content
foreach ($urls as $url) {
    $xml = file_get_contents($url);
    // do your thing if the content was read without failure and break the loop
    if ($xml !== false) {
        $obj = SimpleXML_Load_String($xml);
        break;
    }
}
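One caveat worth adding (my note, not part of the answer): file_get_contents() emits a warning for each URL that fails, and if every mirror fails $obj is never set. A variant that suppresses the warning and reports total failure might look like this:
$obj = null;
foreach ($urls as $url) {
    $xml = @file_get_contents($url); // suppress the warning on a failed fetch
    if ($xml !== false) {
        $obj = simplexml_load_string($xml);
        break;
    }
}
if ($obj === null) {
    echo "All URLs were unavailable.";
}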

Unable to fread output of a remote PHP file

I am using the output of a PHP file on a remote server to show content on my own website. I do not have access to modify files on the remote server.
The remote PHP file outputs JavaScript like this:
document.write('<p>some text</p>');
If I enter the URL in a browser I get the correct output, e.g.:
https://www.remote_server.com/files/the.php?param1=12
I can show the output of the remote file on my website like this:
<script type="text/javascript" src="https://www.remote_server.com/files/the.php?param1=12"></script>
But I would like to filter the output a bit before showing it.
Therefore I implemented a php file with this code:
function getRemoteOutput(){
    $file = fopen("https://www.remote_server.com/files/the.php?param1=12", "r");
    $output = fread($file, 1024);
    fclose($file);
    return $output;
}
When I call this function fopen() returns a valid handle, but fread() returns an empty string.
I have tried using file_get_contents() instead, but get the same result.
Is what I am trying to do possible?
Is it possible for the remote server to allow me to read the file via the browser, but block access from a php file?
Your variable $output only holds the first 1024 bytes read from the URL (headers, maybe?).
You will need to add a "while not end of file" loop to concatenate the entire remote file.
PHP reference: feof
You can learn a lot more in the PHP description for the fread function.
PHP reference: fread.
<?php
echo getRemoteOutput();

function getRemoteOutput(){
    $file = fopen("http://php.net/manual/en/function.fread.php", "r");
    $output = "";
    while (!feof($file)) { // while not the End Of File
        $output .= fread($file, 1024); // reads 1024 bytes at a time and appends to the variable as a string
    }
    fclose($file); // close the handle before returning (after return it would never run)
    return $output;
}
?>
In regards to your questions:
Is what I am trying to do possible?
Yes this is possible.
Is it possible for the remote server to allow me to read the file via
the browser, but block access from a php file?
I doubt it.
I contacted the support team for the site I was trying to connect to. They told me that they do prevent access from php files.
So that seems to be the reason for my problems, and apparently I just cannot do what I tried to do.
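As an aside of my own (not from the thread): when a server "blocks PHP", it is often just rejecting requests that lack a browser-like User-Agent header, which PHP's HTTP stream wrapper omits unless one is configured. Whether that applies here is unknown, but a stream context can set the header:
// Hypothetical sketch: fetch the remote script while presenting a
// browser-like User-Agent header through a stream context.
$context = stream_context_create([
    'http' => [
        'header' => "User-Agent: Mozilla/5.0 (compatible; MyFetcher/1.0)\r\n",
    ],
]);
$output = file_get_contents("https://www.remote_server.com/files/the.php?param1=12", false, $context);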
For what it's worth, here is the code I used to test the various methods to read file output:
<?php
//$remotefile = 'http://www.xencomsoftware.net/configurator/tracker/ip.php';
$remotefile = "http://php.net/manual/en/function.fread.php";

function getList1(){
    global $remotefile;
    $output = file_get_contents($remotefile);
    return htmlentities($output);
}

function getList2(){
    global $remotefile;
    $file = fopen($remotefile, "r");
    $output = "";
    while (!feof($file)) { // while not the End Of File
        $output .= fread($file, 1024); // reads 1024 bytes at a time and appends to the variable as a string
    }
    fclose($file);
    return htmlentities($output);
}

function getList3(){
    global $remotefile;
    $ch = curl_init(); // create curl resource
    curl_setopt($ch, CURLOPT_URL, $remotefile); // set url
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the transfer as a string
    $output = curl_exec($ch); // $output contains the output string
    curl_close($ch); // close curl resource to free up system resources
    return htmlentities($output);
}

function getList4(){
    global $remotefile;
    $output = ""; // default in case the request fails
    $r = new HttpRequest($remotefile, HttpRequest::METH_GET);
    try {
        $r->send();
        if ($r->getResponseCode() == 200) {
            $output = $r->getResponseBody();
        }
    } catch (Exception $e) {
        echo 'Caught exception: ', $e->getMessage(), "\n";
    }
    return htmlentities($output);
}

function dumpList($ix, $list){
    $len = strlen($list);
    echo "<p><b>--- getList$ix() ---</b></p>";
    echo "<div>Length: $len</div>";
    for ($i = 0; $i < 10; $i++) {
        echo "$i: $list[$i] <br>";
    }
    // echo "<p>$list</p>";
}

dumpList(1, getList1()); // doesn't work! You cannot include/require a remote file.
dumpList(2, getList2());
dumpList(3, getList3());
dumpList(4, getList4());
?>

PHP array not returning full output

I have PHP code which SSHes to a remote machine and executes a shell script to get a list of folders. The remote machine contains more than 300 folders in the path specified in the shell script. The shell script executes well and returns the list of all folders, but when I retrieve this output in PHP, I only get around 150 to 200 folders.
Here is my PHP code:
<?php
if (!function_exists("ssh2_connect")) die("function ssh2_connect doesn't exist");

if (!($con = ssh2_connect("ip.add.re.ss", "port")))
{
    echo "fail: unable to establish connection";
}
else
{
    if (!ssh2_auth_password($con, "username", "password"))
    {
        echo "fail: unable to authenticate";
    }
    else
    {
        $stream = ssh2_exec($con, "/usr/local/listdomain/listproject.sh");
        stream_set_blocking($stream, true);
        $item = fread($stream, 4096);
        $items = explode(" ", $item);
        print_r($items);
    }
}
?>
And this is my shell script.
#!/bin/bash
var=$(ls /home);
echo $var;
What is the issue with PHP here? Is there any limit on array size in PHP when getting data dynamically like this? Please advise, as I am a complete beginner with PHP.
Thanks.
You're only reading one block of 4096 characters from your stream. If your folder list is longer than this, you'll lose the rest. You need something like this:
stream_set_blocking($stream, true);
$item = "";
// continue reading while there's more data
while ($input = fread($stream, 4096)) {
    $item .= $input;
}
$items = explode(" ", $item);
print_r($items);
You asked fread() to read only 4096 bytes. In the examples portion of fread()'s documentation, it is suggested that stream_get_contents() be used for reading a file handle out to its end. Otherwise, you have to use a loop and keep on reading data until feof($stream) returns TRUE.
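For illustration, the stream_get_contents() route mentioned above could look like this (my own sketch, reusing the variables from the question):
$stream = ssh2_exec($con, "/usr/local/listdomain/listproject.sh");
stream_set_blocking($stream, true);
$item = stream_get_contents($stream); // reads the stream until EOF, no manual loop
fclose($stream);
$items = explode(" ", trim($item));
print_r($items);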

Lacking photos from external URL - PHP

I am fetching photos from an external URL on the server side. I am using the simple PHP DOM library for this, as per an SO suggestion, but the results are incomplete: for some sites I am not able to get all the photos.
$url below holds an example external site which is not giving me all the images.
$url = "http://www.target.com/c/baby-baby-bath-bath-safety/-/N-5xtji#?lnk=nav_t_spc_3_inc_1_1";
$html = file_get_contents($url);
$doc = new DOMDocument();
@$doc->loadHTML($html);
$tags = $doc->getElementsByTagName('img');
foreach ($tags as $tag) {
    echo $imageUrl = $tag->getAttribute('src');
    echo "<br />";
}
Is it possible to have functionality/accuracy similar to the Firefox option
Firefox -> Tools -> Page Info -> Media?
I just want this to be more accurate, as the existing library is not fetching all the images. I also tried file_get_contents, which likewise does not fetch all the images.
You need to use regular expressions to get the images' src values. DOMDocument builds the whole DOM structure in memory, which you don't need here. When you have the URLs, use file_get_contents() and write the data to files. Also increase max_execution_time if you'll parse many pages.
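Roughly, that regex approach could look like the sketch below (my own illustration; regular expressions are brittle against arbitrary HTML, so treat it as a starting point only):
$html = file_get_contents($url);
// capture the src attribute of every <img> tag
preg_match_all('/<img[^>]+src=["\']([^"\']+)["\']/i', $html, $matches);
foreach ($matches[1] as $imageUrl) {
    echo $imageUrl . "<br />";
}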
Download images from remote server
function save_image($sourcePath, $targetPath)
{
    $in = fopen($sourcePath, "rb");
    $out = fopen($targetPath, "wb");
    while ($chunk = fread($in, 8192))
    {
        fwrite($out, $chunk, 8192);
    }
    fclose($in);
    fclose($out);
}
$src = "http://www.example.com/thumbs/thumbs-t2/1/ts_11083.jpg"; //image source
$target = dirname(__FILE__)."/images/pic.jpg"; //where to save image with new name
save_image($src,$target);
