Retrieve data from url and save in php - php

I am trying to retrieve the html from file get contents in php then save it to a php file so I can include it into my homepage.
Unfortunately my script isn't saving the data into the file. I also need to verwrite this data on a daily basis as it will be setup with a cron job.
Can anyone tell me where I am going wrong please? I am just learning php :-)
<?php
$richSnippets = file_get_contents('http://website.com/data');
$filename = 'reviews.txt';
$handle = fopen($filename,"x+");
$somecontent = echo $richSnippets;
fwrite($handle,$somecontent);
echo "Success";
fclose($handle);
?>

A couple of things,
http://website.com/data gets a 404 error, it doesn't exist.
Change your code to
$site = 'http://www.google.com';
$homepage = file_get_contents($site);
$filename = 'reviews.txt';
$handle = fopen($filename,"w");
fwrite($handle,$homepage);
echo "Success";
fclose($handle);
Remove $somecontent = echo $richSnippets; it doesn't do anything.
if you have the proper permissions it should work.
Be sure that your pointing to an existing webpage.
Edit
When cURL is enabled you can use the following function
function get_web_page( $url ){
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_USERAGENT => "spider", // who am i
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
curl_close( $ch );
return $content;
}
Now change
$homepage = file_get_contents($site);
in to
$homepage = get_web_page($site);

You should use / instead of ****
$homepage = file_get_contents('http://website.com/data');
Also this part
$somecontent = echo $richSnippets;
I don't see $richSnippets above... it's probably not declared?
You probably want to do this:
fwrite($handle,$homepage);

Related

Cron job from cpanel retunning error while running script from URL is not reurning any error

I wrote a script on my web server. when I run my code using the URL (http://www.eurekabd.com/RMS/corn.php) and found ok. Then I made a corn job scheduler in cPanel by this command /usr/local/bin/php /home/eurekabd/public_html/RMS/corn.php. When cron job run this command it sends me mail that
Warning: file_get_contents(shakil.txt): failed to open stream: No such file or directory in /home/eurekabd/public_html/RMS/corn.php on line 12
<?php
$response = get_web_page("http://144.48.2.11/Test_PHP/live_check.php");
//$response = get_web_page("http://localhost/phpScriptTesting/request.php");
$txtFile = "shakil.txt";
echo $response;
/*if(! filesize( $txtFile ) )
{
echo " Not EMPTY FILE ";
}*/
echo "Server is =".$response;
if ((trim(file_get_contents('shakil.txt')) != false) && ($response == "OK")) {
echo "-->I have some data to send.";
//http://144.48.2.11/Test_PHP/index_2.php/?data={1239{fnc:15,sts:[0,8,25,526,19][0103044396424b7f0c][01030caf854366a45b436695684366376d][01030c9ba63b447bb33b7249523a1d4641][01030c00003f8000003f8000003f80d6f8][010304999a3f1924ba][x][x][x][x][x][0b03080113045403ff03f057ea][x]}})
$file_content = file_get_contents("shakil.txt");
$file_content_separated_by_spaces = explode(";", $file_content);
//print_r ($file_content_separated_by_spaces)."</br>";
$loop_size = sizeof($file_content_separated_by_spaces)-2;
echo $loop_size;
for ($x = 0; $x <= $loop_size; $x++) {
$dataFormatforServer = str_replace(' ', '_', $file_content_separated_by_spaces[$x]);
$textData = "http://144.48.2.11/Test_PHP/index_3.php/?data=".$dataFormatforServer;
//$textData = "http://localhost/phpScriptTesting/test.php/?data=".$dataFormatforServer;
echo $textData."</br>";
$sendData = get_web_page($textData);
//echo $sendData."</br>";
}
$file_to_delete = 'shakil.txt';
unlink($file_to_delete);
$my_file = 'shakil.txt';
$handle = fopen($my_file, 'w') or die('Cannot open file: '.$my_file);
}
else{
echo "I don't Have Any Data to send.";
}
function get_web_page($url) {
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
CURLOPT_ENCODING => "", // handle compressed
CURLOPT_USERAGENT => "test", // name of client
CURLOPT_AUTOREFERER => true, // set referrer on redirect
CURLOPT_CONNECTTIMEOUT => 300, // time-out on connect
CURLOPT_TIMEOUT => 300, // time-out on response
);
$ch = curl_init($url);
curl_setopt_array($ch, $options);
$response = curl_getinfo($ch, CURLINFO_HTTP_CODE);
$content = curl_exec($ch);
curl_close($ch);
return $content;
}
?>
you may add/change php.ini attributes for that..
allow_url_fopen = On
It may resolve the issue..

Login with curl and move to another page

I'm trying to access one page in a website with CURL, however it needs to be logged in i tried the code to login and it was successful
<?php
$user_agent = "Mozilla/5.0 (X11; Linux i686; rv:24.0) Gecko/20140319 Firefox/24.0 Iceweasel/24.4.0";
$curl_crack = curl_init();
CURL_SETOPT($curl_crack,CURLOPT_URL,"https://www.vininspect.com/en/account/login");
CURL_SETOPT($curl_crack,CURLOPT_USERAGENT,$user_agent);
CURL_SETOPT($curl_crack,CURLOPT_PROXY,"183.78.169.60:37899");
CURL_SETOPT($curl_crack,CURLOPT_PROXYTYPE,CURLPROXY_SOCKS5);
CURL_SETOPT($curl_crack,CURLOPT_POST,True);
CURL_SETOPT($curl_crack,CURLOPT_POSTFIELDS,"LoginForm[email]=naceriwalid%40hotmail.com&LoginForm[password]=passwordhere&toploginform[rememberme]=0&yt1=&toploginform[rememberme]=0");
CURL_SETOPT($curl_crack,CURLOPT_RETURNTRANSFER,True);
CURL_SETOPT($curl_crack,CURLOPT_FOLLOWLOCATION,True);
CURL_SETOPT($curl_crack,CURLOPT_COOKIEFILE,"cookie.txt"); //Put the full path of the cookie file if you want it to write on it
CURL_SETOPT($curl_crack,CURLOPT_COOKIEJAR,"cookie.txt"); //Put the full path of the cookie file if you want it to write on it
CURL_SETOPT($curl_crack,CURLOPT_CONNECTTIMEOUT,30);
CURL_SETOPT($curl_crack,CURLOPT_TIMEOUT,30);
$exec = curl_exec($curl_crack);
if(preg_match("/^you are logged|logout|successfully logged$/i",$exec))
{
echo "yoooha";
}
?>
Now the only problem I'm facing let's say that i don't want to be redirected to the logged in page, i want to be redirected to this page http://example.com/buy, how i can do that in the same code?
If you want to go to /buy after you log in, just use the same curl handle and issue another request for that page. cURL will retain the cookies for the duration of the handle (and on subsequent requests since you are saving them to a file and reading them back with the cookie jar.
For example:
$user_agent = "Mozilla/5.0 (X11; Linux i686; rv:24.0) Gecko/20140319 Firefox/24.0 Iceweasel/24.4.0";
$curl_crack = curl_init();
CURL_SETOPT($curl_crack,CURLOPT_URL,"https://www.vininspect.com/en/account/login");
CURL_SETOPT($curl_crack,CURLOPT_USERAGENT,$user_agent);
CURL_SETOPT($curl_crack,CURLOPT_PROXY,"183.78.169.60:37899");
CURL_SETOPT($curl_crack,CURLOPT_PROXYTYPE,CURLPROXY_SOCKS5);
CURL_SETOPT($curl_crack,CURLOPT_POST,True);
CURL_SETOPT($curl_crack,CURLOPT_POSTFIELDS,"LoginForm[email]=naceriwalid%40hotmail.com&LoginForm[password]=passwordhere&toploginform[rememberme]=0&yt1=&toploginform[rememberme]=0");
CURL_SETOPT($curl_crack,CURLOPT_RETURNTRANSFER,True);
CURL_SETOPT($curl_crack,CURLOPT_FOLLOWLOCATION,True);
CURL_SETOPT($curl_crack,CURLOPT_COOKIEFILE,"cookie.txt"); //Put the full path of the cookie file if you want it to write on it
CURL_SETOPT($curl_crack,CURLOPT_COOKIEJAR,"cookie.txt"); //Put the full path of the cookie file if you want it to write on it
CURL_SETOPT($curl_crack,CURLOPT_CONNECTTIMEOUT,30);
CURL_SETOPT($curl_crack,CURLOPT_TIMEOUT,30);
$exec = curl_exec($curl_crack);
if(preg_match("/^you are logged|logout|successfully logged$/i",$exec))
{
$post = array('search' => 'keyword', 'abc' => 'xyz');
curl_setopt($curl_crack, CURLOPT_POST, 1); // change back to GET
curl_setopt($curl_crack, CURLOPT_POSTFIELDS, http_build_query($post)); // set post data
curl_setopt($curl_crack, CURLOPT_URL, 'http://example.com/buy'); // set url for next request
$exec = curl_exec($curl_crack); // make request to buy on the same handle with the current login session
}
Here are some other examples of using PHP & cURL to make multiple requests:
How to login in with Curl and SSL and cookies (links to multiple other examples)
Grabbing data from a website with cURL after logging in?
Pinterest login with PHP and cURL not working
Login to Google with PHP and Curl, Cookie turned off?
PHP Curl - Cookies problem
You just need to change the URL after login is compete and then run curl_exec Like this :
<?php
//login code goes here
if(preg_match("/^you are logged|logout|successfully logged$/i",$exec))
{
echo "Logged in! now lets go to other page while we are logged in, shall we?";
//The new URL that you want to go to while logged in goes in bottom line :
CURL_SETOPT($curl_crack, CURLOPT_URL, "https://new_url_to_go.com/something");
$exec = curl_exec($curl_crack);
// now $exec contains the the content of new page with login
}
curl_close($curl_crack);//dont forgert to close curl session at last
?>
First define these function to get an associative array containing the url header and content (see http://nadeausoftware.com/articles/2007/06/php_tip_how_get_web_page_using_curl):
/**
* Get a web file (HTML, XHTML, XML, image, etc.) from a URL. Return an
* array containing the HTTP server response header fields and content.
*/
function get_web_page( $url, $params, $is_post = true )
{
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_USERAGENT => "Mozilla/4.0 (compatible;)", // i'm mozilla
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
);
if($is_post) { //use POST
$options[CURLOPT_POST] = 1;
$options[CURLOPT_POSTFIELDS] = http_build_query($params);
} else { //use GET
$url = $url.'?'.http_build_query($params);
}
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
$header['errno'] = $err;
$header['errmsg'] = $errmsg;
$header['content'] = $content;
return $header;
}
try this to load the 'http://www.example.com/buy' after login is successful.
// after curl login setup
$exec = curl_exec($curl_crack);
if(preg_match("/^you are logged|logout|successfully logged$/i",$exec))
{
// close login CURL resource, and free up system resources
curl_close($curl_crack);
$params = array('product_id'=>'xxxx', qty=>10);
$url = 'http://www.example.com/buy';
//use above function to get the url content via POST params
$result = get_web_page($url, $params, true);
if($result['http_code'] == 200) {
//echo the content
echo $result['content'];
die();
}
}

PHP stalls on loop of curl function with bad url

I have a database of a few thousand URL's that I am checking for links on pages (end up looking for specific links) and so I am throwing the below function through a loop and every once and awhile one of the URL's is bad and then the entire program just stalls and stops running and starts building up memory used. I thought adding the CURLOPT_TIMEOUT would fix this but it didn't. Any ideas?
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_USERAGENT => "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13'", // who am i
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_TIMEOUT => 2, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
CURLOPT_POST => 0, // i am sending post data
CURLOPT_POSTFIELDS => $curl_data, // this are my post vars
CURLOPT_SSL_VERIFYHOST => 0, // don't verify ssl
CURLOPT_SSL_VERIFYPEER => false, //
CURLOPT_VERBOSE => 1 //
);
$ch = curl_init($url);
curl_setopt_array($ch,$options);
$content = curl_exec($ch);
$err = curl_errno($ch);
$errmsg = curl_error($ch) ;
$header = curl_getinfo($ch);
curl_close($ch);
// $header['errno'] = $err;
// $header['errmsg'] = $errmsg;
$header['content'] = $content;
#Extract the raw URl from the current one
$scheme = parse_url($url, PHP_URL_SCHEME); //Ex: http
$host = parse_url($url, PHP_URL_HOST); //Ex: www.google.com
$raw_url = $scheme . '://' . $host; //Ex: http://www.google.com
#Replace the relative link by an absolute one
$relative = array();
$absolute = array();
#String to search
$relative[0] = '/src="\//';
$relative[1] = '/href="\//';
#String to remplace by
$absolute[0] = 'src="' . $raw_url . '/';
$absolute[1] = 'href="' . $raw_url . '/';
$source = preg_replace($relative, $absolute, $content); //Ex: src="/image/google.png" to src="http://www.google.com/image/google.png"
return $source;
curl_exec will return false if it cannot find the URL.
The HTTP status code will be zero.
Check the results of curl_exec and check the HTTP status
code too.
$content = curl_exec($ch);
$httpStatus = curl_getinfo($ch, CURLINFO_HTTP_CODE);
if ( $content === false) {
if ($httpStatus == 0) {
$content = "link was not found";
}
}
....
The way you have it currently, the line of code
header['content'] = $content;
will get the value of false. This is not what you
want.
I am using curl_exec and my code does not stall if
it cannot find the url. The code keeps running.
You may end up with nothing in your browser though
and a message in the Firebug Console like "500 Internal Server Error".
Maybe that's what you mean by stall.
So basically you don't know and just guess that the curl request is stalling.
For this answer I can only guess as well then. You might need to set one of the following curl option as well: CURLOPT_CONNECTTIMEOUT
If the connect already stalls, the other timeout setting might not be taken into account. I'm not entirely sure, but please see Why would CURL time out in 1000ms when I have set up timeout upto 3000ms?.

curl_init causes error on specific web pages

If a server has the function "curl_init" disabled, would that cause an error number 7 and the error message: "couldn't connect to host" or is there another reason why I am getting this error?
And if it is a security issue is there a way around this since I do not have control over the security of these websites?
<?php
$url = $_POST["url"];
$url = $url.'/';
echo "<br> <b>URL: </b>".$url;
function get_web_page($url)
{
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
//CURLOPT_PROXY => "localhost:80",
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
$header['errno'] = $err;
$header['errmsg'] = $errmsg;
$header['content'] = $content;
$html= get_web_page($url);
echo '<br> <b>Error Number: </b>'.$html['errno'];
echo '<br> <b>Error Message: </b>'.$html['errmsg'];
?>
I hope this is enough code to work off of.
This is the site im working on. At the bottom you should be able to enter any URL. If you put www.csun.edu it gets that error. But if you put library.csun.edu it does not.
http://www.csun.edu/~ppm90976/profilepoojamanjrekar.html
Could you please give your code? I will try to answer your question, but based on your question, I'm not sure if I can.
curl_init() is a native function since PHP4. Unless I'm mistaken, it can't be disabled. On top of that, if it were to be disabled, it wouldn't present a 'couldn't connect to host' (errno:7) error.
So what I think you meant to ask is what would happen 'if the remote server doesn't have it enabled'. This question is actually moot, since what curl does is basically 'fetch' a page, when visiting another site/ page under specific conditions. As long as a web page is viewable under the conditions you set (using the curl_setopt() function), it will work.
Now, what I recall is that the error number 7 could happen if the page you requested has a redirect and your curl options specify that it should not follow redirections. To check if this is the cause of your issue, set this option before you execute the url:
curl_setopt('CURLOPT_FOLLOWLOCATION', true);
Let me know if that helped.

PHP: how to load file from different server as string?

I am trying to load an XML file from a different domain name as a string. All I want is an array of the text within the < title >< /title > tags of the xml file, so I am thinking since I am using php4 the easiest way would be to do a regex on it to get them. Can someone explain how to load the XML as a string? Thanks!
You could use cURL like the example below. I should add that regex-based XML parsing is generally not a good idea, and you may be better off using a real parser, especially if it gets any more complicated.
You may also want to add some regex modifiers to make it work across multiple lines etc., but I assume the question is more about fetching the content into a string.
<?php
$curl = curl_init('http://www.example.com');
//make content be returned by curl_exec rather than being printed immediately
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($curl);
if ($result !== false) {
if (preg_match('|<title>(.*)</title>|i', $result, $matches)) {
echo "Title is '{$matches[1]}'";
} else {
//did not find the title
}
} else {
//request failed
die (curl_error($curl));
}
first use
file_get_contents('http://www.example.com/');
to get the file,
insert in to var.
after parse the xml
the link is
http://php.net/manual/en/function.xml-parse.php
have example in the comments
If you're loading well-formed xml, skip the character-based parsing, and use the DOM functions:
$d = new DOMDocument;
$d->load("http://url/file.xml");
$titles = $d->getElementsByTagName('title');
if ($titles) {
echo $titles->item(0)->nodeValue;
}
If you can't use DOMDocument::load() due to how php is set up, the use curl to grab the file and then do:
$d = new DOMDocument;
$d->loadXML($grabbedfile);
...
I have this function as a snippet:
function getHTML($url) {
if($url == false || empty($url)) return false;
$options = array(
CURLOPT_URL => $url, // URL of the page
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_USERAGENT => "spider", // who am i
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 3, // stop after 3 redirects
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
//Ending all that cURL mess...
//Removing linebreaks,multiple whitespace and tabs for easier Regexing
$content = str_replace(array("\n", "\r", "\t", "\o", "\xOB"), '', $content);
$content = preg_replace('/\s\s+/', ' ', $content);
$this->profilehtml = $content;
return $content;
}
That returns the HTML with no linebreaks, tabs, multiple spaces, etc, only 1 line.
So now you do this preg_match:
$html = getHTML($url)
preg_match('|<title>(.*)</title>|iUsm',$html,$matches);
and $matches[1] will have the info you need.

Categories