Alright I am still learning my functions in php but this particular piece of code has me stuck. Credit goes to http://www.barattalo.it/2010/08/29/php-bot-to-get-wikipedia-definitions/ for the wikipedia query segment. I thought this looked interesting and am trying to use the code provided to echo data from the function. This is what I have:
<?php
function wikidefinition($s) {
$url = "http://wikipedia.org/w/api.php?action=opensearch&search=".urlencode($s)."&format=xml&limit=1";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HTTPGET, TRUE);
curl_setopt($ch, CURLOPT_POST, FALSE);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_NOBODY, FALSE);
curl_setopt($ch, CURLOPT_VERBOSE, FALSE);
curl_setopt($ch, CURLOPT_REFERER, "");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_MAXREDIRS, 4);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.1; he; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8");
$page = curl_exec($ch);
$xml = simplexml_load_string($page);
if((string)$xml->Section->Item->Description) {
return array((string)$xml->Section->Item->Text, (string)$xml->Section->Item->Description, (string)$xml->Section->Item->Url);
} else {
return "blank";
}
}
$define = wikidefinition("test");
echo $define;
?>
however this simply echos "array"
I know the code is getting to the if/else statement because if you change the input in "$define = wikidefinition("test");" to some random key combination such as "qoigenqnge" it will echo "blank" I cannot figure out why it is only echoing "array" instead of the data inside the array. Probably something stupid, but I have been reading up on arrays for a while and cant find any info. Any help will be great thanks!
Have you tried:
print_r($define);
instead of echo ?
Related
I get Error When Curl Data From URL ( Url From Array get in file ) .
When I use Normaly URL , It Work Correctly !
But When I get URL from Array and run a loop for it , and call function curl to run every line url , it not work ?
I don't know why , but i need someone help me .
I had turn on error report php , but nothing error !
Example File "link.txt"
https://www.fshare.vn/file/4UNG2NRPW5OQ
https://www.fshare.vn/file/CFTJYXKPMN7Q
https://www.fshare.vn/file/RMHD2XBFY93F
https://www.fshare.vn/file/I4TIUE6E9QOV
https://www.fshare.vn/file/5PV15EB38M7T
https://www.fshare.vn/file/ZDLJNDNWCNYM
https://www.fshare.vn/file/O1VYZUXXJNCI
https://www.fshare.vn/file/SD5C1VZ38IPN
And My PHP code
<?php
error_reporting(E_ALL);
$lines = file('link.txt', FILE_IGNORE_NEW_LINES); // Turn Every Line To Array From File , Or use other below Method
//$lines = explode("\n",fopen('link.txt',"r")); // Other Method Get Content To Array
//$lines = explode("\n",file_get_contents('link.txt')); // Other Method Get Content To Array
foreach($lines as $key => $value){
echo $value . "</br>";
echo get_data($value); // It not Work , Why is that ?
echo get_data("https://www.fshare.vn/file/ACKM6ONXFZJE"); // It Work Normally
die();
}
function get_data($url = null) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type' => 'application/x-www-form-urlencoded', 'charset' => 'utf-8'));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0");
curl_setopt($ch, CURLOPT_COOKIESESSION, true);
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($ch, CURLOPT_REFERER, 'http://www.google.com.vn/');
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
Right now , i have tested again and it work correctly !
Don't know what happened !
I'm trying to get content of a page through curl call. But I'm getting only an empty array. Here is my code
print_r(get_data('https://www.realestate.com.au/sold/in-9%2f20%3b/list-1'));
function get_data($url)
{
$ch = curl_init();
$timeout = 5;
// the url to fetch
curl_setopt($ch, CURLOPT_URL, $url);
// return result as a string rather than direct output
curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.52 Safari/537.17');
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_VERBOSE, 1);
// set max time of cURL execution
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
I have also tried different method like file_get_contents function but always getting empty page.
given function is to complicated I think its a copy paste function, as I can see from
return $result; } which is not needed and you should get syntax error.
probably that code will work without that bracket.
I cleared and shortened code, and giving a working and (tested) solution.
NOTE: You can add the code I removed back
$url 'your url';
//Usaqe of function I recomend to use parse_url();
echo get_data($url);
function get_data($url) {
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
Hey I am trying to login to the NJIT site to check if username and password are correct. For some reason I keep getting rejected even if I use correct credentials. Also how do I strip the $result to check if it contains "Fail" which would mean the credentials were wrong. Here is my code.
Main:
<?PHP
session_start();
require_once('functions.php');
//$UCID=$_POST['UCID'];
//$Pass=$_POST['Pass'];
$UCID="jko328";
$Pass="password";
$credentialsNJIT="user=".$UCID."&pass=".$Pass;
$njit_url="https://cp4.njit.edu/cp/home/login";
$njit_result=goCurlNJIT($credentialsNJIT, $njit_url);
echo $result;
?>
Here is the cURL function:
function goCurlNJIT($postdata, $url){
session_start();
$ch = curl_init();
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt ($ch, CURLOPT_TIMEOUT, 60);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_PROTOCOLS, CURLPROTO_HTTPS);
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt ($ch, CURLOPT_REFERER, $url);
curl_setopt ($ch, CURLOPT_POSTFIELDS, $postdata);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
curl_setopt ($ch, CURLOPT_POST, true);
$result = curl_exec($ch);
curl_close($ch);
if(strpos($result, "Failed") === false){
$response = "NJIT did not like credentials";
}
else{
$response = "NJIT liked your credentials";
}
echo $response;
}
Actually when we load a page it saves the cookie and send it . So to sign in you first need to acess the page without credential and save the cookies . In next request you need to send the cookies . To avoid bot script login usually websites have dynamic hidden fields and other securities .. in this case you cant log on .
I'm updating the function too much and making it way more flexible. --You can update it further more if you want.
First and most importantly, you need to create a text file named cookie.txt in the directory where your scrapping file is.
function goCurlNJIT($header = array(), $url, $post = false)
{
$cookie = "cookie.txt";
$ch = curl_init();
curl_setopt($ch, CURLOPT_ENCODING, 'gzip,deflate,sdch');
if (isset($header) && !empty($header))
{
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
}
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 200);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36");
curl_setopt($ch, CURLOPT_COOKIEJAR, realpath($cookie));
curl_setopt($ch, CURLOPT_COOKIEFILE, realpath($cookie));
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_REFERER, $url);
//if it's a POST request instead of GET
if (isset($post) && !empty($post) && $post)
{
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
} //endif
$data = curl_exec($ch);
curl_close($ch);
if($info['http_code'] == 200){
return ($data); //this will return page on successful, false otherwise
}else{
return false;
}
}
if you'd paid a close look to Requested Headers, you can notice that there is one unusual header there, that is Upgrade-Insecure-Requests:1(I personally haven't seen this before, so it's wise to send this along with request).
Next the request that you're posting is not like as it should be, you're missing stuff.
$UCID="jko328";
$Pass="password";
$credentialsNJIT="user=".$UCID."&pass=".$Pass; // where is uuid?
it should be something like this. You're skipping uuid from post string.
pass=password&user=jko328&uuid=0xACA021
so putting altogether,
$post['user'] = "jko328";
$post['pass'] = "password";
$post['uuid'] = '0xACA021';
$urlToPost = "https://cp4.njit.edu/cp/home/login";
$header['Upgrade-Insecure-Requests'] = 1;
//Now make call to function, and it'll work fine.
echo goCurlNJIT($header, $urlToPost, http_build_query($post));
and this will work fine. Make sure you've created cookie.txt file in the directory where your this scripts is.
Im trying to fetch content of a website about a tourament. I want to display the results on a temporary page.
I'm tying to fetch this page:
http://www.tournamentsoftware.com/sport/draw.aspx?id=600CA297-99CA-4420-AE1A-698BA10C39B0&draw=1
I want to return the content of this page, and afterwards fetch the specific table with the fixtures.
The script i'm using returns a 404 Not found error, while the url is present.
My script:
function nxs_cURLTest($url, $msg, $testText){
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSLVERSION,3); // Apparently 2 or 3
$response = curl_exec($ch);
$errmsg = curl_error($ch);
$cInfo = curl_getinfo($ch);
curl_close($ch);
echo "Testing ... ".$url." - ".$cInfo['url']."<br />";
if (stripos($response, $testText)!==false)
echo "....".$msg." - OK<br />";
else
{
echo "....<b style='color:red;'>".$msg." - Problem</b><br /><pre>";
print_r($errmsg);
print_r($cInfo);
print_r(htmlentities($response));
echo "</pre>There is a problem with cURL. You need to contact your server admin or hosting provider.";
}
}
nxs_cURLTest("http://www.tournamentsoftware.com/sport/draw.aspx?id=600CA297-99CA-4420-AE1A-698BA10C39B0&draw=1", "HTTPS to Toernooi.nl", 'link rel="canonical" href="http://www.tournamentsoftware.com/sport/draw.aspx?id=600CA297-99CA-4420-AE1A-698BA10C39B0&draw=1"');
Can anyone help me on this one?
I used the following on xampp win7 with chrome and it returned data
$url = 'http://www.tournamentsoftware.com/sport/draw.aspx?id=600CA297-99CA-4420-AE1A-698BA10C39B0&draw=1';
echo curl_scrap($url);
function curl_scrap($url) {
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
It did not return the same source as just using chrome to hit url, as function doesn't set anything like user agent field etc, but returned the data you want, just a quick and dirty test.
I am trying to make a API call to wikipedia through: http://en.wikipedia.org/w/api.php?action=parse&page=Petunia&format=xml, but the xml is full with html and css tags.
Is there a way to fetch only plain text without tags? Thanks!
*Edit 1:
$json = json_decode(file_get_contents('http://en.wikipedia.org/w/api.php?action=parse&page=Petunia&format=json'));
$txt = strip_tags($json->text);
var_dump($json);
Null displayed.
Question was partially answered here
$url = 'http://en.wikipedia.org/w/api.php?action=parse&page=Petunia&format=json&prop=text';
$ch = curl_init($url);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_USERAGENT, "TestScript"); // required by wikipedia.org server
$c = curl_exec($ch);
$json = json_decode($c);
var_dump(strip_tags($json->{'parse'}->{'text'}->{'*'}))
I was not able to use file_get_contents but it works fine with cURL.
it is possible to fetch info or description from wikipedia by using xml.
$url = "http://en.wikipedia.org/w/api.php?action=opensearch&search=".$term."&format=xml&limit=1";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HTTPGET, TRUE);
curl_setopt($ch, CURLOPT_POST, FALSE);
curl_setopt($ch, CURLOPT_HEADER, false); // Include head as needed
curl_setopt($ch, CURLOPT_NOBODY, FALSE); // Return body
curl_setopt($ch, CURLOPT_VERBOSE, FALSE); // Minimize logs
curl_setopt($ch, CURLOPT_REFERER, ""); // Referer value
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE); // No certificate
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); // Follow redirects
curl_setopt($ch, CURLOPT_MAXREDIRS, 4); // Limit redirections to four
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Return in string
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.1; he; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8"); // Webbot name
$page = curl_exec($ch);
$xml = simplexml_load_string($page);
if((string)$xml->Section->Item->Description) {
print_r(array((string)$xml->Section->Item->Text,
(string)$xml->Section->Item->Description,
(string)$xml->Section->Item->Url));
} else {
echo "sorry";
}
But curl must be install on server... have a nice day...