Parse REST Response Using PHP / CURL - php

I'm trying the REST API here: https://www.semrush.com/api-analytics/ , specifically the Organic Results, but no matter what I've tried, I can't seem to manipulate the data. Can someone tell me how to do this? I've tried SimpleXML, JSON, and even breaking up the response via explode() but I must be missing something because all I can do is push the result to the beginning of an array and not actually break it up.
This is my current code:
$url = "http://api.semrush.com/?type=phrase_organic&key=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX&display_limit=10&export_columns=Dn,Ur&phrase=seo&database=us";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch,CURLOPT_USERAGENT,"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
curl_close($ch);
var_dump($result);
With the result being:
string 'Domain;Url
site-analyzer.com;https://www.site-analyzer.com/
woorank.com;https://www.woorank.com/
hubspot.com;http://blog.hubspot.com/blog/tabid/6307/bid/33164/6-SEO-Tools-to-Analyze-Your-Site-Like-Google-Does.aspx
seoworkers.com;http://www.seoworkers.com/tools/analyzer.html
seositecheckup.com;http://seositecheckup.com/
site-seo-analysis.com;http://www.site-seo-analysis.com/
webseoanalytics.com;http://www.webseoanalytics.com/free/seo-tools/web-seo-analysis.php
seocentro.com;http://www.seocentro.com/t'... (length=665)
Is there a simple way to break this up so I can manipulate or reformat the response?

You need to properly explode the new-line characters in order to get to the csv structure, then parse it, as csv
foreach(preg_split("/((\r?\n)|(\r\n?))/", $response) as $key=>$line){
if ($key!=0) {
list($domain,$url) = str_getcsv($line,';');
print 'Domain: ' . $domain . ', URL: ' . $url . PHP_EOL;
}
}
Using the sample response from https://www.semrush.com/api-analytics/#phrase_organic,
the above will output
Domain: wikipedia.org, URL: http://en.wikipedia.org/wiki/Search_engine_optimization
Domain: searchengineland.com, URL: http://searchengineland.com/guide/what-is-seo
Domain: moz.com, URL: http://moz.com/beginners-guide-to-seo
The if statement is there to filter out the first line, the csv header.

Well, we could explode by space " ", then by ;
$response = explode(" ", trim(str_replace("Domain;Url", "", $response)));
$readableResponse = [];
foreach($response as $r)
{
$e = explode(";", $r);
$readableResponse[$e[0]] = $e[1];
}
print_r($readableResponse);
Ie. Live on phpsandbox
[searchengineland.com] => http://searchengineland.com/guide/what-is-seo
[wikipedia.org] => https://en.wikipedia.org/wiki/Search_engine_optimization
....

Related

How to get a specified row using cUrl PHP

Hey guys I use curl to communicate web external server, but the type of response is html, I was able to convert it to json code (more than 4000 row) but I have no idea how to get specified row which contains my result. Any idea ?
Here is my cUrl code :
require_once('getJson.php');
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.reputationauthority.org/domain_lookup.php?ip=website.com&Submit.x=9&Submit.y=5&Submit=Search');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
$data = curl_exec($ch);
$httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
$data = '<<<EOF'.$data.'EOF';
$json = new GetJson();
header("Content-Type: text/plain");
$res = json_encode($json->html_to_obj($data), JSON_PRETTY_PRINT);
$myArray = json_decode($res,true);
For getJson.php
class GetJson{
function html_to_obj($html) {
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
return $this->element_to_obj($dom->documentElement);
}
function element_to_obj($element) {
if ($element->nodeType == XML_ELEMENT_NODE){
$obj = array( "tag" => $element->tagName );
foreach ($element->attributes as $attribute) {
$obj[$attribute->name] = $attribute->value;
}
foreach ($element->childNodes as $subElement) {
if ($subElement->nodeType == XML_TEXT_NODE) {
$obj["html"] = $subElement->wholeText;
}
else {
$obj["children"][] = $this->element_to_obj($subElement);
}
}
return $obj;
}
}
}
My idea is instead of Browsing rows to achieve lign 2175 (doing something like : $data['children'][2]['children'][7]['children'][3]['children'][1]['children'][1]['children'][0]['children'][1]['children'][0]['children'][1]['children'][2]['children'][0]['children'][0]['html'] is not a good idea to me), I want to go directly to it.
If the HTML being returned has a consistent structure every time, and you just want one particular value from one part of it, you may be able to use regular expressions to parse the HTML and find the part you need. This is an alternative you trying to put the whole thing into an array. I have used this technique before to parse a HTML document and find a specific item. Here's a simple example. You will need to adapt it to your needs, since you haven't specified the exact nature of the data you're seeking. You may need to go down several levels of parsing to find the right bit:
$data = curl_exec($ch);
//Split the output into an array that we can loop through line by line
$array = preg_split('/\n/',$data);
//For each line in the output
foreach ($array as $element)
{
//See if the line contains a hyperlink
if (preg_match("/<a href/", "$element"))
{
...[do something here, e.g. store the data retrieved, or do more matching to find something within it]...
}
}

PHP - The fastest way to get content of another website and parse this content

I have to get a few params of user's from website. I can do it because every user have an unique ID and I can search users by URL:
http://page.com/search_user.php?uid=X
So I added this URL in for() loop and I tried to get 500 results:
<?php
$start = time();
$results = array();
for($i=0; $i<= 500; $i++)
{
$c = curl_init();
curl_setopt($c, CURLOPT_URL, 'http://page.com/search_user.php?uid='.$i);
curl_setopt($c, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.0; pl; rv:1.9.1.2) Gecko/20090729 desktopsmiley_2_2_5643778701369665_44_71 DS_gamingharbor Firefox/3.5.2 (.NET CLR 3.5.30729)');
curl_setopt($c, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
$p = curl_exec($c);
curl_close($c);
if ( preg_match('"<span class=\"uname\">(.*?)</span>"si', $p, $matches) )
{
$username = $matches[1];
}
else
{
continue;
}
preg_match('"<table cellspacing=\"0\">(.*?)</table>"si', $p, $matches);
$comments = $matches[1];
preg_match('"<tr class=\"pos\">(.*?)</tr>"si', $comments, $matches_pos);
preg_match_all('"<td>([0-9]+)</td>"si', $matches_pos[1], $matches);
$comments_pos = $matches[1][2];
preg_match('"<tr class=\"neu\">(.*?)</tr>"si', $comments, $matches_neu);
preg_match_all('"<td>([0-9]+)</td>"si', $matches_neu[1], $matches);
$comments_neu = $matches[1][2];
preg_match('"<tr class=\"neg\">(.*?)</tr>"si', $comments, $matches_neg);
preg_match_all('"<td>([0-9]+)</td>"si', $matches_neg[1], $matches);
$comments_neg = $matches[1][2];
$comments_all = $comments_pos+$comments_neu+$comments_neg;
$about_me = 0;
if ( preg_match('"<span>O mnie</span>"si', $p) )
{
$about_me = 1;
}
$results[] = array('comments' => $comments_all, 'about_me' => $about_me, 'username' => $username);
}
echo 'Generated in: <b>'.(time()-$start).'</b> seconds.<br><br>';
var_dump($results);
?>
Finally I got results:
- everything was generated in 135 seconds.
Then I I replaced curl with file_get_contents() and I got: 155 seconds.
Is faster way to get this results than curl ?? I have to get 20.000.000 results from another page and 135 seconds is too much for me.
Thanks.
If you really need to query different URLs 500 times, maybe you should consider asynchronous approach. The problem with above is that the slowest part (bottleneck) are the curl requests themselves. While waiting for the response, your code is doing nothing.
Try to have a look at PHP asynchronous cURL with callback (i.e. you would make 500 requests "almost at once" and process responses as they come - asynchronously).
Take a look at a previous answer of mine regarding how to divide and conquer this kind of job.
debugging long running PHP script
In your case I'd say you follow the same idea, but you'd further chunk the requests into groups of 500, say.

PHP (CI) cURL passed multidimensional array does not behave as one (Can't loop it)

I'm having an extrange issue when receiving parameters from a POST cURL request. No matter how I encode it (json, url, rawurl, utf8, base64...) before POSTing it, I am not able to perform any decoding operation through the array elements, via loop. I'm giving you the details.
From the consuming controller, in some other php (Yii) app, I build my request like this:
private function callTheApi($options)
{
$url = "http://api.call.com/url/api";
$params = array( 'api_key' => $this->api_key,
'domain' => $this->domain,
'date' => $options['date'],
'keys' => $options['keys'] // This is an array
);
// Following some good advice from Daniel Vandersluis here:
// http://stackoverflow.com/questions/3772096/posting-multidimensional-array-with-php-and-curl
if (is_array($params['keys'])
{
foreach ($params['keys'] as $id => $name)
{
$params['keys[' . $id . ']'] = $name;
}
unset($params['keys']);
}
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-type: multipart/form-data; charset=utf-8'));
curl_setopt($ch, CURLOPT_TIMEOUT, 120);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, TRUE);
curl_setopt($ch, CURLOPT_POSTFIELDS, $params);
curl_setopt($ch, CURLOPT_USERAGENT,
"Mozilla/5.0 (Windows; U; Windows NT 5.0; en; rv:1.9.0.4) "
. "Gecko/2009011913 Firefox/3.0.6");
$output = curl_exec($ch);
$error = curl_errno($ch);
$error_text = curl_error($ch);
curl_close($ch);
if (!$output || $error != 0)
die("<br><hr>Problems...<br>"
. "Line:" . __LINE__ . " dataExtractor.php<br>"
. "Error: " . $error . " - " . $error_text . "<hr>"
. $url . "<hr>");
sleep(1);
return json_decode($output, true);
}
And in the api itself, this is the function:
public function api()
{
$params = $_POST;
foreach($params as $k=>$v){
if($k=='domain') $domain = $v;
if($k=='date') $date = $v;
if($k=='api_key') $api_key = $v;
if($k=='keys') $keys = $v;
}
echo json_encode($keys);
// All my logic would be here, after parsing the array correctly.
}
Ok, now for the problems:
If i leave everything like stated before, it works. I have my $keys array in the api, and I can use it however I want. The "echo json_encode($keys)" sentence returns the array ALMOST as it should be. But the problem is some values of the array are corrupted in the cURL operation. Values such as spanish characters á, é ,í, ó, ú OR ü are simply not present in the array_values.
If some key in the $keys array was spanish word "alimentación" in the original array, once it's been cURLed to the api, it becomes "alimentacin". There, the ó is not there anymore.
So, my chances are encoding each value in the array to a safely transferred value, so that I can decode it later. But what do you know, I can't.
I've tried urlencoding, rawurlencoding, json_encoding, base64_encoding... each value of the array. And if I return the received array from the api, it contains the encoded values all right. BUT.
If I loop the array in the api for decoding, and then try to return it, no matter what decoding function I'm applying to its values, the output is ALWAYS "NULL".
I have no clue what I'm doing wrong here. Not even close.
So any help would be much appreciated. Thanks in advance, community.
When you create cUrl params array you should know that keys cannot be utf8.
And when you add some parameters in foreach loop
$params['keys[' . $id . ']'] = $name;
$id can be utf8 character.
To avoid that I recommend you to use json_encode
$params = array(
'api_key' => $this->api_key,
'domain' => $this->domain,
'date' => $options['date'],
'keys' => json_encode($options['keys']) // This is an array
);
In your api in this case you should change nothing.

Getting final urls of shortened urls (like bit.ly) using php

[Updated At Bottom]
Hi everyone.
Start With Short URLs:
Imagine that you've got a collection of 5 short urls (like http://bit.ly) in a php array, like this:
$shortUrlArray = array("http://bit.ly/123",
"http://bit.ly/123",
"http://bit.ly/123",
"http://bit.ly/123",
"http://bit.ly/123");
End with Final, Redirected URLs:
How can I get the final url of these short urls with php? Like this:
http://www.example.com/some-directory/some-page.html
http://www.example.com/some-directory/some-page.html
http://www.example.com/some-directory/some-page.html
http://www.example.com/some-directory/some-page.html
http://www.example.com/some-directory/some-page.html
I have one method (found online) that works well with a single url, but when looping over multiple urls, it only works with the final url in the array. For your reference, the method is this:
function get_web_page( $url )
{
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => true, // return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_USERAGENT => "spider", // who am i
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
//$header['errno'] = $err;
//$header['errmsg'] = $errmsg;
//$header['content'] = $content;
print($header[0]);
return $header;
}
//Using the above method in a for loop
$finalURLs = array();
$lineCount = count($shortUrlArray);
for($i = 0; $i <= $lineCount; $i++){
$singleShortURL = $shortUrlArray[$i];
$myUrlInfo = get_web_page( $singleShortURL );
$rawURL = $myUrlInfo["url"];
array_push($finalURLs, $rawURL);
}
Close, but not enough
This method works, but only with a single url. I Can't use it in a for loop which is what I want to do. When used in the above example in a for loop, the first four elements come back unchanged, and only the final element is converted into its final url. This happens whether your array is 5 elements or 500 elements long.
Solution Sought:
Please give me a hint as to how you'd modify this method to work when used inside of a for loop with collection of urls (Rather than just one).
-OR-
If you know of code that is better suited for this task, please include it in your answer.
Thanks in advance.
Update:
After some further prodding I've found that the problem lies not in the above method (which, after all, seems to work fine in for loops) but possibly encoding. When I hard-code an array of short urls, the loop works fine. But when I pass in a block of newline-seperated urls from an html form using GET or POST, the above mentioned problem ensues. Are the urls somehow being changed into a format not compatible with the method when I submit the form????
New Update:
You guys, I've found that my problem was due to something unrelated to the above method. My problem was that the URL encoding of my short urls converted what i thought were just newline characters (separating the urls) into this: %0D%0A which is a line feed or return character... And that all short urls save for the final url in the collection had a "ghost" character appended to the tail, thus making it impossible to retrieve the final urls for those only. I identified the ghost character, corrected my php explode, and all works fine now. Sorry and thanks.
This may be of some help: How to put string in array, split by new line?
You would probably do something like this, assuming you're getting the URLs returned in POST:
$final_urls = array();
$short_urls = explode( chr(10), $_POST['short_urls'] ); //You can replace chr(10) with "\n" or "\r\n", depending on how you get your urls. And of course, change $_POST['short_urls'] to the source of your string.
foreach ( $short_urls as $short ) {
$final_urls[] = get_web_page( $short );
}
I get the following output, using var_dump($final_urls); and your bit.ly url:
http://codepad.org/8YhqlCo1
And my source: $_POST['short_urls'] = "http://bit.ly/123\nhttp://bit.ly/123\nhttp://bit.ly/123\nhttp://bit.ly/123";
I also got an error, using your function: Notice: Undefined offset: 0 in /var/www/test.php on line 27 Line 27: print($header[0]); I'm not sure what you wanted there...
Here's my test.php, if it will help: http://codepad.org/zI2wAOWL
I think you are almost have it there. Try this:
$shortUrlArray = array("http://yhoo.it/2deaFR",
"http://bit.ly/900913",
"http://bit.ly/4m1AUx");
$finalURLs = array();
$lineCount = count($shortUrlArray);
for($i = 0; $i < $lineCount; $i++){
$singleShortURL = $shortUrlArray[$i];
$myUrlInfo = get_web_page( $singleShortURL );
$rawURL = $myUrlInfo["url"];
printf($rawURL."\n");
array_push($finalURLs, $rawURL);
}
I implemented to get a each line of a plain text file, with one shortened url per line, the according redirect url:
<?php
// input: textfile with one bitly shortened url per line
$plain_urls = file_get_contents('in.txt');
$bitly_urls = explode("\r\n", $plain_urls);
// output: where should we write
$w_out = fopen("out.csv", "a+") or die("Unable to open file!");
foreach($bitly_urls as $bitly_url) {
$c = curl_init($bitly_url);
curl_setopt($c, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36');
curl_setopt($c, CURLOPT_FOLLOWLOCATION, 0);
curl_setopt($c, CURLOPT_HEADER, 1);
curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($c, CURLOPT_CONNECTTIMEOUT, 20);
// curl_setopt($c, CURLOPT_PROXY, 'localhost:9150');
// curl_setopt($c, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5);
$r = curl_exec($c);
// get the redirect url:
$redirect_url = curl_getinfo($c)['redirect_url'];
// write output as csv
$out = '"'.$bitly_url.'";"'.$redirect_url.'"'."\n";
fwrite($w_out, $out);
}
fclose($w_out);
Have fun and enjoy!
pw

how to calculate google backlinks using php

i want to create my own tool for back-links calculation using PHP. is there any api to fetech the data for back links
The full implementation in PHP would look something like this:
<?php
$domain = "example.com"; // Enter your domain here.
$url = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&rsz=large&"
. "q=link:".$domain;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, $domain);
$body = curl_exec($ch);
curl_close($ch);
$json = json_decode($body);
$urls = array();
foreach($json->responseData->results as $result) // Loop through the objects in the result
$urls[] = $result->unescapedUrl; // and add the URL to the array.
?>
Basically you edit the domain variable at the top and it will fill the $urls array with unescaped URLs linking to the domain.
EDIT: I've edited the link to return 8 results. For more, you'll have to parse the pages and loop through them with the start parameter. See the Class Reference for more information.
Run a Google search with the URL prefixed by link: - for instance, link:www.mydomain.com.
While Google does provide a more specific backlink overview in their Webmaster Tools area (more info), I'm not sure they provide an external API for it.
since the question is "how to use in php code?" I assume you want to process on the server side as opposed to ajax on the client side. So use the Google URL link: hack in combination with curl
http://php.net/manual/en/book.curl.php
There is also a PHP class with many more options, that you can use: http://code.google.com/p/seostats/
function load_content ($url, $auth = true,$auth_param) {
$curl = curl_init();
$uagent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)";
if ($auth){
curl_setopt($curl, CURLOPT_USERPWD,$auth_param);
}
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_USERAGENT, $uagent);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_TIMEOUT, 3);
$content = curl_exec($curl);
//$header = curl_getinfo($curl);
curl_close($curl);
$res['msg'] = "";//$header;
$res['content'] = $content;
return $res;
}
function google_indexed($url){
$html = load_content ($url,false,"");
return $html;
}
Example:
<?php
$domain = "google.com";
$indexed["google"] = google_indexed("http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=site:$domain");
http://alex-kurilov.blogspot.com/2012/09/backlink-checker-google-php-example.html
For finding External links to the page : (external backlinks)
<?php
$url = "any url";
$result_in_html = file_get_contents("http://www.google.com/search?q=link:{$url}");
if (preg_match('/Results .*? of about (.*?) from/sim', $result_in_html, $regs))
{
$indexed_pages = trim(strip_tags($regs[1])); //use strip_tags to remove bold tags
echo ucwords($domain_name) . ' Has <u>' . $indexed_pages . '</u>external links to page';
} elseif (preg_match('/About (.*?) results/sim', $result_in_html, $regs))
{
$indexed_pages = trim(strip_tags($regs[1])); //use strip_tags to remove bold tags
echo ucwords($domain_name) . ' Has <u>' . $indexed_pages . '</u> external links to page';
} else
{
echo ucwords($domain_name) . ' Has Not Been Indexed # Google.com!';
}
?>
And to find internal backlinks :
<?php
$url = "any url";
$result_in_html = file_get_contents("http://www.google.com/search?q=site:{$url}");
if (preg_match('/Results .*? of about (.*?) from/sim', $result_in_html, $regs))
{
$indexed_pages = trim(strip_tags($regs[1])); //use strip_tags to remove bold tags
echo ucwords($domain_name) . ' Has <u>' . $indexed_pages . '</u> internal links to page';
} elseif (preg_match('/About (.*?) results/sim', $result_in_html, $regs))
{
$indexed_pages = trim(strip_tags($regs[1])); //use strip_tags to remove bold tags
echo ucwords($domain_name) . ' Has <u>' . $indexed_pages . '</u> internal links to page';
} else
{
echo ucwords($domain_name) . ' Has Not Been Indexed # Google.com!';
}
?>

Categories