I am trying to implement the new version of reCAPTCHA (the "No CAPTCHA" checkbox) on my website.
What I have done so far:
Inside the FORM:
echo '<div class="g-recaptcha" data-sitekey="XXXXXXXXXXXXXXXXXXXXXXXXXXXX"></div>';
Inside PHP:
$recaptcha = $_POST['g-recaptcha-response'];
if (!empty($recaptcha)) {
    $google_url = "https://www.google.com/recaptcha/api/siteverify";
    $secret     = 'YYYYYYYYYYYYYYYYYYYYYYYYYYY';
    $ip         = $_SERVER['REMOTE_ADDR'];
    $url        = $google_url . "?secret=" . $secret . "&response=" . $recaptcha . "&remoteip=" . $ip;

    $res = getCurlData($url);
    $res = json_decode($res, true);

    if ($res['success'] == 'false') {
        $captcha_error = "Please re-enter your reCAPTCHA.";
    }
}
The getCurlData function:
function getCurlData($url)
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_TIMEOUT, 10);
    curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.16) Gecko/20110319 Firefox/3.6.16");
    $curlData = curl_exec($curl);
    curl_close($curl);
    return $curlData;
}
What I want to achieve is to detect whether the No CAPTCHA checkbox was ticked, and to show an error to the user if it was not.
So far I only show an error when Google's response effectively means "we are not sure you are human, please proceed to our second level of verification" [if ($res['success'] == 'false')].
PS: most of the code was written by Srinivas Tamada; you can find it here.
Thanks in advance.
The response is a JSON object:
{
    "success": true|false,
    "error-codes": [...]   // optional
}
https://developers.google.com/recaptcha/docs/verify
If you parse that JSON you will get something like this:
object(stdClass)[1]
  public 'success' => boolean false
  public 'error-codes' =>
    array (size=1)
      0 => string 'missing-input-response' (length=22)
So if the response contains the error code 'missing-input-response', you can tell that the user did not tick the checkbox.
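For example, a minimal sketch of how that check could look in the verification code from the question, assuming $res is the siteverify response decoded as an associative array (json_decode($res, true)):

    // Sketch only: $res is assumed to be the decoded siteverify response array.
    if (empty($res['success'])) {
        $errors = isset($res['error-codes']) ? $res['error-codes'] : array();
        if (in_array('missing-input-response', $errors)) {
            // g-recaptcha-response was empty: the checkbox was never ticked.
            $captcha_error = "Please confirm you are not a robot.";
        } else {
            // Verification failed for another reason (bad secret, expired token, ...).
            $captcha_error = "Please re-enter your reCAPTCHA.";
        }
    }

Note that with the question's current flow the siteverify call is skipped entirely when $_POST['g-recaptcha-response'] is empty, so you could just as well set the "checkbox not ticked" error in the else branch of that !empty() check.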
I implemented No CAPTCHA without cURL in a small library I wrote recently, so you can check it out if you want more details:
https://github.com/zoran-petrovic-87/ZorAuth
http://zoran87.blogspot.com/2014/12/zorauth-10b-complete-flexible-no.html
I am using two functions to get the video URL so the video can be played.
1. To extract the TikTok video with the watermark:
public function getDetails()
{
$url = $this->url;
$resp = $this->getContent($url);
$check = explode("\"contentUrl\":\"", $resp);
if (count($check) > 1) {
$video = explode("\"", $check[1])[0];
$videoWithoutWaterMark = $this->WithoutWatermark($url);
$thumb = explode("\"", explode("\"thumbnailUrl\":[\"", $resp)[1])[0];
$username = explode("/", explode("#", explode("\"", explode("\"url\":\"", $resp)[1])[0])[1])[0];
$result = [
'video'=>$video,
'withoutWaterMark'=>$videoWithoutWaterMark,
'user'=>$username,
'thumb'=>$thumb,
'error'=>false,
'message'=>false
];
}
else
{
$result = [
'video'=>false,
'withoutWaterMark'=>false,
'user'=>false,
'thumb'=>false,
'error'=>true,
'message'=>"Please double check your url and try again."
];
}
return $result;
}
private function cUrl($url)
{
$user_agent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$result = curl_exec($ch);
curl_close($ch);
return $result;
}
And the other function, which builds the video URL without the watermark, is:
private function WithoutWatermark($url)
{
    // video id, for example 6795008547961752326
    $dd  = explode("video/", $url);
    $url = "https://api2.musical.ly/aweme/v1/playwm/?video_id=" . $dd[1];
    return $url;
}
Please help me find the TikTok video id, or any other way to create a download link for the video without the watermark. How can I find the video id so I can use it to create a download link like "https://api2.musical.ly/aweme/v1/playwm/?video_id=v09044b90000bpfdj5q91d8vtcnie6o0"?
Your function WithoutWatermark doesn't work.
If you have an url like: tiktok.com/#user/video/123456
then you can make a cURL request:
$data = cUrl($url);
You'll get the TikTok page back, and with a regex you can extract the video URL:
https://v16.muscdn.com/123etc
Then cURL that URL in turn; the response is the raw video bytes, and inside them a regex can find something like vid:yourvideoid.
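Roughly, a sketch of that two-step approach using the cUrl() helper from the question. The "contentUrl" marker, the vid: pattern and the playwm endpoint are taken from this thread and are assumptions; TikTok changes its markup often, so treat the regexes as placeholders:

    // Step 1: fetch the public video page and pull the watermarked CDN URL out of it.
    $page = cUrl($url); // e.g. https://www.tiktok.com/@user/video/123456
    if (preg_match('/"contentUrl":"(.*?)"/', $page, $m)) {
        $videoUrl = stripslashes($m[1]);

        // Step 2: download the video bytes and look for the internal video id.
        $bytes = cUrl($videoUrl);
        if (preg_match('/vid:([a-z0-9]+)/i', $bytes, $m)) {
            $videoId     = $m[1]; // e.g. v09044b90000...
            $downloadUrl = "https://api2.musical.ly/aweme/v1/playwm/?video_id=" . $videoId;
        }
    }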
What I'm trying to do is run a search on Amazon using a random keyword and then scrape, say, the first 10 results. The issue is that when I print the HTML result I get nothing, just a blank page. My code looks OK to me, and I have used cURL in the past without ever coming across this. My code:
<?php
include_once("classes/simple_html_dom.php");
function get_random_keyword() {
$f_contents = file("keywords.txt");
return $f_contents[rand(0, count($f_contents) - 1)];
}
function getHtml($page) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $page);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 GTB5');
$html = curl_exec($ch);
print "html -> " . $html;
curl_close($ch);
return $html;
}
$html = getHtml("https://www.amazon.co.uk/s?k=" . get_random_keyword());
?>
Ideally I would have preferred to use the API, but from what I understand you need 3 sales first before you are granted access. Can anyone see any issues? I'm not sure what else to check; any help is appreciated.
Amazon is returning the response encoded in gzip. You need to decode it:
$html = getHtml("https://www.amazon.co.uk/s?k=" . get_random_keyword());
echo gzdecode($html);
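Alternatively, if you would rather not call gzdecode() yourself, you can let cURL negotiate and decompress the response for you by adding one option inside getHtml() (assuming your cURL build has zlib support):

    // An empty string enables every encoding curl supports (gzip, deflate, ...)
    // and makes curl decompress the body automatically.
    curl_setopt($ch, CURLOPT_ENCODING, '');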
I am currently trying to manipulate the DOM through PHP to extract the view count from a Facebook video page. The code below was working until a little while ago, but it no longer finds the node that contains the view count. That information is inside a div with the id fbPhotoPageMediaInfo. What would be the best way to manipulate the DOM through PHP to get the views of a Facebook video page?
private function _callCurl($url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Linux; Android 5.0.1; SAMSUNG-SGH-I337 Build/LRX22C; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/42.0.2311.138 Mobile Safari/537.36');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 20);
curl_setopt($ch, CURLOPT_URL, $url);
$response = curl_exec($ch);
$http = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
return array(
$http,
$response,
);
}
function test()
{
$url = "https://www.facebook.com/TaylorSwift/videos/10153665021155369/";
$request = $this->_callCurl($url); // the helper above is named _callCurl
if ($request[0] == 200) {
$dom = new DOMDocument();
@$dom->loadHTML($request[1]); // @ suppresses libxml warnings from malformed HTML
$elm = $dom->getElementById('fbPhotoPageMediaInfo');
if (isset($elm->nodeValue)) {
$views = preg_replace('/[^0-9]/', '', $elm->nodeValue);
} else {
$views = null;
}
} else {
echo "Error!";
}
return isset($views) ? $views : null;
}
Here is what I've determined...
If you var_dump() on $request you can see that it's giving you a 302 code (redirect) rather than a 200 (ok).
Changing CURLOPT_FOLLOWLOCATION to true or commenting it out entirely makes the error go away, but now we're getting a different page from the one expected.
I ran the following to see where I was being redirected to:
$htm = file_get_contents("https://www.facebook.com/TaylorSwift/videos/10153665021155369/");
var_dump($htm);
This gave me a page saying I was using an outdated browser, and needed to update it. So apparently Facebook doesn't like the User Agent.
I updated it as follows:
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/44.0.2');
That appears to solve the problem.
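Put together, the change inside _callCurl() is roughly this (the user agent string is the one from the test above; whether you also need to follow redirects depends on what Facebook returns for that agent):

    // A user agent Facebook does not treat as an outdated browser.
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/44.0.2');
    // Optionally follow redirects in case a 302 is still returned.
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);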
Personally I prefer to use Simple HTML DOM.
Facebook, like other high-traffic sites, updates its markup to help prevent scraping, so you may have to adjust your node search again in the future.
<?php
$ua = "Mozilla/5.0 (Windows NT 5.0) AppleWebKit/5321 (KHTML, like Gecko) Chrome/13.0.872.0 Safari/5321"; // must be a valid User Agent
ini_set('user_agent', $ua);
require_once('simplehtmldom/simple_html_dom.php'); // http://simplehtmldom.sourceforge.net/
Function Scrape_FB_Views($url) {
IF (!filter_var($url, FILTER_VALIDATE_URL) === false) {
// Create DOM from URL
$html = file_get_html($url);
IF ($html) {
IF (($html->find('span[class=fcg]', 3))) { // 4th instance of span with fcg class
$text = trim($html->find('span[class=fcg]', 3)->plaintext); // get content of span as plain text
$result = preg_replace('/[^0-9]/', '', $text); // replace all non-numeric characters
}ELSE{
$result = "Node is no longer valid."
}
}ELSE{
$result = "Could not get HTML.";
}
}ELSE{
$result = "URL is invalid.";
}
return $result;
}
$url = "https://www.facebook.com/TaylorSwift/videos/10153665021155369/";
echo("<p>".Scrape_FB_Views($url)."</p>");
?>
I want to copy the data contained in a particular div from a Flipkart product web page and display it.
<table cellspacing="0" class="specTable">
///// contains /////
</table>
The number of these tables varies: some pages have 10 tables with the same class and some have more. How can I get the values of all the tables?
I also want to get a specific specsValue; is it possible to get that as well?
<td class="specsKey">Brand</td><td class="specsValue">Apple</td>
Web page address: http://www.flipkart.com/apple-iphone-6/p/itme8ra5z7yx5c9j?pid=MOBEYHZ2JHVFHFBG
Sample code
$url = "http://dl.flipkart.com/dl/apple-iphone-6/p/itme8ra5z7yx5c9j?pid=MOBEYHZ2JHVFHFBG";
$response = getPriceFromFlipkart($url);
echo json_encode($response);
/* Returns the response in JSON format */
function getPriceFromFlipkart($url) {
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 10.10; labnol;) ctrlq.org");
curl_setopt($curl, CURLOPT_FAILONERROR, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($curl);
curl_close($curl);
$regex = '/<meta itemprop="price" content="([^"]*)"/';
preg_match($regex, $html, $price);
$regex = '/<h1[^>]*>([^<]*)<\/h1>/';
preg_match($regex, $html, $title);
$regex = '/data-src="([^"]*)"/i';
preg_match($regex, $html, $image);
if ($price && $title && $image) {
$response = array("price" => $price[1], "title" => $title[1], "image" => $image[1]);
} else {
$response = array("status" => "404", "error" => "We could not find the product details on Flipkart $url");
}
return $response;
}
?>
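For the spec tables mentioned at the top of the question, one option is to parse the fetched HTML with DOMXPath instead of regexes. This is only a sketch: $html is assumed to hold the page source fetched with cURL (like the $html variable inside getPriceFromFlipkart() above), and the specTable/specsKey/specsValue class names are taken from the question, so they may have changed since:

    // Sketch: extract all key/value spec rows from every table with class specTable.
    $dom = new DOMDocument();
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $specs = array();
    foreach ($xpath->query('//table[contains(@class,"specTable")]//tr') as $row) {
        $key   = $xpath->query('.//td[contains(@class,"specsKey")]', $row);
        $value = $xpath->query('.//td[contains(@class,"specsValue")]', $row);
        if ($key->length && $value->length) {
            $specs[trim($key->item(0)->nodeValue)] = trim($value->item(0)->nodeValue);
        }
    }
    // e.g. $specs['Brand'] === 'Apple'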
Flipkart has now changed its interface, and you can fetch the product price and details by using the Flipkart API.
Currently I'm also using their API.
But I also want to fetch the product details using the cURL command below. If anyone is doing the same without any problem, please share what else I have to add here to fetch the product page content. While debugging with curl_getinfo(), it returns 301 Moved Permanently with a status code of 0.
$curl_handle = curl_init();
curl_setopt($curl_handle, CURLOPT_URL, <flipkart_url>);
curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 100);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl_handle, CURLOPT_REFERER, 'http://www.flipkart.com/');
curl_setopt($curl_handle, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 GTB5');
$str = curl_exec($curl_handle);
$html = new simple_html_dom();
$html->load($str);
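Purely as a guess based on the snippet above: a 301 combined with a reported status code of 0 often means the redirect is never followed (Flipkart redirects, for example, from http to https and from dl. to www.), and this handle never enables redirect following. A minimal, untested addition before curl_exec():

    // Follow any 301/302 redirects instead of stopping at the first response.
    curl_setopt($curl_handle, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl_handle, CURLOPT_MAXREDIRS, 5);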
I'm using PHP and cURL to scrape a web page, but this web page is poorly designed (as in, no classes or ids on the tags), so I need to search for specific text, go to the tag holding it (i.e. a <p>), then move to the next child (or the next <p>) and get the text.
There are various things I need to get from the page, some of them being the text within an <a onclick="get this stuff here">. So basically I feel that I need to use cURL to scrape the source code into a PHP variable, and then use PHP to parse through it and find the stuff I need.
Does this sound like the best method? Does anyone have any pointers, or can demonstrate how I can put the source code from cURL into a variable?
Thanks!
EDIT (Working/Current Code) -----------
<?php
class Scrape
{
public $cookies = 'cookies.txt';
private $user = null;
private $pass = null;
/*Data generated from cURL*/
public $content = null;
public $response = null;
/* Links */
private $url = array(
'login' => 'https://website.com/login.jsp',
'submit' => 'https://website.com/LoginServlet',
'page1' => 'https://website.com/page1',
'page2' => 'https://website.com/page2',
'page3' => 'https://website.com/page3'
);
/* Fields */
public $data = array();
public function __construct ($user, $pass)
{
$this->user = $user;
$this->pass = $pass;
}
public function login()
{
$this->cURL($this->url['login']);
if($form = $this->getFormFields($this->content, 'login'))
{
$form['login'] = $this->user;
$form['password'] =$this->pass;
// echo "<pre>".print_r($form,true);exit;
$this->cURL($this->url['submit'], $form);
//echo $this->content;//exit;
}
//echo $this->content;//exit;
}
// NEW TESTING
public function loadPage($page)
{
$this->cURL($this->url[$page]);
echo $this->content;//exit;
}
/* Scan for form */
private function getFormFields($data, $id)
{
if (preg_match('/(<form.*?name=.?'.$id.'.*?<\/form>)/is', $data, $matches)) {
$inputs = $this->getInputs($matches[1]);
return $inputs;
} else {
return false;
}
}
/* Get Inputs in form */
private function getInputs($form)
{
$inputs = array();
$elements = preg_match_all('/(<input[^>]+>)/is', $form, $matches);
if ($elements > 0) {
for($i = 0; $i < $elements; $i++) {
$el = preg_replace('/\s{2,}/', ' ', $matches[1][$i]);
if (preg_match('/name=(?:["\'])?([^"\'\s]*)/i', $el, $name)) {
$name = $name[1];
$value = '';
if (preg_match('/value=(?:["\'])?([^"\']*)/i', $el, $value)) {
$value = $value[1];
}
$inputs[$name] = $value;
}
}
}
return $inputs;
}
/* Perform curl function to specific URL provided */
public function cURL($url, $post = false)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13");
// "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_VERBOSE, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR, $this->cookies);
curl_setopt($ch, CURLOPT_COOKIEFILE, $this->cookies);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 120);
curl_setopt($ch, CURLOPT_TIMEOUT, 120);
if($post) //if post is needed
{
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($post));
}
curl_setopt($ch, CURLOPT_URL, $url);
$this->content = curl_exec($ch);
$this->response = curl_getinfo( $ch );
$this->url['last_url'] = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
curl_close($ch);
}
}
$sc = new Scrape('user','pass');
$sc->login();
$sc->loadPage('page1');
echo "<h1>TESTTESTEST</h1>";
$sc->loadPage('page2');
echo "<h1>TESTTESTEST</h1>";
$sc->loadPage('page3');
echo "<h1>TESTTESTEST</h1>";
(Note: credit to @Ramz and the "scrape a website with secured login" answer.)
You can divide your problem into several parts.
1. Retrieving the data from the data source.
For that, you can use cURL or file_get_contents() depending on your requirements. Code examples are everywhere: http://php.net/manual/en/function.file-get-contents.php and http://php.net/manual/en/curl.examples-basic.php
2. Parsing the retrieved data.
For that, I would start by looking into "PHP Simple HTML DOM Parser". You can use it to extract data from an HTML string: http://simplehtmldom.sourceforge.net/ (a combined sketch of steps 1 and 2 follows after this list).
3. Building and generating the output.
This is simply a question of what you want to do with the data you have extracted. For example, you can print it, reformat it, or store it in a database or file.
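Putting the first two steps together, a minimal sketch (the URL, the include path and the 'specific text' marker are placeholders; file_get_contents() assumes allow_url_fopen is enabled, and cURL works just as well for step 1):

    // Step 1: retrieve the raw HTML.
    $html = file_get_contents('https://example.com/page-to-scrape');

    // Step 2: parse it with Simple HTML DOM.
    require_once 'simple_html_dom.php';
    $dom = str_get_html($html);

    // Find the <p> that contains some known text, then read the tag right after it.
    foreach ($dom->find('p') as $p) {
        if (strpos($p->plaintext, 'specific text') !== false) {
            $next = $p->next_sibling();
            echo $next ? trim($next->plaintext) : '';
            break;
        }
    }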
I suggest you use a ready-made scraper. I use Goutte (https://github.com/FriendsOfPHP/Goutte), which allows me to load website content and traverse it the same way you do with jQuery. For example, if I want the content of <div id="content">, I use $crawler->filter('#content')->text().
It even allows me to find and 'click' on links and submit forms to retrieve and process the content.
It makes life so much easier than using cURL or file_get_contents() and working your way through the HTML manually.
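A minimal Goutte sketch (Goutte is installed via Composer; the URL, the '#content' selector and the 'Next page' link text are placeholders):

    require 'vendor/autoload.php';

    use Goutte\Client;

    $client = new Client();

    // Load the page and traverse it with CSS selectors, jQuery-style.
    $crawler = $client->request('GET', 'https://example.com/');
    echo $crawler->filter('#content')->text();

    // Links can be followed as well; forms work similarly via $client->submit().
    $link    = $crawler->selectLink('Next page')->link();
    $crawler = $client->click($link);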