I am using two functions to get the URL of a TikTok video for playback.
1. The first extracts the video with the watermark:
public function getDetails()
{
$url = $this->url;
$resp = $this->getContent($url);
$check = explode("\"contentUrl\":\"", $resp);
if (count($check) > 1) {
$video = explode("\"", $check[1])[0];
$videoWithoutWaterMark = $this->WithoutWatermark($url);
$thumb = explode("\"", explode("\"thumbnailUrl\":[\"", $resp)[1])[0];
$username = explode("/", explode("@", explode("\"", explode("\"url\":\"", $resp)[1])[0])[1])[0];
$result = [
'video'=>$video,
'withoutWaterMark'=>$videoWithoutWaterMark,
'user'=>$username,
'thumb'=>$thumb,
'error'=>false,
'message'=>false
];
}
else
{
$result = [
'video'=>false,
'withoutWaterMark'=>false,
'user'=>false,
'thumb'=>false,
'error'=>true,
'message'=>"Please double check your url and try again."
];
}
return $result;
}
private function cUrl($url)
{
$user_agent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$result = curl_exec($ch);
curl_close($ch);
return $result;
}
The other function, which builds the video URL without the watermark, is:
private function WithoutWatermark($url)
{
// video id, for example 6795008547961752326
$dd = explode("video/",$url);
$url = "https://api2.musical.ly/aweme/v1/playwm/?video_id=".$dd[1];
return $url;
}
Please help me find the TikTok video ID, or any other way to create a download link for the video without the watermark. How can I find the video ID so that I can use it to build a download link such as "https://api2.musical.ly/aweme/v1/playwm/?video_id=v09044b90000bpfdj5q91d8vtcnie6o0"?
Your WithoutWatermark function doesn't work.
If you have a URL like tiktok.com/@user/video/123456,
you can fetch it with cURL:
$data = cUrl($url);
This returns the TikTok page, from which you can extract the video URL with a regex:
https://v16.muscdn.com/123etc
Then make a second cURL request to that URL. The response is raw bytes, and a regex run over them will find something like vid:yourvideoid
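As a rough sketch of that last step, the function below searches a downloaded byte string for the vid: marker. The name extractVidFromBytes is made up for illustration, and the pattern assumes the ID is purely alphanumeric, as in the example ID from the question; TikTok may change this format at any time.

```php
<?php
// Hypothetical helper (not part of the original code): look for "vid:<id>"
// inside the raw bytes of a downloaded video file.
// Assumption: the id is alphanumeric, like v09044b90000bpfdj5q91d8vtcnie6o0.
function extractVidFromBytes(string $bytes): ?string
{
    if (preg_match('/vid:([a-zA-Z0-9]+)/', $bytes, $match)) {
        return $match[1];
    }
    return null; // marker not present in this response
}
```

You would call it on the body returned by the second cUrl() request and, if it returns an ID, append that ID to the playwm URL from the question.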
I finally got my script to work, but the search (run via AJAX) takes a long time. Given a keyword, it searches the page and captures the title, URL, and thumbnail of every video. The problem came when I needed the tags that belong to each video: I have to open each video page to read them, and the only way I could think of was to add a second loop inside the loop that processes the found videos. That is:
For each video found -> capture title, thumbnail, URL -> go to the captured URL and capture its tags.
The code I used is essentially the following. Is there another method to speed up the searches, either by optimizing the code or by taking a different approach?
My parse function:
<?php
function dlPage($href) {
$curl = curl_init();
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, FALSE);
// Request headers go in CURLOPT_HTTPHEADER; CURLOPT_HEADER only toggles header output.
curl_setopt($curl, CURLOPT_HTTPHEADER, array("Accept-Language: en-US"));
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_URL, $href);
curl_setopt($curl, CURLOPT_REFERER, $href);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.125 Safari/533.4");
$str = curl_exec($curl);
curl_close($curl);
// Create a DOM object
$dom = new simple_html_dom();
// Load HTML from a string
$dom->load($str);
return $dom;
}
?>
My script:
$buscartag = str_replace(' ', '+', $_POST['buscartag']);
$urlparse = "https://example.com/?k=".$buscartag;
$paginas = rand(0, 50);
$html = dlPage($urlparse."&p=".$paginas);
$counter = 0;
foreach($html->find('div.video-box') as $videos) {
if ($videos) {
$titulo = $videos->find('div.video-box>p[!class]>a[!class]', 0)->attr['title'];
$pathvideo = str_replace('_', '', $videos->attr['id']);
$link = "https://www.example.com/".$pathvideo."/";
$thumb = $videos->find('div.thumb', 0)->innertext;
// HERE IS MY SECOND LOOP, FOR THE TAGS!
$gettags2 = array();
$html_tags = file_get_html($link);
foreach ($html_tags->find('a.nu') as $gettags) {
$gettags2[] = $gettags->innertext;
if (!empty($titulo) && !empty($link) && !empty($pathvideo) && !empty($thumb)) {
$counter++;
//here will echo all variables
}
}
}
}
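One way to cut the waiting time is to download the per-video tag pages in parallel instead of one after another. The sketch below uses PHP's curl_multi functions for that; fetchAll is a hypothetical helper name, and the timeout value is only an assumption. Errors are not reported individually here, so a failed download simply yields an empty or null body.

```php
<?php
// Hypothetical helper: download several URLs concurrently with curl_multi.
// Returns an array that maps each URL to its response body.
function fetchAll(array $urls, int $timeout = 15): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none of them is still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running > 0) {
            curl_multi_select($mh, 1.0); // wait for activity instead of busy-looping
        }
    } while ($running > 0 && $status === CURLM_OK);

    $bodies = [];
    foreach ($handles as $url => $ch) {
        $bodies[$url] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);
    return $bodies;
}
```

With this approach the outer loop would first collect every $link, call fetchAll() once, and then hand each body to str_get_html() to read the a.nu tags, so the total time is closer to the slowest page than to the sum of all pages.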
I am currently trying to manipulate the DOM through PHP to extract the view count from an FB video page. The code below was working until recently, but now it no longer finds the node that contains the view count. This information is inside a div with the id fbPhotoPageMediaInfo. What would be the best way to manipulate the DOM through PHP to get the views of an FB video page?
private function _callCurl($url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Linux; Android 5.0.1; SAMSUNG-SGH-I337 Build/LRX22C; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/42.0.2311.138 Mobile Safari/537.36');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 20);
curl_setopt($ch, CURLOPT_URL, $url);
$response = curl_exec($ch);
$http = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
return array(
$http,
$response,
);
}
function test()
{
$url = "https://www.facebook.com/TaylorSwift/videos/10153665021155369/";
$request = $this->_callCurl($url);
if ($request[0] == 200) {
$dom = new DOMDocument();
@$dom->loadHTML($request[1]);
$elm = $dom->getElementById('fbPhotoPageMediaInfo');
if (isset($elm->nodeValue)) {
$views = preg_replace('/[^0-9]/', '', $elm->nodeValue);
} else {
$views = null;
}
} else {
echo "Error!";
}
return isset($views) ? $views : null;
}
Here is what I've determined...
If you var_dump() on $request you can see that it's giving you a 302 code (redirect) rather than a 200 (ok).
Changing CURLOPT_FOLLOWLOCATION to true or commenting it out entirely makes the error go away, but now we're getting a different page from the one expected.
I ran the following to see where I was being redirected to:
$htm = file_get_contents("https://www.facebook.com/TaylorSwift/videos/10153665021155369/");
var_dump($htm);
This gave me a page saying I was using an outdated browser, and needed to update it. So apparently Facebook doesn't like the User Agent.
I updated it as follows:
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/44.0.2');
That appears to solve the problem.
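Once the right page comes back, the lookup itself can be done with PHP's built-in DOM classes. The following is only a sketch: extractDigitsById is a hypothetical helper, it uses an XPath query rather than getElementById, and it assumes the count still lives in an element with the id from the question, which Facebook may change.

```php
<?php
// Hypothetical helper: return the digits found inside the element with the given id.
function extractDigitsById(string $html, string $id): ?string
{
    if ($html === '') {
        return null; // nothing to parse
    }
    $previous = libxml_use_internal_errors(true); // collect parse warnings quietly
    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }

    $xpath = new DOMXPath($dom);
    // Assumes $id contains no double quote, which holds for fixed ids.
    $nodes = $xpath->query('//*[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // Strip everything except digits, as the original test() function does.
    $digits = preg_replace('/[^0-9]/', '', $nodes->item(0)->textContent);
    return $digits === '' ? null : $digits;
}
```

Called as extractDigitsById($request[1], 'fbPhotoPageMediaInfo'), it returns null whenever the node is missing, which makes a changed page layout easy to detect.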
Personally, I prefer to use Simplehtmldom.
FB, like other high-traffic sites, updates its source to help prevent scraping, so you may have to adjust your node search in the future.
<?php
$ua = "Mozilla/5.0 (Windows NT 5.0) AppleWebKit/5321 (KHTML, like Gecko) Chrome/13.0.872.0 Safari/5321"; // must be a valid User Agent
ini_set('user_agent', $ua);
require_once('simplehtmldom/simple_html_dom.php'); // http://simplehtmldom.sourceforge.net/
Function Scrape_FB_Views($url) {
IF (!filter_var($url, FILTER_VALIDATE_URL) === false) {
// Create DOM from URL
$html = file_get_html($url);
IF ($html) {
IF (($html->find('span[class=fcg]', 3))) { // 4th instance of span with fcg class
$text = trim($html->find('span[class=fcg]', 3)->plaintext); // get content of span as plain text
$result = preg_replace('/[^0-9]/', '', $text); // replace all non-numeric characters
}ELSE{
$result = "Node is no longer valid.";
}
}ELSE{
$result = "Could not get HTML.";
}
}ELSE{
$result = "URL is invalid.";
}
return $result;
}
$url = "https://www.facebook.com/TaylorSwift/videos/10153665021155369/";
echo("<p>".Scrape_FB_Views($url)."</p>");
?>
I'm using PHP and cURL to scrape a web page, but the page is poorly designed (no classes or ids on its tags), so I need to search for specific text, go to the tag holding it (e.g. <p>), then move to the next child (or the next <p>) and get its text.
There are various things I need to get from the page, some also being the text within an <a onclick="get this stuff here">. So basically I feel that I need to use cURL to scrape the source code to a php variable, then I can use php to kind of parse through and find the stuff I need.
Does this sound like the best method to do this? Does anyone have any pointers or can demonstrate how I can put source code from cURL into a variable?
Thanks!
EDIT (Working/Current Code) -----------
<?php
class Scrape
{
public $cookies = 'cookies.txt';
private $user = null;
private $pass = null;
/*Data generated from cURL*/
public $content = null;
public $response = null;
/* Links */
private $url = array(
'login' => 'https://website.com/login.jsp',
'submit' => 'https://website.com/LoginServlet',
'page1' => 'https://website.com/page1',
'page2' => 'https://website.com/page2',
'page3' => 'https://website.com/page3'
);
/* Fields */
public $data = array();
public function __construct ($user, $pass)
{
$this->user = $user;
$this->pass = $pass;
}
public function login()
{
$this->cURL($this->url['login']);
if($form = $this->getFormFields($this->content, 'login'))
{
$form['login'] = $this->user;
$form['password'] =$this->pass;
// echo "<pre>".print_r($form,true);exit;
$this->cURL($this->url['submit'], $form);
//echo $this->content;//exit;
}
//echo $this->content;//exit;
}
// NEW TESTING
public function loadPage($page)
{
$this->cURL($this->url[$page]);
echo $this->content;//exit;
}
/* Scan for form */
private function getFormFields($data, $id)
{
if (preg_match('/(<form.*?name=.?'.$id.'.*?<\/form>)/is', $data, $matches)) {
$inputs = $this->getInputs($matches[1]);
return $inputs;
} else {
return false;
}
}
/* Get Inputs in form */
private function getInputs($form)
{
$inputs = array();
$elements = preg_match_all('/(<input[^>]+>)/is', $form, $matches);
if ($elements > 0) {
for($i = 0; $i < $elements; $i++) {
$el = preg_replace('/\s{2,}/', ' ', $matches[1][$i]);
if (preg_match('/name=(?:["\'])?([^"\'\s]*)/i', $el, $name)) {
$name = $name[1];
$value = '';
if (preg_match('/value=(?:["\'])?([^"\']*)/i', $el, $value)) {
$value = $value[1];
}
$inputs[$name] = $value;
}
}
}
return $inputs;
}
/* Perform curl function to specific URL provided */
public function cURL($url, $post = false)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13");
// "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_VERBOSE, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR, $this->cookies);
curl_setopt($ch, CURLOPT_COOKIEFILE, $this->cookies);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 120);
curl_setopt($ch, CURLOPT_TIMEOUT, 120);
if($post) //if post is needed
{
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($post));
}
curl_setopt($ch, CURLOPT_URL, $url);
$this->content = curl_exec($ch);
$this->response = curl_getinfo( $ch );
$this->url['last_url'] = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
curl_close($ch);
}
}
$sc = new Scrape('user','pass');
$sc->login();
$sc->loadPage('page1');
echo "<h1>TESTTESTEST</h1>";
$sc->loadPage('page2');
echo "<h1>TESTTESTEST</h1>";
$sc->loadPage('page3');
echo "<h1>TESTTESTEST</h1>";
(note: credit to @Ramz, "scrape a website with secured login")
You can divide your problem into several parts.
1. Retrieving the data from the data source.
For that, you can use cURL or file_get_contents(), depending on your requirements. Code examples are everywhere: http://php.net/manual/en/function.file-get-contents.php and http://php.net/manual/en/curl.examples-basic.php
2. Parsing the retrieved data.
For that, I would start by looking into "PHP Simple HTML DOM Parser". You can use it to extract data from an HTML string. http://simplehtmldom.sourceforge.net/
3. Building and generating the output.
This is simply a question of what you want to do with the data you have extracted. For example, you can print it, reformat it, or store it in a database or file.
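For the parsing step in particular, here is a minimal sketch of the two lookups the question describes: finding a <p> by its text and reading the <p> after it, and collecting onclick values from links. It uses PHP's built-in DOMDocument and DOMXPath instead of Simple HTML DOM, and all three function names are hypothetical.

```php
<?php
// Hypothetical helper: parse an HTML string and return an XPath object for it.
function htmlToXPath(string $html): DOMXPath
{
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return new DOMXPath($dom);
}

// Find the first <p> whose text contains $label and return the text of the
// next <p> at the same level, or null if there is none.
function textAfterLabel(string $html, string $label): ?string
{
    $xpath = htmlToXPath($html);
    foreach ($xpath->query('//p') as $p) {
        if (strpos($p->textContent, $label) !== false) {
            $next = $xpath->query('following-sibling::p[1]', $p);
            return $next->length > 0 ? trim($next->item(0)->textContent) : null;
        }
    }
    return null;
}

// Collect the onclick attribute of every <a> element that has one.
function onclickValues(string $html): array
{
    $values = [];
    foreach (htmlToXPath($html)->query('//a[@onclick]') as $a) {
        $values[] = $a->getAttribute('onclick');
    }
    return $values;
}
```

In the Scrape class above, both functions could be fed $this->content after a call to cURL(), since CURLOPT_RETURNTRANSFER already places the page source in that variable.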
I suggest you use a ready-made scraper. I use Goutte (https://github.com/FriendsOfPHP/Goutte), which lets me load website content and traverse it the same way you do with jQuery. For example, if I want the content of the <div id="content">, I call filter('#content')->text() on the crawler that $client->request() returns.
It even allows me to find and 'click' links and submit forms to retrieve and process the content.
It makes life so much easier than using cURL or file_get_contents() and working your way through the HTML manually.
I am trying to implement the new version of captcha on my website.
What I did so far:
Inside the FORM:
echo '<div class="g-recaptcha" data-sitekey="XXXXXXXXXXXXXXXXXXXXXXXXXXXX"></div>';
Inside PHP:
$recaptcha = $_POST['g-recaptcha-response'];
if(!empty($recaptcha))
{
$google_url = "https://www.google.com/recaptcha/api/siteverify";
$secret = 'YYYYYYYYYYYYYYYYYYYYYYYYYYY';
$ip = $_SERVER['REMOTE_ADDR'];
$url = $google_url."?secret=".$secret."&response=".$recaptcha."&remoteip=".$ip;
$res = getCurlData($url);
$res = json_decode($res, true);
if($res['success'] == 'false')
{
$captcha_error = "Please re-enter your reCAPTCHA.";
}
}
The getCurlData function:
function getCurlData($url)
{
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_TIMEOUT, 10);
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.16) Gecko/20110319 Firefox/3.6.16");
$curlData = curl_exec($curl);
curl_close($curl);
return $curlData;
}
What I want to achieve is to detect whether the No CAPTCHA box was checked. I want to show an error to the user if he/she did not check that box.
So far i only throw an error if the response from Google is "We are not sure if you are human, please proceed to our second level of verification" [if($res['success'] == 'false')].
PS: most of the code is written by Srinivas Tamada. You can find it here.
Thanks in advance.
The response is a JSON object:
{
"success": true|false,
"error-codes": [...] // optional
}
https://developers.google.com/recaptcha/docs/verify
If you parse that JSON you will get something like this:
object(stdClass)[1]
public 'success' => boolean false
public 'error-codes' =>
array (size=1)
0 => string 'missing-input-response' (length=22)
So if the response contains the error code 'missing-input-response', you can tell that the user didn't click the checkbox.
I implemented No CAPTCHA without cURL in a small library I wrote recently, so you can check it out if you want more details:
https://github.com/zoran-petrovic-87/ZorAuth
http://zoran87.blogspot.com/2014/12/zorauth-10b-complete-flexible-no.html
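To make the check concrete, here is a small sketch that maps the decoded response to an error message. The function name and the message texts are made up for illustration. Note that json_decode() yields a real boolean for success, so a strict comparison is used; loosely comparing a boolean with the string 'false' does not test for failure, because any non-empty string counts as true.

```php
<?php
// Hypothetical helper: turn the decoded siteverify response into an error
// message, or null when verification succeeded.
function recaptchaErrorMessage(?array $res): ?string
{
    if ($res === null) {
        return 'Could not verify the reCAPTCHA response.'; // invalid or empty JSON
    }
    if (($res['success'] ?? false) === true) {
        return null; // verified
    }
    $codes = $res['error-codes'] ?? [];
    if (in_array('missing-input-response', $codes, true)) {
        return 'Please tick the reCAPTCHA box.'; // user never checked it
    }
    return 'Please re-enter your reCAPTCHA.';
}
```

It would be called as recaptchaErrorMessage(json_decode($res, true)), and the siteverify request has to be sent even when $_POST['g-recaptcha-response'] is empty, otherwise the missing-input case is never reached.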
So for example I have this URL:
http://video.ak.fbcdn.net/hvideo-ak-prn2/v/1032822_578813298845318_1606611618_n.mp4?oh=c3c6a02985213f7c47386f4653792ca6&oe=5200506F&__gda__=1375798216_02752679a44bc4b3c514bee21e000959
How can I download the video source file via PHP?
Note that downloading the URL will not give me the video source!
// does not work:
file_put_contents('video.mp4', 'http://video.ak.fbcdn.net/hvideo-ak-prn2/v/1032822_578813298845318_1606611618_n.mp4?oh=c3c6a02985213f7c47386f4653792ca6&oe=5200506F&__gda__=1375798216_02752679a44bc4b3c514bee21e000959');
// this does not download the video source but instead gets me a file that links to the video hosted on Facebook.
file_put_contents('derp.mp4', file_get_contents('http://video.ak.fbcdn.net/hvideo-ak-prn2/v/1032822_578813298845318_1606611618_n.mp4?oh=c3c6a02985213f7c47386f4653792ca6&oe=5200506F&__gda__=1375798216_02752679a44bc4b3c514bee21e000959'));
This code is smarter: you only have to provide the video link. To get the link, right-click the video and choose Show Video link, or copy it directly from the browser's URL bar.
Then paste that URL in place of __PASTE_FACEBBOOK_VIDEO_LINK_HERE__ in the code below:
<?php
$options = array('http' => array('user_agent' => 'custom user agent string'));
$context = stream_context_create($options);
$response = file_get_contents('__PASTE_FACEBBOOK_VIDEO_LINK_HERE__', false, $context);
preg_match_all('#\bhttps?://[^,\s()<>]+(?:\([\w\d]+\)|([^,[:punct:]\s]|/))#', strip_tags($response), $match);
$searchword = 'video';
$matches = array_filter($match[0], function($var) use ($searchword) { return preg_match("/\b$searchword\b/i", $var); });
$filename = rand().".mp4";
file_put_contents($filename, fopen(reset($matches), 'r'));
The resulting file gets a random numeric name, e.g. 24424353.mp4
There is a simple way of doing this. You need two functions that extract the HD and SD links, plus a function that fetches the page with cURL:
function hdLink($curl_content)
{
$regex = '/hd_src:"([^"]+)"/';
if (preg_match($regex, $curl_content, $match)) {
return $match[1];
} else {
return;
}
}
function sdLink($curl_content)
{
$regex = '/sd_src_no_ratelimit:"([^"]+)"/';
if (preg_match($regex, $curl_content, $match1)) {
return $match1[1];
} else {
return;
}
}
function url_get_contents($url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.10240');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
Then, in your HTML page, pass the Facebook video URL to the url_get_contents() function:
<?php
require_once("functions.php");
if (!empty($_POST["url"]) ) {
$data = url_get_contents($_POST["url"]);
$hdlink = hdLink($data);
$sdlink = sdLink($data);
if (!empty($sdlink) && !empty($hdlink) ) {?>
<a target="_blank" download data-href="<?php echo $hdlink; ?>" href="<?php echo $hdlink; ?>" class="btn btn-block btn-lg btn-success">Download Video</a>
<?php }
}
?>
Reference: How to develop your own Facebook Video Downloader in 3 Steps on answerbox.net