PHP - CURLOPT_BUFFERSIZE ignored

I would like to execute the callback function once every X bytes uploaded, but I don't understand why PHP keeps calling the callback far more often than that.
Here is my code:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $converter);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_fields);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_NOPROGRESS, false);
curl_setopt($ch, CURLOPT_PROGRESSFUNCTION, 'callback');
curl_setopt($ch, CURLOPT_BUFFERSIZE, 1048576);
$result = curl_exec($ch);
//$info = curl_getinfo($ch);
//print_r($info);
curl_close($ch);

function callback($resource, $download_size, $downloaded, $upload_size, $uploaded) {
    echo $uploaded . '/' . $upload_size . "\r";
}
The file to upload is around 68 MB, so the callback should get executed about 68 times (1048576 bytes = 1 MB), but it gets executed around 9k times...
The function should write the progress to a MySQL database, which is why I need it to be executed fewer times.

As Barman stated, CURLOPT_BUFFERSIZE relates to downloads and has no effect on uploads.
The solution is to track the uploaded size inside the callback and act only when a certain number of bytes has been uploaded since the last update.
Example:
$i = 0;
$up = 0;

function callback($resource, $download_size, $downloaded, $upload_size, $uploaded) {
    global $i, $up;
    // Act only when at least 1 MB has been uploaded since the last update
    if ($uploaded > ($up + 1048576)) {
        $i++;
        $up = $uploaded; // remember where the last update happened
        echo $i . ' => ' . formatBytes($uploaded) . '/' . formatBytes($upload_size) . "\r";
    }
}
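The counters can also live in a static variable instead of globals. A minimal sketch of that variant; updateProgressInDb() is a hypothetical helper standing in for whatever MySQL write you need:
function callback($resource, $download_size, $downloaded, $upload_size, $uploaded) {
    static $lastReported = 0;
    // Act only once per additional megabyte uploaded
    if ($uploaded - $lastReported >= 1048576) {
        $lastReported = $uploaded;
        // Hypothetical helper: replace with your own MySQL update
        updateProgressInDb($uploaded, $upload_size);
    }
}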

PHP - Scrape data of all trustpilot reviews [duplicate]

This question was closed as a duplicate of: How do you parse and process HTML/XML in PHP?
<?php
for ($x = 0; $x <= 25; $x++) {
    $ch = curl_init("https://uk.trustpilot.com/review/example.com?languages=all&page=$x");
    //curl_setopt($ch, CURLOPT_POST, true);
    //curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    //curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30); //timeout in seconds
    $trustpilot = curl_exec($ch);
    // Check if any error occurred
    if (curl_errno($ch)) {
        die('Fatal Error Occurred');
    }
}
?>
This code fetches 25 pages of reviews for example.com; what I want to do next is put all the results into a JSON array or something similar.
I attempted the code below to try to retrieve all of the names:
<?php
$trustpilot = preg_replace('/\s+/', '', $trustpilot); // strip all whitespace
$first = explode('"name":"', $trustpilot);
$second = explode('"', $first[1]);
$result = preg_replace('/[^a-zA-Z0-9-.*_]/', '', $second[0]); // drop special characters
?>
This is clearly a lot harder than I anticipated. Does anyone know how I could get all of the reviews into JSON (or something similar) for however many pages I choose? In this case I chose 25 pages worth of reviews.
Thanks!
Do not parse HTML with regex.
Use DOMDocument and DOMXPath instead. Also, you create a new curl handle for each page but never close them, which is a resource/memory leak; it's also a waste of CPU, because you could keep re-using the same curl handle instead of creating a new one per page. Protip: this HTML compresses rather well, so you should use CURLOPT_ENCODING to download the pages compressed.
e.g.:
<?php
declare(strict_types = 1);
header("Content-Type: text/plain;charset=utf-8");

$ch = curl_init();
curl_setopt($ch, CURLOPT_ENCODING, ''); // enables compression
$reviews = [];

for ($x = 0; $x <= 25; $x++) {
    curl_setopt($ch, CURLOPT_URL, "https://uk.trustpilot.com/review/example.com?languages=all&page=$x");
    // curl_setopt($ch, CURLOPT_POST, true);
    // curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30); // timeout in seconds
    $trustpilot = curl_exec($ch);
    // Check if any error occurred
    if (curl_errno($ch)) {
        die('fatal error: curl_exec failed, ' . curl_errno($ch) . ": " . curl_error($ch));
    }
    $domd = new DOMDocument();
    @$domd->loadHTML($trustpilot); // suppress warnings about malformed HTML
    $xp = new DOMXPath($domd);
    foreach ($xp->query("//article[@class='review-card']") as $review) {
        $id = $review->getAttribute("id");
        $reviewer = $xp->query(".//*[@class='content-section__consumer-info']", $review)->item(0)->textContent;
        $stars = $xp->query('.//div[contains(@class,"star-item")]', $review)->length;
        $title = $xp->query('.//*[@class="review-info__body__title"]', $review)->item(0)->textContent;
        $text = $xp->query('.//*[@class="review-info__body__text"]', $review)->item(0)->textContent;
        $reviews[$id] = array(
            'reviewer' => mytrim($reviewer),
            'stars' => $stars,
            'title' => mytrim($title),
            'text' => mytrim($text)
        );
    }
}
curl_close($ch);
echo json_encode($reviews, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE | (defined("JSON_UNESCAPED_LINE_TERMINATORS") ? JSON_UNESCAPED_LINE_TERMINATORS : 0) | JSON_NUMERIC_CHECK);

function mytrim(string $text): string
{
    return preg_replace("/\s+/", " ", trim($text));
}
output:
{
"4d6bbf8a0000640002080bc2": {
"reviewer": "Clement Skau Århus, DK, 3 reviews",
"stars": 5,
"title": "Godt fundet på!",
"text": "Det er rigtig fint gjort at lave et example domain. :)"
}
}
There is only one entry because the URL you listed has just a single review, and 4d6bbf8a0000640002080bc2 is the website's internal id (probably a SQL db id) for that review.

Curl request not flushing data periodically

I am making a curl request to a script that flushes data periodically, and I want to display that data in my calling script as soon as it is flushed. Instead, the whole response is displayed at once after the request is over. I want to display the response as it arrives.
Code
require_once 'mystream.php';
stream_wrapper_register("var", "mystream") or die("Failed to register protocol");
$myVar = '';
// Open the "file"
$fp = fopen("var://myVar", "r+");
// Configuration of curl
$ch = curl_init();
$output = ' ';
$url = '';
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $output);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0 );
//curl_setopt($ch, CURLOPT_BUFFERSIZE, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FILE, $fp); // Data will be sent to our stream ;-)
curl_exec($ch);
curl_close($ch);
// Don't forget to close the "file" / stream
fclose($fp);
mystream.php code
<?php
class mystream {
    protected $buffer = '';

    function stream_open($path, $mode, $options, &$opened_path) {
        return true;
    }

    public function stream_write($data) {
        ob_start();
        echo $data;
        ob_end_flush();
        ob_flush();
        flush();
        // Extract the lines; in my tests, $data was 8192 bytes long, never more
        $lines = explode("\n", $data);
        // The buffer contains the end of the last line from the previous call
        // => it goes at the beginning of the first line we are getting this time
        $lines[0] = $this->buffer . $lines[0];
        // The last line is only partial
        // => save it for next time, and remove it from the list this time
        $nb_lines = count($lines);
        $this->buffer = $lines[$nb_lines - 1];
        unset($lines[$nb_lines - 1]);
        // Here, do your work with the lines you have in the buffer
        //var_dump($lines);
        //echo '<hr />';
        return strlen($data);
    }
}
Any leads would be highly appreciated.
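One direction worth trying, as a minimal sketch: if the data is held back by PHP/webserver output buffering rather than by curl, you can skip the stream wrapper, let a CURLOPT_WRITEFUNCTION callback echo and flush each chunk as curl receives it, and make sure no output buffer is active. The URL below is a placeholder; $output is the POST body from the question.
while (ob_get_level() > 0) {
    ob_end_flush(); // close any active output buffers so flush() reaches the client
}
$ch = curl_init('http://example.com/streaming-endpoint');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $output);
curl_setopt($ch, CURLOPT_HEADER, 0);
// No CURLOPT_RETURNTRANSFER and no CURLOPT_FILE: each chunk goes to the callback
curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($ch, $chunk) {
    echo $chunk;           // display the chunk immediately
    flush();               // push it out of PHP's buffer
    return strlen($chunk); // curl aborts the transfer if fewer bytes are reported
});
curl_exec($ch);
curl_close($ch);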

Calculate MD5 of file being downloaded with PHP and CURL

I have a cURL call that downloads a large file.
I'm wondering if it is possible to calculate its hash while the file is still downloading?
I think the progress callback function is the right place to accomplish that.
function get($urlget, $filename) {
    // Init stuff [...]
    $this->fp = fopen($filename, "w+");
    $ch = curl_init();
    // [...] irrelevant curlopt stuff
    curl_setopt($ch, CURLOPT_FILE, $this->fp);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_NOPROGRESS, 0);
    curl_setopt($ch, CURLOPT_PROGRESSFUNCTION, array($this, 'curl_progress_cb'));
    $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $ret = curl_exec($ch);
    if (curl_errno($ch)) {
        $ret = FALSE;
    }
    curl_close($ch);
    fclose($this->fp);
    return $ret;
}

function curl_progress_cb($dltotal, $dlnow, $ultotal, $ulnow) {
    //... Calculate MD5 of the file here with $this->fp
}
It's possible to calculate the MD5 hash of a partially downloaded file, but it does not make much sense: every additional downloaded byte changes the hash completely. What is the reason for going with this kind of solution?
If you need the MD5 hash of the entire file, then the answer is no: your program has to first download the file and then generate the hash.
I just did it:
In a file wget-md5.php, add the code below:
<?php
function writeCallback($resource, $data)
{
    global $handle;
    global $handle_md5_val;
    global $handle_md5_ctx;
    $len = fwrite($handle, $data);
    hash_update($handle_md5_ctx, $data);
    return $len;
}

$handle = FALSE;
$handle_md5_val = FALSE;
$handle_md5_ctx = FALSE;

function wget_with_curl_and_md5_hashing($url, $uri)
{
    global $handle;
    global $handle_md5_val;
    global $handle_md5_ctx;
    $handle_md5_val = FALSE;
    $handle_md5_ctx = hash_init('md5');
    $handle = fopen($uri, 'w');
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_BUFFERSIZE, 64000);
    curl_setopt($curl, CURLOPT_WRITEFUNCTION, 'writeCallback');
    echo "wget_with_curl_and_md5_hashing[" . $url . "]=downloading\n";
    curl_exec($curl);
    curl_close($curl);
    fclose($handle);
    $handle_md5_val = hash_final($handle_md5_ctx);
    $handle_md5_ctx = FALSE;
    echo "wget_with_curl_and_md5_hashing[" . $url . "]=downloaded,md5=" . $handle_md5_val . "\n";
}

wget_with_curl_and_md5_hashing("http://archlinux.polymorf.fr/core/os/x86_64/core.files.tar.gz", "core.files.tar.gz");
?>
and run:
$ php -f wget-md5.php
wget_with_curl_and_md5_hashing[http://archlinux.polymorf.fr/core/os/x86_64/core.files.tar.gz]=downloading
wget_with_curl_and_md5_hashing[http://archlinux.polymorf.fr/core/os/x86_64/core.files.tar.gz]=downloaded,md5=5bc1ac3bc8961cfbe78077e1ebcf7cbe
$ md5sum core.files.tar.gz
5bc1ac3bc8961cfbe78077e1ebcf7cbe core.files.tar.gz

Resume an ftp upload with curl

I'm building a PHP script to upload large files from a local PHP server to a remote FTP server, logging progress to a MySQL database. Everything works fine, but I have a problem with the resume function and I can't find any clear information on how to resume an FTP upload with cURL. My code is below:
$fp = fopen($localfile, 'r');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_NOPROGRESS, false);
curl_setopt($curl, CURLOPT_PROGRESSFUNCTION, 'curlProgressCallback'); // log progress to the database
curl_setopt($curl, CURLOPT_READFUNCTION, 'curlAbortCallback'); // check if the user requested an abort
curl_setopt($curl, CURLOPT_URL, $ftp);
curl_setopt($curl, CURLOPT_PORT, FTPPort);
curl_setopt($curl, CURLOPT_LOW_SPEED_LIMIT, 1000);
curl_setopt($curl, CURLOPT_LOW_SPEED_TIME, 20);
curl_setopt($curl, CURLOPT_USERPWD, FTPLog . ":" . FTPPass);
curl_setopt($curl, CURLOPT_FTP_CREATE_MISSING_DIRS, true);
curl_setopt($curl, CURLOPT_UPLOAD, 1);
if ($resume) {
    $startFrom = ftpFileSize($dest); // returns the current file size on the FTP server
} else {
    $startFrom = false;
}
if ($startFrom) {
    curl_setopt($curl, CURLOPT_FTPAPPEND, 1);
    curl_setopt($curl, CURLOPT_RESUME_FROM, $startFrom);
    fseek($fp, $startFrom, SEEK_SET);
    $sizeToUp = filesize($localfile); //-$startFrom;
} else {
    $sizeToUp = filesize($localfile);
}
curl_setopt($curl, CURLOPT_INFILE, $fp);
curl_setopt($curl, CURLOPT_INFILESIZE, $sizeToUp);
curl_exec($curl);
If someone can help me with this or point me to a valid example, it would be very helpful and appreciated.
Thanks for your feedback.
I do not think cURL and PHP can do this.
Are you using Linux? If so, look at aria2. It supports resumable connections and supports FTP. I am not sure if it will do exactly what you want.
So I've given up trying to resume with cURL, since there isn't enough help or documentation on it, and it's really much easier to do with ftp_nb_fput. In case it can help someone, you can find an example of my final code below:
define("FTPAdd","your serveur ftp address");
define("FTPPort",21);
define("FTPTimeout",120);
define("FTPLog","your login");
define("FTPPass","your password");
function ftpUp_checkForAndMakeDirs($ftpThread, $file) {
$parts = explode("/", dirname($file));
foreach ($parts as $curDir) {
if (#ftp_chdir($ftpThread, $curDir) === false) { // Attempt to change directory, suppress errors
ftp_mkdir($ftpThread, $curDir); //directory doesn't exist - so make it
ftp_chdir($ftpThread, $curDir); //go into the new directory
}
}
}
function ftpUp_progressCallBack($uploadedData){
global $abortRequested;
//you can do any action you want while file upload progress, personaly, il log progress in to data base
//and i request DB to know if user request a file transfert abort and set the var $abortRequested to true or false
}
function ftpUp($src,$file,$dest,$resume){
global $abortRequested;
$conn_id = ftp_connect(FTPAdd,FTPPort,FTPTimeout);
if ($conn_id==false){
echo "FTP Connection problem";return false;
}else{
$login_res = ftp_login($conn_id, FTPLog, FTPPass);
if ($login_res){
$ftpThread=$conn_id;
}else{
echo "FTP Authentification error";return false;
}
}
$fp = fopen($src, 'r');
ftpUp_checkForAndMakeDirs($ftpThread, $dest); //verif et creation de l'arborescence sur le serveur ftp
ftp_set_option ($ftpThread, FTP_AUTOSEEK, TRUE); // indispensable pour pouvoir faire du resume
if($resume){
$upload = ftp_nb_fput ($ftpThread, $file, $fp ,FTPUpMode, FTP_AUTORESUME);
}else{
$upload = ftp_nb_fput ($ftpThread, $file, $fp ,FTPUpMode);
}
///////////////////////////////////////////////////////////////////////////////////
//uploading process
while ($upload == FTP_MOREDATA) {
//progress of upload
ftpUp_progressCallBack(ftell ($fp));
//continue or abort
if(!$abortRequested){
$upload = ftp_nb_continue($ftpThread);
}else{
$upload = "userAbort";
}
}
///////////////////////////////////////////////////////////////////////////////////
//end and result
ftpUp_progressCallBack(ftell ($fp));
if ($upload != FTP_FINISHED) {
#fclose($fp);
#ftp_close ($ftpThread);
if ($abortRequested){
echo "FTP Abort by user : resume needed";
}else{
echo "FTP upload error : ".$upload." (try resume)";
}
}else{
#fclose($fp);
#ftp_close ($ftpThread);
echo "upload sucess";
}
}
$file="test.zip";
$src = "FilesToUp/" . $file;
$destDir = "www/data/upFiles/";
$dest = $destDir . $file;
$abortRequested=false;
ftpUp($src,$file,$dest,true);
I was searching for a viable answer using cURL and PHP to resume a broken transfer and could not find a clear, viable solution on the internet. This is an old question, but I figured it needed a proper answer. This is the result of a day or two of research; see the functions below and the quick usage example.
Connect function:
function ftp_getconnect($uName, $pWord, $uHost)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_USERPWD, "$uName:$pWord");
    curl_setopt($ch, CURLOPT_URL, $uHost);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $return = curl_exec($ch);
    if ($return === false)
    {
        print_r(curl_getinfo($ch));
        echo curl_error($ch);
        curl_close($ch);
        die('Could not connect');
    }
    else
    {
        return $ch;
    }
}
Disconnect function:
function ftp_disconnect($ch)
{
    $return = curl_close($ch);
    if ($return === false)
    {
        return "Error: Could not close connection" . PHP_EOL;
    }
    else
    {
        return $return;
    }
}
Function to get the remote file size (in bytes):
function get_rem_file_size($ch, $rem_file)
{
    curl_setopt($ch, CURLOPT_INFILE, STDIN);
    curl_setopt($ch, CURLOPT_URL, $rem_file);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_FTP_CREATE_MISSING_DIRS, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FTPLISTONLY, false);
    $return = curl_exec($ch);
    if ($return === false)
    {
        print_r(curl_getinfo($ch));
        echo curl_error($ch);
        curl_close($ch);
        die('Could not connect');
    }
    else
    {
        $file_header = curl_getinfo($ch);
        return $file_header['download_content_length'];
    }
}
Upload file function:
function upload_file($ch, $local_file, $remote_file, $resume)
{
    echo "attempting to upload $local_file to $remote_file" . PHP_EOL;
    $file = fopen($local_file, 'rb');
    if ($resume)
    {
        curl_setopt($ch, CURLOPT_RESUME_FROM, get_rem_file_size($ch, $remote_file));
    }
    curl_setopt($ch, CURLOPT_URL, $remote_file);
    curl_setopt($ch, CURLOPT_UPLOAD, true);
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_FTP_CREATE_MISSING_DIRS, true);
    curl_setopt($ch, CURLOPT_INFILE, $file);
    curl_setopt($ch, CURLOPT_INFILESIZE, filesize($local_file));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $return = curl_exec($ch);
    if ($return === false)
    {
        fclose($file);
        return curl_error($ch) . PHP_EOL;
    }
    else
    {
        fclose($file);
        echo $local_file . ' uploaded' . PHP_EOL;
        return true;
    }
}
Quick usage example:
$ftp = ftp_getconnect($uName, $pWord, 'ftp://' . $uHost);
$rem_file = 'ftp://' . $uHost . '/path/to/remote/file.ext';
$loc_file = 'path/to/local/file.ext';
$resume = true;
$success = upload_file($ftp, $loc_file, $rem_file, $resume);
if ($success !== true)
{
    // failure
    echo $success;
    curl_close($ftp);
}
print_r(ftp_disconnect($ftp));
Quick note: if there is a large set of files, you can loop through them and call upload_file() without connecting/disconnecting each time.
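A minimal sketch of that loop, assuming the local files and their remote FTP URLs are paired in an array (the paths are placeholders):
// Hypothetical file list: local path => remote FTP URL
$files = array(
    'path/to/local/a.ext' => 'ftp://' . $uHost . '/path/to/remote/a.ext',
    'path/to/local/b.ext' => 'ftp://' . $uHost . '/path/to/remote/b.ext',
);
$ftp = ftp_getconnect($uName, $pWord, 'ftp://' . $uHost); // connect once
foreach ($files as $loc_file => $rem_file) {
    $success = upload_file($ftp, $loc_file, $rem_file, true); // resume if partially uploaded
    if ($success !== true) {
        echo $success; // report the error and continue with the next file
    }
}
print_r(ftp_disconnect($ftp)); // disconnect once at the end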

php curl functions issue

I am having trouble with curl functions in my code. CURLINFO_HTTP_CODE always returns 0, and
curl_error($ch) returns 'could not reach host'. The host is iSpeech and it shouldn't have problems. Can anyone here help me out? Thanks a lot!
iSpeech.php
class iSpeechBase {
    var $server;
    var $parameters = array("device-type" => "php-SDK-0.3");

    function setParameter($parameter, $value) {
        if ($parameter == "server")
            $this->server = $value;
        else
            $this->parameters["$parameter"] = $value;
    }

    function makeRequest() {
        $ch = curl_init();
        $url = $this->server . "/?" . http_build_query($this->parameters);
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);
        ob_start();
        echocurl_exec($ch);
        $http_body = ob_get_contents();
        ob_end_clean();
        echo curl_getinfo($ch, CURLINFO_HTTP_CODE); // returns 0
        echo curl_error($ch); // returns "Could not reach host."
        if (curl_getinfo($ch, CURLINFO_HTTP_CODE) != 200)
            if ($this->parameters["action"] == "convert")
                return array("error" => $http_body);
        return $http_body;
    }
}
synthesis-demo.php
require_once('ispeech.php');
$SpeechSynthesizer = new SpeechSynthesizer();
$SpeechSynthesizer->setParameter("server", "http://api.ispeech.org/api/rest");
$SpeechSynthesizer->setParameter("apikey", "myapikey");
$SpeechSynthesizer->setParameter("text", "yes");
$SpeechSynthesizer->setParameter("format", "wav");
$SpeechSynthesizer->setParameter("voice", "usenglishfemale");
$SpeechSynthesizer->setParameter("output", "rest");
$result = $SpeechSynthesizer->makeRequest();
To save faffing around with the ob buffer, you could replace
ob_start();
echocurl_exec($ch);
$http_body = ob_get_contents();
ob_end_clean();
with
$http_body = curl_exec($ch);
This also gets rid of the missing space between echo and curl_exec in your example (although that should throw a fatal error and stop execution; do you have your own error handler?).
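Note that curl_exec() only returns the response body as a string when CURLOPT_RETURNTRANSFER is set; otherwise it prints the body and returns true/false. So makeRequest() would also need something along these lines:
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // make curl_exec() return the body instead of printing it
$http_body = curl_exec($ch);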
