I am using the ABBYY API for OCR and I want to get the result in a variable for further processing instead of downloading it as a file.
<?php
include_once("dBug.php");
// Name of application you created
$applicationId = 'telianewtest';
// Password should be sent to your e-mail after application was created
$password = 'w0Ye61tWZ6fODm7hIUj9XTeJ';
$fileName = '20080118155747372_Page_2.jpg';
// Get path to file that we are going to recognize
$local_directory=dirname(__FILE__).'/images/';
$filePath = $local_directory.'/'.$fileName;
if(!file_exists($filePath))
{
die('File '.$filePath.' not found.');
}
if(!is_readable($filePath) )
{
die('Access to file '.$filePath.' denied.');
}
// Recognize with English language and export to XML
// You can use combination of languages like ?language=english,russian or
// ?language=english,french,dutch
// For details, see API reference for processImage method
$url = 'http://cloud.ocrsdk.com/processImage?language=english&exportFormat=xml';
// Send HTTP POST request and get XML response
$curlHandle = curl_init();
curl_setopt($curlHandle, CURLOPT_URL, $url);
curl_setopt($curlHandle, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curlHandle, CURLOPT_USERPWD, "$applicationId:$password");
curl_setopt($curlHandle, CURLOPT_POST, 1);
curl_setopt($curlHandle, CURLOPT_USERAGENT, "PHP Cloud OCR SDK Sample");
$post_array = array(
// The "@" prefix marks a file upload in old PHP versions; PHP 5.5+ provides CURLFile for this
"my_file"=>"@".$filePath,
);
curl_setopt($curlHandle, CURLOPT_POSTFIELDS, $post_array);
$response = curl_exec($curlHandle);
if($response === false) {
$errorText = curl_error($curlHandle);
curl_close($curlHandle);
die($errorText);
}
$httpCode = curl_getinfo($curlHandle, CURLINFO_HTTP_CODE);
curl_close($curlHandle);
// Parse xml response
$xml = simplexml_load_string($response);
if($httpCode != 200) {
if(property_exists($xml, "message")) {
die($xml->message);
}
die("unexpected response ".$response);
}
$arr = $xml->task[0]->attributes();
$taskStatus = $arr["status"];
if($taskStatus != "Queued") {
die("Unexpected task status ".$taskStatus);
}
// Task id
$taskid = $arr["id"];
// 4. Get task information in a loop until task processing finishes
// 5. If response contains "Completed" status - extract url with result
// 6. Download recognition result (text) and display it
$url = 'http://cloud.ocrsdk.com/getTaskStatus';
$qry_str = "?taskid=$taskid";
// Check task status in a loop until it is finished
// TODO: support states indicating error
while(true)
{
sleep(5);
$curlHandle = curl_init();
curl_setopt($curlHandle, CURLOPT_URL, $url.$qry_str);
curl_setopt($curlHandle, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curlHandle, CURLOPT_USERPWD, "$applicationId:$password");
curl_setopt($curlHandle, CURLOPT_USERAGENT, "PHP Cloud OCR SDK Sample");
$response = curl_exec($curlHandle);
$httpCode = curl_getinfo($curlHandle, CURLINFO_HTTP_CODE);
curl_close($curlHandle);
// parse xml
$xml = simplexml_load_string($response);
if($httpCode != 200) {
if(property_exists($xml, "message")) {
die($xml->message);
}
die("Unexpected response ".$response);
}
$arr = $xml->task[0]->attributes();
$taskStatus = $arr["status"];
if($taskStatus == "Queued" || $taskStatus == "InProgress") {
// continue waiting
continue;
}
if($taskStatus == "Completed") {
// exit this loop and proceed to handling the result
break;
}
if($taskStatus == "ProcessingFailed") {
die("Task processing failed: ".$arr["error"]);
}
die("Unexpected task status ".$taskStatus);
}
// Result is ready. Download it
$url = $arr["resultUrl"];
$curlHandle = curl_init();
curl_setopt($curlHandle, CURLOPT_URL, $url);
curl_setopt($curlHandle, CURLOPT_RETURNTRANSFER, 1);
// Warning! This is for easier out-of-the box usage of the sample only.
// The URL to the result has https:// prefix, so SSL is required to
// download from it. For whatever reason PHP runtime fails to perform
// a request unless SSL certificate verification is off.
curl_setopt($curlHandle, CURLOPT_SSL_VERIFYPEER, false);
$response = curl_exec($curlHandle);
curl_close($curlHandle);
// Let the user download the XML result
header('Content-type: application/txt');
header('Content-Disposition: attachment; filename="file.xml"');
echo $response;
?>
I tried to access the $xml variable with no success. Any ideas?
Thank you in advance.
(I have included the password since it's a demo account; you can check it out if you want.)
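One way to approach this, as a rough sketch: after the last curl_exec() in the script above, $response already holds the exported XML as a string (because CURLOPT_RETURNTRANSFER is set), so the two header() calls and the echo can be dropped and the string parsed instead. The helper name parseOcrResult and the tiny sample XML below are made up for illustration. The real export has its own element names and may declare an XML namespace, in which case the XPath query would need adjusting.

```php
<?php
// Hypothetical helper: turn the downloaded XML string into a SimpleXMLElement.
function parseOcrResult($response)
{
    if (!is_string($response) || $response === '') {
        throw new RuntimeException('Empty OCR response');
    }
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($response);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if ($xml === false) {
        throw new RuntimeException('OCR response is not valid XML');
    }
    return $xml;
}

// Stand-in for the $response returned by the final curl_exec() call.
// The element names here are invented sample data, not the real schema.
$response = '<document><page><line>Hello</line><line>World</line></page></document>';

$xml = parseOcrResult($response);

// Collect the text of every <line> element into a plain PHP array.
$lines = array();
foreach ($xml->xpath('//line') as $line) {
    $lines[] = trim((string) $line);
}
$text = implode("\n", $lines);
```

From here $xml, $lines and $text are ordinary variables that the rest of the script can work with.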
I wrote a PHP function that downloads some (.exe) files using the curl extension. The file downloads successfully, but when I try to open it I get a "not compatible" error. When I open it in Notepad++ I see '200' added to the beginning of the file. I can't understand where this '200' comes from.
Here is my function:
$source = isset($_GET['link']) ? $_GET['link'] : ''; #get the download link
$filename = isset($_GET['name']) ? $_GET['name'] : 'download.exe'; # define name
if($source != '')
{
$handle = curl_init($source);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, TRUE);
/* Get the HTML or whatever is linked in $url. */
$response = curl_exec($handle);
/* Check for 403 (forbidden). */
$httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
if($httpCode == 403) {
echo "<h2> <font color='red'> Sorry you are not allowed to download that file.</font></h2>";
} else {
header("Content-Disposition: attachment; filename=\"{$filename}\"");
#header("Content-Disposition: attachment; filename=\"uploaded.pdf\"");
// Get a FILE url to my test document
$url= str_replace(" ","%20", $source);
$ch= curl_init($url);
#curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
curl_exec($ch);
curl_close ($ch);
}
curl_close($handle);
}
else {
echo "error";
}
Set CURLOPT_HEADER to false, like this:
curl_setopt($ch, CURLOPT_HEADER, false);
This keeps the HTTP response headers out of the output, so you should no longer get the '200' in your file.
There is a similar question here on SO.
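To illustrate what CURLOPT_HEADER changes, here is a small sketch. When the option is on, curl_exec() returns the status line and headers in front of the body, and curl_getinfo($ch, CURLINFO_HEADER_SIZE) reports how many bytes of that prefix to cut off. The helper name splitHeadersAndBody and the sample response are hypothetical, and whether headers are really the source of the stray '200' in the question is not certain.

```php
<?php
// Hypothetical helper: separate the header block from the body of a raw
// response that was fetched with CURLOPT_HEADER enabled.
// $headerSize is the value of curl_getinfo($ch, CURLINFO_HEADER_SIZE).
function splitHeadersAndBody($raw, $headerSize)
{
    return array(
        'headers' => (string) substr($raw, 0, $headerSize),
        'body'    => (string) substr($raw, $headerSize),
    );
}

// Invented sample data standing in for a real curl_exec() result.
$headers = "HTTP/1.1 200 OK\r\nContent-Type: application/octet-stream\r\n\r\n";
$raw     = $headers . "MZ-binary-data";

$parts = splitHeadersAndBody($raw, strlen($headers));
// $parts['body'] now contains only the file content.
```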
I need a way to check whether a tweet exists. I have a link to the tweet, like https://twitter.com/darknille/status/355651101657280512 . I would prefer a fast check (a HEAD request only, without retrieving the body of the page), so I tried something like this:
function if_curl_exists($url)
{
$resURL = curl_init();
curl_setopt($resURL, CURLOPT_URL, $url);
curl_setopt($resURL, CURLOPT_BINARYTRANSFER, 1);
curl_setopt($resURL, CURLOPT_HEADERFUNCTION, 'curlHeaderCallback');
curl_setopt($resURL, CURLOPT_FAILONERROR, 1);
$x = curl_exec ($resURL);
//var_dump($x);
echo $intReturnCode = curl_getinfo($resURL, CURLINFO_HTTP_CODE);
curl_close ($resURL);
if ($intReturnCode != 200 && $intReturnCode != 302 && $intReturnCode != 304) {
return false;
}
else return true;
}
or like this
function if_curl_exists_1($url)
{
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_NOBODY, true);//head request
$result = curl_exec($curl);
$ret = false;
if ($result !== false) {
//if request was ok, check response code
echo $statusCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
if ($statusCode == 200) {
$ret = true;
}
}
curl_close($curl);
return $ret;
}
But in both cases curl_exec() returns null, so there is no HTTP status code to check.
The other way is to use the Twitter API, for example GET statuses/show/:id (https://dev.twitter.com/docs/api/1.1/get/statuses/show/%3Aid), but there is no special return value if the tweet doesn't exist, as noted here: https://dev.twitter.com/discussions/8802
I need advice on the fastest way to check. I am doing this in PHP.
You probably have to set the return-transfer flag:
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
If the request returns a 30x status, you probably have to add the follow-location flag as well:
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
You can use @get_headers. It returns an array in which the first item holds the response status line:
$response = @get_headers($url);
print_r($response[0]);
if($response[0]=='HTTP/1.0 404 Not Found'){
echo 'Not Found';
}else{
echo 'Found';
}
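Comparing the whole status line against 'HTTP/1.0 404 Not Found' stops matching as soon as the server answers with HTTP/1.1 or a different reason phrase. A slightly more robust sketch is to pull the numeric code out of the status line. The helper name statusCodeFromStatusLine is hypothetical.

```php
<?php
// Hypothetical helper: extract the numeric status code from a status line
// such as the first element returned by get_headers().
function statusCodeFromStatusLine($statusLine)
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $statusLine, $matches)) {
        return (int) $matches[1];
    }
    return null; // not a recognizable status line
}

$code = statusCodeFromStatusLine('HTTP/1.1 404 Not Found');
echo ($code === 404) ? 'Not Found' : 'Found';
```

With a real URL you would pass in $response[0] from the get_headers() call, after checking that get_headers() did not return false.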
I'm trying to use curl to do a simple GET with one parameter called redirect_uri. The PHP file that gets called prints an empty string for $_GET["redirect_uri"]: it shows red= and it seems like nothing is being sent.
Code to do the GET:
//Get code from login and display it
$ch = curl_init();
$url = 'http://www.besttechsolutions.biz/projects/facebook/testget.php';
//set the url, number of POST vars, POST data
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_GET,1);
curl_setopt($ch,CURLOPT_GETFIELDS,"redirect_uri=my return url");
//execute post
print "new reply 2 <br>";
$result = curl_exec($ch);
print $result;
// print "<br> <br>";
// print $fields_string;
die("hello");
The testget.php file:
<?php
print "red-";
print $_GET["redirect_uri"];
?>
This is how I usually do GET requests; hopefully it will help you:
// create curl resource
$ch = curl_init();
//return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Follow redirects
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
// Set maximum redirects
curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
// Allow a max of 5 seconds.
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
// set url
if( count($params) > 0 ) {
$query = http_build_query($params);
curl_setopt($ch, CURLOPT_URL, "$url?$query");
} else {
curl_setopt($ch, CURLOPT_URL, $url);
}
// $output contains the output string
$output = curl_exec($ch);
// Check for errors and such.
$info = curl_getinfo($ch);
$errno = curl_errno($ch);
if( $output === false || $errno != 0 ) {
// Do error checking
} else if($info['http_code'] != 200) {
// Got a non-200 error code.
// Do more error checking
}
// close curl resource to free up system resources
curl_close($ch);
return $output;
In this code, $params is an array in which each key is a parameter name and each value is that parameter's value.
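As far as I know there are no CURLOPT_GET or CURLOPT_GETFIELDS options in PHP's curl extension, so for a GET request the parameters have to go into the URL itself. A minimal sketch of just that step, with http_build_query() doing the encoding (the helper name buildGetUrl and the example.com URL are placeholders):

```php
<?php
// Hypothetical helper: append query parameters to a URL for a GET request.
function buildGetUrl($url, array $params)
{
    if (count($params) === 0) {
        return $url;
    }
    // Use "&" if the URL already carries a query string, "?" otherwise.
    $separator = (strpos($url, '?') === false) ? '?' : '&';
    return $url . $separator . http_build_query($params);
}

$fullUrl = buildGetUrl(
    'http://www.example.com/testget.php',
    array('redirect_uri' => 'my return url')
);
// The space characters are URL-encoded, so the receiving script
// sees the original value in $_GET["redirect_uri"].
```

The resulting $fullUrl is what you would pass to CURLOPT_URL.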
I am trying to log in to a site and then call numerous URLs to get the source and scrape for images. It works fine using regular curl, but when I try to use multi_curl I get back the exact same response. So that I only have to log in once, I am reusing the curl resource (this works fine with regular curl), and I think this may be the reason why it returns the same response.
Does anyone know how to use multi_curl but authenticate first?
Here is the code I am using:
<?php
// LICENSE: PUBLIC DOMAIN
// The author disclaims copyright to this source code.
// AUTHOR: Shailesh N. Humbad
// SOURCE: http://www.somacon.com/p539.php
// DATE: 6/4/2008
// index.php
// Run the parallel get and print the total time
$s = microtime(true);
// Define the URLs
$urls = array(
"http://localhost/r.php?echo=request1",
"http://localhost/r.php?echo=request2",
"http://localhost/r.php?echo=request3"
);
$pg = new ParallelGet($urls);
print "<br />total time: ".round(microtime(true) - $s, 4)." seconds";
// Class to run parallel GET requests and return the transfer
class ParallelGet
{
function __construct($urls)
{
// Create get requests for each URL
$mh = curl_multi_init();
$count = 0;
$ch = curl_init();
foreach($urls as $i => $url)
{
$count++;
if($count == 1)
{
// SET URL FOR THE POST FORM LOGIN
curl_setopt($ch, CURLOPT_URL, 'https://www.example.com/login.php');
// ENABLE HTTP POST
curl_setopt ($ch, CURLOPT_POST, 1);
// SET POST PARAMETERS : FORM VALUES FOR EACH FIELD
curl_setopt ($ch, CURLOPT_POSTFIELDS, 'user=myuser&password=mypassword');
// IMITATE CLASSIC BROWSER'S BEHAVIOUR : HANDLE COOKIES
curl_setopt ($ch, CURLOPT_COOKIEJAR, realpath($_SERVER['DOCUMENT_ROOT']) . '/cookie.txt');
# Setting CURLOPT_RETURNTRANSFER variable to 1 will force cURL
# not to print out the results of its query.
# Instead, it will return the results as a string return value
# from curl_exec() instead of the usual true/false.
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
// EXECUTE 1st REQUEST (FORM LOGIN)
curl_exec ($ch);
}
$ch = curl_init($url);
curl_setopt ($ch, CURLOPT_COOKIEFILE, realpath($_SERVER['DOCUMENT_ROOT']) . '/cookie.txt');
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
$ch_array[$i] = $ch;
curl_multi_add_handle($mh, $ch_array[$i]);
}
// Start performing the request
do {
$execReturnValue = curl_multi_exec($mh, $runningHandles);
} while ($execReturnValue == CURLM_CALL_MULTI_PERFORM);
// Loop and continue processing the request
while ($runningHandles && $execReturnValue == CURLM_OK) {
// Wait forever for network
$numberReady = curl_multi_select($mh);
if ($numberReady != -1) {
// Pull in any new data, or at least handle timeouts
do {
$execReturnValue = curl_multi_exec($mh, $runningHandles);
} while ($execReturnValue == CURLM_CALL_MULTI_PERFORM);
}
}
// Check for any errors
if ($execReturnValue != CURLM_OK) {
trigger_error("Curl multi read error $execReturnValue\n", E_USER_WARNING);
}
// Extract the content
foreach($urls as $i => $url)
{
// Check for errors
$curlError = curl_error($ch_array[$i]);
if($curlError == "") {
$res[$i] = curl_multi_getcontent($ch_array[$i]);
} else {
print "Curl error on handle $i: $curlError\n";
}
// Remove and close the handle
curl_multi_remove_handle($mh, $ch_array[$i]);
curl_close($ch_array[$i]);
}
// Clean up the curl_multi handle
curl_multi_close($mh);
// Print the response data
print_r($res);
}
}
?>
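You need to enable cookies with curl as well. Look it up in the documentation, and don't forget to create the cookie file (an empty file) with read and write permission for curl.
$cookie = tempnam ("/tmp", "CURLCOOKIE");
$ch = curl_init();
curl_setopt( $ch, CURLOPT_URL, $url );
// Write received cookies to this file when the handle is closed
curl_setopt( $ch, CURLOPT_COOKIEJAR, $cookie );
// Read cookies from the same file when sending requests
curl_setopt( $ch, CURLOPT_COOKIEFILE, $cookie );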
How can I check whether a URL exists or returns a 404 error, using PHP?
<?php
$url = "http://www.faressoft.org/";
?>
If you have allow_url_fopen, you can do:
$exists = ($fp = fopen("http://www.faressoft.org/", "r")) !== FALSE;
if ($fp) fclose($fp);
Strictly speaking, though, this returns false for other failures too, not only for 404 errors. It's possible to use stream contexts to get that information, but a better option is to use the curl extension:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/notfound");
curl_setopt($ch, CURLOPT_NOBODY, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_exec($ch);
$is404 = curl_getinfo($ch, CURLINFO_HTTP_CODE) == 404;
curl_close($ch);
The simplest way to check for 404, 200, and so on:
<?php
$mylink="http://site.com";
$handler = curl_init($mylink);
curl_setopt($handler, CURLOPT_RETURNTRANSFER, TRUE);
$re = curl_exec($handler);
$httpcdd = curl_getinfo($handler, CURLINFO_HTTP_CODE);
if ($httpcdd == '404')
{ echo 'it is 404';}
else {echo 'it is not 404';}
?>
You could use curl, which is available as a PHP extension. With curl, you can query the page and then check for this error code:
CURLE_HTTP_RETURNED_ERROR (22)
This is returned if CURLOPT_FAILONERROR is set TRUE and the HTTP server returns an error code that is >= 400.
From the CURL documentation at php.net:
<?php
// Create a curl handle to a non-existing location
$ch = curl_init('http://404.php.net/');
// Execute
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_exec($ch);
// Check if any error occurred
if(curl_errno($ch))
{
echo 'Curl error: ' . curl_error($ch);
}
// Close handle
curl_close($ch);
?>
http://www.php.net/manual/en/function.curl-errno.php
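Putting the CURLOPT_FAILONERROR description together with curl_errno(), a rough sketch could look like the following. The function names describeCurlResult and checkUrl are hypothetical. The sketch assumes that CURLINFO_HTTP_CODE is still populated when CURLOPT_FAILONERROR turns the response into an error, which is worth verifying on your own setup.

```php
<?php
// Pure helper: interpret the values returned by curl_errno() and
// curl_getinfo($ch, CURLINFO_HTTP_CODE).
function describeCurlResult($errno, $httpCode)
{
    $httpReturnedError = 22; // numeric value of CURLE_HTTP_RETURNED_ERROR
    if ($errno === 0) {
        return 'ok';
    }
    if ($errno === $httpReturnedError) {
        return ($httpCode === 404) ? 'not found' : 'http error';
    }
    return 'transport error'; // DNS failure, timeout, refused connection, ...
}

// Hypothetical wrapper; needs the curl extension and network access.
function checkUrl($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD request, skip the body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // do not print anything
    curl_setopt($ch, CURLOPT_FAILONERROR, true);    // HTTP >= 400 becomes a curl error
    curl_exec($ch);
    $result = describeCurlResult(
        curl_errno($ch),
        (int) curl_getinfo($ch, CURLINFO_HTTP_CODE)
    );
    curl_close($ch);
    return $result;
}
```

Calling checkUrl() on a URL then yields one of the four strings above, so a missing page can be told apart from a network problem.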