I am trying to get sentiment scores of random product descriptions from a CSV file. I'm running into a problem that I think comes down to API response time. I'm not sure whether I'm traversing the CSV or calling the API incorrectly or inefficiently, but it takes a long time to get results for all 300+ entries in the CSV, and whenever I push new changes to my codebase I have to wait for the API to re-evaluate every entry. Here is the code I wrote for loading the CSV file and getting the sentiment scores:
<?php
set_time_limit(500); // extended timeout due to slow / overwhelmed API response
function extract_file($csv) { // CSV to array function
$file = fopen($csv, 'r');
while (!feof($file)) {
$lines[] = fgetcsv($file, 1000, ',');
}
fclose($file);
return $lines;
}
$the_file = 'dataset.csv';
$csv_data = extract_file($the_file);
$response_array = []; // array container to hold returned sentiment values for the product descriptions
for($x = 1; $x < count($csv_data) - 1; $x++) { // loop through all descriptions
echo $x; // show iteration
$api_text = $csv_data[$x][1];
$api_text = str_replace('&', ' and ', $api_text); // removing escape sequence characters, '&' breaks the api :)
$api_text = str_replace(" ", "%20", $api_text); // serializing string
$text = 'text=';
$text .=$api_text; // serializing string further for the API
//echo 'current text1: ', $api_text;
$curl = curl_init(); // API request init
curl_setopt_array($curl, [
CURLOPT_URL => "https://text-sentiment.p.rapidapi.com/analyze",
CURLOPT_RETURNTRANSFER => true,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_ENCODING => "",
CURLOPT_MAXREDIRS => 10,
CURLOPT_TIMEOUT => 30,
CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
CURLOPT_CUSTOMREQUEST => "POST",
CURLOPT_POSTFIELDS => $text,
CURLOPT_HTTPHEADER => [
"X-RapidAPI-Host: text-sentiment.p.rapidapi.com",
"X-RapidAPI-Key: <snip>",
"content-type: application/x-www-form-urlencoded"
],
]);
$response = curl_exec($curl);
$err = curl_error($curl);
curl_close($curl);
if ($err) {
echo "cURL Error #:" . $err;
} else {
echo $response;
}
$json = json_decode($response, true); // convert response to JSON format
if(isset($json["pos"]) == false) { // catching response error 100, makes array faulty otherwise
continue;
}
else {
array_push($response_array, array($x, "+" => $json["pos"], "-" => $json["neg"])); // appends array with sentiment values at current index
}
}
echo "<br>";
echo "<br> results: ";
echo "<p>";
for ($y = 0; $y < count($response_array); $y++){ // prints out all the sentiment values
echo "<br>";
echo print_r($response_array[$y]);
echo "<br>";
}
echo "</p>";
echo "<br>the most negative description: ";
$max_neg = array_keys($response_array, max(array_column($response_array, '-')));
//$max_neg = max(array_column($response_array, '-'));
echo print_r($csv_data[$max_neg[0]]);
echo "<br>the most positive description: ";
$max_pos = array_keys($response_array, max(array_column($response_array, '+')));
echo print_r($csv_data[$max_pos[0]]);
?>
This snippet aims to find the most negative and most positive sentiment in the description column of the CSV and print the matching rows by index. I'm only interested in the descriptions with the highest number of positive and negative sentiment words, not the percentage of the overall sentiment.
The file can be found in this git repo
Thanks for any suggestions
This can be achieved by creating a cache file.
This solution creates a file cache.json that contains the results from the API, using the product name as the key for each entry.
On subsequent calls, it will use the cache value if it exists.
set_time_limit(500);
function file_put_json($file, $data)
{
$json = json_encode($data, JSON_PRETTY_PRINT);
file_put_contents($file, $json);
}
function file_get_json($file, $as_array=false)
{
return json_decode(file_get_contents($file), $as_array);
}
function file_get_csv($file, $header_row=true)
{
$handle = fopen($file, 'r');
if ($header_row === true)
$header = fgetcsv($handle);
$array = [];
while ($row = fgetcsv($handle)) {
if ($header_row === true) {
$array[] = array_combine($header, array_map('trim', $row));
} else {
$array[] = array_map('trim', $row);
}
}
fclose($handle);
return $array;
}
function call_sentiment_api($input)
{
$text = 'text=' . $input;
$curl = curl_init();
curl_setopt_array($curl, [
CURLOPT_URL => "https://text-sentiment.p.rapidapi.com/analyze",
CURLOPT_RETURNTRANSFER => true,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_ENCODING => "",
CURLOPT_MAXREDIRS => 10,
CURLOPT_TIMEOUT => 30,
CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
CURLOPT_CUSTOMREQUEST => "POST",
CURLOPT_POSTFIELDS => $text,
CURLOPT_HTTPHEADER => [
"X-RapidAPI-Host: text-sentiment.p.rapidapi.com",
"X-RapidAPI-Key: <snip>",
"content-type: application/x-www-form-urlencoded"
],
]);
$response = curl_exec($curl);
$err = curl_error($curl);
curl_close($curl);
if ($err) {
throw new Exception("cURL Error #:" . $err);
}
return $response;
}
$csv_data = file_get_csv('dataset.csv');
if (file_exists('cache.json')) {
$cache_data = file_get_json('cache.json', true);
} else {
$cache_data = [];
}
$cache_names = array_keys($cache_data);
$output = [];
foreach ($csv_data as $csv) {
$product_name = $csv['name'];
echo $product_name . '...';
if (in_array($product_name, $cache_names)) {
echo 'CACHED...' . PHP_EOL;
continue;
}
$description = urlencode(str_replace('&', ' and ', $csv['description']));
$response = call_sentiment_api($description);
echo 'API...' . PHP_EOL;
$json = json_decode($response, true);
$cache_data[$product_name] = $json;
}
file_put_json('cache.json', $cache_data);
echo 'SAVE CACHE!' . PHP_EOL . PHP_EOL;
$highest_pos = 0;
$highest_neg = 0;
$pos = [];
$neg = [];
foreach ($cache_data as $name => $cache) {
if (!isset($cache['pos']) || !isset($cache['neg'])) {
continue;
}
if ($cache['pos'] > $highest_pos) {
$pos = [$name => $cache];
$highest_pos = $cache['pos'];
}
if ($cache['pos'] === $highest_pos) {
$pos[$name] = $cache;
}
if ($cache['neg'] > $highest_neg) {
$neg = [$name => $cache];
$highest_neg = $cache['neg'];
}
if ($cache['neg'] === $highest_neg) {
$neg[$name] = $cache;
}
}
echo "Most Positive Sentiment: " . $highest_pos . PHP_EOL;
foreach ($pos as $name => $pos_) {
echo "\t" . $name . PHP_EOL;
}
echo PHP_EOL;
echo "Most Negative Sentiment: " . $highest_neg . PHP_EOL;
foreach ($neg as $name => $neg_) {
echo "\t" . $name . PHP_EOL;
}
Results in:
Most Positive Sentiment: 4
X-Grip Lifting Straps - GymBeam
Beta Carotene - GymBeam
Chelated Magnesium - GymBeam
Creatine Crea7in - GymBeam
L-carnitine 1000 mg - GymBeam - 20 tabs
Resistance Band Set - GymBeam
Most Negative Sentiment: 2
Calorie free Ketchup sauce 320 ml - GymBeam
ReHydrate Hypotonic Drink 1000 ml - GymBeam
Vitamin E 60 caps - GymBeam
Vitamin B-Complex 120 tab - GymBeam
Zero Syrup Hazelnut Choco 350 ml - GymBeam
Bio Psyllium - GymBeam
Zero calorie Vanilla Syrup - GymBeam
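One variation on the cache above, as a rough sketch only: key the cache by a hash of the description text instead of the product name, so that an edited description gets re-evaluated and two products with the same name do not collide. The names cached_sentiment() and $fetch are hypothetical; $fetch stands in for a wrapper around call_sentiment_api() that returns the decoded array.

```php
<?php
// Hypothetical helper: look a description up in a JSON file cache and
// call $fetch (e.g. a wrapper around call_sentiment_api) only on a miss.
function cached_sentiment(string $cacheFile, string $description, callable $fetch): array
{
    $cache = [];
    if (is_file($cacheFile)) {
        $decoded = json_decode(file_get_contents($cacheFile), true);
        if (is_array($decoded)) {
            $cache = $decoded;
        }
    }
    $key = md5($description); // the key changes whenever the text changes
    if (!array_key_exists($key, $cache)) {
        $cache[$key] = $fetch($description);
        file_put_contents($cacheFile, json_encode($cache, JSON_PRETTY_PRINT));
    }
    return $cache[$key];
}
```

This version rereads and rewrites the file on every call, which should be fine for a few hundred rows; for larger files it would make more sense to load once and save once, as the code above does.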
You need to know where the time is going.
Start with identifying where the time goes in the curl request.
My guess is the API response time.
If that's the case I have a solution. Meanwhile I will get the "multi-tasking" code I use to do simultaneous curl requests.
curl has the timing you need. It looks like this:
'total_time' => 0.029867,
'namelookup_time' => 0.000864,
'connect_time' => 0.001659,
'pretransfer_time' => 0.00988,
'size_upload' => 0.0,
'size_download' => 8300.0,
'speed_download' => 277898.0,
'speed_upload' => 0.0,
Just add a couple of lines of code
$response = curl_exec($curl);
$info = var_export(curl_getinfo($curl),true);
file_put_contents('timing.txt',$info,FILE_APPEND);
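To read those numbers more easily, here is a small sketch that turns the cumulative timings into per-phase durations. It assumes the array comes straight from curl_getinfo() and also uses the starttransfer_time key, which curl_getinfo() provides but which is not shown in the partial listing above. The function name timing_breakdown() is made up for this example.

```php
<?php
// Split the cumulative timings from curl_getinfo() into per-phase durations.
// Input is the array returned by curl_getinfo($curl); output is in seconds.
function timing_breakdown(array $info): array
{
    return [
        'dns'         => $info['namelookup_time'],
        'connect'     => $info['connect_time'] - $info['namelookup_time'],
        'tls_setup'   => $info['pretransfer_time'] - $info['connect_time'],
        'server_wait' => $info['starttransfer_time'] - $info['pretransfer_time'],
        'download'    => $info['total_time'] - $info['starttransfer_time'],
    ];
}
```

If server_wait dominates, the time is going to the API itself rather than to DNS or the connection, and caching or parallel requests are the things to look at.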
Running simultaneous curl sockets.
Put your curl in curl.php
// The API expects a form field named "text", so rebuild it from the query string
$text = 'text=' . urlencode($_GET['text']);
$curl = curl_init();
curl_setopt_array($curl, [
    CURLOPT_URL => "https://text-sentiment.p.rapidapi.com/analyze",
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_ENCODING => "",
    CURLOPT_MAXREDIRS => 10,
    CURLOPT_TIMEOUT => 30,
    CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
    CURLOPT_CUSTOMREQUEST => "POST",
    CURLOPT_POSTFIELDS => $text,
    CURLOPT_HTTPHEADER => [
        "X-RapidAPI-Host: text-sentiment.p.rapidapi.com",
        "X-RapidAPI-Key: <snip>",
        "content-type: application/x-www-form-urlencoded"
    ],
]);
// Print the API response so the calling socket can read it
echo curl_exec($curl);
curl_close($curl);
This code goes in your CSV loop to create all the URL query fields to pass to curl.php (e.g. http://127.0.0.1/curl.php?text=$text)
$query = urlencode($text);
$urls[] = array('host' => "127.0.0.1", 'path' => "/curl.php?text=$query");
Then process all the URLs.
$sockets = array();
$err = '';
foreach ($urls as $id => $url) {
    $host = $url['host'];
    $path = $url['path'];
    $http = "GET $path HTTP/1.0\r\nHost: $host\r\n\r\n";
    $stream = stream_socket_client("$host:80", $errno, $errstr, 120, STREAM_CLIENT_ASYNC_CONNECT|STREAM_CLIENT_CONNECT);
    if ($stream) {
        $sockets[$id] = $stream; // supports multiple sockets, keyed like $urls
        fwrite($stream, $http);
    }
    else {
        $err .= "$id Failed<br>\n";
    }
}
Then monitor the sockets and read the response from each one.
Close each socket once its response is complete, and repeat until all of them are done.
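$timeout = 30;       // seconds to wait for socket activity (assumed value, adjust as needed)
$buffer_size = 8192; // bytes to read per call (assumed value, adjust as needed)
$results = array();
$closed = array();
while (count($sockets)) {
    $read = $sockets;
    $write = NULL;   // stream_select() takes these by reference, so they must be variables
    $except = NULL;
    stream_select($read, $write, $except, $timeout);
    if (count($read)) {
        foreach ($read as $r) {
            $id = array_search($r, $sockets);
            $data = fread($r, $buffer_size);
            if (strlen($data) == 0) {
                // echo "$id Closed: " . date('h:i:s') . "\n\n\n";
                $closed[$id] = microtime(true);
                fclose($r);
                unset($sockets[$id]);
            }
            else {
                if (!isset($results[$id])) {
                    $results[$id] = '';
                }
                $results[$id] .= $data;
            }
        }
    }
    else {
        // echo 'Timeout: ' . date('h:i:s') . "\n\n\n";
        break;
    }
}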
Then all your results are in $results[].
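Each entry in $results is the raw HTTP response, headers included. A minimal sketch for pulling out the status code and body follows; it assumes an HTTP/1.0 style response without chunked encoding (which matches the request sent above), and split_http_response() is a name invented for this example.

```php
<?php
// Split one raw HTTP response (as collected from the socket) into
// its status code, header block, and body.
function split_http_response(string $raw): array
{
    $parts = explode("\r\n\r\n", $raw, 2); // headers end at the first blank line
    $head = $parts[0];
    $body = $parts[1] ?? '';
    $status = 0;
    if (preg_match('#^HTTP/\d(?:\.\d)?\s+(\d{3})#', $head, $m)) {
        $status = (int) $m[1];
    }
    return ['status' => $status, 'headers' => $head, 'body' => $body];
}
```

The body can then go through json_decode($parsed['body'], true) the same way as the response from a direct curl call.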
Related
I am trying to get responses from multiple URLs using API calls and change a color based on each response. I want to fetch the responses from all the URLs at the same time, but the program only picks up the last URL in the array.
I also tried putting the URLs in an array and looping over them, but that didn't work. Below is my code.
<?php
// An array of URLs to make requests to
$urls = array(
'https://example.com/v1/objects/services?service=1',
'https://example.com/v1/objects/services?service=2'
);
// Create a new cURL multi handle
$mh = curl_multi_init();
$username = "root";
$password = "apipass";
$headers = array(
'Accept: application/json',
'X-HTTP-Method-Override: GET'
);
// Add each URL to the cURL multi handle
foreach ($urls as $i => $url) {
$ch[$i] = curl_init();
curl_setopt_array($ch[$i], array(
CURLOPT_URL => $url,
CURLOPT_HTTPHEADER => $headers,
CURLOPT_USERPWD => $username . ":" . $password,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_SSL_VERIFYHOST => 0,
CURLOPT_SSL_VERIFYPEER => 0
));
curl_multi_add_handle($mh, $ch[$i]);
}
// Execute the cURL multi handle
do {
$status = curl_multi_exec($mh, $active);
} while ($status === CURLM_CALL_MULTI_PERFORM || $active);
// Get the responses from each handle
foreach ($urls as $i => $url) {
$response[$i] = curl_multi_getcontent($ch[$i]);
}
// Close the cURL multi handle and each individual handle
foreach ($urls as $i => $url) {
curl_multi_remove_handle($mh, $ch[$i]);
}
curl_multi_close($mh);
foreach ($response as $i => $response) {
// check response and change color based on it
if(strpos($response, "Unreachable") !== false) {
$color[$i] = 'red';
// success response
} else {
$color[$i] = 'green';
}
}
print $status;
print $color[$i];
?>
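One likely reason only the last URL shows up: the final loop reuses $response as both the array and the loop variable, and print $color[$i] runs once, after the loop has finished, so $i is the last index. A hedged sketch of the color step on its own, with canned strings standing in for the curl_multi_getcontent() results (colors_for_responses() is a made-up name):

```php
<?php
// Map each response body to a colour, keeping the array keys so that
// colour $i still belongs to URL $i.
function colors_for_responses(array $responses): array
{
    $colors = [];
    foreach ($responses as $i => $body) {
        $colors[$i] = (strpos((string) $body, 'Unreachable') !== false) ? 'red' : 'green';
    }
    return $colors;
}

// Example with canned bodies standing in for the real responses.
$colors = colors_for_responses(['service 1 OK', 'service 2 Unreachable']);
foreach ($colors as $i => $color) {
    echo "URL $i: $color\n"; // print inside the loop, once per URL
}
```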
Using some code from NHTSA's API, my own code, and ideas from this site, I wrapped everything into a function. It works just fine but would not run on my live server.
On the live server it gave the error below, which I finally traced to the short array syntax, which my live server's PHP version does not support:
Parse error: syntax error, unexpected '[', expecting ')' in /home/pchome/public_html/verify/functions/sitefunctions.php on line 9
which is this line:
$postdata = http_build_query(["data" => $VINS, "format" => "JSON"]);
Changed to this it works and also changed similar code in several other places in the same manner:
$postdata = http_build_query(array("data" => $VINS, "format" => "JSON"));
Occasionally (but not always) I may want to pass multiple VINs to it as a semicolon-separated list. This format is not changeable, so what is needed to add this functionality? (Sample VINs: 3GNDA13D76S000000;5XYKT3A12CG000000)
// Uses NHTSA API to decode VIN(s)
function decodeVINS($VINS) {
if ($VINS) :
$return = "";
$postdata = http_build_query(array("data" => $VINS, "format" => "JSON"));
$stream_options = array(
'http' => array(
'header' => "Content-Type: application/x-www-form-urlencoded\r\n".
"Content-Length: ".strlen($postdata)."\r\n",
'method' => "POST",
'content' => $postdata
)
);
$context = stream_context_create($stream_options);
$apiURL = "https://vpic.nhtsa.dot.gov/api/vehicles/DecodeVINValuesBatch/";
$fp = @fopen($apiURL, 'rb', FALSE, $context);
$results = array_column(json_decode(@stream_get_contents($fp),TRUE), '0');
$results = $results[0];
$output = "<blockquote>\n";
$output .= "<div><strong>VIN: {$results['VIN']}</strong></div>\n";
$output .= "<div><strong>ErrorCode: {$results['ErrorCode']}</strong></div>\n";
if ($results['AdditionalErrorText']) :
$output .= "<div><strong>AdditionalErrorText: {$results['AdditionalErrorText']}</strong></div>\n";
endif;
foreach ($results as $key => $val) :
if ($val && $key != "VIN" && $key != "ErrorCode" && $key != "AdditionalErrorText") :
$output .= "<div>$key: $val</div>";
endif;
endforeach;
$output .= "</blockquote>\n\n";
else :
$output = "Enter VINs above separated by line breaks";
endif;
return $output;
}
. . . and it is outputting something like this:
VIN: JB7FJ43S5KJ000911
ErrorCode: 0 - VIN decoded clean. Check Digit (9th position) is correct
BodyClass: Sport Utility Vehicle (SUV)/Multi Purpose Vehicle (MPV)
DisplacementCC: 3000
DisplacementCI: 183.0712322841
DisplacementL: 3
DriveType: 4WD/4-Wheel Drive/4x4
EngineConfiguration: V-Shaped
EngineCylinders: 6
FuelTypePrimary: Gasoline
GVWR: Class 1C: 4,001 - 5,000 lb (1,814 - 2,268 kg)
Make: DODGE
Manufacturer: MITSUBISHI MOTORS CORPORATION (MMC)
ManufacturerId: 1052
Model: Raider
ModelYear: 1989
PlantCity: Nagoya
PlantCompanyName: Nagoya #3
PlantCountry: Japan
VehicleType: TRUCK
Working with JSON instead of CSV, in my opinion, is going to be much easier/direct/stable.
I have added a parameter ($fields) to the custom function call which will dictate how to isolate and sort your data.
I have also modified the first parameter ($VINs) to be passed as an array instead of a semicolon-delimited string. This I hope simplifies your processing -- if it doesn't, you are welcome to fall back to your original string format and remove my implode(";",$VINs) call.
Code: (Demo)
function searchByVINs ($VINs,$fields) {
// test multi-VIN batching via textarea at bottom of https://vpic.nhtsa.dot.gov/api/
$postdata = http_build_query(["data" => implode(";", $VINs), "format" => "JSON"]);
$stream_options = [
'http' => [
'header' => "Content-Type: application/x-www-form-urlencoded\r\n".
"Content-Length: ".strlen($postdata)."\r\n",
'method' => "POST",
'content' => $postdata
]
];
$context = stream_context_create($stream_options);
$apiURL = "https://vpic.nhtsa.dot.gov/api/vehicles/DecodeVINValuesBatch/";
if (!$fp = @fopen($apiURL, "rb", FALSE, $context)) {
return ["success" => false, "response" => "Unable to open stream"];
}
if (!$response = stream_get_contents($fp)) {
return ["success" => false, "response" => "Unable to receive streamed data"];
}
if(($data = @json_decode($response,true)) === null && json_last_error()!==JSON_ERROR_NONE){
return ["success" => false, "response" => "Unable to parse streamed data"];
}
if (!isset($data["Message"]) || $data["Message"] != "Results returned successfully") {
return ["success" => false, "response" => "Received unsuccessful dataset"];
}
$return = [];
$keys = array_flip($fields);
foreach ($data["Results"] as $dataset) {
$isolated = array_intersect_key($dataset,$keys); // only retain the elements with keys that match $fields values
$sorted = array_replace($keys,$isolated); // order the dataset by order of elements in $fields
$return[] = $sorted;
}
return ["success" => true, "response" => $return];
}
$VINs = ["3GNDA13D76S000000", "5XYKT3A12CG000000"];
$fields = ["VIN", "ModelYear", "Make", "FuelTypePrimary", "DriveType", "BodyClass"];
$response = searchByVINs($VINs,$fields);
if (!$response["success"]) {
echo "Oops, the api call failed. {$response["response"]}";
} else {
foreach ($response["response"] as $item){
echo "<div>";
foreach ($item as $key => $value) {
echo "<div>$key: $value</div>";
}
echo "</div>";
}
}
Output (from mocked demo)
<div>
<div>VIN: 3GNDA13D76S000000</div>
<div>ModelYear: 2006</div>
<div>Make: CHEVROLET</div>
<div>FuelTypePrimary: Gasoline</div>
<div>DriveType: </div>
<div>BodyClass: Wagon</div>
</div>
<div>
<div>VIN: 5XYKT3A12CG000000</div>
<div>ModelYear: 2012</div>
<div>Make: KIA</div>
<div>FuelTypePrimary: Gasoline</div>
<div>DriveType: 4x2</div>
<div>BodyClass: Wagon</div>
</div>
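The isolate-and-sort step in the function above can be tried on its own. This is only a sketch; pick_fields() is a name made up here and the sample row is invented for illustration.

```php
<?php
// Keep only the requested fields of one result row, in the requested order.
function pick_fields(array $row, array $fields): array
{
    $keys = array_flip($fields);                  // ['VIN' => 0, 'Make' => 1, ...]
    $isolated = array_intersect_key($row, $keys); // drop keys that were not requested
    return array_replace($keys, $isolated);       // reorder to match $fields
}

// Invented sample row, not real API output.
$row = ['Make' => 'KIA', 'Other' => 'ignored', 'VIN' => '5XYKT3A12CG000000'];
print_r(pick_fields($row, ['VIN', 'Make']));
```

One caveat: if a requested field is missing from the row, array_replace() leaves the flipped integer index in its place, so you may want to check for that before printing.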
All working now so here is the final version! As needed, shows only rows with values and can handle multiple VINs in one submission. The function is called from a simple form that has a textarea for entering the VINs, along with a Submit button.
// Uses NHTSA API to decode VIN(s)
function decodeVINS($VINS) {
// sample VINs 3GNDA13D76S000000;5XYKT3A12CG000000
if ($VINS) :
$postdata = http_build_query(array("data" => $VINS, "format" => "JSON"));
$stream_options = array(
'http' => array(
'header' => "Content-Type: application/x-www-form-urlencoded\r\n".
"Content-Length: ".strlen($postdata)."\r\n",
'method' => "POST",
'content' => $postdata
)
);
$context = stream_context_create($stream_options);
$apiURL = "https://vpic.nhtsa.dot.gov/api/vehicles/DecodeVINValuesBatch/";
$fp = @fopen($apiURL, 'rb', FALSE, $context);
$returnValue = json_decode(@stream_get_contents($fp),TRUE);
if(!isset($returnValue['Results'])):
echo "Invalid return data or no return data. Exiting";
return FALSE;
endif;
$results = $returnValue['Results'];
if(!is_array($results)):
$results = array($results);
endif;
$output = '';
foreach($results as $result):
$output .= "<blockquote>\n";
$output .= "<div><strong>VIN: {$result['VIN']}</strong></div>\n";
$output .= "<div><strong>ErrorCode: {$result['ErrorCode']}</strong></div>\n";
if ($result['AdditionalErrorText']) :
$output .= "<div><strong>AdditionalErrorText: {$result['AdditionalErrorText']}</strong></div>\n";
endif;
foreach ($result as $key => $val) :
if ($val && $key != "VIN" && $key != "ErrorCode" && $key != "AdditionalErrorText") :
$output .= "<div>$key: $val</div>";
endif;
endforeach;
$output .= "</blockquote>\n\n";
endforeach;
else :
$output = "Enter VINs above separated by line breaks";
endif;
return $output;
}
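Since the form takes VINs in a textarea but the batch endpoint is fed a semicolon-separated string, a small normalizing step before calling decodeVINS() may help. This is a sketch under that assumption; normalize_vin_list() is a hypothetical helper, not part of the code above.

```php
<?php
// Turn free-form input (VINs separated by line breaks, semicolons, commas
// or spaces) into a single semicolon-separated, de-duplicated string.
function normalize_vin_list(string $input): string
{
    $vins = preg_split('/[\s;,]+/', strtoupper($input), -1, PREG_SPLIT_NO_EMPTY);
    return implode(';', array_unique($vins));
}
```

Usage would be something like decodeVINS(normalize_vin_list($_POST['vins'])), where the field name vins is only an example.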
I am using the OpenTok API for audio/video chat and am now trying to record the audio/video chat conference with it.
I have downloaded the API from this link
But the result always comes back empty (checked with var_dump($res);) and the response is an empty array.
Following is my function:
protected function request($method, $url, $opts = null) {
$url = $this->endpoint . $url;
if(($method == 'PUT' || $method == 'POST') && $opts) {
$bodyFormat = $opts->contentType();
$dataString = $opts->dataString();
}
$authString = "X-TB-PARTNER-AUTH: $this->apiKey:$this->apiSecret";
if (function_exists("file_get_contents")) {
$http = array(
'method' => $method
);
$headers = array($authString);
if($method == "POST" || $method == "PUT") {
$headers[1] = "Content-type: " . $bodyFormat;
$headers[2] = "Content-Length: " . strlen($dataString);
$http["content"] = $dataString;
}
$http["header"] = $headers;
$context_source = array ('http' =>$http);
$context = stream_context_create($context_source);
$res = file_get_contents( $url ,true, $context);
var_dump($res);
$statusarr = explode(" ", $http_response_header[0]);
$status = $statusarr[1];
$headers = array();
foreach($http_response_header as $header) {
if(strpos($header, "HTTP/") !== 0) {
$split = strpos($header, ":");
$key = strtolower(substr($header, 0, $split));
$val = trim(substr($header, $split + 1));
$headers[$key] = $val;
}
}
$response = (object)array(
"status" => $status
);
if(strtolower($headers["content-type"]) == "application/json") {
$response->body = json_decode($res);
} else {
$response->body = $res;
}
} else{
throw new OpenTokArchivingRequestException("Your PHP installion doesn't support file_get_contents. Please enable it so that you can make API calls.");
}
return $response;
}
When I print $context_source then I get the following array:
Array
(
[http] => Array
(
[method] => POST
[content] => {"action":"start","sessionId":"1_MX40NTM2MDgxMn4xMjcuMC4wLjF-MTQ0MzcxMDQ0NzU2NH4wWkZ3bkN1NDJaYlNFMFZFZmYwcGZ1a2F-UH4","name":"filename"}
[header] => Array
(
[0] => X-TB-PARTNER-AUTH: 45360812:cbbd8b29be4c75d5aab1945a71bf0cb3443e3939
[1] => Content-type: application/json
[2] => Content-Length: 136
)
)
)
Everything seems fine. Can anyone tell me what I am doing wrong?
I use file_get_contents() on server M to get the response from server X. The request succeeds, but it takes too long.
$url = "http://10.20.30.40";
$opts = array('http' =>
array(
'method' => 'GET',
'header' => 'Connection: close\r\n'
)
);
$context = stream_context_create($opts);
$result = file_get_contents($url, false, $context);
$result = json_decode($result);
$response = parse_http_response_header($http_response_header);
print_r($result);
print_r($response);
/////// below is just function to parse the response ///////
function parse_http_response_header(array $headers)
{
$responses = array();
$buffer = NULL;
foreach ($headers as $header)
{
if ('HTTP/' === substr($header, 0, 5))
{
// add buffer on top of all responses
if ($buffer) array_unshift($responses, $buffer);
$buffer = array();
list($version, $code, $phrase) = explode(' ', $header, 3) + array('', FALSE, '');
$buffer['status'] = array(
'line' => $header,
'version' => $version,
'code' => (int) $code,
'phrase' => $phrase
);
$fields = &$buffer['fields'];
$fields = array();
continue;
}
list($name, $value) = explode(': ', $header, 2) + array('', '');
// header-names are case insensitive
$name = strtoupper($name);
// values of multiple fields with the same name are normalized into
// a comma separated list (HTTP/1.0+1.1)
if (isset($fields[$name]))
{
$value = $fields[$name].','.$value;
}
$fields[$name] = $value;
}
unset($fields); // remove reference
array_unshift($responses, $buffer);
return $responses;
}
Is there any suggestion or function option to get the response (the content and the response code) faster?
(NOTE: I am not allowed to install cURL, so please gimme other option)
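One thing worth checking in the code above, offered as a guess rather than a diagnosis: inside single quotes PHP does not interpret \r\n, so the header sent is not a clean "Connection: close" line, and the request may then wait on the connection. A sketch of the context options with a double-quoted header and an explicit timeout follows; build_get_options() is a made-up name and the 5 second value is arbitrary.

```php
<?php
// Build stream-context options for a GET request. The header uses double
// quotes so that \r\n is a real line ending, and 'timeout' caps the wait.
function build_get_options(float $timeoutSeconds = 5.0): array
{
    return [
        'http' => [
            'method'        => 'GET',
            'header'        => "Connection: close\r\n",
            'timeout'       => $timeoutSeconds, // read timeout in seconds
            'ignore_errors' => true,            // still return the body on 4xx/5xx
        ],
    ];
}

$context = stream_context_create(build_get_options(5.0));
// $result = file_get_contents('http://10.20.30.40', false, $context);
```

This needs no cURL; $http_response_header is still populated after the call, so the existing header parsing can stay as it is.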
I am looking for a way to read multiple (over 50) plain text websites and parse only certain information into an HTML table or a CSV file. When I say "plain text" I mean that while it is a web address, it does not have any HTML associated with it. I am pretty new to this, and was looking for help in seeing how this could be done. This would be an example of the source:
update-token:179999210
vessel-name:Name Here
vessel-length:57.30
vessel-beam:14.63
vessel-draft:3.35
vessel-airdraft:0.00
time:20140104T040648.259Z
position:25.04876667 -75.57001667 GPS
river-mile:sd 178.71
rate-of-turn:0.0
course-over-ground:58.5
speed-over-ground:0.0
ais-367000000 {
pos:45.943912 -87.384763 DGPS
cog:249.8
sog:0.0
name:name here
call:1113391
imo:8856857
type:31
dim:10 20 4 5
draft:3.8
destination:
}
ais-367000000 {
pos:25.949652 -86.384535 DGPS
cog:105.6
sog:0.0
name:CHRISTINE
call:5452438
type:52
status:0
dim:1 2 3 4
draft:3.0
destination:IMTT ST.ROSE
eta:06:00
}
Thanks for any suggestions you guys might have.
I may be completely missing the point here - but here is how you could take the contents (assuming you had them as a string) and put them into a php key/value array. I "hard-coded" the string you had, and changed one value (the key ais-367000000 was repeated, and that makes the second object overwrite the first).
This is a very basic parser that assumes a format like you described above. I give the output below the code:
<?php
echo "<html>";
$s="update-token:179999210
vessel-name:Name Here
vessel-length:57.30
vessel-beam:14.63
vessel-draft:3.35
vessel-airdraft:0.00
time:20140104T040648.259Z
position:25.04876667 -75.57001667 GPS
river-mile:sd 178.71
rate-of-turn:0.0
course-over-ground:58.5
speed-over-ground:0.0
ais-367000000 {
pos:45.943912 -87.384763 DGPS
cog:249.8
sog:0.0
name:name here
call:1113391
imo:8856857
type:31
dim:10 20 4 5
draft:3.8
destination:
}
ais-367000001 {
pos:25.949652 -86.384535 DGPS
cog:105.6
sog:0.0
name:CHRISTINE
call:5452438
type:52
status:0
dim:1 2 3 4
draft:3.0
destination:IMTT ST.ROSE
eta:06:00
}";
$lines = explode("\n", $s);
$output = Array();
$thisElement = & $output;
foreach($lines as $line) {
$elements = explode(":", $line);
if (count($elements) > 1) {
$thisElement[trim($elements[0])] = $elements[1];
}
if(strstr($line, "{")) {
$elements = explode("{", $line);
$key = trim($elements[0]);
$output[$key] = Array();
$thisElement = & $output[$key];
}
if(strstr($line, "}")) {
$thisElement = & $output;
}
}
echo '<pre>';
print_r($output);
echo '</pre>';
echo '</html>';
?>
Output of the above (can be seen working at http://www.floris.us/SO/ships.php):
Array
(
[update-token] => 179999210
[vessel-name] => Name Here
[vessel-length] => 57.30
[vessel-beam] => 14.63
[vessel-draft] => 3.35
[vessel-airdraft] => 0.00
[time] => 20140104T040648.259Z
[position] => 25.04876667 -75.57001667 GPS
[river-mile] => sd 178.71
[rate-of-turn] => 0.0
[course-over-ground] => 58.5
[speed-over-ground] => 0.0
[ais-367000000] => Array
(
[pos] => 45.943912 -87.384763 DGPS
[cog] => 249.8
[sog] => 0.0
[name] => name here
[call] => 1113391
[imo] => 8856857
[type] => 31
[dim] => 10 20 4 5
[draft] => 3.8
[destination] =>
)
[ais-367000001] => Array
(
[pos] => 25.949652 -86.384535 DGPS
[cog] => 105.6
[sog] => 0.0
[name] => CHRISTINE
[call] => 5452438
[type] => 52
[status] => 0
[dim] => 1 2 3 4
[draft] => 3.0
[destination] => IMTT ST.ROSE
[eta] => 06
)
)
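Note that [eta] comes out as 06 rather than 06:00 in the output above, because explode(":", $line) splits on every colon and only the second piece is kept. A small sketch of splitting on the first colon only (split_key_value() is a name made up for this example):

```php
<?php
// Split a "key:value" line on the first colon only, so values that
// contain colons (such as "eta:06:00") are kept whole.
function split_key_value(string $line): ?array
{
    $parts = explode(':', $line, 2);
    if (count($parts) < 2) {
        return null; // not a key:value line (e.g. "}" or a blank line)
    }
    return [trim($parts[0]), trim($parts[1])];
}

var_dump(split_key_value('eta:06:00')); // key "eta", value "06:00"
```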
A better approach would be to turn the string into "properly formed JSON", then use json_decode. That might look like the following:
<?php
echo "<html>";
$s="update-token:179999210
vessel-name:Name Here
vessel-length:57.30
vessel-beam:14.63
vessel-draft:3.35
vessel-airdraft:0.00
time:20140104T040648.259Z
position:25.04876667 -75.57001667 GPS
river-mile:sd 178.71
rate-of-turn:0.0
course-over-ground:58.5
speed-over-ground:0.0
ais-367000000 {
pos:45.943912 -87.384763 DGPS
cog:249.8
sog:0.0
name:name here
call:1113391
imo:8856857
type:31
dim:10 20 4 5
draft:3.8
destination:
}
ais-367000001 {
pos:25.949652 -86.384535 DGPS
cog:105.6
sog:0.0
name:CHRISTINE
call:5452438
type:52
status:0
dim:1 2 3 4
draft:3.0
destination:IMTT ST.ROSE
eta:06:00
}";
echo '<pre>';
print_r(parseString($s));
echo '</pre>';
function parseString($s) {
$lines = explode("\n", $s);
$jstring = "{ ";
$comma = "";
foreach($lines as $line) {
$elements = explode(":", $line);
if (count($elements) > 1) {
$jstring = $jstring . $comma . '"' . trim($elements[0]) . '" : "' . $elements[1] .'"';
$comma = ",";
}
if(strstr($line, "{")) {
$elements = explode("{", $line);
$key = trim($elements[0]);
$jstring = $jstring . $comma . '"' . $key .'" : {';
$comma = "";
}
if(strstr($line, "}")) {
$jstring = $jstring . '} ';
$comma = ",";
}
}
$jstring = $jstring ."}";
return json_decode($jstring);
}
echo '</html>';
?>
Demo at http://www.floris.us/SO/ships2.php ; note that I use the variable $comma to make sure that commas are either included, or not included, at various points in the string.
Output of this code looks similar to what we had before:
stdClass Object
(
[update-token] => 179999210
[vessel-name] => Name Here
[vessel-length] => 57.30
[vessel-beam] => 14.63
[vessel-draft] => 3.35
[vessel-airdraft] => 0.00
[time] => 20140104T040648.259Z
[position] => 25.04876667 -75.57001667 GPS
[river-mile] => sd 178.71
[rate-of-turn] => 0.0
[course-over-ground] => 58.5
[speed-over-ground] => 0.0
[ais-367000000] => stdClass Object
(
[pos] => 45.943912 -87.384763 DGPS
[cog] => 249.8
[sog] => 0.0
[name] => name here
[call] => 1113391
[imo] => 8856857
[type] => 31
[dim] => 10 20 4 5
[draft] => 3.8
[destination] =>
)
[ais-367000001] => stdClass Object
(
[pos] => 25.949652 -86.384535 DGPS
[cog] => 105.6
[sog] => 0.0
[name] => CHRISTINE
[call] => 5452438
[type] => 52
[status] => 0
[dim] => 1 2 3 4
[draft] => 3.0
[destination] => IMTT ST.ROSE
[eta] => 06
)
)
But maybe your question is "how do I get the text into php in the first place". In that case, you might look at something like this:
<?php
$urlstring = file_get_contents('/path/to/urlFile.csv');
$urls = explode("\n", $urlstring); // one url per line
$responses = Array();
// loop over the urls, and get the information
// then parse it into the $responses array
$i = 0;
foreach($urls as $url) {
$responses[$i] = parseString(file_get_contents($url));
$i = $i + 1;
}
function parseString($s) {
$lines = explode("\n", $s);
$jstring = "{ ";
$comma = "";
foreach($lines as $line) {
$elements = explode(":", $line);
if (count($elements) > 1) {
$jstring = $jstring . $comma . '"' . trim($elements[0]) . '" : "' . $elements[1] .'"';
$comma = ",";
}
if(strstr($line, "{")) {
$elements = explode("{", $line);
$key = trim($elements[0]);
$jstring = $jstring . $comma . '"' . $key .'" : {';
$comma = "";
}
if(strstr($line, "}")) {
$jstring = $jstring . '} ';
$comma = ",";
}
}
$jstring = $jstring ."}";
return json_decode($jstring);
}
?>
I include the same parsing function as before; it's possible to make it much better, or leave it out altogether. Hard to know from your question.
Questions welcome.
UPDATE
Based on comments I have added a function that will perform the curl on the file resource; let me know if this works for you. I have created a file http://www.floris.us/SO/ships.txt that is an exact copy of the file you showed above, and a http://www.floris.us/SO/ships3.php that contains the following source code - you can run it and see that it works (note - in this version I don't read anything from a .csv file - you already know how to do that. This is just taking the array, and using it to obtain a text file, then converting it to a data structure you can use - display, whatever):
<?php
$urls = Array();
$urls[0] = "http://www.floris.us/SO/ships.txt";
$responses = Array();
// loop over the urls, and get the information
// then parse it into the $responses array
$i = 0;
foreach($urls as $url) {
// $responses[$i] = parseString(file_get_contents($url));
$responses[$i] = parseString(myCurl($url));
$i = $i + 1;
}
echo '<html><body><pre>';
print_r($responses);
echo '</pre></body></html>';
function parseString($s) {
$lines = explode("\n", $s);
$jstring = "{ ";
$comma = "";
foreach($lines as $line) {
$elements = explode(":", $line);
if (count($elements) > 1) {
$jstring = $jstring . $comma . '"' . trim($elements[0]) . '" : "' . $elements[1] .'"';
$comma = ",";
}
if(strstr($line, "{")) {
$elements = explode("{", $line);
$key = trim($elements[0]);
$jstring = $jstring . $comma . '"' . $key .'" : {';
$comma = "";
}
if(strstr($line, "}")) {
$jstring = $jstring . '} ';
$comma = ",";
}
}
$jstring = $jstring ."}";
return json_decode($jstring);
}
function myCurl($f) {
// create curl resource
$ch = curl_init();
// set url
curl_setopt($ch, CURLOPT_URL, $f);
//return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// $output contains the output string
$output = curl_exec($ch);
// close curl resource to free up system resources
curl_close($ch);
return $output;
}
?>
Note - because two entries have the same "tag", the second one overwrites the first when using the original source data. If that is a problem let me know. Also if you have ideas on how you actually want to display the data, try to mock up something and I can help you get it right.
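If the overwriting is a problem, one possible approach (a sketch only, not what the code above does) is to append every block to a list instead of keying it by its tag. The function name parse_vessel_text() and the 'fields' / 'blocks' / 'id' layout are choices made for this example.

```php
<?php
// Parse the "key:value" / "name { ... }" format into an array. Top-level
// pairs go under 'fields'; every block is appended to 'blocks', so two
// blocks with the same name are both kept.
function parse_vessel_text(string $text): array
{
    $result = ['fields' => [], 'blocks' => []];
    $current = null; // index of the block being filled, or null at top level
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        if (substr($line, -1) === '{') {
            $result['blocks'][] = ['id' => trim(substr($line, 0, -1)), 'fields' => []];
            $current = count($result['blocks']) - 1;
        } elseif ($line === '}') {
            $current = null;
        } elseif (strpos($line, ':') !== false) {
            [$key, $value] = explode(':', $line, 2); // first colon only
            if ($current === null) {
                $result['fields'][trim($key)] = trim($value);
            } else {
                $result['blocks'][$current]['fields'][trim($key)] = trim($value);
            }
        }
    }
    return $result;
}
```

With this layout, building a table is a loop over $result['blocks'], one row per block.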
On the topic of time-outs
There are several possible timeout mechanisms that can be causing you problems; depending on which it is, one of the following solutions may help you:
If the browser doesn't get any response from the server, it will eventually time out. This is almost certainly not your problem right now; but it might become your issue if you fix the other problems
php scripts typically have a built in "maximum time to run" before they decide you sent them into an infinite loop. If you know you will be making lots of requests, and these requests will take a lot of time, you may want to set the time-out higher. See http://www.php.net/manual/en/function.set-time-limit.php for details on how to do this. I would recommend setting the limit to a "reasonable" value inside the curl loop - so the counter gets reset for every new request.
Your attempt to connect to the server may take too long (this is the most likely problem as you said). You can set the value (time you expect to wait to make the connection) to something "vaguely reasonable" like 10 seconds; this means you won't wait forever for the servers that are offline. Use
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
for a 10 second wait. See "Setting Curl's Timeout in PHP".
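To illustrate the second point (resetting the script timer inside the loop), here is a minimal sketch. The name fetchAll and the $fetch callback are assumptions made for the example; $fetch stands in for a function such as myCurl(), and the 60 second default is just a value comfortably above the curl timeouts.

```php
<?php
// Hypothetical wrapper (an assumption for illustration):
// $fetch is any callable that takes a URL and returns the body or ''.
function fetchAll(array $urls, callable $fetch, int $secondsPerRequest = 60): array {
    $responses = [];
    foreach ($urls as $url) {
        // set_time_limit() restarts PHP's execution timer from zero,
        // so each request gets its own budget instead of sharing one
        set_time_limit($secondsPerRequest);
        $body = $fetch($url);
        if (is_string($body) && $body !== '') {
            $responses[$url] = $body; // keep only non-empty responses
        }
    }
    return $responses;
}
```

Called as fetchAll($urls, 'myCurl'), this keeps one slow server from using up the time allowance of all the others.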
Finally you will want to handle the errors gracefully - if the connection did not succeed, you don't want to process the response. Putting all this together gets you something like this:
$i = 0;
foreach($urls as $url) {
$temp = myCurl($url);
if (strlen($temp) == 0) {
echo 'no response from '.$url.'<br>';
}
else {
$responses[$i] = parseString($temp); // reuse the response already fetched instead of requesting the URL a second time
$i = $i + 1;
}
}
echo '<html><body><pre>';
print_r($responses);
echo '</pre></body></html>';
function myCurl($f) {
// create curl resource
$ch = curl_init();
// set url
curl_setopt($ch, CURLOPT_URL, $f);
//return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_NOSIGNAL, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10); // try for 10 seconds to get a connection
curl_setopt($ch, CURLOPT_TIMEOUT, 30); // try for 30 seconds to complete the transaction
// $output contains the output string
$output = curl_exec($ch);
// see if any error was set:
$curl_errno = curl_errno($ch);
// close curl resource to free up system resources
curl_close($ch);
// make response depending on whether there was an error
if($curl_errno > 0) {
return '';
}
else {
return $output;
}
}
Last update? I have updated the code one more time. It now:
Reads a list of URLs from a file (one URL per line - fully formed)
Tries to fetch the contents from each file in turn, handling time-outs and echoing progress to the screen
Creates tables with some of the information from the files (including a reformatted time stamp)
To make this work, I had the following files:
www.floris.us/SO/ships.csv containing three lines with
http://www.floris.us/SO/ships.txt
http://floris.dnsalias.com/noSuchFile.html
http://www.floris.us/SO/ships2.txt
Files ships.txt and ships2.txt at the same location (almost identical copies but for name of ship) - these are like your plain text files.
File ships3.php in the same location. This contains the following source code, that performs the various steps described earlier, and attempts to string it all together:
<?php
$urlstring = file_get_contents('http://www.floris.us/SO/ships.csv');
// one url per line; trim stray "\r" and drop blank lines (e.g. a trailing newline)
$urls = array_filter(array_map('trim', explode("\n", $urlstring)));
$responses = Array();
// loop over the urls, and get the information
// then parse it into the $responses array
$i = 0;
foreach($urls as $url) {
$temp = myCurl($url);
if(strlen($temp) > 0) {
$responses[$i] = parseString($temp);
$i = $i + 1;
}
else {
echo "URL ".$url." did not respond<br>";
}
}
// produce the actual output table:
echo '<html><body>';
writeTable($responses);
echo '</body></html>';
// ------------ support functions -------------
function parseString($s) {
$lines = explode("\n", $s);
$jstring = "{ ";
$comma = "";
foreach($lines as $line) {
$elements = explode(":", $line);
if (count($elements) > 1) {
$jstring = $jstring . $comma . '"' . trim($elements[0]) . '" : "' . $elements[1] .'"';
$comma = ",";
}
if(strstr($line, "{")) {
$elements = explode("{", $line);
$key = trim($elements[0]);
$jstring = $jstring . $comma . '"' . $key .'" : {';
$comma = "";
}
if(strstr($line, "}")) {
$jstring = $jstring . '} ';
$comma = ",";
}
}
$jstring = $jstring ."}";
return json_decode($jstring, true);
}
function myCurl($f) {
// create curl resource
$ch = curl_init();
// set url
curl_setopt($ch, CURLOPT_URL, $f);
//return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_NOSIGNAL, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10); // try for 10 seconds to get a connection
curl_setopt($ch, CURLOPT_TIMEOUT, 30); // try for 30 seconds to complete the transaction
// $output contains the output string
$output = curl_exec($ch);
// see if any error was set:
$curl_errno = curl_errno($ch);
$curl_error = curl_error($ch);
// close curl resource to free up system resources
curl_close($ch);
// make response depending on whether there was an error
if($curl_errno > 0) {
echo 'Curl reported error '.$curl_error.'<br>';
return '';
}
else {
echo 'Successfully fetched '.$f.'<br>';
return $output;
}
}
function writeTable($r) {
echo 'The following ships reported: <br>';
echo '<table border=1>';
foreach($r as $value) {
// skip entries that failed to parse or have no vessel name
if (!empty($value["vessel-name"])) {
echo '<tr><td><table border=1><tr>';
echo '<td>Vessel Name</td><td>'.$value["vessel-name"].'</td></tr>';
echo '<tr><td>Time:</td><td>'.dateFormat($value["time"]).'</td></tr>';
echo '<tr><td>Position:</td><td>'.$value["position"].'</td></tr>';
echo '</table></td></tr>';
}
}
// close the outer table once, after all rows have been written
echo '</table>';
}
function dateFormat($d) {
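// input: yyyymmdd, one separator character, then hhmm (offsets 9 and 11 skip index 8);
// for input without a separator (yyyymmddhhmm) use offsets 8 and 10 instead
// return dd/mm/yy hh:mm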
$date = substr($d, 6, 2) ."/". substr($d, 4, 2) ."/". substr($d, 2, 2) ." ". substr($d, 9, 2) . ":" . substr($d, 11, 2);
return $date;
}
?>
Output of this is:
You can obviously make this prettier, and include other fields etc. I think this should get you a long way there, though. You might consider (if you can) having a script run in the background to create these tables every 30 minutes or so, and saving the resulting html tables to a local file on your server; then, when people want to see the result, they would not have to wait for the (slow) responses of the different remote servers, but get an "almost instant" result.
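The "save the tables to a local file" idea could look roughly like the sketch below. It is only an outline under assumptions: cachedHtml, the cache file path and the $build callback are hypothetical names, and instead of a separate background job it simply rebuilds the file when it is older than the given age.

```php
<?php
// Hypothetical helper (an assumption for illustration): return the cached
// HTML if the cache file is fresh enough, otherwise rebuild and store it.
function cachedHtml(string $cacheFile, int $maxAgeSeconds, callable $build): string {
    clearstatcache(true, $cacheFile); // make sure filemtime() is not stale
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $maxAgeSeconds) {
        $cached = file_get_contents($cacheFile);
        if ($cached !== false) {
            return $cached; // fresh cache: no remote requests needed
        }
    }
    $html = $build();                              // slow path: fetch and render
    file_put_contents($cacheFile, $html, LOCK_EX); // store for the next visitor
    return $html;
}
```

Here $build would be a function that does the slow work and returns the HTML as a string, for example by wrapping the writeTable() call in ob_start() and ob_get_clean(). With a maximum age of 1800 seconds, visitors only wait for the remote servers about once every 30 minutes.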
But that's somewhat far removed from the original question. If you are able to implement all this in a workable fashion, and then want to come back and ask a follow-up question (if you're still stuck / not happy with the outcome), that is probably the way to go. I think we've pretty much beaten this one to death now.
First combine the websites into a CSV or hard-coded array, then file_get_contents() / file_put_contents() on each. Essentially:
$file = 'dataFile.csv';
foreach($arrayOfSites as $site){
$data = file_get_contents($site);
file_put_contents($file, $data . "\n", FILE_APPEND);
}
Edit: Sorry was trying to do this fast. here is the full