Google Maps API issue with non-English addresses (PHP/CURL) - php

I'm implementing a Google Maps service with PHP/cURL and have run into a problem with the Maps API.
I am sending queries to Google Maps with English and Russian addresses.
The English address works fine, while the Russian address fails with 602 (address not found).
The weird thing is that if I copy and paste the cURL query with the Russian address into a browser, it works fine (returns 200 and coordinates).
My code is:
public static function google_geolocator($geoloc){
    $geoloc = urlencode(implode(" ",$geoloc));
    $query = "http://maps.google.com/maps/geo?q={$geoloc}&sensor=true&oe=utf8";
    echo $query;
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $query);
    curl_setopt($curl, CURLOPT_HEADER, 0);
    curl_setopt($curl, CURLOPT_FAILONERROR, TRUE);
    curl_setopt($curl, CURLOPT_VERBOSE, TRUE);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
    $json = curl_exec($curl);
    curl_close($curl);
    $data = json_decode($json, TRUE);
    return $data;
}
I searched the Internet and found some hacks involving User-Agent, &oe=utf8 and so on, but none of them seem to work. The main thing that confuses me, again, is that this method works perfectly with an address in English, but fails with one in Russian, although pasting the query into a browser works perfectly in both cases.
Thank you in advance!
small update: query like
http://maps.google.com/maps/geo?q=%D0%A2%D0%B0%D0%B8%D0%BB%D0%B0%D0%BD%D0%B4+%D0%9A%D0%BE+%D0%9F%D1%85%D1%83%D0%BA%D0%B5%D1%82&sensor=true&oe=utf8
works in Chrome, but doesn't work in Firefox (same 602), while
http://maps.google.com/maps/geo?q=Thailand+Koh+Phuket&sensor=true&oe=utf8
works fine in both
upd2
var_dumping($data) returns
{
"name": "Таиланд Ко Пхукет",
"Status": {
"code": 602,
"request": "geocode"
}
}
and name field is absolutely the same as in Chrome response.
upd3: Alright, the thing is that if I change the location name in Russian slightly, both Chrome and Firefox return 200 and coordinates. The problem seems to be that Chrome is a bit more "intelligent" when communicating with Google. I refactored the method so location names are always provided in English no matter what the current locale is. It looks like it's better not to mess with Google and character sets other than English.

The main thing that confuses me, again, is that this method works perfectly with address in English, but fails with Russian.
You are telling Google to expect UTF-8 data, but you're probably sending it something else. UTF-8 is a variable-width encoding: plain ASCII characters take one byte, exactly as in ASCII, while Cyrillic and most other non-ASCII characters take two or more bytes. That's why English text often works regardless, but as soon as other characters come into play, things break. Most likely, your Cyrillic characters are getting garbled (= stored in an encoding other than UTF-8) somewhere along the way, before they enter your query.
Make sure the values in $geoloc are UTF-8 encoded. If they come from a file, make sure that is UTF-8 encoded. If they come from a database, make sure the tables and the database connection are UTF-8 encoded.
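One way to act on this advice is to validate each address part before it is URL-encoded. The following is only a minimal sketch: the helper names ensure_utf8() and build_geocode_query() are hypothetical, and the Windows-1251 fallback is an assumption about where garbled Cyrillic data typically comes from, not something the question confirms. Note that mb_check_encoding() can only detect byte sequences that are invalid as UTF-8; it cannot tell you which encoding the data was really in.

```php
<?php
/**
 * Return $value as UTF-8, converting from $fallback when it is not valid UTF-8.
 * $fallback is an assumption: use whatever encoding your data source really has.
 */
function ensure_utf8(string $value, string $fallback = 'Windows-1251'): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value; // already valid UTF-8 (plain ASCII passes as well)
    }
    return mb_convert_encoding($value, 'UTF-8', $fallback);
}

/**
 * Build the q= value the same way the question does, but only from UTF-8 input.
 */
function build_geocode_query(array $geoloc): string
{
    $parts = array_map('ensure_utf8', $geoloc);
    return urlencode(implode(' ', $parts));
}
```

If the output of build_geocode_query() differs from what the original code produced for the same input, the input was not UTF-8 to begin with.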

Related

length issues with solr select POST with cURL in PHP

I have a solr query that has been working perfectly:
$ch = curl_init();
$ch_searchURL = "$base_url/$collection/select?q=$s&wt=json&indent=true";
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_URL, $ch_searchURL);
$rawData = curl_exec($ch);
$json = json_decode($rawData,true);
Initially, my $s variable was literally one thing: e.g. ?q=name:brian, but my user base wanted the ability to search multiple things at once, so I started to build that in:
?q=name:("brian"+OR+"mike"+OR+"james"+OR+"emma"+OR+"luke")
It then got to the point where they wanted to search 5,000 things at once, which caused this method of building out the solr GET query to fail as the literal URL length was longer than the max allowed length of ~2,000, so I thought using a POST might work, which I accomplished by adding the following lines:
$ch_searchURL = "$base_url/$collection/select";
$multiline_q = "q=$s&wt=json&indent=true";
curl_setopt($ch, CURLOPT_POSTFIELDS, $multiline_q);
This seemed to allow me to search for around 500 items at a time - (which would still, in GET world, cause a URL length of around 4,000) - so better than the GET method, but once I go past that number of items, the solr query fails again.
Because I'm POSTing (maybe?), I don't get any error response from solr, so I don't know what's causing the query to fail, and I can't manually test the query in the browser because it's ~40,000 characters long and won't paste. If I do var_dump($rawData);, I see this:
string(238) " 05 " // or 04, or 08
I've used solr quite a bit with PHP & cURL, but always with the GET method. This is my first foray into using POST. Am I doing something wrong here? Am I just exceeding the actual amount of q options that I can ask solr to retrieve for me, regardless of the method?
Any light that anyone could shed on this would be helpful...
There is no limit on the Solr side - we regularly use Solr in a similar way.
You need to look at the settings for your servlet container (Tomcat, Jetty etc.) and increase the maximum POST size. Look up maxPostSize if you are using Tomcat and maxFormContentSize if you are using Jetty.
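While tuning the container limit, it can help to make the PHP side report failures instead of silently returning an unreadable string. The sketch below assumes the same name:(...) query shape as in the question; build_solr_body() and solr_select() are hypothetical helper names, and the 500-character excerpt of the response is an arbitrary choice.

```php
<?php
/**
 * Build a form-encoded POST body; http_build_query() does the URL-encoding.
 */
function build_solr_body(array $names): string
{
    $quoted = array_map(function (string $name): string {
        // Quote each term and escape embedded quotes and backslashes.
        return '"' . addcslashes($name, '"\\') . '"';
    }, $names);

    return http_build_query([
        'q'  => 'name:(' . implode(' OR ', $quoted) . ')',
        'wt' => 'json',
    ]);
}

/**
 * POST the body and throw with the HTTP status when anything goes wrong.
 */
function solr_select(string $url, string $body): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $body,
        CURLOPT_RETURNTRANSFER => true,
    ]);
    $raw    = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $error  = curl_error($ch);

    if ($raw === false || $status !== 200) {
        $excerpt = substr((string) $raw, 0, 500);
        throw new RuntimeException("Solr request failed (HTTP $status): $error $excerpt");
    }
    $data = json_decode($raw, true);
    if (!is_array($data)) {
        throw new RuntimeException('Solr returned a body that is not a JSON object or array');
    }
    return $data;
}
```

A call such as solr_select("$base_url/$collection/select", build_solr_body($names)) would then surface the status code the servlet container sends when the POST body is too large.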

Corrupted UTF-8 encoding when reading Google feed / alerts

Whenever I try to read a Google alert via PHP using something like:
$feed = file_get_contents("http://www.google.com/alerts/feeds/01445174399729103044/950192755411504138");
Regardless of whether I save $feed to a file or echo the result to the output, all UTF-8 Unicode characters (i.e. those with diacritics) are represented by white space. I have tried, without success, various combinations of:
utf8_encode
utf8_decode
iconv
mb_convert_encoding
I think the wrong characters have come from the stream, but I'm lost because if I try this URI in a browser then everything is fine. Can anyone shed some light on the issue?
Sorry, you are absolutely correct - there is something untoward happening! Though it is not what you would first suspect... For reference, given that:
echo mb_detect_encoding($feed); // prints: ASCII
The unicode data is lost before it is even sent by the remote server - it appears that Google is looking at the user-agent string in the request header - which is non-existent using file_get_contents by default without a stream-context.
Because it cannot identify the client making the request it defaults to and forces ASCII encoding. This is presumably a necessary fallback in the event of some kind of cataclysmic cock-up. [citation needed...]
Simply naming your application is not enough, however; you need to include a known vendor. I'm unsure of the full extent of this, but I believe most folks include "Mozilla [version]" to work around the issue, for example:
$url = 'http://www.google.com/...';
$feed = file_get_contents($url, false, stream_context_create([
    'http' => [
        'method' => 'GET',
        'header' => 'Accept-Charset: UTF-8' ."\r\n"
            .'User-Agent: (Mozilla/5.0 compatible) MyFeedReader/1.0'
    ]
]));
file_put_contents('test.txt', $feed); // should now work as expected

Bad JSON being returned from Survey Monkey (get_survey_list)

I've been trying to pull survey data for a client from their Survey Monkey account. It seems that the more data there is, the more likely illegal characters are introduced into the resulting JSON string.
Below is a sample of what is returned on a bad response. Every response is different, and even shorter requests sometimes fail, leaving me at a loss.
{
"survey_id": "REDACTED",
"title": "REDACTED",
"date_modified": "2014-XX-18 17:59:00",
"num_responses": 0,
"date_created": "�2014-01-21 10:29:00",
"question_count": 102
}
I can't fathom why this is happening: the more parameters there are in the fields option, the more illegal characters are introduced. It isn't just invalid characters; sometimes random letters are thrown in as well, which prevents me from handling the data correctly.
I am using Laravel 4 with the third party Survey Monkey library by oori
https://github.com/oori/php-surveymonkey
Any help would be appreciated in tracking down the issue, the deadline is pretty tight and if this can't be resolved I'll have to resort to asking the client to manually import CSV files which isn't ideal and introduces possible user error.
On a side note, I don't see this issue cropping up when using the same parameters on the Survey Monkey console.
O/S: Windows 8.1 with WAMP Server
Code used to execute the request
$Surveys = SurveyMonkey::getSurveyList(array
(
'page_size' => 1000,
'fields' => array
(
'title', 'question_count', 'num_responses', 'date_created', 'date_modified'
)
));
The SurveyMonkey facade is a custom package used to integrate the original Survey Monkey library located here:
https://github.com/oori/php-surveymonkey/blob/master/SurveyMonkey.class.php
Raw PHP cURL request
$header = array('Content-Type: application/json','Authorization: Bearer REDACTED');
$post = json_encode(array(
'fields' => array(
'title', 'question_count', 'num_responses', 'date_created', 'date_modified'
)
));
$post = json_encode($post);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://api.surveymonkey.net/v2/surveys/get_survey_list?api_key=REDACTED");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_ENCODING, 'UTF-8');
curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
$result = curl_exec($ch);
The above request returns the same troublesome characters, nothing else was used to get the response.
Using the following code
echo "\n".mb_detect_encoding($result, 'UTF-8', true);
This code shows the charset of the response. When the request is successful and no illegal characters are present (there are still random characters in the wrong places), it reports UTF-8; when illegal characters are present, false is returned, so nothing is output. More often than not, false is returned.
Maybe I'm grossly oversimplifying the whole thing and apologies if so, but I have had these funny little chars pop in to results, too.
They were leading and trailing whitespace.
Can you trim data on retrieve and see if it still happens?
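To test the trimming idea and, at the same time, fail loudly on a corrupted response, a small wrapper around json_decode() could look like the sketch below. The name decode_api_json() is hypothetical, and stripping a UTF-8 byte-order mark is an extra assumption on top of the whitespace suggestion. Trimming cannot repair a stray byte in the middle of a value (as in the date_created sample above), but the encoding check will at least reject such a response instead of passing bad data on.

```php
<?php
/**
 * Trim, validate and decode a raw API response; throws on bad input.
 */
function decode_api_json(string $raw): array
{
    // Drop a leading UTF-8 byte-order mark (assumption) and surrounding whitespace.
    $clean = trim(preg_replace('/^\xEF\xBB\xBF/', '', $raw));

    if (!mb_check_encoding($clean, 'UTF-8')) {
        throw new UnexpectedValueException('Response is not valid UTF-8');
    }

    $data = json_decode($clean, true);
    if (json_last_error() !== JSON_ERROR_NONE || !is_array($data)) {
        throw new UnexpectedValueException('Invalid JSON: ' . json_last_error_msg());
    }
    return $data;
}
```

Calling decode_api_json($result) right after curl_exec() makes it easy to log the failing responses and retry the request.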

Failed to send SMS in Hindi

I need to send SMS messages in Hindi, and for this I need to pass the Hindi string through a URL.
As I am coding in PHP, I used urlencode($hindimessage) on the string and passed the complete URL through file_get_contents(). On executing it I got this error:
Warning: file_get_contents(http://IP GOES HERE/smpp/sendsms?username=$name&password=******&to=$contact&from=DEMOTT&coding=3&&text=%E0%A4%AE%E0%A4%A8%E0%A5%80%E0%A4%B7+%E0%A4%95%E0%A5%81%E0%A4): failed to open stream: HTTP request failed! HTTP/1.1 505 HTTP Version Not Supported
Without urlencode(), the server treats the text as an EMPTY STRING and rejects it.
I also tried utf8_encode(). I receive the message as HTML tags like ही....
But when I use the API URL directly, I am able to receive the message in Hindi, since the API is a Unicode API with coding=3 enabled for Hindi text (i.e. the API is working properly).
Please tell me what kind of approach I need to adopt for sending messages in both Hindi and English.
Thanks in advance
urlencode() is necessary if you are passing the URL to the file_get_contents() function.
You should use cURL for sending messages in both Hindi and English.
$smsgatewayurl = 'http://IP GOES HERE/smpp/sendsms';
$post_data = array(); // All params including text message
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $smsgatewayurl);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_data);
$output = curl_exec($ch);
curl_close($ch);
cURL is a better option than file_get_contents() for calling third-party APIs. I have tested the above code with the Spring Edge SMS gateway, including Hindi text.
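As an illustration of what $post_data might contain, the sketch below fills it with the parameter names that appear in the question's URL (username, password, to, from, coding, text). The helper name build_sms_body() is hypothetical. Passing an array to CURLOPT_POSTFIELDS makes cURL send multipart/form-data, whereas a string from http_build_query() is sent as application/x-www-form-urlencoded; which of the two the gateway accepts is an assumption you would need to verify.

```php
<?php
/**
 * Build a form-encoded body for the gateway; http_build_query() percent-encodes
 * the UTF-8 bytes of the Hindi text, so no manual urlencode() call is needed.
 */
function build_sms_body(string $username, string $password, string $to, string $from, string $text): string
{
    return http_build_query([
        'username' => $username,
        'password' => $password,
        'to'       => $to,
        'from'     => $from,
        'coding'   => 3, // Unicode flag taken from the question's URL
        'text'     => $text,
    ]);
}
```

The resulting string can be passed directly as the CURLOPT_POSTFIELDS value in the code above.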
Set the DCS (data coding) value to 8, convert your Hindi characters into Unicode characters, and put the result in the text field. This will work.
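What "convert into Unicode characters" means depends on the gateway. A common convention for data coding 8 is to send the text as hex-encoded UTF-16BE (UCS-2); the sketch below assumes that convention, which the source does not confirm, and sms_unicode_hex() is a hypothetical helper name.

```php
<?php
/**
 * Convert UTF-8 text to an uppercase hex string of its UTF-16BE bytes.
 * Assumption: the gateway expects this form when the data coding is 8.
 */
function sms_unicode_hex(string $utf8Text): string
{
    $utf16 = mb_convert_encoding($utf8Text, 'UTF-16BE', 'UTF-8');
    return strtoupper(bin2hex($utf16));
}
```

For example, the two characters of ही (U+0939, U+0940) become 09390940, and the same function works for English text, where each letter simply gets a leading 00 byte.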

How to parse dict output in a user friendly way in PHP?

I am trying to implement a dictionary-type service.
I send a request with php using cURL to dict.org with the dict protocol.
This is my code (which on its own works and may be helpful for future readers):
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "dict://dict.org/define:(hello):english:exact");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$definition = curl_exec($ch);
curl_close($ch);
echo $definition;
The server returns the definition, as expected, along with several headers (that I do not need). The response looks something like this:
220 miranda.org dictd 1.9.15/rf on Linux 2.6.26-2-686 <auth.mime> <29631663.31530.1250750274#miranda.org>
250 ok
150 3 definitions retrieved
151 "Hello" gcide "The Collaborative International Dictionary of English v.0.48"
Hello \Hel*lo"\, interj. & n.
An exclamation used as a greeting, to call attention, as an
exclamation of surprise, or to encourage one. This variant of
{Halloo} and {Holloo} has become the dominant form. In the
United States, it is the most common greeting used in
answering a telephone.
[1913 Webster +PJC]
(... some content removed)
.
250 ok [d/m/c = 3/0/162; 0.000r 0.000u 0.000s]
221 bye [d/m/c = 0/0/0; 0.000r 0.000u 0.000s]
I was wondering if:
a) Is there a way to specify to curl (or an option in the dict protocol) to not return all that extra information (i.e. 250 ok [d/m/c = 3/0/162; 0.000r...])
b) You probably noticed that the dict response returns information that is not displayed in the most user friendly way. I was wondering if anybody knew of any existing php library that will allow me to display this in a nicer way. Otherwise I'd have to code my own.
c) If this is not the way most dictionary websites retrieve their definitions, how do they do it? In my understanding the most comprehensive dictionary database is the one at dict.org (which supports the dict protocol and is where I am sending my cURL request).
Thank you!
Before I start, let me state that I don't know the specifics of the dict protocol.
I doubt that you'll be able to create a request that only delivers the text. The information you wish to discard looks like status information and is therefore useful.
The way I'd handle this is as follows:
Read the curl response data into an array so that each line is a separate entry in the array. You could use explode() and split at the newline character (\n) to do this.
Iterate over the array, e.g. foreach ($response as $responseLine) {}
Perform a regex (or some other form of pattern matching) on $responseLine to find the definition. It looks like the actual text consists of the lines that don't start with a number.
You may want to check what character set the dict protocol uses. I haven't mentioned any error handling, but that should be straightforward.
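The steps above can be sketched as follows. Rather than testing whether a line starts with a number (definition text could do that too), this version relies on the framing visible in the sample response: each definition follows a 151 status line and ends at a line containing only a period. The function name parse_dict_response() and the array keys are hypothetical, and the regex assumes the word and database description are quoted as in the sample.

```php
<?php
/**
 * Extract definitions from a raw dict response.
 * Each definition starts after a "151" line and ends at a line holding only ".".
 */
function parse_dict_response(string $raw): array
{
    $definitions = [];
    $current = null; // null while we are outside a definition body

    foreach (preg_split('/\r\n|\n|\r/', $raw) as $line) {
        if ($current === null) {
            // A 151 line announces: word, database id, database description.
            if (preg_match('/^151 "([^"]*)" (\S+) "([^"]*)"/', $line, $m)) {
                $current = ['word' => $m[1], 'database' => $m[2], 'source' => $m[3], 'lines' => []];
            }
            continue; // other status lines (220, 250, 150, 221) are skipped
        }
        if ($line === '.') {
            // End of this definition body.
            $current['text'] = trim(implode("\n", $current['lines']));
            unset($current['lines']);
            $definitions[] = $current;
            $current = null;
            continue;
        }
        // Undo dot-stuffing: a leading ".." stands for a literal ".".
        $current['lines'][] = (strncmp($line, '..', 2) === 0) ? substr($line, 1) : $line;
    }
    return $definitions;
}
```

With the $definition string from the question, parse_dict_response($definition) returns one array entry per dictionary, which can then be rendered as HTML however you like.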
