Google Speech API duplicates responses - php

I am using Speech API v2 with PHP. Here is the code:
$file_to_upload = array('myfile'=>'@'.$filename.'.flac');
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://www.google.com/speech-api/v2/recognize?output=json&lang=ru-RU&key=___my_api_key___");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Content-Type: audio/x-flac; rate=8000"));
curl_setopt($ch, CURLOPT_POSTFIELDS, $file_to_upload);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result=curl_exec ($ch);
Google responds with two JSON objects: the first is empty, and the second contains the valid response I expect. This makes parsing and further processing difficult. See the HTTP dump:
My POST request:
POST /speech-api/v2/recognize?output=json&lang=ru-RU&key=___my_api_key___ HTTP/1.1
Host: www.google.com
Accept: */*
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36
Content-Length: 13123
Expect: 100-continue
Content-Type: audio/x-flac; rate=8000; boundary=----------------------------9641e899ac92
------------------------------9641e899ac92
Content-Disposition: form-data; name="myfile"; filename="/tmp/voice/1400157667.6440-in.wav.flac"
Content-Type: application/octet-stream
fLaC..."......e..\......! ..{..!y>..7..............................( ...reference libFLAC 1.2.1 20070917.
...encoded binary data...
------------------------------9641e899ac92--
Response with duplicate result of recognition:
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
Content-Disposition: attachment
Cache-Control: no-transform
X-Content-Type-Options: nosniff
Pragma: no-cache
Date: Thu, 15 May 2014 12:41:09 GMT
Server: S3 v1.0
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Alternate-Protocol: 80:quic
Transfer-Encoding: chunked
e
{"result":[]} <--- first one
f8
{"result":[{"alternative":[{"transcript":"............","confidence":0.73531097},{"transcript":"................"},{"transcript":".............."},{"transcript":"................"},{"transcript":"............ .."}],"final":true}],"result_index":0} <--- second one
0
Why does this happen? When I used API v1, there was only one response. Other examples of v2 on the internet also show only one.
Thanks a lot.

First of all, make sure that the language you are using supports speaker diarization. For instance, Google does not provide speaker diarization for Colombian Spanish, but it does for Spanish from Spain:
Language Support
Besides, sometimes a slight alteration of the audio is needed, which can be achieved using ffmpeg:
ffmpeg -i input.wav -ac 1 -ab 128k -filter:a "volume=0.9,equalizer=f=4000:t=h:w=200:g=-2" output.wav
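On the parsing difficulty raised in the question: in the dump, the first chunk is 0xe = 14 bytes, one more than the 13 characters of {"result":[]}, which suggests that each JSON object is followed by a newline. A minimal sketch under that assumption splits the body on line breaks and keeps the first object whose result list is non-empty. The function name pickSpeechResult is hypothetical and not part of any Google API.

```php
<?php
// Hypothetical helper: the response body is assumed to be newline-separated JSON objects.
function pickSpeechResult(string $body): ?array
{
    foreach (preg_split('/\R/', trim($body)) as $line) {
        if ($line === '') {
            continue; // skip blank lines between objects
        }
        $decoded = json_decode($line, true);
        // Keep the first object that actually carries recognition results.
        if (is_array($decoded) && !empty($decoded['result'])) {
            return $decoded;
        }
    }
    return null; // only empty results (or nothing decodable) were found
}
```

With the code from the question, this would be called as pickSpeechResult($result), where $result is the string returned by curl_exec().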

Related

Getting garbage output when scraping a webpage in PHP [duplicate]

I am trying to get the contents of an Amazon page using file_get_html(), but the output contains weird characters when echoed. Can anyone please explain how I can resolve this issue?
I also found the following two related questions on Stack Overflow but they did not solve my issue. :)
file_get_html() returns garbage
Uncompress gzip compressed http response
Here is my code:
$options = array(
'http'=>array(
'header'=>
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n".
"Accept-language: en-US,en;q=0.5\r\n" .
"User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6\r\n"
)
);
$context = stream_context_create($options);
$amazon_url = 'https://www.amazon.com/my-url';
$amazon_html = file_get_contents($amazon_url, false, $context);
Here is the output I get:
��T]o�6}��`���0��݊-��"[�bh�tN�b0��.%%�$P��#�(Ų�� ������F#����A�
About 115k characters like this show up in the browser window.
These are my new headers:
$options = array(
'http'=>array(
'header'=>
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n".
"Accept-language: en-US,en;q=0.5\r\n"
)
);
Will using cURL resolve this issue?
Update:
I tried cURL. Still getting the garbage output. Here are my response headers:
HTTP/1.1 200 OK
Date: Sun, 18 Nov 2018 20:29:28 GMT
Server: Apache/2.4.33 (Win32) OpenSSL/1.1.0h PHP/7.2.5
X-Powered-By: PHP/7.2.5
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8
Can anyone explain the negative votes?
I did my own research.
Found some related questions on Stack Overflow which did not solve my problem.
Provided all the information that I thought would be helpful.
What else should I include in the question?
Here is my whole code for curl at present. This is the URL I am scraping.
$handle = curl_init();
curl_setopt($handle, CURLOPT_URL, $amazon_url);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec($handle);
curl_close($handle);
echo $data;
The output is just a bunch of characters I mentioned above. Here are my request headers:
Host: localhost
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Cookie: AMCV_17EB401053DAF4840A490D4C%40AdobeOrg=-227196251%7CMCIDTS%7C17650%7CMCMID%7C67056225185486460220940124683302119708%7CMCAID%7CNONE%7CMCOPTOUT-1524907071s%7CNONE; mjx.menu=renderer%3ACommonHTML; _ga=GA1.1.2019605490.1529649408; csm-hit=adb:adblk_no&tb:s-3521C4J8F2EP1V0MMQEP|1542578145652&t:1542578146256
Upgrade-Insecure-Requests: 1
Pragma: no-cache
Cache-Control: no-cache
These are from the Network Tab. The response headers are the same as I mentioned above.
Here is the output after adding curl_setopt($handle, CURLOPT_HEADER, 1); to my code:
HTTP/1.1 200 OK
Server: Server
Content-Type: text/html; charset=UTF-8
Strict-Transport-Security: max-age=47474747; includeSubDomains; preload
x-amz-id-1: 7A162B8JKV6MGZQ3PCH2
Vary: Accept-Encoding,User-Agent,X-Amzn-CDN-Cache
Content-Encoding: gzip
x-amz-rid: 7A162B8JKV6MGZQ3PCH2
Cache-Control: no-transform
X-Frame-Options: SAMEORIGIN
Date: Sun, 18 Nov 2018 22:42:51 GMT
Transfer-Encoding: chunked
Connection: keep-alive
Connection: Transfer-Encoding
Set-Cookie: x-wl-uid=1a4u8+XgF+IhFF/iavy9mKZCAA0g4HiIYZXR8hKjxGtmOtBW+j67wGABv7ZOTxDRcab+7Qmpjqds=;
Here's the solution:
I ran into the same issue when scraping Amazon.
Simply add the following option before sending your cURL request:
curl_setopt($handle, CURLOPT_ENCODING, 'gzip,deflate,sdch');
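The response headers above show Content-Encoding: gzip, so the "garbage" is most likely the compressed body printed as-is. With cURL, passing an empty string to CURLOPT_ENCODING asks libcurl to advertise every encoding it can decode and to decompress the response automatically. For the file_get_contents() variant, a minimal sketch is to decompress manually when the body starts with the gzip magic bytes. The helper name decodeIfGzipped is hypothetical, and the sketch assumes the zlib extension is available.

```php
<?php
// Hypothetical helper: gunzip a response body only if it looks gzip-compressed.
function decodeIfGzipped(string $body): string
{
    // Gzip streams start with the two magic bytes 0x1f 0x8b.
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = gzdecode($body);
        if ($decoded !== false) {
            return $decoded;
        }
    }
    return $body; // not gzip (or not decodable): return unchanged
}
```

In the question's code this could be applied as decodeIfGzipped($amazon_html) before echoing the page.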

Linux cURL vs PHP cURL - POST Request

I have to upload a ZIP file over HTTPS, and this works only with the Linux cURL command. I don't understand what I am missing in the PHP cURL request...
Linux cURL [working]:
curl -v -x http://api.test.sandbox.mobile.de:8080 -u USER:PASS -X POST --data-binary @502.zip https://services.mobile.de/upload-api/upload/502.zip
Response:
POST /upload-api/upload/502.zip HTTP/1.1
User-Agent: curl/7.38.0
Host: services.mobile.de
Accept: */*
Content-Length: 6026
Content-Type: application/x-www-form-urlencoded
Expect: 100-continue
HTTP/1.1 100 Continue } [data not shown]
HTTP/1.1 201 Created
Date: Tue, 06 Dec 2016 12:40:41 GMT
Content-Type: text/html;charset=utf-8
Vary: Accept-Encoding
Transfer-Encoding: chunked
PHP cURL [not working]:
$ch = curl_init();
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
'Authorization: Basic '. base64_encode("USER:PASS"),
'Content-Type: text/plain'
));
curl_setopt($ch,CURLOPT_PROXY, 'api.test.sandbox.mobile.de:8080');
curl_setopt($ch,CURLOPT_URL, 'https://services.mobile.de/upload-api/upload/502.zip');
curl_setopt($ch,CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch,CURLOPT_POST, 1);
curl_setopt($ch,CURLOPT_POSTFIELDS, [ 'file' => new CURLFile('502.zip') ]);
curl_setopt($ch,CURLOPT_VERBOSE, 1);
$result = curl_exec($ch);
curl_close($ch);
Response:
POST /upload-api/upload/502.zip HTTP/1.1
Host: services.mobile.de
Accept: */*
Content-Length: 6225
Expect: 100-continue
Content-Type: text/plain; boundary=------------------------835f6ea75f783449
HTTP/1.1 100 Continue
HTTP/1.1 201 Created
Date: Tue, 06 Dec 2016 13:36:21 GMT
Content-Type: text/html;charset=utf-8
Vary: Accept-Encoding
Transfer-Encoding: chunked
The site documentation states:
"The upload file must be sent as an HTTP-Payload and in binary format, Multipart and Encoding are not supported."
I also noticed that the Content-Length is not the same... Why?
Thank you in advance for your advice!
Get rid of the line:
'Content-Type: text/plain'
You are setting the content type for the entire message, so the POST data is not formatted correctly.
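On the Content-Length question: the PHP request is 199 bytes longer (6225 versus 6026), which is consistent with a multipart boundary and part headers being wrapped around the file. Passing an array to CURLOPT_POSTFIELDS makes PHP send multipart/form-data, whereas passing a string sends that string as the raw body, which is closer to what --data-binary does. A minimal sketch of the raw variant follows; the helper name rawUploadOptions is hypothetical, the application/octet-stream content type is an assumption (the command-line request above sent application/x-www-form-urlencoded), and the proxy and authentication options from the question are left out for brevity.

```php
<?php
// Hypothetical helper: cURL options for a raw (non-multipart) POST body.
function rawUploadOptions(string $url, string $body): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_POST           => true,
        // A string body is sent as-is; an array would be encoded as multipart/form-data.
        CURLOPT_POSTFIELDS     => $body,
        // Assumed content type; adjust to whatever the service expects.
        CURLOPT_HTTPHEADER     => ['Content-Type: application/octet-stream'],
        CURLOPT_RETURNTRANSFER => true,
    ];
}
```

It would be used as curl_setopt_array($ch, rawUploadOptions($url, file_get_contents('502.zip'))) before curl_exec($ch).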

PHP cURL returns error 52 if Content-Length is set

I am trying to post a form to a remote website via PHP cURL.
Here is my cURL configuration (I added several explanations in the comments):
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $action);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'POST'); //without this line the request is being sent with GET method (I can see that with curl_getinfo)
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_TIMEOUT, 50);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData); //$postData is an urlencoded string
curl_setopt($ch, CURLINFO_HEADER_OUT, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36');
curl_setopt($ch, CURLOPT_REFERER,$url);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Accept-Encoding: gzip, deflate',
'Accept-Language: en-US,en;q=0.8',
'Expect:',
'Content-Type: application/x-www-form-urlencoded',
'Connection: keep-alive',
'Cache-Control: max-age=0',
'Origin: http://XXX',
));
After executing this configuration, I receive the following response:
string(610) "HTTP/1.1 302 Found
Location: http://XXX
Vary: Accept-Encoding
Content-type: text/html; charset=utf-8
Server: DWS
Content-Length: 15536
Accept-Ranges: bytes
Date: Wed, 17 Dec 2014 10:59:35 GMT
X-Varnish: 2567206754
Age: 0
Via: 1.1 varnish
Connection: keep-alive
HTTP/1.1 411 Length Required
Content-Type: text/html
Server: DWS
Content-Length: 357
Accept-Ranges: bytes
Date: Wed, 17 Dec 2014 10:59:35 GMT
X-Varnish: 2567207187
Age: 0
Via: 1.1 varnish
Connection: keep-alive
I tried to add a Content-Length header to the cURL configuration:
'Content-Length: ' . strlen($postData)
But then cURL fails with error 52 (Empty reply from server).
To make sure that the content length I am specifying is in fact correct, I tried to add a custom string to CURLOPT_POSTFIELDS (like 'foo=bar') and set Content-Length: 7, but the result was the same.
I also tried to convert the whole code to use the Zend 2 Http Client, but with no luck.
I think I've read all other posts about the cURL 52 error, but none of them seemed to have anything in common with Content-Length header, so I hope that someone here might help me out.
Please let me know if you need any more information from my part.
The POST request to the URL specified in the $action variable returns an HTTP 302 redirect, after which cURL sends the next request using GET; see the 2nd paragraph here: http://curl.haxx.se/docs/manpage.html#-L.
You already use CURLOPT_CUSTOMREQUEST to get around this, but it needs CURLOPT_POSTREDIR as well, as documented in the Strings section here: http://evertpot.com/curl-redirect-requestbody/
You should not explicitly set the Content-Length but have CURL handle that.
Alternatively you could manually "follow" the Location header and use the URL from that header in the $action variable (it may be useful to try that for testing first).
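A minimal sketch of the CURLOPT_POSTREDIR approach, assuming $postData is the urlencoded string from the question. CURLOPT_POSTREDIR takes a bitmask in which 1 keeps POST on a 301 and 2 keeps POST on a 302. The helper name postWithRedirectOptions is hypothetical, and no Content-Length header is set so that cURL can compute it.

```php
<?php
// Hypothetical helper: options for a POST that should stay a POST across a 301/302 redirect.
function postWithRedirectOptions(string $url, string $postData): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $postData, // cURL derives Content-Length from this string
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_POSTREDIR      => 3,         // bitmask: 1 = keep POST on 301, 2 = keep POST on 302
        CURLOPT_RETURNTRANSFER => true,
    ];
}
```

It would be applied with curl_setopt_array($ch, postWithRedirectOptions($action, $postData)).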

Why does a cURL request from a PHP file not work, when the same cURL request from the Linux console does?

I am trying to write a small piece of PHP code that makes a curl call, but it hangs partway through. Please find the code below:
$url = 'XXXXXX';
$curlHandler = curl_init($url);
curl_setopt($curlHandler, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curlHandler, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curlHandler, CURLOPT_ENCODING, '');
curl_setopt($curlHandler, CURLOPT_VERBOSE, TRUE);
print var_dump(curl_error($curlHandler))."\n";
print curl_exec($curlHandler);
curl_close($curlHandler);
I get the following output:
string(0) ""
"* About to connect() to XXXXXX port 80 (#0)"
"* Trying 72.52.8.197... * connected"
"> GET XXXXXX HTTP/1.1"
Host: XXXXXX
Accept: */*
Accept-Encoding: deflate, gzip"
After this, the PHP process hangs.
However, if I make the curl request from the command line as follows, it works:
curl -v "XXXXXX"
* About to connect() to XXXXXX port 80 (#0)
* Trying 72.52.8.197... connected
> GET XXXXXX HTTP/1.1
> User-Agent: curl/7.22.0 (i686-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: XXXXXX
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
< Content-Type: text/html; charset=UTF-8
< Date: Tue, 04 Mar 2014 11:02:15 GMT
< Expires: Thu, 19 Nov 1981 08:52:00 GMT
< Location: XXXXXX
< Pragma: no-cache
< Server: Apache
< Set-Cookie: PHPSESSID=kkgmdajs0485tkjm2q7vrfl260; path=/; domain=.souq.com
< Set-Cookie: PLATEFORMC=sa; expires=Wed, 04-Mar-2015 11:02:15 GMT; path=/; domain=.souq.com
< Set-Cookie: PLATEFORML=ar; expires=Wed, 04-Mar-2015 11:02:15 GMT; path=/; domain=.souq.com
< Vary: Accept-Encoding
< Content-Length: 0
< Connection: keep-alive
< Set-Cookie: NSC_tpvr-83+63+9+208-91=ffffffff2d814a2945525d5f4f58455e445a4a423660;path=/;httponly
<
* Connection #0 to host XXXXXX left intact
* Closing connection #0
Can someone explain why there is a difference between the PHP curl call and the Unix curl call?
The command line curl command has unescaped &s in it. Each one acts as a "run this as a background task" marker, and the numbers between the []s are the job identifiers that bash assigns. The background jobs of course exit immediately since (for example) utm_campaign=desktop is not a real command. You can read more in the job control section of bash's manual.
Just wrap your URL in "s on the command line, so the curl command receives the whole string:
curl "http://...."
     ^           ^
If you want to see the verbose messages (as seen in the php snippet), add the -v option before the URL.
For the CURLOPT_FOLLOWLOCATION you will need the -L option.
The command line curl call sets a User-Agent, but your PHP sample does not.
If I try the same request to that URL passing a user agent, it works fine.
Try adding one to your PHP code, e.g.:
curl_setopt($curlHandler, CURLOPT_USERAGENT,
'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Iron/31.0.1700.0 Chrome/31.0.1700.0 Safari/537.36');
Some sites don't function properly if you don't specify a user agent or certain other http headers (like accept-language or accept), this one appears to be one of those sites.

configure curl to get www.google.com

How do I get a proper response using PHP cURL? I tried changing some request headers and the user agent:
GET / HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.63 Safari/535.7
Host: www.google.com
Accept: */*
Accept-Encoding: gzip,deflate,sdch
but it's not working; I am getting a 302 response:
HTTP/1.1 302 Found
Location: http://www.google.co.in/
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Set-Cookie: PREF=ID=30a99703e541807e:FF=0:TM=1324467004:LM=1324467004:S=0VlXyYEJtxKQ_Pqk; expires=Fri, 20-Dec-2013 11:30:04 GMT; path=/; domain=.google.com
Date: Wed, 21 Dec 2011 11:30:04 GMT
Server: gws
Content-Length: 221
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
The HTML output that I get is:
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
here.
</BODY></HTML>
How can I GET and POST data using PHP cURL as if I were doing it from a browser?
Here is my PHP code for curl_setopt:
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_USERAGENT, " Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.63 Safari/535.7");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_ENCODING, 'gzip,deflate,sdch');
Set the cURL CURLOPT_FOLLOWLOCATION option to true:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
This will instruct cURL to follow any "Location: " headers that the server sends. More info is available in the documentation.
HTTP/1.1 302 Found
Location: http://www.google.co.in/
tells you to go to http://www.google.co.in/ for the content, so you have to do another cURL sequence there.
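If you prefer to follow the redirect by hand instead of using CURLOPT_FOLLOWLOCATION, a minimal sketch is to pull the target out of the raw response headers and issue a second request to it. The helper name extractLocation is hypothetical, and it assumes the headers were captured as a string, for example with CURLOPT_HEADER enabled.

```php
<?php
// Hypothetical helper: return the value of the first Location header, or null if absent.
function extractLocation(string $rawHeaders): ?string
{
    // Header names are case-insensitive; match at the start of any line.
    if (preg_match('/^Location:[ \t]*(\S+)/mi', $rawHeaders, $m)) {
        return $m[1];
    }
    return null;
}
```

The returned URL would then be passed to CURLOPT_URL for the next request.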
