I am trying to integrate Amazon CloudSearch into SilverStripe. When a page is published, I want a cURL request to send the page's data as a JSON string to the search domain.
I am using http://docs.aws.amazon.com/cloudsearch/latest/developerguide/uploading-data.html#uploading-data-api as a reference.
Every time I try to upload, it returns a 403. I have also allowed the IP address in the access policies for the search domain.
I am using this as a code reference: https://github.com/markwilson/AwsCloudSearchPhp
I think the problem is that the request is not authenticating with AWS correctly. How do I authenticate it correctly?
If you are getting the following error
403 Forbidden, Request forbidden by administrative rules.
and you are sure you have the appropriate rules in effect, I would check the API URL you are using. Make sure you are using the correct endpoint. If you are doing a batch upload, the API endpoint should look like this:
your-search-doc-endpoint/2013-01-01/documents/batch
Notice the 2013-01-01: it is a required part of the URL and is the API version you will be using. You cannot do the following, even though it might seem to make sense:
your-search-doc-endpoint/documents/batch <- Won't work
To search, you would need to hit the following API:
your-search-endpoint/2013-01-01/search?your-search-params
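As a rough illustration (the endpoint values below are hypothetical placeholders, not real domains), the URLs could be assembled like this in PHP:
// Hypothetical endpoints; copy the real ones from your CloudSearch console
$doc_endpoint    = 'https://doc-yourdomain-xxxxxxxxxx.us-east-1.cloudsearch.amazonaws.com';
$search_endpoint = 'https://search-yourdomain-xxxxxxxxxx.us-east-1.cloudsearch.amazonaws.com';
// Batch uploads must include the 2013-01-01 API version segment
$batch_url = $doc_endpoint . '/2013-01-01/documents/batch';
// Searches go against the search endpoint, also versioned
$search_url = $search_endpoint . '/2013-01-01/search?q=example';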
After much searching and trial and error, I was able to put together a small block of code, assembled from pieces found all over, to upload a "file" to AWS CloudSearch using cURL and PHP.
The single most important thing is to make sure your data is prepared correctly to be sent in JSON format.
Note: for CloudSearch you're not uploading a file, you're posting a stream of JSON data. That is why many of us have trouble uploading the data.
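For reference, the body you end up posting is a JSON array of document operations. A minimal "add" batch (with made-up field values matching the code below) looks like this:
[
  {
    "type": "add",
    "id": "1",
    "fields": {
      "title": "Example page title",
      "content": "Example page content"
    }
  }
]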
In my case I wanted to upload data to my search domain on CloudSearch. It seems simple, and it is, but example code for doing it is scarce; most people point you to the documentation, which does have examples, but only for the AWS CLI. The PHP SDK has its own learning curve, and instead of keeping things simple you end up doing twenty steps for one task; on top of that you're required to pull in other libraries that are just wrappers around native PHP functions, which sometimes complicates things rather than simplifying them.
So, back to how I did it: first I pull the data from my database as an array, serialize it, and save it to a file.
$rows = $database_data;
foreach ($rows as $key => $row) {
    $data['type'] = 'add';
    $data['id'] = $row->id;
    $data['fields']['title'] = $row->title;
    $data['fields']['content'] = $row->content;
    $data2[] = $data;
}
// now save your data to a file and make sure
// to serialize() it
$fp = fopen($path_to_file, $mode); // e.g. $mode = 'w' to (over)write the cache file
flock($fp, LOCK_EX);
fwrite($fp, serialize($data2));
flock($fp, LOCK_UN);
fclose($fp);
Now that the data is saved, we can work with it.
// This must be the full batch URL, i.e. your document endpoint
// plus /2013-01-01/documents/batch
$aws_doc_endpoint = '{Your AWS CloudSearch Document Endpoint URL}';
// Let's read the data back
$data = file_get_contents($path_to_file);
// Now let's unserialize() it and encode it as JSON
$data = json_encode(unserialize($data));
// finally, let's use cURL
$ch = curl_init($aws_doc_endpoint);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
// Set both headers in a single call; a second call with CURLOPT_HTTPHEADER
// would overwrite the first set of headers
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'Content-Type: application/json',
    'Content-Length: ' . strlen($data)
));
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$response = curl_exec($ch);
curl_close($ch);
$response = json_decode($response);
if ($response->status == 'success')
{
    return TRUE;
}
return FALSE;
And like I said, there is nothing to it. Most answers I came across were "use Guzzle, it's really easy"; well, yes it is, but for a simple task like this you don't need it.
Aside from that, if you still get an error, make sure to check the following:
Well-formatted JSON data (see the json_last_error() sketch below).
Make sure you have access to the endpoint.
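If you want a quick way to verify the first point, you could add a json_last_error() check right after the json_encode() call (a small optional sketch, not part of the original flow):
// Optional sanity check: json_last_error() reports problems from the
// most recent json_encode()/json_decode() call
$data = json_encode(unserialize(file_get_contents($path_to_file)));
if (json_last_error() !== JSON_ERROR_NONE) {
    die('JSON encoding failed: ' . json_last_error_msg());
}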
Well I hope someone finds this code helpful.
To diagnose whether it's an access policy issue, have you tried a policy that allows all access to the upload? Something like the following opens it up to everything:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "cloudsearch:*"
    }
  ]
}
I noticed that if you just go to the document upload endpoint in a browser (mine looks like "doc-YOURDOMAIN-RANDOMID.REGION.cloudsearch.amazonaws.com") you'll get the 403 "Request forbidden by administrative rules" error, even with open access, so as @dminer said you'll need to make sure you're posting to the correct full URL.
Have you considered using a PHP SDK? Like http://docs.aws.amazon.com/aws-sdk-php/guide/latest/service-cloudsearchdomain.html. It should take care of making correct requests, in which case you could rule out transport errors.
This never worked for me. I ended up using the CloudSearch command-line tools to upload files, and PHP cURL to search them.
Try adding "cloudsearch:document" to CloudSearch's access policy under Actions
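For example, the statement could look something like this (a sketch only; adjust the principal to your own needs, and note the action name comes from the suggestion above):
{
  "Effect": "Allow",
  "Principal": {
    "AWS": "*"
  },
  "Action": "cloudsearch:document"
}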
Related
I have a SilverStripe site that deals with very big data. I made an API that returns a very large dump, and I call that API on the front end via an AJAX GET.
When the AJAX call hits the API, it takes 10 minutes for the data to return (very long JSON data, and the customer has accepted that).
While they are waiting for the data to return, they open the same site in another tab to do other things, but the site is very slow until the previous AJAX request has finished.
Is there anything I can do to stop everything from becoming unresponsive while waiting for the big JSON data?
Here's the code and an explanation of what it does:
I created a method named geteverything that resides on the web server, shown below. It accesses another server (the data server) to get data via a streaming API sitting on that data server. There's a lot of data, and the data server is slow; my customer doesn't mind the request taking long, they mind how slow everything else becomes. Sessions are used to determine the particulars of the request.
protected function geteverything($http, $id) {
    if (($System = DataObject::get_by_id('ESM_System', $id))) {
        if (isset($_GET['AAA']) && isset($_GET['BBB']) && isset($_GET['CCC']) && isset($_GET['DDD'])) {
            /**
            --some condition check and data format for AAA BBB CCC and DDD goes here
            **/
            $request = "http://dataserver/streaming?method=xxx";
            set_time_limit(120);
            $jsonstring = file_get_contents($request);
            echo($jsonstring);
        }
    }
}
How can I fix this, or what else would you need to know in order to help?
The reason it's taking so long is that you're downloading the entirety of the JSON to your server and THEN sending it all to the user. There's no need to wait until you have the whole file before you start sending it.
Rather than using file_get_contents, make the connection with cURL and write the output directly to php://output.
For example, this script will copy http://example.com/ exactly as is:
<?php
// Initialise cURL. You can specify the URL in curl_setopt instead if you prefer
$ch = curl_init("http://example.com/");
// Open a file handler to PHP's output stream
$fp = fopen('php://output', 'w');
// Turn off headers, we don't care about them
curl_setopt($ch, CURLOPT_HEADER, 0);
// Tell curl to write the response to the stream
curl_setopt($ch, CURLOPT_FILE, $fp);
// Make the request
curl_exec($ch);
// close resources
curl_close($ch);
fclose($fp);
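Applied to the geteverything() method from the question, that could look roughly like this (a sketch that keeps the question's assumptions about the data server URL and GET parameters):
protected function geteverything($http, $id) {
    if (($System = DataObject::get_by_id('ESM_System', $id))) {
        if (isset($_GET['AAA']) && isset($_GET['BBB']) && isset($_GET['CCC']) && isset($_GET['DDD'])) {
            $request = "http://dataserver/streaming?method=xxx";
            set_time_limit(120);
            // Stream the data server's response straight to the browser
            // instead of buffering it all into $jsonstring first
            $ch = curl_init($request);
            $fp = fopen('php://output', 'w');
            curl_setopt($ch, CURLOPT_HEADER, 0);
            curl_setopt($ch, CURLOPT_FILE, $fp);
            curl_exec($ch);
            curl_close($ch);
            fclose($fp);
        }
    }
}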
I'm trying to get a JSON string from a page in my Laravel project, using this:
$json = file_get_contents($url);
$data = json_decode($json, TRUE);
return View::make('adventuretime.marceline')
->with('json', $json)
->with('title', 'ICE KING')
->with('description', 'I am the Ice King')
->with('content', 'ice king');
But since I'm only using localhost, I think this doesn't work, which is why it doesn't output anything. What is the proper way to make this flexible, so it can get the JSON string from any $url value, using PHP?
Looking at the comments above, it is possible that the $url you are using is not valid; check it by pointing your browser there and seeing what happens.
If you are sure that the $url is fine but you still get the 404 Not Found error, verify that you have proper Laravel routing defined for that address. If the routes are fine, maybe you forgot to run
composer dump-autoload
after making modifications in your routes.php. If so, try the above and refresh the browser to see if it helps.
Furthermore, bear in mind that with your current function you can only submit GET requests. What is more, on some hosting servers this function may not be available for fetching remote URLs, for security reasons. If you still want to use it, it'd be good to check
if($json !== FALSE)
before you process the $json response. If file_get_contents() fails, it returns false.
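In the context of the question's controller code, that check might look something like this (a sketch; the view and variable names are taken from the question, and App::abort() is just one way to bail out in Laravel 4):
$json = file_get_contents($url);
if ($json === FALSE) {
    // file_get_contents() failed; stop here instead of passing bad data to the view
    App::abort(500, 'Could not fetch JSON from ' . $url);
}
$data = json_decode($json, TRUE);
return View::make('adventuretime.marceline')
    ->with('json', $json)
    ->with('title', 'ICE KING');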
Referring to the part of your question:
what is the proper way for it to be flexible and be able to get the JSON string with any $url
I'd suggest using cURL as a standard and convenient way to fetch remote content. With cURL you have better control over the process of sending the HTTP request and receiving the response it returns. Personally, in my Laravel 4 apps I often use the jyggen/curl package. You can read its docs here: jyggen docs
If you are not satisfied with cURL and want greater control, try Guzzle. As the authors state, Guzzle is a PHP HTTP client and framework for building RESTful web service clients.
I would like to access skyDrive using PHP.
I want to retrieve the list of files and folders, and download, upload and delete files.
I've got a microsoft dev clientID and clientSecret.
Can anybody get me started with connecting to skyDrive with OAuth and making use of the API?
Thanks a lot!
This is actually quite a broad question. Here's hopefully something that will get you started.
Have a look at SkyDrive's REST API.
You could use PHP cURL to perform the GET's and POST's.
Use json_decode() to create a map of the received data.
For any data you send, create maps in PHP and convert them to JSON using json_encode().
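For example (a small sketch with made-up data, just to illustrate the encode/decode round trip):
// Decode a JSON response body into an associative array
$response = '{"id": "folder.abc123", "name": "Documents"}';
$map = json_decode($response, true);
echo $map['name']; // prints "Documents"
// Build a request body from a PHP array and encode it as JSON
$body = json_encode(array('name' => 'New folder', 'description' => 'Created via the API'));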
Try the API
Here is an interactive API you can try out live to see the responses.
Making requests
Example (taken from another SO answer):
// The URL should not include the HTTP verb; cURL sends a POST
// automatically when CURLOPT_POSTFIELDS is set
$url = 'https://apis.live.net/v5.0/me/skydrive/files';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
// The "@" prefix tells (older versions of) cURL to upload the named file
curl_setopt($ch, CURLOPT_POSTFIELDS, array('access_token' => TOKEN, 'name' => 'file', 'filename' => "@HelloWorld.txt"));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
curl_close($ch);
Request types: http://msdn.microsoft.com/en-us/library/live/hh243648.aspx#http_verbs
I also recommend you have a look at curl_setopt() to better understand how to do the different types of requests you'll be needing, using cURL. (Also this answer on SO has some good explanation on POST vs GET using cURL.)
File object
DELETE FILES:
To delete a file, make a DELETE request to /FILE_ID.
UPLOAD FILES:
To create a new File resource, you can either make a POST request to /FOLDER_ID/files, a POST request to the /UPLOAD_LOCATION for the target folder, or a PUT request to /FOLDER_ID/files/.
DOWNLOAD FILES:
To get properties for a File resource, make a GET request to /FILE_ID (the target file ID).
The File resource will contain the URL from which to download the file from SkyDrive in the source field.
Folder object
RETRIEVE LIST OF FILES:
To get the root Folder resource by using the Live Connect REST API, make a GET request to either /me/skydrive or /USER_ID/skydrive.
To get a subfolder resource, make a GET request to /FOLDER_ID.
CREATE FOLDERS:
To create a new Folder resource, make a POST request to /FOLDER_ID. Pass the name and description attributes in the request body
DELETE FOLDERS:
To delete a folder, make a DELETE request to /FOLDER_ID.
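Putting a couple of those requests together with cURL might look roughly like this (a sketch; ACCESS_TOKEN and FILE_ID are placeholders for values obtained via OAuth and earlier requests):
// List the contents of the root SkyDrive folder
$ch = curl_init('https://apis.live.net/v5.0/me/skydrive/files?access_token=' . ACCESS_TOKEN);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$files = json_decode(curl_exec($ch), true);
curl_close($ch);
// Delete a file by its ID with a DELETE request
$ch = curl_init('https://apis.live.net/v5.0/' . FILE_ID . '?access_token=' . ACCESS_TOKEN);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'DELETE');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_exec($ch);
curl_close($ch);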
OAuth 2.0
My experience with OAuth is unfortunately limited. I can only provide some relevant links and advice which I hope will help.
Review the Protocol Overview and consider if you want to implement something yourself, or use a library. Quick Google search gives me:
http://code.google.com/p/google-api-php-client/wiki/OAuth2
http://code.google.com/p/oauth2-php/
http://php.net/manual/en/book.oauth.php
Some other potentially useful links and guides:
See PHP section of http://oauth.net/code/
From within the HTML code in one of my server pages I need to run a search for a specific item against a database on another remote server that I don't own.
Example of the search type that performs my request: http://www.remoteserver.com/items/search.php?search_size=XXL
The remote server returns to me, as the client, a response displaying a page with several items that match my search criteria.
I don't want this page displayed. What I want is to collect into a string (or local file) the full contents of the remote server's HTML response (the code we can see when clicking 'View Source' in my IE browser).
If I collect that data (it could easily reach 50000 bytes) I can then filter out the parts I am interested in (substrings) and assemble a new request to the remote server for just one of the specific items in the response.
Is there any way to get the HTML of the remote server's response with JavaScript or PHP, while avoiding having the response displayed in the browser itself?
I hope I have not confused your minds …
Thanks for any help you may provide.
As @mario mentioned, there are several different ways to do it.
Using file_get_contents():
$txt = file_get_contents('http://www.example.com/');
echo $txt;
Using php's curl functions:
$url = 'http://www.mysite.com';
$ch = curl_init($url);
// Tell curl_exec to return the text instead of sending it to STDOUT
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
// Don't include return header in output
curl_setopt($ch, CURLOPT_HEADER, 0);
$txt = curl_exec($ch);
curl_close($ch);
echo $txt;
cURL is probably the more robust option because it gives you more control over the exact request parameters and more possibilities for error handling when things don't go as planned.
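For instance, a minimal error-handling sketch (reusing the example search URL from the question) might look like:
$ch = curl_init('http://www.remoteserver.com/items/search.php?search_size=XXL');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_TIMEOUT, 30); // give up after 30 seconds
$txt = curl_exec($ch);
if ($txt === FALSE) {
    // Transport-level failure (DNS, timeout, connection refused, ...)
    echo 'cURL error: ' . curl_error($ch);
} elseif (curl_getinfo($ch, CURLINFO_HTTP_CODE) != 200) {
    // The server responded, but not with a 200 OK
    echo 'HTTP error: ' . curl_getinfo($ch, CURLINFO_HTTP_CODE);
}
curl_close($ch);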
I've got a simple PHP script to ping some of my domains using file_get_contents(); however, I have checked my logs and they are not recording any GET requests.
I have
$result = file_get_contents($url);
echo $url. ' pinged ok\n';
where $url for each of the domains is just a simple string of the form http://mydomain.com/, and the echo verifies this. Manual requests made by me are showing up.
Why would the get requests not be showing in my logs?
Actually, I've got it to register the hit when I send $result to the browser. I guess this means the web server only records browser requests? Is there any way to mimic that in PHP?
OK, I tried PHP cURL:
// create curl resource
$ch = curl_init();
// set url
curl_setopt($ch, CURLOPT_URL, "getcorporate.co.nr");
//return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// $output contains the output string
$output = curl_exec($ch);
// close curl resource to free up system resources
curl_close($ch);
Same effect though: no hit registered in the logs. So far it only registers when I feed the HTTP response from my script back to the browser. Obviously this will only work for a single request and not for a batch, which is the purpose of my script.
If something else is going wrong, what debugging output can I look at?
Edit: D'oh! See comments below accepted answer for explanation of my erroneous thinking.
If the request is actually being made, it would be in the logs.
Your example code could be failing silently.
What happens if you do:
<?php
if ($result = file_get_contents($url)) {
    echo "Success";
} else {
    echo "Epic Fail!";
}
If that's failing, you'll want to turn on some error reporting or logging and try to figure out why.
Note: if you're in safe mode, or otherwise have fopen url wrappers disabled, file_get_contents() will not grab a remote page. This is the most likely reason things would be failing (assuming there's not a typo in the contents of $url).
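One quick way to check (a small sketch, not part of the original answer) is to look at the relevant ini setting and turn on error reporting before the call:
// Shows whether remote URL wrappers are available on this server
var_dump(ini_get('allow_url_fopen'));
// Surface warnings from file_get_contents() instead of failing silently
error_reporting(E_ALL);
ini_set('display_errors', 1);
$result = file_get_contents($url);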
Use curl instead?
That's odd. Maybe there is some caching afoot? Have you tried changing the URL dynamically ($url = $url."?timestamp=".time() for example)?