How to search values that change often with CloudSearch? - php

I'm new to CloudSearch and my question might not be clear, so I will try to explain my problem.
We have a back office where a lot of people run searches, and from time to time our database goes down because of requests that take more than 30 seconds to execute, so we decided to use CloudSearch since we already use some other Amazon web services.
So I created a search domain, defined the index fields according to the values we search for in our current database, and indexed all our events (the results of what people search for) from our test database (~42,000 rows).
My problem is that each event has multiple media files (.jpg, .gif and .mp4) in our database (and we are migrating from v3 to v4, so there are two media databases and we need to know the event version to know where we should search: the old or the new database). So my question: can I return some media information with CloudSearch, or will I still need to use a MySQL request?
Right now we return the last media item added to the database (so it can change many times while the event is running) and the total number of media items for the event (which can also change very often).
What I think might work:
I can add the two fields to my event index (number of media items + URL of the last media item) and create a batch file to add/update the event data EACH time we add a new media item to the database. Problem: we can send 1 batch every 10 seconds and at most 10,000 batches per day, so if we have 50 events running at the same time it could be a big problem...
Same idea as before, but we use a CRON job to create a batch file with all the latest data every hour, for example. Problem: the search results won't be right until the next batch runs... and the max batch size is 5 MB, so it should be okay, but if we have a lot of new data to add it could be a small problem.
The current idea is to do a MySQL request using each event id we get from the CloudSearch results and return that information, but I find it kind of stupid to still use MySQL if we switched to CloudSearch...
I saw the documentation for "Using Dynamic Fields in Amazon CloudSearch" but I don't think it does what I want to achieve... maybe I misunderstand something, but if someone can help me understand the best way to do this I would be thankful.

Can I return some media information with CloudSearch or will I still need to use a MySQL request?
If you are asking whether you can store .mp4, .jpg, etc. media files in CloudSearch, the answer is no. You can store text, numbers, dates, and latlong coordinates (or arrays of any of those, except latlong).
I think the conventional way to handle media is to index a URL/path to the media as a text field.
Reference: AWS Cloudsearch Documentation - Configuring Index Fields
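If you go that route, a minimal sketch with the AWS SDK for PHP (v3) could look like the following. The field names (media_count, last_media_url), the document endpoint, and the event id are assumptions, not part of your domain:

<?php
// Sketch: store media metadata on the event document itself, assuming the
// domain defines an int field "media_count" and a text field
// "last_media_url" (both names are hypothetical).
require 'vendor/autoload.php';

use Aws\CloudSearchDomain\CloudSearchDomainClient;

$client = new CloudSearchDomainClient([
    'endpoint' => 'https://doc-yourdomain-xxxxxxxxxx.us-east-1.cloudsearch.amazonaws.com',
    'region'   => 'us-east-1',
    'version'  => '2013-01-01',
]);

// "add" also overwrites an existing document with the same id, which is
// how you would refresh the media fields when a new media item arrives.
$batch = [[
    'type'   => 'add',
    'id'     => 'event-42',
    'fields' => [
        'media_count'    => 17,
        'last_media_url' => 'https://cdn.example.com/events/42/latest.jpg',
    ],
]];

$client->uploadDocuments([
    'contentType' => 'application/json',
    'documents'   => json_encode($batch),
]);

The batch limits you mention still apply, so grouping several media updates into one uploadDocuments call is the usual way to stay under them.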

Related

How to keep text data consistent in an iOS app and website?

It's a known fact that we can use JSON/XML parsing or a database to keep a limited amount of data consistent between a given application and website at any given time.
However, the dilemma is a project with a few thousand (3k-4k) lines of display text that are supposed to be consistent on both the app and the website on a selected UI, and these text files may change at any given point in time. What would be the optimal method or implementation steps for this?
Just my bit: store the data on the server as a text file and later have the website or the app parse the text file at the given location to display it.
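A minimal PHP sketch of that idea, assuming a single JSON file (display_text.json is a hypothetical name) that both the app and the website fetch from one endpoint:

<?php
// serve_text.php - hypothetical endpoint; both clients read the same file,
// so the copy only has to change in one place on the server.
$file = __DIR__ . '/display_text.json';

header('Content-Type: application/json; charset=utf-8');
// Let clients cache on the modification date, so unchanged text is cheap.
header('Last-Modified: ' . gmdate('D, d M Y H:i:s', filemtime($file)) . ' GMT');

readfile($file);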

Is it possible to edit documents using the Google Drive API?

I need to create weekly texts using the same template. Being the lazy programmer I am, I wanted to automate most of it by just creating a Google Form where I can input the data. By then running a PHP script, I want to parse the new entry and put it into an automatically created new document.
I have created the template with placeholders such as <DATE> or <NEWMEMBERCOUNT> that I later want to replace by the values entered using the Google Form.
For this I have already utilized the packages google/apiclient and asimlqt/php-google-spreadsheet-client to read the form results (which are stored in a spreadsheet) and duplicate the template doc for each entry.
I'm almost finished and just need to replace the placeholders by their corresponding values, but I can't seem to find a way to do that. Specifically I need to read the content of the document, perform some transformations on it (i.e. replacing the placeholders) and save it with this transformed text.
I should have thought about this before starting to program it...
Is it possible for me to edit documents at all, using just PHP? If so, how could I go about it? Any guidance is appreciated!
You can't edit in situ, but you can download, edit, upload. Is this a classic mail merge, i.e. take a spreadsheet containing (rows of) data and apply a template to those rows so that each row results in an output file?
If so, simples...
Download the spreadsheet
Download the template
For each spreadsheet row
replace the placeholders with data
insert a new file to drive
That can all be done with the Drive API from PHP.
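A rough sketch of that loop, assuming the Drive v3 service from google/apiclient and a template stored as a Google Doc. The file ID, output name, and placeholder values are hypothetical, and note that exporting as text/plain drops the Doc's formatting:

<?php
require 'vendor/autoload.php';

$client = new Google_Client();
$client->setAuthConfig('credentials.json');
$client->addScope(Google_Service_Drive::DRIVE);
$drive = new Google_Service_Drive($client);

// Download: export the Google Doc template as plain text.
$response = $drive->files->export('TEMPLATE_FILE_ID', 'text/plain', ['alt' => 'media']);
$template = (string) $response->getBody();

// Edit: one spreadsheet row's worth of placeholder replacements.
$row = ['<DATE>' => '2016-06-06', '<NEWMEMBERCOUNT>' => '12'];
$content = strtr($template, $row);

// Upload: insert the result as a new file on Drive.
$file = new Google_Service_Drive_DriveFile(['name' => 'Weekly text 2016-06-06']);
$drive->files->create($file, [
    'data'       => $content,
    'mimeType'   => 'text/plain',
    'uploadType' => 'multipart',
]);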
This is not possible with anything except Google Apps Script. See https://developers.google.com/apps-script/reference/document/document-app
You can use Apps Script to create a ContentService and call it from your PHP. Beware of the limited quotas if you plan to make many daily calls.
More info about building this content service is covered in other S.O. questions that ask about that specifically.

Creating dynamic sitemaps with Codeigniter

I have somewhere in the region of 60,000 URLs that I want to submit to Google. Given the restriction of 10,000 URLs per file, I'm going to need to make a sitemap index and link to at least 6 sitemap files in that index.
I don't know what the most efficient way of doing this is. My idea was to go to my DB, take the TOP 10,000 rows, run my foreach on the data, and generate my links. My first idea was to create placeholder sitemap files (e.g. sm1.xml, sm2.xml, etc.) and after every 10,000 rows increment the file index and insert the next 10,000 into the next file. The problem is that data is always being added to the DB, so next month I could have 70,000 URLs - meaning I'd have to create another placeholder file.
So with this in mind, I'd like to create the individual sitemap files dynamically but I don't know how.
Some ideas that might help you on your way to building a sitemap generator for your project:
get the URLs from your routes.php file
get the classes/methods using PHP's ReflectionClass
get the data from the database or a text file
loop through each data set like you stated above and create indexed files for them (see the controller sketch below)
use a CRON job to ping the search engines with your updated files
Use the ping services provided by these search engines, but you should maybe only ping them at the end of each day or every second day - don't ping them every time a new row is created!
Google Ping
http://www.google.com/webmasters/sitemaps/ping?sitemap=http://www.yourdomain.com/sitemap.xml
MSN
http://www.bing.com/webmaster/ping.aspx?siteMap=http://www.yourdomain.com/sitemap.xml
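Here is a sketch of a CodeIgniter 3 controller that serves the sitemap index and the individual sitemaps dynamically, so new rows are picked up without regenerating placeholder files. The table name (pages), the URL column (slug), and the routes are assumptions:

<?php
defined('BASEPATH') OR exit('No direct script access allowed');

class Sitemap extends CI_Controller {

    const PER_FILE = 10000; // URLs per sitemap file

    public function __construct() {
        parent::__construct();
        $this->load->helper('url'); // for site_url()
    }

    // /sitemap.xml - the index, one <sitemap> entry per chunk of rows
    public function index() {
        $files = (int) ceil($this->db->count_all('pages') / self::PER_FILE);

        $xml = '<?xml version="1.0" encoding="UTF-8"?>';
        $xml .= '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';
        for ($i = 1; $i <= $files; $i++) {
            $xml .= '<sitemap><loc>' . site_url("sitemap/$i.xml") . '</loc></sitemap>';
        }
        $xml .= '</sitemapindex>';

        $this->output->set_content_type('application/xml')->set_output($xml);
    }

    // /sitemap/{n}.xml - the nth chunk of PER_FILE rows
    public function page($n) {
        $rows = $this->db->select('slug')
                         ->limit(self::PER_FILE, ((int) $n - 1) * self::PER_FILE)
                         ->get('pages')
                         ->result();

        $xml = '<?xml version="1.0" encoding="UTF-8"?>';
        $xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';
        foreach ($rows as $row) {
            $xml .= '<url><loc>' . site_url($row->slug) . '</loc></url>';
        }
        $xml .= '</urlset>';

        $this->output->set_content_type('application/xml')->set_output($xml);
    }
}

With routes like $route['sitemap.xml'] = 'sitemap/index'; and $route['sitemap/(:num).xml'] = 'sitemap/page/$1'; you submit sitemap.xml to Google once and then just ping when content changes.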

Amazon S3: What are considered PUT/COPY/POST/LIST request?

Kindly confirm if this is correct:
PUT is probably uploading files to S3?
COPY is probably copying files within S3?
How about POST and LIST?
Additional question: are get_bucket_filesize() and get_object_filesize() (from the PHP SDK) considered LIST requests?
From my experience using S3 (and also from the basics of the HTTP protocol and REST), POST is the creation of a new object (in S3, the upload of a new file), and PUT is the creation of a new object or an update of an existing one (i.e., creation or update of a file). Additionally, from the S3 docs:
POST is an alternate form of PUT that enables browser-based uploads as a way of putting objects in buckets
Every time you, for example, get the contents of a given S3 bucket, you're running a LIST operation. You have not asked, but a GET is the download of a file from S3 and a DELETE would obviously be the deletion of a file. Of course these assumptions depend on which SDK you are using (it seems you're using the PHP one) and its underlying implementation. My point is that it would be possible to implement a download using a GET, an upload using a PUT or a POST, and so forth.
Taking a look at the S3 REST API, though, I assume get_bucket_filesize() is implemented as a LIST (a GET operation on a bucket returns, along with some more data, the size of each object) and get_object_filesize() is implemented as a GET (the HEAD operation on a single file also includes its size in the metadata).
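To make the mapping concrete, here is roughly how the operations line up with calls in the current AWS SDK for PHP (v3). The bucket and key names are placeholders, and this is a sketch rather than a statement about how the older SDK's helpers are implemented:

<?php
require 'vendor/autoload.php';

use Aws\S3\S3Client;

$s3 = new S3Client(['region' => 'us-east-1', 'version' => 'latest']);

// PUT - one upload is one PUT request
$s3->putObject(['Bucket' => 'my-bucket', 'Key' => 'file.txt', 'Body' => 'hello']);

// COPY - a server-side copy within S3
$s3->copyObject(['Bucket' => 'my-bucket', 'Key' => 'copy.txt', 'CopySource' => 'my-bucket/file.txt']);

// LIST - enumerate the keys (and sizes) in a bucket
$s3->listObjects(['Bucket' => 'my-bucket']);

// GET - download an object; HEAD fetches only metadata such as the size
$s3->getObject(['Bucket' => 'my-bucket', 'Key' => 'file.txt']);
$s3->headObject(['Bucket' => 'my-bucket', 'Key' => 'file.txt']);

// DELETE - remove an object
$s3->deleteObject(['Bucket' => 'my-bucket', 'Key' => 'file.txt']);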
Yes, you are right. PUT is uploading (specifically, one file is one PUT). I was watching for whether PUT was per file or per some packet size, which would have made it more difficult to price. It is putting a file, without reference to size.
ALSO, COPY is indeed copying files within S3, but there's more. See below.
I also found references to POST and LIST; see below.
So here is what I learned about PUT/COPY/POST/LIST and GET requests while digging in to assess our costs. I'm also including WHERE I discovered it (I wanted to get it all from Amazon). All corrections are welcome.
Amazon's FAQ is here: https://aws.amazon.com/s3/faqs/ and I'll reference this below.
COPY can be several things, one of which is copying between regions, which does cost. For example, if you store in West VA and COPY to the Northern CA region, that incurs cost. Copying from EC2 to S3 (within the same region, I presume) incurs no transfer cost. See Amazon's FAQ under the section Q: How much does Amazon S3 cost?
NOTE: Writing a file, then re-writing that same file stores both versions (unless you delete something). I'm guessing you are not charged more if the files are exactly the same, but don't send me the bill if I'm wrong. :-) It seems that the average size (over a month) is what is billed. See the FAQ (link above).
For PUT, GET and DELETE, it appears one file is one transaction. That answers a big question for me (I didn't want their 128k minimum size to mean a PUT for each 128k packet... yeah, I'm paranoid). See the question section like this:
Q: How will I be charged and billed for my use of Amazon S3?
Request Example:
Assume you transfer 10,000 files into Amazon S3 and transfer 20,000 files out of Amazon S3 each day during the month of March. Then, you delete 5,000 files on March 31st.
Total PUT requests = 10,000 requests x 31 days = 310,000 requests
Total GET requests = 20,000 requests x 31 days = 620,000 requests
Total DELETE requests = 5,000 requests x 1 day = 5,000 requests
LIST is mentioned under the question:
Q: Can I use the Amazon S3 APIs or Management Console to list objects that I’ve archived to Amazon Glacier?
It is essentially getting a list of files… a directory, if you will.
POST is mentioned under RESTObjectPost.html here: http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPOST.html
I hope that helps. It sure made me more comfortable with what we would be charged.
There is not much of a difference between PUT and POST. The following is copied from the AWS S3 documentation.
POST is an alternate form of PUT that enables browser-based uploads as a way of putting objects in buckets. Parameters that are passed to PUT via HTTP Headers are instead passed as form fields to POST in the multipart/form-data encoded message body.
http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPOST.html
As others have specified, LIST is for listing objects. You can find all the operations at the following link.
http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketOps.html

Efficiently storing data

I am trying to create a world application using jQuery (JS) and PHP. I originally tried doing this with a MySQL database, which didn't work well - the server got overloaded with database queries and crashed.
This time I want to store the data in a text file... maybe use JSON to parse it? How would I do this? The three main things I want to store are:
Name
x-position
y-position
The x and y positions are given by the JS. So, in order:
User loads page and picks username
User moves character, the jQuery gets the x and y position
The username, x and y position are sent to a PHP page in realtime using jQuery's $.post()
The PHP page has to find some way to store it efficiently without crashing the database.
The PHP page sends back ALL online users' names and x and y coordinates to jQuery
jQuery moves the character; everyone sees the animation.
Storing the data in a file instead of the MySQL database isn't the way to improve performance. MySQL stores its data in files too, but it uses techniques like caching and indexes to improve performance.
The fastest way to save and retrieve data on a server is to use RAM as storage. Redis, for example, does that: it stores all the data in RAM and can back it up to the hard drive to prevent data loss.
However, I don't think the main problem here is MySQL itself. You are probably using it in an inappropriate way, but I can't say exactly, since I don't know how many read and write requests your users generate, what the structure of your tables is, etc.
Text files are not the best-performing things on Earth. Use a key-value store like Redis (it has a PHP client) to store the positions. It should be able to take a lot more of a beating than the MySQL server.
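A minimal sketch of that with the phpredis extension; the key naming (one player:<name> hash per user plus a players set) and the script name are assumptions:

<?php
// move.php - hypothetical target of jQuery's $.post(); stores one player's
// position and returns everyone's positions as JSON.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

// Write: record this player's latest position.
$name = $_POST['name'];
$redis->sAdd('players', $name);                 // track who is online
$redis->hMSet("player:$name", [
    'x' => (int) $_POST['x'],
    'y' => (int) $_POST['y'],
]);

// Read: collect every online player's position.
$out = [];
foreach ($redis->sMembers('players') as $player) {
    $pos = $redis->hGetAll("player:$player");
    $out[] = ['name' => $player, 'x' => (int) $pos['x'], 'y' => (int) $pos['y']];
}

header('Content-Type: application/json');
echo json_encode($out);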
You can store the data in a text file in CSV (comma-separated values) format.
For example, consider your requirements.
1,Alice,23,35
2,Bob,44,63
3,Clan,435,322
This text file can be stored and read at any time, and you can use the explode function to separate the values, as shown below.
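For example, reading that file back with explode (players.csv is a hypothetical file name):

<?php
// Each line is "id,name,x,y", as in the sample above.
$lines = file('players.csv', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach ($lines as $line) {
    list($id, $name, $x, $y) = explode(',', $line);
    echo "$name is at ($x, $y)\n";
}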