Use PHP to sync large amounts of text

I have several laptops in the field that need to daily get information from our server. Each laptop has a server2go installation (basically Apache, PHP, MySQL running as an executable) that launches a local webpage. The webpage calls a URL on our server using the following code:
$handle = fopen( $downloadURL , "rb");
$contents = stream_get_contents( $handle );
fclose( $handle );
The $downloadURL fetches a ton of information from a MySQL database on our server and returns the results as output to the device. I am currently returning the results as ready-made SQL statements (i.e., if I query the database with "SELECT name FROM names", I might return to the device the text string "INSERT INTO names SET name='JOHN SMITH'"). This takes the info from the online database and returns it to the device as SQL statements ready for insertion into the laptop's database.
The problem I am running into is that the amount of data is too large. The laptop webpage keeps timing out when retrieving info from the server. I have set the PHP timeout limits very high, but still run into problems. Can anyone think of a better way to do this? Will stream_get_contents stay connected to the server if I flush the data to the device in smaller chunks?
Thanks for any input.

What if you just send over the data and generate the SQL on the receiving side? That would save you a lot of bytes over the wire.
Is the data update incremental? I.e. can you just send over the changes since the last update?
If you do have to send over a huge chunk of data, you might want to compress or zip it and then unzip it on the other side; PHP's zlib functions (gzencode()/gzdecode()) can handle that.
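A rough sketch of combining those two ideas (raw rows plus gzip); the export endpoint, the JSON payload format, and the old mysql_* calls are assumptions to match the era of the question:
// Server side (hypothetical export.php): send raw rows as gzipped JSON
// instead of ready-made INSERT statements.
$result = mysql_query("SELECT name FROM names");
$rows = array();
while ($row = mysql_fetch_assoc($result)) {
    $rows[] = $row;
}
header('Content-Type: application/octet-stream');
echo gzencode(json_encode($rows), 9);

// Laptop side: download, decompress, and build the INSERTs locally.
$compressed = file_get_contents($downloadURL);
$rows = json_decode(gzdecode($compressed), true);
foreach ($rows as $row) {
    // build and run the local INSERT for $row['name'] here
}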

Write a script that compiles a text file from the database on the server, and download that file.
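For example, a minimal sketch of such an export script; the output path, table, and mysql_* calls are placeholders, not code from the question:
// Runs on the server (e.g. via cron): compile the day's data into one file.
$result = mysql_query("SELECT name FROM names");
$fh = fopen('/var/www/exports/daily_sync.sql', 'w');   // placeholder path
while ($row = mysql_fetch_assoc($result)) {
    fwrite($fh, "INSERT INTO names SET name='" . mysql_real_escape_string($row['name']) . "';\n");
}
fclose($fh);
// The laptop then downloads the finished file in a single request.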

You might want to consider using third-party file synchronization services, like Windows Live Sync or Dropbox to get the latest file synchronized across all the machines. Then, just have a daemon that loads up the file into the database whenever the file is changed. This way, you avoid having to deal with the synchronization piece altogether.

You are using stream_get_contents (you could even use file_get_contents and skip the extra line that opens the stream), but if the amount of text is really large, as the title says, you'll fill up your memory.
I ran into this problem when writing a script for a remote server where memory was limited, so that approach wouldn't work. The solution I found was to use stream_copy_to_stream instead and copy the data directly to disk rather than into memory.
The core of that piece of functionality looks roughly like this (the URL and local path below are placeholders):
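$downloadURL = 'http://www.example.com/export.php';   // placeholder source URL, not the original
$localFile   = '/path/to/local/dump.sql';             // placeholder destination on disk

$src  = fopen($downloadURL, 'rb');
$dest = fopen($localFile, 'wb');

// Copy the remote stream straight to disk instead of buffering it all in memory.
stream_copy_to_stream($src, $dest);

fclose($src);
fclose($dest);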

Related

Speeding up PHP File Writes

I have 8 load-balanced web servers powered by NGINX and PHP. Each of these web servers posts data to a central MySQL database server. They [the web servers] also post the same data (albeit slightly reformatted) to a text file on a separate log server, line by line, i.e. one database insert = one line in the log file.
The active code of the PHP file doing the logging looks something like below:
file_put_contents($file_path_to_log_file, $single_line_of_text_to_log, FILE_APPEND | LOCK_EX);
The problem I'm having is scaling this to 5,000 or so logs per second. The operation will take multiple seconds to complete and will slow down the Log server considerably.
I'm looking for a way to speed things up dramatically. I looked at the following article: Performance of Non-Blocking Writes via PHP.
However, from the tests it looks like the author has the benefit of access to all the log data prior to the write. In my case, each write is initiated randomly by the web servers.
Is there a way I can speed up the PHP writes considerably? Or should I just log to a database table and then dump the data to a text file at timed intervals?
Just for your info: I'm not using the text file in the traditional 'logging' sense; it's a CSV file that I'm going to be feeding to Google BigQuery later.
Since you're writing all the logs to a single server, have you considered implementing the logging service as a simple socket server? That way you would only have to fopen the log file once when the service starts up, and write out to it as the log entries come in. You would also get the added benefit of the web server clients not needing to wait for this operation to complete...they could simply connect, post their data, and disconnect.
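A rough sketch of such a service using PHP's sockets extension; the port, log path, and the one-line-per-write protocol are assumptions:
// Long-running logging service on the log server (started once, e.g. under supervisord).
$sock = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
socket_bind($sock, '0.0.0.0', 9000);      // placeholder port
socket_listen($sock);

// Open the log file a single time, when the service starts.
$log = fopen('/var/log/app/bigquery_feed.csv', 'ab');   // placeholder path

while (true) {
    // Each web server connects, sends its CSV line(s), and disconnects.
    $client = socket_accept($sock);
    while (($line = socket_read($client, 8192, PHP_NORMAL_READ)) !== false && $line !== '') {
        fwrite($log, $line);
    }
    socket_close($client);
}
On the web servers, the logging call then becomes a short fsockopen()/fwrite()/fclose() instead of a file_put_contents() against a shared file.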

Importing Customer Database via CSV to RDS (MySQL)

We're working on a feature to allow our users to import their own customer/marketing database into our system from a CSV file they upload to our servers.
We're using PHP on Ubuntu 10.04 on Amazon EC2 backed by MySQL on Amazon RDS.
What we've currently got is a script that uses LOAD DATA LOCAL INFILE but it's somewhat slow, and will be very slow when real users start uploading CSV files with 100,000+ rows.
We do have an automation server that runs several tasks in the background to support our application, so maybe this is something that should be handed over to that server (or group of servers)?
So a user would upload a CSV file, we'd stick it in an S3 bucket, and either drop a row in a database somewhere linking that file to the user or use SQS or something to let the automation server know to import it. Then we'd just tell the user their records are importing and will show up gradually over the next few minutes/hours?
Has anybody else had any experience with this? Is my logic right, or should we be looking in an entirely different direction?
Thanks in advance.
My company does exactly that, via cron.
We allow the user to upload a CSV, which is then sent to a directory to wait. A cron running every 5 minutes checks a database entry that is made on upload, which records the user, file, date/time, etc. If a file that has not been parsed is found in the DB, it accesses the file based on the filename, checks to ensure the data is valid, runs USPS address verification, and finally puts it in the main user database.
We have similar functions set up to send large batches of emails, build model abstractions of user cross-sections, etc. All in all, it works quite well. Three servers can adequately handle millions of records, with tens of thousands being loaded per import.
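A condensed sketch of that kind of cron job; the csv_uploads table, the customers table, and the connection details are made-up names for illustration:
// Runs from cron every few minutes on the automation/import server.
$pdo = new PDO('mysql:host=rds-endpoint;dbname=app', 'user', 'pass', array(
    PDO::MYSQL_ATTR_LOCAL_INFILE => true,
));

// Find uploads recorded at upload time that have not been imported yet.
$pending = $pdo->query("SELECT id, file_path FROM csv_uploads WHERE imported = 0");

foreach ($pending as $upload) {
    // Validate the file here (row counts, required columns, etc.), then bulk-load it.
    $sql = "LOAD DATA LOCAL INFILE " . $pdo->quote($upload['file_path']) . "
            INTO TABLE customers
            FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
            LINES TERMINATED BY '\\n'
            IGNORE 1 LINES";
    $pdo->exec($sql);

    $pdo->prepare("UPDATE csv_uploads SET imported = 1 WHERE id = ?")
        ->execute(array($upload['id']));
}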

What happens when an XML file is read while it is being written?

I have an iOS app that allows users to update the cover charges at local bars. The data is then displayed in the app for other users to see. The updates are made by sending a request to a PHP script, which then updates an XML file. What will happen if a user tries to read the XML while another user is updating it, i.e. while the file is being rewritten with a new update?
Thanks!
The user has a 50-50 chance of getting the updated version; depending on the server speed and their connection speed, it may differ. I agree with AMayer: once the file gets big, it's going to be hard on your server to download and upload the ENTIRE XML file again and again. I would just set up a MySQL database now and use it instead of the XML.
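A bare-bones sketch of that MySQL route; the table and column names (cover_charges with bar_id as its primary key, amount) are invented for illustration:
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// Handling an update request from the iOS app: one atomic row update,
// no rewriting of a whole file.
$stmt = $pdo->prepare(
    'INSERT INTO cover_charges (bar_id, amount)
     VALUES (:bar_id, :amount)
     ON DUPLICATE KEY UPDATE amount = VALUES(amount)'
);
$stmt->execute(array(':bar_id' => $_POST['bar_id'], ':amount' => $_POST['amount']));

// Readers never see a half-written file; they just query the current rows.
$rows = $pdo->query('SELECT bar_id, amount FROM cover_charges')->fetchAll(PDO::FETCH_ASSOC);
echo json_encode($rows);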

Send a file from one PHP page to another

I need to send a file from one PHP page (on which clients upload their files) to another PHP page on another server, where the files will be finally stored.
To communicate I currently use the JSON-RPC protocol; is it wise to send the file this way?
$string = file_get_contents("uploaded_file_path");
send the string to the remote server and then
file_put_contents("file_name", $received_string_from_remote);
I understand that this approach takes twice as long as uploading directly to the second server.
Thanks
[edit]
details:
I need to write a service that lets a PHP (possibly Joomla) user call a simple API to upload files and send some other data to my server, which analyzes them, puts them in a DB, and sends back a response.
[re edit]
I need to create a simple method that lets the final user do that: they will only use the interface on server 1 (the upload), with PHP, and stop there, so no remote SSH mounts or other strange, funny stuff.
If I were you, I'd send the file directly to the second server and store its file name and/or some hash of the file name (for easier retrieval) in a database on the first server.
Using this approach, you could query the second server from the first one for the status of the operation. This way, you can leave the file processing to the second machine, and assign user interaction to the first machine.
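A sketch of that approach; the receiving URL, the form field name, and the uploads table are assumptions, and here the hash is taken over the file name as suggested above:
// On server 1, right after the upload: forward the file to server 2 with cURL
// and record the name/hash locally so status can be looked up later.
$localPath    = $_FILES['upload']['tmp_name'];
$originalName = $_FILES['upload']['name'];
$nameHash     = sha1($originalName);

$ch = curl_init('https://server2.example.com/receive.php');   // placeholder endpoint
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, array(
    'file' => new CURLFile($localPath, null, $originalName),  // multipart upload
    'hash' => $nameHash,
));
$response = curl_exec($ch);
curl_close($ch);

// Keep the reference on server 1; server 2 can then be queried by hash for status.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$pdo->prepare('INSERT INTO uploads (name, name_hash, status) VALUES (?, ?, ?)')
    ->execute(array($originalName, $nameHash, 'forwarded'));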
As I said in my comment, THIS IS NOT RECOMMENDED, but anyway...
You can use sockets reading byte by byte:
http://php.net/manual/en/book.sockets.php
or you can use ftp:
http://php.net/manual/en/book.ftp.php
Anyway, the question with your approach is whether the process runs asynchronously or synchronously with the user's navigation. I really suggest you pass it via SQL or FTP and give the user a response based on another event (like a file watcher, then an email, etc.) or via SQL (binary, blob, etc.).
Use SSHFS on machine 1 to map a file path to machine 2 (using SSH) and save the uploaded file to machine 2. After the file is uploaded, trigger machine 2 to do the processing and report back as normal.
This would allow you to upload to machine 1, but actually stream it to machine 2's HD so it can be processed faster on that machine.
This will be faster than any SQL or manual file copy solution, because the file transfer happens while the user is uploading the file.
If you don't need the files immediately after receiving them (for processing, etc.), then you can save them all in one folder on Server 1 and set up a cron job to scp the contents of the folder to Server 2. All this assumes you are using Linux servers; this is one of the most secure and efficient ways to do it.
For more info please take a look at http://en.wikipedia.org/wiki/Secure_copy or google scp.

PHP Download script with multiple connections

I have PHP code for a file download for a specific user.
I'm storing the content of the file in a database (using blob type).
<?php
//do stuff to validate the user
//do stuff to get the content from the database
//$r = mysql_fetch_object($result);
header("Content-Type: $r->type");
header("Content-Disposition: attachment; filename=\"$r->name\"");
echo $r->content;
?>
In the case of large files, the download takes a long time.
How can I improve the code?
Does the download speed increase with multiple connections?
Assuming there's no artificial limits placed on the connection, an HTTP transfer will take up as much of the network pipe as it can.
Once the connection starts getting throttled (e.g. on a file download site like Rapidshare, 'free' users get limited bandwidth), then using parallel connections MAY increase speed. e.g. a single stream is limited to 50k/s, so opening 2 streams would make for an effective 100k/s.
But then you're going to have to support ranged download. Your script as it stands sends out the entire file, from beginning to end. So the user would download the whole file twice.
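For reference, a simplified sketch of honoring a single byte-range request on top of the script above; real code needs more validation (multiple ranges, malformed headers, ranges past the end of the data, and so on):
$size  = strlen($r->content);   // total length of the stored blob
$start = 0;
$end   = $size - 1;

if (isset($_SERVER['HTTP_RANGE']) &&
    preg_match('/bytes=(\d+)-(\d*)/', $_SERVER['HTTP_RANGE'], $m)) {
    $start = (int) $m[1];
    if ($m[2] !== '') {
        $end = (int) $m[2];
    }
    header('HTTP/1.1 206 Partial Content');
    header("Content-Range: bytes $start-$end/$size");
}

header('Accept-Ranges: bytes');
header('Content-Length: ' . ($end - $start + 1));
// Content-Type / Content-Disposition headers as in the original script.
echo substr($r->content, $start, $end - $start + 1);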
There's probably not that much you can do to speed up this specific process.
Server and client bandwidth are hard limits. Streaming the file through PHP will cause some additional overhead, but seeing as the data comes from a database, there is no straightforward way to improve that, either.
Moving to a faster server with more bandwidth may help things, but then also it might not. If the client's connection is slow, there is nothing you can do.
