PHP reads text files very slowly

I have a large number of files in a directory tree and I'm using PHP to read them into a string. For example, a file's path looks like this: filerootdir/dir1/dir2/dir3/dir4/dir5/dir6/file.txt.
I have a million such txt files. Based on different parameters, PHP will read a txt file and display it as part of the webpage. I'm testing the PHP program on Windows 7 Pro right now. When a file's absolute path is short, e.g., filerootdir/dir1/file.txt, it loads pretty fast. But when the absolute path is long, it is VERY slow. I'm wondering if there is a better solution for this problem.
I'm testing my program under WAMP on Windows, but it will eventually be moved to LAMP. Will the file-loading code run faster on Linux servers? Could this be a problem with the Windows operating system?
The code I'm using looks like the following:
if (file_exists($filePath . ".html")) {
    $code = file_get_contents($filePath . ".html");
}
Thanks very much!

You might consider storing the data in a database - with this number of records, especially if they are small files, a database will probably be more efficient. Before you do, read up on indexes - with the right index a database can pull the correct record out of billions in a tiny fraction of a second.
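As a rough illustration of that point, here is a minimal PDO/MySQL sketch; the table name, columns, and credentials are assumptions, not anything from the question, and $filePath mirrors the variable in the snippet above:

<?php
// Sketch: store each document's path and contents in an indexed table (assumed schema).
$pdo = new PDO('mysql:host=localhost;dbname=docs;charset=utf8mb4', 'user', 'pass');

// One-time setup: the unique key on path is what makes single-record lookups fast.
$pdo->exec("CREATE TABLE IF NOT EXISTS documents (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    path VARCHAR(191) NOT NULL,   -- 191 keeps the unique index inside older utf8mb4 key limits
    body MEDIUMTEXT,
    UNIQUE KEY idx_path (path)
)");

// Look the page up by path instead of hitting the filesystem.
$stmt = $pdo->prepare("SELECT body FROM documents WHERE path = ?");
$stmt->execute(array($filePath . ".html"));
$code = $stmt->fetchColumn();     // false when not found, much like a failed file_exists()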

Related

Import path of folders and subfolders into a faster database engine

I created a proxy/crawler a while ago and it ended up logging a lot of files. I thought this would be a simple and acceptable solution to begin with, but I ran into more and more problems once it got close to 1,000,000 files. Searching the database can take up to 15 seconds, and I have experienced the server crashing twice in the last week. I tested by restarting apache2, searching for "test", and repeatedly running the "free -m" command in a terminal. I noticed RAM usage went up immediately, and it's probably the RAM that causes the crashes. I'm not sure what makes a search engine fast, but I would really like to know.
All files are stored under:
database/*/*/*.txt
And I use this code to go through them all:
$files = array();
$dir = '/var/www/html/database';

foreach (glob($dir . '/*/*/*.txt', GLOB_NOCHECK) as $path) {
    $title = basename($path, ".txt");
    if (strripos($title, $search) !== false) {
        array_push($files, $path);
    }
}
The code is much longer, but I just wanted to show the basics of how it works.
Each file contains about 6 lines of useful info.
So I started looking for a solution and thought: what if I hand the search off to something that can search faster than PHP, like Java or C? Ah, that would be a mess. So I thought about MySQL. But how would I transfer all the files from the folders and subfolders into MySQL? The server is running Debian, with 4 GB of RAM and an i3 processor.
I haven't taken any action yet because MySQL was confusing and I haven't found any other solution. What should I do?
This question is asking for too much. It's not just click and go. I thought more people had problems like this, but then I realized that everyone is using premade search engines.
I ended up downloading the whole database to my Windows computer and writing a program in C# that automatically goes through all the files, reads their content, and POSTs it to an Elasticsearch database which I installed on the Debian server. I probably should have created a file-to-file converter instead of going straight from file to POST request.
The drawback of doing it this way is that the speed is not very high; it took 2 hours to transfer 700,000 files to the database.
The program will not be released publicly because of specific strings I used in the files. So this was much harder than I expected.
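For what it's worth, the same transfer could also be scripted in PHP against Elasticsearch's bulk API, which is usually much faster than one POST per file. This is only a sketch; the index name, field names, and host are assumptions, and the exact bulk format varies a little between Elasticsearch versions:

<?php
// Sketch: bulk-index the database/*/*/*.txt files into an assumed "pages" index.
$es   = 'http://localhost:9200';
$bulk = '';
$sent = 0;

foreach (glob('/var/www/html/database/*/*/*.txt') as $path) {
    $doc = array(
        'title' => basename($path, '.txt'),
        'body'  => file_get_contents($path),
    );
    // Bulk API format: one action line plus one document line, newline-delimited JSON.
    $bulk .= json_encode(array('index' => array('_index' => 'pages'))) . "\n";
    $bulk .= json_encode($doc) . "\n";

    // Flush every 1000 documents so the request body stays a reasonable size.
    if (++$sent % 1000 === 0) {
        postBulk($es, $bulk);
        $bulk = '';
    }
}
if ($bulk !== '') {
    postBulk($es, $bulk);
}

function postBulk($es, $body) {
    $ch = curl_init($es . '/_bulk');
    curl_setopt_array($ch, array(
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $body,
        CURLOPT_HTTPHEADER     => array('Content-Type: application/x-ndjson'),
        CURLOPT_RETURNTRANSFER => true,
    ));
    curl_exec($ch);
    curl_close($ch);
}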
C# application result: (screenshot omitted)

How would I be able to search a massive text file? (20+GB)

I'd like to be able to search a 20GB+ .txt file on Windows 7. I have ~20GB of free RAM for a program to run (less RAM than the size of the txt file), and a software RAID 0 between 2 laptop hard drives that yields about 100MB/s R/W speeds. The end goal is to have a web interface where you could put in a search query and it would tell you at what position in the text file the query was found. There is a web server already running on this computer (WAMP). Any help would be greatly appreciated.
There is no reason to re-invent searching. We have nice databases for this task these days. Even if you must start with a source text file of this size, I would load it into a database for searching. Once loaded and indexed, you have a very efficient rig.
Now, you didn't specify what you're searching for or how, so it's hard to give specific advice. What I can tell you is that I've had great luck with text searches in Solr. It's built for tasks like these.
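As a rough sketch of the "load it into a database" step: stream the file line by line, record each line's byte offset with ftell(), and insert into a table with a FULLTEXT index (InnoDB FULLTEXT needs MySQL 5.6+). The table name, columns, and connection details below are assumptions:

<?php
// Sketch: import a huge text file into MySQL, keeping each line's byte offset for later lookup.
$pdo = new PDO('mysql:host=localhost;dbname=bigtext;charset=utf8mb4', 'user', 'pass');
$pdo->exec("CREATE TABLE IF NOT EXISTS file_lines (
    id          BIGINT AUTO_INCREMENT PRIMARY KEY,
    byte_offset BIGINT NOT NULL,         -- position of the line in the original file
    body        TEXT,
    FULLTEXT KEY ft_body (body)
) ENGINE=InnoDB");

$in   = fopen('/path/to/huge.txt', 'rb');
$stmt = $pdo->prepare("INSERT INTO file_lines (byte_offset, body) VALUES (?, ?)");

$pdo->beginTransaction();
$n = 0;
while (($pos = ftell($in)) !== false && ($line = fgets($in)) !== false) {
    $stmt->execute(array($pos, rtrim($line, "\r\n")));
    if (++$n % 10000 === 0) {            // commit in batches so the import stays fast
        $pdo->commit();
        $pdo->beginTransaction();
    }
}
$pdo->commit();
fclose($in);

// Searching later:
//   SELECT byte_offset FROM file_lines WHERE MATCH(body) AGAINST('your query');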

Will storing too many files in one folder make an HTTP request for one of them slow?

I've got nearly a million images for my site and they are stored in one folder on my Windows server.
Since opening this folder directly on the desktop drives me and my CPU crazy, I am wondering whether fetching one of them with my PHP script for an HTTP request is also laborious. So, will separating them into different folders improve performance?
No, the performance does not depend on the number of files that are in a directory. The reason opening the folder in Windows Explorer is slow is that it has to render icons and various other GUI-related things for each file.
When the web server fetches a file, it doesn't need to do that. It just (more or less) directly goes to the location of the file on the disk.
EDIT: Millions is kind of pushing the limits of your file system (I assume NTFS in your case). It appears that anything over 10,000 files in a directory starts to degrade your performance. So not only from a performance standpoint, but from an organizational standpoint as well, you may want to consider separating them into subdirectories.
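If you do split them up, a common approach is to shard by a hash prefix of the file name so that no single directory gets huge. A rough sketch (the directory layout and names are assumptions):

<?php
// Sketch: map an image name to a two-level directory based on the first
// hex characters of its md5, e.g. images/<ab>/<cd>/cat.jpg.
function shardedPath($baseDir, $fileName) {
    $hash = md5($fileName);
    $dir  = $baseDir . '/' . substr($hash, 0, 2) . '/' . substr($hash, 2, 2);
    if (!is_dir($dir)) {
        mkdir($dir, 0755, true);   // create both levels if needed
    }
    return $dir . '/' . $fileName;
}

// Usage: move an existing flat file into its sharded location.
$old = 'images/cat.jpg';
$new = shardedPath('images', basename($old));
rename($old, $new);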
Often the best answer in a case like this is to benchmark it. It shouldn't be too hard to create a program that opens 1000 hard-coded file names and closes them. Run the test on your million-plus directory and another directory containing only those 1000 files being tested and see if there's a difference.
Not only does the underlying file system make a difference, but accessing over a network can affect the results too.
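A minimal benchmark along those lines might look like the sketch below; the directory paths are placeholders, and it uses glob() for brevity where the answer above suggests hard-coded file names:

<?php
// Sketch: time opening and closing up to 1000 files in a given directory.
function benchOpenClose($dir, $limit = 1000) {
    $files = array_slice(glob($dir . '/*') ?: array(), 0, $limit);
    $start = microtime(true);          // the glob() above is deliberately not timed
    foreach ($files as $file) {
        $fh = fopen($file, 'rb');
        if ($fh !== false) {
            fclose($fh);
        }
    }
    return microtime(true) - $start;
}

printf("huge dir:  %.4f s\n", benchOpenClose('/path/to/million-file-dir'));
printf("small dir: %.4f s\n", benchOpenClose('/path/to/1000-file-dir'));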
Separating your files into separate directories will most likely help performance. But as mark suggests, it's probably worth benchmarking.

PHP script: How big is too big?

I'm developing a webapp in PHP, and the core library is 94kb in size at this point. While I think I'm safe for now, how big is too big? Is there a point where the script's size becomes an issue, and if so can this be ameliorated by splitting the script into multiple libraries?
I'm using PHP 5.3 and Ubuntu 10.04 32bit in my server environment, if that makes any difference.
I've googled the issue, and everything I can find pertains to PHP upload size only.
Thanks!
Edit: To clarify, the 94kb file is a single file that contains all my data access and business logic, and a small amount of UI code that I have yet to extract to its own file.
Do you mean you have 1 file that is 94KB in size, or that your whole library is 94KB in total?
Regardless, as long as you aren't piling everything into one file and you're organizing your library into different files, your file sizes should remain manageable.
If a single PHP file is starting to hit a few hundred KB, you have to think about why that file is getting so big and refactor the code to make sure that everything is logically organized.
I've used PHP applications that probably included several megabytes worth of code; the main thing if you have big programs is to use a code caching tool such as APC on your production server. That will cache the compiled (to byte code) PHP code so that it doesn't have to process every file for every page request and will dramatically speed up your code.
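For reference, turning APC on is mostly a php.ini change; a minimal sketch, with values that are only illustrative assumptions:

; php.ini - minimal APC opcode-cache setup (values are illustrative)
extension=apc.so     ; load the APC extension
apc.enabled=1        ; turn the opcode cache on
apc.shm_size=64M     ; shared memory for cached bytecode; size it to fit your codebase
apc.stat=1           ; set to 0 in production if files never change between deploys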

Web Speed: is it worth it to put every included file into just one?

I am doing some tests (LAMP):
Basically I have 2 versions of my custom framework.
A normal version that includes ~20 files.
A lite version that has everything inside one single big file.
Using the lite version more and more, I am seeing a decrease in load time, i.e., from 0.01s for the normal version to 0.005s for the lite version.
Let's consider just the "include" part. I always thought PHP would store the included .php files in memory so the file system doesn't have to retrieve them at every request.
Do you think condensing all the classes/functions into one big file is worth the "chaos"?
Or is there a setting to tell PHP to keep the required PHP files in memory?
Thanks
(php5.3.x, apache2.x, debian 6 on a dedicated server)
Don't cripple your development by mushing everything up in one file.
A speed up of 5ms is nothing compared to the pain you will feel maintaining such a beast.
To put it another way, a single incorrect index in your database can give you orders of magnitude more slowdown.
Your page would load faster using the "normal" version and omitting one 2kb image.
Don't do it, really just don't.
Or you can do this:
Leave the code as it is (located in many different files)
Combine them into one file when you are ready to upload it to the production server
Here's what I use:
cat js/* > all.js
yuicompressor all.js -o all.min.js
First I combine them into a single file and then I minify them with the YUI Compressor.
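Applied to PHP sources instead of JS, a combine step could do the same kind of concatenation. A rough sketch, assuming every source file starts with a plain <?php tag and has no closing tag (paths and names are made up):

<?php
// build.php - naive "combine for deployment" sketch.
// Concatenates the framework's PHP files into a single include,
// stripping the opening <?php tag from every file so there is only one.

$sources = glob(__DIR__ . '/lib/*.php');   // assumed location of the ~20 framework files
sort($sources);                            // deterministic order

$output = "<?php\n";
foreach ($sources as $file) {
    $code = file_get_contents($file);
    $code = preg_replace('/^<\?php\s*/', '', $code, 1);   // drop the leading <?php
    $output .= "\n// ---- " . basename($file) . " ----\n" . $code . "\n";
}

file_put_contents(__DIR__ . '/framework.lite.php', $output);
echo "Wrote framework.lite.php\n";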
