Hi guys, I've got a small problem. I have a site that is displayed on handheld computers. Those computers are SLOW: even though they cost four figures, they run Windows CE5/CE6 on 300-800 MHz CPUs.
Those handhelds are running a PHP-based database application. We already minimized the JavaScript to speed it up, but now the raw HTML just takes too long to display. Sometimes only 1-10 database records are displayed, which isn't much of a problem. But around Christmas our client has much more to do, so we end up with 100+ records.
I'm already trying to minimize the HTML per record (shorter class/id names etc.). It doesn't do much per record, but it adds up over 100+ records.
I wonder if someone has other ideas. My own ideas would be to display only a fixed number of records and implement paging, or to load the data via AJAX requests after the page has rendered. Does anyone have better ideas? At the moment it takes up to 5-10 seconds for the page to display, and with 100+ records to work through and 20-30 workers, that adds up, so our client isn't very happy with the situation.
Is it definitely not the queries taking the time to load? Can you try loading it in a proper browser on a PC? Try using the Chrome developer tools to find out exactly which bits are taking time to load, and exactly what is using the most memory etc.
Related
I am doing a pagination system for about 100 items.
My question is:
Should I just load all 100 of them and then use jQuery to switch pages without reloading? Or should I use a MySQL query with "LIMIT 5" and then, each time the user presses Next Page or Previous Page, run another MySQL query with LIMIT 5?
For every item, I would have to load a thumbnail picture but I could keep it in the cache to avoid using my server bandwidth.
Which one is the best option from a server resource perspective?
Thanks in advance. Regards
Try connecting directly to your MySQL instance via the command line interface. Execute the query for all 100 rows at a time, and then with LIMIT 5. Look at the timings reported for each. This will tell you which is more efficient or less resource-demanding.
Fetching 100 records at a time from MySQL (depending on the dataset) really is nothing. The performance hit wouldn't be noticeable for a properly written query and database schema.
That said, I vote for fetching only the results you need at a time. Use the LIMIT clause and your jQuery pagination method to make it efficient.
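As a rough sketch of the LIMIT approach, the following uses Python's built-in sqlite3 module as a stand-in for MySQL. The `items` table, its columns, and the helper names are made up for illustration; the same LIMIT/OFFSET arithmetic applies to a MySQL query.

```python
import sqlite3


def make_demo_db(n=100):
    """Create an in-memory table with n hypothetical items."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO items (id, title) VALUES (?, ?)",
        [(i, f"item {i}") for i in range(1, n + 1)],
    )
    return conn


def fetch_page(conn, page, per_page=5):
    """Return one page of rows; page numbers start at 1."""
    offset = (page - 1) * per_page
    cur = conn.execute(
        "SELECT id, title FROM items ORDER BY id LIMIT ? OFFSET ?",
        (per_page, offset),
    )
    return cur.fetchall()
```

An explicit ORDER BY matters here: without it the database is free to return rows in any order, so consecutive pages could overlap or skip items.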
For the server, the most efficient way would be to grab all 100 items once, send them to the client once, and have the client page through them locally. That's one possibly expensive query, but that's cheaper overall than having the client go back and forth for each additional five items.
Having said that, whether that's feasible is a different topic. You do not want to be pushing a huge amount of data to your client at once, since it'll slow down page loads and client-side processing. In fact, it's usually desirable to keep the bandwidth consumed by the client to a minimum. From that POV, making small AJAX requests with five results at a time when and only when necessary is much preferable. Unless even 100 results are so small overall that it doesn't make much of a difference.
You'll need to figure out which one works best for you.
Depends significantly on your query. If it is a simple SELECT from a well-designed table (indexes set etc.), then unless you're running on a very underpowered server, there will be no noticeable difference between requesting 100 rows and 5 rows. If it is a complicated query, then you should probably limit the number of queries.
Another consideration is how long it takes to load a page, as in the actual round-trip time for the client to receive the data from the server. I'm going to make the wild guess that you are in America or Europe, where internet speeds are nice and fast; not the entire world is that lucky. The number of times your site has to request data from the server is a much better metric to minimize than how much load your server has.
This is moving rapidly into UX here, but your users don't care about your server load, they don't care if this way means your load average is 0.01 instead of 0.02. They will care if you have almost instantaneous transitions between sections of your site.
Personally, I'd go with the "load all data, then page locally" method. Also remember that Ajax is your friend: if you have to, load the results page first, then request the data. You can split the request into two: the first page and the rest of the pages. There are a lot of behind-the-scenes tweaks you can do to make your site seem incredibly fast, and that is something people notice.
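The "first page, then the rest" split can be sketched in a few lines. This is only an illustration in Python of how the server side might cut up an already-fetched result list; the function names are hypothetical and the records are assumed to be a plain list.

```python
def split_first_page(records, per_page=5):
    """Split records into the first page (sent immediately) and the rest (sent later)."""
    return records[:per_page], records[per_page:]


def paginate(records, per_page=5):
    """Cut records into consecutive pages for local paging on the client."""
    return [records[i:i + per_page] for i in range(0, len(records), per_page)]
```

The first part would go out with the initial page load, and the second part would answer a follow-up Ajax request, after which the client can page locally without further round trips.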
I'd say, load 5 at a time and paginate. My considerations:
It is indeed much lighter to load 5 at a time
Not all of your users will navigate through all 100, so those loaded might not even be used
A slight load time between pages of 5 records is to be expected (i.e. most users won't complain just because they have to wait 500ms - 1s)
You can also give users the option to display x items per page, including an "all" option that lets them see every item on one page. Over time, you can monitor which items-per-page setting most of your users prefer and use that as the default LIMIT
Mostly I find the answers to my questions on Google, but now I'm stuck.
I'm working on a scraper script, which first scrapes some usernames from a website, then gets every single detail of each user. There are two scrapers involved. The first goes through the main page, gets the first name, then gets the details from its profile page, then moves on to the next name...
The first site I'm scraping has a total of 64 names, displayed on one main page, while the second one has 4 pages with over 365 names displayed.
The first one works great, however the second one keeps giving me a 500 internal error. I've tried limiting the script to scrape only a few names, which works like a charm, so I'm more than sure that the script itself is OK!
The max_execution_time in my php.ini file is set to 1500, so I guess that's not the problem either, however there is something causing the error...
I'm not sure if adding a sleep command after every 10 names, for example, will solve my situation, but well, I'm trying that now!
So if any of you have any idea what would help solve this situation, I would appreciate your help!
thanks in advance,
z
Support said I can raise the memory up to 4 gigabytes
Typical money gouging support answer. Save your cash & write better code because what you are doing could easily be run from the shared server of a free web hosting provider even with their draconian resource limits.
Get/update the list of users first as one job then extract the details in smaller batches as another. Use the SQL BULK Insert command to reduce connections to the database. It also runs much faster than looping through individual INSERTS.
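A minimal sketch of the batching idea, using Python's sqlite3 as a stand-in database. Note that `BULK INSERT` as a statement belongs to SQL Server; in MySQL the usual equivalent is a single multi-row `INSERT ... VALUES (...), (...)`, which is what this sketch builds. The `users` table and its columns are hypothetical.

```python
import sqlite3


def insert_batch(conn, rows):
    """Insert all (name, details) rows with one multi-row INSERT statement."""
    if not rows:
        return 0
    # One "(?, ?)" group per row, so the whole batch is a single statement.
    placeholders = ", ".join(["(?, ?)"] * len(rows))
    flat = [value for row in rows for value in row]
    with conn:  # commit once for the whole batch
        conn.execute(
            f"INSERT INTO users (name, details) VALUES {placeholders}", flat
        )
    return len(rows)
```

Databases cap the number of bound parameters and the statement size, so in practice the batch would be kept to a moderate size (a few hundred rows, say) rather than sending everything at once.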
Usernames and details are essentially a static list, so there is no rush to get all the data in real time. Just nibble away with a cron job fetching the details; eventually the script will catch up with new usernames being added to the incoming list, and you end up with a faster, leaner, more efficient system.
This is definitely a memory issue. One of your variables is growing past the memory limit you have defined in php.ini. If you do need to store a huge amount of data, I'd recommend writing your results to a file and/or DB at regular intervals (and then free up your vars) instead of storing them all in memory at run time.
get user details
dump to file
clear vars
repeat..
If you set your execution time to infinity and regularly dump the vars to file/db your php script should run fine for hours.
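The get/dump/clear/repeat loop above could look roughly like this. It is a sketch in Python rather than PHP, and `fetch_details` is a hypothetical callback standing in for the actual profile scraper; the output format (one JSON object per line) is also just an assumption.

```python
import json


def flush(buffer, out):
    """Write buffered records to the file, then empty the buffer to free memory."""
    count = len(buffer)
    for record in buffer:
        out.write(json.dumps(record) + "\n")
    buffer.clear()
    return count


def scrape_in_batches(usernames, fetch_details, out_path, batch_size=10):
    """Fetch details one user at a time, dumping to disk every batch_size records."""
    buffer = []
    written = 0
    with open(out_path, "a", encoding="utf-8") as out:
        for name in usernames:
            buffer.append(fetch_details(name))
            if len(buffer) >= batch_size:
                written += flush(buffer, out)
        written += flush(buffer, out)  # write whatever is left over
    return written
```

Because the buffer never holds more than `batch_size` records, memory use stays flat no matter how many names are scraped.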
I have a table of more than 15000 feeds and it's expected to grow. What I am trying to do is to fetch new articles using simplepie, synchronously and storing them in a DB.
Now I have run into a problem: since the number of feeds is high, my server stops responding and I am not able to fetch feeds any longer. I have also implemented some caching, and I fetch odd and even feeds at different time intervals.
What I want to know is whether there is any way of improving this process. Maybe fetching feeds in parallel? Or maybe someone can give me a pseudo-algorithm for it.
15,000 Feeds? You must be mad!
Anyway, a few ideas:
Increase the Script Execution time-limit - set_time_limit()
Don't go overboard, but ensuring you have a decent amount of time to work in is a start.
Track Last Check against Feed URLs
Maybe add a field for each feed, last_check and have that field set to the date/time of the last successful pull for that feed.
Process Smaller Batches
Better to run smaller batches more often. Think of it as being the PHP equivalent of "all of your eggs in more than one basket". With the last_check field above, it would be easy to identify those with the longest period since the last update, and also set a threshold for how often to process them.
Run More Often
Set a cronjob and process, say 100 records every 2 minutes or something like that.
Log and Review your Performance
Have logfiles and record stats. How many records were processed, how long was it since they were last processed, how long did the script take. These metrics will allow you to tweak the batch sizes, cronjob settings, time-limits, etc. to ensure that the maximum checks are performed in a stable fashion.
Setting all this up may sound like a lot of work compared to a single process, but it will allow you to handle increased user volumes, and would form a strong foundation for any further maintenance tasks you might be looking at down the track.
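To make the last_check and small-batch ideas concrete, here is a rough sketch using Python's sqlite3 as a stand-in for MySQL. The `feeds` table layout is hypothetical, and `last_check` is assumed to be stored as an integer timestamp (0 for feeds that were never pulled).

```python
import sqlite3


def next_batch(conn, batch_size=100):
    """Return the feeds that have waited longest since their last successful pull."""
    cur = conn.execute(
        "SELECT id, url FROM feeds ORDER BY last_check ASC LIMIT ?",
        (batch_size,),
    )
    return cur.fetchall()


def mark_checked(conn, feed_id, now):
    """Record a successful pull so the feed moves to the back of the queue."""
    with conn:
        conn.execute(
            "UPDATE feeds SET last_check = ? WHERE id = ?", (now, feed_id)
        )
```

A cron job would call `next_batch`, fetch each feed, and call `mark_checked` only on success, so failed feeds stay near the front and get retried on the next run.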
fetch new articles using simplepie, synchronously
What do you mean by "synchronously"? Do you mean consecutively in the same process? If so, this is a very dumb approach.
You need a way of sharding the data to run across multiple processes. Doing this declaratively based on, say the modulus of the feed id, or the hash of the URL is not a good solution - one slow URL would cause multiple feeds to be held up.
A better solution would be to start up multiple threads/processes which would each:
1. lock the list of URL feeds
2. identify the feed with the oldest expiry date in the past which is not flagged as reserved
3. flag this record as reserved
4. unlock the list of URL feeds
5. fetch the feed and store it
6. remove the reserved flag on the list for this feed and update the expiry time
Note that if there are no expired records at step 2, the table should be unlocked. The next step then depends on how you run the threads: as daemons, each should implement an exponential back-off (e.g. sleeping for 10 seconds, doubling up to 320 seconds over consecutive iterations); as batches, each should simply exit.
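The reserve/fetch/release cycle and the back-off can be sketched as follows. This uses Python's sqlite3 as a stand-in, where `BEGIN IMMEDIATE` plays the role of the lock; with MySQL/InnoDB the same step would more likely be a transaction with `SELECT ... FOR UPDATE`. The table layout, column names, and function names are all hypothetical.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the feed list; isolation_level=None lets us manage transactions by hand."""
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feeds ("
        "id INTEGER PRIMARY KEY, url TEXT, "
        "expires_at INTEGER, reserved INTEGER DEFAULT 0)"
    )
    return conn


def claim_next_feed(conn, now):
    """Reserve the most overdue unreserved feed; return its id, or None if none expired."""
    conn.execute("BEGIN IMMEDIATE")  # lock the list against other workers
    try:
        row = conn.execute(
            "SELECT id FROM feeds WHERE reserved = 0 AND expires_at <= ? "
            "ORDER BY expires_at ASC LIMIT 1",
            (now,),
        ).fetchone()
        if row is not None:
            conn.execute("UPDATE feeds SET reserved = 1 WHERE id = ?", (row[0],))
        conn.execute("COMMIT")  # unlock, whether or not a feed was found
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return None if row is None else row[0]


def release_feed(conn, feed_id, next_expiry):
    """Clear the reserved flag and set the feed's next expiry time."""
    conn.execute(
        "UPDATE feeds SET reserved = 0, expires_at = ? WHERE id = ?",
        (next_expiry, feed_id),
    )


def backoff_delay(idle_rounds, base=10, cap=320):
    """Seconds to sleep after idle_rounds consecutive empty polls: 10, 20, ... capped at 320."""
    return min(base * 2 ** idle_rounds, cap)
```

A daemon worker would loop on `claim_next_feed`, sleeping for `backoff_delay(n)` after the n-th consecutive `None` and resetting `n` to zero once it gets work. A crashed worker leaves its feed flagged as reserved, so a real system would also need some timeout to clear stale reservations.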
Thank You for your responses. I apologize I am replying a little late. I got busy with this problem and later I forgot about this post.
I have been researching this a lot and faced a lot of problems. You see, 15,000 feeds every day is not easy.
Maybe I am MAD! :) But I did solve it.
How?
I wrote my own algorithm. And YES! It's written in PHP/MYSQL. I basically implemented a simple weighted machine learning algorithm. My algorithm basically learns the posting time about a feed and then estimates the next polling time for the feed. I save it in my DB.
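The post does not describe the algorithm itself, so the following is only one plausible reading of "learns the posting time and estimates the next polling time", not the author's method: an exponentially weighted average of the gaps between posts, sketched in Python. The function names and the weight of 0.3 are assumptions.

```python
def update_interval(avg_interval, observed_interval, weight=0.3):
    """Blend the newest gap between posts into the running average gap (in seconds)."""
    if avg_interval is None:  # first observation for this feed
        return float(observed_interval)
    return weight * observed_interval + (1 - weight) * avg_interval


def next_poll_time(last_post_time, avg_interval):
    """Estimate when the feed will post next, i.e. when to poll it again."""
    return last_post_time + avg_interval
```

Under this reading, a feed that posts hourly gets polled roughly hourly and a feed that posts weekly roughly weekly, which is where the saving over polling every feed on the same schedule would come from.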
And since it's a learning algorithm, it improves with time. Of course, there are 'misses', but these misses are at least better than crashing servers. :)
I have also written a paper on this, which got published in a local computer science journal.
Also, regarding the performance gain, I am getting a 500% to 700% improvement in speed as opposed to sequential polling.
How is it going so far?
I have a DB that has grown to terabytes in size. I am using MySQL. Yes, I am facing performance issues on MySQL, but not many. Most probably I will be moving to some other DB or implementing sharding on my existing DB.
Why I chose PHP?
Simple, because I wanted to show people that PHP and MySQL are capable of such things! :)
Hey,
I currently have over 300+ qps on my mysql. There are roughly 12000 UIP a day / no cron on fairly heavy PHP websites. I know it's pretty hard to judge whether it is OK without seeing the website, but do you think that it is a total overkill?
What is your experience? If I optimize the scripts, do you think that I would be able to get substantially lower qps? I mean, if I get to 200 qps that won't help me much. Thanks
currently have over 300+ qps on my mysql
Your website can run on a Via C3, good for you !
do you think that it is a total overkill?
That depends if it's
1 page/s doing 300 queries, yeah you got a problem.
30-60 pages/s doing 5-10 queries each, then you got no problem.
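The relation behind those two cases is simply qps = pages per second x queries per page. A tiny Python sketch makes the arithmetic explicit (the function names are made up):

```python
def queries_per_second(pages_per_second, queries_per_page):
    """Total database load produced by a given page rate."""
    return pages_per_second * queries_per_page


def queries_per_page(qps, pages_per_second):
    """Work backwards from a measured qps to the average queries each page issues."""
    return qps / pages_per_second
```

For example, 1 page/s at 300 queries each and 30 pages/s at 10 queries each both come to 300 qps, which is why the qps figure alone says little.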
12000 UIP a day
We had a site with 50,000-60,000, and it ran on a Via C3 (your toaster is a datacenter compared to that crap server), but the torrent tracker used about 50% of the CPU, so only half of that tiny CPU was available to the website, which never seemed to use any significant fraction of it anyway.
What is your experience?
If you want to know whether you are going to kill your server, or whether your website is optimized, the following has close to zero information content:
UIP (unless you get facebook-like numbers)
queries/s (unless you're above 10,000) (I've seen a cheap dual core blast 20,000 qps using Postgres)
But the following is extremely important :
dynamic pages/second served
number of queries per page
time duration of each query (ALL OF THEM)
server architecture
vmstat, iostat outputs
database logs
webserver logs
database's own slow_query, lock, and IO logs and statistics
You're not focusing on the right metric...
I think you are missing the point here. Whether 300+ qps is too much depends heavily on the website itself, on the number of users per second visiting it, on the background scripts running concurrently, and so on. You should be able to test and/or compute an average query throughput for your server, to understand whether 300+ qps is fair or not. And, by the way, it depends on what these queries are asking for (a couple of fields, or a large amount of binary data?).
Surely, if you optimize the scripts and/or reduce the number of queries, you can lower the load on the database, but without having specific data we cannot properly answer your question. To lower a 300+ qps load to under 200 qps, you should on average lower your total queries by at least 1/3rd.
Optimizing a script can do wonders. I've taken scripts that took 3 minutes before to .5 seconds after simply by optimizing how the calls were made to the server. That is an extreme situation, of course. I would focus mainly on minimizing the number of queries by combining them if possible. Maybe get creative with your queries to include more information in each hit.
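As an illustration of combining queries, the sketch below contrasts one query per record with a single `IN (...)` query that returns the same data. It uses Python's sqlite3 as a stand-in for MySQL, and the `users` table and function names are hypothetical.

```python
import sqlite3


def names_one_by_one(conn, user_ids):
    """One query per id: len(user_ids) round trips to the database."""
    return {
        uid: conn.execute(
            "SELECT name FROM users WHERE id = ?", (uid,)
        ).fetchone()[0]
        for uid in user_ids
    }


def names_combined(conn, user_ids):
    """A single query with IN (...): one round trip for the whole list."""
    if not user_ids:
        return {}
    placeholders = ", ".join("?" * len(user_ids))
    rows = conn.execute(
        f"SELECT id, name FROM users WHERE id IN ({placeholders})",
        list(user_ids),
    ).fetchall()
    return dict(rows)
```

Both functions return the same mapping for ids that exist, but a page listing 50 users would issue 50 queries with the first and 1 with the second.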
And going from 300 to 200 qps is actually a huge improvement. That's a 33% drop in traffic to your server... that's significant.
You should not focus on the script, focus on the server.
You are not saying whether these 300+ queries are causing issues. If your server is not dead, there is no reason to lower the amount. And if you have already done optimization, you should focus on the server. Upgrade it or buy more servers.
I need one that does not have a big queue. So, if it does not have it on its cache, it would generate and deliver it reasonably fast (1-4 seconds).
I do have a list of services, what I am asking is if you have experience with any such service that meets the above criteria that you could recommend.
Thank you
Edit: To clarify my intent, I need a thumbnail of any given website, approximately 200-300 px wide. I need it fast because it is to be displayed while my script calculates various stuff in the background. (All this is done and working. I am using a free service for testing, but it has a huge queue, and most of the time the frontend redirects to the results before the screenshot appears. My code usually takes about 10 seconds before it redirects to the results.)
I think you can do this on your own by starting an X server and taking screenshots with the "import" command. I'm pretty sure it's going to be faster than any external service.
It would be impossible to guarantee 1-4 seconds for any webpage; it will always depend on how fast the page and all linked resources can be delivered. If you can load and see the page in a browser in 1-4 seconds, it is possible.
I'm about to launch a webpage screenshot service that takes at most 300-500 ms to capture a page once it has received the required resources. For instance, you can get a thumbnail of Google in under a second. But newspaper sites, for instance, tend to be slow, so loading them can take up to 10 seconds.