I have been using a robot I coded for quite a while now, and it works pretty well.
It goes to a domain, parses the index page, saves all internal links in a session variable (let's call it array1), then refreshes itself, moves the index URL to another session variable (array2) and deletes it from array1, parses the next page in array1, checks whether the new internal links it finds are already in either of the two arrays, and if not saves them in array1, and so on. It basically builds a list of all pages within a domain.
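For reference, the loop is essentially the following (a simplified sketch; extract_internal_links() stands in for my parsing code and the seed URL is a placeholder):

    <?php
    session_start();
    // array1 = pages still to crawl, array2 = pages already crawled
    $_SESSION['array1'] = $_SESSION['array1'] ?? ['http://example.com/'];
    $_SESSION['array2'] = $_SESSION['array2'] ?? [];

    if (!empty($_SESSION['array1'])) {
        $page = array_shift($_SESSION['array1']);   // take the next page out of array1
        $_SESSION['array2'][] = $page;              // ...and remember it in array2

        foreach (extract_internal_links($page) as $link) {   // placeholder parser
            if (!in_array($link, $_SESSION['array1'], true)
                && !in_array($link, $_SESSION['array2'], true)) {
                $_SESSION['array1'][] = $link;      // new link: queue it for crawling
            }
        }
        // the page then refreshes itself and repeats until array1 is empty
    }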
This bot has been crawling some pretty big websites (20k+ pages) and it did fine; it just took some time, but that does not bother me. Now I want it to crawl even bigger websites (200k+ pages), and I would like your opinion on the best way to handle the data.
Should I carry on with sessions? I know sessions use disk space, and even though my Raspberry Pi is quite stable, will it be able to handle 20 MB+ variables?
Or should I keep all the URLs in a SQL table? In that case, how well can SQL handle this amount of data?
Thank you.
(I have searched the web, but nothing seems to be really close to what I'm experiencing.)
TL;DR: I bet you will run into completely different issues (storage space, computational power, OS limitations) before running into MySQL-specific limitations.
MySQL limits:
Per field by type (http://www.electrictoolbox.com/maximum-length-mysql-text-field-types/):
TINYTEXT: 256 bytes
TEXT: 64 KB
MEDIUMTEXT: 16 MB
LONGTEXT: 4 GB
Per table by platform (http://dev.mysql.com/doc/refman/5.0/en/table-size-limit.html):
Win32 FAT/FAT32: 2 GB / 4 GB
Win32 NTFS: 2 TB (possibly larger)
Linux 2.2 x86: 2 GB (LFS: 4 GB)
Linux 2.4+ ext3: 4 TB
Solaris 9/10: 16 TB
MacOS X HFS+: 2 TB
NetWare NSS: 8 TB
Databases can certainly handle much more data than sessions; they are designed for that. Databases several GB in size are nothing unusual for MySQL, PostgreSQL or MS SQL Server.
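If you decide to move the URL list out of the session, something like the following would replace the two arrays (a minimal sketch assuming MySQL via PDO; the database credentials, table and column names, and the $newLink variable are placeholders):

    <?php
    $db = new PDO('mysql:host=localhost;dbname=crawler;charset=utf8mb4', 'user', 'pass');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    // One row per URL; "status" replaces the array1/array2 split.
    $db->exec("
        CREATE TABLE IF NOT EXISTS crawl_urls (
            id       INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
            url      VARCHAR(2048) NOT NULL,
            url_hash CHAR(40) NOT NULL,  -- sha1(url), keeps the unique index small
            status   ENUM('pending', 'done') NOT NULL DEFAULT 'pending',
            UNIQUE KEY uniq_url (url_hash)
        )");

    // Queue a newly found link; the unique index silently skips duplicates.
    $insert = $db->prepare(
        "INSERT IGNORE INTO crawl_urls (url, url_hash) VALUES (:url, SHA1(:url2))");
    $insert->execute([':url' => $newLink, ':url2' => $newLink]);

    // Fetch the next page to crawl and mark it as done.
    $next = $db->query(
        "SELECT id, url FROM crawl_urls WHERE status = 'pending' LIMIT 1")->fetch();
    if ($next) {
        $db->prepare("UPDATE crawl_urls SET status = 'done' WHERE id = ?")
           ->execute([$next['id']]);
        // ... parse $next['url'] here ...
    }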
I'm just wondering: what is the best practice for storing cached values, in the file system or in memory, in terms of performance?
I don't want to use Redis or any extra software.
I just want to use either a memory cache or a file cache to cache some files for a period of time.
Redis, memcache and memcached are just wrappers or helpers to access memory blocks (so you don't have to map memory blocks manually).
That being said, and to answer your question: it depends on the OS you are using. Assuming you are running Linux, by default when you open a file the kernel's filesystem cache is used, so you can rely on that and just use a file cache. For most applications this is the best option, because it survives memory dumps and system reboots.
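A file cache with a time-to-live can be as simple as this (a minimal sketch; the cache path, the key and the slow build_expensive_report() function are placeholders):

    <?php
    function cache_get(string $key, int $ttl) {
        $path = sys_get_temp_dir() . '/cache_' . md5($key);
        if (is_file($path) && (time() - filemtime($path)) < $ttl) {
            return unserialize(file_get_contents($path));
        }
        return null; // missing or expired
    }

    function cache_set(string $key, $value): void {
        $path = sys_get_temp_dir() . '/cache_' . md5($key);
        // Write atomically: temp file + rename, so readers never see a half-written file.
        $tmp = $path . '.' . uniqid('', true);
        file_put_contents($tmp, serialize($value));
        rename($tmp, $path);
    }

    $data = cache_get('expensive-report', 3600);   // 1 hour TTL
    if ($data === null) {
        $data = build_expensive_report();          // placeholder for the slow work
        cache_set('expensive-report', $data);
    }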
A memory cache is the fastest and the best for concurrency, but it is not to be relied on.
Let's look at it with an example:
your application receives 100 calls/second;
when the request is not cached, it takes 10 seconds to generate/serve the response.
That means you need to be able to keep 1,000 threads open for the 10 seconds the request takes, and on top of that you will be generating the same cache entry 1,000 times, unless you set a flag to let the other processes know that the data is already being generated and that they should just wait (see the lock-file sketch below).
Based on this scenario, you could instead have a process that regenerates that file once a day.
If you use a file cache, you will be safe if the system dumps its memory or reboots, because your file will still exist.
If you use a memory cache, you will be in trouble, because you will have to regenerate the data either on the fly or manually; either way you have a downtime of at least 10 seconds.
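The "flag" mentioned above is usually a lock: only one process regenerates the data while the others wait. A sketch of that idea (the lock path and the regenerate_cache_file() function are placeholders):

    <?php
    $lock = fopen('/tmp/report.lock', 'c');
    if (flock($lock, LOCK_EX | LOCK_NB)) {
        // We got the lock: regenerate the cache, then release it.
        regenerate_cache_file();        // placeholder for the 10-second job
        flock($lock, LOCK_UN);
    } else {
        // Someone else is already generating it: block until they finish,
        // then just read the freshly written cache file.
        flock($lock, LOCK_EX);
        flock($lock, LOCK_UN);
    }
    fclose($lock);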
It's just an example; your flow could be completely different.
Comment if you have any doubts and I'll try to expand. (:
I have approximately 1 million WooCommerce products.
WP Rocket is already enabled with Cloudflare,
with Varnish cache,
and I use 20 active plugins.
Server: Intel Xeon E5, 6 cores, 24 GB RAM, with SSD storage, running cPanel/WHM.
Can you help me optimize my server and WordPress?
If you haven't already done so, you might want to use a tool like http://www.webpagetest.org to identify any problem areas.
From the waterfall view, if the time to first byte is much more than 500 ms, then you want to focus on your server config; see the sketch below for a quick way to measure TTFB yourself.
If the delay is between time to first byte and start render, then examine the results to figure out why.
You'll also see what improvements, if any, can be gained from compressing images and reducing file sizes.
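A quick TTFB check, as referenced above (a sketch using PHP's curl extension; the URL is a placeholder):

    <?php
    $ch = curl_init('https://example.com/');
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
    ]);
    curl_exec($ch);
    // STARTTRANSFER_TIME is roughly the time to first byte, in seconds.
    $ttfb = curl_getinfo($ch, CURLINFO_STARTTRANSFER_TIME);
    curl_close($ch);
    printf("TTFB: %.0f ms\n", $ttfb * 1000);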
Concatenating files almost always gets good results, so make sure you are using the WP Rocket option to concatenate CSS and JS files (Static Files → Combine files).
On large CMS sites I have had good results installing Google's PageSpeed module.
Here's an overview of the filters available with PageSpeed once it's installed.
One of the things I like about PageSpeed is that you have good control over how you configure it, and it works at the server level, so it gives you options which aren't available at the plugin level.
Good luck!
I was told many years ago that using "include" statements in PHP doesn't "cost" anything in performance. But what about when you query the file system, for instance running "filemtime" or "readdir"? If I am performing these with every page request, is that a problem? Thanks!
The reason why include statements "don't cost anything" in performance is that those included files are often cached as well. Semi-compiled versions of PHP scripts can be stored in the APC cache (see: http://php.net/manual/en/book.apc.php).
Apart from that cache, the OS will also cache file access, so subsequent calls to filemtime won't need actual disk access every time. And even if the OS requests information from the hard drive, that drive might have cached the most recent requests as well. So there is caching at multiple levels, all in order to make disk access as fast as possible.
So, for those reasons, calling filemtime many times should not be a big issue either. But if you need to read a lot of different files, the caches might not work optimally and you will have a lot of actual disk I/O. Eventually, if you have many visitors, the file I/O might become a bottleneck. You might be able to solve this by upgrading your hardware; a RAID of SSDs will likely read faster than a single spinning disk.
If performance is still an issue, you might store the file time in a cache yourself, for instance APC or memcache, or even an include file for PHP that contains an array of relevant file information. Of course you need to make sure to update this cache every time you write a file. And make sure to profile every optimization you make: if you don't have APC, an include file probably won't do any good, and requests to memcache have some overhead even though the data itself is in memory. So these solutions are not guaranteed to improve things.
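Caching the filemtime results yourself could look like this (a sketch assuming the APCu extension, the modern successor of the APC cache mentioned above; the key prefix and TTL are arbitrary):

    <?php
    function cached_filemtime(string $path, int $ttl = 60) {
        $key = 'mtime:' . $path;
        $mtime = apcu_fetch($key, $found);
        if (!$found) {
            $mtime = filemtime($path);       // the real filesystem call
            apcu_store($key, $mtime, $ttl);  // kept in shared memory for $ttl seconds
        }
        return $mtime;
    }

    // Repeated calls within the TTL hit shared memory instead of the disk.
    $m = cached_filemtime(__FILE__);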
But as always, don't start implementing such optimizations if you don't need to. Premature optimization... :)
I need to design a mobile application which requires a lot of database queries. "A lot" means the peak can be 1 million queries per second. I don't know which database or which backend to use. On the client side I will be using PhoneGap for Android and iOS, and I will also need a web interface for PC.
My doubts are the following. I am planning to host the system online and use Google Cloud Messaging to push data to users.
Can online hosting handle this much traffic?
I am planning to use PHP as the backend. Or Python?
The software does not need to do a lot of calculation, but it does run a lot of queries.
Also, which database system should I use: MySQL or Google Cloud SQL?
And please tell me about using Hadoop or other technologies like load balancers.
I may be totally wrong about the question itself.
Thank you very much in advance.
From what I understand, if you want to store unstructured data and retrieve it really fast, you should be looking at the NoSQL segment for storage and do a proof of concept with a few of the solutions available on the market. I would suggest giving Aerospike a try, a NoSQL DB which has a track record of easily doing 1 million TPS on a single machine.
Google App Engine could be the answer: it can be programmed in Python or PHP (or Java) and easily supports scaling up to millions of requests per second, as well as scaling down to just a few to save resources (and your money).
It uses its own NoSQL DB; however, there is also the possibility of using a SQL-based backend (not recommended).
I'm currently using Amazon SimpleDB, but judging by the cost, it seems it will be too expensive for what I can afford.
I have one m1.small Amazon EC2 instance which runs the front-end web server very well. In my single SDB domain I have four attributes (two of which I can delete, since they hold data I rarely need now) plus the item name. I perform only getAttributes queries (no selects); basically, the item name is what I use to find data.
Around 20 reads and 8 writes per second occur on it. The box usage is terribly high, which pushes my costs up.
Which would be the best database choice, hosted on a t1.micro instance (since it's the only cheap, low-end 64-bit instance, and the other 64-bit instances are far too expensive)?
Redis, MongoDB, CouchDB, or something else? Would it even be possible to host a database server that can sustain the load I mentioned above on so small an instance?
I have migrated some of my databases from SimpleDB to MongoDB for other reasons.
However, I wanted to continue with the hosted model for the database. So instead of using Amazon SimpleDB I am now using a combination of MongoHQ (mongohq.com) and MongoLab (mongolab.com).
They also have a free tier, not based on traffic but on the size of your database. You will need to analyze the costs based on the amount of data you will be dealing with.
It seems to me that if you are only using two attributes, you should be fine with the free tier for a while (MongoLab.com has a 250 MB limit for the free tier).
Since both of those hosted services can run in Amazon EC2, they are close to your front end; you will not incur bandwidth costs because everything stays inside AWS, and performance benefits from the high-speed internal AWS network.
In terms of performance, I think 20 reads and 8 writes per second is not a big deal, and the server will handle all the cycles needed to support your app.
You can batch all your writes and use unacknowledged writes (the mode that provides no server response) to make the writes much faster.
For your reads, make sure you index your collection correctly and it should run fine.
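A sketch of both points, batched unacknowledged writes plus an index on the query field, using the current mongodb PHP extension (database, collection and field names, and the $items array, are placeholders; the original answer predates this driver, so treat it as illustrative):

    <?php
    $manager = new MongoDB\Driver\Manager('mongodb://localhost:27017');

    // Batch many inserts into a single round trip.
    $bulk = new MongoDB\Driver\BulkWrite();
    foreach ($items as $item) {
        $bulk->insert(['item_name' => $item['name'], 'payload' => $item['data']]);
    }

    // w=0: fire-and-forget, the server sends no acknowledgement
    // (fast, but you won't know if a write failed).
    $noAck = new MongoDB\Driver\WriteConcern(0);
    $manager->executeBulkWrite('mydb.items', $bulk, ['writeConcern' => $noAck]);

    // Index the field you query on so the reads stay fast.
    $cmd = new MongoDB\Driver\Command([
        'createIndexes' => 'items',
        'indexes'       => [['key' => ['item_name' => 1], 'name' => 'item_name_1']],
    ]);
    $manager->executeCommand('mydb', $cmd);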