As part of my web app, I built a system that periodically pulls an RSS feed and scrapes its content. I also look for any image tags present in each feed item and attempt to pull each image to query its size and so on, to determine which "picture" to use.
Here is a rough sketch of that part of the code:
1) Is there an <image> node? If so, that is the image. Exit.
2) Parse the content of the description node through simplehtmldom and look for any and all img tags.
3) Iterate through all img tags: call getimagesize(); if the image is larger than the largest one found so far, use this picture.
4) Exit.
At step 3, the script can take a while, especially for feeds that have lots of images for me to check. I assume that each call to getimagesize() takes a certain amount of time and it adds up quickly. I'm not too worried about it taking a long time (although if it could be reduced, that would be best); the real problem is that while this script is running, it effectively leaves all other concurrent users hanging until it has finished.
I'd like to avoid this, but am not too proficient at server admin - perhaps someone could give me some guiding pointers?
Thanks!
Run it on a separate server if you need the performance boost. getimagesize() can really slow things down. I'd recommend running the scraping script on its own server and hosting everything else on your current server.
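If a second server isn't an option, a cheaper first step is to make sure the scraping never runs inside a web request at all: run it from cron as a CLI script and have the web app only read the results it writes out. A minimal sketch, assuming a hypothetical scrape_feed() function wrapping the RSS/getimagesize() logic above and a feed_cache.json output file:

<?php
// scrape_feed.php - run from cron, e.g. */15 * * * * php /path/to/scrape_feed.php
require __DIR__ . '/scraper_functions.php';   // hypothetical include containing scrape_feed()

// The slow part (fetching the feed, calling getimagesize() per image) happens here, offline.
$items = scrape_feed('http://example.com/feed.rss');

// Write the results atomically so web requests never see a half-written file.
$tmp = tempnam(sys_get_temp_dir(), 'feed');
file_put_contents($tmp, json_encode($items));
rename($tmp, __DIR__ . '/feed_cache.json');

// Web pages then only need:
// $items = json_decode(file_get_contents(__DIR__ . '/feed_cache.json'), true);

That keeps the long-running work out of the request cycle entirely, whichever machine it ends up on.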
I've been on a project for the past few days and hit a problem displaying large quantities of images (20+ GB total, ~1-2 GB per directory) in a gallery on one area of the site. The site is built on the Bootstrap framework. I've been trying to make massive carousels that ultimately do not function fluidly due to the combined size of /images. Question A: In this situation do I need I/O from a database and to store the images there? Is that faster than keeping them in the /images folder on the front end?
And B) In my PHP script I need to set directories to variables, iterate through them, and display the images inside <li> elements, but how do I go about putting controls on memory usage so as not to overload the browser? Any additions, suggestions, or alternatives would be greatly appreciated. I'm looking for the most direct means to an end here.
Though the question is a little generic, here are some thoughts in regards to your two questions:
A) No, performance pulling images from a database would most likely be worse than pulling straight from the file system. In general, it is not a good idea to store images or other binary data in databases unless you absolutely have to, because databases can't do much with this information and you are just adding an extra layer on top of the file system that doesn't need to be there. You would, however, want to store paths to images in your database, potentially along with other characteristics such as image dimensions, thumbnail paths, keywords, etc. Then your application would read the entries for the images to return the correct paths to the images.
B) You will almost certainly want to implement some sort of paging if you are displaying many hundreds or thousands of photos. If the final display must be a carousel, you will want to investigate the Javascript that drives it to determine how you could hook in a function that retrieves more results from your PHP application via an AJAX call when it reaches the end or near end of the current listing of images. If you are having problems with the browser crashing due to too many images, you will also want to remove images from the first part of the list of <li>s when you load new ones so that it keeps the DOM under control.
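As a rough illustration of the paging idea (the page size of 50, the file names and the images/ directory are assumptions, not anything from the question), the PHP endpoint the AJAX call hits could look something like this:

<?php
// gallery_page.php?page=2 - returns one page of image file names as JSON.
$perPage = 50;                                              // assumed page size
$page    = isset($_GET['page']) ? max(0, (int) $_GET['page']) : 0;

// List the directory once instead of pushing every image into the DOM at load time.
$all   = glob(__DIR__ . '/images/*.{jpg,jpeg,png,gif}', GLOB_BRACE);
$slice = array_slice($all, $page * $perPage, $perPage);

header('Content-Type: application/json');
echo json_encode(array_map('basename', $slice));

The JavaScript side appends the returned names as new <li> elements and, as described above, prunes the oldest ones so the DOM stays small.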
A) It's a bad idea to store that much binary data in a database. Even if the DB allows it, you shouldn't do it; it will also give you much higher memory consumption, because all your data will be stored in the database's memory space and then copied into PHP's memory space for you to handle, which eats up twice the memory, plus the overhead of running a database server, querying, and so on. So no, it's slower to use a database; accessing the filesystem directly is faster, and if you also use Varnish or another front-end caching system, you'll be able to serve content much faster still.
What I would do is store the files on the filesystem. The best servers for handling static serving like that are G-WAN and NGINX, but do your own reading and decide for yourself what suits you best. The point is, stay away from Apache, and preferably host all those static files on a separate server running a lightweight HTTP server.
ProTip: Save multiple copies of the same image at scaled-down sizes, for example one version at 50% and another at 25% of the original image size. This way you'll be able to send the thumbnails first for quick browsing, and then, when a user decides to view an image, serve up the 50% or 100% size depending on their screen size. You save yourself bandwidth and memory, and you also save mobile users a big 3G bill.
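As a minimal sketch of generating those smaller copies with GD (the 50%/25% factors and the _50/_25 file naming are just the example from the tip; it assumes JPEG input and skips error handling):

<?php
function make_scaled_copy($src, $factor) {
    list($w, $h) = getimagesize($src);
    $newW = (int) round($w * $factor);
    $newH = (int) round($h * $factor);

    $in  = imagecreatefromjpeg($src);                  // assumes a JPEG original
    $out = imagecreatetruecolor($newW, $newH);
    imagecopyresampled($out, $in, 0, 0, 0, 0, $newW, $newH, $w, $h);

    // photo.jpg -> photo_50.jpg, photo_25.jpg, ...
    $dest = preg_replace('/\.jpe?g$/i', '', $src) . '_' . (int) ($factor * 100) . '.jpg';
    imagejpeg($out, $dest, 80);                        // 80 = reasonable quality/size trade-off
    imagedestroy($in);
    imagedestroy($out);
}

make_scaled_copy('photo.jpg', 0.50);
make_scaled_copy('photo.jpg', 0.25);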
B) This is where it makes some sense to use a database: you can index all the directories into a database and use it to store the location of each image in the FS, perhaps some tags, and maybe even the number of views, etc.
On the frontend you'll implement a script that fetches, for example, 50 thumbnails per page; the user can then scroll around using some fancy jQuery, and when you need to fetch more, simply get a new result set with 50 more thumbs, and so on.
This way you'll save yourself memory and bandwidth, and the users will thank you for such a lightweight browsing experience!
Another tip:
If you want to be able to handle more traffic, you might want to consider using a CDN; there are many CDN services that aren't as expensive as Amazon S3, and a simple search will give you tons of resources!
Happy hacking!
Hello folks of SO!
We're trying to write some very small and simple code in PHP to generate a variation of a video, always using the same file.
The script would have to make a small pixel mark on a random or specific frame of the video file, and the result would have to be streamed in real time.
Here's some pseudo code to explain my idea:
$frame = $_GET['frame'];
$videofile = 'video.avi';
make_random_red_pixel_mark($videofile, $frame);
Does anyone know if this is possible using ffmpeg? Also, it is of extreme importance for us to execute this procedure as fast as possible.
A solution that would imply reprocessing the whole video, won't be useful for our purposes. It should be something like a closed caption, or a quick image / overlay filter that could be applied without an entire video reprocessing. As well, we can't put the overlay using Javascript nor any HTML approach, since the actual manipulation has to be on the video file itself.
The quality and framerate of the original video should be kept intact. Perhaps there is some other PHP module, or software that could be executed from PHP using exec()?
Any recommendation?
Thanks in advance!!
Chris C. Russo
More information:
1) It's possible for us to apply this procedure on any frame we want to, so we could use a "keyframe" in order to avoid the decoding and reencoding of an entire GOP.
2) As previously stated, the video stream would have to flow in real time.
This is a hard problem. The FFmpeg overlay video filter requires re-encoding.
When you change ALMOST anything in a video, you will be dealing with re-encoding of the video. This might be an expensive process depending on the video and on how much of a hurry you are in (if you want real time, you are in a hurry).
A possible solution for this would be something like this:
Open the INPUT video.
Create the OUTPUT video.
Loop over the packets of the INPUT video until you find the frame you want.
Reading the flags of the video packets (AVPacket structure) you can identify the Group of Pictures of this frame.
OK, you will have to RE-ENCODE only the frames that belong to this group of pictures. Because a GOP always starts with a keyframe, you will be able to do that.
Once done, go on reading the packets of the INPUT and writing them to the OUTPUT (transmuxing).
The process of reading a packet from the source and writing it to the destination is called transmuxing and is very, very cheap for live streaming. It's basically a plain copy of bytes. No big deal.
"The hard part here is that you will have to manage a POOL of packets until you identify the GOP where your frame is located. Why? Because you will read all packets AND STORE them in a pool (without decode the packets). When you identify it's a GOP, you will write these packets to your OUTPUT and go on to the next GOP. So you will always have the GOP in memory to be flushed (all packets together). When you identify the target frame you wanna modify. I will have to DECODE THE FRAMES from the beginning of the GOP to the end, modify the frame you want and then REENCODE this GOP! Well very hard!"
For arbitrary videos, this process above may result in a visible difference of quality of encoding in the GOP you reencoded. :-(
If you don't know how to open a video, read the packets, write the packets, etc., you will have to learn the basics of FFmpeg.
In order to do that, I suggest you study these examples if you haven't worked with the library before:
Demuxing: http://ffmpeg.org/doxygen/trunk/doc_2examples_2demuxing_8c-example.html
Muxing: http://ffmpeg.org/doxygen/trunk/doc_2examples_2muxing_8c-example.html
These examples will teach you how to open a video, identify the audio/video streams and loop over the packets, as well as how to decode and re-encode.
It's a hard job. The examples are in C; you can decide to write a PHP extension or use a PHP wrapper for FFmpeg.
ANOTHER SOLUTION: if you have the flexibility to choose the frame, try to re-encode only keyframes. Because keyframes are complete "bitmaps", you don't need to deal with GOPs; you will decode and re-encode only 1 frame.
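To give a feel for the keyframe route from PHP, here is a rough sketch that pulls a single frame out with ffmpeg via exec() and marks a pixel with GD. The file names, timestamp and pixel coordinates are assumptions, and splicing the modified frame back into the stream still needs the GOP re-encoding described above, so this is only the extract-and-mark half:

<?php
$video = 'video.avi';
$time  = '00:00:05';     // assumed timestamp of the chosen keyframe

// Fast seek to the timestamp and dump exactly one frame as a PNG.
exec(sprintf('ffmpeg -y -ss %s -i %s -frames:v 1 frame.png',
    escapeshellarg($time), escapeshellarg($video)));

// Mark a single red pixel at an arbitrary (assumed) position.
$img = imagecreatefrompng('frame.png');
$red = imagecolorallocate($img, 255, 0, 0);
imagesetpixel($img, 10, 10, $red);
imagepng($img, 'frame_marked.png');
imagedestroy($img);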
I'm trying to build a mechanism which will scan a website at a given URL and get all of its images. Currently I'm using simple_html_dom, which is slow.
Scanning a website from localhost is taking me about 30s - 1 min.
What I need to do is:
load a URL.
scan for images (if possible, only those whose width passes a size threshold x)
print them.
I'm looking for fastest way.
There is no fastest way.
You cannot reduce network latency.
You cannot avoid fetching the image to detect its size.
The rest of the operations are already a negligible part of the process.
The other answer is oversimplified, because you can reduce the overall network transfer by sending HEAD requests to the server to get an image's file size before downloading it -- immediately saving you almost all of the bandwidth for images with size < x.
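For example, a HEAD request with cURL returns only the headers, so you can read the Content-Length (the byte size, not the pixel dimensions, so it is a cheap pre-filter rather than a replacement for getimagesize()) before deciding whether to fetch the image at all. A minimal sketch, with the 20000-byte threshold as an arbitrary assumption:

<?php
// Returns the Content-Length reported by the server, or -1 if unknown.
function remote_file_size($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);             // HEAD request: headers only
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_exec($ch);
    $size = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD);
    curl_close($ch);
    return $size;
}

$imgUrl = 'http://example.com/some-image.jpg';
if (remote_file_size($imgUrl) > 20000) {
    // worth downloading and inspecting properly
}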
Depending on the size of the pages involved, the choice of string operations used to extract the image URLs could be important as well. PHP's perfectly adequate for the needs it caters for but it's still a moderately slow, interpreted language at the end of the day and I find calling routines which involve moving large substrings around appreciably laggy sometimes. In this case parsing it fully, even using a simple library, is overkill.
The reason I would go to extreme lengths to download only the bare minimum of images is that some PHP methods for doing so are very slow. If I use copy() to download a file and then do the same thing using raw sockets or cURL, copy() sometimes takes at least twice as long.
So choice of transfer method and choice of parsing method both have a noticeable effect.
I made a PHP website. It has 100 web pages, but when I open it, it takes a long time to load. It is a static website, not dynamic, but the content size of the pages is large, so it takes a long time to load in the browser.
What can I do to decrease the loading time? Please give me a solution.
There is a very useful tool for monitoring exactly what you are asking about, called YSlow.
Have a look at it.
There are a whole variety of methods here:
If you are accessing a database, look at optimising your queries; for example, specify only the fields that you need in a SELECT query rather than using SELECT *
Employ some form of server-side caching. There are a number of solutions for PHP - see this site for more details http://www.sitepoint.com/caching-php-performance/
Use client-side (browser) caching by setting appropriate Cache HTTP headers (see http://www.mnot.net/cache_docs/ for more details)
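For the last point, a minimal sketch of sending cache headers from PHP before any output is produced (the one-week lifetime is just an example value):

<?php
// Tell browsers (and intermediate caches) they may reuse this response for a week.
$maxAge = 7 * 24 * 60 * 60;
header('Cache-Control: public, max-age=' . $maxAge);
header('Expires: ' . gmdate('D, d M Y H:i:s', time() + $maxAge) . ' GMT');

// ... the rest of the page output follows ...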
Without further information about your site it's difficult to provide a more specific answer.
Test your site in Chrome.
It has a great feature which shows how long each element takes to load.
(Ctrl+Shift+I, then the Timeline tab.)
Short steps for full optimization are:
1) Backend
Analyse and reduce data-fetching time: use indexes, reduce subqueries, use temporary tables, etc.
2) Frontend
Reduce large JS library scripts.
Reduce image sizes.
Check PHP script loops (check page loading using a browser plugin).
Reduce the HTML size as well.
3) It may sound funny, but it also needs checking: check your broadband and network capacity.
Once you have done all of those things, the pages should load well.
You should optimize queries and database operations; always prefer to complete things in the minimum number of loops if possible.
Loading time is also affected by the page content; you should eliminate unnecessary images from your forms.
It is also affected by server speed, if you are running on a remote server.
Simple answer: if there's too much content, then reduce the content on the page! Install YSlow and follow its advice.
To be more specific, you need to apply some rules and show some self-control to keep loading times down. There's also stuff you can do on the PHP side, but we'll get to that later. On the client side, the following tips will help.
Remove any markup that isn't necessary. For example,
<div class="class 1>
<div class="class 2">
<div class="class 3">
<p> Hello, I'm the content</p>
</div>
</div>
</div>
With judicious use of CSS you can in most cases replace this with
<div class="class1 class2 class3">
<p>Hellp, I'm the content!</p>
</div>
You could even ditch the div altogether, if it's only ever going to contain a single child.
<p class="class1 class2 class3">Hello, I'm the content!</p>
Images: the rule of thumb is that no image on a web page should exceed 100K in size. While there are exceptions, this is a good rule to stick to. If you have many or large images on your page, try optimizing them. Replace lossless formats with lossy ones (truecolour PNG with JPEG), replace older file formats with modern ones that have better compression (GIF with non-truecolour PNG), lower the image quality settings for JPEG, reduce the number of colours in PNG, and so on.
NEVER use BMP images on a web page!
You can speed up page loads by reducing the number of HTTP requests being made. Every asset on your page (image, stylesheet, javascript file, etc.) represents an HTTP request, and the specs say you can only have 2 requests open at any one time. Any additional requests will be queued up until the first ones have cleared.
You can reduce the number of requests by, for example, having a single stylesheet for your page instead of multiple ones (though be sensible here, some stuff is better kept in a separate sheet, such as IE fixes), using image sprites, combining javascript files together (again, be sensible), and so on.
One thing that won't speed up page load times, but will make them more responsive is to put all your javascript at the bottom of the page (just before the </body> tag) as loading javascript in the head or higher up in the body will force the browser to wait until the JS has been evaluated before rendering what comes after it.
On the server side, turn compression on. Make sure files are sent with suitable caching headers so the browser can cache images, stylesheets, javascript, etc.
Finally in PHP, optimize your code so that it generates output more quickly. The server can't start sending content to the client until the PHP script has generated it. This usually means optimizing SQL queries to execute faster.
Finally, if the pages don't change that much, have PHP cache a copy of the output to disc, and send the cached version on subsequent page loads. When the page content is changed, have the PHP script delete the cached version. The fastest query is the one you don't have to run :)
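A minimal sketch of that last idea, wrapping a page in an output-buffer based disk cache (the cache directory and keying on the request URI are assumptions about how the pages are organised):

<?php
// At the very top of the page script.
$cacheFile = __DIR__ . '/cache/' . md5($_SERVER['REQUEST_URI']) . '.html';

if (is_file($cacheFile)) {
    readfile($cacheFile);     // serve the cached copy and stop
    exit;
}

ob_start();                   // capture everything the page outputs

// ... normal page generation happens here ...

// At the very bottom of the page script.
file_put_contents($cacheFile, ob_get_contents());
ob_end_flush();               // send the freshly generated page to the browser

// When the content changes, unlink($cacheFile) so the next request regenerates it.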
To speed up the website, try to do the following:
Avoid extra HTTP requests: rather than requesting CSS, JS and other assets directly from elsewhere, include them in the project itself.
Avoid bulk requests from the database (e.g. SELECT *); include only the relevant data fields from the database table.
Optimize images.
I recently wrote a PHP plugin to interface with my phpBB installation which will take my users' Steam IDs, convert them into the community ids that Steam uses on their website, grab the xml file for that community id, get the value of avatarFull (which contains the link to the full avatar), download it via curl, resize it, and set it as the user's new avatar.
In effect it is syncing my forum's avatars with Steam's avatars (Steam is a gaming community/platform and I run a gaming clan). My issue is that whenever I am reading the value from the xml file it takes around a second for each user as it loads the entire xml file before searching for the variable and this causes the entire script to take a very long time to complete.
Ideally I want to have my script run several times a day to check each avatarFull value from Steam and check to see if it has changed (and download the file if it has), but it currently takes just too long for me to tie up everything to wait on it.
Is there any way to have the server serve up just the xml value that I am looking for without loading the entire thing?
Here is how I am calling the value currently:
$xml = @simplexml_load_file("http://steamcommunity.com/profiles/".$steamid."?xml=1");
$avatarlink = $xml->avatarFull;
And here is an example xml file: XML file
The file isn't big, and parsing it doesn't take much time. Your second is mostly spent on network communication.
Since there is no way around this, you must implement a cache. Schedule a script that will run on your server every hour or so, looking for changes. This script will take a lot of time - at least a second for every user; several seconds if the picture has to be downloaded.
When it has the latest picture, it will store it in some predefined location on your server. The scripts that serve your webpage will use this location instead of communicating with Steam. That way they will work instantly, and the pictures will be at most 1 hour out-of-date.
Added: Here's an idea to complement this: have your visitors perform AJAX requests to Steam and check via JavaScript whether the picture has changed. Do this only for pictures that they're actually viewing. If one has changed, you can immediately replace the outdated picture in their browser, and you can also notify your server, which can then download the updated picture immediately. Perhaps you won't even need to schedule anything yourself.
You have to read the whole stream to get to the data you need, but it doesn't have to be kept in memory.
If I were doing this with Java, I'd use a SAX parser instead of a DOM parser. I could handle the few values I was interested in and not keep a large DOM in memory. See if there's something equivalent for you with PHP.
SimpleXML is a DOM parser. It will load and parse the entire document into memory before you can work with it. If you do not want that, use XMLReader, which allows you to process the XML while you are reading it from a stream, e.g. you could stop processing once the avatar has been fetched.
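A minimal sketch of that XMLReader approach, stopping as soon as avatarFull has been read (the URL is built the same way as in the snippet above):

<?php
$reader = new XMLReader();
$reader->open('http://steamcommunity.com/profiles/' . $steamid . '?xml=1');

$avatarlink = null;
while ($reader->read()) {
    // Take the text content of the first <avatarFull> element and stop.
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'avatarFull') {
        $avatarlink = $reader->readString();
        break;
    }
}
$reader->close();

Bear in mind that, as noted, most of the second is network time, so this saves parsing work rather than the round trip itself.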
But as other people have already pointed out elsewhere on this page, with a file as small as the one shown, this is more likely a network latency issue than an XML issue.
Also see Best XML Parser for PHP
That file looks small enough; it shouldn't take that long to parse. It probably takes that long because of some sort of network problem rather than the slowness of parsing.
If the network is your issue then no amount of trickery will help you :(.
If it isn't the network, then you could try a regex match on the input. That will probably be marginally faster.
Try this expression:
/<avatarFull><!\[CDATA\[(.*?)\]\]><\/avatarFull>/
and read the link from the first group match.
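Used from PHP, that would look roughly like this (the fetch via file_get_contents() is just for illustration):

<?php
$xml = file_get_contents('http://steamcommunity.com/profiles/' . $steamid . '?xml=1');

if (preg_match('/<avatarFull><!\[CDATA\[(.*?)\]\]><\/avatarFull>/', $xml, $m)) {
    $avatarlink = $m[1];      // the link inside the CDATA block
}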
You could try the SAX way of parsing (http://php.net/manual/en/book.xml.php), but as I said, since the file is small, I doubt it will really make a difference.
You can take advantage of caching the results of simplexml_load_file() somewhere like memcached or the filesystem. Here is a typical workflow (a minimal sketch follows the list):
check whether the XML file was processed during the last N seconds
if so, return the cached processing results
otherwise, get the results from simplexml
process them
resize the images
store the results in the cache
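A minimal sketch of that workflow using the filesystem as the cache (the one-hour lifetime, the cache directory and caching only the avatar URL are assumptions; image downloading and resizing would hang off the returned value):

<?php
function get_avatar_url($steamid, $ttl = 3600) {
    $cacheFile = __DIR__ . '/cache/avatar_' . $steamid . '.txt';

    // Was this profile processed during the last N seconds? If so, reuse the stored result.
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
        return file_get_contents($cacheFile);
    }

    // Otherwise fetch and parse the profile XML again.
    $xml = @simplexml_load_file('http://steamcommunity.com/profiles/' . $steamid . '?xml=1');
    if ($xml === false) {
        return false;         // network or parse failure: let the caller decide
    }

    $avatar = (string) $xml->avatarFull;

    // Store the result so the next call within the TTL is instant.
    file_put_contents($cacheFile, $avatar);

    return $avatar;
}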