PHP vs. Javascript Efficient XML Parser - php

I am curious: I am creating a Flickr plugin for WordPress, and I have noticed that the PHP I have written is noticeably slower than the equivalent JavaScript I have written.
I know that Javascript is run client side so it will be faster as long as there aren't numerous processes already hogging the processor. With PHP running remotely I know that it is all based on connection and what is going on with the server. I was wondering if one was better to use than the other and if DOM is maybe not the best way to go grab XML. In this case in PHP I am using DOM to go and get the XML and then parse it out. With Javascript I am using SOAP to parse the same XML.

Assumptions:
JavaScript is required for this plugin.
JavaScript testing was only done on your development machine.
I think you need to rethink your metrics. In your particular case JavaScript is faster than PHP, but I don't see that being the case across the board. I'm assuming you're on shared hosting, as most end users of your plugin probably are, so your PHP will not be running on the fastest servers. As Rory said above, it is best to diagnose why your PHP is slow. With JavaScript you have to take into account the average user's device speed, which could range anywhere from awful to amazing. My guess is your PC is somewhere near the higher end of the spectrum.
Without any code provided it's tough to give a definitive answer. I would recommend trying the JavaScript version of your plugin on as wide a range of devices and browsers as possible, including things like iPads and cellphones.
Because of JavaScript's potential performance pitfalls on low-end devices, I would probably perform the task on the server unless investigation shows that, in your case, the JavaScript is performant across the board.
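One way to start diagnosing the PHP side is to time the parsing step on its own, separately from the network fetch. The following is only a minimal sketch: the XML string is a hypothetical stand-in for a Flickr-style response, and time_call() and parse_titles() are illustrative helper names, not part of the plugin.

```php
<?php
// Hypothetical sample standing in for a Flickr-style XML response.
$xml = '<rsp stat="ok"><photos>'
     . '<photo id="1" title="Sunset"/>'
     . '<photo id="2" title="Harbor"/>'
     . '</photos></rsp>';

// Run a callable and return [result, elapsed seconds].
function time_call(callable $fn): array
{
    $start = microtime(true);
    $result = $fn();
    return [$result, microtime(true) - $start];
}

// Parse the XML with DOM and collect the title attribute of each <photo>.
function parse_titles(string $xml): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    $titles = [];
    foreach ($doc->getElementsByTagName('photo') as $photo) {
        $titles[] = $photo->getAttribute('title');
    }
    return $titles;
}

[$titles, $parseSeconds] = time_call(fn() => parse_titles($xml));
printf("parsed %d photos in %.6f s\n", count($titles), $parseSeconds);
```

Wrapping the HTTP request in the same time_call() helper would show whether the time goes to the network or to the DOM parsing.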

You can also run JavaScript on the server side with the V8Js class in PHP (version 5.3.3 and later): http://ar.php.net/manual/en/book.v8js.php

Related

Crawl page faster [PHP]

I have a small question about crawling a web page in PHP. I have to crawl about 90,000 products on one big eshop. I tried it in PHP, but one product takes about 2-3 seconds, and that's bad. Any tips on how to do it faster? Maybe a multithreaded C++ version? But what about the time of an HTTP request? I mean, is it PHP's limitation or not? Thank you for the tips.
That's an extremely vague question. When you benchmarked the code you have, what was the slowest part? Was it network transfer times? Using a different language (or multiple threads) won't change that.
Was it time spent parsing the page? How are you doing that? If you're using an XML library to parse the entire DOM, could you get away with just looking for keywords (or even regular expressions)? That's less precise (and in some sense less correct) but perhaps it's faster.
What algorithms are you using for your analysis? Would other data structures provide better performance? As one simple example, if you spend a lot of time iterating over an array, perhaps a hash map is more appropriate.
PHP can be run in multiple processes. What happens if you kick off multiple instances of your script at once (on different pages)? Does the total time decrease?
Ultimately you've described a very general problem so I can't offer very specific solutions, but there is no inherent reason why PHP is inappropriate for this task. When you've identified what's slow (regardless of what language you're using) you should be able to more precisely address how to fix it.
I don't think it's PHP's problem, but it could be, depending on connection speed and computer speed. I've never had a speed problem with PHP/cURL though.
Just use multiple threads (i.e. multiple connections at once). I suggest you use cURL, but that's only because I'm familiar with it.
Here's a guide I've used for multiple threads for scraping with cURL:
http://semlabs.co.uk/journal/object-oriented-curl-class-with-multi-threading
Be VERY careful not to accidentally cause a denial of service situation with your scripts. But I'm sure you're already aware of that possibility.
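As a rough illustration of "multiple connections at once", here is a minimal sketch using PHP's curl_multi functions. The helper name fetch_all() and the timeout value are assumptions for the example; it is not the class from the linked guide.

```php
<?php
// Fetch several URLs concurrently; returns an array of url => response body.
// A failed transfer typically leaves its entry null or empty.
function fetch_all(array $urls, int $timeoutSeconds = 10): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            // Wait for activity instead of spinning the CPU.
            curl_multi_select($multi, 1.0);
        }
    } while ($running > 0 && $status === CURLM_OK);

    $bodies = [];
    foreach ($handles as $url => $ch) {
        $bodies[$url] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
    }
    return $bodies;
}
```

To stay polite to the target server, the URL list could be split with array_chunk() into small batches (say, five at a time) with a pause between batches, rather than opening thousands of connections at once.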
If your program is running slowly, my advice would be to run a profiler on it, and analyse why it's running slowly.
This advice applies to any language, but in the case of PHP, the profiler software you need is called Xdebug.
This is a PHP extension, so you need to install it on your server. If you're running on an ISP's server, you may not have permission to do this, but you can always install it with PHP on your local PC and run your tests there.
Once you've got Xdebug installed, switch on the profiling features in php.ini (see the Xdebug documentation for instructions on this), and run your program. It will then generate profiler files, which can be used to analyse what the program is doing.
Download KCacheGrind to perform the analysis. It reads the profiler files and presents call tree information, showing exactly what happened as the program ran, and how long every function call took.
With this information, you can look for the function calls that are running slowly, and work out what's happening. Usually the reason for slow code is some kind of inefficiency in the way something is written; Xdebug will help you find it.
Hope that helps.
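For reference, a minimal sketch of the php.ini settings involved, assuming Xdebug 3 (Xdebug 2 used xdebug.profiler_enable and xdebug.profiler_output_dir instead). The output directory is an arbitrary choice for the example.

```ini
; Load the extension (the exact line depends on how Xdebug was installed)
zend_extension=xdebug
; Xdebug 3: turn on the profiler
xdebug.mode=profile
; Where the cachegrind.out.* files are written (assumed path)
xdebug.output_dir=/tmp/xdebug
```

The resulting cachegrind.out.* files are what KCacheGrind opens.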
There is a 99% probability that PHP is NOT the problem. It is more likely the eshop's web server or some other network latency.
I know this for sure because I have been doing this for months now, and even if your code has lots of regular expressions, data scraping is really fast in PHP.
The solution to speed this up? Pre-cache the whole website with a command-line crawler, since disk space is cheap. curl can do this, and so can httrack. It will be much faster and more stable than PHP doing the crawling.
Then let PHP do the parsing alone. You will hopefully see PHP chomping through dozens of pages per minute. Hope this helps :)
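The parsing half of that approach might look roughly like the sketch below. The markup (an h1 with class "product-name") and the function name are hypothetical; a real shop's pages would need their own XPath expressions.

```php
<?php
// Extract a product name from one cached HTML page, or null if not found.
// Assumes non-empty HTML and a hypothetical <h1 class="product-name"> element.
function extract_product_name(string $html): ?string
{
    // Real-world HTML is rarely valid; collect libxml warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//h1[@class="product-name"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}
```

With the pages already on disk, a loop over the cached files can call this with file_get_contents($path), so no network time is involved during parsing.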

Creating a real time website using PHP

I'm currently creating a website using PHP and the Kohana framework. I want the site to be able to use real time (or near real time) data (e.g. for chat and real time feeds). I need it to be able to scale to thousands of concurrent users. I've done a lot of reading and still have no idea what the best method is for this.
Does anyone have any experience with StreamHub? Is it possible to use this with PHP?
Am I digging myself into a hole here and need to switch languages? I've looked at node js and nowjs, but I'm wary about coding a whole site in Express (I wonder about security holes, code maintainability, lack of a good ORM). I've read about Twisted Python, but have no idea what web framework would work well on top of that, and I'd prefer not to use Nevow - maybe Django can be used well with Twisted Python? I'm just looking to be pointed in the right direction, so I don't go too far in PHP and realize I can't get the near real-time results that I need.
Thanks for the help.
"I've looked at node js and nowjs, but I'm wary about coding a whole site in Express (I wonder about security holes, code maintainability, lack of a good ORM)."
I can personally vouch for code maintainability if you can do JavaScript. I personally find JavaScript more maintainable than PHP, but that's probably due to my lack of PHP experience.
ORM is not an issue, as node.js favours document-based databases. Document-based databases and JSON go hand in hand. I find CouchDB and its map/reduce system easy to use, and it feels natural with JSON.
In terms of security holes, yes, a node.js server is young and there may be holes. These are unavoidable. There are currently no known exploits, and I would say it's not much more vulnerable than IIS/Apache/nginx until someone points out a big flaw.
"I want the site to be able to use real time (or near real time) data (e.g. for chat and real time feeds). I need it to be able to scale to thousands of concurrent users."
Scalability like that requires non-blocking IO. This requires a non-blocking IO server like nginx or node.js (yes, blocking IO could work, but you would need much more hardware).
Personally I would advise using node.js over PHP, as it's easier to write non-blocking IO in node. You can do it in PHP, but you have to make all the right design and architecture decisions. I doubt there are any truly async non-blocking PHP frameworks.
Python's Twisted or Ruby's EventMachine together with nginx can work, but I have no expertise with those. At least with node you can't accidentally call a blocking library or make use of native blocking libraries, since JavaScript has no native IO.
PHP on its own is not the language you should be using for real-time updates of a website. A PHP script runs on the server and finishes generating the HTML before the browser renders it (and before the page's JavaScript files are loaded), so PHP cannot update your page once it has been delivered. However, when used with AJAX (e.g. using a jQuery function to call a PHP file), PHP can be used to update your page in near real time.
Using jQuery and AJAX (all JavaScript), you can do quite a bit in terms of updating a page without reloading it. I've seen sites such as this one that demonstrate how to make a chat using jQuery.
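The server side of such an AJAX setup could be sketched as follows: the browser repeatedly asks a PHP file for messages newer than the last one it has seen. The in-memory message list and both function names are hypothetical; a real chat would read from a database.

```php
<?php
// Return only the messages whose id is greater than $lastId.
function messages_since(array $messages, int $lastId): array
{
    return array_values(array_filter(
        $messages,
        fn(array $m): bool => $m['id'] > $lastId
    ));
}

// Build the JSON body an AJAX poll would receive.
function poll_response(array $messages, int $lastId): string
{
    return json_encode(['messages' => messages_since($messages, $lastId)]);
}

// Hypothetical data standing in for a database table.
$messages = [
    ['id' => 1, 'text' => 'hello'],
    ['id' => 2, 'text' => 'anyone there?'],
];
echo poll_response($messages, 1), "\n";
```

Each poll is still a full HTTP request, which is why this pattern gives near real time rather than true push, and why it becomes expensive with thousands of concurrent users.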

Interpreting JavaScript in PHP

I'd like to be able to run JavaScript and get the results with PHP, and am wondering if there is a library for PHP that allows me to parse it out. My first thought was to use node.js, but since node.js has access to sockets, files and other things, I think I'd prefer to avoid that.
Rationale: I'm doing screen scraping in PHP and have encountered many scenarios where the data is being produced by JavaScript on the frontend, and I would like to avoid writing specialized filtering functions to act on the JavaScript on a per-case basis since that takes a lot of time. The more general case would be to parse the JavaScript directly.
Downvoting: I don't really see what's so controversial about this question, modern web crawlers are known to do it, the only difference is that they tend to not be written in PHP. [1]
[1] http://blogs.forbes.com/velocity/2010/06/25/google-isnt-just-reading-your-links-its-now-running-your-code/
It's an interesting question and the down-voters are being unimaginative about potential use-cases. Page archiving tools, printing scripts, preview images - all valid reasons to want to manipulate a document with the JavaScript included within the page.
I'm not aware of any existing PHP implementations, but you could probably adapt Mozilla's SpiderMonkey as a PHP module, or as a standalone tool to manipulate a DOMDocument and return the result.
I haven't had experience with server-side JavaScript, but some issues that I believe might need to be dealt with:
Host objects like document and window are not part of the ECMAScript specification (these are objects provided by the implementing browser) so you need to make sure that the library provides equivalent host objects.
You might have security issues around executing client-side scripts within a server-side environment. This is a lot like allowing the user to submit a PHP script to be evaluated, so you need to make sure the security sandbox is tight.
Another (perhaps safer and easier to implement) option might be to use a modified Firefox or WebKit instance that runs as a browser, loading up the target pages and returning the modified source to your application.
From PHP 5.3 you can use the V8Js extension from PHP. It's a native library that uses Google's V8 JavaScript engine to execute JS and return the result.
It's good because you can pass variables in as PHP arrays, and they are converted cleanly on the JavaScript side.
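A minimal sketch of what that looks like, assuming the v8js extension is installed (the function returns null when it is not). The function name sum_in_js() is made up for the example; V8Js, executeString() and the default "PHP" object on the JavaScript side come from the extension.

```php
<?php
// Sum a list of numbers inside V8; returns null if the v8js extension is absent.
function sum_in_js(array $numbers)
{
    if (!class_exists('V8Js')) {
        return null;
    }
    $v8 = new V8Js();
    // Properties set on the V8Js object are visible to scripts
    // under the default "PHP" object.
    $v8->numbers = array_values($numbers);
    return $v8->executeString(
        'var total = 0;'
        . 'for (var i = 0; i < PHP.numbers.length; i++) { total += PHP.numbers[i]; }'
        . 'total;'
    );
}
```

Note that V8 only provides the language itself: host objects such as document and window are not available unless something supplies them.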
NodeJS (or some other derivative of google's v8) might actually be the best way to go here. If you're concerned about the various things nodejs can do (eg. sockets, etc), you can probably "strip it down" by removing modules and/or addons -- I think even the built in stuff is ultimately implemented in such a way that it could be stripped out fairly easily.
An alternate approach might be to simply replace, override, or remove the require function from node.js.
There's also envjs, which should make it easier to run JS that was designed to run in the browser.

I need to program an algorithm to be very fast; should I do it as a PHP extension, or some other way?

Most of my application is written in PHP (front and back ends).
There is a part that works too slowly and I will need to rewrite it, probably not in PHP.
What will give me the following:
1. Most speed
2. Fastest development
3. Easily maintained.
I have it in mind to rewrite this piece of code in C++ as a PHP extension, but maybe I am locked onto this solution and missing some simpler or better ones?
The algorithm is PorterStemmerAlgorithm on several MB of data each time it is run.
The answer really depends on what kind of process it is.
If it is a long-running process (at least seconds), then perhaps an external program written in C++ would be super easy. It would not have the complexities of a PHP extension, and its stability would not affect PHP/Apache. You could communicate over pipes, shared memory, or the like.
If it is a short running process (measured in ms) then you will most likely need to write a PHP extension. That would allow it to be invoked VERY fast with almost no per-call overhead.
Another possibility is a custom server which listens on a Unix Domain Socket and will quickly respond to PHP when PHP asks for information. Then your per-call overhead is basically creating a socket (not bad). The server could be in any language (c, c++, python, erlang, etc...), and the client could be a 50 line PHP class that uses the socket_*() functions.
A lot of information needs to be evaluated before making this decision. PHP does not typically show slowdowns until you get into really tight loops or thousands of repeated function calls. In other words, the overhead of the HTTP request and network delays usually makes PHP delays insignificant (unless the above applies).
Perhaps there is a better way to write it in PHP?
Are you database bound?
Is it CPU bound, Network bound, or IO bound?
Can the result be cached?
Does a library already exist which will do the heavy lifting?
By committing to a custom PHP extension, you add significantly to the base of knowledge required to maintain it (even above C++). But it is a great option when necessary.
Feel free to update your question with more details, and I'm sure Stack Overflow will be happy to help out.
Suggestion
The PorterStemmerAlgorithm has a C implementation available at http://tartarus.org/~martin/PorterStemmer/c.txt
It should be an easy matter to tie this C program into your data sources and make it a standalone executable. Then you could simply invoke it from PHP with one of the proc functions, such as proc_open().
Unless you need to invoke this program many times PER PHP request, this approach should save you the effort of building and integrating a PHP extension, not to mention that the hard work (in C) is already done.
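A hedged sketch of the proc_open() route: run_external() is an illustrative helper, and the stemmer binary path in the usage note is hypothetical. The input is staged in a temporary file so that several MB of data cannot deadlock against a full stdout pipe. Passing the command as an array assumes PHP 7.4 or later.

```php
<?php
// Run an external program, feed it $input on stdin, and return its stdout.
function run_external(array $command, string $input): string
{
    // Stage the input on disk so the child reads stdin from a file
    // while we only have to drain its stdout.
    $inputPath = tempnam(sys_get_temp_dir(), 'stem');
    file_put_contents($inputPath, $input);

    $descriptors = [
        0 => ['file', $inputPath, 'r'], // child's stdin
        1 => ['pipe', 'w'],             // child's stdout
    ];
    $process = proc_open($command, $descriptors, $pipes);
    if (!is_resource($process)) {
        unlink($inputPath);
        throw new RuntimeException('Could not start external program');
    }

    $output = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    $exitCode = proc_close($process);
    unlink($inputPath);

    if ($exitCode !== 0) {
        throw new RuntimeException("External program exited with code $exitCode");
    }
    return $output;
}
```

With a compiled stemmer at a hypothetical path such as /usr/local/bin/stem, the call would be run_external(['/usr/local/bin/stem'], $text). The cost is one process start per call, which is why this fits a few invocations per request rather than thousands.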
I am not sure what the PorterStemmerAlgorithm is. However, if you could make your process run in parallel and collect the results together, you could look at parallel processing, which is easily implemented in Java. I'm not sure how you would call it from PHP, but it would definitely be maintainable.
You can have a look at this framework. It looks simple to implement:
https://computefarm.dev.java.net/
Regards,
Franklin.
If you absolutely need to rewrite in a different language for speed reasons, then I think gahooa's answer covers the options nicely. However, before you do, are you absolutely sure you've done everything you can to improve the performance of the PHP implementation?
Is caching the output viable in your situation? Could you get away with running the algorithm once and caching the output rather than running it on every page load?
Have you tried profiling the code to ensure there's no unnecessary work being done (db queries in an inner loop and the like)? Xdebug can help here.
Are there other stemming algorithms available which might perform better on your dataset?
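The caching idea can be sketched as a small file-based cache keyed by a hash of the input. The function name, cache directory and file naming are assumptions for the example; $compute stands for whatever performs the stemming.

```php
<?php
// Return the cached result for $input, computing and storing it on a miss.
function cached_result(string $input, callable $compute, string $cacheDir): string
{
    $path = $cacheDir . '/' . hash('sha256', $input) . '.cache';
    if (is_file($path)) {
        return file_get_contents($path); // cache hit: skip the expensive work
    }
    $result = $compute($input);
    file_put_contents($path, $result, LOCK_EX);
    return $result;
}
```

This only pays off if the same input recurs between requests; if every run processes new data, the cache adds disk writes without saving any work.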

Migrating a large classic ASP page to php?

We've got a large classic ASP application and we are considering migrating to either ASP.NET or PHP. I don't want to talk about the pros and cons of either one; I'd rather like to know whether there are ways to avoid a complete rewrite in one shot when migrating to PHP. We simply can't stop maintaining the current codebase just to do a rewrite, so the two have to go hand in hand.
If we moved to ASP.NET, we should be able to share session data between both technologies and have parts of the site replaced with new ASP.NET code, while the others just keep on running. Is such an approach possible with PHP? Does anyone have experience with such a migration, or could anyone point me to some good reading?
The ability to share session state between ASP Classic and ASP.NET isn't an intrinsic feature of either language, though it's fairly easy to accomplish.
Microsoft provides sample code:
http://www.google.com/search?client=safari&rls=en-us&q=share%20session%20data%20between%20ASP%20and%20ASP.NET%20pages&ie=UTF-8&oe=UTF-8
By using Microsoft's example, you could pretty easily implement something similar in PHP. Basically you'd use the ASP Classic portion of Microsoft's code above. Then in PHP you'd write a fairly simple class to read session state from the database into an array or collection when each page is loaded. It's a little extra work in PHP, but shouldn't be more than a few extra days of coding and testing.
PHP runs pretty well on IIS6 in my limited experience, and support for it is supposedly even better in IIS7. The only snag I've hit is that most of the PHP code out there assumes you're running on Linux/Unix, but generally this is only an issue for file-handling code (think: user image uploads) that works with local filesystem paths, because it assumes your filesystem uses / instead of \ as on Windows. Obviously this is fairly trivial to fix.
Good luck!
Yes; it is possible to share session data between ASP and ASP.NET pages on a single web application. We do that with our legacy code at my work.
I know it's possible to run PHP on IIS. I'm not sure about sharing sessions between ASP and PHP scripts though.
"I'd rather like to know whether there are ways to avoid a complete rewrite in one shot when migrating to php"
Welcome to our world.
Our FogBugz codebase was written in classic ASP, and when we wanted to offer it on Linux, the simplest solution was to write a compiler which read the ASP and emitted PHP. It wasn't that difficult, and didn't take more than a few weeks.
The upside was that when we decided to switch our entire application to .NET, it only meant tweaking the compiler a bit to output .NET object code.
But to get back to your question: ASP and PHP are VERY VERY similar, and depending on your app there are really naive translators that might get you most of the way there.
Just another option. John Booty has a good suggestion too.
If the session data isn't sensitive information, you can work with cookies, which are also platform-agnostic, provided the user has cookies turned on.
That's probably not your best option given the post, but it's another one to think about.
