I'm building a web application, and I need an architecture that lets me run it across two servers. The application scrapes information from other sites, both periodically and on input from the end user. To do this I'm using PHP+curl to scrape the information, and PHP or Python to parse it and store the results in a MySQL DB.
Then I will use Python to run some algorithms on the data; this will happen both periodically and on input from the end user. I'm going to cache some of the results in the MySQL DB, and sometimes, if a result is specific to one user, skip storing it and serve it straight to that user.
I'm thinking of using PHP for the website front end on a separate web server, and running the PHP spider, MySQL DB and Python on the other server.
What framework(s) should I use for this kind of job? Are MVC and CakePHP a good solution? If so, will I be able to control and monitor the Python code using them?
Thanks
How do I go about implementing this?
Too big a question for one answer here. You certainly don't want two sets of scraping code (one for scheduled runs, one for on-demand), and beyond that added complication, you really don't want to run a job of indefinite duration inside the thread spawned by a request to your web server. User requests for a scrape should be run via the scheduling mechanism and reported back to the users afterwards (although, if necessary, you could use Ajax polling to give the illusion that it is happening in the same thread).
What framework(s) should I use?
Frameworks are not magic bullets, and you shouldn't choose a framework based primarily on the nature of the application you are writing. Certainly, if specific, critical functionality is precluded by a given framework, then you are using the wrong framework; but in my experience that has never been the case, and you just need to write some code yourself.
using something more complex than a cron job
Yes, a cron job is probably not the right way to go, for lots of reasons. If it were me, I'd look at writing a daemon which would schedule scrapes (and accept connections from web page scripts to enqueue additional on-demand scrapes), but I'd run the scrapes themselves as separate processes.
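To make the shape of that concrete, here is a minimal Python sketch of the idea (all names such as `scrape_queue` and `run_scrape` are invented for illustration; a real daemon would also need signal handling, logging and persistence, and would launch your actual PHP+curl scraper instead of `echo`):

```python
# Sketch of a scrape-scheduling daemon: scheduled scrapes and user-requested
# scrapes share one queue, and each scrape runs as a separate process so an
# indefinitely long job can never block the daemon (or a web request thread).
import queue
import subprocess

scrape_queue: "queue.Queue[str]" = queue.Queue()

def enqueue_scrape(url: str) -> None:
    """Called by the scheduler AND by web-page scripts for on-demand requests."""
    scrape_queue.put(url)

def run_scrape(url: str) -> int:
    """Run one scrape out-of-process. A real daemon might do something like
    subprocess.Popen(["php", "scrape.php", url]); here we just echo the URL
    so the sketch is self-contained."""
    proc = subprocess.run(["echo", url], capture_output=True, text=True)
    return proc.returncode

def drain_queue() -> list:
    """One scheduler tick: start a process for every queued scrape."""
    results = []
    while not scrape_queue.empty():
        results.append(run_scrape(scrape_queue.get()))
    return results
```

The point of the queue is that "scheduled" and "on-demand" are just two callers of the same `enqueue_scrape`, so there is only one scraping code path.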
Is MVC a good architecture for this? (I'm new to MVC, architectures etc.)
No. Don't start by asking whether a pattern fits the application. Patterns are a useful teaching tool, but they describe what code is, not what it will be.
(Your application might include some MVC patterns, but it should also include lots of other ones.)
C.
I think you already have a clear idea of how to organize your layers.
First of all, you will need a web framework for your front end.
You have many choices here; CakePHP, as far as I know, is a good one, and it is designed to push you towards the MVC design pattern.
Then, you will need to design your database to store what users want spidered.
Your DB will be accessed by your web application to store user requests, by your PHP script to know what to scrape, and finally by your Python batch to confirm to the users that the requested data is available.
A possible, over-simplified scenario:
User registers on your site
User asks to grab a random page from Wikipedia
The request is stored through the CakePHP application to the DB
The cron PHP batch starts and checks the DB for new requests
The batch finds the new request and scrapes the page from Wikipedia
The batch updates the DB with a "scraped" flag
The cron Python batch starts and checks the DB for new "scraped" flags
The batch finds the new flag, parses the Wikipedia page and extracts some tags
The batch updates the DB with a "done" flag
The user finds the requested information on his profile.
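As a rough illustration of that flag-based hand-off between the batches, here is a self-contained Python sketch that uses sqlite3 in place of MySQL; the table and column names are invented for the example:

```python
# Flag-based hand-off between the web app, the scraper batch and the parser
# batch, all sharing one table. sqlite3 stands in for MySQL so the sketch
# runs on its own; the schema and names are illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE requests (
    id INTEGER PRIMARY KEY, url TEXT,
    scraped INTEGER DEFAULT 0, done INTEGER DEFAULT 0)""")

def store_request(url):
    """The CakePHP app stores the user's request in the DB."""
    conn.execute("INSERT INTO requests (url) VALUES (?)", (url,))

def scraper_batch():
    """The cron PHP batch: scrape pending rows, then set the scraped flag."""
    conn.execute("UPDATE requests SET scraped = 1 WHERE scraped = 0")

def parser_batch():
    """The cron Python batch: parse scraped rows, mark them done, and
    return how many were processed."""
    rows = conn.execute(
        "SELECT id FROM requests WHERE scraped = 1 AND done = 0").fetchall()
    for (row_id,) in rows:
        conn.execute("UPDATE requests SET done = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

Each batch only ever looks at the flag set by the previous stage, so the three components never need to talk to each other directly, only to the DB.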
I have a very simple test case: I want to log in and out of a PHP application (SugarCRM) multiple times. I have successfully carried out a couple of basic tests, but I don't seem to get the hang of it. A short tutorial, or a link for carrying out the above, would surely be sufficient. Thanks for reading.
I believe the ASP.NET Login Testing with JMeter guide will provide comprehensive information on how to perform your testing properly. There shouldn't be any difference for other back-end technology stacks (Java, PHP, Ruby, etc.), as JMeter acts at the protocol level and doesn't care about the software underlying the application under test; correlation and cookie management are standard across web applications.
If you want to log in multiple times, try the CSV Data Set Config and import the data from a CSV file.
Create a CSV file with a username and password for each user and run the script. Remember that the number of threads you set should equal the number of users, so that each one logs in.
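For example, the CSV file referenced by JMeter's CSV Data Set Config element might look like this (the column names here are assumptions; whatever you list under "Variable Names" becomes available as `${username}` and `${password}` in the login request's parameters):

```
username,password
user1,secret1
user2,secret2
user3,secret3
```

With three rows and three threads in the Thread Group, each thread picks up one line and performs its own login.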
I have this "crazy" project starting, and the idea behind it is quite clear:
There is some software that writes to a MySQL database. The interval between queries is one second.
Now I need a web interface which loads those database records and continues to show new records as they happen.
The technologies I am supposed to use are PHP and HTML5 WebSockets. I've found the nice Ratchet library, which I think fits my needs; however, there's one problem: I am not sure how to notify the PHP script, or send a message to the running PHP WebSockets server, when a MySQL query occurs.
Now, I could use Comet and request the database records every second, but that defeats the purpose of the WebSockets I am supposed to use.
So what I really need is a MySQL pub/sub system.
I've read about MySQL triggers, but I see that they pose some security risks. Though security isn't a real concern in this case, since the system will be isolated in a VPN and only a few specific people will be using it, I would still like to address every possible issue and do everything the right way.
Then there is MySQL Proxy, of which I have no knowledge, but if it could help me achieve my goal I would very much consider using it.
So, in short, the question is: how can I notify or run a PHP script when a MySQL query occurs?
I would separate the issues a bit.
You definitely need some sort of pub/sub system. It's my understanding that Redis can act as one, but I've never used it for this and can't make a specific recommendation as to which system to use. In any case, I wouldn't attach it directly to your database. There are certainly operations you need to run on your database for maintenance purposes, and you don't want it flushing a million rows out to your clients because a trigger fired. Pick your pub/sub system independently of what you're doing client-side and what you're doing with your database; the deciding factor should be how it interacts with your server-side language (PHP in this case).
Now that your pub/sub is out of the way, I would build an API server or ingest system of sorts that takes in data from wherever these transactions are coming from. It will handle them, publishing messages and inserting them into the database at the same time.
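The split between the ingest step and the pub/sub can be sketched like this; the broker here is a minimal in-memory stand-in (a real system would use Redis or similar), and all names are invented for illustration:

```python
# In-process stand-in for the ingest/pub-sub split: the ingest function is
# the single place that both inserts into the database and publishes the
# message, so nothing has to be triggered from inside MySQL itself.

class Broker:
    """Toy pub/sub broker standing in for e.g. Redis."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, message):
        for callback in self.subscribers:
            callback(message)

database = []      # stand-in for the MySQL table
broker = Broker()

def ingest(record):
    """The API/ingest server: store the record AND notify subscribers."""
    database.append(record)
    broker.publish(record)
```

The WebSocket layer would simply be one more subscriber on the broker, never a reader polling the database.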
Next, you need a way to get that data to clients. WebSockets are a good choice, as you have discovered. You can do this in PHP or anything, really. I prefer Node.js with Socket.IO, which provides a nice fallback to long-polling JSON (among others) for clients that don't support WebSockets. Whatever you choose here needs to listen for messages on your pub/sub system and send the right data to the client (likely stripping out some of the published information that isn't immediately needed client-side).
I have added a chat capability to a site using jQuery and PHP, and it seems to generally work well, but I am worried about scalability. I wonder if anyone has some advice. The key area for me, I think, is efficiently managing awareness of who is online.
detail:
I haven't implemented long-polling (yet) and I'm worried about the raw number of long-running processes in PHP (Apache) getting out of control.
My code runs a periodic jQuery ajax poll (every 4 seconds) that first updates the DB to say I am active and sets a timestamp.
Then a routine checks the timestamps of all active users and sets those outside the limit (10 minutes) to inactive.
This seems fairly normal from my research so far. However, I am concerned that if I let every active user check every other active user, and everyone then updates the DB to kick off inactive users, I will get duplicated effort, record locks and unnecessary server load.
So I have implemented the idea of a 'sweeper' role. This is just one of the online users, who inherits the job of doing the cleanup. Everyone else just checks whether a 'sweeper' exists (a DB read) and carries on. If there is no sweeper when they check, they make themselves sweeper (a DB write to their own record). If there is more than one, they make themselves 'non-sweeper', sleep for a random period and check again.
My theory is that this way there is only one user regularly writing updates to several records on the relevant table and everyone else is either reading or just writing to their own record.
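A sketch of that sweeper election, using sqlite3 as a stand-in for MySQL (table and column names are invented; note that a real implementation would want an atomic claim, e.g. a conditional UPDATE, because the check-then-write below can race between two users polling at the same moment, which is exactly why the back-off branch exists):

```python
# Sweeper election: at most one active user should end up doing the cleanup.
# sqlite3 stands in for MySQL; schema and names are illustrative only.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE active_users (
    id INTEGER PRIMARY KEY, sweeper INTEGER DEFAULT 0)""")

def poll(user_id):
    """Called on each ajax poll: claim the sweeper role if nobody holds it,
    and back off if more than one user somehow claimed it."""
    sweepers = db.execute(
        "SELECT COUNT(*) FROM active_users WHERE sweeper = 1").fetchone()[0]
    if sweepers == 0:
        # No sweeper exists: claim the role (a write to our own row only).
        db.execute("UPDATE active_users SET sweeper = 1 WHERE id = ?",
                   (user_id,))
        return "sweeper"
    if sweepers > 1:
        # Collision: step down; the caller sleeps a random period and re-polls.
        db.execute("UPDATE active_users SET sweeper = 0 WHERE id = ?",
                   (user_id,))
        return "backed_off"
    return "non-sweeper"
```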
So it works OK, but the problem is possibly that the process requires a few DB reads and may actually be less efficient than just letting everyone do the cleanup, as in the other approaches I researched.
I have had over 100 concurrent users running OK so far, but the client wants to scale up to several hundred, even over 1,000, and I have no way of knowing at this stage whether this idea is good or not.
Does anyone know whether this is a good approach or not, whether it is scalable to hundreds of active users, or whether you can recommend a different approach?
As an aside, long polling / Comet for the actual chat messages seems simple, and I have found a good resource for the code, but several blog comments suggest it's dangerous with PHP and Apache specifically (active threads etc.), with the impact minimised using usleep and session_write_close.
Again, does anyone have any practical experience of a PHP long-polling setup for hundreds of active users? Maybe you can put my mind at ease! Or do I really have to look at migrating this to node.js (of which I have no experience)?
Thank you in advance
Tony
My advice would be to do this with the Meteor framework, which should make it pretty trivial, even if you are not an expert, and then simply load that chat into your PHP website via an iframe.
It will be scalable, won't consume many resources, and it will only get better in the future, I presume.
And it sure beats both PHP comet solutions and jquery & ajax timeout based calls to server.
I even believe you could find a more or less complete solution on GitHub that just requires tweaking.
But of course, do read the docs before you implement it.
If you worry about security issues, read up on security with Meteor.
Long polling is indeed pretty disastrous for PHP. PHP always runs with a limited number of concurrent processes, and it will scale well as long as you optimize for handling each request as quickly as possible.
Long polling and similar solutions will quickly fill up your pipe.
It could be argued that PHP is simply not the right technology for this type of thing with the current tools out there. If you insist on using PHP, you could try ReactPHP, a framework for PHP quite similar in design to NodeJS. The implication with React is also that it's expected to run as a separate daemon, not within a web server such as Apache. I have no experience of its stability or how well it scales, so you will have to do the testing yourself.
NodeJS is not hard to get into if you know JavaScript well. NodeJS + socket.io make it really easy to write the chat server and client with WebSockets. This would be my recommendation; when I started with this, I had something nice up and running within a few hours.
If you want to keep your application stack in PHP, want the chat running in your actual web app (not an iframe), and are concerned about scaling your realtime infrastructure, then I'd recommend you look at a hosted service for the realtime updates, such as Pusher, who I work for. This way the hosted service handles the scaling of the realtime infrastructure for you and lets you concentrate on building your application's functionality.
This way you only need to handle the chat message requests (sanitizing/verifying the content) and then push the information through Pusher to the thousands of connected clients.
The quick start guide is available here:
http://pusher.com/docs/quickstart
I have a full list of hosted services in my realtime web tech guide.
I need to design a work order management system that will be accessed via a web browser, using PHP, and will interface with an arbitrary embedded database. For the time being this application will be stand-alone and will need a very easy setup, so I want to stay away from a full-blown Apache setup. Obviously I will need some sort of web server to serve the PHP pages, and anything that can't be built into the database or PHP will probably be written in Python. I'm thinking it might be easiest to have everything built into a single Python instance, but I'm open to suggestions.
Really what I'm trying to stay away from is having multiple services running at any given time that all need updating and maintenance. I'll be running this at work on a single machine and it will need to keep a low profile. Any suggestions?
I want to create a live, checkers-like app which will work like this: multiple icons/avatars are displayed on a checkerboard-like surface. I want a command prompt beneath the board, or some other sort of interface, that will allow users to control a certain avatar and get it to perform actions. Multiple users will be using it at one time, and each will be able to view the other users' changes/actions on the checkerboard.
What I'm wondering is: what's the best way to do this? I've got my HTML, CSS and JS approach down, but not my data storage method. I know that, using PHP, I have the choice of file-based storage, MySQL, or some other method. I need to know which is better, because I don't want server lag, poor response times, or some other issue, especially since in this case actions will be performed every second or two by multiple users.
I've done similar stuff before, but I want to hear how more experienced programmers would handle it (advice, etc.).
Sounds like a great project for node.js!
To clarify, node.js is a server-side implementation of JavaScript. What you'll want is a Comet-based application (a web-based client application that receives server-side pushes instead of constantly polling the server), which is exactly what node.js is good at.
Traditional ajax calls for your clients to poll the server for data. This creates enormous overhead for both the client and the server. Allowing the server to push requests directly to the client without the client repeatedly asking solves the overhead issue and creates a more responsive interface. This is accomplished by holding asynchronous client connections on the server and only returning when the server has something to respond with. Once the server responds with data, another connection is immediately created and held by the server again until data is ready to be sent.
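The hold-and-respond pattern described above can be sketched in Python, with a `threading.Event` standing in for a held HTTP connection (the class and method names are illustrative, not from any framework):

```python
# Comet/long-poll "hold" pattern: the server parks each request until data
# is ready instead of answering immediately. threading.Event stands in for
# a held asynchronous HTTP connection.
import threading

class Channel:
    def __init__(self):
        self._event = threading.Event()
        self._data = None

    def long_poll(self, timeout=30.0):
        """Client side of one poll: block until the server pushes. In real
        code the client immediately issues the next poll after this returns."""
        if self._event.wait(timeout):
            self._event.clear()
            return self._data
        return None  # timed out with nothing to report; client re-polls

    def push(self, data):
        """Server side: release every held request with the fresh data."""
        self._data = data
        self._event.set()
```

The key property is that `long_poll` returns only when `push` has something to say, so the wire stays quiet between moves instead of carrying a no-op poll every second.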
You may be able to accomplish the same thing with PHP, but I'm not that familiar with PHP and Comet type applications.
The number of users and hosting costs will play into your file-vs-DB decision. If you're planning on more than a couple of users, I'd stick with the database. There are some NoSQL options available out there, but in my experience MySQL is much faster and more reliable than those options.
Good luck with your project!
http://en.wikipedia.org/wiki/Comet_%28programming%29
http://www.nodejs.org/
http://zenmachine.wordpress.com/2010/01/31/node-js-and-comet/
http://socket.io/ - abstracts away the communication layer with your clients based on their capability (LongPolling, WebSockets, etc.)
MySQL and XCache!
Make sure you use prepared statements so MySQL does not need to parse the SQL again on every call. MEMORY tables could also be used to keep the data in memory storage.
And of course, make appropriate use of indexes.
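As an illustration of prepared statements and MEMORY tables in MySQL, here is a sketch; all table and column names are invented for the example:

```sql
-- Illustrative only: table/column names are invented.
-- A MEMORY table keeps the hot game state in RAM (no disk I/O on reads):
CREATE TABLE game_state (
    game_id  INT NOT NULL,
    piece_id INT NOT NULL,
    x TINYINT NOT NULL,
    y TINYINT NOT NULL,
    PRIMARY KEY (game_id, piece_id)
) ENGINE=MEMORY;

-- A server-side prepared statement is parsed once and executed many times:
PREPARE move_piece FROM
    'UPDATE game_state SET x = ?, y = ? WHERE game_id = ? AND piece_id = ?';
SET @x = 3, @y = 4, @game = 1, @piece = 7;
EXECUTE move_piece USING @x, @y, @game, @piece;
```

Note that MEMORY tables are emptied on a server restart, so they suit transient state like piece positions, not anything you need to keep.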
If the 'gamestate' is not that important, you can even store everything in XCache.
Remember that XCache does not store data persistently (it is cleared on an Apache restart).