I've been given the task of connecting multiple sites belonging to the same client into a single network, so I would like to hear some architectural advice on joining these sites into a single community.
These sites include:
1. Invision Power Board Forum (the most important site)
2. 3 custom-made CMSs (changes to the code are allowed)
3. 1 Drupal site
4. 3-4 WordPress blogs
Requirements are as follows:
1. Connect all users of all sites into a single administrable entity, with the ability to change permissions, ban users, etc.
2. Later on, building on this implementation, I have to add a "Facebook-like" chat that is available to all users regardless of which site they log in from.
I have a few ideas in mind on how to approach this, but I would like to hear from people with more experience and expertise than myself.
Cheers!
You're going to have one hell of a time. Those platforms have very disparate user architectures: there is no way to "connect" them all together fluidly without numerous codebase changes. You're looking at making deep changes to each platform so that it talks to a central database, likely modifying thousands (if not tens of thousands) of lines of code.
On top of the obvious (massive) changes to all of the platforms, you're going to have to worry about updates: what happens when a new version of WordPress is released? You'd likely have to update all of your code manually (since you can't just drop in the changes). You'd also have to make sure that all of the code changes are compatible with your current database. God forbid one of the platforms starts storing user information differently: you'd have to make more massive code changes. This just isn't maintainable.
Your alternative (and best bet) is to have some sort of synchronization job that runs every hour or so: iterate through each user in each database and compare it to see if it both exists and is up-to-date in the other databases. If not, push the changes out. The problem with this is that it will get significantly slower as you get more and more users.
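To make the shape of that concrete, here is a minimal sketch of such a sync job in PHP (purely illustrative; every database, table and column name here is invented, and each platform's real users table will differ):

    <?php
    // Illustrative cron script: push users from a central table out to each
    // platform's own users table.
    $central = new PDO('mysql:host=localhost;dbname=central', 'user', 'pass');
    $siteDbs = [
        new PDO('mysql:host=localhost;dbname=forum', 'user', 'pass'),
        new PDO('mysql:host=localhost;dbname=blog',  'user', 'pass'),
    ];

    $users = $central->query('SELECT email, username, pass_hash, updated_at FROM users');
    foreach ($users as $u) {
        foreach ($siteDbs as $db) {
            $stmt = $db->prepare('SELECT updated_at FROM users WHERE email = ?');
            $stmt->execute([$u['email']]);
            $row = $stmt->fetch(PDO::FETCH_ASSOC);
            if ($row === false) {
                // User missing on this site: create it
                $db->prepare('INSERT INTO users (email, username, pass_hash, updated_at)
                              VALUES (?, ?, ?, ?)')
                   ->execute([$u['email'], $u['username'], $u['pass_hash'], $u['updated_at']]);
            } elseif ($row['updated_at'] < $u['updated_at']) {
                // User stale on this site: push the newer central record out
                $db->prepare('UPDATE users SET username = ?, pass_hash = ?, updated_at = ?
                              WHERE email = ?')
                   ->execute([$u['username'], $u['pass_hash'], $u['updated_at'], $u['email']]);
            }
        }
    }

Note that in reality each platform hashes passwords differently, so a shared pass_hash column like the one above is the first thing to break; that is part of why this approach gets painful.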
Perhaps another alternative is to simply offer a custom OpenID implementation. I believe that Drupal and WordPress both have OpenID plugins that you can take advantage of. This way, you could give your users a pseudo-single-sign-on service across your sites. The downside is that users could opt not to use it.
Good luck
I'm currently building a Facebook-like chatbox, and I have encountered several considerations and problems along the way.
I have been googling for useful resources the whole time, like simple chatbox examples and tutorials.
My goal is to build one just like the Facebook/Gmail chatbox or CometChat. I know it's hard and there is a lot to scale behind the scenes, but all I want is to build it as simply as possible and to figure out how the Facebook/Gmail chatboxes implement their chat functionality.
Progress:
I have finished the Facebook-like chatbox structure: a sidebar on the right displaying online friends I can chat with, and popup chatboxes at the bottom that can be expanded and minimized.
I have also finished simple chatting based on a MySQL database.
There's a table with 4 columns, 'sender', 'receiver', 'message' and 'time', for storing the conversation.
My chatbox works this way:
1. The user sends a message; my front-end JavaScript grabs the text the user typed and sends it to a PHP file on the server via Ajax.
2. The backend PHP file stores this message in MySQL.
3. The front-end calls an update function every 3 seconds to check whether the receiver has sent a message back, and shows it in the chatbox (a stripped-down sketch of the server side follows).
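Roughly, the server side looks like this (a simplified sketch, not my exact code; I've assumed the table is called 'messages' and left out authentication and escaping):

    <?php
    // chat.php: one endpoint for both storing and polling
    $db = new PDO('mysql:host=localhost;dbname=chat', 'user', 'pass');

    if ($_SERVER['REQUEST_METHOD'] === 'POST') {
        // Steps 1-2: store the message sent via Ajax
        $db->prepare('INSERT INTO messages (sender, receiver, message, time)
                      VALUES (?, ?, ?, NOW())')
           ->execute([$_POST['sender'], $_POST['receiver'], $_POST['message']]);
    } else {
        // Step 3: the 3-second poll asks for anything newer than it has seen
        $stmt = $db->prepare('SELECT sender, message, time FROM messages
                              WHERE receiver = ? AND time > ? ORDER BY time');
        $stmt->execute([$_GET['user'], $_GET['since']]);
        header('Content-Type: application/json');
        echo json_encode($stmt->fetchAll(PDO::FETCH_ASSOC));
    }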
I'm not sure this is a good long-term approach, and I'm really concerned about it.
If the user base keeps growing, I will have to find ways to scale it well, or my database and server will be overwhelmed and users will notice high latency when conversations update.
Is BigTable the right way to do this if you have millions of users online?
How does Facebook store its users' messages and chat history in the backend?
How does a chat app like WhatsApp store its text messages?
Is it possible to let users chat directly with one another without storing state on the server?
If I want to implement chat history functionality in my chatbox, what is a good way to do it?
I am thinking the server could create a .txt file for each conversation in its file system, with a database column storing the file path (rough sketch below). I know it's possible to do it this way, but I'm not sure whether it's a good or correct way to handle chat history.
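In code, the idea would be something like this (a rough sketch; the path and names are made up):

    <?php
    // Append one line to the conversation's .txt file; the path would come
    // from a column in the conversations table
    $path = '/var/chat_logs/alice_bob.txt';
    $line = date('c') . " alice: hello\n";
    file_put_contents($path, $line, FILE_APPEND | LOCK_EX);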
I know this could be a huge, detailed application.
I'm not asking for a detailed implementation, but for the big picture and the concepts behind building it!
Thank you!
That's a good question and here's an attempt at answering it.
I believe you are thinking about scalability a bit too early. Your IM app might never reach the number of users it would take for it to stop performing well. Consider polishing your small product first and scaling as you go, as much as is actually needed.
Disk I/O is one of the issues you will face when scaling a web application. Storing conversations directly on disk as .txt files might not be a reliable solution.
Push your technology stack to its limits before considering changing it or switching to something else. I assume you are using a relational database for your storage (you mentioned columns and rows, which is not conclusive, but still). There are other options out there with good benchmark results, at the expense of various other compromises; NoSQL (BigTable, which you mentioned, falls in this family) is one of them. Relational databases are great and have long been the industry standard, but there are now alternative solutions that are quite promising.
Look into NoSQL document-based data storage solutions such as MongoDB or CouchDB, or even Cassandra; there are many others. There is a considerable amount of information about the performance of each under specific circumstances and situations. Choose what is best for the problem at hand, not what is most fashionable or hyped.
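To make that concrete, a chat message maps naturally onto a document. A minimal sketch using the official PHP library for MongoDB (database, collection and field names are arbitrary):

    <?php
    require 'vendor/autoload.php'; // the mongodb/mongodb package via Composer

    $client   = new MongoDB\Client('mongodb://localhost:27017');
    $messages = $client->chat->messages;

    // Store one message as a document
    $messages->insertOne([
        'sender'   => 'alice',
        'receiver' => 'bob',
        'message'  => 'hello',
        'time'     => new MongoDB\BSON\UTCDateTime(),
    ]);

    // Fetch the 50 most recent messages for a user
    $recent = $messages->find(
        ['receiver' => 'bob'],
        ['sort' => ['time' => -1], 'limit' => 50]
    );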
Another option would be to outsource your scalability problems to a third-party provider such as Firebase. In that situation, all you have to worry about is your product, not what's happening under the hood.
Store only the data that you need and archive or dismiss what you don't.
With scalability there are generally two broad categories: horizontal and vertical scaling.
Horizontal scaling means adding more nodes to your system, i.e. adding more server instances to handle the extra load. There are many cloud providers out there that make this kind of scaling very cheap and nearly instantaneous.
Vertical scaling means adding more resources to the node your app currently runs on, in addition to using technologies that let you take full advantage of those resources. This optimization happens at the level of the instance's resources (CPU, RAM, disk space, etc.) and of your data storage, programming language of choice, algorithms, and so on. You might find that PHP and MySQL aren't the tools for this job, but that's arguable.
Distributed systems architects and programmers also take advantage of other (faster) programming languages at runtime (such as C, C++ or even Java) to speed up certain tasks. Look into how you can dissect your application into smaller, decoupled modules/components that can run independently. (But I'm not sure you will ever reach this stage with an IM client unless it becomes as popular as WhatsApp or Facebook chat.)
I advise you to grab and read a couple of books about scaling web applications and leveraging cloud computing. Study scalable architectures and design your application around them, according to your business logic.
This is a very broad and complex topic, I'm sure others might have additional interesting insight on the matter.
My client has a host of Facebook pages that have become very successful. In order to move away from Big Brother Facebook, my client wishes to create a large dynamic site that incorporates the more successful parts of the Facebook empire.
One of my client's spin-off sites has already been created and is getting a lot of traffic. I'm not sure exactly how much, but it hit 90 GB in a month and the allocated space needed to be increased.
In any case, my client has dreamed up a massive website with its own community, looking to bring that community under one banner. However, I am concerned that it will get thrashed: bottlenecks, long load times, etc.
My questions:
Will a managed dedicated server be able to handle a potentially large amount of traffic?
Is it better to host the various parts of the empire on their own separate hosting and domains (normal hosting or VPS), or to keep them all under one hood (i.e. using sub-domains)?
If they were all together, would it be better for SEO and easier to manage? If they were separate, they might be quicker, but would that require some sort of Passport-style user system so people can log into any of the websites with the same details?
What's the best way to implement a Passport-style user system? Do you connect remotely to the databases? Run a regular cron job that updates each user's details on each domain? Maybe send a cURL request to the other sites whenever there is new data?
Any other pros/cons of keeping all the sections together or separating them?
A large site like Facebook manages to keep everything under one root, while sites like eBay have separate domain names but let you use the same login across all of them.
I'm not sure what the best option is and would appreciate any guidance.
It is a very general question, but to give some hints:
1. Measure, measure and measure again. Know which parts are used heavily and which are not.
2. Fix things and go back to 1.
Really: without knowing what takes a lot of time, what is used most heavily, etc., you cannot say anything useful.
"VPS or dedicated server" is not the right question either. You start with: what do I have to do for the users? Then: how am I going to do that (for example: in the database, in scripts, in a message queue)? Only then do you see how much hardware you need.
One domain or multiple domains doesn't really matter, with one exception: if you have lots of static content, it might be worthwhile to serve it through a CDN like Amazon's. See for example http://highscalability.com/blog/2011/12/27/plentyoffish-update-6-billion-pageviews-and-32-billion-image.html, where you can read about the possibilities a CDN opens up.
In general, serving static content from a separate domain is useful; most other things don't need a domain of their own, so you could keep everything else under one domain.
For a homework project, I'm creating a PHP-driven website whose main function is aggregating news about various university courses.
The main problem is this: (almost) every course has its own website. These are usually plain HTML or built with some simple, free CMS.
As a student taking 6-7 courses, you go through 6-7 websites almost every day, checking whether there is any news. The idea behind the project is that you don't have to do that; instead, you just check the aggregation site.
My idea is the following: each time a student logs in, go through his course list. For every course, fetch its website (recursively, like wget does) and compute a hash of it. If the hash differs from the one stored in the database, we know the site has changed, and we notify the student.
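For a single page, the check would look roughly like this (a sketch; the course_pages table and its columns are placeholder names I made up):

    <?php
    $db = new PDO('mysql:host=localhost;dbname=aggregator', 'user', 'pass');
    $courseId = 1; // example course

    $stmt = $db->prepare('SELECT url, content_hash FROM course_pages WHERE id = ?');
    $stmt->execute([$courseId]);
    $page = $stmt->fetch(PDO::FETCH_ASSOC);

    $hash = sha1(file_get_contents($page['url']));
    if ($hash !== $page['content_hash']) {
        // The page changed since the last check: store the new hash and notify
        $db->prepare('UPDATE course_pages SET content_hash = ? WHERE id = ?')
           ->execute([$hash, $courseId]);
    }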
So, what do you think: is this a reasonable way to achieve the functionality?
And if so, what is (technically) the best way to go about it? I was looking at php_curl, but I don't know whether it can fetch a website recursively.
Furthermore, there's a slight problem: I have somewhat limited resources, only a few MB of quota on a public (university) server. If that turns out to be a big problem, though, I could use a separate hosting solution.
Thanks :)
Just use file_get_contents, or cURL if you absolutely have to (in case you need cookies).
You can use your hashing trick to check for modifications, but it's not very elegant. What you really want to know is when the site was last changed. I doubt this information is on the website itself, but maybe the site offers an RSS feed or some web service or API you can use for this purpose.
Don't worry about doing recursive requests. Just make a new request each time.
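One cheap check worth trying before hashing the whole page is the Last-Modified response header (a sketch; many dynamic sites omit or fake this header, and the URL is just an example):

    <?php
    // get_headers() makes a request and returns the response headers
    $headers = get_headers('http://courses.example.edu/algorithms/', true);
    if (isset($headers['Last-Modified'])) {
        $changedAt = strtotime($headers['Last-Modified']);
        // compare $changedAt with the timestamp saved on the previous run
    }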
"When all else fails, build a scraper"
I've created a couple of small, few-page websites for one-time projects or conferences, mostly in WordPress, and I'm thinking about what will happen to those websites in the future. I don't think I'm alone: there are a great many sites out there that are now kept only as archives, but unlike in the '90s, when everything was static HTML, these sites use software to provide CMS functionality, even if it's only for a few pages plus search.
My problem is that with all this modular software (WordPress, Joomla, etc.) you need various plugins and themes to make it usable and nice, and all this functionality breaks sooner or later. That means that if you want to keep the website as is, you have to keep the old versions of the software. Forever.
On the other hand, these platforms are so popular (WordPress has more than 100 million downloads now) that I would be surprised if they did not become targets of the most popular exploits in the near future. I don't know how safe this software is, but I have experienced what it means to continuously clean and fix an osCommerce website suffering about 7 successful attacks a month, until the site's owner agreed that it was better to close the site entirely and start building a new one.
As an alternative solution (but I really don't know if it's possible), is there any way to put a whole site into a read-only mode? I mean something like making the database read-only, making the file system read-only, disabling the admin interface and all the comment fields, and just leaving the site as an archive, with the only dynamic part being the search function.
Is that possible at the file-system/database level? Will it help at all in keeping hackers out? Is there any other solution? Please understand my point: it is not possible to keep CMS sites updated forever, and even if some of us are fanatical enough to spend a night fixing a theme or plugin that broke after a core upgrade, 99% of sites will end up in a "frozen" state, running a working but old CMS/plugins/theme combination forever.
I think 99% is a very generous estimate, but that's beside the point. The majority of the sites that end up in the state you are referring to only last as long as their domain registrations (especially since most WordPress or osCommerce deployments are set up on the root domain and serve the entire web presence). So generally speaking, if the domain itself is in a state of neglect and abandonment, the natural expiration process will decommission it and it will no longer be accessible at all.
As for locking down an entire, site-wide state on one of these CMSs, it could in theory be possible if one removed all write privileges on the server files and revoked every database user privilege except SELECT. In most cases this would defeat the purpose of leaving the CMS software there at all, since none of the records would be updatable any longer (items in the case of osCommerce, posts in the case of WordPress). It is also highly dependent on the environment required by the particular CMS, and WordPress, for one, is pretty particular about having read/write permissions in order to work at all. It would make for an interesting experiment, but it probably isn't a practical solution for what you're describing.
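At the MySQL level, that lockdown would look something like this (the user and database names are hypothetical):

    REVOKE ALL PRIVILEGES ON wordpress.* FROM 'wpuser'@'localhost';
    GRANT SELECT ON wordpress.* TO 'wpuser'@'localhost';
    FLUSH PRIVILEGES;

As noted, though, WordPress wants to write to the database even during ordinary operation, so expect errors rather than a cleanly frozen site.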
Taking the rendered content and building a static mirror is another option, and it can be pretty easily automated with a script that fetches the HTML of the rendered pages and builds static, linked alternatives. But this too is a bit impractical, especially with respect to search (which by definition requires database access).
In short, it's an interesting idea, but ultimately sites that are neglected and whose owners are not committed to sustaining proper updates are doomed to expiration, and the natural course of Internet business and domain registration pretty often Darwinizes them.
Yes, you can take a snapshot of a website using wget or similar, basically replacing the CMS-driven site with static HTML pages.
wget -mk http://www.example.com/
That way you wouldn't need to update it forever.
"As an alternative solution (but I really don't know if it's possible), is there any way to put a whole site into a read-only mode? I mean something like making the database read-only, making the file system read-only, disabling the admin interface and all the comment fields, and just leaving the site as an archive, with the only dynamic part being the search function."
WP Super Cache has a "Lockdown" function that serves static HTML files to almost every visitor.
It's not exactly what you're looking for, but it's a simple workaround, as I don't know of a "read-only" function for WordPress.
http://wordpress.org/extend/plugins/wp-super-cache/
This is the scenario:
Multiple web systems (mostly LAMP/WAMP) exist, most of them with separate login information (some share it). We're considering the benefits and disadvantages of unifying them somehow, or at least making the user administration part easier to handle.
Due to the nature of some of the systems (it's a mixed bag of custom OSS systems, internally developed software and 3rd-party commercial software), we can't unify all the login screens into a single one.
An idea being passed around is a sort of login "master brain" where we can control all user creation, permissions, deactivation, etc. People would still have to log in to every system manually, but at least it would ease the administrative load of user management.
Are there any known solutions to this kind of problem that involve changing the least amount of code/systems possible?
Edit: OpenID doesn't work for us, since we have different login needs, and for some systems we can't directly control how they handle the login process (though we can control the users/passwords).
What we did was centralise all login details in one repository (Active Directory, in our case), then write a C# authentication library with wrappers for all the languages we program in (PHP, C, .NET, etc.), and then just write some glue code in the appropriate place for each application. Aside from our in-house apps, we successfully logged into MediaWiki, Subversion, ActiveCollab and Apache this way.
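On the PHP side, the glue can be as thin as an LDAP bind against the directory. A minimal sketch (the host, account suffix and variable names are assumptions for illustration):

    <?php
    // Try to authenticate a user against Active Directory over LDAP
    $password = 'secret'; // would come from the login form in practice
    $conn = ldap_connect('ldap://ad.example.local');
    ldap_set_option($conn, LDAP_OPT_PROTOCOL_VERSION, 3);
    ldap_set_option($conn, LDAP_OPT_REFERRALS, 0);

    // AD accepts the userPrincipalName form for simple binds
    if (@ldap_bind($conn, 'jdoe@example.local', $password)) {
        // Authenticated: set up the application's own session here
    }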
It does involve writing a reasonable amount of code, but not ridiculous amounts, and it will work for the future as well. I can't see a practical solution which would be easier than this.
Reading your question I note that this is more-or-less what you're thinking anyway, but it will work!
There is a big industry around this, and it is called IAM: Identity and Access Management. IAM solutions basically do what you want: they manage users and user permissions, and translate their internal state to a multitude of systems. Depending on the integration possibilities, you might get SSO (Single Sign-On) for some software, or you might get a single source of authentication. The former differs from the latter in that with SSO the user punches in the credentials only once, while with the latter he merely has the same login and password combination everywhere.
IAM would also manage user rights to the extent of its possibilities. For example, a piece of network equipment may only support a single user/password; an IAM solution would then automatically open a terminal and log the user on when he/she requests it, assuming the user is in the right security group.
Implementing an IAM solution could go a long way towards easing systems management.
I can't recommend any particular solution; just bear in mind that the transition from your current method to IAM will require not only integration with the different software, but also some change in corporate culture, as one system will bind all the others.
A lot of people seem to like OpenID for this sort of thing. I'm not sure about its intranet capabilities, though.
Another idea is to use your "brain" system to pass the authenticated username to the sibling applications as a form POST, then handle authentication on each system and create its security tickets from what was sent (a sketch follows).
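If you go that route, sign the hand-off so a sibling app can verify that the POST really came from the master. A hedged sketch (the shared secret, field names and 60-second window are my own choices):

    <?php
    // Shared secret known to the master and all sibling apps
    $secret = 'long-random-shared-secret';

    // Master side: sign the username plus a timestamp before POSTing them
    $user = 'jdoe';
    $ts   = time();
    $sig  = hash_hmac('sha256', $user . '|' . $ts, $secret);
    // ...POST user, ts and sig to the sibling application...

    // Sibling side: recompute the signature and reject stale or forged posts
    $expected = hash_hmac('sha256', $_POST['user'] . '|' . $_POST['ts'], $secret);
    if (hash_equals($expected, $_POST['sig']) && time() - (int)$_POST['ts'] < 60) {
        // Trusted hand-off: create this system's session/ticket for the user
    }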
Hope you find what you're looking for!
Cheers!