deltified document revision control in PHP

If I have a PHP application which allows users to make changes to documents, what is the best way to implement revision tracking for each document? I want the storage of each revision to be deltified (i.e. only save the changes that were made) like svn and other SCMs do with code. I know on a very simple level how it works, but when I start to think about implementing it, I get a little confused.
First and foremost, I am wondering if there is a library out there that can help me with this, so I don't have to completely roll my own.
And I am wondering: should I keep the full text of only the original document, and then only save the changes, or should I keep the full text of the latest document, and each time it is modified, save the differences as one of the older revisions?
If the former, then when I want to grab a page to be shown on the site, do I have to start at the beginning, and then recursively update the data based on the revisions, until I reach the current version? Won't this be painfully slow once there are many revisions?
How can I do diff/patch type operations in PHP to make the deltifying and reconstructing of the pages easier?
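For example, if I went the reverse-delta route (keep the latest full text, store patches that lead back to older versions), and assuming the PECL xdiff extension were available, I picture something like this rough sketch:

    <?php
    // Reverse-delta sketch using the PECL xdiff extension (assumed installed).
    // The document table always holds the full latest text; each save also
    // stores a patch that turns the new text back into the previous one.

    function makeReverseDelta(string $oldText, string $newText): string
    {
        // A unified diff that transforms $newText back into $oldText.
        return xdiff_string_diff($newText, $oldText);
    }

    function reconstruct(string $currentText, array $reverseDeltas): string
    {
        // Apply the stored deltas newest-first to walk back to an old revision.
        $text = $currentText;
        foreach ($reverseDeltas as $delta) {
            $patched = xdiff_string_patch($text, $delta);
            if ($patched === false) {
                throw new RuntimeException('Corrupt revision delta');
            }
            $text = $patched;
        }
        return $text;
    }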
Would it be worth it to have locks on the pages while users are editing them? Or should I let pages get into 'states of conflict' and have conflict-resolution operations -- letting two users modify the same page simultaneously if they're modifying different parts, etc.? I'm going crazy thinking about how hard this will be. Ahh!

This previous SO question might help.

Why don't you use a Subversion server? You can access the client from the console using exec() or similar. It is really not worth implementing something like that from scratch unless what you are writing is revisioning software.
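Roughly like this (the working-copy path, file layout and commit message are all made up):

    <?php
    // Hypothetical sketch: write the edited document into a Subversion
    // working copy and commit it, letting svn do the delta storage.
    // $newContent and $userId are assumed to come from the edit form.
    $workingCopy = '/var/data/doc-repo';
    $file = $workingCopy . '/documents/page42.txt';

    file_put_contents($file, $newContent);

    exec(
        'svn commit -m ' . escapeshellarg("edit by user $userId") .
        ' ' . escapeshellarg($file) . ' 2>&1',
        $output,
        $status
    );

    if ($status !== 0) {
        error_log('svn commit failed: ' . implode("\n", $output));
    }

Reading an old revision back is then just svn cat -r N on the same file through the same exec() route.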


unique ID for customer protection CMS

I developed a standalone flat-file based CMS. Now, in order to protect my code from being stolen by clients, I've been looking around but didn't find much that's useful for protecting my code. I have found ionCube, but I don't really like that...
I am wondering if it is possible to create a file within the CMS, a PHP file, with a unique ID for every sale I make. That file transmits a signal to a webserver with that ID, so I can track the online versions of my CMS. If it gets copied, I will see two or more versions with that ID online, and I will know which company or user distributed my code. But is it possible to make the CMS depend on that file... if a user erases the ID-transmitting file, no code is sent out... How can I make that file so it can't be deleted, or make the CMS depend on that file? Any ideas?
I am wondering if it is possible to create a file within the CMS, a PHP file, with a unique ID for every sale I make.
Yes, it's pretty easy to do. Just create a file and include it in your PHP code.
That file transmits a signal to a webserver with that ID.
No! That file will transmit an ID in an HTTP request to YOUR webserver, not just any webserver.
So I can track the online versions of my CMS.
Yes you can, until a sysadmin checks the logs and sees that their new CMS is transmitting an ID to your server. Then someone might ask why you didn't warn them, what else you are sending to your server, etc.
But is it possible to make the CMS depend on that file... if a user erases the ID-transmitting file, no code is sent out...
Yes, that's fairly easy. Just put in a check along the lines of: if (file) { work; } if (no_file) { don't work; }
And be prepared to obfuscate that code AS MUCH AS YOU CAN.
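A minimal sketch of that kind of check (the file name and ping URL are invented for illustration):

    <?php
    // Hypothetical phone-home check; license_id.php and the endpoint
    // are placeholders, not part of any real product.
    $licenseFile = __DIR__ . '/license_id.php';

    if (!is_readable($licenseFile)) {
        die('CMS disabled: license file missing.');
    }

    require $licenseFile; // defines CMS_LICENSE_ID

    // Report the installation ID back to the vendor's server.
    // Errors are silenced so a failed call does not break the page
    // (requires allow_url_fopen; cURL would work as well).
    @file_get_contents(
        'https://example.com/ping.php?id=' . urlencode(CMS_LICENSE_ID)
    );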
You are delivering PHP as source code, and any decent programmer will deobfuscate almost any code.
How can I make that file so it can't be deleted, or make the CMS depend on that file? Any ideas?
As far as I know, you can't. Almost anything can be deleted.
One idea is to create some nasty PGP public keys with hashes that are calculated and recalculated all over your CMS. But that will make your code hard to maintain, and it will put some additional load on the server...
The other solution is to keep your code on your own server. That's the only way to keep it safe.
p.s.
It would not be fair if I didn't mention that reading and editing (adding new features to) someone's PHP code is hard. It's really hard if the code is bad (speaking from experience here). It's extremely hard if it's very, very bad code!
Many 'programmers' wouldn't touch the core code of some app. 'Just gimme my framework...'
Obfuscated code is next to impossible to change if you don't have excellent coding skills + experience + lots of time.
Provided you really created your own CMS (that's not an easy task), you will be able to create OK protection :) Some guidelines:
never create just 2 links. Always link 3 or more features (functions, classes, globals, etc.) in some ludicrous way: one f() returns a result that is used in a class that creates a fake object which checks some global used in the first f() (see the toy sketch after this list).
use what you already have, but add some checks and tests within it, logical and illogical.
use long-period time tests: in odd lunar months go to this {code}, in even ones go to that {code}.
use different tests for the same thing; copy/paste is search-friendly.
be shamelessly creative :)
prepare yourself for your own traps. You are doomed without heavy documentation of your work.
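A toy illustration of the "link 3 or more features" idea (every name and check here is invented):

    <?php
    // A global, a function, and a class that only renders when the
    // round-trip through all three still agrees.
    $GLOBALS['__k'] = crc32(__FILE__) ^ 0x5f3759df;

    function seed(): int
    {
        // Depends on the global set above.
        return $GLOBALS['__k'] ^ 0x5f3759df;
    }

    class Widget
    {
        public function render(): string
        {
            // Silently degrades if seed() or the global is altered
            // without the other pieces being updated to match.
            return seed() === crc32(__FILE__) ? '<div>ok</div>' : '';
        }
    }

    echo (new Widget())->render();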

API display logic serverside or clientside

I'm refactoring an API where there are user profiles from a profiles table and profile images in a separate table. Currently the API queries the profiles table, and then loops through the images table for the associated image data (paths etc.). There is logic built in that adds a default image path when a profile image isn't set. So if we are displaying 50 profiles, there are 51 queries being run.
I'm considering refactoring where the initial profile query joins the image table. I'm now left with two options.
I can loop through the results server side to build the image paths. I will have to loop through them again client side to display the results.
I can loop through the results one time client side and build the image paths there. The path logic is easy and a simple if statement.
It seems 2 would be the logical choice. But is it? I guess this is part of a bigger question: when you are building out APIs and the client-side interfaces, when do you move code from the server to the client to keep the API fast, at the risk of slowing down the browser? How do you do this dance? I'm working on another API using Node for the jQuery DataTables plugin, where there needs to be a lot more code to marry the backend, and it's been a bit of a tug of war determining how much I should hand over to the browser. A fast API is not much use if you are crashing your visitors' browsers.
The tipping point in the decision, for me, would be:
Am I, by exposing parts of the component path so the client can build it, exposing something I don't want to?
vs.
Am I, by constructing the image paths server side, doing work that the client might not need, or that the client might have to redo, like chopping the paths up on occasion?
In terms of passing more data than is needed, I'm not seeing an issue from what you've said, so the first question would be the one with the highest priority for me.
It's a bit of a stretch in this scenario, but the client having to know how to compose the image path sets a few constraints, whereas if it's all done server side the implementation details are hidden. Despite them being simple, that would be my default option.
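A minimal sketch of that default option, with assumed table and column names:

    <?php
    // Hypothetical schema: profiles(id, name), profile_images(profile_id, path).
    // One LEFT JOIN replaces the N+1 image lookups; the default-image
    // logic stays server side so the client never sees it.
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

    $stmt = $pdo->query(
        'SELECT p.id, p.name, i.path
           FROM profiles p
      LEFT JOIN profile_images i ON i.profile_id = p.id'
    );

    $profiles = [];
    foreach ($stmt as $row) {
        $row['image_url'] = $row['path'] !== null
            ? '/images/profiles/' . $row['path'] // assumed base path
            : '/images/default-profile.png';     // fallback image
        unset($row['path']);
        $profiles[] = $row;
    }

    header('Content-Type: application/json');
    echo json_encode($profiles);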
As you've said, it's a tug of war. Another way to look at issues like this is that the "right" answer can depend on when you ask the question. You could go one way, and then a bit later some new requirement pops up, and now it's the wrong one...
Simple and consistent is the thing to aim for. Right as in best? 20/20 hindsight time.
When I've seen and done this in the past, I've found it better not to store images in a DB (which is what it sounds like you are doing). Put them in a place where the browser can link to them, and pass the path from the server.
If I understand you correctly... you are displaying some sort of profile list where each profile has an associated image... right?
Abstracting from the way you store the images (DB or VFS images are faster, but plain files - with at least a minor MRU cache - are easier to maintain).
Solution numero uno is the right way to go.
It is just simpler and more "RESTful". I am a huge fan of client logic, but we should use it for a good cause (such as Soopa-UI). The same goes for DB logic code vs server logic code. I dislike SQL and having to maintain another layer of problems, but I do understand the difference it makes to the final result in some cases.
EDIT: Oh... you are storing just paths.
So if you are not building some fancy one-page web app, then there is another problem with building paths client side: the client would have to wait for the script to finish loading before the images would even start to load.

Analytics, statistics or logging information for a PHP script

I have a WordPress plugin, which checks for an updated version of itself every hour with my website. On my website, I have a script running which listens for such update requests and responds with data.
What I want to implement is some basic analytics for this script, which can give me information like the number of requests per day, the number of unique requests per day/week/month, etc.
What is the best way to go about this?
Use some existing analytics script which can do the job for me
Log this information in a file on the server and process that file on my computer to get the information out
Log this information in a database on the server and use queries to fetch the information
Also there will be about 4000 to 5000 requests every hour, so whatever approach I take should not be too heavy on the server.
I know this is a very open ended question, but I couldn't find anything useful that can get me started in a particular direction.
Wow. I'm surprised this doesn't have any answers yet. Anyways, here goes:
1. Using an existing script / framework
Obviously, Google Analytics won't work for you since it is JavaScript based. I'm sure there exist PHP analytics frameworks out there. Whether you use one or not is really a matter of personal choice. Do these existing frameworks record everything you need? If not, do they lend themselves to being easily modified? You could use a good existing framework and choose not to reinvent the wheel. Personally, I would write my own, just for the learning experience.
I don't know any such frameworks off the top of my head because I've never needed one. I could do a Google search and paste the first few results here, but then so could you.
2. Logging to a file or MySQL
There is absolutely NO GOOD REASON to log to a file. You'd first log it to a file, then write a script to parse this file. Tomorrow you decide you want to capture some additional information; you now need to modify your parsing script. This will get messy. What I'm getting at is that you do not need to use a file as an intermediate store before the database. 4-5K write requests an hour (I don't think there will be a lot of read requests, apart from when you query the DB) is a breeze for MySQL. Furthermore, since this DB won't be used to serve up data to users, you don't care if it is slightly un-optimized. As I see it, you're the only one who'll be querying the database.
EDIT:
When you talked about using a file, I assumed you meant to use it as a temporary store only, until you process the file and transfer the contents to a DB. If you did not mean that, and instead meant to store the information permanently in files, that would be a nightmare. Imagine trying to query for certain information that is scattered across files. Not only would you have to write a script that can parse the files, you'd have to write a non-trivial script that can query them without loading all the contents into memory. That would get nasty very, very fast and tremendously impair your ability to spot trends in the data, etc.
Once again: 4-5K might seem like a lot of requests, but a well-optimized DB can handle it. Querying a reasonably optimized DB will be orders of magnitude faster than parsing and querying numerous files.
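To give a feel for how little is involved, a sketch with an assumed table layout:

    <?php
    // Hypothetical table:
    //   CREATE TABLE update_requests (
    //       id INT AUTO_INCREMENT PRIMARY KEY,
    //       ip VARBINARY(16) NOT NULL,
    //       requested_at DATETIME NOT NULL,
    //       KEY idx_requested_at (requested_at)
    //   );
    $pdo = new PDO('mysql:host=localhost;dbname=stats', 'user', 'pass');

    // Log one update-check request.
    $stmt = $pdo->prepare(
        'INSERT INTO update_requests (ip, requested_at) VALUES (?, NOW())'
    );
    $stmt->execute([inet_pton($_SERVER['REMOTE_ADDR'])]);

    // Requests and unique requesters per day are then a single query:
    //   SELECT DATE(requested_at) AS day,
    //          COUNT(*)           AS requests,
    //          COUNT(DISTINCT ip) AS unique_requests
    //     FROM update_requests
    //    GROUP BY day;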
I would recommend using an existing script or framework. It is always a good idea to use a specialized tool in which people have invested a lot of time and ideas. Since you are using PHP, Piwik seems to be one way to go. From the webpage:
Piwik is a downloadable, Free/Libre (GPLv3 licensed) real time web analytics software program. It provides you with detailed reports on your website visitors: the search engines and keywords they used, the language they speak, your popular pages…
Piwik provides a Tracking API, and you can track custom variables. The DB schema seems highly optimized; have a look at their testimonials page.

Are there any performance issues with storing content in a database, instead of on a normal ASPX or PHP page?

A colleague and I were discussing the best way to build a website last week. We both have different ideas about how to store content on the website. The way I have always approached this has been to store any sort of text or image link (not image file) in a database. This way, if I needed to change a letter or a sentence, I would just need to go to the database. I wouldn't have to touch the actual web page itself.
My colleague agreed with this up to a point. He thinks that there are performance issues related to retrieving content from the database, especially if every character of content is coming from the database. When he builds a website, any content that won't be changed often (if at all) is hard-coded on the page, and any content that would be changed or added regularly comes from the database.
I can't see the benefit of doing it like this, just from the perspective that every time we make a change to an ASPX page we need to re-compile the site to upload it. So if a page has a misspelt "The" (so it'd be like "Teh"), we have to change it on the page, recompile the site (the whole site) and then upload it.
Likewise, my colleague thinks that if everything were to come from the database, there would be performance issues with the site and the database, and the overall loading speed of the web page in the browser would decrease.
What we were both left wondering was: if a website drew everything from the database (not HTML code as such, more like content for the headers, footers, links etc.), would it slow down the website? And if there is a performance issue, what would be better: a 100% database-driven website with its performance issues, or a website that contains hard-coded content, which would mean 10-20 minutes spent compiling and uploading the site just for the sake of a one-word or one-letter change?
I'm interested to see if anyone else has run into this, or has their own thoughts on the subject.
Cheers
Naturally it's a bit slower to retrieve information from a database than directly from the file system. But do you really care? If you design your application correctly, then:
a) you can implement caching so that the database is not hit for every page (see the sketch after this list)
b) the performance difference will be tiny anyway, particularly compared to the time to transmit the page from the server to the client
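For (a), a minimal file-based sketch (cache path, TTL, and the render function are assumptions):

    <?php
    // Serve rendered pages from a disk cache and only hit the
    // database when the cached copy is missing or stale.
    $cacheFile = sys_get_temp_dir() . '/page_' . md5($_SERVER['REQUEST_URI']) . '.html';
    $ttl = 300; // seconds

    if (is_file($cacheFile) && time() - filemtime($cacheFile) < $ttl) {
        readfile($cacheFile); // cache hit: no database query at all
        exit;
    }

    $html = renderPageFromDatabase(); // hypothetical: queries the content tables
    file_put_contents($cacheFile, $html, LOCK_EX);
    echo $html;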
A 100% database approach opens up the potential for more flexibility and features in your application.
This is a classic case of putting caching / performance considerations before features / usability. Bottlenecks rarely occur where or when you expect them to - so focus on developing a powerful application and then implement caching later - when it's needed and where it's needed.
I'm not suggesting storing templates as static files is a bad idea - just that performance shouldn't be your primary driver in making these assessments. Static templates may be more secure or easier to edit using your development tools for example.
Hardcode the strings in the code (unless you plan to support multiple languages).
It is not worth the
extra code required for maintaining the strings
the added complexity
and the possible performance penalty
Would you extract the string "Cancel" from a button?
If so, would you be using the same string on multiple cancel buttons? Or one for each?
If you decided to rename one button to "Cancel registration", how do you identify which "Cancel" to update in the database? You would be forced to set up a working process around how to deal with this, and in my opinion it's just not worth it.

Light Blogging system sans database

This is a general programming question.
What is the best way to make a light blogging system that can handle images, bbcode-ish styling and text without a database back end? Light means no more than 50 to 100 posts, even in extreme cases.
What language(s) should be used? Is there any preferred data format for the information? How does security play out?
EDIT: Client has no database, is on a shared server. Can't change that. Therefore, no DB.
EDIT2:
Someone mentioned SQL Compact - does that require anything more than copying files to the server? The key here, again, is that things shouldn't require any more permissions than FTP access.
If you're looking to do it yourself: store each post as a file in a directory. Then, to sort and limit the posts, you rely partially on the file names to order and limit them, and potentially (in the case of a search) on reading every last file. Don't go letting users make 10,000 posts, though. But yeah, the above is considered a flat-file data format. You can get fancy by using a standard format like JSON, YAML, or XML within each post file, and even fancier by requesting these with Ajax calls in mostly client-side code.
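A minimal sketch of that flat-file approach (directory layout and field names are assumptions):

    <?php
    // Assumed layout: posts/ holds one JSON file per post, named
    // YYYY-MM-DD-slug.json so sorting file names orders posts by date.
    $files = glob(__DIR__ . '/posts/*.json');
    rsort($files); // newest first, purely by file name

    foreach (array_slice($files, 0, 10) as $file) { // latest 10 posts
        $post = json_decode(file_get_contents($file), true);
        printf(
            "<h2>%s</h2>\n<div>%s</div>\n",
            htmlspecialchars($post['title']),
            $post['body_html'] // assumed to be pre-rendered markup
        );
    }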
Now, if the reason you want to work with flat files is that you just don't want to install a database server, there's nothing stopping you from reading a local (to the server) file as a Berkeley DB, a Lucene index, or an SQLite DB from within your webapp using the appropriate client library. You'll find any of these approaches a little more sane (a bit faster, a bit more readable in code) than the aforementioned, with all the same requirements for installing on the server (read-write file permissions). Many web frameworks and languages (like PHP) come with an API to these client libraries; SQLite and Lucy (C Lucene) in particular.
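SQLite in particular needs nothing beyond a writable file on the server, which fits the FTP-only constraint; a minimal sketch:

    <?php
    // The database is just one file next to the script - no server process.
    $db = new PDO('sqlite:' . __DIR__ . '/blog.sqlite');
    $db->exec('CREATE TABLE IF NOT EXISTS posts (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        title TEXT NOT NULL,
        body  TEXT NOT NULL,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )');

    $stmt = $db->query(
        'SELECT title, body FROM posts ORDER BY created_at DESC LIMIT 10'
    );
    foreach ($stmt as $post) {
        printf("<h2>%s</h2>\n", htmlspecialchars($post['title']));
    }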
If you're just looking for examples of it being done: I first came across blosxom (I think in 1999 or 2000), which is a Perl script that either runs as a CGI script per request or as a cron job. It builds a dated index of "posts" based on whatever you throw into the directory it's meant to scan. It also builds an RSS feed.
Jekyll or Blogofile would be my favorite kind of solution for that: "compiling pages before upload".
I'm going to go out on a limb here and say that it's not always the destination, but the Journey.
If you're going to set out to do this, I recommend using a language you are comfortable with. Personally, this would be C#/.NET for me, but from your tagging, I'll assume PHP would be the server-side scripting language you would choose.
I would lay out how I wanted my application to behave. If there is going to be a lot of data, you should consider (as dlamblin mentioned) a DB of some sort for lookup and retrieval. (Light blog, not so much data... 1000 users can edit? Maybe you should consider a DB.) Once you've decided how to store the data, decide how to present it.
Write some proof of concept code for each of the features you want to implement (blog templating, bbcode, user authentication, text searching...) and start to work them all together.
Search for flat-file CMSes on Google, for example:
http://www.flatcms.org/
This has already been done, so there is no need to create such a CMS again; there are plenty of them.
I concur with dusoft that this has already been done.
DotNetBlogEngine.net is an ASP.NET (C#) based blogging system that has a nice XML back-end as an option.
This doesn't answer your question directly, but check out Unify.
If you do not want to write a new one or want to get some inspiration:
Flatpress
Simple PHP Blog
Ninja Designs are working on a db-free wordpress clone
You could either use XML, or use SQL Compact (which allows for handling things just like SQL Server, but instead of a database server you utilize flat files).
