I'm currently writing a little library for detecting "bad" words in content (see here), and I'm having a little trouble deciding how/where to namespace a specific class.
The usage flow of my library so far is as follows:
$dictionary = new Dictionary\Csv('/path/to/file.csv');
$config = new Filter\Config\Standard();
$filter = new Filter($dictionary, $config);
Basically you create a Dictionary of words, a Filter\Config which defines how the Filter executes, and then create a Filter from said objects.
Internally, the Filter uses the Filter\Config to convert the Words in the Dictionary to regular expressions.
Now my problem is I don't know what to call and/or where to put this "converter".
My current ideas are,
Word\RegExpConverter (as there is a Word class to represent a word)
Word\Converter\RegExp
Filter\RegExpConverter
Because the Word is being converted, it seems to make sense to have it in the Word\ namespace, but at the same time it's something specific to the Filter and requires the Filter\Config.
Thoughts? Ideas?
Cheers, Steve
So you've got a class called Word that Filter consumes (via the converter)?
You may be overdoing it -- first, consider just having Filter consume Words and convert them internally. If you have no plans to reuse the regex-conversion functionality elsewhere, and you have no concrete plans to provide an alternative converter to be used in place of the RegExConverter, there's no good reason to stick it in its own class.
It seems like providing some other (non-RegEx) converter is more likely.
If that's the case, that should be a hint. Perhaps something like this would be appropriate:
Filter\WordConverter\RegEx
Filter\WordConverter\Magic
Filter\WordConverter\AbstractWordConverter (or maybe I'm just an interface?)
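A minimal sketch of that layout, for illustration only. The `WordConverter` interface, its `convert()` method, the `isClean()` method and the simplified `Filter` constructor are all assumptions, not part of the library in the question, and the class names are flattened out of their namespaces to keep the snippet in one file:

```php
<?php
// Hypothetical interface: the question does not define one.
interface WordConverter
{
    public function convert(string $word): string;
}

// Turns a word into a case-insensitive, whole-word regular expression.
final class RegExWordConverter implements WordConverter
{
    public function convert(string $word): string
    {
        return '/\b' . preg_quote($word, '/') . '\b/i';
    }
}

// The filter depends only on the interface, so another converter can be swapped in.
final class Filter
{
    private array $words;
    private WordConverter $converter;

    public function __construct(array $words, WordConverter $converter)
    {
        $this->words = $words;
        $this->converter = $converter;
    }

    // Returns false as soon as any converted word matches the content.
    public function isClean(string $content): bool
    {
        foreach ($this->words as $word) {
            if (preg_match($this->converter->convert($word), $content) === 1) {
                return false;
            }
        }
        return true;
    }
}

$filter = new Filter(['darn', 'heck'], new RegExWordConverter());
```

The extra interface only pays for itself once a second converter actually exists, which is the point made above.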
After reading timdev's answer and some reflection, I see he was right: I was trying to over-abstract my architecture.
I will only ever want to convert a Word into a regular expression. If I were going to convert into multiple formats, I would then introduce converters and inject them.
And because the Filter\Config exists to define how that regular expression should be generated, it only makes sense that that's where the logic should reside.
So my implementation will be,
$config->generateRegExp($word);
Cheers.
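For anyone reading later, here is roughly what that could look like. This is only a sketch: `Word::getText()`, the `StandardConfig` name and the whole-word, case-insensitive matching rule are assumptions made for the example, and only `generateRegExp()` comes from the post above.

```php
<?php
// Stand-in for the library's Word class; its real API is not shown in the post.
final class Word
{
    private string $text;

    public function __construct(string $text)
    {
        $this->text = $text;
    }

    public function getText(): string
    {
        return $this->text;
    }
}

// Stand-in for Filter\Config\Standard; the matching rules here are assumptions.
final class StandardConfig
{
    public function generateRegExp(Word $word): string
    {
        // Escape the word, then match it as a whole word, ignoring case.
        return '/\b' . preg_quote($word->getText(), '/') . '\b/i';
    }
}

$config = new StandardConfig();
$regExp = $config->generateRegExp(new Word('darn'));
```

A different config class could return a stricter or looser expression from the same method without the Filter changing.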
I was learning about the factory design pattern in PHP. From what I understand, this pattern is useful in cases where we have a bunch of classes, let's say class_1, class_2, class_3, etc.
If the particular class which has to be instantiated is known only at runtime, then instead of using the new operator to create the objects for these classes we create a factory class which will do the job for us.
The factory class will look somewhat like this:
class Factory
{
    // $type will have values 1, 2, 3 etc.
    public function create_obj($type)
    {
        $class_name = "class_".$type;
        if(class_exists($class_name))
        {
            return new $class_name();
        }
    }
}
My question is: what is the advantage of using a factory class here? Why not just use a simple function instead of a class, which is going to complicate things?
The method in your code snippet is not a factory method, but merely a helper method which does a well-known reflective task: instantiates a class based on its name. This is exactly the opposite of what a Factory pattern is for: creating objects (products) without specifying the exact class of object that will be created.
As explained in Wikipedia:
The essence of this pattern is to "Define an interface for creating an object, but let the classes that implement the interface decide which class to instantiate."
You are probably confused by the last PHP example in the Wikipedia article on Factory pattern, and yes, it is a bad example. Check the Java example just above that for a meaningful example (whoever tried to convert that to PHP missed the whole point). The Java example returns a file reader based on its extension, and that is exactly the use case for a factory pattern. Creating your own personal "rule" that certain classes need to have a certain name prefix is most likely a bad design decision.
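To make the extension-based use case concrete, here is a rough PHP sketch in the spirit of the example described above. It is not a port of the Wikipedia code, and every name in it (`ImageReader`, `GifReader`, `JpegReader`, `ImageReaderFactory`, `forFile()`) is made up for illustration:

```php
<?php
// Callers program against this interface and never name a concrete reader.
interface ImageReader
{
    public function describe(): string;
}

final class GifReader implements ImageReader
{
    public function describe(): string
    {
        return 'GIF reader';
    }
}

final class JpegReader implements ImageReader
{
    public function describe(): string
    {
        return 'JPEG reader';
    }
}

final class ImageReaderFactory
{
    // Picks the concrete class from the file extension.
    public static function forFile(string $path): ImageReader
    {
        $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

        switch ($extension) {
            case 'gif':
                return new GifReader();
            case 'jpg':
            case 'jpeg':
                return new JpegReader();
            default:
                throw new InvalidArgumentException("Unsupported image type: {$path}");
        }
    }
}

$reader = ImageReaderFactory::forFile('holiday.JPG');
```

The mapping from extension to class lives in one place, and no naming convention is forced on the product classes.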
At its root, the answer to the question is yes: you could use a simple function to accomplish the goal. Where this breaks down is the best practice of aiming for low coupling and high cohesion.
The function itself plays a special role in your application design and to put it alongside other functions with different roles and purposes is non-intuitive to maintain and read. Remember, patterns are used to simplify common problems that are faced (almost) universally through project domains and as a result they tend to be segmented from the rest of the code base in order to help differentiate them.
Additionally, by placing the pattern in its own class any classes that need to use it do not need to know the class structure of class_1/2/3/etc. and instead only need to refer to the parent class allowing you to create further classes down the line, modify the pattern accordingly without needing to resolve dependencies and links in your remaining code. This ties back to the low coupling.
The concept is that you design to an interface then you can swap out the class later.
Forget this pattern for a minute and consider this:
if ($type == "manager") {
    $employee = new manager();
} else {
    $employee = new employee();
}
$employee->name = "myname";
In this case employee and manager both inherit from the same class. After the if statement you can treat them like people and you are abstracted from their actual implementation. Instead of having if statements all over the place, you can implement the factory pattern. If you only have a couple the pattern is probably overkill. If you want to easily extend the program in the future, consider a pattern.
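As a sketch of moving that if statement into one place: the `Person` base class, the `title()` method and the `EmployeeFactory` name are assumptions added for the example.

```php
<?php
// Shared parent, so callers can treat both kinds of object the same way.
abstract class Person
{
    public string $name = '';

    abstract public function title(): string;
}

class Employee extends Person
{
    public function title(): string
    {
        return 'Employee';
    }
}

class Manager extends Person
{
    public function title(): string
    {
        return 'Manager';
    }
}

final class EmployeeFactory
{
    // The only place that knows which concrete class belongs to which type.
    public static function create(string $type): Person
    {
        return $type === 'manager' ? new Manager() : new Employee();
    }
}

$employee = EmployeeFactory::create('manager');
$employee->name = 'myname';
```

Adding a third kind of person then means touching the factory, not every call site.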
Another important reason for using the Factory Pattern is to consider what happens to your code when you have to add classes to your design & code.
If you're not using a Factory Pattern, your code is going to be increasingly tightly coupled, you'll have to make changes in many different places. You'll have to ensure that every place you have to touch the code is coordinated with all of the other (tightly coupled) places you'll have to touch. Testing becomes much, much more complicated, and something is going to break.
The Factory Pattern gives you a way to reduce coupling and helps you to encapsulate responsibilities into just a few places. With a Factory Pattern, adding additional classes means touching the code in fewer places. Testing (constructing test cases as well as running tests) is simplified.
In the real world, most code is complex 'enough' that the benefits of the Factory Pattern are clear. Changing, refining and growing the object model, making testing as complete and rigorous as possible in the face of rapid change, and ensuring that you're making your code as non-rigid as possible (all while realizing that multiple people are going to be working on it over the course of months/years) -- the Factory Pattern is usually a no-brainer.
With a trivial example, it can be hard to see the advantages of using the Factory Pattern. (And if your code really is trivial, then the pattern probably won't buy you much.) That's a problem with many examples I see when I search for it on the web -- the examples tend to focus on 'you can determine the class at run-time!' and are simplistic.
Here's one example that's not too trivial, and I think it gets people thinking about all of the possible benefits of the pattern:
A presentation on the Factory Pattern by Bob Tarr (pdf). (It's example 2, starting about page 10.) Imagine you're writing a maze game where a person has to explore a maze and all the rooms in a maze. Your object model includes a Maze that consists of things like Doors, Rooms, Walls, and there's a Map that also has to keep track of them all. Simple enough. But what happens when you start adding Enchanted Rooms and Enchanted Doors and Magic Windows and Talking Pictures and Twisty Little Passages? You're going to end up with a lot of classes to represent everything; you want to make sure that you have to change (touch) as little code as possible when you add a new class. And you don't want to have to modify the code in the Map class, for instance, each time you add a new class: you want to keep the classes focused on what they should really be responsible for.
Think not just about what gets instantiated at run time, but also about code complexity.
He also gives an example of using the Factory Pattern (a Factory Method, specifically) with UI components -- where the Factory Pattern turns up a lot. (For a beginner, or someone who has never dealt with UI code, I don't think that example is quite as clear.)
Remember that most coding is done on existing code: most time is spent modifying code that's already there. You want to be sure that your code will be able to handle changes without being fragile. Reducing coupling and encapsulating responsibility will go a long way in making it so.
I've been struggling to find good way to implement my system which essentially matches the season and episode number of a show from a string, you can see the current working code here: https://github.com/huddy/tvfilename
I'm currently rewriting this library and want a nicer way to implement how the match happens. Currently, the way it essentially works is:
There's a folder with classes in it (called handlers), every handler is a class that implements an interface to ensure a method called match(); exists, this match method uses the regex stored in a property of that handler class (of which there are many) to try and match a season and episode.
The class loads all of these handlers by instantiating each one into a array stored in a property, when I want to try and match some strings the method iterates over these objects calling match(); and the first one that returns true is then returned in a result set with the season and episode it matched.
I don't really like this way of doing it, it's kind of hacky to me, and I'm hoping a design pattern can help, my ultimate goal is to do this using best practices and I wondered which one I should use?
The other problems that exist are:
More than one handler could match a string, so they have to be in an order that prevents the greedier ones from matching first. I'm not sure this is solvable, as some of the regex patterns have to be greedy. A score system might work, something that shows a percentage of how likely the match is to be correct, but I'd have no idea how to actually implement it.
I'm not sure if instantiating all those handlers is a good way of doing it. Speed is important, but using best practices and sticking to design patterns to create good, extensible and maintainable code is my ultimate priority. It's worth noting the handler classes sometimes do other things than just regex matching; they sometimes prep the string to be matched by removing common words etc.
Cheers for any help
Billy
Creating a class for each regexp is very inefficient, you are confusing classes with data here. You could store all regular expressions in a configuration array or a separate class or XML file - does not matter. Then a single method could accept all regular expressions, iterate through them and perform matching.
In case a season is not always match[1], you can use named subpatterns - that would solve that problem.
As for your order of patterns problem, you could simply put all patterns in your preferred order - from most specific ones to more general ones.
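Something along these lines, as a rough sketch. The three patterns, their names and the `matchEpisode()` function are made up for the example and are not taken from the tvfilename repository:

```php
<?php
// Hypothetical patterns, ordered from most specific to most general.
// Named subpatterns mean the caller never depends on group positions.
$patterns = [
    'SnnEnn'          => '/S(?P<season>\d{1,2})E(?P<episode>\d{1,2})/i',
    'nxnn'            => '/(?P<season>\d{1,2})x(?P<episode>\d{2})/i',
    'SeasonNEpisodeN' => '/Season\s*(?P<season>\d+).*?Episode\s*(?P<episode>\d+)/i',
];

// Tries each pattern in order and returns the first hit, or null if none match.
function matchEpisode(string $filename, array $patterns): ?array
{
    foreach ($patterns as $name => $regex) {
        if (preg_match($regex, $filename, $m) === 1) {
            return [
                'format'  => $name,
                'season'  => (int) $m['season'],
                'episode' => (int) $m['episode'],
            ];
        }
    }

    return null;
}

$result = matchEpisode('Some.Show.S02E05.720p.mkv', $patterns);
```

Handlers that need to prepare the string first would not fit into a plain array like this, so it only covers the pure regex cases.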
You can adapt this pattern to implement complex case analysis to PHP. It's more or less what you've been doing. You define all the cases, you implement a condition which says when the case applies, and how to solve the problem when you are inside that case. The pattern will allow you to decide what to do if multiple cases apply (choose one, give one preference over another, or whatever you want).
It'd also be a good idea to name your formats something nicer, for example for S01E01:
SddEdd
SnnEnn
SDigitDigitEDigitDigit
STwoDigitsETWoDigits
instead of format1, format2.
You could also modify the pattern a little to use object instances for both the condition and the resolution of the case, so you'll be able to handle all the RegExp cases with a single class
new RegexpCase("S(?:\d{2})E(?:\d{2})");
and all the other cases that aren't just regexps with a class to solve that case.
I think you need a preference order for your order of patterns based on the parameter you want.
I think the second answer really answers your question well. Also, you seem to be doing a good job with your code. It seems quite well-written.
I personally would prefer to use separate classes in this instance, your code base will be far more flexible should you take that approach (ie if you really need to manipulate the string). If you take a look at how Zend implements Zend_Validate and Zend_Filter they have a very similar approach to the current implementation (loop over a property running ->validate() and ->filter() on the classes).
I would have a structure similar to this:
App_Tv_Match
App_Tv_Match_Abstract
App_Tv_Match_Collection
App_Tv_Match_SXXEXX
App_Tv_Match_SeasonXEpisodeX
(Your naming may become irritating for the classes however).
However in the abstract I would have a setup similar to this:
abstract class App_Tv_Match_Abstract
{
    protected $_returnOnMatch = false;
    protected $_priority = 1;
}
And my App_Tv_Match_Collection class would have the match classes injected - the collection class would then handle the sorting and matching using the match classes. If a Match class had "returnOnMatch" flagged as true then if matched I would stop trying and return this one (ie for the non-greedy ones), however if no returnOnMatch classes matched then I would return the one with the highest priority (either using sorting, or a simple loop in the collection class).
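A rough sketch of how the collection could work. The `match()` signature, the accessor methods, the `add()` method and the `App_Tv_Match_ThreeDigits` class are assumptions added for the example; only the two properties and the other class names come from the outline above.

```php
<?php
abstract class App_Tv_Match_Abstract
{
    protected $_returnOnMatch = false;
    protected $_priority = 1;

    // Hypothetical signature: returns ['season' => int, 'episode' => int] or null.
    abstract public function match(string $filename): ?array;

    public function returnOnMatch(): bool
    {
        return $this->_returnOnMatch;
    }

    public function priority(): int
    {
        return $this->_priority;
    }
}

class App_Tv_Match_SXXEXX extends App_Tv_Match_Abstract
{
    protected $_returnOnMatch = true; // specific pattern: accept it immediately

    public function match(string $filename): ?array
    {
        if (preg_match('/S(\d{2})E(\d{2})/i', $filename, $m) !== 1) {
            return null;
        }
        return ['season' => (int) $m[1], 'episode' => (int) $m[2]];
    }
}

// Made-up greedy fallback: "312" is read as season 3, episode 12.
class App_Tv_Match_ThreeDigits extends App_Tv_Match_Abstract
{
    protected $_priority = 5;

    public function match(string $filename): ?array
    {
        if (preg_match('/\b(\d)(\d{2})\b/', $filename, $m) !== 1) {
            return null;
        }
        return ['season' => (int) $m[1], 'episode' => (int) $m[2]];
    }
}

class App_Tv_Match_Collection
{
    private array $matchers = [];

    public function add(App_Tv_Match_Abstract $matcher): void
    {
        $this->matchers[] = $matcher;
    }

    public function match(string $filename): ?array
    {
        $best = null;
        $bestPriority = PHP_INT_MIN;

        foreach ($this->matchers as $matcher) {
            $result = $matcher->match($filename);
            if ($result === null) {
                continue;
            }
            if ($matcher->returnOnMatch()) {
                return $result; // stop at the first "return on match" hit
            }
            if ($matcher->priority() > $bestPriority) {
                $best = $result; // otherwise remember the highest-priority hit
                $bestPriority = $matcher->priority();
            }
        }

        return $best;
    }
}

$collection = new App_Tv_Match_Collection();
$collection->add(new App_Tv_Match_ThreeDigits());
$collection->add(new App_Tv_Match_SXXEXX());
$result = $collection->match('Show.S02E05.mkv');
```

Because the loop only falls back to the priority comparison after every matcher has been tried, the order in which matchers are added does not affect which "return on match" class wins over a greedy one.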
You need two functions:
1. Retrieves plain text from a DB.
2. Based on 1, retrieves rich text - with new lines, font styles etc.
You expect function 2 to be much more used than function 1.
Would you name 1 text() and 2 - rich_text() or would you use the simpler text() for the name of 2, since it's expected to be more popular and use something like plain_text() for 1?
The more general question would be - do you consider function's expected "popularity" when naming it?
No, I don't consider a function's popularity. It's best to have descriptive names for both functions (e.g. one is plain_text() and one is rich_text()).
I think that it's best to use relatively specific names for all functions, since using more general names 1.) doesn't give the user too much of an idea of what the function does by reading the name and 2.) can lead to confusion.
Of course, how you name your functions is your choice: I just recommend that you give them somewhat descriptive names and that you name them (and order the arguments) consistently.
I hate nothing more than programming with an API where every function name is short and cryptic (PHP's string functions are that way, even though I'm used to it now -- strstr and strtok are hardly intuitive names for what they do).
It's worth thinking about popularity, sometimes, if the code is going to be very widely used. Common words in natural languages tend to be short.
Unless you think you're writing the next UNIX, however, you're probably better off to make the names descriptive and not worry about length.
I'd go for getPlainText() and getRichText().
As a rule of thumb, I always name my functions loosely based on what they do. If I was in your shoes, I would name function one retrieve_plain_text() and function two retrieve_rich_text(). I can now glance at either function name and immediately have a basic understanding of what the function is supposed to do: Retrieve (get from something, in this case the database) plain/rich (the type) text.
I'm trying to find out what the best way would be to have a specification pattern in PHP where the specifications could (optionally) be transformed to PHP.
I am exploring some new directions and am testing how well they would work. Code and ideas are still very unclear in my mind.
Minimal interfaces would be like these:
interface IRepository {
    public function get(ISpecification $specification);
}
interface ISpecification {
    public function isSatisfiedBy($candidate);
}
If the repository hides a sql database the specification would need to transform to sql. Adding a ->toSQL() method seems ad hoc. A class that translates the specifications is also an option but it seems like a lot of overhead to finally generate the sql.
Ideas appreciated.
Quoting from POEAA (pg.324):
Under the covers, Repository combines Metadata Mapping (306) with a Query Object (316) to automatically generate SQL code from the criteria. Whether the criteria know how to add themselves to a query, the Query Object (316) knows how to incorporate criteria objects, or the Metadata Mapping (306) itself controls the interaction is an implementation detail.
The criteria in this description are of course your Specification pattern. I'd say your suggested approach of using a toSQL method on the criteria objects is fine when the application is relatively small. Like you already said, going the other routes is more difficult, but it also provides greater flexibility and decoupling. In the end, only you can decide.
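For the small-application route, a sketch of the toSQL approach could look like the following. The `ISqlSpecification` interface, the `MinimumAgeSpecification` class, the return shape of `toSQL()` and the `users` table are all assumptions made for the example:

```php
<?php
interface ISpecification
{
    public function isSatisfiedBy($candidate);
}

// Hypothetical extension for specifications that can also be expressed as SQL.
interface ISqlSpecification extends ISpecification
{
    // Returns [whereClause, boundParameters].
    public function toSQL(): array;
}

final class MinimumAgeSpecification implements ISqlSpecification
{
    private int $minimumAge;

    public function __construct(int $minimumAge)
    {
        $this->minimumAge = $minimumAge;
    }

    // In-memory check, for candidates that are already loaded.
    public function isSatisfiedBy($candidate)
    {
        return $candidate['age'] >= $this->minimumAge;
    }

    // The same rule as a parameterised WHERE fragment.
    public function toSQL(): array
    {
        return ['age >= :minimum_age', [':minimum_age' => $this->minimumAge]];
    }
}

$spec = new MinimumAgeSpecification(18);
[$where, $params] = $spec->toSQL();
$sql = 'SELECT * FROM users WHERE ' . $where;
```

Keeping the SQL behind a second interface means a repository can fall back to `isSatisfiedBy()` for specifications that have no SQL form.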
I am wondering what the best practices are regarding functions and objects. For example, I want to perform an action called tidy. It will take my data as input, tidy it and return it.
Now I can do this in two ways. One using a simple function and the other using a class.
Function: $data = tidy($data);
Class:
$tidy = new Tidy();
$data = $tidy->clean($data);
Now the advantage in making it a class is that I do not have to load the class before. I can simply use the autoload feature of php to do so.
Another example is the database class. Now everyone seems to be using a separate class for db connectivity. But we usually have a single object of that class only. Isn't this kind of contrary to the definition of class and objects, in the sense that we are using the class only to instantiate a single object?
I kind of don't understand when to use a function and when to use a class. What is the best practice regarding the same? Any guidelines?
Thank you,
Alec
For something that does one thing, and only one thing, I'd just use a function. Anything more complex, and I'd consider using an object.
I took the time to poke through some piles of (arguably ugly and horrible) PHP code and, really, some things could have been done as objects, but were left as functions. Time conversions and string replacements.
Functions typically do one specific task.
Objects represent something that has tasks associated with it (methods).
Use a function for tidy. Plain and simple. ;-)
I'd personally make a 'data' object that handles data then have a tidy method under it.
This pattern will allow the number of tasks you do on data to expand while containing it all in a nice little self-contained chunk.
For your case, I'd make it a function, possibly a static function in something like a "util" class (for which the only purpose of the class is to act like a namespace - it'll group all your random useful methods together). As a rule of thumb, only use an object if it needs to store some data that needs to live between multiple function calls. That's why the database methods are made to be part of an object, because they store a database handle which is used between multiple function calls. Yes, there only ever is one database object, but having it as an object groups all the database-related stuff into one place, making it easier to maintain and keep bug-free.
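To illustrate the difference, here is a small sketch. The `Util` and `Log` classes and the whitespace rule in `tidy()` are made up for the example, and a php://memory stream stands in for the database handle so the snippet runs without a database:

```php
<?php
// Stateless helper: the class only acts as a namespace for the function.
final class Util
{
    public static function tidy(string $data): string
    {
        // Trim the ends and collapse runs of whitespace into single spaces.
        return preg_replace('/\s+/', ' ', trim($data));
    }
}

// Stateful object: it keeps a handle alive between method calls,
// the way a database class keeps its connection.
final class Log
{
    private $handle;

    public function __construct()
    {
        $this->handle = fopen('php://memory', 'w+');
    }

    public function write(string $line): void
    {
        fwrite($this->handle, $line . "\n");
    }

    public function contents(): string
    {
        rewind($this->handle);
        return stream_get_contents($this->handle);
    }
}

$data = Util::tidy("  hello \n  world ");

$log = new Log();
$log->write($data);
$log->write('second');
```

`tidy()` needs nothing but its argument, while `Log` is only useful because the same handle survives from one call to the next.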