Uppercase a word in a sentence after brackets PHP - php

I've created a function that cleans a posted title.
function title_var($title_variable) {
$title_variable = mysql_real_escape_string(ucwords(strtolower(trim(htmlspecialchars($title_variable, ENT_QUOTES)))));
return stripslashes($title_variable);
}
I now need to be able to make anything between () or [] all uppercase. For instance "my business name (cbs) limited" or "my business name [cbs] limited", becomes "My Business Name (CBS) Limited", with "CBS" being in all capitals.
I've done the first part of making all the words capital, I just need a way of making anything between the brackets capital.

Always use context-based escaping
Do not try to build a single function to handle all the possible cases. Just don't. It's pointless. In your function, you're trying to "clean" the string by removing certain characters. You can't clean a string by removing a set of characters. That idea is flawed because you're always going to have to allow the use of some characters that are special in some syntax or the other.
Instead, treat the string according to the context where it's going to be used. For example:
If you are going to use this string in an SQL query, you have to use prepared statements (or mysqli_real_escape_string()) to properly escape the data.
If you're going to output this value in HTML markup, you need to use htmlspecialchars() to escape the data.
If you're going to use it as command-line argument, you need to use escapeshellcmd() or escapeshellarg().
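A minimal sketch of escaping per context, using only built-in functions. The helper names escape_for_html() and escape_for_shell() are made up for illustration, and the SQL case is left out because it needs a live database connection.

```php
<?php
// Hypothetical helpers: one per output context, applied at the point of use.

// HTML context: convert <, >, &, " and ' to entities.
function escape_for_html(string $value): string {
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Shell context: quote the value so it is passed as a single argument.
function escape_for_shell(string $value): string {
    return escapeshellarg($value);
}

$title = '<b>"cbs"</b> & co';
echo escape_for_html($title), "\n";            // &lt;b&gt;&quot;cbs&quot;&lt;/b&gt; &amp; co
echo 'echo ' . escape_for_shell($title), "\n"; // quoting style depends on the platform
```

The raw value is stored and passed around unchanged; each helper is called only where the value enters that particular context.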
Solving the problem at hand
Use preg_replace_callback() to accomplish this. You can use the following regex to match the text inside the brackets (including the brackets):
[\(\[].*?[\)\]]
Explanation:
[\(\[] - Matches the opening bracket
.*? - Matches the text in between
[\)\]] - Matches the closing bracket
$m[0] will contain the entire matched string. You can just transform it into upper-case with strtoupper().
Modifying your function, it becomes just:
function get_title($title) {
$title = ucwords(strtolower(trim($title)));
return preg_replace_callback('/[\(\[].*?[\)\]]/', function ($m) {
return strtoupper($m[0]);
}, $title);
}
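A self-contained sketch of the same approach with sample input, assuming the title is plain text. The function name title_with_upper_brackets() is hypothetical.

```php
<?php
// Hypothetical name; capitalises words, then uppercases bracketed groups.
function title_with_upper_brackets(string $title): string {
    // Lowercase everything, then capitalise the first letter of each word.
    $title = ucwords(strtolower(trim($title)));
    // Uppercase each (...) or [...] group, brackets included.
    return preg_replace_callback('/[(\[].*?[)\]]/', function (array $m): string {
        return strtoupper($m[0]);
    }, $title);
}

echo title_with_upper_brackets('my business name (cbs) limited'), "\n"; // My Business Name (CBS) Limited
echo title_with_upper_brackets('my business name [cbs] limited'), "\n"; // My Business Name [CBS] Limited
```

Note that the pattern does not check that the brackets pair up, so a mixed group such as (cbs] would be uppercased as well.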

Related

preg_replace_callback to run EXCEPT when inside first argument of .replace()

I want to run a PHP preg_replace_callback against all single- or double-quoted strings, for which I'm using the code seen on https://codereview.stackexchange.com/a/217356, which includes handling of backslashed single/double quotes.
const PATTERN = <<<'PATTERN'
~(?|(")(?:[^"\\]|\\(?s).)*"|(')(?:[^'\\]|\\(?s).)*'|(#|//).*|(/\*)(?s).*?\*/|(<!--)(?s).*?-->)~
PATTERN;
$result=preg_replace_callback(PATTERN, function($m) {
return $m[1]."XXXX".$m[1];
}, $test);
but this runs into a problem when scanning blocks like that seen in .replace() calls from javascript, e.g.
x=y.replace(/'/g, '"');
... which treats '/g, ' as a string, with the "');......." as the following string.
To work around this I figure it would be good to do the callback except when the quotes are inside the first argument of .replace() as these cause problems with quoting.
i.e. do the standard callbacks, but when .replace is involved I want to change the XXXX part of abc.replace(/\'/, "XXXX"); but I want to ignore the \' quote/part.
How can I do this?
See https://onlinephp.io/c/5df12 ** https://onlinephp.io/c/8a697 for a running example, showing some successes (in green), and some failures (in red).
(** Edit to correct missing slash)
Note, the XXXX is a placeholder for some more work later.
Also note that I have looked at "Javascript regex to match a regex", but that question is about matching regexes, whereas I want to exclude them. Plugging its regex pattern into my code does not work, so it should not be considered a valid answer.
You can use the backtracking verbs (*SKIP)(*F) to skip something, for example to skip the first argument:
\(\s*/.*?/\w*\h*,(*SKIP)(*F)|(?|(")[^"\\]*(?:\\.[^"\\]*)*"|(')[^'\\]*(?:\\.[^'\\]*)*')
See this demo at regex101 or your updated php demo
The pattern on the skipped side is very simple, you might want to further improve that.
Besides I used a bit more efficient pattern to match the quoted parts, explained here.
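A runnable sketch of that pattern in a callback, assuming ~ as the delimiter and XXXX as the placeholder from the question. The function name mask_quoted_strings() is hypothetical.

```php
<?php
// Hypothetical helper: mask the contents of quoted strings, but leave a regex
// literal that is the first argument of a call (e.g. .replace(/'/g, ...)) alone.
function mask_quoted_strings(string $source): string {
    // Nowdoc, so the pattern reaches PCRE exactly as written.
    $pattern = <<<'PATTERN'
~\(\s*/.*?/\w*\h*,(*SKIP)(*F)|(?|(")[^"\\]*(?:\\.[^"\\]*)*"|(')[^'\\]*(?:\\.[^'\\]*)*')~
PATTERN;
    return preg_replace_callback($pattern, function (array $m): string {
        // $m[1] is the quote character, thanks to the branch-reset group (?|...).
        return $m[1] . 'XXXX' . $m[1];
    }, $source);
}

echo mask_quoted_strings("x=y.replace(/'/g, '\"');"), "\n"; // x=y.replace(/'/g, 'XXXX');
echo mask_quoted_strings('a = "hello";'), "\n";             // a = "XXXX";
```

The left alternative consumes everything from the opening parenthesis up to the comma after the regex literal and then fails on purpose; (*SKIP) makes the engine resume after that point, so the quote inside the literal is never tried as the start of a string.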

replace special strings in a html page by php

I am looking for a way to replace all similar-looking strings across an entire page with their defined values.
Please do not recommend me other methods of including language constants.
Strings like this :
[_HOME]
[_NEWS]
They all share the same [_*] form.
Now the big issue is how to scan an HTML page and replace them with the defined values.
One way to parse the HTML page is to use DOMDocument and then preg_replace() it,
but my main problem is writing a pattern for the replacement:
$pattern = "/[_i]/";
$replacement= custom_lang("/i/");
$doc = new DOMDocument();
$htmlPage = $doc->loadHTML($html);
preg_replace($pattern, $replacement, $htmlPage);
In RegEx, [] are operators, so if you use them you need to escape them.
The other problem with your expression is _*, which matches zero or more _. You need to replace it with a meaningful match, such as _.*, which matches _ followed by any other characters. So your full expression becomes:
/\[_.*?\]/
Why the ?, you might ask: it makes the match non-greedy.
If [_foo] [_bar] is the subject string, a greedy match returns a single match spanning the whole string, because the expression is valid for all of it, whereas a non-greedy match gives you two separate matches.
You might be better off being more restrictive and requiring an _ followed by capital letters, like:
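The difference is easy to check with preg_match_all(). This is a small sketch using the [_foo] [_bar] string from the explanation as the subject.

```php
<?php
$subject = '[_foo] [_bar]';

preg_match_all('/\[_.*\]/', $subject, $greedy); // greedy: .* runs to the last ]
preg_match_all('/\[_.*?\]/', $subject, $lazy);  // lazy: .*? stops at the first ]

print_r($greedy[0]); // one match: "[_foo] [_bar]"
print_r($lazy[0]);   // two matches: "[_foo]" and "[_bar]"
```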
/\[_[A-Z]+\]/
Update: Using the matched strings and replacing them. To do so we use a concept called back-referencing.
Consider modifying the above expression by enclosing the string in parentheses, like /\[_([A-Z]+)\]/
Now in the preg_replace arguments we can refer to the parenthesized part with the back-reference $1. So what you can use is:
preg_replace("/\[_([A-Z]+)\]/e", "my_wonderful_replacer('$1')", $html);
Note: We needed the e modifier to treat the second parameter as PHP code. This modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so on current versions use preg_replace_callback() instead.
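On PHP 7 and later the e modifier is not available, so the same replacement can be sketched with preg_replace_callback(). The lookup table $lang and the function name replace_lang_tokens() are assumptions for illustration; they stand in for whatever custom_lang() or my_wonderful_replacer() would return.

```php
<?php
// Hypothetical helper: swap [_KEY] tokens for entries of a lookup table.
function replace_lang_tokens(string $html, array $lang): string {
    return preg_replace_callback('/\[_([A-Z]+)\]/', function (array $m) use ($lang): string {
        // $m[1] is the key without brackets; unknown keys are left as they are.
        return $lang[$m[1]] ?? $m[0];
    }, $html);
}

$lang = ['HOME' => 'Home', 'NEWS' => 'News'];
echo replace_lang_tokens('<a>[_HOME]</a> [_NEWS] [_UNKNOWN]', $lang), "\n";
// <a>Home</a> News [_UNKNOWN]
```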
If you know the full keyword you are trying to replace (e.g. [_HOME]), then you can just use str_replace() to replace all instances.
No need to make things like this more complex by introducing regex.
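A minimal sketch of the str_replace() route, assuming the full set of tokens is known in advance. The $translations table is hypothetical.

```php
<?php
// Hypothetical translation table: token => replacement.
$translations = ['[_HOME]' => 'Home', '[_NEWS]' => 'News'];
$html = '<a href="/">[_HOME]</a> | <a href="/news">[_NEWS]</a>';

// str_replace() accepts arrays, so every token is replaced in a single call.
$translated = str_replace(array_keys($translations), array_values($translations), $html);
echo $translated, "\n"; // <a href="/">Home</a> | <a href="/news">News</a>
```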

How to use preg_replace in php to replace text with function result?

I have a complicated problem:
I have a very long text and I need to call some php functions inside my text.
The function name is myfunction();
I've included the function in my text in the following way:
" text text text myfunction[1,2,3,4,5]; more text text ... "
And I want to replace each myfunction[...] with the result of the function myfunction with the variables from the [] brackets.
my code is:
<?php echo preg_replace('/myfunction[[0-9,]+]/i',myfunction($1),$post['content']); ?>
but it's not working.
The parameter should be an array, because it can contain any number of values.
If I were you, I would avoid using the e modifier to preg_replace because it can leave you open to execution of arbitrary code. Use preg_replace_callback instead. It's slightly more verbose, but much more effective:
echo preg_replace_callback('/myfunction\[([0-9,]+)\]/i', function($matches) {
$args = explode(',', $matches[1]); // separate the arguments
return call_user_func_array('myfunction', $args); // pass the arguments to myfunction
}, $post['content']);
This uses an anonymous function. This functionality won't be available to you if you use a version of PHP before 5.3. You'll have to create a named function and use that instead, as per the instructions on the manual page.
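To try this end to end, here is a sketch with a stand-in myfunction() that simply adds its arguments; the real function is not shown in the question, so that body is an assumption, as is the wrapper name expand_calls().

```php
<?php
// Stand-in for the real myfunction(): here it just adds its arguments up.
function myfunction(...$numbers): string {
    return (string) array_sum($numbers);
}

// Hypothetical wrapper around the preg_replace_callback() call.
function expand_calls(string $text): string {
    return preg_replace_callback('/myfunction\[([0-9,]+)\]/i', function (array $m): string {
        // Drop empty pieces so "1,,2" or a trailing comma does not pass '' through.
        $args = array_filter(explode(',', $m[1]), 'strlen');
        return call_user_func_array('myfunction', array_values($args));
    }, $text);
}

echo expand_calls('text myfunction[1,2,3,4,5]; more text'), "\n"; // text 15; more text
```

The trailing semicolon stays in the output because the pattern only covers myfunction[...].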
You can use preg_replace()'s "e" modifier (for EVAL), which was deprecated in PHP 5.5 and removed in PHP 7.0, like this:
$text = preg_replace('/myfunction\[(.*?)\]/e', 'myfunction("$1")', $text);
I didn't really get how your data is structured so it's all I can do to help you at the moment. You can explore that solution.
From the PHP Manual :
e (PREG_REPLACE_EVAL)
If this modifier is set, preg_replace() does normal substitution of backreferences in the replacement string, evaluates it as PHP code, and uses the result for replacing the search string. Single quotes, double quotes, backslashes (\) and NULL chars will be escaped by backslashes in substituted backreferences.
You need to add the "e" modifier, escape [ and ] in the regular expression, wrap the arguments in a capture group so that $1 is set, and stringify the second argument.
preg_replace('/myfunction\[([0-9,]+)\]/ei','myfunction("$1")',$post['content']);

regex with special characters?

I am looking for a regex that can contain special characters like / \ . ' "
in short i would like a regex that can match the following:
may contain lowercase
may contain uppercase
may contain a number
may contain space
may contain / \ . ' "
I am making a PHP script to check whether a certain string has the above or not, like a validation check.
The regular expression you are looking for is
^[a-z A-Z0-9\/\\.'"]+$
Remember if you are using PHP you need to use \ to escape the backslashes and the quotation mark you use to encapsulate the string.
In PHP using preg_match it should look like this:
preg_match("/^[a-z A-Z0-9\\/\\\\.'\"]+$/",$value);
This is a good place to find the regular expressions you might want to use.
http://regexpal.com/
You can always escape them by putting a \ in front of the special characters.
try this:
preg_match("/[A-Za-z0-9 \/\\\\.'\"]/", ...)
NikoRoberts is 100% correct.
I would only add the following suggestion: When creating a PHP regex pattern string, always use: single-quotes. There are far fewer chars which need to be escaped (i.e. only the single quote and the backslash itself needs to be escaped (and the backslash only needs to be escaped if it appears at the end of the string)).
When dealing with backslash soup, it helps to print out the (interpreted) regex string. This shows you exactly what is being presented to the regex engine.
Also, a "number" might have an optional sign? Yes? Here is my solution (in the form of a tested script):
<?php // test.php 20110311_1400
$data_good = 'abcdefghijklmnopqrstuvwxyzABCDE'.
'FGHIJKLMNOPQRSTUVWXYZ0123456789+- /\\.\'"';
$data_bad = 'abcABC012~!###$%^&*()';
$re = '%^[a-zA-Z0-9+\- /\\\\.\'"]*$%';
echo($re ."\n");
if (preg_match($re, $data_good)) {
echo("CORRECT: Good data matches.\n");
} else {
echo("ERROR! Good data does NOT match.\n");
}
if (preg_match($re, $data_bad)) {
echo("ERROR! Bad data matches.\n");
} else {
echo("CORRECT: Bad data does NOT match.\n");
}
?>
The following regex will match a single character that fits the description you gave:
[a-zA-Z0-9\ \\\/\.\'\"]
If your point is to ensure that ONLY characters in this range of characters are used in your string, then you can use the negation of this, which would be:
[^a-zA-Z0-9\ \\\/\.\'\"]
In the second case, you could use your regex to find the bad stuff (that you don't want to be included), and if it didn't find anything then your string pattern must be kosher, because I'm assuming that if you find one character that is not in the proper range, then your string is not valid.
so to put it in PHP syntax:
$regex = "~[^a-zA-Z0-9\ \\\/\.\'\"]~";
if (preg_match( $regex, ... )) {
// handle the bad stuff
}
Edit 1:
I completely ignored the fact that backslashes are special in PHP double-quoted strings, so here is a correction to the above code:
$regex = "~[^a-zA-Z0-9\\ \\\\\\/\\.\\'\\\"]~";
If that doesn't work it shouldn't take too much for someone to debug how many of the backslashes need to be escaped with a backslash, and what other characters need also to be escaped....
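The "look for a bad character" idea can be sketched as below, using a single-quoted pattern to keep the backslash count down and ~ as the delimiter so that / needs no escaping. The function name has_only_allowed_chars() is hypothetical.

```php
<?php
// Hypothetical validator: true when $value holds only letters, digits,
// spaces, and the characters / \ . ' "
function has_only_allowed_chars(string $value): bool {
    // In single quotes, \\\\ reaches PCRE as \\ (a literal backslash) and \' as '.
    $pattern = '~[^a-zA-Z0-9 /\\\\.\'"]~';
    // preg_match() returns 0 when no disallowed character is found.
    return preg_match($pattern, $value) === 0;
}

var_dump(has_only_allowed_chars('abc ABC 012 /\\.\'"')); // bool(true)
var_dump(has_only_allowed_chars('abc~!'));               // bool(false)
```

An empty string passes this check, since it contains no disallowed character; add a separate length test if that is not wanted.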

Replacing Tags with Includes in PHP with RegExps

I need to read a string, detect a {VAR}, and then do a file_get_contents('VAR.php') in place of {VAR}. The "VAR" can be named anything, like TEST, or CONTACT-FORM, etc. I don't want to know what VAR is -- not to do a hard-coded condition, but to just see an uppercase alphanumeric tag surrounded by curly braces and just do a file_get_contents() to load it.
I know I need to use preg_match and preg_replace, but I'm stumbling through the RegExps on this.
How is this useful? It's useful in hooking WordPress.
Orion above has a right solution, but it's not really necessary to use a callback function in your simple case.
Assuming that the filenames are A-Z + hyphens you can do it in 1 line using PHP's /e flag in the regex (deprecated in PHP 5.5 and removed in PHP 7.0):
$str = preg_replace('/{([-A-Z]+)}/e', 'file_get_contents(\'$1.html\')', $str);
This'll replace any instance of {VAR} with the contents of VAR.html. You could prefix a path into the second term if you need to specify a particular directory.
There are the same vague security worries as outlined above, but I can't think of anything specific.
You'll need to do a number of things. I'm assuming you can do the legwork to get the page data you want to preprocess into a string.
First, you'll need the regular expression to match correctly. That should be fairly easy with something like /{\w+}/.
Next you'll need to pass the PREG_OFFSET_CAPTURE flag to preg_match to get the offset of the match in the page data. This offset lets you divide the string into the parts before, matching, and after the match.
Once you have the 3 parts, you'll need to run your include, and stick them back together.
Lather, rinse, repeat.
Stop when you find no more variables.
This isn't terribly efficient, and there are probably better ways. You may wish to consider doing a preg_split instead, splitting on /[{}]/. No matter how you slice it you're assuming that you can trust your incoming data, and this will simplify the whole process a lot. To do this, I'd lay out the code like so:
Take your content and split it like so: $parts = preg_split('/[{}]/', $page_string);
Write a recursive function over the parts with the following criteria:
Halt when length of arg is < 3
Else, return a new array composed of
$arg[0] . load_data($arg[1]) . $arg[2]
plus whatever is left in $arg[3...]
Run your function over $parts.
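A sketch of the preg_split idea, written as a loop rather than the recursive function described above: with balanced braces, every odd-indexed part is a tag name. The $loader callable is a hypothetical stand-in for load_data().

```php
<?php
// $loader is a hypothetical callable that returns the content for a tag name.
function expand_tags(string $page, callable $loader): string {
    $parts = preg_split('/[{}]/', $page);
    $out = '';
    foreach ($parts as $i => $part) {
        // Even indexes are literal text; odd indexes sat between { and }.
        $out .= ($i % 2 === 1) ? $loader($part) : $part;
    }
    return $out;
}

echo expand_tags('a {X} b {CONTACT-FORM} c', function (string $name): string {
    return "<contents of $name>";
}), "\n"; // a <contents of X> b <contents of CONTACT-FORM> c
```

Like the description it follows, this assumes the input can be trusted and that braces appear only around tags.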
You can do it without regexes (god forbid), something like:
//return true if $str ends with $sub
function endsWith($str,$sub) {
return ( substr( $str, strlen( $str ) - strlen( $sub ) ) === $sub );
}
$theStringWithVars = "blah.php cool.php awesome.php";
$sub = '.php';
$splitStr = explode(" ", $theStringWithVars);
for($i=0;$i<count($splitStr);$i++) {
if(endsWith(trim($splitStr[$i]),$sub)) {
//file_get_contents($splitStr[$i]) etc...
}
}
Off the top of my head, you want this:
// load the "template" file
$input = file_get_contents($template_file_name);
// define a callback. Each time the regex matches something, it will call this function.
// whatever this function returns will be inserted as the replacement
function replaceCallback($matches){
// match zero will be the entire match - eg {FOO}.
// match 1 will be just the bits inside the curly braces because of the grouping parens in the regex - eg FOO
// convert it to lowercase and append ".html", so you're loading foo.html
// then return the contents of that file.
// BEWARE. GIANT MASSIVE SECURITY HOLES ABOUND. DO NOT DO THIS
return file_get_contents( strtolower($matches[1]) . ".html" );
}
// run the actual replace method giving it our pattern, the callback, and the input file contents
$output = preg_replace_callback('/\{([-A-Z]+)\}/', 'replaceCallback', $input);
// todo: print the output
Now I'll explain the regex
\{([-A-Z]+)\}
The \{ and \} just tell it to match the curly braces. You need the backslashes, as { and } are special characters, so they need escaping.
The ( and ) create a grouping. Basically this lets you extract particular parts of the match. I use it in the function above to just match the things inside the braces, without matching the braces themselves. If I didn't do this, then I'd need to strip the { and } out of the match, which would be annoying
The [-A-Z] says "match any uppercase character, or a -"
The + after the [-A-Z] means we need to have at least 1 character, but we can have up to any number.
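Putting the pattern to work with a few guards, as a sketch only: it assumes the fragments live as lowercase .html files in a directory you control, and the name render_tags() and the partials directory are hypothetical.

```php
<?php
// Hypothetical helper: replace {NAME} with the contents of "$dir/name.html".
// $dir is assumed to be a directory you control.
function render_tags(string $input, string $dir): string {
    return preg_replace_callback('/\{([-A-Z]+)\}/', function (array $m) use ($dir): string {
        // The pattern only lets A-Z and "-" through, so no "/" or "." can reach the path.
        $file = $dir . '/' . strtolower($m[1]) . '.html';
        // Leave the tag untouched when there is no matching file.
        return is_file($file) ? file_get_contents($file) : $m[0];
    }, $input);
}

echo render_tags('Header {CONTACT-FORM} footer', __DIR__ . '/partials'), "\n";
```

Restricting the tag characters in the regex and pinning the directory narrows the risk the comment above warns about, though whether that is enough depends on who can write to that directory.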
Comparatively speaking, regular expressions are expensive. While you may need them to figure out which files to load, you certainly don't need them for doing the replace, and probably shouldn't use regular expressions. After all, you know exactly what you are replacing, so why do you need a fuzzy search?
Use an associative array and str_replace to do your replacements. str_replace supports arrays for doing multiple substitutions at once. One line substitution, no loops.
For example:
$substitutions = array('{VAR}'=>file_get_contents('VAR.php'),
'{TEST}'=>file_get_contents('TEST.php'),
...
);
$outputContents = str_replace( array_keys($substitutions), $substitutions, $outputContents);
