How can I make preg_replace work in all PHP environments? - php

My function using preg_replace is working perfectly on a dev server, but not at all on the production server. The problem might have something to do with encoding. Is there a way to write this expression so that it works regardless of the encoding?
The $config looks like this:
class JConfig {
public $mighty = array("0" => array("0" => "/`?\\#__mightysites[` \\n]+/u"), "1" => array("0" => "`hhd_mightysites` "));
public $mighty_enable = '0';
public $mighty_language = '';
public $mighty_template = '9';
public $mighty_home = '';
public $mighty_langoverride = '0';......
I put the variables associated with the lines I would like to strip in an array called $strips, like this:
$strips = array(
'mighty',
'mighty_enable',
'mighty_sync',
'mighty_language',
'mighty_template',.....
Then use a loop to strip out the lines:
foreach ($strips as $var) {
if (JString::strpos($config, 'public $' . $var . ' =') !== false) {
$config = preg_replace('/\tpublic \$' . $var . ' \= ([^\;]*)\;\n/u', '', $config);
$tempvar .= $var . ", ";
}
}
Again, it works perfectly on our dev server. It does not do anything to any lines on the production server. I also know that it passes the strpos check and reaches the line with preg_replace. Can I make preg_replace environment-proof?
I appreciate the help; since it is happening only on a production server, it is very difficult to test!

The safest bet would be to not "trust" any of the literal spaces/tabs that you expect to match.
Instead of using \t and a literal space, I recommend \s+ where you expect a tab and \s where you expect a space.
Furthermore, to cover cases where the operating system may use \r\n or \n at the end of each line, you can use \R to match both variations.
I'm also adding a start-of-line anchor, ^, at the beginning of the pattern together with the m (multiline) pattern modifier. This ensures that the pattern matches only where you expect the leading whitespace (your \t) at the start of a line.
Finally, preg_replace() has an optional 5th parameter that receives the number of replacements made. If $found is non-zero, store the current $var value.
Code:
$config = <<<'CONFIG'
class JConfig {
    public $mighty = array("0" => array("0" => "/`?\\#__mightysites[` \\n]+/u"), "1" => array("0" => "`hhd_mightysites` "));
    public $mighty_enable = '0';
    public $mighty_language = '';
    public $mighty_template = '9';
    public $mighty_home = '';
    public $mighty_langoverride = '0';......
CONFIG;
$strips = [
'mighty',
'mighty_enable',
'mighty_sync',
'mighty_language',
'mighty_template'
];
$tempvar = '';
foreach ($strips as $var) {
$config = preg_replace('~^\s+public\s\$' . $var . '\s=\s[^;]*;\R~um', '', $config, -1, $found);
if ($found) {
$tempvar .= $var . ", ";
}
}
echo "\$tempvar = $tempvar\n\n";
echo $config;
Output:
$tempvar = mighty, mighty_enable, mighty_language, mighty_template, 

class JConfig {
    public $mighty_home = '';
    public $mighty_langoverride = '0';......
p.s. One final suggested refinement... If you don't actually need the $tempvar variable for your project (meaning you are only using it during debugging), then you can avoid the loop entirely: just implode('|', $strips), wrap that generated string in ( and ), save it as $var, and call preg_replace() only once. This will be more efficient, and your sample $strips data does not need to be prepared with preg_quote() because there are no "special characters" to escape.
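A minimal sketch of that single-call variant, assuming the same $strips list and a small made-up config string (both illustrative; $alternation is a name introduced here). preg_quote() is applied anyway as a cheap safeguard, and the fifth argument now only reports the total number of removed lines, not which names matched:

```php
<?php
// Illustrative input: a tiny config with tab-indented properties.
$config = "class JConfig {\n\tpublic \$mighty_enable = '0';\n\tpublic \$mighty_home = '';\n\tpublic \$mighty_template = '9';\n}\n";
$strips = ['mighty', 'mighty_enable', 'mighty_sync', 'mighty_language', 'mighty_template'];

// Build "(mighty|mighty_enable|...)", escaping each name for the ~ delimiter.
$alternation = '(' . implode('|', array_map(function ($name) {
    return preg_quote($name, '~');
}, $strips)) . ')';

// One pass removes every matching property line; $found holds the total count.
$config = preg_replace('~^\s+public\s\$' . $alternation . '\s=\s[^;]*;\R~um', '', $config, -1, $found);

echo $config;
echo "removed: $found\n";
```

Because the pattern requires \s right after the name, the alternative mighty cannot swallow mighty_home or mighty_enable: the regex engine backtracks to the longer alternatives or fails.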

How to convert PascalCase to snake_case?

If I had:
$string = "PascalCase";
I need
"pascal_case"
Does PHP offer a function for this purpose?
A shorter solution: Similar to the editor's one with a simplified regular expression and fixing the "trailing-underscore" problem:
$output = strtolower(preg_replace('/(?<!^)[A-Z]/', '_$0', $input));
Note that cases like SimpleXML will be converted to simple_x_m_l using the above solution. That can also be considered a wrong usage of camel case notation (correct would be SimpleXml) rather than a bug of the algorithm, since such cases are always ambiguous: even by grouping uppercase characters into one string (simple_xml), such an algorithm will always fail in other edge cases like XMLHTMLConverter or one-letter words near abbreviations, etc. If you don't mind the (rather rare) edge cases and want to handle SimpleXML correctly, you can use a slightly more complex solution:
$output = ltrim(strtolower(preg_replace('/[A-Z]([A-Z](?![a-z]))*/', '_$0', $input)), '_');
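A small side-by-side check of the two one-liners may help; the wrapper names snakeSimple() and snakeGrouped() are illustrative only. XMLHTMLConverter shows the ambiguous case mentioned above, where the grouped version cannot tell where XML ends and HTML begins:

```php
<?php
// Wrappers around the two one-liners above (names are illustrative).
function snakeSimple(string $input): string
{
    // underscore before every capital except the first character
    return strtolower(preg_replace('/(?<!^)[A-Z]/', '_$0', $input));
}

function snakeGrouped(string $input): string
{
    // a capital plus any following capitals that are not themselves
    // followed by a lowercase letter is treated as one word
    return ltrim(strtolower(preg_replace('/[A-Z]([A-Z](?![a-z]))*/', '_$0', $input)), '_');
}

foreach (['PascalCase', 'SimpleXML', 'XMLHTMLConverter'] as $input) {
    echo $input, ' => ', snakeSimple($input), ' | ', snakeGrouped($input), "\n";
}
// PascalCase => pascal_case | pascal_case
// SimpleXML => simple_x_m_l | simple_xml
// XMLHTMLConverter => x_m_l_h_t_m_l_converter | xmlhtml_converter
```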
Try this on for size:
$tests = array(
'simpleTest' => 'simple_test',
'easy' => 'easy',
'HTML' => 'html',
'simpleXML' => 'simple_xml',
'PDFLoad' => 'pdf_load',
'startMIDDLELast' => 'start_middle_last',
'AString' => 'a_string',
'Some4Numbers234' => 'some4_numbers234',
'TEST123String' => 'test123_string',
);
foreach ($tests as $test => $result) {
$output = from_camel_case($test);
if ($output === $result) {
echo "Pass: $test => $result\n";
} else {
echo "Fail: $test => $result [$output]\n";
}
}
function from_camel_case($input) {
preg_match_all('!([A-Z][A-Z0-9]*(?=$|[A-Z][a-z0-9])|[A-Za-z][a-z0-9]+)!', $input, $matches);
$ret = $matches[0];
foreach ($ret as &$match) {
$match = $match == strtoupper($match) ? strtolower($match) : lcfirst($match);
}
return implode('_', $ret);
}
Output:
Pass: simpleTest => simple_test
Pass: easy => easy
Pass: HTML => html
Pass: simpleXML => simple_xml
Pass: PDFLoad => pdf_load
Pass: startMIDDLELast => start_middle_last
Pass: AString => a_string
Pass: Some4Numbers234 => some4_numbers234
Pass: TEST123String => test123_string
This implements the following rules:
A sequence beginning with a lowercase letter must be followed by lowercase letters and digits;
A sequence beginning with an uppercase letter can be followed by either:
zero or more uppercase letters and digits (followed by either the end of the string or an uppercase letter followed by a lowercase letter or digit, i.e. the start of the next sequence); or
one or more lowercase letters or digits.
A concise solution that can handle some tricky use cases:
function decamelize($string) {
return strtolower(preg_replace(['/([a-z\d])([A-Z])/', '/([^_])([A-Z][a-z])/'], '$1_$2', $string));
}
Can handle all these cases:
simpleTest => simple_test
easy => easy
HTML => html
simpleXML => simple_xml
PDFLoad => pdf_load
startMIDDLELast => start_middle_last
AString => a_string
Some4Numbers234 => some4_numbers234
TEST123String => test123_string
hello_world => hello_world
hello__world => hello__world
_hello_world_ => _hello_world_
hello_World => hello_world
HelloWorld => hello_world
helloWorldFoo => hello_world_foo
hello-world => hello-world
myHTMLFiLe => my_html_fi_le
aBaBaB => a_ba_ba_b
BaBaBa => ba_ba_ba
libC => lib_c
You can test this function here: http://syframework.alwaysdata.net/decamelize
The Symfony Serializer Component has a CamelCaseToSnakeCaseNameConverter that has two methods normalize() and denormalize(). These can be used as follows:
use Symfony\Component\Serializer\NameConverter\CamelCaseToSnakeCaseNameConverter;
$nameConverter = new CamelCaseToSnakeCaseNameConverter();
echo $nameConverter->normalize('camelCase');
// outputs: camel_case
echo $nameConverter->denormalize('snake_case');
// outputs: snakeCase
Ported from Rails' (ActiveSupport) String#camelize and String#underscore.
function decamelize($word) {
return preg_replace(
'/(^|[a-z])([A-Z])/e',
'strtolower(strlen("\\1") ? "\\1_\\2" : "\\2")',
$word
);
}
function camelize($word) {
return preg_replace('/(^|_)([a-z])/e', 'strtoupper("\\2")', $word);
}
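One trick the above solutions may have missed is the 'e' modifier, which causes preg_replace to evaluate the replacement string as PHP code. Note, however, that /e was deprecated in PHP 5.5 and removed in PHP 7.0, so this code no longer works on current versions; use preg_replace_callback() instead.
Most solutions here feel heavy-handed. Here's what I use: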
$underscored = strtolower(
preg_replace(
["/([A-Z]+)/", "/_([A-Z]+)([A-Z][a-z])/"],
["_$1", "_$1_$2"],
lcfirst($camelCase)
)
);
"CamelCASE" is converted to "camel_case"
lcfirst($camelCase) lowercases the first character (so the output for 'CamelCASE' does not start with an underscore)
[A-Z] finds capital letters
+ treats every run of consecutive capitals as one word (so 'CamelCASE' is not converted to camel_c_a_s_e)
The second pattern and replacement handle ThoseSPECCases -> those_spec_cases instead of those_speccases
strtolower([…]) lowercases the result
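Wrapped in a function (the name underscoreRuns() is illustrative, not from the answer), the two examples from the list above can be checked directly:

```php
<?php
function underscoreRuns(string $camelCase): string
{
    return strtolower(
        preg_replace(
            // 1) prefix every run of capitals with "_"
            // 2) split a run like "_SPECC" + "ases" into "_SPEC_C" + "ases"
            ['/([A-Z]+)/', '/_([A-Z]+)([A-Z][a-z])/'],
            ['_$1', '_$1_$2'],
            lcfirst($camelCase) // avoid a leading underscore
        )
    );
}

echo underscoreRuns('CamelCASE'), "\n";      // camel_case
echo underscoreRuns('ThoseSPECCases'), "\n"; // those_spec_cases
```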
PHP does not offer a built-in function for this afaik, but here is what I use:
function uncamelize($camel,$splitter="_") {
$camel=preg_replace('/(?!^)[[:upper:]][[:lower:]]/', '$0', preg_replace('/(?!^)[[:upper:]]+/', $splitter.'$0', $camel));
return strtolower($camel);
}
the splitter can be specified in the function call, so you can call it like so
$camelized="thisStringIsCamelized";
echo uncamelize($camelized,"_");
//echoes "this_string_is_camelized"
echo uncamelize($camelized,"-");
//echoes "this-string-is-camelized"
I had a similar problem but couldn't find any answer that explains how to convert CamelCase to snake_case while avoiding duplicate or redundant underscores (_) for names that already contain underscores, or for all-caps abbreviations.
The problem is as follows:
CamelCaseClass => camel_case_class
ClassName_WithUnderscores => class_name_with_underscores
FAQ => faq
The solution I wrote is a simple two-function call: lowercase, plus a search and replace for consecutive lowercase-uppercase letters:
strtolower(preg_replace("/([a-z])([A-Z])/", "$1_$2", $name));
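A quick check of that one-liner on the three cases listed (the wrapper name snakeFromPairs() is illustrative). Because it only splits a lowercase letter from the capital after it, a run of capitals directly followed by a word, such as HTMLParser, is not split at all:

```php
<?php
function snakeFromPairs(string $name): string
{
    // insert "_" only between a lowercase letter and the capital after it
    return strtolower(preg_replace('/([a-z])([A-Z])/', '$1_$2', $name));
}

foreach (['CamelCaseClass', 'ClassName_WithUnderscores', 'FAQ', 'HTMLParser'] as $name) {
    echo $name, ' => ', snakeFromPairs($name), "\n";
}
// CamelCaseClass => camel_case_class
// ClassName_WithUnderscores => class_name_with_underscores
// FAQ => faq
// HTMLParser => htmlparser
```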
"CamelCase" to "camel_case":
function camelToSnake($camel)
{
$snake = preg_replace('/[A-Z]/', '_$0', $camel);
$snake = strtolower($snake);
$snake = ltrim($snake, '_');
return $snake;
}
or:
function camelToSnake($camel)
{
$snake = preg_replace_callback('/[A-Z]/', function ($match){
return '_' . strtolower($match[0]);
}, $camel);
return ltrim($snake, '_');
}
If you are looking for an answer for PHP 5.4 and later, here is the code:
function decamelize($word) {
return $word = preg_replace_callback(
"/(^|[a-z])([A-Z])/",
function($m) { return strtolower(strlen($m[1]) ? "$m[1]_$m[2]" : "$m[2]"); },
$word
);
}
function camelize($word) {
return $word = preg_replace_callback(
"/(^|_)([a-z])/",
function($m) { return strtoupper("$m[2]"); },
$word
);
}
You need to run a regex that matches every uppercase letter except one at the beginning, and replace it with an underscore plus that letter. A UTF-8 solution is this:
header('content-type: text/html; charset=utf-8');
$separated = preg_replace('%(?<!^)\p{Lu}%usD', '_$0', 'AaaaBbbbCcccDdddÁáááŐőőő');
$lower = mb_strtolower($separated, 'utf-8');
echo $lower; //aaaa_bbbb_cccc_dddd_áááá_őőőő
If you are not sure what case your string is in, check it first: this code assumes camelCase input rather than underscore_Case or dash-Case, so if the latter contain uppercase letters, it will add underscores to them too.
The accepted answer from cletus is way too complicated IMHO, and it works only with Latin characters. I find it a really bad solution and wonder why it was accepted at all. Converting TEST123String into test123_string is not necessarily a valid requirement. I preferred to keep it simple and separate ABCccc into a_b_cccc instead of ab_cccc, because that way no information is lost and the backward conversion gives back exactly the string we started with. Even if you want to do it the other way, it is relatively easy to write a regex for it with lookbehinds, (?<!^)\p{Lu}\p{Ll}|(?<=\p{Ll})\p{Lu}, or two regexes without lookbehind if you are not a regex expert. There is no need to split the string into substrings, not to mention deciding between strtolower and lcfirst, when just strtolower would be completely fine.
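The lookbehind regex mentioned above can be dropped into the same preg_replace call; this is a minimal sketch (the function name toSnakeCaseUnicode() is illustrative, and mb_strtolower() assumes the mbstring extension). It inserts an underscore before a capital that starts a new word (capital followed by lowercase, not at the start) or that directly follows a lowercase letter:

```php
<?php
function toSnakeCaseUnicode(string $input): string
{
    // first branch: Upper+lower not at position 0; second branch: Upper after lower
    $separated = preg_replace('/(?<!^)\p{Lu}\p{Ll}|(?<=\p{Ll})\p{Lu}/u', '_$0', $input);
    return mb_strtolower($separated, 'UTF-8');
}

echo toSnakeCaseUnicode('ABCccc'), "\n";        // ab_cccc
echo toSnakeCaseUnicode('TEST123String'), "\n"; // test123_string
echo toSnakeCaseUnicode('ÁáááŐőőő'), "\n";      // áááá_őőőő
```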
Short solution:
$subject = "PascalCase";
echo strtolower(preg_replace('/\B([A-Z])/', '_$1', $subject));
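\B is a non-word-boundary assertion: it holds only between two word characters ([A-Za-z0-9_]), which is why no underscore is added before the first letter. The same rule means a capital after a digit or after an existing underscore also gets one, as this sketch (wrapper name snakeViaNonBoundary() is illustrative) shows:

```php
<?php
function snakeViaNonBoundary(string $subject): string
{
    // \B: the capital must be preceded by another word character
    return strtolower(preg_replace('/\B([A-Z])/', '_$1', $subject));
}

foreach (['PascalCase', 'Some4Numbers', 'hello_World'] as $subject) {
    echo $subject, ' => ', snakeViaNonBoundary($subject), "\n";
}
// PascalCase => pascal_case
// Some4Numbers => some4_numbers
// hello_World => hello__world
```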
Not fancy at all but simple and speedy as hell:
function uncamelize($str)
{
$str = lcfirst($str);
$lc = strtolower($str);
$result = '';
$length = strlen($str);
for ($i = 0; $i < $length; $i++) {
$result .= ($str[$i] == $lc[$i] ? '' : '_') . $lc[$i];
}
return $result;
}
echo uncamelize('HelloAWorld'); //hello_a_world
A version that doesn't use regex can be found in the Alchitect source:
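function decamelize($str, $glue='_')
{
// assumes PascalCase ASCII input: it must start with an uppercase letter
// and contain no consecutive capitals (e.g. "PascalCase", not "camelCase")
$counter = 0;
$uc_chars = '';
$new_str = array();
$str_len = strlen($str);
// collect every uppercase letter A-Z (ASCII 65-90)
for ($x=0; $x<$str_len; ++$x)
{
$ascii_val = ord($str[$x]);
if ($ascii_val >= 65 && $ascii_val <= 90)
{
$uc_chars .= $str[$x];
}
}
// split on the capitals, then prepend each capital in lowercase (ASCII + 32)
$tok = strtok($str, $uc_chars);
while ($tok !== false)
{
$new_char = chr(ord($uc_chars[$counter]) + 32);
$new_str[] = $new_char . $tok;
$tok = strtok($uc_chars);
++$counter;
}
return implode($glue, $new_str);
}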
So here is a one-liner:
strtolower(preg_replace('/(?|([a-z\d])([A-Z])|([^\^])([A-Z][a-z]))/', '$1_$2', $string));
danielstjules/Stringy provides a method to convert a string from camelCase to snake_case:
s('TestUCase')->underscored(); // 'test_u_case'
Laravel 5.6 provides a very simple way of doing this:
/**
* Convert a string to snake case.
*
* @param string $value
* @param string $delimiter
* @return string
*/
public static function snake($value, $delimiter = '_'): string
{
if (!ctype_lower($value)) {
$value = strtolower(preg_replace('/(.)(?=[A-Z])/u', '$1'.$delimiter, $value));
}
return $value;
}
What it does: if the string is not made up entirely of lowercase letters (ctype_lower() returns false), it uses a positive lookahead to find any character (.) followed by a capital letter ((?=[A-Z])). It then replaces the found character with its value followed by the delimiter (_ by default).
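The simplified method above can be tried outside Laravel as a plain function; the name laravelStyleSnake() is illustrative, and the framework's actual Str::snake() may differ in details. One consequence of the lookahead worth knowing is that every capital in a run of capitals gets its own delimiter:

```php
<?php
function laravelStyleSnake(string $value, string $delimiter = '_'): string
{
    // ctype_lower() is false if any character is not a lowercase letter
    if (!ctype_lower($value)) {
        // append the delimiter to every character that precedes a capital
        $value = strtolower(preg_replace('/(.)(?=[A-Z])/u', '$1' . $delimiter, $value));
    }
    return $value;
}

echo laravelStyleSnake('fooBar'), "\n";     // foo_bar
echo laravelStyleSnake('PascalCase'), "\n"; // pascal_case
echo laravelStyleSnake('HTMLParser'), "\n"; // h_t_m_l_parser
```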
If you are not using Composer for PHP you are wasting your time.
composer require doctrine/inflector
use Doctrine\Inflector\InflectorFactory;
// Couple ways to get class name:
// If inside a parent class
$class_name = get_called_class();
// Or just inside the class
$class_name = get_class();
// Or straight get a class name
$class_name = MyCustomClass::class;
// Or, of course, a string
$class_name = 'App\Libs\MyCustomClass';
// Take the name down to the base name:
$parts = explode('\\', $class_name);
$class_name = end($parts); // end() needs a variable, not a function result
$inflector = InflectorFactory::create()->build();
$inflector->tableize($class_name); // my_custom_class
https://github.com/doctrine/inflector/blob/master/docs/en/index.rst
Use Symfony String
composer require symfony/string
use function Symfony\Component\String\u;
u($string)->snake()->toString()
The direct port from rails (minus their special handling for :: or acronyms) would be
function underscore($word){
$word = preg_replace('#([A-Z\d]+)([A-Z][a-z])#','\1_\2', $word);
$word = preg_replace('#([a-z\d])([A-Z])#', '\1_\2', $word);
return strtolower(strtr($word, '-', '_'));
}
Knowing PHP, this is probably faster than the manual parsing happening in other answers here. The disadvantage is that you don't get to choose the separator between words, but that wasn't part of the question.
Also check the relevant rails source code
Note that this is intended for use with ASCII identifiers. If you need to do this with characters outside the ASCII range, use the '/u' modifier for preg_replace and use mb_strtolower.
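Following that note, one possible Unicode-aware variant also swaps the ASCII classes for the Unicode properties \p{Lu} (uppercase letter) and \p{Ll} (lowercase letter), since [A-Z] does not match non-ASCII capitals even with /u. These property classes and the name underscoreUnicode() are this sketch's own choices, not part of the Rails port, and mb_strtolower() needs the mbstring extension:

```php
<?php
function underscoreUnicode(string $word): string
{
    // split a run of capitals/digits from a following Capitalized word: "HTMLParser" -> "HTML_Parser"
    $word = preg_replace('/([\p{Lu}\d]+)(\p{Lu}\p{Ll})/u', '$1_$2', $word);
    // split a lowercase letter or digit from the capital after it: "fooBar" -> "foo_Bar"
    $word = preg_replace('/([\p{Ll}\d])(\p{Lu})/u', '$1_$2', $word);
    // dashes become underscores; then lowercase in a multibyte-safe way
    return mb_strtolower(strtr($word, '-', '_'), 'UTF-8');
}

echo underscoreUnicode('HTMLParser'), "\n";   // html_parser
echo underscoreUnicode('καιΚάτιΑκόμα'), "\n"; // και_κάτι_ακόμα
```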
Here is my contribution to a six-year-old question with god knows how many answers...
It will convert all words in the provided string that are in camelCase to snake_case. For example, "SuperSpecialAwesome and also FizzBuzz καιΚάτιΑκόμα" will be converted to "super_special_awesome and also fizz_buzz και_κάτι_ακόμα".
mb_strtolower(
preg_replace_callback(
'/(?<!\b|_)\p{Lu}/u',
function ($a) {
return "_$a[0]";
},
'SuperSpecialAwesome'
)
);
Yii2 has a different function to make a snake_case word from CamelCase.
/**
* Converts any "CamelCased" into an "underscored_word".
* @param string $words the word(s) to underscore
* @return string
*/
public static function underscore($words)
{
return strtolower(preg_replace('/(?<=\\w)([A-Z])/', '_\\1', $words));
}
This is one of shorter ways:
function camel_to_snake($input)
{
return strtolower(ltrim(preg_replace('/([A-Z])/', '_\\1', $input), '_'));
}
function camel2snake($name) {
$str_arr = str_split($name);
foreach ($str_arr as $k => &$v) {
if (ord($v) >= 65 && ord($v) <= 90) { // A = 65; Z = 90
$v = strtolower($v);
$v = ($k != 0) ? '_'.$v : $v;
}
}
return implode('', $str_arr);
}
The worst answer on here was so close to being the best (use a framework). No, don't pull in the framework; just take a look at its source code. Seeing what a well-established framework uses is a far more reliable (tried and tested) approach. The Zend Framework has some word filters which fit your needs.
Here are a couple of methods I adapted from its source:
function CamelCaseToSeparator($value,$separator = ' ')
{
if (!is_scalar($value) && !is_array($value)) {
return $value;
}
if (defined('PREG_BAD_UTF8_OFFSET_ERROR') && preg_match('/\pL/u', 'a') == 1) {
$pattern = ['#(?<=(?:\p{Lu}))(\p{Lu}\p{Ll})#', '#(?<=(?:\p{Ll}|\p{Nd}))(\p{Lu})#'];
$replacement = [$separator . '\1', $separator . '\1'];
} else {
$pattern = ['#(?<=(?:[A-Z]))([A-Z]+)([A-Z][a-z])#', '#(?<=(?:[a-z0-9]))([A-Z])#'];
$replacement = ['\1' . $separator . '\2', $separator . '\1'];
}
return preg_replace($pattern, $replacement, $value);
}
function CamelCaseToUnderscore($value){
return CamelCaseToSeparator($value,'_');
}
function CamelCaseToDash($value){
return CamelCaseToSeparator($value,'-');
}
$string = strtolower(CamelCaseToUnderscore("CamelCase")); // camel_case; without strtolower() the result is Camel_Case
There is a library providing this functionality:
SnakeCaseFormatter::run('CamelCase'); // Output: "camel_case"
If you use the Laravel framework, you can just use the snake_case() helper.
How to de-camelize without using regex:
function decamelize($str, $glue = '_') {
$result = '';
foreach(str_split($str) as $index => $char) {
if(!ctype_upper($char)) {
$result .= $char;
continue;
}
// prepend the glue to every capital except the first character
$result .= ($index > 0 ? $glue : '') . strtolower($char);
}
return $result;
}
Edit: here is how I would do it in 2019:
PHP 7.3 and before:
function toSnakeCase($str, $glue = '_') {
return ltrim(
preg_replace_callback('/[A-Z]/', function ($matches) use ($glue) {
return $glue . strtolower($matches[0]);
}, $str),
$glue
);
}
And with PHP 7.4+:
function toSnakeCase($str, $glue = '_') {
return ltrim(preg_replace_callback('/[A-Z]/', fn($matches) => $glue . strtolower($matches[0]), $str), $glue);
}
If you're using the Laravel framework, a simpler built-in method exists:
$converted = Str::snake('fooBar'); // -> foo_bar
See documentation here:
https://laravel.com/docs/9.x/helpers#method-snake-case
The open source TurboCommons library contains a general purpose formatCase() method inside the StringUtils class, which lets you convert a string to lots of common case formats, like CamelCase, UpperCamelCase, LowerCamelCase, snake_case, Title Case, and many more.
https://github.com/edertone/TurboCommons
To use it, import the phar file to your project and:
use org\turbocommons\src\main\php\utils\StringUtils;
echo StringUtils::formatCase('camelCase', StringUtils::FORMAT_SNAKE_CASE);
// will output 'camel_Case'

How to normalise CSV content in PHP?

Problem:
I'm looking for a PHP function to easily and efficiently normalise CSV content in a string (not in a file). I have made a function for that and provide it in an answer, because it is a possible solution. Unfortunately, it doesn't work when the separator is included in incoming string values.
Can anyone provide a better solution?
Why not using fputcsv / fgetcsv ?
Because:
it requires at least PHP 5.1.0 (which is sometimes not available)
it can only read from files, not from a string, but sometimes the input is not a file (e.g. if you fetch the CSV from an email)
putting the content into a temporary file might be unavailable due to security policies.
Why / what kind of normalisation?
Normalise so that the encloser encloses every field, because the encloser can be optional and can differ per line and per field. This can happen if one is implementing unclean/incomplete specifications and/or using CSV content from different sources/programs/developers.
Example function call:
$csvContent = "'a a',\"b\",c,1, 2 ,3 \n a a,'bb',cc, 1, 2, 3 ";
echo "BEFORE:\n$csvContent\n";
normaliseCSV($csvContent);
echo "AFTER:\n$csvContent\n";
Output:
BEFORE:
'a a',"b",c,1, 2 ,3
a a,'bb',cc, 1, 2, 3
AFTER:
"a a","b","c","1","2","3"
"a a","bb","cc","1","2","3"
To specifically address your concern regarding f*csv working only with files:
Since PHP 5.3 there's str_getcsv.
For at least PHP >= 5.1 (and I really hope that's the oldest you'll have to deal with these days), you can use stream wrappers:
$buffer = fopen('php://memory', 'r+');
fwrite($buffer, $string);
rewind($buffer);
fgetcsv($buffer) ..
Or obviously the reverse if you want to use fputcsv.
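A sketch of the php://memory approach applied to the normalisation task: parse with fgetcsv(), then enclose every field in double quotes, doubling any embedded quote as RFC 4180 does. It assumes " is the only encloser, so single-quoted fields like 'a a' from the question would still need extra handling; normaliseCsvString() is a hypothetical name:

```php
<?php
function normaliseCsvString(string $csv): string
{
    // wrap the string in an in-memory stream so fgetcsv() can read it
    $buffer = fopen('php://memory', 'r+');
    fwrite($buffer, $csv);
    rewind($buffer);

    $lines = [];
    // length 0 = no line-length limit; escape passed explicitly
    while (($fields = fgetcsv($buffer, 0, ',', '"', '\\')) !== false) {
        if ($fields === [null]) {
            continue; // fgetcsv() returns [null] for a blank line
        }
        $quoted = array_map(function ($field) {
            return '"' . str_replace('"', '""', trim($field)) . '"';
        }, $fields);
        $lines[] = implode(',', $quoted);
    }
    fclose($buffer);
    return implode("\n", $lines);
}

echo normaliseCsvString("a, b ,\"c,d\"\n\"e\"\"f\",g,h"), "\n";
// "a","b","c,d"
// "e""f","g","h"
```

Separators inside quoted fields (c,d above) survive because fgetcsv() does the actual parsing.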
This is a possible solution. But it doesn't consider the case that the separator (,) might be included in incoming strings.
function normaliseCSV(&$csv,$lineseperator = "\n", $fieldseperator = ',', $encloser = '"')
{
$csvArray = explode ($lineseperator,$csv);
foreach ($csvArray as &$line)
{
$lineArray = explode ($fieldseperator,$line);
foreach ($lineArray as &$field)
{
$field = $encloser.trim($field,"\0\t\n\x0B\r \"'").$encloser;
}
$line = implode ($fieldseperator,$lineArray);
}
$csv = implode ($lineseperator,$csvArray);
}
It is a simple chain of explode -> explode -> trim -> implode -> implode .
Although I agree with @deceze that you could expect at least PHP 5.1 these days, I'm sure there are some internal company servers somewhere that don't get updated.
I altered your function so that it can handle field and line separators inside double quotes, or in your case inside the $encloser value.
<?php
/*
In regards to the specs on http://tools.ietf.org/html/rfc4180 I use the following rules:
- "Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes."
- "If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote."
Exception:
Even though the specs says use double quotes, I 'm using your $encloser variable
*/
echo normaliseCSV('a,b,\'c\',"d,e","f","g""h""i","""j"""' . "\n" . "\"k\nl\nm\"");
function normaliseCSV($csv,$lineseperator = "\n", $fieldseperator = ',', $encloser = '"')
{
//We need 3 temporary replacement values:
//doubled/tripled enclosers, line separator, field separator
$keys = array();
while (count($keys)<3) {
$tmp = "##".md5(rand().rand().microtime())."##";
if (strpos($csv, $tmp)===false) {
$keys[] = $tmp;
}
}
//first we exchange """ and "" (tripled and doubled $encloser) so they are not exploded
$csv = str_replace($encloser.$encloser.$encloser, $keys[0], $csv);
$csv = str_replace($encloser.$encloser, $keys[0], $csv);
//Explode on $encloser
//Every odd index is within quotes
//Exchange line and field separators inside quotes for something not used.
$content = explode($encloser,$csv);
$len = count($content);
if ($len>1) {
for ($x=1;$x<$len;$x=$x+2) {
$content[$x] = str_replace($lineseperator,$keys[1], $content[$x]);
$content[$x] = str_replace($fieldseperator,$keys[2], $content[$x]);
}
}
$csv = implode('',$content);
$csvArray = explode ($lineseperator,$csv);
foreach ($csvArray as &$line)
{
$lineArray = explode ($fieldseperator,$line);
foreach ($lineArray as &$field)
{
$val = trim($field,"\0\t\n\x0B\r '");
//put back the exchanged values
$val = str_replace($keys[0],$encloser.$encloser,$val);
$val = str_replace($keys[1],$lineseperator,$val);
$val = str_replace($keys[2],$fieldseperator,$val);
$val = $encloser.$val.$encloser;
$field = $val;
}
$line = implode ($fieldseperator,$lineArray);
}
$csv = implode ($lineseperator,$csvArray);
return $csv;
}
?>
Output would be:
"a","b","c","d,e","f","g""h""i","""j"""
"k
l
m"
When I first read this question I wasn't sure whether it should be solved at all, since pre-5.1 environments should have been retired a long time ago. Despite that, how to solve this is a hell of a question, so we should think about which approach to take... and my guess is that it should be a char-by-char examination.
I have separated logic in three main scenarios:
A: CHAR is a separator
B: CHAR is a Fuc$€/& quotation
C: CHAR is a Value
As a result, I obtained this weapon class (including logging) for our arsenal:
<?php
Class CSVParser
{
#basic requirements
public $input;
public $separator;
public $currentQuote;
public $insideQuote;
public $result;
public $field;
public $quotation = array();
public $parsedArray = array();
# for logging purposes only
public $logging = TRUE;
public $log = array();
function __construct($input, $separator, $quotation=array())
{
$this->separator = $separator;
$this->input = $input;
$this->quotation = $quotation;
}
/**
* The main idea is to go through the input string char by char;
* when a complete field is detected it is added to an array, and parse() then quotes every field
*/
public function parse()
{
for($i = 0; $i < strlen($this->input); $i++){
$this->processStream($i);
}
# save the last field if the input does not end with a separator or quote
if(!is_null($this->field)){
$this->saveField($this->field);
}
foreach($this->parsedArray as $value)
{
if(!is_null($value))
$this->result .= '"'.addslashes($value).'",';
}
return rtrim($this->result, ',');
}
private function processStream($i)
{
#A case (its a separator)
if($this->input[$i]===$this->separator){
$this->log("A", $this->input[$i]);
if($this->insideQuote){
$this->field .= $this->input[$i];
}else
{
$this->saveField($this->field);
$this->field = NULL;
}
}
#B case (its a f"·%$% quote)
if(in_array($this->input[$i], $this->quotation)){
$this->log("B", $this->input[$i]);
if(!$this->insideQuote){
$this->insideQuote = TRUE;
$this->currentQuote = $this->input[$i];
}
else{
if($this->currentQuote===$this->input[$i]){
$this->insideQuote = FALSE;
$this->currentQuote ='';
$this->saveField($this->field);
$this->field = NULL;
}else{
$this->field .= $this->input[$i];
}
}
}
#C case (its a value :-) )
if(!in_array($this->input[$i], array_merge(array($this->separator), $this->quotation))){
$this->log("C", $this->input[$i]);
$this->field .= $this->input[$i];
}
}
private function saveField($field)
{
$this->parsedArray[] = $field;
}
private function log($type, $value)
{
if($this->logging){
$this->log[] = "CASE ".$type." WITH ".$value." AS VALUE";
}
}
}
An example of how to use it:
$original = 'a,"ab",\'ab\'';
$test = new CSVParser($original, ',', array('"', "'"));
echo "<PRE>ORIGINAL: ".$original."</PRE>";
echo "<PRE>PARSED: ".$test->parse()."</PRE>";
echo "<pre>";
print_r($test->log);
echo "</pre>";
and here are the results:
ORIGINAL: a,"ab",'ab'
PARSED: "a","ab","ab"
Array
(
[0] => CASE C WITH a AS VALUE
[1] => CASE A WITH , AS VALUE
[2] => CASE B WITH " AS VALUE
[3] => CASE C WITH a AS VALUE
[4] => CASE C WITH b AS VALUE
[5] => CASE B WITH " AS VALUE
[6] => CASE A WITH , AS VALUE
[7] => CASE B WITH ' AS VALUE
[8] => CASE C WITH a AS VALUE
[9] => CASE C WITH b AS VALUE
[10] => CASE B WITH ' AS VALUE
)
There might be mistakes since I only spent 25 minutes on it, so any comments are appreciated and I'll edit accordingly.

How to convert PascalCase to snake_case?

If I had:
$string = "PascalCase";
I need
"pascal_case"
Does PHP offer a function for this purpose?
A shorter solution: Similar to the editor's one with a simplified regular expression and fixing the "trailing-underscore" problem:
$output = strtolower(preg_replace('/(?<!^)[A-Z]/', '_$0', $input));
PHP Demo |
Regex Demo
Note that cases like SimpleXML will be converted to simple_x_m_l using the above solution. That can also be considered a wrong usage of camel case notation (correct would be SimpleXml) rather than a bug of the algorithm since such cases are always ambiguous - even by grouping uppercase characters to one string (simple_xml) such algorithm will always fail in other edge cases like XMLHTMLConverter or one-letter words near abbreviations, etc. If you don't mind about the (rather rare) edge cases and want to handle SimpleXML correctly, you can use a little more complex solution:
$output = ltrim(strtolower(preg_replace('/[A-Z]([A-Z](?![a-z]))*/', '_$0', $input)), '_');
PHP Demo |
Regex Demo
Try this on for size:
$tests = array(
'simpleTest' => 'simple_test',
'easy' => 'easy',
'HTML' => 'html',
'simpleXML' => 'simple_xml',
'PDFLoad' => 'pdf_load',
'startMIDDLELast' => 'start_middle_last',
'AString' => 'a_string',
'Some4Numbers234' => 'some4_numbers234',
'TEST123String' => 'test123_string',
);
foreach ($tests as $test => $result) {
$output = from_camel_case($test);
if ($output === $result) {
echo "Pass: $test => $result\n";
} else {
echo "Fail: $test => $result [$output]\n";
}
}
function from_camel_case($input) {
preg_match_all('!([A-Z][A-Z0-9]*(?=$|[A-Z][a-z0-9])|[A-Za-z][a-z0-9]+)!', $input, $matches);
$ret = $matches[0];
foreach ($ret as &$match) {
$match = $match == strtoupper($match) ? strtolower($match) : lcfirst($match);
}
return implode('_', $ret);
}
Output:
Pass: simpleTest => simple_test
Pass: easy => easy
Pass: HTML => html
Pass: simpleXML => simple_xml
Pass: PDFLoad => pdf_load
Pass: startMIDDLELast => start_middle_last
Pass: AString => a_string
Pass: Some4Numbers234 => some4_numbers234
Pass: TEST123String => test123_string
This implements the following rules:
A sequence beginning with a lowercase letter must be followed by lowercase letters and digits;
A sequence beginning with an uppercase letter can be followed by either:
one or more uppercase letters and digits (followed by either the end of the string or an uppercase letter followed by a lowercase letter or digit ie the start of the next sequence); or
one or more lowercase letters or digits.
A concise solution and can handle some tricky use cases:
function decamelize($string) {
return strtolower(preg_replace(['/([a-z\d])([A-Z])/', '/([^_])([A-Z][a-z])/'], '$1_$2', $string));
}
Can handle all these cases:
simpleTest => simple_test
easy => easy
HTML => html
simpleXML => simple_xml
PDFLoad => pdf_load
startMIDDLELast => start_middle_last
AString => a_string
Some4Numbers234 => some4_numbers234
TEST123String => test123_string
hello_world => hello_world
hello__world => hello__world
_hello_world_ => _hello_world_
hello_World => hello_world
HelloWorld => hello_world
helloWorldFoo => hello_world_foo
hello-world => hello-world
myHTMLFiLe => my_html_fi_le
aBaBaB => a_ba_ba_b
BaBaBa => ba_ba_ba
libC => lib_c
You can test this function here: http://syframework.alwaysdata.net/decamelize
The Symfony Serializer Component has a CamelCaseToSnakeCaseNameConverter that has two methods normalize() and denormalize(). These can be used as follows:
$nameConverter = new CamelCaseToSnakeCaseNameConverter();
echo $nameConverter->normalize('camelCase');
// outputs: camel_case
echo $nameConverter->denormalize('snake_case');
// outputs: snakeCase
Ported from Ruby's String#camelize and String#decamelize.
function decamelize($word) {
return preg_replace(
'/(^|[a-z])([A-Z])/e',
'strtolower(strlen("\\1") ? "\\1_\\2" : "\\2")',
$word
);
}
function camelize($word) {
return preg_replace('/(^|_)([a-z])/e', 'strtoupper("\\2")', $word);
}
One trick the above solutions may have missed is the 'e' modifier which causes preg_replace to evaluate the replacement string as PHP code.
Most solutions here feel heavy handed. Here's what I use:
$underscored = strtolower(
preg_replace(
["/([A-Z]+)/", "/_([A-Z]+)([A-Z][a-z])/"],
["_$1", "_$1_$2"],
lcfirst($camelCase)
)
);
"CamelCASE" is converted to "camel_case"
lcfirst($camelCase) will lower the first character (avoids 'CamelCASE' converted output to start with an underscore)
[A-Z] finds capital letters
+ will treat every consecutive uppercase as a word (avoids 'CamelCASE' to be converted to camel_C_A_S_E)
Second pattern and replacement are for ThoseSPECCases -> those_spec_cases instead of those_speccases
strtolower([…]) turns the output to lowercases
php does not offer a built in function for this afaik, but here is what I use
function uncamelize($camel,$splitter="_") {
$camel=preg_replace('/(?!^)[[:upper:]][[:lower:]]/', '$0', preg_replace('/(?!^)[[:upper:]]+/', $splitter.'$0', $camel));
return strtolower($camel);
}
the splitter can be specified in the function call, so you can call it like so
$camelized="thisStringIsCamelized";
echo uncamelize($camelized,"_");
//echoes "this_string_is_camelized"
echo uncamelize($camelized,"-");
//echoes "this-string-is-camelized"
I had a similar problem but couldn't find any answer that explains how to convert CamelCase to snake_case while avoiding duplicate or redundant underscores (_) for names that already contain underscores, or for all-caps abbreviations.
The problem is as follows:
CamelCaseClass => camel_case_class
ClassName_WithUnderscores => class_name_with_underscores
FAQ => faq
The solution I wrote is a simple two-function call: lowercase, plus a search and replace on consecutive lowercase-uppercase letter pairs:
strtolower(preg_replace("/([a-z])([A-Z])/", "$1_$2", $name));
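The three cases listed above can be checked with a small sketch; `camelToSnakeSimple` is a hypothetical name for the one-liner. Because the pattern only inserts an underscore where a lowercase letter meets an uppercase one, a run of capitals followed by a word (for example `HTTPRequest`) is not split.

```php
<?php
// Hypothetical wrapper around the one-liner: insert "_" only between a
// lowercase letter and the uppercase letter that follows it, then lowercase.
function camelToSnakeSimple($name)
{
    return strtolower(preg_replace("/([a-z])([A-Z])/", "$1_$2", $name));
}

foreach (['CamelCaseClass', 'ClassName_WithUnderscores', 'FAQ', 'HTTPRequest'] as $name) {
    echo $name, ' => ', camelToSnakeSimple($name), "\n";
}
// CamelCaseClass => camel_case_class
// ClassName_WithUnderscores => class_name_with_underscores
// FAQ => faq
// HTTPRequest => httprequest
```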
"CamelCase" to "camel_case":
function camelToSnake($camel)
{
$snake = preg_replace('/[A-Z]/', '_$0', $camel);
$snake = strtolower($snake);
$snake = ltrim($snake, '_');
return $snake;
}
or:
function camelToSnake($camel)
{
$snake = preg_replace_callback('/[A-Z]/', function ($match){
return '_' . strtolower($match[0]);
}, $camel);
return ltrim($snake, '_');
}
If you are looking for a version for PHP 5.4 and later that avoids the /e modifier, here is the code:
function decamelize($word) {
return $word = preg_replace_callback(
"/(^|[a-z])([A-Z])/",
function($m) { return strtolower(strlen($m[1]) ? "$m[1]_$m[2]" : "$m[2]"); },
$word
);
}
function camelize($word) {
return $word = preg_replace_callback(
"/(^|_)([a-z])/",
function($m) { return strtoupper("$m[2]"); },
$word
);
}
You need to run a regex over the string that matches every uppercase letter except one at the very beginning, and replace it with an underscore plus that letter. A UTF-8 solution is this:
header('content-type: text/html; charset=utf-8');
$separated = preg_replace('%(?<!^)\p{Lu}%usD', '_$0', 'AaaaBbbbCcccDdddÁáááŐőőő');
$lower = mb_strtolower($separated, 'utf-8');
echo $lower; //aaaa_bbbb_cccc_dddd_áááá_őőőő
If you are not sure what case your string is in, better check it first: this code assumes the input is camelCase rather than underscore_Case or dash-Case, so if the latter contain uppercase letters, it will add underscores before them too.
The accepted answer from cletus is way too overcomplicated IMHO, and it works only with Latin characters. I find it a really bad solution and wonder why it was accepted at all. Converting TEST123String into test123_string is not necessarily a valid requirement. I preferred to keep it simple and separate ABCccc into a_b_cccc instead of ab_cccc, because this way no information is lost and the backward conversion gives back exactly the string we started with. Even if you want it the other way, it is relatively easy to write a regex for it with lookbehinds, (?<!^)\p{Lu}\p{Ll}|(?<=\p{Ll})\p{Lu}, or two regexes without lookbehind if you are not a regex expert. There is no need to split the string into substrings, not to mention choosing between strtolower and lcfirst, when just using strtolower would be completely fine.
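If you do want acronyms kept together (`ABCccc` → `ab_cccc`, `TEST123String` → `test123_string`), the lookbehind regex mentioned above can be dropped into the same UTF-8 approach. This is a minimal sketch, assuming UTF-8 input and the mbstring extension; `snakeKeepAcronyms` is a hypothetical name.

```php
<?php
// Insert "_" before an uppercase letter that starts a capitalised word
// (\p{Lu}\p{Ll}, but not at the start of the string), or before an uppercase
// letter that directly follows a lowercase one. Requires mbstring.
function snakeKeepAcronyms($str)
{
    $separated = preg_replace('/(?<!^)\p{Lu}\p{Ll}|(?<=\p{Ll})\p{Lu}/u', '_$0', $str);
    return mb_strtolower($separated, 'UTF-8');
}

echo snakeKeepAcronyms('ABCccc'), "\n";           // ab_cccc
echo snakeKeepAcronyms('TEST123String'), "\n";    // test123_string
echo snakeKeepAcronyms('AaaaBbbbÁáááŐőőő'), "\n"; // aaaa_bbbb_áááá_őőőő
```

The first alternative splits before the last capital of an acronym ("ABC" + "ccc"), the second splits plain camelCase boundaries such as "fooBAR".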
Short solution:
$subject = "PascalCase";
echo strtolower(preg_replace('/\B([A-Z])/', '_$1', $subject));
Not fancy at all, but simple and speedy as hell:
function uncamelize($str)
{
$str = lcfirst($str);
$lc = strtolower($str);
$result = '';
$length = strlen($str);
for ($i = 0; $i < $length; $i++) {
$result .= ($str[$i] == $lc[$i] ? '' : '_') . $lc[$i];
}
return $result;
}
echo uncamelize('HelloAWorld'); //hello_a_world
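A version that doesn't use regex can be found in the Alchitect source. It expects PascalCase input (a leading capital and no consecutive capitals), because each token produced by strtok() is paired with the next stored capital:
function decamelize($str, $glue = '_')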
{
$counter = 0;
$uc_chars = '';
$new_str = array();
$str_len = strlen($str);
for ($x=0; $x<$str_len; ++$x)
{
$ascii_val = ord($str[$x]);
if ($ascii_val >= 65 && $ascii_val <= 90)
{
$uc_chars .= $str[$x];
}
}
$tok = strtok($str, $uc_chars);
while ($tok !== false)
{
$new_char = chr(ord($uc_chars[$counter]) + 32);
$new_str[] = $new_char . $tok;
$tok = strtok($uc_chars);
++$counter;
}
return implode($glue, $new_str);
}
So here is a one-liner:
strtolower(preg_replace('/(?|([a-z\d])([A-Z])|([^\^])([A-Z][a-z]))/', '$1_$2', $string));
danielstjules/Stringy provides a method to convert a string from camelCase to snake_case:
s('TestUCase')->underscored(); // 'test_u_case'
Laravel 5.6 provides a very simple way of doing this:
/**
* Convert a string to snake case.
*
* @param string $value
* @param string $delimiter
* @return string
*/
public static function snake($value, $delimiter = '_'): string
{
if (!ctype_lower($value)) {
$value = strtolower(preg_replace('/(.)(?=[A-Z])/u', '$1'.$delimiter, $value));
}
return $value;
}
What it does: if the string does not consist only of lowercase letters (ctype_lower() returns false), it uses a positive lookahead to find any character (.) followed by a capital letter ((?=[A-Z])). It then replaces each found character with itself followed by the delimiter (_ by default).
If you are not using Composer for PHP you are wasting your time.
composer require doctrine/inflector
use Doctrine\Inflector\InflectorFactory;
// A couple of ways to get the class name:
// If inside a parent class
$class_name = get_called_class();
// Or just inside the class
$class_name = get_class();
// Or straight get a class name
$class_name = MyCustomClass::class;
// Or, of course, a string
$class_name = 'App\Libs\MyCustomClass';
// Take the name down to the base name:
// end() takes its argument by reference, so store the explode() result first
$parts = explode('\\', $class_name);
$class_name = end($parts);
$inflector = InflectorFactory::create()->build();
$inflector->tableize($class_name); // my_custom_class
https://github.com/doctrine/inflector/blob/master/docs/en/index.rst
Use Symfony String
composer require symfony/string
use function Symfony\Component\String\u;
u($string)->snake()->toString()
The direct port from Rails (minus its special handling for :: or acronyms) would be:
function underscore($word){
$word = preg_replace('#([A-Z\d]+)([A-Z][a-z])#','\1_\2', $word);
$word = preg_replace('#([a-z\d])([A-Z])#', '\1_\2', $word);
return strtolower(strtr($word, '-', '_'));
}
Knowing PHP, this is probably faster than the manual parsing that's happening in other answers here. The disadvantage is that you don't get to choose the separator between words, but that wasn't part of the question.
Also check the relevant Rails source code.
Note that this is intended for use with ASCII identifiers. If you need to do this with characters outside the ASCII range, use the '/u' modifier for preg_replace and use mb_strtolower.
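Following that note, here is a sketch of a Unicode-aware variant, assuming UTF-8 input and the mbstring extension; `underscoreUnicode` is a hypothetical name. `[A-Z]`, `[a-z]` and `\d` are swapped for the Unicode properties `\p{Lu}`, `\p{Ll}` and `\p{Nd}`, and the patterns get the `/u` modifier.

```php
<?php
// Unicode-aware variant of underscore(): the same two passes, using Unicode
// letter/digit properties and /u, then mb_strtolower() instead of strtolower().
function underscoreUnicode($word)
{
    // "HTMLParser" => "HTML_Parser": split a capital run before its last capital
    $word = preg_replace('#([\p{Lu}\p{Nd}]+)(\p{Lu}\p{Ll})#u', '\1_\2', $word);
    // "camelCase" => "camel_Case": split between a lowercase letter/digit and a capital
    $word = preg_replace('#([\p{Ll}\p{Nd}])(\p{Lu})#u', '\1_\2', $word);
    return mb_strtolower(strtr($word, '-', '_'), 'UTF-8');
}

echo underscoreUnicode('HTMLParser'), "\n"; // html_parser
echo underscoreUnicode('ÉlőKépÚj'), "\n";   // élő_kép_új
```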
Here is my contribution to a six-year-old question with god knows how many answers...
It will convert all words in the provided string that are in camelCase to snake_case. For example "SuperSpecialAwesome and also FizBuzz καιΚάτιΑκόμα" will be converted to "super_special_awesome and also fiz_buzz και_κάτι_ακόμα".
mb_strtolower(
preg_replace_callback(
'/(?<!\b|_)\p{Lu}/u',
function ($a) {
return "_$a[0]";
},
'SuperSpecialAwesome'
)
);
Yii2 has a different function to make a snake_case word from CamelCase:
/**
* Converts any "CamelCased" into an "underscored_word".
* @param string $words the word(s) to underscore
* @return string
*/
public static function underscore($words)
{
return strtolower(preg_replace('/(?<=\\w)([A-Z])/', '_\\1', $words));
}
This is one of the shorter ways:
function camel_to_snake($input)
{
return strtolower(ltrim(preg_replace('/([A-Z])/', '_\\1', $input), '_'));
}
function camel2snake($name) {
$str_arr = str_split($name);
foreach ($str_arr as $k => &$v) {
if (ord($v) >= 65 && ord($v) <= 90) { // A = 65; Z = 90
$v = strtolower($v);
$v = ($k != 0) ? '_'.$v : $v;
}
}
return implode('', $str_arr);
}
The worst answer here was so close to being the best (use a framework). No, don't; just take a look at the source code. Seeing what a well-established framework uses is a far more reliable approach (tried and tested). The Zend Framework has some word filters that fit your needs. Source.
Here are a couple of methods I adapted from the source.
function CamelCaseToSeparator($value,$separator = ' ')
{
if (!is_scalar($value) && !is_array($value)) {
return $value;
}
if (defined('PREG_BAD_UTF8_OFFSET_ERROR') && preg_match('/\pL/u', 'a') == 1) {
$pattern = ['#(?<=(?:\p{Lu}))(\p{Lu}\p{Ll})#', '#(?<=(?:\p{Ll}|\p{Nd}))(\p{Lu})#'];
$replacement = [$separator . '\1', $separator . '\1'];
} else {
$pattern = ['#(?<=(?:[A-Z]))([A-Z]+)([A-Z][a-z])#', '#(?<=(?:[a-z0-9]))([A-Z])#'];
$replacement = ['\1' . $separator . '\2', $separator . '\1'];
}
return preg_replace($pattern, $replacement, $value);
}
function CamelCaseToUnderscore($value){
return CamelCaseToSeparator($value,'_');
}
function CamelCaseToDash($value){
return CamelCaseToSeparator($value,'-');
}
$string = CamelCaseToUnderscore("CamelCase");
There is a library providing this functionality:
SnakeCaseFormatter::run('CamelCase'); // Output: "camel_case"
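If you use the Laravel framework, you can just use the snake_case() helper (since Laravel 6 it lives in the separate laravel/helpers package; Str::snake() is the built-in equivalent).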
How to de-camelize without using regex:
function decamelize($str, $glue = '_') {
$result = '';
foreach (str_split($str) as $index => $char) {
if (ctype_upper($char)) {
// Prefix the glue (except before the first character) and lowercase the capital
$result .= ($index > 0 ? $glue : '') . strtolower($char);
} else {
$result .= $char;
}
}
return $result;
}
Edit: how I would do it in 2019:
PHP 7.3 and before:
function toSnakeCase($str, $glue = '_') {
return ltrim(
preg_replace_callback('/[A-Z]/', function ($matches) use ($glue) {
return $glue . strtolower($matches[0]);
}, $str),
$glue
);
}
And with PHP 7.4+:
function toSnakeCase($str, $glue = '_') {
return ltrim(preg_replace_callback('/[A-Z]/', fn($matches) => $glue . strtolower($matches[0]), $str), $glue);
}
If you're using the Laravel framework, a simpler built-in method exists:
use Illuminate\Support\Str;
$converted = Str::snake('fooBar'); // -> foo_bar
See documentation here:
https://laravel.com/docs/9.x/helpers#method-snake-case
The open source TurboCommons library contains a general purpose formatCase() method inside the StringUtils class, which lets you convert a string to lots of common case formats, like CamelCase, UpperCamelCase, LowerCamelCase, snake_case, Title Case, and many more.
https://github.com/edertone/TurboCommons
To use it, import the phar file to your project and:
use org\turbocommons\src\main\php\utils\StringUtils;
echo StringUtils::formatCase('camelCase', StringUtils::FORMAT_SNAKE_CASE);
// will output 'camel_Case'

Perl to PHP With s///;

In Perl, I can do: 1 while $var =~ s/a/b/;, and it will replace all a with b. In many cases, I would use it more like 1 while $var =~ s/^"(.*)"$/$1/; to remove all pairs of double quotes around a string.
Is there a way to do something similar to this in PHP, without having to do
while (preg_match('/^"(.*)"$/', $var)) {
$var = preg_replace('/^"(.*)"$/', '$1', $var, 1);
}
Because apparently,
while ($var = preg_replace('/^"(.*)"$/', '$1', $var, 1)) { 1; }
doesn't work.
EDIT: The specific situation I'm working in involves replacing values in a string with values from an associative array:
$text = "This is [site_name], home of the [people_type]";
$array = array('site_name' => 'StackOverflow.com', 'people_type' => 'crazy coders');
where I would be doing:
while (preg_match('/\[.*?\]/', $text)) {
$text = preg_replace('/\[(.*?)\]/', '$array[\'$1\']', $text, 1);
}
with the intended output being 'This is StackOverflow.com, home of the crazy coders'
preg_replace('#\[(.*?)\]#e', "\$array['$1']", $text);
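The /e modifier was removed in PHP 7.0, so on current PHP the same substitution needs preg_replace_callback(). A minimal sketch using the question's $text and $array; leaving unknown keys untouched is an assumption of this sketch, not something the question specifies.

```php
<?php
$text = "This is [site_name], home of the [people_type]";
$array = array('site_name' => 'StackOverflow.com', 'people_type' => 'crazy coders');

// The callback receives the match array; $m[1] is the key inside the brackets.
// Unknown keys are returned unchanged (assumption; the /e version would emit a notice).
$result = preg_replace_callback('#\[(.*?)\]#', function ($m) use ($array) {
    return isset($array[$m[1]]) ? $array[$m[1]] : $m[0];
}, $text);

echo $result, "\n"; // This is StackOverflow.com, home of the crazy coders
```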
In all of the cases, you can get rid of the loop by (e.g.) using the /g global replace option or rewriting the regexp:
$var =~ s/a/b/g;
$var =~ s/^("+)(.*)\1$/$2/;
The same patterns should work in PHP, where preg_replace() replaces every match by default (there is no /g modifier). You can also get rid of the $limit argument to preg_replace:
$text = preg_replace('/\[(.*?)\]/e', '$array[\'$1\']', $text);
Regular expressions can handle their own loops. Looping outside the RE is inefficient, since the RE has to process text it already processed in previous iterations.
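As a sketch of that idea in PHP, the second Perl substitution can be compared with the question's loop; `stripQuotePairs` and `stripQuotePairsLoop` are hypothetical names. Because the backreference `\1` demands the same run of quotes at both ends, a single preg_replace() call should match what repeating the one-pair replacement produces, at least on the inputs below.

```php
<?php
// One call: ("+) captures the leading run of quotes and the backreference \1
// requires the same run at the end, so balanced pairs go in a single pass.
function stripQuotePairs($var)
{
    return preg_replace('/^("+)(.*)\1$/', '$2', $var);
}

// The loop from the question, for comparison: strip one pair per iteration
// until preg_replace() reports zero replacements via $count (PHP >= 5.1.0).
function stripQuotePairsLoop($var)
{
    do {
        $var = preg_replace('/^"(.*)"$/', '$1', $var, 1, $count);
    } while ($count > 0);
    return $var;
}

foreach (array('"foo"', '"""foo"""', '""foo"', 'foo') as $input) {
    echo $input, ' => ', stripQuotePairs($input), ' / ', stripQuotePairsLoop($input), "\n";
}
```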
Could something like this work?
$var = preg_replace('/^("+)(.*)\1$/', '$2', $var, 1);
What does your input data look like?
I ask because you're checking for double quotes only at the head and tail of the string. If that's accurate, then you don't need to capture a backreference at all. Also, that would make passing 1 as the 4th parameter completely superfluous.
$var = '"foo"';
// This works
echo preg_replace( '/^"(.*)"$/', '$1', $var );
// So does this
echo preg_replace( '/^"|"$/', '', $var );
But if your input data looks different, that would change my answer.
EDIT
Here's my take on your actual data
class VariableExpander
{
protected $source;
public function __construct( array $source )
{
$this->setSource( $source );
}
public function setSource( array $source )
{
$this->source = $source;
}
public function parse( $input )
{
return preg_replace_callback( '/\[([a-z_]+)\]/i', array( $this, 'expand' ), $input );
}
protected function expand( $matches )
{
return isset( $this->source[$matches[1]] )
? $this->source[$matches[1]]
: '';
}
}
$text = "This is [site_name], home of the [people_type]";
$data = array(
'site_name' => 'StackOverflow.com'
, 'people_type' => 'crazy coders'
);
$ve = new VariableExpander( $data );
echo $ve->parse( $text );
The class is just for encapsulation; you could do this procedurally if you wanted.
Use do-while:
do {
$var = preg_replace('/^"(.*)"$/', "$1", $var, 1, $count);
} while ($count == 1);
Requires at least php-5.1.0 due to its use of $count.
You could also write
do {
$last = $var;
$var = preg_replace('/^"(.*)"$/', "$1", $var);
} while ($last != $var);

Backticking MySQL Entities

I've the following method which allows me to protect MySQL entities:
public function Tick($string)
{
$string = explode('.', str_replace('`', '', $string));
foreach ($string as $key => $value)
{
if ($value != '*')
{
$string[$key] = '`' . trim($value) . '`';
}
}
return implode('.', $string);
}
This works fairly well for the use that I make of it.
It protects database, table and field names and even the * operator; however, now I also want it to protect function calls, i.e.:
AVG(database.employees.salary)
Should become:
AVG(`database`.`employees`.`salary`) and not `AVG(database`.`employees`.`salary)`
How should I go about this? Should I use regular expressions?
Also, how can I support more advanced stuff, from:
MAX(AVG(database.table.field1), MAX(database.table.field2))
To:
MAX(AVG(`database`.`table`.`field1`), MAX(`database`.`table`.`field2`))
Please keep in mind that I want to keep this method as simple/fast as possible, since it pretty much iterates over all the entity names in my database.
If this is quoting parts of an SQL statement, and they have only the complexity that you describe, a RegEx is a great approach. On the other hand, if you need to do this to full SQL statements, or simply to more complicated components of statements (such as "MAX(AVG(val),MAX(val2))"), you will need to tokenize or parse the string and have a more sophisticated understanding of it to do this quoting accurately.
Given the regular expression approach, you may find it easier to break the function name out as one step, and then use your current code to quote the database/table/column names. This can be done in one RE, but it will be trickier to get right.
Either way, I'd highly recommend writing a few unit test cases. In fact, this is an ideal situation for this approach: it's easy to write the tests, you have some existing cases that work (which you don't want to break), and you have just one more case to add.
Your test can start as simply as:
assert('`ticked`' === Tick('ticked'));
assert('`table`.`ticked`' === Tick('table.ticked'));
assert('`db`.`table`.`ticked`' === Tick('db.table.ticked'));
And then add:
assert('FN(`ticked`)' === Tick('FN(ticked)'));
etc.
Using the test cases ndp gave, I created a regex to do the hard work for you. The following regex wraps every word that is not followed by an opening parenthesis in backticks: it matches a word between two word boundaries and uses a negative lookahead to reject words followed by "(".
\b(\w+)\b(?!\()
The Tick() functionality would then be implemented in PHP as follows:
function Tick($string)
{
return preg_replace( '/\b(\w+)\b(?!\()/', '`\1`', $string );
}
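The regex version can be exercised on the cases from the question. This sketch additionally strips existing backticks first, as the original Tick() did, so already-quoted names are not double-ticked; that pre-step and the name `tickRegex` are additions of this sketch. Be aware that any bare word not followed by `(`, such as an `AS` keyword, would be ticked as well.

```php
<?php
// Wrap every word that is not immediately followed by "(" in backticks.
// Function names like AVG( are skipped; "*" and "." are not \w, so they are kept.
function tickRegex($string)
{
    $string = str_replace('`', '', $string); // avoid double ticks on already-quoted names
    return preg_replace('/\b(\w+)\b(?!\()/', '`\1`', $string);
}

echo tickRegex('AVG(database.employees.salary)'), "\n";
echo tickRegex('MAX(AVG(database.table.field1), MAX(database.table.field2))'), "\n";
echo tickRegex('table.*'), "\n";
```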
It's generally a bad idea to pass the whole SQL to the function. That way, you'll always find a case where it doesn't work, unless you fully parse the SQL syntax.
Add the ticks to the names at an earlier abstraction level, where the SQL is assembled.
Before you explode your string on periods, check whether the last character is a closing parenthesis. If so, the string is a function call.
<?php
$string = str_replace('`', '', $string);
$function = "";
if (substr($string,-1) == ")") {
// Strip off function call first
$opening = strpos($string, "(");
$function = substr($string, 0, $opening+1);
$string = substr($string, $opening+1, -1);
}
// Do your existing parsing to $string
if ($function != "") {
// Put function back on string
$string = $function . $string . ")";
}
?>
If you need to cover more advanced situations, like nested functions or multiple functions in sequence in one "$string" variable, this would become a much more advanced function, and you'd do better to ask yourself why these elements aren't being properly ticked in the first place, so that no further parsing is needed.
EDIT: Updating for nested functions, as per original post edit
To have the above function deal with multiple nested functions, you likely need something that will 'unwrap' your nested functions, working from the innermost call outwards. The following function might get you on the right track.
<?php
function unwrap($str) {
if (strpos($str, "(") === false) return Tick($str); // No function call here: just tick the name
// Work from the innermost call outwards: tick the contents of every "(...)"
// that contains no other parentheses, then swap its parentheses for curly
// braces so that the enclosing call becomes the next innermost one
while (preg_match('/\(([^()]*)\)/', $str)) {
$str = preg_replace_callback('/\(([^()]*)\)/', function ($m) {
$args = explode(',', $m[1]);
foreach ($args as $i => $arg) {
// Leave arguments that are (already converted) inner calls untouched
if (strpos($arg, '{') === false && strpos($arg, '}') === false && trim($arg) !== '') {
$args[$i] = Tick($arg);
}
}
return '{' . implode(', ', array_map('trim', $args)) . '}';
}, $str);
}
// Replace the curly braces with parentheses again
return str_replace(array("{","}"), array("(",")"), $str);
}
If you are adding the function calls in your code, as opposed to passing them in through a string-only interface, you can replace the string parsing with type checking:
function Tick($value) {
if (is_object($value)) {
$result = $value->value;
} else {
$result = '`'.str_replace(array('`', '.'), array('', '`.`'), $value).'`';
}
return $result;
}
class SqlFunction {
public $value;
function __construct($function, $params) {
$sane = implode(', ', array_map('Tick', $params));
$this->value = "$function($sane)";
}
}
function Maximum($column) {
return new SqlFunction('MAX', array($column));
}
function Avg($column) {
return new SqlFunction('AVG', array($column));
}
function Greatest() {
$params = func_get_args();
return new SqlFunction('GREATEST', $params);
}
$cases = array(
"'simple'" => Tick('simple'),
"'table.field'" => Tick('table.field'),
"'table.*'" => Tick('table.*'),
"'evil`hack'" => Tick('evil`hack'),
"Avg('database.table.field')" => Tick(Avg('database.table.field')),
"Greatest(Avg('table.field1'), Maximum('table.field2'))" => Tick(Greatest(Avg('table.field1'), Maximum('table.field2'))),
);
echo "<table>";
foreach ($cases as $case => $result) {
echo "<tr><td>$case</td><td>$result</td></tr>";
}
echo "</table>";
Because Tick() strips backticks from plain names before quoting them, this guards against injection through the identifier names while remaining legible to future readers of your code.
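You could use preg_replace_callback() in conjunction with your Tick() logic so that only the entity names between the parentheses are ticked and function names are skipped:
public function tick($str)
{
// Match runs without parentheses, commas or whitespace that are not
// followed by "(", a word character or "." (this skips names like AVG)
return preg_replace_callback('/[^(),\s]+(?![(\w.])/', array($this, '_tick_replace_callback'), $str);
}
protected function _tick_replace_callback($matches) {
// $matches[0] is one entity name, e.g. database.table.field
$string = explode('.', str_replace('`', '', $matches[0]));
foreach ($string as $key => $value)
{
if ($value != '*')
{
$string[$key] = '`' . trim($value) . '`';
}
}
return implode('.', $string);
}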
Are you generating the SQL query, or is it being passed to you? If you are generating the query, I wouldn't pass the whole query string, just the params/values you want to wrap in backticks (or whatever else you need).
EXAMPLE:
function addTick($var) {
return '`' . $var . '`';
}
$condition = addTick($condition);
$SQL = 'SELECT ' . $what . '
FROM ' . $table . '
WHERE ' . $condition . ' = ' . $constraint;
This is just a mock-up, but you get the idea: you can pass or loop through your values and build the query string, rather than parsing a finished query string to add your backticks.
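A sketch of that build-from-parts idea with a slightly more defensive quoting helper; `quoteIdentifier` is a hypothetical name. Instead of stripping backticks, it doubles any embedded backtick, which is how MySQL escapes a backtick inside a quoted identifier, and it leaves a bare `*` unquoted.

```php
<?php
// Quote each dot-separated part of an identifier for MySQL.
// Embedded backticks are doubled; a lone "*" is passed through unquoted.
function quoteIdentifier($name)
{
    $parts = explode('.', $name);
    foreach ($parts as $i => $part) {
        if ($part !== '*') {
            $parts[$i] = '`' . str_replace('`', '``', $part) . '`';
        }
    }
    return implode('.', $parts);
}

// Build the query from parts instead of parsing a finished SQL string.
$what  = quoteIdentifier('employees.salary');
$table = quoteIdentifier('database.employees');
$sql   = 'SELECT ' . $what . ' FROM ' . $table;
echo $sql, "\n"; // SELECT `employees`.`salary` FROM `database`.`employees`
```

Values in the WHERE clause are a separate concern; this helper only covers identifiers.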
