I've the following method which allows me to protect MySQL entities:
public function Tick($string)
{
$string = explode('.', str_replace('`', '', $string));
foreach ($string as $key => $value)
{
if ($value != '*')
{
$string[$key] = '`' . trim($value) . '`';
}
}
return implode('.', $string);
}
This works fairly well for the use that I make of it.
It protects database, table, field names and even the * operator, however now I also want it to protect function calls, ie:
AVG(database.employees.salary)
Should become:
AVG(`database`.`employees`.`salary`) and not `AVG(database`.`employees`.`salary)`
How should I go about this? Should I use regular expressions?
Also, how can I support more advanced stuff, from:
MAX(AVG(database.table.field1), MAX(database.table.field2))
To:
MAX(AVG(`database`.`table`.`field1`), MAX(`database`.`table`.`field2`))
Please keep in mind that I want to keep this method as simple/fast as possible, since it pretty much iterates over all the entity names in my database.
If this is quoting parts of an SQL statement, and they have only complexity that you descibe, a RegEx is a great approach. On the other hand, if you need to do this to full SQL statements, or simply more complicated components of statements (such as "MAX(AVG(val),MAX(val2))"), you will need to tokenize or parse the string and have a more sophisticated understanding of it to do this quoting accurately.
Given the regular expression approach, you may find it easier to break the function name out as one step, and then use your current code to quote the database/table/column names. This can be done in one RE, but it will be tricker to get right.
Either way, I'd highly recommend writing a few unit test cases. In fact, this is an ideal situation for this approach: it's easy to write the tests, you have some existing cases that work (which you don't want to break), and you have just one more case to add.
Your test can start as simply as:
assert '`ticked`' == Tick('ticked');
assert '`table`.`ticked`' == Tick('table.ticked');
assert 'db`.`table`.`ticked`' == Tick('db.table.ticked');
And then add:
assert 'FN(`ticked`)' == Tick('FN(ticked)');
etc.
Using the test case ndp gave I created a regex to do the hard work for you. The following regex will replace all word boundaries around words that are not followed by an opening parenthesis.
\b(\w+)\b(?!\()
The Tick() functionality would then be implemented in PHP as follows:
function Tick($string)
{
return preg_replace( '/\b(\w+)\b(?!\()/', '`\1`', $string );
}
It's generally a bad idea to pass the whole SQL to the function. That way, you'll always find a case when it doesn't work, unless you fully parse the SQL syntax.
Put the ticks to the names on some previous abstraction level, which makes up the SQL.
Before you explode your string on periods, check if the last character is a parenthesis. If so, this call is a function.
<?php
$string = str_replace('`', '', $string)
$function = "";
if (substr($string,-1) == ")") {
// Strip off function call first
$opening = strpos($string, "(");
$function = substr($string, 0, $opening+1);
$string = substr($string, $opening+1, -1);
}
// Do your existing parsing to $string
if ($function == "") {
// Put function back on string
$string = $function . $string . ")";
}
?>
If you need to cover more advanced situations, like using nested functions, or multiple functions in sequence in one "$string" variable, this would become a much more advanced function, and you'd best ask yourself why these elements aren't being properly ticked in the first place, and not need any further parsing.
EDIT: Updating for nested functions, as per original post edit
To have the above function deal with multiple nested functions, you likely need something that will 'unwrap' your nested functions. I haven't tested this, but the following function might get you on the right track.
<?php
function unwrap($str) {
$pos = strpos($str, "(");
if ($pos === false) return $str; // There's no function call here
$last_close = 0;
$cur_offset = 0; // Start at the beginning
while ($cur_offset <= strlen($str)) {
$first_close = strpos($str, ")", $offset); // Find first deep function
$pos = strrpos($str, "(", $first_close-1); // Find associated opening
if ($pos > $last_close) {
// This function is entirely after the previous function
$ticked = Tick(substr($str, $pos+1, $first_close-$pos)); // Tick the string inside
$str = substr($str, 0, $pos)."{".$ticked."}".substr($str,$first_close); // Replace parenthesis by curly braces temporarily
$first_close += strlen($ticked)-($first_close-$pos); // Shift parenthesis location due to new ticks being added
} else {
// This function wraps other functions; don't tick it
$str = substr($str, 0, $pos)."{".substr($str,$pos+1, $first_close-$pos)."}".substr($str,$first_close);
}
$last_close = $first_close;
$offset = $first_close+1;
}
// Replace the curly braces with parenthesis again
$str = str_replace(array("{","}"), array("(",")"), $str);
}
If you are adding the function calls in your code, as opposed to passing them in through a string-only interface, you can replace the string parsing with type checking:
function Tick($value) {
if (is_object($value)) {
$result = $value->value;
} else {
$result = '`'.str_replace(array('`', '.'), array('', '`.`'), $value).'`';
}
return $result;
}
class SqlFunction {
var $value;
function SqlFunction($function, $params) {
$sane = implode(', ', array_map('Tick', $params));
$this->value = "$function($sane)";
}
}
function Maximum($column) {
return new SqlFunction('MAX', array($column));
}
function Avg($column) {
return new SqlFunction('AVG', array($column));
}
function Greatest() {
$params = func_get_args();
return new SqlFunction('GREATEST', $params);
}
$cases = array(
"'simple'" => Tick('simple'),
"'table.field'" => Tick('table.field'),
"'table.*'" => Tick('table.*'),
"'evil`hack'" => Tick('evil`hack'),
"Avg('database.table.field')" => Tick(Avg('database.table.field')),
"Greatest(Avg('table.field1'), Maximum('table.field2'))" => Tick(Greatest(Avg('table.field1'), Maximum('table.field2'))),
);
echo "<table>";
foreach ($cases as $case => $result) {
echo "<tr><td>$case</td><td>$result</td></tr>";
}
echo "</table>";
This avoids any possible SQL injection while remaining legible to future readers of your code.
You could use preg_replace_callback() in conjunction with your Tick() method to skip at least one level of parens:
public function tick($str)
{
return preg_replace_callback('/[^()]*/', array($this, '_tick_replace_callback'), $str);
}
protected function _tick_replace_callback($str) {
$string = explode('.', str_replace('`', '', $string));
foreach ($string as $key => $value)
{
if ($value != '*')
{
$string[$key] = '`' . trim($value) . '`';
}
}
return implode('.', $string);
}
Are you generating the SQL Query or is it being passed to you? If you generating the query I wouldn't pass the whole query string just the parms/values you want to wrap in the backticks or what ever else you need.
EXAMPLE:
function addTick($var) {
return '`' . $var . '`';
}
$condition = addTick($condition);
$SQL = 'SELECT' . $what . '
FROM ' . $table . '
WHERE ' . $condition . ' = ' . $constraint;
This is just a mock but you get the idea that you can pass or loop through your code and build the query string rather than parsing the query string and adding your backticks.
Related
If I had:
$string = "PascalCase";
I need
"pascal_case"
Does PHP offer a function for this purpose?
A shorter solution: Similar to the editor's one with a simplified regular expression and fixing the "trailing-underscore" problem:
$output = strtolower(preg_replace('/(?<!^)[A-Z]/', '_$0', $input));
PHP Demo |
Regex Demo
Note that cases like SimpleXML will be converted to simple_x_m_l using the above solution. That can also be considered a wrong usage of camel case notation (correct would be SimpleXml) rather than a bug of the algorithm since such cases are always ambiguous - even by grouping uppercase characters to one string (simple_xml) such algorithm will always fail in other edge cases like XMLHTMLConverter or one-letter words near abbreviations, etc. If you don't mind about the (rather rare) edge cases and want to handle SimpleXML correctly, you can use a little more complex solution:
$output = ltrim(strtolower(preg_replace('/[A-Z]([A-Z](?![a-z]))*/', '_$0', $input)), '_');
PHP Demo |
Regex Demo
Try this on for size:
$tests = array(
'simpleTest' => 'simple_test',
'easy' => 'easy',
'HTML' => 'html',
'simpleXML' => 'simple_xml',
'PDFLoad' => 'pdf_load',
'startMIDDLELast' => 'start_middle_last',
'AString' => 'a_string',
'Some4Numbers234' => 'some4_numbers234',
'TEST123String' => 'test123_string',
);
foreach ($tests as $test => $result) {
$output = from_camel_case($test);
if ($output === $result) {
echo "Pass: $test => $result\n";
} else {
echo "Fail: $test => $result [$output]\n";
}
}
function from_camel_case($input) {
preg_match_all('!([A-Z][A-Z0-9]*(?=$|[A-Z][a-z0-9])|[A-Za-z][a-z0-9]+)!', $input, $matches);
$ret = $matches[0];
foreach ($ret as &$match) {
$match = $match == strtoupper($match) ? strtolower($match) : lcfirst($match);
}
return implode('_', $ret);
}
Output:
Pass: simpleTest => simple_test
Pass: easy => easy
Pass: HTML => html
Pass: simpleXML => simple_xml
Pass: PDFLoad => pdf_load
Pass: startMIDDLELast => start_middle_last
Pass: AString => a_string
Pass: Some4Numbers234 => some4_numbers234
Pass: TEST123String => test123_string
This implements the following rules:
A sequence beginning with a lowercase letter must be followed by lowercase letters and digits;
A sequence beginning with an uppercase letter can be followed by either:
one or more uppercase letters and digits (followed by either the end of the string or an uppercase letter followed by a lowercase letter or digit ie the start of the next sequence); or
one or more lowercase letters or digits.
A concise solution and can handle some tricky use cases:
function decamelize($string) {
return strtolower(preg_replace(['/([a-z\d])([A-Z])/', '/([^_])([A-Z][a-z])/'], '$1_$2', $string));
}
Can handle all these cases:
simpleTest => simple_test
easy => easy
HTML => html
simpleXML => simple_xml
PDFLoad => pdf_load
startMIDDLELast => start_middle_last
AString => a_string
Some4Numbers234 => some4_numbers234
TEST123String => test123_string
hello_world => hello_world
hello__world => hello__world
_hello_world_ => _hello_world_
hello_World => hello_world
HelloWorld => hello_world
helloWorldFoo => hello_world_foo
hello-world => hello-world
myHTMLFiLe => my_html_fi_le
aBaBaB => a_ba_ba_b
BaBaBa => ba_ba_ba
libC => lib_c
You can test this function here: http://syframework.alwaysdata.net/decamelize
The Symfony Serializer Component has a CamelCaseToSnakeCaseNameConverter that has two methods normalize() and denormalize(). These can be used as follows:
$nameConverter = new CamelCaseToSnakeCaseNameConverter();
echo $nameConverter->normalize('camelCase');
// outputs: camel_case
echo $nameConverter->denormalize('snake_case');
// outputs: snakeCase
Ported from Ruby's String#camelize and String#decamelize.
function decamelize($word) {
return preg_replace(
'/(^|[a-z])([A-Z])/e',
'strtolower(strlen("\\1") ? "\\1_\\2" : "\\2")',
$word
);
}
function camelize($word) {
return preg_replace('/(^|_)([a-z])/e', 'strtoupper("\\2")', $word);
}
One trick the above solutions may have missed is the 'e' modifier which causes preg_replace to evaluate the replacement string as PHP code.
Most solutions here feel heavy handed. Here's what I use:
$underscored = strtolower(
preg_replace(
["/([A-Z]+)/", "/_([A-Z]+)([A-Z][a-z])/"],
["_$1", "_$1_$2"],
lcfirst($camelCase)
)
);
"CamelCASE" is converted to "camel_case"
lcfirst($camelCase) will lower the first character (avoids 'CamelCASE' converted output to start with an underscore)
[A-Z] finds capital letters
+ will treat every consecutive uppercase as a word (avoids 'CamelCASE' to be converted to camel_C_A_S_E)
Second pattern and replacement are for ThoseSPECCases -> those_spec_cases instead of those_speccases
strtolower([…]) turns the output to lowercases
php does not offer a built in function for this afaik, but here is what I use
function uncamelize($camel,$splitter="_") {
$camel=preg_replace('/(?!^)[[:upper:]][[:lower:]]/', '$0', preg_replace('/(?!^)[[:upper:]]+/', $splitter.'$0', $camel));
return strtolower($camel);
}
the splitter can be specified in the function call, so you can call it like so
$camelized="thisStringIsCamelized";
echo uncamelize($camelized,"_");
//echoes "this_string_is_camelized"
echo uncamelize($camelized,"-");
//echoes "this-string-is-camelized"
I had a similar problem but couldn't find any answer that satisfies how to convert CamelCase to snake_case, while avoiding duplicate or redundant underscores _ for names with underscores, or all caps abbreviations.
Th problem is as follows:
CamelCaseClass => camel_case_class
ClassName_WithUnderscores => class_name_with_underscore
FAQ => faq
The solution I wrote is a simple two functions call, lowercase and search and replace for consecutive lowercase-uppercase letters:
strtolower(preg_replace("/([a-z])([A-Z])/", "$1_$2", $name));
"CamelCase" to "camel_case":
function camelToSnake($camel)
{
$snake = preg_replace('/[A-Z]/', '_$0', $camel);
$snake = strtolower($snake);
$snake = ltrim($snake, '_');
return $snake;
}
or:
function camelToSnake($camel)
{
$snake = preg_replace_callback('/[A-Z]/', function ($match){
return '_' . strtolower($match[0]);
}, $camel);
return ltrim($snake, '_');
}
If you are looking for a PHP 5.4 version and later answer here is the code:
function decamelize($word) {
return $word = preg_replace_callback(
"/(^|[a-z])([A-Z])/",
function($m) { return strtolower(strlen($m[1]) ? "$m[1]_$m[2]" : "$m[2]"); },
$word
);
}
function camelize($word) {
return $word = preg_replace_callback(
"/(^|_)([a-z])/",
function($m) { return strtoupper("$m[2]"); },
$word
);
}
You need to run a regex through it that matches every uppercase letter except if it is in the beginning and replace it with underscrore plus that letter. An utf-8 solution is this:
header('content-type: text/html; charset=utf-8');
$separated = preg_replace('%(?<!^)\p{Lu}%usD', '_$0', 'AaaaBbbbCcccDdddÁáááŐőőő');
$lower = mb_strtolower($separated, 'utf-8');
echo $lower; //aaaa_bbbb_cccc_dddd_áááá_őőőő
If you are not sure what case your string is, better to check it first, because this code assumes that the input is camelCase instead of underscore_Case or dash-Case, so if the latters have uppercase letters, it will add underscores to them.
The accepted answer from cletus is way too overcomplicated imho and it works only with latin characters. I find it a really bad solution and wonder why it was accepted at all. Converting TEST123String into test123_string is not necessarily a valid requirement. I rather kept it simple and separated ABCccc into a_b_cccc instead of ab_cccc because it does not lose information this way and the backward conversion will give the exact same string we started with. Even if you want to do it the other way it is relative easy to write a regex for it with positive lookbehind (?<!^)\p{Lu}\p{Ll}|(?<=\p{Ll})\p{Lu} or two regexes without lookbehind if you are not a regex expert. There is no need to split it up into substrings not to mention deciding between strtolower and lcfirst where using just strtolower would be completely fine.
Short solution:
$subject = "PascalCase";
echo strtolower(preg_replace('/\B([A-Z])/', '_$1', $subject));
Not fancy at all but simple and speedy as hell:
function uncamelize($str)
{
$str = lcfirst($str);
$lc = strtolower($str);
$result = '';
$length = strlen($str);
for ($i = 0; $i < $length; $i++) {
$result .= ($str[$i] == $lc[$i] ? '' : '_') . $lc[$i];
}
return $result;
}
echo uncamelize('HelloAWorld'); //hello_a_world
A version that doesn't use regex can be found in the Alchitect source:
decamelize($str, $glue='_')
{
$counter = 0;
$uc_chars = '';
$new_str = array();
$str_len = strlen($str);
for ($x=0; $x<$str_len; ++$x)
{
$ascii_val = ord($str[$x]);
if ($ascii_val >= 65 && $ascii_val <= 90)
{
$uc_chars .= $str[$x];
}
}
$tok = strtok($str, $uc_chars);
while ($tok !== false)
{
$new_char = chr(ord($uc_chars[$counter]) + 32);
$new_str[] = $new_char . $tok;
$tok = strtok($uc_chars);
++$counter;
}
return implode($new_str, $glue);
}
So here is a one-liner:
strtolower(preg_replace('/(?|([a-z\d])([A-Z])|([^\^])([A-Z][a-z]))/', '$1_$2', $string));
danielstjules/Stringy provieds a method to convert string from camelcase to snakecase.
s('TestUCase')->underscored(); // 'test_u_case'
Laravel 5.6 provides a very simple way of doing this:
/**
* Convert a string to snake case.
*
* #param string $value
* #param string $delimiter
* #return string
*/
public static function snake($value, $delimiter = '_'): string
{
if (!ctype_lower($value)) {
$value = strtolower(preg_replace('/(.)(?=[A-Z])/u', '$1'.$delimiter, $value));
}
return $value;
}
What it does: if it sees that there is at least one capital letter in the given string, it uses a positive lookahead to search for any character (.) followed by a capital letter ((?=[A-Z])). It then replaces the found character with it's value followed by the separactor _.
If you are not using Composer for PHP you are wasting your time.
composer require doctrine/inflector
use Doctrine\Inflector\InflectorFactory;
// Couple ways to get class name:
// If inside a parent class
$class_name = get_called_class();
// Or just inside the class
$class_name = get_class();
// Or straight get a class name
$class_name = MyCustomClass::class;
// Or, of course, a string
$class_name = 'App\Libs\MyCustomClass';
// Take the name down to the base name:
$class_name = end(explode('\\', $class_name)));
$inflector = InflectorFactory::create()->build();
$inflector->tableize($class_name); // my_custom_class
https://github.com/doctrine/inflector/blob/master/docs/en/index.rst
Use Symfony String
composer require symfony/string
use function Symfony\Component\String\u;
u($string)->snake()->toString()
The direct port from rails (minus their special handling for :: or acronyms) would be
function underscore($word){
$word = preg_replace('#([A-Z\d]+)([A-Z][a-z])#','\1_\2', $word);
$word = preg_replace('#([a-z\d])([A-Z])#', '\1_\2', $word);
return strtolower(strtr($word, '-', '_'));
}
Knowing PHP, this will be faster than the manual parsing that's happening in other answers given here. The disadvantage is that you don't get to chose what to use as a separator between words, but that wasn't part of the question.
Also check the relevant rails source code
Note that this is intended for use with ASCII identifiers. If you need to do this with characters outside of the ASCII range, use the '/u' modifier for preg_matchand use mb_strtolower.
Here is my contribution to a six-year-old question with god knows how many answers...
It will convert all words in the provided string that are in camelcase to snakecase. For example "SuperSpecialAwesome and also FizBuzz καιΚάτιΑκόμα" will be converted to "super_special_awesome and also fizz_buzz και_κάτι_ακόμα".
mb_strtolower(
preg_replace_callback(
'/(?<!\b|_)\p{Lu}/u',
function ($a) {
return "_$a[0]";
},
'SuperSpecialAwesome'
)
);
Yii2 have the different function to make the word snake_case from CamelCase.
/**
* Converts any "CamelCased" into an "underscored_word".
* #param string $words the word(s) to underscore
* #return string
*/
public static function underscore($words)
{
return strtolower(preg_replace('/(?<=\\w)([A-Z])/', '_\\1', $words));
}
This is one of shorter ways:
function camel_to_snake($input)
{
return strtolower(ltrim(preg_replace('/([A-Z])/', '_\\1', $input), '_'));
}
function camel2snake($name) {
$str_arr = str_split($name);
foreach ($str_arr as $k => &$v) {
if (ord($v) >= 64 && ord($v) <= 90) { // A = 64; Z = 90
$v = strtolower($v);
$v = ($k != 0) ? '_'.$v : $v;
}
}
return implode('', $str_arr);
}
The worst answer on here was so close to being the best(use a framework). NO DON'T, just take a look at the source code. seeing what a well established framework uses would be a far more reliable approach(tried and tested). The Zend framework has some word filters which fit your needs. Source.
here is a couple of methods I adapted from the source.
function CamelCaseToSeparator($value,$separator = ' ')
{
if (!is_scalar($value) && !is_array($value)) {
return $value;
}
if (defined('PREG_BAD_UTF8_OFFSET_ERROR') && preg_match('/\pL/u', 'a') == 1) {
$pattern = ['#(?<=(?:\p{Lu}))(\p{Lu}\p{Ll})#', '#(?<=(?:\p{Ll}|\p{Nd}))(\p{Lu})#'];
$replacement = [$separator . '\1', $separator . '\1'];
} else {
$pattern = ['#(?<=(?:[A-Z]))([A-Z]+)([A-Z][a-z])#', '#(?<=(?:[a-z0-9]))([A-Z])#'];
$replacement = ['\1' . $separator . '\2', $separator . '\1'];
}
return preg_replace($pattern, $replacement, $value);
}
function CamelCaseToUnderscore($value){
return CamelCaseToSeparator($value,'_');
}
function CamelCaseToDash($value){
return CamelCaseToSeparator($value,'-');
}
$string = CamelCaseToUnderscore("CamelCase");
There is a library providing this functionality:
SnakeCaseFormatter::run('CamelCase'); // Output: "camel_case"
If you use Laravel framework, you can use just snake_case() method.
How to de-camelize without using regex:
function decamelize($str, $glue = '_') {
$capitals = [];
$replace = [];
foreach(str_split($str) as $index => $char) {
if(!ctype_upper($char)) {
continue;
}
$capitals[] = $char;
$replace[] = ($index > 0 ? $glue : '') . strtolower($char);
}
if(count($capitals) > 0) {
return str_replace($capitals, $replace, $str);
}
return $str;
}
An edit:
How would I do that in 2019:
PHP 7.3 and before:
function toSnakeCase($str, $glue = '_') {
return ltrim(
preg_replace_callback('/[A-Z]/', function ($matches) use ($glue) {
return $glue . strtolower($matches[0]);
}, $str),
$glue
);
}
And with PHP 7.4+:
function toSnakeCase($str, $glue = '_') {
return ltrim(preg_replace_callback('/[A-Z]/', fn($matches) => $glue . strtolower($matches[0]), $str), $glue);
}
If you're using the Laravel framework, a simpler built-in method exists:
$converted = Str::snake('fooBar'); // -> foo_bar
See documentation here:
https://laravel.com/docs/9.x/helpers#method-snake-case
The open source TurboCommons library contains a general purpose formatCase() method inside the StringUtils class, which lets you convert a string to lots of common case formats, like CamelCase, UpperCamelCase, LowerCamelCase, snake_case, Title Case, and many more.
https://github.com/edertone/TurboCommons
To use it, import the phar file to your project and:
use org\turbocommons\src\main\php\utils\StringUtils;
echo StringUtils::formatCase('camelCase', StringUtils::FORMAT_SNAKE_CASE);
// will output 'camel_Case'
I have a PHP array of about 20,000 names, I need to filter through it and remove any name that has the word job, freelance, or project in the name.
Below is what I have started so far, it will cycle through the array and add the cleaned item to build a new clean array. I need help matching the "bad" words though. Please help if you can
$data1 = array('Phillyfreelance' , 'PhillyWebJobs', 'web2project', 'cleanname');
// freelance
// job
// project
$cleanArray = array();
foreach ($data1 as $name) {
# if a term is matched, we remove it from our array
if(preg_match('~\b(freelance|job|project)\b~i',$name)){
echo 'word removed';
}else{
$cleanArray[] = $name;
}
}
Right now it matches a word so if "freelance" is a name in the array it removes that item but if it is something like ImaFreelaner then it does not, I need to remove anything that has the matching words in it at all
A regular expression is not really necessary here — it'd likely be faster to use a few stripos calls. (Performance matters on this level because the search occurs for each of the 20,000 names.)
With array_filter, which only keeps elements in the array for which the callback returns true:
$data1 = array_filter($data1, function($el) {
return stripos($el, 'job') === FALSE
&& stripos($el, 'freelance') === FALSE
&& stripos($el, 'project') === FALSE;
});
Here's a more extensible / maintainable version, where the list of bad words can be loaded from an array rather than having to be explicitly denoted in the code:
$data1 = array_filter($data1, function($el) {
$bad_words = array('job', 'freelance', 'project');
$word_okay = true;
foreach ( $bad_words as $bad_word ) {
if ( stripos($el, $bad_word) !== FALSE ) {
$word_okay = false;
break;
}
}
return $word_okay;
});
I'd be inclined to use the array_filter function and change the regex to not match on word boundaries
$data1 = array('Phillyfreelance' , 'PhillyWebJobs', 'web2project', 'cleanname');
$cleanArray = array_filter($data1, function($w) {
return !preg_match('~(freelance|project|job)~i', $w);
});
Use of the preg_match() function and some regular expressions should do the trick; this is what I came up with and it worked fine on my end:
<?php
$data1=array('JoomlaFreelance','PhillyWebJobs','web2project','cleanname');
$cleanArray=array();
$badWords='/(job|freelance|project)/i';
foreach($data1 as $name) {
if(!preg_match($badWords,$name)) {
$cleanArray[]=$name;
}
}
echo(implode($cleanArray,','));
?>
Which returned:
cleanname
Personally, I would do something like this:
$badWords = ['job', 'freelance', 'project'];
$names = ['JoomlaFreelance', 'PhillyWebJobs', 'web2project', 'cleanname'];
// Escape characters with special meaning in regular expressions.
$quotedBadWords = array_map(function($word) {
return preg_quote($word, '/');
}, $badWords);
// Create the regular expression.
$badWordsRegex = implode('|', $quotedBadWords);
// Filter out any names that match the bad words.
$cleanNames = array_filter($names, function($name) use ($badWordsRegex) {
return preg_match('/' . $badWordsRegex . '/i', $name) === FALSE;
});
This should be what you want:
if (!preg_match('/(freelance|job|project)/i', $name)) {
$cleanArray[] = $name;
}
check out the method below. If entered value in text box is \ mysql_real_escape_string will return duble backslash but preg_replace will return SQL with only one backslash. Im not that good with regular expression so plz help.
$sql = "INSERT INTO tbl SET val='?'";
$params = array('someval');
public function execute($sql, array $params){
$keys = array();
foreach ($params as $key => $value) {
$keys[] = '/[?]/';
if (get_magic_quotes_gpc()) {
$value = stripslashes($value);
}
$paramsEscaped[$key] = mysql_real_escape_string(trim($value));
}
$sql = preg_replace($keys, $paramsEscaped, $sql, 1, $count);
return $this->query($sql);
}
For me it basically looks like you're re-inventing the wheel and your concept has some serious flaws:
It assumes get_magic_quotes_gpc could be switched on. This feature is broken. You should not code against it. Instead make your application require that it is switched off.
mysql_real_escape_string needs a database link identifier to properly work. You are not providing any. This is a serious issue, you should change your concept.
You're actually not using prepared statements, but you mimic the syntax of those. This is fooling other developers who might think that it is safe to use the code while it is not. This is highly discouraged.
However let's do it, but just don't use preg_replace for the job. That's for various reasons, but especially, as the first pattern of ? results in replacing everything with the first parameter. It's inflexible to deal with the error-cases like too less or too many parameters/placeholders. And additionally imagine a string you insert contains a ? character as well. It would break it. Instead, the already processed part as well as the replacement needs to be skipped (Demo of such).
For that you need to go through, take it apart and process it:
public function execute($sql, array $params)
{
$params = array_map(array($this, 'filter_value'), $params);
$sql = $this->expand_placeholders($sql, $params);
return $this->query($sql);
}
public function filter_value($value)
{
if (get_magic_quotes_gpc())
{
$value = stripslashes($value);
}
$value = trim($value);
$value = mysql_real_escape_string($value);
return $value;
}
public function expand_placeholders($sql, array $params)
{
$sql = (string) $sql;
$params = array_values($params);
$offset = 0;
foreach($params as $param)
{
$place = strpos($sql, '?', $offset);
if ($place === false)
{
throw new InvalidArgumentException('Parameter / Placeholder count mismatch. Not enough placeholders for all parameters.');
}
$sql = substr_replace($sql, $param, $place, 1);
$offset = $place + strlen($param);
}
$place = strpos($sql, '?', $offset);
if ($place === false)
{
throw new InvalidArgumentException('Parameter / Placeholder count mismatch. Too many placeholders.');
}
return $sql;
}
The benefit with already existing prepared statements is, that they actually work. You should really consider to use those. For playing things like that is nice, but you need to deal with much more cases in the end and it's far easier to re-use an existing component tested by thousand of other users.
It's better to use prepared statement. See more info http://www.php.net/manual/en/pdo.prepared-statements.php
If I had:
$string = "PascalCase";
I need
"pascal_case"
Does PHP offer a function for this purpose?
A shorter solution: Similar to the editor's one with a simplified regular expression and fixing the "trailing-underscore" problem:
$output = strtolower(preg_replace('/(?<!^)[A-Z]/', '_$0', $input));
PHP Demo |
Regex Demo
Note that cases like SimpleXML will be converted to simple_x_m_l using the above solution. That can also be considered a wrong usage of camel case notation (correct would be SimpleXml) rather than a bug of the algorithm since such cases are always ambiguous - even by grouping uppercase characters to one string (simple_xml) such algorithm will always fail in other edge cases like XMLHTMLConverter or one-letter words near abbreviations, etc. If you don't mind about the (rather rare) edge cases and want to handle SimpleXML correctly, you can use a little more complex solution:
$output = ltrim(strtolower(preg_replace('/[A-Z]([A-Z](?![a-z]))*/', '_$0', $input)), '_');
PHP Demo |
Regex Demo
Try this on for size:
$tests = array(
'simpleTest' => 'simple_test',
'easy' => 'easy',
'HTML' => 'html',
'simpleXML' => 'simple_xml',
'PDFLoad' => 'pdf_load',
'startMIDDLELast' => 'start_middle_last',
'AString' => 'a_string',
'Some4Numbers234' => 'some4_numbers234',
'TEST123String' => 'test123_string',
);
foreach ($tests as $test => $result) {
$output = from_camel_case($test);
if ($output === $result) {
echo "Pass: $test => $result\n";
} else {
echo "Fail: $test => $result [$output]\n";
}
}
function from_camel_case($input) {
preg_match_all('!([A-Z][A-Z0-9]*(?=$|[A-Z][a-z0-9])|[A-Za-z][a-z0-9]+)!', $input, $matches);
$ret = $matches[0];
foreach ($ret as &$match) {
$match = $match == strtoupper($match) ? strtolower($match) : lcfirst($match);
}
return implode('_', $ret);
}
Output:
Pass: simpleTest => simple_test
Pass: easy => easy
Pass: HTML => html
Pass: simpleXML => simple_xml
Pass: PDFLoad => pdf_load
Pass: startMIDDLELast => start_middle_last
Pass: AString => a_string
Pass: Some4Numbers234 => some4_numbers234
Pass: TEST123String => test123_string
This implements the following rules:
A sequence beginning with a lowercase letter must be followed by lowercase letters and digits;
A sequence beginning with an uppercase letter can be followed by either:
one or more uppercase letters and digits (followed by either the end of the string or an uppercase letter followed by a lowercase letter or digit ie the start of the next sequence); or
one or more lowercase letters or digits.
A concise solution and can handle some tricky use cases:
function decamelize($string) {
return strtolower(preg_replace(['/([a-z\d])([A-Z])/', '/([^_])([A-Z][a-z])/'], '$1_$2', $string));
}
Can handle all these cases:
simpleTest => simple_test
easy => easy
HTML => html
simpleXML => simple_xml
PDFLoad => pdf_load
startMIDDLELast => start_middle_last
AString => a_string
Some4Numbers234 => some4_numbers234
TEST123String => test123_string
hello_world => hello_world
hello__world => hello__world
_hello_world_ => _hello_world_
hello_World => hello_world
HelloWorld => hello_world
helloWorldFoo => hello_world_foo
hello-world => hello-world
myHTMLFiLe => my_html_fi_le
aBaBaB => a_ba_ba_b
BaBaBa => ba_ba_ba
libC => lib_c
You can test this function here: http://syframework.alwaysdata.net/decamelize
The Symfony Serializer Component has a CamelCaseToSnakeCaseNameConverter that has two methods normalize() and denormalize(). These can be used as follows:
$nameConverter = new CamelCaseToSnakeCaseNameConverter();
echo $nameConverter->normalize('camelCase');
// outputs: camel_case
echo $nameConverter->denormalize('snake_case');
// outputs: snakeCase
Ported from Ruby's String#camelize and String#decamelize.
function decamelize($word) {
return preg_replace(
'/(^|[a-z])([A-Z])/e',
'strtolower(strlen("\\1") ? "\\1_\\2" : "\\2")',
$word
);
}
function camelize($word) {
return preg_replace('/(^|_)([a-z])/e', 'strtoupper("\\2")', $word);
}
One trick the above solutions may have missed is the 'e' modifier which causes preg_replace to evaluate the replacement string as PHP code.
Most solutions here feel heavy handed. Here's what I use:
$underscored = strtolower(
preg_replace(
["/([A-Z]+)/", "/_([A-Z]+)([A-Z][a-z])/"],
["_$1", "_$1_$2"],
lcfirst($camelCase)
)
);
"CamelCASE" is converted to "camel_case"
lcfirst($camelCase) will lower the first character (avoids 'CamelCASE' converted output to start with an underscore)
[A-Z] finds capital letters
+ will treat every consecutive uppercase as a word (avoids 'CamelCASE' to be converted to camel_C_A_S_E)
Second pattern and replacement are for ThoseSPECCases -> those_spec_cases instead of those_speccases
strtolower([…]) turns the output to lowercases
php does not offer a built in function for this afaik, but here is what I use
function uncamelize($camel,$splitter="_") {
$camel=preg_replace('/(?!^)[[:upper:]][[:lower:]]/', '$0', preg_replace('/(?!^)[[:upper:]]+/', $splitter.'$0', $camel));
return strtolower($camel);
}
the splitter can be specified in the function call, so you can call it like so
$camelized="thisStringIsCamelized";
echo uncamelize($camelized,"_");
//echoes "this_string_is_camelized"
echo uncamelize($camelized,"-");
//echoes "this-string-is-camelized"
I had a similar problem but couldn't find any answer that satisfies how to convert CamelCase to snake_case, while avoiding duplicate or redundant underscores _ for names with underscores, or all caps abbreviations.
Th problem is as follows:
CamelCaseClass => camel_case_class
ClassName_WithUnderscores => class_name_with_underscore
FAQ => faq
The solution I wrote is a simple two functions call, lowercase and search and replace for consecutive lowercase-uppercase letters:
strtolower(preg_replace("/([a-z])([A-Z])/", "$1_$2", $name));
"CamelCase" to "camel_case":
function camelToSnake($camel)
{
$snake = preg_replace('/[A-Z]/', '_$0', $camel);
$snake = strtolower($snake);
$snake = ltrim($snake, '_');
return $snake;
}
or:
function camelToSnake($camel)
{
$snake = preg_replace_callback('/[A-Z]/', function ($match){
return '_' . strtolower($match[0]);
}, $camel);
return ltrim($snake, '_');
}
If you are looking for a PHP 5.4 version and later answer here is the code:
function decamelize($word) {
return $word = preg_replace_callback(
"/(^|[a-z])([A-Z])/",
function($m) { return strtolower(strlen($m[1]) ? "$m[1]_$m[2]" : "$m[2]"); },
$word
);
}
function camelize($word) {
return $word = preg_replace_callback(
"/(^|_)([a-z])/",
function($m) { return strtoupper("$m[2]"); },
$word
);
}
You need to run a regex through it that matches every uppercase letter except if it is in the beginning and replace it with underscrore plus that letter. An utf-8 solution is this:
header('content-type: text/html; charset=utf-8');
$separated = preg_replace('%(?<!^)\p{Lu}%usD', '_$0', 'AaaaBbbbCcccDdddÁáááŐőőő');
$lower = mb_strtolower($separated, 'utf-8');
echo $lower; //aaaa_bbbb_cccc_dddd_áááá_őőőő
If you are not sure what case your string is, better to check it first, because this code assumes that the input is camelCase instead of underscore_Case or dash-Case, so if the latters have uppercase letters, it will add underscores to them.
The accepted answer from cletus is way too overcomplicated imho and it works only with latin characters. I find it a really bad solution and wonder why it was accepted at all. Converting TEST123String into test123_string is not necessarily a valid requirement. I rather kept it simple and separated ABCccc into a_b_cccc instead of ab_cccc because it does not lose information this way and the backward conversion will give the exact same string we started with. Even if you want to do it the other way it is relative easy to write a regex for it with positive lookbehind (?<!^)\p{Lu}\p{Ll}|(?<=\p{Ll})\p{Lu} or two regexes without lookbehind if you are not a regex expert. There is no need to split it up into substrings not to mention deciding between strtolower and lcfirst where using just strtolower would be completely fine.
Short solution:
$subject = "PascalCase";
echo strtolower(preg_replace('/\B([A-Z])/', '_$1', $subject));
Not fancy at all but simple and speedy as hell:
function uncamelize($str)
{
$str = lcfirst($str);
$lc = strtolower($str);
$result = '';
$length = strlen($str);
for ($i = 0; $i < $length; $i++) {
$result .= ($str[$i] == $lc[$i] ? '' : '_') . $lc[$i];
}
return $result;
}
echo uncamelize('HelloAWorld'); //hello_a_world
A version that doesn't use regex can be found in the Alchitect source:
decamelize($str, $glue='_')
{
$counter = 0;
$uc_chars = '';
$new_str = array();
$str_len = strlen($str);
for ($x=0; $x<$str_len; ++$x)
{
$ascii_val = ord($str[$x]);
if ($ascii_val >= 65 && $ascii_val <= 90)
{
$uc_chars .= $str[$x];
}
}
$tok = strtok($str, $uc_chars);
while ($tok !== false)
{
$new_char = chr(ord($uc_chars[$counter]) + 32);
$new_str[] = $new_char . $tok;
$tok = strtok($uc_chars);
++$counter;
}
return implode($new_str, $glue);
}
So here is a one-liner:
strtolower(preg_replace('/(?|([a-z\d])([A-Z])|([^\^])([A-Z][a-z]))/', '$1_$2', $string));
danielstjules/Stringy provieds a method to convert string from camelcase to snakecase.
s('TestUCase')->underscored(); // 'test_u_case'
Laravel 5.6 provides a very simple way of doing this:
/**
* Convert a string to snake case.
*
* #param string $value
* #param string $delimiter
* #return string
*/
public static function snake($value, $delimiter = '_'): string
{
if (!ctype_lower($value)) {
$value = strtolower(preg_replace('/(.)(?=[A-Z])/u', '$1'.$delimiter, $value));
}
return $value;
}
What it does: if it sees that there is at least one capital letter in the given string, it uses a positive lookahead to search for any character (.) followed by a capital letter ((?=[A-Z])). It then replaces the found character with it's value followed by the separactor _.
If you are not using Composer for PHP you are wasting your time.
composer require doctrine/inflector
use Doctrine\Inflector\InflectorFactory;
// Couple ways to get class name:
// If inside a parent class
$class_name = get_called_class();
// Or just inside the class
$class_name = get_class();
// Or straight get a class name
$class_name = MyCustomClass::class;
// Or, of course, a string
$class_name = 'App\Libs\MyCustomClass';
// Take the name down to the base name:
$class_name = end(explode('\\', $class_name)));
$inflector = InflectorFactory::create()->build();
$inflector->tableize($class_name); // my_custom_class
https://github.com/doctrine/inflector/blob/master/docs/en/index.rst
Use Symfony String
composer require symfony/string
use function Symfony\Component\String\u;
u($string)->snake()->toString()
The direct port from rails (minus their special handling for :: or acronyms) would be
function underscore($word){
$word = preg_replace('#([A-Z\d]+)([A-Z][a-z])#','\1_\2', $word);
$word = preg_replace('#([a-z\d])([A-Z])#', '\1_\2', $word);
return strtolower(strtr($word, '-', '_'));
}
Knowing PHP, this will be faster than the manual parsing that's happening in other answers given here. The disadvantage is that you don't get to chose what to use as a separator between words, but that wasn't part of the question.
Also check the relevant rails source code
Note that this is intended for use with ASCII identifiers. If you need to do this with characters outside of the ASCII range, use the '/u' modifier for preg_matchand use mb_strtolower.
Here is my contribution to a six-year-old question with god knows how many answers...
It will convert all words in the provided string that are in camelcase to snakecase. For example "SuperSpecialAwesome and also FizBuzz καιΚάτιΑκόμα" will be converted to "super_special_awesome and also fizz_buzz και_κάτι_ακόμα".
mb_strtolower(
preg_replace_callback(
'/(?<!\b|_)\p{Lu}/u',
function ($a) {
return "_$a[0]";
},
'SuperSpecialAwesome'
)
);
Yii2 have the different function to make the word snake_case from CamelCase.
/**
* Converts any "CamelCased" into an "underscored_word".
* #param string $words the word(s) to underscore
* #return string
*/
public static function underscore($words)
{
return strtolower(preg_replace('/(?<=\\w)([A-Z])/', '_\\1', $words));
}
This is one of shorter ways:
function camel_to_snake($input)
{
return strtolower(ltrim(preg_replace('/([A-Z])/', '_\\1', $input), '_'));
}
function camel2snake($name) {
$str_arr = str_split($name);
foreach ($str_arr as $k => &$v) {
if (ord($v) >= 64 && ord($v) <= 90) { // A = 64; Z = 90
$v = strtolower($v);
$v = ($k != 0) ? '_'.$v : $v;
}
}
return implode('', $str_arr);
}
The worst answer on here was so close to being the best(use a framework). NO DON'T, just take a look at the source code. seeing what a well established framework uses would be a far more reliable approach(tried and tested). The Zend framework has some word filters which fit your needs. Source.
here is a couple of methods I adapted from the source.
function CamelCaseToSeparator($value,$separator = ' ')
{
if (!is_scalar($value) && !is_array($value)) {
return $value;
}
if (defined('PREG_BAD_UTF8_OFFSET_ERROR') && preg_match('/\pL/u', 'a') == 1) {
$pattern = ['#(?<=(?:\p{Lu}))(\p{Lu}\p{Ll})#', '#(?<=(?:\p{Ll}|\p{Nd}))(\p{Lu})#'];
$replacement = [$separator . '\1', $separator . '\1'];
} else {
$pattern = ['#(?<=(?:[A-Z]))([A-Z]+)([A-Z][a-z])#', '#(?<=(?:[a-z0-9]))([A-Z])#'];
$replacement = ['\1' . $separator . '\2', $separator . '\1'];
}
return preg_replace($pattern, $replacement, $value);
}
function CamelCaseToUnderscore($value){
return CamelCaseToSeparator($value,'_');
}
function CamelCaseToDash($value){
return CamelCaseToSeparator($value,'-');
}
$string = CamelCaseToUnderscore("CamelCase");
There is a library providing this functionality:
SnakeCaseFormatter::run('CamelCase'); // Output: "camel_case"
If you use Laravel framework, you can use just snake_case() method.
How to de-camelize without using regex:
function decamelize($str, $glue = '_') {
$capitals = [];
$replace = [];
foreach(str_split($str) as $index => $char) {
if(!ctype_upper($char)) {
continue;
}
$capitals[] = $char;
$replace[] = ($index > 0 ? $glue : '') . strtolower($char);
}
if(count($capitals) > 0) {
return str_replace($capitals, $replace, $str);
}
return $str;
}
An edit:
How would I do that in 2019:
PHP 7.3 and before:
function toSnakeCase($str, $glue = '_') {
return ltrim(
preg_replace_callback('/[A-Z]/', function ($matches) use ($glue) {
return $glue . strtolower($matches[0]);
}, $str),
$glue
);
}
And with PHP 7.4+:
function toSnakeCase($str, $glue = '_') {
return ltrim(preg_replace_callback('/[A-Z]/', fn($matches) => $glue . strtolower($matches[0]), $str), $glue);
}
If you're using the Laravel framework, a simpler built-in method exists:
$converted = Str::snake('fooBar'); // -> foo_bar
See documentation here:
https://laravel.com/docs/9.x/helpers#method-snake-case
The open source TurboCommons library contains a general purpose formatCase() method inside the StringUtils class, which lets you convert a string to lots of common case formats, like CamelCase, UpperCamelCase, LowerCamelCase, snake_case, Title Case, and many more.
https://github.com/edertone/TurboCommons
To use it, import the phar file to your project and:
use org\turbocommons\src\main\php\utils\StringUtils;
echo StringUtils::formatCase('camelCase', StringUtils::FORMAT_SNAKE_CASE);
// will output 'camel_Case'
I am very new to regex, and this is way too advanced for me. So I am asking the experts over here.
Problem
I would like to retrieve the constants / values from a php define()
DEFINE('TEXT', 'VALUE');
Basically I would like a regex to be able to return the name of constant, and the value of constant from the above line. Just TEXT and VALUE . Is this even possible?
Why I need it? I am dealing with language file and I want to get all couples (name, value) and put them in array. I managed to do it with str_replace() and trim() etc.. but this way is long and I am sure it could be made easier with single line of regex.
Note: The VALUE may contain escaped single quotes as well. example:
DEFINE('TEXT', 'J\'ai');
I hope I am not asking for something too complicated. :)
Regards
For any kind of grammar-based parsing, regular expressions are usually an awful solution. Even smple grammars (like arithmetic) have nesting and it's on nesting (in particular) that regular expressions just fall over.
Fortunately PHP provides a far, far better solution for you by giving you access to the same lexical analyzer used by the PHP interpreter via the token_get_all() function. Give it a character stream of PHP code and it'll parse it into tokens ("lexemes"), which you can do a bit of simple parsing on with a pretty simple finite state machine.
Run this program (it's run as test.php so it tries it on itself). The file is deliberately formatted badly so you can see it handles that with ease.
<?
define('CONST1', 'value' );
define (CONST2, 'value2');
define( 'CONST3', time());
define('define', 'define');
define("test", VALUE4);
define('const5', //
'weird declaration'
) ;
define('CONST7', 3.14);
define ( /* comment */ 'foo', 'bar');
$defn = 'blah';
define($defn, 'foo');
define( 'CONST4', define('CONST5', 6));
header('Content-Type: text/plain');
$defines = array();
$state = 0;
$key = '';
$value = '';
$file = file_get_contents('test.php');
$tokens = token_get_all($file);
$token = reset($tokens);
while ($token) {
// dump($state, $token);
if (is_array($token)) {
if ($token[0] == T_WHITESPACE || $token[0] == T_COMMENT || $token[0] == T_DOC_COMMENT) {
// do nothing
} else if ($token[0] == T_STRING && strtolower($token[1]) == 'define') {
$state = 1;
} else if ($state == 2 && is_constant($token[0])) {
$key = $token[1];
$state = 3;
} else if ($state == 4 && is_constant($token[0])) {
$value = $token[1];
$state = 5;
}
} else {
$symbol = trim($token);
if ($symbol == '(' && $state == 1) {
$state = 2;
} else if ($symbol == ',' && $state == 3) {
$state = 4;
} else if ($symbol == ')' && $state == 5) {
$defines[strip($key)] = strip($value);
$state = 0;
}
}
$token = next($tokens);
}
foreach ($defines as $k => $v) {
echo "'$k' => '$v'\n";
}
function is_constant($token) {
return $token == T_CONSTANT_ENCAPSED_STRING || $token == T_STRING ||
$token == T_LNUMBER || $token == T_DNUMBER;
}
function dump($state, $token) {
if (is_array($token)) {
echo "$state: " . token_name($token[0]) . " [$token[1]] on line $token[2]\n";
} else {
echo "$state: Symbol '$token'\n";
}
}
function strip($value) {
return preg_replace('!^([\'"])(.*)\1$!', '$2', $value);
}
?>
Output:
'CONST1' => 'value'
'CONST2' => 'value2'
'CONST3' => 'time'
'define' => 'define'
'test' => 'VALUE4'
'const5' => 'weird declaration'
'CONST7' => '3.14'
'foo' => 'bar'
'CONST5' => '6'
This is basically a finite state machine that looks for the pattern:
function name ('define')
open parenthesis
constant
comma
constant
close parenthesis
in the lexical stream of a PHP source file and treats the two constants as a (name,value) pair. In doing so it handles nested define() statements (as per the results) and ignores whitespace and comments as well as working across multiple lines.
Note: I've deliberatley made it ignore the case when functions and variables are constant names or values but you can extend it to that as you wish.
It's also worth pointing out that PHP is quite forgiving when it comes to strings. They can be declared with single quotes, double quotes or (in certain circumstances) with no quotes at all. This can be (as pointed out by Gumbo) be an ambiguous reference reference to a constant and you have no way of knowing which it is (no guaranteed way anyway), giving you the chocie of:
Ignoring that style of strings (T_STRING);
Seeing if a constant has already been declared with that name and replacing it's value. There's no way you can know what other files have been called though nor can you process any defines that are conditionally created so you can't say with any certainty if anything is definitely a constant or not nor what value it has; or
You can just live with the possibility that these might be constants (which is unlikely) and just treat them as strings.
Personally I would go for (1) then (3).
This is possible, but I would rather use get_defined_constants(). But make sure all your translations have something in common (like all translations starting with T), so you can tell them apart from other constants.
Try this regular expression to find the define calls:
/\bdefine\(\s*("(?:[^"\\]+|\\(?:\\\\)*.)*"|'(?:[^'\\]+|\\(?:\\\\)*.)*')\s*,\s*("(?:[^"\\]+|\\(?:\\\\)*.)*"|'(?:[^'\\]+|\\(?:\\\\)*.)*')\s*\);/is
So:
$pattern = '/\\bdefine\\(\\s*("(?:[^"\\\\]+|\\\\(?:\\\\\\\\)*.)*"|\'(?:[^\'\\\\]+|\\\\(?:\\\\\\\\)*.)*\')\\s*,\\s*("(?:[^"\\\\]+|\\\\(?:\\\\\\\\)*.)*"|\'(?:[^\'\\\\]+|\\\\(?:\\\\\\\\)*.)*\')\\s*\\);/is';
$str = '<?php define(\'foo\', \'bar\'); define("define(\\\'foo\\\', \\\'bar\\\')", "define(\'foo\', \'bar\')"); ?>';
preg_match_all($pattern, $str, $matches, PREG_SET_ORDER);
var_dump($matches);
I know that eval is evil. But that’s the best way to evaluate the string expressions:
$constants = array();
foreach ($matches as $match) {
eval('$constants['.$match[1].'] = '.$match[1].';');
}
var_dump($constants);
You might not need to go overboard with the regex complexity - something like this will probably suffice
/DEFINE\('(.*?)',\s*'(.*)'\);/
Here's a PHP sample showing how you might use it
$lines=file("myconstants.php");
foreach($lines as $line) {
$matches=array();
if (preg_match('/DEFINE\(\'(.*?)\',\s*\'(.*)\'\);/i', $line, $matches)) {
$name=$matches[1];
$value=$matches[2];
echo "$name = $value\n";
}
}
Not every problem with text should be solved with a regexp, so I'd suggest you state what you want to achieve and not how.
So, instead of using php's parser which is not really useful, or instead of using a completely undebuggable regexp, why not write a simple parser?
<?php
$str = "define('nam\\'e', 'va\\\\\\'lue');\ndefine('na\\\\me2', 'value\\'2');\nDEFINE('a', 'b');";
function getDefined($str) {
$lines = array();
preg_match_all('#^define[(][ ]*(.*?)[ ]*[)];$#mi', $str, $lines);
$res = array();
foreach ($lines[1] as $cnt) {
$p = 0;
$key = parseString($cnt, $p);
// Skip comma
$p++;
// Skip space
while ($cnt{$p} == " ") {
$p++;
}
$value = parseString($cnt, $p);
$res[$key] = $value;
}
return $res;
}
function parseString($s, &$p) {
$quotechar = $s[$p];
if (! in_array($quotechar, array("'", '"'))) {
throw new Exception("Invalid quote character '" . $quotechar . "', input is " . var_export($s, true) . " # " . $p);
}
$len = strlen($s);
$quoted = false;
$res = "";
for ($p++;$p < $len;$p++) {
if ($quoted) {
$quoted = false;
$res .= $s{$p};
} else {
if ($s{$p} == "\\") {
$quoted = true;
continue;
}
if ($s{$p} == $quotechar) {
$p++;
return $res;
}
$res .= $s{$p};
}
}
throw new Exception("Premature end of line");
}
var_dump(getDefined($str));
Output:
array(3) {
["nam'e"]=>
string(7) "va\'lue"
["na\me2"]=>
string(7) "value'2"
["a"]=>
string(1) "b"
}