Replace words - Ignore words between brackets - php

I'm using a UBB parser to convert several bracketed codes to HTML. I also want to use a string replacer to replace some unwanted words.
Now, I'm using this:
foreach($f AS $value) {
$escapeNamesArray[] = '/'.$value['woord'].'/i';
$escapeNamesReplace[] = '<span style="color: gray;">'.$value['vervanging'].'</span>';
}
$string = preg_replace($escapeNamesArray, $escapeNamesReplace, $string);
When I want to replace the word "Hello" with "Hey", everything works fine. But when I place the word "Hello" between brackets, for example:
[url=http://www.hello.com]kdskdsds[/url]
The word "Hello" is replaced as well. How can I change the pattern of the preg_replace function to ignore words between brackets?
Thanks for your reply!

Using preg_replace in HTML-ish situations often turns into a mud pit. I highly recommend you find a different solution to this problem.
I'd suggest letting the parser do its work first, turning everything into valid XHTML. Then use something like SimpleXMLElement or DOMDocument to parse the document. You can then traverse the object, replacing bad strings in each element. When you're done, convert it back to an XHTML string.
This solution is a little more involved, but it's more robust and more flexible, especially if you decide to add more filters and replacements later.
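A minimal sketch of that idea, assuming the parser's output is well-formed XHTML. The word list and the sample markup are made up for illustration, and the replacement is plain text rather than the gray span from the question:

```php
<?php
// Hypothetical word list; in the question it comes from $f ('woord' => 'vervanging').
$replacements = ['hello' => 'hey'];

// Assumed parser output: the URL sits in an attribute, the label in a text node.
$xhtml = '<p>Hello there, <a href="http://www.hello.com">hello again</a></p>';

$doc = new DOMDocument();
// Wrap the fragment in a single root element so it parses as XML.
$doc->loadXML('<root>' . $xhtml . '</root>');

// Visit text nodes only; attribute values such as href are never touched.
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//text()') as $textNode) {
    $textNode->nodeValue = str_ireplace(
        array_keys($replacements),
        array_values($replacements),
        $textNode->nodeValue
    );
}

// Serialize the children of the wrapper back into one string.
$result = '';
foreach ($doc->documentElement->childNodes as $child) {
    $result .= $doc->saveXML($child);
}
echo $result, PHP_EOL;
```

Wrapping each hit in a span would mean splitting text nodes and inserting elements, which is more code but follows the same traversal.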

Lucas is right, but it's just a simple change to your existing code:
You just need to make sure it's ONLY matching words that are not directly in between [ ].
I have just added [ and ] to your pattern array (you need to escape them, as they are normally used for a regex character class). Here is the updated code:
foreach($f AS $value)
{
$escapeNamesArray[] = '/ '.$value['woord'].' /i';
$escapeNamesReplace[] = '<span style="color: gray;">'.$value['vervanging'].'</span>';
}
$string = preg_replace($escapeNamesArray, $escapeNamesReplace, $string);
This is the only actually changed line:
$escapeNamesArray[] = '/ '.$value['woord'].' /i';
This will work with [whatever] [ whatever] [whatever ] but not [ whatever ]
I haven't had a chance to test this, but it should work.
EDIT: Changed the code slightly, please take another look :o)
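A different way to keep the single-regex approach is PCRE's backtracking verbs. This is a sketch, not a tested drop-in: the $f rows and the sample string are invented, and it assumes tags never contain a nested ]. The first alternative consumes a whole [tag] and throws the match away, so the word can only match outside brackets. preg_quote keeps words with special characters from breaking the pattern.

```php
<?php
// Hypothetical rows standing in for the database result in $f.
$f = [
    ['woord' => 'hello', 'vervanging' => 'hey'],
];

$string = 'Hello! [url=http://www.hello.com]say hello[/url]';

foreach ($f as $value) {
    // \[[^\]]*\] matches a complete [tag]; (*SKIP)(*F) then rejects it and
    // resumes after the tag, so only text outside brackets reaches the word.
    $pattern = '/\[[^\]]*\](*SKIP)(*F)|' . preg_quote($value['woord'], '/') . '/i';
    $replace = '<span style="color: gray;">' . $value['vervanging'] . '</span>';
    $string  = preg_replace($pattern, $replace, $string);
}

echo $string, PHP_EOL;
```

Because the words are applied one after another, a later word can still match inside a span inserted by an earlier one, just as in the original loop.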

You can make use of the BBCode PECL extension to do the heavy lifting for you. Check this out:
<?php
function filterWords($content, $argument) {
$badWordList = array(
'complex',
'regular expressions',
'O(n^2)'
);
return str_ireplace($badWordList, '', $content);
}
$bbcodeParserConfig = array(
'' => array(
'type' => BBCODE_TYPE_ROOT,
'content_handling' => 'filterWords'
),
'url' => array(
'type' => BBCODE_TYPE_OPTARG,
'open_tag' => '<a href="{PARAM}">',
'close_tag' => '</a>',
'default_arg' => '{CONTENT}',
'childs' => ''
)
);
$bbcodeParser = bbcode_create($bbcodeParserConfig);
$content = 'This is a complex url that [url=http://www.example.com]tells you nothing about regular expressions or O(n^2) algorithms[/url] and thankfully so!';
var_dump(bbcode_parse($bbcodeParser, $content));
There is also a BBCode parser written in PHP.

I would recommend taking each variable and splitting at the open and close bracket.
If it split at the open bracket then you know it contains an opening bracket. Call the replace on the string to the left of the open bracket (call var1). Then call split on the closing bracket and you know that the string to the left is the contents of the bracket so concatenate it to var 1 (called var2). Then call replace to the string to the right of the last split since it must have been outside of the closing bracket and concatenate the result to var2.
Example:
$exampleStr = "[url=http://www.hello.com]kdskdsds[/url]";
$piecesOfString = explode("[", $exampleStr);
// $piecesOfString[0] = "" --> before the opening bracket so if there was anything there you would have to replace
// $piecesOfString[1] = "url=http://www.hello.com]kdskdsds"
// $piecesOfString[2] = "/url]"
$piecesOfStringSecond = explode("]", $piecesOfString[1]);
// $piecesOfStringSecond[0] = "url=http://www.hello.com" within the brackets so don't replace
// $piecesOfStringSecond[1] = "kdskdsds" //outside bracket so replace
$piecesOfStringSecond = explode("]", $piecesOfString[2]);
// $piecesOfStringSecond[0] = "/url" within the brackets so don't replace
// $piecesOfStringSecond[1] = "" //outside bracket so if length > 0 replace
I haven't tested this thoroughly, but the loop would look something like this:
$exampleStr = "begin[url=http://www.hello.com]kdskdsds[/url]between[url=http://www.second.com]dsfafa[/url]between2[url=http://www.third.com]kjhjkhk[/url]end";
// Stand-in for your real replacement (e.g. the preg_replace call from the question)
$replaceWords = function ($text) {
return str_ireplace('hello', 'hey', $text);
};
$result = '';
$piecesOfStringOpen = explode("[", $exampleStr); //splits the string at the "["
foreach ($piecesOfStringOpen as $j => $piece) {
if ($j == 0) { // you know it will be the first part "begin"
// call replace on it because you know it is outside of brackets
$result .= $replaceWords($piece);
} else {
//this will include:
// $piecesOfStringOpen[1] = "url=http://www.hello.com]kdskdsds"
// $piecesOfStringOpen[2] = "/url]between"
// $piecesOfStringOpen[3] = "url=http://www.second.com]dsfafa"
// etc
$piecesOfStringClose = explode("]", $piece, 2); //splits this piece at the first "]"
// index 0 was inside the brackets (a tag or url), so don't replace
$result .= "[" . $piecesOfStringClose[0];
if (isset($piecesOfStringClose[1])) {
// index 1 was outside the brackets, so replace
$result .= "]" . $replaceWords($piecesOfStringClose[1]);
}
}
}
echo $result;

Related

PHP: preg_replace only first matching string in array

I've started using preg_replace in PHP and I wonder how I can replace only the first matching array key with its array value. I set the preg_replace limit parameter to 1, but it changes the string more than once anyway. I also split my string into single words and examine them one by one:
<?php
$internal_message = 'Hey, this is awesome!';
$words = array(
'/wesome(\W|$)/' => 'wful',
'/wful(\W|$)/' => 'wesome',
'/^this(\W|$)/' => 'that',
'/^that(\W|$)/' => 'this'
);
$splitted_message = preg_split("/[\s]+/", $internal_message);
$words_num = count($splitted_message);
for($i=0; $i<$words_num; $i++) {
$splitted_message[$i] = preg_replace(array_keys($words), array_values($words), $splitted_message[$i], 1);
}
$message = implode(" ", $splitted_message);
echo $message;
?>
I want this to be on output:
Hey, that is awful
(one suffix change, one word change and stops)
Not this:
Hey, this is awesome
(two suffix changes, two word changes and back to original word & suffix...)
Maybe I can simplify this code? I also can't change order of the array keys and values cause there will be more suffixes and single words to change soon. I'm kinda newbie in php coding and I'll be thankful for any help ;>
You may use plain text in the associative array keys that you will use to create dynamic regex patterns from, and use preg_replace_callback to replace the found values with the replacements in one go.
$internal_message = 'Hey, this is awesome!';
$words = array(
'wesome' => 'wful',
'wful' => 'wesome',
'this' => 'that',
'that' => 'this'
);
$rx = '~(?:' . implode("|", array_keys($words)) . ')\b~';
echo "$rx\n";
$message = preg_replace_callback($rx, function($m) use ($words) {
return isset($words[$m[0]]) ? $words[$m[0]] : $m[0];
}, $internal_message);
echo $message;
// => Hey, that is awful!
See the PHP demo.
The regex is
~(?:wesome|wful|this|that)\b~
The (?:wesome|wful|this|that) is a non-capturing group that matches any of the values inside, and \b is a word boundary, a non-consuming pattern that ensures there is no letter, digit or _ after the suffix.
The preg_replace_callback parses the string once, and when a match occurs, it is passed to the anonymous function (function($m)) together with the $words array (use ($words)) and if the $words array contains the found key (isset($words[$m[0]])) the corresponding value is returned ($words[$m[0]]) or the found match is returned otherwise ($m[0]).
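If the keys can contain regex metacharacters, building the alternation directly from them breaks the pattern. A hedged variant of the same single-pass idea, with invented keys, runs each key through preg_quote first. The trailing \b is dropped here because a word boundary does not exist after a non-word character such as +.

```php
<?php
// Hypothetical keys that contain regex metacharacters.
$words = [
    'c++' => 'cpp',
    'a.b' => 'a-b',
];

// Escape every key, passing the delimiter so it is escaped too.
$quoted = array_map(function ($key) {
    return preg_quote($key, '~');
}, array_keys($words));

$rx = '~(?:' . implode('|', $quoted) . ')~';

// One pass over the string: each match is looked up in $words exactly once.
$message = preg_replace_callback($rx, function ($m) use ($words) {
    return isset($words[$m[0]]) ? $words[$m[0]] : $m[0];
}, 'I like c++ and a.b but not axb');

echo $message, PHP_EOL;
```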

regex assistance

I am trying to match a semi-dynamically generated string, so I can check that it's in the correct format and then extract the information I need from it. My problem is that no matter how hard I try, I can't grasp regex for the life of me, even with the help of so-called generators.
What I have is a couple of different strings like the following: [#img:1234567890], [#user:1234567890] and [#file:file_name-with.ext]. Strings like these are meant to pass through a filter so they can be replaced with links and/or more readable names. But try as I might, I can't come up with a regex for any of them.
I am looking for the format [#word:], from which I will strip the [, ], # and word so I can then query my DB for whatever it is and work with it accordingly. Just the regex bit is holding me back.
Not sure what you mean by generators. I always use online matchers to see that my test cases work. @Virendra almost had it but forgot to escape the [] characters.
/\[#(\w+):(.*)\]/
You need to start and end with a regex delimiter, in this case the '/' character.
Then we escape the '[', which regex otherwise uses to open a character class (a set or range of characters), hence the '\['.
Next we match a literal '#' symbol.
Now we want to save this next match so we can use it later so we surround it with ().
\w matches a single word character (a letter, digit, or underscore), so \w+ matches a run of them and stops at spaces, punctuation, or line breaks.
Again match a literal :.
Maybe useful to have the second part in a match group as well so (.*) will match any character any number of times, and save it for you.
Then we escape the closing ] as we did earlier.
Since it sounds like you want to use the matches later in a query we can use preg_match to save the matches to an array.
$pattern = '/\[#(\w+):(.*)\]/';
$subject = '[#user:1234567890]';
preg_match($pattern, $subject, $matches);
print_r($matches);
Would output
array(
[0] => '[#user:1234567890]', // Full match
[1] => 'user', // First match
[2] => '1234567890' // Second match
)
An especially helpful tool I've found is txt2re
Here's what I would do.
<pre>
<?php
$subj = 'An image:[#img:1234567890], a user:[#user:1234567890] and a file:[#file:file_name-with.ext]';
preg_match_all('~(?<match>\[#(?<type>[^:]+):(?<value>[^\]]+)\])~',$subj,$matches,PREG_SET_ORDER);
foreach ($matches as &$arr) unset($arr[0],$arr[1],$arr[2],$arr[3]);
print_r($matches);
?>
</pre>
This will output
Array
(
[0] => Array
(
[match] => [#img:1234567890]
[type] => img
[value] => 1234567890
)
[1] => Array
(
[match] => [#user:1234567890]
[type] => user
[value] => 1234567890
)
[2] => Array
(
[match] => [#file:file_name-with.ext]
[type] => file
[value] => file_name-with.ext
)
)
And here's a pseudo version of how I would use the preg_replace_callback() function:
function replace_shortcut($matches) {
global $users;
switch (strtolower($matches['type'])) {
case 'img' : return '<img src="images/img_'.$matches['value'].'.jpg" />';
case 'file' : return ''.$matches['value'].'';
// add id of each user in array
case 'user' : $users[] = (int) $matches['value']; return '%s';
default : return $matches['match'];
}
}
$users = array();
$replaceArr = array();
$subj = 'An image:[#img:1234567890], a user:[#user:1234567890] and a file:[#file:file_name-with.ext]';
// escape percentage signs to avoid complications in the vsprintf function call later
$subj = strtr($subj,array('%'=>'%%'));
$subj = preg_replace_callback('~(?<match>\[#(?<type>[^:]+):(?<value>[^\]]+)\])~','replace_shortcut',$subj);
if (!empty($users)) {
// connect to DB and check users
$query = " SELECT `id`,`nick`,`date_deleted` IS NOT NULL AS 'deleted'
FROM `users` WHERE `id` IN ('".implode("','",$users)."')";
// query
// ...
// and catch results
while ($row = $con->fetch_array()) {
// position of this id in users array:
$idx = array_search($row['id'],$users);
$nick = htmlspecialchars($row['nick']);
$replaceArr[$idx] = $row['deleted'] ?
"<span class=\"user_deleted\">{$nick}</span>" :
"{$nick}";
// delete this key so that we can check id's not found later...
unset($users[$idx]);
}
// in here:
foreach ($users as $key => $value) {
$replaceArr[$key] = '<span class="user_unknown">User'.$value.'</span>';
}
// replace each user reference marked with %s in $subj
$subj = vsprintf($subj,$replaceArr);
} else {
// remove extra percentage signs we added for vsprintf function
$subj = preg_replace('~%{2}~','%',$subj);
}
unset($query,$row,$nick,$idx,$key,$value,$users,$replaceArr);
echo $subj;
You can try something like this:
/\[#(\w+):([^]]*)\]/
\[ escapes the [ character (otherwise interpreted as a character set); \w means any "word" character, and [^]]* means any non-] character (to avoid matching past the end of the tag, as .* might). The parens group the various matched parts so that you can use $1 and $2 in preg_replace to generate the replacement text:
echo preg_replace('/\[#(\w+):([^]]*)\]/', '$1 $2', '[#link:abcdef]');
prints link abcdef
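The difference between .* and [^]]* only shows up when a string holds more than one tag. A small comparison on a made-up subject:

```php
<?php
// Invented subject with two tags on one line.
$subject = '[#img:1] and [#user:2]';

// Greedy .* runs to the LAST ] in the string, so both tags collapse into one match.
preg_match_all('/\[#(\w+):(.*)\]/', $subject, $greedy);

// [^]]* cannot cross a ], so each tag is matched on its own.
preg_match_all('/\[#(\w+):([^]]*)\]/', $subject, $bounded);

print_r($greedy[2]);  // one value: "1] and [#user:2"
print_r($bounded[2]); // two values: "1" and "2"
```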

PHP - preg_replace not matching multiple occurrences

Trying to replace a string, but it seems to only match the first occurrence, and if I have another occurrence it doesn't match anything, so I think I need to add some sort of end delimiter?
My code:
$mappings = array(
'fname' => $prospect->forename,
'lname' => $prospect->surname,
'cname' => $prospect->company,
);
foreach($mappings as $key => $mapping) if(empty($mapping)) $mappings[$key] = '$2';
$match = '~{(.*)}(.*?){/.*}$~ise';
$source = 'Hello {fname}Default{/fname} {lname}Last{/lname}';
// $source = 'Hello {fname}Default{/fname}';
$text = preg_replace($match, '$mappings["$1"]', $source);
So if I use the $source that's commented, it matches fine, but if I use the one currently in the code above where there's 2 matches, it doesn't match anything and I get an error:
Message: Undefined index: fname}Default{/fname} {lname
Filename: schedule.php(62) : regexp code
So am I right in saying I need to provide an end delimiter or something?
Thanks,
Christian
Apparently your regexp matches fname}Default{/fname} {lname instead of Default.
As I mentioned here use {(.*?)} instead of {(.*)}.
{ has special meaning in regexps so you should escape it \\{.
And I recommend using preg_replace_callback instead of the e modifier (you have more flow control and syntax highlighting, and it's impossible to force your program to execute malicious code). The e modifier was also deprecated in PHP 5.5 and removed in PHP 7.0.
Last mistake you're making is not checking whether the requested index exists. :)
My solution would be:
<?php
class A { // Of course with better class name :)
public $mappings = array(
'fname' => 'Tested'
);
public function callback( $match)
{
if( isset( $this->mappings[$match[1]])){
return $this->mappings[$match[1]];
}
return $match[2];
}
}
$a = new A();
$match = '~\\{([^}]+)\\}(.*?)\\{/\\1\\}~is';
$source = 'Hello {fname}Default{/fname} {lname}Last{/lname}';
echo preg_replace_callback( $match, array($a, 'callback'), $source);
This results in:
[vyktor@grepfruit tmp]$ php stack.php
Hello Tested Last
Your regular expression is anchored to the end of the string, so your closing {/whatever} must be the last thing in your string. Also, since your opening and closing tags are simply .*, there's nothing in there to make sure they match up. What you want is to make sure that your closing tag matches your opening one: using a backreference like {(.+)}(.*?){/\1} will make sure they're the same.
I'm sure there are other gotchas in there. If you have control over the format of the strings you're working with (i.e. you're rolling your own templating language), I'd seriously consider moving to a simpler, easier-to-match format. Since you're not 'saving' the default values, having enclosing tags provides you with no added value but makes the parsing more complicated. Just using $VARNAME would work just as well and be easier to match (\$[A-Z]+), without involving backreferences or having to explicitly state you're using non-greedy matching.
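A rough sketch of that simpler $VARNAME format, with invented names and values. Leaving unknown placeholders untouched is an assumption made here, not something the question specifies:

```php
<?php
// Hypothetical values; LNAME is deliberately missing.
$mappings = ['FNAME' => 'Christian'];

// Single quotes keep PHP from interpolating $FNAME itself.
$source = 'Hello $FNAME $LNAME';

$text = preg_replace_callback('/\$([A-Z]+)/', function ($m) use ($mappings) {
    // Known name: substitute the value. Unknown name: leave the placeholder as is.
    return isset($mappings[$m[1]]) ? $mappings[$m[1]] : $m[0];
}, $source);

echo $text, PHP_EOL;
```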

Parsing plain text in such a way that will recognise a custom if statement

I have the following string:
$string = "The man has {NUM_DOGS} dogs.";
I'm parsing this by running it through the following function:
function parse_text($string)
{
global $num_dogs;
$string = str_replace('{NUM_DOGS}', $num_dogs, $string);
return $string;
}
parse_text($string);
Where $num_dogs is a preset variable. Depending on $num_dogs, this could return any of the following strings:
The man has 1 dogs.
The man has 2 dogs.
The man has 500 dogs.
The problem is that in the case that "the man has 1 dogs", dog is pluralised, which is undesired. I know that this could be solved simply by not using the parse_text function and instead doing something like:
if($num_dogs == 1){
$string = "The man has 1 dog.";
}else{
$string = "The man has $num_dogs dogs.";
}
But in my application I'm parsing more than just {NUM_DOGS} and it'd take a lot of lines to write all the conditions.
I need a shorthand way which I can write into the initial $string which I can run through a parser, which ideally wouldn't limit me to just two true/false possibilities.
For example, let
$string = 'The man has {NUM_DOGS} [{NUM_DOGS}|0=>"dogs",1=>"dog called fred",2=>"dogs called fred and harry",3=>"dogs called fred, harry and buster"].';
Is it clear what's happened at the end? I've attempted to initiate the creation of an array using the part inside the square brackets that's after the vertical bar, then compare the key of the new array with the parsed value of {NUM_DOGS} (which by now will be the $num_dogs variable at the left of the vertical bar), and return the value of the array entry with that key.
If that's not totally confusing, is it possible using the preg_* functions?
The premise of your question is that you want to match a specific pattern and then replace it after performing additional processing on the matched text.
Seems like an ideal candidate for preg_replace_callback
The regular expressions for capturing matched parenthesis, quotes, braces etc. can become quite complicated, and to do it all with a regular expression is in fact quite inefficient. In fact you'd need to write a proper parser if that's what you require.
For this question I'm going to assume a limited level of complexity, and tackle it with a two stage parse using regex.
First of all, the simplest regex I can think of for capturing tokens between curly braces.
/{([^}]+)}/
Let's break that down.
{ # A literal opening brace
( # Begin capture
[^}]+ # Everything that's not a closing brace (one or more times)
) # End capture
} # Literal closing brace
When applied to a string with preg_match_all the results look something like:
array (
0 => array (
0 => '{TOK_ONE}',
1 => '{TOK_TWO|0=>"no", 1=>"one", 2=>"two"}',
),
1 => array (
0 => 'TOK_ONE',
1 => 'TOK_TWO|0=>"no", 1=>"one", 2=>"two"',
),
)
Looks good so far.
Please note that if you have nested braces in your strings, i.e. {TOK_TWO|0=>"hi {x} y"}, this regex will not work. If this won't be a problem, skip down to the next section.
It is possible to do top-level matching, but the only way I have ever been able to do it is via recursion. Most regex veterans will tell you that as soon as you add recursion to a regex, it stops being a regex.
This is where the additional processing complexity kicks in, and with long complicated strings it's very easy to run out of stack space and crash your program. Use it carefully if you need to use it at all.
The recursive regex taken from one of my other answers and modified a little.
/{((?:[^{}]*|(?R))*)}/
Broken down.
{ # literal brace
( # begin capture
(?: # don't create another capture set
[^{}]* # everything not a brace
|(?R) # OR recurse
)* # none or more times
) # end capture
} # literal brace
And this time the output only matches top-level braces
array (
0 => array (
0 => '{TOK_ONE|0=>"a {nested} brace"}',
),
1 => array (
0 => 'TOK_ONE|0=>"a {nested} brace"',
),
)
Again, don't use the recursive regex unless you have to. (Your system may not even support them if it has an old PCRE library)
With that out of the way, we need to work out if the token has options associated with it. Instead of having two fragments to be matched as per your question, I'd recommend keeping the options with the token as per my examples. {TOKEN|0=>"option"}
Let's assume $match contains a matched token. If we check for a pipe | and take the substring of everything after it, we'll be left with your list of options, and again we can use regex to parse them out. (Don't worry, I'll bring everything together at the end)
/(\d+)\s*=>\s*"([^"]*)",?/
Broken down.
(\d+) # Capture one or more decimal digits
\s* # Any amount of whitespace (allows you to do 0 => "")
=> # Literal pointy arrow
\s* # Any amount of whitespace
" # Literal quote
([^"]*) # Capture anything that isn't a quote
" # Literal quote
,? # Maybe followed by a comma
And an example match
array (
0 => array (
0 => '0=>"no",',
1 => '1 => "one",',
2 => '2=>"two"',
),
1 => array (
0 => '0',
1 => '1',
2 => '2',
),
2 => array (
0 => 'no',
1 => 'one',
2 => 'two',
),
)
If you want to use quotes inside your quotes, you'll have to make your own recursive regex for it.
Wrapping up, here's a working example.
Some initialisation code.
$options = array(
'WERE' => 1,
'TYPE' => 'cat',
'PLURAL' => 1,
'NAME' => 2
);
$string = 'There {WERE|0=>"was a",1=>"were"} ' .
'{TYPE}{PLURAL|1=>"s"} named bob' .
'{NAME|1=>" and bib",2=>" and alice"}';
And everything together.
$string = preg_replace_callback('/{([^}]+)}/', function($match) use ($options) {
$match = $match[1];
if (false !== $pipe = strpos($match, '|')) {
$tokens = substr($match, $pipe + 1);
$match = substr($match, 0, $pipe);
} else {
$tokens = array();
}
if (isset($options[$match])) {
if ($tokens) {
preg_match_all('/(\d+)\s*=>\s*"([^"]*)",?/', $tokens, $tokens);
$tokens = array_combine($tokens[1], $tokens[2]);
return $tokens[$options[$match]];
}
return $options[$match];
}
return '';
}, $string);
Please note the error checking is minimal, there will be unexpected results if you pick options that don't exist.
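One way to tighten that up is sketched below. pick_option is a hypothetical helper, not part of the answer above: it parses an option string and falls back to a default when the requested key is missing. It uses (\d+) so multi-digit keys are captured whole.

```php
<?php
/**
 * Hypothetical helper: pick the option text for $value from a string such as
 * '0=>"no",1=>"one"', or return $default when no key matches.
 */
function pick_option($optionString, $value, $default = '')
{
    // preg_match_all returns 0 (or false) when nothing could be parsed.
    if (!preg_match_all('/(\d+)\s*=>\s*"([^"]*)",?/', $optionString, $found)) {
        return $default;
    }
    // Keys such as "0" and "10" become integer array keys here.
    $map = array_combine($found[1], $found[2]);
    return array_key_exists($value, $map) ? $map[$value] : $default;
}

echo pick_option('0=>"was a",1=>"were"', 1), PHP_EOL;
echo pick_option('10=>"ten",1=>"one"', 10, '?'), PHP_EOL;
echo pick_option('1=>"s"', 0, '(none)'), PHP_EOL;
```

Inside the callback, the lookup line could call this helper instead of indexing the combined array directly.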
There's probably a lot simpler way to do all of this, but I just took the idea and ran with it.
First of all, it is a bit debatable, but if you can easily avoid it, just pass $num_dogs as an argument to the function as most people believe global variables are evil!
Next, for the getting the "s", I generally do something like this:
$dogs_plural = ($num_dogs == 1) ? '' : 's';
Then just do something like this:
$your_string = "The man has $num_dogs dog$dogs_plural";
It's essentially the same thing as doing an if/else block, but with fewer lines of code, and you only have to write the text once.
As for the other part, I am STILL confused about what you're trying to do, but I believe you are looking for some sort of way to convert
[{NUM_DOGS}|0=>"dogs",1=>"dog called fred",2=>"dogs called fred and harry",3=>"dogs called fred, harry and buster"]
into:
switch ($num_dogs) {
case 0:
return 'dogs';
break;
case 1:
return 'dog called fred';
break;
case 2:
return 'dogs called fred and harry';
break;
case 3:
return 'dogs called fred, harry and buster';
break;
}
The easiest way is to try to use a combination of explode() and regex to then get it to do something like I have above.
In a pinch, I have done something similar to what you're asking with an implementation vaguely like the code below.
This is nowhere near as feature-rich as @Mike's answer, but it has done the trick in the past.
/**
* This function pluralizes words, as appropriate.
*
* It is a completely naive, example-only implementation.
* There are existing "inflector" implementations that do this
* quite well for many/most *English* words.
*/
function pluralize($count, $word)
{
if ($count === 1)
{
return $word;
}
return $word . 's';
}
/**
* Matches template patterns in the following forms:
* {NAME} - Replaces {NAME} with value from $values['NAME']
* {NAME:word} - Replaces {NAME:word} with 'word', pluralized using the pluralize() function above.
*/
function parse($template, array $values)
{
$callback = function ($matches) use ($values) {
$number = $values[$matches['name']];
if (array_key_exists('word', $matches)) {
return pluralize($number, $matches['word']);
}
return $number;
};
$pattern = '/\{(?<name>.+?)(:(?<word>.+?))?\}/i';
return preg_replace_callback($pattern, $callback, $template);
}
Here are some examples similar to your original question...
echo parse(
'The man has {NUM_DOGS} {NUM_DOGS:dog}.' . PHP_EOL,
array('NUM_DOGS' => 2)
);
echo parse(
'The man has {NUM_DOGS} {NUM_DOGS:dog}.' . PHP_EOL,
array('NUM_DOGS' => 1)
);
The output is:
The man has 2 dogs.
The man has 1 dog.
It may be worth mentioning that in larger projects I've invariably ended up ditching any custom rolled inflection in favour of GNU gettext which seems to be the most sane way forward once multi-lingual is a requirement.
This was copied from an answer posted by flussence back in 2009 in response to this question:
You might want to look at the gettext extension. More specifically, it sounds like ngettext() will do what you want: it pluralises words correctly as long as you have a number to count from.
print ngettext('odor', 'odors', 1); // prints "odor"
print ngettext('odor', 'odors', 4); // prints "odors"
print ngettext('%d cat', '%d cats', 4); // prints "4 cats"
You can also make it handle translated plural forms correctly, which is its main purpose, though it's quite a lot of extra work to do.

How to remove commas between double quotes in PHP

Hopefully, this is an easy one. I have an array with lines that contain output from a CSV file. What I need to do is simply remove any commas that appear between double-quotes.
I'm stumbling through regular expressions and having trouble. Here's my sad-looking code:
<?php
$csv_input = '"herp","derp","hey, get rid of these commas, man",1234';
$pattern = '(?<=\")/\,/(?=\")'; //this doesn't work
$revised_input = preg_replace ( $pattern , '' , $csv_input);
echo $revised_input;
//would like revised input to echo: "herp","derp","hey get rid of these commas man",1234
?>
Thanks VERY much, everyone.
Original Answer
You can use str_getcsv() for this, as it is designed for parsing CSV strings:
$out = array();
$array = str_getcsv($csv_input);
foreach($array as $item) {
$out[] = str_replace(',', '', $item);
}
$out is now an array of elements without any commas in them, which you can then just implode as the quotes will no longer be required once the commas are removed:
$revised_input = implode(',', $out);
Update for comments
If the quotes are important to you then you can just add them back in like so:
$revised_input = '"' . implode('","', $out) . '"';
Another option is to use one of the str_putcsv() (not a standard PHP function) implementations floating about out there on the web such as this one.
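Without a third-party str_putcsv(), PHP's own fputcsv() can write to an in-memory stream. This is a sketch under the assumption that standard CSV quoting is acceptable: fputcsv only quotes fields that need it, so the quotes around herp and derp are not restored. The escape argument is passed explicitly because newer PHP versions deprecate relying on its default.

```php
<?php
$csv_input = '"herp","derp","hey, get rid of these commas, man",1234';

// Parse the line, then strip commas inside each field.
$fields = array_map(function ($field) {
    return str_replace(',', '', $field);
}, str_getcsv($csv_input, ',', '"', '\\'));

// Write the cleaned fields back out as one CSV line via an in-memory stream.
$handle = fopen('php://memory', 'r+');
fputcsv($handle, $fields, ',', '"', '\\');
rewind($handle);
$revised_input = rtrim(stream_get_contents($handle), "\n");
fclose($handle);

echo $revised_input, PHP_EOL;
```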
This is a very naive approach that only works if every separator comma sits directly next to a quote, because it removes any comma that has a non-quote character on both sides.
<?php
$csv_input = '"herp","derp","hey, get rid of these commas, man",1234';
$pattern = '/([^"])\,([^"])/'; // a comma with a non-quote character on each side
$revised_input = preg_replace ( $pattern , "$1$2" , $csv_input);
echo $revised_input;
//output for this is: "herp","derp","hey get rid of these commas man",1234
It should definitely be tested more, but it works in this case.
It might not work where you don't have quotes in the string:
one,two,three,four -> onetwothreefour
EDIT : Corrected the issues with deleting spaces and neighboring letters.
Well, I wasn't lazy and wrote a small function to do exactly what you need:
function clean_csv_commas($csv){
$len = strlen($csv);
$inside_block = FALSE;
$out='';
for($i=0;$i<$len;$i++){
if($csv[$i]=='"'){
if($inside_block){
$inside_block=FALSE;
}else{
$inside_block=TRUE;
}
}
if($csv[$i]==',' && $inside_block){
// do nothing
}else{
$out.=$csv[$i];
}
}
return $out;
}
You might be coming at this from the wrong angle.
Instead of removing the commas from the text (presumably so you can then split the string on the commas to get the separate elements), how about writing something that works on the quotes?
Once you've found an opening quote, you can check the rest of the string; anything before the next quote is part of this element. You can add some checking here to look for escaped quotes, too, so things like:
"this is a \"quote\""
will still be read properly.
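A sketch of that quote-driven approach. split_quoted is a made-up name, and it assumes backslash-escaped quotes as in the example above (standard CSV doubles the quote instead):

```php
<?php
/**
 * Hypothetical helper: split one line on commas that are outside double quotes.
 * Assumes quotes inside a value are escaped as \" (not doubled as in RFC-style CSV).
 */
function split_quoted($line)
{
    $fields = [];
    $current = '';
    $inQuotes = false;
    $len = strlen($line);
    for ($i = 0; $i < $len; $i++) {
        $ch = $line[$i];
        if ($ch === '\\' && $i + 1 < $len && $line[$i + 1] === '"') {
            $current .= '"';        // escaped quote: keep the quote, drop the backslash
            $i++;
        } elseif ($ch === '"') {
            $inQuotes = !$inQuotes; // opening or closing quote, not part of the value
        } elseif ($ch === ',' && !$inQuotes) {
            $fields[] = $current;   // separator outside quotes ends the field
            $current = '';
        } else {
            $current .= $ch;        // ordinary character, or a comma inside quotes
        }
    }
    $fields[] = $current;
    return $fields;
}

print_r(split_quoted('"herp","hey, \"you\", man",1234'));
```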
Not exactly an answer you've been looking for - But I've used it for cleaning commas in numbers in CSV.
$csv = preg_replace('%\"([^\"]*)(,)([^\"]*)\"%i','$1$3',$csv);
"3,120", 123, 345, 567 ==> 3120, 123, 345, 567
