I have a file that contains a collection strings. All of the strings begin with the same set of characters and end with the same character. I need to find all of the strings that match a certain pattern, and then remove particular characters from them before saving the file. Each string looks like this:
Data_*: " ... "
where Data_ is the same for each string, the asterisk is an incrementing integer that is either two or three digits, and the colon and the double quotation marks are the same for each string. The ... is completely different in every string and it's the part of each I need to work with. I need to remove all double quotation marks from the ... , preserving the enclosing double quotation marks. I don't need to replace them, just remove them.
So for example, I need this...
Data_83: "He said, "Yes!" to the question"
to become this...
Data_83: "He said, Yes! to the question"
I am familiar with PHP and would like to use this. I know how to do something like...
<?php
$filename = 'path/to/file';
$content = file_get_contents($filename);
$new_content = str_replace('"', '', $content);
file_put_contents($filename, $new_content);
And I'm pretty sure a regular expression will be what I'm wanting to use to find the strings and remove the extra double quotation marks. But I'm very new to regular expressions and need some help here.
EDIT:
I should have mentioned, the file is a PHP file containing an object. It looks a bit like this:
<?php
$thing = {
Data_83: "He said, "Yes!" to the question",
Data_84: "Another string with "unwanted" quotes"
}
You may use preg_replace_callback with a regex like
'~^(\h*Data_\d{2,}:\h*")(.*)"~m'
Note that you may make it safer if you specify an optional , at the end of the line: '~^(\h*Data_\d{2,}:\h*")(.*)",?\h*$~m' but you might need to introduce another capturing group then (around ,?\h*, and then append $m[3] in the preg_replace_callback callback function).
Details
^ - start of the line (m is a multiline modifier)
(\h*Data_\d{2,}:\h*") - Group 1 ($m[1]):
\h* - 0+ horizontal whitespaces
Data_ - Data_ substring
\d{2,} - 2 or more digits
: - a colon
\h* - 0+ horizontal whitespaces
" - double quote
(.*) - Group 2 ($m[2]): any 0+ chars other than line break chars, as many as possible, up to the last...
" - double quote (on a line).
The $m represents the whole match object, and you only need to remove the " inside $m[2], the second capture.
See the PHP demo:
preg_replace_callback('~^(\h*Data_\d{2,}:\h*")(.*)"~m', function($m) {
return $m[1] . str_replace('"', '', $m[2]) . '"';
}, $content);
Not as elegant but you could create a UDF:
function RemoveNestedQuotes($string)
{
$firstPart = explode(":", $string)[0];
preg_match('/"(.*)"/', $string, $matches, PREG_OFFSET_CAPTURE);
$tmpString = $matches[1][0];
return $firstPart . ': "' . preg_replace('/"/', '', $tmpString) . '"';
}
example:
$string = 'Data_83: "He said, "Yes!" to the question"';
echo RemoveNestedQuotes($string);
// Data_83: "He said, Yes! to the question"
One more step after str_replace with implode and explode. You can just do it like this.
<?php
$string = 'Data_83: "He said, "Yes!" to the question"';
$string = str_replace('"', '', $string);
echo $string =implode(': "',explode(': ',$string)).'"';
?>
Demo : https://eval.in/912466
Program Output
Data_83: "He said, Yes! to the question"
Just to replace " quotes
<?php
$string = 'Data_83: "He said, "Yes!" to the question"';
echo preg_replace('/"/', '', $string);
?>
Demo : https://eval.in/912457
The way I see it, you don't need to make any preg_replace_callback() calls or a convoluted run of explosions and replacements. You merely need to disqualify the 2 double quotes that you wish to retain and match the rest for removal.
Code: (Demo)
$string = 'Data_83: "He said, "Yes!" to the question",
Data_184: "He said, "WTF!" to the question"';
echo preg_replace('/^[^"]+"(*SKIP)(*FAIL)|"(?!,\R|$)/m','',$string);
Output:
Data_83: "He said, Yes! to the question",
Data_184: "He said, WTF! to the question"
Pattern Demo
/^[^"]+"(*SKIP)(*FAIL)|"(?!,?$)/m
This pattern says:
match from the start of each line until you reach the first double quote, then DISQUALIFY it.
then after the |, match all double quotes that are not optionally followed by a comma then the end of line.
While this pattern worked on regex101 with my sample input, when I transferred it to the php sandbox to whack together a demo, I needed to add \R to maintain accuracy. You can test to see which is appropriate for your server/environment.
Related
I have a string in PHP
$string = "Dogs are Jonny's favorite pet";
I want to use regex or some method to remove s or 's from the end of all words in the string.
The desired output would be:
$revisedString = "Dog are Jonny favorite pet";
Here is my current approach:
<?php
$string = "Dogs are Jonny's favorite pet";
$stringWords = explode(" ", $string);
$counter = 0;
foreach($stringWords as $string) {
if(substr($string, -1) == s){
$stringWords[$counter] = trim($string, "s");
}
if(strpos($string, "'s") !== false){
$stringWords[$counter] = trim($string, "'s");
}
$counter = $counter + 1;
}
print_r($stringWords);
$newString = "";
foreach($stringWords as $string){
$newString = $newString . $string . " ";
}
echo $newString;
}
?>
How would this be achieved with REGEX?
For general use, you must leverage much more sophisticated technique than an English-ignorant regex pattern. There may be fringe cases where the following pattern fails by removing an s that it shouldn't. It could be a name, an acronym, or something else.
As an unreliable solution, you can optionally match an apostrophe then match a literal s if it is not immediately preceded by another s. Adding a word boundary (\b) on the end improves the accuracy that you are matching the end of words.
Code: (Demo)
$string = "The bass can access the river's delta from the ocean. The fishermen, assassins, and their friends are happy on the banks";
var_export(preg_replace("~'?(?<!s)s\b~", '', $string));
Output:
'The bass can access the river delta from the ocean. The fishermen, assassin, and their friend are happy on the bank'
PHP Live Regex always helped me a lot in such moments. Even already knowing how REGEX works, I still use it just to be sure some times.
To make use of REGEX in your case, you can use preg_replace().
<?php
// Your string.
$string = "Dogs are Jonny's favorite pet";
// The vertical bar means "or" and the backslash
// before the apostrophe is needed so you don't end
// your pattern string since we're using single quotes
// to delimit it. "\s" means a single space.
$regex_pattern = '/\'s\s|s\s|s$/';
// Fill the preg_replace() with the pattern, the replacement
// (a single space in this case), your string, -1 (so preg_replace()
// will replace all the matches) and a variable of your desire
// to be the "counter" (preg_replace() will automatically
// fill it).
$newString = preg_replace($regex_pattern, ' ', $string, -1, $counter);
// Use the rtrim() to remove spaces at the right of the sentence.
$newString = rtrim($newString, " ");
echo "New string: " . $newString . ". ";
echo "Replacements: " . $counter . ".";
?>
In this case, the function will identify any "'s" or "s" with spaces (\s) after them and then replace them with a single space.
The preg_replace() will also count all the replacements and register them automatically on $counter or any variable you place there instead.
Edit:
Phil's comment is right and indeed my previous REGEX would lose a "s" at the end of the string. Adding "|s$" will solve it. Again, "|" means "or" and the "$" means that the "s" must be at the end of the string.
In attention to mickmackusa's comment, my solution is meant only to remove "s" characters at the end of words inside the string as this was Sparky Johnson' request here. Removing plurals would require a complex code since not only we need to remove "s" characters from plural only words but also change verbs and other stuff.
There is a string in format:
else if($rule=='somerule1')
echo '{"s":1,"n":"name surname"}';
else if($rule=='somerule2')
echo '{"s":1,"n":"another text here"}';
...
"s" can have only number, "n" any text.
In input I have $rule value, and I need to remove the else if block that corresponds to this value. I am trying this:
$str = preg_replace("/else if\(\$rule=='$rule'\)\necho '{\"s\":[0-9],\"n\":\".*\"/", "", $str);
where $str is a string, that contains blocks I mentioned above, $rule is a string with rule I need to remove. But the function returns $str without changes.
What do I do wrong?
For example, script to change "s" value to 1 works nice:
$str = preg_replace("/$rule'\)\necho '{\"s\":[0-9]/", $rule."')\necho '{\"s\":1", $str);
So, probably, I am doing mistake with = symbol, or maybe with space, or with .*.
The regex pattern can be much less strict, much simpler, and far easier to read/maintain.
You need to literally match the first line (conditional expression) with the only dynamic component being the $rule variable, then match the entire line that immediately follows it.
Code: (Demo)
$contents = <<<'TEXT'
else if($rule=='somerule1')
echo '{"s":1,"n":"name surname"}';
else if($rule=='somerule2')
echo '{"s":1,"n":"another text here"}';
TEXT;
$rule = "somerule1";
echo preg_replace("~\Qelse if(\$rule=='$rule')\E\R.+~", "", $contents);
Output:
else if($rule=='somerule2')
echo '{"s":1,"n":"another text here"}';
So, what have I done? Here's the official pattern demo.
\Q...\E means "treat everything in between these two metacharacters literally"
Then the only character that needs escaping is the first $, this is not to stop it from being interpreted as a end-of-string metacharacter, but as the start of the $rule variable because the pattern is wrapped in double quotes.
The second occurrence of $rule in the pattern DOES need to be interpreted as the variable so it is not escaped.
The \R is a metacharacter which means \n, \r and \r\n
Finally match all of the next line with the "any character" . with a one or more quantifier (+).
The pattern does not match because you have to use a double escape to match the backslash \\\$.
Apart from that, you are not matching the whole line as this part ".*\" stops at the double quote before }';
$str = 'else if($rule==\'somerule1\')
echo \'{"s":1,"n":"name surname"}\';
else if($rule==\'somerule2\')
echo \'{"s":1,"n":"another text here"}\';';
$rule = "somerule1";
$pattern = "/else if\(\\\$rule=='$rule'\)\necho '{\"s\":\d+,\"n\":\"[^\"]*\"}';/";
$str = preg_replace($pattern, "", $str);
echo $str;
Output
else if($rule=='somerule2')
echo '{"s":1,"n":"another text here"}';
Php demo
Maybe it can not be solved this issue as I want, but maybe you can help me guys.
I have a lot of malformed words in the name of my products.
Some of them has leading ( and trailing ) or maybe one of these, it is same for / and " signs.
What I do is that I am explode the name of the product by spaces, and examines these words.
So I want to replace them to nothing. But, a hard drive could be 40GB ATA 3.5" hard drive. I need to process all the word, but I can not use the same method for 3.5" as for () or // because this 3.5" is valid.
So I only need to replace the quotes, when it is at the start of the string AND at end of the string.
$cases = [
'(testone)',
'(testtwo',
'testthree)',
'/otherone/',
'/othertwo',
'otherthree/',
'"anotherone',
'anothertwo"',
'"anotherthree"',
];
$patterns = [
'/^\(/',
'/\)$/',
'~^/~',
'~/$~',
//Here is what I can not imagine, how to add the rule for `"`
];
$result = preg_replace($patterns, '', $cases);
This is works well, but can it be done in one regex_replace()? If yes, somebody can help me out the pattern(s) for the quotes?
Result for quotes should be this:
'"anotherone', //no quote at end leave the leading
'anothertwo"', //no quote at start leave the trailin
'anotherthree', //there are quotes on start and end so remove them.
You may use another approach: rather than define an array of patterns, use one single alternation based regex:
preg_replace('~^[(/]|[/)]$|^"(.*)"$~s', '$1', $s)
See the regex demo
Details:
^[(/] - a literal ( or / at the start of the string
| - or
[/)]$ - a literal ) or / at the end of the string
| - or
^"(.*)"$ - a " at the start of the string, then any 0+ characters (due to /s option, the . matches a linebreak sequence, too) that are captured into Group 1, and " at the end of the string.
The replacement pattern is $1 that is empty when the first 2 alternatives are matched, and contains Group 1 value if the 3rd alternative is matched.
Note: In case you need to replace until no match is found, use a preg_match with preg_replace together (see demo):
$s = '"/some text/"';
$re = '~^[(/]|[/)]$|^"(.*)"$~s';
$tmp = '';
while (preg_match($re, $s) && $tmp != $s) {
$tmp = $s;
$s = preg_replace($re, '$1', $s);
}
echo $s;
This works
preg_replace([[/(]?(.+)[/)]?|/\"(.+)\"/], '$1', $string)
I'm trying to split a string at question marks, exclamation marks, or periods, but at the same time I'm trying to keep the punctuation marks after splitting them. How would I do that? Thanks.
$input = "Sentence1?Sentence2.Sentence3!";
$input = preg_split("/(\?|\.|!)/", $input);
echo $input[0]."<br>";
echo $input[1]."<br>";
echo $input[2]."<br>";
Desired outputs:
Sentence1?
Sentence2.
Sentence3!
Actual outputs:
Sentence1
Sentence2
Sentence3
You can do this by changing the capture group in your regex into a lookbehind like so:
$input = preg_split("/(?<=\?|\.|!)/", $input);
the manual knows all
PREG_SPLIT_DELIM_CAPTURE
If this flag is set, parenthesized expression in the delimiter pattern will be captured and returned as well.
so in your case:
$input = preg_split("/(\?|\.|!)/", $input,NULL,PREG_SPLIT_DELIM_CAPTURE);
I'm trying to remove all words of less than 3 characters from a string, specifically with RegEx.
The following doesn't work because it is looking for double spaces. I suppose I could convert all spaces to double spaces beforehand and then convert them back after, but that doesn't seem very efficient. Any ideas?
$text='an of and then some an ee halved or or whenever';
$text=preg_replace('# [a-z]{1,2} #',' ',' '.$text.' ');
echo trim($text);
Removing the Short Words
You can use this:
$replaced = preg_replace('~\b[a-z]{1,2}\b\~', '', $yourstring);
In the demo, see the substitutions at the bottom.
Explanation
\b is a word boundary that matches a position where one side is a letter, and the other side is not a letter (for instance a space character, or the beginning of the string)
[a-z]{1,2} matches one or two letters
\b another word boundary
Replace with the empty string.
Option 2: Also Remove Trailing Spaces
If you also want to remove the spaces after the words, we can add \s* at the end of the regex:
$replaced = preg_replace('~\b[a-z]{1,2}\b\s*~', '', $yourstring);
Reference
Word Boundaries
You can use the word boundary tag: \b:
Replace: \b[a-z]{1,2}\b with ''
Use this
preg_replace('/(\b.{1,2}\s)/','',$your_string);
As some solutions worked here, they had a problem with my language's "multichar characters", such as "ch". A simple explode and implode worked for me.
$maxWordLength = 3;
$string = "my super string";
$exploded = explode(" ", $string);
foreach($exploded as $key => $word) {
if(mb_strlen($word) < $maxWordLength) unset($exploded[$key]);
}
$string = implode(" ", $exploded);
echo $string;
// outputs "super string"
To me, it seems that this hack works fine with most PHP versions:
$string2 = preg_replace("/~\b[a-zA-Z0-9]{1,2}\b\~/i", "", trim($string1));
Where [a-zA-Z0-9] are the accepted Char/Number range.