I'm trying to replace 'he' by 'she' between two given positions, in this case between Maria(0) and Peter(34). So basically I give two positions and replace all the occurrences in the sentence between the boundaries. I have tried with the following simple code but it seems that the substr_replace function doesn't allow me to do it. Any idea to make it works?
$sentence = "Maria went to London where he met Peter with his family.";
$clean = substr_replace($sentence, "he", "she", 0, 34);
You are using substr_replace() incorrectly, and in this case you don't need it. Instead, try this, using a combo of str_replace() and substr():
$sentence = "Maria went to London where he met Peter with his family.";
$clean = str_replace(' he ', ' she ', substr($sentence, 0, 34));
$clean .= substr($sentence, 34);
See demo
Essentially, you are replacing he with she on the specified substr of $sentence, then concatenating the rest of $sentence back on.
Note the spaces around ' he ' and ' she '. This is necessary because otherwise you would replace where with wshere.
Edit: mark made me realized I misinterpretted your question. However I feel you might still find this regex useful, so I'm going to leave my answer but for the answer to your exact question look at MarkM's answer.
If you want to replace all instances of the word he with she only when the word 'he' occurs as its own word I would do something like this.
$pattern = '/(\b)he(\b)/';
$replacement = '$1she$2';
$subject = 'Maria went to London where he met Peter with his family';
$newString = preg_replace($pattern, $replacement, $subject);
The regex pattern basically says find all instances of the letters 'he' surrounded by valid word seperators (\b) and replace it with 'she' surrounded by the word separators that were found.
Edit:
See It Run
Related
I have a load of labels which are camel case. Some examples are
whatData
whoData
deliveryDate
importantQuestions
What I am trying to do is this. Any label which has the word Data needs to have this word removed. At the point of the capital letter, I need to provide a space. Finally, everything should be uppercase. I have done the removal of Data and the uppercase by doing this ($data->key is the label)
strtoupper(str_replace('Data', '', $data->key))
The part I am struggling with is adding the spaces between words. So basically the above words should end up like this
WHAT
WHO
DELIVERY DATE
IMPORTANT QUESTIONS
How can I factor in the last part of this?
Thanks
It will add spaces before every capital letters. Try this:
$String = 'whatData';
$Words = preg_replace('/(?<!\ )[A-Z]/', ' $0', $String);
Problem
Your regex '~^[A-Z]~' will match only the first capital letter. Check out Meta Characters in the Pattern Syntax for more information.
Your replacement is a newline character '\n' and not a space.
Solution
Use preg_replace(). Try below code.
$string = "whatData";
echo preg_replace('/(?<!\ )[A-Z]/', ' $0', $string);
Output
what Data
Try following:
$string = 'importantQuestions';
$string = strtoupper(ltrim(preg_replace('/[A-Z]/', ' $0', $string)));
echo $string;
This will give you output as:
IMPORTANT QUESTIONS
Try this:
preg_split: split on camel case
array_map: UPPER case all the element
implode: Implode the array
str_replace: Replace the `DATE` with empty
trim: trim the white spaces.
Do this simple things:
echo trim(str_replace("DATE", "", implode(" ", array_map("strtoupper", preg_split('/(?=[A-Z])/', 'deliveryDate', -1, PREG_SPLIT_NO_EMPTY))))); // DELIVERY
This is result exactly what you want.
I'm trying to take a string that's output from MySql like this (MySql outputs X characters):
$str = 'Buddy you're a boy make a big noise Playin in the stre';
and trying to start from the right side, trim whatever is there up till the first space. Sounded simple when I got down to it, but now, it has my brain and fingers in knots.
The output I'm tying to achieve is simple:
$str = 'Buddy you're a boy make a big noise Playin in the';
Notice, that characters starting from the right, till the first space, are removed.
Can you help?
My Fiddle
$str = 'Buddy you\'re a boy make a big noise Playin in the stre';
//echo rtrim($str,' ');
It's a useful idiom to remember on its own: to remove all the characters preceding a specific one from the right side of the string (including that special character), use the following:
$trimmed = substr($str, 0, strrpos($str, ' '));
... where ' ' is that special character.
Demo
If you don't know, however, whether or not the character is present, you'd check the result of sttrrpos first:
$last_space_index = strrpos($str, ' ');
$trimmed = $last_space_index !== false
? substr($str, 0, $last_space_index)
: $str;
And if there can be more than one character that you need to trim, like in 'hello there test' line, just rtrim the result:
$trimmed = rtrim(substr($str, 0, strrpos($str, ' ')), ' ');
In this case, however, a regex-based solution looks more appropriate:
$trimmed = preg_replace('/ +[^ ]*$/', '', $str);
I think your best option would be a regex replace:
preg_replace('/\s+\S*$/', '', $str);
which outputs Buddy you're a boy make a big noise Playin in the
And the Fiddle
it's probably easier to do it with regex, but I'm sooo bad with that! You shoud try this:
// Get all the words in an array
$strArray = explode(" ", $str);
// Remove the last word.
array_pop($strArray);
// Get it back into a sentence
$newString = implode(" ", $strArray);
There's a hundred ways to do this, here are some options:
array_pop'ing the last word off an array we create from explode:
$arr = explode(" ", $str);
$fixed_arr = array_pop($arr);
$result = implode(" ", $arr);
Using regular expressions:
$result = preg_replace('/\s+\S*$/', '', $str);
and using strrpos and substr:
$spacePos = strrpos($str, ' ');
$result = substr($str, 0, $spacePos);
In mysql use
left(field,length)
to output only the strlen first digits
right(field,length) having opposite effects
otherwise use substr($string,0,$length) or regex in php
As a matter of regex performance comparison, the regex engine can move faster through the string when it can perform greedy matching with minimal backtracking.
/ +[^ ]*$/ uses 68 steps. (#raina77ow)
/(?:[^ ]+\K )+.*/ uses 56 steps. (#mickmackusa)
/(?:\K [^ ]*)+/ uses 48 steps. (#mickmackusa)
\s+\S*$ uses 34 steps. (#ChrisBornhoft and #RyanKempt)
/.*\K .*/ uses just 15 steps. (#mickmackusa)
Based on these comparisons, I recommend greedily matching any characters, then restarting the fullstring match before matching the last occurring space, then matching zero or more characters until the end of the string.
Code: (Demo)
$string = "Buddy you're a boy make a big noise Playin in the stre";
var_export(
preg_replace('/.*\K .*/', '', $string)
);
Output:
'Buddy you\'re a boy make a big noise Playin in the'
I've tried to make a tool in which you input a website and when you click the submit button it cURLS all the text.
After all the cURLing, stripping it from tags, and counting the words. It's eventually an array named $frequency. If I echo it using <pre> tags it will show me everything just fine! (NOTE: I'm placing the contents in a file, $homepage = file_get_contents($file); and this is what I work with in my code, I don't know if this matters or not)
However i don't really care if the word or is seen 200 times in a website, I only want the important words. So i have made an array with all the common words. Which is set eventually in the $common_words variable. But i can't seem to find a way to replace all words found in the $frequency to replace them with "" if they are found in the $common_words as well.
I've found this piece of code after some research:
$string = 'sand band or nor and where whereabouts foo';
$wordlist = array("or", "and", "where");
foreach ($wordlist as &$word) {
$word = '/\b' . preg_quote($word, '/') . '\b/';
}
$string = preg_replace($wordlist, '', $string);
var_dump($string);
If I copy paste this it works fine, removing the or, and, where from the string.
But replacing $string with $frequency or replacing $wordlist with $common_words will either not work or throw me an error like: Delimiter must not be alphanumeric or backslash
I hope i've formulated my question properly, if not. Please tell me!
Thanks in advance
EDIT: Alright, i've narrowed down the problem alot. First of all i forgot the & inside the foreach ($wordlist as &$word) {
But as it was counting all the words, the words it has replaced are all still counted. See those 2 screenshots to see what I mean: http://imgur.com/oqqZR3h,xHEZKRz#0
If I understand this correctly you wan't to know how many occurrences each word has by ignoring the so called common words.
Assuming that $url is the page you will be running against and $common_words is your common words array, here is what you can do:
// Get the page content's and strip the html tags
$contents = strip_tags( file_get_contents($url) );
// This will split the words from the contents, creating an array with each word in it
preg_match_all("/([\w]+[']?[\w]*)\W/", $contents, $words);
$common_words = array('or', 'and', 'I', 'where');
$frequency = array();
// Count occurrences
$frequency = array_count_values($words[0]);
unset($words); // Release all that memory
var_dump($frequency);
At this point you will have an associative array with each not common word and a count showing the number of occurrences of the given word.
UPDATE
A bit more about the RegEx. We need to match word. The easiest way possible is: (\w+). But that won't match words like I've or haven't (Notice the '). That was my point of making it more complicated. Also, \w doesn't support dashes for words like in 6-year-old.
So I created a subgroup which should match words characters including dashed and single quotes in a word.
(?:\w'|\w|-)
The ?: part on the beginning is do not match or do not include in the results. That is since all I am doing is grouping the options for word contents. To mach an entire word the RegEx will match one or more of the subgroup above:
((?:\w'\w|\w|-)+)
So the RegEx preg_match_all() line should be:
preg_match_all("/((?:\w'\w|\w|-)+)/", $contents, $words);
Hope this helps.
I had changed $wordlist with $mywordlist. still its working!
<?php
$string = 'sand band or nor and where whereabouts foo';
$wordlist = array("or", "and", "where");
$mywordlist=array("sand","band");
foreach ($mywordlist as &$word) {
$word = '/\b' . preg_quote($word, '/') . '\b/';
}
$string = preg_replace($mywordlist, '', $string);
var_dump($string);
?>
I suppose you can do simply like this:
$common_words = "foo baq etc etc";
$str = "foo bar baz"; // input
foreach (explode(" ", $common_words) as $word){
$str = strtr($str, $word, "");
}
I am working with some code in PHP that grabs the referrer data from a search engine, giving me the query that the user entered.
I would then like to remove certain stop words from that string if they exist. However, the word may or may not have a space at either end.
For example, I have been using str_replace to remove a word as follows:
$keywords = str_replace("for", "", $keywords);
$keywords = str_replace("sale", "", $keywords);
but if the $keywords value is "baby formula" it will change it to "baby mula" - removing the "for" part.
Without having to create further str_replace's that account for " for" and "for " - is there a preg_replace type command I could use that would remove the given word if it is found with a space at either end?
My idea would be to put all of the stop words into an array and step through them that way and I suspect that a preg_replace is going to be quicker than stepping through multiple str_replace lines.
UPDATE:
Solved thanks to you guys using the following combination:
$keywords = "...";
$stopwords = array("for","each");
foreach($stopwords as $stopWord)
{
$keywords = preg_replace("/(\b)$stopWord(\b)/", "", $keywords);
}
$keywords = "...";
$stopWords = array("for","sale");
foreach($stopWords as $stopWord){
$keywords = preg_replace("/(\b)$stopWord(\b)/", "", $keywords);
}
Try it this way
$keywords = preg_replace( '/(?!\w)(for|sale)(?>!\w)/', '', $keywords );
You can use word boundaries for this
$keywords = preg_replace('/\bfor\b/', '', $keywords);
or with multiple words
$keywords = preg_replace('/\b(?:for|sale)\b/', '', $keywords);
While Armel's answer will work, it is not performing optimally. Yes, your desired output will require wordboundaries and probably case-insensitive matching, but:
Wordboundaries gain nothing from being wrapped in parentheses.
Performing iterated preg_match() calls for each element in the blacklist array is not efficient. Doing so will ask the regex engine to perform wave after wave of individual keyword checks on the full string.
I recommend building a single regex pattern that will check for all keywords during each step of traversing the string -- one time. To generate the single pattern dynamically, you only need to implode your blacklist array of elements with | (pipes) which represent the "OR" command in regex. By wrapping all of the pipe-delimited keywords in a non-capturing group ((?:...)), the wordboundaries (\b) serve their purpose for all keywords in the blacklist array.
Code: (Demo)
$string = "Each person wants peaches for themselves forever";
$blacklist = array("for", "each");
// if you might have non-letter characters that have special meaning to the regex engine
//$blacklist = array_map(function($v){return preg_quote($v, '/');}, $blacklist);
//print_r($blacklist);
echo "Without wordboundaries:\n";
var_export(preg_replace('/' . implode('|', $blacklist) . '/i', '', $string));
echo "\n\n---\n";
echo "With wordboundaries:\n";
var_export(preg_replace('/\b(?:' . implode('|', $blacklist) . ')\b/i', '', $string));
echo "\n\n---\n";
echo "With wordboundaries and consecutive space mop up:\n";
var_export(trim(preg_replace(array('/\b(?:' . implode('|', $blacklist) . ')\b/i', '/ \K +/'), '', $string)));
Output:
Without wordboundaries:
' person wants pes themselves ever'
---
With wordboundaries:
' person wants peaches themselves forever'
---
With wordboundaries and consecutive space mop up:
'person wants peaches themselves forever'
p.s. / \K +/ is the second pattern fed to preg_replace() which means the input string will be read a second time to search for 2 or more consecutive spaces. \K means "restart the fullstring match from here"; effectively it releases the previously matched space. Then one or more spaces to follow are matched and replaced with an empty string.
Using php regexp in a simple way, is it possible to modify a string to add a space after commas and periods that follow words but not after a comma or period that is preceded and followed by a number such as 1,000.00?
String,looks like this with an amount of 1,000.00
Needs to be changed to...
String, looks like this with an amount of 1,000.00
This should allow for multiple instances of course... Here is what I am using now but it is causing numbers to return as 1, 000. 00
$punctuation = ',.;:';
$string = preg_replace('/(['.$punctuation.'])[\s]*/', '\1 ', $string);
You could replace '/(?<!\d),|,(?!\d{3})/' with ', '.
Something like:
$str = preg_replace('/(?<!\d),|,(?!\d{3})/', ', ', $str);
I was searching for this regex.
This post really help me, and I improve solution proposed by Qtax.
Here is mine:
$ponctuations = array(','=>', ','\.'=>'. ',';'=>'; ',':'=>': ');
foreach($ponctuations as $ponctuation => $replace){
$string = preg_replace('/(?<!\d)'.$ponctuation.'(?!\s)|'.$ponctuation.'(?!(\d|\s))/', $replace, $string);
}
With this solution, "sentence like: this" will not be changed to "sentence like: this" (whith 2 blanck spaces)
That's all.
Although this is quite old, I was looking for this same question, and after understanding solutions given I have a different answer.
Instead of checking for character before the comma this regex checks the character after the comma and so it can be limited to alphabetic ones. Also this will not create a string with two spaces after a comma.
$punctuation = ',.;:';
$string = preg_replace("/([$punctuation])([a-z])/i",'\1 \2', $string);
Test script can be checked here.