I want to split a string as per the parameters laid out in the title. I've tried a few different things including using preg_match with not much success so far and I feel like there may be a simpler solution that I haven't clocked on to.
I have a regex that matches the "price" mentioned in the title (see below).
/(?=.)\£(([1-9][0-9]{0,2}(,[0-9]{3})*)|[0-9]+)?(\.[0-9]{1,2})?/
And here are a few example scenarios and what my desired outcome would be:
Example 1:
input: "This string should not split as the only periods that appear are here £19.99 and also at the end."
output: n/a
Example 2:
input: "This string should split right here. As the period is not part of a price or at the end of the string."
output: "This string should split right here"
Example 3:
input: "There is a price in this string £19.99, but it should only split at this point. As I want it to ignore periods in a price"
output: "There is a price in this string £19.99, but it should only split at this point"
I suggest using
preg_split('~\£(?:[1-9]\d{0,2}(?:,\d{3})*|[0-9]+)?(?:\.\d{1,2})?(*SKIP)(*F)|\.(?!\s*$)~u', $string)
See the regex demo.
The pattern matches your pattern, \£(?:[1-9]\d{0,2}(?:,\d{3})*|[0-9]+)?(?:\.\d{1,2})? and skips it with (*SKIP)(*F), else, it matches a non-final . with \.(?!\s*$) (even if there is trailing whitespace chars).
If you really only need to split on the first occurrence of the qualifying dot you can use a matching approach:
preg_match('~^((?:\£(?:[1-9]\d{0,2}(?:,\d{3})*|[0-9]+)?(?:\.\d{1,2})?|[^.])+)\.(.*)~su', $string, $match)
See the regex demo. Here,
^ - matches a string start position
((?:\£(?:[1-9]\d{0,2}(?:,\d{3})*|[0-9]+)?(?:\.\d{1,2})?|[^.])+) - one or more occurrences of your currency pattern or any one char other than a . char
\. - a . char
(.*) - Group 2: the rest of the string.
To split a text into sentences avoiding the different pitfalls like dots or thousand separators in numbers and some abbreviations (like etc.), the best tool is intlBreakIterator designed to deal with natural language:
$str = 'There is a price in this string £19.99, but it should only split at this point. As I want it to ignore periods in a price';
$si = IntlBreakIterator::createSentenceInstance('en-US');
$si->setText($str);
$si->next();
echo substr($str, 0, $si->current());
IntlBreakIterator::createSentenceInstance returns an iterator that gives the indexes of the different sentences in the string.
It takes in account ?, ! and ... too. In addition to numbers or prices pitfalls, it works also well with this kind of string:
$str = 'John Smith, Jr. was running naked through the garden crying "catch me! catch me!", but no one was chasing him. His psychatre looked at him from the window with a circumspect eye.';
More about rules used by IntlBreakIterator here.
You could simply use this regex:
\.
Since you only have a space after the first sentence (and not a price), this should work just as well, right?
I have a value like this 73b6424b. I want to split value into two parts. Like 73b6 and 424b. Then the two split value want to reverse. Like 424b and 73b6. And concatenate this two value like this 424b73b6. I have already done this like way
$substr_device_value = 73b6424b;
$first_value = substr($substr_device_value,0,4);
$second_value = substr($substr_device_value,4,8);
$final_value = $second_value.$first_value;
I am searching more than easy way what I have done. Is it possible?? If yes then approach please
You may use
preg_replace('~^(.{4})(.{4})$~', '$2$1', $s)
See the regex demo
Details
^ - matches the string start position
(.{4}) - captures any 4 chars into Group 1 ($1)
(.{4}) - captures any 4 chars into Group 2 ($2)
$ - end of string.
The '$2$1' replacement pattern swaps the values.
NOTE: If you want to pre-validate the data before swapping, you may replace . pattern with a more specific one, say, \w to only match word chars, or [[:alnum:]] to only match alphanumeric chars, or [0-9a-z] if you plan to only match strings containing digits and lowercase ASCII letters.
I have variable strings like the below:
The.Test.String.A01Y18.123h.WIB-DI.DO5.1.K.314-ECO
The.Regex.F05P78.123h.WIB-DI.DO5.1.K.314-EYT
Word.C05F78.342T.DSW-RF.EF5.2.F.342-DDF
I would like to extract this part of these string in PHP dynamically and i was looking at using regex but haven't had much success:
The.Test.String.A01Y18
The.Regex.F05P78
Word.C05F78
And ultimately to:
The Test String A01Y18
The Regex F05P78
Word C05F78
The first part of the text will be variable in length and will separate each word with a period. The next part will always be the same length with the pattern:
One letter, 2 number, one letter, 2 numbers pattern (C05F78)
Any thing in the string after that is what I would like to remove.
that's it
$x=array(
"The.Test.String.A01Y18.123h.WIB-DI.DO5.1.K.314-ECO",
"The.Regex.F05P78.123h.WIB-DI.DO5.1.K.314-EYT",
"Word.C05F78.342T.DSW-RF.EF5.2.F.342-DDF"
);
for ($i=0, $tmp_count=count($x); $i<$tmp_count; ++$i) {
echo str_replace(".", " ", preg_replace("/^(.+?)([a-z]{1}[0-9]{2}[a-z]{1}[0-9]{2})\..+$/i", "\\1\\2", $x[$i]))."<br />";
}
Using this regular expression should work, replacing each of your strings with the first capturing group:
^((?:\w+\.)+\w\d{2}\w\d{2}).*
See demo at http://regex101.com/r/fR3pM6
This is valid too:
preg_match("\.*[\w\d]{6}", stringVariable)
.* for all digits atleast we found a composition of letters and words of 6 characters ([\w\d]{6})
Result:
Match 1: The.Test.Stsrisng.A01Y18
Match 2: The.Regex.F05P78
Match 3: Word.C05F78
I'm going nuts trying to get a regex to detect spam of keywords in the user inputs. Usually there is some normal text at the start and the keyword spam at the end, separated by commas or other chars.
What I need is a regex to count the number of keywords to flag the text for a human to check it.
The text is usually like this:
[random text, with commas, dots and all]
keyword1, keyword2, keyword3, keyword4, keyword5,
Keyword6, keyword7, keyword8...
I've tried several regex to count the matches:
-This only gets one out of two keywords
[,-](\w|\s)+[,-]
-This also matches the random text
(?:([^,-]*)(?:[^,-]|$))
Can anyone tell me a regex to do this? Or should I take a different approach?
Thanks!
Pr your answer to my question, here is a regexp to match a string that occurs between two commas.
(?<=,)[^,]+(?=,)
This regexp does not match, and hence do not consume, the delimiting commas.
This regexp would match " and hence do not consume" in the previous sentence.
The fact that your regexp matched and consumed the commas was the reason why your attempted regexp only matched every other candidate.
Also if the whole input is a single string you will want to prevent linebreaks. In that case you will want to use;
(?<=,)[^,\n]+(?=,)
http://www.phpliveregex.com/p/1DJ
As others have said this is potentially a very tricky thing to do... It suffers from all of the same failures as general "word filtering" (e.g. people will "mask" the input). It is made even more difficult without plenty of example posts to test against...
Solution
Anyway, assuming that keywords will be on separate lines to the rest of the input and separated by commas you can match the lines with keywords in like:
Regex
#(?:^)((?:(?:[\w\.]+)(?:, ?|$))+)#m
Input
Taken from your question above:
[random text, with commas, dots and all]
keyword1, keyword2, keyword3, keyword4, keyword5,
Keyword6, keyword7, keyword8
Output
// preg_match_all('#(?:^)((?:(?:[\w]+)(?:, ?|$))+)#m', $string, $matches);
// var_dump($matches);
array(2) {
[0]=>
array(2) {
[0]=>
string(49) "keyword1, keyword2, keyword3, keyword4, keyword5,"
[1]=>
string(31) "Keyword6, keyword7, keyword8..."
}
[1]=>
array(2) {
[0]=>
string(49) "keyword1, keyword2, keyword3, keyword4, keyword5,"
[1]=>
string(31) "Keyword6, keyword7, keyword8"
}
}
Explanation
#(?:^)((?:(?:[\w]+)(?:, ?|$))+)#m
# => Starting delimiter
(?:^) => Matches start of line in a non-capturing group (you could just use ^ I was using |\n originally and didn't update)
( => Start a capturing group
(?: => Start a non-capturing group
(?:[\w]+) => A non-capturing group to match one or more word characters a-zA-Z0-9_ (Using a character class so that you can add to it if you need to....)
(?:, ?|$) => A non-capturing group to match either a comma (with an optional space) or the end of the string/line
)+ => End the non-capturing group (4) and repeat 5/6 to find multiple matches in the line
) => Close the capture group 3
# => Ending delimiter
m => Multi-line modifier
Follow up from number 2:
#^((?:(?:[\w]+)(?:, ?|$))+)#m
Counting keywords
Having now returned an array of lines only containing key words you can count the number of commas and thus get the number of keywords
$key_words = implode(', ', $matches[1]); // Join lines returned by preg_match_all
echo substr_count($key_words, ','); // 8
N.B. In most circumstances this will return NUMBER_OF_KEY_WORDS - 1 (i.e. in your case 7); it returns 8 because you have a comma at the end of your first line of key words.
Links
http://php.net/manual/en/reference.pcre.pattern.modifiers.php
http://www.regular-expressions.info/
http://php.net/substr_count
Why not just use explode and trim?
$keywords = array_map ('trim', explode (',', $keywordstring));
Then do a count() on $keywords.
If you think keywords with spaces in are spam, then you can iterate of the $keywords array and look for any that contain whitespace. There might be legitimate reasons for having spaces in a keyword though. If you're talking about superheroes on your system, for example, someone might enter The Tick or Iron Man as a keyword
I don't think counting keywords and looking for spaces in keywords are really very good strategies for detecting spam though. You might want to look into other bot protection strategies instead, or even use manual moderation.
How to match on the String of text between the commas?
This SO Post was marked as a duplicate to my posted question however since it is NOT a duplicate and there were no answers in THIS SO Post that answered my question on how to also match on the strings between the commas see below on how to take this a step further.
How to Match on single digit values in a CSV String
For example if the task is to search the string within the commas for a single 7, 8 or a single 9 but not match on combinations such as 17 or 77 or 78 but only the single 7s, 8s, or 9s see below...
The answer is to Use look arounds and place your search pattern within the look arounds:
(?<=^|,)[789](?=,|$)
See live demo.
The above Pattern is more concise however I've pasted below the Two Patterns provided as solutions to THIS this question of matching on Strings within the commas and they are:
(?<=^|,)[789](?=,|$) Provided by #Bohemian and chosen as the Correct Answer
(?:(?<=^)|(?<=,))[789](?:(?=,)|(?=$)) Provided in comments by #Ouroborus
Demo: https://regex101.com/r/fd5GnD/1
Your first regexp doesn't need a preceding comma
[\w\s]+[,-]
A regex that will match strings between two commas or start or end of string is
(?<=,|^)[^,]*(?=,|$)
Or, a bit more efficient:
(?<![^,])[^,]*(?![^,])
See the regex demo #1 and demo #2.
Details:
(?<=,|^) / (?<![^,]) - start of string or a position immediately preceded with a comma
[^,]* - zero or more chars other than a comma
(?=,|$) / (?![^,]) - end of string or a position immediately followed with a comma
If people still search for this in 2021
([^,\n])+
Match anything except new line and comma
regexr.com/60eme
I think the difficulty is that the random text can also contain commas.
If the keywords are all on one line and it is the last line of the text as a whole, trim the whole text removing new line characters from the end. Then take the text from the last new line character to the end. This should be your string containing the keywords. Once you have this part singled out, you can explode the string on comma and count the parts.
<?php
$string = " some gibberish, some more gibberish, and random text
keyword1, keyword2, keyword3
";
$lastEOL = strrpos(trim($string), PHP_EOL);
$keywordLine = substr($string, $lastEOL);
$keywords = explode(',', $keywordLine);
echo "Number of keywords: " . count($keywords);
I know it is not a regex, but I hope it helps nevertheless.
The only way to find a solution, is to find something that separates the random text and the keywords that is not present in the keywords. If a new line is present in the keywords, you can not use it. But are 2 consecutive new lines? Or any other characters.
$string = " some gibberish, some more gibberish, and random text
keyword1, keyword2, keyword3,
keyword4, keyword5, keyword6,
keyword7, keyword8, keyword9
";
$lastEOL = strrpos(trim($string), PHP_EOL . PHP_EOL); // 2 end of lines after random text
$keywordLine = substr($string, $lastEOL);
$keywords = explode(',', $keywordLine);
echo "Number of keywords: " . count($keywords);
(edit: added example for more new lines - long shot)
I found some partial help but cannot seem to fully accomplish what I need. I need to be able to do the following:
I need an regular expression to replace any 1 to 3 character words between two words that are longer than 3 characters with a match any expression:
For example:
walk to the beach ==> walk(.*)beach
If the 1 to 3 character word is not preceded by a word that's longer than 3 characters then I want to translate that 1 to 3 letter word to '<word> ?'
For example:
on the beach ==> on ?the ?beach
The simpler the rule the better (of course, if there's an alternative more complicated version that's more performant then I'll take that as well as I eventually anticipate heavy usage eventually).
This will be used in a PHP context most likely with preg_replace. Thus, if you can put it in that context then even better!
By the way so far I have got the following:
$string = preg_replace('/\s+/', '(.*)', $string);
$string = preg_replace('/\b(\w{1,3})(\.*)\b/', '${1} ?', $string);
but that results in:
walk to the beach ==> 'walk(.*)to ?beach'
which is not what I want. 'on the beach' seems to translate correctly.
I think you will need two replacements for that. Let's start with the first requirement:
$str = preg_replace('/(\w{4,})(?: \w{1,3})* (?=\w{4,})/', '$1(.*)', $str);
Of course, you need to replace those \w (which match letters, digits and underscores) with a character class of what you actually want to treat as a word character.
The second one is a bit tougher, because matches cannot overlap and lookbehinds cannot be of variable length. So we have to run this multiple times in a loop:
do
{
$str = preg_replace('/^\w{0,3}(?: \w{0,3})* (?!\?)/', '$0?', $str, -1, $count);
} while($count);
Here we match everything from the beginning of the string, as long as it's only up-to-3-letter words separated by spaces, plus one trailing space (only if it is not already followed by a ?). Then we put all of that back in place, and append a ?.
Update:
After all the talk in the comments, here is an updated solution.
After running the first line, we can assume that the only less-than-3-letter words left will be at the beginning or at the end of the string. All others will have been collapsed to (.*). Since you want to append all spaces between those with ?, you do not even need a loop (in fact these are the only spaces left):
$str = preg_replace('/ /', ' ?', $str);
(Do this right after my first line of code.)
This would give the following two results (in combination with the first line):
let us walk on the beach now go => let ?us ?walk(.*)beach ?now ?go
let us walk on the beach there now go => let ?us ?walk(.*)beach(.*)there ?now ?go