I am trying to grab the text after the last number in the string and grab the whole string if it doesn't contain numbers.
The best regex I could come up with is:
([^\d\s]*)$
However, I found that \s and \d aren't supported in MySQL REGEXP. [[:space:]] stands in for \s, but I'm not sure what the equivalent of \d is.
This is what I'm trying to accomplish:
'1/2 Oz' returns 'Oz'
'2 3/4 Oz' returns 'Oz'
'As needed' returns 'As needed'
This is the regex you will need:
/^.*?(\d+(?=\D*$)\s*)/
Then just replace the matched text with an empty string "".
PHP code:
$s = preg_replace('/^.*?(\d+(?=\D*$)\s*)/', '', 'Foo Oz');
//=> Foo Oz
$s = preg_replace('/^.*?(\d+(?=\D*$)\s*)/', '', '1/2 Oz');
//=> Oz
Live Demo: http://ideone.com/u887D7
First of all, you could simply avoid the class, and use a range instead:
[^0-9[:space:]]*$
But there is one for digits as well (which may actually include non-ASCII digits). The documentation has a list of these. They are called POSIX bracket expressions by the way.
[^[:digit:][:space:]]*$
However, the general problem with this approach is that it doesn't allow for spaces later on in the string (like the one between As and needed). To get those, but still avoid capturing trailing spaces after digits, make sure the first character is neither a space nor a digit, then match the rest of the string as non-digits. In addition, make the whole thing optional, to ensure that it still works with strings ending in a digit.
([^[:digit:][:space:]][^[:digit:]]*)?$
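To sanity-check that last pattern outside MySQL, here is a minimal PHP sketch, with the second bracket expression written as [^[:digit:]]. It assumes PCRE handles the POSIX bracket expressions the same way MySQL's REGEXP does for plain ASCII input, and the helper name trailing_unit() is made up for illustration.

```php
<?php
// Hypothetical helper: returns the text after the last number,
// or the whole string if it contains no digits.
function trailing_unit(string $s): string
{
    // First character: neither digit nor whitespace; rest: any non-digits.
    // The group is optional, so a string ending in a digit still matches (as '').
    preg_match('/([^[:digit:][:space:]][^[:digit:]]*)?$/', $s, $m);
    return $m[0];
}

foreach (['1/2 Oz', '2 3/4 Oz', 'As needed'] as $s) {
    echo "'$s' returns '", trailing_unit($s), "'\n";
}
// '1/2 Oz' returns 'Oz'
// '2 3/4 Oz' returns 'Oz'
// 'As needed' returns 'As needed'
```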
Related
I have this regex that matches strings that I want to check on validity.
However recently I want to use this same regex to replace every character that is not valid to the regex with a character (let's say x).
My regex to match these types of strings is: '#^[\pL\'\’\d][\pL\.\-\ \'\/\,\’\d]*$#iu'
This allows the first character to be a letter from any language, a digit, or one of a few special characters. The following characters may come from the same set plus a few more special characters.
This is what I do (nothing special).
preg_replace($regex, 'x', $string);
Things I tried include trying to negate the regex:
'(?![\pL\'\’\d][\pL\.\-\ \'\/\,\’\d]*)'
'[^\pL\'\’\d][^\pL\.\-\ \'\/\,\’\d]*'
I've also tried splitting up the string into the firstchar and the rest of the string and split the regex in 2.
$validationRegex1 = '[^\pL\'\’\d]';
$validationRegex2 = '[^\pL\.\-\ \'\/\,\’\d]*';
$fixedStr1 = (string) preg_replace($validationRegex1, 'x', $firstChar)
. (string) preg_replace($validationRegex2, 'x', $theRest);
But this did not seem to work either.
I've experimented a bit with this online tool: https://www.functions-online.com/preg_replace.html
Does anyone know what I am overlooking?
Examples of strings and their expected results
'-' should become 'x'.
'Random-morestuff' stays 'Random-morestuff'
'Random%morestuff' should become 'Randomxmorestuff'
'Rândôm' stays 'Rândôm'
Just an idea but if I got you right, you could use
(?(DEFINE)
(?<first>[\pL\d'’])
(?<other>[-\ \pL\d.'/,’])
)
\b(?&first)(?&other)+\b(*SKIP)(*FAIL)|.
This needs to be replaced by x. You do not have to escape everything in a character class, I changed this accordingly.
See a demo on regex101.com.
A bit more explanation: The (?(DEFINE)...) thingy lets you define subroutines that can be used afterwards and is just syntactic sugar in this case (maybe a bit showing off, really). As you have stated that other characters are allowed depending on their positions, I just called them first and other. The \b marks a word boundary, that is a boundary between \w (usually [a-zA-Z0-9_]) and \W (not \w). All of these "words" are allowed, so we let the engine "forget" what has been matched with the (*SKIP)(*FAIL) mechanism and match any other character on the right side of the alternation (|). See how (*SKIP)(*FAIL) works here on SO.
Use
$fixedStr1 = preg_replace('/[\p{L}\'\’\d][\p{L}\.\ \'\/\,\’\d-]*(*SKIP)(*FAIL)|./u', 'x', $input_string);
See regex proof.
This fails (and skips) every match made up of valid characters and replaces each character that appears anywhere else.
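As a quick check, here is a sketch that runs the skip-and-fail pattern against the four examples from the question. It assumes the script file and the input are UTF-8 encoded (the u modifier requires that), and mask_invalid() is a made-up name.

```php
<?php
// Hypothetical wrapper around the (*SKIP)(*FAIL) pattern.
function mask_invalid(string $input): string
{
    // Left branch: a run of allowed characters is matched, then thrown away
    // with (*SKIP)(*FAIL). Right branch: any other single character is matched
    // and therefore replaced.
    $pattern = '~[\p{L}\'’\d][\p{L}. \'/,’\d-]*(*SKIP)(*FAIL)|.~u';
    // preg_replace() returns null on error (e.g. invalid UTF-8); fall back to the input.
    return preg_replace($pattern, 'x', $input) ?? $input;
}

foreach (['-', 'Random-morestuff', 'Random%morestuff', 'Rândôm'] as $s) {
    echo $s, ' => ', mask_invalid($s), "\n";
}
// - => x
// Random-morestuff => Random-morestuff
// Random%morestuff => Randomxmorestuff
// Rândôm => Rândôm
```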
I'm using this regex to get the house number of a street address.
[a-zA-ZßäöüÄÖÜ .]*(?=[0-9])
Usually, the street is something like "Ohmstraße 2a". At regexpal.com my pattern matches, but I guess preg_replace() isn't identical to its regex engine.
$num = preg_replace("/[a-zA-ZßäöüÄÖÜ .]*(?=[0-9])/", "", $num);
Update:
It seems that my pattern matches, but I've got some encoding problems with the special chars like äöü
Update #2:
Turns out to be an encoding problem with mysqli.
First of all if you want to get the house number then you should not replace it. So instead of preg_replace use preg_match.
I modified your regex a little bit to match better:
$street = 'Öhmsträße 2a';
// preg_match() returns 1 on a match, 0 on no match and false on error
if (preg_match('/\s+(\d+[a-z]?)$/i', trim($street), $matches) === 1) {
    var_dump($matches);
} else {
    echo 'no house number';
}
\s+ matches one or more space chars (blanks, tabs, etc.)
(...) defines a capture group which can be accessed in $matches
\d+ matches one or more digits (2, 23, 235, ...)
[a-z] matches one character from a to z
? means it's optional (not every house number has a letter in it)
$ means end of string, so it makes sure the house number is at the end of the string
Make sure you strip any spaces after the end of the house number with trim().
The u modifier can help sometimes for handling "extra" characters.
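A minimal sketch of what that could look like for the pattern from the question, assuming the input really is UTF-8 by the time it reaches PHP. \p{L} is swapped in for the hand-written letter list, and house_number() is a made-up name.

```php
<?php
// Hypothetical helper. With the u modifier the pattern and subject are treated
// as UTF-8, and \p{L} matches any Unicode letter (ß, ä, ö, ü included).
function house_number(string $address): string
{
    // Strip the leading letters, spaces and dots that are followed by a digit.
    // preg_replace() returns null on invalid UTF-8; fall back to the input then.
    return preg_replace('/^[\p{L} .]*(?=\d)/u', '', $address) ?? $address;
}

echo house_number('Ohmstraße 2a'), "\n"; // 2a
echo house_number('Öhmsträße 2a'), "\n"; // 2a
```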
I feel this may be a character set or UTF-8 issue.
It would be a good idea to find out what version of PHP you're running too. If I recall correctly, the Unicode property escapes (\p{...}) for UTF-8 mode came in around 5.1.x.
I have to replace the last match of a string (for example the word foo) in HTML document. The problem is that the structure of the HTML document is always random.
I'm trying to accomplish that with preg_replace, but so far I only know how to replace the first match, not the last one.
Thanks.
Use a negative lookahead (?!...)
$str = 'text abcd text text efgh';
echo preg_replace('~text(?!.*text)~', 'bar', $str),"\n";
output:
text abcd text bar efgh
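One thing to watch with multi-line input such as an HTML document: without the s modifier the dot stops at a line break, so the lookahead only checks the rest of the current line. A small sketch with a made-up two-line snippet:

```php
<?php
$html = "<p>foo</p>\n<div>foo</div>";

// Without s, the dot stops at the newline, so the last "foo" on EACH line is replaced.
$perLine = preg_replace('~foo(?!.*foo)~', 'bar', $html);

// With s, the lookahead scans to the end of the input; only the final "foo" changes.
$wholeDoc = preg_replace('~foo(?!.*foo)~s', 'bar', $html);

echo $perLine, "\n\n";  // <p>bar</p> / <div>bar</div>
echo $wholeDoc, "\n";   // <p>foo</p> / <div>bar</div>
```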
A common approach to match all text to the last occurrence of the subsequent pattern(s) is using a greedy dot, .*. So, you may match and capture the text before the last text and replace with a backreference + the new value:
$str = 'text abcd text text efgh';
echo preg_replace('~(.*)text~su', '${1}bar', $str);
// => text abcd text bar efgh
If text is some value inside a variable that must be treated as plain text, use preg_quote to ensure all special chars are escaped correctly:
preg_replace('~(.*)' . preg_quote($text, '~') . '~su', '${1}bar', $str)
See the online PHP demo and a regex demo.
Here, (.*) matches and captures into Group 1 any zero or more chars, as many as possible, up to the rightmost (last) occurrence of text (note that the s modifier makes the dot match line break chars, too). If text is a Unicode substring, the u modifier comes in handy in PHP: it enables the (*UTF) PCRE verb, so the incoming string is parsed as a sequence of Unicode code points rather than bytes, and the (*UCP) verb, which makes any shorthand character classes Unicode-aware.
The ${1} is a replacement backreference, a placeholder holding the value captured into Group 1, which lets you restore that substring in the resulting string. You can use $1, but a problem arises if the replacement text that follows it starts with a digit: $12, for instance, is read as a reference to Group 12.
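To make the ${1} versus $1 point concrete, here is a small sketch where the new value starts with a digit ("2nd" is just an illustrative replacement, not something from the question):

```php
<?php
$str = 'text abcd text text efgh';

// '$12nd' is parsed as a reference to group 12, which does not exist
// and expands to an empty string, so the captured prefix is lost.
$ambiguous = preg_replace('~(.*)text~s', '$12nd', $str);

// '${1}2nd' is unambiguous: group 1, then the literal "2nd".
$explicit = preg_replace('~(.*)text~s', '${1}2nd', $str);

echo $ambiguous, "\n"; // nd efgh
echo $explicit, "\n";  // text abcd text 2nd efgh
```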
An example
<?php
$str = 'Some random text';
$str_Pattern = '/[^ ]*$/';
preg_match($str_Pattern, $str, $results);
print $results[0];
?>
Of course the accepted solution given here is correct. Nevertheless you might also want to have a look at this post. I'm using this where no pattern is needed and the string does not contain characters that could not be captured by the functions used (i.e. multibyte ones). I also put an additional parameter for dis/regarding case.
The first line then is:
$pos = $case === true ? strripos($subject, $search) : strrpos($subject, $search);
I have to admit that I did not test the performance. However, I guess that preg_replace() is slower, especially on large strings.
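For reference, a non-regex version along those lines might look like the sketch below. The function name replace_last() and its parameter order are assumptions for illustration, not the code from the linked post, and like the original it works on bytes, so it is only safe where multibyte case folding doesn't matter.

```php
<?php
// Hypothetical helper: replace only the last occurrence of $search in $subject.
function replace_last(string $search, string $replace, string $subject, bool $ignoreCase = false): string
{
    if ($search === '') {
        return $subject; // nothing sensible to look for
    }
    // strripos() ignores case, strrpos() does not; both return false if not found.
    $pos = $ignoreCase ? strripos($subject, $search) : strrpos($subject, $search);
    if ($pos === false) {
        return $subject;
    }
    // Swap out exactly strlen($search) bytes at the last position found.
    return substr_replace($subject, $replace, $pos, strlen($search));
}

echo replace_last('text', 'bar', 'text abcd text text efgh'), "\n";
// text abcd text bar efgh
```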
I am trying to extract a word that matches a specific pattern from various strings.
The strings vary in length and content.
For example:
I want to extract any word that begins with jac from the following strings and populate an array with the full words:
I bought a jacket yesterday.
Jack is going home.
I want to go to Jacksonville.
The resulting array should be [jacket,Jack,Jacksonville]
I have been trying to use preg_match() but for some reason it won't work. Any suggestions???
$q = "jac";
$str = "jacket";
preg_match($q,$str,$matches);
print $matches[1];
This returns null :S. I dunno what the problem is.
You can use preg_match as:
preg_match("/\b(jac\w*)\b/i", $string, $matches);
See it
You've got to read the manual a few hundred times and it will eventually come to you.
Otherwise, what you're trying to capture can be expressed as "look for 'jac' followed by 0 or more letters* and make sure it's not preceded by a letter" which gives you: /(?<!\\w)(jac\\w*)/i
Here's an example with preg_match_all() so that you can capture all the occurrences of the pattern, not just the first:
$q = "/(?<!\\w)(jac\\w*)/i";
$str = "I bought a jacket yesterday.
Jack is going home.
I want to go to Jacksonville.";
preg_match_all($q,$str,$matches);
print_r($matches[1]);
Note: by "letter" I mean any "word character." Officially, it includes numbers and other "word characters." Depending on the exact circumstances, one may prefer \w (word character) or \b (word boundary.)
You can include extra characters by using a character class. For instance, in order to match any word character as well as single quotes, you can use [\w'] and your regexp becomes:
$q = "/(?<!\\w)(jac[\\w']*)/i";
Alternatively, you can add an optional 's to your existing pattern, so that you capture "jac" followed by any number of word characters optionally followed by "'s"
$q = "/(?<!\\w)(jac\\w*(?:'s)?)/i";
Here, the ?: inside the parentheses means that you don't actually need to capture their content (because they're already inside a pair of parentheses, it's unnecessary), and the ? after the parentheses means that the match is optional.
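The two variants differ once quotes show up around a word. A short sketch with a made-up test string:

```php
<?php
$str = "Jack's 'jacket'";

// Character class: apostrophes anywhere in the tail are captured,
// including a closing quote that is not part of the word.
preg_match_all("/(?<!\\w)(jac[\\w']*)/i", $str, $withClass);

// Optional group: only a trailing 's is allowed after the word characters.
preg_match_all("/(?<!\\w)(jac\\w*(?:'s)?)/i", $str, $withGroup);

print_r($withClass[1]); // Jack's and jacket' (with the closing quote)
print_r($withGroup[1]); // Jack's and jacket
```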
How would I go about removing numbers and a space from the start of a string?
For example, from '13 Adam Court, Cannock' remove '13 '
Because everyone else is going the \d+\s route I'll give you the brain-dead answer
$str = preg_replace("#^[0-9]+ #","",$str);
Word to the wise, don't use / as your delimiter in regex, you will experience the dreaded leaning-toothpick-problem when trying to do file paths or something like http://
:)
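To illustrate the delimiter point, here is the same (made-up) URL-stripping pattern written both ways:

```php
<?php
$url = 'http://example.com/a/b';

// With / as the delimiter, every slash inside the pattern must be escaped.
$withSlashes = preg_replace('/^https?:\/\//', '', $url);

// With # as the delimiter, the same pattern stays readable.
$withHashes = preg_replace('#^https?://#', '', $url);

var_dump($withSlashes === $withHashes); // bool(true)
echo $withHashes, "\n";                 // example.com/a/b
```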
Use the same regex I gave in my JavaScript answer, but apply it using preg_replace():
preg_replace('/^\d+\s+/', '', $str);
Try this one :
^\d+ (.*)$
Like this :
preg_replace('/^\d+ (.*)$/', '$1', $string);
Resources :
preg_replace
regular-expressions.info
On the same topic :
Regular expression to remove number, then a space?
regular expression for matching number and spaces.
I'd use
/^\d+\s+/
It looks for a number of any size in the beginning of a string ^\d+
Then looks for a patch of whitespace after it \s+
When you use a backslash before certain letters it represents something...
\d represents a digit 0,1,2,3,4,5,6,7,8,9.
\s represents a whitespace character (space, tab, newline, ...).
Add a plus sign (+) to the end and you can have...
\d+ a series of digits (number)
\s+ multiple spaces (typos etc.)
The same regex I gave you on your other question still applies. You just have to use preg_replace() instead.
Search for /^[\s\d]+/ and replace with the empty string. Eg:
$str = preg_replace('/^[\s\d]+/', '', $str);
This will remove digits and spaces in any order from the beginning of the string. For something that removes only a number followed by spaces, see BoltClock's answer.
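A small sketch of where the two patterns diverge. The function names and the second input are made up for the comparison:

```php
<?php
// A number followed by whitespace, only at the very start of the string.
function strip_leading_number(string $s): string
{
    return preg_replace('/^\d+\s+/', '', $s);
}

// Any mix of digits and whitespace at the start of the string.
function strip_leading_digits_and_spaces(string $s): string
{
    return preg_replace('/^[\s\d]+/', '', $s);
}

foreach (['13 Adam Court, Cannock', ' 13 4 Adam Court'] as $s) {
    var_dump(strip_leading_number($s), strip_leading_digits_and_spaces($s));
}
// First input: both give 'Adam Court, Cannock'.
// Second input: the first pattern leaves it untouched (it starts with a space),
// the second gives 'Adam Court'.
```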
If the input strings all have the same expected format and you will receive the same result from left trimming all numbers and spaces (no matter the order of their occurrence at the front of the string), then you don't actually need to fire up the regex engine.
I love regex, but know not to use it unless it provides a valuable advantage over a non-regex technique. Regex is often slower than non-regex techniques.
Use ltrim() with a character mask that includes spaces and digits.
Code: (Demo)
var_export(
ltrim('420 911 90210 666 keep this part', ' 0..9')
);
Output:
'keep this part'
It wouldn't matter if the string started with a space either. ltrim() will greedily remove all instances of spaces or numbers from the start of the string until it can't anymore.
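That greediness is also the caveat: if the part you want to keep can itself start with a digit, ltrim() eats into it. A sketch with a made-up address, compared against the anchored regex from the other answers:

```php
<?php
$address = '13 4th Street';

// The mask covers space and 0-9, so trimming continues through the "4".
$trimmed = ltrim($address, ' 0..9');

// The regex removes exactly one number plus the whitespace after it.
$replaced = preg_replace('/^\d+\s+/', '', $address);

echo $trimmed, "\n";  // th Street
echo $replaced, "\n"; // 4th Street
```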