Find unknown string around my string - php

I just want to find unknown text around my string between two spaces.
For example:
$mystring = "blabalbla <b>sometext</b> <b>ssssss</b>"
What I want to do with this:
I know the "sometext" but I want to put in a string the
<b>sometext</b>.
But my string is always changing; for example, it can be:
<s><b>sometext</b></s>
Now I need to put the whole into a string
<s><b>sometext</b></s>.
So I can't use simply attaching my variable to
<b>.mystring.</b>
because in some cases I can have unknown strings around it.
How can I do this? Or is there another way to find and delete those
<b><s><i></b></s></i> etc. around my string?
Thanks, Creep.

You could use a regex:
$mystring = preg_replace('/(^|\s)(?:<[^>]*>)*sometext(?:<[^>]*>)*(\s|$)/i', '$1'.$some_new_text.'$2', $mystring);
I tested this against what you provided, and it should work pretty well. It handles the text being on its own, at the start or end of the string, and surrounded by any number of HTML tags.
Description
Match either the start of the string, or a space (^|\s)
Followed by zero or more html nodes (?:<[^>]*>)*
These are in a non-capturing group, so they don't get assigned a group number
Followed by the known string
If you plan on having this string be dynamic, you will need to use the preg_quote function to escape any special characters
Followed by zero or more html nodes (?:<[^>]*>)*
Followed by either a space or the end of the string (\s|$)
Match case-insensitively /i (optional)
Notice that the leading and trailing spaces (if any) are added back in on the replacement string $1 and $2.
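If the goal is to pull the wrapped text out rather than replace it, the same idea can be sketched with preg_match. This is only a sketch under assumptions: findWrapped is a hypothetical helper name, the known text is passed through preg_quote, and the trailing (\s|$) is turned into a lookahead so the captured group holds just the tags and the text.

```php
<?php
// Hypothetical helper: returns the known text together with any tags wrapped
// directly around it, or null when the text is not found.
function findWrapped(string $haystack, string $known): ?string
{
    // preg_quote escapes regex metacharacters in $known; '/' is the delimiter.
    $pattern = '/(?:^|\s)((?:<[^>]*>)*' . preg_quote($known, '/') . '(?:<[^>]*>)*)(?=\s|$)/i';

    return preg_match($pattern, $haystack, $m) === 1 ? $m[1] : null;
}

$mystring = 'blabalbla <b>sometext</b> <b>ssssss</b>';
echo findWrapped($mystring, 'sometext'), "\n";
echo findWrapped('x <s><b>sometext</b></s> y', 'sometext'), "\n";
```

The captured value can then be stored in a variable, or passed to str_replace if the whole wrapped piece should be deleted.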

Related

php preg_match mismatch

I would like to know why preg_match('/(?<=\s)[^,]+(?=\s)/',$data,$matches);
matches "List Processes 8989" in the string "20180513 List Processes 8989". The regex I am using should not match numeric characters. What is wrong?
The [^,] basically means any character except ,. If you want to exclude numeric characters as well, you can replace it with [^,0-9], or better [^,\d], so your regex would look like this:
(?<=\s)[^,\d]+(?=\s)
I'm assuming the input string in your question is only part of the actual input string you're using because the regex you provided won't match the numbers at the end unless they're followed by a whitespace.
References:
Negated Character Classes.
Difference between [0-9] and \d.
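A small sketch of the difference between the two character classes. The trailing space in $data is an assumption, added so that the lookahead (?=\s) can succeed after the digits, as the answer suspects is the case in the real input.

```php
<?php
// Assumed input: the string from the question plus one trailing space.
$data = '20180513 List Processes 8989 ';

// [^,] excludes only the comma, so digits and spaces are matched as well.
preg_match('/(?<=\s)[^,]+(?=\s)/', $data, $any);

// [^,\d] also excludes digits, so the match stops before " 8989".
preg_match('/(?<=\s)[^,\d]+(?=\s)/', $data, $noDigits);

echo $any[0], "\n";
echo $noDigits[0], "\n";
```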

How to check if string contains specific special characters or starting with a space? [duplicate]

I have the following requirements for validating an input field:
It should only contain letters, with spaces allowed between the letters.
It cannot contain spaces at the beginning or end of the string.
It cannot contain any other special character.
I am using following regex for this:
^(?!\s*$)[-a-zA-Z ]*$
But this is allowing spaces at the beginning. Any help is appreciated.
For me the only logical way to do this is:
^\p{L}+(?: \p{L}+)*$
At the start of the string there must be at least one letter. (I replaced your [a-zA-Z] with the Unicode property for letters, \p{L}.) Then there can be a space followed by at least one letter; this part can be repeated.
\p{L}: any kind of letter from any language. See regular-expressions.info
The problem with ^(?!\s*$) in your expression is that the lookahead only fails if there is nothing but whitespace up to the end of the string. If you want to disallow leading whitespace, just remove the end-of-string anchor inside the lookahead ==> ^(?!\s)[-a-zA-Z ]*$. But this still allows the string to end with whitespace. To avoid this, add a lookbehind at the end of the string: ^(?!\s)[-a-zA-Z ]*(?<!\s)$. But I think lookarounds are not needed for this task.
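The question does not say which regex engine is in use. As a sketch only, here is how the \p{L} pattern could be tried with PHP's preg_match; isValidName is a hypothetical name, and the u and D modifiers are additions on my side (u for UTF-8 input, D so that $ does not tolerate a trailing newline).

```php
<?php
// Hypothetical validator: letters only, single spaces between words,
// no leading or trailing space.
function isValidName(string $s): bool
{
    // u: treat pattern and subject as UTF-8 so \p{L} sees whole characters.
    // D: make $ match only at the very end of the subject.
    return preg_match('/^\p{L}+(?: \p{L}+)*$/uD', $s) === 1;
}

var_dump(isValidName('John Smith'));
var_dump(isValidName(' John'));
var_dump(isValidName('John '));
```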
This should work if you use it with String.matches method. I assume you want English alphabet.
"[a-zA-Z]+(\\s+[a-zA-Z]+)*"
Note that \s will allow all kinds of whitespace characters. In Java, it would be equivalent to
[ \t\n\x0B\f\r]
Which includes horizontal tab (09), line feed (10), vertical tab (11), form feed (12), carriage return (13), space (32).
If you want to specifically allow only space (32):
"[a-zA-Z]+( +[a-zA-Z]+)*"
You can further optimize the regex above by making the capturing group ( +[a-zA-Z]+) non-capturing (with String.matches you are not going to be able to get the words individually anyway). It is also possible to change the quantifiers to make them possessive, since there is no point in backtracking here.
"[a-zA-Z]++(?: ++[a-zA-Z]++)*+"
Try this:
^(((?<!^)\s(?!$)|[-a-zA-Z])*)$
This expression uses a negative lookahead and a negative lookbehind to disallow spaces at the beginning or at the end of the string, and requires the entire string to match.
I think the problem is there's a ? before the negation of white spaces, which means it is optional
This should work:
[a-zA-Z]{1}([a-zA-Z\s]*[a-zA-Z]{1})?
at least one sequence of letters, then optional string with spaces but always ends with letters
I don't know if words in your accepted string can be separated by more than one space. If they can:
^[a-zA-Z]+(( )+[a-zA-Z]+)*$
If they can't:
^[a-zA-Z]+( [a-zA-Z]+)*$
The string must start with a letter (or several letters), not a space.
The string can contain several words, but every word besides the first must have a space before it.
Hope I helped.

php regex: or clause doesn't work

I need to write a regex that does a double check: whether a string contains whitespace at the beginning or at the end, whether the whole string is composed of whitespace, and whether the string contains only numbers.
I've written this regex:
$regex = '/^(\s+ )| ^(\d+)$/';
but it doesn't work. What's wrong?
First things first: get your spaces right!
For example, (\s+ ) will match a minimum of one whitespace character (\s+) followed by another literal space ( )! The same applies to the space between | and ^. This way you require a literal space every time, which leads to wrong results.
If I get you right and you want to match on strings which
start with one or more spaces OR
end with one or more spaces OR
consist only of spaces OR
consist only of numbers
I'd use
/^(?:\s+.*|.*\s+$|\d+$)/
This way you match spaces at the start of the string (\s+.*) or (|) spaces at the end of the string (.*\s+$) or a completely numeric string (\d+$).
Insert capturing groups as needed.
This will match in case the whole string consists of spaces, too, because technically the string then starts with spaces.
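A quick way to check those four cases, as a sketch; isRejected is a hypothetical name, and I'm assuming a match means the input should be refused.

```php
<?php
// Hypothetical helper: true when the string starts or ends with whitespace,
// consists only of whitespace, or consists only of digits.
function isRejected(string $s): bool
{
    return preg_match('/^(?:\s+.*|.*\s+$|\d+$)/', $s) === 1;
}

foreach ([' abc', 'abc ', '   ', '12345', 'abc', 'abc 123'] as $input) {
    echo var_export($input, true), ' => ', var_export(isRejected($input), true), "\n";
}
```

Note that . does not match a newline by default, so multi-line input may need the s modifier.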
The space before ^(\d+) prevents your regex from matching the numeric string.
It should be like this:
$regex = '/^\s*\d*\s*$/';
First of all, remove the space between | and ^. You are trying to match a space before the beginning of the line (^), so that cannot work.
I do not exactly understand what you want. Either a string that only consists of white spaces, or a number that may have white spaces at the beginning or end? Try this:
$regex = '/^\s*\d*\s*$/';

PHP Regex for checking space or certain characters after string

I need a regex which can basically check for space, line break etc after string.
So conditions are,
Allow special characters ., _, -, + inside the string, e.g. #hello.world, #hello_world, #helloworld, etc.
Discard anything, including special characters, where there is no alphanumeric string after them, e.g. #helloworld.<space>, #helloworld-<space>, #helloworld.?, etc. must be parsed as #helloworld
My existing regex is /#([A-Za-z0-9+_.-]+)/, which works perfectly for condition #1, but there still seems to be a problem with condition #2.
I am using the above regex in preg_replace().
Solution:
$str = preg_replace('/#[\w+.\-]+\b/', '[[$0]]', $str);
This works perfectly.
Tested with
http://gskinner.com/RegExr/
You can use word boundaries to easily find the position between an alphanumeric character and a non-alphanumeric character:
$str = preg_replace('/#[\w+.\-]+\b/', '[[$0]]', $str);
Working example: http://ideone.com/0ShCm
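To see how the word boundary trims the trailing punctuation, here is a small sketch. wrapTags is a hypothetical name, and / is used as the delimiter so that the literal # in the pattern cannot be mistaken for the end of the pattern.

```php
<?php
// Hypothetical helper: wraps each #tag in [[...]], leaving trailing
// punctuation outside the brackets.
function wrapTags(string $str): string
{
    // \b forces the match to end right after a word character, so the engine
    // backtracks over a trailing ".", "-" or "+".
    return preg_replace('/#[\w+.\-]+\b/', '[[$0]]', $str);
}

echo wrapTags('#hello.world and #helloworld. and #hello_world-'), "\n";
```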
Here's an idea:
Use strrev to reverse the string
Use strcspn to find the longest prefix of the reversed string that does not contain any alphanumeric characters
Cut the prefix off with substr
Reverse the string again; this is your final result
I'm not taking into account any requirement that restricts the legal characters in the string to some subset, but you can use your regular expression for that (or even strspn, which might be faster).
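The four steps could look roughly like this. It is a sketch, not the code behind the original answer: trimTrailingNonAlnum is a hypothetical name, and "alphanumeric" is assumed to mean ASCII letters and digits only.

```php
<?php
// Hypothetical helper: strips every trailing character that is not an
// ASCII letter or digit.
function trimTrailingNonAlnum(string $s): string
{
    $alnum = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';

    // Step 1: reverse, so the tail of the string becomes its head.
    $reversed = strrev($s);

    // Step 2: length of the leading run that contains no alphanumerics.
    $junk = strcspn($reversed, $alnum);

    // Steps 3 and 4: cut that run off and reverse back.
    return strrev(substr($reversed, $junk));
}

echo trimTrailingNonAlnum('#helloworld.?'), "\n";
```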
The reason is that it's reading the string as a whole. If you want it to parse out everything after the alphanumeric section, you might have to do something like end(explode()) and check whether the last piece is valid, removing it if it isn't; but then you'd have to check the end for every possible explode point, i.e. ., -, ~, etc.
Then again, another trap you might run into is that, in the case of an item or anything with an alphanumeric value, it might just parse everything after the last alphanumeric character.
Sorry that this isn't much help, but I figured thinking aloud does help.

Regex for netbios names

I'm having trouble figuring out how to build a regexp for verifying a NetBIOS name. According to the MS standard, these characters are illegal:
\/:*?"<>|
So, that's what I'm trying to detect. My regex looks like this:
^[\\\/:\*\?"\<\>\|]$
But that won't work.
Can anyone point me in the right direction? (not regexlib.com please...)
And if it matters, I'm using php with preg_match.
Thanks
Your regular expression has two problems:
You insist that the match should span the entire string. As Andrzej says, you are only matching strings of length 1.
You are quoting too many characters. In a character class (i.e. []), you only need to quote characters that are special within character classes, i.e. hyphen, caret, square bracket, backslash (plus the pattern delimiter).
The following call works for me:
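/* In a single-quoted PHP string, \\\\ is needed to put a literal backslash into the character class; \/ escapes the delimiter. */
preg_match('/[\\\\\/:*?"<>|]/', "foo"); /* gives 0: does not include invalid characters */
preg_match('/[\\\\\/:*?"<>|]/', "f<oo"); /* gives 1: does include invalid characters */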
As it stands at the moment, your regex will match the start of the string (^), then exactly one of the characters in the square brackets (i.e. the illegal characters), then the end of the string ($).
So this likely isn't working because a string of length > 1 will trivially fail to match the regex, and thus be considered OK.
You likely don't need the start and end anchors (the ^ and $). If you remove these, then the regex should match one of the bracketed characters occurring anywhere in the input text, which is what you want.
(Depending on the exact regex dialect, you may canonically need fewer backslashes within the square brackets, but they are unlikely to do any harm in any case.)
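Putting the two points together (no anchors, minimal escaping), a check could be sketched as below. hasIllegalNetbiosChars is a hypothetical name, and only the nine characters listed in the question are tested; any other NetBIOS rules such as length limits are not covered.

```php
<?php
// Hypothetical helper: true when $name contains any of \ / : * ? " < > |
function hasIllegalNetbiosChars(string $name): bool
{
    // Single-quoted PHP string: \\\\ becomes the regex \\ (a literal
    // backslash) and \/ escapes the delimiter. No anchors, so the class
    // may match anywhere in the name.
    return preg_match('/[\\\\\/:*?"<>|]/', $name) === 1;
}

var_dump(hasIllegalNetbiosChars('SERVER-01'));
var_dump(hasIllegalNetbiosChars('f<oo'));
var_dump(hasIllegalNetbiosChars('a\\b'));
```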
