I used preg_replace to replace . with (dot), but the result is empty - PHP

I tried to use preg_replace, but it does not work.
I wrote the code below, but it returns an empty result.
$str = 'amin.m is 1.2 ^.j ';
echo $str.'<br>';
echo preg_replace('/(\D|\d)\.(\D|\d)', '\1\(dot\)\2', $str);

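A couple of notes:
Your pattern is missing its closing delimiter: '/(\D|\d)\.(\D|\d)' has no trailing /, so preg_replace() raises a "No ending delimiter '/' found" warning and returns null, which echo prints as an empty string. That is why your output is empty.
Your pattern is using a pipe | between \D and \d. These two classes are exact opposites, so (\D|\d) matches any character at all, including whitespace. I think what you may want is \S, which matches any non-whitespace character. Removing the alternation also reduces the pattern's step count (improves efficiency).
You don't need to escape the parentheses in your replacement string, unless you want to see those backslashes in the output.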
New pattern: /(\S)\.(\S)/
New PHP:
$str = 'amin.m is 1.2 ^.j ';
echo $str.'<br>';
echo preg_replace('/(\S)\.(\S)/','\1(dot)\2',$str);
Output:
amin.m is 1.2 ^.j
amin(dot)m is 1(dot)2 ^(dot)j
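A minimal sketch that reproduces the empty output and the fix, using the question's own string. The question's pattern has no closing / delimiter, so preg_replace() emits a warning (silenced here with @) and returns null, which echo prints as nothing. The last part is my own addition, not from the answer: because (\S)\.(\S) consumes the character after each dot, a run like 1.2.3 only gets its first dot replaced, and lookarounds are one way to avoid that.

```php
<?php
$str = 'amin.m is 1.2 ^.j ';

// Question's pattern: the closing "/" delimiter is missing, so PCRE never runs.
// preg_replace() warns (silenced with @) and returns null.
$broken = @preg_replace('/(\D|\d)\.(\D|\d)', '\1(dot)\2', $str);
var_dump($broken); // NULL

// Answer's pattern: delimiter closed, \S instead of (\D|\d).
$fixed = preg_replace('/(\S)\.(\S)/', '$1(dot)$2', $str);
echo $fixed, "\n"; // amin(dot)m is 1(dot)2 ^(dot)j

// (\S)\.(\S) consumes the "2" in "1.2.3", so the second dot is skipped.
$consumed = preg_replace('/(\S)\.(\S)/', '$1(dot)$2', '1.2.3');
// Lookarounds check the neighbors without consuming them.
$lookaround = preg_replace('/(?<=\S)\.(?=\S)/', '(dot)', '1.2.3');
echo $consumed, "\n";   // 1(dot)2.3
echo $lookaround, "\n"; // 1(dot)2(dot)3
```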

Related

Regex trouble: capture until a space or end of line

I'm trying to capture the following match:
"url: https://www.anysite/anything"
But sometimes the string is:
"url: https://www.anysite/anything another word"
But I only want to match
"url: https://www.anysite/anything"
whether or not "another word" is present.
So my logic is: capture until the first space after the URL, or the end of the string.
My regex in PHP is:
preg_match("/(Url|url)(\:|\b)(\s\b|\b).+(\s|$)/",$linestring,$url_string);
But it always brings "another word" too, instead of stopping at the space.
The .+ is greedy unless the quantifier is made lazy with a ? or the U modifier:
(Url|url)(\:|\b)(\s\b|\b).+?(\s|$)
You can actually simplify it a bit further:
[Uu]rl(?::|\b)\s?\b.+?(?:\s|$)
If you want to capture just the URL, wrap the .+? in parentheses:
[Uu]rl(?::|\b)\s?\b(.+?)(?:\s|$)
https://regex101.com/r/urq2fM/2/
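A small sketch comparing the question's greedy pattern with the lazy, capturing version above on both sample strings. Only the two patterns come from the thread; the preg_match() loop and variable names are my own illustration.

```php
<?php
$lines = [
    'url: https://www.anysite/anything',
    'url: https://www.anysite/anything another word',
];
$greedy = '/(Url|url)(\:|\b)(\s\b|\b).+(\s|$)/';   // question's pattern
$lazy   = '/[Uu]rl(?::|\b)\s?\b(.+?)(?:\s|$)/';   // lazy, URL in group 1

$results = [];
foreach ($lines as $line) {
    preg_match($greedy, $line, $g);
    preg_match($lazy, $line, $l);
    // $g[0] is the whole greedy match; $l[1] is only the captured URL.
    $results[] = ['greedy' => $g[0], 'lazy' => $l[1]];
    echo "greedy: {$g[0]}\nlazy:   {$l[1]}\n";
}
```

On the second line the greedy .+ runs to the end of the string before (\s|$) is tried, so "another word" stays in the match, while the lazy .+? stops at the first whitespace.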
One way to capture until the first space is to use \S+, which matches any sequence of one or more non-space characters:
url:?\s*(\S+)
By using the i flag we can avoid having to test for Url or url or URL etc. We can use preg_replace to simplify usage, replacing the string with just the captured group:
$url = preg_replace('/url:?\s*(\S+).*/i', '$1', $string);
e.g.
$strings = array("url: https://www.anysite/anything",
"url: https://www.anysite/anything another word");
foreach ($strings as $string) {
$url = preg_replace('/url:?\s*(\S+).*/i', '$1', $string);
echo "$url\n";
}
Output:
https://www.anysite/anything
https://www.anysite/anything
Demo on 3v4l.org

preg_match_all has different result set than preg_replace using the same pattern

I find that preg_match_all and preg_replace do not find the same matches based on the same pattern.
My pattern is:
/<(title|h1|h2|h3|h4|h5|ul|ol|p|figure|caption|span)(.*?)><\/(\1)>/
When I run this against a snippet containing the likes of
<span class="blue"></span>
with preg_match_all I get 17 matches.
When I use the same pattern in preg_replace I get 0 matches. Replacing the \1 with the selection list does find the matches, but of course that won't work as a solution, because it then doesn't ensure that the closing tag is the same type as the opening tag.
The overall goal is to find instances of tags with no content that should not be present without content...a holy crusade, I assure you.
To test whether the regex works, I have also tried it in the PHP CLI. Here is the output:
Interactive shell
php > $str = 'abc<span class="blue"></span>def';
php > $pattern = "/<(title|h1|h2|h3|h4|h5|ul|ol|p|figure|caption|span)(.*?)><\/(\1)>/";
php > $final = preg_replace($pattern, '', $str);
php > print $final;
abc<span class="blue"></span>def
$str = 'abc<span class="blue"></span>def';
$pattern = "/<(title|h1|h2|h3|h4|h5|ul|ol|p|figure|caption|span)(.*?)><\/(\\1)>/";
// note the added backslash: \\1 instead of \1
$final = preg_replace($pattern, '', $str);
print $final;
// echos 'abcdef'
Explanation:
"\1" // <-- chr(1), a character written in octal notation
is very different from
'\1' // <-- a backslash followed by 1
because the first is an escape sequence in a double-quoted string. This is also the reason I almost exclusively use single-quoted strings. See http://php.net/string#language.types.string.syntax.double
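To see the difference concretely, a short sketch: it checks what each string literal actually contains, then runs the question's pattern written in single quotes, where \1 needs no extra backslash.

```php
<?php
// Double quotes: "\1" is an octal escape, i.e. the single byte chr(1).
var_dump("\1" === chr(1)); // bool(true)
// Single quotes: '\1' stays two characters, a backslash and "1".
var_dump(strlen('\1'));    // int(2)

// So a single-quoted pattern hands the backreference \1 to PCRE intact.
$str = 'abc<span class="blue"></span>def';
$pattern = '/<(title|h1|h2|h3|h4|h5|ul|ol|p|figure|caption|span)(.*?)><\/(\1)>/';
$final = preg_replace($pattern, '', $str);
echo $final, "\n"; // abcdef
```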

PHP RegExp match and substitute

I am testing a regex with the online tool regexr.com. I will test strings with multiple cases, but I can't get the substitution to work.
The regex for matching the string is:
/^[0-9]{1,3}[0-9]{6,7}$/
It matches a local mobile number in my country, like this:
0921234567
Then I want to substitute the number this way: add a "+" sign, add my country code "123", add a "." sign, and finally add the matched number with the leading zero stripped.
The final number will be:
+385.921234567
I have a basic idea of how to insert the matched string, but I am not sure how to prepend characters and strip the zero from the matched string in the following substitution pattern:
\+$&\n\t
I will use PHP preg_replace function.
EDIT:
As someone wisely mentioned, there may be one, two, or no leading zeros, but I will create separate test cases with a regex that just tests the number of zeros. Doing it all in one regex seems too complicated for now.
Possible numbers will be:
0921234567
00111921234567
Where 111 is the country code. I know that some country codes consist of 2 or 3 digits, but I will create special cases for most country codes.
You can use this preg_replace to strip optional zeros from the start of your mobile number:
$str = preg_replace('~^0*(\d{7,9})$~', '+385.$1', $str);
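A quick way to check what this replacement does with different numbers of leading zeros. The sample inputs are made up for illustration. Note that the country-code form from the question's edit (00111921234567) comes back unchanged: 12 digits remain after the zeros, and \d{7,9} cannot match them.

```php
<?php
// Made-up sample inputs: 1 zero, no zero, 2 zeros, and the country-code form.
$inputs = ['0921234567', '921234567', '00921234567', '00111921234567'];
$out = [];
foreach ($inputs as $str) {
    // ^0* removes any leading zeros, but only when 7 to 9 digits remain.
    $out[] = preg_replace('~^0*(\d{7,9})$~', '+385.$1', $str);
}
print_r($out);
```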
^[0-9]([0-9]{1,2}[0-9]{6,7})$
You just need to add a capture group. Replace with +385.$1. See demo:
https://regex101.com/r/cJ6zQ3/22
$re = "/^[0-9]([0-9]{1,2}[0-9]{6,7})$/m";
$str = "0921234567\n";
$subst = "+385.$1";
$result = preg_replace($re, $subst, $str);
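The /m flag in this answer makes ^ and $ match at each line, so several newline-separated numbers can be converted in one call. A sketch with made-up sample numbers; as the third line shows, the leading [0-9] drops whatever the first digit is, not only a 0.

```php
<?php
// /m: ^ and $ match at every line start/end, not just the whole string.
$re = '/^[0-9]([0-9]{1,2}[0-9]{6,7})$/m';
$str = "0921234567\n0981234567\n1921234567"; // made-up numbers
$result = preg_replace($re, '+385.$1', $str);
echo $result, "\n";
```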
I would use a 2-step solution:
Check if we match the main regex
Build the result by prepending "+", the country code and "." to the number with its leading zeros removed.
PHP code:
$re = "/^[0-9]{7,10}$/";
$str = "0921234567";
if (preg_match($re, $str, $match)) {
echo "+385." . preg_replace('/^0+/', '', $match[0]);
}
Note that splitting the character class in two makes no sense when you are not using capture groups: ^[0-9]{7,10}$ is then equivalent to ^[0-9]{1,3}[0-9]{6,7}$, since both match 7 to 10 digits from the start to the end of the string.
Leading zeros are easily trimmed from the start with /^0+/ regex.
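The two steps can be wrapped in a small function. formatMobile() is a hypothetical name, and the +385 prefix is hard-coded as in the answer; the function returns null when the input is not 7 to 10 digits.

```php
<?php
// Hypothetical helper: step 1 validates, step 2 prepends "+385." and
// strips leading zeros. Returns null when the input is not 7-10 digits.
function formatMobile(string $number): ?string
{
    if (!preg_match('/^[0-9]{7,10}$/', $number, $match)) {
        return null;
    }
    return '+385.' . preg_replace('/^0+/', '', $match[0]);
}

var_dump(formatMobile('0921234567'));     // string(14) "+385.921234567"
var_dump(formatMobile('00111921234567')); // NULL: 14 digits is over the limit
```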

Regular Expression - PHP - getting spaces not preceded and not followed by a word

Having something like this:
'This or is or some or information or stuff or attention here or testing'
I want to capture all the spaces that are neither preceded nor followed by the word or.
I reached this, I think I'm on the right track.
/\s(?<!(\bor\b))\s(?!(\bor\b))/
or this
/(?=\s(?<!(\bor\b))(?=\s(?!(\bor\b))))/
I'm not getting all the spaces, though. What is wrong with this? (The second one was an attempt to get the "and" condition working.)
Try this:
<?php
$str = 'This or is or some or information or stuff or attention is not here or testing';
$matches = null;
preg_match_all('/(?<!\bor\b)[\s]+(?!\bor\b)/', $str, $matches);
var_dump($matches);
?>
How about (?<!or)\s(?!or):
$str='This or is or some or information or stuff or attention here or testing';
echo preg_replace('/(?<!or)\s(?!or)/','+',$str);
>>> This or is or some or information or stuff or attention+here or testing
This uses a negative lookbehind and lookahead. Note that it also skips the space in "Tor operator", for example, because "Tor" ends in "or" and "operator" starts with it. If you only want to exclude the standalone word or, include the neighboring spaces in the lookarounds:
$str='Tor operator';
echo preg_replace('/(?<! or)\s(?!or )/','+',$str);
>>> Tor+operator
Code:
$string = "You may organize to find or seek a neighbor or a pastor in a harbor or orchard.";
echo preg_replace('~(?<!\bor) (?!or\b)~', '_', $string);
Output:
You_may_organize_to_find or seek_a_neighbor or a_pastor_in_a_harbor or orchard.
Effectively the pattern says:
Match every space IF:
the space is not preceded by the full word "or" (a word that ends in "or" doesn't count), and
the space is not followed by the full word "or" (a word that begins with "or" doesn't count)
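Applied to the question's own sentence, the same pattern leaves every space next to a standalone or untouched, and it also handles "Tor operator", because \b stops "Tor" from counting as "or". A short sketch; the PREG_OFFSET_CAPTURE call at the end is my addition for the "capture the spaces" part of the question, returning each matched space with its byte offset.

```php
<?php
$pattern = '~(?<!\bor) (?!or\b)~';
$sentence = 'This or is or some or information or stuff or attention here or testing';

$out1 = preg_replace($pattern, '_', $sentence);
$out2 = preg_replace($pattern, '_', 'Tor operator');
echo $out1, "\n"; // This or is or some or information or stuff or attention_here or testing
echo $out2, "\n"; // Tor_operator

// Capture the matching spaces themselves, with their byte offsets.
preg_match_all($pattern, $sentence, $m, PREG_OFFSET_CAPTURE);
foreach ($m[0] as [$space, $offset]) {
    echo "space at offset $offset\n";
}
```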

Problem with regex for text parsing (similar to textile)

I'm banging my head against the wall trying to figure out a (regex-based?) parser rule for the following problem. I'm developing a text markup parser similar to Textile (using PHP), but I don't know how to get the inline formatting rules correct. I also noticed that the Textile parsers I found are not able to format the following text the way I would like:
-*deleted* -- text- and -more deleted text-
The result I want to have is:
<del><strong>deleted</strong> -- text</del> and <del>more deleted text</del>
What I do not want is:
<del><strong>deleted</strong> </del>- text- and <del>more deleted text</del>
Any ideas are much appreciated. Thanks very much!
UPDATE
I should have mentioned that '-' should still be a valid character (a hyphen). For example, the following should be possible:
-american-football player-
expected result:
<del>american-football player</del>
Based on the RedCloth library's parser description, with some modifications for the double dash:
#
(?<!\S) # Start of string, or after space or newline
- # Opening dash
( # Capture group 1
(?: # : (see note 1)
[^-\s]+ # :
[-\s]+ # :
)*? # :
[^-\s]+? # :
) # End
- # Closing dash
(?![^\s!"\#$%&',\-./:;=?\\^`|~[\]()<]) # (see note 2)
#x
Note 1: This should match up to the next dash lazily, while consuming any non-single dashes, and single dashes surrounded by whitespace.
Note 2: Followed by space, punctuation, line break or end of string.
Or compacted (the # inside the character class is escaped because # is also the pattern delimiter):
#(?<!\S)-((?:[^-\s]+[-\s]+)*?[^-\s]+?)-(?![^\s!"\#$%&',\-./:;=?\\^`|~[\]()<])#
A few examples:
$regex = '#(?<!\S)-((?:[^-\s]+[-\s]+)*?[^-\s]+?)-(?![^\s!"\#$%&\',\-./:;=?\\\^`|~[\]()<])#';
$replacement = '<del>\1</del>';
echo preg_replace($regex, $replacement, '-*deleted* -- text- and -more deleted text-'), "\n";
echo preg_replace($regex, $replacement, '-*deleted*--text- and -more deleted text-'), "\n";
echo preg_replace($regex, $replacement, '-american-football player-'), "\n";
Will output:
<del>*deleted* -- text</del> and <del>more deleted text</del>
<del>*deleted*</del>-text- and <del>more deleted text</del>
<del>american-football player</del>
In the second example, it will match just -*deleted*-, since there are no spaces before the --. -text- will not be matched, because the initial - is not preceded by a space.
The strong tag is easy:
$string = preg_replace('~[*](.+?)[*]~', '<strong>$1</strong>', $string);
Working on the others.
Shameless hack for the del tag:
$string = preg_replace('~-(.+?)-~', '<del>$1</del>', $string);
$string = str_replace('<del></del>', '--', $string);
For a single token, you can simply match:
-((?:[^-]|--)*)-
and replace with:
<del>$1</del>
and similarly for \*((?:[^*]|\*{2,})*)\* and <strong>$1</strong>.
The regex is quite simple: a literal - at both ends. In the middle, in a capturing group, we allow anything that isn't a hyphen, or two hyphens in a row.
To also allow single dashes inside words, as in objective-c, this can work by accepting dashes surrounded by two word characters:
-((?:[^-]|--|\b-\b)*)-
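Putting two of these rules together gives the output the question asks for. A sketch: inlineFormat() is a hypothetical helper that applies the <strong> rule from the earlier answer first and then the dash rule with \b-\b; it makes no attempt at the rest of Textile's inline syntax.

```php
<?php
// Hypothetical helper: <strong> first, then <del> allowing "--" and
// hyphens between word characters (as in "american-football").
function inlineFormat(string $text): string
{
    $text = preg_replace('~[*](.+?)[*]~', '<strong>$1</strong>', $text);
    return preg_replace('~-((?:[^-]|--|\b-\b)*)-~', '<del>$1</del>', $text);
}

echo inlineFormat('-*deleted* -- text- and -more deleted text-'), "\n";
// <del><strong>deleted</strong> -- text</del> and <del>more deleted text</del>
echo inlineFormat('-american-football player-'), "\n";
// <del>american-football player</del>
```

The closing dash in "text- and" is not consumed by \b-\b because a space follows it, so the group stops there and the literal - closes the <del>.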
You could try something like:
'/-.*?[^-]-\b/'
Where the ending hyphen must be at a word boundary and preceded by something that is not a hyphen.
I think you should read this warning sign first
You can't parse [X]HTML with regex
Perhaps you should try googling for a PHP HTML library.
