Replace middle string by passing specific pattern - php

I want to replace a string in the middle of a text by matching a pattern. I tried the preg_replace function, but it is not working for me.
$str = "Lead for Nebhub - Admark
Name: Punam Kalbande
Email: kalbandepunam#gmail.com
Phone Number: 800-703-3209
Nebhub Partner : Nebhub - Admark
Address: PO Box 830395 Miami, FL 33173
Hub : Automotive
Products: ERP, CRM, HCM, Help Desk, Marketing";
$pattern = '/^Hub :(.+)Products:$/i';
$replacement = "Logistics";
$result = preg_replace($pattern, $replacement, $str);
But the above code only returns the original string. It does not replace the value with the new one.

Your Pattern is anchored: ^ matches only at the Start of the String and $ only at the End of the String (unless the m-Modifier is set), so the whole String would have to match. In addition, . does not match Linebreaks unless the s-Modifier is set, so (.+) can never reach from one Line to the next. You want to match the Pattern somewhere in the middle of the Text. Use this Regex, and it will work for you.
/(Hub :)[^\n]+/is
Explanation:
( start Subpattern
Hub the Word Hub
followed by a space
: followed by a Colon
) end Subpattern -> accessible by $1 or \1
[^\n]+ match one or more Characters except a Linebreak
i Modifier for caseinsensitive Search
s Modifier that lets . match Linebreaks (it has no effect here, because this Pattern contains no .)
What you have to do now is to output the Subpattern in the Replacement too:
$result = preg_replace($pattern, "$1 $replacement", $str); // the space keeps "Hub : Logistics" instead of "Hub :Logistics"
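An alternative that keeps the Anchors is the m-Modifier, which makes ^ and $ match at every Line Start and Line End. This is only a minimal sketch, assuming the Label always sits at the Start of its own Line; the shortened $str is made up for illustration.

```php
<?php
// Shortened sample text (assumption); only the "Hub :" line matters here.
$str = "Name: Punam Kalbande\nHub : Automotive\nProducts: ERP, CRM";
$replacement = "Logistics";

// m: ^ and $ match at every line start/end, not only at the string ends.
// [ \t]* keeps the spacing around the colon without swallowing a line break.
$pattern = '/^(Hub[ \t]*:[ \t]*).*$/mi';

// ${1} avoids ambiguity if the replacement text ever starts with a digit.
$result = preg_replace($pattern, '${1}' . $replacement, $str);

echo $result, "\n";
```

Because . stops at a Linebreak, .* only consumes the rest of the Hub Line and the Products Line stays untouched.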

Related

Regex trouble, capture until find space or endline

I'm trying to capture the following match:
"url: https://www.anysite/anything"
But sometime the string comes:
"url: https://www.anysite/anything another word"
But I only want to match
"url: https://www.anysite/anything"
whether or not the "another word" comes.
So, my logic is capture until find the first space after the url address, or end of string.
My REGEX IN PHP is:
preg_match("/(Url|url)(\:|\b)(\s\b|\b).+(\s|$)/",$linestring,$url_string);
But it always brings the "another word" too, instead of stopping at the space.
The .+ is greedy unless the quantifier is made ungreedy with a ? or the U modifier.
(Url|url)(\:|\b)(\s\b|\b).+?(\s|$)
You can actually simplify it a bit further:
[Uu]rl(?::|\b)\s?\b.+?(?:\s|$)
If you want to capture the URL part, wrap the .+? in ().
[Uu]rl(?::|\b)\s?\b(.+?)(?:\s|$)
https://regex101.com/r/urq2fM/2/
One way to capture until the first space is to use \S+, which matches any sequence of one or more non-space characters:
url:?\s*(\S+)
By using the i flag we can avoid having to test for Url or url or URL etc. We can use preg_replace to simplify usage, replacing the string with just the captured group:
$url = preg_replace('/url:?\s*(\S+).*/i', '$1', $string);
e.g.
$strings = array("url: https://www.anysite/anything",
                 "url: https://www.anysite/anything another word");
foreach ($strings as $string) {
    $url = preg_replace('/url:?\s*(\S+).*/i', '$1', $string);
    echo "$url\n";
}
Output:
https://www.anysite/anything
https://www.anysite/anything
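One caveat with preg_replace is that it hands back the input unchanged when nothing matches. If you need to tell "no URL found" apart from a result, preg_match with the same \S+ idea may be easier to work with. A rough sketch; extractUrl is a hypothetical helper name, not something from the question:

```php
<?php
// Hypothetical helper: returns the token after "url:" or null when there is none.
function extractUrl(string $line): ?string
{
    // \S+ stops at the first whitespace character or at the end of the string.
    if (preg_match('/\burl:?\s*(\S+)/i', $line, $m)) {
        return $m[1];
    }
    return null;
}

var_dump(extractUrl("url: https://www.anysite/anything another word"));
var_dump(extractUrl("no link in this line"));
```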

Find a string and create a variable with regex

I grab the source code of a website with file_get_contents().
Inside this code, I am trying to detect this kind of string and put the content of idDM in a variable.
'idDM':'x1mi7f7'
For example, here, $idDM will be equal to x1mi7f7, but the string can be :
'idDM':'xxxxxxx'
And the variable will be xxxxxxx.
I know I have to use a regex for that. For now, I have only managed to detect whether idDM is present, but not to retrieve its content.
Any advice ? Thanks.
Use the following regex:
'idDM'\s*:\s*'([^']+)'
Explanation:
'idDM' - match the literal string 'idDM' (with the quotes)
\s* - match zero or more whitespace characters
: - match a literal colon character
\s* - match zero or more whitespace characters
'([^']+)' - match (and capture) everything that's inside single-quotes
Usage:
$str = "foo bar 'idDM':'x1mi7f7' more baz";
$idDM = null; // stays null when the pattern is not found
if (preg_match("/'idDM'\s*:\s*'([^']+)'/", $str, $matches)) {
    $idDM = $matches[1];
}
var_dump($idDM); // => string(7) "x1mi7f7"
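If the page might use double quotes instead of single quotes (an assumption, the question only shows single quotes), a backreference can require each closing quote to match its opening quote. A small sketch with a hypothetical helper name:

```php
<?php
// Hypothetical helper: returns the idDM value, or null when it is absent.
function findIdDM(string $source): ?string
{
    // \1 and \2 force each closing quote to be the same character as its opener.
    $pattern = '/([\'"])idDM\1\s*:\s*([\'"])([^\'"]+)\2/';
    return preg_match($pattern, $source, $m) ? $m[3] : null;
}

var_dump(findIdDM("foo 'idDM':'x1mi7f7' bar"));
var_dump(findIdDM('foo "idDM" : "abc1234" bar'));
var_dump(findIdDM('nothing to see here'));
```

The value ends up in group 3 because the two quote characters occupy groups 1 and 2.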

Regular Expression - php - getting spaces not preceded and not followed by a word

Having something like this:
'This or is or some or information or stuff or attention here or testing'
I want to capture all the [spaces] that aren't preceded nor followed by the word or.
I reached this, I think I'm on the right track.
/\s(?<!(\bor\b))\s(?!(\bor\b))/
or this
/(?=\s(?<!(\bor\b))(?=\s(?!(\bor\b))))/
I'm not getting all the spaces, though. What is wrong with this? (The second one was an attempt to get the "and" condition going.)
Try this:
<?php
$str = 'This or is or some or information or stuff or attention is not here or testing';
$matches = null;
preg_match_all('/(?<!\bor\b)[\s]+(?!\bor\b)/', $str, $matches);
var_dump($matches);
?>
How about (?<!or)\s(?!or):
$str='This or is or some or information or stuff or attention here or testing';
echo preg_replace('/(?<!or)\s(?!or)/','+',$str);
>>> This or is or some or information or stuff or attention+here or testing
This uses negative lookbehind and lookahead. Note that it also skips the space in Tor operator, for example, because Tor ends in or and operator starts with or. If you want to exclude only the whole word or, add word boundaries:
$str='Tor operator';
echo preg_replace('/(?<!\bor)\s(?!or\b)/','+',$str);
>>> Tor+operator
Code:
$string = "You may organize to find or seek a neighbor or a pastor in a harbor or orchard.";
echo preg_replace('~(?<!\bor) (?!or\b)~', '_', $string);
Output:
You_may_organize_to_find or seek_a_neighbor or a_pastor_in_a_harbor or orchard.
Effectively the pattern says:
Match every space IF:
the space is not preceded by the full word "or" (a word that ends in "or" doesn't count), and
the space is not followed by the full word "or" (a word that begins with "or" doesn't count)
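Since the question asks to capture the spaces rather than replace them, the same pattern can be handed to preg_match_all with PREG_OFFSET_CAPTURE to get their positions. A sketch only; spacesAwayFromOr is a made-up helper name:

```php
<?php
// Hypothetical helper: byte offsets of the spaces that do not touch the word "or".
function spacesAwayFromOr(string $text): array
{
    preg_match_all('~(?<!\bor) (?!or\b)~', $text, $m, PREG_OFFSET_CAPTURE);
    // Each entry of $m[0] is [matched text, offset]; keep only the offsets.
    return array_column($m[0], 1);
}

print_r(spacesAwayFromOr('Tor operator or else'));
```

In 'Tor operator or else' only the space between Tor and operator qualifies; the other two spaces sit next to the full word or.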

Codeigniter preg_replace

I am not sure if this problem is a boo-boo on my part or something about CI. I have a preg_replace process to convert a published gdoc spreadsheet url back into the original spreadsheet url.
$pat ='/(^[a-z\/\.\:]*?sheet\/)(pub)([a-zA-Z0-9\=\?]*)(\&output\=html)/';
$rep ='$1ccc$3#gid=0';
$theoriginal = preg_replace( $pat, $rep, $published );
This works fine in a test page run locally. This test page isn't framed by CI - it's just a basic php page.
When I copy and paste the pattern and replacement into the CI view which it's intended for, no joy.
Is this malfunction caused by CI or my 'bad' ? Are there easy-to-implement remedies ?
Here's a bit more code from the CI view:
<body id="sites" >
<?php
foreach ( $dets as $item )
{
$nona = $item->nona;
$address = $item->address;
$town = $item->town;
$pc = $item->pc;
$foto1 = $item->foto1;
$foto1txt = $item->foto1txt;
$foto2 = $item->foto2;
$foto2txt = $item->foto2txt;
$costurl = $item->costurl;
$sid = $item->sid;
}
//convert published spreadsheet url to gdoc spreadsheet url
$pat ='/(^[a-z\/\.\:]*?sheet\/)(pub)([a-zA-Z0-9\=\?]*)(\&output\=html)/i';
$rep ='$1ccc$3#gid=0';
$spreadsheet = preg_replace( $pat, $rep, $costurl);
Tom
The pattern you came to can be "tidied" up a bit:
~^(.*?sheet/)pub(.*)(&[a-z=]*)$~i
The leading ^ and trailing $ are not usually put inside the groups. The / can be left unescaped if you use a regex delimiter other than /. A & and = are not special regex metacharacters, = is only "special" in positive lookaround constructs. So, your pattern means:
^ - start of a string anchor
(.*?sheet/) - Group 1: any 0+ chars other than line break chars, as few as possible, up to the first occurrence of sheet/ and the subsequent subpatterns. Since I believe the point is to match pub only in the URL path, not in the query string, it is better to replace .*? with [^?#]*?, a negated character class matching 0+ chars other than # and ?
pub - a substring
(.*) - Group 2: any 0+ chars other than line break chars, as many as possible, up to the last occurrence of the subsequent subpatterns...
(&[a-z=]*) - Group 3: a & followed with 0 or more ASCII letters (since i modifier is used, the [a-z] pattern will also match uppercase letters) and/or =
$ - end of string anchor.
It seems to me that you may also use a better pattern like
~^([^?#]*?sheet/)pub(.*)(&[a-z=]*)$~i
   ^^^^^^
The change is explained in the breakdown above.
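One thing to watch when switching to the tidied pattern: pub is no longer captured, so the query-string part moves from $3 to $2 and the replacement has to change with it. A minimal sketch; the URL below is invented for illustration and is not taken from the question:

```php
<?php
// Hypothetical published URL, made up for illustration only.
$published = 'https://docs.example.com/spreadsheet/pub?key=abc123&output=html';

$pat = '~^([^?#]*?sheet/)pub(.*)(&[a-z=]*)$~i';
// Group 1 = everything up to "sheet/", group 2 = the query string, group 3 = "&output=html".
$rep = '${1}ccc${2}#gid=0';

$original = preg_replace($pat, $rep, $published);

if ($original === $published) {
    // preg_replace returns the subject unchanged when the pattern does not match.
    echo "pattern did not match\n";
} else {
    echo $original, "\n";
}
```

Comparing the result with the input is a cheap way to see, inside the CI view, whether the pattern matched at all or whether $costurl simply differs from the test string.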

Problem with regex for text parsing (similar to textile)

I'm banging my head against the wall trying to figure out a (regexp?) based parser rule for the following problem. I'm developing a text markup parser similar to textile (using PHP), but I don't know how to get the inline formatting rules correct. I also noticed that the textile parsers I found are not able to format the following text the way I would like:
-*deleted* -- text- and -more deleted text-
The result I want to have is:
<del><strong>deleted</strong> -- text</del> and <del>more deleted text</del>
What I do not want is:
<del><strong>deleted</strong> </del>- text- and <del>more deleted text</del>
Any ideas are very appreciated! thanks very much!
UPDATE
I think I should have mentioned that '-' should still be a valid character (hyphen) :) For example, the following should be possible:
-american-football player-
expected result:
<del>american-football player</del>
Based on the RedCloth library's parser description, with some modification for double-dash.
#
(?<!\S) # Start of string, or after space or newline
- # Opening dash
( # Capture group 1
(?: # : (see note 1)
[^-\s]+ # :
[-\s]+ # :
)*? # :
[^-\s]+? # :
) # End
- # Closing dash
(?![^\s!"\#$%&',\-./:;=?\\^`|~[\]()<]) # (see note 2)
#x
Note 1: This should match up to the next dash lazily, while consuming any non-single dashes, and single dashes surrounded by whitespace.
Note 2: Followed by space, punctuation, line break or end of string.
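Note that the commented layout above cannot be pasted into PHP as-is, because each comment's # would be read as the closing # delimiter. A sketch of one way around that, assuming {} delimiters and a nowdoc string (both are my choices, not part of the original answer):

```php
<?php
// Nowdoc: no escaping or interpolation, so the pattern is passed through verbatim.
$regex = <<<'REGEX'
{
(?<!\S)                  # start of string, or after whitespace
-                        # opening dash
(                        # capture group 1
  (?:[^-\s]+[-\s]+)*?    # words followed by dashes or whitespace, lazily
  [^-\s]+?               # last word before the closing dash
)
-                        # closing dash
(?![^\s!"#$%&',\-./:;=?\\^`|~[\]()<])  # next: space, punctuation or end of string
}x
REGEX;

echo preg_replace($regex, '<del>\1</del>', '-american-football player-'), "\n";
```

Inside a character class the x modifier does not treat # as a comment start, so the class can stay exactly as written.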
Or compacted:
#(?<!\S)-((?:[^-\s]+[-\s]+)*?[^-\s]+?)-(?![^\s!"\#$%&',\-./:;=?\\^`|~[\]()<])#
A few examples:
// The # inside the character class is escaped, otherwise it would end the #-delimited pattern.
$regex = '#(?<!\S)-((?:[^-\s]+[-\s]+)*?[^-\s]+?)-(?![^\s!"\#$%&\',\-./:;=?\\\^`|~[\]()<])#';
$replacement = '<del>\1</del>';
echo preg_replace($regex, $replacement, '-*deleted* -- text- and -more deleted text-'), "\n";
echo preg_replace($regex, $replacement, '-*deleted*--text- and -more deleted text-'), "\n";
echo preg_replace($regex, $replacement, '-american-football player-'), "\n";
Will output:
<del>*deleted* -- text</del> and <del>more deleted text</del>
<del>*deleted*</del>-text- and <del>more deleted text</del>
<del>american-football player</del>
In the second example, it will match just -*deleted*-, since there are no spaces before the --. -text- will not be matched, because the initial - is not preceded by a space.
The strong tag is easy:
$string = preg_replace('~[*](.+?)[*]~', '<strong>$1</strong>', $string);
Working on the others.
Shameless hack for the del tag:
$string = preg_replace('~-(.+?)-~', '<del>$1</del>', $string);
$string = str_replace('<del></del>', '--', $string);
For a single token, you can simply match:
-((?:[^-]|--)*)-
and replace with:
<del>$1</del>
and similarly for \*((?:[^*]|\*{2,})*)\* and <strong>$1</strong>.
The regex is quite simple: a literal - at both ends. In the middle, in a capturing group, we allow anything that isn't a hyphen, or two hyphens in a row.
To also allow single dashes in words, as in objective-c, this can work, by accepting dashes surrounded by two alphanumeric letters:
-((?:[^-]|--|\b-\b)*)-
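Put together, the two single-token patterns might be applied one after the other like this. Just a sketch: inlineMarkup is a hypothetical helper, and running the del rule before the strong rule is my assumption.

```php
<?php
// Hypothetical helper combining the two single-token patterns.
function inlineMarkup(string $text): string
{
    // -...-: anything but a dash, a double dash, or a dash inside a word.
    $text = preg_replace('~-((?:[^-]|--|\b-\b)*)-~', '<del>$1</del>', $text);
    // *...*: anything but an asterisk, or a run of two or more asterisks.
    return preg_replace('~\*((?:[^*]|\*{2,})*)\*~', '<strong>$1</strong>', $text);
}

echo inlineMarkup('-*deleted* -- text- and -more deleted text-'), "\n";
echo inlineMarkup('-american-football player-'), "\n";
```

For the two inputs from the question this yields the nesting that was asked for, but it has not been checked against other textile constructs.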
You could try something like:
'/-.*?[^-]-\b/'
Where the ending hyphen must be at a word boundary and preceded by something that is not a hyphen.
I think you should read this warning sign first
You can't parse [X]HTML with regex
Perhaps you should try googling for a php html library
