Codeigniter preg_replace - php

I am not sure if this problem is a boo-boo on my part or something about CI. I have a preg_replace process to convert a published gdoc spreadsheet url back into the original spreadsheet url.
$pat ='/(^[a-z\/\.\:]*?sheet\/)(pub)([a-zA-Z0-9\=\?]*)(\&output\=html)/';
$rep ='$1ccc$3#gid=0';
$theoriginal = preg_replace( $pat, $rep, $published );
This works fine in a test page run locally. This test page isn't framed by CI - it's just a basic php page.
When I copy and paste the pattern and replacement into the CI view which it's intended for, no joy.
Is this malfunction caused by CI, or is it my bad? Are there easy-to-implement remedies?
Here's a bit more code from the CI view:
<body id="sites" >
<?php
foreach ( $dets as $item )
{
$nona = $item->nona;
$address = $item->address;
$town = $item->town;
$pc = $item->pc;
$foto1 = $item->foto1;
$foto1txt = $item->foto1txt;
$foto2 = $item->foto2;
$foto2txt = $item->foto2txt;
$costurl = $item->costurl;
$sid = $item->sid;
}
//convert published spreadsheet url to gdoc spreadsheet url
$pat ='/(^[a-z\/\.\:]*?sheet\/)(pub)([a-zA-Z0-9\=\?]*)(\&output\=html)/i';
$rep ='$1ccc$3#gid=0';
$spreadsheet = preg_replace( $pat, $rep, $costurl);
Tom

The pattern you came to can be "tidied" up a bit:
~^(.*?sheet/)pub(.*)(&[a-z=]*)$~
See the regex demo.
The leading ^ and trailing $ are not usually put inside the groups. The / can be left unescaped if you use a regex delimiter other than /. A & and = are not special regex metacharacters, = is only "special" in positive lookaround constructs. So, your pattern means:
^ - start of a string anchor
(.*?sheet/) - Group 1: any 0+ chars other than line break chars, as few as possible, up to the first occurrence of sheet/ and the subsequent subpatterns. (Since I believe the point is to only match pub in the URL path, not the query string, you should actually replace .*? with [^?#]*?, a negated character class matching 0+ chars other than # and ?.)
pub - a substring
(.*) - Group 2: any 0+ chars other than line break chars, as many as possible, up to the last occurrence of the subsequent subpatterns...
(&[a-z=]*) - Group 3: a & followed with 0 or more ASCII letters (since i modifier is used, the [a-z] pattern will also match uppercase letters) and/or =
$ - end of string anchor.
It seems to me that you may also use a better pattern like
~^([^?#]*?sheet/)pub(.*)(&[a-z=]*)$~
   ^^^^^^^
See this regex demo. Explanation of the change is provided in the explanation above.
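As a minimal sketch of how the tidied pattern could be used in PHP: the sample URL below is hypothetical, and because the tidied pattern has three groups instead of four, the replacement refers to $2 rather than $3.

```php
<?php
// Hypothetical published-spreadsheet URL, used only for illustration.
$published = 'https://docs.google.com/spreadsheet/pub?key=0Abc123&output=html';

// Group 1: everything up to "sheet/" (path only, no ? or #)
// Group 2: whatever sits between "pub" and the last "&..." part
// Group 3: the trailing "&output=html"
$pat = '~^([^?#]*?sheet/)pub(.*)(&[a-z=]*)$~i';
$rep = '$1ccc$2#gid=0';

$original = preg_replace($pat, $rep, $published);
echo $original, "\n";
// https://docs.google.com/spreadsheet/ccc?key=0Abc123#gid=0
```

If the input does not match, preg_replace() returns the string unchanged, so comparing $original with $published is a quick way to check whether the pattern applied at all.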

Related

Find word after the separator and ignore if the word is surrounded by curly brackets

I have a list of language variables that are separated by (=)equals sign. Example list:
global.second = second
global.minute = minute
global.respect = respect
global.Respect = Respect
respect.count = You have # ${global.respect}
give.respect = Get more ${global.respect} by giving others respect.
give.Respect = Get more ${global.Respect} by giving others Respect.
I've been struggling with a regex as I need to capture the whole line if a specific word after the (=)equals sign exists, ignore if the word is in curly brackets but still capture the whole line if this word exists after the one that is in curly brackets.
Using the example list and searching for respect:
IGNORE: global.second = second
IGNORE: global.minute = minute
CAPTURE LINE: global.respect = respect
CAPTURE LINE: global.Respect = Respect
IGNORE: respect.count = You have # ${global.respect}
CAPTURE LINE: give.respect = Get more ${global.respect} by giving others respect.
CAPTURE LINE: give.Respect = Get more ${global.Respect} by giving others Respect.
Using Google and Stack Overflow I came up with the following regex:
/((?!\{[^\}]*?)(respect)(?![^\{]*?}))$/mi
but it doesn't work as it only captures respect and Respect.
To capture the whole line I modified it to
^(.*=.*?)((?!\{[^\}]*?)(respect)(?![^\{]*?}))$
but still it only captures:
global.respect = respect
global.Respect = Respect
I'm a regex newbie and I can't figure out how to build this complicated regex. If anyone can help, it will be really appreciated! I've added my PHP filter function below. $search_word comes from a text input box on one of my pages.
function FilterWord($search_word, $main_file_path, $filter_file_path)
{
$content = file_get_contents($main_file_path);
$pattern = preg_quote($search_word, '/');
//$pattern = "/^.*=.*$pattern.*\$/mi";
$pattern = "/(.*=.*?)((?!\{[^\}]*?)($pattern)(?![^\{]*?}))$/mi";
//[^$search_word {}]+(?![^{]*})
//$pattern = "/^.*=.*$pattern.*\$/mi";
//"/^.*=.*(!\$*.$pattern.*)($pattern.*)\$/m";
//$pattern = "/^.*=.*(?!\{.*$pattern.*\}*?)($pattern.*)\$/m";
//((?!\{[^\}]*?)(kudo)(?![^\{]*?}))
//$pattern = "/(.*=.*?)(?:(?!\{[^\}]*?)\b)($search_word)(?:\b(?![^\{]*?\}))\$/mi";
if(preg_match_all($pattern, $content, $matches)){
file_put_contents($filter_file_path, implode("\n", $matches[0]));
}
else{
echo "No matches found";
}
};
Repeatedly match non-bracket characters, or an opening bracket eventually followed by a closing bracket. Try:
^[^=]+=(?:[^{}\n]|{[^}]+})*?respect.*$
^[^=]+ - From the start of the line, match anything but a =
(?:[^{}\n]|{[^}]+})*? - Lazily repeat either:
[^{}\n] - Anything but a {, }, or newline, or
{[^}]+} - A {, followed by non-bracket characters, followed by }
respect - Match the word you're searching for
.*$ - Match the rest of the line
https://regex101.com/r/E8lQx5/1
Note that since { and } are generally not special characters in a regular expression, they don't need to be escaped (unless the {}s could be interpreted as a quantifier, which is not the case here).
If you wanted, you could make it slightly more efficient with an atomic group, to avoid backtracking when the pattern is already sure to fail at that position - use (?> instead of (?:.
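A short sketch of this pattern applied to the sample list from the question, assuming the list is held in a string and the search word is passed through preg_quote() as in the asker's function. The variable names are illustrative only.

```php
<?php
// Sample list from the question; a nowdoc keeps ${...} literal.
$content = <<<'TXT'
global.second = second
global.minute = minute
global.respect = respect
global.Respect = Respect
respect.count = You have # ${global.respect}
give.respect = Get more ${global.respect} by giving others respect.
give.Respect = Get more ${global.Respect} by giving others Respect.
TXT;

$word = 'respect';

// m: ^ and $ match at line boundaries; i: case-insensitive search.
$pattern = '/^[^=]+=(?:[^{}\n]|{[^}]+})*?' . preg_quote($word, '/') . '.*$/mi';

preg_match_all($pattern, $content, $matches);
print_r($matches[0]); // the four lines marked CAPTURE LINE in the question
```

The line respect.count = ... is skipped because the only occurrence of the word after the equals sign sits inside ${...}, which the pattern consumes as one unit.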
I'm not sure if I understand the problem correctly, yet this expression might be an option:
\.([A-Za-z]+)\s*(?==)(?=.*\b\1\b.*).*
Here, we are using a back-reference to capture our desired word; then, if that word exists, we get the entire line.
Demo

preg_match in loop returning impossible results

I'm sure I'm missing something. I know just enough to be dangerous.
In my php code I use file_get_contents() to put a file into a variable.
I then loop through an array and use preg_match to search the same variable many times. The file is a tab-delimited txt file. It does fine 800 times but one time randomly in the middle it does something very odd.
$current = file_get_contents($file);
foreach($blahs as $blah){
$image = 'somefile.jpg';
$pattern = '/https:\/\/www\.example\.com\/media(.*)\/' . preg_quote($image) . '/';
preg_match($pattern, $current, $matches);
echo $matches[0];
}
For some reason, that one time it returns two URLs with a tab between them. When I look at the txt file, the image I'm looking for is listed first, followed by the second image, but echo $matches[0] returns them in reverse order. The text does not exist in the file the way echo $matches[0] returns it. It would be as if you searched the string 'one two' and $matches returned 'two one'.
The (.*) in your pattern is greedy, so the regex engine is trying to do you a favor and capture the longest possible match. The \t tab between the two urls is being matched by the . (dot / any character).
Demonstration: (Link)
$blah='test case: https://www.example.com/media/foo/bar.jpg https://www.example.com/media/cat/fish.jpg some text';
$image = 'fish.jpg';
$your_pattern = '/https:\/\/www\.example\.com\/media(.*)\/'.preg_quote($image).'/';
echo preg_match($your_pattern,$blah,$matches)?$matches[0]:'fail';
echo "\n----\n";
$my_pattern='~https://www\.example\.com/media(?:[^/\s]*/)+'.preg_quote($image).'~';
echo preg_match($my_pattern,$blah,$out)?$out[0]:'fail';
Output:
https://www.example.com/media/foo/bar.jpg https://www.example.com/media/cat/fish.jpg
----
https://www.example.com/media/cat/fish.jpg
To crystallize...
test case: https://www.example.com/media/foo/bar.jpg https://www.example.com/media/cat/fish.jpg some text
// your (.*) is matching ---------------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
My suggested pattern (I may be able to refine the pattern if you provide some sample strings) uses (?:[^/\s]*/)+ instead of the (.*).
My non-capturing group breaks down like this:
(?: #start non-capturing group
[^/\s]* #greedily match zero or more non-slash, non-whitespace characters
/ #match a slash
) #end non-capturing group
+ #allow the group to repeat one or more times
*note1: You can use \t where I use \s if you want to be more literal, I am using \s because a valid url shouldn't contain a space anyhow. You may make this adjustment in your project without any loss of accuracy.
*note2: Notice that I changed the pattern delimiters to ~ so that / doesn't need to be escaped inside the pattern.
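Put together, the lookup could be wrapped in a small helper. This is only a sketch: findImageUrl() is a hypothetical name, the sample data is made up, and the delimiter is passed to preg_quote() so that a ~ inside a file name cannot break the pattern.

```php
<?php
// Hypothetical helper: return the media URL that ends in $image, or null.
function findImageUrl(string $content, string $image): ?string
{
    $pattern = '~https://www\.example\.com/media(?:[^/\s]*/)+'
             . preg_quote($image, '~') . '~';

    // Returning null on failure avoids echoing a stale or undefined $matches[0].
    return preg_match($pattern, $content, $m) ? $m[0] : null;
}

// Made-up tab-delimited line with two URLs next to each other.
$content = "bar.jpg\thttps://www.example.com/media/foo/bar.jpg"
         . "\thttps://www.example.com/media/cat/fish.jpg\n";

echo findImageUrl($content, 'fish.jpg'), "\n";
// https://www.example.com/media/cat/fish.jpg
var_dump(findImageUrl($content, 'missing.jpg')); // NULL
```

Because each path segment is limited to non-slash, non-whitespace characters, the match can no longer run across the tab into the neighbouring URL.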

Replace only specific group in file form preg_replace

I have txt file with content:
fggfhfghfghf
$config['website'] = 'Olpa';
asdasdasdasdasdas
And PHP script for replacing by preg_replace in file:
write_file('tekst.txt', preg_replace('/\$config\[\'website\'] = \'(.*)\';/', 'aaaaaa', file_get_contents('tekst.txt')));
But it doesn't work the way I want it to.
The script replaces the whole match, and after the change the file looks like this:
fggfhfghfghf
aaaaaa
asdasdasdasdasdas
And that's bad.
I don't want to change the whole match $config['website'] = 'Olpa'; but only this Olpa part.
As you can see it belongs not to Group 2. of match information.
And all I want is to just change this Group 2. one specific thing.
so that after the script runs, the file looks like this:
fggfhfghfghf
$config['website'] = 'aaaaaa';
asdasdasdasdasdas
You need to change your preg_replace to
preg_replace('/(\$config\[\'website\'] = \').*?(\';)/', '$1aaaaaa$2', file_get_contents('tekst.txt'))
It means, capture what you need to keep (and then use backreferences to restore the text) and just match what you need to replace.
See the regex demo.
Pattern details:
(\$config\[\'website\'] = \') - Group 1 capturing a literal $config['website'] = ' substring (later referenced to with $1)
.*? - any 0+ chars other than line break chars as few as possible
(\';) - Group 2: a ' followed with ; (later referenced to with $2)
In case your aaa actually starts with a digit, you would need a ${1} backreference.
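To illustrate the digit case, here is a small sketch with a hypothetical new value, 123abc. With a plain $1 the replacement string would contain $1123abc, and PHP would read $11 as a reference to group 11 instead of group 1.

```php
<?php
$txt = "fggfhfghfghf\n\$config['website'] = 'Olpa';\nasdasdasdasdasdas";
$pattern = '/(\$config\[\'website\'] = \').*?(\';)/';

// Hypothetical new value that starts with a digit.
$new = '123abc';

// Ambiguous: the replacement is '$1123abc$2', so "$11" is taken as group 11.
$ambiguous = preg_replace($pattern, '$1' . $new . '$2', $txt);

// Safe: the braces separate the group number from the digits that follow.
$safe = preg_replace($pattern, '${1}' . $new . '${2}', $txt);

echo $safe, "\n";
```

Only the ${1} form keeps the $config['website'] = ' prefix intact here.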
I have a better, faster, leaner solution for you. No capture groups are required, it only requires careful attention to escaping the single quotes:
Pattern: \$config\['website'] = '\K[^']+
\K means "start the fullstring match here", this combined with the negated character class ([^']+) affords the omission of capture groups.
Pattern Demo (just 25 steps)
PHP Implementation:
$txt='fggfhfghfghf
$config[\'website\'] = \'Olpa\';
asdasdasdasdasdas';
print_r(preg_replace('/\$config\[\'website\'\] = \'\K[^\']+/','aaaaaa',$txt));
Using single quotes around the pattern is crucial so that $config isn't interpreted as a variable. As a result, all of the single quotes inside of the pattern must be escaped.
Output:
fggfhfghfghf
$config['website'] = 'aaaaaa';
asdasdasdasdasdas

Replace middle string by passing specific pattern

I want to replace a string in the middle of a text by passing a pattern. I have tried the preg_replace function, but it is not working for me.
$str = "Lead for Nebhub - Admark
Name: Punam Kalbande
Email: kalbandepunam#gmail.com
Phone Number: 800-703-3209
Nebhub Partner : Nebhub - Admark
Address: PO Box 830395 Miami, FL 33173
Hub : Automotive
Products: ERP, CRM, HCM, Help Desk, Marketing";
$pattern = '/^Hub :(.+)Products:$/i';
$replacement = "Logistics";
$result = preg_replace($pattern, $replacement, $str);
But the above code only returns the original string. It does not replace the value with the new one.
The s modifier is missing in your pattern, so . cannot match across line breaks. Furthermore, you want to match the pattern somewhere in the middle of the text, but you used ^ and $. Without the m modifier these match only at the start and the end of the whole string, which means the whole string would have to match. Use this regex, and it will work for you.
/(Hub :)[^\n]+/is
Explanation:
( start subpattern
Hub the word Hub
  followed by a space
: followed by a colon
) end subpattern -> accessible by $1 or \1
[^\n]+ match one or more characters except a line break
i modifier for case-insensitive search
s modifier so that . also matches line breaks (it has no effect here, since this pattern contains no .)
What you have to do now is to output the Subpattern in the Replacement too:
$result = preg_replace($pattern, "$1$replacement", $str);
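As a runnable sketch of the whole replacement, using a shortened copy of the sample text: the extra space after $1 is an addition of mine so that the original "Hub : value" spacing is kept, since [^\n]+ also consumes the space in front of the old value.

```php
<?php
// Shortened sample text from the question.
$str = "Nebhub Partner : Nebhub - Admark\n"
     . "Address: PO Box 830395 Miami, FL 33173\n"
     . "Hub : Automotive\n"
     . "Products: ERP, CRM, HCM, Help Desk, Marketing";

$pattern = '/(Hub :)[^\n]+/i';
$replacement = 'Logistics';

// $1 restores the "Hub :" label; the rest of that line is replaced.
$result = preg_replace($pattern, '$1 ' . $replacement, $str);
echo $result, "\n";
```

Only the Hub line changes; the lines before and after it are left as they were.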

Making a url regex global

I've been searching for a regex to replace plain text url's in a string (the string can contain more than 1 url), by:
url
and I found this:
http://mathiasbynens.be/demo/url-regex
I would like to use the diegoperini's regex (which according to the tests is the best):
_^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:/[^\s]*)?$_iuS
But I want to make it global to replace all the url's in a string.
When I use this:
/_(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:/[^\s]*)?_iuS/g
It does not work. How do I make this regex global, and what do the underscore at the beginning and the "_iuS" at the end mean?
I would like to use it with php so I am using:
preg_replace($regex, '$0', $examplestring);
The underscores are the regex delimiters; the i, u and S are pattern modifiers:
i (PCRE_CASELESS)
If this modifier is set, letters in the pattern match both upper and lower
case letters.
u (PCRE_UTF8)
This modifier turns on additional functionality of PCRE that is incompatible
with Perl. Pattern and subject strings are treated as UTF-8. (Note that this
is the lowercase u; the uppercase U, PCRE_UNGREEDY, is a different modifier
that inverts the greediness of quantifiers.)
S
When a pattern is going to be used several times, it is worth spending more
time analyzing it in order to speed up the time taken for matching. If this
modifier is set, then this extra analysis is performed. At present, studying
a pattern is useful only for non-anchored patterns that do not have a single
fixed starting character.
For more information see http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php
When you added the / ... /g, you added another regex delimiter plus the modifier g, which does not exist in PCRE; that's why it did not work. In PHP, preg_replace() already replaces every match by default, so no global flag is needed.
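A minimal sketch of the delimiter and "global" behaviour. The pattern below is a deliberately simplified stand-in, not the full regex from the question, and the anchor-tag replacement is an assumption about the intended output.

```php
<?php
// Simplified stand-in pattern; "_" is the delimiter, "iu" are the modifiers.
// No ^ or $ anchors, so it can match anywhere in the string.
$regex = '_(?:https?|ftp)://\S+_iu';

$text = 'See http://example.com/a and https://example.org/b for details.';

// preg_replace() replaces all matches by default; $count receives the total.
$linked = preg_replace($regex, '<a href="$0">$0</a>', $text, -1, $count);

echo $linked, "\n";
echo $count, "\n"; // 2
```

The fourth argument (-1) means "no limit", which is also the default; it is spelled out here only so that $count can be passed as the fifth argument.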
I agree with @verdesmarald and used this pattern in the following function:
$string = preg_replace_callback(
"_(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:/[^\s]*)?_iuS",
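// Anonymous function instead of create_function(), which was removed in PHP 8
function ($match) {
// Lowercase the matched URL and strip the scheme and "www." for display
$m = trim(strtolower($match[0]));
$m = str_replace("http://", "", $m);
$m = str_replace("https://", "", $m);
$m = str_replace("ftp://", "", $m);
$m = str_replace("www.", "", $m);
// Shorten long URLs to 25 characters plus an ellipsis
if (strlen($m) > 25)
{
$m = substr($m, 0, 25) . "...";
}
return $m;
}, $string);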
return $string;
It seems to do the trick, and it resolves an issue I was having. As @verdesmarald said, removing the ^ and $ characters allowed the pattern to work even in my preg_replace_callback().
The only thing that concerns me is how efficient the pattern is. If used in a busy, high-traffic web app, could it cause a bottleneck?
UPDATE
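The above regex pattern breaks if there is a trailing dot at the end of the path section of a URL, like so: http://www.mydomain.com/page. (with the final dot). To solve this I modified the final part of the regex pattern by adding ^. to the character class, making it look like so: [^\s^.]. As I read it, this means: do not match whitespace or a dot. (Inside a character class only the first ^ negates; the second one is a literal caret, so the class also excludes ^ itself.)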
In my tests so far it seems to be working fine.
