Codeigniter and preg_replace - php

I use Codeigniter to create a multilingual website and everything works fine, but when I try to use the "alternative languages helper" by Luis I've got a problem. This helper uses a regular expression to replace the current language with the new one:
$new_uri = preg_replace('/^'.$actual_lang.'/', $lang, $uri);
The problem is that I have a URL like this: http://www.example.com/en/language/english/ and I want to replace only the first "en" without changing the word "english". I tried to use the limit for preg_replace:
$new_uri = preg_replace('/^'.$actual_lang.'/', $lang, $uri, 1);
but this doesn't work for me. Any ideas?

You could do something like this:
$regex = '#^'.preg_quote($actual_lang, '#').'(?=/|$)#';
$new_uri = preg_replace($regex, $lang, $uri);
The last capture pattern basically means "only match if the next character is a forward slash or the end of the string"...
Edit:
If the code you always want to replace is at the beginning of the path, you could always do:
if (stripos($url, $actual_lang) !== false) {
if (strpos($url, '://') !== false) {
$path = parse_url($url, PHP_URL_PATH);
} else {
$path = $url;
}
list($language, $rest) = explode('/', $path, 2);
if ($language == $actual_lang) {
$url = str_replace($path, $lang . '/' . $rest, $url);
}
}
It's a bit more code, but it should be fairly robust. You could always build a class to do this for you (by parsing, replacing and then rebuilding the URL)...

If you know what the beginning of the URL will always, be, use it in the regex!
$domain = "http://www.example.com/"
$regex = '#(?<=^' . preg_quote($domain, '#') . ')' . preg_quote($actual_lang, '#') . '\b#';
$new_uri = preg_replace($regex, $lang, $uri);
In the case of your example, the regular expression would become #(?<=^http://www.example.com/)en\b which would match en only if it followed the specified beginning of a domain ((?<=...) in a regular expression specifies a positive lookbehind) and is followed by a word boundary (so english wouldn't match).

Related

php regex preg_replace_callback

I have some inherited code whose purpose is to identify urls in a string an prepend the http:// protocol onto them if it doesn't exist.
return preg_replace_callback(
'/((https?:\/\/)?\w+(\.\w{2,})+[\w?&%=+\/]+)/i',
function ($match) {
if (stripos($match[1], 'http://') !== 0 && stripos($match[1], 'https://') !== 0) {
$match[1] = 'http://' . $match[1];
}
return $match[1];
},
$string);
It's working, except when a domain has a hyphen it. So, for-instance, the following string will only partially work.
$string = "In front mfever.com/1 middle http://mf-ever.com/2 at the end";
Can any regex genius see what's wrong with it?
You just need to add the optional dash:
((https?:\/\/)?\w+\-?\w+(\.\w{2,})+[\w?&%=+\/]+)
See it work here https://regex101.com/r/Tkdapj/1

Remove the last character using strrchr and substr in PHP

User may enter abc.com/ instead of abc.com, so I want to do validation using strchr.
This works but looks strange
if(strrchr($url, "/") == "/"){
$url = substr($url, 0,-1);
echo $url;
}
Is there a better way of doing this?
Yes - using the optional second argument to trim() or rtrim() you can specify a character list to trim off the end of a string.:
$url = rtrim($url, '/');
If the trailing / is present, it will be stripped and the string returned without it. If not present, the string will be returned in its original form.
// URL with trailing /
$url1 = 'example.com/';
echo rtrim($url1, '/');
// example.com
// URL without trailing
$url2 = 'example.com';
echo rtrim($url2, '/');
// example.com
// URL with many trailing /
$url3 = 'example.com/////';
echo rtrim($url3, '/');
// example.com
Add additional characters into the string with '/' if you want to strip whitespace as well, as in rtrim($url, ' /')
If you merely want to test if the last character of the URL was /, you may do so with substr()
if (substr($url, -1) === '/') {
// true
}

greek url conversion and trim unwated numbers and symbols

This problem is little complicated since i'm newbee to php encoding.
My site uses utf-8 encoding.
After a lot of tests, i found some solution. I use this kind of code:
function chr_conv($str)
{
$a=array with pattern('%CE%B2','%CE%B3','%CE%B4','%CE%B5' etc..);
$b=array with replacement characters(a,b,c,d, etc...);
return str_replace($a, $b2, $str);
}
function replace_old($str)
{
$a1 = array ('index.php','/http://' etc...);
$a2 = array with replacement characters('','' etc...);
return str_replace($a1, $a2, $str);
}
function sanitize($url)
{
$url= replace_old(replace_old($url));
$url = strtolower($url);
$url = preg_replace('/[0-9]/', '', $url);
$url = preg_replace('/[?]/', '', $url);
$url = substr($url,1);
return $url;
}
function wbz404_process404()
{
$options = wbz404_getOptions();
$urlRequest = $_SERVER['REQUEST_URI'];
$url = chr_conv($urlRequest);
$requestedURL = replace_old(replace_old($url));
$requestedURL .= wbz404_SortQuery($urlParts);
//Get URL data if it's already in our database
$redirect = wbz404_loadRedirectData($requestedURL);
echo sanitize($requestedURL);
echo "</br>";
echo $requestedURL;
echo "</br>";
}
When incoming url is:
/content.php?147-%CE%A8%CE%AC%CF%81%CE%B9-%CE%BC%CE%B5-%CF%80%CF%81%CE%AC%CF%83%CE%B1%28%CE%A7%CE%BF%CF%8D%CE%BC%CF%80%CE%BB%CE%B9%CE%BA%29";
I get:
/content.php?147-psari-me-prasa-choumplik
I want only:
/psari-me-prasa-choumplik
without the content.php?147- before URL.
BUT the most important problem is that I get ENDLESS LOOP instead of correct URL.
What am i doing wrong?
Have in mind that .htaccess solution won't work since i have a lighttpd server, not Apache.
If you need
I am assuming it's not always ?147- that you need to skip. But always after the first hyphen. In which case, before the echo add the following:
$requestedURL = substr($requestedURL, strrpos( $requestedURL , '-') +1 );
This will search for the position of the first hyphen and return that, add one so you skip the hyphen itself, and use that to cut the $requestedURL string up after the hyphen to the end of the string.
If it's always /content.php?127- then replace strrpos( $requestedURL , '-') +1 with the number 17.

How to filter URLs that contain white space with preg match?

I parse through a text that contains several links. Some of them contain white spaces but have a file ending. My current pattern is:
preg_match_all('#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#', $links, $match);
This works the same way:
preg_match_all('/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/', $links, $match);
I don't know much about the patterns and didn't find a good tutorial that explains the meaning of all possible patterns and shows examples.
How could I filter an URL like this:
http://my-url.com/my doc.doc or even http://my-url.com/my doc with more white spaces.doc
The \s in that preg_match_all functions stands for a white space. But how could I check if there is a file ending behind one or some white spaces?
Is it possible?
Why not just make use of PHP's FILTER functions. ?
<?php
$url = "http://my-url.com/my doc.doc";
if(!filter_var($url, FILTER_VALIDATE_URL))
{
echo "URL is not valid";
}
else
{
echo "URL is valid";
}
OUTPUT :
URL is not valid
this might be what you are looking for which uses urlencode
$file = "my doc with more white spaces.doc";
echo " http://my-url.com/" . urlencode($file);
which produces:
http://my-url.com/my+doc+with+more+white+spaces.doc
or with rawurlencode
produces:
http://my-url.com/my%20doc%20with%20more%20white%20spaces.doc
EDIT: Something like the following might help to parse your urls with parse_url
DEMO
$url = 'http://my-url.com/my doc with more white spaces.doc';
$purl = parse_url($url);
$rurl = "";
if(isset($purl['scheme'])){
$rurl .= $purl['scheme'] . "://";
}
if(isset($purl['host'], $purl['path'])){
$rurl .= $purl['host'] . rawurlencode($purl['path']);
}
if($rurl === ""){
$rurl = $url;#error parsing error/invalid url?
}
for sub directories you can do
$purl['path'] = implode('/', array_map(function($value){return rawurlencode($value);}, explode('/', $purl['path'])));
I don't know much about php but this regex
(http|ftp)(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?
will match every url even with spaces
I think this regex will do.
use this regex
preg_match_all("/^(?si)(?>\s*)(((?>https?:\/\/(?>www\.)?)?(?=[\.-a-z0-9]{2,253}(?>$|\/|\?|\s))[a-z0-9][a-z0-9-]{1,62}(?>\.[a-z0-9][a-z0-9-]{1,62})+)(?>(?>\/|\?).*)?)?(?>\s*)$/", $input_lines, $output_array);
Demo
Alright after doing this really helpful tutorial I finally know how the regex syntax works. After finishing it I experimented a bit on this site
It was pretty easy after figuring out that all hyperlinks in my parsed document were in between quotation marks so I just had to change the regex to:
preg_match_all('#\bhttps?://[^()<>"]+#', $links, $match);
so that after the " it is looking for the next match that begins with http.
But that's not the full solution yet. The user Class was right - without rawurlencode the filenames it won't work.
So the next step was this:
function endsWith($haystack, $needle)
{
return $needle === "" || substr($haystack, -strlen($needle)) === $needle;
}
if(endsWith($textlink, ".doc") || endsWith($textlink, ".docx") || endsWith($textlink, ".pdf") || endsWith($textlink, ".jpg") || endsWith($textlink, ".jpeg") || endsWith($textlink, ".png")){
$file = substr( $textlink, strrpos( $textlink, '/' )+1 );
$rest_url=substr($textlink, 0, strrpos($textlink, '/' )+1 );
$textlink=$rest_url.rawurlencode($file);
}
That filters the filenames from the URLs and rawurlencodes them so that the the output links are correct.
I think this should work:
$url = '...';
$url_new = '';
$array = explode(' ',$url);
foreach($array as $name => $val){
if ($val!=' '){
$url_new = $url_new.$val;
}
}

PHP regex modification

I'm using an old Joomla! plugin (I know, first mistake). It does some URL replacement through regex. Here is the code:
$row->text = preg_replace_callback('#href=("|\')(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)("|\')#', 'replace_links', $row->text);
The problem is that it breaks with URLs that have a hyphen in them. Any help on how I can modify it to accept hyphens would be great.
It could also be the replace_links function that breaks:
function replace_links($matches) {
$match = $matches[0];
$array = array('href=',"'", '"');
$match = str_replace($array, '',$match);
if (strpos($match, JURI::root())) {
return $matches[0];
} else {
$plugin =& JPluginHelper::getPlugin('content', 'linkdisclaimer');
$pluginParams = new JParameter( $plugin->params );
$id = $pluginParams->get('disclaimerPage');
$match = "href=\"javascript:linkDisclaimer('".rawurlencode($match)."', '".$id."');\"";
return $match;
}
}
I tried this in a regex tester and it doesn't match urls with a - in them, so I'm guessing it's the regex. Try adding a - character into the regex like so href=("|\')(https?://([-\w\.]+)+(:\d+)?(/([\w-/_\.]*(\?\S+)?)?)?)("|\'). This should allow - in the path segment after the domain. The full replacement would be like
$row->text = preg_replace_callback('#href=("|\')(https?://([-\w\.]+)+(:\d+)?(/([\w-/_\.]*(\?\S+)?)?)?)("|\')#', 'replace_links', $row->text);

Categories