Prevent duplicate entries with url inputs

I have a form that accepts a URL. The user may enter any of these:
www.stack.com
www.stack.com/overflow
http://www.stack.com
http://www.stack.com/overflow
How can I prevent duplicate entries from being inserted into my database?
I've tried this:
$url = (input url)
$search = str_replace("http://www.", "", $url);
$search = str_replace("http://", "", $url);
$search = str_replace("www.", "", $url);
$search = str_replace("/", "", $url);
In the last step, I wanted to remove the "/" and every character that follows it. How can I do that?

You can use PHP's parse_url() function to do all of the work for you:
$url = ((strpos($url, 'http://') !== 0) && (strpos($url, 'https://') !== 0)) ? 'http://'.$url : $url;
$parsed = parse_url($url);
$host = $parsed['host'];
The first line checks whether the URL already starts with an http:// or https:// scheme. If not, it prepends a default of http://. Without a scheme, parse_url() puts the entire URL in the path index; with one, it parses the host properly.
Alternatively, since you specifically want just the domain name, you can pass the PHP_URL_HOST flag to the function call:
$url = ((strpos($url, 'http://') !== 0) && (strpos($url, 'https://') !== 0)) ? 'http://'.$url : $url;
$host = parse_url($url, PHP_URL_HOST); // this will return just the host-portion.
Normally, you would want to keep the subdomain of a given URL, because a subdomain can differ greatly (and can even be an entirely different website). With www., however, this is generally not the case. Once you have the host from one of the snippets above, you can remove a leading www. with:
$host = preg_replace('/^www\./', '', $host); // anchored, so only a leading www. is removed
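The steps in this answer can be combined into one helper. This is only a minimal sketch: the function name normalize_host() is hypothetical, and lowercasing the host is an added assumption (host names are case-insensitive) rather than something the answer above does.

```php
<?php
// Hypothetical helper: reduce a user-supplied URL to a host key
// that can be compared when checking for duplicates.
function normalize_host(string $url): ?string
{
    $url = trim($url);
    // parse_url() only fills in the host when a scheme is present,
    // so prepend a default scheme if none was given.
    if (!preg_match('#^https?://#i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // nothing usable was entered
    }
    // Assumption: compare hosts case-insensitively and drop only a leading "www.".
    return preg_replace('/^www\./', '', strtolower($host));
}
```

All four inputs from the question reduce to the same key, stack.com. One possible way to block duplicates is to store that key in a column with a UNIQUE index, so the database rejects a second insert.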

The answer by newfurniturey seems to be a very good solution. Before calling parse_url(), check whether http:// is missing from the URL; if it is, prepend http:// to the string, and parse_url() should then work as expected.

For anyone who gets stuck on the same question and lands here, here's the complete code:
if ((strpos($url, 'http://') !== false) || (strpos($url, 'https://') !== false)) {
    // A scheme is present, so parse_url() can extract the host.
    $host = parse_url($url, PHP_URL_HOST);
    if (strpos($url, 'www.') !== false)
        $host = str_replace('www.', '', $host);
    if (strpos($host, '/') !== false) {
        $str = explode("/", $host);
        $host = $str[0];
    }
} else if (strpos($url, 'www.') !== false) {
    // No scheme: strip www. and keep everything before the first slash.
    $host = str_replace('www.', '', $url);
    if (strpos($host, '/') !== false) {
        $str = explode("/", $host);
        $host = $str[0];
    }
} else if (strpos($url, '/') !== false) {
    // No scheme and no www.: keep everything before the first slash.
    $str = explode("/", $url);
    $host = $str[0];
} else {
    $host = $url;
}

Related

parse_url and removing subdomain

I want to strip everything from a URL except the domain, so that http://i.imgur.com/rA81kQf.jpg becomes imgur.com.
$url = 'http://i.imgur.com/rA81kQf.jpg';
$parsedurl = parse_url($url);
$parsedurl = preg_replace('#^www\.(.+\.)#i', '$1', $parsedurl['host']);
// now if a dot exists, grab everything after it. This removes any potential subdomain
$parsedurl = preg_replace("/^(.*?)\.(.*)$/","$2",$parsedurl);
The above works, but I feel like I should only be using one preg_replace for this. Any idea how I might combine the two?

You can use parse_url() to get the desired output like this:
$url = "http://i.imgur.com/rA81kQf.jpg";
$parseData = parse_url($url);
$domain = preg_replace('/^www\./', '', $parseData['host']);
$array = explode(".", $domain);
echo (array_key_exists(count($array) - 2, $array) ? $array[count($array) - 2] : "") . "." . $array[count($array) - 1];
which prints
imgur.com
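The same idea can be written more compactly with array_slice(). This is a sketch only, and the function name last_two_labels() is hypothetical. Like the answer above, it simply keeps the last two dot-separated labels, so a host such as www.example.co.uk would come back as co.uk; handling multi-part suffixes correctly would need a public suffix list, which is outside this sketch.

```php
<?php
// Hypothetical helper: keep only the last two labels of the host.
// Caveat: this is a guess at the registrable domain and is wrong
// for multi-part suffixes such as .co.uk.
function last_two_labels(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // no host, e.g. the URL had no scheme
    }
    $labels = explode('.', strtolower($host));
    // array_slice() with a negative offset returns the whole array
    // when there are fewer than two labels.
    return implode('.', array_slice($labels, -2));
}
```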

Get the first directory from URL with PHP

I have a URL in a variable like this:
$url = 'http://mydomain.com/yep/2014-04-01/some-title';
What I want is to parse the 'yep' part out of it. I tried it like this:
$url_folder = strpos(substr($url,1), "/");
But it returns a number for some reason. What am I doing wrong?

Use explode(). Try this:
$url = 'http://mydomain.com/yep/2014-04-01/some-title';
$urlParts = explode('/', str_ireplace(array('http://', 'https://'), '', $url));
echo $urlParts[1];

Well, first of all, substr(..., 1) returns everything from position 1 onward, i.e. the string without its first character. That's not what you want.
So http://mydomain.com/yep/2014-04-01/some-title becomes ttp://mydomain.com/yep/2014-04-01/some-title
Then you call strpos() on that shortened string, looking for the first / (which is the first / in ttp://mydomain.com/yep/2014-04-01/some-title). strpos() returns the position (a number) of the match, so it returns 4.
Use explode() instead:
$parts = explode('/', $url);
echo $parts[3]; // yep
// $parts[0] = "http:"
// $parts[1] = ""
// $parts[2] = "mydomain.com"
// $parts[3] = "yep"
// $parts[4] = "2014-04-01"
// $parts[5] = "some-title"

An efficient solution is the strtok() function:
strtok($path, '/');
So the complete code would be:
$dir = strtok(parse_url($url, PHP_URL_PATH), '/');
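For illustration, here is the strtok() approach wrapped with a couple of guards. It is a sketch under assumptions: the function name first_path_segment() is hypothetical, and returning null when the URL has no path segment is a design choice, not part of the answer above.

```php
<?php
// Hypothetical helper: return the first path segment of a URL, or null.
function first_path_segment(string $url): ?string
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null; // URL has no path at all (or could not be parsed)
    }
    // strtok() skips leading delimiters, so "/yep/2014-04-01" yields "yep".
    $segment = strtok($path, '/');
    return $segment === false ? null : $segment;
}

echo first_path_segment('http://mydomain.com/yep/2014-04-01/some-title'); // yep
```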

Use the parse_url() function:
$url = 'http://mydomain.com/yep/2014-04-01/some-title';
$url_array = parse_url($url);
preg_match('#/(?<path>[^/]+)#', $url_array['path'], $m);
$url_folder = $m['path'];
echo $url_folder;

How to add www. to urls in text file

I've got a text file containing a lot of URLs. Some of the URLs start with www. or http://, and some of them start with neither.
I want to add www. in front of every line in the text file where the URL does not start with www. or http://.
$lines = file("sites.txt");
foreach($lines as $line) {
if(substr($line, 0, 3) != "www" && substr($line, 0, 7) != "http://" ) {
}
}
That's the code I have right now. I know it's not much, but I have no clue how to add www. in front of every unmatched line.

This adds www. if it is not present, and it also works when the line already starts with http:// or https://.
$url = preg_replace("#^(?:http(s)?://)?(?:www\.)?#", "http\\1://www.", $url);
This regex works on the following:
domain.ext -> http://www.domain.ext
www.domain.ext -> http://www.domain.ext
http://www.domain.ext -> http://www.domain.ext
https://domain.ext -> https://www.domain.ext (note the httpS)
https://www.domain.ext -> https://www.domain.ext (note the httpS)
Regex explained:
^(?:http(s)?://)? -> The scheme might not be there, and neither might the s of https; capture the s in case it is.
(?:www\.)? -> The www. might not be there. Don't capture it (?:), since we add it anyway.
Then we use \\1 in the replacement so that https keeps its s when present.
Also, the fixed-length substr() checks from the question fail on https://, because it is one character longer than http://.
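To check the behaviour listed above, here is a small sketch that applies such a replacement to one URL at a time. The function name add_www() is hypothetical, and the pattern is an assumed variant in which the scheme is optional and the match is anchored at the start of the string, so that scheme-less input is covered too.

```php
<?php
// Hypothetical helper: make sure a URL starts with http(s)://www.
function add_www(string $url): string
{
    // ^(?:http(s)?://)?  optional scheme; the "s" is captured as $1
    // (?:www\.)?         optional existing www., not captured
    // The limit of 1 keeps this to a single replacement at the start.
    return preg_replace('#^(?:http(s)?://)?(?:www\.)?#i', 'http$1://www.', trim($url), 1);
}

foreach (['domain.ext', 'www.domain.ext', 'https://domain.ext'] as $url) {
    echo add_www($url), PHP_EOL;
}
```

The trim() call is there because lines read with file() keep their trailing newline unless FILE_IGNORE_NEW_LINES is passed.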

The trick is to iterate over $lines by reference so that you can alter them:
foreach ($lines as &$line) { // note the '&'
    $line = trim($line); // file() keeps the trailing newline
    if (stripos($line, 'http://www.') === 0) {
        // nothing is missing
        continue;
    }
    if (stripos($line, 'www.') === 0) {
        // only http:// is missing
        $line = 'http://' . $line;
    } elseif (stripos($line, 'http://') === 0) {
        // only www. is missing
        $line = 'http://www.' . substr($line, 7);
    } else {
        // http:// and www. are both missing
        $line = 'http://www.' . $line;
    }
}
unset($line); // break the reference left over from the loop
Note:
Simply adding www. to a non-www domain can be wrong, because www.example.com and example.com can have completely different content, servers, destinations, and DNS mappings. It's fine to add http://, but not necessarily www.
To write the new array back to the file, you'd use:
file_put_contents('sites.txt', implode(PHP_EOL, $lines));

$lines = file("/var/www/vhosts/mon.totalinternetgroup.nl/public/sites/sites.txt", FILE_IGNORE_NEW_LINES);
$new_lines = array();
foreach ($lines as $line) {
    // Prepend www. only when the line starts with neither "www" nor "http://".
    if (substr($line, 0, 3) != "www" && substr($line, 0, 7) != "http://") {
        $new_lines[] = "www." . $line;
    } else {
        $new_lines[] = $line;
    }
}
$content = implode("\n", $new_lines);
file_put_contents("/var/www/vhosts/mon.totalinternetgroup.nl/public/sites/sites.txt", $content);

Use this; it needs only three lines:
<?php
$g0 = file_get_contents("site");
#--------------------------------------------------
$g1 = preg_replace("#^http://#m","",$g0);
$g2 = preg_replace("/^www\./m","",$g1);
$g3 = preg_replace("/^/m","http://",$g2);
#--------------------------------------------------
file_put_contents("site2",$g3);
?>
input file:
1.com
www.d.som
http://ss.com
http://www.ss.com
output file:
http://1.com
http://d.som
http://ss.com
http://ss.com

Don't print last segment of URL

I have some PHP which prints a URL. Can I use PHP to leave off the last segment?
So this:
www.mysite.com/name/james
would become this:
www.mysite.com/name
I'm using ExpressionEngine, so the code is just {site_url}.

$url = (substr($url, -1) == '/') ? substr($url, 0, -1) : $url; // remove trailing slash if present
$urlparts = explode('/', $url); // explode on slash
array_pop($urlparts); // remove last part
$url = implode('/', $urlparts); // put it back together (separator first; the reversed order fails on PHP 8)
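A shorter variant of the same idea uses strrpos() to cut at the last slash. This is a sketch, with a hypothetical function name, and it assumes scheme-less input like the example in the question: given a bare http://host with no path, it would cut inside the scheme.

```php
<?php
// Hypothetical helper: drop the last path segment of a scheme-less URL.
function drop_last_segment(string $url): string
{
    $url = rtrim($url, '/');      // ignore any trailing slashes
    $pos = strrpos($url, '/');    // position of the last remaining slash
    return $pos === false ? $url : substr($url, 0, $pos);
}

echo drop_last_segment('www.mysite.com/name/james'); // www.mysite.com/name
```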

Using PHP to find part of a URL

Take this domain:
http://www.?.co.uk/elderly-care-advocacy/mental-capacity-act-advance-medical-directive.html
How could I use PHP to find everything between the first and second slash of the path, regardless of whether it changes or not?
I.e. elderly-care-advocacy
Any help would be greatly appreciated.

//strip the "http://" part. Note: Doesn't work for HTTPS!
$url = substr("http://www.example.com/elderly-care-advocacy/mental-capacity-act-advance-medical-directive.html", 7);
// split the URL in parts
$parts = explode("/", $url);
// The second part (offset 1) is the part we look for
if (count($parts) > 1) {
$segment = $parts[1];
} else {
throw new Exception("Full URLs please!");
}

$url = "http://www.example.co.uk/elderly-care-advocacy/mental-capacity-act-advance-medical-directive.html";
$parts = parse_url($url);
$host = $parts['host'];
$path = $parts['path'];
$items = preg_split('/\//', $path, -1, PREG_SPLIT_NO_EMPTY); // -1 means no limit; passing null is deprecated since PHP 8.1
$firstPart = $items[0];
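If every segment is of interest rather than just the first, the same parse-then-split approach can return them all. A minimal sketch, assuming a hypothetical function name path_segments() and an empty array as the result for URLs without a path:

```php
<?php
// Hypothetical helper: return all non-empty path segments of a URL.
function path_segments(string $url): array
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return []; // no path component present
    }
    // explode() leaves an empty string before the leading slash;
    // filtering on strlen drops empties but keeps a segment like "0".
    return array_values(array_filter(explode('/', $path), 'strlen'));
}

$segments = path_segments('https://www.example.co.uk/elderly-care-advocacy/mental-capacity-act-advance-medical-directive.html');
echo $segments[0]; // elderly-care-advocacy
```

Because parse_url() handles the scheme, this works for https:// URLs as well.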

Off the top of my head:
$url = 'http://www.example.co.uk/elderly-care-advocacy/mental-capacity-act-advance-medical-directive.html';
$urlParts = parse_url($url); // an associative array
$segments = explode('/', ltrim($urlParts['path'], '/'));
$target_string = $segments[0]; // 'elderly-care-advocacy'
Cheers

explode('/', $a);

All you need to do is parse the URL first, then explode the path and take the first part. With some sanity checks, that would look like the following:
// "example" replaces the "?" placeholder, which parse_url() would read as the start of a query string.
$url = 'http://www.example.co.uk/elderly-care-advocacy/mental-capacity-act-advance-medical-directive.html';
$url_parts = parse_url($url);
if (isset($url_parts['path'])) {
    // Drop the leading slash so that index 0 is the first real component.
    $path_components = explode('/', ltrim($url_parts['path'], '/'));
    if (count($path_components) > 1) {
        // All is OK. The path's first component is in $path_components[0]
    } else {
        // Throw an error, since there is no directory specified in the path
        // Or you could assume that $path_components[0] is the actual path
    }
} else {
    // Throw an error, since no path component was found
}

I was surprised too, but this works.
$url='http://www.?.co.uk/elderly-care-advocacy/...';
$result=explode('/',$url)[3];

I think a regular expression should be fine for that.
Try, for example, #/([^/]+)# with preg_match_all(); in your example it should give you /elderly-care-advocacy as the second match (index 1).
(The first match is /www.?.co.uk.)

parse_url() is your best option. It breaks the URL string down into components, which you can selectively query.
This function could be used:
function extract_domain($url, $prefix = 'www.', $suffix = '.co.uk') {
    $url_parts = parse_url($url);
    if ($url_parts && isset($url_parts['host'])) {
        $host = $url_parts['host'];
        $host = str_replace($prefix, '', $host);
        $host = str_replace($suffix, '', $host);
        return $host;
    }
    return false;
}
// $_SERVER['REQUEST_URI'] holds only the path and query string, so build a full URL first.
$host_component = extract_domain('http://' . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI']);
