Converting incorrect url to correct url - php

I am looking for a function that can convert domain.com into http://domain.com/.
Should I do this with a regex, or is there a built-in PHP function that can handle this?
I have a bunch of website addresses saved in MySQL like this:
domain.com
www.domain.com
http://domain.com
I would like to convert all of those to http://domain.com, and I am looking for a reliable way to do this so I don't corrupt the website addresses.

I fixed it like this:
$url = 'domain.com';
if (strpos($url, '://') === false)
$url = 'http://' . $url;
echo $url;
based on: Validate url and convert into protocol format

You could do something like this:
$string = "http://www.domain.com";
echo url_fix($string); // http://domain.com
function url_fix($str)
{
    // Strip the scheme: "http://www.domain.com" becomes "www.domain.com"
    $str = str_replace(array("http://", "https://"), "", $str);
    // Strip a leading "www.": "www.domain.com" becomes "domain.com"
    if (strpos($str, 'www.') === 0) {
        $str = substr($str, 4);
    }
    // Prepend the scheme: "domain.com" becomes "http://domain.com"
    return "http://" . $str;
}
Instead of checking for both http:// and www. with a fancy regex, you can strip both prefixes (if present) and then prepend http:// to what remains.
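The strip-then-prepend idea can also be sketched with anchored patterns, so that only a prefix at the very start of the string is removed. This is a minimal sketch, not either answerer's code: the name normalize_url is hypothetical, and it assumes every stored value should end up on http:// (an https:// address is downgraded) with no trailing slash.

```php
<?php
// Hypothetical helper: normalize_url() is an illustrative name.
function normalize_url(string $url): string
{
    $url = trim($url);
    // Drop a leading http:// or https:// (case-insensitive).
    $url = preg_replace('#^https?://#i', '', $url);
    // Drop "www." only when it is at the start of the host.
    $url = preg_replace('#^www\.#i', '', $url);
    // Remove trailing slashes so all variants compare equal.
    $url = rtrim($url, '/');
    return 'http://' . $url;
}

foreach (['domain.com', 'www.domain.com', 'http://domain.com'] as $stored) {
    echo normalize_url($stored), "\n"; // each line prints http://domain.com
}
```

Anchoring with ^ keeps a "www." or "http://" that appears later in the string (for example inside a query parameter) untouched.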

Related

Unicode characters causing 404 error in file_get_contents()

I have an app that visits URLs automatically by following links. It works well as long as the URL doesn't contain Unicode.
For example, I have a link:
Kraków
The link contains the plain ó character in the source. When I try to do:
$href = $crawler->filter('a')->attr('href');
$html = file_get_contents($href);
It returns a 404 error. If I visit that URL in the browser, it works, because the browser replaces ó with %C3%B3.
What should I do to make it possible to visit that URL via file_get_contents()?
rawurlencode can be used to encode individual URL parts (it is a better fit for paths than urlencode, which turns spaces into +). The following snippet takes the path /catalog/kraków/list.html and encodes each segment (catalog, kraków and list.html) separately rather than the entire URL, so the slashes in the path are preserved.
Check out the following solution:
function encodeUri($uri){
    $urlParts = parse_url($uri);
    // Encode each path segment on its own; leave segments that already contain percent-escapes alone
    $path = implode('/', array_map(function($pathPart){
        return strpos($pathPart, '%') !== false ? $pathPart : rawurlencode($pathPart);
    }, explode('/', $urlParts['path'] ?? '')));
    $query = array_key_exists('query', $urlParts) ? '?' . $urlParts['query'] : '';
    return $urlParts['scheme'] . '://' . $urlParts['host'] . $path . $query;
}
$href = $crawler->filter('a')->attr('href');
// encodeUri() returns e.g. https://example.com/catalog/krak%C3%B3w/list.html
$html = file_get_contents(encodeUri($href));
parse_url docs: https://www.php.net/manual/en/function.parse-url.php

remove http and https from string in php

I have several urls like,
https://example.com/
http://example.com/
I only want "example.com" as string
And I want to remove the
https:// and http://
So I have taken array like this,
$removeChar = ["https://", "http://", "/"];
What is the proper way to remove these?
This worked for me,
$http_referer = str_replace($removeChar, "", "https://example.com/");
Use PHP's parse_url function.
Link: http://php.net/manual/en/function.parse-url.php (parse_url PHP function)
$url = "http://example.com/";
$domain = parse_url($url, PHP_URL_HOST);
This gives the result example.com.
There is a built-in function:
$domain = parse_url($url, PHP_URL_HOST);
Try this:
$string = "https://example.com/"; // your URL
$removeChar = array("http://", "https://", "/");
foreach ($removeChar as $char)
{
    $string = str_replace($char, "", $string);
}
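One caveat with the parse_url answers: parse_url only reports a host when the string has a scheme, so a bare example.com/path is treated as a path. A minimal sketch of a wrapper that copes with both forms follows. The name host_of is hypothetical, and defaulting to http:// for scheme-less input is an assumption.

```php
<?php
// Hypothetical helper: host_of() is an illustrative name, not a PHP built-in.
function host_of(string $url): ?string
{
    // Assumption: input without a scheme is a web address, so default to http://.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    // parse_url() yields null (or false) when no host can be extracted.
    return is_string($host) ? $host : null;
}

echo host_of('https://example.com/'), "\n";  // example.com
echo host_of('example.com/some/path'), "\n"; // example.com
```

Unlike removing every "/" with str_replace, this leaves the path out of the result instead of gluing it onto the host.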

How can I know if it's a absolute domain name with PHP

I am getting a link and in it there is a href... I want to know if it's a
http://[...].com/file.txt absolute domain name
or
/file.txt
a link that does not have the full URL.
How can I do this with PHP?
Use parse_url and see if you get a scheme and host. For example, this:
$url = 'http://username:password@hostname/path?arg=value#anchor';
$parts = parse_url($url);
echo $url, "\n", $parts['scheme'], "\n", $parts['host'], "\n\n";
$url = '/path?arg=value#anchor';
$parts = parse_url($url);
echo $url, "\n", $parts['scheme'] ?? '', "\n", $parts['host'] ?? '', "\n\n";
Produces:
http://username:password@hostname/path?arg=value#anchor
http
hostname
/path?arg=value#anchor
Live example: http://ideone.com/S9WR2
This also allows you to check the scheme to see if it is something you want (e.g. you'd often want to ignore mailto: URLs).
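The check described above can be wrapped in a small predicate. This is a minimal sketch: the name is_absolute_url is hypothetical, and "absolute" is assumed to mean that both a scheme and a host are present.

```php
<?php
// Hypothetical helper: treats a URL as absolute when it has both a scheme and a host.
function is_absolute_url(string $url): bool
{
    $parts = parse_url($url);
    // parse_url() returns false for seriously malformed URLs.
    if ($parts === false) {
        return false;
    }
    return isset($parts['scheme'], $parts['host']);
}

var_dump(is_absolute_url('http://example.com/file.txt')); // bool(true)
var_dump(is_absolute_url('/file.txt'));                   // bool(false)
var_dump(is_absolute_url('mailto:someone@example.com'));  // bool(false)
```

Requiring a host as well as a scheme is what filters out mailto: links, which have a scheme but no host.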
You can use preg_match with a regexp to test the format of the url string. You'll need to modify $myurl as necessary, probably passing it in as a variable.
<?php
$myurl = "http://asdf.com/file.txt"; // change this to meet your needs
if (preg_match("/^https?:/", $myurl)) { // matches both http: and https:
// code to handle
echo 'http url';
}
else if (preg_match("/^\\//", $myurl)) {
// code to handle
echo 'slash url';
}
else {
// unexpected format
echo 'unexpected url';
}
?>

How to find subdomain from a url

URL = http://company.website.com/pages/users/add/
How do I find the subdomain from this via PHP,
Such that $subdomain = 'company'
And $url = '/pages/users/add/'
You'll want to take a look at PHP's parse_url. This will give you the basic components of the URL which will make it easier to parse out the rest of your requirements (the subdomain)
$url = 'http://company.website.com/pages/users/add/';
$url_parsed = parse_url($url);
$path = $url_parsed['path']; // "/pages/users/add/"
And then a simple regex* to parse $url_parsed['host'] for subdomains:
preg_match('/(?:(.+)\.)?[^\.]+\.[^\.]+/i', $url_parsed['host'], $matches);
// $matches is array("company.website.com", "company")
$subdomain = $matches[1] ?? null; // "company"
* I tested the regex in JavaScript, so you may need to tweak it a little.
Or to avoid the regex:
$sections = explode('.', $url_parsed["host"]);
$subdomain = $sections[0];

Is there a built in function to retrieve domain name of url in PHP?

If the url is http://www.google.com/url?sa=t&source=web&ct=res&cd=1&ved=0CAsQFjAA&url=https%3A%2F%2Fwww.google.com%2Fadsense%2F&ei=1AdLS5HSI4yQ6APt6Ly-BQ&usg=AFQjCNHKn8TzGhRO1eUfLhB79AVU-_FnGQ&sig2=EGlbrGQ3jTQdTViEt14cYg,
I need the result to be: google.com
You can use parse_url to do it like so:
$host = parse_url('http://....', PHP_URL_HOST);
$host_parts = explode('.', $host);
$domain = $host_parts[count($host_parts)-2].'.'.$host_parts[count($host_parts)-1];
That'll do it. Polish it off as you see fit.
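As one way of polishing it, here is a minimal sketch that guards against a missing host and against hosts with fewer than two labels. The name last_two_labels is hypothetical, and the approach still assumes a single-label TLD, so a host such as example.co.uk would come back as co.uk.

```php
<?php
// Hypothetical helper: keeps only the last two labels of the host.
// Assumption: the TLD is a single label; "example.co.uk" would yield "co.uk".
function last_two_labels(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // no host could be parsed (e.g. a relative URL)
    }
    $labels = explode('.', $host);
    // array_slice() copes with hosts that have fewer than two labels.
    return implode('.', array_slice($labels, -2));
}

echo last_two_labels('http://www.google.com/url?sa=t&source=web'), "\n"; // google.com
```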
