I need to detect whether a provided URL matches the one currently navigated to. Mind you, the following are all valid, yet semantically equivalent, URLs:
https://www.example.com/path/to/page/index.php?parameter=value
https://www.example.com/path/to/page/index.php
https://www.example.com/path/to/page/
https://www.example.com/path/to/page
http://www.example.com/path/to/page
//www.example.com/path/to/page
//www/path/to/page
../../../path/to/page
../../to/page
../page
./
The final function must return true if the given URL points back to the current page, or false if it does not. I do not have a list of expected URLs; this will be used for a client who just wants links to be disabled when they link to the current page. Note that I wish to ignore parameters, as these do not indicate the current page on this site. I got as far as using the following regex:
/^((https?:)?\/\/www(\.example\.com)\/path\/to\/page\/?(index.php)?(\?.+=.*(\&.+=.*)*)?)|(\.\/)$/i
where https?, www, \.example\.com, \/path\/to\/page, and index.php are dynamically detected with $_SERVER["PHP_SELF"] and made into regex form, but that doesn't match the relative URLs like ../../to/page.
EDIT: I got a bit farther with the regex: refiddle.com/gv8
now I'd just need PHP to dynamically create the regex for any given page.
First off, there is no way to predict the total list of valid URLs that will result in display of the current page, since you can't predict (or control) external links that might link back to the page. What if someone uses TinyURL or bit.ly? A regex will not cut the mustard.
If what you need is to ensure that a link does not result in the same page, then you need to TEST it. Here's a basic concept:
Every page has a unique ID. Call it a serial number. It should be persistent. The serial number should be embedded somewhere predictable (though perhaps invisibly) within the page.
As the page is created, your PHP will need to walk through all the links for each page, visit each one, and determine whether the link resolves to a page with a serial number that matches the calling page's serial number.
If the serial number does not match, display the link as a link. Otherwise, display something else.
Obviously, this will be an arduous, resource-intensive process for page production. You really don't want to solve your problem this way.
With your "ultimate goal" comment in mind, I suspect your best approach is to be approximate. Here are some strategies...
First option is also the simplest. If you're building a content management system that USUALLY creates links in one format, just support that format. Wikipedia's approach works because a [[link]] is something THEY generate, so THEY know how it's formatted.
Second is more the direction you've gone with your question. The elements of a URL are "protocol", "host", "path" and "query string". You can break them out into a regex, and possibly get it right. You've already stated that you intend to ignore the query string. So ... start with '((https?:)?//(www\.)?example\.com)?' . $_SERVER['SCRIPT_NAME'] and add endings to suit. Other answers are already helping you with this.
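For instance, a rough sketch of that second option (using preg_quote() to escape the pieces; HTTP_HOST and SCRIPT_NAME are the only server variables assumed, relative URLs are deliberately left to the next option, and any query string is simply allowed and ignored):
// Build a pattern matching absolute and root-relative forms of the current page.
// A sketch only; adjust the endings to suit.
function buildCurrentPageRegex() {
    $dir  = rtrim(dirname($_SERVER['SCRIPT_NAME']), '/');   // e.g. /path/to/page
    $file = basename($_SERVER['SCRIPT_NAME']);              // e.g. index.php
    $host = preg_quote($_SERVER['HTTP_HOST'], '~');
    // For a directory index, ".../", ".../index.php" and "..." should all match;
    // for any other file, the filename is required.
    $file = preg_match('~^index\.~i', $file)
        ? '(/(' . preg_quote($file, '~') . ')?)?'
        : '/' . preg_quote($file, '~');
    return '~^((https?:)?//' . $host . ')?' . preg_quote($dir, '~') . $file . '(\?.*)?$~i';
}
preg_match(buildCurrentPageRegex(), $href) then answers the question for fully-qualified and root-relative links ($href being the link you are about to render); the relative forms are what the third option below is for.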
Third option is quite a bit more complex, but gives you more fine-grained control over your test. As with the last option, you have the various URL elements. You can test for the validity of each without using a regex. For example:
$a = array(); // init array for valid URLs
// Step through each variation of our path...
// (REQUEST_URI may carry a query string, so strip it first)
foreach ([$_SERVER['SCRIPT_NAME'], strtok($_SERVER['REQUEST_URI'], '?')] as $path) {
    // Step through each variation of our host...
    foreach ([$_SERVER['HTTP_HOST'], explode(".", $_SERVER['HTTP_HOST'])[0]] as $server) {
        // Step through each variation of our protocol...
        foreach (['https://', 'http://', '//'] as $protocol) {
            // Set the URL as a key.
            $a[ $protocol . $server . $path ] = 1;
        }
    }
    // Also for each path, step through directories and parents...
    $apath = explode('/', $path); // turn the path into an array
    unset($apath[0]); // drop the empty element produced by the leading slash
    $segments = count($apath); // remember the length; unset() below shrinks the array
    for ($i = 1; $i <= $segments; $i++) {
        if (strlen($apath[$i])) {
            // add relative paths: enough "../" to climb out, then the remaining segments
            $a[ str_repeat("../", $segments - $i) . implode("/", $apath) ] = 1;
        }
        unset($apath[$i]);
    }
    $a[ "./" . implode("/", $apath) ] = 1; // add current directory
}
Then simply test whether the link (minus its query string) is an index within the array. Or adjust to suit; I'm sure you get the idea.
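In other words, something along these lines (a sketch; $a is the array built above and $link is the href about to be rendered):
$link_key = strstr($link, '?', true) ?: $link; // drop the query string, if any
$is_current = isset($a[$link_key]);            // true if the link points back to this page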
I like this third solution the best.
A regex isn't actually necessary to strip off all the query parameters. You could use strtok():
$url = strtok($url, '?');
And, to check the output for your URL array:
$url_list = <<<URL
https://www.example.com/path/to/page/index.php?parameter=value
https://www.example.com/path/to/page/index.php
...
./?parameter=value
./
URL;
$urls = explode("\n", $url_list);
foreach ($urls as $url) {
$url = strtok($url, '?'); // remove everything after ?
echo $url."\n";
}
As a function (could be improved):
function checkURLMatch($url, $url_array) {
$url = strtok($url, '?'); // remove everything after ?
if( in_array($url, $url_array)) {
// url exists array
return True;
} else {
// url not in array
return False;
}
}
See it live!
You can use this approach:
function checkURL($me, $s) {
$dir = dirname($me) . '/';
// you may need to refine this
$s = preg_filter(array('~^//~', '~/$~', '~\?.*$~', '~\.\./~'),
array('', '', '', $dir), $s);
// parse resulting URL
$url = parse_url($s);
var_dump($url);
// match parsed URL's path with self
return ($url['path'] === $me);
}
// your page's URL with stripped out .php
$me = str_replace('.php', '', $_SERVER['PHP_SELF']);
// assume this is the URL you are matching against
$s = '../page/';
// compare $me with $s
$ret = checkURL($me, $s);
var_dump($ret);
Live Demo: http://ideone.com/OZZM53
As I have been paid to work on this for the last couple of days, I wasn't just sitting around waiting for an answer. I've come up with one that works on my test platform; what does everyone else think? It feels a little bloated, but also feels bulletproof.
Debug echoes are left in, in case you want to echo out some values.
global $debug;$debug = false; // toggle debug echoes and var_dumps
/**
* Returns a boolean indicating whether the given URL is the current one.
*
* @param $otherURL the other URL, as a string. Can be any URL, relative or canonical. Invalid URLs will not match.
*
* @return true iff the given URL points to the same place as the current one
*/
function isCurrentURL($otherURL)
{global $debug;
if($debug)echo"<!--\r\nisCurrentURL($otherURL)\r\n{\r\n";
// BEGIN Parse other URL
$otherParts = parse_url($otherURL);
$otherHost = isset($otherParts["host"]) ? $otherParts["host"] : null; // use the parsed host if there is one, else null
$otherDomain = explode(".", (string) $otherHost);
$otherSubdomain = array_shift($otherDomain); // subdom only
$otherDomain = implode(".", $otherDomain); // domain only
$otherFilepath = isset($otherParts["path"]) ? $otherParts["path"] : null;
$otherProtocol = isset($otherParts["scheme"]) ? $otherParts["scheme"] : null;
// END Parse other URL
// BEGIN Get current URL
#if($debug){echo '$_SERVER == '; var_dump($_SERVER);}
$thisProtocol = isset($_SERVER["HTTP_X_FORWARDED_PROTO"]) ? $_SERVER["HTTP_X_FORWARDED_PROTO"] : ((isset($_SERVER["HTTPS"]) && $_SERVER["HTTPS"] !== "off") ? "https" : "http"); // http or https (falls back to HTTPS detection when not behind a proxy)
$thisHost = $_SERVER["HTTP_HOST"]; // subdom or subdom.domain.tld
$thisDomain = explode(".", $thisHost);
$thisSubdomain = array_shift($thisDomain); // subdom only
$thisDomain = implode(".", $thisDomain); // domain only
if ($thisDomain == "")
$thisDomain = $otherDomain;
$thisFilepath = $_SERVER["PHP_SELF"]; // /path/to/file.php
$thisURL = "$thisProtocol://$thisHost$thisFilepath";
// END Get current URL
if ($thisURL == $otherURL) // unlikely, but possible. Might as well check.
return true;
if($debug)echo"Current URL is $thisURL ($thisProtocol, $thisSubdomain, $thisDomain, $thisFilepath).\r\n";
if($debug)echo"Other URL is $otherURL ($otherProtocol, $otherHost, $otherFilepath).\r\n";
$thisDomainRegexed = isset($thisDomain) && $thisDomain != null && $thisDomain != "" ? "(\." . str_replace(".","\.",$thisDomain) . ")?" : ""; // prepare domain for insertion into regex
// v this makes the last slash before index.php optional
$regex = "/^(($thisProtocol:)?\/\/$thisSubdomain$thisDomainRegexed)?" . preg_replace('/index\\\..+$/i','?(index\..+)?', str_replace(array(".", "/"), array("\.", "\/"), $thisFilepath)) . '$/i';
if($debug)echo "\r\nregex is $regex\r\nComparing regex against $otherURL";
if (preg_match($regex, $otherURL))
{
if($debug)echo"\r\n\tIt's a match! Returning true...\r\n}\r\n-->";
return true;
}
else
{
if($debug)echo"\r\n\tOther URL is NOT a fully-qualified URL in this subdomain. Checking if it is relative...";
if($otherURL == $thisFilepath) // somewhat likely
{
if($debug)echo"\r\n\t\tOther URL and this filepath are an exact match! Returning true...\r\n}\r\n-->";
return true;
}
else
{
if($debug)echo"\r\n\t\tFilepath is not an exact match. Testing against regex...";
$regex = regexFilepath($thisFilepath);
if($debug)echo"\r\n\t\tNew Regex is $regex";
if($debug)echo"\r\n\t\tComparing regex against $otherFilepath...";
if (preg_match($regex, $otherFilepath))
{
if($debug)echo"\r\n\t\t\tIt's a match! Returning true...\r\n}\r\n-->";
return true;
}
}
}
if($debug)echo"\r\nI tried my hardest, but couldn't match $otherURL to $thisURL. Returning false...\r\n}\r\n-->";
return false;
}
/**
* Uses the given filepath to create a regex that will match it in any of its relative representations.
*
* @param $path the filepath to be converted
*
* @return a regex that matches all relative forms of the given filepath
*/
function regexFilepath($path)
{global $debug;
if($debug)echo"\r\nregexFilepath($path)\r\n{\r\n";
$filepathArray = explode("/", $path);
if (count($filepathArray) == 0)
throw new Exception("given parameter not a filepath: $path");
if ($filepathArray[0] == "") // this can happen if the path starts with a "/"
array_shift($filepathArray); // strip the first element off the array
$isIndex = preg_match("/^index\..+$/i", end($filepathArray));
$filename = array_pop($filepathArray);
if($debug){var_dump($filepathArray);}
$ret = '';
foreach($filepathArray as $i)
$ret = "(\.\.\/$ret$i\/)?"; // make a pseudo-recursive relative filepath
if($debug)echo "\r\n$ret";
$ret = preg_replace('/\)\?$/', '?)', $ret); // remove the last '?' and add one before the last '\/'
if($debug)echo "\r\n$ret";
$ret = '/^' . ($ret == '' ? '\.\/' : "((\.\/)|$ret)") . ($isIndex ? '(index\..+)?' : str_replace('.', '\.', $filename)) . '$/i'; // if this filepath leads to an index.php (etc.), then that filename is implied and irrelevant.
if($debug)echo"\r\n$ret\r\n}\r\n";
return $ret; // without this return the caller has nothing to match against
}
This seems to match everything I need it to match, and not what I don't need it to.
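A quick harness (not part of the original test platform; it has to run in the context of the page itself so the $_SERVER variables are populated) to exercise the function against the URL list from the question:
$candidates = array(
    'https://www.example.com/path/to/page/index.php',
    'https://www.example.com/path/to/page/',
    '//www.example.com/path/to/page',
    '../../to/page',
    '../page',
    './',
    'https://www.example.com/other/page', // a different page, for contrast
);
foreach ($candidates as $candidate)
    var_dump($candidate, isCurrentURL($candidate));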
My client talks to the database via a service. When I do some action, I call the service URL with file_get_contents and then process the response.
I think that when a user enters the site from a Google search result, Google adds some parameters to my service URL, which then end up in the file_get_contents call.
For example,
as it should be,
file_get_contents(service.domain/service_name?param1=0)
but Google adds some strange parameters to the end of the URL, like this:
file_get_contents(service.domain/service_name?param1=0&force=1)
I saw two or three strange parameters, for example:
force = 1
gclid=CNC-jvTapbcCFaaj
I don't know how to handle and remove these parameters.
How can I remove these strange parameters before calling the URL with file_get_contents?
This is my function:
public function getWhere($keyword)
{
$locale = session('locale');
if($locale != "en" && $locale != "")
{
$locale = "_".session('locale');
}else{
$locale = "";
}
$page = file_get_contents(getenv('API_URL')."station_info?search.url=".$keyword);
$page = json_decode($page);
$page = (array) $page->rows;
$station = $page[0];
if($station->search->status == 1)
{
return redirect('/');
}
return view('station', compact('station', 'locale'));
}
Strange parameters keep adding themselves to the end of the URL.
Edit
I think someone is trying something. $keyword must take a string which is a station name, for example, London.
But someone is trying to send London&force=1, so I have to check the $keyword variable.
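If the goal is simply to stop $keyword from smuggling extra query parameters into the service call, cutting it at the first ? or & and URL-encoding it before building the URL is enough. A minimal sketch (only the file_get_contents line of getWhere() changes):
$keyword = strtok($keyword, '?&');   // keep only the part before any ? or &
$page = file_get_contents(getenv('API_URL') . "station_info?search.url=" . rawurlencode($keyword));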
You can parse the URL with the parse_url function, remove the unwanted parameters (or just keep the parameters you want) and build the URL back up.
After using parse_url you will also need to parse the query string with parse_str, so the code will look like this:
$urlParsed = parse_url("http://service.domain/service_name?param1=0&force=1"); // a scheme is needed for 'scheme'/'host' below
parse_str($urlParsed['query'], $params); // parse_str() fills $params by reference; it does not return the array
$newParams = [];
$neededParams = ['param1', 'param2'];
foreach ($neededParams as $p) {
if (isset($params[$p])) {
$newParams[$p] = $params[$p];
}
}
$newUrl = $urlParsed['scheme'] . '://' . $urlParsed['host'] . $urlParsed['path'] . '?' . http_build_query($newParams);
file_get_contents($newUrl);
I didn't test that code, but I hope the idea is clear and you can amend it to your needs.
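For what it's worth, running the idea against the example from the question (assuming the service URL is fully qualified, since parse_url() only reports 'scheme' and 'host' when they are present) gives:
$dirty = "http://service.domain/service_name?param1=0&force=1&gclid=CNC-jvTapbcCFaaj";
$urlParsed = parse_url($dirty);
parse_str($urlParsed['query'], $params);
$kept = array_intersect_key($params, array_flip(array('param1'))); // keep only param1
echo $urlParsed['scheme'] . '://' . $urlParsed['host'] . $urlParsed['path'] . '?' . http_build_query($kept);
// prints: http://service.domain/service_name?param1=0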
I know there is a LOT of info on the web regarding this subject, but I can't seem to figure it out the way I want.
I'm trying to build a function which strips the domain name from a URL:
http://blabla.com    -> blabla
www.blabla.net       -> blabla
http://www.blabla.eu -> blabla
Only the plain name of the domain is needed.
With parse_url I get the domain filtered but that is not enough.
I have 3 functions that strip the domain, but I still get some wrong outputs:
function prepare_array($domains)
{
$prep_domains = explode("\n", str_replace("\r", "", $domains));
$domain_array = array_map('trim', $prep_domains);
return $domain_array;
}
function test($domain)
{
$domain = explode(".", $domain);
return $domain[1];
}
function strip($url)
{
$url = trim($url);
$url = preg_replace("/^(http:\/\/)*(www.)*/is", "", $url);
$url = preg_replace("/\/.*$/is" , "" ,$url);
return $url;
}
Every possible domain, URL and extension is allowed. After the function is finished, it must return an array of only the domain names themselves.
UPDATE:
Thanks for all the suggestions!
I figured it out with the help from you all.
function test($url)
{
// Check if the url begins with http:// www. or both
// If so, replace it
if (preg_match("/^(http:\/\/|www.)/i", $url))
{
$domain = preg_replace("/^(http:\/\/)*(www.)*/is", "", $url);
}
else
{
$domain = $url;
}
// Now all thats left is the domain and the extension
// Only return the needed first part without the extension
$domain = explode(".", $domain);
return $domain[0];
}
How about
$wsArray = explode(".",$domain); //Break it up into an array.
$extension = array_pop($wsArray); //Get the Extension (last entry)
$domain = array_pop($wsArray); // Get the domain
http://php.net/manual/en/function.array-pop.php
Ah, your problem lies in the fact that TLDs can be in either one or two parts, e.g. .com vs .co.uk.
What I would do is maintain a list of TLDs. With the result after parse_url, go over the list and look for a match. Strip out the TLD, explode on '.' and the last part will be in the format you want it.
This does not seem as efficient as it could be but, with TLDs being added all the time, I cannot see any other deterministic way.
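A rough sketch of that idea, with a deliberately tiny (and therefore incomplete) TLD list; a real implementation would load something like the Public Suffix List instead:
function domainName($url) {
    $multiPartTlds = array('co.uk', 'com.au', 'co.nz'); // incomplete on purpose
    // parse_url() only finds the host when a scheme is present, so add one if missing.
    $host = parse_url(preg_match('~^https?://~i', $url) ? $url : 'http://' . $url, PHP_URL_HOST);
    foreach ($multiPartTlds as $tld) {
        if (substr($host, -strlen($tld) - 1) === '.' . $tld) {
            $host = substr($host, 0, -strlen($tld) - 1); // strip ".co.uk" etc.
            $parts = explode('.', $host);
            return end($parts);
        }
    }
    $parts = explode('.', $host);
    return count($parts) >= 2 ? $parts[count($parts) - 2] : $parts[0];
}
// domainName('http://www.blabla.eu') -> "blabla"
// domainName('www.blabla.co.uk')     -> "blabla"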
Ok... this is messy, and you should spend some time optimizing and caching previously derived domains. You should also have a friendly nameserver, and the last catch is that the domain must have an "A" record in its DNS.
This attempts to assemble the domain name in reverse order until it can resolve to a DNS "A" record.
At anyrate, this was bugging me, so I hope this answer helps :
<?php
$wsHostNames = array(
"test.com",
"http://www.bbc.com/news/uk-34276525",
"google.uk.co"
);
foreach ($wsHostNames as $hostName) {
echo "checking $hostName" . PHP_EOL;
$wsWork = $hostName;
//attempt to strip out full paths to just host
$wsWork = parse_url($hostName, PHP_URL_HOST);
if ($wsWork != "") {
echo "Was able to cleanup $wsWork" . PHP_EOL;
$hostName = $wsWork;
} else {
//Probably had no path info or malformed URL
//Try to check it anyway
echo "No path to strip from $hostName" . PHP_EOL;
}
$wsArray = explode(".", $hostName); //Break it up into an array.
$wsHostName = "";
//Build the domain up one segment at a time
//Code could be modified not to bother checking the first segment alone (e.g. "com")
while (!empty($wsArray)) {
$newSegment = array_pop($wsArray);
$wsHostName = $newSegment . $wsHostName;
echo "Checking $wsHostName" . PHP_EOL;
if (checkdnsrr($wsHostName, "A")) {
echo "host found $wsHostName" . PHP_EOL;
echo "Domain is $newSegment" . PHP_EOL;
continue(2);
} else {
//This segment didn't resolve - keep building
echo "No Valid A Record for $wsHostName" . PHP_EOL;
$wsHostName = "." . $wsHostName;
}
}
//if you get to here in the loop it could not resolve the host name
}
?>
Try it with preg_replace, something like:
$domain = preg_replace($regex, '$1', $url);
with a suitable $regex that captures the domain part in $1.
I have been building a search engine, but now I need a web crawler in PHP that can crawl my website for its content.
I don't know if web crawler / spider is the right word, but I was hoping someone could help me write a simple PHP script that opens all pages in a domain ending in .php or .html, takes the content of each page, and stores it in a variable as raw text. One variable per page.
If anyone knows of a good open-source script that does this, or can help me write one, please share or do so; I would greatly appreciate any and all help.
Check out http://sourceforge.net/projects/php-crawler/
Or try this simple code that searches for the presence of the Google Analytics tracking code:
// Disable time limit to keep the script running
set_time_limit(0);
// Domain to start crawling
$domain = "http://webdevwonders.com";
// Content to search for existence
$content = "google-analytics.com/ga.js";
// Tag in which you look for the content
$content_tag = "script";
// Name of the output file
$output_file = "analytics_domains.txt";
// Maximum urls to check
$max_urls_to_check = 100;
$rounds = 0;
// Array to hold all domains to check
$domain_stack = array();
// Maximum size of domain stack
$max_size_domain_stack = 1000;
// Hash to hold all domains already checked
$checked_domains = array();
// Loop through the domains as long as domains are available in the stack
// and the maximum number of urls to check is not reached
while ($domain != "" && $rounds < $max_urls_to_check) {
$doc = new DOMDocument();
// Get the sourcecode of the domain
@$doc->loadHTMLFile($domain); // the @ suppresses warnings from malformed HTML
$found = false;
// Loop through each found tag of the specified type in the dom
// and search for the specified content
foreach($doc->getElementsByTagName($content_tag) as $tag) {
if (strpos($tag->nodeValue, $content)) {
$found = true;
break;
}
}
// Add the domain to the checked domains hash
$checked_domains[$domain] = $found;
// Loop through each "a"-tag in the dom
// and add its href domain to the domain stack if it is not an internal link
foreach($doc->getElementsByTagName('a') as $link) {
$href = $link->getAttribute('href');
if (strpos($href, 'http://') !== false && strpos($href, $domain) === false) {
$href_array = explode("/", $href);
// Keep the domain stack to the predefined max of domains
// and only push domains to the stack that have not been checked yet
if (count($domain_stack) < $max_size_domain_stack &&
!isset($checked_domains["http://".$href_array[2]])) {
array_push($domain_stack, "http://".$href_array[2]);
}
};
}
// Remove all duplicate urls from stack
$domain_stack = array_unique($domain_stack);
$domain = isset($domain_stack[0]) ? $domain_stack[0] : ""; // an empty string ends the while loop when the stack runs dry
// Remove the assigned domain from domain stack
unset($domain_stack[0]);
// Reorder the domain stack
$domain_stack = array_values($domain_stack);
$rounds++;
}
$found_domains = "";
// Add all domains where the specified search string
// has been found to the found domains string
foreach ($checked_domains as $key => $value) {
if ($value) {
$found_domains .= $key."\n";
}
}
// Write found domains string to specified output file
file_put_contents($output_file, $found_domains);
I found it here.
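That script checks other domains for an Analytics snippet rather than collecting page text, so here is a stripped-down, single-site sketch closer to what the question asks for (it stays on one host, only follows .php/.html links, resolves relative links fairly naively, and stores each page's raw text keyed by URL; the starting page is hypothetical):
$start  = 'http://example.com/index.php'; // hypothetical starting page
$scheme = parse_url($start, PHP_URL_SCHEME);
$host   = parse_url($start, PHP_URL_HOST);
$queue   = array($start);
$visited = array();
$pages   = array(); // url => raw text, one entry per page
while (!empty($queue) && count($pages) < 100) {
    $url = array_shift($queue);
    if (isset($visited[$url])) continue;
    $visited[$url] = true;
    $html = @file_get_contents($url);
    if ($html === false) continue;
    $doc = new DOMDocument();
    @$doc->loadHTML($html);           // suppress warnings from sloppy markup
    $pages[$url] = $doc->textContent; // the page's raw text
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if (strpos($href, 'http') === 0) {
            $abs = $href;                                   // already absolute
        } elseif (strpos($href, '/') === 0) {
            $abs = "$scheme://$host$href";                  // root-relative
        } else {
            $abs = rtrim(dirname($url), '/') . '/' . $href; // naive directory-relative
        }
        // Only queue same-host .php/.html pages, without any #fragment.
        if (parse_url($abs, PHP_URL_HOST) === $host && preg_match('~\.(php|html)($|\?)~i', $abs)) {
            $queue[] = strtok($abs, '#');
        }
    }
}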
I am attempting to create a php function which will check if the passes URL is a short URL. Something like this:
/**
* Check if a URL is a short URL
*
* #param string $url
* return bool
*/
function _is_short_url($url){
// Code goes here
}
I know that a simpler and surefire way would be to check for a 301 redirect, but this function aims at saving an external request just for that check. Nor should the function check against a list of URL shorteners, as that would be a less scalable approach.
Here are a few possible checks I was thinking of:
Overall URL length - May be a max of 30 characters
URL length after last '/' - May be a max of 10 characters
Number of '/' after protocol (http://) - Max 2
Max length of host
Any thoughts on a possible approach or a more exhaustive checklist for this?
EDIT: This function is just an attempt to save an external request, so it's OK for it to return true for a non-short URL (but never false for a real short one). After passing through this function, I would anyway expand all short URLs by checking 301 redirects. This is just to eliminate the obvious ones.
I would not recommend using a regex, as it would be too complex and difficult to understand. Here is PHP code to check all your constraints:
function _is_short_url($url){
    // 1. Overall URL length - May be a max of 30 characters
    if (strlen($url) > 30) return false;
    $parts = parse_url($url);
    // No query string & no fragment
    if (isset($parts["query"]) || isset($parts["fragment"])) return false;
    $path = isset($parts["path"]) ? $parts["path"] : "";
    $pathParts = explode("/", $path);
    // 3. Number of '/' after protocol (http://) - Max 2
    if (count($pathParts) > 2) return false;
    // 2. URL length after last '/' - May be a max of 10 characters
    $lastPath = array_pop($pathParts);
    if (strlen($lastPath) > 10) return false;
    // 4. Max length of host
    if (strlen(isset($parts["host"]) ? $parts["host"] : "") > 10) return false;
    return true;
}
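A couple of quick checks against these heuristics (they only exercise the rules above, not actual redirects):
var_dump(_is_short_url('http://bit.ly/1GoNYa'));                           // true: short host, short path
var_dump(_is_short_url('https://www.example.com/path/to/page/index.php')); // false: longer than 30 characters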
Here is a small function which checks all your requirements. I was able to do it without a complex regex, only preg_split. You should be able to adapt it easily.
<?php
var_dump(_isShortUrl('http://bit.ly/foo'));
function _isShortUrl($url)
{
// Check for max URL length (30)
if (strlen($url) > 30) {
return false;
}
// Check if there are more than two URL parts/slashes (more than 5 split values)
$parts = preg_split('/\//', $url);
if (count($parts) > 5) {
return false;
}
// Check for max host length (10)
$host = $parts[2];
if (strlen($host) > 10) {
return false;
}
// Check for max length of last URL part (after last slash)
$lastPart = array_pop($parts);
if (strlen($lastPart) > 10) {
return false;
}
return true;
}
If I were you, I would test whether the URL returns a 301 redirect, and then test whether the redirect points to another website:
function _is_short_url($url) {
$options['http']['method'] = 'HEAD';
stream_context_set_default($options); # don't fetch the full page
$headers = get_headers($url,1);
if ( isset($headers[0]) ) {
if (strpos($headers[0],'301')!==false && isset($headers['Location'])) {
$location = $headers['Location'];
$url = parse_url($url);
$location = parse_url($location);
if ($url['host'] != $location['host'])
return true;
}
}
return false;
}
echo (int)_is_short_url('http://bit.ly/1GoNYa');
I'm making a link and text service, but I have a problem, which is: there is only 1 input text form, and the user could paste something like this:
http://asdf.com - which would register as a link; or 'asdf http://test.com', which, because of the http://, would register as a url; or
asdf - which would register as a string, because it doesn't contain http://
BUT my problem arises when the user writes something like:
asdf http://asdf.com, which in my current program outputs a "url" value. I've been experimenting for about an hour now, and I've got 3 bits of code (they were all in the same document, commented out, so forgive me if they give errors!)
<?
$str = $_POST['paste'];
if(stristr($str, "http://")) {
$type = "url";
}
if(stristr($str, "https://")) {
$type = "url";
}
if($type!="url") {
$type = "string";
}
?>
Next:
<?
$type = "url";
if($type=="url"){
$t = substr($str, 8);
if(stristr($t, "https://")==$t){
$type = "url";}
if(stristr($t, "https://")==$t){
$type = "url";}
if(stristr($t, "http://")!=$t){
$type = "string";}
if(stristr($t, "https://")!=$t){
$type = "string";}
}
echo $type;
?>
Next:
<?
$url = "hasttp://cake.com";
if(stristr($url, "http://")=="") {
$type = "string"; } else {
$type = "url";
$sus = 1;}
if(stristr($url, "http://")==$url) {
$type = "url"; }
if($sus==1) {
$r = substr($url, 7);
if(stristr($r,"http://")!="http://") {
$type = "url"; }
if($r=="") {
$type = "string";
}
}
echo $type;
?>
I have no clue how I could go about classifying a string like 'asdf http://asdf.com' as a string, whilst classifying 'asdf' as a string and 'http://asdf.com' as a url. Another idea I haven't tried yet is strpos, but that's what I'm working on now.
Any ideas?
Thanks a lot! :)
Some parts of this question are getting cut off for some reason, apologies!
$type = '';
if (preg_match('%^https?://[^\s]+$%', $url)) {
$type = 'url';
} else {
$type = 'string';
}
This will match any value which starts with http:// or https://, and does not contain any space in it as type url. If the value does not start with http:// or https://, or it contains a space in it, it will be type string.
PHP parse_url is your function:
On seriously malformed URLs, parse_url() may return FALSE.
If the component parameter is omitted, an associative array is returned. At least one element will be present within the array. Potential keys within this array are:
scheme - e.g. http
host
port
user
pass
path
query - after the question mark ?
fragment - after the hashmark #
If the component parameter is specified, parse_url() returns a string (or an integer, in the case of PHP_URL_PORT) instead of an array. If the requested component doesn't exist within the given URL, NULL will be returned.
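For instance, the presence of an http/https scheme in the parsed result can serve as the test; a bare sketch (anything without that scheme, or containing a space, falls back to being a plain string; $str is the pasted value from the question):
$input = trim($str);
$parts = parse_url($input);
$type  = (isset($parts['scheme'])
          && in_array($parts['scheme'], array('http', 'https'), true)
          && strpos($input, ' ') === false) ? 'url' : 'string';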
If I'm understanding the problem correctly, you want to detect when the user inputs both a string and a URL and handle each of them accordingly.
Try using explode(" ", $userInput); this will return an array containing all the strings separated by spaces. Then you can check each element of the array and set the type accordingly.
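A sketch of that idea: classify each space-separated token on its own, and only call the whole input a url if every token looks like one:
$tokens = explode(' ', trim($_POST['paste']));
$type = 'url';
foreach ($tokens as $token) {
    // any token that is not an http(s) link turns the whole input into a string
    if (!preg_match('%^https?://\S+$%', $token)) {
        $type = 'string';
        break;
    }
}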
$type = strpos($str, 'http') === 0 ? 'url' : 'string';
The strpos function returns the position of a match within a string, or FALSE if there is no match. The triple equals checks that the result does not merely evaluate to 0 (as FALSE would), but is in fact the integer 0 (i.e., the string begins with http).
You could also use something like
switch (true) {
case strpos(trim($str), 'http://') === 0:
case strpos(trim($str), 'https://') === 0:
$type = 'url';
break;
default:
$type = 'string';
break; // I know this is not needed, but it is pretty :-)
}
You should use a regular expression to check if the string starts with http
if(preg_match('/^http/',$string_to_check)){
//this is a url
}