Hash url transform to a long url htaccess? php?


I built my entire site around hash URLs (http://example.com/videos#video01). The problem is that when I share a link on Facebook, the hash is not recognized. Is there a way to transform or redirect a hash URL to a longer, social-friendly URL?
Solution:
I tried bit.ly's API once more. I have 50 videos to show, each with a hash at the end of its URL. I wrote a small cache script (bit.ly has a limit) and looped over the videos with a PHP foreach; it seems bit.ly accepts hashes.
Thanks anyway.

The # and everything after it is never sent to the server. In your case the server only receives http://example.com/videos.

New Link format: http://example.com/videos?name=video01
Call this function near the top of your controller or of http://example.com/videos/index.php:
function redirect()
{
    if (!empty($_GET['name'])) {
        // Sanitize and validate $_GET['name'].
        // Remove anything which isn't a word character, whitespace, number
        // or any of the following characters: -_~,;[]().
        // If you don't need to handle multi-byte characters
        // you can use preg_replace rather than mb_ereg_replace.
        $file = mb_ereg_replace("([^\w\s\d\-_~,;\[\]\(\).])", '', $_GET['name']);
        // Remove any runs of periods
        $file = mb_ereg_replace("([\.]{2,})", '', $file);
        $valid = file_exists('pathToFiles/' . $file);
        if ($valid) {
            $url = '/videos#' . $file;
        } else {
            $url = '/your404page.php';
        }
        header("Location: $url");
        exit; // stop the script so nothing else is sent after the redirect
    }
}
Sanitization snippet from this highly ranked answer: https://stackoverflow.com/a/2021729/1296209
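To produce share links in the new format from the existing hash URLs, something like the following sketch could be used. The function name hashUrlToShareUrl and the default parameter name "name" are assumptions made for illustration; they are not part of the answer above.

```php
<?php
// Hypothetical helper: turn "/videos#video01" style links into
// "/videos?name=video01" so the identifier actually reaches the server.
function hashUrlToShareUrl(string $url, string $param = 'name'): string
{
    $hashPos = strpos($url, '#');
    if ($hashPos === false || $hashPos === strlen($url) - 1) {
        // No fragment (or an empty one): return the URL without a trailing "#".
        return rtrim($url, '#');
    }
    $base = substr($url, 0, $hashPos);
    $fragment = substr($url, $hashPos + 1);
    // Use "&" if the base already has a query string, "?" otherwise.
    $separator = (strpos($base, '?') === false) ? '?' : '&';
    return $base . $separator . $param . '=' . rawurlencode($fragment);
}

echo hashUrlToShareUrl('http://example.com/videos#video01'), "\n";
// http://example.com/videos?name=video01
```

The share link then hits the server with ?name=video01, where a redirect like the one above can send the visitor back to the hash URL.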

Related

parse non encoded url

An external page passes a URL to my page as a parameter value in the query string,
e.g. page.php?URL=http://www.domain2.com?foo=bar
I tried saving the parameter using
$url = $_GET['url'];
The problem is that the referring page does not send it encoded, so PHP treats anything after an "&" as the beginning of a new parameter.
I need a way to parse the URL so that everything after the second "?" is treated as part of the passed URL and not as part of the actual query string.

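Get the full query string and then strip the 'URL=' part from it (this assumes URL is the first parameter):
// http_build_query() re-encodes the values, so decode the result again
$name = urldecode(http_build_query($_GET));
$name = substr($name, strlen('URL='));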
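A variant that works on the raw query string and does not require the parameter to come first could look like this sketch. The helper name extractPassedUrl is hypothetical, and it assumes that everything after "URL=" belongs to the passed URL, as the question states.

```php
<?php
// Hypothetical helper: return everything after "<param>=" in a raw,
// unencoded query string, or null if the parameter is missing.
function extractPassedUrl(string $queryString, string $param = 'URL'): ?string
{
    // Match the parameter name only at the start or right after an "&".
    $pattern = '/(?:^|&)' . preg_quote($param, '/') . '=(.*)$/is';
    if (preg_match($pattern, $queryString, $m)) {
        return $m[1];
    }
    return null;
}

// In a real request the input would be $_SERVER['QUERY_STRING'].
echo extractPassedUrl('URL=http://www.domain2.com?foo=bar&test=12'), "\n";
// http://www.domain2.com?foo=bar&test=12
```

The match is case-insensitive, so both "URL=" and "url=" are accepted.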

Antonio's answer is probably best. A less elegant way would also work:
$url = $_GET['URL'];
// Re-append every other parameter that PHP split off from the passed URL
foreach ($_GET as $key => $value) {
    if ($key === 'URL') {
        continue;
    }
    $url .= '&' . $key . '=' . $value;
}
echo $url;

Something like this might help:
// The full request
$request_full = $_SERVER["REQUEST_URI"];
// Position of the first "?" inside $request_full
$pos_question_mark = strpos($request_full, '?');
// Position of the query itself
$pos_query = $pos_question_mark + 1;
// Extract the malformed query from $request_full
$request_query = substr($request_full, $pos_query);
// Split the query into the first parameter name and everything after its "="
if (preg_match('/^([^=]+)=(.+)$/', $request_query, $matches)) {
    // If that parameter is present in $_GET...
    if (isset($_GET[$matches[1]])) {
        // ... replace its truncated value with a URL-encoded version of the full remainder.
        $_GET[$matches[1]] = urlencode($matches[2]);
    }
}

As you have hinted in your question, the encoding of the URL you get is not as you want it: a & will mark a new argument for the current URL, not the one in the url parameter. If the URL were encoded correctly, the & would have been escaped as %26.
But, OK, given that you know for sure that everything following url= is not escaped and should be part of that parameter's value, you could do this:
$url = preg_replace("/^.*?([?&]url=(.*?))?$/i", "$2", $_SERVER["REQUEST_URI"]);
So if for example the current URL is:
http://www.myhost.com/page.php?a=1&URL=http://www.domain2.com?foo=bar&test=12
Then the returned value is:
http://www.domain2.com?foo=bar&test=12

if else on variable link input

I have a method of pulling Youtube video data from API links. I use Wordpress and ran into a snag.
In order to pull the thumbnail, views, uploader and video title I need the user to input the 11 character code at the end of watch?v=_______. This is documented with specific instructions for the user, but what if they ignore it and paste the whole url?
// the url 'code' the user should input.
_gXp4hdd2pk
// the wrong way, when the user pastes the whole url.
https://www.youtube.com/watch?v=_gXp4hdd2pk
If the user accidentally pastes the entire URL instead of the 11-character code, is there a way to use PHP to grab either the code or whatever is at the end of the URL (the 11 characters after 'watch?v=')?
Here is my PHP code to pull the data:
// $url is the code at the end of 'watch?v=' that the user inputs
$url = get_post_meta ($post->ID, 'youtube_url', $single = true);
// $code is a variable for placing the $url in a youtube link so I can output it to an API link
$code = 'http://www.youtube.com/watch?v=' . $url;
// $code is called at the end of this oembed code, allowing me to decode json data and pull elements from json to echo in my html
// echoed output returns json file. example: http://www.youtube.com/oembed?url=http://www.youtube.com/watch?v=_gXp4hdd2pk
$json = file_get_contents('http://www.youtube.com/oembed?url='.urlencode($code));
I'm looking for something like...
"if user inputs code, use this block of code, else if user inputs whole url use a different block of code, else throw error."
Or... if they use the whole URL can PHP only use a specific section of that url...?
EDIT: Thank you for all the answers! I am new to PHP, so thank you all for your patience. It is difficult for graphic designers to learn PHP; even reading the PHP manual can give us headaches. All of your answers were great, and the ones I've tested have worked. Thank you so much :)

Try this,
$code = 'https://www.youtube.com/watch?v=_gXp4hdd2pk';
if (filter_var($code, FILTER_VALIDATE_URL) == TRUE) {
// if `$code` is valid url
$code_arr = explode('?v=', $code);
$query_str = explode('&', $code_arr[1]);
$new_code = $query_str[0];
} else {
// if `$code` is not a valid url like '_gXp4hdd2pk'
$new_code = $code;
}
echo $new_code;

Here's a simple option for you to do, unless you want to use regex like Nisse Engström's Answer.
Using the function parse_url() you could do something like this:
$url = 'https://www.youtube.com/watch?v=_gXp4hdd2pk&list=RD_gXp4hdd2pk#t=184';
$split = parse_url($url);
$params = explode('&', $split['query']);
$video_id = str_replace('v=', '', $params[0]);
Now $video_id contains:
_gXp4hdd2pk
from the $url supplied in the above code.
I suggest you read the parse_url() documentation to ensure you understand and grasp it all :-)
Update
for your comment.
You'd use something like this to check whether the submitted value is a valid URL:
// $input is the value the user submitted
if (filter_var($input, FILTER_VALIDATE_URL)) {
    // It is a valid URL, so extract the video code from its query string
    $split = parse_url($input);
    $params = explode('&', $split['query']);
    $video_id = str_replace('v=', '', $params[0]);
} else {
    // The check returned false, so they must have posted just the video code.
    $video_id = $input;
}

Just try the following:
$url = "https://www.youtube.com/watch?v=_gXp4hdd2pk";
$parts = explode('?v=', $url);
$endofurl = end($parts);
echo $endofurl;
Replace the $url value with the user's input.

I instruct my users to copy and paste the whole youtube url.
Then, I do this:
$video_url = 'https://www.youtube.com/watch?v=_gXp4hdd2pk'; // this is from user input
$parsed_url = parse_url($video_url);
parse_str($parsed_url['query'], $query);
$vidID = isset($query['v']) ? $query['v'] : NULL;
$url = "http://gdata.youtube.com/feeds/api/videos/". $vidID; // this is used for the Api
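The "code, else whole URL, else error" logic from the question can be combined with parse_url() and parse_str() roughly as follows. This is only a sketch: the function name extractYoutubeId is hypothetical, and the allowed character set for the 11-character code is an assumption, not something stated in the question.

```php
<?php
// Hypothetical helper: accept either a bare 11-character video code or a
// full watch URL and return the code, or null when neither form matches.
function extractYoutubeId(string $input): ?string
{
    $input = trim($input);
    // Assumption: codes are 11 characters from A-Z, a-z, 0-9, "-" and "_".
    $codePattern = '/\A[A-Za-z0-9_-]{11}\z/';

    if (preg_match($codePattern, $input)) {
        return $input; // the user entered just the code
    }

    $query = parse_url($input, PHP_URL_QUERY);
    if (is_string($query)) {
        parse_str($query, $params);
        if (isset($params['v']) && is_string($params['v'])
            && preg_match($codePattern, $params['v'])) {
            return $params['v']; // the user pasted the whole URL
        }
    }
    return null; // neither form: let the caller report an error
}

var_dump(extractYoutubeId('_gXp4hdd2pk'));
var_dump(extractYoutubeId('https://www.youtube.com/watch?v=_gXp4hdd2pk&list=RD_gXp4hdd2pk#t=184'));
var_dump(extractYoutubeId('not a video'));
```

Because parse_str() looks the v parameter up by name, the result does not depend on v being the first parameter in the URL.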

$m = array();
if (preg_match ('#^(https?://www.youtube.com/watch\\?v=)?(.+)$#', $url, $m)) {
$code = $m[2];
} else {
/* No match */
}
The code uses a Regular Expression to match the user input (the subject) against a pattern. The pattern is enclosed in a pair of delimiters (#) of your choice. The rest of the pattern works like this:
^ matches the beginning of the string.
(...) creates a subpattern.
? matches 0 or 1 of the preceding character or subpattern.
https? matches "http" or "https".
\? matches "?".
(.+) matches 1 or more arbitrary characters. The . matches any character (except newline). + matches 1 or more of the preceding character or subpattern.
$ matches the end of the string.
In other words, optionally match an http or https base URL, followed by the video code.
The matches are then written to $m. $m[0] contains the entire string, $m[1] contains the first subpattern (base URL) and $m[2] contains the second subpattern (code).

I don't want to reinvent the wheel, but I couldn't find any library that does this perfectly.
In my script users can save URLs. When they give me a list like:
google.com
www.msn.com
http://bing.com/
and so on...
I want to be able to save them in the database in the "correct format".
What I do now is check whether a protocol is present, add one if it is missing, and then validate the URL against a RegExp.
For PHP's parse_url, any URL that contains a protocol is valid, so it didn't help much.
How do you handle this? Do you have any ideas you would like to share?
Edit:
I want to filter out invalid URLs from user input (a list of URLs) and, more importantly, to try to auto-correct URLs that are invalid (e.g. ones missing a protocol). Once the user enters the list, it should be validated immediately (there is no time to open the URLs to check whether they really exist).
It would be great to extract the parts of a URL, as parse_url does, but the problem with parse_url is that it doesn't work well with invalid URLs. I tried parsing the URL with it and adding defaults for required parts that are missing (e.g. no protocol, add http). But for "google.com", parse_url returns "google.com" as the path, not as the hostname.
This looks like a really common problem to me, but I could not find a ready-made solution on the internet (I found some libraries that standardize URLs, but they won't fix a URL if it is invalid).
Is there a "smart" solution to this, or should I stick with my current approach:
Find the first occurrence of :// and check whether the text before it is a valid protocol; add a protocol if it is missing
Find the next occurrence of / and check whether the hostname is in a valid format
For good measure, validate the whole URL once more via RegExp
I just have a feeling I will reject some valid URLs this way, and for me it is better to have a false positive than a false negative.

I had the same problem with parse_url as the OP. This is my quick and dirty solution for auto-correcting URLs (keep in mind that the code is in no way perfect and does not cover all cases):
Results:
http:/wwww.example.com/lorum.html => http://www.example.com/lorum.html
gopher:/ww.example.com => gopher://www.example.com
http:/www3.example.com/?q=asd&f=#asd => http://www3.example.com/?q=asd&f=#asd
asd://.example.com/folder/folder/ => http://example.com/folder/folder/
.example.com/ => http://example.com/
example.com => http://example.com
subdomain.example.com => http://subdomain.example.com
function url_parser($url) {
// multiple /// messes up parse_url, replace 2+ with 2
$url = preg_replace('/(\/{2,})/','//',$url);
$parse_url = parse_url($url);
if(empty($parse_url["scheme"])) {
$parse_url["scheme"] = "http";
}
if(empty($parse_url["host"]) && !empty($parse_url["path"])) {
// Strip slash from the beginning of path
$parse_url["host"] = ltrim($parse_url["path"], '\/');
$parse_url["path"] = "";
}
$return_url = "";
// Check if scheme is correct
if(!in_array($parse_url["scheme"], array("http", "https", "gopher"))) {
$return_url .= 'http'.'://';
} else {
$return_url .= $parse_url["scheme"].'://';
}
// Check if the right amount of "www" is set.
$explode_host = explode(".", $parse_url["host"]);
// Remove empty entries
$explode_host = array_filter($explode_host);
// And reassign indexes
$explode_host = array_values($explode_host);
// Contains subdomain
if(count($explode_host) > 2) {
// Check if subdomain only contains the letter w(then not any other subdomain).
if(substr_count($explode_host[0], 'w') == strlen($explode_host[0])) {
// Replace with "www" to avoid "ww" or "wwww", etc.
$explode_host[0] = "www";
}
}
$return_url .= implode(".",$explode_host);
if(!empty($parse_url["port"])) {
$return_url .= ":".$parse_url["port"];
}
if(!empty($parse_url["path"])) {
$return_url .= $parse_url["path"];
}
if(!empty($parse_url["query"])) {
$return_url .= '?'.$parse_url["query"];
}
if(!empty($parse_url["fragment"])) {
$return_url .= '#'.$parse_url["fragment"];
}
return $return_url;
}
echo url_parser('http:/wwww.example.com/lorum.html'); // http://www.example.com/lorum.html
echo url_parser('gopher:/ww.example.com'); // gopher://www.example.com
echo url_parser('http:/www3.example.com/?q=asd&f=#asd'); // http://www3.example.com/?q=asd&f=#asd
echo url_parser('asd://.example.com/folder/folder/'); // http://example.com/folder/folder/
echo url_parser('.example.com/'); // http://example.com/
echo url_parser('example.com'); // http://example.com
echo url_parser('subdomain.example.com'); // http://subdomain.example.com

It's not 100% foolproof, but it's a one-liner. It only prepends a scheme when the string does not already start with one:
$URL = (((strpos($URL,'https://') !== 0) && (strpos($URL,'http://') !== 0))?'http://':'' ).$URL;
EDIT
There was apparently a problem with my initial version if the hostname contains http.
Thanks, Trent.
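The "add a protocol if missing, then validate" idea from the question can also be sketched with filter_var() doing the final check. The function name normalizeUrl and the default scheme are assumptions for illustration, and FILTER_VALIDATE_URL only checks syntax, not whether the host exists.

```php
<?php
// Hypothetical helper: prepend a default scheme when none is present,
// then let filter_var() decide whether the result is acceptable.
function normalizeUrl(string $input, string $defaultScheme = 'http'): ?string
{
    $url = trim($input);
    if ($url === '') {
        return null;
    }
    // A scheme is a letter followed by letters, digits, "+", "-" or ".", then "://".
    if (!preg_match('#^[a-z][a-z0-9+.\-]*://#i', $url)) {
        $url = $defaultScheme . '://' . ltrim($url, '/');
    }
    return filter_var($url, FILTER_VALIDATE_URL) !== false ? $url : null;
}

foreach (['google.com', 'www.msn.com', 'http://bing.com/'] as $candidate) {
    echo $candidate, ' => ', var_export(normalizeUrl($candidate), true), "\n";
}
```

Returning null for rejected input leaves it to the caller to decide whether to drop the entry or ask the user to correct it.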

How to use python/PHP to remove redundancy in URL link?

Many websites add tags to URLs for tracking purposes, such as
http://www.washingtonpost.com/blogs/answer-sheet/post/report-we-still-dont-know-much-about-charter-schools/2012/01/13/gIQAxMIeyP_blog.html?wprss=linkset&tid=sm_twitter_washingtonpost
If we remove the suffix "?wprss=linkset&tid=sm_twitter_washingtonpost", the link still goes to the same page.
Is there a general approach for removing such redundant elements? Any comment would be helpful.
Thanks!

To remove the query and fragment parts from a URL
In Python 3, using urllib.parse:
from urllib.parse import urlsplit, urlunsplit
parts = urlsplit(URL)  # parse url
print(urlunsplit(parts[:3] + ('', '')))  # remove query, fragment parts
Or a more lightweight approach, though it might be less universal:
print(URL.partition('?')[0])
According to RFC 3986, a URI can be parsed using the regular expression:
/^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/
Therefore if there is no fragment identifier (the last part in the above regex) or the query component is present (the 2nd to last part) then URL.partition('?')[0] should work, otherwise answers that split an url on '?' would fail e.g.,
http://example.com/path#here-?-ereh
but the URL-parsing answer above still works.
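For PHP readers, the same regular expression can be tried with preg_match(). This is a sketch only: the function name splitUri and the array keys are chosen for illustration, and the group numbers follow the parentheses in the expression quoted above.

```php
<?php
// Sketch: split a URI into scheme, authority, path, query and fragment
// using the regular expression quoted above.
function splitUri(string $uri): array
{
    $pattern = '/^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/';
    preg_match($pattern, $uri, $m);
    // Trailing groups that did not match are absent from $m, hence the ?? ''.
    return [
        'scheme'    => $m[2] ?? '',
        'authority' => $m[4] ?? '',
        'path'      => $m[5] ?? '',
        'query'     => $m[7] ?? '',
        'fragment'  => $m[9] ?? '',
    ];
}

$parts = splitUri('http://example.com/path#here-?-ereh');
// The "?" sits inside the fragment, so the query comes back empty.
var_dump($parts['path'], $parts['query'], $parts['fragment']);
```

For the example URL the path is /path, the query is empty and the fragment is here-?-ereh, which is why splitting on the first "?" alone gives the wrong result.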
To check whether you can access the page via the URL
In Python 3:
import urllib.request
try:
    resp = urllib.request.urlopen(URL)
except OSError as e:  # urllib.error.URLError is a subclass of OSError
    print("error: can't open %s, reason: %s" % (URL, e))
else:
    print("success, status code: %s, info:\n%s" % (resp.status, resp.info()))
resp.read() could be used to read the contents of the page.

To remove the query string from a URL:
<?php
$url = 'http://www.washingtonpost.com/blogs/answer-sheet/post/report-we-still-dont-know-much-about-charter-schools/2012/01/13/gIQAxMIeyP_blog.html?wprss=linkset&tid=sm_twitter_washingtonpost';
$url = explode('?',$url);
$url = $url[0];
//check output
echo $url;
?>
To check whether the URL is valid:
You can use the PHP function get_headers($url). Example:
<?php
//$url_o = 'http://www.washingtonpost.com/blogs/answer-sheet/post/report-we-still-dont-know-much-about-charter-schools/2012/01/13/gIQAxMIeyP_blog.html?wprss=linkset&tid=sm_twitter_washingtonpost';
$url_o = 'http://mobile.nytimes.com/article?a=893626&f=21';
$url = explode('?',$url_o);
$url = $url[0];
$header = get_headers($url);
if(strpos($header[0],'Not Found'))
{
$url = $url_o;
}
//check output
echo $url;
?>

You can use a regular expression:
$yourUrl = preg_replace("/[?].*/","",$yourUrl);
Which means: "replace the question mark and everything afterwards with an empty string".

You can write a small URL parser that cuts off everything from the "?" onward:
<?php
$pos = strpos($yourUrl, '?'); // First, find the index of "?"
// Then keep only the characters before the "?"; if there is no "?", keep the URL as-is
$newUrl = ($pos === false) ? $yourUrl : substr($yourUrl, 0, $pos);
echo $newUrl;
?>
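The answers in this thread that cut at the first "?" can be tripped up by a "?" inside the fragment, as in http://example.com/path#here-?-ereh. A sketch that splits off the fragment first is shown below; the function name stripQuery and the $keepFragment flag are assumptions for illustration.

```php
<?php
// Hypothetical helper: drop the query string (and optionally the fragment)
// without being confused by a "?" that appears inside the fragment.
function stripQuery(string $url, bool $keepFragment = false): string
{
    // Split off the fragment first: it starts at the first "#".
    $hashPos = strpos($url, '#');
    $fragment = '';
    if ($hashPos !== false) {
        $fragment = substr($url, $hashPos);
        $url = substr($url, 0, $hashPos);
    }
    // Whatever follows the first "?" in the remainder is the query.
    $queryPos = strpos($url, '?');
    if ($queryPos !== false) {
        $url = substr($url, 0, $queryPos);
    }
    return $keepFragment ? $url . $fragment : $url;
}

echo stripQuery('http://example.com/a.html?wprss=linkset&tid=x'), "\n";
// http://example.com/a.html
echo stripQuery('http://example.com/path#here-?-ereh', true), "\n";
// http://example.com/path#here-?-ereh
```

Note that this removes every query parameter; on sites where the query string selects the content, the result has to be checked before it replaces the original link.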

Isolate part of url with php and then print it in html element

I am building a gallery in WordPress and I'm trying to grab a specific part of my URL to echo into the id of a div.
This is my URL:
http://www.url.com/gallery/truck-gallery-1
I want to isolate the ID of the gallery, which will always be a number (in this case it's 1). Then I would like a way to print it somewhere, maybe in the form of a function.

It is better to use $_SERVER['REQUEST_URI']. Since the ID is the last part of your URL, you can use the following function:
function getIdFromUrl($url) {
    // array_pop() expects a variable, so store the exploded parts first
    $parts = explode('-', $url);
    return str_replace('/', '', array_pop($parts));
}
@Kristian's solution will only return single digits from 0-9, but this function will return an ID of any length, as long as the ID is the last element and is separated by a - sign.
So, when you call
echo getIdFromUrl($_SERVER['REQUEST_URI']);
it will echo, in your case, 1.

If the ID will not always have the same number of digits (if you have any IDs greater than 9), then you'll need something robust like preg_match(), or string functions that trim off everything before the last "-" character. I would probably do:
<?php
$parts = parse_url($_SERVER['REQUEST_URI']);
if (preg_match("/truck-gallery-(\d+)/", $parts['path'], $match)) {
$id = $match[1];
} else {
// no ID found! Error handling or recovery here.
}
?>

Use the $_SERVER['REQUEST_URI'] variable to get the path (note that this is not the same as the host variable, $_SERVER['HTTP_HOST'], which returns something like www.yoursite.com).
Then index into that string and take the final character.
$path = $_SERVER['REQUEST_URI'];
$ID = $path[strlen($path)-1];
Of course you can do other types of string manipulation to get the final character of a string. But this works.
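Putting the pieces from this thread together, a sketch that handles multi-digit IDs, an optional trailing slash and a query string might look like this. The function name getGalleryId and the gallery- prefix in the div id are illustrative assumptions.

```php
<?php
// Hypothetical helper: return the trailing number of a slug such as
// "/gallery/truck-gallery-12/" as an int, or null if there is none.
function getGalleryId(string $requestUri): ?int
{
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }
    // Digits after the last "-", optionally followed by a trailing slash.
    if (preg_match('#-(\d+)/?$#', $path, $m)) {
        return (int) $m[1];
    }
    return null;
}

// In WordPress the input would be $_SERVER['REQUEST_URI'].
$id = getGalleryId('/gallery/truck-gallery-1');
if ($id !== null) {
    echo '<div id="gallery-' . $id . '"></div>', "\n";
}
// <div id="gallery-1"></div>
```

Casting the match to int means only a plain number ever reaches the HTML attribute.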
