take facebook page url and store id and slug separately - php

I'm developing a web app where users enter their facebook page url either in this format:
http://www.facebook.com/pages/Graffiti/119622954518
or
http://www.facebook.com/thefirkinandfox
With php - how do I detect which format automatically, then split (explode?) the parts (the slug and the id or just the slug if the second version).
There is sometimes query data at the end of the url when viewing your own facebook page as an administrator, how do I detect and remove that? I think the answer will be regex of some kind - but I've really only used this to make sure an input is email and still didn't understand it that well... thanks in advance.
Possible entries may or may not include http:// at the beginning... I'd like to account for this...

If you want to use one regexp, try this:
$url = 'www.facebook.com/pages/Graffiti/119622954518';
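if (preg_match('#^(https?://)?(www\.)?facebook\.com/((pages/([^/?]+)/(\d+))|([^/?]+))#', $url, $matches)) {
    // Groups that did not take part in the match can come back as empty strings,
    // so test with !empty() rather than isset(). [^/?] keeps any query string out of the slug.
    $slug = !empty($matches[5]) ? $matches[5] : (!empty($matches[7]) ? $matches[7] : null);
    $id = !empty($matches[6]) ? $matches[6] : null;
}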

Two parts:
^http://www\.facebook\.com/pages/([^/]+)/([^/?]+)(?:\?.*)?$
If the first one doesn't match, use this:
^http://www\.facebook\.com/([^/?]+)(?:\?.*)?$
The "explode" step you mention is handled by the capturing groups: each group holds one part of the URL.
So the code might look something like this:
$subject = "my string";
// The third argument receives the captured groups; PHP concatenates strings with ".", not "+"
if (preg_match('#^http://www\.facebook\.com/pages/([^/]+)/([^/?]+)(?:\?.*)?$#', $subject, $groups))
    print($groups[1] . ' ' . $groups[2]);
else if (preg_match('#^http://www\.facebook\.com/([^/?]+)(?:\?.*)?$#', $subject, $groups))
    print($groups[1]);
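For illustration, both formats can be handled by one helper that also tolerates a missing http:// and a trailing query string. This is only a sketch: the function name parse_facebook_page_url and the shape of the returned array are made up for this example, and it assumes the id in the /pages/ form is always numeric.

```php
<?php
// Hypothetical helper (not from the answers above).
// Returns ['slug' => ..., 'id' => ...] or null if the URL is not a recognised page URL.
function parse_facebook_page_url(string $url): ?array
{
    // Drop any query string or fragment first (e.g. the admin-view parameters).
    $url = preg_replace('/[?#].*$/', '', trim($url));

    // Optional scheme, optional "www.", then either pages/<slug>/<id> or just <slug>.
    $pattern = '#^(?:https?://)?(?:www\.)?facebook\.com/(?:pages/([^/]+)/(\d+)|([^/]+))/?$#i';
    if (!preg_match($pattern, $url, $m)) {
        return null;
    }

    // Group 3 is only present when the short (slug-only) form matched.
    if (isset($m[3])) {
        return ['slug' => $m[3], 'id' => null];
    }
    return ['slug' => $m[1], 'id' => $m[2]];
}
```

Calling it with 'www.facebook.com/thefirkinandfox?ref=ts' should give the slug with a null id, while the /pages/Graffiti/119622954518 form gives both parts.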

Related

URL Validation/Sanitization with Regular Expressions

I'm a little out of my depth here, but I believe I'm now on the right track. I want to take user-supplied URLs and store them in a database so that the links can then be used on a user profile page.
The links I'm hoping users will supply will be for social media sites, Facebook and the like. While looking for a way to store user-supplied URLs safely, I found this page: http://electrokami.com/coding/use-php-to-format-and-validate-a-url-with-these-easy-functions/. The code works but seems to remove nearly everything. If I use "www.example.com/user.php?u=borris", it just returns example.com is valid.
Then I found out about regular expressions and found this line of code
/(?:https?:\/\/)?(?:www\.)?facebook\.com\/(?:(?:\w)*#!\/)?(?:pages\/)?(?:[\w\-]*\/)*([\w\-\.]*)/
from this site https://gist.github.com/marcgg/733592 and another stack overflow post Check if a string contains a url and get contents of url php.
I tried to merge the code together so that I get something that would validate the link for a facebook profile or page. I don't want to get profile info, pics etc but my code's not right either, so rather than getting deeper into stuff I don't fully understand yet I thought asking for help was best.
Below is the code I mashed together which gave me the error "Warning: preg_match_all() [function.preg-match-all]: Compilation failed: unmatched parentheses at offset 29... on line 9"
<?php
// get url to check from the page parameter 'url'
// or use default http://example.com
$text = isset($_GET['url'])
? $_GET['url']
: "http://www.vwrx-project.co.uk/user.php?u=borris";
$reg_exurl = "/(?:http|https|ftp|ftps)?:\/\/)?(?:www\.)?facebook\.com\/(?:(?:\w)*#!\/)?(?:pages\/)?(?:[\w\-]*\/)*([\w\-\.]*)/";
preg_match_all($reg_exurl, $text, $matches);
$usedPatterns = array();
$url = '';
foreach($matches[0] as $pattern){
if(!array_key_exists($pattern, $usedPatterns)){
$usedPatterns[$pattern] = true;
$url = $pattern;
}
}
?>
--------------------------------------------------------- Additional ------------------------------------------------------------
I took a fresh look at the answer Dave provided me with today and felt I could work with it, it makes more sense to me from a code perspective as I can follow the process etc.
I got a system I'm partly happy with. If I supply a link http://www.facebook.com/#!/lilbugga which is a typical link from facebook (when clicking on your username/profile pic from your wall) I can get the result http://www.facebook.com/lilbugga which shows as valid.
What it can't handle is the link from facebook that isn't in a vanity/seo friendly format such as https://www.facebook.com/profile.php?id=4. If I allow my code to accept ? and = then I suspect I'm leaving my website/database open to attack which I don't want.
What's the best option now? This is the code I have:
<?php
$dirty_url = "http://www.facebook.com/profile.php?id=4"; //user supplied link
//clean url leaving alphanumerics : / . only - required to remove facebook link format with /#!/
$clean_url = preg_replace('#[^a-z0-9:/.]#i', '', $dirty_url);
$parsed_url = parse_url($clean_url); //parse url to get breakdown of components
$safe_host = $parsed_url['host']; // safe host direct from parse_url
// str_replace to switch any // to a / inside the returned path - required due to preg_replace process above
echo $safe_path = str_replace("//", "/", ($parsed_url['path']));
if ($parsed_url['host'] == 'www.facebook.com') {
echo "Facebook";
} else {
echo " :( invalid url";
}
?>
Not sure exactly what you are trying to accomplish, but it sounds like you could use parse_url for this:
<?php
$parsed_url = parse_url($_GET['url']);
//assume it's "http://www.vwrx-project.co.uk/user.php?u=borris"
print_r($parsed_url);
/*
Array
(
[scheme] => http
[host] => www.vwrx-project.co.uk
[path] => /user.php
[query] => u=borris
)
*/
if ($parsed_url['host'] == 'www.facebook.com') {
//do stuff
}
?>
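As a rough sketch of how parse_url could cover the cases mentioned in the question (/#!/name links, vanity URLs, and profile.php?id=...), the hypothetical function below rebuilds the link from validated parts instead of stripping characters. The function name, the accepted hosts, the https output scheme, and the allowed character set for vanity names are all assumptions made for this example, not Facebook rules.

```php
<?php
// Hypothetical helper: accepts only Facebook URLs and returns a rebuilt, clean link (or null).
function normalize_facebook_url(string $input): ?string
{
    // parse_url() needs a scheme to identify the host, so add one if it is missing.
    if (!preg_match('#^https?://#i', $input)) {
        $input = 'http://' . $input;
    }
    $parts = parse_url($input);
    if ($parts === false || !isset($parts['host'])) {
        return null;
    }
    $host = strtolower($parts['host']);
    if ($host !== 'facebook.com' && $host !== 'www.facebook.com') {
        return null;
    }
    $path = $parts['path'] ?? '/';

    // Numeric profile links: keep only an all-digit id parameter.
    if ($path === '/profile.php') {
        parse_str($parts['query'] ?? '', $query);
        if (isset($query['id']) && is_string($query['id']) && ctype_digit($query['id'])) {
            return 'https://www.facebook.com/profile.php?id=' . $query['id'];
        }
        return null;
    }

    // Old-style "#!/name" links carry the real path in the fragment.
    if (isset($parts['fragment']) && strpos($parts['fragment'], '!/') === 0) {
        $path = substr($parts['fragment'], 1);
    }

    // Vanity links: allow a conservative character set only (an assumption).
    if (preg_match('#^/[A-Za-z0-9.\-]+/?$#', $path)) {
        return 'https://www.facebook.com' . rtrim($path, '/');
    }
    return null;
}
```

Because the URL is rebuilt from checked pieces, ? and = only ever appear in the fixed profile.php?id=<digits> form. The stored value should still be saved with prepared statements and passed through htmlspecialchars() when printed.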
I took a regex pattern from another source.
Use the matched groups:
(?:http|https|ftp|ftps(?:\/\/)?)?(?:www.|[-;:&=\+\$,\w]+#)([A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??((?:[-\+=&;%#.\w_]*)#?(?:[\w]*)?))
Online demo
Input:
www.example.com/user.php?u=borris
http://www.vwrx-project.co.uk/user.php?u=borris
Output:
MATCH 1
1. [4-15] `example.com`
2. [15-33] `/user.php?u=borris`
3. [25-33] `u=borris`
MATCH 2
1. [45-63] `vwrx-project.co.uk`
2. [63-81] `/user.php?u=borris`
3. [73-81] `u=borris`

extracting facebook photo id from a LONG url

I have searched this website for how to extract a Facebook photo id from a URL that starts with photo.php?fbid=, but I have a long URL and don't know how to get the photo id from it.
Example1 : photo.php?fbid=10151987845617397 (the complete url is stored in the $url variable which is checked using preg_match i believe)
!preg_match("|^http(s)?://(www.)?facebook.com/photo.php(.*)?$|i", $url) || !$pid
the above code fetches facebook id 10151987845617397 and puts it in the variable $pid.
If I have a long url, how can i change the code?
Here is the url
Example2 : https://www.facebook.com/nokia/photos/a.338008237396.161268.36922302396/10151987845617397/?type=1&theater
In the above url 10151987845617397 is the photo id that i need to capture and put it in variable $pid.
what changes do i need to do in the preg_match string?
In other words to get the photoid 10151987845617397 as output in the $pid variable:
For url facebookcom/photo.php?fbid=10151987845617397
The syntax is !preg_match("|^http(s)?://(www.)?facebook.com/photo.php(.*)?$|i", $url) || !$pid
So for url facebookcom/nokia/photos/a.338008237396.161268.36922302396/10151987845617397/?type=1&theater
What would be the syntax
Please help
Thanks
A simple and quite readable solution: use the entire URL as a regex, with () around each part you want to capture:
// $tmp[1] = "www." or nothing
// $tmp[2] = "user" (i.e. nokia)
// $tmp[3] = "photos"
// $tmp[4] = album id (a.338008237396.161268.36922302396)
// $tmp[5] = photo id, as requested
function extract_id_from_album_url($url) {
    preg_match('/https?:\/\/(www\.)?facebook\.com\/([a-zA-Z0-9_\- ]*)\/([a-zA-Z0-9_\- ]*)\/([a-zA-Z0-9_\.\-]*)\/([a-zA-Z0-9_\-]*)(\/\?type=1&theater\/)?/i', $url, $tmp);
    return isset($tmp[5]) ? $tmp[5] : false;
}
Backslashes are needed to ensure the . is treated as a literal dot (and not as regex syntax). Question marks make parts of the URL optional. Using more regex syntax can make the pattern much shorter and more extensible, but also harder to read.
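For comparison, here is a sketch that handles both URL shapes from the question (photo.php?fbid=... and the long /photos/ form) by letting parse_url split the URL first. The function name extract_photo_id is hypothetical, and the code assumes the photo id is the last all-digit path segment after /photos/<album>/.

```php
<?php
// Hypothetical helper: returns the photo id as a string, or null if none is found.
function extract_photo_id(string $url): ?string
{
    // Short form: photo.php?fbid=<digits>
    parse_str((string) parse_url($url, PHP_URL_QUERY), $query);
    if (isset($query['fbid']) && is_string($query['fbid']) && ctype_digit($query['fbid'])) {
        return $query['fbid'];
    }

    // Long form: /<page>/photos/<album>/<digits>/
    $path = (string) parse_url($url, PHP_URL_PATH);
    if (preg_match('#/photos/[^/]+/(\d+)/?$#', $path, $m)) {
        return $m[1];
    }
    return null;
}
```

The id is kept as a string on purpose, since ids this long can exceed the integer range on 32-bit builds.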

How to make codeigniter user profile url look pretty

I am trying to figure out the best way to make a nice looking profile URL in codeigniter.
Normally I would just link to a controller with a third URL parameter (the profile id), like so:
http://thesite.com/profile/4
This isn't going to work for the site i'm building now because I want a nice looking url with the company name, like so:
http://thesite.com/profile/some-company-name
There are 2 problems with this, and maybe I'm just not thinking straight today and the answer is obvious. If the URL is a hyphenated version of the company name and two companies with the same name are in the database for some weird reason, then the profile shown might not be the correct one. Really, the only good way is to provide a profile id and pull by the id, but then my URL doesn't look good...
How would you handle this situation? I guess i could always link to /profile/id and then in the controller just look up the profile and redirect to /profile/company-name, but then the user that went to the site and typed in /profile/company-name would get a 404 page.
Any good ideas for me?
Check the user guide; it shows how to do this with the URL Helper:
Set up a model to handle your getter / setter (for the profile name).
The getter fetches the proper content to display; the setter sets the profile name and handles duplicates (same names, e.g. by appending a 1). Think of a creative way to eliminate collisions, such as adding the state name or zip code, if you don't want something ugly.
Use url_title() to produce clean URLs and eliminate odd characters:
$title = "What's wrong with CSS?";
$url_title = strtolower(url_title($title));
// Produces: whats-wrong-with-css
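The duplicate handling described above could look roughly like this. It is a plain-PHP sketch: unique_slug is a hypothetical name, the numeric suffix scheme is just one option, and the $taken array stands in for whatever database lookup the model would really perform.

```php
<?php
// Hypothetical helper: appends -2, -3, ... until the slug is not already taken.
// $taken stands in for a database query of existing profile slugs.
function unique_slug(string $slug, array $taken): string
{
    $candidate = $slug;
    $suffix = 2;
    while (in_array($candidate, $taken, true)) {
        $candidate = $slug . '-' . $suffix;
        $suffix++;
    }
    return $candidate;
}
```

The setter would call this once when the profile is created and store the result, so later lookups by slug always hit exactly one row.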
Use PHP's urlencode and urldecode together with routing.
Add this to routes.php in the config directory:
$route['profile/(:any)'] = "controller_link/$1";
Then you can use urldecode in your controller file, like this:
public function controller_link($name=FALSE)
{
if ($name != FALSE) {//check if name is passed
$queryname = urldecode($name);
//then query on that name in the database
}
}
if not clear then add comment.
Use a unique company name in the database and look the company up by name. I use this helper function to turn a name into a slug:
if(!function_exists('slug')){
function slug($text)
{
// replace non letter or digits by -
$text = preg_replace('~[^\\pL\d]+~u', '-', $text);
// trim
$text = trim($text, '-');
// transliterate
$text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
// lowercase
$text = strtolower($text);
// remove unwanted characters
$text = preg_replace('~[^-\w]+~', '', $text);
return (!empty($text)) ? $text : FALSE ;
}
}

Remove certain part of string in PHP [duplicate]

I've already seen a bunch of questions on this exact subject, but none seem to solve my problem. I want to create a function that will remove everything from a website address, except for the domain name.
For example if the user inputs: http://www.stackoverflow.com/blahblahblah I want to get stackoverflow, and the same way if the user inputs facebook.com/user/bacon I want to get facebook.
Does anyone know of a function or a way to remove certain parts of strings? Maybe it would search for http and, when found, remove everything up to and including the //. Then it would search for www and, if found, remove everything up to the first dot. Then it would keep everything until the next dot and remove everything after it. Looking at it now, this might cause problems with sites such as http://www.en.wikipedia.org because I'd be left with only en.
Any ideas (preferably in PHP, but JavaScript is also welcome)?
EDIT 1:
Thanks to great feedback I think I've been able to work out a function that does what I want:
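function getdomain($url) {
    $parts = parse_url($url);
    // parse_url() only finds a host when a scheme is present, so add one if it is missing
    // (checking != 'http' would also prefix https URLs and warn when no scheme exists)
    if (!isset($parts['scheme'])) {
        $url = 'http://'.$url;
    }
    $parts2 = parse_url($url);
    $host = $parts2['host'];
    $remove = explode('.', $host);
    $result = $remove[0];
    if ($result == 'www') {
        $result = $remove[1];
    }
    return $result;
}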
It's not perfect, at least when it comes to subdomains, but I think it's possible to do something about that. Maybe add a second if statement at the end to check the length of the array: if it's bigger than two, choose item 1 instead of item 0. This obviously gives me trouble with any domain using .co.uk (because that will be three items long, but I don't want to return co). I'll try to work on it a little more and see what I come up with. I'd be glad if some of you PHP gurus out there could take a look as well. I'm not as skilled or as experienced as any of you... :P
Use parse_url to split the URL into the different parts. What you need is the hostname. Then you will want to split it by the dot and get the first part:
$url = 'http://facebook.com/blahblah';
$parts = parse_url($url);
$host = $parts['host']; // facebook.com
$foo = explode('.', $host);
$result = $foo[0]; // facebook
You can use the parse_url function from PHP which returns exactly what you want - see
Use parse_url in PHP to get the host (e.g. domain.com), then replace .com with an empty string.
I am a little rusty on my regular expressions, but this should work.
$url='http://www.en.wikipedia.org';
$domain = parse_url($url, PHP_URL_HOST); // Returns www.en.wikipedia.org
$domain = preg_replace('/\.com$|\.org$/', '', $domain); // preg patterns need delimiters; gives www.en.wikipedia
http://php.net/manual/en/function.parse-url.php
PHP REGEX: Get domain from URL
http://rubular.com/r/MvyPO9ijnQ //Check regular expressions
You're looking for info on Regular Expression. It's a bit complicated, so be prepared to read up. In your case, you'll best utilize preg_match and preg_replace. It searches for a match based on your pattern and replaces the matches with your replacement.
preg_match
preg_replace
I'd start with a pattern like this: find .com, .net or .org and delete it and everything after it. Then find the last . and delete it and everything in front of it. Finally, if // exists, delete it and everything in front of it.
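// preg_replace() returns the new string, so assign it back to $url; dots are escaped to match literally
if (preg_match("/^http:\/\//i", $url))
    $url = preg_replace("/^http:\/\//i", "", $url);
if (preg_match("/www\./i", $url))
    $url = preg_replace("/www\./i", "", $url);
if (preg_match("/\.com/i", $url))
    $url = preg_replace("/\.com/i", "", $url);
if (preg_match("/\/*$/", $url))
    $url = preg_replace("/\/*$/", "", $url);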
^ = at the start of the string
i = case insensitive
\ = escape char
$ = the end of the string
This will have to be played around with and tweaked, but it should get you pointed in the right direction.
Javascript:
document.domain.replace(".com","")
PHP:
$url = 'http://google.com/something/something';
$parse = parse_url($url);
echo str_replace(".com","", $parse['host']); //returns google
This is quite a quick method but should do what you want in PHP:
function getDomain( $URL ) {
return explode('.',$URL)[1];
}
I will update it when I get the chance, but basically it splits the URL into pieces at each full stop and returns the second item. That is the domain only when the URL starts with www. (as in http://www.stackoverflow.com/...); a bit more logic would be required for URLs without www or for longer domains such as www.abc.xyz.com.
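Pulling the ideas from these answers together, a sketch along the following lines copes with a missing scheme, a leading www, extra subdomains, and a few two-part suffixes such as .co.uk. The function name registrable_label is hypothetical, and the $twoPartSuffixes list is a tiny illustrative sample; a complete solution would need a full list of public suffixes.

```php
<?php
// Hypothetical helper: returns the label just before the suffix (e.g. "stackoverflow"), or null.
function registrable_label(string $url): ?string
{
    // parse_url() only reports a host when a scheme is present.
    if (!preg_match('#^[a-z][a-z0-9+.\-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }

    $labels = explode('.', strtolower($host));

    // Illustrative only: a real list of two-part suffixes is much longer.
    $twoPartSuffixes = ['co.uk', 'org.uk', 'com.au', 'co.jp'];
    $lastTwo = implode('.', array_slice($labels, -2));
    $suffixLength = in_array($lastTwo, $twoPartSuffixes, true) ? 2 : 1;

    // The wanted label sits immediately before the suffix.
    $index = count($labels) - $suffixLength - 1;
    return $index >= 0 ? $labels[$index] : null;
}
```

Counting from the right-hand end of the host, rather than from the left, is what lets http://www.en.wikipedia.org come out as wikipedia instead of en.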

Rewrite Youtube URL

I have some YouTube URLs stored in a database that I need to rewrite.
They are stored in this format:
http://youtu.be/IkZuQ-aTIs0
I need to have them re-written to look like this:
http://youtube.com/v/IkZuQ-aTIs0
These values are stored as a variable $VideoType
I'm calling the variable like this:
<?php if ($video['VideoType']){
    echo "<a rel=\"shadowbox;width=700;height=400;player=swf\" href=\"" . $video['VideoType'] . "\">View Video</a>";
}?>
How do I rewrite them?
Thank you for the help.
You want to use the preg_replace function:
Something like:
$oldurl = 'youtu.be/blah';
$pattern = '/youtu\.be/'; // escape the dot so it only matches a literal "."
$replacement = 'youtube.com/v';
$newurl = preg_replace($pattern, $replacement, $oldurl);
You can use a regular expression to do this for you. If you have ONLY YouTube URLs stored in your database, it would be sufficient to take the part after the last slash ('IkZuQ-aTIs0') and place it in the href attribute after 'http://www.youtube.com/v/'.
For this simple solution, do something like this:
<?php
if ($video['VideoType']) {
    $last_slash_position = strrpos($video['VideoType'], "/");
    // +1 skips the slash itself so only the video code remains
    $youtube_url_code = substr($video['VideoType'], $last_slash_position + 1);
    echo "<a rel=\"shadowbox;width=700;height=400;player=swf\"
    href=\"http://www.youtube.com/v/".$youtube_url_code."\">
    View Video</a>";
}
?>
I cannot test it at the moment, so you may need to experiment with the position of the last slash. You can also have a look at the function definitions:
http://www.php.net/manual/en/function.substr.php
http://www.php.net/manual/en/function.strrpos.php
However, be aware of the performance. Build a script that parses your database and converts every URL, or store both a short and a long URL in each entry, because running regular expressions in the view is never a good idea.
UPDATE: it would be even better to store ONLY the youtube video identifier / url code in the database for every entry, so in the example's case it would be IkZuQ-aTIs0.
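As a small sketch of the conversion itself, the hypothetical function below pulls the video code out of a youtu.be link and builds the youtube.com/v/ form asked for in the question. The function name is made up, and the code assumes video codes contain only letters, digits, "-" and "_".

```php
<?php
// Hypothetical helper: converts a youtu.be short link to the youtube.com/v/ form, or returns null.
function youtube_embed_url(string $shortUrl): ?string
{
    // Assumption: the video code consists of letters, digits, "-" and "_".
    if (preg_match('#^(?:https?://)?youtu\.be/([A-Za-z0-9_\-]+)#i', $shortUrl, $m)) {
        return 'http://youtube.com/v/' . $m[1];
    }
    return null;
}
```

The same function could be run once over the table to store the converted URL (or just the video code), so nothing has to be rewritten at display time.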
