Remove certain part of string in PHP [duplicate] - php

I've already seen a bunch of questions on this exact subject, but none seem to solve my problem. I want to create a function that will remove everything from a website address, except for the domain name.
For example if the user inputs: http://www.stackoverflow.com/blahblahblah I want to get stackoverflow, and the same way if the user inputs facebook.com/user/bacon I want to get facebook.
Does anyone know of a function or a way to remove certain parts of a string? Maybe it would search for http and, when found, remove everything up to and including the //. Then it would search for www and, if found, remove everything up to the first dot. Then it would keep everything until the next dot and remove everything after it. Looking at it now, this might cause problems with sites such as http://www.en.wikipedia.org, because I'd be left with only en.
Any ideas (preferably in PHP, but JavaScript is also welcome)?
EDIT 1:
Thanks to great feedback I think I've been able to work out a function that does what I want:
function getdomain($url) {
    // parse_url() only reports a host when the URL has a scheme,
    // so prepend one if it is missing (https URLs are left alone).
    $parts = parse_url($url);
    if (empty($parts['scheme'])) {
        $url = 'http://' . $url;
    }
    $parts2 = parse_url($url);
    $host = $parts2['host'];
    $remove = explode('.', $host);
    $result = $remove[0];
    if ($result == 'www') {
        $result = $remove[1];
    }
    return $result;
}
It's not perfect, at least where subdomains are concerned, but I think it's possible to do something about it. Maybe add a second if statement at the end to check the length of the array: if it has more than two items, choose item 1 instead of item 0. This obviously gives me trouble with any domain using .co.uk (because that'll be three items long, but I don't want to return co). I'll work on it a little more and see what I come up with. I'd be glad if some of you PHP gurus out there could take a look as well. I'm not as skilled or as experienced as any of you... :P

Use parse_url to split the URL into the different parts. What you need is the hostname. Then you will want to split it by the dot and get the first part:
$url = 'http://facebook.com/blahblah';
$parts = parse_url($url);
$host = $parts['host']; // facebook.com
$foo = explode('.', $host);
$result = $foo[0]; // facebook
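Building on the idea above, here is a minimal sketch that also copes with input that has no scheme, a leading www, and a few two-part suffixes such as .co.uk. The function name extractDomainLabel and the short suffix list are assumptions made for illustration only. A robust solution would consult the full Public Suffix List rather than a hard-coded array.

```php
<?php
// Hypothetical helper (not from the original answer): returns the label that
// sits directly before the public suffix, e.g. "stackoverflow".
function extractDomainLabel($url)
{
    // parse_url() only fills in the host when a scheme is present,
    // so add one for input such as "facebook.com/user/bacon".
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $url)) {
        $url = 'http://' . $url;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }

    $labels = explode('.', strtolower($host));

    // Assumption: a tiny, incomplete list of two-label suffixes for illustration.
    $twoLabelSuffixes = array('co.uk', 'org.uk', 'com.au', 'co.jp');
    $suffixLength = 1;
    if (count($labels) >= 3
        && in_array(implode('.', array_slice($labels, -2)), $twoLabelSuffixes, true)) {
        $suffixLength = 2;
    }

    // The wanted label is the one immediately before the suffix.
    $index = count($labels) - $suffixLength - 1;
    return $index >= 0 ? $labels[$index] : $labels[0];
}
```

With this sketch, http://www.stackoverflow.com/blahblahblah gives stackoverflow, facebook.com/user/bacon gives facebook, http://www.en.wikipedia.org gives wikipedia, and https://www.bbc.co.uk/news gives bbc. Counting from the end of the host rather than from the start is what keeps subdomains from getting in the way.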

You can use PHP's parse_url function, which returns the host as one of the URL's components.

Use parse_url in PHP to get the host (for example domain.com), then replace the .com with an empty string.
I am a little rusty on my regular expressions, but this should work.
$url = 'http://www.en.wikipedia.org';
$domain = parse_url($url, PHP_URL_HOST); // returns www.en.wikipedia.org
$domain = preg_replace('/\.(com|org)$/', '', $domain); // www.en.wikipedia
http://php.net/manual/en/function.parse-url.php
PHP REGEX: Get domain from URL
http://rubular.com/r/MvyPO9ijnQ //Check regular expressions

You're looking for regular expressions. They're a bit complicated, so be prepared to read up. In your case, preg_match and preg_replace are the most useful: preg_match tests whether a pattern matches, and preg_replace replaces every match with your replacement and returns the new string.
preg_match
preg_replace
I'd start with a pattern like this: find .com, .net or .org and delete it and everything after it. Then find the last . and delete it and everything in front of it. Finally, if // exists, delete it and everything in front of it.
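// preg_replace returns the new string, so assign the result back to $url.
// Literal dots are escaped; an unescaped . matches any character.
if (preg_match("/^http:\/\//i", $url))
    $url = preg_replace("/^http:\/\//i", "", $url);
if (preg_match("/www\./i", $url))
    $url = preg_replace("/www\./i", "", $url);
if (preg_match("/\.com/i", $url))
    $url = preg_replace("/\.com/i", "", $url);
if (preg_match("/\/+$/", $url))
    $url = preg_replace("/\/+$/", "", $url);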
^ = at the start of the string
i = case insensitive
\ = escape char
$ = the end of the string
This will have to be played around with and tweaked, but it should get you pointed in the right direction.

Javascript:
document.domain.replace(".com","")
PHP:
$url = 'http://google.com/something/something';
$parse = parse_url($url);
echo str_replace(".com","", $parse['host']); //returns google

This is quite a quick method but should do what you want in PHP:
function getDomain( $URL ) {
return explode('.',$URL)[1];
}
I will update it when I get a chance, but basically it splits the URL into pieces at each full stop and returns the second item, which should be the domain when the URL starts with www. A bit more logic would be required for URLs without www or for longer domains such as www.abc.xyz.com, but for URLs like www.example.com it would suffice.

Related

php __FILE__ inside includes?

I have (maybe) an unusual issue with using __FILE__ in a file within a file.
I created a snippet of code (in the php 5 my server mandates) to take elements of the current filename and put it into a variable to use later. After some headache, I got it working totally fine. However, I realized I didn't want to have to write it every time and realized "oh no, if I include this it's only going to work on the literal filename of the include". If I wanted to grab the filename of the page the user is looking at, as opposed to the literal name of the included file, what's the best approach? Grab the URL from the address bar? Use a different magic variable?
EDIT1: Example
I probably should have provided an example in the first draft, pfft. Say I have numbered files, and the header where the include takes place in is 01header.php, but the file it's displayed in is Article0018.html. I used:
$bn = (int) filter_var(__FILE__, FILTER_SANITIZE_NUMBER_INT);
…to get the article number, but realized it would get the 1 in the header instead.
EDIT2: Temporary Solution
I've """solved""" the issue by creating a function to get the URL / URI and putting it into the variable $infile, and replaced all former appearances of __FILE__ with $infile, like so:
function getAddress() {
$protocol = $_SERVER['HTTPS'] == 'on' ? 'https' : 'http';
return $protocol.'://'.$_SERVER['HTTP_HOST'].$_SERVER['REQUEST_URI'];
}
$infile = urlencode(getAddress());
$bn = (int) filter_var($infile, FILTER_SANITIZE_NUMBER_INT);
echo "$bn";
So if the file the user is looking at is called "005-extremelynormalfile.html", I can display the number 5 inside the page, e.g., to say it's article number five.
While it's not as bad as I initially thought based on your description, your code is still very fragile and really only works by accident. If the URL contains any other digits or hyphens it goes wrong, as shown here:
$infile = 'https://example-123.com/foo/42/bar/005-extremelynormalfile.html?x=8&y=9';
var_dump(
filter_var($infile, FILTER_SANITIZE_NUMBER_INT),
(int)filter_var($infile, FILTER_SANITIZE_NUMBER_INT)
);
Output:
string(12) "-12342005-89"
int(-12342005)
Sanitize functions are a blunt instrument for destroying data, and should only ever be used as a last resort when all other good sense has failed.
You need to use a proper parsing function to parse the url into its component parts, and then a simple regular expression to get what you want out of the filename.
function getIdFromURL($url) {
$url_parts = parse_url($url);
$path = $url_parts['path'];
$path_parts = explode('/', $path);
$filename = end($path_parts);
if( preg_match('/^(\d+)/', $filename, $matches) ) {
return (int)$matches[1];
}
return null;
}
var_dump(
getIdFromURL($infile)
);
Lastly, a lot of people are tempted to cram as much logic as possible into a regular expression. If I wanted to, the above could be a single regex, but it would also be rigid, unreadable, and unmaintainable. Use regular expressions sparingly, as there's nearly always a parser or library that already does what you want, or the majority of it.
Quickly threw together a function that gets the url from the page as a variable, and replaced all occurrences of __FILE__ with that variable, and it worked correctly. Assuming the user cannot edit the URL / URI in any way, this should work well enough.
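On the question of a different variable: inside an included file, __FILE__ always names the include itself, whereas $_SERVER['SCRIPT_FILENAME'] holds the path of the script that is handling the request. The sketch below assumes the page the user is viewing is that entry script and that the article number is the first run of digits in its file name. The helper name getNumberFromFilename is hypothetical.

```php
<?php
// Hypothetical helper: returns the first run of digits in a file name as an
// integer, or null when the name contains no digits.
function getNumberFromFilename($path)
{
    $filename = basename($path); // strip any directory part
    if (preg_match('/\d+/', $filename, $matches)) {
        return (int) $matches[0];
    }
    return null;
}

// In the shared header include: use the entry script's path, not __FILE__.
// Falls back to __FILE__ if the server does not provide SCRIPT_FILENAME.
$entryScript = isset($_SERVER['SCRIPT_FILENAME']) ? $_SERVER['SCRIPT_FILENAME'] : __FILE__;
$bn = getNumberFromFilename($entryScript);
```

For a path ending in Article0018.html this yields 18, and for 005-extremelynormalfile.html it yields 5. Because only the file name is inspected, digits elsewhere in the host, directory names, or query string do not affect the result.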

PHP URL Variable Appending

Hoping this is a simple and easy question. I've seen multiple examples of, and know how to append variables to the URL (i.e. mydomain.com/index.php?id=1&stat=0), but my question is this:
If I have a page on my site that already has variables in the URL (i.e. mydomain.com/tickets.php?stat=Open), how can I append a page number to the end of that URL (i.e. mydomain.com/tickets.php?stat=Open&page=2). This is for pagination purposes of a table with values from my database, that includes a search and select function (select open, closed, or all tickets, and search for a specific ticket number).
I've done several searches with google, and came up dry, as most topics regarding this have you hardcode the url with variables from the get go, and not append them. I may just be using the wrong search parameters as well, and am not sure what to search for exactly.
Any help or insight on this would be greatly appreciated, thank you.
Please note I wish to do this solely in PHP, HTML, and MySQLi. I want to refrain from using javascript or ajax if possible for my clients that may have those features disabled on their browsers.
You can do it this way:
<?php
$domain = "mydomain.com";
$page = "tickets.php?";
$full_page_url = $domain.'/'.$page;
$arr = array('stat' => 'Open', 'page' =>2);
$add= http_build_query($arr);
$correct_url = $full_page_url. $add;
echo $correct_url;
?>
Output: mydomain.com/tickets.php?stat=Open&page=2
I would do it like this:
$page = 2;
$url = 'mydomain.com/tickets.php?stat=Open';
if( false !== strpos($url, '?')){
//if url has a ? split it.
$arr_url = explode('?', $url);
//convert query string to array, $array=['stat'=>'Open']
parse_str($arr_url[1], $array);
//add or replace page by array key
$array['page'] = $page;
//convert it back to a query string.
$query = http_build_query($array);
print_r($query);
}
Outputs
stat=Open&page=2
It's a simple matter of putting $query back together with $arr_url[0]. I'll leave this up to you, but here is a hint: $arr_url[0].'?'.$query
The advantage here is that you don't have to worry about getting into a situation where you are adding page after page after page after...
Like this:
mydomain.com/tickets.php?stat=Open&page=1&page=2&page=3
You can't simply concatenate it onto the end of the url, and it's probably just as hard to remove it as it is to parse the query string.
As a side note, you could just use $_GET, which is the query string already parsed into an array (so you could skip parse_str), but where is the fun in that? It also may not be available, for example if you are building the link from a string rather than from the current request.
So I thought I would show it with parse_str to cover the "harder" case.
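Putting those steps together, a minimal sketch might look like the following. The function name setQueryParam is hypothetical, and the sketch assumes the URL has no #fragment part.

```php
<?php
// Hypothetical helper: sets (or replaces) a single query parameter on a URL
// string and returns the rebuilt URL.
function setQueryParam($url, $key, $value)
{
    // Split into "everything before ?" and the query string, if there is one.
    $parts = explode('?', $url, 2);

    $params = array();
    if (isset($parts[1])) {
        parse_str($parts[1], $params); // query string -> array
    }

    // Assigning by key replaces an existing value instead of appending a duplicate.
    $params[$key] = $value;

    return $parts[0] . '?' . http_build_query($params);
}
```

Calling setQueryParam('mydomain.com/tickets.php?stat=Open', 'page', 2) returns mydomain.com/tickets.php?stat=Open&page=2, and calling it again on that result with page 3 returns mydomain.com/tickets.php?stat=Open&page=3 rather than stacking a second page parameter.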
One last thing: if you are just building a bunch of URLs that are all the same except for the page part, the obvious answer is to set up a base URL and then just loop over the numbers.
$url = 'mydomain.com/tickets.php?stat=Open';
$pagedUrls = [];
$numberPages = 10;
for ($i = 1; $i <= $numberPages; $i++) {
    // Build each URL from the unchanged base; using .= would keep appending to $url.
    $pagedUrls[] = $url . '&page=' . $i;
}
Or what have you for the number of pages.
It's really not that clear in your question exactly what you are trying to do..
Hope that helps.

Make the URL path count backwards

I learned how to parse a URL and return a specific part of it.
For now, I'm currently working in a localhost server, which contains a long basename:
localhost/mydocs/project/wordpress/mexico/cancun
If I want to get the word mexico, I have to count up to index 4.
$url = 'localhost/mydocs/project/wordpress/mexico/cancun';
$parse = parse_url($url);
$path = explode('/', $parse['path']);
echo $path[4]; // mexico
Even though it works fine on localhost, when I upload to the server the base path gets shorter and index 4 no longer reaches mexico, because the URL becomes:
example.com/mexico/cancun
I'd like to know if there is a general solution for this. I thought about counting backwards, for example using -2, so that it would start counting from the word "cancun", but I don't know whether that is possible!
Thank you!
Use $path[count($path) - 2], where the 2 is the configurable part.
Note that this only works for arrays with sequential numeric indices, as in your case.
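As a minimal sketch of that approach, the helper below counts path segments from the end and returns null when there are too few. The function name segmentFromEnd is hypothetical, and empty segments caused by leading or trailing slashes are dropped before counting.

```php
<?php
// Hypothetical helper: returns the path segment that is $fromEnd places from
// the end (1 = last segment, 2 = second to last), or null if it does not exist.
function segmentFromEnd($url, $fromEnd)
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }

    // Split on "/" and discard empty pieces from leading or trailing slashes.
    // array_values() re-indexes so the keys are sequential again.
    $segments = array_values(array_filter(explode('/', $path), 'strlen'));

    $index = count($segments) - $fromEnd;
    if ($fromEnd < 1 || $index < 0) {
        return null;
    }
    return $segments[$index];
}
```

Both segmentFromEnd('localhost/mydocs/project/wordpress/mexico/cancun', 2) and segmentFromEnd('example.com/mexico/cancun', 2) return mexico, so the same call works on the local and the live server.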

preg_match php returning text between two search paramaters

If this is a duplicate I apologise, but I couldn't find anything on Google after hours of searching. I'm pretty new to string manipulation and don't really know the correct terminology to find the information I want.
Basically I am manipulating this string
Date Time Name IP UniqueID
$line = '02.12.2013 16:00:03: Connor Bergolio (75.13.15.229:5557) fcfd6ba862c7461a88e2b13babc691dd';
So I am trying to retrieve the name. However, since users can choose whatever name they want, it could contain 1 space or 10 spaces, so explode is out of the question.
Now I was wondering if it is possible to run preg_match using 2 variables, so that it returns the text between them:
$pattern1 = '$time, $ip';
preg_match($pattern1, $line, $name);
Looking at that, it's way off, but I'm pretty much at a loss.
I'm using
$IPpattern = '/([0-9-():.)]{19,23})/';
to get the IP. Maybe I could use that together with a search for the time?
Thanks in advance.
The following pattern will work:
preg_match('/^(.{19}): (.+?) +\(([0-9:.]+)\) ([a-f0-9]+)$/', $line, $matches);
$date = $matches[1];
$name = $matches[2];
$ip = $matches[3];
$uniqueId = $matches[4];
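The snippet above assumes the line always matches. A minimal sketch that wraps the same pattern in a function and returns null for lines that do not match could look like this. The function name parseLogLine and the array keys are hypothetical.

```php
<?php
// Hypothetical wrapper around the pattern above. Returns the four fields as an
// associative array, or null when the line does not have the expected shape.
function parseLogLine($line)
{
    // 19 characters of date and time, ": ", the name (lazy, so it stops at the
    // spaces before the bracket), "(ip:port)", then the hex unique id.
    $pattern = '/^(.{19}): (.+?) +\(([0-9:.]+)\) ([a-f0-9]+)$/';

    if (!preg_match($pattern, $line, $m)) {
        return null;
    }

    return array(
        'date'     => $m[1],
        'name'     => $m[2],
        'ip'       => $m[3],
        'uniqueId' => $m[4],
    );
}
```

Because the name group is bounded by the fixed-width timestamp on the left and the opening bracket on the right, names with any number of internal spaces are captured whole. Note that the ip field includes the port, as in the original pattern.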
Not knowing the valid characters allowed for a username, or any of the rules governing the format of the Date and Time fields, the following should work:
.*(?:\d\d:){3}\s*\K.+(?=\s?\()
EXPLAINED
.*(?:\d\d:){3}\s*\K - Match everything up to Time field then drop it with \K
.+(?=\s?\() - Match anything one or more times up to but not including the first bracket found
It's not efficient though :(
I'm using this regex to retrieve the name:
/[\d{2}.]{2}\d{4}\s(\d{2}:){3}\s(.+)(\s\(.+)/
Have a play, the second result from this is your name.

Get YouTube ID using JavaScript .match()

So I have a working preg_match in PHP, however, for the life of me, I cannot get the same function to work using Javascript/jQuery.
This is what I am stuck on currently:
yt=$('#yt').val().match(/~^\(?:https?://\)?(?:www\.)?(?:youtube\.com|youtu\.be)(?:/)(?:watch\?v=)?([^&]+)~x/);
alert(yt[1]);
This is the working function in PHP:
$rx = "~"
."^(?:https?://)?" // Optional protocol
."(?:www\.)? " // Optional subdomain
."(?:youtube\.com|youtu\.be)" // Mandatory domain name
."(?:/)" //mandatory bracket
."(?:watch\?v=)?" //optional URI
."([^&]+)" //video id as capture group 1
."~x";
$has_match = preg_match($rx, $url, $matches);
Any idea how to get this functioning?
I found some similar posts on Stack, but they are far less complex than this regex, and I couldn't get my head wrapped around the differences.
Not 100% sure, but I think you haven't escaped everything correctly. The ~ delimiters and the x flag are PHP syntax. In JavaScript, use a regex literal and escape the forward slashes:
yt = $('#yt').val().match(/^(?:https?:\/\/)?(?:www\.)?(?:youtube\.com|youtu\.be)\/(?:watch\?v=)?([^&]+)/);
alert(yt[1]);
For example,
"https://www.youtube.com/watch?v=dQw4w9WgXcQ".match(/^(?:https?:\/\/)?(?:www\.)?(?:youtube\.com|youtu\.be)\/(?:watch\?v=)?([^&]+)/);
results in
["https://www.youtube.com/watch?v=dQw4w9WgXcQ", "dQw4w9WgXcQ"]
