Possible Duplicate:
Regular expression for URL validation (in JavaScript)
So I've seen many similar questions and answers but can't find a solution that fits my specific needs.
I'm terrible at regexes and am struggling to get a simple regex for the following URL validation:
domain.com
domain.com/folder
subdomain.domain.com
subdomain.domain.com/folder
It would also be super helpful to validate for an optional http:// and http://www. prefix. Thanks!
As near as I can get would be:
/[a-z]+:\/\/(([a-z0-9][a-z0-9-]+\.)*[a-z][a-z]+|(0x[0-9A-F]+)|[0-9.]+)\/.*/
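A quick sketch of how you might drop that pattern into preg_match(); the sample URLs below are just illustrations:
<?php
// Feed the pattern above to preg_match(); it demands a protocol and a trailing slash.
$pattern = '/[a-z]+:\/\/(([a-z0-9][a-z0-9-]+\.)*[a-z][a-z]+|(0x[0-9A-F]+)|[0-9.]+)\/.*/';

var_dump(preg_match($pattern, 'https://stackoverflow.com/')); // int(1): matches
var_dump(preg_match($pattern, 'domain.com'));                 // int(0): no protocol, no trailing slash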
Note that your question hasn't limited URLs to a set of protocols, TLDs or character sets.
Something like skype://18005551212 or gopher://localhost is a valid URL. Heck, depending on what you're using to browse, the following might all be valid ways to get to the same server (though not quite the same virtualhost):
https://stackoverflow.com/
http://64.34.119.12/
http://1076000524/
http://0x4022770C/
They all work for me in Firefox.
If you want further restrictions, determine WHAT they are. Are you willing to sacrifice valid protocols? Are you really only interested in one or two protocols?
A more specific question will get you a more specific answer.
Possible Duplicate:
How to detect country / location of visitor?
What's the easiest way to detect the visitor's country via their IP address? What's the common and recommended approach to this problem?
You could use HTML5 Geolocation to detect a user's location. There are loads of APIs already built for you to take advantage of.
Some developers use this to change the language of their site to suit the location of the user. For example, the site is in English, but the visitor is in Ukraine; if the developer has set this up, the text would be shown in Ukrainian without having to let the browser do it.
http://html5demos.com/geo
http://msdn.microsoft.com/en-us/magazine/hh563893.aspx
I don't know the common approach, but I think that sticking to php.net will be fine.
http://www.php.net/manual/en/book.geoip.php
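As a quick sketch, assuming the PECL geoip extension from that manual page is installed, something like this would look up the visitor's country:
<?php
// Resolve the visitor's country from their IP address via the geoip extension.
$ip      = $_SERVER['REMOTE_ADDR'];
$country = geoip_country_name_by_name($ip);   // e.g. "Ukraine", or false if the IP is not found

if ($country !== false) {
    echo "Visitor appears to be in {$country}";
} else {
    echo "Country could not be determined";
}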
Possible Duplicate:
PHP validation/regex for URL
I know there are already questions about validating links, but I'm very bad with regex, and I don't know how to validate that a user input (in HTML) is equivalent to one of these URLs:
http://www.domain.com/?p=123456abcde
or
http://www.domain.com/doc/123456abcde
I guess it's something like this:
/^(http://)(www)((\.[A-Z0-9][A-Z0-9_-]*).com/?p=((\.[A-Z0-9][A-Z0-9_-]*)
I need the regex for the two URLs. Thanks.
This might not be a job for regexes, but for existing tools in your language of choice. Regexes are not a magic wand you wave at every problem that happens to involve strings. You probably want to use existing code that has already been written, tested, and debugged.
In PHP, use the parse_url function (see the sketch after this list).
Perl: URI module.
Ruby: URI module.
.NET: Uri class.
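For the PHP route, a rough sketch of how parse_url could replace the regex for the two example URLs; the exact checks here are just one way to do it:
<?php
// Break the URL into parts, then check the host plus either a ?p=... query or a /doc/... path.
$url   = 'http://www.domain.com/doc/123456abcde';
$parts = parse_url($url);

$rightHost = isset($parts['host']) && $parts['host'] === 'www.domain.com';
$hasQuery  = isset($parts['query']) && preg_match('/^p=[a-z0-9]+$/i', $parts['query']);
$hasDocs   = isset($parts['path'])  && preg_match('#^/doc/[a-z0-9]+$#i', $parts['path']);

var_dump($rightHost && ($hasQuery || $hasDocs)); // bool(true)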
This will match both your strings.
(http:\/\/)?(www\.)?([A-Z0-9a-z][A-Z0-9a-z_-]*)\.com\/(\?p=)?([A-Z0-9a-z][\/A-Za-z0-9_-]*)
I highly recommend using a regex checker; you can find some for (almost) every OS, and there are even some online ones such as http://regexpal.com/ or http://www.quanetic.com/Regex.
This will match any valid domain with the format you specified.
http(s)?:\/\/(www\.)?[a-zA-Z0-9-\.]+\.[a-z]{2,6}\/(\?p=|doc\/)[a-z0-9]+
Replace [a-z]{2,6} with com if you only want .com domains.
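A quick sketch of that pattern in preg_match, with ^ and $ anchors added here so the whole input has to match:
<?php
// Validate both example URLs against the pattern from the answer above.
$pattern = '#^http(s)?:\/\/(www\.)?[a-zA-Z0-9-\.]+\.[a-z]{2,6}\/(\?p=|doc\/)[a-z0-9]+$#';

$urls = array(
    'http://www.domain.com/?p=123456abcde',
    'http://www.domain.com/doc/123456abcde',
);

foreach ($urls as $url) {
    echo $url, ' => ', preg_match($pattern, $url) ? 'valid' : 'invalid', "\n";
}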
Possible Duplicate:
PHP Linkify Links In Content
I've got a little stuck with finding text links and wrapping them in <a> tags.
I'm using this so far, / [\w]*\.[a-z]{2,}/i, to find the link, which works fine for links like stackoverflow.com, but it misses www. or anything beforehand.
To recap, I'm trying to find all links and wrap them in <a> tags. None of the text contains the protocol part (http(s)://) or the port part, which makes it a tad harder.
Can't find a good duplicate now, so try something simple like repeating the prefix:
/\b(\w[\w-]+\.)+[a-z]{2,}\b/i
I wouldn't use this; too many false positives. But you haven't really limited the scope. Alternatives include e.g. a fixed list of TLDs to make it a bit more specific.
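Still, if you want to try it, here is a rough sketch of how that pattern could do the wrapping with preg_replace_callback, assuming a plain http:// prefix should be added to the href:
<?php
// Wrap every bare hostname in an <a> tag; the sample text is only an illustration.
$text = 'Try stackoverflow.com or www.example.org for details.';

$linked = preg_replace_callback(
    '/\b(\w[\w-]+\.)+[a-z]{2,}\b/i',
    function ($m) {
        return '<a href="http://' . $m[0] . '">' . $m[0] . '</a>';
    },
    $text
);

echo $linked;
// Try <a href="http://stackoverflow.com">stackoverflow.com</a> or <a href="http://www.example.org">www.example.org</a> for details.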
$text = preg_replace('#((?:http(?:s)?://)?(?:www)?([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)#', '<a href="$1">$1</a>', $text);
Possible Duplicate:
How to parse HTML with PHP?
I want to write a PHP program that counts all hyperlinks of a website the user can enter.
How do I do this? Is there a library or something I can use to parse and analyze the HTML for the hyperlinks?
Thanks for your help.
Like this:
<?php
// Fetch the page and count occurrences of "<a href=" (a rough approximation).
$site  = file_get_contents("someurl");
$links = substr_count($site, "<a href=");
print "There are {$links} links on that page.";
?>
Well, we won't be able to give you a definitive answer here, only pointers. I once wrote a search engine in PHP, so the principle will be the same:
First of all, you need to code your script as a console script; a web script is not really appropriate, but it's all a question of taste.
You need to understand how to work with sockets in PHP and make requests; look at the PHP socket library at http://www.php.net/manual/ref.network.php (see the sketch after these steps).
You will need to get versed in the world of HTTP requests: learn how to make your own GET/POST requests and split the headers from the returned content.
The last part will be easy with a regexp: just preg_match_all the content for something like #<a[^>]+href="([^"]+)"#i (the expression might be wrong, I didn't test it at all, OK?).
Loop over the list of found hrefs, compare them to the hrefs you have already visited (remember to take wildcard GET params into account), and then repeat the process to load all the pages of the site.
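A rough sketch of the raw-socket GET request described above; the host is just a placeholder:
<?php
// Open a TCP connection, send a bare HTTP GET, and split the headers from the body.
$host = 'example.com';
$fp   = fsockopen($host, 80, $errno, $errstr, 10);
if (!$fp) {
    die("Connection failed: {$errstr} ({$errno})");
}

fwrite($fp, "GET / HTTP/1.1\r\nHost: {$host}\r\nConnection: Close\r\n\r\n");

$response = '';
while (!feof($fp)) {
    $response .= fgets($fp, 1024);
}
fclose($fp);

// Everything before the first blank line is headers, the rest is content.
list($headers, $body) = explode("\r\n\r\n", $response, 2);
echo $body;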
It IS HARD WORK... good luck
You may have to use cURL to fetch the contents of the webpage. Store that in a variable, then parse it for hyperlinks. You might need a regular expression for that.
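Either way, once the HTML is in a variable, a parser such as DOMDocument can count the links without a regular expression. A rough sketch, with a placeholder URL:
<?php
// Parse the fetched HTML and count the anchor elements.
$html = file_get_contents('http://example.com/');

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // keep warnings from messy real-world HTML quiet
$doc->loadHTML($html);
libxml_clear_errors();

$count = $doc->getElementsByTagName('a')->length;
echo "There are {$count} links on that page.";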
Possible Duplicate:
What's the shebang (#!) in Facebook and new Twitter URLs for?
It usually comes straight after the domain name.
I see it all the time, like in Twitter and Facebook URLs.
Is it some special sort of routing?
# is the fragment separator. Everything before it is handled by the server, and everything after it is handled by the client, usually in JavaScript (although it will advance the page to an anchor with the same name as the fragment).
After the # is the hash (fragment) part of the location; the ! that follows is used by search engines to help index AJAX content. After that can be anything, but it is usually rendered to look like a path (hence the /).
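A quick sketch of how such a URL breaks down; PHP's parse_url, for example, puts everything after the # into the fragment part. The URL is only an illustration:
<?php
// Split a hashbang-style URL into its components.
print_r(parse_url('http://twitter.com/#!/stackoverflow'));
// Prints an array with scheme => http, host => twitter.com, path => /, fragment => !/stackoverflow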