PHP read content of url with japanese word

Hi, I want to read the content of a web URL that has a Japanese word in it.
My existing code is below:
$url = "http://fantasticlife稼ぐ777.tokyo" ;
$responseText = "";
try {
$responseText = @file_get_contents($url);
var_dump($responseText);
} catch (\Exception $e) {
echo $e->getMessage();
}
I am getting the following output:
bool(false)
I'd like to know what went wrong. The code above works fine for normal URLs.
Thanks in advance.

Solved by converting the domain name to its IDNA ASCII form with the idn_to_ascii() function. The code snippet is below.
if (strpos($url,"http://")!== false){
$url = "http://" . idn_to_ascii(str_replace("http://", "",$url));
}else if(strpos($url,"https://")!== false){
$url = "https://" . idn_to_ascii(str_replace("https://", "",$url));
}else{
$url = idn_to_ascii($url);
}
Thanks once again. :)
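
A slightly more defensive variant of the same idea, as a sketch only: it converts just the host part, so a path or query string is never passed to idn_to_ascii(). The function name toAsciiUrl() is made up for this example, the intl extension is assumed to be installed for non-ASCII hosts, and URLs with user:password@ in front of the host are not handled.

```php
<?php
// Hypothetical helper: convert only the host of a URL to its IDNA ASCII form.
function toAsciiUrl(string $url): string
{
    // Split into optional scheme, host, and everything after the host.
    if (!preg_match('~^([a-z][a-z0-9+.-]*://)?([^/?#:]+)(.*)$~iu', $url, $m)) {
        return $url; // not something we can split; leave it untouched
    }
    [, $scheme, $host, $rest] = $m;

    // A pure ASCII host needs no conversion.
    if (preg_match('/^[\x00-\x7F]*$/', $host)) {
        return $url;
    }
    if (!function_exists('idn_to_ascii')) {
        throw new RuntimeException('The intl extension is required for IDN hosts.');
    }
    $ascii = idn_to_ascii($host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    if ($ascii === false) {
        throw new RuntimeException('Could not convert host to ASCII: ' . $host);
    }
    return $scheme . $ascii . $rest;
}
```

Usage would be $responseText = file_get_contents(toAsciiUrl($url)); checking the result against false rather than wrapping it in try/catch, since file_get_contents() reports failure through its return value.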

Related

php file_get_contents() returning false with a valid url

I'm currently working on a geocoding php function, using google maps API. Strangely, file_get_contents() returns bool(false) whereas the url I use is properly encoded, I think.
In my browser, when I test the code, the page takes a very long time to load, and the geocoding doesn't work (of course, given that the API doesn't give me what I want).
Also I tried to use curl, no success so far.
If anyone could help me, that'd be great !
Thanks a lot.
The code :
function test_geocoding2(){
$addr = "14 Boulevard Vauban, 26000 Valence";
if(!gc_geocode($addr)){
echo "false <br/>";
}
}
function gc_geocode($address){
$address = urlencode($address);
$url = "http://maps.google.com/maps/api/geocode/json?address={$address}";
$resp_json = file_get_contents($url);
$resp = json_decode($resp_json, true);
if($resp['status']=='OK'){
$lati = $resp['results'][0]['geometry']['location']['lat'];
$longi = $resp['results'][0]['geometry']['location']['lng'];
if($lati && $longi){
echo "(" . $lati . ", " . $longi . ")";
}else{
echo "data not complete <br/>";
return false;
}
}else{
echo "status not ok <br/>";
return false;
}
}
UPDATE : The problem was indeed the fact that I was behind a proxy. I tested with another network, and it works properly.
However, your answers about what I return and how I test the success are very nice as well, and will help me to improve the code.
Thanks a lot !

The problem was the fact that I was using a proxy. The code is correct.
To check if there is a proxy between you and the Internet, you must know the infrastructure of your network. If you work from a school or a company network, it is very likely that a proxy is used in order to protect the local network.
If you do not know the answer, ask your network administrator.
If no proxy is declared on your network, a transparent proxy may still be in place. However, as the accepted answer to this question states: https://superuser.com/questions/505772/how-can-i-find-out-if-there-is-a-proxy-between-myself-and-the-internet-if-there
"If it's a transparent proxy, you won't be able to detect it on the client PC."
Some websites also provide proxy detectors, though I have no idea how reliable the information they give is. Here are two examples:
http://amibehindaproxy.com/
http://www.proxyserverprivacy.com/free-proxy-detector.shtml
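
If the proxy is a declared one and you know its address, file_get_contents() can be told to go through it with a stream context. This is a minimal sketch: the helper name proxyContextOptions() and the host proxy.example.com:8080 are placeholders, not values from the question.

```php
<?php
// Hypothetical helper: build HTTP stream context options for a known proxy.
function proxyContextOptions(string $proxyHost, int $proxyPort): array
{
    return [
        'http' => [
            'proxy'           => "tcp://{$proxyHost}:{$proxyPort}",
            'request_fulluri' => true, // many proxies expect the full URI in the request line
            'timeout'         => 10,   // seconds; avoids the very long page load on failure
        ],
    ];
}

// Placeholder proxy address; ask your network administrator for the real one.
$context = stream_context_create(proxyContextOptions('proxy.example.com', 8080));
// $resp_json = file_get_contents($url, false, $context);
```

The timeout is optional, but it makes a blocked request fail quickly instead of hanging.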

When a function does not return anything, it returns null.
So just use:
if(!is_null(gc_geocode($addr))) {
echo "false <br/>";
}
Or:
if(gc_geocode($addr) === false) {
echo "false <br/>";
}

Take a look at the if statement:
if(!gc_geocode($addr)){
echo "false <br/>";
}
This means that if gc_geocode($addr) returns either false or null, this statement will echo "false".
However, you never actually return anything from the function on success, so in that case it returns null:
$address = urlencode($address);
$url = "http://maps.google.com/maps/api/geocode/json?address={$address}";
$resp_json = file_get_contents($url);
$resp = json_decode($resp_json, true);
if($resp['status']=='OK'){
$lati = $resp['results'][0]['geometry']['location']['lat'];
$longi = $resp['results'][0]['geometry']['location']['lng'];
if($lati && $longi){
echo "(" . $lati . ", " . $longi . ")"; //ECHO isn't RETURN
/* You should return something here, e.g. return true */
} else {
echo "data not complete <br/>";
return false;
}
} else {
echo "status not ok <br/>";
return false;
}
Alternatively, you can change the if statement so that it only fires when the function returns false:
if(gc_geocode($addr)===false){
//...
}
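
One way to make the success and failure cases unambiguous is to separate parsing from printing and always return a value. The sketch below assumes the response shape used in the question (status, results[0].geometry.location); the name parseGeocodeResponse() is made up for illustration.

```php
<?php
/**
 * Hypothetical helper: turn the raw geocoding response into coordinates.
 * Returns ['lat' => float, 'lng' => float] on success, or null on any failure.
 */
function parseGeocodeResponse($json): ?array
{
    // file_get_contents() returns false on failure, so guard against non-strings.
    if (!is_string($json)) {
        return null;
    }
    $resp = json_decode($json, true);
    if (!is_array($resp) || ($resp['status'] ?? null) !== 'OK') {
        return null;
    }
    $loc = $resp['results'][0]['geometry']['location'] ?? null;
    // isset() instead of a truthiness test, so a coordinate of 0 is still valid.
    if (!isset($loc['lat'], $loc['lng'])) {
        return null;
    }
    return ['lat' => (float) $loc['lat'], 'lng' => (float) $loc['lng']];
}
```

The caller can then write $coords = parseGeocodeResponse(file_get_contents($url)); and test $coords === null, which no longer depends on whether anything was echoed.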

The gc_geocode() function above works properly on my system, without any extra load. When you call gc_geocode(), it gives you the correct lat/long; what remains is the check you do with
if(!gc_geocode($addr)){
echo "false <br/>";
}
Use this instead:
if($response=gc_geocode($addr)){
echo $response;
}
else{
echo "false <br/>";
}

PHP Strip domain name from url

I know there is a lot of information on the web about this subject, but I can't figure out how to get the result I want.
I'm trying to build a function that strips the domain name from a URL:
http://blabla.com -> blabla
www.blabla.net -> blabla
http://www.blabla.eu -> blabla
Only the plain name of the domain is needed.
With parse_url I can extract the host, but that is not enough.
I have 3 functions that strip the domain, but I still get some wrong outputs:
function prepare_array($domains)
{
$prep_domains = explode("\n", str_replace("\r", "", $domains));
$domain_array = array_map('trim', $prep_domains);
return $domain_array;
}
function test($domain)
{
$domain = explode(".", $domain);
return $domain[1];
}
function strip($url)
{
$url = trim($url);
$url = preg_replace("/^(http:\/\/)*(www.)*/is", "", $url);
$url = preg_replace("/\/.*$/is" , "" ,$url);
return $url;
}
Every possible domain, URL and extension is allowed. After the function is finished, it must return an array of only the domain names themselves.
UPDATE:
Thanks for all the suggestions!
I figured it out with the help from you all.
function test($url)
{
// Check if the url begins with http:// www. or both
// If so, replace it
if (preg_match("/^(http:\/\/|www\.)/i", $url))
{
$domain = preg_replace("/^(http:\/\/)*(www\.)*/is", "", $url);
}
else
{
$domain = $url;
}
// Now all thats left is the domain and the extension
// Only return the needed first part without the extension
$domain = explode(".", $domain);
return $domain[0];
}

How about
$wsArray = explode(".",$domain); //Break it up into an array.
$extension = array_pop($wsArray); //Get the Extension (last entry)
$domain = array_pop($wsArray); // Get the domain
http://php.net/manual/en/function.array-pop.php

Ah, your problem lies in the fact that TLDs can have either one or two parts, e.g. .com vs .co.uk.
What I would do is maintain a list of TLDs. With the result after parse_url, go over the list and look for a match. Strip out the TLD, explode on '.' and the last part will be in the format you want it.
This does not seem as efficient as it could be but, with TLDs being added all the time, I cannot see any other deterministic way.
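
A rough sketch of that list-based approach follows. The function name domainLabel() and the five-entry suffix list are illustrative assumptions; a real list would be much longer (the Public Suffix List is one maintained source).

```php
<?php
// Hypothetical helper: return the label directly left of a known suffix.
function domainLabel(string $input, array $suffixes): ?string
{
    $input = trim($input);
    // parse_url() only reports a host when a scheme is present, so add one.
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $input)) {
        $input = 'http://' . $input;
    }
    $host = parse_url($input, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }
    $host = strtolower($host);

    // Try the longest suffixes first so that "co.uk" wins over "uk".
    usort($suffixes, fn($a, $b) => strlen($b) <=> strlen($a));
    foreach ($suffixes as $suffix) {
        $tail = '.' . $suffix;
        if (strlen($host) > strlen($tail) && substr($host, -strlen($tail)) === $tail) {
            // Strip the suffix, split on dots, and keep the last remaining label.
            $labels = explode('.', substr($host, 0, -strlen($tail)));
            return end($labels);
        }
    }
    return null; // suffix not in the list
}

// Tiny sample list for illustration only.
$suffixes = ['com', 'net', 'eu', 'uk', 'co.uk'];
```

Returning null for an unknown suffix keeps the function deterministic: it never guesses which part of the host is the extension.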

OK... this is messy, and you should spend some time optimizing and caching previously derived domains. You also need a friendly name server, and the last catch is that the domain must have an "A" record in its DNS.
This attempts to assemble the domain name in reverse order until it resolves to a DNS "A" record.
At any rate, this was bugging me, so I hope this answer helps:
<?php
$wsHostNames = array(
"test.com",
"http://www.bbc.com/news/uk-34276525",
"google.uk.co"
);
foreach ($wsHostNames as $hostName) {
echo "checking $hostName" . PHP_EOL;
$wsWork = $hostName;
//attempt to strip out full paths to just host
$wsWork = parse_url($hostName, PHP_URL_HOST);
if ($wsWork != "") {
echo "Was able to cleanup $wsWork" . PHP_EOL;
$hostName = $wsWork;
} else {
//Probably had no path info or malformed URL
//Try to check it anyway
echo "No path to strip from $hostName" . PHP_EOL;
}
$wsArray = explode(".", $hostName); //Break it up into an array.
$wsHostName = "";
//Build domain one segment a time probably
//Code should be modified not to check for the first segment (.com)
while (!empty($wsArray)) {
$newSegment = array_pop($wsArray);
$wsHostName = $newSegment . $wsHostName;
echo "Checking $wsHostName" . PHP_EOL;
if (checkdnsrr($wsHostName, "A")) {
echo "host found $wsHostName" . PHP_EOL;
echo "Domain is $newSegment" . PHP_EOL;
continue 2;
} else {
//This segment didn't resolve - keep building
echo "No Valid A Record for $wsHostName" . PHP_EOL;
$wsHostName = "." . $wsHostName;
}
}
//if you get to here in the loop it could not resolve the host name
}
?>

Try preg_replace, something like:
$domain = preg_replace($regex, '$1', $url);
where $regex is a pattern that captures the domain name in its first group.

Get youtube id for all url types

The following code works with all YouTube domains except for youtu.be. An example would be: http://www.youtube.com/watch?v=ZedLgAF9aEg would turn into: ZedLgAF9aEg
My question is how would I be able to make it work with http://youtu.be/ZedLgAF9aEg.
I'm not so great with regex so your help is much appreciated. My code is:
$text = preg_replace("#[&\?].+$#", "", preg_replace("#http://(?:www\.)?youtu\.?be(?:\.com)?/(embed/|watch\?v=|\?v=|v/|e/|.+/|watch.*v=|)#i", "", $text));
$text = (htmlentities($text, ENT_QUOTES, 'UTF-8'));
Thanks again!

//$url = 'http://www.youtube.com/watch?v=ZedLgAF9aEg';
$url = 'http://youtu.be/ZedLgAF9aEg';
if (FALSE === strpos($url, 'youtu.be/')) {
parse_str(parse_url($url, PHP_URL_QUERY), $id);
$id = $id['v'];
} else {
$id = basename($url);
}
echo $id; // ZedLgAF9aEg
This works for both versions of the URL. Don't use a regex for this: PHP has built-in functions for parsing URLs, as demonstrated above, which are faster and more robust against breaking.
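
The same idea can be wrapped in a function that decides by host rather than by searching the whole string, so a parameter such as feature=youtu.be in a youtube.com link does not confuse it. This is a sketch: the name extractYoutubeId() is made up, and it only covers the watch?v= and youtu.be forms, not /embed/ or /v/ paths.

```php
<?php
// Hypothetical helper: get the video ID from watch?v= and youtu.be links.
function extractYoutubeId(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null;
    }

    // Short links carry the ID as the path: http://youtu.be/<id>
    if (strtolower($host) === 'youtu.be') {
        $path = parse_url($url, PHP_URL_PATH);
        $id = is_string($path) ? trim($path, '/') : '';
        return $id !== '' ? $id : null;
    }

    // Everything else: look for the "v" query parameter.
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null;
    }
    parse_str($query, $params);
    return (isset($params['v']) && is_string($params['v']) && $params['v'] !== '')
        ? $params['v']
        : null;
}
```

For example, echo extractYoutubeId('http://youtu.be/ZedLgAF9aEg'); prints ZedLgAF9aEg.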

Your regex appears to solve the problem as it stands? I didn't try it in PHP, but it works fine in my editor.
The first part of the regex, http://(?:www\.)?youtu\.?be(?:\.com)?/, matches http://youtu.be/, and the second part, (embed/|watch\?v=|\?v=|v/|e/|.+/|watch.*v=|), ends with |), which means it can match nothing (making the group optional). In other words, it trims away http://youtu.be/, leaving only the ID.
A more intuitive way of writing it would be to make the whole grouping optional, I suppose, but as far as I can tell your regex already solves your problem:
#http://(?:www\.)?youtu\.?be(?:\.com)?/(embed/|watch\?v=|\?v=|v/|e/|.+/|watch.*v=)?#i
Note: Your regex would work with the www.youtu.be.com domain as well. It would be stripped away, but something to watch out for if you use this for validating input.
Update:
If you want to only match urls inside [youtube][/youtube] tags you could use look arounds.
Something along the lines of:
(?<=\[youtube\])(?:http://(?:www\.)?youtu\.?be(?:\.com)?/(?:embed/|watch\?v=|\?v=|v/|e/|[^\[]+/|watch.*v=)?)(?=.+\[/youtube\])
You could further refine it by making the .+ in the look ahead only match valid URL characters etc.

Try this, hope it'll help you
function YouTubeUrl($url)
{
if($url!='')
{
$newUrl='';
$videoLink1=$url;
$findKeyWord='youtu.be';
$toBeReplaced='www.youtube.com';
if(IsContain($videoLink1,'watch?v=')) // haystack first, then needle
{
$newUrl=tMakeUrl($videoLink1);
}
else if(IsContain($videoLink1, $findKeyWord))
{
$videoLinkArray=explode('/',$videoLink1);
$Protocol='';
if(IsContain($videoLink1,'://'))
{
$protocolArray=explode('://',$videoLink1);
$Protocol=$protocolArray[0];
}
$file=$videoLinkArray[count($videoLinkArray)-1];
$newUrl='www.youtube.com/watch?v='.$file;
if($Protocol!='')
$newUrl=$Protocol.'://'.$newUrl; // keep the original scheme
else
$newUrl=tMakeUrl($newUrl);
}
else
$newUrl=tMakeUrl($videoLink1);
return $newUrl;
}
return '';
}
function IsContain($string,$findKeyWord)
{
if(strpos($string,$findKeyWord)!==false)
return true;
else
return false;
}
function tMakeUrl($url)
{
$tSeven=substr($url,0,7);
$tEight=substr($url,0,8);
if($tSeven!="http://" && $tEight!="https://")
{
$url="http://".$url;
}
return $url;
}

You can use the function below for any YouTube URL.
I hope this helps.
function checkYoutubeId($id)
{
$youtube = "http://www.youtube.com/oembed?url=". urlencode($id) ."&format=json";
$curl = curl_init($youtube);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
$return = curl_exec($curl);
curl_close($curl);
return json_decode($return, true);
}
This function returns the YouTube video details if the ID matches a YouTube video ID.

A small improvement to @rvalvik's answer would be to cover mobile links (I noticed this while working with a customer who used an iPad to navigate, copy and paste links). In this case there is an m (mobile) in place of www. The regex then becomes:
#(https?://)?(?:www\.)?(?:m\.)?(?:youtu\.be/|youtube\.com(?:/embed/|/v/|/watch\?.*?v=))([\w\-]{10,12}).*#x
Hope it helps.

A slight improvement of another answer:
if (strpos($url, 'feature=youtu.be') !== FALSE || strpos($url, 'youtu.be') === FALSE )
{
parse_str(parse_url($url, PHP_URL_QUERY), $id);
$id = $id['v'];
}
else
{
$id = basename($url);
}
This takes into account youtu.be still being in the URL, but not the URL itself (it does happen!) as it could be the referring feature link.

Other answers miss the point that some YouTube links are part of a playlist and also have a list parameter, which is required for the embed code. To extract the embed code from such a link, one could try this JS code:
let urlEmbed = "https://www.youtube.com/watch?v=iGGolqb6gDE&list=PL2q4fbVm1Ik6DCzm9XZJbNwyHtHGclcEh&index=32"
let embedId = urlEmbed.split('v=')[1];
let parameterStringList = embedId.split('&');
if (parameterStringList.length > 1) {
embedId = parameterStringList[0];
let listString = parameterStringList.filter((parameterString) =>
parameterString.includes('list')
);
if (listString.length > 0) {
listString = listString[0].split('=')[1];
embedId = `${parameterStringList[0]}?${listString}`;
}
}
console.log(embedId)
Try it out here: https://jsfiddle.net/AMITKESARI2000/o62dwj7q/

try this :
$string = explode("=","http://www.youtube.com/watch?v=ZedLgAF9aEg");
echo $string[1];
would turn into: ZedLgAF9aEg

Code to validate an email address always fails

I have edited some code I found on ye olde internet (http://net.tutsplus.com/tutorials/other/using-htaccess-files-for-pretty-urls/). I have not gotten my variation of the code to work properly. My edited version requests another input called "pages" from index.php. Pages is put into the database along with $url and $short, in a pages field of type varchar. Pages is later used in serve.php for a JavaScript purpose. In the code below I have noted where I think the problem occurs. If you're interested in my faulty code, stay tuned; I have yet to edit the other files.
I am starting to think the error could be happening in MySQL, because I almost always receive the first $html error, "Error: invalid URL".
<?php
require("./db_config.php");
$url = $_REQUEST['url'];
$pages = $_REQUEST['pages'];
//this seems to be where the errors are occurring
if(!preg_match("/^[a-zA-Z]+[:\/\/]+[A-Za-z0-9\-_]+\\.+[A-Za-z0-9\.\/%&=\?\-_]+$/i", $url)) {
$html = "Error: invalid URL";
} else {
$db = mysql_connect($host, $username, $password);
$short = substr(md5(time().$url), 0, 5);
if(mysql_query("INSERT INTO `".$database."`.`url_redirects` (`short`, `url`, `pages`) VALUES ('".$short."', '".$url."', '".$pages."');", $db)) {
$html = "Your short URL is<br />www.srprsr.com/".$short;
} else {
$html = "Error: cannot find database";
}
mysql_close($db);
}
?>

Consider filter_var($url, FILTER_VALIDATE_URL) instead of a regular expression.
http://php.net/filter.examples.validation
http://php.net/filter.filters.validate
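
A minimal sketch of how that could replace the regex check. The helper name isHttpUrl() is made up, and restricting the scheme to http and https is an assumption about what a URL shortener should accept, since FILTER_VALIDATE_URL on its own also passes other schemes.

```php
<?php
// Hypothetical helper: accept only syntactically valid http(s) URLs.
function isHttpUrl(string $url): bool
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // FILTER_VALIDATE_URL allows other schemes (ftp, mailto, ...), so check explicitly.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https'], true);
}
```

In the question's script this would read if (!isHttpUrl($url)) { $html = "Error: invalid URL"; }. Validation does not make the value safe for SQL, so the insert should still use escaping or prepared statements.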

Remove parts of a string with PHP

I have an input box that tells users to enter a link from imgur.com.
I want a script to check that the link is for the specified site, but I'm not sure how to do it.
The links are as follows: http://i.imgur.com/He9hD.jpg
Please note that after the /, the text may vary e.g. not be a jpg but the main domain is always http://i.imgur.com/.
Any help appreciated.
Thanks, Josh.(Novice)

Try parse_url()
try {
if (!preg_match('~^(https?|ftp)://~', $_POST['url']) AND !substr_count($_POST['url'], '://')) {
// Handle URLs that do not have a scheme
$url = sprintf("%s://%s", 'http', $_POST['url']);
} else {
$url = $_POST['url'];
}
$input = parse_url($url);
if (!$input OR !isset($input['host'])) {
// Either the parsing has failed, or the URL was not absolute
throw new Exception("Invalid URL");
} elseif ($input['host'] != 'i.imgur.com') {
// The host does not match
throw new Exception("Invalid domain");
}
// Prepend URL with scheme, e.g. http://domain.tld
$host = sprintf("%s://%s", $input['scheme'], $input['host']);
} catch (Exception $e) {
// Handle error
}

substr($input, 0, strlen('http://i.imgur.com/')) === 'http://i.imgur.com/'

Check this, using stripos
if(stripos(trim($url), "http://i.imgur.com")===0){
// the link is from imgur.com
}

Try this:
<?php
if(preg_match('#^http\:\/\/i\.imgur\.com\/#', $_POST['url']))
echo 'Valid img!';
else
echo 'Img not valid...';
?>
Where $_POST['url'] is the user input.
I haven't tested this code.

$url_input = $_POST['input_box_name'];
if ( strpos($url_input, 'http://i.imgur.com/') !== 0 )
...

Several ways of doing it.. Here's one:
if ('http://i.imgur.com/' == substr($link, 0, 19)) {
...
}
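
The prefix checks above work as long as the trailing slash is part of the compared string; without it, a host such as i.imgur.com.evil.example would also pass. A sketch that compares the parsed host exactly is shown below. The name isImgurImageUrl() is made up, and accepting https as well as http is an assumption beyond the question.

```php
<?php
// Hypothetical helper: true only for http(s) links whose host is exactly i.imgur.com.
function isImgurImageUrl(string $url): bool
{
    $parts = parse_url(trim($url));
    if ($parts === false || !isset($parts['scheme'], $parts['host'], $parts['path'])) {
        return false;
    }
    return in_array(strtolower($parts['scheme']), ['http', 'https'], true)
        && strtolower($parts['host']) === 'i.imgur.com'
        && strlen($parts['path']) > 1; // something must follow the "/"
}
```

Usage: if (isImgurImageUrl($_POST['url'])) { /* the link is from i.imgur.com */ }.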
