Replace one URL with another without regex - php

I'm trying to replace some URLs in a database (WordPress) with others, but it's tricky because a lot of the URLs are redirects. Depending on the result, I want to replace each URL either with the URL it redirects to or with a URL of my choosing. I can get the matching done without any problems, but I can't get the replacement to work. I've tried str_replace, but it doesn't seem to replace the URLs. When I try preg_replace, it gives "Warning: preg_replace(): Delimiter must not be alphanumeric or backslash". Can anyone point me in the right direction?
if(preg_match($url_regex,$row['post_content'])){
preg_match_all($url_regex,$row['post_content'],$matches);
foreach($matches[0] as $match){
echo "{$row['ID']} \t{$row['post_date']} \t{$row['post_title']}\t{$row['guid']}";
$newUrl = NULL;
if(stripos($url_regex,'domain1') !== false || stripos($url_regex,'domain2') !== false || stripos($url_regex,'domain3') !== false){
$match = str_replace('&amp;','&',$match);
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL,$match);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$html = curl_exec($ch);
$newUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
if(stripos($newUrl,'domain4') !== false)
$newUrl = NULL;
}
else
if($newUrl == NULL)
{ $newUrl = 'http://www.mysite.com/';
}
echo "\t$match\t$newUrl";
$content = str_replace($match,$newUrl,$row['post_content']);
echo "\t (" . strlen($content).")";
echo "\n";
}
}

This is how you would do it with Perl-compatible regular expressions (PCRE). Each pattern needs a delimiter (here /), which is what the "Delimiter must not be alphanumeric or backslash" warning is complaining about.
$baseUrlMappings = array('/www\.yoursite\.com/i' => 'www.mysite.com',
'/www\.yoursite2\.com/i' => 'www.mysite2.com',);
echo preg_replace(array_keys($baseUrlMappings), array_values($baseUrlMappings), 'http://www.yoursite.com/foo/bar?id=123');
echo preg_replace(array_keys($baseUrlMappings), array_values($baseUrlMappings), 'http://www.yoursite2.com/foo/bar?id=123');
http://codepad.viper-7.com/2ne7u6
Please read the manual! You should be able to figure this out.
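Since the title asks for a way without regex, here is a minimal sketch using plain string replacement. The helper name replaceUrls and the sample map are made up for illustration; in the question the map would be built from the cURL results. Note that the loop in the question reassigns $content from $row['post_content'] on every iteration, so only the last replacement would survive; applying one map in a single strtr() call avoids that.

```php
<?php
// Hypothetical helper: replace every URL in $content using an old => new map.
// strtr() does plain string replacement (no regex, no delimiters), tries the
// longest keys first, and never re-scans text it has already replaced.
function replaceUrls(string $content, array $urlMap): string
{
    return strtr($content, $urlMap);
}

// Illustrative data only (assumed, not from the question).
$urlMap = [
    'http://www.yoursite.com/old' => 'http://www.mysite.com/new',
    'http://www.yoursite2.com/'   => 'http://www.mysite.com/',
];
$content = 'See http://www.yoursite.com/old and http://www.yoursite2.com/ for details.';
echo replaceUrls($content, $urlMap), "\n";
```

If str_replace appeared to do nothing in the original code, it may be worth checking that $match is byte-for-byte identical to the text stored in post_content (for example & versus &amp;).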

Related

Extracting URL from tweets and getting numbers of tweets containing that url

Here I am getting url from tweets, converting that url to long url.
And then getting count value for numbers of tweets containing that url.
if(preg_match($reg_exUrl, $tweet, $url)) {
preg_match_all($reg_exUrl, $tweet, $urls);
foreach ($urls[0] as $url) {
echo "Tiny url : {$url}<br>";
$full = MyURLDecode($url);
echo "Full url : $full<br>";
if (strpos($full, '//t.co') === true)
continue;
if (strpos($full, '//twitter.com') === true)
continue;
else if (strpos($full, '//bit.ly') === true)
$full = MyURLDecode($full);
$url_count = get_twitter_url_count($full);
echo "Url count: $url_count";
//echo "Numbers of tweets containing this link : ", $code['count']
echo "<br>";
}
} else {
echo "Mismatch<br>";
}
function MyURLDecode($url)
{
$ch = @curl_init($url);
@curl_setopt($ch, CURLOPT_HEADER, TRUE);
@curl_setopt($ch, CURLOPT_NOBODY, TRUE);
@curl_setopt($ch, CURLOPT_FOLLOWLOCATION, FALSE);
@curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$url_resp = @curl_exec($ch);
preg_match('/Location:\s+(.*)\n/i', $url_resp, $i);
if (!isset($i[1]))
{
return $url;
}
return $i[1];
}
function get_twitter_url_count($url) {
$encoded_url = urlencode($url);
$content = @file_get_contents('http://urls.api.twitter.com/1/urls/count.json?url=' . $encoded_url);
return $content ? json_decode($content)->count : 0;
}
Problems:
If the full URL is itself a short URL, I want to resolve it again to get the actual long URL.
If the URL links to a Twitter photo, such as http://twitter.com/ADSPLAYINDIA/status/415847973210181632/photo/1, I want to skip fetching the tweet count.
I added continue, but it still does not skip it.
For the first problem, try setting CURLOPT_FOLLOWLOCATION to true in your MyURLDecode function:
@curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
For your second problem: strpos never returns true. It returns the integer position of the match, or false when there is no match, so compare the result with !== false instead of === true. See this comment on php.net: http://www.php.net/manual/en/function.strpos.php#107240
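To make the strpos point concrete, here is a small sketch. The helper name urlContainsAny is hypothetical; the URL and the fragments are the ones from the question.

```php
<?php
// Hypothetical helper: true when $url contains any of the given fragments.
// strpos() returns an integer offset (possibly 0) or false, so the result
// has to be compared with !== false rather than === true.
function urlContainsAny(string $url, array $fragments): bool
{
    foreach ($fragments as $fragment) {
        if (strpos($url, $fragment) !== false) {
            return true;
        }
    }
    return false;
}

$full = 'http://twitter.com/ADSPLAYINDIA/status/415847973210181632/photo/1';
if (urlContainsAny($full, ['//t.co', '//twitter.com'])) {
    echo "skip\n"; // inside the original foreach this would be: continue;
}
```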
Please let me know if it helped
Thanks

Pull text from another website

Is it possible to pull text data from another domain (not currently owned) using php? If not any other method? I've tried using Iframes, and because my page is a mobile website things just don't look good. I'm trying to show a marine forecast for a specific area. Here is the link I'm trying to display.
Update...........
This is what I ended up using. Maybe it will help someone else. However I felt there was more than one right answer to my question.
<?php
$ch = curl_init("http://forecast.weather.gov/MapClick.php?lat=29.26034686&lon=-91.46038359&unit=0&lg=english&FcstType=text&TextType=1");
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
$content = curl_exec($ch);
curl_close($ch);
echo $content;
?>
This works as I think you want it to, except it depends on the same format from the weather site (also that "Outlook" is displayed).
<?php
//define the URL of the resource
$url = 'http://forecast.weather.gov/MapClick.php?lat=29.26034686&lon=-91.46038359&unit=0&lg=english&FcstType=text&TextType=1';
//function from http://stackoverflow.com/questions/5696412/get-substring-between-two-strings-php
function getInnerSubstring($string, $boundstring, $trimit=false)
{
$res = false;
$bstart = strpos($string, $boundstring);
if($bstart !== false)
{
$bend = strrpos($string, $boundstring);
if($bend !== false && $bend > $bstart)
{
$res = substr($string, $bstart+strlen($boundstring), $bend-$bstart-strlen($boundstring));
}
}
return $trimit ? trim($res) : $res;
}
//if the URL is reachable
if($source = file_get_contents($url))
{
$raw = strip_tags($source,'<hr>');
echo '<pre>'.substr(strstr(trim(getInnerSubstring($raw,"<hr>")),'Outlook'),7).'</pre>';
}
else{
echo 'Error';
}
?>
If you need any revisions, please comment.
Try sending a User-Agent header as shown below. Then you can use SimpleXML to parse the contents and extract the text you want. See the PHP manual for more on SimpleXML.
$opts = array(
'http'=>array(
'method'=>"GET",
'header'=>"User-agent: www.example.com"
)
);
$content = file_get_contents($url, false, stream_context_create($opts));
$xml = simplexml_load_string($content);
You can use cURL for that. Have a look at http://www.php.net/manual/en/book.curl.php

Validate a YouTube URL and check that the video exists

I am new to PHP.
I want to check that a YouTube URL is valid and whether the video exists.
Any suggestions would be appreciated.
Here's a solution I wrote using Youtube's oembed.
The first function simply checks whether the video exists on YouTube's server. It assumes that the video does not exist ONLY if a 404 error is returned. 401 (unauthorized) means the video exists, but there are some access restrictions (for example, embedding may be disabled).
Use the second function if you want to check that the video exists AND is embeddable.
<?php
function isValidYoutubeURL($url) {
// Let's check the host first
$parse = parse_url($url);
$host = $parse['host'];
if (!in_array($host, array('youtube.com', 'www.youtube.com'))) {
return false;
}
$ch = curl_init();
$oembedURL = 'www.youtube.com/oembed?url=' . urlencode($url).'&format=json';
curl_setopt($ch, CURLOPT_URL, $oembedURL);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Silent CURL execution
$output = curl_exec($ch);
unset($output);
$info = curl_getinfo($ch);
curl_close($ch);
if ($info['http_code'] !== 404)
return true;
else
return false;
}
function isEmbeddableYoutubeURL($url) {
// Let's check the host first
$parse = parse_url($url);
$host = $parse['host'];
if (!in_array($host, array('youtube.com', 'www.youtube.com'))) {
return false;
}
$ch = curl_init();
$oembedURL = 'www.youtube.com/oembed?url=' . urlencode($url).'&format=json';
curl_setopt($ch, CURLOPT_URL, $oembedURL);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec($ch);
curl_close($ch);
$data = json_decode($output);
if (!$data) return false; // Either 404 or 401 (Unauthorized)
if (!$data->{'html'}) return false; // Embeddable video MUST have 'html' provided
return true;
}
$url = 'http://www.youtube.com/watch?v=QH2-TGUlwu4';
echo isValidYoutubeURL($url) ? 'Valid, ': 'Not Valid, ';
echo isEmbeddableYoutubeURL($url) ? 'Embeddable ': 'Not Embeddable ';
?>
You never read the preg_match docs, did you?
You need a delimiter. / is most common, but since you are dealing with a URL, # is easier as it avoids some escaping.
You need to escape characters that have a special meaning in regex, such as ? or .
The matches are not returned (preg_match returns the number of matches, or false on failure), so to get the matched string you need the third parameter of preg_match:
preg_match('#https?://(?:www\.)?youtube\.com/watch\?v=([^&]+?)#', $videoUrl, $matches);
As @ThiefMaster said, but I'd like to add something.
The asker also wanted to know how to determine whether a video exists.
Do a cURL request and then call curl_getinfo(...) to check the HTTP status code.
When it is 200, the video exists; otherwise it doesn't.
For how that works, see the curl_getinfo documentation.
You need to change the answer above a little, otherwise you only get the very first character of the ID (the lazy +? at the end of the pattern stops after one character).
Try this:
<?php
$videoUrl = 'http://www.youtube.com/watch?v=cKO6GrbdXfU&feature=g-logo';
preg_match('%https?://(?:www\.)?youtube\.com/watch\?v=([^&]+)%', $videoUrl, $matches);
var_dump($matches);
//array(2) {
// [0]=>
// string(42) "http://www.youtube.com/watch?v=cKO6GrbdXfU"
// [1]=>
// string(11) "cKO6GrbdXfU"
//}
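As an alternative to the regex, the ID can also be pulled out with parse_url and parse_str. This is only a sketch: the helper name youtubeVideoId is made up, and it only handles youtube.com/watch URLs like the one above (not youtu.be or embed links).

```php
<?php
// Hypothetical helper: return the "v" query parameter of a youtube.com watch
// URL, or null when the URL does not look like one.
function youtubeVideoId(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'], $parts['path'], $parts['query'])) {
        return null;
    }
    if (!in_array(strtolower($parts['host']), ['youtube.com', 'www.youtube.com'], true)) {
        return null;
    }
    if ($parts['path'] !== '/watch') {
        return null;
    }
    // parse_str() splits the query string into an array, so the order of
    // the parameters does not matter.
    parse_str($parts['query'], $query);
    if (!isset($query['v']) || !is_string($query['v']) || $query['v'] === '') {
        return null;
    }
    return $query['v'];
}

var_dump(youtubeVideoId('http://www.youtube.com/watch?v=cKO6GrbdXfU&feature=g-logo'));
```

This only validates the shape of the URL. Whether the video actually exists still needs a request, as described in the other answers.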

Find a URL in a site's content?

I want to fetch the content of a site by its URL and search it for my own URL (for example: http://www.mydomain.com/). If my URL is there, the result should be TRUE; otherwise FALSE.
If the URL only appears in one of the following forms, the result should be FALSE:
- http://www.mydomain.com/blog?12
- www.mydomain.com/news/maste.php
- http://www.mydomain.com/mkds/skas/aksa.html
- www.mydomain.com/
- www.mydomain.com
I want to accept (find) only:
http://www.mydomain.com/ OR http://www.mydomain.com
I tried:
$url = 'http://www.usersite.com';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$contents = curl_exec($ch);
curl_close($ch);
$link="/http:\/\/mydomain.com/";
if(preg_match("/". preg_quote($link,"/"). "/m", $contents) && strstr($contents,"http://www.mydomain.com")){
echo 'TRUE';
} else{
echo 'FALSE';
}
But it doesn't work the way I want. How can I fix it?
You should not be using preg_quote on your link as it is already in a regex form. Try using the entire regex /http:\/\/mydomain.com/m instead.
$link="/http:\/\/mydomain.com/m";
if(preg_match($link, $contents) && false!== stripos($contents,"http://www.mydomain.com")){
echo 'TRUE';
} else{
echo 'FALSE';
}
I've also updated strstr to be stripos and to have an absolute comparison as it's not a boolean safe function.
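If the goal is to accept only http://www.mydomain.com or http://www.mydomain.com/ and reject the longer forms listed in the question, a negative lookahead is one way to sketch it. The helper name containsExactUrl is hypothetical, and the set of characters treated as "the URL continues" is an assumption you may need to adjust.

```php
<?php
// Hypothetical helper: true when $contents contains exactly $baseUrl (with or
// without a trailing slash) and not a longer URL that merely starts with it.
function containsExactUrl(string $contents, string $baseUrl): bool
{
    // preg_quote() is appropriate here because $baseUrl is a plain string,
    // not an already-written regex.
    $quoted = preg_quote(rtrim($baseUrl, '/'), '~');
    // After the optional slash, reject anything that would continue the URL:
    // a path, query, fragment, or a further host label such as ".evil.org".
    $pattern = '~' . $quoted . '/?(?![\w/?#-]|\.\w)~i';
    return preg_match($pattern, $contents) === 1;
}

$contents = '<a href="http://www.mydomain.com/">home</a>';
echo containsExactUrl($contents, 'http://www.mydomain.com') ? 'TRUE' : 'FALSE';
```

Because the scheme is part of the quoted string, bare forms such as www.mydomain.com/ are rejected as well.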

PHP: Check if URL redirects?

I have implemented a function that runs on each page that I want to restrict to logged-in users. The function automatically redirects visitors to the login page if they are not logged in.
I would like to write a PHP function that runs from an external server and iterates through a set of URLs (an array with the URL of each protected page) to see whether they redirect or not. That way I could easily check that the protection is up and running on every page.
How could this be done?
Thanks.
$urls = array(
'http://www.apple.com/imac',
'http://www.google.com/'
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
foreach($urls as $url) {
curl_setopt($ch, CURLOPT_URL, $url);
$out = curl_exec($ch);
// line endings is the wonkiest piece of this whole thing
$out = str_replace("\r", "", $out);
// only look at the headers
$headers_end = strpos($out, "\n\n");
if( $headers_end !== false ) {
$out = substr($out, 0, $headers_end);
}
$headers = explode("\n", $out);
foreach($headers as $header) {
if( substr($header, 0, 10) == "Location: " ) {
$target = substr($header, 10);
echo "[$url] redirects to [$target]<br>";
continue 2;
}
}
echo "[$url] does not redirect<br>";
}
I use cURL to fetch only the headers, follow any redirects, and then compare my URL with the effective URL that cURL reports:
$url="http://google.com";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_TIMEOUT, '60'); // in seconds
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_NOBODY, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$res = curl_exec($ch);
if(curl_getinfo($ch)['url'] == $url){
echo "not redirect";
}else {
echo "redirect";
}
You could always try adding:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
since 302 means it moved, allow the curl call to follow it and return whatever the moved url returns.
Getting the headers with get_headers() and checking if Location is set is much simpler.
$urls = [
"https://example-1.com",
"https://example-2.com"
];
foreach ($urls as $key => $url) {
$is_redirect = does_url_redirect($url) ? 'yes' : 'no';
echo $url . ' is redirected: ' . $is_redirect . PHP_EOL;
}
function does_url_redirect($url){
$headers = get_headers($url, 1);
if (!empty($headers['Location'])) {
return true;
} else {
return false;
}
}
I'm not sure whether this really makes sense as a security check.
If you are worried about files getting called directly without your "is the user logged in?" checks being run, you could do what many big PHP projects do: In the central include file (where the security check is being done) define a constant BOOTSTRAP_LOADED or whatever, and in every file, check for whether that constant is set.
Testing is great and security testing is even better, but I'm not sure what kind of flaw you are looking to uncover with this? To me, this idea feels like a waste of time that will not bring any real additional security.
Just make sure your script die() s after the header("Location:...") redirect. That is essential to stop additional content from being displayed after the header command (a missing die() wouldn't be caught by your idea by the way, as the redirect header would still be issued...)
If you really want to do this, you could also use a tool like wget and feed it a list of URLs. Have it fetch the results into a directory, and check (e.g. by looking at the file sizes that should be identical) whether every page contains the login dialog. Just to add another option...
Do you want to check the HTTP code to see if it's a redirect?
$params = array('http' => array(
'method' => 'HEAD',
'ignore_errors' => true
));
$context = stream_context_create($params);
foreach(array('http://google.com', 'http://stackoverflow.com') as $url) {
$fp = fopen($url, 'rb', false, $context);
$result = stream_get_contents($fp);
if ($result === false) {
throw new Exception("Could not read data from {$url}");
} else if (! strstr($http_response_header[0], '301')) {
// Do something here
}
}
I hope it will help you:
function checkRedirect($url)
{
$headers = get_headers($url);
if ($headers) {
if (isset($headers[0])) {
if ($headers[0] == 'HTTP/1.1 302 Found') {
//this is the URL where it's redirecting
return str_replace("Location: ", "", $headers[9]);
}
}
}
return false;
}
$isRedirect = checkRedirect($url);
if(!$isRedirect )
{
echo "URL Not Redirected";
}else{
echo "URL Redirected to: ".$isRedirect;
}
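The function above depends on the exact status line 'HTTP/1.1 302 Found' and on the Location header being at index 9, which will vary between servers. A sketch that looks for any 3xx status and finds the Location header wherever it is could look like this. The name findRedirectTarget is made up, and the input is assumed to be the list of header lines that get_headers($url) returns.

```php
<?php
// Hypothetical helper: given header lines in the shape get_headers($url)
// returns, return the first redirect target, or null if there is none.
function findRedirectTarget(array $headerLines): ?string
{
    // The first line is the status line, e.g. "HTTP/1.1 301 Moved Permanently".
    if (!isset($headerLines[0]) || !preg_match('~^HTTP/\S+\s+(\d{3})~', $headerLines[0], $m)) {
        return null;
    }
    $status = (int) $m[1];
    if ($status < 300 || $status >= 400) {
        return null; // not a 3xx response
    }
    foreach ($headerLines as $line) {
        // Header names are case-insensitive.
        if (stripos($line, 'Location:') === 0) {
            return trim(substr($line, strlen('Location:')));
        }
    }
    return null;
}

// Sample header lines (made up); with a live URL you would pass
// get_headers($url) ?: [] instead.
$sample = ['HTTP/1.1 302 Found', 'Content-Type: text/html', 'Location: http://www.mysite.com/login'];
echo findRedirectTarget($sample) ?? 'URL Not Redirected';
```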
You can use a session: if the session array is not set, redirect the user to the login page.
I modified Adam Backstrom's answer and implemented chiborg's suggestion (download only the headers). It does one more thing: it checks whether the redirect stays on the same server or goes elsewhere. Example: terra.com.br redirects to terra.com.br/portal. PHP will consider that a redirect, which is correct, but I only wanted to list the URLs that redirect to a different site. My English is not good, so if anyone finds something hard to understand and can edit this, you're welcome to.
function RedirectURL() {
$urls = array('http://www.terra.com.br/','http://www.areiaebrita.com.br/');
foreach ($urls as $url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// chiborg suggestion
curl_setopt($ch, CURLOPT_NOBODY, true);
// ================================
// READ URL
// ================================
curl_setopt($ch, CURLOPT_URL, $url);
$out = curl_exec($ch);
// line endings is the wonkiest piece of this whole thing
$out = str_replace("\r", "", $out);
echo $out;
$headers = explode("\n", $out);
foreach($headers as $header) {
if(substr(strtolower($header), 0, 9) == "location:") {
// read URL to check if redirect to somepage on the server or another one.
// terra.com.br redirect to terra.com.br/portal. it is valid.
// but areiaebrita.com.br redirect to bwnet.com.br, and this is invalid.
// what we want is to check if the address continues being terra.com.br or changes. if changes, prints on page.
// if contains http, we will check if changes url or not.
// some servers, to redirect to a folder available on it, redirect only citting the folder. Example: net11.com.br redirect only to /heiden
// only execute if have http on location
if ( strpos(strtolower($header), "http") !== false) {
$address = explode("/", $header);
print_r($address);
// $address['0'] = http
// $address['1'] =
// $address['2'] = www.terra.com.br
// $address['3'] = portal
echo "url (address from array) = " . $url . "<br>";
echo "address[2] = " . $address['2'] . "<br><br>";
// url: terra.com.br
// address['2'] = www.terra.com.br
// check if string terra.com.br is still available in www.terra.com.br. It indicates that server did not redirect to some page away from here.
if(strpos(strtolower($address['2']), strtolower($url)) !== false) {
echo "URL NOT REDIRECT";
} else {
// not the same. (areiaebrita)
echo "SORRY, URL REDIRECT WAS FOUND: " . $url;
}
}
}
}
}
}
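The same-site check can also be sketched by comparing hosts with parse_url instead of splitting the header on "/". The names normalizeHost and redirectsOffSite are hypothetical, and treating www.example.com and example.com as the same site is an assumption.

```php
<?php
// Hypothetical helper: lower-case a host name and drop a leading "www."
// (an assumption about what counts as the same site).
function normalizeHost(string $host): string
{
    $host = strtolower($host);
    return strpos($host, 'www.') === 0 ? substr($host, 4) : $host;
}

// Hypothetical helper: true when $location points at a different host than $url.
// A relative Location such as "/heiden" has no host and counts as the same site.
function redirectsOffSite(string $url, string $location): bool
{
    $from = parse_url($url, PHP_URL_HOST);
    $to   = parse_url($location, PHP_URL_HOST);
    if (!is_string($from) || !is_string($to)) {
        return false;
    }
    return normalizeHost($from) !== normalizeHost($to);
}

// terra.com.br stays on its own host; areiaebrita.com.br goes elsewhere.
var_dump(redirectsOffSite('http://www.terra.com.br/', 'http://www.terra.com.br/portal'));
var_dump(redirectsOffSite('http://www.areiaebrita.com.br/', 'http://www.bwnet.com.br/'));
```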
function unshorten_url($url){
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_URL, $url);
$out = curl_exec($ch);
$real_url = $url;//default.. (if no redirect)
if (preg_match("/location: (.*)/i", $out, $redirect))
$real_url = $redirect[1];
if (strstr($real_url, "bit.ly"))//the redirect is another shortened url
$real_url = unshorten_url($real_url);
return $real_url;
}
I have just made a function that checks if a URL exists or not
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
function url_exists($url, $ch) {
curl_setopt($ch, CURLOPT_URL, $url);
$out = curl_exec($ch);
// line endings is the wonkiest piece of this whole thing
$out = str_replace("\r", "", $out);
// only look at the headers
$headers_end = strpos($out, "\n\n");
if( $headers_end !== false ) {
$out = substr($out, 0, $headers_end);
}
//echo $out."====<br>";
$headers = explode("\n", $out);
//echo "<pre>";
//print_r($headers);
foreach($headers as $header) {
//echo $header."---<br>";
if( strpos($header, 'HTTP/1.1 200 OK') !== false ) {
return true;
break;
}
}
}
Now I use an array of URLs and check whether each one exists, as follows:
$my_url_array = array('http://howtocode.pk/result', 'http://google.com/jobssss', 'https://howtocode.pk/javascript-tutorial/', 'https://www.google.com/');
for($j = 0; $j < count($my_url_array); $j++){
if(url_exists($my_url_array[$j], $ch)){
echo 'This URL "'.$my_url_array[$j].'" exists. <br>';
}
}
I don't fully understand your question.
You have an array of URLs and you want to know whether the user came from one of them?
If I've understood your question correctly:
$urls = array('http://url1.com','http://url2.ru','http://url3.org');
if(in_array($_SERVER['HTTP_REFERER'],$urls))
{
echo 'FROM ARRAY';
} else {
echo 'NOT FROM ARR';
}
