How to add www. to urls in text file

I've got a text file containing a lot of URLs. Some of them start with www. or http://, and some start with neither.
I want to add www. to the front of every line in the text file where the URL does not start with www. or http://.
$lines = file("sites.txt");
foreach($lines as $line) {
if(substr($line, 0, 3) != "www" && substr($line, 0, 7) != "http://" ) {
}
}
That's the code I have right now. I know it's not much, but I have no clue how to add www. in front of every unmatched line.

This adds www. if it is not present, and it also works when the line starts with http or https.
$url = preg_replace("#^(?:http(s)?://)?(?:www\.)?#", "http\\1://www.", $url);
This regex handles the following:
domain.ext -> http://www.domain.ext
www.domain.ext -> http://www.domain.ext
http://www.domain.ext -> http://www.domain.ext
https://domain.ext -> https://www.domain.ext (note the httpS)
https://www.domain.ext -> https://www.domain.ext (note the httpS)
Regex explained:
^ -> Anchor at the start of the string, so the replacement happens exactly once.
(?:http(s)?://)? -> The scheme might not be there at all. If it is, the s might not be there; capture it in case it is.
(?:www\.)? -> The www. might not be there. Don't capture it (?:), we're going to add it anyway.
Then we use \\1 in the replacement so that https stays https when present.
Also, substr checks with hard-coded lengths will fail on https, because it's one character longer than http.
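The conversions listed above can be checked with a small script. This is only a sketch: the function name addWww is hypothetical, and it assumes a pattern that is anchored at the start of the string with an optional scheme, so that bare domains are covered too.

```php
<?php
// Hypothetical helper: ensure a URL has a scheme and a leading "www.".
function addWww(string $url): string
{
    // ^ anchors at the start; the scheme group is optional, and its "s" is
    // captured as $1 so https stays https. A missing scheme becomes http.
    return preg_replace('#^(?:http(s)?://)?(?:www\.)?#i', 'http$1://www.', trim($url));
}

foreach (['domain.ext', 'www.domain.ext', 'https://domain.ext'] as $u) {
    echo $u, ' -> ', addWww($u), PHP_EOL;
}
```

The trim() call is there because lines read with file() keep their trailing newline unless FILE_IGNORE_NEW_LINES is passed.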

The trick is to iterate over $lines with $line taken by reference, so you can alter each entry in place:
foreach ($lines as &$line) { // note the '&'
    $line = trim($line); // file() keeps the trailing newline
    if (stripos($line, 'http://www.') === 0) {
        // nothing is missing
        continue;
    } elseif (stripos($line, 'http://') === 0) {
        // only www. is missing:
        $line = 'http://www.' . substr($line, 7);
    } elseif (stripos($line, 'www.') === 0) {
        // only http:// is missing:
        $line = 'http://' . $line;
    } else {
        // http:// and www. are both missing:
        $line = 'http://www.' . $line;
    }
}
unset($line); // break the reference so later code cannot overwrite the last entry
Note:
Simply adding www. to a non-www domain can be wrong, because www.example.com and example.com can serve completely different content from different servers, with different DNS records. Adding http:// is fine; adding www. is not always safe.
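To write the new array back to the file, pass the file name first and the data second. This assumes the lines no longer carry their trailing newlines (for example, because they were trimmed or read with FILE_IGNORE_NEW_LINES):
file_put_contents('sites.txt', implode(PHP_EOL, $lines));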
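Following the note above, here is a minimal sketch that adds only the scheme and leaves the host untouched. The function name addScheme is hypothetical, and the commented usage assumes the question's sites.txt.

```php
<?php
// Hypothetical helper: prepend "http://" to lines that have no scheme,
// leaving the host (with or without "www.") as it is.
function addScheme(array $lines): array
{
    $out = [];
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip blank lines
        }
        $out[] = preg_match('#^https?://#i', $line) ? $line : 'http://' . $line;
    }
    return $out;
}

// Assumed usage with the question's file name:
// $lines = addScheme(file('sites.txt'));
// file_put_contents('sites.txt', implode(PHP_EOL, $lines) . PHP_EOL);
```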

$path = "/var/www/vhosts/mon.totalinternetgroup.nl/public/sites/sites.txt";
// FILE_IGNORE_NEW_LINES strips the line endings so implode() does not double them
$lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$new_lines = array();
foreach ($lines as $line) {
    // && (not ||): prefix only when the line starts with neither www nor http://
    if (substr($line, 0, 3) != "www" && substr($line, 0, 7) != "http://") {
        $new_lines[] = "www." . $line;
    } else {
        $new_lines[] = $line;
    }
}
$content = implode("\n", $new_lines);
file_put_contents($path, $content);

Use this, which needs only three lines of actual processing. Note that it normalizes every line to http:// without www.:
<?php
$g0 = file_get_contents("site");
#--------------------------------------------------
$g1 = preg_replace("#^http://#m","",$g0);
$g2 = preg_replace("/^www\./m","",$g1);
$g3 = preg_replace("/^/m","http://",$g2);
#--------------------------------------------------
file_put_contents("site2",$g3);
?>
input file:
1.com
www.d.som
http://ss.com
http://www.ss.com
output file:
http://1.com
http://d.som
http://ss.com
http://ss.com

Related

PHP redirection based on DNS

Say I have a file named redirections.txt that looks like this:
www: http://www.example.com/hub/
icloud: http://www.example.com/icloud/
dev: http://www.example.com/development/latest/projects.php
How would I go about processing that text document as domain-prefix: redirect-url? (In other words: if $_SERVER['HTTP_HOST'] equals domain-prefix.example.com, go to redirect-url.)
Currently I have:
$file = file('redirections.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach($file as $line => $cont){
preg_match('/(*.?): http:\/\/www.example.com\/(*.?)\//', $cont, $matches);
print_r($matches); // Debug. Was trying to see if it worked.
}

You can use explode() to split the line:
$split = explode(': ', $cont);
if (count($split) == 2) {
list ($domain_prefix, $redirect_url) = $split;
if ($_SERVER['HTTP_HOST'] == "$domain_prefix.example.com") {
header("Location: $redirect_url");
exit();
}
}
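The same lookup can be wrapped in a function so it is easy to try without a real request. This is a sketch only: findRedirect and its $baseDomain parameter are hypothetical names, and the sample lines are taken from the question.

```php
<?php
// Hypothetical helper: return the redirect URL for $host, or null if none matches.
function findRedirect(array $lines, string $host, string $baseDomain = 'example.com'): ?string
{
    foreach ($lines as $line) {
        // A limit of 2 keeps any later ": " inside the URL intact.
        $split = explode(': ', trim($line), 2);
        if (count($split) === 2 && strcasecmp($host, $split[0] . '.' . $baseDomain) === 0) {
            return $split[1];
        }
    }
    return null;
}

$lines = [
    'www: http://www.example.com/hub/',
    'icloud: http://www.example.com/icloud/',
];
echo findRedirect($lines, 'icloud.example.com'), PHP_EOL;
```

In a real script the result would be passed to header("Location: ...") followed by exit(), as in the answer above.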

There might be better ways to achieve what you want, but if you need to use the code already presented, try this:
foreach ($file as $line => $cont) {
    preg_match('/(?P<domain>\w+): http:\/\/www\.example\.com(?P<path>\/.*)/', $cont, $matches);
    print_r($matches); // Debug.
}

Don't echo things with certain characters

I have a PHP program that reads a log file and prints it to a page (code below). I don't want the website's users to see any line containing a /. I know I could use trim to delete certain characters, but is there a way to delete the entire line? For example, I want to keep something like "Hello" and delete something like /xx.xx.xx.xx connected. All the lines I wish to delete share a common marker: /. People's names in the log file have <>s around them, so I must use htmlspecialchars.
$file = file_get_contents('/path/to/log', true);
$file = htmlspecialchars($file);
echo nl2br($file);
Thanks for your help!
EDIT:
Thanks for all of the answers, currently tinkering with them!
EDIT2:
final code:
<?php
$file = file_get_contents('/path/to/log', true);
// Separate by line
$lines = explode(PHP_EOL, $file);
foreach ($lines as $line) {
if (strpos($line, '/') === false) {
$line = htmlspecialchars($line . "\n");
echo nl2br($line);
}
}
?>

Do you mean, like this?
$file = file_get_contents('/path/to/log', true);
// Separate by line
$lines = explode(PHP_EOL, $file);
foreach ($lines as $line) {
if (strpos($line, '/') === false) {
// If the line doesn't contain a "/", echo it
echo $line . PHP_EOL;
}
}
For anyone wondering, PHP_EOL is the PHP constant for "end of line" and promotes consistency between different systems (Windows, UNIX, etc.).

If you are iterating through the file line by line, you can use preg_match to check whether the line contains the / character and skip the echo if it does. If you have the whole file in one string, first split it at the newlines and iterate over the resulting array.
If you don't want to split the file, you can probably use preg_replace with a regexp such as (^|\n).*/.*(\n|$) and replace each match with an empty string.
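A minimal sketch of the no-split approach follows. It uses a slightly different pattern from the one suggested above (multiline mode with an anchored line start), so treat it as one possible variant; the function name dropSlashLines is hypothetical.

```php
<?php
// Hypothetical helper: drop every line that contains "/" without splitting first.
function dropSlashLines(string $text): string
{
    // m: ^ matches at each line start; "." stops at "\n", so each match is one line.
    // (?:\R|\z) also consumes the line break, so no empty line is left behind.
    return preg_replace('~^.*/.*(?:\R|\z)~m', '', $text);
}

$log = "Hello\n/xx.xx.xx.xx connected\n<Bob> hi\n";
echo nl2br(htmlspecialchars(dropSlashLines($log)));
```

Filtering happens before htmlspecialchars, so the check runs against the raw log text rather than the escaped output.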

Use the str_replace function: http://php.net/manual/en/function.str-replace.php. Alternate solution using preg_replace (apply it before escaping the special characters):
/* pattern /\/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\sconnected/ = /xx.xx.xx.xx connected */
/* pattern will be replaced with "newtext" */
$file = file_get_contents("/path/to/log", true);
$lines = explode("\n", $file);
foreach ($lines as $line) {
    $correctline = preg_replace('/\/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\sconnected/', 'newtext', $line);
    echo $correctline . "\n";
}

<?php
$file = file_get_contents("/path/to/log", true);
$lines = explode("\n", $file);
foreach ($lines AS $num => $line)
{
if ( strpos($line, "/") === false ) // Line doesn't contain "/"
{
echo htmlspecialchars($line) . "\n";
}
}
?>

Get what page the visitor visit in PHP

I am trying to find out which page the visitor is visiting.
Here is my code:
$url = $_SERVER["SERVER_NAME"].$_SERVER["REQUEST_URI"];
$urlcomplete = $url;
$url = explode(".com/",$url);
$urlcount = count($url);
$newurl = '';
for ($start = 1; $start < $urlcount; $start++) {
if ($newurl != '') {
$newurl .= '.com/';
}
$newurl .= $url[$start];
}
$url = explode('/',$newurl);
$urlcount = explode('?',end($url));
$url[count($url) - 1] = $urlcount[0];
$urlcount = count($url);
With the code above, all the path segments are stored in $url. For example, for
https://stackoverflow.com/questions/ask
$url[0] = 'questions'
$url[1] = 'ask'
Is this a good way to do it, or is there a better way?

First prepending SERVER_NAME to the REQUEST_URI, and then trying to split it off, is pointless. This should be a simpler solution:
# first, split off the query string, if any:
list( $path ) = explode( '?', $_SERVER['REQUEST_URI'], 2 );
# then just split the URL path into its components:
$url = explode( '/', ltrim( $path, '/' ) );
The ltrim removes any leading slashes from the path, so that $url[0] won't be empty.
Note that there might still be an empty element at the end of the $url array, if the path ends in a slash. You could get rid of it by using trim instead of ltrim, but you may not want to, since the trailing slash is significant for things like resolving relative URLs.
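As an alternative sketch, parse_url() can do the query-string split. The helper name pathSegments is hypothetical, and unlike the snippet above it trims both ends, so the trailing-slash distinction is deliberately dropped.

```php
<?php
// Hypothetical helper: split a request URI into its path segments.
function pathSegments(string $requestUri): array
{
    // parse_url() with PHP_URL_PATH drops the query string and fragment.
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return [];
    }
    // trim() on both sides also removes the empty trailing element.
    $path = trim($path, '/');
    return $path === '' ? [] : explode('/', $path);
}

print_r(pathSegments('/questions/ask?sort=new'));
```

In a request handler the argument would typically be $_SERVER['REQUEST_URI'].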

Prevent duplicate entries with url inputs

I have a form that accepts a URL. The user may enter any of:
www.stack.com
www.stack.com/overflow
http://www.stack.com
http://www.stack.com/overflow
How can I prevent inserting a duplicate entry into my database?
I've tried this:
$url = (input url)
$search = str_replace("http://www.", "", $url);
$search = str_replace("http://", "", $url);
$search = str_replace("www.", "", $url);
$search = str_replace("/", "", $url);
With the last $search, I wanted to remove the "/" and every character after it.
How do I do that?

You can use PHP's parse_url() function to do all of the work for you:
$url = ((strpos($url, 'http://') !== 0) && (strpos($url, 'https://') !== 0)) ? 'http://'.$url : $url;
$parsed = parse_url($url);
$host = $parsed['host'];
The first line checks whether the URL starts with the http:// or https:// scheme. If not, it prepends a default of http://. Without a scheme, parse_url() puts the entire URL in the path index; with it, the host is parsed properly.
Alternatively, since you specifically want just the domain name, you can pass the PHP_URL_HOST flag in the function call:
$url = ((strpos($url, 'http://') !== 0) && (strpos($url, 'https://') !== 0)) ? 'http://'.$url : $url;
$host = parse_url($url, PHP_URL_HOST); // this will return just the host-portion.
Normally, you would want to keep the subdomain names for a given URL, because a subdomain can differ greatly (and even be an entirely different website). In the case of www., however, it usually points to the same site. Given one of the statements above for getting the host, you can remove a leading www. with:
$host = preg_replace('/^www\./i', '', $host); // anchored, so "www." elsewhere in the host is left alone
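Put together, the steps above might look like the following sketch. The function name hostKey is hypothetical, and lowercasing the host is an added assumption (host names are case-insensitive) rather than part of the answer.

```php
<?php
// Hypothetical helper: reduce a user-supplied URL to a comparable host key.
function hostKey(string $url): ?string
{
    $url = trim($url);
    // parse_url() needs a scheme to recognise the host, so add a default one.
    if (!preg_match('#^https?://#i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // nothing usable was entered
    }
    // Strip only a leading "www.", after lowercasing the host.
    return preg_replace('/^www\./', '', strtolower($host));
}

echo hostKey('http://www.stack.com/overflow'), PHP_EOL;
```

Storing this key alongside the URL and checking it before each insert is one way to reject duplicates.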

The answer by newfurniturey seems to be a very good solution. Before calling parse_url, check whether http:// is missing from the URL; if so, prepend http:// to the string, and parse_url should then work as expected.

For anyone who gets stuck on the same question and lands here, here's the complete code:
if ((strpos($url, 'http://') !== false) || (strpos($url, 'https://') !== false)) {
    $host = parse_url($url, PHP_URL_HOST);
    if (strpos($url, 'www.') !== false) {
        $host = str_replace('www.', '', $host);
    }
    if (strpos($host, '/') !== false) {
        $str = explode("/", $host);
        $host = $str[0];
    }
} else if (strpos($url, 'www.') !== false) {
    $host = str_replace('www.', '', $url);
    if (strpos($host, '/') !== false) {
        $str = explode("/", $host);
        $host = $str[0];
    }
} else if (strpos($url, '/') !== false) {
    $str = explode("/", $url);
    $host = $str[0];
} else {
    $host = $url;
}

Don't print last segment of URL

I have some PHP which prints a URL. Can I trim this with PHP to leave off the last segment?
So this:
www.mysite.com/name/james
would become this:
www.mysite.com/name
I'm using ExpressionEngine, so the code is just {site_url}.

$url = (substr($url, -1) == '/') ? substr($url, 0, -1) : $url; // remove trailing slash if present
$urlparts = explode('/', $url); // explode on slash
array_pop($urlparts); // remove last part
$url = implode('/', $urlparts); // put it back together (separator first; the reversed order was removed in PHP 8)
