<?php
$str = "Hello world. It's a beautiful day.";
print_r (explode(" ",$str));
?>
The above code prints an array as output.
If I use this instead:
<?php
$homepage = file_get_contents('http://www.example.com/data.txt');
print_r (explode(" ",$homepage));
?>
However, it does not display the individual numbers from the text file as separate array elements.
Ultimately I want to read numbers from a text file and print their frequency. data.txt contains 100,000 numbers, one number per line.
A newline is not a space. You have to explode on the appropriate newline sequence, e.g. \n on Linux (files created on Windows typically use \r\n):
explode("\n",$homepage)
Alternatively, you can use preg_split and the character group \s which matches every white space character:
preg_split('/\s+/', $homepage);
Another option (maybe faster) might be to use fgetcsv.
If you want the content of a file as an array of lines, there is already a built-in function
var_dump(file('http://www.example.com/data.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES));
See Manual: file()
Try exploding at "\n"
print_r (explode("\n",$homepage));
Also have a look at:
http://php.net/manual/de/function.file.php
You could solve it by using a Regexp also:
$homepage = file_get_contents("http://www.example.com/data.txt");
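preg_match_all("/^[0-9]+$/m", $homepage, $matches);
This gives you the variable $matches, where $matches[0] is an array of the numbers. The m (multiline) modifier is needed so that ^ and $ match at the start and end of every line rather than only of the whole string. Only lines that consist entirely of digits are retrieved, which helps in case the file is not well formatted.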
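A small sketch of how the anchors behave in this approach: ^ and $ only match at line boundaries when the m modifier is set. The sample string and the variable names $whole and $perLine are made up for illustration, and the lines are assumed to end in \n only (a trailing \r would stop a line from matching).

```php
<?php
// Made-up sample: three clean lines and two malformed ones.
$homepage = "12\nabc\n345\n6x\n78";

// Without "m", ^ and $ anchor to the whole string, so nothing matches here.
preg_match_all('/^[0-9]+$/', $homepage, $whole);

// With "m", ^ and $ anchor to each individual line.
preg_match_all('/^[0-9]+$/m', $homepage, $perLine);

print_r($whole[0]);   // no matches
print_r($perLine[0]); // the three all-digit lines
```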
You are not exploding the string on the correct character. You need to either explode on the newline separator (\n) or use a regular expression (slower, but more robust). In the latter case, use preg_split.
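For the stated end goal (printing how often each number occurs), the splitting step can be combined with array_count_values(). This is only a sketch: the inline string stands in for the downloaded file, and $numbers and $frequency are names chosen here.

```php
<?php
// Stand-in for the downloaded file contents (assumption: one number per line,
// possibly with mixed line endings and blank lines).
$homepage = "3\n7\r\n3\n\n42\n7\n3\n";

// Split on any run of whitespace; PREG_SPLIT_NO_EMPTY drops blank entries.
$numbers = preg_split('/\s+/', $homepage, -1, PREG_SPLIT_NO_EMPTY);

// array_count_values() maps each distinct value to its number of occurrences.
$frequency = array_count_values($numbers);

print_r($frequency);
```

For a real file, file() with FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES could replace the preg_split() step.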
Related
I have been searching for how to do this for some days now, without success.
I've written a script that finds words in a list containing some letters only once. It works.
Now I'd like to make a script that finds words in a txt file list matching a pattern like this, for example: W???EB??RD?. The position of each letter matters. I just need to find the words that fit. Missing letters are marked with ?.
Could someone help me?
This is what I have so far:
$letters = "[A-Z]HITEBOARDS";
$array = explode("\n", file_get_contents('test.txt'));
$fl_array = preg_grep("[A-Z]HITEBOARDS", $array);
echo $array[0];
echo $array[1];
echo $array[2];
echo $array[3];
var_dump($fl_array);
As I mentioned in the comments, your regex was missing the delimiters:
$fl_array = preg_grep("[A-Z]HITEBOARDS", $array);
Should be
$fl_array = preg_grep("/[A-Z]HITEBOARDS/", $array);
You probably should also put the word boundary \b before and after the word to prevent matching partial words. For example, without boundaries /[A-Z]here/ matches inside Therefore as well as There, because a match can occur at the start, middle or end of a longer word, which is probably not what you want; /\b[A-Z]here\b/ matches only There. Note that \b is a zero-width assertion: it consumes no characters and matches at a position where a word character (\w, i.e. [A-Za-z0-9_]) sits next to a non-word character (\W: punctuation, special characters other than _, and whitespace) or next to the start or end of the string.
So to put that in code would be this:
$fl_array = preg_grep("/\b[A-Z]HITEBOARDS\b/", $array);
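To illustrate the difference the boundaries make, here is a short sketch using the There/Therefore example. The input list and the names $loose and $strict are made up.

```php
<?php
// Made-up input lines.
$lines = ['There', 'Therefore', 'say There now'];

// Without boundaries the pattern also matches inside "Therefore".
$loose = preg_grep('/[A-Z]here/', $lines);

// With \b on both sides only the standalone word "There" matches.
$strict = preg_grep('/\b[A-Z]here\b/', $lines);

// preg_grep() preserves the original keys of the matching entries.
print_r($loose);
print_r($strict);
```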
Also instead of:
$array = explode("\n", file_get_contents('test.txt'));
You can use
$array = file('test.txt', FILE_IGNORE_NEW_LINES|FILE_SKIP_EMPTY_LINES);
The file function is preferred because it breaks the file into an array based on the line endings (not dependent on OS \r\n vs \n as explode is). Besides that and better performance, it also has two really useful flags. FILE_IGNORE_NEW_LINES is a given, as it removes the line endings that file() normally retains on each array element. FILE_SKIP_EMPTY_LINES does basically what it says and skips lines that are empty (it only has an effect together with FILE_IGNORE_NEW_LINES, since otherwise each line still contains its newline).
Cheers.
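For the original goal (a mask such as W???EB??RD? where each ? stands for one unknown letter), one possible approach is to build the regex from the mask. This is a sketch under assumptions: maskToRegex() is a hypothetical helper name, the word list is made up, one word per array entry is assumed, and matching is made case-insensitive with the i modifier.

```php
<?php
// Hypothetical helper: turn a mask such as "W???EB??RD?" into an anchored regex.
function maskToRegex(string $mask): string
{
    // Escape regex metacharacters first; this turns each "?" into "\?".
    $quoted = preg_quote($mask, '/');
    // Then swap every escaped "?" for a class matching exactly one letter.
    $body = str_replace('\?', '[A-Z]', $quoted);
    // ^ and $ force the whole entry to fit the mask, so length matters too.
    return '/^' . $body . '$/i';
}

// Made-up word list; in practice this could come from file('test.txt', ...).
$words = ['WHITEBOARDS', 'WHITEBOARD', 'KEYBOARDS', 'whiteboards'];

$hits = preg_grep(maskToRegex('W???EB??RD?'), $words);
print_r(array_values($hits));
```

Anchoring with ^ and $ is used here instead of \b because each entry is a single word and its length has to match the mask exactly.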
I'm trying to parse a file and analyze it. To do this, I've used preg_split() to break the document into an array. I only want words in the array (that is, alphabetic characters only). The regular expression I used is:
$noAlpha = "/[\s]+|[^A-z]+|\W|\r/";
However, I'm getting instances of blanks in the array. I believe it has to do with a line with a return only (\r) and nothing else on it.
I'm only using .txt files. What would I need to add to the regex to account for this?
To extract all the words (only letters), you can use this
preg_match_all('/[^\W\d_]+/',$string,$matches)
If you want digits as well, then the pattern should be '/[^\W_]+/'
Try this:
$noAlpha = "/\s+|[^a-zA-Z]+|\W|\r/";
You can try this:
$noAlpha = "/\s*\W\s*/";
However, I also would extract the words with preg_match_all instead.
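A brief sketch comparing the two routes on a made-up string: extracting the words with preg_match_all(), and splitting with the PREG_SPLIT_NO_EMPTY flag so that blank entries are dropped. The sample text and variable names are assumptions.

```php
<?php
// Made-up sample with punctuation, a digit, an underscore and a blank line.
$string = "Hello, world!\r\n\r\nIt's 2 late_night";

// Route 1: extract runs of letters directly.
// [^\W\d_] means "a word character that is neither a digit nor an underscore".
preg_match_all('/[^\W\d_]+/', $string, $matches);
$words = $matches[0];

// Route 2: split on everything that is not an ASCII letter and
// let PREG_SPLIT_NO_EMPTY discard the empty pieces.
$split = preg_split('/[^a-zA-Z]+/', $string, -1, PREG_SPLIT_NO_EMPTY);

print_r($words);
print_r($split);
```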
When trying to use PHP's str_replace function to replace multiple substrings, is there a way to get it to ignore the strings it's just replaced?
For example, when executing the following code block, it replaces the "o" in "<strong>" from the first replacement.
$str="Hello world.";
$old=array("e","o");
$new=array("<strong>e</strong>","<strong>o</strong>");
echo str_replace($old,$new,$str);
The actual output:
// "H<str<strong>o</strong>ng>e</str<strong>o</strong>ng>ll<strong>o</strong> w<strong>o</strong>rld."
The expected output:
// "H<strong>e</strong>ll<strong>o</strong> w<strong>o</strong>rld."
Use strtr().
From the PHP manual:
The longest keys will be tried first. Once a substring has been replaced, its new value will not be searched again.
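Applied to the example from the question, a minimal sketch with strtr() in its two-argument array form could look like this (the $pairs variable name is chosen here):

```php
<?php
$str = "Hello world.";

// Map each search string to its replacement. strtr() scans the input once,
// so text it has already inserted is never searched again.
$pairs = [
    "e" => "<strong>e</strong>",
    "o" => "<strong>o</strong>",
];

$result = strtr($str, $pairs);
echo $result;
```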
The alternative solution using preg_replace function:
echo preg_replace("/(e|o)/i", "<strong>$1</strong>", $str);
// H<strong>e</strong>ll<strong>o</strong> w<strong>o</strong>rld.
I am trying to take a textarea value and run it through a regular expression to split it into lines.
So if someone writes a line, presses Enter, writes another line and presses Enter, I will have an array with one line per array element.
The expression I've come up with so far is:
(.+?)\n|\G(.*)
and this is how I use it (from a website I use to test expressions, http://myregextester.com/):
$sourcestring="
this is a sentense yeaa
interesting sentense
yet another sentese
";
preg_match_all('/(.+?)\n|\G(.*)/',$sourcestring,$matches);
echo "<pre>".print_r($matches,true);
However, there is always one empty element in the array, and I am trying to find a way to get rid of it.
Thanks in advance.
You don't need a regex for this, just use explode(), like so:
$lines = explode( "\n", trim( $input));
Now each line of the user's $input will be a single array entry in $lines.
This will do and get rid of the empty lines in the beginning and end of the array
explode("\n", trim($sourcestring));
See example: http://viper-7.com/pNqtvV
There are various types of newlines. In HTML form context you'll typically receive CR LF for line endings. A dumb explode will do, but a regex will catch all variations if you use \R. Thus \r\n and \n or \r and others will be processed by:
$lines = preg_split(':\R:', $text);
preg_split() is the regex equivalent of PHP's explode(), so you don't need to use preg_match_all.
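A short sketch of the difference on input with mixed line endings. The sample text is made up, and trim() is added (as in the other answers) to drop leading and trailing blank lines.

```php
<?php
// Made-up input mixing Windows (\r\n), Unix (\n) and old Mac (\r) endings.
$text = "one\r\ntwo\nthree\r";

// explode() on "\n" leaves the carriage return attached to the first line.
$exploded = explode("\n", trim($text));

// \R matches any newline sequence, treating \r\n as a single unit.
$lines = preg_split('/\R/', trim($text));

var_dump($exploded[0] === "one\r"); // bool(true)
print_r($lines);
```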
I'm trying to capture the text "Capture This" in $string below.
$string = "</th><td>Capture This</td>";
$pattern = "/<\/th>\r.*<td>(.*)<\/td>$/";
preg_match ($pattern, $string, $matches);
echo($matches);
However, that just returns "Array". I also tried printing $matches using print_r, but that gave me "Array ( )".
This pattern will only come up once, so I just need it to match one time. Can somebody please tell me what I'm doing wrong?
The problem is that your pattern requires a CR character (\r), which the input string does not contain. You should also make the search inside the capturing group lazy and use print_r to output the array. Like this:
$pattern = "/<\/th>.*<td>(.*?)<\/td>$/";
You can see it in action here: http://codepad.viper-7.com/djRJ0e
Note that it's recommended to parse html with a proper html parser rather than using regex.
Two things:
You need to drop the \r from your regex as there is no carriage return character in your input string.
Change echo($matches) to print_r($matches) or var_dump($matches)
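Putting both points together, a minimal sketch of the corrected snippet might read as follows. Checking the return value of preg_match() before using $matches is an addition made here, not part of the original answers.

```php
<?php
$string = "</th><td>Capture This</td>";

// No \r in the pattern, and a lazy (.*?) inside the capturing group.
$pattern = '/<\/th>.*<td>(.*?)<\/td>$/';

// preg_match() returns 1 on a match; $matches[0] is the full match and
// $matches[1] is the first capturing group.
if (preg_match($pattern, $string, $matches) === 1) {
    echo $matches[1], "\n";
    print_r($matches);
}
```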