How can I get constant name and constant value in file


I have files containing text and constants. How can I get each constant's name and value? The problem is that the constant value sometimes contains spaces. I read the files with file() and then loop over the lines with foreach.
Example:
define('LNG_UTF-8', 'Universal Alphabet (UTF-8)');
define('LNG_ISO-8859-1', 'Western Alphabet (ISO-8859-1)');
define('LNG_CustomFieldsName', 'Custom Field');
define('LNG_CustomFieldsType', 'Type');
I already tried:
to get constant name:
$getConstant1 = strpos($mainFileArray[$i], '(\'');
$getConstant2 = strpos($mainFileArray[$i], '\',');
$const = substr($mainFileArray[$i], $getConstant1 + 2, $getConstant2 - $getConstant1 - 2);
to get constant value
$position1 = strpos($file[$i], '\',');
$position2 = strpos($file[$i], '\');');
$rest = substr($file[$i], $position1 + 3, $position2 - $position1 - 2);
but this does not work when the value contains a space or ','.
How can I make this work in every case?

A regular expression that matches these lines would be:
preg_match("/define\(\s*'([^']*)'\s*,\s*'([^']*)'\s*\)/i", $line, $match);
echo $match[1], $match[2];
See http://rubular.com/r/m9plE2qQeT.
However, that only works if the strings are single-quoted ('), contain no escaped quotes, are not concatenated, and so on. For example, these would break it:
define('LNG_UTF-8', "Universal Alphabet (UTF-8)");
define('LNG_UTF-8', 'Universal \'Alphabet\' (UTF-8)');
define('LNG_UTF-8', 'Universal Alphabet ' . '(UTF-8)');
// and many similar
To make at least the first two work, you should use token_get_all to parse the PHP file according to PHP parsing rules, then go through the resulting tokens and extract the values you need.
To make all cases work, you need to actually evaluate the PHP code, i.e. include the file and then simply access the constants as constants in PHP.
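The token-based approach could look roughly like the sketch below. The helper names extractDefines() and decodeStringLiteral() are made up for this illustration, and the sketch assumes every define() call takes exactly two plain string literals; anything else (such as concatenation) is skipped rather than evaluated.

```php
<?php
// Hypothetical helper: turn a quoted string literal token into its value.
function decodeStringLiteral($literal)
{
    $body = substr($literal, 1, -1); // strip the surrounding quotes
    if ($literal[0] === "'") {
        // Single-quoted PHP strings only recognise \\ and \' as escapes.
        return strtr($body, array('\\\\' => '\\', '\\\'' => '\''));
    }
    // Double-quoted without variables: stripcslashes() handles common
    // escapes such as \" \n and \t (an approximation, not every PHP escape).
    return stripcslashes($body);
}

// Hypothetical helper: collect define('NAME', 'VALUE') pairs from PHP source.
function extractDefines($source)
{
    // Drop whitespace and comments so the token sequence is easy to match.
    $skip = array(T_WHITESPACE, T_COMMENT, T_DOC_COMMENT);
    $tokens = array();
    foreach (token_get_all($source) as $token) {
        if (is_array($token) && in_array($token[0], $skip, true)) {
            continue;
        }
        $tokens[] = $token;
    }

    $constants = array();
    $count = count($tokens);
    for ($i = 0; $i + 5 < $count; $i++) {
        $t = $tokens[$i];
        if (!is_array($t) || $t[0] !== T_STRING || strtolower($t[1]) !== 'define') {
            continue;
        }
        // Expected shape: define ( 'NAME' , 'VALUE' )
        $name = $tokens[$i + 2];
        $value = $tokens[$i + 4];
        if ($tokens[$i + 1] === '('
            && is_array($name) && $name[0] === T_CONSTANT_ENCAPSED_STRING
            && $tokens[$i + 3] === ','
            && is_array($value) && $value[0] === T_CONSTANT_ENCAPSED_STRING
            && $tokens[$i + 5] === ')'
        ) {
            $constants[decodeStringLiteral($name[1])] = decodeStringLiteral($value[1]);
        }
    }
    return $constants;
}

$source = <<<'PHP'
<?php
define('LNG_UTF-8', "Universal Alphabet (UTF-8)");
define('LNG_Quoted', 'Universal \'Alphabet\' (UTF-8)');
define('LNG_Concat', 'Universal Alphabet ' . '(UTF-8)');
PHP;

$constants = extractDefines($source);
print_r($constants);
```

The first two definitions are picked up; the concatenated one is ignored, which is the case where actually including the file is the only reliable option.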

Once the file has been included, you should use the get_defined_constants() function for that. It returns an associative array with the names of all the constants and their values.
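A minimal sketch of that approach follows. The constants are defined inline here so the snippet is self-contained; in real use they would come from including the language file. Passing true as the argument groups the constants by origin, so the "user" entry contains only those created by your own code.

```php
<?php
// In real use these would come from an include of the language file.
define('LNG_UTF-8', 'Universal Alphabet (UTF-8)');
define('LNG_CustomFieldsName', 'Custom Field');

// true => categorised result; "user" holds constants set via define()/const.
$all = get_defined_constants(true);
$userConstants = $all['user'];

foreach ($userConstants as $name => $value) {
    echo $name, ' = ', $value, PHP_EOL;
}
```

Names such as LNG_UTF-8 are not valid identifiers, so a single one can only be read back with constant('LNG_UTF-8'), not by writing the name directly.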

Related

Is variable's value back-slashing available in PHP?

When we check:
dir1/dir2/../file.txt ==== this is same as =====> dir1/file.txt
I would like to know whether something similar is available in PHP, for example:
$name= "Hello ". $variable . "World";
If I had $variable = "../Hi" (or anything like that), would it remove the previous part (like backspacing) and print Hi World?
(P.S. I don't control the PHP file; I am asking how attackers could achieve that.)
(P.S. 2: I have no words for the downvoters who closed this. I think you have trouble analysing questions before you close them.)

In PHP there is no special ../ (or any other string) that, when concatenated to another string, produces anything other than the original string followed by the new string. Concatenation, regardless of the content of the strings, always results in:
"<String1><String2>" = "<String1>"."<String2>";
Nothing will 'erase' prior tokens in a string or anything like that, so concatenation by itself is completely harmless.
Caveat: of course, the result is not harmless if the string is then used somewhere that interprets it in a specific way, where any character or group of characters in ../ is treated specially, such as:
In a string used as a regex pattern
In a string used as a file path (in that case, when it's evaluated, it will do exactly what you'd expect if you had typed it)
In a string used in an SQL query without proper escaping (proper escaping meaning, for example, binding params/values via prepared statements)
etc...
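A small sketch of the distinction, using made-up values: concatenation keeps ../ as ordinary characters, and only a consumer that treats the string as a path gives the dots any meaning.

```php
<?php
$variable = '../Hi';
$name = 'Hello ' . $variable . 'World';
echo $name, PHP_EOL; // Hello ../HiWorld

// The same applies to paths: the string itself is unchanged by concatenation.
$path = 'dir1/dir2/' . '../file.txt';
echo $path, PHP_EOL;           // dir1/dir2/../file.txt
echo basename($path), PHP_EOL; // file.txt
```

Only when $path is handed to the filesystem (fopen(), include and so on) is dir2/.. resolved, which is where path traversal attacks come from.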
Now, suppose you want to remove the word prior to each occurrence of ../ at the start of a word in a sentence, sort of replicating how .. in a path means "go up one level" (in effect undoing the previous directory step in the path).
Here's a basic algorithm to start you out (if you are able to change the source code):
Use explode() with delimiter " " on the string.
Create a new array.
Iterate over the returned array; if an entry does not start with ../, append it to the new array.
If an entry starts with ../, remove the last element of the new array,
then append the entry, with its ../ removed, to the new array.
Once at the end of the array (all strings processed), implode() with delimiter " ".
Here's an example:
<?php
$variable = "../Hi";
$string = "Hello ". $variable . " World"; // Note: I added a space prior to the W
$arr = array();
foreach (explode(" ", $string) as $word) {
    if (substr($word, 0, 3) === "../") {
        // A leading ../ undoes the previous word, if there is one
        if (!empty($arr)) {
            array_pop($arr);
        }
        $arr[] = str_replace("../", "", $word);
    } else {
        $arr[] = $word;
    }
}
echo implode(" ", $arr); // Hi World

How to use character classes in searching a string in PHP

Using a for loop, I want to cycle through each character in a string and check to see if it is a certain letter. Let's say I want to search my string for my favorite letters -- A,C,D,O,V. Let's say I have a string, $giantButtText. Why does this result in no output on my standard output (given that $giantButtText does indeed contain those letters)?
if($giantButtText[$i] == "/[acdov]/") echo $giantButtText[$i];
Cheers!

You are trying to match $giantButtText[$i] against a regular expression, but == only compares it with the literal string "/[acdov]/"; it never evaluates the pattern.
The standard way to do this is preg_match() (http://php.net/manual/en/function.preg-match.php).
Something like this should work:
$a = array();
$a[0] = "dadov";
if (preg_match("/[acdov]/", $a[0])) echo "true";
-> true
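Applied to the loop from the question, this could look like the sketch below. The sample text and the $found variable are made up for the illustration, and the i modifier is added on the assumption that upper-case letters should match too.

```php
<?php
$text = 'Dave and Octavia';
$found = '';

for ($i = 0, $n = strlen($text); $i < $n; $i++) {
    // preg_match() returns 1 when the single character matches the class.
    if (preg_match('/[acdov]/i', $text[$i]) === 1) {
        $found .= $text[$i];
    }
}

echo $found, PHP_EOL; // DavadOcava
```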

Replacing variables in a string

I am working on a multilingual website in PHP, and in my language files I often have strings containing multiple variables that are filled in later to complete the sentences.
Currently I place {VAR_NAME} in the string and manually replace each occurrence with its matching value when the string is used.
So basically :
{X} created a thread on {Y}
becomes :
Dany created a thread on Stack Overflow
I have already thought of sprintf, but I find it inconvenient because it depends on the order of the variables, which can change from one language to another.
I have also checked "How replace variable in string with value in php?", and for now I basically use that method.
But I am interested in knowing whether there is a convenient way (built-in or not) to do this in PHP, considering that I already have variables named exactly X and Y as in the previous example; something like $$ for a variable variable.
So instead of calling str_replace on the string, I would call a function like so:
$X = 'Dany';
$Y = 'Stack Overflow';
$lang['example'] = '{X} created a thread on {Y}';
echo parse($lang['example']);
would also print out :
Dany created a thread on Stack Overflow
Thanks!
Edit
The strings serve as templates and can be used multiple times with different inputs.
So basically, doing "{$X} ... {$Y}" won't do the trick: I would lose the template, because the string would be initialized with the starting values of $X and $Y, which aren't yet determined.

I'm going to add an answer here because none of the current answers really cut the mustard in my view. I'll dive straight in and show you the code I would use to do this:
function parse(
    /* string */ $subject,
    array $variables,
    /* string */ $escapeChar = '#',
    /* string */ $errPlaceholder = null
) {
    $esc = preg_quote($escapeChar);
    $expr = "/
        $esc$esc(?=$esc*+{)
      | $esc{
      | {(\w+)}
    /x";
    $callback = function($match) use($variables, $escapeChar, $errPlaceholder) {
        switch ($match[0]) {
            case $escapeChar . $escapeChar:
                return $escapeChar;
            case $escapeChar . '{':
                return '{';
            default:
                if (isset($variables[$match[1]])) {
                    return $variables[$match[1]];
                }
                return isset($errPlaceholder) ? $errPlaceholder : $match[0];
        }
    };
    return preg_replace_callback($expr, $callback, $subject);
}
What does that do?
In a nutshell:
Create a regular expression using the specified escape character that will match one of three sequences (more on that below)
Feed that into preg_replace_callback(), where the callback handles two of those sequences exactly and treats everything else as a replacement operation.
Return the resulting string
The regex
The regex matches any one of these three sequences:
Two occurrences of the escape character, followed by zero or more occurrences of the escape character, followed by an opening curly brace. Only the first two occurrences of the escape character are consumed. This is replaced by a single occurrence of the escape character.
A single occurrence of the escape character followed by an opening curly brace. This is replaced by a literal open curly brace.
An opening curly brace, followed by one or more perl word characters (alpha-numerics and the underscore character) followed by a closing curly brace. This is treated as a placeholder and a lookup is performed for the name between the braces in the $variables array, if it is found then return the replacement value, if not then return the value of $errPlaceholder - by default this is null, which is treated as a special case and the original placeholder is returned (i.e. the string is not modified).
Why is it better?
To understand why it's better, let's look at the replacement approaches taken by other answers. With one exception (the only failing of which is compatibility with PHP<5.4 and slightly non-obvious behaviour), these fall into two categories:
strtr() - This provides no mechanism for handling an escape character. What if your input string needs a literal {X} in it? strtr() does not account for this, and it would be substituted for the value $X.
str_replace() - this suffers from the same issue as strtr(), and another problem as well. When you call str_replace() with an array argument for the search/replace arguments, it behaves as if you had called it multiple times - one for each of the array of replacement pairs. This means that if one of your replacement strings contains a value that appears later in the search array, you will end up substituting that as well.
To demonstrate this issue with str_replace(), consider the following code:
$pairs = array('A' => 'B', 'B' => 'C');
echo str_replace(array_keys($pairs), array_values($pairs), 'AB');
Now, you'd probably expect the output here to be BC, but it will actually be CC. This is because the first iteration replaced A with B, so in the second iteration the subject string was BB, and both of these occurrences of B were replaced with C.
This issue also betrays a performance consideration that might not be immediately obvious: because each pair is handled separately, the cost grows linearly with the number of replacement pairs. For each pair, the entire string is searched and that single replacement applied. If you had a very large subject string and a lot of replacement pairs, that's a sizeable operation going on under the bonnet.
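For comparison, here is a small sketch that runs the same pairs through both functions. The variable names are made up for the illustration. strtr() with an array does not rescan text it has already replaced, so it does not show the cascading effect (it still has no notion of an escape character, though).

```php
<?php
$pairs = array('A' => 'B', 'B' => 'C');

// One pass per pair: the B produced by the first pass is replaced again.
$sequential = str_replace(array_keys($pairs), array_values($pairs), 'AB');

// Replaced text is never searched again.
$singlePass = strtr('AB', $pairs);

echo $sequential, PHP_EOL; // CC
echo $singlePass, PHP_EOL; // BC
```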
Arguably this performance consideration is a non-issue - you would need a very large string and a lot of replacement pairs before you got a meaningful slowdown, but it's still worth remembering. It's also worth remembering that regex has performance penalties of its own, so in general this consideration shouldn't be included in the decision-making process.
Instead we use preg_replace_callback(). This visits any given part of the string looking for matches exactly once, within the bounds of the supplied regular expression. I add this qualifier because if you write an expression that causes catastrophic backtracking then it will be considerably more than once, but in this case that shouldn't be a problem (to help avoid this I made the only repetition in the expression possessive).
We use preg_replace_callback() instead of preg_replace() to allow us to apply custom logic while looking for the replacement string.
What this allows you to do
The original example from the question
$X = 'Dany';
$Y = 'Stack Overflow';
$lang['example'] = '{X} created a thread on {Y}';
echo parse($lang['example']);
This becomes:
$pairs = array(
    'X' => 'Dany',
    'Y' => 'Stack Overflow',
);
$lang['example'] = '{X} created a thread on {Y}';
echo parse($lang['example'], $pairs);
// Dany created a thread on Stack Overflow
Something more advanced
Now let's say we have:
$lang['example'] = '{X} created a thread on {Y} and it contained {X}';
// Dany created a thread on Stack Overflow and it contained Dany
...and we want the second {X} to appear literally in the resulting string. Using the default escape character of #, we would change it to:
$lang['example'] = '{X} created a thread on {Y} and it contained #{X}';
// Dany created a thread on Stack Overflow and it contained {X}
OK, looks good so far. But what if that # was supposed to be a literal?
$lang['example'] = '{X} created a thread on {Y} and it contained ##{X}';
// Dany created a thread on Stack Overflow and it contained #Dany
Note that the regular expression has been designed to only pay attention to escape sequences that immediately precede an opening curly brace. This means that you don't need to escape the escape character unless it appears immediately in front of a placeholder.
A note about the use of an array as an argument
Your original code sample uses variables named the same way as the placeholders in the string. Mine uses an array with named keys. There are two very good reasons for this:
Clarity and security - it's much easier to see what will end up being substituted, and you don't risk accidentally substituting variables you don't want to be exposed. It wouldn't be much good if someone could simply feed in {dbPass} and see your database password, now would it?
Scope - it's not possible to import variables from the calling scope unless the caller is the global scope. This makes the function useless if called from another function, and importing data from another scope is very bad practice.
If you really want to use named variables from the current scope (and I do not recommend this due to the aforementioned security issues) you can pass the result of a call to get_defined_vars() to the second argument.
A note about choosing an escape character
You'll notice I chose # as the default escape character. You can use any character (or sequence of characters, it can be more than one) by passing it to the third argument - and you may be tempted to use \ since that's what many languages use, but hold on before you do that.
The reason you don't want to use \ is because many languages use it as their own escape character, which means that when you want to specify your escape character in, say, a PHP string literal, you run into this problem:
$lang['example'] = '\\{X}'; // results in {X}
$lang['example'] = '\\\{X}'; // results in \Dany
$lang['example'] = '\\\\{X}'; // results in \Dany
It can lead to a readability nightmare, and some non-obvious behaviour with complex patterns. Pick an escape character that is not used by any other language involved (for example, if you are using this technique to generate fragments of HTML, don't use & as an escape character either).
To sum up
What you are doing has edge-cases. To solve the problem properly, you need to use a tool capable of handling those edge-cases - and when it comes to string manipulation, the tool for the job is most often regex.

Here's a portable solution, using variable variables. Yay!
$string = "I need to replace {X} and {Y}";
$X = 'something';
$Y = 'something else';
preg_match_all('/\{(.*?)\}/', $string, $matches);
foreach ($matches[1] as $value)
{
$string = str_replace('{'.$value.'}', ${$value}, $string);
}
First you set up your string, and your replacements. Then, you perform a regular expression to get an array of matches (strings within { and }, including those brackets). Finally, you loop around these and replace those with the variables you created above, using variable variables. Lovely!
Just thought I'd update this with another option even though you've marked it as correct. You don't have to use variable variables, and an array can be used in its place.
$map = array(
'X' => 'something',
'Y' => 'something else'
);
preg_match_all('/\{(.*?)\}/', $string, $matches);
foreach ($matches[1] as $value)
{
$string = str_replace('{'.$value.'}', $map[$value], $string);
}
That would allow you to create a function with the following signature:
public function parse($string, $map); // Probably what I'd do tbh
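A function with that shape might be sketched as follows. It is written as a standalone function named parseTemplate() (a name chosen for this illustration), and it swaps the loop for preg_replace_callback() so each placeholder is looked up once. Leaving unknown placeholders untouched is an assumption about the desired behaviour.

```php
<?php
// Hypothetical standalone variant of parse($string, $map).
function parseTemplate($string, array $map)
{
    return preg_replace_callback(
        '/\{(\w+)\}/',
        function ($match) use ($map) {
            // Unknown placeholders are returned unchanged.
            return array_key_exists($match[1], $map) ? $map[$match[1]] : $match[0];
        },
        $string
    );
}

$map = array(
    'X' => 'something',
    'Y' => 'something else',
);
$result = parseTemplate('I need to replace {X} and {Y} but not {Z}', $map);
echo $result, PHP_EOL; // I need to replace something and something else but not {Z}
```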
Another option thanks to toolmakersteve in the comments does away with the need for a loop and uses strtr, but requires minor additions to the variables and single quotes instead of double quotes:
$string = 'I need to replace {$X} and {$Y}';
$map = array(
'{$X}' => 'something',
'{$Y}' => 'something else'
);
$string = strtr($string, $map);

If you're running PHP 5.4 or later and you care about being able to use PHP's built-in variable interpolation in the string, you can use the bindTo() method of Closure like so:
// Strings use interpolation, but have to return themselves from an anon func
$strings = [
    'en' => [
        'message_sent' => function() { return "You just sent a message to $this->recipient that said: $this->message."; }
    ],
    'es' => [
        'message_sent' => function() { return "Acabas de enviar un mensaje a $this->recipient que dijo: $this->message."; }
    ]
];
class LocalizationScope {
    private $data;
    public function __construct($data) {
        $this->data = $data;
    }
    public function __get($param) {
        if(isset($this->data[$param])) {
            return $this->data[$param];
        }
        return '';
    }
}
// Bind the string anon func to an object of the array data passed in and invoke (returns string)
function localize($stringCb, $data) {
    return $stringCb->bindTo(new LocalizationScope($data))->__invoke();
}
// Demo
foreach($strings as $str) {
    var_dump(localize($str['message_sent'], array(
        'recipient' => 'Jeff Atwood',
        'message' => 'The project should be done in 6 to 8 weeks.'
    )));
}
//string(93) "You just sent a message to Jeff Atwood that said: The project should be done in 6 to 8 weeks."
//string(95) "Acabas de enviar un mensaje a Jeff Atwood que dijo: The project should be done in 6 to 8 weeks."
Perhaps it feels a bit hacky, and I don't particularly like using $this in this instance. But you do get the added benefit of relying on PHP's variable interpolation (which allows you to do things like escaping, which are difficult to achieve with regex).
EDIT: Added LocalizationScope, which adds another benefit: no warnings if localization anonymous functions try to access data that was not provided.

strtr is probably a better choice for this kind of thing, because it replaces the longest keys first:
$data = array(
    'X' => 'Dany',
    'Y' => 'Stack Overflow',
);
$repls = array();
foreach($data as $key => $value)
    $repls['{' . $key . '}'] = $value;
$result = strtr($text, $repls);
(think of situations where you have keys like XX and X)
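A short sketch of that situation, with made-up keys and replacement values: strtr() tries the longest key at each position, while str_replace() works through the search array in order.

```php
<?php
$text = 'XX and X';

// Longest key wins at each position, so XX is treated as one unit.
$withStrtr = strtr($text, array('X' => '1', 'XX' => '2'));

// X is replaced everywhere first, so XX is never found afterwards.
$withStrReplace = str_replace(array('X', 'XX'), array('1', '2'), $text);

echo $withStrtr, PHP_EOL;      // 2 and 1
echo $withStrReplace, PHP_EOL; // 11 and 1
```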
And if you don't want to build the array by hand and would rather expose all variables from the current scope, use this as the source of the key/value pairs:
$data = get_defined_vars();

If your only issue with sprintf is the order of the arguments you can use argument swapping.
From the doc (http://php.net/manual/en/function.sprintf.php):
$format = 'The %2$s contains %1$d monkeys';
echo sprintf($format, $num, $location);
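Applied to the sentence from the question, argument swapping could be sketched like this. The 'alt' entry is a made-up stand-in for a language with a different word order. The formats are single-quoted on purpose, so PHP does not try to interpolate $s.

```php
<?php
$formats = array(
    'en'  => '%1$s created a thread on %2$s',
    'alt' => 'On %2$s, a thread was created by %1$s',
);

$lines = array();
foreach ($formats as $lang => $format) {
    // Same argument order for every language; the format decides placement.
    $lines[$lang] = sprintf($format, 'Dany', 'Stack Overflow');
    echo $lines[$lang], PHP_EOL;
}
```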

gettext is a widely used universal localization system that does exactly what you want.
There are libraries for most programming languages and PHP has a built-in engine.
It is driven by po-files, simple text based format, for which there are many editors around and it is compatible with sprintf syntax.
It even has some functions to deal with things like complicated plurals that some languages have.
Here are some examples of what it does. Note that _() is an alias for gettext():
echo _('Hello world'); // will output hello world in the current selected language
echo sprintf(_("%s has created a thread on %s"), $name, $site); // translates the string, and hands it over to sprintf()
echo sprintf(_('%2$s has created a thread on %1$s'), $site, $name); // same as above, but with changed order of parameters (single quotes, so that $s is not interpolated as a variable).
If you have more than a handful of strings, you should definitely use an existing engine, rather than writing your own one.
Adding a new language is just a matter of translating a list of strings and most professional translation tools can work with this file format, too.
Check Wikipedia and the PHP documentation for a basic overview on how this works:
http://en.wikipedia.org/wiki/Gettext
http://de.php.net/gettext
Google finds heaps of documentation and your favourite software repository will most likely have a handful of tools for managing po-files.
Some that I have used are:
poedit: Very light and simple. Good if you don't have too much stuff to translate and don't want to spend time thinking about how that stuff works.
Virtaal: A bit more complex and has a bit of a learning curve, but also some nice features that make your life easier. Good if you need to translate a lot.
GlotPress is a web application (from the wordpress people) that allows collaborative editing of the translation database files.

Why not use str_replace then? If you want it as template.
echo str_replace(array('{X}', '{Y}'), array($X, $Y), $lang['example']);
for every occurrence of this that you need
str_replace was built for this in the first place.

How about defining the "variable" parts as an array with keys corresponding to the placeholders in your string?
$string = "{X} created a thread on {Y}";
$values = array(
'X' => "Danny",
'Y' => "Stack Overflow",
);
echo str_replace(
array_map(function($v) { return '{'.$v.'}'; }, array_keys($values)),
array_values($values),
$string
);

Why can't you just use the template string within a function?
function threadTemplate($x, $y) {
return "{$x} created a thread on {$y}";
}
echo threadTemplate($foo, $bar);

Simple:
$X = 'Dany';
$Y = 'Stack Overflow';
$lang['example'] = "{$X} created a thread on {$Y}";
Hence:
echo $lang['example'];
Will output:
Dany created a thread on Stack Overflow
As you requested.
UPDATE:
As per the OP's comments about making the solution more portable:
Have a class do the parsing for you each time:
class MyParser {
    function parse($x, $y) {
        return "{$x} created a thread on {$y}";
    }
}
That way, if the following occurs:
$X = 3;
$Y = 4;
$a = new MyParser();
$lang['example'] = $a->parse($X, $Y);
echo $lang['example'];
Which will return:
3 created a thread on 4;
And, double checking:
$X = 'Steve';
$Y = 10.9;
$lang['example'] = $a->parse($X, $Y);
Will print:
Steve created a thread on 10.9;
As desired.
UPDATE 2:
As per the OP's comments about improving portability:
class MyParser {
function parse($vstr) {
return "{$vstr}";
}
}
$a = new MyParser();
$X = 3;
$Y = 4;
$vstr = "{$X} created a thread on {$Y}";
$lang['example'] = $a->parse($vstr);
echo $lang['example'];
Will output the results cited previously.

Try
$lang['example'] = "$X created a thread on $Y";
EDIT: Based on latest info
Maybe you need to look at the sprintf() function
Then you could have your template string defined as this
$template_string = '%s created a thread on %s';
$X = 'Fred';
$Y = 'Sunday';
echo sprintf( $template_string, $X, $Y );
$template_string does not change but later in your code when you have assigned different values to $X and $Y you can still use the echo sprintf( $template_string, $X, $Y );
See PHP Manual

Just throwing in another solution using associative arrays. This loops through the associative array and either replaces each placeholder with its value or blanks it out.
example:
$list = array();
$list['X'] = 'Dany';
$list['Y'] = 'Stack Overflow';
$str = '{X} created a thread on {Y}';
$newstring = textReplaceContent($str,$list);
function textReplaceContent($contents, $list) {
foreach ($list as $key => $val) { // each() was removed in PHP 8
$key = "{" . $key . "}";
if ($val) {
$contents = str_replace($key, $val, $contents);
} else {
$contents = str_replace($key, "", $contents);
}
}
$final = preg_replace('/\[\w+\]/', '', $contents);
return ($final);
}

Why is ${0x0} correct?

The following code is working perfectly:
${0x0} = 'test';
echo ${0x0}; // prints "test"
But I can't figure out why. 0x0 (or 0, as non-hex people call it) is a random container, it could have been any number, but php variables can't start with a number. What's so special about the { } used here, and what are their limitations ?

First of all, 0x0 is just a regular 0 in hexadecimal representation that gets cast to string '0' when used with the variable variable syntax:
var_dump(0x0===0); // prints "bool(true)"
${0x0} = 'test';
echo ${0x0}; // prints "test"
echo ${'0'}; // prints "test" as well
var_dump(get_defined_vars()); // contains ["0"] => string(4) "test"
You're correct when you say that it isn't a valid variable name:
Variable names follow the same rules as other labels in PHP. A valid
variable name starts with a letter or underscore, followed by any
number of letters, numbers, or underscores. As a regular expression,
it would be expressed thus: '[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*'
This is the reason why $0foo = 'Test'; triggers a parse error.
Some quick testing with the variable variables syntax reveals that, in fact, PHP does not seem to really care about variable names as far as they are strings:
${'123 abc xyz '} = 'Test';
echo ${'123 abc xyz '}; // Test
echo ${'123 abc xyz'}; // different name (no trailing space): PHP Notice: Undefined variable: 123 abc xyz in ...
var_dump(get_defined_vars()); // ["123 abc xyz "] => string(4) "Test"
My guess is that the aforementioned naming restriction is imposed by the source code parser rather than the language core. It needs such rules to tell variables apart when analysing PHP code. Internally, the Zend engine that powers PHP handles variables as a hash map:
PHP variables, in general, consist out of two things: The label, which
might, for instance, be an entry in a symbol table, and the actual
variable container.
So as far as it receives a valid string for the label, it's happy.

From the documentation:
Curly braces may also be used, to clearly delimit the property name. They are most useful when accessing values within a property that contains an array, when the property name is made of multiple parts, or when the property name contains characters that are not otherwise valid (e.g. from json_decode() or SimpleXML).
To me this implies that if you use ${...}, there are no limitations regarding what characters may be used in a variable name. Whether you should however...

The PHP parser provides a special syntax to create a variable name from any expression that returns a string (or can be cast to a string), e.g.:
<?php
define('A', 'aaa');
${' _ '} = 'blah';
${'test' . A . (2 + 6)} = 'var';
echo ${' _ '}; // blah
echo ${'testaaa8'}; // var
${'123'} = 'blah';
echo ${100 + 23}; // blah
function returnVarName() {
return 'myVar';
}
$myVar = 12;
echo ${returnVarName()}; // 12
This syntax is also available for object properties:
$object->{' some property ... with strage name'};
0x0 is just a hex representation of 0 literal.

In other words everything within the curly braces in such cases is a string!
So 0x0 is indeed the hex version of 0, but here both are converted to strings! That is why ${0x0} or ${0} work, where $0 or $0x0 won't!

On top of what @Michael Robinson said, in your example this will also be valid:
${0x0} = 'test';
$var = "0";
echo $$var; // prints "test"

PHP Using str_word_count with strsplit to form array after x words

I've got a large string that I want to split into an array after every 50 words. I thought about using str_split to cut it, but realised that won't take the words into consideration; it just splits when it gets to x characters.
I've read about str_word_count but can't work out how to put the two together.
What I've got at the moment is:
$outputArr = str_split($output, 250);
foreach($outputArr as $arOut){
echo $arOut;
echo "<br />";
}
But I want to substitute that to form each item of the array at 50 words instead of 250 characters.
Any help will be much appreciated.

Assuming that str_word_count is sufficient for your needs¹, you can simply call it with 1 as the second parameter and then use array_chunk to group the words in groups of 50:
$words = str_word_count($string, 1);
$chunks = array_chunk($words, 50);
You now have an array of arrays; to join every 50 words together and make it an array of strings you can use
foreach ($chunks as &$chunk) { // important: iterate by reference!
    $chunk = implode(' ', $chunk);
}
unset($chunk); // break the reference so later use of $chunk cannot alter the last element
¹ Most probably it is not. If you want to get what most humans consider acceptable results when processing written language you will have to use preg_split with some suitable regular expression instead.
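One possible shape of the preg_split variant is sketched below. It assumes a "word" is simply any run of non-whitespace characters, which keeps apostrophes and punctuation attached to the word. The function name chunkWords() and the sample sentence are made up for the illustration.

```php
<?php
// Hypothetical helper: split text into strings of at most $wordsPerChunk words.
function chunkWords($text, $wordsPerChunk)
{
    // Split on any run of whitespace; /u treats the input as UTF-8.
    $words = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    $chunks = array();
    foreach (array_chunk($words, $wordsPerChunk) as $group) {
        $chunks[] = implode(' ', $group);
    }
    return $chunks;
}

$chunks = chunkWords("The quick brown fox jumps over the lazy dog's back.", 4);
print_r($chunks);
```

With a chunk size of 4 this yields three strings: "The quick brown fox", "jumps over the lazy" and "dog's back."; pass 50 for the case in the question.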

There's another way:
<?php
$someBigString = <<<SAMPLE
This, actually, is a nice' old'er string, as they said, "divided and conquered".
SAMPLE;
// change this to whatever you need to:
$number_of_words = 7;
$arr = preg_split("#([a-z]+[a-z'-]*(?<!['-]))#i",
$someBigString, $number_of_words + 1, PREG_SPLIT_DELIM_CAPTURE);
$res = implode('', array_slice($arr, 0, $number_of_words * 2));
echo $res;
I consider preg_split a better tool (than str_word_count) here. Not because the latter is inflexible (it is not: you can define what symbols can make up a word with its third param), but because preg_split will essentially stop processing the string after getting N items.
The trick, as is quite common with this function, is to capture the delimiters as well, then use them to reconstruct the string with the first N words (where N is given) AND the punctuation marks preserved.
(Of course, the regex used in my example does not strictly comply with str_word_count's locale-dependent behavior. But it still restricts the words to alphabetic characters, ' and -, with the latter two not allowed at the beginning or the end of any word.)
