Make sure string is a valid CSS ID name - php

I have a bunch of database records (without auto_increment IDs or anything else like that) rendered as a list, and it came to pass that I need to differentiate each of them with a unique id.
I could just add a running counter into the loop and be done with it, but unfortunately this ID needs to be cross-referenceable throughout the site, however the list is ordered or filtered.
Therefore I got an idea to include the record title as a part of the id (with a prefix so it doesn't clash with layout elements).
How could I transform a string into an id name in a foolproof way so that it never contains characters that would break the HTML or not work as valid CSS selectors?
For example:
Title ==> prefix_title
TPS Report 2010 ==> prefix_tps_report_2010
Mike's "Proposal" ==> prefix_mikes_proposal
#53: Míguèl ==> prefix_53_miguel
The titles are always diverse enough to avoid conflicts, and will never contain any non-western characters.
Thanks!

I needed a similar solution for Deeplink Anchors.
Found this useful:
function seoUrl($string) {
//Lower case everything
$string = strtolower($string);
//Make alphanumeric (removes all other characters)
$string = preg_replace("/[^a-z0-9_\s-]/", "", $string);
//Clean up multiple dashes or whitespaces
$string = preg_replace("/[\s-]+/", " ", $string);
//Convert whitespaces and underscore to dash
$string = preg_replace("/[\s_]/", "-", $string);
return $string;
}
Credit: https://stackoverflow.com/a/11330527/3010027
PS: WordPress users, have a look here: https://codex.wordpress.org/Function_Reference/sanitize_title

You don't need to use the HTML id attribute at all. You can use HTML5 data-* attributes to store user-defined values. Here is an example:
<ul class="my-awesome-list">
<!-- PHP code, begin a for/while loop on rows -->
<li data-rowtitle="<?php echo htmlspecialchars($row["title"], ENT_QUOTES); ?>">
<?php echo htmlspecialchars($row["title"]); ?>
</li>
<!-- PHP loop end -->
</ul>
Then, in your jQuery code, you can access the data-* values with the $.fn.data method:
$(".my-awesome-list li").each(function(){
var realTitle = $(this).data('rowtitle');
});
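A minimal PHP sketch of the rendering side, in case it helps. The helper name renderRow is hypothetical (it is not part of the answer above), and the sketch assumes the title is plain text. The point it illustrates is that the attribute value has to be HTML-escaped, otherwise a double quote in the title closes the attribute early.

```php
<?php
// Hypothetical helper (not from the original answer): renders one list item
// with the title stored in a data-* attribute.
function renderRow(string $title): string
{
    // ENT_QUOTES escapes both double and single quotes, so the value
    // cannot terminate the attribute prematurely.
    $safe = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
    return '<li data-rowtitle="' . $safe . '">' . $safe . '</li>';
}

echo renderRow('TPS "Report" 2010'), "\n";
```

The browser decodes the entities again, so jQuery's data('rowtitle') still sees the original title text.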

Looking at the W3 specs, ids and classes can contain:
only the characters [a-zA-Z0-9] (...) plus the hyphen (-) and the
underscore (_); they cannot start with a digit, or a hyphen followed by a digit
Note that some other characters are also accepted (I omitted them for simplicity). So I use this (in JavaScript):
// We must be careful not to replace into an invalid string,
// thus adding 'a' in some cases and doing a second replace.
string.replace(/(^-\d-|^\d|^-\d|^--)/,'a$1').replace(/[\W]/g, '-');
Note that you may end up with identical strings that were originally different: you would have to rely on a more advanced replace function if you have issues with this.
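For the exact output format shown in the question, a PHP version could look roughly like the sketch below. The function name titleToId and the accent map are assumptions made for illustration; the map is deliberately small and would need extending for other characters. Anything outside a-z and 0-9 collapses into a single underscore, and the prefix guarantees the id never starts with a digit.

```php
<?php
// Hypothetical helper: turns a record title into a prefixed, CSS-safe id.
function titleToId(string $title, string $prefix = 'prefix_'): string
{
    // Small hand-written transliteration map (an assumption, extend as needed).
    $map = [
        'á' => 'a', 'à' => 'a', 'â' => 'a', 'ä' => 'a',
        'é' => 'e', 'è' => 'e', 'ê' => 'e', 'ë' => 'e',
        'í' => 'i', 'ì' => 'i', 'î' => 'i', 'ï' => 'i',
        'ó' => 'o', 'ò' => 'o', 'ô' => 'o', 'ö' => 'o',
        'ú' => 'u', 'ù' => 'u', 'û' => 'u', 'ü' => 'u',
        'ñ' => 'n', 'ç' => 'c',
    ];
    $s = strtolower(strtr($title, $map));
    // Drop quotes and apostrophes so "Mike's" becomes "mikes", not "mike_s".
    $s = preg_replace('/[\'"]/', '', $s);
    // Collapse every other run of non-alphanumeric bytes into one underscore.
    $s = preg_replace('/[^a-z0-9]+/', '_', $s);
    return $prefix . trim($s, '_');
}

echo titleToId('#53: Míguèl'), "\n";
```

As with the JavaScript version, two different titles can still map to the same id, so a uniqueness check may be needed if the titles are less diverse than expected.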

Related

Replace multiple hex colors in string

I'm having a problem changing multiple hex colors into spans. My current code only changes one color. Any idea how to make it work for multiple colors?
function convertHexToSpan($name)
{
$name = preg_replace('/\*#([a-f\d]{6})(.*)\*[a-f\d]+/', "<span style='color:$1'>$2</span>", $name);
return $name;
}
$text = "#ff6600Hello #ff0000world";
$newText = convertHexToSpan($text);
OUTPUT SHOULD BE "<span style='color:#ff6600'>Hello</span><span style='color:#ff0000'>world</span>"
Updating your Regular Expression will get you most of the way there, but we have to make some assumptions that differ slightly from your original question.
If you use the following as the expression:
/(#[a-f\d]{6})([^ ]+)/
preg_replace already replaces every match in the string, so the pattern does not need to look ahead for the next color; that is why I removed the second hex search. This finds the # and 6 hex digits as a first group, then the next group is any run of characters that are not a space.
Note: I am assuming that you are trying to break on word boundaries, but will need to modify if that is not the case. I am also assuming you want to preserve the space between the words after conversion, but your example shows no space.
To remove the space between words, you would just need to modify the regex to also match any trailing spaces (and then they will get removed). The spaces have to be optional, otherwise the last word, which has no space after it, would not match:
/(#[a-f\d]{6})([^ ]+) */
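Put together, the function from the question could be sketched like this. It assumes the color always comes directly before a single word, and it uses an optional run of spaces (` *`) after the word rather than a mandatory one so that the final word is still matched. The `i` modifier is an addition to accept uppercase hex digits.

```php
<?php
function convertHexToSpan(string $text): string
{
    // Group 1: '#' plus six hex digits; group 2: the word up to the next space.
    // The trailing ' *' swallows any spaces that follow the word.
    return preg_replace(
        '/(#[a-f\d]{6})([^ ]+) */i',
        '<span style=\'color:$1\'>$2</span>',
        $text
    );
}

echo convertHexToSpan('#ff6600Hello #ff0000world'), "\n";
```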

preg_replace putting the final result in the wrong place

I'm having issues getting preg_replace to work right. I'm trying to create my own custom markdown. The replacement HTML is generated as expected; however, the user input ends up outside of the blockquote. Here is an example of what I am talking about.
Here's my code.
<?php
$user_input = '> My quote';
$syntax = array(
'/>\s+(.*?)/is'
);
$replace_with_html = array(
'<blockquote><h3>Quote</h3><p>$1</p></blockquote>'
);
$replaced = preg_replace($syntax, $replace_with_html, $user_input);
print($replaced);
Here's the user input.
> My quote
And here is the result.
<blockquote><h3>Quote</h3><p></p></blockquote>My quote
What I want is
<blockquote><h3>Quote</h3><p>My quote</p></blockquote>
As you can see, the user input is in the wrong place (at the end of the final HTML code). Is there a way to fix this and place it within the paragraph tags?
You don't need to make arrays, use this:
$user_input = '> My quote';
$syntax = '/>\s+(.*)/s';
$replace_with_html = '<blockquote><h3>Quote</h3><p>$1</p></blockquote>';
$replaced = preg_replace($syntax, $replace_with_html, $user_input);
print($replaced);
This works the same way:
$user_input = '> My quote';
$syntax = ['/>\s+(.*)/s'];
$replace_with_html = ['<blockquote><h3>Quote</h3><p>$1</p></blockquote>'];
$replaced = preg_replace($syntax, $replace_with_html, $user_input);
print($replaced);
Either way, you want the .* in the pattern to be greedy, so remove the ?.
Without this adjustment, you're only replacing the >\s+ part of the pattern.
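The difference can be seen side by side in a short sketch (variable names are chosen for illustration):

```php
<?php
$input = '> My quote';
$html  = '<blockquote><h3>Quote</h3><p>$1</p></blockquote>';

// Lazy: (.*?) is satisfied by zero characters, so only "> " is replaced
// and the quote text is left behind after the blockquote.
$lazy = preg_replace('/>\s+(.*?)/s', $html, $input);

// Greedy: (.*) consumes the rest of the input, so the text lands in $1.
$greedy = preg_replace('/>\s+(.*)/s', $html, $input);

echo $lazy, "\n", $greedy, "\n";
```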
That said, let me solve some problems that you haven't encountered yet...
How do you know where to stop quoting?
What if someone wants to use > to mean "greater than"?
Consider this new pattern and how it may help you tackle some future challenges:
/^>\s+(\S+(?:\s\S+)*)/m
This pattern matches (after > and one or more whitespace characters) one or more non-whitespace characters, optionally followed by any number of repetitions of: a single whitespace character (this can be a space/tab/return/newline) and then one or more non-whitespace characters.
Effectively this says, you want to continue matching "quote" text until there are 2 or more consecutive whitespace characters (or else to the end of the string).
This adjustment should give your users the ability to accurately and conveniently quote-format their text while leaving innocent > characters alone.
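A small sketch of that pattern in use; the wrapper name renderQuotes is hypothetical:

```php
<?php
// Hypothetical wrapper around the multiline quote pattern.
function renderQuotes(string $text): string
{
    return preg_replace(
        '/^>\s+(\S+(?:\s\S+)*)/m',
        '<blockquote><h3>Quote</h3><p>$1</p></blockquote>',
        $text
    );
}

echo renderQuotes('> My quote'), "\n";     // quote at the start of a line
echo renderQuotes('5 > 3 is true'), "\n";  // ">" mid-line is left alone
```

Because of the `m` modifier, `^` anchors at the start of every line, so a quote can begin on any line of a longer post, and two consecutive whitespace characters end the quoted text.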

Getting count of spaces before a placeholder with regex selects too many lines

I want to create a simple PHP template system where a placeholder in an HTML page gets replaced with dynamically generated content. I want to prefix each line of the dynamically created content with as many spaces as there are before my placeholder. The problem is that my regex for counting the spaces before the placeholder selects more than one line (see the graphic at the end).
This is a sample template with the placeholder. I need to count how many spaces are between the \n and the beginning of my placeholder. In this case, there are no spaces between the beginning of the line and the beginning of my placeholder.
<p>Something before</p>
<!--#::CONTENT#-->
<p>Something after</p>
And this is the regex I use for counting how many spaces are before the placeholder. I made a group that only gives me the spaces, without the \n and the placeholder. \s is an escaped space. I don't support tabs.
\n(\s*)<!--#::CONTENT#-->
As you can see in this example, there are three lines selected. Yellow is the part that matches my regex and orange is the group I want to get (only the spaces).
Now my question: Why does this regex select more than one line? I only allow \s 0 or more times between the newline and my placeholder. How can a \n in between match the regex? And what do I have to change to make it work?
Here is how I use the regex in my php page:
//THIS WILL BE THE DYNAMICALLY CREATED CONTENT OF MY PAGE
$pageContent = getIncludeContents('templates/test.php');
//THIS IS THE LAYOUT THAT IS THE SAME ON ALL PAGES
$layoutContent = getIncludeContents('templates/layout.php');
//Here I try to find how many spaces are before my placeholder
preg_match("/\\n(\\s*)<!--#::CONTENT#-->/", $layoutContent, $matches);
//Check if placeholder was found
if(count($matches) == 0 || count($matches) == 1) {
die('No MATCHES');
} else if(count($matches) == 2) {
$indent_space = $matches[1];
} else {
die('Too Many matches! BUG?');
}
//Now I add to every new line the spaces
$pageContent = str_replace("\n", "\n" . $indent_space, $pageContent);
//And finally I insert the dynamic content
echo str_replace("<!--#::CONTENT#-->", $pageContent, $layoutContent);
If my problem isn't explained clearly enough, please comment on my question.
It is because \s matches newlines too. To solve this problem use \h instead (for horizontal spaces).
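A minimal sketch of the difference, using a made-up layout string in which the placeholder is indented by four spaces and preceded by an empty line (the variable names are assumptions for illustration):

```php
<?php
// Hypothetical layout: an empty line, then the placeholder indented by 4 spaces.
$layout = "<div>\n\n    <!--#::CONTENT#-->\n</div>";

// \s also consumes the newline of the empty line above the placeholder.
preg_match('/\n(\s*)<!--#::CONTENT#-->/', $layout, $withS);

// \h only matches horizontal whitespace, so the capture stays on one line.
preg_match('/\n(\h*)<!--#::CONTENT#-->/', $layout, $withH);

// Indent every line of the generated content with the captured spaces.
$page     = "line1\nline2";
$indented = str_replace("\n", "\n" . $withH[1], $page);

echo strlen($withS[1]), "\n"; // 5: a newline plus four spaces
echo strlen($withH[1]), "\n"; // 4: only the four spaces
```

Note that \h also matches tabs; if tabs must not count, a literal space in the group (` *`) is the stricter choice.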
The \s shorthand character class is roughly equivalent to the character class [ \t\r\n\f], so it will match newlines if it can. If you want to capture only the spaces on the placeholder's own line, use this:
\n( *)<!--#::CONTENT#-->
Regarding "and orange is the group I want to get (only the spaces)":
\s is the whitespace class; it includes newlines, carriage returns, tabs, spaces, etc.
Use your same regex. After you get the whitespace, remove everything but the actual spaces, then take the length of that string to get your space count.

Different results between preg_replace & preg_match_all

I have a forum that supports hashtags. I'm using the following line to convert all hashtags into links. I'm using the (^|\(|\s|>) pattern to avoid picking up named anchors in URLs.
$str=preg_replace("/(^|\(|\s|>)(#(\w+))/","$1$2",$str);
I'm using this line to pick up hashtags to store them in a separate field when the user posts their message, this picks up all hashtags EXCEPT those at the start of a new line.
preg_match_all("/(^|\(|\s|>)(#(\w+))/",$Content,$Matches);
Using the m & s modifiers doesn't make any difference. What am I doing wrong in the second instance?
Edit: the input text could be plain text or HTML. Example of problem input:
#startoftextreplacesandmatches #afterwhitespacereplacesandmatches <b>#insidehtmltagreplacesandmatches</b> :)
#startofnewlinereplacesbutdoesnotmatch :(
Your replace operation has a problem which you have evidently not yet come across - it will allow unescaped HTML special characters through. The reason I know this is because your regex allows hashtags to be prefixed with >, which is a special character.
For that reason, I recommend you use this code to do the replacement, which will double up as the code for extracting the tags to be inserted into the database:
$hashtags = array();
$expr = '/(?:(?:(^|[(>\s])#(\w+))|(?P<notag>.+?))/';
$str = preg_replace_callback($expr, function($matches) use (&$hashtags) {
if (!empty($matches['notag'])) {
// This takes care of HTML special characters outside hashtags
return htmlspecialchars($matches['notag']);
} else {
// Handle hashtags
$hashtags[] = $matches[2];
return htmlspecialchars($matches[1]).'#'.htmlspecialchars($matches[2]).'';
}
}, $str);
After the above code has been run, $str will contain the modified string, properly escaped for direct output, and $hashtags will be populated with all the tags matched.
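If only the extraction step is needed (for example when storing tags at post time), a simpler sketch is shown below. The function name extractHashtags is hypothetical, and the pattern reuses the same prefix rule with the `m` modifier so that `^` also matches at the start of each line. It is meant as an illustration of the extraction alone, not as a diagnosis of the original preg_match_all call.

```php
<?php
// Hypothetical helper: returns the hashtag names without the leading '#'.
function extractHashtags(string $content): array
{
    // A tag must follow the start of a line, "(", ">" or whitespace,
    // which keeps URL fragments such as page#anchor out of the result.
    preg_match_all('/(?:^|[(>\s])#(\w+)/m', $content, $matches);
    return $matches[1];
}

print_r(extractHashtags("#first #second <b>#third</b> :)\n#fourth :("));
```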

Automatically convert keywords to links in php

I am trying to convert specific keywords in a text, which are stored in an array, to links.
Example text:
$text='This text contains many keywords, but also formated keywords.'
So now I want to convert the word keywords to the #keywords.
I used the very simple preg_replace function
preg_replace('/keywords/i',' keywords ',$text);
but obviously it also converts the text that is already formatted as a link, so I get messy HTML like:
$text='This text contains many keywords, but also formated keywords" title="keywords">keywords</a>.'
Expected result:
$text='This text contains many keywords, but also formated keywords.'
Any suggestions?
THX
EDIT
We are one step from the perfect function, but still not working well in this case:
$text='This text contains many keywords, but also formated
keywords.'
In this case it replaces also the word keywords in the href, so we again get the messy code like
keywords.com/keywords" title="keywords">keywords</a>
I'm not great with regular expressions, but maybe this one will work:
/[^#>"]keywords/i
What I think it will do is ignore any instances of #keywords, >keywords, and "keywords and find the rest.
EDIT:
After testing it out, it looks like that replaces the space before the word as well, and doesn't work if keywords is the beginning of the string. It also didn't preserve original capitalization. I have tested this one, and it works perfectly for me:
$string = "Keywords and keywords, plus some more keywords with the original keywords.";
$string = preg_replace("/(?<![#>\"])keywords/i", "$0", $string);
echo $string;
The first three are replaced, preserving the original capitalization, and the last one is left untouched. This one uses a negative lookbehind and backreferences.
EDIT 2:
OP edited question. With the new example provided, the following regex will work:
$string = 'This text contains many keywords, but also formated keywords.';
$string = preg_replace("/(?<![#>\".\/])keywords/i", "$0", $string);
echo $string;
// outputs: This text contains many keywords, but also formated keywords.
This will replace all instances of keywords that are not preceded by #, >, ", ., or /.
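A sketch of the negative-lookbehind approach as a reusable function. The link markup in the replacement (`<a href="#...">`) is an assumption made for illustration, since the markup in the snippets above did not survive, and the function name linkKeyword is hypothetical.

```php
<?php
// Hypothetical helper: links a keyword unless it directly follows #, > or ".
function linkKeyword(string $text, string $keyword): string
{
    // preg_quote guards against regex metacharacters in the keyword.
    $pattern = '/(?<![#>"])' . preg_quote($keyword, '/') . '/i';
    // $0 is the matched text, so the original capitalization is preserved.
    return preg_replace($pattern, '<a href="#$0">$0</a>', $text);
}

echo linkKeyword('Keywords and keywords, plus #keywords.', 'keywords'), "\n";
```

preg_replace scans the input once, so the links it inserts are not matched again within the same call.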
Here is the problem:
The keyword could be inside the href, the title, or the text of the link, and anywhere in there (like if the keyword was sanity and you already had href="insanity"). Or even worse, you could have a non-keyword link that happens to contain a keyword, something like:
Click here to find more keywords and such!
In the above example, even though it fits every other possible criteria (it's got spaces before and after being the easiest one to test for), it still would result in a link within a link, which I think breaks the internet.
Because of this, you need to use lookaheads and lookbehinds to check if the keyword is wrapped in a link. But there is one catch: in PHP's PCRE, lookbehinds generally have to match a fixed length (meaning no open-ended wildcards such as * or +).
I thought I'd be the hero and show you the easy fix for your issue, which would be something to the effect of:
'/(?<!\<a.?>)(list|of|keywords)(?!\<\/a>)/'
Except you can't do that because the lookbehind in this case has that wildcard. Without it, you end up with a super greedy expression.
So my proposed alternative is to use regex to find all link elements, then str_replace to swap them out with placeholders, and then restore the original links from the placeholders at the end.
Here's how I did it:
$text='This text contains many keywords, but also formated keywords.';
$keywords = array('text', 'formatted', 'keywords');
// First, find every existing link element (lazy match, so each link is captured on its own)
preg_match_all('/<a\b[^>]*>.*?<\/a>/is', $text, $links);
$links = array_unique($links[0]); // Cleaning up array for next step.
// Second, swap out all matches with a placeholder, and build restore array:
$restore_links = array(); // Initialized so the merge below also works when there are no links
foreach($links as $count => $link) {
$link_key = "xxx_{$count}_xxx";
$restore_links[$link_key] = $link;
$text = str_replace($link, $link_key, $text);
}
// Third, we build a nice replacement array for the keywords:
$keyword_links = array();
foreach($keywords as $keyword) {
$keyword_links[$keyword] = "<a href='#$keyword'>$keyword</a>";
}
// Merge the restore links to the bottom of the keyword links for one mass replacement:
$keyword_links = array_merge($keyword_links, $restore_links);
$text = str_replace(array_keys($keyword_links), $keyword_links, $text);
echo $text;
You can change your regex so that it only targets keywords with a space in front, since the formatted keywords do not have a space directly in front of them. Here is an example:
$text = preg_replace('/ keywords/i',' keywords',$text);
