
Need to extend this regex URL/e-mail parser a bit further





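function make_clickable($text)
{
    // Temporary leading space, stripped again before returning
    $ret = ' ' . $text;

    // Plain replacement strings; the /e modifier is not needed (and was removed in PHP 7.0)
    // Full URLs with a scheme, e.g. http://example.com/page
    $ret = preg_replace("#(^|[\n ])([\w]+?://[\w]+[^ \"\n\r\t<]*)#is", "\\1<a href=\"\\2\" >\\2</a>", $ret);

    // Bare www./ftp. hosts get http:// prepended in the href
    $ret = preg_replace("#(^|[\n ])((www|ftp)\.[^ \"\t\n\r<]*)#is", "\\1<a target=\"_blank\" href=\"http://\\2\" >\\2</a>", $ret);

    // E-mail addresses become mailto: links
    $ret = preg_replace("#(^|[\n ])([a-z0-9&\-_\.]+?)@([\w\-]+\.([\w\-\.]+\.)*[\w]+)#i", "\\1<a href=\"mailto:\\2@\\3\">\\2@\\3</a>", $ret);

    // Remove the leading space added above
    $ret = substr($ret, 1);

    return $ret;
}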









as well as make sure that other domains like these still work:







I am not very fluent in regex at all and I stole this script from somewhere on the internet.





I know regex has its limitations and this may be one of them, but any help at all would be greatly appreciated. I also noticed that this site uses some nice JavaScript to parse URLs really well. It worked on every one of my "problem" domains except the one containing parentheses (). Can anyone show me where Stack Overflow's JS parser is? I was unable to locate it.
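One way to handle the URL with parentheses is to let the regex grab a generous candidate and then trim trailing punctuation in PHP code, keeping a closing `)` only when the URL also contains a matching `(`. This is a sketch under stated assumptions, not the original script: it assumes PHP 7.1 or later, uses `preg_replace_callback` (the usual replacement for the `/e` modifier), handles only `scheme://` URLs and bare `www.`/`ftp.` hosts (not e-mail addresses), and the function names `split_trailing` and `make_clickable_urls` are hypothetical.

```php
<?php
// Sketch only: a hypothetical replacement for the two URL patterns in
// make_clickable() that keeps balanced parentheses inside a link.

// Split trailing punctuation off a matched URL. A closing ")" is only
// treated as punctuation when the URL has more ")" than "(".
function split_trailing(string $url): array
{
    $trail = '';
    while ($url !== '') {
        $last = substr($url, -1);
        $isPunct = strpos('.,;:!?', $last) !== false;
        $isStrayParen = $last === ')'
            && substr_count($url, ')') > substr_count($url, '(');
        if (!$isPunct && !$isStrayParen) {
            break;
        }
        $trail = $last . $trail;
        $url = substr($url, 0, -1);
    }
    return [$url, $trail];
}

// Link scheme URLs (http://...) and bare www./ftp. hosts that follow
// the start of the text or whitespace. E-mail addresses are not handled.
function make_clickable_urls(string $text): string
{
    return preg_replace_callback(
        '#(^|\s)((?:[a-z][\w+.-]*://|www\.|ftp\.)[^\s"<]+)#i',
        function (array $m): string {
            [$url, $trail] = split_trailing($m[2]);
            // Bare hosts get http:// in the href, as in the original script.
            $hasScheme = preg_match('#^[a-z][\w+.-]*://#i', $url) === 1;
            $href = $hasScheme ? $url : 'http://' . $url;
            return $m[1] . '<a href="' . $href . '">' . $url . '</a>' . $trail;
        },
        $text
    );
}
```

For example, in `Read http://en.wikipedia.org/wiki/PHP_(programming_language). Or (see www.example.com).` the first link keeps its balanced `)` while the final `.` stays outside the link, and the second link drops both the stray `)` and the `.`. Doing the trimming in a callback is easier to read than trying to express the balance check inside the pattern itself. Like the original, this sketch does not HTML-escape the URL.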





One more question: I am doing this for a newspaper site, to automatically turn links that authors write in their stories into clickable links and e-mail addresses into mailto: links. I am thinking it might be better to use JavaScript and let each client render the links. But I also want this to be reliable, and cross-browser issues and extensions like NoScript could get in the way. Any thoughts?


Comments

  1. Seems to me you want to replace the leading (^|[\n ]) with \b, among a zillion other problems... For the second pattern, you could change [^ \"\n\r\t<]* to (\w|[^\w\s](?=\w|$))+, i.e. word characters plus any other non-space character that is followed by a word character or by the end of the text:

    $ret = preg_replace('#\b(www|ftp)\.(\w|[^\w\s](?=\w|$))+#i', '<a target="_blank" href="http://\\0" >\\0</a>', $ret);


    ... but that's just to get you started. It is no easy matter, and I'm not willing to put in the time to make it more foolproof ;)

  2. There's no way to make your current approach standards-compliant, and I can't be bothered either. Since you are just asking for the black-box/magic regex codez, a simple workaround would be a negative lookbehind assertion:

    (?<![.?;:)])


    Add that to your URL patterns right before the closing # delimiter and its modifiers, so they won't match those characters at the very end.
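To see what the suggested lookbehind does, here is a sketch that applies it to the first pattern of `make_clickable()` only. It assumes PHP 7 or later, so the `/e` modifier is dropped; `link_full_urls` is a hypothetical wrapper name, not part of the original script.

```php
<?php
// Sketch: the first pattern from make_clickable() with the suggested
// negative lookbehind (?<![.?;:)]) added just before the closing "#is".
// No /e modifier, since PHP 7.0 removed it.
// link_full_urls() is a hypothetical name used for illustration.
function link_full_urls(string $text): string
{
    // Temporary leading space, stripped again before returning.
    $ret = ' ' . $text;
    // The lookbehind makes the engine back off from a trailing . ? ; : or )
    // so those characters end up outside the link.
    $ret = preg_replace(
        "#(^|[\n ])([\w]+?://[\w]+[^ \"\n\r\t<]*)(?<![.?;:)])#is",
        "\\1<a href=\"\\2\">\\2</a>",
        $ret
    );
    return substr($ret, 1);
}
```

With it, `See http://example.com/page.` links `http://example.com/page` and leaves the final period as plain text. Because `)` is in the excluded set, though, `Go to http://en.wikipedia.org/wiki/Foo_(bar)` loses its closing parenthesis, which is the kind of `()` URL the question mentions.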
