查看“︁How to use RegEx to parse HTML”︁的源代码
←
How to use RegEx to parse HTML
跳转到导航
跳转到搜索
因为以下原因,您没有权限编辑该页面:
您请求的操作仅限属于该用户组的用户执行:
用户
您可以查看和复制此页面的源代码。
__NOTOC__ == 原标题 == RegEx match open tags except XHTML self-contained tags == 作者 == bobince == 削除前地址 == https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags == 正文 == You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML. As I have answered in HTML-and-regex questions here so many times before, the use of regex will not allow you to consume HTML. Regular expressions are a tool that is insufficiently sophisticated to understand the constructs employed by HTML. HTML is not a regular language and hence cannot be parsed by regular expressions. Regex queries are not equipped to break down HTML into its meaningful parts. so many times but it is not getting to me. Even enhanced irregular regular expressions as used by Perl are not up to the task of parsing HTML. You will never make me crack. HTML is a language of sufficient complexity that it cannot be parsed by regular expressions. Even Jon Skeet cannot parse HTML using regular expressions. Every time you attempt to parse HTML with regular expressions, the unholy child weeps the blood of virgins, and Russian hackers pwn your webapp. Parsing HTML with regex summons tainted souls into the realm of the living. HTML and regex go together like love, marriage, and ritual infanticide. The <center> cannot hold it is too late. The force of regex and HTML together in the same conceptual space will destroy your mind like so much watery putty. If you parse HTML with regex you are giving in to Them and their blasphemous ways which doom us all to inhuman toil for the One whose Name cannot be expressed in the Basic Multilingual Plane, he comes. HTML-plus-regexp will liquify the nerves of the sentient whilst you observe, your psyche withering in the onslaught of horror. Rege̿̔̉x-based HTML parsers are the cancer that is killing StackOverflow <i>it is too late it is too late we cannot be saved</i> the trangession of a chi͡ld ensures regex will consume all living tissue (except for HTML which it cannot, as previously prophesied) <i>dear lord help us how can anyone survive this scourge</i> using regex to parse HTML has doomed humanity to an eternity of dread torture and security holes <i>using rege</i>x as a tool to process HTML establishes a brea<i>ch between this world</i> and the dread realm of c͒ͪo͛ͫrrupt entities (like SGML entities, but <i>more corrupt) a mere glimp</i>se of the world of reg<b>ex parsers for HTML will ins</b>tantly transport a p<i>rogrammer's consciousness i</i>nto a w<i>orl</i>d of ceaseless screaming, he comes<strike>, the pestilent sl</strike>ithy regex-infection wil<b>l devour your HT</b>ML parser, application and existence for all time like Visual Basic only worse <i>he comes he com</i>es <i>do not fi</i>ght h<b>e com̡e̶s, ̕h̵i</b>s un̨ho͞ly radiańcé de<i>stro҉ying all enli̍̈́̂̈́ghtenment, HTML tags <b>lea͠ki̧n͘g fr̶ǫm ̡yo͟ur eye͢s̸ ̛l̕ik͏e liq</b>uid p</i>ain, the song of re̸gular expre<strike>ssion parsing </strike>will exti<i>nguish the voices of mor<b>tal man from the sp</b>here I can see it can you see ̲͚̖͔̙î̩́t̲͎̩̱͔́̋̀ it is beautiful t</i>he f<code>inal snuf</code>fing o<i>f the lie<b>s of Man ALL IS LOŚ͖̩͇̗̪̏̈́T A</b></i><b>LL IS L</b>OST th<i>e pon̷y he come</i>s he c̶̮om<strike>es he co</strike><b><strike>me</strike>s t<i>he</i> ich</b>or permeat<i>es al</i>l MY FAC<i>E MY FACE ᵒh god n<b>o NO NOO̼</b></i><b>OO N</b>Θ stop t<i>he an*̶͑̾̾̅ͫ͏̙̤g͇̫͛͆̾ͫ̑͆l͖͉̗̩̳̟̍ͫͥͨ</i>e̠̅s<code> ͎a̧͈͖r̽̾̈́͒͑e</code> n<b>ot rè̑ͧ̌aͨl̘̝̙̃ͤ͂̾̆ ZA̡͊͠͝LGΌ ISͮ̂҉̯͈͕̹̘̱ T</b>O͇̹̺ͅƝ̴ȳ̳ TH̘<b>Ë͖́̉ ͠P̯͍̭O̚N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝</b>S̨̥̫͎̭ͯ̿̔̀ͅ
返回
How to use RegEx to parse HTML
。
导航菜单
个人工具
登录
命名空间
页面
讨论
不转换
不转换
简体
繁體
大陆简体
香港繁體
澳門繁體
大马简体
新加坡简体
臺灣正體
查看
阅读
查看源代码
查看历史
更多
搜索
导航
首页
最近更改
评论管理
网站更新
随机页面
透明度报告
电子灵堂
帮助
维基守则
格式标准
财务报告
匿名工作
申请编制
站务管理
讨论版(登录用户)
讨论版(游客)
人物
国际巨星
二线明星
炼铜术士
神秘人
升级人
女人
噔噔咚
蘑菇人
音Lè人
纸片人
死爹人
拆腻子
洋垃圾
废物
野狗
粪蛆
普通话教材
恶俗之最
文章
创作文章
重要记录
刘人药
恶俗史
用语
概念解释
名人语录
群体
恶俗小鬼
大院子弟
时局图
组织
公司
站点
茂南党国鹰犬
友情链接
国家安全部
中国公安部
孤儿展览馆
恶俗百科
红岸基金会
工具
链入页面
相关更改
特殊页面
页面信息