Regex to match open HTML tags except for self-contained tags

Matthew C.

The Problem

You want a regular expression (regex) to match opening HTML tags such as <div>, <form id="myForm">, and <h1>. The regex should not match self-contained (self-closing) tags such as <img />, <br />, and <input />.

The Solution

Self-closing tags do not exist in HTML. HTML elements that can't have any child nodes are void elements, and these elements don't have a closing tag. Self-closing tags, which contain a trailing slash character ("/") before the closing angle bracket, are required for void elements in XML-based languages such as XHTML and SVG. Some code formatters add a trailing slash to the start tag of an HTML void element to make it XHTML-compatible and to improve readability. Self-closing syntax can still be used when writing HTML, since HTML parsers ignore the trailing slash on void elements. These days HTML is used far more than XHTML: it's the most used markup language for websites.

Various regexes can be used to match open HTML tags and not self-contained tags. For example:

<([a-z]+)(?![^>]*\/>)[^>]*>

This regex does the following:

  • <: Match the opening angle bracket of an HTML tag.
  • ([a-z]+): Match one or more lowercase alphabetical characters (the start of the tag name) and capture them in a group.
  • (?![^>]*\/>): Negative lookahead that prevents matching self-closing tags. If the characters that follow are zero or more characters other than ">" and then "/>", the regex won't match.
  • [^>]*>: Match the rest of the tag: zero or more characters other than ">", followed by the closing ">" character.
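As a minimal sketch of how the pattern described above might be applied in JavaScript, the snippet below collects every matching opening tag in a string. The names `openTagRegex` and `findOpenTags` are illustrative and not from the original answer, and the `g` flag is an addition so that all matches are returned.

```javascript
// Pattern assembled from the breakdown above; the "g" flag is an
// addition so that matchAll() returns every match, not just the first.
const openTagRegex = /<([a-z]+)(?![^>]*\/>)[^>]*>/g;

// Hypothetical helper: returns the full text of each matched opening tag.
function findOpenTags(html) {
  return [...html.matchAll(openTagRegex)].map((match) => match[0]);
}

const sample =
  '<div><form id="myForm"><h1>Title</h1><img /><br /><input /></form></div>';

console.log(findOpenTags(sample));
// [ '<div>', '<form id="myForm">', '<h1>' ]
```

Closing tags such as `</h1>` are skipped because the character after "<" is not a lowercase letter, and the tags with a trailing slash are rejected by the negative lookahead.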

Using a regex to find HTML tags is not ideal as it can lead to incorrect matches. For example, if you use the above regex for the following HTML string:

<script>
  const myString = "<script></script>";
</script>
<div class="container">
  <!-- <img src="cat.jpg" alt="big cat" > -->
</div>

The regex will match the <script> and <div> HTML opening tags. However, it will also match two opening tags that are not actual DOM elements: the <script> tag inside the string assigned to the myString variable and the <img> tag inside the HTML comment.
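The false positives can be reproduced with a short sketch. It assumes the pattern from the breakdown above with the `g` flag added; the variable names `tagPattern`, `exampleHtml`, and `matchedTags` are illustrative.

```javascript
// Same pattern as described above, with the "g" flag added to collect all matches.
const tagPattern = /<([a-z]+)(?![^>]*\/>)[^>]*>/g;

// The example HTML string from above, stored in a template literal.
const exampleHtml = `
<script>
  const myString = "<script></script>";
</script>
<div class="container">
  <!-- <img src="cat.jpg" alt="big cat" > -->
</div>`;

const matchedTags = [...exampleHtml.matchAll(tagPattern)].map((m) => m[0]);

console.log(matchedTags.length); // 4
console.log(matchedTags);
// The second '<script>' and the '<img ... >' entry are the false positives:
// one is text inside a JavaScript string, the other is inside a comment.
```

The regex only sees characters, so it has no way to tell that a tag sits inside a string literal or a comment.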

A better approach is to use an HTML parser library such as Cheerio.
