HTML Entity Encoder/Decoder

Convert special characters to HTML entities for security. Prevent XSS attacks with proper encoding.

What is an HTML Entity Encoder/Decoder?

An HTML Entity Encoder/Decoder is a tool that converts special characters into their corresponding HTML entity representations and vice versa. HTML entities are used to display reserved characters in HTML that otherwise might be interpreted as code, or to display characters that aren't readily available on your keyboard. For example, the less-than symbol (<) is encoded as &lt; to prevent browsers from interpreting it as the start of an HTML tag. This tool is essential for web developers, content creators, and anyone working with HTML to ensure proper character display and prevent security vulnerabilities.
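
As a minimal sketch of the encode/decode round trip described above, the following uses Python's standard html module. Python is an assumption made for illustration; the tool itself may be implemented differently.

```python
import html

text = 'if (a < b && b > c) print("ok")'

# Encode: reserved characters become entity references.
encoded = html.escape(text)

# Decode: entity references become the original characters again.
decoded = html.unescape(encoded)

print(encoded)
print(decoded == text)
```

The encoded form is safe to place in HTML text, and decoding it gives back the input unchanged.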

Why HTML Entity Encoding Matters

HTML entity encoding is crucial for web security and proper content display. Without proper encoding, special characters in user input can lead to Cross-Site Scripting (XSS) attacks, where malicious code is injected into web pages. Encoding ensures that characters like <, >, &, and quotes are displayed as text rather than being executed as HTML or JavaScript. Additionally, HTML entities allow you to display characters that aren't on standard keyboards, support multiple languages, and ensure consistent character rendering across different browsers and platforms. Proper encoding is a fundamental practice in web development and content management.

Common HTML Entities

Character    Entity              Description
&            &amp;               Ampersand
<            &lt;                Less than
>            &gt;                Greater than
"            &quot;              Double quote
'            &#39; or &apos;     Apostrophe/Single quote
(blank)      &nbsp;              Non-breaking space
©            &copy;              Copyright
®            &reg;               Registered trademark
™            &trade;             Trademark
€            &euro;              Euro sign
£            &pound;             Pound sign
¥            &yen;               Yen sign

Types of HTML Entities

Named Entities: Human-readable names such as &amp;, &lt;, and &copy;. They are easy to remember, but only characters with a defined name have one. Example: &euro; for €.

Numeric Entities (Decimal): Use the character's Unicode code point in decimal, such as &#169; for ©. Any Unicode character can be written this way.

Hexadecimal Entities: Use the code point in hexadecimal after &#x, such as &#x00A9; (or &#xA9;) for ©. This form is convenient because Unicode references list code points in hex.
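
The three forms can be generated for any character from its code point. The sketch below is illustrative only: entity_forms is a hypothetical helper name, and the named form relies on Python's html.entities.codepoint2name table, which covers the HTML 4 names.

```python
import html
from html.entities import codepoint2name

def entity_forms(char: str) -> dict:
    """Hypothetical helper: named, decimal, and hexadecimal references for one character."""
    code = ord(char)
    name = codepoint2name.get(code)  # None when the character has no HTML 4 name
    return {
        "named": f"&{name};" if name else None,
        "decimal": f"&#{code};",
        "hex": f"&#x{code:X};",
    }

forms = entity_forms("©")
print(forms)

# All three references decode to the same character.
print(all(html.unescape(ref) == "©" for ref in forms.values()))
```

Leading zeros in the hex form are optional, so &#xA9; and &#x00A9; refer to the same character.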

When to Use HTML Entities

  • User Input: Always encode user-submitted content before displaying it to prevent XSS attacks.
  • Special Characters: Display reserved HTML characters (<, >, &, quotes) as text instead of code.
  • Unicode Characters: Show symbols, emojis, or characters from different languages that might not be in standard character sets.
  • Email Content: Ensure special characters display correctly across different email clients.
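  • XML/XHTML: Reserved characters must be encoded for the document to parse. Plain XML without a DTD predefines only &amp;, &lt;, &gt;, &quot;, and &apos;, so use numeric references for anything else.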
  • Meta Tags: Encode special characters in meta descriptions and titles for SEO.

Security Considerations

HTML entity encoding is a critical security measure against XSS (Cross-Site Scripting) attacks. When you don't encode user input, attackers can inject malicious scripts into your web pages. For example, if a user submits '<script>alert("hacked")</script>' and you display it without encoding, the script will execute. Proper encoding converts this to '&lt;script&gt;alert(&quot;hacked&quot;)&lt;/script&gt;', which displays as harmless text. Always encode user input on the server side, not just on the client side. Use the built-in functions of your programming language (such as htmlspecialchars() in PHP or html.escape() in Python) rather than writing your own encoding functions. JavaScript has no built-in HTML encoder (the deprecated escape() function performs URL-style encoding and does not protect against XSS); in the browser, assign untrusted text through textContent instead. Remember: never trust user input, always validate and encode.
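
The difference between the unencoded and encoded output can be sketched as follows. render_comment and the surrounding markup are hypothetical names chosen for illustration, not part of any particular framework.

```python
import html

def render_comment(comment: str) -> str:
    """Hypothetical helper: encode a user comment, then wrap it in markup."""
    return f'<p class="comment">{html.escape(comment)}</p>'

payload = '<script>alert("hacked")</script>'

# Unsafe: the payload is pasted into the page as-is, so the script tag survives.
unsafe = f'<p class="comment">{payload}</p>'

# Safe: the payload is encoded first, so the browser shows it as text.
safe = render_comment(payload)

print(unsafe)
print(safe)
```

Keeping the encoding step inside the function that builds the markup makes it harder to forget than encoding at each call site.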

Best Practices

  • Server-Side Encoding: Always encode on the server to prevent client-side bypass.
  • Context-Aware Encoding: Different contexts (HTML, JavaScript, CSS, URL) require different encoding.
  • Use Built-in Functions: Don't write your own encoders; use language-specific functions.
  • Double Encoding: Avoid encoding already-encoded text; it creates display issues.
  • Validate Input: Encoding is not a substitute for input validation; use both.
  • Content Security Policy: Combine encoding with CSP headers for defense in depth.
  • Database Storage: Store data in original form; encode only when displaying.
  • Test Thoroughly: Test with malicious inputs to verify encoding effectiveness.
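
To illustrate the context-aware encoding point from the list above, here is a rough sketch that encodes the same value for three different contexts using only Python's standard library. The sample value is made up for the example.

```python
import html
import json
from urllib.parse import quote

value = 'Tom & "Jerry" <b>'

html_text = html.escape(value)       # for HTML text or a quoted attribute value
url_part = quote(value, safe="")     # for a URL path segment or query value
json_text = json.dumps(value)        # for a string inside a JSON document

print(html_text)
print(url_part)
print(json_text)
```

Each output is safe only in its own context. For instance, json.dumps on its own does not make a value safe to embed inside an inline script block in an HTML page.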

Encoding Special Characters

HTML encoding replaces special characters with entity codes. The five essential characters to encode are: & (ampersand) becomes &amp;, < (less than) becomes &lt;, > (greater than) becomes &gt;, " (double quote) becomes &quot;, and ' (apostrophe) becomes &#39; or &apos;. These are the minimum characters you must encode to prevent HTML injection in element content and quoted attribute values. Note that &apos; is defined in XML and HTML5 but not in HTML 4, so &#39; is the more portable choice. Encoding characters outside the ASCII range (code points 128 and above) is optional when the page is served as UTF-8, but it helps when the output must pass through an ASCII-only channel or the charset is uncertain. Other contexts, such as JavaScript, CSS, and URLs, have their own encoding rules. Most programming languages provide a function that encodes the five essential characters for you.
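
A small sketch of these options with Python's html.escape follows. The sample string is invented, and the xmlcharrefreplace step is one possible way to get ASCII-only output, shown here as an optional extra rather than a requirement.

```python
import html

text = "\"Fish & Chips\" < £5 at Joe's café"

# quote=False encodes only &, <, and > (enough for element content).
body_safe = html.escape(text, quote=False)

# The default (quote=True) also encodes " and ' for use in attribute values.
attr_safe = html.escape(text)

# Optional: turn remaining non-ASCII characters into decimal references.
ascii_only = attr_safe.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(body_safe)
print(attr_safe)
print(ascii_only)
```

Note that html.escape writes the apostrophe as &#x27;, which is the hexadecimal equivalent of &#39;.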

Decoding HTML Entities

HTML entity decoding converts entity codes back into their original characters. This is useful when you need to process or display encoded text, or when migrating content between systems. However, be cautious when decoding user input - only decode trusted content, as decoding malicious encoded scripts can create security vulnerabilities. The decoding process recognizes both named entities (&copy;) and numeric entities (&#169; or &#x00A9;), converting them to their character equivalents. Modern browsers automatically decode entities in HTML content, but you might need manual decoding when processing data in JavaScript or backend code.
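
The following sketch shows decoding with Python's html.unescape and why decoded text needs care. The stored string is a made-up example.

```python
import html

stored = "&lt;b&gt;Caf&eacute; &#169; &#x2122;&lt;/b&gt;"

# Named, decimal, and hexadecimal references are all resolved in one pass.
decoded = html.unescape(stored)

# The decoded text now contains live markup characters. If it is going
# back into a page, encode it again instead of inserting it as-is.
reencoded = html.escape(decoded)

print(decoded)
print(reencoded)
```

Decoding is appropriate for processing or migrating text; the result should be treated like any other untrusted input when it is displayed.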

Special Cases & Edge Cases

  • ⚠️ Nested Encoding: Some systems double-encode, creating entities like &amp;lt; which displays as &lt;
  • ⚠️ Invalid Entities: Browsers may render invalid entities differently; always use correct syntax.
  • ⚠️ Whitespace: Non-breaking spaces (&nbsp;) prevent line breaks and are not collapsed, while runs of regular spaces collapse into one.
  • ⚠️ Case Sensitivity: Named entities are case-sensitive; &Copy; is different from &copy;
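  • ⚠️ Numeric Ranges: Not all numeric codes are valid. Under HTML5 parsing rules, &#0;, surrogate code points, and values above 0x10FFFF become the replacement character (U+FFFD), and most codes in the range 128-159 are remapped to Windows-1252 characters.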
  • ⚠️ Browser Quirks: Old browsers may not support all named entities; numeric is safer for compatibility.
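
Several of these cases can be observed with Python's html.unescape, which follows the HTML5 rules for character references. This is a sketch of one decoder's behaviour; other decoders and older browsers may differ.

```python
import html

# Nested encoding: one decode pass removes only one layer.
once = html.unescape("&amp;lt;")
twice = html.unescape(once)

# Case sensitivity: &copy; is defined; &Copy; is not and is left untouched.
lower = html.unescape("&copy;")
mixed = html.unescape("&Copy;")

# Numeric ranges: some code points are replaced or remapped.
nul = html.unescape("&#0;")             # replaced with U+FFFD
c1 = html.unescape("&#128;")            # remapped to the euro sign
too_big = html.unescape("&#x110000;")   # beyond the Unicode range, U+FFFD

print(repr(once), repr(twice))
print(repr(lower), repr(mixed))
print(repr(nul), repr(c1), repr(too_big))
```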

Tools & Libraries

JavaScript: Assign untrusted text through textContent (it is inserted as text and never parsed as HTML), or sanitize HTML with a library such as DOMPurify.
PHP: htmlspecialchars() for encoding, html_entity_decode() for decoding.
Python: html.escape() for encoding, html.unescape() for decoding.
Java: StringEscapeUtils, originally in Apache Commons Lang and now maintained in Apache Commons Text.
C#: HttpUtility.HtmlEncode() and HttpUtility.HtmlDecode().
Ruby: CGI.escapeHTML() and CGI.unescapeHTML().

Practical Examples

User Comment Display:
Input: I love <script>alert('test')</script> this!
Encoded: I love &lt;script&gt;alert(&#39;test&#39;)&lt;/script&gt; this!

Copyright Notice:
Input: © 2025 Company Name
Encoded: &copy; 2025 Company Name

Math Expression:
Input: 5 < 10 && 10 > 5
Encoded: 5 &lt; 10 &amp;&amp; 10 &gt; 5

Quote in Attribute:
Input (attribute value): John's book
Encoded (in markup): <div title="John&#39;s book">
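
Most of these examples can be reproduced with Python's html.escape, as in the sketch below. Two differences are worth noting: html.escape writes the apostrophe as &#x27; (the hexadecimal form of &#39;), and it leaves © unchanged rather than producing &copy;.

```python
import html

comment = "I love <script>alert('test')</script> this!"
expression = "5 < 10 && 10 > 5"
title = "John's book"

encoded_comment = html.escape(comment)
encoded_expression = html.escape(expression)

# Encode only the attribute value, then place it between the quotes.
div = f'<div title="{html.escape(title)}">'

print(encoded_comment)
print(encoded_expression)
print(div)
```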

Frequently Asked Questions

What's the difference between encoding and escaping?

Encoding and escaping are often used interchangeably, but technically, encoding converts characters to a different representation (like HTML entities), while escaping adds a special character (like backslash) to neutralize special meaning. In HTML context, we use 'encoding' to convert characters to entities. Both serve the same purpose: making special characters safe for their context.

Do I need to encode all special characters?

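At minimum, encode the five critical characters: &, <, >, ", and '. On a page served as UTF-8, other characters can be left as they are; encoding everything outside the ASCII range is only needed when the output may pass through systems that do not preserve the charset. Standard functions such as htmlspecialchars() and html.escape() encode just these five, while functions such as PHP's htmlentities() encode every character that has a named entity. What matters most is using the right encoding for each output context.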

Should I encode data before storing in a database?

No! Store data in its original, unencoded form in the database. Encode only when displaying the data in HTML. This allows you to use the data in different contexts (JSON API, email, etc.) without needing to decode and re-encode. It also prevents double-encoding issues and keeps your data clean and flexible.
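
A minimal sketch of this pattern, using an in-memory SQLite database from Python's standard library. The comments table and the output formats are hypothetical, chosen only to illustrate the idea.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (body TEXT)")

raw = "Tom & Jerry <3"
# Store the text exactly as the user entered it.
conn.execute("INSERT INTO comments (body) VALUES (?)", (raw,))

(stored,) = conn.execute("SELECT body FROM comments").fetchone()

# Encode at output time, and only for the HTML context.
as_html = f"<p>{html.escape(stored)}</p>"

# Another context (for example a plain-text email) uses the raw value directly.
as_plain_text = f"Comment: {stored}"

print(as_html)
print(as_plain_text)
```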

Can I use encoding to prevent SQL injection?

No! HTML entity encoding does NOT protect against SQL injection. For database queries, use parameterized queries (prepared statements), which is the correct defense against SQL injection. HTML encoding is specifically for preventing XSS in HTML contexts. Different attack vectors require different defenses.
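
The sketch below shows one payload that HTML encoding leaves untouched, and how a parameterized query handles it. It uses Python's sqlite3 module with a hypothetical users table; the same principle applies to other databases and drivers.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

payload = "0 OR 1=1"
encoded = html.escape(payload)  # no HTML-special characters, so nothing changes

# UNSAFE: the payload is spliced into the SQL text and rewrites the query.
unsafe_rows = conn.execute(
    f"SELECT name FROM users WHERE id = {encoded}"
).fetchall()

# SAFE: the payload is bound as a value and is never parsed as SQL.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE id = ?", (payload,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The string-built query returns every row, while the parameterized query returns none because no id equals the literal text of the payload.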

Why does my encoded text look garbled?

This usually happens due to double-encoding (encoding already-encoded text) or incorrect character set declarations. Make sure you're encoding only once, and that your HTML page declares the correct charset (usually UTF-8) in the <meta charset> tag. Also verify that your database and server are using the same character encoding.
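
Both causes are easy to reproduce. The sketch below uses Python's standard library; Latin-1 stands in for whichever wrong charset is being applied in practice.

```python
import html

# Cause 1: double encoding. The second pass re-encodes the "&" in "&amp;".
once = html.escape("Tom & Jerry")
twice = html.escape(once)

# Cause 2: charset mismatch. UTF-8 bytes read as Latin-1 turn one
# character into two.
mojibake = "©".encode("utf-8").decode("latin-1")

# Reversing the wrong decode step recovers the original character.
repaired = mojibake.encode("latin-1").decode("utf-8")

print(once)
print(twice)
print(mojibake, repaired)
```

If the page shows a literal &amp; or &lt;, suspect double encoding; if it shows pairs of accented characters, suspect a charset mismatch.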

Are named entities better than numeric entities?

Named entities (&copy;) are more readable and memorable, but numeric entities (&#169;) have universal support for all Unicode characters. Use named entities for common characters where they exist, and numeric entities for less common characters or when you need guaranteed compatibility with older browsers.

How do I handle encoding in JavaScript frameworks?

Modern frameworks like React, Vue, and Angular automatically encode text content to prevent XSS. However, be careful with dangerouslySetInnerHTML (React) or v-html (Vue) - these bypass encoding and can create vulnerabilities. Only use them with trusted content or content you've sanitized with a library like DOMPurify.

What about encoding in JSON APIs?

JSON has its own escaping rules (backslash escaping for quotes and special characters). Don't HTML-encode data in JSON APIs - send raw data and let the client encode it when displaying in HTML. Mixing HTML encoding in JSON creates unnecessary complexity and can cause double-encoding when the client also encodes.
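
The double-encoding problem can be sketched as follows. The client is simulated in Python here for simplicity; in practice it would usually be browser-side code.

```python
import html
import json

comment = 'She said "5 < 10" & left'

# Server: send the raw value; json.dumps applies JSON escaping only.
payload = json.dumps({"comment": comment})

# Client: parse the JSON, then encode once at display time.
received = json.loads(payload)["comment"]
displayed = html.escape(received, quote=False)

# Anti-pattern: the server HTML-encodes as well, so the client's
# encoding step produces doubled entities.
pre_encoded = json.loads(
    json.dumps({"comment": html.escape(comment, quote=False)})
)["comment"]
double = html.escape(pre_encoded, quote=False)

print(payload)
print(displayed)
print(double)
```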
