XXE (XML External Entity) Attacks and Prevention

What are XXE vulnerabilities?

XXE (XML External Entity) vulnerabilities arise when untrusted data is passed to a misconfigured XML parser.

The XML specification includes features for accessing files and network resources. If your XML parser has not been configured to disable these dangerous features, an attacker may be able to read files on your server's disk and reach resources on your internal network.

How to prevent XXE vulnerabilities?

Follow these steps:

Use a well-known XML library with a good security record.
Configure the library so that dangerous features (external entities, document type definitions, and XInclude) are disabled.
As an additional layer of security, use a web application firewall (WAF) product in front of your web application, validate input, harden your infrastructure, restrict egress connections, minimize file permissions and handle errors properly.

A simple example

Let's say that you have an XML API for receiving pizza orders. A user of the API could submit the order like so:

<pizza>
  <size>
    Large
  </size>
  <toppings>
    <topping>Mozzarella</topping>
    <topping>Salami</topping>
  </toppings>
</pizza>

✓ Order received with the following toppings: Mozzarella, Salami

XML Entities

XML supports variables, called entities, which are declared inside a DTD (Document Type Definition). So the user could also submit the toppings as entities:

<!DOCTYPE pizza[
  <!ENTITY topping1 "Mozzarella">
  <!ENTITY topping2 "Salami">
]>
<pizza>
  <size>
    Large
  </size>
  <toppings>
    <topping>&topping1;</topping>
    <topping>&topping2;</topping>
  </toppings>
</pizza>

✓ Order received with the following toppings: Mozzarella, Salami
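To make the point concrete, here is a minimal Java sketch of what the server side might do with these two orders. The class name PizzaOrderParser and the toppings helper are made up for illustration, and the parser is deliberately left in its default configuration, which accepts DTDs. Both documents yield the same toppings, because the parser expands the entities before the application code ever sees the text.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class PizzaOrderParser {

    // Hypothetical helper: returns the text of every <topping> element.
    // NOTE: default parser configuration, so DOCTYPE declarations are accepted.
    static List<String> toppings(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("topping");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent().trim());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String plain = "<pizza><toppings>"
                + "<topping>Mozzarella</topping><topping>Salami</topping>"
                + "</toppings></pizza>";
        String withEntities = "<!DOCTYPE pizza ["
                + "<!ENTITY topping1 \"Mozzarella\">"
                + "<!ENTITY topping2 \"Salami\">"
                + "]>"
                + "<pizza><toppings>"
                + "<topping>&topping1;</topping><topping>&topping2;</topping>"
                + "</toppings></pizza>";
        System.out.println(toppings(plain));        // [Mozzarella, Salami]
        System.out.println(toppings(withEntities)); // [Mozzarella, Salami]
    }
}
```

The application code is identical for both orders. That transparency is convenient, and it is also why entity-based attacks work without the application doing anything unusual.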

The "Billion Laughs" Attack

There are some risks with handling even these basic entities. How would the server like processing an XML document like this?

<!DOCTYPE pizza[
  <!ENTITY a "billion laughs">
  <!ENTITY b "&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;">
  <!ENTITY c "&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;">
  <!ENTITY d "&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;">
  <!ENTITY e "&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;">
  <!ENTITY f "&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;">
  <!ENTITY g "&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;">
  <!ENTITY topping1 "&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;">
  <!ENTITY topping2 "Salami">
]>
<pizza>
  <size>
    Large
  </size>
  <toppings>
    <topping>&topping1;</topping>
    <topping>&topping2;</topping>
  </toppings>
</pizza>

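The answer is: the server wouldn't like it at all. Each entity expands to twelve copies of the previous one, so the size grows exponentially with the number of levels. Parsing this document could consume all of the server's resources and cause a denial of service. This is generally known as the "billion laughs" attack. The name comes from the original proof of concept, which nested entities named lol, lol1, lol2 and so on until they expanded to roughly a billion copies of the string "lol".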
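As a rough size estimate for the document above: the innermost entity holds the 14-character string "billion laughs", and there are seven levels (b through g, plus topping1) that each reference the previous level twelve times. The sketch below just does that arithmetic; the class and method names are made up for illustration.

```java
import java.math.BigInteger;

class ExpansionSize {

    // Characters produced by fully expanding the outermost entity:
    // leafLength * fanOut^levels.
    static BigInteger expandedLength(int leafLength, int fanOut, int levels) {
        return BigInteger.valueOf(fanOut)
                .pow(levels)
                .multiply(BigInteger.valueOf(leafLength));
    }

    public static void main(String[] args) {
        // 14 characters, 12 references per level, 7 levels.
        System.out.println(expandedLength("billion laughs".length(), 12, 7));
        // prints 501645312
    }
}
```

So a document of a few hundred bytes asks the parser to produce about half a billion characters. Adding one more level would multiply that by twelve again.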

XML External Entities

Now it gets dangerous. XML also supports external entities, whose content is loaded from an external source specified by a URI. Look at this:

<!DOCTYPE pizza[
  <!ENTITY topping1 "Mozzarella">
  <!ENTITY topping2 SYSTEM "file:///etc/passwd">
]>
<pizza>
  <size> Large </size>
  <toppings>
    <topping>&topping1;</topping>
    <topping>&topping2;</topping>
  </toppings>
</pizza>

✓ Order received with the following toppings: Mozzarella, root:x:0:0:root:/root:/bin/bash daemon:x:1:1:daemon:/...

Oops. Or what about this?

<!DOCTYPE pizza[
  <!ENTITY topping1 "Mozzarella">
  <!ENTITY topping2 SYSTEM "http://secret-internal-api.local/admin/all-the-sensitive-things/">
]>
<pizza>
  <size>
    Large
  </size>
  <toppings>
    <topping>&topping1;</topping>
    <topping>&topping2;</topping>
  </toppings>
</pizza>

✓ Order received with the following toppings: Mozzarella, {"allTheSensitiveThings": ["secrets"...

It's not just SYSTEM "file://..."

There are other ways to include external files in XML documents than the general external entities we have looked at so far.

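First of all, you can use PUBLIC instead of SYSTEM. A PUBLIC declaration takes a public identifier followed by the system identifier (the URI):

<!ENTITY topping2 PUBLIC "any-public-id" "file:///etc/passwd">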

And you can have parameter entities instead of general ones:

<!ENTITY % topping2 SYSTEM "file:///etc/passwd">

And sometimes you can also use XInclude to refer to external resources.

<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" parse="text" href="file:///etc/passwd"/>

Also, you generally don't need the file:// prefix. This will do:

<!ENTITY topping2 SYSTEM "/etc/passwd">

And sometimes, most commonly in Java, directory listings work perfectly well.

<!ENTITY topping2 SYSTEM "file:///">

✓ Order received with the following toppings: Mozzarella, /bin /boot /cdrom /dev /etc/ /home...

Writing to the filesystem, although a bit limited, can sometimes be possible with the jar: protocol (also courtesy of Java implementations).

Finally, there are rare cases such as PHP's expect:// protocol which can enable OS command execution through XXE vulnerabilities.

The output doesn't have to be reflected

This is not an exploitation guide, but it's good to know that so-called blind XXE vulnerabilities can be exploited perfectly well. The attacker can craft an XML document that reads e.g. a file and then sends it over to the attacker's server.

And at any rate, there are other ways in which XXE vulnerabilities can be reflected. For instance, error messages can often be used for this purpose.

Preventing XML entity vulnerabilities

To prevent these vulnerabilities in your application, use a good XML parser and configure it properly:

Disable document type definitions (DTD) altogether. This will also prevent the billion laughs attack.
Disable general external entity processing.
Disable parameter entity processing.
Disable XInclude support.

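In Java, with a DocumentBuilderFactory, it looks more or less like this:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// Reject any document that contains a DOCTYPE declaration.
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
// Defense in depth, in case DOCTYPE declarations are ever allowed again.
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
factory.setXIncludeAware(false);
factory.setExpandEntityReferences(false);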
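To see what these settings do in practice, here is a self-contained sketch built on the JDK's own DocumentBuilderFactory. The class HardenedXml and the isAccepted helper are hypothetical names used only for this illustration. With DOCTYPE declarations disallowed, the parser throws as soon as it reaches the declaration, so the external entity is never resolved.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

class HardenedXml {

    // Builds a parser with the settings listed above.
    static DocumentBuilder newHardenedBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory.newDocumentBuilder();
    }

    // Hypothetical helper: true if the hardened parser accepts the document.
    static boolean isAccepted(String xml) {
        try {
            newHardenedBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // Includes the fatal error raised for a disallowed DOCTYPE.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String plain = "<pizza><toppings><topping>Salami</topping></toppings></pizza>";
        String malicious = "<!DOCTYPE pizza ["
                + "<!ENTITY topping2 SYSTEM \"file:///etc/passwd\">"
                + "]>"
                + "<pizza><toppings><topping>&topping2;</topping></toppings></pizza>";
        System.out.println(isAccepted(plain));     // true
        System.out.println(isAccepted(malicious)); // false
    }
}
```

Note that this also rejects the harmless internal entities from the earlier pizza example. That is usually an acceptable trade-off for an API that has no real need for DTDs.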

OWASP maintains instructions on how to do this for different parsers: OWASP XXE Prevention Cheat Sheet.

Also, there are software libraries that provide you with XML parsers that have already been secured, such as defusedxml for Python.

Hardening against XML entity vulnerabilities

There are a couple of things that you should do in addition to the steps in the previous section, to get a nice layered security model.

Use a WAF (Web Application Firewall) product such as ModSecurity + CRS to prevent obviously malicious input from reaching your application.
Validate all input in your application to prevent obviously invalid input from reaching your XML parsing logic. Validating XML before parsing it can seem daunting, but even a simple pattern check makes attacks much harder.
Handle errors properly. Don't show stack traces or other developer information to users. In general, the more data you reflect, the better an attacker's chances of successfully exploiting XXE.
Restrict egress traffic from your application server to both the outside world and your internal network. If reflection doesn't work, the attacker is usually left with sending files to their own server and scanning your internal network for interesting hosts and ports. Don't allow either of these. Note that DNS can also be used to exfiltrate data, so block arbitrary DNS lookups as well: either get rid of DNS altogether and use a local hosts file, or configure your upstream DNS server with a private zone that resolves only specific hostnames.
Keep your application user's filesystem permissions at a minimum.
Secure your infrastructure. Use authentication and encryption in all internal connections so that SSRF (Server-Side Request Forgery, that is, sending e.g. arbitrary HTTP requests to internal hosts) cannot be used for anything useful.
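The input validation item above can be sketched as a cheap check that runs before the XML parser is invoked. Everything here is an assumption made for illustration: the class name, the size limit, and the choice of patterns. It rejects documents that are unexpectedly large or that mention a DOCTYPE, an entity declaration, or the XInclude namespace.

```java
import java.util.Locale;

class OrderInputFilter {

    // Hypothetical limit; pick one that fits your real payloads.
    static final int MAX_LENGTH = 10_000;

    // True if the document should be rejected before it reaches the parser.
    static boolean looksSuspicious(String xml) {
        if (xml == null || xml.length() > MAX_LENGTH) {
            return true;
        }
        String lower = xml.toLowerCase(Locale.ROOT);
        return lower.contains("<!doctype")
                || lower.contains("<!entity")
                || lower.contains("http://www.w3.org/2001/xinclude");
    }

    public static void main(String[] args) {
        String plain = "<pizza><size>Large</size></pizza>";
        String withDtd = "<!DOCTYPE pizza [<!ENTITY a \"b\">]><pizza>&a;</pizza>";
        System.out.println(looksSuspicious(plain));   // false
        System.out.println(looksSuspicious(withDtd)); // true
    }
}
```

A check like this works on the decoded text, so it can be bypassed if the attacker submits the document in an encoding the filter does not decode the same way the parser does. Treat it as an extra layer on top of a properly configured parser, not as a replacement for one.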

Know where you parse XML

Sometimes you parse XML without knowing it. For example, did you know that Office documents are zipped XML? And that SVG images are XML?

There was an interesting case on HackerOne where the XMP metadata of a JPG file was parsed unsafely. There are many other interesting XXE bugs there as well if you want to take a look.

Conclusion

XXE vulnerabilities are a serious risk, but you can prevent them by using a good XML parser and configuring it properly. As always, additional layers of security are recommended. These include using a WAF product, input validation, proper error handling, securing your infrastructure, minimizing filesystem permissions, and restricting egress network traffic.