[ASSEMBLY FUNDAMENTALS]
// Master the building blocks of reverse engineering by understanding assembly language, CPU architecture, and how high-level code translates to machine instructions.
[LEARNING_OBJECTIVES]
> learning_outcomes.list
- [✓] Understand CPU architecture basics and instruction execution
- [✓] Know the purpose and usage of CPU registers
- [✓] Read and interpret basic assembly instructions
- [✓] Understand memory layout and addressing modes
> prerequisites.cfg
- Basic programming concepts
- Understanding of variables and functions
- Completed Lesson 1: Foundations
[ESSENTIAL_TOOLS]
[! WARNING] Tool Setup Required
# This lesson contains real analysis outputs and screenshots.
# Install the tools below to follow along with hands-on practice.
$ Free Tools (Start Here!)
Ghidra (NSA's Free Disassembler)
Professional-grade reverse engineering tool, completely free!
ghidra-sre.org
Command Line Tools
Built into most systems or easily installable:
- objdump - Disassemble binaries
- strings - Extract text from binaries
- hexdump - View binary data
- file - Identify file types
Radare2/rizin
Powerful command-line reverse engineering framework
brew install radare2 (macOS)
Professional Tools
IDA Pro (Industry Standard)
The gold standard for professional reverse engineering
x64dbg (Windows Debugger)
Excellent free debugger for Windows programs
Binary Ninja
Modern interface with powerful analysis capabilities
Quick Start: Your First Analysis (Using Ghidra)
1. Installation
# Extract the zip file
# Run ghidraRun (Linux/Mac) or ghidraRun.bat (Windows)
2. Create Your First Project
- Choose "Non-Shared Project"
- Give it a name like "Assembly_Practice"
3. Import a Binary
- File → Import File (or drag and drop the file into the project window)
- Try with /bin/ls (Linux/Mac) or C:\Windows\System32\calc.exe
4. Start Analysis
- Click "Yes" when asked to analyze
- Wait for analysis to complete
Pro Tip: Start with simple programs like a calculator or a text editor. They're easier to understand and less overwhelming than complex software.
[CPU_ARCHITECTURE]
The Central Processing Unit (CPU)
The CPU is the brain of your computer, executing instructions one by one in a continuous cycle. Understanding this cycle is crucial for reverse engineering because we're essentially reading the CPU's "thoughts" when we analyze assembly code.
Why This Matters for Reverse Engineering:
- Malware Analysis: Understanding how instructions execute helps you trace malicious behavior step by step
- Vulnerability Research: CPU execution patterns reveal where programs might crash or behave unexpectedly
- Code Protection Bypass: Knowing instruction flow helps you skip license checks or authentication
- Performance Analysis: Identify bottlenecks and optimization opportunities in software
The Instruction Execution Cycle
1. Fetch: Get next instruction from memory
2. Decode: Interpret what the instruction means
3. Execute: Perform the operation
4. Store: Save results if needed
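The four stages above can be sketched as a toy interpreter. This is a minimal illustration, not a model of a real x86-64 core: the three-instruction set (mov, add, halt), the tuple encoding, and the run function are hypothetical names chosen for this example.

```python
def run(program):
    """Execute a toy program and return the final register state."""
    registers = {"rax": 0, "rbx": 0}
    ip = 0  # instruction pointer: index of the next instruction

    while True:
        # Fetch: read the instruction the instruction pointer refers to.
        instruction = program[ip]
        ip += 1

        # Decode: split the instruction into an operation and its operands.
        op, *operands = instruction

        # Execute the operation, then store the result in the destination.
        if op == "mov":
            dest, value = operands
            registers[dest] = value
        elif op == "add":
            dest, value = operands
            registers[dest] = registers[dest] + value
        elif op == "halt":
            return registers
        else:
            raise ValueError(f"unknown operation: {op}")


program = [("mov", "rax", 2), ("add", "rax", 3), ("halt",)]
print(run(program))  # {'rax': 5, 'rbx': 0}
```

Note that the instruction pointer is advanced during the fetch step, before the instruction executes. That is why a debugger shows RIP pointing at the next instruction rather than the one that just ran.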
Real Example: Password Check Bypass
Imagine you're reversing a program with a password check:
Reverse Engineering Insight: If the check ends in a je (jump if equal) that branches to the success path, changing it to jmp (always jump) bypasses the password check entirely!
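To make the je-to-jmp idea concrete, here is a small sketch of the byte-level patch. It assumes the short (two-byte) encodings, where je rel8 begins with opcode 0x74 and jmp rel8 with 0xEB. The byte sequence and the patch_je_to_jmp helper are hypothetical and are not taken from the lesson's binaries.

```python
JE_SHORT = 0x74   # opcode of the two-byte "je rel8" instruction
JMP_SHORT = 0xEB  # opcode of the two-byte "jmp rel8" instruction


def patch_je_to_jmp(code: bytes, offset: int) -> bytes:
    """Return a copy of code with the short je at offset turned into jmp."""
    if code[offset] != JE_SHORT:
        raise ValueError(f"no short je at offset {offset}")
    patched = bytearray(code)
    patched[offset] = JMP_SHORT  # the 1-byte displacement that follows is kept
    return bytes(patched)


# Hypothetical snippet: cmp eax, ebx (39 D8) followed by je +5 (74 05).
original = bytes([0x39, 0xD8, 0x74, 0x05])
patched = patch_je_to_jmp(original, 2)
print(patched.hex(" "))  # 39 d8 eb 05
```

Only the opcode byte changes, so the jump target stays the same. The near form of je (0F 84 with a 32-bit displacement) has a different length from the near jmp and is deliberately not handled here.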
x86-64 Architecture Overview
Most desktop, laptop, and server computers use the x86-64 architecture (also called AMD64). This is what we'll focus on, as it's the most common in reverse engineering scenarios.
Key Characteristics of x86-64:
- 64-bit architecture - Can handle 64-bit data and addresses
- Variable instruction length - Instructions can be 1-15 bytes long
- Complex Instruction Set - Many powerful instructions available
- Multiple addressing modes - Flexible ways to access memory
Reverse Engineering Impact:
Variable Length = Analysis Challenge
Disassemblers can misinterpret where instructions start and end, making malware analysis harder.
Complex Instructions = Hidden Logic
Single instructions can perform multiple operations, hiding complex behavior in simple-looking code.
[CPU_REGISTERS]
Registers are like the CPU's personal workspace - small, super-fast storage locations directly inside the processor. Think of them as the CPU's "hands" for holding and manipulating data.
Why Registers are a Reverse Engineer's Best Friend:
Function Arguments
Registers hold function parameters, revealing what data is being passed around
Return Values
Function results appear in specific registers, showing program outcomes
Hidden Data
Malware often stores decryption keys or important values in registers
Program State
Register contents reveal the current state and behavior of the program
General Purpose Registers
RAX: Accumulator - Often holds return values
Common use: Function results, arithmetic
RE Example: Check whether a function returned 0 or non-zero to tell success from failure (which value means success depends on the function)
RBX: Base - General storage
Common use: Data storage, base addresses
RE Example: Often holds pointers to important data structures or strings
RCX: Counter - Loop operations
Common use: Loop counters, string operations
RE Example: Find loop boundaries to understand encryption algorithms
RDX: Data - I/O operations
Common use: I/O port access, large arithmetic (high half of MUL and DIV results)
RE Example: Often holds a function argument such as a length or handle (third integer argument under System V AMD64, second under Windows x64)
RSI: Source Index - Source for operations
Common use: String/memory copy source
RE Example: Points to source data in memory copying/decryption routines
RDI: Destination Index - Destination for operations
Common use: String/memory copy destination
RE Example: Points to where malware writes decoded payload
R8-R15: Additional general registers
Common use: Extra storage in 64-bit mode
RE Example: R8 and R9 carry integer arguments in both common calling conventions (fifth and sixth under System V AMD64, third and fourth under Windows x64)
Special Purpose Registers
RIP: Instruction Pointer
Points to next instruction to execute
RE Importance: Critical for control flow analysis
RSP: Stack Pointer
Points to top of the stack
RE Importance: Essential for function calls
RBP: Base Pointer
Points to base of current stack frame
RE Importance: Helps navigate local variables
RFLAGS: Status Flags
Stores condition codes and CPU state
RE Importance: Controls conditional jumps
Pro Tip: Register Naming
Registers have different names based on their size:
- RAX = 64-bit (full register)
- EAX = 32-bit (lower half)
- AX = 16-bit (lowest quarter)
- AL = 8-bit (lowest byte)
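The size-based names are different windows onto the same 64 bits, which bit masks can illustrate. The sketch below is illustrative only; register_views is a hypothetical helper and the sample value is arbitrary.

```python
def register_views(rax: int) -> dict:
    """Return the RAX, EAX, AX, and AL views of one 64-bit value."""
    rax &= 0xFFFFFFFFFFFFFFFF  # keep only 64 bits
    return {
        "RAX": rax,               # all 64 bits
        "EAX": rax & 0xFFFFFFFF,  # low 32 bits
        "AX": rax & 0xFFFF,       # low 16 bits
        "AL": rax & 0xFF,         # low 8 bits
    }


for name, value in register_views(0x1122334455667788).items():
    print(f"{name}: {value:#x}")
# RAX: 0x1122334455667788
# EAX: 0x55667788
# AX: 0x7788
# AL: 0x88
```

One detail worth knowing when reading disassembly: on x86-64, writing to EAX clears the upper 32 bits of RAX, while writing to AX or AL leaves the remaining bits untouched.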
Practical Example: License Check Analysis
Let's see how registers reveal program logic:
- RDI holds the license key - we found where it's stored!
- RAX contains the validation result - 0 = invalid, non-zero = valid
- RBX tracks software mode - we could patch this so it always equals 1
[ASSEMBLY_INSTRUCTIONS]
Assembly instructions are like a very simple language the CPU understands. Each instruction tells the CPU to perform one basic operation. Let's learn the most common ones you'll encounter in reverse engineering.
Why Each Instruction Type Matters in Reverse Engineering:
Data Movement
Reveals how malware loads encrypted payloads, API addresses, or configuration data
Arithmetic
Shows encryption/decryption algorithms, checksum calculations, and obfuscation math
Control Flow
Exposes program logic - loops, conditions, function calls, and decision points
Data Movement Instructions
MOV
Copy data from source to destination
LEA
Load Effective Address - calculate an address without accessing memory
Arithmetic Instructions
ADD / SUB
Addition and subtraction
MUL / IMUL
Unsigned and signed multiplication
Control Flow Instructions
JMP
Unconditional jump
CMP
Compare two values by subtracting them and setting the flags, without storing the result
Conditional Jumps
Jump based on flags
Understanding Control Flow
The combination of CMP followed by a conditional jump is how if-statements, loops, and other control structures are implemented at the assembly level. This pattern is everywhere in reverse engineering!
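A rough model of that CMP-then-jump pattern: CMP subtracts its operands, keeps only the flags, and each conditional jump tests a particular flag combination. The sketch below assumes 64-bit operands and models just three jumps (je, jb, jl). The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1
SIGN64 = 1 << 63


def cmp_flags(a: int, b: int) -> dict:
    """Model the flags that cmp a, b sets for 64-bit operands."""
    a &= MASK64
    b &= MASK64
    result = (a - b) & MASK64  # the difference is computed but discarded
    sign_a, sign_b, sign_r = a & SIGN64, b & SIGN64, result & SIGN64
    return {
        "ZF": result == 0,  # zero flag: operands are equal
        "CF": a < b,        # carry flag: unsigned borrow occurred
        "SF": sign_r != 0,  # sign flag: top bit of the result
        "OF": sign_a != sign_b and sign_r != sign_a,  # signed overflow
    }


def je(flags: dict) -> bool:
    """Jump if equal."""
    return flags["ZF"]


def jb(flags: dict) -> bool:
    """Jump if below (unsigned less than)."""
    return flags["CF"]


def jl(flags: dict) -> bool:
    """Jump if less (signed less than)."""
    return flags["SF"] != flags["OF"]


flags = cmp_flags(-1, 1)
print(je(flags), jb(flags), jl(flags))  # False False True
```

The same bit pattern compares differently depending on the jump used: -1 is less than 1 as a signed value (jl is taken) but is the largest possible unsigned value (jb is not taken). Spotting which family of jumps a compiler chose tells you whether the original variable was signed.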
[MEMORY_LAYOUT]
Understanding how programs organize memory is crucial for reverse engineering. Let's explore the typical memory layout of a running program.
Why Memory Layout is Critical for Security Analysis:
Buffer Overflow Detection
Stack layout helps identify when programs write beyond buffer boundaries
Exploit Development
Memory layout knowledge is essential for ROP chains and shellcode injection
Malware Analysis
Understanding where malware stores data helps locate encryption keys and config
Anti-Analysis Evasion
Knowing the layout is the first step in reasoning about memory protections such as ASLR and where they can be bypassed
Memory Sections
Stack: Function calls, local variables
Grows downward
Heap: Dynamic memory allocation
Grows upward
Data: Global variables, initialized data
Fixed size
Text: Program instructions (code)
Fixed size
Addressing Modes
Different ways to specify where data is located:
Immediate: Value is specified directly in instruction
Register: Value is in a register
Direct: Value is at specific memory address
Indirect: Address is stored in register
Base + displacement: Base register plus/minus offset
Scaled index: Complex calculation for address (base + index * scale + displacement)
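The memory-based modes all reduce to one formula: base + index * scale + displacement, with unused parts set to zero. The sketch below illustrates that arithmetic; effective_address is a hypothetical helper and the register values are made up.

```python
MASK64 = (1 << 64) - 1


def effective_address(base: int = 0, index: int = 0, scale: int = 1, disp: int = 0) -> int:
    """Compute base + index * scale + disp, wrapped to 64 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & MASK64


rbp = 0x7FFF_FFFF_E000  # hypothetical frame pointer value

# Base + displacement, as in [rbp-8]
print(hex(effective_address(base=rbp, disp=-8)))  # 0x7fffffffdff8

# Base + scaled index, as in [rax+rcx*8] with rax=0x1000 and rcx=3
print(hex(effective_address(base=0x1000, index=3, scale=8)))  # 0x1018
```

A scale of 8 with a changing index register is a common sign of a loop walking an array of 8-byte elements such as pointers or 64-bit integers.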
Reverse Engineering Tip:
Pay attention to addressing patterns! [rbp-8] and [rbp-16] usually indicate local variables, while [rip+offset] often points to global data.
Buffer Overflow Example:
Here's how addressing reveals a vulnerability:
Vulnerability: Buffer at [rbp-32] can overflow into return address at [rbp+8], allowing code execution!
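The distance between those two offsets is simple arithmetic. The sketch below assumes the conventional frame layout, with the saved RBP at [rbp] and the return address at [rbp+8], and no stack canary or compiler padding in between. The names are hypothetical.

```python
def bytes_until(buffer_offset: int, target_offset: int) -> int:
    """Distance in bytes from the buffer start to a target, both relative to rbp."""
    return target_offset - buffer_offset


BUFFER = -32        # buffer starts at [rbp-32]
SAVED_RBP = 0       # caller's frame pointer sits at [rbp]
RETURN_ADDRESS = 8  # return address sits at [rbp+8]

print(bytes_until(BUFFER, SAVED_RBP))       # 32
print(bytes_until(BUFFER, RETURN_ADDRESS))  # 40
```

Under these assumptions, the first 32 bytes written fill the buffer, the next 8 overwrite the saved frame pointer, and anything from byte 41 onward lands on the return address.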
[BINARY_ANALYSIS_TUTORIAL]
Learning Goal
By following this guide, you'll learn a repeatable process for analyzing a binary file from start to finish. We'll use both command-line tools and Ghidra to give you multiple approaches.
CLI: Command Line Analysis
Step 1: Basic File Info
Always start here! Know what you're dealing with.
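Much of what a file-identification step reports comes from the first few bytes of the file, the "magic number". The sketch below checks a handful of well-known signatures; it is far less thorough than the file command, and identify and identify_file are hypothetical helper names.

```python
MAGIC_NUMBERS = [
    (b"\x7fELF", "ELF (Linux and most Unix systems)"),
    (b"MZ", "PE/DOS (Windows)"),
    (b"\xcf\xfa\xed\xfe", "Mach-O 64-bit (macOS)"),
    # This signature is shared with Java class files.
    (b"\xca\xfe\xba\xbe", "Mach-O universal binary or Java class file"),
]


def identify(header: bytes) -> str:
    """Guess a file format from its leading bytes."""
    for magic, description in MAGIC_NUMBERS:
        if header.startswith(magic):
            return description
    return "unknown"


def identify_file(path: str) -> str:
    """Read the first four bytes of a file and identify its format."""
    with open(path, "rb") as handle:
        return identify(handle.read(4))


print(identify(b"\x7fELF\x02\x01\x01\x00"))  # ELF (Linux and most Unix systems)
```

Calling identify_file on a path from your own system should agree with the format family the file command prints for it.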
Step 2: String Analysis
Strings reveal a lot! Look for passwords, URLs, error messages.
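Under the hood, string extraction is a scan for runs of printable characters. The sketch below approximates the default behavior of the strings tool (ASCII only, minimum length 4). The extract_strings name and the sample bytes are hypothetical.

```python
import re


def extract_strings(data: bytes, min_length: int = 4) -> list:
    """Return runs of printable ASCII that are at least min_length long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_length
    return [match.decode("ascii") for match in re.findall(pattern, data)]


sample = b"\x00\x01hello world\x00ab\x00secret!\xff"
print(extract_strings(sample))  # ['hello world', 'secret!']
```

The two-character run "ab" is dropped because it is shorter than the minimum length. The same threshold explains why very short strings never appear in the tool's output unless you lower it.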
Step 3: Disassembly
Start with main(), then explore interesting functions.
Step 4: Advanced Analysis
Understand the binary's security posture and dependencies.
GUI: Ghidra Analysis
Step 1: Import & Analyze
- File → Import File (or drag & drop)
- Double-click imported file
- Click "Yes" to analyze when prompted
- Wait for auto-analysis to complete (green bar)
Step 2: Navigate the Interface
- Symbol Tree: Functions, variables, imports
- Listing: Assembly code view
- Decompiler: C-like pseudo code
- Program Tree: File structure
Step 3: Analysis Features
- String Search: Search → For Strings
- Cross References: Right-click → Show References
- Function Graph: Window → Function Graph
- Hex View: Window → Bytes
Step 4: Make Notes
- Right-click → Set Comment (add notes)
- Right-click → Set Label (rename functions)
- Bookmarks → Add Bookmark (save locations)
- File → Export → Various formats
Common Patterns to Look For
Suspicious Indicators
- ptrace, IsDebuggerPresent
- fopen, temp directories, persistence paths
Normal Program Elements
- main, printf, malloc
Now that you know the process, let's practice! We'll analyze two binaries: a simple "hello" program and an educational malware simulation. Use the techniques from above to follow along.
Exercise 1: Hello Binary Analysis
Let's start with our simple hello program to understand the basics.
Step 1: File Information
First, let's identify what type of file we're dealing with:
- 64-bit ARM executable (Apple Silicon)
- 16KB file size (typical for simple programs)
- Executable permissions set
- No file? Check you're in the right directory
- Permission denied? Use chmod +x
- Different architecture? That's expected!
Step 2: String Analysis
Extract readable strings to understand the program's purpose:
- Two clear text messages (program purpose)
- Uses printf function for output
- No suspicious URLs or IPs found
- Standard system library paths
- Use strings -n 8 for longer strings
- Try strings -e l for UTF-16
- Pipe to sort | uniq to remove duplicates
- Look for base64, hex, or encrypted data
Step 3: Disassembly Analysis
Now let's look at the actual assembly code:
Analysis Notes:
- sub sp, sp, #0x20 - Allocate stack space
- stp x29, x30 - Save frame pointer and return address
- adrp/add - Load string addresses
- bl - Call printf function
- ret - Return from function
Exercise 2: Educational Malware Analysis
Important Safety Note
This is a completely safe educational simulation that only prints messages. It demonstrates common malware techniques without performing any harmful actions.
Step 1: Initial Execution
Run the malware simulation to see what it does:
Step 2: String Analysis - Spotting Obfuscation
Look for suspicious strings and patterns:
Red Flags: Anti-debugging, C&C communication, hidden functionality!
Step 3: Discovering Hidden Features
We found "--advanced" in the strings. Let's try it:
Discovery: Hidden advanced malware features unlocked by command-line argument!
Step 4: Finding Encrypted Strings
Look for the obfuscated data at the end of strings output:
These are XOR-encrypted strings! Real malware often hides important data this way.
What We Learned:
- Anti-Analysis: Debugger detection to avoid reverse engineering
- String Obfuscation: XOR encryption to hide malicious intent
- Hidden Functionality: Command-line activated advanced features
- Persistence: Attempts to maintain presence on system
- C&C Communication: Network communication with control server
Challenge: Decrypt the Hidden Messages
Using your assembly knowledge, can you figure out how to decrypt those XOR-encrypted strings? Look at the malware_sim.c source code and find the XOR key!
Hints:
- Look for the decrypt_string function
- Find the XOR key value (it's 0x32)
- Try manually XORing the encrypted bytes
- The decrypted results demonstrate the XOR process (they may appear garbled - that's normal!)
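Single-byte XOR is symmetric: applying the same key twice returns the original data. The sketch below shows this with the key from the hints (0x32). It is not the decrypt_string function from malware_sim.c, whose exact code is not reproduced here; xor_bytes and the sample plaintext are hypothetical.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key."""
    return bytes(b ^ key for b in data)


KEY = 0x32  # key value given in the hints (50 in decimal)

plaintext = b"hello"  # hypothetical sample, not a string from the binary
encrypted = xor_bytes(plaintext, KEY)

print(encrypted.hex(" "))         # 5a 57 5e 5e 5d
print(xor_bytes(encrypted, KEY))  # b'hello'
```

Because encryption and decryption are the same operation, once you have the key you can run the encrypted bytes from the strings output through the same function to recover them.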
Use the XOR Playground Tool!
The XOR playground script makes this challenge much easier to understand. Here's how to use it:
Step 1: Run the Script
Step 2: Choose Option 4
Step 3: See the Magic!
Option 4 shows the exact encrypted data from malware_sim and demonstrates step-by-step XOR decryption!
Step 4: Experiment
Try option 2 with hex bytes 5d 40 74 and key 50
Download Practice Files
Exercise Source Files
hello.c: Simple Hello World program with detailed comments and analysis tips
malware_sim.c: Educational malware simulation (completely safe!) with encryption, anti-debugging, and hidden features
Analysis Tools
analyze.sh: Comprehensive automated binary analysis script with risk assessment
README.md: Complete instructions, learning objectives, and troubleshooting guide
STUDENT_GUIDE.md: Complete step-by-step walkthrough - follow this for guaranteed success!
QUICK_CHECKLIST.md: Quick reference checklist to track your progress through the challenge
xor_playground.py: Interactive XOR Encryption/Decryption Tool - perfect for understanding the malware_sim XOR challenge!
- Text encryption/decryption
- Hex byte analysis
- XOR number calculator
- Malware challenge simulation
Quick Start Commands
Pro tip: Create a new folder for these exercises to keep your workspace organized!
Exercise 3: Your Turn - Analyze Any Binary
Try These Common Binaries
/bin/ls: Directory listing command - great for beginners
strings /bin/ls | grep "Usage"
/usr/bin/whoami: Shows current username - simple and clean
objdump -d /usr/bin/whoami | head -50
/bin/cat: File display utility - moderate complexity
strings /bin/cat | grep -E "(error|usage)" -i
Analysis Checklist
Challenge Yourself
Pick a binary you use daily (text editor, browser, game) and spend 30 minutes analyzing it. You'll be amazed at what you discover!
Essential Analysis Commands
Pro tip: Create a simple script with these commands, or use them one by one. Replace /path/to/binary with your actual file path.
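One possible shape for such a script is sketched below. It is not the lesson's own command list: it wraps only three commands that appear elsewhere in this lesson (file, strings -n 8, and objdump -d), skips any tool that is not installed, and uses hypothetical function names.

```python
import shutil
import subprocess


def build_commands(path: str) -> list:
    """Return the analysis commands to run against one binary."""
    return [
        ["file", path],                # identify the file type
        ["strings", "-n", "8", path],  # printable strings of 8+ characters
        ["objdump", "-d", path],       # disassemble executable sections
    ]


def run_available(path: str, max_lines: int = 20) -> None:
    """Run each command whose tool is installed and print the first lines."""
    for command in build_commands(path):
        if shutil.which(command[0]) is None:
            print(f"skipping {command[0]}: not installed")
            continue
        result = subprocess.run(
            command, capture_output=True, text=True, errors="replace"
        )
        print("$", " ".join(command))
        print("\n".join(result.stdout.splitlines()[:max_lines]))


# Show what would be run; call run_available(path) to actually execute it.
for command in build_commands("/path/to/binary"):
    print(" ".join(command))
```

Passing arguments as a list rather than a shell string avoids quoting problems when the binary's path contains spaces.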
Putting It All Together: From Code to Assembly to Exploitation
Let's see how a simple C function becomes assembly code and reveals security vulnerabilities:
Original C Code:
Assembly Translation:
What a Reverse Engineer Sees:
- Buffer Overflow: Only 32 bytes allocated but no length check on input
- Stack Layout: Buffer starts at RSP, return address 32 bytes higher
- Control Flow: Success/fail logic in the conditional jump
- Secret Location: Password string address loaded into RSI
Possible Exploits/Bypasses:
- Buffer Overflow: Send 40 or more characters, so that 32 fill the buffer and the next 8 overwrite the return address
- Logic Bypass: Patch the je return_success to jmp return_success
- String Discovery: Find the secret string at the address in RSI
- Return Value: Always return 1 by patching the return logic
The Power of Assembly Knowledge:
By understanding assembly, you can see vulnerabilities invisible in source code, find hidden functionality, bypass security checks, and understand exactly how malware operates. Every high-level programming construct has an assembly representation - and that's where the real secrets hide!
Lesson Summary
What We Covered:
- CPU architecture and instruction execution cycle
- General purpose and special registers
- Basic assembly instructions (MOV, ADD, JMP, etc.)
- Memory layout and addressing modes
- Practical analysis techniques
Next Steps:
- Complete the hands-on exercises
- Practice reading assembly code daily
- Experiment with different binaries
- Move to Lesson 3: Executable File Formats