academy$ hexdump -C /lessons/file-formats/lesson.bin | head
00000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
[LESSON_03][INTERMEDIATE]

[📦 EXECUTABLE FILE FORMATS]

// Dissect the anatomy of executables. Master PE, ELF, and Mach-O formats to understand how programs are structured, loaded, and executed on different operating systems.

academy$ cat objectives.md

[🎯 LEARNING_OBJECTIVES]

> learning_outcomes.list

  • [✓] Understand PE, ELF, and Mach-O file structure and headers
  • [✓] Parse executable sections, imports, and exports
  • [✓] Identify packed/encrypted executables
  • [✓] Extract and analyze embedded resources

> prerequisites.cfg

  • Completed Lesson 2: Assembly Basics
  • Understanding of hexadecimal
  • Basic file system knowledge
# Estimated completion time: 4 hours
academy$ yara --help

[📋 YARA_INTRODUCTION]

🔗 Why YARA in a File Formats Lesson?

File format analysis and YARA go hand-in-hand! Once you understand PE, ELF, and Mach-O structures, you need a way to automatically detect and classify files based on these characteristics.

🔍 Detection Use Cases:

  • Find files with suspicious PE imports
  • Detect ELF files with rootkit characteristics
  • Identify Mach-O files with code injection
  • Classify malware families by structure

🎯 Practical Applications:

  • Scan thousands of files automatically
  • Create detection rules for security tools
  • Build threat intelligence databases
  • Automate malware triage processes

💡 Think of it this way: You're learning to read the "DNA" of files (PE/ELF/Mach-O), and YARA is your tool to search for specific "genetic patterns" across entire file systems!

🤔 What is YARA?

YARA is like a "super-powered search tool" for files. Think of it as:

  • Google search, but for binary files and malware
  • Pattern matching with advanced logic capabilities
  • Detective tool that helps identify suspicious files
  • Industry standard used by security professionals worldwide

Why Learn YARA?

  • 🔍 Malware Detection: Find hidden threats
  • 🎯 Threat Hunting: Search for attack patterns
  • 🚨 Incident Response: Quickly classify files
  • 📊 Research: Categorize malware families

🚀 Quick Start Guide

1. Installation (macOS)

# Install YARA using Homebrew
brew install yara
# Verify installation
yara --version

2. Basic Usage

# Scan a file with rules
yara my_rules.yar suspicious_file.exe
# Also print the strings that matched (-s)
yara -s my_rules.yar target_file

3. Rule Structure

rule RuleName {
    meta:
        description = "What this detects"
    strings:
        $text = "pattern to find"
    condition:
        $text
}
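To make the rule's logic concrete, here is a rough Python emulation (not YARA itself) of what the engine evaluates for a variant of the rule above whose condition is `uint16(0) == 0x5A4D and $text`. `uint16(offset)` is YARA's built-in little-endian read; the helper names below are hypothetical.

```python
import struct

def uint16(data: bytes, offset: int) -> int:
    """Mimic YARA's uint16(): little-endian 16-bit read at a file offset."""
    return struct.unpack_from("<H", data, offset)[0]

def rule_matches(data: bytes, text: bytes) -> bool:
    """Emulate: condition: uint16(0) == 0x5A4D and $text"""
    if len(data) < 2:
        return False
    is_mz = uint16(data, 0) == 0x5A4D   # "MZ" read as a little-endian WORD
    return is_mz and text in data        # $text: plain substring search

sample = b"MZ\x90\x00" + b"\x00" * 60 + b"pattern to find"
print(rule_matches(sample, b"pattern to find"))                            # True
print(rule_matches(b"\x7fELF" + b"pattern to find", b"pattern to find"))   # False
```

The structural check is cheap and anchored at a fixed offset, while the string search scans the whole file; that is why format-aware conditions make rules both faster and more precise.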

📚 Learning Resources

Learning Progression:
1. Learn file format structures (PE, ELF, Mach-O)
2. Practice YARA rule creation with our samples
3. Apply both skills in the hands-on workshop
4. Master advanced detection in the final challenge
academy$ ls -la /tools/file_analysis/

[🛠️ ANALYSIS_TOOLS]

$ Command Line Arsenal

🔍 file / hexdump

Quick file identification and hex analysis

file binary.exe
hexdump -C binary.exe | head
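`file` identifies formats largely by their leading magic bytes. A minimal Python sketch of that idea, limited to the magic numbers covered in this lesson (the real `file` command uses a far larger rule database):

```python
# Leading magic bytes, as they appear on disk, for the formats in this lesson.
MAGICS = [
    (b"\x7fELF", "ELF"),
    (b"MZ", "PE/DOS (MZ)"),
    (b"\xce\xfa\xed\xfe", "Mach-O 32-bit (little-endian)"),
    (b"\xcf\xfa\xed\xfe", "Mach-O 64-bit (little-endian)"),
    (b"\xfe\xed\xfa\xce", "Mach-O 32-bit (big-endian)"),
    (b"\xfe\xed\xfa\xcf", "Mach-O 64-bit (big-endian)"),
    # Shared with Java .class files; `file` disambiguates using the following bytes.
    (b"\xca\xfe\xba\xbe", "Mach-O universal (fat) binary or Java class"),
]

def identify(header: bytes) -> str:
    """Return a format label for the first bytes of a file."""
    for magic, label in MAGICS:
        if header.startswith(magic):
            return label
    return "unknown"

def identify_file(path: str) -> str:
    with open(path, "rb") as f:
        return identify(f.read(8))

print(identify(bytes.fromhex("7f454c4602010100")))  # ELF
print(identify(b"MZ\x90\x00"))                       # PE/DOS (MZ)
print(identify(bytes.fromhex("cffaedfe07000001")))   # Mach-O 64-bit (little-endian)
```

An "MZ" hit only proves a DOS header is present; confirming a PE requires following e_lfanew to the "PE\0\0" signature.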

objdump / readelf

Deep section and header analysis

objdump -h binary.exe
readelf -h binary.elf

🗂️ strings / nm

Extract strings and symbol tables

strings -a binary.exe
nm -D binary.elf
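At its core, `strings` reports runs of printable characters of at least a minimum length (4 by default in GNU strings). A Python sketch of the ASCII case only; GNU strings can also search other encodings (for example `-e l` for 16-bit little-endian text), which this sketch does not handle:

```python
import re

def ascii_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII (0x20-0x7E) at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, data)]

blob = b"\x00\x01kernel32.dll\x00\xffabc\x00URLDownloadToFileW\x00"
print(ascii_strings(blob))  # ['kernel32.dll', 'URLDownloadToFileW']
```

Note that the 3-byte run "abc" is dropped, which is exactly why short but meaningful tokens can be missed unless you lower the minimum length (`strings -n`).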

💎 Professional Tools

🔥 PE-bear / ELF Parser

Specialized parsers for each format

PE-bear: Windows PE analysis
ELF Parser: Linux ELF analysis

🔬 CFF Explorer

Complete PE file editor and analyzer

Features: Header editing, resource extraction, import table modification

⚡ HxD / 010 Editor

Hex editors for manual inspection and patching

Templates: 010 Editor's Binary Templates provide pre-built parsers for PE/ELF/Mach-O structures
Now let's dive into the file formats...
Keep YARA in mind as we explore each structure!
academy$ file sample.exe && hexdump -C sample.exe | head -5

[🏢 PORTABLE_EXECUTABLE_(PE)]

🎯 Why PE Format Mastery is Critical:

Windows Dominance

Most malware targets Windows, making PE analysis essential for threat hunters

Hiding Techniques

Packers, crypters, and rootkits manipulate PE structure to evade detection

Import Analysis

API imports reveal malware capabilities before dynamic analysis

Resource Extraction

Embedded payloads, configs, and certificates hidden in resource sections

📋 PE File Structure

PE files have a layered structure designed for efficient loading and execution. Understanding this hierarchy is key to effective analysis.

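PE File Layout (typical offsets; e_lfanew and header sizes vary per file)

  • 0x00: DOS Header (legacy compatibility)
  • 0x40: DOS Stub ("This program cannot be run in DOS mode.")
  • 0x80: PE Header (PE signature + COFF header)
  • 0x98: Optional Header (essential execution info)
  • 0x178: Section Headers (map of all sections; 0x188 for PE32+, whose optional header is 16 bytes larger)
  • 0x400: .text (executable code)
  • ...: .data (initialized data)
  • ...: .rsrc (resources such as icons and strings)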

Critical Headers

DOS Header (IMAGE_DOS_HEADER)
e_magic: "MZ" signature (0x5A4D, stored on disk as bytes 4D 5A)
e_lfanew: File offset of the PE header, stored at offset 0x3C
RE Tip: If e_lfanew is corrupted, tools may fail to parse the PE!
COFF Header (IMAGE_FILE_HEADER)
Machine: Target architecture (0x14C = x86, 0x8664 = x64)
Characteristics: File properties (e.g., executable, DLL)
TimeDateStamp: Link time, in seconds since 1970
Malware Insight: Timestamps can help cluster malware families or campaigns, but they are trivial to forge!
Optional Header
AddressOfEntryPoint: RVA where execution begins
ImageBase: Preferred load address
DataDirectory: Locations of the import/export tables, resources, and more
Security Note: An entry point outside the .text section (e.g., in the last section) often indicates packed malware!
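The header chain above can be followed with nothing but `struct`. A minimal sketch, assuming a well-formed file already read into memory; it does no bounds checking beyond the two signature tests, and the `MACHINES` table is deliberately partial:

```python
import struct
from datetime import datetime, timezone

MACHINES = {0x14C: "x86", 0x8664: "x64", 0xAA64: "ARM64"}

def parse_pe_headers(data: bytes) -> dict:
    """Walk DOS header -> PE signature -> COFF header -> start of optional header."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)       # DOS header field at 0x3C
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("e_lfanew does not point at a PE signature")
    # IMAGE_FILE_HEADER (COFF): 20 bytes right after the 4-byte signature
    machine, nsections, timestamp, _symptr, _nsyms, opt_size, characteristics = \
        struct.unpack_from("<HHIIIHH", data, e_lfanew + 4)
    opt_off = e_lfanew + 24
    (opt_magic,) = struct.unpack_from("<H", data, opt_off)   # 0x10B = PE32, 0x20B = PE32+
    # AddressOfEntryPoint sits at offset 16 in both PE32 and PE32+ optional headers
    (entry_rva,) = struct.unpack_from("<I", data, opt_off + 16)
    return {
        "e_lfanew": e_lfanew,
        "machine": MACHINES.get(machine, hex(machine)),
        "sections": nsections,
        "timestamp": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "is_dll": bool(characteristics & 0x2000),            # IMAGE_FILE_DLL
        "pe32_plus": opt_magic == 0x20B,
        "entry_rva": entry_rva,
        "sections_offset": opt_off + opt_size,               # section table follows
    }
```

Comparing `entry_rva` against each section's address range (read from the table at `sections_offset`) is how tools flag entry points that land outside .text.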

🔗 Import/Export Analysis

Import Analysis

Imports reveal which APIs the malware uses, often the first clue to its capabilities.

# Common malware imports
kernel32.dll: CreateFile, WriteFile, ReadFile
wininet.dll: InternetOpen, HttpOpenRequest
advapi32.dll: RegCreateKey, CryptEncrypt

  • [High] CreateMutex (kernel32.dll): prevent multiple infections
  • [High] VirtualAlloc (kernel32.dll): allocate executable memory
  • [Critical] URLDownloadToFile (urlmon.dll): download additional payloads
  • [Critical] SetWindowsHookEx (user32.dll): keylogging and hook-based DLL injection
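A first-pass import triage can be as simple as looking names up in a watchlist. A sketch using the APIs listed above, plus the legacy `SetWindowsHook` variant that appears in the workshop sample; the risk labels mirror this lesson's, and stripping a trailing A/W suffix is a simplification (real tooling also has to handle ordinal imports):

```python
# Watchlist built from the list above: API -> (risk, capability)
WATCHLIST = {
    "CreateMutex": ("High", "prevent multiple infections"),
    "VirtualAlloc": ("High", "allocate executable memory"),
    "URLDownloadToFile": ("Critical", "download additional payloads"),
    "SetWindowsHookEx": ("Critical", "keylogging / hook-based DLL injection"),
    "SetWindowsHook": ("Critical", "keylogging (legacy hook API)"),
}

def normalize(api: str) -> str:
    """Drop the ANSI/Unicode suffix: URLDownloadToFileW -> URLDownloadToFile."""
    if len(api) > 1 and api[-1] in "AW" and api[-2].islower():
        return api[:-1]
    return api

def triage(imports: list[str]) -> list[tuple[str, str, str]]:
    """Return (imported name, risk, capability) for every watchlisted import."""
    hits = []
    for api in imports:
        base = normalize(api)
        if base in WATCHLIST:
            risk, capability = WATCHLIST[base]
            hits.append((api, risk, capability))
    return hits

print(triage(["ReadFile", "URLDownloadToFileW", "CreateMutexW", "SetWindowsHookExA"]))
```

A watchlist hit is a lead, not a verdict: CreateMutex and VirtualAlloc are also used by plenty of benign software, so the combination of hits matters more than any single one.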

Export Analysis

Exports show functions that other programs can call - useful for DLL analysis.

# Typical DLL exports
DllMain
ServiceMain
InstallHook
InjectPayload ← Suspicious!
Analysis Workflow:
  1. Check for suspicious export names
  2. Cross-reference with import analysis
  3. Look for ordinal-only exports (obfuscation)
  4. Identify callback functions and hooks
academy$ readelf -h sample.elf && objdump -h sample.elf

[🐧 EXECUTABLE_LINKABLE_FORMAT_(ELF)]

🐧 Why ELF Analysis Matters:

Linux Dominance

Servers, IoT devices, Android - ELF is everywhere in modern infrastructure

Rootkit Analysis

Linux rootkits manipulate ELF structures for stealth and persistence

Symbol Stripping

Malware often strips symbols, making ELF header analysis crucial

Library Injection

Dynamic linking allows sophisticated injection and hooking techniques

📋 ELF File Structure

ELF File Layout

  • ELF Header: file identification & layout (7F 45 4C 46)
  • Program Headers: segment info for the loader (e.g., PT_LOAD)
  • Section Headers: section info for the linker (e.g., .text); the section header table is usually stored at the end of the file
  • .text Section: executable code (CODE)
  • .data Section: initialized data (DATA)
  • .bss Section: uninitialized data; occupies no space in the file (BSS)
  • Symbol Table: function/variable names (SYMTAB)
  • String Table: string storage (STRTAB)

ELF Header Deep Dive

Magic Numbers & Identification
e_ident[0-3]: 7F 45 4C 46 (0x7F followed by "ELF")
e_ident[4]: 32/64-bit (01/02)
e_ident[5]: Endianness (01=little, 02=big)
e_ident[7]: OS/ABI (Linux, FreeBSD, etc.)
Critical Fields
e_type: ET_EXEC, ET_DYN, ET_REL
e_machine: x86-64, ARM, MIPS
e_entry: Entry point address
e_phoff: Program header offset
⚠️ Malware Indicators
  • Modified e_entry pointing to shellcode
  • Unusual e_machine values for target platform
  • Corrupted program/section header counts
  • Non-standard ELF magic variations
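All of these fields live in the first 64 bytes of the file (52 for 32-bit files). A sketch that decodes them with `struct`, honoring EI_CLASS and EI_DATA; the type and machine tables are deliberately partial:

```python
import struct

E_TYPES = {1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN", 4: "ET_CORE"}
E_MACHINES = {0x03: "x86", 0x08: "MIPS", 0x28: "ARM", 0x3E: "x86-64", 0xB7: "AArch64"}

def parse_elf_header(data: bytes) -> dict:
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data, ei_osabi = data[4], data[5], data[7]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("invalid EI_CLASS or EI_DATA")
    endian = "<" if ei_data == 1 else ">"
    # e_type/e_machine are 16-bit in both classes; e_entry/e_phoff are address-sized
    addr = "I" if ei_class == 1 else "Q"
    e_type, e_machine, _e_version, e_entry, e_phoff = struct.unpack_from(
        endian + "HHI" + addr + addr, data, 16)
    return {
        "bits": 32 if ei_class == 1 else 64,
        "endian": "little" if ei_data == 1 else "big",
        "osabi": ei_osabi,
        "type": E_TYPES.get(e_type, hex(e_type)),
        "machine": E_MACHINES.get(e_machine, hex(e_machine)),
        "entry": e_entry,
        "phoff": e_phoff,
    }
```

The order matters: e_ident[4] and e_ident[5] must be read first, because they decide the width and byte order of every later field.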

🔗 Dynamic Linking & Dependencies

Library Dependencies

# Check dynamic dependencies (ldd may execute the target; on untrusted samples use: objdump -p sample | grep NEEDED)
$ ldd /bin/ls
linux-vdso.so.1 (0x7fff8d1fe000)
libselinux.so.1 => /lib64/libselinux.so.1
libc.so.6 => /lib64/libc.so.6
/lib64/ld-linux-x86-64.so.2
Normal Dependencies
  • libc.so.6 (standard C library)
  • libm.so.6 (math library)
  • libpthread.so.0 (threading)
Suspicious Dependencies
  • Unknown .so files in /tmp
  • Libraries with random names
  • Missing NEEDED entries (static linking)
  • Unusual library paths
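Flagging the path-based patterns above can be scripted over saved `ldd` output. A sketch; the `TRUSTED_DIRS` tuple is an assumption to adjust per distribution, and the `/tmp/libhook.so.1` line in the sample text is hypothetical (borrowed from the workshop's rootkit simulation):

```python
TRUSTED_DIRS = ("/lib/", "/lib64/", "/usr/lib/", "/usr/lib64/")  # assumption: tune per distro

def flag_dependencies(ldd_text: str) -> list[str]:
    """Return resolved library paths from ldd output that sit outside trusted dirs."""
    flagged = []
    for line in ldd_text.splitlines():
        # "name => /path (addr)" for libraries, "/path (addr)" for the dynamic loader
        target = line.split("=>", 1)[1] if "=>" in line else line
        tokens = target.split()
        path = tokens[0] if tokens else ""
        # Skip virtual objects (linux-vdso.so.1) and "not found" entries
        if path.startswith("/") and not path.startswith(TRUSTED_DIRS):
            flagged.append(path)
    return flagged

ldd_text = """\
linux-vdso.so.1 (0x7fff8d1fe000)
libselinux.so.1 => /lib64/libselinux.so.1
libhook.so.1 => /tmp/libhook.so.1
libc.so.6 => /lib64/libc.so.6
/lib64/ld-linux-x86-64.so.2
"""
print(flag_dependencies(ldd_text))  # ['/tmp/libhook.so.1']
```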

Symbol Analysis

# Extract symbol information
$ nm -D binary.elf
U printf@@GLIBC_2.2.5
U socket@@GLIBC_2.2.5
0000000000001040 T main
U system@@GLIBC_2.2.5 ← Danger!
Symbol Types
U - Undefined (imported)
T - Text section (code)
D - Initialized data
B - Uninitialized data
High-Risk Functions
  • system(), execve() - Command execution
  • socket(), connect() - Network activity
  • dlopen(), dlsym() - Dynamic loading
  • ptrace() - Anti-debugging/injection
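`nm` output is easy to post-process. A sketch that splits each line into (address, type, name), strips the symbol version suffix, and flags undefined (imported) symbols from the high-risk list above; it assumes the default BSD-style output format shown earlier, and `nm_out` is illustrative sample text:

```python
HIGH_RISK = {
    "system": "command execution", "execve": "command execution",
    "socket": "network activity", "connect": "network activity",
    "dlopen": "dynamic loading", "dlsym": "dynamic loading",
    "ptrace": "anti-debugging/injection",
}

def parse_nm(output: str) -> list[tuple[str, str, str]]:
    """Parse nm lines into (address, type, name); undefined symbols have no address."""
    symbols = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 3:
            addr, sym_type, name = parts
        elif len(parts) == 2:
            addr, (sym_type, name) = "", parts
        else:
            continue
        symbols.append((addr, sym_type, name.split("@")[0]))  # drop @@GLIBC_x.y
    return symbols

def risky_imports(output: str) -> dict[str, str]:
    """Map each undefined (type U) high-risk symbol to its capability."""
    return {name: HIGH_RISK[name] for _, t, name in parse_nm(output)
            if t == "U" and name in HIGH_RISK}

nm_out = """\
                 U printf@@GLIBC_2.2.5
                 U socket@@GLIBC_2.2.5
0000000000001040 T main
                 U system@@GLIBC_2.2.5
"""
print(risky_imports(nm_out))  # {'socket': 'network activity', 'system': 'command execution'}
```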
academy$ file sample.app && otool -h sample.app

[🍎 MACH_O_FORMAT]

🍎 Mach-O in Modern Security:

macOS Malware Rise

Growing macOS user base attracts more sophisticated malware targeting Mach-O

iOS/Mobile Security

iPhone apps use Mach-O - critical for mobile malware analysis

Code Signing

Apple's code signing stored in Mach-O structures - bypass techniques exist

Dylib Hijacking

Dynamic library loading vulnerabilities unique to Mach-O format

📋 Mach-O File Structure

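Mach-O Layout

  • Mach Header: CPU type, file type, load command count (FEEDFACE / FEEDFACF)
  • Load Commands: instructions for the loader/linker (e.g., LC_SEGMENT)
  • Segment Data: actual segment contents (e.g., __TEXT, __DATA)
Mach-O Magic Numbers
0xFEEDFACE: 32-bit Mach-O
0xFEEDFACF: 64-bit Mach-O
0xCAFEBABE: Universal (fat) binary; Java .class files start with the same value
On little-endian targets (x86, ARM) the magic is stored byte-swapped, so hexdump shows CE FA ED FE or CF FA ED FE; the universal header is always big-endian (CA FE BA BE).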
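Because a Mach-O file stores its magic in its own byte order, reading the first four bytes as little-endian tells you both the format and the endianness. A sketch of decoding the header fields described above (cputype, filetype, ncmds); the CPU and file-type tables are partial:

```python
import struct

CPU_TYPES = {7: "x86", 0x01000007: "x86_64", 12: "ARM", 0x0100000C: "ARM64"}
FILE_TYPES = {1: "MH_OBJECT", 2: "MH_EXECUTE", 6: "MH_DYLIB", 8: "MH_BUNDLE"}

def parse_macho_header(data: bytes) -> dict:
    (magic,) = struct.unpack_from("<I", data, 0)
    if magic in (0xFEEDFACE, 0xFEEDFACF):
        endian = "<"                                  # little-endian file (x86, ARM)
    elif magic in (0xCEFAEDFE, 0xCFFAEDFE):
        endian = ">"                                  # byte-swapped: big-endian file
    elif data[:4] == b"\xca\xfe\xba\xbe":
        return {"kind": "universal (fat) or Java class"}
    else:
        raise ValueError("not a Mach-O file")
    (magic,) = struct.unpack_from(endian + "I", data, 0)
    is64 = magic == 0xFEEDFACF
    cputype, _subtype, filetype, ncmds, sizeofcmds, _flags = struct.unpack_from(
        endian + "iiIIII", data, 4)
    return {
        "kind": "64-bit" if is64 else "32-bit",
        "cpu": CPU_TYPES.get(cputype, hex(cputype)),
        "filetype": FILE_TYPES.get(filetype, hex(filetype)),
        "ncmds": ncmds,
        "sizeofcmds": sizeofcmds,
        "header_size": 32 if is64 else 28,   # mach_header_64 adds a reserved field
    }
```

Load commands start immediately after the header, so `header_size` and `ncmds` are exactly what a load-command walker needs next.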

Load Commands Deep Dive

Load commands tell the system how to load and link the executable. Understanding these is key to Mach-O analysis.

Essential Load Commands
LC_SEGMENT_64: Define memory segments
LC_LOAD_DYLIB: Dynamic library dependencies
LC_MAIN: Entry point for executables
LC_CODE_SIGNATURE: Code signing information
Security Implications
  • LC_LOAD_DYLIB: Library hijacking vectors
  • LC_RPATH: Runtime search path manipulation
  • Modified segments: Code injection points
  • Missing signatures: Unsigned/tampered binaries
Analysis Commands
otool -l binary # List load commands
otool -L binary # Show dependencies
codesign -dv binary # Check signatures
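`otool -l` is essentially a loop over (cmd, cmdsize) pairs. A sketch for a little-endian 64-bit file, assuming `ncmds` has already been read from the Mach header; it decodes the path string of LC_LOAD_DYLIB, LC_RPATH, and LC_LOAD_WEAK_DYLIB (not listed above, but a classic hijacking target because dyld carries on silently if a weak library is missing) and leaves other commands undecoded:

```python
import struct

LOAD_COMMANDS = {
    0x19: "LC_SEGMENT_64",
    0x0C: "LC_LOAD_DYLIB",
    0x80000018: "LC_LOAD_WEAK_DYLIB",
    0x8000001C: "LC_RPATH",
    0x80000028: "LC_MAIN",
    0x1D: "LC_CODE_SIGNATURE",
}
PATH_COMMANDS = {0x0C, 0x80000018, 0x8000001C}

def walk_load_commands(data: bytes, ncmds: int, offset: int = 32) -> list[tuple[str, str]]:
    """Return (command name, path or "") for each load command after the header."""
    results = []
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        if cmdsize < 8:
            raise ValueError(f"corrupt cmdsize {cmdsize} at offset {offset:#x}")
        arg = ""
        if cmd in PATH_COMMANDS:
            # dylib_command / rpath_command: string offset (relative to the command) follows cmdsize
            (str_off,) = struct.unpack_from("<I", data, offset + 8)
            raw = data[offset + str_off: offset + cmdsize]
            arg = raw.split(b"\x00", 1)[0].decode("utf-8", "replace")
        results.append((LOAD_COMMANDS.get(cmd, hex(cmd)), arg))
        offset += cmdsize                   # each command records its own size
    return results
```

A dylib path under /tmp, or an @rpath-relative dylib whose search paths include writable directories, is exactly what the hijacking checks in the workshop look for.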
academy$ cd /workshop && ls -la samples/

[🔬 HANDS_ON_WORKSHOP]

Time to put your knowledge to work! We'll analyze real samples from each format, identifying key structures and potential security issues.

🔗 YARA Integration in Practice

As you work through each exercise, think about how the patterns you discover could become YARA rules. The suspicious APIs, file structures, and strings you identify are exactly what YARA searches for!

⚠️ Important: These are text-based educational simulations that contain the patterns and strings of real malware without being executable. Tools like objdump, readelf, and otool won't work on them; focus on strings and grep for pattern analysis. Some magic numbers may not be present at the exact file offsets: the educational value is in learning to recognize the structural patterns and suspicious strings!


1. PE Analysis: Suspected Trojan

🛡️ Safe Analysis Environment

This sample is a safe educational simulation designed to demonstrate PE structure analysis without any harmful behavior.

Step 1: Basic Information

$ file samples/trojan_sim.exe
Unicode text, UTF-8 text, with very long lines
$ ls -la samples/trojan_sim.exe
-rw-r--r-- 1 user staff 783 Dec 25 10:30
Educational Note: This is a text-based PE simulation for safe learning - contains PE structure patterns without being executable

Step 2: String Analysis (Import Patterns)

$ strings samples/trojan_sim.exe | grep -E "dll"
...kernel32.dll...wininet.dll...advapi32.dll...urlmon.dll...
$ strings samples/trojan_sim.exe | grep "URLDownload"
URLDownloadToFileW ← CRITICAL
$ strings samples/trojan_sim.exe | grep "Mutex"
CreateMutexW
$ strings samples/trojan_sim.exe | grep "Hook"
SetWindowsHookW
🚨 Red Flag: URLDownloadToFileW indicates downloading capability!

Step 3: Structure Analysis

$ strings samples/trojan_sim.exe | grep -E "(text|data|rsrc)"
.text .data .rsrc
$ hexdump -C samples/trojan_sim.exe | head -3
00000000 4d 5a 20 20 20 20 20 20 |MZ |
... 54 68 69 73 20 70 72 |This pr |
Learning Focus: Notice the "MZ" DOS header signature and the DOS stub text ("This pr..."); this simulates real PE structure for safe analysis practice

Step 4: String Analysis

$ strings samples/trojan_sim.exe | grep -E "(http|tmp|\.exe)"
http://malware-c2.example.com/payload.exe
C:\\temp\\backdoor.exe
Software\\Microsoft\\Windows\\CurrentVersion\\Run
Analysis: C2 server, persistence registry key, temp file drops
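The grep pipeline in Step 4 can be turned into a small indicator extractor. A sketch with deliberately simple regexes for URLs, Windows paths, and Run keys (real IOC extraction needs more careful patterns); the `sample` line is made up to mimic the simulation's escaped backslashes, which the function collapses first:

```python
import re

IOC_PATTERNS = {
    "url": re.compile(r"https?://[^\s\"'<>]+"),
    "windows_path": re.compile(r"[A-Za-z]:\\[^\s\"'<>]+"),
    "run_key": re.compile(r"Software\\Microsoft\\Windows\\CurrentVersion\\Run(?:Once)?", re.I),
}

def extract_iocs(text: str) -> dict[str, list[str]]:
    """Return unique, sorted matches for each indicator type."""
    text = text.replace("\\\\", "\\")   # sample stores paths with doubled backslashes
    return {kind: sorted(set(rx.findall(text))) for kind, rx in IOC_PATTERNS.items()}

sample = r"GET http://malware-c2.example.com/payload.exe drop C:\\temp\\backdoor.exe key Software\\Microsoft\\Windows\\CurrentVersion\\Run"
print(extract_iocs(sample))
```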

📊 Analysis Summary

Malware Category

Dropper/Downloader Trojan

Risk Level

High - Downloads additional payloads

Persistence

Registry Run key modification

2. ELF Analysis: Suspected Rootkit

🛡️ Educational Sample

Safe rootkit simulation for learning ELF analysis techniques. Contains no actual malicious functionality.

Step 1: File Identification

$ file samples/rootkit_sim
Unicode text, UTF-8 text, with very long lines
$ ls -la samples/rootkit_sim
-rw-r--r-- 1 user staff 842 Dec 25 10:30
$ hexdump -C samples/rootkit_sim | head -2
00000000 45 4c 46 ... |ELF.........|
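Educational Note: Text-based ELF simulation: notice the "ELF" bytes (45 4c 46) at the start! A real ELF file begins with the full magic 7f 45 4c 46; the simulation omits the non-printable 0x7F byte.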

Step 2: String Analysis (Symbol Patterns)

$ strings samples/rootkit_sim | grep "@@GLIBC"
printf@@GLIBC_2.2.5
ptrace@@GLIBC_2.2.5 ← Anti-debug
dlopen@@GLIBC_2.2.5 ← Dynamic loading
socket@@GLIBC_2.2.5
system@@GLIBC_2.2.5 ← Command execution
$ strings samples/rootkit_sim | grep "_process"
hide_process
$ strings samples/rootkit_sim | grep "hook"
hook_syscall

Step 3: Library Dependencies

$ strings samples/rootkit_sim | grep "\.so"
libc.so.6
libdl.so.2
/tmp/libhook.so.1 ← Suspicious!
$ strings samples/rootkit_sim | grep "/tmp"
/tmp/libhook.so.1
🚨 Alert: A shared library loaded from /tmp strongly suggests malicious injection

Step 4: Section Analysis

$ strings samples/rootkit_sim | grep -E "\.(text|data|bss)"
.text .data .bss
.payload .symtab .strtab
$ strings samples/rootkit_sim | grep "payload"
.payload
inject_payload
Critical: References to ".payload" section and "inject_payload" function indicate code injection capabilities
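Spotting an unusual section such as `.payload` amounts to comparing section names against the ones a normal toolchain emits. A sketch with a hypothetical, intentionally incomplete allowlist of common GCC/binutils sections; on a real binary the names would come from `readelf -S`, while here the input mirrors Step 4's output:

```python
# Hypothetical allowlist of common GCC/binutils section names (incomplete by design).
STANDARD_SECTIONS = {
    ".interp", ".note.gnu.build-id", ".note.ABI-tag", ".gnu.hash", ".dynsym",
    ".dynstr", ".gnu.version", ".gnu.version_r", ".rela.dyn", ".rela.plt",
    ".init", ".plt", ".plt.got", ".text", ".fini", ".rodata", ".eh_frame_hdr",
    ".eh_frame", ".init_array", ".fini_array", ".dynamic", ".got", ".got.plt",
    ".data", ".bss", ".comment", ".symtab", ".strtab", ".shstrtab",
}

def unusual_sections(names: list[str]) -> list[str]:
    """Return section names not in the allowlist, preserving their order."""
    return [n for n in names if n and n not in STANDARD_SECTIONS]

print(unusual_sections([".text", ".data", ".bss", ".payload", ".symtab", ".strtab"]))
# ['.payload']
```

Expect some noise: compilers, languages, and hardening features add legitimate sections, so tune the allowlist to your corpus before treating hits as malicious.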

🔍 Rootkit Indicators Found

  • Anti-debugging: ptrace() usage to detect analysis
  • Dynamic injection: dlopen() for runtime library loading
  • Syscall hooking: Functions to intercept system calls
  • RWX sections: Self-modifying code capability
  • Temp libraries: Malicious shared objects in /tmp

3. Mach-O Analysis: macOS Malware

🛡️ macOS Sample

Educational Mach-O simulation demonstrating dylib hijacking and code signing bypass patterns. Completely safe for analysis.

Step 1: File Type & Architecture

$ file samples/macos_malware.app
Unicode text, UTF-8 text, with very long lines
$ ls -la samples/macos_malware.app
-rw-r--r-- 1 user staff 879 Dec 25 10:30
$ strings samples/macos_malware.app | grep "MACH-O"
EDUCATIONAL MACH-O SIMULATION
Educational Note: Text-based Mach-O simulation - contains Mach-O structure patterns and load commands for learning!

Step 2: Load Commands

$ strings samples/macos_malware.app | grep "LC_"
LC_SEGMENT_64
LC_LOAD_DYLIB
LC_MAIN
LC_CODE_SIGNATURE
$ strings samples/macos_malware.app | grep "malicious"
/tmp/malicious.dylib ← Suspicious!

Step 3: Code Signing Analysis

$ strings samples/macos_malware.app | grep -i "sign"
LC_CODE_SIGNATURE
Code signing bypass simulation
$ strings samples/macos_malware.app | grep -i "unsigned"
Unsigned binary for educational purposes
Major Red Flag: References to "unsigned" and "code signing bypass" indicate malicious intent

Step 4: Dynamic Libraries

$ strings samples/macos_malware.app | grep -E "(dylib|framework)"
/usr/lib/libSystem.B.dylib
Foundation.framework/Foundation
/tmp/malicious.dylib ← Hijacking target
Evil.framework/Evil
$ strings samples/macos_malware.app | grep "@executable"
@executable_path/../Frameworks/Evil.framework/Evil
@executable_path manipulation examples
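The dylib paths from Step 4 can be triaged automatically. A sketch that sorts each path into a rough risk bucket; the `SYSTEM_PREFIXES` tuple is an assumption, and a real hijacking check must also resolve @rpath against the binary's LC_RPATH entries and test which candidate files actually exist:

```python
SYSTEM_PREFIXES = ("/usr/lib/", "/System/Library/")   # assumption: Apple-managed locations

def classify_dylib(path: str) -> str:
    """Bucket a dylib/framework path by how much an attacker could influence it."""
    if path.startswith(("@executable_path/", "@loader_path/", "@rpath/")):
        return "relative: resolved at load time, check for hijacking"
    if path.startswith(("/tmp/", "/private/tmp/", "/var/tmp/")):
        return "suspicious: world-writable temp directory"
    if path.startswith(SYSTEM_PREFIXES):
        return "system"
    if path.startswith("/"):
        return "absolute: third-party location"
    return "non-absolute: possibly a truncated strings hit, resolve manually"

for p in ["/usr/lib/libSystem.B.dylib", "/tmp/malicious.dylib",
          "@executable_path/../Frameworks/Evil.framework/Evil", "Evil.framework/Evil"]:
    print(p, "->", classify_dylib(p))
```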

🍎 macOS Malware Characteristics

Bypass Techniques

Unsigned code, dylib hijacking

Persistence

Framework injection, @rpath manipulation

Stealth

Exploits code signing gaps

academy$ ./advanced_challenge --file-format-mastery

[🏆 MASTER_CHALLENGE]

🎯 Multi-Format Analysis Challenge

You've been given a suspicious file that appears to be cross-platform malware: it ships in a different executable format depending on the target system. Your mission: analyze all three variants.

Phase 1: Format Detection

  • Identify which format each sample uses
  • Extract magic numbers and signatures
  • Determine target architectures

Phase 2: Cross-Platform Analysis

  • Compare import/symbol tables across formats
  • Find common functionality indicators
  • Map equivalent APIs between platforms

Phase 3: Advanced Techniques

  • Extract embedded payloads from each format
  • Identify packing/obfuscation techniques
  • Create YARA rules for detection
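A common heuristic for the packing/obfuscation step is Shannon entropy: compressed or encrypted data approaches 8 bits per byte, while ordinary code and text sit noticeably lower. A sketch; the 7.2 threshold is a rule of thumb rather than a standard, and it works best applied per section rather than to the whole file:

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Bits per byte: 0.0 for constant data, 8.0 for a perfectly uniform byte spread."""
    if not data:
        return 0.0
    total = len(data)
    # sum of p * log2(1/p) over the byte values that actually occur
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())

def looks_packed(data: bytes, threshold: float = 7.2) -> bool:
    return shannon_entropy(data) >= threshold

print(shannon_entropy(b"A" * 1024))          # 0.0
print(shannon_entropy(bytes(range(256))))    # 8.0
```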

🛠️ Tools You'll Need

Command Arsenal

# Multi-format analysis
file samples/*
hexdump -C samples/* | head
binwalk samples/*
# Format-specific tools
objdump -x samples/sample.exe
readelf -a samples/sample.elf
otool -tv samples/sample.app
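`binwalk` finds embedded files by scanning for known signatures at every offset. A much-reduced sketch of that idea, using magic values from this lesson plus two common container formats (ZIP and gzip); binwalk's real signature set and validation are far richer, and short signatures such as "MZ" will produce coincidental hits:

```python
SIGNATURES = {
    b"MZ": "DOS/PE header",
    b"\x7fELF": "ELF",
    b"\xcf\xfa\xed\xfe": "Mach-O 64-bit",
    b"PK\x03\x04": "ZIP archive",
    b"\x1f\x8b\x08": "gzip (deflate)",
}

def scan_signatures(data: bytes) -> list[tuple[int, str]]:
    """Return (offset, label) for every occurrence of every signature, sorted by offset."""
    hits = []
    for magic, label in SIGNATURES.items():
        start = data.find(magic)
        while start != -1:
            hits.append((start, label))
            start = data.find(magic, start + 1)
    return sorted(hits)

blob = b"junk" + b"\x7fELF\x02\x01" + b"...." + b"PK\x03\x04data"
print(scan_signatures(blob))  # [(4, 'ELF'), (14, 'ZIP archive')]
```

Each hit is only a candidate; confirm it by parsing the header at that offset before carving the data out.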

📦 Challenge Materials

Download all samples and the step-by-step analysis guide to complete the master challenge.