Reverse for Generics

泛型是高级编程语言中广泛使用的一种特性，能够帮助开发人员在构建代码时大大减少重复代码定义。在Rust中同样存在泛型定义。

从汇编的角度来看，泛型实际上是让编译器代替开发人员辅助完成代码构建。泛型特性本身并不会在汇编层面中被明确表现，具有不同泛型类型的类在编译后即会成为不同的类。

如在C++中，我们定义vector<int>和vector<double>时，即已经告诉编译器应该如何定义vector数组中的元素长度，前者为4而后者为8。可以说，泛型的“动态”特性经过编译器的处理后，在汇编层将会成为完全静态的状态。

那么Rust对于泛型的处理是否和其他语言一样呢？开始测试。

Generics in Functions

fn largest<T: PartialOrd>(list: &[T]) -> &T {
    let mut largest: &T = &list[0];
    for item in list.iter() {
        if item > largest {
            largest = &item;
        }
    }
    largest
}

pub fn main() {
    let number_list = vec![34, 50, 25, 100, 65];
    let result = largest(&number_list);
    println!("The largest number is {}", result);
    let char_list = vec!['y', 'm', 'a', 'q'];
    let result = largest(&char_list);
    println!("The largest char is {}", result);
}

上面这个示例是典型的泛型使用示例，这里是将泛型应用于函数参数与返回值中。

这里使用Rust 1.80.1版本以Debug配置进行编译，使用Ghidra进行反汇编。

可以看到，Rust对于泛型的处理和其他语言似乎是差不多的，在main函数中，调用了2次largest函数，泛型参数类型分别为i32和char，因此这里Rust就生成了2个largest函数，在name mangling（参考资料）中看不出来两个函数的泛型参数，但是通过Hash值可知两个函数不同。

但是Ghidra却能够神奇地识别出来这两个函数哪个将泛型实现为char，哪个为i32。这又是如何做到的呢？

通过将Rust程序以release配置编译后发现，在release版本下，Ghidra已经无法识别函数的泛型类型，而是直接使用空来代替了（<>）。也就是说，Ghidra对于Rust泛型函数中泛型参数的识别必然是通过调试数据完成的。

在Rust ELF中，debug配置输出的ELF文件较release多了很多的节，这些节大多以debug开头，其中保存有一些调试信息。

其中最长的一个节是用于保存调试字符串的.debug.str节。在其他debug节中基本只能找到一些地址值，推测Ghidra可以通过这些节的数据找到想要的调试字符串，随后得到泛型参数信息。

.debug_str节包含的字符串数量实在太多，Ghidra还没有添加任何符号，因此我编写了一个Ghidra脚本识别其中的所有字符串并将其全部设置为String类型。（Ghidra脚本在本文最后附上）

然后就让我发现了：

这些字符串的排列还非常地整齐，能够进行完美的对应。由此，我们自己也可以识别出Rust中的泛型类型了。不过如果获取的ELF是release版本（未stripped），我们就无从得知这里的信息了，但依然可以通过函数名进行推断。由于Rust不允许C++/Java那样的同名不同参数类型函数重载，因此如果我们发现有多个函数除Hash值之外函数名完全相同，那么即可推断出这些函数一定是由一个泛型函数衍生而来。而且，这些函数中除了泛型类型不同之外其他内容完全相同，通过比较这些函数的差异，我们不仅能够知道泛型参数被用于何处，还能够分别识别出这些函数中实现的泛型参数。

下面我们尝试分析一下其中的一个函数：largest<i32>。

Reverse for `largest<i32>`

首先通过对main函数的逆向，我们可以知道这个函数的传入参数类型。Ghidra专门定义了一个Rust的函数调用规则，称为__rustcall，用于标识Rust那种以两个参数作为返回值的函数调用规则。

该函数的参数由2个参数传递，共同标识一个&[i32]类型，这里判断不是Vec[i32]或者其他数组类型的原因是2个寄存器是切片类型传递所需要的，一个表示指针，另一个表示长度，和&str相同。

Part 1

一开始，这里Ghidra的汇编给的有问题，虽然已经识别了2个寄存器共同表示切片，但是寄存器对应关系出现了错误。这里应该是将切片数据复制2份分别保存到[rsp+0x50]和[rsp+0x10]。随后对切片的长度进行了验证，后面的JNC是跳转到panic相关操作的，这里略过。

Part 2

随后，注意0x108f79的指令，这里将rdi的值保存到了[rsp+0x40]中，实际上是对应了源码中的let mut largest: &T = &list[0];。对于&list[0]，由于list[0]指向的是第一个元素，因此地址就等于切片中保存的地址，所以就可以直接取这个地址保存作为&list[0]。后面的迭代过程与第5篇blog中分析的相同。

Part 3

这里上半部分是对Option<i32>的解析，判断迭代是否到达终点。下面[rsp+0x20]是对Option<i32>中i32值的提取，随后进行比较，这里比较使用的是core::cmp::impls::gt<i32,_i32>，即对i32内置类型实现的大于比较，因为这个函数对于泛型要求使用PartialOrd Trait，因此这里可以使用gt。

通过对泛型函数内调用的函数的分析，我们可以逆向出该函数泛型类型需要实现的Trait。

Generics in Structs

struct Point<T> {
    x: T,
    y: T,
}

fn main() {
    let integer = Point { x: 5, y: 10 };
    let float = Point { x: 1.0, y: 4.0 };

    println!("({}, {})", integer.x, integer.y);
    println!("({}, {})", float.x, float.y);
}

同样地，我们能够在.debug_str节中找到对应的字符串定义：

不过这个定义与代码段的联系并不紧密，但好在Ghidra还是能够成功地完成识别，甚至连结构体的内容、局部变量的名称都能够获取：

这些信息当然就不可能是.debug_str节中能够获取的了，可能是保存到了其他的debug节，但具体的操作细节仍未可知，不敢妄下断言。另外，注意到Ghidra没有办法将局部变量对应到正确的栈偏移，因此对于局部变量与栈偏移的对应关系，还需要我们自己通过逆向分析确定。

Reverse for Trait

Trait实现

在Rust中，伴随泛型而存在的往往是各种各样的Trait，从功能上面来看，它类似于Java的interface，是需要被实现的某种“接口”，有了这个接口，Struct就能够完成一类操作。

pub trait Summary {
    fn summarize(&self) -> String;
}

pub struct NewsArticle {
    pub headline: String,
    pub location: String,
    pub author: String,
    pub content: String,
}

impl Summary for NewsArticle {
    fn summarize(&self) -> String {
        format!("{}, by {} ({})", self.headline, self.author, self.location)
    }
}

fn main() {
    let na = NewsArticle {
        headline: "BBC".to_string(),
        location: "England".to_string(),
        author: "NA".to_string(),
        content: "NA".to_string(),
    };
    println!("{}", na.summarize());
}

上面的Rust编译得到的debug文件中，我们只需要关注一个东西即可——实现Trait的方法名。有关于Struct的相关分析在前面的文章中已经提及。

1	_ZN55_$LT$lab_04..NewsArticle$u20$as$u20$lab_04..Summary$GT$9summarize17hc6d5c31e0731e854E

转义完成后就变成了：

1	<lab_04::NewsArticle as lab_04::Summary> summarize

尖括号里面左边是Struct名，右边是Trait名，最后是Trait中的方法名。因此这个函数名完全可以让我们还原出它的原貌。另外需要注意这个函数的参数类型。它的返回值是作为一个指针以第一个参数传递的，第二个参数则是Struct实例本身的指针，这种传参方式和__thiscall比较像，不过是在前面加上了返回值的传参。

默认Trait

另外，Rust支持实现默认的Trait方法，经过尝试发现，如果一个Struct实现了默认的Trait方法，即在impl块中啥都不写，在ELF中就会为这个Struct生成一个函数，这个函数的this指针类型为该Struct。如果有多个Struct都实现了默认Trait方法，就会生成多个这样的函数，这些函数在命名上除了Hash值不同之外其他相同：

1 2	_ZN6lab_047Summary9summarize17h2b64b82f1eb531e6E _ZN6lab_047Summary9summarize17h00d706f563e0ebb6E

转义之后为：

1	lab_04::Summary::summarize

仅提供Trait名与方法名。

Trait作为参数

经过试验发现，Trait作为参数时，与函数加入泛型有着类似的处理方式。实际上，下面两种写法在语义上本来就是类似的：

pub fn notify(item: impl Summary) {
    println!("Breaking news! {}", item.summarize());
}

pub fn notify<T: Summary>(item: T) {
    println!("Breaking news! {}", item.summarize());
}

不过有一点不同的是，带有泛型的函数（第二种写法）在调试信息中可以找到函数实现的泛型类型（表现为notify<xxx>(xxx item)），但是第一种写法只能给出具体的类型（表现为notify(xxx item)）。在用法上，这两种写法有微小区别。如果参数有多个我们就能看出区别：

pub fn notify(item1: impl Summary, item2: impl Summary) {
    ...
}

pub fn notify<T: Summary, U: Summary>(item1: T, item2: U) {
    ...
}

pub fn notify<T: Summary>(item1: T, item2: T) {
    ...
}

上面3种写法，前两种语义相同，但第一种与第三种语义不同，第三种强制两个参数必须为同一个类型。在汇编层实现上，对于第一种写法，如果一共有n个Struct实现了Summary，那么最多要实现n²个函数，用于表示n²种不同的两个参数类型组合。

在原书中还提到了有条件实现方法等Trait的使用方式，但无论如何，Rust编译器都需要对每一种不同参数的函数/方法/Struct使用分别实现函数/方法/Struct。另外还有将impl类型作为返回值的说明，这部分留到分析原书第17章相关内容时再去研究。

总结

本文简单分析了泛型以及Trait在汇编层的表现：

带有泛型参数的函数对于每一种不同类型的参数传递操作都会实现一个单独的函数处理。
带有泛型的结构在实现中直接针对具体的类型分别实现。
Ghidra能够从带有调试信息的Rust ELF中获取函数实现的泛型类型。
实现了Trait的Struct的对应方法能够通过函数名识别。
实现了默认Trait的Struct的对应方法只能根据参数类型推断具体实现的Struct。
如果有同名函数（不比较Hash）存在，由两种可能——带有泛型的函数与实现了默认Trait的Struct方法，如果有调试信息，则能够获取泛型类型的为前者，否则为后者。

字符串恢复脚本

// This is a Ghidra script for adding string literals that may not be recognized by Ghidra.
//
// Usage:
// Before the script process, you need to notice the JSON file including necessary information. It's automatically
// saved in "<HOMEDIR>/.ghidra_scripts/ShortStringAdder.json".
//
// The JSON file needs to be deserialized into `ConfigReader` class, here is its format:
//
// {
//   "minimum_string_length": 3,
//   "maximum_search_range": 256,   // 0 represents no check for adjacent strings
//   "separators": [0],
//   "extra_regex": ["\u001b\\[([0-9]+;)*([0-9]+)m([\\x20-\\x7e]|\\t|\\n|\\r|(\u001b\\[([0-9]+;)*([0-9]+)m))*"]
// }
//
// - minimum_string_length: The minimum length of string that will be recognized and added.
// - maximum_search_range: We will search if there exist other strings frontward and backward, it's the maximum range.
// - separators: The separators of string literal. In some programming language (like rust), string literals might
//               not end with "\0". You can add ascii integer value here to add separators.
// - extra_regex: some regex strings that you may want to define them as strings (even if they may have some unprintable
//                chars), like UNIX color control (Added by default).
//
//@author Hornos - Hornos3.github.com, hornos@hust.edu.cn
//@category Strings

import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import ghidra.app.plugin.core.searchmem.RegExSearchData;
import ghidra.app.script.GhidraScript;
import ghidra.program.model.address.*;
import ghidra.program.model.data.DataType;
import ghidra.program.model.data.StringDataType;
import ghidra.program.model.listing.Data;
import ghidra.program.model.mem.Memory;
import ghidra.program.model.mem.MemoryAccessException;
import ghidra.program.model.util.CodeUnitInsertionException;
import ghidra.util.datastruct.ListAccumulator;
import ghidra.util.search.memory.MemSearchResult;
import ghidra.util.search.memory.RegExMemSearcherAlgorithm;
import ghidra.util.search.memory.SearchInfo;

import java.io.*;
import java.util.*;

import static ghidra.program.model.listing.CodeUnit.EOL_COMMENT;

public class ShortStringAdder extends GhidraScript {
    static class ConfigReader {
        int minimum_string_length;
        int maximum_search_range;
        HashSet<Byte> separators;
        ArrayList<String> extra_regex;

        public static ConfigReader from_json(String filename)
                throws FileNotFoundException, com.google.gson.JsonSyntaxException, com.google.gson.JsonIOException {
            GsonBuilder builder = new GsonBuilder();
            Gson gson = builder.create();

            FileReader file = new FileReader(filename);
            return gson.fromJson(file, ConfigReader.class);
        }
    }

    ConfigReader config;
    private final String homedir = System.getProperty("user.home");

    @Override
    protected void run() throws Exception {
        read_config();

        Memory mem = currentProgram.getMemory();
        AddressSetView addrs = mem.getAllInitializedAddressSet();
        long numAddr = addrs.getNumAddresses();
        long counter = 0;
        int progress_percent = 0;
        printf("Address size: %#x\n", numAddr);

        TreeMap<Address, String> string_map = new TreeMap<>();
        // check all addresses
        for (Address addr : addrs.getAddresses(true)) {
            counter++;
            if (counter * 100 / numAddr > progress_percent)
                printf("Progress: %d%%\n", ++progress_percent);

            String created = create_string(addr);
            if ( created == null )
                continue;

            string_map.put(addr, created);
        }

        // match all regexes
        TreeMap<String, TreeMap<Address, String>> regexMatches = match_regex();
        print_regex_match_results(regexMatches);
        string_map.putAll(merge_regex_result(regexMatches));

        Vector<Address> errorAddrs = new Vector<>();

        for(Map.Entry<Address, String> entry: string_map.entrySet()) {
            String created = entry.getValue();
            Address addr = entry.getKey();
            DataType dt = new StringDataType();

            if (!has_adjacent_strings(addr)) {
                printf("Address %#x (%s) has no adjacent string, skipped\n", addr.getOffset(), created);
                continue;
            }

            try {
                currentProgram.getListing().createData(addr, dt);
                printf("Created a string \"%s\" at %#x.\n", created, addr.getOffset());
                currentProgram.getListing().setComment(addr, EOL_COMMENT, "Script-added string literal");
            } catch (CodeUnitInsertionException e) {
                printf("Error while adding string at %#x, skipped.\n", addr.getOffset());
                errorAddrs.add(addr);
                e.printStackTrace();
            }
        }

        if (errorAddrs.isEmpty())
            printf("\n0 exception occurred0.\n");
        else {
            printf("\n%d exception occurred, you may need to handle it manually.\n", errorAddrs.size());
            println("At: ");
            for (Address errorAddr : errorAddrs) printf("%#x\n", errorAddr.getOffset());
        }
        println();
    }

    private void read_config() throws Exception {
        String filepath = homedir + "/.ghidra_scripts/ShortStringAdder.json";
        File dir = new File(homedir + "/.ghidra_scripts");
        File file = new File(filepath);

        // check if the dir exists, if not, create it
        if (!dir.exists()) {
            println("[-] ~/.ghidra_scripts not found, will create...");
            if (!dir.mkdirs()) {
                println("[!] Failed to create ~/.ghidra_scripts");
                return;
            }
        }

        // check if the file exists, if not, create it
        if (!file.exists()) {
            println("[-] ~/.ghidra_scripts/config.json not found, will create...");
            try (FileWriter writer = new FileWriter(file)) {
                // default configs, the regex is for UNIX color control (like \033[1;31m...\033[0m)
                writer.write("{\n" +
                        "  \"minimum_string_length\": 3,\n" +
                        "  \"maximum_search_range\": 255,\n" +
                        "  \"separators\": [0],\n" +
                        "  \"extra_regex\": [\"\\u001b\\\\[([0-9]+;)*([0-9]+)m([\\\\x20-\\\\x7e]|\\\\t|\\\\n|\\\\r|(\\u001b\\\\[([0-9]+;)*([0-9]+)m))*\"]\n" +
                        "}");
                println("[-] Default config created, you can change it in ~/.ghidra_scripts/ShortStringAdder.json.");
            } catch (IOException e) {
                println("[!] Failed to create ~/.ghidra_scripts/ShortStringAdder.json: " + e.getMessage());
                throw new Exception("Create config file failed");
            }
        }

        this.config = ConfigReader.from_json(filepath);
    }

    private String create_string(Address addr) {
        StringBuilder ret = new StringBuilder();

        if (currentProgram.getListing().getDataContaining(addr) != null) {
            if (currentProgram.getListing().getDataContaining(addr).isDefined())
                return null;
        }
        if (currentProgram.getListing().getInstructionContaining(addr) != null) {
            // TODO: get strings wrongly identified as codes
            return null;
        }

        try {
            if (currentProgram.getMemory().contains(addr.subtract(1))){
                byte previous_byte = currentProgram.getMemory().getByte(addr.subtract(1));
                if (!this.config.separators.contains(previous_byte))
                    return null;
            }
        } catch (MemoryAccessException e) {
            printf("Failed to read byte located in %#x\n", addr.subtract(1).getOffset());
            e.printStackTrace();
            return null;
        } catch (AddressOutOfBoundsException ignored) {}

        Address ptr = addr;
        int offset = 0;
        while (true) {
            try {
                ptr = addr.add(offset);
                byte b = currentProgram.getMemory().getByte(ptr);
                // all printable character (including \t, \n, \r), if conflicted with separators, the latter is dominant
                if (((b >= 0x20 && b <= 0x7e) || b == 9 || b == 10 || b == 0xd) &&
                        !this.config.separators.contains(b)) {
                    ret.append((char) (b & 0xff));
                    offset++;
                }
                else if (!this.config.separators.contains(b))   // It ends with neither a printable nor a separator
                    return null;
                else
                    break;
            } catch (MemoryAccessException e) {
                printf("Failed to read byte located in %#x\n", ptr.getOffset());
                e.printStackTrace();
                return null;
            }
        }

        if (offset >= config.minimum_string_length)
            return ret.toString();
        return null;
    }

    private TreeMap<String, TreeMap<Address, String>> match_regex() throws MemoryAccessException {
        TreeMap<String, TreeMap<Address, String>> resultMap = new TreeMap<>();
        for (String re: config.extra_regex) {
            TreeMap<Address, String> result = new TreeMap<>();
            println("Start matching regular expression: " + re);

            SearchInfo searchInfo = new SearchInfo(
                    new RegExSearchData(re), 10000, false, true, 1, true, null);
            RegExMemSearcherAlgorithm searcher = new RegExMemSearcherAlgorithm(
                    searchInfo, currentProgram.getMemory().getAllInitializedAddressSet(), currentProgram, true);
            ListAccumulator<MemSearchResult> searchResults = new ListAccumulator<>();
            searcher.search(searchResults, monitor);
            List<MemSearchResult> resultList = searchResults.asList();

            for (MemSearchResult r: resultList) {
                Address startAddr = r.getAddress();
                int matchLen = r.getLength();
                byte[] matchedBytes = new byte[matchLen];
                currentProgram.getMemory().getBytes(startAddr, matchedBytes);
                String matchedString = new String(matchedBytes);
                result.put(startAddr, matchedString);
            }

            resultMap.put(re, result);
        }
        return resultMap;
    }

    private TreeMap<Address, String> merge_regex_result(TreeMap<String, TreeMap<Address, String>> results) {
        // This method is used for selecting the primary matches while discarding overlapping matches
        // E.g. There are 3 matches at 0x0~0x10, 0x9~0x15, 0x12~0x20, then the second one will be discarded.
        TreeMap<Address, String> ret = new TreeMap<>();
        for (Map.Entry<String, TreeMap<Address, String>> entry: results.entrySet())
            ret.putAll(entry.getValue());

        // eliminate overlapping matches
        long largestRangeUpperLimit = -1;
        Iterator<Map.Entry<Address, String>> iter = ret.entrySet().iterator();
        while (iter.hasNext()) {
            Map.Entry<Address, String> entry = iter.next();
            if (entry.getKey().getOffset() + entry.getValue().length() > largestRangeUpperLimit)
                largestRangeUpperLimit = entry.getKey().getOffset() + entry.getValue().length();
            else {
                printf("Warning: %s (at %#x) removed due to overlapping memory address.\n",
                        entry.getValue(), entry.getKey().getOffset());
                iter.remove();
            }
        }

        return ret;
    }

    private void print_regex_match_results(TreeMap<String, TreeMap<Address, String>> result) {
        printf("\nMatch result:\n");
        for (Map.Entry<String, TreeMap<Address, String>> entry: result.entrySet()) {
            printf("%s:\n", entry.getKey());
            if (entry.getValue().entrySet().isEmpty()) {
                printf("\tNo matches.\n");
                continue;
            }
            for (Map.Entry<Address, String> m: entry.getValue().entrySet())
                printf("\t%#x: %s\n", m.getKey().getOffset(), m.getValue());
        }
    }

    private boolean has_adjacent_strings(Address addr) {
        Data rearData = currentProgram.getListing().getDataBefore(addr);
        Data frontData = currentProgram.getListing().getDataAfter(addr);
        return (rearData != null && addr.subtract(rearData.getAddress()) <= this.config.maximum_search_range) ||
                (frontData != null &&
                        frontData.getAddress().subtract(addr) <= this.config.maximum_search_range);
    }
}

CoLin's BLOG

Rust逆向学习 (9)

Reverse for Generics

Generics in Functions

Reverse for `largest<i32>`

Part 1

Part 2

Part 3

Generics in Structs

Reverse for Trait

Trait实现

默认Trait

Trait作为参数

总结

字符串恢复脚本

Reverse for Generics

Generics in Functions

Reverse for largest<i32>

Part 1

Part 2

Part 3

Generics in Structs

Reverse for Trait

Trait实现

默认Trait

Trait作为参数

总结

字符串恢复脚本

Reverse for `largest<i32>`