(L2.3) Overview of research problems
During this project, we investigated three main questions about using Rust instead of C/C++ to achieve memory safety:
- How do we make sure our Rust code is memory-safe?
- How do we integrate Rust into existing projects?
- How do we migrate from memory unsafe languages to Rust?
Each of these questions led us to one major open research problem. For the first question, the main academic interest lies in which unsafe code is used most and how dangerous it is.
For the second question, cross-language attacks come into play (see report L2.2). Rust and C/C++ can interact via foreign function interfaces, as seen in the 12 December 2023 workshop. However, the advent of cross-language attacks raises an important question: can Rust and C/C++ be combined without compromising security?
Lastly, given the guarantees that Rust offers, there is great interest in automatically migrating from C(/C++) to idiomatic Rust. Achieving this, however, is still an unsolved problem.
With the White House’s statement encouraging the adoption of memory-safe languages like Rust, we anticipate a continued rise in academic interest across these domains.
1. Which Unsafe Code Is Most Used, and How Dangerous Is It?
Cui et al. (2024), Astrauskas et al. (2020), and Evans et al. (2020) have analyzed the prevalence of unsafe Rust. Evans et al. (2020) found that 29% of the crates on crates.io use unsafe Rust code directly. Astrauskas et al. (2020), evaluating crates.io 18 months later, found that 23.6% of the crates use unsafe Rust. The downward trend continues with Cui et al. (2024), who found that by 2024 only 20.8% of the crates on crates.io use unsafe Rust.
Astrauskas et al. (2020) not only quantified the use of unsafe Rust but also categorized the most common reasons for its use. The top three, in ascending order of frequency, are the use of mutable static variables, dereferences of raw pointers, and calls to unsafe functions.
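To make these three categories concrete, the following minimal sketch shows one instance of each. All names (CALL_COUNT, bump_call_count, read_through_raw, first_byte) are hypothetical and chosen for illustration; the code is not taken from Astrauskas et al. (2020).

```rust
// 1. Mutable static variable: every access needs `unsafe`, because the
//    compiler cannot rule out data races on it.
static mut CALL_COUNT: u32 = 0;

fn bump_call_count() -> u32 {
    // Sound only as long as this is never called from two threads at once.
    unsafe {
        CALL_COUNT += 1;
        CALL_COUNT
    }
}

// 2. Dereference of a raw pointer: creating the pointer is safe,
//    reading through it is not.
fn read_through_raw(value: &i32) -> i32 {
    let ptr: *const i32 = value;
    // Sound here because `ptr` was just derived from a live reference.
    unsafe { *ptr }
}

// 3. Call to an unsafe function: the caller takes over the proof
//    obligation stated in the callee's documentation.
fn first_byte(bytes: &[u8]) -> Option<u8> {
    if bytes.is_empty() {
        return None;
    }
    // Sound because the emptiness check guarantees that index 0 exists.
    Some(unsafe { *bytes.get_unchecked(0) })
}

fn main() {
    let before = bump_call_count();
    assert_eq!(bump_call_count(), before + 1);
    assert_eq!(read_through_raw(&7), 7);
    assert_eq!(first_byte(b"rust"), Some(b'r'));
    assert_eq!(first_byte(b""), None);
}
```

In each case the compiler accepts the unsafe block without checking the comment that justifies it. That justification is the part that has to be reviewed by hand.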
The natural next step is to determine whether using unsafe Rust leads to memory errors and, if so, which kind. Cui et al. (2024) analyze which unsafe standard library APIs are most often misused in a way that results in a CVE. The most commonly misused standard library APIs for the evaluated CVEs were, in ascending order, APIs that allow indexing without bounds checks, creating uninitialized values, and bypassing thread safety. Specifically, misuse of the unsafe Send and Sync traits was the most common cause of vulnerabilities. Cui et al. (2024) created a standardized collection of safety requirements that can be used in API documentation to avoid further misuse of unsafe APIs. These safety requirements state which preconditions or postconditions the surrounding code must uphold to ensure that the call to the unsafe API does not cause undefined behavior.
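As an illustration of what such a documented safety requirement can look like in practice, the sketch below uses the conventional "# Safety" documentation section for an unchecked index and states the requirement of assume_init at the call site. The function names and the wording of the requirements are our own assumptions, not the catalogue from Cui et al. (2024).

```rust
use std::mem::MaybeUninit;

/// Returns the element at `index` without a bounds check.
///
/// # Safety
///
/// Precondition: `index < values.len()`. Violating it is undefined behavior.
unsafe fn element_unchecked(values: &[u32], index: usize) -> u32 {
    // The caller has promised that `index` is in bounds.
    unsafe { *values.get_unchecked(index) }
}

/// Safe wrapper: establishes the precondition before delegating.
fn element_checked(values: &[u32], index: usize) -> Option<u32> {
    if index < values.len() {
        // Precondition established by the comparison above.
        Some(unsafe { element_unchecked(values, index) })
    } else {
        None
    }
}

/// Builds an array through uninitialized storage.
fn squares() -> [u32; 4] {
    let mut slots: [MaybeUninit<u32>; 4] = [MaybeUninit::uninit(); 4];
    for (i, slot) in slots.iter_mut().enumerate() {
        slot.write((i * i) as u32);
    }
    // Requirement of `assume_init`: the value must be initialized.
    // The loop above wrote every slot.
    slots.map(|slot| unsafe { slot.assume_init() })
}

fn main() {
    let values = [10, 20, 30];
    assert_eq!(element_checked(&values, 1), Some(20));
    assert_eq!(element_checked(&values, 3), None);
    assert_eq!(squares(), [0, 1, 4, 9]);
}
```

The safe wrappers confine the proof obligation to one place, so callers of element_checked and squares cannot violate the requirement.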
There is still room for further exploration of which unsafe Rust leads to the most CVEs, outside of unsafe APIs, and how the use of unsafe Rust evolves over time.
[1] Cui, M., Sun, S., Xu, H., & Zhou, Y. (2024). Is unsafe an Achilles’ Heel? A Comprehensive Study of Safety Requirements in Unsafe Rust Programming. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering. https://doi.org/10.1145/3597503.3639136
[2] Astrauskas, V., Matheja, C., Poli, F., Müller, P., & Summers, A. J. (2020). How do programmers use unsafe Rust? Proc. ACM Program. Lang., 4(OOPSLA), Article 136. https://doi.org/10.1145/3428204
[3] Evans, A. N., Campbell, B., & Soffa, M. L. (2020). Is rust used safely by software developers? Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, 246–257. https://doi.org/10.1145/3377811.3380413
2. Can Rust and C/C++ Be Combined Without Compromising Security?
With the advent of cross-language attacks (see L2.2), new research has emerged to thwart these threats. Li et al. (2022) present a static analysis tool that aims to detect potential memory management bugs across FFI boundaries. In their tests, the tool had a false positive rate of about 85%, i.e. most of the memory management bugs it reported were not there. However, it generated only 222 warnings for 49.5 million lines of code, so the absolute number of reports to review stays small. This tool could therefore be very useful for preventing cross-language attacks.
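The kind of cross-boundary memory management rule at stake can be sketched as follows: memory allocated by Rust and handed to foreign code must come back to Rust to be released. The functions greeting_new and greeting_free are hypothetical, and the example is not taken from Li et al. (2022). To stay self-contained, the C side is simulated in main, and the attributes that would export the symbols to a C linker are omitted.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hands a Rust-allocated string to foreign code (hypothetical API).
/// Ownership moves to the caller, who must return it via `greeting_free`.
pub extern "C" fn greeting_new() -> *mut c_char {
    CString::new("hello from Rust")
        .expect("literal contains no interior NUL byte")
        .into_raw()
}

/// Takes ownership back so that Rust's allocator releases the memory.
///
/// # Safety
///
/// `ptr` must be null or a pointer obtained from `greeting_new` that has
/// not been freed yet.
pub unsafe extern "C" fn greeting_free(ptr: *mut c_char) {
    if !ptr.is_null() {
        // Rebuild the CString and drop it. Calling C's `free` on this
        // pointer instead would mix two allocators.
        drop(unsafe { CString::from_raw(ptr) });
    }
}

fn main() {
    let ptr = greeting_new();
    // Stand-in for the C side reading the string.
    let text = unsafe { CStr::from_ptr(ptr) }.to_str().unwrap().to_owned();
    assert_eq!(text, "hello from Rust");
    // Forgetting this call leaks the string; calling it twice is a double free.
    unsafe { greeting_free(ptr) };
}
```

None of the three possible mistakes (wrong deallocator, leak, double free) is visible to the Rust compiler once the pointer has crossed the boundary, which is why dedicated analysis is needed.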
An alternative approach is not to correct the errors that make cross-language attacks possible but to limit the damage such an attack can cause. Liu et al. (2020), Bang et al. (2023), Kirth et al. (2022), Rivera et al. (2021), and Almohri and Evans (2018) propose designs in line with this strategy. These designs share a common principle: data that is only ever touched by safe Rust code during normal execution is protected from unsafe code. This is done by withdrawing write permissions from the unsafe code, so that the safe subset of the data remains uncorrupted. For this design to be effective, the separation of data that unsafe code can legally access from data it cannot must be correct rather than precise: if unsafe code is denied access to data it legitimately requires, the program logic may change.
[4] Li, Z., Wang, J., Sun, M., & Lui, J. C. S. (2022). Detecting Cross-language Memory Management Issues in Rust. Computer Security – ESORICS 2022: 27th European Symposium on Research in Computer Security, Proceedings, Part III, 680–700. https://doi.org/10.1007/978-3-031-17143-7_33
[5] Liu, P., Zhao, G., & Huang, J. (2020). Securing unsafe rust programs with XRust. Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, 234–245. https://doi.org/10.1145/3377811.3380325
[6] Bang, I., Kayondo, M., Moon, H., & Paek, Y. (2023). TRust: A Compilation Framework for In-process Isolation to Protect Safe Rust against Untrusted Code. 32nd USENIX Security Symposium (USENIX Security 23), 6947–6964. https://www.usenix.org/conference/usenixsecurity23/presentation/bang
[7] Kirth, P., Dickerson, M., Crane, S., Larsen, P., Dabrowski, A., Gens, D., Na, Y., Volckaert, S., & Franz, M. (2022). PKRU-safe: automatically locking down the heap between safe and unsafe languages. Proceedings of the Seventeenth European Conference on Computer Systems, 132–148. https://doi.org/10.1145/3492321.3519582
[8] Rivera, E., Mergendahl, S., Shrobe, H., Okhravi, H., & Burow, N. (2021). Keeping Safe Rust Safe with Galeed. Proceedings of the 37th Annual Computer Security Applications Conference, 824–836. https://doi.org/10.1145/3485832.3485903
[9] Almohri, H. M. J., & Evans, D. (2018). Fidelius Charm: Isolating Unsafe Rust Code. Proceedings of the Eighth ACM Conference on Data and Application Security and Privacy, 248–255. https://doi.org/10.1145/3176258.3176330
3. How Do We Go from C Code to Idiomatic Rust?
The workshop of 12 December 2023 provides a workflow for manually translating C to Rust code with the help of the tool C2Rust. C2Rust is, after all, the only production-ready tool for automatic translation from C to Rust. However, this tool only performs a syntactic translation, not a semantic one. For example, C pointers are translated to raw pointers in Rust instead of references. Academic interest has now moved to developing a tool that automatically translates C code into idiomatic Rust code. However, translating C to idiomatic safe Rust automatically requires proving that the original C code was memory-safe, which academia has struggled to achieve for decades.
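The gap between a syntactic and an idiomatic translation can be seen in a small hand-written example. The function sum_raw below only approximates the shape that a syntactic translation of a C function such as int sum(const int *xs, size_t n) takes; it is not actual C2Rust output, and both function names are hypothetical.

```rust
/// Syntactic-style translation: raw pointer plus separate length,
/// exactly as in the C signature.
///
/// # Safety
///
/// `xs` must point to `n` readable, initialized `i32` values.
unsafe fn sum_raw(xs: *const i32, n: usize) -> i32 {
    let mut total = 0;
    let mut i = 0;
    while i < n {
        // Nothing checks that `xs + i` is still inside the allocation.
        total += unsafe { *xs.add(i) };
        i += 1;
    }
    total
}

/// Idiomatic counterpart: the slice carries its own length, and the
/// borrow checker guarantees that the data outlives the call.
fn sum_slice(xs: &[i32]) -> i32 {
    xs.iter().sum()
}

fn main() {
    let data = [1, 2, 3, 4];
    let raw = unsafe { sum_raw(data.as_ptr(), data.len()) };
    assert_eq!(raw, 10);
    assert_eq!(sum_slice(&data), 10);
}
```

Going from the first version to the second requires knowing that xs and n always describe one valid array, which is precisely the memory-safety fact about the original C code that a tool would have to establish.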
Since 2021 [10], academia has made advances in state-of-the-art approaches that surpass C2Rust's capabilities. The new proposals fall into two categories: those based on static analysis and those based on large language models. Both, however, have their limitations.
Static analysis tools generally handle specific changes, such as translating lock mechanisms from the C API to the Rust API [12] or converting a subset of pointers to references [10] [11] [13]. Emre et al. (2023) even propose enabling a broader scope of transformations by changing the Rust compiler so that it recognizes more code patterns as safe.
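To give an idea of the target of a lock API translation, the sketch below shows the idiomatic Rust form of a counter protected by a lock. In C, a pthread_mutex_t and the integer it protects would be two separate variables related only by convention; in Rust, the mutex owns the data. This is a hand-written illustration with a hypothetical function name, not output of the tool described in [12].

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Increments a shared counter from several threads.
fn count_in_parallel(threads: usize, increments: usize) -> usize {
    // The counter can only be reached through the mutex that guards it.
    let counter = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments {
                    // `lock` returns a guard; the lock is released when the
                    // guard is dropped at the end of the statement.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    assert_eq!(count_in_parallel(4, 1000), 4000);
}
```

A translator therefore has to infer which data each C lock protects before it can emit code of this shape, which is why such tools restrict themselves to one specific kind of change.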
In contrast, techniques based on large language models can address a much wider variety of unsafe code but suffer from correctness problems [14] [15]. For instance, according to Takashima (2024) and Eniser et al. (2024), the translated programs fail approximately 50% of the tests in their test suites. To address this, Takashima (2024) introduced Vert to verify whether programs automatically translated from C to Rust retain their original semantics.
The new Translating All C to Rust (TRACTOR) program from DARPA, which aims to automate the translation from legacy C code to Rust, can be expected to accelerate further development in this domain.
[10] Emre, M., Schroeder, R., Dewey, K., & Hardekopf, B. (2021). Translating C to safer Rust. Proc. ACM Program. Lang., 5(OOPSLA). https://doi.org/10.1145/3485498
[11] Emre, M., Boyland, P., Parekh, A., Schroeder, R., Dewey, K., & Hardekopf, B. (2023). Aliasing Limits on Translating C to Safe Rust. Proc. ACM Program. Lang., 7(OOPSLA1). https://doi.org/10.1145/3586046
[12] Hong, J., & Ryu, S. (2023). Concrat: An Automatic C-to-Rust Lock API Translator for Concurrent Programs. Proceedings of the 45th International Conference on Software Engineering, 716–728. https://doi.org/10.1109/ICSE48619.2023.00069
[13] Zhang, H., David, C., Yu, Y., & Wang, M. (2023). Ownership Guided C to Rust Translation. In C. Enea & A. Lal (Eds.), Computer Aided Verification (pp. 459–482). Springer Nature Switzerland.
[14] Takashima, Y. (2024). Testing and Verifying Rust's Next Mile. Thesis, Carnegie Mellon University. https://doi.org/10.1184/R1/25451383.v1
[15] Eniser, H. F., Zhang, H., David, C., Wang, M., Christakis, M., Paulsen, B., Dodds, J., & Kroening, D. (2024). Towards Translating Real-World Code with LLMs: A Study of Translating to Rust. https://arxiv.org/abs/2405.11514