MIT Researchers Developed Codon: A Python-Based Compiler That Helps Create New Domain-Specific Languages (DSLs) Within Python

Originally posted on marktechpost.

Domain-specific languages, or DSLs, are a class of programming languages that provide a high level of abstraction and use concepts and rules suited to a specific set of problems. Examples include HTML, which describes web page layouts, and SQL, which is used to query databases. Compared to general-purpose languages like C, C++, and Java, DSLs are far narrower in scope, as they are often intended not for software developers but for non-traditional programmers who are fluent in the domain the DSL serves. DSLs are also typically created in close collaboration with the professionals in the field they target. However, experience has shown that implementing a DSL from scratch is a cumbersome task. To avoid this, researchers often build DSLs inside a general-purpose host language, producing so-called embedded DSLs. That raises a question: which host language should a DSL be embedded in to achieve the best performance?
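
To make the idea of an embedded DSL concrete, here is a minimal sketch of a tiny query-filter DSL embedded in Python through operator overloading. The host language's parser and operators do the work that a standalone DSL would need its own parser for. `Field`, `Cond`, and `where` are hypothetical names invented for this illustration; they are not part of Codon or any library.

```python
class Cond:
    """A boolean condition in the toy DSL; & and | combine conditions."""

    def __init__(self, sql: str) -> None:
        self.sql = sql

    def __and__(self, other: "Cond") -> "Cond":
        return Cond(f"({self.sql} AND {other.sql})")

    def __or__(self, other: "Cond") -> "Cond":
        return Cond(f"({self.sql} OR {other.sql})")


class Field:
    """A column name; Python's comparison operators turn it into a Cond."""

    def __init__(self, name: str) -> None:
        self.name = name

    # repr() is used for literals purely for illustration; it is NOT safe
    # SQL escaping and must not be used with untrusted input.
    def __lt__(self, value: object) -> Cond:
        return Cond(f"{self.name} < {value!r}")

    def __gt__(self, value: object) -> Cond:
        return Cond(f"{self.name} > {value!r}")

    def __eq__(self, value: object) -> Cond:  # type: ignore[override]
        return Cond(f"{self.name} = {value!r}")


def where(cond: Cond) -> str:
    """Render a condition tree as a SQL WHERE clause string."""
    return "WHERE " + cond.sql


age, city = Field("age"), Field("city")
# Parentheses are required because & binds more tightly than > and == in Python.
print(where((age > 30) & (city == "Boston")))
# WHERE (age > 30 AND city = 'Boston')
```

The sketch also shows the trade-off the article raises: an embedded DSL gets its syntax for free from the host language, but it also inherits the host's performance, which under CPython means interpreter overhead on every overloaded operator call.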

Historically, low-level languages such as C and C++ have made better-performing hosts than high-level languages like Python. But because of their simplicity and versatility, high-level languages are now more common in many domains. Python, the programming language of choice for approximately 15.7 million developers, is the best illustration of this: its straightforward syntax makes it one of the most accessible programming languages available today. To bring high-performance DSLs to the extensive Python community, a team of researchers from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) developed Codon, a compiler and DSL framework for Python that runs Python code far more efficiently while leaving ample room for customization and adaptation to various domains. Codon takes most of its syntax from Python 3, so programmers can build new DSLs in familiar Python while running them orders of magnitude faster and with less runtime overhead than standard Python, making Codon competitive with some of the fastest languages, such as C and C++. The work was presented at the ACM SIGPLAN 2023 International Conference on Compiler Construction and was supported by co-author Ibrahim Numanagić’s NSERC Discovery Grant, the Canada Research Chair program, the U.S. Defense Advanced Research Projects Agency, and the U.S. National Institutes of Health.

Codon was developed after careful consideration. The MIT researchers observed that the people who use DSLs most often come from non-technical backgrounds and usually do not want to learn a new language or tool. So, rather than asking them to, the team built a brand-new system from scratch that adopts Python's syntax and libraries. From the user’s perspective, nothing changes: they write Python code as before but get speedups of roughly 10 to 100 times in return. Because Python is an interpreted language, whose instructions are executed directly at runtime rather than compiled ahead of time, significant effort has gone into making it faster. However, those efforts have mostly been “top-down”: they add optimizations or “just-in-time” (JIT) compilation on top of an existing Python implementation. In contrast, the MIT researchers took a “bottom-up” approach, building new compiler infrastructure for Python itself, which gives them more flexibility and fewer of the limitations inherited from existing implementations.
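
Codon's central promise is that ordinary Python needs no changes. As an illustration, here is the kind of loop-heavy, pure-Python function that the CPython interpreter runs slowly: a simple Sieve of Eratosthenes that counts primes. It is a hypothetical example chosen for this article, not one of the researchers' benchmarks, and it runs unchanged under CPython. Codon's README shows programs like this being compiled and run with `codon run -release file.py`; this particular function has not been measured here, so no specific speedup is claimed for it.

```python
def count_primes(n: int) -> int:
    """Return how many primes are strictly less than n (Sieve of Eratosthenes)."""
    if n <= 2:
        return 0
    is_prime = [True] * n
    is_prime[0] = False
    is_prime[1] = False
    i = 2
    while i * i < n:
        if is_prime[i]:
            # Multiples of i below i*i were already crossed off by smaller factors.
            for j in range(i * i, n, i):
                is_prime[j] = False
        i += 1
    count = 0
    for flag in is_prime:
        if flag:
            count += 1
    return count


print(count_primes(1000000))  # 78498
```

Nothing in the function is Codon-specific; the point is that the same source file is the input to both the interpreter and the compiler.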

One of the most important early steps in a compiler is “type checking,” which ensures that each construct, such as a variable or function, has the appropriate data type for the context in which it is used. Because Python is interpreted and dynamically typed, these checks happen while the program runs, which is one reason the language is slow. Building on earlier research into ahead-of-time type checking, the researchers made several advances in this area for Codon. Codon uses bidirectional static type checking, which determines types at compile time instead of at runtime and thereby removes much of the overhead that Python’s dynamic typing imposes. The next major step is optimization. Codon introduces a new intermediate representation, the Codon Intermediate Representation (CIR), that makes it simple to add domain-specific optimizations and analyses. The executables the compiler produces run at speeds comparable to C or C++, and can be even faster once domain-specific optimizations are applied.
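
Static type checking is easier to see than to describe. In CPython, `1 + "a"` raises a `TypeError` only when that line actually executes. The sketch below illustrates the general textbook technique of bidirectional type checking on a toy expression language: `synth` infers a type bottom-up from an expression, while `check` pushes an expected type down into it, which is how an unannotated lambda parameter gets its type. It is a hypothetical illustration, not Codon's type checker, which must handle far more (classes, generics, and most of Python's syntax); the tuple encoding and all function names are assumptions made for this example.

```python
from typing import Dict, Tuple, Union

# Toy types: "int", "str", or ("fn", param_type, return_type).
Ty = Union[str, Tuple[str, "Ty", "Ty"]]
# Toy expressions are tagged tuples, e.g. ("add", left, right).
Expr = tuple
Env = Dict[str, Ty]


class TypeCheckError(Exception):
    """Raised when an expression is rejected before it is ever run."""


def synth(env: Env, e: Expr) -> Ty:
    """Synthesis mode: work out the type of e from e itself (bottom-up)."""
    tag = e[0]
    if tag in ("int", "str"):
        return tag
    if tag == "var":
        if e[1] not in env:
            raise TypeCheckError(f"unbound variable {e[1]!r}")
        return env[e[1]]
    if tag == "add":
        left = synth(env, e[1])
        if left not in ("int", "str"):
            raise TypeCheckError(f"cannot add values of type {left}")
        check(env, e[2], left)  # the right operand must match the left
        return left
    if tag == "ann":
        # An explicit annotation switches from synthesis to checking mode.
        check(env, e[1], e[2])
        return e[2]
    if tag == "app":
        fn_type = synth(env, e[1])
        if not (isinstance(fn_type, tuple) and fn_type[0] == "fn"):
            raise TypeCheckError(f"cannot call a value of type {fn_type}")
        check(env, e[2], fn_type[1])
        return fn_type[2]
    if tag == "lam":
        raise TypeCheckError("cannot infer a lambda's type without an annotation")
    raise TypeCheckError(f"unknown expression {tag!r}")


def check(env: Env, e: Expr, expected: Ty) -> None:
    """Checking mode: verify e against a type supplied by its context (top-down)."""
    if e[0] == "lam":
        # A lambda cannot be synthesized, but it can be checked: the expected
        # function type supplies the parameter's type.
        if not (isinstance(expected, tuple) and expected[0] == "fn"):
            raise TypeCheckError(f"lambda checked against non-function type {expected}")
        _, param, body = e
        check({**env, param: expected[1]}, body, expected[2])
        return
    # Any other expression: synthesize its type and compare.
    actual = synth(env, e)
    if actual != expected:
        raise TypeCheckError(f"expected {expected}, got {actual}")


# inc = lambda x: x + 1, written without any annotation on x.
inc = ("lam", "x", ("add", ("var", "x"), ("int", 1)))
# Annotating the lambda once lets check() push "int" down into x.
prog = ("app", ("ann", inc, ("fn", "int", "int")), ("int", 41))
print(synth({}, prog))  # int

# The equivalent of 1 + "a" is rejected without running anything.
try:
    synth({}, ("add", ("int", 1), ("str", "a")))
except TypeCheckError as err:
    print("rejected:", err)  # rejected: expected int, got str
```

The design point is that type errors surface before any code runs, and because every expression's type is known ahead of time, a compiler can emit specialized machine code instead of dispatching on types at runtime.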

As part of their study, the researchers evaluated several compiler extensions and DSLs for Codon targeting domains ranging from bioinformatics and quantitative finance to secure multi-party computation and parallel programming. In one genomics-focused test, the researchers used Codon to compile roughly ten widely used Python-based genomics applications. Codon often approached the performance usually seen only with low-level languages, with speedups of 5 to 10 times over the original hand-optimized implementations. Previously, users who hit performance problems in their Python-based applications had two choices: rewrite the software entirely in a language like C, or rewrite it around a library implemented in C. Now they can compile their Python applications with Codon and obtain performance comparable to what a C rewrite would deliver.

Codon is already used in a number of fields, including deep learning, bioinformatics, and quantitative finance. Exaloop, Inc., a startup that aims to popularize Codon, currently maintains the compiler. Although Codon delivers impressive speedups, it still has shortcomings: the domain-extensible compiler does not yet support every Python feature, including runtime polymorphism and runtime reflection. The Codon team is also working to expand the compiler’s coverage of Python libraries so that it behaves as much like Python as possible. Codon’s source code is publicly available on GitHub.
