Asked  7 Months ago    Answers:  5   Viewed   38 times

I am setting out to do a side project that has the goal of translating code from one programming language to another. The languages I am starting with are PHP and Python (Python to PHP should be easier to start with), but ideally I would be able to add other languages with (relative) ease. The plan is:

  • This is geared towards web development. The original and target code will be be sitting on top of frameworks (which I will also have to write). These frameworks will embrace an MVC design pattern and follow strict coding conventions. This should make translation somewhat easier.

  • I am also looking at IOC and dependency injection, as they might make the translation process easier and less error prone.

  • I'll make use of Python's parser module, which lets me fiddle with the Abstract Syntax Tree. Apparently the closest I can get with PHP is token_get_all(), which is a start.

  • From then on I can build the AST, symbol tables and control flow.

Then I believe I can start outputting code. I don't need a perfect translation. I'll still have to review the generated code and fix problems. Ideally the translator should flag problematic translations.

Before you ask "What the hell is the point of this?" The answer is... It'll be an interesting learning experience. If you have any insights on how to make this less daunting, please let me know.


I am more interested in knowing what kinds of patterns I could enforce on the code to make it easier to translate (ie: IoC, SOA ?) the code than how to do the translation.



I've been building tools (DMS Software Reengineering Toolkit) to do general purpose program manipulation (with language translation being a special case) since 1995, supported by a strong team of computer scientists. DMS provides generic parsing, AST building, symbol tables, control and data flow analysis, application of translation rules, regeneration of source text with comments, etc., all parameterized by explicit definitions of computer languages.

The amount of machinery you need to do this well is vast (especially if you want to be able to do this for multiple languages in a general way), and then you need reliable parsers for languages with unreliable definitions (PHP is perfect example of this).

There's nothing wrong with you thinking about building a language-to-language translator or attempting it, but I think you'll find this a much bigger task for real languages than you expect. We have some 100 man-years invested in just DMS, and another 6-12 months in each "reliable" language definition (including the one we painfully built for PHP), much more for nasty languages such as C++. It will be a "hell of a learning experience"; it has been for us. (You might find the technical Papers section at the above website interesting to jump start that learning).

People often attempt to build some kind of generalized machinery by starting with some piece of technology with which they are familiar, that does a part of the job. (Python ASTs are great example). The good news, is that part of the job is done. The bad news is that machinery has a zillion assumptions built into it, most of which you won't discover until you try to wrestle it into doing something else. At that point you find out the machinery is wired to do what it originally does, and will really, really resist your attempt to make it do something else. (I suspect trying to get the Python AST to model PHP is going to be a lot of fun).

The reason I started to build DMS originally was to build foundations that had very few such assumptions built in. It has some that give us headaches. So far, no black holes. (The hardest part of my job over the last 15 years is to try to prevent such assumptions from creeping in).

Lots of folks also make the mistake of assuming that if they can parse (and perhaps get an AST), they are well on the way to doing something complicated. One of the hard lessons is that you need symbol tables and flow analysis to do good program analysis or transformation. ASTs are necessary but not sufficient. This is the reason that Aho&Ullman's compiler book doesn't stop at chapter 2. (The OP has this right in that he is planning to build additional machinery beyond the AST). For more on this topic, see Life After Parsing.

The remark about "I don't need a perfect translation" is troublesome. What weak translators do is convert the "easy" 80% of the code, leaving the hard 20% to do by hand. If the application you intend to convert are pretty small, and you only intend to convert it once well, then that 20% is OK. If you want to convert many applications (or even the same one with minor changes over time), this is not nice. If you attempt to convert 100K SLOC then 20% is 20,000 original lines of code that are hard to translate, understand and modify in the context of another 80,000 lines of translated program you already don't understand. That takes a huge amount of effort. At the million line level, this is simply impossible in practice. (Amazingly there are people that distrust automated tools and insist on translating million line systems by hand; that's even harder and they normally find out painfully with long time delays, high costs and often outright failure.)

What you have to shoot for to translate large-scale systems is high nineties percentage conversion rates, or it is likely that you can't complete the manual part of the translation activity.

Another key consideration is size of code to be translated. It takes a lot of energy to build a working, robust translator, even with good tools. While it seems sexy and cool to build a translator instead of simply doing a manual conversion, for small code bases (e.g., up to about 100K SLOC in our experience) the economics simply don't justify it. Nobody likes this answer, but if you really have to translate just 10K SLOC of code, you are probably better off just biting the bullet and doing it. And yes, that's painful.

I consider our tools to be extremely good (but then, I'm pretty biased). And it is still very hard to build a good translator; it takes us about 1.5-2 man-years and we know how to use our tools. The difference is that with this much machinery, we succeed considerably more often than we fail.

Tuesday, June 1, 2021
answered 7 Months ago

The SGD commands are well-formed. The intermittent behavior that you describe makes me think the problem originates in how consistently the printer actually receives the command, and not the syntax of the command itself. And generally speaking, no, you do not have to change other commands in order to make this command work. Since you can successfully send the command via Zebra Setup Utilities and get the desired behavior, then you should be able to do the same through your code.

  1. Are you programmatically sending the commands over USB or over something else (Bluetooth, TCP, etc.)? You mentioned USB while using Zebra Setup Utilities, but what about in your code?

  2. Can you provide the code underneath the hood of PrintUtils.SendCommandToPrinter()? I am not familiar with this API. Which API is it?

  3. At the lowest levels of a connection you will often be calling 'write()' or 'writeData()' or something like that. Sometimes 'write' commands return the number of bytes written. If you can dig into your code a bit, perhaps there is a 'write' command that returns that value and you can verify yourself that the return value equals the length of the intended message (including new line characters).

  4. Depending on the lower level API, there also may be a flush() command lying around that forcibly pushes all data in a stream to the other end. Again, this depends on what API you're using underneat the hood of 'PrintUtils'.

In the past I have seen inconsistent behavior with USB communication. You should make sure that your firmware is as up-to-date as possible. Your QLn220 is currently on v68.18.0Z: You can check your current version by sending:

! U1 getvar ""

At the end of the day, you could always immediately query the printer for its gap/bar mode after setting it. This will cause as an additional delay in your program execution, but it is a good way of making sure that whatever you sent has actually taken effect.

Monday, August 30, 2021
answered 4 Months ago

Maybe you find some inspiration in this recipe:


A function that outputs a human-readable version of a Python AST.

Or use compiler combined with inspect (which, of course, still uses the source):

>>> import compiler, inspect
>>> import re # for testing 
>>> compiler.parse(inspect.getsource(re))
Module('Support for regular expressions (RE). nnThis module provides ...
Friday, October 8, 2021
answered 2 Months ago

The standard answer to the question of how to build parsers (that build ASTs), is to read the standard texts on compiling. Aho and Ullman's "Dragon" Compiler book is pretty classic. If you haven't got the patience to get the best reference materials, you're going to have more trouble, because they provide theory and investigate subtleties. But here is my answer for people in a hurry, building recursive descent parsers.

One can build parsers with built-in error recovery. There are many papers on this sort of thing, a hot topic in the 1980s. Check out Google Scholar, hunt for "syntax error repair". The basic idea is that the parser, on encountering a parsing error, skips to some well-known beacon (";" a statement delimiter is pretty popular for C-like languages, which is why you got asked in a comment if your language has statement terminators), or proposes various input stream deletions or insertions to climb over the point of the syntax error. The sheer variety of such schemes is surprising. The key idea is generally to take into account as much information around the point of error as possible. One of the most intriguing ideas I ever saw had two parsers, one running N tokens ahead of the other, looking for syntax-error land-mines, and the second parser being feed error repairs based on the N tokens available before it encounters the syntax error. This lets the second parser choose to act differently before arriving at the syntax error. If you don't have this, most parser throw away left context and thus lose the ability to repair. (I never implemented such a scheme.)

The choice of things to insert can often be derived from information used to build the parser (often First and Follow sets) in the first place. This is relatively easy to do with L(AL)R parsers, because the parse tables contain the necessary information and are available to the parser at the point where it encounters an error. If you want to understand how to do this, you need to understand the theory (oops, there's that compiler book again) of how the parsers are constructed. (I have implemented this scheme successfully several times).

Regardless of what you do, syntax error repair doesn't help much, because it is almost impossible to guess what the writer of the parsed document actually intended. This suggests fancy schemes won't be really helpful. I stick to simple ones; people are happy to get an error report and some semi-graceful continuation of parsing.

A real problem with rolling your own parser for a real language, is that real languages are nasty messy things; people building real implementations get it wrong and frozen in stone because of existing code bases, or insist on bending/improving the language (standards are for wimps, goodies are for marketing) because its cool. Expect to spend a lot of time re-calibrating what you think the grammar is, against the ground truth of real code. As a general rule, if you want a working parser, better to get one that has a track record rather than roll it yourself.

A lesson most people that are hell-bent to build a parser don't get, is that if they want to do anything useful with the parse result or tree, they'll need a lot more basic machinery than just the parser. Check my bio for "Life After Parsing".

Saturday, November 6, 2021
Elio Campitelli
answered 1 Month ago

The mixture of formal language from math with more colloquial language from programming makes these conversations difficult. You're dealing with two contextually-loaded words here: "composable" and "function".

Function composition — in math

The mathematical notion of a "function" A → B is a mapping from some set A to some set B, and "function composition" is a specific operation denoted by . For some f: A → B and g: B → C, g∘f is a function A → C such that (g∘f)(x) = g(f(x)) for all x in A. This composition is defined for any two functions if their domain/codomain match up in this way (in other words, such a pair of functions "can be composed"), and we describe this by stating that "functions are composable".

Composability — in programming

As a qualitative term, we use "composability" often in software to describe the ability of a set of compositions can build large things from combining small ones. In this sense, programmers describe functions (as a whole) as "very composable", because functions can (and, in a purely functional language like Haskell, do) comprise the large and the small of an entire program.

In software we also see a more human-oriented usage of the term "composable" which tends to be associated with "modularity". When components are stateless, concerns are separated, and APIs have low surface area, it's easier to compose programs without making mistakes. We praise the components of such a design as being "composable"—not just because they can be combined, but because they're easy to combine correctly.

Function — in programming

I'm going to use the slightly outdated term "subroutine", because I don't know a good way to discuss this in the parlance of our times. If a subroutine doesn't do any IO (and always halts, and doesn't throw…), then it implements (or "is") a "function" in the mathematical sense. IO subroutines have superficial resemblance to functions, because they may have input and output values, but the similarity stops there. None of the conversations we may have about "function composition" as we first discussed it will apply.

Here's where we hit the trickiest linguistic difficulty, because the word "function" has come into common usage to refer to any subroutine, even ones that perform IO. FP enthusiasts tend to fight this—people say things like "if it does IO, it isn't a function"—but that battle of popularity has been lost and there's no turning back now. Within most programming contexts, all subroutines are called "functions", and the only way to distinguish functions that satisfy the mathematical definition is to call them "pure functions".

With these definitions established, let's address your questions:

"A composable function must have both arguments and return value?"

There are a couple boring things to point out about this question. First, every function in Scala technically has a return type. If that type is Unit, it may be elided for brevity, but it's still a return type.

And a nullary (0-arg) function can be trivially transformed into an equivalent function with an argument. So really, it just doesn't matter. If you're in a situation where you need to compose functions with arguments and f has no argument, you can just write _ => f.

"Can this function have side-effect?"

Merely a semantic squabble. In the context of Scala, the most appropriate thing to say is that it is a Function (or perhaps technically a "method", depending on where it is defined), but due to the side effect, it is not a pure function.

"Do we still consider it as 'composable'?"

Sort of. All of these things still "come together" in a fairly general way, so yes, they do compose in the software sense. Although pure functions compose better than impure ones. And the mathematical notion of function composition does not apply to subroutines that are not pure functions.

Finally, if you want to know whether they literally compose in Scala with the compose method on Function1, you don't need Stack Overflow; just ask the compiler.

Friday, November 12, 2021
answered 4 Weeks ago
Only authorized users can answer the question. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :