Tuesday, January 13, 2009

Smoking PowerShell Pipes

Everybody likes pipes. Ever since 1973, when pipes landed in UNIX at Douglas McIlroy's urging, they have been an irreplaceable tool for hooking programs together. Pipes just work. Well... at least in UNIX. But you know, this is a blog about me being stuck in Windows.

Real Pipes in UNIX

The main idea of a pipe is that you take the output of one program and connect it to the input of another program. The pipe itself does nothing more than forward data. When the output of one program does not fit the input of another, you can first pipe it through a filter program that transforms the data as needed. But the pipes themselves remain only a transport layer, carefully carrying data from one program to another and not changing a bit on the way.

The same holds true for real-world pipes. A good pipe is one that doesn't change the aroma on its way from bowl to your mouth. It's the tobacco you want to smoke, not the pipe.
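Back to the software kind for a moment. The "transport only" behaviour can be checked with a small Python sketch that pushes bytes through an operating-system pipe and compares what comes out. The helper name through_pipe and the sample payload are made up for this illustration, and the sketch assumes the payload is small enough to fit into the pipe buffer.

```python
import os


def through_pipe(data: bytes) -> bytes:
    """Push data through an OS-level pipe and return what comes out.

    Hypothetical helper for illustration. It writes everything before
    reading, so it only works for payloads smaller than the pipe buffer.
    """
    read_fd, write_fd = os.pipe()
    try:
        os.write(write_fd, data)
    finally:
        os.close(write_fd)  # closing the write end signals end-of-file

    chunks = []
    try:
        while True:
            chunk = os.read(read_fd, 4096)
            if not chunk:  # empty read means the writer is done
                break
            chunks.append(chunk)
    finally:
        os.close(read_fd)
    return b"".join(chunks)


# Mixed line endings plus a couple of non-text bytes.
payload = b"Hello,\nmy name is Rene.\r\n\x00\xff\n"
print(through_pipe(payload) == payload)
```

The pipe has no idea whether the bytes are text, line endings or binary junk, and that is exactly the point.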

I had a PHP script that generated test data for a MySQL database. In UNIX I would have used it as follows:

php create-test-data.php | mysql dbname

I thought that this should also work with PowerShell.

Water Pipes in PowerShell

PowerShell pipes work more like water pipes. The smoke that comes in is sucked through water, changing its aroma and softening the bitter taste. Water pipes are great, but you shouldn't try to sell them labeled as normal pipes.

When I ran the above command in PowerShell, the data generated by the script wasn't exactly what the MySQL database received. If it received anything at all, because the thing crashed along the way.

Actually I didn't even have to pipe it to another program; just redirecting the output to a file changed it considerably:

php create-test-data.php > test-data.sql

Let Me Encode This for You

The first problem was encoding. The output of the script was in UTF-8. PowerShell wanted to convert the text into its internal UTF-16 representation and then convert it to yet another encoding when saving to a file, because for PowerShell the > operator is equivalent to piping your output to the Out-File cmdlet:

php create-test-data.php | Out-File test-data.sql

Luckily Out-File takes an -Encoding parameter, which can have the following values: unicode, utf7, utf8, utf32, ascii, bigendianunicode, default and oem. I tried all of them, and the only one that preserved my encoding was oem, which designates the console's single-byte OEM code page.
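Why would oem, of all things, leave UTF-8 intact? A plausible explanation, sketched here in Python rather than PowerShell, is that the bytes are decoded with one single-byte code page and encoded again with the very same one, so the round trip cancels out. This is a simulation under assumptions: cp437 stands in for whatever OEM code page a given console uses, and the helper name save_like_out_file is made up for this illustration.

```python
def save_like_out_file(raw: bytes, console_codec: str, file_codec: str) -> bytes:
    """Simulate decode-then-re-encode (hypothetical helper, not a PowerShell API).

    raw           -- bytes written by the external program
    console_codec -- code page assumed for decoding the program's output
    file_codec    -- encoding chosen for the file on disk
    """
    text = raw.decode(console_codec)  # bytes -> internal text
    return text.encode(file_codec)    # internal text -> bytes on disk


# UTF-8 output containing non-ASCII characters ("Käse für René").
script_output = "K\u00e4se f\u00fcr Ren\u00e9".encode("utf-8")

# cp437 is only an assumed stand-in for the console's OEM code page.
as_oem = save_like_out_file(script_output, "cp437", "cp437")
as_utf8 = save_like_out_file(script_output, "cp437", "utf-8")
as_unicode = save_like_out_file(script_output, "cp437", "utf-16-le")

print("oem     preserved:", as_oem == script_output)
print("utf8    preserved:", as_utf8 == script_output)
print("unicode preserved:", as_unicode == script_output)
```

Only the same-code-page round trip hands back the original bytes. Picking utf8 re-encodes the already misread characters, which is how you end up with doubly encoded text.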

Let Me Correct Those Lines

The philosophy of UNIX has been: it's all text. The input of every program is text and the output of every program is text. Except when it's not, and when it's not, you can't use the common UNIX text-processing tools on it. Instead you have to use separate tools specific to your binary format. For example, you can use ImageMagick to apply all kinds of transformations to images.

This philosophy was recognized by the PowerShell team as one of the great weaknesses of the UNIX pipeline. And therefore the mantra of PowerShell has been: it's not all text. It has been the great promise of PowerShell that it will enable you to work more easily with all kinds of data, not just text.

And this all works out fine when your program outputs .NET objects. But when it doesn't, the output is treated as text. And it's a water pipe, as you remember, so the text isn't just left alone: it's transformed into a .NET array, an array of lines.

Doesn't look that bad, does it? But the trick is that when the text is split into lines, the line separators are discarded. And when it is put back together, all lines are joined with \r\n. For example this input:

Hello,\n
my name is Rene.\r\n
I'm the author of this blog.\n

...will be converted to this output:

Hello,\r\n
my name is Rene.\r\n
I'm the author of this blog.\r\n

First of all it's dosification: all your nice UNIX line endings will be converted to ugly DOS line endings. Really annoying. Not just annoying, terrible. I think you already know where this is going: binary data.

The SQL generated by my script also included binary data. And you can imagine what this kind of conversion can do to binary data.
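To make the damage concrete, here is a rough Python simulation of the split-and-rejoin behaviour described above. It is not PowerShell code, and it simplifies things: the helper name water_pipe is invented, only \n and \r\n are treated as line separators, and latin-1 is used purely so that every byte survives the text round trip and the line handling is the only thing changing the data.

```python
import re


def water_pipe(data: bytes) -> bytes:
    """Simulate 'split into lines, rejoin with CRLF' (hypothetical helper)."""
    text = data.decode("latin-1")      # 1:1 byte <-> character mapping
    lines = re.split(r"\r?\n", text)   # the original separators are discarded here
    if lines and lines[-1] == "":
        lines.pop()                    # a trailing newline yields no extra line
    rejoined = "".join(line + "\r\n" for line in lines)
    return rejoined.encode("latin-1")


text_in = b"Hello,\nmy name is Rene.\r\nI'm the author of this blog.\n"
print(water_pipe(text_in))

# The 8-byte PNG signature contains both a CRLF and a bare LF.
png_signature = b"\x89PNG\r\n\x1a\n"
mangled = water_pipe(png_signature)
print(len(png_signature), "bytes in,", len(mangled), "bytes out")
```

For text this is merely ugly. For the PNG signature the bare \n gains a \r, the data grows by one byte, and whatever reads it afterwards no longer sees a valid file.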

There is no built-in mechanism in PowerShell to overcome this problem, although the problem is well known and there exist some ugly workarounds.

Conclusion

At the end of the day PowerShell had failed me. It was the first real job I wanted to do with PowerShell and it failed completely.

If PowerShell really wants to succeed, this behavior has to be corrected. Please, PowerShell, no water inside my pipe.

Thanks to paws22 for sharing the above photo on Flickr under a Creative Commons Attribution Noncommercial Share-Alike license.

4 comments:

  1. Hi!

    If you still have those kinds of problems, you can use the binary reader instead of the .NETed text reader:

    Get-Content -Path "$PathToYourFile" -Encoding Byte

    This should preserve any binary data and will output it as a System.Byte array (a very useful hint: '| Get-Member')...

    ---
    Have fun Jones111

  2. Hello, I read your post and am impressed with your deep knowledge of "Smoking Water Pipe". It's hard to find this kind of blog post, and I'd like to read more posts from you. I appreciate your efforts in the post. Thanks for sharing your deep information on "Pipe Smoking Accessories".

  3. Well I had the same problem with powershell and I know there's always a solution somewhere so I went and found it.

    You redirect stdin / stdout / stderr through named pipes.

    Yeah I know, it's not exactly fixing the brokenness of the pipes and redirection in powershell, but it's as good as you're gonna get with stock powershell.

    I don't exactly know how to do that with code though.

    It was good enough for my needs just to be able to use named pipes to output the binary data I had stored in powershell variables.

    But if you wanted to write a full-fledged solution (that would be great), the puppetlabs guys apparently figured it out.

    https://puppet.com/blog/how-propelled-powershell-at-puppet

    P.S. You could also create your own version of powershell with fixed pipes and redirection now that it has been open sourced

    https://github.com/PowerShell/PowerShell
