Convert text files from DOS to UNIX and vice versa.

Here are a few recipes to convert DOS text files to UNIX format and back.

How do you know whether you have a DOS text file or a UNIX text file? In a DOS text file, every line ends with a carriage return followed by a line feed (CR LF). UNIX uses only a line feed (LF). The Unix command file tells you which type of file you are dealing with.

$ file dosfile.txt
dosfile.txt: ASCII text, with CRLF line terminators
$ file unixfile.txt
unixfile.txt: ASCII text
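
The same check can be approximated in a few lines of Python. This is only a minimal sketch, not how file itself works; the function name detect_line_endings is hypothetical, and it looks at nothing but CR LF and bare LF terminators.

```python
def detect_line_endings(data: bytes) -> str:
    """Classify the line terminators found in a block of bytes."""
    crlf = data.count(b"\r\n")
    # Every CR LF pair also contains an LF, so subtract to get the bare LFs.
    lf = data.count(b"\n") - crlf
    if crlf and lf:
        return "mixed"
    if crlf:
        return "dos"
    if lf:
        return "unix"
    return "none"


print(detect_line_endings(b"one\r\ntwo\r\n"))  # dos
print(detect_line_endings(b"one\ntwo\n"))      # unix
```

When reading a real file for this test, open it in binary mode ("rb"), otherwise Python may translate the line endings before you see them.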

To convert files back and forth from one format to the other, you have several options. A few of them are shown here.

Using tr

DOS to UNIX only
The tr utility copies the standard input to the standard output with substitution or deletion of selected characters.

$ tr -d '\r' < dosfile.txt > unixfile.txt
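
Note that tr -d deletes every CR in the input, not only those that sit in front of an LF. The Python sketch below contrasts the two behaviours; the function names are hypothetical and the sample data is made up for illustration.

```python
def strip_all_cr(data: bytes) -> bytes:
    """Delete every CR byte, like tr -d '\\r'."""
    return data.translate(None, b"\r")


def crlf_to_lf(data: bytes) -> bytes:
    """Replace only CR LF pairs, leaving a lone CR untouched."""
    return data.replace(b"\r\n", b"\n")


sample = b"progress 10%\rprogress 20%\r\ndone\r\n"
print(strip_all_cr(sample))  # b'progress 10%progress 20%\ndone\n'
print(crlf_to_lf(sample))    # b'progress 10%\rprogress 20%\ndone\n'
```

For ordinary DOS text files the two give the same result; they differ only when a CR appears in the middle of a line.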

Using Vim

Vim needs no introduction. It is an improved version of vi that runs on almost all Unix systems as well as on Windows.

$ vim dosfile.txt
...
:set fileformat=unix
:wq

To convert a Unix text file to DOS, type :set fileformat=dos instead. You can also replace fileformat with its abbreviation ff. Vim option names are written in lowercase. Type :help 'fileformat' for more information on this option.

Using Emacs

Emacs is the extensible, customizable, self-documenting real-time display editor.
In the mode line (the status bar at the bottom of your screen), Emacs shows information about the file you are editing. In the following example, the file is a DOS file.

http://image4.360doc.com/DownloadImg/2009/3/24/26398_2906693_2.png

The function set-buffer-file-coding-system sets the file coding system of the current buffer to CODING-SYSTEM. This means that when you save the buffer, it is converted according to CODING-SYSTEM. The following command sets the coding system of your buffer to UNIX, and it applies the next time you save the file. Type dos instead of unix to select the DOS coding system.

M-x set-buffer-file-coding-system RET unix RET

Using sed

There are several flavors of sed, depending on your operating system and on which package is installed on your computer. Unless noted otherwise, the following methods should work with all of them.
As noted earlier, lines are terminated by CR LF in the DOS file format. sed hands each line to the script without its trailing LF, so the following command deletes the last character of every line, which is the CR, and sed then writes the line out with the UNIX terminator LF. Use it only on files you know are in DOS format, because it deletes the last character of a line whatever that character is.

$ sed 's/.$//' dosfile.txt > new_unixfile.txt
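
To see what s/.$// does to each line, here is a small Python sketch of the same operation next to a safer variant that removes the last character only when it is a CR. Both function names are hypothetical; the code imitates the effect of the command and is not how sed is implemented.

```python
def drop_last_char(data: bytes) -> bytes:
    """Mimic sed 's/.$//': delete the last character of every line."""
    return b"\n".join(line[:-1] for line in data.split(b"\n"))


def drop_trailing_cr(data: bytes) -> bytes:
    """Delete the last character of a line only when it is a CR."""
    return b"\n".join(
        line[:-1] if line.endswith(b"\r") else line
        for line in data.split(b"\n")
    )


dos = b"first\r\nsecond\r\n"
unix = b"first\nsecond\n"
print(drop_last_char(dos))     # b'first\nsecond\n'
print(drop_last_char(unix))    # b'firs\nsecon\n'
print(drop_trailing_cr(unix))  # b'first\nsecond\n'
```

The second call shows why the sed recipe should not be run on a file that is already in UNIX format.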

The conversion from UNIX file format to DOS is more complicated. Older versions of sed do not understand escape sequences such as \r in a substitution rule. Therefore, we let the shell run an external command inside the rule to produce the CR character. This relies on echo -e, which bash supports but some other shells do not.

$ sed 's/$'"/`echo -e "\r"`/" unixfile.txt > new_dosfile.txt

If you are using GNU sed, or another version of sed that understands \r, you can simply do this.

$ sed 's/$/\r/' unixfile.txt > new_dosfile.txt

Using Perl

Perl is pretty straightforward: you remove or add the CR character at the end of each line.

$ perl -p -e 's/\r$//' < dosfile.txt > new_unixfile.txt
$ perl -p -e 's/$/\r/' < unixfile.txt > new_dosfile.txt

Using awk

The same recipe using awk.

$ awk '{sub("\r$", "", $0);print $0}' dosfile.txt > new_unixfile.txt
$ awk '{sub("$", "\r", $0);print $0}' unixfile.txt > new_dosfile.txt
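
Appending a CR to every line, as the UNIX-to-DOS recipes above do, turns a file that is already in DOS format into one with CR CR LF terminators. Below is a minimal Python sketch of a conversion that is safe to run twice; the function name lf_to_crlf is hypothetical and not part of any of the tools discussed.

```python
def lf_to_crlf(data: bytes) -> bytes:
    """Convert LF line endings to CR LF without doubling existing CRs."""
    # First collapse any CR LF pair to a bare LF, then expand every LF.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


print(lf_to_crlf(b"a\nb\n"))      # b'a\r\nb\r\n'
print(lf_to_crlf(b"a\r\nb\r\n"))  # b'a\r\nb\r\n'
```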

Using Python

This one is more for fun. Python is my favorite language, but I rarely, if ever, write Python one-liners; the language is not really suited to them. The version below is for Python 3 and works on bytes, so that Python does not translate the line endings itself while reading.

$ python3 -c "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read().replace(b'\r\n', b'\n'))" < dosfile.txt > new_unixfile.txt
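
The one-liner above reads the whole input into memory, which is fine for small files only. As a longer script, the same conversion can be done one chunk at a time. The sketch below is one possible approach under stated assumptions: the name convert_stream and the default chunk size are hypothetical choices, and src and dst are assumed to be binary file objects.

```python
import io


def convert_stream(src, dst, chunk_size=1 << 20):
    """Copy src to dst, turning CR LF into LF, one chunk at a time."""
    pending_cr = False
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        if pending_cr:
            # Put back the CR held over from the previous chunk.
            chunk = b"\r" + chunk
        # Hold back a trailing CR: its LF may arrive in the next chunk.
        pending_cr = chunk.endswith(b"\r")
        if pending_cr:
            chunk = chunk[:-1]
        dst.write(chunk.replace(b"\r\n", b"\n"))
    if pending_cr:
        # A lone CR at the very end of the input is kept as it is.
        dst.write(b"\r")


# A tiny chunk size forces a CR LF pair to straddle a chunk boundary.
source = io.BytesIO(b"one\r\ntwo\r\n")
target = io.BytesIO()
convert_stream(source, target, chunk_size=4)
print(target.getvalue())  # b'one\ntwo\n'
```

With real files, pass objects opened with open(name, "rb") and open(name, "wb") in place of the in-memory buffers.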

Conclusion

There are other ways to convert between the DOS and UNIX file formats. Personally, I am a heavy Emacs user, and when I have to convert text files or program files, I use Emacs. For huge data files, I prefer Perl: you don't want to open a file of several hundred gigabytes in Emacs or vi.