Text Filters with Linux (head tail sort nl wc uniq sed tac cut)

On the Linux command line, a filter is a program that takes text input, processes it, and produces output. The input can be generated by another program, read from a file, or entered by the user. The filter performs the necessary operations on this input, and the result can be written to the screen or saved to a file.

This article covers the commands used for these operations together, which is more useful than discussing each one in a separate article. General usage is shown without going into too much detail. The examples use a working file containing the following data. To follow along, create a file named examplefile.txt by copying and pasting the data below.

```
Fatih elmasuyu 20
Suzan portakalsuyu 5
Melih kavunsuyu 12
Melih kavunsuyu 12
Rasim kirazsuyu 4
Tarık portakalsuyu 9
Lale şeftalisuyu 7
Suzan portakalsuyu 12
Melih kayısısuyu 39
Ayşe mangosuyu 7
Galip havuçsuyu 3
Osman karpuzsuyu 2
Betül narsuyu 14
```

**head**

This command displays the desired number of lines from the beginning of the requested document. If no line count is given, the default value is 10 lines.

**format** : head [-number of lines to print] [path]
```bash
head examplefile.txt 
Fatih elmasuyu 20
Suzan portakalsuyu 5
Melih kavunsuyu 12
Melih kavunsuyu 12
Rasim kirazsuyu 4
Tarık portakalsuyu 9
Lale şeftalisuyu 7
Suzan portakalsuyu 12
Melih kayısısuyu 39
Ayşe mangosuyu 7
```

The first 10 lines from the beginning are displayed above. Now let's view the first 4 lines.
```bash
head -4 examplefile.txt 
Fatih elmasuyu 20
Suzan portakalsuyu 5
Melih kavunsuyu 12
Melih kavunsuyu 12
```

**tail**

The tail command is the opposite of the head command: it displays the requested number of lines from the end of a document. If no line count is given, the default is 10 lines.

**format** : tail [-number of lines to print] [path]
```bash
tail examplefile.txt 
Melih kavunsuyu 12
Rasim kirazsuyu 4
Tarık portakalsuyu 9
Lale şeftalisuyu 7
Suzan portakalsuyu 12
Melih kayısısuyu 39
Ayşe mangosuyu 7
Galip havuçsuyu 3
Osman karpuzsuyu 2
Betül narsuyu 14
```

Now let's view the last 3 lines.
```bash
tail -3 examplefile.txt 
Galip havuçsuyu 3
Osman karpuzsuyu 2
Betül narsuyu 14
```
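
The `-4` and `-3` shorthand used above also has a longer `-n` form, which is the spelling POSIX specifies. The following is a minimal sketch of that form; the temporary file and the variable names `tmpfile`, `first_two`, and `last_two` are made up for illustration and are not part of the article's sample data.

```bash
# Build a small throwaway file with five lines (illustrative data only).
tmpfile=$(mktemp)
printf '%s\n' one two three four five > "$tmpfile"

# -n N is the explicit form of the -N shorthand.
first_two=$(head -n 2 "$tmpfile")
last_two=$(tail -n 2 "$tmpfile")

echo "$first_two"   # prints: one, then two
echo "$last_two"    # prints: four, then five

# Remove the throwaway file.
rm -f "$tmpfile"
```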

**sort**

The sort command sorts the given text input alphabetically by default. It is also possible to sort by other criteria through options. You can check the man page (man sort) for detailed information.

**format** : sort [-options] [path]
```bash
sort examplefile.txt 
Ayşe mangosuyu 7
Betül narsuyu 14
Fatih elmasuyu 20
Galip havuçsuyu 3
Lale şeftalisuyu 7
Melih kavunsuyu 12
Melih kavunsuyu 12
Melih kayısısuyu 39
Osman karpuzsuyu 2
Rasim kirazsuyu 4
Suzan portakalsuyu 12
Suzan portakalsuyu 5
Tarık portakalsuyu 9
```
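
As one example of sorting by another criterion, the sketch below sorts by the quantity in the third column instead of by name. It assumes the `-k` (key field) and `-n` (numeric comparison) options of sort, and uses a shortened copy of the sample data in a temporary file; the variable names are illustrative.

```bash
# A shortened copy of the sample data in a throwaway file.
tmpfile=$(mktemp)
printf '%s\n' \
  'Fatih elmasuyu 20' \
  'Suzan portakalsuyu 5' \
  'Melih kavunsuyu 12' \
  'Rasim kirazsuyu 4' > "$tmpfile"

# -k 3,3 restricts the sort key to the third field;
# -n compares that field as a number rather than as text.
by_quantity=$(sort -k 3,3 -n "$tmpfile")
echo "$by_quantity"

# Remove the throwaway file.
rm -f "$tmpfile"
```

Without `-n`, the quantities would be compared as text, which is why 12 comes before 5 in the alphabetical output above.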

**nl**

This command takes its name from the initials of "number lines". It numbers the lines of its input.

**format** : nl [-options] [path]
```bash
nl examplefile.txt 
     1	Fatih elmasuyu 20
     2	Suzan portakalsuyu 5
     3	Melih kavunsuyu 12
     4	Melih kavunsuyu 12
     5	Rasim kirazsuyu 4
     6	Tarık portakalsuyu 9
     7	Lale şeftalisuyu 7
     8	Suzan portakalsuyu 12
     9	Melih kayısısuyu 39
    10	Ayşe mangosuyu 7
    11	Galip havuçsuyu 3
    12	Osman karpuzsuyu 2
    13	Betül narsuyu 14
```

Sometimes you may want to change the format of the output. For example, to put a period after the line numbers and right-align the numbers in a 10-character-wide field, you can try the example below.
```bash
nl -s '. ' -w 10 examplefile.txt 
         1. Fatih elmasuyu 20
         2. Suzan portakalsuyu 5
         3. Melih kavunsuyu 12
         4. Melih kavunsuyu 12
         5. Rasim kirazsuyu 4
         6. Tarık portakalsuyu 9
         7. Lale şeftalisuyu 7
         8. Suzan portakalsuyu 12
         9. Melih kayısısuyu 39
        10. Ayşe mangosuyu 7
        11. Galip havuçsuyu 3
        12. Osman karpuzsuyu 2
        13. Betül narsuyu 14
```

In the example above, two different command options are used. The -s option specifies that a period followed by a space will be used as the separator after the line number. The -w option specifies the width of the line number field; the numbers are right-aligned within it. Note that the separator is entered in quotation marks because it contains a space.
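
One more option worth a quick sketch is `-b`, which controls which lines get a number. By default nl skips empty lines when numbering, and `-b a` numbers every line. The three-line input and the variable names below are made up for illustration.

```bash
# Throwaway file with an empty line in the middle (illustrative data only).
tmpfile=$(mktemp)
printf '%s\n' 'a' '' 'b' > "$tmpfile"

# Default: empty lines are not numbered, so "b" gets number 2.
default_last=$(nl "$tmpfile" | tail -n 1 | awk '{print $1}')

# -b a: number all lines, so "b" gets number 3.
all_last=$(nl -b a "$tmpfile" | tail -n 1 | awk '{print $1}')

echo "default: $default_last, with -b a: $all_last"

# Remove the throwaway file.
rm -f "$tmpfile"
```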

**wc**

The name wc comes from the initials of "word count". Despite the name, it reports more than words: unless otherwise specified, the command output shows the number of lines, words, and bytes in the given text document.

**format** : wc [-options] [path]
```bash
wc examplefile.txt 
13  39 255 examplefile.txt
```

Sometimes, we may need only one of these pieces of information. In this case, it is sufficient to pass the corresponding option to the command: -l (lines) prints the number of lines, -w (words) the number of words, -c the number of bytes, and -m the number of characters.
```bash
wc -l examplefile.txt 
13 examplefile.txt
```

You can also combine more than one of these options.
```bash
wc -lw examplefile.txt 
13  39 examplefile.txt
```
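
If you need only the number, without the file name after it, one common approach is to feed the file to wc through standard input. The sketch below uses a shortened copy of the sample data in a temporary file; the variable names are illustrative.

```bash
# A shortened copy of the sample data in a throwaway file.
tmpfile=$(mktemp)
printf '%s\n' \
  'Fatih elmasuyu 20' \
  'Suzan portakalsuyu 5' \
  'Melih kavunsuyu 12' > "$tmpfile"

# Reading from standard input means wc has no file name to print.
line_count=$(wc -l < "$tmpfile")
word_count=$(wc -w < "$tmpfile")

echo "lines: $line_count, words: $word_count"

# Remove the throwaway file.
rm -f "$tmpfile"
```

Some wc implementations pad the number with leading spaces, so treat the result as a number rather than comparing it as exact text.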

**cut**

The cut command extracts the columns you want from data that is organized in columns, such as CSV (Comma Separated Values) files or text consisting of space-separated values.

In the sample file we use, the data is separated by spaces. The first column indicates the name, the second column indicates the juice, and the third column indicates the quantity. If we want to get only the names from here, we can do this as follows.

**-f** : Short for fields; it indicates which fields we will take.

**-d** : Short for delimiter; it specifies the character that separates the fields.

**format** : cut [-options] [path]
```bash
cut -f 1 -d ' ' examplefile.txt 
Fatih
Suzan
Melih
Melih
Rasim
Tarık
Lale
Suzan
Melih
Ayşe
Galip
Osman
Betül
```

Let's see how to select two columns with an example.
```bash
cut -f 1,2 -d ' ' examplefile.txt 
Fatih elmasuyu
Suzan portakalsuyu
Melih kavunsuyu
Melih kavunsuyu
Rasim kirazsuyu
Tarık portakalsuyu
Lale şeftalisuyu
Suzan portakalsuyu
Melih kayısısuyu
Ayşe mangosuyu
Galip havuçsuyu
Osman karpuzsuyu
Betül narsuyu
```
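
The same options work for CSV data by changing the delimiter to a comma. The following is a minimal sketch with a made-up two-line CSV file. It assumes simple CSV without quoted fields, since cut splits on every comma it sees.

```bash
# Throwaway CSV file (illustrative data only, no quoted fields).
tmpfile=$(mktemp)
printf '%s\n' 'Fatih,elmasuyu,20' 'Suzan,portakalsuyu,5' > "$tmpfile"

# -d ',' sets the comma as delimiter; -f 1,3 keeps the first and third fields.
names_and_quantities=$(cut -d ',' -f 1,3 "$tmpfile")
echo "$names_and_quantities"

# Remove the throwaway file.
rm -f "$tmpfile"
```

The selected fields are printed with the same delimiter between them, so the output here is still comma-separated.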

**sed**

The name sed comes from Stream Editor. It works on search-and-replace logic: it can search for an expression and replace it with another expression. Although it has a number of other capabilities, we will show only basic usage here.

**format** : sed <expression> [path]

Basically, expression has the following structure.

**Expression** : s/searchexpression/newexpression/g

The **s** at the beginning tells the sed command that a substitute operation will be performed. There are also other letters for other operations. The expression between the first and second slash (/) after the letter **s** indicates what to search for, and the part between the second and third slash indicates what to replace it with. The **g** at the end indicates that the operation should be performed globally. The letter **g** may be omitted. In that case, only the first match on each line is replaced, and the rest of the line is left unchanged.
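
The effect of the trailing **g** is easiest to see on a line that contains the search expression more than once. The sketch below uses a made-up input line; the variable names are illustrative.

```bash
# A single line containing the search expression twice (illustrative input).
line='Suzan and Suzan'

# Without g: only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/Suzan/Serpil/')

# With g: every match on the line is replaced.
all_matches=$(printf '%s\n' "$line" | sed 's/Suzan/Serpil/g')

echo "$first_only"    # Serpil and Suzan
echo "$all_matches"   # Serpil and Serpil
```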

Let's look at our file contents first.
```bash
cat examplefile.txt
Fatih elmasuyu 20
Suzan portakalsuyu 5
Melih kavunsuyu 12
Melih kavunsuyu 12
Rasim kirazsuyu 4
Tarık portakalsuyu 9
Lale şeftalisuyu 7
Suzan portakalsuyu 12
Melih kayısısuyu 39
Ayşe mangosuyu 7
Galip havuçsuyu 3
Osman karpuzsuyu 2
Betül narsuyu 14
```

With the example below, every occurrence of the name Suzan is replaced with Serpil in the output. The file itself is not modified.
```bash
sed 's/Suzan/Serpil/g' examplefile.txt 
Fatih elmasuyu 20
Serpil portakalsuyu 5
Melih kavunsuyu 12
Melih kavunsuyu 12
Rasim kirazsuyu 4
Tarık portakalsuyu 9
Lale şeftalisuyu 7
Serpil portakalsuyu 12
Melih kayısısuyu 39
Ayşe mangosuyu 7
Galip havuçsuyu 3
Osman karpuzsuyu 2
Betül narsuyu 14
```

sed matches and replaces the entered expression character by character, not word by word, so you could just as well replace Suz with Ser. By default, sed searches are case-sensitive. Instead of a literal search expression, you can create different filters using regular expressions, which we will explain in another section.

Finally, note that the expression we passed to sed is written in quotes. If you accidentally leave a quote unclosed and the shell appears to wait for more input, you can use the **CTRL+c** key combination to cancel.

**uniq**

The name uniq comes from the word unique. It keeps one copy of each run of repeated lines and discards the rest. Records sometimes contain duplicate entries, and uniq can be used to clean up and simplify them. The important point is that the repeated lines must be adjacent, one directly below the other. If a document contains repeated lines that are not adjacent, the solution is covered in the article on Piping and Redirection.

You may have noticed that some lines in our sample file are repeated. Let's remove these duplicates using uniq. First, let's look at the original version of the file. As can be seen, the line Melih kavunsuyu 12 appears twice in a row.
```bash
cat examplefile.txt
Fatih elmasuyu 20
Suzan portakalsuyu 5
Melih kavunsuyu 12
Melih kavunsuyu 12
Rasim kirazsuyu 4
Tarık portakalsuyu 9
Lale şeftalisuyu 7
Suzan portakalsuyu 12
Melih kayısısuyu 39
Ayşe mangosuyu 7
Galip havuçsuyu 3
Osman karpuzsuyu 2
Betül narsuyu 14
```

After executing the command, it can be seen that the repeated lines are cleared.

**format** : uniq [options] [path]
```bash
uniq examplefile.txt 
Fatih elmasuyu 20
Suzan portakalsuyu 5
Melih kavunsuyu 12
Rasim kirazsuyu 4
Tarık portakalsuyu 9
Lale şeftalisuyu 7
Suzan portakalsuyu 12
Melih kayısısuyu 39
Ayşe mangosuyu 7
Galip havuçsuyu 3
Osman karpuzsuyu 2
Betül narsuyu 14
```
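
The adjacency requirement can be seen with a tiny made-up input. In this sketch, the same three words are passed to uniq in two different orders; the variable names are illustrative.

```bash
# Repeated lines next to each other: the duplicate is dropped.
adjacent=$(printf '%s\n' apple apple pear | uniq)

# Repeated lines separated by another line: both copies are kept.
separated=$(printf '%s\n' apple pear apple | uniq)

echo "$adjacent"    # prints: apple, then pear
echo "$separated"   # prints: apple, then pear, then apple
```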

**tac**

The tac command does the opposite of the cat command: it prints the lines of a file in reverse order, so the bottom line of the file is written first. Note that this is different from the head and tail commands.

Sometimes, while keeping records, new records may be written to the bottom of the file. You may want to see these new records at the top. In this case, using tac will make your job easier.

**format** : tac [path]
```bash
tac examplefile.txt 
Betül narsuyu 14
Osman karpuzsuyu 2
Galip havuçsuyu 3
Ayşe mangosuyu 7
Melih kayısısuyu 39
Suzan portakalsuyu 12
Lale şeftalisuyu 7
Tarık portakalsuyu 9
Rasim kirazsuyu 4
Melih kavunsuyu 12
Melih kavunsuyu 12
Suzan portakalsuyu 5
Fatih elmasuyu 20
```

Last modified 17.01.2025: new translations (f32b526)