Treebank

A treebank, also called a parsed corpus, is a text corpus in which each sentence has been parsed, that is, annotated with syntactic structure. The term treebank refers to the fact that the syntactic structure is usually represented as a tree.
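As an illustration of how a tree-shaped annotation can be stored and read, the following minimal sketch parses a bracketed string into a nested structure. The bracket notation and the labels (S, NP, VP, DT, NN, VBZ) follow the general style of the Penn Treebank; the example sentence and the function names `parse_bracketed` and `leaves` are assumptions made for this sketch and do not come from any particular treebank or library.

```python
def parse_bracketed(text):
    """Parse a bracketed tree string into nested (label, children) tuples.

    Leaves (the words of the sentence) are kept as plain strings.
    """
    # Put spaces around brackets so that split() yields one token per symbol.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        if pos >= len(tokens) or tokens[pos] in ("(", ")"):
            raise ValueError("expected a node label")
        label = tokens[pos]
        pos += 1
        children = []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())   # nested constituent
            else:
                children.append(tokens[pos])   # a word
                pos += 1
        if pos >= len(tokens):
            raise ValueError("missing ')'")
        pos += 1  # consume the closing bracket
        return (label, children)

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("unexpected tokens after the tree")
    return tree


def leaves(tree):
    """Return the words of the sentence in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    words = []
    for child in children:
        words.extend(leaves(child))
    return words


# Hypothetical example sentence, annotated by hand for this sketch.
example = "(S (NP (DT the) (NN dog)) (VP (VBZ barks)))"
tree = parse_bracketed(example)
print(tree[0])
print(leaves(tree))
```

The nested tuples mirror the tree directly: the root label is the first element, and reading the leaves from left to right recovers the original sentence.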

Treebanks are often built on corpora that have already been annotated with part-of-speech tags. In addition, treebanks are sometimes extended with semantic or other linguistic information.

Treebanks can be created manually, with linguists annotating each sentence with its syntactic structure, or semi-automatically, with a parser assigning the syntactic structure, which a linguist then checks and, where necessary, corrects. In practice, fully parsing and checking natural language texts is a labor-intensive process.

Some treebanks follow a particular linguistic theory in their syntactic annotation (e.g. the BulTreeBank with HPSG), but most are less theory-specific. Still, two groups can essentially be distinguished: treebanks that annotate phrase structure (e.g. the Penn Treebank or ICE-GB), and those that annotate dependency structure (e.g. the Prague Dependency Treebank or the Quranic Arabic Dependency Treebank).
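The difference between the two groups can be sketched with one hypothetical sentence encoded both ways. This is only an illustration under stated assumptions: the sentence, the data layout, and the helper names are invented for this sketch, the phrase labels follow the general Penn Treebank style, and the relation names (`det`, `nsubj`, `root`) are borrowed from Universal Dependencies rather than from the label sets of the treebanks named above.

```python
# Phrase structure: words are grouped into nested, labelled constituents.
# Each node is a (label, children) tuple; leaves are plain strings.
phrase_tree = (
    "S",
    [
        ("NP", [("DT", ["the"]), ("NN", ["dog"])]),
        ("VP", [("VBZ", ["barks"])]),
    ],
)

# Dependency structure: one row per word, pointing at its head word.
# Row layout (assumed for this sketch): (id, word, head_id, relation).
# head_id 0 marks the root of the sentence.
dependency_rows = [
    (1, "the", 2, "det"),
    (2, "dog", 3, "nsubj"),
    (3, "barks", 0, "root"),
]


def count_phrase_nodes(tree):
    """Count the labelled (non-word) nodes of a phrase structure tree."""
    if isinstance(tree, str):
        return 0
    _, children = tree
    return 1 + sum(count_phrase_nodes(child) for child in children)


def dependents(rows, head_id):
    """Return the words directly governed by the word with the given id."""
    return [word for _, word, head, _ in rows if head == head_id]


def root_word(rows):
    """Return the single word whose head is the artificial root (id 0)."""
    roots = [word for _, word, head, _ in rows if head == 0]
    if len(roots) != 1:
        raise ValueError("expected exactly one root")
    return roots[0]


print(count_phrase_nodes(phrase_tree))
print(len(dependency_rows))
print(root_word(dependency_rows), dependents(dependency_rows, 3))
```

In the phrase structure encoding, the tree contains additional labelled nodes above the words, whereas the dependency encoding has exactly one node per word and expresses structure only through head links.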

  • Corpus Linguistics
  • Linguistics
  • Computational Linguistics
