git: 9front

ref: 14b919faf8e69fb8c06bb397b5a87c2b9f00c3bb
dir: /sys/man/1/uhtml/

View raw version
.TH UHTML 1
.SH NAME
uhtml \- convert foreign character set HTML file to unicode
.SH SYNOPSIS
.B uhtml
[
.B -p
] [
.B -c
.I charset
] [
.I file
]
.SH DESCRIPTION
HTML comes in various character set encodings
and has special forms to encode characters. To
make it easier to process html, uhtml is used
to normalize it to a unicode only form.
.LP
Uhtml detects the character set of the html input
.I file
and calls
.IR tcs (1)
to convert it to utf replacing html-entity forms
by ther unicode character representations except for 
.B lt
.B gt
.B amp
.B quot
and
.B apos .
The converted html is written to
standard output. If no
.I file
was given, it is read from standard input. If the
.B -p
option is given, the detected character set is printed and
the program exits without conversion.
In case character set detection fails, the default (utf)
is assumed. This default can be changed with the
.B -c
option.
.SH SOURCE
.B /sys/src/cmd/uhtml.c
.SH SEE ALSO
.IR tcs (1)