SBASIC / SuperBASIC Reference Manual - HTML

tcat
Super Gold Card
Posts: 633
Joined: Fri Jan 18, 2013 5:27 pm
Location: Prague, Czech Republic

Re: SBASIC / SuperBASIC Reference Manual - HTML

Post by tcat »

Hi,

I would like to share the convertor code in the hope that it can be improved further. It takes a Text87 ESC/P printer file and translates it into an HTML stream. It is not perfect, and at the moment it can convert some 85-90%.
I started coding it in S*BASIC, then decided to complete it on Linux, which runs on my home computer next to, and connected to, the QL.

It is coded as a shell application using sed and Perl. If you have access to a UNIX terminal, Linux or a Raspberry Pi, the convertor will run there. I use Debian, or Raspbian on the RPi. I cannot test on an iMac, but it may run there as well.

I am no Linux expert, just possibly an advanced user. I did not know Perl before and took my first steps with it on this project.

The convertor runs in four passes.

1st pass, translates ESC/P sequences.

note:
An ESC/P escape sequence is a formatting directive for Epson printers.
It begins with the ESC control character, followed by optional parameters. On UNIX, ESC is displayed as ^[ and can be typed on the keyboard as Ctrl-V followed by Ctrl-[. In general, UNIX control characters are displayed with a leading ^ and can be entered the same way, e.g. Ctrl-V Ctrl-Z gives ^Z (the old CP/M and DOS end-of-file marker). In vim, any character can also be entered as Ctrl-V followed by its three-digit decimal code, e.g. Ctrl-V 000 inserts NULL, shown as ^@. In GNU sed, characters can also be written in octal, decimal and hex, e.g. \o101, \d65 and \x41 all stand for capital A.

note 2:
The code below was copied as plain text, so all escapes and control characters have been reformatted and appear as two characters (e.g. ^[ for ESC). To run the code, please use the tarball posted later in the thread.
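As note 2 says, literal control characters do not survive being copied as text. One way around that, sketched here for illustration only (it is not part of the tarball, and bold_to_h3 is a hypothetical name), is to let the shell produce the ESC byte so the sed rules themselves contain only printable characters. The sketch reproduces just the ESC E / ESC F (bold on/off) rules of the 1st pass. With GNU sed, ESC could also be written directly in the script as \o033.

```shell
#!/bin/sh
# ESC (0x1B) produced by printf and kept in a shell variable, so the
# sed script text itself contains no literal control characters
ESC=$(printf '\033')

# hypothetical helper: translate ESC E / ESC F (bold on / off) into <h3>
bold_to_h3() {
  sed -e "s/${ESC}E/<h3>/g" -e "s/${ESC}F/<\/h3>/g"
}

# usage: prints x<h3>bold</h3>
printf 'x\033Ebold\033F\n' | bold_to_h3
```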


#!/bin/sed -f
# converts EPSON printer file
# into HTML stream, translates ESC/P sequences

:lbl {

s/^[@//                         ;t lbl  # file start, printer init
s/^[!.//                        ;t lbl  # removes escape seq.
s/^[(U...//                     ;t lbl  # -"-
s/^[6//                         ;t lbl  # -"-
s/^[x.//                        ;t lbl  # -"-
s/^[C.//                        ;t lbl  # -"-
s/^[(t.....//                   ;t lbl  # -"-
s/^[t.//                        ;t lbl  # -"-
s/^[\\F./<hr style="width: 100%; height: 2px;">/ ;t lbl  # adds separator
s/^[\\^[^A/<br>/                ;t lbl  # adds line break
s/^[\\..//                      ;t lbl  # removes escape seq.
s/^[J.//                        ;t lbl  # -"-
s/^[k.//                        ;t lbl  # -"-
s/^[p.//                        ;t lbl  # -"-
s/^[X...//                      ;t lbl  # -"-
s/^[E/<h3>/                     ;t lbl  # bold font start
s/^[F/<\/h3>/                   ;t lbl  # bold font end
s/^[P//                         ;t lbl  # removes escape seq.
s/^[g//                         ;t lbl  # -"-
s/^[(^...//                     ;t lbl  # -"-
s/^[M//                         ;t lbl  # -"-
s/^[R.//                        ;t lbl  # -"-
s/^[-.//                        ;t lbl  # -"-
s/^[4/<span style="font-style: italic;">/ ;t lbl  # italic script start
s/^[5/<\/span>/                 ;t lbl  # italic script end
s/^[S./<sup>/                   ;t lbl  # super/subscript start (ESC S n), always mapped to <sup>
s/^[T/<\/sup>/                  ;t lbl  # super/subscript end (ESC T)
s/SBASIC\/.*Section [0-9]//     ;t lbl  # removes page note
s/SBASIC\/.*Appendix [0-9 ]*//  ;t lbl  # removes page note
s/^L//                          ;t lbl  # removes ctrl-L (form feed)
s/<\([A-Z0-9]*\)>/\&lt;\1\&gt;/ ;t lbl  # maps < and > to html entities
s/\x84/\xe4/                    ;t lbl  # maps a umlaut to latin1
s/\xc6/\xe3/                    ;t lbl  # a tilde
s/\x86/\xe5/                    ;t lbl  # a circle
s/\xa0/\xe1/                    ;t lbl  # a acute
s/\x85/\xe0/                    ;t lbl  # a grave
#s/\xc0/\xe2/                   ;t lbl  # a circumflex
s/\x89/\xeb/                    ;t lbl  # e umlaut
s/\x82/\xe9/                    ;t lbl  # e acute
s/\x8a/\xe8/                    ;t lbl  # e grave
s/\x88/\xea/                    ;t lbl  # e circumflex
s/\x8b/\xef/                    ;t lbl  # i umlaut
s/\xa1/\xed/                    ;t lbl  # i acute
s/\x8d/\xec/                    ;t lbl  # i grave
s/\x8c/\xee/                    ;t lbl  # i circumflex
s/\x94/\xf6/                    ;t lbl  # o umlaut
#s/\x/\x/                       ;t lbl  # o tilde
s/\xa2/\xf3/                    ;t lbl  # o acute
s/\x95/\xf2/                    ;t lbl  # o grave
s/\x93/\xf4/                    ;t lbl  # o circumflex
#s/\x/\x/                       ;t lbl  # o bar
s/\x81/\xfc/                    ;t lbl  # u umlaut
#s/\x/\x/                       ;t lbl  # u acute
s/\x97/\xf9/                    ;t lbl  # u grave
s/\x96/\xfb/                    ;t lbl  # u circumflex
s/\x87/\xe7/                    ;t lbl  # c cedilla
s/\xa4/\xf1/                    ;t lbl  # n tilde
#s/\x/\x/                       ;t lbl  # ae diphthong
#s/\x/\x/                       ;t lbl  # oe diphthong
s/\x8e/\xc4/                    ;t lbl  # A umlaut
s/\x83/\xc0/                    ;t lbl  # A grave
s/\xc7/\xc3/                    ;t lbl  # A tilde
s/\x8f/\xc5/                    ;t lbl  # A circle
s/\x90/\xc9/                    ;t lbl  # E acute
#s/\x/\x/                       ;t lbl  # E grave
s/\x99/\xd6/                    ;t lbl  # O umlaut
#s/\x/\x/                       ;t lbl  # O tilde
#s/\x/\x/                       ;t lbl  # O bar
s/\x9a/\xdc/                    ;t lbl  # U umlaut
#s/\x/\x/                       ;t lbl  # C cedilla
#s/\x/\x/                       ;t lbl  # N tilde
#s/\x/\x/                       ;t lbl  # AE diphthong
#s/\x/\x/                       ;t lbl  # OE diphthong
s/\x9c/\xa3/                    ;t lbl  # pound symbol
#s/\x/\x/                       ;t lbl  # cent symbol
#s/\x/\x/                       ;t lbl  # yen symbol
#s/\x/\x/                       ;t lbl  # backquote
#s/\x/\x/                       ;t lbl  # inverse !
#s/\x/\x/                       ;t lbl  # inverse ?
#s/\x/\x/                       ;t lbl  # degree
#s/\x/\x/                       ;t lbl  # division
#s/\x/\x/                       ;t lbl  # left arrow
#s/\x/\x/                       ;t lbl  # right arrow
#s/\x/\x/                       ;t lbl  # up arrow
#s/\x/\x/                       ;t lbl  # down arrow
s/\xb8/\xa9/                    ;t lbl  # copyright
#s/\x/\x/                       ;t lbl  # registered
#s/\x/\x/                       ;t lbl  # one-quarter
#s/\x/\x/                       ;t lbl  # one-half
#s/\x/\x/                       ;t lbl  # three-quarters

}
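None of the s commands above carries the g flag; it is the ;t lbl after each one that makes the script loop. Whenever a substitution succeeds, sed branches back to :lbl and rescans the line from the first rule, until no rule matches any more. This also means a rule whose replacement still matches its own pattern would loop forever. The mechanism in isolation, as a minimal sketch (loop_demo is a hypothetical name):

```shell
#!/bin/sh
# same pattern as the 1st pass script: a label, then an s command
# followed by "t lbl", which jumps back to the label after any successful
# substitution, so the line is rescanned until nothing more matches
loop_demo() {
  sed -e ':lbl' -e 's/-/+/' -e 't lbl'
}

printf 'a-b-c\n' | loop_demo     # a+b+c
printf 'a-b-c\n' | sed 's/-/+/'  # a+b-c: without the loop only the first match changes
```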

Last edited by tcat on Thu Aug 14, 2014 10:01 am, edited 1 time in total.


tcat

Re: SBASIC / SuperBASIC Reference Manual - HTML

Post by tcat »


2nd pass, adds formatting and anchors


#!/bin/sed -f
# converts EPSON printer file
# into HTML stream, second pass adds formatting and anchors

:lbl {

s/<br>\(Location: <span\)/\1/   ;t lbl  # removes <br> before Location:

# changes heading size
s/\(<hr.*>\)<h3>\([-a-zA-Z0-9_$% \.]*\)<br><\/h3>/\1<h1>\2<\/h1>/ ;t lbl

# adds anchor name
s/\(<hr.*><h1>\)\([-a-zA-Z0-9_$% \.]*\)\(<\/h1><br>\)/\1<a name="\2"><\/a>\2\3/ ;t lbl

s/<br><h3><br><\/h3>//          ;t lbl  # removes some empty lines
s/<br><h3><br>$//               ;t lbl  # removes trailing line
1s/<hr.*2px;.>//                        # removes separator on 1st line

}
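To see what the two heading rules of the 2nd pass do to a keyword heading, they can be run on a single line. The sample input below is made up to resemble the 1st pass output (a separator followed by a bold heading); heading_anchor is a hypothetical name, and the two rules are copied from the script above, applied once in order without the t lbl loop.

```shell
#!/bin/sh
# the two heading rules of the 2nd pass: <h3> heading -> <h1>, then add anchor
heading_anchor() {
  sed -e 's/\(<hr.*>\)<h3>\([-a-zA-Z0-9_$% \.]*\)<br><\/h3>/\1<h1>\2<\/h1>/' \
      -e 's/\(<hr.*><h1>\)\([-a-zA-Z0-9_$% \.]*\)\(<\/h1><br>\)/\1<a name="\2"><\/a>\2\3/'
}

# made-up sample line: separator, then bold heading "CHR$"
printf '%s\n' '<hr style="width: 100%; height: 2px;"><h3>CHR$<br></h3><br>' |
  heading_anchor
# <hr style="width: 100%; height: 2px;"><h1><a name="CHR$"></a>CHR$</h1><br>
```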

3rd pass, adds cross references


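#!/usr/bin/perl
# converts EPSON printer file
# into HTML stream, third pass adds cross refs
use strict;
use warnings;
use HTML::Parser ();

# parser state, initialised before parsing starts (the handlers run
# during parse_file, so the variables must exist and be set by then)
my $italic = 0;    # inside an italic <span>
my $xref   = 0;    # inside a CROSS-REFERENCE: block (reset by the next <hr>)
my $myxref = "";   # opening <a ...> tag being built for the current keyword

open(my $in, "<", $ARGV[0]) or die "cannot open $ARGV[0]: $!";
my $p = HTML::Parser->new(
    start_h => [\&start, "tag, attr, text"],
    text_h  => [\&text,  "text"],
    end_h   => [\&end,   "tag, text"],
);
$p->unbroken_text(1);  # deliver each text run whole, not split at read boundaries
$p->parse_file($in);
close $in;

sub start
{
  my ($tag, $attr, $text) = @_;
  if ( $tag eq "span" ) { $italic = 1; }
  if ( $tag eq "hr" )   { $xref = 0; }
  if ( $tag eq "span" && $xref ) {
    # <span style="..."> becomes <a style="..." href="KeywordsXXX.html#
    $text =~ s/<span/<a/;
    $text =~ s/>/ href="KeywordsXXX.html#/;
    $myxref = $text;
  }
  else { print $text; }
}

sub text
{
  my ($text) = @_;
  if ( $text eq "CROSS-REFERENCE:" ) { $xref = 1; }
  if ( $italic && $xref )
  {
    $text =~ s/^\s+//;
    # XXX becomes the keyword's first character, selecting the Keywords page
    $myxref =~ s/XXX/substr($text, 0, 1)/e;
    $myxref .= $text . '">' . $text;
    print $myxref;
  }
  else { print $text; }
}

sub end
{
  my ($tag, $text) = @_;
  # only the closing </span> of a cross-reference becomes </a>
  if ( $tag eq "/span" && $italic && $xref ) { print "</a>"; }
  else { print $text; }
  if ( $tag eq "/span" ) { $italic = 0; }
}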
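The link the 3rd pass builds for each italic keyword in a CROSS-REFERENCE: block points at the page named after the keyword's first character, with the keyword itself as the anchor that the 2nd pass created. The same naming rule, sketched in shell for illustration only (xref_link is hypothetical, and it leaves out the style attribute that the Perl code carries over from the original span):

```shell
#!/bin/sh
# build <a href="KeywordsX.html#KEYWORD">KEYWORD</a>, X = first character
xref_link() (
  kw=$1
  rest=${kw#?}            # everything after the first character
  first=${kw%"$rest"}     # the first character alone
  printf '<a href="Keywords%s.html#%s">%s</a>\n' "$first" "$kw" "$kw"
)

xref_link ABS      # <a href="KeywordsA.html#ABS">ABS</a>
xref_link 'CHR$'   # <a href="KeywordsC.html#CHR$">CHR$</a>
```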

4th pass, adds HTML header with styles and footer


# adds header with styles and appends footer
# ($id is the HTML title and $file the base name, presumably both set earlier in conva/convx)
sed "s/#ID#/$id/" head.html > head1.html
cat head1.html "${file}2.html" foot.html > "$file.html"
Last edited by tcat on Wed Aug 13, 2014 5:54 pm, edited 5 times in total.


RWAP
RWAP Master
Posts: 2892
Joined: Sun Nov 28, 2010 4:51 pm
Location: Stone, United Kingdom

Re: SBASIC / SuperBASIC Reference Manual - HTML

Post by RWAP »

Try this code:


#!/bin/sed -f
# converts EPSON printer file
# into HTML stream, translates ESC/P sequences

:lbl {

s/^[@//                         ;t lbl  # file start, printer init
s/^[!.//                        ;t lbl  # removes escape seq.
s/^[(U...//                     ;t lbl  # -"-
s/^[6//                         ;t lbl  # -"-
s/^[x.//                        ;t lbl  # -"-
s/^[C.//                        ;t lbl  # -"-
s/^[(t.....//                   ;t lbl  # -"-
s/^[t.//                        ;t lbl  # -"-
s/^[\\F./<hr class="mainSeparator">/ ;t lbl  # adds separator
s/^[\\^[^A/<br>/                ;t lbl  # adds line break
s/^[\\..//                      ;t lbl  # removes escape seq.
s/^[J.//                        ;t lbl  # -"-
s/^[k.//                        ;t lbl  # -"-
s/^[p.//                        ;t lbl  # -"-
s/^[X...//                      ;t lbl  # -"-
s/^[E/<h3>/                     ;t lbl  # bold font start
s/^[F/<\/h3>/                   ;t lbl  # bold font end
s/^[P//                         ;t lbl  # removes escape seq.
s/^[g//                         ;t lbl  # -"-
s/^[(^...//                     ;t lbl  # -"-
s/^[M//                         ;t lbl  # -"-
s/^[R.//                        ;t lbl  # -"-
s/^[-.//                        ;t lbl  # -"-
s/^[4/<span class="qlcode">/    ;t lbl  # italic script start - used for SuperBASIC code
s/^[5/<\/span>/                 ;t lbl  # italic script end
s/^[S./<sup>/                   ;t lbl  # super/subscript start (ESC S n), always mapped to <sup>
s/^[T/<\/sup>/                  ;t lbl  # super/subscript end (ESC T)
s/SBASIC\/.*Section [0-9]//     ;t lbl  # removes page note
s/SBASIC\/.*Appendix [0-9 ]*//  ;t lbl  # removes page note
s/<h3>(RELEASE.*<\/h3>//        ;t lbl  # removes note about version
s/^L//                          ;t lbl  # removes ctrl-L (form feed)
s/<\([A-Z0-9]*\)>/\&lt;\1\&gt;/ ;t lbl  # maps < and > to html
s/\x84/\&auml;/                 ;t lbl  # maps a umlaut to html
s/\xc6/\&atilde;/               ;t lbl  # a tilde
s/\x86/\&aring;/                ;t lbl  # a circle
s/\xa0/\&aacute;/               ;t lbl  # a acute
s/\x85/\&agrave;/               ;t lbl  # a grave
#s/\xc0/\&acirc;/               ;t lbl  # a circumflex
s/\x89/\&euml;/                 ;t lbl  # e umlaut
s/\x82/\&eacute;/               ;t lbl  # e acute
s/\x8a/\&egrave;/               ;t lbl  # e grave
s/\x88/\&ecirc;/                ;t lbl  # e circumflex
s/\x8b/\&iuml;/                 ;t lbl  # i umlaut
s/\xa1/\&iacute;/               ;t lbl  # i acute
s/\x8d/\&igrave;/               ;t lbl  # i grave
s/\x8c/\&icirc;/                ;t lbl  # i circumflex
s/\x94/\&ouml;/                 ;t lbl  # o umlaut
#s/\x/\&otilde;/                ;t lbl  # o tilde - what code is output for this?
s/\xa2/\&oacute;/               ;t lbl  # o acute
s/\x95/\&ograve;/               ;t lbl  # o grave
s/\x93/\&ocirc;/                ;t lbl  # o circumflex
#s/\x/\&oslash;/                ;t lbl  # o bar
s/\x81/\&uuml;/                 ;t lbl  # u umlaut
#s/\x/\&uacute;/                ;t lbl  # u acute - what code is output for this?
s/\x97/\&ugrave;/               ;t lbl  # u grave
s/\x96/\&ucirc;/                ;t lbl  # u circumflex
s/\x87/\&ccedil;/               ;t lbl  # c cedilla
s/\xa4/\&ntilde;/               ;t lbl  # n tilde
#s/\x/\&aelig;/                 ;t lbl  # ae diphthong - what code is output for this?
#s/\x/\&oelig;/                 ;t lbl  # oe diphthong - what code is output for this?
s/\x8e/\&Auml;/                 ;t lbl  # A umlaut
s/\x83/\&Agrave;/               ;t lbl  # A grave
s/\xc7/\&Atilde;/               ;t lbl  # A tilde
s/\x8f/\&Aring;/                ;t lbl  # A circle
s/\x90/\&Eacute;/               ;t lbl  # E acute
#s/\x/\&Egrave;/                ;t lbl  # E grave - what code is output for this?
s/\x99/\&Ouml;/                 ;t lbl  # O umlaut
#s/\x/\&Otilde;/                ;t lbl  # O tilde - what code is output for this?
#s/\x/\&Oslash;/                ;t lbl  # O bar - what code is output for this?
s/\x9a/\&Uuml;/                 ;t lbl  # U umlaut
#s/\x/\&Ccedil;/                ;t lbl  # C cedilla - what code is output for this?
#s/\x/\&Ntilde;/                ;t lbl  # N tilde - what code is output for this?
#s/\x/\&AElig;/                 ;t lbl  # AE diphthong - what code is output for this?
#s/\x/\&OElig;/                 ;t lbl  # OE diphthong - what code is output for this?
s/\x9c/\&pound;/                ;t lbl  # pound symbol
#s/\x/\&cent;/                  ;t lbl  # cent symbol - what code is output for this?
#s/\x/\&yen;/                   ;t lbl  # yen symbol - what code is output for this?
#s/\x/\`/                  ;t lbl  # backquote - what code is output for this?
#s/\x/\&iexcl;/                 ;t lbl  # inverse ! - what code is output for this?
#s/\x/\&iquest;/                ;t lbl  # inverse ? - what code is output for this?
#s/\x/\&deg;/                   ;t lbl  # degree - what code is output for this?
#s/\x/\&divide;/                ;t lbl  # division - what code is output for this?
#s/\x/\&larr;/                  ;t lbl  # left arrow - what code is output for this?
#s/\x/\&rarr;/                  ;t lbl  # right arrow - what code is output for this?
#s/\x/\&uarr;/                  ;t lbl  # up arrow - what code is output for this?
#s/\x/\&darr;/                  ;t lbl  # down arrow - what code is output for this?
s/\xb8/\&copy;/                 ;t lbl  # copyright
#s/\x/\&reg;/                   ;t lbl  # registered - what code is output for this?
#s/\x/\&frac14;/                ;t lbl  # one-quarter - what code is output for this?
#s/\x/\&frac12;/                ;t lbl  # one-half - what code is output for this?
#s/\x/\&frac34;/                ;t lbl  # three-quarters - what code is output for this?

}
This uses html entities and should hopefully remove an instance in the foreword (although that is not too relevant).
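In a sed replacement a bare & stands for the whole matched text, which is why each entity above is written as \&auml; and not &auml;. A small check, assuming GNU sed (for the \x escapes, as in the scripts above) and byte values taken from the mapping tables in this thread; to_entities is a hypothetical name, and LC_ALL=C makes sed treat the input as plain bytes rather than UTF-8:

```shell
#!/bin/sh
# two rules from the table above: \x90 -> E acute, \x83 -> A grave
to_entities() {
  LC_ALL=C sed -e 's/\x90/\&Eacute;/g' -e 's/\x83/\&Agrave;/g'
}

printf 'EXISTE D\220J\203\n' | to_entities   # EXISTE D&Eacute;J&Agrave;

# without the backslash, & re-inserts the match: "a" becomes "aEacute;"
printf 'a\n' | sed 's/a/&Eacute;/'
```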

There are quite a few where you have not included an ESC/P2 code for conversion...


RWAP

Re: SBASIC / SuperBASIC Reference Manual - HTML

Post by RWAP »

I have also added some CSS classes to deal with italics etc. for SuperBASIC code...


tcat

Re: SBASIC / SuperBASIC Reference Manual - HTML

Post by tcat »

Rich,

Will do, though Latin-1 as it is maps some 90% of the non-English characters in Appendices 8-9 and Keywords K.

You are right, not all ESC/P sequences defined by the standard are there, but most, if not all, of those used in the SBASIC Reference Manual are.

At some stage it might become a question of 20% effort for 80% of the result; what do you think?
At that point, hand editing may get us further, quicker.

I will hand edit Keywords D-E-F, the lengthiest chapters, myself, when we agree it is the right time for it.

Surprisingly, A grave is not defined in the QL character set, but I hope it is what is used in Appendix 8, in the word DÉJÀ:
-8 ALREADY EXISTS EXISTIERT BEREITS EXISTE DÉJÀ

But my French is almost non-existent, so I cannot tell for sure.

I also cannot see E grave, or any accented capital I, defined in the QL set.
Can they appear in the ESC/P file, and should we map them?
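One way to find out which accented characters actually appear is to count the bytes above 0x7F in the Text87 output files, since those are the ones the character rules have to map. A minimal sketch using standard tools (od, tr, grep, sort, uniq, awk); high_bytes is a hypothetical name, not part of the tarball, and it reads the file on standard input, e.g. high_bytes < Appendix08_txt.

```shell
#!/bin/sh
# list each byte value >= 0x80 found on stdin, with its number of occurrences
high_bytes() {
  od -An -v -tx1 |                    # hex dump, no addresses, no line folding
    tr -s ' ' '\n' |                  # one byte per line
    grep '^[89a-fA-F][0-9a-fA-F]$' |  # keep bytes 0x80-0xff only
    LC_ALL=C sort | uniq -c |
    awk '{ print $2, $1 }'            # "byte count"
}

printf 'D\220J\203 \220\n' | high_bytes
# 83 1
# 90 2
```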

I will also post the code as a tarball so we can all play around with it. There are some preprocessing steps I have not commented on yet.

Tom


tcat

Re: SBASIC / SuperBASIC Reference Manual - HTML

Post by tcat »

Hi,

As promised, here is the convertor tarball:
t87.tgz (convertor, tarred and gzipped, 3.6 KiB)
INSTALL:

Simply change to your QL directory and type:
$ tar -xzf t87.tgz

This extracts the following files into the directory t87/conv/:

t87/conv/escp1
t87/conv/escp1a
t87/conv/escp1b
t87/conv/escp2
t87/conv/escp2a
t87/conv/escp3
t87/conv/conva
t87/conv/convx
t87/conv/run.all
t87/conv/zip.all
t87/conv/head.html
t87/conv/foot.html

COMMENTS:

escp1, 1a, 1b - 1st pass
escp2, 2a - 2nd pass
escp3 - 3rd pass

conva - without x-references - useful for Appendices and Intros
convx - with x-references - useful for Keywords sections
run.all - runs the whole batch
zip.all - zips the whole batch

head.html - html header with styles
foot.html - html footer

I suggest using the vi editor, especially for escp1b, which contains control characters.
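If vi is not at hand, cat -v is a quick way to check that the control characters in escp1b (or in a Text87 file) are still there: it prints them in the same caret notation used in the listings above. A small illustration with a made-up byte sequence, which includes the ESC \ ESC Ctrl-A sequence that the 1st pass turns into <br>; on a real file, cat -v escp1b | less does the same.

```shell
#!/bin/sh
# cat -v shows ESC (0x1B) as ^[, Ctrl-A (0x01) as ^A and form feed (0x0C) as ^L
out=$(printf 'a\033Eb\033\\\033\001\014\n' | cat -v)
printf '%s\n' "$out"   # a^[Eb^[\^[^A^L
```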


HOW TO RUN CONVERTOR?

Both conva and convx take one or two arguments:
the t87_txt file to convert, and an optional HTML title

$ ./conva t87_txt [html title]
$ ./convx t87_txt [html title]

If only one parameter is supplied to
conva, the HTML title defaults to Appendix 1-18
convx, the HTML title defaults to Keywords A-Z
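The actual conva and convx scripts come with the tarball and are not reproduced here. Purely as an illustration of the defaults described above and of the examples below, a title could be derived from the file name like this (default_title is hypothetical, and the real scripts may well do it differently):

```shell
#!/bin/sh
# hypothetical: derive a default HTML title from a *_txt file name
default_title() (
  base=${1%_txt}                       # Appendix08_txt -> Appendix08
  case $base in
    Appendix[0-9][0-9])
      n=${base#Appendix}               # 08
      printf 'Appendix %s\n' "${n#0}"  # drop a leading zero -> 8
      ;;
    Keywords[A-Z])
      printf 'Keywords %s\n' "${base#Keywords}"   # KeywordsD -> Keywords D
      ;;
    *)
      printf '%s\n' "$base"            # no rule: fall back to the base name
      ;;
  esac
)

default_title Appendix08_txt   # Appendix 8
default_title KeywordsD_txt    # Keywords D
# a title given as the second argument would take precedence, e.g.
# title=${2:-$(default_title "$1")}
```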


EXAMPLES:

$ ./conva Appendix08_txt
converts Appendix08_txt to Appendix08.html and sets the HTML title to "Appendix 8"

$ ./conva WritingPrograms_txt 'Writing Programs'
converts the file and sets the HTML title to "Writing Programs"

$ ./convx KeywordsD_txt
converts the file and sets the HTML title to "Keywords D"

$ ./convx Keywords__txt 'Keywords Other'
converts the file and sets the HTML title to "Keywords Other"

$ ./run.all
with no arguments, runs the whole batch and converts everything;
on my <1 GHz system it completes in 35 seconds

$ ./zip.all
with no arguments, zips all converted HTML files into one package named SBASICRefManual.zip,
which may be useful for uploading to the RWAP site


For run.all and zip.all, the SBASIC Reference files need to be named as follows and copied to the t87/conv/ directory:


AppendicesIntro_txt  Appendix14_txt    KeywordsF_txt     KeywordsS_txt
Appendix01_txt       Appendix15_txt    KeywordsG_txt     KeywordsT_txt
Appendix02_txt       Appendix16_txt    KeywordsH_txt     Keywords__txt
Appendix03_txt       Appendix17_txt    KeywordsI_txt     KeywordsU_txt
Appendix04_txt       Appendix18_txt    KeywordsJ_txt     KeywordsV_txt
Appendix05_txt       Chrtable_txt      KeywordsK_txt     KeywordsW_txt
Appendix06_txt       Credits_txt       KeywordsL_txt     KeywordsX_txt
Appendix07_txt       Foreword_txt                        KeywordsY_txt
Appendix08_txt       Introduction_txt  KeywordsM_txt     KeywordsZ_txt
Appendix09_txt       KeywordsA_txt     KeywordsN_txt     Structure_txt
Appendix10_txt       KeywordsB_txt     KeywordsO_txt     WritingPrograms_txt
Appendix11_txt       KeywordsC_txt     KeywordsP_txt
Appendix12_txt       KeywordsD_txt     KeywordsQ_txt
Appendix13_txt       KeywordsE_txt     KeywordsR_txt
Last edited by tcat on Fri Aug 22, 2014 5:02 pm, edited 1 time in total.


tcat

Re: SBASIC / SuperBASIC Reference Manual - HTML

Post by tcat »

Hi,

I just realised I forgot to include the HTML header and footer in the convertor tarball. I am updating the package now, and also including a note about these two files.

Tom


RWAP

Re: SBASIC / SuperBASIC Reference Manual - HTML

Post by RWAP »

Thanks to work by Ralf, it looks as though the PDF version of the SBASIC/SuperBASIC Reference Manual can be improved to make it fully searchable.

The bonus of this is that someone (NOT ME!) should be able to run the PDFs through a PDF to HTML convertor which should then mean a lot less formatting is required on the HTML version of the book.

That said, no-one has come forward to offer to do any more work on the HTML version for some months - this is a community project, but maybe people don't want it....


Ralf R.

Re: SBASIC / SuperBASIC Reference Manual - HTML

Post by Ralf R. »

Yes, it was a bit hazardous. The quality of the printout depends on the Text87 printer driver, the founts used for the document, and the founts QPCprint uses for these. No wonder Rich's version probably looked a lot different from my first one, which was awful.

Yesterday I looked for a suitable PDF-to-HTML converter. I installed three different programs (from 760 KB up to 32 MB), only to find that none of them really satisfied me.

So if anyone has an idea, please let me know. One I still have to try is the SourceForge program "pdftohtml".


XorA
Site Admin
Posts: 1623
Joined: Thu Jun 02, 2011 11:31 am
Location: Shotts, North Lanarkshire, Scotland, UK

Re: SBASIC / SuperBASIC Reference Manual - HTML

Post by XorA »

Ralf R. wrote:
So if someone has an idea, please let me know. What I have to try is the sourceforge program "pdftohtml".

There used to be a pretty good PDF import extension for OpenOffice! It seems to have been dropped from recent versions, but I wonder if it is still kicking about!

