Reading Text File Line by Line

Started by Force, February 16, 2012, 07:23:46 PM


Force

Hi


I use some extended ASCII symbols (® = 174, ° = 176) as delimiters when I write the text file, so that I can tokenize it later.

txt File

®Name Surname°12300009999°E- MAIL°COUNTRY
-------------------------------------------------------------------------


Later, reading them back into a listview is easy with the code below:

invoke CreateFile,ADDR DataFileName,GENERIC_READ,NULL,NULL,OPEN_ALWAYS,FILE_ATTRIBUTE_NORMAL,NULL


mov oFile,eax
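invoke GetFileSize,eax,NULL   ; 2nd argument only receives the high dword, NULL is enough for small files
mov okFileSize,eax
inc eax   ; one extra zeroed byte keeps the buffer NUL-terminated for lstrlen
invoke GlobalAlloc,GMEM_MOVEABLE or GMEM_ZEROINIT,eax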
mov oMemory,eax
invoke GlobalLock,oMemory
mov okuMemory,eax
invoke ReadFile,oFile,okuMemory,okFileSize,ADDR SizeOKU,NULL
mov Aydin,0



invoke lstrlen,okuMemory
or eax,eax
jz close

;===================================== ============================

mov esi,okuMemory


starting:

mov edi,offset textOku ; edi is here for getting every line

;=========================  CHECK IF THERE IS A NEW LINE  ============


lineCheck:     ;  check first character
mov al,[esi]
inc esi
cmp al,174
jne close

;====================================================================================
strLine:     ;     Get Line


mov al,[esi]
inc esi
cmp al,10
je token

mov[edi],al
inc edi
jmp strLine
;=======================================================================================

token:     ; Part Tokens

push esi
mov esi,offset textOku
mov edi,offset text
tokenbas:
mov edi,offset text

tokenise:

mov al,[esi]
inc esi
cmp al,176 ;   is it end of token ? (176 = °)
je operation
cmp al,13  ;   is it end of line ? (13 = CR)
je operation
mov [edi],al
inc edi

jmp tokenise



operation:
inc world

.if world==1

mov eax,Aydin

mov lvi.iSubItem,0
invoke SendMessage,hList,LVM_INSERTITEM,0,addr lvi
mov lvi.iItem,eax

.elseif world==4
inc lvi.iSubItem
invoke SendMessage,hList,LVM_SETITEM,0,addr lvi
inc Aydin
mov world,0
pop esi
xor eax,eax
clearT:
mov text[eax],0
inc eax
cmp eax,500
jne clearT

jmp starting

.else
inc lvi.iSubItem
invoke SendMessage,hList,LVM_SETITEM,0,addr lvi
.endif


xor eax,eax
clr2:
mov text[eax],0
inc eax
cmp eax,MAX_STRING
jne clr2
lea edi,text
jmp tokenbas


close:
invoke GlobalUnlock,oMemory   ; GlobalUnlock takes the memory handle, not the locked pointer
invoke GlobalFree,oMemory
invoke CloseHandle,oFile



I don't know if this is a good way, or whether there is a simpler way to read a file line by line
and send the tokens to a listview.
Never Stop Until You Are Better Than The Best

Farabi

Each line is terminated by the "magic" byte pair 0x0D,0x0A, so you can determine where the line ends.

If you want to make it faster, record the byte offset at which each line ends and build a table from those offsets.
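
One possible shape for such a table, as a minimal sketch in C rather than MASM. The name index_lines and its parameters are made up for illustration. It assumes a line ends at LF (10) and leaves any preceding CR (13) for the caller to trim:

```c
#include <stddef.h>

/* Record the offset at which each line starts in buf[0..len).
   A line ends at LF (10); a trailing CR (13) stays part of the line.
   Returns the number of lines found; at most max_lines offsets are stored. */
size_t index_lines(const char *buf, size_t len, size_t *starts, size_t max_lines)
{
    size_t count = 0;
    size_t pos = 0;

    while (pos < len) {
        if (count < max_lines)
            starts[count] = pos;
        count++;
        /* advance to the next LF, or to the end of the buffer */
        while (pos < len && buf[pos] != '\n')
            pos++;
        if (pos < len)
            pos++;              /* step over the LF itself */
    }
    return count;
}
```

With the table filled once after ReadFile, line n can be reached directly at buf + starts[n] instead of rescanning the buffer from the top.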
Those who had universe knowledges can control the world by a micro processor.
http://www.wix.com/farabio/firstpage

"Etos siperi elegi"

vanjast

Be careful of Unix/C strings.. they only end with one of the "0Dh" or "0Ah".. forgot which


jj2007

Quote from: vanjast on February 27, 2012, 09:01:08 PM
Be careful of Unix/C strings.. they only end with one of the "0Dh" or "0Ah".. forgot which

It's 0Ah, linefeeds. The snippet below translates Windows.inc to Unix format, for your testing.

include \masm32\MasmBasic\MasmBasic.inc   ; download
  Init
  Open "O", #1, "\Masm32\include\Linux.inc"   ; won't work under Linux, just a demo for creating a Unix LF-only file
  Recall "\Masm32\include\Windows.inc", L$()
  For_ n=0 To eax-1
   Print #1, L$(n), Lf$      ; write all strings to Linux.inc
  Next

  Close #1
  Recall "\Masm32\include\Linux.inc", L$(), unix   ; tell Recall that it's a Unix/Linux file, i.e. linefeed-only
  For_ ebx=0 To eax-1
   .if Instr_(L$(ebx), "RECT")
      .if Instr_(L$(ebx), "STRUCT")      ; show some results
         mov ecx, ebx
         .Repeat
            Print Str$(ecx), Tb$, L$(ecx), CrLf$
            inc ecx
         .Until Instr_(L$(ecx-1), "ENDS", 1)
         Print
      .endif
   .endif
  Next
  Inkey "ok"
  Exit
end start

hutch--


Just to add to the confusion,
DOS/Windows is usually ascii 13,10
Unix is usually ascii 10 only.
MAC is usually ascii 13 only.
Richedit 2 and 3 use ascii 13 only.
Some very old text formats use 13,13,10 so that the ancient printers can get back to start for the ascii 10.

The best technique I have found is to do a character count and evaluate the results,

count of 13 == count of 10 : DOS/Windows.
count of 10 non zero && count of 13 == 0 : Unix.
count of 13 non zero && count of 10 == 0 : MAC.
count of 13 != count of 10 && both non zero : probably 13,13,10.
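
The counting rules above could be sketched in C roughly like this. The names guess_eol and eol_kind are hypothetical, and the EOL_NONE case for a buffer with no 13 or 10 at all is an added assumption, since otherwise 0 == 0 would be reported as DOS/Windows:

```c
#include <stddef.h>

typedef enum { EOL_NONE, EOL_DOS, EOL_UNIX, EOL_MAC, EOL_OTHER } eol_kind;

/* Count CR (13) and LF (10) bytes and guess the line-ending convention. */
eol_kind guess_eol(const unsigned char *buf, size_t len)
{
    size_t cr = 0, lf = 0, i;

    for (i = 0; i < len; i++) {
        if (buf[i] == 13)
            cr++;
        else if (buf[i] == 10)
            lf++;
    }

    if (cr == 0 && lf == 0) return EOL_NONE;   /* assumption: no line breaks at all */
    if (cr == lf)           return EOL_DOS;    /* 13,10 pairs */
    if (cr == 0)            return EOL_UNIX;   /* 10 only */
    if (lf == 0)            return EOL_MAC;    /* 13 only */
    return EOL_OTHER;                          /* counts differ: possibly 13,13,10 */
}
```

This is only a heuristic: equal counts do not prove the bytes appear in 13,10 order, so a file written as 10,13 would also be reported as DOS/Windows.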


jj2007

Quote from: hutch-- on February 28, 2012, 11:36:57 PM
Richedit 2 and 3 use ascii 13 only.

Yes, a real nuisance. But exporting is relatively easy with the EM_GETTEXTEX flag GT_USECRLF: "When copying text, translate each CR into a CR/LF."

dedndave

don't forget the guys that sometimes use 10, 13 by mistake - i have seen this several times   :P

Force

I think I did it like that before:

ASCII 10 + Name Surname°12300009999°E- MAIL°COUNTRY + ASCII 13

but then the first line was empty in the listview, so I decided to use symbols, like this:

Quote:
®Name Surname°12300009999°E- MAIL°COUNTRY
+ 13,10

Tedd

DOS/Windows = 13,10
Linux/Unix = 10
Mac (old) = 13
Mac OS = 10

There are other crazy schemes, like 13,13 for a 'soft' line-break (for word-wrapping) but this shouldn't be saved to the file anyway; as for 10,13 - you can't guard against arbitrary mistakes, so don't even try (there will always be a better idiot.)


FWIW, ASCII also defines RECORD-SEPARATOR (30) and UNIT-SEPARATOR (31), so you might use these in your file and avoid any newline confusions.
Quote:
Surname -US- Name -US- 12300009999 -US- Email -US- Country -RS-
Surname -US- Name -US- 12300009999 -US- Email -US- Country -RS-
Surname -US- Name -US- 12300009999 -US- Email -US- Country -RS-
...
No snowflake in an avalanche feels responsible.

jj2007

Quote from: Tedd on February 29, 2012, 11:37:19 PM
FWIW, ASCII also defines RECORD-SEPARATOR (30) and UNIT-SEPARATOR (31), so you might use these in your file and avoid any newline confusions.
Quote:
Surname -US- Name -US- 12300009999 -US- Email -US- Country -RS-
Surname -US- Name -US- 12300009999 -US- Email -US- Country -RS-
Surname -US- Name -US- 12300009999 -US- Email -US- Country -RS-
...

Interesting, Tedd. Although the whole world uses TAB instead...

Force

Yes, that's sensible.

If I use it for a listview, there will again be 4 columns in my new project, for example.

If I understand correctly, I need to store each record like this:

Quote:
Surname  Name -US- 12300009999 -US- Email -US- Country -RS-

The problem is: how can I find a new line?

Maybe I need to add 10 at the end of the line instead of RS, or I need to use something to mark the start of a line.

Before, I was checking the first character to find a new line... but if the user types a space as the 1st character

and my code does        cmp al,0 ; for 1st byte
                         je exit
it is a kind of error for the program, because I can't get the other lines anymore.

Tedd

Quote from: jj2007 on March 01, 2012, 06:16:01 AM
Interesting, Tedd. Although the whole world uses TAB instead...
Everyone does what Microsoft does, but that doesn't make it right :P
This way you don't have any newline problems, or worries about escaping 'special' characters in your strings.

Tedd

Quote from: Force on March 01, 2012, 09:40:13 AM
Yes, that's sensible.

If I use it for a listview, there will again be 4 columns in my new project, for example.

If I understand correctly, I need to store each record like this:

Quote:
Surname  Name -US- 12300009999 -US- Email -US- Country -RS-

The problem is: how can I find a new line?

Maybe I need to add 10 at the end of the line instead of RS, or I need to use something to mark the start of a line.

Before, I was checking the first character to find a new line... but if the user types a space as the 1st character

and my code does        cmp al,0 ; for 1st byte
                         je exit
it is a kind of error for the program, because I can't get the other lines anymore.

The file starts with the first line, and the first line ends at RS. The next line starts immediately after; there are no newline characters. It's a data file, not a text file (but most of the data happens to be text).
Only your program reads and writes the file, not the user, so you control its content, and you can filter any user input so that it conforms to the required formatting.
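
As a rough illustration of reading such a file, here is a minimal C sketch that walks a buffer record by record. The name next_record and its parameters are hypothetical, and it assumes every record is closed by RS (30) with fields separated by US (31):

```c
#include <stddef.h>

#define US 31   /* ASCII unit separator: ends a field */
#define RS 30   /* ASCII record separator: ends a record */

/* Split one record in place by overwriting US/RS with NUL.
   Stores up to max_fields field pointers, sets *nfields, and returns
   the start of the next record, or NULL when no complete record is left. */
char *next_record(char *p, char *end, char **fields,
                  size_t max_fields, size_t *nfields)
{
    size_t n = 0;
    char *start = p;

    *nfields = 0;
    if (p == NULL || p >= end)
        return NULL;                    /* buffer exhausted */

    for (; p < end; p++) {
        if (*p == US || *p == RS) {
            int last = (*p == RS);
            *p = '\0';                  /* terminate the current field */
            if (n < max_fields)
                fields[n++] = start;
            start = p + 1;
            if (last) {
                *nfields = n;
                return p + 1;           /* next record starts right here */
            }
        }
    }
    return NULL;                        /* trailing data without RS is ignored */
}
```

Calling it in a loop, starting with p at the buffer start and feeding each return value back in, yields one listview row per call, with fields[0] to fields[3] as the four columns. No start-of-line marker and no newline bytes are needed.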

Force

Tedd,

when I run this line:   invoke ReadFile,oFile,okMemory,okFileSize,ADDR SizeOKU,NULL
the data from the file is already in okMemory.
Then I get a line and split it:
mov esi,okMemory ; the data is in here
lea edi,textBuf

;GET LINE

strLine:
mov al,[esi]
inc esi
cmp al,10 ; get data till new line
je token

mov[edi],al
inc edi
jmp strLine


token:     ; Part Tokens
......
.....
.....


Then I insert it into the listview
and jump back again for the next line:

lineCheck:     ;  check first character
mov al,[esi]
inc esi
cmp al,174 ; ASCII code for start of line
jne close


So then my way is wrong.