News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

IsBadReadPtr and strcopy routines

Started by jj2007, February 22, 2009, 10:17:04 PM

Previous topic - Next topic

jj2007

Inspired by Mark Larson's remarks on reading past a buffer, and other threads saying IsBadxxPtr are worse than GOTO's, I tested what happens if an algo reads past a dynamically allocated buffer. The results for Win XP are interesting:

- VirtualAlloc goes bang at the first byte beyond the legal border
- HeapAlloc is a looot more tolerant, allowing 1638h extra bytes for small buffers, and FE0h for bigger ones.

Below is the testbed. Change if 0 to if 1 for testing VirtualAlloc instead of HeapAlloc. The exception will happen in lodsb, and edx is the counter.

Question: How relevant is this for a practical case, i.e. a strlen or szcopy algo that reads a few bytes (4 or 16) beyond the end of the string? Where is that hypothetical case that I get a string whose end coincides with the end of the allocated buffer?

include \masm32\Gfa2Masm\Gfa2Masm.inc

.code
hw db "Hello World", 0

start:
  if 0
invoke VirtualAlloc, NULL, 10000h, MEM_COMMIT, PAGE_READWRITE
.if eax==0
MsgBox 0, "VirtualAlloc failure", offset hw, MB_OKCANCEL
.else
mov esi, eax
int 3 ; for OllyDbg
mov ecx, -1
; 0010000h
; 00100000h
; 12288
; 102400
; 1003520
; 10002432
; 100003840
xor edx, edx
.Repeat
lodsb
dec ecx
inc edx
.Until Sign?
MsgBox 0, "Heap allocated", offset hw, MB_OKCANCEL
.endif
  else
invoke GetProcessHeap
.if eax==0
MsgBox 0, chr$("GetProcessHeap failure"), offset hw, MB_OKCANCEL
.else
invoke HeapAlloc, eax, HEAP_GENERATE_EXCEPTIONS, 100000h
; 2000h = 3638h
; 1000h = 2638h
; 10000h = 11638h
; 100000h = 100FE0h
; 1000000h = 1000FE0
; 100FE0
; 13880
; 22072
; 42552
; 103992
; 1003488
; 10002400
; 100003808
.if eax==0
MsgBox 0, "HeapAlloc failure", offset hw, MB_OKCANCEL
.else
mov esi, eax
int 3 ; for OllyDbg
mov ecx, -1
xor edx, edx
.Repeat
lodsb
dec ecx
inc edx
.Until Sign?
MsgBox 0, "Heap allocated", offset hw, MB_OKCANCEL
.endif
.endif
  endif
  invoke ExitProcess, eax

end start

drizz

Allocated memory will always reside on a PAGE_SIZE (1000h) aligned region.
The truth cannot be learned ... it can only be recognized.

jj2007

#2
Quote from: drizz on February 23, 2009, 12:00:15 AM
Allocated memory will always reside on a PAGE_SIZE (1000h) aligned region.


What you write is partially correct (i.e. for VirtualAlloc) but not particularly related to my question. Here is a testcase using the Masm32 library len macro. Use console assembly and run it from the command line. It will raise an exception.

; include \masm32\Gfa2Masm\Gfa2Masm.inc
include \masm32\include\masm32rt.inc


.data?
hFile dd ?
pHeap dd ?
pVirt dd ?
BytesReadH dd ?
BytesReadV dd ?

.code
hw db "Hello Masm32 forum:", 0

start:
TestBytes = 50000h
invoke CreateFile, chr$("\masm32\include\windows.inc"), GENERIC_READ, FILE_SHARE_READ,
NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0
.if eax==INVALID_HANDLE_VALUE
  MsgBox 0, LastError$(), "File open error:", MB_OK
  exit
.endif
mov hFile, eax
invoke GetProcessHeap
.if eax==0
MsgBox 0, chr$("GetProcessHeap failure"), offset hw, MB_OKCANCEL
.else
invoke HeapAlloc, eax, HEAP_GENERATE_EXCEPTIONS, TestBytes
mov pHeap, eax
invoke VirtualAlloc, NULL, TestBytes, MEM_COMMIT, PAGE_READWRITE
.if eax==0
MsgBox 0, "VirtualAlloc failure", offset hw, MB_OKCANCEL
.else
mov pVirt, eax
invoke ReadFile, hFile, pHeap, TestBytes, offset BytesReadH, 0
invoke SetFilePointer, hFile, 0, 0, FILE_BEGIN
invoke ReadFile, hFile, pVirt, TestBytes, offset BytesReadV, 0
invoke CloseHandle, hFile
print chr$(13,10,"Start heap memory=  ", 9)
print hex$(pHeap),"h"
print chr$(13,10,"Start virtual memory=", 9)
print hex$(pVirt),"h"
print chr$(13, 10, "Bytes read to heap/virtual memory: ")
print hex$(BytesReadH), "h/"
print hex$(BytesReadV), "h", 13, 10
print chr$(13,10,"Len(heap)=")
print hex$(len(pHeap))
print chr$(13,10,"Len(virt)=")
print hex$(len(pVirt))
print chr$(13,10,"--- ok ---")
.endif
.endif
getkey
invoke ExitProcess, eax
end start

sinsi

As soon as EAX hits the next page's first byte, c0000005 exception.
I've said before to watch out if a read goes past the end of a buffer (and been told "don't worry, one in a million chance"), this proves it.

Hey jj, what's this 'gfa2' stuff? I replaced it with masm32rt but it still choked at 'offset hw' - I changed it to 0 and all is sweet.
Light travels faster than sound, that's why some people seem bright until you hear them.

jj2007

Quote from: sinsi on February 23, 2009, 09:35:07 AM
As soon as EAX hits the next page's first byte, c0000005 exception.
I've said before to watch out if a read goes past the end of a buffer (and been told "don't worry, one in a million chance"), this proves it.
Exactly. However, it also shows that this risk is not only relevant for Agner Fog reading a dword or Lingo's exotic SSE2 algos - that crash was provoked by good ol' szLen, reading byte per byte.
My lesson from that is that since any algo will crash, including the official Microsoft ones:
print chr$(13,10,"Len(virt)=")
invoke crt_strlen, pVirt
print hex$(eax) ; len(pVirt))


... we might as well ignore this risk and choose the fastest one.

Quote
Hey jj, what's this 'gfa2' stuff? I replaced it with masm32rt but it still choked at 'offset hw' - I changed it to 0 and all is sweet.

Sooorry, should have been masm32rt indeed. Corrected above, including the hw (my abbreviation for Hello World).

drizz

Where is the null terminator? 327680 bytes of memory that is filled with windows.inc content does not include the null terminator.
This has nothing to do with unaligned access it's simply a bad parameter!

If you want your program to handle bad parameters then it should do so by setting up SEH!
Or in case of strcpy a strNcpy should be used, with SEH!

The StrLen routine designed by Agner was for use on a dword aligned buffer, its missuse is not the authors fault!
The truth cannot be learned ... it can only be recognized.

drizz

Take a look at Agner's newest strlen www.agner.org/optimize/asmexamples.zip, it will never fail!
(unless you feed it with a bad parameter that is  :lol)
The truth cannot be learned ... it can only be recognized.

PBrennick

Don't you just love it! and then the preaching ...

Paul
The GeneSys Project is available from:
The Repository or My crappy website

jj2007

Quote from: drizz on February 23, 2009, 11:21:23 PM
Where is the null terminator? 327680 bytes of memory that is filled with windows.inc content does not include the null terminator.
This has nothing to do with unaligned access it's simply a bad parameter!

If you want your program to handle bad parameters then it should do so by setting up SEH!
Or in case of strcpy a strNcpy should be used, with SEH!

The StrLen routine designed by Agner was for use on a dword aligned buffer, its missuse is not the authors fault!

drizz,
First of all: thank you for pointing me to Agner Fog's _strlen algo - it is faster than my current favourite designed by NightWare.
My intention was not to deal with misalignment issues - they can be solved easily, as Agner's _strlen SSE2 algo shows (his solution is really elegant). I rather wanted to clarify an issue that has been frequently raised in other threads by our speed gurus, i.e.: Can an algo be used safely even if it has to read beyond the zero delimiter?
The answer seems to be yes, it can be used. Your version is "don't pass bad parameters" - and I agree that my test case is pretty artificial, although it might happen in real life. A second finding is that even if you stumble over a case where the string to be tested fills the legal memory block up to the last byte, you will not have a problem with HeapAlloc - there are several hundreds of "gift bytes" around, so your strlen algo will have a safe landing. And last but not least, that is not true for VirtualAlloc: VA is not tolerant at all (Agner's algo crashes, too).

P.S. SEH: see blogs.msdn

drizz

Quote from: jj2007 on February 24, 2009, 09:55:17 AMCan an algo be used safely even if it has to read beyond the zero delimiter?
It has everything to do with alignment.

Why don't i make it simpler for you; show me a test case where Agner's _strlen fails, with well formed strings (zero terminated) anywhere you want.
The truth cannot be learned ... it can only be recognized.

PBrennick

JJ,
The thing is that if you want to make the algo crash, you can make it crash any time you want. But, that is a fault of the programmer, NOT the Algo. Sure, it is a fact that programmers are human and will make mistakes but, it is not reasonable to think that an Algo should be able to circumvent that human condition.

There is no question that reading data via blocks is faster than reading data byte by byte but, that is what I do, always, when writing my own stuff such as I did with my implimentation of CopyCat. If data is read byte by byte, there can be no buffer issues.

The point I am slowly getting at here is that if you are going to be referencing a buffer with an Algo that will be doing DWORD reads, for instance, then it would make sense to size the buffer on a DWORD boundary. The worst case scenario would be 3 extra bytes. Then you can use any of these Algos to get the exact length of any given string without any problems. That would be called aligning the data buffer to suit the Algo. If you do this, you can then use the fastest one (which in most cases is not critical anyway).

I guess the thing that I like the best about Agner's latest offering is he has finally gotten around to preserving EBX which is a real petpeave of mine when it comes to these Algos. I think that they all should. Sure, it is extra baggage and will impact the speed of the Algo but sometimes you gotta do what ought to be done. I would never make a car without a bumper because the weight of the bumper would have a negative impact on gas mileage because of its 'lift/mass.' We know it will and accept it.

Paul
The GeneSys Project is available from:
The Repository or My crappy website

jj2007

Quote from: PBrennick on February 24, 2009, 01:29:44 PM
If data is read byte by byte, there can be no buffer issues.

See my reply to Sinsi's post above.

The only open question is indeed whether it may happen, in real world scenarios, that we stumble over a string allocated with VirtualAlloc that does not have a zero delimiter. As I have shown above, this question touches all algos, whether they read byte by byte or dwords or paras, whether they are aligned or not. As I also have shown, it is not a problem for strings on the heap.

Quote
Don't you just love it! and then the preaching ...
Could you please elaborate what you mean?

japheth

Quote from: jj2007 on February 24, 2009, 09:55:17 AM
Can an algo be used safely even if it has to read beyond the zero delimiter?
The answer seems to be yes, it can be used.

The correct answer is no, because you'd have to rely on undocumented behaviour. Your code might work with current versions of your favorite OS, but will fail within the next one.

I have an innocent question: why do you mention the IsBadReadPtr() function in this thread's caption, but it  occurs in none of the posts so far?

jj2007

Quote from: japheth on February 24, 2009, 05:01:39 PM
Quote from: jj2007 on February 24, 2009, 09:55:17 AM
Can an algo be used safely even if it has to read beyond the zero delimiter?
The answer seems to be yes, it can be used.

The correct answer is no, because you'd have to rely on undocumented behaviour. Your code might work with current versions of your favorite OS, but will fail within the next one.

Your answer is correct, but it applies to all strlen functions, even those that read byte by byte. This thread was inspired by an earlier thread asking "can we use this dangerous SSE2 stuff that may read 15 bytes beyond the end of the string". Yes, we can, because there is no principal difference between an algo that reads 1 and another one that reads 15 bytes "beyond". Of course, there is a tiny chance that it may fail - especially with "intolerant" VirtualAlloc; but the only alternative would be not using any strlen function.


Quote
I have an innocent question: why do you mention the IsBadReadPtr() function in this thread's caption, but it  occurs in none of the posts so far?

:bg

drizz

POC Code : Notice that only 1 byte is requested for allocation.

PAGE_SIZE EQU 1000h

invoke malloc,1
mov esi,eax
mov edi,eax
and edi,0-PAGE_SIZE
mov al,[edi]
add edi,PAGE_SIZE-1
mov al,[edi]
invoke free,esi

invoke GetProcessHeap
invoke HeapAlloc,eax,HEAP_GENERATE_EXCEPTIONS,1
mov esi,eax
mov edi,eax
and edi,0-PAGE_SIZE
mov al,[edi]
add edi,PAGE_SIZE-1
mov al,[edi]
invoke GetProcessHeap
invoke HeapFree,eax,HEAP_GENERATE_EXCEPTIONS,esi

invoke VirtualAlloc,0,1,MEM_COMMIT,PAGE_READWRITE
mov esi,eax
mov edi,eax
and edi,0-PAGE_SIZE
mov al,[edi]
add edi,PAGE_SIZE-1
mov al,[edi]
invoke VirtualFree,esi,0,MEM_RELEASE

This will always work!

   
lets assume i have allocated 1000h of bytes with virtualalloc

   db 1000h dup (0)
   
and I've placed a string <"ab",0> at offset 1000h-3
   
   db 1000h-3 dup (0)
   db "ab",0
   
i read a dword from address 0FFDh (dword read means reading 1 byte past the NULL terminator)
   
   dword ptr [0FFDh]
   
its accessing the next page and therefore if the page is not committed - access violation
   
if the address is aligned

   mov eax,0FFDh
   and eax,-4 ;; dword align
   
   mov edx,dword ptr [eax]
   
no access violation
   

The answer is: yes it can read past end of the string if the address it reads from is aligned on the access boundary
mov eax,[edx];edx must be dword aligned
movq mm0,[edx];edx must be qword aligned
movdqa xmm0,[edx];edx must be oword aligned

- i'm done here -
The truth cannot be learned ... it can only be recognized.