From time to time I read The Old New Thing, the blog by Raymond Chen, one of Microsoft's old-timers. In his brilliant notes he often describes pathetic customers who observe unexpected Windows API behavior and claim it is Microsoft's bug. What naive people, I would exclaim. And every time, a few paragraphs later, Raymond Chen explains that the root of the issue is the customer's own failure: neglected documentation details, wrong assumptions, etc. Ironically, I recently ran into some pretty strange behavior in the WinSock API. So what should my first reaction be? Indeed, I started with myself...

Problems, we're at Houston!
It was one of those usual days. Familiar tools, a known environment, and nothing to hint at trouble. I had just completed the migration of an existing codebase to IPv6 addresses and was eager to verify that I had done everything the right way (since all my previous IPv6 experience was purely theoretical).
- Windows 7:
  - IPv6... success
  - IPv4... success
- Windows XP:
  - IPv6... success
  - IPv4... BOO! BANG! WOW!
I quickly tweaked the sources to use WSAGetLastError() instead of the errno macro and got "error 10014", aka WSAEFAULT. O'kay folks, something is wrong with the parameters. But what?

Blame me once, shame on you
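The switch from errno to WSAGetLastError() matters because WinSock reports failures through its own error code rather than through errno. Below is a minimal sketch of a portable wrapper. The helper names last_socket_error(), is_fault_error() and report_socket_error() are made up for illustration and are not taken from the original sources.

```c
#include <stdio.h>

#ifdef _WIN32
#include <winsock2.h>
#else
#include <errno.h>
#include <sys/socket.h>
#endif

/* Hypothetical helper: error code of the most recent failed socket call. */
int last_socket_error(void)
{
#ifdef _WIN32
    return WSAGetLastError();   /* WinSock keeps its own per-thread code */
#else
    return errno;               /* POSIX sockets report through errno */
#endif
}

/* Hypothetical helper: does the code mean "bad pointer argument"? */
int is_fault_error(int code)
{
#ifdef _WIN32
    return code == WSAEFAULT;   /* numeric value 10014 */
#else
    return code == EFAULT;
#endif
}

/* Hypothetical helper: print the failed call and its numeric error code. */
void report_socket_error(const char *call)
{
    fprintf(stderr, "%s failed: error %d\n", call, last_socket_error());
}
```

Calling report_socket_error("sendto") right after a failed sendto() would print the numeric code on either platform. The code has to be read before any other socket call is made, since a later call may overwrite it.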
The code was as trivial as "Hello World" with minor tweaks, and apparently it had a bug inside. I am always eager to understand what is wrong with code. Especially trivial code. Especially code produced by my own hands. Needless to say, challenge accepted! My first guess: memory layout? Hey, look: the main code with sendto() is in the main module (EXE), and the function returning the destination IP address is in another module (DLL). It seemed I had mixed up different memory managers, or heaps, or something. I quickly created a local copy of the destination address structure, passed it to sendto()... Success! Look ma, it was easy! Let me show you the difference in memory blocks:
    get_multicast_upnp_addr() returned 0x676084E0 as 239.255.255.250
    "a" copy is 0x003E2860
    get_sockaddr_len() returned 16
    get_upnp_discovery_msg() returned 0x6760850C as M-SEARCH * HTTP/1.1 ...
The next day, I moved all the code into one single module. Compiled, launched on WinXP... WSAEFAULT!
I was screwed and smashed. Firewall? Antivirus software? No, no, no. I checked this on a real machine, then on virtual images, then asked other people to confirm that my tiny test program fails on XP. Everybody got the WSAEFAULT result. I started to hear cracking sounds... It was the world crashing around me because I couldn't properly say "Hello World" to it.
A few days passed. Maybe a few weeks. It's really hard to track time while the world is crashing... Then I got a miraculous hope: this is not my fault, it's the compiler! I recalled hearing that compilers may cause side effects through overly aggressive optimization. It rarely happens, but maybe this was my case? I never trusted that MinGW GCC beast; I should have been more careful choosing the toolchain, what a fool I was... I quickly reconfigured the project flags to "no optimization", launched... same error. So MinGW is innocent? O'kay, let's step into the disassembled WinSock code. The stack trace under the hood:
    Thread [1] 0 (Suspended : Signal : SIGSEGV:Segmentation fault)
        WSHTCPIP!WSHGetSockaddrType() at 0x71a912f4
        0x71a52f9f
        WSAConnect() at 0x71ab2fd7
        main() at tests_main.c:77 0x401584
Oh my... My first question was "why does it call WSAConnect for a connectionless UDP socket?", but the guys quickly reminded me about the implicit binding that happens when sendto() is called in such conditions. Yep, no problem here, a legit call. But it fails! Lovely Visual Studio compiler, you're my only hope, help me please:
    First-chance exception at 0x71a912f4 in SendtoBugXP.exe: 0xC0000005: Access violation writing location 0x00415744.
Please, please, pretty please... The "Autos" debug window told me that location 0x00415744 is the address of the sockaddr_in::sin_zero field of my destination address.

Collapse
I don't remember much about how I figured this out. I'll just put it here:
    const struct sockaddr_in addr_global = {
        AF_INET,
        htons(1900),
        { htonl(INADDR_UPNP_V4) },
        {0},
    };

I love perfectly clean code. And here is the price. You see that "const" qualifier at the very beginning, right? This is the cause of all that two-week madness. One single word crashing the world. Funny?

Post-mortem
So we've found the root of the problem. Let's inspect WinSock to understand why WinXP causes that strange side effect.
- sendto() takes a constant destination address: "const struct sockaddr *to". So far so good. According to the stack trace above, the next call is WSAConnect.
- WSAConnect() takes a constant destination address: "const struct sockaddr *name". Still nothing suspicious.
- Unknown internal call, just a hex address... Double your attention!
- WSHGetSockaddrType() expects a non-constant address: "PSOCKADDR Sockaddr". Shame on you, anonymous dirty type-caster!
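The chain above can be sketched in portable C. This is only an illustration under stated assumptions, not WinSock's actual code. Every name here (mock_addr, inner_helper, public_send, public_send_safely) is hypothetical, with mock_addr standing in for struct sockaddr_in and its zero field for sin_zero. It shows how a const-correct public signature can still end up writing to the caller's object, along with one possible workaround: passing a writable stack copy.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct sockaddr_in; "zero" plays sin_zero. */
struct mock_addr {
    uint16_t      family;
    uint16_t      port;
    uint32_t      ip;
    unsigned char zero[8];
};

/* Stand-in for the inner helper: takes a non-const pointer and writes. */
void inner_helper(struct mock_addr *sa)
{
    memset(sa->zero, 0, sizeof sa->zero);
}

/* Stand-in for the public API: promises const, then casts it away. */
void public_send(const struct mock_addr *to)
{
    inner_helper((struct mock_addr *)to);
}

/* Workaround sketch: only a writable stack copy ever reaches the API. */
void public_send_safely(const struct mock_addr *to)
{
    struct mock_addr copy = *to;
    public_send(&copy);
}
```

If the object behind the pointer is a const global, the toolchain may place it in read-only memory, and the write inside public_send() would then fault, which matches the access violation described above. public_send_safely() avoids that because only the stack copy is ever written.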