Nvidia system with 3 monitors

Last year (2011), when I graduated high school, my school was going through a huge overhaul of the campus. The old campus had slowly been torn down as new buildings went up, and along the way the school also went through a shift in its technological resources. They decided to write off or just plain junk a significant portion of the old computers (P4-era Dell Optiplex machines), and I happened to obtain two 17″ TFT LCD panels. I brought them home thinking I'd do something with them in the future. I happened to be working with an upstart gaming studio, E1FTW Games (http://www.e1ftwgames.com/), that summer (I still am), and I had an iMac on my desk, so I did nothing with the monitors. After I left for college, I used the pair of them with an old eMachines computer my family had long since forgotten about as my at-home machine, because my primary (most awesome) computer had come with me to college and resided in my dorm room.

When I finished my first year I moved back in with my parents (where I still am as of July 2012) and set my big computer back up. I use a 1080p 23.5″ LCD TV as my primary monitor, but I seriously wanted to use the pair of monitors I had with my desktop. Much to my dismay, Nvidia GPUs only support 2 monitors per chip, so even though I had three monitors on my desk, only the 23.5″ panel and one of the smaller screens worked. So here I was, trying to find the cheapest Nvidia GPU that would fit into a PCIe x1 slot. Much to my surprise, they cost more than their PCIe x16 counterparts, something I regard as pretty damn stupid. Then I searched for one that would fit in a plain PCI slot, and those cost even more than the PCIe x1 cards. It just wasn't fair. So I was lining up to buy a PCI GeForce GT 430 for something like $80.

I was pretty bummed that this seemed to be the only solution, but then I had an idea. PCIe is supposed to be fault tolerant: if one of the lanes goes dead, the link just isolates the problem and carries on without it. So I had a thought – could I stick a PCIe x16 card in a PCIe x1 slot and run it at just 1/16th of the bandwidth? Sure enough, there were websites all over the internet describing cutting the closed end off a PCIe x1 slot and dropping in a PCIe x16 card. I decided to try it out, and what do you know, it worked. So here are some pictures.

All the screens loaded up onto my desk

My desktop. It doesn't look like much, but it's my trusty computer.

The eMachines computer again, this time being stripped of its GeForce 7300 GS graphics card

This is one of the most confused pieces of computing hardware I've ever owned. It's labeled as a GeForce 7200 GS, but it's been identified as either that OR a GeForce 7300 SE (not a typo)

Targeted a PCIe x1 slot for chopping

Both cards installed: a GeForce GTX 560 2 GB (the primary card) and the GeForce 7200/7300 GS

Viewing Steam and a test website under Chrome on the monitors plugged into the 7200 GS, and running the Nvidia fluids demo on the primary monitor. Fun fact: the fluids demo runs fine on the old card's monitors too, because it uses the GTX 560 to run the physics

Both GPUs identified in Furmark under Windows 7

World of Tanks up on the center monitor and stuff on the sides

All three monitors up and working under OpenSUSE 12.1 with the Nvidia 295.59 drivers installed. This is under Xinerama, which I ended up disabling; see below

There was one unforeseen side effect of running the GPUs under Linux (my OS of choice). I was trying to use Xinerama to make one contiguous display so I could do the awesome extended-desktop thing, but alas, it was not to be, considering that I am using two widely varied cards. The GeForce 7300-class card dates back to the Windows XP era. It doesn't even have unified shader processors; it can only run one shader program at a time and has dedicated vertex and fragment units straight on the card. It's a DX9 GPU. The primary card is a GeForce GTX 560: a card with 8 times the memory, 336 CUDA cores, hundreds of times the performance, and support for DX11 and OpenGL 4.x. So compositing did not work. OpenGL was disabled on the displays driven by the old card because it wasn't compatible with the main card, and since not all of the screens had GL, KDE wouldn't run its effects manager. This resulted in really slow window operations; the UI was very laggy.

So I decided to give separate X screens a go, and it works flawlessly. Windows may be locked to their respective screens, but it's not at all bad. KWin places new windows on the screen where the mouse is when the application is launched. Although, I do wish that when I want a new Chromium window I could put it on another screen without having to run DISPLAY=":0.2" chromium from the console every time, when it's already launched on another X screen. I spend a lot of time in the console though, so it's not really too bad. Beats having only one monitor. Since I chose to do it this way, OpenGL applications are supported in all the windows, and they start on the primary screen by default unless instructed otherwise. Fullscreen OpenGL applications on the two side monitors are unpredictable and unstable, but they are just fine on the center screen, driven by the massive GPU.

All in all, it's an awesome setup and I love it. Linux has come a long way since its inception, and now that Unity3D, a very popular game engine, officially supports Linux and Autodesk is releasing its 3D software (such as Maya) for Linux, maybe Windows will start losing its stranglehold on gaming.

– Teknoman117

AVR RTOS Update

I haven't forgotten about my little RTOS project, although it's moving toward not really being an RTOS. The goal is to write a task manager for the AVR and, by extension, the Arduino. As I don't have an Arduino Mega, or any board with an AVR that has more than 64 K words of flash (such as an ATmega2560), I can't write the task switcher for those chips, or at least not properly test it, because flash pointers are a bit larger on them and so the function pointer sizes change. For now, this project will support AVRs that have 16-bit program counters.

As I mentioned earlier, this project is moving away from being an RTOS toward more of a process manager. The task switcher will still consider time as a factor in the decision to run a thread, but it will have an extended set of run conditions. I am going to add the concept of a lock to the task switcher and remove the concept of priority. This is so the AVR CPU does not have to waste precious clock cycles performing a context switch to a task only to have yield called again immediately. The locks are going to be contained in a linked list, and when the list is empty, the task can run. I am not removing the concept of the "next run time", because I believe that most thread blocks are going to be due to sleeping. A PID algorithm, for example, needs to run at 25 Hz and doesn't need its locks evaluated until it's time to run again. A lock could be used, for example, with a UART reader: it should stay locked until data is available. This generates some new definitions in the code, along with the discovery that on the AVR, malloc is not an expensive operation (~100 cycles).

#include <stdint.h>

struct _avr_task_lock
{
    struct _avr_task_lock *next_lock;   // next lock in this task's list
    void *lock_data;                    // data the lock condition is evaluated against
    char (*lock_function)(void *);      // decides whether this lock should expire, based on lock_data
};

struct _avr_task_entry
{
    uint32_t next_run_time;             // earliest time the task should run again
    uint16_t *stackptr;                 // use cpiw to check if it's equal to zero; if so, it's invalid
    struct _avr_task_lock *lock_list;   // first lock in the list; an empty list means runnable
};

The _avr_task_lock structure defines the lock object. It contains a pointer to the next lock, a pointer to some data, and a pointer to a function which is used to figure out whether that lock should expire based on the data pointed to by lock_data. The _avr_task_entry structure defines a task entry. It contains the next time the task should be run (locks are not crawled unless it is allowed to run again), the current stack pointer, and a pointer to the first lock object. I think the manager should use the X and Z pointers to store the last and current lock pointers when everything is ported to assembly. The only reason I store a next-run-time value at all is that if a thread doesn't need to run until some later time, why waste precious CPU cycles crawling locks for something that depends only on the clock? A rough sketch of how the switcher might evaluate all of this is below.
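To make the run decision concrete, here is a rough C sketch, in the spirit of the structures above, of how the switcher might decide whether a task can run. The names _avr_task_runnable and uart_rx_ready are hypothetical illustrations rather than the actual implementation, and whether expired locks are freed or recycled is an open design choice.

#include <stdint.h>
#include <stdlib.h>
#include <avr/io.h>

// Hypothetical: returns nonzero if the task may run right now.
static char _avr_task_runnable(struct _avr_task_entry *task, uint32_t now)
{
    // Don't even look at the locks until the sleep time has expired.
    if (now < task->next_run_time)
        return 0;

    // Crawl the lock list; a lock is removed once its function says it expired.
    struct _avr_task_lock **link = &task->lock_list;
    while (*link)
    {
        struct _avr_task_lock *lock = *link;
        if (!lock->lock_function(lock->lock_data))
            return 0;               // still locked, task stays blocked

        *link = lock->next_lock;    // unlink the expired lock
        free(lock);                 // assumes locks were malloc'd
    }

    // Empty lock list means the task can run.
    return 1;
}

// Hypothetical lock function for a UART reader on an ATmega328P:
// the lock expires as soon as a received byte is waiting.
static char uart_rx_ready(void *unused)
{
    (void) unused;
    return (UCSR0A & (1 << RXC0)) != 0;
}

The double-pointer walk keeps the unlink cheap, which is the sort of thing that maps nicely onto X and Z holding the last and current lock pointers once this moves to assembly.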

A pure assembly function will be added to wrap any function that is meant to be executed as a thread. This provides a wrapper so that when the thread function eventually returns, the manager can catch that and not blow up like the current implementation does. Basically, when the task adder pushes an executable thread, it stores a function pointer to the desired function as a parameter at the beginning of the wrapper function. This wrapper function, when switched to, calls the function that is the thread, and if the thread returns, it invalidates the thread's entry and performs a context switch, never to be executed again.
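The real wrapper will be assembly, but conceptually it boils down to something like this C sketch. The names _avr_task_trampoline and _avr_task_exit, and the calling convention, are hypothetical placeholders for what the assembly will actually do.

// Hypothetical: invalidates the current task's entry (stackptr = 0)
// and performs a context switch that never returns here.
extern void _avr_task_exit(void);

// Conceptual C version of the assembly wrapper. The task adder arranges
// for this to be the entry point of every new thread, with the real
// thread function handed in as its parameter.
static void _avr_task_trampoline(void (*thread_main)(void))
{
    thread_main();          // run the actual thread body

    // If the thread ever returns, retire it cleanly instead of
    // falling off the end of its stack.
    _avr_task_exit();

    for (;;) ;              // never reached
}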

Eventually, when I get around to it, I'll make a C++ extension to this for the Arduino boards, or just for people who use avr-g++. Personally, I shy away from C++ in resource-constrained environments, but hell, to each his own.

– Teknoman117