
48 Hours to Ship: Git Radar Retrospective

How we built an AI-powered GitHub search engine in a weekend hackathon.

Git Radar started as a hackathon project with a simple question: what if you could search for developers the way you search for code? Not by username or location filter, but by describing what you're looking for in plain English and getting ranked, contextualised results. "TypeScript developers in Sydney who contribute to open source testing frameworks." That kind of thing.

Here's how the 48 hours went, what we built, what broke, and what I learned.

The Problem We Were Solving

The hackathon theme was "the future of work," which is broad enough to mean almost anything. We narrowed it to developer hiring because it's a space we understood and had direct frustration with.

Recruiters and hiring managers spend a disproportionate amount of time manually reviewing GitHub profiles. They're looking for signals: code quality, project complexity, consistency, tech stack familiarity, whether someone actually maintains their projects or just pushes an initial commit and disappears. All of this information exists on GitHub - commit histories, PR reviews, repo activity, language breakdowns - but extracting it manually takes 15-20 minutes per candidate. At scale, that's untenable.

The existing tools in this space fall into two categories: LinkedIn-style keyword search (which misses context) and basic GitHub analytics dashboards (which show metrics without meaning). Nobody was doing semantic search - understanding what you're looking for and matching it against what developers have actually built.

Hour 0-8: Setup and Scoping

We spent the first two hours arguing about scope, which turned out to be the most valuable time investment of the entire project. The initial idea was ambitious: real-time GitHub crawling, a recommendation engine, collaborative shortlists, integration with ATS platforms. We cut all of it.

The scoped version: a search box, ranked results with AI-generated profile summaries, and a detail view for each profile. No user accounts, no saved searches, no fancy filters, no onboarding flow. Just the core loop - search, browse, understand.

Tech choices were pragmatic:

  • Next.js with TypeScript because we both knew it and could move fast
  • PostgreSQL with pgvector for storing profile embeddings and enabling vector similarity search
  • Supabase for managed Postgres and auth (in case we needed it later, which we didn't)
  • Vercel for deployment, so we could ship continuously from hour one
  • Tailwind CSS because nobody wants to write CSS at 3am
We deployed a "Hello World" to Vercel in hour 3. Every commit after that went to production. This was deliberate - we wanted to catch deployment issues early, not at hour 47.

The remaining hours went to setting up the database schema, GitHub OAuth (for API rate limits), and a basic profile fetch endpoint. Nothing exciting, but the foundation mattered.

Hour 8-20: The Data Pipeline

This is where the real work started. GitHub's REST API gives you a lot of data, but it's spread across dozens of endpoints and rate-limited to 5,000 requests per hour (authenticated). For a single profile, we needed to hit:

  • /users/{username} for basic profile info
  • /users/{username}/repos for repository list
  • /repos/{owner}/{repo} for individual repo details
  • /repos/{owner}/{repo}/languages for language breakdowns
  • /repos/{owner}/{repo}/commits for commit history (sampled)
  • /users/{username}/events for recent activity
That's 5-10 API calls per profile, and we wanted to handle hundreds of profiles. The rate limit math didn't look great, so we built a simple queue with backpressure - requests get batched, rate limits are tracked, and when we're close to the limit, the queue pauses and resumes after the reset window.
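
The core of that queue is easier to show than to describe. Here's a minimal TypeScript sketch that reads GitHub's standard rate-limit headers - the class name, buffer threshold, and error handling are illustrative, not the exact code we shipped:

    const GITHUB_API = "https://api.github.com";

    class GitHubQueue {
      private remaining = 5000;     // requests left in the current window
      private resetAt = 0;          // unix timestamp when the window resets
      private readonly buffer = 50; // pause before we actually hit zero

      async request(path: string): Promise<unknown> {
        // Backpressure: if we're close to the limit, wait out the reset window.
        if (this.remaining <= this.buffer) {
          const waitMs = Math.max(0, this.resetAt * 1000 - Date.now());
          await new Promise((resolve) => setTimeout(resolve, waitMs));
        }

        const res = await fetch(`${GITHUB_API}${path}`, {
          headers: { Authorization: `Bearer ${process.env.GITHUB_TOKEN}` },
        });

        // GitHub reports quota state on every response.
        this.remaining = Number(res.headers.get("x-ratelimit-remaining") ?? this.remaining);
        this.resetAt = Number(res.headers.get("x-ratelimit-reset") ?? this.resetAt);

        if (!res.ok) throw new Error(`GitHub API ${res.status} for ${path}`);
        return res.json();
      }
    }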

The indexing pipeline transforms raw GitHub data into a structured profile document:

    interface IndexedProfile {
      username: string;
      bio: string;
      languages: Record<string, number>; // language -> bytes
      topRepos: {
        name: string;
        description: string;
        stars: number;
        language: string;
        lastCommit: string;
      }[];
      activityScore: number;   // 0-100 based on recent commits
      contributionScore: number; // weighted by repo quality
      embedding: number[];       // vector for semantic search
    }

The embedding is the key piece. We concatenated the profile bio, repo descriptions, README snippets from pinned repos, and a sample of recent commit messages into a single text blob, then ran it through an embedding model to get a vector representation. This vector captures the semantic meaning of a developer's work - what they build, what technologies they use, what domains they work in - in a format that supports similarity search.
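
Concretely, the step that fills the `embedding` field looks something like the sketch below. The `ProfileSource` shape and the choice of embedding model are assumptions for illustration; any model that returns a fixed-length vector works the same way:

    import OpenAI from "openai";

    const openai = new OpenAI(); // assumes OPENAI_API_KEY is set

    interface ProfileSource {
      bio: string;
      repoDescriptions: string[];
      pinnedReadmeSnippets: string[];
      recentCommitMessages: string[];
    }

    // Flatten the profile into one text blob, then embed it.
    async function embedProfile(p: ProfileSource): Promise<number[]> {
      const blob = [
        p.bio,
        ...p.repoDescriptions,
        ...p.pinnedReadmeSnippets,
        ...p.recentCommitMessages,
      ]
        .filter(Boolean)
        .join("\n");

      const res = await openai.embeddings.create({
        model: "text-embedding-3-small", // illustrative model choice
        input: blob.slice(0, 8000),      // stay within the model's input limit
      });
      return res.data[0].embedding;
    }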

Hour 20-32: Search and Ranking

With profiles indexed and embedded, search was conceptually simple: embed the query, find the most similar profile vectors using pgvector's cosine distance, and return the top results. The SQL is almost trivial:

    SELECT *, 1 - (embedding <=> $1) AS similarity
    FROM profiles
    ORDER BY embedding <=> $1
    LIMIT 20;

But raw vector similarity wasn't enough. A query like "active React developers" should penalise profiles that haven't committed in months, even if their historical work is relevant. We needed a composite score that weighted multiple signals:

  • Semantic similarity to the query (from pgvector, 0-1)
  • Activity recency - exponential decay based on last commit date
  • Repository quality - stars, forks, and whether issues get responses
  • Contribution diversity - external PRs to other repos score higher than self-owned work
  • Consistency - regular commits over time beat burst-and-abandon patterns
The initial vector search was fast but blunt - embeddings compress a lot of nuance into a single distance metric. So we added a reranker model as a second pass. After pgvector returned the top 50 candidates, we ran each profile through a cross-encoder reranker that scored the query-profile pair together rather than comparing pre-computed embeddings independently. Cross-encoders are slower (they can't be pre-computed) but significantly more accurate because they attend to the full interaction between the query and the document.

The reranker caught things the embedding search missed - like a profile whose repos were highly relevant but whose bio was generic, or a query with negation that cosine similarity couldn't handle. We used Cohere's rerank endpoint since it was the fastest to integrate, and it added maybe 200ms per search. Worth it.

The final score was a weighted combination of the reranker relevance score, activity signals, and repo quality factors. We tuned the weights by hand against a set of ~50 profiles we'd manually ranked for a few test queries. Not rigorous, but good enough for a hackathon.
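
The weights themselves aren't worth reproducing, but the shape of the scoring function is. An illustrative sketch - the weights, half-life, and signal ranges here are made up, not the hand-tuned values:

    interface RankingSignals {
      rerankScore: number;           // 0-1, from the cross-encoder reranker
      daysSinceLastCommit: number;
      repoQuality: number;           // 0-1, from stars, forks, issue responsiveness
      contributionDiversity: number; // 0-1, share of external contributions
    }

    // Illustrative weights; the real ones were hand-tuned against ~50 ranked profiles.
    const WEIGHTS = { rerank: 0.5, activity: 0.2, quality: 0.2, diversity: 0.1 };
    const HALF_LIFE_DAYS = 30; // assumed: the activity signal halves every 30 days

    function compositeScore(s: RankingSignals): number {
      // Exponential decay: a commit today scores 1, a month-old one scores ~0.5.
      const activity = Math.pow(0.5, s.daysSinceLastCommit / HALF_LIFE_DAYS);
      return (
        WEIGHTS.rerank * s.rerankScore +
        WEIGHTS.activity * activity +
        WEIGHTS.quality * s.repoQuality +
        WEIGHTS.diversity * s.contributionDiversity
      );
    }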

Then we added the piece that made it feel magical: AI-generated summaries. For each search result, we sent the query and the profile data to Claude and asked for a 2-3 sentence explanation of why this developer might be a good match. The prompt was something like:

"Given the search query '{query}' and this developer profile, write a brief explanation of why they might be relevant. Focus on specific evidence from their repos, contributions, and activity. Be honest - if the match is weak, say so."

The "be honest" instruction was important. Without it, the model would find tenuous connections to justify every result. With it, lower-ranked results got hedging language like "while they don't have direct experience with X, their work on Y suggests transferable skills." This made the results feel trustworthy.

Hour 32-42: The Frontend

With the backend working, we shifted to the UI. The design philosophy was simple: get out of the way. The search box is the hero. Results are a clean list. Profile details expand inline.

The search page was straightforward - an input field with debounced queries, a loading skeleton, and a list of result cards. Each card showed the developer's avatar, name, top languages, activity level, and the AI-generated summary. Click to expand and see the full profile breakdown.
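
The debouncing is the only part with any subtlety: fire the search after the user pauses, not on every keystroke. A small hook along these lines (names are illustrative):

    import { useEffect, useState } from "react";

    // The query only "settles" after the user stops typing for `delay` ms,
    // so we don't fire a search request per keystroke.
    function useDebouncedValue<T>(value: T, delay = 300): T {
      const [debounced, setDebounced] = useState(value);

      useEffect(() => {
        const timer = setTimeout(() => setDebounced(value), delay);
        return () => clearTimeout(timer); // reset the timer on every keystroke
      }, [value, delay]);

      return debounced;
    }

    // Usage in the search page (sketch):
    //   const [query, setQuery] = useState("");
    //   const debouncedQuery = useDebouncedValue(query, 300);
    //   useEffect(() => { if (debouncedQuery) runSearch(debouncedQuery); }, [debouncedQuery]);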

The profile detail view was more interesting. We wanted to show enough information to make a decision without overwhelming the user. The layout was:

  • Signal Score - a single 0-100 number that summarises the composite ranking
  • Language breakdown - horizontal bar chart of top languages by usage
  • Activity timeline - sparkline of commit frequency over the past year
  • Top repositories - cards with stars, descriptions, and last commit dates
  • AI summary - the query-specific explanation of why this person matched
The collaboration graph was a stretch goal we almost cut. It shows how a developer interacts with others through shared repos, PR reviews, and collaborative commits. We used D3's force simulation for the physics and rendered it on a canvas. It took about 4 hours to get right and was probably the most visually impressive part of the demo, which justified the time investment.
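
The graph is mostly standard d3-force wiring plus a canvas draw loop. A stripped-down sketch - the node and link shapes are assumptions about our data model, not the shipped code:

    import { forceSimulation, forceLink, forceManyBody, forceCenter } from "d3-force";

    interface GraphNode { id: string; x?: number; y?: number; }
    interface GraphLink { source: string | GraphNode; target: string | GraphNode; }

    function renderGraph(canvas: HTMLCanvasElement, nodes: GraphNode[], links: GraphLink[]) {
      const ctx = canvas.getContext("2d")!;

      const simulation = forceSimulation(nodes)
        .force("link", forceLink<GraphNode, GraphLink>(links).id((d) => d.id))
        .force("charge", forceManyBody().strength(-80))
        .force("center", forceCenter(canvas.width / 2, canvas.height / 2));

      // Redraw the whole canvas on every simulation tick.
      simulation.on("tick", () => {
        ctx.clearRect(0, 0, canvas.width, canvas.height);

        ctx.strokeStyle = "#ccc";
        for (const link of links) {
          const s = link.source as GraphNode; // d3 resolves ids to node objects
          const t = link.target as GraphNode;
          ctx.beginPath();
          ctx.moveTo(s.x ?? 0, s.y ?? 0);
          ctx.lineTo(t.x ?? 0, t.y ?? 0);
          ctx.stroke();
        }

        ctx.fillStyle = "#333";
        for (const node of nodes) {
          ctx.beginPath();
          ctx.arc(node.x ?? 0, node.y ?? 0, 4, 0, 2 * Math.PI);
          ctx.fill();
        }
      });
    }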

Loading states were critical. The AI summary takes 2-3 seconds to generate, and the profile analysis takes longer. We used streaming for the summary (words appearing as they're generated) and skeleton placeholders for everything else. Users are more patient when they can see progress.
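
On the client, streaming just means reading the response body as it arrives rather than awaiting the full payload. Roughly (the /api/summary route name is hypothetical):

    // Read the streamed summary and append text to the UI as chunks arrive.
    async function streamSummary(
      query: string,
      username: string,
      onChunk: (text: string) => void
    ) {
      const res = await fetch("/api/summary", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ query, username }),
      });
      if (!res.body) throw new Error("No response body to stream");

      const reader = res.body.getReader();
      const decoder = new TextDecoder();

      // Each chunk is a few words; hand it to the UI as soon as it lands.
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        onChunk(decoder.decode(value, { stream: true }));
      }
    }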

Hour 42-48: Polish and Panic

The last six hours were chaos, as expected. Our TODO list at hour 42:

  • Error handling for GitHub rate limits (we were hitting them during testing)
  • Mobile responsive layout (we'd only tested on desktop)
  • Demo data fallback (in case live APIs failed during the presentation 🤫)
  • Writing the presentation script
The rate limit issue was the scariest. During testing, we'd burned through our hourly quota and the app just... broke. No error message, no fallback, just empty results. We added retry logic with exponential backoff and a user-facing message: "GitHub rate limit reached. Results may be cached. Try again in X minutes." Not elegant, but honest.
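
The retry wrapper is nothing fancy. Something like this, with the delays and retry count being illustrative:

    // Retry a flaky call with exponential backoff and a little jitter.
    async function withBackoff<T>(fn: () => Promise<T>, maxRetries = 4): Promise<T> {
      for (let attempt = 0; ; attempt++) {
        try {
          return await fn();
        } catch (err) {
          if (attempt >= maxRetries) throw err;
          // ~1s, 2s, 4s, 8s between attempts, plus up to 250ms of noise.
          const delay = 1000 * 2 ** attempt + Math.random() * 250;
          await new Promise((resolve) => setTimeout(resolve, delay));
        }
      }
    }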

We shipped the final version with about 10 minutes to spare. The demo went well - the collaboration graph got a reaction from the judges, and the AI summaries made the product feel polished beyond what you'd expect from a 48-hour build.

What Worked

  • Scoping aggressively. We said no to features constantly. No user accounts, no saved searches, no export, no team features. Every "that would be cool" got written on a sticky note and ignored. The core search-and-browse loop was tight because we gave it all our attention.
  • Parallel work. I handled the backend - data pipeline, embeddings, ranking, API routes. My teammate owned the frontend - search UI, profile views, graphs, responsive layout. We shared a Supabase project and a Next.js repo, but rarely touched the same files. Merge conflicts: zero.
  • Early deployment. Deploying to Vercel in the first few hours meant every subsequent change was tested in a production-like environment. We caught CORS issues, environment variable problems, and cold start timeouts early instead of at the end.
  • Streaming AI responses. Showing the summary being generated word-by-word made the 2-3 second wait feel intentional rather than broken. Small UX detail, big impact on perceived quality.

What I'd Do Differently

  • Better error handling from the start. We treated error handling as a polish task and paid for it. The rate limit issue could have been a demo-killer. Error states should be part of the first iteration, not the last.
  • More realistic test data. Our demo relied entirely on live API calls. If GitHub had been down, or if our rate limits were exhausted, we'd have had nothing to show. A set of pre-indexed demo profiles would have been cheap insurance.
  • Use Exa or a similar search API earlier. We spent several hours on the embedding and search pipeline that could have been shortcut with a third-party semantic search API. At a hackathon, build vs. buy should almost always lean toward buy.

After the Hackathon

Git Radar placed well in the hackathon, but more importantly, it taught me a lot about building AI-powered features in a time-constrained environment. The biggest lesson: AI features are easy to demo and hard to make reliable. The semantic search worked great for common queries but fell apart on edge cases. The summaries were usually good but occasionally hallucinated. The ranking was reasonable but not always intuitive.

If I were building this as a real product, I'd spend 80% of my time on the reliability and edge case work that we skipped entirely. But as a proof of concept for what's possible, 48 hours was enough to build something genuinely useful - and that's the point of a hackathon.

The project is open source and still running at gitradar.lachyfs.com if you want to try it.