BLOGS AND RESEARCH PAPERS
View recent posts below.
COVID-19 Candidate Treatments, a Data Analytics Approach
COVID-19, short for "coronavirus disease 2019," has affected millions of people worldwide. In the U.S. alone, as of the end of this week (June 1, 2020), there have been 1,790,191 total cases and 104,383 deaths. Worldwide, there have been 6,166,978 cases and 372,037 deaths, and these are only the reported cases. Our focus in this research is on evaluating a repository of research papers to extract knowledge related to COVID-19 and possible treatments. Driven by the COVID-19 Open Research Dataset Challenge from Kaggle, we focused on a subset of it, the COVID-19 Pulmonary Risks Literature Clustering. The second dataset we use is from the Maryland Transportation Institute (MTI).
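As a rough illustration of the literature-clustering step, the sketch below groups a handful of placeholder abstracts using TF-IDF features and k-means. The paper's actual pipeline, corpus, and cluster count are not reproduced here, so every value in the example is an assumption.

```python
# Minimal sketch of one common way to cluster research-paper abstracts,
# using TF-IDF features and k-means (scikit-learn). The study's actual
# pipeline may differ; the abstracts and cluster count are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

abstracts = [
    "ACE2 receptor binding and pulmonary involvement in SARS-CoV-2 ...",
    "Hydroxychloroquine trial outcomes in hospitalized patients ...",
    "Mobility data and transmission rates across U.S. counties ...",
]  # placeholder abstracts; the real corpus comes from the CORD-19 dataset

# Convert abstracts to TF-IDF vectors, ignoring very common English words
vectorizer = TfidfVectorizer(stop_words="english", max_features=5000)
X = vectorizer.fit_transform(abstracts)

# Group papers into k topical clusters (k chosen for illustration only)
kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
labels = kmeans.fit_predict(X)

for abstract, label in zip(abstracts, labels):
    print(label, abstract[:60])
```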
Question Formulation and Transformer Model Resilience
Question-answer is a paradigm that seeks to provide automated responses to queries posed in natural language utilizing a body of textual content as the source of the answers. A key research challenge is how the changes in question formulation affect the stability of current question-answer transformer models. This paper conducts a preliminary analysis of the stability of question-answer transformer models in the medical domain when the same question is asked in different orders or with other semantically identical variations.
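To make the setup concrete, here is a minimal sketch (not the paper's exact models, contexts, or questions) that asks a Hugging Face question-answering pipeline the same question in several semantically identical phrasings and compares the returned answers. The model name and question variants are assumptions.

```python
# Hedged sketch of probing a QA transformer's stability when the same
# question is rephrased. Requires the Hugging Face `transformers` library.
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

context = (
    "Remdesivir shortened recovery time in hospitalized COVID-19 patients "
    "in a randomized controlled trial."
)

# Semantically identical variations of the same question
variants = [
    "Which drug shortened recovery time in hospitalized patients?",
    "In hospitalized patients, which drug shortened recovery time?",
    "What medication reduced the time to recovery for hospitalized patients?",
]

# A stable model should return (nearly) the same answer span for every phrasing
for q in variants:
    result = qa(question=q, context=context)
    print(f"{q!r} -> {result['answer']} (score={result['score']:.2f})")
```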
Prompt Engineering of ChatGPT to Improve Generated Code & Runtime Performance Compared with the Top-Voted Human Solutions
This paper presents the results of a study comparing the runtime performance of the best-performing coding solution, selected from 100 solutions generated with ChatGPT, to the top-voted human-produced code on Stack Overflow. These results show that selecting the best of 100 solutions generated by ChatGPT is competitive with or better than the top-voted human solution on Stack Overflow for the range of problems that we tested. Moreover, the results indicate that prompting multiple times for code and selecting the best of many generated solutions is a promising autonomous coding aid to help human software engineers find the best solutions for performance-critical code sections.
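The core "generate many, keep the fastest" idea can be sketched in a few lines. The candidates below are hand-written stand-ins for LLM-generated code, and the study's actual prompting and benchmarking harness is assumed rather than reproduced.

```python
# Hedged sketch: time several candidate implementations of the same task
# and keep the fastest, mirroring the best-of-N selection described above.
import timeit

def candidate_loop(n=10_000):
    total = 0
    for i in range(n):
        total += i * i
    return total

def candidate_generator(n=10_000):
    return sum(i * i for i in range(n))

candidates = {"loop": candidate_loop, "generator": candidate_generator}

# Measure each candidate and select the one with the lowest runtime
timings = {name: timeit.timeit(fn, number=200) for name, fn in candidates.items()}
best = min(timings, key=timings.get)
print(f"Best of {len(candidates)} candidates: {best} ({timings[best]:.4f}s)")
```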
Evaluating the Performance of LLM-Generated Code for ChatGPT-4 and AutoGen Along with Top-Rated Human Solutions
In the domain of software development, making informed decisions about the utilization of large language models (LLMs) requires a thorough examination of their advantages, disadvantages, and associated risks. This paper provides several contributions to such analyses. It first conducts a comparative analysis, pitting the best-performing code solutions selected from a pool of 100 generated by ChatGPT-4 against the highest-rated human-produced code on Stack Overflow. Our findings reveal that, across a spectrum of problems we examined, choosing from ChatGPT-4's top 100 solutions proves competitive with or superior to the best human solutions on Stack Overflow. We next delve into the AutoGen framework, which harnesses multiple LLM-based agents that collaborate to tackle tasks.
A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT
Prompt engineering is an increasingly important skill set needed to converse effectively with large language models (LLMs), such as ChatGPT. Prompts are instructions given to an LLM to enforce rules, automate processes, and ensure specific qualities (and quantities) of generated output. Prompts are also a form of programming that can customize the outputs and interactions with an LLM. This paper describes a catalog of prompt engineering techniques presented in pattern form that have been applied to solve common problems when conversing with LLMs. Prompt patterns are a knowledge transfer method analogous to software patterns since they provide reusable solutions to common problems faced in a particular context, i.e., output generation and interaction when working with LLMs.
Enhancing structured data generation with GPT-4o
Large language models (LLMs), such as GPT-4o, provide versatile techniques for generating and formatting structured data. However, prompt style plays a critical role in determining the accuracy, efficiency, and token cost of the generated outputs. This paper explores the effectiveness of three specific prompt styles (JSON, YAML, and Hybrid CSV/Prefix) for structured data generation across diverse applications. We focus on scenarios such as personal stories, receipts, and medical records, using randomized datasets to evaluate each prompt style's impact. Our analysis examines these prompt styles across three key metrics: accuracy in preserving data attributes, token cost associated with output generation, and processing time required for completion. By incorporating structured validation and comparative analysis, we ensure precise evaluation of each prompt style's performance. Results are visualized through metrics-based comparisons, such as Prompt Style vs. Accuracy, Prompt Style vs. Token Cost, and Prompt Style vs. Processing Time.
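For illustration only, the snippet below shows what the three prompt styles might look like for a receipt record, together with a simple validation check for the JSON case. The exact prompt wording, schema, and model call used in the paper are assumptions here.

```python
# Illustrative sketch of JSON, YAML, and Hybrid CSV/Prefix prompt styles for
# structured data generation, plus a basic attribute-preservation check.
import json

record_fields = ["name", "date", "total"]

prompts = {
    "json": "Return the receipt as a JSON object with keys: "
            + ", ".join(record_fields),
    "yaml": "Return the receipt as YAML with fields: "
            + ", ".join(record_fields),
    "hybrid_csv": "Return the receipt as one CSV line prefixed with "
                  "'RECEIPT:' in the order: " + ", ".join(record_fields),
}

def validate_json_output(text: str) -> bool:
    """Check that a model response parses and preserves every expected attribute."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    return all(field in data for field in record_fields)

# Example: a mocked model response for the JSON-style prompt
mock_response = '{"name": "Corner Cafe", "date": "2024-05-01", "total": 18.75}'
print(prompts["json"])
print("valid:", validate_json_output(mock_response))
```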
Gen AI: Your Team's Quick Win for Immediate AI Success
The mere mention of "implementing AI" often conjures images of lengthy implementation timelines, complex infrastructure changes, and teams of specialized data scientists. But what if there were a simpler way to start benefiting from AI today? Enter Generative AI: your team's gateway to immediate AI success without the traditional overhead.
Making AI Reliable with RAG: A Practical Guide for Your Business
Are you worried about the quality and reliability of AI responses when using proprietary information? Many businesses face this challenge when trying to adopt AI for critical operations. But there’s a solution: Retrieval-Augmented Generation (RAG). By grounding AI responses in your own data, RAG ensures accuracy, accountability, and trustworthiness in AI-driven processes.
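Here is a minimal sketch of the RAG idea, assuming a simple TF-IDF retriever over a toy document set. The documents, function names, and grounding instruction are assumptions; the final LLM call is left as a placeholder since RAG works with whichever model you already use.

```python
# Minimal RAG sketch: retrieve the passages most relevant to a question from
# your own documents, then ask the model to answer only from those passages.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support hours are 9am to 5pm Eastern, Monday through Friday.",
    "Enterprise plans include a dedicated account manager.",
]  # placeholder for your proprietary knowledge base

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (TF-IDF similarity)."""
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(documents + [query])
    query_vec = matrix[len(documents)]       # last row is the query
    scores = cosine_similarity(query_vec, matrix[:len(documents)]).ravel()
    ranked = scores.argsort()[::-1][:k]
    return [documents[i] for i in ranked]

def build_prompt(query: str) -> str:
    """Assemble a grounded prompt that ties the model to retrieved facts."""
    context = "\n".join(retrieve(query))
    return (
        "Answer using ONLY the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

print(build_prompt("What is the refund window?"))
# The resulting prompt would then be sent to your LLM of choice.
```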
The Hidden Costs of Large AI Models: What You Need to Know
When evaluating AI models, it’s easy to focus on the headline API fees listed on pricing pages. But the reality is far more complex. The cost of using large AI models goes well beyond what’s immediately visible. Hidden expenses—like data transfer, infrastructure, and privacy risks—can quickly erode your budget. Smaller, localized models offer a compelling alternative, helping you avoid these hidden costs without sacrificing functionality.
Thinking of Building AI In-House to Save Money? Think Again.
Imagine you're the owner of a growing e-commerce business. You're constantly looking for ways to improve customer engagement and boost sales. You've heard about the power of AI-driven personalization and decide that building your own recommendation engine is the perfect solution. "Why pay a consultant when my talented team can handle it?" you think. This DIY approach seems like a smart way to cut costs and maintain full control. But what if that initial cost-saving assumption is completely wrong? What if, in reality, trying to build AI in-house ends up costing you far more than you ever anticipated?