Scraping a webpage with C# and HTMLAgility

   IT问题网   2018-06-08 00:00:00

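Question

I have read that HTMLAgility 1.4 is a great solution for scraping a webpage. Being a new programmer, I am hoping I could get some input on this project. I am doing this as a C# application form. The page I am working with is fairly straightforward. The information I need is stuck between just 2 tags. My goal is to pull the data for Part-Num, Manu-Number, Description, Manu-Country, Last Modified, and Last Modified By out of the page and send it to a SQL table. One twist is that there is also a small PNG picture that needs to be grabbed from the img src attribute, which starts with "/partcode/number".

I do not have any completed code that works. I thought this bit of code would tell me if I am headed in the right direction. Even stepping through it in the debugger, I can't see that it does anything. Could someone point me in the right direction on this? The more detailed the better, since it is apparent I have a lot to learn. Thank you, I would really appreciate it.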

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using HtmlAgilityPack;
    using System.Xml;

    namespace Stats
    {
        class PartParser
        {
            static void Main(string[] args)
            {
                try
                {
                    HtmlDocument doc = new HtmlDocument();
                    doc.LoadHtml("http://localhost"); // My understanding is that this reads the entire page?
                    var tables = doc.DocumentNode.SelectNodes("//table"); // I assume this sets up the search for nodes containing the word "table"
                }
                catch (Exception ex)
                {
                    Console.WriteLine(ex.Message);
                    Console.WriteLine(ex.StackTrace);
                    Console.ReadKey();
                }
            }
        }
    }



The code for the web page is:

    <!DOCTYPE html
        PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>Part Number Database: Item Record</title>
    </head>
    <body>

    <table class="data">

    <tr><td>Part-Num</td><td width="50"></td><td><img src="/partcode/number/072140" alt="072140" /></td></tr>

    <tr><td>Manu-Number</td><td width="50"></td><td><img src="/partcode/manu/00721408" alt="00721408" /></td></tr>

    <tr><td>Description</td><td></td><td>Widget 3.5</td></tr>

    <tr><td>Manu-Country</td><td></td><td>United States</td></tr>

    <tr><td>Last Modified</td><td></td><td>26 Jan 2009, 8:08 PM</td></tr>

    <tr><td>Last Modified By</td><td></td><td>
    Manu
    </td></tr>

    </table>

    </body></html>
 
Solution

Take a look at this article on 4GuysFromRolla:

http://www.4guysfromrolla.com/articles/011211-1.aspx

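This is the article I used as my starting point with the HTML Agility Pack, and it worked great. I believe you will get all the information you need from it to accomplish what you are trying to do.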
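For what it's worth, here is a minimal sketch of how the table in the question could be walked with the HTML Agility Pack. It is not taken from the linked article. The names `PartPageSketch`, `ParseParts` and `SampleHtml` are made up for illustration, and the markup is a trimmed copy of the page in the question so that the example runs without a web server.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

class PartPageSketch
{
    // Hypothetical sample: three rows copied from the page in the question.
    const string SampleHtml = @"
<table class=""data"">
  <tr><td>Part-Num</td><td width=""50""></td><td><img src=""/partcode/number/072140"" alt=""072140"" /></td></tr>
  <tr><td>Description</td><td></td><td>Widget 3.5</td></tr>
  <tr><td>Manu-Country</td><td></td><td>United States</td></tr>
</table>";

    // Returns label -> value pairs for every three-cell row of table.data.
    // When the value cell holds an <img>, the alt text becomes the value
    // and the image path is stored under "<label> (image)".
    static Dictionary<string, string> ParseParts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parses a string of markup; it does not download anything

        var result = new Dictionary<string, string>();
        var rows = doc.DocumentNode.SelectNodes("//table[@class='data']//tr");
        if (rows == null) return result; // SelectNodes returns null when nothing matches

        foreach (HtmlNode row in rows)
        {
            var cells = row.SelectNodes("td");
            if (cells == null || cells.Count < 3) continue; // skip rows that are not label/spacer/value

            string label = HtmlEntity.DeEntitize(cells[0].InnerText).Trim();
            HtmlNode img = cells[2].SelectSingleNode(".//img");
            if (img != null)
            {
                result[label] = img.GetAttributeValue("alt", "");
                result[label + " (image)"] = img.GetAttributeValue("src", "");
            }
            else
            {
                result[label] = HtmlEntity.DeEntitize(cells[2].InnerText).Trim();
            }
        }
        return result;
    }

    static void Main()
    {
        foreach (var pair in ParseParts(SampleHtml))
            Console.WriteLine(pair.Key + " = " + pair.Value);
    }
}
```

Note that `LoadHtml` expects the markup itself, so passing it "http://localhost" just parses that URL text as a tiny document with no tables in it, which is probably why nothing seems to happen in the debugger. To fetch the live page, `new HtmlWeb().Load("http://localhost")` returns an `HtmlDocument` that can be queried the same way. Writing the resulting pairs to a SQL table and downloading the PNG are left out of this sketch.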

